THE PHYSICAL AND PSYCHOPHYSICAL BASIS OF SOUND LOCALIZATION

Simon Carlile

1. PHYSICAL CUES TO A SOUND’S LOCATION

1.1. THE DUPLEX THEORY OF AUDITORY LOCALIZATION

Traditionally, the principal cues to a sound’s location are identified as the differences between the sound field at each ear. The obvious fact that we have two ears sampling the sound field under slightly different conditions makes these binaural cues self-evident. A slightly more subtle concept underlying traditional thinking is that the differences between the ears are analyzed on a frequency by frequency basis. This idea has as its basis the notion that the inner ear encodes the sound in terms of its spectral characteristics as opposed to its time domain characteristics. As a result, complex spectra are thought to be encoded within the nervous system as varying levels of activity across a wide range of auditory channels; each channel corresponding to a different segment of the frequency range. While there is much merit and an enormous amount of data supporting these ideas, they have tended to dominate research efforts to the exclusion of a number of other important features of processing. In contrast to these traditional views, there is a growing body of evidence that:

(i) illustrates the important role of information available at each ear alone (monaural cues to sound location);

CHAPTER 2

Virtual Auditory Space: Generation and Applications, edited by Simon Carlile. © 1996 R.G. Landes Company.


(ii) suggests that processing across frequency is an important feature of those mechanisms analyzing cues to sound location (monaural and binaural spectral cues);

(iii) suggests that the time (rather than frequency) domain characteristics of the sound may also play an important role in sound localization processing.

The principal theoretical statement of the basis of sound localization has become known as the “duplex theory” of sound localization and has its roots in the work of Lord Rayleigh at the turn of the century. It is based on the fact that “the main difference between the two ears is that they are not in the same place.”1 Early formulations were based on a number of fairly rudimentary physical and psychophysical observations. Models of the behavior of sound waves around the head were made with simplifying approximations of the head as a sphere and the ears as two symmetrically placed point receivers (Fig. 2.1).2

Despite these simplifications the resulting models had great explanatory and predictive power and have tended to dominate the research program for most of this century.

The fact that we have two ears separated by a relatively large head means that, for sounds off the mid-line, there are differences in the path lengths from the sound source to each ear. This results in a difference in the time of arrival of the sound at each ear; this is referred to as the interaural time difference (ITD). This ITD manifests as a difference in the onset of sound at each ear and, for more continuous sounds, results in an interaural difference in the phase of the sounds at each ear (interaural phase difference: IPD). There are important frequency limitations to the encoding of phase information. The auditory nervous system is known to encode the phase of a pure tone stimulus at the level of the auditory receptors only for relatively low frequencies.3 Psychophysically, we also seem to be insensitive to differences in interaural phase for frequencies above about 1.5 kHz.4,5 For these reasons, the duplex theory holds that the encoding of interaural time differences (in the form of interaural phase differences) is restricted to low frequency sounds.

As the head is a relatively dense medium it will tend to reflect and refract sound waves. This only becomes a significant effect when the wavelengths of the sound are of the same order or smaller than the head. For a sound located off the midline, the head casts an acoustic shadow for the far ear and generates an interaural difference in the sound level at each ear (interaural level difference: ILD). At low frequencies of hearing this effect is negligible because of the relatively long wavelengths involved, but for frequencies above about 3 kHz the magnitude of the effect rises sharply. The amount of shadowing of the far ear will depend on the location of the source (section 1.3) so that this effect provides powerful cues to a sound’s location. There are also changes in the level of the sound at the ear nearer to the sound source that are dependent on the location of the source. The latter variations result from two distinct effects: firstly, the so-called obstacle or baffle effect (section 1.3) and secondly, the filtering effects of the outer ear (section 1.5 and chapter 6, section 2.2). The head shadow and near ear effects can result in interaural level differences of 40 dB or more at higher frequencies. The magnitudes of these effects and the frequencies at which they occur are dependent on the precise morphology of the head and ears and thus can show marked differences between individuals.

The duplex theory is, however, incomplete in that there are a number of observations that cannot be explained by reference to the theory and a number of observations that contradict the basic premises of the theory. For instance, there is a growing body of evidence that the human auditory system is sensitive to the interaural time differences in the envelopes of high frequency carriers (see review by Trahiotis6). There are a number of experiments that suggest that this information is not dependent on the low frequency channels of the auditory system.7,8 In the absence of a spectral explanation of the phenomena, this suggests a role for some form of time domain code operating at higher frequencies. Furthermore, recent work suggests that coding the interaural differences in both amplitude and frequency modulated signals is dependent on rapid amplitude fluctuations in individual frequency channels which are then compared binaurally.9

Fig. 2.1. The coordinate system used for calculating the interaural time differences in a simple path length model and the interaural level difference model. In these models the head is approximated as a hard sphere with two point receivers (the ears). Reprinted with permission from Shaw EAG. In: Keidel WD, Neff WD, ed. Handbook of Sensory Physiology. Berlin: Springer-Verlag, 1974:455-490.

The incompleteness of the duplex theory is also illustrated by the fact that listeners deafened in one ear can localize a sound with a fair degree of accuracy (chapter 1, section 2.2). This behavior must be based upon cues other than those specified by the duplex theory, which is principally focused on binaural processing of differences between the ears. A second problem with the theory is that because of the geometrical arrangement of the ears a single interaural difference in time or level is not associated with a single spatial location. That is, a particular interaural difference will specify the surface of an imaginary cone centered on the interaural axis (Fig. 2.2). The solid angle of the cone will be associated with the magnitude of the interval; for example the cone becomes the median plane for zero interaural time difference and becomes the interaural axis for a maximum interaural time difference. Therefore, interaural time differences less than the maximum possible ITD will be ambiguous for sound location. These have been referred to as the “cones of confusion.”1 Similar arguments exist for interaural level differences although, as we shall see, the cones of confusion for these cues are slightly more complex.

Fig. 2.2. The interaural time and level binaural cues to a sound’s location are ambiguous if considered within frequencies because a single interaural interval specifies more than one location in space. Because of the symmetry of the two receivers on each side of the head, a single binaural interval specifies the locations in space which can be described by the surface of a cone directed out from the ear, the so-called “cone of confusion.” For interaural time differences, the cone is centered on the interaural axis. The case is slightly more complicated for interaural level differences as, for some frequencies, the axis of the cone is a function of the frequency. Reprinted with permission from Moore BCJ. An Introduction to the Psychology of Hearing. London: Academic Press, 1989.
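The cone of confusion can be illustrated numerically. The Python sketch below uses the crudest possible model: two free-field point receivers 17.5 cm apart with no head between them, so the path difference is a simple projection rather than equation (1). The coordinate convention, function name and numeric values are illustrative assumptions, not taken from the chapter. A frontal source, its front-back mirror image and an elevated source on the same cone all receive identical ITDs:

```python
import math

C = 340.0                      # speed of sound in air, m/s
EAR_L = (-0.0875, 0.0, 0.0)    # two point receivers on the interaural (x) axis
EAR_R = ( 0.0875, 0.0, 0.0)

def itd_free_field(az_deg, el_deg):
    """ITD for a distant (plane-wave) source and two free-field point receivers.
    Azimuth is measured from straight ahead (+y), elevation from the horizontal plane."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    # unit vector toward the source
    sx = math.sin(az) * math.cos(el)
    sy = math.cos(az) * math.cos(el)
    sz = math.sin(el)
    # plane-wave path difference: projection of each ear's offset onto the source direction
    d_l = -(EAR_L[0] * sx + EAR_L[1] * sy + EAR_L[2] * sz)
    d_r = -(EAR_R[0] * sx + EAR_R[1] * sy + EAR_R[2] * sz)
    return (d_l - d_r) / C

# These three directions lie on one cone about the interaural axis,
# so this simple model assigns them identical ITDs:
front = itd_free_field(40.0, 0.0)
back  = itd_free_field(140.0, 0.0)   # front-back mirror image
up    = itd_free_field(90.0, 50.0)   # elevated point on the same cone
print(front, back, up)
```

The model deliberately omits the head itself; adding a sphere changes the ITD magnitudes (section 1.2) but not the rotational symmetry that creates the cone.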

The kind of front-back confusions seen in a percentage of localization trials is consistent with the descriptions of the binaural cues and indicative of the utilization of these cues (chapter 1, section 2.1.3). However, the fact that front-back confusions only occur in a small fraction of localization judgments suggests that some other cues are available to resolve the ambiguity in the binaural cues. These ambiguities in the binaural cues were recognized in the earliest statements of the duplex theory and it was suggested that the filtering properties of the outer ear might play a role in resolving these ambiguities. However, in contrast to the highly quantitative statements of the binaural characteristics and the predictive models of processing in these early formulations, the invocation of the outer ear was more of an ad hoc adjustment of the theory to accommodate a “minor” difficulty. It was not until the latter half of this century that more quantitative models of pinna function began to appear10 and indeed it has been only recently that quantitative and predictive formulations of auditory localization processing have begun to integrate the role of the outer ear11 (but see Searle et al12). In the following sections we will look in detail at what is known about the acoustics of the binaural cues and also the so-called monaural cues to a sound’s location. We will then look at the role of different structures of the auditory periphery in generating these location cues and some of the more quantitative models of the functional contribution of different components of the auditory periphery such as the pinna, head, shoulder and torso.

1.2. CUES THAT ARISE AS A RESULT OF THE PATH LENGTH DIFFERENCE

The path length differences depend on the distance and the angular location of the source with respect to the head (Fig. 2.1).1,13 Variation in the ITD with distance is really only effective for source locations a to 3a, where a is the radius of a sphere approximating the head. At distances greater than 3a the wave front is effectively planar. The ITDs produced by the path length differences for a plane sound wave can be calculated from

D = r (θ + sin(θ)) (1)



where D = distance in meters, r = radius of head in meters, θ = angle of sound source from median plane in radians (Fig. 2.1).1

The timing difference produced by this path length difference is14

t = D/c (2)

where t = time in seconds, c = speed of sound in air (340 m s⁻¹).

The interaural phase difference (IPD) produced for a relatively continuous periodic signal is then given by Kuhn15

IPD = tω (3)

where ω = radian frequency.

For a continuous sound, the differences in the phase of the sound waves at each ear will provide two phase angles; a° and (360° – a°). If these are continuous signals there is no a priori indication of which ear is leading. This information must come from the frequency of the sound wave and the distance between the two ears. Assuming the maximum phase difference occurs on the interaural axis, the only unambiguous phase differences will occur for frequencies whose wavelengths (λ) are greater than twice the interaural distance. At these frequencies the IPD will always be less than 180° and hence the cue is unambiguous.
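Equations (1)-(3) and this ambiguity limit can be sketched together in a few lines of Python. The head radius of 8.75 cm and c = 340 m/s are the values used elsewhere in the chapter; the function names are invented for illustration:

```python
import math

C = 340.0    # speed of sound in air, m/s (value used in the text)
A = 0.0875   # head radius, m (8.75 cm, as quoted later in the chapter)

def path_length_difference(theta):
    """Eq. (1): D = r(theta + sin(theta)), theta in radians from the median plane."""
    return A * (theta + math.sin(theta))

def itd(theta):
    """Eq. (2): t = D / c."""
    return path_length_difference(theta) / C

def ipd(theta, freq):
    """Eq. (3): IPD = t * omega, with omega = 2*pi*f (result in radians)."""
    return itd(theta) * 2.0 * math.pi * freq

# Maximum ITD occurs on the interaural axis (theta = 90 degrees):
t_max = itd(math.pi / 2)                 # ~0.66 ms for these values
# The IPD stays below 180 degrees (unambiguous) only while 2*pi*f*t_max < pi,
# i.e. for frequencies below 1 / (2 * t_max):
f_unambiguous = 1.0 / (2.0 * t_max)      # ~760 Hz for these values
print(f"max ITD = {t_max * 1e6:.0f} us, unambiguous below ~{f_unambiguous:.0f} Hz")
```

The limit that falls out of this sketch is consistent with the chapter's point that phase cues are only useful at low frequencies, though the exact figure depends on which measure of interaural distance is used.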

Physical measurements of the interaural time differences produced using click stimuli are in good agreement with predictions from the simple “path length” model described above.14-16 This model breaks down, however, when relatively continuous tonal stimuli are used (Fig. 2.3).14,15,17,18 In general, the measured ITDs for continuous tones are larger than those predicted. Furthermore, the ITDs become smaller and more variable as a function of frequency and azimuth location when the frequency exceeds a limit that is related to head size.

The failure of the simple models to predict the observed variations in ITDs results from the assumption that the velocity of the sound wave is independent of frequency. Three different velocities can be ascribed to a signal; namely the phase, group and signal velocities.15,18,19

The rate of propagation of elements of the amplitude envelope is represented by the group ITD, while the phase velocity of the carrier is best ascribed to what was previously thought of as the steady state ITD.a Over the frequency range of auditory sensitivity, the group and signal velocities are probably identical.18

a The fact that a signal can have a number of different velocities is not intuitively obvious to many. Brillouin19 likens the phase and group velocities to the ripples caused by a stone cast into a pond. He points out for instance that if the group velocity of the ripple is greater than the phase velocity one sees wavelets appearing at the advancing edge of the ripple, slipping backwards through the packet of wavelets that make up the ripple and disappearing at the trailing edge.

Fig. 2.3. Measurements of the interaural time differences using a dummy head reveal that this is a function of both frequency and the type of sound. The points plot data obtained from the measurement of on-going phase of a tone at a number of angles of incidence (15°, 30°, 45°, 60°, 75° and 90° referenced to the median plane). The solid lines to the left show the predictions based on the phase velocity of the wave (eq. 5) and can be seen to be a good match for the data only for the lowest frequencies. The boxed points show the solutions for integer ka for the complete model from which equation (5) was derived (i.e., without the simplifying assumption that ka < 1; see text). On the right y-axis, the dashed lines show the predictions of the simple path length model (eq. 2) and the arrows show measurements from the leading edge of a tone burst. Reprinted with permission from Kuhn GF, J Acoust Soc Am 1977; 62:157-167.

When phase velocity is constant, phase and group velocities will be equal, regardless of wavelength. However, because the phase velocity of sound waves is dependent on wavelength (particularly at high frequencies), relatively large differences can occur between the phase and group velocities.19 In addition, as a wave encounters a solid object, it is diffracted such that the wavefront at the surface of the object is a combination of the incident and reflected waves. Under these circumstances the phase velocity at the surface of the object becomes frequency-dependent in a manner characteristic of the object.18
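The distinction between the two velocities can be made concrete with a toy dispersion relation. The functional form below is arbitrary, chosen only so that phase velocity depends on wavenumber; it is not the actual dispersion around a sphere. Phase velocity is ω/k, group velocity is dω/dk:

```python
import numpy as np

# Toy dispersive medium: phase velocity falls as wavenumber grows.
# This form is purely illustrative, not the dispersion around a head-sized sphere.
v0, a = 340.0, 0.0875
k = np.linspace(1.0, 100.0, 2000)             # wavenumber, rad/m
omega = v0 * k / (1.0 + 0.01 * (k * a) ** 2)  # omega = v_p(k) * k

v_phase = omega / k               # phase velocity: omega / k
v_group = np.gradient(omega, k)   # group velocity: d(omega)/dk (numerical)

# With no dispersion (low k) the two coincide; once v_p depends on k they diverge.
print(v_phase[0], v_group[0], v_phase[-1], v_group[-1])
```

At the low-wavenumber end both velocities sit near 340 m/s; at the high end the group velocity falls well below the phase velocity, which is the kind of separation the Roth et al and Gaunaurd results quantify for real spheres.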

The interaural phase differences based on phase velocity, for frequencies in the range 0.25 kHz to 8.0 kHz, have been calculated using a sphere approximating the human head (Fig. 2.3).

IPD ≈ 3ka sin(ainc) (4)


where k = acoustic wave numberb (2π/λ), a = radius of the sphere, ainc = angle of incidence of the plane sound wave (see Kuhn15 for derivation).

The interaural time difference is calculated using equation (3)

ITD ≈ 3(a/c)sin(ainc) (5)

where c = speed of sound in air.
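The two predictions, equations (1)-(2) versus the low-frequency limit in equation (5), are easy to compare directly. A quick sketch using the head radius of 8.75 cm quoted in the text (function names are illustrative):

```python
import math

C, A = 340.0, 0.0875   # speed of sound (m/s) and head radius (m), as in the text

def itd_path_length(theta_deg):
    """Simple path-length model, eqs. (1)-(2): (a/c) * (theta + sin(theta))."""
    th = math.radians(theta_deg)
    return (A / C) * (th + math.sin(th))

def itd_low_freq(theta_deg):
    """Kuhn's low-frequency limit, eq. (5): 3 * (a/c) * sin(theta)."""
    th = math.radians(theta_deg)
    return 3.0 * (A / C) * math.sin(th)

# The low-frequency (diffractive) prediction exceeds the path-length prediction
# at every azimuth, consistent with the measurements described in the text.
for deg in (15, 30, 45, 60, 75, 90):
    pl, lf = itd_path_length(deg) * 1e6, itd_low_freq(deg) * 1e6
    print(f"{deg:2d} deg: path-length {pl:6.1f} us, low-frequency limit {lf:6.1f} us")
```

At 90° the two models give roughly 660 µs and 770 µs respectively, matching the text's observation that measured low-frequency ITDs are larger than the simple path-length prediction.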

According to equation 5, ITD is constant as a function of frequency; however this relation15 holds only where (ka)² << 1. The predicted ITDs from this formulation are larger than those predicted using path-length models of the time differences around the human head1 (eq. 1), and for any one stimulus location are constant as a function of frequency only for frequencies below 0.5 kHz (where a = 8.75 cm). Above this frequency, ITDs decrease as a function of frequency to the values predicted by the path-length model (eq. 1).

The steady state ITDs measured from a life-like model of the human head were dependent on the frequency of the sinusoidal stimulus15,17 and were in good agreement with the theoretical predictions (Fig. 2.3). In summary, measured ITDs were larger than predicted by the simple path-length model and relatively stable for frequencies below about 0.5 kHz. ITDs decreased to a minimum for frequencies above 1.4 kHz to 1.6 kHz and varied as a function of frequency at higher frequencies. In general there was much less variation in the measured ITDs as a function of frequency for angles closer to the median plane.

Roth et al18 measured ITDs for cats and confirmed that these changes in the ITD also occur for an animal with a smaller head and different pinna arrangement. Moderate stability of the ITDs was demonstrated only for frequencies below about 1.5 kHz and for locations within 60° of the median plane. In addition, the functions relating onset ITD and frequency were variable, particularly at high frequencies. This variability was found to be attributable to the pinna and the surface supporting the animal. These findings indicate that it cannot be assumed that a particular ITD is associated with a single azimuthal location. Steady state ITD is a potentially stable cue for sound localization only at low frequencies (humans < 0.6 kHz; cats < 1.5 kHz), but is frequency dependent at higher frequencies.

The phase and group velocities have also been calculated for the first five acoustic modes of the “creeping waves” around a rigid sphereb for ka between 0.4 and 25.0. The “creeping waves” are the waves resulting from the interaction of the incident and reflected sounds close to the surface of the obstacle. The ka relates the wavelength to the radius of the sphere so that for a sphere approximating the human head (a = 8.75 cm) ka between 0.4 and 25.0 represents a frequency range of 0.25 kHz to 16 kHz. At 1.25 kHz the group velocities for the first, second and third modes are 0.92, 0.72 and 0.63 times the ambient speed of sound.20 These calculations suggest that there are significant differences between the group and phase velocities at frequencies that are physiologically relevant to human auditory localization. Roth et al18 have demonstrated differences of the order of 75 µs between phase and group ITDs around an acoustically firm sphere approximating a cat’s head which are consistent with the calculations of Gaunaurd.20 Thus, the physical description of sound wave transmission, and the acoustic measurements of the sound, suggest that two distinct types of interaural timing cues are generated in the frequency range relevant to mammalian sound localization.

b The acoustic wave number simply allows a more general relationship to be established between the wavelength of the sound and the dimensions of the object. In Figure 2.3 the predicted and measured ITDs for the human head are expressed in terms of both the acoustic wave number and the corresponding frequencies for a sphere with the approximate size of the human head (in this case, radius = 8.75 cm).
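The ka-to-frequency mapping used in this paragraph can be verified numerically. A minimal sketch, assuming a = 8.75 cm and c = 340 m/s as elsewhere in the chapter:

```python
import math

C = 340.0    # speed of sound in air, m/s
A = 0.0875   # radius of the sphere approximating the human head, m

def ka_to_freq(ka):
    """k = 2*pi/lambda = 2*pi*f/c, so f = ka * c / (2*pi*a)."""
    return ka * C / (2.0 * math.pi * A)

# The range ka = 0.4 .. 25.0 quoted in the text:
print(ka_to_freq(0.4), ka_to_freq(25.0))   # roughly 0.25 kHz and 15.5 kHz
```

The endpoints come out near 0.25 kHz and 15.5 kHz, consistent with the approximate 0.25-16 kHz range stated in the text.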

1.3. THE HEAD AS AN ACOUSTIC OBSTACLE

As a consequence of the separation of the ears by the acoustically opaque mass of the head, two different acoustic effects will vary the pressure at each ear for a sound source located away from the median plane. The resulting disparity in the sound level at each ear is commonly referred to as the Interaural Level Difference (ILD).c The first effect, occurring at the ear ipsilateral to the source of the sound, is due to the capacity of the head to act as a reflecting surface. For a plane sound wave at normal incidence, the sound pressure at the surface of a perfectly reflecting barrier will be 6 dB higher than the pressure measured in the absence of the barrier21 (Fig. 2.4). Thus an on-axis pressure gain will be produced at the ipsilateral ear when the wavelength of the sound is much less than the interaural distance.
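The 6 dB figure follows directly from pressure doubling: at the surface of a perfect reflector the incident and reflected waves add in phase at normal incidence, and a doubling of pressure amplitude corresponds to 20 log10(2) dB:

```python
import math

# Incident and reflected waves add in phase at the surface of a perfect
# reflector (normal incidence), doubling the pressure amplitude.
p_free, p_surface = 1.0, 2.0
gain_db = 20.0 * math.log10(p_surface / p_free)
print(f"on-axis reflective gain = {gain_db:.2f} dB")   # ~6.02 dB
```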

The second effect is due to the capacity of the head to diffract the sound wave. When the wavelength is of the same order as the interaural distance, only small diffractive effects are produced. However, at relatively shorter wavelengths, the head acts as an increasingly effective obstacle and produces reflective and diffractive perturbations of the sound field. Thus, for an object of fixed size such as the head, the distribution of sound pressure around the object will depend on the incident angle and the frequency of the plane sound wave.


c This is also referred to as the interaural intensity difference (IID); however, this is an inappropriate usage of the term. The differences so measured are the differences in the pressure of the sound at each ear, not in the average power flux per unit area (intensity). Much of the early literature uses the term IID although it is used in a way which is (incorrectly) synonymous with ILD.


The distribution of pressure about a hard sphere was first described by Lord Rayleigh and further developed by Stewart around the turn of the century.10,22 Figure 2.4 shows the changes in the gain in sound pressure level (SPL), relative to SPL in the absence of the sphere, calculated as a function of frequency and the angle of incidence of a plane sound wave.10 Note the asymptotic increase to 6 dB for waves at normal incidence due to the reflective gain. In contrast to the simple characterization of the head as an acoustic obstacle which produces the largest interaural intensity differences for sound located on the interaural axis, the Rayleigh-Stewart model predicts that the largest interaural difference will occur for sounds located around ±45° and ±135°. This is due to the nature of the diffractive interactions of the sound traveling around the sphere from different directions and their interaction around the axis of the far ear.

Fig. 2.4. Calculated transformations of the sound pressure level from the free field to a point receiver located on a hard sphere (radius a) as a function of the acoustic wave number (2πa/λ) and the angle of incidence of a plane sound wave. The location of the receiver is indicated by the filled circle on the sphere and all azimuths (θ) are referenced to the median plane. On the lower x-axis the corresponding frequencies for the human are included (head radius = 8.75 cm). For sound sources located in the ipsilateral field, the SPL increases as a function of the frequency to an asymptote of 6 dB. In a slightly less symmetrical model the higher order changes in the SPL exhibited for sources in the contralateral field (θ = -45° and -135° for ka = 2.5; 4; 6; 10....) are likely to be smoother. Reprinted with permission from Shaw EAG. In: Keidel WD, Neff WD, ed. Handbook of Sensory Physiology. Berlin: Springer-Verlag, 1974:455-490.

The effects on the sound pressure at the surface of the sphere produced by the distance of the source from the center of the sphere are likely to be significant for distances of up to 10a, particularly for low frequencies.13 This effect is due mainly to the spherical nature of the sound wave in the vicinity of a point source. It should also be kept in mind that the ears of most mammals are not separated by an arc of 180°; for instance the ears of Homo sapiens are separated by an arc of about 165° measured around the front of the head. This arrangement will produce slight differences between the pressure transforms for sounds located anterior and posterior of the ears.13

In summary, the head acts as an effective acoustic obstacle which reflects and diffracts the sound field for sounds whose wavelengths are small relative to the dimensions of the head. This results in a frequency dependent transformation of the sound pressure from the free field to the acoustic receivers located on either side of the head. Differences in the sound pressure at each ear are related to the location of the sound in the free field. These interaural level differences are most significant for high frequencies.

1.4. TRANSFER FUNCTION OF THE AUDITORY PERIPHERY: SOME METHODOLOGICAL ISSUES

1.4.1. Measurement techniques

In the first chapter it was discussed in broad terms how the outer ear filters sound, resulting in the so-called ‘spectral’ cues to a sound’s location. A large proportion of the rest of this chapter will review what is known about these outer ear filter functions. This issue is of considerable importance, as these filter functions are the ‘raw material’ of VAS displays. In chapter 4 there is a more detailed review of the methodology of measuring these filter functions and validating the measurements.

Directionally-dependent changes in the frequency spectra at the eardrum have been studied in humans using probe microphone measurements,10,23-31 by miniature microphones placed in the ear canal11,12,32-34,36-38 and by minimum audible field measurements39 (see also ref. 40). Before proceeding, a number of general comments regarding these methods should be made. What characterizes these studies is the wide range of interstudy and intersubject variation (for review see for instance Shaw23,41). These variations are probably due to a variety of factors. Firstly, intersubject variation in pinna dimensions might be expected to produce intersubject variation in the measured transforms. To some extent this can be accounted for by structural averaging,42 where the transforms are normalized along the frequency axis so the major components of different transforms from different subjects coincide, thus preserving the fine structure of the transformation. Indeed, where averaging of the transfer functions has been done without structural averaging, the functions are much shallower and smoother.10,23

The other major source of variation results from differences in the measurement procedures. Probably the most important consideration relates to the specification of the point within the outer ear at which the transfer function from the free field should be measured. This issue is covered in more detail in chapter 4 (section 2.1) and so is only briefly considered here. Over the frequency range of interest the wave motion within the canal is principally planar24 (see also Rabbitt43). In this case all of the directional dependencies of the transfer function could be recorded at any position within the canal.11,44 However, the eardrum also reflects a proportion of the incoming sound energy back into the canal, which results in standing waves within the canal.45,46

The incoming sound interacts with the reflected sound to produce a distribution of pressure peaks and nulls along the canal that varies as a function of frequency. Simply put, a 180° difference in the phase of the incoming and outgoing waves is associated with a pressure null and, at positions of 0° phase difference, a pressure peak. This results principally in a notch in the measured transfer function which moves up in frequency as the position of a probe microphone gets closer to the eardrum.45,47 Therefore, to capture the HRTF faithfully it is necessary to get as close to the eardrum as possible so that the notch resulting from the pressure null is above the frequency range of interest. HRTFs measured at more distal locations in the canal will be influenced by the standing waves at the lower frequency range, so that differences in the measurement positions between studies could then be expected to produce differences in the HRTFs reported.
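A rough feel for why probe position matters: if the eardrum is treated as a simple rigid reflector (a strong simplification; the real eardrum impedance is frequency dependent, so this is only an order-of-magnitude sketch), the first pressure null sits about a quarter wavelength in front of it, putting the notch near c/(4d) for a probe a distance d from the drum:

```python
C = 340.0  # speed of sound in air, m/s

def first_notch_freq(dist_from_eardrum_m):
    """Crude standing-wave sketch: treating the eardrum as a rigid reflector,
    the first pressure null lies a quarter wavelength in front of it, so the
    measured transfer function shows a notch near f = c / (4 * d)."""
    return C / (4.0 * dist_from_eardrum_m)

# The notch moves up in frequency as the probe approaches the eardrum:
for d_mm in (20.0, 10.0, 5.0, 2.0):
    f_khz = first_notch_freq(d_mm / 1000.0) / 1000.0
    print(f"probe {d_mm:4.1f} mm from eardrum -> notch near {f_khz:5.1f} kHz")
```

Even this crude model shows the practical point: a probe a few millimeters from the eardrum pushes the notch well above the audio frequency range of interest, while one 20 mm away places it squarely within it.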

Other influences on the measurements result from the perturbation of the sound field by the measurement instruments themselves. Although their dimensions are small (miniature microphone diameter = 5 mm, probe tube diameter = 1 to 2 mm) these instruments have the capacity to perturb the sound field, particularly at higher frequencies. These effects are related to the extent of the obstruction of the canal and their cross-sectional location within the canal.43 The insertion of miniature microphones into the meatus could also change the effective length of the ear canal, which could vary the quarter wavelength closed tube resonance of the auditory canal as well as affect the transverse mode of the concha (section 1.8.1). Blocking the canal to some extent is also likely to vary the input impedance of the ear but this is


unlikely to affect the directional responses of the human ear to frequencies below 10 kHz. However, variation in impedance may well affect the relative magnitude of the spectral components of a frequency transform for any one stimulus position.10,48,49

Much intersubject variation and variations between studies may also be due to the lack of a set of criteria for establishing reliable morphological axes to act as references for sound source position. The HRTFs are a strong function of location and even small variations in the relative position of subjects' heads with respect to the stimulus coordinates, both within and across studies, are likely to lead to fairly large apparent variations in the HRTF across the population of subjects. One recent approach has been to use a perceptual standard for defining these axes: that is, simply requesting a subject to point their nose towards a particular visual and/or auditory target and to hold their head upright can result in a highly reproducible head posture within individuals (Carlile and Hyams, unpublished observations). Presumably this reflects the precision with which we perceptually align ourselves with the gravitational plane on the one hand and our sense of "directly in front" on the other.

1.4.2. Coordinate system

Specifying the precise location of a stimulus requires the adoption of a coordinate system that at least describes the two dimensional space about the subject. The most common form of coordinate system is a single pole system, the same system that is used for specifying a location on the surface of the planet. With the head centered in an imaginary sphere, the azimuth location is specified by the lines of latitude, where directly ahead is usually taken to be azimuth 0° with locations to the right of the anterior midline as negative. The elevation is specified by lines of longitude with the audio-visual horizon as the 0° reference and inferior elevations being negative (Fig. 2.5a). The biggest advantage of such a system is that it is the most intuitive or, at least, the system with which people are most familiar. One of the disadvantages is that the radial distance specified by a particular number of degrees azimuth varies as a function of the elevation. For instance, on the great circle (elevation 0°) the length of the arc specified per degree is greatest, and it becomes progressively shorter as one approaches the poles. This becomes a problem when, for instance, sampling the HRTFs at equidistant points in space: specifying a sampling distance of 5° will result in a massive oversampling of the areas of space associated with the poles. However, simple trigonometry allows the appropriate corrections to be made for equal area sampling.
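The correction can be sketched as follows: the arc length per degree of azimuth shrinks with cos(elevation), so widening the azimuth step by 1/cos(elevation) keeps sample points roughly equidistant on the sphere (a minimal sketch; the clamping near the poles is an assumption of this implementation):

```python
import math

# Sketch: azimuth step needed at each elevation so that sample points
# are roughly equidistant on the sphere (assumption: widen the base
# step by 1/cos(elevation), clamped at the poles).

def azimuth_step_deg(base_step_deg: float, elevation_deg: float) -> float:
    """Azimuth increment giving ~constant arc length at this elevation."""
    cos_el = math.cos(math.radians(elevation_deg))
    if cos_el < 1e-6:        # at the pole a single sample suffices
        return 360.0
    return min(360.0, base_step_deg / cos_el)

for el in (0, 30, 60, 80):
    print(f"elevation {el:2d} deg -> azimuth step "
          f"{azimuth_step_deg(5.0, el):.1f} deg")
```

With a 5° base step, the step doubles to 10° at 60° elevation and reaches nearly 29° at 80°, which is where the naive constant-step scheme oversamples most severely.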

A second coordinate system that has been occasionally used is the double pole system. This specifies elevation in the same way as the single pole system but specifies azimuth as a series of rings parallel to the midline and centered on the poles at each end of the interaural axis (Fig. 2.5b).


This system was first introduced with the development of a free field auditory stimulus system that was built around a semicircular hoop. The hoop was attached to supports and motors that were located on the interaural axis so that rotating the hoop caused it to move over the top of the animal (see Knudsen et al50). A speaker could be moved along the hoop so that when the hoop was rotated the speaker moved through an arc described by the azimuth rings of the double pole system. The principal advantage of such a system is that the azimuth arc length is constant as a function of elevation. This is of course important when comparing localization errors across a number of elevations. A second advantage of this system is that each azimuth specifies the cone of confusion for particular interaural time differences and thus simplifies a number of modeling and computational issues.

The principal disadvantage of the double pole system is that it is not very intuitive and, from a perceptual point of view, does not map well to localization behavior. For instance, if a sound was presented at an elevation of 55° above the audio-visual horizon on the left interaural axis, this would be classified as azimuth 90°/elevation 55° in a single pole system. To turn to face such a source the subject would rotate counter-clockwise through 90°. However, in a double pole system this location would be specified as 40° azimuth and 55° elevation, and it seems counter-intuitive to say that the subject would still have to turn 90° to the left to face this source. Regardless of these difficulties, we should be careful to scrutinize the coordinate system referred to when

Fig. 2.5. Two different coordinate systems used commonly to represent the location of a sound in space; (a) single pole system of latitude and longitude, (b) double pole system of colatitude and colongitude. Modified with permission from Middlebrooks JC et al, J Acoust Soc Am 1989; 86:89-108.


we consider generalized statements about localization processing. For instance, it has been claimed that monaural cues play little to no role in azimuth localization processing but are important in determining the elevation of the source.51,52 This statement was made in the context of a double pole system and is well supported by the data. Should this be misinterpreted in the context of a single pole system, the statement is far more contentious and in fact would contradict those authors' own data.
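For readers comparing data across the two conventions, the conversion from single-pole coordinates to the double-pole lateral angle can be sketched as below (assuming azimuth 0° straight ahead, elevation 0° on the audio-visual horizon, and the lateral angle measured from the median plane; sign conventions and roundings differ between studies):

```python
import math

# Sketch: single-pole (azimuth, elevation) -> double-pole lateral angle,
# i.e. the "azimuth ring" about the interaural axis. Assumption: the
# lateral angle is the angle between the direction vector and the
# median plane.

def lateral_angle_deg(azimuth_deg: float, elevation_deg: float) -> float:
    az, el = map(math.radians, (azimuth_deg, elevation_deg))
    # Component of the unit direction vector along the interaural axis:
    return math.degrees(math.asin(math.cos(el) * math.sin(az)))

# All locations sharing a lateral angle lie on one cone of confusion and
# share (to a first approximation) the same interaural time difference.
print(lateral_angle_deg(90.0, 0.0))   # on the interaural axis: 90 deg
print(lateral_angle_deg(30.0, 0.0))   # on the horizon the systems agree
```

On the audio-visual horizon the two systems coincide; away from it the lateral angle shrinks relative to the single-pole azimuth, which is the behavior that makes double-pole azimuth constant along each cone of confusion.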

To some extent both the single and the double pole systems map imperfectly to the perceptual dimensions of auditory space. When we speak of a location in the vernacular we refer to the azimuth (or some analogous term), but elevation is better mapped onto the notion of the height of the source. That is, we live more in a perceptual cylinder than at the center of a sphere; or, more correctly, we tend to stand at the bottom of a perceptual cylinder. Notwithstanding this difficulty, the most convenient and most common form of design for a free field stimulus system is based on a semicircular hoop, which describes a sphere. This therefore requires some form of spherical descriptor. As the single pole system is by far the more intuitive, we have chosen to use this system to describe spatial location.

1.5. HRTF MEASUREMENTS USING PROBE TUBES

Pure tones25,41 and the Fourier analysis of impulse responses29,30,31,42,53 have been used to study the spectral transformations of sounds in both the horizontal and the vertical planes (see Shaw10,23,24 for extensive reviews of the early literature, and chapter 3 section 5 for signal processing methodology). We have recently recorded the HRTFs for each ear for around 350 locations in space and systematically examined the changes in the HRTF for variations in azimuth and elevation on the anterior midline and the interaural axis.31 These recordings were obtained using probe tube microphones placed within 6 mm of the eardrum, using an acoustic technique to ensure placement accuracy. Using a dummy head equipped with an internal microphone we also calibrated the acoustic perturbations of our recording system27 (chapter 4).

1.5.1. Variation in the HRTF with azimuth

The horizon transfer function determined for the left ear of one subject is shown in Figure 2.6. This has been calculated by collecting together the HRTFs recorded at 10° intervals along the audio-visual horizon. The amplitude spectrum of each HRTF was determined and a continuous surface was obtained by interpolating between measurements (see Carlile31,44,54 for a full description of the method). This surface is plotted as a contour plot of azimuth location versus frequency, with the gain at each frequency-location conjunction indicated by the color of the contour. This horizon transfer function is representative of the large number of ears that we have so far measured.


The most prominent spectral features of the horizon transfer function vary asymmetrically about the interaural axis (-90° azimuth). For instance, anterior sound locations result in a gain of greater than 15 dB in transmission for frequencies 3 kHz to 5 kHz, but this is reduced by 10 dB to 15 dB over the same frequency range for posterior locations. For this subject, a second high gain feature was evident at high frequencies for anterior but not posterior locations. This is associated with a very sharp notch between 8 kHz and 10 kHz for anterior locations which is absent at posterior locations. The gain of the high frequency feature is highly variable between subjects but the associated notch is generally evident in all subjects. For frequencies below 2.5 kHz, the gain varied from around 0 dB at the anterior midline to around 6 dB for locations around the interaural axis. This small gain and the location dependent changes are well predicted by the obstacle effect at these frequencies (c.f. Fig. 2.4).

These findings are fairly typical of the results of previous studies in which the effects of the auditory canal have been accounted for.10,29,42

Fig. 2.6. The horizon transfer function shows how the HRTF varies as a function of location for a particular horizon. In this case, the HRTFs have been plotted and interpolated for locations on the ipsilateral audio-visual horizon: 0° indicates the anterior median plane and 90° the ipsilateral interaural axis. Frequency is plotted using a logarithmic scale and the gain of the transfer function in dB is indicated by the color of the contour. Reprinted with permission from Carlile S and Pralong D, J Acoust Soc Am 1994; 95:3445-3459.


The differences between these studies fall within the range of the intersubject differences we have observed in our own recordings, with the exception of the generally lower transmission of higher frequencies and a deeper notch around 10 kHz to 12 kHz. This has also been reported by Wightman and Kistler,29 who have recorded the HRTFs using similar techniques to those in our laboratory.

1.5.2. Variation in the HRTF with elevation

The variation in the HRTFs due to the elevation of the sound source is shown in Figure 2.7 for the frontal median plane and the vertical plane containing the interaural axis. These plots have been generated in the same way as the horizon transfer functions, with the exception that the elevation of the sound source is plotted on the Y-axis. These plots are referred to as the meridian transfer functions for various vertical planes.31 For the anterior median plane, the meridian transfer functions show a bandpass gain of around 20 dB for frequencies between 2 kHz and 5 kHz. For the lateral vertical plane, this bandpass extends from about 1 kHz to around 7 kHz. In both cases, there is a narrowing of the bandwidth at the extremes of elevation. Additionally there is an upward shift in the high frequency roll-off with an increase in elevation from -45° to above the audio-visual horizon. This is manifest as an increase in the frequency of the notch in the HRTFs from 5 kHz to 8 kHz with increasing elevation for the lateral meridian transfer function.

These findings are in good agreement with previously published studies that have measured the transfer functions from spatial locations similar to those shown in Figure 2.7. In particular, the increase in the frequency of the high frequency roll-off with increasing elevation on the median plane has been previously reported23,24,55 (see also Hebrank and Wright36) and is evident in the averaged data of Mehrgardt and Mellert.42 Compared to the asymmetrical changes in the horizon transfer functions (Fig. 2.6), there are relatively smaller asymmetries in the meridian transfer functions for locations above and below the audio-visual horizon (Fig. 2.7).

Several authors have examined the transfer functions for sounds located on the median plane.10,33,36,39,42 As with measurements of horizontal plane transformations, the intersubject variations are quite large. However, the most consistent feature is a narrow (1/3 octave) peak at 8 kHz for sounds located over the head (90° elevation). This is evident in miniature microphone recordings,33 in recordings taken from model ears,36 as well as in probe microphone recordings.42 This is consistent with the psychophysical data for median plane localization, where an acoustic image occurs over the head for narrow band noise centered at 8 kHz36 (see also Blauert40). Furthermore, frontal cues may be provided by a lowpass notch that moves from 4 kHz to 10 kHz as elevation increases from below the horizon to around 60°.10,33,36,39,42


Fig. 2.7. The meridian transfer functions are shown for a variation in the elevation of the source located on (a) the anterior midline or (b) the ipsilateral interaural axis. The audio-visual horizon corresponds to 0° elevation. Other details as for Fig. 2.6. Reprinted with permission from Carlile S and Pralong D, J Acoust Soc Am 1994; 95:3445-3459.


The spectral features responsible for "back" localization seem more complicated. Mehrgardt and Mellert42 show peaks for frequencies between 1 kHz and 1.5 kHz. Hebrank and Wright36 demonstrate a lowpass cut off for frequencies above 13 kHz for sounds located behind the head and reported psychophysical data showing that signals with a peak around 12 kHz to 13 kHz tend to be localized rear-ward. Blauert40

reports that the percept of back localization by narrow band noise stimuli can be produced with either 1 kHz or 10 kHz center frequencies. These studies suggest that rear-ward localization may be due to a high frequency (> 13 kHz) and/or a low frequency (< 1.5 kHz) peak in the median plane transformation.

1.6. CONTRIBUTION OF DIFFERENT COMPONENTS OF THE AUDITORY PERIPHERY TO THE HRTF

In considering the spectral transfer functions recorded at either end of the ear canal, it is important to keep in mind that structures other than the pinna will contribute to these functions.10,56 Figure 2.8 shows the relative contribution of various components of the auditory periphery calculated for a sound located at 45° azimuth. These measures are very much a first approximation calculated by Shaw,10 but serve to illustrate the point that the characteristics of the HRTF are dependent on a number of different physical structures.

The gain due to the head, calculated from the Rayleigh-Stewart description of the sound pressure distribution around a sphere,10,21,22 increases with increasing frequency to an asymptote of 6 dB. The rate of this increase, as a function of frequency, is determined by the radius of the sphere. In humans this corresponds to a radius of 8.75 cm and the midpoint to asymptote occurs at 630 Hz (see Fig. 2.4). The contribution of the torso and neck is small and restricted primarily to low frequencies. These pressure changes probably result from the interactions of the scattered sound waves at the ear and are effective primarily for low frequencies. The contribution of the pinna flap is small at 45° azimuth but probably exerts a greater influence on the resulting total for sounds presented behind the interaural axis48 (see also section 1.7).
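The frequency scale of this rise can be sketched from the head radius alone if one takes the midpoint of the rise to sit near ka = 1, i.e. f0 = c/(2πa), which lands close to the 630 Hz quoted above. The ka = 1 midpoint is an assumption of this sketch, not an exact evaluation of the Rayleigh-Stewart series:

```python
import math

# Sketch: characteristic frequency of the spherical-head "obstacle
# effect". Assumption: the midpoint of the rise toward the 6 dB
# asymptote occurs near the normalised frequency k * a = 1, i.e.
# f0 = c / (2 * pi * a).

SPEED_OF_SOUND = 343.0  # m/s

def head_transition_hz(radius_m: float) -> float:
    """Frequency at which the sphere's radius equals 1/k (ka = 1)."""
    return SPEED_OF_SOUND / (2.0 * math.pi * radius_m)

# For the 8.75 cm radius quoted for the human head:
print(f"{head_transition_hz(0.0875):.0f} Hz")  # close to the cited 630 Hz
```

A larger sphere shifts the rise down in frequency, which is why the gain due to the head appears at lower frequencies in larger-headed species.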

The largest contributions are attributable to the concha and the ear canal/eardrum complex. An important feature of these contributions is the complementarity of the conchal and ear canal components, which act together to produce a substantial gain over a broad range of frequencies. However, an important distinction between the two is that the contribution of the ear canal is insensitive to the location of the stimulus, while the gain due to the concha and the pinna flange is clearly dependent on stimulus direction.10,24,48,57 That is to say, the HRTF is clearly composed of both location-dependent and location-independent components.


1.7. MODELS OF PINNA FUNCTION

There are three main functional models of pinna function. The pinna is a structure convoluted in three dimensions (see Fig. 1.4) and all theories of pinna function refer, in some way, to the interactions of sound waves, either within restricted cavities of the pinna, or as a result of the reflections or distortions of the sound field by the pinna or the pinna flap. These models and other numerical models of the filtering effects of the outer ear are also considered in chapter 6.

1.7.1. A resonator model of pinna function

The frequency transformations of the pinna have been attributed to the filtering of the sound by a directionally-dependent multi-modal resonator.10,24,48,57-59 This model has been realized using two similar analytical techniques. Firstly, precise probe tube measurements have been made of the sound pressures generated in different portions of life-like models of the human pinna57 and real human pinnae. Between five and seven basic modes have been described as contributing to the frequency transfer function of the outer ear. The first mode (M1), at around 2.9 kHz, is attributed to the resonance of the ear canal.

Fig. 2.8. Relative contributions of the different components of the human auditory periphery calculated by Shaw (1974). The source is located at 45° from the median plane. At this location the transformation is clearly dominated by the gains due to the concha and the ear canal. An important distinction between these components is that the gain due to the concha is highly dependent on the location of the source in space, while the gain of the canal remains unaffected by location. Reprinted with permission from Shaw EAG, in: Keidel WD, Neff WD, ed. Handbook of Sensory Physiology. Berlin: Springer-Verlag, 1974:455-490.


The canal can be modeled as a simple tube which is closed at one end. An "end correction" of 50% of the actual length of the canal is necessary in matching the predicted resonance with the measured frequency response. This correction is attributed to the tight folding of the tragus and the crus helicas around the opening of the canal entrance57 (see Fig. 1.4).

The second resonant mode (M2), centered around 4.3 kHz, is attributed to the quarter wavelength depth resonance of the concha. Again, the match between the predicted and the measured values requires an "end correction" of 50%. The large opening of the concha and the uniform pressure across the opening suggest that the concha is acting as a "reservoir of acoustic energy" which acts to maintain a high eardrum pressure across a wide bandwidth.57
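The quarter-wavelength arithmetic with a 50% end correction can be sketched as below. The physical lengths used are illustrative assumptions chosen to land near the reported modes, not measured values:

```python
# Sketch: quarter-wavelength resonance of a tube closed at one end,
# with the 50% "end correction" described in the text (effective
# length = 1.5 x physical length). The canal and concha lengths below
# are illustrative assumptions, not measurements.

SPEED_OF_SOUND = 343.0  # m/s

def quarter_wave_hz(physical_length_m: float,
                    end_correction: float = 0.5) -> float:
    """Fundamental f = c / (4 * L_eff) for a closed tube."""
    effective = physical_length_m * (1.0 + end_correction)
    return SPEED_OF_SOUND / (4.0 * effective)

print(f"canal  (~19.7 mm): {quarter_wave_hz(0.0197) / 1000:.1f} kHz")  # ~M1
print(f"concha (~13.3 mm): {quarter_wave_hz(0.0133) / 1000:.1f} kHz")  # ~M2
```

With these lengths the corrected fundamentals fall near the reported 2.9 kHz (M1) and 4.3 kHz (M2) modes; without the 50% correction the same lengths would predict resonances roughly 50% higher, which is the mismatch the correction was introduced to absorb.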

For frequencies above 7 kHz, precise probe tube measurements using the human ear model suggest that transverse wave motion within the concha begins to dominate.24 The higher frequency modes (7.1 kHz, 9.6 kHz, 12.1 kHz, 14.4 kHz and 16.7 kHz) result from complex multi-pole distributions of pressure produced by this transverse wave motion within the concha. An important result from these measurements is that the gain of the higher frequency modes was found to be dependent on the incident angle of the plane sound wave.

The second approach was to construct simple acoustic models of the ear and make precise measurements of the pressure distributions within these models.24,48,49,58 By progressively adding components to the models, the functions of analogous morphological components of the human external ear could be inferred (Fig. 2.9). The pinna flange was found to play an important role in producing location-dependent changes in the gain of the lower conchal modes (3 kHz to 9 kHz).48

The model flange represents the helix, anti-helix and lobule of the human pinna (Fig. 2.9). There is an improved coupling of the first conchal mode to the sound field for sound in front of the ear, but this gain is greatly reduced when the sound source is toward the rear. This has been attributed to the interference between the direct and scattered waves from the edge of the pinna.49

By varying the shape of the conchal component of the models, the match between the frequency transforms for higher frequencies measured from the simple models and those measured from life-like models of human ears can be improved. The fossa of the helix and the crus helicas both seem to be very important in producing the directional changes for the higher modes.24,49,58

In summary, while the primary features of the transformations seem to be adequately accounted for by the acoustic models, the agreement between the theoretical predictions of the modal frequencies and the measured modes is not as good. The size of the ad hoc "end corrections" for the predictions based on simple tube resonance suggests that while simple tube resonance models may provide a reasonable first


Fig. 2.9. Pressure transformation relative to the SPL at the reflecting plane for three simple acoustic models of the outer ear. The variations in the gain of the model for variation in the elevation of the progressive wave source are indicated in each panel (0° indicates a location in front of the ear and 90° above the ear). The dimensions of the models are in mm and the small filled circle in each model illustrates the position of the recording microphone. Panels A and B indicate the blocked meatus response of the models with increasingly complex models of the concha, while panel C shows the response of the system with a tubular canal and an approximation to the normal terminating impedance of the ear drum. Note the large gain at around 2.6 kHz that appears with the introduction of the canal component of the model. Reprinted with permission from Shaw EAG. In: Studebaker GA, Hochberg I. Acoustical Factors Affecting Hearing Aid Performance. Baltimore: University Park Press, 1980:109-125.


approximation, a more accurate description may require more sophisticated modeling techniques.

1.7.2. Sound reflections as a basis of pinna function

In 1967, Dwight Batteau60 suggested that the pinna may transform the incoming signal by adding a time-delayed copy of the signal to the direct signal. Sound from different locations in space could be reflected off different features of the pinna so that the magnitude of the time delay between the direct and reflected waves would be characteristic of the location of the sound source. The identity of the function providing the inverse transformation of the composite signal would provide the information as to the source location.

Using a model of the human pinna, a monotonic change in the reflected delay of between 0 and 100 µs was produced by varying the location of a sound on the ipsilateral azimuth. Sounds located close to the midline, where the greatest spatial acuity has been demonstrated, produced the largest delays, and those located behind the interaural axis produced the smallest. Changing the elevation of the sound source produced a systematic variation in the delay of between 100 µs and 300 µs.60 Hiranaka and Yamasaki61 have examined these properties using real human ears. Using miniature microphones they recorded the response of the ear to a very short transient produced by an electric spark. These data, collected using a very large number of stimulus locations, clearly demonstrate systematic changes in the number of reflected components and in the time delays between the reflected components as a function of the stimulus location. Stimuli located in front of the subjects produced two or more reflected components, while "rear-ward" locations produced only one, and "above" locations produced no reflected components. The systematic changes in the delays all occurred in the first 350 µs from the arrival of the direct signal. Furthermore, the amplitude of the reflected components was similar to that of the direct components.

An obvious objection to a localization mechanism based on reflected delays is that the time differences involved in the delayed signals are very small relative to the neural events which would presumably be involved in determining the identity of the inverse transformation. However, Wright et al62 demonstrated that when the amplitude ratio between a direct and a time-added component exceeds 0.67, human subjects can easily discriminate 20 µs delays using white noise stimuli presented over headphones. These authors point out that the final spectrum of a spectrally dense signal will depend on the interval between the initial and time-added components. The interaction of different frequency components will depend on the phase relationship between the direct and reflected waves, which is, of course, dependent on the time interval between the direct and reflected components. Such a mechanism would be expected to produce a "comb filtering" of the


input signal and produce sharp peaks and nulls in the frequency transformation at frequencies where reinforcing and canceling phase interactions have occurred.62-65 Thus reflected echoes are likely to produce changes in the input signal that can be analyzed in the frequency domain rather than the time domain as suggested by Batteau.60
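The comb-filter arithmetic can be sketched directly: adding a copy of the signal with amplitude ratio a and delay tau gives |H(f)| = |1 + a·exp(−i2πf·tau)|, with nulls at odd multiples of 1/(2·tau) and peaks at multiples of 1/tau. The delay and ratio values below are illustrative:

```python
import math

# Sketch: comb filtering from adding a reflected copy (amplitude ratio
# a, delay tau) to the direct sound: |H(f)| = |1 + a*exp(-i*2*pi*f*tau)|

def comb_gain_db(freq_hz: float, delay_s: float, ratio: float) -> float:
    """Magnitude response in dB of direct sound plus one reflection."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    re = 1.0 + ratio * math.cos(phase)
    im = -ratio * math.sin(phase)
    return 20.0 * math.log10(math.hypot(re, im))

def first_null_hz(delay_s: float) -> float:
    """Lowest frequency at which direct and reflected waves cancel."""
    return 1.0 / (2.0 * delay_s)

# A 100 microsecond reflection (the upper end of Batteau's azimuth
# delays) places the first spectral null at 5 kHz:
tau = 100e-6
print(f"first null: {first_null_hz(tau) / 1000:.0f} kHz")
print(f"gain at null  (a = 0.9): {comb_gain_db(first_null_hz(tau), tau, 0.9):.1f} dB")
print(f"gain at peak  (a = 0.9): {comb_gain_db(1.0 / tau, tau, 0.9):.1f} dB")
```

This illustrates why delays of tens to hundreds of microseconds, far too brief for direct neural timing analysis, still leave a pronounced and location-dependent spectral signature.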

The perceived location of monaurally presented noise stimuli changes as a function of the magnitude of the delay between the direct and time-added components. Variations in the apparent elevation of the stimulus occur for delays of between 160 µs and 260 µs,63 while systematic variations in the apparent azimuth are produced by 0 to 100 µs changes in the delay.64 Watkins63 derives a quantitative model for vertical localization of sound based on spectral pattern recognition as described by an auto-correlation procedure. The quantitative predictions of median plane localization of low-, high- and band-pass noise are in reasonable agreement with the psychophysical results of Hebrank and Wright.36

There are two main criticisms of the "time delay" models of pinna function. As was pointed out above, analysis in the time domain seems unlikely because of the very small time intervals produced. The acoustic basis of the frequency transformations produced by time-added delays has also been criticized. Invoking an observation by Mach, Shaw58 has argued that the wavelengths of the frequencies of interest are too large to be differentially reflected by the folds of the pinna. Therefore, the variations in the frequency transfer functions as a function of source location may not be due to phase interactive effects of the direct and reflected waves. Furthermore, the psychophysical discriminations of time-added delay stimuli may simply be based on the spectral changes necessarily produced by the analog addition of the two electrical signals; the implication of an underlying acoustic analogy in the ear is simply a theoretical expectation.

1.7.3. A diffractive model of pinna function

The third model of pinna function arises from work examining the frequency dependence of the directional amplification of the pinna of nonhuman mammals (cat,66 wallaby,67 guinea pig,68 bat,69 ferret44). There is an increase in the directionality of the ear as a function of stimulus frequency. This has been attributed to the diffractive effects of the aperture of the pinna flap and has been modeled using an optical analogy. For the cat, bat and wallaby there is a good fit between the predicted directionality of a circular aperture, which approximates the pinna aperture, and the directional data obtained from pressures measured within the canal or estimated from cochlear microphonic recordings. For the ferret44 and the guinea pig68 the model provides a good fit to the data only at higher frequencies. These differences undoubtedly reflect the different morphologies of the outer ears of these two groups of animals: the mobile pinnae of the cat, bat


and wallaby resemble a horn collector, whereas the pinnae of the guinea pig and ferret more closely resemble the outer ear of the human, where the pinna flange is a less dominant structure.
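The optical analogy can be sketched via the first off-axis null of a circular aperture, which falls where ka·sin(theta) equals the first zero of the Bessel function J1 (about 3.8317). The 3 cm aperture radius below is an illustrative assumption, not a measured pinna dimension:

```python
import math

# Sketch: the "optical" diffraction model of the pinna flap treated as
# a circular aperture of radius a. The first off-axis null of the
# aperture's directivity satisfies k * a * sin(theta) = 3.8317 (first
# zero of J1), so the beam narrows as frequency rises.

SPEED_OF_SOUND = 343.0   # m/s
FIRST_J1_ZERO = 3.8317

def first_null_angle_deg(freq_hz: float, aperture_radius_m: float):
    """Angle of the first directivity null, or None if the aperture is
    small relative to the wavelength (weakly directional regime)."""
    ka = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND * aperture_radius_m
    if ka <= FIRST_J1_ZERO:
        return None
    return math.degrees(math.asin(FIRST_J1_ZERO / ka))

# Illustrative 3 cm aperture radius:
for f_khz in (2, 8, 16):
    angle = first_null_angle_deg(f_khz * 1000.0, 0.03)
    label = ("no null (nearly omnidirectional)" if angle is None
             else f"first null at {angle:.0f} deg off axis")
    print(f"{f_khz:2d} kHz: {label}")
```

The null angle shrinks with rising frequency, reproducing the observation that directionality increases with stimulus frequency; at low frequencies the aperture is small relative to the wavelength and the model predicts almost no directionality, consistent with the model fitting the ferret and guinea pig data only at higher frequencies.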

As an exclusive model of pinna function, the optical analogy of the diffractive effect does not lend itself easily to an explanation of the directionally-dependent changes in the frequency transfer functions at the primary conchal resonance for both the ferret and the human pinna. The diffractive effect, which is due to the outer dimensions of the pinna flap, probably acts in conjunction with other acoustic effects produced by the convolutions around the ear canal (see Carlile44 for further discussion).

In summary, the pinna is probably best described as a directionally sensitive multimodal resonator for narrow-band and relatively continuous signals. For transient sounds, the pinna may act as a reflector, where the time delay between the direct and reflected sound is a unique function of the source location. In this case the signal analysis is unlikely to be in the time domain, as was first suggested by Batteau,60 but more likely represents a frequency domain analysis of the resulting comb-filtered input. In animals such as the cat, where the pinna flap is a dominant feature, the monopole directionality of the ear, particularly for high frequencies, is probably best described by the diffractive effects of the pinna flap.
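The comb-filtered input mentioned here is easy to illustrate: adding a single reflection with delay Δ and gain a to the direct sound gives a transfer magnitude |1 + a·e^(−j2πfΔ)|, with notches at odd multiples of 1/(2Δ). A minimal sketch, with an illustrative delay and reflection gain:

```python
import cmath
import math

def comb_filter_magnitude(freq_hz, delay_s, reflect_gain=0.5):
    # Direct sound plus one delayed reflection: |1 + a * e^(-j 2 pi f delay)|.
    # Notches fall at odd multiples of 1/(2*delay), peaks at multiples of 1/delay.
    return abs(1.0 + reflect_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))
```

With a 100 µs reflection delay the first notch sits at 5 kHz and the first peak at 10 kHz; because the delay depends on source location, the notch positions shift with location, which is the frequency-domain cue referred to above.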

1.8. JUDGMENTS OF THE DISTANCE OF A SOUND SOURCE

1.8.1. Psychophysical Performance

Human subjects are relatively poor at judging the distance of the sound source under controlled conditions70 (also see Blauert40 for an excellent discussion). As was discussed above, the externalization of a sound heard over headphones could be considered a component of the egocentric distance of the source. However, it is likely that with headphone listening, the rather unnatural perception of a sound within the head probably results from the failure to provide the auditory system with an adequate set of cues for the generation of an externalized sound source. More conventionally, the perception of the distance of a sound source has been found to be related to the familiarity of the subject with the source of the sound. Gardner71 found that distance estimates were excellent over a 1 m to 9 m range for normal speech but that whispering and shouting led to underestimates and overestimates of distance respectively. Haustein (reported in Blauert40) found that good estimates of distance over much the same range could be obtained for clicks if the subject was pre-exposed to the stimuli over the test range. With unfamiliar sounds, distance estimates are initially poor, although subjects also seem to be able to decrease the error of their estimates with increased exposure to the sound, even in the absence of feedback.70


1.8.2. Acoustic cues for the judgment of distance

There are a number of potential physical cues to source distance that the auditory system could utilize (see Coleman72 for review). The simplest cue is stimulus level: if the source is located in an anechoic environment, then as the distance of the source is increased, the pressure level decreases by 6 dB for each doubling of the distance. If the source level is known, or the distance between the receiver and a level-invariant source is changed, then pressure level may act as a distance cue. Indeed, it has been shown that the estimated distance of the source of speech is related to the presentation level for distances up to 9 m,71,73 although the situation seems to be more complex for impulsive sounds40 and sounds presented in virtual auditory space.74 Certainly, human listeners are sensitive to relatively small variations in pressure. However, as discussed below, there are also distance effects on the spectrum of the sound, which is in turn an important factor in the perception of loudness. It is unclear how these factors might interact in the perception of sound source distance.
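The 6 dB-per-doubling figure follows directly from spherical spreading of a point source; as a quick numerical check:

```python
import math

def level_change_db(ref_distance_m, new_distance_m):
    # Free-field spherical spreading of a point source: the level change
    # relative to the reference distance is 20 * log10(ref / new) dB.
    return 20.0 * math.log10(ref_distance_m / new_distance_m)
```

Doubling the distance gives about −6.02 dB, and quadrupling it about −12.04 dB, so in anechoic conditions level is a lawful (if ambiguous, without knowledge of source level) distance cue.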

Hirsch has argued that if the azimuth direction of a sound source is known from, say, ITD cues, then ILD cues might be useful in determining distance.75 Molino76 tested Hirsch’s original predictions (slightly modified to account for a number of theoretical problems) but found that, for pure tones at least, subjects seem to be unable to utilize this potential cue. There is some evidence that the rank order of distance estimates of noise sources placed along the interaural axis was better than for other locations relative to the head,77 although absolute distance judgments did not seem to benefit when there were very few stimulus presentations in the test environment.78 This suggests that, under some rather restricted conditions, binaural information may provide some additional cue to source distance.

When the source is close to the head, say less than 2 m to 3 m, the wavefront from a point source will be curved (the radius of curvature being related directly to the distance from the source). As we have seen above, the transfer functions of the outer ears vary as a function of the angle of incidence of the source, so we would expect that variation in the distance of the source over this range would result in variation in the transfer functions. As well as varying the monaural spectral cues as a function of distance, this could also result in variations in the binaural cues: if the sound is located off the midline, then the “apparent angle” of the source will also vary for each ear as a function of distance. Certainly, the distance-related variations in the path lengths from the source to each ear produce variations in the interaural time differences that fall into the perceptually detectable range. In this light, it may be important that normal human verbal communication is generally carried out within the range of distances that are relevant to this discussion. There would also be considerable evolutionary pressure for accurate distance estimation within this range, particularly for predators.
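The distance dependence of the binaural cues for nearby sources can be sketched with a deliberately simple geometry: two point “ears” on the interaural axis and straight-line paths, with no head diffraction. The head radius and source positions below are illustrative.

```python
import math

HEAD_RADIUS = 0.0875    # m; ears idealized as two points 17.5 cm apart
SPEED_OF_SOUND = 343.0  # m/s

def itd_seconds(azimuth_deg, distance_m):
    # Straight-line path difference from a point source to the two point
    # "ears"; positive azimuth places the source toward the right ear.
    az = math.radians(azimuth_deg)
    sx = distance_m * math.sin(az)
    sy = distance_m * math.cos(az)
    d_right = math.hypot(sx - HEAD_RADIUS, sy)
    d_left = math.hypot(sx + HEAD_RADIUS, sy)
    return (d_left - d_right) / SPEED_OF_SOUND
```

At a fixed 45° azimuth, moving the source from 0.3 m to 3 m changes the ITD by several microseconds in this geometry, on the order of the smallest detectable interaural time differences.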


At distances greater than 15 m the attenuation of high frequencies by transmission through the air becomes greater than that for low frequencies. This is likely to be increasingly important for frequencies greater than 2 kHz, and such attenuation is also affected by meteorological conditions.79 In a leafy biosphere this attenuation is likely to be even more marked. This will lead to lowpass filtering of a sound with a roll-off that will vary with the distance from the source. Butler et al80 demonstrated that the apparent distance of sound sources recorded under both echoic and anechoic conditions varies as a function of the relative differences in the high and low frequency components of the spectra played back over headphones. More recent findings have provided evidence that the spectral content can provide a relative distance cue, but this requires repeated exposure to the sound.81 These findings are consistent with the everyday perceptual experience of the low rumbling sounds of distant thunder or a distant train or aircraft.
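As a toy illustration of this distance-dependent lowpass filtering: classical air absorption grows roughly with the square of frequency, so a crude model applies a per-meter attenuation scaling as f². The coefficient below is purely illustrative (real values depend strongly on humidity and temperature; see standard atmospheric-absorption tables), but it captures the qualitative roll-off.

```python
ALPHA_1KHZ_DB_PER_M = 0.005  # illustrative absorption at 1 kHz, dB per meter

def air_absorption_db(freq_hz, distance_m):
    # Toy model: absorption per meter grows ~f^2 (classical absorption),
    # so high frequencies roll off faster with increasing distance.
    alpha = ALPHA_1KHZ_DB_PER_M * (freq_hz / 1000.0) ** 2
    return alpha * distance_m
```

With these illustrative numbers, 100 m of air costs a 4 kHz component 8 dB but a 500 Hz component only a fraction of a decibel, producing the distance-dependent spectral tilt described above.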

The discussion above largely ignores the situation where the listening environment is reverberant, as is certainly the case for most human listening experiences. The pattern of reflected sounds reaching a listener in a reverberant environment will vary as a function of the relative geometry of the environment. In general, the relative sound energy levels of the direct and reverberant sound will vary as a function of distance. At locations close to the listener, the input signal will be dominated by the direct signal, but at increasing distances the relative level of the reverberant contribution will increase. This effect of reverberation on the perception of the distance of the source was first demonstrated by von Bekesy,82 and for many years the recording industry has used the effects of reverberation to increase the perception of both the distance and the spaciousness of a sound.
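The distance dependence of the direct-to-reverberant balance can be sketched with the classic diffuse-field approximation, in which direct intensity falls as 1/r² while the reverberant level is roughly independent of distance. The room constant used below is illustrative.

```python
import math

def direct_to_reverberant_db(distance_m, room_constant_m2, q=1.0):
    # Diffuse-field approximation: direct intensity ~ q / (4 pi r^2);
    # the reverberant term 4 / R does not depend on source distance.
    direct = q / (4.0 * math.pi * distance_m ** 2)
    reverberant = 4.0 / room_constant_m2
    return 10.0 * math.log10(direct / reverberant)
```

Because only the direct term depends on distance, the ratio falls by about 6 dB per doubling of distance, giving a monotonic, distance-dependent quantity of the kind discussed in this section.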

Mershon and King83 make an important distinction between requiring the subjects to make an estimate of the distance and to report the apparent distance of the source; the former case implies that more than apparent distance should be taken into account and may well confound experiments concerned with the metrics of perceptual variation. Likewise, they argue that nonauditory cues may play an important role in distance estimation, particularly where the observer is required to select from a number of potential targets. In a very simple experiment, these authors assessed the relative contributions of sound level and reverberation to the perception of the apparent distance of a white noise source. Subjects estimated the distance of two sources (2.7 m and 5.5 m) presented at one of two sound levels under either anechoic or reverberant conditions. In summary, they found that while sound level might provide powerful cues to variation in the distance of a source, there was no evidence that it was used as an absolute cue to source distance. Under reverberant conditions, the apparent distance of the source was farther than for sources presented at the same distance in anechoic conditions, thus confirming von Bekesy’s82 earlier observations. Probably the most important finding of this study was that reverberation provided a cue to the absolute distance of the source as well as providing information about the change in relative distance. However, this study used only a very limited number of conditions to determine the relative contribution of these two distance cues. More recently, Mershon et al84 have reported that in conditions of low reflectance distance is systematically underestimated. In a reverberant environment, target distance was also underestimated in the presence of background noise, presumably because of the disruption of reverberant cues to distance.

There is a considerable amount of work still to be done on the accuracy of distance perception, the effects of different reverberant environments and the interaction of the other potential cues to distance perception. In this context, the use of VAS stimuli combined with simple room acoustical models will be able to make an important and rapid contribution to our understanding of the influence of different factors on distance perception.

2. PSYCHOPHYSICAL SENSITIVITY TO ACOUSTIC CUES TO A SOUND’S LOCATION

2.1. INTRODUCTION

This section examines some of the data on the sensitivity of the human auditory system to the physical cues identified in section 1. It is important to recognize, however, that demonstrating sensitivity to a particular cue does not necessarily demonstrate its involvement in auditory localization. While there is value in such correlative arguments, they are not necessarily demonstrative of a causative connection between the physical parameter and the perception. There are a number of physical parameters which co-vary with the location of a sound source that may be involved in other auditory processes but do not directly relate to the perception of location itself; for instance, the separation of foreground sounds of interest from background masking sounds (e.g., refs. 82, 85). Likewise, the judgment of a just noticeable difference between two stimuli may also be important, but determination of the absolute location of a sound source is more likely to also require coding of the magnitude and vector of the differences between two stimuli (chapter 1, section 2.1).

Studies examining the sensitivity to a single cue to a sound’s location have been carried out using closed field (headphone) stimulation. This is often referred to as dichotic stimulus presentation and involves the independent stimulation of each ear by headphones or by sound systems sealed into the ears. This system facilitates the precise control of the timing and intensities of the stimuli delivered to each ear. The principal advantage here, of course, is that a single parameter associated with a sound’s location can be varied in a way which would be impossible by changing the location of the sound in the free field, where all of the parameters co-vary. The stimuli generally employed in these kinds of experiments are not intended to result in the perception of an externally localized image of the sound source (although this is possible86,87), but rather have been designed to observe the psychophysical or physiological effects of varying or co-varying the binaural cues of time and intensity. We should also bear in mind, however, the recent work suggesting that there may be a number of important interdependencies between different aspects of a stimulus88 (chapter 1, section 1.3). Under such conditions, superposition in system behavior may not be a valid assumption. Despite these limitations, a considerable understanding about the range and limits of stimulus coding by the auditory system can be gained from these studies.

2.2. SENSITIVITY TO INTERAURAL TIME DIFFERENCES

2.2.1. Psychophysical measures of sensitivity to interaural time differences

The auditory nervous system is exquisitely sensitive to changes in the interaural phase or timing of stimuli presented dichotically. The smallest detectable interaural time differences are for noise containing low frequencies, where just noticeable differences can be as low as 6 µs.89 For a 1 kHz tone, the just noticeable variation in interaural phase is between 3° and 4° of phase angle. Above this frequency the threshold rises very rapidly, so that phase differences become undetectable for frequencies above 1.5 kHz.4,5 These findings are consistent with predictions of the upper frequency limit for unambiguous phase information; i.e., for 1 kHz the wavelength (λ) equals 34 cm and, as the average human interaural distance equals 17.5 cm, the upper limit for precise interaural phase discrimination is almost exactly λ/2. The upper frequency limit for interaural phase sensitivity is also consistent with the physiological limits imposed by the fidelity of phase encoding by the auditory nervous system3 (section 2.2.3).
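The λ/2 limit above amounts to f_max = c/(2d), which for a 17.5 cm interaural distance lands almost exactly at the observed ~1 kHz boundary:

```python
SPEED_OF_SOUND = 343.0  # m/s

def max_unambiguous_freq(interaural_distance_m):
    # Interaural phase is unambiguous while the maximum interaural path
    # difference stays under half a wavelength: f < c / (2 * d).
    return SPEED_OF_SOUND / (2.0 * interaural_distance_m)
```

max_unambiguous_freq(0.175) gives 980 Hz, matching the ~1 kHz limit described in the text.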

The correspondence between the predicted upper frequency limit for unambiguous phase information and the upper limit of phase discrimination demonstrated by the dichotic experiments has lent considerable support to the duplex theory of sound localization. However, more recent psychophysical experiments have provided evidence that the simple division of time and intensity localization cues on the basis of frequency may be naive6 (see Hafter90 for discussion). Amplitude modulated stimuli with high frequency carriers can be lateralized on the basis of time differences in their envelopes7,91,92 (see also Boerger, quoted in Blauert93). Lateralization of these stimuli is not affected by the presence of a low-pass masker7,8 but declines with a reduction in the correlation between the carriers.8 Furthermore, lateralization does not occur for an AM signal matched against a pure tone at the carrier frequency.8 The interaural signal disparity, expressed as the amount of interaural spectral overlap, was found to affect the sensitivity to interaural time differences for both short tone bursts and clicks.94 These data suggest that the information necessary for the lateralization of amplitude modulated high frequency stimuli is carried in the high frequency channels and that the auditory system is sensitive to the envelope of the time domain signal.
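The idea that envelope timing survives in the high frequency channels can be sketched numerically: delay only the envelope of an AM tone (leaving the carrier fine structure identical at the two “ears”), rectify both signals as a crude stand-in for cochlear envelope extraction, and cross-correlate. All parameter values below are illustrative.

```python
import math

FS = 48000        # sample rate (Hz)
CARRIER = 4000.0  # high-frequency carrier (Hz)
MOD = 500.0       # envelope (modulation) rate (Hz)

def am_signal(n, env_delay_s=0.0):
    # AM tone whose envelope alone is delayed; the carrier fine structure
    # is the same in both channels.
    out = []
    for i in range(n):
        t = i / FS
        env = 0.5 * (1.0 + math.cos(2.0 * math.pi * MOD * (t - env_delay_s)))
        out.append(env * math.sin(2.0 * math.pi * CARRIER * t))
    return out

def envelope_lag_samples(left, right, max_lag):
    # Rectify (a crude envelope extraction) and return the lag that
    # maximizes the cross-correlation of the rectified signals.
    l = [abs(x) for x in left]
    r = [abs(x) for x in right]
    def corr(lag):
        return sum(l[i] * r[i + lag] for i in range(max_lag, len(l) - max_lag))
    return max(range(-max_lag, max_lag + 1), key=corr)
```

A 500 µs envelope delay (24 samples at this rate) is recovered from the rectified signals even though the fine structure carries no interaural disparity, which is the essence of the psychophysical result described above.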

It has been known for some time that the lateralization of dichotically presented spectrally dense stimuli is affected by the duration of the stimulus.95,96 Mean ITD thresholds for spectrally dense stimuli decrease with increasing stimulus duration and asymptote around 6 µs for stimulus durations greater than 700 ms.96 Furthermore, the efficacy of onset and ongoing disparities in producing a lateralized image was found to vary as a function of the duration of the stimulus. Onset disparity was completely ineffective in producing lateralization of dichotic white noise stimuli with no ongoing disparities when the stimulus duration was greater than 150 ms.96 Furthermore, onset disparity was dominant only for stimulus durations of less than 2 ms to 4 ms.95 Yost97 measured the ITD thresholds for pulsed sinusoids with onset, ongoing and offset disparities. With the addition of these amplitude envelopes, the ITD thresholds were significantly lower for high frequencies when compared to the data of Zwislocki and Feldman5 and Klumpp and Eady,4 who varied only the interaural phase. As discussed earlier (chapter 1, section 1.3), the recent work of Dye et al88 has led to the suggestion that the auditory processing strategy might be determined by the duration of the signal to be analyzed. The perception generated by a short duration sound is dependent on the ‘mean’ of the different cues from which it is composed (so-called synthetic listening) while longer duration stimuli tend to be parsed or streamed into different auditory objects. In this context, the trigger seems to be the duration of the signal.

2.2.2. Multiple signals and the “precedence effect”

The discussion above suggests that the auditory system may be dealing with the onset and ongoing time disparity cues in different ways. A related consideration is the effect of multiple real or apparent sources. In this case, the auditory system must make an important distinction between inputs relating to distinct auditory objects and inputs relating to reflections from nearby surfaces. The interactions between the incident and reflected sounds may therefore affect the way in which the auditory system weights differently the onset and ongoing components of a sound. The experiments discussed above have all relied on dichotically presented time-varying stimuli; however, they are analogous to the so-called precedence (or Haas) effect, which relies on free field stimuli40,98,99 (see also the introductory review in Moore100). When two similar stimuli are presented from different locations in the free field, the perceived location of the auditory event is dependent on the precise times of arrival of the two events. Stimuli arriving within a millisecond of each other will produce a fused image that is located at some point between the two sound sources;d this has been referred to as summing localization.40 For greater temporal disparities, the later sound is masked until the arrival time disparity is of the order of 5 ms to 40 ms.99 The actual period over which this masking is effective is very dependent on the spectral and temporal characteristics of the sound. Although the later sound is masked (in that it is not perceived as relating to a separate auditory object), it may still have some effect on the perception of the final image. This is particularly the case if there is a large level disparity between the first and subsequent sound. For instance, if the second sound is much louder it can completely override the precedence effect. Wallach et al98 also noted that the transient components in the signal were important in inducing the effect. If the sound levels of two narrow band sources are varied inversely over a period of seconds, the initial perception of the location of the source remains unchanged despite the fact that the level cues are reversed (the Franssen effect).101 However, this effect fails in an anechoic environment or if a broadband sound is used.

Our understanding of the mechanisms of the precedence effect provides important insights into the psychoacoustic effects of reverberant room environments,102 particularly in the case where multiple sources are concerned.99 The precedence effect is clearly a binaural phenomenon: in a reverberant environment our perception of the reflected sound rarely interferes with our perception of the primary sound. However, the simple expedient of blocking one ear can result in a large increase in our perception of the reverberation, to the extent that localization and discrimination of the source can be made very difficult. The emphasis on the initial components of the signals arriving at each ear has obvious advantages for sound localization in reverberant environments. In a relatively close environment, the first reflected echoes would probably arrive over the 40 ms period following the arrival of the incident sound, so that the ongoing ITDs would be significantly affected over this time course. Consistent with this, Roth et al18 found that ongoing ITDs estimated from cats in an acoustically “live” environment over the first 10 ms to 100 ms exhibited large variations as a function of frequency and location. These authors argued that the large variations in ITD would render this cue virtually useless for localization as the interval/position relationships would be at best very complex, or at worst, arbitrary (section 1.2).

d This is often thought to be the basis of hi-fi stereo; however, unless specific channel delays have been inserted into the program at mix down, the stereophonic effect of most pop music is dependent on the level of the sound in each channel. In cases where two microphones have been used to sample a complex source such as an orchestra, the recordings of the individual instruments in each channel will obviously contain delays related to their relative proximity to each microphone, so that the stereophony will depend on both levels and delays.

Hartmann103 compared the localization of broadband noise, low frequency tone impulses and relatively continuous low frequency tones. Consistent with other results, the broadband noise was the most accurately localized of all of the stimuli; however, accuracy was also degraded in highly reverberant conditions. The localization of the low frequency pulse with sharp onset and offset transients was unaffected by room echoes unless these echoes occurred within about 2.5 ms. The same stimulus was virtually unlocalizable when presented without onset transients (6 to 10 seconds rise time). These results indicate the relative importance of the initial components of a signal for accurate localization.

2.2.3. Neurophysiological measures of ITD sensitivity

By recording the electrophysiological activity from the auditory systems of nonhuman mammals we have gained considerable insight into how sound is encoded within the nervous system and, most importantly, the fidelity of these neural codes. As discussed earlier, ITDs will be manifest as differences in the arrival time of the sound at each ear (onset time differences) and as differences in the phases of the ongoing components of the sounds in each ear. An important prerequisite for interaural sensitivity to the phase of a sound is the ability of each ear to encode the phase monaurally. This will have associated with it certain encoding limits. Following the initial encoding, a second process will be required for the binaural comparison of the input phase at each ear. This is likely to involve different neural mechanisms which have, in turn, their own specific encoding limitations.

With respect to the monaural encoding of the phase characteristics of a sound, it is clear that phase is only encoded for the middle to low frequency components and that this limit is most likely determined by the auditory receptors in the inner ear.3 When the responses of individual fibers in the auditory nerve are recorded, the ability of the auditory system to lock to the phase of an input sine wave decreases with an increase in frequency. There is still considerable debate about the actual cut-off frequency, but most researchers would expect little phase information to be encoded in the mammalian auditory system for frequencies above 4 kHz. In fact, the capacity of these nerve fibers to “lock” to the phase of the stimulus falls rapidly for frequencies greater than 1.5 kHz.104 Recording the responses of auditory neurons at other locations within the auditory nervous system of mammals provides evidence that is broadly consistent with these findings (see Yin and Chan105 and Irvine106 for review).

At these lower frequencies the auditory system has been shown to be exquisitely sensitive to interaural phase. Psychophysical experiments using humans4,5 and other primates (see Houben, 1977, quoted in Brown107) demonstrate threshold discrimination differences of 3° and 11° in interaural phase respectively. Sensitivity to the interaural disparities in the time of arrival and the ongoing phase of dichotic stimuli has been demonstrated in auditory nuclei of a variety of animals (for recent reviews see Irvine106,108). The convergence of phase sensitive inputs allows the comparison of interaural phase differences. Single neurons recorded in the medial superior olivary nucleus, the first relay nucleus where the inputs from the two ears converge, demonstrate phase-locked responses to stimulation of either ear. Studies of the inputs from each ear suggest that the convergent information is both excitatory and inhibitory and that the timing relationship of the inputs from each ear is critical in determining the output activity of the neurons in the medial superior olive.109,110 For low frequency stimuli, the maximum response of a neuron occurs at a specific interaural delay, regardless of the stimulation frequency. While the total period of the response variation is equal to the period of the stimulus frequency, the maximum response occurs only at an interaural delay that is characteristic of the unit under study. This “characteristic delay” may reflect a neural delay introduced into the signal by neural path length differences from each ear. Thus, individual cells could act as neural coincidence detectors sensitive to particular interaural delays. Such a mechanism for the detection of small interaural time differences was first proposed by Jeffress111 (Fig. 2.10; see also refs. 112-114).
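The coincidence-detection scheme of Fig. 2.10 can be caricatured in a few lines: each detector applies a fixed internal delay to one ear’s input and counts coincidences with the other ear’s input; the detector whose internal delay best compensates the ITD responds most, forming a place code. Everything here is illustrative (idealized, jitter-free spike times; arbitrary coincidence window).

```python
def jeffress_response(itd_s, internal_delay_s, spike_times_s, window_s=50e-6):
    # Delay one ear's spike train by the detector's internal (path-length)
    # delay and count coincidences with the other ear's train, which
    # arrives itd_s later.
    left = [t + internal_delay_s for t in spike_times_s]
    right = [t + itd_s for t in spike_times_s]
    return sum(1 for tl in left for tr in right if abs(tl - tr) < window_s)

def decode_itd(itd_s, candidate_delays_s, spike_times_s):
    # Place code: the detector with the most coincidences labels the ITD.
    return max(candidate_delays_s,
               key=lambda d: jeffress_response(itd_s, d, spike_times_s))
```

For example, with spikes every 10 ms and candidate internal delays spaced 100 µs apart, a 300 µs ITD drives the 300 µs detector hardest and that detector’s position in the array reports the ITD.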

The coding of interaural time disparities in the envelopes of amplitude modulated high frequency stimuli115 and for modulated noise bands in the presence of low-pass maskers116 has been demonstrated in the inferior colliculi of the cat and guinea pig respectively. This has also been confirmed in recent recordings from the lateral superior olive in the cat.117 These findings may represent a neurophysiological correlate of the psychophysical observations that amplitude modulated high frequency stimuli can be lateralized on the basis of interaural time differences.7,91

Fig. 2.10. The Jeffress model of ITD coding. When a sound is located off the midline there is a difference in the path-lengths from the source to each ear. This results in a difference in the arrival time at each ear. Jeffress proposed that an anatomical difference in path-lengths could be used by the auditory nervous system to encode the interaural time differences. Essentially, information from each ear would converge in the nervous system along neuronal paths that differed in length. For instance, in the nucleus depicted in the figure there is an array of neurones receiving information from both the ipsilateral and contralateral ears. The neurone labeled 1 has the shortest path-length from the contralateral ear and the longest path-length from the ipsilateral ear. If this neurone only responded when the signals from both ears arrived coincidentally (i.e., acted as a coincidence detector), then neurone 1 would be selective for sounds with a large interaural delay favoring the ipsilateral ear. That is, if the sound arrives first at the ipsilateral ear it will have a longer pathway to travel to neurone 1 than the later arriving sound at the contralateral ear. In this way, very small interaural time differences could be converted into a neural place code in which each of the neurones in this array (1 to 7) codes for a particular interaural time difference. The resolution of such a system is dependent on the conduction velocities of the fibers conveying the information to each coincidence detector and the synaptic ‘jitter’ of the detector; e.g., the excitatory rise time and the reliability of action potential initiation. Reprinted with permission from Goldberg JM, Brown PB, J Neurophysiol 1969; 32:613-636.

2.3. SENSITIVITY TO INTERAURAL LEVEL DIFFERENCES

A number of investigators have examined ILD cues by making sound pressure measurements at, or in, the ears of experimental subjects or life-like models (sections 1.3, 1.5 and 1.6.3). As we have seen, the pinna and other structures of the auditory periphery act to produce a directionally selective receiver which is frequency dependent.

2.3.1. Psychophysical measures of within-frequency sensitivity to ILD

The threshold interaural intensity differences for sounds presented dichotically vary as a function of frequency.118 Thresholds for frequencies below 1 kHz are around 1 dB sound pressure level (SPL re 20 µPa) and decrease to about 0.5 dB SPL for 2 kHz to 10 kHz. These changes are in close agreement with those predicted from the measurements of minimum audible angles,119 particularly over the 2 kHz to 5 kHz frequency range.

The sensitivity of the auditory system to ILDs at low frequencies has also been investigated indirectly using the so-called ‘time-intensity trading ratio’. In these experiments, the ILD of a dichotic stimulus with a fixed ITD was varied to determine what ILD offset was required to compensate for a particular ITD, the ratio being expressed as µs/dB (see discussions by Moore100 and Hafter90). Over moderate interaural differences, measurements of this relation vary from 1.7 µs/dB for pure tones to 100 µs/dB for trains of clicks. A number of studies have also suggested that the processing of these cues may not be entirely complementary; the variance of measurements trading time against a fixed level differs from that obtained by trading level against a fixed time. Furthermore, some studies found that subjects can report two images (one based on the ITD and the other based on the ILD) and not one image as is assumed in the previous studies. This may well reflect an analytic as opposed to synthetic processing strategy. However, regardless of the basis of time-intensity trading, the important point in this context is that the auditory system is sensitive to ILDs of the order of 1 dB to 2 dB over the low frequency ranges.
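As simple arithmetic on the quoted ratios: dividing an ITD by a trading ratio gives the ILD that would nominally compensate for it. The function below is just this division, with the example values taken from the ranges quoted above.

```python
def trading_ild_db(itd_us, ratio_us_per_db):
    # ILD (dB) that "trades" against a given ITD (microseconds) at the
    # stated time-intensity trading ratio (microseconds per dB).
    return itd_us / ratio_us_per_db
```

A 100 µs ITD trades against 1 dB at the 100 µs/dB ratio reported for click trains, but would demand an implausible ~59 dB at the 1.7 µs/dB ratio reported for pure tones, which illustrates how strongly the trading relation depends on the stimulus.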

2.3.2. Neurophysiological measures of ILD sensitivity

There are two main strategies that could be utilized by the auditory system in the analysis of interaural level differences. The traditional thinking on this matter reflects the way in which the inner ear is thought to encode complex sounds: i.e., principally as a spectral code which is manifest within the central nervous system as a topographical code of frequency. The analysis of ILD information is thought to occur when these two topographic codes of frequency intersect in the superior olivary nuclei, where the analysis is carried out on a frequency by frequency basis.

There is a considerable literature documenting the physiological responses of auditory neurones to variation in interaural level differences in a number of different species and auditory nuclei (for recent reviews see Irvine,106,108 Fig. 2.11). There is considerable heterogeneity in the response patterns of individual neurones, with a spectrum of response types coding a range of interaural levels. One important neurophysiological finding is that in some nuclei the sensitivity of individual neurones to variations in ILDs using pure tone stimuli can vary considerably with both the frequency of the tones presented and the overall (or average) binaural level at which these stimuli are presented.120-124 This wide variability in response presents some difficulties in interpreting how these neurones might be selective for specific ILDs. However, in at least one nucleus (the superior colliculus), when


broadband stimuli are utilized a consistent pattern of responses to ILDs can be found across a wide range of average binaural levels. That is, the inclusion of a broad range of frequencies in the stimulus results in a response which is selective for ILD per se rather than one confounded by variations in the overall sound level.
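The suggestion in Figure 2.11 that ILD could be carried by neural place — a population of cells whose inhibitory cut-offs tile the ILD axis — can be caricatured with sigmoidal rate functions. Everything here (the cut-off spacing, the slope, the 8 dB test stimulus) is an invented illustration of the coding principle, not data from the studies cited:

```python
import numpy as np

def ei_rate(ild_db, cutoff_db, slope=1.0):
    """Normalised firing rate of an EI-type neurone: driven by the
    contralateral ear, progressively inhibited as the ILD rises past
    the cell's cut-off to favour the ipsilateral ear."""
    return 1.0 / (1.0 + np.exp(slope * (ild_db - cutoff_db)))

cutoffs = np.arange(-20, 25, 5)    # inhibitory cut-offs tiling the ILD axis
stimulus_ild = 8.0
rates = ei_rate(stimulus_ild, cutoffs)
# Place readout: the cell closest to half-maximal rate marks the ILD
decoded = cutoffs[int(np.argmin(np.abs(rates - 0.5)))]
print(decoded)                     # 10 (nearest cut-off to the 8 dB stimulus)
```

The decoding resolution in such a scheme is set by how finely the population's cut-offs sample the ILD axis, analogous to the delay-line spacing in the Jeffress model.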

This leads to considerations of the second strategy by which the auditory system might analyze ILD cues, namely binaural analysis across frequency rather than within frequency. The monaural filter functions

Fig. 2.11. Using a closed field stimulus system, the sound level can be varied in each ear about a mean binaural stimulus level, or alternatively the level in one ear can be varied while keeping the level in the other ear constant.136 The solid line shows the variation in the firing rate of a binaurally sensitive neurone in the inferior colliculus produced by varying the contralateral stimulus level. Note that the activity of this neurone is completely inhibited when the interaural stimulus level slightly favors the ipsilateral ear. The dashed line shows the response to stimulation of the contralateral ear alone. There is evidence that the interaural level difference at which neurones are inhibited varies considerably across the population of neurones. This variation is consistent with the idea that ILD may also be encoded by neural place. Reprinted with permission from Irvine DRF et al, Hear Res 1995; 85:127-141.


of each ear combine to produce a complex spectral pattern of binaural differences (section 1.5.1, see also Fig. 2.14). In the traditional view of the ascending auditory system, frequency is thought to be preserved by the place code, so that different channels carry different frequencies. If spectral patterns in either the individual monaural cues or in their binaural comparisons are to be analyzed across frequency, then frequency information must converge rather than remain segregated. There are a number of subcortical nuclei in the auditory system where this kind of convergence appears to occur (for instance the dorsal cochlear nucleus, the dorsal nuclei of the lateral lemniscus, the external nucleus of the inferior colliculus and the deep layers of the superior colliculus). Single neurones in these structures can be shown to have wide and complex patterns of frequency tuning, indicating that frequency information has converged, presumably as a result of neural selection for some other emergent property of the sound.

There have been virtually no physiological or psychophysical studies of the coding of binaural spectral profiles; however, some psychophysical work has been carried out on the sensitivity to monaural (or diotically presentede) spectral profiles. The lack of any systematic study of interaural spectral differences (ISDs) is surprising on a number of counts. Firstly, there should be considerable advantage to the auditory system in analyzing ISDs as, unlike monaural spectral cues, the ISD is independent of the spectrum of the sound. The auditory system must make certain assumptions about the spectrum of a signal if monaural spectral cues are to provide location information. By contrast, as long as the overall bandwidth of a sound is sufficiently broad, the ISD is dependent only on the differences between the filtering of the sound by each ear. As both of the filter functions are very likely to be known by the auditory system, this cue is then unambiguous. However, when the ISDs are calculated for a sound on the audio-visual horizon (e.g., see Fig. 2.14) there is very little asymmetry in the shape of the binaural spectral profile for locations about the interaural axis. In this case, the utility of ISD cues for resolving the front-back ambiguity in the within-frequency binaural cues of ITD and ILD is considerably compromised. However, that these cues play some role in localization was indicated by a decision theory model of auditory localization behaviour.125 Searle et al125 argued, on the basis of an analysis of a large number of previous localization studies, that the pinna disparity cues likely provided about the same amount of localization information as the monaural pinna cue.
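The central claim here — that the ISD is independent of the source spectrum — is easy to verify numerically: in dB terms the source spectrum is added to both ear filters and cancels in the interaural difference. The two ear gain curves below are arbitrary stand-ins for measured HRTFs:

```python
import numpy as np

freqs = np.linspace(200, 14000, 64)
# Hypothetical location-dependent ear gains in dB (stand-ins for HRTFs)
gain_left = 10.0 * np.sin(freqs / 2000.0)
gain_right = 8.0 * np.cos(freqs / 2500.0)

def isd(source_db):
    """Interaural spectral difference (dB) for a given source spectrum."""
    at_left = source_db + gain_left      # spectrum reaching the left eardrum
    at_right = source_db + gain_right    # spectrum reaching the right eardrum
    return at_left - at_right

flat = np.zeros_like(freqs)
arbitrary = np.random.default_rng(0).normal(0, 12, freqs.size)
# The source term cancels: only the two ear filter functions remain
print(np.allclose(isd(flat), isd(arbitrary)))          # True
print(np.allclose(isd(flat), gain_left - gain_right))  # True
```

This cancellation is exactly why the ISD, unlike a monaural spectral cue, requires no assumptions about the source spectrum — provided the source is broadband enough to drive every frequency channel.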


e In this context a dichotic stimulus refers to a stimulus which is presented binaurally and contains some binaural differences. On the other hand, a diotic stimulus is presented binaurally but is identical at each ear.


2.4. SENSITIVITY TO CHANGES IN THE SPECTRAL PROFILE OF SOUNDS

The human head related transfer functions (HRTFs) indicate that the spectral profile of a sound varies considerably as a function of its location in space. These variations occur principally over the middle to high frequency range of human hearing, where the wavelengths of the sounds are smaller than the principal structures of the auditory periphery. Although there is considerable evidence that the "spectral" cues furnished by the outer ear are important for accurate localization, considerably less is known about how the auditory system encodes these variations in the spectral profile.

There is a considerable body of experimental work examining the effects of spectral profile for the low to mid range of frequencies. Green and his colleagues have coined the phrase spectral "profile analysis" to describe the ability of subjects to discriminate variations in the spectral shape of a complex stimulus126 (see also Moore et al127). In these studies, spectrally complex stimuli have generally been constructed by summing a number of sinusoids covering a range of frequencies and determining the sensitivity of subjects to changes in the amplitude of an individual component of this complex (see Green128,129 for review). Randomizing the phase of the components, which varies the time domain waveforms of the stimuli, seems to have no significant effect on this task. This has been interpreted as indicating that the detection of variations is most likely based on variations in the amplitude spectrum rather than the waveform. The number of components in the complex, the frequency interval between components and the frequency range of components all affect the detectability of variations in the amplitude of a single component. In particular, detection is best when the components cover a wide range of frequencies and individual components are spaced about 2 critical bands apart. The frequency of the component which is varied within the complex also has a moderate effect on the sensitivity of detection. However, these variations do not easily discriminate among the various models of auditory function.
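A stimulus of the kind used in such profile-analysis experiments can be sketched as below; the particular component count, spacing, increment and durations are arbitrary choices for illustration, not the parameters of any cited experiment:

```python
import numpy as np

def profile_stimulus(increment_db=0.0, target=5, n_comp=11,
                     f_lo=200.0, f_hi=5000.0, fs=44100, dur=0.25, seed=None):
    """Sum of log-spaced sinusoids with per-trial random phases; one
    component may be incremented in level (the 'profile' change to detect)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    freqs = np.geomspace(f_lo, f_hi, n_comp)
    amps = np.ones(n_comp)
    amps[target] *= 10.0 ** (increment_db / 20.0)   # raise the target component
    phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)  # randomised each trial
    return sum(a * np.sin(2.0 * np.pi * f * t + p)
               for a, f, p in zip(amps, freqs, phases))

standard = profile_stimulus(0.0, seed=1)   # flat spectral profile
signal = profile_stimulus(2.0, seed=2)     # 2 dB bump on one component
```

Because the phases differ from trial to trial, the two waveforms differ everywhere in the time domain; only the amplitude spectrum distinguishes standard from signal, which is the point of the paradigm.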

For our purposes it is sufficient to summarize this work as indicating that the auditory system is indeed sensitive to quite small changes in the spectral profile of a stimulus and, with some limitations, this sensitivity is generally related to the bandwidth of the sound. It seems likely that any analysis of the monaural spectral cues, and possibly binaural spectral differences, will involve some kind of process that analyzes information across frequency in a manner similar to that illustrated by the kind of profile analysis experiments carried out by Green and colleagues.


2.5. MODELS OF THE ENCODING OF SPECTRAL PATTERNS BY THE AUDITORY SYSTEM

2.5.1. Models of auditory encoding

In identifying which parameters of the HRTF might be important in the localization of a sound source, it is important to consider ways in which this information is encoded by the auditory nervous system. The HRTF illustrates the way in which the sound arriving at the eardrum is modified by the auditory periphery. There are at least two other important transmission steps before the sound is represented as a sensory signal within the auditory nervous system. These are (1) the band pass transfer function of the middle ear and (2) the transfer function of the inner ear. There are essentially two ways in which these processes might be taken into account.f The first is to computationally model what is known about the physical characteristics of these different stages and then combine the HRTFs with such a model. While there have been huge advances in our understanding of the mechanisms of the inner ear over the last decade, it is probably fair to say that such a functional model of the middle and inner ear is still likely to be incomplete. The second approach is to use models of auditory processing that have been derived from psychophysical data. Such models treat the auditory system as a black box and simply try to relate input to perceptual output. An underlying assumption of the spectral process models is that the auditory system can be modeled as a bank of band pass filters, e.g., Glasberg and Moore130 (but see also Patterson131). Such models attempt to take into account (i) the transfer function of the middle ear; (ii) the variation in the frequency filter bandwidth as a function of frequency; and (iii) the variation in the filter shape as a function of level.132-134 The outputs of such models are generally expressed in terms of "neural excitation patterns." That is, the outputs make predictions about how a sound might appear to be coded across an array of frequency sensitive fibers within the central nervous system (chapter 1, section 1.5).

We have combined one model130,135 with the measured HRTFs to estimate the excitation patterns generated by a spectrally flat noise at a nominal spectrum level of 30 dB (see Carlile and Pralong31 for


f The most obvious way of taking these processes into account would be to measure the neurophysiological responses after the encoding of the sound by the inner ear. However, this is also highly problematic. Meaningful measurement of the neural responses in humans is not technically possible (animal models can be used in this kind of experiment, which requires surgery to gain access to the neural structures to be recorded from). Additionally, it is not clear that there is as yet a sufficiently good understanding of the nature of the neural code to unequivocally interpret the results of any such measurements.


computational details). The components of the HRTFs which are likely to be most perceptually salient can be determined from such an analysis.
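A sketch of such an excitation-pattern calculation is given below, using the Glasberg and Moore ERB approximation, ERB(f) = 24.7(4.37f/1000 + 1) with f in Hz, and a symmetric roex(p) filter shape. This is a simplified stand-in, not the authors' actual computation: the middle-ear weighting and the level-dependent filter asymmetry of the full model are omitted, and the HRTF gain curve is invented for illustration.

```python
import numpy as np

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def excitation_pattern(freqs, level_db, centres):
    """Excitation (dB) at each auditory-filter centre: weight the input
    power spectrum by a roex(p) filter and sum the transmitted power."""
    power = 10.0 ** (np.asarray(level_db) / 10.0)
    out = []
    for fc in centres:
        p = 4.0 * fc / erb_hz(fc)            # filter slope parameter
        g = np.abs(freqs - fc) / fc          # normalised frequency deviation
        w = (1.0 + p * g) * np.exp(-p * g)   # roex(p) weighting function
        out.append(10.0 * np.log10(np.sum(w * power)))
    return np.array(out)

freqs = np.linspace(500.0, 14000.0, 400)
hrtf_gain_db = 6.0 * np.sin(freqs / 1500.0)  # stand-in for a measured HRTF
centres = np.geomspace(500.0, 12000.0, 30)
ep = excitation_pattern(freqs, 30.0 + hrtf_gain_db, centres)
# Spectral detail much narrower than one ERB is smoothed away in `ep`.
```

Because the filters broaden with centre frequency, the smoothing of fine HRTF detail is strongest at high frequencies — the effect visible in Figures 2.12 and 2.13.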

2.5.2. Perceptual saliency of the horizon transfer function

Figure 2.12 shows the calculated excitation patterns generated by a flat noise source at different azimuth locations along the ipsilateral audio-visual horizon. This is referred to as the horizon excitation function and shows the dramatic effects of passing the HRTF filtered noise through these auditory filter models. There are significant differences between the excitation patterns and the HRTF obtained using acoustical measures (cf. Figs. 2.6 and 2.12). One of the most marked effects is the smoothing of much of the fine detail in the HRTF by the auditory filters, particularly at high frequencies. Additionally there are overall changes in the shape of the horizon transfer function; the within-frequency variations are very similar between the horizon transfer function and the horizon excitation function, as the auditory filter is not sensitive to the location of the stimulus. However, the transfer function of the middle ear dramatically changes the shape of the functions across frequency.

For frequencies between 2.5 kHz and 8 kHz there is a large increase in the gain of the excitation patterns for anterior locations. By contrast, there is a marked reduction in the excitation pattern for frequencies between 3 kHz and 6 kHz for locations behind the interaural axis (-90° azimuth). Furthermore, there is an increase in the frequency of peak excitation as the source is moved from the anterior to posterior space; this ranges from 0.5 to 1.0 octaves for the eight subjects for whom this was calculated.31

2.5.3. Perceptual saliency of the meridian transfer function

The effect of combining the meridian transfer function with the auditory filter model is shown in Figure 2.13 for sounds located on the anterior median plane. There is an appreciable reduction in the amount of spectral detail at the higher frequencies. The frequency of the peak of the principal gain feature in the middle frequency range varies as a function of elevation. There is also a trough in the excitation pattern that varies from 5 kHz to 10 kHz as the elevation of the source is increased from 45° below the audio-visual horizon to directly above the head. There are a considerable number of studies that provide evidence for the involvement of spectral cues in median plane localization (chapter 1, section 2.1.4); therefore it is likely that the changes evident in the meridian excitation pattern (Fig. 2.13) are both perceptually salient and perceptually relevant.

2.5.4. Variation in the interaural level differences

The HRTFs can also be used to estimate the pattern of interaural level differences. It is important to note, however, that these differences


Fig. 2.12. The horizon excitation function for the ipsilateral audio-visual horizon has been calculated from the data shown in Fig. 2.6 (see text and Carlile and Pralong31 for method). The gain indicated by the color of the contour is in dB excitation. This calculation takes into account the frequency dependent sensitivity and the spectral filtering properties of the human auditory system and provides a measure of the likely perceptual saliency of different components of the HRTF. Reprinted with permission from Carlile S and Pralong D, J Acoust Soc Am 1994; 95:3445-3459.


will only be calculated by the auditory nervous system following the encoding of the sound by each ear. That is, the interaural level difference is a binaural cue obtained by comparing the neurally encoded outputs of each ear. For this reason we have estimated the neurally encoded ILDs from the excitation patterns calculated for each ear, by subtracting the excitation pattern obtained at the contralateral ear from that at the ipsilateral ear. The horizon ILDs calculated for the left and right ears are shown in Figure 2.14. The pattern of changes in ILD within any one frequency band demonstrates a non-monotonic change with location, thereby illustrating the "cone of confusion" type ambiguity. Not surprisingly, because of the complex filtering properties of the outer ear, the patterns of within-frequency changes in ILD are often more complex than predicted by a simple spherical head model. In general, however, ILDs peak around the interaural axis, particularly for the mid to high frequencies where the changes are sufficiently large to be acoustically and perceptually relevant.
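The binaural comparison described here reduces, channel by channel, to a subtraction of the two excitation patterns. The four-channel numbers below are invented purely to show the operation:

```python
import numpy as np

def neural_ild(excitation_ipsi_db, excitation_contra_db):
    """Channel-by-channel ILD estimated after peripheral encoding,
    i.e. from excitation patterns rather than from the raw spectra."""
    return np.asarray(excitation_ipsi_db) - np.asarray(excitation_contra_db)

# Hypothetical per-channel excitation levels (dB) for one source location
ipsi = np.array([52.0, 55.0, 60.0, 58.0])
contra = np.array([50.0, 49.0, 47.0, 50.0])
print(neural_ild(ipsi, contra))   # [ 2.  6. 13.  8.]
```

Performing the subtraction after the filter-bank stage, rather than on the raw spectra, matters because the smoothing and middle-ear weighting applied to each ear shape the ILD pattern that the nervous system actually has access to.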


Fig. 2.13. The meridian excitation functions have been calculated from the data shown in Fig. 2.7. All other details as for Fig. 2.12. Reprinted with permission from Carlile S, Pralong D, J Acoust Soc Am 1994; 95:3445-3459.


Fig. 2.14. The horizon transfer function provides a measure of how one outer ear modifies the sound as a function of its location. These data can also be used to estimate how the binaural spectral cues might vary as a function of location. In this figure the binaural spectral cues generated for locations on the audio-visual horizon of the (a) left ear and (b) right ear are shown. The level in dB of the binaural difference is indicated by the color of the contour. Note that the differences between these plots result from differences in the filtering properties of the left and right ears respectively, indicating a significant departure from biological symmetry in these structures. Reprinted with permission from Carlile S, Pralong D, J Acoust Soc Am 1994; 95:3445-3459.


There are considerable differences evident in the horizon ILDs calculated for each ear for this subject. Similar kinds of inter-ear differences were seen in seven other subjects; however, those shown in Figure 2.14 represent the largest differences observed. These data demonstrate that there is sufficient asymmetry in the ears to produce what are very likely to be perceptually salient differences in the patterns of ILDs at each ear (for sounds ipsilateral to that ear). Clearly, estimates of ILD that are based on recordings from one ear and assume that the ears are symmetrical should be treated with some caution. These interaural variations were also sufficient to produce ILDs of greater than 9 dB for a sound located on the anterior median plane.31

2.6. PINNA FILTERING AND THE CONSTANCY OF THE PERCEPTION OF AUDITORY OBJECTS

One of the most striking effects related to the spectral profile of the sound can be obtained by simply playing noise filtered by different HRTFs through a loudspeaker in the free field. The changes in the spectral profiles from HRTF to HRTF are discernible as significant changes in the timbre of the sound. Clearly, in this demonstration the noise is being filtered a second time by the listener's ears, but the point here is that the differences between sounds filtered by different HRTFs are very obvious. However, the most remarkable feature of this demonstration is that when these noise stimuli are heard over headphones, appropriately filtered by the left and right HRTFs, the large timbral differences are not apparent; rather, these differences have been mapped onto spatial location. That is, there seems to be some kind of perceptual constancy regarding the auditory object, so that the changes in the input spectra are perceived as variation in the spatial location of the object rather than in the spectral characteristics of the sound. In the visual system there is an analogous example to do with color constancy. The color we perceive an object to be is related to the light reflected from that object. The problem is that under different types of illumination, the wavelengths of the light that is actually reflected can vary considerably. Despite this we still perceive a leaf to be green regardless of whether we are looking at the leaf at dawn, midday or dusk, or even under a fluorescent light. The solution to this problem is that the visual system is sensitive to the ratio of light reflected at different wavelengths. Whether there is some kind of similar auditory mechanism relying on, for instance, the comparison between frequency bands or between the sounds at each ear is, of course, unknown at this time. Such a process might also indicate a close interaction between the analytical system responsible for localization processing and pattern recognition.

3. CONCLUDING REMARKS

In this chapter we have canvassed a wide range of cues to the location of a sound in space. We have also considered the sensitivity


of the auditory system to these cues as indicated by psychophysical experiments on humans and by directly measuring the neural responses in animals. Both the acoustic and perceptual studies give us some insight into the kinds of signals that a VAS display must transduce. Although the sensitivity studies have generally examined each localization cue in isolation, they provide data as to the quantum of each cue that is detectable by the auditory system. Although there is some evidence of complex non-linearities in the system and of complex interdependencies between the cues, these studies provide, at least to a first approximation, an idea of the resolution which we would expect a high fidelity VAS display to achieve.

ACKNOWLEDGMENTS

Drs. Martin, Morey, Parker, Pralong and Professor Irvine are warmly acknowledged for comments on a previous version of this chapter. Some of the recent bioacoustic and excitation modeling work reported in this chapter was supported by the National Health and Medical Research Council (Australia), the Australian Research Council and the University of Sydney. The Auditory Neuroscience Laboratory maintains a Web page outlining the laboratory facilities and current research work at http://www.physiol.usyd.edu.au/simonc.

REFERENCES

1. Woodworth RS, Schlosberg H. Experimental Psychology. New York: Holt, Rinehart and Winston, 1962.
2. Woodworth RS. Experimental Psychology. New York: Holt, 1938.
3. Palmer AR, Russell IJ. Phase-locking in the cochlear nerve of the guinea-pig and its relation to the receptor potential of inner hair-cells. Hear Res 1986; 24:1-15.
4. Klump RG, Eady HR. Some measurements of interaural time difference thresholds. J Acoust Soc Am 1956; 28:859-860.
5. Zwislocki J, Feldman RS. Just noticeable differences in dichotic phase. J Acoust Soc Am 1956; 28:860-864.
6. Trahiotis C, Robinson DE. Auditory psychophysics. Ann Rev Psychol 1979; 30:31-61.
7. Henning GB. Detectability of interaural delay in high-frequency complex waveforms. J Acoust Soc Am 1974; 55:84-90.
8. Nuetzel JM, Hafter ER. Lateralization of complex waveforms: effects of fine structure, amplitude, and duration. J Acoust Soc Am 1976; 60:1339-1346.
9. Saberi K, Hafter ER. A common neural code for frequency- and amplitude-modulated sounds. Nature 1995; 374:537-539.
10. Shaw EAG. The external ear. In: Keidel WD, Neff WD, ed. Handbook of Sensory Physiology. Berlin: Springer-Verlag, 1974:455-490.
11. Middlebrooks JC, Makous JC, Green DM. Directional sensitivity of sound-pressure levels in the human ear canal. J Acoust Soc Am 1989; 86:89-108.


12. Searle CL, Braida LD, Cuddy DR et al. Binaural pinna disparity: another auditory localization cue. J Acoust Soc Am 1975; 57:448-455.
13. Hartley RVL, Fry TC. The binaural location of pure tones. Phys Rev 1921; 18:431-442.
14. Nordlund B. Physical factors in angular localization. Acta Otolaryngol 1962; 54:75-93.
15. Kuhn GF. Model for the interaural time differences in the horizontal plane. J Acoust Soc Am 1977; 62:157-167.
16. Feddersen WE, Sandel TT, Teas DC et al. Localization of high-frequency tones. J Acoust Soc Am 1957; 29:988-991.
17. Abbagnaro LA, Bauer BB, Torick EL. Measurements of diffraction and interaural delay of a progressive sound wave caused by the human head. II. J Acoust Soc Am 1975; 58:693-700.
18. Roth GL, Kochhar RK, Hind JE. Interaural time differences: Implications regarding the neurophysiology of sound localization. J Acoust Soc Am 1980; 68:1643-1651.
19. Brillouin L. Wave propagation and group velocity. New York: Academic, 1960.
20. Gaunaurd GC, Kuhn GF. Phase- and group-velocities of acoustic waves around a sphere simulating the human head. J Acoust Soc Am 1980; Suppl. 1:57.
21. Ballantine S. Effect of diffraction around the microphone in sound measurements. Phys Rev 1928; 32:988-992.
22. Kinsler LE, Frey AR. Fundamentals of acoustics. New York: John Wiley and Sons, 1962.
23. Shaw EAG. Transformation of sound pressure level from the free field to the eardrum in the horizontal plane. J Acoust Soc Am 1974; 56:1848-1861.
24. Shaw EAG. The acoustics of the external ear. In: Studebaker GA, Hochberg I, ed. Acoustical factors affecting hearing aid performance. Baltimore: University Park Press, 1980:109-125.
25. Djupesland G, Zwislocki JJ. Sound pressure distribution in the outer ear. Acta Otolaryng 1973; 75:350-352.
26. Kuhn GF. Some effects of microphone location, signal bandwidth, and incident wave field on the hearing aid input signal. In: Studebaker GA, Hochberg I, ed. Acoustical factors affecting hearing aid performance. Baltimore: University Park Press, 1980:55-80.
27. Pralong D, Carlile S. Measuring the human head-related transfer functions: A novel method for the construction and calibration of a miniature "in-ear" recording system. J Acoust Soc Am 1994; 95:3435-3444.
28. Wightman FL, Kistler DJ, Perkins ME. A new approach to the study of human sound localization. In: Yost WA, Gourevitch G, ed. Directional Hearing. New York: Academic, 1987:26-48.
29. Wightman FL, Kistler DJ. Headphone simulation of free field listening. I: Stimulus synthesis. J Acoust Soc Am 1989; 85:858-867.


30. Hellstrom P, Axelsson A. Miniature microphone probe tube measurements in the external auditory canal. J Acoust Soc Am 1993; 93:907-919.
31. Carlile S, Pralong D. The location-dependent nature of perceptually salient features of the human head-related transfer function. J Acoust Soc Am 1994; 95:3445-3459.
32. Belendiuk K, Butler RD. Monaural location of low-pass noise bands in the horizontal plane. J Acoust Soc Am 1975; 58:701-705.
33. Butler RA, Belendiuk K. Spectral cues utilized in the localization of sound in the median sagittal plane. J Acoust Soc Am 1977; 61:1264-1269.
34. Flannery R, Butler RA. Spectral cues provided by the pinna for monaural localization in the horizontal plane. Percept and Psychophys 1981; 29:438-444.
35. Movchan EV. Participation of the auditory centers of Rhinolophus ferrumequinum in echolocational tracking of a moving target. [Russian]. Neirofiziologiya 1984; 16:737-745.
36. Hebrank J, Wright D. Spectral cues used in the localization of sound sources on the median plane. J Acoust Soc Am 1974; 56:1829-1834.
37. Hammershoi D, Moller H, Sorensen MF. Head-related transfer functions: measurements on 24 human subjects. Presented at Audio Engineering Society. Amsterdam: 1992:1-30.
38. Moller H, Sorensen MF, Hammershoi D et al. Head-related transfer functions of human subjects. J Audio Eng Soc 1995; 43:300-321.
39. Bloom PJ. Determination of monaural sensitivity changes due to the pinna by use of minimum-audible-field measurements in the lateral vertical plane. J Acoust Soc Am 1977; 61:820-828.
40. Blauert J. Spatial Hearing: The psychophysics of human sound localization. Cambridge, Mass.: MIT Press, 1983.
41. Shaw EAG. Earcanal pressure generated by a free sound field. J Acoust Soc Am 1966; 39:465-470.
42. Mehrgardt S, Mellert V. Transformation characteristics of the external human ear. J Acoust Soc Am 1977; 61:1567-1576.
43. Rabbitt RD, Friedrich MT. Ear canal cross-sectional pressure distributions: mathematical analysis and computation. J Acoust Soc Am 1991; 89:2379-2390.
44. Carlile S. The auditory periphery of the ferret. I: Directional response properties and the pattern of interaural level differences. J Acoust Soc Am 1990; 88:2180-2195.
45. Khanna SM, Stinson MR. Specification of the acoustical input to the ear at high frequencies. J Acoust Soc Am 1985; 77:577-589.
46. Stinson MR, Khanna SM. Spatial distribution of sound pressure and energy flow in the ear canals of cats. J Acoust Soc Am 1994; 96:170-181.
47. Chan JCK, Geisler CD. Estimation of eardrum acoustic pressure and of ear canal length from remote points in the canal. J Acoust Soc Am 1990; 87:1237-1247.
48. Teranishi R, Shaw EAG. External-ear acoustic models with simple geometry. J Acoust Soc Am 1968; 44:257-263.


49. Shaw EAG. The external ear: new knowledge. In: Dalsgaad SC, ed. Earmolds and associated problems: Proceedings of the seventh Danavox Symposium. 1975:24-50.
50. Knudsen EI, Konishi M, Pettigrew JD. Receptive fields of auditory neurons in the owl. Science (Washington, DC) 1977; 198:1278-1280.
51. Middlebrooks JC, Green DM. Sound localization by human listeners. Annu Rev Psychol 1991; 42:135-159.
52. Middlebrooks JC. Narrow-band sound localization related to external ear acoustics. J Acoust Soc Am 1992; 92:2607-2624.
53. Price GR. Transformation function of the external ear in response to impulsive stimulation. J Acoust Soc Am 1974; 56:190-194.
54. Carlile S. The auditory periphery of the ferret. II: The spectral transformations of the external ear and their implications for sound localization. J Acoust Soc Am 1990; 88:2196-2204.
55. Asano F, Suzuki Y, Sone T. Role of spectral cues in median plane localization. J Acoust Soc Am 1990; 88:159-168.
56. Kuhn GF, Guernsey RM. Sound pressure distribution about the human head and torso. J Acoust Soc Am 1983; 73:95-105.
57. Shaw EAG, Teranishi R. Sound pressure generated in an external-ear replica and real human ears by a nearby point source. J Acoust Soc Am 1968; 44:240-249.
58. Shaw EAG. 1979 Rayleigh Medal lecture: the elusive connection. In: Gatehouse RW, ed. Localisation of sound: theory and application. Connecticut: Amphora Press, 1982:13-27.
59. Shaw EAG. External ear response and sound localization. In: Gatehouse RW, ed. Localisation of sound: theory and application. Connecticut: Amphora Press, 1982:30-41.
60. Batteau DW. The role of the pinna in human localization. Proc Royal Soc B 1967; 158:158-180.
61. Hiranaka Y, Yamasaki H. Envelope representations of pinna impulse responses relating to three-dimensional localization of sound sources. J Acoust Soc Am 1983; 73:291-296.
62. Wright D, Hebrank JH, Wilson B. Pinna reflections as cues for localization. J Acoust Soc Am 1974; 56:957-962.
63. Watkins AJ. Psychoacoustic aspects of synthesized vertical locale cues. J Acoust Soc Am 1978; 63:1152-1165.
64. Watkins AJ. The monaural perception of azimuth: a synthesis approach. In: Gatehouse RW, ed. Localisation of sound: theory and application. Connecticut: Amphora Press, 1982:194-206.
65. Rogers CAP. Pinna transformations and sound reproduction. J Audio Eng Soc 1981; 29:226-234.
66. Calford MB, Pettigrew JD. Frequency dependence of directional amplification at the cat's pinna. Hearing Res 1984; 14:13-19.
67. Coles RB, Guppy A. Biophysical aspects of directional hearing in the Tammar wallaby, Macropus eugenii. J Exp Biol 1986; 121:371-394.

67. Coles RB, Guppy A. Biophysical aspects of directional hearing in theTammar wallaby, Macropus eugenii. J Exp Biol 1986; 121:371-394.

68. Carlile S, Pettigrew AG. Directional properties of the auditory periphery in the guinea pig. Hear Res 1987; 31:111-122.

69. Guppy A, Coles RB. Acoustical and neural aspects of hearing in the Australian gleaning bats, Macroderma gigas and Nyctophilus gouldi. J Comp Physiol A 1988; 162:653-668.

70. Coleman PD. Failure to localize the source distance of an unfamiliar sound. J Acoust Soc Am 1962; 34:345-346.

71. Gardner MB. Distance estimation of 0° or apparent 0°-orientated speech signals in anechoic space. J Acoust Soc Am 1969; 45:47-53.

72. Coleman PD. An analysis of cues to auditory depth perception in free space. Psychol Bull 1963; 60:302-315.

73. Ashmead DH, LeRoy D, Odom RD. Perception of the relative distances of nearby sound sources. Percept Psychophys 1990; 47:326-331.

74. Begault D. Preferred sound intensity increase for sensations of half distance. Perceptual and Motor Skills 1991; 72:1019-1029.

75. Hirsch HR. Perception of the range of a sound source of unknown strength. J Acoust Soc Am 1968; 43:373-374.

76. Molino J. Perceiving the range of a sound source when the direction is known. J Acoust Soc Am 1973; 53:1301-1304.

77. Holt RE, Thurlow WR. Subject orientation and judgement of distance of a sound source. J Acoust Soc Am 1969; 46:1584-1585.

78. Mershon DH, Bowers JN. Absolute and relative cues for the auditory perception of egocentric distance. Perception 1979; 8:311-322.

79. Ingard U. A review of the influence of meteorological conditions on sound propagation. J Acoust Soc Am 1953; 25:405-411.

80. Butler RA, Levy ET, Neff WD. Apparent distance of sound recorded in echoic and anechoic chambers. J Exp Psychol: Hum Percept Perform 1980; 6:745-750.

81. Little AD, Mershon DH, Cox PH. Spectral content as a cue to perceived auditory distance. Perception 1992; 21:405-416.

82. Bekesy GV. Experiments in hearing. USA: McGraw-Hill Book Company, 1960.

83. Mershon DH, King LE. Intensity and reverberation as factors in the auditory perception of egocentric distance. Percept Psychophys 1975; 18:409-415.

84. Mershon DH, Ballenger WL, Little AD et al. Effects of room reflectance and background noise on perceived auditory distance. Perception 1989; 18:403-416.

85. Saberi K, Perrott DR. Minimum audible movement angles as a function of sound source trajectory. J Acoust Soc Am 1990; 88:2639-2644.

86. Plenge G. On the differences between localization and lateralization. J Acoust Soc Am 1974; 56:944-951.

87. Sayers BM, Cherry EC. Mechanism of binaural fusion in the hearing of speech. J Acoust Soc Am 1957; 29:973-987.

88. Dye RH, Yost WA, Stellmack MA et al. Stimulus classification procedure for assessing the extent to which binaural processing is spectrally analytic or synthetic. J Acoust Soc Am 1994; 96:2720-2730.

89. Zerlin S. Interaural time and intensity difference and the MLD. J Acoust Soc Am 1966; 39:134-137.

90. Hafter ER. Spatial hearing and the duplex theory: how viable is the model? New York: John Wiley and Sons, 1984.

91. McFadden D, Pasanen EG. Lateralization at high frequencies based on interaural time differences. J Acoust Soc Am 1976; 59:634-639.

92. McFadden D, Moffitt CM. Acoustic integration for lateralization at high frequencies. J Acoust Soc Am 1977; 61:1604-1608.

93. Blauert J. Binaural localization. Scand Audiol 1982; Suppl. 15:7-26.

94. Poon PWF, Hwang JC, Yu WY et al. Detection of interaural time difference for clicks and tone pips: effects of interaural signal disparity. Hear Res 1984; 15:179-185.

95. Tobias JV, Schubert ED. Effective onset duration of auditory stimuli. J Acoust Soc Am 1959; 31:1595-1605.

96. Tobias JV, Zerlin S. Lateralization threshold as a function of stimulus duration. J Acoust Soc Am 1959; 31:1591-1594.

97. Yost WA. Lateralization of pulsed sinusoids based on interaural onset, ongoing, and offset temporal differences. J Acoust Soc Am 1977; 61:190-194.

98. Wallach H, Newman EB, Rosenzweig MR. The precedence effect in sound localization. Am J Psych 1949; 62:315-337.

99. Zurek PM. The precedence effect. In: Yost WA, Gourevitch G, ed. Directional hearing. New York: Springer-Verlag, 1987:85-105.

100. Moore BCJ. An introduction to the psychology of hearing. 3rd Edition. London: Academic Press, 1989.

101. Hartmann WM, Rakerd B. On the minimum audible angle–A decision theory approach. J Acoust Soc Am 1989; 85:2031-2041.

102. Berkley DA. Hearing in rooms. In: Yost WA, Gourevitch G, ed. Directional hearing. New York: Springer-Verlag, 1987:249-260.

103. Hartmann WM. Localization of sound in rooms. J Acoust Soc Am 1983; 74:1380-1391.

104. Johnson DH. The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones. J Acoust Soc Am 1980; 68:1115-1122.

105. Yin TCT, Chan JCK. Neural mechanisms underlying interaural time sensitivity to tones and noise. In: Edelman GM, Gall WE, Cowan WM, ed. Auditory function: Neurobiological basis of hearing. New York: John Wiley and Sons, 1988:385-430.

106. Irvine DRF. Physiology of the auditory brainstem. In: Popper AN, Fay RR, ed. The mammalian auditory pathway: Neurophysiology. New York: Springer-Verlag, 1992:153-231.

107. Brown CH, Beecher MD, Moody DB et al. Localization of pure tones by old world monkeys. J Acoust Soc Am 1978; 63:1484-1492.

108. Irvine DRF. The auditory brainstem. Berlin: Springer-Verlag, 1986.

109. Sanes D. An in vitro analysis of sound localization mechanisms in the gerbil lateral superior olive. J Neurosci 1990; 10:3494-3506.

110. Wu SH, Kelly JB. Binaural interaction in the lateral superior olive: Time difference sensitivity studied in mouse brain slice. J Neurophysiol 1992; 68:1151-1159.

111. Jeffress LA. A place theory of sound localization. J Comp Physiol Psychol 1948; 41:35-39.

112. Goldberg JM, Brown PB. Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization. J Neurophysiol 1969; 32:613-636.

113. Yin TCT, Chan JCK, Carney LH. Effects of interaural time delays of noise stimuli on low-frequency cells in the cat's inferior colliculus. III. Evidence for cross-correlation. J Neurophysiol 1987; 58:562-583.

114. Smith PH, Joris PX, Yin TC. Projections of physiologically characterized spherical bushy cell axons from the cochlear nucleus of the cat: evidence for delay lines to the medial superior olive. J Comp Neurol 1993; 331:245-260.

115. Yin HS, Mackler SA, Selzer ME. Directional specificity in the regeneration of Lamprey spinal axons. Science 1984; 224:894-895.

116. Crow G, Langford TL, Mousegian G. Coding of interaural time differences by some high-frequency neurons of the inferior colliculus: Responses to noise bands and two-tone complexes. Hear Res 1980; 3:147-153.

117. Joris PX, Yin TCT. Envelope coding in the lateral superior olive. I. Sensitivity to interaural time differences. J Neurophysiol 1995; 73:1043-1062.

118. Mills AW. Lateralization of high-frequency tones. J Acoust Soc Am 1960; 32:132-134.

119. Mills AW. On the minimum audible angle. J Acoust Soc Am 1958; 30:237-246.

120. Hirsch JA, Chan JCK, Yin TCT. Responses of neurons in the cat's superior colliculus to acoustic stimuli. I. Monaural and binaural response properties. J Neurophysiol 1985; 53:726-745.

121. Wise LZ, Irvine DRF. Interaural intensity difference sensitivity based on facilitatory binaural interaction in cat superior colliculus. Hear Res 1984; 16:181-188.

122. Semple MN, Kitzes LM. Binaural processing of sound pressure level in cat primary auditory cortex: Evidence for a representation based on absolute levels rather than the interaural level difference. J Neurophysiol 1993; 69.

123. Irvine DRF, Rajan R, Aitkin LM. Sensitivity to interaural intensity differences of neurones in primary auditory cortex of the cat: I. Types of sensitivity and effects of variations in sound pressure level. J Neurophysiol 1996; (in press).

124. Irvine DRF, Gago G. Binaural interaction in high frequency neurones in inferior colliculus in the cat: Effects of the variation in sound pressure level on sensitivity to interaural intensity differences. J Neurophysiol 1990; 63.

125. Searle CL, Braida LD, Davis MF et al. Model for auditory localization. J Acoust Soc Am 1976; 60:1164-1175.

126. Spiegel MF, Green DM. Signal and masker uncertainty with noise maskers of varying duration, bandwidth, and center frequency. J Acoust Soc Am 1982; 71:1204-1211.

127. Moore BC, Oldfield SR, Dooley GJ. Detection and discrimination of spectral peaks and notches at 1 and 8 kHz. J Acoust Soc Am 1989; 85:820-836.

128. Green DM. Auditory profile analysis: some experiments on spectral shape discrimination. In: Edelman GM, Gall WE, Cowan WM, ed. Auditory function: Neurobiological basis of hearing. New York: John Wiley and Sons, 1988:609-622.

129. Green DM. Profile analysis: Auditory intensity discrimination. New York: Oxford University Press, 1988.

130. Glasberg BR, Moore BC. Derivation of auditory filter shapes from notched-noise data. Hear Res 1990; 47:103-138.

131. Patterson RD. The sound of a sinusoid: time-interval models. J Acoust Soc Am 1994; 96:1419-1428.

132. Patterson RD. Auditory filter shapes derived with noise stimuli. J Acoust Soc Am 1976; 59:640-654.

133. Rosen S, Baker RJ. Characterising auditory filter nonlinearity. Hear Res 1994; 73:231-243.

134. Patterson RD, Moore BCJ. Auditory filters and excitation patterns as representations of frequency resolution. In: Moore BCJ, ed. Frequency selectivity in hearing. London: Academic Press, 1986:123-177.

135. Moore BCJ, Glasberg BR. Formulae describing frequency selectivity as a function of frequency and level, and their use in calculating excitation patterns. Hear Res 1987; 28:209-225.

136. Irvine DRF, Park VN, Mattingly JB. Responses of neurones in the inferior colliculus of the rat to interaural time and intensity differences in transient stimuli: implications for the latency hypothesis. Hear Res 1995; 85:127-141.
