Post on 17-Dec-2015

Effectiveness of spatial cues, prosody, and talker characteristics in selective attention

C.J. Darwin & R.W. Hukin

Background

• Spatial attention often the focus of studies of the cocktail party effect

• But humans can separate sources that aren’t separated in space

• What other aspects of the speech signal are useful for source separation?
– Pitch contour?
– Individual characteristics?
– A combination of characteristics?

Aims

• Characterize the role of natural prosody in sound source localization

• Characterize the role of vocal-tract size in sound source localization

Methods

• 13 listeners (21-52 yrs)
• “Could you PLEASE write the word bead/globe down NOW?” / “You’ll ALSO hear the sound bead/globe played HERE”
– Target word onsets aligned
– Target word duration matched
– Similar phrase durations

You will ALSO hear the sound globe played here

You will also hear the sound globe played HERE

Could you please write the word bead down NOW

Could you PLEASE write the word bead down now

You will ALSO hear the sound globe played here

Could you please write the word bead down NOW

Swapped…

Could you PLEASE write the word bead down now

You will also hear the sound globe played HERE

Methods

• Three pitch conditions
– Original
– Together (equalize target word F0s)
– Apart (shift target word F0s apart)

• Two splicing methods
– Normal (prosodic cues reinforce spatial)
– Swapped (prosodic cues oppose spatial)

• ITDs
– 0, ±45.3, ±90.7 µs

• 144 trials heard 5 times each (720 trials)
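
The ITD values listed above fall very close to whole-sample delays at a 44.1 kHz sampling rate. The slides do not state the sampling rate, so the rate and the function names in this sketch are assumptions for illustration only. It shows one simple way to convert an ITD to a sample delay and impose it on a two-channel signal.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: the slides do not state the sampling rate


def itd_to_samples(itd_us, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert an ITD in microseconds to a (possibly fractional) sample count."""
    return itd_us * 1e-6 * sample_rate_hz


def apply_itd(mono, delay_samples):
    """Return (left, right) channels with a whole-sample interaural delay.

    A positive delay lags the left channel (the right ear leads);
    a negative delay lags the right channel. Both outputs have equal length.
    """
    pad = [0.0] * abs(delay_samples)
    signal = list(mono)
    if delay_samples >= 0:
        return pad + signal, signal + pad
    return signal + pad, pad + signal


if __name__ == "__main__":
    for itd_us in (0.0, 45.3, 90.7):
        print(itd_us, "µs ->", round(itd_to_samples(itd_us), 3), "samples")
```

Under the assumed rate, rounding `itd_to_samples` to the nearest integer gives the delay to pass to `apply_itd`; a different sampling rate would need fractional-delay interpolation instead.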

You will ALSO hear the sound globe played here / Could you please write the word BEAD down now

Results: ITD = 0

• Normal: Select target with matching prosody (83%)

• Swapped: Lower accuracy (69%)

In the absence of other cues, listeners can use natural F0 contour to track a sentence

Results: ITD ≠ 0

• Normal: Improved accuracy (93%)

• Swapped: chance selection with an ITD of ±45.3 µs

• With an ITD of ±90.7 µs, listeners report the target word that shares the target sentence’s ITD

Results: ITD ≠ 0

• Apart condition strengthens prosodic cues: higher chance of reporting the target with the same prosody as the target sentence

• Together condition weakens prosodic cues

ITD cues dominate, but natural prosody can help direct listeners’ attention

Experiment 2

• Changed spectral envelope by 15%
– Formant frequencies changed
– Voice source characteristics changed
– F0 unchanged

• Produced 2 apparently different talkers

• ITD 0, ±45.3, ±90.7, ±181.4 µs

• Different vocal tract sizes have a large effect

• Even with large ITDs and swapped condition, listeners prefer original target word (73%)

Experiment 3

• Fixed ITD ±90.7 µs

• Vocal tract size changes of ±2, ±4, ±8, ±15%

• A ±8% size difference is comparable to that between males and females

• Little significant change arises for vocal tract length changes below ±8%
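
One rough way to see why a vocal tract size change moves the formants is the textbook uniform-tube approximation, in which a tube closed at one end resonates at odd multiples of c / 4L, so resonances scale inversely with length. This is not the manipulation used in the experiments (the slides describe a spectral envelope change). The base tract length, the speed of sound, and the function names below are all assumptions chosen for illustration, as is reading each percentage as a change in tube length.

```python
SPEED_OF_SOUND_M_S = 350.0   # assumption: approximate speed of sound in warm, moist air
BASE_TRACT_LENGTH_M = 0.175  # assumption: a commonly quoted adult vocal tract length


def tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND_M_S):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


def formants_after_length_change(percent_change, base_length_m=BASE_TRACT_LENGTH_M):
    """Formants after lengthening (+) or shortening (-) the tube by a percentage."""
    return tube_formants(base_length_m * (1.0 + percent_change / 100.0))


if __name__ == "__main__":
    # Size changes listed for Experiment 3, read here as length changes (assumption).
    for pct in (-15, -8, -4, -2, 0, 2, 4, 8, 15):
        rounded = [round(f) for f in formants_after_length_change(pct)]
        print(f"{pct:+d}%: {rounded} Hz")
```

In this simplified model every formant is scaled by the same factor, 1 / (1 + change), so a longer tube lowers all formants together and a shorter one raises them.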

Conclusions

• Natural prosodic variations more effectively override spatial cues than monotone F0

• Vocal tract size changes ≥ average male/female differences can override spatial cues

Things to consider

• Natural cues?
• Natural setting?
• In a natural environment are these cues ever pitted against one another?
• What are listeners really attending to? Can we really conclude that more attention is being paid to ITD than to prosody?

But is the vocal tract modification of realistic proportions?