Natural variations in speech intelligibility: An fMRI study
Han-Gyol Yi1, Rajka Smiljanic2, Bharath Chandrasekaran1
1 SoundBrainLab (http://soundbrainlab.wordpress.com), Department of Communication Sciences and Disorders
2 UT SoundLab (http://utsoundlab.wordpress.com), Department of Linguistics
The University of Texas at Austin, Austin, TX, United States
INTRODUCTION
Every day we encounter speech of varying intelligibility due to talker-dependent factors such as non-native accents [3,4,6] and talker-independent factors such as noise or the incorporation of visual cues [7,8]. Despite this inherent variability, everyday conversations are largely unimpeded. How does the brain resolve natural variations in speech intelligibility?
Studies on intelligibility have focused on stimuli that are controlled in acoustical properties yet reduced in intelligibility, such as spectrally rotated speech. This body of research has mostly indicated the left anterior superior temporal sulcus as the main area for intelligibility processing. This contrasts with clinical studies that indicate the posterior region of the left superior temporal sulcus as the main lesion site of receptive aphasia [1].
However, the spectrally rotated speech signal, albeit providing good acoustic control, does not resemble the types of signal that must be resolved in real life. For instance, there is no guarantee that the cortical areas behind rotated speech processing are equivalent to those involved in non-native speech perception, which is becoming increasingly prevalent owing to the growing diversity of society.
In this fMRI study, participants listened to sentences produced by native and non-native speakers, presented with or without visual cues of face movements.
REFERENCES
1. Abrams, D. A., Ryali, S., Chen, T., Balaban, E., Levitin, D. J., & Menon, V. (2012), ‘Multivariate activation and connectivity patterns discriminate speech intelligibility in Wernicke’s, Broca’s, and Geschwind’s Areas’, Cerebral Cortex, Epub ahead of print.
2. Adank, P., Noordzij, M. L., & Hagoort, P. (2012), ‘The role of planum temporale in processing accent variation in spoken language comprehension’, Human Brain Mapping, vol. 33, pp. 361-372.
3. Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996), ‘Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics’, Speech Communication, vol. 20, no. 3-4, pp. 255-272.
4. Bradlow, A. R., & Bent, T. (2008), ‘Perceptual adaptation to non-native speech’, Cognition, vol. 106, no. 2, pp. 707-729.
5. Hickok, G., & Poeppel, D. (2007), ‘The cortical organization of speech processing’, Nature Reviews Neuroscience, vol. 8, pp. 393-402.
6. Smiljanic, R., & Bradlow, A. (2009), ‘Speaking and hearing clearly: Talker and listener factors in speaking style changes’, Language and Linguistics Compass, vol. 3, no. 1, pp. 236-264.
7. Sumby, W. H., & Pollack, I. (1954), ‘Visual contribution to speech intelligibility in noise’, Journal of the Acoustical Society of America, vol. 26, no. 2, pp. 212-215.
8. Wong, P. C. M., Uppunda, A. K., Parrish, T. B., & Dhar, S. (2008), ‘Cortical mechanisms of speech perception in noise’, Journal of Speech, Language, and Hearing Research, vol. 51, pp. 1026-1041.
9. Worsley, K. J. (2001), ‘Statistical analysis of activation images’, Functional MRI: An Introduction to Methods, ch. 14, eds. P. Jezzard, P.M. Matthews, & S.M. Smith, OUP.
METHODS
Scanning parameters and analysis. Young adult (ages 18 to 35) monolingual native English speakers (N = 20; 12 f) were scanned in a Siemens Magnetom Skyra 3T MRI scanner at the Imaging Research Center of the University of Texas at Austin. T1 images were obtained via an MPRAGE sequence at a resolution of 1.0 x 1.0 x 1.0 mm. T2* images were obtained via an EPI sequence (flip angle = 60°) at a spatial resolution of 2.0 x 2.0 x 2.0 mm, with 36 slices, a 50% distance factor, TR = 1.8 s, and TE = 30 ms. BOLD data were processed using FMRI Expert Analysis Tool Version 6.00, part of FMRIB’s Software Library. Each participant’s data were linearly registered first to the T1 structural scan and then to the MNI152 template. Statistic images were thresholded using clusters determined by Z > 2.3 and a corrected cluster significance threshold of P = 0.05 [9].
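The cluster-forming step described above can be illustrated with a toy example. The sketch below is not the FSL implementation: it only shows what "clusters determined by Z > 2.3" means on a small, made-up 2-D map, using 4-connectivity as a simplifying assumption (real data are 3-D volumes). The corrected cluster significance test itself is not reproduced here. The function names and the toy map are hypothetical.

```python
import math
from collections import deque


def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def suprathreshold_clusters(zmap, z_thresh=2.3):
    """Return sizes (largest first) of 4-connected clusters with Z > z_thresh.

    zmap is a list of equal-length rows; this is a 2-D toy stand-in for a
    3-D statistic image.
    """
    rows, cols = len(zmap), len(zmap[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or zmap[r][c] <= z_thresh:
                continue
            # Breadth-first flood fill over neighbouring suprathreshold cells.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and zmap[ny][nx] > z_thresh):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


# Made-up Z values for illustration only (not study data).
toy_map = [
    [0.1, 2.5, 2.9, 0.0],
    [0.3, 3.1, 0.2, 0.0],
    [0.0, 0.4, 0.1, 2.4],
]
cluster_sizes = suprathreshold_clusters(toy_map)
voxelwise_p = one_tailed_p(2.3)
```

In the toy map, three adjacent cells form one cluster and an isolated cell forms another; in an actual analysis, each cluster's extent would then be tested against the corrected cluster threshold.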
fMRI Task 1: Speech vs. Rotated Speech
Stimuli
• 56 English sentences
• Spoken by a female native English speaker
• Low-pass filtered at 4 kHz
• 28 presented intact; 28 presented after spectral rotation at 4 kHz
Design
• Four blocks per stimulus type; seven sentences per block
Task
• After each block, button press: 1 (intelligible) or 2 (unintelligible)
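Spectral rotation inverts the spectrum of a band-limited signal so that low frequencies become high and vice versa. The sketch below is an idealized illustration, not the procedure used to build the stimuli (which the poster does not specify). It assumes the signal is sampled at 8 kHz, so that the 4 kHz band edge equals the Nyquist frequency; under that assumption, flipping the sign of every other sample maps a component at f Hz to 4000 - f Hz. Typical implementations instead multiply by a sinusoid and low-pass filter again. All names here are hypothetical.

```python
import cmath
import math

FS = 8000  # assumed sampling rate in Hz; Nyquist (4 kHz) is the band edge


def rotate_spectrum(samples):
    """Invert the spectrum of a signal band-limited to FS/2 (f -> FS/2 - f).

    Multiplying by (-1)**n equals modulating with a cosine at FS/2.
    """
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]


def dominant_frequency(samples, fs=FS):
    """Frequency (Hz) of the largest-magnitude DFT bin up to Nyquist."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n


# A 1 kHz test tone (400 samples = 50 ms) stands in for speech.
tone = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(400)]
rotated = rotate_spectrum(tone)
```

Because the rotated signal keeps the overall spectral and temporal complexity of the original while scrambling its phonetic content, it is commonly used as an unintelligible baseline.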
fMRI Task 2: Native vs. Non-Native Speech with Visual Cues
Stimuli
• 80 English sentences
• Spoken by native (N=4; 2 f) or non-native (N=4; 2 f; native Korean speakers) English speakers
• 40 presented with visual cues; 40 presented without visual cues (fixation cross)
Design
• Event-related design with pseudorandomized sequence
Task
• After each sentence, button press on a 1-to-4 scale (1: unintelligible; 4: intelligible)
Behavioral Analogue
• Participants completed a similar task outside the scanner
• Multi-talker babble noise (SNR: -12 dB)
• Typed perceived sentences; scored on proportion of keywords reported correctly
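Two steps of the behavioral analogue lend themselves to a short sketch: mixing speech with babble at a target SNR, and scoring typed responses by keywords. The code below is a minimal illustration under stated assumptions, not the scoring or mixing procedure actually used: it assumes SNR is defined on RMS amplitude, and that a keyword counts as correct only on an exact, case-insensitive word match (no credit for misspellings or morphological variants). Function names and the example sentence are hypothetical.

```python
import math
import string


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain for the noise so that 20*log10(rms(speech)/rms(gain*noise)) == snr_db."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20))


def keyword_proportion_correct(keywords, response):
    """Proportion of target keywords present in the typed response.

    Matching ignores case and punctuation but requires the exact word.
    """
    strip_punct = str.maketrans("", "", string.punctuation)
    typed = set(response.lower().translate(strip_punct).split())
    hits = sum(1 for kw in keywords if kw.lower() in typed)
    return hits / len(keywords)


# Toy signals and a made-up sentence, for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
babble = [0.5, -0.5, 0.5, -0.5]
gain = noise_gain_for_snr(speech, babble, snr_db=-12)
score = keyword_proportion_correct(["dog", "chased", "cat"],
                                   "The dog chased a rat.")
```

At -12 dB the babble is roughly four times the RMS amplitude of the speech, which is consistent with the low proportions of words correct in the behavioral results.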
RESULTS
Behavioral Indices of Speech Perception
Left: In the scanner, participants judged native sentences to be more intelligible than non-native sentences (p<.05).
Right: Outside the scanner, participants comprehended (a) native sentences more accurately than non-native sentences (p<.001) and (b) sentences with visual cues better than those without (p<.001); (c) this visual enhancement was more pronounced for native than for non-native sentences (p<.005).
Neural Correlates of Speech Perception Tasks
DISCUSSION
Validity of the Speech vs. Rotated Speech Comparison
This contrast yielded the entire language network, spanning both the dorsal (articulatory) and ventral (comprehension) streams [5]. This is indicative not of intelligibility processing proper but of language processing in general, as rotated speech constitutes a non-linguistic stimulus and is not comparable to actual speech in terms of semantics, syntax, or phonology.
Neural Correlates for Native vs. Non-Native Speech Perception
Non-native speech, judged less intelligible, recruited both the dorsal and ventral streams of the language network, since more processing is necessary for challenging tasks. This activity was comparable to the traditional Speech>Rotated contrast. In contrast, native speech processing, which should be more automatic and effortless, involved limited recruitment of posterior portions of the right temporo-occipital junction.
Role of Visual Cues
Visual cues aided speech processing [7], more so for native than for non-native speech. The reduced extent of neural involvement for the non-native>native contrast with visual cues indicates that neural efficiency became more equivalent across the two speech types as more external cues were available.
Future Directions
1. Assess sources of individual variability in the degree of visual cue incorporation and native bias in speech intelligibility (McGurk effect susceptibility; Implicit Association Task).
2. Conduct functional connectivity analysis to complement the univariate subtraction approach in assessing the nature of dual-stream activation in speech perception.
ACKNOWLEDGEMENTS
The authors would like to thank Kirsten Smayda, Jasmine E. B. Phelps, and Rachael Gilbert for significant contributions to data collection and processing, and the faculty and staff of the Imaging Research Center at the University of Texas at Austin for technical support and counsel.
[Figure: Behavioral results. Left panel: Intelligibility Ratings (1 to 4) for Native and Non-Native talkers. Right panel: Proportion of Words Correct (0.0 to 0.5) for Native and Non-Native talkers. Legend for both panels: No Visual Cues, With Visual Cues.]
A repeated-measures ANOVA with two within-subjects factors (visual cues; nativeness of talkers) was run on the intelligibility ratings and on the proportion of words correct in the sentence perception tasks conducted inside and outside the scanner.
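For a 2 x 2 fully within-subjects design such as this one, each ANOVA effect has one numerator degree of freedom, so its F statistic equals the squared paired t statistic computed on a per-participant contrast score. The sketch below uses that equivalence; it is an illustration only, since the poster does not report the software used or the F values. The cell labels, function names, and toy scores are hypothetical and are not study data.

```python
from statistics import mean, variance


def contrast_f(scores):
    """F(1, n-1) for testing that the mean contrast score is zero (= t squared)."""
    n = len(scores)
    return n * mean(scores) ** 2 / variance(scores)


def rm_anova_2x2(cells):
    """2 x 2 repeated-measures ANOVA via per-participant contrasts.

    cells: one dict per participant keyed by (talker, cue), where talker is
    "native" or "nonnative" and cue is "none" or "visual".
    Returns {effect: (F, (df_numerator, df_denominator))}.
    """
    talker, cue, interaction = [], [], []
    for c in cells:
        n0, n1 = c[("native", "none")], c[("native", "visual")]
        f0, f1 = c[("nonnative", "none")], c[("nonnative", "visual")]
        talker.append((n0 + n1) / 2 - (f0 + f1) / 2)   # native minus non-native
        cue.append((n1 + f1) / 2 - (n0 + f0) / 2)      # visual minus none
        interaction.append((n1 - n0) - (f1 - f0))      # difference of cue benefits
    df = (1, len(cells) - 1)
    return {
        "talker": (contrast_f(talker), df),
        "cue": (contrast_f(cue), df),
        "talker_x_cue": (contrast_f(interaction), df),
    }


# Made-up scores for three hypothetical participants (arbitrary units).
toy_cells = [
    {("native", "none"): 3, ("native", "visual"): 5,
     ("nonnative", "none"): 1, ("nonnative", "visual"): 2},
    {("native", "none"): 4, ("native", "visual"): 6,
     ("nonnative", "none"): 2, ("nonnative", "visual"): 2},
    {("native", "none"): 2, ("native", "visual"): 5,
     ("nonnative", "none"): 1, ("nonnative", "visual"): 3},
]
anova_table = rm_anova_2x2(toy_cells)
```

A positive interaction contrast corresponds to a larger visual-cue benefit for native than for non-native talkers. The sketch does not handle the degenerate case in which every participant has an identical contrast score (zero variance).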
Task 1: Rotated Speech
Normal>Rotated Speech
Ventral stream of speech processing
• Bilateral anterior to posterior superior temporal sulcus
• Combinatorial network and lexical interface
Dorsal stream of speech processing
• Left supramarginal gyrus, motor area, and inferior frontal gyrus
• Sensorimotor interface and articulatory network
Task 2: No Visual Cues
Native>Non-Native Speech
Sensorimotor interface
• Right angular gyrus
Lexical interface
• Right posterior middle temporal gyrus
Non-Native>Native Speech
Ventral stream of speech processing
• Bilateral anterior to posterior superior temporal sulcus
• Combinatorial network and lexical interface
Dorsal stream of speech processing
• Bilateral inferior frontal gyri, right superior parietal lobule
• Sensorimotor interface and articulatory network
Task 3: With Visual Cues
Native>Non-Native Speech
Sensorimotor interface
• Right angular gyrus
Lexical interface
• Right posterior middle temporal gyrus
Non-Native>Native Speech
Dorsal stream of speech processing
• Left inferior frontal gyri, right superior parietal lobule
• Sensorimotor interface and articulatory network
SUMMARY
Behavioral Results
1. Native speech is more intelligible and comprehensible than non-native speech.
2. Visual cues benefit native speech more than non-native speech.
fMRI Results
1. The artificial intelligibility (rotated speech) contrast yields the entire language network.
2. The non-native speech contrast with no visual cues also yields the entire bilateral language network.
3. Native speech processing includes limited involvement of the sensorimotor and lexical interfaces.
4. With no visual cues, the anterior superior temporal sulcus (indicated in speech>rotated) is activated in the non-native>native rather than the native>non-native contrast.
5. With visual cues, less neural activation is required to process non-native speech.
Box 1: The Dual-Stream Model of the Functional Anatomy of Language [5]
• Native vs. non-native speakers’ sentences
• With or without visual cues of face movements
Our aims are to (a) demonstrate the neural correlates for processing of natural variations in intelligibility, (b) assess the role of visual cues, and (c) ultimately resolve the lack of consensus on the nature of the speech intelligibility region in the brain.