Natural variations in speech intelligibility: An fMRI study

Han-Gyol Yi¹, Rajka Smiljanic², Bharath Chandrasekaran¹

¹SoundBrain Lab (http://soundbrainlab.wordpress.com), Department of Communication Sciences and Disorders
²UTSoundLab (http://utsoundlab.wordpress.com), Department of Linguistics
The University of Texas at Austin, Austin, TX, United States

INTRODUCTION

Every day we encounter speech of varying intelligibility due to talker-dependent factors such as non-native accents³,⁴,⁶ and talker-independent factors such as noise or the incorporation of visual cues⁷,⁸. Despite this inherent variability, everyday conversations proceed largely unimpeded. How does the brain resolve natural variations in speech intelligibility?

Studies of intelligibility have typically used stimuli that are acoustically controlled yet reduced in intelligibility, such as spectrally rotated speech. This body of research has mostly implicated the left anterior superior temporal sulcus as the main area for intelligibility processing. This contrasts with clinical studies that identify the posterior region of the left superior temporal sulcus as the main lesion site in receptive aphasia¹.

However, the spectrally rotated speech signal, although it provides good acoustic control, does not resemble the kinds of signals that must be resolved in real life. For instance, there is no guarantee that the cortical areas engaged by rotated speech are the same as those involved in non-native speech perception, which is becoming increasingly common as society grows more linguistically diverse.

In this fMRI study, participants listened to:
• Native vs. non-native speakers' sentences
• With or without visual cues of face movements

Our aims are to (a) demonstrate the neural correlates of processing natural variations in intelligibility, (b) assess the role of visual cues, and (c) ultimately help resolve the lack of consensus on the nature of the speech intelligibility region in the brain.

REFERENCES

1. Abrams, D. A., Ryali, S., Chen, T., Balaban, E., Levitin, D. J., & Menon, V. (2012), ‘Multivariate activation and connectivity patterns discriminate speech intelligibility in Wernicke’s, Broca’s, and Geschwind’s Areas’, Cerebral Cortex, Epub ahead of print.
2. Adank, P., Noordzij, M. L., & Hagoort, P. (2012), ‘The role of planum temporale in processing accent variation in spoken language comprehension’, Human Brain Mapping, vol. 33, pp. 361-372.
3. Bradlow, A. R., Torretta, G. M., & Pisoni, D. B. (1996), ‘Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics’, Speech Communication, vol. 20, no. 3-4, pp. 255-272.
4. Bradlow, A. R., & Bent, T. (2008), ‘Perceptual adaptation to non-native speech’, Cognition, vol. 106, no. 2, pp. 707-729.
5. Hickok, G., & Poeppel, D. (2007), ‘The cortical organization of speech processing’, Nature Reviews Neuroscience, vol. 8, pp. 393-402.
6. Smiljanic, R., & Bradlow, A. (2009), ‘Speaking and hearing clearly: Talker and listener factors in speaking style changes’, Language and Linguistics Compass, vol. 3, no. 1, pp. 236-264.
7. Sumby, W. H., & Pollack, I. (1954), ‘Visual contribution to speech intelligibility in noise’, Journal of the Acoustical Society of America, vol. 26, no. 2, pp. 212-215.
8. Wong, P. C. M., Uppunda, A. K., Parrish, T. B., & Dhar, S. (2008), ‘Cortical mechanisms of speech perception in noise’, Journal of Speech, Language, and Hearing Research, vol. 51, pp. 1026-1041.
9. Worsley, K. J. (2001), ‘Statistical analysis of activation images’, in Functional MRI: An Introduction to Methods, ch. 14, eds. P. Jezzard, P. M. Matthews, & S. M. Smith, Oxford University Press.


METHODS

Scanning parameters and analysis. Young adult (ages 18-35) monolingual native English speakers (N = 20; 12 female) were scanned in a Siemens Magnetom Skyra 3T MRI scanner at the Imaging Research Center of the University of Texas at Austin. T1-weighted images were acquired with an MPRAGE sequence at 1.0 × 1.0 × 1.0 mm resolution. T2*-weighted images were acquired with an EPI sequence (flip angle = 60°) at 2.0 × 2.0 × 2.0 mm spatial resolution, 36 slices with a 50% distance factor, TR = 1.8 s, TE = 30 ms. BOLD data were processed using the FMRI Expert Analysis Tool (FEAT) Version 6.00, part of FMRIB's Software Library. Each participant's data were linearly registered first to the T1 structural scan and then to the MNI152 template. Statistic images were thresholded using clusters determined by Z > 2.3 and a corrected cluster significance threshold of P = 0.05⁹.
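FEAT performs these registration and thresholding steps internally; purely as an illustration, the sketch below shows the equivalent operations with FSL's command-line tools (flirt, convert_xfm, smoothest, cluster) driven from Python. The file names (bold_mean.nii.gz, t1_brain.nii.gz, zstat1.nii.gz, mask.nii.gz) are hypothetical placeholders, and this is a minimal sketch rather than the analysis pipeline actually used here.

```python
# Minimal sketch of the FSL steps described above: linear registration to the T1
# scan, then to MNI152, and GRF-based cluster thresholding at Z > 2.3, P = 0.05.
# File names are hypothetical placeholders; paths assume a standard FSL install.
import os
import subprocess

FSLDIR = os.environ.get("FSLDIR", "/usr/local/fsl")
MNI152 = os.path.join(FSLDIR, "data/standard/MNI152_T1_2mm_brain.nii.gz")

def run(cmd):
    """Run an FSL command-line tool and echo it for the record."""
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) Functional -> structural (T1) linear registration.
run(["flirt", "-in", "bold_mean.nii.gz", "-ref", "t1_brain.nii.gz",
     "-omat", "func2struct.mat", "-dof", "6"])

# 2) Structural -> MNI152 linear registration.
run(["flirt", "-in", "t1_brain.nii.gz", "-ref", MNI152,
     "-omat", "struct2mni.mat", "-dof", "12"])

# 3) Concatenate the transforms so functional results can be resampled to MNI space.
run(["convert_xfm", "-omat", "func2mni.mat",
     "-concat", "struct2mni.mat", "func2struct.mat"])

# 4) Estimate the smoothness of the statistic image, then apply cluster correction
#    (clusters defined by Z > 2.3, corrected cluster P < 0.05), as FEAT does.
smoothest = subprocess.run(
    ["smoothest", "-z", "zstat1.nii.gz", "-m", "mask.nii.gz"],
    capture_output=True, text=True, check=True)
est = {}
for line in smoothest.stdout.splitlines():
    parts = line.split()
    if len(parts) >= 2 and parts[0] in ("DLH", "VOLUME", "RESELS"):
        est[parts[0]] = parts[1]

run(["cluster", "--in=zstat1.nii.gz", "--thresh=2.3", "--pthresh=0.05",
     f"--dlh={est['DLH']}", f"--volume={est['VOLUME']}",
     "--othresh=thresh_zstat1.nii.gz"])
```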

fMRI Task 1: Speech vs. Rotated Speech

Stimuli
• 56 English sentences
• Spoken by a female native English speaker
• Low-pass filtered at 4 kHz
• 28 presented intact; 28 presented after spectral rotation at 4 kHz (illustrated in the sketch below)
Design
• Four blocks per stimulus type; seven sentences per block
Task
• After each block, button press: 1 (intelligible) or 2 (unintelligible)
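For illustration, the sketch below shows one common way to produce a spectrally rotated stimulus from a low-pass-filtered sentence: mirror the spectrum within the 0-4 kHz band so the signal keeps its bandwidth and overall energy but loses intelligibility. This is a minimal sketch under that assumption (FFT-based mirroring), not necessarily the exact procedure used to create these stimuli.

```python
# Minimal sketch: low-pass filter at 4 kHz, then spectrally rotate the signal by
# mirroring the spectrum within the 0-4 kHz band. This approximates the kind of
# rotated-speech stimulus described above; it is not the authors' exact procedure.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_4khz(signal, fs, cutoff=4000.0, order=6):
    """Zero-phase low-pass filter at the given cutoff frequency (Hz)."""
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)

def spectrally_rotate(signal, fs, band=4000.0):
    """Mirror the spectrum below `band` Hz (0 Hz <-> band), keeping band energy."""
    n = len(signal)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = freqs <= band
    spec[in_band] = spec[in_band][::-1]   # flip the 0-4 kHz portion of the spectrum
    return np.fft.irfft(spec, n)

# Usage (hypothetical 16 kHz sentence waveform):
# fs, x = 16000, load_sentence_waveform()   # load_sentence_waveform is hypothetical
# rotated = spectrally_rotate(lowpass_4khz(x, fs), fs)
```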

fMRI Task 2: Native vs. Non-Native Speech with Visual Cues

Stimuli
• 80 English sentences
• Spoken by native (N = 4; 2 female) or non-native (N = 4; 2 female; native Korean speakers) English speakers
• 40 presented with visual cues; 40 presented without visual cues (fixation cross)
Design
• Event-related design with a pseudorandomized sequence
Task
• After each sentence, button press on a 1-to-4 scale (1: unintelligible; 4: intelligible)
Behavioral Analogue
• Participants completed a similar task outside the scanner
• Sentences were presented in multi-talker babble noise (SNR: -12 dB)
• Participants typed the perceived sentences; responses were scored as the proportion of keywords reported correctly (see the mixing and scoring sketch below)
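Two small computations underlie the behavioral analogue: scaling the multi-talker babble so that the speech-to-babble ratio is -12 dB, and scoring each typed response as the proportion of keywords reported correctly. The sketch below illustrates both; the function names, the assumption of float waveform arrays, and the exact keyword-matching rule are illustrative assumptions rather than the authors' scoring protocol.

```python
# Minimal sketch of the behavioral analogue described above:
# (1) mix a sentence with multi-talker babble at a target SNR of -12 dB,
# (2) score a typed response as the proportion of keywords reported correctly.
# Names and the matching rule are illustrative assumptions.
import numpy as np

def mix_at_snr(speech, babble, snr_db=-12.0):
    """Scale `babble` so 10*log10(P_speech / P_babble) equals `snr_db`, then mix.

    Assumes float waveform arrays and a babble recording at least as long as the
    sentence. At -12 dB the babble power is about 15.8x the speech power
    (10 ** (12 / 10) ~= 15.85).
    """
    babble = babble[: len(speech)]                       # trim babble to sentence length
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    target_p_babble = p_speech / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_p_babble / p_babble)
    return speech + gain * babble

def keyword_score(response, keywords):
    """Proportion of keywords that appear in the typed response (case-insensitive)."""
    typed = set(response.lower().split())
    hits = sum(1 for kw in keywords if kw.lower() in typed)
    return hits / len(keywords)

# Example: keyword_score("the boy dropped his red ball", ["boy", "dropped", "ball"]) -> 1.0
```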

RESULTS

Behavioral Indices of Speech Perception

Left: In the scanner, participants judged native sentences to be more intelligible than non-native sentences (p < .05). Right: Outside the scanner, participants comprehended (a) native sentences more accurately than non-native sentences (p < .001), (b) sentences with visual cues better than those without (p < .001), and (c) this visual enhancement was more pronounced for native than for non-native sentences (p < .005).

Figure: Neural correlates of the speech perception tasks (see the contrast summaries below).

DISCUSSION

Validity of the Speech vs. Rotated Speech Comparison
This contrast yielded the entire language network, spanning both the dorsal (articulatory) and ventral (comprehension) streams⁵. This is indicative not of intelligibility processing proper but of language processing more broadly, since rotated speech is a non-linguistic stimulus and is not comparable to actual speech in terms of semantics, syntax, or phonology.

Neural Correlates of Native vs. Non-Native Speech Perception
Non-native speech, which was judged less intelligible, recruited both the dorsal and ventral streams of the language network, consistent with the idea that more challenging input demands more processing. This activity was comparable to that of the traditional Speech > Rotated contrast. In contrast, native speech processing, which should be more automatic and effortless, involved only limited recruitment of posterior portions of the right temporo-occipital junction.

Role of Visual Cues
Visual cues aided speech processing⁷, and did so more for native than for non-native speech. The reduced extent of neural involvement for the non-native > native contrast with visual cues indicates that neural efficiency became more comparable across the two speech types as more external cues became available.

Future Directions
1. Assess sources of individual variability in the degree of visual cue incorporation and in the native-speech bias in intelligibility (McGurk effect susceptibility; Implicit Association Task).
2. Apply functional connectivity analysis to complement the univariate subtraction approach in assessing the nature of dual-stream activation in speech perception.

ACKNOWLEDGEMENTS

The authors would like to thank Kirsten Smayda, Jasmine E. B. Phelps, and Rachael Gilbert for significant contributions to data collection and processing, and the faculty and staff of the Imaging Research Center at the University of Texas at Austin for technical support and counsel.

Figure: Intelligibility ratings (1-4 scale) and proportion of words correct for native vs. non-native sentences, with and without visual cues.

A repeated-measures ANOVA with two within-subjects factors (visual cues; nativeness of talker) was run on the intelligibility ratings and on the proportion of words correct from the sentence perception tasks conducted inside and outside the scanner.
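As a minimal sketch of this kind of analysis, the snippet below runs a two-way repeated-measures ANOVA with statsmodels, assuming a hypothetical long-format table with one row per participant per condition cell; the file and column names are placeholders, not the study's actual data files.

```python
# Minimal sketch of a two-way repeated-measures ANOVA (within-subject factors:
# visual cues and talker nativeness), as described above. The data layout and
# column names are hypothetical placeholders.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format table: one row per participant per condition cell, with columns
# 'subject', 'visual_cues' ('with'/'without'), 'nativeness' ('native'/'non-native'),
# and 'rating' (mean 1-4 intelligibility rating for that cell).
data = pd.read_csv("intelligibility_ratings_long.csv")  # hypothetical file

anova = AnovaRM(
    data=data,
    depvar="rating",
    subject="subject",
    within=["visual_cues", "nativeness"],
).fit()
print(anova)  # F and p values for the two main effects and their interaction

# The same call with depvar="prop_correct" (hypothetical column) would cover the
# words-correct analysis from the task performed outside the scanner.
```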

Task 1: Speech vs. Rotated Speech

Normal > Rotated Speech
Ventral stream of speech processing
• Bilateral anterior-to-posterior superior temporal sulcus
• Combinatorial network and lexical interface
Dorsal stream of speech processing
• Left supramarginal gyrus, motor area, and inferior frontal gyrus
• Sensorimotor interface and articulatory network

Task 2: No Visual Cues

Native > Non-Native Speech
Sensorimotor interface
• Right angular gyrus
Lexical interface
• Right posterior middle temporal gyrus

Non-Native > Native Speech
Ventral stream of speech processing
• Bilateral anterior-to-posterior superior temporal sulcus
• Combinatorial network and lexical interface
Dorsal stream of speech processing
• Bilateral inferior frontal gyri, right superior parietal lobule
• Sensorimotor interface and articulatory network

Task 3: With Visual Cues

Native > Non-Native Speech
Sensorimotor interface
• Right angular gyrus
Lexical interface
• Right posterior middle temporal gyrus

Non-Native > Native Speech
Dorsal stream of speech processing
• Left inferior frontal gyrus, right superior parietal lobule
• Sensorimotor interface and articulatory network

SUMMARY

Behavioral Results
1. Native speech is more intelligible and comprehensible than non-native speech.
2. Visual cues benefit native speech more than non-native speech.

fMRI Results
1. The artificial intelligibility contrast (speech > rotated speech) yields the entire language network.
2. The non-native speech contrast with no visual cues also yields the entire bilateral language network.
3. Native speech processing involves only limited recruitment of the sensorimotor and lexical interfaces.
4. With no visual cues, the anterior superior temporal sulcus (implicated in speech > rotated) is activated in the non-native > native rather than the native > non-native contrast.
5. With visual cues, less neural activation is required to process non-native speech.

Box 1: The Dual-Stream Model of the Functional Anatomy of Language⁵
