
Brain Oscillations during Spoken Sentence Processing

Marcela Peña (1,2) and Lucia Melloni (2,3)

(1) Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy; (2) Pontificia Universidad Católica de Chile; (3) Max Planck Institute for Brain Research, Frankfurt am Main, Germany

© 2012 Massachusetts Institute of Technology. Journal of Cognitive Neuroscience 24:5, pp. 1149–1164

Abstract

Spoken sentence comprehension relies on rapid and effortless temporal integration of speech units displayed at different rates. Temporal integration refers to how chunks of information perceived at different time scales are linked together by the listener in mapping speech sounds onto meaning. The neural implementation of this integration remains unclear. This study explores the role of short and long windows of integration in accessing meaning from long samples of speech. In a cross-linguistic study, we explore the time course of oscillatory brain activity between 1 and 100 Hz, recorded using EEG, during the processing of native and foreign languages. We compare oscillatory responses in a group of Italian and Spanish native speakers while they attentively listen to Italian, Japanese, and Spanish utterances, played either forward or backward. The results show that both groups of participants display a significant increase in gamma band power (55–75 Hz) only when they listen to their native language played forward. The increase in gamma power starts around 1000 msec after the onset of the utterance and decreases by its end, resembling the time course of access to meaning during speech perception. In contrast, changes in low-frequency power show similar patterns for both native and foreign languages. We propose that gamma band power reflects a temporal binding phenomenon concerning the coordination of neural assemblies involved in accessing meaning of long samples of speech.

INTRODUCTION

Spoken sentence comprehension relies on successful temporal binding of speech units, namely the integration of the segmental, suprasegmental, lexical, morphologic, syntactic, semantic, and contextual properties of the utterances that occur at different rates and overlap in time. It remains unknown how this integration is implemented in the human mind, and brain oscillations may offer a plausible mechanistic explanation.

Two separate cognitive operations, both involving temporal binding, are proposed to form the cognitive architecture of language comprehension, namely memory retrieval and semantic/syntactic unification operations (Hagoort, 2005; Jackendoff, 2002). The former refers to retrieval of phonological, syntactic, and semantic properties of words from long-term memory. The latter refers to combining information from individual words to create an overall representation of the utterance. Most evidence on the neural mechanisms underpinning temporal binding for speech comprehension has been provided by studies employing semantic or syntactic violation paradigms, written stimuli, and ERP or event-related field measurements (see Friederici & Weissenborn, 2007; Friederici, 2002). ERPs/event-related fields are time-locked responses to stimuli obtained with high temporal resolution neuroimaging recordings such as EEG and magnetoencephalography (MEG).

Previous ERP studies have emphasized the difficulty in identifying a clear onset and offset at which specific speech processes occur, suggesting that many linguistic processes, including phonologic, semantic, syntactic, and pragmatic ones, might take place in parallel from very early on after the onset of the utterances (see Molinaro, Barber, & Carreiras, 2011; Hagoort, 2008, for reviews). Thus, it is at present unclear how the distributed nodes of the speech network are bound together while sentence level meaning emerges. We hypothesized that the time course of brain oscillations might disclose non-time-locked activities that complement data obtained by ERP studies, which can provide new evidence on temporal binding for speech. Synchronous activation of distributed neuronal assemblies has been proposed as a general mechanism to form transient functional networks and to integrate local information (Singer, 1999; Singer & Gray, 1995). Oscillatory synchrony might thus serve the crucial role of integrating speech units, taking place at different temporal scales. Studies of brain oscillations during the processing of well-formed long samples of speech are rare. Nevertheless, in written language, EEG/MEG data obtained using semantic/syntactic violation paradigms suggest that language-related memory retrieval operations are associated with power increases in the theta band (4–7 Hz) and power decreases in the alpha band (9–14 Hz; Bastiaansen, Oostenveld, Jensen, & Hagoort, 2008; Bastiaansen, Van der Linden, ter Keurs, Dijkstra, & Hagoort, 2005; Hagoort, Hald, Bastiaansen, & Petersson, 2004), whereas semantic/syntactic unification operations are linked to increases in beta (15–20 Hz) and gamma (>21 Hz) band activities (Bastiaansen, Magyari, & Hagoort, 2010; Penolazzi, Angrilli, & Job, 2009; Haarmann, Cameron, & Ruchkin, 2002; Braeutigam, Bailey, & Swithenby, 2001; Rohm, Klimesch, Haider, & Doppelmayr, 2001). In fact, beta power linearly increases in syntactically correct sentences and decreases after a syntactic violation (Bastiaansen et al., 2010). Moreover, in contrast to words semantically incongruent with their sentence context, semantically congruent words are accompanied by an increase in low (35–45 Hz; Hald, Bastiaansen, & Hagoort, 2006; Weiss & Mueller, 2003) and broad band (30–100 Hz; Penolazzi et al., 2009) gamma band activity during reading.

Because spoken sentence comprehension increases as the sentence unravels in time, the study of the time course of the patterns of oscillatory activity may disclose mechanisms underpinning the unit-by-unit integration of the incoming information provided by the speech signal and context. According to the binding by synchrony hypothesis, sensory and cognitive integration is the product of neural synchrony in the gamma band between local (Singer, 2002) and distant (Varela, Lachaux, Rodriguez, & Martinerie, 2001; Rodriguez, Lachaux, Martinerie, Renault, & Varela, 1999) neural assemblies. Supporting this proposal, significant increases in gamma band oscillations have been found to be associated with the emergence of meaningful objects in visual (Tallon-Baudry, 2009; Melloni et al., 2007; Rodriguez et al., 1999) and audiovisual (Schneider, Debener, Oostenveld, & Engel, 2008; Widmann, Gruber, Kujala, Tervaniemi, & Schröger, 2007) studies, supporting the prominent role of high-frequency oscillations in unimodal and polymodal integration. Modulations in gamma band activity might thus reveal the implementation of fast windows of coupling/uncoupling activity between different neural networks engaged in the emergence of unified single object representations (Engel & Singer, 2001; Singer, 1999), including those emerging from speech.

Brain oscillations have also been reported as biological signatures of the on-line tracking and sampling of speech units. At the lexical level, brain activity resonates at the same frequency at which words regularly occur in a continuous speech stream (Buiatti, Peña, & Dehaene-Lambertz, 2009). At the sublexical level, the Asymmetric Sampling in Time (AST) theory proposes that different neural assemblies, asymmetrically distributed over the hemispheres, resonate with the occurrence of slow or rapid events such as syllables or phonemes, respectively (Poeppel, 2003). In fact, theta oscillations over the right hemisphere would reflect syllable tracking, occurring every 200–300 msec, whereas low gamma frequencies over the left hemisphere would indicate phoneme tracking, taking place every 40–100 msec. Indeed, theta and, to some extent, gamma frequency ranges match the average duration of syllables and phonemes across several languages (Greenberg, Carvey, Hitchcock, & Chang, 2003). Recent studies support the AST proposal, showing that theta band phase coherence was significantly reduced over a group of right temporal sensors when the spectral information of the spoken signal at which the syllable occurs was removed by filtering, rendering speech unintelligible (Luo & Poeppel, 2007). Furthermore, the power spectrum recorded during silence over electrodes projected over left and right Heschl's gyrus showed that theta (3–6 Hz) band activity was stronger over the right hemisphere whereas gamma (28–40 Hz) band activity was stronger over the left hemisphere (Giraud et al., 2007). Brain oscillations may thus reflect the functioning of tracking/sampling mechanisms underpinning the processing of speech units occurring at different temporal scales.

To shed light on the role of brain oscillations during the perception of well-formed spoken sentences, we carried out a cross-linguistic study with native speakers of Italian and Spanish. We hypothesized that speech units such as phonemes, syllables, syntactic structure, and some isolated lexical items would be accessed by both groups of participants when they listened to Italian and Spanish utterances, whereas semantic–syntactic–pragmatic integration, indispensable for speech comprehension, would only be possible when participants listened to their respective native language. We investigated the time course of brain oscillations in a broad frequency range (1–100 Hz) during attentive listening to utterances in the native language (Spanish for Spanish speakers and Italian for Italian speakers) and in foreign languages (Spanish and Japanese for Italian speakers and Italian and Japanese for Spanish speakers). All evaluated languages (Italian, Japanese, and Spanish) have similar phonemic repertories (International Phonetic Association, 1999), whereas only Italian and Spanish have similar rhythmic (prosodic), syllabic, and syntactic structures and partially share their lexical repertoire (Ramus, Nespor, & Mehler, 1999; Nespor & Vogel, 1986). Overall, Italian and Spanish are linguistically close, and both have low linguistic similarity with Japanese. Participants had not previously been exposed to any of the tested foreign languages. To control for acoustic and articulatory factors, we also evaluated the brain responses to the utterances from the three languages played reversed in time (hereafter, backward speech). To ensure that participants attentively listened to all utterances, they were required to judge whether a short sound played after the end of each utterance was part of the sentence or not (see Figure 1A). In summary, in two independent groups (i.e., Italian and Spanish native speakers), we evaluated six experimental conditions obtained from the combination of three languages (Italian, Japanese, and Spanish) and two types of playback (forward and backward).

In the context of the current task, we predicted that the oscillatory responses could reflect (a) non-language-specific activity, that is, cognitive processing related to task performance such as attentive monitoring and STM, and (b) language-specific activity related to the processing of speech units (see Figure 1B).

Regarding non-language-specific oscillatory activity, we expected to observe alpha suppression after the cue indicating the trial onset. Alpha suppression has been reported for alerting when participants are exposed to a cue predicting the presentation of a target (Babiloni et al., 2004), during sustained attention (Yamagishi, Goda, Callan, Anderson, & Kawato, 2005), and after the abrupt onset of visual (Yantis & Jonides, 1984) and auditory stimuli (Shahin, Picton, & Miller, 2009). We expected that alpha suppression would decrease while the sentence evolves in time, remaining higher for attentionally more demanding experimental conditions.

Concerning language-specific oscillatory responses, we anticipated two possible, not mutually exclusive, scenarios reflecting different aspects of speech processing. The first scenario concerns sampling and tracking mechanisms for meaningless speech units (e.g., phonemes and syllables). On the basis of the AST theory, phoneme and syllable tracking processes should be accompanied by an increase in oscillatory power at the frequency at which the tracked units occur, that is, the theta band for syllables and the low gamma band for phonemes. Regarding the time course of oscillatory activity reflecting phoneme and syllable tracking, the increase in both theta and low gamma band power should start as soon as phonemes and syllables are perceived and remain high until the utterance finishes. Furthermore, it should be similar in any experimental condition in which syllables and phonemes can be identified. Specifically, because the phoneme repertory is highly similar in all evaluated languages, we predicted that low gamma band power would significantly increase for all forward utterances, reflecting the ability to track phonemes. We also predicted an increase in theta band activity for Italian and Spanish forward utterances, reflecting syllable tracking. Japanese is a special case, because it is structured around the mora, a subsyllabic prosodic unit. However, adults who are not native speakers of Japanese are likely to perceive the mora as a syllable, and thus theta band activity should also be high when participants listen to forward Japanese utterances. We anticipated a significantly smaller increase in theta and low gamma bands for backward speech, because the reversal of utterances in time seriously distorts the acoustic–phonetic properties of some frequent phonemes and syllables, rendering sampling and tracking of these linguistic units difficult. A second scenario would become evident when the first linguistic units conveying meaning, such as words or phrases, are recognized and must remain active until the sentence level meaning is found.

Figure 1. (A) Trials started with a fixation cross followed by a 1500-msec silent period, after which an Italian, Japanese, or Spanish utterance, played either forward or backward, was presented. After a 500-msec silent interval, a 300-msec-long test sound was presented, and participants had to judge whether the test sound was part of the previous utterance. (B) A schematic proposal for linking the changes in oscillatory activity with the cognitive processes underpinning speech processing. Briefly, changes in the low-frequency spectrum (<20 Hz) would reflect the processing of meaningless speech units (e.g., phonemes and syllables) and general nonlinguistic cognitive processing, whereas the high-frequency spectrum would mirror the integration of the meaningful ones (e.g., words).


This processing may reflect the implementation of semantic/syntactic unification processes involving the updating of information provided by incoming words with that provided by their neighbors. This second step is unique to the native language and engages different neural networks involved in processing the speech signal and other analyses relevant to the emergence of a meaningful object from utterances. We predicted that high-frequency brain oscillations, specifically in the gamma range, should increase as soon as the first meaningful linguistic units are recognized and sentence comprehension starts, and should remain high until the sentence level meaning of the utterance can be anticipated, the moment when integration is no longer required. The last word of a sentence can be recognized as soon as its first phoneme is perceived (Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005; McQueen, Cutler, & Norris, 2003). Listener expectations allow the end of almost any utterance to be guessed (Van Berkum, 2008), and gamma band power might thus decrease before the sentence ends. We did not predict increments in gamma band oscillations for either nonnative forward or backward utterances, because the meaningful units would not be integrated into a sentence level meaning. Neither did we predict increases in gamma band activity for foreign utterances containing recognizable lexical items (such as /galleria/, which means gallery in both Italian and Spanish), because we anticipated that gamma band activity would reflect sentence level meaning updating and building processes, but not isolated word memory retrieval. Concerning theta activity, previous studies have shown increases in theta band activity associated with increases in verbal STM demands (Bastiaansen et al., 2010; Hagoort & van Berkum, 2007; Weiss et al., 2005). From this perspective, we also expected that theta band activity would increase for utterances from the native and the close foreign language, associated with the recognition of the ongoing lexical and phrasal items. Finally, regarding the beta band, we anticipated that if beta activity reflects semantic/syntactic unification processes (Bastiaansen et al., 2010), we should observe a linear increase in beta band activity exclusively for native utterances.

In summary, we expected to find patterns of oscillatory activity illustrative of the processes underpinning access to the meaning of well-formed spoken sentences in the native language, using a no-violation paradigm.

METHODS

Participants

Two groups of 24 adults each were evaluated. One group was composed of native Spanish speakers (from Chile), and the other consisted of native Italian speakers (from Italy). Four participants from each group were excluded from the analysis because their EEG data contained artifacts in more than 50% of the trials in one or more experimental conditions (see Data Analysis). All participants were monolingual, aged 20–30 years (Spanish group: mean = 23.2 years, SD = 2.6 years; Italian group: mean = 24.6 years, SD = 2.9 years), right-handed, and reported normal hearing. There were 12 women in each group. The study received the approval of the regional ethical committee for biomedical research. Participants received monetary compensation and signed written informed consent to participate in the study.

Stimuli

Forward and Backward Utterances

We recorded speech samples from nine different female monolingual native speakers, three per language. Each speaker recited a series of 54 utterances in her native language using adult-directed speech. From this pool of utterances, we selected 18 utterances per speaker, creating a set of 54 different utterances per language. The sets of utterances had similar semantic interpretations across all languages (see the list of sentences translated to English in Supplementary Text 1). All sentences were affirmative; were pronounced using adult-directed speech; were not prosodically, semantically, or syntactically related to each other; and were matched in the number of syllables and (as much as possible) in the number of function and content words. All sentences had an SVO structure in Spanish and Italian and an SOV structure in Japanese. No systematic acoustic or linguistic cues were present at any time during the utterances. The sets of utterances did not differ significantly across languages in mean energy (root mean square = 0.18, 0.19, and 0.18 Pa for Italian, Japanese, and Spanish, respectively), syllable number (15–18 syllables), or duration (2800–3000 msec). Supplementary Figure 1 illustrates the waveform and spectrogram of a single utterance in each of the three languages in its forward and backward versions. A naive native speaker of each language verified that the selected utterances were well formed. A set of 54 backward utterances per language was created by reversing the forward utterances in time (an example is given in Supplementary Sound 1, with the corresponding description in Supplementary Text 2). Critically, all participants were evaluated with identical stimuli, procedure, EEG system, and experimental design in their respective native countries.
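The energy matching and time reversal described above amount to two small signal-processing steps. The following is a minimal sketch, not the authors' code: the file layout (`italian/utt_00.wav`, etc.) and the `soundfile` dependency are our assumptions for illustration.

```python
# Hypothetical sketch: check that utterance sets are matched in mean RMS
# energy, and create backward versions by reversing each waveform in time.
import numpy as np
import soundfile as sf  # assumed dependency: pip install soundfile

def rms(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of a 1-D signal."""
    return float(np.sqrt(np.mean(samples.astype(float) ** 2)))

for language in ("italian", "japanese", "spanish"):
    energies = []
    for i in range(54):
        samples, sr = sf.read(f"{language}/utt_{i:02d}.wav")
        energies.append(rms(samples))
        # Backward version: the same utterance reversed in time.
        sf.write(f"{language}/utt_{i:02d}_backward.wav", samples[::-1], sr)
    print(language, round(float(np.mean(energies)), 2))
```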

Test Sounds

Test sounds were 300-msec-long chunks, randomly extracted from any part of an utterance. Chunks consisted of segmental or suprasegmental units that did not match real words or verbal expressions in any native language. Test sounds belonged to the same experimental condition as the spoken sentence. For instance, after a forward Italian utterance, the target could be a chunk extracted either from the same spoken sentence or from another forward Italian sentence.
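A test chunk of this kind can be cut with a few lines of code; a minimal sketch under the 300-msec constraint above, with hypothetical names (`extract_chunk`, `rng`) that are not from the paper:

```python
# Hypothetical sketch: cut a 300-msec test chunk at a random position
# of an utterance from the same experimental condition.
import numpy as np

rng = np.random.default_rng(seed=1)

def extract_chunk(samples: np.ndarray, sr: int, dur_ms: int = 300) -> np.ndarray:
    """Return a dur_ms-long slice starting at a random sample index."""
    n = int(sr * dur_ms / 1000)
    start = int(rng.integers(0, len(samples) - n + 1))
    return samples[start:start + n]
```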

Task and Procedure

Participants were evaluated in soundproof Faraday rooms. The structure of the trials is illustrated in Figure 1A. Fifty-four trials per experimental condition were presented. Participants were seated in a chair placed 1.5 m away from a monitor and loudspeakers. Written instructions, presented on the monitor, informed participants that they were expected to attentively listen to a series of utterances from different languages delivered by the loudspeakers. Once the sentence finished, a short sound would be presented, and they should judge whether the sound was a chunk taken from the preceding utterance. In each trial, participants responded by pressing one of two buttons on a response pad. For half of the participants, the right button indicated "YES" and the left button indicated "NO," whereas for the other half, the button assignment was reversed. The delayed match-to-sample task on acoustic chunks was used to ensure equivalent attentional demands during the processing of native and foreign languages. No practice or feedback was provided. We instructed participants to avoid body and eye movements and blinking; however, small movements and blinking were allowed during breaks that occurred every 12 min of the session. The presentation order of the experimental conditions was pseudorandomized across participants. Consecutive repetition of the same experimental condition, speaker, or spoken sentence was not allowed.

EEG Data Acquisition

EEG was continuously recorded with a 64-channel (Spanish participants) or 128-channel (Italian participants) EEG system (EGI, Inc., Eugene, OR). The EEG was digitized at a sampling rate of 1000 Hz (bandpass filter = 0.01–100 Hz). Electrodes were referenced to the vertex (Cz).

Data Analysis

Behavioral performance and electrophysiological activity were compared across groups and experimental conditions. The analysis of brain activity focused on the attentive listening period of each trial. We present here the results for correct trials; however, similar results were obtained when all trials were analyzed together.

Behavioral Data

Mean accuracy and RTs of the correct responses were submitted to separate repeated-measures ANOVAs with Language (Italian, Japanese, and Spanish) and Type of Utterance (forward and backward) as within-subject factors and Group (Italian and Spanish) as a between-subject factor. The Greenhouse–Geisser correction was applied.

Time Resolved Spectral Power Computation

The raw EEG signal was segmented into a series of 5000-msec-long epochs starting 1500 msec before the onset of the utterances. The continuous 50-Hz (AC) component (the same in Chile and Italy) was filtered from each epoch while keeping the biological 50-Hz signal. To achieve that, the amplitude and phase of the AC signal were estimated and subtracted from the original signal, resulting in the selective elimination of the periodic part of the 50-Hz component (line frequency). Channels contaminated with eye movements, blinking, or motion artifacts, and epochs with more than seven contaminated channels showing voltage fluctuations exceeding ±100 μV, transients exceeding ±70 μV, or electrooculogram activity exceeding ±70 μV, were excluded from the spectral power analysis.
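One way to implement the line-noise step just described is to estimate the amplitude and phase of a 50-Hz sinusoid by least squares and subtract only that deterministic component. This is a minimal sketch under that assumption; the paper does not give the authors' implementation.

```python
# A sketch, not the authors' code: remove the deterministic 50-Hz (AC)
# component from one epoch while leaving broadband biological activity
# around 50 Hz untouched.
import numpy as np

def remove_line_noise(epoch: np.ndarray, sr: float = 1000.0,
                      f0: float = 50.0) -> np.ndarray:
    """Subtract the best-fitting f0-Hz sinusoid (amplitude + phase)."""
    t = np.arange(len(epoch)) / sr
    # Design matrix for a*sin(2*pi*f0*t) + b*cos(2*pi*f0*t); the fitted
    # (a, b) pair encodes the amplitude and phase of the line component.
    X = np.column_stack([np.sin(2 * np.pi * f0 * t),
                         np.cos(2 * np.pi * f0 * t)])
    coeffs, *_ = np.linalg.lstsq(X, epoch, rcond=None)
    return epoch - X @ coeffs
```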

Each nonrejected epoch was analyzed by applying a sliding-window fast Fourier transform (Hamming window; window length, step, and window overlap equal to 232 msec, 10 msec, and 90%, respectively, for frequencies from 11 to 100 Hz, and 500 msec, 10 msec, and 95%, respectively, for frequencies from 1 to 10 Hz). For every participant, time, and frequency bin, amplitude was computed following the procedure described in Melloni et al. (2007): Signal windows (232 or 500 points) were zero-padded and fast Fourier transformed to obtain an interpolated frequency resolution of ∼1 Hz per frequency bin. The instantaneous amplitude was then computed by taking the real and imaginary Fourier coefficients, C(f, t)_r and C(f, t)_i, squaring and adding them, and taking the square root, that is, for a given time window t and frequency bin f:

Amp(f, t) = \sqrt{ C(f, t)_r^2 + C(f, t)_i^2 }

This amplitude is equivalent to the magnitude of the observed oscillation at a given time and frequency point, and it was used to construct a time–frequency map per experimental condition. Each time–frequency map was normalized against the 1500-msec prestimulus baseline and averaged across all nonrejected trials and electrodes. The normalization involved subtracting the baseline average and dividing by the baseline standard deviation on a frequency-by-frequency basis, where S is the signal, μ is the average of the signal during the baseline period, and σ is the SD of the same baseline period:

S_N = (S - \mu) / \sigma
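The computation above can be summarized in a few lines of numerical code. The following is a minimal sketch using the parameters stated for the 11–100 Hz range (232-msec Hamming window, 10-msec steps, zero-padding to ∼1 Hz resolution); the single-channel framing and variable names are our simplifications, not the authors' code.

```python
# Sketch of the time-frequency amplitude map and baseline z-scoring
# described above: sliding Hamming window, zero-padded FFT, amplitude
# from real and imaginary coefficients, then per-frequency normalization
# against the prestimulus baseline.
import numpy as np

def tf_amplitude(epoch, sr=1000, win_ms=232, step_ms=10, n_fft=1024):
    """Time-frequency amplitude map (freqs x times) for a 1-D epoch."""
    win = int(win_ms * sr / 1000)
    step = int(step_ms * sr / 1000)
    taper = np.hamming(win)
    amps = []
    for s in range(0, len(epoch) - win + 1, step):
        seg = epoch[s:s + win] * taper                # windowed segment
        coeffs = np.fft.rfft(seg, n=n_fft)            # zero-padded FFT, ~1 Hz bins
        amps.append(np.sqrt(coeffs.real**2 + coeffs.imag**2))  # Amp(f, t)
    return np.array(amps).T                           # shape: (freqs, times)

def baseline_z(tfmap, n_baseline_cols):
    """S_N = (S - mu) / sigma, per frequency, over the baseline columns."""
    base = tfmap[:, :n_baseline_cols]
    mu = base.mean(axis=1, keepdims=True)
    sigma = base.std(axis=1, keepdims=True)
    return (tfmap - mu) / sigma
```

With a 1500-msec prestimulus baseline and 10-msec steps, roughly the first 150 columns of the map would serve as the baseline passed to `baseline_z`.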

We ran the spectral power analysis over 64 or 128 electrodes in the Spanish and Italian groups, respectively. Similar results were obtained in the Italian group when we restricted the analysis to 64 electrodes roughly matched in location to the 64 electrodes of the net used for Spanish speakers.

Time–frequency windows for statistical comparisons. Time–frequency ROIs (TF-ROIs) were identified by evaluating significant task-related changes of oscillatory activity, blind to any effect of experimental conditions or groups. By selecting the time windows of interest in a way orthogonal to the research hypothesis (which was to evaluate the effect of native language), we alleviate the multiple comparison problem: instead of running multiple ANOVAs with Language, Type, and Group for every time and frequency bin, we ran the ANOVAs only over the identified TF-ROIs. Independence between the selection procedure and the test of the experimental factors was achieved by assuring orthogonality between the selection contrast and the test contrast, a balanced design matrix between conditions, and the inclusion of equal numbers of trials per condition (Kriegeskorte, Simmons, Bellgowan, & Baker, 2009). For the selection contrast, we averaged activity across all experimental factors, that is, group, language, type, and electrodes, and contrasted those values against the prestimulus interval. This procedure identified the time windows when power at each frequency bin significantly differed between the listening period and the corresponding baseline, without any a priori assumption about the effect of group, language, or type of utterance, and is orthogonal to the effect of experimental conditions or groups (Kriegeskorte et al., 2009). In particular, we first averaged the time–frequency responses across the six experimental conditions and channels for each participant. Then, the time–frequency averages from all participants were pooled together, regardless of group. The mean power at each of the 100 frequency bins was submitted to a paired t test (alpha = 0.05; two-tailed) comparing each sample of the power during the listening period (0–3500 msec after the onset of the utterance) against its corresponding baseline. A single baseline value per frequency was obtained by averaging the power across the 1500 msec before utterance onset. As a result, we observed three TF-ROIs in which the power between 1 and 100 Hz was significantly different from baseline (Figure 2 and Supplementary Figure 2a and b). The first window involved a frequency range from 4 to 8 Hz (theta band) and extended from 100 to 3200 msec after sentence onset. The second window concerned frequencies from 9 to 14 Hz (alpha band) and extended from 1000 to 2800 msec. The last window implicated frequencies from 55 to 75 Hz (middle gamma band) and extended from 1000 to 2900 msec after sentence onset. No significant differences were observed for other frequencies, including the low gamma (21–40 Hz) and beta (15–20 Hz) bands. Similar TF-ROIs were identified when applying a false discovery rate correction (q < 0.05) for multiple comparisons to the matrix of p values used to identify the TF-ROIs (see Supplementary Figure 2). We report results based on the uncorrected TF-ROI windows, as differences in window size did not affect the results reported below. The uncorrected TF-ROIs provide an estimate of the lower and upper bounds of the effects in time. In this context, it is important to note that the difference in the middle gamma band starting at 1000 msec after the onset of the sentences is observed for the average activity and thus should be taken as the upper bound of the effect, that is, the point at which maximal overlap between sentences and subjects is reached for the first time, such that the increase in gamma becomes statistically significant.
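The selection contrast lends itself to a compact implementation. A sketch follows, assuming `power` is a participants × frequencies × time-points array already averaged across conditions and channels; the array layout and function names are our assumptions, not the authors' code.

```python
# Hypothetical sketch of the TF-ROI selection contrast: per frequency
# bin, paired t-tests of every post-onset sample against each
# participant's mean prestimulus baseline, with optional FDR control.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import fdrcorrection

def tf_roi_mask(power: np.ndarray, n_baseline: int,
                alpha: float = 0.05, fdr: bool = False) -> np.ndarray:
    """power: (participants, freqs, times); returns a (freqs, post-times) mask."""
    baseline = power[:, :, :n_baseline].mean(axis=2)   # one value per freq
    post = power[:, :, n_baseline:]
    n_sub, n_freq, n_time = post.shape
    pvals = np.empty((n_freq, n_time))
    for f in range(n_freq):
        for t in range(n_time):
            pvals[f, t] = stats.ttest_rel(post[:, f, t], baseline[:, f]).pvalue
    if fdr:  # false discovery rate over all bins, as in the control analysis
        reject, _ = fdrcorrection(pvals.ravel(), alpha=alpha)
        return reject.reshape(n_freq, n_time)
    return pvals < alpha
```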

To explore the spatial distribution of the oscillatory response, spectral power in each TF-ROI was averaged over nine clusters of adjacent electrodes located over lateral (central, left, and right) and antero-posterior (anterior, middle, and posterior) regions of the scalp. All electrode clusters contained similar numbers of electrodes in both groups of participants (see Figure 3).

The mean spectral power observed in each TF-ROI for the nine groups of electrodes was submitted to separate repeated-measures ANOVAs, one per TF-ROI, with Language (Italian, Japanese, and Spanish), Type of Utterance (forward and backward), Lateralization (central, left, and right), and Antero-posterior Location (anterior, middle, and posterior) as within-subject factors and Group (Italian and Spanish groups) as a between-subject factor. The Greenhouse–Geisser correction was applied in all comparisons.

Figure 2. The time courses of the mean t (A) and p values (B, C) for theta, alpha, and low and middle gamma bands from the onset to the end of the utterances. The t and p values for each frequency bin from 1 to 100 Hz were obtained from the comparison of the oscillatory activity, averaged across all utterances from all experimental conditions and all channels, regardless of group, against the baseline. The mean t and p values are plotted in red for theta (4–8 Hz), green for alpha (9–14 Hz), black for low gamma (21–40 Hz), and blue for middle gamma (55–75 Hz).


Finally, it should be noted that a main advantage of our cross-sectional experimental design is that it allows ruling out concerns related to the stimulus material, the methodology, and the selection procedure of the TF-ROIs, because our results can be replicated in an independent population. In particular, the effect of the native language over another language can be evaluated in Spanish native speakers and replicated in Italian native speakers (see Results). Replication is the most rigorous test that can be applied and immediately eliminates spurious results (which are not likely to replicate).

Comparison with previous studies showing hemispheric differences. Previous data reported hemispheric differences for theta and low gamma activities during speech processing. In fact, theta phase coherence from 4 to 8 Hz is significantly greater over the right as compared with the left temporal MEG sensors during intelligible speech perception (Luo & Poeppel, 2007). Moreover, during silence (Giraud et al., 2007), the spontaneous oscillatory power over fronto-temporal electrodes is greater for theta (3–6 Hz) over the right as compared with the left hemisphere, whereas the contrary is observed for low gamma (28–40 Hz). To compare our results with these previous studies, we submitted the mean theta and low gamma power observed during the time–frequency windows and electrodes described in each of the referenced studies (see Supplementary Figure 3) to separate repeated-measures ANOVAs, one for each study and each frequency range, with Language (Italian, Japanese, and Spanish), Type of Utterance (forward and backward), and Hemisphere (right and left) as within-subject factors and Group (Italian and Spanish groups) as a between-subject factor. The Greenhouse–Geisser correction was applied in all comparisons with more than two levels.

ERP Analysis

To rule out that differences in induced responses were attributable to differences in evoked responses, ERPs and global field power were compared between experimental conditions and groups (for a detailed description of these analyses, see Supplementary Text 3).
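Global field power, used here as a summary of the evoked response, is simply the spatial standard deviation across electrodes at each time point; a minimal sketch (the array layout is our assumption):

```python
# Minimal sketch: global field power (GFP) of an averaged ERP,
# computed as the standard deviation across channels per sample.
import numpy as np

def global_field_power(erp: np.ndarray) -> np.ndarray:
    """erp: (channels, times) average-referenced ERP; returns GFP(t)."""
    return erp.std(axis=0)
```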

RESULTS

Behavioral Results

Confirming compliance with the task, in both groups the mean accuracy was significantly higher than chance for all experimental conditions (p < .001 for each comparison; Figure 4A).

Regarding the effect of the native language on mean accuracy, we found a significant triple interaction, Language × Type × Group [F(2, 76) = 4.3, p < .02]. Mean accuracy was significantly higher for the forward native language as compared with every other experimental condition in both groups (p < .05 for each comparison). In addition, mean accuracy and RTs were significantly greater for forward as compared with backward sentences in both groups and languages [F(1, 38) = 11.4, p < .002, and F(1, 38) = 4.4, p < .04; main effects of Type of Utterance on accuracy and RTs, respectively], suggesting that participants were more accurate but slower when processing forward than backward utterances (Figure 4A and B).

Figure 3. Left and right panels show the nine clusters of adjacent electrodes over the 64- and 128-electrode net arrangements used in the Spanish and Italian groups, respectively. Each group of electrodes comprises the electrodes numbered under the silver-colored area, whereas the dotted lines indicate the limits of the groups of electrodes. The names of the groups of electrodes are indicated in the left picture.



At the end of the experiment, participants were asked to report whether they had recognized the foreign languages. All participants answered that the languages were Spanish, Italian, and other Asiatic or Slavic languages; some participants also recognized Japanese. Moreover, all participants reported that they recognized one or two isolated lexical items in 50–60% of the utterances from the rhythmically close language (i.e., Spanish for Italians and Italian for Spanish speakers; see Supplementary Table 1). Those words occurred at different moments within the utterances of the rhythmically close foreign language. Importantly, none of the participants recognized a complete sentence in any of the foreign languages. These sets of isolated words were recalled at the end of the utterances, showing that they were maintained in short-term verbal memory. Backward utterances were mostly reported as unknown foreign languages.

Oscillatory Brain Activity

Middle Gamma Band

We found a significant Language × Type of Utterance × Group interaction [F(1.9, 74.1) = 6.7, p < .002]. Post hoc comparisons revealed that both groups, Italian and Spanish participants, showed significantly higher middle gamma power for their respective native language as compared with the native language of the other group played forward. In fact, mean middle gamma power was significantly higher for forward Spanish than for forward Italian utterances in Spanish speakers, whereas it was significantly higher for forward Italian than for forward Spanish utterances in Italian speakers [F(1, 38) = 11.53, p < .002] over each group of electrodes (see Table 1). The temporal evolution of oscillatory power per group, language, and type of utterance is depicted in Figure 5.

In addition, gamma band power was higher for left (p < .001) and right (p < .004) groups of electrodes than for central electrodes [Lateralization, F(1.7, 65.9) = 8.9, p < .001] and showed a posterior-to-anterior distribution, with higher amplitudes in middle (p < .009) and posterior (p < .03) electrodes than anterior ones [Antero-posterior Location, F(1.3, 50.3) = 3.6, p < .03]. Scalp topographies of the effects are shown in Figure 6.

These results support the proposal that gamma oscillations reflect the integration of meaningful units (i.e., words) during native language processing. Middle gamma power increased around 1000 msec, coincident with the time when two or more words have been recognized and integrated, and decreased when the comprehension of the sentence could be anticipated and, in the context of our task, word integration was no longer required. Further supporting our prediction concerning word integration, we did not observe significant increments in gamma oscillations for any other forward or backward experimental condition, although participants had access to phonemes, prosodic units, isolated words, morphemes, and, in some cases, even syntactic frameworks, which they were nevertheless unable to integrate into a comprehensible message.

The rather late onset is determined by the summation across trials. It is thus possible that some effects started earlier in time but reached stability only around 1 sec. Hence, the latency reflects the upper bound of the effects.

Theta Band

Theta band power was significantly higher for forward than backward spoken sentences [Type of Utterance, F(1, 38) = 13.9, p < .001] and showed the opposite scalp distribution to middle gamma. That is, theta band power was significantly higher over anterior than middle (p < .005) and posterior (p < .001) regions [Antero-posterior Location, F(1.4, 51.9) = 13.3, p < .001]. Moreover, six significant interactions were observed: Type of Utterance × Lateralization [F(1.9, 51.9) = 13.6, p < .04], Type of Utterance × Language × Lateralization [F(3.4, 129) = 2.9, p < .03], Type of Utterance × Antero-posterior Location [F(1.2, 47) = 5.5, p < .02], Language × Antero-posterior Location [F(2.7, 102) = 2.9, p < .04], Lateralization × Antero-posterior Location [F(1.8, 70) = 12.7, p < .001], and Type of Utterance × Lateralization × Antero-posterior Location [F(2.9, 112) = 4.3, p < .006]. Post hoc analysis showed that, across both groups and languages, theta band power was significantly higher for forward utterances over the central anterior group of electrodes as compared with the response over any other group of electrodes (p < .05 for each comparison). Furthermore, mean theta power was significantly lower for forward Japanese than for forward Spanish utterances over the right posterior as compared with the central anterior group of electrodes (p < .05).

Figure 4. Mean accuracy (A) and mean RT (B) of the correct responses for the Italian and Spanish groups per experimental condition are depicted in the left and right plots, respectively. Vertical lines indicate 1 SD of the mean.


Overall, our results support the role of theta oscillations in syllabic tracking. First, as predicted, theta band power was significantly higher for forward than backward languages. In backward speech, syllabic tracking could be periodically reduced, mainly as a consequence of the distortion of syllables containing stop phonemes. Second, the time course of theta band power was consistent with syllable tracking: It started after 100 msec and remained high until the end of the utterance. Third, the fact that both Italian and Spanish participants exhibited similar increases in theta power for all forward languages, including Japanese, may be explained by adult participants perceiving morae as syllables.

Alpha Band

Alpha power was lower over posterior as compared with anterior (p < .001) and middle (p < .001) groups of electrodes in the period from 0 to 500 msec after sentence onset [Antero-posterior Location, F(1.4, 54.6) = 25.6, p < .001]. Moreover, a Lateralization × Antero-posterior Location interaction was observed [F(3.3, 126) = 13.4, p < .001]. Over anterior groups of electrodes, alpha power was greater for central and right as compared with left electrodes (p < .04 and p < .02, respectively), whereas over posterior groups of electrodes, alpha power was significantly greater for central as compared with left and right electrodes (p < .002 and p < .004, respectively). Across all experimental conditions and groups, alpha power significantly increased as a function of time, as revealed by higher alpha power around the end of the sentence (2.3–2.8 sec after sentence onset) as compared with its beginning (0–0.5 sec) [F(1, 38) = 17.3, p < .001]. In addition, around the sentence ending (2.3–2.8 sec), mean alpha power was lower for backward as compared with forward utterances [Type of Utterance, F(1, 38) = 5.3, p < .03].

Table 1. Mean Middle Gamma Band Power Comparisons by Group of Electrodes, Restricted to Spanish and Italian Forward Utterances

| Lateralization | Antero-posterior Location | Language | Italian Speakers Mean (SD) | Spanish Speakers Mean (SD) | Language × Group F(1, 38) | p |
|---|---|---|---|---|---|---|
| Central | Anterior | Italian | 0.046 (0.112) | 0.027 (0.077) | 5.7 | .02 |
| Central | Anterior | Spanish | −0.001 (0.082) | 0.068 (0.189) | | |
| Central | Middle | Italian | 0.085 (0.129) | 0.038 (0.072) | 12.7 | .001 |
| Central | Middle | Spanish | 0.010 (0.048) | 0.097 (0.171) | | |
| Central | Posterior | Italian | 0.088 (0.113) | 0.034 (0.047) | 10.7 | .002 |
| Central | Posterior | Spanish | 0.037 (0.078) | 0.081 (0.091) | | |
| Left | Anterior | Italian | 0.119 (0.165) | 0.042 (0.099) | 10 | .003 |
| Left | Anterior | Spanish | 0.016 (0.078) | 0.119 (0.244) | | |
| Left | Middle | Italian | 0.131 (0.163) | 0.058 (0.105) | 10.8 | .002 |
| Left | Middle | Spanish | 0.033 (0.056) | 0.145 (0.247) | | |
| Left | Posterior | Italian | 0.103 (0.119) | 0.047 (0.067) | 11.3 | .002 |
| Left | Posterior | Spanish | 0.048 (0.072) | 0.117 (0.155) | | |
| Right | Anterior | Italian | 0.122 (0.211) | 0.058 (0.112) | 12.3 | .001 |
| Right | Anterior | Spanish | 0.003 (0.091) | 0.167 (0.296) | | |
| Right | Middle | Italian | 0.138 (0.228) | 0.055 (0.102) | 3.9 | .054 |
| Right | Middle | Spanish | 0.027 (0.078) | 0.174 (0.267) | | |
| Right | Posterior | Italian | 0.104 (0.148) | 0.040 (0.050) | 11.2 | .02 |
| Right | Posterior | Spanish | 0.035 (0.067) | 0.104 (0.149) | | |



These results suggest that participants deployed a similar amount of attentional resources to process the onset of the sentences in all experimental conditions. Alpha suppression, however, recovered less for backward sentences around their end, suggesting that backward speech demanded more attention.

Interhemispheric Comparisons

We did not observe significant hemispheric differences for either theta or low gamma band power when we restricted our comparisons to the sensors and frequency windows used previously (Giraud et al., 2007; Luo & Poeppel, 2007; see Supplementary Table 2). However, although we did not find hemispheric differences, we did observe regional differences in theta band power, with significantly higher responses over central anterior electrodes as compared with middle or posterior electrodes (see above).

ERPs

We did not find significant differences by Group, Language, or Type of Utterance for the ERPs from 0 to 500 or from 0 to 2900 msec after sentence onset (see Supplementary Figure 4), suggesting that the evoked activity to speech was similar in both groups and across all experimental conditions. The same holds true when analyzing the global field power over the entire period, comparing forward versus backward utterances or the forward native language against the closest language in each group of subjects (Spanish vs. Italian for the Italian- and Spanish-speaking groups).

DISCUSSION

Our results show that changes in brain oscillations may reflect different steps of speech processing, including syllable tracking/sampling and sentence level meaning processing (see Figure 1A and B). In fact, the early-starting and sustained increase in theta power observed for the three evaluated languages played forward supports the AST proposal about syllable tracking (Poeppel, 2003). Moreover, the time course of the increase in middle gamma band power, observed in the native language only, fits well with predictions from psycholinguistic models about semantic/syntactic integration for speech (see below).

Figure 5. The mean time–frequency power across all channels per group and experimental condition for backward (A) and forward (B) utterances is depicted. The color bar indicates the spectral power in SD units (σ).


Gamma Power Increase for the Native Language

The only significant difference in brain oscillations related to native language processing was an increase in middle gamma band power. The middle gamma band increase became significant 1000 msec after sentence onset, remained high for several hundreds of milliseconds, and decreased around the end of the sentence.

Figure 6. The spatial distribution of the mean power in the Italian and Spanish groups for the time–frequency ROIs significantly different from baseline, that is, theta band (4–8 Hz) from 100 to 3200 msec; alpha band (9–14 Hz) from 0 to 500 msec and from 2300 to 2800 msec; and middle gamma band (55–75 Hz) from 1000 to 2900 msec. Ears and nose indicate the orientation of the skull maps. Color bars indicate the amplitude of the power in SD units.


The time course of middle gamma band activity observed for the native language fits well with the time course predicted by current psycholinguistic models for the semantic/syntactic unification process. Both parallel and serial models acknowledge that sentence level comprehension relies on a temporal binding of speech units that evolves while speech unravels (e.g., Hagoort & van Berkum, 2007; Culicover & Jackendoff, 2006; Gorrell, 1998; McClelland, St. John, & Taraban, 1989; Frazier, 1987; Marslen-Wilson & Tyler, 1980). The semantic and/or syntactic interpretation of an utterance cannot be guessed from the perception, or even recognition, of its initial word; however, it can be anticipated before the utterance's end. Unification thus should take place in between, involving several processes (e.g., temporal binding of prelexical and lexical aspects as well as their semantic–syntactic–pragmatic unification) that likely entail dynamic coordination of a series of close and distant linguistic and nonlinguistic neural networks.

The fact that we did not observe significant increments in gamma oscillations for the close language, although participants had access to phonemes, prosodic units, syllables and, to some extent, lexical and morphosyntactic structures that they could not integrate either with neighboring words or in the context of the message, further supports our claim that middle gamma band activity reflects linguistic processes of semantic integration of several words into a comprehensible meaning. Note that Spanish and Italian participants reported the recognition of isolated words from the close foreign language based on lexical similarities with their native language. However, despite its salience and memorization, isolated lexical recognition did not lead to a sustained increase in middle gamma band power for foreign languages. Middle gamma band power thus cannot be associated with the salience or verbal memory of isolated words, lending further support to our proposal that it reflects integration while the utterance unravels in time. Also, differences in performance between conditions, in particular the fact that higher performance was observed for the native language sentences, are unlikely to explain the middle gamma band results, as we did not observe any significant correlation between performance and the amplitude of the gamma oscillation or between the different experimental conditions (data not shown). Previous studies have shown that low gamma band activity positively correlates with attentional demands (Palva & Palva, 2007) and/or cognitive effort (Simos, Papanikolaou, Sakkalis, & Micheloyannis, 2002). Our results, however, cannot be attributed to these factors, as the most attention-demanding conditions (such as backward speech, which had lower performance as compared with forward conditions) were not associated with a significant increase in middle gamma power. Subvocal activation associated with the programming of tongue movements has been suggested to explain increases in low gamma power from 30 to 35 Hz (Giraud et al., 2007). Similarly, to rehearse the utterances, native speakers could have silently repeated the spoken sentences until their end. However, we did not observe significant differences in this frequency range in any experimental condition. Increases in low and middle gamma band activity have also been reported during tasks requiring verbal episodic memory (Schack & Weiss, 2005) and spatial working memory of auditory objects (Lutzenberger, Ripper, Busse, Birbaumer, & Kaiser, 2002). The fact that the gamma band increased only for the native language rules out working memory maintenance related to the auditory match-to-sample task as an explanation. Crucially, our results also do not resemble the topography, time course, or frequency content of artifactual gamma band activity caused by microsaccades (Yuval-Greenberg, Tomer, Keren, Nelken, & Deouell, 2008; see also Melloni, Schwiedrzik, Rodriguez, & Singer, 2009).

In our study, middle gamma band activity decreased shortly before the end of the utterance. Interpretations of a given spoken sentence can, however, go beyond its end and may involve further integrative processing of speech. We believe that the imposed task, which did not ask for the analysis of the semantic, syntactic, or pragmatic content of the utterances but for the detection of meaningless chunks, may have restricted the interpretation of the sentence to the anticipated and most straightforward one. Nonetheless, in natural language, brain oscillations may reflect the manifold interpretations that any given sentence can have, particularly in a discursive context, and could possibly extend beyond its end.

Mechanistic models addressing how information provided by phonemes, words, and context is integrated into a single linguistic meaning from normal speech are rare. Moreover, previous studies on how sentence level meaning is accessed from speech have mostly relied on violation paradigms, reading protocols, and the analysis of ERPs, preventing a direct comparison with our study. However, our results fit well with those predicted by the binding by synchrony hypothesis, originally proposed for the visual system. In this proposal, gamma band synchronization serves as a neural mechanism for the integration of signals, separated in space and time, yielding a unified sensory experience of a meaningful object (Fries, 2005; Varela et al., 2001; Singer, 1999; Singer & Gray, 1995). Increases in gamma band synchronization have been observed for the representation of coherent objects within (Tallon-Baudry & Bertrand, 1999) and between (Schneider et al., 2008; Widmann et al., 2007) sensory modalities, denoting the prominent role of gamma oscillations in unimodal and multimodal representations. Previous studies on language comprehension from reading report that low gamma band responses increase when semantic unification is possible (Penolazzi et al., 2009; Hald et al., 2006) and when pragmatic information, that is, world knowledge, is integrated (Hagoort et al., 2004). These results speak for a unification process that takes several sources of information into account (e.g., word meaning, world knowledge, listener's expectation). Gamma band synchronization for speech is widely observed over the scalp, suggesting that different neuronal assemblies involved in the processing of the different features of speech objects interact during speech temporal binding. In our study, this type of integration was only possible when listeners accessed meaningful units from their native language.

1160 Journal of Cognitive Neuroscience Volume 24, Number 5

Page 13: Brain Oscillations during Spoken Sentence Processing

different features of speech objects are interacting duringspeech temporal binding. In our study, this type of integra-tion was only possible when listeners accessed to meaning-ful units from their native language.Binding for sentence comprehension involves two com-

ponents: a spatial and a temporal one. Binding in spacerefers to the integration of distributed neuronal ensemblescoding for the meaning of individual words. Binding intime refers to the fact that language unravels sequentially,and to understand the meaning of a sentence, informationneeds to be bound in time. Thus, the distributed neural en-sembles have to be integrated over space and time for sen-tence level meaning to emerge. There is evidence thatgamma oscillations bind information in space across bothshort and long distances (Varela et al., 2001). For the inte-gration in time, we propose that the brain dynamically in-tegrates the incoming words in the context of the previousones by exploiting predictions. In particular, we hypothe-size that, as words unravel, they preactivate the semanticnetwork most closely related to them. Preactivationsreflecting predictions have been proposed to take placein the gamma frequency band (Engel, Fries, & Singer,2001) in the form of subthreshold oscillations. Predictionsserve the purpose to facilitate information processing(Melloni, Schwiedrzik, Müller, Rodriguez, & Singer, 2011).Thus, when the incoming information matches the predic-tion, resonance occurs; hence, information is selected, andan increase in gamma oscillation is observed (Grossberg,2009). When unpredictable words appear, they do not res-onate, which is seen as decreases in gamma oscillations. Insummary, we propose that, as information arrives sequen-tially, the new information joins the already synchronouslyoscillating assembly, which now represents the compoundand current meaning. Previous studies on reading haveshown that already at 90 msec predictable words differfrom unpredictable ones (Dambacher, Rolfs, Göllner,Kliegl, & Jacobs, 2009), further confirming that languagecomprehension takes places at a remarkable speed, whichin turn necessitates neuronal mechanism that supportthese temporal dynamics. Gamma oscillation could servethat purpose because they represent fast integration win-dows and by adjusting the phase within and betweenpopulations of neurons allow for effective integration orrouting of information (Fries, 2009).Contrary to previous studies, we did not find significant
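To make the resonance intuition concrete, the sketch below uses a deliberately simple physical analogy rather than a neural model: a damped oscillator driven at or away from its natural frequency. All parameters (a 65 Hz "natural" frequency in the middle gamma range, the damping ratio, the drive frequencies) are illustrative assumptions, not values estimated from our data; the point is only that input matching an ongoing rhythm produces a much larger response than input that does not.

```python
import numpy as np

def steady_state_amplitude(f_drive, f_natural=65.0, zeta=0.05, force=1.0):
    """Steady-state amplitude of a damped harmonic oscillator under a
    sinusoidal drive: largest when the drive matches the natural frequency."""
    w0 = 2 * np.pi * f_natural  # natural (angular) frequency, here mid-gamma
    w = 2 * np.pi * f_drive     # drive (angular) frequency
    return force / np.sqrt((w0**2 - w**2) ** 2 + (2 * zeta * w0 * w) ** 2)

matched = steady_state_amplitude(65.0)     # input "fits" the ongoing rhythm
mismatched = steady_state_amplitude(30.0)  # input does not fit
print(f"matched / mismatched amplitude ratio: {matched / mismatched:.1f}")
```

Running this prints a ratio of roughly 8, illustrating, in toy form, why input that matches a prediction carried by an ongoing oscillation could be selectively amplified while mismatching input is not.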

Contrary to previous studies, we did not find significant differences in the low gamma band (i.e., 20–50 Hz) related to speech processing (Bastiaansen et al., 2010; Penolazzi et al., 2009; Giraud et al., 2007). However, low, middle, and high gamma subbands may exhibit different patterns of responses and have different neural origins (for a review, see Roopun et al., 2008). Although the differences between our study and previous reports may be due to the specifics of the task and study design (spoken vs. written stimuli, no-violation vs. violation paradigms, and cross-linguistic vs. only native language), it is important to note that the physical properties of the active neural assemblies (such as size and geometry; von Stein & Sarnthein, 2000) and the cortical layers most involved (Roopun et al., 2006; Cunningham et al., 2004) can also affect the frequency of network oscillations.
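For readers who wish to see what a band-limited power time course of the kind discussed above looks like in practice, the following minimal sketch extracts a power envelope with a zero-phase bandpass filter and the Hilbert transform. This is a generic textbook procedure applied to a synthetic signal, not the analysis pipeline used in this study; the sampling rate, filter order, and the 55–75 Hz band edges are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power_envelope(eeg, fs, f_lo=55.0, f_hi=75.0, order=4):
    """Time course of band-limited power: bandpass filter the signal,
    then take the squared magnitude of the analytic (Hilbert) signal."""
    b, a = butter(order, [f_lo, f_hi], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg)         # zero-phase bandpass
    return np.abs(hilbert(filtered)) ** 2  # instantaneous power

# Toy demonstration: a 70 Hz burst appearing 1 s after "utterance onset".
fs = 500.0
t = np.arange(0, 3.0, 1 / fs)
eeg = np.random.randn(t.size) * 0.5
eeg[t > 1.0] += np.sin(2 * np.pi * 70 * t[t > 1.0])

power = band_power_envelope(eeg, fs)
baseline = power[t < 0.5].mean()           # pre-increase reference window
print(f"relative power after 1 s: {power[t > 1.5].mean() / baseline:.1f}x")
```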

Theta, Alpha, and Beta Range Oscillations

We observed a significant increase in theta power in all experimental conditions, which was greater for forward than backward speech in both groups. The theta results support the AST proposal for syllable sampling (Poeppel, 2003), although we did not find right-hemispheric dominance. The increase in theta band observed for Japanese utterances might be ascribed to the possibility that Japanese morae were perceived as syllables. In fact, morae are hardly processed by nonnative speakers of Japanese (Menning, Imaizumi, Zwitserlood, & Pantev, 2002). The greater increase in theta band for forward as compared with backward utterances suggests that syllabic structure is sampled with more difficulty, but is not absent, in backward speech.
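The link between theta-rate sampling and syllabic rhythm can be illustrated by computing the amplitude-envelope (modulation) spectrum of a speech-like signal: syllables modulate the envelope at roughly 3–8 Hz. The sketch below uses a synthetic carrier with five "syllables" per second rather than real speech, and all parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def modulation_spectrum(audio, fs, env_cut=30.0):
    """Amplitude-envelope spectrum of a speech-like signal; syllabic
    rhythm shows up as energy in the theta (roughly 3-8 Hz) range."""
    envelope = np.abs(hilbert(audio))                  # broadband envelope
    b, a = butter(4, env_cut, btype="lowpass", fs=fs)
    envelope = filtfilt(b, a, envelope - envelope.mean())
    spectrum = np.abs(np.fft.rfft(envelope)) / envelope.size
    freqs = np.fft.rfftfreq(envelope.size, 1 / fs)
    return freqs, spectrum

# Synthetic "speech": a 200 Hz carrier with 5 syllables per second.
fs = 2000.0
t = np.arange(0, 2.0, 1 / fs)
audio = (1 + np.sin(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 200 * t)

freqs, spec = modulation_spectrum(audio, fs)
theta = (freqs > 3) & (freqs < 8)
print(f"peak modulation frequency: {freqs[theta][np.argmax(spec[theta])]:.1f} Hz")
```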

Previous studies have interpreted increases in theta power as a biological marker of lexical-semantic access. For example, during the comprehension of a written story displayed word by word, open-class words (e.g., nouns, verbs, and adjectives) and closed-class words (e.g., articles, determiners, and prepositions) elicit different patterns of increase in theta band power (Bastiaansen et al., 2005). In contrast, our results showed an increase in theta band power not only in the experimental condition in which open-class words were entirely accessible, that is, the forward native language, but also in conditions where lexical access was completely unfeasible.

Theta band activity has also been associated with working memory load. In fact, theta power (Bastiaansen, van Berkum, & Hagoort, 2002a) and theta coherence (Weiss & Mueller, 2003) increase linearly over time while well-formed written or auditory sentences are perceived. Furthermore, during the sequential reading of sentences, theta coherence increases after the detection of a syntactic (Bastiaansen, van Berkum, & Hagoort, 2002b) or semantic (Hald et al., 2006; Hagoort et al., 2004) violation. Likewise, during spoken sentence processing, theta coherence is significantly greater during and after the perception of subject–subject than subject–object relative clauses (Weiss et al., 2005), supporting the idea that more complex sentences require more verbal working memory resources. In our study, we observed a sustained increase in theta band power from the beginning to the end of the utterances, without significant differences across either experimental conditions or groups. In contrast, in the conditions in which working memory demands were highest, that is, backward speech, the increase in theta activity was smaller, suggesting that theta oscillations did not reflect verbal working memory effort in our study.
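As a toy illustration of the coherence measure used in the studies cited above (and distinct from our own power analyses), the following sketch computes theta-band spectral coherence between two simulated channels that share a 6 Hz component. The simulated signals and all parameters are assumptions for demonstration only.

```python
import numpy as np
from scipy.signal import coherence

# Two simulated EEG channels sharing a 6 Hz (theta) component plus noise.
rng = np.random.default_rng(0)
fs = 250.0
t = np.arange(0, 20.0, 1 / fs)
shared = np.sin(2 * np.pi * 6 * t)
ch1 = shared + rng.standard_normal(t.size)
ch2 = shared + rng.standard_normal(t.size)

# Magnitude-squared coherence; values near 1 indicate strong coupling.
f, cxy = coherence(ch1, ch2, fs=fs, nperseg=512)
theta = (f >= 4) & (f <= 7)
print(f"mean theta-band coherence: {cxy[theta].mean():.2f}")
```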

Concerning the alpha band, we found significantly higher power near the end as compared with the onset of the utterances in all experimental conditions, suggesting that attentional demands were initially greater and then decreased toward the end of each utterance. This pattern of initial alpha suppression may reflect the operation of a general attention mechanism for auditory stimuli. The tendency toward longer alpha suppression in backward as compared with forward utterances supports the idea that backward speech involved higher attentional demands. Our results agree with previous reports showing that alpha activity is sensitive to general task demands such as attentional processes and that it decreases when the task requires more attentional resources (Jensen, Gelfand, Kounios, & Lisman, 2002; Klimesch, 1999). Alternatively, the increase in alpha oscillations could also reflect active inhibition of areas unrelated to the current task (Jensen & Mazaheri, 2010).
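A minimal way to quantify the onset-versus-end alpha contrast described above is to compare Welch power spectral density estimates in an early and a late window of each trial. The sketch below does this on a simulated trial with ramping 10 Hz amplitude; the window lengths and band edges are illustrative assumptions, not the parameters of our analysis.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(segment, fs, f_lo=8.0, f_hi=12.0):
    """Mean alpha-band power of an EEG segment, estimated via Welch PSD."""
    f, psd = welch(segment, fs=fs, nperseg=128)
    band = (f >= f_lo) & (f <= f_hi)
    return psd[band].mean()

# Simulated single-trial EEG: alpha amplitude grows toward utterance end.
fs = 250.0
t = np.arange(0, 4.0, 1 / fs)
alpha = (0.5 + t / 4.0) * np.sin(2 * np.pi * 10 * t)  # ramping 10 Hz rhythm
eeg = alpha + np.random.randn(t.size)

onset, end = eeg[t < 1.0], eeg[t > 3.0]
print(f"end / onset alpha power: {alpha_power(end, fs) / alpha_power(onset, fs):.1f}")
```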

We did not find significant differences in beta band power. It has been reported that beta power increases linearly as sentences evolve in time and that this increase is disrupted by syntactic violations (Bastiaansen et al., 2010). Such increases in beta band activity have been interpreted as reflecting a step of semantic/syntactic unification (Rohm et al., 2001) and also an increase in semantic working memory demands (Haarmann et al., 2002) during sentence processing. Our data do not support these proposals; however, given that we used a different experimental paradigm, further studies are necessary to clarify the nature of this discrepancy.

In summary, our study contributes to a better understanding of the temporal binding problem for speech and may have important consequences for current neurocognitive models of speech processing. First, we report a plausible neuronal mechanism for accessing sentence-level meaning from spoken sentences, that is, gamma band synchrony for the native language, which has neither been observed nor implicated in previous speech studies. Second, our results provide empirical evidence of a double dissociation in middle gamma band power: Using the same set of utterances, we found that middle gamma band power increased for Italian speakers when they heard forward Italian but not forward Spanish, whereas the opposite was found for Spanish speakers. We interpret this switch in the gamma band patterns as reflecting the fact that only when native speakers listen to their native language can they integrate words into a meaningful message. In contrast, both groups of participants showed similar time courses in lower frequencies, that is, the theta band, when listening to both Italian and Spanish. We interpret this similar response in low frequencies as reflecting the processing of speech units, such as syllables, that lack meaning and are accessible from the unknown languages we used (Spanish and Italian largely share their phonemic and syllabic repertoires). Third, our results support the extension of the binding-by-synchrony hypothesis to speech comprehension. The cross-linguistic nature of this study provides strong evidence for an integrative mechanism underpinning access to the plain meaning of spoken sentences. Thus, our study is the first to directly show an increase in middle gamma band synchrony as a neural mechanism for the processing and comprehension of spoken sentences in the native language.

Acknowledgments

We thank Eugenio Rodriguez for providing analysis tools, Caspar M. Schwiedrzik for helping in editing the manuscript, and five anonymous reviewers for their insightful comments during the revision of the manuscript. This work was supported by grants Fondecyt 1090662, PIA-Conicyt-CIE-05, and PBCT-PSD72 to M. P.

Reprint requests should be sent to Marcela Peña, Cognitive Neuroscience Sector, Scuola Internazionale Superiore di Studi Avanzati, via Bonomea 265, 34136 Trieste, Italy, or via e-mail: [email protected].

REFERENCES

Babiloni, C., Miniussi, C., Babiloni, F., Carducci, F., Cincotti, F., Del Percio, C., et al. (2004). Sub-second "temporal attention" modulates alpha rhythms. A high-resolution EEG study. Brain Research, Cognitive Brain Research, 19, 259–268.

Bastiaansen, M., Magyari, L., & Hagoort, P. (2010). Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. Journal of Cognitive Neuroscience, 22, 1333–1347.

Bastiaansen, M., Oostenveld, R., Jensen, O., & Hagoort, P. (2008). I see what you mean: Theta power increases are involved in the retrieval of lexical semantic information. Brain and Language, 106, 15–28.

Bastiaansen, M., van Berkum, J. J., & Hagoort, P. (2002a). Event-related theta power increases in the human EEG during online sentence processing. Neuroscience Letters, 323, 13–16.

Bastiaansen, M., van Berkum, J. J., & Hagoort, P. (2002b). Syntactic processing modulates the theta rhythm of the human EEG. Neuroimage, 17, 1479–1492.

Bastiaansen, M., Van der Linden, M., ter Keurs, M., Dijkstra, T., & Hagoort, P. (2005). Theta responses are involved in lexico-semantic retrieval during language processing. Journal of Cognitive Neuroscience, 17, 530–541.

Braeutigam, S., Bailey, A. J., & Swithenby, S. J. (2001). Phase-locked gamma band responses to semantic violation stimuli. Brain Research, Cognitive Brain Research, 10, 365–377.

Buiatti, M., Peña, M., & Dehaene-Lambertz, G. (2009). Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. Neuroimage, 44, 509–519.

Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in Cognitive Sciences, 10, 413–418.

Cunningham, M. O., Whittington, M. A., Bibbig, A., Roopun, A., LeBeau, F. E., Vogt, A., et al. (2004). A role for fast rhythmic bursting neurons in cortical gamma oscillations in vitro. Proceedings of the National Academy of Sciences, U.S.A., 101, 7152–7157.

Dambacher, M., Rolfs, M., Göllner, K., Kliegl, R., & Jacobs, A. M. (2009). Event-related potentials reveal rapid verification of predicted visual input. PLoS One, 4, e5047.

Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top–down processing. Nature Reviews Neuroscience, 2, 704–716.

Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5, 16–25.

Frazier, L. (1987). Theories of sentence processing. In J. Garfield (Ed.), Modularity in knowledge representation and natural-language processing (pp. 291–307). Cambridge, MA: MIT Press.

Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84.

Friederici, A. D., & Weissenborn, J. (2007). Mapping sentence form onto meaning: The syntax-semantic interface. Brain Research, 1146, 50–58.

Fries, P. (2005). A mechanism for cognitive dynamics: Neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9, 474–480.

Fries, P. (2009). Neuronal gamma-band synchronization as a fundamental process in cortical computation. Annual Review of Neuroscience, 32, 209–224.

Giraud, A., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S. J., & Laufs, H. (2007). Endogenous cortical rhythms determine cerebral specialization for speech perception and production. Neuron, 56, 1127–1134.

Gorrell, P. (1998). Syntactic analysis and reanalysis in sentence processing. In J. D. Fodor & F. Ferreira (Eds.), Reanalysis in sentence processing (pp. 201–245). Dordrecht, the Netherlands: Kluwer Academic Publishers.

Greenberg, S., Carvey, H., Hitchcock, L., & Chang, S. (2003). Temporal properties of spontaneous speech: A syllable-centric perspective. Journal of Phonetics, 31, 465–485.

Grossberg, S. (2009). Cortical and subcortical predictive dynamics and learning during perception, cognition, emotion and action. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 364, 1223–1234.

Haarmann, H. J., Cameron, K. A., & Ruchkin, D. S. (2002). Neural synchronization mediates on-line sentence processing: EEG coherence evidence from filler-gap constructions. Psychophysiology, 39, 820–825.

Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9, 416–423.

Hagoort, P. (2008). The fractionation of spoken language understanding by measuring electrical and magnetic brain signals. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 363, 1055–1106.

Hagoort, P., Hald, L., Bastiaansen, M., & Petersson, K. M. (2004). Integration of word meaning and world knowledge in language comprehension. Science, 304, 438–441.

Hagoort, P., & van Berkum, J. (2007). Beyond the sentence given. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 362, 801–811.

Hald, L. A., Bastiaansen, M. C., & Hagoort, P. (2006). EEG theta and gamma responses to semantic violations in online sentence processing. Brain and Language, 96, 90–105.

International Phonetic Association. (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge, UK: Cambridge University Press.

Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford, UK: Oxford University Press.

Jensen, O., Gelfand, J., Kounios, J., & Lisman, J. E. (2002). Oscillations in the alpha band (9–12 Hz) increase with memory load during retention in a short-term memory task. Cerebral Cortex, 12, 877–882.

Jensen, O., & Mazaheri, A. (2010). Shaping functional architecture by oscillatory alpha activity: Gating by inhibition. Frontiers in Human Neuroscience, 4, 1–8.

Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29, 169–195.

Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12, 535–540.

Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54, 1001–1010.

Lutzenberger, W., Ripper, B., Busse, L., Birbaumer, N., & Kaiser, J. (2002). Dynamics of gamma-band activity during an audiospatial working memory task in humans. Journal of Neuroscience, 22, 5630–5638.

Marslen-Wilson, W. D., & Tyler, L. K. (1980). The temporal structure of spoken language understanding. Cognition, 8, 1–71.

McClelland, J. L., St. John, M., & Taraban, R. (1989). Sentence comprehension: A parallel distributed processing approach. Language and Cognitive Processes, 4, 287–336.

McQueen, J. M., Cutler, A., & Norris, D. (2003). Flow of information in the spoken word recognition system. Speech Communication, 41, 257–270.

Melloni, L., Molina, C., Peña, M., Torres, D., Singer, W., & Rodriguez, E. (2007). Synchronization of neural activity across cortical areas correlates with conscious perception. Journal of Neuroscience, 27, 2858–2865.

Melloni, L., Schwiedrzik, C. M., Müller, N., Rodriguez, E., & Singer, W. (2011). Expectations change the signatures and timing of electrophysiological correlates of perceptual awareness. Journal of Neuroscience, 31, 1386–1396.

Melloni, L., Schwiedrzik, C. M., Rodriguez, E., & Singer, W. (2009). (Micro)saccades, corollary activity and cortical oscillations. Trends in Cognitive Sciences, 13, 239–245.

Menning, H., Imaizumi, S., Zwitserlood, P., & Pantev, C. (2002). Plasticity of the human auditory cortex induced by discrimination learning of non-native, mora-timed contrasts of the Japanese language. Learning & Memory, 9, 253–267.

Molinaro, N., Barber, H. A., & Carreiras, M. (2011). Grammatical agreement processing in reading: ERP findings and future directions. Cortex, 47, 908–930.

Nespor, M., & Vogel, I. (1986). Prosodic phonology. Dordrecht, the Netherlands: Foris.

Palva, S., & Palva, J. M. (2007). New vistas for alpha-frequency band oscillations. Trends in Neurosciences, 30, 150–158.

Penolazzi, B., Angrilli, A., & Job, R. (2009). Gamma EEG activity induced by semantic violation during sentence reading. Neuroscience Letters, 465, 74–78.

Poeppel, D. (2003). The analysis of speech in different temporal integration windows: Cerebral lateralization as "asymmetric sampling in time". Speech Communication, 41, 245–255.

Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73, 265–292.

Rodriguez, E., Lachaux, J. P., Martinerie, J., Renault, B., & Varela, F. J. (1999). Perception's shadow: Long-distance synchronization of human brain activity. Nature, 397, 430–433.

Rohm, D., Klimesch, W., Haider, H., & Doppelmayr, M. (2001). The role of theta and alpha oscillations for language comprehension in the human electroencephalogram. Neuroscience Letters, 310, 137–140.

Roopun, A. K., Kramer, M. A., Carracedo, L. M., Kaiser, M., Davies, C. H., Traub, R. D., et al. (2008). Temporal interactions between cortical rhythms. Frontiers in Neuroscience, 2, 145–154.

Roopun, A. K., Middleton, S. J., Cunningham, M. O., LeBeau, F. E., Bibbig, A., Whittington, M. A., et al. (2006). A beta2-frequency (20–30 Hz) oscillation in nonsynaptic networks of somatosensory cortex. Proceedings of the National Academy of Sciences, U.S.A., 103, 15646–15650.

Schack, B., & Weiss, S. (2005). Quantification of phase synchronization phenomena and their importance for verbal memory processes. Biological Cybernetics, 92, 275–287.

Schneider, T. R., Debener, S., Oostenveld, R., & Engel, A. K. (2008). Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming. Neuroimage, 42, 1244–1254.

Shahin, A. J., Picton, T. W., & Miller, L. M. (2009). Brain oscillations during semantic evaluation of speech. Brain and Cognition, 70, 259–266.

Simos, P. G., Papanikolaou, E., Sakkalis, E., & Micheloyannis, S. (2002). Modulation of gamma-band spectral power by cognitive task complexity. Brain Topography, 14, 191–196.

Singer, W. (1999). Neuronal synchrony: A versatile code for the definition of relations? Neuron, 24, 49–65.

Singer, W. (2002). Cognition, gamma oscillations and neuronal synchrony. In R. C. Reisin, M. R. Nuwer, M. Hallett, & C. Medina (Eds.), Advances in clinical neurophysiology (pp. 3–22). Amsterdam: Elsevier.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555–586.

Tallon-Baudry, C. (2009). The roles of gamma-band oscillatory synchrony in human visual cognition. Frontiers in Bioscience, 14, 321–332.

Tallon-Baudry, C., & Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3, 151–162.

Van Berkum, J. J. (2008). Understanding sentences in context: What brain waves can tell us. Current Directions in Psychological Science, 17, 376–380.

Van Berkum, J. J. A., Brown, C. M., Zwitserlood, P., Kooijman, V., & Hagoort, P. (2005). Anticipating upcoming words in discourse: Evidence from ERPs and reading times. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 443–467.

Varela, F., Lachaux, J. P., Rodriguez, E., & Martinerie, J. (2001). The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience, 2, 229–239.

von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. International Journal of Psychophysiology, 38, 301–313.

Weiss, S., & Mueller, H. M. (2003). The contribution of EEG coherence to the investigation of language. Brain and Language, 85, 325–343.

Weiss, S., Mueller, H. M., Schack, B., King, J. W., Kutas, M., & Rappelsberger, P. (2005). Increased neuronal communication accompanying sentence comprehension. International Journal of Psychophysiology, 57, 129–141.

Widmann, A., Gruber, T., Kujala, T., Tervaniemi, M., & Schröger, E. (2007). Binding symbols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral Cortex, 17, 2696–2702.

Yamagishi, N., Goda, N., Callan, D. E., Anderson, S. J., & Kawato, M. (2005). Attentional shifts towards an expected visual target alter the level of alpha-band oscillatory activity in the human calcarine cortex. Brain Research, Cognitive Brain Research, 25, 799–809.

Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601–621.

Yuval-Greenberg, S., Tomer, O., Keren, A. S., Nelken, I., & Deouell, L. Y. (2008). Transient induced gamma-band response in EEG as a manifestation of miniature saccades. Neuron, 58, 429–441.
