INVESTIGATING THE NEURAL CORRELATES OF CROSSMODAL FACILITATION AS A RESULT OF
ATTENTIONAL CUEING: AN EVENT-RELATED fMRI STUDY
by
Zainab Fatima
A thesis submitted in conformity with the requirements for the degree of Master of Science
Institute of Medical Science, University of Toronto
© Copyright by Zainab Fatima, 2008
Investigating the Neural Correlates of Crossmodal Facilitation as
a Result of Attentional Cueing: an Event-Related fMRI Study
Zainab Fatima
Master of Science
Institute of Medical Science, University of Toronto
2008
Abstract
Attentional cueing modulated neural processes differently depending on input modality. I used
event-related fMRI to investigate how auditory and visual cues affected reaction times to
auditory and visual targets. Behavioural results showed that responses were faster to cues than to targets, and faster when cues were auditory rather than visual. The first result was
supported by an increase in BOLD percent signal change in sensory cortices upon cue but not
target presentation. Task-related activation patterns showed that the auditory cue activated
auditory and visual cortices while the visual cue activated the visual cortices and the fronto-polar
cortex. Next, I computed brain-behaviour correlations for both cue types which revealed that the
auditory cue recruited medial visual areas and a fronto-parietal attentional network to mediate
behaviour while the visual cue engaged a posterior network composed of lateral visual areas and
subcortical structures. The results suggest that crossmodal facilitation occurs via independent
neural pathways depending on cue modality.
Acknowledgments
I am still in awe of the fact that I have now officially completed my Master's degree. This
is partly because writing the thesis appeared to be such an arduous task at the beginning.
Nevertheless, with time and patience, the page numbers increased and the quality of the writing
improved. In retrospect, I realize that the one person who constantly supported me and pushed
me to produce the best possible work is my supervisor – Dr. Anthony Randal McIntosh. His
relentless critiques of my thesis and constant insistence on getting tasks completed have brought
me to this pinnacle in my life. I am deeply grateful to him. I admire his strength of character and
his ability to always be innovative in the light of academic adversity. He regards criticisms of his
work as challenges and is truly an inspiration when it comes to scientific knowledge. If my
scientific career is an inkling of what Randy’s is, I would be quite satisfied with my progress. So,
thank you Randy for being the person that you are and for keeping me motivated throughout this
whole process.
I would like to thank past and present members of the McIntosh lab who have provided
valuable feedback in various forms during the preparation of my graduate work. These people
include Maria Tassopoulos, Jordan Poppenk, Grigori Yourganov, Tanya Brown, Roxane Itier,
Vasily Vakorin, Diana Khrapatch, Anjali Raja, Antonio Vallesi, Wilkin Chau, Michele Korostil,
Signe Bray, Jeremy Caplan and Mackenzie Glaholt. I would like to especially thank Natasa
Kovacevic – for her insights with regards to my experiment, Andreea Diaconescu and Bratislav
Misic – for allowing me to vent and fret at any given time during my writing episodes, Andrea
Protzner – for constantly keeping me caffeinated, Sandra Moses – for her gentle way of
conveying criticisms, and Hana Burian – for the sushi runs and bear hugs.
I would like to thank my committee members – Drs. Adam Anderson and Claude Alain
for their support and for providing me with thesis-related feedback so promptly. I would also like
to thank Karen Davis, graduate coordinator at the Institute of Medical Science (IMS), for
allowing me to defend my thesis within such stringent time constraints.
Last but not least, I would like to thank my parents and brother for their faith in my
ability to accomplish any goal that I have set for myself and for taking care of me. I would like to
thank my husband and best friend, Ali - without his foot-rubs, back massages and constant
coaxing – I would be far from completing any graduate work. And finally, I would like to thank
my baby girl who has so patiently stayed in my tummy till I have completed my academic
responsibilities. This is one journey we’ve already shared and I can’t wait to meet you, my
darling.
Table of Contents
Abstract ........................................................................................................................................... ii
Acknowledgments.......................................................................................................................... iii
Table of Contents............................................................................................................................ v
List of Tables ............................................................................................................................... viii
List of Figures ................................................................................................................................ ix
List of Appendices .......................................................................................................................... x
List of Abbreviations ..................................................................................................................... xi
Chapter 1: Literature Review.......................................................................................................... 1
1.1 Overview............................................................................................................................. 1
1.2 Classifying Different Attentional Mechanisms .................................................................. 1
1.2.1 Orienting ................................................................................................................. 1
1.2.2 Endogenous vs. Exogenous Shifts .......................................................................... 2
1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms ........................................... 4
1.4 Crossmodal Asymmetry Reported In Cognitive, Physiological, and Developmental Studies................................................................................................................................. 7
1.4.1 Auditory to Visual (A-V) Interactions.................................................................... 8
1.4.2 Visual to Auditory (V-A) Interactions.................................................................. 10
1.5 General Anatomy of Central Auditory and Visual Pathways........................................... 11
1.5.1 The Auditory Pathway .......................................................................................... 11
1.5.2 The Visual Pathway .............................................................................................. 13
1.6 From Anatomy to Function - A Dynamic Systems’ Perspective...................................... 14
1.7 An Overview of fMRI....................................................................................................... 16
1.7.1 Basic MRI Physics................................................................................................ 16
1.7.2 Physiological Basis of BOLD fMRI ..................................................................... 18
1.7.3 Coupling of Neuronal Activity & BOLD ............................................................. 19
Chapter 2: Aims and Hypotheses.................................................................................................. 22
Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices .................................................................................................................................... 24
3.1 Introduction....................................................................................................................... 24
3.2 Materials and Methods...................................................................................................... 26
3.2.1 Participants............................................................................................................ 26
3.2.2 Stimuli................................................................................................................... 27
3.2.3 Apparatus .............................................................................................................. 27
3.2.4 Procedure .............................................................................................................. 28
3.2.4.1 Trial Structure......................................................................................... 28
3.2.4.2 Task Types.............................................................................................. 28
3.2.4.3 fMRI Session .......................................................................................... 29
3.2.5 fMRI Scanning Parameters ................................................................................... 30
3.2.6 Data Analysis ........................................................................................................ 30
3.2.6.1 Pre-processing Pipeline. ......................................................................... 30
3.2.6.2 Statistical Analysis. ................................................................................ 32
3.3 Results............................................................................................................................... 34
3.3.1 Behavioural Performance...................................................................................... 34
3.3.2 fMRI Results......................................................................................................... 35
3.4 Discussion ......................................................................................................................... 36
Tables............................................................................................................................................ 40
Figures........................................................................................................................................... 46
Chapter 4: The Interplay of Cue Modality and Response Latency in Neural Networks Supporting Crossmodal Facilitation......................................................................................... 60
4.1 Introduction....................................................................................................................... 60
4.2 Methods............................................................................................................................. 62
4.2.1 Data Analysis ........................................................................................ 62
4.3 Results............................................................................................................................... 63
4.3.1 Behavioural Performance...................................................................................... 63
4.3.2 fMRI Results......................................................................................................... 63
4.4 Discussion ......................................................................................................................... 64
Tables............................................................................................................................................ 68
Figures........................................................................................................................................... 75
Chapter 5: General Discussion...................................................................................................... 87
5.1 A Convergent Model of Audio-Visual Interactions.......................................................... 90
5.2 Dynamic Processing in Sensory-specific Cortices ........................................................... 91
5.3 Limitations ........................................................................................................................ 92
5.4 Future Directions .............................................................................................................. 93
References..................................................................................................................................... 95
Appendices.................................................................................................................................. 115
List of Tables
Table 3.1: Mean Reaction Times by Condition. ........................................................................... 40
Table 3.2: Local Maxima from AC-VT: AT-VC Task ST-PLS. .................................................. 41
Table 3.3: Local Maxima from VC-AT: VT-AC Task ST-PLS. .................................................. 43
Table 4.1: Local Maxima from AC-VT: AT-VC Behavioural ST-PLS. ...................................... 68
Table 4.2: Local Maxima from VC-AT: VT-AC Behavioural ST-PLS. ...................................... 71
List of Figures
Figure 3.1. Experimental design schematic for auditory cue-visual target tasks.......................... 46
Figure 3.2. Experimental design schematic for visual cue-auditory target tasks.......................... 48
Figure 3.3. Behaviour measures for all four experimental tasks. ................................................. 50
Figure 3.4. BOLD HRFs for cue and target – auditory modality. ................................................ 52
Figure 3.5. Singular image and design scores differentiating cue from target for auditory modality ........................................................................................................................................ 54
Figure 3.6. BOLD HRFs for cue and target – visual modality. .................................................... 56
Figure 3.7. Singular image and design scores differentiating cue from target for visual modality ....................................................................................................................................... 58
Figure 4.1. Correlation profiles for the auditory cue. ................................................................... 75
Figure 4.2. Brain scores plotted by participants for the auditory cue. .......................................... 77
Figure 4.3. Singular images of brain areas that facilitate reaction time for the auditory cue. ...... 79
Figure 4.4. Correlation profiles for the visual cue. ....................................................................... 81
Figure 4.5. Brain scores plotted by participants for the visual cue............................................... 83
Figure 4.6. Singular images of brain regions that facilitate reaction time for the visual cue. ...... 85
List of Appendices
Appendix A. fMRI Screening Form. .......................................................................................... 115
Appendix B. MRI Screening Form............................................................................................. 119
Appendix C: Information and Consent Form. ............................................................................ 120
List of Abbreviations
Abbreviation    Region
Cu      cuneus
Cl      claustrum
Ga      angular gyrus
GC      cingulate gyrus
GF      fusiform gyrus
GFd     medial frontal gyrus
GFi     inferior frontal gyrus
GFm     middle frontal gyrus
GFs     superior frontal gyrus
Gh      parahippocampal gyrus
GL      lingual gyrus
GOi     inferior occipital gyrus
GOm     middle occipital gyrus
GOs     superior occipital gyrus
GPoC    postcentral gyrus (sensory cortex)
GPrC    precentral gyrus (motor cortex)
GTi     inferior temporal gyrus
GTm     middle temporal gyrus
GTs     superior temporal gyrus
Gsm     supramarginal gyrus
INS     insula
LPc     paracentral lobule
LPi     inferior parietal lobe
LPs     superior parietal lobe
Pcu     precuneus
Th      thalamus

Note: Definition of the region abbreviations by reference to Talairach and Tournoux (1988). Other abbreviations used in tables include L and R, which stand for left and right respectively. Ant refers to anterior and Post refers to posterior.
Chapter 1: Literature Review
1.1 Overview
The past few decades have marked a dramatic change in the realm of cognitive
neuroscience. Neuroimaging techniques such as functional magnetic resonance imaging (fMRI)
and electroencephalography (EEG) have provided scientists with tools to examine brain function
across spatial and temporal domains. Prior to the use of functional neuroimaging, attentional
theories about the brain were based primarily on animal work or observations of deficits in
patients with brain damage or disease. These theories of attention are now informed both by
animal work and human neuroimaging studies; researchers are actively trying to bridge the gap
between brain function and attentional models.
This section will begin with an overview of some common attentional mechanisms and
how these processes can form the basis of behavioural crossmodal facilitation. Crossmodal
facilitation occurs when a cue in one sensory modality, for example auditory, speeds responses to
a target in a different sensory modality, such as vision. The motivation for the current experiment
is to study the effects of crossmodal facilitation, in audition and vision, using the fMRI
technique.
A survey of the behavioural literature on crossmodal facilitation will be followed by an
outline of the neuroanatomy that may support audio-visual processes in the brain. The
implications of neuroanatomical studies for the organization of brain function will then be
described, followed by a brief review of the fMRI technique.
1.2 Classifying Different Attentional Mechanisms
1.2.1 Orienting
Experiments on selective attention have shown that when participants are provided with a
cue for the location of an upcoming event, they direct their attention to the event (Posner, 1980;
Posner, Inhoff, Friedrich, & Cohen, 1987). Posner developed an attentional cueing paradigm to
study such visual shifts of attention. The paradigm consisted of a central cross and two peripheral
boxes. The central cross was replaced by a plus sign or an arrow that pointed in the direction of
one of the boxes. The plus sign indicated that there was an equiprobable chance of a target
appearing in either of the two boxes while the direction of the arrow correctly identified the
location of the target in eighty percent of the trials (valid trials) and did not cue the correct target
location in twenty percent of the trials (invalid trials). Responses were faster on valid trials
compared to invalid trials. Posner (1980) postulated that facilitated responses in valid conditions
occurred as a result of attentional engagement at the target location.
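The structure of such a cueing session, and the cue validity effect it yields, can be sketched with a short simulation. All numbers here (mean reaction times, noise, trial counts, function names) are hypothetical placeholders chosen only to mirror the qualitative pattern Posner reported, not his actual data.

```python
import random

def simulate_posner(n_trials=1000, p_valid=0.8,
                    rt_valid=250.0, rt_invalid=290.0, sd=30.0, seed=0):
    """Simulate a Posner cueing session: the arrow cue indicates the true
    target location on a fraction p_valid of trials (valid trials); reaction
    times are drawn from Gaussians with hypothetical condition means (ms)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        target = rng.choice(["left", "right"])
        valid = rng.random() < p_valid
        # Invalid cues point at the box opposite the target.
        cue = target if valid else ("left" if target == "right" else "right")
        mean_rt = rt_valid if valid else rt_invalid
        trials.append({"cue": cue, "target": target, "valid": valid,
                       "rt": rng.gauss(mean_rt, sd)})
    return trials

def cue_validity_effect(trials):
    """Mean RT(invalid) - mean RT(valid), in ms; a positive value reflects
    facilitation from attentional engagement at the cued location."""
    valid_rts = [t["rt"] for t in trials if t["valid"]]
    invalid_rts = [t["rt"] for t in trials if not t["valid"]]
    return (sum(invalid_rts) / len(invalid_rts)
            - sum(valid_rts) / len(valid_rts))
```

With these placeholder means, the simulated validity effect comes out near the 40 ms difference built into the condition means, illustrating how the effect is estimated from trial-level data.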
Subsequently, Posner and Petersen (1990) proposed a model of selective attention based
on neuropsychological studies where three loci in the brain – the parietal lobe, the superior
colliculus and the thalamus - seemed to contribute to different aspects of attentional shifts.
Individuals with unilateral parietal lobe damage showed normal responses when cues and targets
were presented on the side that was contra-lateral to their lesion but their performance was
impaired when cues appeared on the ipsi-lesional side and the target appeared on the contra-
lesional side (Posner, Walker, Friedrich, & Rafal, 1984). Patients that had degenerate superior
colliculi, structures involved in eye movement coordination, such as those suffering from
supranuclear palsy were slow to move their attention from one location to the next (Rafal et al.,
1988). Lastly, patients with thalamic damage responded poorly at spatial locations that were
contra-lateral to their lesion irrespective of whether trials were valid or invalid (Rafal & Posner,
1987). In summarizing the neuropsychological work that supported the selective attention model,
it can be stated that the parietal lobe was critical in the disengagement of attention, the superior
colliculus was implicated in moving attention from one point in space to another and the
thalamus was essential for re-engaging attention at a particular location.
1.2.2 Endogenous vs. Exogenous Shifts
The model of attentional cueing postulated by Posner was based on experiments in which
cues were centrally presented (at fixation) and targets occurred in the periphery (either side of
fixation). Experimenters have subsequently claimed that the location of cue presentation, central
or peripheral, can tap into two different attentional mechanisms – endogenous and exogenous
(see Ruz & Lupianez, 2002 for a review). Endogenous shifts of attention occur following an
instruction to attend to a target location where participants have to use their own volition to
orient attention. In contrast, exogenous shifts of attention are more reflexive in that participants
can automatically orient to the stimulus that is presented at a target location (see Gazzaniga, Ivry,
& Mangun, 2002 for review). When cues are presented centrally, participants have to move their
attention according to the instructional content of the cue (Ruz & Lupianez, 2002). On the other
hand, peripheral cues are able to capture attention exogenously at a particular location without
any instructional content (Ruz & Lupianez, 2002). Therefore, informative central and peripheral
cues have been used to study endogenous shifts of attention while spatially non-predictive
peripheral cues have been used to study exogenous shifts of attention (Chica, Sanabria,
Lupianez, & Spence, 2007).
Researchers that have attempted to separate the effects of endogenous and exogenous
shifts of attention have found that endogenous cueing requires a longer delay interval between
cue and target presentation (Eimer, 2000; Muller and Findley, 1988). These studies imply that
different attentional mechanisms may be mediating cue processing. Corbetta and Shulman
(2002) have claimed that endogenous versus exogenous shifts of attention may invoke different
behavioural patterns and neural activations. In a meta-analysis of studies conducted on shifts of
attention, Corbetta and Shulman (2002) ascribed voluntary (endogenous) attention to a network
of areas that include the dorsal posterior parietal and frontal cortex with transient activity in
occipital areas. The functional attributes of this fronto-parietal network include integration of
prior experience with expectations and goals to result in volitional shifts in attention. The
temporo-parietal cortex and ventral frontal regions comprise a separate network that has been
implicated in reflexive (exogenous) attentional shifts. This reflexive network is thought to focus
attention on salient events in the environment.
A similar but less specific distinction in attentional processes was suggested earlier by
Posner (1992) and Posner and Petersen (1990). The authors claimed that an anterior attentional
network consisting of frontal areas and the anterior cingulate was responsible for extracting the
relevance of a selected item while a posterior attentional network involving parietal areas
selected information based on sensory attributes. Posner argued that this anterior-posterior
attention system functioned to coordinate activity at a supramodal (modality-independent) level
and was able to regulate data processing that was specific to particular cognitive tasks.
While Corbetta and Shulman (2002) and to some extent, Posner (1992) have made an
attempt to categorize exogenous and endogenous attentional shifts into distinct processing
streams, a study by Rosen and colleagues (1999) suggested a fair bit of overlap in brain
activations in response to voluntary and reflexive attention. In both endogenous and exogenous
cueing tasks, activations were seen in the dorsal premotor region, the frontal eye fields and the
superior parietal cortex. Evidence to date remains inconclusive about the exact nature of
interactions between endogenous and exogenous shifts of attention at the neural level.
The mechanisms of attention discussed in this section are not directly explored in the
current experiment but provide the context for understanding experiments on crossmodal cueing
outlined below.
1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms
The research mentioned thus far has focused on visual spatial orienting while
experiments conducted in the 1960s also found attentional modulations of response times to
cross-modal (auditory, visual) stimuli (Bertelson & Tisseyre, 1969; Davis & Green, 1969). In
one experiment, Bertelson and Tisseyre found that an auditory click decreased reaction time to a
subsequent visual flash while a visual flash did not have the same effect on an auditory click. A
more specific investigation into crossmodal cueing was conducted by Buchtel & Butter (1988)
who used lateralized auditory and visual stimuli. These stimuli were spatially significant
(spatially neutral conditions were used as controls) in comparison to Bertelson & Tisseyre’s non-
spatial stimuli (centered flash, binaural click: neutral spatial significance). Two cross-modal
cueing cases were devised, auditory to visual (A-V) and visual to auditory (V-A), as well as the
intra-modal cases, visual to visual (V-V) and auditory to auditory (A-A). Previous findings were
reinforced with respect to reaction time: cross-modal cueing led to faster responses compared to
intra-modal cueing, and audio cues were more effective in facilitating responses compared to
visual cues. In the cross-modal conditions, visual target stimuli seemed much easier to cue than
audio target stimuli, regardless of the cue modality. In fact, Buchtel and Butter reported virtually
no cueing effect for auditory target stimuli.
Farah, Wong, Monheit, and Morrow (1989) performed a cross-modal cueing study on
patients with lesions circumscribed unilaterally to the parietal lobe to try to understand the nature
of crossmodal cueing effects in the brain. Prior to this study, Posner and colleagues (1984) had
shown that patients with unilateral parietal lobe lesions had difficulty disengaging attention in a
standard visual spatial cueing paradigm. Farah and colleagues (1989) extrapolated on Posner’s
work and examined A-V and V-V conditions. In the V-V condition there was a 50ms reduction
in reaction time when the target appeared on the ipsi-lesional side of space, and a considerable
increase in reaction times for targets that appeared on the contra-lesional side of space. In the A-
V condition, response times were faster than in the visual cue condition. Again, there was a large
increase in reaction time for targets that appeared contra-lateral to the parietal lesion. This study
was in agreement with its predecessors in a general way: cross-modal A-V cueing produced
facilitated reaction times.
A paper published by Ward in 1994, however, ran contrary to the observations of these past studies.
Ward (1994) used a crossmodal spatial discrimination task where participants made speeded left-
right responses to visual or auditory targets following the presentation of an auditory or a visual
non-predictive cue, both auditory and visual cues, or no cues. Reaction times were measured for
all conditions at different inter-stimulus intervals between the cue and target. The results
indicated that visual cues facilitated reaction times to auditory targets that were presented on the
same side of the cue (compatible) at short inter-stimulus intervals. Auditory cues, in contrast, did
not facilitate reaction times to visual targets shown on either side of cue presentation or at any
inter-stimulus interval. Auditory cues did facilitate reaction times to compatible auditory targets
at short inter-stimulus intervals. Ward’s findings were directly opposite to those found by
previous crossmodal studies (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong,
Monheit, & Morrow, 1989).
In a subsequent study by Spence and Driver (1997), an orthogonal spatial cueing
paradigm (Spence and Driver, 1994) was used to examine crossmodal cueing effects. In the
orthogonal spatial cueing experiment, participants had to discriminate the elevation of a target
sound rather than its laterality. The targets were preceded by an uninformative cue on the same
side in fifty-percent of the trials and on the opposite side in the rest of the trials. The results of
multiple manipulations of the orthogonal spatial cueing paradigm replicated the finding that
auditory cues facilitate reaction times to visual targets. The authors claimed that Ward’s
contradictory findings could be explained in light of spatial compatibility effects and response
priming since Ward’s stimuli were lateralized.
Ward and his collaborators have subsequently demonstrated that the cross-modal
asymmetry in favour of visual cueing still holds when all methodological confounds are removed
(McDonald & Ward, 1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin,
2000). Ward, McDonald and Golestani (1998) showed that when cues and targets were presented
from exactly the same lateral eccentricity, responses following visual cues were still faster, ruling out Spence
and Driver’s criticism about Ward’s initial findings being based on spatial compatibility effects
that arise as a result of cue-target lateralization differences. In another study by Ward, McDonald
and Lin (2000), an implicit spatial discrimination paradigm was used to study cross-modal
asymmetry, and visual cue superiority was once again found, eliminating the effects of response
priming on Ward’s early findings (1994).
Mondor and Amirault (1998) reacted to discrepancies in the direction of the crossmodal
cueing (auditorily or visually driven effects) by conducting a comprehensive investigation into
the impact of experimental set-up on cueing effects. The investigators manipulated cue and target
modalities, stimulus onset asynchrony (SOA), and target location given cue location. In their first
experiment, they tried to maximize uncertainty for the subject: an auditory or visual spatial cue
preceded an auditory or visual lateralized target by either 150 or 300ms. The results showed that
valid trials were faster than invalid trials (also known as the cue validity effect) only for intra-
modal tasks (A-A, V-V) and only for an SOA of 150ms. Cueing for cross-modal cases did not
reach significance. These results seemed inconsistent with past findings (Bertelson & Tisseyre,
1969), but Mondor and Amirault (1998) offered a hypothesis that could address the discrepancy.
They suspected that endogenous mechanisms were responsible for cross-modal cueing effects.
That is, rather than pure perceptual-motor reactivity, it was expectation or anticipation that
allowed cross-modal attention to be efficient. In their second experiment, Mondor and Amirault
(1998) decreased the uncertainty: seventy-five percent of the cues were valid, while the modality
of cue and stimulus was unpredictable and the SOA was a constant 150ms. A significant cue
validity effect for A-V and V-A resulted even though the A-A and V-V effect was stronger. In a
third experiment, Mondor and Amirault (1998) administered the conditions in blocks, thus
rendering modality of cue and target predictable. This increased intra-modality cueing effects,
though cross-modal cueing effects were unaffected. Mondor and Amirault (1998) concluded that
cross-modal cueing was effective when endogenous mechanisms were at work, and those
mechanisms were spurred by predictability (predictability of modality seemed irrelevant) or lack of
uncertainty.
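The factorial structure of these designs (cue modality × target modality × cue validity) can be made concrete with a brief sketch. The mean reaction times below are hypothetical placeholders, chosen only so that intra-modal validity effects exceed cross-modal ones, as in the qualitative pattern of Mondor and Amirault's first experiment; none of these values are their data.

```python
from itertools import product

# Hypothetical mean RTs (ms) per (cue modality, target modality, validity);
# illustrative values only, with intra-modal validity effects made larger
# than cross-modal ones.
MEAN_RT = {
    ("A", "A", "valid"): 240, ("A", "A", "invalid"): 275,
    ("V", "V", "valid"): 250, ("V", "V", "invalid"): 282,
    ("A", "V", "valid"): 255, ("A", "V", "invalid"): 262,
    ("V", "A", "valid"): 258, ("V", "A", "invalid"): 264,
}

def validity_effects(mean_rt):
    """Cue validity effect (invalid - valid, ms) for every cue-target pairing."""
    effects = {}
    for cue, target in product("AV", "AV"):
        effects[(cue, target)] = (mean_rt[(cue, target, "invalid")]
                                  - mean_rt[(cue, target, "valid")])
    return effects

effects = validity_effects(MEAN_RT)
intra = [effects[("A", "A")], effects[("V", "V")]]
cross = [effects[("A", "V")], effects[("V", "A")]]
# In this sketch, intra-modal cueing effects exceed cross-modal ones.
assert min(intra) > max(cross)
```

Arranging the conditions this way makes explicit what "a significant cue validity effect for A-V and V-A" means operationally: a positive invalid-minus-valid difference within each cross-modal cell of the design.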
Although Mondor and Amirault's (1998) third experiment did not report strong
crossmodal cueing effects, a subsequent study by Schmitt, Postma and de Haan (2000) found
facilitated reaction times to both auditory and visual cues in an experimental set-up where cue
and target modalities were fixed within a block. Symmetric audio-visual cueing effects have
subsequently been reported in other studies (McDonald & Ward, 1999; 2003). At present,
facilitation effects have been shown for both auditory and visual cues but the exact neural
underpinnings of these cueing effects have yet to be isolated.
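The cue validity effect described above is simply the mean reaction-time difference between invalid and valid trials. A minimal sketch of the computation, using invented reaction times purely for illustration (the values are not taken from any of the cited studies), is:

```python
def cue_validity_effect(valid_rts, invalid_rts):
    """Mean RT on invalid trials minus mean RT on valid trials (ms).

    A positive value indicates that the cue facilitated responding
    at the cued location.
    """
    mean_valid = sum(valid_rts) / len(valid_rts)
    mean_invalid = sum(invalid_rts) / len(invalid_rts)
    return mean_invalid - mean_valid

# Hypothetical reaction times (ms); not data from Mondor & Amirault (1998).
valid = [310, 295, 305, 300]
invalid = [340, 355, 345, 350]
print(cue_validity_effect(valid, invalid))  # 45.0 ms of facilitation
```

A significant positive difference of this kind, computed separately for intra-modal and cross-modal conditions, is what the studies above report as a cueing effect.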
On a side note, most crossmodal cueing experiments mentioned previously have
manipulated SOA in addition to cue and target modalities (Bertelson & Tisseyre, 1969; Spence
& Driver, 1997; Ward, 1994) to try to understand the type of attentional mechanisms that may
underlie the integration of cue-target information. When Bertelson and Tisseyre (1969) varied
stimulus onset asynchrony between 0 and 700 ms and measured the effect of SOA manipulation
on responses to a target, they found that an auditory click decreased reaction time to a
subsequent visual flash at the smallest SOA. At larger SOAs (greater than 70ms), when the
visual flash preceded the auditory click, some facilitation of reaction time was noted, but
it was greatly attenuated in comparison to the auditory click – visual flash (A-V) case. The
primary purpose of carrying out such behavioural experiments was to determine if there was an
attentional refractory period that followed processing of the first stimulus. The logic was that if
attentional mechanisms, which may be governed by higher-order cognitive systems in the brain,
were required to process the first stimulus, then this processing would utilize resources that
would not be available to process a subsequent stimulus until a certain time had elapsed (Davis,
1959). This elapsed time was known as the attentional refractory period. However, experiments
by Bertelson and Tisseyre found that there was no attentional refractory period if the two stimuli
were presented in two different sensory modalities (auditory and visual). When stimuli were
presented in the same modalities (visual-visual: V-V), the facilitation was very weakly
represented in the behavioural data. By using a substantially long SOA, both auditory and visual
cueing effects can potentially be delineated.
1.4 Crossmodal Asymmetry Reported In Cognitive, Physiological, and Developmental Studies
The majority of behavioural studies conducted on some aspect of crossmodal cueing
allude to the idea that there may be cognitive, physiological and developmental bases for
auditory and visual interactions. Some of these studies will be described next.
1.4.1 Auditory to Visual (A-V) Interactions
Investigators examining multisensory interactions have found that a sudden sound can
enhance the detection of a subsequent flash of light in the same location (McDonald, Teder-
Salejarvi, & Hillyard, 2000). Abrupt sounds synchronized with visual search arrays can improve
the identification of visual targets embedded in a series of distracters (Vroomen & de Gelder,
2000). These studies suggest that auditory events can influence visual processing.
According to Neumann, Van der Heijden, & Allport (1986), shifting visual attention to
auditory events has evolutionary significance. They claim that auditory events that occur distally
can be registered in the brain, resulting in appropriate action, whereas by the time visual events
come into view proximally, a response may not be viable. Since auditory events in the world are
transient and intermittent relative to visual events which appear continuous in time, it is more
beneficial to have an asymmetry in audio-visual cueing. Building on Neumann and colleagues’
ideas, some researchers (Brown & May, 1989; Harrison & Irving, 1966; Spence & Driver, 1997)
have suggested that the primary function of sound localization in
animals is to direct the eyes towards auditory events. In this way, sound localization may be
necessary for the control of orienting towards significant distal events which occur outside an
animal’s field of view.
Visual orienting with respect to sound localization has been explored in physiological
studies involving the superior colliculus (King, 1993; Stein & Meredith, 1993; Stein, Wallace, &
Meredith, 1995). In order to understand how the superior colliculus may contribute to the
integration of audio-visual stimuli, it is imperative to consider the anatomical organization of this
structure. The first three layers of the superior colliculus are purely visual and organized spatio-
topically (de Monasterio, 1978a, 1978b; Dubois & Cohen, 2002; Leventhal et al., 1981; Perry
and Cowey, 1984; Rodieck and Watanabe, 1993; Schiller and Malpeli, 1977; Sparks, 1988;
Sparks & Hartwich-Young, 1989). There is no influence of audition on these first three
superficial layers (King, 1993). The three layers beneath the visual layers are often referred to as
the deep layers of the superior colliculus. These layers are considered polymodal because they
receive ascending inputs from brainstem nuclei such as the inferior colliculus and the trigeminal
nucleus that represent modalities such as audition, vision and touch (Wickelgren, 1971).
Descending inputs from the temporal cortex and postcentral gyrus also culminate in these deep
layers (Wickelgren, 1971). Cells within the deep layers that receive afferents from auditory and
visual pathways and from multimodal cortical sites are known as multisensory integrative cells (MSI,
term coined in Calvert, 2001). These MSI cells are not only capable of generating responses to
different modalities but are also able to transform separate sensory inputs into an integrated
product. In conclusion, the anatomy of the superior colliculus makes it an ideal candidate for
carrying out multisensory integration; that is, the assimilation of information from multiple
sensory modalities into a motor outcome.
According to Wickelgren (1971), the superior colliculus is able to track stimuli in either vision
or audition that are moving laterally away from an animal to ensure appropriate avoidance or
approach behaviour. This structure, in a variety of species ranging from amphibians to primates,
is able to produce orienting movements linked to the eyes (saccades), head and body as well as
approach, freezing or running responses (Sahibzada, Dean, & Redgrave, 1986; Sparks, 1999;
Vargas, Marques, & Schenberg, 2000). Researchers have hypothesized that the superior
colliculus may be vital in generating motoric responses to salient sensory events (Sparks, 1999;
Stein, 1998).
Other neurophysiological investigations of the functional architecture of the superior
colliculus have shown that there is a two-dimensional representation of auditory target location
represented within deep layers of this structure (King, 1993; Stein & Meredith, 1993). Stein and
Meredith (1993) however, are careful in noting that the deep layers are multimodal rather than
purely auditory suggesting that any shifts of attention could implicate involvement of other
modalities. King (1993) argues further that there is no spatio-topic map of auditory space in
the brain; it is therefore unlikely that vision could guide auditory localization. Clarey, Barone and
Imig (1992) have found that in structures like the inferior colliculus and the primary auditory
cortex, there is a tonotopic organization with lateralized auditory receptive fields but there is no
indication of spatial tuning or spatiotopy. Therefore, the only spatio-topic map of auditory space
in the brain is the one found in polymodal deep layers of the superior colliculus.
Developmental studies of the superior colliculus have shown that the representation of
auditory space in deep layers of the superior colliculus can be altered by varying the visual
environment (King & Carlile, 1993; King, Hutchings, Moore and Blakemore, 1988; Withington,
1992). In contrast, manipulations of auditory experience have no influence on the spatiotopically
organized superficial layers of the superior colliculus (Knudsen, Esterly & Knudsen, 1984;
Withington-Wray et al., 1990). This asymmetry in the direction of audio-visual interactions is
also prominent in studies that have examined the development of senses in infants. According to
Gottlieb (cited in Lewkowicz & Kraebel, 2004), different sensory modalities develop in the
following sequence: tactile, vestibular, chemical, auditory and visual. The first four modalities
mentioned are functional prenatally while vision develops mostly postnatally. Lewkowicz (1988a,
1988b) claims that sensory processing hierarchies evident in developmental profiles of infants
may contribute to the different degrees of how input from a modality is transmitted to different
unisensory and multisensory sites in the brain. For example, inputs from auditory cortical
neurons could possibly project to areas of the visual cortex that are still underdeveloped at birth;
however, neurons within the visual cortex may be unable to innervate the developed neuronal
structure of the auditory cortex. Thus, the cellular architecture that supports a particular sensory
modality may result from temporal differences in the development of the senses.
1.4.2 Visual to Auditory (V-A) Interactions
Over the years, researchers have reported some effects of vision influencing audition.
These studies are not as extensive as the ones mentioned previously for the auditory capture of
visual attention but are worth considering.
The famous McGurk effect provides evidence for vision altering speech perception
(McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/ when it is
coupled with a visual lip movement associated with /ga/. The McGurk effect suggests that sound
can be misperceived when it is coupled with different visual lip movements. fMRI studies by
Calvert and colleagues (1997, 1999, 2000) have investigated brain activity in relation to
incongruent audio-visual linguistic information (similar to the McGurk effect). Calvert et al.
(1997) found that lip-reading (visual speech) activated areas of the auditory cortex that were
previously considered to be unimodal. In a subsequent study, Calvert and colleagues (1999)
showed an enhancement in sensory-specific cortices when participants saw and heard speech.
These neuroimaging studies uphold the view that vision can influence audition.
Another well-known effect of vision on audition is the ventriloquist effect first reported
by Howard and Templeton (1966). The ventriloquist’s illusion occurs when the perceived
location of a sound shifts towards the location of the visual source. Ventriloquists are able to
produce speech without visible lip movements while simultaneously moving a puppet. This leads
to a conflict between visual and auditory localization that culminates in vision dominating
audition. A recent study by Guttman, Gilroy and Blake (2005) has suggested that in audio-visual
conflict situations, vision dominates spatial processing while audition directs temporal
processing. The effects stated in this section suggest that vision can alter audition in very specific
cases such as those of conflicting information from multiple modalities. The influence of vision
on audition in terms of spatial localization and cueing still remains an area that has potential for
exploration.
1.5 General Anatomy of Central Auditory and Visual Pathways
An exploration of the characteristics of audio-visual interactions is incomplete without a
consideration for the general organization of auditory and visual pathways in the brain. This
section will briefly review some general areas that are involved in the processing of auditory and
visual stimuli.
1.5.1 The Auditory Pathway
Sound waves from the environment are captured and focused by the auricle – the external
ear – into the auditory canal. The auditory canal is a hollow, air-filled tube that transmits sound
waves to three bones located in the middle ear (the malleus, the incus and the stapes). These
three bones amplify the auditory signal enabling it to travel through the fluid-filled inner ear
structure called the cochlea. The cochlea is the primary site of the conversion of sound energy
into a neural code. This process is called signal transduction (Noback, 1967). Hair cells, the
auditory sensory receptors in the cochlea, convert mechanical sound energy into receptor
potentials: outer hair cells display motility that amplifies the mechanical signal, while inner
hair cells provide frequency and intensity information to cochlear
ganglion cells (Miller & Towe, 1979). The innervation of hair cells by ganglion cells in the
cochlea is the first part of the auditory neural pathway where information is encoded both in
terms of frequency and intensity of sound. Neurons within the cochlea respond best to
stimulation at their characteristic frequencies, with contiguous cells tuned to similar
frequencies. In this way, tonotopy – the organization
of tones that share similar frequencies into topologically neighbouring neurons – begins
postsynaptic to inner hair cells (Spoendlin, 1974).
Axons from the cochlear ganglion cells form the cochlear component of the eighth
cranial nerve synapsing in the cochlear nuclear complex of the medulla-pontine junction. The
cochlear nucleus sends auditory inputs via three different pathways – dorsal acoustic stria,
intermediate acoustic stria and trapezoid body – to the pons. The trapezoid body projects to the
superior olivary nuclei where information from both ears (binaural) is processed. Localization of
sounds in space occurs in the medial and lateral portions of the superior olivary nucleus.
Efferents from the superior olivary nucleus, dorsal and intermediate acoustic stria terminate in
the inferior colliculus by way of the lateral lemniscus nuclei. The inferior colliculus in turn sends
outputs to the medial geniculate nucleus of the thalamus which end up in the primary auditory
cortex (area A1, or Brodmann areas 41 and 42; Brodmann, 1909) located on Heschl’s gyrus in
the superior temporal lobe (adapted from Brodal, 1981, Hudspeth, 2000). The auditory cortex is
organized in concentric bands with the primary auditory cortex in the centre and auditory
association areas forming the periphery (see Pandya, 1995 for review). The auditory neural
pathway maintains its tonotopic organization from the cochlear ganglion cells to the primary
auditory cortex.
For decades, it was assumed that the auditory neural pathway processed exclusively
acoustic information. In recent years, studies have shown that outputs from midbrain structures
such as the inferior colliculus and the medial geniculate nucleus of the thalamus can contain
some visual information (Komura et al., 2005; Porter, Metzger, & Groh, 2007). Porter, Metzger
and Groh (2007) have demonstrated that the inferior colliculus in monkeys carries visual,
specifically saccade-related information in addition to auditory responses. This study suggests
that the inferior colliculus, predominantly considered to be a unisensory structure responsive to
only auditory stimuli, may have the capacity to integrate specific types of audio-visual
information. In another study by Komura and colleagues (2005), rats were trained to perform an
auditory spatial discrimination task with auditory or auditory-visual cues. The auditory-visual
cues were presented in such a manner that the visual cues were either congruent with auditory
cues or provided conflicting information. The results showed that almost fifteen percent of
auditory thalamic neurons were modulated by visual cues. Responses in these neurons were
enhanced in congruent conditions and suppressed when auditory and visual cues conflicted.
Studies by Porter, Metzger and Groh, and Komura et al. imply that the auditory cortex receives
some visual input via subcortical structures within the central auditory pathway, the extent of
which is not clear. Thus, unisensory sites in the brain may have the ability to perform some
multisensory functions.
1.5.2 The Visual Pathway
Light entering the eye is detected by the retina which is composed of photoreceptors
(cones and rods) that transduce light into electrical signals that can be decoded by the brain.
Information from rods and cones feeds into a network of interneurons composed of horizontal,
bipolar and amacrine cells which in turn project to ganglion cells. Axons from the retinal
ganglion cells form the optic nerve (Wurtz & Kandel, 2000a; 2000b). The optic nerves from the
two eyes meet at the optic chiasm, where fibres from the nasal half of each retina cross, so that
information from the left visual hemifield is routed to the right hemisphere and vice versa (Guillery,
1982). From the optic chiasm, visual inputs flow through the optic tracts, each carrying the
contralateral hemifield from both eyes, to the lateral geniculate nucleus of the thalamus. The primate lateral geniculate nucleus
contains six layers. Layers 1 and 2 are called the magnocellular layers and layers 3-6 are termed
the parvocellular layers (Kaas, Guillery, & Allman, 1972; Sherman, 1988). Both magno- and
parvo-cellular layers project to the primary visual cortex (area V1 or Brodmann area 17;
Brodmann, 1909). V1 sends outputs to V2/V3 from which point the visual information is split
into two streams – temporal and parietal (see Desimone & Ungerleider, 1989). The visual
association areas include regions such as V4 and V5/MT (the middle temporal area; adapted from
Wurtz & Kandel, 2000a; 2000b). The plethora of connections between primary visual areas and
visual association cortices are beyond the scope of this discussion but can be reviewed in a paper
by Felleman and Van Essen (1991). Despite the multitude of levels in the central visual pathway,
information is maintained retinotopically: adjacent areas in the visual field are encoded by
slightly different but overlapping neuronal receptive fields.
Apart from the major retinogeniculostriate pathway, the retina also projects to other
subcortical areas such as the superior colliculus (Leventhal, Rodieck, & Dreher, 1985;
Magalhaes-Castro, Murata, & Magalhaes-Castro, 1976; Rodieck & Watanabe, 1993; Wassle &
Illing, 1980) and the pulvinar nucleus of the thalamus (Grieve, Acuna, & Cudeiro, 2000; Itoh,
Mizuno, & Kudo, 1983; Mizuno et al., 1982; Nakagawa & Tanaka, 1984) as seen in primates
and cats. The superior colliculus and the pulvinar nucleus are also extensively interconnected
(Grieve, Acuna & Cudeiro, 2000; Stepniewska, Qi, & Kaas, 2000). The anatomical organization
of both these subcortical structures and their interactions with auditory and visual pathways
places them in a unique position to integrate multisensory information.
1.6 From Anatomy to Function - A Dynamic Systems’ Perspective
As discussed in previous sections, the anatomical architecture of subcortical structures in
auditory and visual pathways supports both unisensory and multisensory processing. At the
cortical level, Felleman and Van Essen (1991) have documented extensive feedforward,
feedback and horizontal connections between visual and multisensory brain areas. Connections
between unimodal auditory cortex and primary visual areas have been mapped in primates by
Rockland and Ojima (2003) and Falchier et al. (2002). Cognitive neuroscience studies have also
successfully identified multisensory sites that include cortical regions like the posterior parietal
cortex (Bushara, Grafman, & Hallett, 2001; Iacoboni, Woods, & Mazziotta, 1998), intraparietal
sulcus (Macaluso & Driver, 2005), premotor areas (Dassonville et al., 2001; Kurata et al., 2000;
Tanji & Shima, 1994), anterior cingulate (Laurienti et al., 2003; Taylor et al., 1994) and
prefrontal cortex (Asaad, Rainer, & Miller, 1998; Bushara, Grafman, & Hallett, 2001; Laurienti
et al., 2003; Tanji & Hoshi, 2001; Taylor et al., 1994) and subcortical regions such as the
thalamus (Grieve, Acuna,& Cudeiro, 2000; Porter, Metzger, & Groh, 2007), superior colliculus
(Calvert, 2000; King, 1993; Stein & Meredith, 1993; Stein, Wallace, & Meredith, 1995),
cerebellum (Allen, Buxton, Wong, & Courchesne, 1997; Bense et al., 2001; Kim et al., 1999)
and parts of the basal ganglia (Chudler, Sugiyama, & Dong, 1995; Nagy et al., 2006). The
abundance of areas that process multisensory information suggests that there are large degrees of
functional overlap and redundancy in the brain. This idea was initially asserted by Mountcastle
(1979) who noticed that proximal areas showed similar functional characteristics and that there
were many structural and functional redundancies in sensorimotor systems.
Theories about how the brain’s structural capacity contributes to its functional properties
can be classified into a framework that considers the brain to be a dynamic system. This
framework has been defined by researchers in numerous ways, some of which will be discussed
below (Bressler, 1995; Bressler, 2002; Bressler & McIntosh, 2007; Bressler & Tognoli, 2006;
Fuster, 1997; Goldman-Rakic, 1988; Mesulam, 1981, 1990, 1998; McIntosh, 1999; Price &
Friston, 2002).
Bressler and Tognoli (2006) suggest that the functional expression of a particular
cognitive operation results from the co-activation of specific interconnected local area networks.
Mesulam (1990) has also stated that a cognitive or behavioural operation can be subserved by
several interconnected brain areas each of which is capable of multiple computations. Therefore,
a cognitive operation such as attention can be controlled by a diffuse cortical network that is
redundant and specialized at the same time (Mesulam, 1981). According to this view, each
region within a cortical network has some specialization because of its anatomical connections
but this specialization is not absolute in that lesions to different areas in a network could have
similar behavioural consequences. Goldman-Rakic (1988), in a review comparing the
literature on different models of cortical organization, claims that the brain’s association cortices
(areas that are largely responsible for complex processing that occurs between sensory input to
primary sensory cortices and motor output elicited by primary motor areas) interact to form a
finite number of dedicated networks that are reciprocally interconnected. These networks are
capable of communicating with sensorimotor systems to produce integrated behaviour.
One dynamical systems’ perspective that has gained momentum in the past few decades
is the notion that regions of the brain that share similar structural properties can contribute to a
multitude of functions by way of their interactions with other regions (Bressler, 1995; Bressler,
2002; McIntosh, 1999; Bressler & McIntosh, 2007; Price & Friston, 2002). These interactions
could be represented via direct or indirect connections. In addition, brain areas that have very
different neural connections can contribute to the same functional output. This framework of
distributed function in the brain has been defined operationally by McIntosh (1999, 2004) using
the term neural context. Neural context represents the local processing environment of a given
brain area that results from modulatory influences from other brain areas (Bressler & McIntosh,
2007; McIntosh, 1999). Therefore, cognitive function may not be localized in an area or network
of the brain but may emerge from dynamic large-scale neural interactions between different
brain areas that change as a function of task demands. Task demands, in broader terms, can be
the situational context in which an event occurs. Situational context refers to environmental
factors such as sensory input and response processing in which a task is performed. In most
cases, neural context is elicited from changes in situational context (Bressler & Tognoli, 2006;
Protzner & McIntosh, 2007).
While dynamical models incorporating large-scale neural interactions, neural and
situational contexts have been explored for cognitive processes such as learning and memory
(Lenartowicz & McIntosh, 2005; McIntosh & Gonzalez-Lima, 1994; 1998), they have seldom
been applied to investigate interactions between cognitive processes such as attention and
sensorimotor systems. Brain areas that display attention-sensorimotor interactions can be
dissociated using techniques like functional magnetic resonance imaging (fMRI). The use of fMRI to study
brain function will be discussed in the next section.
1.7 An Overview of fMRI
1.7.1 Basic MRI Physics
The MRI concepts presented here are summarized from Brown and Semelka (1999) and
Huettel, Song and McCarthy (2004).
The proton of a hydrogen atom has a magnetic spin that can be modelled as a spherical,
distributed positive charge rotating about an axis at high speed, producing a small
magnetic field called a magnetic moment. In addition to the magnetic moment, a proton’s mass
in combination with the rotating charge produces angular momentum. The magnetic moment and
the angular momentum form the basis of the spin properties of a proton. When a proton is
exposed to a uniform magnetic field (B0), it can assume one of two types of spins – parallel
(same direction as B0) and anti-parallel (opposite direction to B0) – forming the equilibrium state.
Parallel spins are lower in energy and more stable compared to anti-parallel spins. In order to
convert a proton in a parallel spin state to an anti-parallel state, the proton needs to absorb
electromagnetic energy. By the same token, a proton in a high energy state emits electromagnetic
energy as it returns to a low energy state. The electromagnetic radiation frequency required to
excite a proton from a low-energy state to a high-energy state can be calculated for an MR
scanner. This frequency is called the Larmor frequency and is needed to change spins from
parallel to anti-parallel orientations.
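To make this concrete, the Larmor relation can be written out; the gyromagnetic ratio given below is the standard textbook value for hydrogen, not a parameter specific to this study:

```latex
\omega_0 = \gamma B_0, \qquad f_0 = \frac{\gamma}{2\pi} B_0
```

For hydrogen, \(\gamma / 2\pi \approx 42.58\) MHz/T, so a 1.5 T scanner, for example, excites protons at \(f_0 \approx 63.9\) MHz.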
Apart from the changes in spin states induced by B0, a proton’s motion about its axis can
also be influenced by B0. In the presence of B0, the axis of rotation for a proton can rotate around
the direction of B0. The motion of a proton in B0 is referred to as spin precession. The physical
characteristics of spin precession are exploited in MRI.
In an MRI set-up, a participant is placed in the centre of a uniform magnetic field.
Protons within brain tissue assume their equilibrium states (parallel or anti-parallel). Using the
Larmor frequency to generate an electromagnetic radiation pulse, protons can be excited and
their rotational axis perturbed to generate an MR signal. Once the electromagnetic pulse is
removed, the MR signal starts to decay. MR signal decay, also known as spin relaxation, is of
two types – longitudinal and transverse. Longitudinal relaxation (T1 recovery) corresponds to the
recovery of net magnetization along the direction of B0. It is caused by protons in high-energy, anti-
parallel spins (excited state) returning to low-energy, parallel, relaxed states. In order to
understand transverse relaxation (T2 decay), consider the following: an electromagnetic pulse
tips the net magnetization into the transverse plane such that all protons precess
in the same phase. When the pulse is removed, this phase coherence gradually subsides and protons
return to their original out-of-phase states. This is referred to as transverse relaxation. The rates
of longitudinal and transverse relaxation are constant for particular substances such as water,
bone or fat. The constants that describe longitudinal and transverse relaxation are called T1 and
T2, respectively.
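These two processes are conventionally described by exponential curves whose time constants are T1 and T2; writing them out makes the contrast mechanisms discussed below easier to follow:

```latex
M_z(t) = M_0\left(1 - e^{-t/T_1}\right) \quad \text{(longitudinal recovery)}, \qquad
M_{xy}(t) = M_{xy}(0)\, e^{-t/T_2} \quad \text{(transverse decay)}
```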
While both the T1 and T2 constants are important for MR, a third constant, T2*, is
essential for functional MR. T2* includes transverse relaxation due to phase differences in spin
precessions as well as local magnetic field inhomogeneities. The latter can be described by
considering an inhomogeneous external magnetic field where the strength of the magnetic field
at a particular location influences the spin precessional frequency. Perturbed protons at different
locations within the magnetic field lose coherence in their spin precessions at different rates
contributing to the decay of net magnetization in the transverse plane.
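The two contributions to T2* described above are commonly summarized by treating the relaxation rates as additive, with T2′ (a standard textbook symbol, not used elsewhere in this thesis) denoting the dephasing due to field inhomogeneities alone:

```latex
\frac{1}{T_2^{*}} = \frac{1}{T_2} + \frac{1}{T_2'}
```

Because \(1/T_2'\) is non-negative, T2* is always shorter than or equal to T2.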
An MR signal can highlight different parts of the brain depending on the contrast used. A
T1-weighted image of the brain shows high signal (bright) for fat content and low signal for
cerebrospinal fluid (CSF; dark) while a T2-weighted image shows the opposite contrasts for fat
and CSF.
Two parameters that are critical to the amount of MR signal recorded and the contrast
expressed are repetition time (TR) and echo time (TE). The interval between successive
electromagnetic pulses that excite protons is referred to as TR. The TR influences the rate of
longitudinal recovery after an excitation pulse is removed. TE corresponds to the interval
between the excitation pulse and the acquisition of data and affects the rate of transverse decay.
By varying the length of the TR or TE, image intensity at each spatial location can be made more
or less sensitive to differences in T1 and T2.
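The joint influence of TR and TE can be sketched with the standard spin-echo signal approximation (a textbook idealization rather than the exact expression for every pulse sequence), where \(\rho\) is the proton density:

```latex
S \propto \rho \left(1 - e^{-TR/T_1}\right) e^{-TE/T_2}
```

Short TR and short TE make the signal sensitive mainly to T1 differences (T1 weighting); long TR and long TE make it sensitive mainly to T2 differences (T2 weighting).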
1.7.2 Physiological Basis of BOLD fMRI
Functional MRI (fMRI) can be used to estimate the spatial locations of neural activations
and associated changes in metabolic demands when a person performs a certain task. These
metabolic changes include variations in concentration of deoxygenated hemoglobin, blood flow
and blood volume (Buxton, Wong, & Frank, 1998; Kwong et al., 1992), all of which play a role
in producing the blood-oxygenation-level dependent (BOLD) response recorded in fMRI. To
understand the BOLD response, the magnetic properties of oxygenated and deoxygenated
hemoglobin must first be considered. Oxygenated hemoglobin is diamagnetic (does not affect
magnetic field strength) while deoxygenated hemoglobin is paramagnetic (distorts local
magnetic fields). The paramagnetic properties of deoxygenated hemoglobin decrease T2* values
(Thulborn et al., 1982) which allows measurement of changes in brain activity as indexed by
changes in the amount of deoxygenated hemoglobin.
The BOLD response in MR was first discovered by Ogawa and colleagues in 1990, in an
experiment in which anaesthetised rats were placed in an MR scanner to investigate brain
physiology with MRI. It was known by that point that deoxygenated
hemoglobin decreased blood’s T2* values (Thulborn et al., 1982). Ogawa et al. (1990) used this
finding to demonstrate that varying the proportion of blood oxygenation in rats could lead to
different MR image characteristics. When rats breathed in oxygen at a hundred percent
concentration, Ogawa et al. noticed that T2* images of the brain showed structural differences
and very little vasculature. As the amount of oxygen in rats decreased, brain vasculature became
more prominent. To test this interesting effect, a saline-filled container holding test-tubes of
oxygenated and deoxygenated blood was imaged using a T2* contrast (Ogawa & Lee,
1990). T2*-weighted images of oxygenated blood showed a dark outline around the test-tube’s
diameter. In contrast, deoxygenated blood showed a greater signal loss that spilled over into the
area filled with saline. Ogawa and others concluded that functional changes in brain activity
could be studied using what came to be known as the BOLD contrast.
Following Ogawa’s work, Belliveau and colleagues (1991) observed the first functional
images of the brain by using a paramagnetic contrast agent called Gd (DTPA)2-. Subsequently,
Ogawa and others (1992) and Kwong et al., (1992) published the first functional images using
the BOLD signal.
The physiological basis of the BOLD signal can be summarized as follows. Relative
amounts of oxygenated and deoxygenated hemoglobin in the capillary bed of a brain region
depend on the ratio of oxygen consumption and supply. When neural activity increases, the
amount of oxygenated blood delivered to that area also increases while levels of deoxygenated
hemoglobin decrease. The BOLD signal captures the displacement of deoxygenated hemoglobin
by oxygenated blood, since the former is paramagnetic and distorts local magnetic fields while
the latter does not. In other words, changes in oxygenation levels modulate microscopic field
gradients around blood vessels, which in turn affect T2* values for tissue water and thereby
produce differences in signal strength (Huettel, Song, & McCarthy, 2004).
A gradient echo-planar imaging sequence that is sensitive to the paramagnetic properties
of deoxygenated hemoglobin can be used to display tomographic maps of brain activation
(Brown & Semelka, 1999; Kwong et al., 1992; Huettel, Song, & McCarthy, 2004).
1.7.3 Coupling of Neuronal Activity & BOLD
The majority of functional neuroimaging studies assume that the physiological changes
underlying the BOLD response capture neuronal activity. However, the nature of the neuronal
activity represented by BOLD responses is still actively debated.
Neuronal activity that could predict the fMRI BOLD response includes several candidate
measures: the average firing rate of a sub-population of neurons, also referred to as multi-unit
activity (MUA; Legatt, Arezzo, & Vaughan, 1980); synchronous spiking activity across a
neuronal population; the local field potential (LFP), which reflects the synchronization of
dendritic currents (Mitzdorf, 1987); or the spiking activity of individual neurons as measured by
single-unit recordings (SUA; Hubel & Wiesel, 1959). The size of the neuronal population whose
activity is indexed by fMRI signals may also be an issue. Scannell
and Young (1999) postulate that changes in fMRI BOLD responses could be caused by large
changes in firing rates in small neuronal populations or small changes in firing rates in larger
neuronal sub-populations.
Logothetis and colleagues (2001) attempted to investigate the relationship between
neuronal firing rates in the brain and subsequent BOLD responses picked up by fMRI. Monkeys
were presented with rotating checkerboard patterns and both fMRI BOLD responses and
electrophysiological signals were measured from primary visual cortex. The electrophysiological
data acquired consisted of SUA, MUA, and LFP recordings. The results showed an
increase in BOLD at the onset of the visual stimulus that persisted until stimulus offset.
Approximately twenty-five percent of MUA showed a transient increase in activity and a
subsequent return to baseline, while LFPs were sustained throughout the stimulus duration. The
authors argued that because the increase in LFPs during stimulation was significantly stronger
than that in MUA and was maintained over longer intervals, LFPs give a better estimate of
BOLD responses than MUA. Logothetis et al. (2001) also suggested that LFPs resemble
integrative activity at neuronal dendritic sites, while MUA corresponds to the axonal firing rate
of a small population of neurons. Hence, BOLD seemed to reflect incoming input and local
processing in an area more than spiking activity.
The experiment by Logothetis and colleagues (2001) assumed a somewhat linear
relationship between BOLD and LFPs. A subsequent study by Mukamel and colleagues (2005)
suggested that BOLD responses may reflect more complex neuronal activity that does not
necessarily follow a linear pattern. In Mukamel et al.’s (2005) experiment, SUA and LFPs
were recorded from two neurosurgical patients. fMRI BOLD signals were collected from healthy
participants. Both patients and participants viewed a movie segment during measurements of
neural activity. The results showed a high, significant correlation (r = 0.75) between SUA from
neurosurgical patients and fMRI BOLD signals in healthy controls. The authors claimed that
fMRI BOLD responses were reliable measures of firing rates in human cortical neurons.
While the findings of Logothetis et al. (2001) and Mukamel et al. (2005) do not directly
contradict each other, it appears that the neuronal activity represented in the BOLD response may
be a complex mixture of input and output processing. In a review of the coupling between
BOLD and neuronal activity, Heeger and Rees (2002) state that LFPs could be dominated by
activity in proximal neurons, which would suggest that local spiking
activity, synaptic activity and dendritic currents are all co-varying. The authors emphasize that
fMRI BOLD response may be capturing both presynaptic and postsynaptic activity within a
particular region. For experiments that are designed to investigate global changes in brain
activity, having the resolution of single-unit recordings in fMRI may not be necessary.
Chapter 2: Aims and Hypotheses
The primary objective of the present study is to understand crossmodal facilitation effects
in the brain. There are discrepancies in behavioural research about the direction of crossmodal
facilitation. To recap, some scientists have found that auditory cues are superior to visual cues in
producing fast responses (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong,
Monheit, & Morrow, 1988; Spence & Driver, 1997) while other researchers have shown the
opposite effect – visual cues facilitate reaction times to auditory targets (McDonald & Ward,
1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin, 2000; Ward, 1994). To
my knowledge, no studies have investigated the effects of audio-visual crossmodal facilitation in
the brain using fMRI.
The hypotheses for the current experiment are as follows. Firstly, the magnitude of
facilitation for auditory cues will be larger compared to visual cues given the greater
neuroanatomical capacity for audition to influence vision. Secondly, auditory and visual cueing
will be represented by distinct patterns of brain activation given the differences in behavioural
performance. Lastly, brain areas that respond to cue processing (input) may not be the same
areas that coordinate behaviour (output).
The first two hypotheses will be tested in Chapter 3 and the last hypothesis will be
focused on in Chapter 4. In order to test the hypotheses mentioned, participants will perform a
crossmodal version of the spatial stimulus-response compatibility task while BOLD fMRI
responses are recorded. In spatial stimulus-response compatibility tasks, a cue signals the
response rule to a lateralized target. Responses (button presses) are made to the same
(compatible) or opposite (incompatible) side of target presentation. Reaction times are faster in
compatible conditions than in incompatible conditions (Simon, 1969; Fitts & Seeger, 1953). In
this experiment, the cue and targets will be presented in both auditory and visual modalities.
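The compatibility effect described above is simply the difference between mean reaction times in incompatible and compatible conditions. As an illustration only (the function name and the sample reaction times below are hypothetical, not values from this study), it could be computed as:

```python
from statistics import mean

def compatibility_effect(compatible_rts, incompatible_rts):
    """Mean reaction-time cost (ms) of incompatible responses relative
    to compatible ones; a positive value corresponds to the classic
    stimulus-response compatibility effect (Simon, 1969)."""
    return mean(incompatible_rts) - mean(compatible_rts)

# Hypothetical reaction times (ms) for one participant
effect = compatibility_effect([412, 398, 430], [465, 480, 471])
```

In practice such per-participant effects would be submitted to a group-level statistical test, but this sketch captures the quantity being measured.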
A unique aspect of the current experimental design that I would like to emphasize is the
manipulation of cue-target order. Trials can be of two types: cues appearing first followed by
targets, or targets presented first followed by cues. Previous behavioural studies of crossmodal
facilitation did not manipulate order; in my fMRI study, however, this order manipulation can be
used to examine changes in neural activity based exclusively on cue or target processing.
Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices
3.1 Introduction
Experiments on selective attention have shown that when participants are provided with a
cue, they shift their attention to the cued location (Posner, Inhoff, Friedrich, & Cohen, 1987).
Attentional tasks where cues and targets are manipulated have been adapted to study crossmodal
facilitation effects (Spence and Driver, 1997; Ward, 1994). Crossmodal facilitation occurs when
a cue in one sensory modality elicits a speeded response in another sensory modality. A study
conducted by Ward (1994) investigated the effects of crossmodal facilitation using a spatial
discrimination task. Participants made speeded left-right responses to visual or auditory targets
following the presentation of an auditory or a visual non-predictive cue, auditory and visual cues
or no cues. Reaction times were measured for all conditions at different inter-stimulus intervals
between the cue and target. The results indicated that visual cues facilitated reaction times to
auditory targets presented on the same side of the cue (compatible) at short inter-stimulus
intervals. Auditory cues, in contrast, did not facilitate reaction times to visual targets on either
side of cue presentation or at any inter-stimulus interval. Auditory cues did facilitate reaction
times to compatible auditory targets at short inter-stimulus intervals. Ward’s findings were met
with skepticism because previous crossmodal studies that had presented non-predictive auditory
cues had shown response-time facilitation to visual targets ipsilateral to the cue (Buchtel &
Butter, 1988; Farah, Wong, Monheit, & Morrow, 1989). A crossmodal study conducted after
Ward’s study also showed an asymmetrical facilitation of reaction times for auditory cues in
comparison to visual cues (Spence & Driver, 1997).
Spence and Driver (1997) explained the asymmetrical auditory cue facilitation as having
evolutionary significance. Auditory events in the world are transient and intermittent whereas
visual events are continuous in time. Also, auditory events that occur distally can be registered
in the brain in time for an appropriate action, whereas by the time visual events come into view
proximally, a response may no longer be viable. Therefore, it is more beneficial to shift visual
attention to auditory events rather than the other way around (Neumann, Van der Heijden, & Allport,
1986). Evidence from neuropsychological studies also indicates that orienting to auditory events
is usually followed by visual localization in areas like the superior colliculus (Stein & Meredith,
1993; Stein, Wallace, & Meredith, 1995). McDonald, Teder-Salejarvi & Hillyard (2000) showed
that a sudden sound can improve the detection of a subsequent flash of light at the same location.
Abrupt sounds synchronized with visual search arrays can improve the identification of visual
targets embedded in a series of distracters (Vroomen & de Gelder, 2000). The evidence
presented thus far implies that auditory events can influence the processing of subsequent visual
events.
However, there has also been some evidence in favour of visual facilitation of reaction
time. Schmitt et al. (2000) found facilitated reaction times to both auditory and visual cues in an
experimental set-up where cue and target modalities were fixed within a block. Symmetric
audio-visual cueing effects have subsequently been reported in other studies (McDonald &
Ward, 1999; 2003). The famous McGurk effect also provides evidence for vision altering speech
perception (McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/
when it is coupled with a visual lip movement associated with /ga/. The McGurk effect shows
that sound can be misperceived when it is coupled with different visual lip movements. These
studies suggest that vision can also alter audition in some cases.
The lack of consensus about the direction of reaction time facilitation in response to
auditory and visual cues is further compounded by attempts to understand the exact nature of
cue-target processing in the brain. Some researchers argue that salient sensory information
contained in the cue is integrated with target information via separate, modality-specific sub-
systems (Bertelson & Tisseyre, 1969; Bushara et al., 1999; Cohen, Cohen, & Gifford, 2004;
Posner, Inhoff, Friedrich, & Cohen, 1987; Spence & Driver, 1997; Ward, 1994). Alternatively,
other scientists argue that synthesis of information from different sensory modalities is achieved
through a supramodal network that involves parts of the prefrontal cortex and parietal areas
(Andersen, 1995; Andersen, Snyder, Bradley, & Xing, 1997; Bedard, Massioui, Pillon, &
Nandrino, 1993; Downer, Crawley, Mikulis, & Davis, 2000; Eimer & Driver, 2001; Farah,
Wong, Monheit, & Morrow, 1989; Iacoboni, Woods, & Mazziotta, 1998; Laurens, Kiehl, &
Liddle, 2005; Snyder, Batista, & Andersen, 1997). Convergent theories, advocated by Macaluso
and others, suggest that while supramodal attentional networks may guide sensorimotor
integration, reverberating loops that link sensory-specific cortices to each other can also integrate
sensory information across modalities (Corbetta & Shulman, 2002; Ettlinger & Wilson, 1990;
Macaluso, 2006; Macaluso & Driver, 2005). Macaluso and colleagues have derived a convergent
26
model for visual and tactile modalities but this model has not been applied to audio-visual cue-
target processing.
In order to reconcile the discrepancies in behavioural crossmodal facilitation data with the brain
mechanisms responsible for multisensory processing, I designed a task that attempted
to capture audio-visual interactions between a cue and a target in an event-related functional
neuroimaging study. A stimulus-response compatibility paradigm, traditionally used to study
response selection and cue-target processing (Bertelson and Tisseyre, 1969; Rodway, 2005), was
modified to investigate crossmodal cue-target interactions. In its general form, the stimulus-
response compatibility paradigm includes a cue that signals a response rule for a lateralized
target. Response times are faster when responses are made to the same side as target presentation
(compatible responses) than when responses are made to the opposite side of target presentation
(incompatible responses). This robust behavioural finding is known as the stimulus-response
compatibility effect (Simon, 1969; Fitts & Seeger, 1953). The cues and targets in my experiment
were presented in auditory and visual modalities.
The aim of the first part of my study is to determine the neural correlates that underlie
reaction time facilitation in response to auditory and/or visual cues. Given the discrepancies in
behavioural findings, I am unsure about the direction of reaction time facilitation; however, I
hypothesize that auditory cues will facilitate reaction times to visual targets because of the
structure of auditory and visual neural pathways in the brain (see Chapter 1 for details).
3.2 Materials and Methods
3.2.1 Participants
Twenty-four (12 female) healthy, right-handed individuals between the ages of 19 and 35
(mean age 23.08 ± 3.87 years) were voluntarily recruited through undergraduate psychology
courses to partake in the study. All individuals were screened for any history of medical,
neurological, psychiatric, or substance abuse-related problems (see Appendices A and B for
screening forms) prior to their participation in the study. All participants signed an informed
consent form (see Appendix C) and were reimbursed $100.00 for the two sessions. The
experiment was conducted in the fMRI Suite at Baycrest upon approval from the Baycrest
Research Ethics Board.
3.2.2 Stimuli
Two types of auditory stimuli, matched in amplitude but differing in frequency (250 Hz
and 4000 Hz), were used in the experiment. Each participant adjusted the volume of the auditory
tones using the left and right buttons on the response box until the tones appeared perceptually
equal in loudness. The volume adjustment was conducted at the beginning of the experiment
with the scanner running so that participants could set the stimulus volume relative to the noise
produced by the scanner.
The visual stimuli used in the experiment were of two types and were matched for
luminance and contrast. The first visual stimulus was a black and white checkerboard pattern and
the second visual stimulus was also a black and white checkerboard pattern, but rotated at a 45
degree angle. Stimulus presentation was controlled and documented by Presentation software
(version 10.2, Neurobehavioural Systems Inc.).
3.2.3 Apparatus
Participants viewed visual stimuli on a translucent screen via a mirror that was mounted
on the head coil (used for acquiring images) in the MRI scanner. The total distance between the
participants’ eyes and the screen was approximately 52 inches. The image on the screen
measured 13.75 inches by 11 inches, with a resolution of 800 × 600 pixels at a 60 Hz refresh
rate. The field of view (FOV) was 12 degrees vertical and 15 degrees horizontal. The majority of
the participants wore their own contact lenses for vision correction. MR-safe glasses made by
SafeVision (Webster Groves, MO, USA), with a range of ±6 dioptres in increments of 0.5
dioptres, were provided to those participants who did not have prescription contact lenses.
Auditory stimuli were
presented using the Avotec Audio System (Jensen Beach, FL, USA). A button-press response
was made by participants with either the left or the right index finger using Fiber-Optic Response
Pad System developed by Current Designs Inc. (Philadelphia, PA, USA). The response pad
system had two paddles, one for each hand, and each paddle had four buttons. On the left paddle,
the active button was the rightmost of the four; on the right paddle, it was the leftmost. The other
three buttons on each paddle were taped to prevent responses, and participants rested their hands
gently on top of the taped buttons. The signal transmission delay of the paddles was less than
1 ms.
3.2.4 Procedure
3.2.4.1 Trial Structure
A trial had the following sequence: presentation of the first stimulus (S1) for 250 ms, a 4-
second inter-stimulus interval (ISI), presentation of the second stimulus (S2) for 250 ms, and a
response window of 1500 ms. The inter-trial interval (ITI) was jittered randomly among 3, 5, 7,
and 9 seconds. Response times were recorded by the Presentation software from the onset of the
second stimulus.
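The trial timing above can be sketched as a small event-onset generator. This is an illustrative reconstruction, not the actual Presentation script used in the study; the function and variable names are my own.

```python
import random

# Trial timing parameters from the design described above (seconds).
S1_DURATION = 0.25           # first stimulus (S1)
ISI = 4.0                    # inter-stimulus interval
S2_DURATION = 0.25           # second stimulus (S2)
RESPONSE_WINDOW = 1.5        # response window after S2
ITI_CHOICES = [3, 5, 7, 9]   # jittered inter-trial interval

def build_trial_onsets(n_trials, rng=random):
    """Return (s1_onset, s2_onset) times for n_trials, drawing the
    inter-trial interval at random from ITI_CHOICES after each trial."""
    onsets = []
    t = 0.0
    for _ in range(n_trials):
        s1 = t
        s2 = s1 + S1_DURATION + ISI  # S2 follows S1 onset by 4.25 s
        onsets.append((s1, s2))
        t = s2 + S2_DURATION + RESPONSE_WINDOW + rng.choice(ITI_CHOICES)
    return onsets
```

Jittering the ITI in this way decorrelates trial onsets from the slow BOLD response, which is what makes event-related estimation of cue- and target-locked activity possible.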
3.2.4.2 Task Types
The study consisted of two scanning sessions on different days. Each session was 1.5
hours in length. Two different ta