INVESTIGATING THE NEURAL CORRELATES OF CROSSMODAL FACILITATION AS A RESULT OF
ATTENTIONAL CUEING: AN EVENT-RELATED fMRI STUDY
by
Zainab Fatima
A thesis submitted in conformity with the requirements for the degree of Master of Science
Institute of Medical Science, University of Toronto
© Copyright by Zainab Fatima, 2008
Investigating the Neural Correlates of Crossmodal Facilitation as
a Result of Attentional Cueing: an Event-Related fMRI Study
Zainab Fatima
Master of Science
Institute of Medical Science, University of Toronto
2008
Abstract
Attentional cueing modulated neural processes differently depending on input modality. I used
event-related fMRI to investigate how auditory and visual cues affected reaction times to
auditory and visual targets. Behavioural results showed that responses were faster to cues than to targets, and faster when cues were auditory rather than visual. The first result was
supported by an increase in BOLD percent signal change in sensory cortices upon cue but not
target presentation. Task-related activation patterns showed that the auditory cue activated
auditory and visual cortices while the visual cue activated the visual cortices and the fronto-polar
cortex. Next, I computed brain-behaviour correlations for both cue types which revealed that the
auditory cue recruited medial visual areas and a fronto-parietal attentional network to mediate
behaviour while the visual cue engaged a posterior network composed of lateral visual areas and
subcortical structures. The results suggest that crossmodal facilitation occurs via independent
neural pathways depending on cue modality.
Acknowledgments
I am still in awe of the fact that I have now officially completed my Master's degree. This
is partly because writing the thesis appeared to be such an arduous task at the beginning.
Nevertheless, with time and patience, the page numbers increased and the quality of the writing
improved. In retrospect, I realize that the one person who constantly supported me and pushed
me to produce the best possible work is my supervisor – Dr. Anthony Randal McIntosh. His
relentless critiques of my thesis and constant insistence on getting tasks completed have brought
me to this pinnacle in my life. I am deeply grateful to him. I admire his strength of character and
his ability to always be innovative in the light of academic adversity. He regards criticisms of his
work as challenges and is truly an inspiration when it comes to scientific knowledge. If my
scientific career is an inkling of what Randy’s is, I would be quite satisfied with my progress. So,
thank you Randy for being the person that you are and for keeping me motivated throughout this
whole process.
I would like to thank past and present members of the McIntosh lab who have provided
valuable feedback in various forms during the preparation of my graduate work. These people
include Maria Tassopoulos, Jordan Poppenk, Grigori Yourganov, Tanya Brown, Roxane Itier,
Vasily Vakorin, Diana Khrapatch, Anjali Raja, Antonio Vallesi, Wilkin Chau, Michele Korostil,
Signe Bray, Jeremy Caplan and Mackenzie Glaholt. I would like to especially thank Natasa
Kovacevic – for her insights with regards to my experiment, Andreea Diaconescu and Bratislav
Misic – for allowing me to vent and fret at any given time during my writing episodes, Andrea
Protzner – for constantly keeping me caffeinated, Sandra Moses – for her gentle way of
conveying criticisms, and Hana Burian – for the sushi runs and bear hugs.
I would like to thank my committee members – Drs. Adam Anderson and Claude Alain
for their support and for providing me with thesis-related feedback so promptly. I would also like
to thank Karen Davis, graduate coordinator at the Institute of Medical Science (IMS), for
allowing me to defend my thesis within such stringent time constraints.
Last but not least, I would like to thank my parents and brother for their faith in my
ability to accomplish any goal that I have set for myself and for taking care of me. I would like to
thank my husband and best friend, Ali - without his foot-rubs, back massages and constant
coaxing – I would be far from completing any graduate work. And finally, I would like to thank
my baby girl who has so patiently stayed in my tummy till I have completed my academic
responsibilities. This is one journey we’ve already shared and I can’t wait to meet you, my
darling.
Table of Contents
Abstract ........................................................................................................................................... ii
Acknowledgments.......................................................................................................................... iii
Table of Contents............................................................................................................................ v
List of Tables ............................................................................................................................... viii
List of Figures ................................................................................................................................ ix
List of Appendices .......................................................................................................................... x
List of Abbreviations ..................................................................................................................... xi
Chapter 1: Literature Review.......................................................................................................... 1
1.1 Overview............................................................................................................................. 1
1.2 Classifying Different Attentional Mechanisms .................................................................. 1
1.2.1 Orienting ................................................................................................................. 1
1.2.2 Endogenous vs. Exogenous Shifts .......................................................................... 2
1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms ........................................... 4
1.4 Crossmodal Asymmetry Reported In Cognitive, Physiological, and Developmental Studies................................................................................................................................. 7
1.4.1 Auditory to Visual (A-V) Interactions.................................................................... 8
1.4.2 Visual to Auditory (V-A) Interactions.................................................................. 10
1.5 General Anatomy of Central Auditory and Visual Pathways........................................... 11
1.5.1 The Auditory Pathway .......................................................................................... 11
1.5.2 The Visual Pathway .............................................................................................. 13
1.6 From Anatomy to Function - A Dynamic Systems’ Perspective...................................... 14
1.7 An Overview of fMRI....................................................................................................... 16
1.7.1 Basic MRI Physics................................................................................................ 16
1.7.2 Physiological Basis of BOLD fMRI ..................................................................... 18
1.7.3 Coupling of Neuronal Activity & BOLD ............................................................. 19
Chapter 2: Aims and Hypotheses.................................................................................................. 22
Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices .................................................................................................................................... 24
3.1 Introduction....................................................................................................................... 24
3.2 Materials and Methods...................................................................................................... 26
3.2.1 Participants............................................................................................................ 26
3.2.2 Stimuli................................................................................................................... 27
3.2.3 Apparatus .............................................................................................................. 27
3.2.4 Procedure .............................................................................................................. 28
3.2.4.1 Trial Structure......................................................................................... 28
3.2.4.2 Task Types.............................................................................................. 28
3.2.4.3 fMRI Session .......................................................................................... 29
3.2.5 fMRI Scanning Parameters ................................................................................... 30
3.2.6 Data Analysis ........................................................................................................ 30
3.2.6.1 Pre-processing Pipeline. ......................................................................... 30
3.2.6.2 Statistical Analysis. ................................................................................ 32
3.3 Results............................................................................................................................... 34
3.3.1 Behavioural Performance...................................................................................... 34
3.3.2 fMRI Results......................................................................................................... 35
3.4 Discussion ......................................................................................................................... 36
Tables............................................................................................................................................ 40
Figures........................................................................................................................................... 46
Chapter 4: The Interplay of Cue Modality and Response Latency in Neural Networks Supporting Crossmodal Facilitation......................................................................................... 60
4.1 Introduction....................................................................................................................... 60
4.2 Methods............................................................................................................................. 62
4.2.1 Data Analysis ........................................................................................ 62
4.3 Results............................................................................................................................... 63
4.3.1 Behavioural Performance...................................................................................... 63
4.3.2 fMRI Results......................................................................................................... 63
4.4 Discussion ......................................................................................................................... 64
Tables............................................................................................................................................ 68
Figures........................................................................................................................................... 75
Chapter 5: General Discussion...................................................................................................... 87
5.1 A Convergent Model of Audio-Visual Interactions.......................................................... 90
5.2 Dynamic Processing in Sensory-specific Cortices ........................................................... 91
5.3 Limitations ........................................................................................................................ 92
5.4 Future Directions .............................................................................................................. 93
References..................................................................................................................................... 95
Appendices.................................................................................................................................. 115
List of Tables
Table 3.1: Mean Reaction Times by Condition. ........................................................................... 40
Table 3.2: Local Maxima from AC-VT: AT-VC Task ST-PLS. .................................................. 41
Table 3.3: Local Maxima from VC-AT: VT-AC Task ST-PLS. .................................................. 43
Table 4.1: Local Maxima from AC-VT: AT-VC Behavioural ST-PLS. ...................................... 68
Table 4.2: Local Maxima from VC-AT: VT-AC Behavioural ST-PLS. ...................................... 71
List of Figures
Figure 3.1. Experimental design schematic for auditory cue-visual target tasks.......................... 46
Figure 3.2. Experimental design schematic for visual cue-auditory target tasks.......................... 48
Figure 3.3. Behaviour measures for all four experimental tasks. ................................................. 50
Figure 3.4. BOLD HRFs for cue and target – auditory modality. ................................................ 52
Figure 3.5. Singular image and design scores differentiating cue from target for auditory modality ........................................................................................................................................ 54
Figure 3.6. BOLD HRFs for cue and target – visual modality. .................................................... 56
Figure 3.7. Singular image and design scores differentiating cue from target for visual modality ....................................................................................................................................... 58
Figure 4.1. Correlation profiles for the auditory cue. ................................................................... 75
Figure 4.2. Brain scores plotted by participants for the auditory cue. .......................................... 77
Figure 4.3. Singular images of brain areas that facilitate reaction time for the auditory cue. ...... 79
Figure 4.4. Correlation profiles for the visual cue. ....................................................................... 81
Figure 4.5. Brain scores plotted by participants for the visual cue............................................... 83
Figure 4.6. Singular images of brain regions that facilitate reaction time for the visual cue. ...... 85
List of Appendices
Appendix A. fMRI Screening Form. .......................................................................................... 115
Appendix B. MRI Screening Form............................................................................................. 119
Appendix C: Information and Consent Form. ............................................................................ 120
List of Abbreviations
Abbreviation    Region
Cu      cuneus
Cl      claustrum
Ga      angular gyrus
GC      cingulate gyrus
GF      fusiform gyrus
GFd     medial frontal gyrus
GFi     inferior frontal gyrus
GFm     middle frontal gyrus
GFs     superior frontal gyrus
Gh      parahippocampal gyrus
GL      lingual gyrus
GOi     inferior occipital gyrus
GOm     middle occipital gyrus
GOs     superior occipital gyrus
GPoC    postcentral gyrus (sensory cortex)
GPrC    precentral gyrus (motor cortex)
GTi     inferior temporal gyrus
GTm     middle temporal gyrus
GTs     superior temporal gyrus
Gsm     supramarginal gyrus
INS     insula
LPc     paracentral lobule
LPi     inferior parietal lobe
LPs     superior parietal lobe
Pcu     precuneus
Th      thalamus

Note: Definition of the region abbreviations by reference to Talairach and Tournoux (1988). Other abbreviations used in tables include L and R, which stand for left and right respectively. Ant refers to anterior and Post refers to posterior.
Chapter 1: Literature Review
1.1 Overview
The past few decades have marked a dramatic change in the realm of cognitive
neuroscience. Neuroimaging techniques such as functional magnetic resonance imaging (fMRI)
and electroencephalography (EEG) have provided scientists with tools to examine brain function
across spatial and temporal domains. Prior to the use of functional neuroimaging, attentional
theories about the brain were based primarily on animal work or observations of deficits in
patients with brain damage or disease. These theories of attention are now informed both by
animal work and human neuroimaging studies; researchers are actively trying to bridge the gap
between brain function and attentional models.
This section will begin with an overview of some common attentional mechanisms and
how these processes can form the basis of behavioural crossmodal facilitation. Crossmodal
facilitation occurs when a cue in one sensory modality, for example auditory, speeds responses to
a target in a different sensory modality, such as vision. The motivation for the current experiment
is to study the effects of crossmodal facilitation, in audition and vision, using the fMRI
technique.
A survey of the behavioural literature on crossmodal facilitation will be followed by an
outline of the neuroanatomy that may support audio-visual processes in the brain. The
implications of neuroanatomical studies for the organization of brain function will then be
described, followed by a brief review of the fMRI technique.
1.2 Classifying Different Attentional Mechanisms
1.2.1 Orienting
Experiments on selective attention have shown that when participants are provided with a
cue for the location of an upcoming event, they direct their attention to the event (Posner, 1980;
Posner, Inhoff, Friedrich, & Cohen, 1987). Posner developed an attentional cueing paradigm to
study such visual shifts of attention. The paradigm consisted of a central cross and two peripheral
boxes. The central cross was replaced by a plus sign or an arrow that pointed in the direction of
one of the boxes. The plus sign indicated that there was an equiprobable chance of a target
appearing in either of the two boxes while the direction of the arrow correctly identified the
location of the target in eighty percent of the trials (valid trials) and did not cue the correct target
location in twenty percent of the trials (invalid trials). Responses were faster on valid trials
compared to invalid trials. Posner (1980) postulated that facilitated responses in valid conditions
occurred as a result of attentional engagement at the target location.
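The structure of such a cueing session, and the cue validity effect it yields, can be sketched with a short simulation. All numbers here (mean reaction times, noise, trial counts, function names) are hypothetical placeholders chosen only to mirror the qualitative pattern Posner reported, not his actual data.

```python
import random

def simulate_posner(n_trials=1000, p_valid=0.8,
                    rt_valid=250.0, rt_invalid=290.0, sd=30.0, seed=0):
    """Simulate a Posner cueing session: the arrow cue indicates the true
    target location on a fraction p_valid of trials (valid trials); reaction
    times are drawn from Gaussians with hypothetical condition means (ms)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        target = rng.choice(["left", "right"])
        valid = rng.random() < p_valid
        # Invalid cues point at the box opposite the target.
        cue = target if valid else ("left" if target == "right" else "right")
        mean_rt = rt_valid if valid else rt_invalid
        trials.append({"cue": cue, "target": target, "valid": valid,
                       "rt": rng.gauss(mean_rt, sd)})
    return trials

def cue_validity_effect(trials):
    """Mean RT(invalid) - mean RT(valid), in ms; a positive value reflects
    facilitation from attentional engagement at the cued location."""
    valid_rts = [t["rt"] for t in trials if t["valid"]]
    invalid_rts = [t["rt"] for t in trials if not t["valid"]]
    return (sum(invalid_rts) / len(invalid_rts)
            - sum(valid_rts) / len(valid_rts))
```

With these placeholder means, the simulated validity effect comes out near the 40 ms difference built into the condition means, illustrating how the effect is estimated from trial-level data.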
Subsequently, Posner and Petersen (1990) proposed a model of selective attention based
on neuropsychological studies where three loci in the brain – the parietal lobe, the superior
colliculus and the thalamus - seemed to contribute to different aspects of attentional shifts.
Individuals with unilateral parietal lobe damage showed normal responses when cues and targets
were presented on the side that was contra-lateral to their lesion but their performance was
impaired when cues appeared on the ipsi-lesional side and the target appeared on the contra-
lesional side (Posner, Walker, Friedrich, & Rafal, 1984). Patients that had degenerate superior
colliculi, structures involved in eye movement coordination, such as those suffering from
supranuclear palsy were slow to move their attention from one location to the next (Rafal et al.,
1988). Lastly, patients with thalamic damage responded poorly at spatial locations that were
contra-lateral to their lesion irrespective of whether trials were valid or invalid (Rafal & Posner,
1987). In summarizing the neuropsychological work that supported the selective attention model,
it can be stated that the parietal lobe was critical in the disengagement of attention, the superior
colliculus was implicated in moving attention from one point in space to another and the
thalamus was essential for re-engaging attention at a particular location.
1.2.2 Endogenous vs. Exogenous Shifts
The model of attentional cueing postulated by Posner was based on experiments in which
cues were centrally presented (at fixation) and targets occurred in the periphery (either side of
fixation). Experimenters have subsequently claimed that the location of cue presentation, central
or peripheral, can tap into two different attentional mechanisms – endogenous and exogenous
(see Ruz & Lupianez, 2002 for a review). Endogenous shifts of attention occur following an
instruction to attend to a target location where participants have to use their own volition to
orient attention. In contrast, exogenous shifts of attention are more reflexive in that participants
can automatically orient to the stimulus that is presented at a target location (see Gazzaniga, Ivry,
& Mangun, 2002 for review). When cues are presented centrally, participants have to move their
attention according to the instructional content of the cue (Ruz & Lupianez, 2002). On the other
hand, peripheral cues are able to capture attention exogenously at a particular location without
any instructional content (Ruz & Lupianez, 2002). Therefore, informative central and peripheral
cues have been used to study endogenous shifts of attention while spatially non-predictive
peripheral cues have been used to study exogenous shifts of attention (Chica, Sanabria,
Lupianez, & Spence, 2007).
Researchers that have attempted to separate the effects of endogenous and exogenous
shifts of attention have found that endogenous cueing requires a longer delay interval between
cue and target presentation (Eimer, 2000; Muller and Findley, 1988). These studies imply that
different attentional mechanisms may be mediating cue processing. Corbetta and Shulman
(2002) have claimed that endogenous versus exogenous shifts of attention may invoke different
behavioural patterns and neural activations. In a meta-analysis of studies conducted on shifts of
attention, Corbetta and Shulman (2002) ascribed voluntary (endogenous) attention to a network
of areas that include the dorsal posterior parietal and frontal cortex with transient activity in
occipital areas. The functional attributes of this fronto-parietal network include integration of
prior experience with expectations and goals to result in volitional shifts in attention. The
temporo-parietal cortex and ventral frontal regions comprise a separate network that has been
implicated in reflexive (exogenous) attentional shifts. This reflexive network is thought to focus
attention on salient events in the environment.
A similar but less specific distinction in attentional processes was suggested earlier by
Posner (1992) and Posner and Petersen (1990). The authors claimed that an anterior attentional
network consisting of frontal areas and the anterior cingulate was responsible for extracting the
relevance of a selected item while a posterior attentional network involving parietal areas
selected information based on sensory attributes. Posner argued that this anterior-posterior
attention system functioned to coordinate activity at a supramodal (modality-independent) level
and was able to regulate data processing that was specific to particular cognitive tasks.
While Corbetta and Shulman (2002) and to some extent, Posner (1992) have made an
attempt to categorize exogenous and endogenous attentional shifts into distinct processing
streams, a study by Rosen and colleagues (1999) suggested a fair bit of overlap in brain
activations in response to voluntary and reflexive attention. In both endogenous and exogenous
cueing tasks, activations were seen in the dorsal premotor region, the frontal eye fields and the
superior parietal cortex. Evidence to date remains inconclusive about the exact nature of
interactions between endogenous and exogenous shifts of attention at the neural level.
The mechanisms of attention discussed in this section are not directly explored in the
current experiment but provide the context for understanding experiments on crossmodal cueing
outlined below.
1.3 Influence of Crossmodal Stimuli on Attentional Mechanisms
The research mentioned thus far has focused on visual spatial orienting while
experiments conducted in the 1960s also found attentional modulations of response times to
cross-modal (auditory, visual) stimuli (Bertelson & Tisseyre, 1969; Davis & Green, 1969). In
one experiment, Bertelson and Tisseyre found that an auditory click decreased reaction time to a
subsequent visual flash while a visual flash did not have the same effect on an auditory click. A
more specific investigation into crossmodal cueing was conducted by Buchtel & Butter (1988)
who used lateralized auditory and visual stimuli. These stimuli were spatially significant
(spatially neutral conditions were used as controls) in comparison to Bertelson & Tisseyre’s non-
spatial stimuli (centered flash, binaural click: neutral spatial significance). Two cross-modal
cueing cases were devised, auditory to visual (A-V) and visual to auditory (V-A), as well as the
intra-modal cases, visual to visual (V-V) and auditory to auditory (A-A). Previous findings were
reinforced with respect to reaction time: cross-modal cueing led to faster responses compared to
intra-modal cueing, and audio cues were more effective in facilitating responses compared to
visual cues. In the cross-modal conditions, visual target stimuli seemed much easier to cue than
audio target stimuli, regardless of the cue modality. In fact, Buchtel and Butter reported virtually
no cueing effect for auditory target stimuli.
Farah, Wong, Monheit, and Morrow (1989) performed a cross-modal cueing study on
patients with lesions circumscribed unilaterally to the parietal lobe to try to understand the nature
of crossmodal cueing effects in the brain. Prior to this study, Posner and colleagues (1984) had
shown that patients with unilateral parietal lobe lesions had difficulty disengaging attention in a
standard visual spatial cueing paradigm. Farah and colleagues (1989) extrapolated on Posner’s
work and examined A-V and V-V conditions. In the V-V condition there was a 50ms reduction
in reaction time when the target appeared on the ipsi-lesional side of space, and a considerable
increase in reaction times for targets that appeared on the contra-lesional side of space. In the A-
V condition, response times were faster than in the visual cue condition. Again, there was a large
increase in reaction time for targets that appeared contra-lateral to the parietal lesion. This study
was in agreement with its predecessors in a general way: cross-modal A-V cueing produced
facilitated reaction times.
A paper published by Ward in 1994, however, ran contrary to the observations of these past studies.
Ward (1994) used a crossmodal spatial discrimination task where participants made speeded left-
right responses to visual or auditory targets following the presentation of an auditory or a visual
non-predictive cue, both auditory and visual cues, or no cues. Reaction times were measured for
all conditions at different inter-stimulus intervals between the cue and target. The results
indicated that visual cues facilitated reaction times to auditory targets that were presented on the
same side of the cue (compatible) at short inter-stimulus intervals. Auditory cues, in contrast, did
not facilitate reaction times to visual targets shown on either side of cue presentation or at any
inter-stimulus interval. Auditory cues did facilitate reaction times to compatible auditory targets
at short inter-stimulus intervals. Ward’s findings were directly opposite to those found by
previous crossmodal studies (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong,
Monheit, & Morrow, 1989).
In a subsequent study by Spence and Driver (1997), an orthogonal spatial cueing
paradigm (Spence and Driver, 1994) was used to examine crossmodal cueing effects. In the
orthogonal spatial cueing experiment, participants had to discriminate the elevation of a target
sound rather than its laterality. The targets were preceded by an uninformative cue on the same
side in fifty-percent of the trials and on the opposite side in the rest of the trials. The results of
multiple manipulations of the orthogonal spatial cueing paradigm replicated the finding that
auditory cues facilitate reaction times to visual targets. The authors claimed that Ward’s
contradictory findings could be explained in light of spatial compatibility effects and response
priming since Ward’s stimuli were lateralized.
Ward and his collaborators have subsequently demonstrated that the cross-modal
asymmetry in favour of visual cueing still holds when all methodological confounds are removed
(McDonald & Ward, 1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin,
2000). Ward, McDonald and Golestani (1998) showed that when cues and targets were presented
from exactly the same lateral eccentricity, responses following visual cues were still faster, ruling out Spence
and Driver’s criticism about Ward’s initial findings being based on spatial compatibility effects
that arise as a result of cue-target lateralization differences. In another study by Ward, McDonald
and Lin (2000), an implicit spatial discrimination paradigm was used to study cross-modal
asymmetry, and visual cue superiority was once again found, eliminating the effects of response
priming on Ward’s early findings (1994).
Mondor and Amirault (1998) reacted to discrepancies in the direction of the crossmodal
cueing (auditorily or visually driven effects) by conducting a comprehensive investigation into
the impact of experimental set-up on cueing effects. The investigators manipulated cue and target
modalities, stimulus onset asynchrony (SOA), and target location given cue location. In their first
experiment, they tried to maximize uncertainty for the subject: an auditory or visual spatial cue
preceded an auditory or visual lateralized target by either 150 or 300ms. The results showed that
valid trials were faster than invalid trials (also known as the cue validity effect) only for intra-
modal tasks (A-A, V-V) and only for an SOA of 150ms. Cueing for cross-modal cases did not
reach significance. These results seemed inconsistent with past findings (Bertelson & Tisseyre,
1969), but Mondor and Amirault (1998) offered a hypothesis that could address the discrepancy.
They suspected that endogenous mechanisms were responsible for cross-modal cueing effects.
That is, rather than pure perceptual-motor reactivity, it was expectation or anticipation that
allowed cross-modal attention to be efficient. In their second experiment, Mondor and Amirault
(1998) decreased the uncertainty: seventy-five percent of the cues were valid, while the modality
of cue and stimulus was unpredictable and the SOA was a constant 150ms. A significant cue
validity effect for A-V and V-A resulted even though the A-A and V-V effect was stronger. In a
third experiment, Mondor and Amirault (1998) administered the conditions in blocks, thus
rendering modality of cue and target predictable. This increased intra-modality cueing effects,
though cross-modal cueing effects were unaffected. Mondor and Amirault (1998) concluded that
cross-modal cueing was effective when endogenous mechanisms were at work, and those
mechanisms were spurred by predictability (predictability of modality seemed irrelevant) or lack of
uncertainty.
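The factorial structure of these designs (cue modality × target modality × cue validity) can be made concrete with a brief sketch. The mean reaction times below are hypothetical placeholders, chosen only so that intra-modal validity effects exceed cross-modal ones, as in the qualitative pattern of Mondor and Amirault's first experiment; none of these values are their data.

```python
from itertools import product

# Hypothetical mean RTs (ms) per (cue modality, target modality, validity);
# illustrative values only, with intra-modal validity effects made larger
# than cross-modal ones.
MEAN_RT = {
    ("A", "A", "valid"): 240, ("A", "A", "invalid"): 275,
    ("V", "V", "valid"): 250, ("V", "V", "invalid"): 282,
    ("A", "V", "valid"): 255, ("A", "V", "invalid"): 262,
    ("V", "A", "valid"): 258, ("V", "A", "invalid"): 264,
}

def validity_effects(mean_rt):
    """Cue validity effect (invalid - valid, ms) for every cue-target pairing."""
    effects = {}
    for cue, target in product("AV", "AV"):
        effects[(cue, target)] = (mean_rt[(cue, target, "invalid")]
                                  - mean_rt[(cue, target, "valid")])
    return effects

effects = validity_effects(MEAN_RT)
intra = [effects[("A", "A")], effects[("V", "V")]]
cross = [effects[("A", "V")], effects[("V", "A")]]
# In this sketch, intra-modal cueing effects exceed cross-modal ones.
assert min(intra) > max(cross)
```

Arranging the conditions this way makes explicit what "a significant cue validity effect for A-V and V-A" means operationally: a positive invalid-minus-valid difference within each cross-modal cell of the design.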
Although Mondor and Amirault's (1998) third experiment did not report strong
crossmodal cueing effects, a subsequent study by Schmitt, Postma and de Haan (2000) found
facilitated reaction times to both auditory and visual cues in an experimental set-up where cue
and target modalities were fixed within a block. Symmetric audio-visual cueing effects have
subsequently been reported in other studies (McDonald & Ward, 1999; 2003). At present,
facilitation effects have been shown for both auditory and visual cues but the exact neural
underpinnings of these cueing effects have yet to be isolated.
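The cue validity effect described above is simply the mean reaction-time difference between invalid and valid trials. A minimal sketch of the computation, using invented reaction times purely for illustration (the values are not taken from any of the cited studies), is:

```python
def cue_validity_effect(valid_rts, invalid_rts):
    """Mean RT on invalid trials minus mean RT on valid trials (ms).

    A positive value indicates that the cue facilitated responding
    at the cued location.
    """
    mean_valid = sum(valid_rts) / len(valid_rts)
    mean_invalid = sum(invalid_rts) / len(invalid_rts)
    return mean_invalid - mean_valid

# Hypothetical reaction times (ms); not data from Mondor & Amirault (1998).
valid = [310, 295, 305, 300]
invalid = [340, 355, 345, 350]
print(cue_validity_effect(valid, invalid))  # 45.0 ms of facilitation
```

A significant positive difference of this kind, computed separately for intra-modal and cross-modal conditions, is what the studies above report as a cueing effect.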
On a side note, most crossmodal cueing experiments mentioned previously have
manipulated SOA in addition to cue and target modalities (Bertelson & Tisseyre, 1969; Spence
& Driver, 1997; Ward, 1994) to try to understand the type of attentional mechanisms that may
underlie the integration of cue-target information. When Bertelson and Tisseyre (1969) varied
stimulus onset asynchrony between 0 and 700 ms and measured the effect of SOA manipulation
on responses to a target, they found that an auditory click decreased reaction time to a
subsequent visual flash at the smallest SOA. At larger SOAs (greater than 70ms), when the
visual flash preceded the auditory click, some facilitation of reaction time was noted, but
it was greatly attenuated in comparison to the auditory click – visual flash (A-V) case. The
primary purpose of carrying out such behavioural experiments was to determine if there was an
attentional refractory period that followed processing of the first stimulus. The logic was that if
attentional mechanisms, which may be governed by higher-order cognitive systems in the brain,
were required to process the first stimulus, then this processing would utilize resources that
would not be available to process a subsequent stimulus until a certain time had elapsed (Davis,
1959). This elapsed time was known as the attentional refractory period. However, experiments
by Bertelson and Tisseyre found that there was no attentional refractory period if the two stimuli
were presented in two different sensory modalities (auditory and visual). When stimuli were
presented in the same modalities (visual-visual: V-V), the facilitation was very weakly
represented in the behavioural data. By using a substantially long SOA, both auditory and visual
cueing effects can potentially be delineated.
1.4 Crossmodal Asymmetry Reported In Cognitive, Physiological, and Developmental Studies
The majority of behavioural studies conducted on some aspect of crossmodal cueing
allude to the idea that there may be cognitive, physiological and developmental bases for
auditory and visual interactions. Some of these studies will be described next.
1.4.1 Auditory to Visual (A-V) Interactions
Investigators examining multisensory interactions have found that a sudden sound can
enhance the detection of a subsequent flash of light in the same location (McDonald, Teder-
Salejarvi, & Hillyard, 2000). Abrupt sounds synchronized with visual search arrays can improve
the identification of visual targets embedded in a series of distracters (Vroomen & de Gelder,
2000). These studies suggest that auditory events can influence visual processing.
According to Neumann, Van der Heijden, & Allport (1986), shifting visual attention to
auditory events has evolutionary significance. They claim that auditory events that occur distally
can be registered in the brain, resulting in appropriate action, whereas by the time visual events
come into view proximally, a response may not be viable. Since auditory events in the world are
transient and intermittent relative to visual events which appear continuous in time, it is more
beneficial to have an asymmetry in audio-visual cueing. Building on Neumann and colleagues’
ideas, some researchers (Brown & May, 1989; Harrison & Irving, 1966; Spence & Driver, 1997)
have suggested that the primary function of sound localization in
animals is to direct the eyes towards auditory events. In this way, sound localization may be
necessary for the control of orienting towards significant distal events which occur outside an
animal’s field of view.
Visual orienting with respect to sound localization has been explored in physiological
studies involving the superior colliculus (King, 1993; Stein & Meredith, 1993; Stein, Wallace, &
Meredith, 1995). In order to understand how the superior colliculus may contribute to the
integration of audio-visual stimuli, it is imperative to consider the anatomical organization of this
structure. The first three layers of the superior colliculus are purely visual and organized spatio-
topically (de Monasterio, 1978a, 1978b; Dubois & Cohen, 2002; Leventhal et al., 1981; Perry
and Cowey, 1984; Rodieck and Watanabe, 1993; Schiller and Malpeli, 1977; Sparks, 1988;
Sparks & Hartwich-Young, 1989). There is no influence of audition on these first three
superficial layers (King, 1993). The three layers beneath the visual layers are often referred to as
the deep layers of the superior colliculus. These layers are considered polymodal because they
receive ascending inputs from brainstem nuclei such as the inferior colliculus and the trigeminal
nucleus that represent modalities such as audition, vision and touch (Wickelgren, 1971).
Descending inputs from the temporal cortex and postcentral gyrus also culminate in these deep
layers (Wickelgren, 1971). Cells within the deep layers that receive afferents from auditory and
visual pathways and from multimodal cortical sites are known as multisensory integrative cells (MSI,
term coined in Calvert, 2001). These MSI cells are not only capable of generating responses to
different modalities but are also able to transform separate sensory inputs into an integrated
product. In conclusion, the anatomy of the superior colliculus makes it an ideal candidate for
carrying out multisensory integration; that is, the assimilation of information from multiple
sensory modalities into a motor outcome.
According to Wickelgren (1971), the superior colliculus is able to track stimuli in either vision
or audition that are moving laterally away from an animal to ensure appropriate avoidance or
approach behaviour. This structure, in a variety of species ranging from amphibians to primates,
is able to produce orienting movements linked to the eyes (saccades), head and body as well as
approach, freezing or running responses (Sahibzada, Dean, & Redgrave, 1986; Sparks, 1999;
Vargas, Marques, & Schenberg, 2000). Researchers have hypothesized that the superior
colliculus may be vital in generating motoric responses to salient sensory events (Sparks, 1999;
Stein, 1998).
Other neurophysiological investigations of the functional architecture of the superior
colliculus have shown that there is a two-dimensional representation of auditory target location
represented within deep layers of this structure (King, 1993; Stein & Meredith, 1993). Stein and
Meredith (1993) however, are careful in noting that the deep layers are multimodal rather than
purely auditory suggesting that any shifts of attention could implicate involvement of other
modalities. King (1993) argues further that there is no spatio-topic map of auditory space in
the brain; it is therefore unlikely that vision could guide auditory localization. Clarey, Barone and
Imig (1992) have found that in structures like the inferior colliculus and the primary auditory
cortex, there is a tonotopic organization with lateralized auditory receptive fields but there is no
indication of spatial tuning or spatiotopy. Therefore, the only spatio-topic map of auditory space
in the brain is the one found in polymodal deep layers of the superior colliculus.
Developmental studies of the superior colliculus have shown that the representation of
auditory space in deep layers of the superior colliculus can be altered by varying the visual
environment (King & Carlile, 1993; King, Hutchings, Moore and Blakemore, 1988; Withington,
1992). In contrast, manipulations of auditory experience have no influence on the spatiotopically
organized superficial layers of the superior colliculus (Knudsen, Esterly & Knudsen, 1984;
Withington-Wray et al., 1990). This asymmetry in the direction of audio-visual interactions is
also prominent in studies that have examined the development of senses in infants. According to
Gottlieb (cited in Lewkowicz & Kraebel, 2004), different sensory modalities develop in the
following sequence: tactile, vestibular, chemical, auditory and visual. The first four modalities
mentioned are functional prenatally while vision develops mostly postnatally. Lewkowicz (1988a,
1988b) claims that sensory processing hierarchies evident in developmental profiles of infants
may contribute to the different degrees of how input from a modality is transmitted to different
unisensory and multisensory sites in the brain. For example, inputs from auditory cortical
neurons could possibly project to areas of the visual cortex that are still underdeveloped at birth;
however, neurons within the visual cortex may be unable to innervate the developed neuronal
structure of the auditory cortex. Thus, the cellular architecture that supports a particular sensory
modality may result from temporal differences in the development of the senses.
1.4.2 Visual to Auditory (V-A) Interactions
Over the years, researchers have reported some effects of vision influencing audition.
These studies are not as extensive as the ones mentioned previously for the auditory capture of
visual attention but are worth considering.
The famous McGurk effect provides evidence for vision altering speech perception
(McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/ when it is
coupled with a visual lip movement associated with /ga/. The McGurk effect suggests that sound
can be misperceived when it is coupled with different visual lip movements. fMRI studies by
Calvert and colleagues (1997, 1999, 2000) have investigated brain activity in relation to
incongruent audio-visual linguistic information (similar to the McGurk effect). Calvert et al.
(1997) found that lip-reading (visual speech) activated areas of the auditory cortex that were
previously considered to be unimodal. In a subsequent study, Calvert and colleagues (1999)
showed an enhancement in sensory-specific cortices when participants saw and heard speech.
These neuroimaging studies uphold the view that vision can influence audition.
Another well-known effect of vision on audition is the ventriloquist effect first reported
by Howard and Templeton (1966). The ventriloquist’s illusion occurs when the perceived
location of a sound shifts towards the location of the visual source. Ventriloquists are able to
produce speech without visible lip movements while simultaneously moving a puppet. This leads
to a conflict between visual and auditory localization that culminates in vision dominating
audition. A recent study by Guttman, Gilroy and Blake (2005) has suggested that in audio-visual
conflict situations, vision dominates spatial processing while audition directs temporal
processing. The effects stated in this section suggest that vision can alter audition in very specific
cases such as those of conflicting information from multiple modalities. The influence of vision
on audition in terms of spatial localization and cueing still remains an area that has potential for
exploration.
1.5 General Anatomy of Central Auditory and Visual Pathways
An exploration of the characteristics of audio-visual interactions is incomplete without a
consideration for the general organization of auditory and visual pathways in the brain. This
section will briefly review some general areas that are involved in the processing of auditory and
visual stimuli.
1.5.1 The Auditory Pathway
Sound waves from the environment are captured and focused by the auricle – the external
ear – into the auditory canal. The auditory canal is a hollow, air-filled tube that transmits sound
waves to three bones located in the middle ear (the malleus, the incus and the stapes). These
three bones amplify the auditory signal enabling it to travel through the fluid-filled inner ear
structure called the cochlea. The cochlea is the primary site of the conversion of sound energy
into a neural code. This process is called signal transduction (Noback, 1967). Hair cells, the
auditory sensory receptors in the cochlea, convert mechanical sound energy into receptor
potentials: outer hair cells display motility that amplifies the mechanical signal, while inner
hair cells provide frequency and intensity information to cochlear
ganglion cells (Miller & Towe, 1979). The innervation of hair cells by ganglion cells in the
cochlea is the first part of the auditory neural pathway where information is encoded both in
terms of frequency and intensity of sound. Neurons within the cochlea respond best to
stimulation at their characteristic frequencies, with contiguous cells tuned to similar
frequencies. In this way, tonotopy – the organization
of tones that share similar frequencies into topologically neighbouring neurons – begins
postsynaptic to inner hair cells (Spoendlin, 1974).
Axons from the cochlear ganglion cells form the cochlear component of the eighth
cranial nerve synapsing in the cochlear nuclear complex of the medulla-pontine junction. The
cochlear nucleus sends auditory inputs via three different pathways – dorsal acoustic stria,
intermediate acoustic stria and trapezoid body – to the pons. The trapezoid body projects to the
superior olivary nuclei where information from both ears (binaural) is processed. Localization of
sounds in space occurs in the medial and lateral portions of the superior olivary nucleus.
Efferents from the superior olivary nucleus, dorsal and intermediate acoustic stria terminate in
the inferior colliculus by way of the lateral lemniscus nuclei. The inferior colliculus in turn sends
outputs to the medial geniculate nucleus of the thalamus which end up in the primary auditory
cortex (area A1, or Brodmann areas 41 and 42; Brodmann, 1909) located on Heschl’s gyrus in
the superior temporal lobe (adapted from Brodal, 1981, Hudspeth, 2000). The auditory cortex is
organized in concentric bands with the primary auditory cortex in the centre and auditory
association areas forming the periphery (see Pandya, 1995 for review). The auditory neural
pathway maintains its tonotopic organization from the cochlear ganglion cells to the primary
auditory cortex.
For decades, it was assumed that the auditory neural pathway processed exclusively
acoustic information. In recent years, studies have shown that outputs from midbrain structures
such as the inferior colliculus and the medial geniculate nucleus of the thalamus can contain
some visual information (Komura et al., 2005; Porter, Metzger, & Groh, 2007). Porter, Metzger
and Groh (2007) have demonstrated that the inferior colliculus in monkeys carries visual,
specifically saccade-related information in addition to auditory responses. This study suggests
that the inferior colliculus, predominantly considered to be a unisensory structure responsive to
only auditory stimuli, may have the capacity to integrate specific types of audio-visual
information. In another study by Komura and colleagues (2005), rats were trained to perform an
auditory spatial discrimination task with auditory or auditory-visual cues. The auditory-visual
cues were presented in such a manner that the visual cues were either congruent with auditory
cues or provided conflicting information. The results showed that almost fifteen percent of
auditory thalamic neurons were modulated by visual cues. Responses in these neurons were
enhanced in congruent conditions and suppressed when auditory and visual cues conflicted.
Studies by Porter, Metzger and Groh, and Komura et al. imply that the auditory cortex receives
some visual input via subcortical structures within the central auditory pathway, the extent of
which is not clear. Thus, unisensory sites in the brain may have the ability to perform some
multisensory functions.
1.5.2 The Visual Pathway
Light entering the eye is detected by the retina which is composed of photoreceptors
(cones and rods) that transduce light into electrical signals that can be decoded by the brain.
Information from rods and cones feeds into a network of interneurons composed of horizontal,
bipolar and amacrine cells which in turn project to ganglion cells. Axons from the retinal
ganglion cells form the optic nerve (Wurtz & Kandel, 2000a; 2000b). The optic nerves from the
two eyes meet at the optic chiasm, where fibres from the nasal half of each retina cross, so that
information from the left visual hemifield is routed to the right hemisphere and vice versa (Guillery,
1982). From the optic chiasm, visual inputs flow through the optic tracts, each carrying the
contralateral hemifield from both eyes, to the lateral geniculate nucleus of the thalamus. The primate lateral geniculate nucleus
contains six layers. Layers 1 and 2 are called the magnocellular layers and layers 3-6 are termed
the parvocellular layers (Kaas, Guillery, & Allman, 1972; Sherman, 1988). Both magno- and
parvo-cellular layers project to the primary visual cortex (area V1 or Brodmann area 17;
Brodmann, 1909). V1 sends outputs to V2/V3 from which point the visual information is split
into two streams – temporal and parietal (see Desimone & Ungerleider, 1989). The visual
association areas include regions such as V4 and V5/MT (the middle temporal area; adapted from
Wurtz & Kandel, 2000a; 2000b). The plethora of connections between primary visual areas and
visual association cortices are beyond the scope of this discussion but can be reviewed in a paper
by Felleman and Van Essen (1991). Despite the multitude of levels in the central visual pathway,
information is maintained retinotopically: adjacent areas in the visual field are encoded by
slightly different but overlapping neuronal receptive fields.
Apart from the major retinogeniculostriate pathway, the retina also projects to other
subcortical areas such as the superior colliculus (Leventhal, Rodieck, & Dreher, 1985;
Magalhaes-Castro, Murata, & Magalhaes-Castro, 1976; Rodieck & Watanabe, 1993; Wassle &
Illing, 1980) and the pulvinar nucleus of the thalamus (Grieve, Acuna, & Cudeiro, 2000; Itoh,
Mizuno, & Kudo, 1983; Mizuno et al., 1982; Nakagawa & Tanaka, 1984) as seen in primates
and cats. The superior colliculus and the pulvinar nucleus are also extensively interconnected
(Grieve, Acuna & Cudeiro, 2000; Stepniewska, Qi, & Kaas, 2000). The anatomical organization
of both these subcortical structures and their interactions with auditory and visual pathways
places them in a unique position to integrate multisensory information.
1.6 From Anatomy to Function - A Dynamic Systems’ Perspective
As discussed in previous sections, the anatomical architecture of subcortical structures in
auditory and visual pathways supports both unisensory and multisensory processing. At the
cortical level, Felleman and Van Essen (1991) have documented extensive feedforward,
feedback and horizontal connections between visual and multisensory brain areas. Connections
between unimodal auditory cortex and primary visual areas have been mapped in primates by
Rockland and Ojima (2003) and Falchier et al. (2002). Cognitive neuroscience studies have also
successfully identified multisensory sites that include cortical regions like the posterior parietal
cortex (Bushara, Grafman, & Hallett, 2001; Iacoboni, Woods, & Mazziotta, 1998), intraparietal
sulcus (Macaluso & Driver, 2005), premotor areas (Dassonville et al., 2001; Kurata et al., 2000;
Tanji & Shima, 1994), anterior cingulate (Laurienti et al., 2003; Taylor et al., 1994) and
prefrontal cortex (Asaad, Rainer, & Miller, 1998; Bushara, Grafman, & Hallett, 2001; Laurienti
et al., 2003; Tanji & Hoshi, 2001; Taylor et al., 1994) and subcortical regions such as the
thalamus (Grieve, Acuna,& Cudeiro, 2000; Porter, Metzger, & Groh, 2007), superior colliculus
(Calvert, 2000; King, 1993; Stein & Meredith, 1993; Stein, Wallace, & Meredith, 1995),
cerebellum (Allen, Buxton, Wong, & Courchesne, 1997; Bense et al., 2001; Kim et al., 1999)
and parts of the basal ganglia (Chudler, Sugiyama, & Dong, 1995; Nagy et al., 2006). The
abundance of areas that process multisensory information suggests that there are large degrees of
functional overlap and redundancy in the brain. This idea was initially asserted by Mountcastle
(1979) who noticed that proximal areas showed similar functional characteristics and that there
were many structural and functional redundancies in sensorimotor systems.
Theories about how the brain’s structural capacity contributes to its functional properties
can be classified into a framework that considers the brain to be a dynamic system. This
framework has been defined by researchers in numerous ways, some of which will be discussed
below (Bressler, 1995; Bressler, 2002; Bressler & McIntosh, 2007; Bressler & Tognoli, 2006;
Fuster, 1997; Goldman-Rakic, 1988; Mesulam, 1981, 1990, 1998; McIntosh, 1999; Price &
Friston, 2002).
Bressler and Tognoli (2006) suggest that the functional expression of a particular
cognitive operation results from the co-activation of specific interconnected local area networks.
Mesulam (1990) has also stated that a cognitive or behavioural operation can be subserved by
several interconnected brain areas each of which is capable of multiple computations. Therefore,
a cognitive operation such as attention can be controlled by a diffuse cortical network that is
redundant and specialized at the same time (Mesulam, 1981). According to this view, each
region within a cortical network has some specialization because of its anatomical connections
but this specialization is not absolute in that lesions to different areas in a network could have
similar behavioural consequences. Goldman-Rakic (1988), in a review comparing the
literature on different models of cortical organization, claims that the brain’s association cortices
(areas that are largely responsible for complex processing that occurs between sensory input to
primary sensory cortices and motor output elicited by primary motor areas) interact to form a
finite number of dedicated networks that are reciprocally interconnected. These networks are
capable of communicating with sensorimotor systems to produce integrated behaviour.
One dynamical systems’ perspective that has gained momentum in the past few decades
is the notion that regions of the brain that share similar structural properties can contribute to a
multitude of functions by way of their interactions with other regions (Bressler, 1995; Bressler,
2002; McIntosh, 1999; Bressler & McIntosh, 2007; Price & Friston, 2002). These interactions
could be represented via direct or indirect connections. In addition, brain areas that have very
different neural connections can contribute to the same functional output. This framework of
distributed function in the brain has been defined operationally by McIntosh (1999, 2004) using
the term neural context. Neural context represents the local processing environment of a given
brain area that results from modulatory influences from other brain areas (Bressler & McIntosh,
2007; McIntosh, 1999). Therefore, cognitive function may not be localized in an area or network
of the brain but may emerge from dynamic large-scale neural interactions between different
brain areas that change as a function of task demands. Task demands, in broader terms, can be
the situational context in which an event occurs. Situational context refers to environmental
factors such as sensory input and response processing in which a task is performed. In most
cases, neural context is elicited from changes in situational context (Bressler & Tognoli, 2006;
Protzner & McIntosh, 2007).
While dynamical models incorporating large-scale neural interactions, neural and
situational contexts have been explored for cognitive processes such as learning and memory
(Lenartowicz & McIntosh, 2005; McIntosh & Gonzalez-Lima, 1994; 1998), they have seldom
been applied to investigate interactions between cognitive processes such as attention and
sensorimotor systems. Brain areas that display attention-sensorimotor interactions can be
dissociated using techniques like functional magnetic resonance imaging (fMRI). The use of fMRI to study
brain function will be discussed in the next section.
1.7 An Overview of fMRI
1.7.1 Basic MRI Physics
The MRI concepts presented here are summarized from Brown and Semelka (1999) and
Huettel, Song and McCarthy (2004).
The proton of a hydrogen atom has a magnetic spin that can be modelled as a spherical,
distributed positive charge rotating about an axis at high speed, producing a small
magnetic field called a magnetic moment. In addition to the magnetic moment, a proton’s mass
in combination with the rotating charge produces angular momentum. The magnetic moment and
the angular momentum form the basis of the spin properties of a proton. When a proton is
exposed to a uniform magnetic field (B0), it can assume one of two types of spins – parallel
(same direction as B0) and anti-parallel (opposite direction to B0) – forming the equilibrium state.
Parallel spins are lower in energy and more stable compared to anti-parallel spins. In order to
convert a proton in a parallel spin state to an anti-parallel state, the proton needs to absorb
electromagnetic energy. By the same token, a proton in a high energy state emits electromagnetic
energy as it returns to a low energy state. The electromagnetic radiation frequency required to
excite a proton from a low-energy state to a high-energy state can be calculated for an MR
scanner. This frequency is called the Larmor frequency and is needed to change spins from
parallel to anti-parallel orientations.
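To make this concrete, the Larmor relation can be written out; the gyromagnetic ratio given below is the standard textbook value for hydrogen, not a parameter specific to this study:

```latex
\omega_0 = \gamma B_0, \qquad f_0 = \frac{\gamma}{2\pi} B_0
```

For hydrogen, \(\gamma / 2\pi \approx 42.58\) MHz/T, so a 1.5 T scanner, for example, excites protons at \(f_0 \approx 63.9\) MHz.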
Apart from the changes in spin states induced by B0, a proton’s motion about its axis can
also be influenced by B0. In the presence of B0, the axis of rotation for a proton can rotate around
the direction of B0. The motion of a proton in B0 is referred to as spin precession. The physical
characteristics of spin precession are exploited in MRI.
In an MRI set-up, a participant is placed in the centre of a uniform magnetic field.
Protons within brain tissue assume their equilibrium states (parallel or anti-parallel). Using the
Larmor frequency to generate an electromagnetic radiation pulse, protons can be excited and
their rotational axis perturbed to generate an MR signal. Once the electromagnetic pulse is
removed, the MR signal starts to decay. MR signal decay, also known as spin relaxation, is of
two types – longitudinal and transverse. Longitudinal relaxation (T1 recovery) corresponds to the
recovery of net magnetization along the direction of B0. It is caused by protons in high-energy, anti-
parallel spins (excited state) returning to low-energy, parallel, relaxed states. In order to
understand transverse relaxation (T2 decay), consider the following: an electromagnetic pulse
tips the net magnetization into the transverse plane such that all protons precess
in the same phase. When the pulse is removed, this phase coherence gradually subsides and protons
return to their original out-of-phase states. This is referred to as transverse relaxation. The rates
of longitudinal and transverse relaxation are constant for particular substances such as water,
bone or fat. The constants that describe longitudinal and transverse relaxation are called T1 and
T2, respectively.
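These two processes are conventionally described by exponential curves whose time constants are T1 and T2; writing them out makes the contrast mechanisms discussed below easier to follow:

```latex
M_z(t) = M_0\left(1 - e^{-t/T_1}\right) \quad \text{(longitudinal recovery)}, \qquad
M_{xy}(t) = M_{xy}(0)\, e^{-t/T_2} \quad \text{(transverse decay)}
```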
While both the T1 and T2 constants are important for MR, a third constant, T2*, is
essential for functional MR. T2* includes transverse relaxation due to phase differences in spin
precessions as well as local magnetic field inhomogeneities. The latter can be described by
considering an inhomogeneous external magnetic field where the strength of the magnetic field
at a particular location influences the spin precessional frequency. Perturbed protons at different
locations within the magnetic field lose coherence in their spin precessions at different rates
contributing to the decay of net magnetization in the transverse plane.
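The two contributions to T2* described above are commonly summarized by treating the relaxation rates as additive, with T2′ (a standard textbook symbol, not used elsewhere in this thesis) denoting the dephasing due to field inhomogeneities alone:

```latex
\frac{1}{T_2^{*}} = \frac{1}{T_2} + \frac{1}{T_2'}
```

Because \(1/T_2'\) is non-negative, T2* is always shorter than or equal to T2.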
An MR signal can highlight different parts of the brain depending on the contrast used. A
T1-weighted image of the brain shows high signal (bright) for fat content and low signal for
cerebrospinal fluid (CSF; dark) while a T2-weighted image shows the opposite contrasts for fat
and CSF.
Two parameters that are critical to the amount of MR signal recorded and the contrast
expressed are repetition time (TR) and echo time (TE). The interval between successive
electromagnetic pulses that excite protons is referred to as TR. The TR influences the rate of
longitudinal recovery after an excitation pulse is removed. TE corresponds to the interval
between the excitation pulse and the acquisition of data and affects the rate of transverse decay.
By varying the length of the TR or TE, image intensity at each spatial location can be made more
or less sensitive to differences in T1 and T2.
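The joint influence of TR and TE can be sketched with the standard spin-echo signal approximation (a textbook idealization rather than the exact expression for every pulse sequence), where \(\rho\) is the proton density:

```latex
S \propto \rho \left(1 - e^{-TR/T_1}\right) e^{-TE/T_2}
```

Short TR and short TE make the signal sensitive mainly to T1 differences (T1 weighting); long TR and long TE make it sensitive mainly to T2 differences (T2 weighting).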
1.7.2 Physiological Basis of BOLD fMRI
Functional MRI (fMRI) can be used to estimate the spatial locations of neural activations
and associated changes in metabolic demands when a person performs a certain task. These
metabolic changes include variations in concentration of deoxygenated hemoglobin, blood flow
and blood volume (Buxton, Wong, & Frank, 1998; Kwong et al., 1992), all of which play a role
in producing the blood-oxygenation-level dependent (BOLD) response recorded in fMRI. To
understand the BOLD response, the magnetic properties of oxygenated and deoxygenated
hemoglobin must first be considered. Oxygenated hemoglobin is diamagnetic (does not affect
magnetic field strength) while deoxygenated hemoglobin is paramagnetic (distorts local
magnetic fields). The paramagnetic properties of deoxygenated hemoglobin decrease T2* values
(Thulborn et al., 1982) which allows measurement of changes in brain activity as indexed by
changes in the amount of deoxygenated hemoglobin.
The BOLD response in MR was first discovered by Ogawa and colleagues in 1990, in an
experiment in which anaesthetised rats were placed in an MR scanner to investigate brain
physiology with MRI. It was known by that point that deoxygenated
hemoglobin decreased blood’s T2* values (Thulborn et al., 1982). Ogawa et al. (1990) used this
finding to demonstrate that varying the proportion of blood oxygenation in rats could lead to
different MR image characteristics. When rats breathed in oxygen at a hundred percent
concentration, Ogawa et al. noticed that T2* images of the brain showed structural differences
and very little vasculature. As the amount of oxygen in rats decreased, brain vasculature became
more prominent. To test this interesting effect, a saline-filled container holding test-tubes of
oxygenated and deoxygenated blood was imaged using a T2* contrast (Ogawa & Lee,
1990). T2*-weighted images of oxygenated blood showed a dark outline around the test-tube’s
diameter. In contrast, deoxygenated blood showed a greater signal loss that spilled over into the
area filled with saline. Ogawa and others concluded that functional changes in brain activity
could be studied using what came to be known as the BOLD contrast.
Following Ogawa’s work, Belliveau and colleagues (1991) observed the first functional
images of the brain by using a paramagnetic contrast agent called Gd (DTPA)2-. Subsequently,
Ogawa and others (1992) and Kwong et al., (1992) published the first functional images using
the BOLD signal.
The physiological basis of the BOLD signal can be summarized as follows. Relative
amounts of oxygenated and deoxygenated hemoglobin in the capillary bed of a brain region
depend on the ratio of oxygen consumption and supply. When neural activity increases, the
amount of oxygenated blood delivered to that area also increases while levels of deoxygenated
hemoglobin decrease. The BOLD signal captures the displacement of deoxygenated hemoglobin
by oxygenated blood, since the former is paramagnetic and distorts local magnetic fields while
the latter does not. In other words, changes in oxygenation levels modulate microscopic field
gradients around blood vessels, which in turn affect T2* values for tissue water and thereby
produce differences in signal strength (Huettel, Song, & McCarthy, 2004).
A gradient echo-planar imaging sequence that is sensitive to the paramagnetic properties
of deoxygenated hemoglobin can be used to display tomographic maps of brain activation
(Brown & Semelka, 1999; Kwong et al., 1992; Huettel, Song, & McCarthy, 2004).
1.7.3 Coupling of Neuronal Activity & BOLD
The majority of functional neuroimaging studies assume that the physiological changes
underlying the BOLD response capture neuronal activity. However, the nature of the neuronal
activity represented by BOLD responses is still actively debated.
Neuronal activity that could predict the fMRI BOLD response includes several candidate
measures: the average firing rate of a sub-population of neurons, also referred to as multi-unit
activity (MUA; Legatt, Arezzo, & Vaughan, 1980); synchronous spiking activity across a
neuronal population; the local field potential (LFP), which reflects the synchronization of
dendritic currents (Mitzdorf, 1987); or the spiking activity of individual neurons as measured by
single-unit recordings (SUA; Hubel & Wiesel, 1959). The size of the neuronal population whose
activity is indexed by fMRI signals may also be an issue. Scannell
and Young (1999) postulate that changes in fMRI BOLD responses could be caused by large
changes in firing rates in small neuronal populations or small changes in firing rates in larger
neuronal sub-populations.
Logothetis and colleagues (2001) attempted to investigate the relationship between
neuronal firing rates in the brain and subsequent BOLD responses picked up by fMRI. Monkeys
were presented with rotating checkerboard patterns and both fMRI BOLD responses and
electrophysiological signals were measured from primary visual cortex. The electrophysiological
data acquired consisted of SUA, MUA, and LFP recordings. The results showed an
increase in BOLD at the onset of the visual stimulus that persisted until stimulus offset.
Approximately twenty-five percent of MUA showed a transient increase in activity and a
subsequent return to baseline, while LFPs were sustained throughout the stimulus duration. The
authors argued that because the increase in LFPs during stimulation was significantly stronger
than that in MUA and was maintained over longer intervals, LFPs give a better estimate of
BOLD responses than MUA. Logothetis et al. (2001) also suggested that LFPs resemble
integrative activity at neuronal dendritic sites, while MUA corresponds to the axonal firing rate
of a small population of neurons. Hence, BOLD seemed to reflect incoming input and local
processing in an area more than spiking activity.
The experiment by Logothetis and colleagues (2001) assumed a somewhat linear
relationship between BOLD and LFPs. A subsequent study by Mukamel and colleagues (2005)
suggested that BOLD responses may reflect more complex neuronal activity that does not
necessarily follow a linear pattern. In Mukamel et al.’s (2005) experiment, SUA and LFPs
were recorded from two neurosurgical patients. fMRI BOLD signals were collected from healthy
participants. Both patients and participants viewed a movie segment during measurements of
neural activity. The results showed a high, significant correlation (r = 0.75) between SUA from
neurosurgical patients and fMRI BOLD signals in healthy controls. The authors claimed that
fMRI BOLD responses were reliable measures of firing rates in human cortical neurons.
While the findings of Logothetis et al. (2001) and Mukamel et al. (2005) do not directly
contradict each other, it appears that the neuronal activity represented in the BOLD response may
be a complex mixture of input and output processing. In a review of the coupling between
BOLD and neuronal activity, Heeger and Rees (2002) state that LFPs could be dominated by
activity in proximal neurons, which would suggest that local spiking
activity, synaptic activity and dendritic currents are all co-varying. The authors emphasize that
fMRI BOLD response may be capturing both presynaptic and postsynaptic activity within a
particular region. For experiments that are designed to investigate global changes in brain
activity, having the resolution of single-unit recordings in fMRI may not be necessary.
Chapter 2: Aims and Hypotheses
The primary objective of the present study is to understand crossmodal facilitation effects
in the brain. There are discrepancies in behavioural research about the direction of crossmodal
facilitation. To recap, some scientists have found that auditory cues are superior to visual cues in
producing fast responses (Bertelson & Tisseyre, 1969; Buchtel & Butter, 1988; Farah, Wong,
Monheit, & Morrow, 1988; Spence & Driver, 1997) while other researchers have shown the
opposite effect – visual cues facilitate reaction times to auditory targets (McDonald & Ward,
1999; Ward, McDonald, & Golestani, 1998; Ward, McDonald, & Lin, 2000; Ward, 1994). To
my knowledge, no studies have investigated the effects of audio-visual crossmodal facilitation in
the brain using fMRI.
The hypotheses for the current experiment are as follows. Firstly, the magnitude of
facilitation for auditory cues will be larger compared to visual cues given the greater
neuroanatomical capacity for audition to influence vision. Secondly, auditory and visual cueing
will be represented by distinct patterns of brain activation given the differences in behavioural
performance. Lastly, brain areas that respond to cue processing (input) may not be the same
areas that coordinate behaviour (output).
The first two hypotheses will be tested in Chapter 3 and the last hypothesis will be
focused on in Chapter 4. In order to test the hypotheses mentioned, participants will perform a
crossmodal version of the spatial stimulus-response compatibility task while BOLD fMRI
responses are recorded. In spatial stimulus-response compatibility tasks, a cue signals the
response rule to a lateralized target. Responses (button presses) are made to the same
(compatible) or opposite (incompatible) side of target presentation. Reaction times are faster in
compatible conditions than in incompatible conditions (Simon, 1969; Fitts & Seeger, 1953). In
this experiment, the cue and targets will be presented in both auditory and visual modalities.
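The compatibility effect described above is simply the difference between mean reaction times in incompatible and compatible conditions. As an illustration only (the function name and the sample reaction times below are hypothetical, not values from this study), it could be computed as:

```python
from statistics import mean

def compatibility_effect(compatible_rts, incompatible_rts):
    """Mean reaction-time cost (ms) of incompatible responses relative
    to compatible ones; a positive value corresponds to the classic
    stimulus-response compatibility effect (Simon, 1969)."""
    return mean(incompatible_rts) - mean(compatible_rts)

# Hypothetical reaction times (ms) for one participant
effect = compatibility_effect([412, 398, 430], [465, 480, 471])
```

In practice such per-participant effects would be submitted to a group-level statistical test, but this sketch captures the quantity being measured.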
A unique aspect of the current experimental design that I would like to emphasize is the
manipulation of cue-target order. Trials can be of two types: cues appearing first followed by
targets, or targets presented first followed by cues. Previous behavioural studies of crossmodal
facilitation did not manipulate order; in my fMRI study, however, this order manipulation can be
used to examine changes in neural activity based exclusively on cue or target processing.
Chapter 3: Attentional Cueing Modulates Multisensory Interactions in Human Sensory Cortices
3.1 Introduction
Experiments on selective attention have shown that when participants are provided with a
cue, they shift their attention to the cued location (Posner, Inhoff, Friedrich, & Cohen, 1987).
Attentional tasks where cues and targets are manipulated have been adapted to study crossmodal
facilitation effects (Spence and Driver, 1997; Ward, 1994). Crossmodal facilitation occurs when
a cue in one sensory modality elicits a speeded response in another sensory modality. A study
conducted by Ward (1994) investigated the effects of crossmodal facilitation using a spatial
discrimination task. Participants made speeded left-right responses to visual or auditory targets
following the presentation of an auditory or a visual non-predictive cue, auditory and visual cues
or no cues. Reaction times were measured for all conditions at different inter-stimulus intervals
between the cue and target. The results indicated that visual cues facilitated reaction times to
auditory targets presented on the same side of the cue (compatible) at short inter-stimulus
intervals. Auditory cues, in contrast, did not facilitate reaction times to visual targets on either
side of cue presentation or at any inter-stimulus interval. Auditory cues did facilitate reaction
times to compatible auditory targets at short inter-stimulus intervals. Ward’s findings were met
with skepticism because previous crossmodal studies that had presented non-predictive auditory
cues had shown response-time facilitation to visual targets ipsilateral to the cue (Buchtel &
Butter, 1988; Farah, Wong, Monheit, & Morrow, 1989). A crossmodal study conducted after
Ward’s study also showed an asymmetrical facilitation of reaction times for auditory cues in
comparison to visual cues (Spence & Driver, 1997).
Spence and Driver (1997) explained the asymmetrical auditory cue facilitation as having
evolutionary significance. Auditory events in the world are transient and intermittent whereas
visual events are continuous in time. Also, auditory events that occur distally can be registered
in the brain in time for an appropriate action, whereas by the time visual events come into view
proximally, a response may no longer be viable. Therefore, it is more beneficial to shift visual
attention to auditory events rather than the other way around (Neumann, Van der Heijden, & Allport,
1986). Evidence from neuropsychological studies also indicates that orienting to auditory events
is usually followed by visual localization in areas like the superior colliculus (Stein & Meredith,
1993; Stein, Wallace, & Meredith, 1995). McDonald, Teder-Salejarvi & Hillyard (2000) showed
that a sudden sound can improve the detection of a subsequent flash of light at the same location.
Abrupt sounds synchronized with visual search arrays can improve the identification of visual
targets embedded in a series of distracters (Vroomen & de Gelder, 2000). The evidence
presented thus far implies that auditory events can influence the processing of subsequent visual
events.
However, there has also been some evidence in favour of visual facilitation of reaction
time. Schmitt et al. (2000) found facilitated reaction times to both auditory and visual cues in an
experimental set-up where cue and target modalities were fixed within a block. Symmetric
audio-visual cueing effects have subsequently been reported in other studies (McDonald &
Ward, 1999; 2003). The famous McGurk effect also provides evidence for vision altering speech
perception (McGurk & MacDonald, 1976). For example, a sound of /ba/ is perceived as /da/
when it is coupled with a visual lip movement associated with /ga/. The McGurk effect shows
that sound can be misperceived when it is coupled with different visual lip movements. These
studies suggest that vision can also alter audition in some cases.
The lack of consensus about the direction of reaction time facilitation in response to
auditory and visual cues is further compounded by attempts to understand the exact nature of
cue-target processing in the brain. Some researchers argue that salient sensory information
contained in the cue is integrated with target information via separate, modality-specific sub-
systems (Bertelson & Tisseyre, 1969; Bushara et al., 1999; Cohen, Cohen, & Gifford, 2004;
Posner, Inhoff, Friedrich, & Cohen, 1987; Spence & Driver, 1997; Ward, 1994). Alternatively,
other scientists argue that synthesis of information from different sensory modalities is achieved
through a supramodal network that involves parts of the prefrontal cortex and parietal areas
(Andersen, 1995; Andersen, Snyder, Bradley, & Xing, 1997; Bedard, Massioui, Pillon, &
Nandrino, 1993; Downer, Crawley, Mikulis, & Davis, 2000; Eimer & Driver, 2001; Farah,
Wong, Monheit, & Morrow, 1989; Iacoboni, Woods, & Mazziotta, 1998; Laurens, Kiehl, &
Liddle, 2005; Snyder, Batista, & Andersen, 1997). Convergent theories, advocated by Macaluso
and others, suggest that while supramodal attentional networks may guide sensorimotor
integration, reverberating loops that link sensory-specific cortices to each other can also integrate
sensory information across modalities (Corbetta & Shulman, 2002; Ettlinger & Wilson, 1990;
Macaluso, 2006; Macaluso & Driver, 2005). Macaluso and colleagues have derived a convergent
26
model for visual and tactile modalities but this model has not been applied to audio-visual cue-
target processing.
In order to reconcile the discrepancies in behavioural crossmodal facilitation data with the brain
mechanisms responsible for multisensory processing, I designed a task that attempted
to capture audio-visual interactions between a cue and a target in an event-related functional
neuroimaging study. A stimulus-response compatibility paradigm, traditionally used to study
response selection and cue-target processing (Bertelson and Tisseyre, 1969; Rodway, 2005), was
modified to investigate crossmodal cue-target interactions. In its general form, the stimulus-
response compatibility paradigm includes a cue that signals a response rule for a lateralized
target. Response times are faster when responses are made to the same side as target presentation
(compatible responses) than when responses are made to the opposite side of target presentation
(incompatible responses). This robust behavioural finding is known as the stimulus-response
compatibility effect (Simon, 1969; Fitts & Seeger, 1953). The cues and targets in my experiment
were presented in auditory and visual modalities.
The aim of the first part of my study is to determine the neural correlates that underlie
reaction time facilitation in response to auditory and/or visual cues. Given the discrepancies in
behavioural findings, I am unsure about the direction of reaction time facilitation; however, I
hypothesize that auditory cues will facilitate reaction times to visual targets because of the
structure of auditory and visual neural pathways in the brain (see Chapter 1 for details).
3.2 Materials and Methods
3.2.1 Participants
Twenty-four (12 female) healthy, right-handed individuals between the ages of 19 and 35
(mean age 23.08 ± 3.87 years) were voluntarily recruited through undergraduate psychology
courses to partake in the study. All individuals were screened for any history of medical,
neurological, psychiatric, or substance abuse-related problems (see Appendices A and B for
screening forms) prior to their participation in the study. All participants signed an informed
consent form (see Appendix C) and were reimbursed $100.00 for the two sessions. The
experiment was conducted in the fMRI Suite at Baycrest upon approval from the Baycrest
Research Ethics Board.
3.2.2 Stimuli
Two types of auditory stimuli, matched in amplitude but differing in frequency (250 Hz
and 4000 Hz), were used in the experiment. Each participant adjusted the volume of the auditory
tones using the left and right buttons on the response box until the tones appeared perceptually
equal in loudness. The volume adjustment was conducted at the beginning of the experiment
with the scanner running so that participants could set the stimulus volume relative to the noise
produced by the scanner.
The visual stimuli used in the experiment were of two types and were matched for
luminance and contrast. The first visual stimulus was a black and white checkerboard pattern and
the second visual stimulus was also a black and white checkerboard pattern, but rotated at a 45
degree angle. Stimulus presentation was controlled and documented by Presentation software
(version 10.2, Neurobehavioural Systems Inc.).
3.2.3 Apparatus
Participants viewed visual stimuli on a translucent screen via a mirror that was mounted
on the head coil (used for acquiring images) in the MRI scanner. The total distance between the
participants’ eyes and the screen was approximately 52 inches. The image on the screen
measured 13.75 inches by 11 inches, with a resolution of 800 × 600 pixels at a 60 Hz refresh
rate. The field of view (FOV) was 12 degrees vertical and 15 degrees horizontal. The majority of
the participants wore their own contact lenses for vision correction. MR-safe glasses made by
SafeVision (Webster Groves, MO, USA), with a range of ±6 dioptres in increments of 0.5
dioptres, were provided to those participants who did not have prescription contact lenses.
Auditory stimuli were
presented using the Avotec Audio System (Jensen Beach, FL, USA). A button-press response
was made by participants with either the left or the right index finger using Fiber-Optic Response
Pad System developed by Current Designs Inc. (Philadelphia, PA, USA). The response pad
system had two paddles, one for each hand, and each paddle had four buttons. On the left paddle,
the active button was the rightmost of the four; on the right paddle, it was the leftmost. The other
three buttons on each paddle were taped to prevent responses, and participants rested their hands
gently on top of the taped buttons. The signal transmission delay of the paddles was less than
1 ms.
3.2.4 Procedure
3.2.4.1 Trial Structure
A trial had the following sequence: presentation of the first stimulus (S1) for 250 ms, a 4-
second inter-stimulus interval (ISI), presentation of the second stimulus (S2) for 250 ms, and a
response window of 1500 ms. The inter-trial interval (ITI) was jittered randomly among 3, 5, 7,
and 9 seconds. Response times were recorded by the Presentation software from the onset of the
second stimulus.
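The trial timing above can be sketched as a small event-onset generator. This is an illustrative reconstruction, not the actual Presentation script used in the study; the function and variable names are my own.

```python
import random

# Trial timing parameters from the design described above (seconds).
S1_DURATION = 0.25           # first stimulus (S1)
ISI = 4.0                    # inter-stimulus interval
S2_DURATION = 0.25           # second stimulus (S2)
RESPONSE_WINDOW = 1.5        # response window after S2
ITI_CHOICES = [3, 5, 7, 9]   # jittered inter-trial interval

def build_trial_onsets(n_trials, rng=random):
    """Return (s1_onset, s2_onset) times for n_trials, drawing the
    inter-trial interval at random from ITI_CHOICES after each trial."""
    onsets = []
    t = 0.0
    for _ in range(n_trials):
        s1 = t
        s2 = s1 + S1_DURATION + ISI  # S2 follows S1 onset by 4.25 s
        onsets.append((s1, s2))
        t = s2 + S2_DURATION + RESPONSE_WINDOW + rng.choice(ITI_CHOICES)
    return onsets
```

Jittering the ITI in this way decorrelates trial onsets from the slow BOLD response, which is what makes event-related estimation of cue- and target-locked activity possible.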
3.2.4.2 Task Types
The study consisted of two scanning sessions on different days. Each session was 1.5
hours in length. Two different ta