The Statistical Learning Of Musical Expectancy
by
Dominique Thy An Vuvan
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Psychology University of Toronto
© Copyright by Dominique T. Vuvan 2012
The Statistical Learning Of Musical Expectancy
Dominique Thy An Vuvan
Doctor of Philosophy
Department of Psychology
University of Toronto
2012
Abstract
This project investigated the statistical learning of musical expectancy. As a secondary goal, the
effects of the perceptual properties of tone set familiarity (Western vs. Bohlen-Pierce) and
textural complexity (melody vs. harmony) on the robustness of that learning process were
assessed. A series of five experiments was conducted, varying in terms of these perceptual
properties, the grammatical structure used to generate musical sequences, and the methods used
to measure musical expectancy. Results indicated that expectancies can indeed be developed
following statistical learning, particularly for materials composed from familiar tone sets.
Moreover, some expectancy effects were observed in the absence of the ability to successfully
discriminate between grammatical and ungrammatical items. The implications of these results for our current understanding of expectancy formation are discussed, as is the appropriateness of the behavioural methods used in this research.
Acknowledgments
It feels strange to see my name standing alone at the top of this document, because it surely feels
like there have been a hundred authors rather than one. My sincere and infinite thanks go out to
all my ghost co-authors. More specifically…
Thanks to Mark Schmuckler, my supervisor, for moulding me from a tiny naïve science duckling
into a big feisty science mallard. Thanks to my PhD committee – Elizabeth Johnson and Claude
Alain – whose support and invaluable feedback helped me cope with the emotional and
intellectual uncertainty of the research enterprise. Thanks to Jon Prince, my academic big
brother, whose career path trailblazing gives me hope that I too can achieve my science mallard
dreams. Thanks to Bryn Hughes for his help with my experimental stimuli, for all music
cognition experts are lost without a music theory expert buddy. Thanks to my examination
committee – Steve Joordens and Sandra Trehub – for reading my thesis and liking it! Thanks to
Barbara Tillmann, my external appraiser, for providing unique insight based on her considerable
experience with research in this area.
Thanks to Eddy Abraham, my erstwhile life adventure partner, for putting up with the highs and
lows of graduate school for the last six years. Thanks to my thesis writing support group –
Jessica Ellis, Michelle Hilscher, and Cara Tsang – whose never-ending cheerleading and advice
helped me to write faster than I’ve ever written before. Thanks to my parents, who raised a
person of sufficient intelligence and stubbornness to get through this process relatively
unscathed. Thanks to the staff of Moonbean Café, who quietly provided me with smiles,
caffeine, and workspace throughout the writing process. And thank you, dear reader, for being
interested in this work – especially if you’re not one of the individuals thanked above.
Table of Contents
Acknowledgments.......................................................................................................................... iii
Table of Contents ........................................................................................................................... iv
List of Tables ................................................................................................................................ vii
List of Figures .............................................................................................................................. viii
Chapter 1 - Connections Between Statistical Learning, Musical Structure, and Musical
Expectancy ..................................................................................................................................1
1 Introduction .................................................................................................................................1
2 General Methods .........................................................................................................................8
3 Research Goals ............................................................................................................................9
Chapter 2 - Experiment 1: Memory For Western Tone-Word Melodies .......................................11
4 Introduction ...............................................................................................................................11
5 Methods .....................................................................................................................................12
5.1 Participants .........................................................................................................................12
5.2 Apparatus ...........................................................................................................................13
5.3 Materials and Procedure ....................................................................................................13
6 Results .......................................................................................................................................17
6.1 Discrimination Phase .........................................................................................................17
6.2 Expectancy Phase...............................................................................................................18
7 Discussion .................................................................................................................................20
Chapter 3 - Experiment 2: Tonal Priming With Western Chords ..................................................22
8 Introduction ...............................................................................................................................22
9 Methods .....................................................................................................................................25
9.1 Participants .........................................................................................................................25
9.2 Apparatus ...........................................................................................................................25
9.3 Materials ............................................................................................................................26
9.4 Procedure ...........................................................................................................................30
10 Results .......................................................................................................................................32
10.1 Discrimination Phase .........................................................................................................32
10.2 Expectancy Phase...............................................................................................................33
11 Discussion .................................................................................................................................39
Chapter 4 - Experiment 3: Memory For Bohlen-Pierce Melodies .................................................42
12 Introduction ...............................................................................................................................42
13 Methods .....................................................................................................................................43
13.1 Participants .........................................................................................................................43
13.2 Apparatus ...........................................................................................................................44
13.3 Materials and Procedure ....................................................................................................44
14 Results .......................................................................................................................................48
14.1 Discrimination Phase .........................................................................................................48
14.2 Expectancy Phase...............................................................................................................48
15 Discussion .................................................................................................................................59
Chapter 5 - Experiment 4: Tonal Priming With Bohlen-Pierce Melodies .....................................63
16 Introduction ...............................................................................................................................63
17 Methods .....................................................................................................................................64
17.1 Participants .........................................................................................................................64
17.2 Apparatus ...........................................................................................................................64
17.3 Materials ............................................................................................................................64
17.4 Procedure ...........................................................................................................................65
18 Results .......................................................................................................................................68
18.1 Discrimination Phase .........................................................................................................68
18.2 Expectancy Phase...............................................................................................................69
19 Discussion .................................................................................................................................77
Chapter 6 - Experiment 5: Tonal Priming With Bohlen-Pierce Chords ........................................80
20 Introduction ...............................................................................................................................80
21 Methods .....................................................................................................................................80
21.1 Participants .........................................................................................................................80
21.2 Apparatus ...........................................................................................................................81
21.3 Materials ............................................................................................................................81
21.4 Procedure ...........................................................................................................................84
22 Results .......................................................................................................................................85
22.1 Discrimination Phase .........................................................................................................85
22.2 Expectancy Phase...............................................................................................................86
23 Discussion .................................................................................................................................87
Chapter 7 - General Discussion .....................................................................................................89
23.1 Examination of Research Goals .........................................................................................89
23.2 Refining the Proposed Model of Expectancy Learning .....................................................91
23.3 Improving Methodology ....................................................................................................93
23.4 Conclusions ........................................................................................................................94
References ......................................................................................................................................96
List of Tables
Table 1: Composition of Materials for Experiment 1
Table 2: Composition of Expectancy Trials for Experiment 2
Table 3: Composition of Materials for Experiment 3
Table 4: Examples of Grammatical and Ungrammatical Standards and Their Corresponding
Comparisons for Experiment 3
Table 5: Mean Areas Under Memory Operating Characteristic Curve for Experiment 3
Table 6: Mean Similarity Ratings for Experiment 3
Table 7: Composition of Materials for Experiment 5 (from Krumhansl, 1987)
List of Figures
For all figures:
1) Error bars are standard error of the condition mean.
2) * signifies p ≤ .05, ** signifies p ≤ .01, *** signifies p ≤ .001.
Figure 1: Examples of grammatical and ungrammatical standards and their corresponding
comparisons for Experiment 1.
Figure 2: Main effect of Trial Type on accuracy in Experiment 1.
Figure 3: Trial Type x Grammaticality interaction for accuracy in Experiment 1.
Figure 4: Main effect of Trial Type on d’ in Experiment 1.
Figure 5: Priming conditions and hypothesized priming strengths if participants develop
harmonic expectancies in Experiment 2.
Figure 6: Finite state grammars used in Experiment 2, based on Figure 1 from Jonaitis & Saffran
(2009). (a) Grammar A; (b) Grammar B.
Figure 7: Examples of items used in Experiment 2, based on Figures 2-3 from Jonaitis & Saffran
(2009). (a) Example of Grammar A exposure item. (b) Example of Grammar A correct
discrimination item. (c) Example of Grammar B correct discrimination item. (d) Example of
Grammar A error discrimination item. (e) Example of Grammar B error discrimination item.
Figure 8: Average similarity ratings for grammatical and ungrammatical items in the
discrimination phase of Experiment 2.
Figure 9: Western and novel priming effects in terms of accuracy (in-tune trials only) for
Experiment 2.
Figure 10: Tuning x Chord Order interaction (accuracy) for the training legal/Western illegal
condition for Experiment 2.
Figure 11: Effect of tuning on reaction time for Experiment 2.
Figure 12: Western and novel priming effects in terms of reaction time (in-tune trials only) for
Experiment 2.
Figure 13: (a) Bohlen-Pierce grammar from Experiment 3, based on Figure 2 from Loui et al.
(2008). (b) An example of a melody constructed from this grammar.
Figure 14: Main effects on memory operating characteristic in Experiment 3.
Figure 15: Grammaticality x Delay interaction for memory operating characteristic in
Experiment 3.
Figure 16: Delay x Comparison interaction for memory operating characteristic in Experiment 3.
Figure 17: Effect of melody familiarity on area under memory operating characteristic in
Experiment 3.
Figure 18: Main effects on similarity ratings for Experiment 3.
Figure 19: Grammaticality x Delay interaction for similarity ratings in Experiment 3.
Figure 20: Grammaticality x Trial Type interaction for similarity ratings in Experiment 3.
Figure 21: Delay x Trial Type interaction for similarity ratings in Experiment 3. # signifies p <
.10.
Figure 22: Grammaticality x Delay x Comparison Type interaction for similarity ratings in
Experiment 3.
Figure 23: Effect of melody familiarity on similarity ratings in Experiment 3.
Figure 24: Visual presentation of priming trials in Experiment 4, based on Figure 2 from
Tillmann & Poulin-Charronnat, 2010.
Figure 25: Effect of familiarization grammar on recognition performance in Experiment 4.
Figure 26: Effect of Grammaticality on accuracy in Experiment 4.
Figure 27: Grammaticality x Timbre interaction for accuracy in Experiment 4.
Figure 28: Grammaticality x Familiarization interaction for accuracy in Experiment 4.
Figure 29: Timbre x Familiarization interaction for accuracy in Experiment 4.
Figure 30: Main effects for reaction time in Experiment 4.
Figure 31: Grammaticality x Target Position interaction for RT in Experiment 4.
Figure 32: Grammaticality x Familiarization interaction for RT in Experiment 4.
Figure 33: Bohlen-Pierce chord grammars used in Experiment 5. (a) Grammar A; (b) Grammar
B. Composition of these chords is specified in Table 7.
Figure 34: Grammar x Correctness interaction for discrimination phase in Experiment 5.
Figure 35: Grammaticality x Familiarization interaction for accuracy in Experiment 5.
Chapter 1 - Connections Between Statistical Learning, Musical Structure, and Musical Expectancy
1 Introduction
Expectation, which can be simply defined as “the anticipation of upcoming information based on past and current information” (Schmuckler, 1997), is an essential tool for human survival. As
such, this ability to predict future events from previous experiences has been well studied by
cognitive psychologists, especially in the domain of music.
Music is uniquely well suited to the study of cognitive expectancy for two reasons. First, critical to expectancy is the processing of experience over time. Musical passages unfold sequentially, and thus their perception necessarily requires the integration of auditory information across time. Second, musical structure has not only been theoretically defined in
extraordinary detail (e.g., Laitz, 2008; Lerdahl & Jackendoff, 1983; Rameau, 1971; Schenker,
1954), but the tight match between these theoretical definitions and listener cognitions has also been confirmed (e.g., Jones, 1990; Krumhansl, 1990). This is especially true for Western tonal-
harmonic music, but fairly detailed work has been done cross-culturally (e.g., Castellano,
Bharucha, & Krumhansl, 1984; Kessler, Hansen, & Shepard, 1984; Krumhansl, et al., 2000) and
with artificial musical structures (Oram & Cuddy, 1995; Smith & Schmuckler, 2004) as well.
Because musical pieces unfold based on underlying structural rules, the cognitive representation
of these rules is imperative to expectancy. Therefore, music is particularly valuable as
experimental material because its well-defined structure is easy to control and manipulate.
Given these advantages, it is not surprising that the role of expectancy in musical processing has
been extensively explored across disciplines. Psychologists have investigated how expectancy
affects listener judgments of how well-formed a musical passage is (Cuddy & Lunney, 1995;
Krumhansl, 1995b; Schellenberg, 1996; Schmuckler, 1989), how quickly and accurately musical
information is encoded (Bharucha & Stoeckig, 1986, 1987; Bigand, Poulin, Tillmann, Madurell,
& D'Adamo, 2003; Marmel & Tillmann, 2009; Marmel, Tillmann, & Delbe, 2010; Marmel,
Tillmann, & Dowling, 2008; Tillmann, 2005; Tillmann, Bharucha, & Bigand, 2000; Tillmann,
Bigand, & Pineau, 1998; Tillmann, Janata, Birk, & Bharucha, 2003, 2008), the parameters of
musical production and performance (Carlsen, 1981; Schellenberg, 1996; Schmuckler, 1989;
Thompson, Cuddy, & Plaus, 1997; Unyk & Carlsen, 1987), and memory for musical passages
(Boltz, 1991, 1993; Schmuckler, 1997).
More recently, researchers have also begun using neuroscientific techniques to measure brain
responses to expectancy satisfaction and violation. For instance, Koelsch and colleagues
(Koelsch, Gunter, & Friederici, 2000; Maess, Koelsch, Gunter, & Friederici, 2001), using
electroencephalography and magnetoencephalography, have demonstrated that an early anterior
negativity (EAN), emanating from Broca’s area and its right hemisphere homologue, occurs
approximately 250 ms after an expectancy-violating musical event. Furthermore, the magnitude
of the EAN indexes the degree to which that event violates structural expectations.
The same questions that have captured the attention of psychologists have also been well-studied
by scholars in other disciplines. For instance, Huron (2006) has taken an interdisciplinary
approach to developing a theory of musical expectation that relies on research in music theory,
cognitive science, and evolutionary biology. With this theory, Huron argues that expectancy is
responsible for the various emotion states that can be evoked by musical events, and that this link
between expectancy and emotion developed through evolution to be biologically useful.
Working in the field of music theory, Narmour has developed an account of melodic expectancy
that explains how melodic structure leads to listener expectations for what will follow (Narmour,
1990, 1992). According to Narmour’s Implication-Realization model, these expectancies are
predicated on Gestalt laws of perception such as similarity, closure, and continuity, which are
argued to arise from biological constraints of the sensory system. Empirical study has confirmed
the explanatory utility of Narmour’s model (e.g., Krumhansl, 1995a; Krumhansl, 1995b),
although a simplified version of the model seems to maintain its predictive value (Schellenberg,
1996).
Margulis (2005), also a music theorist, takes a somewhat different approach. Her model of
musical expectation assigns expectancy ratings to melodic events based on the hierarchical
organization of three primary factors (stability, proximity, and direction) and one secondary
factor (mobility). Notably, Margulis’ model improves on Narmour’s by formalizing the roles of
tonality (i.e., stability) and emotion in musical expectation.
Other researchers have taken a computational approach, building information-theoretic models to
describe expectancy processing in music. For example, Pearce and Wiggins (2006) have
developed a model of melodic expectancy that combines the bottom-up elements present in
Narmour’s work (Narmour, 1990, 1992) and the top-down influence of tonality formalized in the
work of Margulis and others (Krumhansl, 1990; Margulis, 2005). This model was quite
successful at predicting a variety of results from previous experiments on musical expectancy,
including a study of expectancy in simple intervals (Cuddy & Lunney, 1995), a study of
continuation tones in melodic fragments (Schellenberg, 1996), and a study that evaluated
listeners’ expectancy note-by-note throughout a melody (Manzara, Witten, & James, 1992).
One aspect of musical expectancy that has thus far escaped scrutiny is the cognitive process by
which these expectancies are acquired. Participants in studies of musical expectancy are almost
exclusively adults with an entire lifetime of musical experience upon which their expectancies
rest. Thus, the central aim of this project was to elucidate how musical expectancies develop.
This goal can be broken down into three separable questions:
(1) Upon what is musical expectancy based?
(2) How do listeners learn the basis of musical expectancy?
(3) Is there evidence of expectancy processing following the putative expectancy
learning process?
Extensive work on Western music indicates that music perception relies most heavily upon
tonality, the hierarchy that dictates how all tones are organized around a central tonic, and how
groups of tones (chords) are organized around a tonic chord (see Krumhansl, 1990; 2000 for
review). Importantly, musical expectancies seem to be predicated on these perceptual structures.
For instance, Bharucha and Stoeckig (1986) found that listeners responded faster and more
accurately to expected chords than unexpected chords, and critically, that these expectancies
matched up very well with predictions from the harmonic hierarchy described previously
(Krumhansl, Bharucha, & Kessler, 1982). In another study, Schmuckler (1997) found that
recognition memory for melodies was positively correlated to listener ratings of expectancy, and
that expectancy ratings were in turn related to how well the melody in question adhered to the
rules of tonality.
Turning to the question of how listeners learn these tonal rules, some theorists have focused on
the idea that knowledge of these structures, and hence expectancy, is based upon innate
perceptual dispositions (e.g., Huron, 2006; Meyer, 1956; Narmour, 1990). For example,
Narmour’s Implication-Realization theory is predicated on the assumption of perception being
guided by Gestalt laws that are innate to the sensory system.
However, research on listeners of different musical abilities, ages, and different cultures
challenges the presumption that musical expectancies are innate by indicating that the perception
of tonality varies across individuals and groups. Both Cuddy and Badertscher (1987) and
Krumhansl and Shepard (1979) found that musicians have more sharply defined representations
of tonality. Work by several groups has shown that the representation of the musical structure of
one’s culture strengthens continuously from birth, through childhood, and into adulthood (Cuddy
& Badertscher, 1987; Trainor & Trehub, 1992, 1993; Trehub, Schellenberg, & Kamenetsky,
1999). For instance, Trainor and Trehub (1992) found that infants were able to detect melodic
changes regardless of tonal structure, whereas adults were better able to detect changes that violated tonal structure than changes that occurred within it. This result
demonstrated that tonal sequences were privileged in adult perception but not in infant
perception. Finally, many studies on cross-cultural tonality perception have shown that listeners
are more sensitive to the tonal structure of their home culture than to the tonal structure of other
cultures (Castellano, et al., 1984; Kessler, et al., 1984; Krumhansl, et al., 2000). For instance,
Castellano et al. (1984) had Western and North Indian listeners respond to North Indian musical
passages. These authors found that Western responses were dependent on the pitch information
present in the passage, whereas North Indian responses demonstrated effects of tonality
regardless of the pitches in the passage. Therefore, although there is merit in a theoretical approach that treats the tonal structures underlying expectancy as fully formed, it is also important to examine the learning processes that contribute to these expectancies.
Proponents of this learning view have focused on the role of statistical learning (sometimes called implicit learning) in the acquisition of mental representations of tonal structure. Statistical learning refers to the process whereby listeners extract the structural rules of an organized stimulus system during incidental exposure to rule-abiding exemplars; the process is generally implicit, proceeding without conscious effort. In music, this refers specifically to listeners’ ability to extract the rules of tonality from their experience listening to tonal music.
Tillmann, Bharucha, and Bigand (2000) have put forth MUSACT, a computational model of
implicit tonality learning that self-organizes based on exposure to tonal music. This connectionist
model stores the information from exposure in three types of nodes – tones, chords, and keys –
and displays learning by strengthening the connections between nodes that co-occur during
exposure and allowing these changes in connection strength to spread through the network of
nodes. This model has successfully accounted for a variety of empirical findings in the tonality
literature, including studies of tone relatedness, memory, and expectancy.
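The mechanism described here can be illustrated with a toy sketch of the two principles MUSACT embodies: strengthening connections between units that co-occur during exposure, and letting activation spread along those learned connections. The sketch below is illustrative only, not a reimplementation of MUSACT, and the node names are hypothetical.

```python
from collections import defaultdict

# Toy sketch: Hebbian-style strengthening of co-occurring nodes,
# followed by one step of spreading activation. Illustrative only,
# not a reimplementation of MUSACT; node names are hypothetical.
weights = defaultdict(float)          # (node_i, node_j) -> connection strength

def expose(active_nodes, rate=0.1):
    """Strengthen connections between all pairs of co-occurring nodes."""
    for i in active_nodes:
        for j in active_nodes:
            if i != j:
                weights[(i, j)] += rate

def spread(activation, nodes):
    """One step of spreading activation through the network."""
    return {j: activation.get(j, 0.0)
               + sum(weights[(i, j)] * a for i, a in activation.items())
            for j in nodes}

nodes = ["C-tone", "C-chord", "G-chord", "D-chord"]
for _ in range(10):                   # repeated co-occurrence during "exposure"
    expose(["C-tone", "C-chord"])

a = spread({"C-tone": 1.0}, nodes)
print(round(a["C-chord"], 1))  # 1.0 -- activation reaches the co-occurring node
print(a["D-chord"])            # 0.0 -- no learned connection, so no spread
```

In the sketch, sounding the tone node alone activates the chord node it repeatedly co-occurred with, which is the general sense in which exposure alone can come to produce expectancy-like behaviour.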
Inherent in these learning accounts is the assumption that listeners are in fact sensitive to
statistical cues present in music that would enable them to formulate their ensuing understanding
of tonal structure. Indeed, research with unfamiliar musical systems seems to confirm that
listeners perceive statistical cues in music, and based on those cues, are able to form novel
mental representations of structure. The two classes of statistical cues that have received the
most attention from music researchers are pitch distributions and predictive dependencies.
A pitch distribution is a set of statistics concerning the overall behaviour of tones in a musical
passage. Every musical system has a defined tone set containing all possible legal pitches. A
pitch distribution is defined by the relative frequency of occurrence of the different pitches in the
passage. In general, more important tones (like the tonic) occur more often than less important
tones. Evidence for listeners’ sensitivity to pitch distributions comes from cross-cultural studies
of tonality as well as studies employing artificial novel tonalities. For example, in the study of
North Indian music discussed above, Castellano et al. (1984) found that the responses of the
American listeners were in high accordance with the responses of the Indian listeners, despite
some key differences. These authors postulated that the musical passages contained distributional
cues to the North Indian tonal hierarchy, and Western listeners were able to extract that
information and respond accordingly. Similarly, in a study where pitch distributions were
experimentally manipulated, Oram and Cuddy (1995) found that listeners rated tones that had
occurred more frequently as fitting better with the musical context that tones that had occurred
less frequently. Moreover, Smith and Schmuckler (2004) presented listeners with musical
passages that exhibited different levels of pitch distributional information, and found that tonality
perception depends on meeting threshold levels of pitch differentiation and organization in these
passages.
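The distributional statistic described above reduces to a set of relative-frequency counts, as the following minimal sketch illustrates; the melody and tone names are hypothetical.

```python
from collections import Counter

def pitch_distribution(melody):
    """Relative frequency of occurrence of each pitch in a passage."""
    counts = Counter(melody)
    return {pitch: n / len(melody) for pitch, n in counts.items()}

# Hypothetical C-major fragment: the tonic C occurs most often,
# consistent with more important tones occurring more frequently.
melody = ["C", "E", "G", "C", "D", "C", "E", "C", "G", "C"]
dist = pitch_distribution(melody)
print(dist["C"])  # 0.5 -- the tonic dominates the distribution
```

A listener sensitive to this distribution could, in principle, infer the relative importance of C without any prior knowledge of the tonal system that generated the melody.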
Predictive dependency statistics, on the other hand, describe the local relations that govern
transitions from one musical unit to the next. For instance, forward transitional probabilities are
calculated from the conditional probability of any particular event, given the event that came
before. Saffran and her collaborators have studied the perception of transitional probabilities in
language extensively. In a pair of studies, Saffran and colleagues (Saffran, Aslin, & Newport,
1996; Saffran, Newport, & Aslin, 1996) demonstrated that both 8-month-old infants and adults
are sensitive to forward transitional probabilities. In these studies, an artificial grammar was
employed consisting of six trisyllabic words. These artificial words were strung together
randomly into a continuous stream that listeners heard during an exposure period. Critically,
although word boundaries could not be detected by using acoustic cues, they could be computed
based on differential predictive dependencies, with the syllables within words exhibiting high
transitional probabilities and the syllables between words exhibiting low transitional
probabilities. Following exposure, both infants and adults were able to discriminate words from
non-words even though they had never before heard these words in isolation, thus confirming
listeners’ perceptual sensitivity to these transitional cues.
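The segmentation cue described above can be made concrete with a short sketch that computes forward transitional probabilities over a toy syllable stream; the "words" and syllables are hypothetical stand-ins for the Saffran et al. materials.

```python
from collections import Counter

def transitional_probabilities(stream):
    """Forward TP: P(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Two hypothetical trisyllabic "words" concatenated in an arbitrary order:
# transitions within a word are deterministic; transitions between words are not.
words = [["ba", "di", "ku"], ["go", "la", "tu"]]
order = [0, 1, 0, 0, 1, 1, 0, 1]
stream = [syllable for i in order for syllable in words[i]]

tps = transitional_probabilities(stream)
print(tps[("ba", "di")])  # 1.0  (within-word: "di" always follows "ba")
print(tps[("ku", "go")])  # 0.75 (between-word: the boundary is less predictable)
```

The dip in transitional probability at "ku"-"go" is exactly the statistical cue that allows word boundaries to be located in the absence of acoustic cues.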
Importantly, this sensitivity to transitional cues has been tested with music-like materials as well
(Saffran, 2003a, 2003b; Saffran & Griepentrog, 2001; Saffran, Johnson, Aslin, & Newport,
1999; Saffran, Reeck, Niebuhr, & Wilson, 2005). Similar to the linguistic materials from Saffran
et al. (Saffran, Aslin, et al., 1996; Saffran, Newport, et al., 1996), the artificial languages used for
these studies combined a discrete set of tones into tone-word groups containing three tones each.
These tone-words were then concatenated randomly into a continuous stream, with the cues to
their boundaries being the transitional probabilities between tones. Following exposure to the
tone stream, participants were able to discriminate between tone-words and non-words. These
results provide evidence that listeners’ sensitivity to adjacent dependencies extends to music-like
materials.
In another study exploring predictive dependencies in music, Jonaitis and Saffran (2009)
examined the possibility that listeners could use statistical information to learn the rules of a
harmonic system, which governs how chords (groups of simultaneously sounded notes) are
combined. These authors employed a finite-state grammar which combined familiar Western
chords in novel ways. After familiarization with 100 grammatical chord sequences, listeners
were able to distinguish grammatical chord sequences from ungrammatical ones, and following
an additional familiarization session, they were also sensitive to more subtle violations of
grammaticality within generally grammatical items. Thus, listeners are sensitive to structural
statistics that indicate novel combinations of both tones and chords.
Extending these results from items that are produced by combining familiar Western tones in
novel ways, Loui and her colleagues (Loui & Wessel, 2008; Loui, Wessel, & Kam, 2010) studied
the statistical learning of adjacent dependencies using an artificial grammar composed from
Bohlen-Pierce tones. The Bohlen-Pierce scale is a microtonal scale consisting of tones whose
frequencies are interrelated by a logarithmic factor of 3, whereas the chromatic tones of Western
music use a factor of 2. This produces a completely novel tone set with which listeners have no
experience. These authors found that following familiarization with 400 Bohlen-Pierce melodies,
listeners were able to distinguish familiarization melodies from ungrammatical melodies, and
also could generalize their knowledge to distinguish novel grammatical melodies from
ungrammatical melodies.
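The two tuning systems can be compared numerically. The sketch below assumes the standard equal-tempered form of the Bohlen-Pierce scale, which divides the 3:1 "tritave" into 13 equal steps (the step count is an assumption, not stated in the text), alongside the familiar 12-step division of the 2:1 octave:

```python
def equal_tempered(f0, ratio, steps, n):
    """Frequency of the nth step above f0 when `ratio` is divided into
    `steps` equal logarithmic steps."""
    return f0 * ratio ** (n / steps)

# Western equal temperament: the 2:1 octave in 12 steps.
western = [equal_tempered(220.0, 2, 12, n) for n in range(13)]

# Equal-tempered Bohlen-Pierce: the 3:1 "tritave" in 13 steps
# (13 is the standard step count for this scale; an assumption here).
bohlen_pierce = [equal_tempered(220.0, 3, 13, n) for n in range(14)]

print(round(western[12], 2))        # 440.0, one octave above 220 Hz
print(round(bohlen_pierce[13], 2))  # 660.0, one tritave above 220 Hz
```

Because almost none of the resulting Bohlen-Pierce frequencies coincide with Western chromatic frequencies, the tone set sounds entirely novel to Western-enculturated listeners.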
There is also evidence that listeners are sensitive to non-adjacent dependencies in music. These
transitional probabilities are calculated from the conditional probability of any particular note,
given the note that came two positions earlier. Listeners seem to be able to learn these
dependencies, but only if there is a perceptual or cognitive cue that helps to organize the stimuli
into units. For example, Creel, Newport, and Aslin (2004) found that listeners were only able to
learn non-adjacent relations among tones when a perceptual grouping cue was present. Thus,
when non-adjacent tone words were played in the same pitch register or with the same timbre
during familiarization, listeners were able to distinguish these grammatical tone words from
ungrammatical ones at test. Additionally, Endress (2010) demonstrated that tonality can be used
as a grouping cue to learn non-adjacent dependencies, with listeners learning non-adjacent
melodic units when they were tonal but not when they were atonal.
All in all, these studies suggest that listeners are sensitive to statistical information presented in
musical passages, and that they are able to extract information regarding musical structure from
these statistical cues. Turning to the third question posed above, some authors have suggested
that this type of finding is evidence that statistical learning can give rise to musical expectancies
(e.g., Jonaitis & Saffran, 2009). However, expectancy is best understood as the way that the
structural knowledge learned from statistical cues “is internalized in the form of representations
that influence subsequent processing” (Bharucha & Stoeckig, 1986, p. 403). Thus, a true test of
musical expectancy would demonstrate that these statistically-learned structural representations
have downstream processing effects. In this light, simply demonstrating that listeners
discriminate between grammatical and ungrammatical items, without testing the consequences of
this discrimination, is very weak evidence for the statistical learning of expectancies.
This question has been previously studied in language by Graf Estes, Evans, Alibali, and Saffran
(2007). In their study, these authors presented infants with artificial word streams similar to
those used by Saffran et al. (Saffran, Aslin, et al., 1996; Saffran, Newport, et al., 1996).
Following this exposure, infants were separated into groups for an object label-learning task.
During this task, the “word” group was habituated to label-object pairs in which the label was a
word from exposure, whereas the “non-word” and “part-word” groups were habituated to label-
object pairs in which the label was a non-word or part-word, respectively. Graf Estes et al.
(2007) found that only the word group exhibited label learning, which indicated that words that
were segmented from the exposure were being treated as candidate words for linking to meaning.
Thus, language structure learning led to downstream processing effects based on that structure.
To date, only one study has explored this issue in music. Tillmann and Poulin-Charronnat (2010)
trained listeners on melodies produced from a finite-state grammar with Western chromatic tones
as nodes. Following familiarization, listeners encountered novel melodies based on the
familiarization grammar; each melody contained a target note that was in- or out-of-tune.
Critically, this target note either did or did not conform to the familiarization grammar. Listeners
were asked to respond as quickly and accurately as possible with regard to whether the target
was in- or out-of-tune. For in-tune targets, these authors found that listeners responded faster and
more accurately to grammatical than ungrammatical targets. This result replicated previous
studies of melodic priming in Western music, wherein listeners respond more quickly to
expected than unexpected targets (e.g., Marmel, et al., 2008). Therefore, this experiment
successfully provided evidence that structural information that listeners learn from statistical
cues can lead to downstream expectancy effects.
2 General Methods
The experiments in each chapter all share a common structure. Each experiment has three
phases: familiarization, discrimination, and expectancy. The general methodological features of
each phase will be explained in this section, and more detailed descriptions will be included in
the methods section for each chapter.
In the familiarization phase, listeners were exposed to a large corpus of materials with a novel
grammatical structure. These materials varied in terms of their complexity and their familiarity.
Complexity refers to whether the materials were composed of melodies (one sound event at a
time) or chords (multiple sound events at a time). Familiarity refers to whether the materials
were composed from a tone set that is familiar (i.e., Western chromatic set) or unfamiliar (i.e.,
Bohlen-Pierce scale).
The discrimination phase consists of an attempt at replication of past work that has shown that
following the familiarization phase, listeners are able to discriminate between items that are
grammatical and ungrammatical in the novel system. Thus, in this phase, listeners were asked to
discriminate between grammatical and ungrammatical items.
The expectancy phase was the critical part of these experiments. In this phase, musical
expectancies arising from the familiarization phase were measured using either a recognition
memory task or a musical priming task. It was hypothesized that musical expectancy would be
manifested in improved memory performance and musical priming effects for grammatical over
ungrammatical items.
3 Research Goals
The current series of experiments was undertaken with two goals in mind. The primary goal was
to demonstrate that listeners are able to learn musical expectancies through statistical learning.
To that end, listeners were familiarized with materials from several published statistical learning
paradigms (Experiments 1-4) as well as one set of novel materials (Experiment 5). Following
this, attempts were made to measure musical expectancies that may have arisen during
familiarization. This was essentially an attempt to conceptually replicate Tillmann and Poulin-
Charronnat’s (2010) results while varying the materials used and the methodologies employed to
measure expectancy.
The secondary goal of this research was to explore whether the ease of learning musical
expectancies depends on certain properties of the familiarization materials. Thus, the
familiarization materials varied, as described above, according to their complexity and their
familiarity.
This project will therefore establish a critical mass of evidence for the idea that musical
expectancies are developed, at least in part, through listeners’ incidental exposure to statistical
cues in the environment that give cues to musical structure. Moreover, these experiments will
provide an exploration of the limits to this statistical learning with respect to the perceptual
properties of the materials encountered.
Chapter 2 Experiment 1: Memory For Western Tone-Word Melodies
4 Introduction
This experiment aimed to establish that statistical learning processes give rise to musical
expectancies. The materials employed in this experiment were inspired by Saffran and her
colleagues, who have used artificial tone languages extensively in their work (Saffran, 2003a,
2003b; Saffran & Griepentrog, 2001; Saffran, et al., 1999; Saffran, et al., 2005). Saffran’s studies
have shown that after exposure to tone streams constructed of tone words that are defined by
transitional probabilities, participants are able to discriminate between grammatical and
ungrammatical tone-words. Thus, listeners are sensitive to transitional probability information
during music listening. This result has been replicated extensively in adults and infants (Saffran,
et al., 1999), and with materials containing absolute and relative pitch cues (e.g., Saffran &
Griepentrog, 2001). However, Trehub (2003) raised some concerns regarding Saffran’s (2003a)
materials. Specifically, some musical characteristics of the tone-words were not controlled,
which meant that participants were possibly using perceptual, rather than statistical, cues to do
the discrimination task. An additional concern was that expectancies regarding a complex
grammar would be difficult to apprehend during short-term familiarization in the laboratory.
Thus, the familiarization and expectancy phases were designed as replications of Saffran et al.
(1999), but with a set of tone-words in which some critical musical-perceptual characteristics
were controlled, and with the grammatical structure governing the tone-words somewhat
simplified.
In order to measure expectancy, inspiration was taken from the memory literature. Dowling and
his colleagues have studied melody memory comprehensively using a recognition memory task
(Bartlett & Dowling, 1980; Dowling, 1978, 1991; Dowling & Bartlett, 1981; Dowling &
Fujitani, 1971; Dowling, Kwak, & Andrews, 1995). In a typical study, participants hear two
melodies sequentially (called the standard and the comparison, respectively) and must indicate
whether the comparison was the same as the standard. This research has revealed a panoply of
factors that influence participants’ performance on this task, including the delay between
melodies, and the contour and interval content of the comparison. Importantly, research has
demonstrated that tonality also has predictable effects on melody memory. In general, well-
structured tonal melodies are remembered better than those not conforming to the rules of tonal
structure (Cuddy, Cohen, & Mewhort, 1981; Cuddy, Cohen, & Miller, 1979; Dewar, Cuddy, &
Mewhort, 1977; Dowling, 1978, 1991; Frances, 1972).
If familiarization on tone-word languages leads to expectancies about tone-word structure,
perhaps these expectancies can be used to cognitively organize tones into tone-word chunks.
Classic memory research has shown that chunking helps increase processing efficiency and thus
boosts memory performance (Miller, 1956). In this case, melodies that were composed of these
expected tone-word chunks might be remembered differently than randomly-composed
melodies, in the same way that tonal melodies are remembered differently from random ones.
Finally, consideration should be given to the familiarity and complexity of these experimental
materials, and how these characteristics might have influenced the ease of expectancy learning in
this experiment. These stimuli were composed of Western chromatic tones organized into a
melodic texture, meaning that they were built from familiar-sounding parts and simple in
construction. One can make two divergent predictions here. On the one hand, it is possible that it
would be easy to learn structural expectancies with these materials because the processing
demands of familiar, simple stimuli are low. On the other hand, it is possible that expectancy
learning would be impeded for two reasons. First, the use of familiar musical building-blocks
might have led to the evocation of well-trained Western musical expectancies, thus making it
very difficult to learn novel expectancies in their place. Second, melodic stimuli may be too
simple in texture to provide enough information for the development of music-structural
expectancies.
5 Methods
5.1 Participants
Thirty-two participants were recruited from the University of Toronto Scarborough community
using the introductory psychology participant pool as well as posted advertisements. Participants
were compensated with course credit or $10 per hour.
Participants consisted of 8 males and 24 females with a mean age of 20.2 years (SD = 2.4 years).
Data were lost for 8 participants due to experimenter error; the following information
was gathered from the remaining 24 participants. Participants were not pre-selected for musical
experience, but had on average 5.1 years of formal musical training (SD = 4.4 years), 1.5 years of
musical theory training (SD = 2.6 years), played music for 1.1 hours per week (SD = 1.9 hours),
and listened to music for 10.1 hours per week (SD = 10.6 hours). No participants reported having
taken part in a music psychology experiment previously, nor did any participant report having
absolute pitch.
5.2 Apparatus
All three phases of the experiment were presented to participants using an Intel Pentium 4
personal computer, with code written and run in MATLAB 7.0. This experiment was realised
using Cogent 2000 developed at University College London by the Cogent 2000 team at the
Functional Imaging Laboratory and the Institute of Cognitive Neuroscience, and Cogent
Graphics developed by John Romaya at the Laboratory of Neurobiology at the Wellcome
Department of Imaging Neuroscience. The experiment interface was viewed on an LG Flatron
L1710S monitor, while the auditory components of the experiment were heard through a pair of
Sennheiser HD 280 pro headphones connected to a Creative Sound Blaster Audigy 2 ZS
soundcard, at a volume comfortable for the participant. Participant responses were collected
using the computer keyboard.
5.3 Materials and Procedure
Participants were randomly assigned to the discrimination group or the expectancy group. Both
groups took part in the familiarization phase. Only the discrimination group participated in the
discrimination phase, and only the expectancy group participated in the expectancy phase.
Familiarization phase:
Two counterbalanced languages were created for the familiarization phase, similar to the
materials used by Saffran et al. (1999, see Table 1). All participants were randomly assigned to
either Language A or Language B. Both languages were composed from the 12 tones of the
Western chromatic set, synthesized in Audacity 1.3 as 0.33 second-long sine waves with 0.03
second onset and offset ramps. For each language, the tones were combined to make four tone-
words containing three tones each. Within each language, the transitional probability between
any two tones within a tone-word was 0.33, and the transitional probability between any two
tones between tone-words was 0. Additionally, none of the transitions within tone-words in one
language ever occurred in the other language, such that a tone-word in Language A was a non-
word in Language B and vice versa.
For the familiarization phase, the 4 tone-words of each language were concatenated randomly
and continuously into a five-minute block, with two constraints: that each of the four tone-words
were heard with equal probability, and that there were never back-to-back repeats of a tone-
word. This five-minute block thus contained 75 instances of each tone-word.
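The constrained concatenation described above can be sketched as follows; the retry scheme and the three-tone words shown are illustrative choices, not the original stimulus-generation code:

```python
import random

def familiarization_stream(words, n_repeats, rng=random):
    """Concatenate tone-words in random order so that each word occurs
    exactly n_repeats times and no word is repeated back-to-back."""
    while True:  # retry on the (rare) dead end where only the last word remains
        counts = {w: n_repeats for w in words}
        order, last = [], None
        while any(counts.values()):
            choices = [w for w in words if counts[w] > 0 and w != last]
            if not choices:
                break  # dead end; rebuild from scratch
            last = rng.choice(choices)
            counts[last] -= 1
            order.append(last)
        if len(order) == len(words) * n_repeats:
            return [tone for word in order for tone in word]

# Hypothetical three-tone words (placeholders, not the actual stimuli).
words = [("C#", "A", "F"), ("A#", "D", "E"), ("D#", "G", "B"), ("G#", "F#", "C")]
stream = familiarization_stream(words, 75)
print(len(stream))  # 900 tones: 4 words x 75 repeats x 3 tones/word
```

At 0.33 s per tone, 900 tones yield roughly 297 s of audio, matching the five-minute block described above.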
During this phase, participants heard the five-minute familiarization block three times, for a total
of 15 minutes of exposure to the tone-word stream. They were told that they would hear a series
of continuous tones, and instructed to listen carefully because they would be tested following the
listening session. They were not informed about which aspects of the tones would be tested.
Discrimination phase:
The four tone-words from each language were exhaustively paired with one another to create 16
discrimination trials. On each trial, participants heard a tone-word from one language, followed
by a 0.75 second pause, and then the tone-word from the other language. Whether the Language
A tone-word or the Language B tone-word would be heard first was counterbalanced across the
trials. Correct responses for Language A participants were incorrect for Language B participants,
and vice versa. Participants were informed that they would hear two tone sequences separated by
silence and instructed to indicate whether the first or second tone sequence was more familiar,
based on the familiarization phase. The task therefore required a subjective judgment of
familiarity that depended on discriminating between a tone-word from the familiarization
language and a non-word from the other language. Following the discrimination trials,
participants completed a survey regarding their musical experience. The entire experimental
session for the discrimination group lasted approximately 30 minutes.
Expectancy phase:
Nine-note melodies were composed for each language by exhaustively joining the four tone-
words into all possible permutations of three tone-words. This process yielded 24 melodies for
each language. Because the tone-words were originally composed to control for interval
(distance between tones) and contour (pattern of ups and downs) content between Language A
and Language B melodies, these two factors could therefore not be used to discriminate the
melodies and tone-words of Language A from those of Language B. As a result of familiarization, Language B
melodies were expected to sound ungrammatical to Language A participants, and vice versa.
Memory trials were composed following Dowling’s (1978) study. For each of the 48 standard
melodies (half Language A, half Language B), three comparison melodies were constructed (see
Figure 1). In the Match condition, the comparison melody was identical to the standard melody.
In the Same Contour condition, the comparison started on the same note and followed the same
pattern of ups and downs as the standard. However, notes 2 to 9 were chosen randomly from the
Western chromatic tone set (with the contour restriction) such that the intervals between notes
were altered. In the Random condition, the comparison melody started on the same note as the
standard and notes 2 to 9 were chosen randomly from the chromatic set (without a contour
restriction), creating a melody that differed from the standard in contour and interval content.
According to Dowling’s (1978) study, memory performance should be best for random trials, as
they are highly discriminable from standards. Moreover, if the familiarization phase leads to the
development of melodic expectancies, one should also predict better memory performance for
trials containing standards constructed with tone-words from the participant’s familiarization
language. The order of the 144 expectancy trials (48 Language A and Language B standards x 3
comparisons) was randomized for presentation to participants.
On each trial, participants heard two melodies separated by a 2 second pause. They were
informed that they would hear one melody followed by another melody, and asked to indicate
whether the second melody was the same as the first. Following the expectancy trials,
participants completed a survey regarding their musical experience. The entire experimental
session for the expectancy group lasted approximately one hour.
Figure 1. Examples of grammatical and ungrammatical standards and their corresponding
comparisons for Experiment 1.
Table 1
Composition of Materials for Experiment 1

Tone  Frequency (Hz)
C     261.63
C#    277.18
D     293.66
D#    311.13
E     329.63
F     349.23
F#    369.99
G     392.00
G#    415.30
A     440.00
A#    466.16
B     493.88

Language A tone-words: C#-A-F, A#-D-E, D#-G-B, G#-F#-C
Language B tone-words: D-A-G, G#-D#-E, C-F#-B, A#-F-C#
6 Results
6.1 Discrimination Phase
For each participant in the discrimination group, the number of correct discriminations out of 16
trials was tabulated. These data were submitted to a one-sample t-test with µ = 8 (chance
performance). Participants as a group performed significantly above chance, t(15) = 4.74, p <
.001, with mean number correct = 11.25 ± 0.66. Additionally, there was no significant difference
between participants trained on Language A vs. B, t(14) = 0.18, p = .86.
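A sketch of this chance-level test, using fabricated scores rather than the participants' data:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, mu):
    """t statistic for H0: the population mean equals mu."""
    n = len(scores)
    return (mean(scores) - mu) / (stdev(scores) / sqrt(n))

# Fabricated illustrative scores out of 16 (chance = 8), not the actual data.
scores = [12, 10, 13, 11, 9, 12, 14, 10, 11, 12, 13, 10, 11, 12, 9, 11]
print(round(one_sample_t(scores, mu=8), 2))  # 9.04 for this fake sample
```

With 16 participants the test has 15 degrees of freedom, as in the t(15) reported above.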
6.2 Expectancy Phase
Responses from the melody memory task were analyzed with respect to accuracy (percent
correct) and d’. Because the 24 standards for each language included every possible permutation
of the tone-words in each language, no systematic differences were expected between standards.
Therefore, responses were collapsed across the 24 standards in each condition for all analyses.
Accuracy data were submitted to repeated measures ANOVA with Trial Type (Match, Same
Contour, Random) and Grammaticality (Grammatical, Ungrammatical) as factors. There was a main effect
of Trial Type, F(2,30) = 4.38, MSE = 0.03, p = .02, ηp² = .23; planned comparisons showed that
this was due to higher accuracy for random trials over match trials, t(15) = 2.77, p = .01, and
same contour trials, t(15) = 3.93, p = .001. There was no difference in accuracy for match and
same contour trials, t(15) = 0.71, p = .49 (Figure 2).
Figure 2. Main effect of Trial Type on accuracy in Experiment 1.
The main effect of Grammaticality was non-significant, F(1,15) = 0.10, p = .76. The interaction
between Trial Type and Grammaticality was significant, F(2,30) = 6.49, MSE = 0.01, p < .01, ηp²
= .30 (Figure 3). To probe this interaction, unplanned simple effects analyses were conducted for
each trial type at a Bonferroni-corrected α = .02. For match trials, responses to ungrammatical
melodies were more accurate than to grammatical ones, t(15) = 2.60, p = .02. There was no
difference in accuracy between grammatical and ungrammatical melodies for same contour or
random trials, all t < 2.23, all p > .04.
Figure 3. Trial Type x Grammaticality interaction for accuracy in Experiment 1.
d’ data were submitted to repeated measures ANOVA with Trial Type (Match vs. Same Contour,
Match vs. Random) and Grammaticality (Grammatical, Ungrammatical) as factors. There was a
main effect of Trial Type, F(1,15) = 16.71, MSE = 2.46, p = .001, ηp² = .53, with random
comparisons more distinguishable from matches than same contour comparisons (Figure 4). The
main effect of Grammaticality was not significant, F(1,15) = 0.01, p = .93, nor was the Trial
Type x Grammaticality interaction, F(1,15) = 0.05, p = .82.
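Since d’ is used without definition here, a sketch of the standard signal-detection computation may help; the extreme-rate correction and the counts below are assumptions, not necessarily the thesis's analysis pipeline:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).
    The +0.5 / +1 ("log-linear") correction keeps rates off 0 and 1;
    it is one common choice, not necessarily the one used here."""
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: "same" responses to Match comparisons are hits,
# "same" responses to changed comparisons are false alarms.
print(round(d_prime(hits=40, misses=8, false_alarms=10, correct_rejections=38), 2))  # 1.73
```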
Figure 4. Main effect of Trial Type on d’ in Experiment 1.
7 Discussion
Participants’ successful discrimination performance replicated Saffran et al.’s (1999) results, and
demonstrated that participants were able to distinguish grammatical from ungrammatical items
following the familiarization period. Results from the expectancy group were not as clear cut.
The effects of trial type on accuracy and d’ were consistent with Dowling’s research (e.g., 1978).
However, evidence that grammatical melodies (composed from familiarization tone-words) led
to better memory performance was quite weak, with no effect of grammaticality on either
accuracy or d’. In fact, the interaction between Trial Type and Grammaticality for accuracy was
actually driven by worse accuracy for grammatical than ungrammatical melodies on match trials.
Thus, this experiment has provided little evidence of musical expectancies arising from statistical
learning.
Why might this experiment have failed to successfully induce musical expectancies? As
discussed, the chromatic melodies used may have been too simple and/or familiar. However, the
fact that memory performance was not at ceiling argues against this idea. Another possibility
is that memory based upon melodic chunking may not have been the right way to study the
statistical learning of expectancies in music, for two reasons. First, the use of memory
performance as a proxy for measuring expectancy, based on previous work suggesting that
highly expected melodies are also better remembered (Schmuckler, 1997), meant that
expectancies were being measured indirectly. A stronger test of expectancy learning would use a
task that directly taps expectancy generation, such as a priming paradigm.
Second, the conceptualization of melody memory in terms of the encoding of short melodic
fragments does not line up with current theoretical understanding of how melodies are encoded
in memory (i.e., Dowling, 1991). Rather, a more intuitive and musically-relevant context in
which to examine these processes may be in the learning of harmonic progressions. The adjacent
dependencies between chords are very important to expectancy, as they govern both harmonic
and melodic movement within a piece. Chords in particular are also more texturally complex
(with more than one note sounded at once), and thus provide a richer source of auditory
information upon which to scaffold expectancy, due to the interaction of simultaneously sounded
notes. Therefore, in Experiment 2, novel chord progressions were employed in a priming task in
an attempt to record the development of musical expectancies from statistical learning.
Chapter 3 Experiment 2: Tonal Priming With Western Chords
8 Introduction
In Experiment 2, attention was turned to the learning of expectancies in novel harmonic
progressions. Harmony is the highly constrained system by which tones are combined to form
chords and chords are combined in sequential order. Listener expectations play an important role
in theoretical accounts of harmony, and many music theorists have observed that the harmonic
function of any chord is to indicate which chord comes next (Meyer, 1956; Schenker, 1954). The
order in which chords are sounded is thus critical to harmonic expectancy. Statistical learning is
an ideal mechanism by which these sequential relations could be learned, with listeners
developing sensitivities to the conditional probabilities of one chord occurring given the previous
occurrence of another chord as a result of exposure to a large grammatical musical corpus. The
MUSACT model (Tillmann, et al., 2000) is an example of how such a learning process might act
to produce expectancies in music. As discussed in Chapter 1, this model employs a network of
tones, chords, and keys in which connections between nodes are strengthened by the co-
occurrence (both simultaneous and serial) of node elements during exposure.
Jonaitis and Saffran (2009) provided some evidence that listeners’ harmonic expectancies may
arise through statistical learning processes. These authors developed a novel harmonic grammar
based on the Phrygian mode. This grammar utilized the chromatic tone set and chords of
Western music, but combined them in an unfamiliar way. Participants were first trained on 100
grammatical exemplars and then asked to make judgments on new items in terms of how similar
to familiarization items they sounded. These new items were manipulated according to whether
they were grammatical or ungrammatical, and whether they were completely correct or
contained minor grammatical errors. After one familiarization session, participants were able to
distinguish between grammatical and ungrammatical sequences. After two familiarization
sessions, participants were also able to distinguish between the correct and error-containing
grammatical items. Jonaitis and Saffran (2009) thus concluded that participants were able to
learn novel harmonic expectancies through statistical learning.
Although discrimination between grammatical and ungrammatical items indicates
comprehension of the grammar structure that produced the familiarization items, it is weak
evidence for the development of musical expectancy. As discussed previously, expectancy is
better understood as the way that this structure affects subsequent processing. Thus, to test
whether statistical learning leads to the development of musical expectancy, this experiment used
the methods from a classic musical priming study by Bharucha and Stoeckig (1986). These
authors presented pairs of chords in sequential order, and asked listeners to make a perceptual
judgment on the second chord. They found that judgments were faster and more accurate when
the two chords were related (i.e., the second chord was expected given the first) than when they
were unrelated (i.e., the second chord was unexpected given the first).
The musical priming effects observed with chord pairs by Bharucha and Stoeckig (1986) have
been extended to longer musical contexts and more subtle manipulations of
tonal expectancy (Bigand & Pineau, 1997), as well as to melodic stimuli (Marmel & Tillmann,
2009; Marmel, et al., 2010). Research has confirmed that these priming effects arise
predominantly from expectancies based on tonal relatedness, and not from simple
psychoacoustic similarity (Bharucha & Stoeckig, 1987; Bigand, et al., 2003; Tekman &
Bharucha, 1998). This priming paradigm therefore provides an ideal methodology with which
musical expectations can be quantified.
The familiarization and discrimination phases of Experiment 2 were full replications of Jonaitis
& Saffran’s (2009) Experiment 1. In the expectancy phase, participants were presented with
pairs of chords from the familiarization grammar and asked to make a perceptual judgment
(timbre A vs. B) on the second chord. If participants developed expectancies from
familiarization, they should have responded more quickly and accurately when the chord pair
sequences were grammatical than when they were ungrammatical.
Finally, consideration should be given to the familiarity and complexity of these experimental
materials, and how these characteristics might influence the ease of expectancy learning in this
experiment. These stimuli were comprised of Western chromatic piano tones organized into a
harmonic texture, meaning that they were built from familiar-sounding parts and complex in
construction. As in Experiment 1, it is possible to make two divergent predictions here. First, it
may be easy to learn structural expectancies with these materials because the processing
demands for familiar chords (although they are combined in a novel way) are low, which allows
cognitive resources to be spent on learning new connections among chords rather than on
perceiving the chords themselves. Conversely, the highly familiar chords may lead to inferior
expectancy learning because they provide a high level of context that may evoke participants’
knowledge of Western harmonic structure. Thus, despite the novel grammatical structure, these
chords may activate extremely strong musical expectancies that have been learned over a
lifetime of exposure to Western music, making it very difficult for participants to develop new
expectancies during a short laboratory session.
In fact, it is almost certain that activation of Western harmonic expectancies could not be
avoided. Because of this concern, the expectancy task conditions were designed to test all
possible combinations of grammaticality in Western harmony vs. in the familiarization grammar.
Thus, the chord pair in each trial was manipulated with regard to its legality in Western harmony
and its legality in the familiarization grammar, with different levels of priming expected for each
of these conditions. If participants did not learn harmonic expectancies during familiarization,
one would expect to observe priming effects only in trials that are legal in Western harmony.
Conversely, if participants did learn harmonic expectancies, and assuming that priming effects
are additive, the resultant priming levels are as follows, and are illustrated in Figure 5:
(1) Trials that are legal in both the familiarization grammar and Western harmony should
show the strongest priming (highest accuracy and lowest reaction times).
(2) Trials that are legal only in Western harmony should show strong priming.
(3) Trials that are legal only in the familiarization grammar should show moderate priming.
(4) Trials that are illegal in both the familiarization grammar and Western harmony should
not show priming effects.
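If the two sources of priming combine additively, the predicted ordering above follows from a simple two-factor model. A minimal sketch (the numeric weights are arbitrary illustrations, chosen only so that Western-harmonic priming outweighs familiarization-grammar priming, as the predictions above assume):

```python
# Hypothetical additive model of priming strength; the weights are
# illustrative assumptions, not fitted or measured values.
WESTERN_EFFECT = 2.0   # assumed contribution of Western-harmonic legality
FAMILIAR_EFFECT = 1.0  # assumed contribution of familiarization-grammar legality

def predicted_priming(western_legal: bool, familiar_legal: bool) -> float:
    """Total predicted priming under strict additivity of the two sources."""
    return WESTERN_EFFECT * western_legal + FAMILIAR_EFFECT * familiar_legal

conditions = {
    "strongest (legal Western, legal familiarization)": predicted_priming(True, True),
    "strong (legal Western only)":                      predicted_priming(True, False),
    "moderate (legal familiarization only)":            predicted_priming(False, True),
    "none (illegal in both)":                           predicted_priming(False, False),
}
# The additive model reproduces the predicted ordering (1) > (2) > (3) > (4).
```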
Figure 5. Priming conditions and hypothesized priming strengths if participants develop
harmonic expectancies in Experiment 2.
9 Methods
9.1 Participants
Fourteen participants were recruited from the University of Toronto Scarborough community
using the introductory psychology participant pool as well as posted advertisements. Participants
were compensated with course credit or $10 per hour.
Participants consisted of six males and eight females with a mean age of 20.0 years (SD = 3.1
years). Participants were selected to have at least three years of formal musical training, as non-
musician participants had trouble with the expectancy priming task during piloting. Participants
had on average 6.3 years of formal musical training (SD = 3.4 years), 0.91 years of musical
theory training (SD = 1.6 years), played music for 1.5 hours per week (SD = 1.7 hours), and
listened to music for 11.8 hours per week (SD = 7.0 hours). No participants reported having
taken part in a music psychology experiment previously, nor did any participant report having
absolute pitch.
9.2 Apparatus
This experiment used an identical apparatus to Experiment 1.
9.3 Materials
The familiarization and discrimination phases used the original materials from Jonaitis and
Saffran (2009). Stimulus items were chord progressions constructed from a finite state grammar
whose nodes were chords in the Phrygian mode (Figure 6). Progressions were required to begin
and end on the tonic chord (I), although multiple loops through the grammar were permitted.
Two grammars were used (Grammar A and Grammar B). Grammar B was a retrograde of
Grammar A, such that every Grammar B item was a Grammar A item where the chords were
heard in reverse order. The stimuli were designed by these authors to follow music-theoretic
conventions. First, the chords in the progressions form a harmonic hierarchy, such that chords
that are more structurally important are heard more often. Second, the items were voiced to make
the movements from chord to chord clear in the low register, as well as to follow basic rules of
melodic composition for the melody in the upper register. All items were played with a piano
timbre at 120 beats per minute (500 ms per chord).
Figure 6. Finite state grammars used in Experiment 2, based on Figure 1 from Jonaitis & Saffran
(2009). (a) Grammar A; (b) Grammar B.
For the familiarization phase, 100 Grammar A familiarization items (50 unique items, each
presented in two different voicings), four to ten chords in length and distributed uniformly across
keys, were employed. The Grammar B materials were excluded to simplify the design; this
exclusion is justified by the fact that Jonaitis and Saffran found no differences in performance
between Grammars A and B in their original study.
For the discrimination phase, 60 test items that were not part of the familiarization corpus were
used. These stimuli were five to ten chords in length and distributed uniformly across keys.
Furthermore, these items were manipulated along two dimensions: Grammaticality
(Grammatical, Ungrammatical) and Correctness (Correct, Error). Grammatical items adhered to
Grammar A’s structure, whereas ungrammatical ones followed Grammar B’s structure. Correct
items were completely grammatical exemplars of Grammar A or B, whereas error items were
based on one of the grammars but contained one to three illegal transitions. Thus, for each
grammar, there were 15 correct items and 15 error items (5 items containing one, two, and three
illegal transitions, respectively). Figure 7 shows some examples of these items.
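The composition of the discrimination set follows directly from these counts; a brief sketch (the condition labels are illustrative, not the actual stimulus identifiers from Jonaitis & Saffran, 2009):

```python
# Sketch of the discrimination-item design described above: 2 grammars,
# each with 15 correct items and 15 error items, the error items split
# evenly among 1, 2, and 3 illegal transitions.
items = []
for grammar in ("A", "B"):  # Grammatical (A) vs. Ungrammatical (B)
    for _ in range(15):
        items.append((grammar, "correct", 0))
    for n_illegal in (1, 2, 3):  # 5 error items per number of illegal transitions
        for _ in range(5):
            items.append((grammar, "error", n_illegal))

# 2 grammars x (15 correct + 15 error) = 60 discrimination items
assert len(items) == 60
```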
Figure 7. Examples of items used in Experiment 2, based on Figures 2-3 from Jonaitis & Saffran
(2009). (a) Example of Grammar A exposure item. (b) Example of Grammar A correct
discrimination item. (c) Example of Grammar B correct discrimination item. (d) Example of
Grammar A error discrimination item. (e) Example of Grammar B error discrimination item.
Expectancy phase and tuning practice:
For the expectancy phase, subsets of chords from the experimental grammars were produced in a
piano timbre using Sonar 8. Each chord was produced in four versions, with the chord being
either 2 seconds or 3 seconds long and in-tune or out-of-tune. Out-of-tune chords were created by
lowering the frequency of the fifth degree (the highest of the three chord tones) by a factor of
2^(1/24) (a quarter-tone). Although Bharucha and Stoeckig (1986) only mistuned the fifth degree by
an eighth-tone, the quarter-tone mistuning was chosen because participants were unable to detect
the eighth-tone mistuning during pilot testing.
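The quarter-tone mistuning corresponds to dividing a tone's frequency by 2^(1/24). A minimal sketch, using an arbitrary example frequency rather than the actual stimulus values:

```python
def mistune_quarter_tone(freq_hz: float) -> float:
    """Lower a frequency by a quarter-tone, i.e. divide by 2**(1/24).

    A semitone is a factor of 2**(1/12) in equal temperament, so a
    quarter-tone is half that interval: 2**(1/24), roughly 1.0293.
    """
    return freq_hz / 2 ** (1 / 24)

# Example with an arbitrary fifth-degree frequency of 660 Hz (assumed for
# illustration; the actual chord-tone frequencies are not specified here).
lowered = mistune_quarter_tone(660.0)
# The mistuned tone is roughly 2.8% lower in frequency than the in-tune tone.
```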
Two chord pairs were selected for each of the four priming conditions (no priming, some
priming, strong priming, strongest priming; see Figure 5). The first chord in each pair was called
the “prime”, and the second chord was called the “target”. These chord pairs were manipulated
with respect to Tuning and Chord Order. Tuning refers to whether the target was in-tune or out-
of-tune. Chord Order refers to whether the two chords were presented in the forward (Grammar
A) or retrograde order. Interestingly, chord order effects have not previously been investigated in
the harmonic priming literature. Thus, although the retrograde order is technically
ungrammatical according to the familiarization grammar, these chords would have been heard
adjacently during familiarization and may have been learned as related regardless of presentation
order, leading to priming effects.
Since the prediction of priming with regard to chord order was not clear, both orders for each
chord pair were categorized under the same priming condition (i.e., the retrograde order was still
considered grammatical for the purposes of this design, although the statistical test of this
assumption is presented in the results section). Finally, because familiarization items were
distributed evenly across musical keys, each combination of chord pair, tuning, and chord order
was presented in 3 randomly chosen keys. These manipulations resulted in 96 priming trials (4
priming conditions x 2 chord pairs x 2 tunings x 2 orders x 3 keys); the details of these trials can
be seen in Table 2.
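The trial count can be verified by crossing the design factors; a quick sketch (factor labels assumed for illustration):

```python
from itertools import product

# Factors of the expectancy-phase design, as described above.
priming_conditions = ["no", "moderate", "strong", "strongest"]
chord_pairs = [1, 2]
tunings = ["in-tune", "out-of-tune"]
orders = ["forward", "retrograde"]
keys = ["key1", "key2", "key3"]  # 3 randomly chosen keys per combination

trials = list(product(priming_conditions, chord_pairs, tunings, orders, keys))
# 4 priming conditions x 2 chord pairs x 2 tunings x 2 orders x 3 keys = 96 trials
print(len(trials))  # 96
```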
Table 2
Composition of Expectancy Trials for Experiment 2
Priming Condition            Chord Pair   Chord Order     Prime   Target
No Priming                   1            n/a             VI      II#
(illegal Western,                         n/a             II#     VI
illegal familiarization)     2            n/a             vi      ii#
                                          n/a             ii#     vi
Moderate Priming             1            Grammatical     I       II
(illegal Western,                         Ungrammatical   II      I
legal familiarization)       2            Grammatical     I       vii
                                          Ungrammatical   vii     I
Strong Priming               1            n/a             VI      III
(legal Western,                           n/a             III     VI
illegal familiarization)     2            n/a             vi      iii
                                          n/a             iii     vi
Strongest Priming            1            Grammatical     iv      i
(legal Western,                           Ungrammatical   i       iv
legal familiarization)       2            Grammatical     i       III
                                          Ungrammatical   III     i
Note. The prime and target chords are labeled based on the Roman numeral system presented in
Figure 6. Uppercase letters indicate a major chord, and lowercase letters indicate a minor chord
(as defined by Western harmony). These items were actually presented in three randomly chosen
keys during the experiment.
9.4 Procedure
Each participant completed all phases of the experiment.
Tuning training:
Before starting the experimental trials, participants were trained to discriminate in-tune from out-
of-tune chords using methodology from Bharucha & Stoeckig (1986). First, participants listened
to four examples each of in-tune and out-of-tune chords. Next, they were presented with 48
chords, half of which were in-tune and half of which were out-of-tune, and required to judge the
intonation of each chord. These test trials were preceded by 10 practice trials, and participants
were required to score 43/48 (90%) correct before moving on to the familiarization phase. All
chords presented during tuning training were 2 seconds in length.
Familiarization phase:
During the familiarization phase, participants heard 100 grammatical progressions in the
familiarization grammar. They were told that they would be presented with items from a novel
musical system and then asked to listen carefully to each item and rate how much they liked each
item on a scale from 1 (did not like it at all) to 7 (liked it a lot). This liking task was used in order
to maintain participants' attention throughout the familiarization phase, without calling
attention to the harmonic structural aspects of the items that would be tested later.
Expectancy phase:
The expectancy phase was conducted before the discrimination phase because of concerns that
hearing discrimination items (some of which contained grammatical errors) would alter
expectancy performance. For this priming task, participants were instructed that each trial started
with a random series of 16 chromatic tones (125 ms each), acting as an auditory mask between
trials. This mask was followed by a pair of chords. The first chord (the prime, 3 seconds long)
was always in-tune, whereas the second chord (the target, 2 seconds long) could be in- or out-of-
tune. Participants were asked to indicate as quickly and accurately as possible whether the
second chord was in- or out-of-tune. The 96 expectancy trials were preceded by 12 practice
trials.
Discrimination phase:
In the discrimination phase, participants were told that some of the items they were about to
hear would be from the familiarization system, and some of them would be from a different
system. On each trial, participants heard a test item, and were instructed to judge how similar it
was to the familiarization items on a scale from 1 (very dissimilar) to 7 (very similar). The 60
discrimination trials were preceded by six practice trials.
Following all the experimental trials, participants completed a survey regarding their musical
experience. The entire experimental session lasted approximately 90 minutes.
10 Results
10.1 Discrimination Phase
Similarity ratings from each participant were collapsed across the 15 exemplars in each of the
Grammar/Correctness conditions. These mean ratings were then submitted to repeated measures
ANOVA with Grammaticality (Grammatical, Ungrammatical) and Correctness (Correct,
Error) as factors. There was a main effect of Grammaticality, F(1,13) = 15.69, MSE = 0.37, p =
.002, ηp² = .55, with listeners judging Grammatical familiarization items as more similar to
exposure items than Ungrammatical sequences (see Figure 8). The main effect of Correctness
was not significant, F(1,13) = 0.02, p = .88, nor was the interaction between Grammaticality and
Correctness, F(1,13) = 1.08, p = .32.
Figure 8. Average similarity ratings for grammatical and ungrammatical items in the
discrimination phase of Experiment 2.
10.2 Expectancy Phase
Responses from the priming task were analyzed with respect to accuracy and reaction time.
Accuracy data:
Raw accuracy data were collapsed across chord pairs, orders, and keys. These mean accuracy
scores were then submitted to repeated measures ANOVA with Western Legality (Legal,
Illegal), Familiarization Legality (Legal, Illegal), and Tuning (In-tune, Out-of-tune) as factors
(see Figure 5). None of the main effects were significant, all F values < 4, all p values > .07.
Turning to the interactions, the Western Legality x Tuning interaction was significant, F(1,12) =
13.12, MSE < 0.01, p < .01, ηp² = .50, as was the three-way interaction between Western legality,
familiarization legality, and tuning, F(1,12) = 6.42, MSE = 0.01, p = .03, ηp² = .33. All of the
remaining interactions were non-significant, all F values < 1.52, all p values > .23.
The two-way interaction between Western legality and tuning was investigated with simple
effects, whereby the effect of Western legality was assessed for in-tune and out-of-tune trials
separately. For in-tune trials, responses were significantly more accurate for trials that were legal
in Western harmony than trials that were illegal in Western harmony, t(13) = 3.31, p < .01.
However, the effect of Western legality was non-significant for out-of-tune trials, t(13) = 1.61, p
= .13 (Figure 9). This result is in line with the finding that priming effects are strongest when the
acoustic surface of the stimulus is not disrupted, as it is with out-of-tune chords (Marmel &
Tillmann, 2009).
Figure 9. Western and novel priming effects in terms of accuracy (in-tune trials only) for
Experiment 2.
A set of a priori analyses was conducted to assess priming effects due to the familiarization
grammar. Since it was possible that Western harmony perception may have overshadowed the
effects of the familiarization grammar for trials that were legal in both harmonic systems, this
analysis focused on the comparison of trials that were legal only in the familiarization grammar
with trials that were illegal in both grammars. Accuracy was not significantly higher for trials
that were legal only in the familiarization grammar than for trials that were illegal in both
grammars, t(13) = .11, p = .92. Due to the effect of acoustic continuity seen for Western priming,
this comparison was assessed for in-tune trials only. For in-tune trials, there was still no
significant accuracy benefit for trials that were legal only in the familiarization grammar over
trials that were illegal in both grammars, t(13) = 1.47, p = .17, although there was a trend
towards greater accuracy for the familiarization legal trials (Figure 9).
Finally, the effect of chord order (in the familiarization grammar) on accuracy was explored.
Raw accuracy data for trials that were legal in the familiarization grammar were collapsed across
chord pairs and keys. These mean accuracy scores were analyzed separately for the subset of the
familiarization legal trials that were not legal in the Western grammar and the subset that were
legal in the Western grammar, because of the possibility that Western grammar effects would
overshadow familiarization grammar chord order effects in the latter case.
For the familiarization legal/Western illegal condition, mean accuracy scores were submitted to
repeated measures ANOVA with Tuning (In-tune, Out-of-tune) and Chord Order (Forward,
Retrograde) as factors. There was no significant effect of Tuning, F(1,13) = 1.11, p = .31, but the
effect of Chord Order was marginally significant, F(1,13) = 3.41, MSE = 0.01, p = .09, ηp² = .21,
with accuracy for trials played in forward order being higher than for trials played in retrograde
order. The interaction between tuning and chord order was marginally significant, F(1,13) = 4.42, MSE =
.01, p = .055, ηp² = .25. This interaction was driven by a significant effect of chord order for in-
tune trials only, t(13) = 2.51, p = .03, with accuracy for forward trials being higher than for
retrograde trials (Figure 10). An identical analysis was performed for the familiarization
legal/Western legal condition. Neither of the main effects was significant, nor was the
interaction, all F values < 1, p > .34.
Figure 10. Tuning x Chord Order interaction (accuracy) for the familiarization legal/Western
illegal condition for Experiment 2.
Reaction time data:
Reaction times were only analyzed for trials where participants answered correctly, and all
reaction times greater than 2000 ms were discarded. Due to poor performance (accuracy less
than 75%), the reaction time data for four participants were excluded from the analyses. For the
remaining 10 participants, reaction time data were then collapsed across chord pairs, orders, and
keys. These mean reaction times were submitted to repeated measures ANOVA with Western
Legality (Legal, Illegal), Familiarization Legality (Legal, Illegal), and Tuning (In-tune, Out-of-
tune) as factors.
Neither the main effect of Western legality nor that of familiarization legality was significant, both F
values < 1.82, both p values > .20. The main effect of Tuning was significant, F(1,9) = 30.12,
MSE = 8198.73, p < .001, ηp² = .77, with reaction times faster for out-of-tune than in-tune trials
(Figure 11). None of the two-way interactions were significant, all F values < 3.79, all p values >
.08. The three-way interaction between Western legality, familiarization legality, and tuning was
significant, F(1,9) = 10.24, MSE = 2910.34, p = .01, ηp² = .53.
Figure 11. Effect of tuning on reaction time for Experiment 2.
A set of a priori analyses was conducted to assess priming effects due to Western grammar and
the familiarization grammar. First, to assess Western harmonic priming, and following the
analysis of accuracy, reaction times for the Western legal trials were compared to reaction times
for the Western illegal trials for in-tune trials only. For these trials, responses were no faster for
chord pairs that were legal than for pairs that were illegal in Western harmony, t(9) = 1.60, p = .15
(Figure 12).
Figure 12. Western and novel priming effects in terms of reaction time (in-tune trials only) for
Experiment 2.
Next, to assess familiarization grammar priming, reaction times for trials that were legal only in
the familiarization grammar were compared with reaction times for trials that were illegal in both
grammars. As in the accuracy analysis, this comparison was designed to detect priming effects
due to the familiarization grammar that may have been masked by Western harmony perception
in trials that were legal in both Western harmony and the familiarization grammar. Reaction
times for trials that were legal in the familiarization grammar only were significantly faster than
for trials that were illegal in both grammars, t(9) = 2.50, p = .03. As in the previous analyses, this
contrast was then performed for in-tune trials only. For in-tune trials, reaction times for trials that were
legal in the familiarization grammar only were still significantly faster than for trials that were
illegal in both grammars, t(9) = 3.19, p = .01 (Figure 12). This was not the case for out-of-tune
trials, where there was no discernible priming due to the familiarization grammar, t(9) = 0.76, p
= .47.
Finally, the effect of chord order (in the familiarization grammar) on reaction time was explored.
Reaction time data for trials that were legal in the familiarization grammar were collapsed across
chord pairs and keys. These mean reaction times were analyzed separately for the subset of the
familiarization legal trials that were not legal in the Western grammar and the subset that were
legal in the Western grammar, because of the possibility that Western grammar effects would
overshadow familiarization grammar chord order effects in the latter case.
For the familiarization legal/Western illegal condition, mean reaction times were submitted to
repeated measures ANOVA with Tuning (In-tune, Out-of-tune) and Chord Order (Forward,
Retrograde) as factors. There was a significant effect of Tuning, F(1,9) = 8.56, MSE = 7711.35, p
= .31, ηp² = .49, with reaction times for out-of-tune trials being faster than for in-tune trials (see
Figure 11). The main effect of Chord Order was not significant, F(1,9) = .003, p = .09, ηp² = .21,
nor was the Tuning x Chord Order interaction, F(1,9) = 2.23, p = .17. An identical analysis was
performed for the familiarization legal/Western legal condition. Neither of the main effects was
significant, nor was the interaction, all F values < 2.56, p > .13.
Individual Differences
The last set of analyses explored whether performance in the discrimination phase was
associated with performance in the expectancy phase. Similarity scores were obtained for each
participant by finding the difference between the mean ratings for grammatical items and
ungrammatical items. A higher similarity score indicated that the participant distinguished better
between grammatical and ungrammatical items. The accuracy priming effect was calculated by
subtracting the mean accuracy for the No Priming condition from the mean accuracy for the
Some Priming condition. A larger positive difference indicated a larger accuracy advantage for
the Some Priming condition, which contained chord pairs that were legal only in the novel
grammar. Similarly, the reaction time priming effect was calculated by subtracting the mean
reaction time for the No Priming condition from the mean reaction time for the Some Priming condition.
Here, a larger negative difference indicated a larger reaction time advantage for the novel
grammar. Bivariate correlations were calculated between similarity scores and each of the
priming effects. Neither of the two correlations was significant, both absolute r values < .27, p >
.45.
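The difference scores and correlation described above can be sketched as follows (the data values are hypothetical, and a hand-rolled Pearson correlation stands in for whatever statistics software was actually used):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

# Hypothetical per-participant values, for illustration only.
grammatical_ratings = [5.1, 4.8, 5.5]
ungrammatical_ratings = [4.2, 4.5, 4.6]
# Similarity score: mean grammatical rating minus mean ungrammatical rating.
similarity_scores = [g - u for g, u in zip(grammatical_ratings, ungrammatical_ratings)]

# Accuracy priming effect: Some Priming minus No Priming (positive = advantage
# for chord pairs legal only in the novel grammar).
acc_some_priming = [0.88, 0.90, 0.85]
acc_no_priming = [0.84, 0.91, 0.80]
accuracy_priming_effects = [s - n for s, n in zip(acc_some_priming, acc_no_priming)]

r = pearson_r(similarity_scores, accuracy_priming_effects)
```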
11 Discussion
In the discrimination phase, participants were able to discriminate successfully between
grammatical and ungrammatical items, but they did not make the subtler distinction between
correct Grammar A items and ones that included some ungrammatical transitions; this finding
replicates Jonaitis & Saffran’s (2009, Experiment 1) results.
Turning to the expectancy phase, there was an overall effect of tuning, with higher accuracy and
faster reaction times to out-of-tune trials than in-tune trials. This result has been observed in
previous research, and can be explained by the fact that stimuli that disrupt an acoustical surface
(out-of-tune chords) are more salient than those that do not (in-tune chords) (Marmel &
Tillmann, 2009). More interestingly, priming effects driven by Western harmony were observed,
as expected. However, these priming effects were restricted to differences in accuracy, and did
not manifest in reaction time differences. This was unexpected based on the wealth of previous
research that has demonstrated consistent Western harmonic priming effects on both accuracy
and reaction time. It is possible that Western priming was somewhat disrupted by participants
learning the rules of the familiarization grammar.
Most critically, a priming effect for the familiarization grammar was observed for reaction time
(but not for accuracy). Furthermore, there was some weak evidence that chord order matters in
this priming task, with chord pairs that were presented in the order encountered during
familiarization eliciting more accurate responses than those presented in retrograde order. Thus,
although these effects did not manifest consistently in both accuracy and reaction time, they do
constitute evidence that the statistical learning of adjacent dependencies in novel harmonic
materials has downstream processing effects.
Lastly, discrimination performance was unrelated to priming performance. This may indicate
that the discrimination task and the priming task targeted different cognitive processes, although
given the fact that they were designed specifically to assess the same mental representation –
learned adjacent dependencies between chords in the familiarization grammar – this explanation
seems unlikely. Rather, the priming task may have been a more sensitive measure of
participants’ structural knowledge of the novel chord grammar, for two reasons. First, the
priming task presented chord pairs in isolation rather than embedded in a longer context (like in
discrimination). Working memory demands for the priming task were thus lower than for the
discrimination task, relieving participants from processing an entire harmonic passage and thus
allowing participants to focus more attention on the priming task itself. Secondly, the priming
task was an implicit evaluation of learned harmonic structure, whereas the similarity task
required explicit comparison of the passage presented in each trial with the corpus participants had
encountered during familiarization. As discussed by Tillmann (2005), implicit tasks are often
superior to explicit ones in the study of musical expectancy because these representations of
expectancy are themselves acquired implicitly, as in statistical learning. A final potential
methodological explanation for the fact that discrimination and priming performance was
unrelated is that the discrimination task was presented after the priming task. This was done to
prevent experience in the discrimination block from affecting performance in the more
theoretically important priming block. However, this meant that there was a significant temporal
delay between familiarization and the performance of the discrimination task, which may have
impaired discrimination performance.
This experiment was able to demonstrate with a harmonic priming task that listeners are able to
form musical expectancies through statistical learning. Furthermore, these results suggest that for
a given tone set (in this case, the familiar Western chromatic set), more complex harmonic
materials are better suited to the development of musical expectancies than simpler melodic
materials (i.e., Experiment 1). Although it is not clear why this is, one can speculate that these
novel harmonic materials provided a stimulus environment that was both rich in information and
similar to other environments where expectancies have been learned before (i.e., Western
harmony).
These first two experiments have focused on the role of stimulus complexity on musical
expectancy learning, while controlling tone set familiarity. However, as alluded to previously,
the use of a Western chromatic tone set in constructing materials produces a confound in the
experiment – representations of familiar Western tonal structure can interfere with the new
musical relations being introduced by the familiarization grammar. Thus, to isolate the learning
of novel expectancies through statistical learning from previously learned expectancies, melodic
and harmonic stimuli were constructed from an unfamiliar tone set for the remaining three
experiments.
Chapter 4 Experiment 3: Memory For Bohlen-Pierce Melodies
12 Introduction
Experiment 3 was designed to extend the previous results to the learning of melodic expectancies
with an unfamiliar tone set. Although participants failed to learn melodic expectancies in
Experiment 1, this may have been due in part to the nature of the grammar’s construction. For
this study, rather than using a tone-word language to create stimulus items, we used a finite state
grammar in which grammatical items differed from ungrammatical ones based on a more
complex network of transitional probabilities, rather than on a small set of legal tone-
words. This finite state grammar produced
melodies based on an underlying harmony, which mimics the way melodies are composed in real
music; this grammatical structure may therefore make it easier to learn expectancies than
Experiment 1’s tone-word chunk structure.
Inspiration for this study came from work by Loui and her colleagues (Loui & Wessel, 2008;
Loui et al., 2010). The tone set used by these researchers was the Bohlen-Pierce scale, a
microtonal tuning system which listeners are very unlikely to have encountered previously, and
whose structure has been described by Krumhansl (1987). Loui et al. (2008; 2010) have shown
that following approximately 30 minutes of exposure to a set of grammatical Bohlen-Pierce
melodies, participants were able to discriminate between a grammatical melody and an
ungrammatical foil, regardless of whether they had heard the grammatical melody before. Thus,
the familiarization and discrimination phases were direct replications of this previous work.
The expectancy phase used a recognition memory task similar to the one from Experiment 1. As
in Experiment 1, if participants were able to learn melodic expectancies during the
familiarization phase, their memory performance should be better for grammatical melodies than
ungrammatical ones. Tonality was operationalized in the present experiment by comparing
grammatical melodies with randomly composed ungrammatical melodies. The novel grammar
from which grammatical melodies were composed governed the way tones were combined,
analogous to the way tonality governs the way tones are combined in real music. As a result,
grammatical melodies should have been perceived as better-formed than ungrammatical
melodies in the same way that tonal melodies are perceived as better-formed than atonal
melodies (Krumhansl, 2000). Thus, it was predicted that the effect of grammar in the present
study would be quite similar to the effect of tonality in past studies. For this experiment, the
expectancy phase memory task was designed by combining the methods of DeWitt & Crowder
(1986), who used delay and comparison type as factors, and Dowling (1991), who studied
tonality in addition to those previous two factors.
Finally, consideration should be given to the familiarity and complexity of these experimental
materials and how these characteristics might have influenced the ease of expectancy learning in
this experiment. These stimuli were melodies composed from Bohlen-Pierce tones, meaning that
they were built from unfamiliar-sounding parts but simple in construction. It is possible that
using an unfamiliar tone set would make learning expectancies very difficult because it places a
heavy burden on perceptual processing. However, this difficulty may be slightly alleviated by the
fact that the structures to be learned are melodic and therefore quite simple. Furthermore,
expectancy learning may be potentiated by the fact that these Bohlen-Pierce structures do not
have to compete with longstanding Western melodic expectancies, as they did in Experiments 1
and 2. Because Western tonality cannot interfere in the perception of melodies that are composed
from tones outside of the Western chromatic set, it may be easier to learn the novel relations
between the Bohlen-Pierce tones established by this novel familiarization grammar.
13 Methods
13.1 Participants
Thirty-two participants were recruited from the University of Toronto Scarborough community
using the introductory psychology participant pool as well as posted advertisements. Participants
were compensated with course credit or $10 per hour.
They consisted of 10 males and 22 females with a mean age of 19.6 years (SD = 3.2 years).
These participants were selected to have at least five years of formal musical training, due to
concerns about task difficulty for non-musicians. Participants had on average 8.4 years of formal
musical training (SD = 4.1 years), 2.6 years of musical theory training (SD = 3.3 years), played
music for 3.9 hours per week (SD = 4.1 hours), and listened to music for 14.9 hours per week
(SD = 15.9 hours). One participant in the discrimination condition reported having absolute
pitch. No participants reported having taken part in a music psychology experiment previously.
13.2 Apparatus
This experiment used an identical apparatus to Experiment 1.
13.3 Materials and Procedure
Participants were randomly assigned to the discrimination group or the expectancy group. Both
groups took part in the familiarization phase. Only the discrimination group participated in the
discrimination phase, and only the expectancy group participated in the expectancy phase.
Bohlen-Pierce melodies:
The familiarization and discrimination phases were based on the work of Loui and colleagues
(Loui & Wessel, 2008; Loui et al., 2010); original materials were composed because the stimuli
from those studies were not available. Familiarization and discrimination stimuli were composed from pure sinusoidal
tones synthesized in Matlab 7.0, 500 ms in length with 5 ms rise and fall times. These tones had
frequencies defined by the Bohlen-Pierce system.
The Bohlen-Pierce system uses a microtonal scale based on 13 logarithmically equal divisions of
a tritave (a 3:1 frequency ratio). The tones in the tritave used for these stimuli are defined as:
Frequency (Hz) = k × 3^(n/13)
where n is the number of steps along the scale, and k is a constant equal to 220 Hz (Table 3). A
subset of these scale tones was combined into chords (Krumhansl, 1987), and these chords were
combined to form a four-chord progression (Figure 13). This progression was used as a finite-
state grammar to construct eight-note melodies; melodies had to start on a tone from the first
chord and end on a tone from the fourth chord, and successive notes could either repeat the same
tone or choose another tone from the same chord or the next chord in the progression (see Figure
13 for an example).
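The scale formula and the finite-state melody grammar described above can be sketched in code. The four chords below are hypothetical placeholders, since the exact chord tones taken from Krumhansl (1987) are not listed in the text:

```python
import random

K = 220.0  # reference frequency (Hz)

def bp_frequency(n: int) -> float:
    """Frequency of step n of the Bohlen-Pierce scale: k * 3**(n/13)."""
    return K * 3 ** (n / 13)

# Hypothetical four-chord progression over scale steps (the actual chord
# tones, a subset of six scale tones, are not specified in the text).
PROGRESSION = [[0, 4, 7], [0, 6, 10], [0, 4, 10], [0, 7, 10]]

def generate_melody(progression, length=8):
    """Walk the finite-state grammar: start on a tone of the first chord,
    end on a tone of the last chord; each successive note repeats the
    current tone or picks a tone from the current or next chord."""
    last = len(progression) - 1
    chord = 0
    melody = [random.choice(progression[0])]
    while len(melody) < length:
        remaining = length - len(melody)   # notes still to add
        options = [(chord, t) for t in progression[chord]]
        if chord < last:
            options += [(chord + 1, t) for t in progression[chord + 1]]
        # drop moves that would leave too few notes to reach the last chord
        options = [(c, t) for (c, t) in options if last - c <= remaining - 1]
        chord, tone = random.choice(options)
        melody.append(tone)
    return melody
```

The filter on `options` guarantees that the melody always ends on a tone of the fourth chord, as the grammar requires.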
Table 3
Composition of Materials for Experiment 3

Tone in Tritave (n)    Frequency (Hz)
 0                     220.00
 1                     239.40
 2                     260.51
 3                     283.48
 4                     308.48
 5                     335.68
 6                     365.29
 7                     397.50
 8                     432.55
 9                     470.69
10                     512.20
11                     557.37
12                     606.52
Figure 13. (a) Bohlen-Pierce grammar from Experiment 3, based on Figure 2 from Loui et al.
(2008). (b) An example of a melody constructed from this grammar.
Familiarization phase:
For the familiarization phase, each of 18 Bohlen-Pierce melodies was presented 28 times, in
random order. Participants were instructed to listen carefully to a series of melodies. Participants
were provided with paper and crayons, with which they could draw to help them keep alert over
the 30 minutes of familiarization.
Discrimination phase:
A subset of 10 melodies from familiarization was used for discrimination trials. In addition, 10
novel grammatical and 20 ungrammatical melodies were composed (10 for each block, see
below). Ungrammatical melodies were created in the same way as grammatical melodies, except
by using a retrograde grammar where the four chords occurred in reverse order.
This phase was presented in two blocks of 10 trials. In both blocks, participants heard one
grammatical melody and one ungrammatical melody (with the order counterbalanced across
trials), separated by 1 second, and were instructed to indicate which of the two melodies sounded
more familiar. In the first “recognition” block, the grammatical melody had been presented in
familiarization; thus, participants were simply required to recognize a previously heard melody
in order to correctly choose the grammatical item. In the second “generalization” block, the
grammatical melody had not been presented in familiarization; thus, participants were required
to generalize the grammar’s structure to a novel exemplar to do the task.
Following the discrimination trials, participants completed a survey regarding their musical
experience. The entire experimental session for the discrimination group lasted approximately
one hour.
Expectancy phase:
Memory trials in the expectancy phase were similar to the memory trials from Experiment 1. The
120 trials were manipulated in terms of Grammar (Grammatical vs. Ungrammatical), Delay
(Short vs. Long) and Comparison Type (Match, Same Contour, Random). Of the 60 grammatical
standards, 18 were heard during familiarization, and none of the 60 ungrammatical melodies had
been used during familiarization or discrimination. Ungrammatical melodies were composed by
choosing 8 notes randomly from the 13 tones of the Bohlen-Pierce tone set.
The standard and comparison were separated by one second for short delay trials, and by 30
seconds for long delay trials. During the long delay, participants were presented with a random
three-digit number from which they were required to count backwards, out loud, by threes. The
three comparison types were constructed in the same manner as in Experiment 1, with two
exceptions. First, the Bohlen-Pierce tone set was used rather than the Western chromatic tone set.
Second, similar to methods used by Dowling (1991), only two notes were altered between match
and same contour trials (as opposed to all of them). These notes were chosen randomly, with the
restriction that they were neither the first nor the last note of the melody. Altering only two notes
was necessitated by the restricted sample space offered by the novel Bohlen-Pierce grammar. Table 4 depicts some
examples of memory trials.
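The same contour lure construction can be sketched as follows, assuming that a lure must preserve the standard's pattern of ups, downs, and repeats while altering two interior notes; the exact selection procedure shown here is an assumption:

```python
import random

def contour(melody):
    """Sign of each successive interval: +1 up, -1 down, 0 repeat."""
    return [(b > a) - (b < a) for a, b in zip(melody, melody[1:])]

def same_contour_lure(melody, tone_set=range(13), n_changes=2):
    """Alter up to n_changes interior notes (never the first or last)
    while preserving the melody's contour. A sketch of the lure
    construction described in the text; the precise algorithm used in
    the experiment is not specified."""
    lure = list(melody)
    positions = list(range(1, len(melody) - 1))
    random.shuffle(positions)
    changed = 0
    for i in positions:
        if changed == n_changes:
            break
        # candidate tones that keep the local up/down/repeat pattern
        candidates = [t for t in tone_set
                      if t != lure[i]
                      and contour([lure[i - 1], t, lure[i + 1]])
                          == contour([lure[i - 1], lure[i], lure[i + 1]])]
        if candidates:
            lure[i] = random.choice(candidates)
            changed += 1
    return lure
```

Because each local change preserves the two intervals surrounding the altered note, the lure's overall contour matches the standard's.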
Before encountering the expectancy trials, participants were informed about all the possible trial
types, and presented with six examples. Each memory trial started with the standard melody. On
short delay trials, there was a one-second delay followed by the presentation of the word “TEST”
on the screen as the comparison melody played. On long delay trials, the word “COUNT” and a
random three-digit number appeared on the screen for 30 seconds, followed by the presentation
of the word “TEST” as the comparison melody played. After the comparison played participants
were asked to indicate how similar the second melody was to the first on a confidence scale from
1 (same) to 5 (different). The order of the 120 memory trials was randomized for presentation to
participants. Following these trials, participants completed a survey regarding their musical
experience. The entire experimental session for the expectancy group lasted approximately 90
minutes.
Table 4
Examples of Grammatical and Ungrammatical Standards and Their Corresponding
Comparisons for Experiment 3

Grammar         Standard              Comparison Type   Comparison
Grammatical     0 4 7 7 7 0 6 10      Match             0 4 7 7 7 0 6 10
                0 0 10 7 4 7 7 0      Same Contour      0 0 10 6 0 7 7 0
                0 7 0 10 0 6 10 6     Random            0 7 10 3 7 3 0 6
Ungrammatical   4 6 11 11 11 9 11 12  Match             4 6 11 11 11 9 11 12
                5 5 11 5 2 5 5 1      Same Contour      5 5 10 7 2 5 5 1
                0 4 1 10 9 11 12 8    Random            0 10 11 7 10 2 1 4

Note. The numbers in the Standard and Comparison columns refer to notes corresponding to “n” from Table 3.
14 Results
14.1 Discrimination Phase
The total number of correct responses (out of 10) was tallied for each participant for the
recognition and generalization blocks. The scores from each block were then submitted to a one-
sample t-test with µ = 5 (chance performance). Performance in the recognition block was no
different from chance (mean = 5.44 ± 0.43), t(15) = 1.02, p = .32, nor was performance in the
generalization block (mean = 5.13 ± 0.40), t(15) = 0.32, p = .76.
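The chance-level comparison reported above can be illustrated with a minimal sketch; the scores below are hypothetical, not the actual data:

```python
import math
from statistics import mean, stdev

def one_sample_t(scores, mu=5.0):
    """t statistic for H0: population mean equals mu (chance = 5/10)."""
    n = len(scores)
    se = stdev(scores) / math.sqrt(n)
    return (mean(scores) - mu) / se

# Hypothetical discrimination scores for 16 participants.
scores = [6, 5, 4, 7, 5, 6, 5, 4, 6, 5, 7, 5, 4, 6, 5, 7]
t = one_sample_t(scores)
# compare |t| against the two-tailed .05 critical value for df = 15 (2.131)
significant = abs(t) > 2.131
```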
14.2 Expectancy Phase
Following DeWitt & Crowder (1986), similarity ratings from the melody memory task were
evaluated in two ways. The similarity ratings, recorded on a five-point confidence scale, were
converted to areas under a memory operating characteristic (MOC) curve (MacMillan &
Creelman, 2005). The raw similarity ratings were analyzed in addition to these area scores.
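The rating-to-area conversion can be sketched by sweeping a criterion across the confidence scale, in the standard signal detection manner; here a high ("different") rating on a lure trial counts as a hit and on a match trial as a false alarm, with the area obtained by the trapezoidal rule. The pairing of trial types is an assumption about how the procedure was applied:

```python
def moc_area(match_ratings, lure_ratings, levels=range(1, 6)):
    """Area under the memory operating characteristic obtained by
    sweeping the confidence criterion across the 5-point scale
    (1 = same, 5 = different) and applying the trapezoidal rule."""
    def rates(ratings):
        n = len(ratings)
        # P(rating >= c) for each criterion c, strictest first
        return [sum(r >= c for r in ratings) / n
                for c in sorted(levels, reverse=True)]
    hits = [0.0] + rates(lure_ratings) + [1.0]    # "different" on lures
    fas = [0.0] + rates(match_ratings) + [1.0]    # "different" on matches
    return sum((fas[i + 1] - fas[i]) * (hits[i + 1] + hits[i]) / 2
               for i in range(len(fas) - 1))
```

Perfect discrimination yields an area of 1.0; identical rating distributions for matches and lures yield the chance value of 0.5.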
Areas under the MOC curve:
Areas under the MOC curve (Table 5) represent participant sensitivity to differences between
stimuli, and quantify participants’ general memory performance in a particular condition. These
data were organized by the within-subjects factors of Grammaticality (Grammatical,
Ungrammatical), Delay (Short, Long), and Comparison (Match vs. Same Contour, Match vs.
Random), and submitted to repeated measures ANOVA. All three main effects were significant.
Participants performed better on ungrammatical than grammatical trials, F(1,15) = 5.96, MSE =
0.02, p = .03, ηp² = .28, and better with short delays than long delays, F(1,15) = 95.63, MSE =
0.02, p < .001, ηp² = .86. Lastly, they found same contour melodies less discriminable from
matches than random melodies, F(1,15) = 139.17, MSE = 0.01, p < .001, ηp² = .90 (Figure 14).
Table 5
Mean Areas Under Memory Operating Characteristic Curve for Experiment 3

                   Short Delay      Long Delay
Grammaticality     SC      R        SC      R
Grammatical        0.76    0.99     0.52    0.61
Ungrammatical      0.76    0.98     0.61    0.75

Note. SC = Same Contour; R = Random.
[Figure omitted: bar graphs of area under the MOC curve (0 to 1) for each main effect: Grammaticality, Delay, and Comparison, with significance markers.]
Figure 14. Main effects on memory operating characteristic for Experiment 3.
With respect to two-way interactions, the interaction between Grammaticality and Delay was
significant, F(1,15) = 5.88, MSE = 0.02, p = .028, ηp² = .28, as was the interaction between Delay
and Comparison, F(1,15) = 8.24, MSE = 0.01, p = .01, ηp² = .36. The interaction between
Grammaticality and Comparison was not significant, F(1,15) = 1.30, p = .27. The three-way
interaction between Grammaticality, Delay, and Comparison was not significant, F(1,15) = 1.20,
p = .29.
The significant interactions reported above were investigated by using simple effects. The
Grammaticality x Delay interaction was driven by the fact that participants performed better for
ungrammatical trials than grammatical ones after long delays, t(15) = 2.53, p = .02, but not short
ones, t(15) = 0.08, p = .94 (Figure 15).
[Figure omitted: area under the MOC curve (0.5 to 0.9) by Grammaticality, plotted separately for Short and Long Delay.]
Figure 15. Grammaticality x Delay interaction for memory operating characteristic in
Experiment 3.
The Delay x Comparison interaction was driven by the fact that the discriminability advantage of
random trials over same contour trials was larger after short delays, t(15) = 9.26, p < .001, than
after long delays, t(15) = 4.89, p < .001, although the difference remained significant at long
delays (Figure 16).
[Figure omitted: area under the MOC curve (0.5 to 1) for Match vs. Same Contour and Match vs. Random comparisons, plotted separately for Short and Long Delay.]
Figure 16. Delay x Comparison interaction for memory operating characteristic in Experiment 3.
Lastly, a priori comparisons were conducted to see if performance differed based on whether the
standard melody was familiar from being presented during the familiarization phase. For this
analysis, MOC areas were calculated by collapsing across delay and lure type (Same Contour,
Random). Memory performance for ungrammatical trials was equivalent to performance for old
grammatical trials that had been encountered in the familiarization phase, t(15) = 1.47, p = .16.
Performance in both these conditions was superior to performance for new grammatical trials
that had not been encountered in the familiarization phase, both t values > 3.16, both p values <
.01 (Figure 17).
[Figure omitted: area under the MOC curve (0 to 1) for the Grammatical Old, Grammatical New, and Ungrammatical conditions.]
Figure 17. Effect of melody familiarity on area under memory operating characteristic in
Experiment 3.
Raw similarity ratings:
Similarity responses were collapsed across the 10 melodic exemplars for each combination of
grammaticality, delay, and comparison type (Table 6). These averaged ratings were then submitted to
repeated measures ANOVA with Grammaticality (Grammatical, Ungrammatical), Delay (Short,
Long), and Trial Type (Match, Same Contour, Random) as factors. All three main effects were
significant. Melodies in grammatical trials were judged more similar than in ungrammatical
trials, F(1,15) = 207.76, MSE = 0.49, p < .001, ηp² = .93. Melodies in short delay trials were
judged more similar than in long delay trials, F(1,15) = 133.56, MSE = 0.15, p < .001, ηp² = .90
(Figure 18). Finally, the three trial types differed in their similarity ratings, F(2,15) = 53.81, MSE
= 0.28, p < .001, ηp² = .78. Bonferroni-corrected comparisons determined that same contour trials
were judged as more similar than match trials, t(15) = 3.39, p < .01, and that match trials were
judged as more similar than random trials, t(15) = 5.79, p < .001 (Figure 18).
Table 6
Mean Similarity Ratings for Experiment 3

                 Short Delay            Long Delay
Grammar          M      SC     R        M      SC     R
Grammatical      1.34   1.51   2.67     2.86   2.43   2.69
Ungrammatical    2.61   3.29   4.54     4.49   3.14   4.16

Note. M = Match; SC = Same Contour; R = Random.
[Figure omitted: similarity ratings (1 = same, 5 = different) for each main effect: Grammaticality, Delay, and Trial Type (Match, Same Contour, Random), with significance markers.]
Figure 18. Main effects on similarity ratings for Experiment 3.
In terms of two-way interactions, the interaction between Grammaticality and Delay was
significant, F(1,15) = 11.30, MSE = 0.14, p < .01, ηp² = .43, as was the interaction between Delay
and Trial Type, F(2,30) = 70.63, MSE = 0.21, p < .001, ηp² = .83. The Grammaticality x Trial
Type interaction was marginally significant, F(2,30) = 2.95, p = .07. The three-way interaction
between Grammaticality, Delay, and Comparison Type was significant, F(2,30) = 5.76, MSE =
0.36, p = .01, ηp² = .28.
The significant two-way interactions reported above were investigated by using simple effects.
The Grammaticality x Delay interaction was driven by the fact that the difference in similarity
ratings between short and long delay trials was larger when the trial was also grammatical, t(15)
= 11.63, p < .001, than ungrammatical, t(15) = 5.37, p < .001, with melodies in the short delay
condition judged as more similar than in the long delay condition in both cases (Figure 19).
[Figure omitted: similarity ratings (1 = same, 5 = different) by Delay, plotted separately for Grammatical and Ungrammatical trials.]
Figure 19. Grammaticality x Delay interaction for similarity ratings in Experiment 3.
The Grammaticality x Trial Type interaction was driven by the fact that the difference between
grammatical and ungrammatical trials was larger for the Match and Random conditions, t(15) =
11.08, p < .001, than for the same contour condition, t(15) = 7.83, p < .001, with melodies in the
grammatical condition always judged as more similar than those in the ungrammatical condition
(Figure 20).
[Figure omitted: similarity ratings (1 = same, 5 = different) by Grammaticality, plotted separately for Match, Same Contour, and Random trials.]
Figure 20. Grammaticality x Trial Type interaction for similarity ratings in Experiment 3.
The Delay x Trial Type interaction was driven by differences in the effect of delay for each trial
type. For the match condition, short delay trials received much more similar ratings than long
delay trials, t(15) = 17.90, p < .001. For the same contour condition, short delay trials received
more similar ratings than long delay trials, t(15) = 2.83, p = .01, but this difference was not as
great as for the match condition. For the random condition, the pattern was reversed, with short
delay trials receiving marginally less similar ratings than long delay trials (Figure 21).
[Figure omitted: similarity ratings (1 = same, 5 = different) by Delay, plotted separately for Match, Same Contour, and Random trials.]
Figure 21. Delay x Trial Type interaction for similarity ratings in Experiment 3. # signifies p <
.10.
Finally, the three-way interaction of Grammaticality, Delay, and Comparison Type was fairly
complex. Critically, however, it took the form of the interaction observed in previous work
(Dowling, 1991), wherein similarity ratings for the same contour and random trials converge at
long delays, but only for grammatical trials (Figure 22).
[Figure omitted: similarity ratings (1 = same, 5 = different) by Delay and Trial Type, plotted separately for Grammatical and Ungrammatical trials.]
Figure 22. Grammaticality x Delay x Comparison Type interaction for similarity ratings in
Experiment 3.
Lastly, a priori comparisons were conducted to see if responses differed based on whether the
standard melody had been presented during the familiarization phase. There was no difference in
similarity judgments between old grammatical trials with standards that had been encountered
during familiarization, and new grammatical trials with standards that had not been heard before,
t(15) = 0.33, p = .75. However, as supported by the main effect for grammaticality in the
previously reported ANOVA, grammatical trials garnered more similar responses than
ungrammatical trials, regardless of whether they were heard in familiarization, t(15) = 6.85, p <
.001, or not, t(15) = 8.15, p < .001 (Figure 23).
[Figure omitted: similarity ratings (1 = same, 5 = different) for the Grammatical Old, Grammatical New, and Ungrammatical conditions.]
Figure 23. Effect of melody familiarity on similarity ratings in Experiment 3.
15 Discussion
In this experiment, few of the critical hypotheses concerning discrimination and expectancy
effects were borne out. First of all, the discrimination group was unable to perform above chance
in either the recognition or generalization task. This result is problematic given that it essentially
represents a failure to replicate the finding of Loui and colleagues (Loui & Wessel, 2008; Loui et
al., 2010) that participants performed better than chance in both discrimination tasks. This failure to
replicate occurred despite the use of identically constructed materials, and a similar participant
population (trained musicians). Furthermore, this result throws doubt on whether the expectancy
group would be able to learn melodic expectancies at all, since the melody memory task seems to
require more sophisticated knowledge of the novel grammatical structure than do the recognition
or generalization tasks. However, in Experiment 2, participants exhibited evidence of learned
harmonic expectancies even though they were not able to perform a discrimination task (Correct
vs. Error) that required the same structural knowledge. It was hypothesized then that perhaps the
expectancy task was a more sensitive measure of that structural information, and this could be
the case for the expectancy task in this experiment as well.
The data from the expectancy phase provided a mixture of expected and unexpected results
when compared with the findings of DeWitt & Crowder (1986) and Dowling (1991). The main
effect of delay, with short delays leading to better performance and more similar ratings, accords
with previous work, as does the effect of comparison type, with poorer performance and more
similar ratings for same contour trials than random trials. The interaction between these
variables, whereby same contour lures are confused with matches at short delays but not long
delays, was also expected. However, here this interaction was only significant for the analysis of
raw similarity ratings, whereas it was found for both the ratings and area scores analysis in
previous work. The lack of the Delay x Comparison Type interaction for the MOC area scores
may potentially be attributed to the unfamiliar Bohlen-Pierce materials used here, which have not
been tested with this methodology previously. Regardless, the generally successful replication of
past research for these variables is somewhat reassuring, as it provides an important point of
convergence between the current study and previous findings.
The critical effects involving grammar, however, did not agree as well with previous work. As
expected, grammatical trials received more similar ratings than ungrammatical ones. However,
memory performance was better when the standard was ungrammatical than when it was
grammatical (Figure 9). This result ran counter to predictions because memory for grammatical
melodies whose structure was expected should be superior to memory for ungrammatical
melodies whose structure was randomly chosen, as Dowling (1991) found for tonality. The
significant two-way interactions between grammar and delay (found for MOC and similarity
analysis) and grammar and comparison type (found for similarity only) were similarly
unprecedented in the literature, and do not lend themselves readily to any obvious explanation.
The observed three-way interaction between grammaticality, delay, and comparison type was
expected based on previous research (Figure 10), but it is important to note that this interaction was
significant for the similarity ratings only, whereas previous research has found a significant
interaction for both memory performance and similarity ratings.
This unexpected effect of grammar (and the associated interactions) may be explained by a
novelty advantage. The ungrammatical melodies were composed randomly from the full set of
13 Bohlen-Pierce tones, whereas the grammatical melodies were more restricted, moving within
a subset of only six tones through set paths (Figure 13). Thus, the ungrammatical melodies may
be perceived by participants as highly salient due to their distinctiveness following 30 minutes of
familiarization on grammatical melodies, leading to increased attention and better memory
performance.
The main finding of this experiment was some weak evidence for expectancy learning in the
analysis of training effects on memory performance, whereby memory was better for melodies
encountered during the training phase than new grammatical melodies. Thus, a limited amount of
familiarization with the familiarization corpus may have led to the formation of veridical
expectancies that boosted memory performance for those items. However, there was no evidence
that participants learned any structural expectancies based on the grammar that produced the
familiarization items.
One general conclusion that might be drawn is that employing a novel tone set led to the failure
to learn expectancies in this study. However, before drawing such a conclusion it
must be remembered that the discrimination task in this study also failed to demonstrate
sensitivity to the grammatical structure of the sequences. Accordingly, it could be that, for some
reason, the familiarization phase of this experiment was simply inadequate to impart the requisite
grammatical structure of these stimuli. Of course, such a failure is curious given that previous
work by Loui and colleagues (Loui & Wessel, 2008; Loui, et al., 2010) has demonstrated
successful discrimination using almost identical methods.
The use of 18 grammatical melodies repeated 28 times each was an adaptation of Loui & Wessel
(2008), who used 15 melodies repeated 27 times each, with alterations to balance the number of
items in each condition of this design. However, these authors found in a subsequent study (Loui,
et al., 2010) that their effect size for the generalization task, which targets structural knowledge
that is critical for expectancy formation, was much larger when they trained participants on a
larger set of grammatical items. Furthermore, previous research has shown that the statistical
coverage of a familiarization set – whether it contains all of the permissible transitions between
units in a grammar – can determine whether the grammatical structure is learned (Poletiek & van
Schijndel, 2009). Assessment of statistical coverage in these materials revealed that two of the
grammatical transitions (out of 27) were not represented in the familiarization melodies.
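The statistical coverage assessment described above can be sketched as follows: enumerate every transition the grammar permits, then check which of those transitions are attested in the familiarization set. The two-chord grammar below is a toy placeholder, not the actual 27-transition grammar:

```python
def grammar_transitions(progression):
    """All legal ordered tone pairs under the finite-state grammar:
    repeat, move within a chord, or move to a tone of the next chord."""
    legal = set()
    for i, chord in enumerate(progression):
        for a in chord:
            for b in chord:
                legal.add((a, b))            # repeat or within-chord move
            if i + 1 < len(progression):
                for b in progression[i + 1]:
                    legal.add((a, b))        # move to the next chord
    return legal

def coverage(melodies, legal):
    """Fraction of legal transitions attested in a familiarization set."""
    seen = set()
    for m in melodies:
        seen.update(zip(m, m[1:]))
    return len(seen & legal) / len(legal)

# Toy example: a two-chord grammar and a one-melody familiarization set.
prog = [[0, 4], [6, 10]]
legal = grammar_transitions(prog)
cov = coverage([[0, 4, 6, 10]], legal)
```

A coverage value below 1.0 flags grammatical transitions that learners never heard during familiarization, as found for two of the transitions here.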
Before concluding that participants cannot learn musical expectancies with melodic stimuli (such
as those used in Experiments 1 and 3), it must be noted that Tillmann and Poulin-Charronnat
(2010) were recently successful in measuring expectancies in participants following statistical
learning with a novel melodic grammar (though this grammar was constructed using the familiar
Western chromatic tone set). Crucially, these authors employed a tonal priming paradigm, very
similar to the one used in Experiment 2, in order to quantify expectancy learning. Therefore,
perhaps the melody memory task used in the present experiment was not sensitive enough to
detect learned expectancies, or did not assess the specific type of expectancies learned by
participants during familiarization.
Consequently, these methodological issues were addressed by two changes in the next
experiment. First, the familiarization phase was changed to more precisely replicate Loui et al.
(2010) in order to maximize the potential effects of statistical learning. Second, expectancies
were measured using a priming task rather than a memory task.
Chapter 5 Experiment 4: Tonal Priming With Bohlen-Pierce Melodies
16 Introduction
This experiment was conceptually identical to the last experiment, but used different methods in
order to better potentiate expectancy learning in participants. As discussed, previous research has
found larger effect sizes, possibly because of greater statistical coverage, when larger
familiarization sets are used. Thus, in this experiment, participants were trained with 400 distinct
grammatical items, rather than repeating a smaller subset of items, and this familiarization set
was assessed to ensure complete statistical coverage of the familiarization grammar.
Additionally, participants were randomly assigned to Grammar A (the same familiarization
grammar as Experiment 3) or Grammar B (the retrograde of Grammar A), in order to more
completely replicate the design of Loui et al. (2008; 2010).
Finally, the expectancy phase task was changed from melody memory to the tonal priming task
of Tillmann and Poulin-Charronnat (2010). In these authors’ study, participants were trained on a
set of melodies produced from a finite-state grammar that combined Western chromatic tones in
novel ways. Following familiarization, participants were tested with a melodic priming task in
which they had to identify whether a target note was in- or out-of-tune. The melodies used in this
priming task were novel exemplars of melodies based on the familiarization grammar wherein
the target note was manipulated to either follow or violate the familiarization grammar.
Participants showed successful expectancy learning following familiarization, with better
accuracy and faster reaction times for trials in which the target conformed to the grammar than
trials in which the target violated the grammar. The present study made one alteration to this
paradigm; participants were asked to make timbre judgments rather than tuning judgments, since
they were not familiar with the tuning conventions of the Bohlen-Pierce scale.
17 Methods
17.1 Participants
Twenty-two participants were recruited from the University of Toronto Scarborough community
using the introductory psychology participant pool as well as posted advertisements. Participants
were compensated with course credit or $10 per hour.
They consisted of six males and 16 females with a mean age of 19.8 years (SD = 4.9 years).
Participants were selected to have at least five years of formal musical training, due to concerns
about task difficulty for non-musicians. Participants had on average 9.11 years of formal musical
training (SD = 2.8 years), 3.5 years of musical theory training (SD = 3.1 years), played music for
1.8 hours per week (SD = 1.9 hours), and listened to music for 16.5 hours per week (SD = 17.1
hours). No participants reported having absolute pitch, nor did anyone report having taken part in
a music psychology experiment previously.
17.2 Apparatus
This experiment used an identical apparatus to Experiment 1.
17.3 Materials
Tone synthesis:
Three sets of Bohlen-Pierce tones with differing timbres and equal loudnesses were constructed
in CSound according to the frequencies from Table 3. All tones were 500 ms in duration. For the
regular melody notes, a set of pure tones was constructed with 5 ms rise and fall times. For the
target tones (see expectancy phase below), a set of bright-sounding tones (Timbre A) and a set of
dull-sounding tones (Timbre B) was created. All tones consisted of eight harmonics and 5 ms
rise and fall times. Timbre A tones had 1500 times as much energy in the last three harmonics as
in the first five, and Timbre B tones had 1500 times as much energy in the first three harmonics
as in the last five.
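The synthesis can be sketched with additive synthesis in code. The harmonic weighting below is derived from the stated 1500:1 energy ratio under the assumption of equal amplitudes within each band (3 × 50² = 7500 = 1500 × 5 × 1²); the amplitudes are relative and not normalized, and any parameter not stated in the text is an assumption:

```python
import math

SAMPLE_RATE = 44100

def synth_tone(freq, dur=0.5, harmonic_amps=None, ramp=0.005):
    """Additive synthesis of one tone: a sum of sinusoidal harmonics
    shaped by linear 5 ms rise and fall ramps. harmonic_amps gives the
    relative amplitude of each harmonic; a pure tone is [1.0]."""
    if harmonic_amps is None:
        harmonic_amps = [1.0]                      # pure tone
    n = int(SAMPLE_RATE * dur)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = sum(a * math.sin(2 * math.pi * freq * (h + 1) * t)
                for h, a in enumerate(harmonic_amps))
        # linear onset/offset envelope
        env = min(1.0, t / ramp, (dur - t) / ramp)
        samples.append(s * env)
    return samples

# Assumed spectra: energy proportional to amplitude squared, so an
# amplitude of 50 in three harmonics gives 1500x the energy of five
# harmonics at amplitude 1.
bright = [1.0] * 5 + [50.0] * 3   # Timbre A: energy in upper harmonics
dull = [50.0] * 3 + [1.0] * 5    # Timbre B: energy in lower harmonics
```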
Familiarization phase:
The familiarization phase melodies were constructed identically to those from Experiment 3,
from the pure tones described above. However, 400 unique Bohlen-Pierce melodies were
presented in random order during familiarization (rather than 28 melodies x 18), with this
familiarization set designed for full statistical coverage of the familiarization grammar. Two
familiarization sets were created, with each melody in the Grammar B set the retrograde of a
melody in the Grammar A set. Participants were assigned randomly to Grammar A or B.
Discrimination phase:
The discrimination blocks for this experiment were identical to those from Experiment 3;
participants were presented with 10 recognition trials and 10 generalization trials. Because half
the participants were trained on Grammar B items, the correct grammatical response for
Grammar A participants was the wrong response for Grammar B participants, and vice versa.
Expectancy phase and timbre training:
Two sets of 48 priming melodies were constructed from the tone sets described above, one for
Grammar A and one for Grammar B. For the priming trials, each melody was presented in
grammatical and ungrammatical form. To create ungrammatical versions of each melody, a
target note was altered so that it violated the rules of the grammar. Thus, there were 96 priming
trials each for Grammar A and B. All the notes in each priming melody were played as pure
tones, except for the target note. The melodies in each trial were manipulated with respect to
Grammaticality (Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7),
and Familiarization (Familiarization, Novel). Timbre refers to whether the target note was played
with a Timbre A (bright) or B (dull). Target Position refers to which of the eight notes in the
melody was the target. The Familiarization variable refers to whether or not the melody was part
of the familiarization set.
17.4 Procedure
Each participant completed all phases of the experiment.
To start, participants were trained to discriminate Timbre A from Timbre B. First, participants
listened to a randomized set of the 13 Bohlen-Pierce tones in each of the timbres twice, with the
option to listen again until they felt comfortable with the distinction. Next, participants heard a
randomized series of 24 tones, half in Timbre A and half in Timbre B. They were instructed to
indicate which of the two timbres each tone was played with, and were not allowed to proceed
until they achieved a score of at least 20/24 correct.
Lastly, participants were tested with melodies similar to the trials they would encounter in the
priming phase. On each of 15 trials, they heard a melody in which the target note was played by
either Timbre A or Timbre B, and asked to indicate which timbre this was. Participants were
visually-guided in this task (see Figure 24). Each trial began with a fixation cross in the centre of
the screen for 1 second. Then the melody started playing, with each note designated by a white
dot. The note prior to the target was designated by a warning sign, and the target note was
designated by a red question mark. Participants were instructed that upon hearing the target note,
they were to respond as quickly and accurately as possible, indicating which of the two timbres
played the target. Participants were told that the melody would continue playing after the target
note was sounded, and that they must respond within two seconds of hearing the target.
Participants were not allowed to proceed to the familiarization phase until they had scored at
least 12/15 correct.
Figure 24. Visual presentation of priming trials in Experiment 4, based on Figure 2 from
Tillmann & Poulin-Charronnat, 2010.
Familiarization phase:
During the familiarization phase, participants heard either 400 Grammar A or 400 Grammar B
melodies. As in Experiment 3, participants were instructed to listen carefully to the melodies and
were provided with paper and crayons with which they could draw to help keep themselves alert
over the 30 minutes of familiarization.
Discrimination phase:
The discrimination phase was identical to that from Experiment 3.
Expectancy phase:
Before beginning the expectancy trials, participants were reminded of the two timbres with a set
of 13 randomized tones each from Timbre A and Timbre B. They were then reminded of the
instructions for the priming task from timbre training before completing the 96 expectancy trials
for their grammar in random order. The only difference between the timbre training trials and the
expectancy trials was that participants had completed the familiarization phase before the
expectancy trials; thus, an effect of grammaticality was predicted for the expectancy trials.
Following all the experimental trials, participants completed a survey regarding their musical
experience. The entire experimental session lasted approximately 90 minutes.
18 Results
18.1 Discrimination Phase
As in Experiment 3, the total number of correct responses (out of 10) was tallied for each
participant for the recognition and generalization blocks. The scores from each block were then
submitted to a one-sample t-test with µ = 5 (chance performance). Performance in the
recognition block was no different from chance, mean = 4.73 ± 0.34, t(21) = 0.81, p = .43, but
performance in the generalization block was marginally above chance, mean = 5.73 ± 0.37, t(21) =
1.95, p = .07. Grammar B participants outperformed Grammar A participants in the recognition
task, t(21) = 2.39, p = .03; this difference was driven by Grammar A participants performing
worse than chance (Figure 25). There was no difference between Grammar A and Grammar B
for generalization, t(21) = 0.48, p = .64.
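The chance-level comparison described above can be sketched as follows; this is an illustrative one-sample t statistic computed by hand, and the block scores shown are hypothetical, not the experimental data.

```python
# One-sample t-test against chance (mu = 5 out of 10), as used for the
# recognition and generalization blocks. Scores below are hypothetical.
from statistics import mean, stdev
from math import sqrt

def one_sample_t(scores, mu=5.0):
    """Return the t statistic for H0: population mean == mu."""
    n = len(scores)
    return (mean(scores) - mu) / (stdev(scores) / sqrt(n))

scores = [4, 6, 5, 7, 6, 5, 4, 6, 7, 5, 6, 5]  # hypothetical block scores /10
t = one_sample_t(scores)
```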
Figure 25. Effect of familiarization grammar on recognition performance in Experiment 4. [Bar graph: number correct (/10) by training grammar (A vs. B); a dashed line marks chance performance.]
18.2 Expectancy Phase
Responses from the priming task were analyzed with respect to accuracy and reaction time.
Accuracy data:
Raw accuracy data were collapsed across the four melodies in each condition. These mean
accuracy scores were then submitted to repeated measures ANOVA with Grammaticality
(Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7), and
Familiarization (Familiarization, Novel) as factors. There was a main effect of Grammaticality,
F(1,21) = 6.81, MSE = 0.02, p = .02, ɳp2 = .25, with ungrammatical trials receiving more
accurate responses than grammatical trials (Figure 26). The main effects of Timbre, Target
Position, and Familiarization were all non-significant, all F values < 1.93, all p values > .15.
Figure 26. Effect of Grammaticality on accuracy in Experiment 4. [Bar graph: accuracy (%) by grammaticality.]
Three two-way interactions were significant: Grammaticality x Timbre, F(1,21) = 6.26, MSE =
0.01, p = .02, ɳp2 = .23, Grammaticality x Familiarization, F(1,21) = 12.72, MSE = 0.01, p < .01,
ɳp2 = .38, and Timbre x Familiarization, F(1,21) = 26.27, MSE = 0.01, p < .001, ɳp2 = .56. All
other two-, three-, and four-way interactions were non-significant, all F values < 1.28, all p
values > .29.
The significant interactions were investigated using simple effects analyses. The interaction
between Grammaticality and Timbre was driven by the fact that ungrammatical melodies
garnered more accurate responses than grammatical ones for bright target notes, t(21) = 3.52, p <
.01, but not dull ones, t(21) = 0.53, p = .61 (Figure 27).
Figure 27. Grammaticality x Timbre interaction for accuracy in Experiment 4. [Bar graph: accuracy (%) by grammaticality, for bright vs. dull targets.]
The interaction between Grammaticality and Familiarization was driven by the fact that
ungrammatical melodies garnered more accurate responses than grammatical ones for
familiarization melodies, t(21) = 3.94, p = .001, but not novel ones, t(21) = 0.41, p = .68 (Figure
28).
Figure 28. Grammaticality x Familiarization interaction for accuracy in Experiment 4. [Bar graph: accuracy (%) by grammaticality, for training vs. novel melodies.]
Lastly, the interaction between Timbre and Familiarization was driven by the fact that novel
melodies garnered more accurate responses than familiarization ones when the target was bright,
t(21) = 2.98, p = .01, but familiarization melodies had a marginal accuracy advantage over novel
ones when the target was dull, t(21) = 1.92, p = .07 (Figure 29).
Figure 29. Timbre x Familiarization interaction for accuracy in Experiment 4. [Bar graph: accuracy (%) by source of melody (training vs. novel), for bright vs. dull targets.]
Reaction time data:
As in Experiment 2, reaction times were only analyzed for trials where participants answered
correctly, and all reaction times greater than 2000 ms were discarded. All participants performed
at greater than 75% accuracy, so reaction time data for all participants were retained for this
analysis. As with the accuracy data, valid data were then collapsed across the four melodies in each
condition. These mean reaction times were submitted to repeated measures ANOVA with
Grammaticality (Grammatical, Ungrammatical), Timbre (Bright, Dull), Target Position (5, 6, 7),
and Familiarization (Familiarization, Novel) as factors. The main effect of Grammaticality was
significant, F(1,21) = 5.87, MSE = 25833.31, p = .03, ɳp2 = .22, with faster reaction times for
ungrammatical than grammatical trials (Figure 30). The main effect of Timbre was significant,
F(1,21) = 18.55, MSE = 31340.85, p < .001, ɳp2 = .47, with faster reaction times for bright than
dull targets. The main effect of Target Position was also significant, F(2,42) = 10.45, MSE =
13697.40, p < .001, ɳp2 = .33. Bonferroni-corrected pairwise comparisons indicated that this
effect was due to participants responding more slowly to targets at the fifth note of the melody
than at the sixth note, t(21) = 3.21, p < .01, and seventh note, t(21) = 3.56, p < .01. The main effect of
Familiarization was not significant, F(1,21) = 1.06, p = .31. All significant main effects are
illustrated in Figure 30.
Figure 30. Main effects for reaction time in Experiment 4. [Bar graphs: reaction time (ms) by grammaticality, by timbre (bright vs. dull), and by target position (5, 6, 7).]
Two two-way interactions were marginally significant: Grammaticality x Target Position, F(2,42)
= 3.11, MSE = 16960.00, p = .06, ɳp2 = .13, and Grammaticality x Familiarization, F(1,21) =
4.24, MSE = 17812.52, p = .06, ɳp2 = .17. All other two-, three-, and four-way interactions were
non-significant, all F values < 1.94, all p values > .17.
As with accuracy, the interactions for reaction time were investigated using simple effects
analyses. The interaction between Grammaticality and Target Position was driven by the fact that
ungrammatical melodies received faster responses than grammatical ones if the target was the
fifth note of the melody, t(21) = 3.46, p < .01, but not if it was the sixth, t(21) = 0.22, p = .83, or
seventh, t(21) = 1.32, p = .20 (Figure 31).
Figure 31. Grammaticality x Target Position interaction for RT in Experiment 4. [Bar graph: reaction time (ms) by grammaticality, for target positions 5, 6, and 7.]
The interaction between Grammaticality and Familiarization was driven by the fact that
ungrammatical melodies received faster responses than grammatical ones for familiarization
items but not for novel items (Figure 32).
Figure 32. Grammaticality x Familiarization interaction for RT in Experiment 4. [Bar graph: reaction time (ms) by grammaticality, for training vs. novel melodies.]
Effect of familiarization grammar:
Next, the possibility that the familiarization grammar (A or B) affected participant performance
in the priming task was explored. For this analysis, priming effects for accuracy and reaction
time were quantified for each participant. The accuracy priming effect was calculated by
subtracting the participant’s mean accuracy in the grammatical condition from the mean
accuracy in the ungrammatical condition, with a larger positive difference indicating a larger
accuracy advantage for ungrammatical trials. Similarly, the reaction time priming effect was
calculated by subtracting the participant’s mean reaction time in the grammatical condition from
the mean reaction time in the ungrammatical condition, with a larger negative difference indicating a
larger reaction time advantage for ungrammatical trials. An independent-samples t-test indicated
that there was no difference between Grammar A and B participants in terms of their accuracy or
reaction times, all t values < 1.72, all p values > .10.
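The two priming-effect scores can be sketched as follows; the condition means are hypothetical illustrations (not real data), and the function names are mine.

```python
# Priming effects as difference scores: ungrammatical minus grammatical.
def accuracy_priming(acc_grammatical, acc_ungrammatical):
    # Positive value = accuracy advantage for ungrammatical trials.
    return acc_ungrammatical - acc_grammatical

def rt_priming(rt_grammatical, rt_ungrammatical):
    # Negative value = speed advantage for ungrammatical trials.
    return rt_ungrammatical - rt_grammatical

acc_effect = accuracy_priming(0.92, 0.96)  # hypothetical proportions correct
rt_effect = rt_priming(680.0, 640.0)       # hypothetical means in ms
```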
Individual Differences
Individual differences analyses explored whether performance in the discrimination phase was
associated with performance in the expectancy phase. Discrimination scores were calculated for
each participant by averaging their scores from the recognition and generalization blocks.
Bivariate correlations were calculated between discrimination scores and each of the priming
effects calculated for the previous analysis. Neither of the two correlations was significant, although
there was a trend towards a positive relation between discrimination performance and the
priming effect as measured by reaction time, r(20) = .35, p = .11.
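The bivariate correlation step can be sketched as follows; this is a hand-computed Pearson r over hypothetical values (the real analysis was presumably run in a statistics package, and `pearson_r` is my name).

```python
# Pearson correlation between discrimination scores and RT priming
# effects. All values below are hypothetical illustrations.
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

disc_scores = [4, 5, 6, 7, 8]            # hypothetical discrimination scores
rt_effects = [-50, -45, -30, -15, -10]   # hypothetical RT priming effects (ms)
r = pearson_r(disc_scores, rt_effects)
```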
Corpus Analysis
Finally, in an effort to explain why participants may have performed better on ungrammatical
than grammatical trials in both Experiments 3 and 4, an analysis of the familiarization and
expectancy corpuses was conducted. These 10 analyses targeted the various types of statistical
information available in familiarization and expectancy items (on the basis of which participants
may have made their response decisions) and were based on the analyses conducted by Tillmann
& Poulin-Charronnat (2010) on their experimental materials. For each analysis (except for
element frequency, see explanation below), Familiarization Grammar (A, B) and Grammaticality
(Grammatical, Ungrammatical) were used as factors.
1. Element frequency was calculated by taking each of the unique targets occurring in the
expectancy (priming) trials and finding the absolute frequency of its occurrence in the
familiarization corpus. Grammar A and Grammar B employed the same set of
grammatical and ungrammatical targets; Familiarization Grammar was therefore not
analyzed as a factor here. There was no difference in element frequency for grammatical
and ungrammatical items, t(8) = 0.92, p = .39.
2. Repetition frequency was calculated by noting, for each of the 96 priming trials in each
language, whether the target tone was a repetition from two tones previous (n-2) or one
tone previous (n-1). From these, the repetition frequency for each trial (0, 1, or 2) was
computed. Grammatical trials comprised significantly more repetitions (24.5) than
ungrammatical trials (8.5), t(188) = 2.41, p < .001.
3. Melodic contour was defined for each priming trial by the overall contour (rising, falling,
or static) between the n-2 tone and the target. There was no difference in melodic contour
based on Familiarization Grammar or Grammaticality, both t values < 0.60, both p values
> .55.
4. Bigram frequency was calculated by separating all of the priming melodies into their
constituent 2-note chunks. Next, for each of the unique bigrams occurring in the priming
melodies, the frequency of its occurrence in the familiarization corpus was computed.
Grammatical melodies had a higher bigram frequency (88.55) than ungrammatical trials
(35.97), t(57) = 3.64, p = .001.
5. Trigram frequency was calculated identically to bigram frequency, but for 3-note chunks.
Grammar A melodies had a higher trigram frequency (10.94) than Grammar B melodies
(7.34), t(127) = 2.05, p = .04, and grammatical melodies had a higher trigram frequency
(16.24) than ungrammatical melodies (2.03), t(127) = 8.07, p < .001.
6. Associative chunk strength was calculated for each priming sequence by averaging the
chunk frequency (see 4 and 5) associated with each of its constituent bigrams and
trigrams. Grammatical sequences had higher associative chunk strengths (71.12) than
ungrammatical sequences (55.89), t(188) = 5.83, p < .001.
7. A chunk novelty value was assigned to each priming sequence based on whether any
bigram or trigram in that sequence had not occurred in the familiarization sequences (0 =
no novel chunks, 1 = at least one novel chunk). More Grammar B sequences contained
novel chunks (45) than Grammar A sequences (37), t(188) = 2.23, p = .03, and more
ungrammatical sequences contained novel chunks (79) than grammatical sequences (3),
t(188) = 19.30, p < .001.
8. A novel chunk position value was assigned to each priming sequence based on whether
any bigram or trigram in that sequence that had occurred in the familiarization sequences
was occurring in a position it had not occupied in the familiarization sequences (0 = no
chunks are in a novel position, 1 = at least one chunk is in a novel position). More
ungrammatical sequences contained chunks in novel positions (60) than grammatical
sequences (21), t(188) = 6.40, p < .001.
9. First-order transitional probability (TP1) was calculated for each priming sequence. If A
designates the note directly preceding the target, and B designates the target, TP1 was
calculated by finding the ratio between the frequency of AB in the familiarization
sequences and the frequency of A in the familiarization sequences. TP1 was higher in
grammatical sequences (0.21) than ungrammatical sequences (0.03), t(188) = 17.44, p <
.001.
10. Second-order transitional probability (TP2) was calculated for each priming sequence. If
A designates the note occurring two positions before the target, B the note directly
preceding the target, and C the target, TP2 was calculated by finding the ratio between
the frequency of ABC in the familiarization sequences and the frequency of AB in the
familiarization sequences. TP2 was higher in grammatical sequences (0.22) than
ungrammatical sequences (0.01), t(188) = 17.62, p < .001.
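Three of the statistics above (n-gram frequency, TP1, and TP2) can be sketched as follows; the miniature corpus is a hypothetical toy example, not the thesis materials, and the function names are mine.

```python
# Toy illustration of n-gram frequency and first- and second-order
# transitional probabilities over a hypothetical familiarization corpus.
from collections import Counter

def ngram_counts(sequences, n):
    """Count every length-n chunk across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def tp1(sequences, a, b):
    """freq(AB) / freq(A): probability that a is followed by b."""
    return ngram_counts(sequences, 2)[(a, b)] / ngram_counts(sequences, 1)[(a,)]

def tp2(sequences, a, b, c):
    """freq(ABC) / freq(AB)."""
    return ngram_counts(sequences, 3)[(a, b, c)] / ngram_counts(sequences, 2)[(a, b)]

corpus = [["C", "D", "E", "C"], ["C", "D", "C", "E"]]
# "C" occurs 4 times and the bigram "C D" twice, so TP1(C -> D) = 0.5.
```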
19 Discussion
Despite methodological improvements, results for this experiment closely paralleled those from
Experiment 3. In the discrimination phase, participants were not able to perform above chance in
the recognition task, but they performed at marginally significant levels for generalization. The
presentation of 400 different melodies over 30 minutes of familiarization means that
remembering any one melody should be quite difficult. Thus, performance on the recognition
task was expected to be no different from performance on the generalization task, wherein the
grammatical melodies that were presented were novel to participants. At best, participants might
have found the familiarization melodies somewhat familiar, leading to better performance on
recognition than generalization. Therefore, finding that participants fared better at generalization
than recognition is a highly odd result, leading to speculation that one
or both of these results may be due to statistical error. The fact that Grammar B participants
performed better than Grammar A participants at recognition is also unexpected. This seemed to
be a result of the Grammar A participants performing at less-than-chance levels, whereas the
Grammar B participants performed at chance. Overall then, replication of Loui et al.’s results –
successful recognition and generalization and no difference between the Grammar A and B
groups – failed again.
Turning to the expectancy phase, the effect of timbre and target position on reaction time was not
surprising. The bright-sounding targets were more discriminable from the pure tones than were
the dull-sounding targets, and received faster responses. Later targets received faster responses
than earlier targets, probably because participants were more prepared to respond. However, the
priming advantage of ungrammatical trials over grammatical ones observed in Experiment 3
emerged again here, for both accuracy and reaction time. This general trend of better performance for
ungrammatical trials continued through the analyses of interaction effects with grammaticality as
a factor. This ungrammatical advantage is particularly strange considering that grammatical and
ungrammatical trials differed by only one note, and were designed not to differ in terms of
contour.
Furthermore, the extensive corpus analysis did not shed any light on this issue. Calculations of
statistical learning variables (repetition frequency, bigram and trigram frequency, associative
chunk strength, first- and second-order transitional probability) all showed significantly higher
values for grammatical than ungrammatical items. Conversely, calculations of novelty variables
(chunk novelty and novel chunk position) all showed significantly higher values for
ungrammatical than grammatical items. Interestingly, differences between Grammar A and
Grammar B familiarization materials were found for trigram frequency and chunk novelty.
Importantly, however, the effect size of grammaticality for these statistics far
outstripped the effect sizes for those familiarization grammar differences. These findings agree
with past statistical learning research on these statistical variables (Hunt & Aslin, 2001;
Johnstone & Shanks, 1999; Meulemans & Van der Linden, 1997; Tillmann & Poulin-
Charronnat, 2010), which indicates that participants are sensitive to these statistics and should
find grammatical items more familiar than ungrammatical ones, and respond accordingly. In this
experiment, this means that given our stimulus corpus, it is reasonable to expect that participants
would have formed musical expectancies based on the statistical information present in
familiarization items.
Delving further into the interactions observed for accuracy and reaction time produced two
interesting results. First, the timbre effect observed in this experiment does not agree with the
tuning effect observed in Experiment 2. Specifically, Experiment 2 revealed that priming effects
generally only occurred for in-tune trials, where the acoustic surface of the stimulus was not
disrupted. In this experiment, pilot work revealed that the dull targets were more
similar to the pure tone notes comprising the remainder of the melody than were the bright
targets. Thus, if priming differences were observed, they would be expected for dull targets over
bright targets. However, the Grammaticality x Timbre accuracy interaction in this experiment
revealed that the ungrammatical advantage was seen only for bright targets. That said, this
“priming” effect was, of course, not in the expected direction, meaning that predictions
concerning a priming effect may not necessarily be valid in this case. Perhaps the acoustical
disruption caused heightened attention to bright target trials, which led to the ungrammatical
advantage for these trials, whatever the mechanism of that effect may be.
Second, whether or not participants had heard the priming melodies during familiarization had
an important interactive effect on both reaction time and response accuracy. The two-way
interactions involving familiarization indicated that the performance advantages observed for
bright and ungrammatical targets depended on whether the melody was heard during
familiarization. This is a very interesting result, because participants only heard the 400
familiarization melodies once in advance of the priming trials in the expectancy phase. Thus,
hearing a melody just once, embedded in a stream of 399 other melodies, altered processing of
that melody later in the experiment. This indicates that the learning mechanisms being employed
by participants during familiarization are very sensitive, and highlights the benefit of using
implicit methodologies such as the priming paradigm because they enable participants to
showcase learning effects that are presumably unavailable to consciousness.
Overall it can be said that the most important effect observed in this experiment, the unexpected
accuracy and reaction time advantage of ungrammatical over grammatical trials, was very
sensitive to context. This leads to the question of whether this strange effect would be maintained
given a change in context. Moreover, the investigation of contextual properties of the
familiarization materials has been a central theme of this project. Therefore, in the last of this
series of experiments, the final manipulation of familiarity and complexity was conducted, and
the familiarization, discrimination, and expectancy corpuses were composed from unfamiliar
complex units: Bohlen-Pierce chords.
Chapter 6 Experiment 5: Tonal Priming With Bohlen-Pierce Chords
20 Introduction
The primary purpose of this experiment was to test the learning of musical expectancies with a
novel tone set in a harmonic context. Recall that in a Western chromatic context, the learning of
melodic expectancies was not demonstrated in Experiment 1 (although see Tillmann & Poulin-
Charronnat, 2010), whereas some evidence for the learning of harmonic expectancies was seen
in Experiment 2. Thus, it was of interest to investigate harmonic expectancy learning with a
novel tone set here, particularly because it would complete the testing of hypotheses regarding
the role of stimulus familiarity and complexity in the statistical learning of musical expectancies.
Thus, a Bohlen-Pierce chord grammar was created by combining Krumhansl’s (1987) description
of the Bohlen-Pierce system with Jonaitis and Saffran’s (2009) chord grammar, resulting in
familiarization materials composed from an unfamiliar tone
set and complex in texture.
Given the results of Experiments 3 and 4, this experiment also serves to test the context
sensitivity of the ungrammatical priming advantage observed in those experiments. If the
ungrammatical advantage is observed again in this experiment, this provides inductive evidence
that the locus of the effect may be a perceptual novelty effect caused by the Bohlen-Pierce tone
system. However, if a regular priming effect (a grammatical advantage for accuracy and reaction
time) is observed, this will indicate that the ungrammatical advantage is due, not to the
unfamiliar tone set, but something more idiosyncratic about the Bohlen-Pierce melodic stimuli
used in Experiments 3 and 4.
21 Methods
21.1 Participants
Forty participants were recruited from the University of Toronto Scarborough community using
the introductory psychology participant pool as well as posted advertisements. Participants were
compensated with course credit or $10 per hour.
Participants consisted of 11 males and 29 females with a mean age of 19.5 years (SD = 1.8
years). Participants were selected to have at least five years of formal musical training, due to
concerns about task difficulty for non-musicians. Participants had on average 9.2 years of formal
musical training (SD = 2.9 years), 3.7 years of musical theory training (SD = 3.4 years), played
music for 3.3 hours per week (SD = 4.4 hours), and listened to music for 18.5 hours per week
(SD = 15.7 hours). Two participants in the discrimination condition and three participants in the
expectancy condition reported having absolute pitch. Four participants in the discrimination
group and two participants in the expectancy group reported having taken part in a music
psychology experiment previously, but all six reported being naïve to the experimental
hypotheses.
21.2 Apparatus
This experiment used an identical apparatus to Experiment 1.
21.3 Materials
Chord synthesis:
Seven Bohlen-Pierce chords were constructed for this experiment, according to theoretical
descriptions by Krumhansl (1987). Table 7 depicts the tones used for each chord and their
frequencies. Because each chord has a specified inversion (order of pitch heights for the three
notes), the required pitches were generated in the original tritave starting at k = 220 Hz as well as
the tritave below (starting at k ≈ 73.33 Hz, i.e., 220/3) and above (starting at k = 660 Hz).
The seven chords, all 500 ms long, were synthesized in CSound in three different timbres. For
target chords and timbre training, the seven chords were synthesized using the bright (Timbre A)
and dull (Timbre B) tones from Experiment 4. For the majority of the chords in each progression,
a new set of complex tones was synthesized with four octave-separated harmonics played at
equal magnitude and 5 ms rise and fall times. These were used instead of the pure tones (from
Experiment 4) because chords composed of pure tones were very difficult to discriminate from
Timbre B chords during pilot work.
Chord grammars for this study were constructed by replacing the seven nodes in Jonaitis &
Saffran’s (2009) Grammars A and B with the seven Bohlen-Pierce chords described above
(Figure 33). These two Bohlen-Pierce chord grammars were then used to generate chord
progression stimuli for the three experimental phases.
Table 7
Composition of Materials for Experiment 5 (from Krumhansl, 1987).

Chord   Note in Tritave (n)   Frequency (Hz)
I       0                     220.00
        6                     365.29
        10                    512.20
III     3                     283.48
        9                     470.69
        (0)                   660.00
IV      4                     308.48
        10                    512.20
        (1)                   718.20
V       6                     365.29
        12                    606.52
        (3)                   850.45
VI      7                     397.50
        (0)                   660.00
        (4)                   925.44
VIII    [10]                  170.73
        3                     283.48
        7                     397.50
ix      [12]                  202.17
        3                     283.48
        9                     470.69

Note. The column for “n” gives the value for k = 220 Hz. The values for k ≈ 73.33 Hz and k = 660
Hz appear in square and round brackets, respectively, if applicable.
Figure 33. Bohlen-Pierce chord grammars used in Experiment 5. (a) Grammar A; (b) Grammar
B. Composition of these chords is specified in Table 7.
Familiarization phase:
The familiarization phase materials consisted of 50 grammatical chord progressions in each of
the two grammars, from 4 to 10 chords long, repeated four times each in random order such that
no progressions occurred back-to-back. Thus, there were 200 familiarization sequences in
contrast to the 100 sequences from Experiment 2 (based on Experiment 1 from Jonaitis &
Saffran, 2009). The familiarization period here was doubled in imitation of the two
familiarization periods used by Jonaitis and Saffran (2009) in their second experiment, which led
to better grammar learning. This was done because of concerns that the Bohlen-Pierce grammar
would be more difficult to learn because of perceptual unfamiliarity. Participants were randomly
assigned to the Grammar A or Grammar B group.
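One way to produce such an ordering (four repetitions of each progression with no back-to-back repeats) is rejection shuffling; the thesis does not specify its randomization method, so the sketch below is only an illustrative possibility.

```python
# Shuffle 4 copies of each progression until no item appears twice in
# a row (rejection sampling; one of several possible methods).
import random

def order_without_adjacent_repeats(items, reps, seed=None):
    rng = random.Random(seed)
    pool = [it for it in items for _ in range(reps)]
    while True:
        rng.shuffle(pool)
        if all(a != b for a, b in zip(pool, pool[1:])):
            return pool

# e.g., 50 progressions x 4 repetitions = 200 familiarization trials
seq = order_without_adjacent_repeats(range(50), 4, seed=1)
```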
Discrimination phase:
The discrimination phase materials were identical to those of Experiment 2, except for the fact
that they were generated from the Bohlen-Pierce chord grammars described above (rather than
the Phrygian grammar from Jonaitis & Saffran, 2009). Thus the 60 discrimination trials were
manipulated in terms of Grammar (A vs. B) and Correctness (Correct vs. Error).
Expectancy phase and timbre training:
The expectancy phase and timbre training materials were very similar to those of Experiment 4,
except for the fact that they were composed of three-tone chords (and chord progressions) rather
than tones (and melodies). There were 96 priming trials for each grammar, with each Grammar B
chord progression being a retrograde version of a Grammar A progression. Each progression was
either eight chords or 10 chords in length. For eight-chord progressions the target was either the
fourth or fifth chord, and for 10-chord progressions the target was either the fifth or sixth chord.
As in Experiment 4, trials were manipulated in terms of Grammaticality (Grammatical,
Ungrammatical), Timbre (Bright, Dull), and Familiarization (Familiarization, Novel), with 12
progressions corresponding to each combination of these three factors.
21.4 Procedure
Participants were randomly assigned to the discrimination group or the expectancy group. Both
groups took part in the familiarization phase. Only the discrimination group participated in the
discrimination phase, and only the expectancy group participated in the expectancy phase.
Discrimination group:
The discrimination group participated in the familiarization phase and then the discrimination
phase. During the familiarization phase, participants heard 200 familiarization progressions in
their assigned grammar (A or B). Participants rated each progression according to how much
they liked it on a scale from 1-7 (see Experiment 2). Next, in the discrimination phase,
participants heard 60 new progressions and rated each progression according to how similar it
sounded to the familiarization items on a scale from 1-7 (see Experiment 2). Following the
discrimination trials, participants completed a survey regarding their musical experience. The
entire experimental session lasted approximately one hour.
Expectancy group:
The expectancy group was first trained to discriminate between Timbre A and B, using identical
methodology as Experiment 4. Next, they completed the familiarization phase (see description
for discrimination group). Finally, they completed the expectancy phase, again using identical
methodology as Experiment 4. Following the priming trials in the expectancy phase, participants
completed a survey regarding their musical experience. The entire experimental session lasted
approximately one hour.
22 Results
22.1 Discrimination Phase
Similarity ratings from each participant were collapsed across the 15 exemplars in each of the
Grammar/Correctness conditions. These mean ratings were then submitted to repeated measures
ANOVA with Grammaticality (Grammatical, Ungrammatical) and Correctness (Correct, Error)
as factors. There was a main effect of Grammaticality, F(1,19) = 57.93, MSE = 0.51, p < .001,
ɳp2 = .75, with higher similarity ratings for grammatical sequences than ungrammatical ones. The
main effect of Correctness was also significant, F(1,19) = 71.09, MSE = 0.21, p < .001, ɳp2 = .79,
with higher similarity ratings for correct than error items. Finally, the interaction between
Grammaticality and Correctness was significant, F(1,19) = 29.13, MSE = 0.41, p < .001, ɳp2 =
.61. Simple effects indicated that although correct items were rated as significantly more similar
than error items for both grammatical and ungrammatical trials, the correct – error difference
was larger for grammatical trials (Figure 34).
Figure 34. Grammar x Correctness interaction for discrimination phase in Experiment 5. [Bar graph: similarity rating (1 = different, 7 = same) by correctness (correct vs. error), for grammatical vs. ungrammatical progressions.]
22.2 Expectancy Phase
Accuracy data:
Raw accuracy data were collapsed across length, target position, and repetition. These mean
accuracy scores were then submitted to repeated measures ANOVA with Grammaticality
(Grammatical, Ungrammatical), Timbre (Bright, Dull), and Familiarization (Familiarization,
Novel) as factors. None of the main effects were significant, all F values < 3.39, all p values >
The interaction between Grammaticality and Familiarization was significant, F(1,19) = 6.45,
MSE = 0.003, p = .02, ɳp2 = .25. Simple effects analyses indicated that this interaction was driven
by the fact that novel chord progressions received more accurate responses than familiarization
chord progressions on ungrammatical trials, t(19) = 3.10, p = .01, but not grammatical trials, t(19) =
0.13, p = .90 (Figure 35). None of the other two- or three-way interactions were significant, all F
values < 3.06, all p values > .09.
[Figure: mean accuracy (%) as a function of Source of Progression (Training, Novel), plotted separately for grammatical and ungrammatical progressions; asterisks mark the significant difference on ungrammatical trials.]
Figure 35. Grammaticality x Familiarization interaction for accuracy in Experiment 5.
Reaction time data:
Reaction times were only analyzed for trials where participants answered correctly, and all
reaction times greater than 2000 ms were discarded. One participant’s reaction time data were
discarded because he or she scored less than 75% accuracy. The remaining valid data were then
collapsed across length, target position, and repetition. These mean reaction times were
submitted to repeated measures ANOVA with Grammaticality (Grammatical, Ungrammatical),
Timbre (Bright, Dull), and Familiarization (Familiarization, Novel) as factors. None of the main
effects were significant, all F values < 3.06, all p values > .09, nor were any of the interactions,
all F values < 1.72, all p values > .20.
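The trimming procedure just described can be summarized as a short data-cleaning sketch. This is illustrative Python, not the actual analysis code, and the trial-record format is hypothetical:

```python
RT_CUTOFF_MS = 2000   # reaction times above this are discarded
MIN_ACCURACY = 0.75   # participants scoring below this are excluded

def clean_rts(trials):
    """trials: list of dicts with 'participant', 'correct' (bool), and 'rt' (ms)."""
    # Group trials by participant to compute each participant's accuracy.
    by_participant = {}
    for t in trials:
        by_participant.setdefault(t["participant"], []).append(t)
    kept = []
    for pid, ts in by_participant.items():
        accuracy = sum(t["correct"] for t in ts) / len(ts)
        if accuracy < MIN_ACCURACY:
            continue  # exclude inaccurate participants entirely
        # Keep only correct trials with reaction times under the cutoff.
        kept += [t for t in ts if t["correct"] and t["rt"] <= RT_CUTOFF_MS]
    return kept
```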
23 Discussion
The results from this experiment were quite different from those of the two previous experiments
using Bohlen-Pierce stimuli. In Experiments 3 and 4 (Bohlen-Pierce melodies), participants were
unable to perform above chance in the discrimination phase. In this experiment, not only did
participants distinguish between sequences that followed the familiarization grammar and
sequences that did not, they also successfully distinguished between completely correct
sequences and ones that contained grammatical errors. Thus, participants exhibited more
sophisticated comprehension of a Bohlen-Pierce chord grammar after 200 familiarization items
than a Western diatonic chord grammar after 100 familiarization items (i.e., Experiment 2 of this
project, and Jonaitis & Saffran, 2009). The obvious explanation here is that more
familiarization was able to overcome participants’ perceptual unfamiliarity with Bohlen-Pierce
chords and led to better performance. Additionally, as discussed previously, the novel Western
chord grammar had to compete with existing mental representations linking those chords
together (Western harmony), whereas no such representations exist for Bohlen-Pierce chords.
Additionally, this result shows that participants were able to learn the structure of a chord
grammar, but not a melodic grammar composed of Bohlen-Pierce tones. Given that the melodic
grammar from Experiments 3 and 4 was in fact based on a four-chord progression, perhaps this
result reflects the increased contextual information provided by the more complex chords over
the simpler melodies.
One unexpected result from the discrimination phase is that participants reliably judged correct
sequences from the retrograde grammar as more similar to familiarization items than error items
(see Figure 14). This illustrates a surprising ability to extrapolate what grammatical
items should sound like not only going forward, but also in reverse. Sensitivity to retrograde forms has
been demonstrated previously. Dowling (1972) found that listeners could recognize retrograde
transformations of short atonal melodies at above-chance levels. Research on twelve-tone serial
music, in which composers commonly employ backwards iterations of the main structural
sequence (the prime form), has also provided evidence for listener sensitivity to retrogrades.
Krumhansl, Sandell, and Sergeant (1987) demonstrated that following a familiarization period
where listeners were exposed to two prime forms, listeners could match retrograde tone rows
with the corresponding prime form.
The results from this experiment go one step further, however, and reveal that after
familiarization on a large number of grammatical exemplars, participants are able to generalize
their knowledge to novel sequences, both in prime and retrograde form. It is important to note,
however, that this result is currently restricted to Bohlen-Pierce chord stimuli.
Turning to the expectancy phase, in contrast to the ungrammatical advantage observed in
Experiments 3 and 4, there was a complete absence of any priming effect in this experiment.
This was unexpected since it followed the successful grammar learning observed in the
discrimination phase. The reasons for this null effect are unclear. The most obvious explanation
would be that participants simply do not develop musical expectancies from familiarization with
this unfamiliar chord grammar. However, this seems unlikely, considering their robust
performance in the discrimination phase, and the demonstration of some priming effects in
Experiment 2 following weaker discrimination performance. It is possible that Bohlen-Pierce
tones, which are tuned in a way that was reported to be completely novel by all participants, are
perceived and hence processed differently from familiar Western chromatic tones, with which
participants have a lifetime of experience. This differential processing may lead to expectancies
that behave differently. Unfortunately, these speculations were not tested by this experiment, and
not enough experimental work has been conducted on the perception of the Bohlen-Pierce tuning
system to make a conjecture regarding the exact locus of this non-effect.
Chapter 7 General Discussion
23.1 Examination of Research Goals
The main objective of this project was to test the hypothesis that listeners develop musical
expectancies through the mechanism of statistical learning. On this count, the project was to
some extent successful. In Experiment 2, listeners exhibited an accuracy and reaction time
advantage for chord pairs that were related over chord pairs that were unrelated with respect to
the familiarization grammar. This priming effect was analogous to the priming effects that have
been previously observed for materials structured according to Western tonality (for review, see
Tillmann, 2005). As such, Experiment 2’s result provides a valuable piece of evidence for the
notion that musical expectancy, which is ordinarily predicated on knowledge of tonal structure,
can be acquired through statistical learning processes. However, the success at inducing musical
expectancies in Experiment 2 must be weighed against the failure to induce such expectancies in
Experiments 1, 3, 4, and 5.
While there was simply no evidence of expectancy learning in Experiments 1 and 5, it is of
particular interest that the priming effects predicted in Experiments 3 and 4 were in fact
significant, but in the direction opposite to predictions. Thus, with Bohlen-
Pierce melodies, listeners exhibited better memory performance (Experiment 3) and faster, more
accurate responses (Experiment 4) for ungrammatical items than grammatical ones. This
unexpected ungrammatical advantage, as well as the failure to observe expectancy effects in
Experiment 5, may have been the consequence of the stimulus materials; this idea is further
discussed subsequently.
The secondary objective of this project was to investigate how the properties of the stimuli used
in musical statistical learning paradigms affect listeners’ ability to form expectancies. To this
end, two properties of the stimuli, tone set familiarity and textural complexity, were
systematically manipulated.
With respect to tone set familiarity, one can compare the results of experiments that used
materials constructed from a familiar tone set with the results of studies that used a novel tone
set. None of the studies from this project that used novel tones demonstrated successful
expectancy learning, whereas the one example of successful expectancy induction came from a
study that used a Western tone set (Experiment 2). However, before concluding that expectancies
can be learned with familiar tone sets but not novel ones, note that Loui and colleagues (Loui,
Wu, Wessel, & Knight, 2009) were able to use electroencephalography (EEG) to measure
electrophysiological evidence that listeners had learned expectancies through exposure to a
Bohlen-Pierce chord progression. In this study, an early anterior negativity (EAN) was observed
in response to deviant chord progressions that did not match the majority of the progressions
being presented. While this is not strictly evidence that structural learning of the Bohlen-Pierce
chord progression resulted in downstream processing effects (the behavioural benchmark for
expectancy learning defined earlier), the EAN has been well documented as a brain
response to violations of musical expectancy.
With respect to textural complexity, one can compare the results of experiments that used
simply-textured melodies with the results of studies that used complex-textured chord
progressions. None of the studies from this project that used melodic stimuli demonstrated
successful expectancy learning, whereas the one example of successful expectancy induction
came from a study that used harmonic materials (Experiment 2). However, before concluding
that novel expectancies can be learned with chords but not melodies, recall that Tillmann and
Poulin-Charronnat (2010) showed successful induction of expectancies with melodies using a
priming task.
If not for reasons of tone set familiarity and textural complexity, why was expectancy learning
successful in Experiment 2 (Tonal Priming with Western Chords) and in the studies by Loui et
al. (2009) and Tillmann and Poulin-Charronnat (2010)? These three experiments share two
characteristics that may have potentiated expectancy learning. First, all three experiments were
ecologically valid in terms of the way that stimuli were constructed. Specifically, all three
experiments used finite state grammars that defined the transitional dependencies between units,
similar to the way that tonality governs which tones can occur and in what order. This was in
contrast to the melodic segmentation explored in Experiment 1, for instance. However, it should
be noted that Experiments 3 and 4 used a grammar identical to the one employed by Loui et al.
(2009) and failed to demonstrate expectancy learning. In this case, perhaps the behavioural
priming task (Experiments 3 and 4) was not as sensitive as EEG (Loui et al., 2009) in detecting
expectancy effects; this idea will be discussed further subsequently.
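A finite state grammar of the kind described here can be sketched as a small state-transition table that is walked to emit a chord sequence. The states and chord labels below are hypothetical, chosen only to illustrate the mechanism; this is not the grammar actually used in these experiments or by Loui et al.:

```python
import random

# Hypothetical finite-state grammar: each state lists (next_state, chord) transitions.
GRAMMAR = {
    "S1": [("S2", "c1"), ("S3", "c2")],
    "S2": [("S3", "c3"), ("END", "c4")],
    "S3": [("S2", "c5"), ("END", "c6")],
}

def generate_sequence(rng, start="S1"):
    """Walk the grammar from the start state, emitting one chord per transition."""
    state, chords = start, []
    while state != "END":
        state, chord = rng.choice(GRAMMAR[state])
        chords.append(chord)
    return chords

sequence = generate_sequence(random.Random(0))
```

Every sequence generated this way is grammatical by construction, and ungrammatical items can be created by substituting a chord that violates the transition table.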
Second, expectancies were measured in these three studies using methods that have been
employed extensively to measure expectancy effects produced by real musical structure. Two of
the studies measured expectancy through priming, and the third used EEG to measure
the EAN, a frequently replicated brain indicator of
musical expectancy violation. This is in contrast to the melody memory task used in Experiments
1 and 3, which has not been commonly used in an expectancy context and may not be sensitive
enough to detect any weak, nascent expectancies that developed in these two experiments. Oddly,
however, Experiments 4 and 5 both failed to show expectancy learning using a priming task.
Perhaps the expectancies developed with the Bohlen-Pierce tones were weaker than the ones
developed with familiar Western tones, and therefore required a more sensitive method of
measuring expectancy, such as EEG.
Based on the assembled evidence then, the differences in the current project between studies that
used a familiar tone set and studies that used a novel tone set, and between studies that used
melodic stimuli and studies that used harmonic stimuli, seem to be based upon idiosyncratic
features of the studies themselves rather than upon systematic differences between the
apprehension of expectancies from familiar versus novel tone sets, and melodies versus chords.
23.2 Refining the Proposed Model of Expectancy Learning
One question arising from these studies is how the discrimination and expectancy tasks actually
relate to one another. These experiments were structured on the assumption that discrimination
reflects mental representation of stimulus structure, whereas priming or memory performance
reflects the manifestation of expectancies based on that structure. Within this framework,
statistical learning of stimulus structure is thought to precede the expression of expectancy
effects due to that learning.
In Experiments 1 and 5, participants performed above chance at discrimination but failed to
show any expectancy effects. In Experiment 2, participants performed successfully in both the
discrimination and expectancy phases. These results would be expected based on the framework
described above. In Experiments 3 and 4, however, participants performed at chance in the
discrimination phase and then showed significant priming effects in the expectancy phase, albeit
in an unexpected direction. If structure learning precedes expectancy expression, and assuming
the proper measurement of discrimination and expectancy, these results are clearly contradictory.
Therefore, these assumptions must be reconsidered.
Borrowing from the ideas of Perruchet and Pacton (2006) regarding statistical learning and
chunk formation, there are three possibilities here. First, as originally assumed, statistical
learning of structure may precede expectancy formation. As mentioned, the results of
Experiments 3 and 4 contradict this model, although the unexpected direction of this effect
suggests that this ungrammatical advantage may be the result of a perceptual novelty effect
rather than the reflection of expectancy learning.
Second, expectancy formation may precede the apprehension of structure from statistical cues.
This model operates on the idea that expectancies are formed implicitly based upon structure,
and the expression of these expectancies allows listeners to become aware of the statistical
information underlying those expectancies. Thus, although statistical structure is required for the
formation of expectancies, listeners are able to convey their expectancies before being able to
convey knowledge of the structure upon which they are based. This model speaks to the results of
Experiments 3 and 4, and supports the idea that the priming task was a more sensitive measure of
listeners’ structural understanding (via expectancy) than the more explicit discrimination task.
However, this model conflicts with the results from Experiment 5, in which listeners exhibited
highly refined discrimination performance and a complete lack of learned expectancies in the
priming task.
The last possibility, and the most likely, is a combination of the previous two models. According
to this account, structure learning and expectancy formation both rely on statistical cues in the
stimulus. However, these two putative processes would operate in parallel and, under some
conditions, would be able to feed back into one another. Thus, successful
discrimination does not necessarily imply successful expectancy learning, and vice versa.
Turning to Experiments 3 and 4, perhaps there was not enough familiarization for the statistical
cues to transitional dependencies in the familiarization grammar to be absorbed. Rather, more
global statistical cues concerning the tone set may have been picked up by listeners. If this was
the case, then discrimination performance would have suffered because this task specifically
targeted transitional dependency information. However, listeners would have the ability to
express expectancies based on tone set learning – as discussed previously, grammatical melodies
used a more restricted tone set than ungrammatical ones – leading to expectancy formation and
the ensuing novelty effect. If a longer familiarization phase had been used, perhaps the statistical
cues to transitional dependencies would have surpassed some threshold for structural learning,
and led to successful discrimination as well as a classic priming effect in which performance was
better for grammatical than ungrammatical items.
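The two kinds of statistical cue contrasted above can be made concrete: first-order transitional probabilities P(next | current), which the discrimination task targeted, versus the cruder, more global cue of which tones occur at all. An illustrative Python sketch of both computations:

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """First-order transitional probabilities P(next | current), pooled over sequences."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
            for cur, counts in pair_counts.items()}

def tone_set(sequences):
    """The more global cue: simply which units occur in the corpus."""
    return {unit for seq in sequences for unit in seq}

probs = transition_probabilities([["a", "b", "c"], ["a", "b", "d"]])
# probs["a"]["b"] -> 1.0; probs["b"]["c"] -> 0.5
```

On this view, a grammatical item can be detected either because its transitions are probable or, more coarsely, because it draws only on the familiar tone set.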
23.3 Improving Methodology
Another question that arises from these inconsistent results is whether behavioural
experimentation is the best methodology available to study statistical learning. Behavioural work
has had reasonable success explaining how listeners develop psychological representations of
musical structure. However, these paradigms have three shortcomings when it comes to
statistical learning. First, effects, when found, are generally small in magnitude, which makes
detecting them an onerous task. For example, Jonaitis and Saffran (2009) had to train listeners
with two sessions on two different days before listeners could distinguish correct grammatical
items from ones containing errors. It is possible that the failure to induce expectancies in some of
the present experiments was attributable to familiarization periods that were too short.
Second, behavioural experiments require listeners to be aware of the responses they are making,
whereas statistical learning is largely an implicit process. The use of implicit methods such as the
priming paradigm can alleviate this problem somewhat, although one may still argue that
priming tasks are not as sensitive in the detection of nascent expectancies as are brain techniques
such as EEG (Loui et al., 2009).
Finally, and most importantly, a third disadvantage of many behavioural paradigms, including
the ones employed in this project, is that they assess the end products of learning rather than
the process itself. Of course, it is possible to measure the status of learned representations
multiple times during familiarization. For instance, Rohrmeier, Rebuschat, and Cross (2011)
found that listeners who did not experience the familiarization phase performed above chance in
the latter parts of their discrimination phase, indicating that they were able to pick up statistical
cues from the discrimination trials. However, these types of paradigms tend to be complicated in
design and time-consuming to run, which puts participants at risk for fatigue effects before the
experiment is over.
Therefore, developing a new methodology to study statistical learning, one that surmounts these
problems, would be highly beneficial. As alluded to previously, one promising approach would
be to use electroencephalography (EEG) or magnetoencephalography (MEG) to measure and
localize event-related potentials (ERPs) related to musical expectancy in the brain. ERPs can be
measured continuously during learning and without listeners being aware of the phenomenon
being studied, which may lead to larger effect sizes. Importantly, musical expectancy has been
extensively studied using these techniques (particularly EEG), so the brain indices that are
modulated by expectancy differences have been described in great detail. These ERPs include
the EAN, described previously, and the late negativity (LN), which appears around 400 ms post-stimulus
and has a fronto-central scalp distribution (Koelsch et al., 2000; Koelsch & Siebel,
2005; Maess et al., 2001). The EAN is thought to reflect representations of stimulus structure
(i.e., tonality), whereas the LN is thought to reflect the neural integration of each event into the
structural whole (Koelsch & Siebel, 2005).
As described briefly in the previous section, Loui et al. (2009) measured ERPs while participants
listened to Bohlen-Pierce chord progressions. Most of the chord sequences followed a
grammatical system, while a small number of the sequences contained ungrammatical targets.
These ungrammatical targets elicited both the EAN and the LN. Critically, the EAN response
grew in magnitude over time, indicating that as participants encountered more grammatical
exemplars, their knowledge of the chord grammar grew, resulting in stronger responses to
grammatical violations. These results indicate that brain techniques can offer important insights
into the statistical learning process. EEG and MEG are potentially the best way to proceed in
examining the main questions of this project, given the inconsistent results obtained with this set
of behavioural experiments.
23.4 Conclusions
In sum, this project presented a systematic behavioural exploration of the statistical learning of
musical expectancy. Results indicated that statistical learning can indeed contribute to the
formation of musical expectancies, particularly if the novel structure being learned is composed
from Western chromatic tones, with which listeners have extensive perceptual experience.
Listeners have much more difficulty learning musical structures composed of unfamiliar tones,
such as those of the Bohlen-Pierce scale. Future research will focus on the measurement of
event-related potentials as indicators of musical expectancy learning in vivo, which will provide
a more sensitive method to elucidate the relation between statistical learning and mental
representations of musical expectancy.
References
Bartlett, J. C., & Dowling, W. J. (1980). Recognition of transposed melodies: A key-distance
effect in developmental perspective. Journal of Experimental Psychology: Human
Perception & Performance, 6(3), 501-515.
Bharucha, J. J., & Stoeckig, K. (1986). Reaction time and musical expectancy: Priming of
chords. Journal of Experimental Psychology: Human Perception & Performance, 12(4),
403-410.
Bharucha, J. J., & Stoeckig, K. (1987). Priming of chords: Spreading activation or overlapping
frequency spectra? Perception & Psychophysics, 41(6), 519-524.
Bigand, E., & Pineau, M. (1997). Global context effects on musical expectancy. Perception &
Psychophysics, 59(7), 1098-1107.
Bigand, E., Poulin, B., Tillmann, B., Madurell, F., & D'Adamo, D. A. (2003). Sensory versus
cognitive components in harmonic priming. Journal of Experimental Psychology: Human
Perception & Performance, 29(1), 159-171.
Boltz, M. (1991). Some structural determinants of melody recall. Memory & Cognition, 19(3),
239-251.
Boltz, M. (1993). The generation of temporal and melodic expectancies during musical listening.
Perception & Psychophysics, 53, 585-600.
Carlsen, J. C. (1981). Some factors which influence melodic expectancy. Psychomusicology, 1,
12-29.
Castellano, M. A., Bharucha, J. J., & Krumhansl, C. L. (1984). Tonal hierarchies in the music of
North India. Journal of Experimental Psychology: General, 113(3), 394-412.
Creel, S., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: Statistical learning of
nonadjacent dependencies in tone sequences. Journal of Experimental Psychology:
Learning, Memory, & Cognition, 30(5), 1119-1130.
Cuddy, L. L., & Badertscher, B. (1987). Recovery of the tonal hierarchy: Some comparisons
across age and levels of musical experience. Perception & Psychophysics, 41(6), 609-
620.
Cuddy, L. L., Cohen, A., & Mewhort, D. J. (1981). Perception of structure in short melodic
sequences. Journal of Experimental Psychology: Human Perception & Performance,
7(4), 869-883.
Cuddy, L. L., Cohen, A. J., & Miller, J. (1979). Melody recognition: The experimental
application of musical rules. Canadian Journal of Psychology, 33, 148-157.
Cuddy, L. L., & Lunney, C. A. (1995). Expectancies generated by melodic intervals: Perceptual
judgments of melodic continuity. Perception & Psychophysics, 57, 451-462.
Dewar, K. M., Cuddy, L. L., & Mewhort, D. J. K. (1977). Recognition memory for single tones
with and without context. Journal of Experimental Psychology: Human Learning &
Memory, 3, 60-67.
DeWitt, L. A., & Crowder, R. G. (1986). Recognition of novel melodies after brief delays. Music
Perception, 3(3), 259-274.
Dowling, W. J. (1972). Recognition of melodic transformations: Inversion, retrograde, and
retrograde inversion. Perception & Psychophysics, 12(5), 417-421.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies.
Psychological Review, 85(4), 341-354.
Dowling, W. J. (1991). Tonal strength and melody recognition after long and short delays.
Perception & Psychophysics, 50(4), 305-313.
Dowling, W. J., & Bartlett, J. C. (1981). The importance of interval information in long-term
memory for melodies. Psychomusicology, 1(1), 30-49.
Dowling, W. J., & Fujitani, D. S. (1971). Contour, interval, and pitch recognition in memory for
melodies. The Journal of the Acoustical Society of America, 49(2), 524-531.
Dowling, W. J., Kwak, S., & Andrews, M. W. (1995). The time course of recognition of novel
melodies. Perception & Psychophysics, 57(2), 136-149.
Endress, A. D. (2010). Learning melodies from non-adjacent tones. Acta Psychologica, 135,
182-190.
Frances, R. (1972). La perception de la musique (2nd ed.). Paris: Vrin.
Graf Estes, K., Evans, J. L., Alibali, M. W., & Saffran, J. R. (2007). Can infants map meaning to
newly segmented words? Statistical segmentation and word learning. Psychological
Science, 18(3), 254-260.
Hunt, R. H., & Aslin, R. N. (2001). Statistical learning in a serial reaction time task: Access to
separable statistical cues by individual learners. Journal of Experimental Psychology:
General, 130(4), 658-680.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge: MIT Press.
Johnstone, T., & Shanks, D. R. (1999). Two mechanisms in implicit artificial grammar learning?
Comment on Meulemans and Van der Linden (1997). Journal of Experimental
Psychology: Learning, Memory, & Cognition, 25(2), 524-531.
Jonaitis, E. M., & Saffran, J. R. (2009). Learning harmony: The role of serial statistics. Cognitive
Science, 33, 951-968.
Jones, M. R. (1990). Musical events and models of musical time. In R. Block (Ed.), Cognitive
models of psychological time. Hillsdale, N.J.: Lawrence Erlbaum.
Kessler, E. J., Hansen, C., & Shepard, R. N. (1984). Tonal schemata in the perception of music
in Bali and in the West. Music Perception, 2(2), 131-165.
Koelsch, S., Gunter, T., & Friederici, A. D. (2000). Brain indices of music processing:
"Nonmusicians" are musical. Journal of Cognitive Neuroscience, 12(3), 520-541.
Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in
Cognitive Sciences, 9(12), 578-584.
Krumhansl, C. L. (1987). General properties of musical pitch systems: Some psychological
considerations. In J. Sundberg (Ed.), Harmony and Tonality. Stockholm: Royal Swedish
Academy of Music.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University
Press.
Krumhansl, C. L. (1995a). Effects of musical context on similarity and expectancy.
Systematische Musikwissenschaft, 3, 211-250.
Krumhansl, C. L. (1995b). Music psychology and music theory: Problems and prospects. Music
Theory Spectrum, 17, 53-80.
Krumhansl, C. L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126(1),
159-179.
Krumhansl, C. L., Bharucha, J. J., & Kessler, E. J. (1982). Perceived harmonic structure of
chords in three related musical keys. Journal of Experimental Psychology: Human
Perception & Performance, 8(1), 24-36.
Krumhansl, C. L., Sandell, G. J., & Sergeant, D. C. (1987). The perception of tone hierarchies
and mirror forms in twelve-tone serial music. Music Perception, 5(1), 31-78.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions
within a diatonic context. Journal of Experimental Psychology: Human Perception &
Performance, 5(4), 579-594.
Krumhansl, C. L., Toivanen, P., Eerola, T., Toiviainen, P., Järvinen, T., & Louhivuori, J. (2000).
Cross-cultural music cognition: Cognitive methodology applied to North Sami yoiks.
Cognition, 76, 13-58.
Laitz, S. (2008). The complete musician: An integrated approach to tonal theory, analysis, and
listening. (2nd ed.). New York: Oxford University Press.
Lerdahl, F., & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge: MIT Press.
Loui, P., & Wessel, D. (2008). Learning and liking an artificial musical system: Effects of set
size and repeated exposure. Musicae Scientiae, 12(2), 207-230.
Loui, P., Wessel, D. L., & Kam, C. L. H. (2010). Humans rapidly learn grammatical structure in
a new musical scale. Music Perception, 27(5), 377-388.
Loui, P., Wu, E. H., Wessel, D. L., & Knight, R. T. (2009). A generalized mechanism for
perception of pitch patterns. The Journal of Neuroscience, 29(2), 454-459.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.).
Mahwah, N.J.: Lawrence Erlbaum Associates.
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in
Broca's area: An MEG study. Nature Neuroscience, 4(5), 540-545.
Manzara, L. C., Witten, I. H., & James, M. (1992). On the entropy of music: An experiment with
Bach chorale melodies. Leonardo, 2, 81-88.
Margulis, E. H. (2005). A model of melodic expectation. Music Perception, 22(4), 663-714.
Marmel, F., & Tillmann, B. (2009). Tonal priming beyond tonics. Music Perception, 26(3), 211-
221.
Marmel, F., Tillmann, B., & Delbe, C. (2010). Priming in melody perception: Tracking down the
strength of cognitive expectations. Journal of Experimental Psychology: Human
Perception & Performance, 36(4), 1016-1028.
Marmel, F., Tillmann, B., & Dowling, W. J. (2008). Tonal expectations influence pitch
perception. Perception & Psychophysics, 70(5), 841-852.
Meulemans, T., & Van der Linden, M. (1997). Associative chunk strength in artificial grammar
learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 23(4),
1007-1028.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago, IL: University of Chicago Press.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity
for processing information. Psychological Review, 63(2), 81-97.
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication-
realization model. Chicago, IL: University of Chicago Press.
Narmour, E. (1992). The analysis and cognition of melodic complexity: The implication-
realization model. Chicago, IL: University of Chicago Press.
Oram, N., & Cuddy, L. L. (1995). Responsiveness of Western adults to pitch-distributional
information in melodic sequences. Psychological Research, 57, 103-118.
Pearce, M. T., & Wiggins, G. A. (2006). Expectation in Melody: The Influence of Context and
Learning. Music Perception, 23(5), 377-405.
Perruchet, P., & Pacton, S. (2006). Implicit learning and statistical learning: One phenomenon,
two approaches. Trends in Cognitive Sciences, 10(5), 233-238.
Poletiek, F. H., & van Schijndel, T. J. P. (2009). Stimulus set size and statistical coverage of the
grammar in artificial grammar learning. Psychonomic Bulletin & Review, 16(6), 1058-
1064.
Rameau, J.-P. (1971). Treatise on harmony. Mineola, N.Y.: Dover Publications.
Rohrmeier, M., Rebuschat, P., & Cross, I. (2011). Incidental and online learning of melodic
structure. Consciousness and Cognition, 20, 214-222.
Saffran, J. R. (2003a). Absolute pitch in infancy and adulthood: The role of tonal structure.
Developmental Science, 6(1), 37-49.
Saffran, J. R. (2003b). Musical learning and language development. Annals of the New York
Academy of Sciences, 999, 1-5.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants.
Science, 274, 1926-1927.
Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning: Evidence
for developmental reorganization. Developmental Psychology, 37(1), 74-85.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone
sequences by human infants and adults. Cognition, 70, 27-52.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of
distributional cues. Journal of Memory and Language, 35, 606-621.
Saffran, J. R., Reeck, K., Niebuhr, A., & Wilson, D. (2005). Changing the tune: The structure of
the input affects infants' use of absolute and relative pitch. Developmental Science, 8(1),
1-7.
Schellenberg, E. G. (1996). Expectancy in melody: Tests of the implication-realization model.
Cognition, 58, 75-125.
Schenker, H. (1954). Harmony (E. M. Borgese, Trans.). Cambridge: MIT Press.
Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic
processes. Music Perception, 7(2), 109-150.
Schmuckler, M. A. (1997). Expectancy effects in memory for melodies. Canadian Journal of
Psychology, 51(4), 292-305.
Smith, N. A., & Schmuckler, M. A. (2004). The perception of tonal structure through the
differentiation and organization of pitches. Journal of Experimental Psychology: Human
Perception & Performance, 30(2), 268-286.
Tekman, H. G., & Bharucha, J. J. (1998). Implicit knowledge versus psychoacoustic similarity in
priming of chords. Journal of Experimental Psychology: Human Perception &
Performance, 24(1), 252-260.
Thompson, W. F., Cuddy, L. L., & Plaus, C. (1997). Expectancies generated by melodic
intervals: Evaluation of principles of melodic implication in a melody production task.
Perception & Psychophysics, 59, 1069-1076.
Tillmann, B. (2005). Implicit investigations of tonal knowledge in nonmusician listeners. Annals
of the New York Academy of Sciences, 1060, 100-110.
Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-
organizing approach. Psychological Review, 107(4), 885-913.
Tillmann, B., Bigand, E., & Pineau, M. (1998). Effects of global and local contexts on harmonic
expectancy. Music Perception, 16(1), 99-117.
Tillmann, B., Janata, P., Birk, J., & Bharucha, J. J. (2003). The costs and benefits of tonal centers
for chord processing. Journal of Experimental Psychology: Human Perception &
Performance, 29(2), 470-482.
Tillmann, B., Janata, P., Birk, J., & Bharucha, J. J. (2008). Tonal centers and expectancy:
Facilitation or inhibition of chords at the top of the harmonic hierarchy? Journal of
Experimental Psychology: Human Perception & Performance, 34(4), 1031-1043.
Tillmann, B., & Poulin-Charronnat, B. (2010). Auditory expectations for newly acquired
structures. The Quarterly Journal of Experimental Psychology, 63(8), 1646-1664.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity to Western
musical structure. Journal of Experimental Psychology: Human Perception &
Performance, 18(2), 394-402.
Trainor, L. J., & Trehub, S. E. (1993). Musical context effects in infants and adults: Key
distance. Journal of Experimental Psychology: Human Perception & Performance, 19(3),
615-626.
Trehub, S. E. (2003). Absolute and relative pitch processing in tone learning tasks.
Developmental Science, 6(1), 44-45.
Trehub, S. E., Schellenberg, E. G., & Kamenetsky, S. B. (1999). Infants' and adults' perception
of scale structure. Journal of Experimental Psychology: Human Perception &
Performance, 25(4), 965-975.
Unyk, A. M., & Carlsen, J. C. (1987). The influence of expectancy on melodic perception.
Psychomusicology, 7, 3-23.