LEXICAL TONE PERCEPTION AND
PRODUCTION: THE ROLE OF LANGUAGE
AND MUSICAL BACKGROUND
Barbara Schwanhäußer
M. A.
A thesis submitted for the degree of Doctor of Philosophy
University of Western Sydney
March 2007
I hereby declare that this submission is my own work and, to the best of my
knowledge, it contains no material previously published or written by another person,
nor material which has been accepted for the award of any other degree or diploma at
the University of Western Sydney, or any other educational institution, except where
due acknowledgment is made in the thesis.
I also declare that the intellectual content of this thesis is the product of my own work,
except to the extent that assistance from others in the project's design and conception
is acknowledged.
_________________________
Acknowledgements
There are many people I would like to thank for a large variety of reasons.
Firstly, I would like to thank my supervisor, Prof. Denis Burnham. I could not have
imagined having a better supervisor for my PhD, and without his support, knowledge,
perceptiveness, and good humour I would never have been able to do this work.
His expertise, understanding, and patience added considerably to my PhD experience.
I appreciate his vast knowledge and skill in many areas, such as psychology and
statistics, and his assistance in writing reports (e.g., scholarship applications, research
proposals, papers, and this thesis). He has taught me that scientific research can be not
only rewarding and creative, but also heaps of fun. Thanks a lot, Denis, for your
enthusiasm and for being a great supervisor!
Thanks to Dr. Kate Stevens for helpful feedback in all matters and also to Dr. Chris
Davis, Dr. Christine Kitamura, and Dr. Hartmut Pfitzinger who were on my
supervisory panel.
I was very lucky to work with the people at MARCS Auditory Laboratories, and I am
very thankful for everything I learned over the last few years, including setting up my
experiments at Chulalongkorn University in Bangkok during 2004. I am very glad
that I had the opportunity to visit different laboratories in Germany (thanks to Prof.
Tillmann at IPSK, Munich), Croatia (thank you Dr. Glovacki-Bernardi at the
University of Zagreb), Thailand (Prof. Luksaneeyanawin at CRSLP, Chulalongkorn
University, Bangkok), and the United States (Prof. Ohala, University of California,
Berkeley; Prof. Keating at University of California, Los Angeles; Dr. Patel at The
Neurosciences Institute, San Diego; Prof. Stevens at MIT, Boston; Prof. Byrd at
University of Southern California, Los Angeles) during 2003, 2004, and 2005, and for
support in attending the INTERSPEECH conference in Lisbon, Portugal in 2005 and
the ICMPC conference in Bologna, Italy in 2006.
I would also like to thank the academic and support staff of MARCS Auditory
Laboratories at University of Western Sydney, especially Gail Charlton for assisting
with travel arrangements, Mel Gallagher for her help with ethics and formatting, Dr.
Caroline Jones for her patience and help with set-up and data analysis in the first
stages, Brett Molesworth for help in all areas, Dr. Christian Kroos for assistance with
data analysis and Colin Schoknecht, Arman Abrahamyan (last-minute angel), and
Johnson Chen for technical support.
And a big thanks goes to the people who proofread my thesis: Jess Hartcher-O'Brien,
Nicole Lees, Dr. Christian Kroos, Dr. Kai Mueller, Michael Nash, Dr. Michael Tyler,
and Nan Xu.
I also want to thank Prof. Marcus Taft who made it possible to set up the first
experiment at UNSW, Prof. Sudaporn Luksaneeyanawin for allowing me to test at
CRSLP (Chulalongkorn University, Bangkok), and a very special thank you to
Benjawan Kasisopa for helping me in Bangkok and later at MARCS with everything
related to Thai language and lexical tone. And thank you Sarinya Chompoobutr and
Noppawan Nomura for testing Thai participants in Bangkok. Thanks also to Patana
Surawatanapongs and Naruechol Chuenjamnong for lending their voices in the third
experiment.
I would like to say a big thank-you to all the people who agreed to do the experiments
for this thesis: Brooke Adam, Mary Broughton, Tim Byron, Sean Coward, Mel
Gallagher, Renee Glass, Shaun Halovic, Jemma Harris, Jess Hartcher-O'Brien,
Graham Howard, Clare Howell, Nicole Lees, Paul Mason, Karen Mattock, Brett
Molesworth, Damien Smith, Michael Tyler, and all other students who participated in
the experiments at University of Western Sydney, University of New South Wales,
Chulalongkorn University, and Assumption University in Bangkok.
Thanks to my officemates and friends, Bettina, Jemma, and Mary, for sharing the
room and much more over the last few years, and to Pimo Söderbohm for inspiration.
I have to say a huge Danke! to my friends in Germany and all over the world, for
supporting me in so many different ways.
And a big thank you to Michael, for all the help with programming and formatting,
for giving me love and encouragement, and for making me laugh. You are the best!
Finally, I would like to thank my family. I could not have done this without my
parents' and my sisters' love and support.
Ihr seid Wahnsinn! (You are incredible!)
Abstract
This thesis is concerned with the perception and production of lexical tone. In the first
experiment, categorical perception of asymmetric synthetic tone continua was
examined in speakers of tonal (Thai, Mandarin, and Vietnamese) and non-tonal
(Australian English) languages. It was observed that perceptual strategies for
categorisation depend on language background. Specifically, Mandarin and
Vietnamese listeners tended to use the central tone to divide the continuum, whereas
Thai and Australian English listeners used a flat no-contour tone as a perceptual
anchor; a split based not on tonal vs. non-tonal language background, but rather on the
specific language. In the second experiment, tonal (Thai) and non-tonal (Australian
English) language speaking musicians and non-musicians were tested on categorical
perception of two differently shaped synthetic tone continua. Results showed that,
independently of language background, musicians learn to identify tones more
quickly, show steeper identification functions, and display higher discrimination
accuracy than non-musicians. The third experiment concerned the influence of
language aptitude, musical aptitude, musical memory, and musical training on
Australian English speakers' perception and production of non-native (Thai) tones,
consonants, and vowels. The results showed that musicians were better than
non-musicians at perceiving and producing tones and consonants; a ceiling effect was
observed for vowel perception. Musical training per se did not determine acquisition
of novel speech sounds; rather, musicians' higher accuracy was explained by a
combination of inherent abilities: language and musical aptitude for consonants, and
musical aptitude and musical memory for tones. It is concluded that tone perception is
language dependent and strongly influenced by musical expertise (musical aptitude
and musical memory), not musical training as such.
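The identification measures summarised above (the crossover point between tone categories and the steepness of the identification function) can be sketched minimally as follows. The helper function and the data values below are purely illustrative assumptions, not the thesis's actual stimuli or analysis code: a listener labels each step of a synthetic tone continuum, and we estimate where the labelling crosses 50% and how abruptly it changes.

```python
def crossover_and_slope(proportions):
    """Given p("category A") at each continuum step (steps numbered from 1),
    return (crossover step, peak local slope).

    Crossover: linearly interpolated step at which p crosses 0.5.
    Peak slope: largest absolute change between adjacent steps, a rough
    proxy for the steepness of the identification function.
    """
    crossover = None
    for i in range(len(proportions) - 1):
        a, b = proportions[i], proportions[i + 1]
        # The 0.5 line is crossed between step i+1 and step i+2.
        if (a - 0.5) * (b - 0.5) <= 0 and a != b:
            crossover = (i + 1) + (0.5 - a) / (b - a)
            break
    peak_slope = max(abs(proportions[i + 1] - proportions[i])
                     for i in range(len(proportions) - 1))
    return crossover, peak_slope

# Hypothetical 7-step continuum: a steep (categorical-looking) listener...
steep = [0.98, 0.97, 0.92, 0.55, 0.10, 0.04, 0.02]
# ...and a shallower (more continuous) listener.
shallow = [0.90, 0.80, 0.68, 0.55, 0.42, 0.30, 0.20]

print(crossover_and_slope(steep))
print(crossover_and_slope(shallow))
```

A more categorical listener yields a larger peak slope than a more continuous one; this is the sense in which musicians' identification functions are described as "steeper" than non-musicians'.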
LIST OF FIGURES ........................................................................................................................... XIV
LIST OF TABLES ........................................................................................................................... XVII
CHAPTER 1 INTRODUCTION ................................................................................................................ 1
1.1 OVERVIEW ...................................................................................................................................... 2
1.2 ORGANISATION OF THESIS .............................................................................................................. 2
CHAPTER 2 SPEECH PERCEPTION AND CATEGORICAL PERCEPTION ............................ 4
2.1 THE NATURE OF LANGUAGE AND SPEECH ...................................................................................... 5
2.1.1 Segmental Aspects of Speech ................................................................................................. 5
2.1.1.1 Nature of Consonants ..................................................................................................................... 6
2.1.1.2 Nature of Vowels ........................................................................................................................... 8
2.1.1.3 The Nature of Lexical Tone ........................................................................................................... 9
2.1.2 Suprasegmental Aspects of Speech ...................................................................................... 10
2.1.2.1 Rhythm ......................................................................................................................................... 10
2.1.2.2 Stress ............................................................................................................................................ 10
2.1.2.3 Intonation ..................................................................................................................................... 11
2.1.3 Tone as a Segmental and Suprasegmental Aspect of Spoken Language .............................. 12
2.2 SPEECH PERCEPTION AND CATEGORICAL PERCEPTION ................................................................. 12
2.2.1 Speech Perception Research History - Important Issues and Problems .............................. 13
2.2.2 The Problem of Segmentation and Speaker Variability ....................................................... 14
2.2.3 The Nature of Categorical Perception ................................................................................. 15
2.2.4 Prediction of Discrimination Performance from Identification results ............................... 18
2.3 STIMULUS FACTORS IN CATEGORICAL PERCEPTION ..................................................................... 19
2.3.1 Step-Size ............................................................................................................................... 19
2.3.2 Stimulus Duration ................................................................................................................ 20
2.3.3 Categorical Perception of Different Classes of Speech Sounds ........................................... 20
2.3.3.1 Categorical Perception of Stop Consonants ................................................................................. 21
2.3.3.2 Categorical Perception of Nasal Consonants ................................................................................ 21
2.3.3.3 Categorical Perception of Approximants...................................................................................... 22
2.3.3.4 Categorical Perception of Fricatives............................................................................................. 22
2.3.3.5 Categorical Perception of Vowels ................................................................................................ 23
2.3.4 Categorical Perception of Nonspeech Stimuli ................................................................................. 23
2.3.4.1 Categorical Perception of Continua Unrelated to Speech ............................................................. 24
2.3.4.2 Categorical Perception of Nonspeech Analogues of Speech Sounds ........................................... 24
2.4 TASK AND RESPONSE FACTORS IN CATEGORICAL PERCEPTION .................................................... 26
2.4.1 Identification Task Factors .................................................................................................. 26
2.4.2 Discrimination Task Factors in Categorical Perception ..................................................... 27
2.4.2.1 ABX and AXB Discrimination Tasks .......................................................................................... 27
2.4.2.2 The Two-Interval Two-Alternative Forced-Choice Discrimination Task .................................... 28
2.4.2.3 The AX Discrimination Task ....................................................................................................... 28
2.4.2.4 The Four-Interval-AX Discrimination Task ................................................................................. 28
2.4.2.5 The Four-Interval Oddity Discrimination Task ............................................................................ 29
2.4.3 Methods for Increasing Categoricality ................................................................................ 30
2.4.3.1 Interference with Auditory Memory............................................................................................. 30
2.4.3.2 Decay of Auditory Memory ......................................................................................................... 31
2.4.4 Methods to Reduce Categoricality of Perception ................................................................. 32
2.4.4.1. The Use of More Sensitive Discrimination Paradigms. ............................................................... 32
2.4.4.2 The Use of Rating Scales and Measurement of Reaction Times .................................................. 33
2.5 PSYCHOACOUSTIC STRATEGIES AND EXPERIENTIAL FACTORS IN CATEGORICAL PERCEPTION ..... 34
2.5.1 Practice and Strategies .................................................................................................................... 35
2.5.1.1 Practice and Feedback .................................................................................................................. 35
2.5.1.2 The Use of Strategies in Categorical Perception .......................................................................... 35
2.5.2 The Influence of Specific Linguistic Experience on Categorical Perception .................................. 38
2.5.3 Categorical Perception in Infants .................................................................................................... 39
2.5.4 Categorical Perception in Animals .................................................................................................. 41
2.6 THEORIES OF CATEGORICAL SPEECH PERCEPTION ....................................................................... 42
2.6.1 The Motor Theory of Speech Perception .............................................................................. 42
2.6.2 Articulatory/Auditory Theories ............................................................................................ 45
2.6.3 The Stage Theory of Speech Perception ............................................................................... 45
2.6.4 The Dual-Process-Model ..................................................................................................... 46
CHAPTER 3 PERCEPTION AND PRODUCTION OF LEXICAL TONE .................................. 48
3.1 WHAT IS A TONAL LANGUAGE? .................................................................................................... 49
3.1.1 Tonal Phenomena ................................................................................................................ 50
3.1.2 Notation of Tone .................................................................................................................. 51
3.2 TONAL LANGUAGE SYSTEMS ........................................................................................................ 53
3.2.1 Thai Tones ............................................................................................................................ 54
3.2.2 Vietnamese Tones ................................................................................................................. 55
3.2.3 Mandarin Chinese Tones ..................................................................................................... 56
3.3 TONOGENESIS ............................................................................................................................... 57
3.3.1 Development of Tones from Voicing Contrasts - Tonal Split ............................................... 57
3.3.2 Development of Tones from Consonants .............................................................................. 58
3.3.3 Development of Tones from Vowel Height ........................................................................... 59
3.3.4 Other Influences of Tone Development ................................................................................ 59
3.4 TONE PRODUCTION ....................................................................................................................... 59
3.5 TONE PERCEPTION ........................................................................................................................ 62
3.5.1 Fundamental Frequency and the Auditory System ............................................................... 62
3.5.1.1 The Outer and Middle Ear ............................................................................................................ 62
3.5.1.2 The Inner Ear and the Basilar Membrane ..................................................................................... 63
3.5.1.3 The Transduction Process and the Hair Cells ............................................................................... 64
3.5.1.4 Central structures ......................................................................................................................... 65
3.5.2 Theories of Pitch Perception ................................................................................................ 65
3.5.3 Pitch Perception in Speech – Lexical Tone .......................................................................... 66
3.5.3.1 Perception of Lexical Tone - Overview ....................................................................................... 66
3.5.3.2 Multidimensional Approach to Lexical Tone Perception ............................................................. 66
3.5.3.3 Perception of Tone when Pitch Information is not Available ....................................................... 67
3.5.3.4 Lateralization and Neuroimaging Studies .................................................................................... 69
3.6 TONE ACQUISITION....................................................................................................................... 72
3.6.1 Production of First Language Tone ..................................................................................... 72
3.6.2 Perception of First Language Tone ..................................................................................... 73
3.6.3 Production of Second Language Tone ................................................................................. 74
3.6.4 Perception of Second Language Tone .................................................................................. 75
3.6.4.1 Mandarin Tone Perception by Second Language Learners .......................................................... 75
3.6.4.2 Thai Tone Perception by Second Language Learners .................................................................. 77
3.6.5 Tone Perception as a Function of Tone Language Experience – First and Second Language
studies ........................................................................................................................................... 77
3.7 CATEGORICAL PERCEPTION OF LEXICAL TONE ............................................................................ 78
3.7.1 Categorical Perception of Cantonese and Mandarin Tones ................................................ 79
3.7.2 Categorical Perception of Thai Tones ................................................................................. 82
CHAPTER 4 MUSICAL PITCH AND TONE ................................................................................. 85
4.1 GENERAL CHARACTERISTICS OF MUSIC ....................................................................................... 86
4.1.1 Scales and Intervals ............................................................................................................. 86
4.1.2 Tempo, Rhythm, and Meter .................................................................................................. 87
4.2 MUSIC IN DIFFERENT CULTURES .................................................................................................. 88
4.2.2 Western Music ...................................................................................................................... 88
4.2.2.1 Scales and Intervals in Western Music ......................................................................................... 88
4.2.2.2 Tempo, Rhythm, and Meter in Western Music ............................................................................ 89
4.2.2.3 Brief History of Western Music ................................................................................................... 89
4.2.3 Thai Music ........................................................................................................................... 90
4.2.3.1 Thai Intervals and Scales .............................................................................................................. 90
4.2.3.2 Tempo, Rhythm, and Meter in Thai Music .................................................................................. 91
4.2.3.3 Brief History of Thai Music ......................................................................................................... 91
4.2.4 Similarities and Differences – Western and Thai Music and Singing .................................. 92
4.2.4.1 Music............................................................................................................................................ 93
4.2.4.2 Singing in Tonal Languages ......................................................................................................... 93
4.3 PERCEPTION OF MUSIC – TEMPO, RHYTHM, GROUPING, AND METER ........................................... 95
4.3.1 Perception of Tempo ............................................................................................................ 95
4.3.2 Perception of Rhythmic Patterns ......................................................................................... 96
4.3.3 Perception of Grouping ....................................................................................................... 97
4.3.4 Perception of Meter ............................................................................................................. 97
4.4 PERCEPTION OF MUSIC - PITCH ..................................................................................................... 98
4.4.1 Categorical Perception of Musical Pitch ............................................................................. 99
4.4.2 Relative Pitch ..................................................................................................................... 100
4.4.3 Absolute Pitch .................................................................................................................... 100
4.4.4 Absolute Pitch Memory ...................................................................................................... 102
4.4.5 Developmental Issues in Pitch Perception ......................................................................... 103
4.4.5.1 Pitch Perception Development ................................................................................................... 103
4.4.5.2 Absolute Pitch Perception Development .................................................................................... 105
4.4.5.3 Absolute Pitch Abilities in Infants ............................................................................................. 106
4.4.6 Hemispheric Differences in Pitch Processing .................................................................... 107
4.4.6.1 Lateralization of Pitch Processing .............................................................................................. 107
4.4.6.2 Lateralization of Absolute Pitch ................................................................................................. 108
4.5 MUSIC AND OTHER DOMAINS ..................................................................................................... 109
4.5.1 The Effect of Music on Other Cognitive Abilities............................................................... 111
4.5.2 The Effect of Music on Speech ........................................................................................... 114
4.6 INFLUENCE OF MUSIC ON FOREIGN LANGUAGE SOUND ACQUISITION ........................................ 116
CHAPTER 5 WHAT WE KNOW NOW AND WHAT WE WANT TO FIND OUT ................. 118
5.1 CATEGORICAL PERCEPTION OF ARTIFICIAL TONE CONTINUA .................................................... 119
5.2 INFLUENCE OF MUSICAL BACKGROUND ON TONE PERCEPTION ................................................. 120
5.3 PERCEPTION AND PRODUCTION OF TONES, VOWELS, AND CONSONANTS – INFLUENCE OF MUSICAL
APTITUDE AND LANGUAGE APTITUDE? ............................................................................................. 120
CHAPTER 6 CATEGORICAL PERCEPTION OF SPEECH AND SINE-WAVE TONES IN
TONAL AND NON-TONAL LANGUAGE SPEAKERS ............................................................... 121
6.1 BACKGROUND: RESEARCH ON THE CATEGORICAL PERCEPTION OF TONE .................................. 122
6.2 METHODOLOGICAL ISSUES ......................................................................................................... 123
6.2.1 Stimulus Type Presentation: Blocked vs. Mixed ................................................................ 124
6.2.2 Categorical Perception: Identification and Discrimination .............................................. 124
6.2.2.1 Interstimulus Interval in Discrimination Tasks .......................................................................... 124
6.2.2.2 Refined Operationalisation of Categorical Perception of Tone .................................................. 125
6.2.2.3 Non-Speech Stimulus Materials ................................................................................................. 127
6.3 HYPOTHESES .............................................................................................................................. 127
6.4 EXPERIMENTAL DESIGN ............................................................................................................. 127
6.4.1 Stimuli ................................................................................................................................ 128
6.4.2 Participants ........................................................................................................................ 129
6.4.3 Procedure ........................................................................................................................... 129
6.4.3.1 Identification .............................................................................................................................. 131
6.4.3.2 Discrimination ............................................................................................................................ 132
6.5 ANALYSES .................................................................................................................................. 133
6.5.1 Test Assumptions ................................................................................................................ 133
6.5.2 Language Group Hypotheses ............................................................................................. 134
6.5.3 Strategy Type Hypotheses .................................................................................................. 134
6.6 RESULTS ..................................................................................................................................... 135
6.6.1 Qualitative Evaluation ....................................................................................................... 135
6.6.2 Identification Results.......................................................................................................... 137
6.6.2.1 Trials to Criterion in Identification Training .............................................................................. 137
6.6.2.2 Identification Test Trials: Crossover Values .............................................................................. 139
6.6.2.3 Identification d' Results .............................................................................................................. 141
6.6.3 Discrimination Results ....................................................................................................... 142
6.6.3.1 Overall Discrimination Differences............................................................................................ 142
6.6.3.2 Peak Discrimination Analysis .................................................................................................... 143
6.7 DISCUSSION ................................................................................................................................ 145
6.7.1 Independence of Speech and Non-Speech Processing ....................................................... 145
6.7.2 Perceptual Strategies in Identification and Discrimination ............................................... 146
6.7.3 Categoricality Issues .......................................................................................................... 148
6.8 FURTHER ANALYSIS AND FUTURE DIRECTIONS .......................................................................... 150
CHAPTER 7 PERCEPTION OF SPEECH AND SINE-WAVE TONES: THE ROLE OF
LANGUAGE BACKGROUND AND MUSICAL TRAINING ...................................................... 152
7.1 BACKGROUND: RESULTS OF EXPERIMENT 1 ............................................................................... 153
7.2 METHODOLOGICAL ISSUES ......................................................................................................... 153
7.2.1 Rising and Falling Continua .............................................................................................. 154
7.2.2 Step Size ............................................................................................................................. 156
7.3 HYPOTHESES .............................................................................................................................. 157
7.3.1 Categoricality Differences ................................................................................................. 157
7.3.2 Continuum Shape and Language Background ................................................................... 157
7.3.3 Processing Differences ...................................................................................................... 157
7.4 EXPERIMENTAL DESIGN ............................................................................................................. 158
7.4.1 Stimuli ................................................................................................................................ 158
7.4.2 Participants ........................................................................................................................ 159
7.4.3 Procedure ........................................................................................................................... 159
7.4.3.1 Identification .............................................................................................................................. 160
7.4.3.2 Discrimination ............................................................................................................................ 161
7.5 ANALYSES .................................................................................................................................. 162
7.5.1 Test Assumptions ................................................................................................................ 162
7.5.2 Language Group and Musical Background Hypotheses .................................................... 162
7.5.3 Strategy Type Hypotheses .................................................................................................. 162
7.6 RESULTS ..................................................................................................................................... 163
7.6.1 Identification Training Results ........................................................................................... 163
7.6.1.1 Trials to Criterion in Identification – Rising Continuum ............................................................ 164
7.6.1.2 Trials to Criterion in Identification – Falling Continuum ........................................... 165
7.6.2 Identification Test Trials .................................................................................................... 167
7.6.2.1 Crossover Values – Rising Continuum ...................................................................................... 167
7.6.2.2 Crossover Values – Falling Continuum ...................................................................................... 168
7.6.2.3 Identification d' Results .............................................................................................................. 169
7.6.3 Discrimination Results ....................................................................................................... 171
7.6.3.1 Overall Discrimination Differences............................................................................................ 171
7.6.3.2 Discrimination Peak Analysis .................................................................................................... 173
7.6.4 Qualitative Evaluation and Summary of Results ................................................................ 176
7.7 DISCUSSION ................................................................................................................................ 180
7.7.1 Differences Due to Continuum Shape ................................................................................ 180
7.7.2 Differences Between Musicians and Non-Musicians ......................................................... 181
CHAPTER 8 PERCEPTION AND PRODUCTION OF TONES, CONSONANTS, AND
VOWELS: THE INFLUENCE OF LANGUAGE APTITUDE, MUSICAL APTITUDE,
MUSICAL MEMORY, AND MUSICAL TRAINING .................................................................... 182
8.1 INTRODUCTION ........................................................................................................................... 184
8.2 HYPOTHESES .............................................................................................................................. 185
8.2.1 Separate Abilities ............................................................................................................... 185
8.2.1.1 Musicianship .............................................................................................................................. 185
8.2.1.2 Musical Aptitude ........................................................................................................................ 185
8.2.1.3 Musical Memory ........................................................................................................ 186
8.2.1.4 Language Aptitude ..................................................................................................... 186
8.2.2 Relationship between Perception and Production ............................................................. 186
8.2.3. Determinants of Perception and Production .................................................................... 186
8.3 METHOD ..................................................................................................................................... 187
8.3.1 Participants ........................................................................................................................ 187
8.3.2 Musical Aptitude ................................................................................................................ 187
8.3.3 Musical Memory for Pitch ................................................................................................. 189
8.3.4 Foreign Language Aptitude ............................................................................................... 190
8.3.5 Stimulus Material for Perceptual Identification and Production Tasks ............................. 191
8.3.6 Perception of Speech Sounds ............................................................................................. 193
8.3.7 Production of Speech Sounds ............................................................................................. 194
8.4 RESULTS: SEPARATE ABILITIES .................................................................................................. 195
8.4.1 Musical Aptitude Results .................................................................................................... 195
8.4.2 Musical Memory Results .................................................................................................... 195
8.4.3 Foreign Language Aptitude Results ................................................................... 196
8.4.4 Speech Perception .............................................................................................................. 197
8.4.5 Speech Production ............................................................................................................. 202
8.5 RESULTS: COMPARISON OF PERCEPTION AND PRODUCTION ....................................................... 205
8.6 RESULTS: DETERMINANTS OF PERCEPTION AND PRODUCTION ................................................... 205
8.6.1 Factor Analysis for Data Reduction .................................................................................. 205
8.6.2 Correlations Between Variables ........................................................................................ 206
8.6.3 Sequential Regressions ...................................................................................................... 208
8.6.3.1 Tone Perception and Production ................................................................................ 209
8.6.3.2 Consonant Perception and Production ........................................................................ 210
8.6.3.3 Vowel Perception and Production .............................................................................. 210
8.7 DISCUSSION ................................................................................................................................ 216
8.7.1 The Nature of Musicianship ............................................................................................... 217
8.7.2 Musicianship and the Perception and Production of Speech Sounds ................................ 218
8.7.3 Musical Determinants of Speech Perception and Production ........................................... 219
CHAPTER 9 DISCUSSION .............................................................................................................. 222
9.1 SUMMARY OF RESULTS............................................................................................................... 223
9.1.1 Experiment 1: Categorical Perception of Speech and Sine-Wave Tones in Tonal and Non-
Tonal Language Speakers ........................................................................................................... 223
9.1.2 Experiment 2: Perception of Speech and Sine-wave Tones - The Role of Language
Background and Musical Training ............................................................................................. 224
9.1.3 Experiment 3: Perception and Production of Tones, Vowels, and Consonants - The
Influence of Training, Memory, and Aptitude ............................................................................. 225
9.2 STRATEGY EFFECTS IN TONE PERCEPTION ................................................................................. 225
9.3 MUSICIANS' ADVANTAGES IN SPEECH PERCEPTION AND PRODUCTION ...................... 227
9.3.1 Musical Experience – Transfer to Musical Tasks .............................................................. 228
9.3.2 Musical Experience – Transfer to Related Linguistic Tasks .............................................. 228
9.3.3 Music Experience – Transfer to Less Related Linguistic Abilities ..................................... 229
9.4 LOCUS OF MUSICIANS' SUPERIORITY ......................................................................... 230
9.5 SUGGESTIONS FOR FUTURE RESEARCH ....................................................................................... 232
9.5.1 Relationship between Tone Space and Intonation Space ................................................... 232
9.5.2 Investigation of Relationship Between Musical Training and Musical Aptitude ............... 232
9.5.3 Development of Musicality ................................................................................................. 232
9.5.4 Acoustic Analyses of Speech Production Ability ................................................................ 233
9.5.5 Psychoacoustic Processing Investigation .......................................................................... 233
9.6 CONCLUSION .............................................................................................................................. 233
REFERENCES………….…………………………………………………………...………………..235
APPENDIX………….……………………………………………………………...………………..272
LIST OF FIGURES
Figure 2.1. IPA chart of consonant sounds…………………………………………...7
Figure 2.2. The vowel chart of the International Phonetic Alphabet…………………8
Figure 2.3. Schematic spectrograms of the syllables [du] and [di]………………….14
Figure 2.4. Idealised categorical perception result…………………………………..16
Figure 2.5. Idealised continuous perception result…………………………………..17
Figure 3.1. Time normalised fundamental frequency contours of Thai tones……….54
Figure 3.2. Time normalised fundamental frequency contours of Vietnamese tones…..55
Figure 3.3. Time-normalised fundamental frequency contours of Mandarin tones…….56
Figure 3.4. View of the larynx ………………………………………………………60
Figure 3.5. Schematic figure of the vocal folds during phonation…………………...61
Figure 3.6. Anatomy of the ear: outer ear, middle ear, and inner ear………………..63
Figure 3.7. Anatomy of the cochlea…………………………………………...……..64
Figure 4.1. Comparison between the Thai and the Western Scales………………….91
Figure 6.1. Mid-Continuum Response Strategy……………………………………126
Figure 6.2. Flat-Anchor Response Strategy………………………………………...126
Figure 6.3. F0 characteristics of the asymmetric tone continuum…………………..129
Figure 6.4 Identification and discrimination results across languages……………..135
Figure 6.5 Descriptive statistics for trials to criterion scores………………….……137
Figure 6.6. Descriptive statistics for trials to criterion scores……………...………138
Figure 6.7. Mean d' identification scores…………………………………………...140
Figure 6.8. Mean d' identification scores…………………………………………..140
Figure 6.9. Mean d' discrimination scores………………………………………….142
Figure 6.10. Mean d' discrimination scores………………………………………...143
Figure 7.1. Mid-Continuum Response Strategy……………………………………153
Figure 7.2. Mid-Continuum Response Strategy……………………………………153
Figure 7.3. Flat-Anchor Response Strategy………………………………………...154
Figure 7.4. Flat-Anchor Response Strategy………………………………………….154
Figure 7.5. F0 characteristics of the two asymmetric tone continua………………..157
Figure 7.6. Trials to criterion scores for the rising continuum……………………..162
Figure 7.7. Trials to criterion scores for the falling continuum…………………...164
Figure 7.8. Crossover values for the rising continuum for Thai listeners………......165
Figure 7.9. Crossover values for the rising continuum for Australian listeners……165
Figure 7.10. Crossover values for the falling continuum for Thai listeners……......166
Figure 7.11. Crossover values for the falling continuum for Australian listeners….166
Figure 7.12. Descriptive statistics for d' values for Thai listeners………...……….167
Figure 7.13. Descriptive statistics for d' values for Australian English listeners…..168
Figure 7.14. Descriptive statistics for d' values for Thai listeners………………...168
Figure 7.15. Descriptive statistics for d' values for Australian English listeners…168
Figure 7.16. Descriptive statistics for discrimination accuracy in Thai listeners…..169
Figure 7.17. Descriptive statistics for discrimination accuracy in Australian listeners……………………………………………………………………………...169
Figure 7.18. Descriptive statistics for discrimination accuracy in Thai Listeners….169
Figure 7.19. Descriptive statistics for discrimination accuracy in Australian listeners…………………………………………………………………………...…170
Figure 7.20. Descriptive statistics for the Flat vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians.………………………………...171
Figure 7.21. Descriptive statistics for the Mid Pair vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians…………………………………171
Figure 7.22. Descriptive statistics for the Flat vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians………………………………...172
Figure 7.23. Descriptive statistics for the Mid Pair vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians…………………………………172
Figure 7.24 Identification and discrimination results across languages……………175
Figure 7.25 Identification and discrimination results across languages……………176
Figure 8.1. Stylised versions of tonal contours used to label keys.………………...191
Figure 8.2. Descriptive statistics for mean percentile-ranking scores in the musical aptitude test……………………………………………………...…………192
Figure 8.3. Descriptive statistics for musical memory results for musicians and non-musicians for shift size and shift direction…………………………………….193
Figure 8.4. Descriptive statistics for musical memory results for musicians and non-musicians across shift size and shift direction…………………………………193
Figure 8.5. Descriptive statistics for mean scores in the language aptitude test……194
Figure 8.6. Descriptive statistics for mean trials to criterion scores………………..194
Figure 8.7. Descriptive statistics for perception accuracy………………………….195
Figure 8.8. Descriptive statistics for tone perception scores……………………….195
Figure 8.9. Descriptive statistics for consonant perception………………………...195
Figure 8.10. Descriptive statistics for vowel perception……………………………195
Figure 8.11. Descriptive statistics for speech production…………………………..200
Figure 8.12. Descriptive statistics for tone production……………………………..200
Figure 8.13. Descriptive statistics for consonant production……………………….200
Figure 8.14 Descriptive statistics for vowel production……………………………200
LIST OF TABLES
Table 3.1. The Five Lexical Tones of Standard Thai………………………………...52
Table 6.1. Description of the Language Hypotheses …………………………….…133
Table 6.2. Planned Contrasts for the Strategy Type Hypotheses……...……………134
Table 6.3. Descriptive Statistics for Crossover Values.………………………….…139
Table 6.4. Mean d’ values for Identification and Discrimination in Musicians and Non-Musicians ………………………………….……………………………..148
Table 7.1. Planned Contrasts for Strategy Type Hypotheses – Rising continuum...160
Table 7.2. Planned Contrasts for Strategy Type Hypotheses – Falling continuum..161
Table 8.1. Matrix Showing Stimuli Differing on Three Levels……………………..189
Table 8.2. Description of the Speech Sound Type Planned Contrasts……………...196
Table 8.3. Description of the Tone Contrasts………………………………...…….197
Table 8.4. Description of the Consonant Contrasts………………………….……..198
Table 8.5. Description of the Vowel Contrasts …………………………………….199
Table 8.6. Principal Component Loadings and Communalities …………………...203
Table 8.7. Intercorrelations Among Language Aptitude, Musical Aptitude, Musical Memory, and Musical Training…………………………………...……….204
Table 8.8. Descriptive Statistics and Correlations ………………………………...204
Table 8.9. Summary of Significant Predictors ……………………………………..208
Table 8.10. Summary of Significant Predictors – Alternative…………………..….208
1.1 Overview
This thesis is concerned with the perception and production of lexical tone.
In tonal languages such as Vietnamese, Mandarin, or Thai, in addition to differences in
consonants and vowels, differences in tone (fundamental frequency changes, perceived
as pitch differences) can be used to distinguish meaning (1962). In Thai, for example,
the word [mai] can mean “wood”, “not”, “silk”, “burn”, or “new” depending on what
tone it is pronounced with. In speech science research, consonants and vowels have
received most of the attention, while tones in tonal languages have been relatively
neglected. However, more than half of the world's population are tonal language
speakers, and an estimated 60% to 70% of the world's languages are tonal. On the
basis of this prevalence alone, tones, as well as consonants and vowels, need to be
considered in all areas of speech perception and production research.
In addition, given the much greater importance of fundamental frequency (F0) in tone,
studies of tones may reveal aspects of speech processing that studies of consonants and
vowels have left uncovered. Such processes may well be elucidated by studying
speakers of tonal and non-tonal languages, and in this thesis both will be investigated.
F0 is also very important in music: indeed, changes in F0 create musical melody and
harmony. Given that tone is mainly conveyed by F0, it is of interest whether
or not tone perception is influenced by previous experience with musical tone.
Therefore, in this thesis, musicians' and non-musicians' perception of tone will also be
investigated. In addition, the relative contributions of language aptitude, musical
aptitude, training, and memory will be considered.
1.2 Organisation of Thesis
To understand processes involved in speech perception in general, studies of consonant
and vowel perception are reviewed in Chapter Two. Differences and similarities
between those classes of speech sounds will be considered, with a focus on studies
concerning the phenomenon of categorical perception. This will be followed in Chapter
Three by a review of research in the area of lexical tone, especially tone perception.
Tonal phenomena, such as tonogenesis, tone acquisition, and categorical tone
perception will also be considered. In Chapter Four, research on music, especially pitch,
will be summarised. Differences between musical pitch and lexical tone perception will
be explicated, including findings concerning singing vs. speaking, hemispheric
differences, and categorical pitch perception.
Chapter Five provides a summary of the introductory chapters, and sets up the research
issues that will be addressed in the experiments. Methods, results, and discussion of the
three experimental studies will be presented in Chapters Six, Seven, and Eight. Chapter
Six concerns categorical perception of speech and sine-wave tones in tonal and non-
tonal language speakers. Chapter Seven considers the influence of language and
musical background on the perception of speech and sine-wave tones and musical
memory. Chapter Eight investigates the influence of musical memory, musical aptitude,
and language aptitude on the perception and production of tones, consonants, and
vowels by musicians and non-musicians.
Finally, Chapter Nine provides a general discussion of the findings in terms of
perceptual strategies and the interdependence of tonal language and musical background,
and concludes the thesis by noting implications and directions for future research.
2.1 The Nature of Language and Speech
The main purpose of language is to convey information. There are many different ways
to transmit information through language. Some examples are Braille transcription, sign
language and Morse code. Spoken and written language are the most common forms of
language.
Speech, the spoken form of language, is not the same as language, because in addition
to the linguistic content of what is said, a great deal of non-linguistic information is
conveyed by speech. When we hear somebody speak, we usually need only a few
moments to learn many things about the person we are talking to: where they come from,
which social group they belong to, and whether they are in a good or a bad mood. We
can also gauge their state of health and other important speaker characteristics.
The focus of the current experiments is on both the perception and production of
speech, particularly tones. Sections 2.1.1 and 2.1.2 will introduce segmental and
suprasegmental aspects of speech, ahead of a discussion of categorical speech
perception.
2.1.1 Segmental Aspects of Speech
In articulatory terms, speech sounds differ in whether or not the airflow coming from
the lungs is obstructed in the vocal tract1 and if so, at what point and in what manner it
is obstructed. On this articulatory basis, there are two broad classes of segments in
spoken language: consonants and vowels. Vowels are produced by allowing air to flow
from the lungs in an unobstructed way, whereas consonants are characterised by an
obstruction of the vocal tract. A third class of segments, usually carried on vowels, is
lexical tone. Lexical tone is the distinctive pitch level and/or contour carried by the
syllable(s) of a word, in cases in which tone is an essential feature of the meaning of
that word. In the past, lexical tone has often been neglected in spoken language
research, even though an estimated 60% to 70% of the world's languages are tonal
languages and more than half of the world's population are tonal language speakers
(Yip, 2001).
1 The term vocal tract refers to the whole of the air passage above the larynx, the shape of which is the main factor affecting the quality of speech sounds.
In order to understand the following information about speech, it is important to define
some terms that will be used.
Consonants and vowels can be phonetic as well as phonemic categories. A sound that is
distinguished on the basis of phonetic or articulatory features is called a phone, whereas
a category of sounds that are used to distinguish meaning in a particular language is
called a phoneme. In English, for example, /t/ and /d/ are different phonemes, and 'tent'
and 'dent' have different meanings. Phonemes can have different phonetic realisations,
which are called allophones. An allophone is one member of a phonemic category. In
English, [tʰ] as in 'toast', [t] as in 'stand', and unreleased [t̚] as in 'pot' are allophones of
the phoneme category /t/, even though their articulatory realisation and acoustic
characteristics are different.
As demonstrated above, the convention in written text is that slashes are used to
indicate phonemes, /t/, and phones are written in square brackets, [t].
2.1.1.1 Nature of Consonants
In the consonant chart of the International Phonetic Association (IPA), consonants are
classified according to their place of articulation and articulatory organs as well as the
manner in which they are produced (see Figure 2.1).
Another important distinction among consonants is whether the vocal cords2 are
closed or separated as air coming from the lungs passes between them. If the vocal
cords are kept close together, the air stream must force its way through the glottis3,
causing the vocal cords to vibrate. The resulting sound is called a voiced speech sound,
as in [z]. If the cords are separated, the air is not obstructed at all, and the sound is
called voiceless, as in [s]. In the IPA chart (Figure 2.1), voiceless (consonants on the left
of a cell) and voiced (consonants on the right) versions of the consonants are shown.
There are 11 consonant classes in the IPA chart, but in this section only the four
consonant classes that are used in English are described: plosives, nasals, fricatives,
and affricates.
2 The vocal cords are two bands of mucous membrane that are situated in the larynx. The vocal cords vibrate when they are adducted.
3 The glottis is the opening between the vocal cords at the upper part of the larynx.
Figure 2.1. IPA chart of consonant sounds, charted by manner (vertical axis), place (horizontal axis) of articulation and voicing (left vs. right member of each cell) (International Phonetics Association, 1999).
Stop consonants or plosives, of which English has /b/, /p/, /d/, /t/, /g/, and /k/, are
produced by completely occluding the vocal tract at a single place of articulation with
the lips, the tongue tip, or the tongue body. In /b/ and /p/, the closure occurs between
the lips, in /d/ and /t/ between the tongue tip and the alveolar ridge, and in /g/ and /k/ the tongue
body occludes the vocal tract at the velum. In a plosive, vocal tract air pressure is built
up during the closure phase and then released with a rapid opening movement that
causes a noise burst.
A second class of consonants, called nasals, involves the lowering of the velum during
an oral closure, so that the airflow travels through the nose, rather than through the oral
cavity, as in /m/, which is produced with closed lips, or /n/, where the tip of the tongue
occludes the vocal tract at the upper alveolar ridge. In /ŋ/, the back of the tongue
touches the velum.
Fricatives are produced by creating a narrow constriction, usually via tongue tip or
tongue body placement, and an appropriate level of air pressure to produce turbulence
and thus fricative noises. In English, there are five different places of articulation for
fricatives: labiodental (/f/ as in 'fast' and /v/ as in 'vast'), interdental (/θ/ as in 'thunder'
and /ð/ as in 'though'), alveolar (/s/ as in 'sue' and /z/ as in 'zoo'), palatal (/ʃ/ as in
'shade' and /ʒ/ as in 'measure'), and glottal (/h/ as in 'house').
Plosives that are released as fricatives are called affricates. English has two affricates:
[tʃ] as in 'chicken' and [dʒ] as in 'jockey'. The closure is alveolar, as in /t/
or /d/, and friction occurs during the release.
Consonants are generally described as fast-changing parts of the speech signal. These
changes, called transitions, can also be seen in visual representations of the speech
signal. Vowels, in contrast, show more stable formant patterns and are described in the
following section.
2.1.1.2 Nature of Vowels
Vowels are characterised by an open vocal tract. Vowels are distinguished from one
another by whether they are produced in the front, [i], [a], in the centre, [ə], or in the
back, [u], of the oral cavity, whether the tongue position is high, [i], middle, [e], or low,
[a], and whether the lips are rounded, [y], or unrounded, [i], (see Figure 2.2).
Figure 2.2. The vowel chart of the International Phonetic Alphabet. The vowel quadrangle shows the extreme vowel positions in articulation. Horizontally, frontness/backness of the vowels (acoustically measured by the second formant, F2) is plotted. Vertically, vowels are plotted according to tongue position (acoustically measured by the first formant, F1). Where symbols appear in pairs, the one to the right represents the rounded version of the vowel. Figure reproduced from the Handbook of the International Phonetic Association (1999).
The number of vowels varies across languages. Aranda, for example, an Australian
language, has only three vowels, whereas twenty different vowels are found in
Punjabi (Pompino-Marschall, 1995). The vowel system of English contains 14 different
vowels. Vowels, as opposed to consonants, have rather steady-state
characteristics, which means that there is a phase in the vowel during which the
acoustic and articulatory characteristics are relatively stable. Such steady-state vowels
are called monophthongs. Other vowels have more than a single steady state: their
quality changes as the tongue moves in the course of their production, and these are
called diphthongs. Examples of diphthongs are /eɪ/ as in 'face' and /ɔɪ/ as in 'boy'.
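Since the caption of Figure 2.2 notes that vowel backness corresponds acoustically to the second formant (F2) and vowel height to the first formant (F1), a small sketch may make this concrete. The reference formant values below are rough adult-male averages in the style of Peterson and Barney (1952) and are assumptions for illustration only, not values drawn from this thesis.

```python
import math

# Rough adult-male average formant values (Hz) for three corner vowels;
# illustrative assumptions only -- real values vary by speaker and language.
REFERENCE_VOWELS = {
    "i": (270, 2290),   # high front: low F1, high F2
    "a": (730, 1090),   # low vowel: high F1
    "u": (300, 870),    # high back rounded: low F1, low F2
}

def classify_vowel(f1, f2):
    """Return the reference vowel nearest to the measured (F1, F2) pair,
    with distances taken in log-frequency space."""
    def distance(symbol):
        r1, r2 = REFERENCE_VOWELS[symbol]
        return math.hypot(math.log(f1 / r1), math.log(f2 / r2))
    return min(REFERENCE_VOWELS, key=distance)

print(classify_vowel(300, 2200))  # -> i  (low F1, high F2: high front)
print(classify_vowel(700, 1200))  # -> a  (high F1: low vowel)
```

Real vowel classification is far more involved (speaker normalisation, formant dynamics, duration), but the nearest-neighbour idea illustrates why F1 and F2 serve as the two axes of the vowel chart.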
2.1.1.3 The Nature of Lexical Tone
In English, consonants and vowels are the only segmental features used to differentiate
the meaning of words. However, there is a third feature that plays a role in well over
half of the world's languages – lexical tone.
In around 60% to 70% of the world's languages, the pitch height and/or pitch contour of
vowels is used as a lexical feature. These differences in pitch are called lexical tone,
and the languages that make use of tone as a lexical feature are called tonal languages.
In tonal languages, such as Mandarin Chinese or Vietnamese, the meaning of a word is
not only determined by its vowels and consonants but also by pitch height and pitch
contour. Tone is not a lexical feature in English.
There are two kinds of tonal languages: register tone languages, where the pitch height
of the tones is relatively level, and contour tone languages, in which the contour of at
least some of the tones is more important than the absolute pitch height.
The description of tones in categories of height and contour is relative: it is not the
absolute pitch that makes a tone high or low, but rather its pitch height relative both to
the pitch range of the particular speaker and to the accompanying intonation. Tone is mainly
specified by the psychological dimension of pitch, as measured by the acoustic variable
of fundamental frequency, but other aspects such as duration4, amplitude5 and voice
register6 can also play a role in tone. A comprehensive overview of pitch and lexical
tone is presented in Chapter 3.
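Since tone is specified chiefly by pitch, measured acoustically as fundamental frequency, a minimal sketch of one standard way to estimate F0 from a voiced frame (the autocorrelation method) may make the acoustic side concrete. The synthetic signal, sampling rate, and search range below are illustrative assumptions, not the analysis procedure used in this thesis.

```python
import math

import numpy as np

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by
    picking the strongest autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)   # shortest candidate period
    lag_max = int(sample_rate / f0_min)   # longest candidate period
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / best_lag

# An artificial "voiced" frame: a 220 Hz fundamental plus one harmonic.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

f0 = estimate_f0(frame, sr)
print(f"{f0:.1f} Hz")  # close to 220 Hz

# Because tone height is relative, F0 is often re-expressed in semitones
# relative to a speaker-specific reference (100 Hz is an arbitrary choice here).
print(f"{12 * math.log2(f0 / 100):.1f} st")
```

The semitone conversion in the last line is one common way of capturing the point made above: what matters is F0 relative to a reference within the speaker's own range, not absolute frequency.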
4 Duration is the acoustic feature of length of time, measured in seconds or milliseconds.
5 Amplitude is an acoustical measure that refers to the extent to which an air particle moves to and fro around its rest point in a sound wave. The greater the amplitude, the greater the intensity of a sound, and the greater the sensation of loudness.
6 Voice register refers to the voice quality produced by a specific physiological constitution of the larynx. Variations in the length, thickness, and tension of the vocal cords combine to produce different types of phonation, such as creaky or breathy voice.
2.1.2 Suprasegmental Aspects of Speech
The physical correlate of pitch is fundamental frequency (F0) of the voiced parts of the
acoustic speech signal. As opposed to lexical tone, which can be considered to be a
segmental feature, there are suprasegmental aspects of pitch, which extend not just over
syllables or words, but also over whole utterances. These are rhythm, stress, and
intonation and they are used in tonal languages as well as in non-tonal languages. These
features are reviewed in the following sections.
2.1.2.1 Rhythm
Rhythm in speech is a matter of timing within an utterance7. It can be regarded as the
relationship between strong and weak beats (or stresses). Extended utterances such as
sentences always display a mix of strong and weak beats. In speech, rhythm is apparent
at the word level, as in 'DOCtor' vs. 'guiTAR' (capitalised syllables are
stressed), as well as at the sentence level, as in 'HE did it.' vs. 'He DID it.' Rhythmic
organisation is not the same in all languages. English, for example, is a stress-timed
language in which, it is hypothesised, the durations between consecutive stressed
syllables are roughly equal (Abercrombie, 1967). Other types of rhythmic organisation
are syllable-timed languages, such as French, in which the syllables are said to occur at
regular intervals in time, and mora-timed languages, such as Japanese, in which the
rhythmic units are moras: units similar to, but generally smaller than, syllables, and
roughly equal to one another in duration (Crystal, 2003).
In English, the stress foot determines the rhythm. The stress foot is the rhythmic unit
consisting of a stressed syllable and any following unstressed syllables (Echols,
Crowhurst, & Childers, 1997; Shattuck-Hufnagel & Turk, 1996). The pattern of stress feet found in
English is predominantly trochaic, which means that a stressed syllable is usually
followed by at most one unstressed syllable: a strong-weak pattern (Nooteboom, 1997).
2.1.2.2 Stress
Linguistic stress in speech can operate on two levels: the word and the sentence level.
Word stress can be phonemic in many languages. In English, for instance, word pairs
can be distinguished by their stress pattern. In some words, if the stress is moved from
the first to the second syllable in bi-syllabic words, the word meaning changes with
respect to its lexical class.

7 An utterance is a complete unit of speech in spoken language. It can consist of one or more words and is generally but not always bounded by silence.

„SUBject‟, for example, with the stress on the first syllable,
means a topic, whereas when the stress is on the second syllable, „subJECT‟ means to
cause somebody to undergo something. Other examples are „PERmit‟ versus
„perMIT‟, and „ABstract‟ versus „abSTRACT‟.
Word stress is not predictable in its placement in English – it must be learned with each
word. In many other languages, word stress is predictable; in French for example it is
always on the last syllable („mademoiSELLE‟, „bon voYAGE‟).
Stress is often signalled by loudness (amplitude), but stressed syllables are not always
louder than others; other factors like duration (lengthening of a stressed syllable), pitch
shift (higher pitch in stressed syllables), and changed spectral characteristics8 are also
very important and reliable features that signal stress.
Sentence stress (or focus) is an important part of the phonology of all languages. This
prosodic feature indicates where the important information point of the sentence is.
Consider the sentence „She was NOT supposed to read about these decisions‟ vs. „She
was not supposed to READ about these decisions‟. In the first, the meaning is that the
person should not have been informed about the decisions at all, whereas the second
raises the possibility that she learned about them by means other than reading.
2.1.2.3 Intonation
Intonation is the course of the pitch pattern across an utterance. Over the course of an
utterance, irrespective of fundamental frequency changes resulting from the particular
accentuation patterns, there is a general declination in F0 over time. Over and above
this, intonation contours can serve to transmit differences in meaning. An utterance that
has a falling intonation contour as in „She was there.‟ is usually perceived as a
statement whereas the same sentence produced with a rising intonation contour would
be perceived as a question – „She was there?‟
Apart from the use of pitch for lexical distinctions in tonal languages (see 2.1.1.3 and
Chapter 3), pitch is also used in speech for so-called paralinguistic9 purposes.

8 The sound spectrum, as represented in spectrograms, includes time, frequency and intensity relationships, notably seen as bands of energy called formants (see Figure 2.3). In this case these variations could include changes to the formant structure.

9 Paralinguistic refers to the set of non-phonemic properties of speech, such as speaking tempo, vocal pitch, and intonational contours, which can be used to communicate attitudes or other shades of meaning.

This kind of pitch variation is known as emotional prosody, and it refers to the mechanism by
which personal characteristics and emotional states are indicated in the intonation
contour (Abercrombie, 1968; Kramer, 1963). This affective function of prosody has
been seen to reflect individual psychological states more than general features of the
language community (Fry, 1969, 1970). Some attempts have been made to analyse
affect in the acoustic signal (Lieberman & Michaels, 1962), for example Williams and
Stevens (1972) found regular patterns of pitch change associated with anger and fear.
Closely related to the affective level is the expression of attitudes through prosody,
including attitudes towards the speaker, towards the content of the utterance, or towards
the listener (Van Lancker, 1980). This application of prosody is used to express a more
personal commentary on the sentence that is produced. In this way prosody can be used
to express rather subtle notions such as hesitancy and irony. The use of prosodic
parameters as transmitters of attitudes is thought to be universal across all human
language systems (W. Wang, 1971).
2.1.3 Tone as a Segmental and Suprasegmental Aspect of Spoken Language
Now that both the segmental and suprasegmental aspects of language have been
considered, it can be seen that lexical tone has a special and even ambiguous nature. On
the one hand lexical tone is segmental in that it distinguishes meaning at the lexical level,
as do other segments, consonants and vowels. On the other hand it is suprasegmental in
that it uses pitch variation (often over time), as do suprasegmental cues such as intonation, to
do so. In this thesis, the perception of tone in speech contexts and pitch in non-speech
contexts will be investigated on a segmental level.
2.2 Speech Perception and Categorical Perception
Over the past half century, one of the major goals of speech scientists has been to map
the acoustic properties of the speech signal onto linguistic elements such as phonemes. This
mapping has turned out to be rather complex, and a complete explanation of
how humans identify consonants and vowels remains elusive.
Speech perception can be divided into three levels (Studdert-Kennedy, 1976). At the
auditory level, the signal is represented in terms of its frequency, intensity, and
temporal attributes (features that can be identified in the acoustic signal) as with any
other auditory stimulus. At the phonetic level, particular acoustic cues, such as formant
transitions, duration, etc. are perceived as specific speech segments, phones. At the
phonological level, phonetic segments are classified into language-specific phonemes
in terms of phoneme classes and phonological rules are applied to the perceived
utterance. These three levels may be interpreted as successive discrimination processes
applied to the speech signal. First the auditory signal is separated from other sensory
signals and registered as a perceived event. Then
the special properties that qualify it as speech allow it to be separated from other
sounds. Finally, its specific characteristics allow it to be recognised as meaningful
speech of a particular language.
In this thesis the phonetic and phonological stages of perception are of most interest, and
within these levels the manner in which speech sounds are classified: categorical
perception. In the following sections, previous research in the area of categorical
perception will be summarised and the major theoretical standpoints explained.
2.2.1 Speech Perception Research History - Important Issues and Problems
Up to the middle of the 20th century, most research in speech perception was conducted
by telephone companies such as the Bell Laboratories in the United States. Their major
interest was to reduce the acoustic speech signal in order to be able to use the available
capacity of the transmitting media as effectively as possible. As a result of experiments
testing speech intelligibility, the telephone, to this day, only transmits frequencies
between 300 Hz and 3 kHz, as it was found that frequencies outside this range are
not necessary for speech to be understood (Pompino-Marschall, 1995).
Much of academic speech perception research started in the context of a technical
failure: The Haskins Laboratories in New York had planned to develop a reading
machine for the blind in the 1950s; their goal was to encode letters into special acoustic
signals, similar to the Morse code. During their research on this it became clear that the
very high transfer rates that humans use in speech communication could not be
achieved, and the new avenue of speech perception research developed.
During World War II, an apparatus for the visual investigation of the spectral features
of speech sounds was developed: the spectrograph.
This involved representation of speech in terms of a spectrogram10, drawing stylised
spectrographic patterns meant to capture the essential linguistically relevant aspects of
the speech signal, and converting these stylised visual patterns to speech via the
“pattern playback” synthesiser. Thus a new method of analysing perceptual processes
by testing synthesised speech material, „analysis-by-synthesis‟, was born. The quality of
those early attempts at synthesis was not very good, but the results of the experiments
conducted at the time became very influential in establishing future directions in speech
perception research. Some of the problems that occurred with synthesis of speech are
discussed in the following section.
2.2.2 The Problem of Segmentation and Speaker Variability
The early approaches to synthesise speech sounds showed that it was relatively
straightforward to produce intelligible vowels by resynthesising appropriate two-
formant patterns. Plosive sounds, such as /b/ or /g/ turned out to be much more complex
to synthesise and these sounds became a major object of investigation from that point in
time. The most salient feature of stop consonants is the dynamic formant transition,
which can differ according to neighbouring sounds.
Figure 2.3. Schematic spectrograms of the syllables [du] and [di].
As shown in Figure 2.3, the spectrogram patterns for the speech sound [d] differ with
respect to the second formant F2. The formant transitions into F2 are very different in
[du] and [di], due to the phonetic context in which the [d] sounds are articulated.

10 In a spectrogram, the horizontal dimension represents time and the vertical dimension represents frequency. Each thin vertical slice of the spectrogram shows the spectrum during a short period of time, using darkness to stand for amplitude. Darker areas show those frequencies where the simple component waves have high amplitude.

This
feature of spoken language is called coarticulation, a term which refers to the overlap
of articulatory movements in consecutive speech sounds (see Figure 2.3). Despite
coarticulation effects on the acoustic realisation of the speech signal, there is invariance
in perception: the same initial phone [d] is perceived in each case.
Another problem for speech synthesis was that while synthesis of the parts with static
formant characteristics resulted in sounds that were clearly intelligible vowels ([u] and
[i] in Figure 2.3), if only the transition part of the syllable was synthesised, the percept
was a complex non-speech chirp which did not sound like a [d] at all. This means that,
auditorily, the sound not only varies according to its context; it is also not always
possible to segment the signal in such a way that a single phoneme remains audible.
The issue of variability of speakers concerns the fact that different productions of the
same sound, syllable or word can look very different in the signal, but can be
recognised as the same utterance by human listeners. Due to large differences in
articulatory anatomy and articulatory habits, the acoustic signal varies greatly between
speakers, and even repeated productions of the same word by a single speaker can differ
markedly. This variability does not pose a real problem for human listeners,
but it is one of the major obstacles that automatic speech recognition must overcome.
All of these mismatches between the signal and the percept show that the human
listener is able to perceive phonetic/phonemic invariance in the face of acoustic
variability. Further investigation of such phenomena via manipulation of phonetic
features gave rise to establishment of a very important phenomenon: categorical
perception of speech sound continua. Categorical perception assists in the explanation
of how humans overcome the above problems of speech perception and is explained
and discussed in the following sections.
2.2.3 The Nature of Categorical Perception
In categorical perception, stimuli equally spaced along a physical continuum spanning
two phoneme or phone categories are perceived as members of one or the other
category with little perceptual ambiguity, and stimuli within categories are difficult to
differentiate. Categorical speech perception was first described and investigated at
Haskins Laboratories (Liberman, 1957). In a typical categorical perception experiment,
a continuum of synthetic stimuli (e.g. consonants in speech experiments) varying in a
physical parameter (e.g. the duration of the voice onset time (VOT) as in /ba/ - /pa/) is
presented to the participant for identification and discrimination. In the identification
task, speech sounds from the continuum are presented to subjects with their task being
simply to label the sounds, for example as „b‟ or „p‟. In the discrimination task, a
judgement is required regarding whether two sounds that are presented in succession
are the same or different. (A more detailed review of different types of discrimination
tasks is presented in Section 2.4.2.)
Figure 2.4. Idealised categorical perception result. The dashed lines represent identification; the solid line represents discrimination performance. The x-axis shows stimulus numbers from 1 to 9, with 1 and 9 being the extreme stimuli.
Categorical perception is indicated by a particular combination of identification and
discrimination functions, as shown in Figure 2.4. Firstly, identification functions will
exhibit abrupt boundaries between stimulus categories; and secondly, the
discrimination performance is close to the chance level (50%) for stimulus-pairs within
a category, but almost perfect for stimulus-pairs that cross the identification boundary,
a pattern of results known as the “phoneme-boundary effect”. Perception is only considered
categorical if the location of the identification boundary corresponds with the location
of best discrimination performance (the discrimination peak). It can be seen that the
underlying notion of the categorical perception of speech is the premise that speech
discrimination ability is very closely connected to the existence or non-existence of
functional differences between sounds.
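The two diagnostic criteria described above, an abrupt identification boundary and a discrimination peak located at that boundary, can be sketched computationally. The following is a minimal illustration with invented identification and discrimination values for a 9-step continuum; the numbers and the one-step alignment criterion are assumptions for this sketch, not data or procedures from any study discussed here.

```python
# Illustrative sketch of the categorical perception diagnostics.
# All numbers are invented for illustration, not experimental data.

# Proportion of 'category A' identification responses for stimuli 1-9.
ident_A = [1.00, 0.98, 0.95, 0.85, 0.50, 0.15, 0.05, 0.02, 0.00]

# Proportion correct when discriminating adjacent pairs (1-2, 2-3, ..., 8-9).
discrim = [0.50, 0.52, 0.55, 0.90, 0.92, 0.60, 0.53, 0.51]

def identification_boundary(p):
    """Point on the continuum where identification crosses 50 percent."""
    for i in range(len(p) - 1):
        if (p[i] - 0.5) * (p[i + 1] - 0.5) <= 0:
            frac = (p[i] - 0.5) / (p[i] - p[i + 1])  # linear interpolation
            return (i + 1) + frac
    return None

def discrimination_peak(d):
    """Continuum position (midpoint of the stimulus pair) of best discrimination."""
    best = max(range(len(d)), key=d.__getitem__)
    return best + 1.5  # pair (best+1, best+2) has midpoint best+1.5

boundary = identification_boundary(ident_A)  # 5.0
peak = discrimination_peak(discrim)          # 5.5

# Perception counts as categorical only if the peak sits at the boundary.
is_categorical = abs(peak - boundary) <= 1.0
```

The one-step tolerance for peak-boundary alignment is an arbitrary choice for this sketch; in the studies reviewed here, alignment is judged by inspection of the two functions.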
Figure 2.5. Idealised continuous perception result. The solid line represents discrimination; the dashed lines represent identification performance. The x-axis shows stimulus numbers from 1 to 9, with 1 and 9 being the extreme stimuli.
If a continuum is perceived continuously rather than categorically, as is the case with
most psychoacoustic continua like brightness, amplitude or duration, the identification
functions are less steep than those for categorically identified stimuli. In such a case,
shown in the idealised results in Figure 2.5, discrimination ability, while better than
chance, is constant along the whole continuum. Clear result patterns like those shown here are
rarely obtained in real experiments but they demonstrate the essential features of
categorical and continuous perception. In speech, result patterns resembling continuous
perception have been found in experiments with vowels - identification functions are
not very abrupt and discrimination is only slightly improved around the perceived
identification boundary (Cowan & Morse, 1979; Fry, Abramson, Eimas, & Liberman,
1962; D. B. Pisoni, 1973); and patterns resembling categorical perception have been
found for consonants (Bastian, Eimas, & Liberman, 1961; Eimas, 1963; Lane, 1965;
Liberman, 1957; Liberman, Harris, Hoffman, & Griffith, 1957; Repp, 1984).
Research on the categorical perception of speech was quite productive up until the mid-
to late-eighties (Harnad, 1987; Repp, 1984; Snowdon, 1987) and while the notion of
categorical speech perception is not without controversy, it has now become a core
concept of the field (Kluender, 1994; MacKay, Allport, Prinz, & Scheerer, 1987).
2.2.4 Prediction of Discrimination Performance from Identification Results
One of the premises of categorical perception is that discrimination performance is
predicted by identification performance. This discrimination prediction consists of two
parts. The first prediction concerns the location of the discrimination peak: To fulfil the
premises of categorical perception, the discrimination peak must be located at the same
point of the continuum where the identification boundary is found. The identification
boundary can be computed as the point on the continuum where correct identification
responses for each category are at 50 percent. Secondly, following the hypothesis that
the listener can only discriminate sounds that are identified as different categories, for
judgements involving two categories, where the probability for each is equal, the
proportion correct (P(C)) in discrimination is predicted to be
P(C) = 0.5 [1 + (pA – pB)²]
where pA is the probability of identifying stimulus A as one of the two categories, and
pB is the probability of identifying stimulus B as the same category, and the chance
level is 0.5 (Liberman, 1957; Macmillan, Kaplan, & Creelman, 1977). In most cases the
use of this prediction formula leads to a conservative prediction, a lower level of
discrimination performance than is actually observed (Damper & Harnad, 2000). The
difference between prediction and actual performance is attributed to different factors:
The first possible reason is that higher-than-predicted discrimination could be based not
only on the phonemic labels but also on important spectral differences between the
sounds (Eimas, 1963; Liberman et al., 1957; D. B. Pisoni, 1971; Wood, 1976). Another
explanation for the discrepancy between obtained and predicted results is that the
difference in some studies may be due to artefacts of the experimental procedure,
irrelevant aspects of the speech signal that could have given extra-speech cues to the
listener, such as accidental noise apparent in one, but not the neighbouring sounds.
These external aspects of the speech signal would not influence identification results,
but might contribute to the discrepancy between predicted and obtained discrimination.
(For detailed reviews of other methods of discrimination prediction see Macmillan,
1987 and Massaro, 1987).
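The prediction formula can be applied directly to identification proportions. The following is a minimal sketch, with invented identification probabilities, showing why predicted within-category discrimination sits near chance while predicted cross-boundary discrimination is high; the specific probability values are assumptions for illustration only.

```python
# Sketch of the discrimination prediction from identification data:
# P(C) = 0.5 * (1 + (pA - pB)**2), with chance level 0.5.
# The probability values below are invented for illustration.

def predicted_discrimination(pA, pB):
    """Predicted proportion correct for discriminating stimuli A and B,
    where pA and pB are the probabilities of assigning each stimulus
    to the same one of the two categories."""
    return 0.5 * (1.0 + (pA - pB) ** 2)

# Within-category pair: both stimuli almost always receive the same label.
within = predicted_discrimination(0.95, 0.90)  # 0.50125, near chance

# Cross-boundary pair: the two stimuli are usually labelled differently.
across = predicted_discrimination(0.90, 0.10)  # 0.82, well above chance
```

Since predicted within-category discrimination barely exceeds 0.5, any observed performance reliably above chance within a category produces exactly the conservative-prediction discrepancy discussed above.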
2.3 Stimulus Factors in Categorical Perception
In this section features of the stimulus itself are considered and in the following
sections the kinds of tasks and responses used in categorical perception studies are
described. Here, a number of stimulus features are discussed, beginning with one of the
most important, the size of the steps along the continuum.
2.3.1 Step-Size
In discrimination, the most obvious variable that influences response accuracy is the
magnitude of the separation of stimuli on the continuum. As expected, it has been
found that, generally, the larger the step-size between the stimuli to be discriminated, the
better the discrimination (D. B. Pisoni, 1971). Healy and Repp (1982) tested the influence
of three different step sizes and found increased discrimination performance for vowels,
tones, and fricatives but not for stop consonants. Pisoni and Tash (1974) assessed
reaction times in a same-different category discrimination task and found response
times for stimuli that were two steps apart were significantly shorter than for one-step
pairs and pairs of identical stimuli. The other observation made was that „different‟
reaction times for stimulus-pairs crossing a phonetic boundary were greater than for
those stimuli that were separated by four or six steps. Nevertheless, no difference was
found between four-step and six-step different pairs, and the likelihood of falsely
responding with „same‟ to a different pair was highest for two-step pairs. This could
mean that „different‟ reaction times reflected phonetic, rather than auditory, ambiguity.
Based on these results Pisoni and Tash (1974) proposed a two-stage model for same-
different comparisons. In this model, auditory stimulus properties are compared first.
A second, phonetic stage, in which phonetic labels are compared, is engaged only if the
auditory difference falls neither below the „same‟ criterion nor above the „different‟
criterion adopted by the listener.
In summary, same-different reaction time studies have shown that the listener is
sensitive to differences within stop consonant categories, even though such differences
are difficult to detect. Although some experiments did not find such sensitivities (Repp,
1975), the positive results reinforce the hypothesis that all aspects of the speech signal
are represented in auditory memory.
2.3.2 Stimulus Duration
Another factor that can affect categoricality of perception is the duration of the
stimulus. In the case of steady-state vowels, shortening the stimulus duration is
thought to weaken the auditory trace of the stimulus and consequently to lead to perception
that is more categorical than for longer stimulus durations.
Indeed, it has been found that perception is more categorical with short vowels (around
25-50 ms) than with long vowels (100 ms, and even up to 300 ms in duration) (Fujisaki &
Kawashima, 1968, 1969). In addition perception is also more categorical for rapidly
changing formant transitions in vowels than for steady-state vowels (D. B. Pisoni,
1971; Sachs, 1969; Sawusch, Nusbaum, & Schwab, 1980; K. N. Stevens, 1968; Tartter,
1981).
These findings of categorical perception for shorter and dynamically varying vowels
suggest that the short duration and rapid transitions for initial stop consonants may be
responsible for their being perceived in a highly categorical manner. Investigations of
this hypothesis in a number of studies confirm the impression that formant transitions
have a representation in auditory memory that can be accessed when redundant steady-
state information is removed from the vowel (Dechovitz & Mandler, 1977; Keating &
Blumstein, 1978; Tartter, 1981). This means that even though the vocalic portion of a
stop consonant-vowel syllable helps phonetic perception, it appears to interfere with the
maintenance of the consonantal features at a pre-categorical level. Therefore the
general auditory salience of irrelevant stimulus parts may be a major factor in
categorical perception.
2.3.3 Categorical Perception of Different Classes of Speech Sounds
Most of the experiments in which categorical perception was tested have used voicing
in initial stop consonants or vowels, because these sounds seem to represent the
endpoints of the categoricality spectrum (stops: categorical; vowels: rather continuous).
In order to be able to interpret the results of the current experiments that are concerned
with the much less investigated categorical perception of lexical tone, the results
obtained with various classes of speech sounds are reviewed in the following sections.
2.3.3.1 Categorical Perception of Stop Consonants
Most research in the categorical perception domain has been conducted with stop
consonants and is reviewed here. In stop consonants, possible contrasts are voicing,
manner of articulation, and place of articulation.
It has been shown in numerous experiments that continua in which stop consonant
voicing is manipulated are perceived in a categorical manner (Abramson & Lisker,
1973; Bastian et al., 1961; D. Burnham, L. Earnshaw, & J. Clark, 1991; Cutting &
Rosner, 1974; Edman, 1979; Eimas, 1963; Harnad, 1987; Liberman, 1957; D. B.
Pisoni, 1971).
Early studies of categorical perception of place of articulation found that this feature of
speech is also perceived categorically (Liberman et al., 1957; Mattingly, Liberman,
Syrdal, & Halwes, 1971; D. B. Pisoni, 1971; Syrdal-Lasky, 1978). Place of articulation
was manipulated by Mattingly et al. (1971), who noticed an absence of discrimination
peaks, which was later attributed to the poor quality of the stimuli. Nevertheless,
Popper (1972) found a discrimination peak on an /ab/ - /ad/ continuum, although
discrimination was better than predicted by identification (similar results were
obtained in a study by Miller, Eimas, and Zatorre, 1979).
Results of studies investigating the influence of the consonant position in the word or
syllable suggest that syllable-final stop consonants are perceived less categorically than
syllable-initial stops, which could be due to the fact that the distinctive information is
better retained in auditory memory when it is in final position.
Another way of manipulating stop consonants is by varying their manner of
articulation. It has been found that continua which involve a change in articulation
manner are perceived categorically (Liberman, Delattre, Gerstman, & Cooper, 1956; J.
L. Miller & Liberman, 1979).
2.3.3.2 Categorical Perception of Nasal Consonants
Nasal consonants have not been tested for categoricality very extensively, because
synthetic manipulation of nasals is more difficult than for vowels and stop consonants.
In his studies on nasal consonants however, Garcia (1966) found categorical
discrimination that was better than predicted by identification. More consistent results
were obtained by Miller and Eimas (1977), who compared /ba/ - /da/ stimuli with
stimuli from a /ma/ - /na/ continuum and observed identification that was not as
categorical as for stops, but discrimination patterns that suggested categorical
perception.
Given these results and those of other studies (Larkey, Wald, & Strange, 1978;
Mandler, 1976; J. D. Miller et al., 1979; J. L. Miller & Eimas, 1977) on categorical
perception of nasal consonants, it can be concluded that perception of nasals is very
categorical, with discrimination that slightly exceeds the prediction.
2.3.3.3 Categorical Perception of Approximants
A seminal study on categorical perception of approximants (Miyawaki et al., 1975)
employed a /ra/ - /la/ continuum to investigate the influence of linguistic experience on
perception in Japanese vs. American listeners (in Japanese, /ra/ and /la/ are allophones
of one phonemic category). American listeners showed fairly categorical perception;
Japanese listeners, however, performed poorly in both identification and discrimination
and were far from showing categorical perception. Studies observing perception of
synthetic approximant continua (H. Fujisaki & T. Kawashima, 1970; MacKain, Best, &
Strange, 1981; McGovern & Strange, 1977) obtained very similar results. Frazier
(1976) created a synthetic continuum from /w/ to /l/ to /y/ through variation of the
initial steady-state portion and the F2 transition and found that those stimuli were
perceived fairly categorically.
Apart from studies that investigated perception of continua between different
approximants, experiments using continua between approximants and other phonemes
have also been conducted and it was shown that perception was highly categorical (J. L.
Miller, 1980).
Altogether, perception of semivowels, liquids, and approximants appears to be less
categorical than that of stop consonants, but is far from being continuous.
2.3.3.4 Categorical Perception of Fricatives
Fricatives are expected to be perceived rather continuously because stimuli on a
synthetic fricative continuum are acoustically widely spaced, and even one-step
differences should exceed auditory detection thresholds.
Fujisaki and Kawashima (1968; 1970) investigated the perception of fricatives and
observed good within-category discrimination and a peak at the category boundary. A
study by Healy and Repp (1982) yielded somewhat different results: continuous
perception of fricatives. These and other results (Hasegawa, 1976; May, 1981) show
clearly that the acoustic differences between isolated fricatives are relatively easy to
detect and perception of those continua seems as „uncategorical‟ as vowel perception.
2.3.3.5 Categorical Perception of Vowels
Even though vowels are said to be perceived continuously, a closer look at vowel
perception studies reveals that there are discrimination peaks observed in most cases
(Cowan & Morse, 1979; Eimas, 1963; Healy & Repp, 1982; D. B. Pisoni, 1973; Repp,
Healy, & Crowder, 1979; M. E. H. Schouten & Van Hessen, 1992). A study by Fry,
Abramson, Eimas, and Liberman (1962) is one among only a few studies that did not
discover a discrimination peak for vowels. As this was the first study on categorical
perception of vowels it may have given unjustifiable credibility to the common view
that vowels are perceived continuously.
Apart from their spectral characteristics, another property of vowels is duration, which gives
rise to the phonological feature of length. Vowel length can convey phonetic
information, and it is phonologically relevant in some languages such as Thai. To test
the categoricality of length, Bastian and Abramson (1964) synthesised a continuum
between the two Thai words /baat/ and /bat/ and found continuous perception.
To recapitulate, it seems that categorical perception is not only a characteristic of stop
consonants, but can be observed in other speech sounds as well.
2.3.4 Categorical Perception of Non-Speech Stimuli
The comparison between speech and non-speech stimuli has always been a very
important aspect of categorical perception research. In order to exclude the possibility
that categorical perception is an artefact of the experimental procedures, it is essential
to test the perception of non-speech stimuli. The original motivation of non-speech
experiments was to determine whether speech is special, which would mean that non-
speech continua would be perceived in a strictly continuous manner. Later, the
perception of non-speech sounds promised to yield more insight into possible
psychoacoustic reasons for categorical perception (Mattingly et al., 1971). For non-speech
stimuli to reveal psychoacoustic factors, they must on the one hand be very similar to
speech stimuli, and on the other be different enough from speech to avoid being
perceived as speech sounds.
2.3.4.1 Categorical Perception of Continua Unrelated to Speech
Categorical perception has been found for various continua unrelated to speech, e.g. sectored circles
(Cross, Lane, & Sheppard, 1965), flicker fusion11 (Pastore et al., 1977), and colour
(Lane, 1967). There is also the interesting case of categorical hue perception
(Bornstein, 1987).
These results certainly show that categorical perception is not restricted to speech.
However, they do not shed light on the nature of categorical speech perception.
Consideration of non-speech analogues in the next section will address this issue.
2.3.4.2 Categorical Perception of Non-Speech Analogues of Speech Sounds
One major goal of this thesis is to establish whether perception of the same tonal
contours in speech vs. non-speech contexts varies, and what role the linguistic and
musical background plays in such perception. To this end it is important here to
consider the various non-speech analogues that have been used in studies of vowel and
consonant perception.
Voice Onset Time (VOT) Analogues:
In an attempt to create non-speech analogues for VOT, Liberman, Harris, Kinney and
Lane (1961) synthesised a /do/ - /to/ continuum by variation of F1 onset time. The
matching non-speech continuum was obtained by presenting the same sounds, but with
inverted frequency scales and a modified F1 transition. Discrimination of the speech
stimuli was categorical, and of non-speech stimuli continuous. In a follow-up study,
Lane and Schneider (1963, reported in Lane, 1965) found that some participants could
be trained to correctly identify the non-speech stimuli as speech sounds. Results of a
subsequent discrimination study revealed relatively high within-category discrimination
11 The flicker fusion threshold is defined as the frequency at which an intermittent light stimulus seems to be completely steady to the observer.
and a peak at the boundary – categorical perception, though see Studdert-Kennedy,
Liberman, Harris, and Cooper (1970), who were unsuccessful in training participants to
identify the non-speech stimuli in a consistent manner.
Another approach to non-speech analogues of VOT, used by Miller, Wier, Pastore,
Kelly and Dooling (1976) was to present stimuli consisting of white noise and a square-
wave buzz with varying noise-buzz lead times. Control data with isolated noises did not
show discrimination peaks, but the noise-buzz stimuli yielded category boundaries that
were generally located at the same point of the continuum where a discrimination peak
was found – another case of categorically perceived non-speech stimuli.
Pisoni (1977) constructed a VOT analogue by varying relative tone onset times (TOTs)
of two pure tones. After training, a boundary effect and discrimination peaks were
observed at a similar location to the VOT boundary for voiced/voiceless stop
consonants. In two subsequent experiments Pisoni (1977) tested discrimination of the
same stimuli without training and some of the participants showed similar results to
those found in the previous study (category boundary at short low-tone lags), whereas
other listeners exhibited two peaks in discrimination – at 20 ms lead and 20 ms lag
times of the lower component tone of the stimulus. Together with a subsequent
successful attempt to train participants to divide the continuum into three categories,
these results show that there were two natural boundaries on the continuum, around +20
ms and −20 ms TOT (locations that coincide with the voicing boundaries of languages
that make a three-way voicing distinction, such as Thai). Pisoni (1977) concludes that VOT perception is
influenced by temporal-order processing limitations. The results suggest that there is a
threshold for judgements of non-simultaneity. It appears that the listener needs an onset
asynchrony of around 20 ms between two successive sounds in order for them to be
perceived as two temporally distinct events. If the separation is less than 20 ms, sounds
appear to have a simultaneous onset and temporal order judgements are difficult (Hirsh
& Sherrick, 1961).
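The construction of such TOT stimuli can be sketched in code. The following is a minimal illustration; the tone frequencies (500 Hz and 1500 Hz), duration (230 ms), and sample rate are assumed illustrative values, not necessarily Pisoni's exact parameters. Positive TOT values delay the lower tone's onset relative to the higher tone, negative values advance it.

```python
import numpy as np

def tot_stimulus(tot_ms, f_low=500.0, f_high=1500.0,
                 duration_ms=230.0, sr=10000):
    """Two simultaneous pure tones whose onsets differ by tot_ms:
    positive TOT = low tone lags, negative TOT = low tone leads."""
    n = int(sr * duration_ms / 1000)
    shift = int(sr * abs(tot_ms) / 1000)
    t = np.arange(n) / sr
    low = np.sin(2 * np.pi * f_low * t)
    high = np.sin(2 * np.pi * f_high * t)
    # Delay whichever tone starts later by zero-padding its onset,
    # and pad the other tone's offset so both arrays stay equal in length.
    if tot_ms > 0:        # low tone lags the high tone
        low = np.concatenate([np.zeros(shift), low])
        high = np.concatenate([high, np.zeros(shift)])
    elif tot_ms < 0:      # low tone leads the high tone
        high = np.concatenate([np.zeros(shift), high])
        low = np.concatenate([low, np.zeros(shift)])
    return low + high

# An 11-step continuum from -50 ms (lead) to +50 ms (lag) in 10 ms steps
continuum = [tot_stimulus(tot) for tot in range(-50, 60, 10)]
```

The ±20 ms boundaries discussed above would fall at the third and ninth steps of this continuum.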
Summerfield (1982) compared perception of three types of continua: TOT, noise-buzz
stimuli (similar to those previously studied by Miller et al., 1976) and VOT, with onset
asynchrony threshold measured as a function of the lowest stimulus component (F1).
The results show that there is a boundary effect for VOT, but not for the two non-
speech continua, suggesting that speech and non-speech processing are different.
Formant Transition Analogues:
The most important cues for perception of consonant place of articulation are the
transitions of the first and second formants, F1 and F2. Non-speech analogues of
formant transition cues are created by excluding the constant parts of the signal (F1 and
the steady state portion of F2) to present F2 in isolation (perceived as bleats), or only
the transition (perceived as chirps). Generally, the perception of these chirps and bleats
is continuous, rather than categorical (Kirstein, 1966; Mattingly et al., 1971; D. B.
Pisoni, 1976; Popper, 1972).
Closure Duration in Speech and Non-speech:
To create non-speech analogues of closure duration, Liberman, Harris, Eimas, Lisker,
and Bastian (1961) matched the duration and amplitude of two noise bursts with those
of the pre-closure and post-closure characteristics of speech sounds (/ræbɪd/ - /ræpɪd/).
Stimuli varied on silent interval duration (30-120 ms) and the results show that ABX
discrimination12 ability of the non-speech stimuli was not as good as for the speech
stimuli and no non-speech discrimination peaks were observed, a result that was further
supported by similar findings by Baumrin (1974). Finally, Perey and Pisoni (1980)
conducted a categorical perception experiment of silent intervals that were embedded in
two 250-ms three-tone complexes imitating the formants of an /ə/-vowel. The results
show that the stimuli were perceived continuously. Together, these results show that
duration of silence is only perceived categorically when presented in a speech context.
2.4 Task and Response Factors in Categorical Perception
When designing a categorical perception task, it is essential to choose procedures
carefully, as these can affect the final results. Thus, a good knowledge of these
procedures and their effects is important and these are reviewed in the following
sections.
2.4.1 Identification Task Factors
In categorical perception tasks, the identification procedure is quite consistent across
studies. Only three task types are used to test identification: open, covert,
12 In the ABX discrimination task two stimuli (A and B) are presented and then a third (X) is offered, which is either A or B. The subject is required to indicate whether X equals A or B.
and AXB identification. In open identification, the subject is presented with each token
(X) of a continuum (usually in random order, and usually multiple times) and asked to
identify the category to which the stimulus belongs, usually using labels that carry the
name of the categories. If there are no category labels and the stimuli are assigned to
functional categories such as 'left' or 'right', the term covert identification is used. A
third way to test identification is with the AXB paradigm, in which the listener must
judge whether stimulus (X), which varies over trials, is more similar to the first or to
the third member of the triad (A or B), which are always the endpoints of the
continuum (Lindblom & Studdert-Kennedy, 1967).
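The open identification procedure described above amounts to building a randomised trial list in which every token of the continuum appears a fixed number of times. A minimal sketch, in which the seven-step continuum and ten repetitions are arbitrary illustrative values:

```python
import random

def identification_block(continuum_steps, repetitions=10, seed=1):
    """Randomised presentation order for an open identification task:
    every token of the continuum appears `repetitions` times."""
    rng = random.Random(seed)
    trials = [step for step in range(continuum_steps)
              for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

block = identification_block(continuum_steps=7, repetitions=10)
```

Each entry of `block` indexes the continuum token to present; responses are then tallied per token to obtain the identification function.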
2.4.2 Discrimination Task Factors in Categorical Perception
While the identification task does not offer a great variety of experimental procedures,
there are a number of different discrimination tasks to choose from (Gerrits &
Schouten, 2004). The choice of discrimination task is important, because certain tasks
appear to induce categorical perception more than others.
2.4.2.1 ABX and AXB Discrimination Tasks
The ABX task is one of the standard discrimination tests in categorical perception
research. In the ABX discrimination task two stimuli (A and B) are presented and then
a third one (X), which is either A or B. The subject is required to indicate whether X
equals A or B (Liberman et al., 1957). High levels of categorical perception are often
found with the ABX task and Massaro and Cohen (1983) claim that this high level
might reflect the use of phonetic memory. That is, in ABX tasks, listeners may try to
remember both auditory memory traces and the labels of the A and B sounds. When
sound X 'arrives', these auditory traces may have already faded, in which case listeners
must rely on the labels (or 'internal labels', if there are no actual labels involved in the
task) they have previously assigned to A and B and choose the one that matches the
label they have assigned to X. Such a strategy could well result in high degrees of
categorical perception. Signal detection analysis of data from an ABX task (B.
Schouten, Gerrits, & Hessen, 2003) has revealed that it is subject to a very strong bias
towards the response “B = X”. In theory, this is not a great problem, as a signal
detection analysis will allow a clear separation between sensitivity and bias, but in
practice, the greater the bias, the less likely it is that the conditions for such an analysis
to be met. In order to overcome this problem, a variant of the ABX procedure, the AXB
discrimination task has been used, in which the second stimulus is identical to the first
or the third sound (Van Hessen & Schouten, 1999).
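The ABX trial structure can be made concrete with a small sketch. For a given stimulus pair there are four trial types (ABA, ABB, BAA, BAB); the helper names below are hypothetical:

```python
import random

def abx_trials(a, b, n_per_type=5, seed=2):
    """The four ABX trial types for one stimulus pair: the first two
    intervals present A and B (in either order), X repeats one of them."""
    types = [(a, b, a), (a, b, b), (b, a, a), (b, a, b)]
    rng = random.Random(seed)
    trials = types * n_per_type
    rng.shuffle(trials)
    return trials

def correct_response(trial):
    """'A' if X matches the first interval, otherwise 'B'."""
    first, _, x = trial
    return 'A' if x == first else 'B'
```

Scoring responses against `correct_response` over many such trials yields the discrimination function whose bias properties are discussed above.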
2.4.2.2 The Two-Interval Two-Alternative Forced-Choice Discrimination Task
In the two-interval two-alternative forced-choice (2I2AFC) paradigm, the two stimuli
that are presented are always different, and the subject must determine the order in
which they are presented (AB or BA). This makes it necessary to explain to the
participants what the term 'order' means, which makes it difficult to avoid mentioning
the phoneme categories in the instructions, with the consequent risk of encouraging
labelling behaviour (M. E. H. Schouten & Van Hessen, 1992). Response bias to one or
the other stimulus has, however, been found to be much smaller here than in ABX tasks
(B. Schouten et al., 2003).
2.4.2.3 The AX Discrimination Task
To avoid strategies that exclusively rely on category labels, a task is required that
reduces the cognitive load on auditory memory and encourages direct auditory
comparison between the stimuli that are to be discriminated (D. W. Massaro & Cohen,
1983). An example of such a task is AX discrimination, in which all possible stimulus-
pairs are presented (AA, BB, AB, and BA) and the participants must indicate whether
they are the same sound or different sounds. A disadvantage of this task is that, if the
difference between two neighbouring stimuli is relatively small, listeners tend only to
respond “different” if they are very sure of their decision. For this reason the AX
discrimination task is not bias-free: the listener's response is determined by a subjective
criterion of what is “same” and what is “different”. The AX task is often chosen if there
are time-limitations for the experiment, as it is the most time-efficient discrimination
measure and is also relatively easy to use with young children, or when the stimulus
categories are unfamiliar to the listener, such as in cross-language studies.
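Because the AX task is not bias-free, same-different data are usually submitted to a signal detection analysis that separates sensitivity from the response criterion. The sketch below applies the standard yes-no formulae (d′ and c) with a log-linear correction; it is an illustration of the general approach, not the dedicated same-different model:

```python
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Signal detection analysis of same-different responses:
    d' indexes sensitivity; c indexes the response criterion (bias)."""
    z = NormalDist().inv_cdf
    # Log-linear correction keeps the rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion
```

A listener who says "different" only when very sure (the conservative criterion described above) would show a positive c, independently of d′.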
2.4.2.4 The Four-Interval-AX Discrimination Task
In this task (4IAX) the test trials consist of eight possible combinations: ABAA,
BAAA, AAAB, AABA, and BABB, ABBB, BBBA, BBAB. The time interval between
the second and the third stimulus is longer than between the other sounds, so that the
impression of two pairs of sounds arises. The participants are required to decide which
pair contains two identical stimuli, the first or the second pair. It is assumed that the
listener first determines the differences between the stimuli within the pairs and, in a
second step, determines which of the two differences is smaller. The decision is thought
to be based mainly on bottom-up13 auditory information and not to be subject to top-
down14 influences, such as information about phoneme boundaries (Gerrits &
Schouten, 2004). The 4IAX task has been found to be more sensitive to acoustic
differences between sounds than the other above-mentioned tasks (D. B. Pisoni, 1975).
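The eight 4IAX trial types listed above follow a simple combinatorial scheme: which pair is identical (AA or BB), how the different pair is ordered (AB or BA), and whether the identical pair comes first or second. A sketch enumerating them:

```python
from itertools import product

def four_iax_trials():
    """Enumerate the eight 4IAX trial types: one pair of intervals
    contains identical stimuli, the other contains both A and B."""
    same_pairs = ['AA', 'BB']
    diff_pairs = ['AB', 'BA']
    trials = []
    for same, diff in product(same_pairs, diff_pairs):
        trials.append((same + diff, 'first'))   # identical pair comes first
        trials.append((diff + same, 'second'))  # identical pair comes second
    return trials
```

Each tuple pairs the four-interval sequence with the correct response ('first' or 'second' pair identical).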
2.4.2.5 The Four-Interval Oddity Discrimination Task
In the 4I oddity task, the stimuli A and B are presented randomly in the two orders
AABA or ABAA, with stimulus A at the beginning and the end of the trial, functioning
as a reference stimulus. Listeners must respond by indicating if the 'oddball' (stimulus
B) is the second or third stimulus. In principle this task is as bias-free as 4IAX and it
has a much shorter experimental duration. However, although it is a four-interval task,
the optimal decision rules defined by Macmillan and Creelman (1991) predict that the
ideal observer will ignore the reference stimuli (stimulus 1 and 4) and thus perform the
4I-oddity task like a standard AX (same-different) task (Heller & Trahiotis, 1995). The
advantage over AX is that the listener can decide about the oddball without needing to
refer to any internal criterion of 'same' or 'different'. In other words, it is expected that
the 4I-oddity paradigm combines some of the important aspects of AX and 4IAX and
that listeners will have the choice between two perceptual strategies: an AX-like
phoneme labelling strategy or a 4IAX-like low-bias strategy.
Due to time constraints, the AX same-different task type is used in the current
experiment series. In addition, as no labels are required in the AX procedure it is
possible to test people from different language backgrounds on the same experiment.
As there were no 'real' labels for the stimuli that will be used (the pitch contour will be
rising or falling, but this acoustic dimension does not have meaning for everyone) this
was considered to be the most appropriate task with which to work.
13 Bottom-up is a term that characterises any procedure or model which begins with a low-level (e.g. acoustic) unit, or with the smallest functional units in a hierarchy, and proceeds to combine these into larger units.
14 Top-down, as opposed to bottom-up, begins with the analysis of a high-level (e.g. more cognitive or composite) unit into progressively smaller units.
2.4.3 Methods for Increasing Categoricality
In categorical perception experiments, two different ways of increasing the
categoricality of perception without changing the task have been discovered:
interference with auditory memory and decay of auditory memory.
2.4.3.1 Interference with Auditory Memory
Different attempts to interfere with auditory memory in categorical perception tasks
have been undertaken and the most effective ways to do this are summarised below.
Lane (1965) tested the influence of interference by using an existing vowel continuum
(Fry et al., 1962) and his results indicate that the addition of irrelevant noise interferes
with memory in such a way that it increases discrimination ability at the category
boundaries, but not within categories, and thus leads to a pattern of results that
resembles categorical perception. Fujisaki and Kawashima (1969; 1970) provided
listeners with a fixed vocalic context (/a/) in a test for categorical perception of a vowel
continuum (/i/ - /e/), turning them into diphthongs. Their results show that perception
was more categorical when the context was not present. The difference between the two
sets of results was explained as being a result of the context serving as a perceptual
reference.
Pisoni (1975) investigated the role of a fixed context in vowel perception more
systematically. His hypothesis was that if the context stimuli provide a perceptual
reference, as suggested by Fujisaki and Kawashima (1969; 1970), then it should not be
important whether the context is presented before or after the test stimulus. If, on the
other hand, the context does influence auditory memory, it is expected that addition of a
post-stimulus context will cause more interference than a preceding context. In
addition, Pisoni (1975) hypothesised that the amount of interference would be
determined by the similarity between test stimulus and context stimulus. He used four
different context stimuli (pure tone, white noise, and the vowels /a/ and /ɪ/) to interfere
with a continuum of stimuli ranging from /i/ to /ɪ/. In these experiments (identification
and ABX discrimination), the context either followed or preceded the test stimulus. The
results support Pisoni's (1975) similarity hypothesis: Discrimination ability was the
lowest in the (most similar) /ɪ/ context, with a greater decrease in discrimination
scores when the context followed than when it preceded the test stimuli.
In a subsequent study, Repp, Healy, and Crowder (1979) tested discrimination of
stimuli from an /i/ - /ɪ/ - /ɛ/ continuum with a silent or a filled (using an inserted /y/
vowel sound) interstimulus interval. Trials that contained the intervening vowel
stimulus showed a decrease in discrimination performance, and it was
concluded that the categoricality of perception had increased. The authors' interpretation of
these data was that auditory memory had exerted its effect before phonetic
categorisation, in the form of contrastive interactions between auditory stimulus traces,
and that discrimination was then mainly based on the phonetic labels.
Together, the results of these studies strongly suggest that interference with auditory
memory can increase categoricality of perception.
2.4.3.2 Decay of Auditory Memory
Another way of interfering with auditory memory in discrimination tasks is by
manipulating the interstimulus interval (ISI). The ISI is the interval between two stimuli
to be discriminated. Longer ISIs allow greater decay of auditory memory. Since it is
desirable to encourage comparison of the acoustic cues in stimuli during discrimination
and since the auditory trace of speech sounds is time-dependent, it is important to make
a considered decision about the ISI in categorical perception experiments. If the ISI
exceeds the duration of the auditory trace of the stimuli, all that is left of the first
stimulus is a representation coding the relationship of the presented sound to the other
sounds in the experiment, or to pre-established categories, or to both (D. B. Pisoni,
1973). Studies with variable ISIs (Cowan & Morse, 1979; Cutting, Rosner, & Foard,
1976; Frazier, 1976; D. B. Pisoni, 1971, 1973; Repp et al., 1979) have shown greater
categorical perception of speech sounds with longer ISIs up to a maximum of 3 seconds
(Crowder, 1982).
ISI duration will be manipulated in Experiment 1 of this thesis to investigate the role of
auditory memory in pitch perception in speech and non-speech sounds.
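The ISI manipulation amounts to varying the silent interval between the two stimuli of each AX trial. A minimal scheduling sketch; the stimulus duration and inter-trial interval are illustrative assumptions, while the 500 ms and 3 s ISIs echo the values discussed above:

```python
def ax_trial_schedule(stimulus_ms, isi_ms, n_trials, iti_ms=2000):
    """Onset times (ms) of the two stimuli in each AX trial of a block:
    stimuli within a trial are separated by the ISI, and successive
    trials by the inter-trial interval (ITI)."""
    onsets = []
    t = 0.0
    for _ in range(n_trials):
        onsets.append((t, t + stimulus_ms + isi_ms))
        t += 2 * stimulus_ms + isi_ms + iti_ms
    return onsets

# Short vs long ISI conditions (250 ms stimuli are an illustrative value)
short_isi = ax_trial_schedule(stimulus_ms=250, isi_ms=500, n_trials=2)
long_isi = ax_trial_schedule(stimulus_ms=250, isi_ms=3000, n_trials=2)
```

With the longer ISI, the onset of the second stimulus is pushed well beyond the presumed duration of the auditory trace of the first.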
2.4.4 Methods to Reduce Categoricality of Perception
In this section, the results of studies using more sensitive discrimination paradigms and
experiments using rating scales and reaction time measurements are reviewed.
Experiments in this area have focussed on stop consonants, a class of speech sounds
that are known to be perceived highly categorically.
2.4.4.1 The Use of More Sensitive Discrimination Paradigms
One way to reduce categoricality is to use more sensitive discrimination paradigms in
order to access memory traces for acoustic properties of stop consonants retained in
auditory memory. As stop consonants are considered to be abstract, highly encoded
categories that require a special speech decoder (Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967), these are ideal candidates for this experimental manipulation.
Pisoni (1971) presented steady-state vowels and stimuli from a /bæ/ - /dæ/ - /gæ/
continuum in a 4IAX and an ABX task (see section 2.3.2). The results show that
discrimination of vowels, but not of consonants, was better in the 4IAX than in the
ABX task. Thus the data show a contribution of only phonetic, and not auditory,
memory in stop consonant perception, even when these more sensitive measures
(4IAX) are used. Pisoni and Lazarus (1974) also compared ABX and 4IAX
discrimination of a /ba/ - /pa/ continuum and included a preparatory sensitisation phase,
where one group of listeners were presented with the whole continuum before the
discrimination task. Discrimination improvement was observed only in those listeners
who had been sensitised and for whom the 4IAX task was used. However, a similar
study (D. B. Pisoni & Glanzman, 1974) found that the factor that increased
discriminability must have been the sensitisation phase, because there was no
difference between results of the ABX and the 4IAX tasks when they were tested
without prior presentation of the stimulus continuum. Crowder (1982) compared
discrimination of a /i/ - /I/ continuum in ABX and AX tasks with different ISIs (500 ms
and 3 s) and his results show that AX is the more sensitive task type, and one that also yields
much more consistent results.
In summary there is no doubt that, even when more sensitive discrimination methods
are used, perception of stop consonants remains largely uninfluenced and highly categorical. (For a
comparative overview of different discrimination tasks for the perception of non-speech
sounds see Creelman & Macmillan, 1979).
2.4.4.2 The Use of Rating Scales and Measurement of Reaction Times
Another way to quantify differences in within-category perception is by assessing
listeners' certainty in identifying stimuli through reaction time measurement. It is
expected that reaction times will be longer for 'difficult' ambiguous stimuli than for 'easy'
unambiguous sounds closer to the endpoints of the continuum. Studdert-Kennedy,
Liberman and Stevens (1963; 1964) were the first to investigate this hypothesis and
found peaks in reaction time at the category boundary for stop consonants, a
finding that has been replicated very often since then (Cross et al., 1965; D. B. Pisoni &
Tash, 1974; Repp, 1975, 1981a).
Another way of accessing information about auditory memory in identification is the
use of scales to rate individual stimuli. Conway and Haggard (1971) provided their
subjects with a 9-point rating scale to assess stimuli from /bɪl/ - /pɪl/ and /gɪl/ - /kɪl/
VOT-continua and their results led to the conclusion that even finer-grained scales do
not make distinctions within those consonant categories possible. These findings
suggest that stop consonants are perceived categorically and that such perception does
not depend on the number of items on the scale that is being used.
A task type called absolute identification was employed by Sachs (1969), in order to
establish a one-to-one correspondence between the stimuli and responses. In this task
listeners used numbers from 1 to 8 to label stimuli on a continuum between /badəl/ and
/bædəl/ and between /a/ and /æ/ vowels that differed in duration. Perception was quite
categorical for all stimuli except for the long vowels, which suggests that rating scales
do not influence categoricality of perception. Similar results were obtained by Cooper,
Ebert, and Cole (1976) for stimuli from /ba/ - /wa/ and /ga/ - /ja/ continua.
Rating scales have also been used in discrimination tasks. Vinegrad (1972) used the
method of direct magnitude scaling to investigate the perception of consonants (/bε/ -
/dε/ - /gε/), vowels (/i/ - /I/ - /ε/) and pure tones varying in frequency. Listeners were
asked to rate stimulus X by marking a point on a line between A and B. Their results
suggest highly categorical perception of stop consonants and rather continuous
response patterns for the vowels and pure (non-speech) tones. A similar experiment was
conducted by Strange (1972), who observed the same result patterns for stimuli in
which VOT was manipulated. In a similar vein, Pisoni and Glanzman (1974) had their
participants make confidence ratings for /ba/ - /pa/ discrimination and obtained a very
close relationship between discrimination performance and confidence: the higher the
confidence, the better the performance. Repp (1984) suggests the possibility that
“Rather than directly accessing some auditory memory representations, subjects
might base decisions about stimulus differences on estimates of their subjective
uncertainty in phonetic categorization.” (p. 270)
It can be concluded that different kinds of discrimination tasks, rating scales and
reaction time measurements are good ways to access additional information about
the listener's stimulus representations, but they do not change the pattern of
categorical perception.
2.5 Psychoacoustic Strategies and Experiential Factors in Categorical
Perception
In the previous sections, it was shown that categorical perception varies as a function of
stimulus and response factors. Beyond these factors, it is also important
to ask whether categorical perception is a property of the auditory system or more
a result of experiential participant factors. This section will review studies that ask
whether categorical perception is immutable (in sections 2.5.1 and 2.5.2,
the studies on the influence of training, strategies, and language background on
perception are summarised); whether it is innate (studies with human infants are
reviewed in section 2.5.3); and whether it is specific to humans (a review of animal
studies is given in 2.5.4).
2.5.1 Practice and Strategies
It was shown earlier (2.3.3) that within-category discrimination could be manipulated
by using different kinds of discrimination tasks. Another way of improving
discrimination is to provide feedback to the participants.
2.5.1.1 Practice and Feedback
In a categorical perception task, feedback means providing the participant with
information about whether the response that they have just given was correct or
incorrect. Hanson (1977) was one of the first researchers to use feedback in a same-
different task and found that listeners' performance improved significantly when
feedback was used (compared with Repp, 1975, whose participants completed the same
task without feedback and did not show any improvement).
Training has also been used to improve discrimination performance. Training is
different from feedback in that the listener is given a number of practice trials before
the experiment, whereas feedback is only given during the experiment. Carney, Widin,
and Viemeister (1977) used feedback in their experiment on stimuli from a /ba/ - /pa/
continuum and found improved discrimination, that is, less categorical perception. A
follow-up study was conducted by Edman, Soli and Widin (1978) and the results
showed that listeners who were trained on a labial VOT continuum were able to
transfer their discrimination skills to a velar continuum. Similar results were observed
by Edman (1979), who successfully trained listeners with stimuli from /bæ/ - /dæ/ -
/gæ/ and /pæ/ - /tæ/ - /kæ/ continua and by Samuel (1977) with /da/ -/ta/ stimuli.
Generally, we can conclude that training and feedback, and especially the combination
of the two, allow intraphonemic discrimination.
2.5.1.2 The Use of Strategies in Categorical Perception
As we have seen in the previous section, feedback and training are ways to enable
listeners to make within-category discriminations in stop consonants by directing the
listeners' attention to certain stimulus properties that are not required for phoneme
discrimination in fluent speech.
In some continua, acoustic differences are more salient and more easily accessible than
in others. The question is whether discrimination is easier in these continua, and
whether less training is needed.
Repp tested stimuli from a [ʃ] - [s] continuum, which were perceived rather
categorically before training. After extensive training, isolated fricatives showed
continuous perception. However, when presented in vocalic context, perception was
again categorical (Repp, 1981b). When a different training method was used
(presenting sounds isolated or in context one after the other), participants could be
trained to pick up intraphonemic differences – continuous perception was observed. It
seems that listeners were able to switch between different perceptual modes – phonetic
and acoustic.
There are various studies that show that different auditory strategies may be applied
when listening to speech stimuli or speech-like sounds. This is only possible when
listening in the auditory mode15; in the phonetic mode, all relevant acoustic input is
integrated into a phonetic percept, a phonetic category. Best, Morrongiello, and Robson
(1981) discovered that listeners tested on perception of sine-wave stimuli could be
divided into two groups: temporal listeners, who concentrated on temporal duration
information and spectral listeners, whose attention was focused on the spectral changes
in the signal. Similar results were obtained in a study about amplitude rise time
discrimination (Rosen & Howell, 1987): there were large individual differences in
participants‟ attention to spectral vs. temporal cues.
In a study on sine-wave analogues Bailey, Summerfield, and Dorman (1977) found
interesting differences in perception depending on whether the participant perceived the
artificial stimuli as speech or as non-speech: When perceived as speech, the category
boundary was similar to a previously determined boundary of matching speech stimuli,
whereas when perceived as non-speech, the boundary location did not match the speech
boundary. These results suggest that the location of the category boundary (as well as
the shape of the discrimination function) depends not only on the spectral
characteristics of the stimuli but also on the expectations of the listener.
15 Auditory mode of perception refers to perception without reliance on linguistic category labels. In this mode fine acoustic details are perceived well, even variations within phonetic categories.
A similar effect was found for stop consonant manner16. Best et al. (1981) created a
synthetic continuum between sine-wave analogues of /say/ and /stay/ (the stimuli
consisted of an initial noise burst, followed by a variable silent interval and a three-tone
complex with variable F1 onset-frequencies). The results show that those listeners who
perceived the stimuli as speech sounds perceived them in a categorical manner. Those
participants who did not relate the stimuli to speech were separated into two groups:
listeners who paid attention to the duration of the silent interval (temporal listeners) and
listeners who paid attention to the onset quality (spectral listeners). Temporal listeners‟
discrimination was worse than that of the speech-perceivers, whereas spectral listeners
showed much better discrimination abilities than the people who perceived speech.
These results support the conclusion that there are separate modes of perception for
speech and for non-speech sounds.
The frame of reference in which the stimuli are presented is very important for
perception. It has proven to be possible to use different perceptual strategies while
operating in the phonetic listening mode. Researchers at Haskins Laboratories
(unpublished study, cited by Repp, 1984) presented a /ba/ - /da/ continuum for
identification and discrimination and obtained the usual categorical perception result
pattern. When they provided the listeners with an additional label, /, listeners
developed two category boundaries and two discrimination peaks. This shows that there
is strong phonetic influence on discrimination performance.
In summary, no definitive conclusion can be drawn about whether perception of
speech and non-speech sounds involves separate processes; what seems to be
important is not whether the stimuli are natural speech or non-speech analogues but
rather how they are interpreted by the listener. It seems that categoricality is more a
function of the expectations of the listener than of the acoustic properties of the stimuli.
The results of these studies lead to the conclusion that perceptual categories are not
established as a result of psychoacoustic sensitivities but mainly by the phonetic criteria
that are adopted by the listener.
16 Manner of articulation refers to how the tongue, lips, and other speech organs are involved in making a sound. One such manner is that of stops. Stop consonants are made by a set of articulators, e.g., the two lips, or the tongue and the teeth, closing off the airflow momentarily, and then releasing the air in a burst.
2.5.2 The Influence of Specific Linguistic Experience on Categorical Perception
As different languages have different phoneme inventories, it is of interest to discover
whether perceptual phoneme boundaries change depending on language background. A
central phenomenon in the area of cross-linguistic studies is the phoneme boundary
effect (Carney et al., 1977). There are two main possibilities: phoneme boundaries
could be of a universal psychoacoustic or phonetic origin, in which case they
should be in the same location independent of language background. If, on the other hand,
boundaries are affected by the surrounding language, then boundaries and
discrimination peaks should occur wherever each language's phonemic boundaries are located.
Most cross-linguistic studies have examined the voicing dimension, taking advantage of
the fact that languages such as English, French and Thai contrast voicing in
phonetically different ways. English distinguishes voiced and voiceless aspirated stops:
depending on the place of articulation, the perceptual voicing boundary for English
listeners is positioned in the short-lag region of VOT between 20 and 40 ms (Lisker &
Abramson, 1970). In French, Polish and Spanish prevoiced and voiceless unaspirated
stops are contrasted. This category boundary is more variable but generally located at
lag times around 0 ms VOT (Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973;
Keating, Mikos, & Ganong, 1981; L. Williams, 1977). Thai is a language that makes
both distinctions, resulting in three voicing categories (Forfeit, 1977; Lisker &
Abramson, 1970). These differences indicate that native language influences phonetic
boundary locations on VOT continua.
Given these differences, it is to be expected that there would be shifts of discrimination
peaks coincident with the phonetic boundaries across different languages. Indeed, Thai
speakers exhibit a discrimination peak in the voicing-lead region, a region in which
English listeners' discrimination performance is poor (Abramson & Lisker, 1970).
Most cross-linguistic studies have focussed on consonants, but there have also been a
few studies on vowel perception across languages. Flege, Munro, and Fox (1994) tested
English and Spanish bilingual listeners' perception of Spanish and English vowels in a
dissimilarity task and found that perceived dissimilarity increased with increasing
acoustic separation in both listener groups. It was concluded that there is a universal,
psychoacoustic component in cross-language vowel perception.
Another way of looking at cross-linguistic differences is by testing perception of new
phonetic contrasts. Various studies compared perception of voicing across languages
and found differences in perception depending on language background. One of the
most influential studies is that of Lisker and Abramson (1970), who compared
perception of VOT in Thai (there are three voicing contrasts in Thai: prevoiced,
voiceless unaspirated and voiceless aspirated) and English (English distinguishes two
voicing contrasts: voiced and voiceless aspirated) listeners. When presented with
stimuli from a VOT continuum, Thai listeners exhibited three categories, whereas
English participants perceived only two. These and other results show that native
language does seem to influence boundary existence and location(s) (Lisker, 1970;
Lisker & Abramson, 1970; McClasky, Pisoni, & Carrell, 1980; D. B. Pisoni, Aslin,
Perey, & Hennessy, 1982; Strange, 1972). Together, these results demonstrate that it is
possible to acquire new phonetic contrasts under laboratory conditions, although it
remains unclear whether such laboratory-acquired distinctions transfer to the real
world, for example when learning a new language.
2.5.3 Categorical Perception in Infants
Early on in categorical perception research, when infant perception studies were rare
due to methodological difficulties, categorical perception was seen as a language-
specific phenomenon (Liberman et al., 1957). The subsequent development of tools by
which to test infant speech perception allowed empirical investigation of this issue. In a
now classic study Eimas and his colleagues (Eimas, Siqueland, Jusczyk, & Vigorito,
1971) tested 1-month-old infants' perception of voicing with the High Amplitude
Sucking technique (HAS). After presenting /ba/ stimuli repeatedly (while the infant
sucked on a non-nutritive nipple) and then presenting a /pa/ stimulus, greater sucking
rates were observed. This was not the case when both stimuli came from the same
(adult) category (within-category contrasts, such as two different /pa/ stimuli with the
same VOT difference as the cross-category condition). As similar results were found in
subsequent experiments (Aslin & Pisoni, 1980; Aslin, Pisoni, & Jusczyk, 1983; Eilers,
1980) and with place of articulation contrasts (Bertoncini, Bijeljac-Babic, Blumstein, &
Mehler, 1987; Eimas & Miller, 1980b; Jusczyk, Copan, & Thompson, 1978; Moffitt,
1971; Morse, 1972), it was concluded that infants discriminate speech sounds
categorically, and that the general mechanisms underlying speech perception are innate
and specifically linguistic (Eimas, 1975; Eimas et al., 1971). (However, note that the results
were not so clear for manner of articulation, see Eimas & Miller, 1980a; Hillenbrand,
Minifie, & Edwards, 1979).
A problem with these studies was that, as identification is difficult to test with very
young infants, only the discrimination aspect of the categorical perception task was tested.
Burnham, Earnshaw, and Quinn (1987) argue that without identification data, it is not
possible to conclude that there is categorical perception of speech in infants. They
suggest that improved discrimination ability is merely a sign of heightened perceptual
ability and that the processes involved in discrimination are not the same as those in
identification. In a new infant identification procedure developed by Burnham,
Earnshaw, and Clark (1991), 9- to 11-month-old infants were tested on their
identification of bilabial stops from a VOT continuum. Results showed that
identification in the positive VOT region was significantly better than in the negative
VOT region, and that in neither region was a categorical result pattern found. This led
Burnham et al. (1991) to the conclusion that even though there may be categorical
discrimination (Aslin, Pisoni, Hennessy, & Perey, 1981), infants do not identify speech
categorically.
Similar to adult research, most categorical perception studies with infants have been
conducted on consonant contrasts. However, a few experiments have tested infants'
perception of vowels. In a study investigating vowel pairs ([a] vs. [i] and [u] vs. [i]),
Trehub (1973) found that 1- to 4-month-old babies could discriminate both contrasts.
Similar sensitivities were observed in a later study on [a] vs. [i] by Kuhl and Miller
(1982). Even though discrimination of steady-state characteristics of vowels was
observed in these experiments, the methods and results do not allow conclusions about
categoricality of vowel perception in infants.
The only study to have clearly investigated categorical vowel perception in infants is
that of Swoboda, Morse, and Leavitt (1976), in which 2-month-old infants'
perception of [i] vs. [I] was tested; the results showed that infants perceived these
vowel stimuli in a continuous manner.
Since infants, like adults, discriminate consonants categorically and vowels
continuously, the question of the origin of categorical perception – psychoacoustic or
linguistic – again arose. In order to obtain a clearer answer to this question, infants'
perception of non-speech stimuli is of interest.
Continuous discrimination of non-speech sounds versus categorical discrimination of
speech sounds was found by Mattingly, Liberman, Syrdal, and Halwes (1971), but
Morse (1972) observed
no perceptual differences between speech and non-speech continua.
These results, together with findings from similar studies (Jusczyk, Pisoni, Walley, &
Murray, 1980; Jusczyk, Rosner, Reed, & Kennedy, 1989), suggest that infants'
perception of speech vs. non-speech is similar to adults' and support the hypothesis
that specific perceptual mechanisms are functional from birth.
The difference seems to be that infants are sensitive to all phonetic contrasts, whereas
adults‟ perception is shaped by their linguistic environment such that they have learned
to ignore contrasts that are not linguistically significant for them.
2.5.4 Categorical Perception in Animals
The investigation of speech perception in animals is important because it allows
comparison with infant perception and conclusions about the species-specific nature of
speech perception. As animals do not produce human speech, their level of
discrimination ability for speech sounds should reflect the involvement of
psychoacoustic factors only. In order to investigate this, studies have been conducted
with species whose auditory systems are very similar to the human auditory system.
Morse and Snowdon (1975) measured heart rate changes in macaque monkeys in
response to stimuli from Pisoni's (1971) /bæ/ - /dæ/ - /gæ/ continuum. There was good
discrimination performance between categories and some within-category sensitivities,
a pattern that is similar to that observed in adult perception of place of articulation
contrasts (see 2.4.3.1). In addition, Kuhl and Miller (1975) used three VOT continua
/ba/ - /pa/, /da/ - /ta/ and /ga/ - /ka/ and found very similar identification abilities for
humans and chinchillas. Similarly, Kuhl and Miller (1978) tested chinchillas on voicing
contrasts and observed categorical discrimination, and Kuhl and Padden (1982)
observed human-like discrimination abilities for voicing and place of articulation in
macaques. These patterns of results suggest that there is a psychoacoustic basis for
categorical perception of voicing contrasts. (For a detailed review of animal speech
perception see Harnad, 1987.)
While these results suggest that humans and animals process speech in a similar way,
there are also differences. Waters and Wilson (1976), for example, compared
perception of voicing contrasts in humans and rhesus monkeys and found that both
species showed categorical discrimination, but that the category boundary for the
monkeys was highly dependent on the training stimuli (Sinnott, Beecher, Moody, &
Stebbins, 1976). Furthermore, in a vowel perception study, Kuhl (1991) observed that
monkeys do not exhibit the 'perceptual magnet effect'17. Their perception of vowels
was uninfluenced by the closeness of vowel prototypes, a finding that shows that there
are important differences between human and animal speech perception.
Finally, while there are certain similarities between human and animal speech
perception, it can be argued that the results of speech perception studies with non-
humans do not necessarily imply that general auditory processes or mechanisms
common to humans and non-humans are at work. Non-human performance with speech
is only analogous to human performance; it is possible that similar processes may arise
from disparate evolutionary sources (Jusczyk & Bertoncini, 1988).
2.6 Theories of Categorical Speech Perception
Many theories have been formulated to explain the phenomenon of categorical
perception and how it is embedded in speech perception. The search for an explanation
of how the transformation from acoustic signal to phoneme occurs has given rise to
many theoretical perspectives on speech perception18; a representative sample of the
most important of these is reviewed in this chapter.
2.6.1 The Motor Theory of Speech Perception
In the 1950s, much influential work was conducted on the categorical perception of
synthetic speech sounds. At Haskins Laboratories, Liberman, Cooper, Delattre and their
17 The perceptual magnet effect theory states that discrimination of vowel contrasts is more difficult if the vowels are acoustically similar to a vowel prototype (P. K. Kuhl, 1991).
18 For reviews of speech perception theories and models see Altmann, 1990; Barron, 1994; Diehl & Kluender, 1989; Diehl, Lotto, & Holt, 2004; Hickok, 2001.
colleagues' work (Delattre, Liberman, & Cooper, 1955, 1964; Delattre, Liberman,
Cooper, & Gerstman, 1952; Liberman, 1957; Liberman, Delattre, & Cooper, 1952,
1954; Liberman et al., 1956) provided the basis for what is known as the Motor Theory
of Speech Perception.
The motor theory has experienced important changes from the time of its first
formulation (Liberman, 1996), but one claim that has been made in every version is that
the objects of speech perception are articulatory rather than acoustic or auditory events.
More specifically, it is suggested that the articulatory events recovered by human
listeners are neuromotor commands to the articulators – also referred to as intended
gestures – rather than more peripheral events such as articulatory movements or
gestures (Liberman et al., 1967; Liberman & Mattingly, 1985). This view was informed
by a belief arising form the early studies of invariance in speech perception (see section
2.2.2), that the objects of speech perception must be more or less invariant with respect
to phonemes or feature sets and by a further belief that such a requirement was satisfied
only by neuromotor commands.
According to the motor theory, the perception of speech sounds occurs through the
participation of the same neuronal mechanisms that are responsible for their production
(Liberman et al., 1967; Liberman & Mattingly, 1985). Whereas phonemes were
assumed to stand approximately in one-to-one correspondence with neuromotor
commands and muscle contractions, mapping between muscle contractions and vocal
tract shapes was thought to be highly complex owing to the fact that adjacent vowels
and consonants are coarticulated. Because the relationship between articulatory
movements and acoustic signals was assumed to be one-to-one, the complex mapping
between phonemes and speech sounds was attributed mainly to the effects of
coarticulation (Liberman et al., 1967).
As an illustration of the complex mapping between phonemes and their acoustic
realisations, Liberman et al. (1967) pointed to differences in spectrograms of synthetic
two-formant patterns that are perceived by listeners as the syllables /du/ and /di/ (see
Figure 2.3 in section 2.2.2). In these, the steady-state formants correspond to the target
values of the vowels /u/ and /i/, and the rapidly changing formant transitions at the
onset of each syllable carry important information about the initial consonant. That
different formant patterns could evoke the same phonemic percept (/d/) strongly
suggested to the motor theorists that invariance must be sought at an articulatory rather
than an acoustic level of description.
The second main assertion of motor theory is that the very close relationship between
speech perception and speech production is innate. Perception of the intended gestures
is said to take place in a specialised speech mode, whose main function is to make an
automatic conversion from the acoustic signal to the articulatory gesture. The
supporters of this model argue that motor theory can explain a large body of speech
perception phenomena, including the variable relationship between acoustic realisations
and perceived speech sounds, duplex perception19, cue trading20, and audiovisual
integration21 (Liberman & Mattingly, 1985).
Despite these claims, the model appears incomplete because it states only that the
transformation from the acoustic signal to the perceived articulatory gestures takes
place, but not how. Moreover, even though categorical speech perception has been
considered to prove the existence and operation of a special decoder for speech, there
is evidence for analogous categorical perception of non-speech signals (see section
2.3.4.2). These findings suggest that categorical perception is not unique to
speech and that its existence does not depend on a special decoder. In addition, the
finding of categorical perception of speech in animals suggests that even if there is a
special decoder, such a special decoder is not unique to humans.
In a newer version of the motor theory, Fowler (1994; 1996) states that speech
perception is a direct mapping from acoustic qualities to the gestures by which
they were produced. This is framed within a perspective in which perception is
the direct recovery of the distal event that is perceived. The key elements of the direct
realist approach are: (1) perception is a single step from the signal to the percept, (2)
the percept is the gesture that produced the event, and (3) there must be an invariant to
mediate the mapping.
19 Duplex perception is an experimental technique that involves manipulation of two components of a sound stimulus, one in each ear. In one ear, the listener would, for example, be presented with a synthesised stop-vowel syllable (such as /ga/) from which the third formant transition is removed; this transition is simultaneously presented to the other ear. People typically perceive a /ga/ as well as the isolated transition, which sounds like a non-speech chirp. Perception is called 'duplex' because of the double effect: listeners hear both the integrated percept and the isolated transition percept.
20 The term cue trading refers to the concept that different cues can combine and trade against each other to signal the same contrast.
21 In audiovisual integration, an integrated percept results from a combination of different auditory and visual input. The phenomenon was discovered by McGurk and MacDonald (1976), who noted that when hearing /ba/ while seeing a video of a face saying /ga/, /da/ was perceived.
2.6.2 Articulatory/Auditory Theories
While the motor theory of speech perception claims that the categorical nature of
perception is a result of the categorical nature of gestures used in production (Liberman
et al., 1957), it has more recently been proposed that categorical perception occurs as a
result of natural sensitivities of the auditory system, with no reference to articulation or
any speech-specific mechanism. K. N. Stevens (1981) claims that certain acoustic continua,
because of the way the sounds comprising them are processed by the auditory system,
contain regions where discrimination is poor and other regions where it is good.
There are three lines of evidence to support the idea that natural auditory sensitivities
may be responsible for categorical perception. Firstly, categorical perception has been
reported with non-speech continua in which the acoustic contrasts seem to be related to
phonetic contrasts used to distinguish phonemes (Cutting & Rosner, 1974; J. D. Miller
et al., 1976; D. B. Pisoni, 1971). Secondly, mammals seem to perceive VOT
categorically (P. K. Kuhl, 1981; Ramus, Hauser, Miller, Morris, & Mehler, 2000;
Snowdon, 1987), and perhaps they are responding to the same auditory properties that
adult humans respond to, which would argue for a psychoacoustic basis of perceptual
categories. Thirdly, categorical discrimination occurs in human infants for certain speech
and non-speech sounds. For example, infants appear to discriminate VOT categorically
(Aslin et al., 1981), but also discriminate non-speech TOT categorically (Jusczyk et al.,
1980).
This evidence indicates that categorical perception may have a psychoacoustic basis
that arises from an auditory rather than a linguistic predisposition (Aslin et al., 1981;
Jusczyk, 1981).
2.6.3 The Stage Theory of Speech Perception
The Stage Theory of Speech Perception proposes a different kind of model. In this
theory it is claimed that there is a sequence of processing stages, the most distinctive
of which includes an array of phonetic feature detectors (Klatt, 1989; K. N. Stevens,
1986). In the first stage the speech signal undergoes analysis in the peripheral auditory
system (inner ear, basilar membrane), including filtering, lateral suppression,
adaptation, and phase locking. In the next stage an array of acoustic property detectors,
including detectors for onset, spectral change, formant frequency and periodicity,
compute relational attributes of the signal, for example dynamic changes in the
spectrum or periodicity across different parts of the signal, which tend to be more
invariant than absolute local or static attributes. The third stage consists of an array of
phonetic feature detectors, which examine the set of auditory property values over a
certain period of time, and decide if a particular phonetic feature, such as voicing or
nasality, is present. These decisions are language-specific, i.e., the detectors that are in
use are tuned to the phonetic contrasts of the ambient language of the listener.
Nevertheless, decisions may be similar in many languages owing to constraints
imposed on all speakers and listeners by speech production mechanisms (K. N. Stevens,
1989), and by the auditory system (K. N. Stevens, 1981). A phonetic feature detector
may lead to a decision based on the input from a single acoustic property detector, or it
may combine information from several property detectors. Finally, there are stages of
segmental analysis and lexical search. (For further description of these stages, see
Klatt, 1989.)
The main principle underlying this model is that it should be possible to find a
relatively invariant mapping between acoustic patterns and perceived speech sounds,
provided the acoustic patterns are analysed in an appropriate manner.
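The staged architecture described above can be made concrete with a schematic sketch. Everything in the code below is illustrative invention: the stage names follow the text, but the representations and the threshold are placeholder assumptions, since the theory itself specifies no implementation.

```python
from typing import List

# A schematic rendering of the stage sequence described above
# (peripheral analysis -> acoustic property detectors -> phonetic
# feature detectors). All representations and thresholds here are
# invented placeholders; the theory specifies no code.

def peripheral_analysis(signal: List[float]) -> List[float]:
    """Stand-in for peripheral auditory processing (filtering etc.)."""
    return [abs(s) for s in signal]

def property_detectors(auditory: List[float]) -> dict:
    """Stand-in for relational acoustic-property detectors, e.g. onsets."""
    return {"onset_strength": max(auditory) - auditory[0]}

def feature_detectors(properties: dict) -> dict:
    """Stand-in for language-tuned phonetic feature decisions."""
    return {"abrupt_onset": properties["onset_strength"] > 0.5}

def stage_pipeline(signal: List[float]) -> dict:
    # The stages are applied strictly in sequence, as the model describes.
    return feature_detectors(property_detectors(peripheral_analysis(signal)))
```

The point of the sketch is purely structural: each stage consumes only the previous stage's output, which is the model's central architectural claim.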
2.6.4 The Dual-Process Model
The Haskins group described speech perception as a process that is either categorical or
continuous, corresponding to the articulatory discontinuity or continuity of the
perceived segmental distinctions; i.e. whether co-articulations between two particular
segments occur, or are anatomically possible. Perception was thought to be mediated by
an articulatory representation of the input (Liberman et al., 1957), even though the
similarity of continuous perception and non-speech perception was evident.
This view of speech perception contrasts with the dual-process model proposed by
Fujisaki and Kawashima (1970) and elaborated by Pisoni (1975), in which two modes are
active simultaneously, one of which is strictly categorical and represents phonetic
classification and the associated short-term memory. The other is claimed to be
continuous and represents processes common to all auditory perception, including
auditory short-term memory. The results of any speech discrimination experiment are
thus assumed to reflect both processes: the part of performance that can be predicted
from identification performance (Haskins model) is ascribed to categorical judgements;
the 'rest', which is the deviation from the Haskins prediction, is assigned to the
memory for acoustic stimulus properties.
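The Haskins prediction referred to here can be computed directly from identification data. A minimal sketch follows, using the standard two-alternative ABX formula; the specific formula is an assumption on my part, as the text does not spell it out.

```python
def haskins_predicted_abx(p1: float, p2: float) -> float:
    """Predicted ABX discrimination for a stimulus pair, given the
    probabilities p1 and p2 of labelling each stimulus as a given
    category. Classic label-based prediction: chance (0.5) plus half
    the squared difference between the identification probabilities."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

# A within-category pair (identical labelling) predicts chance performance:
#   haskins_predicted_abx(0.9, 0.9)  -> 0.5
# A cross-boundary pair predicts near-perfect discrimination:
#   haskins_predicted_abx(0.95, 0.05) -> ~0.905
```

On the dual-process view, any observed discrimination above this prediction is attributed to the continuous, auditory-memory component.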
The dual-process model partly discards the articulatory basis for categorical perception
by associating continuous perception with auditory (non-speech) perception. Thus, the
difference in categoricality between stop consonants and vowels is hypothesized to
derive not from the different articulatory properties of these segments, but from the
different strengths of their representation in auditory memory. By augmenting the
Haskins model with a free parameter representing the contribution of auditory memory,
Fujisaki and Kawashima also introduced a way of quantifying 'degrees' of categorical
perception.
The dual-process model opens new research opportunities: for example, it makes it
possible to investigate how subjects use the two sources of information (categorical
and continuous) and what factors might lead them to rely more on one than the
other. Given that the continuous component is identified with general auditory memory,
supporters of this model developed techniques to manipulate the strength of
that memory and to examine its influence on discrimination (see section 2.4.2). Thus, in
the dual-process model categorical perception changed from being a “special” speech
phenomenon to being a function of the experimental situation.
3.1 What is a Tonal Language?
As briefly explained in the previous chapter, apart from consonants and vowels, a third
feature that can be used to distinguish the meaning of words in spoken language is tone.
Tone differs from consonants and vowels in that, while all the world's languages use
consonants and vowels, not all of them use tone: only some 60 to 70
percent make use of tone to convey meaning (Yip, 2002). Examples of tonal languages
are Mandarin Chinese (almost 900 million speakers), Thai (50 million speakers), and
Yoruba (a language spoken in West Africa, about 20 million speakers).
A language is classified as a tonal language if, over and above consonant and vowel
information, pitch height and/or contour can change the meaning of words. Tonal
languages in which there are mostly level tones and relative pitch height is important,
are called register tone languages; tonal languages that also distinguish meaning by
pitch contour (such as falling or rising), rather than height only, are called contour tone
languages (Pike, 1948). It is reported that around 80 percent of the tonal languages in
the world are contour tone languages (Maddieson, 1978).
Before discussing particular tonal languages in more detail, it is important to define and
distinguish some terms that will be used in the following sections: fundamental
frequency, pitch, and tone.
Fundamental frequency (F0) is an acoustic property of the speech signal. F0 is measured
in Hertz (Hz) and it refers to the pulses per second in the acoustic signal. In speech,
each pulse is produced by a single vocal cord vibration (the process of vocal production
will be explained in section 3.4).
The term 'pitch' refers to how the F0 of sounds (speech or non-speech) is perceived;
pitch is the subjective auditory perception that is correlated with the fundamental
frequency of the acoustic signal. Even though the terms pitch and F0 are often used
interchangeably, the relationship between F0 and pitch is not straightforward.
Psychoacoustic experiments have shown that pitch perception is linearly related to
F0 only up to frequencies around 500 Hz; above 500 Hz, pitch is
logarithmically related to physical F0 changes (Moore, 1989).
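The piecewise relationship claimed here can be illustrated with a toy function. The break point and scaling constant below are illustrative assumptions only, not a calibrated psychoacoustic scale.

```python
import math

def toy_pitch(f0_hz: float) -> float:
    """Toy illustration of the piecewise claim above: perceived pitch
    tracks F0 roughly linearly up to ~500 Hz and roughly
    logarithmically above. The 500 Hz break point and the scaling
    constant are illustrative assumptions, not measured values."""
    if f0_hz <= 500.0:
        return f0_hz  # linear region
    # logarithmic region, joined continuously at 500 Hz
    return 500.0 + 500.0 * math.log(f0_hz / 500.0)
```

Under this sketch, doubling F0 from 100 to 200 Hz doubles the pitch value, while doubling F0 from 500 to 1000 Hz adds only roughly 347 units: equal physical steps yield progressively smaller perceptual steps at high frequencies.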
Tone is a linguistic term: lexical tone refers to a phonological category that
distinguishes words and is only relevant for languages in which pitch plays a phonemic
role – tonal languages such as Thai, Mandarin, Cantonese, or Vietnamese. It is
important to note that there are factors other than pitch that influence tonal distinctions.
Duration of the vowel, voice quality, values of the second formant (F2), and amplitude
can also play important roles in the perception and production of lexical tone
(Abramson, 1978; Henderson, 1981; Tseng, Massaro, & Cohen, 1986; Yip, 2002).
Nevertheless, pitch variations – height and/or contour – are the sine qua non of lexical
tone.
In this thesis, the concern is with the pitch variations in speech sounds (lexical tone)
and in non-speech sounds.
3.1.1 Tonal Phenomena
Apart from lexical tone languages, other languages also use pitch to distinguish
meaning; the so-called pitch accent languages make use of tone in a different way
from tonal languages. Pitch accent languages like Japanese or Swedish mark words with
specific tone patterns that can change the meaning of the word. Pitch accent languages
differ from tonal languages in that not every syllable is marked by a tone and the
distribution of tones within a word or an utterance is predictable because it is governed
by certain pitch accent rules (Crystal, 2003). Another difference between pitch accent
and tone is that pitch accent relies on relative pitch across syllables, whereas lexical
tone is syllable-based.
In this thesis, the concern is not with pitch accent languages, but with the syllabic tone
languages Mandarin Chinese, Thai, and Vietnamese and the perception and production
of syllabic lexical tone and non-speech pitch.
A phenomenon that occurs in some tonal languages, e.g., Mandarin and Cantonese
Chinese, is tone sandhi. Sandhi is Sanskrit for "putting together", and tone sandhi
refers to the rules governing how the pronunciation of tones changes depending on
their context in spoken language (see section 3.2.3 for more information
about Mandarin tone sandhi). Historically, tone sandhi appears to have developed out
of allophonic variants of tones which assimilated other tones and were subsequently
replaced by them (Gussenhoven, 2004).
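The best-known Mandarin sandhi rule, a third tone followed by another third tone being realised as a second tone, can be sketched as a simple rewrite over a sequence of tone numbers. This naive left-to-right pass is my own illustration and deliberately ignores the prosodic grouping that governs the rule's application in longer phrases.

```python
def apply_third_tone_sandhi(tones):
    """Sketch of the best-known Mandarin sandhi rule:
    tone 3 + tone 3 -> tone 2 + tone 3.
    A naive left-to-right pass; real application in longer phrases
    depends on prosodic grouping, which is ignored here."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

# e.g. a two-syllable phrase with underlying tones 3, 3 surfaces as 2, 3.
```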
Apart from tone sandhi, there is a different articulatory phenomenon that can influence
tone production – tonal coarticulation. Coarticulation refers to the process in spoken
language in which the features of phonemes change due to overlap with the properties
of adjacent sounds. This has been observed with consonants and vowels (see section
2.2.2), and tones also seem to be prone to coarticulatory effects. Tonal coarticulation
occurs in various languages, including Thai (Gandour, Potisuk, Dechongkit, &
Ponglorpisit, 1992), Vietnamese (Han & Kim, 1974), and Mandarin Chinese (Y. Xu,
1994). In a comprehensive study of tonal coarticulation of Mandarin, it was observed
that there is bidirectional coarticulation in Mandarin that affects the average pitch
height, rather than only the tone onset or tone offset values (X. Shen, 1990).
Even though tone and intonation are semantically distinct and not to be confused (see
section 2.1.3), there are acoustic interactions between intonation and lexical tone. For
example in Thai, lexical tone interacts with intonation such that lexical tones keep their
contour, but the absolute values of F0 of tones decrease with the usual intonational
declination of F0 over the utterance (Abramson & Svastikula, 1983). Nevertheless, tone
and intonation remain distinct. In a study of intonation patterns in Thai,
Luksaneeyanawin (1984; 1998) observed four intonation types: statement intonation
(also called tune 1), question intonation (tune 2), telephone 'yes' intonation (tune 3),
and agreeable and interested intonation (tune 4). Integrated in the different tunes, the
lexical tones could still be identified quite accurately, even though there was some
confusion between low, mid, and high tones in tune 1 and between mid and low tones
in tune 2 (Luksaneeyanawin, 1984, 1998).
Thus even though the acoustic characteristics of tones may change due to the effect of
intonation, they remain relatively distinct and can be perceived correctly, as long as the
listener has knowledge about the tonal context, such as the normal speaking range of
the speaker. It appears that there is interaction between tone and intonation in Thai, but
intonation does not seem to interfere with correct tone identification.
3.1.2 Notation of Tone
In order to provide a reference system that can be used to describe lexical tones,
linguists often use numbers, called the 'Chao tone values', based on Chao's (1930)
work. These numbers divide the natural pitch range of a particular speaker into five
levels, the lowest being 1 and the highest being 5. Each syllable is given up to three
Chao numbers in order to track the course of F0 movement across the tone. Most
syllable tones have two digits, the first indicating the onset pitch and the last the offset
pitch of the tone. Three digits are used for tones that have a changing pitch direction in
the course of the syllable, such as the rising tone in Thai or the mid-falling-rising tone
in Mandarin Chinese (Yip, 2002). If a syllable does not have a tone value or has a
neutral tone22, no numbers are assigned. The Chao values are generally used in the
description of lexical tones in Asian and Southeast Asian tonal languages. This
numerical system sometimes goes along with small diagrams of the tonal contour.
Using these conventions, the representation of the five Thai tones is shown in Table
3.1.
Table 3.1
The Five Lexical Tones of Standard Thai and an Example of a Five-way Tone Contrast
on the Syllable /na/. IPA transcriptions, tone names, Chao-values and meanings are
provided.
IPA symbol    name       Chao-value    meaning
_______________________________________________________________
[ná]          high       55            'aunt'/'uncle'
[nā]          mid        33            'a paddy'
[nà]          low        11            a nickname
[nâ]          falling    451           'face'
[nă]          rising     215           'thick'
In the description of tones in American languages a similar system is used, but with
reversed values – 5 indicates low and 1 high tone (Yip, 2002). African languages are
usually described with the letters H for high, L for low and M for middle tone
(Gussenhoven, 2004; Yip, 2002), because most of these are register tone languages
with little or no movement within the course of a single tone.
In this thesis, lexical tones will either be described by numerical Chao values or they
will be named according to the appropriate convention for the particular language.
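To make the convention concrete, Chao values can be turned into approximate F0 targets once a speaker's pitch range is known. The following Python sketch is illustrative only: the function name, the example range of 100-200 Hz, and the even linear spacing of the five levels are assumptions (a logarithmic or semitone spacing would be equally defensible), not part of Chao's system.

```python
def chao_to_f0(chao, f0_min=100.0, f0_max=200.0):
    """Map a Chao tone value (e.g. '215') to approximate F0 targets in Hz.

    Level 1 is the bottom and level 5 the top of the speaker's range;
    intermediate levels are spaced evenly on a linear Hz scale here.
    """
    step = (f0_max - f0_min) / 4.0  # four intervals between levels 1..5
    return [f0_min + (int(d) - 1) * step for d in str(chao)]

# The five Thai tones with the Chao values used in Table 3.1:
thai_tones = {"high": "55", "mid": "33", "low": "11",
              "falling": "451", "rising": "215"}
contours = {name: chao_to_f0(value) for name, value in thai_tones.items()}
```

With a 100-200 Hz range, the rising tone '215' yields the targets 125, 100, and 200 Hz: a shallow dip followed by a rise, as the three-digit notation intends.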
22 The neutral tone (also called the fifth tone) is an inherent tone whose pitch characteristics depend on the preceding tone (see also 3.1.1 for tone sandhi).
Building on this preliminary exposition of tonal phenomena, and knowledge of how
tones may be described, the following section provides more detailed reviews of the
tonal language systems that are important for this thesis.
3.2 Tonal Language Systems
This thesis is only concerned with tonal languages that are located in Asia and
Southeast Asia, specifically Mandarin Chinese, Vietnamese, and Thai. In this section an
overview of these and, where relevant, some other tonal languages of the world will be
provided.
Tonal languages can be classified on the basis of their geographical location. There are
three areas in the world where most of the tonal languages can be found: Africa, East
and Southeast Asia (including the Pacific), and the Americas.
Most of the African languages are tonal, except for the Semitic languages and the
Berber languages. Most African tone languages have a high and a low (and sometimes
a middle) tone (Yip, 2002).
A great number of the languages of Central America, such as Tewa and Mixtec are
tonal languages. Most tonal inventories in Central American languages have four or
five level tones (Yip, 2002).
Asian and Pacific languages are also rich in tones. Included in the Asian-Pacific
languages are the Chinese languages, such as Cantonese and Mandarin. Cantonese
is spoken in Hong Kong and Canton (66 million speakers). The number of tones in the
Cantonese language has been the subject of debate. According to Bauer and
Benedict (1997), there are six tones, three level tones (high level, mid level, and low
level) and three contour tones (high rising, mid-low rising, and mid-low falling), with
an additional contrastive tone that occurs only for some Cantonese speakers. Others put
the number at nine tones (So, 1996): six contrastive tones plus three allotones of the
three level tones (high level, mid level, and low level), which differ in pitch height. The
three contour tones are the high rising, the low rising, and the low falling tone. The
three allotones are not contrastive tones: the high entering tone is an allotone of the
high level tone, the mid entering tone an allotone of the mid level tone, and the low
entering tone an allotone of the low level tone. These allotones are similar in height to
their level counterpart tones but differ in duration (So, 1996).
Other Asian/Pacific languages include Tibeto-Burman, Tai-Kadai (including Thai),
Vietnamese, and the Papuan languages. In the following sections (3.2.1, 3.2.2, and 3.2.3), the Thai,
Vietnamese, and Mandarin languages are discussed in further detail, because these are
the populations of tonal language speakers that will serve as participants in experiments
in this thesis.
3.2.1 Thai Tones
Thai is spoken by about 50 million people in Thailand, Vietnam, and in the Yunnan
province of China. There are various dialects of Thai, but Standard Thai (also called
Central Thai or Siamese) is the official language of Thailand, used in schools, in
trade, on television, and in national politics. The Thai phonemic inventory consists of
20 consonants, nine monophthongs, and three diphthongs. Each of the monophthongs
can occur in a long or a short version and all 21 vowels can occur with an initial
consonant, a syllable-final consonant, or both (Wayland & Guion, 2003).
In Thai, in addition to consonant and vowel features, every syllable carries a lexical
tone. Thai has five lexical tones: high, falling, mid, rising, and low. There are three
level tones, whose fundamental frequencies are relatively stable: high, mid, and low.
The other two tones, rising and falling have dynamic pitch contours and are therefore
called contour tones (Abramson, 1978). The trajectories of Thai tones in terms of F0
over time are provided in Figure 3.1.
Figure 3.1. Time normalised fundamental frequency contours of the five Thai tones spoken by a male Thai speaker. Figure reproduced from Mattock (2004). Permission to reproduce this figure was obtained from the author.
3.2.2 Vietnamese Tones
Vietnamese, the official language of Vietnam, is spoken by around 64 million people
(Dung, Huong, & Boulakia, 1998). Vietnamese is a monosyllabic language with six
lexical tones. These can be separated into two sets of three: three high and three low
tones. The high tones consist of high (or mid) level [33] (tone 1), creaky rising (broken)
[415] (tone 3) and high (or mid) rising [35] (tone 5); the low tones are low falling [21]
(tone 2), low dipping-rising [313] (tone 4) and low level [22] (tone 6) (Dung et al.,
1998; Yip, 2002). The trajectories of Vietnamese tones in terms of F0 over time are
provided in Figure 3.2.
Figure 3.2. Time normalised fundamental frequency contours of the six Vietnamese tones spoken by a male Vietnamese speaker (tone stimuli provided by Prof. Mixdorff, 2007). Permission to use tones was obtained from Prof. Mixdorff.
Over and above F0, other factors that play a role in the Vietnamese tonal system are
duration and voice register. The high rising tone is usually produced in a tense manner,
the creaky rising tone is produced with glottalisation, the low falling tone is usually
produced in a breathy manner, and the low dipping-rising is produced in a tense way
(L. Thompson, 1987).
Duration is generally longest for tone 5 (around 400 ms), followed by tone 1, then tone
2 and tone 3, shorter again for tone 4, and shortest for tone 6 (around 160 ms).
3.2.3 Mandarin Chinese Tones
Around 70% of the Chinese population speaks Mandarin, and there are a total of 1.1
billion Mandarin speakers across the world. Mandarin is spoken in Mainland China and
is the official language of the country, used in television, education, and
politics. Mandarin Chinese has four tones: high-level (tone 1), mid-rising (tone 2), low-
falling-rising (tone 3), and high-falling (tone 4). Mandarin tone trajectories in terms of
F0 over time are provided in Figure 3.3.
Figure 3.3. Time normalised fundamental frequency contours of the four Mandarin tones spoken by a female Mandarin speaker. Figure reproduced from Mattock (2004). Permission to reproduce this figure was obtained from the author.
Apart from distinct tone heights and contours, duration differences are also apparent in
Mandarin tones. In citation form, tone 3 is observed to be longer than the other tones,
which are of similar duration (Dow, 1972; A. Ho, 1976; Howie, 1976); however, this
duration difference is not as apparent in spontaneous speech (Coster & Kratochvil,
1984; Kratochvil, 1985, 1998). Another cue that plays a role in Mandarin lexical tone is
the intensity contour (Chuang, Hiki, Sone, & Nimura, 1972; Coster & Kratochvil,
1984). In terms of intensity or amplitude, the tones can be categorised into five
patterns: level, higher at onset, higher at offset, higher in the middle, and double-peak
amplitude contour (Lin, 1988). Chuang, Hiki, Sone, and Nimura (1972) revealed that
Tone 4 has the highest amplitude overall, and Tone 3 the lowest amplitude, and Whalen
and Xu (1992) have shown that listeners are able to identify all tones except for Tone 1
from amplitude contours alone. Taken together, these findings show that F0, duration, and
amplitude constitute phonetic correlates and perceptual cues for tones in Mandarin,
with F0 usually being the most relevant cue.
Mandarin Chinese exhibits tone sandhi (see section 3.1.1). The main sandhi rules in
Mandarin are: if a third tone is followed by another third tone, the first one changes to a
tone that is similar to the second tone; if a third tone is followed by a neutral23, first,
second, or fourth tone, it changes to what can be called a 'half-third' tone, which begins
to dip like the third tone but then does not rise; and a second tone changes to a first
tone when it follows a first or second tone and is followed by any of the four tones
(Hung, 1989).
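The sandhi rules above can be sketched as a simple rewriting procedure over sequences of tone numbers. The following Python sketch is a simplification under stated assumptions: the label 'h3' for the half-third tone and the left-to-right application order are illustrative choices, and real sandhi application interacts with prosodic phrasing, which is ignored here.

```python
def apply_sandhi(tones):
    """Apply the Mandarin sandhi rules described above to a list of tone
    numbers (1-4; 0 stands for the neutral tone).

    Rule 1: a 3rd tone before another 3rd tone becomes (similar to) a 2nd tone.
    Rule 2: a 3rd tone before a neutral, 1st, 2nd, or 4th tone becomes a
            half-third tone, marked 'h3' here.
    Rule 3: an underlying 2nd tone between a preceding 1st/2nd tone and any
            following tone becomes a 1st tone (Hung, 1989).
    """
    out = list(tones)
    n = len(out)
    for i in range(n - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2                 # Rule 1
        elif out[i] == 3 and out[i + 1] in (0, 1, 2, 4):
            out[i] = "h3"              # Rule 2
    # Rule 3 is checked against the underlying tones, not sandhi outputs.
    for i in range(1, n - 1):
        if tones[i] == 2 and tones[i - 1] in (1, 2) and tones[i + 1] in (1, 2, 3, 4):
            out[i] = 1
    return out
```

For example, a 3rd-3rd sequence such as "ni hao" surfaces as 2nd-3rd: `apply_sandhi([3, 3])` returns `[2, 3]`, while `apply_sandhi([3, 1])` returns `["h3", 1]`.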
3.3 Tonogenesis
Tones, like consonants and vowels, change over historical time; this tonal
development is called tonogenesis. There are two possible manners of tonal
development: languages can acquire lexical tones or they can lose them. As there are
tonal languages spoken in various regions of the world - Africa, America and South
East Asia (Yip, 2002) - it appears that the geographical location of, or genetic
relationship between, languages does not play a simple role in the origin of lexical tone.
Nevertheless, such factors can be relevant for the development of tones, which
often appears to occur through imitation of other languages (Henderson, 1981), with
tone loss often related to proximity of the language community to speakers of non-tonal
languages (Gussenhoven, 2004). In addition, it is rare for new tones to develop in non-
tonal languages that are not geographically close to areas where lexical tone is used.
3.3.1 Development of Tones from Voicing Contrasts - Tonal Split
The most common type of tonogenesis is the development of tones from a voicing
distinction of stop consonants in a prevocalic position. This so-called tonal split occurs
when a vowel that follows a voiced stop consonant changes to a low-pitched
tone. In the same way, high-pitched tones can develop out of vowels preceded by
23 The neutral tone (where the usual tone is dropped) varies depending on the tone that precedes it (see also 3.1.1 for tone sandhi).
voiceless consonants. In this way, two tones are created from one previous voiced-
voiceless consonant distinction. Tonal split has been observed for Vietnamese
(Haudricourt, 1954), Chinese (Karlgren, 1926), other East and Southeast Asian
languages (Haudricourt, 1954, 1961), including Thai (Gandour, 1974), and in African
languages (Beach, 1938) such as Yoruba (Hombert, 1975, 1977a).
It can be concluded that in tonal languages there may be a trend to minimise the
intrinsic effects of preceding consonants on F0, which would serve to keep the different
tones as perceptually distinguishable as possible.
Two theories have been proposed to explain the articulatory tone change that occurs
after prevocalic voiced or voiceless stops: the aerodynamic theory, and the vocal-cord
tension theory. The aerodynamic theory explains the tone change through pressure
differences in voiced vs. voiceless stops (Hombert, 1975; Hombert & Ladefoged, 1976;
Ohala, 1970, 1973b), whereas the vocal-cord tension theory attributes the F0 changes to
tension changes in the vocal cords as a result of changes in stiffness of the vocal cords
during the production of voiceless sounds, and subsequent spreading out over adjacent
vowels (Ewan, 1975; M. Halle & Stevens, 1971; Ohala, 1973a).
Regardless of which explanation is correct it can be concluded that the voicing
distinction in a prevocalic location results in subtle but perceptible articulation-based F0
changes, which can result in a tone contrast supplementing a voicing contrast.
3.3.2 Development of Tones from Consonants
Apart from the influence of voicing, there are other consonantal features that can play a
role in tone development. The development of lexical tones from a vowel preceded by a
breathy voiced consonant was, for example, shown in the Punjabi language (Gill &
Gleason, 1969). Implosives24 are another class of consonants that appear to influence
lexical tone development. In languages such as Lolo-Burmese, implosives have been
observed to lower the pitch of subsequent vowels in a similar, though less effective, way to
voiced stops (Matisoff, 1972). Tone change can also result from the effects of glottal
stop consonants on the preceding vowel. In Vietnamese, the glottal stop disappeared in
the 6th century and was replaced by a rising tone (Haudricourt, 1954), a development
24 Implosives are sounds that are produced by a complete closure of the oral cavity.
similar to that in Middle Chinese, where the rising tone also evolved out of a final
glottal stop (Mei, 1970).
3.3.3 Development of Tones from Vowel Height
Another explanation of tone change is the historical development of tones from vowel
quality. Even though there are not many cases to support this suggestion, vowel height
seems to have influenced tone development in Ngizim, an Afro-Asiatic language
spoken in Nigeria, in which the tone pattern of the verb can be (partially) predicted by
the vowel of the first syllable (Hombert, Ohala, & Ewan, 1979; Schuh, 1971). For a
detailed review of data concerning the development of tones from vowel height see
Hombert (1977b).
3.3.4 Other Influences of Tone Development
In addition to the above causes of tone development from stop consonants, implosive
consonants, glottal consonants, and vowel height, other factors that can play a role in
tonal development are intrinsic pitch25, downdrift26, and interactions between tones,
such as tone redistribution to maximise the perceptual distance between different tones.
Tonal phenomena that cannot be explained by any of these approaches are labelled
tonal 'flip-flop' or 'tonexodus'. Tonal flip-flop indicates that low tones become high
tones and vice versa. The term tonexodus refers to the development in which particular
tones are eliminated from tonal languages.
3.4 Tone Production
In order to understand the perception of lexical tone, some basic knowledge of the
articulation of F0, the basis of pitch perception, is important. The F0 of speech sounds is
mainly determined by the frequency of vocal cord vibrations. The basic mechanisms of
F0 production are explained below, and the basic structures are shown in Figure 3.4.
25 Intrinsic pitch refers to the fact that high vowels (high in terms of tongue position) are produced with a higher pitch than low (tongue position) vowels.
26 Downdrift is the lowering of a high tone after a low tone (also called automatic downstep).
Figure 3.4. View of the larynx (a) lateral view and (b) view from above (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
The larynx consists of two rings of cartilage, the cricoid cartilage and the thyroid
cartilage. The thyroid is an open ring that sits on the cricoid. The arytenoids, two small
cartilages, are located on the top of the rear rim of the cricoid cartilage. The vocal cords
(also called vocal 'folds') are two muscles, together called the vocalis muscle, that join
the thyroid and the arytenoid cartilages. Between the vocal cords is the glottis, a
passage that allows air to pass from the lungs to the mouth. The rotating movement of
the arytenoid cartilages controls the degree of opening of the glottis by bringing the
vocal cords closer together or further apart. The rhythmic closing and opening of the
glottis is often referred to as vocal cord vibration. Vocal cord vibration is achieved by
closing the glottis. When the vocal cords are brought together very closely and air is
forced through the narrow glottal opening, a mechanical force called Bernoulli-force27
applies a sucking effect that draws the vocal cords closer together. Due to this closure,
air pressure from the lungs is built up and eventually forces the vocal folds apart,
releasing a stream of air and reducing the pressure behind the glottis. This cycle is then
repeated. Each burst of air is one vocal cord vibration cycle and these cycles can occur
27 The term Bernoulli-force refers to a high-velocity airstream that passes through a narrow opening causing a reduction in air pressure, which results in drawing the walls of the vocal cords together (Pompino-Marschall, 1995).
from as low as 80 times per second (in male speakers) to up to 400 times per second
(in female speakers).
Figure 3.5. Schematic figure of the vocal folds during phonation: (a) closed glottis, (b) subglottal air pressure, (c) glottis forced apart (air pressure: horizontal arrows; Bernoulli pressure: vertical arrows), (d) closing glottis, (e) start of the new cycle (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
Small but perceptible F0 changes can be achieved by finely adjusting the mass and
stiffness of the vocal cords (Hirose, 1997). The length of the vocal cords can be
increased by contraction of the crico-thyroid muscle, which, in turn, decreases the mass
of the vocal cords and increases their stiffness. As a result of the greater stiffness,
vibration frequency, and thus F0, increases. It has been shown that the crico-thyroid
muscle plays an important role in the process of raising pitch in tonal languages,
whereas lowering the pitch involves a more complex interaction of crico-thyroid
and thyro-arytenoid muscles (Yip, 2002).
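The trade-off between vocal cord length, mass, and stiffness described above can be caricatured with the ideal-string formula f = (1/2L)·√(T/µ), where T is tension and µ mass per unit length. This is emphatically not a physiological model of the vocal folds (which are not ideal strings, and whose length increase simultaneously changes T and µ), but it reproduces the qualitative relationships in the text: greater stiffness or tension raises F0, greater effective mass or length lowers it. The parameter values below are purely illustrative.

```python
import math

def string_f0(length_m, tension_n, mass_per_length):
    """F0 of an ideal vibrating string: f = (1/2L) * sqrt(T / mu).

    A first-order caricature of vocal fold vibration: higher tension
    (stiffer folds) raises F0; greater effective mass per unit length,
    or greater length, lowers it.
    """
    return math.sqrt(tension_n / mass_per_length) / (2.0 * length_m)

# Under this model, quadrupling tension doubles F0, and doubling the
# vibrating length halves it (illustrative parameter values):
f_ref = string_f0(0.016, 1.0, 0.001)
f_tense = string_f0(0.016, 4.0, 0.001)   # 2 * f_ref
f_long = string_f0(0.032, 1.0, 0.001)    # f_ref / 2
```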
In the production of stop consonants, voicing regulation is only possible under certain
conditions. If the vocal cords are stiffened by muscular tension, a great amount of
pressure has to be applied in order to make the vocal cords vibrate. Stop consonants
that are produced with stiff vocal cords are voiceless. Hence, the vowel following
voiceless consonants is automatically produced with a higher pitch than when preceded
by a voiced consonant, as was seen in consideration of tonogenesis (see Section 3.3).
In the production of vowels and sonorants (voiced consonants with vowel-like quality,
such as approximants like [w] as in 'we' and [j] as in 'yet'), vibration frequency is
controlled by different factors. In particular the length of the vocal cords can be
affected by the interplay between thyroid and cricoid rotation patterns, leading to
different vibration patterns.
3.5 Tone Perception
While a number of articulatory factors contribute to tone and its perception, such as
duration and register (see section 3.5.3), the main factor is F0, and it is F0 that will
mainly be considered here.
3.5.1 Fundamental Frequency and the Auditory System
Here the anatomy and physiology of the human ear are described, especially as they
relate to the coding of fundamental frequency (Ball & Rahilly, 1999; Moore, 1989;
Pompino-Marschall, 1995).
3.5.1.1 The Outer and Middle Ear
The outer ear is composed of the pinna and the auditory canal or meatus (see Figure
3.6). The pinna modifies the incoming sound, particularly at high frequencies, which is
important for the ability to localise sounds. The sound then travels along the meatus and
causes the tympanic membrane to vibrate. These vibrations are then transmitted
through the middle ear by three small ossicles, the malleus, incus, and stapes to a
membrane-covered open window in the bony wall of the spiral-shaped structure of the
inner ear, the cochlea. The major function of the middle ear is to ensure the efficient
transfer of sound from the air through to the fluids in the cochlea. If sound were to
impinge directly on the oval window, most of it would simply be reflected because of
the much greater acoustical impedance of the cochlear fluids. Thus the middle ear acts as an
impedance-matching device or transformer that improves sound transmission and
reduces reflections. Transmission of sound through the middle ear is most efficient at
frequencies between 500 and 4000 Hz (Ball & Rahilly, 1999; Moore, 1989; Pompino-
Marschall, 1995).
Figure 3.6. Anatomy of the ear: outer ear, middle ear, and inner ear (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
3.5.1.2 The Inner Ear and the Basilar Membrane
The cochlea is a spiral-shaped conical chamber of bone. It has rigid walls and is filled
with almost incompressible fluids. It is divided along its length by two membranes, the
Reissner membrane and the Basilar membrane. The oval window (see Figure 3.7) is
located at the basal end of the cochlea and at the apical end is a small opening (the
helicotrema) connecting the two outer chambers of the cochlea, the scala vestibuli and
the scala tympani. Inward movement of the oval window results in a corresponding
outward movement of the round window, a membrane covering this second opening of
the cochlea (Ball & Rahilly, 1999; Moore, 1989).
When the oval window (see Figure 3.7) is set in motion by an incoming sound, a
pressure difference occurs almost instantaneously through the fluids of the cochlea, and
thus along the whole length of the basilar membrane. A travelling wave moves along
the basilar membrane from the base towards the apex, and the amplitude of this wave at
first increases and then decreases rather abruptly. The response of the basilar membrane
to sounds of different frequencies is strongly influenced by its mechanical properties,
which vary across the length: At the base, the basilar membrane is relatively narrow
and stiff, whereas at the apex it is wider and more flexible, such that the position of
maximum vibration differs according to the frequency of stimulation. High frequencies
produce a maximum displacement of the basilar membrane near the oval window and
there is little movement on the remainder of the membrane. Low frequencies produce a
vibration pattern which extends all the way along the basilar membrane but reaches a
maximum before the end of the membrane (Ball & Rahilly, 1999; Moore, 1989;
Pompino-Marschall, 1995). Thus it can be seen that the basilar membrane acts as a kind
of spectrum analyser, but with a limited resolving power.
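The frequency-to-place relationship just described is commonly approximated by Greenwood's function, F = A(10^{ax} − k). The sketch below uses the constants usually cited for the human cochlea (A = 165.4, a = 2.1, k = 0.88, with x the relative distance from the apex); it is an approximation added here for illustration, not part of the sources cited in this section.

```python
def greenwood_cf(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane (0 = apex, 1 = base), using Greenwood's function
    F = A * (10**(a*x) - k) with constants commonly cited for the
    human cochlea (A = 165.4, a = 2.1, k = 0.88).
    """
    return 165.4 * (10 ** (2.1 * x) - 0.88)

# Low frequencies map to the apex and high frequencies to the base:
apex_cf = greenwood_cf(0.0)   # roughly 20 Hz
base_cf = greenwood_cf(1.0)   # roughly 20 kHz
```

The exponential form of the function also illustrates the "limited resolving power" noted above: each octave occupies a roughly constant length of membrane, so closely spaced frequencies excite heavily overlapping regions.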
3.5.1.3 The Transduction Process and the Hair Cells
Hair cells are located between the basilar membrane and the tectorial membrane, which
form part of a structure called the organ of Corti. The hair cells are divided into inner
and outer hair cells by an arch known as the tunnel of Corti with the inner hair cells
closest to the inside of the cochlea, and the outer hair cells to the outside of the cochlea.
There are about 25000 outer hair cells, with around 140 small hairs protruding from
each one, while there are only around 3500 inner hair cells, each with around 40 small
hairs. The gelatinous tectorial membrane lies above the hairs. The hairs of the outer hair
cells seem to make contact with the tectorial membrane, but this may not be the case for
the inner hair cells. The tectorial membrane appears to be effectively hinged on one
side, so that when the basilar membrane moves up and down, a shearing motion is
created between the basilar and tectorial membrane. This displaces the hair cells
leading to excitation of the inner hair cells, which in turn leads to the generation of
action potentials in the neurons of the auditory nerve. Thus, the inner hair cells
transduce mechanical movement into neural activity.
Figure 3.7. Anatomy of the cochlea (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
3.5.1.4 Central structures
Nerve fibres from the cochlea first synapse in the cochlear nucleus, then in the superior
olivary nucleus, and finally in the inferior colliculus of the middle brain before they
reach the medial geniculate nucleus of the thalamus. From there, fibres lead to the
primary auditory receiving area, which is located in the temporal lobe of the cortex.
3.5.2 Theories of Pitch Perception
Pitch is the attribute of auditory perception that corresponds to the physical dimension
of frequency, the repetition rate of the sound signal. For a pure tone this
corresponds to the frequency and for a complex sound such as speech or a musical
chord, to the fundamental frequency (F0). In this section the concern is with different
approaches to pitch perception.
There are two main theories of pitch perception: the place theory and the temporal
theory. The place theory was first proposed by von Békésy (1960), and focuses upon
the spectral analysis of the sound stimulus in the cochlea, in which particular
frequencies (or frequency components, in a complex stimulus) excite particular places
along the basilar membrane. The perceived pitch of a stimulus is said to be related to its
pattern of excitation; for a pure tone, the pitch corresponds to the position of maximum
excitation on the basilar membrane. The problem with the place theory is that it cannot
explain fine resolution of frequency.
The temporal theory (J. F. Schouten, 1940) suggests that nerve spikes tend to occur at a
particular phase in the stimulating waveform (phase locking). The intervals between
successive spikes approximate integer multiples of the period of the stimulating
waveform. Pitch would, according to this be related to the pattern of inter-spike
intervals, rather than to location of excitation patterns on the basilar membrane. The
temporal theory, however, cannot explain the perception of tones with frequencies
many times greater than the maximum firing rate of neurons.
Modern theories, such as that of Moore (1982) tend to combine both place theory and
temporal theory, as neither of them can account for all perceptual phenomena. Moore's
model is like the place model in its initial frequency analysis of the sound, followed by
a temporal analysis of the neural firings.
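The temporal theory's idea that pitch is carried by inter-spike intervals is closely related to autocorrelation-based F0 estimation, in which the lag at which a signal best matches a shifted copy of itself determines the perceived period. The following is a minimal illustrative sketch (function name and parameters are my own; practical pitch trackers add windowing, normalisation, and octave-error handling):

```python
import math

def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate F0 by autocorrelation: search the plausible range of pitch
    periods for the lag that maximises the correlation of the signal with
    a delayed copy of itself, mirroring the temporal theory's reliance on
    intervals between successive neural spikes.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        r = sum(samples[i] * samples[i + lag]
                for i in range(len(samples) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return sample_rate / best_lag

# A 200 Hz sinusoid sampled at 8 kHz: the autocorrelation peaks at a lag
# of one period (40 samples), recovering the frequency.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
```

The place theory, by contrast, corresponds to locating the peak of a spectral analysis; hybrid models such as Moore's effectively combine the two stages.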
The reliability of a particular theory of pitch perception is not of central interest here; it
is sufficient to note that perceived pitch is not a simple function of the physical stimulus
and the means by which the transformation takes place is both non-linear and still to be
precisely established.
3.5.3 Pitch Perception in Speech – Lexical Tone
There is a great deal of research devoted to the investigation of the sensitivity of the
human ear. The just noticeable difference (jnd) for pitch has been found to be 0.3 to 0.5
Hz (Flanagan & Saslow, 1958) in constant synthetic speech sounds and 1.5 to 4 Hz in
dynamic sounds (Klatt, 1973). In this thesis, the concern is with pitch perception in
speech – the case of lexical tone.
Turning to tone perception, the F0 fluctuations both within and between tone classes
must be large enough for the tones to be perceived as distinct, even though
other factors such as duration, amplitude, and voice register may also play a role in tone
perception.
3.5.3.1 Perception of Lexical Tone - Overview
In most of the world's languages, lexical tone determines meaning (Yip, 2002). Even
though lexical tone is apparent in the majority of languages, it is underrepresented in
language and speech research. It is essential to investigate lexical tone in order to find
out whether it shares characteristics with other speech segments, such as vowels and
consonants. In the following sections, tone is discussed in terms of perception,
production, and the acquisition and development of both. Categoricality of tone
perception will also be considered.
3.5.3.2 Multidimensional Approach to Lexical Tone Perception
In a multidimensional analysis28 of lexical tone perception, Gandour and Harshman
(1978) investigated the role of average pitch, pitch direction, length, extreme pitch
endpoint of tones, and pitch contour slope on tone perception. Thai, Yoruba, and
American English native speaking participants were tested. The results showed
similarities between the tonal (Thai and Yoruba) speakers, and both similarities and
differences between these and the non-tonal (English) language speakers. All language
28 Multidimensional scaling techniques are used to detect similarities between pairs of stimuli in order to assess the perceptual distance between those stimuli, measured on different dimensions.
groups used the dimensions of average pitch and duration as perceptual cues, but tone
direction and slope were perceptually salient only for the Thai and Yoruba speakers. In
comparison to the other cues and the other listeners, Thai listeners paid relatively more
attention to the duration of the stimuli compared to average pitch, direction, and slope
cues. Gandour and Harshman (1978) conclude that mean pitch and duration seem to be
either phonetic or auditory language-general perceptual dimensions, while direction and
slope appear to function as phonetic dimensions only for the speakers of tonal
languages, in which such cues are linguistically relevant. In another study (Gandour,
1983), Cantonese, Mandarin, Taiwanese, Thai, and English listeners were tested with 19
different pitch contours that can be found in East Asian languages, embedded in the
syllable [wa]. In accord with the previous study, results revealed two important
perceptual dimensions: pitch height and direction. In the English-speaking participants,
pitch direction appeared less important than height.
In summary, it can be concluded that the acoustic feature of pitch height is the most
salient tonal feature, and is used by speakers from both tonal and non-tonal language
backgrounds. Other features (direction, length, and slope) are also important but their
use depends on the particular language background. In general, it seems that pitch
height is a rather universal and acoustic level feature of tone perception, while the other
features are only important for speakers of languages that make use of those features in
a linguistically relevant fashion.
3.5.3.3 Perception of Tone when Pitch Information is not available
In order to investigate the utility of cues other than F0, studies of tone perception in
which F0 is removed are useful. An easy way of removing the pitch information from
lexical tones is whispering. During whispering, the glottis is kept open and thus, all
sounds that are produced are voiceless. Even though whispering is a natural but rather
unusual mode of speaking, people can communicate with whispered speech. Perception
of whispered tone is of interest because F0 information, one of the main acoustic cues
for pitch perception is not available in whispering.
Two types of studies of perception of whispered lexical tone need to be considered
here: those studies that present whispered lexical tones on isolated words and those that
present whispered tones in context. After that, tone perception in speech sounds where
F0 is neutralised is considered.
Wise and Chong (1957) investigated perception of whispered lexical tone in Mandarin
sentences and observed correct identification in 62% of the sentences. It was concluded
that lexical tone is not consistently intelligible when it is whispered (Wise & Chong,
1957).
In an experiment on the perception of whispered tones in isolated words in Norwegian,
Swedish, Slovenian, and Mandarin Chinese, Jensen (1958) presented whispered
contrastive word pairs, with the only difference between the words being the lexical
tone. Results of this study show that whispered lexical tone was easier to identify in
Swedish (100% correct identification of whispered tones) and Mandarin Chinese (73-
88% correct), but more difficult in Norwegian (53-73% correct) and Slovenian (71-
85% correct). In a replication of this experiment by Miller (1961) with Vietnamese
whispered tones, only 42% of the whispered tones could be correctly identified. Miller
concludes that less tone information appears to be transmitted in Vietnamese whispered
speech than in the other languages (J. D. Miller, 1961).
A different method of removing tone information is to artificially neutralise the F0
contour while leaving the remaining phonetic information, such as amplitude and
duration, intact. Liu and Samuel (2004) conducted two studies: in one, F0 was partly or
completely neutralised; in the other, whispered speech was compared with lexical tones
from which the pitch information had been synthetically removed29. The results show
that identification of the neutralised tones is still quite accurate and that there are
durational differences in whispered tones which might aid perception. Together these
studies show that other cues to tone perception, such as amplitude and duration, are
perceived even when F0 information is not available.
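Liu and Samuel's neutralisation manipulation operated on real speech via resynthesis; the sketch below is only a minimal illustration of the underlying contour arithmetic, in which an F0 track is flattened partly or completely towards its mean. The function name and the example contour values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def neutralise_f0(f0, degree=1.0):
    """Flatten an F0 contour towards its mean.

    f0     : array of F0 values in Hz (voiced frames only)
    degree : 0.0 leaves the contour unchanged, 1.0 yields a completely
             level contour ("complete neutralisation")
    """
    f0 = np.asarray(f0, dtype=float)
    # Interpolate each frame between its original value and the contour mean:
    return (1.0 - degree) * f0 + degree * f0.mean()

# A stylised rising-tone contour (Hz):
rising = np.array([180.0, 190.0, 205.0, 225.0, 250.0])
fully_flat = neutralise_f0(rising, degree=1.0)  # level at the mean (210 Hz)
half_flat = neutralise_f0(rising, degree=0.5)   # partial neutralisation
```

In the whispered-speech comparison condition, by contrast, the voicing source itself was replaced with white noise, so no F0 track remains at all; amplitude and duration, which the manipulation above leaves intact, are then the only cues available.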
Together, these results show that perception of lexical tone in whispered speech is
dependent on the availability of context and that pitch information can be replaced by
other perceptual cues.
29 This was done by replacing the voicing source with white noise.
3.5.3.4 Lateralization and Neuroimaging Studies
Further insight into the perception of tone and pitch may be gleaned from studies
determining the locus of processing of particular types of sounds, over and above the
well-known specialisation of particular parts of the brain for certain aspects of speech
(Broca, 1861; Wernicke, 1874). In this section, lesion studies (effects on the perception
and production of tone), dichotic listening tasks, and functional neuroimaging studies
will be reviewed.
As with non-tonal languages, a common approach in investigating tonal languages is to
study patients with brain lesions. In an experiment with brain-damaged patients,
Gandour and Dardarananda (1983) asked left-hemisphere damaged (LHD) listeners to
identify words that differed in the five Thai lexical tones. They found that LHD patients
were less accurate at identifying Thai tones than a normal and a right-hemisphere
damaged (RHD) listener, and it was concluded that the left hemisphere may be
responsible for tone processing (Eng, Obler, Harris, & Abramson, 1996; Gandour &
Dardarananda, 1983).
Another tone perception study with brain lesion patients was conducted by Hughes,
Chan, and Su (1983), who observed that Mandarin RHD patients had problems with the
perception and production of affective prosody, but that their lexical tone identification
was intact (it should be noted that the stimuli in the prosody part of the task were
sentences, whereas the tone stimuli were words). These results suggest that damage to
the right side of the brain can lead to impairment in monitoring pitch at a global,
prosodic level rather than to a deficit in lexical tone processing. Together these results
indicate that RHD patients' lexical tone perception is normal, which suggests a left-
hemispheric dominance for lexical tone processing.
So far, only tone perception has been reviewed. Turning to tone production, Gandour
and colleagues examined brain-injured patients' production of lexical tone (Gandour,
Petty, & Dardarananda, 1988; Gandour, Ponglorpisit et al., 1992). Gandour et al. (1988)
asked six aphasic patients, one RHD patient, one dysarthric30 patient, and five normal
subjects to identify and produce Thai tones. Participants without brain damage were
very good at tone identification; four of the six aphasics showed over 90%
identification accuracy, and the RHD and dysarthric subjects over 92% accuracy. Tone
production was also very accurate in all participants, with very few tone errors.
Because most participants' results were at ceiling, it could not be concluded that tone
production is
lateralized to one specific hemisphere. In another tone production study, Gandour et al.
(1992) found that RHD patients' tone production accuracy fell between that of fluent
and non-fluent LHD patients, with the non-fluent LHD patients' productions being the
least accurate. These data imply that poor tone production may be better characterised
as a more peripheral fluency deficit than as having its root cause at the level of
hemispheric differences. Similar results were observed by Yiu and Fok (1995), who
investigated tone production and perception in Cantonese aphasics. In the perception
part, participants had to identify pictures representing six Chinese words differing in
tone; the normal control participants performed significantly better than the aphasics.
In the production part, subjects produced the same words. Normal subjects produced
fewer tonal errors than fluent aphasics, who in turn produced fewer tonal errors than
non-fluent aphasics, which indicates that the degree of fluency influences patients'
production of lexical tone. However, no right-hemisphere patients were tested in this
study, which makes it impossible to draw conclusions about hemispheric specialisation
in the perception and production of lexical tone.
In addition to the lesion studies mentioned above, various dichotic listening tasks31
have been conducted in order to investigate the lateralization of tone processing. In
such a task, Van Lancker and Fromkin (1973; 1978) asked Thai and English listeners to
identify Thai tones, presented as words, and hummed versions of these words. Thai
listeners, when listening to Thai words, displayed a right-ear advantage (REA)32,
indicating left-hemispheric specialisation for lexical tone, but no advantage for either
ear for the hummed tones. The authors concluded that there is a left-hemisphere dominance for
30 Dysarthric patients have a speech impairment resulting from damage to the nerves and areas of the brain that control the muscles used in forming words.
31 In dichotic listening tasks, listeners hear a different signal in each ear through headphones.
32 A right-ear advantage indicates that speech sounds are perceived more accurately when presented to the right ear.
tone processing. It should be noted, however, that the condition in which the Thai
participants listened to Thai words was the only one in which listeners were presented
with meaningful syllables; the results might therefore reflect a word-listening effect
rather than being interpretable purely as a tone perception result.
Not only Thai but also Mandarin tones have been used in dichotic listening tasks. For
example, Baudoin-Chial (1986) asked French and Mandarin native listeners to identify
Mandarin consonants, tones, and hums. The Mandarin group showed no ear preference
for any of the stimuli, whereas for the French listeners an REA was observed for
consonants, a left-ear advantage (LEA) for tones, and no ear advantage for the hums.
More recently, Wang, Jongman, and Sereno (2001) presented Mandarin words differing
in tone to Mandarin and English native listeners for identification (the English listeners
were trained before the main task, and the sounds were embedded in noise). An REA
was observed in the Mandarin- but not the English-speaking participants, which
supports the view that the left hemisphere is responsible for tone processing. However,
again this pattern of results could be due to general word processing rather than tone
processing.
Functional neuroimaging studies have also investigated the lateralization of tone
perception. Gandour, Wong, and Hutchins (1998) used positron emission tomography
(PET) to investigate tone perception in Thai- and English-speaking listeners, who had
to discriminate Thai tones presented as words and as low-pass filtered, pitch-matched
tone contours. The results revealed lateralization of tone processing in the Thai but not
the English-speaking participants. A limitation of this experiment is that the stimuli in
the tone condition were real Thai words; the data may therefore only show left-
hemispheric word processing, regardless of lexical tone contrasts (similar results were
found by Klein, Zatorre, Milner, & Zhao, 2001).
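Low-pass filtering of this kind removes the higher-frequency spectral detail that carries segmental information while leaving energy in the F0 region intact, so that essentially only the pitch contour remains audible. The following crude FFT-based sketch illustrates the principle only; the 500 Hz cutoff and the synthetic two-component signal are assumptions for illustration, not the processing actually used by Gandour et al.

```python
import numpy as np

def lowpass(signal, sr, cutoff_hz):
    """Crude low-pass filter: zero out all spectral components above cutoff."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

sr = 16000
t = np.arange(sr) / sr                             # one second of signal
f0_part = np.sin(2 * np.pi * 150 * t)              # pitch-range component (150 Hz)
formant_part = 0.5 * np.sin(2 * np.pi * 2500 * t)  # segmental-range component
filtered = lowpass(f0_part + formant_part, sr, cutoff_hz=500)
# `filtered` retains the 150 Hz component; the 2.5 kHz component is removed.
```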
In summary, it is difficult to conclude which hemisphere of the brain is specialised for
tone perception and production, as comprehensive studies with RHD and LHD patients
are yet to be conducted. Regarding dichotic listening tasks, it is not possible to study
lateralization of tone perception independently of word/segment perception because
both mostly occur at the same time. Lastly, the neuroimaging data do not completely
support the lateralization view (for a detailed review see Wong, 2002).
3.6 Tone Acquisition
Investigation of tone acquisition is important for understanding how tone production
and perception develop, and especially how they are affected by linguistic and
developmental variables. As with other aspects of tone, research is sparse, but the few
results available, mainly from case studies, are reviewed below. Tone acquisition in
first language development, production then perception, is considered first, after which
tone production and perception in second language acquisition will be reviewed.
3.6.1 Production of First Language Tone
In a case study, Tuaycharoen (1977) investigated tone acquisition in a single Thai
child. This infant first produced the mid and low tones at around 11 months, which was
also the time the first words were articulated. Despite correctly producing those two
tones, the infant also substituted them with falling, high, and rising tones. At 14 months
the rising tone was introduced, and by the end of the 15th month the high and falling
tones were correctly produced. By the age of around two years the full set of five
lexical tones was acquired, whereas the production of certain segments (diphthongs,
triphthongs, initial consonant clusters) had not yet been acquired.
In a similar study of Cantonese, Tse (1977) observed the language development of his
own child. The high and the low tone could be correctly produced at 16 months, and by
20 months the mid and high rising tones were mastered. The low rising and mid tones
were acquired when the infant began to articulate the first two-word utterances, in the
twenty-first month. It was noted that the child still had problems with the articulation of
certain consonants even after the whole set of six lexical tones was acquired. In another
case study, this time with Mandarin Chinese-speaking children between the ages of 18
and 36 months, Li and Thompson (1977) also observed that Mandarin tones were
acquired earlier than consonants and vowels. The high and falling tones were produced
earlier than the rising and dipping tones, and tone sandhi rules were mastered at the
same stage as multi-word utterances were first produced.
In the first cross-sectional multiple-participant study of L1 tone development, Burnham
et al. (2005) investigated tone production in 12 children at each of ten ages from 18
months to 7 years. Tone production was quite accurate for all tones at 18 months.
Beyond this, tone production, as measured by a tone differentiation score (Barry &
Blamey, 2004), improved steadily over age with respect to the differentiation of the
three level tones; differentiation of contour vs. level tones was relatively stable over
time but appeared to peak at times of significant linguistic investment: around the onset
of the vocabulary spurt33 and the onset of reading instruction in school.
Together these data show that lexical tone production is acquired before the production
of consonants and vowels, and that the production of tone is more precise and robust
than that of consonants and vowels.
3.6.2 Perception of First Language Tone
It appears that tones are generally acquired before consonants and vowels. For
example, Tse (1977) investigated a 10-month-old child in order to find out whether the
tone or the consonant information for a word ('light') was more relevant. He observed
that the child's responses were more confident when the correct tone information was
available with incorrect segments than when the consonant information was correct,
and concluded that tone information is more salient than segmental information.
Although this is just a single case study, it is interesting that these results concur with
those for tone production development in first language acquisition.
Turning to the relative salience of tonal information across languages, Harrison (1998)
observed that Yoruba- but not English-learning infants were able to discriminate
artificial tones, and that they were especially good at discriminating tones similar to
actual Yoruba tones.
In a more comprehensive study, Mattock and Burnham (2006) investigated 6- and 9-
month-old Chinese (Mandarin and Cantonese) and English infants' discrimination of
Thai tones and synthetic violin tones. The results show that English infants'
discrimination of lexical tones declined over age, whereas their discrimination of the
non-speech tones remained constant. In the Chinese infants, no difference in
discrimination performance between 6 and 9 months was observed for either speech or
non-speech tones.
33 The vocabulary spurt refers to a sudden increase in word acquisition in children's language development (Bloom, 1973).
These findings indicate that there is perceptual reorganisation for tone just as there is
for consonants and vowels (Werker & Tees, 1992) and that this reorganisation is
dependent on the language environment.
3.6.3 Production of Second Language Tone
Most training experiments in the area of second language tone have concentrated on
tone perception (see section 3.6.4). In this section those experiments that investigated
second language tone production will be summarised.
Shen (1989) analysed tonal errors made by American English speakers who had studied
Mandarin for four months. Error rates ranged from 8.9% for Tone 2 to 55.6% for Tone
4, with rates of 16.7% for Tone 1 and 9.4% for Tone 3. These results indicate that
American learners have problems with the production of all Mandarin tones, but
especially with Tone 4 (the high falling tone).
Miracle (1989) also analysed the errors that second-year American learners of
Mandarin make, and found an overall error rate of 42.9%. The errors were classified as
either tonal register errors (too high or too low) or tonal contour errors. Miracle found
that the tone errors were evenly divided between these two error types, and they were
also evenly distributed among all tones. These results show that second language tone
acquisition depends on the individual tones at the beginning of the tone learning
process (X. S. Shen, 1989), but later are independent of the particular tone that is
produced (Miracle, 1989).
In a different vein from the quasi-experiments with learners of tonal languages
mentioned above, Wang, Jongman, and Sereno (2003) conducted a tone training study
with American English listeners to investigate whether perceptual training with
Mandarin tones transfers to production. Perceptual ratings indicate that tone production
improved by 18% relative to pre-training. Acoustic analyses of the speech data confirm
these results, and further show that the improvement consisted mainly of increased
accuracy in pitch contour rather than pitch height. These results are consistent with
training experiments in the segmental domain, and show transfer of perceptual learning
to production (Akahane-Yamada, Tohkura, Bradlow, & Pisoni, 1998; Bradlow, Pisoni,
Yamada, & Tohkura, 1997).
3.6.4 Perception of Second Language Tone
Listeners without tonal language experience often have difficulties perceiving lexical
tones (Bluhme & Burr, 1971; Kiriloff, 1969; Y. Wang & Spence, 1999) and, as shown
earlier (see section 3.5.3.2), non-tonal language listeners place more emphasis on non-
linguistic tone features (average pitch and extreme F0 values), whereas tonal language
speakers concentrate on the linguistic dimensions of direction and slope of the tone
contour (Gandour & Harshman, 1978). This indicates that average pitch and extreme F0
values are important perceptual features in second language acquisition of tone. Studies
of second language perception of Mandarin and Thai tones are set out below.
3.6.4.1 Mandarin Tone Perception by Second Language Learners
When looking at tone perception in second language learners, it is important to
compare tone perception in tonal and non-tonal language listeners. Lee, Vakoch, and
Wurm (1996) tested Cantonese, Mandarin, and English listeners in a tone
discrimination task of Cantonese and Mandarin tones. They found that tonal language
speakers were more accurate and faster at discriminating tones than non-tonal language
speakers. Thus, it appears that listeners' strategy for tone perception depends to some
extent on the linguistic function of pitch in their native language. This is also reflected
in processing style: Mandarin listeners' accuracy of segmental identification decreased
when an irrelevant pitch level change occurred, whereas English listeners were not
influenced by such a change (Lee & Nusbaum, 1993). This implies that tonal language
listeners perceive pitch and segmental information in an integrated manner, whereas
non-tonal language speakers perceive segments independently of their tonal
manifestation.
Nevertheless, it has been shown that non-tonal language speakers are able to learn
about tone, and that tone perception improves with training: English listeners'
perception of Mandarin tones increased by 21% after extensive perceptual training (Y.
Wang & Spence, 1999), a result consistent with what has been found for consonants
(D. B. Pisoni et al., 1982). This perceptual learning of tone appears to be based mainly
on acoustic cues (Gandour & Harshman, 1978).
Leather (1990) investigated the effect of production training on perception. A group of
Dutch speakers were trained to produce four Mandarin words differing only in tone.
The results showed that participants‟ tone perception improved through the production
training. The Dutch speakers were able to perceive tone differences without perceptual
training. Leather concluded that training in one modality (production) could enable
learners to perform well in another modality (perception). However, since only a single
syllable was used in training as well as in the test phase, it cannot be concluded that this
observed transfer effect is universal34.
Apart from the acoustic cues utilised in tone perception by second language tone
learners, F0-related aspects of the first language can also affect tone learning. In a study
of tone perception as a function of linguistic context and sentence position, Broselow,
Hurtig, and Ringen (1987) tested American listeners' perception of Mandarin tones
presented in isolation as well as in the context of two or three syllables. One of the
most important findings was that identification accuracy for Tone 4 varied with its
position in context. The authors argue that this reflects interference from English
sentence intonation; English listeners' perception of Mandarin tones is thus influenced
by their native intonation system.
This finding is in line with the results of studies showing the influence of stress on
Mandarin tone perception. White (1981) observed that English listeners perceive
Mandarin high tones as stressed and the low Tone 3 as unstressed, despite the fact that
in Mandarin, the stress on a syllable is realised by duration and amplitude rather than
by F0.
Together the results of second language Mandarin tone learning show that tone
perception depends on language background, processing style, context, and training
methods.
34 It should be noted, however, that while there was no perceptual discrimination training, the production task did of course involve presenting four different tones for participants to produce, so there may be considered to have been some perceptual training of sorts.
3.6.4.2 Thai Tone Perception by Second Language Learners
As in the case of Mandarin, Thai native speakers perceive tone better than non-tonal
language (English) speakers (Wayland & Guion, 2003); listeners learning Thai as a
second language perceive tone better than those without lexical tone experience
(Wayland & Guion, 2003); and Thai children discriminate tones better than Australian
English children (Burnham & Francis, 1997). Together these findings indicate that
linguistic experience is very important for tone perception.
It appears that the cues used in second language tone perception may change with age.
Burnham and Francis (1997) tested four-, six-, and eight-year-old Thai and Australian
English children and found that Thai children were better at discriminating tones than
English-speaking children, and that performance increased with age independently of
language background. A closer look at the tonal cues revealed that mean pitch was used
at all ages; the use of pitch onset as a perceptual cue increased with age in the Thai
children, whereas the English-speaking children made more use of tone onset and
offset as cues. These results support the view that exposure to a tone language across
development fine-tunes a listener's use of cues for lexical tone, and that with age
speakers of a tone language integrate more cues in their perception of tone. The results
also support Gandour's (1983) finding for adults that pitch height is a significant
perceptual dimension in tone discrimination for both tonal and non-tonal language
speakers. However, Burnham and Francis (1997) found that young tonal and non-tonal
language listeners' perception depends on a combination of acoustic cues rather than on
a single acoustic dimension.
3.6.5 Tone Perception as a Function of Tone Language Experience – First and
Second Language studies
The studies summarised in sections 3.6.1 to 3.6.4 indicate that native and non-native
tonal language speakers exhibit different patterns of tone perception and production.
Firstly, there appear to be processing differences. Studies of lateralization indicate that
lexical tone is processed in a different manner by tonal than by non-tonal language
speakers (Hsieh, Gandour, Wong, & Hutchins, 2001; Klein et al., 2001; Y. Wang,
Sereno, Jongman, & Hirsch, 2001). For native speakers acquiring tones as L1, the tone
appears to be part of each word; such an association between the segmental structure
and the tone contour does not seem to be active in non-tonal language speakers
(Bluhme & Burr, 1971; Kiriloff, 1969; X. S. Shen, 1989).
Secondly, it appears that second language and first language tone learners attend to
different acoustic/phonetic features (Gandour, 1983). Non-tonal language listeners
attend more to tone height than to tone direction, whereas within tonal language
listeners, attention depends on the particular language spoken: Cantonese listeners
attended more to tone height than Mandarin and Taiwanese participants, whereas Thai
listeners used the direction cue more than the other groups (Gandour, 1983). A reason
for this could be that non-tonal language listeners have fewer auditory resources left to
pay attention to cues provided by context (Jongman & Moore, 2000). Thirdly, a source
of difficulty in learning tones has been attributed to interference from L1 features, with
the function of pitch in the English stress and intonation systems found to interfere
with English listeners' perception of lexical tones (Broselow et al., 1987; White, 1981).
Although tonal and non-tonal language speakers process tones differently, non-tonal
language learners' tone perception can be improved through training (Y. Wang &
Spence, 1999). This improvement seems to generalise to other contexts, transfers to
tone production, and is retained in learners' long-term memory (Y. Wang, Jongman et
al., 2003). These results imply that the adult production and perceptual systems retain
plasticity with respect to lexical tone, and that cortical representations can be modified
as learners gain additional experience with lexical tone (Y. Wang, Sereno, Jongman, &
Hirsch, 2003; Wong, Skoe, Russo, Dees, & Kraus, 2007).
3.7 Categorical Perception of Lexical Tone
Categorical perception refers to the case in which a physical continuum is perceived
categorically. In speech perception this refers to the categorical perception of
acoustically-based continua (for a detailed review of categorical perception see Chapter
2), and this occurs especially in stop consonants (Liberman, Harris, Eimas et al., 1961;
Liberman et al., 1957), whereas contrasts involving vowels are perceived much less
79
categorically (Fry et al., 1962). Indeed the pattern observed in vowel perception can be
described as almost continuous. In this thesis, the concern is with categorical perception
of lexical tone. Previous research on this matter is presented in the following sections.
The results of studies are best examined for Cantonese and Mandarin, and this
separately, as the findings appear to differ as a function of language background,
perhaps due to the nature of the particular tone systems/tone spaces.
Categorical perception data will be separated into studies that concern mechanisms of
tone perception and experiments that obtain categorical vs. continuous results in
different listener groups with different stimulus materials.
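The logic of comparing identification and discrimination that underlies these studies can be made concrete with the classic Haskins-model prediction, under which listeners discriminate stimuli only via their category labels: predicted ABX accuracy for a stimulus pair is 0.5 + 0.5(p_i - p_j)^2, where p is the probability of assigning each stimulus to one of the two categories. The sketch below pairs this with an illustrative logistic identification function (the boundary location and slope are assumed values, not data from any study) to show how sharp labelling produces a discrimination peak at the category boundary:

```python
import math

def identification(step, boundary=5.0, slope=1.5):
    """Probability of labelling a continuum step as category A (logistic)."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(p_i, p_j):
    """Haskins-model ABX prediction: discrimination from labelling alone."""
    return 0.5 + 0.5 * (p_i - p_j) ** 2

steps = range(1, 10)                        # a 9-step tone continuum
ident = [identification(s) for s in steps]
# Predicted discrimination for each adjacent pair (1-2, 2-3, ..., 8-9):
disc = [predicted_abx(ident[i], ident[i + 1]) for i in range(len(ident) - 1)]
# Pairs far from the boundary stay near chance (0.5); pairs straddling
# the boundary show a peak, the signature of categorical perception.
```

Categorical perception is diagnosed when observed discrimination tracks such a labelling-based prediction; continuous perception, as reported for vowels, shows within-category discrimination well above it.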
3.7.1 Categorical Perception of Cantonese and Mandarin Tones
Firstly, the studies that have found differences in categoricality in listeners of tonal and
non-tonal languages will be considered here.
Francis, Ciocca, and Ng (2003) tested identification and discrimination of Cantonese
tones in Cantonese native listeners. Identification results for the three continua (low
level to high level, high rising to high level, and low falling to high rising) show that
the level tone continuum was identified in a continuous way, similar to results found
with vowels (Abramson, 1962; Fry et al., 1962), whereas perception of the contour
tone continua appeared to be more categorical, as found in consonant perception
(Liberman, Harris, Eimas et al., 1961; Liberman, Harris, Kinney et al., 1961). No
discrimination peaks were observed, and the authors conclude that category boundaries
in lexical tone continua are influenced by a combination of natural psychoacoustic
sensitivities and linguistic experience. The natural sensitivities are shown by the fact
that the boundaries between some Cantonese contour tones appear to lie at regions of
perceptual space where listeners are likely to exhibit heightened auditory sensitivity
(changes in pitch contour slope). For example, the transitions between falling and
rising and between rising and level pitch contours are auditorily salient boundaries that
have been observed in non-tonal participants listening to speech and non-speech
stimuli (Klatt, 1973; Schouten, 1985). The influence of linguistic experience was
shown by the coincidence of a linguistic boundary with a category boundary (Francis et
al., 2003). This pattern of results shows that the categoricality of tones depends not
only on language background but also on certain acoustic features of the particular
tones presented to the listener.
Fox and Unkefer (1983) also demonstrated categorical perception of dynamic Mandarin
tones in native Mandarin speakers, and non-categorical perception of the same tones in
American English listeners. These results show that categoricality depends on the
linguistic status the sound has in the listener's phonological system: tone is treated
linguistically, and perceived categorically, by tonal language speakers, but in a rather
phonetic/acoustic, continuous way by non-tonal language speakers.
Quite similar results were observed by Leather (1987), who investigated identification
of dynamic Mandarin tones by Mandarin, Dutch, and English listeners. The results
indicate that Mandarin listeners' tone perception is categorical, whereas in the English
and Dutch participants a rather continuous pattern was observed, another indication
that the categoricality of tone perception is shaped by language background.
In an experiment investigating mechanisms of tone perception by measuring
discrimination of small F0 contour variations in Mandarin and English listeners,
Stagray and Downs (1993) observed that Mandarin speakers are less sensitive to small
pitch differences than English listeners. This result appears surprising at first, but it
makes sense when one considers that Mandarin listeners, in natural language
processing, have to ignore small F0 differences in order to categorise lexical tones
correctly.
Bent, Bradlow, and Wright (2006) tested Mandarin and English listeners' identification
and discrimination of (dynamic) speech and non-speech tones to investigate whether
long-term linguistic experience influences the processing of non-speech sounds. As
expected, Mandarin listeners' identification of tones was significantly more accurate
than English listeners'; however, non-speech discrimination did not differ across the
listener groups. Interestingly, there were cross-language differences in non-speech
pitch identification: Mandarin listeners misidentified flat and falling pitch contours
more often than English listeners, in a way that could be related to specific features of
the sound structure of Mandarin. This suggests that the effect of linguistic experience
extends to non-speech processing under certain stimulus and task conditions.
Recent experiments by Halle, Chang, and Best (2000; 2004) compared categorical
identification and discrimination of dynamic tones in (Taiwanese) Mandarin Chinese
and French listeners. In the first experiment (Y. C. Chang & Halle, 2000), Mandarin
listeners were tested, and the results reveal that perception was categorical in a manner
similar to that found for vowel continua (rather shallow identification functions with
slight peaks in discrimination). A follow-up cross-language study with speakers of
French and Taiwanese Mandarin (P. A. Halle et al., 2004) found categorical perception
in the Taiwanese listeners but psychophysically based (continuous-looking) results in
the French listeners. The category boundary was located at different points of the
continuum for the different language groups, a phenomenon previously found by Chan
et al. (1975). Halle et al. conclude that tones are perceived quasi-categorically (in a way
similar to vowels) by listeners of Taiwan Mandarin, whereas perception of the same
tonal stimuli seems to be psychophysically based in French listeners (P. A. Halle et al.,
2004).
Xu, Gandour, and Francis (2006) conducted another cross-language study of lexical
tone perception with Mandarin Chinese and English listeners. They examined
perception of tones that ranged from a level to a rising tone and were presented as
speech and sine-wave stimuli. Their results show that tonal language speakers
perceived the tones in a categorical manner, whereas in non-tonal language speakers, a
rather continuous pattern of results was observed. Interestingly, the non-speech tones
were perceived more categorically than the speech tones in the English listener group.
The authors suggest a memory-based model of perception in which categoricality of
perception is domain-general but strongly influenced by long-term categorical
representations.
In another study of categorical perception of lexical tones in listeners from different
language backgrounds, Wang and colleagues (S. Chan et al., 1975; W. Wang, 1976)
found categorical perception in Mandarin speakers asked to discriminate
high rising from high level tones, whereas American subjects produced data
consistent with continuous perception. Interestingly, the Mandarin listeners exhibited a
category boundary located near the middle of the asymmetric stimulus space, whereas
the American listeners perceptually divided the continuum into 'level' and 'below
level' tones. Similar results were obtained in a series of experiments using non-speech
stimuli with the same tonal characteristics as the speech tones, and Chan et al. (1975)
conclude that the results reflect different modes of processing pitch depending on
language background: psychoacoustic processing in non-tonal language and linguistic
processing in the Mandarin listeners.
Mandarin Chinese and American English listeners' categoricality of tone perception
was also investigated by Zue (1976). Both English and Mandarin listeners' perception
of a continuum between tone 2 and tone 3 (both dynamic tones) was found to be
categorical.
Even though the results seem very different, there is a general trend in Mandarin tone
perception: dynamic tones are perceived rather categorically, whereas static tones are
perceived in a continuous manner.
3.7.2 Categorical Perception of Thai Tones
Abramson (1961; 1962; 1975; 1979) used stimuli from a continuum between Thai mid
and high tones to test Thai and American listeners. Identification results show a very
sharp category boundary in the middle of the continuum for both listener groups, but
the corresponding discrimination peak (albeit somewhat slight) was only observed in
Thai listeners (Abramson, 1961). In follow-up experiments (Abramson, 1977, 1979),
identification data showed a rather continuous response pattern with quite shallow
identification boundaries; and discrimination results also appeared continuous – there
was no clear discrimination peak and discrimination was very good across the whole
continuum. Based on these combined results, Abramson concluded that perception of
lexical tone in Thai is continuous. Although these results appear clear, it should be
noted that closer inspection of the discrimination functions suggests that a ceiling
effect could have masked the discrimination peaks.
More recently, Burnham and Jones (2002) conducted a categorical perception study
with native Thai speakers and Australian English speakers using both speech and non-speech
stimuli: sine-wave pure tones, filtered speech sounds, and violin sounds. The
speech condition used a non-Thai word (/wa/) recorded in three different dynamic Thai
tone contrast continua (mid-fall, rise-mid, rise-fall) and was matched for duration, F0
onset and contours with the three non-speech stimulus categories. The results indicated
that tonal language speakers show categorical perception effects when they listen to
lexical tones, but fail to generalise to non-speech stimuli with the same tonal
characteristics. Another interesting finding of this study was that while Thai listeners'
perception of lexical tone items was more categorical in speech than in non-speech,
non-tonal English listeners heard the non-speech stimuli more categorically than the
speech sounds, indicating that categorical perception of tone is, to some extent, learned
(Burnham & Jones, 2002).
In summary, the data on perception of lexical tone indicate that pitch movement (level
vs. contour) is important for categoricality of perception. Level tone continua are
perceived in a more continuous way (Abramson, 1979; Francis et al., 2003), whereas
contour tones are perceived categorically (Francis et al., 2003; W. Wang, 1976). It is
difficult to draw firm conclusions, however, as the studies used different stimuli and
different languages and so are not directly comparable. It is therefore necessary to
conduct the same experiment with different language groups.
Another question that is important for this thesis is whether categorical perception is
restricted to speech sounds, or can also be found in non-speech stimuli with the same
pitch characteristics. Therefore, speech and non-speech tones will be used in the
following experiments. Most previous studies have not tested both identification and
discrimination of lexical tones; however, both are necessary in order to draw
conclusions about categorical perception.
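The link between the two measures can be made concrete with the classic Haskins-style prediction, in which discrimination is assumed to be driven entirely by category labels: a pair of stimuli is discriminated above chance only insofar as the two stimuli are labelled differently. The following sketch is purely illustrative and is not taken from any of the studies reviewed; the logistic identification function and its boundary and slope values are arbitrary assumptions.

```python
import math

def identification(step, boundary=4.0, slope=2.0):
    """Illustrative logistic identification function: probability of
    labelling a continuum step as, say, 'rising' (parameters arbitrary)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

def predicted_abx(p_i, p_j):
    """Haskins-model ABX prediction: responses are correct only via
    differing covert labels, otherwise guessing at chance (0.5).
    p_i and p_j are the labelling probabilities of the two stimuli."""
    return 0.5 * (1.0 + (p_i - p_j) ** 2)

steps = list(range(1, 8))                      # a 7-step tone continuum
ident = [identification(s) for s in steps]
disc = [predicted_abx(ident[k], ident[k + 1])  # adjacent (1-step) pairs
        for k in range(len(ident) - 1)]
peak = max(range(len(disc)), key=disc.__getitem__)
# The predicted discrimination peak straddles the identification boundary
# (pairs 3-4 and 4-5 here): the classic categorical perception signature.
```

Under this prediction, categorical perception appears as a discrimination peak at the identification boundary with near-chance discrimination within categories, whereas continuous (psychophysically based) perception appears as discrimination that is uniformly better than the labels alone would predict.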
Some of the shortcomings of the previous studies on categorical perception of lexical
tone will be overcome in the following experiments. Previous studies, for example,
have compared perception of speech and non-speech stimuli in speakers of one tonal
and one non-tonal language, but so far, no previous study has integrated different tonal
languages and speech and non-speech sounds and tested identification and
discrimination. It is essential to look at all of these interconnected issues at the same
time. In the following cross-language experiment of the categorical perception of
lexical tone, all variables are incorporated: language group (Mandarin, Vietnamese, and
Thai), stimulus type (speech versus non-speech), and categorical perception factors
(identification and discrimination). In terms of language background, the tonal
language groups will consist of Mandarin, Vietnamese, and Thai native speakers,
while the non-tonal language group will consist of native speakers of Australian
English who are unfamiliar with Mandarin, Vietnamese, Thai, or any other tonal
language. In terms of
stimulus types, we will use a tone continuum ranging from a rising to a falling tone
presented as speech and sine-wave sounds. In order to match the speech and non-speech
tones, we use linear rather than curved tone contours. Linear contours usually
occur in non-speech contexts and are only rough approximations of real lexical tones,
and are thus less likely to give a perceptual advantage to tonal
language speakers. This experimental design makes it possible to assess the effect of
language background on tone perception by testing tonal and non-tonal language
listeners; to find out whether categorical perception is speech-specific or universal by
comparing speech and non-speech tones in tonal and non-tonal language speakers; and
to investigate whether identification results match discrimination data by testing both of
these aspects of categorical perception.
4.1 General Characteristics of Music
This chapter concerns music. It is organised into six sections: the first two
sections concern the structure of music and differences in music across cultures, the
next three sections concern people‟s musical ability and how music is perceived, and
the final section concerns the relationship between music and other abilities, especially
speech perception and production.
Music consists of melody, harmony, and rhythm. Relevant aspects of these are
reviewed in the following sections ("Grove Music Online," 2001).
4.1.1 Scales and Intervals
The simplest musical interval is the octave. Physically, an octave difference represents
a frequency ratio of 2:1. Tones with frequencies that are in an octave relationship are
perceived as similar or related, and are given the same name. Because of the perceptual
similarity of tones separated by an octave, it is now generally accepted that pitch may
be modelled as a spiral (Shepard, 1982a), with pitch chroma being the dimension of the
spiral associated with a tone's position within the octave, and pitch height being the
dimension associated with the octave in which the tone lies.
The octave has the perceptual quality of consonance: two complex tones an octave
apart, played together, are generally perceived as pleasant or harmonious. Two
complex tones in an octave relationship have many partials35 in common. For
example, a complex tone with a fundamental
frequency (F0) of 100 Hz has partials (harmonics) at 200 Hz, 300 Hz, 400 Hz, and so
on. A complex tone with a fundamental frequency of 200 Hz has partials at 400 Hz,
600 Hz, etc. Every alternate partial of the lower-pitched complex will lie at the same
frequency as every partial of the higher-pitched complex. These coincidences appear to
assist the perception of pleasantness (Justus & Bharucha, 2002).
In contrast, if one of the complex tones is somewhat mistuned so that it is slightly more
or less than an octave away from the other, the resulting sound is perceived as
dissonant. Continuing with the above example, if the higher complex has a 202 Hz
fundamental, then a beating sensation at a rate of two beats per second will result with
35 A partial is one of the component vibrations at a particular frequency in a complex mixture. A partial does not need to be a harmonic. The fundamental and all overtones may be described as partials; in this case, the fundamental is the first partial, the first overtone the second partial, and so on.
the second harmonic of the 100 Hz complex tone, making the combination sound
rough. Additionally, the second harmonic of the 202 Hz complex will beat four times
per second with the fourth harmonic of the 100 Hz complex tone, so that the
combination of the two complex tones sounds very rough and unpleasant.
Apart from the octave, there are other musical intervals that are also consonant (Sadie
& Tyrrell, 2001). In general, two notes that have a simple ratio of their fundamental
frequencies will sound consonant. Examples of these intervals and their ratios are: a
perfect fifth (frequency ratio of 3:2), a perfect fourth (4:3), a major third (5:4), and a
minor third (6:5). The reason that intervals with simple frequency ratios are perceived
as pleasant is similar to the reason that octaves sound consonant: many of their partials
fall at the same frequencies, and few lie close enough together to cause beating or
roughness.
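The partial-coincidence arithmetic described above can be sketched in a few lines. This is purely illustrative; the 2000 Hz cut-off is an arbitrary assumption, and partials are treated as exact harmonics of the fundamental.

```python
def partials(f0, fmax=2000.0):
    """Harmonic partials of a complex tone with fundamental f0, up to an
    arbitrary cut-off fmax (partials assumed to be exact harmonics)."""
    return {f0 * n for n in range(1, int(fmax // f0) + 1)}

# Octave (2:1): every partial of the 200 Hz tone coincides with every
# alternate partial of the 100 Hz tone.
octave_nested = partials(200.0) <= partials(100.0)   # True

# Mistuned octave (100 Hz vs 202 Hz): nearby partials no longer coincide
# and beat at the difference frequency instead.
beat_f0 = abs(202.0 - 2 * 100.0)      # 2 Hz, against the 2nd harmonic
beat_h2 = abs(2 * 202.0 - 4 * 100.0)  # 4 Hz, against the 4th harmonic

# A simple-ratio interval such as the perfect fifth (3:2) also shares
# partials: 200 Hz and 300 Hz coincide at every multiple of 600 Hz.
fifth_shared = partials(200.0) & partials(300.0)     # {600, 1200, 1800}
```

The nested-set check mirrors the statement in the text that every partial of the higher-pitched complex lies at the frequency of an alternate partial of the lower-pitched one, while the two beat rates reproduce the 202 Hz mistuning example above.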
4.1.2 Tempo, Rhythm, and Meter
Tempo in music is defined as the speed or pacing of a musical composition. Tempo can
be indicated in different ways. A metronome36 can be used to determine the tempo of a
musical piece in terms of beats per minute. In a less quantitative system,
conventionalised descriptions of speed and gestural character of the composition, such
as andante37, allegro38, adagio39, etc. are used.
Rhythm is a fundamental element that plays a part in many aspects of music: it is an
important element in melody; it affects the progression of harmony, and has a role in
such matters as texture, timbre, and ornamentation. While in Western music rhythm is
multiplicative (i.e., rhythmic patterns are derived by multiplying or dividing, normally
by two or three), in many non-Western cultures it is additive: an eight-unit rhythm in
Western music is invariably constructed on the basis 2 x 2 x 2, whereas in Middle
Eastern music it can be 3 + 2 + 3.
Meter is defined as the temporal hierarchy of subdivisions, beats and bars, which is
maintained by musicians and deduced by the listener, and functions as a dynamic
36 A metronome is a device that is used to mark time in music by means of regularly recurring ticks or flashes at adjustable intervals.
37 Andante indicates a moderately slow tempo.
38 An allegro is performed quickly, in a brisk, lively manner.
39 Adagio indicates that the composition is to be played in a slow tempo.
temporal framework for the production and perception of musical durations. Meter is an
aspect of the behaviour of performers and listeners rather than an aspect of the music
itself. Meters may be categorised as duple or triple (according to whether the beat or
pulse is organised in twos or threes) and as simple or compound (indicating whether
those beats are subdivided into duplets or triplets); more complex meters also exist.
Rhythmic and metric characteristics of Western and non-Western music, along with
other musical characteristics of these musical styles are discussed in the following
sections.
4.2 Music in Different Cultures
In this thesis it is the relationship between music and speech that is important. Music
and speech are both universal acoustic communicative systems used by humans. The
differences and similarities between music and speech have long been of interest for
psychologists (Feld & Fox, 1994). They share very important acoustic characteristics
such as intensity, duration, rhythm, timbre, and pitch. The common characteristic of
particular interest here is pitch, based on fundamental frequency, as this relates to
lexical tone. Fundamental frequency and pitch are the focus of the following sections,
firstly with regard to Western music and then with regard to Thai music.
4.2.2 Western Music
4.2.2.1 Scales and Intervals in Western Music
In the Western music tradition40, the diatonic scale (C, D, E, F, etc.) is used. A diatonic
scale is a seven-note musical scale comprising five whole-tone and two half-tone steps.
Between two half-tone steps there are either two or three whole tones, with the pattern
repeating at the octave. These scales are the foundation of the European musical
tradition. Within the twelve notes of the chromatic scale41, there are twelve distinct
diatonic scales. The white keys on a piano map out the seven notes of one such diatonic
scale, repeated in each octave.
40 In this review, only sacred and secular art music is considered; folksong and dance music are not discussed.
41 The chromatic scale contains all 12 pitches of the Western tempered scale.
4.2.2.2 Tempo, Rhythm, and Meter in Western Music
In Western music, time is usually organised to establish a regular pulse, and by the
subdivision of that pulse, into regular groups. The arrangement of the pulse into groups
is the meter, and the rate of pulses is its tempo. Most Western music possesses a regular
rhythmic pulse and meter.
In basic meters the subdivisions are equally spaced, but in both Western and non-
Western music there are metric patterns that involve unequally spaced beats.
Conventional meters can also be used as a way of notating complex, irregular rhythms.
(In these cases, performers may engage in metric counting, but listeners are not able to
infer any pattern of beats or bars, in which case it is doubtful if any meter is present at
all.)
4.2.2.3 Brief History of Western Music
The history of Western music can be collapsed into six main periods, each of which has
particular more or less stable features that characterise the period.
The first period is the Medieval Period, which lasted from 400 AD to 1400 AD. Before
900 AD, almost all music had a simple melodic structure, called plainchant, consisting
of a single melodic line sung in unison. Over the next 500 years, this
simple structure was expanded. By 1300 AD, there were compositions that were written
for three and four voices. These works are referred to as polyphonic (many voices).
The period called Renaissance began around 1400 AD and lasted for about 200 years.
By 1400, various composers were writing polyphonic works in slightly different
manners. This led to more unified sounding works, and gave rise to a number of
contrapuntal forms, such as the canon42, the canzon43, and the fugue44. Most of the
development during the Renaissance happened in Italy, and the most influential
composer of this period was the Italian Giovanni Pierluigi da Palestrina.
The following Baroque Period lasted from about 1600 AD to 1750 AD. New hymns
(chorales) were written, which were primarily homophonic (simple chordal structure) in
nature. By the mid-1700s, several composers began to explore new styles of
42 In a canon, all the voices are repeated exactly, but delayed in time.
43 A canzon is a succession of themes, each of which is developed and then discarded.
44 In the fugue, one theme is developed extensively.
composition, such as the symphony and the concerto. The most important composers
contributing to the musical style of the Baroque period were Bach, Vivaldi, and Handel.
This development led to the Classical period (1750-1800), where the basic musical
features did not change appreciably, except for the abandonment of polyphony. The
major contribution during this period was the enlargement and augmentation of many
aspects of music, such as the development of the orchestra. Mozart, Beethoven, Haydn,
and Schubert contributed significantly to the Classical period.
The Romantic period started around 1820 and ended in 1900. The scale of works
continued to expand and by the end of the 19th century, operas of three or more hours
were written; symphonies of an hour and a half were composed, and sometimes 200 or
more musicians were needed to perform these works. In this period, among the most
influential composers were Mendelssohn, Verdi, and Wagner.
By 1900, popular and 'classical' music began to separate. Jazz and then Rock became
the music of the masses, and classically trained composers experienced a much smaller
audience than before. The classical world fractured into many different groups, most of
which began to write much smaller pieces again.
4.2.3 Thai Music
Apart from the Western musical system, there are other widespread musical systems in
the world. Because the current series of studies is concerned with perception of speech
and music in tonal languages and particularly in (musician and non-musician) Thai
listeners (see Experiment 1, Chapter 6, and Experiment 2, Chapter 7), the main
characteristics of Thai music are reviewed in this chapter.
4.2.3.1 Thai Intervals and Scales
In Thai music the tuning system is equidistant: instruments are tuned to seven
pitches per octave (although the voice and non-fixed-pitch instruments use tones beyond
these seven). In traditional Thai music, five fundamental tones, forming a pentatonic
scale, are the basis of most compositions (Morton, 1976). The relationship between
Western and Thai musical scales can be seen in Figure 4.1.
Figure 4.1. Comparison between the Thai and the Western Scales
In the Thai tuning system, the octave is divided into seven intervals of around 171
cents45 each. Thai music is notated in the same way as Western music; however, as
mentioned above, Western and Thai musicians are not 'ruled' by the same size of
musical step. Music from the two traditions is therefore unlikely to be played together,
as the result would sound dissonant or unpleasant. Thai music
is described as non-harmonic, melodic, with an organisation that is horizontal. This
means that Thai musical pieces usually consist of a melody that is played
simultaneously with variants of the same melody. These variants are then played more
slowly or more quickly than the main melody. This melodic format, where
instrumentalists improvise around the central musical theme is called heterophony.
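The 171-cent figure follows directly from dividing the octave (1200 cents) into seven equal steps. A minimal sketch, assuming an exactly equidistant division of the octave:

```python
import math

def cents(ratio):
    """Interval size in cents: the octave (frequency ratio 2:1) is divided
    into 1200 equal parts, so a Western semitone spans 100 cents."""
    return 1200.0 * math.log2(ratio)

thai_step = cents(2.0 ** (1.0 / 7.0))          # 1200/7, about 171.4 cents
western_semitone = cents(2.0 ** (1.0 / 12.0))  # exactly 100 cents
whole_tone = 2 * western_semitone              # 200 cents
# The Thai step (about 171 cents) falls between a Western semitone (100)
# and a whole tone (200), so apart from the octave itself the two tuning
# systems share no common intervals.
```

This is why, as noted above, pitches from the two systems played together tend to sound mistuned relative to one another.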
4.2.3.2 Tempo, Rhythm, and Meter in Thai Music
In Thai music, three prominent proportional tempi can be found: the sam chan, the song
chan, and the chan dieo. Sam chan is double the length of song chan and four times
the length of chan dieo. Each of these can be played at slow, medium, or fast speeds,
depending on the composition and the instruments involved.
The rhythm and meter of Thai music are steady in tempo, show a regular pulse, and can
be described as divisive, in simple duple meter with no swing and little syncopation. In
Thai music the emphasis is generally placed on the final beat of a measure or group of
pulses and phrases, whereas in Western music, the first beat is usually emphasised.
4.2.3.3 Brief History of Thai Music
Historically, the music of Thailand was mainly an oral tradition, without a
notational system. It is therefore difficult to describe clearly the
historic forms of Thai music. Morton (1976) suggests that the notation of Thai music
45 The “cent” system is a means of comparing intervals, in which the octave (in Western music) is divided into 1200 equal parts with each of the semitones encompassing 100 cents.
began only around 600 years ago. The classical period, also called the Bangkok period,
started in 1782 AD as a development of music from the fourteenth and fifteenth
centuries. In 1767 AD, art collections and libraries were burnt by the Burmese army in
Ayuthaya, then the capital of Thailand, which resulted in the loss of most knowledge
about Thai music history before the Bangkok period (Morton, 1976).
In Thai music, three major genres can be identified: classical music, traditional or folk
music, and contemporary pop and rock.
The earliest traditional Thai ensembles were the so-called piphat ensembles, which
included woodwind and percussion instruments. The khruang sai is another form of
ensemble, composed mainly of string instruments; and in the mahori ensemble string
instruments are combined with melodic percussion instruments and flute.
Thai country music, the luk thung, was developed in the middle of the twentieth century
and was used to reflect the daily trials and tribulations of the rural Thai people. A folk
genre found mostly in Isan, in Thailand's northeast, is the mor lam. The mor lam is
thematically similar to luk thung. In the mor lam, the melody is tailored to the lexical
tones of the lyrics, and the vocals are described as rapid-fire and rhythmic, with a
'funk' feel to the percussion. The kantrum, another musical variety, is traditional dance music
played by Cambodians in Thailand near the Cambodian border. Singers, percussion,
and string instruments dominate the sound of the kantrum.
Apart from the traditional and classical music, in the twentieth century, Western
classical music, jazz, and tango became popular in Thailand. Thai melodies were
combined with Western classical music, which progressed into luk grung, a romantic
music style. In the 1960s, Western rock music became very popular, which led to the
formation of Thai pop music, called string, in which Thai lyrics started being used.
4.2.4 Similarities and Differences – Western and Thai Music and Singing
In the following sections, musical characteristics of Western and non-Western music
will be compared. In the first part, the features of Thai and Western music are
contrasted, followed by a comparison of singing in tonal and non-tonal languages.
4.2.4.1 Music
One of the main differences between Western and non-Western music is the tuning.
While it is clear that Thai and Western music share the octave principle, the Thai scale
consists of seven steps, whereas the Western scale has 12 steps (see Figure 4.1).
Another difference between Thai and Western music is that Thai music is non-
harmonic, while Western music is generally harmonic. In Thai music each instrument
plays its own melodic variation, based on a principal melody, whereas in Western
music notes are combined simultaneously and successively to produce chords and
chord progressions. In terms of tempo, rhythm, and meter, Thai and Western music are
similar, both showing a regular pulse; however, in Western music the first beat of a
phrase is generally emphasised, whereas in Thai music it is the last beat. One other
reason contributing to the difference in sound between Western and Thai music is the
difference in instruments. In Western music, instruments are classified into string
instruments, wind instruments, and percussion instruments; in Thai music, plucked,
bowed, hit or beaten, and blown instruments are differentiated (Morton, 1976). This
classification is very similar in the two musical cultures; however, the instruments in
each category differ, and therefore so does the sound that is produced.
4.2.4.2 Singing in Tonal Languages
Singing is a universal form of auditory expression in which music and speech are
fundamentally combined. The investigation of singing represents an ideal case for
assessing the differences and the similarities of music and language. In this section,
studies concerning singing in a tonal language are reviewed.
Even though the speaking voice and the singing voice are similar in terms of the organs
involved, there are some slight differences. One feature that distinguishes singing from
speaking is the controlled use of fundamental frequency. In singing the fundamental
frequency must be controlled much more precisely than in speaking, especially if the
singer performs with other musicians. In singing, higher volume levels and greater
dynamic volume ranges are produced than when speaking. Another difference between
the singing and speaking voice is the position of the larynx. Professional singers sing
with a lowered larynx, whereas the larynx is not lowered in speech (Sundberg, 1999).
In tonal languages, pitch height and contour are used to contrast the meaning of words
(see Chapter 3 about lexical tone). As singing involves melodic variation, an important
issue is how pitch information is used to signal lexical tones in singing.
There are three possible approaches to the treatment of lexical tone in songs. The
first is to ignore the lexical tones and thus the meaning of the words, and to use only
pitch for melody marking. If this were the case, musicality would be preserved, but
intelligibility reduced. The second alternative is to preserve the lexical tones and ignore
the melody, thus retaining intelligibility at the cost of musicality. In this case, songs
would sound very much like speech. The third option is a combination of the first two:
the composer (or the singer) tries to preserve as much of the lexical tone information as
possible while restricting the melody as little as possible. Empirical investigations of
singing in tonal languages are presented below.
In an investigation of lexical tones in different kinds of songs, Chao (1956) found that
in Mandarin Chinese “Singsong”, a type of song between speaking and singing, single
lexical tones are sung with a consistent pitch pattern that is the same for the whole
song. For example, one singer was reported to sing all high-level tones on the musical
note A (440 Hz). In this kind of song, the musical intelligibility is preserved, as the
listener can identify the tone (and consequently the word) through the pitch pattern that
is assigned to it. In contemporary Mandarin songs, lexical tones are generally ignored
(R. C. Chao, 1956).
An investigation of Cantonese Opera (Yung, 1983) showed an interestingly
systematic relationship between tones and melody: high-level tones are sung on E
(659.3 Hz), G (784 Hz), or D (587.3 Hz), mid-level tones are sung on C (523.3 Hz),
and low-level tones are sung mostly on A (440 Hz) and sometimes on B (493.9 Hz). It
appears then that in Cantonese Opera, each lexical tone has one or more musical notes
assigned to it and thus there is no overlapping of the musical notes and lexical tones
(Yung, 1983).
On the other hand, modern songs in Mandarin and Cantonese exhibit very different
behaviour with respect to the extent to which the melodies affect the lexical tones. In
modern Mandarin songs, the melodies dominate, so that the original tones on the lyrics
seem to be completely ignored. In Cantonese songs, however, the melodies typically
take the lexical tones into consideration and attempt to preserve their pitch contours and
relative pitch heights. Wong and Diehl (2002) analysed four contemporary Cantonese
songs. They observed direction of pitch change over pairs of consecutive syllables and
found an overall correspondence of over 90 percent between musical and tonal
sequences. This indicates that, while the fundamental frequency intervals and the shape
of the contours that are normal for speech are not reproduced exactly in these songs,
there does seem to be a very strong tendency for a rising sequence of tones to
correspond with an ascending sequence of musical notes, and for a falling sequence of
tones to correspond with a descending sequence of notes (Wong & Diehl, 2002).
Another analysis of six modern Cantonese songs (M. Chan, 1987) revealed similar
results: most tones are preserved in Cantonese popular music.
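The direction-of-pitch-change measure used in these corpus analyses can be sketched as follows. The five-syllable phrase below is invented for illustration and does not come from Wong and Diehl's (2002) or Chan's (1987) materials.

```python
def directions(values):
    """Direction of change (-1 falling, 0 level, +1 rising) over each
    pair of consecutive values in a sequence."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

def correspondence(tone_heights, note_pitches):
    """Proportion of consecutive-syllable pairs whose direction of pitch
    change matches between the lexical tones and the melody notes."""
    t, m = directions(tone_heights), directions(note_pitches)
    return sum(a == b for a, b in zip(t, m)) / len(t)

# Hypothetical five-syllable phrase: abstract tone heights vs. melody (Hz).
tones = [3, 5, 2, 2, 4]
melody = [262.0, 330.0, 294.0, 294.0, 392.0]
rate = correspondence(tones, melody)   # 1.0 here: every direction matches
```

A correspondence rate above 90 percent, as Wong and Diehl report, indicates that rising tone sequences almost always co-occur with ascending note sequences and falling tone sequences with descending ones, even though exact intervals and contour shapes are not reproduced.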
Finally, in an investigation of chants and songs of Central Thailand, List (1961)
observed that the “speech melody” (i.e. the pattern of pitch intervals of the lexical
tones) is mostly preserved, but the range of variation of the musical melody appears to
be limited.
In summary, it seems that the treatment of tones in songs depends not on the
particular language in which they are sung, but rather on the style of the composition
and the song.
4.3 Perception of Music – Tempo, Rhythm, Grouping, and Meter
In order to understand the perception of music in general, an understanding of the
perception of each musical feature (tempo, rhythm, grouping, and meter) is required, and
these are discussed below.
4.3.1 Perception of Tempo
Tempo describes the rate at which the basic pulses of the musical piece are played;
musical pulses are confined to a tempo range of roughly 50 to 500 ms (Fraisse, 1982).
Sensitivity to small changes in tempo is most accurate in the range from 300 - 800 ms
(Fraisse, 1982). Different lines of evidence propose that temporal intervals ranging
from around 200 - 1800 ms, especially those between 400 and 800 ms, have particular
perceptual salience (Braun, 1927; Collyer, Broadbent, & Church, 1994; Fraisse, 1982).
The tempi at which humans prefer to produce and to hear an isochronous pulse (the
spontaneous tempo and the preferred tempo, respectively) are based upon a temporal
interval of about 600 ms (Fraisse, 1982). The phenomena observed in perception of musical tempo
seem to have their origins in the anatomy and physiology of the human body, and
research suggests a strong relationship between the perception of rhythm and periodic
human movement, such as heartbeat (mean periods of around 670 ms - 1000 ms for adults), walking
(900 ms - 1100 ms), or breathing (200 ms - 350 ms) (Clynes & Nettheim, 1982;
Davidson, 1993; Gabrielsson, 1973; Krumhansl & Schenk, 1997; McLaughlin, 1970;
Shove & Repp, 1995; Todd, 1992; Truslit, 1938).
4.3.2 Perception of Rhythmic Patterns
A rhythmic pattern is a short sequence of events, usually in the order of a few seconds,
characterised by the periods between the successive onsets of the events. These inter-onset
periods are usually simple integer multiples of each other. Around 85 to 95
percent of the notated durations in a typical musical piece fall into just two categories,
related by a ratio of 2:1 or 3:1 (Fraisse, 1982). This limit of two main categories is thought to be the
result of a cognitive limitation; it has been observed that even musically experienced
participants have problems distinguishing more than two or three durational categories
in the range below two seconds (Murphy, 1966). In accord with this notion listeners
appear to distort near-integer ratios towards the integers when they repeat rhythmic
structures (Fraisse, 1982), and musicians seem to have difficulties reproducing rhythms
that are not representable as approximations of simple ratios (Fraisse, 1982). Rhythms
that have simple ratios are also easier to reproduce at different tempi, but that is not the
case for complex rhythms (Collier & Wright, 1995). The simplicity of the ratio alone
however cannot account for all perceptual phenomena that are observed in rhythm
perception: Povel (1981) observed that even when the ratios in a rhythmic pattern are
integral, listeners sometimes cannot perceive this relationship unless the pattern structure
makes it evident.
4.3.3 Perception of Grouping
As with language, in which utterances can be segmented into sentences, words, syllables,
and phonemes, music can be segmented into groups. Rhythmic patterns are groups that
contain sub-groups, which can be combined to form superordinate musical groups such
as phrases, sections, or movements (Justus & Bharucha, 2002). Lerdahl and Jackendoff
(1983) propose that the psychological representation of a musical piece includes a
hierarchical organisation of groups, also called the grouping structure. Further evidence
to support psychological grouping mechanisms was found by Sloboda and Gregory
(1980), who observed that clicks placed in a musical piece were systematically
remembered as being closer to the phrase boundary than they actually were. This
phenomenon has also been demonstrated in the perception of speech in an experiment
that involved placing a click in a stream of speech (Garrett, Bever, & Fodor, 1966).
Results showed that listeners perceived the click at phrase boundaries, rather than in the
middle of the word, where it was actually placed.
Perceptual grouping can occur even when there is no objective basis for it, a
phenomenon called subjective rhythmisation. Within a range of around 200 ms to 1800
ms intervals, an isochronous pattern will be grouped into twos, threes, or fours (Bolton,
1894), and when listeners are asked to synchronise with such a pattern, they illustrate
the grouping by lengthening or accenting every second or third event (MacDougall,
1903). Grouping also depends on tempo: groups of larger numbers of events are more
likely at fast tempi (Bolton, 1894; Fraisse, 1982).
Rhythmic patterns also affect grouping, in that events which are separated by shorter
intervals in a sequence are grouped into units bounded by longer intervals (Povel,
1984).
4.3.4 Perception of Meter
Meter is the hierarchical organisation of musical pieces based on temporal regularities
of the underlying beat or pulse of a musical sequence. One of the main characteristics
of meter is isochrony. This means that the beats are equally spaced over time, creating a
perceived pulse at a particular tempo (Povel, 1984). A beat itself has no duration - it is
simply used to divide the musical piece into equal time-spans. A sensation of pulse may
be evoked by temporal regularity at any level within a sound sequence and is not a
feature of the raw musical stimulus itself, but rather something that the listener infers
from it. For example, if new events occur approximately once per second, a beat is perceived
every second, whether or not an event is actually present on a given beat (Justus &
Bharucha, 2002). A form of behaviour that reflects the perception of pulse is tapping of the foot to music.
The extraction of such regularities in music is often modelled as synchronisation with
an internal timing device (Povel & Essens, 1984–5; Wing & Kristofferson, 1973).
4.4 Perception of Music - Pitch
Having reviewed the temporal features of music and their consequent perception, the
following sections will look at the perception of pitch in music. When listening to
music one may experience different pitches played consecutively (melody) or at the
same time (harmony) that form coherent patterns, which unfold as the musical piece
develops. Many perceived aspects of these patterns – such as certain pitches seeming
more stable than others, simultaneously played pitches sounding more or less
pleasant together, or the occurrence of certain pitches being highly predictable – appear
to conform to the rules of tonality (see Section 4.2.2.1).
There are two main approaches to the study of pitch processing: one focuses on
sensitivity to acoustic frequencies and frequency relationships; the other is concerned
with the influence of basic cognitive processes on pitch perception. Within the
first approach, Seashore (1938) claimed that pitch is the direct perceptual correlate of
fundamental frequency; the relationship between pitch and frequency was thus believed
to be mediated exclusively by the dynamics of peripheral auditory mechanisms. The
experience of a difference between two pitches was identified with the perception of a
difference, or a ratio, between two frequencies. Empirical studies conducted within this
vein of research tend to focus on the perception of isolated tones or tone combinations.
However, some results of this reductionist approach have proven difficult to
reconcile with the intuitions of musicians and music theorists. For instance, Stevens' mel scale of
pitch (S. S. Stevens & Volkmann, 1940) implies that the same interval differs in size
according to the register in which it occurs. This type of disproportion, together with
the emergence of cognitive psychology in the 1950s, stimulated research that focused
on the role of cognitive factors in shaping the experience of musical pitch (Shepard,
1982b).
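The register dependence implied by the mel scale can be made concrete with a short calculation. The formula below is a widely used later analytic approximation to the mel scale, not Stevens and Volkmann's original 1940 measurements, and the frequencies chosen are purely illustrative.

```python
import math

def hz_to_mel(f_hz):
    # Common analytic approximation to the mel scale
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The same musical interval (an octave, frequency ratio 2:1) spans a
# different number of mels depending on the register in which it occurs:
low = hz_to_mel(440.0) - hz_to_mel(220.0)     # octave A3-A4
high = hz_to_mel(3520.0) - hz_to_mel(1760.0)  # octave A6-A7
# the higher octave spans well over twice as many mels as the lower one
```

This is exactly the disproportion noted above: an interval that musicians treat as one and the same category occupies different extents on the psychophysical scale.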
4.4.1 Categorical Perception of Musical Pitch
Musical pitch continua are sensory continua, thus it would be expected that they would
be perceived continuously rather than categorically (see 2.3.4). Burns and Ward (1978)
examined trained musicians' perception of melodic intervals between sequentially
presented tones and found that perception was categorical: identification functions for
the intervals were steep and discrimination was best at the semitone boundaries. This
pattern of results was attributed to learning, rather than to the acoustic properties of the
stimuli, a conclusion supported by the finding that non-musicians did not
exhibit categorical patterns of results (Burns & Ward, 1978).
Consistent with these musician-specific categorical perception effects, Siegel and Siegel
(1977a) observed that trained musicians are very accurate in labelling intervals ranging
between unison46 and a major triad, whereas non-musicians show inconsistent labelling.
In a follow-up study, Siegel and Siegel (1977b) measured musicians' magnitude
estimates for intervals that ranged from a fourth to a fifth. Perceptual plateaus and
reduced variability were observed within the three interval categories (fourth, tritone,
fifth), whereas rapid changes with higher variability were observed at the boundaries.
This pattern of results led to the conclusion that musicians perceived these intervals
categorically; however, discrimination abilities were not assessed.
Categorical perception experiments have also been conducted on simultaneous intervals
and chords. Locke and Kellar (1973) presented chords consisting of three tones,
varying the frequency of the middle tone. Participants were asked to identify stimuli
from a continuum between a minor and a major triad. Musicians' perception of these
chords was categorical, whereas non-musicians showed rather poor and non-categorical
identification and discrimination. In a similar vein, Blechner (1977) presented chords
from a minor to a major continuum for identification and discrimination. The listeners
who were capable of consistently labelling the stimuli as minor and major exhibited
categorical perception patterns, whereas those listeners who could not identify minor
and major stimuli did not, and had lower discrimination scores. Similar results were
found by Zatorre and Halpern (1979), who used two-tone simultaneous intervals from
minor third to major third. Thus, it seems that musically trained listeners identify
acoustically ambiguous chords as major or minor, in a similar manner to that in which
listeners identify ambiguous speech stimuli as one category or the other.

46 If two tones are in unison, they are considered to be the same pitch, but are still perceived as coming from two separate sources ("Grove Music Online," 2001).
In summary, it can be concluded that the phenomenon of categorical perception is not
unique to speech or acoustic cues that are relevant to speech, and that categorical
perception of pitch in musical stimuli appears to reflect learned categories, rather than
psychoacoustic sensitivities, thus implicating cognitive factors rather than purely
perceptual mechanisms.
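The identification/discrimination logic behind these categorical perception claims can be illustrated with a toy model (all names and parameter values here are invented for illustration, not fitted to any of the cited data): identification follows a steep logistic function of continuum step, and predicted discrimination of adjacent stimuli, approximated as the difference in their labelling probabilities, peaks at the category boundary.

```python
import math

def p_major(step, boundary=5.0, slope=2.0):
    """Probability of labelling a stimulus 'major' along a 10-step
    minor-to-major continuum (logistic identification function;
    all parameter values are illustrative only)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

# Predicted discrimination of adjacent continuum steps, approximated as the
# difference in labelling probabilities: largest where labels change fastest
discrim = [p_major(s + 1) - p_major(s) for s in range(10)]
best_pair = max(range(10), key=lambda s: discrim[s])
# best_pair falls at the category boundary, not at the continuum ends
</imports>```

On this account, a listener with no stable labels (a flat identification function) would show no discrimination peak, which is the pattern the non-musicians exhibit.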
4.4.2 Relative Pitch
Melodic musical intervals can be defined as subjective correlates of sequential
frequency ratios. The melodic information in a musical piece is not dependent on the
absolute frequencies of the tones that form the melody, but rather on the frequency
relationships between the tones. The musical scales of all cultures are based on the
notion that equal frequency ratios cause equivalent percepts (Burns, 1999).
Accordingly, melody transposition does not cause loss of melodic information.
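A minimal sketch (the helper name is my own) of why transposition preserves melodic information: intervals measured as logarithms of frequency ratios are unchanged when every frequency in the melody is scaled by the same factor.

```python
import math

def intervals_semitones(freqs_hz):
    """Successive melodic intervals in semitones: 12 * log2 of each
    frequency ratio, rounded for comparison."""
    return [round(12 * math.log2(b / a), 2)
            for a, b in zip(freqs_hz, freqs_hz[1:])]

melody = [440.0, 493.88, 523.25]        # A4, B4, C5 (equal temperament)
transposed = [f * 1.5 for f in melody]  # the same melody, a fifth higher

# Transposition scales every frequency by the same factor, so the
# ratios, and hence the intervals, are unchanged
assert intervals_semitones(melody) == intervals_semitones(transposed)
```

The interval sequence, not the absolute frequencies, is what the two renditions share, which is the representation relative pitch operates on.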
Trained musicians who have developed what is called relative pitch (RP) are able to
assign verbal labels to musical intervals. Possessors of RP are able to identify the name
of the note or interval, but only with a given reference note. RP also enables musicians
to produce intervals when given an interval name and a reference tone. RP is not
necessary in order to appreciate or play music, but it can assist in sight-reading music.
Most trained musicians have RP abilities, and RP training is included in most music
curricula.
4.4.3 Absolute Pitch
A small proportion of the population has absolute pitch (AP). AP (also
called perfect pitch) is defined as the ability to produce or identify specific pitches
without reference to an external standard tone (Baggaley, 1974). Possessors of AP have
internalised their pitch references, and are able to maintain stable representations of
pitch in their long-term memory.
The faculty of AP is often compared with colour perception (D. Ward, 1999): for
possessors of AP, pitch labelling is as easy as colour labelling is for
most humans. Nevertheless, the comparison is not quite perfect, as the visual system
divides visible wavelengths on the basis of retinal cells (cones) which are specialised
for particular wavelength ranges (Bimler, Kirkland, & Jameson, 2004). Information
from the peripheral auditory system and the cochlea however is more continuous and
there does not seem to be a one-to-one mapping between incoming stimulus and
percept (Bornstein, 1973). Absolute pitch is quite rare, probably occurring in only
0.01% of the population47 (Bachem, 1955; Baharloo, Johnston, Service, Gitschier, &
Freimer, 1998; Profita & Bidder, 1988; Takeuchi & Hulse, 1993). It appears to be
distinct from the ability, which some people develop, to judge the pitch of a note in
relation to a reference pitch such as the lowest or highest note they can sing. Rakowski
(1972) asked listeners with and without AP to adjust a variable signal so as to have the
same pitch as a standard signal, for various time delays between the two. At long delays,
listeners without AP showed a marked deterioration in performance relative to shorter
delays, whereas those with AP did not.
One possible explanation of the origin of AP is that it is a faculty acquired in infancy
and/or childhood through some learning process, although Ward (1999) has suggested
that the converse is true: we may all start with AP, but the ability is usually unlearned
because relative, but not absolute, pitch judgements are reinforced. The limited
success achieved by training in adulthood tends, at the moment, to favour the idea of
some sort of learning process in the development of AP (for a discussion of infant AP,
see 4.5.5 and 4.5.6).
Timbre can also play a role in AP accuracy. Lockhead and Byrd (1981) demonstrated
that AP possessors are more reliable in identifying tones played on instruments that
they are familiar with. It has also been found that notes that are played on a piano are
easier to identify than notes played on other instruments – a phenomenon also referred
to as “absolute piano” (Takeuchi & Hulse, 1993) – which may also be a familiarity
effect given the pervasiveness of the piano. These findings suggest the involvement of
factors other than pitch range or pitch register in pitch identification accuracy.

47 It should be noted that AP possessors are relatively more common in Japan, at around 30% of university music education students and around 50% or more of music students (Miyazaki, 2004). This high incidence of AP in Japan is believed to be a result of early music lessons (see section 4.4.5.2 on the origin of AP).
In an attempt to link absolute pitch to tonal language background, Deutsch, Henthorn,
and Dolson (1999, 2004) examined tonal (Mandarin and Vietnamese) and non-tonal
(American English) speaking participants' F0 variation across two days. The data show
that tonal language speakers' pitch variation between the two days was significantly
smaller than that of non-tonal language speakers, and from this the authors conclude
that the tonal language speakers "display a remarkably precise and stable form of
absolute pitch in enunciating words." (p. 399). This conclusion is questionable, as
absolute pitch is defined as the ability to voluntarily produce or identify specific pitches
without reference to an external standard tone; what Deutsch et al. (1999, 2004) are
examining here is unconscious variation of speaking voice. Nevertheless the fact that
there was less variation for tone language speakers warrants further investigation. A
subsequent study by Burnham, Peretz, Stevens, Jones, Schwanhäußer, Tsukada, and
Bollwerk (2004) using more transparent and speech-appropriate measures of F0
variation suggests that the differences between tone and non-tone language speakers are
minimal, prompting the conclusion that such tasks have little to do with absolute pitch.
4.4.4 Absolute Pitch Memory
Over and above absolute and relative pitch, listeners with no musical background can
identify familiar melodies presented at novel pitch levels and notice when those
melodies are performed incorrectly (Drayna, Manichaikul, de Lange, Snieder, &
Spector, 2001), suggesting that even participants without AP or RP have accurate
implicit pitch memory. Musical memory abilities were also tested by Schellenberg and
Trehub (2003). In order to investigate pitch memory in people who cannot identify or
produce musical notes, familiar recordings were presented and participants were
required to identify whether the excerpt was taken from the original song or whether it
was shifted in pitch. Even adults with little or no musical experience were able to
remember pitch levels of familiar songs over time, which led to the conclusion that
non-musicians can retain pitch information over long periods of time, an ability
comparable to AP. Similar results were obtained with excerpts as short as 100 ms,
suggesting that it is absolute features of the music, such as timbre or frequency spectra,
that are important, rather than relational cues (Schellenberg, Iverson, & McKinnon,
1999).
Further support for this more generalised AP-like ability comes from adults' production
of songs. When asked to sing popular songs that they know, almost two thirds of the
adults tested produced renditions within two semitones of the original versions
(Levitin, 1994) and at tempi within 8% of the original tempo (Levitin & Cook, 1996).
Similar consistency in pitch level and tempo has been found when adults were asked to
sing familiar folk songs (like “Yankee Doodle”) on different occasions, even though
they had obviously heard these songs at several pitch levels and tempi (Bergeson &
Trehub, 2002; Halpern, 1989).
Taken together, these results indicate that even though non-musicians are not able to
label pitches, they nevertheless can have good long-term musical pitch memory.
However, it should be noted that studies of this nature have been criticised for various
reasons, one being that the reproduction of melodies might involve a form of
muscle memory in the vocal cord muscles, originating from singing along to
the song, such that the vocal tract configuration can be recalled when the singer is asked
to reproduce the song (Cook, 1991; W. D. Ward & Burns, 1978).
4.4.5 Developmental Issues in Pitch Perception
4.4.5.1 Pitch Perception Development
Pitch perception is essential to music perception (see section 4.2.2). A melody is
characterised by its pitch relationships, without regard to the absolute pitch levels of the
particular tones. This is demonstrated by the fact that adults can recognise a familiar
song at any given pitch level. Infants also seem to have this ability. After limited
exposure to a melody, 5- to 10-month-old infants treat transpositions48 of that melody
as familiar/equivalent to the original melody (H. W. Chang & Trehub, 1977; Trehub,
Bull, & Thorpe, 1984; Trehub, Thorpe, & Morrongiello, 1987). When the tones are
rearranged, however (Trehub et al., 1984), or one component tone is altered (Trehub et
al., 1987), infants perceive the tune as new. For infants, the pitch contour appears to be
the most relevant aspect of a melody: they can perceive pitch contour changes even
when the standard and comparison melodies are separated by a longer (15 sec)
temporal interval (H. W. Chang & Trehub, 1977) or by a series of unrelated notes
(Trehub et al., 1984). The salience of the pitch contour is not restricted to music: in a
comparison of pitch amplitude and pitch contour, Fernald (1991) found that pitch
contour is also the most salient aspect of infant-directed speech for infants (but see
Kitamura & Burnham, 1998, who show that vocal affect is the most salient aspect of
IDS).

48 Transposition of a melody is alteration of the component pitches such that the pitch relations are preserved.
It has also been found that infants are sensitive to interval information in Western
music (Trehub & Trainor, 1993). Infants show heightened sensitivity to octave
information (Demany & Armand, 1984; Schellenberg & Trehub, 1996b) and have been
shown to confuse melodies that have the same contour (Trehub et al., 1987; Trehub,
Thorpe, & Trainor, 1990); however, they can detect small interval changes in melodies
(Cohen, Thorpe, & Trehub, 1987; Trainor & Trehub, 1993).
Another feature of Western music is the consonance and dissonance of certain intervals.
When 6-month-old infants were given a task requiring them to detect changes in
melodic intervals with varying frequency ratios, it was found that discrimination ability
was better for simple than for complex frequency relationships (Schellenberg &
Trehub, 1996b). Such findings suggest that infants find some intervals more consonant
than others and thus confirm similar observations in children (Schellenberg & Trehub,
1996a) and adults (Schellenberg & Trehub, 1994).
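The simple versus complex frequency-ratio contrast behind these consonance findings can be made explicit. A toy illustration using standard just-intonation ratios; the complexity index below is an ad hoc heuristic chosen for this sketch, not a measure used in the studies cited above.

```python
from fractions import Fraction

# Standard just-intonation frequency ratios for a few intervals
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "major third": Fraction(5, 4),
    "tritone": Fraction(45, 32),
}

def ratio_complexity(ratio):
    # Crude simplicity index: smaller numerator + denominator = simpler ratio
    return ratio.numerator + ratio.denominator

# Ordered from simplest (most consonant on this crude index) to most complex
by_simplicity = sorted(INTERVALS,
                       key=lambda name: ratio_complexity(INTERVALS[name]))
```

The ordering this produces (octave, fifth, third, tritone) tracks the usual consonance ranking, which is the sense in which the infants' better discrimination of simple-ratio intervals aligns with adult consonance judgements.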
As music consists not only of melodies but also of harmonies (though see
non-Western music in section 4.2.3), infants' perception of harmony is also of
interest. In a study of adults' and 8-month-old infants' detection of harmonic change,
adults were better able to detect changes that went outside the key49 than changes
within the key, whereas infants discriminated both kinds of change equally well,
independent of the key (Trainor & Trehub, 1992). This is in line with the general view
that much of the listener's knowledge of Western harmony is based on learning and
exposure, rather than on natural predispositions.
49 A key is one out of 24 major and minor diatonic scales that provide the tonal framework for a piece of music.
4.4.5.2 Absolute Pitch Perception Development
There has been a continuing debate about the origin of AP, mainly between theories
that highlight inherited contributions to AP and those that stress experiential
contributions (Takeuchi & Hulse, 1993; D. Ward, 1999; D. W. Ward & Burns, 1982).
Results that support and contradict the genetic view, as well as data for and against the
experiential theory and early learning of AP, are summarised below.
Supporters of the hereditary theories emphasise the rarity of AP, and point to evidence
that AP is concentrated in families (Bachem, 1955; Baharloo et al., 1998; Baharloo,
Service, Risch, Gitschier, & Freimer, 2000; Gregersen, Kowalsky, Kohn, & Marvin,
2001; Profita & Bidder, 1988). In fact, while the percentage of AP possessors in the
general population is thought to be less than 0.01% (Bachem, 1955; Baharloo et al.,
1998; Profita & Bidder, 1988; Takeuchi & Hulse, 1993), these numbers are only
estimates and may exaggerate the rarity of AP. Moreover, inherited and environmental
factors are inseparably confounded, and a high incidence of AP by itself is not reliable
support for the contribution of genetic factors (Levitin, 1999; R. J. Zatorre, 2003).
On the other hand, a number of researchers have examined whether AP is learnt
through training. Attempts to improve AP identification in adults by intensive training
have had some success (Cuddy, 1968, 1970), but the levels of performance rarely equal
those found in real cases of AP. Some of those studies showed that it is possible to
remember a fixed standard pitch to a certain degree (Brady, 1970; Cuddy, 1968, 1970).
Nevertheless, these AP training accuracies are far from the level of real AP possessors
who can immediately and accurately identify the 12 pitch classes.
Thus, there is no conclusive evidence supporting adult-learning accounts of AP. It should
be mentioned, however, that surveys of large numbers of musicians show that the
proportion of participants who reported having AP decreased as the age of starting
musical training increased (Baharloo et al., 2000; Sergeant, 1969). Miyazaki and Ogawa (2006) tested
children who attended music schools and observed that AP accuracy increased to 80%
or more between 4 and 7 years, and more than two thirds of children acquired AP to a
level of 90% or more correct. It was also found that AP for white piano notes developed
earlier than for black piano notes.
However, existing evidence for the early-learning model of AP is based mainly on
anecdotal reports or on surveys of biographical recollections; there is as yet no
conclusive experimental evidence supporting the model. In fact, some researchers have
tried to train children in AP, but with little or no success (Cohen & Baird, 1990; Crozier,
1997); these failures may be due to the limited quantity and duration of the training given.
Altogether, these results indicate that AP is most likely a result of long-term training in
childhood. This issue will be considered further in relation to studies with infants
designed to test their AP ability (see section 4.4.5.3).
In summary, these results support the early-learning theory of AP genesis, proposing
that AP is most effectively acquired through training in early childhood. The early-learning
view of AP parallels the critical period hypothesis50 (Lenneberg, 1967)
for language learning, suggesting that the acquisition of AP should be
considered within a broader framework of cognitive development.
4.4.5.3 Absolute Pitch Abilities in Infants
As mentioned in 4.4.5.2, it is generally assumed that the roots of AP lie in
childhood/infancy. Facts that support this view include the negative correlation
between the age of onset of musical training and the accuracy of AP (Miyazaki, 1988;
Sergeant, 1969), and the finding that younger children are better than older children in AP
training tasks (Crozier, 1997). According to this view, early in life, AP would be the dominant
pitch-processing mode, which is later replaced by the more functional ability to
represent and remember pitch relations (relative pitch). Saffran and Griepentrog (2001)
conducted two experiments with 8-month-old infants in order to investigate the use of
absolute and relative pitch cues in a statistical learning task with tone sequences. The
results suggest that infants are more likely to track prototypes of absolute pitches than
of relative pitches. In the third part of this experiment adult musicians and non-
musicians were tested on the same statistical learning tasks. Unlike the infants, adult
listeners depended mainly on relative pitch cues. These results suggest that there is a
developmental reorganisation from an initial focus on AP to the ultimate dominance of
RP. AP may be a less mature perceptual capacity, eventually replaced by RP during
development (Trehub, Schellenberg, & Hill, 1997).

50 The strong version of the critical period hypothesis states that children must acquire their first language by puberty or they will never be able to learn from subsequent exposure. The weak version is that language learning will be more difficult and incomplete after puberty.
In summary, these studies of pitch perception in infancy augment our knowledge of
pitch perception and its development. It appears, for example, that knowledge of
harmony is probably a learned skill and that absolute pitch is most likely a result of
intensive early training.
4.4.6 Hemispheric Differences in Pitch Processing
The processes involved in pitch perception depend, to some extent, on the locations in
the brain where information is processed, particularly with regard to the lateralization
of brain function. Here, lateralization with regard to pitch perception in general and to AP
in particular is considered in turn.
4.4.6.1 Lateralization of Pitch Processing
In the second half of the nineteenth century, Wernicke (1874) and Broca (1861) found
that certain speech and language problems could be related to specific brain areas. This
led to the suggestion that the left hemisphere of the brain was responsible for language
and analytic processing and the right hemisphere more for processing global
information, such as patterns. Accordingly, musical abilities were thought to be a
function of the right hemisphere, whereas language abilities seemed to be
located in the left side of the brain. Today, the picture seems to be much less simple.
It is not clear whether pitch perception in music and language shares cognitive
mechanisms, although a left hemisphere bias has been shown for the processing of pitch
category information in both music and language. The suggestion that pitch categories
of a lexical or musical nature are especially dependent on left hemisphere processing is
interesting, and differences between musical and lexical pitch categories suggest that
the two may be processed separately within the left hemisphere. A great deal of
research has been devoted to investigating shared processing mechanisms for music and
speech. Currently, there are two significantly different views on that matter. One of
them assumes strict modularity of the two systems; it states that speech and music are
separate and do not share the same mental processing systems (Fodor, 1983). This view
is supported by the results of many behavioral and imaging studies that have suggested
that linguistic processing occurs in the left hemisphere of the human brain, whereas
music is processed in the right hemisphere (Bever, 1975; Bever & Chiarello, 1974).
Even though these studies show that there is a lateralization effect, it appears that
hemispheric dominance of different processing mechanisms is not absolute, but can be
viewed as a tendency (Wong, 2002).
The alternative view states that hemispheric differences affect particular aspects of
auditory processing and that shared acoustic features of music and speech, such as
pitch, will be processed in a similar way. This possibility is based on the results of
various experiments that have shown that phonemic processing occurs in the left
hemisphere, whereas melodic and prosodic units are processed in the right part of the
human brain (Bryden, 1982; Kimura, 1961, 1964; Patel, 2003; Patel & Peretz, 1997;
Peretz & Coltheart, 2003; Shankweiler & Studdert-Kennedy, 1967; Studdert-Kennedy
& Shankweiler, 1970; Van Lancker & Fromkin, 1973, 1978). It appears that some
aspects common to music and speech, for example hierarchical organisation, are
processed in overlapping areas of the brain, an observation that suggests that there are
common neural mechanisms which are used in speech and music processing (Patel,
2003).
In the current series of experiments, the effect of experience-dependent learning in the
domain of music on processing in the domain of speech is investigated; the results
may shed light on the modularity or non-modularity of speech
and non-speech processing.
4.4.6.2 Lateralization of Absolute Pitch
It appears that there may be some neurophysiological differences between AP
possessors and people without AP. Keenan, Thangaraj, Halpern, and Schlaug (2001)
reported that musicians with AP have an enlarged planum temporale, a region that is
located in the left temporal lobe of the brain, which has been found to be involved with
language processing.
In order to investigate the neural basis of AP, Zatorre, Perry, Beckett, Westbury, and
Evans (1998) used functional and structural brain imaging techniques to measure
cerebral blood flow during presentation of musical notes to both possessors of AP and
to musicians without AP. Both listener groups showed similar patterns of increased
cerebral blood flow in auditory cortical areas, and the group of AP possessors also
demonstrated activation of the left posterior frontal cortex, an area thought to be related
to learning conditional associations. This activity was also observed in non-AP subjects
when they made relative pitch judgments of intervals (R. J. Zatorre et al., 1998).
Increased activity within the right inferior frontal cortex was observed in RP but not in
AP subjects during the interval judgment task, suggesting that AP possessors need not
access working memory mechanisms in this task, because they simply classify each
interval by name. Magnetic resonance imaging measures of cortical volume also
suggested that listeners with AP have an enlarged planum temporale, a result that
correlated with performance in a pitch-naming task (R. J. Zatorre et al., 1998). Their
findings suggest that AP depends on the use of a special neural network that enables
retrieval and manipulation of verbal-tonal associations.
Both of these studies point to the importance of the left hemisphere of the brain
in AP.
4.5 Music and Other Domains
Music and speech are the most complex uses of sound-based communication in
humans. In addition to this similarity, they share other features. Both music and speech
are generative: simple elements such as notes or speech sounds are
combined systematically in order to create complex but meaningful structures, such as
melodies or words (R. J. Zatorre, Belin, & Penhune, 2002). Both music and speech
consist of elements that are time-dependent and occur in sequences, in which pitch,
duration and dynamics are very important. Both systems are constrained by the limits
of the auditory system, the central nervous system, and memory. Lerdahl and
Jackendoff (1983) have found similarities between the hierarchical structuring in
musical rhythm and the prosodic timing patterns in speech.
The suggestion that there might be links between musical and non-musical domains has
generated a large amount of research in recent years. One line of research is concerned
with the short-term benefits of listening to classical music, and another is concerned
with the effects of long-term musical training (see section 4.5.1). Each is discussed
here.
Neurophysiological studies to investigate common vs. separate processing of music and
speech have been considered in 4.4.6.1. Here behavioral evidence on this issue is
considered with respect to priming studies, transfer effects, and the effect of music on
speech.
Before considering these studies, it is important to clarify the differences between
musical ability, musical aptitude, and musical training. Everybody learns to speak51;
language is learnt through exposure to speech. Mothers and caregivers talk to infants
and children, and on this and other bases children learn to speak and to understand
language. Moreover, there are certain proclivities and structural aspects of the human
brain that facilitate language learning. If we look at music, the case is not so simple:
not everybody is exposed to music at the same level, nor has the opportunity to learn a
musical instrument (or to have singing lessons).
It has become clear that certain aspects, such as musical harmony, are learnt by
exposure to music, i.e., by listening. Other aspects, such as knowing the different keys on
the piano, must be trained, just as learning to read requires instruction. It is known that
there are children - dyslexic children - who have problems with reading (Orton, 1925).
In relation to the speech/music comparison, the question that then arises is whether
there also are people who have problems learning to play an instrument. This question
will be addressed in the following review of musical aptitude studies.
Musical aptitude is different from musical ability. Musical ability refers to the level of
musical skill and musical understanding of an individual (Boyle, 1992). The level of
musical ability is generally a result of various factors, such as aptitude, and musical
training. Musical aptitude is the possibly latent potential of a particular individual to
acquire musical skills.
There are various measures of musical aptitude (Gordon, 1965, 1989; Seashore, Lewis,
& Saetveit, 1939, 1960; Shuter-Dyson & Gabriel, 1981). The aptitude test used in the
current series of experiments is the Advanced Measures of Music Audiation
(AMMA; Gordon, 1989). This test is used because it is the only music aptitude
test developed specifically for university students and has been found to have significant
51 This excludes individuals with medical conditions that do not allow them to learn how to talk.
predictive validity (Gordon, 1990). A criticism often levelled at aptitude
assessment is that musical aptitude and training may be confounded, i.e., performance
on such tests may improve with musical training, so musical aptitude tests may
never measure "pure" aptitude. This problem cannot be solved in this thesis;
however, regression analyses will be conducted in order to control for such confounds.
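The logic of such a regression control can be sketched as follows: regress aptitude scores on a training measure and analyse the residuals, which are uncorrelated with training by construction. The sketch below is purely illustrative; the scores, variable names, and the choice of simple linear regression are assumptions, not data or methods from the thesis.

```python
def residualise(aptitude, training):
    """Return aptitude scores with the linear effect of training removed
    (ordinary least-squares residuals with an intercept)."""
    n = len(aptitude)
    mx = sum(training) / n
    my = sum(aptitude) / n
    sxx = sum((x - mx) ** 2 for x in training)
    sxy = sum((x - mx) * (y - my) for x, y in zip(training, aptitude))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed score minus the score predicted from training alone
    return [y - (slope * x + intercept) for x, y in zip(training, aptitude)]

# Hypothetical scores: AMMA-like aptitude scores and years of formal training
aptitude_scores = [22, 25, 31, 35, 40, 41]
training_years = [0, 1, 3, 5, 8, 10]
residual_aptitude = residualise(aptitude_scores, training_years)
```

By construction the residuals sum to zero and are orthogonal to the training measure, so any relationship they show with a third variable cannot be attributed to training.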
In the ongoing debate on the degree to which certain cognitive abilities are influenced
by biological or environmental factors, the issue of musical aptitude versus training as a
determining factor of musical ability is often raised. In Western culture it is generally
believed that musical ability can mostly be explained by innate talent or giftedness
(Davis, 1994; Gardner, 1983; Radford, 1990). However, there is no direct evidence for
genetic involvement in musicality (Howe, Davidson, & Sloboda, 1998). This is in line
with the claim that most individuals have the potential to develop musical skill
(Ericsson, Krampe, & Tesch-Römer, 1993). Thus, the origin of musical ability and
aptitude has not been clearly established. One of the aims of the current series of
experiments is to investigate the relative roles of musical training and musical aptitude
on speech perception and production, which may lead to clearer understanding of the
interplay between these factors. Even though no gender differences in musical ability
and musical aptitude have been found (Shuter-Dyson & Gabriel, 1981), there are
differences between the genders in terms of musical involvement and achievement.
Girls are generally more involved and successful than boys in musical activities at
school, yet men still dominate the professional music world. Research on
instrumental music has shown that children's gender-role beliefs and
self-perceptions run opposite to the gender differences observed in adults (Eccles,
Wigfield, Harold, & Blumenfeld, 1993).
4.5.1 The Effect of Music on Other Cognitive Abilities
This section reviews studies on how long-term music training can influence
unrelated cognitive abilities, such as spatial and visual abilities,
verbal skills, reading, and mathematics. Studies with infants and adults will be
considered.
Gromko and Poorman (1998) found that children's musical aptitude was positively
related to performance on a task that involved matching melodies with graphic
representations, suggesting that musical ability also has an effect on symbolic
reasoning.
A different line of research investigates the relationship between musical skills and
verbal and visual skills. Hassler, Birbaumer, and Feil (1985) investigated visual-spatial
abilities and verbal fluency in nine- to fourteen-year-old children. The sample consisted
of three groups with different musical abilities (non-musicians, musically talented,
musically talented and able to compose/improvise). Test results indicate that the
musically talented children were better than the non-musical children at verbal fluency
and visualisation, but not at tasks measuring spatial relation ability. This shows that
formal music training can be accompanied by better performance in non-musical tasks
(Hassler et al., 1985). In a similar study with adults, Chan, Ho, and Cheung (1998)
compared the verbal and visual memory abilities of women with and without musical
training. The groups did not differ on the visual memory task, whereas the
musicians performed better on the verbal memory task (A. S. Chan et al., 1998). A
later study of the influence of formal music training on verbal recall likewise showed
positive transfer between these unrelated skills (Y.-C. Ho, Cheung, & Chan, 2003).
Together these findings show that musical training can enhance visual and verbal
abilities.
Another line of research investigated the relationship between music training and
reading and writing skills in children. It was found that the ability to discriminate
musical sounds was related to reading performance in early readers of four to five
years of age (Lamb & Gregory, 1993). Similarly, Standley and Hughes (1997) found that
children in pre-kindergarten classes (four to five years old) who took music lessons
over a period of two months showed improved pre-reading and pre-writing abilities,
compared to children without music lessons. A study on the relationship between music
and reading showed that a group of students that received Kodaly52 music instruction
52 Zoltan Kodaly was a Hungarian composer and ethnomusicologist. He developed a way of educating young children through singing of the native mother tongue folk songs. The Kodaly Method promotes the learning of music in a series of concepts then applies a sequential learning process for teaching music that follows the natural developmental pattern used in learning a language: i. e., aural, written, and then
scored significantly higher in a reading ability test than a control group that did not
receive music training (Hurwitz, Wolff, Bortnick, & Kokas, 1975). However, it was not
clear whether the enhancement of reading ability was caused by the music training
itself or simply by the more varied school program; the question remained unanswered
whether the children would have improved in reading had they been given a different
kind of special instruction. It also remains unclear how music training could facilitate
reading, since the music training group did not learn to read music.
The effect of music training on mathematical ability has been investigated in several
studies, reviewed here. Mathematical ability in five- to seven-year-old children
was observed, and those children who participated in the arts were found to have better
mathematical skills than children who did not (Gardiner, Fox, Knowles, &
Jeffrey, 1996). Another study investigated possible transfer effects of keyboard lessons
in children of six to eight years of age; the results show improved mathematical skills
for the children who took keyboard lessons, further evidence that musical skill
can influence mathematical ability (Graziano, Peterson, & Shaw, 1999). In a
more complex study, Graziano et al. (1999) compared
the proportional reasoning scores of children of seven to nine years. The study
included one group who received computer-generated spatial-temporal training alone
and another group who received the same spatial-temporal training and piano lessons.
Although both groups scored higher than a control group, the group that included piano
training scored significantly higher than the group that did not. A more recent study
(Rauscher & LeMieux, 2003) found that children who received two years of individual
keyboard lessons performed better on a standardised arithmetic test than children in
control groups. Children who received singing instruction also scored higher than
controls. Children who received instruction on rhythm instruments performed best on a
mathematical reasoning task. A meta-analysis combining six experimental studies
provides tentative support for the notion that music training affects mathematical
achievement (Vaughn, 2000).
read. Rhythm symbols and syllables are utilized. Hand signals are used in order for the singer to visualize the pitches being sung and to understand tonal relationships.
The studies reviewed here provide suggestive evidence that formal music
training has positive effects on non-musical abilities, although the specific effects
vary considerably between studies and many of the studies have been criticised for
lack of reliability. Nevertheless, it can be concluded that music lessons can have
positive transfer effects on non-musical abilities and enhance performance on
unrelated tasks.
Music lessons combine different factors, such as practice, attention, concentration,
timing, ear training, sight-reading, and exposure to music. Schellenberg (2001) argues
that music lessons may be unique in combining all of these factors, and that this
combination may underlie positive transfer into non-musical areas such as
mathematics or spatial reasoning. He points out that music lessons might improve
general skills, such as attention to rapidly changing temporal features, which are
likely to transfer to other, non-musical domains (Schellenberg, 2001).
These data suggest that music may support cognitive abilities in other
disciplines, and the connection between music and spatial-temporal reasoning is
particularly convincing. However, it is not clear exactly what music instruction
contributes, and no longitudinal studies have been conducted to determine the
longevity of the effects.
4.5.2 The Effect of Music on Speech
The effect of music training and exposure to music on unrelated cognitive abilities
was reviewed in the previous section. This section focuses on the role of music in
speech processing, because the perception of music and speech involves shared
processes, such as melody recognition, contour processing, timbre discrimination,
rhythm processing, prediction, and perception of symbols in context. As music has
been shown to be related to other cognitive abilities (see section 4.5.1), it is interesting
to examine how music and speech processing are connected. The interest in the
relationship between music and speech abilities has a long tradition (Dexter &
Omwake, 1934) and the popular opinion that musical aptitude helps in language
learning has been investigated since the 1930s. In this section, the influence of music
training and music aptitude on pitch processing, word recognition, prosody perception,
and foreign language learning will be reviewed.
In a recent study about the influence of musical training on pitch processing in
language, Magne, Schoen, and Besson (2006) analysed 8-year-old children's behavioral
data and event-related potentials (ERPs) in a pitch perception task. Those children who
had a musical training background performed better at a pitch incongruity task in both
language and music. Together with the electrophysiological data, these results confirm
what had previously been found: there are positive transfer effects between the areas of
music and language and the development of prosodic and melodic processing is
influenced by musical experience (Magne et al., 2006).
Another line of research is concerned with the effect of music training on word
recognition, reading, and related abilities. McMahon (1979) trained young children to
discriminate three-note chords and found improved word recognition, reading, and
general phonic skills (McMahon, 1979). In a study by Douglas and Willatts (1994),
8-year-old children's literacy and musicality were assessed, and the results suggest
that pitch-discrimination ability, rather than rhythm-discrimination ability, predicts
literacy.
Different aspects of music and speech have been examined; the remainder of this
section will focus on the existing literature concerned with the processing of prosody
and melody. Prosody has linguistic and emotional functions and can be defined as
stress and intonation patterns in speech, measured as fundamental frequency, intensity,
duration, and spectral features. Thompson, Schellenberg, and Husain (2004)
investigated the emotional function of prosody and found that adults with musical
training were better than musically untrained adults at identifying emotions such as
sadness or fear. They also demonstrated that 6-year-old children tested after one year
of musical training were better than musically untrained children at identifying anger
or fear. It was concluded that music lessons heighten sensitivity to the emotions
conveyed by speech prosody (for a review see Slevc & Miyake, 2006).
4.6 Influence of Music on Foreign Language Sound Acquisition
The assumption that musically talented people are also better at learning foreign
languages is a very old one, yet it has rarely been tested systematically at the
segmental level; most studies have addressed grammar-related structural levels
(Blickenstaff, 1963; Gilleece, 2006). In this section, studies concerning the
relationship between music and foreign language sound learning are reviewed.
In an investigation of musical ability and second language learning, Fish (1984) found
that the results of the melodic variation subtest of the music aptitude test (Gordon,
1965) and scores on the Pimsleur sound discrimination test (Pimsleur, 1966) were
correlated; however, no correlation was found between music aptitude, sound
discrimination and imitation of short phrases in German ("MLA cooperative foreign
language test, German," 1964). Arellano and Draper (1972) found that pitch, intensity,
and timbre perception, tonal memory and Spanish articulation were correlated
(independently of IQ) in speakers of American English. Similar results were obtained in
a study by Westphal, Leutenegger, and Wagner (1969), who tested the relationship
between psychoacoustic factors and intellectual abilities and second language learning
achievement in American English speaking beginners of German. They observed a
significant correlation between scores on a music aptitude test (Seashore et al., 1939,
1960) and comprehension, reading and production of German sentences. In a recent
study, Gilleece (2006) investigated the relationship between music aptitude and second
language learning and found a significant relationship between music aptitude and
language aptitude, independent of general intelligence.
Another study that illustrates a remarkable link between musical experience and
pronunciation ability was conducted by Eterno (1961). His results show that 90% of
eighth graders who had a minimum of one year of musical experience scored above
average in a pronunciation test, while the remaining 10% scored average; not one
student who played a musical instrument scored below average.
In a number of experiments concerning the acquisition of new tonal contrasts,
Gottfried (2007) showed that musicians are more accurate than non-musicians at
identifying, discriminating, and imitating Mandarin tones.
Taken together, these data show that there is a strong relationship between music and
second language learning; musical training generally appears to assist in foreign
language (sound) acquisition (Tahta, Wood, & Loewenthal, 1981; Thogmartin, 1982).
The aim of the last experiment in the current series is to investigate the relationship
between musical training, musical pitch memory, music aptitude, language aptitude,
and the acquisition of foreign language speech sound perception and production, in
order to clarify which aspects of music assist language learning at the segmental
level.
Research findings in the area of categorical perception, lexical tone, and music
perception were presented in the previous chapters. In the next section, these studies
will be summarised, and the motivation behind the current series of experiments will be
explained. Following this chapter, Experiments 1, 2, and 3 are presented in Chapters
6, 7, and 8, respectively.
5.1 Categorical Perception of Artificial Tone Continua
In the second and third chapters, speech perception processes and lexical tone were
reviewed. Special attention was devoted to the phenomenon of categorical perception.
None of the existing studies of categorical perception of lexical tone tested more than
one tonal language. Thus, any outcomes of those studies cannot necessarily be
attributed to tonal languages in general, but only to the particular languages that were
tested. In this thesis, listeners from different tonal language backgrounds are tested, in
order to investigate whether the observed influence of tonal language experience on the
perception of speech and sine-wave tones is a universal phenomenon, or whether
unique perceptual differences exist for each tonal language. In Experiment 1, speakers
of three tonal languages (Vietnamese, Mandarin, and Thai) will be tested with artificial
synthesised tone continua. The tones are presented in speech and sine-wave contexts in
order to eliminate the effect of tonal content on speech perception. Non-tonal language
(Australian English) speakers' tone perception is also examined in order to compare
processes of speech perception in tonal and non-tonal language speakers.
Not all previous experiments on categorical perception of lexical tone have measured
identification and discrimination of tones. As discussed in Chapters 2 and 3, it is
necessary to test both identification and discrimination in order to draw conclusions
about the categoricality of perception. In Experiments 1 and 2, both components of the
categorical perception paradigm will be administered.
This thesis is also concerned with the processing of speech sounds compared to non-
speech sounds. Comparisons between the perception of speech and sine-wave tones in
speakers of tonal languages and non-tonal languages will focus attention on the
differences between speech and non-speech processing.
Most studies in categorical tone perception have only investigated whether tones are
perceived categorically or continuously. In the current Experiments (1 and 2), degrees
of categoricality, measured as subtle differences between listener groups, will be
analysed. In addition to the degree of categoricality, the relative location of category
boundaries on the synthesised tone continua will be investigated.
So far, only one study has used asymmetrical tone continua to investigate category
boundary differences between different language groups (S. Chan et al., 1975; W.
Wang, 1976). The present Experiments 1 and 2 will approach the issue of boundary
location on an asymmetric tone continuum, and thus examine the matter of perceptual
tone space in a more systematic way. Use of an asymmetrical continuum allows the
investigation of psychoacoustic vs. linguistic strategies in tone perception.
5.2 Influence of Musical Background on Tone Perception
In Chapters 3 and 4, pitch perception in speech and music was reviewed. The findings
indicate that both language and musical experience can influence perception of musical
pitch and lexical tone.
In Chapter 3, it was also shown that pitch perception is shaped by experience with
lexical tone, and in Chapter 4 that pitch perception is also influenced by experience
with musical tone. In general, musicians and tonal language speakers are more accurate
at perceiving pitch differences.
In Experiment 1, the role of musical ability and tonal language experience is
investigated in a preliminary fashion, ahead of a more detailed and analytic
investigation of these factors in Experiment 2.
5.3 Perception and Production of Tones, Vowels, and Consonants –
Influence of musical aptitude and language aptitude?
Musical ability has different constituents - experience, aptitude, and memory. In
Chapter 4 the influence of music exposure and music lessons (music experience) on
other cognitive abilities was discussed. The results show that music exposure and
training enhances performance in other non-musical areas. In Experiment 3, musicians'
and non-musicians' perception and production of speech sounds (tones, vowels, and
consonants) are investigated and related to the components of music ability, memory
and aptitude, via administration of a musical memory task and a music aptitude test.
CHAPTER 6
Categorical Perception of Speech and Sine-Wave Tones in
Tonal and Non-Tonal Language Speakers
Experiment 1 concerns categorical perception of speech and sine-wave tones in
speakers of tonal and non-tonal languages. In this introduction past research and a
number of methodological issues are considered ahead of the presentation of
Experiment 1.
6.1 Background: Research on the Categorical Perception of Tone
As indicated in Chapter 3, research regarding categorical perception of lexical tone is
confounded by contradictory findings, differences in participants' tonal language
background, varying stimuli, differential methods and data analysis across studies,
and inconsistent operationalisation of categorical perception. These issues and
differences have made it difficult to make comparisons and to provide a
comprehensive account of lexical tone categorisation. Experiment 1 addresses these
issues by providing a common context for comparison between speakers from a non-
tone language background (Australian-English) and a variety of tone language
backgrounds (Thai, Mandarin, and Vietnamese), as well as a measure of
categorisation that encompasses both tone identification and discrimination. In doing
so, a clearer indication of whether perceptual effects are language-dependent or
language-universal can be more confidently made.
Experiment 1 is based on previous research on categorical perception of lexical tone
(see section 3.7). Most recently, Burnham and Jones (2002) found that language
background only impacted upon categorical perception of tones presented as speech,
and not upon any of the non-speech representations they studied (F0 variations in
filtered speech, violin notes, or sine-waves). Specifically, there was better categorical
identification of tone by tonal language (Thai) than non-tonal language (Australian
English) speakers, but only on speech syllables. The relative degree of categoricality
between perceivers with Thai and Australian-English backgrounds did not differ
across the three non-speech tone types. While this finding is superficially inconsistent
with earlier studies of Thai listeners in which tones in speech were perceived rather
continuously (Abramson, 1961; 1977), it should be noted that categorical speech
perception was only investigated in identification by Burnham and Jones (2002).
Together these results suggest that, not only does categorical tone identification
depend on a tone language background, but also that tone in speech is perceived more
categorically than non-speech tone.
In this experiment Thai, Mandarin, Vietnamese, and Australian English speakers will
be tested with speech and sine-wave tones in both a categorical identification and a
categorical discrimination task. Thus this study (a) increases the number of tonal
languages considered, and (b) expands the measures used to assess categorical
perception of tone.
With regard to the increase in the number of tonal languages investigated, this is
necessary because previous categorical perception results for tonal language speakers
are inconsistent. With respect to pitch movement (dynamic vs. static), static tones
tend to be perceived continuously (Thai: Abramson, 1979; Cantonese: Francis et al.,
2003), whereas dynamic tones are perceived categorically (Mandarin: Wang, 1976;
Cantonese: Francis et al., 2003). Moreover, because of the
differences in stimuli and methods, data are not comparable across studies
investigating Thai and other tonal languages. Wang (1976) observed categorical
perception of Mandarin contour tones, with the boundaries being different in
American English vs. Mandarin listeners, indicating that the reference point that is
used to divide the continuum perceptually depends on language background.
Based on these results and those of Burnham and Jones (2002), Thai and Mandarin
listeners' perception of tones will be investigated in addition to that of Australian
English listeners. Furthermore a tone language as yet unexamined in relation to tone
categorisation, Vietnamese, is also examined in order to expand the range of different
tonal language backgrounds investigated. On this basis, it may be possible to
determine whether categorical perception of tone is general across this subset of tonal
languages.
With regard to the measures used to assess categorical perception, important issues
are pointed out in the following section.
6.2 Methodological Issues
In order to provide a reliable and valid methodological basis for Experiment 1,
deliberations on several methodological concerns are presented below.
6.2.1 Stimulus Type Presentation: Blocked vs. Mixed
The method of stimulus presentation is of concern in relation to measures of
categorical perception. Previous research has suggested that perceptual processes are
affected by the context surrounding the presentation of sounds (see section 2.5.1.2).
Burnham and Jones (2002) presented speech and non-speech tones in separate blocks
in their experiment. Indeed it may be the case that blocked presentation of stimulus
types encourages different modes of perception in particular blocks, or that a "speech"
mode of perception is transferred from a block of speech stimuli to a subsequent block
of non-speech stimuli. (This methodological point bears, to some extent, on the issue
of whether speech and non-speech perception are modular.) Even though the order of
blocks was randomised by Burnham and Jones (2002), only three quarters of
presentations (all except those in which the speech block was presented last) would
not have been prone to this possible order effect. To assess this potential extraneous
influence, the current study presents listeners with the speech and non-speech (sine-
wave) tones in a mixed condition (speech and sine-wave trials presented randomly
within the same blocks of the task), in addition to the blocked presentation that was
used by Burnham and Jones (2002).
6.2.2 Categorical Perception: Identification and Discrimination
Burnham and Jones (2002) were only interested in identification of tones from a
continuum; however, categorical perception is best gauged by a combination of
identification and discrimination results. The current study will test for both
categorical identification and discrimination. This additional task requires that further
considerations be made in the design (choice of interstimulus interval) and that a
refined operationalisation of categorical perception be made. These issues are
addressed below.
6.2.2.1 Interstimulus Interval in Discrimination Tasks
In relation to discrimination tasks, it was shown in Chapter 2 (section 2.4.3.2) that the
interval between two sounds to be discriminated, the interstimulus interval (ISI), can
have an effect on categoricality of perception. Based on previous investigations on
different ISIs it was proposed in 2.4.3.2 that shorter intervals between stimuli to be
discriminated result in an acoustic mode of perception and thus less categorical
perception, whereas longer intervals engage a phonemic mode of perception and thus
more categorical processing (Werker & Logan, 1985; Werker & Tees, 1984). That is,
short ISIs favour acoustic (rather continuous) perception, whereas longer ISIs favour
phonemic (more categorical) perception. Therefore, two different ISIs (500 ms
and 1500 ms) were employed here in order to test for possible effects of stimulus
separation.
6.2.2.2 Refined Operationalisation of Categorical Perception of Tone
In a study of the categorical perception of tone, Wang (1976) found that Mandarin
and American English listeners employed different points to perceptually divide an
asymmetric continuum. The category boundary for American listeners was found to
be psychophysically based, located close to a level tone, whereas for Mandarin
listeners, the boundary was closer to the middle of the continuum, presumably due to
an influence of their language background.
Based on these results (W. Wang, 1976), two alternative response strategies were
deemed possible in the current experiment. These are schematically presented in
Figure 6.1 and Figure 6.2, and described below for the asymmetric continuum used
here in Experiment 1. This continuum consisted of tones with a consistent onset F0
(200 Hz) but varying offset F0s from 160 Hz (falling tone) to 220 Hz (rising tone) in
10 Hz steps (see Figure 6.3). This asymmetrical continuum was used because it was
different from the tonal systems of the three tonal languages investigated here.
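To make the stimulus description concrete, the sketch below shows how such a sine-wave continuum could be synthesised. Only the onset (200 Hz), the 160-220 Hz offsets, and the 10 Hz step are taken from the text; the linear F0 glide, 300 ms duration, and 16 kHz sample rate are assumptions for illustration, not the thesis's actual synthesis parameters.

```python
import math

def sine_tone(onset_hz, offset_hz, dur_s=0.3, sr=16000):
    """Synthesise one sine-wave tone whose F0 glides linearly from
    onset_hz to offset_hz (duration and sample rate are assumptions)."""
    n = int(dur_s * sr)
    samples, phase = [], 0.0
    for i in range(n):
        # Instantaneous F0 interpolated between onset and offset
        f = onset_hz + (offset_hz - onset_hz) * i / (n - 1)
        # Accumulate phase so the frequency glide is continuous
        phase += 2 * math.pi * f / sr
        samples.append(math.sin(phase))
    return samples

# The asymmetric continuum: fixed 200 Hz onset, offsets 160-220 Hz in 10 Hz steps
continuum = {offset: sine_tone(200, offset) for offset in range(160, 230, 10)}
```

This yields seven stimuli; the 200 Hz offset member is the flat no-contour tone, and the continuum is asymmetric because four offsets fall below the onset and only two above.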
Mid-Continuum Response Strategy: In the mid-continuum strategy, it might be
expected that the category identification boundary and discrimination peak for a
synthetic asymmetric continuum would be midway between the endpoints of the
continuum. If so then (i) the identification boundary should lie in the middle of the
continuum (190 Hz offset), and (ii) stimuli in the middle of the continuum should be
more discriminable than those at the ends of the continuum (see Figure 6.1).
Figure 6.1. Mid-Continuum Response Strategy for a synthetic continuum with onset = 200 Hz and offset = 160-220 Hz in 10 Hz steps (x-axis: offset value in Hz for the 200 Hz onset tone; y-axis: response accuracy in %; curves: identification and discrimination). The 190 Hz offset stimulus represents the middle of the continuum, whereas the 200 Hz offset stimulus is a flat no-contour tone.
Figure 6.2. Flat-Anchor Response Strategy for a synthetic continuum with onset = 200 Hz and offset = 160-220 Hz in 10 Hz steps (x-axis: offset value in Hz for the 200 Hz onset tone; y-axis: response accuracy in %; curves: identification and discrimination). The 190 Hz offset stimulus represents the middle of the continuum, whereas the 200 Hz offset stimulus is a flat no-contour tone.
Flat-Anchor Response Strategy: In the Flat-Anchor Response strategy, it might be
expected that the category boundary and discrimination peak for this synthetic
asymmetric continuum would be near the flat no-contour tone (200 Hz offset). If so
then (i) the identification boundary should lie on the flat 200 Hz stimulus, and (ii) the
stimuli around the 200 Hz stimulus should be more discriminable than those at the
ends of the continuum (see Figure 6.2).
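The two strategies make distinct quantitative predictions about where the 50% identification crossover falls: near the 190 Hz offset (mid-continuum) or near the flat 200 Hz offset (flat-anchor). A minimal sketch of how such a boundary can be located from identification data; the response proportions below are hypothetical, invented only to illustrate the computation.

```python
def boundary_50(offsets, proportions):
    """Locate the 50% crossover of an identification function by linear
    interpolation between the two stimuli straddling a proportion of 0.5."""
    pairs = list(zip(offsets, proportions))
    for (x1, p1), (x2, p2) in zip(pairs, pairs[1:]):
        if p1 < 0.5 <= p2:
            # Interpolate linearly between the straddling stimuli
            return x1 + (0.5 - p1) * (x2 - x1) / (p2 - p1)
    raise ValueError("identification function never crosses 50%")

# Hypothetical proportions of "rising" responses across the 160-220 Hz continuum
offsets = [160, 170, 180, 190, 200, 210, 220]
p_rising = [0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98]
boundary = boundary_50(offsets, p_rising)  # → 190.0
```

With these illustrative data the crossover falls at 190 Hz, the mid-continuum prediction; a crossover near 200 Hz would instead support the flat-anchor strategy.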
6.2.2.3 Non-Speech Stimulus Materials
One final methodological consideration concerns the choice of stimulus types. While
Burnham and Jones (2002) used filtered speech, sine-wave tones, and musical
equivalents as non-speech stimuli, here only sine-wave stimuli will be used to
represent non-speech. A reduction in non-speech types here is reasonable because
Burnham and Jones (2002) found that type of non-speech tone did not influence the
results. The sine-wave option is preferable to the violin or filtered speech types in that
it provides a more neutral context compared to the musical context suggested by the
violin, and is more normal sounding than filtered speech, which can sometimes seem
muffled and/or obscured.
6.3 Hypotheses
Hypotheses for language background, tone type, presentation type, and interstimulus
interval (ISI) were entertained. It was hypothesised that perception should be more
categorical (i) for tonal-language than for non-tonal-language speakers; (ii) for speech than for sine-wave continua; and (iii) within the speech stimuli, in blocked than in mixed presentations, because speech/non-speech juxtaposition might result in
interaction of processing modes (i.e., it was thought that speech and non-speech
processing are not entirely independent in tone perception). With regard to the ISI, a
final hypothesis was entertained, that: (iv) stimulus-pairs with a greater temporal
separation (1500 ms) will be discriminated more categorically than those with a
shorter ISI (500 ms).
6.4 Experimental Design
A 4 x 2 x 2 x (2) design was employed. The between-group factors were participants'
native language (Thai, Vietnamese, Mandarin, Australian English), presentation
manner (mixed or blocked), and interstimulus interval (500 ms or 1500 ms temporal separation between the two sounds in discrimination); the within-group factor was tone type (speech or sine-wave). The order of tasks (identification and
discrimination), the order of tone types in the blocked condition (speech and sine-
wave tones) as well as order of stimulus in discrimination (higher offset stimulus first
vs. lower offset stimulus first) were counterbalanced between participants and are not
considered in the analysis.
For the identification task measurements were taken for each stimulus on the
continuum; and for the discrimination task measurements were taken for each
contiguous pair of stimuli. Dependent variables in each task are considered in detail
in subsections 6.4.3.1 and 6.4.3.2.
6.4.1 Stimuli
Two synthetic continua were created, one speech and one sine-wave, each with identical
F0 contours. The tonal contours of the continua are shown in Figure 6.3. The tones
have a fixed onset of 200 Hz and an offset varying from 160 Hz (falling) to 220 Hz
(rising) in 10 Hz intervals. Thus each continuum is asymmetrical and consists of
seven steps. The stimulus with the 190 Hz offset marks the middle of the continuum,
and the 200 Hz offset stimulus is a flat no-contour stimulus. F0 movement in the
continuum is linear in order to avoid resemblance to the actual tones of the specific
languages whose speakers were tested. The syllable /wa/, recorded from a female
native Thai speaker, was used as the speech sound carrier because this combination of
sounds exists in most of the languages tested53. Variations in tone type matching, and
F0 were achieved by resynthesising speech and sine-wave tones to the same F0 and
duration (495 ms) specifications with the STRAIGHTv30kr16 software (Kawahara,
Katayose, de Cheveigne, & Patterson, 1999). Sine-wave resynthesis was conducted
with the MARCS Auditory Perceptual Toolbox (APT) (Stainsby, Haszard Morris,
Malloch, & Burnham, 2002).
53 The syllable /wa/ in Mandarin has different meanings; wa1 means frog, wa2 means baby or doll, wa3 means tile, and wa4 means socks or stockings. In Thai, wa3 (short vowel) is a question particle, waa0 (long vowel) is a Thai measurement for distance and area, and waa3 (long vowel) is the name of a hill tribe from Myanmar. The sound combination /wa/ is possible but does not have a meaning in Vietnamese.
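For illustration, the sine-wave versions of such a continuum can be generated directly, as sketched below. This is not the STRAIGHT/APT resynthesis procedure actually used in the thesis; the sample rate is an assumption, and the sketch simply shows how a linear F0 glide from the fixed 200 Hz onset to each offset can be realised as instantaneous phase (the integral of the frequency trajectory).

```python
import numpy as np

SR = 44100   # assumed sample rate; not specified in the thesis
DUR = 0.495  # stimulus duration from the thesis (495 ms)

def linear_glide(onset_hz, offset_hz, dur=DUR, sr=SR):
    """Sine-wave tone whose frequency moves linearly from onset_hz to
    offset_hz; the instantaneous phase is the (discrete) integral of
    the frequency trajectory."""
    n = int(dur * sr)
    t = np.arange(n) / sr
    freq = onset_hz + (offset_hz - onset_hz) * t / dur  # linear F0 movement
    phase = 2 * np.pi * np.cumsum(freq) / sr
    return np.sin(phase)

# Seven-step asymmetric continuum: fixed 200 Hz onset, offsets 160-220 Hz.
continuum = {offset: linear_glide(200, offset) for offset in range(160, 221, 10)}
```

The 190 Hz offset tone is the arithmetic midpoint of this set, while the 200 Hz offset tone is the only member with no F0 movement at all, which is what makes the mid vs. flat perceptual-anchor question meaningful.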
Figure 6.3. F0 characteristics of the asymmetrical tone continuum.
6.4.2 Participants
In all the experiments including this one, participants gave their informed consent (see
Appendix A6.1) and the experiment was covered by Ethics Approval of the
University of Western Sydney (HREC 01/163). A total of 64 participants were tested: 16 native Thai speakers (9 female, 7 male; average age: 23.6 years), 16 native Mandarin speakers (9 female, 7 male; average age: 25.6 years), 16 native Vietnamese speakers (5 female, 11 male; average age: 23.6 years), and 16 native Australian English speakers (11 female, 5 male; average age: 26 years). All were students at the
University of New South Wales, Sydney (mean age 24.7 years, range 18-31), who
were reimbursed for their travel expenses. A further three participants (one in the
Australian English group, the other two in the Mandarin listener group) began the
experiment, but failed to complete the task. Their data are not included in the analysis.
6.4.3 Procedure
Participants were tested individually in a single session, in a sound-attenuated testing
cubicle in the Department of Psychology at the University of New South Wales,
Sydney. Stimuli were presented on a laptop computer (Compaq Evo N1000c) over
headphones (KOSS UR20) at a self-adjustable listening level in the DMDX (Forster
& Forster, 2003) experimental environment.
6.4.3.1 Identification
In the identification task participants were provided with two labeled (RIGHT and
LEFT) keyboard keys and instructed to “press the RIGHT (LEFT) key for one kind of
sound and the LEFT (RIGHT) key for the other kind of sound”. Responses timed out
after 4000 ms, and timed out trials were not replaced. (An example DMDX script can
be found in Appendix A6.2)
For all listeners, there were two sets of trials, each consisting of a practice phase, a
training phase, and a test phase. For subjects in the Blocked condition, speech and
sine-wave stimuli were presented separately in the two trial sets of the experiment
(counterbalanced order of speech and sine-wave trial sets between listeners); and for
listeners in the Mixed condition, there were simply two sets of mixed (speech and sine-wave randomised) stimuli.
In the practice phase eight items were presented, four of each of the relevant endpoint
stimuli (the 160 Hz and the 220 Hz offset stimuli). Following the practice phase, in
the training phase, a criterion of 8 consecutive correct responses was required in each
trial set: for participants in the Blocked condition this entailed reaching criterion for
the speech set and for the sine-wave set; and for Mixed condition participants this
entailed reaching criterion for each set of mixed stimuli to equate with the procedure
in the Blocked condition and to ensure listeners remembered the task over time. Each
training phase consisted of the endpoints of the continuum and continued until a
criterion of eight consecutive correct responses was reached (criterion results will be
discussed in section 6.6.1.1).
Following criterion in the training phase the test phase was presented. The test phase
consisted of 8 repetitions of each continuum step presented in random order. Given
seven steps on each continuum, participants were required to identify a total of 112
items in the test phase. In total the identification task took 15 to 20 minutes,
depending on the individual participant's pace.
Data treatment:
For each listener, two crossover values (in Hz) and two d' values were computed, one for each tone type, speech and sine-wave. The crossover value is the point on the
continuum above and below which there is the same number of responses for each
category. The d' value measures the steepness of the slope at the crossover, and serves
to indicate the degree of categorical perception. In general, the steeper the
identification curves at crossover, the more categorical the perception. Crossover values were computed by running a logistic regression for each listener and taking -(constant/slope) as the 50% boundary (because log(0.5/0.5) = 0 = constant + slope × crossover, so crossover = -constant/slope). This value was then converted to Hz, within the 160 Hz to 220 Hz range of offset values of the stimuli on the tone continuum. From the identification
responses d', measuring the steepness of the identification at the category boundary,
was computed for the perceptual distance between the stimuli spanning the 50%
crossover. First the proportion of responses for one of the two categories for each
stimulus was converted to a z-score, then for each stimulus-pair, the z-score for the
smaller proportion was subtracted from the z-score from the larger proportion to
derive the d'. To avoid inflated d' estimates, response category proportions of 0 and 1
were converted to 0.005 and 0.995 (1/(2N) and 1 - 1/(2N); Macmillan & Creelman, 1991). For four of the 64 participants (one in the Mandarin and three in the Australian
group), it was not possible to identify a unique 50% crossover. For these four
participants the missing values were replaced by means of the respective groups,
separately for sine-wave and speech.
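The two identification measures described above can be sketched in code. This is a minimal illustration with hypothetical response proportions (the thesis's actual analyses were run in statistical software); the function names are mine, and the logistic fit here is a simple least-squares fit to the logits rather than a full maximum-likelihood logistic regression.

```python
import numpy as np
from statistics import NormalDist

Z = NormalDist().inv_cdf  # z-transformation (inverse normal CDF)

def crossover_hz(offsets, props):
    """50% crossover of an identification function, from a linear fit
    to the logits: logit(p) = constant + slope * offset, so the
    boundary lies where constant + slope * crossover = 0, i.e.
    crossover = -constant / slope."""
    p = np.clip(np.asarray(props, dtype=float), 0.005, 0.995)
    slope, constant = np.polyfit(offsets, np.log(p / (1 - p)), 1)
    return -constant / slope

def boundary_dprime(offsets, props):
    """Identification d' for the stimulus-pair spanning the 50%
    crossover: the difference of the z-transformed response
    proportions, with proportions of 0 and 1 replaced by 0.005 and
    0.995 (i.e., 1/(2N) and 1 - 1/(2N); Macmillan & Creelman, 1991)."""
    p = np.clip(np.asarray(props, dtype=float), 0.005, 0.995)
    for i in range(len(p) - 1):
        if (p[i] - 0.5) * (p[i + 1] - 0.5) <= 0:  # pair straddling 50%
            return abs(Z(float(p[i])) - Z(float(p[i + 1])))
    return None  # no unique crossover (replaced by group means in the thesis)

offsets = [160, 170, 180, 190, 200, 210, 220]
props = [1.0, 0.98, 0.90, 0.55, 0.10, 0.02, 0.0]  # hypothetical "falling" proportions
```

With these hypothetical proportions the crossover falls just above 190 Hz, and the boundary d' is taken from the 190-200 Hz pair, the pair whose proportions straddle 50%.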
6.4.3.2 Discrimination
In the discrimination task the participants listened to stimulus-pairs and were
instructed to “press the LEFT (RIGHT) key if they are the same sound, and press the
RIGHT (LEFT) key if they are different sounds”. (An example DMDX script can be
found in Appendix A6.3). Responses timed out after 1500 ms (because of the longer trial duration here in discrimination, the time-out duration was shorter than in the identification task). Omitted trials were not replaced. The task was separated into two
trial sets, punctuated by a break. As in the identification task, there were two manners
of presentation: Blocked and Mixed. In the Blocked presentation, participants listened
to a block of speech stimuli followed by the sine-wave block (or vice versa); in the
Mixed mode the stimuli were presented randomly (speech and sine-wave in the same
blocks). A roving AX paradigm was used, measuring discrimination accuracy along
the whole continuum. Neighbouring stimuli were presented pair-wise. Half of the
participants in each condition listened to stimulus-pairs that were separated by a 1500
ms interval, whereas the other half listened to sounds that were separated by a 500 ms
ISI.
Each trial set consisted of a block of eight practice trials and a block of 192 test trials.
The eight practice trials consisted of four different and four same pairs that were
presented with feedback for correct/incorrect responses. In test trials there were four
repetitions of each of the four possible combinations (AA, BB, AB, BA) of each of
the six possible stimulus-pairs for each of the two tone types (speech and sine-wave).
This summed to a total of 192 stimulus-pairs. This task took 20 to 25 minutes,
depending on the participant's pace and the ISI (around 20 min for 500 ms ISI,
around 24 min for 1500 ms ISI).
Data treatment:
Discrimination performance was measured by d', calculated according to the models
for discrimination tasks in Kaplan, Macmillan and Creelman (1978). The numbers of hits H (different stimuli correctly judged as different) and false alarms F (same stimuli incorrectly perceived as different) were calculated for each stimulus-pair for each tone type. The larger the difference between hits and false alarms, the better the listener's sensitivity to the differences in tone contours. The statistic d' is a measure of the difference between hits and false alarms. The number, however, is not simply H - F, but rather the
difference between the z-transformations54 of these two rates, where H and F are
forcibly limited to the 0.01 to 0.99 range, which means that a maximum d'-value of
8.715, and a minimum d' value of –8.715 would be obtained. This transformation was
conducted using DPrime Plus, a program available online (C. D. Creelman &
Macmillan, 1996).
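The sensitivity computation can be sketched as follows. This is a minimal illustration of one common differencing model for roving same-different designs (the listener responds "different" when the absolute difference between the two observations exceeds a criterion); the exact same-different tables implemented by DPrime Plus, and hence the ±8.715 bound quoted above, come from a related but not identical model, so values from this sketch will not match DPrime Plus exactly. The function name and example rates are mine.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()

def same_different_dprime(hits, false_alarms):
    """d' for a roving same-different (AX) task under a differencing
    model: respond "different" when |X2 - X1| > k, with same-pair
    differences ~ N(0, 2) and different-pair differences ~ N(+/-d', 2).
    H and F are limited to the 0.01-0.99 range, as in the thesis."""
    h = min(max(hits, 0.01), 0.99)
    f = min(max(false_alarms, 0.01), 0.99)
    # Criterion k follows from the false-alarm rate: F = 2 * Phi(-k / sqrt(2)).
    k = sqrt(2.0) * ND.inv_cdf(1.0 - f / 2.0)

    def hit_rate(d):  # predicted H for sensitivity d at criterion k
        return (1.0 - ND.cdf((k - d) / sqrt(2.0))
                + ND.cdf((-k - d) / sqrt(2.0)))

    if h <= f:          # at or below chance: no measurable sensitivity
        return 0.0
    lo, hi = 0.0, 20.0  # bisect for the d' that reproduces the observed H
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if hit_rate(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As in the text, the further the hit rate exceeds the false-alarm rate, the larger the recovered d'; equal hit and false-alarm rates yield d' = 0.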
6.5 Analyses
6.5.1 Test Assumptions
For all following analyses, α was set at .05. Unless otherwise mentioned, test
assumptions were found to be satisfactory. Outliers and how they were dealt with will
be mentioned in the results sections. Post hoc comparisons were conducted using the
Bonferroni correction for multiple tests.
54 In the z-transformation H and F are converted into z-scores (standard-deviation units).
6.5.2 Language Group Hypotheses
As there were a priori reasons to expect differences in categorical perception between
tonal (Vietnamese, Thai, and Mandarin) and non-tonal (Australian English) native
speakers, the analyses of variance (ANOVAs) included a planned contrast to test this
difference (see Table 6.1 - Tonal vs. Non-Tonal).
Table 6.1.
Description of the Language Hypotheses:
The Tonal vs. Non-Tonal Test investigates differences between the tonal language
speakers and Australian English speakers; Residual A tests Thai against Vietnamese
and Mandarin; Residual B tests Vietnamese against Mandarin.
Contrast              Vietnamese  Mandarin  Thai  Australian English   df
Tonal vs. Non-Tonal        1          1       1          -3             1
Residual              omnibus test for differences between the
                      Vietnamese, Mandarin, and Thai listeners          2
There were no a priori grounds on which hypotheses about differences between the
three different tone language speaker groups could be based. Therefore the remaining
degrees of freedom (2) and variance for the language background levels (3) after the
above planned contrast, was tested via an omnibus F-test for the Mandarin, Thai, and
Vietnamese groups (see Table 6.1, Residual). Thus the 4-level language background
with 3 degrees of freedom was partitioned such that a df = 1 planned contrast for tone
vs. non-tone was tested, along with a df = 2 omnibus test for differences within the
tone language groups.
6.5.3 Strategy Type Hypotheses
Apart from differences between the language groups, differences between stimulus-
pairs, across the continua were also analysed, according to the planned contrasts
presented in Table 6.2.
Table 6.2.
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task
Stimulus-pair 160-170 170-180 180-190 190-200 200-210 210-220
Mid Hypothesis 1 -1 -1 2 2 -1 -1
Mid Hypothesis 2 0 0 1 -1 0 0
Flat Hypothesis 1 -1 -1 -1 2 2 -1
Flat Hypothesis 2 0 0 0 1 -1 0
Mid Hypothesis 1 tests whether the two stimulus-pairs in the middle of the continuum (180-190 and 190-200) are more discriminable than the other stimulus-pairs. Mid Hypothesis 2 investigates which of the two stimulus-pairs in the middle, if either, is better discriminated. Flat Hypothesis 1 examines whether there is a difference in discrimination performance between the two flat pairs (190-200 and 200-210) and the rest of the stimulus-pairs, and Flat Hypothesis 2 tests whether there are differences between the two flat stimulus-pairs.
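Applying a planned contrast of this kind reduces to a weighted sum of condition means, with weights that sum to zero; a positive value for Mid Hypothesis 1, for example, indicates better discrimination of the mid pairs. A small sketch (the d' values are hypothetical; the actual contrasts were tested within the ANOVA):

```python
def contrast_value(weights, means):
    """Value of a planned contrast: the weighted sum of condition means,
    with contrast weights that sum to zero."""
    assert sum(weights) == 0, "contrast weights must sum to zero"
    return sum(w * m for w, m in zip(weights, means))

# Weights from Table 6.2 applied to hypothetical mean d' values for the
# six stimulus-pairs (160-170, 170-180, 180-190, 190-200, 200-210, 210-220).
pair_dprimes = [2.0, 2.2, 3.3, 3.2, 2.4, 2.1]
mid_1 = contrast_value([-1, -1, 2, 2, -1, -1], pair_dprimes)   # Mid Hypothesis 1
flat_1 = contrast_value([-1, -1, -1, 2, 2, -1], pair_dprimes)  # Flat Hypothesis 1
```

The significance of such a contrast is then evaluated against its error term within the ANOVA, which the sketch does not attempt to reproduce.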
6.6 Results
Before considering the results, a qualitative evaluation of identification and
discrimination graphs will be presented in 6.6.1. The results of the identification task
are presented in section 6.6.2, and of the discrimination task in section 6.6.3.
6.6.1 Qualitative Evaluation
Identification and discrimination functions are shown for each language group for
speech and sine-waves in Figure 6.4. In identification, the location of the category
boundary is found below the middle (190 Hz offset) of the continuum for Mandarin
and Vietnamese listeners, but above the middle and towards the acoustically flat (200
Hz offset) stimulus for Australian English and Thai listeners. Thus it appears that
some tonal language speakers (Vietnamese and Mandarin) perceptually halve the
continuum, whereas the non-tonal English language speakers and the Thai speakers
use an acoustically salient point on the continuum (the flat 200 Hz stimulus) to divide
the continuum into stimuli above and below this reference point. It can be seen in
Figure 6.4 that the same perceptual anchors are used in discrimination –
discrimination peaks tend to occur near the sharpest identification slopes. This
correspondence indicates the same perceptual strategies were used in identification
and discrimination.
Figure 6.4. Identification (black lines) and discrimination (grey lines) results across languages (Vietnamese, Mandarin, English, and Thai; sine and speech conditions). X-axis: stimulus offset (in Hz) for the 200 Hz onset tone; identification responses in %; discrimination accuracy in d' (minimum: -8.715, maximum: 8.715); the mid and flat points are marked on each panel.
6.6.2 Identification Results
The identification task began with practice trials, followed by a training session to
reach criterion, and then a set of test trials. Results for the practice trials were not
analysed. The results of the training session are presented in section 6.6.2.1, followed
by results of the test trials (sections 6.6.2.2 and 6.6.2.3).
6.6.2.1 Trials to Criterion in Identification Training
In the identification training session, participants were required to identify eight
stimuli correctly in succession, in order to proceed to the main part of the experiment.
Raw data are presented in Appendix A6.4 and statistics outputs are presented in
Appendix A6.5. Descriptive statistics (means and standard error bars) for trials to
criterion are presented in Figure 6.5 for blocked conditions and Figure 6.6 for mixed
conditions. The number of trials required to reach criterion is analysed here. In the
blocked condition, trials to criterion for the two blocks of trials, speech and sine-wave
tones, were analysed irrespective of order of presentation (which was counterbalanced
across participants); in the mixed condition, this was not possible, so the trials to
criterion score for Set 1 (the first of the two sets completed) were compared to trials to
criterion for Set 2 in order to examine any possible practice or learning effects.
Blocked vs. Mixed Condition:
Before considering the blocked and mixed conditions separately, a comparison of trials to criterion in the mixed vs. the blocked conditions, collapsed over language groups, via an independent-samples t-test revealed that more trials to criterion were required in the mixed (M = 14.27, sd = 10.27) than in the blocked (M = 9.40, sd = 4.02) condition, t(126) = -3.526, p < .01.
Blocked condition:
Mean trials required to reach criterion in the blocked condition for the four language
groups (Thai, Vietnamese, Mandarin, and Australian English) and two tone types
(speech and sine-wave) are shown in Figure 6.5 (for raw data and statistics outputs see
Appendix A6.4 and A6.5). A 4 (language groups) x (2 stimulus types) analysis of
variance (ANOVA) with repeated measures on the second factor revealed no
significant difference between the speech and the sine-wave trials to criterion (F (1,
28) = .096, p > 0.05), and no significant effect for tone language background vs. non-
tone language background (F (1, 28) = 1.678, p > 0.05). There was however a
significant interaction of tone language groups and tone type (F (1, 28) = 4.443, p <
.05) and further post-hoc tests revealed that Thai listeners required more trials to reach
criterion in the sine-wave task, but the Vietnamese listeners required more trials in the
speech task.
Mixed condition:
Three outliers were found55 (z > 3.29) and changed to one unit larger than the next
extreme score, as suggested by Tabachnick and Fidell (2001). The descriptive statistics
for the mixed condition are shown in Figure 6.6. The analysis shows that there were
significantly more trials to criterion required in the first set (M = 19.59) than in the
second (M = 8.94) set (F (1, 28) = 27.337, p < 0.05). No significant differences were
found between tone and non-tone language speakers (F (1,28) =1.62, p > .05) or
between speakers of the different tonal languages (F (2, 28) = .055, p > .05), nor were
there any interactions.
Figure 6.5. Descriptive statistics for trials to criterion scores across languages (sine vs. speech) in the blocked condition (error bars represent the standard errors of the mean).
55 There was one outlier in the Thai group (66 trials for sine-wave criterion in the mixed condition), one in the Vietnamese group (84 trials in the mixed condition – sine-wave), and one outlier in the Mandarin group (also mixed condition sine-wave – 56 trials).
Figure 6.6. Descriptive statistics for trials to criterion scores across languages in the two parts of the mixed condition.
Together these results show that under most circumstances, participants learnt this
identification task quite easily in around nine trials, just one more than the minimum
number of eight. The only condition in which this was elevated (M = 19.59) was in the
first set of mixed trials (see Figure 6.6). This indicates that speech and sine-wave
stimuli were not treated as equivalent by the participants, for responding to one type
interfered with responding to the other type. Nevertheless, they were not treated
completely independently because by the second set of mixed trials, the mean trials to
criterion was around that for the blocked trials, indicating that mixed condition
participants could learn to ignore any sine-wave/speech similarities or differences in
this task that may have inhibited their performance.
6.6.2.2 Identification Test Trials: Crossover Values
Crossover values in Hz were calculated for each listener for each tone type.
Descriptive statistics for crossover values are shown in Table 6.3 and also Figure 6.4.
Raw data and analyses are presented in Appendix A6.6 and 6.7.
A 4 x 2 x (2) language x presentation mode x stimulus type ANOVA with repeated
measures on the last factor was conducted.
For tonal (Mandarin, Vietnamese, and Thai listeners) vs. non-tonal language speakers
(Australian English listeners) the difference was found to be significant, (F (1, 56) =
4.721, p < .05), with generally higher crossovers for non-tone language speakers.
Within the tone language speakers, the omnibus F (difference between groups) was
significant (F (2, 56) = 12.364, p < 0.05). Given the nature of the results (see Table
6.3) post-hoc comparisons were also conducted. These showed that the Thai listeners had higher crossovers than the Mandarin listeners (p < .001) and the Vietnamese listeners (p < .01), that Australian English listeners had higher crossovers than the Mandarin listeners (p < .001), and that there was no difference between Mandarin and Vietnamese listeners (p > .05).
There also was a significant main effect for stimulus type (F (1, 56) = 13.788, p <
.05), with the crossover being higher on the continuum for the sine-wave than for the
speech sounds (Mspeech = 188.5, sd = 8.28; Msine-wave = 192.1, sd = 7.90). The main
effect for presentation mode (mixed/blocked) was also significant (F (1, 56) = 5.264,
p < .05), with the blocked crossovers being higher than the mixed crossovers.
Other than these main effects, there were no two-way interactions between the three
factors, language background, presentation mode, and stimulus type, nor was the
three-way interaction significant.
Table 6.3
Descriptive Statistics for Crossover Values (measured in Hz) across Tone Types and
Languages (n = 16 in each group).
Group M (SD) blocked M (SD) mixed
speech sine-wave speech sine-wave
Thai 196.1 (6.01) 196.3 (7.21) 189.1 (10.27) 198.0 (8.97)
Vietnamese 190.9 (7.26) 192.0 (2.07) 184.3 (5.31) 187.0 (9.56)
Mandarin 183.7 (6.01) 188.3 (7.25) 179.8 (7.92) 187.3 (7.19)
Australian 192.9 (4.45) 195.6 (6.78) 190.9 (6.29) 192.8 (6.02)
Means 190.9 (5.93) 193.1 (5.83) 186.0 (7.70) 191.3 (7.94)
6.6.2.3 Identification d' Results
Mean identification slope, measured by d' at the crossover location, was calculated for each listener. Descriptive statistics for these d' values are shown in Figures 6.7 and 6.8. Raw data and analyses are presented in Appendix A6.8 and A6.9. A 4 x 2 x (2) language x presentation mode x stimulus type ANOVA with repeated measures on the last factor was conducted.
Identification slopes between tonal language speakers (Mandarin, Vietnamese, and
Thai listeners) and non-tonal language speakers (Australian English listeners) did not
differ significantly, F (1, 56) = .011, p > .05. Within tonal language groups there was
a significant omnibus effect (F (2, 56) = 5.654, p < .05). Post-hoc comparisons showed significantly steeper identification slopes for the Thai than for the Vietnamese listeners (p < .005), and for the Mandarin than for the Vietnamese listeners (p < .005). Differences between Thai and Mandarin listeners were not significant (p >
.05). The other two main effects, presentation mode and stimulus type, were not
significant (Ftone type (1, 56) = .490, p > .05; Fpresentation manner (1, 56) = 3.748, p > .05),
and none of the two-way or three-way interactions were significant. Thus the effects
here were for language background, and these results were consistent over
mixed/blocked, and speech/sine-wave conditions.
Figure 6.7. Mean d' identification scores across languages for speech and sine-wave stimuli in the blocked condition.
Figure 6.8. Mean d' identification scores across languages for speech and sine-wave stimuli in the mixed condition.
6.6.3 Discrimination Results
In the discrimination task there were practice trials followed by sets of test trials.
Practice trials were not analysed. For the test trials d' was the dependent variable.
Overall differences in discrimination are presented in 6.6.3.1 and analyses of peaks of
discrimination in 6.6.3.2.
6.6.3.1 Overall Discrimination Differences
Mean discrimination accuracy was calculated for each listener for each stimulus-pair.
Raw data and Analyses are presented in Appendix A6.10 and A6.11. A 4 x 2 x 2 x (2)
language x presentation mode x ISI x stimulus type ANOVA with repeated measures
on the last factor was conducted using the mixture of planned contrasts and omnibus F
described in section 6.5.
Discrimination accuracy for tonal language (Mandarin, Vietnamese, and Thai)
listeners was found to be significantly lower than for non-tonal language (Australian
English) listeners (Mtone = 2.425, sd = 3.434; Mnon-tone = 3.427, sd = 3.368; F (1, 48) =
4.167, p < .05) and the omnibus F for differences within tone languages was not
significant (F (2, 48) = 2.244, p > .05). None of the other main effects (presentation
manner, ISI, or stimulus type) were significant nor were any interactions.
6.6.3.2 Peak Discrimination Analysis
When categorical perception is observed the identification slopes are steep in the
region on the continuum where perception changes from one category to the other –
the category boundary. So far we have looked at the location of the boundary (see section 6.6.2.2) and the steepness of the identification function at the boundary (section 6.6.2.3). In categorical perception experiments, not only identification is investigated. In order
to show that stimulus continua are perceived in a categorical manner, discrimination
also needs to be assessed. When discrimination is poor (around chance level) in those
areas of the continuum where identification is most consistent (between the endpoints
of the continuum and the category boundary area), it is concluded that stimuli are
indiscriminable (for example, in the case of voicing continua; Repp, 1984). Another
feature of categorical discrimination is the existence of a discrimination peak at the
category boundary. This peak shows that stimuli that are identified inconsistently
(sometimes identified as category A, sometimes identified as category B), are
discriminated well.
In the current experiment the focus is on differences in categorical perception (on all measures: identification d', crossover location, overall discrimination performance, and discrimination peak location) between speakers of different languages. Examining the location of discrimination peaks is interesting as it may reveal further information about the perceptual strategies that were observed in identification.
further information about perceptual strategies that were observed in identification.
Discrimination values over the continuum are shown in Figure 6.9 and 6.10 for the
four language groups for sine-wave and speech tones. Raw discrimination data are
presented in Appendix A6.10 and A6.11.
Here, as set out in 6.5, planned comparisons were constructed to test the mid-
hypothesis and the flat-hypothesis. These are considered in turn.
Mid-Hypothesis:
Overall, the stimuli in the middle of the continuum (stimulus-pairs 180-190 Hz and 190-200 Hz) were discriminated significantly more accurately than the other stimulus-pairs (F (1, 48) = 38.735, p < .05; Mmid = 3.280, sd = 3.404; Mother = 2.372, sd = 3.415), but there was no significant difference between these two (F (1, 48) = 0.137, p > .05).
Figure 6.9. Mean d' discrimination scores for the stimuli around the middle of the continuum (the 180-190 Hz and 190-200 Hz pairs) vs. the other stimulus-pairs on the continuum, for sine and speech stimuli in each language group (Thai, Vietnamese, Mandarin, English).
None of the interactions between the mid-hypothesis and other factors (presentation
mode, ISI, and language background, or their interactions) were found to be
significant. Thus, it can be said that discrimination was best in the centre of the
continuum (180-200 Hz) irrespective of language background or stimulus conditions.
Flat-Hypothesis:
It was also revealed that those stimulus-pairs situated around the flat stimulus (190-200 Hz and 200-210 Hz) were discriminated more easily than the other stimulus-pairs on the continuum (F (1, 48) = 22.181, p < .05; Mflat = 3.207, sd = 3.427; Mother = 2.409, sd = 3.412), and that there was no significant difference between the two stimulus-pairs around the flat stimulus (F (1, 48) = 1.343, p > .05).
Figure 6.10. Mean d' discrimination scores for the stimuli around the flat stimulus (the 190-200 and 200-210 pairs) vs. the other stimulus-pairs on the continuum, for sine and speech stimuli in each language group (Thai, Vietnamese, Mandarin, English).
There was also an interaction between this effect and the different tonal-language-speaking groups: there was a significantly greater difference between the flat pairs and the other pairs in the Thai listener group than in the Mandarin and Vietnamese groups (F (2, 48) = 5.543, p < .05). Thus it can be said that the flat hypothesis was upheld for the Thai but not the Vietnamese or Mandarin speakers. Due to the nature of the observed results, post-hoc tests with just the English speakers were also conducted, and these revealed that the flat-hypothesis was upheld there as well.
6.7 Discussion
The results of this experiment will be discussed considering important aspects of tone
perception, starting with differences between speech and non-speech tone perception,
perceptual strategies, and categoricality issues, leading to further analyses and future
directions.
6.7.1 Independence of Speech and Non-Speech Processing
In the current experiment, unlike in Burnham and Jones (2002), no significant
differences between speech and sine-wave processing were found in terms of
identification or discrimination accuracy. While this is contrary to earlier results by
Burnham and Jones (2002), one could perhaps argue that in terms of F0, the difference
between speech and sine-wave non-speech is not that large. Although in the speech version F0 is carried by the fundamental frequency and a number of higher harmonics, whereas in the sine-wave version it is carried only by the frequency of the sine wave, the pitch contour is clear in both versions. This is quite different from, say, non-speech
analogues of VOT or place of articulation continua. According to this view, the lack
of difference in terms of categoricality is not surprising.
It was, however, found that significantly more trials were required to reach criterion in the first, but not the second, set of trials in the mixed condition. This means that speech and sine-wave tone processing are not completely independent: the elevated criterion score in the first set indicates interference between the two stimulus types. The lower criterion scores in the second set show that this interference can nevertheless be overcome.
6.7.2 Perceptual Strategies in Identification and Discrimination
In terms of crossover, the distinction between tonal and non-tonal language speakers
is not clear. The Mandarin and the Vietnamese listeners seem to perceptually halve
the continuum; they exhibit an identification boundary at the 190 Hz offset midpoint
of the continuum, and discrimination peaks at 190 Hz. This perceptual halving of the
asymmetric continuum would appear to be a linguistic approach - the listeners create a
tonal space for the particular task at hand (the synthetic continuum in this case) and
place the tones within this tonal framework – half of the stimuli in one category (those
above the mid-point), and the other half in another category (those below the mid-
point).
On the other hand, the Thai listeners and the Australian listeners divide the continuum
into above and below the flat no-contour stimulus (200 Hz offset). Consideration of Figure 6.4 shows that their crossovers were closer to the flat stimulus. This would appear to be a more acoustic approach, as the flat tone is an acoustically
salient tone with a distinctly different contour from the other tones. In this way, two
categories may be set up – non-flat rising tones, and non-flat falling tones. The
difference between the two strategies is that those listeners who use the flat tone as a
perceptual anchor appear to perceptually distinguish between rising and falling tones,
whereas those listeners who use the middle of the continuum as an anchor seem to
build a tonal space to use for this task, within which the middle of the continuum
plays a greater role than the flat no-contour stimulus.
As well as looking at the crossover results in terms of the middle and the flat tone,
discrimination results were also analysed according to these two locations on the
continuum. Overall, the mid and the flat stimulus-pairs were discriminated more
accurately than the other stimulus-pairs on the continuum, with the Thai listeners
showing significantly greater differences in discrimination accuracy between the flat-
pairs and the other pairs than the Mandarin and Vietnamese listeners. This means that
the Thai listeners' perceptual anchor lies again closer to the flat stimulus, whereas for
Vietnamese and Mandarin listeners, the perceptual anchor is more around the middle
of the continuum. This confirms what was found in the identification crossover results
– Thai listeners, but not Vietnamese and Mandarin listeners use the no-contour tone as
a perceptual anchor, here shown by increased discriminatory ability around that point
on the continuum.
It is interesting that the Thai listeners do not seem to use the same perceptual strategy
as the other (Mandarin and Vietnamese) tonal language listeners. While the results of
the Thai listeners seem very similar to those of the Australian English listeners, this
does not necessarily lead to the conclusion that both use an acoustic approach to tone
processing. Rather it might indicate that the Thai listeners employ a different
linguistic strategy in tone perception than the Vietnamese and Mandarin listeners. To
investigate possible similarities and differences between Thai and Australian English
listeners in terms of categorical tone perception, Experiment 2 will test just those two
language groups.
The language abilities of the participants were not tested for the purposes of this perceptual experiment, so it is possible that the Thais were more experienced with English than the Vietnamese. This is unlikely, however, because all of the Thai and Vietnamese participants were students at the University of New South Wales in Sydney, where university-level English proficiency is expected.
Another difference within tonal languages was observed in the identification d'
results. Identification d' gives information about the steepness of the identification
function at the point where the crossover is located. Higher d' values indicate steeper
identification functions, and this in turn means more consistent, or 'categorical', identification behaviour. Vietnamese listeners show significantly less categorical
perception than Thai and Mandarin listeners.
One of the reasons why Vietnamese listeners show less categorical perception could
be that there are distinct durational differences in the real Vietnamese tones (see
section 3.2.2). Duration, in addition to F0, is an important feature of lexical tone in Vietnamese: tone 5 is the longest (around 400 ms), followed in order of decreasing duration by tones 2, 3, and 4, with tone 6 (160 ms) the shortest (L. Thompson, 1987). As the duration of all the tones presented in this experiment was
equated at 495 ms, the Vietnamese listeners may have had greater difficulty in
categorical identification, due to lack of a durational cue. It should be noted that
Mandarin and Thai also have durational differences that are used to distinguish tones;
however, durational differences between the different Vietnamese tones are greater
than those between different Mandarin (300 ms vs. 170 ms) or Thai tones (570 ms vs.
450 ms) (Blicher, Diehl, & Cohen, 1988; Jongman, Wang, Moore, & Sereno, 2006;
Vu, Nguyen, Luong, & Hosom, 2005; Yip, 2002).
Apart from this distinction, there is another important feature of Vietnamese tones that
is missing in the artificial continuum that was used in the current study: voice register.
While different voice registers are present in real Vietnamese tones (Vu et al., 2005;
Yip, 2002), these cues were not engaged here, which might be another reason why the
Vietnamese listeners did not respond in a very categorical way to these stimuli.
Given that register and duration features of Vietnamese tone play a significant role in
Vietnamese tone identification, it would be interesting to see whether Vietnamese
listeners would respond differently to a tone continuum in which duration and/or
voice register are manipulated.
Thus, it appears that tone perception strategies are not universal and each tonal
language has to be considered separately.
6.7.3 Categoricality Issues
The results of this experiment show that, consistent with previous studies (Stagray & Downs, 1993), discrimination is better overall for non-tonal than for tonal language speaking participants (but see section 6.8). The reason for the superiority of the
the non-tonal language listeners over tonal language listeners could be the fact that the
Australian participants treat the task as an acoustic one, whereas the tonal language
speakers associate the tones with tones from their own tonal system which influences
their perception of a novel tonal space. Moreover, tonal language listeners are used to perceiving very similar tones as one tonal category, whereas non-tonal listeners are not required to categorise tones in linguistic contexts and are thus better able to perceive subtle tonal differences. However, this can only explain the differences between speakers of two of the tonal languages (Mandarin and Vietnamese) and the non-tonal Australian English speakers; Thai listeners' perception does not appear to be influenced in that way. It is not clear why Thai listeners' perception is so similar to Australian listeners', but it leads to the conclusion that tone perception is not only shaped by tonal language experience in general, but is rather a result of the specific tonal system that the listener uses.
With regard to presentation manner, crossovers, identification accuracy, and discrimination results were analysed. The results show that the crossover varied depending on presentation manner (higher crossovers, towards the flat tone, in the blocked condition) and tone type (higher crossovers for sine-wave tones). This indicates that listeners' perception is influenced by both the context and the manner in which a stimulus is presented. The crossover being closer to the flat 200 Hz offset stimulus for sine-wave tones suggests a rather acoustic approach to sine-wave tone perception. Similarly, the fact that stimuli presented blocked show crossovers closer to the flat tone than those presented mixed is also an indicator of a more psychoacoustic perceptual strategy in the blocked condition. The reason for using this type of perceptual strategy in the blocked, but not in the mixed, condition may be that listeners are better able to ignore tone-irrelevant features, such as stimulus type, and concentrate solely on the tone contour.
Turning to tone type, although there were higher crossovers for sine-wave than for speech tones, there were no significant differences in categoricality in either discrimination or identification. This is surprising, as the previous study by Burnham and Jones (2002) did observe significant differences between tone types. However, this could be due, in part, to the fact that the Burnham and Jones experiment used more non-speech stimuli (75% of the stimuli presented were non-speech tones), which could have led to a greater perceptual separation between speech and non-speech tones in their experiment. Here, half of the stimuli were speech tones and the other half sine-wave equivalents, and this equal distribution might have made listeners more robust against differences in stimulus type, leading them to treat the two types perceptually similarly.
It is also worth pointing out that no difference was observed between the two inter-
stimulus intervals (1500 and 500 ms separation in discrimination). This is interesting
because previous studies have found more categorical discrimination for stimuli
separated by a longer than by a shorter interval (see section 2.4.3.2). A reason for this
lack of difference between the two ISIs could be that the 1500 ms separation was too
long to lead to a perceptual advantage.
6.8 Further Analysis and Future Directions
It was shown that tonal language background, tone type, and presentation type all
influence listeners‟ perception of tone continua.
There are some reported connections between tone perception and musical ability
(see Chapter 4). An ad hoc observation made here concerns a comparison between
results of musically trained listeners and non-musicians. Inspection of d' values for
identification and discrimination (see Table 6.4) indicated that musicians show
slightly steeper identification functions and much more accurate discrimination than
non-musicians.
Table 6.4.
Mean d’ values for Identification and Discrimination in Musicians and Non-
Musicians
Musicians Nonmusicians
N d' ID (SD) d' disc (SD) N d' ID (SD) d' disc (SD)
Thai 5 1.93 (0.45) 3.94 (1.46) 11 1.70 (0.41) 2.75 (1.28)
Vietnamese 4 0.96 (0.42) 3.80 (1.12) 12 1.33 (0.44) 1.22 (1.27)
Mandarin 3 1.74 (0.84) 2.82 (1.21) 13 1.79 (0.37) 2.13 (2.06)
Australian English 8 1.81 (0.49) 4.15 (2.04) 8 1.31 (0.57) 2.72 (1.48)
Mean (SD) 20 1.66 (0.60) 3.82 (1.59) 44 1.55 (0.47) 2.24 (1.58)
This ad hoc comparison (which could not be analysed statistically, because the participant numbers were far from equated) raises the question of whether musically trained participants are better at tone identification than non-musicians. (The definition of musicianship was based on a questionnaire in which participants were required to indicate whether they played a musical instrument and for how long they had played it.)
This is of interest as a possible factor in tone perception, which may rival tonal
language background as a determinant of tone perception ability. Indeed these
differences were more apparent in Australian and Thai listeners than Mandarin and
Vietnamese listeners and may in fact explain some of the differences in results that
were found here between the two sets of languages. In order to investigate whether
there are any perceptual differences depending on musical background, musically
trained listeners from both tonal (Thai) and non-tonal (Australian English) language
background are tested in the following experiment.
CHAPTER 7
Perception of Speech and Sine-Wave Tones: The Role of
Language Background and Musical Training
The current experiment concerns categorical perception of tone in Thai and Australian
English speaking musicians and non-musicians. First, in the introduction the results of
the previous study are summarised and methodological issues are considered. The
method and results of the current experiment are subsequently presented.
7.1 Background: Results of Experiment 1
Two results from Experiment 1 bear on this experiment. First, in Experiment 1, it was
found that tone perception depends not only on whether the native language of the
listener is a tonal or a non-tonal language, but also on the particular tonal language
that is spoken. On the basis of strategy differences found in Experiment 1 for speakers
of different tonal and non-tonal languages, in Experiment 2 differences in categorical
tone perception are investigated for two differently shaped asymmetric continua,
which should accentuate any difference between the Mid-Continuum, and Flat-
Anchor strategies. Only speakers from one tonal language (Thai) and one non-tonal
language (Australian English) are tested in the current experiment. While the issue of tone language differences is of interest, there were many aspects of the previous study's results to consider, and the most important of these were the differences between speech and non-speech perception and the influence of musical background on tone perception. Second, based on differences found between
musically trained listeners and non-musicians observed in an ad hoc analysis in
Experiment 1, both language groups in this study consist of musicians, and non-
musicians. As in Experiment 1 both identification and discrimination tasks, and
speech and sine-wave continua were included. Stimuli are again presented in separate
blocks or mixed within blocks.
7.2 Methodological Issues
Methodological issues concerning the construction of the stimuli for this experiment,
specifically continuum construction and step size, are considered below with due
reference to the results of Experiment 1.
7.2.1 Rising and Falling Continua
In Experiment 1, perception of tones from an asymmetric continuum that had more
falling than rising tones was investigated. The current experiment will again use
asymmetric tone continua in order to test for perceptual effects both around the medial
stimulus of the continuum and also around the non-dynamic flat F0 stimulus.
However, unlike Experiment 1, here asymmetry will be observed in both directions:
there will be a continuum with more falling than rising tones (and a slightly falling
central stimulus), and a continuum with more rising than falling tones (and a slightly
rising central stimulus). This is necessary in order to ensure that any effects that are
observed for the central stimulus are indeed due to its medial position rather than to its
being slightly falling or slightly rising. Another difference between the continua of
this and the last experiment is that the physical separation between the centre of the
continuum and the flat stimulus is greater here (in Experiment 1, the stimulus that
constituted the middle of the continuum was only one step away from the flat
stimulus). Based on the results of the previous experiment, two alternative response
strategies were deemed possible in the current study. These are schematically
presented in Figure 7.1 to Figure 7.4, and described below for the asymmetric
continua used here in Experiment 2. The rising continuum consisted of tones with a
consistent onset F0 (220 Hz) but varying F0 offsets from 205 Hz (falling tone) to 257.5
Hz (rising tone) in 7.5 Hz steps; in the falling continuum, the onset was 220 Hz, and
the offsets ranged from 182.5 Hz to 235 Hz in 7.5 Hz steps (see Figure 7.5).
Mid-Continuum Response Strategy: In the mid-continuum response strategy, it might
be expected that the category identification boundary and discrimination peak for the
synthetic asymmetric continuum would be midway between the endpoints of the
continuum. If so then (i) the identification boundary should be located at 208.75 Hz
offset for the falling continuum and at 231.25 Hz for the rising continuum, and (ii)
stimuli in the middle of the continuum should be more discriminable than surrounding
stimuli, especially those at the ends of the continuum (see Figures 7.1 and 7.2).
[Figure 7.1: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 205 to 257.5 Hz.]
Figure 7.1. Mid-Continuum Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 205 – 257.5 Hz in 7.5 Hz steps. The 231.25 Hz offset rising tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone. The dotted line represents the discrimination function, the solid line the identification function.
[Figure 7.2: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 182.5 to 235 Hz.]
Figure 7.2. Mid-Continuum Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 182.5 – 235 Hz in 7.5 Hz steps. The 208.75 Hz offset falling tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone. The dotted line represents the discrimination function, the solid line the identification function.
Flat-Anchor Response Strategy: In the Flat-Anchor Response strategy, it might be expected that the category boundary and discrimination peak for this synthetic asymmetric continuum would be near the flat no-contour tone (220 Hz onset, 220 Hz offset). If so then (i) the identification boundary should lie on the flat 220 Hz stimulus, and (ii) the stimuli around the 220 Hz offset stimulus should be more discriminable than surrounding stimuli, especially those at the ends of the continuum (see Figures 7.3 and 7.4).
[Figure 7.3: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 182.5 to 235 Hz.]
Figure 7.3. Flat-Anchor Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 182.5 – 235 Hz in 7.5 Hz steps. The 208.75 Hz offset falling tone represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone.
[Figure 7.4: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 205 to 257.5 Hz.]
Figure 7.4. Flat-Anchor Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 205 – 257.5 Hz in 7.5 Hz steps. The 231.25 Hz offset rising tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone.
7.2.2 Step Size
Another difference between this and the previous study is the step size on the
continuum. In Experiment 1, the physical separation between the stimulus offsets was
10 Hz, with onset fixed at 200 Hz, and the offset ranging from 160 Hz to 220 Hz.
Based on the rather high discrimination performance observed in Experiment 1 (better
than chance at most points of the continuum for most listener groups), the step size in
this study has been reduced from 10 Hz to 7.5 Hz (see Figure 7.5).
7.3 Hypotheses
7.3.1 Categoricality Differences
In the ad hoc analyses in Experiment 1 it appeared that musicians had steeper
identification curves than non-musicians. Based on these results it is expected that
musicians here, irrespective of language background, will show more categorical
identification than non-musicians. Musicians also appeared to show higher
discrimination performance in the previous study than non-musicians, and it is thus
expected that they will also be better at discriminating tones than non-musicians.
Separate (2 x 2) groups of tonal and non-tonal language speaking musicians and non-
musicians will be tested to investigate whether musical experience shapes categorical
perception in a similar manner to that of tonal language experience.
7.3.2 Continuum Shape and Language Background
The shape of the continua (more falling vs. more rising) is expected to influence
performance specifically in the Thai listener group. Thai has more falling than rising
tones (see section 3.2.1) and it is thus hypothesised that, if Thai listeners' perception
is influenced by their own tonal space, they should show more accurate identification
and discrimination in the falling than in the rising continuum. No such differences are
expected between the falling and the rising continua for Australian English listeners.
It is expected that both language groups will show a perceptual anchor (identification
boundary and discrimination peak) near the flat 220 Hz onset 220 Hz offset stimulus,
as was found in Experiment 1.
7.3.3 Processing Differences
In the first study, all participants were tested in Australia and interacted with an
English-speaking researcher. In the current experiment, Australian English
participants are again tested in Australia by an English-speaking researcher, but Thai
listeners are tested in Thailand and by a Thai-speaking researcher, a more appropriate
linguistic environment for a tone perception experiment with tonal language speakers.
It is hypothesised that this native language environment will result in more language-
specific processing in the tonal language speakers and therefore more categorical
perception of tones than was observed in Experiment 1 (although this will not be
tested formally here).
7.4 Experimental Design
A 2 x 2 x 2 x 2 x (2) design was employed. The between-group factors were listeners' native language (Thai, Australian English), musical background (musicians, non-musicians), presentation manner (mixed, blocked), and continuum shape (more falling, more rising). The within-group factor was tone type (speech, sine-wave). The
order of tasks (identification and discrimination), the order of tone types (speech or
sine-wave), as well as the order of stimuli in discrimination pairs (higher offset
stimulus first vs. higher offset stimulus second) were counterbalanced and are not
considered in the analysis.
For the identification task, measurements were taken for each stimulus on the
continuum; and for the discrimination task measurements were taken for each
contiguous pair of stimuli. Task-dependent variables in each task are considered in detail in sections 7.4.3.1 and 7.4.3.2.
7.4.1 Stimuli
Four synthetic tone continua were created: two types, a speech continuum and a sine-wave continuum with identical F0 contours, for each of the two continuum shapes, one with more falling (5 items) than rising (2 items) stimuli, and one with more rising (5 items) than falling (2 items) stimuli. All tones have a fixed onset of 220 Hz
and offsets varying from 182.5 to 235 Hz in the falling continua; and from 205 to
257.5 Hz in the rising continua (step size: 7.5 Hz). Thus each continuum shape is
asymmetrical and consists of eight steps. F0 movement in all tones is linear. The tonal
contours of the continua are shown schematically in Figure 7.5. Tone types, speech on
the syllable /wa/ and sine-wave were created in the same manner as in Experiment 1
(see section 6.4.1).
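As a concrete illustration, the offset values of the two continua described above can be generated as follows. This is a sketch only; the function and variable names are mine and are not part of the experimental software.

```python
# Illustrative sketch (not from the thesis): generating the F0 offset values
# for the two asymmetric continua. Each continuum has 8 steps of 7.5 Hz.
def make_offsets(start_hz, n_steps=8, step_hz=7.5):
    """Return n_steps offset values beginning at start_hz in step_hz increments."""
    return [start_hz + i * step_hz for i in range(n_steps)]

falling_continuum = make_offsets(182.5)  # offsets 182.5 ... 235 Hz
rising_continuum = make_offsets(205.0)   # offsets 205 ... 257.5 Hz

ONSET_HZ = 220.0  # fixed onset for every tone
# The flat no-contour tone is the stimulus whose offset equals the onset:
flat_index_falling = falling_continuum.index(ONSET_HZ)  # 6th step (index 5)
flat_index_rising = rising_continuum.index(ONSET_HZ)    # 3rd step (index 2)
```

Note that the flat stimulus sits asymmetrically in each continuum, which is what leaves five falling and two rising tones (or the reverse) on either side of it.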
Figure 7.5. F0 characteristics of the two asymmetric tone continua in Experiment 2.
7.4.2 Participants
A total of 64 adults with normal hearing were tested. Of these, 32 were Thai native
speakers and 32 were Australian English native speakers. Each language group
consisted of 16 musicians and 16 non-musicians. Musicians were defined as having
more than five years of continuous formal musical education; non-musicians were
participants with no more than two years of musical education. The Thai participants
were students at Chulalongkorn University, Bangkok, and Assumption University,
Bangkok (26 female, 6 male; mean age 22 years, range 19-27 years), and were
reimbursed for their travel expenses. The musical and non-musical Australian English
participants (23 female, 9 male; mean age 21 years, range 18-30 years) were students
at the University of Western Sydney and received course credit for their participation.
7.4.3 Procedure
Participants were tested individually in a single session, in sound-attenuated testing
cubicles at Chulalongkorn University in Bangkok, Thailand for the Thai listeners, and
at MARCS Auditory Laboratories at the University of Western Sydney for the
Australian English listeners. Participants gave their informed consent (see Appendix
A7.1 for Australian participants and Appendix A7.2 for Thai participants) and the
experiment was covered by University of Western Sydney Ethics approval (HREC
05/64). Stimuli were presented on a laptop computer (Compaq Evo N1000c) over
headphones (KOSS UR20) at a self-adjustable listening level in the DMDX (Forster
& Forster, 2003) experimental environment.
7.4.3.1 Identification
As in Experiment 1, participants were provided with two labelled (RIGHT and LEFT)
keyboard keys and instructed to “press the RIGHT (LEFT) key for one kind of sound
and the LEFT (RIGHT) key for the other kind of sound”. Responses timed out after
4000 ms, and timed out trials were not replaced. For all listeners there were two sets
of trials, each consisting of a practice phase, a training phase, and a test phase. For
subjects in the Blocked condition, speech and sine-wave stimuli were presented
separately in the two trial sets of the experiment (counterbalanced order of speech and
sine-wave between listeners); and for listeners in the Mixed condition, there were
simply two sets of mixed (speech and sine-wave randomised) stimuli. The three
phases, practice, training, and test, are described below.
In the practice phase eight items were presented, four of each of the relevant endpoint
stimuli (the 205 Hz and the 257.5 Hz offset stimuli for the rising continuum and the
182.5 Hz and 235 Hz offset stimuli for the falling continuum). Following the practice
phase, in the training phase it was required that a criterion of 8 consecutive correct
responses be made: for participants in the Blocked condition this entailed reaching
criterion for the speech set and for the sine-wave set; and for Mixed condition
participants this entailed reaching criterion separately for each of the two sequential
sets of mixed stimuli to ensure listeners remembered the task over time. Each training
phase consisted of the endpoints of the continuum and continued until a criterion of
eight consecutive correct responses was reached. Following criterial responding in the
training phase the test phase was presented. The test phase consisted of 8 repetitions
of each item on each continuum (speech continuum and sine-wave continuum)
presented in random order. Given the eight steps on each continuum, participants
were required to identify a total of 128 items in the test phase. The task took 20 to 25 minutes, depending on the individual participant's pace.
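The training criterion of eight consecutive correct responses described above can be sketched as follows. This is an illustrative reconstruction of the criterion logic, not the DMDX script actually used.

```python
# Illustrative sketch (not the thesis's code): counting trials until a
# criterion of k consecutive correct responses is reached in training.
def trials_to_criterion(responses, k=8):
    """responses: iterable of booleans (True = correct response).
    Returns the trial number at which k consecutive correct responses
    have just been made, or None if the criterion is never reached."""
    run = 0
    for trial, correct in enumerate(responses, start=1):
        run = run + 1 if correct else 0  # an error resets the run
        if run == k:
            return trial
    return None

# e.g. one early error, then eight correct in a row -> criterion at trial 10
print(trials_to_criterion([True, False] + [True] * 8))  # 10
```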
Data Treatment:
The data treatment was the same as in the previous study (see section 6.4.3.1). For
each listener, two crossover values (in Hz) and two d' values were computed (see section 6.4.3.1 for details). Each crossover value was then converted to the offset values of the stimuli on the tone continuum (ranging from 205 Hz to 257.5 Hz for the rising continuum and from 182.5 Hz to 235 Hz for the falling continuum).
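For illustration, a 50% crossover can be located by linear interpolation between adjacent identification proportions. This minimal sketch is mine and is not the thesis's exact procedure, which is described in section 6.4.3.1; the example proportions are made up.

```python
# Illustrative sketch: locate the offset (Hz) at which the proportion of
# 'rising' responses crosses 0.5, by linear interpolation between steps.
def crossover_hz(offsets_hz, p_rising):
    """Estimate the offset at which p('rising') crosses 0.5, or None."""
    for i in range(len(offsets_hz) - 1):
        p0, p1 = p_rising[i], p_rising[i + 1]
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:  # sign change around 0.5
            frac = (0.5 - p0) / (p1 - p0)
            return offsets_hz[i] + frac * (offsets_hz[i + 1] - offsets_hz[i])
    return None  # no crossing found

offsets = [182.5, 190, 197.5, 205, 212.5, 220, 227.5, 235]
props = [0.02, 0.05, 0.10, 0.20, 0.40, 0.60, 0.90, 0.98]  # hypothetical data
print(crossover_hz(offsets, props))  # 216.25, halfway between 212.5 and 220
```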
7.4.3.2 Discrimination
In the discrimination task the participants listened to stimulus-pairs and were
instructed to “press the LEFT (RIGHT) key if they are the same sound, and press the
RIGHT (LEFT) key if they are different sounds”. Responses timed out after 1500 ms
(because of the longer trial duration in discrimination, the time-out duration was
shorter here than in identification). Omitted trials were not replaced. The task was
separated into two trial sets, punctuated by a break. As in the identification task, there
were two manners of presentation: Blocked and Mixed. A roving AX paradigm was
used, measuring discrimination accuracy along the whole continuum. Neighbouring
stimuli were presented pair-wise. Half of the participants in each condition listened to tones from the predominantly rising continuum, whereas the other half listened to tones from the predominantly falling continuum. The eight practice trials consisted of four different and
four same pairs that were presented with feedback. In test trials there were four
repetitions of each of the four possible combinations (AA, BB, AB, BA) of each of
the seven possible stimulus-pairs for each of the 2 tone types (speech and sine-wave).
This summed to a total of 224 stimulus-pairs. In total, this task took 20 to 25 minutes, depending on the individual participant's pace.
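The trial arithmetic above (4 repetitions x 4 pair orders x 7 neighbouring pairs x 2 tone types = 224) can be checked with a short sketch of how such a roving AX trial list might be assembled. This is illustrative only; the names are mine.

```python
# Illustrative sketch: building the discrimination trial list -
# 4 reps x 4 orders (AA, BB, AB, BA) x 7 neighbouring pairs x 2 tone types.
def build_ax_trials(offsets, tone_types=("speech", "sine-wave"), reps=4):
    trials = []
    for tone in tone_types:
        for a, b in zip(offsets, offsets[1:]):             # 7 neighbouring pairs
            for pair in ((a, a), (b, b), (a, b), (b, a)):  # AA, BB, AB, BA
                trials.extend([(tone, pair)] * reps)
    return trials

offsets = [205, 212.5, 220, 227.5, 235, 242.5, 250, 257.5]  # rising continuum
trials = build_ax_trials(offsets)
print(len(trials))  # 224
```

In the experiment itself the trial order was of course randomised; the sketch only verifies the trial count.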
Data Treatment:
Again, discrimination performance was measured by d', calculated according to the model for discrimination tasks in Kaplan, Macmillan, and Creelman (1978). The proportions of Hits, p(Hit), and False Alarms, p(False Alarm), were limited to the 0.01 to 0.99 range, which meant that a maximum d' value of 8.715 and a minimum d' value of –8.715 could be obtained.
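The clipping of proportions can be illustrated with a minimal sketch. For simplicity this uses the basic yes-no formula d' = z(Hit) − z(False Alarm), which caps at about ±4.65 under 0.01/0.99 clipping; the same-different model of Kaplan, Macmillan, and Creelman (1978) used here yields the ±8.715 limits quoted above. The function name is mine.

```python
# Illustrative sketch: clip hit and false-alarm proportions to [0.01, 0.99]
# before computing d'. Uses the simple yes-no formula d' = z(H) - z(F);
# the thesis's same-different model gives a different ceiling (+/-8.715).
from statistics import NormalDist

def clipped_dprime(p_hit, p_fa, lo=0.01, hi=0.99):
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    p_hit = min(max(p_hit, lo), hi)   # clip extreme proportions
    p_fa = min(max(p_fa, lo), hi)
    return z(p_hit) - z(p_fa)

# Perfect performance is capped: z(0.99) - z(0.01) is about 4.65
print(round(clipped_dprime(1.0, 0.0), 2))  # 4.65
```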
7.5 Analyses
7.5.1 Test Assumptions
For all following analyses, α was set at .05. Unless otherwise mentioned, test
assumptions were found to be satisfactory. Outliers and how they were dealt with will
be mentioned in the appropriate results sections. Planned contrasts were conducted
using the Bonferroni correction for multiple tests, where appropriate.
7.5.2 Language Group and Musical Background Hypotheses
As there were a priori reasons to expect differences in categorical perception between
tonal (Thai) and non-tonal (Australian English) native speakers and between
musicians and non-musicians, analyses of variance (ANOVAs) incorporating planned
contrasts were conducted to test the differences between these groups, and their
interactions.
7.5.3 Strategy Type Hypotheses
Apart from differences between the language and musical background groups, differences between stimulus-pairs across the continua in the discrimination task were also tested in a priori analyses, according to the planned contrasts presented in Tables 7.1 and 7.2.
Table 7.1
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task – Rising Continuum

Stimulus-pair (offset in Hz): 205-212.5  212.5-220  220-227.5  227.5-235  235-242.5  242.5-250  250-257.5
Mid Hypothesis:                  -1         -1         -1          6         -1         -1         -1
Flat Hypothesis:                  2         -5         -5          2          2          2          2

Note. In the current experiment there was only one contrast per strategy, because the flat stimulus was contained in two stimulus-pairs.

The Mid Hypothesis tests whether the stimulus-pair spanning the middle of the continuum (227.5 – 235 Hz) is more discriminable than the other stimulus-pairs. The Flat Hypothesis examines whether there is a difference in discrimination performance between the two flat pairs (212.5 – 220 Hz and 220 – 227.5 Hz) and the rest of the stimulus-pairs.
Table 7.2
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task – Falling Continuum

Stimulus-pair (offset in Hz)   182.5–190  190–197.5  197.5–205  205–212.5  212.5–220  220–227.5  227.5–235
Mid Hypothesis                     -1         -1         -1          6         -1         -1         -1
Flat Hypothesis                    -2         -2         -2         -2          5          5         -2
The Mid Hypothesis tests whether the stimulus-pair in the middle of the continuum (the 220–205 Hz and 220–212.5 Hz tones) is more discriminable than the other stimulus-pairs. The Flat Hypothesis tests whether there is a difference in discrimination performance between the two pairs that contain the flat tone (212.5–220 Hz and 220–227.5 Hz) and the rest of the stimulus-pairs.
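Applying such contrast weights to the seven stimulus-pair means is a weighted sum in which the weights cancel to zero. A minimal sketch (the d' means below are invented for illustration; they are not data from the experiment):

```python
def contrast_estimate(weights, means):
    """Weighted sum of condition means; contrast weights must sum to zero."""
    assert abs(sum(weights)) < 1e-9, "contrast weights must sum to zero"
    return sum(w * m for w, m in zip(weights, means))

# Weights from Table 7.2 (falling continuum).
mid_weights  = [-1, -1, -1, 6, -1, -1, -1]
flat_weights = [-2, -2, -2, -2, 5, 5, -2]

# Hypothetical d' means for the seven stimulus-pairs (illustration only).
pair_means = [1.1, 1.0, 1.2, 2.4, 0.6, 0.7, 1.3]

print(contrast_estimate(mid_weights, pair_means))   # large: mid pair stands out
print(contrast_estimate(flat_weights, pair_means))  # non-zero: flat pairs differ
```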
7.6 Results
Results are presented separately for the identification training (section 7.6.1), the identification test trials (section 7.6.2), and the discrimination test trials (section 7.6.3), followed by a summary and a qualitative evaluation of the identification and discrimination functions (section 7.6.4).
7.6.1 Identification Training Results
The identification task began with practice trials, followed by a training session to
criterion, and then a set of test trials. Results for the practice trials were not analysed.
In the identification training session, participants were required to identify eight stimuli correctly in succession in order to proceed to the main part of the experiment. (For raw data and statistical analyses see Appendix A7.3 and A7.4.)
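The criterion rule (eight correct identifications in succession) amounts to a scan over the trial-by-trial correctness record. A hypothetical helper, not the software actually used in the experiment:

```python
def trials_to_criterion(correct, run_length=8):
    """Return the trial number on which the participant first completes
    `run_length` consecutive correct responses, or None if never reached."""
    streak = 0
    for trial, is_correct in enumerate(correct, start=1):
        streak = streak + 1 if is_correct else 0
        if streak == run_length:
            return trial
    return None

# One error on trial 2, then eight correct in a row: criterion met on trial 10.
responses = [True, False] + [True] * 8
print(trials_to_criterion(responses))  # 10
```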
Before considering the rising and the falling continua separately, a 2 (English/Thai) x 2 (musicians/non-musicians) x 2 (rising/falling) analysis of variance was conducted. Musicians required significantly fewer trials to reach criterion than non-musicians (Mmusicians = 19.12, Mnon-musicians = 33.19; F (1, 56) = 9.630, p < .01). No other significant main effects or interactions were found. Results of the training session for the rising and falling continua are presented in sections 7.6.1.1 and 7.6.1.2.
7.6.1.1 Trials to Criterion in Identification – Rising Continuum
Before considering the blocked and the mixed conditions separately, a 2 (Australian/Thai) x 2 (musicians/non-musicians) x 2 (mixed/blocked) ANOVA was conducted. Descriptive statistics (means and standard error bars) are presented in Figure 7.6. The analysis revealed that significantly more trials were required to reach criterion in the mixed condition than in the blocked condition (t (1, 62) = 2.814, p < .01; Mmix = 21.59, sd = 22.66; Mblock = 10.09, sd = 4.58; for raw data see Appendix A7.3; for analysis see Appendix A7.4).
[Figure 7.6 appears here: four panels (Australian/Thai × Musicians/Non-Musicians), showing mean criterion scores (0–80) for set 1, set 2, sine, and speech in the mixed and blocked conditions.]
Figure 7.6. Descriptive Statistics for Trials to Criterion for the Rising Continuum in the Identification Task.
Rising Continuum, Blocked condition: Mean trials required to reach criterion in the
blocked condition for the language groups and tone types are shown in Figure 7.6, and
were analysed in a 2 (Australian/Thai) x 2 (musician/non-musician) x 2 (sine/speech)
ANOVA. No significant differences were found between language groups or between musicians and non-musicians, but significantly more trials were required to reach criterion in the sine-wave condition (M = 11.82) than in the speech condition (M = 8.38; F (1, 24) = 5.386, p < .05). No other main effects or interactions were significant (for analyses see Appendix A7.4).
Rising Continuum, Mixed condition: As can be seen in Figure 7.6, in the mixed condition, Australian participants required significantly more trials than Thai listeners to reach criterion (MAustralian = 29.06; MThai = 14.125; F (1, 24) = 11.522, p < .01), non-musicians required significantly more trials than musicians (Mmusicians = 11.63, Mnon-musicians = 31.56; F (1, 24) = 20.526, p < .01), and participants needed more trials to reach the criterion in Set 1 (M = 29.81) than in Set 2 (M = 13.37; F (1, 24) = 13.95, p < .01). The interaction between language and musical background was also significant, with no significant difference between Thai musicians and non-musicians, but significantly lower criterion scores for the Australian musicians than for the Australian non-musicians (F (1, 24) = 19.011, p < .05). The interaction between sets and musical background was also significant, with no difference between musicians and non-musicians in the second set, but significantly lower criterion scores for musicians in the first set (F (1, 24) = 7.054, p < .05). None of the other interactions were significant (for analyses see Appendix A7.4).
7.6.1.2 Trials to Criterion in Identification – Falling Continuum
Descriptive statistics for trials to criterion in the falling continuum are provided in Figure 7.7. Before considering the blocked and the mixed conditions separately, a comparison of trials to criterion in the mixed vs. the blocked condition, collapsed over language groups, tone types, and musical backgrounds, was conducted. The analysis revealed no significant difference between criterion scores in the mixed condition and the blocked condition (t (1, 62) = 1.034, p > .05; Mmix = 13.56, sd = 9.353; Mblock = 11.31, sd = 7.998; for raw data see Appendix A7.3; for analysis see Appendix A7.4).
166
Figure 7.7. Trials to Criterion Scores for the Falling Continuum.
Falling Continuum, Blocked condition: No significant main effects or interactions
were found in the blocked condition. (For raw data and analyses see Appendix A7.3
and 7.4.)
Falling Continuum, Mixed condition: In the mixed condition, musicians needed
significantly fewer trials to reach criterion than non-musicians (F (1, 24) = 7.555, p <
.05) and participants required significantly more trials to reach criterion in the first
than in the second set (F (1, 24) = 7.809, p < .05). (For raw data and analyses see
Appendix A7.3 and A7.4.)
[Figure 7.7 appears here: four panels (Australian/Thai × Musicians/Non-Musicians), showing mean criterion scores (0–80) for set 1, set 2, sine, and speech in the mixed and blocked conditions.]
7.6.2 Identification Test Trials
Identification test trials were analysed with respect to identification crossover (sections 7.6.2.1 and 7.6.2.2) and d' values (section 7.6.2.3), separately for the rising and falling continua.
7.6.2.1 Crossover Values – Rising Continuum
Mean crossover values in Hz were calculated for each listener and are shown in Figures 7.8, 7.9, 7.10, and 7.11. Planned comparisons within a 2 x 2 x 2 x (2) language x musical background x presentation manner x stimulus type analysis of variance (ANOVA), with repeated measures on the last factor, were conducted to test the effect of these four factors on the crossover boundary location (for raw data and analyses see Appendix A7.5 and A7.6).
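One common way to locate such a crossover is linear interpolation of the identification function at the 50% point between the two continuum steps that straddle it. The sketch below (with invented identification proportions) illustrates the idea; the thesis may have used a different fitting procedure (e.g. probit):

```python
def crossover_hz(offsets, prop_high, threshold=0.5):
    """Linearly interpolate the offset (in Hz) at which the identification
    function crosses the threshold between two adjacent continuum steps."""
    for i in range(len(offsets) - 1):
        p0, p1 = prop_high[i], prop_high[i + 1]
        if p0 != p1 and (p0 - threshold) * (p1 - threshold) <= 0:
            x0, x1 = offsets[i], offsets[i + 1]
            return x0 + (threshold - p0) / (p1 - p0) * (x1 - x0)
    return None  # function never crosses the threshold

# Rising-continuum offsets with hypothetical identification proportions:
offsets = [205, 212.5, 220, 227.5, 235, 242.5, 250, 257.5]
props = [0.02, 0.05, 0.10, 0.30, 0.70, 0.90, 0.95, 0.98]
print(crossover_hz(offsets, props))  # ~231.25, near the continuum midpoint
```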
In the rising continuum, the mean crossover over all manipulations was 232.03 Hz
and there were no significant deviations from this, due to language or musical
background, presentation manner, or stimulus type, or their interactions.
[Figure 7.8 appears here: mean crossover values (Hz, 220–250) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.8. Crossover Values for the Rising Continuum for Thai Listeners.
[Figure 7.9 appears here: mean crossover values (Hz, 220–250) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.9. Crossover Values for the Rising Continuum for Australian Listeners.
7.6.2.2 Crossover Values – Falling Continuum
The mean crossover over all manipulations was 211.83 Hz, but crossovers for Thai
listeners were significantly higher and thus closer to the flat 220 Hz offset tone than
those of the Australian listeners (F (1, 45) = 17.416, p < .005; MAustralian = 208.10 Hz;
MThai = 215.45 Hz). There was also a significant interaction of presentation manner
and musical background: crossovers were similar for both musicians (M = 212.65 Hz)
and non-musicians (M = 212.82 Hz) in the mixed condition, as well as for non-
musicians in the blocked condition (M = 213.9 Hz), but significantly lower for the
musicians in the blocked condition (M = 208.1 Hz; F (1, 45) = 4.397; p < .05). There
was also a significant interaction between presentation manner and tone type: in the
blocked condition, there was little difference in crossover for speech and sine-wave
tones, whereas in the mixed condition, the sine-wave crossover was much higher on
the continuum than the crossover for speech tones (F (1, 45) = 4.631; p = .037),
especially for the Australian non-musicians (F (1, 45) = 5.261; p < .05).
[Figure 7.10 appears here: mean crossover values (Hz, 180–230) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.10. Crossover Values for the Falling continuum for Thai Listeners.
[Figure 7.11 appears here: mean crossover values (Hz, 180–230) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.11. Crossover Values for the Falling continuum for Australian Listeners.
7.6.2.3 Identification d' Results
The d' results (the steepness of the identification function at the boundary location, an indicator of the degree of categoricality) were analysed in 2 x 2 x 2 x (2) language x musical background x presentation manner x tone type analyses of variance, separately for the rising and the falling continua. Descriptive statistics are shown in Figures 7.12 and 7.13 for the rising continuum (for raw data see Appendix A7.7, and for analyses see Appendix A7.8) and in Figures 7.14 and 7.15 for the falling continuum.
Before considering the falling and the rising continua separately, a 2 (rising/falling) x
2 (Thai/Australian) x 2 (musicians/non-musicians) analysis of variance collapsed over
presentation manner and tone type was conducted. The analysis revealed no
significant main effects or interactions.
Identification Accuracy - Rising Continuum: For the rising continuum, no main effects were observed, but there was a language x musical background interaction: Australian musicians performed more categorically than Australian non-musicians, but there was little difference between Thai musicians and non-musicians (F (1, 44) = 4.106, p = .049). There was also a significant interaction of musicianship and presentation manner: in the mixed condition, non-musicians showed more categorical identification, whereas in the blocked condition, musicians were more categorical in identifying tones (F (1, 44) = 9.808, p < .005). No other interactions were significant.
[Figure 7.12 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.12. Descriptive Statistics for d' Values (rising continuum) for Thai Listeners.
[Figure 7.13 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.13. Descriptive Statistics for d' Values (rising continuum) for Australian English Listeners.
Identification Accuracy - Falling Continuum: In the falling continuum, the mean
identification accuracy (d') over all manipulations was 1.460 and analyses of variance
showed there were no significant differences due to language background, musical
background, presentation manner, or tone type.
[Figure 7.14 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.14. Descriptive Statistics for d' Values (falling) for Thai Listeners.
[Figure 7.15 appears here: identification slope (d', 0–3) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.15. Descriptive Statistics for d' Values (falling) for English Listeners.
7.6.3 Discrimination Results
In the discrimination task there were practice trials followed by sets of test trials. Practice trials were not analysed. For the test trials, d' was the dependent variable. Overall differences in discrimination are presented in section 7.6.3.1, and analyses of discrimination peaks and perceptual strategies in section 7.6.3.2.
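On the standard signal-detection definition, d' is derived from the hit rate and the false-alarm rate via the inverse normal CDF, with proportions of exactly 0 or 1 nudged inward so that the transform stays finite; such clamping is what yields finite bounds like the ±8.715 used for the discrimination scale here. A minimal sketch (the clamping value below is a generic choice, not necessarily the correction used in the thesis):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate, eps=1e-4):
    """d' = z(hit rate) - z(false-alarm rate), with proportions clamped
    away from 0 and 1 so the z-transform stays finite."""
    z = NormalDist().inv_cdf
    clamp = lambda p: min(max(p, eps), 1 - eps)
    return z(clamp(hit_rate)) - z(clamp(fa_rate))

print(d_prime(0.99, 0.10))  # sensitive listener: large positive d'
print(d_prime(0.50, 0.50))  # chance performance: d' = 0
```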
7.6.3.1 Overall Discrimination Differences
Descriptive statistics for overall discrimination ability are presented in Figures 7.16 and 7.17 for the rising continuum and Figures 7.18 and 7.19 for the falling continuum, and in summary form in Figures 7.24 and 7.25. For raw data see Appendix A7.9; for analyses see Appendix A7.10. Before considering the rising and the falling continua separately, a 2 (English/Thai) x 2 (musicians/non-musicians) x 2 (rising/falling) analysis of variance was conducted. Musicians showed significantly higher discrimination accuracy than non-musicians (Mmusicians = 1.589, Mnon-musicians = .998; F (1, 124) = 7.316, p < .01). No other significant main effects or interactions were found.
Overall Discrimination, Rising Continuum: In the rising continuum, there were no significant main effects of language background or musicianship, and no interactions of language background with tone type. However, sine-wave stimuli were discriminated significantly better than speech tones (F (1, 48) = 9.672, p < .01). No other main effects or interactions were significant.
[Figure 7.16 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.16. Descriptive statistics for Discrimination Accuracy (rising) in Thai Listeners.
[Figure 7.17 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.17. Descriptive Statistics for Discrimination Accuracy (rising) in Australian Listeners.
Overall Discrimination Differences – Falling Continuum: For the falling continuum,
musicians were found to be significantly better at discriminating the tones (F (1, 48) =
13.45, p < .01). There was also an interaction between language and musical
background: Australian musicians were better at discrimination of tones than Thai
musicians (F (1, 48) = 6.51, p < .05), but there was little difference between
Australian and Thai non-musicians. None of the other main effects or interactions
(neither 2-way nor 3-way) were found to be significant.
[Figure 7.18 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.18. Descriptive Statistics for Discrimination Accuracy (falling) in Thai Listeners.
[Figure 7.19 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.19. Descriptive Statistics for Discrimination Accuracy (falling) in Australian Listeners.
7.6.3.2 Discrimination Peak Analysis
The discrimination results were analysed separately for the two continuum shapes. This was necessary because the location of the discrimination peak has to be analysed relative to the continuum: the middle of the continuum, for instance, lies at 208.75 Hz for the falling continuum and at 231.25 Hz for the rising continuum. Raw data are presented in Appendix A7.11; analyses are presented in Appendix A7.12. Data are schematically represented for the rising continuum in Figures 7.20 and 7.21 and for the falling continuum in Figures 7.22 and 7.23.
Discrimination Peak, Rising Continuum: In the rising continuum, neither the mid-hypothesis nor the flat-hypothesis was confirmed by the results (Fmid (1, 23) = .471, p > .05; Fflat (1, 23) = 1.771, p > .05). There was, however, an interaction of pair type (flat vs. other) and musical background, with the d' for the flat pair significantly lower than for the other pairs in the non-musician but not in the musician group (F (1, 23) = 15.479, p < .05).
[Figure 7.20 appears here: discrimination accuracy (d', –1 to 5) for flat pairs vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.20. Descriptive statistics for the Flat Pair vs. the other Stimulus-Pairs – Thai and Australian Musicians and Non-Musicians (rising continuum).
[Figure 7.21 appears here: discrimination accuracy (d', –1 to 5) for the mid pair vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.21. Descriptive statistics for the Mid Pair vs. the other Stimulus-Pairs – Thai and Australian Musicians and Non-Musicians (rising continuum).
Thus it appears that the flat hypothesis is upheld less for non-musicians than for
musically trained listeners. No other main effects or interactions concerning the mid-
hypothesis or the flat hypothesis were found to be significant.
Discrimination Peak, Falling Continuum: In the falling continuum, both the flat and the mid hypothesis were supported by the data (Fmid (1, 24) = 8.998, p < .05; Fflat (1, 24) = 10.948, p < .05). No interactions concerning the mid-hypothesis or the flat-hypothesis were significant, suggesting that all groups displayed equivalent mid/flat results.
[Figure 7.22 appears here: discrimination accuracy (d', –1 to 5) for flat pairs vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.22. Descriptive statistics for the Flat Pair vs. the other Stimulus-pairs – Thai and Australian Musicians and Non-Musicians (falling continuum).
[Figure 7.23 appears here: discrimination accuracy (d', –1 to 5) for the mid pair vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.23. Descriptive statistics for the Mid Pair vs. the other Stimulus-pairs – Thai and Australian Musicians and Non-Musicians (falling continuum).
7.6.4 Qualitative Evaluation and Summary of Results
Identification functions and discrimination graphs for both language groups and
musicians and non-musicians are shown in Figure 7.24 for the rising continuum and
in Figure 7.25 for the falling continuum.
In the rising continuum, the identification functions look steeper in Thai musicians
and non-musicians compared to Australian English musicians and non-musicians.
When comparing musicians and non-musicians, it can clearly be seen that musicians
show higher discrimination values and steeper identification curves than non-
musicians. In all groups, the category boundary lies very close to the middle of the
continuum. However, this qualitative observation is not reflected in the analyses. For
the rising continuum the concordance between speech and sine-wave identification in
both Thai and Australian English musicians appears to be greater than that in non-
musicians and the endpoints appear to be identified more reliably (closer to the
minimum and maximum values) by musician listeners than by non-musician listeners.
In the falling continuum, the picture is similar in terms of perceptual accuracy:
musicians show steeper identification functions, higher discrimination accuracy and
greater speech/non-speech correspondence (the speech and sine-wave graphs lie
closer together for musicians than for non-musicians). When looking at the crossover
values, however, the results differ from the rising continuum: the crossovers are not
consistently near the middle of the continuum, but rather between the middle and the
flat no-contour tone.
In summary, considering both the combined results in Figures 7.24 and 7.25 and the statistical analyses, it appears that musicians learn to identify tones more quickly than non-musicians do, and discriminate tones with greater accuracy than non-musicians (but only in the falling continuum). For the rising continuum, there is little difference between musician and non-musician Thai speakers' identification accuracy, but Australian English musicians show better identification accuracy than Australian English non-musicians.
Participants who are confronted with just one type of stimulus at a time (in the blocked condition) need fewer trials to reach criterion in identification. In terms of identification boundaries, the perceptual strategy seems to depend on the shape of the continuum, with the perceptual anchor found in the middle of the rising continuum (although this result is not statistically significant), but between the flat no-contour tone and the central tone for the falling continuum.
[Figure 7.24 appears here: Rising Continuum Tone Perception. Four panels (Thai/English × musicians/non-musicians); x-axis: offset values (in Hz) for the 220 Hz onset tone (205–257.5); left y-axis: identification accuracy in % (sine and speech); right y-axis: discrimination accuracy (minimum: –8.715, maximum: 8.715); flat and mid stimulus-pairs marked.]
Figure 7.24. Identification and discrimination for the rising continuum.
[Figure 7.25 appears here: Falling Continuum Tone Perception. Four panels (Thai/English × musicians/non-musicians); x-axis: offset values (in Hz) for the 220 Hz onset tone (182.5–235); left y-axis: identification accuracy in % (sine and speech); right y-axis: discrimination accuracy (minimum: –8.715, maximum: 8.715); mid and flat stimulus-pairs marked.]
Figure 7.25. Identification and discrimination results for the falling continuum.
7.7 Discussion
The results are discussed here, particularly in terms of continuum shape and of differences due to musical background.
7.7.1 Differences Due to Continuum Shape
In this experiment, two differently shaped continua were used: one that consisted of
more rising than falling tones, and another that comprised more falling tones than
rising tones. In the analyses, significantly more trials were required to reach the
identification training criterion in the rising continuum than in the falling continuum.
This is interesting because it suggests that over and above any effects of musical or
language experience, the exact properties of the tone continuum, that is the proportion
of rising vs. falling tones, can affect the ability to learn a new tone contrast. So, even
though the physical separation was equivalent in the two continua, listeners learned to label a slightly rising tone (the upper endpoint of the falling continuum, with an onset of 220 Hz and an offset of 235 Hz) and a steeper falling tone (the lower endpoint of the falling continuum, with an onset of 220 Hz and an offset of 182.5 Hz) faster than a steeper rising tone (the upper endpoint of the rising continuum, with an onset of 220 Hz and an offset of 257.5 Hz) and a slightly falling tone (the lower endpoint of the rising continuum, with an onset of 220 Hz and an offset of 205 Hz).
In terms of crossover location, no differences were found regarding language background, musical background, tone type, or presentation manner in the rising continuum. Thus, all participants' perception changed from one tone category to the other at a similar location on the continuum, namely around the middle (231.25 Hz) of the rising continuum (Mrising = 232.03, sd = 6.203). In the falling continuum, however, there were crossover differences depending on language background, presentation manner, and tone type, with the mean again being around the middle (208.75 Hz) of the continuum (Mfalling = 211.83, sd = 8.80). These differences indicate that even though in both continua the middle is used as a perceptual anchor in identification, there remain differences that may originate in the specific stimulus properties.
7.7.2 Differences Between Musicians and Non-Musicians
Tones are a feature of music in all cultures, but tone is not a feature of speech in all
languages. Musicians, however, are constantly exposed to small tonal changes. Thus,
it was expected that tone perception would be better in musicians than in non-
musicians. The results of the current study confirm this; musicians learn to identify
tones more quickly than non-musicians, and are more accurate at discriminating
subtle differences between tones than non-musicians. Thus it appears that musical
training might improve listeners' speed of learning new non-musical tonal distinctions
and their ability to discriminate tones.
A reason for the superiority of musicians over non-musicians could be that musicians
have better pitch pattern processing skills. Jakobson, Cuddy, and Kilgour (2003)
found that music training engages and refines processes involved in pitch pattern
analysis and these may be activated in the current tasks. Thus it is possible that music
instruction affects categorical perception of tone indirectly by strengthening auditory
temporal processing skills (Jakobson et al., 2003), which allows musicians to
discriminate better between rapidly changing acoustic events.
Another more universal explanation of the difference between musicians and non-
musicians' categorical perception of tone in both speech and non-speech contexts is that the musicians' dominance may originate from the combination of abilities that
music lessons teach and improve, including focused attention and concentration,
memorisation, music reading, and fine-motor skills (Schellenberg, 2001). The
enhanced ability for identifying and discriminating tones as shown in the current
study might be a result of generally enhanced auditory skills evoked by previous
music training. This will be taken up further in the next experiment, in which the relative influence on tone perception of musical training on the one hand, and musical aptitude on the other, will be considered.
The results of this experiment show that musical ability enhances the ability to identify and discriminate tones in both a speech and a non-speech context. Interestingly, there are no systematic differences between speakers of tonal and non-tonal languages in general. This suggests that musical experience is a potent factor that shapes the perception of unfamiliar tones in speech and non-speech contexts more strongly than tonal language experience does.
It is surprising that no differences between the perception of speech and non-speech tones were found in this experiment. One reason for this lack of difference could be
be the artificial nature of the speech tones presented. However, it has been shown in
previous studies that synthetic tones that were generated in the same way as the
stimuli used here are in fact perceived as speech sounds and distinct from non-speech
tones (Burnham & Jones, 2002; S. Chan et al., 1975; Howie, 1976). However,
Burnham and Jones (2002) presented a range of types of non-speech sounds (sine-
wave, filtered speech, simulated violin) and so it is possible that the contrast, for the
listener, of speech and non-speech under such conditions is greater. Clearly this is an
issue that requires further research.
While this result pattern seems very clear, it remains uncertain what exactly it is about
musical training that enhances tone perception. The next experiment will investigate
the locus of musicians' superiority and whether it is restricted to the quasi-musical
stimulus of tone.
CHAPTER 8
Perception and Production of Tones, Consonants, and
Vowels: The Influence of Language Aptitude, Musical
Aptitude, Musical Memory, and Musical Training
8.1 Introduction
As discussed in Chapter 4, music exposure and training can influence speech
processing. In fact, in Experiment 2, it became clear that the influence of musical
background on tone perception might even be greater than the influence of tonal
language background, at least for the language backgrounds studied there, English
and Thai. In order to obtain a more comprehensive understanding of the role of
musicality in tone perception, here musical experience, musical memory, aptitude58
(musical aptitude and foreign language aptitude) will be investigated, especially in
relation to speech perception and production. As lexical tone relies very much on F0
(see Chapter 3) there may well be a link between speech and music at the tonal level.
For this reason the effect of musical variables on the perception and production of tone is investigated here. To analyse whether any such effects are due to a general effect on
phonological perception and production or more specifically on tone, the perception
and production of consonants and vowels are also considered here. In this experiment
the effect of musical and linguistic ability on the perception and production of foreign
language sounds will be investigated. To this end musician and non-musician non-
tone (Australian English) language participants are given tests of musical ability and
linguistic aptitude, along with tests of the perception and production of Thai tones,
consonants, and vowels.
Musicianship was measured by considering musical training and experience. To this
end, groups of musicians, with at least five years of continuous formal musical
training, and non-musicians with no more than two years of musical training were
tested. A questionnaire was administered to specify more precisely the degree and
type of training. In addition to these a priori measures of musicality, further measures
of musical ability were used in order to assess the effect of inherent musicality, free
from the effect of musical training and experience, on foreign language perception
and production. Two types of inherent musicality tests were administered – a test of
musical aptitude, and a test of musical memory.
58 Aptitude is defined as the potential to acquire a skill, a natural tendency to do something well. In this context, it is also the case that there may be between-participant differences in such potential, and these differences may originate from various sources, and not necessarily be genetically inherited.
Thus the design of the experiment involved two groups of Australian English
speaking participants (musicians and non-musicians), given tests of musical aptitude,
musical memory, language aptitude, and Thai speech sound tests on the (a) perception
and (b) production of (i) tones, (ii) consonants, and (iii) vowels. The hypotheses are
set out below ahead of a detailed description of the tests and the method.
8.2 Hypotheses
Hypotheses were advanced regarding each individual ability/aptitude test; for
perception and production; and for the relative contribution of abilities/aptitudes to
speech sound production and perception.
8.2.1 Separate Abilities
8.2.1.1 Musicianship
It is expected that musical training (i.e., the musician vs. non-musician factor) will enhance both perception and production of lexical tones, vowels, and consonants, although, given the experience with musical pitch that musical training affords, it is expected that musical training will enhance the perception and production of lexical tones more than that of consonants and vowels.
8.2.1.2 Musical Aptitude
It is expected that musicians will display higher musical aptitude scores. It is also
expected that high musical aptitude will enhance perception and production of lexical
tones and consonants more than of vowels. For tones, the reason is that musical aptitude measures natural abilities people might have for melody and rhythm; lexical tone can be related to melody, and on this basis it might be expected that tone perception and production will be enhanced by musical aptitude. For consonants, the contrasts used here are based on differences in voicing (voice onset time), so it might be expected that consonant perception and production would be enhanced to the degree that temporal resolution is involved in musical aptitude. For vowels, as the timbre aspect of music, which can be related to vowel features, has not been found to be enhanced in musicians compared to non-musicians (Lamb & Gregory, 1993; Prior & Troup, 1988), it is expected that vowel perception and production will be unaffected by musical aptitude.
8.2.1.3 Musical Memory
Based on previous results (Schellenberg & Trehub, 2003) it is expected that musicians
as well as non-musicians will perform equally well on the musical memory test. It is
further expected that musical memory for the pitch of melodies will enhance speech
perception and production. In particular, due to the pitch-based nature of this ability it
is expected that musical memory will enhance the perception and production of
lexical tones more than it will that of consonants and vowels.
8.2.1.4 Language Aptitude
Musicians and non-musicians are expected to perform equally well on language
aptitude. It is also expected that language aptitude will enhance perception and
production of lexical tones, vowels, and consonants equally.
8.2.2 Relationship between Perception and Production
In all of the above it is expected that perception and production will be equally
enhanced. It is also expected that both musicians and non-musicians will produce
those sounds that they identify with high perceptual accuracy better than those sounds
that are more difficult to identify; that is, there should be positive correlations between
perception and production.
8.2.3 Determinants of Perception and Production
It is expected that musical training, musical aptitude, musical memory, and language
aptitude will contribute to the prediction of participants' ability to (a) perceive and
(b) produce (i) tones, (ii) consonants, and (iii) vowels. In particular it is expected that
musical training will predict production and perception of tone, but not of vowels and
consonants. Musical aptitude is expected to predict perception and production of
tones, and consonants and vowels, and musical memory is expected to predict
perception and production of tones, but not consonants and vowels. It is expected that
language aptitude will predict perception and production of all speech sounds equally.
8.3 Method
8.3.1 Participants
A total of 36 native Australian English participants were tested; 18 musicians (10
female, 8 male, average age: 27.8 years) and 18 non-musicians (10 female and 8 male,
average age: 24.6 years). Participants gave their informed consent (see Appendix
A8.2) and the experiment was covered by Ethics Approval from the University of
Western Sydney (HREC 06/65). All participants were students at the University of
Western Sydney, who received course credit for their participation. None of the
participants had previous exposure to a tone language. Musicians were defined as
instrumentalists/singers having at least five years of continuous formal musical
training (M = 15.7 years, sd = 10.62). Non-musicians were defined as having no more
than two years of musical training (M = .11 years, sd = .47). In addition to this
classification, all participants were given a questionnaire, which included
demographic and sensory information as well as musical history (for details of
participants' musical history see Appendix A8.1). None of the participants had any
self-reported hearing or speech/language problems. Participants were tested
individually in a single session, in a sound-attenuated testing cubicle in the MARCS
Auditory Laboratories at the University of Western Sydney. They were each given
tests of musical aptitude, musical memory, language aptitude, and perception and
production of tones, consonants, and vowels. The order of tests was counterbalanced
between participants and testing took a total of 75 minutes. Stimuli for all tasks were
presented on a laptop computer (Compaq Evo N1000c) over headphones (KOSS
UR20) at a self-adjustable listening level. Details of each test are given below.
8.3.2 Musical Aptitude
The most comprehensive of the musical aptitude tests is the Musical Aptitude Profile
(MAP), designed by Gordon (1965) to measure seven different dimensions of musical
aptitude in students with and without musical knowledge ranging from grade four to
twelve or older. The MAP consists of three sections: tonal imagery, rhythm imagery,
and musical sensitivity. However, the test takes 3 hours and 30 minutes and so here, a
shorter version was employed, the Advanced Measures of Music Audiation (AMMA;
Gordon, 1989). The Advanced Measures of Music Audiation is a recorded aptitude
test that is usually administered to high school or university students with or without
musical experience. The AMMA was chosen because it is reported to represent
stabilised musical aptitude and is thus well suited to studying musicians and non-
musicians. The AMMA (Gordon, 1989) has adequate normative data, its reliability
and validity are adequate, it is independent of musical training and chronological age,
has clear instructions, and takes only 20 minutes to complete. The norm sample
consisted of American college and university music majors (n = 3206), non-music
majors (n = 2130) and high school students (n = 872). Mean reliability of the test,
measured by split-half and test-retest measures, was r = .82, and this value was
approximately the same for all three norm groups.
The AMMA consists of 30 computer-programmed questions with musical material
performed on an electronic instrument. It takes around 20 minutes to administer and is
presented on audio-CD through headphones. Each test question consists of a short
musical statement and a musical answer. For example, the musical statement would
have a different final note than the musical answer or would be partly performed more
slowly or with different phrasing. The student is required to decide whether statement
and answer are the same or different. For “different” judgements, the listener must
then decide whether the difference lies in a tonal (change in tone or key) or
rhythmical (change in duration, tempo, or meter) manipulation. There cannot be a
tonal and a rhythm change. In the practice exercises, it is explained what is meant by
a tonal or a rhythmical change. The listener is required to fill the blank in the “same”
column, if they think the answer is identical to the statement, and if it is judged to be
different, to fill the “tonal” column if the change is perceived as tonal, and the
“rhythm” column if perceived to be rhythmical (see Appendix A8.3 for the answer
sheet). The listeners are instructed not to guess, but to leave items blank if they are
unsure. The change can occur in any position in the answer (beginning, middle, or
end) and the listener cannot answer by counting the number of notes, as there is
always the same number of notes in the question as in the answer. Scoring involved
three steps: a) counting the number of correct answers to obtain a raw score; b)
adjusting the raw scores (according to the procedure in Gordon, 1989); and c)
converting the raw scores to percentile ranks (according to a table included in the test
materials; Gordon, 1989). All scoring was done by the experimenter; because scoring
was completely objective, it was not necessary to involve other scorers.
8.3.3 Musical Memory for Pitch
The Schellenberg and Trehub (2003) musical memory test was devised to measure
absolute pitch memory in musicians and non-musicians using material that would be
familiar to both - the absolute pitch of popular TV themes. Using such tests
Schellenberg and Trehub (2003) have shown that both musicians and non-musicians
show relatively good pitch memory, and variation in this ability is relatively
independent of musical training. Here the original Schellenberg and Trehub (2003)
test was adapted for the current purposes. Twelve 5-second samples from popular
songs were presented to the listener in two forms – at the original pitch level and at a
slightly transposed pitch. The listener was required to decide which of the versions is
the original version (Schellenberg & Trehub, 2003). A list of the 12 songs used here is
given in Appendix A8.4. The selection criterion for which songs to use was
popularity, as determined in a pilot study conducted at MARCS Auditory laboratories,
UWS, Sydney. The position in the songs from which excerpts were taken was not
predetermined; it was selected to be maximally representative of the overall recording
and then saved as CD-quality sound files (44.1 kHz sampling rate). The original
excerpts were shifted by 1 or 2 semitones upward or downward with ProTools
(DigiDesign) digital-editing software, which is commonly used in professional
recording studios. Pitch shifting had no effect on tempo (speed) or overall sound
quality. Direction and magnitude of the pitch shifts were counterbalanced, so that of
the 12 trials, 6 were upward shifts and 6 downward, and of those, 3 were shifted by 1
semitone, and the other 3 by 2 semitones. Pitch shifts involved multiplying (for
upward shifts) or dividing (for downward shifts) all frequencies in an excerpt by a
factor of 1.12 for 2-semitone shifts, and 1.06 for 1-semitone shifts. For example, a 2-
semitone shift upwards involved a change from 262 Hz to 294 Hz. To eliminate
potential cues from the electronic manipulation (that could result in quality
differences between the original and the shifted versions) the pitch levels of the
original excerpts were also shifted upward and then downward 1 semitone (all
frequencies multiplied and then divided by a factor of 1.06).
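The relation between semitone shifts and frequency ratios described above can be sketched as follows. This is a minimal illustration of equal-tempered pitch arithmetic, not the ProTools processing itself; the factors quoted in the text (1.06 and 1.12) are the rounded values of 2^(1/12) and 2^(2/12).

```python
# Equal-tempered pitch shifting: a shift of n semitones multiplies
# every frequency by 2 ** (n / 12).
def semitone_factor(n: int) -> float:
    """Frequency ratio for a shift of n semitones (positive = upward)."""
    return 2.0 ** (n / 12.0)

def shift_frequency(freq_hz: float, n: int) -> float:
    """Apply an n-semitone shift to a frequency in Hz."""
    return freq_hz * semitone_factor(n)

# The 1- and 2-semitone factors round to the values quoted in the text,
# and a 2-semitone upward shift takes 262 Hz to approximately 294 Hz.
print(round(semitone_factor(1), 2))    # 1.06
print(round(semitone_factor(2), 2))    # 1.12
print(round(shift_frequency(262, 2)))  # 294
```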
The excerpts were presented to participants in a Microsoft PowerPoint file. All 12
trials were tested in the same session. Listeners heard one version of a 5-sec excerpt at
the original pitch level and another version at the altered (upward or downward) pitch.
Order of presentation (original-altered or altered-original) was counterbalanced. The
participants activated each excerpt version by clicking on a loudspeaker-icon on the
screen, so they could determine the time separation of the sounds themselves (for an
example of a screen shot see Appendix A8.5). Participants were instructed that they
would hear two versions of the same song on each trial, with one version at the
correct pitch and the other version shifted higher or lower. First, they were required to
indicate whether they had heard the song before, by ticking a box next to the song title
on the answering sheet. Their task was then to identify which was the excerpt with the
original (usual) pitch level (see Appendix A8.4 for the answer sheet). They received
no feedback for correct or incorrect responses during the task. Responses for those
songs that the participants were familiar with were scored as correct or incorrect and
participants were given a proportion correct score for known songs. The task took 4 to
6 minutes, depending on the individual participant's pace.
8.3.4 Foreign Language Aptitude
For the present purposes the Pimsleur Language Aptitude Battery (PLAB; Pimsleur,
1966) was used. The PLAB was developed to measure the ability to learn foreign
languages, and consists of six parts, concerned with different aspects of language
learning – grades, motivation, vocabulary, language analysis, sound discrimination
and sound-symbol association. In part 1, the student gives information about Grade
Point Average (a measure of academic achievement used in the United States) in
areas other than foreign languages. In the second part, interest in learning a foreign
language is assessed; the student indicates whether he/she is interested on a 5-point
scale from “rather uninterested” to “strongly interested”. Part 3 involves vocabulary
assessment of English; participants are required to choose from four possibilities a
word that has “approximately the same meaning” as a given word. In part four, ability
to reason logically in terms of a foreign language is assessed. In the current
experiment, only parts five and six were administered. In part five, the task is to learn
three words in a language (Ewe59) in which nasality and lexical tone are used to
distinguish meaning.
In the first section, two words that differ in nasality (“cabin”, no nasality, [ehɔ],
and “boa”, nasal final vowel, [ehɔ̃]) are to be learnt. These two words are then
presented in sentence context (items 1 – 7) and the participant identifies these by
59 Ewe is an African language, spoken in Ghana and Togo. It has four lexical tones (Capo, 1991).
filling in blanks on the answer sheet (for the answer sheet see Appendix A8.6). Then,
a third word (“friend” [ehɔ] with rising tone) is introduced which differs from one of
the other words (boa) in lexical tone (items 8 – 15). Listeners are required to identify
whether it is “boa” or “friend” that is said in a sentence context. In the last part of the
experiment (items 16 – 30), the listener has to choose from all three words. This test
measures how well listeners can learn to perceive foreign language sounds. This
section is relevant to the current study as it tests the ability to learn new phonetic
distinctions and to recognise them in different contexts, and it is a measure of auditory
ability. Section six consists of 24 nonsense words based on English consonants and
vowels (and essentially English syllable structure). The voice on the tape pronounces
one of four words in a written response set, and the participant simply indicates which
of the four written words was spoken. An example would be an auditory presentation
of the word “trapdel”, with four written possibilities: a) trapled b) tarpled c) tarpdel d)
trapdel. This test is appropriate here because it tests the ability to convert sounds into
written output (see Appendix A8.7 for the answer sheet). In part five, the 30 items
were scored as correct or incorrect resulting in a proportion correct score. In part six,
the 24 items were scored as correct or incorrect and participants were given a
proportion correct score. Taken together, parts five and six take around 12 minutes.
8.3.5 Stimulus Material for Perceptual Identification and Production Tasks
In order to prepare the stimuli for the identification and production tasks, 27 Thai
syllables were recorded, each consisting of a consonant and a vowel, with an
accompanying tone. Three levels of each of these three features (tone, consonant, and
vowel) were examined, giving rise to 3 (tones) x 3 (consonants) x 3 (vowels), that is,
27 different stimulus combinations. In the recording process, native Thai speakers
were asked to read Thai syllables presented on flash cards, each having one
stop [p], and voiceless aspirated bilabial stop [ph]), vowel quality (closed unrounded
front vowel [i], open-mid rounded back vowel [ɔ]60, unrounded closed back vowel
[u]), and tonal contour (tone 0-mid, tone 1-low, and tone 3-high61). Table 8.1 presents
the stimulus matrix. Both male and female native Thai voices were recorded.
60 This vowel will also be labelled /o/. 61 All three of the tones were contour tones (see Figure 8.1).
The three voicing distinctions with the bilabial stop were chosen because in
Australian English only two of these distinctions (the voiceless unaspirated, as in
“spa” and the voiceless aspirated bilabial as in “part”) are phonologically relevant.
Thus, the prevoiced bilabial stop is a non-native speech sound that is phonologically
unfamiliar to the English-speaking participants. In terms of vowels, [i:], [ɔ:], and [u]
were chosen because in Australian English, [i:] as in “heed” and [ɔ] as in “hot” are
native, whereas [u] is not. The tones used in this experiment are the
mid tone (tone 0), the low tone (tone 1) and the high tone (tone 3). One reason for this
choice was that these three tones are distinct in F0, and there is no or very little
overlap between them. Another reason was that, apart from the F0 differences, they
also differ in their degree of contour. The mid and the low tone are similar in pitch
height and contour (both falling contours), whereas the high tone has a rising contour
with a slight fall at the end. All of these tones, at least at the level of phonological
distinctions, are non-native in English, although it could be argued that the mid tone
is the most familiar.
Table 8.1
Matrix Showing Stimuli Differing on Three Levels: Voice Onset Time, Lexical Tone,
and Vowel Quality. Non-native sounds (for Australian English participants) are
underlined.
                           Mid Tone (0)            Low Tone (1)            High Tone (3)
Prevoiced - b              bi:0  bɔ:0  bu:0        bi:1  bɔ:1  bu:1        bi:3  bɔ:3  bu:3
Voiceless unaspirated - p  pi:0  pɔ:0  pu:0        pi:1  pɔ:1  pu:1        pi:3  pɔ:3  pu:3
Voiceless aspirated - ph   phi:0 phɔ:0 phu:0       phi:1 phɔ:1 phu:1       phi:3 phɔ:3 phu:3
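The 27-cell matrix in Table 8.1 is simply the cross of the three factors. A brief sketch of that crossing follows; the transcription strings are illustrative labels only, not the recorded stimuli.

```python
from itertools import product

# The three levels of each factor, as in Table 8.1.
consonants = ["b", "p", "ph"]   # prevoiced, voiceless unaspirated, voiceless aspirated
vowels = ["i:", "ɔ:", "u:"]
tones = ["0", "1", "3"]         # mid, low, high

# Crossing the factors yields the 27 stimulus combinations.
stimuli = [c + v + t for c, v, t in product(consonants, vowels, tones)]
print(len(stimuli))  # 27
print(stimuli[0])    # bi:0
```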
Some of the syllables presented are real words in Thai whereas others are non-words
(see Appendix A8.7 for meanings of the words). The reason for this is simply that not
all combinations of the consonants, vowels and tones have a meaning in Thai.
However, the syllables were mostly unlike English language words, except for [bi:0]
as in “be”, [phi:0] as in “pea”, [phɔ:0] as in “paw”, and [phu:0] as in “poo”.
One male (23 years) and one female (22 years) Thai speaker were employed to record
the stimuli. Both were from Bangkok city and had lived there their whole lives. The
stimuli were recorded by the experimenter, and those (male and female) stimuli that
best matched in terms of F0 contour and duration as judged by a different native Thai
speaker were selected, resulting in a complete set of 27 stimuli from the male and the
female speaker. In both the perception (see 8.3.6) and the production (see 8.3.7) tasks,
the male participants (8 in the musician and 8 in the non-musician group) were
presented with the male speaker stimuli and the females (10 in the musician and 10 in
the non-musician group) were presented with the female speaker stimuli. The reason
for this was so that F0 of participants' productions could be more directly compared to
native speakers' models.
8.3.6 Perception of Speech Sounds
In the perception part of the study, perception of sounds (see 8.3.5) differing in lexical
tone, consonant, and vowel quality is examined. Perception was measured in an
identification test presented in the DMDX (Forster & Forster, 2003) experimental
environment (see Appendix A8.8 for an example DMDX script). There were three
parts: tone identification, consonant identification, and vowel identification, each
consisting of a practice block, a training block, and a test block. In the practice block
nine items were presented, three of each of the contrasting sounds in the relevant
dimension. For example in the consonant part, three prevoiced, three voiceless
unaspirated, and three voiceless aspirated bilabial stops were presented. Following the
practice block, in the training block, a criterion of three consecutive correct responses
had to be reached before testing continued. In this training block, the same sounds
were presented as in the practice block and feedback was provided. In the test block,
no feedback was given to participants, and there were two repetitions of each of the
27 items presented in random order. Given the 27 different stimuli, two repetitions,
and three parts of the test, participants were required to identify 162 items in total.
Participants were provided with three labelled keyboard keys and instructed to “press
the RIGHT key for one kind of sound and the SPACEBAR key for another kind of
sound, and press LEFT for a third kind of sound”. For the consonant task, the keys
were labelled LEFT “ph” (for the voiceless aspirated bilabial), SPACEBAR “p” (for
the voiceless unaspirated bilabial stop), and RIGHT ”b” (for the prevoiced bilabial
stop). In the vowel task, the three keys were labelled “i”, “o”, and “u”, and in the tone
task they were labelled with the tone contour in stylised form (as shown in Figure
8.1).
Figure 8.1. Stylised versions of the tonal contours (mid tone 0, low tone 1, high tone 3) used to label the response keys. Only the contours were presented on the keys.
The order of the experiment parts (tone, consonant, vowel) was counterbalanced
between subjects. In the test phase, responses timed out after 7000 ms, and timed-out
trials were not replaced. Each of the tasks took 4 to 5 minutes, depending on the
participants' pace.
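The training-block logic described above (feedback trials continue until three consecutive correct responses) can be sketched as follows. This is a generic illustration of the criterion rule, not the DMDX implementation; `present_trial` is a hypothetical callback standing in for one feedback trial.

```python
def trials_to_criterion(present_trial, criterion=3, max_trials=100):
    """Run feedback trials until `criterion` consecutive correct responses
    occur; return the number of trials taken (None if never reached).
    `present_trial` runs one trial and returns True for a correct response."""
    streak = 0
    for trial in range(1, max_trials + 1):
        if present_trial():
            streak += 1
            if streak == criterion:
                return trial
        else:
            streak = 0  # an error resets the consecutive-correct count
    return None

# A scripted listener: correct, correct, error, then three correct in a row.
responses = iter([True, True, False, True, True, True])
print(trials_to_criterion(lambda: next(responses)))  # 6
```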
8.3.7 Production of Speech Sounds
In the production task, the same 27 stimuli were used as in the perception task, again
with male speaker stimuli for males and female speaker stimuli for females. Stimuli to
be imitated were presented in a PowerPoint file (see Appendix A8.9 for example
screenshots of the production task). First, all 27 stimuli were presented in random
order and participants were required to repeat each sound. After this randomised
presentation, the sounds were presented in a systematic way. First, the different
consonant sounds were introduced; the three sounds that differed only in voicing were
presented in succession with a crossed-out microphone presented on the display and
the participant being instructed to listen and not repeat the sounds. Then the same
three sounds were presented again; this time separately, and the listener was required
to repeat each of the sounds after it was presented. Similarly, in the tone phase of the
experiment, all three stimuli differing in lexical tone, for example: [bi:0], [bi:1], [bi:3]
were presented for listening then repeating. In the vowel phase, the three stimuli
varying in vowel were presented. After the separate presentation of vowels, tones,
and consonants, all sounds were presented again in random order. Thus, each
participant was required to produce five repetitions of each sound, a total of 135
productions. The production task took around 12 minutes.
Following this experiment, two native Thai phoneticians were employed to rate each
participant's five productions for each sound on a scale of 1 (very bad) to 5 (very
good). Reliability between raters was high (r = .83). Details of the rating procedure
are given in Appendix A8.10. The result for each participant for each of the 27 sounds
was a mean score from 1 to 5.
8.4 Results: Separate Abilities
8.4.1 Musical Aptitude Results
Musical aptitude raw scores transformed into percentile ranks (see 8.3.2) were
analysed using a 2 (musicians/non-musicians) x 2 (rhythm/tone) analysis of variance
(ANOVA). Descriptive statistics (means and standard error bars) are shown in Figure
8.2. (For raw data see Appendix A8.11, and for statistical outputs see Appendix
A8.12.) Musicians scored significantly higher in both tone (F (1, 34) = 6.654, p < .05)
and rhythm (F (1, 34) = 4.745, p < .05) sections. No significant interaction between
the two aptitude scores and musical background was observed (F (1, 68) = .019, p >
.05).
Figure 8.2. Descriptive statistics for mean percentile-ranking scores in the musical aptitude test (tone and rhythm sections) for musicians and non-musicians.
8.4.2 Musical Memory Results
Descriptive statistics for proportion correct for familiar songs (see section 8.3.3) are
shown in Figures 8.3 and 8.4. Raw data and statistical analyses are presented in
Appendices 8.13 and 8.14. A 2 (musicians/non-musicians) x 2 (upward
shift/downward shift) x 2 (one-semitone/two-semitones) analysis of variance
(ANOVA) was conducted. No significant difference was observed between musicians
(M = .859, sd = .113) and non-musicians (M = .836, sd = .134) in the musical memory
test (F (1, 34) = .016, p > .05). There was also no significant difference between
upward and downward shifted songs (F = .056, p > .05). There was however a
significant difference between songs that were shifted by one semitone and songs that
were shifted by two semitones with the greater shifts being identified more accurately
than the smaller shifts (F (1, 16) = 5.095, p < .05).
Figure 8.3. Descriptive statistics for musical memory results (proportion correct) for musicians and non-musicians by shift size (1 vs. 2 semitones) and shift direction (upward vs. downward).
Figure 8.4. Descriptive statistics for musical memory results (proportion correct) for musicians and non-musicians for each song (Franz Ferdinand, Los Del Rio, Eamon, Cat Empire, Nelly/Kelly, Jamelia, Puff Daddy, Outkast, Queen, Men at Work, Black Eyed Peas, Michael Jackson), across shift size and shift direction.
8.4.3 Foreign Language Aptitude Results
Proportion correct scores (see section 8.3.4) for part 5 (sound discrimination) and part
6 (sound-symbol association) of the PLAB language aptitude test were analysed using
two separate analyses of variance (ANOVA). Mean values are plotted in Figure 8.5,
and raw data and analyses are presented in Appendices A8.15 and A8.16.
No significant differences between musicians and non-musicians were observed in
sound discrimination (F (1, 34) = 3.242, p > .05) or in sound-symbol association (F
(1, 34) = 1.462, p > .05).
Figure 8.5. Descriptive statistics for mean scores (proportion correct) in parts five and six of the language aptitude test for musicians and non-musicians.
8.4.4 Speech Perception
In the speech perception task, the listeners were required to identify tones (tone 0,
tone 1, tone 3), voicing (prevoiced [b], voiceless unaspirated [p], voiceless aspirated
[ph]), and vowels ([i:], [ɔ:], [u]). First, the results for all three sound types (tones,
consonants, vowels) are compared in both trials to criterion and test trial identification
performance. Then each speech sound type is investigated separately in more detailed
analyses of the particular tones, consonants, and vowels that were tested. Descriptive
statistics are shown for performance for tones, consonants, and vowels in trials to
criterion (Figure 8.6) and test trials (Figure 8.7) and then separately for tones (Figure
8.8), consonants (Figure 8.9), and vowels (Figure 8.10).
Figure 8.6. Descriptive statistics for mean trials to criterion scores for musicians and non-musicians for tones, consonants, and vowels.
Figure 8.7. Descriptive statistics for perception accuracy for musicians and non-musicians for tones, consonants, and vowels.
Figure 8.8. Descriptive statistics for tone perception scores (low and mid static tones, high dynamic tone) for musicians and non-musicians.
Figure 8.9. Descriptive statistics for consonant perception scores ([b], [p], [ph]) for musicians and non-musicians.
Figure 8.10. Descriptive statistics for vowel perception scores ([u], [i], [o]) for musicians and non-musicians.
Trials to Criterion Analysis
Three outliers were found62 (z > 3.29) and changed to one unit larger than the next
extreme score, as suggested by Tabachnick and Fidell (2001). After dealing with the
outliers, the trials to criterion scores were analysed in an analysis of variance using
planned contrasts to test for differences between speech sound types (see Table 8.2).
Raw data for the criterion results of the speech perception task and statistical analyses
are presented in Appendix A8.17 and 8.18. Descriptive statistics for trials to criterion
results across speech sounds are provided in Figure 8.6.
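The adjustment rule described above can be sketched as follows. This is a generic illustration with made-up scores (not the study data), assuming upper-tail outliers and the sample standard deviation.

```python
from statistics import mean, stdev

def adjust_upper_outliers(scores, z_cut=3.29, unit=1):
    """Replace scores with z > z_cut by one unit more than the largest
    non-outlying score (after Tabachnick & Fidell, 2001)."""
    m, s = mean(scores), stdev(scores)
    non_outliers = [x for x in scores if (x - m) / s <= z_cut]
    ceiling = max(non_outliers) + unit
    return [min(x, ceiling) for x in scores]

# Hypothetical trials-to-criterion scores with one extreme value (60):
data = [4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 60]
print(adjust_upper_outliers(data))  # the 60 becomes 8 (next extreme 7, plus 1)
```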
Table 8.2
Description of the Speech Sound Type Planned Contrasts:
The test investigates (a) differences between pitch-based sounds (tones) and the non-
pitch based sounds (consonants and vowels) and (b) consonants and vowels
                                                 Tones  Consonants  Vowels
Tones (pitch-based) vs. Consonants and
  Vowels (non-pitch-based)                         2        -1        -1
Consonants vs. Vowels                              0         1        -1
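As an arithmetic illustration of how the weights in Table 8.2 combine condition means, a brief sketch follows; the means below are hypothetical, purely to show the computation, and the actual analysis was run as an ANOVA.

```python
# Planned-contrast values: weighted sums of condition means, with
# weights summing to zero (weights as in Table 8.2).
means = {"tones": 6.0, "consonants": 8.0, "vowels": 3.0}  # hypothetical means

contrasts = {
    "pitch-based vs. non-pitch-based": {"tones": 2, "consonants": -1, "vowels": -1},
    "consonants vs. vowels":           {"tones": 0, "consonants": 1,  "vowels": -1},
}

for name, w in contrasts.items():
    assert sum(w.values()) == 0  # each contrast compares groups of means
    value = sum(w[c] * means[c] for c in means)
    print(f"{name}: {value}")
```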
Trials to criterion were analysed in a 2 (musicians/non-musicians) x speech sound
(tones, consonants, vowels) planned contrast analysis of variance (ANOVA) in which
tones were compared with consonants and vowels and consonants were compared
with vowels. As can be seen in Figure 8.6, musicians required significantly fewer trials
to reach criterion than non-musicians (F (1, 32) = 4.70, p < .05), and listeners needed
significantly fewer trials to criterion for vowels than for consonants (F (1, 32) = 48.20, p
< .05). There was also a significant interaction between musical background and the
consonants vs. vowels contrast, with no difference between musicians and non-musicians
for vowels, but fewer trials needed for consonants by musicians than non-
musicians (F (1, 32) = 5.031, p < .05). None of the other main effects or interactions
were found to be significant.
62 Outliers were changed to one unit larger than the next extreme score. There was one outlier in the consonant task (24 trials – changed to 13), two in the tone task (26 and 24 trials – changed to 19), and one in the vowel task (6 trials – changed to 4).
Identification of Tones, Consonants, and Vowels
Speech perception results for tones, consonants, and vowels were analysed in a 2
(musicians/non-musicians) x 3 (tones/consonants/vowels) ANOVA using the planned
contrasts described in Table 8.2. For raw data of the speech perception task and
statistical analysis tables see Appendices A8.19 and A8.20. Descriptive statistics are
provided in Figure 8.7.
The analysis showed that musicians had generally better perception accuracy across
all sounds (F (1, 34) = 13.154, p < .05) and that consonant perception accuracy was
significantly lower than vowel perception accuracy (F (1, 34) = 122.72, p <
.05). No other significant main effects or interactions were observed.
Tone Perception Analysis
Tone perception scores for musicians and non-musicians were analysed using the
planned contrasts described in Table 8.3. Descriptive statistics are provided in Figure
8.8.
Table 8.3
Description of the Contrasts:
The test investigates (a) differences between the two steady63 tones (low tone and mid
tone) and dynamic tone (high) and (b) between the two steady tones
                           Mid   Low   High
Steady vs. dynamic tones     1     1    -2
Steady 1 vs. steady 2        1    -1     0
The analysis revealed that musicians were significantly more accurate than non-
musicians at identifying lexical tones (Mmusicians = .799, sd = .243; Mnon-musicians = .635,
sd = .331; F (1, 32) = 11.31, p < .05). The dynamic high tone was perceived
significantly more accurately than the two steady tones (Mdynamic = .961, sd = .059;
Msteady = .595, sd = .299; F (1, 32) = 120.23, p < .05) and the low tone was identified
more accurately than the mid tone (Mlow = .662, sd = .261; Mmid = .529, sd = .323; F
(1, 32) = 20.05, p < .05). The interaction between musicianship and steady vs.
dynamic tones was also significant (F (1, 32) = 13.35, p < .05) showing that the better
perception by musicians than non-musicians occurred only for the steady low and mid
63 In this analysis, the low and mid tone are referred to as “steady” tones, because they are steadily falling, whereas the high tone rises first and falls at the end.
tones, and not for the dynamic high tone on which both groups performed equally
well.
Consonant Perception Analysis
Consonant perception accuracy was analysed using the planned contrasts described in
Table 8.4, and descriptive statistics are shown in Figure 8.9. The label “native” is
attached to the voiceless unaspirated [p] and the voiceless aspirated [ph] sounds, as
they occur in Australian English, and the label “non-native” to the prevoiced
bilabial [b], as it is not part of the Australian English phonological inventory.
Table 8.4
Description of the Contrasts:
The test investigates (a) differences between native and non-native consonants and (b)
differences between the two native consonants
                                  b (non-native)  p (native)  ph (native)
Native vs. non-native sounds            -2              1            1
Native sound 1 vs. native sound 2        0              1           -1
The difference between native and non-native consonants was found to be significant,
with the native consonants [p] and [ph] being identified less accurately than the non-
native [b] consonant (Mnative = .475, sd = .037; Mnon-native = .841, sd = .140; F (1, 32) =
104.07, p < .05). There was no significant overall difference between musicians and
non-musicians; however, the interaction with musicianship was significant (F (1, 32)
= 6.49, p < .05), with a greater difference between musicians and non-musicians on
the non-native consonant than on the native speech sounds. The difference
between the two native sounds was also significant, with sound [ph] being identified
more accurately than sound [p] (Mph = .632, sd = .267; Mp = .319, sd = .271; F (1, 32)
= 50.31, p < .05), but the interaction with musicianship was not significant. These
results seem surprising at first, as it was expected that the native sounds would be
easier to identify than the non-native consonants. The reason for better identification
of [b] than [p] and [ph] however may be attributed to the use of labels in the current
experiment. The label “b” is unambiguous, whereas “p” and “ph” are more prone to
being confused, which may be the reason for lower identification scores for these two
native sounds.
Vowel Perception Analysis
Vowels were analysed using planned comparisons shown in Table 8.5. Descriptive
statistics are provided in Figure 8.10. The vowels [i] and [o] are labelled native
because they correspond to phonemes in Australian English, and the [u] vowel is non-
native as it is not part of the Australian vowel system.
Table 8.5
Description of the Contrasts:
The test investigates (a) differences between native and non-native vowels and (b)
differences between native vowel [i] and native vowel [o]
[u] [i] [o]
Native [i, o] vs. non-native [u] vowels 2 -1 -1
Native vowel 1 [i] vs. native vowel 2 [o] 0 1 -1
Analysis revealed no significant differences between musicians and non-musicians in
the perception of vowels (F (1, 32) = .946, p > .05), no significant differences
between native and non-native vowels or between the two native vowels, and no
significant two-way interactions.
8.4.5 Speech Production
Rated speech production performance (see 8.3.6) will be considered first generally
across tones, consonants, and vowels and then more specifically for each of these.
Descriptive statistics are shown for all three (in Figure 8.11) and for tones,
consonants, and vowels in Figures 8.12, 8.13, and 8.14 respectively.
Overall Speech Production Results
Speech production results were analysed in a 2 (musicians/non-musicians) x 3
(tones/consonants/vowels) ANOVA using the planned contrasts described in Table
8.2 (section 8.4.4). Results are schematically presented in Figure 8.11, raw data and
analyses are presented in Appendix A8.21 and 8.22. Musicians show generally better
speech production accuracy across all sounds (Mmusicians = 3.24, sd = .523; Mnon-musicians
= 3.01, sd = .451; F (1, 34) = 5.56, p < .05), and vowel production accuracy (Mvowel =
3.32, sd = .0344) was significantly higher than consonant production accuracy
(Mconsonant = 3.01, sd = .0224; F (1, 34) = 47.73, p < .05). No other significant main
effects or interactions were observed.
Figure 8.11. Descriptive statistics for speech production scores for musicians and non-musicians for tones, consonants, and vowels
Figure 8.12. Descriptive statistics for tone production scores for musicians and non-musicians

Figure 8.13. Descriptive statistics for consonant production scores across musicians and non-musicians

Figure 8.14. Descriptive statistics for vowel production scores for musicians and non-musicians
Tone Production
Tone production ratings were analysed in a 2 (musicians/non-musicians) x 3 (mid
tone/low tone/high tone) ANOVA with the same planned contrasts used in the tone
perception analysis (see Table 8.3, section 8.4.4). Results are shown schematically in
Figure 8.12. Musicians were found to be significantly better at producing tones
overall (F (1, 30) = 9.956, p < .05), and the mid tone was produced significantly better
than the high tone (F (1, 30) = 15.86, p < .05). None of the other main effects or
interactions were found to be significant.
Consonant Production
Consonant production ratings were analysed in a 2 (musicians/non-musicians) x 3
(prevoiced/voiceless-unaspirated/voiceless-aspirated) ANOVA using the same
planned contrasts as in consonant perception (see Table 8.4, section 8.4.4).
Descriptive statistics are shown in Figure 8.13. Musicians were significantly more
accurate at producing consonants in general (F (1, 32) = 5.318, p < .05) and
production of the native sounds ([p] and [ph]) was significantly more accurate than
production of the non-native [b] consonant (F (1, 32) = 12.364, p < .05). It was also
observed that the voiceless aspirated consonant was rated as being produced
significantly more accurately than the voiceless unaspirated consonant (F (1, 32) =
16.06, p < .05). No other significant differences were observed.
Vowel Production
Vowel production ratings were analysed in a 2 (musicians/non-musicians) x 3 (vowel
u/vowel i/vowel o) ANOVA according to the contrasts described in Table 8.5. For
descriptive statistics see Figure 8.14. No significant difference was found between
musicians and non-musicians in vowel production (F (1, 32) = 3.676, p > .05),
however, the native [i] vowel was generally produced significantly better than the
native [o] vowel (F (1, 32) = 12.42, p < .05). No other main effects or interactions
were significant.
8.5 Results: Comparison of Perception and Production
Scores on perception (scale from 0 to 1) and production (scale from 1 to 5) of tones,
consonants and vowels in musicians and non-musicians are compared in this section
using Pearson product-moment correlations. The critical value for correlations for
tones, consonants, and vowels overall was rcrit = .231, and the critical value for specific
sounds was rcrit = .40. There were only a few significant
correlations between perception and production in musicians (for [b], vowels overall,
[i], and [o]) and non-musicians (consonants overall). For tones there were no
significant correlations between perception and production. Perception and production
of consonants was positively correlated in non-musicians (r = .265), and there was a
significant negative correlation between perception and production of the prevoiced
bilabial [b] in musicians (r = -.761). This indicates that participants who were good at
perceiving prevoiced stops were poor at producing them, which suggests that the
perceptual salience of this contrast for the perceiver actually detracts from the ability
to produce it. One possible explanation could be that listeners correctly label
the prevoiced bilabial stop but consistently produce it incorrectly. Vowel results
show a significant correlation between perception and production of the vowels in
general in musicians (r = .385) but not in non-musicians. Apart from the general
vowel perception and production correlation in musicians, perception and production
of [o] (r = .529) and [i] (r = .490) were also positively correlated in musicians, but not
in non-musicians.
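The critical values quoted above follow from the standard relation between Pearson's r and the t distribution. As a small sketch (the sample sizes behind the thesis's rcrit values of .231 and .40 are not restated here, so the example below uses the per-group n of 18 purely for illustration):

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of Pearson's r for a sample of size n,
    derived from the t distribution with n - 2 degrees of freedom."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / np.sqrt(n - 2 + t_crit ** 2)

# Example: with 18 participants per group, any correlation computed within
# one group must exceed roughly .47 to reach two-tailed significance at .05.
print(critical_r(18))
```

Larger samples give smaller critical values, which is why the overall measures (pooled across sound types) have the lower rcrit.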
8.6 Results: Determinants of Perception and Production
A principal components analysis was conducted to reduce the variables contributing
to musical training obtained from the questionnaire (see Appendix A8.1
and 8.2), and the resulting factor was then used along with language aptitude, musical
aptitude, musical memory, and musical training in six separate sequential linear
regression analyses to predict perception and production of tones, consonants, and
vowels.
8.6.1 Factor Analysis for Data Reduction
In order to reduce the data for musical training, a principal components analysis was
performed on the three musical experience variables obtained in the questionnaire:
number of instruments played, total number of years playing music, and hours a week
currently played, for all 36 participants (18 musicians and 18 non-musicians). There
were no missing data and the one outlier was dealt with according to the procedure
suggested by Tabachnick and Fidell (2001)64. The raw data and analyses are presented
in Appendix A8.23 and 8.24. A single component with an eigenvalue greater than one
was extracted, which accounts for 72.11% of the variance. The component loadings,
communalities (h2), and percentages of variance explained are shown in Table 8.6. As
can be seen the three variables all load on the component relatively equally. The
factor was labeled musical training.
Table 8.6
Principal Component Loadings and Communalities (h2) for Music Training Variables.
Item                     Factor 1    h2
Number of instruments    .884        .781
Years played             .827        .683
Hours per week           .836        .699
% of variance            72.11
Label                    Musical training
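As a sketch of the extraction procedure (with invented musical-experience data, not the thesis data), a single-component solution under the Kaiser eigenvalue-greater-than-one criterion can be written as:

```python
import numpy as np

# Hypothetical musical-experience data for 36 participants; columns are
# number of instruments, years played, and hours per week. Illustrative only.
rng = np.random.default_rng(1)
skill = rng.normal(size=36)                      # latent "musical training"
X = np.column_stack([
    2 + 1.5 * skill + rng.normal(0, 0.8, 36),    # instruments
    6 + 4.0 * skill + rng.normal(0, 2.0, 36),    # years
    3 + 2.5 * skill + rng.normal(0, 1.5, 36),    # hours/week
])

# Principal components on the correlation matrix (i.e., standardized variables)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep components with eigenvalue > 1 (Kaiser criterion, as in the thesis)
n_keep = int((eigvals > 1).sum())

pc1 = eigvecs[:, 0]
if pc1.sum() < 0:                                # eigenvector sign is arbitrary
    pc1 = -pc1
loadings = pc1 * np.sqrt(eigvals[0])             # first-component loadings
communalities = loadings ** 2                    # h2 for a one-factor solution
pct_variance = 100 * eigvals[0] / eigvals.sum()
score = Z @ pc1                                  # "musical training" factor score
print(n_keep, round(pct_variance, 2), np.round(loadings, 3))
```

The factor score computed on the last line is the kind of composite entered as "musical training" in the regressions below.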
8.6.2 Correlations Between Variables
Correlations between the four independent variables (language aptitude, musical
aptitude, musical memory, and musical training) were computed and are shown
schematically in Table 8.7. As can be seen in the results, there is a high correlation
between language aptitude and musical aptitude. This could indicate the involvement
of a general aptitude effect. A high correlation was also found between musical aptitude
and musical training. Thus, there seems to be overlap between inherent music
abilities (aptitude) and experiential music factors (experience). Musical memory
shows no significant correlation with any of the factors, which suggests that this is an
inherent ability that is unrelated to any other aspect of music or to language aptitude.
64 There was one outlier in the musician group for number of instruments played. It was changed to one unit larger than the next extreme score (from 14 to 7).
Table 8.7
Intercorrelations Among Language Aptitude, Musical Aptitude, Musical Memory, and
Musical Training
Variables                   Language    Musical     Musical    Musical
                            Aptitude    Aptitude    Memory     Training
                            (PLAB)      (AMMA)
Language Aptitude (PLAB)    1           .602**      .169       .218
Musical Aptitude (AMMA)                 1           .277       .528**
Musical Memory                                      1          .101
Musical Training                                               1
** p < .01
Correlations between the six outcome measures (perception and production of tones,
consonants, and vowels) and the four predictor variables are schematically presented
in Table 8.8.
Table 8.8
Descriptive Statistics and Correlations between the Six Dependent Variables and the
Four Independent Variables

                          Language    Musical     Musical    Musical
                          Aptitude    Aptitude    Memory     Training
                          (PLAB)      (AMMA)
Tone Perception .195 .467** .244 .391*
Tone Production .253 .185 .458** .244
Consonant Perception .536** .539** -.117 .315*
Consonant Production .519** .295* .083 .127
Vowel Perception .167 .246 -.027 .269
Vowel Production .325* -.090 .181 -.077
* p < .05, ** p < .01
Table 8.8 shows significant correlations between tone perception and musical aptitude
as well as between tone perception and musical training. This indicates that musically
trained participants and participants with high musical aptitude are also good at
perceiving tones; this is of interest in light of the high correlation between musical
aptitude and training (see Table 8.7), which implies high intercorrelations among all
three of these factors. Tone production is not correlated with the same variables as tone
perception: here musical memory provides the only significant correlation. This tone
production/music memory correlation is intriguing in light of the fact that music
memory is not correlated with any of the other inherent and experiential variables (see
Table 8.7). Of note is that neither tone perception nor tone production are correlated
with language aptitude. The exact relationship between tone perception and
production and measures of inherent abilities and experiential abilities will be
investigated further in the regression analyses.
Consonant perception and production are both highly correlated with language and
musical aptitude. Thus high scores on the inherent ability tests are associated with high
consonant perception and production. Again, this is interesting in light of the high correlation
between music and language aptitude (see Table 8.7) and may indicate that a general
aptitude factor underlies perception and production. Interestingly, as for tone
perception, musical training is significantly correlated with perception of consonants
suggesting a general influence of music training on speech perception.
In vowel perception and production the picture is fairly simple; the only significant
correlation here is between vowel production and language aptitude. Thus neither the
possible general aptitude effect, nor the possible effect of musical training on speech
perception appears to operate. This may be due to the nature of vowels or to the high
scores in vowel perception.
8.6.3 Sequential Regressions
Six separate sequential linear regressions were performed, one each for perception
and production of tones, consonants, and vowels. In each, the four independent
predictor variables were: language aptitude, musical aptitude, musical memory,
and musical training (the variable that was extracted out of the music training
variables in the factor analysis, see section 8.6.1). Raw data and analyses for the
regression are presented in Appendix A8.25 and 8.26. The sequential regression
procedure was chosen in order to identify the additional variance explained by
subsequently entered variables over and beyond the preceding variables. Language
aptitude was entered in the first block, as it is the variable that is expected to be most
closely related to each of the dependent variables (perception and production of tones,
consonants, and vowels). In the next block, musical aptitude was inserted, as, along
with language aptitude the musical aptitude test is designed to measure an inherent
ability. After that, in block 3, the musical memory variable, another presumably
inherent ability, was inserted. In the final block musical training was entered, as it
measures the least inherent factor that has very little relation to predisposition, as it
consists of musical experience data. Regression tables are presented in Appendix
A8.26 and results for each of the six regressions are discussed below.
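The block-entry logic can be sketched as follows. The data are invented and the helper is a simplified stand-in for the statistics package presumably used, but the R2-change and F-change computations mirror the sequential procedure just described:

```python
import numpy as np
from scipy import stats

def r_squared(X, y):
    """R^2 from an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def sequential_regression(blocks, y):
    """Enter predictor blocks one at a time; report R2, R2 change, and the
    F test for each change (df = block size, n - k - 1)."""
    n = len(y)
    results, cols, r2_prev = [], [], 0.0
    for name, X in blocks:
        cols.append(np.asarray(X).reshape(n, -1))
        Xall = np.column_stack(cols)
        k = Xall.shape[1]
        r2 = r_squared(Xall, y)
        dk = cols[-1].shape[1]
        f_change = ((r2 - r2_prev) / dk) / ((1 - r2) / (n - k - 1))
        p = stats.f.sf(f_change, dk, n - k - 1)
        results.append((name, r2, r2 - r2_prev, f_change, p))
        r2_prev = r2
    return results

# Illustrative data only (35 cases, as in the tone-perception regression)
rng = np.random.default_rng(2)
lang_apt, mus_apt = rng.normal(size=35), rng.normal(size=35)
mus_mem, mus_train = rng.normal(size=35), rng.normal(size=35)
y = 0.5 * mus_apt + rng.normal(size=35)   # outcome driven by musical aptitude

blocks = [("language aptitude", lang_apt), ("musical aptitude", mus_apt),
          ("musical memory", mus_mem), ("musical training", mus_train)]
summary = sequential_regression(blocks, y)
for name, r2, dr2, f, p in summary:
    print(f"{name:18s} R2 = {r2:.3f}  dR2 = {dr2:.3f}  F = {f:.2f}  p = {p:.3f}")
```

Reordering the `blocks` list reproduces the kind of alternative-order check applied later in this section.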
8.6.3.1 Tone Perception and Production
Tone Perception: In block 1 language aptitude alone did not predict perception of
tone (R = .195, R2 = .038, adjusted R2 = .009, F (1, 33) = 1.306, p > .05). In block 2
when musical aptitude is added to the regression, the resultant combination did
predict tone perception (R = .479, R2 = .230, adjusted R2 = .182), and the addition of
musical aptitude led to a significant R2 change of .192 (F (2, 32) = 4.775, p < .05).
The addition of musical memory in the next block did not result in significant R2-
changes (R = .494, R2 = .244, adjusted R2 = .171, F (3, 31) = 3.336, p > .05), nor did
the addition of musical training in the final block (R = .522, R2 = .272, adjusted R2 =
.175, F (4, 30) = 2.806, p > .05). Thus musical aptitude appears to be the most
important predictor of perception of tones.
Tone production: Language aptitude alone did not predict production of tone (R =
.253, R2 = .064, adjusted R2 = .036, F (1, 34) = 2.320, p > .05), nor did the
combination of language and musical aptitude (R = .256, R2 = .064, adjusted R2 =
.036, F (2, 33) = 1.157, p > .05). When musical memory is added, however, there is a
significant R2 change (R = .494, R2 = .244, adjusted R2 = .173; R2 change = .179, F (3,
32) = 3.447, p < .05), which suggests that tone production can best be explained by
the addition of musical memory to the regression equation.
8.6.3.2 Consonant Perception and Production
Consonant Perception: Language aptitude alone predicts perception of consonants (R
= .536, R2 = .287, adjusted R2 = .266, F (1, 34) = 13.709, p < .05). The addition of
musical aptitude does not lead to a significant R2-change (R = .600, R2 = .360,
adjusted R2 = .322, F (2, 33) = 9.297, p > .05), however when musical memory is
added to the model, there is significant R2-change (R = .662, R2 = .438, adjusted R2 =
.385, R2 change = .078; F (3, 32) = 8.310, p < .05). Addition of musical training in
block 4 did not reliably improve R2. Thus, perception of consonants is best explained
by a combination of language aptitude and musical memory.
Consonant Production: Language aptitude alone best predicts production of
consonants (R = .519, R2 = .269, adjusted R2 = .247, F (1, 34) = 12.551, p < .05). The
addition of the other variables does not lead to significant R2-changes, which suggests
that language aptitude alone explains production of consonants.
8.6.3.3 Vowel Perception and Production
Vowel Perception: Language aptitude alone does not predict perception of vowels (R
= .167, R2 = .028, adjusted R2 = -.001, F (1, 34) = .979, p > .05), nor does addition of
any further variables. It is of note that there was a ceiling effect in the vowel
perception results, which may account for this lack of significant prediction.
Vowel production: In the case of vowel production, language aptitude alone is not a
significant predictor (R = .325, R2 = .106, adjusted R2 = .079, F (1, 34) = 4.019, p >
.05). When musical aptitude is added, the combination does predict vowel production
(R = .484, R2 = .234, adjusted R2 = .188, R2 change = .128; F (2, 33) = 5.046, p < .05).
Addition of musical memory or musical training in the next blocks does not reliably
improve R2.
Table 8.9 summarizes the predicting variables. It can be seen that tone perception and
production can be best explained by inherent musical abilities (musical aptitude and
musical memory respectively), consonant perception and production are predicted by
inherent language aptitude (and musical memory in perception), and vowel
production is best explained by musical aptitude, whereas none of the variables
explains perception of vowels.
Table 8.9
Summary of Significant Predictors when added to the Regressions. R2 change Values
are shown in brackets.
Speech Sound Type Perception Production
Tones Musical Aptitude (.192) Musical Memory (.179)
Consonants Language Aptitude (.287)
Musical Memory (.078)
Language Aptitude (.269)
Vowels No predictors Musical Aptitude (.128)
Thus, it seems that language and musical aptitude are generally good predictors for
perception and production of foreign speech sounds. Considering the very high
correlation between musical aptitude and language aptitude, and the similarly high
correlation between musical aptitude and musical training, one might suspect that this
pattern of results is influenced by the order of blocks that was chosen. In order to
exclude this possibility, an alternative model was tested in which the first predictor
remained language aptitude; the second block was now musical training, followed by
musical memory and, in the final block, musical aptitude. That is, the positions of musical
aptitude and musical training, which are highly correlated, were reversed. Tables for
the alternative model regressions are given in Appendix A8.27 and are summarised in
Table 8.10.
Table 8.10
Summary of Significant Predictors when added to the Alternative Regressions. R2
change Values are shown in brackets.
Speech Sound Type Perception Production
Tones Musical Training (.128) Musical Memory (.167)
Consonants Language Aptitude (.287) Language Aptitude (.269)
Vowels No Predictors Musical Aptitude (.134)
The results of the alternative regression analyses indicate that, irrespective of the
order of steps, the main results remain the same. There were two changes. First, tone
perception is predicted by music aptitude in the preferred model but musical training
in the alternative model. This suggests that due to their high correlation it is difficult
to ascertain precisely whether musical aptitude or training is the critical predictor of
tone perception. Second, musical memory dropped out as a predictor of consonant
perception in the alternative model. Musical memory and musical aptitude remained
significant predictors of tone production and vowel production, respectively, across the two models.
8.6.4 Alternative Approach to Participant Grouping
Because of the high incidence of low scores on the musical training variable (the non-
musicians' scores on musical training, in particular, were close to zero), the
distributions could be skewed. In order to investigate this further, the musicians' and
non-musicians' data were analysed separately. Six separate sequential linear
regressions were performed for musicians, one each for perception and production of
tones, consonants, and vowels. In each, the four independent predictor variables were:
language aptitude, musical aptitude (the rhythm score and the tonal score were
entered separately, but in the same step of the regression), musical memory, and
musical training. The sequential regression procedure was chosen in order to identify
the additional variance explained by subsequently entered variables over and beyond
the preceding variables. Language aptitude was entered in the first block, as it is the
variable that is expected to be most closely related to each of the dependent variables
(perception and production of tones, consonants, and vowels). In the next block,
musical aptitude, measured as tonal and rhythm (separately) aptitude was entered, as,
along with language aptitude the musical aptitude test is designed to measure an
inherent ability. After that, in block 3, the musical memory variable, another
presumably inherent ability, was entered. In the final block musical training was
entered, as it measures the least inherent factor that has very little relation to
predisposition, as it consists of musical experience data.
For the non-musicians, another set of six sequential linear regressions was performed,
in which the predictor variables were: language aptitude, musical aptitude (the rhythm
score and the tonal score, entered in the same step, but as separate variables), and
musical memory.
Non-Musician results:
Tone Perception: None of the variables predict tone perception in non-musicians.
Tone Production: Language aptitude alone does not predict production of tones,
nor does the addition of either of the two musical aptitude scores; however, when
musical memory is added to the model, there is a significant R2-change (R = .850, R2 =
.723, adjusted R2 = .637, R2 change = .623; F (4, 13) = 8.469, p < .05). Thus,
production of tones is best explained by the addition of musical memory to the model.
Consonant Perception: Language aptitude alone does not predict perception of
consonants, nor does the addition of either of the two musical aptitude scores;
however, when musical memory is added to the model, there is a significant R2-change
(R = .786, R2 = .617, adjusted R2 = .500, R2 change = .341; F (4, 13) = 5.247, p <
.05). Thus, perception of consonants is best explained by the addition of musical
memory.
Consonant Production: None of the variables in the analysis predicts consonant
production in non-musicians.
Vowel Perception: None of the variables in the regression predicted vowel perception
in non-musicians.
Vowel Production: Regression analyses showed that none of the entered variables
predicted vowel production either.
These results from an analysis including only the non-musicians‟ results are different
from those obtained when analysing both groups together: tone perception is not
predicted by musical aptitude in non-musicians, however the predictor for tone
production is still the addition of musical memory to language aptitude. In consonant
perception, the non-musician results are similar to the results for the two groups
combined: the addition of musical memory to language aptitude best predicts
consonant perception, however, unlike the original analysis, language aptitude alone
does not predict consonant perception. Consonant production in non-musicians is not
predicted by any of the variables used here, even though language aptitude was a
significant predictor in the original analysis.
Vowel perception in non-musicians is not predicted by any variables; and this is the
same as in the combined analysis. Vowel perception, originally best predicted by the
addition of musical aptitude to language aptitude, is now not predicted by any of the
variables.
Thus, overall it seems that the exclusion of the musicians from the analysis leads to
fewer significant predictors in the non-musician group's perception and production
with the only significant predictor for these non-musicians being musical memory.
Musician results:
Tone Perception: None of the variables entered in the sequential regression led to a
significant R2 change in musicians‟ tone perception abilities.
Tone Production: None of the variables in the regression predicted tone production in
musicians.
Consonant Perception: Language aptitude alone predicts perception of consonants (R
= .616, R2 = .379, adjusted R2 = .205, F (1, 16) = 9.780, p < .05). The addition of
musical aptitude (tone and rhythm scores, which were entered separately) does not
lead to a significant R2-change, and neither does musical memory or musical training.
Thus, perception of consonants is best explained by inherent language abilities,
measured as language aptitude.
Consonant Production: Language aptitude alone predicts production of consonants in
musically trained participants (R = .517, R2 = .267, adjusted R2 = .221, F (1, 16) =
5.826, p < .05). The addition of musical aptitude (rhythm and tone, entered as separate
variables but in the same step) also leads to significant R2-changes (R = .667, R2 =
.445, adjusted R2 = .326, F (3, 14) = 3.745, p < .05). (More specifically, both tone and
rhythm predict consonant production when tone is entered first, but only tone predicts
it when rhythm is entered first, suggesting perhaps that musical aptitude for tone in
musicians may be more important in consonant production than that for rhythm.)
Furthermore, the addition of both musical memory (R = .748, R2 = .560, adjusted R2 =
.425, F (4, 13) = 4.135, p < .05) and musical training (R = .819, R2 = .671, adjusted
R2 = .534, F (5, 12) = 4.889, p < .05) adds to the predictive power for consonant
production. This suggests that production of consonants in musicians can best be
explained by a combination of language aptitude, musical (especially tone) aptitude,
music memory and musical training.
Vowel Perception: Language aptitude alone does not predict perception of vowels,
nor does addition of any other variables.
Vowel Production: In the case of vowel production, language aptitude alone is not a
significant predictor. The addition of musical aptitude significantly predicts vowel
production (R = .700, R2 = .490, adjusted R2 = .381, F (1, 16) = 4.481, p < .05).
Addition of musical memory (R = .717, R2 = .514, adjusted R2 = .364, F (4, 13) =
3.432, p < .05) and musical training (R = .798, R2 = .637, adjusted R2 = .485, F (5, 12)
= 4.205, p < .05) in the next blocks also reliably improves R2. Thus, vowel production
in musicians is best predicted by a combination of music aptitude, musical memory,
and musical training.
This set of results is different from that obtained in the original analysis in terms of
tone perception and production; these were predicted by the addition of musical
aptitude and musical memory respectively, however, when only musicians are
analysed, no significant predictors were found. Consonant perception is still predicted
by language aptitude; however, the addition of musical memory does not lead to
significant R2 changes. Consonant production, however, which was originally predicted
significantly only by language aptitude, is now predicted by a combination of
language and music aptitude, plus music memory and training.
Vowel perception results are the same in this and the previous analysis: no significant
predictors were found. Vowel production however, previously predicted by musical
aptitude alone, is now predicted by a combination of music aptitude, memory, and
training.
The results of this alternative approach, in which musicians were analysed separately,
show great differences in results: tone perception and production are not predictable,
nor is perception of vowels. Consonant perception is predicted by language
aptitude, and consonant production is predicted by a combination of all four variables.
Finally, vowel production is predicted best by a combination of musical aptitude,
memory, and training.
The differences between musicians and non-musicians are evident: while the non-
musicians' results were only predicted by musical memory, musicians' results are
more complex, and combinations of different variables seem to predict them best,
especially in production of consonants and vowels.
Parallels between the current separate analyses and the previous combined results
for non-musicians are found in production of tones, perception of
consonants, and perception and production of vowels. When comparing the
musicians' results to the original set of predictors, no similarities are found, except for
the fact that vowel perception is still not predicted by any of the entered variables.
Given these differences there is a need for more comprehensive future studies in
which a greater range of musical abilities and training is sampled than in the current
study.
It can be concluded that non-musicians' speech perception and production abilities are
best predicted by musical memory, whereas in musicians, it is a combination of
training and inherent abilities that best predicts their speech perception and production.
However, since the assumptions for these analyses were not met (minimum number of
participants required in the musician group: 24; minimum number in the non-musician group:
20; in the current study only 18 participants per group were tested), the
conclusions drawn from these results are tentative and can only be seen as
exploratory.
8.7 Discussion
The results show that musicians are better than non-musicians at tone perception and
production, consonant perception and production, and in musical aptitude, but there
were no differences between musicians and non-musicians in language aptitude,
musical memory, or perception and production of vowels. There was some correlation
between perception and production of consonants and vowels, however mainly for the
musicians, which is interesting because it shows that, in musically trained listeners,
consonant and vowel perception and production may be processed similarly. The
regression results show that the critical factors in tone perception and production are
the inherent musical abilities: musical aptitude for perception and musical memory for
production.
Consonant perception and production are mainly explained by language aptitude, and
vowel perception is not explained by any of the variables; vowel production, however,
is best predicted by musical aptitude (but see section 8.6.4 for an alternative approach). In
the section below the results are discussed in terms of what the components of
musical training might be, how music training might affect perception and production,
and finally the musical determinants of speech sound perception and production.
8.7.1 The Nature of Musicianship
Accuracy on the musical memory test was similar for musicians and non-musicians.
This confirms earlier results, that both musicians and non-musicians have quite
accurate long-term memory for pitch (Schellenberg & Trehub, 2003). These results
provide evidence that adults with little musical training remember the pitch level of
familiar instrumental recordings, as reflected in their strong ability to distinguish
correct versions from versions shifted upward or downward by 1 or 2 semitones.
Their failure to identify the correct pitch level of unfamiliar musical recordings
excludes contributions from possible artifacts of the pitch shifting process
(Schellenberg & Trehub, 2003).
The very high accuracy with which these musicians and non-musicians without
absolute pitch65 (AP) identified pitch shifting demonstrates that most people retain
fine-grained information about pitch height over long periods. This could indicate that
music listeners create very accurate representations of musical pieces that contain
absolute and relational characteristics (Dowling, 1999). The results also show that this
type of absolute memory for pitch is much more widespread than the traditional type
of AP, in which the listener names or reproduces tones, isolated from musical
contexts. It is therefore possible that it is the aspect of pitch naming, rather than that of pitch memory, that is responsible for the rarity of the traditional form of AP (see also
Chapter 4).
65 Two of the participants in the current experiment were possessors of absolute pitch. This did not, however, influence the results, as they did not show enhanced musical memory (.63 and .91 proportion correct) compared to participants without absolute pitch (M = .854).
Musicians' musical aptitude scores were found to be significantly higher than non-musicians' scores (see 8.4.1) and there was a high correlation between music aptitude
and training (see 8.6.2). There are two main possible explanations of this. The first
concerns self-selection: those people who have high musical aptitude are probably
more likely to learn an instrument, due possibly to self-motivation or encouragement
from observant parents or teachers. Secondly, once musical training begins, those
people with higher musical aptitude quite probably learn music more quickly and
easily, and thus do not quit playing music as easily as people with lower aptitude.
Musicians and non-musicians scored similarly on language aptitude (see 8.4.2) and
there was a low correlation between music training and language aptitude. Thus it
appears that musical training is not related to language aptitude in the measures used
here.
8.7.2 Musicianship and the Perception and Production of Speech Sounds
Based on the results of the separate abilities, it appears that what distinguishes musicians from non-musicians is their musical experience and higher musical aptitude, with no differences in musical memory or language aptitude. Now the effect
of musicianship on the perception and production of speech sounds will be
considered.
The results of the speech perception and production tasks show that musicians are
significantly better than non-musicians at perceiving and producing tones and
consonants but there was no difference between musicians and non-musicians in the
perception and production of vowels. The possible reasons for the systematic
superiority of musicians over non-musicians are set out below, separately for tones
and consonants.
Tones are a feature of music in all cultures, but not a feature of speech in all
languages. Musicians are continually exposed to small tonal variations. It was
therefore expected that tone perception would be better in musicians than in non-
musicians. The results of the current study confirm this; musicians learn to identify
tones more quickly than non-musicians, and are more accurate at discriminating
subtle differences between tones than non-musicians. Thus it appears that musical
training might enhance listeners' acquisition of new non-musical tonal distinctions
and their ability to perceive and produce tones. A reason for the superiority of
musicians over non-musicians could be that musicians have better pitch pattern
processing skills. Jakobson, Cuddy, and Kilgour (2003) found that musical training
engages and refines processes involved in pitch pattern analysis and these may be
activated in the current tasks. Thus it is possible that music instruction affects
categorical perception of tone indirectly by strengthening auditory temporal
processing skills (Jakobson et al., 2003), which allows musicians to discriminate
better between rapidly changing acoustic events. It is quite possible that such skills
may be of use here in the foreign language perception and production tasks.
For consonants, the essential differences between [b], [p], and [ph] are small voice onset time (VOT) differences. Music is not only about melody and harmony, but
also about rhythm and timing. So one reason why musicians are better at perceiving
and producing consonants could be that they are more finely tuned to small rhythmic
differences than non-musicians. In fact it has been found that musicians are better
than non-musicians in judging temporal order (Koh, Cuddy, & Jakobson, 2001). This
skill may be useful in the consonant perception task here, which required fine-grained
discrimination of VOT. Indeed, Koh et al. (2001) suggest that increased exposure to
temporal distinctions may improve processing of subtle temporal order differences, such as small voice onset time distinctions.
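As an illustrative sketch only, the three-way labial stop contrast can be thought of as a categorisation along the single VOT dimension. The boundary values below (~0 ms and ~+30 ms) are rough, textbook-style approximations assumed for illustration, not the stimulus values used in the experiments:

```python
def vot_category(vot_ms: float) -> str:
    """Classify a labial stop by voice onset time (VOT), in milliseconds.

    The boundaries (~0 ms and ~+30 ms) are rough illustrative
    approximations, not measurements from the experiments reported here.
    """
    if vot_ms < 0:
        return "b"   # voicing lead (prevoiced): voicing starts before release
    if vot_ms < 30:
        return "p"   # short lag: voicing starts shortly after release
    return "ph"      # long lag: aspirated

print([vot_category(v) for v in (-80.0, 10.0, 70.0)])  # ['b', 'p', 'ph']
```

The point of the sketch is that the perceptual task reduces to locating a token relative to two temporal boundaries a few tens of milliseconds apart, which is why fine temporal sensitivity plausibly matters.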
8.7.3 Musical Determinants of Speech Perception and Production
Now that we know the nature of musical training and the influence of musical training
on perception and production of speech sounds, the relative contribution of musical
training and other variables to speech sound perception and production can be
considered.
Musicians score higher on musical aptitude and are better at speech sound learning, tone identification, and identification of non-native consonants, and they show greater correlations between perception and production of speech sounds than non-musicians. However, a more
analytic look at the results through regression analyses shows that it may not be
musical training per se that contributes to the explanation of speech sound perception
and production. The determinants for tone, consonants, and vowels are considered
separately below.
Tones: Language aptitude is not important in the perception and production of lexical tone; rather, it is the additional effect of musical aptitude that makes a good tone perceiver, and of musical memory that makes a good tone producer. Musical training does not predict tone perception or production, showing that it is not necessary that listeners have actual experience in perceiving or producing small tonal changes (when learning an instrument). Rather, inherent musical aptitude and inherent musical memory aid in tone perception and production, irrespective of degree of training. A rider must be added to this explanation for tone perception. There, in the
alternative regression model, it was in fact the addition of musical training that added
predictive power to the regression. This, along with the high correlation between musical aptitude and musical training, suggests that neither musical training nor musical aptitude per se consistently predicts tone perception66. However, tone
production is resistant to the model change and musical memory remains the
significant addition in both models67.
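The sense in which a variable "adds predictive power" in these regression analyses can be illustrated with a small simulation. The data below are invented for illustration only (they are not the thesis data): a tone-perception score is generated so that it depends mostly on a "musical aptitude" predictor, and the in-sample R² rises when that predictor is added at step 2.

```python
import random

random.seed(0)
n = 120
# invented, roughly standardised predictor scores (names mirror the thesis measures)
lang_apt = [random.gauss(0, 1) for _ in range(n)]
music_apt = [random.gauss(0, 1) for _ in range(n)]
# simulated tone-perception score, driven mostly by "musical aptitude"
tone_perc = [0.1 * la + 0.7 * ma + random.gauss(0, 0.5)
             for la, ma in zip(lang_apt, music_apt)]

def ols_r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    X = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(X[0])
    # normal equations (X'X) beta = X'y, solved by Gaussian elimination
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    y_hat = [sum(be * xi for be, xi in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

r2_step1 = ols_r_squared([lang_apt], tone_perc)             # language aptitude only
r2_step2 = ols_r_squared([lang_apt, music_apt], tone_perc)  # + musical aptitude
print(f"R2 step 1 (language aptitude only): {r2_step1:.3f}")
print(f"R2 step 2 (+ musical aptitude):     {r2_step2:.3f}")
```

In this simulated case R² at step 2 is clearly higher than at step 1, which is the pattern reported above for tone perception when musical aptitude is added to language aptitude.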
Consonants: For consonants, it is clear that language aptitude alone can predict
participants' ability for producing consonants. This is also the case in consonant
perception; however, the addition of musical memory also adds to the prediction.
Together, these results show that it is inherent language ability that leads to good
consonant perception and production abilities and for consonant perception the
inherent ability, musical memory (not musical training), also adds to the results68.
Vowels: Finally, for vowels, none of the independent variables served as a good
predictor for vowel perception (note the ceiling effect in vowel perception, see section
8.4.4). In vowel production, however, the best predictor was a combination of
language and musical aptitude. This indicates that language aptitude alone does not predict vowel production and that musical training is, again, not required to explain accuracy of vowel perception or production69.

66 Note that when musicians' and non-musicians' data are analysed separately, this is confirmed: none of the factors in the analysis predicts tone perception in either group.
67 However, it should be noted that when musicians' and non-musicians' data are analysed separately, musical memory predicts tone production only in non-musicians, not in musicians.
68 However, when musicians' and non-musicians' data are analysed separately, consonant perception is predicted by language aptitude in musicians and by musical memory in non-musicians; consonant production is predicted best by a combination of all factors in musicians, but not by any of the variables in non-musicians.
Together, the results show that it is not musical training as such that leads to
musicians' superior performance on speech perception and production tasks, but rather that inherent abilities like musical aptitude, musical memory, and
language aptitude enhance speech perception and production70. In short, playing an
instrument per se does not make foreign language sound learning easier, but aptitude
for music and language makes a good producer or perceiver of new speech sounds.
These results suggest that the reason why musicians are better at speech perception
and production is not due to their experience with music perception and production,
but rather to their predisposition for music, which, in turn, could be related to their
motivation to learn an instrument (see section 8.7.2.1) and to their continued musical
experience, while those with less aptitude do not continue with musical training.
69 Similarly, when vowel perception was analysed for musicians and non-musicians separately, none of the factors predicted vowel perception in either group or vowel production in non-musicians; vowel production in musicians was best predicted by musical aptitude, memory, and training.
70 However, when musicians' and non-musicians' data are analysed separately, inherent abilities and training predicted only the performance of musicians, not of non-musicians, for whom musical memory seemed to be the best predictor.
9.1 Summary of Results
Here the results of Experiments 1, 2, and 3 are summarised ahead of their discussion and interpretation. The final section considers implications of the current
experiments and suggestions for future research.
9.1.1 Experiment 1: Categorical Perception of Speech and Sine-Wave Tones in
Tonal and Non-Tonal Language Speakers
Previous studies of the categorical identification and discrimination of tone typically
employed just a single tonal language. Moreover, the studies often examined only one part of categorical perception: identification or discrimination.
Experiment 1 here concerned the categorical perception of novel synthetic tone
continua realised as both speech and non-speech stimuli in speakers of tonal and non-
tonal languages. The results in terms of language background were mixed. Mandarin
and Vietnamese listeners' perception was similar, but differed from Thai listeners'
perception, which was unexpectedly similar to that of Australian English listeners.
Thus, there was no clear-cut distinction between tonal and non-tonal language
listeners. Different perceptual strategies were observed between language groups
(mid-continuum strategy for the Vietnamese and the Mandarin listeners, and flat-
anchor strategy for the Thai and the Australian English listeners), but these
differences did not correspond to tonal vs. non-tonal language background. These
results strongly suggest that, from the point of view of proficiency or strategy choice, it does not matter whether the listener's native language is tonal, but rather, if a tonal language is spoken, which tonal language is spoken. Indeed, it might
even be the case that particular features associated with tone categories in the native
language, e.g., durational differences in Vietnamese, might affect the categoricality of
tone perception for a new synthetic tone continuum.
In summary and conclusion, Experiment 1 here is the first study to compare a range of
tonal languages along with a non-tonal language. The results show that proficiency
and strategies differ across tonal and non-tonal language speakers confronted with a
new synthetic tone continuum. Not all tonal language speakers use the same
perceptual strategies, and each tonal language must be considered and analysed
separately.
An interesting side observation made in Experiment 1 was that tonal and non-tonal
language-speaking musicians behaved differently from non-musicians. Ad hoc
analyses suggested that musicians required fewer trials to reach criterion in
identification, exhibited more consistent identification patterns (identification was
more categorical), and were more accurate at discriminating tones. These observations
suggest that not only language background but also musical training appears to
influence tone perception. The reasons for this were unclear in Experiment 1 and called for more systematic investigation.
9.1.2 Experiment 2: Perception of Speech and Sine-Wave Tones - The Role of
Language Background and Musical Training
The influence of musical background on the perception of tone was tested further in
Experiment 2, in which tonal and non-tonal language speakers (Thai and Australian
English speakers because of their strategy similarities observed in Experiment 1) with
and without musical training were tested on categorical identification and
discrimination tasks. Two different continua, one with more falling and fewer rising tones, the other with more rising and fewer falling tones, were employed in this experiment, in contrast to the single continuum used in Experiment 1. In this
controlled experimental manipulation of language and music background it was
shown that the ability to perceive a novel synthetic tone continuum categorically did
not differ appreciably as a function of language background (tone, Thai, vs. non-tone,
Australian English), but it was found that musicians compared with non-musicians
learn to identify tone categories more quickly, show more consistent tone labelling
abilities, and have better tone discrimination abilities. Interestingly, it was found that,
independently of language and music background, identification and discrimination
accuracy was higher for the falling than for the rising continuum. This suggests that
perception of tones does depend on the shape of the continuum that is presented.
9.1.3 Experiment 3: Perception and Production of Tones, Vowels, and
Consonants - The Influence of Training, Memory, and Aptitude
The question of whether the perceptual advantage found in Experiment 2 extends to musicians' production as well as perception of tones was addressed in Experiment 3. In order
to specify clearly what effects musicianship might have on perception and production
of tone compared with those on speech more generally, not only tone, but also
consonant and vowel perception and production were tested. In order to investigate the effect of musicianship per se on tone and phone perception and production,
non-tone (Australian English) musician and non-musician participants were tested
with non-native speech sounds - Thai tones, consonants, and vowels. Moreover, in
order to specify clearly what aspect of musicality might be the critical factor,
measures of musical training, musical aptitude, and musical memory were employed
(along with a measure of language aptitude as a control for inherent linguistic skills
independent of musical training).
Overall, there appeared to be a ceiling effect for vowel perception, due to the task
being quite easy and resulting in uniformly high identification scores. This aside,
musicians were better able to perceive and produce both tones and consonants than
non-musicians. However, when the specific predictors of these abilities were
considered, it was found that they differed according to speech sound type: consonant
perception and production were predicted by language aptitude (arguably an
autoregression effect), except that in consonant production adding musical memory
also added predictive power; whereas tone perception and production were best
predicted when musical aptitude and musical memory respectively were added to the
regression equation. Thus the type of musical influence was entirely unexpected:
rather than musical training, i.e., the amount and type of musical experience, it was
the inherent musical ability, musical memory, that best predicted tone production, and
for tone perception it was musical aptitude, although, in line with a strong correlation
between musical aptitude and musical training, there may well have been a co-determination by musical aptitude/training.
9.2 Strategy Effects in Tone Perception
In Experiment 1, there was evidence for the differential use of two different strategies
in the categorical perception of a novel synthetic tone continuum: Mandarin and
Vietnamese tonal language speakers tended to divide the asymmetric tone continuum
into above and below the centre; whereas Thai and Australian listeners appeared to
use a different strategy, dividing the continuum into tones above and below a flat no-
contour tone.
It seems that for the Thai and non-tonal Australian English listeners it is easier to use
a perceptually salient tone (a flat tone with 200 Hz onset and 200 Hz offset) as the reference
point at which to create a boundary between tones of one and the other category –
rising or falling. For Mandarin and Vietnamese listeners, however, the separation
point is located at the centre of the continuum; they appear to create a new tone space
for the task and this tone space is perceptually-based, with the centre point creating
the boundary between the two tone categories. These two approaches are different.
The flat-anchor strategy used by Thai and English listeners is a more acoustically or
psychophysically-based strategy - the perceptually different and thus presumably
more salient flat stimulus is used as the category boundary. All tones above the flat stimulus are categorised as one tone type, presumably 'rising', while the tones below the flat tone are categorised as another tone type, presumably 'falling'. In the mid-continuum strategy used by the Mandarin and Vietnamese listeners, on the other hand, the flat tone appears to play a less critical role in their perception of the asymmetric
continuum: they simply divide it into two equal halves, depending on the values of the
two endpoints. This appears to be a more linguistic approach and perhaps one that, all
other things being equal, is potentially a more adaptive and profitable approach when
learning a new (tone) language.
Another reason for the differences between the three tonal languages could be the
relative similarity of the synthetic tones used in the experiment to the actual tones of
the tonal languages (Thai, Vietnamese, and Mandarin). Thai has a relatively large
proportion of static tones (three static, two dynamic tones) compared to Vietnamese
(two static, four dynamic tones) and Mandarin (one static, three dynamic tones). Thai
listeners' greater exposure to static tones in their everyday linguistic experience may
predispose them to use the static 'flat' tone as a perceptual anchor more readily. This requires further investigation via specific experiments, such as systematically varying synthetic tones from specific tone spaces, or training groups on different synthetic tone spaces and testing for transfer to new tones. The results of such studies would assist in determining the plausibility of the indication here that listeners' tone space influences their perception of novel tonal spaces.
The reason why Australian listeners, who are not required to attend to subtle tonal
differences at the lexical level in their own language, use the acoustic approach
appears to be obvious. The flat tone in the synthetic continuum is the most
acoustically salient (not falling, not rising) tone and therefore is used as a perceptual
anchor. Moreover, this is what Australian English (non-tonal) language speakers
might consider to be the more “normal” speech sound. What remains unclear is why
the Thai listener group, unlike the other tonal language groups, uses the same,
seemingly acoustic approach to tone perception as the Australian English listeners.
One reason could be that the Thai listeners' pre-existing linguistically-based
perceptual anchors, derived from the tone values and tone space of their native
language, just happen to coincide with the flat no-contour tone, which in turn is non-tonal language listeners' acoustically-based anchor. While the veracity of this explanation cannot be confirmed here, it provides grist for future experiments in tone perception (see section 9.5). This pattern of results for tone perception strategies, together with the observed differences between perception of differently shaped continua, shows that it is not possible to generalise from results based on speakers of one tonal language to the speakers of all tonal languages, and that each group of language
speakers must be considered separately, as must the specific tone space characteristics
of their particular languages.
9.3 Musicians’ Advantages in Speech Perception and Production
The results of all three experiments show that musicality is associated with perceptual
advantages in identification, discrimination, and production of tones. These
advantages are discussed in further detail below in relation to (i) transfer to musical
tasks (ii) transfer to related linguistic tasks and (iii) transfer to less related linguistic
tasks.
9.3.1 Musical Experience – Transfer to Musical Tasks
The results of Experiment 3 show that musicians have higher musical aptitude scores
than non-musicians and there is a high correlation between musical aptitude and
musical training. This pattern of results is not very surprising, considering that people
with high musical aptitude are more likely to take up and keep playing an instrument
than people who find learning an instrument difficult. Another reason for musicians' better performance on musical aptitude tests could be that they are more used to listening to subtle differences in music, and therefore their perception is more fine-tuned than non-musicians' perception, especially since most music curricula start with
classical music, and the musical aptitude stimuli are based on Western classical
tonality and rhythm. Together, it seems only natural that musical training enhances, or
encourages or allows expression of, abilities that rely on similar processes as those on
which musical skills are based. However, that said, it is of interest that there are no
consistent differences between musicians and non-musicians in terms of musical
memory; musicians and non-musicians perform equivalently on this task, and musical
memory does not correlate with any of the other musical or linguistic tasks here. Thus, while musicianship (musical experience) and musical aptitude seem closely related, musical memory is independent of these, though note the ceiling effect in the musical memory results. This ceiling effect indicates that not only did a large proportion of participants not vary in training, they also did not vary in terms of memory. A more challenging memory task would presumably yield reliable differences between musicians and non-musicians, as well as larger individual differences within the groups.
9.3.2 Musical Experience – Transfer to Related Linguistic Tasks
In Experiment 3 it was found that musicians are better than non-musicians at
perception and production of lexical tones. The reasons for this are very likely related
to the fact that tone is a feature of both speech (in tonal languages) and music. Thus,
the frequent exposure to fine tonal differences may enhance musicians‟ tone
perception. While on the surface this explanation appears reasonable, the fact that experience with tones in speaking a tonal language (Thai) does not appear to systematically improve the perception of a new tone space (see Experiment 2 results) suggests that this may not be the complete picture. It may well be the case that it is musical aptitude
and not musical experience that facilitates lexical tone perception.
Indeed the results of the regression analyses suggest that tone perception and
production are best predicted by inherent musical abilities, musical aptitude and
musical memory. Even though there may be some involvement of musical training
(see the test of the alternative model in section 8.6.3), musical training alone cannot
account for tone perception and production ability.
Elevated musical aptitude may be related to more general abilities, specifically to enhanced auditory processing capabilities in listeners with musical training and/or aptitude. Jakobson, Cuddy, and Kilgour (2003) found that musical training
engages and refines processes involved in pitch pattern analysis. They hypothesise
that music instruction strengthens auditory temporal processing skills, which would
enable musicians to discriminate better between rapidly changing acoustic events, a
skill that is necessary in at least the tone perception tasks in the current series. Given
the overlap between musical training and musical aptitude here, and the frequent confounding of these factors in previous studies, it is possible that the auditory
temporal processing skills mentioned above might be just as related to musical
aptitude as to musical training.
Another consideration speaking against the notion that musical expertise per se facilitates tone perception and production is that in Western music the goal is generally to produce stable pitches; in this experiment, none of the three tones was stable.
9.3.3 Musical Experience – Transfer to Less Related Linguistic Tasks
If indeed the skills uncovered or described by musical aptitude and/or musical training
are related to general auditory processing skills, then ability for other linguistic skills, not just tone perception, should be elevated in musicians
compared to non-musicians. Such a contention is supported by the results of
Experiment 3, which show that musicians have more accurate consonant perception
abilities than non-musicians. Thus, the reason that musicians are better at perceiving
consonants than non-musicians could be that they are more finely tuned than non-musicians, not only to subtle tonal differences, but also to small timing differences. It
has previously been found that musicians are also better than non-musicians in
judging temporal order (Koh et al., 2001). Such an advantage may be attributed to musicians' frequent exposure to small temporal changes. However, given the close
relationship found between musical aptitude and musical training in the current
experiments, whether the advantages that musicians show here on linguistic tasks are
the result of acquired skills or latent aptitudes must await further experimentation.
9.4 Locus of Musicians’ Superiority
The results of the current experiments have shown that musicians are better at
learning new tonal contrasts, identification of novel tones, discrimination of subtle
tonal differences, and perception and production of novel consonants and tones.
The reasons for this are complex. Correlation results show that musical training is
highly correlated with musical aptitude, which, in turn, is correlated with language
aptitude. The regression results reveal that it is not musical training per se that
predicts perception and production of speech sounds. Thus, the question is: what is
the exact locus of musicians‟ superiority?
Linguistic ability for tones is best predicted by a combination of language and musical
aptitude (for tone perception) and musical memory (for tone production), but not by
language aptitude alone. Indeed, it is the addition of musical aptitude and musical memory to the regression equations for tone perception and tone production respectively that significantly improves prediction. Thus, it appears that it is not
musical training as such that leads to good tone perception and production, but
inherent musical abilities. There is, however, a rider to this conclusion: the high
correlation between musical aptitude and musical training plus the success in
prediction by musical training in the alternative regression model suggests that it may
be difficult to tease apart the effects of musical aptitude and musical training.
Nevertheless, the results strongly suggest that there may well be a component of
musicality that predicts tone perception and production that is independent of musical
training or experience. To find out more about this and how it may relate to a more
general underlying aptitude, consonant and vowel results need to be considered.
Consonant perception is best explained by language aptitude alone (and the addition
of musical memory in production). Again, musical training does not appear to play a
role and this is the case even under the alternative regression model. This involvement
of language aptitude makes intuitive sense, as the language aptitude test measures the
ability to learn new speech sounds. However, what requires addressing is why
consonant perception and production, but not tone perception and production are
predicted by language aptitude. It may well be the case that consonant perception and
production are more linguistic in nature than tone perception and production. One line
of evidence to support such a view is the fact that consonants are perceived more
categorically than tones or even vowels (see Chapter 2). However, while this may be
the case in general, what is shown here is that a particular (sub-) test of language
aptitude (Pimsleur, 1966) does not predict non-native participants' perception and
imitation of these naturally produced tone contrasts. Given the unfamiliarity of lexical
tone contrasts for these non-native listeners, it may not be too surprising that listeners
do not process these linguistically (as the previous experiments also suggest), but
perhaps more acoustically or even musically.
Turning to vowels, there is no contribution of language aptitude to the ability for vowel perception or production. Indeed, for vowel perception there were no significant predictors. For vowel production, however, musical aptitude was a significant predictor. It should be noted, though, that the vowel perception task here was the easiest task overall, and the regression results are the least comprehensive. Further studies should be conducted before definitive conclusions can be drawn regarding vowel perception and production71.
The fact that musical training as such does not explain good perception and
production of speech sounds suggests that previous research may have confounded
musical training and musical aptitude. Such studies must, therefore, be considered
with caution, and future studies investigating speech processing skills in musicians and non-musicians should consider inherent factors (language aptitude, musical aptitude, and musical memory) in addition to experiential factors such as musical training history. This and other directions for future research are considered in the final section.

71 It also needs to be noted that when musicians' and non-musicians' data were analysed separately, non-musicians' speech perception and production in general were best predicted by musical memory, whereas a combination of musical training and inherent abilities best predicted the musicians' results. These results should, however, be considered with caution, as the necessary assumptions for this separate analysis were not completely met.
9.5 Suggestions for Future Research
The results of the current experiments have added to our understanding of the role of
musical ability in language ability, but have also opened a range of further questions.
Suggestions for future research, picking up on such questions, are discussed below.
9.5.1 Relationship between Tone Space and Intonation Space
In order to find out more about how specific characteristics of listeners' native tone space (or intonation space, in the case of non-tonal intonation languages such as English) shape tone perception, further experiments are required in which cross-language tone and intonation are manipulated. In this way, it could be tested exactly how the native tone space influences the acquisition of a new tone space.
9.5.2 Investigation of the Relationship Between Musical Training and Musical
Aptitude
The results of the current studies indicate that there is a high correlation between
musical aptitude and musical training. This may, however, be at least partly due to the
selection criteria (no less than five years of musical training for musicians and no
more than two years of training for non-musicians). In order to separate the effects of
musical training and musical aptitude, there needs to be a greater range of degrees of
musical experience. Future studies should therefore sample more widely from the
population of musical and non-musical participants.
9.5.3 Development of Musicality
Another important issue to consider in future experiments is the development of
musicality. Experiments with children may uncover whether musicality is a learned
skill that can be acquired at any age or whether there is a critical period for music
acquisition, similar to that for language acquisition. Some research in this area was
considered in Chapter 4. However, studies with children are required to determine the
degree of aptitude in the presence or absence of musical training, perhaps also in
relation to their foreign language learning ability.
9.5.4 Acoustic Analyses of Speech Production Ability
In order to discover more about the production skills of musicians and non-musicians,
speech productions need to be analysed acoustically. The question here would be:
what exactly is the difference between the "good" and the "bad" producers? There are
various possibilities: in tone production, accuracy of overall pitch, goodness of pitch
contours, and duration; in consonant production, accuracy of VOT values and
differences in production of native and non-native consonants; in vowel production,
accuracy of formant values and differences in production between native and
non-native vowels.
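As a minimal illustration of one such acoustic measure, the sketch below estimates overall pitch (F0) from a waveform by autocorrelation peak-picking. The signal is a synthetic tone standing in for a recorded production, and the function name and parameter ranges are illustrative assumptions, not drawn from the thesis.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) by autocorrelation peak-picking."""
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]  # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)                  # plausible F0 lag range
    lag = lo + int(np.argmax(ac[lo:hi]))                     # best-matching period
    return sr / lag

# Synthetic "vowel": 200 Hz fundamental plus one harmonic, 16 kHz sampling.
sr = 16000
t = np.arange(0, 0.1, 1 / sr)
tone = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(round(estimate_f0(tone, sr), 1))  # → 200.0
```

Applied frame by frame, the same idea yields a pitch contour whose shape and range could then be compared between "good" and "bad" producers.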
9.5.5 Psychoacoustic Processing Investigation
Another important issue that has not been addressed in the current series of studies is
whether the relationship between musical ability and speech sound acquisition is a
result of individual differences in basic auditory processing, such as the ability to
perceive pitch patterns (Jakobson et al., 2003), to perceive temporal order (Koh et
al., 2001), or to detect very low amplitude sounds. Previous research has shown that
individual speech perception differences do not depend on spectral and temporal
processing accuracy for non-speech sounds (Surprenant & Watson, 2001). It is thus
necessary to investigate whether individual variation in basic auditory abilities can
predict variation in the perception of foreign language sounds and of musical sounds.
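One standard way to measure such basic auditory abilities is an adaptive staircase procedure. The following sketch is a generic 2-down/1-up tracker with a simulated deterministic listener; it is an assumption-laden illustration, not a procedure used in the thesis.

```python
def staircase_threshold(respond, start=40.0, step=2.0, n_reversals=12):
    """Simple 2-down / 1-up adaptive staircase.

    respond(level) should return True for a correct detection. The rule
    (two correct in a row -> harder, one error -> easier) converges on the
    level yielding ~70.7% correct; the estimate is the mean of the last
    half of the reversal levels.
    """
    level, streak, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            streak += 1
            if streak == 2:              # two correct in a row: step down
                streak = 0
                if direction == +1:      # direction changed: record a reversal
                    reversals.append(level)
                direction = -1
                level -= step
        else:                            # one error: step up
            streak = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    tail = reversals[n_reversals // 2:]
    return sum(tail) / len(tail)

# Simulated listener that detects any level above 25 dB.
print(staircase_threshold(lambda level: level > 25.0))  # → 25.0
```

With a real participant, `respond` would present a stimulus at the given level and collect a yes/no response; the same tracker could be run on pitch-change or temporal-order judgments as well as detection.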
9.6 Conclusion
This thesis offers the first comprehensive multi-language investigation of lexical tone
perception in speakers of tonal and non-tonal languages with and without musical
experience. The findings suggest that tone processing is language-specific and
strongly shaped by inherent musical ability. Speech perception and production results
indicate that musical training is not the determining factor in acquisition of novel
speech sounds. Rather, it appears to be inherent abilities, such as language or musical
aptitude, or a more universal 'auditory aptitude', that explain musicians' superiority
in speech perception and production.
The current experiments represent the first step into a new avenue of research, in
which both inherent abilities and experiential factors will be considered in the search
for the origin of good speech acquisition skills.
References
Abercrombie, D. (1967). Elements of general phonetics. Chicago, IL: Aldine.
Abercrombie, D. (1968). Paralanguage. British Journal of Disorders of
Communication, 3, 55-59.
Abramson, A. S. (1961). Identification and discrimination of phonetic tones. Journal
of the Acoustical Society of America, 33, 842.
Abramson, A. S. (1962). The vowels and tones of standard Thai: Acoustical
measurements and experiments. International Journal of American
Linguistics, 28(2).
Abramson, A. S. (1975). The tones of Central Thai: Some perceptual experiments. In
J. C. J. G. Harris (Ed.), Studies in Thai Linguistics (pp. 1-16). Bangkok:
Central Institute of English Language.
Abramson, A. S. (1977). The noncategorical perception of tone categories in Thai.
Paper presented at the 93rd meeting of the Acoustical Society of America,
State College, Penn.
Abramson, A. S. (1978). Static and dynamic acoustic cues in distinctive tones.
Language and Speech, 21(4), 319-325.
Abramson, A. S. (1979). The noncategorical perception of tones in Thai. In B.
Lindblom & S. Ohmann (Eds.), Frontiers of speech communication research
(pp. 127-134). London: Academic Press.
Abramson, A. S., & Lisker, L. (1970). Discriminability along the voicing continuum:
Cross-language tests. In Proceedings of the 6th International Congress of
Phonetic Sciences (pp. 569-573). Prague: Academia.
Abramson, A. S., & Lisker, L. (1973). Voice-timing perception in Spanish word-
initial stops. Journal of Phonetics, 1, 1-8.
Abramson, A. S., & Svastikula, K. (1983). Intersections of tone and intonation in
Thai. Haskins Laboratories Status Report on Speech Research, SR-74/75, 143-
154.
Akahane-Yamada, R., Tohkura, Y., Bradlow, A. R., & Pisoni, D. B. (1998). Does
training in speech perception modify speech production? In H. T. Bunnell &
W. Idsardi (Eds.), Proceedings of the 4th International Conference on Spoken
Language Processing (Vol. 2, pp. 606-609). Philadelphia, PA, USA.
Altmann, G. T. M. (1990). Cognitive models of speech processing: An introduction.
In G. T. M. Altmann (Ed.), Cognitive models of speech processing:
Psycholinguistic and computational perspectives (pp. 1-23). Cambridge, MA:
The MIT Press.
Arellano, S. I., & Draper, J. E. (1972). Relations between musical aptitudes and
second-language learning. Hispania, 55(1), 111-121.
Aslin, R. N., & Pisoni, D. B. (1980). Some developmental processes in speech
perception. In G. Yeni-Komshian, J. Kavanagh & C. Ferguson (Eds.), Child
phonology: Perception and production (pp. 67-96). New York: Academic
Press.
Aslin, R. N., Pisoni, D. B., Hennessy, B. L., & Perey, A. J. (1981). Discrimination of
voice onset time by human infants: New findings and implications for the
effects of early experience. Child Development, 52, 1135-1145.
Aslin, R. N., Pisoni, D. B., & Jusczyk, P. W. (1983). Auditory development and
speech perception in infancy. In M. M. Haith & J. J. Campos (Eds.), Infancy
and the biology of development. New York: Wiley.
Bachem, A. (1955). Absolute pitch. Journal of the Acoustical Society of America, 27,
1180–1185.
Baggaley, J. (1974). Measurement of absolute pitch. Psychology of Music, 2(2), 11-17.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998).
Absolute pitch: An approach for identification of genetic and nongenetic
components. American Journal of Human Genetics, 62, 224–231.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000).
Familial aggregation of absolute pitch. American Journal of Human Genetics,
67, 755-758.
Bailey, P. J., Summerfield, Q., & Dorman, M. F. (1977). On the identification of sine-
wave analogues of certain speech sounds. Haskins Laboratories Status Report
on Speech Research, SR-51/52, 1-25.
Ball, M., & Rahilly, J. (1999). Phonetics: The Science of Speech. London: Arnold.
Barron, R. W. (1994). The sound-to-spelling connection: Orthographic activation in
auditory word recognition and its implications for the acquisition of
phonological awareness and literacy skills. In V. W. Berninger (Ed.), The
varieties of orthographic knowledge, 1: Theoretical and developmental issues.
Neuropsychology and cognition (Vol. 8, pp. 219-242). Dordrecht,
Netherlands: Kluwer Academic Publishers.
Barry, J., & Blamey, P. (2004). The acoustic analysis of tone differentiation as a
means for assessing tone production in speakers of Cantonese. Journal of the
Acoustical Society of America, 116(3), 1739-1748.
Bastian, J., & Abramson, A. S. (1964). Identification and discrimination of phonemic
vowel duration. In Speech research and instrumentation (Vol. 10). New York:
Haskins Laboratories.
Bastian, J., Eimas, P. D., & Liberman, A. (1961). Identification and discrimination of
a phonemic contrast induced by silent interval. Journal of the Acoustical
Society of America, 33, 842.
Baudoin-Chial, S. (1986). Hemispheric lateralization of modern standard Chinese
tone processing. Journal of Neurolinguistics, 2, 189–199.
Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese Phonology (Vol. 102).
Berlin: Mouton de Gruyter.
Baumrin, J. M. (1974). Perception of the duration of a silent interval in nonspeech
stimuli: A test of the motor theory of speech perception. Journal of Speech
and Hearing Research, 17, 294-309.
Beach, D. M. (1938). The phonetics of the Hottentot language. Cambridge:
Cambridge University Press.
Bekesy, G. V. (1960). Experiments in hearing. New York: McGraw-Hill.
Bent, T., Bradlow, A., & Wright, B. (2006). The influence of linguistic experience on
the cognitive processing of pitch in speech and nonspeech sounds. Journal of
Experimental Psychology: Human Perception & Performance, 32(1), 97-103.
Bergeson, T. R., & Trehub, S. E. (2002). Absolute pitch and tempo in mothers' songs
to infants. Psychological Science, 13, 72-75.
Bertoncini, J., Bijeljac-Babic, R., Blumstein, S. E., & Mehler, J. (1987).
Discrimination in neonates of very short CV's. Journal of the Acoustical
Society of America, 82, 31-37.
Best, C. T., Moringiello, B., & Robson, R. (1981). Perceptual equivalence of acoustic
cues in speech and nonspeech perception. Perception & Psychophysics,
29, 191-211.
Bever, T. G. (1975). Cerebral asymmetries in humans are due to their differentiation
of two incompatible processes: holistic and analytic. Annals of the New York
Academy of Sciences, 163, 251-262.
Bever, T. G., & Chiarello, R. J. (1974). Cerebral dominance in musicians and
nonmusicians. Science, 185, 537-539.
Bimler, D., Kirkland, J., & Jameson, K. (2004). Quantifying variations in personal
color spaces: Are there sex differences in color vision? Color Research and
Application, 29, 128-134.
Blechner, M. J. (1977). Musical skill and the categorical perception of harmonic
mode. Unpublished PhD Dissertation, Yale University.
Blicher, D. L., Diehl, R. L., & Cohen, L. B. (1988). Effects of syllable duration on the
perception of Mandarin tones: A cross-language study. Journal of the
Acoustical Society of America, 84(1), 157.
Blickenstaff, C. B. (1963). Musical talents and foreign language learning ability.
Modern Language Journal, 47, 359-363.
Bloom, L. (1973). One word at a time: The use of single-word utterances before
syntax. The Hague: Mouton.
Bluhme, H., & Burr, R. (1971). An audio-visual display of pitch for teaching Chinese
tones. Studies in Linguistics, 22, 51-57.
Bolton, T. L. (1894). Rhythm. American Journal of Psychology, 6, 145-238.
Bornstein, M. H. (1973). Color vision and color naming: A psychophysiological
hypothesis of cultural difference. Psychological Bulletin, 80(4), 257-285.
Bornstein, M. H. (1987). Perceptual categories in vision and audition. In S. Harnad
(Ed.), Categorical Perception: The Groundwork of Cognition (pp. 535-565).
Cambridge: Cambridge University Press.
Boyle, J. (1992). Evaluation of musical ability. In R. Colwell (Ed.), Handbook of
Research on Music Teaching and Learning (pp. 247–265). New York: Oxford
University Press.
Bradlow, A., Pisoni, D., Yamada, R., & Tohkura, Y. (1997). Training Japanese
listeners to identify English /r/ and /l/. IV. Some effects of perceptual learning
on speech production. Journal of the Acoustical Society of America, 101,
2299–2310.
Brady, P. T. (1970). Fixed-scale mechanism of absolute pitch. Journal of the
Acoustical Society of America, 48, 883–887.
Braun, F. (1927). Untersuchungen ueber das persoenliche Tempo. Archiv der
gesamten Psychologie, 60, 317-360.
Broca, P. (1861). Remarques sur le siege de la faculte de langage articule, suivis d'une
observation d'aphemie (perte de la parole). Bulletin de la Societe Anatomique,
6, 330-357.
Broselow, E., Hurtig, R. R., & Ringen, C. (1987). The perception of second language
prosody. In G. Ioup & S. H. Weinberger (Eds.), Inter-language Phonology,
The Acquisition of Second Language Sound System (pp. 350-361). Cambridge:
Newbury House Publishers.
Bryden, M. P. (1982). Laterality: Functional asymmetry in the brain. New York:
Academic Press.
Burnham, D., Earnshaw, L., & Clark, J. (1991). Development of categorical
identification of native and non-native bilabial stops: infants, children and
adults. Journal of Child Language, 18, 231-260.
Burnham, D., Earnshaw, L., & Quinn, M. (1987). The development of categorical
identification of speech. In B. McKenzie & H. Day (Eds.), Perceptual
Development in Early Infancy: Problems and Issues (pp. 237-275). New York:
Erlbaum.
Burnham, D., & Francis, E. (1997). The role of linguistic experience in the perception
of Thai tones. In A. S. Abramson (Ed.), Southeast Asian Linguistic studies in
honour of Vichin Panupong (pp. 29-47).
Burnham, D., & Jones, C. (2002). Categorical perception of lexical tone by tonal and
non-tonal language speakers. In Proceedings of the 9th Australian
International Conference on Speech Science & Technology. Melbourne:
Australian Speech Science & Technology Association Inc.
Burnham, D., Peretz, I., Stevens, K., Jones, C., Schwanhäußer, B., Tsukada, K., et al.
(2004). Do Tone Language Speakers have Perfect Pitch? Paper presented at
the 8th International Conference on Music Perception & Cognition, Evanston,
IL.
Burnham, D., Tsukada, K., Jones, C., Rungrojsuwan, S., Krachaikiat, N., &
Luksaneeyanawin, S. (2005, December 15-16, 2005). Lexical tone production
development in Thai children, 18 months to 6 years: Relationships with
language milestones? Paper presented at the 15th Australian Language and
Speech Conference, Sydney.
Burns, E. M. (1999). Intervals, scales, and tuning. In D. Deutsch (Ed.), The
Psychology of Music (pp. 215-264). New York: Academic Press.
Burns, E. M., & Ward, D. (1978). Categorical perception - phenomenon or
epiphenomenon: Evidence from experiments in the perception of melodic
musical intervals. Journal of the Acoustical Society of America, 68, 456-468.
Capo, H. B. C. (1991). A comparative phonology of Gbe. Publications in African
Languages and Linguistics, 14.
Caramazza, A., Yeni-Komshian, G., Zurif, E., & Carbone, E. (1973). The acquisition
of a new phonological contrast: The case of stop consonants in French-English
bilinguals. Journal of the Acoustical Society of America, 54, 421-428.
Carney, A. E., Widin, G. P., & Viemeister, N. F. (1977). Noncategorical perception of
stop consonants differing in VOT. Journal of the Acoustical Society of
America, 62, 961-970.
Chan, A. S., Ho, Y.-C., & Cheung, M.-C. (1998). Music training improves verbal
memory. Nature, 396, 128.
Chan, M. (1987). Tone and melody in Cantonese. Paper presented at the Thirteenth
Annual Meeting of the Berkeley Linguistics Society, Berkeley, BLS.
Chan, S., Chuang, C., & Wang, W. (1975). Cross-language study of categorical
perception for lexical tone. Journal of the Acoustical Society of America, 58,
119.
Chang, H. W., & Trehub, S. E. (1977). Auditory processing of relational information
by young infants. Journal of Experimental Child Psychology, 24, 324-331.
Chang, Y. C., & Halle, P. (2000). Taiwan Huayu shengdiao fanchou ganzhi
[Categorical perception of Taiwan Mandarin tones]. Tsing Hua Journal of
Chinese Studies, new series XXX, 1, 51-56.
Chao, Y. R. (1956). Tone, intonation, singsong, chanting, recitative, tonal
composition, and atonal composition in Chinese. In M. Halle, H. G. Lunt, H.
McLean & C. H. van Schooneveld (Eds.), For Roman Jakobson: Essays on the
Occasion of His Sixtieth Birthday, 11 October 1956. The Hague, Netherlands:
Mouton & Co.
Chao, Y.-R. (1930). A system of tone letters. Le Maitre Phonetique, 45, 24-27.
Chuang, C., Hiki, S., Sone, T., & Nimura, T. (1972). The acoustical features and
perceptual cues of the four tones of standard colloquial Chinese. Proceedings
of the 7th International Congress on Acoustics (Akadémiai Kiadó, Budapest),
297-300.
Clynes, M., & Nettheim, N. (1982). The living quality of music: Neurobiologic
patterns of communicating feeling. In M. Clynes (Ed.), Music, Mind and Brain
(pp. 171–216). New York.
Cohen, A. J., & Baird, K. (1990). Acquisition of absolute pitch: The question of
critical periods. Psychomusicology, 9, 31–37.
Cohen, A. J., Thorpe, L. A., & Trehub, S. E. (1987). Infants' perception of musical
relations in short transposed tone sequences. Canadian Journal of Psychology,
41, 33–47.
Collier, G. L., & Wright, C. E. (1995). Temporal rescaling of simple and complex
ratios in rhythmic tapping. Journal of Experimental Psychology: Human
Perception and Performance, 21, 602-627.
Collyer, C. E., Broadbent, H. A., & Church, R. M. (1994). Preferred rates of repetitive
tapping and categorical time production. Perception & Psychophysics, 55,
443-453.
Conway, D. A., & Haggard, M. P. (1971). New demonstrations of categorical
perception (No. 5). Cambridge: University of Cambridge, Psychology
Laboratory.
Cook, P. R. (1991). Identification of control parameters in an articulator vocal tract
model, with applications to the synthesis of singing. Unpublished PhD
Dissertation, Stanford University.
Cooper, W. E., Ebert, R. E., & Cole, R. A. (1976). Perceptual analysis of stop
consonants and glides. Journal of Experimental Psychology: Human
Perception & Performance, 2, 92-104.
Coster, D. C., & Kratochvil, P. (1984). Tone and stress discrimination in normal
Peking dialect speech. In B. Hong (Ed.), New papers in Chinese linguistics
(pp. 119-132). Canberra: Australian National University Press.
Cowan, N., & Morse, P. A. (1979). Influence of task demands on the categorical
versus continuous perception of vowels. In J. J. Wolf & D. H. Klatt (Eds.),
Speech Communication Papers (pp. 443-446). New York: Acoustical Society
of America.
Creelman, C. D., & Macmillan, N. A. (1979). Auditory phase and frequency
discrimination: A comparison of nine procedures. Journal of Experimental
Psychology: Human Perception & Performance, 5, 146-156.
Creelman, C. D., & Macmillan, N. A. (1996). DPrime Plus [Computer software].
Retrieved 2004, from http://www.psych.utoronto.ca/~creelman/
Cross, D. V., Lane, H. L., & Sheppard, W. C. (1965). Identification and
discrimination functions for a visual continuum and their relation to the motor
theory of speech perception. Journal of Experimental Psychology, 70, 63-74.
Crowder, R. G. (1982). Decay of auditory memory in vowel discrimination. Journal
of Experimental Psychology: Human Learning and Memory, 8, 153-162.
Crozier, J. B. (1997). Absolute pitch: Practice makes perfect, the earlier the better.
Psychology of Music, 25, 110–119.
Crystal, D. (2003). A Dictionary of Linguistics and Phonetics (5th ed.). Malden, MA:
Blackwell.
Cuddy, L. L. (1968). Practice effects in the judgment of absolute pitch. Journal of the
Acoustical Society of America, 43, 1069-1076.
Cuddy, L. L. (1970). Training the absolute identification of pitch. Perception &
Psychophysics, 8, 265-269.
Cutting, J., & Rosner, B. (1974). Categories and boundaries in speech and music.
Perception & Psychophysics, 16(3), 564-570.
Cutting, J., Rosner, B., & Foard, C. (1976). Perceptual categories for musiclike
sounds: implications for theories of speech perception. Quarterly Journal of
Experimental Psychology, 28, 361-378.
Damper, R., & Harnad, S. (2000). Neural network models of categorical perception.
Perception & Psychophysics, 62(4), 843-867.
Davidson, J. (1993). Visual perception of performance manner in the movements of
solo musicians. Psychology of Music, 21(2), 103–113.
Dechovitz, D., & Mandler, R. (1977). Effects of transition length on identification and
discrimination along a place continuum. Haskins Laboratories Status Report
on Speech Research, SR-51/51, 119-130.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. (1955). Acoustic loci and
transitional cues for consonants. Journal of the Acoustical Society of America,
27, 769-773.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. (1964). Formant transitions and loci
as acoustic correlates of place of articulation. Studia Linguistica, 18, 104-121.
Delattre, P. C., Liberman, A. M., Cooper, F. S., & Gerstman, L. J. (1952). An
experimental study of the acoustic determinants of vowel color. Word, 8, 195-
210.
Demany, L., & Armand, F. (1984). The perceptual reality of tone chroma in early
infancy. Journal of the Acoustical Society of America, 76, 57-66.
Dexter, E. S., & Omwake, K. T. (1934). The relation between pitch discrimination
and accent in modern languages. Journal of Applied Psychology, 18, 267-271.
Diehl, R. L., & Kluender, K. R. (1989). On the objects of speech perception.
Ecological Psychology, 1(2), 121-144.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of
Psychology, 55, 149-179.
Douglas, S., & Willatts, P. (1994). The relationship between musical ability and
literacy skills. Journal of Research in Reading, 17, 99-107.
Dow, F. (1972). An outline of Mandarin Phonetics. Canberra: Australian National
University Press.
Dowling, W. J. (1999). Development of music perception and cognition. In D.
Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 603–625). San Diego,
CA: Academic Press.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001).
Genetic correlates of musical pitch recognition in humans. Science, 291, 1969-
1972.
Dung, D., Huong, T., & Boulakia, G. (1998). Intonation in Vietnamese. In D. Hirst &
A. Di Cristo (Eds.), Intonation Systems. A Survey of Twenty Languages (pp.
395-416). Cambridge: Cambridge University Press.
Echols, C. H., Crowhurst, M. J., & Childers, J. B. (1997). The perception of rhythmic
units in speech by infants and adults. Journal of Memory and Language,
36(2), 202-225.
Edman, T. R. (1979). Discrimination of intraphonemic differences along two place of
articulation continua. In J. J. Wolf & D. H. Klatt (Eds.), Speech
Communication Papers (pp. 455-458). New York: Acoustical Society of
America.
Edman, T. R., Soli, S. D., & Widin, G. P. (1978). Learning and generalization of
intraphonemic VOT discrimination. Journal of the Acoustical Society of
America, 63, 19 (Abstract).
Eilers, R. E. (1980). Infant speech perception: History and mystery. In G. H. Yeni-
Komshian, J. F. Kavanagh & C. A. Ferguson (Eds.), Child Phonology (Vol. 2,
pp. 23-39). New York: Academic Press.
Eimas, P. D. (1963). The relation between identification and discrimination along
speech and non-speech continua. Language and Speech, 6, 206-217.
Eimas, P. D. (1975). Auditory and phonetic coding of the cues for speech:
Discrimination of the [r-l] distinction by young infants. Perception &
Psychophysics, 18, 341-347.
Eimas, P. D., & Miller, J. L. (1980a). Contextual effects in infant speech perception.
Science, 209, 1140-1141.
Eimas, P. D., & Miller, J. L. (1980b). Discrimination of the information for manner of
articulation. Infant Behavior and Development, 3, 367-375.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception
in infants. Science, 171, 303-306.
Eng, N., Obler, L. K., Harris, K. S., & Abramson, A. S. (1996). Tone perception
deficits in Chinese-speaking Broca's aphasics. Aphasiology, 10, 649-656.
Eterno, J. A. (1961). Foreign language pronunciation and musical aptitude. Modern
Language Journal, 45, 168-170.
Ewan, W. G. (1975). Laryngeal behavior in speech. Unpublished PhD Dissertation,
University of California, Berkeley.
Feld, S., & Fox, A. A. (1994). Music and language. Annual Review of Anthropology,
23, 25-53.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic
functions. Annals of Child Developmental Psychology, 8, 43–80.
Fish, L. (1984). Relationships among Eighth-Grade German Students' Learning
Styles, Pitch Discrimination, Sound Discrimination, and Pronunciation of
German Phonemes. Unpublished Master's Thesis, University of Minnesota,
Minneapolis.
Flanagan, J., & Saslow, M. (1958). Pitch discrimination of synthetic vowels. Journal
of the Acoustical Society of America, 32, 1319-1328.
Flege, J. E., Munro, M. J., & Fox, R. A. (1994). Auditory and categorical effects on
cross-language vowel perception. Journal of the Acoustical Society of
America, 95(6), 3623-3641.
Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Forfeit, K. G. (1977). Linguistic relativism and selective adaptation for speech: A
comparative study of English and Thai. Perception & Psychophysics, 21, 347-
351.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with
millisecond accuracy. Behavior Research Methods, Instruments & Computers,
35, 116-124.
Fowler, C. A. (1994). Speech perception: direct realist theory. In R. E. Asher (Ed.),
The Encyclopaedia of Language and Linguistics (pp. 4199-4203). Oxford:
Pergamon.
Fowler, C. A. (1996). Listeners do hear sounds, not tongues. Journal of the Acoustical
Society of America, 99, 1730-1741.
Fox, R., & Unkefer, J. (1983). The effect of lexical status on the perception of tone.
Journal of Chinese Linguistics, 13, 71-87.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music
(1 ed., pp. 149-180). New York: Academic Press.
Francis, A. L., Ciocca, V., & Chit Ng, B. K. (2003). On the (non)categorical
perception of lexical tones. Perception & Psychophysics, 65(7), 1029-1044.
Frazier, L. (1976). What can /w/, /l/, /y/ tell us about categorical perception? Haskins
Laboratories Status Report on Speech Research, SR-48, 235-256.
Fry, D. B. (1969). Acoustic Phonetics. Cambridge: Cambridge University Press.
Fry, D. B. (1970). Prosodic phenomena. In B. Malmberg (Ed.), Manual of Phonetics.
Amsterdam: North Holland.
Fry, D. B., Abramson, A. S., Eimas, P. D., & Liberman, A. M. (1962). The
identification and discrimination of synthetic vowels. Language and Speech,
5, 171-189.
Fujisaki, H., & Kawashima, T. (1968). The influence of various factors on the
identification of synthetic speech sounds. In Reports of the 6th international
congress on acoustics (Vol. No. 2, pp. 95-98). Tokyo.
Fujisaki, H., & Kawashima, T. (1969). On the modes and mechanisms of speech
perception. In Annual Report of the Engineering Research Institute, (Vol. 28,
pp. 67-73): University of Tokyo.
Fujisaki, H., & Kawashima, T. (1970). Some experiments on speech perception and a
model for the perceptual mechanism. In Annual Report of the Engineering
Research Institute, Faculty of Engineering (Vol. 29, pp. 207-214): University
of Tokyo.
Gabrielsson, A. (1973). Adjective ratings and dimension analyses of auditory rhythm
patterns. Scandinavian Journal of Psychology, 14, 244–260.
Gandour, J. (1974). Consonant types and tone in Siamese. Journal of Phonetics, 2,
337-350.
Gandour, J. (1983). Tone perception in far eastern languages. Journal of Phonetics,
11, 149-175.
Gandour, J., & Dardarananda, R. (1983). Identification of tonal contrasts in Thai
aphasic patients. Brain and Language, 18, 98-114.
Gandour, J., & Harshman, R. (1978). Crosslanguage differences in tone perception: A
multidimensional scaling investigation. Language and Speech, 21(1), 1-33.
Gandour, J., Petty, S. H., & Dardarananda, R. (1988). Perception and production of
tone in aphasia. Brain and Language, 35(2), 201-240.
Gandour, J., Ponglorpisit, S., Khunadorn, F., Dechongkit, S., Boongrid, P.,
Boonklam, R., et al. (1992). Lexical tones in Thai after unilateral brain
damage. Brain and Language, 43, 275–307.
Gandour, J., Potisuk, S., Dechongkit, S., & Ponglorpisit, S. (1992). Tonal
coarticulation in Thai disyllabic utterances: A preliminary study. Linguistics of
the Tibeto-Burman Area, 15(1), 93-110.
Gandour, J., Wong, D., & Hutchins, G. (1998). Pitch processing in the human brain is
influenced by language experience. NeuroReport, 9, 2115-2119.
Garcia, E. (1966). The identification and discrimination of synthetic nasals. Haskins
Laboratories Status Report on Speech Research, SR-7/8, 3.1-3.16.
Gardiner, M. F., Fox, A., Knowles, F., & Jeffrey, D. (1996). Learning improved by
arts training. Nature, 381, 284.
Garrett, M., Bever, T., & Fodor, J. (1966). The active use of grammar in speech
perception. Perception & Psychophysics, 1, 30-32.
Gerrits, E., & Schouten, M. E. H. (2004). Categorical perception depends on the
discrimination task. Perception & Psychophysics, 66(3), 363-376.
Gill, H. S., & Gleason, H. A. (1969). A reference grammar of Punjabi. Patiala:
Punjabi University.
Gilleece, L. F. (2006). An empirical investigation of the association between musical
aptitude and foreign language aptitude. Unpublished PhD Dissertation,
Trinity College Dublin.
Gordon, E. (1965). Music Aptitude profile. Boston: Houghton Mifflin.
Gordon, E. (1989). Advanced Measures of Music Audiation. Chicago, IL: GIA
Publications.
Gottfried, T. L. (2007). Music and language learning: Effect of musical training on
learning L2 speech contrasts. In O.-S. Bohn & M. J. Munro (Eds.), Language
Experience in Second Language Speech Learning: In honor of James Emil
Flege. (pp. 221–237).
Graziano, A. B., Peterson, M., & Shaw, G. L. (1999). Enhanced learning of
proportional math through music training and spatial-temporal reasoning.
Neurological Research, 21, 139-152.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood
music education and predisposition to absolute pitch: Teasing apart genes and
environment. American Journal of Medical Genetics, 98(3), 280-282.
Gromko, J. E., & Poorman, A. S. (1998). Developmental trends and relationships in
children's aural perception and symbol use. Journal of Research in Music
Education, 46, 16-23.
Grove Music Online. (2001). Retrieved 5 October, 2006, from
http://www.grovemusic.com.ezproxy.uws.edu.au
Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge:
Cambridge University Press.
Halle, M., & Stevens, K. N. (1971). A note on laryngeal features (Quarterly progress
Report). Boston: MIT Research Lab of Electronics.
Halle, P. A., Chang, Y. C., & Best, C. T. (2004). Identification and discrimination of
Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of
Phonetics, 32(3), 395-421.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory &
Cognition, 17, 572-581.
Han, S. M., & Kim, K.-O. (1974). Phonetic variation of Vietnamese tones in
disyllabic utterances. Journal of Phonetics, 2, 223-232.
Hanson, V. L. (1977). Within-category discrimination in speech perception.
Perception & Psychophysics, 21, 423-430.
Harnad, S. (1987). Categorical Perception: The Groundwork of Cognition.
Cambridge: Cambridge University Press.
Harrison, P. (1998). Yoruba babies and unchained melody. UCL Working Papers in
Linguistics, 10, 1-20.
Hasegawa, A. (1976). Some perceptual consequences of fricative coarticulation.
Unpublished PhD Dissertation, Purdue University.
Hassler, M., Birbaumer, N., & Feil, A. (1985). Musical talent and visual-spatial
ability: a longitudinal study. Psychology of Music, 13, 99-113.
Haudricourt, A.-G. (1954). De l'origine des tons en vietnamien. Journal Asiatique,
242, 69-82.
Haudricourt, A.-G. (1961). Bipartition et tripartition des systemes de tons dans
quelques langues d'extreme-orient. Bulletin de la Societe Linguistique de
Paris, 56, 163-180.
Healy, A. F., & Repp, B. H. (1982). Context independence and phonetic mediation in
categorical perception. Journal of Experimental Psychology: Human
Perception & Performance, 8, 68-80.
Heller, L. M., & Trahiotis, C. (1995). The discrimination of samples of noise in
monotic, diotic and dichotic conditions. Journal of the Acoustical Society of
America, 97, 3775-3781.
Henderson, E. J. A. (1981). Tonogenesis: Some recent speculations on the
development of tone. Transactions of the Philological Society, 112, 1-24.
Hickok, G. (2001). Functional anatomy of speech perception and speech production:
Psycholinguistic implications. Journal of Psycholinguistic Research, 30(3),
225-235.
Hillenbrand, J. M., Minifie, F. D., & Edwards, T. J. (1979). Tempo of spectrum
change as a cue in speech and sound discrimination by infants. Journal of
Speech and Hearing Research, 22, 147-165.
Hirose, H. (1997). Investigating the physiology of laryngeal structures. In W. J.
Hardcastle & J. Laver (Eds.), Handbook of Phonetic Sciences (pp. 116-136).
Oxford: Blackwell.
Hirsh, I. J., & Sherrick, C. E. (1961). Perceived order in different sense modalities.
Journal of Experimental Psychology, 62, 423-432.
Ho, A. (1976). The acoustic variation of Mandarin tones. Phonetica, 33, 353-367.
Ho, Y.-C., Cheung, M.-C., & Chan, A. S. (2003). Music training improves verbal but
not visual memory: Cross-sectional and longitudinal explorations in children.
Neuropsychology, 17, 439-450.
Hombert, J.-M. (1975). Towards a theory of tonogenesis: an empirical,
physiologically and perceptually-based account of the development of tonal
contrasts in language. Unpublished PhD Dissertation, University of
California, Berkeley.
Hombert, J.-M. (1977a). Consonant types, vowel height, and tone in Yoruba. Studies
in African Linguistics, 8, 173-190.
Hombert, J.-M. (1977b). Development of tones from vowel height? Journal of
Phonetics, 5, 9-16.
Hombert, J.-M., & Ladefoged, P. (1976). The effect of aspiration on the fundamental
frequency of the following vowel. Journal of the Acoustical Society of
America, 59, 572.
Hombert, J.-M., Ohala, J. J., & Ewan, W. G. (1979). Phonetic explanations for the
development of tone. Language, 55, 37-58.
Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. Cambridge,
UK: Cambridge University Press.
Hsieh, L., Gandour, J., Wong, D., & Hutchins, G. D. (2001). Functional heterogeneity
of inferior frontal gyrus is shaped by linguistic experience. Brain and
Language, 76, 227-252.
Hughes, C. P., Chan, J. L., & Su, M. S. (1983). Aprosodia in Chinese patients with
right cerebral hemisphere lesions. Archives of Neurology, 40, 732-736.
Hung, T. T. N. (1989). Syntactic and Semantic Aspects of Chinese Tone Sandhi.
Bloomington: Indiana University Linguistics Club Publications.
Hurwitz, I., Wolff, P. H., Bortnick, B. D., & Kokas, K. (1975). Nonmusical effects of
the Kodaly music curriculum in primary grade children. Journal of Learning
Disability, 8, 167-174.
International Phonetic Association. (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge: Cambridge University Press.
Jakobson, L. S., Cuddy, L. L., & Kilgour, A. R. (2003). Time tagging: A key to
musicians' superior memory. Music Perception, 20, 307-313.
Jensen, M. K. (1958). Recognition of word tones in whispered speech. Word, 14, 187-
197.
Jongman, A., & Moore, C. (2000). The role of language experience in speaker and
rate normalization processes. Paper presented at the 6th International
conference on Spoken Language Processing.
Jongman, A., Wang, Y., Moore, C., & Sereno, J. A. (2006). Perception and
production of Mandarin Chinese tones. In L. H. Tan, E. Bates, & O. Tseng (Eds.), The Handbook of Chinese Psycholinguistics. Cambridge: Cambridge University Press.
Jusczyk, P. W. (1981). Infant speech perception: A critical appraisal. In P. D. Eimas
& J. L. Miller (Eds.), Perspectives on the study of speech (pp. 113-164).
Hillsdale: Erlbaum.
Jusczyk, P. W., & Bertoncini, J. (1988). Viewing the development of speech
perception as an innately guided learning process. Language and Speech, 31,
217-238.
Jusczyk, P. W., Copan, H., & Thompson, E. (1978). Perception by two-month-olds of
glide contrasts in multisyllabic utterances. Perception & Psychophysics, 24,
515-520.
Jusczyk, P. W., Pisoni, D. B., Walley, A. C., & Murray, J. (1980). Discrimination of
the relative onset of two-component tones by infants. Journal of the Acoustical
Society of America, 67, 262-270.
Jusczyk, P. W., Rosner, B. S., Reed, M., & Kennedy, L. J. (1989). Could temporal
order differences underlie 2-month-olds' discrimination of English voicing
contrasts? Journal of the Acoustical Society of America, 85, 1741-1749.
Justus, T. C., & Bharucha, J. J. (2002). Music perception and cognition. New York,
NY, US: John Wiley & Sons Inc.
Kaplan, H. L., Macmillan, N. A., & Creelman, C. D. (1978). Tables of d' for variable-
standard discrimination paradigms. Behavior Research Methods &
Instrumentation, 10, 796-813.
Karlgren, B. (1926). Etudes sur la phonologie Chinoise. In Archives d'Etudes
Orientales (Vol. 15). Stockholm: Norstedt.
Kawahara, H., Katayose, H., de Cheveigne, A., & Patterson, R. D. (1999). Fixed point
analysis of frequency to instantaneous frequency mapping for accurate
estimation of F0 and periodicity. Paper presented at the EUROSPEECH '99,
Budapest, Hungary.
Keating, P. A., & Blumstein, S. E. (1978). Effects of transition length on the
perception of stop consonants. Journal of the Acoustical Society of America,
64, 57-64.
Keating, P. A., Mikos, M. J., & Ganong, W. F. (1981). A cross-language study of
range of voice-onset time in the perception of initial stop voicing. Journal of
the Acoustical Society of America, 70, 1261-1271.
Keenan, J. P., Thangaraj, V., Halpern, A. R., & Schlaug, G. (2001). Absolute pitch
and the planum temporale. NeuroImage, 14, 1402-1408.
Kimura, D. (1961). Cerebral dominance and the perception of verbal stimuli.
Canadian Journal of Psychology, 15, 156-165.
Kimura, D. (1964). Left-right differences in the perception of melodies. Quarterly
Journal of Experimental Psychology, 16, 335-358.
Kiriloff, C. (1969). On the auditory discrimination of tones in Mandarin. Phonetica, 20,
63-69.
Kirstein, E. (1966). Perception of second-formant transitions in non-speech patterns.
Haskins Laboratories Status Report on Speech Research, SR-7/8, 9.1-9.3.
Kitamura, C., & Burnham, D. (1998). The infant's response to vocal affect in maternal
speech. Advances in Infancy Research, 12, 221-236.
Klatt, D. (1973). Discrimination of fundamental frequency contours in synthetic
speech: Implications for models of pitch perception. Journal of the Acoustical
Society of America, 53, 8-16.
Klatt, D. (1989). Review of selected models of speech perception. In W. Marslen-
Wilson (Ed.), Lexical Representation and Process (pp. 169-226). Cambridge,
MA: MIT Press.
Klein, D., Zatorre, R., Milner, B., & Zhao, V. (2001). A cross-linguistic PET study of
tone perception in Mandarin Chinese and English speakers. NeuroImage, 13,
646-653.
Kluender, K. R. (1994). Speech perception as a tractable problem in cognitive
science. In M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 173-
217). San Diego, CA, USA: Academic Press, Inc.
Koh, C. K., Cuddy, L. L., & Jakobson, L. S. (2001). Associations and dissociations
among music training, tonal and temporal order processing, and cognitive
skills. Annals of the New York Academy of Sciences, 930(1), 386–388.
Kramer, E. (1963). Judgement of personal characteristics and emotions from
nonverbal properties of speech. Psychological Bulletin, 61, 408-420.
Kratochvil, P. (1985). Variable norms of tones in Beijing prosody. Cahiers de
Linguistique Asie Orientale, 14, 153-174.
Kratochvil, P. (1998). Intonation in Beijing Chinese. In D. Hirst & A. Di Cristo
(Eds.), Intonation Systems. A Survey of Twenty Languages (pp. 417-431).
Cambridge: Cambridge University Press.
Krumhansl, C. L., & Schenk, D. L. (1997). Can dance reflect the structural and
expressive qualities of music? Musicae Scientiae, 1, 63–85.
Kuhl, P. K. (1981). Discrimination of speech by nonhuman animals: Basic auditory
sensitivities conducive to the perception of speech-sound categories. Journal
of the Acoustical Society of America, 70, 340-349.
Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet
effect" for the prototypes of speech categories, monkeys do not. Perception &
Psychophysics, 50(2), 93-107.
Kuhl, P. K., & Miller, C. (1975). Speech perception by the chinchilla: Voiced-
voiceless distinction in alveolar plosive consonants. Science, 190, 69-72.
Kuhl, P. K., & Miller, J. D. (1978). Speech perception by the chinchilla: Identification
functions for synthetic VOT stimuli. Journal of the Acoustical Society of
America, 63, 905-907.
Kuhl, P. K., & Miller, J. D. (1982). Discrimination of auditory target dimensions in
the presence or absence of variation in a second dimension by infants.
Perception & Psychophysics, 32, 279-292.
Kuhl, P. K., & Padden, D. M. (1982). Enhanced discriminability at the phonetic
boundaries for the voicing feature in macaques. Perception & Psychophysics,
32, 542-550.
Lamb, S. J., & Gregory, A. H. (1993). The relationships between music and reading in
beginning readers. Educational Psychology, 13, 19-27.
Lane, H. L. (1965). Motor Theory of Speech Perception: A critical review.
Psychological Review, 72, 275-309.
Lane, H. L. (1967). A behavioral basis for the polarity principle in linguistics.
Language, 43, 494-511.
Larkey, L. S., Wald, J., & Strange, W. (1978). Perception of synthetic nasal
consonants in initial and final syllable position. Perception & Psychophysics,
23, 299-312.
Leather, J. (1987). F0 pattern inference in the perceptual acquisition of second
language tone. In A. James & J. Leather (Eds.), Sound Patterns in Second
Language Acquisition. Dordrecht: Foris.
Leather, J. (1990). Perceptual and productive learning of Chinese lexical tone by
Dutch and English speakers. Paper presented at the New Sounds 90:
Amsterdam Symposium on the Acquisition of Second Language Speech,
University of Amsterdam.
Lee, L., & Nusbaum, H. C. (1993). Processing interactions between segmental and
suprasegmental information in native speakers of English and Mandarin
Chinese. Perception & Psychophysics, 53, 157-165.
Lenneberg, E. H. (1967). Biological Foundations of Language. New York: John
Wiley & Sons.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Levitin, D. J. (1994). Absolute memory for musical pitch - Evidence from the
production of learned melodies. Perception & Psychophysics, 56(4), 414-423.
Levitin, D. J. (1999). Absolute pitch: Self-reference and human memory.
International Journal of Computing and Anticipatory Systems, 4, 255-266.
Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: Additional evidence
that auditory memory is absolute. Perception & Psychophysics, 58, 927-935.
Li, C. N., & Thompson, S. A. (1977). The acquisition of tone in Mandarin-speaking
children. Journal of Child Language, 4, 185-199.
Liberman, A. M. (1957). Some results of research on speech perception. Journal of
the Acoustical Society of America, 29, 117-123.
Liberman, A. M. (1996). Introduction: Some assumptions about speech and how they
changed. In Speech: A Special Code. Cambridge, MA: MIT Press.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).
Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1952). The role of selected
stimulus-variables in the perception of unvoiced stop consonants. American
Journal of Psychology, 65, 497-516.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1954). The role of consonant-vowel transitions in the stop and nasal consonants. Psychological Monographs, 68, 1-13.
Liberman, A. M., Delattre, P. C., Gerstman, L. J., & Cooper, F. S. (1956). Tempo of
frequency change as a cue for distinguishing classes of speech sounds. Journal
of Experimental Psychology, 52, 127-137.
Liberman, A. M., Harris, K., Eimas, P. D., Lisker, L., & Bastian, J. (1961). An effect
of learning on speech perception: The discrimination of durations of silence
with and without phonemic significance. Language and Speech, 4, 175-195.
Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358-368.
Liberman, A. M., Harris, K., Kinney, J., & Lane, H. (1961). The discrimination of
relative onset time of the components of certain speech and nonspeech
patterns. Journal of Experimental Psychology, 61, 379-388.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception
revised. Cognition, 21, 1-36.
Lieberman, P., & Michaels, S. (1962). Some aspects of fundamental frequency,
envelope amplitude, and the emotional content of speech. Journal of the
Acoustical Society of America, 34(7), 922-927.
Lin, M. C. (1988). Putong hua sheng diao de sheng xue texing he zhi jue zhengzhao
[Standard Mandarin tone characteristics and percepts]. Zhongguo Yuyan, 3,
182-193.
Lindblom, B., & Studdert-Kennedy, M. (1967). On the role of formant transitions in
vowel recognition. Journal of the Acoustical Society of America, 42, 830-843.
Lisker, L. (1970). On learning a new contrast. Haskins Laboratory Status Report on
Speech Research, SR-24, 1-15.
Lisker, L., & Abramson, A. S. (1970). The voicing dimension: Some experiments in
comparative phonetics. In Proceedings of the 6th International Congress of
Phonetic Sciences (pp. 563-567). Prague: Academia.
List, G. (1961). Speech melody and song melody in central Thailand.
Ethnomusicology, 5, 15-32.
Liu, S., & Samuel, A. G. (2004). Perception of Mandarin lexical tones when F0
information is neutralized. Language and Speech, 47(2), 109-138.
Locke, S., & Kellar, L. (1973). Categorical perception in a nonlinguistic mode.
Cortex, 9, 355-369.
Lockhead, G. R., & Byrd, R. (1981). Practically perfect pitch. Journal of the
Acoustical Society of America, 70, 387-389.
Luksaneeyanawin, S. (1984). The tonal behaviour of one-word utterances: The
interplay between tone and intonation in Thai. Unpublished PhD Dissertation,
University of Edinburgh.
Luksaneeyanawin, S. (1998). Intonation in Thai. In D. Hirst & A. Di Cristo (Eds.),
Intonation Systems. A Survey of Twenty Languages (pp. 376-394). Cambridge:
Cambridge University Press.
MacDougall, R. (1903). The structure of simple rhythmic forms. Psychological
Review, Monograph Supplements, 4, 309-416.
MacKain, K. S., Best, C. T., & Strange, W. (1981). Categorical perception of English
/r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.
MacKay, D. G., Allport, A., Prinz, W., & Scheerer, E. (1987). Relationships and
modules within language perception and production: An introduction. In A.
Allport & D. G. MacKay (Eds.), Language perception and production:
Relationships between listening, speaking, reading and writing. Cognitive
science series (pp. 1-15). London, England UK: Academic Press, Inc.
Macmillan, N. A. (1987). Beyond the categorical/continuous distinction: A
psychophysical approach to processing modes. In S. Harnad (Ed.),
Categorical Perception: The Groundwork of Cognition (pp. 53-85).
Cambridge: Cambridge University Press.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide.
Cambridge: Cambridge University Press.
Macmillan, N. A., Kaplan, H. L., & Creelman, C. D. (1977). The psychophysics of
categorical perception. Psychological Review, 84, 452-471.
Maddieson, I. (1978). The frequency of tones. UCLA Working Papers in Phonetics,
41, 43-52.
Magne, C., Schoen, D., & Besson, M. (2006). Musician children detect pitch
violations in both music and language better than nonmusician children:
Behavioral and electrophysical approaches. Journal of Cognitive
Neuroscience, 18(2), 199-211.
Mandler, R. (1976). Categorical perception along an oral-nasal continuum. Haskins
Laboratories Status Report on Speech Research, SR-47, 147-154.
Massaro, D. W. (1987). Categorical partition: A fuzzy-logical model of categorization
behavior. In S. Harnad (Ed.), Categorical Perception: The Groundwork of
Cognition (pp. 254-283). Cambridge: Cambridge University Press.
Massaro, D. W., & Cohen, M. M. (1983). Categorical or continuous perception: A
new test. Speech Communication, 2, 15-35.
Matisoff, J. (1972). The Loloish tonal split revisited (Research Monograph No. 7).
Berkeley: Center for South and Southeast Asia Studies, University of
California.
Mattingly, I. G., Liberman, A., Syrdal, A. M., & Halwes, T. (1971). Discrimination in
speech and nonspeech modes. Cognitive Psychology, 2, 131-157.
Mattock, K. (2004). Perceptual Reorganisation for Tone: Linguistic Tone and Non-
Linguistic Pitch Perception by English Language and Chinese Language
Infants. Unpublished PhD Dissertation, University of Western Sydney,
Sydney.
Mattock, K., & Burnham, D. (2006). Chinese and English infants' tone perception:
Evidence for perceptual reorganization. Infancy, 10(3), 241-265.
May, J. G. (1981). Acoustic factors that may contribute to categorical perception.
Language and Speech, 24, 273-284.
McClasky, C. L., Pisoni, D. B., & Carrell, T. D. (1980). Effects of transfer of training
on identification of a new linguistic contrast in voicing (Progress Report No.
6). Bloomington: Indiana University, Department of Psychology.
McGovern, K., & Strange, W. (1977). The perception of /r/ and /l/ in syllable-initial
and syllable-final position. Perception & Psychophysics, 21, 162-170.
McLaughlin, T. (1970). Music and Communication. London: Faber and Faber.
McMahon, O. (1979). The relationship of music discrimination training to reading
and associated auditory skills. Bulletin of the Council for Research in Music
Education, 59, 68-72.
Mei, T. L. (1970). Tones and prosody in middle Chinese and the origin of the rising
tone. Harvard Journal of Asiatic Studies, 30, 86-110.
Miller, J. D. (1961). Word tone recognition in Vietnamese whispered speech. Word,
17, 11-15.
Miller, J. D., Eimas, P. D., & Zatorre, R. (1979). Studies of place and manner of
articulation in syllable-final position. Journal of the Acoustical Society of
America, 66, 1207-1210.
Miller, J. D., Wier, C. C., Pastore, R., Kelly, W. J., & Dooling, R. J. (1976).
Discrimination and labeling of noise-buzz sequences with varying noise-lead
times: An example of categorical perception. Journal of the Acoustical Society
of America, 60, 410-417.
Miller, J. L. (1980). Contextual effects in the discrimination of stop consonant and
semivowel. Perception & Psychophysics, 28, 93-95.
Miller, J. L., & Eimas, P. D. (1977). Studies on the perception of place and manner of
articulation: A comparison of the labial-alveolar and nasal-stop distinctions.
Journal of the Acoustical Society of America, 61, 835-845.
Miller, J. L., & Liberman, A. M. (1979). Some effects of later-occurring information
on the perception of stop consonant and semivowel. Perception &
Psychophysics, 25, 457-465.
Miracle, W. C. (1989). Tone production of American students of Chinese: A
preliminary acoustic study. Journal of Chinese Language Teachers
Association, 24, 49-65.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A., Jenkins, J. J., & Fujimura,
O. (1975). An effect of linguistic experience: The discrimination of [r] and [l]
by native speakers of Japanese and English. Perception & Psychophysics, 25,
331-340.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors.
Perception & Psychophysics, 44(6), 501-512.
Miyazaki, K. (2004). How well do we understand absolute pitch? Acoustical Science
and Technology, 25(6), 270–282.
Miyazaki, K., & Ogawa, Y. (2006). Learning absolute pitch by children: A cross-
sectional study. Music Perception, 24(1), 63-78.
MLA Cooperative Foreign Language Test, German. (1964). Princeton, NJ: Educational Testing Service.
Moffitt, A. R. (1971). Consonant cue perception by twenty-to-twenty-four-week old
infants. Child Development, 42, 717-731.
Moore, B. C. J. (1989). An introduction to the psychology of hearing (3rd ed.).
London: Academic Press.
Morse, P. A. (1972). The discrimination of speech and nonspeech stimuli in early
infancy. Journal of Experimental Child Psychology, 14, 477-492.
Morse, P. A., & Snowdon, C. T. (1975). An investigation of categorical speech
discrimination by rhesus monkeys. Perception & Psychophysics, 17, 9-16.
Morton, D. (1976). The Traditional Music of Thailand: University of California Press.
Murphy, L. E. (1966). Absolute judgments of duration. Journal of Experimental
Psychology, 71, 260-263.
Nooteboom, S. (1997). The prosody of speech: Melody and rhythm. In W. J.
Hardcastle & J. Laver (Eds.), Handbook of Phonetic Sciences (pp. 640-673).
Oxford: Blackwell.
Ohala, J. (1970). Aspects of the control and production of speech. In Working Papers
in Phonetics, UCLA (Vol. 15). Los Angeles.
Ohala, J. (1973a). Explanations for the intrinsic pitch of vowels (Monthly Internal
Memorandum). Berkeley: Phonology Laboratory, University of California.
Ohala, J. (1973b). The physiology of tone. In L. M. Hyman (Ed.), Consonant types
and tone (Southern California occasional papers in linguistics). Los Angeles:
USC.
Orton, S. T. (1925). Word-blindness in school children. Archives of Neurology and
Psychiatry, 14, 582–615.
Pastore, R. E., Ahroon, W. A., Baffuto, K. J., Friedman, C., Puelo, J. S., & Fink, E. A.
(1977). Common-factor model of categorical perception. Journal of
Experimental Psychology: Human Perception & Performance, 3, 686-696.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience,
6(7), 674-681.
Patel, A. D., & Peretz, I. (1997). Is music autonomous from language? A
neuropsychological appraisal. In I. Deliege & J. Sloboda (Eds.), Perception
and Cognition of Music (pp. 191-215). London: Erlbaum Psychology Press.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature
Neuroscience, 6(7), 688-691.
Perey, A. J., & Pisoni, D. B. (1980). Identification and discrimination of durations of
silence in nonspeech signals (Research Report on Speech Perception No. 6).
Bloomington: Indiana University.
Pike, K. L. (1948). Tone Languages: University of Michigan Press.
Pimsleur, P. (1966). The Pimsleur language aptitude battery. New York: Harcourt Brace Jovanovich.
Pisoni, D. B. (1971). On the nature of categorical perception of speech sounds.
Unpublished PhD Dissertation, University of Michigan.
Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discrimination of
consonants and vowels. Perception & Psychophysics, 13, 253-260.
Pisoni, D. B. (1975). The role of auditory short-term memory in vowel perception.
Memory & Cognition, 3, 7-18.
Pisoni, D. B. (1976). Some effects of discrimination training on the identification and
discrimination of rapid spectral changes (Progress Report No. 3).
Bloomington: Indiana University, Department of Psychology.
Pisoni, D. B. (1977). Identification and discrimination of the relative onset of two
component tones; Implications for the perception of voicing in stops. Journal
of the Acoustical Society of America, 61, 1352-1361.
Pisoni, D. B., Aslin, R. N., Perey, A. J., & Hennessy, B. L. (1982). Some effects of
laboratory training on identification and discrimination of voicing contrasts in
stop consonants. Journal of Experimental Psychology: Human Perception &
Performance, 8, 297-314.
Pisoni, D. B., & Glanzman, D. L. (1974). Decision processes in speech discrimination
as revealed by confidence ratings (No. 1). Bloomington: Indiana University.
Pisoni, D. B., & Lazarus, J. (1974). Categorical and noncategorical modes of speech
perception along the voicing continuum. Journal of the Acoustical Society of
America, 55(2), 328-333.
Pisoni, D. B., & Tash, J. (1974). Reaction times to comparisons within and across
phonetic categories. Perception & Psychophysics, 15, 285-299.
Pompino-Marschall, B. (1995). Einfuehrung in die Phonetik. Berlin: de Gruyter.
Popper, R. D. (1972). Pair discrimination for a continuum of synthetic voiced stops
with and without first and third formants. Journal of Psycholinguistic
Research, 1, 205-219.
Povel, D. (1981). Internal representation of simple temporal patterns. Journal of
Experimental Psychology: Human Perception & Performance, 7, 3-18.
Povel, D. (1984). A theoretical framework for rhythm perception. Psychological
Research, 45, 315-337.
Povel, D., & Essens, P. (1984–5). Perception of temporal patterns. Music Perception,
2, 411–440.
Prior, M., & Troup, G. A. (1988). Processing of timbre and rhythm in musicians and
non-musicians. Cortex, 24(3), 451-456.
Profita, J., & Bidder, T. G. (1988). Perfect pitch. American Journal of Medical
Genetics, 29, 763-771.
Rakowski, A. (1972). Direct comparison of absolute and relative pitch. In Hearing
Theory 1972. Eindhoven, The Netherlands: IPO.
Ramus, F., Hauser, M. D., Miller, C., Morris, D., & Mehler, J. (2000). Language
discrimination by human newborns and by cotton-top tamarin monkeys.
Science, 288(5464), 349-351.
Rauscher, F. H., & LeMieux, M. T. (2003). Piano, rhythm, and singing instruction
improve different aspects of spatial-temporal reasoning in Head Start children.
Poster presented at the annual meeting of the Cognitive Neuroscience Society,
New York.
Repp, B. H. (1975). Categorical perception, dichotic interference, and auditory
memory: A "same-different" reaction time study. Unpublished manuscript.
Repp, B. H. (1981a). Perceptual equivalence of two kinds of ambiguous speech
stimuli. Bulletin of the Psychonomic Society, 18, 12-14.
Repp, B. H. (1981b). Two strategies in fricative discrimination. Perception &
Psychophysics, 30, 217-227.
Repp, B. H. (1984). Categorical perception: Issues, methods, findings. In N. J. Lass (Ed.), Speech and Language: Advances in Basic Research and Practice (Vol. 10, pp. 243-335). New York: Academic Press.
Repp, B. H., Healy, A. F., & Crowder, R. G. (1979). Categories and context in the
perception of isolated steady-state vowels. Journal of Experimental
Psychology: Human Perception & Performance, 5, 129-145.
Rosen, S., & Howell, P. (1987). Auditory, articulatory and learning explanations of
categorical perception in speech. In S. Harnad (Ed.), Categorical Perception:
The Groundwork of Cognition (pp. 113-160). Cambridge: Cambridge
University Press.
Sachs, R. M. (1969). Vowel identification and discrimination in isolation vs. word
context. In Quarterly Progress Report (Vol. 93, pp. 220-229). Cambridge,
Mass.: Research Laboratory of Electronics.
Sadie, S., & Tyrrell, J. (Eds.). (2001). The New Grove Dictionary of Music and
Musicians (2nd ed.). London: Oxford University Press.
Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning:
Evidence for developmental reorganization. Developmental Psychology,
37(1), 74-85.
Samuel, A. G. (1977). The effect of discrimination training on speech perception:
Noncategorical perception. Perception & Psychophysics, 22, 321-330.
Sawusch, J. R., Nusbaum, H. C., & Schwab, E. C. (1980). Contextual effects in vowel
perception II. Evidence for two processing mechanisms. Perception &
Psychophysics, 27, 292-302.
Schellenberg, E. G. (2001). Music and nonmusical abilities. Biological Foundations
of Music, 930, 355-371.
Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune:
Identifying popular recordings from brief excerpts. Psychological Bulletin &
Review, 6, 641-646.
Schellenberg, E. G., & Trehub, S. E. (1994). Frequency ratios and the discrimination
of pure tone sequences. Perception & Psychophysics, 56, 472-478.
Schellenberg, E. G., & Trehub, S. E. (1996a). Children's discrimination of melodic
intervals. Developmental Psychology, 32.
Schellenberg, E. G., & Trehub, S. E. (1996b). Natural musical intervals: Evidence
from infant listeners. Psychological Science, 5, 272-277.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread.
Psychological Science, 14(3), 262-266.
Schouten, B., Gerrits, E., & Hessen, A. J. (2003). The end of categorical perception as
we know it. Speech Communication, 41, 71-80.
Schouten, J. F. (1940). The perception of pitch. Philips Technical Review, 5, 286-294.
Schouten, M. E. H., & Van Hessen, A. J. (1992). Modelling phoneme perception: I.
Categorical perception. Journal of the Acoustical Society of America, 92,
1841-1855.
Schuh, R. (1971). Toward a typology of Chadic vowel and tone systems. Unpublished manuscript.
Seashore, C. E. (1938). The Psychology of Music. New York: McGraw-Hill.
Seashore, C. E., Lewis, D., & Saetveit, J. (1939, 1960). Seashore Measures of
Musical Talents. New York: The Psychological Corporation.
Sergeant, D. (1969). Experimental investigation of absolute pitch. Journal of
Research in Music Education, 17, 135-143.
Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Identification of consonants and
vowels presented to left and right ears. Quarterly Journal of Experimental
Psychology, 19, 59-63.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of
auditory sentence processing. Journal of Psycholinguistic Research, 25(2),
193-247.
Shen, X. (1990). Tonal coarticulation in Mandarin. Journal of Phonetics, 18, 281-285.
Shen, X. S. (1989). Toward a register approach in teaching Mandarin tones. Journal
of Chinese Language Teachers Association, 24, 27-47.
Shepard, R. (1982a). Geometrical approximations to the structure of musical pitch.
Psychological Review, 89, 305-333.
Shepard, R. (1982b). Structural representation of musical pitch. In D. Deutsch (Ed.), The Psychology of Music (pp. 344-390). London: Academic Press.
Shove, P., & Repp, B. H. (1995). Musical motion and performance: Theoretical and
empirical perspectives. In J. Rink (Ed.), The Practice of Performance (pp. 55-83). Cambridge: Cambridge University Press.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability. London:
Methuen.
Siegel, J. A., & Siegel, W. (1977a). Absolute identification of notes and intervals by
musicians. Perception & Psychophysics, 21, 143-152.
Siegel, J. A., & Siegel, W. (1977b). Categorical perception of tonal intervals:
Musicians can't tell sharp from flat. Perception & Psychophysics, 21, 399-407.
Sinnott, J. M., Beecher, M. D., Moody, D. B., & Stebbins, W. C. (1976). Speech
sound discrimination by monkeys and humans. Journal of the Acoustical
Society of America, 60, 687-695.
Slevc, R. L., & Miyake, A. (2006). Individual differences in second-language
proficiency: Does musical ability matter? Psychological Science, 17(8), 675-
681.
Sloboda, J., & Gregory, A. (1980). The psychological reality of musical segments.
Canadian Journal of Psychology, 34, 274-280.
Snowdon, C. (1987). A naturalistic view of categorical perception. In S. Harnad (Ed.),
Categorical Perception: The Groundwork of Cognition (pp. 332-355). New
York: Cambridge University Press.
So, L. K. H. (1996). Tonal changes in Hong Kong Cantonese. Current Issues in
Language and Society, 3(2), 186-189.
Stagray, J., & Downs, D. (1993). Differential sensitivity for frequency among
speakers of a tone and a nontone language. Journal of Chinese Linguistics,
21(1), 143-163.
Stainsby, T., Haszard Morris, R., Malloch, S., & Burnham, D. (2002). MARCS Auditory Perception Toolbox (APT). Sydney: MARCS Auditory Laboratories. Retrieved from http://marcs.uws.edu.au/research/software/apt.htm
Standley, J. M., & Hughes, J. E. (1997). Evaluation of an early intervention music
curriculum for enhancing prereading/writing skills. Music Therapy
Perspectives, 15(2), 79-85.
Stevens, K. N. (1968). On the relations between speech movements and speech
perception. Zeitschrift fuer Phonetik, Sprachwissenschaft, und
Kommunikationsforschung, 21, 102-106.
Stevens, K. N. (1981). Constraints imposed by the auditory system on the properties
used to classify speech sounds: Data from phonology, acoustics and psycho-
acoustics. In T. F. Myers, J. Laver & J. Anderson (Eds.), The Cognitive
representation of Speech. Amsterdam: North Holland.
Stevens, K. N. (1986). Models of phonetic recognition II: A feature-based model of
speech recognition. In P. Mermelstein (Ed.), Proceedings of the Montreal
Satellite Symposium on Speech Recognition (pp. 66-67). Montreal: 12th
International Congress of Acoustics.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-
45.
Stevens, S. S., & Volkmann, J. (1940). The relation of pitch to frequency: a revised
scale. American Journal of Psychology, 53, 329–353.
Strange, W. (1972). The effects of training on the perception of synthetic speech
sounds: Voice onset time. Unpublished PhD Dissertation, University of
Minnesota.
Studdert-Kennedy, M. (1976). Speech perception. In N. J. Lass (Ed.), Contemporary
issues in experimental phonetics (pp. 243-293). New York: Academic press.
Studdert-Kennedy, M., Liberman, A. M., Harris, K. S., & Cooper, F. S. (1970). Motor
theory of speech perception: A reply to Lane's critical review. Psychological
Review, 77, 234-249.
Studdert-Kennedy, M., Liberman, A. M., & Stevens, K. N. (1963). Reaction times to
synthetic stop consonants and vowels at phoneme centers and phoneme
boundaries. Journal of the Acoustical Society of America, 35, 1900 (Abstract).
Studdert-Kennedy, M., Liberman, A. M., & Stevens, K. N. (1964). Reaction time
during the discrimination of stop consonants. Journal of the Acoustical Society
of America, 36, 1989.
Studdert-Kennedy, M., & Shankweiler, D. P. (1970). Hemispheric specialization for
speech perception. Journal of the Acoustical Society of America, 48, 579-594.
Summerfield, Q. (1982). Differences between spectral dependencies in auditory and
phonetic temporal processing: Relevance to the perception of voicing in initial
stops. Journal of the Acoustical Society of America 72 (1), 51-61.
Sundberg, J. (1999). The Perception of Singing. In D. Deutsch (Ed.), The Psychology
of Music (pp. 171-214). New York: Academic Press.
Surprenant, A. M., & Watson, C. S. (2001). Individual differences in the processing
of speech and nonspeech sounds by normal-hearing listeners. Journal of the
Acoustical Society of America, 110, 2085-2095.
Swoboda, P. J., Morse, P. A., & Leavitt, L. A. (1976). Continuous vowel
discrimination in normal and at risk infants. Child Development, 47(2), 459-
465.
Syrdal-Lasky, A. (1978). Effects of intensity on the categorical perception of stop
consonants and isolated second formant transitions. Perception &
Psychophysics, 23, 420-432.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.).
Needham Heights, MA: Allyn and Bacon.
Tahta, S., Wood, M., & Loewenthal, K. (1981). Foreign accents: factors relating to
the transfer of accent from the first to the second language. Language and
Speech, 24(4), 265-272.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin,
113(2), 345-361.
Tartter, V. C. (1981). A comparison of the identification and discrimination of
synthetic vowel and stop consonant stimuli with various acoustic properties.
Journal of Phonetics, 9, 477-486.
Thogmartin, C. (1982). Age, individual differences in musical and verbal aptitude,
and pronunciation achievement by elementary school children learning a
foreign language. International Review of Applied Linguistics in Language
Teaching, 20(1), 66-72.
Thompson, L. (1987). A Vietnamese Reference Grammar. Hawaii: University of
Hawaii.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech
prosody: Do music lessons help? Emotion, 4(1), 46-64.
Todd, N. P. M. (1992). The dynamics of dynamics: A model of musical expression.
Journal of the Acoustical Society of America, 91, 3540–3550.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity
to Western musical structure. Journal of Experimental Psychology: Human
Perception & Performance, 18, 394-402.
Trainor, L. J., & Trehub, S. E. (1993). What mediates infants' and adults' superior
processing of the major over the augmented triad. Music Perception, 11, 185-
196.
Trehub, S. E. (1973). Infants' sensitivity to vowel and tonal contrasts. Developmental
Psychology, 9(1), 91-96.
Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The
role of melodic contour. Child Development, 55, 821-830.
Trehub, S. E., Schellenberg, E., & Hill, D. (1997). Music perception and cognition: A
developmental perspective. In I. Deliege & J. Sloboda (Eds.), Music
Perception and Cognition (pp. 121-162). Sussex: Psychology Press.
Trehub, S. E., Thorpe, L. A., & Morrongiello, B. A. (1987). Organizational processes
in infants' perception of auditory patterns. Child Development, 58, 741-749.
Trehub, S. E., Thorpe, L. A., & Trainor, L. J. (1990). Infants' perception of good and
bad melodies. Psychomusicology, 9, 5–19.
Trehub, S. E., & Trainor, L. J. (1993). Listening strategies in infants: The roots of
language and musical development. In S. McAdams & E. Bigand (Eds.),
Thinking in sound: Cognitive perspectives on human audition (pp. 278-327).
London: Oxford University Press.
Truslit, A. (1938). Gestaltung und Bewegung in der Musik. Berlin.
Tse, J. K.-P. (1977). Tone acquisition in Cantonese: A longitudinal case study.
Unpublished manuscript, University of Southern California.
Tseng, C., Massaro, D. W., & Cohen, M. (1986). Lexical tone perception in Mandarin
Chinese. In H. Kao & R. Hoosain (Eds.), Evaluation and Integration of
Acoustic Features, in Linguistics, Psychology, and the Chinese Language (pp.
91-104). Hong Kong: Centre of Asian Studies.
Tuaycharoen, P. (1977). The phonetic and phonological development of a Thai baby:
From early communicative interaction to speech. Unpublished PhD
Dissertation, University of London.
Van Hessen, A. J., & Schouten, M. E. H. (1999). Categorical perception as a function
of stimulus quality. Phonetica, 56(1-2), 56-72.
Van Lancker, D. (1980). Cerebral lateralization of pitch cues in the linguistic signal.
Papers in Linguistics, 13(2), 201-277.
Van Lancker, D., & Fromkin, V. (1973). Hemispheric specialization for pitch and
"tone": Evidence from Thai. Journal of Phonetics, 1, 101-109.
Van Lancker, D., & Fromkin, V. (1978). Cerebral dominance for pitch contrasts in
tone language speakers and in musically untrained and trained English
speakers. Journal of Phonetics, 6, 19-23.
Vaughn, K. (2000). Music and mathematics: Modest support for the oft-claimed
relationship. Journal of Aesthetic Education, 34(3-4), 149-166.
Vinegrad, M. D. (1972). A direct magnitude scaling method to investigate categorical
versus continuous modes of speech perception. Language and Speech, 15,
114-121.
Vu, T., Nguyen, D., Luong, M., & Hosom, J.-P. (2005). Vietnamese large vocabulary
continuous speech recognition. Paper presented at the INTERSPEECH 2005,
Lisboa, Portugal.
Wang, W. (1971). The basis of speech. In C. Reed (Ed.), The Learning of Language.
New York: Appleton-Century-Crofts.
Wang, W. (1976). Language change. Origins and evolution of language and speech.
Annals of the New York Academy of Sciences, 280, 61-72.
Wang, Y., Jongman, A., & Sereno, J. A. (2001). Dichotic perception of Mandarin
tones by Chinese and American listeners. Brain and Language, 78(3), 332-
348.
Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation
of Mandarin tone productions before and after perceptual training. Journal of
the Acoustical Society of America, 113, 1033-1044.
Wang, Y., Sereno, J., Jongman, A., & Hirsch, J. (2001). Cortical reorganization
associated with the acquisition of Mandarin tones by American learners: An
fMRI study. Paper presented at the Sixth International Conference on Spoken
Language Processing, Beijing.
Wang, Y., Sereno, J. A., Jongman, A., & Hirsch, J. (2003). fMRI evidence for cortical
modification during learning of Mandarin lexical tone. Journal of Cognitive
Neuroscience, 15, 1019-1027.
Wang, Y., & Spence, M. (1999). Training American listeners to perceive Mandarin
tone. Journal of the Acoustical Society of America, 106, 3649-3658.
Ward, W. D. (1999). Absolute Pitch. In D. Deutsch (Ed.), The Psychology of Music (pp.
265-298). New York: Academic Press.
Ward, W. D., & Burns, E. M. (1982). Absolute Pitch. In D. Deutsch (Ed.), The
Psychology of Music (pp. 431-451). New York: Academic Press.
Ward, W. D., & Burns, E. M. (1978). Singing without auditory feedback. Journal of
Research in Singing and Applied Vocal Pedagogy, 1(2), 24-44.
Waters, R. S., & Wilson, W. A. (1976). Speech perception in rhesus monkeys: The
voicing distinction in synthesized labial and velar stop consonants. Perception
& Psychophysics, 19, 285-289.
Wayland, R., & Guion, S. (2003). Perceptual discrimination of Thai tones by naive
and experienced learners of Thai. Applied Psycholinguistics, 24, 113-129.
Werker, J. F., & Logan, J. S. (1985). Cross-language evidence for three factors in
speech perception. Perception & Psychophysics, 37, 35-44.
Werker, J. F., & Tees, R. C. (1984). Phonemic and phonetic factors in adult cross-
language speech perception. Journal of the Acoustical Society of America, 75,
1866-1878.
Werker, J. F., & Tees, R. C. (1992). The organization and reorganization of human
speech perception. Annual Review of Neuroscience, 15, 377-402.
Wernicke, C. (1874). The aphasia symptom-complex: A psychological study on an
anatomical basis.
Westphal, M. E., Leutenegger, R. R., & Wagner, D. L. (1969). Some psycho-acoustic
and intellectual correlates of achievement in German language learning of
Junior High School students. The Modern Language Journal, 53(4), 258-266.
Whalen, D. H., & Xu, Y. (1992). Information for Mandarin tones in the amplitude
contour and in brief segments. Phonetica, 49, 25-47.
White, C. M. (1981). Tonal perception errors and interference from English
intonation. Journal of Chinese Language Teachers Association, 16, 27-56.
Williams, C., & Stevens, K. N. (1972). Emotions and speech: Some acoustical
correlates. Journal of the Acoustical Society of America, 52(4), 1238-1250.
Williams, L. (1977). The perception of stop consonant voicing by Spanish-English
bilinguals. Perception & Psychophysics, 21, 289-297.
Wing, A. M., & Kristofferson, A. B. (1973). The timing of interresponse intervals.
Perception & Psychophysics, 13, 455–460.
Wise, C. M., & Chong, L. P.-H. (1957). Intelligibility of whispering in a tone
language. Journal of Speech and Hearing Disorders, 22, 335-338.
Wong, P. (2002). Hemispheric specialization of linguistic pitch patterns. Brain
Research Bulletin, 59(2), 83-95.
Wong, P., & Diehl, R. L. (2002). How can the lyrics of a song in a tone language be
understood? Psychology of Music, 30(2), 202-209.
Wong, P., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience
shapes human brainstem encoding of linguistic pitch patterns. Nature
Neuroscience, in press.
Wood, C. C. (1976). Discriminability, response bias, and phoneme categories in
discrimination of voice onset time. Journal of the Acoustical Society of
America, 60, 1381-1389.
Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the
Acoustical Society of America, 95, 2240-2253.
Xu, Y., Gandour, J., & Francis, A. L. (2006). Effects of language experience and
stimulus complexity on the categorical perception of pitch direction. Journal
of the Acoustical Society of America, 120(2), 1063-1074.
Yip, M. (2001). Tone. Cambridge: Cambridge University Press.
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Yiu, E., & Fok, A. (1995). Lexical tone disruption in Cantonese aphasic speakers.
Clinical Linguistics & Phonetics, 9, 79-92.
Yung, B. (1983). Creative process in Cantonese opera I: The role of linguistic tones.
Ethnomusicology, 27, 29-47.
Zatorre, R., & Halpern, A. (1979). Identification, discrimination, and selective
adaptation of simultaneous musical intervals. Perception & Psychophysics,
26(5), 384-395.
Zatorre, R. J. (2003). Absolute pitch: a model for understanding the influence of
genes and development on neural and cognitive function. Nature
Neuroscience, 6(7), 692-695.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory
cortex: Music and speech. Trends in Cognitive Sciences, 6(1), 37-46.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998).
Functional anatomy of musical processing in listeners with absolute pitch and
relative pitch. Proceedings of the National Academy of Sciences of the United
States of America, 95(6), 3172-3177.
Zue, V. (1976). Some perceptual experiments on the Mandarin tones. Paper presented
at the 92nd Meeting of the Acoustical Society of America, San Diego,
California.
Appendix A6.1 Consent Form and Questionnaire

MARCS Auditory Laboratories
College of Arts, Education and Social Sciences
Denis Burnham Professor of Psychology, Director MARCS
Phone: (+612) 9772 6681 Fax: (+612) 9772 6736 Email: [email protected] Web: www.uws.edu.au/marcs/
Speech perception across different languages
October 2003
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily adult humans produce speech sounds in their native language, and how easily humans perceive that acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of English, Thai, Vietnamese and Mandarin. You are invited to participate because you are a native speaker of English, Thai, Vietnamese or Mandarin.

If you participate, you will complete a 45-minute session in which you will be asked to identify and discriminate short sound items. If you choose to participate, you will receive $15 at the completion of the study, in reimbursement of travel costs.

Participation is voluntary. You have a right not to participate in, or subsequently withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney or the University of New South Wales. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Caroline Jones on 9772 6230. Please take time now to ask any questions you may have. There are no anticipated risks to your participation. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee, and ratified by the UNSW Ethics Secretariat.
If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: 02 4570 1136). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages

CONSENT FORM

Please read the information sheet before signing this.

1. Yes / No  I, .............................................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes / No  I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment and the nature and the possible risks of the investigation, and the statement has been explained to me to my satisfaction.
3. Yes / No  I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship to the University of Western Sydney.
4. Yes / No  I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes / No  I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes / No  I understand that if I have any questions relating to my participation in this research, I may contact Caroline Jones (9772 6230) who will be happy to answer them.
7. Yes / No  I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes / No  I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee, and ratified by the UNSW Ethics Secretariat. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: 02 4570 1136). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.

Participant's signature: …………………………………..   Date: …………………………………..
Speech perception across different languages

PARTICIPANT QUESTIONNAIRE

Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1. Your initials: .............................
2. Age: _____ years, _____ months
3. Gender: Male / Female (please circle)
4. Hearing: Do you have normal hearing? Yes / No (please circle)
5. Speech/language history: Do you have any history of speech/language problems? Yes / No
6. Language background: Please list all languages which you speak natively, i.e. which you learned from birth.
   Native language/s: ……………………………………..
   Please also list all languages which you have some knowledge of, and indicate how old you were when you started learning each language.
   Other language/s you have knowledge of: ……………………………………..
   Age at which you started learning this language: …………………………………….
7. Musical background: Please list all the musical instruments which you play or have played, and indicate for how long you play/ed each instrument, e.g. violin, 5 years.
   Instrument: ………………………   Years of playing: ……………
8. Place of birth (City/town & Country): .................................................................
9. If your Mandarin / Vietnamese / Thai dialect is typical of a particular area/city (e.g. Beijing, Hong Kong), please write the name of the place here: ........................................................................................
THANK YOU!
Appendix A6.2 DMDX Script - Identification

<ep><nfbt><dfm 1><n 480><s 480><d 75><azk><cr><fd 100><id "keyboard"><dbc 0><dwc 255000000><eop>
$0 <ln -3>"In this experiment, you'll identify sounds.", <ln -2>"You'll press the LEFT key", <ln -1>"when you hear one kind of sound.", <ln 1>"And you'll press the RIGHT key", <ln 2>"when you hear the other kind of sound.", <ln 4>"Before you start the experiment,", <ln 5>"let's do some practice.", <ln 6>"Please press the spacebar to continue.";
0 <ln -3>"PRACTICE TRIALS - Please press the keys for practice!", <ln 3>"Please press the spacebar to start the practice.";
+550 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-551 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
+552 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-553 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+554 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-555 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+556 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-557 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
0 <ln -3>"Good. Now you'll get to practise this some more.", <ln 0>"When you get 8 in a row correct,", <ln 1>"you'll move on to the testing phase.", <ln 3>"Please press the spacebar to continue.";
0 <ln -3>"TRAINING TRIALS", <ln 3>"Please press the spacebar to start the training.";
2000 <set 1,2>;$
\+301 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
-302 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
+303 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
-304 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
+201 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
-202 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
+203 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
-204 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-4000>/;\
$4000 <bicGT 1,1,-12000>;$
\+305 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
-306 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
+307 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
-308 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
+205 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
-206 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
+207 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
-208 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-6000>/;\
$6000 <bicGT 1,1,-12000>;$
\+309 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
-310 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
+311 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
-312 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
+209 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
-210 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
+211 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
-212 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-8000>/;\
$8000 <bicGT 1,1,-12000>;$
\+313 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
-314 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
+315 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
-316 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
+213 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
-214 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
+215 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
-216 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-10000>/;\
$10000 <bicGT 1,1,-12000>;$
\+317 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
-318 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
+319 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
-320 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
+217 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
-218 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
+219 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
-220 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-2000>/;\
$12000;
0 <ln -3>"Great, 8 out of 8!", <ln -2>"Now you'll move on to the experiment", <ln -1>"The experiment consists of two blocks.", <ln 0>"There will be a break in each of the blocks and between them.", <ln 2>"Please press the key that best corresponds to what you hear", <ln 4>"Please respond as quickly and accurately as you can", <ln 5>"Press the spacebar to begin the experiment";$
\-121 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-122 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-123 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-124 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-125 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-51 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-52 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-53 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-54 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-55 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-111 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-112 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-113 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-114 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-115 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-41 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-42 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-43 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-44 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-45 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-101 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-102 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-103 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-104 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-105 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-31 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-32 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-33 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-34 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-35 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-91 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-92 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-93 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-94 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-95 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-21 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-22 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-23 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-24 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-25 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-81 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-82 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-83 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-84 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-85 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-11 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-12 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-13 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-14 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-15 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-131 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-132 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-133 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-134 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-135 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-61 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-62 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-63 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-64 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-65 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-71 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-72 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-73 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-74 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-75 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-1 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-2 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-3 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-4 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-5 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;\
$0 <ln -4>"THANK YOU for your effort!", <ln -3>"In the second part, you'll also identify sounds.", <ln -2>"You'll press the LEFT key", <ln -1>"when you hear one kind of sound.", <ln 1>"And you'll press the RIGHT key", <ln 2>"when you hear the other kind of sound.", <ln 4>"Before you start the experiment,", <ln 5>"let's do some practice.", <ln 6>"Please press the spacebar to continue.";
0 <ln -3>"PRACTICE TRIALS - Please press the keys for practice!", <ln 3>"Please press the spacebar to start the practice.";
+558 <nfb 0> "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-559 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
+560 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-561 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+562 <nfb 0> "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-563 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+564 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-565 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
0 <ln -3>"Good. Now you'll get to practise this some more.", <ln 0>"When you get 8 in a row correct,", <ln 1>"you'll move on to the testing phase.", <ln 3>"Please press the spacebar to continue.";
0 <ln -3>"TRAINING TRIALS", <ln 3>"Please press the spacebar to start the training.";
3000 <set 1,2>;$
\+221 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
-222 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
+223 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
-224 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
+321 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
-322 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
+323 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
-324 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-14000>/;\
$14000 <bicGT 1,1,-21000>;$
\+225 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
-226 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
+227 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
-228 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
+325 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
-326 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
+327 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
-328 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-16000>/;\
$16000 <bicGT 1,1,-21000>;$
\+229 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
-230 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
+231 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
-232 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
+329 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
-330 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
+331 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
-332 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-18000>/;\
$18000 <bicGT 1,1,-21000>;$
\+233 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
-234 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
+235 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
-236 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
+333 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
-334 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
+335 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
-336 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-20000>/;\
$20000 <bicGT 1,1,-21000>;$
\+237 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
-238 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
+239 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
-240 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
+337 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
-338 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
+339 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
-340 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-3000>/;\
$21000;
0 <ln -3>"Great, 8 out of 8!", <ln -1>"Now you'll move on to the experiment", <ln 0>"Please press the key that best corresponds to what you hear", <ln 2>"Please respond as quickly and accurately as you can", <ln 4>"Press the spacebar to begin the experiment";$
\-56 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-57 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-58 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-59 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-60 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-126 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-127 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-128 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-129 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-130 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-46 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-47 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -48 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -49 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -50 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -116 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -117 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -118 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -119 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -120 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -36 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -37 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -38 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -39 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -40 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -106 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -107 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -108 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -109 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -110 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -26 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -27 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -28 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -29 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -30 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -96 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -97 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -98 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -99 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -100 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -16 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -17 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -18 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -19 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -20 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -86 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -87 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -88 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -89 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -90 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -66 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -67 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -68 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -69 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -70 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -136 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -137 
<nfb>“ready”/<wav 2>*”200220lsin.wav”/; -138 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -139 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -140 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -6 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -7 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -8 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -9 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -10 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -76 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -77 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -78 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -79 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -80 <nfb>“ready”/<wav 2>*”200160lsin.wav”/;\ $0 <ln -4> “End of experiment. THANK YOU!”;$
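The identification script above repeats the same trial specification many times, with only the item number and wav filename changing. A short generator makes the pattern explicit; this is a hypothetical helper (the function name, argument names, and the assumption that filenames encode standard frequency, comparison frequency, and stimulus type as in "200210lspn.wav" are mine, not part of the original materials):

```python
# Hypothetical generator for the repetitive DMDX no-feedback trial lines
# above. Filenames such as "200210lspn.wav" appear to encode the standard
# (200), the comparison frequency (e.g. 210), and the stimulus type
# (lspn = speech, lsin = sine wave); this naming scheme is an assumption.
def dmdx_trials(first_item, comparison, stim, n=5, standard=200):
    """Return n negative-response ("same as standard") trial lines."""
    lines = []
    for i in range(n):
        wav = f"{standard}{comparison}l{stim}.wav"
        lines.append(f'-{first_item + i} <nfb>"ready"/<wav 2>*"{wav}"/;')
    return lines

print(dmdx_trials(56, 210, "spn")[0])
# -56 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
```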
Appendix A6.3 DMDX Script - Discrimination <ep><cr><nfbt><t 1500><dfm 1><n 176><s 176><d 59><azk><fd 120><id "keyboard"><dbc 0><dwc 000255000><eop> $0 <ln -4>"In this task, you’ll hear two sounds in close succession”, <ln -2>”If they’re the SAME sound, press the LEFT shift key”, <ln -1>”If they’re DIFFERENT sounds, press the RIGHT shift key”, <ln 2>”Please respond as quickly and accurately as you can”, <ln 3>”If you’re really not sure, please respond with your first impression”, <ln 5>”Please press the spacebar to do 8 practice items”; $ \ +501 <nfb 0>“test”/<wav 2>*"160170lsin500.wav”; +502 <nfb 0>“test”/<wav 2>*"160170lsin500.wav”; +503 <nfb 0>“test”/<wav 2>*"170160lsin500.wav”; +504 <nfb 0>“test”/<wav 2>*"170160lsin500.wav”; -505 <nfb 0>“test”/<wav 2>*"200160lsinsame500.wav”; -506 <nfb 0>“test”/<wav 2>*"200180lsinsame500.wav”; -507 <nfb 0>“test”/<wav 2>*"200210lsinsame500.wav”; -508 <nfb 0>“test”/<wav 2>*"200170lsinsame500.wav”; \ $0 “Great, thanks. Please press the spacebar to start the testing.”;$ \ -1 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -2 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -3 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -4 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -9 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -10 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -11 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -12 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -17 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -18 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -19 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -20 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -25 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -26 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -27 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -28 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -33 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -34 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -35 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -36 <nfb> “test”/<wav 
2>*"200200lsinsame500.wav"; -41 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -42 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -43 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -44 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -49 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -50 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -51 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -52 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; +61 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +62 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +63 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +91 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +92 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +66 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +67 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +96 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +97 <nfb> “test”/<wav 2>*"180170lsin500.wav";
+98 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +71 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +72 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +73 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +101 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +102 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +76 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +77 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +78 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +106 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +107 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +81 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +82 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +83 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +111 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +112 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +86 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +87 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +116 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +117 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +118 <nfb> “test”/<wav 2>*"220210lsin500.wav"; \ $0 <ln -4>“Great, good going. 
You’re half-way through the first part.”, <ln -2> “Please press the spacebar to continue.”;$ \ -5 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -6 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -7 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -8 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -13 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -14 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -15 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -16 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -21 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -22 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -23 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -24 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -29 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -30 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -31 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -32 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -37 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -38 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -39 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -40 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -45 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -46 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -47 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -48 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -53 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -54 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -55 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -56 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; +64 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +65 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +93 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +94 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +95 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +68 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +69 <nfb> “test”/<wav 2>*"170180lsin500.wav";
+70 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +99 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +100 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +74 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +75 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +103 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +104 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +105 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +78 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +79 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +80 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +109 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +110 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +84 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +85 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +113 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +114 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +115 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +88 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +89 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +90 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +119 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +120 <nfb> “test”/<wav 2>*"220210lsin500.wav"; \ $0 “End of part one. Thanks a lot.”; 0 <ln -4>"In the second part, you’ll again hear two sounds in close succession”, <ln -2>”If they’re the SAME sound, press the LEFT shift key”, <ln -1>”If they’re DIFFERENT sounds, press the RIGHT shift key”, <ln 2>”Please respond as quickly and accurately as you can”, <ln 3>”If you’re really not sure, please respond with your first impression”, <ln 5>”Please press the spacebar to do 8 practice items”; $ \ +509 <nfb 0>“test”/<wav 2>*"160170lspn500.wav”; +510 <nfb 0>“test”/<wav 2>*"160170lspn500.wav”; +511 <nfb 0>“test”/<wav 2>*"170160lspn500.wav”; +512 <nfb 0>“test”/<wav 2>*"170160lspn500.wav”; -513 <nfb 0>“test”/<wav 2>*"200160lspnsame500.wav”; -514 <nfb 0>“test”/<wav 2>*"200180lspnsame500.wav”; -515 <nfb 0>“test”/<wav 2>*"200210lspnsame500.wav”; -516 <nfb 0>“test”/<wav 2>*"200170lspnsame500.wav”; \ $0 “Great, thanks. 
Please press the spacebar to start the testing.”;$ \ -1 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -2 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -3 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -4 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -9 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -10 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -11 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -12 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -17 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -18 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -19 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -20 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -25 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -26 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -27 <nfb> “test”/<wav 2>*"200190lspnsame500.wav";
-28 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -33 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -34 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -35 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -36 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -41 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -42 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -43 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -44 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -49 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -50 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -51 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -52 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; +61 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +62 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +63 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +91 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +92 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +66 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +67 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +96 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +97 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +98 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +71 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +72 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +73 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +101 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +102 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +76 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +77 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +106 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +107 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +108 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +81 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +82 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +83 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +111 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +112 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +86 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +87 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +116 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +117 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +118 
<nfb> “test”/<wav 2>*"220210lspn500.wav"; \ $0 <ln -4>“Great, good going. You’re half-way throughthe second part.”, <ln -2> “Please press the spacebar to continue.”;$ \ -5 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -6 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -7 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -8 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -13 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -14 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -15 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -16 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -21 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -22 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -23 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -24 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -29 <nfb> “test”/<wav 2>*"200190lspnsame500.wav";
-30 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -31 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -32 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -37 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -38 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -39 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -40 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -45 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -46 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -47 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -48 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -53 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -54 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -55 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -56 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; +64 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +65 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +93 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +94 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +95 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +68 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +69 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +70 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +99 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +100 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +74 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +75 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +103 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +104 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +105 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +78 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +79 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +80 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +109 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +110 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +84 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +85 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +113 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +114 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +115 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +88 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +89 <nfb> “test”/<wav 
2>*"210220lspn500.wav"; +90 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +119 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +120 <nfb> “test”/<wav 2>*"220210lspn500.wav"; \ $0 “End of experiment. Thanks a lot.”;$
Appendix A6.4 Raw Data – Criterion

Tone type   Language   Presentation manner   Stimulus (sine/speech or set 1/2)   Criterion score
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 13
tonal Thai blocked speech 10
tonal Thai blocked sine 8
tonal Thai blocked speech 9
tonal Thai blocked sine 13
tonal Thai blocked speech 10
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 36
tonal Thai blocked speech 8
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Vietnamese blocked sine 12
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 10
tonal Vietnamese blocked speech 15
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 11
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 20
tonal Vietnamese blocked sine 9
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 14
tonal Mandarin blocked sine 11
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 10
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 10
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 13
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
tonal Thai mixed set 1 15
tonal Thai mixed set 2 8
tonal Thai mixed set 1 39
tonal Thai mixed set 2 8
tonal Thai mixed set 1 43
tonal Thai mixed set 2 19
tonal Thai mixed set 1 11
tonal Thai mixed set 2 8
tonal Thai mixed set 1 8
tonal Thai mixed set 2 8
tonal Thai mixed set 1 21
tonal Thai mixed set 2 8
tonal Thai mixed set 1 28
tonal Thai mixed set 2 8
tonal Thai mixed set 1 8
tonal Thai mixed set 2 8
tonal Vietnamese mixed set 1 19
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 42
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 43
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 10
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 36
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 11
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 8
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 11
tonal Vietnamese mixed set 2 11
tonal Mandarin mixed set 1 17
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 23
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 43
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 29
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 12
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 13
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 12
tonal Mandarin mixed set 2 11
tonal Mandarin mixed set 1 16
tonal Mandarin mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 11
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 27
non-tonal Aust. English mixed set 2 21
non-tonal Aust. English mixed set 1 16
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 9
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 22
non-tonal Aust. English mixed set 2 8
Appendix A6.5 Statistical Analyses – Criterion

T-Test: Blocked vs. Mixed

Group Statistics
------------------------------------------------
Group (mixblock)    N    Mean    Std. Deviation    Std. Error Mean
block              64    9.41     4.019             .502
mix                64   14.27    10.267            1.283
------------------------------------------------

Independent Samples Test (criterion)
Levene's Test for Equality of Variances: F = 36.504, Sig. = .000
t-test for Equality of Means:
Equal variances assumed:     t = -3.526, df = 126, Sig. (2-tailed) = .001,
  Mean Difference = -4.859, Std. Error Difference = 1.378,
  95% Confidence Interval of the Difference [-7.587, -2.132]
Equal variances not assumed: t = -3.526, df = 81.863, Sig. (2-tailed) = .001,
  Mean Difference = -4.859, Std. Error Difference = 1.378,
  95% Confidence Interval of the Difference [-7.601, -2.118]
ANOVA Blocked
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 25.521 1 25.521 1.678
B2 22.042 1 22.042 1.450
B3 6.125 1 6.125 0.403
Error 425.750 28 15.205
------------------------------------------------
Within
------------------------------------------------
W1 1.563 1 1.563 0.096
B1W1 4.688 1 4.688 0.289
B2W1 6.000 1 6.000 0.370
B3W1 72.000 1 72.000 4.443
Error 453.750 28 16.205
------------------------------------------------
ANOVA Mixed
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 148.755 1 148.755 1.628
B2 10.010 1 10.010 0.110
B3 0.031 1 0.031 0.000
Error 2558.188 28 91.364
------------------------------------------------
Within
------------------------------------------------
W1 1816.891 1 1816.891 27.337
B1W1 236.297 1 236.297 3.555
B2W1 2.344 1 2.344 0.035
B3W1 7.031 1 7.031 0.106
Error 1860.938 28 66.462
------------------------------------------------
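Each F ratio in these summary tables is simply the effect mean square divided by the corresponding error mean square; a minimal check using two entries copied from the tables above:

```python
# F = MS(effect) / MS(error), with MS values from the ANOVA tables above
f_b3w1_blocked = 72.000 / 16.205    # blocked analysis, B3W1 interaction
f_w1_mixed = 1816.891 / 66.462      # mixed analysis, W1 main effect

print(round(f_b3w1_blocked, 3))     # 4.443
print(round(f_w1_mixed, 3))         # 27.337
```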
Appendix A6.6 Raw Data – Crossover Values

Language   Presentation manner   Set   Crossover sine-wave   Crossover speech
Thai mixed 1 200 205
Thai mixed 1 181 188
Thai mixed 1 182 201
Thai mixed 1 199 205
Thai mixed 1 191 209
Thai mixed 1 183 194.9
Thai mixed 1 175 183
Thai mixed 1 202 198
Thai blocked 2 204 190
Thai blocked 2 194 191
Thai blocked 2 198 201
Thai blocked 2 205 198
Thai blocked 2 193 192
Thai blocked 2 194.9 209
Thai blocked 2 187 188
Thai blocked 2 193 201
Mandarin mixed 1 171 189
Mandarin mixed 1 179 189
Mandarin mixed 1 194 179
Mandarin mixed 1 179 199
Mandarin mixed 1 183 188
Mandarin mixed 1 170 177
Mandarin mixed 1 176 184
Mandarin mixed 1 186 193
Mandarin blocked 2 184 192
Mandarin blocked 2 189 190
Mandarin blocked 2 184.7 194
Mandarin blocked 2 185 185
Mandarin blocked 2 184 181
Mandarin blocked 2 171 181
Mandarin blocked 2 181 182
Mandarin blocked 2 191 201
Vietnamese mixed 1 180 191
Vietnamese mixed 1 195 192
Vietnamese mixed 1 181 186
Vietnamese mixed 1 185 188
Vietnamese mixed 1 189 191
Vietnamese mixed 1 181 193
Vietnamese mixed 1 180 191
Vietnamese mixed 1 183 164
Vietnamese blocked 2 193 191
Vietnamese blocked 2 186 193
Vietnamese blocked 2 193 195
Vietnamese blocked 2 186 190
Vietnamese blocked 2 180 190
Vietnamese blocked 2 200 195
Vietnamese blocked 2 201 191
Vietnamese blocked 2 188 191
Australian mixed 1 191 195
Australian mixed 1 177 189
Australian mixed 1 196 194
Australian mixed 1 193 181
Australian mixed 1 190 193
Australian mixed 1 192 191
Australian mixed 1 198 200
Australian mixed 1 190 199
Australian blocked 2 193 193
Australian blocked 2 198 200
Australian blocked 2 199 196
Australian blocked 2 195 195
Australian blocked 2 186 184
Australian blocked 2 193 208
Australian blocked 2 189 196
Australian blocked 2 190 193
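Cell means for the crossover data can be recovered directly from the rows above; for example, for the eight Thai mixed-presentation sine-wave crossovers (values transcribed from the first eight Thai rows of the table):

```python
# Thai, mixed presentation, sine-wave crossover values from the table above
thai_mixed_sine = [200, 181, 182, 199, 191, 183, 175, 202]

mean = sum(thai_mixed_sine) / len(thai_mixed_sine)
print(mean)   # 189.125
```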
Appendix A6.7 Statistical Analyses – Crossovers

Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 356.779 1 356.779 5.264
B2 319.923 1 319.923 4.721
B3 34.595 1 34.595 0.510
B4 1641.263 1 1641.263 24.218
B5 8.670 1 8.670 0.128
B6 56.659 1 56.659 0.836
B7 0.083 1 0.083 0.001
Error 3795.198 56 67.771
------------------------------------------------
Within
------------------------------------------------
W1 436.232 1 436.232 13.788
B1W1 77.346 1 77.346 2.445
B2W1 20.304 1 20.304 0.642
B3W1 58.853 1 58.853 1.860
B4W1 9.226 1 9.226 0.292
B5W1 42.334 1 42.334 1.338
B6W1 23.730 1 23.730 0.750
B7W1 33.206 1 33.206 1.050
Error 1771.723 56 31.638
------------------------------------------------
Appendix A6.8 Raw Data – Identification Accuracy

Language   Presentation manner   Sine-wave   Speech
Thai mixed 1.873 3.11
Thai mixed 1.468 1.824
Thai mixed 1.781 1.353
Thai mixed 2.229 0.992
Thai mixed 0.992 1.824
Thai mixed 2.3 2.229
Thai mixed 2.3 2.229
Thai mixed 0.674 1.873
Thai blocked 1.873 2.705
Thai blocked 1.824 1.781
Thai blocked 0.996 2.28
Thai blocked 1.353 0.636
Thai blocked 1.468 1.468
Thai blocked 2.213 1.465
Thai blocked 1.873 2.3
Thai blocked 1.468 2.229
Viet mixed 1.873 1.353
Viet mixed 1.468 1.115
Viet mixed 1.353 0.992
Viet mixed 2.3 2.229
Viet mixed 0.992 1.873
Viet mixed 0.507 0.992
Viet mixed 2.229 0.636
Viet mixed 0.992 0.734
Viet blocked 0.992 1.348
Viet blocked 1.348 1.227
Viet blocked 1.873 0.436
Viet blocked 1.286 0.496
Viet blocked 1.873 1.468
Viet blocked 0.496 1.286
Viet blocked 1.468 0.992
Viet blocked 1.468 1.468
Mand mixed 2.705 2.705
Mand mixed 2.705 0.674
Mand mixed 0.992 1.348
Mand mixed 1.468 1.824
Mand mixed 1.115 1.468
Mand mixed 2.705 1.873
Mand mixed 1.776 1.824
Mand mixed 2.3 2.3
Mand blocked 1.873 1.873
Mand blocked 1.555 1.353
Mand blocked 2.705 0.992
Mand blocked 0.619 1.468
Mand blocked 1.115 1.555
Mand blocked 2.705 1.468
Mand blocked 2.229 1.468
Mand blocked 1.353 2.705
Australian mixed 2.229 2.705
Australian mixed 0.912 1.468
Australian mixed 2.3 1.824
Australian mixed 1.627 0.636
Australian mixed 1.873 1.353
Australian mixed 1.353 1.468
Australian mixed 2.229 3.11
Australian mixed 2.3 2.3
Australian blocked 0.734 1.627
Australian blocked 2.108 1.272
Australian blocked 1.385 1.555
Australian blocked 1.873 1.339
Australian blocked 1.348 1.468
Australian blocked 1.468 0.758
Australian blocked 1.468 0.992
Australian blocked 1.353 1.627
Appendix A6.9 Statistical Analyses – Identification Accuracy
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 1.380 1 1.380 3.748
B2 0.004 1 0.004 0.011
B3 1.250 1 1.250 3.395
B4 3.911 1 3.911 10.622
B5 0.663 1 0.663 1.801
B6 0.027 1 0.027 0.072
B7 0.015 1 0.015 0.042
Error 20.616 56 0.368
------------------------------------------------
Within
------------------------------------------------
W1 0.147 1 0.147 0.490
B1W1 0.007 1 0.007 0.025
B2W1 0.000 1 0.000 0.000
B3W1 0.174 1 0.174 0.580
B4W1 0.876 1 0.876 2.915
B5W1 0.033 1 0.033 0.111
B6W1 0.008 1 0.008 0.028
B7W1 0.001 1 0.001 0.004
Error 16.822 56 0.300
------------------------------------------------
Appendix A6.10 Raw Data – Discrimination Accuracy

Sine-wave discrimination
Language   Presentation manner   Inter-stimulus interval (ms)   160-170   170-180   180-190   190-200   200-210   210-220
Thai block 500 3.995 3.995 1.5525 8.715 7.53 2.36
Thai block 500 -2.283 1.7375 1.9975 2.6225 3.835 -1.553
Thai block 500 4.565 5.1425 5.715 5.455 2.6225 3.995
Thai block 500 -1.998 -1.998 4.0575 3.105 4.565 0.34
Thai mix 500 2.8825 4.175 5.2425 4.98 4.095 0
Thai mix 500 -1.998 0.285 5.2425 5.715 4.62 1.9975
Thai mix 500 0.285 1.9975 0 1.9975 2.2825 0.885
Thai mix 500 5.165 8.715 6.98 3.105 8.715 6.355
Thai block 1500 1.5525 4.905 3.835 0.885 4.905 0.285
Thai block 1500 3.835 2.6225 0.785 0 4.565 1.9975
Thai block 1500 0 4.565 5.91 5.91 6.64 2.2825
Thai block 1500 2.6225 2.2825 4.725 -2.338 0.6675 0.885
Thai mix 1500 3.995 3.5075 6.98 6.345 5.715 4.25
Thai mix 1500 -0.445 0.885 0 1.2375 7.53 0.285
Thai mix 1500 2.2825 2.4375 7.53 -1.57 6.345 3.3275
Thai mix 1500 -1.498 6.64 2.3375 7.055 6.9 2.4375
Vietnamese block 500 6.64 8.715 5.245 8.715 8.715 5.245
Vietnamese block 500 -1.998 1.9975 6.64 2.8825 -1.998 -0.285
Vietnamese block 500 0 4.62 1.9975 5.455 0 1.9975
Vietnamese block 500 0.785 -0.885 -1.998 -3.835 -5.795 0
Vietnamese mix 500 4.565 4.3575 3.5075 0.285 2.2825 1.9975
Vietnamese mix 500 0 3.5075 7.53 4.905 2.2825 0
Vietnamese mix 500 3.55 4.905 -1.398 0 2.8825 0
Vietnamese mix 500 3.1675 0 2.5425 5.1425 5.715 2.5425
Vietnamese block 1500 0 5.91 6.64 4.175 4.905 1.3975
Vietnamese block 1500 0 1.9975 0 4.28 2.2825 0.285
Vietnamese block 1500 1.5525 0.885 -0.785 3.4725 3.3275 0
Vietnamese block 1500 -0.785 -3.508 0 0.785 -1.77 0
Vietnamese mix 1500 2.2825 -2.283 2.8825 0.285 2.8825 2.2825
Vietnamese mix 1500 3.55 -0.285 1.07 1.67 1.57 2.3375
Vietnamese mix 1500 1.4975 0.885 0.785 -1.758 2.6975 1.8125
Vietnamese mix 1500 2.4375 -1.998 1.77 1.9975 0 4.175
Mandarin block 500 1.5525 0 0 1.67 2.6225 -0.885
Mandarin block 500 2.5425 0 -0.1 -2.283 -2.283 0.885
Mandarin block 500 -0.785 -3.958 -3.835 -3.068 -1.67 0.6675
Mandarin block 500 0 4.28 3.1725 0.885 0 0
Mandarin mix 500 0.885 0 -2.543 3.105 -3.168 -3.168
Mandarin mix 500 3.835 0.785 5.715 5.795 4.905 1.9975
Mandarin mix 500 1.77 3.3275 1.5525 2.5425 -1.658 0.1
Mandarin mix 500 0.785 3.9575 5.32 1.9975 1.5525 4.62
Mandarin block 1500 1.175 4.3575 5.165 3.1725 0 0.785
Mandarin block 1500 0 1.9975 0 4.28 2.2825 0.285
Mandarin block 1500 2.2825 5.2425 3.835 1.5525 3.4825 0.785
Mandarin block 1500 2.8825 3.55 7.53 4.62 4.28 0
Mandarin mix 1500 0 2.8825 6.98 4.62 4.28 5.795
Mandarin mix 1500 -0.99 -1.813 0.6675 0.7675 4.3575 -2.783
Mandarin mix 1500 2.2825 6.98 5.91 7.53 8.715 5.2425
Mandarin mix 1500 5.085 3.3275 5.795 4.095 0.785 0.6675
Australian block 500 1.5525 3.1725 -1.913 -1.213 5.245 -1.998
Australian block 500 6.64 2.4375 5.1425 4.0575 4.095 4.095
Australian block 500 -3.995 1.9975 0.285 0 3.1675 3.4275
Australian block 500 4.565 2.2825 3.55 6.9 2.6225 7.14
Australian mix 500 2.4375 3.5075 3.1725 5.1425 1.9125 3.06
Australian mix 500 4.3575 2.2825 0 2.6225 0.6675 1.9975
Australian mix 500 1.77 3.3275 1.5525 2.5425 -1.658 0.1
Australian mix 500 1.5525 1.57 2.5425 5.87 6.9 7.53
Australian block 1500 2.2825 1.5525 4.0575 2.4375 1.1125 2.2825
Australian block 1500 6.98 5.795 5.795 8.715 7.53 8.715
Australian block 1500 0.625 3.1725 3.5075 2.36 4.62 4.28
Australian block 1500 2.36 4.62 4.0575 7.53 4.725 4.3575
Australian mix 1500 6.345 2.5425 4.0575 4.825 3.5075 2.2825
Australian mix 1500 6.64 5.165 5.91 3.55 4.3575 3.55
Australian mix 1500 0.6675 2.3375 1.5525 -1.145 1.7375 0
Australian mix 1500 5.91 6.345 6.98 5.87 5.085 5.395
Speech discrimination
Language   Presentation manner   Inter-stimulus interval (ms)   160-170   170-180   180-190   190-200   200-210   210-220
Thai block 500 2.2825 2.4375 5.455 6.98 5.245 4.565
Thai block 500 0 -1.998 3.1725 5.085 3.105 1.77
Thai block 500 2.5425 5.91 5.455 5.17 1.9975 1.9975
Thai block 500 1.175 2.4375 4.095 5.245 1.9975 1.9975
Thai mix 500 0.285 1.62 3.1725 5.2425 4.54 1.5525
Thai mix 500 0 2.6225 4.905 0.89 0.285 -1.998
Thai mix 500 1.9975 4.905 2.2825 5.87 1.57 2.5425
Thai mix 500 3.1725 6.98 3.5075 8.715 6.98 4.905
Thai block 1500 1.1125 0.285 1.4975 4.98 3.105 2.4375
Thai block 1500 4.565 0 2.2825 5.91 1.9975 0
Thai block 1500 0.545 -0.1 0.885 5.91 3.5075 6.355
Thai block 1500 3.1675 4.3575 4.25 5.245 -0.73 2.6225
Thai mix 1500 7.53 4.905 8.715 6.9 8.715 3.105
Thai mix 1500 0.885 2.7825 -1.398 4.905 3.5825 2.6225
Thai mix 1500 2.2825 -1.998 0.89 3.0675 3.1675 2.2825
Thai mix 1500 0 4.565 -0.445 5.795 1.77 3.1675
Vietnamese block 500 2.6225 1.67 3.105 2.6225 3.4075 0
Vietnamese block 500 1.1125 -1.998 1.3975 0 0 0
Vietnamese block 500 -1.998 4.28 4.3575 0 2.6225 -2.283
Vietnamese block 500 0.785 -3.835 1.9975 0.885 -3.573 -1.658
Vietnamese mix 500 1.77 4.28 6.64 5.91 5.245 4.28
Vietnamese mix 500 0 0.885 2.4375 2.6975 3.835 0
Vietnamese mix 500 0.785 0.785 5.085 3.105 0 2.4375
Vietnamese mix 500 6.64 8.715 1.57 5.085 5.1425 5.715
Vietnamese block 1500 1.5525 6.64 6.98 2.7825 6.355 0
Vietnamese block 1500 0 -1.998 4.28 3.995 2.2825 0
Vietnamese block 1500 1.07 0 3.105 0.73 -2.283 1.1125
Vietnamese block 1500 1.1125 -1.998 2.2825 2.2825 1.9975 0
Vietnamese mix 1500 3.1675 0 3.55 0.885 1.9975 1.9975
Vietnamese mix 1500 1.1125 4.62 4.28 2.2825 3.995 0
Vietnamese mix 1500 2.4375 -4.565 1.7375 0.6675 4.905 0
Vietnamese mix 1500 -3.995 2.6225 3.1725 -3.068 0 0.885
Mandarin block 500 3.4275 4.0575 2.5425 1.7575 6.64 2.2825
Mandarin block 500 0 4.0575 3.4275 -1.998 1.3975 -3.995
Mandarin block 500 -2.438 -1.738 -3.168 -3.068 -1.67 -0.99
Mandarin block 500 0 2.2825 8.715 4.565 4.565 0
Mandarin mix 500 2.2825 -0.155 0.885 1.9975 0 1.9975
Mandarin mix 500 -4.28 3.4075 4.905 1.9975 1.9975 0
Mandarin mix 500 5.085 0 3.105 3.105 0.885 2.8825
Mandarin mix 500 3.4075 2.6975 5.17 6.345 4.905 2.2825
Mandarin block 1500 5.87 5.1425 3.4075 3.835 3.835 -3.068
Mandarin block 1500 0 -1.998 4.28 3.995 2.2825 0
Mandarin block 1500 0.285 0.285 5.91 3.1725 3.1675 4.0575
Mandarin block 1500 8.715 6.355 4.28 1.9975 4.62 2.6225
Mandarin mix 1500 2.6225 6.64 1.9975 1.9975 5.245 5.17
Mandarin mix 1500 2.6225 6.98 0 3.1675 2.2825 3.995
Mandarin mix 1500 6.98 6.64 8.715 6.98 4.28 4.3575
Mandarin mix 1500 4.175 3.105 3.1675 2.6225 0.1 -1.398
Australian block 500 1.5525 2.2825 2.6225 4.175 5.795 1.77
Australian block 500 1.9975 3.4075 1.5525 6.9 7.53 5.165
Australian block 500 -2.283 0 1.3975 0 0.885 0
Australian block 500 4.3575 3.1675 6.98 4.3575 7.53 5.245
Australian mix 500 4.28 2.2825 -1.113 5.795 0 0.785
Australian mix 500 0 0.545 3.3275 3.0675 3.55 0.885
Australian mix 500 5.085 0 3.105 3.105 0.885 2.8825
Australian mix 500 -2.623 4.54 3.0675 3.4725 4.3575 0
Australian block 1500 2.2825 3.1675 3.5075 1.9975 1.5525 -1.113
Australian block 1500 7.53 7.53 6.345 7.53 8.715 8.715
Australian block 1500 2.6225 4.28 4.62 6.355 3.5075 4.3575
Australian block 1500 1.9975 3.995 5.2425 4.175 5.455 4.905
Australian mix 1500 4.28 4.565 4.28 5.245 2.8825 0
Australian mix 1500 4.175 4.905 8.715 7.53 6.98 6.355
Australian mix 1500 4.28 0 0 4.28 0.885 0
Australian mix 1500 3.4725 4.175 3.1725 5.795 7.53 3.1725
Appendix A6.11 Statistical Analyses – Discrimination Accuracy Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 38.693 1 38.693 1.108
B2 73.545 1 73.545 2.107
B3 13.053 1 13.053 0.374
B4 145.494 1 145.494 4.167
B5 5.435 1 5.435 0.156
B6 151.271 1 151.271 4.333
B7 55.377 1 55.377 1.586
B8 53.512 1 53.512 1.533
B9 4.707 1 4.707 0.135
B10 8.518 1 8.518 0.244
B11 178.318 1 178.318 5.107
B12 0.028 1 0.028 0.001
B13 1.146 1 1.146 0.033
B14 8.019 1 8.019 0.230
B15 33.106 1 33.106 0.948
Error 1675.821 48 34.913
------------------------------------------------
Within
------------------------------------------------
W1 139.940 1 139.940 38.735
B1W1 2.944 1 2.944 0.815
B2W1 1.617 1 1.617 0.447
B3W1 8.997 1 8.997 2.490
B4W1 9.290 1 9.290 2.572
B5W1 0.720 1 0.720 0.199
B6W1 0.882 1 0.882 0.244
B7W1 7.868 1 7.868 2.178
B8W1 4.402 1 4.402 1.219
B9W1 0.012 1 0.012 0.003
B10W1 16.035 1 16.035 4.438
B11W1 6.436 1 6.436 1.782
B12W1 10.897 1 10.897 3.016
B13W1 2.945 1 2.945 0.815
B14W1 6.698 1 6.698 1.854
B15W1 2.461 1 2.461 0.681
Error 173.412 48 3.613
W2 0.605 1 0.605 0.137
B1W2 1.434 1 1.434 0.326
B2W2 3.533 1 3.533 0.802
B3W2 3.719 1 3.719 0.844
B4W2 11.662 1 11.662 2.648
B5W2 4.580 1 4.580 1.040
B6W2 20.007 1 20.007 4.543
B7W2 0.179 1 0.179 0.041
B8W2 1.596 1 1.596 0.363
B9W2 0.602 1 0.602 0.137
B10W2 7.246 1 7.246 1.645
B11W2 0.559 1 0.559 0.127
B12W2 0.186 1 0.186 0.042
B13W2 1.153 1 1.153 0.262
B14W2 0.186 1 0.186 0.042
B15W2 9.034 1 9.034 2.051
Error 211.386 48 4.404
W3 108.226 1 108.226 22.181
B1W3 0.044 1 0.044 0.009
B2W3 0.037 1 0.037 0.008
B3W3 0.029 1 0.029 0.006
B4W3 0.503 1 0.503 0.103
B5W3 17.436 1 17.436 3.574
B6W3 36.657 1 36.657 7.513
B7W3 1.589 1 1.589 0.326
B8W3 4.124 1 4.124 0.845
B9W3 0.672 1 0.672 0.138
B10W3 0.220 1 0.220 0.045
B11W3 0.686 1 0.686 0.141
B12W3 0.644 1 0.644 0.132
B13W3 2.231 1 2.231 0.457
B14W3 4.117 1 4.117 0.844
B15W3 24.143 1 24.143 4.948
Error 234.198 48 4.879
W4 3.775 1 3.775 1.343
B1W4 0.001 1 0.001 0.000
B2W4 7.625 1 7.625 2.713
B3W4 17.962 1 17.962 6.392
B4W4 0.121 1 0.121 0.043
B5W4 0.332 1 0.332 0.118
B6W4 0.088 1 0.088 0.031
B7W4 14.004 1 14.004 4.984
B8W4 4.442 1 4.442 1.581
B9W4 2.359 1 2.359 0.839
B10W4 31.115 1 31.115 11.073
B11W4 4.592 1 4.592 1.634
B12W4 6.105 1 6.105 2.173
B13W4 1.576 1 1.576 0.561
B14W4 0.001 1 0.001 0.000
B15W4 0.075 1 0.075 0.027
Error 134.877 48 2.810
W5 3.854 1 3.854 0.595
B1W5 0.551 1 0.551 0.085
B2W5 0.050 1 0.050 0.008
B3W5 0.093 1 0.093 0.014
B4W5 0.169 1 0.169 0.026
B5W5 17.746 1 17.746 2.742
B6W5 0.071 1 0.071 0.011
B7W5 4.019 1 4.019 0.621
B8W5 0.423 1 0.423 0.065
B9W5 4.739 1 4.739 0.732
B10W5 3.600 1 3.600 0.556
B11W5 3.223 1 3.223 0.498
B12W5 8.469 1 8.469 1.308
B13W5 13.059 1 13.059 2.018
B14W5 2.951 1 2.951 0.456
B15W5 13.444 1 13.444 2.077
Error 310.689 48 6.473
W6 12.393 1 12.393 2.683
B1W6 2.282 1 2.282 0.494
B2W6 1.867 1 1.867 0.404
B3W6 0.275 1 0.275 0.060
B4W6 1.089 1 1.089 0.236
B5W6 10.324 1 10.324 2.235
B6W6 0.274 1 0.274 0.059
B7W6 3.113 1 3.113 0.674
B8W6 0.069 1 0.069 0.015
B9W6 3.516 1 3.516 0.761
B10W6 1.391 1 1.391 0.301
B11W6 9.300 1 9.300 2.013
B12W6 0.107 1 0.107 0.023
B13W6 1.959 1 1.959 0.424
B14W6 0.016 1 0.016 0.004
B15W6 0.534 1 0.534 0.116
Error 221.745 48 4.620
W7 0.984 1 0.984 0.291
B1W7 7.535 1 7.535 2.231
B2W7 7.127 1 7.127 2.110
B3W7 0.104 1 0.104 0.031
B4W7 0.210 1 0.210 0.062
B5W7 4.526 1 4.526 1.340
B6W7 42.781 1 42.781 12.664
B7W7 0.113 1 0.113 0.033
B8W7 0.001 1 0.001 0.000
B9W7 7.296 1 7.296 2.160
B10W7 0.015 1 0.015 0.004
B11W7 0.177 1 0.177 0.052
B12W7 1.700 1 1.700 0.503
B13W7 1.554 1 1.554 0.460
B14W7 19.512 1 19.512 5.776
B15W7 1.219 1 1.219 0.361
Error 162.150 48 3.378
W8 3.833 1 3.833 1.095
B1W8 0.004 1 0.004 0.001
B2W8 1.690 1 1.690 0.483
B3W8 0.084 1 0.084 0.024
B4W8 8.869 1 8.869 2.533
B5W8 0.716 1 0.716 0.204
B6W8 0.042 1 0.042 0.012
B7W8 3.746 1 3.746 1.070
B8W8 0.150 1 0.150 0.043
B9W8 8.197 1 8.197 2.341
B10W8 4.912 1 4.912 1.403
B11W8 8.574 1 8.574 2.449
B12W8 0.703 1 0.703 0.201
B13W8 0.403 1 0.403 0.115
B14W8 6.550 1 6.550 1.871
B15W8 8.191 1 8.191 2.339
Error 168.073 48 3.502
W9 5.143 1 5.143 1.699
B1W9 2.437 1 2.437 0.805
B2W9 6.841 1 6.841 2.260
B3W9 0.976 1 0.976 0.322
B4W9 1.257 1 1.257 0.415
B5W9 16.485 1 16.485 5.445
B6W9 46.239 1 46.239 15.272
B7W9 0.038 1 0.038 0.013
B8W9 0.317 1 0.317 0.105
B9W9 3.589 1 3.589 1.185
B10W9 0.526 1 0.526 0.174
B11W9 0.103 1 0.103 0.034
B12W9 5.090 1 5.090 1.681
B13W9 3.500 1 3.500 1.156
B14W9 4.945 1 4.945 1.633
B15W9 0.405 1 0.405 0.134
Error 145.328 48 3.028
------------------------------------------------
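As a quick arithmetic check (not part of the original analyses), each F value in the summary table above is the effect mean square (SS/df) divided by the mean square of the error term for its stratum. A minimal sketch, with values copied from the table:

```python
# Recompute F ratios from the ANOVA summary table above.
# F = (SS_effect / df_effect) / (SS_error / df_error); the values below
# are copied from the between-subjects block (B11) and the W1 block.
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

print(round(f_ratio(178.318, 1, 1675.821, 48), 3))  # B11: ≈ 5.107, as tabled
print(round(f_ratio(139.940, 1, 173.412, 48), 3))   # W1: ≈ 38.735, as tabled
```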
Appendix A7.1 Consent Form and Questionnaire for Australian Participants
MARCS Auditory Laboratories College of Arts, Education and Social Sciences
Denis Burnham
Professor of Psychology, Director MARCS Phone: (+612) 9772 6681 Fax: (+612) 9772 6736
Email: [email protected] Web: www.uws.edu.au/marcs/
Speech perception across different languages April 2005
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily humans produce speech sounds in their native language and how easily humans perceive acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of Australian English. You are invited to participate because you are a native speaker of Australian English.

If you participate, you will complete a 60-minute session in which you will be asked to identify and discriminate short sound items.

Participation is voluntary. You have a right not to participate in, or subsequently to withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Barbara Schwanhaeusser on 9772 6589. Please take time now to ask any questions you may have. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages CONSENT FORM
Please read the information sheet before signing this.

1. Yes No I, .............................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes No I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment, and the nature and possible risks of the investigation, and that these have been explained to me to my satisfaction.
3. Yes No I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship with the University of Western Sydney.
4. Yes No I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes No I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes No I understand that if I have any questions relating to my participation in this research, I may contact Barbara Schwanhaeusser (9772 6589) or Prof Denis Burnham (9772 6681), who will be happy to answer them.
7. Yes No I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes No I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

Participant's signature: ………………………………….. Date: …………………………………..

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883).
Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages PARTICIPANT QUESTIONNAIRE
Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1 Your name: ……………………..
2 Male / Female (please circle)
3 Date of birth: ……………..
4 Place of birth (City/town & Country): …………………………………..…
5 What is the official language in that country? ……………………………….
6 Hearing: Do you have normal hearing? Yes / No
If No, please provide any details you can: ………………………………………………………………
7 Speech/language history: Do you have any history of speech/language problems? Yes / No
If Yes, please provide any details you can: ………………………………………………………………
8 Language background: What is your native language/dialect, that is, the language/dialect which you learned from birth? Please list the percentage of time you use it in your everyday life now.
Native Language / Dialect: …………………………. Percentage of Time Spoken: ………………..
If you learned more than one language from birth, please list these and the percentage of time you use them in your everyday life now.
Additional Native Language / Dialect: …………………………. Percentage of Time Spoken: ………………..
Please also list all the languages of which you have some knowledge, and indicate how old you were when you started learning the language, and the percentage of time you use these in your everyday life.
Other language/s that you have knowledge of: …………………………………. Age at which you started learning this language: ………… Percentage of Time Spoken: ………..
…………………………………. ………… ………..
…………………………………. ………… ………..
9 Do you play a musical instrument and/or have singing training? Yes / No
If Yes, please list all the musical instruments which you have play/ed and indicate for how long you play/ed the instrument, e.g. violin, age 10, played for 5 years (singing counts!)
Instrument: ……………………… Age started playing: …… Number of Years Playing: ……
……………………… …… ……
……………………… …… ……
10 Do you have formal secondary or tertiary level music education? Yes / No
If so, please list details:
Instrument: ……………………… Course: …… Grade or level attained: ……
……………………… …… ……
……………………… …… ……
11 Are you still playing music? Yes / No
If not, when did you finish? ………………
If yes, how many hours per day do you play? ………………
12 Do you have perfect pitch / absolute pitch? Yes / No / Don’t know
(recognition / production of a note without a reference to other notes)
Where, when and how was that assessed? ……………………………………….
THANK YOU!
Appendix A7.3 Raw Data – Tone Perception Test Criterion Results
Continuum Shape | Language Background | Musicianship | Presentation Manner | Tone Type / Set 1/2 | Trials to Criterion
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 17
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian non-musician blocked sine-wave 19
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 8
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 17
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 8
Rising Australian non-musician blocked speech 8
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 9
Rising Thai musician blocked sine-wave 9
Rising Thai musician blocked speech 13
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 8
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 8
Rising Thai non-musician blocked sine-wave 24
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 23
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 8
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 8
Rising Thai non-musician blocked speech 8
Rising Australian musician mixed set 1 8
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 10
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 8
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 14
Rising Australian musician mixed set 2 12
Rising Australian non-musician mixed set 1 87
Rising Australian non-musician mixed set 2 8
Rising Australian non-musician mixed set 1 76
Rising Australian non-musician mixed set 2 10
Rising Australian non-musician mixed set 1 41
Rising Australian non-musician mixed set 2 39
Rising Australian non-musician mixed set 1 78
Rising Australian non-musician mixed set 2 50
Rising Thai musician mixed set 1 8
Rising Thai musician mixed set 2 8
Rising Thai musician mixed set 1 37
Rising Thai musician mixed set 2 11
Rising Thai musician mixed set 1 17
Rising Thai musician mixed set 2 11
Rising Thai musician mixed set 1 10
Rising Thai musician mixed set 2 8
Rising Thai non-musician mixed set 1 40
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 13
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 8
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 22
Rising Thai non-musician mixed set 2 9
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 21
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian non-musician blocked sine-wave 18
Falling Australian non-musician blocked speech 8
Falling Australian non-musician blocked sine-wave 20
Falling Australian non-musician blocked speech 44
Falling Australian non-musician blocked sine-wave 8
Falling Australian non-musician blocked speech 8
Falling Australian non-musician blocked sine-wave 8
Falling Australian non-musician blocked speech 23
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 28
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Australian musician mixed set 1 8
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 11
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 8
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 9
Falling Australian musician mixed set 2 8
Falling Australian non-musician mixed set 1 19
Falling Australian non-musician mixed set 2 8
Falling Australian non-musician mixed set 1 26
Falling Australian non-musician mixed set 1 8
Falling Australian non-musician mixed set 2 9
Falling Australian non-musician mixed set 1 8
Falling Australian non-musician mixed set 2 22
Falling Australian non-musician mixed set 1 29
Falling Thai musician mixed set 2 12
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 2 8
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 2 12
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 1 24
Falling Thai musician mixed set 2 9
Falling Thai non-musician mixed set 1 8
Falling Thai non-musician mixed set 2 8
Falling Thai non-musician mixed set 1 35
Falling Thai non-musician mixed set 2 9
Falling Thai non-musician mixed set 1 45
Falling Thai non-musician mixed set 2 13
Falling Thai non-musician mixed set 1 22
Falling Thai non-musician mixed set 2 8
Appendix A7.4 Statistical Analyses – Criterion Results
Univariate Analysis of Variance Criterion Rising vs. Falling
Tests of Between-Subjects Effects
Dependent Variable: criterionout
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 4416.438(a) 7 630.920 1.920 .083
Intercept 43785.563 1 43785.563 133.260 .000
language 27.562 1 27.562 .084 .773
musical 3164.062 1 3164.062 9.630 .003
contshape 248.063 1 248.063 .755 .389
language * musical 370.563 1 370.563 1.128 .293
language * contshape 175.563 1 175.563 .534 .468
musical * contshape 60.063 1 60.063 .183 .671
language * musical * contshape 370.563 1 370.563 1.128 .293
Error 18400.000 56 328.571
Total 66602.000 64
Corrected Total 22816.438 63
a R Squared = .194 (Adjusted R Squared = .093)
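The R Squared footnotes in these SPSS tables can be reproduced from the sums of squares: R² = SS(Corrected Model) / SS(Corrected Total), and Adjusted R² = 1 − (1 − R²)(N − 1)/(N − p − 1). A quick check against the Rising-vs-Falling table above (N = 64 cases, p = 7 model degrees of freedom):

```python
# Reproduce the R² footnote of the table above from its sums of squares.
ss_model, ss_corrected_total = 4416.438, 22816.438
n_cases, p_predictors = 64, 7  # 64 observations, 7 model df

r2 = ss_model / ss_corrected_total
adj_r2 = 1 - (1 - r2) * (n_cases - 1) / (n_cases - p_predictors - 1)
print(round(r2, 3), round(adj_r2, 3))  # → 0.194 0.093, as in the footnote
```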
Univariate Analysis of Variance Criterion Rising blocked vs. mixed
Tests of Between-Subjects Effects
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 2116.000(a) 1 2116.000 7.919 .007
Intercept 16065.563 1 16065.563 60.125 .000
manner 2116.000 1 2116.000 7.919 .007
Error 16566.438 62 267.201
Total 34748.000 64
Corrected Total 18682.438 63
a R Squared = .113 (Adjusted R Squared = .099)
manner = blocked
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 229.469(a) 7 32.781 1.868 .120
Intercept 3260.281 1 3260.281 185.749 .000
language 2.531 1 2.531 .144 .707
musical 38.281 1 38.281 2.181 .153
type 94.531 1 94.531 5.386 .029
language * musical 5.281 1 5.281 .301 .588
language * type .281 1 .281 .016 .900
musical * type 69.031 1 69.031 3.933 .059
language * musical * type 19.531 1 19.531 1.113 .302
Error 421.250 24 17.552
Total 3911.000 32
Corrected Total 650.719 31
a R Squared = .353 (Adjusted R Squared = .164)
b manner = block
manner = mixed
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 12197.469(a) 7 1742.496 11.247 .000
Intercept 14921.281 1 14921.281 96.312 .000
language 1785.031 1 1785.031 11.522 .002
musical 3180.031 1 3180.031 20.526 .000
type 2161.531 1 2161.531 13.952 .001
language * musical 2945.281 1 2945.281 19.011 .000
language * type 282.031 1 282.031 1.820 .190
musical * type 1092.781 1 1092.781 7.054 .014
language * musical * type 750.781 1 750.781 4.846 .038
Error 3718.250 24 154.927
Total 30837.000 32
Corrected Total 15915.719 31
a R Squared = .766 (Adjusted R Squared = .698)
b manner = mix
Univariate Analysis of Variance Criterion Falling blocked vs. mixed
Tests of Between-Subjects Effects
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 81.000(a) 1 81.000 1.070 .305
Intercept 9900.250 1 9900.250 130.745 .000
blockmix 81.000 1 81.000 1.070 .305
Error 4694.750 62 75.722
Total 14676.000 64
Corrected Total 4775.750 63
a R Squared = .017 (Adjusted R Squared = .001)
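The sums of squares in these tables are additive, which gives another plain-numbers consistency check: SS(Corrected Total) = SS(Corrected Model) + SS(Error), and SS(Total) = SS(Corrected Total) + SS(Intercept). Checked against the Falling blocked-vs-mixed table above:

```python
# Sum-of-squares additivity for the Falling blocked-vs-mixed table above.
ss_model, ss_error, ss_intercept = 81.000, 4694.750, 9900.250
ss_corrected_total, ss_total = 4775.750, 14676.000

assert abs(ss_model + ss_error - ss_corrected_total) < 1e-9
assert abs(ss_corrected_total + ss_intercept - ss_total) < 1e-9
print("sums of squares are additive")  # both identities hold exactly
```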
manner = blocked
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 562.375(a) 7 80.339 1.357 .268
Intercept 4095.125 1 4095.125 69.189 .000
language 136.125 1 136.125 2.300 .142
music 200.000 1 200.000 3.379 .078
sinespeech 120.125 1 120.125 2.030 .167
language * music 50.000 1 50.000 .845 .367
language * sinespeech 15.125 1 15.125 .256 .618
music * sinespeech 40.500 1 40.500 .684 .416
language * music * sinespeech .500 1 .500 .008 .928
Error 1420.500 24 59.188
Total 6078.000 32
Corrected Total 1982.875 31
a R Squared = .284 (Adjusted R Squared = .075)
b blockmix = block
manner = mixed
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 1282.375(a) 7 183.196 3.076 .018
Intercept 5886.125 1 5886.125 98.823 .000
language 50.000 1 50.000 .839 .369
music 450.000 1 450.000 7.555 .011
sinespeech 465.125 1 465.125 7.809 .010
language * music .125 1 .125 .002 .964
language * sinespeech 144.500 1 144.500 2.426 .132
music * sinespeech 144.500 1 144.500 2.426 .132
language * music * sinespeech 28.125 1 28.125 .472 .499
Error 1429.500 24 59.563
Total 8598.000 32
Corrected Total 2711.875 31
a R Squared = .473 (Adjusted R Squared = .319)
b blockmix = mix
Appendix A7.5 Raw Data – Crossover Values
Continuum Shape | Language Background | Musicianship | Tone Type | Presentation Manner | Crossover Value in Hz
falling Thai musician sine-wave mixed 211.75
falling Thai musician sine-wave mixed 217.8
falling Thai musician sine-wave mixed .
falling Thai musician sine-wave mixed 232.75
falling Thai musician sine-wave blocked 205
falling Thai musician sine-wave blocked 214
falling Thai musician sine-wave blocked 212.5
falling Thai musician sine-wave blocked 216.25
falling Thai musician speech mixed 208
falling Thai musician speech mixed 202
falling Thai musician speech mixed 220
falling Thai musician speech mixed 211
falling Thai musician speech blocked 208
falling Thai musician speech blocked 213.25
falling Thai musician speech blocked 204.25
falling Thai musician speech blocked 209.5
falling Thai non-musician sine-wave mixed 226.75
falling Thai non-musician sine-wave mixed 214.75
falling Thai non-musician sine-wave mixed 209.5
falling Thai non-musician sine-wave mixed 217.75
falling Thai non-musician sine-wave blocked 227.5
falling Thai non-musician sine-wave blocked 207
falling Thai non-musician sine-wave blocked 204.25
falling Thai non-musician sine-wave blocked 233.5
falling Thai non-musician speech mixed 205
falling Thai non-musician speech mixed 219.25
falling Thai non-musician speech mixed 218.5
falling Thai non-musician speech mixed 223.75
falling Thai non-musician speech blocked 230.5
falling Thai non-musician speech blocked 212.5
falling Thai non-musician speech blocked 211
falling Thai non-musician speech blocked 231.25
falling Australian musician sine-wave mixed 212.5
falling Australian musician sine-wave mixed 215.5
falling Australian musician sine-wave mixed 212.5
falling Australian musician sine-wave mixed 211
falling Australian musician sine-wave blocked 198.25
falling Australian musician sine-wave blocked 209.5
falling Australian musician sine-wave blocked 202
falling Australian musician sine-wave blocked 205.75
falling Australian musician speech mixed 204.25
falling Australian musician speech mixed 205.75
falling Australian musician speech mixed 217.75
falling Australian musician speech mixed 207.25
falling Australian musician speech blocked 209.5
falling Australian musician speech blocked 211.75
falling Australian musician speech blocked 200.5
falling Australian musician speech blocked 209.5
falling Australian non-musician sine-wave mixed 215.5
falling Australian non-musician sine-wave mixed 205.75
falling Australian non-musician sine-wave mixed 216.25
falling Australian non-musician sine-wave mixed 215.5
falling Australian non-musician sine-wave blocked 219.25
falling Australian non-musician sine-wave blocked 201.25
falling Australian non-musician sine-wave blocked 208.75
falling Australian non-musician sine-wave blocked 208.75
falling Australian non-musician speech mixed 185.5
falling Australian non-musician speech mixed .
falling Australian non-musician speech mixed .
falling Australian non-musician speech mixed 205.75
falling Australian non-musician speech blocked 207.25
falling Australian non-musician speech blocked 209.5
falling Australian non-musician speech blocked 202
falling Australian non-musician speech blocked 208.75
rising Thai musician sine-wave mixed 243.25
rising Thai musician sine-wave mixed 234.25
rising Thai musician sine-wave mixed 234.25
rising Thai musician sine-wave mixed 253.8
rising Thai musician sine-wave blocked 227.5
rising Thai musician sine-wave blocked 235.75
rising Thai musician sine-wave blocked 234.25
rising Thai musician sine-wave blocked 235.75
rising Thai musician speech mixed 234.25
rising Thai musician speech mixed 235
rising Thai musician speech mixed 238
rising Thai musician speech mixed 230.5
rising Thai musician speech blocked 232
rising Thai musician speech blocked 228.25
rising Thai musician speech blocked 231.25
rising Thai musician speech blocked 231.25
rising Thai non-musician sine-wave mixed 241
rising Thai non-musician sine-wave mixed 232
rising Thai non-musician sine-wave mixed .
rising Thai non-musician sine-wave mixed 230.5
rising Thai non-musician sine-wave blocked 235
rising Thai non-musician sine-wave blocked 229.75
rising Thai non-musician sine-wave blocked 232
rising Thai non-musician sine-wave blocked 234.25
rising Thai non-musician speech mixed 231.25
rising Thai non-musician speech mixed 223
rising Thai non-musician speech mixed .
rising Thai non-musician speech mixed 233.5
rising Thai non-musician speech blocked 233.5
rising Thai non-musician speech blocked 227.5
rising Thai non-musician speech blocked 232.75
rising Thai non-musician speech blocked 229
rising Australian musician sine-wave mixed 226.75
rising Australian musician sine-wave mixed 220.75
rising Australian musician sine-wave mixed 230.5
rising Australian musician sine-wave mixed 227.5
rising Australian musician sine-wave blocked 233.5
rising Australian musician sine-wave blocked 236.5
rising Australian musician sine-wave blocked 230.5
rising Australian musician sine-wave blocked 221.5
rising Australian musician speech mixed 223.75
rising Australian musician speech mixed 232
rising Australian musician speech mixed 229.75
rising Australian musician speech mixed 223
rising Australian musician speech blocked 238.75
rising Australian musician speech blocked 241.75
rising Australian musician speech blocked 231.25
rising Australian musician speech blocked 225.25
rising Australian non-musician sine-wave mixed 231.25
rising Australian non-musician sine-wave mixed 228.25
rising Australian non-musician sine-wave mixed 241.75
rising Australian non-musician sine-wave mixed 231.25
rising Australian non-musician sine-wave blocked 232
rising Australian non-musician sine-wave blocked 234.25
rising Australian non-musician sine-wave blocked 233.5
rising Australian non-musician sine-wave blocked 233.5
rising Australian non-musician speech mixed 221.5
rising Australian non-musician speech mixed 220.75
rising Australian non-musician speech mixed 229.75
rising Australian non-musician speech mixed 249.25
rising Australian non-musician speech blocked 225.25
rising Australian non-musician speech blocked .
rising Australian non-musician speech blocked 232
rising Australian non-musician speech blocked 232
Appendix A7.6 Statistical Output – Crossover Values
Rising Continuum
Tests of Between-Subjects Effects(b)
Dependent Variable: crossover
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 729.992(a) 15 48.666 1.387 .195
Intercept 3241689.613 1 3241689.613 92420.449 .000
language 116.616 1 116.616 3.325 .075
musical 4.529 1 4.529 .129 .721
tonetype 82.831 1 82.831 2.362 .131
mixblock .375 1 .375 .011 .918
language * musical 102.656 1 102.656 2.927 .094
language * tonetype 53.029 1 53.029 1.512 .225
musical * tonetype 17.453 1 17.453 .498 .484
language * musical * tonetype 40.610 1 40.610 1.158 .288
language * mixblock 124.606 1 124.606 3.553 .066
musical * mixblock .003 1 .003 .000 .993
language * musical * mixblock 125.963 1 125.963 3.591 .065
tonetype * mixblock 22.425 1 22.425 .639 .428
language * tonetype * mixblock 6.516 1 6.516 .186 .669
musical * tonetype * mixblock 5.621 1 5.621 .160 .691
language * musical * tonetype * mixblock 1.606 1 1.606 .046 .832
Error 1578.396 45 35.075
Total 3286291.628 61
Corrected Total 2308.388 60
a R Squared = .316 (Adjusted R Squared = .088) b continuum = rising
Falling Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: crossover
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 2022.592(a) 15 134.839 2.313 .015
Intercept 2641791.253 1 2641791.253 45316.468 .000
language 1015.283 1 1015.283 17.416 .000
musical 41.566 1 41.566 .713 .403
tonetype 228.315 1 228.315 3.916 .054
mixblock 11.977 1 11.977 .205 .653
language * musical 201.451 1 201.451 3.456 .070
language * tonetype 20.481 1 20.481 .351 .556
musical * tonetype 3.110 1 3.110 .053 .818
language * musical * tonetype 306.671 1 306.671 5.261 .027
language * mixblock 1.252 1 1.252 .021 .884
musical * mixblock 256.346 1 256.346 4.397 .042
language * musical * mixblock .551 1 .551 .009 .923
tonetype * mixblock 269.983 1 269.983 4.631 .037
language * tonetype * mixblock 32.794 1 32.794 .563 .457
musical * tonetype * mixblock 2.700 1 2.700 .046 .831
language * musical * tonetype * mixblock 25.221 1 25.221 .433 .514
Error 2623.342 45 58.296
Total 2741906.840 61
Corrected Total 4645.934 60
a R Squared = .435 (Adjusted R Squared = .247) b continuum = falling
Appendix A7.7 Raw Data – Identification Accuracy Results
Continuum Shape Language Background Musicianship Tone Type Presentation Manner Identification Accuracy
rising Thai musician sine-wave mixed 1.11
rising Thai musician speech mixed 1.35
rising Thai musician sine-wave mixed 0.91
rising Thai musician speech mixed 1.35
rising Thai musician sine-wave blocked 1.87
rising Thai musician speech blocked 2.3
rising Thai musician sine-wave blocked 2.71
rising Thai musician speech blocked 2.71
rising Thai musician sine-wave mixed .
rising Thai musician speech mixed .
rising Thai musician sine-wave mixed 0.99
rising Thai musician speech mixed 0.99
rising Thai musician sine-wave blocked 1.74
rising Thai musician speech blocked 2.71
rising Thai musician sine-wave blocked 1.35
rising Thai musician speech blocked 1.82
rising Thai non-musician sine-wave mixed .
rising Thai non-musician speech mixed 1.35
rising Thai non-musician sine-wave mixed 1.12
rising Thai non-musician speech mixed 1.35
rising Thai non-musician sine-wave blocked 1.87
rising Thai non-musician speech blocked 2.3
rising Thai non-musician sine-wave blocked 0.91
rising Thai non-musician speech blocked 1.15
rising Thai non-musician sine-wave mixed 1.74
rising Thai non-musician speech mixed 1.33
rising Thai non-musician sine-wave mixed 0.32
rising Thai non-musician speech mixed 3.11
rising Thai non-musician sine-wave blocked 2.23
rising Thai non-musician speech blocked 1.47
rising Thai non-musician sine-wave blocked 0.99
rising Thai non-musician sine-wave blocked 0.99
rising Australian musician speech mixed 1.56
rising Australian musician sine-wave mixed 1.35
rising Australian musician speech mixed 1.47
rising Australian musician sine-wave mixed 1.82
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.82
rising Australian musician speech blocked 1.82
rising Australian musician sine-wave blocked 1.87
rising Australian musician speech mixed 0.91
rising Australian musician sine-wave mixed 0.73
rising Australian musician speech mixed 1.82
rising Australian musician sine-wave mixed 0.62
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.33
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.47
rising Australian non-musician speech mixed 6.04
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech mixed 0.64
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech blocked 0.73
rising Australian non-musician sine-wave blocked 0.64
rising Australian non-musician speech blocked 0.99
rising Australian non-musician sine-wave blocked 0.73
rising Australian non-musician speech mixed 0.5
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech mixed 1.87
rising Australian non-musician sine-wave mixed 1.87
rising Australian non-musician speech blocked 0.64
rising Australian non-musician sine-wave blocked 0.99
rising Australian non-musician speech blocked 2.3
rising Australian non-musician sine-wave blocked 1.35
falling Thai musician sine-wave mixed 1.82
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave mixed 1.82
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave blocked 1.87
falling Thai musician speech blocked 1.82
falling Thai musician sine-wave blocked 2.23
falling Thai musician speech blocked 2.23
falling Thai musician sine-wave mixed 2.23
falling Thai musician speech mixed 0.99
falling Thai musician sine-wave mixed 0.99
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave blocked 2.71
falling Thai musician speech blocked 2.3
falling Thai musician sine-wave blocked 1.35
falling Thai musician speech blocked 0.62
falling Thai non-musician sine-wave mixed 1.47
falling Thai non-musician sine-wave mixed 1.35
falling Thai non-musician speech mixed 2.23
falling Thai non-musician sine-wave mixed 1.82
falling Thai non-musician speech blocked 1.87
falling Thai non-musician sine-wave blocked 1.47
falling Thai non-musician speech blocked 1.47
falling Thai non-musician sine-wave blocked 1.87
falling Thai non-musician speech mixed .
falling Thai non-musician sine-wave mixed 0.73
falling Thai non-musician speech mixed 0.64
falling Thai non-musician sine-wave mixed 1.47
falling Thai non-musician speech blocked 0.64
falling Thai non-musician sine-wave blocked 2.71
falling Thai non-musician speech blocked 1.35
falling Thai non-musician sine-wave blocked 1.15
falling Australian musician sine-wave mixed 1.87
falling Australian musician speech mixed 2.71
falling Australian musician sine-wave mixed 2.3
falling Australian musician speech mixed 1.35
falling Australian musician sine-wave blocked 0.94
falling Australian musician speech blocked 3.11
falling Australian musician sine-wave blocked 1.47
falling Australian musician speech blocked 1.47
falling Australian musician sine-wave mixed 3.11
falling Australian musician speech mixed 1.47
falling Australian musician sine-wave mixed 1.47
falling Australian musician speech mixed 1.82
falling Australian musician sine-wave blocked 1.84
falling Australian musician speech blocked 1.35
falling Australian musician sine-wave blocked 0.91
falling Australian musician speech blocked 1.47
falling Australian non-musician sine-wave mixed 0.99
falling Australian non-musician sine-wave mixed 0.64
falling Australian non-musician sine-wave mixed 1.47
falling Australian non-musician speech mixed 1.47
falling Australian non-musician sine-wave blocked 1.11
falling Australian non-musician speech blocked 0.73
falling Australian non-musician sine-wave blocked 1.87
falling Australian non-musician speech blocked .
falling Australian non-musician sine-wave mixed 1.87
falling Australian non-musician speech mixed 0.99
falling Australian non-musician sine-wave mixed 1.84
falling Australian non-musician speech mixed 0.73
falling Australian non-musician sine-wave blocked 1.47
falling Australian non-musician speech blocked .
falling Australian non-musician sine-wave blocked 0.99
falling Australian non-musician speech blocked 0.99
Appendix A7.8 Statistical Output – Identification Accuracy
Rising Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: dprime
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 8.148(a) 15 .543 1.669 .094
Intercept 144.069 1 144.069 442.711 .000
language .740 1 .740 2.275 .139
musical .106 1 .106 .326 .571
mixblock .016 1 .016 .049 .826
sinespeech .354 1 .354 1.086 .303
language * musical 1.336 1 1.336 4.106 .049
language * mixblock .029 1 .029 .090 .766
musical * mixblock 3.192 1 3.192 9.808 .003
language * musical * mixblock .483 1 .483 1.485 .230
language * sinespeech .323 1 .323 .994 .324
musical * sinespeech .000 1 .000 .000 .985
language * musical * sinespeech .001 1 .001 .003 .955
mixblock * sinespeech .554 1 .554 1.701 .199
language * mixblock * sinespeech .314 1 .314 .964 .332
musical * mixblock * sinespeech .229 1 .229 .704 .406
language * musical * mixblock * sinespeech .966 1 .966 2.968 .092
Error 14.319 44 .325
Total 175.342 60
Corrected Total 22.467 59
a R Squared = .363 (Adjusted R Squared = .145) b continuum = rising
Falling Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: dprime
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 9.000(a) 15 .600 .816 .655
Intercept 120.744 1 120.744 164.221 .000
language .387 1 .387 .527 .472
musical .883 1 .883 1.200 .279
mixblock 1.411 1 1.411 1.920 .173
sinespeech 1.179 1 1.179 1.604 .212
language * musical .289 1 .289 .394 .534
language * mixblock .348 1 .348 .474 .495
musical * mixblock .460 1 .460 .626 .433
language * musical * mixblock .345 1 .345 .470 .497
language * sinespeech .008 1 .008 .011 .917
musical * sinespeech .998 1 .998 1.357 .251
language * musical * sinespeech .096 1 .096 .131 .719
mixblock * sinespeech 1.107 1 1.107 1.505 .227
language * mixblock * sinespeech 1.375 1 1.375 1.870 .179
musical * mixblock * sinespeech .044 1 .044 .060 .808
language * musical * mixblock * sinespeech .104 1 .104 .141 .709
Error 31.616 43 .735
Total 172.240 59
Corrected Total 40.616 58
a R Squared = .222 (Adjusted R Squared = -.050) b continuum = falling
Appendix A7.9 Raw Data – Discrimination Accuracy
Continuum Shape Language Background Musicianship Presentation Manner Tone Type Discrimination Accuracy
rising Australian musician blocked sine-wave 1.357
rising Australian musician blocked speech 2.338
rising Australian musician blocked sine-wave 2.414
rising Australian musician blocked speech 0
rising Australian musician blocked sine-wave 2.2
rising Australian musician blocked speech 1.998
rising Australian musician blocked sine-wave 0.284
rising Australian musician blocked speech -1.998
rising Australian musician mixed sine-wave 4.289
rising Australian musician mixed speech 0
rising Australian musician mixed sine-wave 2.916
rising Australian musician mixed speech -0.885
rising Australian musician mixed sine-wave 1.714
rising Australian musician mixed speech 0.885
rising Australian musician mixed sine-wave 0.616
rising Australian musician mixed speech 0.438
rising Australian non-musician blocked sine-wave 0.881
rising Australian non-musician blocked speech -2.283
rising Australian non-musician blocked sine-wave 1.315
rising Australian non-musician blocked speech 0.885
rising Australian non-musician blocked sine-wave 0.674
rising Australian non-musician blocked speech 0
rising Australian non-musician blocked sine-wave 1.344
rising Australian non-musician blocked speech -0.885
rising Australian non-musician mixed sine-wave 3.338
rising Australian non-musician mixed speech 2.283
rising Australian non-musician mixed sine-wave 3.338
rising Australian non-musician mixed speech 2.283
rising Australian non-musician mixed sine-wave 2.461
rising Australian non-musician mixed speech 1.998
rising Australian non-musician mixed sine-wave -1.938
rising Australian non-musician mixed speech 2.698
rising Thai musician blocked sine-wave 2.196
rising Thai musician blocked speech 0.885
rising Thai musician blocked sine-wave 1.736
rising Thai musician blocked speech 0
rising Thai musician blocked sine-wave 2.629
rising Thai musician blocked speech 0.768
rising Thai musician blocked sine-wave 2.521
rising Thai musician blocked speech 1.998
rising Thai musician mixed sine-wave 2.041
rising Thai musician mixed speech -3.995
rising Thai musician mixed sine-wave 1.648
rising Thai musician mixed speech 2.283
rising Thai musician mixed sine-wave 0.558
rising Thai musician mixed speech 4.62
rising Thai musician mixed sine-wave 2.285
rising Thai musician mixed speech 1.398
rising Thai non-musician blocked sine-wave 1.15
rising Thai non-musician blocked speech -1.998
rising Thai non-musician blocked sine-wave 0.118
rising Thai non-musician blocked speech 0
rising Thai non-musician blocked sine-wave 2.247
rising Thai non-musician blocked speech 1.113
rising Thai non-musician blocked sine-wave 1.393
rising Thai non-musician blocked speech -2.283
rising Thai non-musician mixed sine-wave 2.576
rising Thai non-musician mixed speech 0
rising Thai non-musician mixed sine-wave 3.193
rising Thai non-musician mixed speech 0.785
rising Thai non-musician mixed sine-wave 1.323
rising Thai non-musician mixed speech 0
rising Thai non-musician mixed sine-wave -0.377
rising Thai non-musician mixed speech 1.113
falling Australian musician blocked sine-wave 6.39
falling Australian musician blocked speech 7.11
falling Australian musician blocked sine-wave 2.46
falling Australian musician blocked speech 1.32
falling Australian musician blocked sine-wave 1.78
falling Australian musician blocked speech 2.06
falling Australian musician blocked sine-wave 2.21
falling Australian musician blocked speech 1.28
falling Australian musician mixed sine-wave 2.87
falling Australian musician mixed speech 3.48
falling Australian musician mixed sine-wave 4.34
falling Australian musician mixed speech 3.63
falling Australian musician mixed sine-wave 2.88
falling Australian musician mixed speech 2.8
falling Australian musician mixed sine-wave 2.61
falling Australian musician mixed speech 3
falling Australian non-musician blocked sine-wave 2.31
falling Australian non-musician blocked speech 1.31
falling Australian non-musician blocked sine-wave 0.57
falling Australian non-musician blocked speech 1.84
falling Australian non-musician blocked sine-wave 1.92
falling Australian non-musician blocked speech 1.03
falling Australian non-musician blocked sine-wave 1.06
falling Australian non-musician blocked speech 0.31
falling Australian non-musician mixed sine-wave 1.31
falling Australian non-musician mixed speech -0.07
falling Australian non-musician mixed sine-wave 1.31
falling Australian non-musician mixed speech 1.93
falling Australian non-musician mixed sine-wave 2.06
falling Australian non-musician mixed speech 0.71
falling Australian non-musician mixed sine-wave 1.09
falling Australian non-musician mixed speech -0.2
falling Thai musician blocked sine-wave -0.02
falling Thai musician blocked speech 1.19
falling Thai musician blocked sine-wave 1.71
falling Thai musician blocked speech 1.34
falling Thai musician blocked sine-wave 2.84
falling Thai musician blocked speech 4.24
falling Thai musician blocked sine-wave 1.73
falling Thai musician blocked speech 0.69
falling Thai musician mixed sine-wave 1.72
falling Thai musician mixed speech 2.05
falling Thai musician mixed sine-wave 0.53
falling Thai musician mixed speech 0.21
falling Thai musician mixed sine-wave 2.83
falling Thai musician mixed speech 0.12
falling Thai musician mixed sine-wave 2.55
falling Thai musician mixed speech 3.56
falling Thai non-musician blocked sine-wave 3.07
falling Thai non-musician blocked speech 3.22
falling Thai non-musician blocked sine-wave 1.33
falling Thai non-musician blocked speech 1.47
falling Thai non-musician blocked sine-wave 2.93
falling Thai non-musician blocked speech 1.34
falling Thai non-musician blocked sine-wave 2
falling Thai non-musician blocked speech 0.68
falling Thai non-musician mixed sine-wave 0.78
falling Thai non-musician mixed speech 1.84
falling Thai non-musician mixed sine-wave 0.1
falling Thai non-musician mixed speech -0.16
falling Thai non-musician mixed sine-wave -0.12
falling Thai non-musician mixed speech 0.77
falling Thai non-musician mixed sine-wave 1.08
falling Thai non-musician mixed speech 1.27
Appendix A7.10 Statistical Output – Discrimination Accuracy
Rising Continuum
Tests of Between-Subjects Effects Dependent Variable: discrimination
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 55.521(a) 15 3.701 1.586 .114
Intercept 78.497 1 78.497 33.641 .000
language .144 1 .144 .062 .805
music 2.802 1 2.802 1.201 .279
mixblock 6.815 1 6.815 2.921 .094
sinespeech 22.567 1 22.567 9.672 .003
language * music 2.659 1 2.659 1.139 .291
language * mixblock 1.867 1 1.867 .800 .376
music * mixblock 7.505 1 7.505 3.216 .079
language * music * mixblock .300 1 .300 .129 .721
language * sinespeech .150 1 .150 .064 .801
music * sinespeech .175 1 .175 .075 .785
language * music * sinespeech 2.980 1 2.980 1.277 .264
mixblock * sinespeech 1.513 1 1.513 .649 .425
language * mixblock * sinespeech .149 1 .149 .064 .802
music * mixblock * sinespeech 2.971 1 2.971 1.273 .265
language * music * mixblock * sinespeech 2.925 1 2.925 1.253 .268
Error 112.000 48 2.333
Total 246.018 64
Corrected Total 167.521 63
a R Squared = .331 (Adjusted R Squared = .122)
Falling Continuum
Tests of Between-Subjects Effects Dependent Variable: discrimination
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 49.501(a) 15 3.300 2.030 .033
Intercept 216.009 1 216.009 132.863 .000
language 6.135 1 6.135 3.774 .058
musical 21.871 1 21.871 13.452 .001
mixblock 2.199 1 2.199 1.352 .251
sinespeech .736 1 .736 .453 .504
language * musical 10.585 1 10.585 6.510 .014
language * mixblock 1.383 1 1.383 .851 .361
musical * mixblock 2.875 1 2.875 1.769 .190
language * musical * mixblock .783 1 .783 .482 .491
language * sinespeech .303 1 .303 .186 .668
musical * sinespeech .267 1 .267 .164 .687
language * musical * sinespeech .209 1 .209 .128 .722
mixblock * sinespeech .012 1 .012 .007 .931
language * mixblock * sinespeech .089 1 .089 .055 .816
musical * mixblock * sinespeech .256 1 .256 .157 .693
language * musical * mixblock * sinespeech 1.798 1 1.798 1.106 .298
Error 78.038 48 1.626
Total 343.548 64
Corrected Total 127.539 63
a R Squared = .388 (Adjusted R Squared = .197)
Appendix A7.11 Raw Data – Discrimination Peak Analysis
Rising Continuum, Sine-Wave – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 205-212.5 212.5-220 220-227.5 227.5-235 235-242.5 242.5-250 250-257.5
Thai musician mixed -2.283 4.28 3.168 1.658 2.623 3.105 3.105
Thai musician mixed 2.283 -2.283 2.623 2.075 0 1.67 1.67
Thai musician mixed 0.668 2.338 0.885 -0.885 1.67 -2.338 -2.338
Thai musician mixed 2.283 4.905 0.768 2.438 0.885 3.168 3.168
Thai musician blocked 0 1.553 4.725 -1.498 6.355 3.473 3.473
Thai musician blocked 5.243 1.815 3.473 1.07 0 1.735 1.735
Thai musician blocked -3.173 2.883 6.98 1.175 0 5.715 5.715
Thai musician blocked 0.415 0.885 0.785 2.288 5.91 5.795 5.795
Thai non-musician mixed 2.283 2.283 1.998 4.905 2.283 1.998 1.998
Thai non-musician mixed 4.358 2.283 3.483 3.173 1.553 2.883 2.883
Thai non-musician mixed 1.77 -0.785 0 1.113 3.995 0 0
Thai non-musician mixed -2.283 -2.283 2.438 -1.398 0.885 0 0
Thai non-musician blocked 1.998 -0.885 4.28 1.77 0.885 1.998 1.998
Thai non-musician blocked 0 1.998 -0.285 0 0 -2.883 -2.883
Thai non-musician blocked 1.998 1.57 1.553 2.698 -0.768 5.795 5.795
Thai non-musician blocked -1.998 1.998 0 0.785 3.173 0 0
Australian musician mixed 1.998 5.795 1.553 2.623 6.9 4.175 4.175
Australian musician mixed 3.202 1.023 2.61 2.077 3.507 2.045 2.045
Australian musician mixed 3.328 -1.998 4.725 1.77 2.283 3.958 3.958
Australian musician mixed 4.28 -0.73 1.553 1.838 1.338 -1.998 -1.998
Australian musician blocked 1.998 -0.625 2.283 2.438 0 2.623 2.623
Australian musician blocked 0 4.28 0.285 5.795 2.543 1.998 1.998
Australian musician blocked 0.885 0.885 5.91 3.428 0.1 1.758 1.758
Australian musician blocked -2.283 -3.168 -2.283 3.508 1.113 3.55 3.55
Australian non-musician mixed 4.62 2.438 4.058 0 4.358 5.455 5.455
Australian non-musician mixed 4.62 2.438 4.058 0 4.358 5.455 5.455
Australian non-musician mixed 2.623 1.998 -1.998 0.285 6.64 4.175 4.175
Australian non-musician mixed -4.058 -0.785 -3.408 0.445 -5.245 0.885 0.885
Australian non-musician blocked -0.668 1.77 -0.1 3.068 2.543 1.553 1.553
Australian non-musician blocked 2.283 1.998 4.095 -0.785 3.168 -1.553 -1.553
Australian non-musician blocked -0.785 0 0.785 2.283 2.438 -1.77 -1.77
Australian non-musician blocked 0.73 -1.553 3.173 2.698 -0.885 0 0
Rising Continuum, Speech – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 205-212.5 212.5-220 220-227.5 227.5-235 235-242.5 242.5-250 250-257.5
Thai musician mixed -3.995 1.998 4.175 5.243 -0.34 3.168 4.905
Thai musician mixed 2.283 1.998 1.553 0.785 0.285 -1.998 0
Thai musician mixed 4.62 2.283 0.285 0 3.168 -0.885 -1.738
Thai musician mixed 1.398 -2.438 2.438 3.958 1.838 0.285 0
Thai musician blocked 0.885 4.62 2.283 0.885 1.838 3.835 0.885
Thai musician blocked 0 0 0 0 4.358 0 -0.99
Thai musician blocked 0.768 0.73 4.62 -0.1 1.398 -1.145 4.095
Thai musician blocked 1.998 6.355 1.67 5.165 0.89 2.283 0.285
Thai non-musician mixed 0 -1.998 2.283 2.623 1.113 1.998 1.998
Thai non-musician mixed 0.785 0.885 -1.498 3.173 3.473 1.998 0
Thai non-musician mixed 0 2.623 -2.883 0.885 1.498 -3.835 0
Thai non-musician mixed 1.113 -2.283 0 -0.885 4.28 -0.885 1.498
Thai non-musician blocked -1.998 0 -1.553 1.66 0 -0.785 -0.1
Thai non-musician blocked 0 0 -1.998 1.998 4.28 2.283 -1.998
Thai non-musician blocked 1.113 -3.995 -1.145 3.483 0.885 0.1 8.715
Thai non-musician blocked -2.283 0 0.885 -1.553 0.545 -1.553 2.883
Australian musician mixed 0 2.283 3.995 6.98 3.508 4.28 6.64
Australian musician mixed -0.885 1.67 4.358 0 -0.768 2.438 2.438
Australian musician mixed 0.885 2.283 1.113 -1.67 -3.105 2.283 5.243
Australian musician mixed 0 2.078 3.155 1.77 -0.122 3 4.773
Australian musician blocked 2.338 -0.785 5.795 1.398 2.438 0 -1.398
Australian musician blocked 0 0 0.885 -1.998 2.883 4.905 2.623
Australian musician blocked 1.998 2.438 -1.498 2.438 -0.155 -2.805 1.145
Australian musician blocked -1.998 -2.283 3.508 2.283 3.068 0 2.438
Australian non-musician mixed 2.283 1.998 0 0.285 0.285 -2.283 2.623
Australian non-musician mixed 2.283 1.998 0 0.285 0.285 -1.998 2.623
Australian non-musician mixed 1.998 0 0 1.998 4.905 2.623 2.783
Australian non-musician mixed 2.698 -0.785 -1.553 -3.173 -1.67 0 -0.73
Australian non-musician blocked -2.283 -4.28 -1.67 -0.885 2.883 1.553 1.998
Australian non-musician blocked 0.885 0 -1.398 -1.113 -1.113 0.885 1.553
Australian non-musician blocked 0 -1.738 -2.283 0.785 2.623 2.283 1.553
Australian non-musician blocked -0.885 0.885 0 0 2.388 -0.785 0.1
Falling Continuum, Sine-Wave – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 182.5-190 190-197.5 197.5-205 205-212.5 212.5-220 220-227.5 227.5-235
Thai musician mixed 3.428 5.395 2.698 0 -1.145 1.67 0
Thai musician mixed 1.998 0 -1.113 4.28 -0.885 -1.998 1.398
Thai musician mixed 1.57 -5.455 -0.785 -1.553 3.835 -2.283 4.565
Thai musician mixed 1.998 -0.885 -1.998 3.168 4.62 4.28 0.785
Thai musician blocked 1.998 0 4.905 2.438 5.91 2.783 1.77
Thai musician blocked 0 2.283 1.498 4.28 4.725 2.783 2.283
Thai musician blocked 0 0.285 3.473 4.175 4.358 5.245 2.36
Thai musician blocked 0.785 0 1.57 3.173 0.785 2.283 3.508
Thai non-musician mixed 2.283 -1.77 -3.55 3.105 1.998 -0.885 4.28
Thai non-musician mixed 1.998 1.553 0 -3.168 2.283 -1.998 0
Thai non-musician mixed 3.168 1.998 1.998 3.428 4.565 5.455 0.885
Thai non-musician mixed 0.285 0 3.508 3.173 1.175 0.885 0.285
Thai non-musician blocked 0 0 -1.998 -1.498 0.668 0 1.998
Thai non-musician blocked 0 -1.998 1.998 1.553 3.995 1.998 0
Thai non-musician blocked 0.785 -0.075 2.543 3.173 5.795 5.455 2.805
Thai non-musician blocked -1.398 3.168 -1.998 4.095 4.1 3.173 2.883
Australian musician mixed 3.995 0.285 1.62 3.835 5.715 4.62 0
Australian musician mixed 3.573 5.91 4.825 5.795 4.358 1.758 4.175
Australian musician mixed 1.398 0.885 0.785 0.785 5.24 5.24 5.795
Australian musician mixed 2.988 2.36 2.41 3.472 5.104 3.873 3.323
Australian musician blocked -0.1 -0.885 5.085 1.57 3.483 4.058 5.085
Australian musician blocked 5.17 5.91 6.98 6.345 6.9 4.725 8.715
Australian musician blocked 3.995 4.175 -0.73 3.173 2.623 1.998 1.998
Australian musician blocked 0.785 1.738 2.623 -4.28 1.998 2.623 6.98
Australian non-musician mixed 1.67 1.62 -0.785 1.67 0.885 5.143 5.243
Australian non-musician mixed -0.668 4.095 2.543 -1.498 2.388 1.838 0.475
Australian non-musician mixed 0 0 3.168 1.998 0 1.998 1.998
Australian non-musician mixed 1.77 1.553 2.438 4.28 -1.998 1.998 4.358
Australian non-musician blocked -0.885 0.73 2.623 0.885 1.398 0 2.883
Australian non-musician blocked 0 2.438 2.438 3.105 3.995 4.98 -0.785
Australian non-musician blocked 0 0 -1.998 1.998 0 0 3.995
Australian non-musician blocked 1.998 -3.835 2.438 4.175 3.583 0.885 4.175
Falling Continuum, Speech – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 182.5-190 190-197.5 197.5-205 205-212.5 212.5-220 220-227.5 227.5-235
Thai musician mixed 2.623 2.623 0 1.57 0.285 4.62 2.623
Thai musician mixed -0.885 2.543 -0.885 0.668 0 1.998 -1.998
Thai musician mixed 2.883 -1.998 -0.885 1.998 2.283 2.623 1.398
Thai musician mixed 0 0 2.623 2.543 3.408 0.785 0
Thai musician blocked 0 0 0 3.55 -2.698 0 0
Thai musician blocked 4.565 4.905 2.283 1.125 4.565 2.883 4.62
Thai musician blocked 0 4.28 4.565 6.355 7.53 2.623 4.358
Thai musician blocked 0 0.885 -0.155 1.758 0.785 1.553 0
Thai non-musician mixed 1.553 0 2.883 -1.998 3.508 2.338 4.62
Thai non-musician mixed -1.998 0 -1.998 2.883 0 0 0
Thai non-musician mixed 3.995 1.113 5.715 0.885 4.28 4.28 2.283
Thai non-musician mixed 0 1.553 1.553 0.668 3.55 3.835 -0.885
Thai non-musician blocked -0.885 0 0.285 1.998 1.998 0 1.998
Thai non-musician blocked 2.283 -1.113 2.283 0.1 2.543 2.783 0
Thai non-musician blocked 2.805 0 2.338 -0.075 2.883 0.545 0.885
Thai non-musician blocked -1.998 4.905 -0.885 1.838 0 0 0.885
Australian musician mixed 1.998 1.998 2.883 4.565 4.28 6.64 1.998
Australian musician mixed 1.256 1.688 3.2 5.553 4.523 3.248 3.658
Australian musician mixed 0.885 3.068 2.883 4.565 4.565 3.105 6.355
Australian musician mixed 0.885 0 3.835 7.53 4.725 0 2.623
Australian musician blocked 4.28 0 0.885 4.358 4.565 4.28 2.623
Australian musician blocked 4.905 5.795 8.715 8.715 5.91 8.715 6.98
Australian musician blocked 1.998 -1.998 1.113 1.553 4.565 0 1.998
Australian musician blocked 0.785 3.995 4.175 4.565 -1.77 1.77 0.885
Australian non-musician mixed -0.445 2.438 1.998 0 0.785 4.62 -0.445
Australian non-musician mixed 0 -0.1 -1.77 2.338 -0.885 -0.885 0.785
Australian non-musician mixed -0.885 1.998 1.213 2.883 1.998 3.995 2.283
Australian non-musician mixed 1.998 0.885 -1.998 0 0.885 1.213 1.998
Australian non-musician blocked -0.668 -0.885 -1.998 0.73 0.668 0 0.73
Australian non-musician blocked 2.283 -0.885 0.885 2.623 2.543 -0.885 2.623
Australian non-musician blocked 1.998 2.623 1.998 4.28 0 1.998 0
Australian non-musician blocked 0.785 0.785 0.885 0.785 0.768 2.438 0.785
Appendix A7.12 Analyses – Discrimination Peak
Analysis of Variance Summary Table
Discrimination rising continuum
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 2.942 1 2.942 0.156
B2 198.372 1 198.372 10.489
B3 6.449 1 6.449 0.341
B4 8.258 1 8.258 0.437
B5 7.432 1 7.432 0.393
B6 18.342 1 18.342 0.970
Error 434.965 23 18.912
------------------------------------------------
Within
------------------------------------------------
W1 18.988 1 18.988 2.964
B1W1 11.343 1 11.343 1.771
B2W1 65.650 1 65.650 10.249
B3W1 0.233 1 0.233 0.036
B4W1 0.040 1 0.040 0.006
B5W1 18.100 1 18.100 2.826
B6W1 4.519 1 4.519 0.705
Error 147.325 23 6.405
W2 1.103 1 1.103 0.130
B1W2 3.997 1 3.997 0.471
B2W2 43.137 1 43.137 5.083
B3W2 1.407 1 1.407 0.166
B4W2 19.598 1 19.598 2.309
B5W2 4.453 1 4.453 0.525
B6W2 1.266 1 1.266 0.149
Error 195.180 23 8.486
W3 398.539 1 398.539 15.479
B1W3 27.515 1 27.515 1.069
B2W3 33.294 1 33.294 1.293
B3W3 35.615 1 35.615 1.383
B4W3 0.036 1 0.036 0.001
B5W3 10.469 1 10.469 0.407
B6W3 0.442 1 0.442 0.017
Error 592.167 23 25.746
W4 8.843 1 8.843 0.607
B1W4 2.909 1 2.909 0.200
B2W4 16.081 1 16.081 1.104
B3W4 53.552 1 53.552 3.677
B4W4 4.657 1 4.657 0.320
B5W4 0.662 1 0.662 0.045
B6W4 9.781 1 9.781 0.672
Error 334.949 23 14.563
W5 0.228 1 0.228 0.027
B1W5 1.997 1 1.997 0.239
B2W5 2.053 1 2.053 0.246
B3W5 40.157 1 40.157 4.807
B4W5 10.344 1 10.344 1.238
B5W5 38.362 1 38.362 4.593
B6W5 0.735 1 0.735 0.088
Error 192.119 23 8.353
W6 3.369 1 3.369 0.487
B1W6 28.981 1 28.981 4.192
B2W6 0.052 1 0.052 0.008
B3W6 0.804 1 0.804 0.116
B4W6 24.314 1 24.314 3.516
B5W6 14.115 1 14.115 2.041
B6W6 53.300 1 53.300 7.709
Error 159.028 23 6.914
W7 1.472 1 1.472 0.200
B1W7 23.183 1 23.183 3.150
B2W7 0.527 1 0.527 0.072
B3W7 4.356 1 4.356 0.592
B4W7 0.801 1 0.801 0.109
B5W7 0.769 1 0.769 0.104
B6W7 3.396 1 3.396 0.462
Error 169.252 23 7.359
W8 13.390 1 13.390 1.340
B1W8 20.711 1 20.711 2.073
B2W8 2.375 1 2.375 0.238
B3W8 24.349 1 24.349 2.437
B4W8 0.759 1 0.759 0.076
B5W8 8.118 1 8.118 0.812
B6W8 4.827 1 4.827 0.483
Error 229.801 23 9.991
------------------------------------------------
Analysis of Variance Summary Table
Discrimination falling continuum
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 85.895 1 85.895 2.238
B2 306.201 1 306.201 7.977
B3 19.220 1 19.220 0.501
B4 10.471 1 10.471 0.273
B5 15.114 1 15.114 0.394
B6 47.906 1 47.906 1.248
Error 921.241 24 38.385
------------------------------------------------
Within
------------------------------------------------
W1 80.829 1 80.829 10.948
B1W1 15.360 1 15.360 2.080
B2W1 0.755 1 0.755 0.102
B3W1 0.351 1 0.351 0.048
B4W1 6.160 1 6.160 0.834
B5W1 6.793 1 6.793 0.920
B6W1 4.349 1 4.349 0.589
Error 177.197 24 7.383
W2 36.647 1 36.647 8.998
B1W2 3.720 1 3.720 0.913
B2W2 0.123 1 0.123 0.030
B3W2 1.985 1 1.985 0.487
B4W2 3.022 1 3.022 0.742
B5W2 13.482 1 13.482 3.310
B6W2 14.677 1 14.677 3.603
Error 97.752 24 4.073
W3 10.303 1 10.303 1.279
B1W3 4.242 1 4.242 0.526
B2W3 3.742 1 3.742 0.464
B3W3 4.126 1 4.126 0.512
B4W3 7.484 1 7.484 0.929
B5W3 1.454 1 1.454 0.181
B6W3 3.681 1 3.681 0.457
Error 193.384 24 8.058
W4 32.589 1 32.589 2.261
B1W4 7.170 1 7.170 0.497
B2W4 5.058 1 5.058 0.351
B3W4 20.088 1 20.088 1.394
B4W4 15.396 1 15.396 1.068
B5W4 0.113 1 0.113 0.008
B6W4 19.822 1 19.822 1.375
Error 345.892 24 14.412
W5 3.955 1 3.955 0.538
B1W5 0.038 1 0.038 0.005
B2W5 0.109 1 0.109 0.015
B3W5 44.369 1 44.369 6.033
B4W5 8.013 1 8.013 1.090
B5W5 8.606 1 8.606 1.170
B6W5 5.928 1 5.928 0.806
Error 176.500 24 7.354
W6 10.339 1 10.339 0.739
B1W6 33.463 1 33.463 2.393
B2W6 13.178 1 13.178 0.942
B3W6 0.164 1 0.164 0.012
B4W6 0.921 1 0.921 0.066
B5W6 0.110 1 0.110 0.008
B6W6 1.057 1 1.057 0.076
Error 335.628 24 13.985
W7 21.532 1 21.532 4.687
B1W7 24.370 1 24.370 5.304
B2W7 23.707 1 23.707 5.160
B3W7 4.493 1 4.493 0.978
B4W7 1.385 1 1.385 0.301
B5W7 8.264 1 8.264 1.799
B6W7 1.385 1 1.385 0.301
Error 110.263 24 4.594
W8 2.889 1 2.889 0.364
B1W8 4.091 1 4.091 0.516
B2W8 0.735 1 0.735 0.093
B3W8 0.003 1 0.003 0.000
B4W8 0.056 1 0.056 0.007
B5W8 1.475 1 1.475 0.186
B6W8 2.101 1 2.101 0.265
Error 190.351 24 7.931
------------------------------------------------
Appendix A8.1 Participant Details
Non-musicians Gender Date of Birth Instrument Years Played Hours per Week
N1 f 2/08/1981 0 0
N2 f 5/03/1982 0 0
N3 m 20/10/1972 0 0
N4 f 10/03/1973 0 0
N5 f 6/07/1988 0 0
N6 m 30/10/1978 0 0
N7 f 9/04/1988 0 0
N8 f 4/08/1987 0 0
N9 f 27/04/1987 0 0
N10 m 22/10/1980 Guitar 2 0
N11 f 12/11/1988 0 0
N12 m 10/09/1986 0 0
N13 m 22/08/1985 0 0
N14 m 3/06/1984 0 0
N15 f 3/10/1976 0 0
N16 m 23/08/1979 0 0
N17 m 14/09/1979 0 0
N18 f 24/07/1987 0 0
Musicians Gender Date of Birth Instrument Years Played Hours per Week
M1 f 30/05/1977 Piano 11 15
Violin 11
Recorder 5
Percussion 16
Singing 5
Fife 5
M2 f 23/05/1981 Violin 5 2.5
Flute 5
Recorder 2
Singing 3
M2 f 23/05/1981 Violin 7 2.5
Flute 12
Recorder 6
M3 m 23/02/1982 Piano 17 10
Guitar 9
Singing 17
Bass 6
M4 m 5/12/1983 Piano 17 7
Guitar 13
Saxophone 8
Singing 2
M5 m 28/05/1982 Piano 15 6
Clarinet 8
Voice 10
M6 f 14/02/1966 Piano 5 3
Flute 2
Singing 12
M7 m 12/09/1973 Trumpet 22 3.5
French Horn 17
Clarinets 14
Percussion 14
Flute 2
Viola 1
Double Bass 1
Tuba 1
Violin 0.5
Voice 3
Piano 1
Oboe 0.5
Alto Sax 0.5
Tenor Horn 1
M8 f 4/10/1987 Clarinet 11 7
Cello 3
Drums 5
Keyboard 3
M9 f 12/05/1987 Saxophone 2 0
Clarinet 6
M10 m 23/08/1974 Piano 5 1
Guitar 2
Bass G 4
Recorder 4
Singing 5
M11 m 7/07/1987 Piano 2 3
Tuba 5
M12 f 22/08/1974 Piano 26 10
M13 f 5/04/1988 Piano 13 10
M14 m 12/03/1979 Guitar 9 14
Piano 3
M15 f 21/11/1986 Singing 15 17.5
Piano 14
Bass Guitar 7
Drums 5
M16 f 28/10/1946 Piano 2 3.5
Guitar 41
Singing 50
M17 f 1/08/1979 Piano 22 0
M18 m 16/03/1981 Trumpet 18 4
Tenor Horn 7
Soprano Cornet 5
Appendix A8.2 Consent Form and Questionnaire
MARCS Auditory Laboratories College of Arts, Education and Social Sciences
Denis Burnham
Professor of Psychology, Director MARCS Phone: (+612) 9772 6681 Fax: (+612) 9772 6736
Email: [email protected] Web: www.uws.edu.au/marcs/
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS April 2006
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily humans produce speech sounds in their native language and how easily humans perceive acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of Australian English. You are invited to participate because you are a native speaker of Australian English.

If you participate, you will complete a 90-minute session. You will be asked to identify and produce short sound items. After that, you will do a language aptitude test, a foreign language aptitude test and a musical memory test.

Participation is voluntary. You have a right not to participate in, or subsequently withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Barbara Schwanhaeusser on 9772 6589. Please take time now to ask any questions you may have. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS CONSENT FORM
Please read the information sheet before signing this.

1. Yes / No   I, .............................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes / No   I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment, its nature, and the possible risks of the investigation, and that the statement has been explained to me to my satisfaction.
3. Yes / No   I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship with the University of Western Sydney.
4. Yes / No   I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes / No   I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes / No   I understand that if I have any questions relating to my participation in this research, I may contact Barbara Schwanhaeusser (9772 6589) or Prof Denis Burnham (9772 6681), who will be happy to answer them.
7. Yes / No   I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes / No   I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

Participant's signature: ………………………………….. Date: …………………………………..
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS PARTICIPANT QUESTIONNAIRE
Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1 Your name: ……………………..
2 Male / Female (please circle)
3 Date of birth: ……………..
4 Place of birth (City/town & Country): …………………………………..…
5 What is the official language in that country? ……………………………….
6 Hearing: Do you have normal hearing? Yes / No
  If No, please provide any details you can: ………………………………………………………………
7 Speech/language history: Do you have any history of speech/language problems? Yes / No
  If Yes, please provide any details you can: ………………………………………………………………
8 Language background: What is your native language/dialect, that is, the language/dialect which you learned from birth? Please list the percentage of time you use it in your everyday life now.
  Native Language / Dialect          Percentage of Time Spoken
  ………………………….                     ………………..
  If you learned more than one language from birth, please list these and the percentage of time you use them in your everyday life now.
  Additional Native Language / Dialect   Percentage of Time Spoken
  ………………………….                     ………………..
Please also list all the languages of which you have some knowledge, and indicate how old you were when you started learning the language and the percentage of time you use these in your everyday life.

  Other language/s that you        Age at which you started       Percentage of
  have knowledge of:               learning this language:        Time Spoken
  ………………………………….                  …………                           ………..
  ………………………………….                  …………                           ………..
  ………………………………….                  …………                           ………..

9 Do you play a musical instrument and/or have singing training? Yes / No
  If Yes, please list all the musical instruments which you have play/ed and indicate for how long you play/ed the instrument, e.g. violin, age 10, played for 5 years (singing counts!)
  Instrument:              Age started playing:     Number of Years Playing:
  ………………………              ……                        ……
  ………………………              ……                        ……
  ………………………              ……                        ……
10 Do you have formal secondary or tertiary level music education? Yes / No
  If so, please list details:
  Instrument:              Course:                  Grade or level attained:
  ………………………              ……                        ……
  ………………………              ……                        ……
11 Are you still playing music? Yes / No
  If not, when did you finish? ………………
  If you are still playing, how many hours per day do you play? …………
12 Do you have perfect pitch / absolute pitch? Yes / No / Don't know
  (recognition / production of a note without a reference to other notes)
  Where, when and how was that assessed? ……………………………………….
Appendix A8.4 Song List / Answer Sheet Musical Memory Test
Which is the original version of the song? (please tick the box if you know the song)

1: Cat Empire – Hello                     Version 1   Version 2
2: Nelly/Kelly – Dilemma                  Version 1   Version 2
3: Men at Work – Land Down Under          Version 1   Version 2
4: Franz Ferdinand – Take Me Out          Version 1   Version 2
5: Puff Daddy – I'll Be Missing You       Version 1   Version 2
6: Black Eyed Peas – Where Is the Love    Version 1   Version 2
7: Los del Rio – Macarena                 Version 1   Version 2
8: Outkast – Hey Ya!                      Version 1   Version 2
9: Michael Jackson – Black or White       Version 1   Version 2
10: Jamelia – Superstar                   Version 1   Version 2
11: Queen – Bohemian Rhapsody             Version 1   Version 2
12: Eamon – Don't Want You Back           Version 1   Version 2
Appendix A8.7 List of Thai Words and Translations
[The table here listed the Thai-script stimulus items for the three tone categories (Mid, tone 0; Low, tone 1; High, tone 3), with each item marked W or N; the Thai characters were corrupted in conversion and cannot be reproduced.]
Word list & Translation

Word   Translation
0 Fat (adj.)
0 Bo tree; pipal tree
3 Doing drugs (slang)
0 Year
0 Instrument used in gambling (Chinese origin)
0 Full; a lot of
1 Flute; pipe
1 Last (in playing game)
3 Naked; nude, obscene
0 Ribbon
Appendix A8.8 DMDX Script for Tone Identification Task
<ep><dwc 200200060 ><dbc 19> <fd 50> <s 27> <cr> <nfbt> <t 7000> <d 50> <id "keyboard"> <zil> <zor> <mpr +Left Shift> <mnr +Right Shift> <mnr +space><eop>
$0 m1=<umpr><umnr><mpr +Left Shift><mnr +Right Shift><mnr +space>=, m2=<umpr><umnr><mnr +Left Shift><mpr +Right Shift><mnr +space>=, m3=<umpr><umnr><mnr +Left Shift><mnr +Right Shift><mpr +space>=,
<ln -6>"In this experiment you’ll identify sounds.",
<ln -4>"You'll press the LEFT key when you hear one kind of sound.",
<ln -3>"You'll press the RIGHT key when you hear another kind of sound.",
<ln -2>"You'll press the SPACEBAR when you hear a third kind of sound.",
<ln 0>"Before we begin, let’s practice.",
<ln 1>"Press the SPACEBAR to continue.";
+501 ~1 "Now press the LEFT key after you hear this kind of sound:"/<wav 2>*"phi0.wav"/;
+502 ~3 "Press the SPACEBAR after you hear this kind of sound:"/<wav 2>*"phi3.wav"/;
+503 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"phi1.wav"/;
+504 ~1 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"pi0.wav"/;
+505 ~3 "Press the SPACEBAR key after you hear this kind of sound:"/<wav 2>*"pi3.wav"/;
+506 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"pi1.wav"/;
+507 ~1 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"bi0.wav"/;
+508 ~3 "Press the SPACEBAR key after you hear this kind of sound:"/<wav 2>*"bi3.wav"/;
+509 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"bi1.wav"/;
0<ln -3>"Good. 
Now you'll train some more.",
<ln -2>"PLEASE RESPOND AS FAST AS POSSIBLE!",
<ln 0>"When you get 3 in a row correct,",
<ln 1>"you'll move on to the testing phase.",
<ln 3>"Please press the SPACEBAR to continue.";
2000<set 1,2>;$
\+601 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-4000>/;
+602 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-4000>/;
+603 ~3<set 1,2>"ready"/<wav 2>*"phi3.wav"<deciw 1><bicLE 1,1,-4000>/;\
$4000<bicGT 1,1,-12000>;$
\+604 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-6000>/;
+605 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-6000>/;
+606 ~3<set 1,2>"ready"/<wav 2>*"phi3.wav"<deciw 1><bicLE 1,1,-6000>/;\
$6000<bicGT 1,1,-12000>;$
\+607 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-8000>/;
+608 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-8000>/;
+609 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-8000>/;\
$8000<bicGT 1,1,-12000>;$
\+610 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-10000>/;
+611 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-10000>/;
+612 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-10000>/;\
$10000<bicGT 1,1,-12000>;$
\+613 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-2000>/;
+614 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-2000>/;
+615 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-2000>/;\
$12000;
0<ln -3>"Great, 3 in a row!",
<ln -2>"Now you'll move on to the next part of the experiment",
<ln -1>"PLEASE RESPOND AS QUICKLY AND ACCURATELY AS POSSIBLE!!",
<ln 0> "Please press SPACEBAR to continue.";$
\+701 ~1<nfb>"ready"/<wav 2>*"phi0.wav"/;
+702 ~1<nfb>"ready"/<wav 2>*"pho0.wav"/;
+703 ~1<nfb>"ready"/<wav 2>*"phu0.wav"/;
+704 ~1<nfb>"ready"/<wav 2>*"bi0.wav"/;
+705 ~1<nfb>"ready"/<wav 2>*"bo0.wav"/;
+706 ~1<nfb>"ready"/<wav 2>*"bu0.wav"/;
+707 ~1<nfb>"ready"/<wav 2>*"pi0.wav"/;
+708 ~1<nfb>"ready"/<wav 2>*"po0.wav"/;
+709 ~1<nfb>"ready"/<wav 2>*"pu0.wav"/;
+710 ~2<nfb>"ready"/<wav 2>*"phi1.wav"/;
+711 ~2<nfb>"ready"/<wav 2>*"pho1.wav"/;
+712 ~2<nfb>"ready"/<wav 2>*"phu1.wav"/;
+713 ~2<nfb>"ready"/<wav 2>*"bi1.wav"/;
+714 ~2<nfb>"ready"/<wav 2>*"bo1.wav"/;
+715 ~2<nfb>"ready"/<wav 2>*"bu1.wav"/;
+716 ~2<nfb>"ready"/<wav 2>*"pi1.wav"/;
+717 ~2<nfb>"ready"/<wav 2>*"po1.wav"/;
+718 ~2<nfb>"ready"/<wav 2>*"pu1.wav"/;
+719 ~3<nfb>"ready"/<wav 2>*"phi3.wav"/;
+720 ~3<nfb>"ready"/<wav 2>*"pho3.wav"/;
+721 ~3<nfb>"ready"/<wav 2>*"phu3.wav"/;
+722 ~3<nfb>"ready"/<wav 2>*"bi3.wav"/;
+723 ~3<nfb>"ready"/<wav 2>*"bo3.wav"/;
+724 ~3<nfb>"ready"/<wav 2>*"bu3.wav"/;
+725 ~3<nfb>"ready"/<wav 2>*"pi3.wav"/;
+726 ~3<nfb>"ready"/<wav 2>*"po3.wav"/;
+727 ~3<nfb>"ready"/<wav 2>*"pu3.wav"/;\
\+801 ~1<nfb>"ready"/<wav 2>*"phi0.wav"/;
+802 ~1<nfb>"ready"/<wav 2>*"pho0.wav"/;
+803 ~1<nfb>"ready"/<wav 2>*"phu0.wav"/;
+804 ~1<nfb>"ready"/<wav 2>*"bi0.wav"/;
+805 ~1<nfb>"ready"/<wav 2>*"bo0.wav"/;
+806 ~1<nfb>"ready"/<wav 2>*"bu0.wav"/;
+807 ~1<nfb>"ready"/<wav 2>*"pi0.wav"/;
+808 ~1<nfb>"ready"/<wav 2>*"po0.wav"/;
+809 ~1<nfb>"ready"/<wav 2>*"pu0.wav"/;
+810 ~2<nfb>"ready"/<wav 2>*"phi1.wav"/;
+811 ~2<nfb>"ready"/<wav 2>*"pho1.wav"/;
+812 ~2<nfb>"ready"/<wav 2>*"phu1.wav"/;
+813 ~2<nfb>"ready"/<wav 2>*"bi1.wav"/;
+814 ~2<nfb>"ready"/<wav 2>*"bo1.wav"/;
+815 ~2<nfb>"ready"/<wav 2>*"bu1.wav"/;
+816 ~2<nfb>"ready"/<wav 2>*"pi1.wav"/;
+817 ~2<nfb>"ready"/<wav 2>*"po1.wav"/;
+818 ~2<nfb>"ready"/<wav 2>*"pu1.wav"/;
+819 ~3<nfb>"ready"/<wav 2>*"phi3.wav"/;
+820 ~3<nfb>"ready"/<wav 2>*"pho3.wav"/;
+821 ~3<nfb>"ready"/<wav 2>*"phu3.wav"/;
+822 ~3<nfb>"ready"/<wav 2>*"bi3.wav"/;
+823 ~3<nfb>"ready"/<wav 2>*"bo3.wav"/;
+824 ~3<nfb>"ready"/<wav 2>*"bu3.wav"/;
+825 ~3<nfb>"ready"/<wav 2>*"pi3.wav"/;
+826 ~3<nfb>"ready"/<wav 2>*"po3.wav"/;
+827 ~3<nfb>"ready"/<wav 2>*"pu3.wav"/;\
$0 <ln 0> "End of experiment. Please press the bell. THANK YOU!";$
Appendix A8.10 Rating Procedure – Production Task
Two native Thai-speaking raters rated the productions of consonants, vowels, and tones. In each of the rating tasks, three speakers' imitations had to be rated. Each task included both musicians' and non-musicians' productions, and the productions were randomised. The rater rated each item on a scale from 1 to 5, where 1 was "very bad", 2 "bad", 3 "average", 4 "good", and 5 "very good". The raters were instructed to ignore possible differences in recording quality and to concentrate only on the quality of the productions, in terms of the particular sound that was to be rated.

Due to time constraints it was not possible to have all items rated by both raters. In order to check inter-rater reliability, the productions of four of the participants were rated by both raters independently. The reliability between the two raters was high (mean Pearson product-moment correlation of r = .828), so it can be assumed that the raters were using equivalent rating criteria.
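The Pearson product-moment correlation used for this reliability check can be computed with a short script. The Python sketch below is illustrative only and is not part of the thesis; the rater_a/rater_b values are hypothetical ratings on the 1-5 scale, not data from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length rating lists."""
    n = len(xs)
    assert n == len(ys) and n > 1
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances around the means (unscaled; the scale cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical 1-5 ratings from two raters for the same eight items:
rater_a = [2, 3, 4, 4, 5, 3, 2, 4]
rater_b = [2, 4, 4, 5, 5, 3, 3, 4]
print(f"r = {pearson_r(rater_a, rater_b):.3f}")
```

Averaging such an r over the four doubly rated participants gives the mean reliability figure reported above.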
Participant   Speech sound   Rater A   Rater B   Correlation
m18 b 2.090909 2.022727 0.856317
p 3.94 4.066667
ph 3.522727 4
0 4.066667 4.266667
1 3.136364 3.704545
3 3.466667 4.4
i 3.590909 3.636364
o 4.355556 4.066667
u 3.369565 4.152174
n8 b 2.181818 1.931818 0.804418
p 3.340909 3.522727
ph 3.190476 3.880952
0 2.348837 2.976744
1 3 3.954545
3 2.5 2.863636
i 3.404762 3.428571
o 3.222222 3.2254
u 2.844444 3.244444
m10 b 3.266667 3.666667 0.873692
p 2.444444 2.955556
ph 3.4 4.222222
0 3.31288 4.022727
1 3.355556 4.066667
3 3.311111 3.488889
i 3.733333 4.066667
o 3.355556 4.222222
u 3.636364 4.377778
n9 b 2.088889 2.369565 0.708566
p 3.111111 3.355556
ph 2.733333 2.822222
0 3.822222 3.133333
1 2.431818 2.931818
3 2.81089 3.355556
i 3.133333 3.133333
o 2.355556 2.688889
u 2.81089 3.045455
Appendix A8.11 Raw Data – Musical Aptitude Test
participant musicianship tone score rhythm score
1 musician 93 97
2 musician 39 50
3 musician 79 85
4 musician 97 98
5 musician 75 60
6 musician 87 92
7 musician 98 92
8 musician 71 65
9 musician 50 45
10 musician 61 55
11 musician 66 70
12 musician 79 85
13 musician 29 50
14 musician 34 25
15 musician 83 80
16 musician 87 60
17 musician 97 95
18 musician 29 55
19 non-musician 44 25
20 non-musician 44 65
21 non-musician 93 85
22 non-musician 79 85
23 non-musician 24 7
24 non-musician 29 40
25 non-musician 50 35
26 non-musician 34 45
27 non-musician 75 75
28 non-musician 19 60
29 non-musician 75 60
30 non-musician 66 75
31 non-musician 61 65
32 non-musician 61 45
33 non-musician 50 30
34 non-musician 61 40
35 non-musician 56 55
36 non-musician 44 55
Appendix A8.12 SPSS Output – Musical Aptitude Test
Descriptive Statistics

Measure   musnon         Mean    Std. Deviation   N
tone      non-musician   52.72   20.008           18
          musician       70.17   20.563           18
          Total          61.44   21.865           36
rhythm    non-musician   53.61   20.003           18
          musician       69.67   24.034           18
          Total          61.64   23.264           36
total     non-musician   52.61   21.180           18
          musician       69.94   21.394           18
          Total          61.28   22.748           36
Tests of Between-Subjects Effects

Source            Measure   Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared
Corrected Model   tone      2738.778a                 1    2738.778      6.654     .014   .164
                  rhythm    2320.028b                 1    2320.028      4.745     .036   .122
                  total     2704.000c                 1    2704.000      5.967     .020   .149
Intercept         tone      135915.111                1    135915.111    330.218   .000   .907
                  rhythm    136776.694                1    136776.694    279.770   .000   .892
                  total     135178.778                1    135178.778    298.307   .000   .898
musnon            tone      2738.778                  1    2738.778      6.654     .014   .164
                  rhythm    2320.028                  1    2320.028      4.745     .036   .122
                  total     2704.000                  1    2704.000      5.967     .020   .149
Error             tone      13994.111                 34   411.592
                  rhythm    16622.278                 34   488.891
                  total     15407.222                 34   453.154
Total             tone      152648.000                36
                  rhythm    155719.000                36
                  total     153290.000                36
Corrected Total   tone      16732.889                 35
                  rhythm    18942.306                 35
                  total     18111.222                 35

a. R Squared = .164 (Adjusted R Squared = .139)
b. R Squared = .122 (Adjusted R Squared = .097)
c. R Squared = .149 (Adjusted R Squared = .124)
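The between-subjects tests reported in these appendices are standard one-way ANOVAs, which can be reproduced directly from the raw scores. The Python sketch below is not part of the thesis; it computes F from first principles using the musician and non-musician "tone score" columns of Appendix A8.11, and the resulting F(1, 34) = 4.745 matches the musnon effect reported for one of the dependent measures in the table above.

```python
# One-way between-subjects ANOVA (two groups), computed from first principles.
# Scores are the "tone score" column of Appendix A8.11.
musicians = [93, 39, 79, 97, 75, 87, 98, 71, 50, 61, 66, 79,
             29, 34, 83, 87, 97, 29]
non_musicians = [44, 44, 93, 79, 24, 29, 50, 34, 75, 19, 75, 66,
                 61, 61, 50, 61, 56, 44]

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-subjects design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups SS: group-size-weighted squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova(musicians, non_musicians)
print(f"F({df1}, {df2}) = {f:.3f}")  # F(1, 34) = 4.745
```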
Appendix A8.13 Raw Data – Musical Memory Test
Song Shift Non-musicians Musicians
Franz Ferdinand -2 1 1
Los Del Rio -2 0.9375 1
Eamon -2 0.9333 1
Cat Empire -1 0.1429 0.4615
Nelly/Kelly -1 0.9375 0.8333
Jamelia -1 0.8125 1
Puff Daddy 1 0.875 0.5833
Outkast 1 1 0.9333
Queen 1 0.6 0.8571
Men at Work 2 0.9375 0.8
Black eyed Peas 2 1 0.8462
Michael Jackson 2 0.9375 0.9286
Appendix A8.14 SPSS Output – Musical Memory Test
Tests of Between-Subjects Effects
Dependent Variable: result

Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model           .315a                     7    .045          1.055     .434
Intercept                 17.267                    1    17.267        404.775   .000
musnon                    .001                      1    .001          .016      .900
updown                    .002                      1    .002          .056      .816
shift                     .217                      1    .217          5.095     .038
musnon * updown           .036                      1    .036          .850      .370
musnon * shift            .009                      1    .009          .218      .647
updown * shift            .049                      1    .049          1.144     .301
musnon * updown * shift   .000                      1    .000          .005      .943
Error                     .683                      16   .043
Total                     18.265                    24
Corrected Total           .998                      23

a. R Squared = .316 (Adjusted R Squared = .016)
Appendix A8.15 Raw Data – Language Aptitude Test
Participant Musicianship Part Five Part Six
1 musician 0.8 1
2 musician 0.5 1
3 musician 0.9667 0.9583
4 musician 0.9667 1
5 musician 0.9667 0.9167
6 musician 0.9333 1
7 musician 0.9333 1
8 musician 0.7 0.9583
9 musician 0.5667 0.9583
10 musician 0.9 1
11 musician 0.9333 0.9583
12 musician 0.8667 0.9583
13 musician 0.8667 1
14 musician 0.6667 0.875
15 musician 0.6 0.8333
16 musician 0.7 0.75
17 musician 0.9333 0.9583
18 musician 0.6 1
19 non-musician 0.6 0.8333
20 non-musician 0.5333 0.7917
21 non-musician 0.9667 1
22 non-musician 0.7333 1
23 non-musician 0.6 1
24 non-musician 0.5667 0.9167
25 non-musician 0.7333 0.9167
26 non-musician 0.4 0.9583
27 non-musician 0.7 0.9167
28 non-musician 0.8 1
29 non-musician 1 0.9167
30 non-musician 0.9 0.9583
31 non-musician 0.8 0.875
32 non-musician 0.6667 0.7917
33 non-musician 0.7 0.9583
34 non-musician 0.7333 0.875
35 non-musician 0.7667 1
36 non-musician 0.4667 0.9167
Appendix A8.16 SPSS Outputs – Language Aptitude Test
Tests of Between-Subjects Effects

Source            Measure   Type III Sum of Squares   df   Mean Square   F          Sig.   Partial Eta Squared
Corrected Model   five      .083a                     1    .083          3.242      .081   .087
                  six       .007b                     1    .007          1.462      .235   .041
                  total     .035c                     1    .035          3.761      .061   .100
Intercept         five      20.350                    1    20.350        790.469    .000   .959
                  six       31.641                    1    31.641        6662.903   .000   .995
                  total     25.685                    1    25.685        2788.953   .000   .988
musnon            five      .083                      1    .083          3.242      .081   .087
                  six       .007                      1    .007          1.462      .235   .041
                  total     .035                      1    .035          3.761      .061   .100
Error             five      .875                      34   .026
                  six       .161                      34   .005
                  total     .313                      34   .009
Total             five      21.309                    36
                  six       31.809                    36
                  total     26.033                    36
Corrected Total   five      .959                      35
                  six       .168                      35
                  total     .348                      35

a. R Squared = .087 (Adjusted R Squared = .060)
b. R Squared = .041 (Adjusted R Squared = .013)
c. R Squared = .100 (Adjusted R Squared = .073)
Appendix A8.17 Raw Data - Speech Perception Test Criterion Results
Participant Musicianship Consonant Tone Vowel
1 musician 6 3 3
2 musician 4 3 3
3 musician 6 4 3
4 musician 3 5 3
5 musician 8 8 4
6 musician 3 3 3
7 musician 3 3 3
8 musician 6 3 4
9 musician 12 3 3
10 musician 3 5 3
11 musician 3 5 3
12 musician 3 3 3
13 musician 6 3 3
14 musician 3 5 3
15 musician 3 6 3
16 musician 6 3 3
17 musician 13 10 3
18 musician 10 4 4
19 non-musician 4 4 3
20 non-musician 12 3 3
21 non-musician 4 3 3
22 non-musician 12 3 3
23 non-musician 4 3 3
24 non-musician 12 19 3
25 non-musician 6 5 3
26 non-musician 7 3 3
27 non-musician 5 3 3
28 non-musician 12 3 3
29 non-musician 5 3 3
30 non-musician 9 18 3
31 non-musician 10 10 3
32 non-musician 10 3 3
33 non-musician 7 4 3
34 non-musician 6 4 3
35 non-musician 9 19 3
36 non-musician 7 10 3
Appendix A8.18 PSY Output – Criterion Results
Analysis of Variance Summary Table
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 59.6671 1 59.6671 4.6995
Error 406.2917 32 12.6966
---------------------------------------------------
Within
---------------------------------------------------
W1 13.4447 1 13.4447 1.1991
B1W1 12.1391 1 12.1391 1.0827
Error 358.7958 32 11.2124
W2 237.6563 1 237.6563 48.1977
B1W2 24.8063 1 24.8063 5.0308
Error 157.7875 32 4.9309
---------------------------------------------------
Appendix A8.19 Raw Data - Speech Perception Test Results
Consonant Identification Tone Identification Vowel Identification
Participant Musicianship Overall b p ph Overall tone 0 tone 1 tone 3 Overall u i o
1 musician 0.731 1 0.31 0.83 0.944 0.89 0.94 1 1 1 1 1
2 musician 0.685 0.56 0.61 0.89 0.944 0.94 0.94 0.94 1 1 1 1
3 musician 0.833 0.94 0.67 0.89 0.63 0.44 0.5 0.94 0.963 1 0.89 1
4 musician 0.537 0.44 0.22 0.94 0.907 0.83 0.89 1 1 1 1 1
5 musician 0.698 0.94 0.35 0.78 0.463 0.33 0.17 0.89 1 1 1 1
6 musician 0.796 0.89 0.5 1 1 1 1 1 0.963 1 1 0.89
7 musician 0.741 0.61 0.67 0.94 0.852 0.78 0.78 1 0.981 0.94 1 1
8 musician 0.574 1 0.56 0.17 0.704 0.61 0.56 0.94 0.944 0.96 1 0
9 musician 0.593 1 0.22 0.56 0.926 0.83 0.94 1 0.87 0.76 1 0.83
10 musician 0.907 0.83 0.89 1 1.019 1 0.94 1 1 1 1 1
11 musician 0.648 0.72 0.22 1 0.519 0.17 0.39 1 0.981 1 1 0.94
12 musician 0.852 0.78 0.83 0.94 0.944 0.94 1 0.89 0.963 1 0.94 0.94
13 musician 0.5 0.89 0.17 0.44 0.852 0.72 0.89 0.94 0.907 0.72 1 1
14 musician 0.37 0.28 0.44 0.39 0.593 0.39 0.61 0.78 0.98 1 1 0.94
15 musician 0.481 1 0.11 0.33 0.648 0.33 0.67 0.94 0.981 0.94 1 1
16 musician 0.426 0.89 0.11 0.28 0.981 0.94 1 1 0.981 1 0.94 1
17 musician 0.692 0.69 0.72 0.67 0.87 0.72 0.89 1 0.889 0.78 1 0.89
18 musician 0.5 0.89 0.39 0.22 0.63 0.33 0.56 1 1 1 1 1
1 non-musician 0.385 0.89 0 0.25 0.648 0.44 0.5 1 1 1 1 1
2 non-musician 0.556 0.89 0.11 0.67 0.574 0.28 1 0.94 0.759 0.67 0.89 0.72
3 non-musician 0.815 0.89 0.72 0.83 0.537 0.33 0.65 1 1 1 1 1
4 non-musician 0.519 1 0.06 0.44 0.926 0.94 0.83 1 1 1 1 1
5 non-musician 0.407 1 0 0.22 0.556 0.28 0.5 0.89 0.963 1 0.94 0.94
6 non-musician 0.722 0.78 0.72 0.67 0.537 0.17 0.5 0.94 1 1 1 1
7 non-musician 0.5 0.89 0.11 0.5 0.5 0.11 0.39 1 0.963 0.94 0.94 1
8 non-musician 0.444 0.94 0 0.39 0.667 0.56 0.56 0.89 0.87 0.72 0.94 0.94
9 non-musician 0.519 1 0.06 0.44 0.926 0.94 0.83 1 1 1 1 1
10 non-musician 0.481 0.83 0.11 0.5 0.444 0.11 0.22 1 1 1 1 1
11 non-musician 0.556 1 0 0.67 1 1 1 1 1 1 1 1
12 non-musician 0.611 0.94 0 0.89 0.5 0.17 0.5 0.83 0.981 1 0.94 1
13 non-musician 0.528 0.83 0.18 0.56 0.537 0.17 0.44 1 0.926 0.83 0.94 1
14 non-musician 0.7 0.69 0.41 1 0.463 0.17 0.22 1 1 1 1 1
15 non-musician 0.593 0.5 0.44 0.83 0.722 0.5 0.83 0.83 0.34 0.22 0.35 0.44
16 non-musician 0.463 1 0.11 0.28 0.463 0.06 0.33 1 0.519 0.39 1 0.17
17 non-musician 0.627 0.88 0.33 0.67 0.574 0.39 0.33 1 0.981 1 1 0.94
18 non-musician 0.574 0.94 0.11 0.67 0.574 0.22 0.5 1 0.981 1 1 0.94
Appendix A8.20 Statistical Output – Speech Perception Test
Analysis of Variance Summary Table – Perception Overall
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.3295 1 0.3295 13.1538
Error 0.8518 34 0.0251
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0780 1 0.0780 3.1401
B1W1 0.0693 1 0.0693 2.7878
Error 0.8450 34 0.0249
W2 2.0417 1 2.0417 122.7228
B1W2 0.0027 1 0.0027 0.1652
Error 0.5657 34 0.0166
---------------------------------------------------
Analysis of Variance Summary Table – Tone Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.7257 1 0.7257 11.3065
Error 2.0539 32 0.0642
---------------------------------------------------
Within
---------------------------------------------------
W1 3.4631 1 3.4631 120.2344
B1W1 0.3845 1 0.3845 13.3508
Error 0.9217 32 0.0288
W2 0.3133 1 0.3133 20.0532
B1W2 0.0511 1 0.0511 3.2680
Error 0.5000 32 0.0156
---------------------------------------------------
Analysis of Variance Summary Table – Consonant Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.185 1 0.185 5.556
Error 1.662 32 0.052
---------------------------------------------------
Within
---------------------------------------------------
W1 2.913 1 2.913 46.106
B1W1 0.410 1 0.410 6.494
Error 2.022 32 0.063
W2 1.751 1 1.751 50.307
B1W2 0.089 1 0.089 2.571
Error 1.114 32 0.035
---------------------------------------------------
Analysis of Variance Summary Table - Vowel Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.0605 1 0.0605 0.9460
Error 2.0448 32 0.0639
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0097 1 0.0097 0.9476
B1W1 0.0107 1 0.0107 1.0506
Error 0.3265 32 0.0102
W2 0.0632 1 0.0632 2.5817
B1W2 0.0009 1 0.0009 0.0382
Error 0.7833 32 0.0245
---------------------------------------------------
Appendix A8.21 Raw Data - Speech Production Rating Results
Consonant Production Tone Production Vowel Production
Participant Musicianship Overall b p ph Overall 0 1 3 Overall i o u
1 musician 3.044 3.18 2.44 3.51 3.044 3.47 3.18 2.49 3.284 3.32 3.4 3.13
2 musician 3.315 3.18 3.39 3.38 2.992 3.31 2.8 2.86 3.34 3.84 2.91 3.27
3 musician 2.952 2.64 2.84 3.38 2.898 2.67 3.25 2.78 2.878 3.04 2.57 3.02
4 musician 3.154 3.98 2.82 2.67 3.037 3.29 2.78 3.04 3.334 3.6 3.14 3.27
5 musician 3.115 3.5 2.27 3.58 3.253 3.3 3.41 3.05 3.319 3.44 2.98 3.53
6 musician 3.259 2.2 4.4 3.18 3.77 4.33 2.89 4.09 3.897 4.09 3.28 4.32
7 musician 3.399 3.64 2.82 3.73 3.17 3.02 3.31 3.18 3.41 3.36 3.47 3.41
8 musician 2.788 2.31 3.39 2.67 3.044 3.8 2.56 2.78 3.222 3.89 2.2 3.58
9 musician 3.045 2.04 3.09 4 3.311 3.09 4 2.84 3.45 3.48 3.71 3.16
10 musician 3.444 3.67 2.44 4.22 3.859 4.02 4.07 3.49 4.111 3.73 4.22 4.38
11 musician 3.23 3.02 3.04 3.62 3.643 4.18 3.24 3.51 3.978 4.42 3.33 4.18
12 musician 3.059 2.36 3.38 3.44 3.388 3.4 3.4 3.36 3.296 3.67 3.29 2.93
13 musician 3.091 2.52 3.26 3.49 3.548 3.67 3.89 3.09 3.578 3.96 3.36 3.42
14 musician 2.719 2.13 2.5 3.52 3.024 3.53 2.96 2.58 3.375 3.4 3.38 3.35
15 musician 2.741 1.91 3.27 3.04 3.356 3.71 3.71 2.64 2.822 3.29 2.36
16 musician 2.607 2.29 2.91 2.62 3.466 3.93 3.6 2.86 2.656 2.84 2.47
17 musician 3.234 3.82 2.11 3.77 3.395 3.34 4.11 2.73 3.274 3.47 3.31 3.05
18 musician 3.363 2.02 4.07 4 4.169 4.27 3.84 4.4 3.712 3.59 4.36 3.19
1 non-musician 2.932 2.64 2.52 3.63 3.192 3.27 3.47 2.84 3.285 3.6 3.77 2.49
2 non-musician 2.793 2.51 2.96 2.91 2.983 2.73 2.87 3.36 3.2 3.58 3.07 2.95
3 non-musician 3 2.71 2.31 3.98 3 3.13 3.36 2.51 3.281 3.27 3.04 3.53
4 non-musician 2.859 2.51 3 3.07 2.933 2.87 3.42 2.51 2.867 3.67 2.53 2.4
5 non-musician 2.775 2.27 2.66 3.4 2.593 2.98 2.93 1.87 3.135 3.2 2.98 3.22
6 non-musician 2.568 1.81 2.56 3.33 1.924 1.27 3 1.5 2.57 2.73 2.57 2.41
7 non-musician 3.081 2.14 3.57 3.53 3.284 3.56 2.91 3.39 3.641 3.71 3.59 3.62
8 non-musician 2.965 2.18 3.52 3.19 2.737 2.35 3 2.86 3.298 3.43 3.22 3.24
9 non-musician 2.768 2.37 3.11 2.82 3.14 3.13 2.93 3.36 2.956 3.13 2.69 3.05
10 non-musician 2.996 2.91 2.67 3.41 3.444 3.29 3.6 3.385 3.16 3.6 3.4
11 non-musician 3.141 2.49 3.51 3.42 3.415 3.57 3.22 3.45 3.552 4.18 3.5 2.98
12 non-musician 3.201 3.21 3.26 3.13 2.962 3.02 2.75 3.11 3.268 3.54 2.89 3.37
13 non-musician 3.105 3.16 3.24 2.91 2.881 2.67 3.07 2.91 3.11 3.33 2.89 3.11
14 non-musician 3.074 2.64 2.56 4.02 3.326 3.32 3.33 3.304 3.47 3.76 2.69
15 non-musician 3.062 3.67 2.49 3.02 3.154 3.35 2.91 3.2 2.947 3.32 3.02 2.5
16 non-musician 2.648 1.89 2.45 3.6 3.181 3.24 3.48 2.82 3.467 3.22 3.51 3.67
17 non-musician 2.941 2.76 2.76 3.31 3.267 3.89 2.44 3.47 3.815 4.36 3.09 4
18 non-musician 2.836 2.95 2.3 3.27 3.187 3.71 2.64 3.2 3.516 3.73 2.95 3.87
Appendix A8.22 Statistical Output – Speech Production Ratings
Analysis of Variance Summary Table – Overall Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 1.1035 1 1.1035 5.5592
Error 6.7490 34 0.1985
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0205 1 0.0205 0.4059
B1W1 0.1881 1 0.1881 3.7170
Error 1.7204 34 0.0506
W2 1.7514 1 1.7514 47.7272
B1W2 0.0032 1 0.0032 0.0861
Error 1.2477 34 0.0367
---------------------------------------------------
Analysis of Variance Summary Table – Tone Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 3.7316 1 3.7316 9.9563
Error 11.2440 30 0.3748
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0900 1 0.0900 0.3065
B1W1 0.0105 1 0.0105 0.0356
Error 8.8107 30 0.2937
W2 1.4827 1 1.4827 15.8619
B1W2 0.3851 1 0.3851 4.1196
Error 2.8043 30 0.0935
---------------------------------------------------
Analysis of Variance Summary Table – Consonant Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.7154 1 0.7154 5.3179
Error 4.3050 32 0.1345
---------------------------------------------------
Within
---------------------------------------------------
W1 4.6841 1 4.6841 12.3643
B1W1 0.0464 1 0.0464 0.1224
Error 12.1231 32 0.3788
W2 3.9548 1 3.9548 16.0600
B1W2 0.0139 1 0.0139 0.0563
Error 7.8800 32 0.2463
---------------------------------------------------
Analysis of Variance Summary Table – Vowel production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 1.1008 1 1.1008 3.6757
Error 8.9844 30 0.2995
---------------------------------------------------
Within
---------------------------------------------------
W1 0.1876 1 0.1876 1.3645
B1W1 0.0984 1 0.0984 0.7155
Error 4.1255 30 0.1375
W2 1.8119 1 1.8119 12.4173
B1W2 0.0014 1 0.0014 0.0095
Error 4.3776 30 0.1459
---------------------------------------------------
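Each F in these summary tables is the ratio of the effect mean square to its error mean square, with MS = SS/df. A minimal Python check (a sketch, not part of the original analysis) against the B1 effect in the Overall Production table above:

```python
# Recompute MS and F from reported sums of squares and degrees of freedom.
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    ms_effect = ss_effect / df_effect   # MS = SS / df
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Overall Production, between-subjects effect B1 vs its error term:
# SS = 1.1035 (df 1) against SS = 6.7490 (df 34); the table reports F = 5.5592.
f_b1 = f_ratio(1.1035, 1, 6.7490, 34)
print(round(f_b1, 4))
```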
Appendix A8.23 Raw Data – Musical Experience

Participant  Number of Instruments  Years of Training  Hours per Week
1 6 16 15
2 4 5 2.5
3 4 17 10
4 4 17 7
5 3 15 6
6 3 12 3
7 7 22 3.5
8 4 11 7
9 2 5 0
10 5 5 1
11 2 5 3
12 3 26 10
13 1 13 10
14 2 9 14
15 4 15 17.5
16 3 50 3.5
17 1 22 0
18 1 18 4
19 0 0 0
20 0 0 0
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 1 2 0
29 0 0 0
30 0 0 0
31 0 0 0
32 0 0 0
33 0 0 0
34 0 0 0
35 0 0 0
36 0 0 0
Appendix A8.24 Factor Analysis Output – Musical Training
Correlation Matrix

             instruments  years  hoursweek
instruments  1.000        .609   .627
years        .609         1.000  .506
hoursweek    .627         .506   1.000

Communalities

             Initial  Extraction
instruments  1.000    .781
years        1.000    .683
hoursweek    1.000    .699

Extraction Method: Principal Component Analysis.

Total Variance Explained

           Initial Eigenvalues                  Extraction Sums of Squared Loadings
Component  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %
1          2.163  72.107         72.107        2.163  72.107         72.107
2          .495   16.495         88.602
3          .342   11.398         100.000

Extraction Method: Principal Component Analysis.

Component Matrix(a)

             Component 1
instruments  .884
years        .827
hoursweek    .836

Extraction Method: Principal Component Analysis.
a. 1 component extracted.
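The extraction results above follow from the eigendecomposition of the correlation matrix. A short sketch (using numpy, not the original SPSS run) that recovers the eigenvalues and percentages of variance from the three-variable matrix; small discrepancies in the percentages arise because the printed correlations are rounded to three decimals:

```python
import numpy as np

# Correlation matrix from the output above (instruments, years, hoursweek).
R = np.array([[1.000, 0.609, 0.627],
              [0.609, 1.000, 0.506],
              [0.627, 0.506, 1.000]])

# eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse them.
eigvals = np.linalg.eigvalsh(R)[::-1]
pct = 100 * eigvals / eigvals.sum()   # % of variance per component

print(np.round(eigvals, 3))  # first eigenvalue is approximately 2.163
print(np.round(pct, 3))      # component 1 accounts for approximately 72.1%
```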
Appendix A8.25 Raw Data – Sequential Regression

Participant  Musical Training  Language Aptitude  Musical Aptitude  Musical Memory  Tone Perception  Consonant Perception  Vowel Perception
1 2.08656 48 92 0.909 0.944 0.731 1
2 0.31568 39 49 0.625 0.944 0.685 1
3 1.32641 52 82 0.909 0.63 0.833 0.963
4 1.09071 53 96 1 0.907 0.537 1
5 0.74068 51 70 0.909 0.463 0.698 1
6 0.3996 52 88 0.889 1 0.796 0.963
7 1.59498 52 92 0.6 0.852 0.741 0.981
8 0.87997 44 70 0.833 0.704 0.574 0.944
9 -0.28318 40 47 0.917 0.926 0.593 0.87
10 0.39904 51 59 0.9 1.019 0.907 1
11 -0.04747 51 70 0.667 0.519 0.648 0.981
12 1.4413 49 82 0.917 0.944 0.852 0.963
13 0.58228 50 50 0.917 0.852 0.5 0.907
14 0.95727 41 26 0.917 0.593 0.37 0.98
15 1.84543 38 80 0.917 0.648 0.481 0.981
16 1.77355 39 76 0.8 0.981 0.426 0.981
17 0.11269 51 93 0.917 0.87 0.692 0.889
18 0.28648 42 41 0.917 0.63 0.5 1
19 -0.86122 38 32 0.667 0.648 0.385 1
20 -0.86122 35 56 0.917 0.574 0.556 0.759
21 -0.86122 59 88 0.778 0.537 0.815 1
22 -0.86122 46 82 1 0.926 0.519 1
23 -0.86122 42 12 0.667 0.556 0.407 0.963
24 -0.86122 39 32 0.444 0.537 0.722 1
25 -0.86122 44 41 0.833 0.5 0.5 0.963
26 -0.86122 35 38 0.833 0.667 0.444 0.87
27 -0.86122 43 76 0.917 0.926 0.519 1
28 -0.86122 48 50 0.917 0.463 0.315 0.815
29 -0.86122 52 70 1 1 0.556 1
30 -0.86122 50 72 0.833 0.667 0.66 0.963
31 -0.86122 45 65 0.833 0.537 0.528 0.926
32 -0.86122 39 32 0.917 0.593 0.519 0.926
33 -0.86122 44 38 0.833 0.722 0.593 0.34
34 -0.86122 43 35 0.833 . 0.509 0.833
35 -0.86122 47 32 0.9 0.593 0.444 1
36 -0.86122 36 50 0.917 0.574 0.574 0.981
Appendix A8.26 SPSS Output – Sequential Regression
Tone Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .195a  .038      .009               .183577                     .038             1.306     1    33   .261
2      .479b  .230      .182               .166808                     .192             7.969     1    32   .008
3      .494c  .244      .171               .167906                     .014             .583      1    31   .451
4      .522d  .272      .175               .167465                     .028             1.164     1    30   .289

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .044            1   .044         1.306   .261a
   Residual     1.112           33  .034
   Total        1.156           34
2  Regression   .266            2   .133         4.775   .015b
   Residual     .890            32  .028
   Total        1.156           34
3  Regression   .282            3   .094         3.336   .032c
   Residual     .874            31  .028
   Total        1.156           34
4  Regression   .315            4   .079         2.806   .043d
   Residual     .841            30  .028
   Total        1.156           34

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: perctone
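At each step, the Model Summary reports an F for the change in R Square when predictors are added. As a check, this statistic can be recomputed from the tabled R Square values (a sketch; small discrepancies arise because the printed R Square values are rounded to three decimals):

```python
# F for the change in R-square when df1 predictors are added at a step:
#   F_change = (delta_R2 / df1) / ((1 - R2_new) / df2)
def f_change(r2_new, r2_old, df1, df2):
    return ((r2_new - r2_old) / df1) / ((1 - r2_new) / df2)

# Step 2 of the Tone Perception model (adding amma): R Square .038 -> .230.
# The printed output reports F Change = 7.969 (from unrounded R Square values).
print(round(f_change(0.230, 0.038, 1, 32), 2))
```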
Tone Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .253a  .064      .036               .37495                      .064             2.320     1    34   .137
2      .256b  .066      .009               .38025                      .002             .059      1    33   .809
3      .494c  .244      .173               .34728                      .179             7.565     1    32   .010
4      .543d  .295      .204               .34082                      .051             2.224     1    31   .146

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .326            1   .326         2.320   .137a
   Residual     4.780           34  .141
   Total        5.106           35
2  Regression   .335            2   .167         1.157   .327b
   Residual     4.772           33  .145
   Total        5.106           35
3  Regression   1.247           3   .416         3.447   .028c
   Residual     3.859           32  .121
   Total        5.106           35
4  Regression   1.505           4   .376         3.240   .025d
   Residual     3.601           31  .116
   Total        5.106           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: toneprod
Consonant Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .536a  .287      .266               .125963                     .287             13.709    1    34   .001
2      .600b  .360      .322               .121128                     .073             3.769     1    33   .061
3      .662c  .438      .385               .115312                     .078             4.413     1    32   .044
4      .665d  .442      .370               .116774                     .004             .204      1    31   .655

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .218            1   .218         13.709  .001a
   Residual     .539            34  .016
   Total        .757            35
2  Regression   .273            2   .136         9.297   .001b
   Residual     .484            33  .015
   Total        .757            35
3  Regression   .331            3   .110         8.310   .000c
   Residual     .426            32  .013
   Total        .757            35
4  Regression   .334            4   .084         6.128   .001d
   Residual     .423            31  .014
   Total        .757            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: perccons
Consonant Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .519a  .269      .247               .19462                      .269             12.511    1    34   .001
2      .519b  .269      .225               .19748                      .001             .023      1    33   .881
3      .519c  .269      .201               .20054                      .000             .000      1    32   .998
4      .520d  .270      .176               .20362                      .001             .037      1    31   .848

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .474            1   .474         12.511  .001a
   Residual     1.288           34  .038
   Total        1.762           35
2  Regression   .475            2   .237         6.087   .006b
   Residual     1.287           33  .039
   Total        1.762           35
3  Regression   .475            3   .158         3.935   .017c
   Residual     1.287           32  .040
   Total        1.762           35
4  Regression   .476            4   .119         2.872   .039d
   Residual     1.285           31  .041
   Total        1.762           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: consprod
Vowel Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .167a  .028      -.001              .118821                     .028             .979      1    34   .329
2      .247b  .061      .004               .118528                     .033             1.168     1    33   .288
3      .288c  .083      -.003              .118983                     .021             .748      1    32   .394
4      .330d  .109      -.006              .119159                     .026             .905      1    31   .349

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .014            1   .014         .979    .329a
   Residual     .480            34  .014
   Total        .494            35
2  Regression   .030            2   .015         1.076   .353b
   Residual     .464            33  .014
   Total        .494            35
3  Regression   .041            3   .014         .961    .423c
   Residual     .453            32  .014
   Total        .494            35
4  Regression   .054            4   .013         .945    .451d
   Residual     .440            31  .014
   Total        .494            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: percvowel
Vowel Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .325a  .106      .079               .33015                      .106             4.019     1    34   .053
2      .484b  .234      .188               .31011                      .128             5.537     1    33   .025
3      .529c  .279      .212               .30547                      .045             2.009     1    32   .166
4      .531d  .282      .189               .30986                      .002             .100      1    31   .754

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .438            1   .438         4.019   .053a
   Residual     3.706           34  .109
   Total        4.144           35
2  Regression   .970            2   .485         5.046   .012b
   Residual     3.174           33  .096
   Total        4.144           35
3  Regression   1.158           3   .386         4.137   .014c
   Residual     2.986           32  .093
   Total        4.144           35
4  Regression   1.168           4   .292         3.040   .032d
   Residual     2.976           31  .096
   Total        4.144           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: vowelprod
Appendix A8.27 SPSS Output – Alternative Sequential Regression
Tone Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .195a  .038      .009               .183577                     .038             1.306     1    33   .261
2      .407b  .166      .114               .173591                     .128             4.906     1    32   .034
3      .450c  .203      .125               .172457                     .037             1.422     1    31   .242
4      .522d  .272      .175               .167465                     .070             2.876     1    30   .100

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .044            1   .044         1.306   .261a
   Residual     1.112           33  .034
   Total        1.156           34
2  Regression   .192            2   .096         3.183   .055b
   Residual     .964            32  .030
   Total        1.156           34
3  Regression   .234            3   .078         2.624   .068c
   Residual     .922            31  .030
   Total        1.156           34
4  Regression   .315            4   .079         2.806   .043d
   Residual     .841            30  .028
   Total        1.156           34

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: perctone
Tone Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .253a  .064      .036               .37495                      .064             2.320     1    34   .137
2      .319b  .102      .047               .37286                      .038             1.384     1    33   .248
3      .519c  .269      .200               .34155                      .167             7.327     1    32   .011
4      .543d  .295      .204               .34082                      .026             1.137     1    31   .294

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .326            1   .326         2.320   .137a
   Residual     4.780           34  .141
   Total        5.106           35
2  Regression   .519            2   .259         1.865   .171b
   Residual     4.588           33  .139
   Total        5.106           35
3  Regression   1.373           3   .458         3.924   .017c
   Residual     3.733           32  .117
   Total        5.106           35
4  Regression   1.505           4   .376         3.240   .025d
   Residual     3.601           31  .116
   Total        5.106           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: toneprod
Consonant Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .536a  .287      .266               .125963                     .287             13.709    1    34   .001
2      .573b  .329      .288               .124110                     .041             2.023     1    33   .164
3      .616c  .379      .321               .121173                     .051             2.619     1    32   .115
4      .665d  .442      .370               .116774                     .062             3.457     1    31   .073

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .218            1   .218         13.709  .001a
   Residual     .539            34  .016
   Total        .757            35
2  Regression   .249            2   .124         8.072   .001b
   Residual     .508            33  .015
   Total        .757            35
3  Regression   .287            3   .096         6.518   .001c
   Residual     .470            32  .015
   Total        .757            35
4  Regression   .334            4   .084         6.128   .001d
   Residual     .423            31  .014
   Total        .757            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: perccons
Consonant Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .519a  .269      .247               .19462                      .269             12.511    1    34   .001
2      .519b  .269      .225               .19752                      .000             .009      1    33   .925
3      .519c  .269      .201               .20057                      .000             .002      1    32   .967
4      .520d  .270      .176               .20362                      .001             .048      1    31   .827

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .474            1   .474         12.511  .001a
   Residual     1.288           34  .038
   Total        1.762           35
2  Regression   .474            2   .237         6.078   .006b
   Residual     1.287           33  .039
   Total        1.762           35
3  Regression   .474            3   .158         3.930   .017c
   Residual     1.287           32  .040
   Total        1.762           35
4  Regression   .476            4   .119         2.872   .039d
   Residual     1.285           31  .041
   Total        1.762           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: consprod
Vowel Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .167a  .028      -.001              .118821                     .028             .979      1    34   .329
2      .291b  .085      .029               .117028                     .057             2.050     1    33   .162
3      .314c  .099      .014               .117928                     .014             .498      1    32   .485
4      .330d  .109      -.006              .119159                     .010             .342      1    31   .563

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .014            1   .014         .979    .329a
   Residual     .480            34  .014
   Total        .494            35
2  Regression   .042            2   .021         1.529   .232b
   Residual     .452            33  .014
   Total        .494            35
3  Regression   .049            3   .016         1.170   .336c
   Residual     .445            32  .014
   Total        .494            35
4  Regression   .054            4   .013         .945    .451d
   Residual     .440            31  .014
   Total        .494            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: percvowel
Vowel Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .325a  .106      .079               .33015                      .106             4.019     1    34   .053
2      .359b  .129      .076               .33080                      .023             .866      1    33   .359
3      .384c  .148      .068               .33224                      .019             .716      1    32   .404
4      .531d  .282      .189               .30986                      .134             5.789     1    31   .022

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .438            1   .438         4.019   .053a
   Residual     3.706           34  .109
   Total        4.144           35
2  Regression   .533            2   .266         2.434   .103b
   Residual     3.611           33  .109
   Total        4.144           35
3  Regression   .612            3   .204         1.848   .158c
   Residual     3.532           32  .110
   Total        4.144           35
4  Regression   1.168           4   .292         3.040   .032d
   Residual     2.976           31  .096
   Total        4.144           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: vowelprod