LEXICAL TONE PERCEPTION AND
PRODUCTION: THE ROLE OF LANGUAGE
AND MUSICAL BACKGROUND
Barbara Schwanhäußer
M. A.
A thesis submitted for the degree of Doctor of Philosophy
University of Western Sydney
March 2007
I hereby declare that this submission is my own work and, to the best of my
knowledge, it contains no material previously published or written by another person,
nor material which has been accepted for the award of any other degree or diploma at
the University of Western Sydney, or any other educational institution, except where
due acknowledgment is made in the thesis.
I also declare that the intellectual content of this thesis is the product of my own work,
except to the extent that assistance from others in the project's design and conception
is acknowledged.
_________________________
Acknowledgements
There are many people I would like to thank for a large variety of reasons.
Firstly, I would like to thank my supervisor, Prof. Denis Burnham. I could not have
imagined having a better supervisor for my PhD, and without his support, knowledge,
perceptiveness, and good humour I would never have been able to do this work.
His expertise, understanding, and patience added considerably to my PhD experience.
I appreciate his vast knowledge and skill in many areas, such as psychology and
statistics, and his assistance in writing reports (e.g., scholarship applications, research
proposals, papers, and this thesis). He has taught me that scientific research can be not
only rewarding and creative, but also heaps of fun. Thanks a lot, Denis, for your
enthusiasm and for being a great supervisor!
Thanks to Dr. Kate Stevens for helpful feedback in all matters and also to Dr. Chris
Davis, Dr. Christine Kitamura, and Dr. Hartmut Pfitzinger who were on my
supervisory panel.
I was very lucky to work with the people at MARCS Auditory Laboratories, and I am
very thankful for everything I learned over the last few years, including setting up my
experiments at Chulalongkorn University in Bangkok during 2004. I am very glad
that I had the opportunity to visit different laboratories in Germany (thanks to Prof.
Tillmann at IPSK, Munich), Croatia (thank you Dr. Glovacki-Bernardi at the
University of Zagreb), Thailand (Prof. Luksaneeyanawin at CRSLP, Chulalongkorn
University, Bangkok), and the United States (Prof. Ohala, University of California,
Berkeley; Prof. Keating at University of California, Los Angeles; Dr. Patel at The
Neurosciences Institute, San Diego; Prof. Stevens at MIT, Boston; Prof. Byrd at
University of Southern California, Los Angeles) during 2003, 2004, and 2005, and for
support in attending the INTERSPEECH conference in Lisbon, Portugal in 2005 and
the ICMPC conference in Bologna, Italy in 2006.
I would also like to thank the academic and support staff of MARCS Auditory
Laboratories at University of Western Sydney, especially Gail Charlton for assisting
with travel arrangements, Mel Gallagher for her help with ethics and formatting, Dr.
Caroline Jones for her patience and help with set-up and data analysis in the first
stages, Brett Molesworth for help in all areas, Dr. Christian Kroos for assistance with
data analysis and Colin Schoknecht, Arman Abrahamyan (last-minute angel), and
Johnson Chen for technical support.
And a big thanks goes to the people who proofread my thesis: Jess Hartcher-O'Brien,
Nicole Lees, Dr. Christian Kroos, Dr. Kai Mueller, Michael Nash, Dr. Michael Tyler,
and Nan Xu.
I also want to thank Prof. Marcus Taft who made it possible to set up the first
experiment at UNSW, Prof. Sudaporn Luksaneeyanawin for allowing me to test at
CRSLP (Chulalongkorn University, Bangkok), and a very special thank you to
Benjawan Kasisopa for helping me in Bangkok and later at MARCS with everything
related to Thai language and lexical tone. And thank you Sarinya Chompoobutr and
Noppawan Nomura for testing Thai participants in Bangkok. Thanks also to Patana
Surawatanapongs and Naruechol Chuenjamnong for lending their voices in the third
experiment.
I would like to say a big thank-you to all the people who agreed to do the experiments
for this thesis: Brooke Adam, Mary Broughton, Tim Byron, Sean Coward, Mel
Gallagher, Renee Glass, Shaun Halovic, Jemma Harris, Jess Hartcher-O'Brien,
Graham Howard, Clare Howell, Nicole Lees, Paul Mason, Karen Mattock, Brett
Molesworth, Damien Smith, Michael Tyler, and all other students who participated in
the experiments at University of Western Sydney, University of New South Wales,
Chulalongkorn University, and Assumption University in Bangkok.
Thanks to my officemates and friends, Bettina, Jemma, and Mary, for sharing the
room and much more over the last few years, and to Pimo Söderbohm for inspiration.
I have to say a huge Danke! to my friends in Germany and all over the world, for
supporting me in so many different ways.
And a big thank you to Michael, for all the help with programming and formatting,
for giving me love and encouragement, and for making me laugh. You are the best!
Finally, I would like to thank my family. I could not have done this without my
parents' and my sisters' love and support.
Ihr seid Wahnsinn! (You are incredible!)
Abstract
This thesis is concerned with the perception and production of lexical tone. In the first
experiment, categorical perception of asymmetric synthetic tone continua was
examined in speakers of tonal (Thai, Mandarin, and Vietnamese) and non-tonal
(Australian English) languages. It was observed that perceptual strategies for
categorisation depend on language background. Specifically, Mandarin and
Vietnamese listeners tended to use the central tone to divide the continuum, whereas
Thai and Australian English listeners used a flat no-contour tone as a perceptual
anchor; a split based not on tonal vs. non-tonal language background, but rather on the
specific language. In the second experiment, tonal (Thai) and non-tonal (Australian
English) language speaking musicians and non-musicians were tested on categorical
perception of two differently shaped synthetic tone continua. Results showed that,
independently of language background, musicians learn to identify tones more
quickly, show steeper identification functions, and display higher discrimination
accuracy than non-musicians. The third experiment concerned the influence of
language aptitude, musical aptitude, musical memory, and musical training on
Australian English speakers' perception and production of non-native (Thai) tones,
consonants, and vowels. The results showed that musicians were better than
non-musicians at perceiving and producing tones and consonants; a ceiling effect was
observed for vowel perception. Musical training per se did not determine acquisition
of novel speech sounds; rather, musicians' higher accuracy was explained by a
combination of inherent abilities: language and musical aptitude for consonants, and
musical aptitude and musical memory for tones. It is concluded that tone perception is
language dependent and strongly influenced by musical expertise (musical aptitude
and musical memory), not musical training as such.
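The identification measures summarised above (the crossover point between tone categories and the steepness of the identification function) can be sketched minimally as follows. The helper function and the data values below are purely illustrative assumptions, not the thesis's actual stimuli or analysis code: a listener labels each step of a synthetic tone continuum, and we estimate where the labelling crosses 50% and how abruptly it changes.

```python
def crossover_and_slope(proportions):
    """Given p("category A") at each continuum step (steps numbered from 1),
    return (crossover step, peak local slope).

    Crossover: linearly interpolated step at which p crosses 0.5.
    Peak slope: largest absolute change between adjacent steps, a rough
    proxy for the steepness of the identification function.
    """
    crossover = None
    for i in range(len(proportions) - 1):
        a, b = proportions[i], proportions[i + 1]
        # The 0.5 line is crossed between step i+1 and step i+2.
        if (a - 0.5) * (b - 0.5) <= 0 and a != b:
            crossover = (i + 1) + (0.5 - a) / (b - a)
            break
    peak_slope = max(abs(proportions[i + 1] - proportions[i])
                     for i in range(len(proportions) - 1))
    return crossover, peak_slope

# Hypothetical 7-step continuum: a steep (categorical-looking) listener...
steep = [0.98, 0.97, 0.92, 0.55, 0.10, 0.04, 0.02]
# ...and a shallower (more continuous) listener.
shallow = [0.90, 0.80, 0.68, 0.55, 0.42, 0.30, 0.20]

print(crossover_and_slope(steep))
print(crossover_and_slope(shallow))
```

A more categorical listener yields a larger peak slope than a more continuous one; this is the sense in which musicians' identification functions are described as "steeper" than non-musicians'.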
LIST OF FIGURES ........................................................................................................................... XIV
LIST OF TABLES ........................................................................................................................... XVII
CHAPTER 1 INTRODUCTION ................................................................................................................ 1
1.1 OVERVIEW ...................................................................................................................................... 2
1.2 ORGANISATION OF THESIS .............................................................................................................. 2
CHAPTER 2 SPEECH PERCEPTION AND CATEGORICAL PERCEPTION ............................ 4
2.1 THE NATURE OF LANGUAGE AND SPEECH ...................................................................................... 5
2.1.1 Segmental Aspects of Speech ................................................................................................. 5
2.1.1.1 Nature of Consonants ..................................................................................................................... 6
2.1.1.2 Nature of Vowels ........................................................................................................................... 8
2.1.1.3 The Nature of Lexical Tone ........................................................................................................... 9
2.1.2 Suprasegmental Aspects of Speech ...................................................................................... 10
2.1.2.1 Rhythm ......................................................................................................................................... 10
2.1.2.2 Stress ............................................................................................................................................ 10
2.1.2.3 Intonation ..................................................................................................................................... 11
2.1.3 Tone as a Segmental and Suprasegmental Aspect of Spoken Language .............................. 12
2.2 SPEECH PERCEPTION AND CATEGORICAL PERCEPTION ................................................................. 12
2.2.1 Speech Perception Research History - Important Issues and Problems .............................. 13
2.2.2 The Problem of Segmentation and Speaker Variability ....................................................... 14
2.2.3 The Nature of Categorical Perception ................................................................................. 15
2.2.4 Prediction of Discrimination Performance from Identification results ............................... 18
2.3 STIMULUS FACTORS IN CATEGORICAL PERCEPTION ..................................................................... 19
2.3.1 Step-Size ............................................................................................................................... 19
2.3.2 Stimulus Duration ................................................................................................................ 20
2.3.3 Categorical Perception of Different Classes of Speech Sounds ........................................... 20
2.3.3.1 Categorical Perception of Stop Consonants ................................................................................. 21
2.3.3.2 Categorical Perception of Nasal Consonants ................................................................................ 21
2.3.3.3 Categorical Perception of Approximants...................................................................................... 22
2.3.3.4 Categorical Perception of Fricatives............................................................................................. 22
2.3.3.5 Categorical Perception of Vowels ................................................................................................ 23
2.3.4 Categorical Perception of Nonspeech Stimuli ................................................................................. 23
2.3.4.1 Categorical Perception of Continua Unrelated to Speech ............................................................. 24
2.3.4.2 Categorical Perception of Nonspeech Analogues of Speech Sounds ........................................... 24
2.4 TASK AND RESPONSE FACTORS IN CATEGORICAL PERCEPTION .................................................... 26
2.4.1 Identification Task Factors .................................................................................................. 26
2.4.2 Discrimination Task Factors in Categorical Perception ..................................................... 27
2.4.2.1 ABX and AXB Discrimination Tasks .......................................................................................... 27
2.4.2.2 The Two-Interval Two-Alternative Forced-Choice Discrimination Task .................................... 28
2.4.2.3 The AX Discrimination Task ....................................................................................................... 28
2.4.2.4 The Four-Interval-AX Discrimination Task ................................................................................. 28
2.4.2.5 The Four-Interval Oddity Discrimination Task ............................................................................ 29
2.4.3 Methods for Increasing Categoricality ................................................................................ 30
2.4.3.1 Interference with Auditory Memory............................................................................................. 30
2.4.3.2 Decay of Auditory Memory ......................................................................................................... 31
2.4.4 Methods to Reduce Categoricality of Perception ................................................................. 32
2.4.4.1. The Use of More Sensitive Discrimination Paradigms. ............................................................... 32
2.4.4.2 The Use of Rating Scales and Measurement of Reaction Times .................................................. 33
2.5 PSYCHOACOUSTIC STRATEGIES AND EXPERIENTIAL FACTORS IN CATEGORICAL PERCEPTION ..... 34
2.5.1 Practice and Strategies .................................................................................................................... 35
2.5.1.1 Practice and Feedback .................................................................................................................. 35
2.5.1.2 The Use of Strategies in Categorical Perception .......................................................................... 35
2.5.2 The Influence of Specific Linguistic Experience on Categorical Perception .................................. 38
2.5.3 Categorical Perception in Infants .................................................................................................... 39
2.5.4 Categorical Perception in Animals .................................................................................................. 41
2.6 THEORIES OF CATEGORICAL SPEECH PERCEPTION ....................................................................... 42
2.6.1 The Motor Theory of Speech Perception .............................................................................. 42
2.6.2 Articulatory/Auditory Theories ............................................................................................ 45
2.6.3 The Stage Theory of Speech Perception ............................................................................... 45
2.6.4 The Dual-Process-Model ..................................................................................................... 46
CHAPTER 3 PERCEPTION AND PRODUCTION OF LEXICAL TONE .................................. 48
3.1 WHAT IS A TONAL LANGUAGE? .................................................................................................... 49
3.1.1 Tonal Phenomena ................................................................................................................ 50
3.1.2 Notation of Tone .................................................................................................................. 51
3.2 TONAL LANGUAGE SYSTEMS ........................................................................................................ 53
3.2.1 Thai Tones ............................................................................................................................ 54
3.2.2 Vietnamese Tones ................................................................................................................. 55
3.2.3 Mandarin Chinese Tones ..................................................................................................... 56
3.3 TONOGENESIS ............................................................................................................................... 57
3.3.1 Development of Tones from Voicing Contrasts - Tonal Split ............................................... 57
3.3.2 Development of Tones from Consonants .............................................................................. 58
3.3.3 Development of Tones from Vowel Height ........................................................................... 59
3.3.4 Other Influences of Tone Development ................................................................................ 59
3.4 TONE PRODUCTION ....................................................................................................................... 59
3.5 TONE PERCEPTION ........................................................................................................................ 62
3.5.1 Fundamental Frequency and the Auditory System ............................................................... 62
3.5.1.1 The Outer and Middle Ear ............................................................................................................ 62
3.5.1.2 The Inner Ear and the Basilar Membrane ..................................................................................... 63
3.5.1.3 The Transduction Process and the Hair Cells ............................................................................... 64
3.5.1.4 Central structures ......................................................................................................................... 65
3.5.2 Theories of Pitch Perception ................................................................................................ 65
3.5.3 Pitch Perception in Speech – Lexical Tone .......................................................................... 66
3.5.3.1 Perception of Lexical Tone - Overview ....................................................................................... 66
3.5.3.2 Multidimensional Approach to Lexical Tone Perception ............................................................. 66
3.5.3.3 Perception of Tone when Pitch Information is not Available ....................................................... 67
3.5.3.4 Lateralization and Neuroimaging Studies .................................................................................... 69
3.6 TONE ACQUISITION....................................................................................................................... 72
3.6.1 Production of First Language Tone ..................................................................................... 72
3.6.2 Perception of First Language Tone ..................................................................................... 73
3.6.3 Production of Second Language Tone ................................................................................. 74
3.6.4 Perception of Second Language Tone .................................................................................. 75
3.6.4.1 Mandarin Tone Perception by Second Language Learners .......................................................... 75
3.6.4.2 Thai Tone Perception by Second Language Learners .................................................................. 77
3.6.5 Tone Perception as a Function of Tone Language Experience – First and Second Language
studies ........................................................................................................................................... 77
3.7 CATEGORICAL PERCEPTION OF LEXICAL TONE ............................................................................ 78
3.7.1 Categorical Perception of Cantonese and Mandarin Tones ................................................ 79
3.7.2 Categorical Perception of Thai Tones ................................................................................. 82
CHAPTER 4 MUSICAL PITCH AND TONE ................................................................................. 85
4.1 GENERAL CHARACTERISTICS OF MUSIC ....................................................................................... 86
4.1.1 Scales and Intervals ............................................................................................................. 86
4.1.2 Tempo, Rhythm, and Meter .................................................................................................. 87
4.2 MUSIC IN DIFFERENT CULTURES .................................................................................................. 88
4.2.2 Western Music ...................................................................................................................... 88
4.2.2.1 Scales and Intervals in Western Music ......................................................................................... 88
4.2.2.2 Tempo, Rhythm, and Meter in Western Music ............................................................................ 89
4.2.2.3 Brief History of Western Music ................................................................................................... 89
4.2.3 Thai Music ........................................................................................................................... 90
4.2.3.1 Thai Intervals and Scales .............................................................................................................. 90
4.2.3.2 Tempo, Rhythm, and Meter in Thai Music .................................................................................. 91
4.2.3.3 Brief History of Thai Music ......................................................................................................... 91
4.2.4 Similarities and Differences – Western and Thai Music and Singing .................................. 92
4.2.4.1 Music............................................................................................................................................ 93
4.2.4.2 Singing in Tonal Languages ......................................................................................................... 93
4.3 PERCEPTION OF MUSIC – TEMPO, RHYTHM, GROUPING, AND METER ........................................... 95
4.3.1 Perception of Tempo ............................................................................................................ 95
4.3.2 Perception of Rhythmic Patterns ......................................................................................... 96
4.3.3 Perception of Grouping ....................................................................................................... 97
4.3.4 Perception of Meter ............................................................................................................. 97
4.4 PERCEPTION OF MUSIC - PITCH ..................................................................................................... 98
4.4.1 Categorical Perception of Musical Pitch ............................................................................. 99
4.4.2 Relative Pitch ..................................................................................................................... 100
4.4.3 Absolute Pitch .................................................................................................................... 100
4.4.4 Absolute Pitch Memory ...................................................................................................... 102
4.4.5 Developmental Issues in Pitch Perception ......................................................................... 103
4.4.5.1 Pitch Perception Development ................................................................................................... 103
4.4.5.2 Absolute Pitch Perception Development .................................................................................... 105
4.4.5.3 Absolute Pitch Abilities in Infants ............................................................................................. 106
4.4.6 Hemispheric Differences in Pitch Processing .................................................................... 107
4.4.6.1 Lateralization of Pitch Processing .............................................................................................. 107
4.4.6.2 Lateralization of Absolute Pitch ................................................................................................. 108
4.5 MUSIC AND OTHER DOMAINS ..................................................................................................... 109
4.5.1 The Effect of Music on Other Cognitive Abilities............................................................... 111
4.5.2 The Effect of Music on Speech ........................................................................................... 114
4.6 INFLUENCE OF MUSIC ON FOREIGN LANGUAGE SOUND ACQUISITION ........................................ 116
CHAPTER 5 WHAT WE KNOW NOW AND WHAT WE WANT TO FIND OUT ................. 118
5.1 CATEGORICAL PERCEPTION OF ARTIFICIAL TONE CONTINUA .................................................... 119
5.2 INFLUENCE OF MUSICAL BACKGROUND ON TONE PERCEPTION ................................................. 120
5.3 PERCEPTION AND PRODUCTION OF TONES, VOWELS, AND CONSONANTS – INFLUENCE OF MUSICAL
APTITUDE AND LANGUAGE APTITUDE? ............................................................................................. 120
CHAPTER 6 CATEGORICAL PERCEPTION OF SPEECH AND SINE-WAVE TONES IN
TONAL AND NON-TONAL LANGUAGE SPEAKERS ............................................................... 121
6.1 BACKGROUND: RESEARCH ON THE CATEGORICAL PERCEPTION OF TONE .................................. 122
6.2 METHODOLOGICAL ISSUES ......................................................................................................... 123
6.2.1 Stimulus Type Presentation: Blocked vs. Mixed ................................................................ 124
6.2.2 Categorical Perception: Identification and Discrimination .............................................. 124
6.2.2.1 Interstimulus Interval in Discrimination Tasks .......................................................................... 124
6.2.2.2 Refined Operationalisation of Categorical Perception of Tone .................................................. 125
6.2.2.3 Non-Speech Stimulus Materials ................................................................................................. 127
6.3 HYPOTHESES .............................................................................................................................. 127
6.4 EXPERIMENTAL DESIGN ............................................................................................................. 127
6.4.1 Stimuli ................................................................................................................................ 128
6.4.2 Participants ........................................................................................................................ 129
6.4.3 Procedure ........................................................................................................................... 129
6.4.3.1 Identification .............................................................................................................................. 131
6.4.3.2 Discrimination ............................................................................................................................ 132
6.5 ANALYSES .................................................................................................................................. 133
6.5.1 Test Assumptions ................................................................................................................ 133
6.5.2 Language Group Hypotheses ............................................................................................. 134
6.5.3 Strategy Type Hypotheses .................................................................................................. 134
6.6 RESULTS ..................................................................................................................................... 135
6.6.1 Qualitative Evaluation ....................................................................................................... 135
6.6.2 Identification Results.......................................................................................................... 137
6.6.2.1 Trials to Criterion in Identification Training .............................................................................. 137
6.6.2.2 Identification Test Trials: Crossover Values .............................................................................. 139
6.6.2.3 Identification d' Results .............................................................................................................. 141
6.6.3 Discrimination Results ....................................................................................................... 142
6.6.3.1 Overall Discrimination Differences............................................................................................ 142
6.6.3.2 Peak Discrimination Analysis .................................................................................................... 143
6.7 DISCUSSION ................................................................................................................................ 145
6.7.1 Independence of Speech and Non-Speech Processing ....................................................... 145
6.7.2 Perceptual Strategies in Identification and Discrimination ............................................... 146
6.7.3 Categoricality Issues .......................................................................................................... 148
6.8 FURTHER ANALYSIS AND FUTURE DIRECTIONS .......................................................................... 150
CHAPTER 7 PERCEPTION OF SPEECH AND SINE-WAVE TONES: THE ROLE OF
LANGUAGE BACKGROUND AND MUSICAL TRAINING ...................................................... 152
7.1 BACKGROUND: RESULTS OF EXPERIMENT 1 ............................................................................... 153
7.2 METHODOLOGICAL ISSUES ......................................................................................................... 153
7.2.1 Rising and Falling Continua .............................................................................................. 154
7.2.2 Step Size ............................................................................................................................. 156
7.3 HYPOTHESES .............................................................................................................................. 157
7.3.1 Categoricality Differences ................................................................................................. 157
7.3.2 Continuum Shape and Language Background ................................................................... 157
7.3.3 Processing Differences ...................................................................................................... 157
7.4 EXPERIMENTAL DESIGN ............................................................................................................. 158
7.4.1 Stimuli ................................................................................................................................ 158
7.4.2 Participants ........................................................................................................................ 159
7.4.3 Procedure ........................................................................................................................... 159
7.4.3.1 Identification .............................................................................................................................. 160
7.4.3.2 Discrimination ............................................................................................................................ 161
7.5 ANALYSES .................................................................................................................................. 162
7.5.1 Test Assumptions ................................................................................................................ 162
7.5.2 Language Group and Musical Background Hypotheses .................................................... 162
7.5.3 Strategy Type Hypotheses .................................................................................................. 162
7.6 RESULTS ..................................................................................................................................... 163
7.6.1 Identification Training Results ........................................................................................... 163
7.6.1.1 Trials to Criterion in Identification – Rising Continuum ............................................................ 164
7.6.1.2 Trials to Criterion in Identification – Falling Continuum ........................................... 165
7.6.2 Identification Test Trials .................................................................................................... 167
7.6.2.1 Crossover Values – Rising Continuum ...................................................................................... 167
7.6.2.2 Crossover Values – Falling Continuum ...................................................................................... 168
7.6.2.3 Identification d' Results .............................................................................................................. 169
7.6.3 Discrimination Results ....................................................................................................... 171
7.6.3.1 Overall Discrimination Differences............................................................................................ 171
7.6.3.2 Discrimination Peak Analysis .................................................................................................... 173
7.6.4 Qualitative Evaluation and Summary of Results ................................................................ 176
7.7 DISCUSSION ................................................................................................................................ 180
7.7.1 Differences Due to Continuum Shape ................................................................................ 180
7.7.2 Differences Between Musicians and Non-Musicians ......................................................... 181
CHAPTER 8 PERCEPTION AND PRODUCTION OF TONES, CONSONANTS, AND
VOWELS: THE INFLUENCE OF LANGUAGE APTITUDE, MUSICAL APTITUDE,
MUSICAL MEMORY, AND MUSICAL TRAINING .................................................................... 182
8.1 INTRODUCTION ........................................................................................................................... 184
8.2 HYPOTHESES .............................................................................................................................. 185
8.2.1 Separate Abilities ............................................................................................................... 185
8.2.1.1 Musicianship .............................................................................................................................. 185
8.2.1.2 Musical Aptitude ........................................................................................................................ 185
8.2.1.3 Musical Memory ........................................................................................................ 186
8.2.1.4 Language Aptitude ..................................................................................................... 186
8.2.2 Relationship between Perception and Production ............................................................. 186
8.2.3. Determinants of Perception and Production .................................................................... 186
8.3 METHOD ..................................................................................................................................... 187
8.3.1 Participants ........................................................................................................................ 187
8.3.2 Musical Aptitude ................................................................................................................ 187
8.3.3 Musical Memory for Pitch ................................................................................................. 189
8.3.4 Foreign Language Aptitude ............................................................................................... 190
8.3.5 Stimulus Material for Perceptual Identification and Production Tasks ............................. 191
8.3.6 Perception of Speech Sounds ............................................................................................. 193
8.3.7 Production of Speech Sounds ............................................................................................. 194
8.4 RESULTS: SEPARATE ABILITIES .................................................................................................. 195
8.4.1 Musical Aptitude Results .................................................................................................... 195
8.4.2 Musical Memory Results .................................................................................................... 195
8.4.3 Foreign Language Aptitude Results ................................................................... 196
8.4.4 Speech Perception .............................................................................................................. 197
8.4.5 Speech Production ............................................................................................................. 202
8.5 RESULTS: COMPARISON OF PERCEPTION AND PRODUCTION ....................................................... 205
8.6 RESULTS: DETERMINANTS OF PERCEPTION AND PRODUCTION ................................................... 205
8.6.1 Factor Analysis for Data Reduction .................................................................................. 205
8.6.2 Correlations Between Variables ........................................................................................ 206
8.6.3 Sequential Regressions ...................................................................................................... 208
8.6.3.1 Tone Perception and Production ................................................................................ 209
8.6.3.2 Consonant Perception and Production ........................................................................ 210
8.6.3.3 Vowel Perception and Production .............................................................................. 210
8.7 DISCUSSION ................................................................................................................................ 216
8.7.1 The Nature of Musicianship ............................................................................................... 217
8.7.2 Musicianship and the Perception and Production of Speech Sounds ................................ 218
8.7.3 Musical Determinants of Speech Perception and Production ........................................... 219
CHAPTER 9 DISCUSSION .............................................................................................................. 222
9.1 SUMMARY OF RESULTS............................................................................................................... 223
9.1.1 Experiment 1: Categorical Perception of Speech and Sine-Wave Tones in Tonal and Non-
Tonal Language Speakers ........................................................................................................... 223
9.1.2 Experiment 2: Perception of Speech and Sine-wave Tones - The Role of Language
Background and Musical Training ............................................................................................. 224
9.1.3 Experiment 3: Perception and Production of Tones, Vowels, and Consonants - The
Influence of Training, Memory, and Aptitude ............................................................................. 225
9.2 STRATEGY EFFECTS IN TONE PERCEPTION ................................................................................. 225
9.3 MUSICIANS' ADVANTAGES IN SPEECH PERCEPTION AND PRODUCTION ...................... 227
9.3.1 Musical Experience – Transfer to Musical Tasks .............................................................. 228
9.3.2 Musical Experience – Transfer to Related Linguistic Tasks .............................................. 228
9.3.3 Music Experience – Transfer to Less Related Linguistic Abilities ..................................... 229
9.4 LOCUS OF MUSICIANS' SUPERIORITY ......................................................................... 230
9.5 SUGGESTIONS FOR FUTURE RESEARCH ....................................................................................... 232
9.5.1 Relationship between Tone Space and Intonation Space ................................................... 232
9.5.2 Investigation of Relationship Between Musical Training and Musical Aptitude ............... 232
9.5.3 Development of Musicality ................................................................................................. 232
9.5.4 Acoustic Analyses of Speech Production Ability ................................................................ 233
9.5.5 Psychoacoustic Processing Investigation .......................................................................... 233
9.6 CONCLUSION .............................................................................................................................. 233
REFERENCES………….…………………………………………………………...………………..235
APPENDIX………….……………………………………………………………...………………..272
LIST OF FIGURES
Figure 2.1. IPA chart of consonant sounds…………………………………………...7
Figure 2.2. The vowel chart of the International Phonetic Alphabet…………………8
Figure 2.3. Schematic spectrograms of the syllables [du] and [di]………………….14
Figure 2.4. Idealised categorical perception result…………………………………..16
Figure 2.5. Idealised continuous perception result…………………………………..17
Figure 3.1. Time normalised fundamental frequency contours of Thai tones……….54
Figure 3.2. Time normalised fundamental frequency contours of Vietnamese tones…..55
Figure 3.3. Time-normalised fundamental frequency contours of Mandarin tones…….56
Figure 3.4. View of the larynx ………………………………………………………60
Figure 3.5. Schematic figure of the vocal folds during phonation…………………...61
Figure 3.6. Anatomy of the ear: outer ear, middle ear, and inner ear………………..63
Figure 3.7. Anatomy of the cochlea…………………………………………...……..64
Figure 4.1. Comparison between the Thai and the Western Scales………………….91
Figure 6.1. Mid-Continuum Response Strategy……………………………………126
Figure 6.2. Flat-Anchor Response Strategy………………………………………...126
Figure 6.3. F0 characteristics of the asymmetric tone continuum…………………..129
Figure 6.4 Identification and discrimination results across languages……………..135
Figure 6.5 Descriptive statistics for trials to criterion scores………………….……137
Figure 6.6. Descriptive statistics for trials to criterion scores……………...………138
Figure 6.7. Mean d' identification scores…………………………………………...140
Figure 6.8. Mean d' identification scores…………………………………………..140
Figure 6.9. Mean d' discrimination scores………………………………………….142
Figure 6.10. Mean d' discrimination scores………………………………………...143
Figure 7.1. Mid-Continuum Response Strategy……………………………………153
Figure 7.2. Mid-Continuum Response Strategy……………………………………153
Figure 7.3. Flat-Anchor Response Strategy………………………………………...154
Figure 7.4. Flat-Anchor Response Strategy………………………………………….154
Figure 7.5. F0 characteristics of the two asymmetric tone continua………………..157
Figure 7.6. Trials to criterion scores for the rising continuum……………………..162
Figure 7.7. Trials to criterion scores for the falling continuum…………………...164
Figure 7.8. Crossover values for the rising continuum for Thai listeners………......165
Figure 7.9. Crossover values for the rising continuum for Australian listeners……165
Figure 7.10. Crossover values for the falling continuum for Thai listeners……......166
Figure 7.11. Crossover values for the falling continuum for Australian listeners….166
Figure 7.12. Descriptive statistics for d' values for Thai listeners………...……….167
Figure 7.13. Descriptive statistics for d' values for Australian English listeners…..168
Figure 7.14. Descriptive statistics for d' values for Thai listeners………………...168
Figure 7.15. Descriptive statistics for d' values for Australian English listeners…168
Figure 7.16. Descriptive statistics for discrimination accuracy in Thai listeners…..169
Figure 7.17. Descriptive statistics for discrimination accuracy in Australian listeners……………………………………………………………………………...169
Figure 7.18. Descriptive statistics for discrimination accuracy in Thai Listeners….169
Figure 7.19. Descriptive statistics for discrimination accuracy in Australian listeners…………………………………………………………………………...…170
Figure 7.20. Descriptive statistics for the Flat vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians.………………………………...171
Figure 7.21. Descriptive statistics for the Mid Pair vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians…………………………………171
Figure 7.22. Descriptive statistics for the Flat vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians………………………………...172
Figure 7.23. Descriptive statistics for the Mid Pair vs. the other stimulus-pairs – Thai and Australian musicians and non-musicians…………………………………172
Figure 7.24 Identification and discrimination results across languages……………175
Figure 7.25 Identification and discrimination results across languages……………176
Figure 8.1. Stylised versions of tonal contours used to label keys.………………...191
Figure 8.2. Descriptive statistics for mean percentile-ranking scores in the musical aptitude test……………………………………………………...…………192
Figure 8.3. Descriptive statistics for musical memory results for musicians and non-musicians for shift size and shift direction…………………………………….193
Figure 8.4. Descriptive statistics for musical memory results for musicians and non-musicians across shift size and shift direction…………………………………193
Figure 8.5. Descriptive statistics for mean scores in the language aptitude test……194
Figure 8.6. Descriptive statistics for mean trials to criterion scores………………..194
Figure 8.7. Descriptive statistics for perception accuracy………………………….195
Figure 8.8. Descriptive statistics for tone perception scores……………………….195
Figure 8.9. Descriptive statistics for consonant perception………………………...195
Figure 8.10. Descriptive statistics for vowel perception……………………………195
Figure 8.11. Descriptive statistics for speech production…………………………..200
Figure 8.12. Descriptive statistics for tone production……………………………..200
Figure 8.13. Descriptive statistics for consonant production……………………….200
Figure 8.14 Descriptive statistics for vowel production……………………………200
LIST OF TABLES
Table 3.1. The Five Lexical Tones of Standard Thai………………………………...52
Table 6.1. Description of the Language Hypotheses …………………………….…133
Table 6.2. Planned Contrasts for the Strategy Type Hypotheses……...……………134
Table 6.3. Descriptive Statistics for Crossover Values.………………………….…139
Table 6.4. Mean d’ values for Identification and Discrimination in Musicians and Non-Musicians ………………………………….……………………………..148
Table 7.1. Planned Contrasts for Strategy Type Hypotheses – Rising continuum...160
Table 7.2. Planned Contrasts for Strategy Type Hypotheses – Falling continuum..161
Table 8.1. Matrix Showing Stimuli Differing on Three Levels……………………..189
Table 8.2. Description of the Speech Sound Type Planned Contrasts……………...196
Table 8.3. Description of the Tone Contrasts………………………………...…….197
Table 8.4. Description of the Consonant Contrasts………………………….……..198
Table 8.5. Description of the Vowel Contrasts …………………………………….199
Table 8.6. Principal Component Loadings and Communalities …………………...203
Table 8.7. Intercorrelations Among Language Aptitude, Musical Aptitude, Musical Memory, and Musical Training…………………………………...……….204
Table 8.8. Descriptive Statistics and Correlations ………………………………...204
Table 8.9. Summary of Significant Predictors ……………………………………..208
Table 8.10. Summary of Significant Predictors – Alternative…………………..….208
1.1 Overview
This thesis is concerned with the perception and production of lexical tone.
In tonal languages such as Vietnamese, Mandarin, or Thai, in addition to differences in
consonants and vowels, differences in tone (fundamental frequency changes, perceived
as pitch differences) can be used to distinguish meaning (1962). In Thai, for example,
the word [mai] can mean “wood”, “not”, “silk”, “burn”, or “new” depending on what
tone it is pronounced with. In speech science research, consonants and vowels have
received most of the attention, while tones in tonal languages have been relatively
neglected. However, more than half of the world's population are tonal language
speakers, and an estimated 60% to 70% of the world's languages are tonal. On the
basis of this prevalence alone, tones, as well as consonants and vowels, need to be
considered in all areas of speech perception and production research.
In addition, given the much greater importance of fundamental frequency (F0) in tone,
studies of tones may reveal aspects of speech processing that studies of consonants and
vowels have left uncovered. Such processes may well be elucidated by studying
speakers of tonal and non-tonal languages, and in this thesis both will be investigated.
F0 is also very important in music: indeed, changes in F0 create musical melody and
harmony. Given that tone is mainly conveyed by F0, it is of interest whether
or not tone perception is influenced by previous experience with musical tone.
Therefore, in this thesis, musicians' and non-musicians' perception of tone will also be
investigated. In addition, the relative contributions of language aptitude, musical
aptitude, training, and memory will be considered.
1.2 Organisation of Thesis
To understand processes involved in speech perception in general, studies of consonant
and vowel perception are reviewed in Chapter Two. Differences and similarities
between those classes of speech sounds will be considered, with a focus on studies
concerning the phenomenon of categorical perception. This will be followed in Chapter
Three by a review of research in the area of lexical tone, especially tone perception.
Tonal phenomena, such as tonogenesis, tone acquisition, and categorical tone
perception will also be considered. In Chapter Four, research on music, especially pitch,
will be summarised. Differences between musical pitch and lexical tone perception will
be explicated, including findings concerning singing vs. speaking, hemispheric
differences, and categorical pitch perception.
Chapter Five provides a summary of the introductory chapters, and sets up the research
issues that will be addressed in the experiments. Methods, results, and discussion of the
three experimental studies will be presented in Chapters Six, Seven, and Eight. Chapter
Six concerns categorical perception of speech and sine-wave tones in tonal and non-
tonal language speakers. Chapter Seven considers the influence of language and
musical background on the perception of speech and sine-wave tones and musical
memory. Chapter Eight investigates the influence of musical memory, musical aptitude,
and language aptitude on the perception and production of tones, consonants, and
vowels by musicians and non-musicians.
Finally, Chapter Nine provides a general discussion of the findings in terms of
perceptual strategies and the interdependence of tonal language and musical background,
and concludes the thesis by noting implications and directions for future research.
2.1 The Nature of Language and Speech
The main purpose of language is to convey information. There are many different ways
to transmit information through language. Some examples are Braille transcription, sign
language and Morse code. Spoken and written language are the most common forms of
language.
Speech, the spoken form of language, is not the same as language, because in addition
to the linguistic content of what is said, a great deal of non-linguistic information is
conveyed by speech. When we hear somebody speak, we usually need only a few
moments to learn many things about the person we are talking to: where they come from,
which social group they belong to, and whether they are in a good or a bad mood. We
can also gauge their state of health and other important speaker characteristics.
The focus of the current experiments is on both the perception and production of
speech, particularly tones. Sections 2.1.1 and 2.1.2 will introduce segmental and
suprasegmental aspects of speech, ahead of a discussion of categorical speech
perception.
2.1.1 Segmental Aspects of Speech
In articulatory terms, speech sounds differ in whether or not the airflow coming from
the lungs is obstructed in the vocal tract1 and if so, at what point and in what manner it
is obstructed. On this articulatory basis, there are two broad classes of segments in
spoken language: consonants and vowels. Vowels are produced by allowing air to flow
from the lungs in an unobstructed way, whereas consonants are characterised by an
obstruction of the vocal tract. A third class of segments, usually carried on vowels, is
lexical tone. Lexical tone is the distinctive pitch level and/or contour carried by the
syllable(s) of a word, in cases in which tone is an essential feature of the meaning of
that word. In the past, lexical tone has often been neglected in spoken language
research, even though an estimated 60% to 70% of the world's languages are tonal
languages and more than half of the world's population are tonal language speakers
(Yip, 2001).
1 The term vocal tract refers to the whole of the air passage above the larynx, the shape of which is the main factor affecting the quality of speech sounds.
In order to understand the following information about speech, it is important to define
some terms that will be used.
Consonants and vowels can be phonetic as well as phonemic categories. A sound that is
distinguished on the basis of phonetic or articulatory features is called a phone, whereas
a category of sounds that are used to distinguish meaning in a particular language is
called a phoneme. In English, for example, /t/ and /d/ are different phonemes, and 'tent'
and 'dent' have different meanings. Phonemes can have different phonetic realisations,
which are called allophones. An allophone is one member of a phonemic category. In
English, [tʰ] as in 'toast', [t] as in 'stand', and unreleased [t̚] as in 'pot' are allophones of
the phoneme category /t/, even though their articulatory realisation and acoustic
characteristics are different.
As demonstrated above, the convention in written text is that slashes are used to
indicate phonemes, /t/, and phones are written in square brackets, [t].
2.1.1.1 Nature of Consonants
In the consonant chart of the International Phonetic Association (IPA), consonants are
classified according to their place of articulation and articulatory organs as well as the
manner in which they are produced (see Figure 2.1).
Another important distinction among consonants is whether the vocal cords2 are
closed or separated as air coming from the lungs passes between them. If the vocal
cords are kept close together, the air stream must force its way through the glottis3,
causing the vocal cords to vibrate. The resulting sound is called a voiced speech sound,
as in [z]. If the cords are separated, the air is not obstructed at all, and the sound is
called voiceless, as in [s]. In the IPA chart (Figure 2.1), voiceless (consonants on the left
of a cell) and voiced (consonants on the right) versions of the consonants are shown.
There are 11 consonant classes in the IPA chart, but in this section only the four
consonant classes that are used in English are described: plosives, nasals, fricatives,
and affricates.
2 The vocal cords are two bands of mucous membrane that are situated in the larynx. The vocal cords vibrate when they are adducted.
3 The glottis is the opening between the vocal cords at the upper part of the larynx.
Figure 2.1. IPA chart of consonant sounds, charted by manner (vertical axis), place (horizontal axis) of articulation and voicing (left vs. right member of each cell) (International Phonetics Association, 1999).
Stop consonants or plosives, of which English has /b/, /p/, /d/, /t/, /g/, and /k/, are
produced by completely occluding the vocal tract at a single place of articulation with
the lips, the tongue tip, or the tongue body. In /b/ and /p/, the closure occurs between
the lips, in /d/ and /t/ between the tongue tip and the alveolar ridge, and in /g/ and /k/ the tongue
body occludes the vocal tract at the velum. In a plosive, vocal tract air pressure is built
up during the closure phase and then released with a rapid opening movement that
causes a noise burst.
A second class of consonants, called nasals, involves the lowering of the velum during
an oral closure, so that the airflow travels through the nose, rather than through the oral
cavity, as in /m/, which is produced with closed lips, or /n/, where the tip of the tongue
occludes the vocal tract at the upper alveolar ridge. In /ŋ/, the back of the tongue
touches the velum.
Fricatives are produced by creating a narrow constriction, usually via tongue tip or
tongue body placement, and an appropriate level of air pressure to produce turbulence
and thus fricative noises. In English, there are five different places of articulation for
fricatives: labiodental (/f/ as in 'fast' and /v/ as in 'vast'), interdental (/θ/ as in 'thunder'
and /ð/ as in 'though'), alveolar (/s/ as in 'sue' and /z/ as in 'zoo'), palatal (/ʃ/ as in
'shade' and /ʒ/ as in 'measure'), and glottal (/h/ as in 'house').
Plosives that are released as fricatives are called affricates. English has two affricates:
[tʃ] as in 'chicken' and [dʒ] as in 'jockey'. The closure is alveolar, as in /t/
or /d/, and friction occurs during the release.
Consonants are generally described as fast-changing parts of the speech signal. These
changes, called transitions, can also be seen in visual representations of the speech
signal. Vowels, in contrast, show more stable formant patterns and are described in the
following section.
2.1.1.2 Nature of Vowels
Vowels are characterised by an open vocal tract. Vowels are distinguished from one
another by whether they are produced in the front, [i], [a], in the centre, [ə], or in the
back, [u], of the oral cavity, whether the tongue position is high, [i], middle, [e], or low,
[a], and whether the lips are rounded, [y], or unrounded, [i], (see Figure 2.2).
Figure 2.2. The vowel chart of the International Phonetic Alphabet. The vowel quadrangle shows the extreme vowel positions in articulation. Horizontally, frontness/backness of the vowels (acoustically measured by the second formant, F2) is plotted. Vertically, vowels are plotted according to tongue position (acoustically measured by the first formant, F1). Where symbols appear in pairs, the one to the right represents the rounded version of the vowel. Figure reproduced from the Handbook of the International Phonetic Association (1999).
The number of vowels varies across languages. Aranda, for example, an Australian
language, has only three vowels, whereas twenty different vowels are found in
Punjabi (Pompino-Marschall, 1995). The vowel system of English contains 14 different
vowels. Vowels, as opposed to consonants, have rather steady-state
characteristics, which means that there is a phase in the vowel during which the
acoustic and articulatory characteristics are relatively stable. Such steady-state vowels
are called monophthongs. Other vowels have more than a single steady state: their
quality changes as the tongue moves in the course of their production, and these are
called diphthongs. Examples of diphthongs are /eɪ/ as in 'face' and /ɔɪ/ as in 'boy'.
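Since the caption of Figure 2.2 notes that vowel backness corresponds acoustically to the second formant (F2) and vowel height to the first formant (F1), a small sketch may make this concrete. The reference formant values below are rough adult-male averages in the style of Peterson and Barney (1952) and are assumptions for illustration only, not values drawn from this thesis.

```python
import math

# Rough adult-male average formant values (Hz) for three corner vowels;
# illustrative assumptions only -- real values vary by speaker and language.
REFERENCE_VOWELS = {
    "i": (270, 2290),   # high front: low F1, high F2
    "a": (730, 1090),   # low vowel: high F1
    "u": (300, 870),    # high back rounded: low F1, low F2
}

def classify_vowel(f1, f2):
    """Return the reference vowel nearest to the measured (F1, F2) pair,
    with distances taken in log-frequency space."""
    def distance(symbol):
        r1, r2 = REFERENCE_VOWELS[symbol]
        return math.hypot(math.log(f1 / r1), math.log(f2 / r2))
    return min(REFERENCE_VOWELS, key=distance)

print(classify_vowel(300, 2200))  # -> i  (low F1, high F2: high front)
print(classify_vowel(700, 1200))  # -> a  (high F1: low vowel)
```

Real vowel classification is far more involved (speaker normalisation, formant dynamics, duration), but the nearest-neighbour idea illustrates why F1 and F2 serve as the two axes of the vowel chart.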
2.1.1.3 The Nature of Lexical Tone
In English, consonants and vowels are the only segmental features used to differentiate
the meaning of words. However, there is a third feature that plays a role in well over
half of the world's languages – lexical tone.
In around 60% to 70% of the world's languages, the pitch height and/or pitch contour of
vowels is used as a lexical feature. These differences in pitch are called lexical tone,
and the languages that make use of tone as a lexical feature are called tonal languages.
In tonal languages, such as Mandarin Chinese or Vietnamese, the meaning of a word is
not only determined by its vowels and consonants but also by pitch height and pitch
contour. Tone is not a lexical feature in English.
There are two kinds of tonal languages: register tone languages, where the pitch height
of the tones is relatively level, and contour tone languages, in which the contour of at
least some of the tones is more important than the absolute pitch height.
The description of tones in categories of height and contour is relative: it is not the
absolute pitch that makes a tone high or low, but rather its pitch height relative both to
the pitch range of the particular speaker and to the accompanying intonation. Tone is mainly
specified by the psychological dimension of pitch, as measured by the acoustic variable
of fundamental frequency, but other aspects such as duration4, amplitude5 and voice
register6 can also play a role in tone. A comprehensive overview of pitch and lexical
tone is presented in Chapter 3.
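Since tone is specified chiefly by pitch, measured acoustically as fundamental frequency, a minimal sketch of one standard way to estimate F0 from a voiced frame (the autocorrelation method) may make the acoustic side concrete. The synthetic signal, sampling rate, and search range below are illustrative assumptions, not the analysis procedure used in this thesis.

```python
import math

import numpy as np

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame by
    picking the strongest autocorrelation peak in a plausible lag range."""
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)   # shortest candidate period
    lag_max = int(sample_rate / f0_min)   # longest candidate period
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / best_lag

# An artificial "voiced" frame: a 220 Hz fundamental plus one harmonic.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr
frame = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

f0 = estimate_f0(frame, sr)
print(f"{f0:.1f} Hz")  # close to 220 Hz

# Because tone height is relative, F0 is often re-expressed in semitones
# relative to a speaker-specific reference (100 Hz is an arbitrary choice here).
print(f"{12 * math.log2(f0 / 100):.1f} st")
```

The semitone conversion in the last line is one common way of capturing the point made above: what matters is F0 relative to a reference within the speaker's own range, not absolute frequency.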
4 Duration is the acoustic feature of length of time, measured in seconds or milliseconds.
5 Amplitude is an acoustical measure that refers to the extent to which an air particle moves to and fro around its rest point in a sound wave. The greater the amplitude, the greater the intensity of a sound, and the greater the sensation of loudness.
6 Voice register refers to the voice quality produced by a specific physiological constitution of the larynx. Variations in the length, thickness, and tension of the vocal cords combine to produce different types of phonation, such as creaky or breathy voice.
2.1.2 Suprasegmental Aspects of Speech
The physical correlate of pitch is fundamental frequency (F0) of the voiced parts of the
acoustic speech signal. As opposed to lexical tone, which can be considered to be a
segmental feature, there are suprasegmental aspects of pitch, which extend not just over
syllables or words, but also over whole utterances. These are rhythm, stress, and
intonation and they are used in tonal languages as well as in non-tonal languages. These
features are reviewed in the following sections.
2.1.2.1 Rhythm
Rhythm in speech is a matter of timing within an utterance7. It can be regarded as the
relationship between strong and weak beats (or stresses). Extended utterances such as
sentences always display a mix of strong and weak beats. In speech, rhythm is apparent
at the word level, as in 'DOCtor' vs. 'guiTAR' (capitalised syllables are
stressed), as well as at the sentence level, as in 'HE did it.' vs. 'He DID it.' Rhythmic
organisation is not the same in all languages. English, for example, is a stress-timed
language in which, it is hypothesised, the durations between consecutive stressed
syllables are roughly equal (Abercrombie, 1967). Other types of rhythmic organisation
are syllable-timed languages, such as French, in which the syllables are said to occur at
regular intervals in time, and mora-timed languages, such as Japanese, in which the
rhythmic units are moras: units similar to, but generally smaller than, syllables, and
roughly equal to one another in duration (Crystal, 2003).
In English, the stress foot determines the rhythm. The stress foot is the rhythmic unit
consisting of a stressed syllable and any following unstressed syllables (Echols,
Crowhurst, & Childers, 1997; Shattuck-Hufnagel & Turk, 1996). The pattern of stress feet found in
English is predominantly trochaic, which means that a stressed syllable is usually
followed by at most one unstressed syllable: a strong-weak pattern (Nooteboom, 1997).
2.1.2.2 Stress
Linguistic stress in speech can operate on two levels: the word and the sentence level.
Word stress can be phonemic in many languages. In English, for instance, word pairs
can be distinguished by their stress pattern. In some words, if the stress is moved from
the first to the second syllable in bi-syllabic words, the word meaning changes with
respect to its lexical class.

7 An utterance is a complete unit of speech in spoken language. It can consist of one or more words and is generally but not always bounded by silence.

„SUBject‟, for example, with the stress on the first syllable,
means a topic, whereas when the stress is on the second syllable, „subJECT‟ means to
cause somebody to undergo something. Other examples are „PERmit‟ versus
„perMIT‟, and „ABstract‟ versus „abSTRACT‟.
Word stress is not predictable in its placement in English – it must be learned with each
word. In many other languages, word stress is predictable; in French for example it is
always on the last syllable („mademoiSELLE‟, „bon voYAGE‟).
Stress is often signalled by loudness (amplitude), but stressed syllables are not always
louder than others; other factors like duration (lengthening of a stressed syllable), pitch
shift (higher pitch in stressed syllables), and changed spectral characteristics8 are also
very important and reliable features that signal stress.
Sentence stress (or focus) is an important part of the phonology of all languages. This
prosodic feature indicates where the important information point of the sentence is.
Consider the sentence „She was NOT supposed to read about these decisions‟ vs. „She
was not supposed to READ about these decisions‟. In the first, the meaning is that the
person should not have been informed about the decisions at all, whereas the second
raises the possibility that she learned about them by means other than reading.
2.1.2.3 Intonation
Intonation is the course of the pitch pattern across an utterance. Over the course of an
utterance, irrespective of fundamental frequency changes resulting from the particular
accentuation patterns, there is a general declination in F0 over time. Over and above
this, intonation contours can serve to transmit differences in meaning. An utterance that
has a falling intonation contour as in „She was there.‟ is usually perceived as a
statement whereas the same sentence produced with a rising intonation contour would
be perceived as a question – „She was there?‟
Apart from the use of pitch for lexical distinctions in tonal languages (see 2.1.1.3 and
Chapter 3), pitch is also used in speech for so-called paralinguistic9 purposes.

8 The sound spectrum, as represented in spectrograms, includes time, frequency and intensity relationships, notably seen as bands of energy called formants (see Figure 2.3). In this case these variations could include changes to the formant structure.

9 Paralinguistic refers to the set of non-phonemic properties of speech, such as speaking tempo, vocal pitch, and intonational contours, which can be used to communicate attitudes or other shades of meaning.

This kind of pitch variation is known as emotional prosody, and it refers to the mechanism by
which personal characteristics and emotional states are indicated in the intonation
contour (Abercrombie, 1968; Kramer, 1963). This affective function of prosody has
been seen to reflect individual psychological states more than general features of the
language community (Fry, 1969, 1970). Some attempts have been made to analyse
affect in the acoustic signal (Lieberman & Michaels, 1962), for example Williams and
Stevens (1972) found regular patterns of pitch change associated with anger and fear.
Closely related to the affective level is the expression of attitudes through prosody,
including attitudes towards the speaker, towards the content of the utterance, or towards
the listener (Van Lancker, 1980). This application of prosody is used to express a more
personal commentary on the sentence that is produced. In this way prosody can be used
to express rather subtle notions such as hesitancy and irony. The use of prosodic
parameters as transmitters of attitudes is thought to be universal across all human
language systems (W. Wang, 1971).
2.1.3 Tone as a Segmental and Suprasegmental Aspect of Spoken Language
Now that both the segmental and suprasegmental aspects of language have been
considered, it can be seen that lexical tone has a special and even ambiguous nature. On
the one hand lexical tone is segmental in that it distinguishes meaning at the lexical level,
as do other segments, consonants and vowels. On the other hand it is suprasegmental in
that it uses pitch variation (often over time), as do suprasegmental cues such as intonation, to
do so. In this thesis, the perception of tone in speech contexts and pitch in non-speech
contexts will be investigated on a segmental level.
2.2 Speech Perception and Categorical Perception
Over the past half century, one of the major goals of speech scientists has been to map
the acoustic properties of the speech signal onto linguistic elements such as phonemes. This
mapping has turned out to be rather complex, and a complete explanation of
how humans identify consonants and vowels remains elusive.
Speech perception can be divided into three levels (Studdert-Kennedy, 1976). At the
auditory level, the signal is represented in terms of its frequency, intensity, and
temporal attributes (features that can be identified in the acoustic signal) as with any
other auditory stimulus. At the phonetic level, particular acoustic cues, such as formant
transitions, duration, etc. are perceived as specific speech segments, phones. At the
phonological level, phonetic segments are classified into language-specific phonemes
in terms of phoneme classes and phonological rules are applied to the perceived
utterance. These three levels may be interpreted as successive discrimination processes
applied to the speech signal. First the auditory signal is separated from other sensory
signals and registered as a perceived event. Then
the special properties that qualify it as speech allow it to be separated from other
sounds. Finally, its specific characteristics allow it to be recognised as meaningful
speech of a particular language.
In this thesis the phonetic and phonological stages of perception are of most interest, and
within these levels the manner in which speech sounds are classified: categorical
perception. In the following sections, previous research in the area of categorical
perception will be summarised and the major theoretical standpoints explained.
2.2.1 Speech Perception Research History - Important Issues and Problems
Up to the middle of the 20th century, most research in speech perception was conducted
by telephone companies such as the Bell Laboratories in the United States. Their major
interest was to reduce the acoustic speech signal in order to be able to use the available
capacity of the transmitting media as effectively as possible. As a result of experiments
testing speech intelligibility, the telephone, to this day, only transmits frequencies
between 300 Hz and 3 kHz, as it was found that frequencies outside this range are
not necessary for speech to be understood (Pompino-Marschall, 1995).
Much of academic speech perception research started in the context of a technical
failure: The Haskins Laboratories in New York had planned to develop a reading
machine for the blind in the 1950s; their goal was to encode letters into special acoustic
signals, similar to the Morse code. During their research on this it became clear that the
very high transfer rates that humans use in speech communication could not be
achieved, and the new avenue of speech perception research developed.
During World War II, an apparatus for the visual investigation of the spectral features
of speech sounds was developed: the spectrograph.
This involved representation of speech in terms of a spectrogram10, drawing stylised
spectrographic patterns meant to capture the essential linguistically relevant aspects of
the speech signal, and converting these stylised visual patterns to speech via the
“pattern playback” synthesiser. Thus a new method of analysing perceptual processes
by testing synthesised speech material, „analysis-by-synthesis‟, was born. The quality of
those early attempts at synthesis was not very good, but the results of the experiments
conducted at the time became very influential in establishing future directions in speech
perception research. Some of the problems that occurred with synthesis of speech are
discussed in the following section.
2.2.2 The Problem of Segmentation and Speaker Variability
The early approaches to synthesise speech sounds showed that it was relatively
straightforward to produce intelligible vowels by resynthesising appropriate two-
formant patterns. Plosive sounds, such as /b/ or /g/ turned out to be much more complex
to synthesise and these sounds became a major object of investigation from that point in
time. The most salient feature of stop consonants is the dynamic formant transition,
which can differ according to neighbouring sounds.
Figure 2.3. Schematic spectrograms of the syllables [du] and [di].
As shown in Figure 2.3, the spectrogram patterns for the speech sound [d] differ with
respect to the second formant F2. The formant transitions into F2 are very different in
[du] and [di], due to the phonetic context in which the [d] sounds are articulated.

10 In a spectrogram, the horizontal dimension represents time and the vertical dimension represents frequency. Each thin vertical slice of the spectrogram shows the spectrum during a short period of time, using darkness to stand for amplitude. Darker areas show those frequencies where the simple component waves have high amplitude.

This
feature of spoken language is called coarticulation, a term which refers to the overlap
of articulatory movements in consecutive speech sounds (see Figure 2.3). Despite
coarticulation effects on the acoustic realisation of the speech signal, there is invariance
in perception: the same initial phone [d] is perceived in each case.
Another problem for speech synthesis was that while synthesis of the parts with static
formant characteristics resulted in sounds that were clearly intelligible vowels ([u] and
[i] in Figure 2.3), if only the transition part of the syllable was synthesised, the percept
was a complex non-speech chirp which did not sound like a [d] at all. This means that,
auditorily, the sound not only varies according to its context; it is also not always
possible to segment the signal in such a way that a single phoneme remains audible.
The issue of variability of speakers concerns the fact that different productions of the
same sound, syllable or word can look very different in the signal, but can be
recognised as the same utterance by human listeners. Due to large differences in
articulatory anatomy and articulatory habits, the acoustic signal varies greatly between
speakers, and even repeated productions of the same word by a single speaker can differ
markedly. This variability does not pose a real problem for human listeners,
but it is one of the major obstacles that automatic speech recognition must overcome.
All of these mismatches between the signal and the percept show that the human
listener is able to perceive phonetic/phonemic invariance in the face of acoustic
variability. Further investigation of such phenomena via manipulation of phonetic
features gave rise to establishment of a very important phenomenon: categorical
perception of speech sound continua. Categorical perception assists in the explanation
of how humans overcome the above problems of speech perception and is explained
and discussed in the following sections.
2.2.3 The Nature of Categorical Perception
In categorical perception, stimuli equally spaced along a physical continuum spanning
two phoneme or phone categories are perceived as members of one or the other
category with little perceptual ambiguity, and stimuli within categories are difficult to
differentiate. Categorical speech perception was first described and investigated at
Haskins Laboratories (Liberman, 1957). In a typical categorical perception experiment,
a continuum of synthetic stimuli (e.g. consonants in speech experiments) varying in a
physical parameter (e.g. the duration of the voice onset time (VOT) as in /ba/ - /pa/) is
presented to the participant for identification and discrimination. In the identification
task, speech sounds from the continuum are presented to subjects with their task being
simply to label the sounds, for example as „b‟ or „p‟. In the discrimination task, a
judgement is required regarding whether two sounds that are presented in succession
are the same or different. (A more detailed review of different types of discrimination
tasks is presented in Section 2.4.2.)
Figure 2.4. Idealised categorical perception result. The dashed lines represent identification; the solid line represents discrimination performance. The x-axis shows stimulus numbers from 1 to 9, with 1 and 9 being the extreme stimuli.
Categorical perception is indicated by a particular combination of identification and
discrimination functions, as shown in Figure 2.4. Firstly, identification functions will
exhibit abrupt boundaries between stimulus categories; and secondly, the
discrimination performance is close to the chance level (50%) for stimulus-pairs within
a category, but almost perfect for stimulus-pairs that cross the identification boundary,
a pattern of results known as the “phoneme-boundary effect”. Perception is only considered
categorical if the location of the identification boundary corresponds with the location
of best discrimination performance (the discrimination peak). It can be seen that the
underlying notion of the categorical perception of speech is the premise that speech
discrimination ability is very closely connected to the existence or non-existence of
functional differences between sounds.
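The two diagnostic criteria described above, an abrupt identification boundary and a discrimination peak located at that boundary, can be sketched computationally. The following is a minimal illustration with invented identification and discrimination values for a 9-step continuum; the numbers and the one-step alignment criterion are assumptions for this sketch, not data or procedures from any study discussed here.

```python
# Illustrative sketch of the categorical perception diagnostics.
# All numbers are invented for illustration, not experimental data.

# Proportion of 'category A' identification responses for stimuli 1-9.
ident_A = [1.00, 0.98, 0.95, 0.85, 0.50, 0.15, 0.05, 0.02, 0.00]

# Proportion correct when discriminating adjacent pairs (1-2, 2-3, ..., 8-9).
discrim = [0.50, 0.52, 0.55, 0.90, 0.92, 0.60, 0.53, 0.51]

def identification_boundary(p):
    """Point on the continuum where identification crosses 50 percent."""
    for i in range(len(p) - 1):
        if (p[i] - 0.5) * (p[i + 1] - 0.5) <= 0:
            frac = (p[i] - 0.5) / (p[i] - p[i + 1])  # linear interpolation
            return (i + 1) + frac
    return None

def discrimination_peak(d):
    """Continuum position (midpoint of the stimulus pair) of best discrimination."""
    best = max(range(len(d)), key=d.__getitem__)
    return best + 1.5  # pair (best+1, best+2) has midpoint best+1.5

boundary = identification_boundary(ident_A)  # 5.0
peak = discrimination_peak(discrim)          # 5.5

# Perception counts as categorical only if the peak sits at the boundary.
is_categorical = abs(peak - boundary) <= 1.0
```

The one-step tolerance for peak-boundary alignment is an arbitrary choice for this sketch; in the studies reviewed here, alignment is judged by inspection of the two functions.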
Figure 2.5. Idealised continuous perception result. The solid line represents discrimination; the dashed lines represent identification performance. The x-axis shows stimulus numbers from 1 to 9, with 1 and 9 being the extreme stimuli.
If a continuum is perceived continuously rather than categorically, as is the case with
most psychoacoustic continua like brightness, amplitude or duration, the identification
functions are less steep than those for categorically identified stimuli. In such a case,
shown in the idealised results in Figure 2.5, discrimination ability, while better than
chance, is constant along the whole continuum. Clear result patterns like those shown here are
rarely obtained in real experiments but they demonstrate the essential features of
categorical and continuous perception. In speech, result patterns resembling continuous
perception have been found in experiments with vowels - identification functions are
not very abrupt and discrimination is only slightly improved around the perceived
identification boundary (Cowan & Morse, 1979; Fry, Abramson, Eimas, & Liberman,
1962; D. B. Pisoni, 1973); and patterns resembling categorical perception have been
found for consonants (Bastian, Eimas, & Liberman, 1961; Eimas, 1963; Lane, 1965;
Liberman, 1957; Liberman, Harris, Hoffman, & Griffith, 1957; Repp, 1984).
Research on the categorical perception of speech was quite productive up until the mid-
to late-eighties (Harnad, 1987; Repp, 1984; Snowdon, 1987) and while the notion of
categorical speech perception is not without controversy, it has now become a core
concept of the field (Kluender, 1994; MacKay, Allport, Prinz, & Scheerer, 1987).
2.2.4 Prediction of Discrimination Performance from Identification Results
One of the premises of categorical perception is that discrimination performance is
predicted by identification performance. This discrimination prediction consists of two
parts. The first prediction concerns the location of the discrimination peak: To fulfil the
premises of categorical perception, the discrimination peak must be located at the same
point of the continuum where the identification boundary is found. The identification
boundary can be computed as the point on the continuum where correct identification
responses for each category are at 50 percent. Secondly, following the hypothesis that
the listener can only discriminate sounds that are identified as different categories, for
judgements involving two categories, where the probability for each is equal, the
proportion correct (P(C)) in discrimination is predicted to be
P(C) = 0.5 [1 + (pA – pB)²]
where pA is the probability of identifying stimulus A as one of the two categories, and
pB is the probability of identifying stimulus B as the same category, and the chance
level is 0.5 (Liberman, 1957; Macmillan, Kaplan, & Creelman, 1977). In most cases the
use of this prediction formula leads to a conservative prediction, a lower level of
discrimination performance than is actually observed (Damper & Harnad, 2000). The
difference between prediction and actual performance is attributed to different factors:
The first possible reason is that higher-than-predicted discrimination could be based not
only on the phonemic labels but also on important spectral differences between the
sounds (Eimas, 1963; Liberman et al., 1957; D. B. Pisoni, 1971; Wood, 1976). Another
explanation for the discrepancy between obtained and predicted results is that the
difference in some studies may be due to artefacts of the experimental procedure,
irrelevant aspects of the speech signal that could have given extra-speech cues to the
listener, such as accidental noise apparent in one, but not the neighbouring sounds.
These external aspects of the speech signal would not influence identification results,
but might contribute to the discrepancy between predicted and obtained discrimination.
(For detailed reviews of other methods of discrimination prediction see Macmillan,
1987 and Massaro, 1987).
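The prediction formula can be applied directly to identification proportions. The following is a minimal sketch, with invented identification probabilities, showing why predicted within-category discrimination sits near chance while predicted cross-boundary discrimination is high; the specific probability values are assumptions for illustration only.

```python
# Sketch of the discrimination prediction from identification data:
# P(C) = 0.5 * (1 + (pA - pB)**2), with chance level 0.5.
# The probability values below are invented for illustration.

def predicted_discrimination(pA, pB):
    """Predicted proportion correct for discriminating stimuli A and B,
    where pA and pB are the probabilities of assigning each stimulus
    to the same one of the two categories."""
    return 0.5 * (1.0 + (pA - pB) ** 2)

# Within-category pair: both stimuli almost always receive the same label.
within = predicted_discrimination(0.95, 0.90)  # 0.50125, near chance

# Cross-boundary pair: the two stimuli are usually labelled differently.
across = predicted_discrimination(0.90, 0.10)  # 0.82, well above chance
```

Since predicted within-category discrimination barely exceeds 0.5, any observed performance reliably above chance within a category produces exactly the conservative-prediction discrepancy discussed above.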
2.3 Stimulus Factors in Categorical Perception
In this section features of the stimulus itself are considered and in the following
sections the kinds of tasks and responses used in categorical perception studies are
described. Here, a number of stimulus features are discussed, beginning with one of the
most important, the size of the steps along the continuum.
2.3.1 Step-Size
In discrimination, the most obvious variable that influences response accuracy is the
magnitude of the separation of stimuli on the continuum. As expected, it has been
found that, generally, the larger the step-size between the stimuli to be discriminated, the
better the discrimination (D. B. Pisoni, 1971). Healy and Repp (1982) tested the influence
of three different step sizes and found increased discrimination performance for vowels,
tones, and fricatives but not for stop consonants. Pisoni and Tash (1974) assessed
reaction times in a same-different category discrimination task and found response
times for stimuli that were two steps apart were significantly shorter than for one-step
pairs and pairs of identical stimuli. The other observation made was that „different‟
reaction times for stimulus-pairs crossing a phonetic boundary were greater than for
those stimuli that were separated by four or six steps. Nevertheless, no difference was
found between four-step and six-step different pairs, and the likelihood of falsely
responding with „same‟ to a different pair was highest for two-step pairs. This could
mean that „different‟ reaction times reflected phonetic, rather than auditory, ambiguity.
Based on these results Pisoni and Tash (1974) proposed a two-stage model for same-
different comparisons. In this model, auditory stimulus properties are compared first.
A second, phonetic stage, in which phonetic labels are compared, is engaged only if the
auditory difference falls neither below the „same‟ criterion nor above the „different‟
criterion adopted by the listener.
In summary, same-different reaction time studies have shown that the listener is
sensitive to differences within stop consonant categories, even though such differences
are difficult to detect. Although some experiments did not find such sensitivities (Repp,
1975), the positive results reinforce the hypothesis that all aspects of the speech signal
are represented in auditory memory.
2.3.2 Stimulus Duration
Another factor that can affect categoricality of perception is the duration of the
stimulus. In the case of steady-state vowels, shortening the stimulus duration is
thought to weaken the auditory trace of the stimulus and consequently to lead to perception
that is more categorical than for longer stimulus durations.
Indeed, it has been found that perception is more categorical with short vowels (around
25-50 ms) than with long vowels (100 ms, and even up to 300 ms in duration) (Fujisaki &
Kawashima, 1968, 1969). In addition perception is also more categorical for rapidly
changing formant transitions in vowels than for steady-state vowels (D. B. Pisoni,
1971; Sachs, 1969; Sawusch, Nusbaum, & Schwab, 1980; K. N. Stevens, 1968; Tartter,
1981).
These findings of categorical perception for shorter and dynamically varying vowels
suggest that the short duration and rapid transitions for initial stop consonants may be
responsible for their being perceived in a highly categorical manner. Investigations of
this hypothesis in a number of studies confirm the impression that formant transitions
have a representation in auditory memory that can be accessed when redundant steady-
state information is removed from the vowel (Dechovitz & Mandler, 1977; Keating &
Blumstein, 1978; Tartter, 1981). This means that even though the vocalic portion of a
stop consonant-vowel syllable helps phonetic perception, it appears to interfere with the
maintenance of the consonantal features at a pre-categorical level. Therefore the
general auditory salience of irrelevant stimulus parts may be a major factor in
categorical perception.
2.3.3 Categorical Perception of Different Classes of Speech Sounds
Most of the experiments in which categorical perception was tested have used voicing
in initial stop consonants or vowels, because these sounds seem to represent the
endpoints of the categoricality spectrum (stops: categorical; vowels: rather continuous).
In order to be able to interpret the results of the current experiments that are concerned
with the much less investigated categorical perception of lexical tone, the results
obtained with various classes of speech sounds are reviewed in the following sections.
2.3.3.1 Categorical Perception of Stop Consonants
Most research in the categorical perception domain has been conducted with stop
consonants and is reviewed here. In stop consonants, possible contrasts are voicing,
manner of articulation, and place of articulation.
It has been shown in numerous experiments that continua in which stop consonant
voicing is manipulated are perceived in a categorical manner (Abramson & Lisker,
1973; Bastian et al., 1961; D. Burnham, L. Earnshaw, & J. Clark, 1991; Cutting &
Rosner, 1974; Edman, 1979; Eimas, 1963; Harnad, 1987; Liberman, 1957; D. B.
Pisoni, 1971).
Early studies of categorical perception of place of articulation found that this feature of
speech is also perceived categorically (Liberman et al., 1957; Mattingly, Liberman,
Syrdal, & Halwes, 1971; D. B. Pisoni, 1971; Syrdal-Lasky, 1978). Place of articulation
was manipulated by Mattingly et al. (1971), who noticed an absence of discrimination
peaks, which was later attributed to the poor quality of the stimuli. Nevertheless,
Popper (1972) found a discrimination peak on an /ab/ - /ad/ continuum, although
discrimination was better than predicted by identification (similar results were
obtained in a study by Miller, Eimas, and Zatorre, 1979).
Results of studies investigating the influence of the consonant position in the word or
syllable suggest that syllable-final stop consonants are perceived less categorically than
syllable-initial stops, which could be due to the fact that the distinctive information is
better retained in auditory memory when it is in final position.
Another way of manipulating stop consonants is by varying their manner of
articulation. It has been found that continua which involve a change in articulation
manner are perceived categorically (Liberman, Delattre, Gerstman, & Cooper, 1956; J.
L. Miller & Liberman, 1979).
2.3.3.2 Categorical Perception of Nasal Consonants
Nasal consonants have not been tested for categoricality very extensively, because
synthetic manipulation of nasals is more difficult than for vowels and stop consonants.
In his studies on nasal consonants however, Garcia (1966) found categorical
discrimination that was better than predicted by identification. More consistent results
were obtained by Miller and Eimas (1977), who compared /ba/ - /da/ stimuli with
stimuli from a /ma/ - /na/ continuum and observed identification that was not as
categorical as for stops, but discrimination patterns that suggested categorical
perception.
Given these results and those of other studies (Larkey, Wald, & Strange, 1978;
Mandler, 1976; J. D. Miller et al., 1979; J. L. Miller & Eimas, 1977) on categorical
perception of nasal consonants, it can be concluded that perception of nasals is very
categorical, with discrimination that slightly exceeds the prediction.
2.3.3.3 Categorical Perception of Approximants
A seminal study on categorical perception of approximants (Miyawaki et al., 1975)
employed a /ra/ - /la/ continuum to investigate the influence of linguistic experience on
perception in Japanese vs. American listeners (in Japanese, /ra/ and /la/ are allophones
of one phonemic category). American listeners showed fairly categorical perception;
Japanese listeners, however, performed poorly in both identification and discrimination
and were far from showing categorical perception. Studies observing perception of
synthetic approximant continua (H. Fujisaki & T. Kawashima, 1970; MacKain, Best, &
Strange, 1981; McGovern & Strange, 1977) obtained very similar results. Frazier
(1976) created a synthetic continuum from /w/ to /l/ to /y/ through variation of the
initial steady-state portion and the F2 transition and found that those stimuli were
perceived fairly categorically.
Apart from studies that investigated perception of continua between different
approximants, experiments using continua between approximants and other phonemes
have also been conducted and it was shown that perception was highly categorical (J. L.
Miller, 1980).
Altogether, perception of semivowels, liquids, and approximants appears to be less
categorical than that of stop consonants, but is far from being continuous.
2.3.3.4 Categorical Perception of Fricatives
Fricatives are expected to be perceived rather continuously because stimuli on a
synthetic fricative continuum are acoustically widely spaced, and even one-step
differences should exceed auditory detection thresholds.
Fujisaki and Kawashima (1968; 1970) investigated the perception of fricatives and
observed good within-category discrimination and a peak at the category boundary. A
study by Healy and Repp (1982) yielded somewhat different results: continuous
perception of fricatives. These and other results (Hasegawa, 1976; May, 1981) show
clearly that the acoustic differences between isolated fricatives are relatively easy to
detect and perception of those continua seems as „uncategorical‟ as vowel perception.
2.3.3.5 Categorical Perception of Vowels
Even though vowels are said to be perceived continuously, a closer look at vowel
perception studies reveals that there are discrimination peaks observed in most cases
(Cowan & Morse, 1979; Eimas, 1963; Healy & Repp, 1982; D. B. Pisoni, 1973; Repp,
Healy, & Crowder, 1979; M. E. H. Schouten & Van Hessen, 1992). A study by Fry,
Abramson, Eimas, and Liberman (1962) is one among only a few studies that did not
discover a discrimination peak for vowels. As this was the first study on categorical
perception of vowels it may have given unjustifiable credibility to the common view
that vowels are perceived continuously.
Apart from their spectral characteristics, another property of vowels is duration, which gives
rise to the phonological feature of length. Vowel length can convey phonetic
information, and it is phonologically relevant in some languages such as Thai. To test
the categoricality of length, Bastian and Abramson (1964) synthesised a continuum
between the two Thai words /baat/ and /bat/ and found continuous perception.
To recapitulate, it seems that categorical perception is not only a characteristic of stop
consonants, but can be observed in other speech sounds as well.
2.3.4 Categorical Perception of Non-Speech Stimuli
The comparison between speech and non-speech stimuli has always been a very
important aspect of categorical perception research. In order to exclude the possibility
that categorical perception is an artefact of the experimental procedures, it is essential
to test the perception of non-speech stimuli. The original motivation of non-speech
experiments was to determine whether speech is special, which would mean that non-
speech continua would be perceived in a strictly continuous manner. Later, the
perception of non-speech sounds promised to yield more insight into possible
psychoacoustic reasons for categorical perception (Mattingly et al., 1971). For non-speech
stimuli to reveal psychoacoustic factors, they must on the one hand be very similar to
speech stimuli, and on the other be different enough from speech to avoid being
perceived as speech sounds.
2.3.4.1 Categorical Perception of Continua Unrelated to Speech
Categorical perception has been found for various continua unrelated to speech, e.g. sectored circles
(Cross, Lane, & Sheppard, 1965), flicker fusion11 (Pastore et al., 1977), and colour
(Lane, 1967). There is also the interesting case of categorical hue perception
(Bornstein, 1987).
These results certainly show that categorical perception is not restricted to speech.
However, they do not shed light on the nature of categorical speech perception.
Consideration of non-speech analogues in the next section will address this issue.
2.3.4.2 Categorical Perception of Non-Speech Analogues of Speech Sounds
One major goal of this thesis is to establish whether perception of the same tonal
contours in speech vs. non-speech contexts varies, and what role the linguistic and
musical background plays in such perception. To this end it is important here to
consider the various non-speech analogues that have been used in studies of vowel and
consonant perception.
Voice Onset Time (VOT) Analogues:
In an attempt to create non-speech analogues for VOT, Liberman, Harris, Kinney and
Lane (1961) synthesised a /do/ - /to/ continuum by variation of F1 onset time. The
matching non-speech continuum was obtained by presenting the same sounds, but with
inverted frequency scales and a modified F1 transition. Discrimination of the speech
stimuli was categorical, and of non-speech stimuli continuous. In a follow-up study,
Lane and Schneider (1963, reported in Lane, 1965) found that some participants could
be trained to correctly identify the non-speech stimuli as speech sounds. Results of a
subsequent discrimination study revealed relatively high within-category discrimination
11 The flicker fusion threshold is defined as the frequency at which an intermittent light stimulus seems to be completely steady to the observer.
and a peak at the boundary – categorical perception, though see Studdert-Kennedy,
Liberman, Harris, and Cooper (1970), who were unsuccessful in training participants to
identify the non-speech stimuli in a consistent manner.
Another approach to non-speech analogues of VOT, used by Miller, Wier, Pastore,
Kelly and Dooling (1976) was to present stimuli consisting of white noise and a square-
wave buzz with varying noise-buzz lead times. Control data with isolated noises did not
show discrimination peaks, but the noise-buzz stimuli yielded category boundaries that
were generally located at the same point of the continuum where a discrimination peak
was found – another case of categorically perceived non-speech stimuli.
Pisoni (1977) constructed a VOT analogue by varying relative tone onset times (TOTs)
of two pure tones. After training, a boundary effect and discrimination peaks were
observed at a similar location to the VOT boundary for voiced/voiceless stop
consonants. In two subsequent experiments Pisoni (1977) tested discrimination of the
same stimuli without training and some of the participants showed similar results to
those found in the previous study (category boundary at short low-tone lags), whereas
other listeners exhibited two peaks in discrimination – at 20 ms lead and 20 ms lag
times of the lower component tone of the stimulus. Together with a subsequent
successful attempt to train participants to divide the continuum into three categories,
these results show that there were two natural boundaries on the continuum, around +20
ms and −20 ms TOT (locations that coincide with the voicing boundaries of languages
that make a three-way voicing distinction, such as Thai). Pisoni (1977) concludes that VOT perception is
influenced by temporal-order processing limitations. The results suggest that there is a
threshold for judgements of non-simultaneity. It appears that the listener needs an onset
asynchrony of around 20 ms between two successive sounds in order for them to be
perceived as two temporally distinct events. If the separation is less than 20 ms, sounds
appear to have a simultaneous onset and temporal order judgements are difficult (Hirsh
& Sherrick, 1961).
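The construction of such TOT stimuli can be sketched in code. The following is a minimal illustration; the tone frequencies (500 Hz and 1500 Hz), duration (230 ms), and sample rate are assumed illustrative values, not necessarily Pisoni's exact parameters. Positive TOT values delay the lower tone's onset relative to the higher tone, negative values advance it.

```python
import numpy as np

def tot_stimulus(tot_ms, f_low=500.0, f_high=1500.0,
                 duration_ms=230.0, sr=10000):
    """Two simultaneous pure tones whose onsets differ by tot_ms:
    positive TOT = low tone lags, negative TOT = low tone leads."""
    n = int(sr * duration_ms / 1000)
    shift = int(sr * abs(tot_ms) / 1000)
    t = np.arange(n) / sr
    low = np.sin(2 * np.pi * f_low * t)
    high = np.sin(2 * np.pi * f_high * t)
    # Delay whichever tone starts later by zero-padding its onset,
    # and pad the other tone's offset so both arrays stay equal in length.
    if tot_ms > 0:        # low tone lags the high tone
        low = np.concatenate([np.zeros(shift), low])
        high = np.concatenate([high, np.zeros(shift)])
    elif tot_ms < 0:      # low tone leads the high tone
        high = np.concatenate([np.zeros(shift), high])
        low = np.concatenate([low, np.zeros(shift)])
    return low + high

# An 11-step continuum from -50 ms (lead) to +50 ms (lag) in 10 ms steps
continuum = [tot_stimulus(tot) for tot in range(-50, 60, 10)]
```

The ±20 ms boundaries discussed above would fall at the third and ninth steps of this continuum.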
Summerfield (1982) compared perception of three types of continua: TOT, noise-buzz
stimuli (similar to those previously studied by Miller et al., 1976) and VOT, with onset
asynchrony threshold measured as a function of the lowest stimulus component (F1).
The results show that there is a boundary effect for VOT, but not for the two non-
speech continua, suggesting that speech and non-speech processing are different.
Formant Transition Analogues:
The most important cues for perception of consonant place of articulation are the
transitions of the first and second formants, F1 and F2. Non-speech analogues of
formant transition cues are created by excluding the constant parts of the signal (F1 and
the steady state portion of F2) to present F2 in isolation (perceived as bleats), or only
the transition (perceived as chirps). Generally, the perception of these chirps and bleats
is continuous, rather than categorical (Kirstein, 1966; Mattingly et al., 1971; D. B.
Pisoni, 1976; Popper, 1972).
Closure Duration in Speech and Non-speech:
To create non-speech analogues of closure duration, Liberman, Harris, Eimas, Lisker,
and Bastian (1961) matched the duration and amplitude of two noise bursts with those
of the pre-closure and post-closure characteristics of speech sounds (/ræbɪd/ - /ræpɪd/).
Stimuli varied on silent interval duration (30-120 ms) and the results show that ABX
discrimination12 ability of the non-speech stimuli was not as good as for the speech
stimuli and no non-speech discrimination peaks were observed, a result that was further
supported by similar findings by Baumrin (1974). Finally, Perey and Pisoni (1980)
conducted a categorical perception experiment of silent intervals that were embedded in
two 250-ms three-tone complexes imitating the formants of an /ə/-vowel. The results
show that the stimuli were perceived continuously. Together, these results show that
duration of silence is only perceived categorically when presented in a speech context.
2.4 Task and Response Factors in Categorical Perception
When designing a categorical perception task, it is essential to choose procedures
carefully, as these can affect the final results. Thus, a good knowledge of these
procedures and their effects is important and these are reviewed in the following
sections.
2.4.1 Identification Task Factors
In categorical perception tasks, the identification procedure is quite consistent across
studies. Only three task types are used to test identification: open, covert,
12 In the ABX discrimination task two stimuli (A and B) are presented and then a third (X) is offered, which is either A or B. The subject is required to indicate whether X equals A or B.
and AXB identification. In open identification, the subject is presented with each token
(X) of a continuum (usually in random order, and usually multiple times) and asked to
identify the category to which the stimulus belongs, usually using labels that carry the
name of the categories. If there are no category labels and the stimuli are assigned to
functional categories such as 'left' or 'right', the term covert identification is used. A
third way to test identification is with the AXB paradigm, in which the listener must
judge whether stimulus (X), which varies over trials, is more similar to the first or to
the third member of the triad (A or B), which are always the endpoints of the
continuum (Lindblom & Studdert-Kennedy, 1967).
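The open identification procedure described above amounts to building a randomised trial list in which every token of the continuum appears a fixed number of times. A minimal sketch, in which the seven-step continuum and ten repetitions are arbitrary illustrative values:

```python
import random

def identification_block(continuum_steps, repetitions=10, seed=1):
    """Randomised presentation order for an open identification task:
    every token of the continuum appears `repetitions` times."""
    rng = random.Random(seed)
    trials = [step for step in range(continuum_steps)
              for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

block = identification_block(continuum_steps=7, repetitions=10)
```

Each entry of `block` indexes the continuum token to present; responses are then tallied per token to obtain the identification function.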
2.4.2 Discrimination Task Factors in Categorical Perception
While the identification task does not offer a great variety of experimental procedures,
there are a number of different discrimination tasks to choose from (Gerrits &
Schouten, 2004). The choice of discrimination task is important, because certain tasks
appear to induce categorical perception more than others.
2.4.2.1 ABX and AXB Discrimination Tasks
The ABX task is one of the standard discrimination tests in categorical perception
research. In the ABX discrimination task two stimuli (A and B) are presented and then
a third one (X), which is either A or B. The subject is required to indicate whether X
equals A or B (Liberman et al., 1957). High levels of categorical perception are often
found with the ABX task and Massaro and Cohen (1983) claim that this high level
might reflect the use of phonetic memory. That is, in ABX tasks, listeners may try to
remember both auditory memory traces and the labels of the A and B sounds. When
sound X 'arrives', these auditory traces may have already faded, in which case listeners
must rely on the labels (or 'internal labels', if there are no actual labels involved in the
task) they have previously assigned to A and B and choose the one that matches the
label they have assigned to X. Such a strategy could well result in high degrees of
categorical perception. Signal detection analysis of data from an ABX task (B.
Schouten, Gerrits, & Hessen, 2003) has revealed that it is subject to a very strong bias
towards the response “B = X”. In theory, this is not a great problem, as a signal
detection analysis will allow a clear separation between sensitivity and bias, but in
practice, the greater the bias, the less likely it is that the conditions for such an analysis
to be met. In order to overcome this problem, a variant of the ABX procedure, the AXB
discrimination task has been used, in which the second stimulus is identical to the first
or the third sound (Van Hessen & Schouten, 1999).
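The ABX trial structure can be made concrete with a small sketch. For a given stimulus pair there are four trial types (ABA, ABB, BAA, BAB); the helper names below are hypothetical:

```python
import random

def abx_trials(a, b, n_per_type=5, seed=2):
    """The four ABX trial types for one stimulus pair: the first two
    intervals present A and B (in either order), X repeats one of them."""
    types = [(a, b, a), (a, b, b), (b, a, a), (b, a, b)]
    rng = random.Random(seed)
    trials = types * n_per_type
    rng.shuffle(trials)
    return trials

def correct_response(trial):
    """'A' if X matches the first interval, otherwise 'B'."""
    first, _, x = trial
    return 'A' if x == first else 'B'
```

Scoring responses against `correct_response` over many such trials yields the discrimination function whose bias properties are discussed above.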
2.4.2.2 The Two-Interval Two-Alternative Forced-Choice Discrimination Task
In the two-interval two-alternative forced-choice (2I2AFC) paradigm, the two stimuli
that are presented are always different, and the subject must determine the order in
which they are presented (AB or BA). This makes it necessary to explain to the
participants what the term 'order' means, which makes it difficult to avoid mentioning
the phoneme categories in the instructions, with the consequent risk of encouraging
labelling behaviour (M. E. H. Schouten & Van Hessen, 1992). Response bias to one or
the other stimulus has, however, been found to be much smaller here than in ABX tasks
(B. Schouten et al., 2003).
2.4.2.3 The AX Discrimination Task
To avoid strategies that exclusively rely on category labels, a task is required that
reduces the cognitive load on auditory memory and encourages direct auditory
comparison between the stimuli that are to be discriminated (D. W. Massaro & Cohen,
1983). An example of such a task is AX discrimination, in which all possible stimulus-
pairs are presented (AA, BB, AB, and BA) and the participants must indicate whether
they are the same sound or different sounds. A disadvantage of this task is that, if the
difference between two neighbouring stimuli is relatively small, listeners tend only to
respond “different” if they are very sure of their decision. For this reason the AX
discrimination task is not bias-free: the listener's response is determined by a subjective
criterion of what is “same” and what is “different”. The AX task is often chosen if there
are time-limitations for the experiment, as it is the most time-efficient discrimination
measure and is also relatively easy to use with young children, or when the stimulus
categories are unfamiliar to the listener, such as in cross-language studies.
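Because the AX task is not bias-free, same-different data are usually submitted to a signal detection analysis that separates sensitivity from the response criterion. The sketch below applies the standard yes-no formulae (d′ and c) with a log-linear correction; it is an illustration of the general approach, not the dedicated same-different model:

```python
from statistics import NormalDist

def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Signal detection analysis of same-different responses:
    d' indexes sensitivity; c indexes the response criterion (bias)."""
    z = NormalDist().inv_cdf
    # Log-linear correction keeps the rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion
```

A listener who says "different" only when very sure (the conservative criterion described above) would show a positive c, independently of d′.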
2.4.2.4 The Four-Interval-AX Discrimination Task
In this task (4IAX) the test trials consist of eight possible combinations: ABAA,
BAAA, AAAB, AABA, and BABB, ABBB, BBBA, BBAB. The time interval between
the second and the third stimulus is longer than between the other sounds, so that the
impression of two pairs of sounds arises. The participants are required to decide which
pair contains two identical stimuli, the first or the second pair. It is assumed that the
listener first determines the differences between the stimuli within the pairs and, in a
second step, determines which of the two differences is smaller. The decision is thought
to be based mainly on bottom-up13 auditory information and not to be subject to top-
down14 influences, such as information about phoneme boundaries (Gerrits &
Schouten, 2004). The 4IAX task has been found to be more sensitive to acoustic
differences between sounds than the other above-mentioned tasks (D. B. Pisoni, 1975).
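The eight 4IAX trial types listed above follow a simple combinatorial scheme: which pair is identical (AA or BB), how the different pair is ordered (AB or BA), and whether the identical pair comes first or second. A sketch enumerating them:

```python
from itertools import product

def four_iax_trials():
    """Enumerate the eight 4IAX trial types: one pair of intervals
    contains identical stimuli, the other contains both A and B."""
    same_pairs = ['AA', 'BB']
    diff_pairs = ['AB', 'BA']
    trials = []
    for same, diff in product(same_pairs, diff_pairs):
        trials.append((same + diff, 'first'))   # identical pair comes first
        trials.append((diff + same, 'second'))  # identical pair comes second
    return trials
```

Each tuple pairs the four-interval sequence with the correct response ('first' or 'second' pair identical).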
2.4.2.5 The Four-Interval Oddity Discrimination Task
In the 4I oddity task, the stimuli A and B are presented randomly in the two orders
AABA or ABAA, with stimulus A at the beginning and the end of the trial, functioning
as a reference stimulus. Listeners must respond by indicating if the 'oddball' (stimulus
B) is the second or third stimulus. In principle this task is as bias-free as 4IAX and it
has a much shorter experimental duration. However, although it is a four-interval task,
the optimal decision rules defined by Macmillan and Creelman (1991) predict that the
ideal observer will ignore the reference stimuli (stimulus 1 and 4) and thus perform the
4I-oddity task like a standard AX (same-different) task (Heller & Trahiotis, 1995). The
advantage over AX is that the listener can decide about the oddball without needing to
refer to any internal criterion of 'same' or 'different'. In other words, it is expected that
the 4I-oddity paradigm combines some of the important aspects of AX and 4IAX and
that listeners will have the choice between two perceptual strategies: an AX-like
phoneme labelling strategy or a 4IAX-like low-bias strategy.
Due to time constraints, the AX same-different task type is used in the current
experiment series. In addition, as no labels are required in the AX procedure it is
possible to test people from different language backgrounds on the same experiment.
As there were no 'real' labels for the stimuli that will be used (the pitch contour will be
rising or falling, but this acoustic dimension does not have meaning for everyone) this
was considered to be the most appropriate task with which to work.
13 Bottom-up is a term that characterises any procedure or model which begins with a low-level (e.g. acoustic) unit, or with the smallest functional units in a hierarchy, and proceeds to combine these into larger units.
14 Top-down, as opposed to bottom-up, begins with the analysis of a high-level (e.g. more cognitive or composite) unit into progressively smaller units.
2.4.3 Methods for Increasing Categoricality
In categorical perception experiments, two different ways of increasing the
categoricality of perception without changing the task have been discovered:
interference with auditory memory and decay of auditory memory.
2.4.3.1 Interference with Auditory Memory
Different attempts to interfere with auditory memory in categorical perception tasks
have been undertaken and the most effective ways to do this are summarised below.
Lane (1965) tested the influence of interference by using an existing vowel continuum
(Fry et al., 1962) and his results indicate that the addition of irrelevant noise interferes
with memory in such a way that it increases discrimination ability at the category
boundaries, but not within categories, and thus leads to a pattern of results that
resembles categorical perception. Fujisaki and Kawashima (1969; 1970) provided
listeners with a fixed vocalic context (/a/) in a test for categorical perception of a vowel
continuum (/i/ - /e/), turning them into diphthongs. Their results show that perception
was more categorical when the context was not present. The difference between the two
sets of results was explained as being a result of the context serving as a perceptual
reference.
Pisoni (1975) investigated the role of a fixed context in vowel perception more
systematically. His hypothesis was that if the context stimuli provide a perceptual
reference, as suggested by Fujisaki and Kawashima (1969; 1970), then it should not be
important whether the context is presented before or after the test stimulus. If, on the
other hand, the context does influence auditory memory, it is expected that addition of a
post-stimulus context will cause more interference than a preceding context. In
addition, Pisoni (1975) hypothesised that the amount of interference would be
determined by the similarity between test stimulus and context stimulus. He used four
different context stimuli (pure tone, white noise, and the vowels /a/ and /ɪ/) to interfere
with a continuum of stimuli ranging from /i/ to /ɪ/. In these experiments (identification
and ABX discrimination), the context either followed or preceded the test stimulus. The
results support Pisoni's (1975) similarity hypothesis: Discrimination ability was the
lowest in the (most similar) /ɪ/ context, with a greater decrease in discrimination
scores when the context followed than when it preceded the test stimuli.
In a subsequent study, Repp, Healy, and Crowder (1979) tested discrimination of
stimuli from an /i/ - /ɪ/ - /ɛ/ continuum with a silent or a filled (using an inserted /y/
vowel sound) interstimulus interval. Trials that contained the intervening vowel
stimulus showed a decrease in discrimination performance, and it was
concluded that the categoricality of perception had increased. The authors' interpretation of
these data was that auditory memory had exerted its effect before phonetic
categorisation, in the form of contrastive interactions between auditory stimulus traces,
and that discrimination was then mainly based on the phonetic labels.
Together, the results of these studies strongly suggest that interference with auditory
memory can increase categoricality of perception.
2.4.3.2 Decay of Auditory Memory
Another way of interfering with auditory memory in discrimination tasks is by
manipulating the interstimulus interval (ISI). The ISI is the interval between two stimuli
to be discriminated. Longer ISIs allow greater decay of auditory memory. Since it is
desirable to encourage comparison of the acoustic cues in stimuli during discrimination
and since the auditory trace of speech sounds is time-dependent, it is important to make
a considered decision about the ISI in categorical perception experiments. If the ISI
exceeds the duration of the auditory trace of the stimuli, all that is left of the first
stimulus is a representation coding the relationship of the presented sound to the other
sounds in the experiment, or to pre-established categories, or to both (D. B. Pisoni,
1973). Studies with variable ISIs (Cowan & Morse, 1979; Cutting, Rosner, & Foard,
1976; Frazier, 1976; D. B. Pisoni, 1971, 1973; Repp et al., 1979) have shown greater
categorical perception of speech sounds with longer ISIs up to a maximum of 3 seconds
(Crowder, 1982).
ISI duration will be manipulated in Experiment 1 of this thesis to investigate the role of
auditory memory in pitch perception in speech and non-speech sounds.
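The ISI manipulation amounts to varying the silent interval between the two stimuli of each AX trial. A minimal scheduling sketch; the stimulus duration and inter-trial interval are illustrative assumptions, while the 500 ms and 3 s ISIs echo the values discussed above:

```python
def ax_trial_schedule(stimulus_ms, isi_ms, n_trials, iti_ms=2000):
    """Onset times (ms) of the two stimuli in each AX trial of a block:
    stimuli within a trial are separated by the ISI, and successive
    trials by the inter-trial interval (ITI)."""
    onsets = []
    t = 0.0
    for _ in range(n_trials):
        onsets.append((t, t + stimulus_ms + isi_ms))
        t += 2 * stimulus_ms + isi_ms + iti_ms
    return onsets

# Short vs long ISI conditions (250 ms stimuli are an illustrative value)
short_isi = ax_trial_schedule(stimulus_ms=250, isi_ms=500, n_trials=2)
long_isi = ax_trial_schedule(stimulus_ms=250, isi_ms=3000, n_trials=2)
```

With the longer ISI, the onset of the second stimulus is pushed well beyond the presumed duration of the auditory trace of the first.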
2.4.4 Methods to Reduce Categoricality of Perception
In this section, the results of studies using more sensitive discrimination paradigms and
experiments using rating scales and reaction time measurements are reviewed.
Experiments in this area have focussed on stop consonants, a class of speech sounds
that are known to be perceived highly categorically.
2.4.4.1 The Use of More Sensitive Discrimination Paradigms
One way to reduce categoricality is to use more sensitive discrimination paradigms in
order to access memory traces for acoustic properties of stop consonants retained in
auditory memory. As stop consonants are considered to be abstract, highly encoded
categories that require a special speech decoder (Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967), these are ideal candidates for this experimental manipulation.
Pisoni (1971) presented steady-state vowels and stimuli from a /bæ/ - /dæ/ - /gæ/
continuum in a 4IAX and an ABX task (see section 2.3.2). The results show that
discrimination of vowels, but not of consonants, was better in the 4IAX than in the
ABX task. Thus the data show a contribution of only phonetic, and not auditory,
memory in stop consonant perception, even when these more sensitive measures
(4IAX) are used. Pisoni and Lazarus (1974) also compared ABX and 4IAX
discrimination of a /ba/ - /pa/ continuum and included a preparatory sensitisation phase,
where one group of listeners were presented with the whole continuum before the
discrimination task. Discrimination improvement was observed only in those listeners
who had been sensitised and for whom the 4IAX task was used. However, a similar
study (D. B. Pisoni & Glanzman, 1974) found that the factor that increased
discriminability must have been the sensitisation phase, because there was no
difference between results of the ABX and the 4IAX tasks when they were tested
without prior presentation of the stimulus continuum. Crowder (1982) compared
discrimination of a /i/ - /I/ continuum in ABX and AX tasks with different ISIs (500 ms
and 3 s) and his results show that AX is the more sensitive task type, and one that also yields
much more consistent results.
In summary there is no doubt that, even when more sensitive discrimination methods
are used, perception of stop consonants remains largely uninfluenced and highly categorical. (For a
comparative overview of different discrimination tasks for the perception of non-speech
sounds see Creelman & Macmillan, 1979).
2.4.4.2 The Use of Rating Scales and Measurement of Reaction Times
Another way to quantify differences in within-category perception is by assessing
listeners' certainty in identifying stimuli through reaction time measurement. It is
expected that reaction times will be longer for 'difficult' ambiguous stimuli than for 'easy'
unambiguous sounds closer to the endpoints of the continuum. Studdert-Kennedy,
Liberman and Stevens (1963; 1964) were the first to investigate this hypothesis and
found peaks in reaction time at the category boundary for stop consonants, a
finding that has been replicated very often since then (Cross et al., 1965; D. B. Pisoni &
Tash, 1974; Repp, 1975, 1981a).
Another way of accessing information about auditory memory in identification is the
use of scales to rate individual stimuli. Conway and Haggard (1971) provided their
subjects with a 9-point rating scale to assess stimuli from /bɪl/ - /pɪl/ and /gɪl/ - /kɪl/
VOT-continua and their results led to the conclusion that even finer-grained scales do
not make distinctions within those consonant categories possible. These findings
suggest that stop consonants are perceived categorically and that such perception does
not depend on the number of items on the scale that is being used.
A task type called absolute identification was employed by Sachs (1969), in order to
establish a one-to-one correspondence between the stimuli and responses. In this task
listeners used numbers from 1 to 8 to label stimuli on a continuum between /badəl/ and
/bædəl/ and between /a/ and /æ/ vowels that differed in duration. Perception was quite
categorical for all stimuli except for the long vowels, which suggests that rating scales
do not influence categoricality of perception. Similar results were obtained by Cooper,
Ebert, and Cole (1976) for stimuli from /ba/ - /wa/ and /ga/ - /ja/ continua.
Rating scales have also been used in discrimination tasks. Vinegrad (1972) used the
method of direct magnitude scaling to investigate the perception of consonants (/bε/ -
/dε/ - /gε/), vowels (/i/ - /I/ - /ε/) and pure tones varying in frequency. Listeners were
asked to rate stimulus X by marking a point on a line between A and B. Their results
suggest highly categorical perception of stop consonants and rather continuous
response patterns for the vowels and pure (non-speech) tones. A similar experiment was
conducted by Strange (1972), who observed the same result patterns for stimuli in
which VOT was manipulated. In a similar vein, Pisoni and Glanzman (1974) had their
participants make confidence ratings for /ba/ - /pa/ discrimination and obtained a very
close relationship between discrimination performance and confidence: the higher the
confidence, the better the performance. Repp (1984) suggests the possibility that
“Rather than directly accessing some auditory memory representations, subjects
might base decisions about stimulus differences on estimates of their subjective
uncertainty in phonetic categorization.” (p. 270)
It can be concluded that different kinds of discrimination tasks, rating scales and
reaction time measurements are good ways to access additional information about
the listener's stimulus representations, but they do not change the pattern of
categorical perception.
2.5 Psychoacoustic Strategies and Experiential Factors in Categorical
Perception
In the previous sections, it was shown that categorical perception varies as a function of
stimulus and response factors. Beyond these factors, it is also important
to ask whether categorical perception is a property of the auditory system or more
a result of experiential participant factors. This section will review studies that ask
whether categorical perception is immutable (in sections 2.5.1 and 2.5.2,
the studies on the influence of training, strategies, and language background on
perception are summarised); whether it is innate (studies with human infants are
reviewed in section 2.5.3); and whether it is specific to humans (a review of animal
studies is given in 2.5.4).
2.5.1 Practice and Strategies
It was shown earlier (2.3.3) that within-category discrimination could be manipulated
by using different kinds of discrimination tasks. Another way of improving
discrimination is to provide feedback to the participants.
2.5.1.1 Practice and Feedback
In a categorical perception task, feedback means providing the participant with
information about whether the response that they have just given was correct or
incorrect. Hanson (1977) was one of the first researchers to use feedback in a same-
different task and found that listeners' performance improved significantly when
feedback was used (compared with Repp, 1975, whose participants completed the same
task without feedback and did not show any improvement).
Training has also been used to improve discrimination performance. Training is
different from feedback in that the listener is given a number of practice trials before
the experiment, whereas feedback is only given during the experiment. Carney, Widin,
and Viemeister (1977) used feedback in their experiment on stimuli from a /ba/ - /pa/
continuum and found improved discrimination, that is, less categorical perception. A
follow-up study was conducted by Edman, Soli and Widin (1978) and the results
showed that listeners who were trained on a labial VOT continuum were able to
transfer their discrimination skills to a velar continuum. Similar results were observed
by Edman (1979), who successfully trained listeners with stimuli from /bæ/ - /dæ/ -
/gæ/ and /pæ/ - /tæ/ - /kæ/ continua and by Samuel (1977) with /da/ -/ta/ stimuli.
Generally, we can conclude that training and feedback, and especially the combination
of the two, allow intraphonemic discrimination.
2.5.1.2 The Use of Strategies in Categorical Perception
As we have seen in the previous section, feedback and training are ways to enable
listeners to make within-category discriminations in stop consonants by directing the
listeners' attention to certain stimulus properties that are not required for phoneme
discrimination in fluent speech.
In some continua, acoustic differences are more salient and more easily accessible than
in others. The question is whether discrimination is easier in these continua, and
whether less training is needed.
Repp tested stimuli from a [ʃ] - [s] continuum, which were perceived rather
categorically before training. After extensive training, isolated fricatives showed
continuous perception. However, when presented in vocalic context, perception was
again categorical (Repp, 1981b). When a different training method was used
(presenting sounds isolated or in context one after the other), participants could be
trained to pick up intraphonemic differences – continuous perception was observed. It
seems that listeners were able to switch between different perceptual modes – phonetic
and acoustic.
There are various studies that show that different auditory strategies may be applied
when listening to speech stimuli or speech-like sounds. This is only possible when
listening in the auditory mode15; in the phonetic mode, all relevant acoustic input is
integrated into a phonetic percept, a phonetic category. Best, Morrongiello, and Robson
(1981) discovered that listeners tested on perception of sine-wave stimuli could be
divided into two groups: temporal listeners, who concentrated on temporal duration
information and spectral listeners, whose attention was focused on the spectral changes
in the signal. Similar results were obtained in a study about amplitude rise time
discrimination (Rosen & Howell, 1987): there were large individual differences in
participants‟ attention to spectral vs. temporal cues.
In a study on sine-wave analogues Bailey, Summerfield, and Dorman (1977) found
interesting differences in perception depending on whether the participant perceived the
artificial stimuli as speech or as non-speech: When perceived as speech, the category
boundary was similar to a previously determined boundary of matching speech stimuli,
whereas when perceived as non-speech, the boundary location did not match the speech
boundary. These results suggest that the location of the category boundary (as well as
the shape of the discrimination function) depends not only on the spectral
characteristics of the stimuli but also on the expectations of the listener.
15 Auditory mode of perception refers to perception without reliance on linguistic category labels. In this mode fine acoustic details are perceived well, even variations within phonetic categories.
A similar effect was found for stop consonant manner16. Best et al. (1981) created a
synthetic continuum between sine-wave analogues of /say/ and /stay/ (the stimuli
consisted of an initial noise burst, followed by a variable silent interval and a three-tone
complex with variable F1 onset-frequencies). The results show that those listeners who
perceived the stimuli as speech sounds perceived them in a categorical manner. Those
participants who did not relate the stimuli to speech were separated into two groups:
listeners who paid attention to the duration of the silent interval (temporal listeners) and
listeners who paid attention to the onset quality (spectral listeners). Temporal listeners‟
discrimination was worse than that of the speech-perceivers, whereas spectral listeners
showed much better discrimination abilities than the people who perceived speech.
These results support the conclusion that there are separate modes of perception for
speech and for non-speech sounds.
The frame of reference in which the stimuli are presented is very important for
perception. It has proven to be possible to use different perceptual strategies while
operating in the phonetic listening mode. Researchers at Haskins Laboratories
(unpublished study, cited by Repp, 1984) presented a /ba/ - /da/ continuum for
identification and discrimination and obtained the usual categorical perception result
pattern. When they provided the listeners with an additional label, /, listeners
developed two category boundaries and two discrimination peaks. This shows that there
is strong phonetic influence on discrimination performance.
In summary, no definitive conclusion can be drawn about whether perception of
speech and non-speech sounds involves separate processes; what seems to be
important is not whether the stimuli are natural speech or non-speech analogues but
rather how they are interpreted by the listener. It seems that categoricality is more a
function of the expectations of the listener than of the acoustic properties of the stimuli.
The results of these studies lead to the conclusion that perceptual categories are not
established as a result of psychoacoustic sensitivities but mainly by the phonetic criteria
that are adopted by the listener.
16 Manner of articulation refers to how the tongue, lips, and other speech organs are involved in making a sound. One such manner is that of stops. Stop consonants are made by a set of articulators, e.g., the two lips, or the tongue and the teeth, closing off the airflow momentarily, and then releasing the air in a burst.
2.5.2 The Influence of Specific Linguistic Experience on Categorical Perception
As different languages have different phoneme inventories, it is of interest to discover
whether perceptual phoneme boundaries change depending on language background. A
central phenomenon in the area of cross-linguistic studies is the phoneme boundary
effect (Carney et al., 1977). There are two main possibilities: phoneme boundaries
could be of a universal psychoacoustic or phonetic origin, in which case they
should be in the same location independent of language background. If, on the other hand,
boundaries are affected by the surrounding language, then boundaries and
discrimination peaks should occur wherever each language's phonemic boundaries are located.
Most cross-linguistic studies have examined the voicing dimension, taking advantage of
the fact that languages such as English, French and Thai contrast voicing in
phonetically different ways. English distinguishes voiced and voiceless aspirated stops:
depending on the place of articulation, the perceptual voicing boundary for English
listeners is positioned in the short-lag region of VOT between 20 and 40 ms (Lisker &
Abramson, 1970). In French, Polish and Spanish prevoiced and voiceless unaspirated
stops are contrasted. This category boundary is more variable but generally located at
lag times around 0 ms VOT (Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973;
Keating, Mikos, & Ganong, 1981; L. Williams, 1977). Thai is a language that makes
both distinctions, resulting in three voicing categories (Forfeit, 1977; Lisker &
Abramson, 1970). These differences indicate that native language influences phonetic
boundary locations on VOT continua.
Given these differences, it is to be expected that there would be shifts of discrimination
peaks coincident with the phonetic boundaries across different languages. Indeed, Thai
speakers exhibit a discrimination peak in the voicing-lead region, a region in which
English listeners' discrimination performance is poor (Abramson & Lisker, 1970).
Most cross-linguistic studies have focussed on consonants, but there have also been a
few studies on vowel perception across languages. Flege, Munro, and Fox (1994) tested
English and Spanish bilingual listeners' perception of Spanish and English vowels in a
dissimilarity task and found that perceived dissimilarity increased with increasing
acoustic separation in both listener groups. It was concluded that there is a universal,
psychoacoustic component in cross-language vowel perception.
Another way of looking at cross-linguistic differences is by testing perception of new
phonetic contrasts. Various studies compared perception of voicing across languages
and found differences in perception depending on language background. One of the
most influential studies is that of Lisker and Abramson (1970), who compared
perception of VOT in Thai (there are three voicing contrasts in Thai: prevoiced,
voiceless unaspirated and voiceless aspirated) and English (English distinguishes two
voicing contrasts: voiced and voiceless aspirated) listeners. When presented with
stimuli from a VOT continuum, Thai listeners exhibited three categories, whereas
English participants perceived only two. These and other results show that native
language does seem to influence boundary existence and location(s) (Lisker, 1970;
Lisker & Abramson, 1970; McClasky, Pisoni, & Carrell, 1980; D. B. Pisoni, Aslin,
Perey, & Hennessy, 1982; Strange, 1972). Together, these results demonstrate that it is
possible to acquire new phonetic contrasts under laboratory conditions, although it
remains unclear whether such laboratory-acquired distinctions transfer to the real
world, for example when learning a new language.
2.5.3 Categorical Perception in Infants
Early on in categorical perception research, when infant perception studies were rare
due to methodological difficulties, categorical perception was seen as a language-
specific phenomenon (Liberman et al., 1957). The subsequent development of tools by
which to test infant speech perception allowed empirical investigation of this issue. In a
now classic study Eimas and his colleagues (Eimas, Siqueland, Jusczyk, & Vigorito,
1971) tested 1-month-old infants' perception of voicing with the High Amplitude
Sucking technique (HAS). After presenting /ba/ stimuli repeatedly (while the infant
sucked on a non-nutritive nipple) and then presenting a /pa/ stimulus, greater sucking
rates were observed. This was not the case when both stimuli came from the same
(adult) category (within-category contrasts, such as two different /pa/ stimuli with the
same VOT difference as the cross-category condition). As similar results were found in
subsequent experiments (Aslin & Pisoni, 1980; Aslin, Pisoni, & Jusczyk, 1983; Eilers,
1980) and with place of articulation contrasts (Bertoncini, Bijeljac-Babic, Blumstein, &
Mehler, 1987; Eimas & Miller, 1980b; Jusczyk, Copan, & Thompson, 1978; Moffitt,
1971; Morse, 1972), it was concluded that infants discriminate speech sounds
categorically, and that the general mechanisms underlying speech perception are innate
and specifically linguistic (Eimas, 1975; Eimas et al., 1971). (However, note that the results
were not so clear for manner of articulation, see Eimas & Miller, 1980a; Hillenbrand,
Minifie, & Edwards, 1979).
A problem with these studies was that, as identification is difficult to test with very
young infants, only the discrimination aspect of the categorical perception task was tested.
Burnham, Earnshaw, and Quinn (1987) argue that without identification data, it is not
possible to conclude that there is categorical perception of speech in infants. They
suggest that improved discrimination ability is merely a sign of heightened perceptual
ability and that the processes involved in discrimination are not the same as those in
identification. In a new infant identification procedure developed by Burnham,
Earnshaw, and Clark (1991), 9- to 11-month-old infants were tested on their
identification of bilabial stops from a VOT continuum. Results showed that
identification in the positive VOT region was significantly better than in the negative
VOT region, and that in neither region was a categorical result pattern found. This led
Burnham et al. (1991) to the conclusion that even though there may be categorical
discrimination (Aslin, Pisoni, Hennessy, & Perey, 1981), infants do not identify speech
categorically.
Similar to adult research, most categorical perception studies with infants have been
conducted on consonant contrasts. However, a few experiments have tested infants'
perception of vowels. In a study investigating vowel pairs ([a] vs. [i] and [u] vs. [i]),
Trehub (1973) found that 1- to 4-month-old babies could discriminate both contrasts.
Similar sensitivities were observed in a later study on [a] vs. [i] by Kuhl and Miller
(1982). Even though discrimination of steady-state characteristics of vowels was
observed in these experiments, the methods and results do not allow conclusions about
categoricality of vowel perception in infants.
The only study to have clearly investigated categorical vowel perception in infants is
that of Swoboda, Morse, and Leavitt (1976), in which 2-month-old infants'
perception of [i] vs. [I] was tested; the results showed that infants perceived these
vowel stimuli in a continuous manner.
Since infants, like adults, discriminate consonants categorically and vowels
continuously, the question of the origin of categorical perception – psychoacoustic or
linguistic – again arose. In order to obtain a clearer answer to this question, infants'
perception of non-speech stimuli is of interest.
Continuous discrimination of non-speech sounds versus categorical discrimination of
speech sounds was found by Mattingly, Liberman, Syrdal, and Halwes (1971), but
Morse (1972) observed
no perceptual differences between speech and non-speech continua.
These results, together with findings from similar studies (Jusczyk, Pisoni, Walley, &
Murray, 1980; Jusczyk, Rosner, Reed, & Kennedy, 1989), suggest that infants'
perception of speech vs. non-speech is similar to adults' and support the hypothesis
that specific perceptual mechanisms are functional from birth.
The difference seems to be that infants are sensitive to all phonetic contrasts, whereas
adults‟ perception is shaped by their linguistic environment such that they have learned
to ignore contrasts that are not linguistically significant for them.
2.5.4 Categorical Perception in Animals
The investigation of speech perception in animals is important because it allows
comparison with infant perception and conclusions about the species-specific nature of
speech perception. As animals do not produce human speech, their level of
discrimination ability for speech sounds should reflect the involvement of
psychoacoustic factors only. In order to investigate this, studies have been conducted
with species whose auditory systems are very similar to the human auditory system.
Morse and Snowdon (1975) measured heart rate changes in macaque monkeys in
response to stimuli from Pisoni's (1971) /bæ/ - /dæ/ - /gæ/ continuum. There was good
discrimination performance between categories and some within-category sensitivities,
a pattern that is similar to that observed in adult perception of place of articulation
contrasts (see 2.4.3.1). In addition, Kuhl and Miller (1975) used three VOT continua
/ba/ - /pa/, /da/ - /ta/ and /ga/ - /ka/ and found very similar identification abilities for
humans and chinchillas. Similarly, Kuhl and Miller (1978) tested chinchillas on voicing
contrasts and observed categorical discrimination, and Kuhl and Padden (1982)
observed human-like discrimination abilities for voicing and place of articulation in
macaques. These patterns of results suggest that there is a psychoacoustic basis for
categorical perception of voicing contrasts. (For a detailed review of animal speech
perception see Harnad, 1987.)
While these results suggest that humans and animals process speech in a similar way,
there are also differences. Waters and Wilson (1976), for example, compared
perception of voicing contrasts in humans and rhesus monkeys and found that both
species showed categorical discrimination, but that the category boundary for the
monkeys was highly dependent on the training stimuli (Sinnott, Beecher, Moody, &
Stebbins, 1976). Furthermore, in a vowel perception study, Kuhl (1991) observed that
monkeys do not exhibit the 'perceptual magnet effect'17. Their perception of vowels
was uninfluenced by the closeness of vowel prototypes, a finding that shows that there
are important differences between human and animal speech perception.
Finally, while there are certain similarities between human and animal speech
perception, it can be argued that the results of speech perception studies with non-
humans do not necessarily imply that general auditory processes or mechanisms
common to humans and non-humans are at work. Non-human performance with speech
is only analogous to human performance; it is possible that similar processes may arise
from disparate evolutionary sources (Jusczyk & Bertoncini, 1988).
2.6 Theories of Categorical Speech Perception
Many theories have been formulated to explain the phenomenon of categorical
perception and how it is embedded in speech perception. The search for an explanation
of how the transformation from acoustic signal to phoneme occurs has given rise to
many theoretical perspectives on speech perception18; a representative sample of the
most important of these is reviewed in this chapter.
2.6.1 The Motor Theory of Speech Perception
In the 1950s, much influential work was conducted on the categorical perception of
synthetic speech sounds. At Haskins Laboratories, Liberman, Cooper, Delattre and their
17 The perceptual magnet effect theory states that discrimination of vowel contrasts is more difficult if the vowels are acoustically similar to a vowel prototype (P. K. Kuhl, 1991).
18 For reviews of speech perception theories and models see Altmann, 1990; Barron, 1994; Diehl & Kluender, 1989; Diehl, Lotto, & Holt, 2004; Hickok, 2001.
colleagues' work (Delattre, Liberman, & Cooper, 1955, 1964; Delattre, Liberman,
Cooper, & Gerstman, 1952; Liberman, 1957; Liberman, Delattre, & Cooper, 1952,
1954; Liberman et al., 1956) provided the basis for what is known as the Motor Theory
of Speech Perception.
The motor theory has experienced important changes from the time of its first
formulation (Liberman, 1996), but one claim that has been made in every version is that
the objects of speech perception are articulatory rather than acoustic or auditory events.
More specifically, it is suggested that the articulatory events recovered by human
listeners are neuromotor commands to the articulators – also referred to as intended
gestures – rather than more peripheral events such as articulatory movements or
gestures (Liberman et al., 1967; Liberman & Mattingly, 1985). This view was informed
by a belief arising form the early studies of invariance in speech perception (see section
2.2.2), that the objects of speech perception must be more or less invariant with respect
to phonemes or feature sets and by a further belief that such a requirement was satisfied
only by neuromotor commands.
According to the motor theory, the perception of speech sounds occurs through the
participation of the same neuronal mechanisms that are responsible for their production
(Liberman et al., 1967; Liberman & Mattingly, 1985). Whereas phonemes were
assumed to stand approximately in one-to-one correspondence with neuromotor
commands and muscle contractions, mapping between muscle contractions and vocal
tract shapes was thought to be highly complex owing to the fact that adjacent vowels
and consonants are coarticulated. Because the relationship between articulatory
movements and acoustic signals was assumed to be one-to-one, the complex mapping
between phonemes and speech sounds was attributed mainly to the effects of
coarticulation (Liberman et al., 1967).
As an illustration of the complex mapping between phonemes and their acoustic
realisations, Liberman et al. (1967) pointed to differences in spectrograms of synthetic
two-formant patterns that are perceived by listeners as the syllables /du/ and /di/ (see
Figure 2.3 in section 2.2.2). In these, the steady-state formants correspond to the target
values of the vowels /u/ and /i/, and the rapidly changing formant transitions at the
onset of each syllable carry important information about the initial consonant. That
different formant patterns could evoke the same phonemic percept (/d/) strongly
suggested to the motor theorists that invariance must be sought at an articulatory rather
than an acoustic level of description.
The second main assertion of motor theory is that the very close relationship between
speech perception and speech production is innate. Perception of the intended gestures
is said to take place in a specialised speech mode, whose main function is to make an
automatic conversion from the acoustic signal to the articulatory gesture. The
supporters of this model argue that motor theory can explain a large body of speech
perception phenomena, including the variable relationship between acoustic realisations
and perceived speech sounds, duplex perception19, cue trading20, and audiovisual
integration21 (Liberman & Mattingly, 1985).
Despite these claims, the model appears incomplete because it states only that the
transformation from the acoustic signal to the perceived articulatory gestures takes
place, but not how. Moreover, even though categorical speech perception has been
considered to prove the existence and operation of a special decoder for speech, there
is evidence for analogous categorical perception of non-speech signals (see section
2.3.4.2). These findings suggest that categorical perception is not unique to
speech and that its existence does not depend on a special decoder. In addition, the
finding of categorical perception of speech in animals suggests that even if there is a
special decoder, such a special decoder is not unique to humans.
In a newer version of the motor theory, Fowler (1994; 1996) states that speech
perception is a direct mapping from acoustic qualities to the gestures by which
they were produced. This is framed within a perspective in which perception is
the direct recovery of the distal event that is perceived. The key elements of the direct
realist approach are: (1) perception is a single step from the signal to the percept, (2)
the percept is the gesture that produced the event, and (3) there must be an invariant to
mediate the mapping.
19 Duplex perception is an experimental technique that involves manipulation of two components of a sound stimulus, one in each ear. In one ear, the listener would, for example, be presented with a synthesised stop-vowel syllable (such as /ga/) from which the third formant transition is removed; this transition is simultaneously presented to the other ear. People typically perceive a /ga/ as well as the isolated transition, which sounds like a non-speech chirp. Perception is called 'duplex' because of the double effect: listeners hear both the integrated percept and the isolated transition percept.
20 The term cue trading refers to the concept that different cues can combine and trade against each other to signal the same contrast.
21 In audiovisual integration, an integrated percept results from a combination of different auditory and visual input. The phenomenon was discovered by McGurk and MacDonald (1976), who noted that when hearing /ba/ while seeing a video of a face saying /ga/, /da/ was perceived.
2.6.2 Articulatory/Auditory Theories
While the motor theory of speech perception claims that the categorical nature of
perception is a result of the categorical nature of gestures used in production (Liberman
et al., 1957), it has more recently been proposed that categorical perception occurs as a
result of natural sensitivities of the auditory system, with no reference to articulation or
any speech-specific mechanism. K. N. Stevens (1981) claims that certain acoustic continua,
because of the way the sounds comprising them are processed by the auditory system,
contain regions where discrimination is poor and other regions where it is good.
There are three lines of evidence to support the idea that natural auditory sensitivities
may be responsible for categorical perception. Firstly, categorical perception has been
reported with non-speech continua in which the acoustic contrasts seem to be related to
phonetic contrasts used to distinguish phonemes (Cutting & Rosner, 1974; J. D. Miller
et al., 1976; D. B. Pisoni, 1971). Secondly, mammals seem to perceive VOT
categorically (P. K. Kuhl, 1981; Ramus, Hauser, Miller, Morris, & Mehler, 2000;
Snowdon, 1987), and perhaps they are responding to the same auditory properties that
adult humans respond to, which would argue for a psychoacoustic basis of perceptual
categories. Thirdly, categorical discrimination occurs in human infants for certain speech
and non-speech sounds. For example, infants appear to discriminate VOT categorically
(Aslin et al., 1981), but also discriminate non-speech TOT categorically (Jusczyk et al.,
1980).
This evidence indicates that categorical perception may have a psychoacoustic basis
that arises from an auditory rather than a linguistic predisposition (Aslin et al., 1981;
Jusczyk, 1981).
2.6.3 The Stage Theory of Speech Perception
The Stage Theory of Speech Perception proposes a different kind of model. In this
theory it is claimed that there is a sequence of processing stages, the most distinctive
of which includes an array of phonetic feature detectors (Klatt, 1989; K. N. Stevens,
1986). In the first stage the speech signal undergoes analysis in the peripheral auditory
system (inner ear, basilar membrane), including filtering, lateral suppression,
adaptation, and phase locking. In the next stage an array of acoustic property detectors,
including detectors for onset, spectral change, formant frequency and periodicity,
compute relational attributes of the signal, for example dynamic changes in the
spectrum or periodicity across different parts of the signal, which tend to be more
invariant than absolute local or static attributes. The third stage consists of an array of
phonetic feature detectors, which examine the set of auditory property values over a
certain period of time, and decide if a particular phonetic feature, such as voicing or
nasality, is present. These decisions are language-specific, i.e., the detectors that are in
use are tuned to the phonetic contrasts of the ambient language of the listener.
Nevertheless, decisions may be similar in many languages owing to constraints
imposed on all speakers and listeners by speech production mechanisms (K. N. Stevens,
1989), and by the auditory system (K. N. Stevens, 1981). A phonetic feature detector
may lead to a decision based on the input from a single acoustic property detector, or it
may combine information from several property detectors. Finally, there are stages of
segmental analysis and lexical search. (For further description of these stages, see
Klatt, 1989.)
The main principle underlying this model is that it should be possible to find a
relatively invariant mapping between acoustic patterns and perceived speech sounds,
provided the acoustic patterns are analysed in an appropriate manner.
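The staged architecture described above can be made concrete with a schematic sketch. Everything in the code below is illustrative invention: the stage names follow the text, but the representations and the threshold are placeholder assumptions, since the theory itself specifies no implementation.

```python
from typing import List

# A schematic rendering of the stage sequence described above
# (peripheral analysis -> acoustic property detectors -> phonetic
# feature detectors). All representations and thresholds here are
# invented placeholders; the theory specifies no code.

def peripheral_analysis(signal: List[float]) -> List[float]:
    """Stand-in for peripheral auditory processing (filtering etc.)."""
    return [abs(s) for s in signal]

def property_detectors(auditory: List[float]) -> dict:
    """Stand-in for relational acoustic-property detectors, e.g. onsets."""
    return {"onset_strength": max(auditory) - auditory[0]}

def feature_detectors(properties: dict) -> dict:
    """Stand-in for language-tuned phonetic feature decisions."""
    return {"abrupt_onset": properties["onset_strength"] > 0.5}

def stage_pipeline(signal: List[float]) -> dict:
    # The stages are applied strictly in sequence, as the model describes.
    return feature_detectors(property_detectors(peripheral_analysis(signal)))
```

The point of the sketch is purely structural: each stage consumes only the previous stage's output, which is the model's central architectural claim.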
2.6.4 The Dual-Process Model
The Haskins group described speech perception as a process that is either categorical or
continuous, corresponding to the articulatory discontinuity or continuity of the
perceived segmental distinctions; i.e. whether co-articulations between two particular
segments occur, or are anatomically possible. Perception was thought to be mediated by
an articulatory representation of the input (Liberman et al., 1957), even though the
similarity of continuous perception and non-speech perception was evident.
This view of speech perception contrasts with the dual-process model proposed by
Fujisaki and Kawashima (1970) and elaborated by Pisoni (1975), in which two modes are
active simultaneously, one of which is strictly categorical and represents phonetic
classification and the associated short-term memory. The other is claimed to be
continuous and represents processes common to all auditory perception, including
auditory short-term memory. The results of any speech discrimination experiment are
thus assumed to reflect both processes: the part of performance that can be predicted
from identification performance (Haskins model) is ascribed to categorical judgements;
the 'rest', which is the deviation from the Haskins prediction, is assigned to the
memory for acoustic stimulus properties.
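The Haskins prediction referred to here can be computed directly from identification data. A minimal sketch follows, using the standard two-alternative ABX formula; the specific formula is an assumption on my part, as the text does not spell it out.

```python
def haskins_predicted_abx(p1: float, p2: float) -> float:
    """Predicted ABX discrimination for a stimulus pair, given the
    probabilities p1 and p2 of labelling each stimulus as a given
    category. Classic label-based prediction: chance (0.5) plus half
    the squared difference between the identification probabilities."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

# A within-category pair (identical labelling) predicts chance performance:
#   haskins_predicted_abx(0.9, 0.9)  -> 0.5
# A cross-boundary pair predicts near-perfect discrimination:
#   haskins_predicted_abx(0.95, 0.05) -> ~0.905
```

On the dual-process view, any observed discrimination above this prediction is attributed to the continuous, auditory-memory component.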
The dual-process model partly discards the articulatory basis for categorical perception
by associating continuous perception with auditory (non-speech) perception. Thus, the
difference in categoricality between stop consonants and vowels is hypothesized to
derive not from the different articulatory properties of these segments, but from the
different strengths of their representation in auditory memory. By augmenting the
Haskins model with a free parameter representing the contribution of auditory memory,
Fujisaki and Kawashima also introduced a way of quantifying 'degrees' of categorical
perception.
The dual-process model opens new research opportunities: for example, it makes it
possible to investigate how subjects use the two sources of information (categorical
and continuous) and what factors might lead them to rely more on one than the
other. Given that the continuous component is identified with general auditory memory,
supporters of this model developed techniques to manipulate the strength of
that memory and to examine its influence on discrimination (see section 2.4.2). Thus, in
the dual-process model categorical perception changed from being a “special” speech
phenomenon to being a function of the experimental situation.
3.1 What is a Tonal Language?
As briefly explained in the previous chapter, apart from consonants and vowels, a third
feature that can be used to distinguish the meaning of words in spoken language is tone.
Tone differs from consonants and vowels in that, while all the world's languages use
consonants and vowels, not all of them use tone: only some 60 to 70
percent make use of tone to convey meaning (Yip, 2002). Examples of tonal languages
are Mandarin Chinese (almost 900 million speakers), Thai (50 million speakers), and
Yoruba (a language spoken in West Africa, about 20 million speakers).
A language is classified as a tonal language if, over and above consonant and vowel
information, pitch height and/or contour can change the meaning of words. Tonal
languages in which there are mostly level tones and relative pitch height is important,
are called register tone languages; tonal languages that also distinguish meaning by
pitch contour (such as falling or rising), rather than height only, are called contour tone
languages (Pike, 1948). It is reported that around 80 percent of the tonal languages in
the world are contour tone languages (Maddieson, 1978).
Before discussing particular tonal languages in more detail, it is important to define and
distinguish some terms that will be used in the following sections: fundamental
frequency, pitch, and tone.
Fundamental frequency (F0) is an acoustic property of the speech signal. F0 is measured
in Hertz (Hz) and it refers to the pulses per second in the acoustic signal. In speech,
each pulse is produced by a single vocal cord vibration (the process of vocal production
will be explained in section 3.4).
The term 'pitch' refers to how the F0 of sounds (speech or non-speech) is perceived;
pitch is the subjective auditory perception that is correlated with the fundamental
frequency of the acoustic signal. Even though the terms pitch and F0 are often used
interchangeably, the relationship between F0 and pitch is not straightforward.
Psychoacoustic experiments have shown that pitch perception is linearly related to
F0 only up to frequencies around 500 Hz; above 500 Hz, pitch is
logarithmically related to physical F0 changes (Moore, 1989).
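The piecewise relationship claimed here can be illustrated with a toy function. The break point and scaling constant below are illustrative assumptions only, not a calibrated psychoacoustic scale.

```python
import math

def toy_pitch(f0_hz: float) -> float:
    """Toy illustration of the piecewise claim above: perceived pitch
    tracks F0 roughly linearly up to ~500 Hz and roughly
    logarithmically above. The 500 Hz break point and the scaling
    constant are illustrative assumptions, not measured values."""
    if f0_hz <= 500.0:
        return f0_hz  # linear region
    # logarithmic region, joined continuously at 500 Hz
    return 500.0 + 500.0 * math.log(f0_hz / 500.0)
```

Under this sketch, doubling F0 from 100 to 200 Hz doubles the pitch value, while doubling F0 from 500 to 1000 Hz adds only roughly 347 units: equal physical steps yield progressively smaller perceptual steps at high frequencies.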
Tone is a linguistic term: lexical tone refers to a phonological category that
distinguishes words and is only relevant for languages in which pitch plays a phonemic
role – tonal languages such as Thai, Mandarin, Cantonese, or Vietnamese. It is
important to note that there are factors other than pitch that influence tonal distinctions.
Duration of the vowel, voice quality, values of the second formant (F2), and amplitude
can also play important roles in the perception and production of lexical tone
(Abramson, 1978; Henderson, 1981; Tseng, Massaro, & Cohen, 1986; Yip, 2002).
Nevertheless, pitch variations – height and/or contour – are the sine qua non of lexical
tone.
In this thesis, the concern is with the pitch variations in speech sounds (lexical tone)
and in non-speech sounds.
3.1.1 Tonal Phenomena
Apart from lexical tone languages, other languages also use pitch to distinguish
meaning; the so-called pitch accent languages make use of tone in a different way
from tonal languages. Pitch accent languages like Japanese or Swedish mark words with
specific tone patterns that can change the meaning of the word. Pitch accent languages
differ from tonal languages in that not every syllable is marked by a tone and the
distribution of tones within a word or an utterance is predictable because it is governed
by certain pitch accent rules (Crystal, 2003). Another difference between pitch accent
and tone is that pitch accent relies on relative pitch across syllables, whereas lexical
tone is syllable-based.
In this thesis, the concern is not with pitch accent languages, but with the syllabic tone
languages Mandarin Chinese, Thai, and Vietnamese and the perception and production
of syllabic lexical tone and non-speech pitch.
A phenomenon that occurs in some tonal languages, e.g., Mandarin and Cantonese
Chinese, is tone sandhi. Sandhi is Sanskrit for "putting together", and tone sandhi
refers to the rules governing how the pronunciation of tones changes depending on
their context in spoken language (see section 3.2.3 for more information
about Mandarin tone sandhi). Historically, tone sandhi appears to have developed out
of allophonic variants of tones which assimilated other tones and were subsequently
replaced by them (Gussenhoven, 2004).
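The best-known Mandarin sandhi rule, a third tone followed by another third tone being realised as a second tone, can be sketched as a simple rewrite over a sequence of tone numbers. This naive left-to-right pass is my own illustration and deliberately ignores the prosodic grouping that governs the rule's application in longer phrases.

```python
def apply_third_tone_sandhi(tones):
    """Sketch of the best-known Mandarin sandhi rule:
    tone 3 + tone 3 -> tone 2 + tone 3.
    A naive left-to-right pass; real application in longer phrases
    depends on prosodic grouping, which is ignored here."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

# e.g. a two-syllable phrase with underlying tones 3, 3 surfaces as 2, 3.
```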
Apart from tone sandhi, there is a different articulatory phenomenon that can influence
tone production – tonal coarticulation. Coarticulation refers to the process in spoken
language in which the features of phonemes change due to overlap with the properties
of adjacent sounds. This has been observed with consonants and vowels (see section
2.2.2), and tones also seem to be prone to coarticulatory effects. Tonal coarticulation
occurs in various languages, including Thai (Gandour, Potisuk, Dechongkit, &
Ponglorpisit, 1992), Vietnamese (Han & Kim, 1974), and Mandarin Chinese (Y. Xu,
1994). In a comprehensive study of tonal coarticulation of Mandarin, it was observed
that there is bidirectional coarticulation in Mandarin that affects the average pitch
height, rather than only the tone onset or tone offset values (X. Shen, 1990).
Even though tone and intonation are semantically distinct and not to be confused (see
section 2.1.3), there are acoustic interactions between intonation and lexical tone. For
example in Thai, lexical tone interacts with intonation such that lexical tones keep their
contour, but the absolute values of F0 of tones decrease with the usual intonational
declination of F0 over the utterance (Abramson & Svastikula, 1983). Nevertheless, tone
and intonation remain distinct. In a study of intonation patterns in Thai,
Luksaneeyanawin (1984; 1998) observed four intonation types: statement intonation
(also called tune 1), question intonation (tune 2), telephone 'yes' intonation (tune 3),
and agreeable and interested intonation (tune 4). Integrated in the different tunes, the
lexical tones could still be identified quite accurately, even though there was some
confusion between low, mid, and high tones in tune 1 and between mid and low tones
in tune 2 (Luksaneeyanawin, 1984, 1998).
Thus even though the acoustic characteristics of tones may change due to the effect of
intonation, they remain relatively distinct and can be perceived correctly, as long as the
listener has knowledge about the tonal context, such as the normal speaking range of
the speaker. It appears that there is interaction between tone and intonation in Thai, but
intonation does not seem to interfere with correct tone identification.
3.1.2 Notation of Tone
In order to provide a reference system that can be used to describe lexical tones,
linguists often use numbers, called the 'Chao tone values', based on Chao's (1930)
work. These numbers divide the natural pitch range of a particular speaker into five
levels, the lowest being 1 and the highest being 5. Each syllable is given up to three
Chao numbers in order to track the course of F0 movement across the tone. Most
syllable tones have two digits, the first indicating the onset pitch and the last the offset
pitch of the tone. Three digits are used for tones that have a changing pitch direction in
the course of the syllable, such as the rising tone in Thai or the mid-falling-rising tone
in Mandarin Chinese (Yip, 2002). If a syllable does not have a tone value or has a
neutral tone22, no numbers are assigned. The Chao values are generally used in the
description of lexical tones in Asian and Southeast Asian tonal languages. This
numerical system sometimes goes along with small diagrams of the tonal contour.
Using these conventions, the representation of the five Thai tones is shown in Table
3.1.
Table 3.1
The Five Lexical Tones of Standard Thai and an Example of a Five-way Tone Contrast
on the Syllable /na/. IPA transcriptions, tone names, Chao-values and meanings are
provided.
IPA symbol    name       Chao-value    meaning
_______________________________________________________________
[ná]          high       55            'aunt'/'uncle'
[nā]          mid        33            'a paddy'
[nà]          low        11            a nickname
[nâ]          falling    451           'face'
[nă]          rising     215           'thick'
In the description of tones in American languages a similar system is used, but with
reversed values – 5 indicates low and 1 high tone (Yip, 2002). African languages are
usually described with the letters H for high, L for low and M for middle tone
(Gussenhoven, 2004; Yip, 2002), because most of these are register tone languages
with little or no movement within the course of a single tone.
In this thesis, lexical tones will either be described by numerical Chao values or they
will be named according to the appropriate convention for the particular language.
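To make the convention concrete, Chao values can be turned into approximate F0 targets once a speaker's pitch range is known. The following Python sketch is illustrative only: the function name, the example range of 100-200 Hz, and the even linear spacing of the five levels are assumptions (a logarithmic or semitone spacing would be equally defensible), not part of Chao's system.

```python
def chao_to_f0(chao, f0_min=100.0, f0_max=200.0):
    """Map a Chao tone value (e.g. '215') to approximate F0 targets in Hz.

    Level 1 is the bottom and level 5 the top of the speaker's range;
    intermediate levels are spaced evenly on a linear Hz scale here.
    """
    step = (f0_max - f0_min) / 4.0  # four intervals between levels 1..5
    return [f0_min + (int(d) - 1) * step for d in str(chao)]

# The five Thai tones with the Chao values used in Table 3.1:
thai_tones = {"high": "55", "mid": "33", "low": "11",
              "falling": "451", "rising": "215"}
contours = {name: chao_to_f0(value) for name, value in thai_tones.items()}
```

With a 100-200 Hz range, the rising tone '215' yields the targets 125, 100, and 200 Hz: a shallow dip followed by a rise, as the three-digit notation intends.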
22 The neutral tone (also called the fifth tone) is an inherent tone whose pitch characteristics depend on the preceding tone (see also 3.1.1 for tone sandhi).
Building on this preliminary exposition of tonal phenomena, and knowledge of how
tones may be described, the following section provides more detailed reviews of the
tonal language systems that are important for this thesis.
3.2 Tonal Language Systems
This thesis is only concerned with tonal languages that are located in Asia and
Southeast Asia, specifically Mandarin Chinese, Vietnamese, and Thai. In this section an
overview of these and, where relevant, some other tonal languages of the world will be
provided.
Tonal languages can be classified on the basis of their geographical location. There are
three areas in the world where most of the tonal languages can be found: Africa, East
and Southeast Asia (including the Pacific), and the Americas.
Most of the African languages are tonal, except for the Semitic languages and the
Berber languages. Most African tone languages have a high and a low (and sometimes
a middle) tone (Yip, 2002).
A great number of the languages of Central America, such as Tewa and Mixtec are
tonal languages. Most tonal inventories in Central American languages have four or
five level tones (Yip, 2002).
Asian and Pacific languages are also rich in tones. Included in the Asian-Pacific
languages are the Chinese languages, such as Cantonese and Mandarin. Cantonese
is spoken in Hong Kong and Canton (66 million speakers). The number of tones in the
Cantonese language has been the subject of debate. According to Bauer and
Benedict (1997), there are six tones, three level tones (high level, mid level, and low
level) and three contour tones (high rising, mid-low rising, and mid-low falling), with
an additional contrastive tone that occurs only for some Cantonese speakers. Others put
the number at nine tones (So, 1996): six contrastive tones plus three allotones of the
three level tones (high level, mid level, and low level), which differ in pitch height. The
three contour tones are the high rising, the low rising, and the low falling tone. The
three allotones are not contrastive tones: the high entering tone is an allotone of the
high level tone, the mid entering tone an allotone of the mid level tone, and the low
entering tone an allotone of the low level tone. These allotones are similar in height to
their level counterpart tones but differ in duration (So, 1996).
Other Asian/Pacific languages include Tibeto-Burman, Tai-Kadai (including Thai),
Vietnamese, and the Papuan languages. In the following sections (3.2.1, 3.2.2, and 3.2.3), the Thai,
Vietnamese, and Mandarin languages are discussed in further detail, because these are
the populations of tonal language speakers that will serve as participants in experiments
in this thesis.
3.2.1 Thai Tones
Thai is spoken by about 50 million people in Thailand, Vietnam, and in the Yunnan
province of China. There are various dialects of Thai, but Standard Thai (also called
Central Thai or Siamese) is the official language of Thailand, used in schools, in
trade, on television, and in national politics. The Thai phonemic inventory consists of
20 consonants, nine monophthongs, and three diphthongs. Each of the monophthongs
can occur in a long or a short version and all 21 vowels can occur with an initial
consonant, a syllable-final consonant, or both (Wayland & Guion, 2003).
In Thai, in addition to consonant and vowel features, every syllable carries a lexical
tone. Thai has five lexical tones: high, falling, mid, rising, and low. There are three
level tones, whose fundamental frequencies are relatively stable: high, mid, and low.
The other two tones, rising and falling have dynamic pitch contours and are therefore
called contour tones (Abramson, 1978). The trajectories of Thai tones in terms of F0
over time are provided in Figure 3.1.
Figure 3.1. Time normalised fundamental frequency contours of the five Thai tones spoken by a male Thai speaker. Figure reproduced from Mattock (2004). Permission to reproduce this figure was obtained from the author.
3.2.2 Vietnamese Tones
Vietnamese, the official language of Vietnam, is spoken by around 64 million people
(Dung, Huong, & Boulakia, 1998). Vietnamese is a monosyllabic language with six
lexical tones. These can be separated into two sets of three: three high and three low
tones. The high tones consist of high (or mid) level [33] (tone 1), creaky rising (broken)
[415] (tone 3) and high (or mid) rising [35] (tone 5); the low tones are low falling [21]
(tone 2), low dipping-rising [313] (tone 4) and low level [22] (tone 6) (Dung et al.,
1998; Yip, 2002). The trajectories of Vietnamese tones in terms of F0 over time are
provided in Figure 3.2.
Figure 3.2. Time normalised fundamental frequency contours of the six Vietnamese tones spoken by a male Vietnamese speaker (tone stimuli provided by Prof. Mixdorff, 2007). Permission to use tones was obtained from Prof. Mixdorff.
Over and above F0, other factors that play a role in the Vietnamese tonal system are
duration and voice register. The high rising tone is usually produced in a tense manner,
the creaky rising tone is produced with glottalisation, the low falling tone is usually
produced in a breathy manner, and the low dipping-rising is produced in a tense way
(L. Thompson, 1987).
Duration is generally longest for tone 5 (around 400 ms), followed by tone 1, then tone
2 and tone 3, shorter again for tone 4, and shortest for tone 6 (around 160 ms).
3.2.3 Mandarin Chinese Tones
Around 70% of the Chinese population speaks Mandarin, and there are a total of 1.1
billion Mandarin speakers across the world. Mandarin is spoken in Mainland China and
is the official language of the country, used in television, education, and
politics. Mandarin Chinese has four tones: high-level (tone 1), mid-rising (tone 2), low-
falling-rising (tone 3), and high-falling (tone 4). Mandarin tone trajectories in terms of
F0 over time are provided in Figure 3.3.
Figure 3.3. Time normalised fundamental frequency contours of the four Mandarin tones spoken by a female Mandarin speaker. Figure reproduced from Mattock (2004). Permission to reproduce this figure was obtained from the author.
Apart from distinct tone heights and contours, duration differences are also apparent in
Mandarin tones. In citation form, tone 3 is observed to be longer than the other tones,
which are of similar duration (Dow, 1972; A. Ho, 1976; Howie, 1976); however, this
duration difference is not as apparent in spontaneous speech (Coster & Kratochvil,
1984; Kratochvil, 1985, 1998). Another cue that plays a role in Mandarin lexical tone is
the intensity contour (Chuang, Hiki, Sone, & Nimura, 1972; Coster & Kratochvil,
1984). In terms of intensity or amplitude, the tones can be categorised into five
patterns: level, higher at onset, higher at offset, higher in the middle, and double-peak
amplitude contour (Lin, 1988). Chuang, Hiki, Sone, and Nimura (1972) revealed that
Tone 4 has the highest amplitude overall, and Tone 3 the lowest amplitude, and Whalen
and Xu (1992) have shown that listeners are able to identify all tones except for Tone 1
from amplitude contours alone. Taken together, these findings show that F0, duration, and
amplitude constitute phonetic correlates and perceptual cues for tones in Mandarin,
with F0 usually being the most relevant cue.
Mandarin Chinese exhibits tone sandhi (see section 3.1.1). The main sandhi rules in
Mandarin are: if a third tone is followed by another third tone, the first one changes to a
tone that is similar to the second tone; if a third tone is followed by a neutral23, first,
second, or fourth tone, it changes to what can be called a 'half-third' tone, which begins
to dip like the third tone but then does not rise; and a second tone changes to a first
tone when it follows a first or second tone and is followed by any of the four tones
(Hung, 1989).
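The sandhi rules above can be sketched as a simple rewriting procedure over sequences of tone numbers. The following Python sketch is a simplification under stated assumptions: the label 'h3' for the half-third tone and the left-to-right application order are illustrative choices, and real sandhi application interacts with prosodic phrasing, which is ignored here.

```python
def apply_sandhi(tones):
    """Apply the Mandarin sandhi rules described above to a list of tone
    numbers (1-4; 0 stands for the neutral tone).

    Rule 1: a 3rd tone before another 3rd tone becomes (similar to) a 2nd tone.
    Rule 2: a 3rd tone before a neutral, 1st, 2nd, or 4th tone becomes a
            half-third tone, marked 'h3' here.
    Rule 3: an underlying 2nd tone between a preceding 1st/2nd tone and any
            following tone becomes a 1st tone (Hung, 1989).
    """
    out = list(tones)
    n = len(out)
    for i in range(n - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2                 # Rule 1
        elif out[i] == 3 and out[i + 1] in (0, 1, 2, 4):
            out[i] = "h3"              # Rule 2
    # Rule 3 is checked against the underlying tones, not sandhi outputs.
    for i in range(1, n - 1):
        if tones[i] == 2 and tones[i - 1] in (1, 2) and tones[i + 1] in (1, 2, 3, 4):
            out[i] = 1
    return out
```

For example, a 3rd-3rd sequence such as "ni hao" surfaces as 2nd-3rd: `apply_sandhi([3, 3])` returns `[2, 3]`, while `apply_sandhi([3, 1])` returns `["h3", 1]`.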
3.3 Tonogenesis
Tones, like consonants and vowels, change over historical time; this tonal
development is called tonogenesis. There are two possible manners of tonal
development: languages can acquire lexical tones or they can lose them. As there are
tonal languages spoken in various regions of the world - Africa, America and South
East Asia (Yip, 2002) - it appears that the geographical location of, or genetic
relationship between, languages does not play a simple role in the origin of lexical tone.
Nevertheless, such factors can be relevant for the development of tones, which
often appears to occur through imitation of other languages (Henderson, 1981), with
tone loss often related to proximity of the language community to speakers of non-tonal
languages (Gussenhoven, 2004). In addition, it is rare for new tones to develop in non-
tonal languages that are not geographically close to areas where lexical tone is used.
3.3.1 Development of Tones from Voicing Contrasts - Tonal Split
The most common type of tonogenesis is the development of tones from a voicing
distinction of stop consonants in a prevocalic position. This so-called tonal split occurs
when a vowel that follows a voiced stop consonant changes to a low-pitched
tone. In the same way, high-pitched tones can develop out of vowels preceded by
23 The neutral tone (where the usual tone is dropped) varies depending on the tone that precedes it (see also 3.1.1 for tone sandhi).
voiceless consonants. In this way, two tones are created from one previous voiced-
voiceless consonant distinction. Tonal split has been observed for Vietnamese
(Haudricourt, 1954), Chinese (Karlgren, 1926), other East and Southeast Asian
languages (Haudricourt, 1954, 1961), including Thai (Gandour, 1974), and in African
languages (Beach, 1938) such as Yoruba (Hombert, 1975, 1977a).
It can be concluded that in tonal languages there may be a trend to minimise the
intrinsic effects of preceding consonants on F0, which would serve to keep the different
tones as perceptually distinguishable as possible.
Two theories have been proposed to explain the articulatory tone change that occurs
after prevocalic voiced or voiceless stops: the aerodynamic theory, and the vocal-cord
tension theory. The aerodynamic theory explains the tone change through pressure
differences in voiced vs. voiceless stops (Hombert, 1975; Hombert & Ladefoged, 1976;
Ohala, 1970, 1973b), whereas the vocal-cord tension theory attributes the F0 changes to
tension changes in the vocal cords as a result of changes in stiffness of the vocal cords
during the production of voiceless sounds, and subsequent spreading out over adjacent
vowels (Ewan, 1975; M. Halle & Stevens, 1971; Ohala, 1973a).
Regardless of which explanation is correct it can be concluded that the voicing
distinction in a prevocalic location results in subtle but perceptible articulation-based F0
changes, which can result in a tone contrast supplementing a voicing contrast.
3.3.2 Development of Tones from Consonants
Apart from the influence of voicing, there are other consonantal features that can play a
role in tone development. The development of lexical tones from a vowel preceded by a
breathy voiced consonant was, for example, shown in the Punjabi language (Gill &
Gleason, 1969). Implosives24 are another class of consonants that appear to influence
lexical tone development. In languages such as Lolo-Burmese, implosives have been
observed to lower the pitch of subsequent vowels in a similar, though less effective, way to
voiced stops (Matisoff, 1972). Tone change can also result from the effects of glottal
stop consonants on the preceding vowel. In Vietnamese, the glottal stop disappeared in
the 6th century and was replaced by a rising tone (Haudricourt, 1954), a development
24 Implosives are sounds that are produced by a complete closure of the oral cavity.
similar to that in Middle Chinese, where the rising tone also evolved out of a final
glottal stop (Mei, 1970).
3.3.3 Development of Tones from Vowel Height
Another explanation of tone change is the historical development of tones from vowel
quality. Even though there are not many cases to support this suggestion, vowel height
seems to have influenced tone development in Ngizim, an Afro-Asiatic language
spoken in Nigeria, in which the tone pattern of the verb can be (partially) predicted by
the vowel of the first syllable (Hombert, Ohala, & Ewan, 1979; Schuh, 1971). For a
detailed review of data concerning the development of tones from vowel height see
Hombert (1977b).
3.3.4 Other Influences of Tone Development
In addition to the above causes of tone development from stop consonants, implosive
consonants, glottal consonants, and vowel height, other factors that can play a role in
tonal development are intrinsic pitch25, downdrift26, and interactions between tones,
such as tone redistribution to maximise the perceptual distance between different tones.
Tonal phenomena that cannot be explained by any of these approaches are labelled
tonal 'flip-flop' or 'tonexodus'. Tonal flip-flop indicates that low tones become high
tones and vice versa. The term tonexodus refers to the development in which particular
tones are eliminated from tonal languages.
3.4 Tone Production
In order to understand the perception of lexical tone, some basic knowledge of the
articulation of F0, the basis of pitch perception, is important. The F0 of speech sounds is
mainly determined by the frequency of vocal cord vibrations. The basic mechanisms of
F0 production are explained below, and the basic structures are shown in Figure 3.4.
25 Intrinsic pitch refers to the fact that high vowels (high in terms of tongue position) are produced with a higher pitch than low (tongue position) vowels.
26 Downdrift is the lowering of a high tone after a low tone (also called automatic downstep).
Figure 3.4. View of the larynx (a) lateral view and (b) view from above (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
The larynx consists of two rings of cartilage, the cricoid cartilage and the thyroid
cartilage. The thyroid is an open ring that sits on the cricoid. The arytenoids, two small
cartilages, are located on the top of the rear rim of the cricoid cartilage. The vocal cords
(also called vocal 'folds') are two muscles, together called the vocalis muscle, that join
the thyroid and the arytenoid cartilages. Between the vocal cords is the glottis, a
passage that allows air to pass from the lungs to the mouth. The rotating movement of
the arytenoid cartilages controls the degree of opening of the glottis by bringing the
vocal cords closer together or further apart. The rhythmic closing and opening of the
glottis is often referred to as vocal cord vibration. Vocal cord vibration is achieved by
closing the glottis. When the vocal cords are brought together very closely and air is
forced through the narrow glottal opening, a mechanical force called Bernoulli-force27
applies a sucking effect that draws the vocal cords closer together. Due to this closure,
air pressure from the lungs is built up and eventually forces the vocal folds apart,
releasing a stream of air and reducing the pressure behind the glottis. This cycle is then
repeated. Each burst of air is one vocal cord vibration cycle and these cycles can occur
27 The term Bernoulli-force refers to a high-velocity airstream that passes through a narrow opening causing a reduction in air pressure, which results in drawing the walls of the vocal cords together (Pompino-Marschall, 1995).
from as low as 80 times per second (in male speakers) to up to 400 times per second
(in female speakers).
Figure 3.5. Schematic figure of the vocal folds during phonation: (a) closed glottis, (b) subglottal air pressure, (c) glottis forced apart (air pressure: horizontal arrows; Bernoulli pressure: vertical arrows), (d) closing glottis, (e) start of the new cycle (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
Small but perceptible F0 changes can be achieved by finely adjusting the mass and
stiffness of the vocal cords (Hirose, 1997). The length of the vocal cords can be
increased by contraction of the crico-thyroid muscle, which, in turn, decreases the mass
of the vocal cords and increases their stiffness. As a result of the greater stiffness,
vibration frequency, and thus F0, increases. It has been shown that the crico-thyroid
muscle plays an important role in the process of raising pitch in tonal languages,
whereas lowering the pitch involves a more complex interaction of crico-thyroid
and thyro-arytenoid muscles (Yip, 2002).
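The trade-off between vocal cord length, mass, and stiffness described above can be caricatured with the ideal-string formula f = (1/2L)·√(T/µ), where T is tension and µ mass per unit length. This is emphatically not a physiological model of the vocal folds (which are not ideal strings, and whose length increase simultaneously changes T and µ), but it reproduces the qualitative relationships in the text: greater stiffness or tension raises F0, greater effective mass or length lowers it. The parameter values below are purely illustrative.

```python
import math

def string_f0(length_m, tension_n, mass_per_length):
    """F0 of an ideal vibrating string: f = (1/2L) * sqrt(T / mu).

    A first-order caricature of vocal fold vibration: higher tension
    (stiffer folds) raises F0; greater effective mass per unit length,
    or greater length, lowers it.
    """
    return math.sqrt(tension_n / mass_per_length) / (2.0 * length_m)

# Under this model, quadrupling tension doubles F0, and doubling the
# vibrating length halves it (illustrative parameter values):
f_ref = string_f0(0.016, 1.0, 0.001)
f_tense = string_f0(0.016, 4.0, 0.001)   # 2 * f_ref
f_long = string_f0(0.032, 1.0, 0.001)    # f_ref / 2
```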
In the production of stop consonants, voicing regulation is only possible under certain
conditions. If the vocal cords are stiffened by muscular tension, a great amount of
pressure has to be applied in order to make the vocal cords vibrate. Stop consonants
that are produced with stiff vocal cords are voiceless. Hence, the vowel following
voiceless consonants is automatically produced with a higher pitch than when preceded
by a voiced consonant, as was seen in consideration of tonogenesis (see Section 3.3).
In the production of vowels and sonorants (voiced consonants with vowel-like quality,
such as approximants like [w] as in 'we' and [j] as in 'yet'), vibration frequency is
controlled by different factors. In particular the length of the vocal cords can be
affected by the interplay between thyroid and cricoid rotation patterns, leading to
different vibration patterns.
3.5 Tone Perception
While a number of articulatory factors contribute to tone and its perception, such as
duration and register (see section 3.5.3), the main factor is F0, and it is F0 that will
mainly be considered here.
3.5.1 Fundamental Frequency and the Auditory System
Here the anatomy and physiology of the human ear are described, especially as they
relate to the coding of fundamental frequency (Ball & Rahilly, 1999; Moore, 1989;
Pompino-Marschall, 1995).
3.5.1.1 The Outer and Middle Ear
The outer ear is composed of the pinna and the auditory canal or meatus (see Figure
3.6). The pinna modifies the incoming sound, particularly at high frequencies, which is
important for the ability to localise sounds. The sound then travels along the meatus and
causes the tympanic membrane to vibrate. These vibrations are then transmitted
through the middle ear by three small ossicles, the malleus, incus, and stapes to a
membrane-covered open window in the bony wall of the spiral-shaped structure of the
inner ear, the cochlea. The major function of the middle ear is to ensure the efficient
transfer of sound from the air through to the fluids in the cochlea. If sound were to
impinge directly on the oval window, most of it would simply be reflected because of
the much greater acoustical impedance of the cochlear fluids. Thus the middle ear acts as an
impedance-matching device or transformer that improves sound transmission and
reduces reflections. Transmission of sound through the middle ear is most efficient at
frequencies between 500 and 4000 Hz (Ball & Rahilly, 1999; Moore, 1989; Pompino-
Marschall, 1995).
Figure 3.6. Anatomy of the ear: outer ear, middle ear, and inner ear (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
3.5.1.2 The Inner Ear and the Basilar Membrane
The cochlea is a spiral-shaped conical chamber of bone. It has rigid walls and is filled
with almost incompressible fluids. It is divided along its length by two membranes, the
Reissner membrane and the Basilar membrane. The oval window (see Figure 3.7) is
located at the basal end of the cochlea and at the apical end is a small opening (the
helicotrema) connecting the two outer chambers of the cochlea, the scala vestibuli and
the scala tympani. Inward movement of the oval window results in a corresponding
outward movement of the round window, a membrane covering this second opening of
the cochlea (Ball & Rahilly, 1999; Moore, 1989).
When the oval window (see Figure 3.7) is set in motion by an incoming sound, a
pressure difference occurs almost instantaneously through the fluids of the cochlea, and
thus along the whole length of the basilar membrane. A travelling wave moves along
the basilar membrane from the base towards the apex, and the amplitude of this wave at
first increases and then decreases rather abruptly. The response of the basilar membrane
to sounds of different frequencies is strongly influenced by its mechanical properties,
which vary across the length: At the base, the basilar membrane is relatively narrow
and stiff, whereas at the apex it is wider and more flexible, such that the position of
maximum vibration differs according to the frequency of stimulation. High frequencies
produce a maximum displacement of the basilar membrane near the oval window and
there is little movement on the remainder of the membrane. Low frequencies produce a
vibration pattern which extends all the way along the basilar membrane but reaches a
maximum before the end of the membrane (Ball & Rahilly, 1999; Moore, 1989;
Pompino-Marschall, 1995). Thus it can be seen that the basilar membrane acts as a kind
of spectrum analyser, but with a limited resolving power.
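The frequency-to-place relationship just described is commonly approximated by Greenwood's function, F = A(10^{ax} − k). The sketch below uses the constants usually cited for the human cochlea (A = 165.4, a = 2.1, k = 0.88, with x the relative distance from the apex); it is an approximation added here for illustration, not part of the sources cited in this section.

```python
def greenwood_cf(x):
    """Characteristic frequency (Hz) at relative position x along the
    basilar membrane (0 = apex, 1 = base), using Greenwood's function
    F = A * (10**(a*x) - k) with constants commonly cited for the
    human cochlea (A = 165.4, a = 2.1, k = 0.88).
    """
    return 165.4 * (10 ** (2.1 * x) - 0.88)

# Low frequencies map to the apex and high frequencies to the base:
apex_cf = greenwood_cf(0.0)   # roughly 20 Hz
base_cf = greenwood_cf(1.0)   # roughly 20 kHz
```

The exponential form of the function also illustrates the "limited resolving power" noted above: each octave occupies a roughly constant length of membrane, so closely spaced frequencies excite heavily overlapping regions.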
3.5.1.3 The Transduction Process and the Hair Cells
Hair cells are located between the basilar membrane and the tectorial membrane, which
form part of a structure called the organ of Corti. The hair cells are divided into inner
and outer hair cells by an arch known as the tunnel of Corti with the inner hair cells
closest to the inside of the cochlea, and the outer hair cells to the outside of the cochlea.
There are about 25000 outer hair cells, with around 140 small hairs protruding from
each one, while there are only around 3500 inner hair cells, each with around 40 small
hairs. The gelatinous tectorial membrane lies above the hairs. The hairs of the outer hair
cells seem to make contact with the tectorial membrane, but this may not be the case for
the inner hair cells. The tectorial membrane appears to be effectively hinged on one
side, so that when the basilar membrane moves up and down, a shearing motion is
created between the basilar and tectorial membrane. This displaces the hair cells
leading to excitation of the inner hair cells, which in turn leads to the generation of
action potentials in the neurons of the auditory nerve. Thus, the inner hair cells
transduce mechanical movement into neural activity.
Figure 3.7. Anatomy of the cochlea (Pompino-Marschall, 1995). Permission to reproduce this figure was obtained from the author.
3.5.1.4 Central structures
Nerve fibres from the cochlea first synapse in the cochlear nucleus, then in the superior
olivary nucleus, and finally in the inferior colliculus of the middle brain before they
reach the medial geniculate nucleus of the thalamus. From there, fibres lead to the
primary auditory receiving area, which is located in the temporal lobe of the cortex.
3.5.2 Theories of Pitch Perception
Pitch is the attribute of auditory perception that corresponds to the physical dimension
of frequency, the repetition rate of the sound signal. For a pure tone this
corresponds to the frequency and for a complex sound such as speech or a musical
chord, to the fundamental frequency (F0). In this section the concern is with different
approaches to pitch perception.
There are two main theories of pitch perception: the place theory and the temporal
theory. The place theory was first proposed by von Békésy (1960), and focuses upon
the spectral analysis of the sound stimulus in the cochlea, in which particular
frequencies (or frequency components, in a complex stimulus) excite particular places
along the basilar membrane. The perceived pitch of a stimulus is said to be related to its
pattern of excitation; for a pure tone, the pitch corresponds to the position of maximum
excitation on the basilar membrane. The problem with the place theory is that it cannot
explain fine resolution of frequency.
The temporal theory (J. F. Schouten, 1940) suggests that nerve spikes tend to occur at a
particular phase in the stimulating waveform (phase locking). The intervals between
successive spikes approximate integer multiples of the period of the stimulating
waveform. Pitch would, according to this be related to the pattern of inter-spike
intervals, rather than to location of excitation patterns on the basilar membrane. The
temporal theory, however, cannot explain the perception of tones with frequencies
many times greater than the maximum firing rate of neurons.
Modern theories, such as that of Moore (1982) tend to combine both place theory and
temporal theory, as neither of them can account for all perceptual phenomena. Moore's
model is like the place model in its initial frequency analysis of the sound, followed by
a temporal analysis of the neural firings.
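The temporal theory's idea that pitch is carried by inter-spike intervals is closely related to autocorrelation-based F0 estimation, in which the lag at which a signal best matches a shifted copy of itself determines the perceived period. The following is a minimal illustrative sketch (function name and parameters are my own; practical pitch trackers add windowing, normalisation, and octave-error handling):

```python
import math

def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate F0 by autocorrelation: search the plausible range of pitch
    periods for the lag that maximises the correlation of the signal with
    a delayed copy of itself, mirroring the temporal theory's reliance on
    intervals between successive neural spikes.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(samples) - 1) + 1):
        r = sum(samples[i] * samples[i + lag]
                for i in range(len(samples) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return sample_rate / best_lag

# A 200 Hz sinusoid sampled at 8 kHz: the autocorrelation peaks at a lag
# of one period (40 samples), recovering the frequency.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
```

The place theory, by contrast, corresponds to locating the peak of a spectral analysis; hybrid models such as Moore's effectively combine the two stages.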
The reliability of a particular theory of pitch perception is not of central interest here; it
is sufficient to note that perceived pitch is not a simple function of the physical stimulus
and the means by which the transformation takes place is both non-linear and still to be
precisely established.
3.5.3 Pitch Perception in Speech – Lexical Tone
There is a great deal of research devoted to the investigation of the sensitivity of the
human ear. The just noticeable difference (jnd) for pitch has been found to be 0.3 to 0.5
Hz (Flanagan & Saslow, 1958) in constant synthetic speech sounds and 1.5 to 4 Hz in
dynamic sounds (Klatt, 1973). In this thesis, the concern is with pitch perception in
speech – the case of lexical tone.
Turning to tone perception, the F0 fluctuations both within and between tone classes
must be large enough for the tones to be perceived as distinct, even though
other factors such as duration, amplitude, and voice register may also play a role in tone
perception.
3.5.3.1 Perception of Lexical Tone - Overview
In most of the world's languages, lexical tone determines meaning (Yip, 2002). Even
though lexical tone is apparent in the majority of languages, it is underrepresented in
language and speech research. It is essential to investigate lexical tone in order to find
out whether it shares characteristics with other speech segments, such as vowels and
consonants. In the following sections, tone is discussed in terms of perception,
production, and the acquisition and development of both. Categoricality of tone
perception will also be considered.
3.5.3.2 Multidimensional Approach to Lexical Tone Perception
In a multidimensional analysis28 of lexical tone perception, Gandour and Harshman
(1978) investigated the role of average pitch, pitch direction, length, extreme pitch
endpoint of tones, and pitch contour slope on tone perception. Thai, Yoruba, and
American English native speaking participants were tested. The results showed
similarities between the tonal (Thai and Yoruba) speakers, and both similarities and
differences between these and the non-tonal (English) language speakers. All language
28 Multidimensional scaling techniques are used to detect similarities between pairs of stimuli in order to assess the perceptual distance between those stimuli, measured on different dimensions.
groups used the dimensions of average pitch and duration as perceptual cues, but tone
direction and slope were perceptually salient only for the Thai and Yoruba speakers. In
comparison to the other cues and the other listeners, Thai listeners paid relatively more
attention to the duration of the stimuli compared to average pitch, direction, and slope
cues. Gandour and Harshman (1978) conclude that mean pitch and duration seem to be
either phonetic or auditory language-general perceptual dimensions, while direction and
slope appear to function as phonetic dimensions only for the speakers of tonal
languages, in which such cues are linguistically relevant. In another study (Gandour,
1983), Cantonese, Mandarin, Taiwanese, Thai, and English listeners were tested with 19
different pitch contours that can be found in East Asian languages, embedded in the
syllable [wa]. In accord with the previous study, results revealed two important
perceptual dimensions: pitch height and direction. In the English-speaking participants,
pitch direction appeared less important than height.
In summary, it can be concluded that the acoustic feature of pitch height is the most
salient tonal feature, and is used by speakers from both tonal and non-tonal language
backgrounds. Other features (direction, length, and slope) are also important but their
use depends on the particular language background. In general, it seems that pitch
height is a rather universal and acoustic level feature of tone perception, while the other
features are only important for speakers of languages that make use of those features in
a linguistically relevant fashion.
3.5.3.3 Perception of Tone when Pitch Information is not available
In order to investigate the utility of cues other than F0, studies of tone perception in
which F0 is removed are useful. An easy way of removing the pitch information from
lexical tones is whispering. During whispering, the glottis is kept open and thus, all
sounds that are produced are voiceless. Even though whispering is a natural but rather
unusual mode of speaking, people can communicate with whispered speech. Perception
of whispered tone is of interest because F0 information, one of the main acoustic cues
for pitch perception is not available in whispering.
Two types of studies of perception of whispered lexical tone need to be considered
here: those studies that present whispered lexical tones on isolated words and those that
present whispered tones in context. After that, tone perception in speech sounds where
F0 is neutralised is considered.
Wise and Chong (1957) investigated perception of whispered lexical tone in Mandarin
sentences and observed correct identification in 62% of the sentences. It was concluded
that lexical tone is not consistently intelligible when it is whispered (Wise & Chong,
1957).
In an experiment on the perception of whispered tones in isolated words in Norwegian,
Swedish, Slovenian, and Mandarin Chinese, Jensen (1958) presented whispered
contrastive word pairs, with the only difference between the words being the lexical
tone. Results of this study show that whispered lexical tone was easier to identify in
Swedish (100% correct identification of whispered tones) and Mandarin Chinese (73-
88% correct), but more difficult in Norwegian (53-73% correct) and Slovenian (71-
85% correct). In a replication of this experiment by Miller (1961) with Vietnamese
whispered tones, only 42% of the whispered tones could be correctly identified. Miller
concludes that less tone information appears to be transmitted in Vietnamese whispered
speech than in the other languages (J. D. Miller, 1961).
A different method of removing tone information is to artificially neutralise the F0
contour while leaving the remaining phonetic information, such as amplitude and
duration, intact. Liu and Samuel (2004) conducted two studies: in one, F0 was partly or
completely neutralised; in the other, whispered speech was compared with lexical tones
from which the pitch information had been synthetically removed29. The results show
that identification of the neutralised tones is still quite accurate and that there are
durational differences in whispered tones which might aid perception. Together these
studies show that other cues to tone perception, such as amplitude and duration, are
perceived even when F0 information is not available.
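Liu and Samuel's neutralisation manipulation operated on real speech via resynthesis; the sketch below is only a minimal illustration of the underlying contour arithmetic, in which an F0 track is flattened partly or completely towards its mean. The function name and the example contour values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def neutralise_f0(f0, degree=1.0):
    """Flatten an F0 contour towards its mean.

    f0     : array of F0 values in Hz (voiced frames only)
    degree : 0.0 leaves the contour unchanged, 1.0 yields a completely
             level contour ("complete neutralisation")
    """
    f0 = np.asarray(f0, dtype=float)
    # Interpolate each frame between its original value and the contour mean:
    return (1.0 - degree) * f0 + degree * f0.mean()

# A stylised rising-tone contour (Hz):
rising = np.array([180.0, 190.0, 205.0, 225.0, 250.0])
fully_flat = neutralise_f0(rising, degree=1.0)  # level at the mean (210 Hz)
half_flat = neutralise_f0(rising, degree=0.5)   # partial neutralisation
```

In the whispered-speech comparison condition, by contrast, the voicing source itself was replaced with white noise, so no F0 track remains at all; amplitude and duration, which the manipulation above leaves intact, are then the only cues available.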
Together, these results show that perception of lexical tone in whispered speech is
dependent on the availability of context and that pitch information can be replaced by
other perceptual cues.
29 This was done by replacing the voicing source with white noise.
3.5.3.4 Lateralization and Neuroimaging Studies
Further insight into the perception of tone and pitch may be gleaned from studies
determining the locus of processing of particular types of sounds, over and above the
well-known specialisation of particular parts of the brain for certain aspects of speech
(Broca, 1861; Wernicke, 1874). In this section, lesion studies (effects on the perception
and production of tone), dichotic listening tasks, and functional neuroimaging studies
will be reviewed.
As with non-tonal languages, a common approach in investigating tonal languages is to
study patients with brain lesions. In an experiment with brain-damaged patients,
Gandour and Dardarananda (1983) asked left-hemisphere damaged (LHD) listeners to
identify words that differed in the five Thai lexical tones. They found that LHD patients
were less accurate at identifying Thai tones than a normal and a right-hemisphere
damaged (RHD) listener, and it was concluded that the left hemisphere may be
responsible for tone processing (Eng, Obler, Harris, & Abramson, 1996; Gandour &
Dardarananda, 1983).
Another tone perception study with brain lesion patients was conducted by Hughes,
Chan, and Su (1983), who observed that Mandarin RHD patients had problems with the
perception and production of affective prosody, but that their lexical tone identification
was intact (it should be noted that the stimuli in the prosody part of the task were
sentences, whereas the tone stimuli were words). These results suggest that damage to
the right side of the brain can lead to impairment in monitoring pitch at a global,
prosodic level rather than to a deficit in lexical tone processing. Together these results
indicate that RHD patients' lexical tone perception is normal, which suggests a left-
hemispheric dominance for lexical tone processing.
So far, only tone perception has been reviewed. Turning to tone production, Gandour
and colleagues examined brain-injured patients' production of lexical tone (Gandour,
Petty, & Dardarananda, 1988; Gandour, Ponglorpisit et al., 1992). Gandour et al. (1988)
asked six aphasic patients, one RHD patient, one dysarthric30 patient, and five normal
subjects to identify and produce Thai tones. Participants without brain damage were
very good at tone identification; four of the six aphasics showed over 90%
identification accuracy, and the RHD and dysarthric subjects over 92% accuracy. Tone
production was also very accurate in all participants, with very few tone errors.
Because most participants' results were at ceiling, it could not be concluded that tone
production is
lateralized to one specific hemisphere. In another tone production study, Gandour et al.
(1992) found that RHD patients' tone production accuracy fell between that of fluent
and non-fluent LHD patients, with the non-fluent LHD patients' productions being the
least accurate. These data imply that poor tone production may be better characterised
as a more peripheral fluency deficit than as having its root cause at the level of
hemispheric differences. Similar results were observed by Yiu and Fok (1995), who
investigated tone production and perception in Cantonese aphasics. In the perception
part, participants had to identify pictures representing six Chinese words differing in
tone; the normal control participants performed significantly better than the aphasics.
In the production part, subjects produced the same words. Normal subjects produced
fewer tonal errors than fluent aphasics, who in turn produced fewer tonal errors than
non-fluent aphasics, which indicates that the degree of fluency influences patients'
production of lexical tone. However, no right-hemisphere patients were tested in this
study, which makes it impossible to draw conclusions about hemispheric specialisation
in the perception and production of lexical tone.
In addition to the lesion studies mentioned above, various dichotic listening tasks31
have been conducted in order to investigate the lateralization of tone processing. In
such a task, Van Lancker and Fromkin (1973; 1978) asked Thai and English listeners to
identify Thai tones, presented as words, and hummed versions of these words. Thai
listeners, when listening to Thai words, displayed a right-ear advantage (REA)32,
indicating left-hemispheric specialisation for lexical tone, but no advantage for either
ear for the hummed tones. The authors concluded that there is a left-hemisphere dominance for
30 Dysarthric patients have a speech impairment resulting from damage to the nerves and areas of the brain that control the muscles used in forming words.
31 In dichotic listening tasks, listeners hear a different signal in each ear through headphones.
32 A right-ear advantage indicates that speech sounds are perceived more accurately when presented to the right ear.
tone processing. It should be noted, however, that the condition in which the Thai
participants listened to Thai words was the only one in which listeners were presented
with meaningful syllables; the results might therefore reflect a word-listening effect
rather than being interpretable purely as a tone perception result.
Not only Thai but also Mandarin tones have been used in dichotic listening tasks. For
example, Baudoin-Chial (1986) asked French and Mandarin native listeners to identify
Mandarin consonants, tones, and hums. The Mandarin group showed no ear preference
for any of the stimuli, whereas for the French listeners an REA was observed for
consonants, a left-ear advantage (LEA) for tones, and no ear advantage for the hums.
More recently, Wang, Jongman, and Sereno (2001) presented Mandarin words differing
in tone to Mandarin and English native listeners for identification (the English listeners
were trained before the main task, and the sounds were embedded in noise). An REA
was observed in the Mandarin- but not the English-speaking participants, which
supports the view that the left hemisphere is responsible for tone processing. However,
again this pattern of results could be due to general word processing rather than tone
processing.
Functional neuroimaging studies have also investigated the lateralization of tone
perception. Gandour, Wong, and Hutchins (1998) used positron emission tomography
(PET) to investigate tone perception in Thai- and English-speaking listeners, who had
to discriminate Thai tones presented as words and as low-pass filtered, pitch-matched
tone contours. The results revealed lateralization of tone processing in the Thai but not
the English-speaking participants. A limitation of this experiment is that the stimuli in
the tone condition were real Thai words; the data may therefore only show left-
hemispheric word processing, regardless of lexical tone contrasts (similar results were
found by Klein, Zatorre, Milner, & Zhao, 2001).
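Low-pass filtering of this kind removes the higher-frequency spectral detail that carries segmental information while leaving energy in the F0 region intact, so that essentially only the pitch contour remains audible. The following crude FFT-based sketch illustrates the principle only; the 500 Hz cutoff and the synthetic two-component signal are assumptions for illustration, not the processing actually used by Gandour et al.

```python
import numpy as np

def lowpass(signal, sr, cutoff_hz):
    """Crude low-pass filter: zero out all spectral components above cutoff."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

sr = 16000
t = np.arange(sr) / sr                             # one second of signal
f0_part = np.sin(2 * np.pi * 150 * t)              # pitch-range component (150 Hz)
formant_part = 0.5 * np.sin(2 * np.pi * 2500 * t)  # segmental-range component
filtered = lowpass(f0_part + formant_part, sr, cutoff_hz=500)
# `filtered` retains the 150 Hz component; the 2.5 kHz component is removed.
```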
In summary, it is difficult to conclude which hemisphere of the brain is specialised for
tone perception and production, as comprehensive studies with RHD and LHD patients
are yet to be conducted. Regarding dichotic listening tasks, it is not possible to study
lateralization of tone perception independently of word/segment perception because
both mostly occur at the same time. Lastly, the neuroimaging data do not completely
support the lateralization view (for a detailed review see Wong, 2002).
3.6 Tone Acquisition
Investigation of tone acquisition is important for understanding how tone production
and perception develop, and especially how they are affected by linguistic and
developmental variables. As with other aspects of tone, research is sparse, but the few
results available, mainly from case studies, are reviewed below. Tone acquisition in
first language development, production then perception, is considered first, after which
tone production and perception in second language acquisition will be reviewed.
3.6.1 Production of First Language Tone
In a case study, Tuaycharoen (1977) investigated tone acquisition in a single Thai
child. This infant first produced the mid and low tones at around 11 months, which was
also the time the first words were articulated. Despite correctly producing those two
tones, the infant also substituted them with falling, high, and rising tones. At 14 months
the rising tone was introduced, and by the end of the 15th month the high and falling
tones were correctly produced. By the age of around two years the full set of five
lexical tones was acquired, whereas the production of certain segments (diphthongs,
triphthongs, initial consonant clusters) had not yet been acquired.
In a similar study of Cantonese, Tse (1977) observed the language development of his
own child. The high and the low tone could be correctly produced at 16 months, and by
20 months the mid and high rising tones were mastered. The low rising and mid tones
were acquired when the infant began to articulate the first two-word utterances, in the
twenty-first month. It was noted that the child still had problems with the articulation of
certain consonants even after the whole set of six lexical tones was acquired. In another
case study, this time with Mandarin Chinese-speaking children between the ages of 18
and 36 months, Li and Thompson (1977) also observed that Mandarin tones were
acquired earlier than consonants and vowels. The high and falling tones were produced
earlier than the rising and dipping tones, and tone sandhi rules were mastered at the
same stage as multi-word utterances were first produced.
In the first cross-sectional multiple-participant study of L1 tone development, Burnham
et al. (2005) investigated tone production in 12 children at each of ten ages from 18
months to 7 years. Tone production was quite accurate for all tones at 18 months.
Beyond this, tone production, as measured by a tone differentiation score (Barry &
Blamey, 2004), improved steadily over age with respect to the differentiation of the
three level tones; differentiation of contour vs. level tones was relatively stable over
time but appeared to peak at times of significant linguistic investment: around the onset
of the vocabulary spurt33 and the onset of reading instruction in school.
Together these data show that lexical tone production is acquired before the production
of consonants and vowels, and that the production of tone is more precise and robust
than that of consonants and vowels.
3.6.2 Perception of First Language Tone
It appears that tones are generally acquired before consonants and vowels. For
example, Tse (1977) investigated a 10-month-old child in order to find out whether the
tone or the consonant information for a word ('light') was more relevant. He observed
that the child's responses were more confident when the correct tone information was
available with incorrect segments than when the consonant information was correct,
and concluded that tone information is more salient than segmental information.
Although this is just a single case study, it is interesting that these results concur with
those for tone production development in first language acquisition.
Turning to the relative salience of tonal information across languages, Harrison (1998)
observed that Yoruba- but not English-learning infants were able to discriminate
artificial tones, and that they were especially good at discriminating tones similar to
actual Yoruba tones.
In a more comprehensive study, Mattock and Burnham (2006) investigated 6- and 9-
month-old Chinese (Mandarin and Cantonese) and English infants' discrimination of
Thai tones and synthetic violin tones. The results show that English infants'
discrimination of lexical tones declined over age, whereas their discrimination of the
non-speech tones remained constant. In the Chinese infants, no difference in
discrimination performance between 6 and 9 months was observed for either speech or
non-speech tones.
33 The vocabulary spurt refers to a sudden increase in word acquisition in children's language development (Bloom, 1973).
These findings indicate that there is perceptual reorganisation for tone just as there is
for consonants and vowels (Werker & Tees, 1992) and that this reorganisation is
dependent on the language environment.
3.6.3 Production of Second Language Tone
Most training experiments in the area of second language tone have concentrated on
tone perception (see section 3.6.4). In this section those experiments that investigated
second language tone production will be summarised.
Shen (1989) analysed tonal errors made by American English speakers who had studied
Mandarin for four months. Error rates ranged from 8.9% for Tone 2 to 55.6% for Tone
4, with rates of 16.7% for Tone 1 and 9.4% for Tone 3. These results indicate that
American learners have problems with the production of all Mandarin tones, but
especially with Tone 4 (the high falling tone).
Miracle (1989) also analysed the errors that second-year American learners of
Mandarin make, and found an overall error rate of 42.9%. The errors were classified as
either tonal register errors (too high or too low) or tonal contour errors. Miracle found
that the tone errors were evenly divided between these two error types, and they were
also evenly distributed among all tones. These results show that second language tone
acquisition depends on the individual tones at the beginning of the tone learning
process (X. S. Shen, 1989), but later are independent of the particular tone that is
produced (Miracle, 1989).
In a different vein from the quasi-experiments with learners of tonal languages
mentioned above, Wang, Jongman, and Sereno (2003) conducted a tone training study
with American English listeners to investigate whether perceptual training with
Mandarin tones transfers to production. Perceptual ratings indicate that tone production
improved by 18% relative to pre-training. Acoustic analyses of the speech data confirm
these results, and further show that the improvement consisted mainly of increased
accuracy in pitch contour rather than pitch height. These results are consistent with
training experiments in the segmental domain, and show transfer of perceptual learning
to production (Akahane-Yamada, Tohkura, Bradlow, & Pisoni, 1998; Bradlow, Pisoni,
Yamada, & Tohkura, 1997).
3.6.4 Perception of Second Language Tone
Listeners without tonal language experience often have difficulties perceiving lexical
tones (Bluhme & Burr, 1971; Kiriloff, 1969; Y. Wang & Spence, 1999) and, as shown
earlier (see section 3.5.3.2), non-tonal language listeners place more emphasis on non-
linguistic tone features (average pitch and extreme F0 values), whereas tonal language
speakers concentrate on the linguistic dimensions of direction and slope of the tone
contour (Gandour & Harshman, 1978). This indicates that average pitch and extreme F0
values are important perceptual features in second language acquisition of tone. Studies
of second language perception of Mandarin and Thai tones are set out below.
3.6.4.1 Mandarin Tone Perception by Second Language Learners
When looking at tone perception in second language learners, it is important to
compare tone perception in tonal and non-tonal language listeners. Lee, Vakoch, and
Wurm (1996) tested Cantonese, Mandarin, and English listeners in a tone
discrimination task of Cantonese and Mandarin tones. They found that tonal language
speakers were more accurate and faster at discriminating tones than non-tonal language
speakers. Thus, it appears that listeners' strategy for tone perception depends to some
extent on the linguistic function of pitch in their native language. This is also reflected
in processing style: Mandarin listeners' accuracy of segmental identification decreased
when an irrelevant pitch level change occurred, whereas English listeners were not
influenced by such a change (Lee & Nusbaum, 1993). This implies that tonal language
listeners perceive pitch and segmental information in an integrated manner, whereas
non-tonal language speakers perceive segments independently of their tonal
manifestation.
Nevertheless, it has been shown that non-tonal language speakers are able to learn
about tone, and that tone perception improves with training: English listeners'
perception of Mandarin tones increased by 21% after extensive perceptual training (Y.
Wang & Spence, 1999), a result consistent with what has been found for consonants
(D. B. Pisoni et al., 1982). This perceptual learning of tone appears to be based mainly
on acoustic cues (Gandour & Harshman, 1978).
Leather (1990) investigated the effect of production training on perception. A group of
Dutch speakers were trained to produce four Mandarin words differing only in tone.
The results showed that participants‟ tone perception improved through the production
training. The Dutch speakers were able to perceive tone differences without perceptual
training. Leather concluded that training in one modality (production) could enable
learners to perform well in another modality (perception). However, since only a single
syllable was used in training as well as in the test phase, it cannot be concluded that this
observed transfer effect is universal34.
Apart from the acoustic cues utilised in tone perception by second language tone
learners, F0-related aspects of the first language can also affect tone learning. In a study
of tone perception as a function of linguistic context and sentence position, Broselow,
Hurtig, and Ringen (1987) tested American listeners' perception of Mandarin tones
presented in isolation as well as in the context of two or three syllables. One of the
most important findings was that identification accuracy for Tone 4 varied with its
position in context. The authors argue that this reflects interference from English
sentence intonation; English listeners' perception of Mandarin tones is thus influenced
by their native intonation system.
This finding is in line with the results of studies showing the influence of stress on
Mandarin tone perception. White (1981) observed that English listeners perceive
Mandarin high tones as stressed and the low Tone 3 as unstressed, despite the fact that
in Mandarin, the stress on a syllable is realised by duration and amplitude rather than
by F0.
Together the results of second language Mandarin tone learning show that tone
perception depends on language background, processing style, context, and training
methods.
34 It should be noted, however, that while there was no perceptual discrimination training, the production task did of course involve presenting four different tones for participants to produce, so there may be considered to have been some perceptual training of sorts.
3.6.4.2 Thai Tone Perception by Second Language Learners
As in the case of Mandarin, Thai native speakers perceive tone better than non-tonal
language (English) speakers (Wayland & Guion, 2003); listeners learning Thai as a
second language perceive tone better than those without lexical tone experience
(Wayland & Guion, 2003); and Thai children discriminate tones better than Australian
English children (Burnham & Francis, 1997). Together these findings indicate that
linguistic experience is very important for tone perception.
It appears that the cues used in second language tone perception may change with age.
Burnham and Francis (1997) tested four-, six-, and eight-year-old Thai and Australian
English children and found that Thai children were better at discriminating tones than
English-speaking children, and that performance increased with age independently of
language background. A closer look at the tonal cues revealed that mean pitch was used
at all ages; the use of pitch onset as a perceptual cue increased with age in the Thai
children, whereas the English-speaking children made more use of tone onset and
offset as cues. These results support the view that exposure to a tone language across
development fine-tunes a listener's use of cues for lexical tone, and that with age
speakers of a tone language integrate more cues in their perception of tone. The results
also support Gandour's (1983) finding for adults that pitch height is a significant
perceptual dimension in tone discrimination for both tonal and non-tonal language
speakers. However, Burnham and Francis (1997) found that young tonal and non-tonal
language listeners' perception depends on a combination of acoustic cues rather than on
a single acoustic dimension.
3.6.5 Tone Perception as a Function of Tone Language Experience – First and
Second Language studies
The studies summarised in sections 3.6.1 to 3.6.4 indicate that native and non-native
tonal language speakers exhibit different patterns of tone perception and production.
Firstly, there appear to be processing differences. Studies of lateralization indicate that
lexical tone is processed in a different manner by tonal than by non-tonal language
speakers (Hsieh, Gandour, Wong, & Hutchins, 2001; Klein et al., 2001; Y. Wang,
Sereno, Jongman, & Hirsch, 2001). For native speakers acquiring tones as L1, the tone
appears to be part of each word; such an association between the segmental structure
and the tone contour does not seem to be active in non-tonal language speakers
(Bluhme & Burr, 1971; Kiriloff, 1969; X. S. Shen, 1989).
Secondly, it appears that second language and first language tone learners attend to
different acoustic/phonetic features (Gandour, 1983). Non-tonal language listeners
attend more to tone height than to tone direction, whereas within tonal language
listeners, attention depends on the particular language spoken: Cantonese listeners
attended more to tone height than Mandarin and Taiwanese participants, whereas Thai
listeners used the direction cue more than the other groups (Gandour, 1983). A reason
for this could be that non-tonal language listeners have fewer auditory resources left to
pay attention to cues provided by context (Jongman & Moore, 2000). Thirdly, a source
of difficulty in learning tones has been attributed to interference from L1 features, with
the function of pitch in the English stress and intonation systems found to interfere
with English listeners' perception of lexical tones (Broselow et al., 1987; White, 1981).
Although tonal and non-tonal language speakers process tones differently, non-tonal
language learners' tone perception can be improved through training (Y. Wang &
Spence, 1999). This improvement seems to generalise to other contexts, transfers to
tone production, and is retained in learners' long-term memory (Y. Wang, Jongman et
al., 2003). These results imply that the adult production and perceptual systems retain
plasticity with respect to lexical tone, and that cortical representations can be modified
as learners gain additional experience with lexical tone (Y. Wang, Sereno, Jongman, &
Hirsch, 2003; Wong, Skoe, Russo, Dees, & Kraus, 2007).
3.7 Categorical Perception of Lexical Tone
Categorical perception refers to the case in which a physical continuum is perceived
categorically. In speech perception this refers to the categorical perception of
acoustically-based continua (for a detailed review of categorical perception see Chapter
2), and this occurs especially in stop consonants (Liberman, Harris, Eimas et al., 1961;
Liberman et al., 1957), whereas contrasts involving vowels are perceived much less
79
categorically (Fry et al., 1962). Indeed the pattern observed in vowel perception can be
described as almost continuous. In this thesis, the concern is with categorical perception
of lexical tone. Previous research on this matter is presented in the following sections.
The results of studies are best examined for Cantonese and Mandarin, and this
separately, as the findings appear to differ as a function of language background,
perhaps due to the nature of the particular tone systems/tone spaces.
Categorical perception data will be separated into studies that concern mechanisms of
tone perception and experiments that obtain categorical vs. continuous results in
different listener groups with different stimulus materials.
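The logic of comparing identification and discrimination that underlies these studies can be made concrete with the classic Haskins-model prediction, under which listeners discriminate stimuli only via their category labels: predicted ABX accuracy for a stimulus pair is 0.5 + 0.5(p_i - p_j)^2, where p is the probability of assigning each stimulus to one of the two categories. The sketch below pairs this with an illustrative logistic identification function (the boundary location and slope are assumed values, not data from any study) to show how sharp labelling produces a discrimination peak at the category boundary:

```python
import math

def identification(step, boundary=5.0, slope=1.5):
    """Probability of labelling a continuum step as category A (logistic)."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))

def predicted_abx(p_i, p_j):
    """Haskins-model ABX prediction: discrimination from labelling alone."""
    return 0.5 + 0.5 * (p_i - p_j) ** 2

steps = range(1, 10)                        # a 9-step tone continuum
ident = [identification(s) for s in steps]
# Predicted discrimination for each adjacent pair (1-2, 2-3, ..., 8-9):
disc = [predicted_abx(ident[i], ident[i + 1]) for i in range(len(ident) - 1)]
# Pairs far from the boundary stay near chance (0.5); pairs straddling
# the boundary show a peak, the signature of categorical perception.
```

Categorical perception is diagnosed when observed discrimination tracks such a labelling-based prediction; continuous perception, as reported for vowels, shows within-category discrimination well above it.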
3.7.1 Categorical Perception of Cantonese and Mandarin Tones
Firstly, the studies that have found differences in categoricality in listeners of tonal and
non-tonal languages will be considered here.
Francis, Ciocca, and Ng (2003) tested identification and discrimination of Cantonese
tones in Cantonese native listeners. Identification results for the three continua (low
level to high level, high rising to high level, and low falling to high rising) show that
the level tone continuum was identified in a continuous way, similar to results found
with vowels (Abramson, 1962; Fry et al., 1962), whereas perception of the contour
tone continua appeared to be more categorical, as found in consonant perception
(Liberman, Harris, Eimas et al., 1961; Liberman, Harris, Kinney et al., 1961). No
discrimination peaks were observed, and the authors conclude that category boundaries
in lexical tone continua are influenced by a combination of natural psychoacoustic
sensitivities and linguistic experience. The natural sensitivities are shown by the fact
that the boundaries between some Cantonese contour tones appear to lie at regions of
perceptual space where listeners are likely to exhibit heightened auditory sensitivity
(changes in pitch contour slope). For example, the transitions between falling and
rising and between rising and level pitch contours are auditorily salient boundaries that
have been observed in non-tonal participants listening to speech and non-speech
stimuli (Klatt, 1973; Schouten, 1985). The influence of linguistic experience was
shown by the coincidence of a linguistic boundary with a category boundary (Francis et
al., 2003). This pattern of results shows that the categoricality of tones depends not
only on language background but also on certain acoustic features of the particular
tones presented to the listener.
Fox and Unkefer (1983) also demonstrated categorical perception of dynamic Mandarin
tones in native Mandarin speakers, and non-categorical perception of the same tones in
American English listeners. These results show that categoricality depends on the
linguistic status the sound has in the listener's phonological system: tone is treated
linguistically, and perceived categorically, by tonal language speakers, but in a rather
phonetic/acoustic, continuous way by non-tonal language speakers.
Quite similar results were observed by Leather (1987), who investigated identification
of dynamic Mandarin tones by Mandarin, Dutch, and English listeners. The results
indicate that Mandarin listeners' tone perception is categorical, whereas in the English
and Dutch participants a rather continuous pattern was observed, another indication
that the categoricality of tone perception is shaped by language background.
In an experiment investigating mechanisms of tone perception by measuring
discrimination of small F0 contour variations in Mandarin and English listeners,
Stagray and Downs (1993) observed that Mandarin speakers are less sensitive to small
pitch differences than English listeners. This result appears surprising at first, but it
makes sense when one considers that Mandarin listeners, in natural language
processing, have to ignore small F0 differences in order to categorise lexical tones
correctly.
Bent, Bradlow, and Wright (2006) tested Mandarin and English listeners' identification
and discrimination of (dynamic) speech and non-speech tones to investigate whether
long-term linguistic experience influences the processing of non-speech sounds. As
expected, Mandarin listeners' identification of tones was significantly more accurate
than English listeners'; however, non-speech discrimination did not differ across the
listener groups. Interestingly, there were cross-language differences in non-speech
pitch identification: Mandarin listeners misidentified flat and falling pitch contours
more often than English listeners, in a way that could be related to specific features of
the sound structure of Mandarin. This suggests that the effect of linguistic experience
extends to non-speech processing under certain stimulus and task conditions.
Recent experiments by Halle, Chang, and Best (2000; 2004) compared categorical
identification and discrimination of dynamic tones in (Taiwanese) Mandarin Chinese
and French listeners. In the first experiment (Y. C. Chang & Halle, 2000), Mandarin
listeners were tested, and the results reveal that perception was categorical in a manner
similar to that found for vowel continua (rather shallow identification functions with
slight peaks in discrimination). A follow-up cross-language study with speakers of
French and Taiwanese Mandarin (P. A. Halle et al., 2004) found categorical perception
in the Taiwanese listeners but psychophysically based (continuous-looking) results in
the French listeners. The category boundary was located at different points of the
continuum for the different language groups, a phenomenon previously found by Chan
et al. (1975). Halle et al. conclude that tones are perceived quasi-categorically (in a way
similar to vowels) by listeners of Taiwan Mandarin, whereas perception of the same
tonal stimuli seems to be psychophysically based in French listeners (P. A. Halle et al.,
2004).
Xu, Gandour, and Francis (2006) conducted another cross-language study of lexical
tone perception with Mandarin Chinese and English listeners. They examined
perception of tones that ranged from a level to a rising tone and were presented as
speech and sine-wave stimuli. Their results show that tonal language speakers
perceived the tones in a categorical manner, whereas in non-tonal language speakers, a
rather continuous pattern of results was observed. Interestingly, the non-speech tones
were perceived more categorically than the speech tones in the English listener group.
The authors suggest a memory-based model of perception in which categoricality of
perception is domain-general but strongly influenced by long-term categorical
representations.
In another study of categorical perception of lexical tones in listeners from different
language backgrounds, Wang and colleagues (S. Chan et al., 1975; W. Wang, 1976)
found categorical perception in Mandarin speakers asked to discriminate
high rising from high level tones, whereas American subjects produced data
consistent with continuous perception. Interestingly, the Mandarin listeners exhibited a
category boundary located near the middle of the asymmetric stimulus space, whereas
the American listeners perceptually divided the continuum into 'level' and 'below
level' tones. Similar results were obtained in a series of experiments using non-speech
stimuli with the same tonal characteristics as the speech tones, and Chan et al. (1975)
conclude that the results reflect different modes of processing pitch depending on
language background: psychoacoustic processing in non-tonal language and linguistic
processing in the Mandarin listeners.
Mandarin Chinese and American English listeners' categoricality of tone perception
was also investigated by Zue (1976). Both English and Mandarin listeners' perception
of a continuum between tone 2 and tone 3 (both dynamic tones) was found to be
categorical.
Even though the results seem very different, there is a general trend in Mandarin tone
perception: dynamic tones are perceived rather categorically, whereas static tones are
perceived in a continuous manner.
3.7.2 Categorical Perception of Thai Tones
Abramson (1961; 1962; 1975; 1979) used stimuli from a continuum between Thai mid
and high tones to test Thai and American listeners. Identification results show a very
sharp category boundary in the middle of the continuum for both listener groups, but
the corresponding discrimination peak (albeit somewhat slight) was only observed in
Thai listeners (Abramson, 1961). In follow-up experiments (Abramson, 1977, 1979),
identification data showed a rather continuous response pattern with quite shallow
identification boundaries; and discrimination results also appeared continuous – there
was no clear discrimination peak and discrimination was very good across the whole
continuum. Based on these combined results, Abramson concluded that perception of
lexical tone in Thai is continuous. Although these results appear clear, it should be
noted that closer inspection of the discrimination functions suggests that a ceiling
effect could have masked the discrimination peaks.
More recently, Burnham and Jones (2002) conducted a categorical perception study
with native Thai speakers and Australian English speakers using both speech and non-speech
stimuli: sine-wave pure tones, filtered speech sounds, and violin sounds. The
speech condition used a non-Thai word (/wa/) recorded in three different dynamic Thai
tone contrast continua (mid-fall, rise-mid, rise-fall) and was matched for duration, F0
onset and contours with the three non-speech stimulus categories. The results indicated
that tonal language speakers show categorical perception effects when they listen to
lexical tones, but fail to generalise to non-speech stimuli with the same tonal
characteristics. Another interesting finding of this study was that while Thai listeners'
perception of lexical tone items was more categorical in speech than in non-speech,
non-tonal English listeners heard the non-speech stimuli more categorically than the
speech sounds, indicating that categorical perception of tone is, to some extent, learned
(Burnham & Jones, 2002).
In summary, the data on perception of lexical tone indicate that pitch movement (level
vs. contour) is important for categoricality of perception. Level tone continua are
perceived in a more continuous way (Abramson, 1979; Francis et al., 2003), whereas
contour tones are perceived categorically (Francis et al., 2003; W. Wang, 1976). It is
difficult to draw firm conclusions, however, as the studies used different stimuli and
different languages and so are not directly comparable. It is therefore necessary to
conduct the same experiment with different language groups.
Another question that is important for this thesis is whether categorical perception is
restricted to speech sounds, or can also be found in non-speech stimuli with the same
pitch characteristics. Therefore, speech and non-speech tones will be used in the
following experiments. Most previous studies have not tested both identification and
discrimination of lexical tones; however, both are necessary in order to draw
conclusions about categorical perception.
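The link between the two measures can be made concrete with the classic Haskins-style prediction, in which discrimination is assumed to be driven entirely by category labels: a pair of stimuli is discriminated above chance only insofar as the two stimuli are labelled differently. The following sketch is purely illustrative and is not taken from any of the studies reviewed; the logistic identification function and its boundary and slope values are arbitrary assumptions.

```python
import math

def identification(step, boundary=4.0, slope=2.0):
    """Illustrative logistic identification function: probability of
    labelling a continuum step as, say, 'rising' (parameters arbitrary)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

def predicted_abx(p_i, p_j):
    """Haskins-model ABX prediction: responses are correct only via
    differing covert labels, otherwise guessing at chance (0.5).
    p_i and p_j are the labelling probabilities of the two stimuli."""
    return 0.5 * (1.0 + (p_i - p_j) ** 2)

steps = list(range(1, 8))                      # a 7-step tone continuum
ident = [identification(s) for s in steps]
disc = [predicted_abx(ident[k], ident[k + 1])  # adjacent (1-step) pairs
        for k in range(len(ident) - 1)]
peak = max(range(len(disc)), key=disc.__getitem__)
# The predicted discrimination peak straddles the identification boundary
# (pairs 3-4 and 4-5 here): the classic categorical perception signature.
```

Under this prediction, categorical perception appears as a discrimination peak at the identification boundary with near-chance discrimination within categories, whereas continuous (psychophysically based) perception appears as discrimination that is uniformly better than the labels alone would predict.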
Some of the shortcomings of the previous studies on categorical perception of lexical
tone will be overcome in the following experiments. Previous studies, for example,
have compared perception of speech and non-speech stimuli in speakers of one tonal
and one non-tonal language, but so far, no previous study has integrated different tonal
languages and speech and non-speech sounds and tested identification and
discrimination. It is essential to look at all of these interconnected issues at the same
time. In the following cross-language experiment of the categorical perception of
lexical tone, all variables are incorporated: language group (Mandarin, Vietnamese, and
Thai), stimulus type (speech versus non-speech), and categorical perception factors
(identification and discrimination). In terms of language background, the tonal
language groups will consist of Mandarin, Vietnamese, and Thai native speakers,
while the non-tonal language group will consist of native speakers of Australian
English who are unfamiliar with Mandarin, Vietnamese, Thai, or any other tonal
language. In terms of
stimulus types, we will use a tone continuum ranging from a rising to a falling tone
presented as speech and sine-wave sounds. In order to match the speech and non-speech
tones, we use linear rather than curved tone contours. Linear contours usually
occur in non-speech contexts and are only rough approximations of real lexical tones,
and are thus less likely to give a perceptual advantage to tonal
language speakers. This experimental design makes it possible to assess the effect of
language background on tone perception by testing tonal and non-tonal language
listeners; to find out whether categorical perception is speech-specific or universal by
comparing speech and non-speech tones in tonal and non-tonal language speakers; and
to investigate whether identification results match discrimination data by testing both of
these aspects of categorical perception.
4.1 General Characteristics of Music
This chapter concerns music. It is organised into six sections: the first two
sections concern the structure of music and differences in music across cultures, the
next three sections concern people‟s musical ability and how music is perceived, and
the final section concerns the relationship between music and other abilities, especially
speech perception and production.
Music consists of melody, harmony, and rhythm. Relevant aspects of these are
reviewed in the following sections ("Grove Music Online," 2001).
4.1.1 Scales and Intervals
The simplest musical interval is the octave. Physically, an octave difference represents
a frequency ratio of 2:1. Tones with frequencies that are in an octave relationship are
perceived as similar or related, and are given the same name. Because of the perceptual
similarity of tones separated by an octave, it is now generally accepted that pitch may
be modelled as a spiral (Shepard, 1982a), with pitch chroma being the dimension of the
spiral associated with a tone's position within the octave, and pitch height being the
dimension associated with the octave in which the tone lies.
The octave has the perceptual quality of consonance: two complex tones an octave
apart, played together, are generally perceived as pleasant or harmonious. Two
complex tones in an octave relationship have many partials35 in common. For
example, a complex tone with a fundamental
frequency (F0) of 100 Hz has partials (harmonics) at 200 Hz, 300 Hz, 400 Hz, and so
on. A complex tone with a fundamental frequency of 200 Hz has partials at 400 Hz,
600 Hz, etc. Every alternate partial of the lower-pitched complex will lie at the same
frequency as every partial of the higher-pitched complex. These coincidences appear to
assist the perception of pleasantness (Justus & Bharucha, 2002).
In contrast, if one of the complex tones is somewhat mistuned so that it is slightly more
or less than an octave away from the other, the resulting sound is perceived as
dissonant. Continuing with the above example, if the higher complex has a 202 Hz
fundamental, then a beating sensation at a rate of two beats per second will result with
35 A partial is one of the component vibrations at a particular frequency in a complex mixture. A partial does not need to be a harmonic. The fundamental and all overtones may be described as partials; in this case, the fundamental is the first partial, the first overtone the second partial, and so on.
the second harmonic of the 100 Hz complex tone, making the combination sound
rough. Additionally, the second harmonic of the 202 Hz complex will beat four times
per second with the fourth harmonic of the 100 Hz complex tone, so that the
combination of the two complex tones sounds very rough and unpleasant.
Apart from the octave, there are other musical intervals that are also consonant (Sadie
& Tyrrell, 2001). In general, two notes that have a simple ratio of their fundamental
frequencies will sound consonant. Examples of these intervals and their ratios are: a
perfect fifth (frequency ratio of 3:2), a perfect fourth (4:3), a major third (5:4), and a
minor third (6:5). The reason that intervals with simple frequency ratios are perceived
as pleasant is similar to the reason that octaves sound consonant: many of their partials
fall at the same frequencies, and few lie close enough together to cause beating or
roughness.
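The partial-coincidence arithmetic described above can be sketched in a few lines. This is purely illustrative; the 2000 Hz cut-off is an arbitrary assumption, and partials are treated as exact harmonics of the fundamental.

```python
def partials(f0, fmax=2000.0):
    """Harmonic partials of a complex tone with fundamental f0, up to an
    arbitrary cut-off fmax (partials assumed to be exact harmonics)."""
    return {f0 * n for n in range(1, int(fmax // f0) + 1)}

# Octave (2:1): every partial of the 200 Hz tone coincides with every
# alternate partial of the 100 Hz tone.
octave_nested = partials(200.0) <= partials(100.0)   # True

# Mistuned octave (100 Hz vs 202 Hz): nearby partials no longer coincide
# and beat at the difference frequency instead.
beat_f0 = abs(202.0 - 2 * 100.0)      # 2 Hz, against the 2nd harmonic
beat_h2 = abs(2 * 202.0 - 4 * 100.0)  # 4 Hz, against the 4th harmonic

# A simple-ratio interval such as the perfect fifth (3:2) also shares
# partials: 200 Hz and 300 Hz coincide at every multiple of 600 Hz.
fifth_shared = partials(200.0) & partials(300.0)     # {600, 1200, 1800}
```

The nested-set check mirrors the statement in the text that every partial of the higher-pitched complex lies at the frequency of an alternate partial of the lower-pitched one, while the two beat rates reproduce the 202 Hz mistuning example above.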
4.1.2 Tempo, Rhythm, and Meter
Tempo in music is defined as the speed or pacing of a musical composition. Tempo can
be indicated in different ways. A metronome36 can be used to determine the tempo of a
musical piece in terms of beats per minute. In a less quantitative system,
conventionalised descriptions of speed and gestural character of the composition, such
as andante37, allegro38, adagio39, etc. are used.
Rhythm is a fundamental element that plays a part in many aspects of music: it is an
important element in melody; it affects the progression of harmony, and has a role in
such matters as texture, timbre, and ornamentation. While in Western music rhythm is
multiplicative (i.e., rhythmic patterns are derived by multiplying or dividing, normally
by two or three), in many non-Western cultures it is additive: an eight-unit rhythm in
Western music is invariably constructed on the basis 2 x 2 x 2, whereas in Middle
Eastern music it can be 3 + 2 + 3.
Meter is defined as the temporal hierarchy of subdivisions, beats and bars, which is
maintained by musicians and deduced by the listener, and functions as a dynamic
36 A metronome is a device that is used to mark time in music by means of regularly recurring ticks or flashes at adjustable intervals.
37 Andante indicates a moderately slow tempo.
38 An allegro is performed quickly, in a brisk, lively manner.
39 Adagio indicates that the composition is to be played in a slow tempo.
temporal framework for the production and perception of musical durations. Meter is an
aspect of the behaviour of performers and listeners rather than an aspect of the music
itself. Meters may be categorised as duple or triple (according to whether the beat or
pulse is organised in twos or threes) and as simple or compound (indicating whether
those beats are subdivided into duplets or triplets); more complex meters also exist.
Rhythmic and metric characteristics of Western and non-Western music, along with
other musical characteristics of these musical styles are discussed in the following
sections.
4.2 Music in Different Cultures
In this thesis it is the relationship between music and speech that is important. Music
and speech are both universal acoustic communicative systems used by humans. The
differences and similarities between music and speech have long been of interest for
psychologists (Feld & Fox, 1994). They share very important acoustic characteristics
such as intensity, duration, rhythm, timbre, and pitch. The common characteristic of
particular interest here is pitch, based on fundamental frequency, as this relates to
lexical tone. Fundamental frequency and pitch are the focus of the following sections,
firstly with regard to Western music and then with regard to Thai music.
4.2.2 Western Music
4.2.2.1 Scales and Intervals in Western Music
In the Western music tradition40, the diatonic scale (C, D, E, F, etc.) is used. A diatonic
scale is a seven-note musical scale comprising five whole-tone and two half-tone steps.
Between two half-tone steps there are either two or three whole tones, with the pattern
repeating at the octave. These scales are the foundation of the European musical
tradition. Within the twelve notes of the chromatic scale41, there are twelve distinct
diatonic scales. The white keys on a piano map out the seven notes of one such diatonic
scale, repeated in each octave.
40 In this review, only sacred and secular art music is considered; folksong and dance music are not discussed.
41 The chromatic scale contains all 12 pitches of the Western tempered scale.
4.2.2.2 Tempo, Rhythm, and Meter in Western Music
In Western music, time is usually organised to establish a regular pulse, and by the
subdivision of that pulse, into regular groups. The arrangement of the pulse into groups
is the meter, and the rate of pulses is its tempo. Most Western music possesses a regular
rhythmic pulse and meter.
In basic meters the subdivisions are equally spaced, but in both Western and non-
Western music there are metric patterns that involve unequally spaced beats.
Conventional meters can also be used as a way of notating complex, irregular rhythms.
(In these cases, performers may engage in metric counting, but listeners are not able to
infer any pattern of beats or bars, in which case it is doubtful if any meter is present at
all.)
4.2.2.3 Brief History of Western Music
The history of Western music can be collapsed into six main periods, each of which has
particular more or less stable features that characterise the period.
The first period is the Medieval Period, which lasted from 400 AD to 1400 AD. Before
900 AD, almost all music had a simple melodic structure, called plainchant, consisting
of a single melodic line sung in unison. Over the next 500 years, this
simple structure was expanded. By 1300 AD, there were compositions that were written
for three and four voices. These works are referred to as polyphonic (many voices).
The period called Renaissance began around 1400 AD and lasted for about 200 years.
By 1400, various composers were writing polyphonic works in slightly different
manners. This led to more unified sounding works, and gave rise to a number of
contrapuntal forms, such as the canon42, the canzon43, and the fugue44. Most of the
development during the Renaissance happened in Italy, and the most influential
composer of this period was the Italian Giovanni Pierluigi da Palestrina.
The following Baroque Period lasted from about 1600 AD to 1750 AD. New hymns
(chorales) were written, which were primarily homophonic (simple chordal structure) in
nature. By the mid-1700s, several composers began to explore new styles of
42 In a canon, all the voices are repeated exactly, but delayed in time.
43 A canzon is a succession of themes, each of which is developed and then discarded.
44 In the fugue, one theme is developed extensively.
composition, such as the symphony and the concerto. The most important composers
contributing to the musical style of the Baroque period were Bach, Vivaldi, and Handel.
This development led to the Classical period (1750-1800), where the basic musical
features did not change appreciably, except for the abandonment of polyphony. The
major contribution during this period was the enlargement and augmentation of many
aspects of music, such as the development of the orchestra. Mozart, Beethoven, Haydn,
and Schubert contributed significantly to the Classical period.
The Romantic period started around 1820 and ended in 1900. The scale of works
continued to expand and by the end of the 19th century, operas of three or more hours
were written; symphonies of an hour and a half were composed, and sometimes 200 or
more musicians were needed to perform these works. In this period, among the most
influential composers were Mendelssohn, Verdi, and Wagner.
By 1900, popular and 'classical' music began to separate. Jazz and then Rock became
the music of the masses, and classically trained composers experienced a much smaller
audience than before. The classical world fractured into many different groups, most of
which began to write much smaller pieces again.
4.2.3 Thai Music
Apart from the Western musical system, there are other widespread musical systems in
the world. Because the current series of studies is concerned with perception of speech
and music in tonal languages and particularly in (musician and non-musician) Thai
listeners (see Experiment 1, Chapter 6, and Experiment 2, Chapter 7), the main
characteristics of Thai music are reviewed in this chapter.
4.2.3.1 Thai Intervals and Scales
In Thai music the tuning system is equidistant: instruments are tuned to seven
pitches per octave (although the voice and non-fixed-pitch instruments use tones beyond
these seven). In traditional Thai music, five fundamental tones, forming a pentatonic
scale, are the basis of most compositions (Morton, 1976). The relationship between
Western and Thai musical scales can be seen in Figure 4.1.
Figure 4.1. Comparison between the Thai and the Western Scales
In the Thai tuning system, the octave is divided into seven intervals of around 171
cents45 each. Thai music is notated in the same way as Western music; however, as
mentioned above, Western and Thai musicians are not 'ruled' by the same size of
musical step. Music from the two traditions is therefore unlikely to be played together,
as the result would sound dissonant or unpleasant. Thai music
is described as non-harmonic, melodic, with an organisation that is horizontal. This
means that Thai musical pieces usually consist of a melody that is played
simultaneously with variants of the same melody. These variants are then played more
slowly or more quickly than the main melody. This melodic format, where
instrumentalists improvise around the central musical theme is called heterophony.
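The 171-cent figure follows directly from dividing the octave (1200 cents) into seven equal steps. A minimal sketch, assuming an exactly equidistant division of the octave:

```python
import math

def cents(ratio):
    """Interval size in cents: the octave (frequency ratio 2:1) is divided
    into 1200 equal parts, so a Western semitone spans 100 cents."""
    return 1200.0 * math.log2(ratio)

thai_step = cents(2.0 ** (1.0 / 7.0))          # 1200/7, about 171.4 cents
western_semitone = cents(2.0 ** (1.0 / 12.0))  # exactly 100 cents
whole_tone = 2 * western_semitone              # 200 cents
# The Thai step (about 171 cents) falls between a Western semitone (100)
# and a whole tone (200), so apart from the octave itself the two tuning
# systems share no common intervals.
```

This is why, as noted above, pitches from the two systems played together tend to sound mistuned relative to one another.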
4.2.3.2 Tempo, Rhythm, and Meter in Thai Music
In Thai music, three prominent proportional tempi can be found: the sam chan, the song
chan, and the chan dieo. Sam chan is double the length of song chan and four times
the length of chan dieo. Each of these can be played at slow, medium, or fast speeds,
depending on the composition and the instruments involved.
The rhythm and meter of Thai music are steady in tempo, show a regular pulse, and can
be described as divisive, in simple duple meter with no swing and little syncopation. In
Thai music the emphasis is generally placed on the final beat of a measure or group of
pulses and phrases, whereas in Western music, the first beat is usually emphasised.
4.2.3.3 Brief History of Thai Music
Historically, the music of Thailand was mainly an oral tradition, without a
notational system. It is therefore difficult to describe clearly the
historic forms of Thai music. Morton (1976) suggests that the notation of Thai music
45 The “cent” system is a means of comparing intervals, in which the octave (in Western music) is divided into 1200 equal parts with each of the semitones encompassing 100 cents.
began only around 600 years ago. The classical period, also called the Bangkok period,
started in 1782 AD as a development of music from the fourteenth and fifteenth
centuries. In 1767 AD, art collections and libraries were burnt by the Burmese army in
Ayuthaya, then the capital of Thailand, which resulted in the loss of most knowledge
about Thai music history before the Bangkok period (Morton, 1976).
In Thai music, three major genres can be identified: classical music, traditional or folk
music, and contemporary pop and rock.
The earliest traditional Thai ensembles were the so-called piphat ensembles, which
included woodwind and percussion instruments. The khruang sai is another form of
ensemble, composed mainly of string instruments; and in the mahori ensemble string
instruments are combined with melodic percussion instruments and flute.
Thai country music, the luk thung, was developed in the middle of the twentieth century
and was used to reflect the daily trials and tribulations of the rural Thai people. A folk
genre found mostly in Isan, in Thailand's northeast, is the mor lam. The mor lam is
thematically similar to luk thung. In the mor lam, the melody is tailored to the lexical
tones of the lyrics, and the vocals are described as rapid-fire and rhythmic, with a
'funk' feel to the percussion. The kantrum, another musical variety, is traditional dance music
played by Cambodians in Thailand near the Cambodian border. Singers, percussion,
and string instruments dominate the sound of the kantrum.
Apart from the traditional and classical music, in the twentieth century, Western
classical music, jazz, and tango became popular in Thailand. Thai melodies were
combined with Western classical music, which progressed into luk grung, a romantic
music style. In the 1960s, Western rock music became very popular, which led to the
formation of Thai pop music, called string, in which Thai lyrics started being used.
4.2.4 Similarities and Differences – Western and Thai Music and Singing
In the following sections, musical characteristics of Western and non-Western music
will be compared. In the first part, the features of Thai and Western music are
contrasted, followed by a comparison of singing in tonal and non-tonal languages.
4.2.4.1 Music
One of the main differences between Western and non-Western music is the tuning.
While it is clear that Thai and Western music share the octave principle, the Thai scale
consists of seven steps, whereas the Western scale has 12 steps (see Figure 4.1).
Another difference between Thai and Western music is that Thai music is non-
harmonic, while Western music is generally harmonic. In Thai music each instrument
plays its own melodic variation, based on a principal melody, whereas in Western
music notes are combined simultaneously and successively to produce chords and
chord progressions. In terms of tempo, rhythm, and meter, Thai and Western music are
similar, both showing a regular pulse; however, in Western music the first beat of a
phrase is generally emphasised, whereas in Thai music it is the last beat. One other
reason contributing to the difference in sound between Western and Thai music is the
difference in instruments. In Western music, instruments are classified into string
instruments, wind instruments, and percussion instruments; in Thai music, plucked,
bowed, hit or beaten, and blown instruments are differentiated (Morton, 1976). This
classification is very similar in the two musical cultures; however, the instruments in
each category differ, and therefore so does the sound that is produced.
4.2.4.2 Singing in Tonal Languages
Singing is a universal form of auditory expression in which music and speech are
fundamentally combined. The investigation of singing represents an ideal case for
assessing the differences and the similarities of music and language. In this section,
studies concerning singing in a tonal language are reviewed.
Even though the speaking voice and the singing voice are similar in terms of the organs
involved, there are some slight differences. One feature that distinguishes singing from
speaking is the controlled use of fundamental frequency. In singing the fundamental
frequency must be controlled much more precisely than in speaking, especially if the
singer performs with other musicians. In singing, higher volume levels and greater
dynamic volume ranges are produced than when speaking. Another difference between
the singing and speaking voice is the position of the larynx. Professional singers sing
with a lowered larynx, whereas the larynx is not lowered in speech (Sundberg, 1999).
In tonal languages, pitch height and contour are used to contrast the meaning of words
(see Chapter 3 about lexical tone). As singing involves melodic variation, an important
issue is how pitch information is used to signal lexical tones in singing.
There are three possible approaches to the treatment of lexical tone in songs. The
first is to ignore the lexical tones and thus the meaning of the words, and to use only
pitch for melody marking. If this were the case, musicality would be preserved, but
intelligibility reduced. The second alternative is to preserve the lexical tones and ignore
the melody, thus retaining intelligibility at the cost of musicality. In this case, songs
would sound very much like speech. The third option is a combination of the first two:
the composer (or the singer) tries to preserve as much of the lexical tone information as
possible while restricting the melody as little as possible. Empirical investigations of
singing in tonal languages are presented below.
In an investigation of lexical tones in different kinds of songs, Chao (1956) found that
in Mandarin Chinese “Singsong”, a type of song between speaking and singing, single
lexical tones are sung with a consistent pitch pattern that is the same for the whole
song. For example, one singer was reported to sing all high-level tones on the musical
note A (440 Hz). In this kind of song, the musical intelligibility is preserved, as the
listener can identify the tone (and consequently the word) through the pitch pattern that
is assigned to it. In contemporary Mandarin songs, lexical tones are generally ignored
(R. C. Chao, 1956).
An investigation of Cantonese Opera (Yung, 1983) showed an interestingly
systematic relationship between tones and melody: high-level tones are sung on E
(659.3 Hz), G (784 Hz), or D (587.3 Hz), mid-level tones are sung on C (523.3 Hz),
and low-level tones are sung mostly on A (440 Hz) and sometimes on B (493.9 Hz). It
appears then that in Cantonese Opera, each lexical tone has one or more musical notes
assigned to it and thus there is no overlapping of the musical notes and lexical tones
(Yung, 1983).
On the other hand, modern songs in Mandarin and Cantonese exhibit very different
behaviour with respect to the extent to which the melodies affect the lexical tones. In
modern Mandarin songs, the melodies dominate, so that the original tones on the lyrics
seem to be completely ignored. In Cantonese songs, however, the melodies typically
take the lexical tones into consideration and attempt to preserve their pitch contours and
relative pitch heights. Wong and Diehl (2002) analysed four contemporary Cantonese
songs. They observed direction of pitch change over pairs of consecutive syllables and
found an overall correspondence of over 90 percent between musical and tonal
sequences. This indicates that, while the fundamental frequency intervals and the shape
of the contours that are normal for speech are not reproduced exactly in these songs,
there does seem to be a very strong tendency for a rising sequence of tones to
correspond with an ascending sequence of musical notes, and for a falling sequence of
tones to correspond with a descending sequence of notes (Wong & Diehl, 2002).
Another analysis of six modern Cantonese songs (M. Chan, 1987) revealed similar
results: most tones are preserved in Cantonese popular music.
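The direction-of-pitch-change measure used in these corpus analyses can be sketched as follows. The five-syllable phrase below is invented for illustration and does not come from Wong and Diehl's (2002) or Chan's (1987) materials.

```python
def directions(values):
    """Direction of change (-1 falling, 0 level, +1 rising) over each
    pair of consecutive values in a sequence."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

def correspondence(tone_heights, note_pitches):
    """Proportion of consecutive-syllable pairs whose direction of pitch
    change matches between the lexical tones and the melody notes."""
    t, m = directions(tone_heights), directions(note_pitches)
    return sum(a == b for a, b in zip(t, m)) / len(t)

# Hypothetical five-syllable phrase: abstract tone heights vs. melody (Hz).
tones = [3, 5, 2, 2, 4]
melody = [262.0, 330.0, 294.0, 294.0, 392.0]
rate = correspondence(tones, melody)   # 1.0 here: every direction matches
```

A correspondence rate above 90 percent, as Wong and Diehl report, indicates that rising tone sequences almost always co-occur with ascending note sequences and falling tone sequences with descending ones, even though exact intervals and contour shapes are not reproduced.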
Finally, in an investigation of chants and songs of Central Thailand, List (1961)
observed that the “speech melody” (i.e. the pattern of pitch intervals of the lexical
tones) is mostly preserved, but the range of variation of the musical melody appears to
be limited.
In summary, it seems that the treatment of tones in songs depends not on the
particular language in which they are sung, but rather on the style of the composition
and the song.
4.3 Perception of Music – Tempo, Rhythm, Grouping, and Meter
In order to understand the perception of music in general, an understanding of the
perception of each musical feature (tempo, rhythm, grouping, and meter) is required, and
these are discussed below.
4.3.1 Perception of Tempo
Tempo describes the rate at which the basic pulses of the musical piece are played;
musical pulses are confined to a tempo range of roughly 50 to 500 ms (Fraisse, 1982).
Sensitivity to small changes in tempo is most accurate in the range from 300 - 800 ms
(Fraisse, 1982). Different lines of evidence propose that temporal intervals ranging
from around 200 - 1800 ms, especially those between 400 and 800 ms, have particular
perceptual salience (Braun, 1927; Collyer, Broadbent, & Church, 1994; Fraisse, 1982).
The tempi at which humans prefer to produce and to hear an isochronous pulse (the
spontaneous tempo and the preferred tempo, respectively) are based upon a temporal
interval of about 600 ms (Fraisse, 1982). The phenomena observed in perception of musical tempo
seem to have their origins in the anatomy and physiology of the human body, and
research suggests a strong relationship between the perception of rhythm and periodic
human movement, such as heartbeat (mean periods of around 670 ms - 1000 ms for adults), walking
(900 ms - 1100 ms), or breathing (200 ms - 350 ms) (Clynes & Nettheim, 1982;
Davidson, 1993; Gabrielsson, 1973; Krumhansl & Schenk, 1997; McLaughlin, 1970;
Shove & Repp, 1995; Todd, 1992; Truslit, 1938).
4.3.2 Perception of Rhythmic Patterns
A rhythmic pattern is a short sequence of events, usually in the order of a few seconds,
characterised by the periods between the successive onsets of the events. These inter-onset
periods are usually simple integer multiples of each other. Around 85 to 95
percent of the notated durations in a typical musical piece fall into just two categories,
related by a ratio of 2:1 or 3:1 (Fraisse, 1982). This limit of two main categories is thought to be the
result of a cognitive limitation; it has been observed that even musically experienced
participants have problems distinguishing more than two or three durational categories
in the range below two seconds (Murphy, 1966). In accord with this notion listeners
appear to distort near-integer ratios towards the integers when they repeat rhythmic
structures (Fraisse, 1982), and musicians seem to have difficulties reproducing rhythms
that are not representable as approximations of simple ratios (Fraisse, 1982). Rhythms
that have simple ratios are also easier to reproduce at different tempi, but that is not the
case for complex rhythms (Collier & Wright, 1995). The simplicity of the ratio alone
however cannot account for all perceptual phenomena that are observed in rhythm
perception: Povel (1981) observed that even when the ratios in a rhythmic pattern are
integral, listeners sometimes cannot perceive this relationship unless the pattern structure
makes it evident.
4.3.3 Perception of Grouping
As with language, in which utterances can be segmented into sentences, words, syllables,
and phonemes, music can be segmented into groups. Rhythmic patterns are groups that
contain sub-groups, which can be combined to form superordinate musical groups such
as phrases, sections, or movements (Justus & Bharucha, 2002). Lerdahl and Jackendoff
(1983) propose that the psychological representation of a musical piece includes a
hierarchical organisation of groups, also called the grouping structure. Further evidence
to support psychological grouping mechanisms was found by Sloboda and Gregory
(1980), who observed that clicks placed in a musical piece were systematically
remembered as being closer to the phrase boundary than they actually were. This
phenomenon has also been demonstrated in the perception of speech in an experiment
that involved placing a click in a stream of speech (Garrett, Bever, & Fodor, 1966).
Results showed that listeners perceived the click at phrase boundaries, rather than in the
middle of the word, where it was actually placed.
Perceptual grouping can occur even when there is no objective basis for it, a
phenomenon called subjective rhythmisation. Within a range of around 200 ms to 1800
ms intervals, an isochronous pattern will be grouped into twos, threes, or fours (Bolton,
1894), and when listeners are asked to synchronise with such a pattern, they illustrate
the grouping by lengthening or accenting every second or third event (MacDougall,
1903). Grouping also depends on tempo: groups of larger numbers of events are more
likely at fast tempi (Bolton, 1894; Fraisse, 1982).
Rhythmic patterns also affect grouping, in that events which are separated by shorter
intervals in a sequence are grouped into units bounded by longer intervals (Povel,
1984).
4.3.4 Perception of Meter
Meter is the hierarchical organisation of musical pieces based on temporal regularities
of the underlying beat or pulse of a musical sequence. One of the main characteristics
of meter is isochrony. This means that the beats are equally spaced over time, creating a
perceived pulse at a particular tempo (Povel, 1984). A beat itself has no duration - it is
simply used to divide the musical piece into equal time-spans. A sensation of pulse may
be evoked by temporal regularity at any level within a sound sequence and is not a
feature of the raw musical stimulus itself, but rather something that the listener infers
from it. For example, if new events occur approximately once per second, a beat is perceived
every second, whether or not an event is actually present on a given beat (Justus &
Bharucha, 2002). A form of behaviour that reflects the perception of pulse is tapping of the foot to music.
The extraction of such regularities in music is often modelled as synchronisation with
an internal timing device (Povel & Essens, 1984–5; Wing & Kristofferson, 1973).
4.4 Perception of Music - Pitch
Having reviewed the temporal features of music and their consequent perception, the
following sections will look at the perception of pitch in music. When listening to
music one may experience different pitches played consecutively (melody) or at the
same time (harmony) that form coherent patterns, which unfold as the musical piece
develops. Many perceived aspects of these patterns – such as certain pitches seeming
more stable than others, simultaneously played pitches sounding more or less
pleasant together, or the occurrence of certain pitches being highly predictable – appear
to conform to the rules of tonality (see Section 4.2.2.1).
There are two main approaches to the study of pitch processing: one focuses on
sensitivity to acoustic frequencies and frequency relationships; the other is concerned
with the influence of basic cognitive processes on pitch perception. Within the
first approach, Seashore (1938) claimed that pitch is the direct perceptual correlate of
fundamental frequency; the relationship between pitch and frequency was thus believed
to be mediated exclusively by the dynamics of peripheral auditory mechanisms. The
experience of a difference between two pitches was identified with the perception of a
difference, or a ratio, between two frequencies. Empirical studies conducted within this
vein of research tend to focus on the perception of isolated tones or tone combinations.
However, some results of this reductionist approach have proven difficult to
reconcile with the intuitions of musicians and music theorists. For instance, Stevens' mel scale of
pitch (S. S. Stevens & Volkmann, 1940) implies that the same interval differs in size
according to the register in which it occurs. This type of disproportion, together with
the emergence of cognitive psychology in the 1950s, stimulated research that focused
on the role of cognitive factors in shaping the experience of musical pitch (Shepard,
1982b).
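The register dependence implied by the mel scale can be made concrete with a short calculation. The formula below is a widely used later analytic approximation to the mel scale, not Stevens and Volkmann's original 1940 measurements, and the frequencies chosen are purely illustrative.

```python
import math

def hz_to_mel(f_hz):
    # Common analytic approximation to the mel scale
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The same musical interval (an octave, frequency ratio 2:1) spans a
# different number of mels depending on the register in which it occurs:
low = hz_to_mel(440.0) - hz_to_mel(220.0)     # octave A3-A4
high = hz_to_mel(3520.0) - hz_to_mel(1760.0)  # octave A6-A7
# the higher octave spans well over twice as many mels as the lower one
```

This is exactly the disproportion noted above: an interval that musicians treat as one and the same category occupies different extents on the psychophysical scale.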
4.4.1 Categorical Perception of Musical Pitch
Musical pitch continua are sensory continua, thus it would be expected that they would
be perceived continuously rather than categorically (see 2.3.4). Burns and Ward (1978)
examined trained musicians' perception of melodic intervals between sequentially
presented tones and found that perception was categorical: identification functions for
the intervals were steep and discrimination was best at the semitone boundaries. This
pattern of results was attributed to learning, rather than to the acoustic properties of the
stimuli, a conclusion supported by the finding that non-musicians did not
exhibit categorical patterns of results (Burns & Ward, 1978).
Consistent with these musician-specific categorical perception effects, Siegel and Siegel
(1977a) observed that trained musicians are very accurate in labelling intervals ranging
between unison46 and a major triad, whereas non-musicians show inconsistent labelling.
In a follow-up study, Siegel and Siegel (1977b) measured musicians' magnitude
estimates for intervals that ranged from a fourth to a fifth. Perceptual plateaus and
reduced variability were observed within the three interval categories (fourth, tritone,
fifth), whereas rapid changes with higher variability were observed at the boundaries.
This pattern of results led to the conclusion that musicians perceived these intervals
categorically; however, discrimination abilities were not assessed.
Categorical perception experiments have also been conducted on simultaneous intervals
and chords. Locke and Kellar (1973) presented chords consisting of three tones,
varying the frequency of the middle tone. Participants were asked to identify stimuli
from a continuum between a minor and a major triad. Musicians' perception of these
chords was categorical, whereas non-musicians showed rather poor and non-categorical
identification and discrimination. In a similar vein, Blechner (1977) presented chords
from a minor to a major continuum for identification and discrimination. The listeners
who were capable of consistently labelling the stimuli as minor and major exhibited
categorical perception patterns, whereas those listeners who could not identify minor
and major stimuli did not, and had lower discrimination scores. Similar results were
found by Zatorre and Halpern (1979), who used two-tone simultaneous intervals from
minor third to major third. Thus, it seems that musically trained listeners identify
acoustically ambiguous chords as major or minor, in a similar manner to that in which
listeners identify ambiguous speech stimuli as one category or the other.

46 If two tones are in unison, they are considered to be the same pitch, but are still perceived as coming from two separate sources ("Grove Music Online," 2001).
In summary, it can be concluded that the phenomenon of categorical perception is not
unique to speech or acoustic cues that are relevant to speech, and that categorical
perception of pitch in musical stimuli appears to reflect learned categories, rather than
psychoacoustic sensitivities, thus implicating cognitive factors rather than purely
perceptual mechanisms.
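The identification/discrimination logic behind these categorical perception claims can be illustrated with a toy model (all names and parameter values here are invented for illustration, not fitted to any of the cited data): identification follows a steep logistic function of continuum step, and predicted discrimination of adjacent stimuli, approximated as the difference in their labelling probabilities, peaks at the category boundary.

```python
import math

def p_major(step, boundary=5.0, slope=2.0):
    """Probability of labelling a stimulus 'major' along a 10-step
    minor-to-major continuum (logistic identification function;
    all parameter values are illustrative only)."""
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

# Predicted discrimination of adjacent continuum steps, approximated as the
# difference in labelling probabilities: largest where labels change fastest
discrim = [p_major(s + 1) - p_major(s) for s in range(10)]
best_pair = max(range(10), key=lambda s: discrim[s])
# best_pair falls at the category boundary, not at the continuum ends
</imports>```

On this account, a listener with no stable labels (a flat identification function) would show no discrimination peak, which is the pattern the non-musicians exhibit.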
4.4.2 Relative Pitch
Melodic musical intervals can be defined as subjective correlates of sequential
frequency ratios. The melodic information in a musical piece is not dependent on the
absolute frequencies of the tones that form the melody, but rather on the frequency
relationships between the tones. The musical scales of all cultures are based on the
notion that equal frequency ratios cause equivalent percepts (Burns, 1999).
Accordingly, melody transposition does not cause loss of melodic information.
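A minimal sketch (the helper name is my own) of why transposition preserves melodic information: intervals measured as logarithms of frequency ratios are unchanged when every frequency in the melody is scaled by the same factor.

```python
import math

def intervals_semitones(freqs_hz):
    """Successive melodic intervals in semitones: 12 * log2 of each
    frequency ratio, rounded for comparison."""
    return [round(12 * math.log2(b / a), 2)
            for a, b in zip(freqs_hz, freqs_hz[1:])]

melody = [440.0, 493.88, 523.25]        # A4, B4, C5 (equal temperament)
transposed = [f * 1.5 for f in melody]  # the same melody, a fifth higher

# Transposition scales every frequency by the same factor, so the
# ratios, and hence the intervals, are unchanged
assert intervals_semitones(melody) == intervals_semitones(transposed)
```

The interval sequence, not the absolute frequencies, is what the two renditions share, which is the representation relative pitch operates on.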
Trained musicians who have developed what is called relative pitch (RP) are able to
assign verbal labels to musical intervals. Possessors of RP are able to identify the name
of the note or interval, but only with a given reference note. RP also enables musicians
to produce intervals when given an interval name and a reference tone. RP is not
necessary in order to appreciate or play music, but it can assist in sight-reading music.
Most trained musicians have RP abilities, and RP training is included in most music
curricula.
4.4.3 Absolute Pitch
A small proportion of the population has absolute pitch (AP). AP (also
called perfect pitch) is defined as the ability to produce or identify specific pitches
without reference to an external standard tone (Baggaley, 1974). Possessors of AP have
internalised their pitch references, and are able to maintain stable representations of
pitch in their long-term memory.
The faculty of AP is often compared with colour perception (D. Ward, 1999): for
possessors of AP, pitch labelling is as easy as colour labelling is for
most humans. Nevertheless, the comparison is not quite perfect, as the visual system
divides visible wavelengths on the basis of retinal cells (cones) which are specialised
for particular wavelength ranges (Bimler, Kirkland, & Jameson, 2004). Information
from the peripheral auditory system and the cochlea however is more continuous and
there does not seem to be a one-to-one mapping between incoming stimulus and
percept (Bornstein, 1973). Absolute pitch is quite rare, probably occurring in only
0.01% of the population47 (Bachem, 1955; Baharloo, Johnston, Service, Gitschier, &
Freimer, 1998; Profita & Bidder, 1988; Takeuchi & Hulse, 1993). It appears to be
distinct from the ability, which some people develop, to judge the pitch of a note in
relation to a reference pitch such as the lowest or highest note they can sing. Rakowski
(1972) asked listeners with and without AP to adjust a variable signal so as to have the
same pitch as a standard signal, for various time delays between the two. At long delays,
listeners without AP showed a marked deterioration in performance relative to shorter
delays, whereas those with AP did not.
One possible explanation of the origin of AP is that it is a faculty acquired in infancy
and/or childhood through some learning process, although Ward (1999) has suggested
that the converse is true: we may all start with AP, but the ability is usually unlearned
because relative, but not absolute, pitch judgements are reinforced. The limited
success achieved by training in adulthood tends, at the moment, to favour the idea of
some sort of learning process in the development of AP (for a discussion of infant AP,
see 4.5.5 and 4.5.6).
Timbre can also play a role in AP accuracy. Lockhead and Byrd (1981) demonstrated
that AP possessors are more reliable in identifying tones played on instruments that
they are familiar with. It has also been found that notes that are played on a piano are
easier to identify than notes played on other instruments – a phenomenon also referred
to as “absolute piano” (Takeuchi & Hulse, 1993) – which may also be a familiarity
effect given the pervasiveness of the piano. These findings suggest the involvement of
factors other than pitch range or pitch register in pitch identification accuracy.

47 It should be noted that AP possessors are relatively more common in Japan, at around 30% of university music education students and around 50% or more of music students (Miyazaki, 2004). This high incidence of AP in Japan is believed to be a result of early music lessons (see section 4.4.5.2 on the origin of AP).
In an attempt to link absolute pitch to tonal language background, Deutsch, Henthorn,
and Dolson (1999, 2004) examined tonal (Mandarin and Vietnamese) and non-tonal
(American English) speaking participants' F0 variation across two days. The data show
that tonal language speakers' pitch variation between the two days was significantly
smaller than that of non-tonal language speakers, and from this the authors conclude
that the tonal language speakers "display a remarkably precise and stable form of
absolute pitch in enunciating words." (p. 399). This conclusion is questionable, as
absolute pitch is defined as the ability to voluntarily produce or identify specific pitches
without reference to an external standard tone; what Deutsch et al. (1999, 2004) are
examining here is unconscious variation of speaking voice. Nevertheless the fact that
there was less variation for tone language speakers warrants further investigation. A
subsequent study by Burnham, Peretz, Stevens, Jones, Schwanhäußer, Tsukada, and
Bollwerk (2004) using more transparent and speech-appropriate measures of F0
variation suggests that the differences between tone and non-tone language speakers are
minimal, prompting the conclusion that such tasks have little to do with absolute pitch.
4.4.4 Absolute Pitch Memory
Over and above absolute and relative pitch, listeners with no musical background can
identify familiar melodies presented at novel pitch levels and notice when those
melodies are performed incorrectly (Drayna, Manichaikul, de Lange, Snieder, &
Spector, 2001), suggesting that even participants without AP or RP have accurate
implicit pitch memory. Musical memory abilities were also tested by Schellenberg and
Trehub (2003). In order to investigate pitch memory in people who cannot identify or
produce musical notes, familiar recordings were presented and participants were
required to identify whether the excerpt was taken from the original song or whether it
was shifted in pitch. Even adults with little or no musical experience were able to
remember pitch levels of familiar songs over time, which led to the conclusion that
non-musicians can retain pitch information over long periods of time, an ability
comparable to AP. Similar results were obtained with excerpts as short as 100 ms,
suggesting that it is absolute features of the music, such as timbre or frequency spectra,
that are important, rather than relational cues (Schellenberg, Iverson, & McKinnon,
1999).
Further support for this more generalised AP-like ability comes from adults' production
of songs. When asked to sing popular songs that they know, almost two thirds of the
adults tested produced renditions within two semitones of the original versions
(Levitin, 1994) and at tempi within 8% of the original tempo (Levitin & Cook, 1996).
Similar consistency in pitch level and tempo has been found when adults were asked to
sing familiar folk songs (like “Yankee Doodle”) on different occasions, even though
they had obviously heard these songs at several pitch levels and tempi (Bergeson &
Trehub, 2002; Halpern, 1989).
Taken together, these results indicate that even though non-musicians are not able to
label pitches, they nevertheless can have good long-term musical pitch memory.
However, it should be noted that studies of this nature have been criticised for various
reasons, one being that the reproduction of melodies might involve a form of
muscle memory in the vocal cord muscles, originating from singing along to
the song, such that the vocal tract configuration can be recalled when the singer is asked
to reproduce the song (Cook, 1991; W. D. Ward & Burns, 1978).
4.4.5 Developmental Issues in Pitch Perception
4.4.5.1 Pitch Perception Development
Pitch perception is essential to music perception (see section 4.2.2). A melody is
characterised by its pitch relationships, without regard to the absolute pitch levels of the
particular tones. This is demonstrated by the fact that adults can recognise a familiar
song at any given pitch level. Infants also seem to have this ability. After limited
exposure to a melody, 5- to 10-month-old infants treat transpositions48 of that melody
as familiar/equivalent to the original melody (H. W. Chang & Trehub, 1977; Trehub,
Bull, & Thorpe, 1984; Trehub, Thorpe, & Morrongiello, 1987). When the tones are
rearranged, however (Trehub et al., 1984), or one component tone is altered (Trehub et
al., 1987), infants perceive the tune as new. For infants, the pitch contour appears to be
the most relevant aspect of a melody: they can perceive pitch contour changes even
when the standard and comparison melodies are separated by a longer (15 sec)
temporal interval (H. W. Chang & Trehub, 1977) or by a series of unrelated notes
(Trehub et al., 1984). The salience of the pitch contour is not restricted to music: in a
comparison of pitch amplitude and pitch contour, Fernald (1991) found that pitch
contour is also the most salient aspect of infant-directed speech for infants (but see
Kitamura & Burnham, 1998, who show that vocal affect is the most salient aspect of
IDS).

48 Transposition of a melody is alteration of the component pitches such that the pitch relations are preserved.
It has also been found that infants are sensitive to interval information in Western
music (Trehub & Trainor, 1993). Infants show heightened sensitivity to octave
information (Demany & Armand, 1984; Schellenberg & Trehub, 1996b) and have been
shown to confuse melodies that have the same contour (Trehub et al., 1987; Trehub,
Thorpe, & Trainor, 1990); however, they can detect small interval changes in melodies
(Cohen, Thorpe, & Trehub, 1987; Trainor & Trehub, 1993).
Another feature of Western music is the consonance and dissonance of certain intervals.
When 6-month-old infants were given a task requiring them to detect changes in
melodic intervals with varying frequency ratios, it was found that discrimination ability
was better for simple than for complex frequency relationships (Schellenberg &
Trehub, 1996b). Such findings suggest that infants find some intervals more consonant
than others and thus confirm similar observations in children (Schellenberg & Trehub,
1996a) and adults (Schellenberg & Trehub, 1994).
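The simple versus complex frequency-ratio contrast behind these consonance findings can be made explicit. A toy illustration using standard just-intonation ratios; the complexity index below is an ad hoc heuristic chosen for this sketch, not a measure used in the studies cited above.

```python
from fractions import Fraction

# Standard just-intonation frequency ratios for a few intervals
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "major third": Fraction(5, 4),
    "tritone": Fraction(45, 32),
}

def ratio_complexity(ratio):
    # Crude simplicity index: smaller numerator + denominator = simpler ratio
    return ratio.numerator + ratio.denominator

# Ordered from simplest (most consonant on this crude index) to most complex
by_simplicity = sorted(INTERVALS,
                       key=lambda name: ratio_complexity(INTERVALS[name]))
```

The ordering this produces (octave, fifth, third, tritone) tracks the usual consonance ranking, which is the sense in which the infants' better discrimination of simple-ratio intervals aligns with adult consonance judgements.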
As music consists not only of melodies but also of harmonies (though see
non-Western music in section 4.2.3), infants' perception of harmony is also of
interest. In a study of adults' and 8-month-old infants' detection of harmonic change,
adults were better able to detect changes that went outside the key49 than changes
within the key, whereas infants discriminated both kinds of change equally well,
independent of the key (Trainor & Trehub, 1992). This is in line with the general view
that much of the listener's knowledge of Western harmony is based on learning and
exposure, rather than on natural predispositions.
49 A key is one out of 24 major and minor diatonic scales that provide the tonal framework for a piece of music.
4.4.5.2 Absolute Pitch Perception Development
There has been a continuing debate about the origin of AP, mainly between theories
that highlight inherited contributions to AP and those that stress experiential
contributions (Takeuchi & Hulse, 1993; D. Ward, 1999; D. W. Ward & Burns, 1982).
Results that support and contradict the genetic view, as well as data for and against the
experiential theory and early learning of AP, are summarised below.
Supporters of the hereditary theories emphasise the rarity of AP, and point to evidence
that AP is concentrated in families (Bachem, 1955; Baharloo et al., 1998; Baharloo,
Service, Risch, Gitschier, & Freimer, 2000; Gregersen, Kowalsky, Kohn, & Marvin,
2001; Profita & Bidder, 1988). In fact, while the percentage of AP possessors in the
general population is thought to be less than 0.01% (Bachem, 1955; Baharloo et al.,
1998; Profita & Bidder, 1988; Takeuchi & Hulse, 1993), these numbers are only
estimates and may exaggerate the rarity of AP. Moreover, inherited and environmental
factors are inseparably confounded, and a high incidence of AP by itself is not reliable
support for the contribution of genetic factors (Levitin, 1999; R. J. Zatorre, 2003).
On the other hand, a number of researchers have examined whether AP is learnt
through training. Attempts to improve AP identification in adults by intensive training
have had some success (Cuddy, 1968, 1970), but the levels of performance rarely equal
those found in real cases of AP. Some of those studies showed that it is possible to
remember a fixed standard pitch to a certain degree (Brady, 1970; Cuddy, 1968, 1970).
Nevertheless, these AP training accuracies are far from the level of real AP possessors
who can immediately and accurately identify the 12 pitch classes.
Thus, there is no conclusive evidence supporting adult-learning accounts of AP. It should
be mentioned, however, that surveys of large numbers of musicians show that the
proportion of participants who reported having AP decreased as the age of starting
musical training increased (Baharloo et al., 2000; Sergeant, 1969). Miyazaki and Ogawa (2006) tested
children who attended music schools and observed that AP accuracy increased to 80%
or more between 4 and 7 years, and more than two thirds of children acquired AP to a
level of 90% or more correct. It was also found that AP for white piano notes developed
earlier than for black piano notes.
However, existing evidence for the early-learning model of AP is based mainly on
anecdotal reports or on surveys of biographical recollections; there is as yet no
conclusive experimental evidence supporting the model. In fact, some researchers have
tried to train children in AP, but with little or no success (Cohen & Baird, 1990; Crozier,
1997); these failures may be due to the limited quantity and duration of the training given.
Altogether, these results indicate that AP is most likely a result of long-term training in
childhood. This issue will be considered further in relation to studies with infants
designed to test their AP ability (see section 4.4.5.3).
In summary, these results support the early-learning theory of AP genesis, proposing
that AP is most effectively acquired through training in early childhood. The early-learning
view of AP parallels the critical period hypothesis50 (Lenneberg, 1967)
for language learning, suggesting that the acquisition of AP should be
considered within a broader framework of cognitive development.
4.4.5.3 Absolute Pitch Abilities in Infants
As mentioned in 4.4.5.2, it is generally assumed that the roots of AP lie in
childhood/infancy. Facts that support this view include the negative correlation
between the age of onset of musical training and the accuracy of AP (Miyazaki, 1988;
Sergeant, 1969), and the finding that younger children are better than older children in AP
training tasks (Crozier, 1997). According to this view, early in life, AP would be the dominant
pitch-processing mode, which is later replaced by the more functional ability to
represent and remember pitch relations (relative pitch). Saffran and Griepentrog (2001)
conducted two experiments with 8-month-old infants in order to investigate the use of
absolute and relative pitch cues in a statistical learning task with tone sequences. The
results suggest that infants are more likely to track prototypes of absolute pitches than
of relative pitches. In the third part of this experiment adult musicians and non-
musicians were tested on the same statistical learning tasks. Unlike the infants, adult
listeners depended mainly on relative pitch cues. These results suggest that there is a
developmental reorganisation from an initial focus on AP to the ultimate dominance of
RP. AP may be a less mature perceptual capacity, eventually replaced by RP during
development (Trehub, Schellenberg, & Hill, 1997).

50 The strong version of the critical period hypothesis states that children must acquire their first language by puberty or they will never be able to learn from subsequent exposure. The weak version is that language learning will be more difficult and incomplete after puberty.
In summary, these studies of pitch perception in infancy augment our knowledge of
pitch perception and its development. It appears, for example, that knowledge of
harmony is probably a learned skill and that absolute pitch is most likely a result of
intensive early training.
4.4.6 Hemispheric Differences in Pitch Processing
The processes involved in pitch perception depend, to some extent, on the locations in
the brain where information is processed, particularly with regard to the lateralization
of brain function. Here, lateralization with regard to pitch perception in general and to AP
in particular is considered in turn.
4.4.6.1 Lateralization of Pitch Processing
In the second half of the nineteenth century, Wernicke (1874) and Broca (1861) found
that certain speech and language problems could be related to specific brain areas. This
led to the suggestion that the left hemisphere of the brain was responsible for language
and analytic processing and the right hemisphere more for processing global
information, such as patterns. Accordingly, musical abilities were thought to be a
function of the right hemisphere, whereas language abilities seemed to be
located in the left side of the brain. Today, the picture seems to be much less simple.
It is not clear whether pitch perception in music and language shares cognitive
mechanisms, although a left hemisphere bias has been shown for the processing of pitch
category information in both music and language. The suggestion that pitch categories
of a lexical or musical nature are especially dependent on left hemisphere processing is
interesting, and differences between musical and lexical pitch categories suggest that
the two may be processed separately within the left hemisphere. A great deal of
research has been devoted to investigating shared processing mechanisms for music and
speech. Currently, there are two significantly different views on that matter. One of
them assumes strict modularity of the two systems; it states that speech and music are
separate and do not share the same mental processing systems (Fodor, 1983). This view
is supported by the results of many behavioral and imaging studies that have suggested
that linguistic processing occurs in the left hemisphere of the human brain, whereas
music is processed in the right hemisphere (Bever, 1975; Bever & Chiarello, 1974).
Even though these studies show that there is a lateralization effect, it appears that
hemispheric dominance of different processing mechanisms is not absolute, but can be
viewed as a tendency (Wong, 2002).
The alternative view states that hemispheric differences affect particular aspects of
auditory processing and that shared acoustic features of music and speech, such as
pitch, will be processed in a similar way. This possibility is based on the results of
various experiments that have shown that phonemic processing occurs in the left
hemisphere, whereas melodic and prosodic units are processed in the right part of the
human brain (Bryden, 1982; Kimura, 1961, 1964; Patel, 2003; Patel & Peretz, 1997;
Peretz & Coltheart, 2003; Shankweiler & Studdert-Kennedy, 1967; Studdert-Kennedy
& Shankweiler, 1970; Van Lancker & Fromkin, 1973, 1978). It appears that some
aspects common to music and speech, for example hierarchical organisation, are
processed in overlapping areas of the brain, an observation that suggests that there are
common neural mechanisms which are used in speech and music processing (Patel,
2003).
In the current series of experiments, the effect of experience-dependent learning in the
domain of music on processing in the domain of speech is investigated; the results
may shed light on the modularity or non-modularity of speech
and non-speech processing.
4.4.6.2 Lateralization of Absolute Pitch
It appears that there may be some neurophysiological differences between AP
possessors and people without AP. Keenan, Thangaraj, Halpern, and Schlaug (2001)
reported that musicians with AP have an enlarged planum temporale, a region that is
located in the left temporal lobe of the brain, which has been found to be involved with
language processing.
In order to investigate the neural basis of AP, Zatorre, Perry, Beckett, Westbury, and
Evans (1998) used functional and structural brain imaging techniques to measure
cerebral blood flow during presentation of musical notes to both possessors of AP and
to musicians without AP. Both listener groups showed similar patterns of increased
cerebral blood flow in auditory cortical areas, and the group of AP possessors also
demonstrated activation of the left posterior frontal cortex, an area thought to be related
to learning conditional associations. This activity was also observed in non-AP subjects
when they made relative pitch judgments of intervals (R. J. Zatorre et al., 1998).
Increased activity within the right inferior frontal cortex was observed in RP but not in
AP subjects during the interval judgment task, suggesting that AP possessors need not
access working memory mechanisms in this task, because they simply classify each
interval by name. Magnetic resonance imaging measures of cortical volume also
suggested that listeners with AP have an enlarged planum temporale, a result that
correlated with performance in a pitch-naming task (R. J. Zatorre et al., 1998). Their
findings suggest that AP depends on the use of a special neural network that enables
retrieval and manipulation of verbal-tonal associations.
Both of these studies point to the importance of the left hemisphere of the brain
in AP.
4.5 Music and Other Domains
Music and speech are the most complex uses of sound-based communication in
humans. In addition to this similarity, they share other features. Both music and speech
are generative: simple elements such as notes or speech sounds are
combined systematically in order to create complex but meaningful structures, such as
melodies or words (R. J. Zatorre, Belin, & Penhune, 2002). Both music and speech
consist of elements that are time-dependent and occur in sequences, in which pitch,
duration and dynamics are very important. Both systems are constrained by the limits
of the auditory system, the central nervous system, and memory. Lerdahl and
Jackendoff (1983) have found similarities between the hierarchical structuring in
musical rhythm and the prosodic timing patterns in speech.
The suggestion that there might be links between musical and non-musical domains has
generated a large amount of research in recent years. One line of research is concerned
with the short-term benefits of listening to classical music, and another is concerned
with the effects of long-term musical training (see section 4.5.1). Each is discussed
here.
Neurophysiological studies to investigate common vs. separate processing of music and
speech have been considered in 4.4.6.1. Here behavioral evidence on this issue is
considered with respect to priming studies, transfer effects, and the effect of music on
speech.
Before considering these studies, it is important to clarify the differences between
musical ability, musical aptitude, and musical training. Everybody learns to speak51;
language is learnt through exposure to speech. Mothers and caregivers talk to infants
and children, and on this and other bases children learn to speak and to understand
language. Moreover, there are certain proclivities and structural aspects of the human
brain that facilitate language learning. If we look at music, the case is not so simple:
not everybody is exposed to music at the same level, nor has the opportunity to learn a
musical instrument (or to have singing lessons).
It has become clear that certain aspects, such as musical harmony, are learnt by
exposure to music, i.e., by listening. Other aspects, such as knowing the different keys on
the piano, must be trained, just as learning to read requires instruction. It is known that
there are children - dyslexic children - who have problems with reading (Orton, 1925).
In relation to the speech/music comparison, the question that then arises is whether
there also are people who have problems learning to play an instrument. This question
will be addressed in the following review of musical aptitude studies.
Musical aptitude is different from musical ability. Musical ability refers to the level of
musical skill and musical understanding of an individual (Boyle, 1992). The level of
musical ability is generally a result of various factors, such as aptitude, and musical
training. Musical aptitude is the possibly latent potential of a particular individual to
acquire musical skills.
There are various measures of musical aptitude (Gordon, 1965, 1989; Seashore, Lewis,
& Saetveit, 1939, 1960; Shuter-Dyson & Gabriel, 1981). The aptitude test used in the
current series of experiments is the Advanced Measures of Music Audiation
(AMMA; Gordon, 1989). This test is used because it is the only music aptitude
test developed specifically for university students and has been found to have significant
51 This excludes individuals with medical conditions that do not allow them to learn how to talk.
predictive validity (Gordon, 1990). A criticism often levelled at aptitude
assessment is that musical aptitude and training may be confounded, i.e., performance
on such tests may improve with musical training, so musical aptitude tests may
never measure "pure" aptitude. This problem cannot be solved in this thesis;
however, regression analyses will be conducted in order to control for such confounds.
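The logic of such a regression control can be sketched as follows: regress aptitude scores on a training measure and analyse the residuals, which are uncorrelated with training by construction. The sketch below is purely illustrative; the scores, variable names, and the choice of simple linear regression are assumptions, not data or methods from the thesis.

```python
def residualise(aptitude, training):
    """Return aptitude scores with the linear effect of training removed
    (ordinary least-squares residuals with an intercept)."""
    n = len(aptitude)
    mx = sum(training) / n
    my = sum(aptitude) / n
    sxx = sum((x - mx) ** 2 for x in training)
    sxy = sum((x - mx) * (y - my) for x, y in zip(training, aptitude))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed score minus the score predicted from training alone
    return [y - (slope * x + intercept) for x, y in zip(training, aptitude)]

# Hypothetical scores: AMMA-like aptitude scores and years of formal training
aptitude_scores = [22, 25, 31, 35, 40, 41]
training_years = [0, 1, 3, 5, 8, 10]
residual_aptitude = residualise(aptitude_scores, training_years)
```

By construction the residuals sum to zero and are orthogonal to the training measure, so any relationship they show with a third variable cannot be attributed to training.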
In the ongoing debate on the degree to which certain cognitive abilities are influenced
by biological or environmental factors, the issue of musical aptitude versus training as a
determining factor of musical ability is often raised. In Western culture it is generally
believed that musical ability can mostly be explained by innate talent or giftedness
(Davis, 1994; Gardner, 1983; Radford, 1990). However, there is no direct evidence for
genetic involvement in musicality (Howe, Davidson, & Sloboda, 1998). This is in line
with the claim that most individuals have the potential to develop musical skill
(Ericsson, Krampe, & Tesch-Römer, 1993). Thus, the origin of musical ability and
aptitude has not been clearly established. One of the aims of the current series of
experiments is to investigate the relative roles of musical training and musical aptitude
on speech perception and production, which may lead to clearer understanding of the
interplay between these factors. Even though no gender differences in musical ability
and musical aptitude have been found (Shuter-Dyson & Gabriel, 1981), there are
differences between the genders in terms of musical involvement and achievement.
Girls are generally more involved and successful than boys in musical activities at
school, yet men still dominate the professional music world. Research on
instrumental music has shown that children's gender-role beliefs and
self-perceptions run opposite to the gender differences observed in adults (Eccles,
Wigfield, Harold, & Blumenfeld, 1993).
4.5.1 The Effect of Music on Other Cognitive Abilities
This section reviews studies on how long-term music training can influence
unrelated cognitive abilities, such as spatial and visual abilities,
verbal skills, reading, and mathematics. Studies with infants and adults will be
considered.
Gromko and Poorman (1998) found that children's musical aptitude was positively
related to performance on a task that involved matching melodies with graphic
representations, suggesting that musical ability also has an effect on symbolic
reasoning.
A different line of research investigates the relationship between musical skills and
verbal and visual skills. Hassler, Birbaumer, and Feil (1985) investigated visual-spatial
abilities and verbal fluency in nine- to fourteen-year-old children. The sample consisted
of three groups with different musical abilities (non-musicians, musically talented,
musically talented and able to compose/improvise). Test results indicate that the
musically talented children were better than the non-musical children at verbal fluency
and visualisation, but not at tasks measuring spatial relation ability. This shows that
formal music training can be accompanied by better performance in non-musical tasks
(Hassler et al., 1985). In a similar study with adults, Chan, Ho, and Cheung (1998)
compared the verbal and visual memory abilities of women with and without musical
training. The groups did not differ on the visual memory task, whereas the
musicians performed better on the verbal memory task (A. S. Chan et al., 1998). A
later study of the influence of formal music training on verbal recall likewise showed
positive transfer between these unrelated skills (Y.-C. Ho, Cheung, & Chan, 2003).
Together these findings show that musical training can enhance visual and verbal
abilities.
Another line of research investigated the relationship between music training and
reading and writing skills in children. It was found that the ability to discriminate
musical sounds was related to reading performance in early readers of four to five
years of age (Lamb & Gregory, 1993). Similarly, Standley and Hughes (1997) found that
children in pre-kindergarten classes (four to five years old) who took music lessons
over a period of two months showed improved pre-reading and pre-writing abilities,
compared to children without music lessons. A study on the relationship between music
and reading showed that a group of students that received Kodaly52 music instruction
52 Zoltan Kodaly was a Hungarian composer and ethnomusicologist. He developed a way of educating young children through singing of the native mother tongue folk songs. The Kodaly Method promotes the learning of music in a series of concepts then applies a sequential learning process for teaching music that follows the natural developmental pattern used in learning a language: i. e., aural, written, and then
scored significantly higher in a reading ability test than a control group that did not
receive music training (Hurwitz, Wolff, Bortnick, & Kokas, 1975). However, it was not
clear whether the enhancement of reading ability was caused by the music training
itself or simply by the more varied school program; the question remained unanswered
whether the children would have improved in reading had they been given a different
kind of special instruction. It also remains unclear how music training could facilitate
reading, since the music training group did not learn to read music.
The effect of music training on mathematical ability has been investigated in several
studies, reviewed here. Mathematical ability in five- to seven-year-old children
was observed, and those children who participated in the arts were found to have better
mathematical skills than children who did not (Gardiner, Fox, Knowles, &
Jeffrey, 1996). Another study investigated possible transfer effects of keyboard lessons
in children of six to eight years of age; the results show improved mathematical skills
for the children who took keyboard lessons, further evidence that musical skill
can influence mathematical ability (Graziano, Peterson, & Shaw, 1999). In a
more complex study, Graziano et al. (1999) compared
the proportional reasoning scores of children of seven to nine years. The study
included one group who received computer-generated spatial-temporal training alone
and another group who received the same spatial-temporal training and piano lessons.
Although both groups scored higher than a control group, the group that included piano
training scored significantly higher than the group that did not. A more recent study
(Rauscher & LeMieux, 2003) found that children who received two years of individual
keyboard lessons performed better on a standardised arithmetic test than children in
control groups. Children who received singing instruction also scored higher than
controls. Children who received instruction on rhythm instruments performed best on a
mathematical reasoning task. A meta-analysis combining six experimental studies
provides tentative support for the notion that music training affects mathematical
achievement (Vaughn, 2000).
read. Rhythm symbols and syllables are utilized. Hand signals are used in order for the singer to visualize the pitches being sung and to understand tonal relationships.
The studies reviewed here provide suggestive evidence that formal music
training has positive effects on non-musical abilities, although the specific effects
vary considerably between studies and many of the studies have been criticised for
lack of reliability. Nevertheless, it can be concluded that music lessons can have
positive transfer effects on non-musical abilities and enhance performance on
unrelated tasks.
Music lessons combine different factors, such as practice, attention, concentration,
timing, ear training, sight-reading, and exposure to music. Schellenberg (2001) argues
that music lessons may be unique in combining all of these factors, and that this
combination may underlie positive transfer into non-musical areas such as
mathematics or spatial reasoning. He points out that music lessons might improve
general skills, such as attention to rapidly changing temporal features, which are
likely to transfer to other, non-musical domains (Schellenberg, 2001).
These data suggest that music may support cognitive abilities in other
disciplines, and the connection between music and spatial-temporal reasoning is
particularly convincing. However, it is not clear exactly what music instruction
contributes, and no longitudinal studies have been conducted to determine the
longevity of the effects.
4.5.2 The Effect of Music on Speech
The effect of music training and exposure to music on unrelated cognitive abilities
was reviewed in the previous section. This section focuses on the role of music in
speech processing, because the perception of music and speech involves shared
processes, such as melody recognition, contour processing, timbre discrimination,
rhythm processing, prediction, and perception of symbols in context. As music has
been shown to be related to other cognitive abilities (see section 4.5.1), it is interesting
to examine how music and speech processing are connected. The interest in the
relationship between music and speech abilities has a long tradition (Dexter &
Omwake, 1934) and the popular opinion that musical aptitude helps in language
learning has been investigated since the 1930s. In this section, the influence of music
training and music aptitude on pitch processing, word recognition, prosody perception,
and foreign language learning will be reviewed.
In a recent study about the influence of musical training on pitch processing in
language, Magne, Schoen, and Besson (2006) analysed 8-year-old children's behavioral
data and event-related potentials (ERPs) in a pitch perception task. Those children who
had a musical training background performed better at a pitch incongruity task in both
language and music. Together with the electrophysiological data, these results confirm
what had previously been found: there are positive transfer effects between the areas of
music and language and the development of prosodic and melodic processing is
influenced by musical experience (Magne et al., 2006).
Another line of research is concerned with the effect of music training on word
recognition, reading, and related abilities. McMahon (1979) trained young children to
discriminate three-note chords and found improved word recognition, reading, and
general phonic skills (McMahon, 1979). In a study by Douglas and Willatts (1994),
8-year-old children's literacy and musicality were assessed, and the results suggest
that pitch-discrimination ability, rather than rhythm-discrimination ability, predicts
literacy.
Different aspects of music and speech have been examined; the remainder of this
section will focus on the existing literature concerned with the processing of prosody
and melody. Prosody has linguistic and emotional functions and can be defined as
stress and intonation patterns in speech, measured as fundamental frequency, intensity,
duration, and spectral features. Thompson, Schellenberg, and Husain (2004)
investigated the emotional function of prosody and found that adults with musical
training were better than musically untrained adults at identifying emotions such as
sadness or fear. They also demonstrated that 6-year-old children tested after one year
of musical training were better than musically untrained children at identifying anger
or fear. It was concluded that music lessons heighten sensitivity to the emotions
conveyed by speech prosody (for a review see Slevc & Miyake, 2006).
4.6 Influence of Music on Foreign Language Sound Acquisition
The assumption that musically talented people are also better at learning foreign
languages is a very old one, yet it has rarely been tested systematically at the
segmental level; most studies have addressed grammar-related structural levels
(Blickenstaff, 1963; Gilleece, 2006). In this section, studies concerning the
relationship between music and foreign language sound learning are reviewed.
In an investigation of musical ability and second language learning, Fish (1984) found
that the results of the melodic variation subtest of the music aptitude test (Gordon,
1965) and scores on the Pimsleur sound discrimination test (Pimsleur, 1966) were
correlated; however, no correlation was found between music aptitude, sound
discrimination and imitation of short phrases in German ("MLA cooperative foreign
language test, German," 1964). Arellano and Draper (1972) found that pitch, intensity,
and timbre perception, tonal memory and Spanish articulation were correlated
(independently of IQ) in speakers of American English. Similar results were obtained in
a study by Westphal, Leutenegger, and Wagner (1969), who tested the relationship
between psychoacoustic factors and intellectual abilities and second language learning
achievement in American English speaking beginners of German. They observed a
significant correlation between scores on a music aptitude test (Seashore et al., 1939,
1960) and comprehension, reading and production of German sentences. In a recent
study, Gilleece (2006) investigated the relationship between music aptitude and second
language learning and found a significant relationship between music aptitude and
language aptitude, independent of general intelligence.
Another study that illustrates a remarkable link between musical experience and
pronunciation ability was conducted by Eterno (1961). His results show that 90% of
eighth graders who had a minimum of one year of musical experience scored above
average in a pronunciation test, while the remaining 10% scored average; not one
student who played a musical instrument scored below average.
In a number of experiments concerning the acquisition of new tonal contrasts,
Gottfried (2007) showed that musicians are more accurate than non-musicians at
identifying, discriminating, and imitating Mandarin tones.
Taken together, these data show that there is a strong relationship between music and
second language learning; musical training generally appears to assist in foreign
language (sound) acquisition (Tahta, Wood, & Loewenthal, 1981; Thogmartin, 1982).
The aim of the last experiment in the current series is to investigate the relationship
between musical training, musical pitch memory, music aptitude, language aptitude,
and the acquisition of foreign language speech sound perception and production, in
order to clarify which aspects of music assist language learning at the segmental
level.
Research findings in the area of categorical perception, lexical tone, and music
perception were presented in the previous chapters. In the next section, these studies
will be summarised, and the motivation behind the current series of experiments will be
explained. Following this chapter, Experiments 1, 2, and 3 are presented in Chapters
6, 7, and 8, respectively.
5.1 Categorical Perception of Artificial Tone Continua
In the second and third chapters, speech perception processes and lexical tone were
reviewed. Special attention was devoted to the phenomenon of categorical perception.
None of the existing studies of categorical perception of lexical tone tested more than
one tonal language. Thus, any outcomes of those studies cannot necessarily be
attributed to tonal languages in general, but only to the particular languages that were
tested. In this thesis, listeners from different tonal language backgrounds are tested, in
order to investigate whether the observed influence of tonal language experience on the
perception of speech and sine-wave tones is a universal phenomenon, or whether
unique perceptual differences exist for each tonal language. In Experiment 1, speakers
of three tonal languages (Vietnamese, Mandarin, and Thai) will be tested with artificial
synthesised tone continua. The tones are presented in speech and sine-wave contexts in
order to eliminate the effect of tonal content on speech perception. Non-tonal language
(Australian English) speakers' tone perception is also examined in order to compare
processes of speech perception in tonal and non-tonal language speakers.
Not all previous experiments on categorical perception of lexical tone have measured
identification and discrimination of tones. As discussed in Chapters 2 and 3, it is
necessary to test both identification and discrimination in order to draw conclusions
about the categoricality of perception. In Experiments 1 and 2, both components of the
categorical perception paradigm will be administered.
This thesis is also concerned with the processing of speech sounds compared to non-
speech sounds. Comparisons between the perception of speech and sine-wave tones in
speakers of tonal languages and non-tonal languages will focus attention on the
differences between speech and non-speech processing.
Most studies in categorical tone perception have only investigated whether tones are
perceived categorically or continuously. In the current Experiments (1 and 2), degrees
of categoricality, measured as subtle differences between listener groups, will be
analysed. In addition to the degree of categoricality, the relative location of category
boundaries on the synthesised tone continua will be investigated.
So far, only one study has used asymmetrical tone continua to investigate category
boundary differences between different language groups (S. Chan et al., 1975; W.
Wang, 1976). The present Experiments 1 and 2 will approach the issue of boundary
location on an asymmetric tone continuum, and thus examine the matter of perceptual
tone space in a more systematic way. Use of an asymmetrical continuum allows the
investigation of psychoacoustic vs. linguistic strategies in tone perception.
5.2 Influence of Musical Background on Tone Perception
In Chapters 3 and 4, pitch perception in speech and music was reviewed. The findings
indicate that both language and musical experience can influence perception of musical
pitch and lexical tone.
In Chapter 3, it was also shown that pitch perception is shaped by experience with
lexical tone, and in Chapter 4 that pitch perception is also influenced by experience
with musical tone. In general, musicians and tonal language speakers are more accurate
at perceiving pitch differences.
In Experiment 1, the role of musical ability and tonal language experience is
investigated in a preliminary fashion, ahead of a more detailed and analytic
investigation of these factors in Experiment 2.
5.3 Perception and Production of Tones, Vowels, and Consonants –
Influence of musical aptitude and language aptitude?
Musical ability has different constituents - experience, aptitude, and memory. In
Chapter 4 the influence of music exposure and music lessons (music experience) on
other cognitive abilities was discussed. The results show that music exposure and
training enhances performance in other non-musical areas. In Experiment 3, musicians'
and non-musicians' perception and production of speech sounds (tones, vowels, and
consonants) are investigated and related to the components of music ability, memory
and aptitude, via administration of a musical memory task and a music aptitude test.
CHAPTER 6
Categorical Perception of Speech and Sine-Wave Tones in
Tonal and Non-Tonal Language Speakers
Experiment 1 concerns categorical perception of speech and sine-wave tones in
speakers of tonal and non-tonal languages. In this introduction past research and a
number of methodological issues are considered ahead of the presentation of
Experiment 1.
6.1 Background: Research on the Categorical Perception of Tone
As indicated in Chapter 3, research regarding categorical perception of lexical tone is
confounded by contradictory findings, differences in participants' tonal language
background, varying stimuli, differential methods and data analysis across studies,
and inconsistent operationalisation of categorical perception. These issues and
differences have made it difficult to make comparisons and to provide a
comprehensive account of lexical tone categorisation. Experiment 1 addresses these
issues by providing a common context for comparison between speakers from a non-
tone language background (Australian-English) and a variety of tone language
backgrounds (Thai, Mandarin, and Vietnamese), as well as a measure of
categorisation that encompasses both tone identification and discrimination. In doing
so, a clearer indication of whether perceptual effects are language-dependent or
language-universal can be more confidently made.
Experiment 1 is based on previous research on categorical perception of lexical tone
(see section 3.7). Most recently, Burnham and Jones (2002) found that language
background only impacted upon categorical perception of tones presented as speech,
and not upon any of the non-speech representations they studied (F0 variations in
filtered speech, violin notes, or sine-waves). Specifically, there was better categorical
identification of tone by tonal language (Thai) than non-tonal language (Australian
English) speakers, but only on speech syllables. The relative degree of categoricality
between perceivers with Thai and Australian-English backgrounds did not differ
across the three non-speech tone types. While this finding is superficially inconsistent
with earlier studies of Thai listeners in which tones in speech were perceived rather
continuously (Abramson, 1961; 1977), it should be noted that categorical speech
perception was only investigated in identification by Burnham and Jones (2002).
Together these results suggest that, not only does categorical tone identification
depend on a tone language background, but also that tone in speech is perceived more
categorically than non-speech tone.
In this experiment Thai, Mandarin, Vietnamese, and Australian English speakers will
be tested with speech and sine-wave tones in both a categorical identification and a
categorical discrimination task. Thus this study (a) increases the number of tonal
languages considered, and (b) expands the measures used to assess categorical
perception of tone.
With regard to the increase in the number of tonal languages investigated, this is
necessary because previous categorical perception results for tonal language speakers
are inconsistent. With respect to pitch movement (dynamic vs. static), static tones
tend to be perceived continuously (Thai: Abramson, 1979; Cantonese: Francis et al.,
2003), whereas dynamic tones are perceived categorically (Mandarin: Wang, 1976;
Cantonese: Francis et al., 2003). Moreover, because of the
differences in stimuli and methods, data are not comparable across studies
investigating Thai and other tonal languages. Wang (1976) observed categorical
perception of Mandarin contour tones, with the boundaries being different in
American English vs. Mandarin listeners, indicating that the reference point that is
used to divide the continuum perceptually depends on language background.
Based on these results and those of Burnham and Jones (2002), Thai and Mandarin
listeners' perception of tones will be investigated in addition to that of Australian
English listeners. Furthermore a tone language as yet unexamined in relation to tone
categorisation, Vietnamese, is also examined in order to expand the range of different
tonal language backgrounds investigated. On this basis, it may be possible to
determine whether categorical perception of tone is general across this subset of tonal
languages.
With regard to the measures used to assess categorical perception, important issues
are pointed out in the following section.
6.2 Methodological Issues
In order to provide a reliable and valid methodological basis for Experiment 1,
deliberations on several methodological concerns are presented below.
6.2.1 Stimulus Type Presentation: Blocked vs. Mixed
The method of stimulus presentation is of concern in relation to measures of
categorical perception. Previous research has suggested that perceptual processes are
affected by the context surrounding the presentation of sounds (see section 2.5.1.2).
Burnham and Jones (2002) presented speech and non-speech tones in separate blocks
in their experiment. Indeed it may be the case that blocked presentation of stimulus
types encourages different modes of perception in particular blocks, or that a "speech"
mode of perception is transferred from a block of speech stimuli to a subsequent block
of non-speech stimuli. (This methodological point bears, to some extent, on the issue
of whether speech and non-speech perception are modular.) Even though the order of
blocks was randomised by Burnham and Jones (2002), only three quarters of
presentations (all except those in which the speech block was presented last) would
not have been prone to this possible order effect. To assess this potential extraneous
influence, the current study presents listeners with the speech and non-speech (sine-
wave) tones in a mixed condition (speech and sine-wave trials presented randomly
within the same blocks of the task), in addition to the blocked presentation that was
used by Burnham and Jones (2002).
6.2.2 Categorical Perception: Identification and Discrimination
Burnham and Jones (2002) were only interested in identification of tones from a
continuum; however, categorical perception is best gauged by a combination of
identification and discrimination results. The current study will test for both
categorical identification and discrimination. This additional task requires that further
considerations be made in the design (choice of interstimulus interval) and that a
refined operationalisation of categorical perception be made. These issues are
addressed below.
6.2.2.1 Interstimulus Interval in Discrimination Tasks
In relation to discrimination tasks, it was shown in Chapter 2 (section 2.4.3.2) that the
interval between two sounds to be discriminated, the interstimulus interval (ISI), can
have an effect on categoricality of perception. Based on previous investigations on
different ISIs it was proposed in 2.4.3.2 that shorter intervals between stimuli to be
discriminated result in an acoustic mode of perception and thus less categorical
perception, whereas longer intervals engage a phonemic mode of perception and thus
more categorical processing (Werker & Logan, 1985; Werker & Tees, 1984). That is,
short ISIs favour acoustic (rather continuous) perception, whereas longer ISIs favour
phonemic (more categorical) perception. Therefore, two different ISIs (500 ms
and 1500 ms) were employed here in order to test for possible effects of stimulus
separation.
6.2.2.2 Refined Operationalisation of Categorical Perception of Tone
In a study of the categorical perception of tone, Wang (1976) found that Mandarin
and American English listeners employed different points to perceptually divide an
asymmetric continuum. The category boundary for American listeners was found to
be psychophysically based, located close to a level tone, whereas for Mandarin
listeners, the boundary was closer to the middle of the continuum, presumably due to
an influence of their language background.
Based on these results (W. Wang, 1976), two alternative response strategies were
deemed possible in the current experiment. These are schematically presented in
Figure 6.1 and Figure 6.2, and described below for the asymmetric continuum used
here in Experiment 1. This continuum consisted of tones with a consistent onset F0
(200 Hz) but varying offset F0s from 160 Hz (falling tone) to 220 Hz (rising tone) in
10 Hz steps (see Figure 6.3). This asymmetrical continuum was used because it was
different from the tonal systems of the three tonal languages investigated here.
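To make the stimulus description concrete, the sketch below shows how such a sine-wave continuum could be synthesised. Only the onset (200 Hz), the 160-220 Hz offsets, and the 10 Hz step are taken from the text; the linear F0 glide, 300 ms duration, and 16 kHz sample rate are assumptions for illustration, not the thesis's actual synthesis parameters.

```python
import math

def sine_tone(onset_hz, offset_hz, dur_s=0.3, sr=16000):
    """Synthesise one sine-wave tone whose F0 glides linearly from
    onset_hz to offset_hz (duration and sample rate are assumptions)."""
    n = int(dur_s * sr)
    samples, phase = [], 0.0
    for i in range(n):
        # Instantaneous F0 interpolated between onset and offset
        f = onset_hz + (offset_hz - onset_hz) * i / (n - 1)
        # Accumulate phase so the frequency glide is continuous
        phase += 2 * math.pi * f / sr
        samples.append(math.sin(phase))
    return samples

# The asymmetric continuum: fixed 200 Hz onset, offsets 160-220 Hz in 10 Hz steps
continuum = {offset: sine_tone(200, offset) for offset in range(160, 230, 10)}
```

This yields seven stimuli; the 200 Hz offset member is the flat no-contour tone, and the continuum is asymmetric because four offsets fall below the onset and only two above.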
Mid-Continuum Response Strategy: In the mid-continuum strategy, it might be
expected that the category identification boundary and discrimination peak for a
synthetic asymmetric continuum would be midway between the endpoints of the
continuum. If so then (i) the identification boundary should lie in the middle of the
continuum (190 Hz offset), and (ii) stimuli in the middle of the continuum should be
more discriminable than those at the ends of the continuum (see Figure 6.1).
Figure 6.1. Mid-Continuum Response Strategy for a synthetic continuum with onset = 200 Hz and offset = 160-220 Hz in 10 Hz steps (x-axis: offset value in Hz for the 200 Hz onset tone; y-axis: response accuracy in %; curves: identification and discrimination). The 190 Hz offset stimulus represents the middle of the continuum, whereas the 200 Hz offset stimulus is a flat no-contour tone.
Figure 6.2. Flat-Anchor Response Strategy for a synthetic continuum with onset = 200 Hz and offset = 160-220 Hz in 10 Hz steps (x-axis: offset value in Hz for the 200 Hz onset tone; y-axis: response accuracy in %; curves: identification and discrimination). The 190 Hz offset stimulus represents the middle of the continuum, whereas the 200 Hz offset stimulus is a flat no-contour tone.
Flat-Anchor Response Strategy: In the Flat-Anchor Response strategy, it might be
expected that the category boundary and discrimination peak for this synthetic
asymmetric continuum would be near the flat no-contour tone (200 Hz offset). If so
then (i) the identification boundary should lie on the flat 200 Hz stimulus, and (ii) the
stimuli around the 200 Hz stimulus should be more discriminable than those at the
ends of the continuum (see Figure 6.2).
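The two strategies make distinct quantitative predictions about where the 50% identification crossover falls: near the 190 Hz offset (mid-continuum) or near the flat 200 Hz offset (flat-anchor). A minimal sketch of how such a boundary can be located from identification data; the response proportions below are hypothetical, invented only to illustrate the computation.

```python
def boundary_50(offsets, proportions):
    """Locate the 50% crossover of an identification function by linear
    interpolation between the two stimuli straddling a proportion of 0.5."""
    pairs = list(zip(offsets, proportions))
    for (x1, p1), (x2, p2) in zip(pairs, pairs[1:]):
        if p1 < 0.5 <= p2:
            # Interpolate linearly between the straddling stimuli
            return x1 + (0.5 - p1) * (x2 - x1) / (p2 - p1)
    raise ValueError("identification function never crosses 50%")

# Hypothetical proportions of "rising" responses across the 160-220 Hz continuum
offsets = [160, 170, 180, 190, 200, 210, 220]
p_rising = [0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98]
boundary = boundary_50(offsets, p_rising)  # → 190.0
```

With these illustrative data the crossover falls at 190 Hz, the mid-continuum prediction; a crossover near 200 Hz would instead support the flat-anchor strategy.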
6.2.2.3 Non-Speech Stimulus Materials
One final methodological consideration concerns the choice of stimulus types. While
Burnham and Jones (2002) used filtered speech, sine-wave tones, and musical
equivalents as non-speech stimuli, here only sine-wave stimuli will be used to
represent non-speech. A reduction in non-speech types here is reasonable because
Burnham and Jones (2002) found that type of non-speech tone did not influence the
results. The sine-wave option is preferable to the violin or filtered speech types in that
it provides a more neutral context compared to the musical context suggested by the
violin, and is more normal sounding than filtered speech, which can sometimes seem
muffled and/or obscured.
6.3 Hypotheses
Hypotheses for language background, tone type, presentation type, and interstimulus
interval (ISI) were entertained. It was hypothesised that perception should be more
categorical (i) for tonal-language than for non-tonal-language speakers; (ii) for speech than for sine-wave continua; and (iii) within the speech stimuli, in blocked than in mixed presentations, because speech/non-speech juxtaposition might result in
interaction of processing modes (i.e., it was thought that speech and non-speech
processing are not entirely independent in tone perception). With regard to the ISI, a
final hypothesis was entertained, that: (iv) stimulus-pairs with a greater temporal
separation (1500 ms) will be discriminated more categorically than those with a
shorter ISI (500 ms).
6.4 Experimental Design
A 4 x 2 x 2 x (2) design was employed. The between-group factors were participants'
native language (Thai, Vietnamese, Mandarin, Australian English), presentation
manner (mixed or blocked), and interstimulus interval (500 ms or 1500 ms temporal separation between the two sounds in discrimination); the within-group factor was tone type (speech or sine-wave). The order of tasks (identification and
discrimination), the order of tone types in the blocked condition (speech and sine-
wave tones) as well as order of stimulus in discrimination (higher offset stimulus first
vs. lower offset stimulus first) were counterbalanced between participants and are not
considered in the analysis.
For the identification task measurements were taken for each stimulus on the
continuum; and for the discrimination task measurements were taken for each
contiguous pair of stimuli. Dependent variables in each task are considered in detail
in subsections 6.4.3.1 and 6.4.3.2.
6.4.1 Stimuli
Two synthetic continua were created, one speech and one sine-wave, each with identical
F0 contours. The tonal contours of the continua are shown in Figure 6.3. The tones
have a fixed onset of 200 Hz and an offset varying from 160 Hz (falling) to 220 Hz
(rising) in 10 Hz intervals. Thus each continuum is asymmetrical and consists of
seven steps. The stimulus with the 190 Hz offset marks the middle of the continuum,
and the 200 Hz offset stimulus is a flat no-contour stimulus. F0 movement in the
continuum is linear in order to avoid resemblance to the actual tones of the specific
languages whose speakers were tested. The syllable /wa/, recorded from a female
native Thai speaker, was used as the speech sound carrier because this combination of
sounds exists in most of the languages tested53. Variations in tone type matching, and
F0 were achieved by resynthesising speech and sine-wave tones to the same F0 and
duration (495 ms) specifications with the STRAIGHTv30kr16 software (Kawahara,
Katayose, de Cheveigne, & Patterson, 1999). Sine-wave resynthesis was conducted
with the MARCS Auditory Perceptual Toolbox (APT) (Stainsby, Haszard Morris,
Malloch, & Burnham, 2002).
53 The syllable /wa/ in Mandarin has different meanings; wa1 means frog, wa2 means baby or doll, wa3 means tile, and wa4 means socks or stockings. In Thai, wa3 (short vowel) is a question particle, waa0 (long vowel) is a Thai measurement for distance and area, and waa3 (long vowel) is the name of a hill tribe from Myanmar. The sound combination /wa/ is possible but does not have a meaning in Vietnamese.
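For illustration, the sine-wave versions of such a continuum can be generated directly, as sketched below. This is not the STRAIGHT/APT resynthesis procedure actually used in the thesis; the sample rate is an assumption, and the sketch simply shows how a linear F0 glide from the fixed 200 Hz onset to each offset can be realised as instantaneous phase (the integral of the frequency trajectory).

```python
import numpy as np

SR = 44100   # assumed sample rate; not specified in the thesis
DUR = 0.495  # stimulus duration from the thesis (495 ms)

def linear_glide(onset_hz, offset_hz, dur=DUR, sr=SR):
    """Sine-wave tone whose frequency moves linearly from onset_hz to
    offset_hz; the instantaneous phase is the (discrete) integral of
    the frequency trajectory."""
    n = int(dur * sr)
    t = np.arange(n) / sr
    freq = onset_hz + (offset_hz - onset_hz) * t / dur  # linear F0 movement
    phase = 2 * np.pi * np.cumsum(freq) / sr
    return np.sin(phase)

# Seven-step asymmetric continuum: fixed 200 Hz onset, offsets 160-220 Hz.
continuum = {offset: linear_glide(200, offset) for offset in range(160, 221, 10)}
```

The 190 Hz offset tone is the arithmetic midpoint of this set, while the 200 Hz offset tone is the only member with no F0 movement at all, which is what makes the mid vs. flat perceptual-anchor question meaningful.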
Figure 6.3. F0 characteristics of the asymmetrical tone continuum.
6.4.2 Participants
In all the experiments including this one, participants gave their informed consent (see
Appendix A6.1) and the experiment was covered by Ethics Approval of the
University of Western Sydney (HREC 01/163). A total of 64 participants were tested: 16 native Thai speakers (9 female, 7 male; average age: 23.6 years), 16 native Mandarin speakers (9 female, 7 male; average age: 25.6 years), 16 native Vietnamese speakers (5 female, 11 male; average age: 23.6 years), and 16 native Australian English speakers (11 female, 5 male; average age: 26 years). All were students at the
University of New South Wales, Sydney (mean age 24.7 years, range 18-31), who
were reimbursed for their travel expenses. A further three participants (one in the
Australian English group, the other two in the Mandarin listener group) began the
experiment, but failed to complete the task. Their data are not included in the analysis.
6.4.3 Procedure
Participants were tested individually in a single session, in a sound-attenuated testing
cubicle in the Department of Psychology at the University of New South Wales,
Sydney. Stimuli were presented on a laptop computer (Compaq Evo N1000c) over
headphones (KOSS UR20) at a self-adjustable listening level in the DMDX (Forster
& Forster, 2003) experimental environment.
6.4.3.1 Identification
In the identification task participants were provided with two labeled (RIGHT and
LEFT) keyboard keys and instructed to “press the RIGHT (LEFT) key for one kind of
sound and the LEFT (RIGHT) key for the other kind of sound”. Responses timed out
after 4000 ms, and timed out trials were not replaced. (An example DMDX script can
be found in Appendix A6.2)
For all listeners, there were two sets of trials, each consisting of a practice phase, a
training phase, and a test phase. For subjects in the Blocked condition, speech and
sine-wave stimuli were presented separately in the two trial sets of the experiment
(counterbalanced order of speech and sine-wave trial sets between listeners); and for
listeners in the Mixed condition, there were simply two sets of mixed (speech and sine-wave randomised) stimuli.
In the practice phase eight items were presented, four of each of the relevant endpoint
stimuli (the 160 Hz and the 220 Hz offset stimuli). Following the practice phase, in
the training phase, a criterion of 8 consecutive correct responses was required in each
trial set: for participants in the Blocked condition this entailed reaching criterion for
the speech set and for the sine-wave set; and for Mixed condition participants this
entailed reaching criterion for each set of mixed stimuli to equate with the procedure
in the Blocked condition and to ensure listeners remembered the task over time. Each
training phase consisted of the endpoints of the continuum and continued until a
criterion of eight consecutive correct responses was reached (criterion results will be
discussed in section 6.6.1.1).
Following criterion in the training phase the test phase was presented. The test phase
consisted of 8 repetitions of each continuum step presented in random order. Given
seven steps on each continuum, participants were required to identify a total of 112
items in the test phase. In total the identification task took 15 to 20 minutes,
depending on the individual participant's pace.
Data treatment:
For each listener, two crossover values (in Hz) and two d' values were computed, one for each tone type, speech and sine-wave. The crossover value is the point on the
continuum above and below which there is the same number of responses for each
category. The d' value measures the steepness of the slope at the crossover, and serves
to indicate the degree of categorical perception. In general, the steeper the
identification curves at crossover, the more categorical the perception. Crossover values were computed by running a logistic regression for each listener and taking -(constant/slope) as the 50% boundary (because log(0.5/0.5) = 0 = constant + slope × crossover, so crossover = -constant/slope). This value was then converted to Hz, within the 160 Hz to 220 Hz range of offset values of the stimuli on the tone continuum. From the identification
responses d', measuring the steepness of the identification at the category boundary,
was computed for the perceptual distance between the stimuli spanning the 50%
crossover. First the proportion of responses for one of the two categories for each
stimulus was converted to a z-score, then for each stimulus-pair, the z-score for the
smaller proportion was subtracted from the z-score from the larger proportion to
derive the d'. To avoid inflated d' estimates, response category proportions of 0 and 1
were converted to 0.005 and 0.995 (1/(2N) and 1 - 1/(2N); Macmillan & Creelman, 1991). For four of the 64 participants (one in the Mandarin and three in the Australian
group), it was not possible to identify a unique 50% crossover. For these four
participants the missing values were replaced by means of the respective groups,
separately for sine-wave and speech.
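The two identification measures described above can be sketched in code. This is a minimal illustration with hypothetical response proportions (the thesis's actual analyses were run in statistical software); the function names are mine, and the logistic fit here is a simple least-squares fit to the logits rather than a full maximum-likelihood logistic regression.

```python
import numpy as np
from statistics import NormalDist

Z = NormalDist().inv_cdf  # z-transformation (inverse normal CDF)

def crossover_hz(offsets, props):
    """50% crossover of an identification function, from a linear fit
    to the logits: logit(p) = constant + slope * offset, so the
    boundary lies where constant + slope * crossover = 0, i.e.
    crossover = -constant / slope."""
    p = np.clip(np.asarray(props, dtype=float), 0.005, 0.995)
    slope, constant = np.polyfit(offsets, np.log(p / (1 - p)), 1)
    return -constant / slope

def boundary_dprime(offsets, props):
    """Identification d' for the stimulus-pair spanning the 50%
    crossover: the difference of the z-transformed response
    proportions, with proportions of 0 and 1 replaced by 0.005 and
    0.995 (i.e., 1/(2N) and 1 - 1/(2N); Macmillan & Creelman, 1991)."""
    p = np.clip(np.asarray(props, dtype=float), 0.005, 0.995)
    for i in range(len(p) - 1):
        if (p[i] - 0.5) * (p[i + 1] - 0.5) <= 0:  # pair straddling 50%
            return abs(Z(float(p[i])) - Z(float(p[i + 1])))
    return None  # no unique crossover (replaced by group means in the thesis)

offsets = [160, 170, 180, 190, 200, 210, 220]
props = [1.0, 0.98, 0.90, 0.55, 0.10, 0.02, 0.0]  # hypothetical "falling" proportions
```

With these hypothetical proportions the crossover falls just above 190 Hz, and the boundary d' is taken from the 190-200 Hz pair, the pair whose proportions straddle 50%.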
6.4.3.2 Discrimination
In the discrimination task the participants listened to stimulus-pairs and were
instructed to “press the LEFT (RIGHT) key if they are the same sound, and press the
RIGHT (LEFT) key if they are different sounds”. (An example DMDX script can be
found in Appendix A6.3). Responses timed out after 1500 ms (because of the longer trial duration here in discrimination, the time-out duration was shorter than in the identification task). Omitted trials were not replaced. The task was separated into two
trial sets, punctuated by a break. As in the identification task, there were two manners
of presentation: Blocked and Mixed. In the Blocked presentation, participants listened
to a block of speech stimuli followed by the sine-wave block (or vice versa); in the
Mixed mode the stimuli were presented randomly (speech and sine-wave in the same
blocks). A roving AX paradigm was used, measuring discrimination accuracy along
the whole continuum. Neighbouring stimuli were presented pair-wise. Half of the
participants in each condition listened to stimulus-pairs that were separated by a 1500
ms interval, whereas the other half listened to sounds that were separated by a 500 ms
ISI.
Each trial set consisted of a block of eight practice trials and a block of 192 test trials.
The eight practice trials consisted of four different and four same pairs that were
presented with feedback for correct/incorrect responses. In test trials there were four
repetitions of each of the four possible combinations (AA, BB, AB, BA) of each of
the six possible stimulus-pairs for each of the two tone types (speech and sine-wave).
This summed to a total of 192 stimulus-pairs. This task took 20 to 25 minutes,
depending on the participant's pace and the ISI (around 20 min for 500 ms ISI,
around 24 min for 1500 ms ISI).
Data treatment:
Discrimination performance was measured by d', calculated according to the models
for discrimination tasks in Kaplan, Macmillan and Creelman (1978). The numbers of hits H (different stimuli correctly judged as different) and false alarms F (same stimuli incorrectly perceived as different) were calculated for each stimulus-pair for each tone type. The larger the difference between hits and false alarms, the better the listener's sensitivity to the differences in tone contours. The statistic d' is a measure of the difference between hits and false alarms. The number, however, is not simply H - F, but rather the
difference between the z-transformations54 of these two rates, where H and F are
forcibly limited to the 0.01 to 0.99 range, which means that a maximum d'-value of
8.715, and a minimum d' value of –8.715 would be obtained. This transformation was
conducted using DPrime Plus, a program available online (C. D. Creelman &
Macmillan, 1996).
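The sensitivity computation can be sketched as follows. This is a minimal illustration of one common differencing model for roving same-different designs (the listener responds "different" when the absolute difference between the two observations exceeds a criterion); the exact same-different tables implemented by DPrime Plus, and hence the ±8.715 bound quoted above, come from a related but not identical model, so values from this sketch will not match DPrime Plus exactly. The function name and example rates are mine.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()

def same_different_dprime(hits, false_alarms):
    """d' for a roving same-different (AX) task under a differencing
    model: respond "different" when |X2 - X1| > k, with same-pair
    differences ~ N(0, 2) and different-pair differences ~ N(+/-d', 2).
    H and F are limited to the 0.01-0.99 range, as in the thesis."""
    h = min(max(hits, 0.01), 0.99)
    f = min(max(false_alarms, 0.01), 0.99)
    # Criterion k follows from the false-alarm rate: F = 2 * Phi(-k / sqrt(2)).
    k = sqrt(2.0) * ND.inv_cdf(1.0 - f / 2.0)

    def hit_rate(d):  # predicted H for sensitivity d at criterion k
        return (1.0 - ND.cdf((k - d) / sqrt(2.0))
                + ND.cdf((-k - d) / sqrt(2.0)))

    if h <= f:          # at or below chance: no measurable sensitivity
        return 0.0
    lo, hi = 0.0, 20.0  # bisect for the d' that reproduces the observed H
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if hit_rate(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

As in the text, the further the hit rate exceeds the false-alarm rate, the larger the recovered d'; equal hit and false-alarm rates yield d' = 0.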
6.5 Analyses
6.5.1 Test Assumptions
For all following analyses, α was set at .05. Unless otherwise mentioned, test
assumptions were found to be satisfactory. Outliers and how they were dealt with will
be mentioned in the results sections. Post hoc comparisons were conducted using the
Bonferroni correction for multiple tests.
54 In the z-transformation H and F are converted into z-scores (standard-deviation units).
6.5.2 Language Group Hypotheses
As there were a priori reasons to expect differences in categorical perception between
tonal (Vietnamese, Thai, and Mandarin) and non-tonal (Australian English) native
speakers, the analyses of variance (ANOVAs) included a planned contrast to test this
difference (see Table 6.1 - Tonal vs. Non-Tonal).
Table 6.1.
Description of the Language Hypotheses:
The Tonal vs. Non-Tonal Test investigates differences between the tonal language
speakers and Australian English speakers; Residual A tests Thai against Vietnamese
and Mandarin; Residual B tests Vietnamese against Mandarin.
Contrast              Vietnamese  Mandarin  Thai  Australian English   df
Tonal vs. Non-Tonal        1          1       1          -3             1
Residual              omnibus test for differences between the
                      Vietnamese, Mandarin, and Thai listeners          2
There were no a priori grounds on which hypotheses about differences between the
three different tone language speaker groups could be based. Therefore the remaining
degrees of freedom (2) and variance for the language background levels (3) after the
above planned contrast, was tested via an omnibus F-test for the Mandarin, Thai, and
Vietnamese groups (see Table 6.1, Residual). Thus the 4-level language background
with 3 degrees of freedom was partitioned such that a df = 1 planned contrast for tone
vs. non-tone was tested, along with a df = 2 omnibus test for differences within the
tone language groups.
6.5.3 Strategy Type Hypotheses
Apart from differences between the language groups, differences between stimulus-
pairs, across the continua were also analysed, according to the planned contrasts
presented in Table 6.2.
Table 6.2.
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task
Stimulus-pair 160-170 170-180 180-190 190-200 200-210 210-220
Mid Hypothesis 1 -1 -1 2 2 -1 -1
Mid Hypothesis 2 0 0 1 -1 0 0
Flat Hypothesis 1 -1 -1 -1 2 2 -1
Flat Hypothesis 2 0 0 0 1 -1 0
Mid Hypothesis 1 tests whether the two stimulus-pairs in the middle of the continuum (180-190 and 190-200) are more discriminable than the other stimulus-pairs. Mid Hypothesis 2 investigates which of the two stimulus-pairs in the middle, if either, is better discriminated. Flat Hypothesis 1 examines whether there is a difference in discrimination performance between the two flat pairs (190-200 and 200-210) and the rest of the stimulus-pairs, and Flat Hypothesis 2 tests whether there are differences between the two flat stimulus-pairs.
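Applying a planned contrast of this kind reduces to a weighted sum of condition means, with weights that sum to zero; a positive value for Mid Hypothesis 1, for example, indicates better discrimination of the mid pairs. A small sketch (the d' values are hypothetical; the actual contrasts were tested within the ANOVA):

```python
def contrast_value(weights, means):
    """Value of a planned contrast: the weighted sum of condition means,
    with contrast weights that sum to zero."""
    assert sum(weights) == 0, "contrast weights must sum to zero"
    return sum(w * m for w, m in zip(weights, means))

# Weights from Table 6.2 applied to hypothetical mean d' values for the
# six stimulus-pairs (160-170, 170-180, 180-190, 190-200, 200-210, 210-220).
pair_dprimes = [2.0, 2.2, 3.3, 3.2, 2.4, 2.1]
mid_1 = contrast_value([-1, -1, 2, 2, -1, -1], pair_dprimes)   # Mid Hypothesis 1
flat_1 = contrast_value([-1, -1, -1, 2, 2, -1], pair_dprimes)  # Flat Hypothesis 1
```

The significance of such a contrast is then evaluated against its error term within the ANOVA, which the sketch does not attempt to reproduce.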
6.6 Results
Before considering the results, a qualitative evaluation of identification and
discrimination graphs will be presented in 6.6.1. The results of the identification task
are presented in section 6.6.2, and of the discrimination task in section 6.6.3.
6.6.1 Qualitative Evaluation
Identification and discrimination functions are shown for each language group for
speech and sine-waves in Figure 6.4. In identification, the location of the category
boundary is found below the middle (190 Hz offset) of the continuum for Mandarin
and Vietnamese listeners, but above the middle and towards the acoustically flat (200
Hz offset) stimulus for Australian English and Thai listeners. Thus it appears that
some tonal language speakers (Vietnamese and Mandarin) perceptually halve the
continuum, whereas the non-tonal English language speakers and the Thai speakers
use an acoustically salient point on the continuum (the flat 200 Hz stimulus) to divide
the continuum into stimuli above and below this reference point. It can be seen in
Figure 6.4 that the same perceptual anchors are used in discrimination –
discrimination peaks tend to occur near the sharpest identification slopes. This
correspondence indicates the same perceptual strategies were used in identification
and discrimination.
Figure 6.4. Identification (black lines) and discrimination (grey lines) results across languages (Vietnamese, Mandarin, English, and Thai; sine and speech conditions). X-axis: stimulus offset (in Hz) for the 200 Hz onset tone; identification responses in %; discrimination accuracy in d' (minimum: -8.715, maximum: 8.715); the mid and flat points are marked on each panel.
6.6.2 Identification Results
The identification task began with practice trials, followed by a training session to
reach criterion, and then a set of test trials. Results for the practice trials were not
analysed. The results of the training session are presented in section 6.6.2.1, followed
by results of the test trials (sections 6.6.2.2 and 6.6.2.3).
6.6.2.1 Trials to Criterion in Identification Training
In the identification training session, participants were required to identify eight
stimuli correctly in succession, in order to proceed to the main part of the experiment.
Raw data are presented in Appendix A6.4 and statistics outputs are presented in
Appendix A6.5. Descriptive statistics (means and standard error bars) for trials to
criterion are presented in Figure 6.5 for blocked conditions and Figure 6.6 for mixed
conditions. The number of trials required to reach criterion is analysed here. In the
blocked condition, trials to criterion for the two blocks of trials, speech and sine-wave
tones, were analysed irrespective of order of presentation (which was counterbalanced
across participants); in the mixed condition, this was not possible, so the trials to
criterion score for Set 1 (the first of the two sets completed) were compared to trials to
criterion for Set 2 in order to examine any possible practice or learning effects.
Blocked vs. Mixed Condition:
Before considering the blocked and mixed conditions separately, a comparison of trials to criterion in the mixed vs. the blocked conditions, collapsed over language groups, via an independent-samples t-test revealed that more trials to criterion were required in the mixed (M = 14.27, sd = 10.27) than in the blocked (M = 9.40, sd = 4.02) condition, t(126) = -3.526, p < .01.
Blocked condition:
Mean trials required to reach criterion in the blocked condition for the four language
groups (Thai, Vietnamese, Mandarin, and Australian English) and two tone types
(speech and sine-wave) are shown in Figure 6.5 (for raw data and statistics outputs see
Appendix A6.4 and A6.5). A 4 (language groups) x (2 stimulus types) analysis of
variance (ANOVA) with repeated measures on the second factor revealed no
significant difference between the speech and the sine-wave trials to criterion (F (1,
28) = .096, p > 0.05), and no significant effect for tone language background vs. non-
tone language background (F (1, 28) = 1.678, p > 0.05). There was however a
significant interaction of tone language groups and tone type (F (1, 28) = 4.443, p <
.05) and further post-hoc tests revealed that Thai listeners required more trials to reach
criterion in the sine-wave task, but the Vietnamese listeners required more trials in the
speech task.
Mixed condition:
Three outliers were found55 (z > 3.29) and changed to one unit larger than the next
extreme score, as suggested by Tabachnick and Fidell (2001). The descriptive statistics
for the mixed condition are shown in Figure 6.6. The analysis shows that there were
significantly more trials to criterion required in the first set (M = 19.59) than in the
second (M = 8.94) set (F (1, 28) = 27.337, p < 0.05). No significant differences were
found between tone and non-tone language speakers (F (1,28) =1.62, p > .05) or
between speakers of the different tonal languages (F (2, 28) = .055, p > .05), nor were
there any interactions.
Figure 6.5. Descriptive statistics for trials to criterion scores across languages (sine vs. speech) in the blocked condition (error bars represent the standard errors of the mean).
55 There was one outlier in the Thai group (66 trials for sine-wave criterion in the mixed condition), one in the Vietnamese group (84 trials in the mixed condition – sine-wave), and one outlier in the Mandarin group (also mixed condition sine-wave – 56 trials).
Figure 6.6. Descriptive statistics for trials to criterion scores across languages in the two parts of the mixed condition.
Together these results show that under most circumstances, participants learnt this
identification task quite easily in around nine trials, just one more than the minimum
number of eight. The only condition in which this was elevated (M = 19.59) was in the
first set of mixed trials (see Figure 6.6). This indicates that speech and sine-wave
stimuli were not treated as equivalent by the participants, for responding to one type
interfered with responding to the other type. Nevertheless, they were not treated
completely independently because by the second set of mixed trials, the mean trials to
criterion was around that for the blocked trials, indicating that mixed condition
participants could learn to ignore any sine-wave/speech similarities or differences in
this task that may have inhibited their performance.
6.6.2.2 Identification Test Trials: Crossover Values
Crossover values in Hz were calculated for each listener for each tone type.
Descriptive statistics for crossover values are shown in Table 6.3 and also Figure 6.4.
Raw data and analyses are presented in Appendix A6.6 and 6.7.
A 4 x 2 x (2) language x presentation mode x stimulus type ANOVA with repeated
measures on the last factor was conducted.
For tonal (Mandarin, Vietnamese, and Thai listeners) vs. non-tonal language speakers
(Australian English listeners) the difference was found to be significant, (F (1, 56) =
4.721, p < .05), with generally higher crossovers for non-tone language speakers.
Within the tone language speakers, the omnibus F (difference between groups) was
significant (F (2, 56) = 12.364, p < 0.05). Given the nature of the results (see Table
6.3) post-hoc comparisons were also conducted. These showed that the Thai listeners had higher crossovers than the Mandarin listeners (p < .001) and the Vietnamese listeners (p < .01), that Australian English listeners had higher crossovers than the Mandarin listeners (p < .001), and that there was no difference between Mandarin and Vietnamese listeners (p > .05).
There also was a significant main effect for stimulus type (F (1, 56) = 13.788, p <
.05), with the crossover being higher on the continuum for the sine-wave than for the
speech sounds (Mspeech = 188.5, sd = 8.28; Msine-wave = 192.1, sd = 7.90). The main
effect for presentation mode (mixed/blocked) was also significant (F (1, 56) = 5.264,
p < .05), with the blocked crossovers being higher than the mixed crossovers.
Other than these main effects, there were no two-way interactions between the three
factors, language background, presentation mode, and stimulus type, nor was the
three-way interaction significant.
Table 6.3
Descriptive Statistics for Crossover Values (measured in Hz) across Tone Types and
Languages (n = 16 in each group).
Group M (SD) blocked M (SD) mixed
speech sine-wave speech sine-wave
Thai 196.1 (6.01) 196.3 (7.21) 189.1 (10.27) 198.0 (8.97)
Vietnamese 190.9 (7.26) 192.0 (2.07) 184.3 (5.31) 187.0 (9.56)
Mandarin 183.7 (6.01) 188.3 (7.25) 179.8 (7.92) 187.3 (7.19)
Australian 192.9 (4.45) 195.6 (6.78) 190.9 (6.29) 192.8 (6.02)
Means 190.9 (5.93) 193.1 (5.83) 186.0 (7.70) 191.3 (7.94)
6.6.2.3 Identification d' Results
Mean identification slope, measured by d' at the crossover location, was calculated for each listener. Descriptive statistics for these d' values are shown in Figures 6.7 and 6.8. Raw data and analyses are presented in Appendix A6.8 and A6.9. A 4 x 2 x (2) language x presentation mode x stimulus type ANOVA with repeated measures on the last factor was conducted.
Identification slopes between tonal language speakers (Mandarin, Vietnamese, and
Thai listeners) and non-tonal language speakers (Australian English listeners) did not
differ significantly, F (1, 56) = .011, p > .05. Within tonal language groups there was
a significant omnibus effect (F (2, 56) = 5.654, p < .05). Post-hoc comparisons showed significantly steeper identification slopes for the Thai than for the Vietnamese listeners (p < .005), and for the Mandarin than for the Vietnamese listeners (p < .005). Differences between Thai and Mandarin listeners were not significant (p >
.05). The other two main effects, presentation mode and stimulus type, were not
significant (Ftone type (1, 56) = .490, p > .05; Fpresentation manner (1, 56) = 3.748, p > .05),
and none of the two-way or three-way interactions were significant. Thus the effects
here were for language background, and these results were consistent over
mixed/blocked, and speech/sine-wave conditions.
Figure 6.7. Mean d' identification scores across languages for speech and sine-wave stimuli in the blocked condition.
Figure 6.8. Mean d' identification scores across languages for speech and sine-wave stimuli in the mixed condition.
6.6.3 Discrimination Results
In the discrimination task there were practice trials followed by sets of test trials.
Practice trials were not analysed. For the test trials d' was the dependent variable.
Overall differences in discrimination are presented in 6.6.3.1 and analyses of peaks of
discrimination in 6.6.3.2.
6.6.3.1 Overall Discrimination Differences
Mean discrimination accuracy was calculated for each listener for each stimulus-pair.
Raw data and Analyses are presented in Appendix A6.10 and A6.11. A 4 x 2 x 2 x (2)
language x presentation mode x ISI x stimulus type ANOVA with repeated measures
on the last factor was conducted using the mixture of planned contrasts and omnibus F
described in section 6.5.
Discrimination accuracy for tonal language (Mandarin, Vietnamese, and Thai)
listeners was found to be significantly lower than for non-tonal language (Australian
English) listeners (Mtone = 2.425, sd = 3.434; Mnon-tone = 3.427, sd = 3.368; F (1, 48) =
4.167, p < .05) and the omnibus F for differences within tone languages was not
significant (F (2, 48) = 2.244, p > .05). None of the other main effects (presentation
manner, ISI, or stimulus type) were significant nor were any interactions.
6.6.3.2 Peak Discrimination Analysis
When categorical perception is observed the identification slopes are steep in the
region on the continuum where perception changes from one category to the other –
the category boundary. So far we have looked at the location of the boundary (see section 6.6.2.2) and the steepness of the identification function at the boundary (section 6.6.2.3). In categorical perception experiments, not only identification is investigated. In order
to show that stimulus continua are perceived in a categorical manner, discrimination
also needs to be assessed. When discrimination is poor (around chance level) in those
areas of the continuum where identification is most consistent (between the endpoints
of the continuum and the category boundary area), it is concluded that stimuli are
indiscriminable (for example, in the case of voicing continua; Repp, 1984). Another
feature of categorical discrimination is the existence of a discrimination peak at the
category boundary. This peak shows that stimuli that are identified inconsistently
(sometimes identified as category A, sometimes identified as category B), are
discriminated well.
In the current experiment the focus is on differences in categorical perception (on all measures: identification d', crossover location, overall discrimination performance, and discrimination peak location) between speakers of different languages. Examining the location of discrimination peaks is interesting as it may reveal further information about the perceptual strategies that were observed in identification.
further information about perceptual strategies that were observed in identification.
Discrimination values over the continuum are shown in Figure 6.9 and 6.10 for the
four language groups for sine-wave and speech tones. Raw discrimination data are
presented in Appendix A6.10 and A6.11.
Here, as set out in 6.5, planned comparisons were constructed to test the mid-
hypothesis and the flat-hypothesis. These are considered in turn.
Mid-Hypothesis:
Overall, the stimuli in the middle of the continuum (stimulus-pairs 180-190 Hz and 190-200 Hz) were discriminated significantly more accurately than the other stimulus-pairs (F (1, 48) = 38.735, p < .05; Mmid = 3.280, sd = 3.404; Mother = 2.372, sd = 3.415), but there was no significant difference between these two (F (1, 48) = 0.137, p > .05).
Figure 6.9. Mean d' discrimination scores for the stimuli around the middle of the continuum (the 180-190 Hz and 190-200 Hz pairs) vs. the other stimulus-pairs on the continuum, for sine and speech stimuli in each language group (Thai, Vietnamese, Mandarin, English).
None of the interactions between the mid-hypothesis and other factors (presentation
mode, ISI, and language background, or their interactions) were found to be
significant. Thus, it can be said that discrimination was best in the centre of the
continuum (180-200 Hz) irrespective of language background or stimulus conditions.
Flat-Hypothesis:
It was also revealed that those stimulus-pairs situated around the flat stimulus (190-200 Hz and 200-210 Hz) were discriminated more easily than the other stimulus-pairs on the continuum (F (1, 48) = 22.181, p < .05; Mflat = 3.207, sd = 3.427; Mother = 2.409, sd = 3.412), and that there was no significant difference between the two stimulus-pairs around the flat stimulus (F (1, 48) = 1.343, p > .05).
Figure 6.10. Mean d' discrimination scores for the stimuli around the flat stimulus (the 190-200 and 200-210 pairs) vs. the other stimulus-pairs on the continuum, for sine and speech stimuli in each language group (Thai, Vietnamese, Mandarin, English).
There was also an interaction between this effect and the different tonal-language-speaking groups: there was a significantly greater difference between the flat pairs and the other pairs in the Thai listener group than in the Mandarin and Vietnamese groups (F (2, 48) = 5.543, p < .05). Thus it can be said that the flat hypothesis was upheld for the Thai but not the Vietnamese or Mandarin speakers. Due to the nature of the observed results, post-hoc tests with just the English speakers were also conducted, and these revealed that the flat-hypothesis was upheld there as well.
6.7 Discussion
The results of this experiment will be discussed considering important aspects of tone
perception, starting with differences between speech and non-speech tone perception,
perceptual strategies, and categoricality issues, leading to further analyses and future
directions.
6.7.1 Independence of Speech and Non-Speech Processing
In the current experiment, unlike in Burnham and Jones (2002), no significant
differences between speech and sine-wave processing were found in terms of
identification or discrimination accuracy. While this is contrary to earlier results by
Burnham and Jones (2002), one could perhaps argue that in terms of F0, the difference
between speech and sine-wave non-speech is not that large. Although in the speech version F0 is carried by the fundamental frequency and a number of higher harmonics, whereas in the sine-wave version it is carried only by the frequency of the sine wave, the pitch contour is clear in both versions. This is quite different from, say, non-speech
analogues of VOT or place of articulation continua. According to this view, the lack
of difference in terms of categoricality is not surprising.
It was, however, found that significantly more trials were required to reach criterion in the first, but not the second, set of trials in the mixed condition. This means that speech and sine-wave tone processing are not completely independent: the elevated criterion score in the first set indicates interference between the two stimulus types. The lower criterion scores in the second set show that this interference can nevertheless be overcome.
6.7.2 Perceptual Strategies in Identification and Discrimination
In terms of crossover, the distinction between tonal and non-tonal language speakers
is not clear. The Mandarin and the Vietnamese listeners seem to perceptually halve
the continuum; they exhibit an identification boundary at the 190 Hz offset midpoint
of the continuum, and discrimination peaks at 190 Hz. This perceptual halving of the
asymmetric continuum would appear to be a linguistic approach - the listeners create a
tonal space for the particular task at hand (the synthetic continuum in this case) and
place the tones within this tonal framework – half of the stimuli in one category (those
above the mid-point), and the other half in another category (those below the mid-
point).
On the other hand, the Thai listeners and the Australian listeners divide the continuum
into above and below the flat no-contour stimulus (200 Hz offset). Consideration of Figure 6.4 shows that their crossovers were closer to the flat stimulus. This would appear to be a more acoustic approach, as the flat tone is an acoustically
salient tone with a distinctly different contour from the other tones. In this way, two
categories may be set up – non-flat rising tones, and non-flat falling tones. The
difference between the two strategies is that those listeners who use the flat tone as a
perceptual anchor appear to perceptually distinguish between rising and falling tones,
whereas those listeners who use the middle of the continuum as an anchor seem to
build a tonal space to use for this task, within which the middle of the continuum
plays a greater role than the flat no-contour stimulus.
As well as looking at the crossover results in terms of the middle and the flat tone,
discrimination results were also analysed according to these two locations on the
continuum. Overall, the mid and the flat stimulus-pairs were discriminated more
accurately than the other stimulus-pairs on the continuum, with the Thai listeners
showing significantly greater differences in discrimination accuracy between the flat-
pairs and the other pairs than the Mandarin and Vietnamese listeners. This means that
the Thai listeners' perceptual anchor lies again closer to the flat stimulus, whereas for
Vietnamese and Mandarin listeners, the perceptual anchor is more around the middle
of the continuum. This confirms what was found in the identification crossover results
– Thai listeners, but not Vietnamese and Mandarin listeners use the no-contour tone as
a perceptual anchor, here shown by increased discriminatory ability around that point
on the continuum.
It is interesting that the Thai listeners do not seem to use the same perceptual strategy
as the other (Mandarin and Vietnamese) tonal language listeners. While the results of
the Thai listeners seem very similar to those of the Australian English listeners, this
does not necessarily lead to the conclusion that both use an acoustic approach to tone
processing. Rather it might indicate that the Thai listeners employ a different
linguistic strategy in tone perception than the Vietnamese and Mandarin listeners. To
investigate possible similarities and differences between Thai and Australian English
listeners in terms of categorical tone perception, Experiment 2 will test just those two
language groups.
The language abilities of the participants were not tested for the purposes of this perceptual experiment, so it is possible that the Thais were more experienced with English than the Vietnamese. This is unlikely, however, because all of the Thai and Vietnamese participants were students at the University of New South Wales in Sydney, where university-level English proficiency is expected.
Another difference within tonal languages was observed in the identification d'
results. Identification d' gives information about the steepness of the identification
function at the point where the crossover is located. Higher d' values indicate steeper
identification functions, and this in turn means more consistent, or 'categorical', identification behaviour. Vietnamese listeners show significantly less categorical
perception than Thai and Mandarin listeners.
One of the reasons why Vietnamese listeners show less categorical perception could
be that there are distinct durational differences in the real Vietnamese tones (see
section 3.2.2). Duration, in addition to F0, is an important feature of lexical tone in Vietnamese: tone 5 is the longest (around 400 ms), followed in order of decreasing duration by tones 2, 3, and 4, with tone 6 (160 ms) the shortest (L. Thompson, 1987). As the duration of all the tones presented in this experiment was
equated at 495 ms, the Vietnamese listeners may have had greater difficulty in
categorical identification, due to lack of a durational cue. It should be noted that
Mandarin and Thai also have durational differences that are used to distinguish tones;
however, durational differences between the different Vietnamese tones are greater
than those between different Mandarin (300 ms vs. 170 ms) or Thai tones (570 ms vs.
450 ms) (Blicher, Diehl, & Cohen, 1988; Jongman, Wang, Moore, & Sereno, 2006;
Vu, Nguyen, Luong, & Hosom, 2005; Yip, 2002).
Apart from this distinction, there is another important feature of Vietnamese tones that
is missing in the artificial continuum that was used in the current study: voice register.
While different voice registers are present in real Vietnamese tones (Vu et al., 2005;
Yip, 2002), these cues were not engaged here, which might be another reason why the
Vietnamese listeners did not respond in a very categorical way to these stimuli.
Given that register and duration features of Vietnamese tone play a significant role in
Vietnamese tone identification, it would be interesting to see whether Vietnamese
listeners would respond differently to a tone continuum in which duration and/or
voice register are manipulated.
Thus, it appears that tone perception strategies are not universal and each tonal
language has to be considered separately.
6.7.3 Categoricality Issues
The results of this experiment show that, consistent with previous studies (Stagray & Downs, 1993), discrimination is better overall for non-tonal than for tonal language speaking participants (but see section 6.8). The reason for the superiority of the
the non-tonal language listeners over tonal language listeners could be the fact that the
Australian participants treat the task as an acoustic one, whereas the tonal language
speakers associate the tones with tones from their own tonal system which influences
their perception of a novel tonal space. Moreover, tonal language listeners are used to perceiving very similar tones as one tonal category, whereas non-tonal listeners are not required to categorise tones in linguistic contexts and are thus better able to perceive subtle tonal differences. However, this can only explain the differences between speakers of two of the tonal languages (Mandarin and Vietnamese) and the non-tonal Australian English speakers; Thai listeners' perception does not appear to be influenced in that way. It is not clear why Thai listeners' perception is so similar to Australian listeners', but it leads to the conclusion that tone perception is not only shaped by tonal language experience in general, but is rather a result of the specific tonal system that the listener uses.
With regard to presentation manner, crossovers, identification accuracy, and discrimination results were analysed. The results show that the crossover varied depending on presentation manner (higher crossovers, towards the flat tone, in the blocked condition) and tone type (higher crossovers for sine-wave tones). This indicates that listeners' perception is influenced by both the context and the manner in which a stimulus is presented. The crossover being closer to the flat 200 Hz offset stimulus for sine-wave tones suggests a rather acoustic approach to sine-wave tone perception. Similarly, the fact that stimuli presented blocked show crossovers closer to the flat tone than those presented mixed is also an indicator of a more psychoacoustic perceptual strategy in the blocked condition. The reason for using this type of perceptual strategy in the blocked, but not in the mixed, condition may be that listeners are better able to ignore tone-irrelevant features, such as stimulus type, and concentrate solely on the tone contour.
Turning to tone type, although there were higher crossovers for sine-wave than for speech tones, there were no significant differences in categoricality in either discrimination or identification. This is surprising, as the previous study by Burnham and Jones (2002) did observe significant differences between tone types. However, this could be due, in part, to the fact that the Burnham and Jones experiment used more non-speech stimuli (75% of the stimuli presented were non-speech tones), which could have led to a greater perceptual separation between speech and non-speech tones in their experiment. Here, half of the stimuli were speech tones and the other half sine-wave equivalents, and this equal distribution might have made listeners more robust against differences in stimulus type, leading them to treat the two types perceptually similarly.
It is also worth pointing out that no difference was observed between the two inter-
stimulus intervals (1500 and 500 ms separation in discrimination). This is interesting
because previous studies have found more categorical discrimination for stimuli
separated by a longer than by a shorter interval (see section 2.4.3.2). A reason for this
lack of difference between the two ISIs could be that the 1500 ms separation was too
long to lead to a perceptual advantage.
6.8 Further Analysis and Future Directions
It was shown that tonal language background, tone type, and presentation type all
influence listeners‟ perception of tone continua.
There are some reported connections between tone perception and musical ability
(see Chapter 4). An ad hoc observation made here concerns a comparison between
results of musically trained listeners and non-musicians. Inspection of d' values for
identification and discrimination (see Table 6.4) indicated that musicians show
slightly steeper identification functions and much more accurate discrimination than
non-musicians.
Table 6.4.
Mean d’ values for Identification and Discrimination in Musicians and Non-
Musicians
Musicians Nonmusicians
N d' ID (SD) d' disc (SD) N d' ID (SD) d' disc (SD)
Thai 5 1.93 (0.45) 3.94 (1.46) 11 1.70 (0.41) 2.75 (1.28)
Vietnamese 4 0.96 (0.42) 3.80 (1.12) 12 1.33 (0.44) 1.22 (1.27)
Mandarin 3 1.74 (0.84) 2.82 (1.21) 13 1.79 (0.37) 2.13 (2.06)
Australian English 8 1.81 (0.49) 4.15 (2.04) 8 1.31 (0.57) 2.72 (1.48)
Mean (SD) 20 1.66 (0.60) 3.82 (1.59) 44 1.55 (0.47) 2.24 (1.58)
This ad hoc comparison (which could not be analysed statistically, because the participant numbers were far from equated) raises the question of whether musically trained participants are better at tone identification than non-musicians. (The definition of musicianship was based on a questionnaire in which participants were required to indicate whether they played a musical instrument and for how long they had played it.)
This is of interest as a possible factor in tone perception, which may rival tonal
language background as a determinant of tone perception ability. Indeed these
differences were more apparent in Australian and Thai listeners than Mandarin and
Vietnamese listeners and may in fact explain some of the differences in results that
were found here between the two sets of languages. In order to investigate whether
there are any perceptual differences depending on musical background, musically
trained listeners from both tonal (Thai) and non-tonal (Australian English) language
background are tested in the following experiment.
CHAPTER 7
Perception of Speech and Sine-Wave Tones: The Role of
Language Background and Musical Training
The current experiment concerns categorical perception of tone in Thai and Australian
English speaking musicians and non-musicians. First, in the introduction the results of
the previous study are summarised and methodological issues are considered. The
method and results of the current experiment are subsequently presented.
7.1 Background: Results of Experiment 1
Two results from Experiment 1 bear on this experiment. First, in Experiment 1, it was
found that tone perception depends not only on whether the native language of the
listener is a tonal or a non-tonal language, but also on the particular tonal language
that is spoken. On the basis of strategy differences found in Experiment 1 for speakers
of different tonal and non-tonal languages, in Experiment 2 differences in categorical
tone perception are investigated for two differently shaped asymmetric continua,
which should accentuate any difference between the Mid-Continuum, and Flat-
Anchor strategies. Only speakers from one tonal language (Thai) and one non-tonal
language (Australian English) are tested in the current experiment. While the issue of tone language differences is of interest, there were many aspects of the previous study's results to consider, and the most important of these were the differences between speech and non-speech perception and the influence of musical background on tone perception. Second, based on differences found between
musically trained listeners and non-musicians observed in an ad hoc analysis in
Experiment 1, both language groups in this study consist of musicians, and non-
musicians. As in Experiment 1 both identification and discrimination tasks, and
speech and sine-wave continua were included. Stimuli are again presented in separate
blocks or mixed within blocks.
7.2 Methodological Issues
Methodological issues concerning the construction of the stimuli for this experiment,
specifically continuum construction and step size, are considered below with due
reference to the results of Experiment 1.
7.2.1 Rising and Falling Continua
In Experiment 1, perception of tones from an asymmetric continuum that had more
falling than rising tones was investigated. The current experiment will again use
asymmetric tone continua in order to test for perceptual effects both around the medial
stimulus of the continuum and also around the non-dynamic flat F0 stimulus.
However, unlike Experiment 1, here asymmetry will be observed in both directions:
there will be a continuum with more falling than rising tones (and a slightly falling
central stimulus), and a continuum with more rising than falling tones (and a slightly
rising central stimulus). This is necessary in order to ensure that any effects that are
observed for the central stimulus are indeed due to its medial position rather than to its
being slightly falling or slightly rising. Another difference between the continua of
this and the last experiment is that the physical separation between the centre of the
continuum and the flat stimulus is greater here (in Experiment 1, the stimulus that
constituted the middle of the continuum was only one step away from the flat
stimulus). Based on the results of the previous experiment, two alternative response
strategies were deemed possible in the current study. These are schematically
presented in Figure 7.1 to Figure 7.4, and described below for the asymmetric
continua used here in Experiment 2. The rising continuum consisted of tones with a
consistent onset F0 (220 Hz) but varying F0 offsets from 205 Hz (falling tone) to 257.5
Hz (rising tone) in 7.5 Hz steps; in the falling continuum, the onset was 220 Hz, and
the offsets ranged from 182.5 Hz to 235 Hz in 7.5 Hz steps (see Figure 7.5).
Mid-Continuum Response Strategy: In the mid-continuum response strategy, it might
be expected that the category identification boundary and discrimination peak for the
synthetic asymmetric continuum would be midway between the endpoints of the
continuum. If so then (i) the identification boundary should be located at 208.75 Hz
offset for the falling continuum and at 231.25 Hz for the rising continuum, and (ii)
stimuli in the middle of the continuum should be more discriminable than surrounding
stimuli, especially those at the ends of the continuum (see Figures 7.1 and 7.2).
[Figure 7.1: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 205 to 257.5 Hz.]
Figure 7.1. Mid-Continuum Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 205 – 257.5 Hz in 7.5 Hz steps. The 231.25 Hz offset rising tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone. The dotted line represents the discrimination function, the solid line the identification function.
[Figure 7.2: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 182.5 to 235 Hz.]
Figure 7.2. Mid-Continuum Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 182.5 – 235 Hz in 7.5 Hz steps. The 208.75 Hz offset falling tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone. The dotted line represents the discrimination function, the solid line the identification function.
Flat-Anchor Response Strategy: In the Flat-Anchor Response strategy, it might be expected that the category boundary and discrimination peak for this synthetic asymmetric continuum would be near the flat no-contour tone (220 Hz onset, 220 Hz offset). If so then (i) the identification boundary should lie on the flat 220 Hz stimulus, and (ii) the stimuli around the 220 Hz offset stimulus should be more discriminable than surrounding stimuli, especially those at the ends of the continuum (see Figures 7.3 and 7.4).
[Figure 7.3: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 182.5 to 235 Hz.]
Figure 7.3. Flat-Anchor Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 182.5 – 235 Hz in 7.5 Hz steps. The 208.75 Hz offset falling tone represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone.
[Figure 7.4: schematic plot of the identification function (solid line) and discrimination function (dotted line) against offset value (in Hz) for the 220 Hz onset tone; y-axis: identification accuracy for the rising tone (% correct); x-axis: 205 to 257.5 Hz.]
Figure 7.4. Flat-Anchor Response Strategy for a synthetic continuum with onset = 220 Hz, and offset = 205 – 257.5 Hz in 7.5 Hz steps. The 231.25 Hz offset rising tone stimulus represents the middle of the continuum, whereas the 220 Hz offset stimulus is a flat no-contour tone.
7.2.2 Step Size
Another difference between this and the previous study is the step size on the
continuum. In Experiment 1, the physical separation between the stimulus offsets was
10 Hz, with onset fixed at 200 Hz, and the offset ranging from 160 Hz to 220 Hz.
Based on the rather high discrimination performance observed in Experiment 1 (better
than chance at most points of the continuum for most listener groups), the step size in
this study has been reduced from 10 Hz to 7.5 Hz (see Figure 7.5).
7.3 Hypotheses
7.3.1 Categoricality Differences
In the ad hoc analyses in Experiment 1 it appeared that musicians had steeper
identification curves than non-musicians. Based on these results it is expected that
musicians here, irrespective of language background, will show more categorical
identification than non-musicians. Musicians also appeared to show higher
discrimination performance in the previous study than non-musicians, and it is thus
expected that they will also be better at discriminating tones than non-musicians.
Separate (2 x 2) groups of tonal and non-tonal language speaking musicians and non-
musicians will be tested to investigate whether musical experience shapes categorical
perception in a similar manner to that of tonal language experience.
7.3.2 Continuum Shape and Language Background
The shape of the continua (more falling vs. more rising) is expected to influence
performance specifically in the Thai listener group. Thai has more falling than rising
tones (see section 3.2.1) and it is thus hypothesised that, if Thai listeners' perception
is influenced by their own tonal space, they should show more accurate identification
and discrimination in the falling than in the rising continuum. No such differences are
expected between the falling and the rising continua for Australian English listeners.
It is expected that both language groups will show a perceptual anchor (identification
boundary and discrimination peak) near the flat 220 Hz onset 220 Hz offset stimulus,
as was found in Experiment 1.
7.3.3 Processing Differences
In the first study, all participants were tested in Australia and interacted with an
English-speaking researcher. In the current experiment, Australian English
participants are again tested in Australia by an English-speaking researcher, but Thai
listeners are tested in Thailand and by a Thai-speaking researcher, a more appropriate
linguistic environment for a tone perception experiment with tonal language speakers.
It is hypothesised that this native language environment will result in more language-
specific processing in the tonal language speakers and therefore more categorical
perception of tones than was observed in Experiment 1 (although this will not be
tested formally here).
7.4 Experimental Design
A 2 x 2 x 2 x 2 x (2) design was employed. The between-group factors were listeners' native language (Thai, Australian English), musical background (musicians, non-musicians), presentation manner (mixed, blocked), and continuum shape (more falling, more rising). The within-group factor was tone type (speech, sine-wave). The
order of tasks (identification and discrimination), the order of tone types (speech or
sine-wave), as well as the order of stimuli in discrimination pairs (higher offset
stimulus first vs. higher offset stimulus second) were counterbalanced and are not
considered in the analysis.
For the identification task, measurements were taken for each stimulus on the
continuum; and for the discrimination task measurements were taken for each
contiguous pair of stimuli. Task-dependent variables in each task are considered in detail in sections 7.4.3.1 and 7.4.3.2.
7.4.1 Stimuli
Four synthetic tone continua were created: two types, a speech continuum and a sine-wave continuum with identical F0 contours, for each of the two continuum shapes, one with more falling (5 items) than rising (2 items) stimuli, and one with more rising (5 items) than falling (2 items) stimuli. All tones have a fixed onset of 220 Hz
and offsets varying from 182.5 to 235 Hz in the falling continua; and from 205 to
257.5 Hz in the rising continua (step size: 7.5 Hz). Thus each continuum shape is
asymmetrical and consists of eight steps. F0 movement in all tones is linear. The tonal
contours of the continua are shown schematically in Figure 7.5. Tone types, speech on
the syllable /wa/ and sine-wave were created in the same manner as in Experiment 1
(see section 6.4.1).
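As a concrete illustration, the offset values of the two continua described above can be generated as follows. This is a sketch only; the function and variable names are mine and are not part of the experimental software.

```python
# Illustrative sketch (not from the thesis): generating the F0 offset values
# for the two asymmetric continua. Each continuum has 8 steps of 7.5 Hz.
def make_offsets(start_hz, n_steps=8, step_hz=7.5):
    """Return n_steps offset values beginning at start_hz in step_hz increments."""
    return [start_hz + i * step_hz for i in range(n_steps)]

falling_continuum = make_offsets(182.5)  # offsets 182.5 ... 235 Hz
rising_continuum = make_offsets(205.0)   # offsets 205 ... 257.5 Hz

ONSET_HZ = 220.0  # fixed onset for every tone
# The flat no-contour tone is the stimulus whose offset equals the onset:
flat_index_falling = falling_continuum.index(ONSET_HZ)  # 6th step (index 5)
flat_index_rising = rising_continuum.index(ONSET_HZ)    # 3rd step (index 2)
```

Note that the flat stimulus sits asymmetrically in each continuum, which is what leaves five falling and two rising tones (or the reverse) on either side of it.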
Figure 7.5. F0 characteristics of the two asymmetric tone continua in Experiment 2.
7.4.2 Participants
A total of 64 adults with normal hearing were tested. Of these, 32 were Thai native
speakers and 32 were Australian English native speakers. Each language group
consisted of 16 musicians and 16 non-musicians. Musicians were defined as having
more than five years of continuous formal musical education; non-musicians were
participants with no more than two years of musical education. The Thai participants
were students at Chulalongkorn University, Bangkok, and Assumption University,
Bangkok (26 female, 6 male; mean age 22 years, range 19-27 years), and were
reimbursed for their travel expenses. The musical and non-musical Australian English
participants (23 female, 9 male; mean age 21 years, range 18-30 years) were students
at the University of Western Sydney and received course credit for their participation.
7.4.3 Procedure
Participants were tested individually in a single session, in sound-attenuated testing
cubicles at Chulalongkorn University in Bangkok, Thailand for the Thai listeners, and
at MARCS Auditory Laboratories at the University of Western Sydney for the
Australian English listeners. Participants gave their informed consent (see Appendix
A7.1 for Australian participants and Appendix A7.2 for Thai participants) and the
experiment was covered by University of Western Sydney Ethics approval (HREC
05/64). Stimuli were presented on a laptop computer (Compaq Evo N1000c) over
headphones (KOSS UR20) at a self-adjustable listening level in the DMDX (Forster
& Forster, 2003) experimental environment.
7.4.3.1 Identification
As in Experiment 1, participants were provided with two labelled (RIGHT and LEFT)
keyboard keys and instructed to “press the RIGHT (LEFT) key for one kind of sound
and the LEFT (RIGHT) key for the other kind of sound”. Responses timed out after
4000 ms, and timed out trials were not replaced. For all listeners there were two sets
of trials, each consisting of a practice phase, a training phase, and a test phase. For
subjects in the Blocked condition, speech and sine-wave stimuli were presented
separately in the two trial sets of the experiment (counterbalanced order of speech and
sine-wave between listeners); and for listeners in the Mixed condition, there were
simply two sets of mixed (speech and sine-wave randomised) stimuli. The three
phases, practice, training, and test, are described below.
In the practice phase eight items were presented, four of each of the relevant endpoint
stimuli (the 205 Hz and the 257.5 Hz offset stimuli for the rising continuum and the
182.5 Hz and 235 Hz offset stimuli for the falling continuum). Following the practice
phase, in the training phase it was required that a criterion of 8 consecutive correct
responses be made: for participants in the Blocked condition this entailed reaching
criterion for the speech set and for the sine-wave set; and for Mixed condition
participants this entailed reaching criterion separately for each of the two sequential
sets of mixed stimuli to ensure listeners remembered the task over time. Each training
phase consisted of the endpoints of the continuum and continued until a criterion of
eight consecutive correct responses was reached. Following criterial responding in the
training phase the test phase was presented. The test phase consisted of 8 repetitions
of each item on each continuum (speech continuum and sine-wave continuum)
presented in random order. Given the eight steps on each continuum, participants
were required to identify a total of 128 items in the test phase. The task took 20 to 25 minutes, depending on the individual participant's pace.
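The training criterion of eight consecutive correct responses described above can be sketched as follows. This is an illustrative reconstruction of the criterion logic, not the DMDX script actually used.

```python
# Illustrative sketch (not the thesis's code): counting trials until a
# criterion of k consecutive correct responses is reached in training.
def trials_to_criterion(responses, k=8):
    """responses: iterable of booleans (True = correct response).
    Returns the trial number at which k consecutive correct responses
    have just been made, or None if the criterion is never reached."""
    run = 0
    for trial, correct in enumerate(responses, start=1):
        run = run + 1 if correct else 0  # an error resets the run
        if run == k:
            return trial
    return None

# e.g. one early error, then eight correct in a row -> criterion at trial 10
print(trials_to_criterion([True, False] + [True] * 8))  # 10
```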
Data Treatment:
The data treatment was the same as in the previous study (see section 6.4.3.1). For
each listener, two crossover values (in Hz) and two d' values were computed (see section 6.4.3.1 for details). Each crossover value was then converted to the offset values of the stimuli on the tone continuum (ranging from 205 Hz to 257.5 Hz for the rising continuum and from 182.5 Hz to 235 Hz for the falling continuum).
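For illustration, a 50% crossover can be located by linear interpolation between adjacent identification proportions. This minimal sketch is mine and is not the thesis's exact procedure, which is described in section 6.4.3.1; the example proportions are made up.

```python
# Illustrative sketch: locate the offset (Hz) at which the proportion of
# 'rising' responses crosses 0.5, by linear interpolation between steps.
def crossover_hz(offsets_hz, p_rising):
    """Estimate the offset at which p('rising') crosses 0.5, or None."""
    for i in range(len(offsets_hz) - 1):
        p0, p1 = p_rising[i], p_rising[i + 1]
        if (p0 - 0.5) * (p1 - 0.5) <= 0 and p0 != p1:  # sign change around 0.5
            frac = (0.5 - p0) / (p1 - p0)
            return offsets_hz[i] + frac * (offsets_hz[i + 1] - offsets_hz[i])
    return None  # no crossing found

offsets = [182.5, 190, 197.5, 205, 212.5, 220, 227.5, 235]
props = [0.02, 0.05, 0.10, 0.20, 0.40, 0.60, 0.90, 0.98]  # hypothetical data
print(crossover_hz(offsets, props))  # 216.25, halfway between 212.5 and 220
```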
7.4.3.2 Discrimination
In the discrimination task the participants listened to stimulus-pairs and were
instructed to “press the LEFT (RIGHT) key if they are the same sound, and press the
RIGHT (LEFT) key if they are different sounds”. Responses timed out after 1500 ms
(because of the longer trial duration in discrimination, the time-out duration was
shorter here than in identification). Omitted trials were not replaced. The task was
separated into two trial sets, punctuated by a break. As in the identification task, there
were two manners of presentation: Blocked and Mixed. A roving AX paradigm was
used, measuring discrimination accuracy along the whole continuum. Neighbouring
stimuli were presented pair-wise. Half of the participants in each condition listened to tones from the predominantly rising continuum, whereas the other half listened to tones from the predominantly falling continuum. The eight practice trials consisted of four different and
four same pairs that were presented with feedback. In test trials there were four
repetitions of each of the four possible combinations (AA, BB, AB, BA) of each of
the seven possible stimulus-pairs for each of the 2 tone types (speech and sine-wave).
This summed to a total of 224 stimulus-pairs. In total, this task took 20 to 25 minutes, depending on the individual participant's pace.
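The trial arithmetic above (4 repetitions x 4 pair orders x 7 neighbouring pairs x 2 tone types = 224) can be checked with a short sketch of how such a roving AX trial list might be assembled. This is illustrative only; the names are mine.

```python
# Illustrative sketch: building the discrimination trial list -
# 4 reps x 4 orders (AA, BB, AB, BA) x 7 neighbouring pairs x 2 tone types.
def build_ax_trials(offsets, tone_types=("speech", "sine-wave"), reps=4):
    trials = []
    for tone in tone_types:
        for a, b in zip(offsets, offsets[1:]):             # 7 neighbouring pairs
            for pair in ((a, a), (b, b), (a, b), (b, a)):  # AA, BB, AB, BA
                trials.extend([(tone, pair)] * reps)
    return trials

offsets = [205, 212.5, 220, 227.5, 235, 242.5, 250, 257.5]  # rising continuum
trials = build_ax_trials(offsets)
print(len(trials))  # 224
```

In the experiment itself the trial order was of course randomised; the sketch only verifies the trial count.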
Data Treatment:
Again, discrimination performance was measured by d', calculated according to the model for discrimination tasks in Kaplan, Macmillan, and Creelman (1978). The proportions of Hits, p(Hit), and False Alarms, p(False Alarm), were limited to the 0.01 to 0.99 range, which meant that a maximum d' value of 8.715 and a minimum d' value of –8.715 could be obtained.
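The clipping of proportions can be illustrated with a minimal sketch. For simplicity this uses the basic yes-no formula d' = z(Hit) − z(False Alarm), which caps at about ±4.65 under 0.01/0.99 clipping; the same-different model of Kaplan, Macmillan, and Creelman (1978) used here yields the ±8.715 limits quoted above. The function name is mine.

```python
# Illustrative sketch: clip hit and false-alarm proportions to [0.01, 0.99]
# before computing d'. Uses the simple yes-no formula d' = z(H) - z(F);
# the thesis's same-different model gives a different ceiling (+/-8.715).
from statistics import NormalDist

def clipped_dprime(p_hit, p_fa, lo=0.01, hi=0.99):
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    p_hit = min(max(p_hit, lo), hi)   # clip extreme proportions
    p_fa = min(max(p_fa, lo), hi)
    return z(p_hit) - z(p_fa)

# Perfect performance is capped: z(0.99) - z(0.01) is about 4.65
print(round(clipped_dprime(1.0, 0.0), 2))  # 4.65
```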
7.5 Analyses
7.5.1 Test Assumptions
For all following analyses, α was set at .05. Unless otherwise mentioned, test
assumptions were found to be satisfactory. Outliers and how they were dealt with will
be mentioned in the appropriate results sections. Planned contrasts were conducted
using the Bonferroni correction for multiple tests, where appropriate.
7.5.2 Language Group and Musical Background Hypotheses
As there were a priori reasons to expect differences in categorical perception between
tonal (Thai) and non-tonal (Australian English) native speakers and between
musicians and non-musicians, analyses of variance (ANOVAs) incorporating planned
contrasts were conducted to test the differences between these groups, and their
interactions.
7.5.3 Strategy Type Hypotheses
Apart from differences between the language and musical background groups, differences between stimulus-pairs across the continua in the discrimination task were also tested in a priori analyses, according to the planned contrasts presented in Tables 7.1 and 7.2.
Table 7.1
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task – Rising Continuum

Stimulus-pair (offset in Hz): 205-212.5  212.5-220  220-227.5  227.5-235  235-242.5  242.5-250  250-257.5
Mid Hypothesis:                  -1         -1         -1          6         -1         -1         -1
Flat Hypothesis:                  2         -5         -5          2          2          2          2

Note. In the current experiment there was only one contrast per strategy, because the flat stimulus was contained in two stimulus-pairs.

The Mid Hypothesis tests whether the stimulus-pair spanning the middle of the continuum (227.5 – 235 Hz) is more discriminable than the other stimulus-pairs. The Flat Hypothesis examines whether there is a difference in discrimination performance between the two flat pairs (212.5 – 220 Hz and 220 – 227.5 Hz) and the rest of the stimulus-pairs.
Table 7.2
Planned Contrasts for the Strategy Type Hypotheses for the Discrimination Task – Falling Continuum

Stimulus-pair (offset in Hz)   182.5–190  190–197.5  197.5–205  205–212.5  212.5–220  220–227.5  227.5–235
Mid Hypothesis                     -1         -1         -1          6         -1         -1         -1
Flat Hypothesis                    -2         -2         -2         -2          5          5         -2
The Mid Hypothesis tests whether the stimulus-pair in the middle of the continuum (the 220–205 Hz and 220–212.5 Hz tones) is more discriminable than the other stimulus-pairs. The Flat Hypothesis tests whether there is a difference in discrimination performance between the two pairs that contain the flat tone (212.5–220 Hz and 220–227.5 Hz) and the rest of the stimulus-pairs.
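Applying such contrast weights to the seven stimulus-pair means is a weighted sum in which the weights cancel to zero. A minimal sketch (the d' means below are invented for illustration; they are not data from the experiment):

```python
def contrast_estimate(weights, means):
    """Weighted sum of condition means; contrast weights must sum to zero."""
    assert abs(sum(weights)) < 1e-9, "contrast weights must sum to zero"
    return sum(w * m for w, m in zip(weights, means))

# Weights from Table 7.2 (falling continuum).
mid_weights  = [-1, -1, -1, 6, -1, -1, -1]
flat_weights = [-2, -2, -2, -2, 5, 5, -2]

# Hypothetical d' means for the seven stimulus-pairs (illustration only).
pair_means = [1.1, 1.0, 1.2, 2.4, 0.6, 0.7, 1.3]

print(contrast_estimate(mid_weights, pair_means))   # large: mid pair stands out
print(contrast_estimate(flat_weights, pair_means))  # non-zero: flat pairs differ
```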
7.6 Results
Results are presented separately for the identification training (section 7.6.1), the identification test trials (section 7.6.2), and the discrimination test trials (section 7.6.3), followed by a summary and a qualitative evaluation of the identification and discrimination functions (section 7.6.4).
7.6.1 Identification Training Results
The identification task began with practice trials, followed by a training session to
criterion, and then a set of test trials. Results for the practice trials were not analysed.
In the identification training session, participants were required to identify eight stimuli correctly in succession in order to proceed to the main part of the experiment. (For raw data and statistical analyses see Appendix A7.3 and A7.4.)
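The criterion rule (eight correct identifications in succession) amounts to a scan over the trial-by-trial correctness record. A hypothetical helper, not the software actually used in the experiment:

```python
def trials_to_criterion(correct, run_length=8):
    """Return the trial number on which the participant first completes
    `run_length` consecutive correct responses, or None if never reached."""
    streak = 0
    for trial, is_correct in enumerate(correct, start=1):
        streak = streak + 1 if is_correct else 0
        if streak == run_length:
            return trial
    return None

# One error on trial 2, then eight correct in a row: criterion met on trial 10.
responses = [True, False] + [True] * 8
print(trials_to_criterion(responses))  # 10
```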
Before considering the rising and the falling continua separately, a 2 (English/Thai) x 2 (musicians/non-musicians) x 2 (rising/falling) analysis of variance was conducted. Musicians required significantly fewer trials to reach criterion than non-musicians (Mmusicians = 19.12, Mnon-musicians = 33.19; F (1, 56) = 9.630, p < .01). No other significant main effects or interactions were found. Results of the training session for the rising and falling continua are presented in sections 7.6.1.1 and 7.6.1.2.
7.6.1.1 Trials to Criterion in Identification – Rising Continuum
Before considering the blocked and the mixed conditions separately, a 2 (Australian/Thai) x 2 (musicians/non-musicians) x 2 (mixed/blocked) ANOVA was conducted. Descriptive statistics (means and standard error bars) are presented in Figure 7.6. The analysis revealed that significantly more trials were required to reach criterion in the mixed condition than in the blocked condition (t (1, 62) = 2.814, p < .01; Mmix = 21.59, sd = 22.66; Mblock = 10.09, sd = 4.58; for raw data see Appendix A7.3; for analysis see Appendix A7.4).
[Figure 7.6 appears here: four panels (Australian/Thai × Musicians/Non-Musicians), showing mean criterion scores (0–80) for set 1, set 2, sine, and speech in the mixed and blocked conditions.]
Figure 7.6. Descriptive Statistics for Trials to Criterion for the Rising Continuum in the Identification Task.
Rising Continuum, Blocked condition: Mean trials required to reach criterion in the
blocked condition for the language groups and tone types are shown in Figure 7.6, and
were analysed in a 2 (Australian/Thai) x 2 (musician/non-musician) x 2 (sine/speech)
ANOVA. No significant differences were found between language groups or between musicians and non-musicians, but significantly more trials were required to reach criterion in the sine-wave condition (M = 11.82) than in the speech condition (M = 8.38; F (1, 24) = 5.386, p < .05). No other main effects or interactions were significant (for analyses see Appendix A7.4).
Rising Continuum, Mixed condition: As can be seen in Figure 7.6, in the mixed condition, Australian participants required significantly more trials than Thai listeners to reach criterion (MAustralian = 29.06; MThai = 14.125; F (1, 24) = 11.522, p < .01), non-musicians required significantly more trials than musicians (Mmusicians = 11.63, Mnon-musicians = 31.56; F (1, 24) = 20.526, p < .01), and participants needed more trials to reach the criterion in Set 1 (M = 29.81) than in Set 2 (M = 13.37; F (1, 24) = 13.95, p < .01). The interaction between language and musical background was also significant, with no significant difference between Thai musicians and non-musicians, but significantly lower criterion scores for the Australian musicians than for the Australian non-musicians (F (1, 24) = 19.011, p < .05). The interaction between sets and musical background was also significant, with no difference between musicians and non-musicians in the second set, but significantly lower criterion scores for musicians in the first set (F (1, 24) = 7.054, p < .05). None of the other interactions were significant (for analyses see Appendix A7.4).
7.6.1.2 Trials to Criterion in Identification – Falling Continuum
Descriptive statistics for trials to criterion in the falling continuum are provided in Figure 7.7. Before considering the blocked and the mixed conditions separately, a comparison of trials to criterion in the mixed vs. the blocked condition, collapsed over language groups, tone types, and musical backgrounds, was conducted. The analysis revealed no significant difference between criterion scores in the mixed condition and the blocked condition (t (1, 62) = 1.034, p > .05; Mmix = 13.56, sd = 9.353; Mblock = 11.31, sd = 7.998; for raw data see Appendix A7.3; for analysis see Appendix A7.4).
166
Figure 7.7. Trials to Criterion Scores for the Falling Continuum.
Falling Continuum, Blocked condition: No significant main effects or interactions
were found in the blocked condition. (For raw data and analyses see Appendix A7.3
and 7.4.)
Falling Continuum, Mixed condition: In the mixed condition, musicians needed
significantly fewer trials to reach criterion than non-musicians (F (1, 24) = 7.555, p <
.05) and participants required significantly more trials to reach criterion in the first
than in the second set (F (1, 24) = 7.809, p < .05). (For raw data and analyses see
Appendix A7.3 and A7.4.)
[Figure 7.7 appears here: four panels (Australian/Thai × Musicians/Non-Musicians), showing mean criterion scores (0–80) for set 1, set 2, sine, and speech in the mixed and blocked conditions.]
7.6.2 Identification Test Trials
Identification test trials were analysed with respect to identification crossover (sections 7.6.2.1 and 7.6.2.2) and d' values (section 7.6.2.3), separately for the rising and falling continua.
7.6.2.1 Crossover Values – Rising Continuum
Mean crossover values in Hz were calculated for each listener and are shown in Figures 7.8, 7.9, 7.10, and 7.11. Planned comparisons within a 2 x 2 x 2 x (2) language x musical background x presentation manner x stimulus type analysis of variance (ANOVA), with repeated measures on the last factor, were conducted to test the effect of these four factors on the crossover boundary location (for raw data and analyses see Appendix A7.5 and A7.6).
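One common way to locate such a crossover is linear interpolation of the identification function at the 50% point between the two continuum steps that straddle it. The sketch below (with invented identification proportions) illustrates the idea; the thesis may have used a different fitting procedure (e.g. probit):

```python
def crossover_hz(offsets, prop_high, threshold=0.5):
    """Linearly interpolate the offset (in Hz) at which the identification
    function crosses the threshold between two adjacent continuum steps."""
    for i in range(len(offsets) - 1):
        p0, p1 = prop_high[i], prop_high[i + 1]
        if p0 != p1 and (p0 - threshold) * (p1 - threshold) <= 0:
            x0, x1 = offsets[i], offsets[i + 1]
            return x0 + (threshold - p0) / (p1 - p0) * (x1 - x0)
    return None  # function never crosses the threshold

# Rising-continuum offsets with hypothetical identification proportions:
offsets = [205, 212.5, 220, 227.5, 235, 242.5, 250, 257.5]
props = [0.02, 0.05, 0.10, 0.30, 0.70, 0.90, 0.95, 0.98]
print(crossover_hz(offsets, props))  # ~231.25, near the continuum midpoint
```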
In the rising continuum, the mean crossover over all manipulations was 232.03 Hz
and there were no significant deviations from this, due to language or musical
background, presentation manner, or stimulus type, or their interactions.
[Figure 7.8 appears here: mean crossover values (Hz, 220–250) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.8. Crossover Values for the Rising Continuum for Thai Listeners.
[Figure 7.9 appears here: mean crossover values (Hz, 220–250) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.9. Crossover Values for the Rising Continuum for Australian Listeners.
7.6.2.2 Crossover Values – Falling Continuum
The mean crossover over all manipulations was 211.83 Hz, but crossovers for Thai
listeners were significantly higher and thus closer to the flat 220 Hz offset tone than
those of the Australian listeners (F (1, 45) = 17.416, p < .005; MAustralian = 208.10 Hz;
MThai = 215.45 Hz). There was also a significant interaction of presentation manner
and musical background: crossovers were similar for both musicians (M = 212.65 Hz)
and non-musicians (M = 212.82 Hz) in the mixed condition, as well as for non-
musicians in the blocked condition (M = 213.9 Hz), but significantly lower for the
musicians in the blocked condition (M = 208.1 Hz; F (1, 45) = 4.397; p < .05). There
was also a significant interaction between presentation manner and tone type: in the
blocked condition, there was little difference in crossover for speech and sine-wave
tones, whereas in the mixed condition, the sine-wave crossover was much higher on
the continuum than the crossover for speech tones (F (1, 45) = 4.631; p = .037),
especially for the Australian non-musicians (F (1, 45) = 5.261; p < .05).
[Figure 7.10 appears here: mean crossover values (Hz, 180–230) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.10. Crossover Values for the Falling continuum for Thai Listeners.
[Figure 7.11 appears here: mean crossover values (Hz, 180–230) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.11. Crossover Values for the Falling continuum for Australian Listeners.
7.6.2.3 Identification d' Results
The d' results (the steepness of the identification function at the boundary location, an indicator of the degree of categoricality) were analysed in 2 x 2 x 2 x (2) language x musical background x presentation manner x tone type analyses of variance, separately for the rising and the falling continua. Descriptive statistics are shown in Figures 7.12 and 7.13 for the rising continuum (for raw data see Appendix A7.7, and for analyses see Appendix A7.8) and in Figures 7.14 and 7.15 for the falling continuum.
Before considering the falling and the rising continua separately, a 2 (rising/falling) x
2 (Thai/Australian) x 2 (musicians/non-musicians) analysis of variance collapsed over
presentation manner and tone type was conducted. The analysis revealed no
significant main effects or interactions.
Identification Accuracy - Rising Continuum: For the rising continuum, no main effects were observed, but there was a language x musical background interaction: Australian musicians performed more categorically than Australian non-musicians, but there was little difference between Thai musicians and non-musicians (F (1, 44) = 4.106, p = .049). There was also a significant interaction of musicianship and presentation manner: in the mixed condition, non-musicians showed more categorical identification, whereas in the blocked condition, musicians were more categorical in identifying tones (F (1, 44) = 9.808, p < .005). No other interactions were significant.
[Figure 7.12 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.12. Descriptive Statistics for d' Values (rising continuum) for Thai Listeners.
[Figure 7.13 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.13. Descriptive Statistics for d' Values (rising continuum) for Australian English Listeners.
Identification Accuracy - Falling Continuum: In the falling continuum, the mean
identification accuracy (d') over all manipulations was 1.460 and analyses of variance
showed there were no significant differences due to language background, musical
background, presentation manner, or tone type.
[Figure 7.14 appears here: identification slope (d', 0–4) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.14. Descriptive Statistics for d' Values (falling) for Thai Listeners.
[Figure 7.15 appears here: identification slope (d', 0–3) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.15. Descriptive Statistics for d' Values (falling) for English Listeners.
7.6.3 Discrimination Results
In the discrimination task there were practice trials followed by sets of test trials. Practice trials were not analysed. For the test trials, d' was the dependent variable. Overall differences in discrimination are presented in section 7.6.3.1, and analyses of discrimination peaks and perceptual strategies in section 7.6.3.2.
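On the standard signal-detection definition, d' is derived from the hit rate and the false-alarm rate via the inverse normal CDF, with proportions of exactly 0 or 1 nudged inward so that the transform stays finite; such clamping is what yields finite bounds like the ±8.715 used for the discrimination scale here. A minimal sketch (the clamping value below is a generic choice, not necessarily the correction used in the thesis):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate, eps=1e-4):
    """d' = z(hit rate) - z(false-alarm rate), with proportions clamped
    away from 0 and 1 so the z-transform stays finite."""
    z = NormalDist().inv_cdf
    clamp = lambda p: min(max(p, eps), 1 - eps)
    return z(clamp(hit_rate)) - z(clamp(fa_rate))

print(d_prime(0.99, 0.10))  # sensitive listener: large positive d'
print(d_prime(0.50, 0.50))  # chance performance: d' = 0
```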
7.6.3.1 Overall Discrimination Differences
Descriptive statistics for overall discrimination ability are presented in Figures 7.16 and 7.17 for the rising continuum and Figures 7.18 and 7.19 for the falling continuum, and in summary form in Figures 7.24 and 7.25. For raw data see Appendix A7.9; for analyses see Appendix A7.10. Before considering the rising and the falling continua separately, a 2 (English/Thai) x 2 (musicians/non-musicians) x 2 (rising/falling) analysis of variance was conducted. Musicians showed significantly higher discrimination accuracy than non-musicians (Mmusicians = 1.589, Mnon-musicians = .998; F (1, 124) = 7.316, p < .01). No other significant main effects or interactions were found.
Overall Discrimination, Rising Continuum: In the rising continuum, there were no significant main effects of language background or musicianship, and no interactions of language background with tone type. However, sine-wave stimuli were discriminated significantly better than speech tones (F (1, 48) = 9.672, p < .01). No other main effects or interactions were significant.
[Figure 7.16 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.16. Descriptive statistics for Discrimination Accuracy (rising) in Thai Listeners.
[Figure 7.17 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.17. Descriptive Statistics for Discrimination Accuracy (rising) in Australian Listeners.
Overall Discrimination Differences – Falling Continuum: For the falling continuum,
musicians were found to be significantly better at discriminating the tones (F (1, 48) =
13.45, p < .01). There was also an interaction between language and musical
background: Australian musicians were better at discrimination of tones than Thai
musicians (F (1, 48) = 6.51, p < .05), but there was little difference between
Australian and Thai non-musicians. None of the other main effects or interactions
(neither 2-way nor 3-way) were found to be significant.
[Figure 7.18 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.18. Descriptive Statistics for Discrimination Accuracy (falling) in Thai Listeners.
[Figure 7.19 appears here: discrimination accuracy (d', –2 to 5) for musicians and non-musicians in the mixed and blocked conditions, sine vs. speech.]
Figure 7.19. Descriptive Statistics for Discrimination Accuracy (falling) in Australian Listeners.
7.6.3.2 Discrimination Peak Analysis
The discrimination results were analysed separately for the two continuum shapes. This was necessary because the location of the discrimination peak has to be analysed relative to the continuum: the middle of the continuum, for instance, lies at 208.75 Hz for the falling continuum and at 231.25 Hz for the rising continuum. Raw data are presented in Appendix A7.11; analyses are presented in Appendix A7.12. Data are schematically represented for the rising continuum in Figures 7.20 and 7.21 and for the falling continuum in Figures 7.22 and 7.23.
Discrimination Peak, Rising Continuum: In the rising continuum, neither the mid-hypothesis nor the flat-hypothesis was confirmed by the results (Fmid (1, 23) = .471, p > .05; Fflat (1, 23) = 1.771, p > .05). There was, however, an interaction of pair type (flat vs. other) and musical background, with the d' for the flat pair significantly lower than for the other pairs in the non-musician but not in the musician group (F (1, 23) = 15.479, p < .05).
[Figure 7.20 appears here: discrimination accuracy (d', –1 to 5) for flat pairs vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.20. Descriptive statistics for the Flat Pair vs. the other Stimulus-Pairs – Thai and Australian Musicians and Non-Musicians (rising continuum).
[Figure 7.21 appears here: discrimination accuracy (d', –1 to 5) for the mid pair vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.21. Descriptive statistics for the Mid Pair vs. the other Stimulus-Pairs – Thai and Australian Musicians and Non-Musicians (rising continuum).
Thus it appears that the flat hypothesis is upheld less for non-musicians than for
musically trained listeners. No other main effects or interactions concerning the mid-
hypothesis or the flat hypothesis were found to be significant.
Discrimination Peak, Falling Continuum: In the falling continuum, both the flat and the mid hypothesis were supported by the data (Fmid (1, 24) = 8.998, p < .05; Fflat (1, 24) = 10.948, p < .05). No interactions concerning the mid-hypothesis or the flat-hypothesis were significant, suggesting that all groups displayed equivalent mid/flat results.
[Figure 7.22 appears here: discrimination accuracy (d', –1 to 5) for flat pairs vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.22. Descriptive statistics for the Flat Pair vs. the other Stimulus-pairs – Thai and Australian Musicians and Non-Musicians (falling continuum).
[Figure 7.23 appears here: discrimination accuracy (d', –1 to 5) for the mid pair vs. the other stimulus-pairs, by language group and musical background.]
Figure 7.23. Descriptive statistics for the Mid Pair vs. the other Stimulus-pairs – Thai and Australian Musicians and Non-Musicians (falling continuum).
7.6.4 Qualitative Evaluation and Summary of Results
Identification functions and discrimination graphs for both language groups and
musicians and non-musicians are shown in Figure 7.24 for the rising continuum and
in Figure 7.25 for the falling continuum.
In the rising continuum, the identification functions look steeper in Thai musicians
and non-musicians compared to Australian English musicians and non-musicians.
When comparing musicians and non-musicians, it can clearly be seen that musicians
show higher discrimination values and steeper identification curves than non-
musicians. In all groups, the category boundary lies very close to the middle of the
continuum. However, this qualitative observation is not reflected in the analyses. For
the rising continuum the concordance between speech and sine-wave identification in
both Thai and Australian English musicians appears to be greater than that in non-
musicians and the endpoints appear to be identified more reliably (closer to the
minimum and maximum values) by musician listeners than by non-musician listeners.
In the falling continuum, the picture is similar in terms of perceptual accuracy:
musicians show steeper identification functions, higher discrimination accuracy and
greater speech/non-speech correspondence (the speech and sine-wave graphs lie
closer together for musicians than for non-musicians). When looking at the crossover
values, however, the results differ from the rising continuum: the crossovers are not
consistently near the middle of the continuum, but rather between the middle and the
flat no-contour tone.
In summary, considering both the combined results in Figures 7.24 and 7.25 and the statistical analyses, it appears that musicians learn to identify tones more quickly than non-musicians do, and discriminate tones with greater accuracy than non-musicians (but only in the falling continuum). For the rising continuum, there is little difference between musician and non-musician Thai speakers' identification accuracy, but Australian English musicians show better identification accuracy than Australian English non-musicians.
Participants who are confronted with just one type of stimulus at a time (in the blocked condition) need fewer trials to reach criterion in identification. In terms of identification boundaries, the perceptual strategy seems to depend on the shape of the continuum, with the perceptual anchor found in the middle of the rising continuum (although this result is not statistically significant), but between the flat no-contour tone and the central tone for the falling continuum.
[Figure 7.24 appears here: Rising Continuum Tone Perception. Four panels (Thai/English × musicians/non-musicians); x-axis: offset values (in Hz) for the 220 Hz onset tone (205–257.5); left y-axis: identification accuracy in % (sine and speech); right y-axis: discrimination accuracy (minimum: –8.715, maximum: 8.715); flat and mid stimulus-pairs marked.]
Figure 7.24. Identification and discrimination for the rising continuum.
[Figure 7.25 appears here: Falling Continuum Tone Perception. Four panels (Thai/English × musicians/non-musicians); x-axis: offset values (in Hz) for the 220 Hz onset tone (182.5–235); left y-axis: identification accuracy in % (sine and speech); right y-axis: discrimination accuracy (minimum: –8.715, maximum: 8.715); mid and flat stimulus-pairs marked.]
Figure 7.25. Identification and discrimination results for the falling continuum.
7.7 Discussion
The results are discussed here, particularly in terms of continuum shape and of differences due to musical background.
7.7.1 Differences Due to Continuum Shape
In this experiment, two differently shaped continua were used: one that consisted of
more rising than falling tones, and another that comprised more falling tones than
rising tones. In the analyses, significantly more trials were required to reach the
identification training criterion in the rising continuum than in the falling continuum.
This is interesting because it suggests that over and above any effects of musical or
language experience, the exact properties of the tone continuum, that is the proportion
of rising vs. falling tones, can affect the ability to learn a new tone contrast. So, even
though the physical separation was equivalent in the two continua, listeners learned to label a slightly rising tone (the upper endpoint of the falling continuum, with an onset of 220 Hz and an offset of 235 Hz) and a steeper falling tone (the lower endpoint of the falling continuum, with an onset of 220 Hz and an offset of 182.5 Hz) faster than a steeper rising tone (the upper endpoint of the rising continuum, with an onset of 220 Hz and an offset of 257.5 Hz) and a slightly falling tone (the lower endpoint of the rising continuum, with an onset of 220 Hz and an offset of 205 Hz).
In terms of crossover location, no differences were found regarding language background, musical background, tone type, or presentation manner in the rising continuum. Thus, all participants' perception changed from one tone category to the other at a similar location on the continuum, namely around the middle (231.25 Hz) of the rising continuum (Mrising = 232.03, sd = 6.203). In the falling continuum, however, there were crossover differences depending on language background, presentation manner, and tone type, with the mean again being around the middle (208.75 Hz) of the continuum (Mfalling = 211.83, sd = 8.80). These differences indicate that even though in both continua the middle is used as a perceptual anchor in identification, there remain differences that may originate in the specific stimulus properties.
7.7.2 Differences Between Musicians and Non-Musicians
Tones are a feature of music in all cultures, but tone is not a feature of speech in all
languages. Musicians, however, are constantly exposed to small tonal changes. Thus,
it was expected that tone perception would be better in musicians than in non-
musicians. The results of the current study confirm this; musicians learn to identify
tones more quickly than non-musicians, and are more accurate at discriminating
subtle differences between tones than non-musicians. Thus it appears that musical
training might improve listeners' speed of learning new non-musical tonal distinctions
and their ability to discriminate tones.
A reason for the superiority of musicians over non-musicians could be that musicians
have better pitch pattern processing skills. Jakobson, Cuddy, and Kilgour (2003)
found that music training engages and refines processes involved in pitch pattern
analysis and these may be activated in the current tasks. Thus it is possible that music
instruction affects categorical perception of tone indirectly by strengthening auditory
temporal processing skills (Jakobson et al., 2003), which allows musicians to
discriminate better between rapidly changing acoustic events.
Another more universal explanation of the difference between musicians and non-
musicians' categorical perception of tone in both speech and non-speech contexts is that the musicians' dominance may originate from the combination of abilities that
music lessons teach and improve, including focused attention and concentration,
memorisation, music reading, and fine-motor skills (Schellenberg, 2001). The
enhanced ability for identifying and discriminating tones as shown in the current
study might be a result of generally enhanced auditory skills evoked by previous
music training. This will be taken up further in the next experiment, in which the relative influence on tone perception of musical training on the one hand, and musical aptitude on the other, will be considered.
The results of this experiment show that musical ability enhances the ability to identify and discriminate tones in both a speech and a non-speech context. Interestingly, there are no systematic differences between speakers of tonal and non-tonal languages in general. This suggests that musical experience is a potent factor that shapes the perception of unfamiliar tones in speech and non-speech contexts more strongly than tonal language experience does.
It is surprising that no differences between the perception of speech and non-speech tones were found in this experiment. One reason for this lack of difference could be
be the artificial nature of the speech tones presented. However, it has been shown in
previous studies that synthetic tones that were generated in the same way as the
stimuli used here are in fact perceived as speech sounds and distinct from non-speech
tones (Burnham & Jones, 2002; S. Chan et al., 1975; Howie, 1976). However,
Burnham and Jones (2002) presented a range of types of non-speech sounds (sine-
wave, filtered speech, simulated violin) and so it is possible that the contrast, for the
listener, of speech and non-speech under such conditions is greater. Clearly this is an
issue that requires further research.
While this result pattern seems very clear, it remains uncertain what exactly it is about
musical training that enhances tone perception. The next experiment will investigate
the locus of musicians' superiority and whether it is restricted to the quasi-musical
stimulus of tone.
CHAPTER 8
Perception and Production of Tones, Consonants, and
Vowels: The Influence of Language Aptitude, Musical
Aptitude, Musical Memory, and Musical Training
8.1 Introduction
As discussed in Chapter 4, music exposure and training can influence speech
processing. In fact, in Experiment 2, it became clear that the influence of musical
background on tone perception might even be greater than the influence of tonal
language background, at least for the language backgrounds studied there, English
and Thai. In order to obtain a more comprehensive understanding of the role of
musicality in tone perception, here musical experience, musical memory, aptitude58
(musical aptitude and foreign language aptitude) will be investigated, especially in
relation to speech perception and production. As lexical tone relies very much on F0
(see Chapter 3) there may well be a link between speech and music at the tonal level.
For this reason the effect of musical variables on the perception and production of tone is investigated here. To analyse whether any such effects are due to a general effect on
phonological perception and production or more specifically on tone, the perception
and production of consonants and vowels are also considered here. In this experiment
the effect of musical and linguistic ability on the perception and production of foreign
language sounds will be investigated. To this end musician and non-musician non-
tone (Australian English) language participants are given tests of musical ability and
linguistic aptitude, along with tests of the perception and production of Thai tones,
consonants, and vowels.
Musicianship was measured by considering musical training and experience. To this
end, groups of musicians, with at least five years of continuous formal musical
training, and non-musicians with no more than two years of musical training were
tested. A questionnaire was administered to specify more precisely the degree and
type of training. In addition to these a priori measures of musicality, further measures
of musical ability were used in order to assess the effect of inherent musicality, free
from the effect of musical training and experience, on foreign language perception
and production. Two types of inherent musicality tests were administered – a test of
musical aptitude, and a test of musical memory.
58 Aptitude is defined as the potential to acquire a skill, a natural tendency to do something well. In this context, it is also the case that there may be between-participant differences in such potential, and these differences may originate from various sources, and not necessarily be genetically inherited.
Thus the design of the experiment involved two groups of Australian English
speaking participants (musicians and non-musicians), given tests of musical aptitude,
musical memory, language aptitude, and Thai speech sound tests on the (a) perception
and (b) production of (i) tones, (ii) consonants, and (iii) vowels. The hypotheses are
set out below ahead of a detailed description of the tests and the method.
8.2 Hypotheses
Hypotheses were advanced regarding each individual ability/aptitude test; for
perception and production; and for the relative contribution of abilities/aptitudes to
speech sound production and perception.
8.2.1 Separate Abilities
8.2.1.1 Musicianship
It is expected that musical training (i.e., the musician vs. non-musician factor) will enhance both perception and production of lexical tones, vowels, and consonants, although, given the experience with musical pitch that musical training affords, it is expected that musical training will enhance the perception and production of lexical tones more than that of consonants and vowels.
8.2.1.2 Musical Aptitude
It is expected that musicians will display higher musical aptitude scores. It is also
expected that high musical aptitude will enhance perception and production of lexical
tones and consonants more than of vowels. For tones, the reason is that musical aptitude measures natural abilities people might have for melody and rhythm; lexical tone can be related to melody, and on this basis it might be expected that tone perception and production will be enhanced by musical aptitude. For consonants, the contrasts used here are based on differences in voicing (voice onset time), so it might be expected that consonant perception and production would be enhanced to the degree that temporal resolution is involved in musical aptitude. For vowels, as the timbre aspect of music, which can be related to vowel features, has not been found to be enhanced in musicians compared to non-musicians (Lamb & Gregory, 1993; Prior & Troup, 1988), it is expected that vowel perception and production will be unaffected by musical aptitude.
8.2.1.3 Musical Memory
Based on previous results (Schellenberg & Trehub, 2003) it is expected that musicians
as well as non-musicians will perform equally well on the musical memory test. It is
further expected that musical memory for the pitch of melodies will enhance speech
perception and production. In particular, due to the pitch-based nature of this ability it
is expected that musical memory will enhance the perception and production of
lexical tones more than it will that of consonants and vowels.
8.2.1.4 Language Aptitude
Musicians and non-musicians are expected to perform equally well on language
aptitude. It is also expected that language aptitude will enhance perception and
production of lexical tones, vowels, and consonants equally.
8.2.2 Relationship between Perception and Production
In all of the above it is expected that perception and production will be equally
enhanced. It is also expected that both musicians and non-musicians will produce
those sounds that they identify with high perceptual accuracy better than those sounds
that are more difficult to identify; that is, there should be positive correlations between
perception and production.
8.2.3 Determinants of Perception and Production
It is expected that musical training, musical aptitude, musical memory, and language
aptitude will contribute to the prediction of participants' ability to (a) perceive and
(b) produce (i) tones, (ii) consonants, and (iii) vowels. In particular it is expected that
musical training will predict production and perception of tone, but not of vowels and
consonants. Musical aptitude is expected to predict perception and production of
tones, and consonants and vowels, and musical memory is expected to predict
perception and production of tones, but not consonants and vowels. It is expected that
language aptitude will predict perception and production of all speech sounds equally.
8.3 Method
8.3.1 Participants
A total of 36 native Australian English participants were tested; 18 musicians (10
female, 8 male, average age: 27.8 years) and 18 non-musicians (10 female and 8 male,
average age: 24.6 years). Participants gave their informed consent (see Appendix
A8.2) and the experiment was covered by Ethics Approval from the University of
Western Sydney (HREC 06/65). All participants were students at the University of
Western Sydney, who received course credit for their participation. None of the
participants had previous exposure to a tone language. Musicians were defined as
instrumentalists/singers having at least five years of continuous formal musical
training (M = 15.7 years, sd = 10.62). Non-musicians were defined as having no more
than two years of musical training (M = .11 years, sd = .47). In addition to this
classification, all participants were given a questionnaire, which included
demographic and sensory information as well as musical history (for details of
participants' musical history see Appendix A8.1). None of the participants had any
self-reported hearing or speech/language problems. Participants were tested
individually in a single session, in a sound-attenuated testing cubicle in the MARCS
Auditory Laboratories at the University of Western Sydney. They were each given
tests of musical aptitude, musical memory, language aptitude, and perception and
production of tones, consonants, and vowels. The order of tests was counterbalanced
between participants and testing took a total of 75 minutes. Stimuli for all tasks were
presented on a laptop computer (Compaq Evo N1000c) over headphones (KOSS
UR20) at a self-adjustable listening level. Details of each test are given below.
8.3.2 Musical Aptitude
The most comprehensive of the musical aptitude tests is the Musical Aptitude Profile
(MAP), designed by Gordon (1965) to measure seven different dimensions of musical
aptitude in students with and without musical knowledge ranging from grade four to
twelve or older. The MAP consists of three sections: tonal imagery, rhythm imagery,
and musical sensitivity. However, the test takes 3 hours and 30 minutes and so here, a
shorter version was employed, the Advanced Measures of Music Audiation (AMMA;
Gordon, 1989). The Advanced Measures of Music Audiation is a recorded aptitude
test that is usually administered to high school or university students with or without
musical experience. The AMMA was chosen because it is reported to represent
stabilised musical aptitude and is thus well suited to studying musicians and non-
musicians. The AMMA (Gordon, 1989) has adequate normative data, its reliability
and validity are adequate, it is independent of musical training and chronological age,
has clear instructions, and takes only 20 minutes to complete. The norm sample
consisted of American college and university music majors (n = 3206), non-music
majors (n = 2130) and high school students (n = 872). Mean reliability of the test,
measured by split-half and test-retest measures, was r = .82, and this value was
approximately the same for all three norm groups.
The AMMA consists of 30 computer-programmed questions with musical material
performed on an electronic instrument. It takes around 20 minutes to administer and is
presented on audio-CD through headphones. Each test question consists of a short
musical statement and a musical answer. For example, the musical statement would
have a different final note than the musical answer or would be partly performed more
slowly or with different phrasing. The student is required to decide whether statement
and answer are the same or different. For “different” judgements, the listener must
then decide whether the difference lies in a tonal (change in tone or key) or
rhythmical (change in duration, tempo, or meter) manipulation. There cannot be a
tonal and a rhythm change. In the practice exercises, it is explained what is meant by
a tonal or a rhythmical change. The listener is required to fill the blank in the “same”
column, if they think the answer is identical to the statement, and if it is judged to be
different, to fill the “tonal” column if the change is perceived as tonal, and the
“rhythm” column if perceived to be rhythmical (see Appendix A8.3 for the answer
sheet). The listeners are instructed not to guess, but to leave items blank if they are
unsure. The change can occur in any position in the answer (beginning, middle, or
end) and the listener cannot answer by counting the number of notes, as there is
always the same number of notes in the question as in the answer. Scoring involved
three steps: a) counting the number of correct answers to obtain a raw score; b)
adjusting the raw scores (according to the procedure in Gordon, 1989); and c)
converting the raw scores to percentile ranks (according to a table included in the test
materials; Gordon, 1989). All scoring was done by the experimenter; because scoring
was completely objective, it was not necessary to involve other scorers.
8.3.3 Musical Memory for Pitch
The Schellenberg and Trehub (2003) musical memory test was devised to measure
absolute pitch memory in musicians and non-musicians using material that would be
familiar to both - the absolute pitch of popular TV themes. Using such tests
Schellenberg and Trehub (2003) have shown that both musicians and non-musicians
show relatively good pitch memory, and variation in this ability is relatively
independent of musical training. Here the original Schellenberg and Trehub (2003)
test was adapted for the current purposes. Twelve 5-second samples from popular
songs were presented to the listener in two forms – at the original pitch level and at a
slightly transposed pitch. The listener was required to decide which of the versions is
the original version (Schellenberg & Trehub, 2003). A list of the 12 songs used here is
given in Appendix A8.4. The selection criterion for which songs to use was
popularity, as determined in a pilot study conducted at MARCS Auditory laboratories,
UWS, Sydney. The position in the songs from which excerpts were taken was not
predetermined; it was selected to be maximally representative of the overall recording
and then saved as CD-quality sound files (44.1 kHz sampling rate). The original
excerpts were shifted by 1 or 2 semitones upward or downward with ProTools
(DigiDesign) digital-editing software, which is commonly used in professional
recording studios. Pitch shifting had no effect on tempo (speed) or overall sound
quality. Direction and magnitude of the pitch shifts were counterbalanced, so that of
the 12 trials, 6 were upward shifts and 6 downward, and of those, 3 were shifted by 1
semitone, and the other 3 by 2 semitones. Pitch shifts involved multiplying (for
upward shifts) or dividing (for downward shifts) all frequencies in an excerpt by a
factor of 1.12 for 2-semitone shifts, and 1.06 for 1-semitone shifts. For example, a 2-
semitone shift upwards involved a change from 262 Hz to 294 Hz. To eliminate
potential cues from the electronic manipulation (that could result in quality
differences between the original and the shifted versions) the pitch levels of the
original excerpts were also shifted upward and then downward 1 semitone (all
frequencies multiplied and then divided by a factor of 1.06).
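The relation between semitone shifts and frequency ratios described above can be sketched as follows. This is a minimal illustration of equal-tempered pitch arithmetic, not the ProTools processing itself; the factors quoted in the text (1.06 and 1.12) are the rounded values of 2^(1/12) and 2^(2/12).

```python
# Equal-tempered pitch shifting: a shift of n semitones multiplies
# every frequency by 2 ** (n / 12).
def semitone_factor(n: int) -> float:
    """Frequency ratio for a shift of n semitones (positive = upward)."""
    return 2.0 ** (n / 12.0)

def shift_frequency(freq_hz: float, n: int) -> float:
    """Apply an n-semitone shift to a frequency in Hz."""
    return freq_hz * semitone_factor(n)

# The 1- and 2-semitone factors round to the values quoted in the text,
# and a 2-semitone upward shift takes 262 Hz to approximately 294 Hz.
print(round(semitone_factor(1), 2))    # 1.06
print(round(semitone_factor(2), 2))    # 1.12
print(round(shift_frequency(262, 2)))  # 294
```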
The excerpts were presented to participants in a Microsoft PowerPoint file. All 12
trials were tested in the same session. Listeners heard one version of a 5-sec excerpt at
the original pitch level and another version at the altered (upward or downward) pitch.
Order of presentation (original-altered or altered-original) was counterbalanced. The
participants activated each excerpt version by clicking on a loudspeaker-icon on the
screen, so they could determine the time separation of the sounds themselves (for an
example of a screen shot see Appendix A8.5). Participants were instructed that they
would hear two versions of the same song on each trial, with one version at the
correct pitch and the other version shifted higher or lower. First, they were required to
indicate whether they had heard the song before, by ticking a box next to the song title
on the answering sheet. Their task was then to identify which was the excerpt with the
original (usual) pitch level (see Appendix A8.4 for the answer sheet). They received
no feedback for correct or incorrect responses during the task. Responses for those
songs that the participants were familiar with were scored as correct or incorrect and
participants were given a proportion correct score for known songs. The task took 4 to
6 minutes, depending on the individual participant's pace.
8.3.4 Foreign Language Aptitude
For the present purposes the Pimsleur Language Aptitude Battery (PLAB; Pimsleur,
1966) was used. The PLAB was developed to measure the ability to learn foreign
languages, and consists of six parts, concerned with different aspects of language
learning – grades, motivation, vocabulary, language analysis, sound discrimination
and sound-symbol association. In part 1, the student gives information about Grade
Point Average (a measure of academic achievement used in the United States) in
areas other than foreign languages. In the second part, interest in learning a foreign
language is assessed; the student indicates whether he/she is interested on a 5-point
scale from “rather uninterested” to “strongly interested”. Part 3 involves vocabulary
assessment of English; participants are required to choose from four possibilities a
word that has “approximately the same meaning” as a given word. In part four, ability
to reason logically in terms of a foreign language is assessed. In the current
experiment, only parts five and six were administered. In part five, the task is to learn
three words in a language (Ewe59) in which nasality and lexical tone are used to
distinguish meaning.
In the first section, two words that differ in nasality (“cabin”, no nasality, [ehɔ],
and “boa”, nasal final vowel, [ehɔ̃]) are to be learnt. These two words are then
presented in sentence context (items 1 – 7) and the participant identifies these by
59 Ewe is an African language, spoken in Ghana and Togo. It has four lexical tones (Capo, 1991).
filling in blanks on the answer sheet (for the answer sheet see Appendix A8.6). Then,
a third word (“friend” [ehɔ] with rising tone) is introduced which differs from one of
the other words (boa) in lexical tone (items 8 – 15). Listeners are required to identify
whether it is “boa” or “friend” that is said in a sentence context. In the last part of the
experiment (items 16 – 30), the listener has to choose from all three words. This test
measures how well listeners can learn to perceive foreign language sounds. This
section is relevant to the current study as it tests the ability to learn new phonetic
distinctions and to recognise them in different contexts, and it is a measure of auditory
ability. Section six consists of 24 nonsense words based on English consonants and
vowels (and essentially English syllable structure). The voice on the tape pronounces
one of four words in a written response set, and the participant simply indicates which
of the four written words was spoken. An example would be an auditory presentation
of the word “trapdel”, with four written possibilities: a) trapled b) tarpled c) tarpdel d)
trapdel. This test is appropriate here because it tests the ability to convert sounds into
written output (see Appendix A8.7 for the answer sheet). In part five, the 30 items
were scored as correct or incorrect resulting in a proportion correct score. In part six,
the 24 items were scored as correct or incorrect and participants were given a
proportion correct score. Taken together, parts five and six take around 12 minutes.
8.3.5 Stimulus Material for Perceptual Identification and Production Tasks
In order to prepare the stimuli for the identification and production tasks, 27 Thai
syllables were recorded, each consisting of a consonant and a vowel, with an
accompanying tone. Three levels of each of these three features (tone, consonant, and
vowel) were examined, giving rise to 3 (tones) x 3 (consonants) x 3 (vowels), that is,
27 different stimulus combinations. In the recording process, native Thai speakers
were asked to read Thai syllables presented on flash cards, each having one
stop [p], and voiceless aspirated bilabial stop [ph]), vowel quality (closed unrounded
front vowel [i], open-mid rounded back vowel [ɔ]60, unrounded closed back vowel
[u]), and tonal contour (tone 0-mid, tone 1-low, and tone 3-high61). Table 8.1 presents
the stimulus matrix. Both male and female native Thai voices were recorded.
60 This vowel will also be labelled /o/. 61 All three of the tones were contour tones (see Figure 8.1).
The three voicing distinctions with the bilabial stop were chosen because in
Australian English only two of these distinctions (the voiceless unaspirated, as in
“spa” and the voiceless aspirated bilabial as in “part”) are phonologically relevant.
Thus, the prevoiced bilabial stop is a non-native speech sound that is phonologically
unfamiliar to the English-speaking participants. In terms of vowels, [i:], [ɔ:], and [u]
were chosen because in Australian English, [i:] as in “heed” and [ɔ] as in “hot” are
native, whereas [u] is not. The tones used in this experiment are the
mid tone (tone 0), the low tone (tone 1) and the high tone (tone 3). One reason for this
choice was that these three tones are distinct in F0, and there is no or very little
overlap between them. Another reason was that, apart from the F0 differences, they
also differ in their degree of contour. The mid and the low tone are similar in pitch
height and contour (both falling contours), whereas the high tone has a rising contour
with a slight fall at the end. All of these tones, at least at the level of phonological
distinctions, are non-native in English, although it could be argued that the mid tone
is the most familiar.
Table 8.1
Matrix Showing Stimuli Differing on Three Levels: Voice Onset Time, Lexical Tone,
and Vowel Quality. Non-native sounds (for Australian English participants) are
underlined.
                           Mid Tone (0)            Low Tone (1)            High Tone (3)
Prevoiced - b              bi:0  bɔ:0  bu:0        bi:1  bɔ:1  bu:1        bi:3  bɔ:3  bu:3
Voiceless unaspirated - p  pi:0  pɔ:0  pu:0        pi:1  pɔ:1  pu:1        pi:3  pɔ:3  pu:3
Voiceless aspirated - ph   phi:0 phɔ:0 phu:0       phi:1 phɔ:1 phu:1       phi:3 phɔ:3 phu:3
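The 27-cell matrix in Table 8.1 is simply the cross of the three factors. A brief sketch of that crossing follows; the transcription strings are illustrative labels only, not the recorded stimuli.

```python
from itertools import product

# The three levels of each factor, as in Table 8.1.
consonants = ["b", "p", "ph"]   # prevoiced, voiceless unaspirated, voiceless aspirated
vowels = ["i:", "ɔ:", "u:"]
tones = ["0", "1", "3"]         # mid, low, high

# Crossing the factors yields the 27 stimulus combinations.
stimuli = [c + v + t for c, v, t in product(consonants, vowels, tones)]
print(len(stimuli))  # 27
print(stimuli[0])    # bi:0
```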
Some of the syllables presented are real words in Thai whereas others are non-words
(see Appendix A8.7 for meanings of the words). The reason for this is simply that not
all combinations of the consonants, vowels and tones have a meaning in Thai.
However, the syllables were mostly unlike English language words, except for [bi:0]
as in “be”, [phi:0] as in “pea”, [phɔ:0] as in “paw”, and [phu:0] as in “poo”.
One male (23 years) and one female (22 years) Thai speaker were employed to record
the stimuli. Both were from Bangkok city and had lived there their whole lives. The
stimuli were recorded by the experimenter, and those (male and female) stimuli that
best matched in terms of F0 contour and duration as judged by a different native Thai
speaker were selected, resulting in a complete set of 27 stimuli from the male and the
female speaker. In both the perception (see 8.3.6) and the production (see 8.3.7) tasks,
the male participants (8 in the musician and 8 in the non-musician group) were
presented with the male speaker stimuli and the females (10 in the musician and 10 in
the non-musician group) were presented with the female speaker stimuli. The reason
for this was so that F0 of participants' productions could be more directly compared to
native speakers' models.
8.3.6 Perception of Speech Sounds
In the perception part of the study, perception of sounds (see 8.3.5) differing in lexical
tone, consonant, and vowel quality is examined. Perception was measured in an
identification test presented in the DMDX (Forster & Forster, 2003) experimental
environment (see Appendix A8.8 for an example DMDX script). There were three
parts: tone identification, consonant identification, and vowel identification, each
consisting of a practice block, a training block, and a test block. In the practice block
nine items were presented, three of each of the contrasting sounds in the relevant
dimension. For example in the consonant part, three prevoiced, three voiceless
unaspirated, and three voiceless aspirated bilabial stops were presented. Following the
practice block, in the training block, a criterion of three consecutive correct responses
had to be reached before testing continued. In this training block, the same sounds
were presented as in the practice block and feedback was provided. In the test block,
no feedback was given to participants, and there were two repetitions of each of the
27 items presented in random order. Given the 27 different stimuli, two repetitions,
and three parts of the test, participants were required to identify 162 items in total.
Participants were provided with three labelled keyboard keys and instructed to “press
the RIGHT key for one kind of sound and the SPACEBAR key for another kind of
sound, and press LEFT for a third kind of sound”. For the consonant task, the keys
were labelled LEFT “ph” (for the voiceless aspirated bilabial), SPACEBAR “p” (for
the voiceless unaspirated bilabial stop), and RIGHT ”b” (for the prevoiced bilabial
stop). In the vowel task, the three keys were labelled “i”, “o”, and “u”, and in the tone
task they were labelled with the tone contour in stylised form (as shown in Figure
8.1).
Figure 8.1. Stylised versions of the tonal contours (mid tone 0, low tone 1, high tone 3) used to label the response keys. Only the contours were presented on the keys.
The order of the experiment parts (tone, consonant, vowel) was counterbalanced
between subjects. In the test phase, responses timed out after 7000 ms, and timed-out
trials were not replaced. Each of the tasks took 4 to 5 minutes, depending on the
participants' pace.
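The training-block logic described above (feedback trials continue until three consecutive correct responses) can be sketched as follows. This is a generic illustration of the criterion rule, not the DMDX implementation; `present_trial` is a hypothetical callback standing in for one feedback trial.

```python
def trials_to_criterion(present_trial, criterion=3, max_trials=100):
    """Run feedback trials until `criterion` consecutive correct responses
    occur; return the number of trials taken (None if never reached).
    `present_trial` runs one trial and returns True for a correct response."""
    streak = 0
    for trial in range(1, max_trials + 1):
        if present_trial():
            streak += 1
            if streak == criterion:
                return trial
        else:
            streak = 0  # an error resets the consecutive-correct count
    return None

# A scripted listener: correct, correct, error, then three correct in a row.
responses = iter([True, True, False, True, True, True])
print(trials_to_criterion(lambda: next(responses)))  # 6
```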
8.3.7 Production of Speech Sounds
In the production task, the same 27 stimuli were used as in the perception task, again
with male speaker stimuli for males and female speaker stimuli for females. Stimuli to
be imitated were presented in a PowerPoint file (see Appendix A8.9 for example
screenshots of the production task). First, all 27 stimuli were presented in random
order and participants were required to repeat each sound. After this randomised
presentation, the sounds were presented in a systematic way. First, the different
consonant sounds were introduced; the three sounds that differed only in voicing were
presented in succession with a crossed-out microphone presented on the display and
the participant being instructed to listen and not repeat the sounds. Then the same
three sounds were presented again; this time separately, and the listener was required
to repeat each of the sounds after it was presented. Similarly, in the tone phase of the
experiment, all three stimuli differing in lexical tone, for example: [bi:0], [bi:1], [bi:3]
were presented for listening then repeating. In the vowel phase, the three stimuli
varying in vowel were presented. After the separate presentation of vowels, tones,
and consonants, all sounds were presented again in random order. Thus, each
participant was required to produce five repetitions of each sound, a total of 135
productions. The production task took around 12 minutes.
Following this experiment, two native Thai phoneticians were employed to rate each
participant's five productions for each sound on a scale of 1 (very bad) to 5 (very
good). Reliability between raters was high (r = .83). Details of the rating procedure
are given in Appendix A8.10. The result for each participant for each of the 27 sounds
was a mean score from 1 to 5.
8.4 Results: Separate Abilities
8.4.1 Musical Aptitude Results
Musical aptitude raw scores transformed into percentile ranks (see 8.3.2) were
analysed using a 2 (musicians/non-musicians) x 2 (rhythm/tone) analysis of variance
(ANOVA). Descriptive statistics (means and standard error bars) are shown in Figure
8.2. (For raw data see Appendix A8.11, and for statistical outputs see Appendix
A8.12.) Musicians scored significantly higher in both tone (F (1, 34) = 6.654, p < .05)
and rhythm (F (1, 34) = 4.745, p < .05) sections. No significant interaction between
the two aptitude scores and musical background was observed (F (1, 68) = .019, p >
.05).
Figure 8.2. Descriptive statistics for mean percentile-ranking scores in the musical aptitude test (tone and rhythm sections) for musicians and non-musicians.
8.4.2 Musical Memory Results
Descriptive statistics for proportion correct for familiar songs (see section 8.3.3) are
shown in Figures 8.3 and 8.4. Raw data and statistical analyses are presented in
Appendices 8.13 and 8.14. A 2 (musicians/non-musicians) x 2 (upward
shift/downward shift) x 2 (one-semitone/two-semitones) analysis of variance
(ANOVA) was conducted. No significant difference was observed between musicians
(M = .859, sd = .113) and non-musicians (M = .836, sd = .134) in the musical memory
test (F (1, 34) = .016, p > .05). There was also no significant difference between
upward and downward shifted songs (F = .056, p > .05). There was however a
significant difference between songs that were shifted by one semitone and songs that
were shifted by two semitones with the greater shifts being identified more accurately
than the smaller shifts (F (1, 16) = 5.095, p < .05).
Figure 8.3. Descriptive statistics for musical memory results (proportion correct) for musicians and non-musicians by shift size (1 vs. 2 semitones) and shift direction (upward vs. downward).
Figure 8.4. Descriptive statistics for musical memory results (proportion correct) for musicians and non-musicians for each song (Franz Ferdinand, Los Del Rio, Eamon, Cat Empire, Nelly/Kelly, Jamelia, Puff Daddy, Outkast, Queen, Men at Work, Black Eyed Peas, Michael Jackson), across shift size and shift direction.
8.4.3 Foreign Language Aptitude Results
Proportion correct scores (see section 8.3.4) for part 5 (sound discrimination) and part
6 (sound-symbol association) of the PLAB language aptitude test were analysed using
two separate analyses of variance (ANOVA). Mean values are plotted in Figure 8.5,
and raw data and analyses are presented in Appendices A8.15 and A8.16.
No significant differences between musicians and non-musicians were observed in
sound discrimination (F (1, 34) = 3.242, p > .05) or in sound-symbol association (F
(1, 34) = 1.462, p > .05).
Figure 8.5. Descriptive statistics for mean scores (proportion correct) in parts five and six of the language aptitude test for musicians and non-musicians.
8.4.4 Speech Perception
In the speech perception task, the listeners were required to identify tones (tone 0,
tone 1, tone 3), voicing (prevoiced [b], voiceless unaspirated [p], voiceless aspirated
[ph]), and vowels ([i:], [ɔ:], [u]). First, the results for all three sound types (tones,
consonants, vowels) are compared in both trials to criterion and test trial identification
performance. Then each speech sound type is investigated separately in more detailed
analyses of the particular tones, consonants, and vowels that were tested. Descriptive
statistics are shown for performance for tones, consonants, and vowels in trials to
criterion (Figure 8.6) and test trials (Figure 8.7) and then separately for tones (Figure
8.8), consonants (Figure 8.9), and vowels (Figure 8.10).
Figure 8.6. Descriptive statistics for mean trials to criterion scores for musicians and non-musicians for tones, consonants, and vowels.
Figure 8.7. Descriptive statistics for perception accuracy for musicians and non-musicians for tones, consonants, and vowels.
Figure 8.8. Descriptive statistics for tone perception scores (low and mid static tones, high dynamic tone) for musicians and non-musicians.
Figure 8.9. Descriptive statistics for consonant perception scores ([b], [p], [ph]) for musicians and non-musicians.
Figure 8.10. Descriptive statistics for vowel perception scores ([u], [i], [o]) for musicians and non-musicians.
Trials to Criterion Analysis
Three outliers were found62 (z > 3.29) and changed to one unit larger than the next
extreme score, as suggested by Tabachnick and Fidell (2001). After dealing with the
outliers, the trials to criterion scores were analysed in an analysis of variance using
planned contrasts to test for differences between speech sound types (see Table 8.2).
Raw data for the criterion results of the speech perception task and statistical analyses
are presented in Appendix A8.17 and 8.18. Descriptive statistics for trials to criterion
results across speech sounds are provided in Figure 8.6.
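The adjustment rule described above can be sketched as follows. This is a generic illustration with made-up scores (not the study data), assuming upper-tail outliers and the sample standard deviation.

```python
from statistics import mean, stdev

def adjust_upper_outliers(scores, z_cut=3.29, unit=1):
    """Replace scores with z > z_cut by one unit more than the largest
    non-outlying score (after Tabachnick & Fidell, 2001)."""
    m, s = mean(scores), stdev(scores)
    non_outliers = [x for x in scores if (x - m) / s <= z_cut]
    ceiling = max(non_outliers) + unit
    return [min(x, ceiling) for x in scores]

# Hypothetical trials-to-criterion scores with one extreme value (60):
data = [4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 60]
print(adjust_upper_outliers(data))  # the 60 becomes 8 (next extreme 7, plus 1)
```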
Table 8.2
Description of the Speech Sound Type Planned Contrasts:
The test investigates (a) differences between pitch-based sounds (tones) and the non-
pitch based sounds (consonants and vowels) and (b) consonants and vowels
                                                 Tones  Consonants  Vowels
Tones (pitch-based) vs. Consonants and
  Vowels (non-pitch-based)                         2        -1        -1
Consonants vs. Vowels                              0         1        -1
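As an arithmetic illustration of how the weights in Table 8.2 combine condition means, a brief sketch follows; the means below are hypothetical, purely to show the computation, and the actual analysis was run as an ANOVA.

```python
# Planned-contrast values: weighted sums of condition means, with
# weights summing to zero (weights as in Table 8.2).
means = {"tones": 6.0, "consonants": 8.0, "vowels": 3.0}  # hypothetical means

contrasts = {
    "pitch-based vs. non-pitch-based": {"tones": 2, "consonants": -1, "vowels": -1},
    "consonants vs. vowels":           {"tones": 0, "consonants": 1,  "vowels": -1},
}

for name, w in contrasts.items():
    assert sum(w.values()) == 0  # each contrast compares groups of means
    value = sum(w[c] * means[c] for c in means)
    print(f"{name}: {value}")
```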
Trials to criterion were analysed in a 2 (musicians/non-musicians) x speech sound
(tones, consonants, vowels) planned contrast analysis of variance (ANOVA) in which
tones were compared with consonants and vowels and consonants were compared
with vowels. As can be seen in Figure 8.6, musicians required significantly fewer trials
to reach criterion than non-musicians (F (1, 32) = 4.70, p < .05), and listeners needed
significantly fewer trials to criterion for vowels than for consonants (F (1, 32) = 48.20, p
< .05). There was also a significant interaction between musical background and the
consonants vs. vowels contrast, with no difference between musicians and non-musicians
for vowels, but fewer trials needed for consonants by musicians than non-
musicians (F (1, 32) = 5.031, p < .05). None of the other main effects or interactions
were found to be significant.
62 Outliers were changed to one unit larger than the next extreme score. There was one outlier in the consonant task (24 trials – changed to 13), two in the tone task (26 and 24 trials – changed to 19), and one in the vowel task (6 trials – changed to 4).
Identification of Tones, Consonants, and Vowels
Speech perception results for tones, consonants, and vowels were analysed in a 2
(musicians/non-musicians) x 3 (tones/consonants/vowels) ANOVA using the planned
contrasts described in Table 8.2. For raw data of the speech perception task and
statistical analysis tables see Appendices A8.19 and A8.20. Descriptive statistics are
provided in Figure 8.7.
The analysis showed that musicians had generally better perception accuracy across
all sounds (F (1, 34) = 13.154, p < .05) and that consonant perception accuracy was
significantly lower than vowel perception accuracy (F (1, 34) = 122.72, p <
.05). No other significant main effects or interactions were observed.
Tone Perception Analysis
Tone perception scores for musicians and non-musicians were analysed using the
planned contrasts described in Table 8.3. Descriptive statistics are provided in Figure
8.8.
Table 8.3
Description of the Contrasts:
The test investigates (a) differences between the two steady63 tones (low tone and mid
tone) and dynamic tone (high) and (b) between the two steady tones
                           Mid   Low   High
Steady vs. dynamic tones     1     1    -2
Steady 1 vs. steady 2        1    -1     0
The analysis revealed that musicians were significantly more accurate than non-
musicians at identifying lexical tones (Mmusicians = .799, sd = .243; Mnon-musicians = .635,
sd = .331; F (1, 32) = 11.31, p < .05). The dynamic high tone was perceived
significantly more accurately than the two steady tones (Mdynamic = .961, sd = .059;
Msteady = .595, sd = .299; F (1, 32) = 120.23, p < .05) and the low tone was identified
more accurately than the mid tone (Mlow = .662, sd = .261; Mmid = .529, sd = .323; F
(1, 32) = 20.05, p < .05). The interaction between musicianship and steady vs.
dynamic tones was also significant (F (1, 32) = 13.35, p < .05) showing that the better
perception by musicians than non-musicians occurred only for the steady low and mid
63 In this analysis, the low and mid tone are referred to as “steady” tones, because they are steadily falling, whereas the high tone rises first and falls at the end.
tones, and not for the dynamic high tone on which both groups performed equally
well.
Consonant Perception Analysis
Consonant perception accuracy was analysed using the planned contrasts described in
Table 8.4, and descriptive statistics are shown in Figure 8.9. The label “native” is
attached to the voiceless unaspirated [p] and the voiceless aspirated [ph] sounds, as
they occur in Australian English, and the label “non-native” to the prevoiced
bilabial [b], as it is not part of the Australian English phonological inventory.
Table 8.4
Description of the Contrasts:
The test investigates (a) differences between native and non-native consonants and (b)
differences between the two native consonants
                                  b (non-native)  p (native)  ph (native)
Native vs. non-native sounds            -2              1            1
Native sound 1 vs. native sound 2        0              1           -1
The difference between native and non-native consonants was found to be significant,
with the native consonants [p] and [ph] being identified less accurately than the non-
native [b] consonant (Mnative = .475, sd = .037; Mnon-native = .841, sd = .140; F (1, 32) =
104.07, p < .05). There was no significant overall difference between musicians and
non-musicians; however, the interaction with musicianship was significant (F (1, 32)
= 6.49, p < .05), with a greater difference between musicians and non-musicians on
the non-native consonant than on the native speech sounds. The difference
between the two native sounds was also significant, with sound [ph] being identified
more accurately than sound [p] (Mph = .632, sd = .267; Mp = .319, sd = .271; F (1, 32)
= 50.31, p < .05), but the interaction with musicianship was not significant. These
results seem surprising at first, as it was expected that the native sounds would be
easier to identify than the non-native consonants. The reason for better identification
of [b] than [p] and [ph] however may be attributed to the use of labels in the current
experiment. The label “b” is unambiguous, whereas “p” and “ph” are more prone to
being confused, which may be the reason for lower identification scores for these two
native sounds.
Vowel Perception Analysis
Vowels were analysed using planned comparisons shown in Table 8.5. Descriptive
statistics are provided in Figure 8.10. The vowels [i] and [o] are labelled native
because they correspond to phonemes in Australian English, and the [u] vowel is non-
native as it is not part of the Australian vowel system.
Table 8.5
Description of the Contrasts:
The test investigates (a) differences between native and non-native vowels and (b)
differences between native vowel [i] and native vowel [o]
[u] [i] [o]
Native [i, o] vs. non-native [u] vowels 2 -1 -1
Native vowel 1 [i] vs. native vowel 2 [o] 0 1 -1
Analysis revealed no significant differences between musicians and non-musicians in
the perception of vowels (F (1, 32) = .946, p > .05), no significant differences
between native and non-native vowels or between the two native vowels, and no
significant two-way interactions.
8.4.5 Speech Production
Rated speech production performance (see 8.3.6) will be considered first generally
across tones, consonants, and vowels and then more specifically for each of these.
Descriptive statistics are shown for all three (in Figure 8.11) and for tones,
consonants, and vowels in Figures 8.12, 8.13, and 8.14 respectively.
Overall Speech Production Results
Speech production results were analysed in a 2 (musicians/non-musicians) x 3
(tones/consonants/vowels) ANOVA using the planned contrasts described in Table
8.2 (section 8.4.4). Results are schematically presented in Figure 8.11, raw data and
analyses are presented in Appendix A8.21 and 8.22. Musicians show generally better
speech production accuracy across all sounds (Mmusicians = 3.24, sd = .523; Mnon-musicians
= 3.01, sd = .451; F (1, 34) = 5.56, p < .05), and vowel production accuracy (Mvowel =
3.32, sd = .0344) was significantly higher than consonant production accuracy
(Mconsonant = 3.01, sd = .0224; F (1, 34) = 47.73, p < .05). No other significant main
effects or interactions were observed.
Figure 8.11. Descriptive statistics for speech production scores for musicians and non-musicians for tones, consonants, and vowels
Figure 8.12. Descriptive statistics for tone production scores for musicians and non-musicians

Figure 8.13. Descriptive statistics for consonant production scores across musicians and non-musicians

Figure 8.14. Descriptive statistics for vowel production scores for musicians and non-musicians
Tone Production
Tone production ratings were analysed in a 2 (musicians/non-musicians) x 3 (mid
tone/low tone/high tone) ANOVA with the same planned contrasts used in the tone
perception analysis (see Table 8.3, section 8.4.4). Results are shown schematically in
Figure 8.12. Musicians were found to be significantly better at producing tones
overall (F (1, 30) = 9.956, p < .05), and the mid tone was produced significantly better
than the high tone (F (1, 30) = 15.86, p < .05). None of the other main effects or
interactions were found to be significant.
Consonant Production
Consonant production ratings were analysed in a 2 (musicians/non-musicians) x 3
(prevoiced/voiceless-unaspirated/voiceless-aspirated) ANOVA using the same
planned contrasts as in consonant perception (see Table 8.4, section 8.4.4).
Descriptive statistics are shown in Figure 8.13. Musicians were significantly more
accurate at producing consonants in general (F (1, 32) = 5.318, p < .05) and
production of the native sounds ([p] and [ph]) was significantly more accurate than
production of the non-native [b] consonant (F (1, 32) = 12.364, p < .05). It was also
observed that the voiceless aspirated consonant was rated as being produced
significantly more accurately than the voiceless unaspirated consonant (F (1, 32) =
16.06, p < .05). No other significant differences were observed.
Vowel Production
Vowel production ratings were analysed in a 2 (musicians/non-musicians) x 3 (vowel
u/vowel i/vowel o) ANOVA according to the contrasts described in Table 8.5. For
descriptive statistics see Figure 8.14. No significant difference was found between
musicians and non-musicians in vowel production (F (1, 32) = 3.676, p > .05),
however, the native [i] vowel was generally produced significantly better than the
native [o] vowel (F (1, 32) = 12.42, p < .05). No other main effects or interactions
were significant.
8.5 Results: Comparison of Perception and Production
Scores on perception (scale from 0 to 1) and production (scale from 1 to 5) of tones,
consonants and vowels in musicians and non-musicians are compared in this section
using Pearson product-moment correlations. The critical value for correlations for
tones, consonants, and vowels overall was rcrit = .231, and the critical value for specific
sounds was rcrit = .40. There were only a few significant
correlations between perception and production in musicians (for [b], vowels overall,
[i], and [o]) and non-musicians (consonants overall). For tones there were no
significant correlations between perception and production. Perception and production
of consonants was positively correlated in non-musicians (r = .265), and there was a
significant negative correlation between perception and production of the prevoiced
bilabial [b] in musicians (r = -.761). This indicates that participants who were good at
perceiving prevoiced stops were poor at producing them, which suggests that the
perceptual salience of this contrast for the perceiver actually detracts from the ability
to produce it. One possible explanation could be that listeners correctly label
the prevoiced bilabial stop but consistently produce it incorrectly. Vowel results
show a significant correlation between perception and production of the vowels in
general in musicians (r = .385) but not in non-musicians. Apart from the general
vowel perception and production correlation in musicians, perception and production
of [o] (r = .529) and [i] (r = .490) were also positively correlated in musicians, but not
in non-musicians.
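The critical values quoted above follow from the standard relation between Pearson's r and the t distribution. As a small sketch (the sample sizes behind the thesis's rcrit values of .231 and .40 are not restated here, so the example below uses the per-group n of 18 purely for illustration):

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of Pearson's r for a sample of size n,
    derived from the t distribution with n - 2 degrees of freedom."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / np.sqrt(n - 2 + t_crit ** 2)

# Example: with 18 participants per group, any correlation computed within
# one group must exceed roughly .47 to reach two-tailed significance at .05.
print(critical_r(18))
```

Larger samples give smaller critical values, which is why the overall measures (pooled across sound types) have the lower rcrit.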
8.6 Results: Determinants of Perception and Production
A principal components analysis was conducted to reduce the variables contributing
to musical training obtained from the questionnaire (see Appendix A8.1
and 8.2), and the resulting factor was then used along with language aptitude, musical
aptitude, musical memory, and musical training in six separate sequential linear
regression analyses to predict perception and production of tones, consonants, and
vowels.
8.6.1 Factor Analysis for Data Reduction
In order to reduce the data for musical training, a principal components analysis was
performed on the three musical experience variables obtained in the questionnaire:
number of instruments played, total number of years playing music, and hours a week
currently played, for all 36 participants (18 musicians and 18 non-musicians). There
were no missing data and the one outlier was dealt with according to the procedure
suggested by Tabachnick and Fidell (2001)64. The raw data and analyses are presented
in Appendix A8.23 and 8.24. A single component with an eigenvalue greater than one
was extracted, which accounts for 72.11% of the variance. The component loadings,
communalities (h2), and percentages of variance explained are shown in Table 8.6. As
can be seen the three variables all load on the component relatively equally. The
factor was labeled musical training.
Table 8.6
Principal Component Loadings and Communalities (h2) for Music Training Variables.
Item                     Factor 1    h2
Number of instruments    .884        .781
Years played             .827        .683
Hours per week           .836        .699
% of variance            72.11
Label                    Musical training
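As a sketch of the extraction procedure (with invented musical-experience data, not the thesis data), a single-component solution under the Kaiser eigenvalue-greater-than-one criterion can be written as:

```python
import numpy as np

# Hypothetical musical-experience data for 36 participants; columns are
# number of instruments, years played, and hours per week. Illustrative only.
rng = np.random.default_rng(1)
skill = rng.normal(size=36)                      # latent "musical training"
X = np.column_stack([
    2 + 1.5 * skill + rng.normal(0, 0.8, 36),    # instruments
    6 + 4.0 * skill + rng.normal(0, 2.0, 36),    # years
    3 + 2.5 * skill + rng.normal(0, 1.5, 36),    # hours/week
])

# Principal components on the correlation matrix (i.e., standardized variables)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep components with eigenvalue > 1 (Kaiser criterion, as in the thesis)
n_keep = int((eigvals > 1).sum())

pc1 = eigvecs[:, 0]
if pc1.sum() < 0:                                # eigenvector sign is arbitrary
    pc1 = -pc1
loadings = pc1 * np.sqrt(eigvals[0])             # first-component loadings
communalities = loadings ** 2                    # h2 for a one-factor solution
pct_variance = 100 * eigvals[0] / eigvals.sum()
score = Z @ pc1                                  # "musical training" factor score
print(n_keep, round(pct_variance, 2), np.round(loadings, 3))
```

The factor score computed on the last line is the kind of composite entered as "musical training" in the regressions below.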
8.6.2 Correlations Between Variables
Correlations between the four independent variables (language aptitude, musical
aptitude, musical memory, and musical training) were computed and are shown
schematically in Table 8.7. As can be seen in the results, there is a high correlation
between language aptitude and musical aptitude. This could indicate the involvement
of a general aptitude effect. A high correlation was also found between musical aptitude
and musical training. Thus, there seems to be overlap between inherent music
abilities (aptitude) and experiential music factors (experience). Musical memory
shows no significant correlation with any of the factors, which suggests that this is an
inherent ability that is unrelated to any other aspect of music or to language aptitude.
64 There was one outlier in the musician group for number of instruments played. It was changed to one unit larger than the next extreme score (from 14 to 7).
Table 8.7
Intercorrelations Among Language Aptitude, Musical Aptitude, Musical Memory, and
Musical Training
Variables                   Language    Musical     Musical    Musical
                            Aptitude    Aptitude    Memory     Training
                            (PLAB)      (AMMA)
Language Aptitude (PLAB)    1           .602**      .169       .218
Musical Aptitude (AMMA)                 1           .277       .528**
Musical Memory                                      1          .101
Musical Training                                               1
** p < .01
Correlations between the six outcome measures (perception and production of tones,
consonants, and vowels) and the four predictor variables are schematically presented
in Table 8.8.
Table 8.8
Descriptive Statistics and Correlations between the Six Dependent Variables and the
Four Independent Variables

                          Language    Musical     Musical    Musical
                          Aptitude    Aptitude    Memory     Training
                          (PLAB)      (AMMA)
Tone Perception .195 .467** .244 .391*
Tone Production .253 .185 .458** .244
Consonant Perception .536** .539** -.117 .315*
Consonant Production .519** .295* .083 .127
Vowel Perception .167 .246 -.027 .269
Vowel Production .325* -.090 .181 -.077
* p < .05, ** p < .01
Table 8.8 shows significant correlations between tone perception and musical aptitude
as well as between tone perception and musical training. This indicates that musically
trained participants and participants with high musical aptitude are also good at
perceiving tones; this is of interest in light of the high correlation between musical
aptitude and training (see Table 8.7), which implies high intercorrelations among all
three of these factors. Tone production is not correlated with the same variables as tone
perception: here musical memory provides the only significant correlation. This tone
production/music memory correlation is intriguing in light of the fact that music
memory is not correlated with any of the other inherent and experiential variables (see
Table 8.7). Of note is that neither tone perception nor tone production are correlated
with language aptitude. The exact relationship between tone perception and
production and measures of inherent abilities and experiential abilities will be
investigated further in the regression analyses.
Consonant perception and production are both highly correlated with language and
musical aptitude. Thus high scores on the inherent ability tests are associated with high
consonant perception and production. Again, this is interesting in light of the high correlation
between music and language aptitude (see Table 8.7) and may indicate that a general
aptitude factor underlies perception and production. Interestingly, as for tone
perception, musical training is significantly correlated with perception of consonants
suggesting a general influence of music training on speech perception.
In vowel perception and production the picture is fairly simple; the only significant
correlation here is between vowel production and language aptitude. Thus neither the
possible general aptitude effect, nor the possible effect of musical training on speech
perception appears to operate. This may be due to the nature of vowels or to the high
scores in vowel perception.
8.6.3 Sequential Regressions
Six separate sequential linear regressions were performed, one each for perception
and production of tones, consonants, and vowels. In each, the four independent
predictor variables were: language aptitude, musical aptitude, musical memory,
and musical training (the variable that was extracted out of the music training
variables in the factor analysis, see section 8.6.1). Raw data and analyses for the
regression are presented in Appendix A8.25 and 8.26. The sequential regression
procedure was chosen in order to identify the additional variance explained by
subsequently entered variables over and beyond the preceding variables. Language
aptitude was entered in the first block, as it is the variable that is expected to be most
closely related to each of the dependent variables (perception and production of tones,
consonants, and vowels). In the next block, musical aptitude was inserted, as, along
with language aptitude the musical aptitude test is designed to measure an inherent
ability. After that, in block 3, the musical memory variable, another presumably
inherent ability, was inserted. In the final block musical training was entered, as it
measures the least inherent factor that has very little relation to predisposition, as it
consists of musical experience data. Regression tables are presented in Appendix
A8.26 and results for each of the six regressions are discussed below.
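The block-entry logic can be sketched as follows. The data are invented and the helper is a simplified stand-in for the statistics package presumably used, but the R2-change and F-change computations mirror the sequential procedure just described:

```python
import numpy as np
from scipy import stats

def r_squared(X, y):
    """R^2 from an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def sequential_regression(blocks, y):
    """Enter predictor blocks one at a time; report R2, R2 change, and the
    F test for each change (df = block size, n - k - 1)."""
    n = len(y)
    results, cols, r2_prev = [], [], 0.0
    for name, X in blocks:
        cols.append(np.asarray(X).reshape(n, -1))
        Xall = np.column_stack(cols)
        k = Xall.shape[1]
        r2 = r_squared(Xall, y)
        dk = cols[-1].shape[1]
        f_change = ((r2 - r2_prev) / dk) / ((1 - r2) / (n - k - 1))
        p = stats.f.sf(f_change, dk, n - k - 1)
        results.append((name, r2, r2 - r2_prev, f_change, p))
        r2_prev = r2
    return results

# Illustrative data only (35 cases, as in the tone-perception regression)
rng = np.random.default_rng(2)
lang_apt, mus_apt = rng.normal(size=35), rng.normal(size=35)
mus_mem, mus_train = rng.normal(size=35), rng.normal(size=35)
y = 0.5 * mus_apt + rng.normal(size=35)   # outcome driven by musical aptitude

blocks = [("language aptitude", lang_apt), ("musical aptitude", mus_apt),
          ("musical memory", mus_mem), ("musical training", mus_train)]
summary = sequential_regression(blocks, y)
for name, r2, dr2, f, p in summary:
    print(f"{name:18s} R2 = {r2:.3f}  dR2 = {dr2:.3f}  F = {f:.2f}  p = {p:.3f}")
```

Reordering the `blocks` list reproduces the kind of alternative-order check applied later in this section.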
8.6.3.1 Tone Perception and Production
Tone Perception: In block 1 language aptitude alone did not predict perception of
tone (R = .195, R2 = .038, adjusted R2 = .009, F (1, 33) = 1.306, p > .05). In block 2
when musical aptitude is added to the regression, the resultant combination did
predict tone perception (R = .479, R2 = .230, adjusted R2 = .182), and the addition of
musical aptitude led to a significant R2 change of .192 (F (2, 32) = 4.775, p < .05).
The addition of musical memory in the next block did not result in significant R2-
changes (R = .494, R2 = .244, adjusted R2 = .171, F (3, 31) = 3.336, p > .05), nor did
the addition of musical training in the final block (R = .522, R2 = .272, adjusted R2 =
.175, F (4, 30) = 2.806, p > .05). Thus musical aptitude appears to be the most
important predictor of perception of tones.
Tone production: Language aptitude alone did not predict production of tone (R =
.253, R2 = .064, adjusted R2 = .036, F (1, 34) = 2.320, p > .05), nor did the
combination of language and musical aptitude (R = .256, R2 = .064, adjusted R2 =
.036, F (2, 33) = 1.157, p > .05). When musical memory is added, however, there is a
significant R2 change (R = .494, R2 = .244, adjusted R2 = .173; R2 change = .179, F (3,
32) = 3.447, p < .05), which suggests that tone production can best be explained by
the addition of musical memory to the regression equation.
8.6.3.2 Consonant Perception and Production
Consonant Perception: Language aptitude alone predicts perception of consonants (R
= .536, R2 = .287, adjusted R2 = .266, F (1, 34) = 13.709, p < .05). The addition of
musical aptitude does not lead to a significant R2-change (R = .600, R2 = .360,
adjusted R2 = .322, F (2, 33) = 9.297, p > .05), however when musical memory is
added to the model, there is significant R2-change (R = .662, R2 = .438, adjusted R2 =
.385, R2 change = .078; F (3, 32) = 8.310, p < .05). Addition of musical training in
block 4 did not reliably improve R2. Thus, perception of consonants is best explained
by a combination of language aptitude and musical memory.
Consonant Production: Language aptitude alone best predicts production of
consonants (R = .519, R2 = .269, adjusted R2 = .247, F (1, 34) = 12.551, p < .05). The
addition of the other variables does not lead to significant R2-changes, which suggests
that language aptitude alone explains production of consonants.
8.6.3.3 Vowel Perception and Production
Vowel Perception: Language aptitude alone does not predict perception of vowels (R
= .167, R2 = .028, adjusted R2 = -.001, F (1, 34) = .979, p > .05), nor does addition of
any further variables. It is of note that there was a ceiling effect in the vowel
perception results, which may account for this lack of significant prediction.
Vowel production: In the case of vowel production, language aptitude alone is not a
significant predictor (R = .325, R2 = .106, adjusted R2 = .079, F (1, 34) = 4.019, p >
.05). When musical aptitude is added, the combination does predict vowel production
(R = .484, R2 = .234, adjusted R2 = .188, R2 change = .128; F (2, 33) = 5.046, p < .05).
Addition of musical memory or musical training in the next blocks does not reliably
improve R2.
Table 8.9 summarizes the predicting variables. It can be seen that tone perception and
production can be best explained by inherent musical abilities (musical aptitude and
musical memory respectively), consonant perception and production are predicted by
inherent language aptitude (and musical memory in perception), and vowel
production is best explained by musical aptitude, whereas none of the variables
explains perception of vowels.
Table 8.9
Summary of Significant Predictors when added to the Regressions. R2 change Values
are shown in brackets.
Speech Sound Type Perception Production
Tones Musical Aptitude (.192) Musical Memory (.179)
Consonants Language Aptitude (.287)
Musical Memory (.078)
Language Aptitude (.269)
Vowels No predictors Musical Aptitude (.128)
Thus, it seems that language and musical aptitude are generally good predictors for
perception and production of foreign speech sounds. Considering the very high
correlation between musical aptitude and language aptitude, and the similarly high
correlation between musical aptitude and musical training, one might suspect that this
pattern of results is influenced by the order of blocks that was chosen. In order to
exclude this possibility, an alternative model was tested in which the first predictor
remained language aptitude; the second block was now musical training, followed by
musical memory and, in the final block, musical aptitude. That is, the positions of musical
aptitude and musical training, which are highly correlated, were reversed. Tables for
the alternative model regressions are given in Appendix A8.27 and are summarised in
Table 8.10.
Table 8.10
Summary of Significant Predictors when added to the Alternative Regressions. R2
change Values are shown in brackets.
Speech Sound Type Perception Production
Tones Musical Training (.128) Musical Memory (.167)
Consonants Language Aptitude (.287) Language Aptitude (.269)
Vowels No Predictors Musical Aptitude (.134)
The results of the alternative regression analyses indicate that, irrespective of the
order of steps, the main results remain the same. There were two changes. First, tone
perception is predicted by music aptitude in the preferred model but musical training
in the alternative model. This suggests that due to their high correlation it is difficult
to ascertain precisely whether musical aptitude or training is the critical predictor of
tone perception. Second, musical memory dropped out as a predictor of consonant
perception in the alternative model. Musical memory and musical aptitude remained
significant predictors of tone production and vowel production, respectively, across the two models.
8.6.4 Alternative Approach to Participant Grouping
Because of the high incidence of low scores on the musical training variable (the non-
musicians' scores on musical training, in particular, were close to zero), the
distributions could be skewed. In order to investigate this further, the musicians' and
non-musicians' data were analysed separately. Six separate sequential linear
regressions were performed for musicians, one each for perception and production of
tones, consonants, and vowels. In each, the four independent predictor variables were:
language aptitude, musical aptitude (the rhythm score and the tonal score were
entered separately, but in the same step of the regression), musical memory, and
musical training. The sequential regression procedure was chosen in order to identify
the additional variance explained by subsequently entered variables over and beyond
the preceding variables. Language aptitude was entered in the first block, as it is the
variable that is expected to be most closely related to each of the dependent variables
(perception and production of tones, consonants, and vowels). In the next block,
musical aptitude, measured as tonal and rhythm (separately) aptitude was entered, as,
along with language aptitude the musical aptitude test is designed to measure an
inherent ability. After that, in block 3, the musical memory variable, another
presumably inherent ability, was entered. In the final block musical training was
entered, as it measures the least inherent factor that has very little relation to
predisposition, as it consists of musical experience data.
For the non-musicians, another set of six sequential linear regressions was performed,
in which the predictor variables were: language aptitude, musical aptitude (the rhythm
score and the tonal score, entered in the same step, but as separate variables), and
musical memory.
Non-Musician results:
Tone Perception: None of the variables predict tone perception in non-musicians.
Tone Production: Language aptitude alone does not predict production of tones,
nor does the addition of either of the two musical aptitude scores; however, when
musical memory is added to the model, there is a significant R2-change (R = .850, R2 =
.723, adjusted R2 = .637, R2 change = .623; F (4, 13) = 8.469, p < .05). Thus,
production of tones is best explained by the addition of musical memory to the model.
Consonant Perception: Language aptitude alone does not predict perception of
consonants, nor does the addition of either of the two musical aptitude scores;
however, when musical memory is added to the model, there is a significant R2-change
(R = .786, R2 = .617, adjusted R2 = .500, R2 change = .341; F (4, 13) = 5.247, p <
.05). Thus, perception of consonants is best explained by the addition of musical
memory.
Consonant Production: None of the variables in the analysis predicts consonant
production in non-musicians.
Vowel Perception: None of the variables in the regression predicted vowel perception
in non-musicians.
Vowel Production: Regression analyses showed that none of the entered variables
predicted vowel production either.
These results from an analysis including only the non-musicians‟ results are different
from those obtained when analysing both groups together: tone perception is not
predicted by musical aptitude in non-musicians, however the predictor for tone
production is still the addition of musical memory to language aptitude. In consonant
perception, the non-musician results are similar to the results for the two groups
combined: the addition of musical memory to language aptitude best predicts
consonant perception, however, unlike the original analysis, language aptitude alone
does not predict consonant perception. Consonant production in non-musicians is not
predicted by any of the variables used here, even though language aptitude was a
significant predictor in the original analysis.
Vowel perception in non-musicians is not predicted by any variables; and this is the
same as in the combined analysis. Vowel perception, originally best predicted by the
addition of musical aptitude to language aptitude, is now not predicted by any of the
variables.
Thus, overall it seems that the exclusion of the musicians from the analysis leads to
fewer significant predictors in the non-musician group's perception and production
with the only significant predictor for these non-musicians being musical memory.
Musician results:
Tone Perception: None of the variables entered in the sequential regression led to a
significant R2 change in musicians‟ tone perception abilities.
Tone Production: None of the variables in the regression predicted tone production in
musicians.
Consonant Perception: Language aptitude alone predicts perception of consonants (R
= .616, R2 = .379, adjusted R2 = .205, F (1, 16) = 9.780, p < .05). The addition of
musical aptitude (tone and rhythm scores, which were entered separately) does not
lead to a significant R2-change, and neither does musical memory or musical training.
Thus, perception of consonants is best explained by inherent language abilities,
measured as language aptitude.
Consonant Production: Language aptitude alone predicts production of consonants in
musically trained participants (R = .517, R2 = .267, adjusted R2 = .221, F (1, 16) =
5.826, p < .05). The addition of musical aptitude (rhythm and tone, entered as separate
variables but in the same step) also leads to significant R2-changes (R = .667, R2 =
.445, adjusted R2 = .326, F (3, 14) = 3.745, p < .05). (More specifically, both tone and
rhythm predict consonant production when tone is entered first, but only tone predicts
it when rhythm is entered first, suggesting perhaps that musical aptitude for tone in
musicians may be more important in consonant production than that for rhythm.)
Furthermore, the addition of both musical memory (R = .748, R2 = .560, adjusted R2 =
.425, F (4, 13) = 4.135, p < .05) and musical training (R = .819, R2 = .671, adjusted
R2 = .534, F (5, 12) = 4.889, p < .05) adds to the predictive power for consonant
production. This suggests that production of consonants in musicians can best be
explained by a combination of language aptitude, musical (especially tone) aptitude,
music memory and musical training.
Vowel Perception: Language aptitude alone does not predict perception of vowels,
nor does addition of any other variables.
Vowel Production: In the case of vowel production, language aptitude alone is not a
significant predictor. The addition of musical aptitude significantly predicts vowel
production (R = .700, R2 = .490, adjusted R2 = .381, F (1, 16) = 4.481, p < .05).
Addition of musical memory (R = .717, R2 = .514, adjusted R2 = .364, F (4, 13) =
3.432, p < .05) and musical training (R = .798, R2 = .637, adjusted R2 = .485, F (5, 12)
= 4.205, p < .05) in the next blocks also reliably improves R2. Thus, vowel production
in musicians is best predicted by a combination of music aptitude, musical memory,
and musical training.
This set of results is different from that obtained in the original analysis in terms of
tone perception and production; these were predicted by the addition of musical
aptitude and musical memory respectively, however, when only musicians are
analysed, no significant predictors were found. Consonant perception is still predicted
by language aptitude; however, the addition of musical memory does not lead to
significant R2 changes. Consonant production, however, which was originally predicted
significantly only by language aptitude, is now predicted by a combination of
language and music aptitude, plus music memory and training.
Vowel perception results are the same in this and the previous analysis: no significant
predictors were found. Vowel production however, previously predicted by musical
aptitude alone, is now predicted by a combination of music aptitude, memory, and
training.
The results of this alternative approach, in which musicians were analysed separately,
show great differences in results: tone perception and production are not predictable,
nor is perception of vowels. Consonant perception is predicted by language
aptitude, and consonant production is predicted by a combination of all four variables.
Finally, vowel production is predicted best by a combination of musical aptitude,
memory, and training.
The differences between musicians and non-musicians are evident: while the non-
musicians' results were only predicted by musical memory, musicians' results are
more complex, and combinations of different variables seem to predict them best,
especially in production of consonants and vowels.
Parallels between the current separate analyses and the previous combined results
for non-musicians are found in production of tones, perception of
consonants, and perception and production of vowels. When comparing the
musicians' results to the original set of predictors, no similarities are found, except for
the fact that vowel perception is still not predicted by any of the entered variables.
Given these differences there is a need for more comprehensive future studies in
which a greater range of musical abilities and training is sampled than in the current
study.
It can be concluded that non-musicians' speech perception and production abilities are
best predicted by musical memory, whereas in musicians, it is a combination of
training and inherent abilities that best predicts their speech perception and production.
However, since the assumptions for these analyses were not met (minimum number of
participants required in the musician group: 24; minimum number in the non-musician group:
20; in the current study only 18 participants per group were tested), the
conclusions drawn from these results are tentative and can only be seen as
exploratory.
8.7 Discussion
The results show that musicians are better than non-musicians at tone perception and
production, consonant perception and production, and in musical aptitude, but there
were no differences between musicians and non-musicians in language aptitude,
musical memory, or perception and production of vowels. There was some correlation
between perception and production of consonants and vowels, however mainly for the
musicians, which is interesting because it shows that, in musically trained listeners,
consonant and vowel perception and production may be processed similarly. The
regression results show that the critical factors in tone perception and production are
the inherent musical abilities: musical aptitude for perception and musical memory for
production.
Consonant perception and production are mainly explained by language aptitude, and
vowel perception is not explained by any of the variables; vowel production, however,
is best predicted by musical aptitude (but see section 8.6.4 for an alternative approach). In
the section below the results are discussed in terms of what the components of
musical training might be, how music training might affect perception and production,
and finally the musical determinants of speech sound perception and production.
8.7.1 The Nature of Musicianship
Accuracy on the musical memory test was similar for musicians and non-musicians.
This confirms earlier results, that both musicians and non-musicians have quite
accurate long-term memory for pitch (Schellenberg & Trehub, 2003). These results
provide evidence that adults with little musical training remember the pitch level of
familiar instrumental recordings, as reflected in their strong ability to distinguish
correct versions from versions shifted upward or downward by 1 or 2 semitones.
Their failure to identify the correct pitch level of unfamiliar musical recordings
excludes contributions from possible artifacts of the pitch shifting process
(Schellenberg & Trehub, 2003).
The very high accuracy with which these musicians and non-musicians without
absolute pitch65 (AP) identified pitch shifting demonstrates that most people retain
fine-grained information about pitch height over long periods. This could indicate that
music listeners create very accurate representations of musical pieces that contain
absolute and relational characteristics (Dowling, 1999). The results also show that this
type of absolute memory for pitch is much more widespread than the traditional type
of AP, in which the listener names or reproduces tones, isolated from musical
contexts. It is therefore possible that it is the aspect of pitch naming, rather than that of pitch memory, that is responsible for the rarity of the traditional form of AP (see also
Chapter 4).
65 Two of the participants in the current experiment were possessors of absolute pitch. This did not, however, influence the results, as they did not show enhanced musical memory (.63 and .91 proportion correct) compared to participants without absolute pitch (M = .854).
Musicians' musical aptitude scores were found to be significantly higher than non-musicians' scores (see 8.4.1) and there was a high correlation between music aptitude
and training (see 8.6.2). There are two main possible explanations of this. The first
concerns self-selection: those people who have high musical aptitude are probably
more likely to learn an instrument, due possibly to self-motivation or encouragement
from observant parents or teachers. Secondly, once musical training begins, those
people with higher musical aptitude quite probably learn music more quickly and
easily, and thus do not quit playing music as easily as people with lower aptitude.
Musicians and non-musicians scored similarly on language aptitude (see 8.4.2) and
there was a low correlation between music training and language aptitude. Thus it
appears that musical training is not related to language aptitude in the measures used
here.
8.7.2 Musicianship and the Perception and Production of Speech Sounds
Based on the results of the separate abilities, it appears that what distinguishes musicians from non-musicians is their musical experience and higher musical aptitude, with no differences in musical memory or language aptitude. Now the effect
of musicianship on the perception and production of speech sounds will be
considered.
The results of the speech perception and production tasks show that musicians are
significantly better than non-musicians at perceiving and producing tones and
consonants but there was no difference between musicians and non-musicians in the
perception and production of vowels. The possible reasons for the systematic
superiority of musicians over non-musicians are set out below, separately for tones
and consonants.
Tones are a feature of music in all cultures, but not a feature of speech in all
languages. Musicians are continually exposed to small tonal variations. It was
therefore expected that tone perception would be better in musicians than in non-
musicians. The results of the current study confirm this; musicians learn to identify
tones more quickly than non-musicians, and are more accurate at discriminating
subtle differences between tones than non-musicians. Thus it appears that musical
training might enhance listeners' acquisition of new non-musical tonal distinctions
and their ability to perceive and produce tones. A reason for the superiority of
musicians over non-musicians could be that musicians have better pitch pattern
processing skills. Jakobson, Cuddy, and Kilgour (2003) found that musical training
engages and refines processes involved in pitch pattern analysis and these may be
activated in the current tasks. Thus it is possible that music instruction affects
categorical perception of tone indirectly by strengthening auditory temporal
processing skills (Jakobson et al., 2003), which allows musicians to discriminate
better between rapidly changing acoustic events. It is quite possible that such skills
may be of use here in the foreign language perception and production tasks.
For consonants, the essential differences between [b], [p], and [ph] are small voice onset time (VOT) differences. Music is not only about melody and harmony, but
also about rhythm and timing. So one reason why musicians are better at perceiving
and producing consonants could be that they are more finely tuned to small rhythmic
differences than non-musicians. In fact it has been found that musicians are better
than non-musicians in judging temporal order (Koh, Cuddy, & Jakobson, 2001). This
skill may be useful in the consonant perception task here, which required fine-grained
discrimination of VOT. Indeed, Koh et al. (2001) suggest that increased exposure to
temporal distinctions may improve processing of subtle temporal order differences, such as small voice onset time distinctions.
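As an illustrative sketch only, the three-way labial stop contrast can be thought of as a categorisation along the single VOT dimension. The boundary values below (~0 ms and ~+30 ms) are rough, textbook-style approximations assumed for illustration, not the stimulus values used in the experiments:

```python
def vot_category(vot_ms: float) -> str:
    """Classify a labial stop by voice onset time (VOT), in milliseconds.

    The boundaries (~0 ms and ~+30 ms) are rough illustrative
    approximations, not measurements from the experiments reported here.
    """
    if vot_ms < 0:
        return "b"   # voicing lead (prevoiced): voicing starts before release
    if vot_ms < 30:
        return "p"   # short lag: voicing starts shortly after release
    return "ph"      # long lag: aspirated

print([vot_category(v) for v in (-80.0, 10.0, 70.0)])  # ['b', 'p', 'ph']
```

The point of the sketch is that the perceptual task reduces to locating a token relative to two temporal boundaries a few tens of milliseconds apart, which is why fine temporal sensitivity plausibly matters.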
8.7.3 Musical Determinants of Speech Perception and Production
Now that we know the nature of musical training and the influence of musical training
on perception and production of speech sounds, the relative contribution of musical
training and other variables to speech sound perception and production can be
considered.
Musicians score higher on musical aptitude and are better at speech sound learning, tone identification, and identification of non-native consonants, and they show greater correlations between perception and production of speech sounds than non-musicians. However, a more
analytic look at the results through regression analyses shows that it may not be
musical training per se that contributes to the explanation of speech sound perception
and production. The determinants for tone, consonants, and vowels are considered
separately below.
Tones: Language aptitude is not important in the perception and production of lexical tone; rather, it is the additional effect of musical aptitude that makes a good tone perceiver, and of musical memory that makes a good tone producer. Musical training does not predict tone perception or production, showing that it is not necessary that listeners have actual experience in perceiving or producing small tonal changes (when learning an instrument). Rather, inherent musical aptitude and inherent musical memory aid in tone perception and production, irrespective of degree of training. A rider must be added to this explanation for tone perception. There, in the
alternative regression model, it was in fact the addition of musical training that added
predictive power to the regression. This, along with the high correlation between musical aptitude and musical training, suggests that neither musical training nor musical aptitude per se consistently predicts tone perception66. However, tone
production is resistant to the model change and musical memory remains the
significant addition in both models67.
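The sense in which a variable "adds predictive power" in these regression analyses can be illustrated with a small simulation. The data below are invented for illustration only (they are not the thesis data): a tone-perception score is generated so that it depends mostly on a "musical aptitude" predictor, and the in-sample R² rises when that predictor is added at step 2.

```python
import random

random.seed(0)
n = 120
# invented, roughly standardised predictor scores (names mirror the thesis measures)
lang_apt = [random.gauss(0, 1) for _ in range(n)]
music_apt = [random.gauss(0, 1) for _ in range(n)]
# simulated tone-perception score, driven mostly by "musical aptitude"
tone_perc = [0.1 * la + 0.7 * ma + random.gauss(0, 0.5)
             for la, ma in zip(lang_apt, music_apt)]

def ols_r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    X = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(X[0])
    # normal equations (X'X) beta = X'y, solved by Gaussian elimination
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    y_hat = [sum(be * xi for be, xi in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

r2_step1 = ols_r_squared([lang_apt], tone_perc)             # language aptitude only
r2_step2 = ols_r_squared([lang_apt, music_apt], tone_perc)  # + musical aptitude
print(f"R2 step 1 (language aptitude only): {r2_step1:.3f}")
print(f"R2 step 2 (+ musical aptitude):     {r2_step2:.3f}")
```

In this simulated case R² at step 2 is clearly higher than at step 1, which is the pattern reported above for tone perception when musical aptitude is added to language aptitude.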
Consonants: For consonants, it is clear that language aptitude alone can predict
participants' ability for producing consonants. This is also the case in consonant
perception; however, the addition of musical memory also adds to the prediction.
Together, these results show that it is inherent language ability that leads to good
consonant perception and production abilities and for consonant perception the
inherent ability, musical memory (not musical training), also adds to the results68.
Vowels: Finally, for vowels, none of the independent variables served as a good
predictor for vowel perception (note the ceiling effect in vowel perception, see section
8.4.4). In vowel production, however, the best predictor was a combination of
language and musical aptitude. This indicates that language aptitude alone does not predict vowel production and that musical training is, again, not required to explain accuracy of vowel perception or production69.

66 Note that when musicians' and non-musicians' data are analysed separately, this is confirmed: none of the factors in the analysis predicts tone perception in either group.
67 However, it should be noted that when musicians' and non-musicians' data are analysed separately, musical memory predicts tone production only in non-musicians, not in musicians.
68 However, when musicians' and non-musicians' data are analysed separately, consonant perception is predicted by language aptitude in musicians and by musical memory in non-musicians; consonant production is predicted best by a combination of all factors in musicians, but not by any of the variables in non-musicians.
Together, the results show that it is not musical training as such that leads to
musicians' superior performance on speech perception and production tasks, but rather that inherent abilities like musical aptitude, musical memory, and
language aptitude enhance speech perception and production70. In short, playing an
instrument per se does not make foreign language sound learning easier, but aptitude
for music and language makes a good producer or perceiver of new speech sounds.
These results suggest that the reason why musicians are better at speech perception
and production is not due to their experience with music perception and production,
but rather to their predisposition for music, which, in turn, could be related to their
motivation to learn an instrument (see section 8.7.2.1) and to their continued musical
experience, while those with less aptitude do not continue with musical training.
69 Similarly, when vowel perception was analysed for musicians and non-musicians separately, none of the factors predicted vowel perception in either group or vowel production in non-musicians; vowel production in musicians was best predicted by musical aptitude, memory, and training.
70 However, when musicians' and non-musicians' data are analysed separately, inherent abilities and training predicted only the performance of musicians, not of non-musicians, for whom musical memory seemed to be the best predictor.
9.1 Summary of Results
Here the results of Experiments 1, 2, and 3 are summarised ahead of their discussion and interpretation. The final section considers implications of the current
experiments and suggestions for future research.
9.1.1 Experiment 1: Categorical Perception of Speech and Sine-Wave Tones in
Tonal and Non-Tonal Language Speakers
Previous studies of the categorical identification and discrimination of tone typically
employed just a single tonal language. Moreover, the studies often examined only one part of categorical perception: identification or discrimination.
Experiment 1 here concerned the categorical perception of novel synthetic tone
continua realised as both speech and non-speech stimuli in speakers of tonal and non-
tonal languages. The results in terms of language background were mixed. Mandarin
and Vietnamese listeners' perception was similar, but differed from Thai listeners'
perception, which was unexpectedly similar to that of Australian English listeners.
Thus, there was no clear-cut distinction between tonal and non-tonal language
listeners. Different perceptual strategies were observed between language groups
(mid-continuum strategy for the Vietnamese and the Mandarin listeners, and flat-
anchor strategy for the Thai and the Australian English listeners), but these
differences did not correspond to tonal vs. non-tonal language background. These
results strongly suggest that, from the point of view of proficiency or strategy choice, it does not matter whether the listener's native language is tonal, but rather, if a tonal language is spoken, which tonal language is spoken. Indeed, it might
even be the case that particular features associated with tone categories in the native
language, e.g., durational differences in Vietnamese, might affect the categoricality of
tone perception for a new synthetic tone continuum.
In summary and conclusion, Experiment 1 here is the first study to compare a range of
tonal languages along with a non-tonal language. The results show that proficiency
and strategies differ across tonal and non-tonal language speakers confronted with a
new synthetic tone continuum. Not all tonal language speakers use the same
perceptual strategies, and each tonal language must be considered and analysed
separately.
An interesting side observation made in Experiment 1 was that tonal and non-tonal
language-speaking musicians behaved differently from non-musicians. Ad hoc
analyses suggested that musicians required fewer trials to reach criterion in
identification, exhibited more consistent identification patterns (identification was
more categorical), and were more accurate at discriminating tones. These observations
suggest that not only language background but also musical training appears to
influence tone perception. The reasons for this were unclear in Experiment 1 and called for more systematic investigation.
9.1.2 Experiment 2: Perception of Speech and Sine-Wave Tones - The Role of
Language Background and Musical Training
The influence of musical background on the perception of tone was tested further in
Experiment 2, in which tonal and non-tonal language speakers (Thai and Australian
English speakers because of their strategy similarities observed in Experiment 1) with
and without musical training were tested on categorical identification and
discrimination tasks. Two different continua, one with more falling and fewer rising tones, the other with more rising and fewer falling tones, were employed in this experiment, in contrast to the single continuum used in Experiment 1. In this
controlled experimental manipulation of language and music background it was
shown that the ability to perceive a novel synthetic tone continuum categorically did
not differ appreciably as a function of language background (tone, Thai, vs. non-tone,
Australian English), but it was found that musicians compared with non-musicians
learn to identify tone categories more quickly, show more consistent tone labelling
abilities, and have better tone discrimination abilities. Interestingly, it was found that,
independently of language and music background, identification and discrimination
accuracy was higher for the falling than for the rising continuum. This suggests that
perception of tones does depend on the shape of the continuum that is presented.
9.1.3 Experiment 3: Perception and Production of Tones, Vowels, and
Consonants - The Influence of Training, Memory, and Aptitude
The question of whether the perceptual advantage found in Experiment 2 extends to musicians' production as well as perception of tones was addressed in Experiment 3. In order
to specify clearly what effects musicianship might have on perception and production
of tone compared with those on speech more generally, not only tone, but also
consonant and vowel perception and production were tested. In order to investigate the effect of musicianship per se on tone and phone perception and production,
non-tone (Australian English) musician and non-musician participants were tested
with non-native speech sounds - Thai tones, consonants, and vowels. Moreover, in
order to specify clearly what aspect of musicality might be the critical factor,
measures of musical training, musical aptitude, and musical memory were employed
(along with a measure of language aptitude as a control for inherent linguistic skills
independent of musical training).
Overall, there appeared to be a ceiling effect for vowel perception, due to the task
being quite easy and resulting in uniformly high identification scores. This aside,
musicians were better able to perceive and produce both tones and consonants than
non-musicians. However, when the specific predictors of these abilities were
considered, it was found that they differed according to speech sound type: consonant
perception and production were predicted by language aptitude (arguably an
autoregression effect), except that in consonant production adding musical memory
also added predictive power; whereas tone perception and production were best
predicted when musical aptitude and musical memory respectively were added to the
regression equation. Thus the type of musical influence was entirely unexpected:
rather than musical training, i.e., the amount and type of musical experience, it was
the inherent musical ability, musical memory, that best predicted tone production, and
for tone perception it was musical aptitude, although, in line with a strong correlation
between musical aptitude and musical training, there may well have been a co-determination by musical aptitude/training.
9.2 Strategy Effects in Tone Perception
In Experiment 1, there was evidence for the differential use of two different strategies
in the categorical perception of a novel synthetic tone continuum: Mandarin and
Vietnamese tonal language speakers tended to divide the asymmetric tone continuum
into above and below the centre; whereas Thai and Australian listeners appeared to
use a different strategy, dividing the continuum into tones above and below a flat no-
contour tone.
It seems that for the Thai and non-tonal Australian English listeners it is easier to use
a perceptually salient tone (a flat tone with 200 Hz onset and 200 Hz offset) as the reference
point at which to create a boundary between tones of one and the other category –
rising or falling. For Mandarin and Vietnamese listeners, however, the separation
point is located at the centre of the continuum; they appear to create a new tone space
for the task and this tone space is perceptually-based, with the centre point creating
the boundary between the two tone categories. These two approaches are different.
The flat-anchor strategy used by Thai and English listeners is a more acoustically or
psychophysically-based strategy - the perceptually different and thus presumably
more salient flat stimulus is used as the category boundary. All tones above the flat stimulus are categorised as one tone type, presumably 'rising', while the tones below the flat tone are categorised as another tone type, presumably 'falling'. In the mid-continuum strategy used by the Mandarin and Vietnamese listeners, on the other hand, the flat tone appears to play a less critical role in their perception of the asymmetric
continuum: they simply divide it into two equal halves, depending on the values of the
two endpoints. This appears to be a more linguistic approach and perhaps one that, all
other things being equal, is potentially a more adaptive and profitable approach when
learning a new (tone) language.
Another reason for the differences between the three tonal languages could be the
relative similarity of the synthetic tones used in the experiment to the actual tones of
the tonal languages (Thai, Vietnamese, and Mandarin). Thai has a relatively large
proportion of static tones (three static, two dynamic tones) compared to Vietnamese
(two static, four dynamic tones) and Mandarin (one static, three dynamic tones). Thai
listeners' greater exposure to static tones in their everyday linguistic experience may
predispose them to use the static 'flat' tone as a perceptual anchor more readily. This requires further investigation via specific experiments, such as systematically varying synthetic tones from specific tone spaces, or training groups on different synthetic tone spaces and testing for transfer to new tones. The results of such studies would assist in determining the plausibility of the indication here that listeners' tone space influences their perception of novel tonal spaces.
The reason why Australian listeners, who are not required to attend to subtle tonal
differences at the lexical level in their own language, use the acoustic approach
appears to be obvious. The flat tone in the synthetic continuum is the most
acoustically salient (not falling, not rising) tone and therefore is used as a perceptual
anchor. Moreover, this is what Australian English (non-tonal) language speakers
might consider to be the more “normal” speech sound. What remains unclear is why
the Thai listener group, unlike the other tonal language groups, uses the same,
seemingly acoustic approach to tone perception as the Australian English listeners.
One reason could be that the Thai listeners' pre-existing linguistically-based
perceptual anchors, derived from the tone values and tone space of their native
language, just happen to coincide with the flat no-contour tone, which in turn is non-tonal language listeners' acoustically-based anchor. While the veracity of this explanation cannot be confirmed here, it provides grist for future experiments in tone perception (see section 9.5). This pattern of results for tone perception strategies, together with the observed differences between perception of differently shaped continua, shows that it is not possible to generalise from results based on speakers of one tonal language to the speakers of all tonal languages, and that each group of language
speakers must be considered separately, as must the specific tone space characteristics
of their particular languages.
9.3 Musicians’ Advantages in Speech Perception and Production
The results of all three experiments show that musicality is associated with perceptual
advantages in identification, discrimination, and production of tones. These
advantages are discussed in further detail below in relation to (i) transfer to musical
tasks (ii) transfer to related linguistic tasks and (iii) transfer to less related linguistic
tasks.
9.3.1 Musical Experience – Transfer to Musical Tasks
The results of Experiment 3 show that musicians have higher musical aptitude scores
than non-musicians and there is a high correlation between musical aptitude and
musical training. This pattern of results is not very surprising, considering that people
with high musical aptitude are more likely to take up and keep playing an instrument
than people who find learning an instrument difficult. Another reason for musicians' better performance on musical aptitude tests could be that they are more used to listening to subtle differences in music, and therefore their perception is more fine-tuned than non-musicians' perception, especially since most music curricula start with
classical music, and the musical aptitude stimuli are based on Western classical
tonality and rhythm. Together, it seems only natural that musical training enhances, or
encourages or allows expression of, abilities that rely on similar processes as those on
which musical skills are based. However, that said, it is of interest that there are no
consistent differences between musicians and non-musicians in terms of musical
memory; musicians and non-musicians perform equivalently on this task, and musical
memory does not correlate with any of the other musical or linguistic tasks here. Thus, while musicianship (musical experience) and musical aptitude seem closely related, musical memory is independent of these, though note the ceiling effect in the musical memory results. This ceiling effect indicates that not only did a large proportion of participants not vary in training, they also did not vary in terms of memory. A more challenging memory task would presumably yield reliable differences between musicians and non-musicians, as well as larger individual differences within the groups.
9.3.2 Musical Experience – Transfer to Related Linguistic Tasks
In Experiment 3 it was found that musicians are better than non-musicians at
perception and production of lexical tones. The reasons for this are very likely related
to the fact that tone is a feature of both speech (in tonal languages) and music. Thus,
the frequent exposure to fine tonal differences may enhance musicians‟ tone
perception. While on the surface this explanation appears reasonable, the fact that experience with tones in speaking a tonal language (Thai) does not appear to systematically improve the perception of a new tone space (see Experiment 2 results) suggests that this may not be the complete picture. It may well be the case that it is musical aptitude
and not musical experience that facilitates lexical tone perception.
Indeed the results of the regression analyses suggest that tone perception and
production are best predicted by inherent musical abilities, musical aptitude and
musical memory. Even though there may be some involvement of musical training
(see the test of the alternative model in section 8.6.3), musical training alone cannot
account for tone perception and production ability.
Elevated musical aptitude may be related to more general abilities, specifically to enhanced auditory processing capabilities in listeners with musical training and/or aptitude. Jakobson, Cuddy, and Kilgour (2003) found that musical training
engages and refines processes involved in pitch pattern analysis. They hypothesise
that music instruction strengthens auditory temporal processing skills, which would
enable musicians to discriminate better between rapidly changing acoustic events, a
skill that is necessary in at least the tone perception tasks in the current series. Given
the overlap between musical training and musical aptitude here, and the frequent confounding of these factors in previous studies, it is possible that the auditory
temporal processing skills mentioned above might be just as related to musical
aptitude as to musical training.
Another consideration speaking against the notion that musical expertise per se facilitates tone perception and production is that in Western music the goal is generally to produce stable pitches; in this experiment, none of the three tones was stable.
9.3.3 Musical Experience – Transfer to Less Related Linguistic Tasks
If indeed the skills uncovered or described by musical aptitude and/or musical training
are related to general auditory processing skills, then ability for other linguistic skills, not just tone perception, should be elevated in musicians
compared to non-musicians. Such a contention is supported by the results of
Experiment 3, which show that musicians have more accurate consonant perception
abilities than non-musicians. Thus, the reason that musicians are better at perceiving
consonants than non-musicians could be that they are more finely tuned than non-musicians, not only to subtle tonal differences, but also to small timing differences. It
has previously been found that musicians are also better than non-musicians in
judging temporal order (Koh et al., 2001). Such an advantage may be attributed to musicians' frequent exposure to small temporal changes. However, given the close
relationship found between musical aptitude and musical training in the current
experiments, whether the advantages that musicians show here on linguistic tasks are
the result of acquired skills or latent aptitudes must await further experimentation.
9.4 Locus of Musicians’ Superiority
The results of the current experiments have shown that musicians are better at
learning new tonal contrasts, identification of novel tones, discrimination of subtle
tonal differences, and perception and production of novel consonants and tones.
The reasons for this are complex. Correlation results show that musical training is
highly correlated with musical aptitude, which, in turn, is correlated with language
aptitude. The regression results reveal that it is not musical training per se that
predicts perception and production of speech sounds. Thus, the question is: what is
the exact locus of musicians‟ superiority?
Linguistic ability for tones is best predicted by a combination of language and musical
aptitude (for tone perception) and musical memory (for tone production), but not by
language aptitude alone. Indeed, it is the addition of musical aptitude and musical memory to the regression equations for tone perception and tone production respectively that significantly improves prediction. Thus, it appears that it is not
musical training as such that leads to good tone perception and production, but
inherent musical abilities. There is, however, a rider to this conclusion: the high
correlation between musical aptitude and musical training plus the success in
prediction by musical training in the alternative regression model suggests that it may
be difficult to tease apart the effects of musical aptitude and musical training.
Nevertheless, the results strongly suggest that there may well be a component of
musicality that predicts tone perception and production that is independent of musical
training or experience. To find out more about this and how it may relate to a more
general underlying aptitude, consonant and vowel results need to be considered.
Consonant perception is best explained by language aptitude alone (and the addition
of musical memory in production). Again, musical training does not appear to play a
role and this is the case even under the alternative regression model. This involvement
of language aptitude makes intuitive sense, as the language aptitude test measures the
ability to learn new speech sounds. However, what requires addressing is why
consonant perception and production, but not tone perception and production are
predicted by language aptitude. It may well be the case that consonant perception and
production are more linguistic in nature than tone perception and production. One line
of evidence to support such a view is the fact that consonants are perceived more
categorically than tones or even vowels (see Chapter 2). However, while this may be
the case in general, what is shown here is that a particular (sub-) test of language
aptitude (Pimsleur, 1966) does not predict non-native participants' perception and
imitation of these naturally produced tone contrasts. Given the unfamiliarity of lexical
tone contrasts for these non-native listeners, it may not be too surprising that listeners
do not process these linguistically (as the previous experiments also suggest), but
perhaps more acoustically or even musically.
Turning to vowels, there is no contribution of language aptitude to the ability for vowel perception or production. Indeed, for vowel perception there were no significant predictors. For vowel production, however, musical aptitude was a significant predictor. It should be noted, though, that the vowel perception task here was the easiest task overall, and the regression results are the least comprehensive. Further studies should be conducted before definitive conclusions can be drawn regarding vowel perception and production71.
The fact that musical training as such does not explain good perception and
production of speech sounds suggests that previous research may have confounded
musical training and musical aptitude. Such studies must, therefore, be considered
with caution, and future studies investigating speech processing skills in musicians and non-musicians should consider inherent factors (language aptitude, musical aptitude, and musical memory) in addition to experiential factors such as musical training history. This and other directions for future research are considered in the final section.

71 It also needs to be noted that when musicians' and non-musicians' data were analysed separately, non-musicians' speech perception and production in general were best predicted by musical memory, whereas a combination of musical training and inherent abilities best predicted the musicians' results. These results should, however, be considered with caution, as the necessary assumptions for this separate analysis were not completely met.
9.5 Suggestions for Future Research
The results of the current experiments have added to our understanding of the role of
musical ability in language ability, but have also opened a range of further questions.
Suggestions for future research, picking up on such questions, are discussed below.
9.5.1 Relationship between Tone Space and Intonation Space
In order to find out more about how specific characteristics of listeners' native tone space (or intonation space, in the case of non-tonal intonation languages such as English) shape tone perception, further experiments are required in which cross-language tone and intonation are manipulated. In this way, it could be tested exactly how the native tone space influences the acquisition of a new tone space.
9.5.2 Investigation of the Relationship Between Musical Training and Musical
Aptitude
The results of the current studies indicate that there is a high correlation between
musical aptitude and musical training. This may, however, be at least partly due to the
selection criteria (no less than five years of musical training for musicians and no
more than two years of training for non-musicians). In order to separate the effects of
musical training and musical aptitude, there needs to be a greater range of degrees of
musical experience. Future studies should therefore sample more widely from the
population of musical and non-musical participants.
9.5.3 Development of Musicality
Another important issue to consider in future experiments is the development of
musicality. Experiments with children may uncover whether musicality is a learned
skill that can be acquired at any age or whether there is a critical period for music
acquisition, similar to that for language acquisition. Some research in this area was
considered in Chapter 4. However, studies with children are required to determine the
degree of aptitude in the presence or absence of musical training, perhaps also in
relation to their foreign language learning ability.
9.5.4 Acoustic Analyses of Speech Production Ability
In order to discover more about the production skills of musicians and non-musicians,
speech productions need to be analysed acoustically. The question here would be:
what exactly is the difference between the "good" and the "bad" producers? There are
various possibilities: in tone production, accuracy of overall pitch, goodness of pitch
contours, and duration; in consonant production, accuracy of VOT values and
differences in production of native and non-native consonants; in vowel production,
accuracy of formant values and differences in production between native and
non-native vowels.
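As a minimal illustration of one such acoustic measure, the sketch below estimates overall pitch (F0) from a waveform by autocorrelation peak-picking. The signal is a synthetic tone standing in for a recorded production, and the function name and parameter ranges are illustrative assumptions, not drawn from the thesis.

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) by autocorrelation peak-picking."""
    sig = signal - signal.mean()
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]  # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)                  # plausible F0 lag range
    lag = lo + int(np.argmax(ac[lo:hi]))                     # best-matching period
    return sr / lag

# Synthetic "vowel": 200 Hz fundamental plus one harmonic, 16 kHz sampling.
sr = 16000
t = np.arange(0, 0.1, 1 / sr)
tone = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
print(round(estimate_f0(tone, sr), 1))  # → 200.0
```

Applied frame by frame, the same idea yields a pitch contour whose shape and range could then be compared between "good" and "bad" producers.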
9.5.5 Psychoacoustic Processing Investigation
Another important issue that has not been addressed in the current series of studies is
whether the relationship between musical ability and speech sound acquisition is a
result of individual differences in basic auditory processing, such as the ability to
perceive pitch patterns (Jakobson et al., 2003), to perceive temporal order (Koh et
al., 2001), or to detect very low amplitude sounds. Previous research has shown that
individual speech perception differences do not depend on spectral and temporal
processing accuracy for non-speech sounds (Surprenant & Watson, 2001). It is thus
necessary to investigate whether individual variation in basic auditory abilities can
predict variation in the perception of foreign language sounds and of musical sounds.
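One standard way to measure such basic auditory abilities is an adaptive staircase procedure. The following sketch is a generic 2-down/1-up tracker with a simulated deterministic listener; it is an assumption-laden illustration, not a procedure used in the thesis.

```python
def staircase_threshold(respond, start=40.0, step=2.0, n_reversals=12):
    """Simple 2-down / 1-up adaptive staircase.

    respond(level) should return True for a correct detection. The rule
    (two correct in a row -> harder, one error -> easier) converges on the
    level yielding ~70.7% correct; the estimate is the mean of the last
    half of the reversal levels.
    """
    level, streak, direction = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            streak += 1
            if streak == 2:              # two correct in a row: step down
                streak = 0
                if direction == +1:      # direction changed: record a reversal
                    reversals.append(level)
                direction = -1
                level -= step
        else:                            # one error: step up
            streak = 0
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    tail = reversals[n_reversals // 2:]
    return sum(tail) / len(tail)

# Simulated listener that detects any level above 25 dB.
print(staircase_threshold(lambda level: level > 25.0))  # → 25.0
```

With a real participant, `respond` would present a stimulus at the given level and collect a yes/no response; the same tracker could be run on pitch-change or temporal-order judgments as well as detection.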
9.6 Conclusion
This thesis offers the first comprehensive multi-language investigation of lexical tone
perception in speakers of tonal and non-tonal languages with and without musical
experience. The findings suggest that tone processing is language-specific and
strongly shaped by inherent musical ability. Speech perception and production results
indicate that musical training is not the determining factor in acquisition of novel
speech sounds. Rather, it appears to be inherent abilities, such as language or musical
aptitude, or a more universal 'auditory aptitude', that explain musicians' superiority
in speech perception and production.
The current experiments represent the first step into a new avenue of research, in
which both inherent abilities and experiential factors will be considered in the search
for the origin of good speech acquisition skills.
References
Abercrombie, D. (1967). Elements of general phonetics. Chicago, IL: Aldine.
Abercrombie, D. (1968). Paralanguage. British Journal of Disorders of
Communication, 3, 55-59.
Abramson, A. S. (1961). Identification and discrimination of phonetic tones. Journal
of the Acoustical Society of America, 33, 842.
Abramson, A. S. (1962). The vowels and tones of standard Thai: Acoustical
measurements and experiments. International Journal of American
Linguistics, 28(2).
Abramson, A. S. (1975). The tones of Central Thai: Some perceptual experiments. In
J. C. J. G. Harris (Ed.), Studies in Thai Linguistics (pp. 1-16). Bangkok:
Central Institute of English Language.
Abramson, A. S. (1977). The noncategorical perception of tone categories in Thai.
Paper presented at the 93rd meeting of the Acoustical Society of America,
State College, Penn.
Abramson, A. S. (1978). Static and dynamic acoustic cues in distinctive tones.
Language and Speech, 21(4), 319-325.
Abramson, A. S. (1979). The noncategorical perception of tones in Thai. In B.
Lindblom & S. Ohmann (Eds.), Frontiers of speech communication research
(pp. 127-134). London: Academic Press.
Abramson, A. S., & Lisker, L. (1970). Discriminability along the voicing continuum:
Cross-language tests. In Proceedings of the 6th International Congress of
Phonetic Sciences (pp. 569-573). Prague: Academia.
Abramson, A. S., & Lisker, L. (1973). Voice-timing perception in Spanish word-
initial stops. Journal of Phonetics, 1, 1-8.
Abramson, A. S., & Svastikula, K. (1983). Intersections of tone and intonation in
Thai. Haskins Laboratories Status Report on Speech Research, SR-74/75, 143-
154.
Akahane-Yamada, R., Tohkura, Y., Bradlow, A. R., & Pisoni, D. B. (1998). Does
training in speech perception modify speech production? In H. T. Bunnell &
W. Idsardi (Eds.), Proceedings of the 4th International Conference on Spoken
Language Processing (Vol. 2, pp. 606-609). Philadelphia, PA, USA.
Altmann, G. T. M. (1990). Cognitive models of speech processing: An introduction.
In G. T. M. Altmann (Ed.), Cognitive models of speech processing:
Psycholinguistic and computational perspectives (pp. 1-23). Cambridge, MA:
The MIT Press.
Arellano, S. I., & Draper, J. E. (1972). Relations between musical aptitudes and
second-language learning. Hispania, 55(1), 111-121.
Aslin, R. N., & Pisoni, D. B. (1980). Some developmental processes in speech
perception. In G. Yeni-Komshian, J. Kavanagh & C. Ferguson (Eds.), Child
phonology: Perception and production (pp. 67-96). New York: Academic
Press.
Aslin, R. N., Pisoni, D. B., Hennessy, B. L., & Perey, A. J. (1981). Discrimination of
voice onset time by human infants: New findings and implications for the
effects of early experience. Child Development, 52, 1135-1145.
Aslin, R. N., Pisoni, D. B., & Jusczyk, P. W. (1983). Auditory development and
speech perception in infancy. In M. M. Haith & J. J. Campos (Eds.), Infancy
and the biology of development. New York: Wiley.
Bachem, A. (1955). Absolute pitch. Journal of the Acoustical Society of America, 27,
1180–1185.
Baggaley, J. (1974). Measurement of absolute pitch. Psychology of Music, 2(2), 11-17.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998).
Absolute pitch: An approach for identification of genetic and nongenetic
components. American Journal of Human Genetics, 62, 224–231.
Baharloo, S., Service, S. K., Risch, N., Gitschier, J., & Freimer, N. B. (2000).
Familial aggregation of absolute pitch. American Journal of Human Genetics,
67, 755-758.
Bailey, P. J., Summerfield, Q., & Dorman, M. F. (1977). On the identification of sine-
wave analogues of certain speech sounds. Haskins Laboratories Status Report
on Speech Research, SR-51/52, 1-25.
Ball, M., & Rahilly, J. (1999). Phonetics: The Science of Speech. London: Arnold.
Barron, R. W. (1994). The sound-to-spelling connection: Orthographic activation in
auditory word recognition and its implications for the acquisition of
phonological awareness and literacy skills. In V. W. Berninger (Ed.), The
varieties of orthographic knowledge, 1: Theoretical and developmental issues.
Neuropsychology and cognition (Vol. 8, pp. 219-242). Dordrecht,
Netherlands: Kluwer Academic Publishers.
Barry, J., & Blamey, P. (2004). The acoustic analysis of tone differentiation as a
means for assessing tone production in speakers of Cantonese. Journal of the
Acoustical Society of America, 116(3), 1739-1748.
Bastian, J., & Abramson, A. S. (1964). Identification and discrimination of phonemic
vowel duration. In Speech research and instrumentation (Vol. 10). New York:
Haskins Laboratories.
Bastian, J., Eimas, P. D., & Liberman, A. (1961). Identification and discrimination of
a phonemic contrast induced by silent interval. Journal of the Acoustical
Society of America, 33, 842.
Baudoin-Chial, S. (1986). Hemispheric lateralization of modern standard Chinese
tone processing. Journal of Neurolinguistics, 2, 189–199.
Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese Phonology (Vol. 102).
Berlin: Mouton de Gruyter.
Baumrin, J. M. (1974). Perception of the duration of a silent interval in nonspeech
stimuli: A test of the motor theory of speech perception. Journal of Speech
and Hearing Research, 17, 294-309.
Beach, D. M. (1938). The phonetics of the Hottentot language. Cambridge:
Cambridge University Press.
Bekesy, G. V. (1960). Experiments in hearing. New York: McGraw-Hill.
Bent, T., Bradlow, A., & Wright, B. (2006). The influence of linguistic experience on
the cognitive processing of pitch in speech and nonspeech sounds. Journal of
Experimental Psychology: Human Perception & Performance, 32(1), 97-103.
Bergeson, T. R., & Trehub, S. E. (2002). Absolute pitch and tempo in mothers' songs
to infants. Psychological Science, 13, 72-75.
Bertoncini, J., Bijeljac-Babic, R., Blumstein, S. E., & Mehler, J. (1987).
Discrimination in neonates of very short CV's. Journal of the Acoustical
Society of America, 82, 31-37.
Best, C. T., Moringiello, B., & Robson, R. (1981). Perceptual equivalence of acoustic
cues in speech and nonspeech perception. Perception & Psychophysics,
29, 191-211.
Bever, T. G. (1975). Cerebral asymmetries in humans are due to their differentiation
of two incompatible processes: holistic and analytic. Annals of the New York
Academy of Sciences, 163, 251-262.
Bever, T. G., & Chiarello, R. J. (1974). Cerebral dominance in musicians and
nonmusicians. Science, 185, 537-539.
Bimler, D., Kirkland, J., & Jameson, K. (2004). Quantifying variations in personal
color spaces: Are there sex differences in color vision? Color Research and
Application, 29, 128-134.
Blechner, M. J. (1977). Musical skill and the categorical perception of harmonic
mode. Unpublished PhD Dissertation, Yale University.
Blicher, D. L., Diehl, R. L., & Cohen, L. B. (1988). Effects of syllable duration on the
perception of Mandarin tones: A cross-language study. Journal of the
Acoustical Society of America, 84(1), 157.
Blickenstaff, C. B. (1963). Musical talents and foreign language learning ability.
Modern Language Journal, 47, 359-363.
Bloom, L. (1973). One word at a time: The use of single-word utterances before
syntax. The Hague: Mouton.
Bluhme, H., & Burr, R. (1971). An audio-visual display of pitch for teaching Chinese
tones. Studies in Linguistics, 22, 51-57.
Bolton, T. L. (1894). Rhythm. American Journal of Psychology, 6, 145-238.
Bornstein, M. H. (1973). Color vision and color naming: A psychophysiological
hypothesis of cultural difference. Psychological Bulletin, 80(4), 257-285.
Bornstein, M. H. (1987). Perceptual categories in vision and audition. In S. Harnad
(Ed.), Categorical Perception: The Groundwork of Cognition (pp. 535-565).
Cambridge: Cambridge University Press.
Boyle, J. (1992). Evaluation of musical ability. In R. Colwell (Ed.), Handbook of
Research on Music Teaching and Learning (pp. 247–265). New York: Oxford
University Press.
Bradlow, A., Pisoni, D., Yamada, R., & Tohkura, Y. (1997). Training Japanese
listeners to identify English /r/ and /l/. IV. Some effects of perceptual learning
on speech production. Journal of the Acoustical Society of America, 101,
2299–2310.
Brady, P. T. (1970). Fixed-scale mechanism of absolute pitch. Journal of the
Acoustical Society of America, 48, 883–887.
Braun, F. (1927). Untersuchungen ueber das persoenliche Tempo. Archiv der
gesamten Psychologie, 60, 317-360.
Broca, P. (1861). Remarques sur le siege de la faculte de langage articule, suivis d'une
observation d'aphemie (perte de la parole). Bulletin de la Societe Anatomique,
6, 330-357.
Broselow, E., Hurtig, R. R., & Ringen, C. (1987). The perception of second language
prosody. In G. Ioup & S. H. Weinberger (Eds.), Inter-language Phonology,
The Acquisition of Second Language Sound System (pp. 350-361). Cambridge:
Newbury House Publishers.
Bryden, M. P. (1982). Laterality: Functional asymmetry in the brain. New York:
Academic Press.
Burnham, D., Earnshaw, L., & Clark, J. (1991). Development of categorical
identification of native and non-native bilabial stops: infants, children and
adults. Journal of Child Language, 18, 231-260.
Burnham, D., Earnshaw, L., & Quinn, M. (1987). The development of categorical
identification of speech. In B. McKenzie & H. Day (Eds.), Perceptual
Development in Early Infancy: Problems and Issues (pp. 237-275). New York:
Erlbaum.
Burnham, D., & Francis, E. (1997). The role of linguistic experience in the perception
of Thai tones. In A. S. Abramson (Ed.), Southeast Asian Linguistic studies in
honour of Vichin Panupong (pp. 29-47).
Burnham, D., & Jones, C. (2002). Categorical perception of lexical tone by tonal and
non-tonal language speakers. In Proceedings of the 9th Australian
International Conference on Speech Science & Technology. Melbourne:
Australian Speech Science & Technology Association Inc.
Burnham, D., Peretz, I., Stevens, K., Jones, C., Schwanhäußer, B., Tsukada, K., et al.
(2004). Do Tone Language Speakers have Perfect Pitch? Paper presented at
the 8th International Conference on Music Perception & Cognition, Evanston,
IL.
Burnham, D., Tsukada, K., Jones, C., Rungrojsuwan, S., Krachaikiat, N., &
Luksaneeyanawin, S. (2005, December 15-16, 2005). Lexical tone production
development in Thai children, 18 months to 6 years: Relationships with
language milestones? Paper presented at the 15th Australian Language and
Speech Conference, Sydney.
Burns, E. M. (1999). Intervals, scales, and tuning. In D. Deutsch (Ed.), The
Psychology of Music (pp. 215-264). New York: Academic Press.
Burns, E. M., & Ward, D. (1978). Categorical perception - phenomenon or
epiphenomenon: Evidence from experiments in the perception of melodic
musical intervals. Journal of the Acoustical Society of America, 68, 456-468.
Capo, H. B. C. (1991). A comparative phonology of Gbe. Publications in African
Languages and Linguistics, 14.
Caramazza, A., Yeni-Komshian, G., Zurif, E., & Carbone, E. (1973). The acquisition
of a new phonological contrast: The case of stop consonants in French-English
bilinguals. Journal of the Acoustical Society of America, 54, 421-428.
Carney, A. E., Widin, G. P., & Viemeister, N. F. (1977). Noncategorical perception of
stop consonants differing in VOT. Journal of the Acoustical Society of
America, 62, 961-970.
Chan, A. S., Ho, Y.-C., & Cheung, M.-C. (1998). Music training improves verbal
memory. Nature, 396, 128.
Chan, M. (1987). Tone and melody in Cantonese. Paper presented at the Thirteenth
Annual Meeting of the Berkeley Linguistics Society, Berkeley, BLS.
Chan, S., Chuang, C., & Wang, W. (1975). Cross-language study of categorical
perception for lexical tone. Journal of the Acoustical Society of America, 58,
119.
Chang, H. W., & Trehub, S. E. (1977). Auditory processing of relational information
by young infants. Journal of Experimental Child Psychology, 24, 324-331.
Chang, Y. C., & Halle, P. (2000). Taiwan Huayu shengdiao fanchou ganzhi
[Categorical perception of Taiwan Mandarin tones]. Tsing Hua Journal of
Chinese Studies, new series XXX, 1, 51-56.
Chao, Y. R. (1956). Tone, intonation, singsong, chanting, recitative, tonal
composition, and atonal composition in Chinese. In M. Halle, H. G. Lunt, H.
McLean & C. H. van Schooneveld (Eds.), For Roman Jakobson: Essays on the
Occasion of His Sixtieth Birthday, 11 October 1956. The Hague, Netherlands:
Mouton & Co.
Chao, Y.-R. (1930). A system of tone letters. Le Maitre Phonetique, 45, 24-27.
Chuang, C., Hiki, S., Sone, T., & Nimura, T. (1972). The acoustical features and
perceptual cues of the four tones of standard colloquial Chinese. Proceedings
of the 7th International Congress on Acoustics (Akadémiai Kiadó, Budapest),
297-300.
Clynes, M., & Nettheim, N. (1982). The living quality of music: Neurobiologic
patterns of communicating feeling. In M. Clynes (Ed.), Music, Mind and Brain
(pp. 171–216). New York.
Cohen, A. J., & Baird, K. (1990). Acquisition of absolute pitch: The question of
critical periods. Psychomusicology, 9, 31–37.
Cohen, A. J., Thorpe, L. A., & Trehub, S. E. (1987). Infants' perception of musical
relations in short transposed tone sequences. Canadian Journal of Psychology,
41, 33–47.
Collier, G. L., & Wright, C. E. (1995). Temporal rescaling of simple and complex
ratios in rhythmic tapping. Journal of Experimental Psychology: Human
Perception and Performance, 21, 602-627.
Collyer, C. E., Broadbent, H. A., & Church, R. M. (1994). Preferred rates of repetitive
tapping and categorical time production. Perception & Psychophysics, 55,
443-453.
Conway, D. A., & Haggard, M. P. (1971). New demonstrations of categorical
perception (No. 5). Cambridge: University of Cambridge, Psychology
Laboratory.
Cook, P. R. (1991). Identification of control parameters in an articulator vocal tract
model, with applications to the synthesis of singing. Unpublished PhD
Dissertation, Stanford University.
Cooper, W. E., Ebert, R. E., & Cole, R. A. (1976). Perceptual analysis of stop
consonants and glides. Journal of Experimental Psychology: Human
Perception & Performance, 2, 92-104.
Coster, D. C., & Kratochvil, P. (1984). Tone and stress discrimination in normal
Peking dialect speech. In B. Hong (Ed.), New papers in Chinese linguistics
(pp. 119-132). Canberra: Australian National University Press.
Cowan, N., & Morse, P. A. (1979). Influence of task demands on the categorical
versus continuous perception of vowels. In J. J. Wolf & D. H. Klatt (Eds.),
Speech Communication Papers (pp. 443-446). New York: Acoustical Society
of America.
Creelman, C. D., & Macmillan, N. A. (1979). Auditory phase and frequency
discrimination: A comparison of nine procedures. Journal of Experimental
Psychology: Human Perception & Performance, 5, 146-156.
Creelman, C. D., & Macmillan, N. A. (1996). DPrime Plus [Computer software].
Retrieved 2004, from http://www.psych.utoronto.ca/~creelman/
Cross, D. V., Lane, H. L., & Sheppard, W. C. (1965). Identification and
discrimination functions for a visual continuum and their relation to the motor
theory of speech perception. Journal of Experimental Psychology, 70, 63-74.
Crowder, R. G. (1982). Decay of auditory memory in vowel discrimination. Journal
of Experimental Psychology: Human Learning and Memory, 8, 153-162.
Crozier, J. B. (1997). Absolute pitch: Practice makes perfect, the earlier the better.
Psychology of Music, 25, 110–119.
Crystal, D. (2003). A Dictionary of Linguistics and Phonetics (5th ed.). Malden, MA:
Blackwell.
Cuddy, L. L. (1968). Practice effects in the judgment of absolute pitch. Journal of the
Acoustical Society of America, 43, 1069-1076.
Cuddy, L. L. (1970). Training the absolute identification of pitch. Perception &
Psychophysics, 8, 265-269.
Cutting, J., & Rosner, B. (1974). Categories and boundaries in speech and music.
Perception & Psychophysics, 16(3), 564-570.
Cutting, J., Rosner, B., & Foard, C. (1976). Perceptual categories for musiclike
sounds: implications for theories of speech perception. Quarterly Journal of
Experimental Psychology, 28, 361-378.
Damper, R., & Harnad, S. (2000). Neural network models of categorical perception.
Perception & Psychophysics, 62(4), 843-867.
Davidson, J. (1993). Visual perception of performance manner in the movements of
solo musicians. Psychology of Music, 21(2), 103–113.
Dechovitz, D., & Mandler, R. (1977). Effects of transition length on identification and
discrimination along a place continuum. Haskins Laboratories Status Report
on Speech Research, SR-51/51, 119-130.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. (1955). Acoustic loci and
transitional cues for consonants. Journal of the Acoustical Society of America,
27, 769-773.
Delattre, P. C., Liberman, A. M., & Cooper, F. S. (1964). Formant transitions and loci
as acoustic correlates of place of articulation. Studia Linguistica, 18, 104-121.
Delattre, P. C., Liberman, A. M., Cooper, F. S., & Gerstman, L. J. (1952). An
experimental study of the acoustic determinants of vowel color. Word, 8, 195-
210.
Demany, L., & Armand, F. (1984). The perceptual reality of tone chroma in early
infancy. Journal of the Acoustical Society of America, 76, 57-66.
Dexter, E. S., & Omwake, K. T. (1934). The relation between pitch discrimination
and accent in modern languages. Journal of Applied Psychology, 18, 267-271.
Diehl, R. L., & Kluender, K. R. (1989). On the objects of speech perception.
Ecological Psychology, 1(2), 121-144.
Diehl, R. L., Lotto, A. J., & Holt, L. L. (2004). Speech perception. Annual Review of
Psychology, 55, 149-179.
Douglas, S., & Willatts, P. (1994). The relationship between musical ability and
literacy skills. Journal of Research in Reading, 17, 99-107.
Dow, F. (1972). An outline of Mandarin Phonetics. Canberra: Australian National
University Press.
Dowling, W. J. (1999). Development of music perception and cognition. In D.
Deutsch (Ed.), The Psychology of Music (2nd ed., pp. 603–625). San Diego,
CA: Academic Press.
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001).
Genetic correlates of musical pitch recognition in humans. Science, 291, 1969-
1972.
Dung, D., Huong, T., & Boulakia, G. (1998). Intonation in Vietnamese. In D. Hirst &
A. Di Cristo (Eds.), Intonation Systems. A Survey of Twenty Languages (pp.
395-416). Cambridge: Cambridge University Press.
Echols, C. H., Crowhurst, M. J., & Childers, J. B. (1997). The perception of rhythmic
units in speech by infants and adults. Journal of Memory and Language,
36(2), 202-225.
Edman, T. R. (1979). Discrimination of intraphonemic differences along two place of
articulation continua. In J. J. Wolf & D. H. Klatt (Eds.), Speech
Communication Papers (pp. 455-458). New York: Acoustical Society of
America.
Edman, T. R., Soli, S. D., & Widin, G. P. (1978). Learning and generalization of
intraphonemic VOT discrimination. Journal of the Acoustical Society of
America, 63, 19 (Abstract).
Eilers, R. E. (1980). Infant speech perception: History and mystery. In G. H. Yeni-
Komshian, J. F. Kavanagh & C. A. Ferguson (Eds.), Child Phonology (Vol. 2,
pp. 23-39). New York: Academic Press.
Eimas, P. D. (1963). The relation between identification and discrimination along
speech and non-speech continua. Language and Speech, 6, 206-217.
Eimas, P. D. (1975). Auditory and phonetic coding of the cues for speech:
Discrimination of the [r-l] distinction by young infants. Perception &
Psychophysics, 18, 341-347.
Eimas, P. D., & Miller, J. L. (1980a). Contextual effects in infant speech perception.
Science, 209, 1140-1141.
Eimas, P. D., & Miller, J. L. (1980b). Discrimination of the information for manner of
articulation. Infant Behavior and Development, 3, 367-375.
Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. (1971). Speech perception
in infants. Science, 171, 303-306.
Eng, N., Obler, L. K., Harris, K. S., & Abramson, A. S. (1996). Tone perception
deficits in Chinese-speaking Broca's aphasics. Aphasiology, 10, 649-656.
Eterno, J. A. (1961). Foreign language pronunciation and musical aptitude. Modern
Language Journal, 45, 168-170.
Ewan, W. G. (1975). Laryngeal behavior in speech. Unpublished PhD Dissertation,
University of California, Berkeley.
Feld, S., & Fox, A. A. (1994). Music and language. Annual Review of Anthropology,
23, 25-53.
Fernald, A. (1991). Prosody in speech to children: Prelinguistic and linguistic
functions. Annals of Child Developmental Psychology, 8, 43–80.
Fish, L. (1984). Relationships among Eighth-Grade German Students' Learning
Styles, Pitch Discrimination, Sound Discrimination, and Pronunciation of
German Phonemes. Unpublished Master's Thesis, University of Minnesota,
Minneapolis.
Flanagan, J., & Saslow, M. (1958). Pitch discrimination of synthetic vowels. Journal
of the Acoustical Society of America, 32, 1319-1328.
Flege, J. E., Munro, M. J., & Fox, R. A. (1994). Auditory and categorical effects on
cross-language vowel perception. Journal of the Acoustical Society of
America, 95(6), 3623-3641.
Fodor, J. (1983). Modularity of mind. Cambridge, MA: MIT Press.
Forfeit, K. G. (1977). Linguistic relativism and selective adaptation for speech: A
comparative study of English and Thai. Perception & Psychophysics, 21, 347-
351.
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows display program with
millisecond accuracy. Behavior Research Methods, Instruments & Computers,
35, 116-124.
Fowler, C. A. (1994). Speech perception: direct realist theory. In R. E. Asher (Ed.),
The Encyclopaedia of Language and Linguistics (pp. 4199-4203). Oxford:
Pergamon.
Fowler, C. A. (1996). Listeners do hear sounds, not tongues. Journal of the Acoustical
Society of America, 99, 1730-1741.
Fox, R., & Unkefer, J. (1983). The effect of lexical status on the perception of tone.
Journal of Chinese Linguistics, 13, 71-87.
Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music
(1 ed., pp. 149-180). New York: Academic Press.
Francis, A. L., Ciocca, V., & Chit Ng, B. K. (2003). On the (non)categorical
perception of lexical tones. Perception & Psychophysics, 65(7), 1029-1044.
Frazier, L. (1976). What can /w/, /l/, /y/ tell us about categorical perception? Haskins
Laboratories Status Report on Speech Research, SR-48, 235-256.
Fry, D. B. (1969). Acoustic Phonetics. Cambridge: Cambridge University Press.
Fry, D. B. (1970). Prosodic phenomena. In B. Malmberg (Ed.), Manual of Phonetics.
Amsterdam: North Holland.
Fry, D. B., Abramson, A. S., Eimas, P. D., & Liberman, A. M. (1962). The
identification and discrimination of synthetic vowels. Language and Speech,
5, 171-189.
Fujisaki, H., & Kawashima, T. (1968). The influence of various factors on the
identification of synthetic speech sounds. In Reports of the 6th international
congress on acoustics (Vol. No. 2, pp. 95-98). Tokyo.
Fujisaki, H., & Kawashima, T. (1969). On the modes and mechanisms of speech
perception. In Annual Report of the Engineering Research Institute, (Vol. 28,
pp. 67-73): University of Tokyo.
Fujisaki, H., & Kawashima, T. (1970). Some experiments on speech perception and a
model for the perceptual mechanism. In Annual Report of the Engineering
Research Institute, Faculty of Engineering (Vol. 29, pp. 207-214): University
of Tokyo.
Gabrielsson, A. (1973). Adjective ratings and dimension analyses of auditory rhythm
patterns. Scandinavian Journal of Psychology, 14, 244–260.
Gandour, J. (1974). Consonant types and tone in Siamese. Journal of Phonetics, 2,
337-350.
Gandour, J. (1983). Tone perception in far eastern languages. Journal of Phonetics,
11, 149-175.
Gandour, J., & Dardarananda, R. (1983). Identification of tonal contrasts in Thai
aphasic patients. Brain and Language, 18, 98-114.
Gandour, J., & Harshman, R. (1978). Crosslanguage differences in tone perception: A
multidimensional scaling investigation. Language and Speech, 21(1), 1-33.
Gandour, J., Petty, S. H., & Dardarananda, R. (1988). Perception and production of
tone in aphasia. Brain and Language, 35(2), 201-240.
Gandour, J., Ponglorpisit, S., Khunadorn, F., Dechongkit, S., Boongrid, P.,
Boonklam, R., et al. (1992). Lexical tones in Thai after unilateral brain
damage. Brain and Language, 43, 275–307.
Gandour, J., Potisuk, S., Dechongkit, S., & Ponglorpisit, S. (1992). Tonal
coarticulation in Thai disyllabic utterances: A preliminary study. Linguistics of
the Tibeto-Burman Area, 15(1), 93-110.
Gandour, J., Wong, D., & Hutchins, G. (1998). Pitch processing in the human brain is
influenced by language experience. NeuroReport, 9, 2115-2119.
Garcia, E. (1966). The identification and discrimination of synthetic nasals. Haskins
Laboratories Status Report on Speech Research, SR-7/8, 3.1-3.16.
Gardiner, M. F., Fox, A., Knowles, F., & Jeffrey, D. (1996). Learning improved by
arts training. Nature, 381, 284.
Garrett, M., Bever, T., & Fodor, J. (1966). The active use of grammar in speech
perception. Perception & Psychophysics, 1, 30-32.
Gerrits, E., & Schouten, M. E. H. (2004). Categorical perception depends on the
discrimination task. Perception & Psychophysics, 66(3), 363-376.
Gill, H. S., & Gleason, H. A. (1969). A reference grammar of Punjabi. Patiala:
Punjabi University.
Gilleece, L. F. (2006). An empirical investigation of the association between musical
aptitude and foreign language aptitude. Unpublished PhD Dissertation,
Trinity College Dublin.
Gordon, E. (1965). Music Aptitude profile. Boston: Houghton Mifflin.
Gordon, E. (1989). Advanced Measures of Music Audiation. Chicago, IL: GIA
Publications.
Gottfried, T. L. (2007). Music and language learning: Effect of musical training on
learning L2 speech contrasts. In O.-S. Bohn & M. J. Munro (Eds.), Language
Experience in Second Language Speech Learning: In honor of James Emil
Flege. (pp. 221–237).
Graziano, A. B., Peterson, M., & Shaw, G. L. (1999). Enhanced learning of
proportional math through music training and spatial-temporal reasoning.
Neurological Research, 21, 139-152.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood
music education and predisposition to absolute pitch: Teasing apart genes and
environment. American Journal of Medical Genetics, 98(3), 280-282.
Gromko, J. E., & Poorman, A. S. (1998). Developmental trends and relationships in
children's aural perception and symbol use. Journal of Research in Music
Education, 46, 16-23.
Grove Music Online. (2001). Retrieved 5 October, 2006, from
http://www.grovemusic.com.ezproxy.uws.edu.au
Gussenhoven, C. (2004). The Phonology of Tone and Intonation. Cambridge:
Cambridge University Press.
Halle, M., & Stevens, K. N. (1971). A note on laryngeal features (Quarterly progress
Report). Boston: MIT Research Lab of Electronics.
Halle, P. A., Chang, Y. C., & Best, C. T. (2004). Identification and discrimination of
Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of
Phonetics, 32(3), 395-421.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Memory &
Cognition, 17, 572-581.
Han, S. M., & Kim, K.-O. (1974). Phonetic variation of Vietnamese tones in
disyllabic utterances. Journal of Phonetics, 2, 223-232.
Hanson, V. L. (1977). Within-category discrimination in speech perception.
Perception & Psychophysics, 21, 423-430.
Harnad, S. (1987). Categorical Perception: The Groundwork of Cognition.
Cambridge: Cambridge University Press.
Harrison, P. (1998). Yoruba babies and unchained melody. UCL Working Papers in
Linguistics, 10, 1-20.
Hasegawa, A. (1976). Some perceptual consequences of fricative coarticulation.
Unpublished PhD Dissertation, Purdue University.
Hassler, M., Birbaumer, N., & Feil, A. (1985). Musical talent and visual-spatial
ability: a longitudinal study. Psychology of Music, 13, 99-113.
Haudricourt, A.-G. (1954). De l'origine des tons en vietnamien. Journal Asiatique,
242, 69-82.
Haudricourt, A.-G. (1961). Bipartition et tripartition des systemes de tons dans
quelques langues d'extreme-orient. Bulletin de la Societe Linguistique de
Paris, 56, 163-180.
Healy, A. F., & Repp, B. H. (1982). Context independence and phonetic mediation in
categorical perception. Journal of Experimental Psychology: Human
Perception & Performance, 8, 68-80.
Heller, L. M., & Trahiotis, C. (1995). The discrimination of samples of noise in
monotic, diotic and dichotic conditions. Journal of the Acoustical Society of
America, 97, 3775-3781.
Henderson, E. J. A. (1981). Tonogenesis: Some recent speculations on the
development of tone. Transactions of the Philological Society, 112, 1-24.
Hickok, G. (2001). Functional anatomy of speech perception and speech production:
Psycholinguistic implications. Journal of Psycholinguistic Research, 30(3),
225-235.
Hillenbrand, J. M., Minifie, F. D., & Edwards, T. J. (1979). Tempo of spectrum
change as a cue in speech and sound discrimination by infants. Journal of
Speech and Hearing Research, 22, 147-165.
Hirose, H. (1997). Investigating the physiology of laryngeal structures. In W. J.
Hardcastle & J. Laver (Eds.), Handbook of Phonetic Sciences (pp. 116-136).
Oxford: Blackwell.
Hirsh, I. J., & Sherrick, C. E. (1961). Perceived order in different sense modalities.
Journal of Experimental Psychology, 62, 423-432.
Ho, A. (1976). The acoustic variation of Mandarin tones. Phonetica, 33, 353-367.
Ho, Y.-C., Cheung, M.-C., & Chan, A. S. (2003). Music training improves verbal but
not visual memory: Cross-sectional and longitudinal explorations in children.
Neuropsychology, 17, 439-450.
Hombert, J.-M. (1975). Towards a theory of tonogenesis: an empirical,
physiologically and perceptually-based account of the development of tonal
contrasts in language. Unpublished PhD Dissertation, University of
California, Berkeley.
Hombert, J.-M. (1977a). Consonant types, vowel height, and tone in Yoruba. Studies
in African Linguistics, 8, 173-190.
Hombert, J.-M. (1977b). Development of tones from vowel height? Journal of
Phonetics, 5, 9-16.
Hombert, J.-M., & Ladefoged, P. (1976). The effect of aspiration on the fundamental
frequency of the following vowel. Journal of the Acoustical Society of
America, 59, 572.
Hombert, J.-M., Ohala, J. J., & Ewan, W. G. (1979). Phonetic explanations for the
development of tone. Language, 55, 37-58.
Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. Cambridge,
UK: Cambridge University Press.
Hsieh, L., Gandour, J., Wong, D., & Hutchins, G. D. (2001). Functional heterogeneity
of inferior frontal gyrus is shaped by linguistic experience. Brain and
Language, 76, 227-252.
Hughes, C. P., Chan, J. L., & Su, M. S. (1983). Aprosodia in Chinese patients with
right cerebral hemisphere lesions. Archives of Neurology, 40, 732-736.
Hung, T. T. N. (1989). Syntactic and Semantic Aspects of Chinese Tone Sandhi.
Bloomington: Indiana University Linguistics Club Publications.
Hurwitz, I., Wolff, P. H., Bortnick, B. D., & Kokas, K. (1975). Nonmusical effects of
the Kodaly music curriculum in primary grade children. Journal of Learning
Disability, 8, 167-174.
International Phonetic Association. (1999). Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet. Cambridge: Cambridge University Press.
Jakobson, L. S., Cuddy, L. L., & Kilgour, A. R. (2003). Time tagging: A key to
musicians' superior memory. Music Perception, 20, 307-313.
Jensen, M. K. (1958). Recognition of word tones in whispered speech. Word, 14, 187-
197.
Jongman, A., & Moore, C. (2000). The role of language experience in speaker and
rate normalization processes. Paper presented at the 6th International
conference on Spoken Language Processing.
Jongman, A., Wang, Y., Moore, C., & Sereno, J. A. (2006). Perception and
production of Mandarin Chinese tones. In L. H. Tan, E. Bates, & O. Tseng (Eds.), The Handbook of Chinese Psycholinguistics. Cambridge: Cambridge University Press.
Jusczyk, P. W. (1981). Infant speech perception: A critical appraisal. In P. D. Eimas
& J. L. Miller (Eds.), Perspectives on the study of speech (pp. 113-164).
Hillsdale: Erlbaum.
Jusczyk, P. W., & Bertoncini, J. (1988). Viewing the development of speech
perception as an innately guided learning process. Language and Speech, 31,
217-238.
Jusczyk, P. W., Copan, H., & Thompson, E. (1978). Perception by two-month-olds of
glide contrasts in multisyllabic utterances. Perception & Psychophysics, 24,
515-520.
Jusczyk, P. W., Pisoni, D. B., Walley, A. C., & Murray, J. (1980). Discrimination of
the relative onset of two-component tones by infants. Journal of the Acoustical
Society of America, 67, 262-270.
Jusczyk, P. W., Rosner, B. S., Reed, M., & Kennedy, L. J. (1989). Could temporal
order differences underlie 2-month-olds' discrimination of English voicing
contrasts? Journal of the Acoustical Society of America, 85, 1741-1749.
Justus, T. C., & Bharucha, J. J. (2002). Music perception and cognition. New York,
NY, US: John Wiley & Sons Inc.
Kaplan, H. L., Macmillan, N. A., & Creelman, C. D. (1978). Tables of d' for variable-
standard discrimination paradigms. Behavior Research Methods &
Instrumentation, 10, 796-813.
Karlgren, B. (1926). Etudes sur la phonologie Chinoise. In Archives d'Etudes
Orientales (Vol. 15). Stockholm: Norstedt.
Kawahara, H., Katayose, H., de Cheveigne, A., & Patterson, R. D. (1999). Fixed point
analysis of frequency to instantaneous frequency mapping for accurate
estimation of F0 and periodicity. Paper presented at the EUROSPEECH '99,
Budapest, Hungary.
Keating, P. A., & Blumstein, S. E. (1978). Effects of transition length on the
perception of stop consonants. Journal of the Acoustical Society of America,
64, 57-64.
Keating, P. A., Mikos, M. J., & Ganong, W. F. (1981). A cross-language study of
range of voice-onset time in the perception of initial stop voicing. Journal of
the Acoustical Society of America, 70, 1261-1271.
Keenan, J. P., Thangaraj, V., Halpern, A. R., & Schlaug, G. (2001). Absolute pitch
and the planum temporale. NeuroImage, 14, 1402-1408.
Kimura, D. (1961). Cerebral dominance and the perception of verbal stimuli.
Canadian Journal of Psychology, 15, 156-165.
Kimura, D. (1964). Left-right differences in the perception of melodies. Quarterly
Journal of Experimental Psychology, 16, 335-358.
Kiriloff, C. (1969). On the auditory discrimination of tones in Mandarin. Phonetica, 20,
63-69.
Kirstein, E. (1966). Perception of second-formant transitions in non-speech patterns.
Haskins Laboratories Status Report on Speech Research, SR-7/8, 9.1-9.3.
Kitamura, C., & Burnham, D. (1998). The infant's response to vocal affect in maternal
speech. Advances in Infancy Research, 12, 221-236.
Klatt, D. (1973). Discrimination of fundamental frequency contours in synthetic
speech: Implications for models of pitch perception. Journal of the Acoustical
Society of America, 53, 8-16.
Klatt, D. (1989). Review of selected models of speech perception. In W. Marslen-
Wilson (Ed.), Lexical Representation and Process (pp. 169-226). Cambridge,
MA: MIT Press.
Klein, D., Zatorre, R., Milner, B., & Zhao, V. (2001). A cross-linguistic PET study of
tone perception in Mandarin Chinese and English speakers. NeuroImage, 13,
646-653.
Kluender, K. R. (1994). Speech perception as a tractable problem in cognitive
science. In M. A. Gernsbacher (Ed.), Handbook of Psycholinguistics (pp. 173-
217). San Diego, CA, USA: Academic Press, Inc.
Koh, C. K., Cuddy, L. L., & Jakobson, L. S. (2001). Associations and dissociations
among music training, tonal and temporal order processing, and cognitive
skills. Annals of the New York Academy of Sciences, 930(1), 386–388.
Kramer, E. (1963). Judgement of personal characteristics and emotions from
nonverbal properties of speech. Psychological Bulletin, 61, 408-420.
Kratochvil, P. (1985). Variable norms of tones in Beijing prosody. Cahiers de
Linguistique Asie Orientale, 14, 153-174.
Kratochvil, P. (1998). Intonation in Beijing Chinese. In D. Hirst & A. Di Cristo
(Eds.), Intonation Systems. A Survey of Twenty Languages (pp. 417-431).
Cambridge: Cambridge University Press.
Krumhansl, C. L., & Schenk, D. L. (1997). Can dance reflect the structural and
expressive qualities of music? Musicae Scientiae, 1, 63–85.
Kuhl, P. K. (1981). Discrimination of speech by nonhuman animals: Basic auditory
sensitivities conducive to the perception of speech-sound categories. Journal
of the Acoustical Society of America, 70, 340-349.
Kuhl, P. K. (1991). Human adults and human infants show a "perceptual magnet
effect" for the prototypes of speech categories, monkeys do not. Perception &
Psychophysics, 50(2), 93-107.
Kuhl, P. K., & Miller, C. (1975). Speech perception by the chinchilla: Voiced-
voiceless distinction in alveolar plosive consonants. Science, 190, 69-72.
Kuhl, P. K., & Miller, J. D. (1978). Speech perception by the chinchilla: Identification
functions for synthetic VOT stimuli. Journal of the Acoustical Society of
America, 63, 905-907.
Kuhl, P. K., & Miller, J. D. (1982). Discrimination of auditory target dimensions in
the presence or absence of variation in a second dimension by infants.
Perception & Psychophysics, 32, 279-292.
Kuhl, P. K., & Padden, D. M. (1982). Enhanced discriminability at the phonetic
boundaries for the voicing feature in macaques. Perception & Psychophysics,
32, 542-550.
Lamb, S. J., & Gregory, A. H. (1993). The relationships between music and reading in
beginning readers. Educational Psychology, 13, 19-27.
Lane, H. L. (1965). Motor Theory of Speech Perception: A critical review.
Psychological Review, 72, 275-309.
Lane, H. L. (1967). A behavioral basis for the polarity principle in linguistics.
Language, 43, 494-511.
Larkey, L. S., Wald, J., & Strange, W. (1978). Perception of synthetic nasal
consonants in initial and final syllable position. Perception & Psychophysics,
23, 299-312.
Leather, J. (1987). F0 pattern inference in the perceptual acquisition of second
language tone. In A. James & J. Leather (Eds.), Sound Patterns in Second
Language Acquisition. Dordrecht: Foris.
Leather, J. (1990). Perceptual and productive learning of Chinese lexical tone by
Dutch and English speakers. Paper presented at the New Sounds 90:
Amsterdam Symposium on the Acquisition of Second Language Speech,
University of Amsterdam.
Lee, L., & Nusbaum, H. C. (1993). Processing interactions between segmental and
suprasegmental information in native speakers of English and Mandarin
Chinese. Perception & Psychophysics, 53, 157-165.
Lenneberg, E. H. (1967). Biological Foundations of Language. New York: John
Wiley & Sons.
Lerdahl, F., & Jackendoff, R. (1983). A Generative Theory of Tonal Music.
Cambridge, MA: MIT Press.
Levitin, D. J. (1994). Absolute memory for musical pitch - Evidence from the
production of learned melodies. Perception & Psychophysics, 56(4), 414-423.
Levitin, D. J. (1999). Absolute pitch: Self-reference and human memory.
International Journal of Computing and Anticipatory Systems, 4, 255-266.
Levitin, D. J., & Cook, P. R. (1996). Memory for musical tempo: Additional evidence
that auditory memory is absolute. Perception & Psychophysics, 58, 927-935.
Li, C. N., & Thompson, S. A. (1977). The acquisition of tone in Mandarin-speaking
children. Journal of Child Language, 4, 185-199.
Liberman, A. M. (1957). Some results of research on speech perception. Journal of
the Acoustical Society of America, 29, 117-123.
Liberman, A. M. (1996). Introduction: Some assumptions about speech and how they
changed. In Speech: A Special Code. Cambridge, MA: MIT Press.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).
Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1952). The role of selected
stimulus-variables in the perception of unvoiced stop consonants. American
Journal of Psychology, 65, 497-516.
Liberman, A. M., Delattre, P. C., & Cooper, F. S. (1954). The role of consonant-vowel transitions in the stop and nasal consonants. Psychological Monographs, 68, 1-13.
Liberman, A. M., Delattre, P. C., Gerstman, L. J., & Cooper, F. S. (1956). Tempo of
frequency change as a cue for distinguishing classes of speech sounds. Journal
of Experimental Psychology, 52, 127-137.
Liberman, A. M., Harris, K., Eimas, P. D., Lisker, L., & Bastian, J. (1961). An effect
of learning on speech perception: The discrimination of durations of silence
with and without phonemic significance. Language and Speech, 4, 175-195.
Liberman, A. M., Harris, K. S., Hoffman, H. S., & Griffith, B. C. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54, 358-368.
Liberman, A. M., Harris, K., Kinney, J., & Lane, H. (1961). The discrimination of
relative onset time of the components of certain speech and nonspeech
patterns. Journal of Experimental Psychology, 61, 379-388.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception
revised. Cognition, 21, 1-36.
Lieberman, P., & Michaels, S. (1962). Some aspects of fundamental frequency,
envelope amplitude, and the emotional content of speech. Journal of the
Acoustical Society of America, 34(7), 922-927.
Lin, M. C. (1988). Putong hua sheng diao de sheng xue texing he zhi jue zhengzhao
[Standard Mandarin tone characteristics and percepts]. Zhongguo Yuyan, 3,
182-193.
Lindblom, B., & Studdert-Kennedy, M. (1967). On the role of formant transitions in
vowel recognition. Journal of the Acoustical Society of America, 42, 830-843.
Lisker, L. (1970). On learning a new contrast. Haskins Laboratory Status Report on
Speech Research, SR-24, 1-15.
Lisker, L., & Abramson, A. S. (1970). The voicing dimension: Some experiments in
comparative phonetics. In Proceedings of the 6th International Congress of
Phonetic Sciences (pp. 563-567). Prague: Academia.
List, G. (1961). Speech melody and song melody in central Thailand.
Ethnomusicology, 5, 15-32.
Liu, S., & Samuel, A. G. (2004). Perception of Mandarin lexical tones when F0
information is neutralized. Language and Speech, 47(2), 109-138.
Locke, S., & Kellar, L. (1973). Categorical perception in a nonlinguistic mode.
Cortex, 9, 355-369.
Lockhead, G. R., & Byrd, R. (1981). Practically perfect pitch. Journal of the
Acoustical Society of America, 70, 387-389.
Luksaneeyanawin, S. (1984). The tonal behaviour of one-word utterances: The
interplay between tone and intonation in Thai. Unpublished PhD Dissertation,
University of Edinburgh.
Luksaneeyanawin, S. (1998). Intonation in Thai. In D. Hirst & A. Di Cristo (Eds.),
Intonation Systems. A Survey of Twenty Languages (pp. 376-394). Cambridge:
Cambridge University Press.
MacDougall, R. (1903). The structure of simple rhythmic forms. Psychological
Review, Monograph Supplements, 4, 309-416.
MacKain, K. S., Best, C. T., & Strange, W. (1981). Categorical perception of English
/r/ and /l/ by Japanese bilinguals. Applied Psycholinguistics, 2, 369-390.
MacKay, D. G., Allport, A., Prinz, W., & Scheerer, E. (1987). Relationships and
modules within language perception and production: An introduction. In A.
Allport & D. G. MacKay (Eds.), Language perception and production:
Relationships between listening, speaking, reading and writing. Cognitive
science series (pp. 1-15). London, England UK: Academic Press, Inc.
Macmillan, N. A. (1987). Beyond the categorical/continuous distinction: A
psychophysical approach to processing modes. In S. Harnad (Ed.),
Categorical Perception: The Groundwork of Cognition (pp. 53-85).
Cambridge: Cambridge University Press.
Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide.
Cambridge: Cambridge University Press.
Macmillan, N. A., Kaplan, H. L., & Creelman, C. D. (1977). The psychophysics of
categorical perception. Psychological Review, 84, 452-471.
Maddieson, I. (1978). The frequency of tones. UCLA Working Papers in Phonetics,
41, 43-52.
Magne, C., Schoen, D., & Besson, M. (2006). Musician children detect pitch
violations in both music and language better than nonmusician children:
Behavioral and electrophysical approaches. Journal of Cognitive
Neuroscience, 18(2), 199-211.
Mandler, R. (1976). Categorical perception along an oral-nasal continuum. Haskins
Laboratories Status Report on Speech Research, SR-47, 147-154.
Massaro, D. W. (1987). Categorical partition: A fuzzy-logical model of categorization
behavior. In S. Harnad (Ed.), Categorical Perception: The Groundwork of
Cognition (pp. 254-283). Cambridge: Cambridge University Press.
Massaro, D. W., & Cohen, M. M. (1983). Categorical or continuous perception: A
new test. Speech Communication, 2, 15-35.
Matisoff, J. (1972). The Loloish tonal split revisited (Research Monograph No. 7).
Berkeley: Center for South and Southeast Asia Studies, University of
California.
Mattingly, I. G., Liberman, A., Syrdal, A. M., & Halwes, T. (1971). Discrimination in
speech and nonspeech modes. Cognitive Psychology, 2, 131-157.
Mattock, K. (2004). Perceptual Reorganisation for Tone: Linguistic Tone and Non-
Linguistic Pitch Perception by English Language and Chinese Language
Infants. Unpublished PhD Dissertation, University of Western Sydney,
Sydney.
Mattock, K., & Burnham, D. (2006). Chinese and English infants' tone perception:
Evidence for perceptual reorganization. Infancy, 10(3), 241-265.
May, J. G. (1981). Acoustic factors that may contribute to categorical perception.
Language and Speech, 24, 273-284.
McClasky, C. L., Pisoni, D. B., & Carrell, T. D. (1980). Effects of transfer of training
on identification of a new linguistic contrast in voicing (Progress Report No.
6). Bloomington: Indiana University, Department of Psychology.
McGovern, K., & Strange, W. (1977). The perception of /r/ and /l/ in syllable-initial
and syllable-final position. Perception & Psychophysics, 21, 162-170.
McLaughlin, T. (1970). Music and Communication. London: Faber and Faber.
McMahon, O. (1979). The relationship of music discrimination training to reading
and associated auditory skills. Bulletin of the Council for Research in Music
Education, 59, 68-72.
Mei, T. L. (1970). Tones and prosody in middle Chinese and the origin of the rising
tone. Harvard Journal of Asiatic Studies, 30, 86-110.
Miller, J. D. (1961). Word tone recognition in Vietnamese whispered speech. Word,
17, 11-15.
Miller, J. D., Eimas, P. D., & Zatorre, R. (1979). Studies of place and manner of
articulation in syllable-final position. Journal of the Acoustical Society of
America, 66, 1207-1210.
Miller, J. D., Wier, C. C., Pastore, R., Kelly, W. J., & Dooling, R. J. (1976).
Discrimination and labeling of noise-buzz sequences with varying noise-lead
times: An example of categorical perception. Journal of the Acoustical Society
of America, 60, 410-417.
Miller, J. L. (1980). Contextual effects in the discrimination of stop consonant and
semivowel. Perception & Psychophysics, 28, 93-95.
Miller, J. L., & Eimas, P. D. (1977). Studies on the perception of place and manner of
articulation: A comparison of the labial-alveolar and nasal-stop distinctions.
Journal of the Acoustical Society of America, 61, 835-845.
Miller, J. L., & Liberman, A. M. (1979). Some effects of later-occurring information
on the perception of stop consonant and semivowel. Perception &
Psychophysics, 25, 457-465.
Miracle, W. C. (1989). Tone production of American students of Chinese: A
preliminary acoustic study. Journal of Chinese Language Teachers
Association, 24, 49-65.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A., Jenkins, J. J., & Fujimura,
O. (1975). An effect of linguistic experience: The discrimination of [r] and [l]
by native speakers of Japanese and English. Perception & Psychophysics, 25,
331-340.
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors.
Perception & Psychophysics, 44(6), 501-512.
Miyazaki, K. (2004). How well do we understand absolute pitch? Acoustical Science
and Technology, 25(6), 270–282.
Miyazaki, K., & Ogawa, Y. (2006). Learning absolute pitch by children: A cross-
sectional study. Music Perception, 24(1), 63-78.
MLA Cooperative Foreign Language Test, German. (1964). Princeton, NJ: Educational Testing Service.
Moffitt, A. R. (1971). Consonant cue perception by twenty-to-twenty-four-week old
infants. Child Development, 42, 717-731.
Moore, B. C. J. (1989). An introduction to the psychology of hearing (3rd ed.).
London: Academic Press.
Morse, P. A. (1972). The discrimination of speech and nonspeech stimuli in early
infancy. Journal of Experimental Child Psychology, 14, 477-492.
Morse, P. A., & Snowdon, C. T. (1975). An investigation of categorical speech
discrimination by rhesus monkeys. Perception & Psychophysics, 17, 9-16.
Morton, D. (1976). The Traditional Music of Thailand: University of California Press.
Murphy, L. E. (1966). Absolute judgments of duration. Journal of Experimental
Psychology, 71, 260-263.
Nooteboom, S. (1997). The prosody of speech: Melody and rhythm. In W. J.
Hardcastle & J. Laver (Eds.), Handbook of Phonetic Sciences (pp. 640-673).
Oxford: Blackwell.
Ohala, J. (1970). Aspects of the control and production of speech. In Working Papers
in Phonetics, UCLA (Vol. 15). Los Angeles.
Ohala, J. (1973a). Explanations for the intrinsic pitch of vowels (Monthly Internal
Memorandum). Berkeley: Phonology Laboratory, University of California.
Ohala, J. (1973b). The physiology of tone. In L. M. Hyman (Ed.), Consonant types
and tone (Southern California occasional papers in linguistics). Los Angeles:
USC.
Orton, S. T. (1925). Word-blindness in school children. Archives of Neurology and
Psychiatry, 14, 582–615.
Pastore, R. E., Ahroon, W. A., Baffuto, K. J., Friedman, C., Puelo, J. S., & Fink, E. A.
(1977). Common-factor model of categorical perception. Journal of
Experimental Psychology: Human Perception & Performance, 3, 686-696.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience,
6(7), 674-681.
Patel, A. D., & Peretz, I. (1997). Is music autonomous from language? A
neuropsychological appraisal. In I. Deliege & J. Sloboda (Eds.), Perception
and Cognition of Music (pp. 191-215). London: Erlbaum Psychology Press.
Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature
Neuroscience, 6(7), 688-691.
Perey, A. J., & Pisoni, D. B. (1980). Identification and discrimination of durations of
silence in nonspeech signals (Research Report on Speech Perception No. 6).
Bloomington: Indiana University.
Pike, K. L. (1948). Tone Languages: University of Michigan Press.
Pimsleur, P. (1966). The Pimsleur language aptitude battery. New York: Harcourt Brace Jovanovich.
Pisoni, D. B. (1971). On the nature of categorical perception of speech sounds.
Unpublished PhD Dissertation, University of Michigan.
Pisoni, D. B. (1973). Auditory and phonetic memory codes in the discrimination of
consonants and vowels. Perception & Psychophysics, 13, 253-260.
Pisoni, D. B. (1975). The role of auditory short-term memory in vowel perception.
Memory & Cognition, 3, 7-18.
Pisoni, D. B. (1976). Some effects of discrimination training on the identification and
discrimination of rapid spectral changes (Progress Report No. 3).
Bloomington: Indiana University, Department of Psychology.
Pisoni, D. B. (1977). Identification and discrimination of the relative onset of two
component tones; Implications for the perception of voicing in stops. Journal
of the Acoustical Society of America, 61, 1352-1361.
Pisoni, D. B., Aslin, R. N., Perey, A. J., & Hennessy, B. L. (1982). Some effects of
laboratory training on identification and discrimination of voicing contrasts in
stop consonants. Journal of Experimental Psychology: Human Perception &
Performance, 8, 297-314.
Pisoni, D. B., & Glanzman, D. L. (1974). Decision processes in speech discrimination
as revealed by confidence ratings (No. 1). Bloomington: Indiana University.
Pisoni, D. B., & Lazarus, J. (1974). Categorical and noncategorical modes of speech
perception along the voicing continuum. Journal of the Acoustical Society of
America, 55(2), 328-333.
Pisoni, D. B., & Tash, J. (1974). Reaction times to comparisons within and across
phonetic categories. Perception & Psychophysics, 15, 285-299.
Pompino-Marschall, B. (1995). Einfuehrung in die Phonetik. Berlin: de Gruyter.
Popper, R. D. (1972). Pair discrimination for a continuum of synthetic voiced stops
with and without first and third formants. Journal of Psycholinguistic
Research, 1, 205-219.
Povel, D. (1981). Internal representation of simple temporal patterns. Journal of
Experimental Psychology: Human Perception & Performance, 7, 3-18.
Povel, D. (1984). A theoretical framework for rhythm perception. Psychological
Research, 45, 315-337.
Povel, D., & Essens, P. (1984–5). Perception of temporal patterns. Music Perception,
2, 411–440.
Prior, M., & Troup, G. A. (1988). Processing of timbre and rhythm in musicians and
non-musicians. Cortex, 24(3), 451-456.
Profita, J., & Bidder, T. G. (1988). Perfect pitch. American Journal of Medical
Genetics, 29, 763-771.
Rakowski, A. (1972). Direct comparison of absolute and relative pitch. In Hearing
Theory 1972. Eindhoven, The Netherlands: IPO.
Ramus, F., Hauser, M. D., Miller, C., Morris, D., & Mehler, J. (2000). Language
discrimination by human newborns and by cotton-top tamarin monkeys.
Science, 288(5464), 349-351.
Rauscher, F. H., & LeMieux, M. T. (2003). Piano, rhythm, and singing instruction
improve different aspects of spatial-temporal reasoning in Head Start children.
Poster presented at the annual meeting of the Cognitive Neuroscience Society,
New York.
Repp, B. H. (1975). Categorical perception, dichotic interference, and auditory
memory: A "same-different" reaction time study. Unpublished manuscript.
Repp, B. H. (1981a). Perceptual equivalence of two kinds of ambiguous speech
stimuli. Bulletin of the Psychonomic Society, 18, 12-14.
Repp, B. H. (1981b). Two strategies in fricative discrimination. Perception &
Psychophysics, 30, 217-227.
Repp, B. H. (1984). Categorical perception: Issues, methods, findings. In N. J. Lass (Ed.), Speech and Language: Advances in Basic Research and Practice (Vol. 10, pp. 243-335). New York: Academic Press.
Repp, B. H., Healy, A. F., & Crowder, R. G. (1979). Categories and context in the
perception of isolated steady-state vowels. Journal of Experimental
Psychology: Human Perception & Performance, 5, 129-145.
Rosen, S., & Howell, P. (1987). Auditory, articulatory and learning explanations of
categorical perception in speech. In S. Harnad (Ed.), Categorical Perception:
The Groundwork of Cognition (pp. 113-160). Cambridge: Cambridge
University Press.
Sachs, R. M. (1969). Vowel identification and discrimination in isolation vs. word
context. In Quarterly Progress Report (Vol. 93, pp. 220-229). Cambridge,
Mass.: Research Laboratory of Electronics.
Sadie, S., & Tyrrell, J. (Eds.). (2001). The New Grove Dictionary of Music and
Musicians (2nd ed.). London: Oxford University Press.
Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning:
Evidence for developmental reorganization. Developmental Psychology,
37(1), 74-85.
Samuel, A. G. (1977). The effect of discrimination training on speech perception:
Noncategorical perception. Perception & Psychophysics, 22, 321-330.
Sawusch, J. R., Nusbaum, H. C., & Schwab, E. C. (1980). Contextual effects in vowel
perception II. Evidence for two processing mechanisms. Perception &
Psychophysics, 27, 292-302.
Schellenberg, E. G. (2001). Music and nonmusical abilities. Biological Foundations
of Music, 930, 355-371.
Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune:
Identifying popular recordings from brief excerpts. Psychological Bulletin &
Review, 6, 641-646.
Schellenberg, E. G., & Trehub, S. E. (1994). Frequency ratios and the discrimination
of pure tone sequences. Perception & Psychophysics, 56, 472-478.
Schellenberg, E. G., & Trehub, S. E. (1996a). Children's discrimination of melodic
intervals. Developmental Psychology, 32.
Schellenberg, E. G., & Trehub, S. E. (1996b). Natural musical intervals: Evidence
from infant listeners. Psychological Science, 5, 272-277.
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread.
Psychological Science, 14(3), 262-266.
Schouten, B., Gerrits, E., & Hessen, A. J. (2003). The end of categorical perception as
we know it. Speech Communication, 41, 71-80.
Schouten, J. F. (1940). The perception of pitch. Philips Technical Review, 5, 286-294.
Schouten, M. E. H., & Van Hessen, A. J. (1992). Modelling phoneme perception: I.
Categorical perception. Journal of the Acoustical Society of America, 92,
1841-1855.
Schuh, R. (1971). Toward a typology of Chadic vowel and tone systems. Unpublished manuscript.
Seashore, C. E. (1938). The Psychology of Music. New York: McGraw-Hill.
Seashore, C. E., Lewis, D., & Saetveit, J. (1939, 1960). Seashore Measures of
Musical Talents. New York: The Psychological Corporation.
Sergeant, D. (1969). Experimental investigation of absolute pitch. Journal of
Research in Music Education, 17, 135-143.
Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Identification of consonants and
vowels presented to left and right ears. Quarterly Journal of Experimental
Psychology, 19, 59-63.
Shattuck-Hufnagel, S., & Turk, A. E. (1996). A prosody tutorial for investigators of
auditory sentence processing. Journal of Psycholinguistic Research, 25(2),
193-247.
Shen, X. (1990). Tonal coarticulation in Mandarin. Journal of Phonetics, 18, 281-285.
Shen, X. S. (1989). Toward a register approach in teaching Mandarin tones. Journal
of Chinese Language Teachers Association, 24, 27-47.
Shepard, R. (1982a). Geometrical approximations to the structure of musical pitch.
Psychological Review, 89, 305-333.
Shepard, R. (1982b). Structural representation of musical pitch. In D. Deutsch (Ed.), The Psychology of Music (pp. 344-390). London: Academic Press.
Shove, P., & Repp, B. H. (1995). Musical motion and performance: Theoretical and
empirical perspectives. In J. Rink (Ed.), The Practice of Performance (pp. 55-83). Cambridge: Cambridge University Press.
Shuter-Dyson, R., & Gabriel, C. (1981). The psychology of musical ability. London:
Methuen.
Siegel, J. A., & Siegel, W. (1977a). Absolute identification of notes and intervals by
musicians. Perception & Psychophysics, 21, 143-152.
Siegel, J. A., & Siegel, W. (1977b). Categorical perception of tonal intervals:
Musicians can't tell sharp from flat. Perception & Psychophysics, 21, 399-407.
Sinnott, J. M., Beecher, M. D., Moody, D. B., & Stebbins, W. C. (1976). Speech
sound discrimination by monkeys and humans. Journal of the Acoustical
Society of America, 60, 687-695.
Slevc, R. L., & Miyake, A. (2006). Individual differences in second-language
proficiency: Does musical ability matter? Psychological Science, 17(8), 675-
681.
Sloboda, J., & Gregory, A. (1980). The psychological reality of musical segments.
Canadian Journal of Psychology, 34, 274-280.
Snowdon, C. (1987). A naturalistic view of categorical perception. In S. Harnad (Ed.),
Categorical Perception: The Groundwork of Cognition (pp. 332-355). New
York: Cambridge University Press.
So, L. K. H. (1996). Tonal changes in Hong Kong Cantonese. Current Issues in
Language and Society, 3(2), 186-189.
Stagray, J., & Downs, D. (1993). Differential sensitivity for frequency among
speakers of a tone and a nontone language. Journal of Chinese Linguistics,
21(1), 143-163.
Stainsby, T., Haszard Morris, R., Malloch, S., & Burnham, D. (2002). MARCS Auditory Perception Toolbox (APT). Sydney: MARCS Auditory Laboratories. Retrieved from http://marcs.uws.edu.au/research/software/apt.htm
Standley, J. M., & Hughes, J. E. (1997). Evaluation of an early intervention music
curriculum for enhancing prereading/writing skills. Music Therapy
Perspectives, 15(2), 79-85.
Stevens, K. N. (1968). On the relations between speech movements and speech
perception. Zeitschrift fuer Phonetik, Sprachwissenschaft, und
Kommunikationsforschung, 21, 102-106.
Stevens, K. N. (1981). Constraints imposed by the auditory system on the properties
used to classify speech sounds: Data from phonology, acoustics and psycho-
acoustics. In T. F. Myers, J. Laver & J. Anderson (Eds.), The Cognitive
representation of Speech. Amsterdam: North Holland.
Stevens, K. N. (1986). Models of phonetic recognition II: A feature-based model of
speech recognition. In P. Mermelstein (Ed.), Proceedings of the Montreal
Satellite Symposium on Speech Recognition (pp. 66-67). Montreal: 12th
International Congress of Acoustics.
Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics, 17, 3-
45.
Stevens, S. S., & Volkmann, J. (1940). The relation of pitch to frequency: a revised
scale. American Journal of Psychology, 53, 329–353.
Strange, W. (1972). The effects of training on the perception of synthetic speech
sounds: Voice onset time. Unpublished PhD Dissertation, University of
Minnesota.
Studdert-Kennedy, M. (1976). Speech perception. In N. J. Lass (Ed.), Contemporary
issues in experimental phonetics (pp. 243-293). New York: Academic press.
Studdert-Kennedy, M., Liberman, A. M., Harris, K. S., & Cooper, F. S. (1970). Motor
theory of speech perception: A reply to Lane's critical review. Psychological
Review, 77, 234-249.
Studdert-Kennedy, M., Liberman, A. M., & Stevens, K. N. (1963). Reaction times to
synthetic stop consonants and vowels at phoneme centers and phoneme
boundaries. Journal of the Acoustical Society of America, 35, 1900 (Abstract).
Studdert-Kennedy, M., Liberman, A. M., & Stevens, K. N. (1964). Reaction time
during the discrimination of stop consonants. Journal of the Acoustical Society
of America, 36, 1989.
Studdert-Kennedy, M., & Shankweiler, D. P. (1970). Hemispheric specialization for
speech perception. Journal of the Acoustical Society of America, 48, 579-594.
Summerfield, Q. (1982). Differences between spectral dependencies in auditory and
phonetic temporal processing: Relevance to the perception of voicing in initial
stops. Journal of the Acoustical Society of America 72 (1), 51-61.
Sundberg, J. (1999). The Perception of Singing. In D. Deutsch (Ed.), The Psychology
of Music (pp. 171-214). New York: Academic Press.
Surprenant, A. M., & Watson, C. S. (2001). Individual differences in the processing
of speech and nonspeech sounds by normal-hearing listeners. Journal of the
Acoustical Society of America, 110, 2085-2095.
Swoboda, P. J., Morse, P. A., & Leavitt, L. A. (1976). Continuous vowel
discrimination in normal and at risk infants. Child Development, 47(2), 459-
465.
Syrdal-Lasky, A. (1978). Effects of intensity on the categorical perception of stop
consonants and isolated second formant transitions. Perception &
Psychophysics, 23, 420-432.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.).
Needham Heights, MA: Allyn and Bacon.
Tahta, S., Wood, M., & Loewenthal, K. (1981). Foreign accents: factors relating to
the transfer of accent from the first to the second language. Language and
Speech, 24(4), 265-272.
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin,
113(2), 345-361.
Tartter, V. C. (1981). A comparison of the identification and discrimination of
synthetic vowel and stop consonant stimuli with various acoustic properties.
Journal of Phonetics, 9, 477-486.
Thogmartin, C. (1982). Age, individual differences in musical and verbal aptitude,
and pronunciation achievement by elementary school children learning a
foreign language. International Review of Applied Linguistics in Language
Teaching, 20(1), 66-72.
Thompson, L. (1987). A Vietnamese Reference Grammar. Hawaii: University of
Hawaii.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech
prosody: Do music lessons help? Emotion, 4(1), 46-64.
Todd, N. P. M. (1992). The dynamics of dynamics: A model of musical expression.
Journal of the Acoustical Society of America, 91, 3540–3550.
Trainor, L. J., & Trehub, S. E. (1992). A comparison of infants' and adults' sensitivity
to Western musical structure. Journal of Experimental Psychology: Human
Perception & Performance, 18, 394-402.
Trainor, L. J., & Trehub, S. E. (1993). What mediates infants' and adults' superior
processing of the major over the augmented triad. Music Perception, 11, 185-
196.
Trehub, S. E. (1973). Infants' sensitivity to vowel and tonal contrasts. Developmental
Psychology, 9(1), 91-96.
Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The
role of melodic contour. Child Development, 55, 821-830.
Trehub, S. E., Schellenberg, E., & Hill, D. (1997). Music perception and cognition: A
developmental perspective. In I. Deliege & J. Sloboda (Eds.), Music
Perception and Cognition (pp. 121-162). Sussex: Psychology Press.
Trehub, S. E., Thorpe, L. A., & Morrongiello, B. A. (1987). Organizational processes
in infants' perception of auditory patterns. Child Development, 58, 741-749.
Trehub, S. E., Thorpe, L. A., & Trainor, L. J. (1990). Infants' perception of good and
bad melodies. Psychomusicology, 9, 5–19.
Trehub, S. E., & Trainor, L. J. (1993). Listening strategies in infants: The roots of
language and musical development. In S. McAdams & E. Bigand (Eds.),
Thinking in sound: Cognitive perspectives on human audition (pp. 278-327).
London: Oxford University Press.
Truslit, A. (1938). Gestaltung und Bewegung in der Musik. Berlin.
Tse, J. K.-P. (1977). Tone acquisition in Cantonese: A longitudinal case study.
Unpublished manuscript, University of Southern California.
Tseng, C., Massaro, D. W., & Cohen, M. (1986). Lexical tone perception in Mandarin
Chinese. In H. Kao & R. Hoosain (Eds.), Evaluation and Integration of
Acoustic Features, in Linguistics, Psychology, and the Chinese Language (pp.
91-104). Hong Kong: Centre of Asian Studies.
Tuaycharoen, P. (1977). The phonetic and phonological development of a Thai baby:
From early communicative interaction to speech. Unpublished PhD
Dissertation, University of London.
Van Hessen, A. J., & Schouten, M. E. H. (1999). Categorical perception as a function
of stimulus quality. Phonetica, 56(1-2), 56-72.
Van Lancker, D. (1980). Cerebral lateralization of pitch cues in the linguistic signal.
Papers in Linguistics, 13(2), 201-277.
Van Lancker, D., & Fromkin, V. (1973). Hemispheric specialization for pitch and
"tone": Evidence from Thai. Journal of Phonetics, 1, 101-109.
Van Lancker, D., & Fromkin, V. (1978). Cerebral dominance for pitch contrasts in
tone language speakers and in musically untrained and trained English
speakers. Journal of Phonetics, 6, 19-23.
Vaughn, K. (2000). Music and mathematics: Modest support for the oft-claimed
relationship. Journal of Aesthetic Education, 34(3-4), 149-166.
Vinegrad, M. D. (1972). A direct magnitude scaling method to investigate categorical
versus continuous modes of speech perception. Language and Speech, 15,
114-121.
Vu, T., Nguyen, D., Luong, M., & Hosom, J.-P. (2005). Vietnamese large vocabulary
continuous speech recognition. Paper presented at the INTERSPEECH 2005,
Lisboa, Portugal.
Wang, W. (1971). The basis of speech. In C. Reed (Ed.), The Learning of Language.
New York: Appleton-Century-Crofts.
Wang, W. (1976). Language change. Origins and evolution of language and speech.
Annals of the New York Academy of Sciences, 280, 61-72.
Wang, Y., Jongman, A., & Sereno, J. A. (2001). Dichotic perception of Mandarin
tones by Chinese and American listeners. Brain and Language, 78(3), 332-
348.
Wang, Y., Jongman, A., & Sereno, J. A. (2003). Acoustic and perceptual evaluation
of Mandarin tone productions before and after perceptual training. Journal of
the Acoustical Society of America, 113, 1033-1044.
Wang, Y., Sereno, J., Jongman, A., & Hirsch, J. (2001). Cortical reorganization
associated with the acquisition of Mandarin tones by American learners: An
fMRI study. Paper presented at the Sixth International Conference on Spoken
Language Processing, Beijing.
Wang, Y., Sereno, J. A., Jongman, A., & Hirsch, J. (2003). fMRI evidence for cortical
modification during learning of Mandarin lexical tone. Journal of Cognitive
Neuroscience, 15, 1019-1027.
Wang, Y., & Spence, M. (1999). Training American listeners to perceive Mandarin
tone. Journal of the Acoustical Society of America, 106, 3649-3658.
Ward, W. D. (1999). Absolute Pitch. In D. Deutsch (Ed.), The Psychology of Music (pp.
265-298). New York: Academic Press.
Ward, W. D., & Burns, E. M. (1982). Absolute Pitch. In D. Deutsch (Ed.), The
Psychology of Music (pp. 431-451). New York: Academic Press.
Ward, W. D., & Burns, E. M. (1978). Singing without auditory feedback. Journal of
Research in Singing and Applied Vocal Pedagogy, 1(2), 24-44.
Waters, R. S., & Wilson, W. A. (1976). Speech perception in rhesus monkeys: The
voicing distinction in synthesized labial and velar stop consonants. Perception
& Psychophysics, 19, 285-289.
Wayland, R., & Guion, S. (2003). Perceptual discrimination of Thai tones by naive
and experienced learners of Thai. Applied Psycholinguistics, 24, 113-129.
Werker, J. F., & Logan, J. S. (1985). Cross-language evidence for three factors in
speech perception. Perception & Psychophysics, 37, 35-44.
Werker, J. F., & Tees, R. C. (1984). Phonemic and phonetic factors in adult cross-
language speech perception. Journal of the Acoustical Society of America, 75,
1866-1878.
Werker, J. F., & Tees, R. C. (1992). The organization and reorganization of human
speech perception. Annual Review of Neuroscience, 15, 377-402.
Wernicke, C. (1874). The aphasia symptom-complex: A psychological study on an
anatomical basis.
Westphal, M. E., Leutenegger, R. R., & Wagner, D. L. (1969). Some psycho-acoustic
and intellectual correlates of achievement in German language learning of
Junior High School students. The Modern Language Journal, 53(4), 258-266.
Whalen, D. H., & Xu, Y. (1992). Information for Mandarin tones in the amplitude
contour and in brief segments. Phonetica, 49, 25-47.
White, C. M. (1981). Tonal perception errors and interference from English
intonation. Journal of Chinese Language Teachers Association, 16, 27-56.
Williams, C., & Stevens, K. N. (1972). Emotions and speech: Some acoustical
correlates. Journal of the Acoustical Society of America, 52(4), 1238-1250.
Williams, L. (1977). The perception of stop consonant voicing by Spanish-English
bilinguals. Perception & Psychophysics, 21, 289-297.
Wing, A. M., & Kristofferson, A. B. (1973). The timing of interresponse intervals.
Perception & Psychophysics, 13, 455–460.
Wise, C. M., & Chong, L. P.-H. (1957). Intelligibility of whispering in a tone
language. Journal of Speech and Hearing Disorders, 22, 335-338.
Wong, P. (2002). Hemispheric specialization of linguistic pitch patterns. Brain
Research Bulletin, 59(2), 83-95.
Wong, P., & Diehl, R. L. (2002). How can the lyrics of a song in a tone language be
understood? Psychology of Music, 30(2), 202-209.
Wong, P., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience
shapes human brainstem encoding of linguistic pitch patterns. Nature
Neuroscience, in press.
Wood, C. C. (1976). Discriminability, response bias, and phoneme categories in
discrimination of voice onset time. Journal of the Acoustical Society of
America, 60, 1381-1389.
Xu, Y. (1994). Production and perception of coarticulated tones. Journal of the
Acoustical Society of America, 95, 2240-2253.
Xu, Y., Gandour, J., & Francis, A. L. (2006). Effects of language experience and
stimulus complexity on the categorical perception of pitch direction. Journal
of the Acoustical Society of America, 120(2), 1063-1074.
Yip, M. (2001). Tone. Cambridge: Cambridge University Press.
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Yiu, E., & Fok, A. (1995). Lexical tone disruption in Cantonese aphasic speakers.
Clinical Linguistics & Phonetics, 9, 79-92.
Yung, B. (1983). Creative process in Cantonese opera I: The role of linguistic tones.
Ethnomusicology, 27, 29-47.
Zatorre, R., & Halpern, A. (1979). Identification, discrimination, and selective
adaptation of simultaneous musical intervals. Perception & Psychophysics,
26(5), 384-395.
Zatorre, R. J. (2003). Absolute pitch: a model for understanding the influence of
genes and development on neural and cognitive function. Nature
Neuroscience, 6(7), 692-695.
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory
cortex: Music and speech. Trends in Cognitive Sciences, 6(1), 37-46.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998).
Functional anatomy of musical processing in listeners with absolute pitch and
relative pitch. Proceedings of the National Academy of Sciences of the United
States of America, 95(6), 3172-3177.
Zue, V. (1976). Some perceptual experiments on the Mandarin tones. Paper presented
at the 92nd Meeting of the Acoustical Society of America, San Diego,
California.
Appendix A6.1 Consent Form and Questionnaire

MARCS Auditory Laboratories
College of Arts, Education and Social Sciences
Denis Burnham Professor of Psychology, Director MARCS
Phone: (+612) 9772 6681 Fax: (+612) 9772 6736 Email: [email protected] Web: www.uws.edu.au/marcs/
Speech perception across different languages
October 2003
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily adult humans produce speech sounds in their native language, and how easily humans perceive that acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of English, Thai, Vietnamese and Mandarin. You are invited to participate because you are a native speaker of English, Thai, Vietnamese or Mandarin.

If you participate, you will complete a 45-minute session in which you will be asked to identify and discriminate short sound items. If you choose to participate, you will receive $15 at the completion of the study, in reimbursement of travel costs.

Participation is voluntary. You have a right not to participate in, or subsequently withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney or the University of New South Wales. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Caroline Jones on 9772 6230. Please take time now to ask any questions you may have. There are no anticipated risks to your participation. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee, and ratified by the UNSW Ethics Secretariat.
If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: 02 4570 1136). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages

CONSENT FORM

Please read the information sheet before signing this.

1. Yes / No  I, .............................................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes / No  I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment and the nature and the possible risks of the investigation, and the statement has been explained to me to my satisfaction.
3. Yes / No  I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship to the University of Western Sydney.
4. Yes / No  I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes / No  I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes / No  I understand that if I have any questions relating to my participation in this research, I may contact Caroline Jones (9772 6230) who will be happy to answer them.
7. Yes / No  I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes / No  I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee, and ratified by the UNSW Ethics Secretariat. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: 02 4570 1136). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.

Participant's signature: …………………………………..   Date: …………………………………..
Speech perception across different languages

PARTICIPANT QUESTIONNAIRE

Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1. Your initials: .............................
2. Age: _____ years, _____ months
3. Gender: Male / Female (please circle)
4. Hearing: Do you have normal hearing? Yes / No (please circle)
5. Speech/language history: Do you have any history of speech/language problems? Yes / No
6. Language background: Please list all languages which you speak natively, i.e. which you learned from birth.
   Native language/s: ……………………………………..
   Please also list all languages which you have some knowledge of, and indicate how old you were when you started learning each language.
   Other language/s you have knowledge of: ……………………………………..
   Age at which you started learning this language: …………………………………….
7. Musical background: Please list all the musical instruments which you play or have played, and indicate for how long you play/ed each instrument, e.g. violin, 5 years.
   Instrument: ………………………   Years of playing: ……………
8. Place of birth (City/town & Country): .................................................................
9. If your Mandarin / Vietnamese / Thai dialect is typical of a particular area/city (e.g. Beijing, Hong Kong), please write the name of the place here: ........................................................................................
THANK YOU!
Appendix A6.2 DMDX Script - Identification

<ep><nfbt><dfm 1><n 480><s 480><d 75><azk><cr><fd 100><id "keyboard"><dbc 0><dwc 255000000><eop>
$0 <ln -3>"In this experiment, you'll identify sounds.", <ln -2>"You'll press the LEFT key", <ln -1>"when you hear one kind of sound.", <ln 1>"And you'll press the RIGHT key", <ln 2>"when you hear the other kind of sound.", <ln 4>"Before you start the experiment,", <ln 5>"let's do some practice.", <ln 6>"Please press the spacebar to continue.";
0 <ln -3>"PRACTICE TRIALS - Please press the keys for practice!", <ln 3>"Please press the spacebar to start the practice.";
+550 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-551 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
+552 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-553 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+554 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-555 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+556 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-557 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
0 <ln -3>"Good. Now you'll get to practise this some more.", <ln 0>"When you get 8 in a row correct,", <ln 1>"you'll move on to the testing phase.", <ln 3>"Please press the spacebar to continue.";
0 <ln -3>"TRAINING TRIALS", <ln 3>"Please press the spacebar to start the training.";
2000 <set 1,2>;$
\+301 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
-302 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
+303 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
-304 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-4000>/;
+201 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
-202 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
+203 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-4000>/;
-204 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-4000>/;\
$4000 <bicGT 1,1,-12000>;$
\+305 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
-306 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
+307 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
-308 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-6000>/;
+205 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
-206 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
+207 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-6000>/;
-208 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-6000>/;\
$6000 <bicGT 1,1,-12000>;$
\+309 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
-310 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
+311 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
-312 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-8000>/;
+209 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
-210 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
+211 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-8000>/;
-212 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-8000>/;\
$8000 <bicGT 1,1,-12000>;$
\+313 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
-314 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
+315 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
-316 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-10000>/;
+213 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
-214 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
+215 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-10000>/;
-216 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-10000>/;\
$10000 <bicGT 1,1,-12000>;$
\+317 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
-318 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
+319 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
-320 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-2000>/;
+217 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
-218 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
+219 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-2000>/;
-220 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-2000>/;\
$12000;
0 <ln -3>"Great, 8 out of 8!", <ln -2>"Now you'll move on to the experiment", <ln -1>"The experiment consists of two blocks.", <ln 0>"There will be a break in each of the blocks and between them.", <ln 2>"Please press the key that best corresponds to what you hear", <ln 4>"Please respond as quickly and accurately as you can", <ln 5>"Press the spacebar to begin the experiment";$
\-121 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-122 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-123 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-124 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-125 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-51 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-52 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-53 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-54 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-55 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-111 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-112 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-113 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-114 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-115 <nfb>"ready"/<wav 2>*"200200lsin.wav"/;
-41 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-42 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-43 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-44 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-45 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-101 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-102 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-103 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-104 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-105 <nfb>"ready"/<wav 2>*"200190lsin.wav"/;
-31 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-32 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-33 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-34 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-35 <nfb>"ready"/<wav 2>*"200190lspn.wav"/;
-91 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-92 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-93 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-94 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-95 <nfb>"ready"/<wav 2>*"200180lsin.wav"/;
-21 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-22 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-23 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-24 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-25 <nfb>"ready"/<wav 2>*"200180lspn.wav"/;
-81 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-82 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-83 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-84 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-85 <nfb>"ready"/<wav 2>*"200170lsin.wav"/;
-11 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-12 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-13 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-14 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-15 <nfb>"ready"/<wav 2>*"200170lspn.wav"/;
-131 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-132 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-133 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-134 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-135 <nfb>"ready"/<wav 2>*"200220lsin.wav"/;
-61 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-62 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-63 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-64 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-65 <nfb>"ready"/<wav 2>*"200220lspn.wav"/;
-71 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-72 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-73 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-74 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-75 <nfb>"ready"/<wav 2>*"200160lsin.wav"/;
-1 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-2 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-3 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-4 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;
-5 <nfb>"ready"/<wav 2>*"200160lspn.wav"/;\
$0 <ln -4>"THANK YOU for your effort!", <ln -3>"In the second part, you'll also identify sounds.", <ln -2>"You'll press the LEFT key", <ln -1>"when you hear one kind of sound.", <ln 1>"And you'll press the RIGHT key", <ln 2>"when you hear the other kind of sound.", <ln 4>"Before you start the experiment,", <ln 5>"let's do some practice.", <ln 6>"Please press the spacebar to continue.";
0 <ln -3>"PRACTICE TRIALS - Please press the keys for practice!", <ln 3>"Please press the spacebar to start the practice.";
+558 <nfb 0> "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-559 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
+560 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-561 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+562 <nfb 0> "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lsin.wav"/;
-563 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lspn.wav"/;
+564 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"200220lspn.wav"/;
-565 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"200160lsin.wav"/;
0 <ln -3>"Good. Now you'll get to practise this some more.", <ln 0>"When you get 8 in a row correct,", <ln 1>"you'll move on to the testing phase.", <ln 3>"Please press the spacebar to continue.";
0 <ln -3>"TRAINING TRIALS", <ln 3>"Please press the spacebar to start the training.";
3000 <set 1,2>;$
\+221 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
-222 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
+223 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
-224 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-14000>/;
+321 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
-322 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
+323 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-14000>/;
-324 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-14000>/;\
$14000 <bicGT 1,1,-21000>;$
\+225 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
-226 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
+227 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
-228 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-16000>/;
+325 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
-326 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
+327 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-16000>/;
-328 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-16000>/;\
$16000 <bicGT 1,1,-21000>;$
\+229 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
-230 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
+231 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
-232 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-18000>/;
+329 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
-330 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
+331 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-18000>/;
-332 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-18000>/;\
$18000 <bicGT 1,1,-21000>;$
\+233 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
-234 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
+235 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
-236 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-20000>/;
+333 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
-334 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
+335 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-20000>/;
-336 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-20000>/;\
$20000 <bicGT 1,1,-21000>;$
\+237 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
-238 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
+239 <set 1,2>"ready"/<wav 2>*"200220lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
-240 <set 1,2>"ready"/<wav 2>*"200160lspn.wav"<deciw 1><bicLE 1,1,-3000>/;
+337 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
-338 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
+339 <set 1,2>"ready"/<wav 2>*"200220lsin.wav"<deciw 1><bicLE 1,1,-3000>/;
-340 <set 1,2>"ready"/<wav 2>*"200160lsin.wav"<deciw 1><bicLE 1,1,-3000>/;\
$21000;
0 <ln -3>"Great, 8 out of 8!", <ln -1>"Now you'll move on to the experiment", <ln 0>"Please press the key that best corresponds to what you hear", <ln 2>"Please respond as quickly and accurately as you can", <ln 4>"Press the spacebar to begin the experiment";$
\-56 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-57 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-58 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-59 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-60 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
-126 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-127 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-128 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-129 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-130 <nfb>"ready"/<wav 2>*"200210lsin.wav"/;
-46 <nfb>"ready"/<wav 2>*"200200lspn.wav"/;
-47 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -48 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -49 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -50 <nfb>“ready”/<wav 2>*”200200lspn.wav”/; -116 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -117 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -118 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -119 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -120 <nfb>“ready”/<wav 2>*”200200lsin.wav”/; -36 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -37 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -38 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -39 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -40 <nfb>“ready”/<wav 2>*”200190lspn.wav”/; -106 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -107 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -108 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -109 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -110 <nfb>“ready”/<wav 2>*”200190lsin.wav”/; -26 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -27 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -28 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -29 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -30 <nfb>“ready”/<wav 2>*”200180lspn.wav”/; -96 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -97 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -98 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -99 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -100 <nfb>“ready”/<wav 2>*”200180lsin.wav”/; -16 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -17 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -18 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -19 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -20 <nfb>“ready”/<wav 2>*”200170lspn.wav”/; -86 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -87 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -88 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -89 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -90 <nfb>“ready”/<wav 2>*”200170lsin.wav”/; -66 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -67 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -68 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -69 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -70 <nfb>“ready”/<wav 2>*”200220lspn.wav”/; -136 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -137 
<nfb>“ready”/<wav 2>*”200220lsin.wav”/; -138 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -139 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -140 <nfb>“ready”/<wav 2>*”200220lsin.wav”/; -6 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -7 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -8 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -9 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -10 <nfb>“ready”/<wav 2>*”200160lspn.wav”/; -76 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -77 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -78 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -79 <nfb>“ready”/<wav 2>*”200160lsin.wav”/; -80 <nfb>“ready”/<wav 2>*”200160lsin.wav”/;\ $0 <ln -4> “End of experiment. THANK YOU!”;$
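The identification script above repeats the same trial specification many times, with only the item number and wav filename changing. A short generator makes the pattern explicit; this is a hypothetical helper (the function name, argument names, and the assumption that filenames encode standard frequency, comparison frequency, and stimulus type as in "200210lspn.wav" are mine, not part of the original materials):

```python
# Hypothetical generator for the repetitive DMDX no-feedback trial lines
# above. Filenames such as "200210lspn.wav" appear to encode the standard
# (200), the comparison frequency (e.g. 210), and the stimulus type
# (lspn = speech, lsin = sine wave); this naming scheme is an assumption.
def dmdx_trials(first_item, comparison, stim, n=5, standard=200):
    """Return n negative-response ("same as standard") trial lines."""
    lines = []
    for i in range(n):
        wav = f"{standard}{comparison}l{stim}.wav"
        lines.append(f'-{first_item + i} <nfb>"ready"/<wav 2>*"{wav}"/;')
    return lines

print(dmdx_trials(56, 210, "spn")[0])
# -56 <nfb>"ready"/<wav 2>*"200210lspn.wav"/;
```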
Appendix A6.3 DMDX Script - Discrimination <ep><cr><nfbt><t 1500><dfm 1><n 176><s 176><d 59><azk><fd 120><id "keyboard"><dbc 0><dwc 000255000><eop> $0 <ln -4>"In this task, you’ll hear two sounds in close succession”, <ln -2>”If they’re the SAME sound, press the LEFT shift key”, <ln -1>”If they’re DIFFERENT sounds, press the RIGHT shift key”, <ln 2>”Please respond as quickly and accurately as you can”, <ln 3>”If you’re really not sure, please respond with your first impression”, <ln 5>”Please press the spacebar to do 8 practice items”; $ \ +501 <nfb 0>“test”/<wav 2>*"160170lsin500.wav”; +502 <nfb 0>“test”/<wav 2>*"160170lsin500.wav”; +503 <nfb 0>“test”/<wav 2>*"170160lsin500.wav”; +504 <nfb 0>“test”/<wav 2>*"170160lsin500.wav”; -505 <nfb 0>“test”/<wav 2>*"200160lsinsame500.wav”; -506 <nfb 0>“test”/<wav 2>*"200180lsinsame500.wav”; -507 <nfb 0>“test”/<wav 2>*"200210lsinsame500.wav”; -508 <nfb 0>“test”/<wav 2>*"200170lsinsame500.wav”; \ $0 “Great, thanks. Please press the spacebar to start the testing.”;$ \ -1 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -2 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -3 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -4 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -9 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -10 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -11 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -12 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -17 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -18 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -19 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -20 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -25 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -26 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -27 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -28 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -33 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -34 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -35 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -36 <nfb> “test”/<wav 
2>*"200200lsinsame500.wav"; -41 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -42 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -43 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -44 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -49 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -50 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -51 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -52 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; +61 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +62 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +63 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +91 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +92 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +66 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +67 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +96 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +97 <nfb> “test”/<wav 2>*"180170lsin500.wav";
+98 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +71 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +72 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +73 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +101 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +102 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +76 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +77 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +78 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +106 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +107 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +81 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +82 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +83 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +111 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +112 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +86 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +87 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +116 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +117 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +118 <nfb> “test”/<wav 2>*"220210lsin500.wav"; \ $0 <ln -4>“Great, good going. 
You’re half-way through the first part.”, <ln -2> “Please press the spacebar to continue.”;$ \ -5 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -6 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -7 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -8 <nfb> “test”/<wav 2>*"200160lsinsame500.wav"; -13 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -14 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -15 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -16 <nfb> “test”/<wav 2>*"200170lsinsame500.wav"; -21 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -22 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -23 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -24 <nfb> “test”/<wav 2>*"200180lsinsame500.wav"; -29 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -30 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -31 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -32 <nfb> “test”/<wav 2>*"200190lsinsame500.wav"; -37 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -38 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -39 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -40 <nfb> “test”/<wav 2>*"200200lsinsame500.wav"; -45 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -46 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -47 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -48 <nfb> “test”/<wav 2>*"200210lsinsame500.wav"; -53 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -54 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -55 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; -56 <nfb> “test”/<wav 2>*"200220lsinsame500.wav"; +64 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +65 <nfb> “test”/<wav 2>*"160170lsin500.wav"; +93 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +94 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +95 <nfb> “test”/<wav 2>*"170160lsin500.wav"; +68 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +69 <nfb> “test”/<wav 2>*"170180lsin500.wav";
+70 <nfb> “test”/<wav 2>*"170180lsin500.wav"; +99 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +100 <nfb> “test”/<wav 2>*"180170lsin500.wav"; +74 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +75 <nfb> “test”/<wav 2>*"180190lsin500.wav"; +103 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +104 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +105 <nfb> “test”/<wav 2>*"190180lsin500.wav"; +78 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +79 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +80 <nfb> “test”/<wav 2>*"190200lsin500.wav"; +109 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +110 <nfb> “test”/<wav 2>*"200190lsin500.wav"; +84 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +85 <nfb> “test”/<wav 2>*"200210lsin500.wav"; +113 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +114 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +115 <nfb> “test”/<wav 2>*"210200lsin500.wav"; +88 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +89 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +90 <nfb> “test”/<wav 2>*"210220lsin500.wav"; +119 <nfb> “test”/<wav 2>*"220210lsin500.wav"; +120 <nfb> “test”/<wav 2>*"220210lsin500.wav"; \ $0 “End of part one. Thanks a lot.”; 0 <ln -4>"In the second part, you’ll again hear two sounds in close succession”, <ln -2>”If they’re the SAME sound, press the LEFT shift key”, <ln -1>”If they’re DIFFERENT sounds, press the RIGHT shift key”, <ln 2>”Please respond as quickly and accurately as you can”, <ln 3>”If you’re really not sure, please respond with your first impression”, <ln 5>”Please press the spacebar to do 8 practice items”; $ \ +509 <nfb 0>“test”/<wav 2>*"160170lspn500.wav”; +510 <nfb 0>“test”/<wav 2>*"160170lspn500.wav”; +511 <nfb 0>“test”/<wav 2>*"170160lspn500.wav”; +512 <nfb 0>“test”/<wav 2>*"170160lspn500.wav”; -513 <nfb 0>“test”/<wav 2>*"200160lspnsame500.wav”; -514 <nfb 0>“test”/<wav 2>*"200180lspnsame500.wav”; -515 <nfb 0>“test”/<wav 2>*"200210lspnsame500.wav”; -516 <nfb 0>“test”/<wav 2>*"200170lspnsame500.wav”; \ $0 “Great, thanks. 
Please press the spacebar to start the testing.”;$ \ -1 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -2 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -3 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -4 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -9 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -10 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -11 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -12 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -17 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -18 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -19 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -20 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -25 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -26 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -27 <nfb> “test”/<wav 2>*"200190lspnsame500.wav";
-28 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -33 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -34 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -35 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -36 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -41 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -42 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -43 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -44 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -49 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -50 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -51 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -52 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; +61 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +62 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +63 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +91 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +92 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +66 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +67 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +96 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +97 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +98 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +71 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +72 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +73 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +101 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +102 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +76 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +77 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +106 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +107 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +108 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +81 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +82 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +83 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +111 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +112 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +86 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +87 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +116 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +117 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +118 
<nfb> “test”/<wav 2>*"220210lspn500.wav"; \ $0 <ln -4>“Great, good going. You’re half-way throughthe second part.”, <ln -2> “Please press the spacebar to continue.”;$ \ -5 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -6 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -7 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -8 <nfb> “test”/<wav 2>*"200160lspnsame500.wav"; -13 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -14 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -15 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -16 <nfb> “test”/<wav 2>*"200170lspnsame500.wav"; -21 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -22 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -23 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -24 <nfb> “test”/<wav 2>*"200180lspnsame500.wav"; -29 <nfb> “test”/<wav 2>*"200190lspnsame500.wav";
-30 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -31 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -32 <nfb> “test”/<wav 2>*"200190lspnsame500.wav"; -37 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -38 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -39 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -40 <nfb> “test”/<wav 2>*"200200lspnsame500.wav"; -45 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -46 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -47 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -48 <nfb> “test”/<wav 2>*"200210lspnsame500.wav"; -53 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -54 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -55 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; -56 <nfb> “test”/<wav 2>*"200220lspnsame500.wav"; +64 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +65 <nfb> “test”/<wav 2>*"160170lspn500.wav"; +93 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +94 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +95 <nfb> “test”/<wav 2>*"170160lspn500.wav"; +68 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +69 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +70 <nfb> “test”/<wav 2>*"170180lspn500.wav"; +99 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +100 <nfb> “test”/<wav 2>*"180170lspn500.wav"; +74 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +75 <nfb> “test”/<wav 2>*"180190lspn500.wav"; +103 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +104 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +105 <nfb> “test”/<wav 2>*"190180lspn500.wav"; +78 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +79 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +80 <nfb> “test”/<wav 2>*"190200lspn500.wav"; +109 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +110 <nfb> “test”/<wav 2>*"200190lspn500.wav"; +84 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +85 <nfb> “test”/<wav 2>*"200210lspn500.wav"; +113 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +114 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +115 <nfb> “test”/<wav 2>*"210200lspn500.wav"; +88 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +89 <nfb> “test”/<wav 
2>*"210220lspn500.wav"; +90 <nfb> “test”/<wav 2>*"210220lspn500.wav"; +119 <nfb> “test”/<wav 2>*"220210lspn500.wav"; +120 <nfb> “test”/<wav 2>*"220210lspn500.wav"; \ $0 “End of experiment. Thanks a lot.”;$
Appendix A6.4 Raw Data – Criterion

Tone type   Language   Presentation manner   Stimulus (sine/speech or set 1/2)   Criterion score
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 13
tonal Thai blocked speech 10
tonal Thai blocked sine 8
tonal Thai blocked speech 9
tonal Thai blocked sine 13
tonal Thai blocked speech 10
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Thai blocked sine 36
tonal Thai blocked speech 8
tonal Thai blocked sine 8
tonal Thai blocked speech 8
tonal Vietnamese blocked sine 12
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 10
tonal Vietnamese blocked speech 15
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 11
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 20
tonal Vietnamese blocked sine 9
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Vietnamese blocked sine 8
tonal Vietnamese blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 14
tonal Mandarin blocked sine 11
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 10
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 10
tonal Mandarin blocked speech 8
tonal Mandarin blocked sine 8
tonal Mandarin blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 13
non-tonal Aust. English blocked sine 8
non-tonal Aust. English blocked speech 8
tonal Thai mixed set 1 15
tonal Thai mixed set 2 8
tonal Thai mixed set 1 39
tonal Thai mixed set 2 8
tonal Thai mixed set 1 43
tonal Thai mixed set 2 19
tonal Thai mixed set 1 11
tonal Thai mixed set 2 8
tonal Thai mixed set 1 8
tonal Thai mixed set 2 8
tonal Thai mixed set 1 21
tonal Thai mixed set 2 8
tonal Thai mixed set 1 28
tonal Thai mixed set 2 8
tonal Thai mixed set 1 8
tonal Thai mixed set 2 8
tonal Vietnamese mixed set 1 19
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 42
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 43
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 10
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 36
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 11
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 8
tonal Vietnamese mixed set 2 8
tonal Vietnamese mixed set 1 11
tonal Vietnamese mixed set 2 11
tonal Mandarin mixed set 1 17
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 23
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 43
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 29
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 12
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 13
tonal Mandarin mixed set 2 8
tonal Mandarin mixed set 1 12
tonal Mandarin mixed set 2 11
tonal Mandarin mixed set 1 16
tonal Mandarin mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 11
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 27
non-tonal Aust. English mixed set 2 21
non-tonal Aust. English mixed set 1 16
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 8
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 9
non-tonal Aust. English mixed set 2 8
non-tonal Aust. English mixed set 1 22
non-tonal Aust. English mixed set 2 8
Appendix A6.5 Statistical Analyses – Criterion

T-Test: Blocked vs. Mixed

Group Statistics
------------------------------------------------
Group (mixblock)    N    Mean    Std. Deviation    Std. Error Mean
block              64    9.41     4.019             .502
mix                64   14.27    10.267            1.283
------------------------------------------------

Independent Samples Test (criterion)
Levene's Test for Equality of Variances: F = 36.504, Sig. = .000
t-test for Equality of Means:
Equal variances assumed:     t = -3.526, df = 126, Sig. (2-tailed) = .001,
  Mean Difference = -4.859, Std. Error Difference = 1.378,
  95% Confidence Interval of the Difference [-7.587, -2.132]
Equal variances not assumed: t = -3.526, df = 81.863, Sig. (2-tailed) = .001,
  Mean Difference = -4.859, Std. Error Difference = 1.378,
  95% Confidence Interval of the Difference [-7.601, -2.118]
ANOVA Blocked
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 25.521 1 25.521 1.678
B2 22.042 1 22.042 1.450
B3 6.125 1 6.125 0.403
Error 425.750 28 15.205
------------------------------------------------
Within
------------------------------------------------
W1 1.563 1 1.563 0.096
B1W1 4.688 1 4.688 0.289
B2W1 6.000 1 6.000 0.370
B3W1 72.000 1 72.000 4.443
Error 453.750 28 16.205
------------------------------------------------
ANOVA Mixed
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 148.755 1 148.755 1.628
B2 10.010 1 10.010 0.110
B3 0.031 1 0.031 0.000
Error 2558.188 28 91.364
------------------------------------------------
Within
------------------------------------------------
W1 1816.891 1 1816.891 27.337
B1W1 236.297 1 236.297 3.555
B2W1 2.344 1 2.344 0.035
B3W1 7.031 1 7.031 0.106
Error 1860.938 28 66.462
------------------------------------------------
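Each F ratio in these summary tables is simply the effect mean square divided by the corresponding error mean square; a minimal check using two entries copied from the tables above:

```python
# F = MS(effect) / MS(error), with MS values from the ANOVA tables above
f_b3w1_blocked = 72.000 / 16.205    # blocked analysis, B3W1 interaction
f_w1_mixed = 1816.891 / 66.462      # mixed analysis, W1 main effect

print(round(f_b3w1_blocked, 3))     # 4.443
print(round(f_w1_mixed, 3))         # 27.337
```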
Appendix A6.6 Raw Data – Crossover Values

Language   Presentation manner   Set   Crossover sine-wave   Crossover speech
Thai mixed 1 200 205
Thai mixed 1 181 188
Thai mixed 1 182 201
Thai mixed 1 199 205
Thai mixed 1 191 209
Thai mixed 1 183 194.9
Thai mixed 1 175 183
Thai mixed 1 202 198
Thai blocked 2 204 190
Thai blocked 2 194 191
Thai blocked 2 198 201
Thai blocked 2 205 198
Thai blocked 2 193 192
Thai blocked 2 194.9 209
Thai blocked 2 187 188
Thai blocked 2 193 201
Mandarin mixed 1 171 189
Mandarin mixed 1 179 189
Mandarin mixed 1 194 179
Mandarin mixed 1 179 199
Mandarin mixed 1 183 188
Mandarin mixed 1 170 177
Mandarin mixed 1 176 184
Mandarin mixed 1 186 193
Mandarin blocked 2 184 192
Mandarin blocked 2 189 190
Mandarin blocked 2 184.7 194
Mandarin blocked 2 185 185
Mandarin blocked 2 184 181
Mandarin blocked 2 171 181
Mandarin blocked 2 181 182
Mandarin blocked 2 191 201
Vietnamese mixed 1 180 191
Vietnamese mixed 1 195 192
Vietnamese mixed 1 181 186
Vietnamese mixed 1 185 188
Vietnamese mixed 1 189 191
Vietnamese mixed 1 181 193
Vietnamese mixed 1 180 191
Vietnamese mixed 1 183 164
Vietnamese blocked 2 193 191
Vietnamese blocked 2 186 193
Vietnamese blocked 2 193 195
Vietnamese blocked 2 186 190
Vietnamese blocked 2 180 190
Vietnamese blocked 2 200 195
Vietnamese blocked 2 201 191
Vietnamese blocked 2 188 191
Australian mixed 1 191 195
Australian mixed 1 177 189
Australian mixed 1 196 194
Australian mixed 1 193 181
Australian mixed 1 190 193
Australian mixed 1 192 191
Australian mixed 1 198 200
Australian mixed 1 190 199
Australian blocked 2 193 193
Australian blocked 2 198 200
Australian blocked 2 199 196
Australian blocked 2 195 195
Australian blocked 2 186 184
Australian blocked 2 193 208
Australian blocked 2 189 196
Australian blocked 2 190 193
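Cell means for the crossover data can be recovered directly from the rows above; for example, for the eight Thai mixed-presentation sine-wave crossovers (values transcribed from the first eight Thai rows of the table):

```python
# Thai, mixed presentation, sine-wave crossover values from the table above
thai_mixed_sine = [200, 181, 182, 199, 191, 183, 175, 202]

mean = sum(thai_mixed_sine) / len(thai_mixed_sine)
print(mean)   # 189.125
```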
Appendix A6.7 Statistical Analyses – Crossovers

Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 356.779 1 356.779 5.264
B2 319.923 1 319.923 4.721
B3 34.595 1 34.595 0.510
B4 1641.263 1 1641.263 24.218
B5 8.670 1 8.670 0.128
B6 56.659 1 56.659 0.836
B7 0.083 1 0.083 0.001
Error 3795.198 56 67.771
------------------------------------------------
Within
------------------------------------------------
W1 436.232 1 436.232 13.788
B1W1 77.346 1 77.346 2.445
B2W1 20.304 1 20.304 0.642
B3W1 58.853 1 58.853 1.860
B4W1 9.226 1 9.226 0.292
B5W1 42.334 1 42.334 1.338
B6W1 23.730 1 23.730 0.750
B7W1 33.206 1 33.206 1.050
Error 1771.723 56 31.638
------------------------------------------------
Appendix A6.8 Raw Data – Identification Accuracy

Language   Presentation manner   Sine-wave   Speech
Thai mixed 1.873 3.11
Thai mixed 1.468 1.824
Thai mixed 1.781 1.353
Thai mixed 2.229 0.992
Thai mixed 0.992 1.824
Thai mixed 2.3 2.229
Thai mixed 2.3 2.229
Thai mixed 0.674 1.873
Thai blocked 1.873 2.705
Thai blocked 1.824 1.781
Thai blocked 0.996 2.28
Thai blocked 1.353 0.636
Thai blocked 1.468 1.468
Thai blocked 2.213 1.465
Thai blocked 1.873 2.3
Thai blocked 1.468 2.229
Viet mixed 1.873 1.353
Viet mixed 1.468 1.115
Viet mixed 1.353 0.992
Viet mixed 2.3 2.229
Viet mixed 0.992 1.873
Viet mixed 0.507 0.992
Viet mixed 2.229 0.636
Viet mixed 0.992 0.734
Viet blocked 0.992 1.348
Viet blocked 1.348 1.227
Viet blocked 1.873 0.436
Viet blocked 1.286 0.496
Viet blocked 1.873 1.468
Viet blocked 0.496 1.286
Viet blocked 1.468 0.992
Viet blocked 1.468 1.468
Mand mixed 2.705 2.705
Mand mixed 2.705 0.674
Mand mixed 0.992 1.348
Mand mixed 1.468 1.824
Mand mixed 1.115 1.468
Mand mixed 2.705 1.873
Mand mixed 1.776 1.824
Mand mixed 2.3 2.3
Mand blocked 1.873 1.873
Mand blocked 1.555 1.353
Mand blocked 2.705 0.992
Mand blocked 0.619 1.468
Mand blocked 1.115 1.555
Mand blocked 2.705 1.468
Mand blocked 2.229 1.468
Mand blocked 1.353 2.705
Australian mixed 2.229 2.705
Australian mixed 0.912 1.468
Australian mixed 2.3 1.824
Australian mixed 1.627 0.636
Australian mixed 1.873 1.353
Australian mixed 1.353 1.468
Australian mixed 2.229 3.11
Australian mixed 2.3 2.3
Australian blocked 0.734 1.627
Australian blocked 2.108 1.272
Australian blocked 1.385 1.555
Australian blocked 1.873 1.339
Australian blocked 1.348 1.468
Australian blocked 1.468 0.758
Australian blocked 1.468 0.992
Australian blocked 1.353 1.627
Appendix A6.9 Statistical Analyses – Identification Accuracy
Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 1.380 1 1.380 3.748
B2 0.004 1 0.004 0.011
B3 1.250 1 1.250 3.395
B4 3.911 1 3.911 10.622
B5 0.663 1 0.663 1.801
B6 0.027 1 0.027 0.072
B7 0.015 1 0.015 0.042
Error 20.616 56 0.368
------------------------------------------------
Within
------------------------------------------------
W1 0.147 1 0.147 0.490
B1W1 0.007 1 0.007 0.025
B2W1 0.000 1 0.000 0.000
B3W1 0.174 1 0.174 0.580
B4W1 0.876 1 0.876 2.915
B5W1 0.033 1 0.033 0.111
B6W1 0.008 1 0.008 0.028
B7W1 0.001 1 0.001 0.004
Error 16.822 56 0.300
------------------------------------------------
Appendix A6.10 Raw Data – Discrimination Accuracy

Sine-wave discrimination
Language   Presentation manner   Inter-stimulus interval (ms)   160-170   170-180   180-190   190-200   200-210   210-220
Thai block 500 3.995 3.995 1.5525 8.715 7.53 2.36
Thai block 500 -2.283 1.7375 1.9975 2.6225 3.835 -1.553
Thai block 500 4.565 5.1425 5.715 5.455 2.6225 3.995
Thai block 500 -1.998 -1.998 4.0575 3.105 4.565 0.34
Thai mix 500 2.8825 4.175 5.2425 4.98 4.095 0
Thai mix 500 -1.998 0.285 5.2425 5.715 4.62 1.9975
Thai mix 500 0.285 1.9975 0 1.9975 2.2825 0.885
Thai mix 500 5.165 8.715 6.98 3.105 8.715 6.355
Thai block 1500 1.5525 4.905 3.835 0.885 4.905 0.285
Thai block 1500 3.835 2.6225 0.785 0 4.565 1.9975
Thai block 1500 0 4.565 5.91 5.91 6.64 2.2825
Thai block 1500 2.6225 2.2825 4.725 -2.338 0.6675 0.885
Thai mix 1500 3.995 3.5075 6.98 6.345 5.715 4.25
Thai mix 1500 -0.445 0.885 0 1.2375 7.53 0.285
Thai mix 1500 2.2825 2.4375 7.53 -1.57 6.345 3.3275
Thai mix 1500 -1.498 6.64 2.3375 7.055 6.9 2.4375
Vietnamese block 500 6.64 8.715 5.245 8.715 8.715 5.245
Vietnamese block 500 -1.998 1.9975 6.64 2.8825 -1.998 -0.285
Vietnamese block 500 0 4.62 1.9975 5.455 0 1.9975
Vietnamese block 500 0.785 -0.885 -1.998 -3.835 -5.795 0
Vietnamese mix 500 4.565 4.3575 3.5075 0.285 2.2825 1.9975
Vietnamese mix 500 0 3.5075 7.53 4.905 2.2825 0
Vietnamese mix 500 3.55 4.905 -1.398 0 2.8825 0
Vietnamese mix 500 3.1675 0 2.5425 5.1425 5.715 2.5425
Vietnamese block 1500 0 5.91 6.64 4.175 4.905 1.3975
Vietnamese block 1500 0 1.9975 0 4.28 2.2825 0.285
Vietnamese block 1500 1.5525 0.885 -0.785 3.4725 3.3275 0
Vietnamese block 1500 -0.785 -3.508 0 0.785 -1.77 0
Vietnamese mix 1500 2.2825 -2.283 2.8825 0.285 2.8825 2.2825
Vietnamese mix 1500 3.55 -0.285 1.07 1.67 1.57 2.3375
Vietnamese mix 1500 1.4975 0.885 0.785 -1.758 2.6975 1.8125
Vietnamese mix 1500 2.4375 -1.998 1.77 1.9975 0 4.175
Mandarin block 500 1.5525 0 0 1.67 2.6225 -0.885
Mandarin block 500 2.5425 0 -0.1 -2.283 -2.283 0.885
Mandarin block 500 -0.785 -3.958 -3.835 -3.068 -1.67 0.6675
Mandarin block 500 0 4.28 3.1725 0.885 0 0
Mandarin mix 500 0.885 0 -2.543 3.105 -3.168 -3.168
Mandarin mix 500 3.835 0.785 5.715 5.795 4.905 1.9975
Mandarin mix 500 1.77 3.3275 1.5525 2.5425 -1.658 0.1
Mandarin mix 500 0.785 3.9575 5.32 1.9975 1.5525 4.62
Mandarin block 1500 1.175 4.3575 5.165 3.1725 0 0.785
Mandarin block 1500 0 1.9975 0 4.28 2.2825 0.285
Mandarin block 1500 2.2825 5.2425 3.835 1.5525 3.4825 0.785
Mandarin block 1500 2.8825 3.55 7.53 4.62 4.28 0
Mandarin mix 1500 0 2.8825 6.98 4.62 4.28 5.795
Mandarin mix 1500 -0.99 -1.813 0.6675 0.7675 4.3575 -2.783
Mandarin mix 1500 2.2825 6.98 5.91 7.53 8.715 5.2425
Mandarin mix 1500 5.085 3.3275 5.795 4.095 0.785 0.6675
Australian block 500 1.5525 3.1725 -1.913 -1.213 5.245 -1.998
Australian block 500 6.64 2.4375 5.1425 4.0575 4.095 4.095
Australian block 500 -3.995 1.9975 0.285 0 3.1675 3.4275
Australian block 500 4.565 2.2825 3.55 6.9 2.6225 7.14
Australian mix 500 2.4375 3.5075 3.1725 5.1425 1.9125 3.06
Australian mix 500 4.3575 2.2825 0 2.6225 0.6675 1.9975
Australian mix 500 1.77 3.3275 1.5525 2.5425 -1.658 0.1
Australian mix 500 1.5525 1.57 2.5425 5.87 6.9 7.53
Australian block 1500 2.2825 1.5525 4.0575 2.4375 1.1125 2.2825
Australian block 1500 6.98 5.795 5.795 8.715 7.53 8.715
Australian block 1500 0.625 3.1725 3.5075 2.36 4.62 4.28
Australian block 1500 2.36 4.62 4.0575 7.53 4.725 4.3575
Australian mix 1500 6.345 2.5425 4.0575 4.825 3.5075 2.2825
Australian mix 1500 6.64 5.165 5.91 3.55 4.3575 3.55
Australian mix 1500 0.6675 2.3375 1.5525 -1.145 1.7375 0
Australian mix 1500 5.91 6.345 6.98 5.87 5.085 5.395
Speech discrimination
Language   Presentation manner   Inter-stimulus interval (ms)   160-170   170-180   180-190   190-200   200-210   210-220
Thai block 500 2.2825 2.4375 5.455 6.98 5.245 4.565
Thai block 500 0 -1.998 3.1725 5.085 3.105 1.77
Thai block 500 2.5425 5.91 5.455 5.17 1.9975 1.9975
Thai block 500 1.175 2.4375 4.095 5.245 1.9975 1.9975
Thai mix 500 0.285 1.62 3.1725 5.2425 4.54 1.5525
Thai mix 500 0 2.6225 4.905 0.89 0.285 -1.998
Thai mix 500 1.9975 4.905 2.2825 5.87 1.57 2.5425
Thai mix 500 3.1725 6.98 3.5075 8.715 6.98 4.905
Thai block 1500 1.1125 0.285 1.4975 4.98 3.105 2.4375
Thai block 1500 4.565 0 2.2825 5.91 1.9975 0
Thai block 1500 0.545 -0.1 0.885 5.91 3.5075 6.355
Thai block 1500 3.1675 4.3575 4.25 5.245 -0.73 2.6225
Thai mix 1500 7.53 4.905 8.715 6.9 8.715 3.105
Thai mix 1500 0.885 2.7825 -1.398 4.905 3.5825 2.6225
Thai mix 1500 2.2825 -1.998 0.89 3.0675 3.1675 2.2825
Thai mix 1500 0 4.565 -0.445 5.795 1.77 3.1675
Vietnamese block 500 2.6225 1.67 3.105 2.6225 3.4075 0
Vietnamese block 500 1.1125 -1.998 1.3975 0 0 0
Vietnamese block 500 -1.998 4.28 4.3575 0 2.6225 -2.283
Vietnamese block 500 0.785 -3.835 1.9975 0.885 -3.573 -1.658
Vietnamese mix 500 1.77 4.28 6.64 5.91 5.245 4.28
Vietnamese mix 500 0 0.885 2.4375 2.6975 3.835 0
Vietnamese mix 500 0.785 0.785 5.085 3.105 0 2.4375
Vietnamese mix 500 6.64 8.715 1.57 5.085 5.1425 5.715
Vietnamese block 1500 1.5525 6.64 6.98 2.7825 6.355 0
Vietnamese block 1500 0 -1.998 4.28 3.995 2.2825 0
Vietnamese block 1500 1.07 0 3.105 0.73 -2.283 1.1125
Vietnamese block 1500 1.1125 -1.998 2.2825 2.2825 1.9975 0
Vietnamese mix 1500 3.1675 0 3.55 0.885 1.9975 1.9975
Vietnamese mix 1500 1.1125 4.62 4.28 2.2825 3.995 0
Vietnamese mix 1500 2.4375 -4.565 1.7375 0.6675 4.905 0
Vietnamese mix 1500 -3.995 2.6225 3.1725 -3.068 0 0.885
Mandarin block 500 3.4275 4.0575 2.5425 1.7575 6.64 2.2825
Mandarin block 500 0 4.0575 3.4275 -1.998 1.3975 -3.995
Mandarin block 500 -2.438 -1.738 -3.168 -3.068 -1.67 -0.99
Mandarin block 500 0 2.2825 8.715 4.565 4.565 0
Mandarin mix 500 2.2825 -0.155 0.885 1.9975 0 1.9975
Mandarin mix 500 -4.28 3.4075 4.905 1.9975 1.9975 0
Mandarin mix 500 5.085 0 3.105 3.105 0.885 2.8825
Mandarin mix 500 3.4075 2.6975 5.17 6.345 4.905 2.2825
Mandarin block 1500 5.87 5.1425 3.4075 3.835 3.835 -3.068
Mandarin block 1500 0 -1.998 4.28 3.995 2.2825 0
Mandarin block 1500 0.285 0.285 5.91 3.1725 3.1675 4.0575
Mandarin block 1500 8.715 6.355 4.28 1.9975 4.62 2.6225
Mandarin mix 1500 2.6225 6.64 1.9975 1.9975 5.245 5.17
Mandarin mix 1500 2.6225 6.98 0 3.1675 2.2825 3.995
Mandarin mix 1500 6.98 6.64 8.715 6.98 4.28 4.3575
Mandarin mix 1500 4.175 3.105 3.1675 2.6225 0.1 -1.398
Australian block 500 1.5525 2.2825 2.6225 4.175 5.795 1.77
Australian block 500 1.9975 3.4075 1.5525 6.9 7.53 5.165
Australian block 500 -2.283 0 1.3975 0 0.885 0
Australian block 500 4.3575 3.1675 6.98 4.3575 7.53 5.245
Australian mix 500 4.28 2.2825 -1.113 5.795 0 0.785
Australian mix 500 0 0.545 3.3275 3.0675 3.55 0.885
Australian mix 500 5.085 0 3.105 3.105 0.885 2.8825
Australian mix 500 -2.623 4.54 3.0675 3.4725 4.3575 0
Australian block 1500 2.2825 3.1675 3.5075 1.9975 1.5525 -1.113
Australian block 1500 7.53 7.53 6.345 7.53 8.715 8.715
Australian block 1500 2.6225 4.28 4.62 6.355 3.5075 4.3575
Australian block 1500 1.9975 3.995 5.2425 4.175 5.455 4.905
Australian mix 1500 4.28 4.565 4.28 5.245 2.8825 0
Australian mix 1500 4.175 4.905 8.715 7.53 6.98 6.355
Australian mix 1500 4.28 0 0 4.28 0.885 0
Australian mix 1500 3.4725 4.175 3.1725 5.795 7.53 3.1725
Appendix A6.11 Statistical Analyses – Discrimination Accuracy Analysis of Variance Summary Table
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 38.693 1 38.693 1.108
B2 73.545 1 73.545 2.107
B3 13.053 1 13.053 0.374
B4 145.494 1 145.494 4.167
B5 5.435 1 5.435 0.156
B6 151.271 1 151.271 4.333
B7 55.377 1 55.377 1.586
B8 53.512 1 53.512 1.533
B9 4.707 1 4.707 0.135
B10 8.518 1 8.518 0.244
B11 178.318 1 178.318 5.107
B12 0.028 1 0.028 0.001
B13 1.146 1 1.146 0.033
B14 8.019 1 8.019 0.230
B15 33.106 1 33.106 0.948
Error 1675.821 48 34.913
------------------------------------------------
Within
------------------------------------------------
W1 139.940 1 139.940 38.735
B1W1 2.944 1 2.944 0.815
B2W1 1.617 1 1.617 0.447
B3W1 8.997 1 8.997 2.490
B4W1 9.290 1 9.290 2.572
B5W1 0.720 1 0.720 0.199
B6W1 0.882 1 0.882 0.244
B7W1 7.868 1 7.868 2.178
B8W1 4.402 1 4.402 1.219
B9W1 0.012 1 0.012 0.003
B10W1 16.035 1 16.035 4.438
B11W1 6.436 1 6.436 1.782
B12W1 10.897 1 10.897 3.016
B13W1 2.945 1 2.945 0.815
B14W1 6.698 1 6.698 1.854
B15W1 2.461 1 2.461 0.681
Error 173.412 48 3.613
W2 0.605 1 0.605 0.137
B1W2 1.434 1 1.434 0.326
B2W2 3.533 1 3.533 0.802
B3W2 3.719 1 3.719 0.844
B4W2 11.662 1 11.662 2.648
B5W2 4.580 1 4.580 1.040
B6W2 20.007 1 20.007 4.543
B7W2 0.179 1 0.179 0.041
B8W2 1.596 1 1.596 0.363
B9W2 0.602 1 0.602 0.137
B10W2 7.246 1 7.246 1.645
B11W2 0.559 1 0.559 0.127
B12W2 0.186 1 0.186 0.042
B13W2 1.153 1 1.153 0.262
B14W2 0.186 1 0.186 0.042
B15W2 9.034 1 9.034 2.051
Error 211.386 48 4.404
W3 108.226 1 108.226 22.181
B1W3 0.044 1 0.044 0.009
B2W3 0.037 1 0.037 0.008
B3W3 0.029 1 0.029 0.006
B4W3 0.503 1 0.503 0.103
B5W3 17.436 1 17.436 3.574
B6W3 36.657 1 36.657 7.513
B7W3 1.589 1 1.589 0.326
B8W3 4.124 1 4.124 0.845
B9W3 0.672 1 0.672 0.138
B10W3 0.220 1 0.220 0.045
B11W3 0.686 1 0.686 0.141
B12W3 0.644 1 0.644 0.132
B13W3 2.231 1 2.231 0.457
B14W3 4.117 1 4.117 0.844
B15W3 24.143 1 24.143 4.948
Error 234.198 48 4.879
W4 3.775 1 3.775 1.343
B1W4 0.001 1 0.001 0.000
B2W4 7.625 1 7.625 2.713
B3W4 17.962 1 17.962 6.392
B4W4 0.121 1 0.121 0.043
B5W4 0.332 1 0.332 0.118
B6W4 0.088 1 0.088 0.031
B7W4 14.004 1 14.004 4.984
B8W4 4.442 1 4.442 1.581
B9W4 2.359 1 2.359 0.839
B10W4 31.115 1 31.115 11.073
B11W4 4.592 1 4.592 1.634
B12W4 6.105 1 6.105 2.173
B13W4 1.576 1 1.576 0.561
B14W4 0.001 1 0.001 0.000
B15W4 0.075 1 0.075 0.027
Error 134.877 48 2.810
W5 3.854 1 3.854 0.595
B1W5 0.551 1 0.551 0.085
B2W5 0.050 1 0.050 0.008
B3W5 0.093 1 0.093 0.014
B4W5 0.169 1 0.169 0.026
B5W5 17.746 1 17.746 2.742
B6W5 0.071 1 0.071 0.011
B7W5 4.019 1 4.019 0.621
B8W5 0.423 1 0.423 0.065
B9W5 4.739 1 4.739 0.732
B10W5 3.600 1 3.600 0.556
B11W5 3.223 1 3.223 0.498
B12W5 8.469 1 8.469 1.308
B13W5 13.059 1 13.059 2.018
B14W5 2.951 1 2.951 0.456
B15W5 13.444 1 13.444 2.077
Error 310.689 48 6.473
W6 12.393 1 12.393 2.683
B1W6 2.282 1 2.282 0.494
B2W6 1.867 1 1.867 0.404
B3W6 0.275 1 0.275 0.060
B4W6 1.089 1 1.089 0.236
B5W6 10.324 1 10.324 2.235
B6W6 0.274 1 0.274 0.059
B7W6 3.113 1 3.113 0.674
B8W6 0.069 1 0.069 0.015
B9W6 3.516 1 3.516 0.761
B10W6 1.391 1 1.391 0.301
B11W6 9.300 1 9.300 2.013
B12W6 0.107 1 0.107 0.023
B13W6 1.959 1 1.959 0.424
B14W6 0.016 1 0.016 0.004
B15W6 0.534 1 0.534 0.116
Error 221.745 48 4.620
W7 0.984 1 0.984 0.291
B1W7 7.535 1 7.535 2.231
B2W7 7.127 1 7.127 2.110
B3W7 0.104 1 0.104 0.031
B4W7 0.210 1 0.210 0.062
B5W7 4.526 1 4.526 1.340
B6W7 42.781 1 42.781 12.664
B7W7 0.113 1 0.113 0.033
B8W7 0.001 1 0.001 0.000
B9W7 7.296 1 7.296 2.160
B10W7 0.015 1 0.015 0.004
B11W7 0.177 1 0.177 0.052
B12W7 1.700 1 1.700 0.503
B13W7 1.554 1 1.554 0.460
B14W7 19.512 1 19.512 5.776
B15W7 1.219 1 1.219 0.361
Error 162.150 48 3.378
W8 3.833 1 3.833 1.095
B1W8 0.004 1 0.004 0.001
B2W8 1.690 1 1.690 0.483
B3W8 0.084 1 0.084 0.024
B4W8 8.869 1 8.869 2.533
B5W8 0.716 1 0.716 0.204
B6W8 0.042 1 0.042 0.012
B7W8 3.746 1 3.746 1.070
B8W8 0.150 1 0.150 0.043
B9W8 8.197 1 8.197 2.341
B10W8 4.912 1 4.912 1.403
B11W8 8.574 1 8.574 2.449
B12W8 0.703 1 0.703 0.201
B13W8 0.403 1 0.403 0.115
B14W8 6.550 1 6.550 1.871
B15W8 8.191 1 8.191 2.339
Error 168.073 48 3.502
W9 5.143 1 5.143 1.699
B1W9 2.437 1 2.437 0.805
B2W9 6.841 1 6.841 2.260
B3W9 0.976 1 0.976 0.322
B4W9 1.257 1 1.257 0.415
B5W9 16.485 1 16.485 5.445
B6W9 46.239 1 46.239 15.272
B7W9 0.038 1 0.038 0.013
B8W9 0.317 1 0.317 0.105
B9W9 3.589 1 3.589 1.185
B10W9 0.526 1 0.526 0.174
B11W9 0.103 1 0.103 0.034
B12W9 5.090 1 5.090 1.681
B13W9 3.500 1 3.500 1.156
B14W9 4.945 1 4.945 1.633
B15W9 0.405 1 0.405 0.134
Error 145.328 48 3.028
------------------------------------------------
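As a quick arithmetic check (not part of the original analyses), each F value in the summary table above is the effect mean square (SS/df) divided by the mean square of the error term for its stratum. A minimal sketch, with values copied from the table:

```python
# Recompute F ratios from the ANOVA summary table above.
# F = (SS_effect / df_effect) / (SS_error / df_error); the values below
# are copied from the between-subjects block (B11) and the W1 block.
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

print(round(f_ratio(178.318, 1, 1675.821, 48), 3))  # B11: ≈ 5.107, as tabled
print(round(f_ratio(139.940, 1, 173.412, 48), 3))   # W1: ≈ 38.735, as tabled
```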
Appendix A7.1 Consent Form and Questionnaire for Australian Participants
MARCS Auditory Laboratories College of Arts, Education and Social Sciences
Denis Burnham
Professor of Psychology, Director MARCS Phone: (+612) 9772 6681 Fax: (+612) 9772 6736
Email: [email protected] Web: www.uws.edu.au/marcs/
Speech perception across different languages April 2005
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily humans produce speech sounds in their native language and how easily humans perceive acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of Australian English. You are invited to participate because you are a native speaker of Australian English.

If you participate, you will complete a 60-minute session in which you will be asked to identify and discriminate short sound items.

Participation is voluntary. You have a right not to participate in, or subsequently to withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Barbara Schwanhaeusser on 9772 6589. Please take time now to ask any questions you may have. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages CONSENT FORM
Please read the information sheet before signing this.

1. Yes No I, .............................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes No I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment, and the nature and possible risks of the investigation, and that these have been explained to me to my satisfaction.
3. Yes No I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship with the University of Western Sydney.
4. Yes No I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes No I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes No I understand that if I have any questions relating to my participation in this research, I may contact Barbara Schwanhaeusser (9772 6589) or Prof Denis Burnham (9772 6681), who will be happy to answer them.
7. Yes No I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes No I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

Participant's signature: ………………………………….. Date: …………………………………..

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883).
Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
Speech perception across different languages PARTICIPANT QUESTIONNAIRE
Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1 Your name: ……………………..
2 Male / Female (please circle)
3 Date of birth: ……………..
4 Place of birth (City/town & Country): …………………………………..…
5 What is the official language in that country? ……………………………….
6 Hearing: Do you have normal hearing? Yes / No
If No, please provide any details you can: ………………………………………………………………
7 Speech/language history: Do you have any history of speech/language problems? Yes / No
If Yes, please provide any details you can: ………………………………………………………………
8 Language background: What is your native language/dialect, that is, the language/dialect which you learned from birth? Please list the percentage of time you use it in your everyday life now.
Native Language / Dialect: …………………………. Percentage of Time Spoken: ………………..
If you learned more than one language from birth, please list these and the percentage of time you use them in your everyday life now.
Additional Native Language / Dialect: …………………………. Percentage of Time Spoken: ………………..
Please also list all the languages of which you have some knowledge, and indicate how old you were when you started learning the language, and the percentage of time you use these in your everyday life.
Other language/s that you have knowledge of: …………………………………. Age at which you started learning this language: ………… Percentage of Time Spoken: ………..
…………………………………. ………… ………..
…………………………………. ………… ………..
9 Do you play a musical instrument and/or have singing training? Yes / No
If Yes, please list all the musical instruments which you have play/ed and indicate for how long you play/ed the instrument, e.g. violin, age 10, played for 5 years (singing counts!)
Instrument: ……………………… Age started playing: …… Number of Years Playing: ……
……………………… …… ……
……………………… …… ……
10 Do you have formal secondary or tertiary level music education? Yes / No
If so, please list details:
Instrument: ……………………… Course: …… Grade or level attained: ……
……………………… …… ……
……………………… …… ……
11 Are you still playing music? Yes / No
If not, when did you finish? ………………
If yes, how many hours per day do you play? ………………
12 Do you have perfect pitch / absolute pitch? Yes / No / Don’t know
(recognition / production of a note without a reference to other notes)
Where, when and how was that assessed? ……………………………………….
THANK YOU!
Appendix A7.3 Raw Data – Tone Perception Test Criterion Results
Continuum Shape | Language Background | Musicianship | Presentation Manner | Tone Type / Set 1/2 | Trials to Criterion
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 17
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian musician blocked sine-wave 8
Rising Australian musician blocked speech 8
Rising Australian non-musician blocked sine-wave 19
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 8
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 17
Rising Australian non-musician blocked speech 8
Rising Australian non-musician blocked sine-wave 8
Rising Australian non-musician blocked speech 8
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 9
Rising Thai musician blocked sine-wave 9
Rising Thai musician blocked speech 13
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 8
Rising Thai musician blocked sine-wave 8
Rising Thai musician blocked speech 8
Rising Thai non-musician blocked sine-wave 24
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 23
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 8
Rising Thai non-musician blocked speech 8
Rising Thai non-musician blocked sine-wave 8
Rising Thai non-musician blocked speech 8
Rising Australian musician mixed set 1 8
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 10
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 8
Rising Australian musician mixed set 2 8
Rising Australian musician mixed set 1 14
Rising Australian musician mixed set 2 12
Rising Australian non-musician mixed set 1 87
Rising Australian non-musician mixed set 2 8
Rising Australian non-musician mixed set 1 76
Rising Australian non-musician mixed set 2 10
Rising Australian non-musician mixed set 1 41
Rising Australian non-musician mixed set 2 39
Rising Australian non-musician mixed set 1 78
Rising Australian non-musician mixed set 2 50
Rising Thai musician mixed set 1 8
Rising Thai musician mixed set 2 8
Rising Thai musician mixed set 1 37
Rising Thai musician mixed set 2 11
Rising Thai musician mixed set 1 17
Rising Thai musician mixed set 2 11
Rising Thai musician mixed set 1 10
Rising Thai musician mixed set 2 8
Rising Thai non-musician mixed set 1 40
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 13
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 8
Rising Thai non-musician mixed set 2 8
Rising Thai non-musician mixed set 1 22
Rising Thai non-musician mixed set 2 9
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 21
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian musician blocked sine-wave 8
Falling Australian musician blocked speech 8
Falling Australian non-musician blocked sine-wave 18
Falling Australian non-musician blocked speech 8
Falling Australian non-musician blocked sine-wave 20
Falling Australian non-musician blocked speech 44
Falling Australian non-musician blocked sine-wave 8
Falling Australian non-musician blocked speech 8
Falling Australian non-musician blocked sine-wave 8
Falling Australian non-musician blocked speech 23
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai musician blocked sine-wave 8
Falling Thai musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 28
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Thai non-musician blocked sine-wave 8
Falling Thai non-musician blocked speech 8
Falling Australian musician mixed set 1 8
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 11
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 8
Falling Australian musician mixed set 2 8
Falling Australian musician mixed set 1 9
Falling Australian musician mixed set 2 8
Falling Australian non-musician mixed set 1 19
Falling Australian non-musician mixed set 2 8
Falling Australian non-musician mixed set 1 26
Falling Australian non-musician mixed set 1 8
Falling Australian non-musician mixed set 2 9
Falling Australian non-musician mixed set 1 8
Falling Australian non-musician mixed set 2 22
Falling Australian non-musician mixed set 1 29
Falling Thai musician mixed set 2 12
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 2 8
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 2 12
Falling Thai musician mixed set 1 8
Falling Thai musician mixed set 1 24
Falling Thai musician mixed set 2 9
Falling Thai non-musician mixed set 1 8
Falling Thai non-musician mixed set 2 8
Falling Thai non-musician mixed set 1 35
Falling Thai non-musician mixed set 2 9
Falling Thai non-musician mixed set 1 45
Falling Thai non-musician mixed set 2 13
Falling Thai non-musician mixed set 1 22
Falling Thai non-musician mixed set 2 8
Appendix A7.4 Statistical Analyses – Criterion Results
Univariate Analysis of Variance Criterion Rising vs. Falling
Tests of Between-Subjects Effects
Dependent Variable: criterionout
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 4416.438(a) 7 630.920 1.920 .083
Intercept 43785.563 1 43785.563 133.260 .000
language 27.562 1 27.562 .084 .773
musical 3164.062 1 3164.062 9.630 .003
contshape 248.063 1 248.063 .755 .389
language * musical 370.563 1 370.563 1.128 .293
language * contshape 175.563 1 175.563 .534 .468
musical * contshape 60.063 1 60.063 .183 .671
language * musical * contshape 370.563 1 370.563 1.128 .293
Error 18400.000 56 328.571
Total 66602.000 64
Corrected Total 22816.438 63
a R Squared = .194 (Adjusted R Squared = .093)
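The R Squared footnotes in these SPSS tables can be reproduced from the sums of squares: R² = SS(Corrected Model) / SS(Corrected Total), and Adjusted R² = 1 − (1 − R²)(N − 1)/(N − p − 1). A quick check against the Rising-vs-Falling table above (N = 64 cases, p = 7 model degrees of freedom):

```python
# Reproduce the R² footnote of the table above from its sums of squares.
ss_model, ss_corrected_total = 4416.438, 22816.438
n_cases, p_predictors = 64, 7  # 64 observations, 7 model df

r2 = ss_model / ss_corrected_total
adj_r2 = 1 - (1 - r2) * (n_cases - 1) / (n_cases - p_predictors - 1)
print(round(r2, 3), round(adj_r2, 3))  # → 0.194 0.093, as in the footnote
```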
Univariate Analysis of Variance Criterion Rising blocked vs. mixed
Tests of Between-Subjects Effects
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 2116.000(a) 1 2116.000 7.919 .007
Intercept 16065.563 1 16065.563 60.125 .000
manner 2116.000 1 2116.000 7.919 .007
Error 16566.438 62 267.201
Total 34748.000 64
Corrected Total 18682.438 63
a R Squared = .113 (Adjusted R Squared = .099)
manner = blocked
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 229.469(a) 7 32.781 1.868 .120
Intercept 3260.281 1 3260.281 185.749 .000
language 2.531 1 2.531 .144 .707
musical 38.281 1 38.281 2.181 .153
type 94.531 1 94.531 5.386 .029
language * musical 5.281 1 5.281 .301 .588
language * type .281 1 .281 .016 .900
musical * type 69.031 1 69.031 3.933 .059
language * musical * type 19.531 1 19.531 1.113 .302
Error 421.250 24 17.552
Total 3911.000 32
Corrected Total 650.719 31
a R Squared = .353 (Adjusted R Squared = .164)
b manner = block
manner = mixed
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 12197.469(a) 7 1742.496 11.247 .000
Intercept 14921.281 1 14921.281 96.312 .000
language 1785.031 1 1785.031 11.522 .002
musical 3180.031 1 3180.031 20.526 .000
type 2161.531 1 2161.531 13.952 .001
language * musical 2945.281 1 2945.281 19.011 .000
language * type 282.031 1 282.031 1.820 .190
musical * type 1092.781 1 1092.781 7.054 .014
language * musical * type 750.781 1 750.781 4.846 .038
Error 3718.250 24 154.927
Total 30837.000 32
Corrected Total 15915.719 31
a R Squared = .766 (Adjusted R Squared = .698)
b manner = mix
Univariate Analysis of Variance Criterion Falling blocked vs. mixed
Tests of Between-Subjects Effects
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 81.000(a) 1 81.000 1.070 .305
Intercept 9900.250 1 9900.250 130.745 .000
blockmix 81.000 1 81.000 1.070 .305
Error 4694.750 62 75.722
Total 14676.000 64
Corrected Total 4775.750 63
a R Squared = .017 (Adjusted R Squared = .001)
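The sums of squares in these tables are additive, which gives another plain-numbers consistency check: SS(Corrected Total) = SS(Corrected Model) + SS(Error), and SS(Total) = SS(Corrected Total) + SS(Intercept). Checked against the Falling blocked-vs-mixed table above:

```python
# Sum-of-squares additivity for the Falling blocked-vs-mixed table above.
ss_model, ss_error, ss_intercept = 81.000, 4694.750, 9900.250
ss_corrected_total, ss_total = 4775.750, 14676.000

assert abs(ss_model + ss_error - ss_corrected_total) < 1e-9
assert abs(ss_corrected_total + ss_intercept - ss_total) < 1e-9
print("sums of squares are additive")  # both identities hold exactly
```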
manner = blocked
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 562.375(a) 7 80.339 1.357 .268
Intercept 4095.125 1 4095.125 69.189 .000
language 136.125 1 136.125 2.300 .142
music 200.000 1 200.000 3.379 .078
sinespeech 120.125 1 120.125 2.030 .167
language * music 50.000 1 50.000 .845 .367
language * sinespeech 15.125 1 15.125 .256 .618
music * sinespeech 40.500 1 40.500 .684 .416
language * music * sinespeech .500 1 .500 .008 .928
Error 1420.500 24 59.188
Total 6078.000 32
Corrected Total 1982.875 31
a R Squared = .284 (Adjusted R Squared = .075)
b blockmix = block
manner = mixed
Tests of Between-Subjects Effects(b)
Dependent Variable: crit
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 1282.375(a) 7 183.196 3.076 .018
Intercept 5886.125 1 5886.125 98.823 .000
language 50.000 1 50.000 .839 .369
music 450.000 1 450.000 7.555 .011
sinespeech 465.125 1 465.125 7.809 .010
language * music .125 1 .125 .002 .964
language * sinespeech 144.500 1 144.500 2.426 .132
music * sinespeech 144.500 1 144.500 2.426 .132
language * music * sinespeech 28.125 1 28.125 .472 .499
Error 1429.500 24 59.563
Total 8598.000 32
Corrected Total 2711.875 31
a R Squared = .473 (Adjusted R Squared = .319)
b blockmix = mix
Appendix A7.5 Raw Data – Crossover Values
Continuum Shape | Language Background | Musicianship | Tone Type | Presentation Manner | Crossover Value in Hz
falling Thai musician sine-wave mixed 211.75
falling Thai musician sine-wave mixed 217.8
falling Thai musician sine-wave mixed .
falling Thai musician sine-wave mixed 232.75
falling Thai musician sine-wave blocked 205
falling Thai musician sine-wave blocked 214
falling Thai musician sine-wave blocked 212.5
falling Thai musician sine-wave blocked 216.25
falling Thai musician speech mixed 208
falling Thai musician speech mixed 202
falling Thai musician speech mixed 220
falling Thai musician speech mixed 211
falling Thai musician speech blocked 208
falling Thai musician speech blocked 213.25
falling Thai musician speech blocked 204.25
falling Thai musician speech blocked 209.5
falling Thai non-musician sine-wave mixed 226.75
falling Thai non-musician sine-wave mixed 214.75
falling Thai non-musician sine-wave mixed 209.5
falling Thai non-musician sine-wave mixed 217.75
falling Thai non-musician sine-wave blocked 227.5
falling Thai non-musician sine-wave blocked 207
falling Thai non-musician sine-wave blocked 204.25
falling Thai non-musician sine-wave blocked 233.5
falling Thai non-musician speech mixed 205
falling Thai non-musician speech mixed 219.25
falling Thai non-musician speech mixed 218.5
falling Thai non-musician speech mixed 223.75
falling Thai non-musician speech blocked 230.5
falling Thai non-musician speech blocked 212.5
falling Thai non-musician speech blocked 211
falling Thai non-musician speech blocked 231.25
falling Australian musician sine-wave mixed 212.5
falling Australian musician sine-wave mixed 215.5
falling Australian musician sine-wave mixed 212.5
falling Australian musician sine-wave mixed 211
falling Australian musician sine-wave blocked 198.25
falling Australian musician sine-wave blocked 209.5
falling Australian musician sine-wave blocked 202
falling Australian musician sine-wave blocked 205.75
falling Australian musician speech mixed 204.25
falling Australian musician speech mixed 205.75
falling Australian musician speech mixed 217.75
falling Australian musician speech mixed 207.25
falling Australian musician speech blocked 209.5
falling Australian musician speech blocked 211.75
falling Australian musician speech blocked 200.5
falling Australian musician speech blocked 209.5
falling Australian non-musician sine-wave mixed 215.5
falling Australian non-musician sine-wave mixed 205.75
falling Australian non-musician sine-wave mixed 216.25
falling Australian non-musician sine-wave mixed 215.5
falling Australian non-musician sine-wave blocked 219.25
falling Australian non-musician sine-wave blocked 201.25
falling Australian non-musician sine-wave blocked 208.75
falling Australian non-musician sine-wave blocked 208.75
falling Australian non-musician speech mixed 185.5
falling Australian non-musician speech mixed .
falling Australian non-musician speech mixed .
falling Australian non-musician speech mixed 205.75
falling Australian non-musician speech blocked 207.25
falling Australian non-musician speech blocked 209.5
falling Australian non-musician speech blocked 202
falling Australian non-musician speech blocked 208.75
rising Thai musician sine-wave mixed 243.25
rising Thai musician sine-wave mixed 234.25
rising Thai musician sine-wave mixed 234.25
rising Thai musician sine-wave mixed 253.8
rising Thai musician sine-wave blocked 227.5
rising Thai musician sine-wave blocked 235.75
rising Thai musician sine-wave blocked 234.25
rising Thai musician sine-wave blocked 235.75
rising Thai musician speech mixed 234.25
rising Thai musician speech mixed 235
rising Thai musician speech mixed 238
rising Thai musician speech mixed 230.5
rising Thai musician speech blocked 232
rising Thai musician speech blocked 228.25
rising Thai musician speech blocked 231.25
rising Thai musician speech blocked 231.25
rising Thai non-musician sine-wave mixed 241
rising Thai non-musician sine-wave mixed 232
rising Thai non-musician sine-wave mixed .
rising Thai non-musician sine-wave mixed 230.5
rising Thai non-musician sine-wave blocked 235
rising Thai non-musician sine-wave blocked 229.75
rising Thai non-musician sine-wave blocked 232
rising Thai non-musician sine-wave blocked 234.25
rising Thai non-musician speech mixed 231.25
rising Thai non-musician speech mixed 223
rising Thai non-musician speech mixed .
rising Thai non-musician speech mixed 233.5
rising Thai non-musician speech blocked 233.5
rising Thai non-musician speech blocked 227.5
rising Thai non-musician speech blocked 232.75
rising Thai non-musician speech blocked 229
rising Australian musician sine-wave mixed 226.75
rising Australian musician sine-wave mixed 220.75
rising Australian musician sine-wave mixed 230.5
rising Australian musician sine-wave mixed 227.5
rising Australian musician sine-wave blocked 233.5
rising Australian musician sine-wave blocked 236.5
rising Australian musician sine-wave blocked 230.5
rising Australian musician sine-wave blocked 221.5
rising Australian musician speech mixed 223.75
rising Australian musician speech mixed 232
rising Australian musician speech mixed 229.75
rising Australian musician speech mixed 223
rising Australian musician speech blocked 238.75
rising Australian musician speech blocked 241.75
rising Australian musician speech blocked 231.25
rising Australian musician speech blocked 225.25
rising Australian non-musician sine-wave mixed 231.25
rising Australian non-musician sine-wave mixed 228.25
rising Australian non-musician sine-wave mixed 241.75
rising Australian non-musician sine-wave mixed 231.25
rising Australian non-musician sine-wave blocked 232
rising Australian non-musician sine-wave blocked 234.25
rising Australian non-musician sine-wave blocked 233.5
rising Australian non-musician sine-wave blocked 233.5
rising Australian non-musician speech mixed 221.5
rising Australian non-musician speech mixed 220.75
rising Australian non-musician speech mixed 229.75
rising Australian non-musician speech mixed 249.25
rising Australian non-musician speech blocked 225.25
rising Australian non-musician speech blocked .
rising Australian non-musician speech blocked 232
rising Australian non-musician speech blocked 232
Appendix A7.6 Statistical Output – Crossover Values
Rising Continuum
Tests of Between-Subjects Effects(b)
Dependent Variable: crossover
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 729.992(a) 15 48.666 1.387 .195
Intercept 3241689.613 1 3241689.613 92420.449 .000
language 116.616 1 116.616 3.325 .075
musical 4.529 1 4.529 .129 .721
tonetype 82.831 1 82.831 2.362 .131
mixblock .375 1 .375 .011 .918
language * musical 102.656 1 102.656 2.927 .094
language * tonetype 53.029 1 53.029 1.512 .225
musical * tonetype 17.453 1 17.453 .498 .484
language * musical * tonetype 40.610 1 40.610 1.158 .288
language * mixblock 124.606 1 124.606 3.553 .066
musical * mixblock .003 1 .003 .000 .993
language * musical * mixblock 125.963 1 125.963 3.591 .065
tonetype * mixblock 22.425 1 22.425 .639 .428
language * tonetype * mixblock 6.516 1 6.516 .186 .669
musical * tonetype * mixblock 5.621 1 5.621 .160 .691
language * musical * tonetype * mixblock 1.606 1 1.606 .046 .832
Error 1578.396 45 35.075
Total 3286291.628 61
Corrected Total 2308.388 60
a R Squared = .316 (Adjusted R Squared = .088) b continuum = rising
Falling Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: crossover
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 2022.592(a) 15 134.839 2.313 .015
Intercept 2641791.253 1 2641791.253 45316.468 .000
language 1015.283 1 1015.283 17.416 .000
musical 41.566 1 41.566 .713 .403
tonetype 228.315 1 228.315 3.916 .054
mixblock 11.977 1 11.977 .205 .653
language * musical 201.451 1 201.451 3.456 .070
language * tonetype 20.481 1 20.481 .351 .556
musical * tonetype 3.110 1 3.110 .053 .818
language * musical * tonetype 306.671 1 306.671 5.261 .027
language * mixblock 1.252 1 1.252 .021 .884
musical * mixblock 256.346 1 256.346 4.397 .042
language * musical * mixblock .551 1 .551 .009 .923
tonetype * mixblock 269.983 1 269.983 4.631 .037
language * tonetype * mixblock 32.794 1 32.794 .563 .457
musical * tonetype * mixblock 2.700 1 2.700 .046 .831
language * musical * tonetype * mixblock 25.221 1 25.221 .433 .514
Error 2623.342 45 58.296
Total 2741906.840 61
Corrected Total 4645.934 60
a R Squared = .435 (Adjusted R Squared = .247) b continuum = falling
Appendix A7.7 Raw Data – Identification Accuracy Results
Continuum Shape Language Background Musicianship Tone Type Presentation Manner Identification Accuracy
rising Thai musician sine-wave mixed 1.11
rising Thai musician speech mixed 1.35
rising Thai musician sine-wave mixed 0.91
rising Thai musician speech mixed 1.35
rising Thai musician sine-wave blocked 1.87
rising Thai musician speech blocked 2.3
rising Thai musician sine-wave blocked 2.71
rising Thai musician speech blocked 2.71
rising Thai musician sine-wave mixed .
rising Thai musician speech mixed .
rising Thai musician sine-wave mixed 0.99
rising Thai musician speech mixed 0.99
rising Thai musician sine-wave blocked 1.74
rising Thai musician speech blocked 2.71
rising Thai musician sine-wave blocked 1.35
rising Thai musician speech blocked 1.82
rising Thai non-musician sine-wave mixed .
rising Thai non-musician speech mixed 1.35
rising Thai non-musician sine-wave mixed 1.12
rising Thai non-musician speech mixed 1.35
rising Thai non-musician sine-wave blocked 1.87
rising Thai non-musician speech blocked 2.3
rising Thai non-musician sine-wave blocked 0.91
rising Thai non-musician speech blocked 1.15
rising Thai non-musician sine-wave mixed 1.74
rising Thai non-musician speech mixed 1.33
rising Thai non-musician sine-wave mixed 0.32
rising Thai non-musician speech mixed 3.11
rising Thai non-musician sine-wave blocked 2.23
rising Thai non-musician speech blocked 1.47
rising Thai non-musician sine-wave blocked 0.99
rising Thai non-musician sine-wave blocked 0.99
rising Australian musician speech mixed 1.56
rising Australian musician sine-wave mixed 1.35
rising Australian musician speech mixed 1.47
rising Australian musician sine-wave mixed 1.82
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.82
rising Australian musician speech blocked 1.82
rising Australian musician sine-wave blocked 1.87
rising Australian musician speech mixed 0.91
rising Australian musician sine-wave mixed 0.73
rising Australian musician speech mixed 1.82
rising Australian musician sine-wave mixed 0.62
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.33
rising Australian musician speech blocked 1.47
rising Australian musician sine-wave blocked 1.47
rising Australian non-musician speech mixed 6.04
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech mixed 0.64
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech blocked 0.73
rising Australian non-musician sine-wave blocked 0.64
rising Australian non-musician speech blocked 0.99
rising Australian non-musician sine-wave blocked 0.73
rising Australian non-musician speech mixed 0.5
rising Australian non-musician sine-wave mixed .
rising Australian non-musician speech mixed 1.87
rising Australian non-musician sine-wave mixed 1.87
rising Australian non-musician speech blocked 0.64
rising Australian non-musician sine-wave blocked 0.99
rising Australian non-musician speech blocked 2.3
rising Australian non-musician sine-wave blocked 1.35
falling Thai musician sine-wave mixed 1.82
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave mixed 1.82
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave blocked 1.87
falling Thai musician speech blocked 1.82
falling Thai musician sine-wave blocked 2.23
falling Thai musician speech blocked 2.23
falling Thai musician sine-wave mixed 2.23
falling Thai musician speech mixed 0.99
falling Thai musician sine-wave mixed 0.99
falling Thai musician speech mixed 1.47
falling Thai musician sine-wave blocked 2.71
falling Thai musician speech blocked 2.3
falling Thai musician sine-wave blocked 1.35
falling Thai musician speech blocked 0.62
falling Thai non-musician sine-wave mixed 1.47
falling Thai non-musician sine-wave mixed 1.35
falling Thai non-musician speech mixed 2.23
falling Thai non-musician sine-wave mixed 1.82
falling Thai non-musician speech blocked 1.87
falling Thai non-musician sine-wave blocked 1.47
falling Thai non-musician speech blocked 1.47
falling Thai non-musician sine-wave blocked 1.87
falling Thai non-musician speech mixed .
falling Thai non-musician sine-wave mixed 0.73
falling Thai non-musician speech mixed 0.64
falling Thai non-musician sine-wave mixed 1.47
falling Thai non-musician speech blocked 0.64
falling Thai non-musician sine-wave blocked 2.71
falling Thai non-musician speech blocked 1.35
falling Thai non-musician sine-wave blocked 1.15
falling Australian musician sine-wave mixed 1.87
falling Australian musician speech mixed 2.71
falling Australian musician sine-wave mixed 2.3
falling Australian musician speech mixed 1.35
falling Australian musician sine-wave blocked 0.94
falling Australian musician speech blocked 3.11
falling Australian musician sine-wave blocked 1.47
falling Australian musician speech blocked 1.47
falling Australian musician sine-wave mixed 3.11
falling Australian musician speech mixed 1.47
falling Australian musician sine-wave mixed 1.47
falling Australian musician speech mixed 1.82
falling Australian musician sine-wave blocked 1.84
falling Australian musician speech blocked 1.35
falling Australian musician sine-wave blocked 0.91
falling Australian musician speech blocked 1.47
falling Australian non-musician sine-wave mixed 0.99
falling Australian non-musician sine-wave mixed 0.64
falling Australian non-musician sine-wave mixed 1.47
falling Australian non-musician speech mixed 1.47
falling Australian non-musician sine-wave blocked 1.11
falling Australian non-musician speech blocked 0.73
falling Australian non-musician sine-wave blocked 1.87
falling Australian non-musician speech blocked .
falling Australian non-musician sine-wave mixed 1.87
falling Australian non-musician speech mixed 0.99
falling Australian non-musician sine-wave mixed 1.84
falling Australian non-musician speech mixed 0.73
falling Australian non-musician sine-wave blocked 1.47
falling Australian non-musician speech blocked .
falling Australian non-musician sine-wave blocked 0.99
falling Australian non-musician speech blocked 0.99
Appendix A7.8 Statistical Output – Identification Accuracy
Rising Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: dprime
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 8.148(a) 15 .543 1.669 .094
Intercept 144.069 1 144.069 442.711 .000
language .740 1 .740 2.275 .139
musical .106 1 .106 .326 .571
mixblock .016 1 .016 .049 .826
sinespeech .354 1 .354 1.086 .303
language * musical 1.336 1 1.336 4.106 .049
language * mixblock .029 1 .029 .090 .766
musical * mixblock 3.192 1 3.192 9.808 .003
language * musical * mixblock .483 1 .483 1.485 .230
language * sinespeech .323 1 .323 .994 .324
musical * sinespeech .000 1 .000 .000 .985
language * musical * sinespeech .001 1 .001 .003 .955
mixblock * sinespeech .554 1 .554 1.701 .199
language * mixblock * sinespeech .314 1 .314 .964 .332
musical * mixblock * sinespeech .229 1 .229 .704 .406
language * musical * mixblock * sinespeech .966 1 .966 2.968 .092
Error 14.319 44 .325
Total 175.342 60
Corrected Total 22.467 59
a R Squared = .363 (Adjusted R Squared = .145) b continuum = rising
Falling Continuum
Tests of Between-Subjects Effects(b) Dependent Variable: dprime
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 9.000(a) 15 .600 .816 .655
Intercept 120.744 1 120.744 164.221 .000
language .387 1 .387 .527 .472
musical .883 1 .883 1.200 .279
mixblock 1.411 1 1.411 1.920 .173
sinespeech 1.179 1 1.179 1.604 .212
language * musical .289 1 .289 .394 .534
language * mixblock .348 1 .348 .474 .495
musical * mixblock .460 1 .460 .626 .433
language * musical * mixblock .345 1 .345 .470 .497
language * sinespeech .008 1 .008 .011 .917
musical * sinespeech .998 1 .998 1.357 .251
language * musical * sinespeech .096 1 .096 .131 .719
mixblock * sinespeech 1.107 1 1.107 1.505 .227
language * mixblock * sinespeech 1.375 1 1.375 1.870 .179
musical * mixblock * sinespeech .044 1 .044 .060 .808
language * musical * mixblock * sinespeech .104 1 .104 .141 .709
Error 31.616 43 .735
Total 172.240 59
Corrected Total 40.616 58
a R Squared = .222 (Adjusted R Squared = -.050) b continuum = falling
Appendix A7.9 Raw Data – Discrimination Accuracy
Continuum Shape Language Background Musicianship Presentation Manner Tone Type Discrimination Accuracy
rising Australian musician blocked sine-wave 1.357
rising Australian musician blocked speech 2.338
rising Australian musician blocked sine-wave 2.414
rising Australian musician blocked speech 0
rising Australian musician blocked sine-wave 2.2
rising Australian musician blocked speech 1.998
rising Australian musician blocked sine-wave 0.284
rising Australian musician blocked speech -1.998
rising Australian musician mixed sine-wave 4.289
rising Australian musician mixed speech 0
rising Australian musician mixed sine-wave 2.916
rising Australian musician mixed speech -0.885
rising Australian musician mixed sine-wave 1.714
rising Australian musician mixed speech 0.885
rising Australian musician mixed sine-wave 0.616
rising Australian musician mixed speech 0.438
rising Australian non-musician blocked sine-wave 0.881
rising Australian non-musician blocked speech -2.283
rising Australian non-musician blocked sine-wave 1.315
rising Australian non-musician blocked speech 0.885
rising Australian non-musician blocked sine-wave 0.674
rising Australian non-musician blocked speech 0
rising Australian non-musician blocked sine-wave 1.344
rising Australian non-musician blocked speech -0.885
rising Australian non-musician mixed sine-wave 3.338
rising Australian non-musician mixed speech 2.283
rising Australian non-musician mixed sine-wave 3.338
rising Australian non-musician mixed speech 2.283
rising Australian non-musician mixed sine-wave 2.461
rising Australian non-musician mixed speech 1.998
rising Australian non-musician mixed sine-wave -1.938
rising Australian non-musician mixed speech 2.698
rising Thai musician blocked sine-wave 2.196
rising Thai musician blocked speech 0.885
rising Thai musician blocked sine-wave 1.736
rising Thai musician blocked speech 0
rising Thai musician blocked sine-wave 2.629
rising Thai musician blocked speech 0.768
rising Thai musician blocked sine-wave 2.521
rising Thai musician blocked speech 1.998
rising Thai musician mixed sine-wave 2.041
rising Thai musician mixed speech -3.995
rising Thai musician mixed sine-wave 1.648
rising Thai musician mixed speech 2.283
rising Thai musician mixed sine-wave 0.558
rising Thai musician mixed speech 4.62
rising Thai musician mixed sine-wave 2.285
rising Thai musician mixed speech 1.398
rising Thai non-musician blocked sine-wave 1.15
rising Thai non-musician blocked speech -1.998
rising Thai non-musician blocked sine-wave 0.118
rising Thai non-musician blocked speech 0
rising Thai non-musician blocked sine-wave 2.247
rising Thai non-musician blocked speech 1.113
rising Thai non-musician blocked sine-wave 1.393
rising Thai non-musician blocked speech -2.283
rising Thai non-musician mixed sine-wave 2.576
rising Thai non-musician mixed speech 0
rising Thai non-musician mixed sine-wave 3.193
rising Thai non-musician mixed speech 0.785
rising Thai non-musician mixed sine-wave 1.323
rising Thai non-musician mixed speech 0
rising Thai non-musician mixed sine-wave -0.377
rising Thai non-musician mixed speech 1.113
falling Australian musician blocked sine-wave 6.39
falling Australian musician blocked speech 7.11
falling Australian musician blocked sine-wave 2.46
falling Australian musician blocked speech 1.32
falling Australian musician blocked sine-wave 1.78
falling Australian musician blocked speech 2.06
falling Australian musician blocked sine-wave 2.21
falling Australian musician blocked speech 1.28
falling Australian musician mixed sine-wave 2.87
falling Australian musician mixed speech 3.48
falling Australian musician mixed sine-wave 4.34
falling Australian musician mixed speech 3.63
falling Australian musician mixed sine-wave 2.88
falling Australian musician mixed speech 2.8
falling Australian musician mixed sine-wave 2.61
falling Australian musician mixed speech 3
falling Australian non-musician blocked sine-wave 2.31
falling Australian non-musician blocked speech 1.31
falling Australian non-musician blocked sine-wave 0.57
falling Australian non-musician blocked speech 1.84
falling Australian non-musician blocked sine-wave 1.92
falling Australian non-musician blocked speech 1.03
falling Australian non-musician blocked sine-wave 1.06
falling Australian non-musician blocked speech 0.31
falling Australian non-musician mixed sine-wave 1.31
falling Australian non-musician mixed speech -0.07
falling Australian non-musician mixed sine-wave 1.31
falling Australian non-musician mixed speech 1.93
falling Australian non-musician mixed sine-wave 2.06
falling Australian non-musician mixed speech 0.71
falling Australian non-musician mixed sine-wave 1.09
falling Australian non-musician mixed speech -0.2
falling Thai musician blocked sine-wave -0.02
falling Thai musician blocked speech 1.19
falling Thai musician blocked sine-wave 1.71
falling Thai musician blocked speech 1.34
falling Thai musician blocked sine-wave 2.84
falling Thai musician blocked speech 4.24
falling Thai musician blocked sine-wave 1.73
falling Thai musician blocked speech 0.69
falling Thai musician mixed sine-wave 1.72
falling Thai musician mixed speech 2.05
falling Thai musician mixed sine-wave 0.53
falling Thai musician mixed speech 0.21
falling Thai musician mixed sine-wave 2.83
falling Thai musician mixed speech 0.12
falling Thai musician mixed sine-wave 2.55
falling Thai musician mixed speech 3.56
falling Thai non-musician blocked sine-wave 3.07
falling Thai non-musician blocked speech 3.22
falling Thai non-musician blocked sine-wave 1.33
falling Thai non-musician blocked speech 1.47
falling Thai non-musician blocked sine-wave 2.93
falling Thai non-musician blocked speech 1.34
falling Thai non-musician blocked sine-wave 2
falling Thai non-musician blocked speech 0.68
falling Thai non-musician mixed sine-wave 0.78
falling Thai non-musician mixed speech 1.84
falling Thai non-musician mixed sine-wave 0.1
falling Thai non-musician mixed speech -0.16
falling Thai non-musician mixed sine-wave -0.12
falling Thai non-musician mixed speech 0.77
falling Thai non-musician mixed sine-wave 1.08
falling Thai non-musician mixed speech 1.27
Appendix A7.10 Statistical Output – Discrimination Accuracy
Rising Continuum
Tests of Between-Subjects Effects Dependent Variable: discrimination
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 55.521(a) 15 3.701 1.586 .114
Intercept 78.497 1 78.497 33.641 .000
language .144 1 .144 .062 .805
music 2.802 1 2.802 1.201 .279
mixblock 6.815 1 6.815 2.921 .094
sinespeech 22.567 1 22.567 9.672 .003
language * music 2.659 1 2.659 1.139 .291
language * mixblock 1.867 1 1.867 .800 .376
music * mixblock 7.505 1 7.505 3.216 .079
language * music * mixblock .300 1 .300 .129 .721
language * sinespeech .150 1 .150 .064 .801
music * sinespeech .175 1 .175 .075 .785
language * music * sinespeech 2.980 1 2.980 1.277 .264
mixblock * sinespeech 1.513 1 1.513 .649 .425
language * mixblock * sinespeech .149 1 .149 .064 .802
music * mixblock * sinespeech 2.971 1 2.971 1.273 .265
language * music * mixblock * sinespeech 2.925 1 2.925 1.253 .268
Error 112.000 48 2.333
Total 246.018 64
Corrected Total 167.521 63
a R Squared = .331 (Adjusted R Squared = .122)
Falling Continuum
Tests of Between-Subjects Effects Dependent Variable: discrimination
Source Type III Sum of Squares df Mean Square F Sig.
Corrected Model 49.501(a) 15 3.300 2.030 .033
Intercept 216.009 1 216.009 132.863 .000
language 6.135 1 6.135 3.774 .058
musical 21.871 1 21.871 13.452 .001
mixblock 2.199 1 2.199 1.352 .251
sinespeech .736 1 .736 .453 .504
language * musical 10.585 1 10.585 6.510 .014
language * mixblock 1.383 1 1.383 .851 .361
musical * mixblock 2.875 1 2.875 1.769 .190
language * musical * mixblock .783 1 .783 .482 .491
language * sinespeech .303 1 .303 .186 .668
musical * sinespeech .267 1 .267 .164 .687
language * musical * sinespeech .209 1 .209 .128 .722
mixblock * sinespeech .012 1 .012 .007 .931
language * mixblock * sinespeech .089 1 .089 .055 .816
musical * mixblock * sinespeech .256 1 .256 .157 .693
language * musical * mixblock * sinespeech 1.798 1 1.798 1.106 .298
Error 78.038 48 1.626
Total 343.548 64
Corrected Total 127.539 63
a R Squared = .388 (Adjusted R Squared = .197)
Appendix A7.11 Raw Data – Discrimination Peak Analysis
Rising Continuum, Sine-Wave – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 205-212.5 212.5-220 220-227.5 227.5-235 235-242.5 242.5-250 250-257.5
Thai musician mixed -2.283 4.28 3.168 1.658 2.623 3.105 3.105
Thai musician mixed 2.283 -2.283 2.623 2.075 0 1.67 1.67
Thai musician mixed 0.668 2.338 0.885 -0.885 1.67 -2.338 -2.338
Thai musician mixed 2.283 4.905 0.768 2.438 0.885 3.168 3.168
Thai musician blocked 0 1.553 4.725 -1.498 6.355 3.473 3.473
Thai musician blocked 5.243 1.815 3.473 1.07 0 1.735 1.735
Thai musician blocked -3.173 2.883 6.98 1.175 0 5.715 5.715
Thai musician blocked 0.415 0.885 0.785 2.288 5.91 5.795 5.795
Thai non-musician mixed 2.283 2.283 1.998 4.905 2.283 1.998 1.998
Thai non-musician mixed 4.358 2.283 3.483 3.173 1.553 2.883 2.883
Thai non-musician mixed 1.77 -0.785 0 1.113 3.995 0 0
Thai non-musician mixed -2.283 -2.283 2.438 -1.398 0.885 0 0
Thai non-musician blocked 1.998 -0.885 4.28 1.77 0.885 1.998 1.998
Thai non-musician blocked 0 1.998 -0.285 0 0 -2.883 -2.883
Thai non-musician blocked 1.998 1.57 1.553 2.698 -0.768 5.795 5.795
Thai non-musician blocked -1.998 1.998 0 0.785 3.173 0 0
Australian musician mixed 1.998 5.795 1.553 2.623 6.9 4.175 4.175
Australian musician mixed 3.202 1.023 2.61 2.077 3.507 2.045 2.045
Australian musician mixed 3.328 -1.998 4.725 1.77 2.283 3.958 3.958
Australian musician mixed 4.28 -0.73 1.553 1.838 1.338 -1.998 -1.998
Australian musician blocked 1.998 -0.625 2.283 2.438 0 2.623 2.623
Australian musician blocked 0 4.28 0.285 5.795 2.543 1.998 1.998
Australian musician blocked 0.885 0.885 5.91 3.428 0.1 1.758 1.758
Australian musician blocked -2.283 -3.168 -2.283 3.508 1.113 3.55 3.55
Australian non-musician mixed 4.62 2.438 4.058 0 4.358 5.455 5.455
Australian non-musician mixed 4.62 2.438 4.058 0 4.358 5.455 5.455
Australian non-musician mixed 2.623 1.998 -1.998 0.285 6.64 4.175 4.175
Australian non-musician mixed -4.058 -0.785 -3.408 0.445 -5.245 0.885 0.885
Australian non-musician blocked -0.668 1.77 -0.1 3.068 2.543 1.553 1.553
Australian non-musician blocked 2.283 1.998 4.095 -0.785 3.168 -1.553 -1.553
Australian non-musician blocked -0.785 0 0.785 2.283 2.438 -1.77 -1.77
Australian non-musician blocked 0.73 -1.553 3.173 2.698 -0.885 0 0
Rising Continuum, Speech – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 205-212.5 212.5-220 220-227.5 227.5-235 235-242.5 242.5-250 250-257.5
Thai musician mixed -3.995 1.998 4.175 5.243 -0.34 3.168 4.905
Thai musician mixed 2.283 1.998 1.553 0.785 0.285 -1.998 0
Thai musician mixed 4.62 2.283 0.285 0 3.168 -0.885 -1.738
Thai musician mixed 1.398 -2.438 2.438 3.958 1.838 0.285 0
Thai musician blocked 0.885 4.62 2.283 0.885 1.838 3.835 0.885
Thai musician blocked 0 0 0 0 4.358 0 -0.99
Thai musician blocked 0.768 0.73 4.62 -0.1 1.398 -1.145 4.095
Thai musician blocked 1.998 6.355 1.67 5.165 0.89 2.283 0.285
Thai non-musician mixed 0 -1.998 2.283 2.623 1.113 1.998 1.998
Thai non-musician mixed 0.785 0.885 -1.498 3.173 3.473 1.998 0
Thai non-musician mixed 0 2.623 -2.883 0.885 1.498 -3.835 0
Thai non-musician mixed 1.113 -2.283 0 -0.885 4.28 -0.885 1.498
Thai non-musician blocked -1.998 0 -1.553 1.66 0 -0.785 -0.1
Thai non-musician blocked 0 0 -1.998 1.998 4.28 2.283 -1.998
Thai non-musician blocked 1.113 -3.995 -1.145 3.483 0.885 0.1 8.715
Thai non-musician blocked -2.283 0 0.885 -1.553 0.545 -1.553 2.883
Australian musician mixed 0 2.283 3.995 6.98 3.508 4.28 6.64
Australian musician mixed -0.885 1.67 4.358 0 -0.768 2.438 2.438
Australian musician mixed 0.885 2.283 1.113 -1.67 -3.105 2.283 5.243
Australian musician mixed 0 2.078 3.155 1.77 -0.122 3 4.773
Australian musician blocked 2.338 -0.785 5.795 1.398 2.438 0 -1.398
Australian musician blocked 0 0 0.885 -1.998 2.883 4.905 2.623
Australian musician blocked 1.998 2.438 -1.498 2.438 -0.155 -2.805 1.145
Australian musician blocked -1.998 -2.283 3.508 2.283 3.068 0 2.438
Australian non-musician mixed 2.283 1.998 0 0.285 0.285 -2.283 2.623
Australian non-musician mixed 2.283 1.998 0 0.285 0.285 -1.998 2.623
Australian non-musician mixed 1.998 0 0 1.998 4.905 2.623 2.783
Australian non-musician mixed 2.698 -0.785 -1.553 -3.173 -1.67 0 -0.73
Australian non-musician blocked -2.283 -4.28 -1.67 -0.885 2.883 1.553 1.998
Australian non-musician blocked 0.885 0 -1.398 -1.113 -1.113 0.885 1.553
Australian non-musician blocked 0 -1.738 -2.283 0.785 2.623 2.283 1.553
Australian non-musician blocked -0.885 0.885 0 0 2.388 -0.785 0.1
Falling Continuum, Sine-Wave – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 182.5-190 190-197.5 197.5-205 205-212.5 212.5-220 220-227.5 227.5-235
Thai musician mixed 3.428 5.395 2.698 0 -1.145 1.67 0
Thai musician mixed 1.998 0 -1.113 4.28 -0.885 -1.998 1.398
Thai musician mixed 1.57 -5.455 -0.785 -1.553 3.835 -2.283 4.565
Thai musician mixed 1.998 -0.885 -1.998 3.168 4.62 4.28 0.785
Thai musician blocked 1.998 0 4.905 2.438 5.91 2.783 1.77
Thai musician blocked 0 2.283 1.498 4.28 4.725 2.783 2.283
Thai musician blocked 0 0.285 3.473 4.175 4.358 5.245 2.36
Thai musician blocked 0.785 0 1.57 3.173 0.785 2.283 3.508
Thai non-musician mixed 2.283 -1.77 -3.55 3.105 1.998 -0.885 4.28
Thai non-musician mixed 1.998 1.553 0 -3.168 2.283 -1.998 0
Thai non-musician mixed 3.168 1.998 1.998 3.428 4.565 5.455 0.885
Thai non-musician mixed 0.285 0 3.508 3.173 1.175 0.885 0.285
Thai non-musician blocked 0 0 -1.998 -1.498 0.668 0 1.998
Thai non-musician blocked 0 -1.998 1.998 1.553 3.995 1.998 0
Thai non-musician blocked 0.785 -0.075 2.543 3.173 5.795 5.455 2.805
Thai non-musician blocked -1.398 3.168 -1.998 4.095 4.1 3.173 2.883
Australian musician mixed 3.995 0.285 1.62 3.835 5.715 4.62 0
Australian musician mixed 3.573 5.91 4.825 5.795 4.358 1.758 4.175
Australian musician mixed 1.398 0.885 0.785 0.785 5.24 5.24 5.795
Australian musician mixed 2.988 2.36 2.41 3.472 5.104 3.873 3.323
Australian musician blocked -0.1 -0.885 5.085 1.57 3.483 4.058 5.085
Australian musician blocked 5.17 5.91 6.98 6.345 6.9 4.725 8.715
Australian musician blocked 3.995 4.175 -0.73 3.173 2.623 1.998 1.998
Australian musician blocked 0.785 1.738 2.623 -4.28 1.998 2.623 6.98
Australian non-musician mixed 1.67 1.62 -0.785 1.67 0.885 5.143 5.243
Australian non-musician mixed -0.668 4.095 2.543 -1.498 2.388 1.838 0.475
Australian non-musician mixed 0 0 3.168 1.998 0 1.998 1.998
Australian non-musician mixed 1.77 1.553 2.438 4.28 -1.998 1.998 4.358
Australian non-musician blocked -0.885 0.73 2.623 0.885 1.398 0 2.883
Australian non-musician blocked 0 2.438 2.438 3.105 3.995 4.98 -0.785
Australian non-musician blocked 0 0 -1.998 1.998 0 0 3.995
Australian non-musician blocked 1.998 -3.835 2.438 4.175 3.583 0.885 4.175
Falling Continuum, Speech – Stimulus pairs (stimulus offset in Hz)
Language Background Musicianship Presentation Manner 182.5-190 190-197.5 197.5-205 205-212.5 212.5-220 220-227.5 227.5-235
Thai musician mixed 2.623 2.623 0 1.57 0.285 4.62 2.623
Thai musician mixed -0.885 2.543 -0.885 0.668 0 1.998 -1.998
Thai musician mixed 2.883 -1.998 -0.885 1.998 2.283 2.623 1.398
Thai musician mixed 0 0 2.623 2.543 3.408 0.785 0
Thai musician blocked 0 0 0 3.55 -2.698 0 0
Thai musician blocked 4.565 4.905 2.283 1.125 4.565 2.883 4.62
Thai musician blocked 0 4.28 4.565 6.355 7.53 2.623 4.358
Thai musician blocked 0 0.885 -0.155 1.758 0.785 1.553 0
Thai non-musician mixed 1.553 0 2.883 -1.998 3.508 2.338 4.62
Thai non-musician mixed -1.998 0 -1.998 2.883 0 0 0
Thai non-musician mixed 3.995 1.113 5.715 0.885 4.28 4.28 2.283
Thai non-musician mixed 0 1.553 1.553 0.668 3.55 3.835 -0.885
Thai non-musician blocked -0.885 0 0.285 1.998 1.998 0 1.998
Thai non-musician blocked 2.283 -1.113 2.283 0.1 2.543 2.783 0
Thai non-musician blocked 2.805 0 2.338 -0.075 2.883 0.545 0.885
Thai non-musician blocked -1.998 4.905 -0.885 1.838 0 0 0.885
Australian musician mixed 1.998 1.998 2.883 4.565 4.28 6.64 1.998
Australian musician mixed 1.256 1.688 3.2 5.553 4.523 3.248 3.658
Australian musician mixed 0.885 3.068 2.883 4.565 4.565 3.105 6.355
Australian musician mixed 0.885 0 3.835 7.53 4.725 0 2.623
Australian musician blocked 4.28 0 0.885 4.358 4.565 4.28 2.623
Australian musician blocked 4.905 5.795 8.715 8.715 5.91 8.715 6.98
Australian musician blocked 1.998 -1.998 1.113 1.553 4.565 0 1.998
Australian musician blocked 0.785 3.995 4.175 4.565 -1.77 1.77 0.885
Australian non-musician mixed -0.445 2.438 1.998 0 0.785 4.62 -0.445
Australian non-musician mixed 0 -0.1 -1.77 2.338 -0.885 -0.885 0.785
Australian non-musician mixed -0.885 1.998 1.213 2.883 1.998 3.995 2.283
Australian non-musician mixed 1.998 0.885 -1.998 0 0.885 1.213 1.998
Australian non-musician blocked -0.668 -0.885 -1.998 0.73 0.668 0 0.73
Australian non-musician blocked 2.283 -0.885 0.885 2.623 2.543 -0.885 2.623
Australian non-musician blocked 1.998 2.623 1.998 4.28 0 1.998 0
Australian non-musician blocked 0.785 0.785 0.885 0.785 0.768 2.438 0.785
Appendix A7.12 Analyses – Discrimination Peak
Analysis of Variance Summary Table
Discrimination rising continuum
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 2.942 1 2.942 0.156
B2 198.372 1 198.372 10.489
B3 6.449 1 6.449 0.341
B4 8.258 1 8.258 0.437
B5 7.432 1 7.432 0.393
B6 18.342 1 18.342 0.970
Error 434.965 23 18.912
------------------------------------------------
Within
------------------------------------------------
W1 18.988 1 18.988 2.964
B1W1 11.343 1 11.343 1.771
B2W1 65.650 1 65.650 10.249
B3W1 0.233 1 0.233 0.036
B4W1 0.040 1 0.040 0.006
B5W1 18.100 1 18.100 2.826
B6W1 4.519 1 4.519 0.705
Error 147.325 23 6.405
W2 1.103 1 1.103 0.130
B1W2 3.997 1 3.997 0.471
B2W2 43.137 1 43.137 5.083
B3W2 1.407 1 1.407 0.166
B4W2 19.598 1 19.598 2.309
B5W2 4.453 1 4.453 0.525
B6W2 1.266 1 1.266 0.149
Error 195.180 23 8.486
W3 398.539 1 398.539 15.479
B1W3 27.515 1 27.515 1.069
B2W3 33.294 1 33.294 1.293
B3W3 35.615 1 35.615 1.383
B4W3 0.036 1 0.036 0.001
B5W3 10.469 1 10.469 0.407
B6W3 0.442 1 0.442 0.017
Error 592.167 23 25.746
W4 8.843 1 8.843 0.607
B1W4 2.909 1 2.909 0.200
B2W4 16.081 1 16.081 1.104
B3W4 53.552 1 53.552 3.677
B4W4 4.657 1 4.657 0.320
B5W4 0.662 1 0.662 0.045
B6W4 9.781 1 9.781 0.672
Error 334.949 23 14.563
W5 0.228 1 0.228 0.027
B1W5 1.997 1 1.997 0.239
B2W5 2.053 1 2.053 0.246
B3W5 40.157 1 40.157 4.807
B4W5 10.344 1 10.344 1.238
B5W5 38.362 1 38.362 4.593
B6W5 0.735 1 0.735 0.088
Error 192.119 23 8.353
W6 3.369 1 3.369 0.487
B1W6 28.981 1 28.981 4.192
B2W6 0.052 1 0.052 0.008
B3W6 0.804 1 0.804 0.116
B4W6 24.314 1 24.314 3.516
B5W6 14.115 1 14.115 2.041
B6W6 53.300 1 53.300 7.709
Error 159.028 23 6.914
W7 1.472 1 1.472 0.200
B1W7 23.183 1 23.183 3.150
B2W7 0.527 1 0.527 0.072
B3W7 4.356 1 4.356 0.592
B4W7 0.801 1 0.801 0.109
B5W7 0.769 1 0.769 0.104
B6W7 3.396 1 3.396 0.462
Error 169.252 23 7.359
W8 13.390 1 13.390 1.340
B1W8 20.711 1 20.711 2.073
B2W8 2.375 1 2.375 0.238
B3W8 24.349 1 24.349 2.437
B4W8 0.759 1 0.759 0.076
B5W8 8.118 1 8.118 0.812
B6W8 4.827 1 4.827 0.483
Error 229.801 23 9.991
------------------------------------------------
Analysis of Variance Summary Table
Discrimination falling continuum
Source SS df MS F
------------------------------------------------
Between
------------------------------------------------
B1 85.895 1 85.895 2.238
B2 306.201 1 306.201 7.977
B3 19.220 1 19.220 0.501
B4 10.471 1 10.471 0.273
B5 15.114 1 15.114 0.394
B6 47.906 1 47.906 1.248
Error 921.241 24 38.385
------------------------------------------------
Within
------------------------------------------------
W1 80.829 1 80.829 10.948
B1W1 15.360 1 15.360 2.080
B2W1 0.755 1 0.755 0.102
B3W1 0.351 1 0.351 0.048
B4W1 6.160 1 6.160 0.834
B5W1 6.793 1 6.793 0.920
B6W1 4.349 1 4.349 0.589
Error 177.197 24 7.383
W2 36.647 1 36.647 8.998
B1W2 3.720 1 3.720 0.913
B2W2 0.123 1 0.123 0.030
B3W2 1.985 1 1.985 0.487
B4W2 3.022 1 3.022 0.742
B5W2 13.482 1 13.482 3.310
B6W2 14.677 1 14.677 3.603
Error 97.752 24 4.073
W3 10.303 1 10.303 1.279
B1W3 4.242 1 4.242 0.526
B2W3 3.742 1 3.742 0.464
B3W3 4.126 1 4.126 0.512
B4W3 7.484 1 7.484 0.929
B5W3 1.454 1 1.454 0.181
B6W3 3.681 1 3.681 0.457
Error 193.384 24 8.058
W4 32.589 1 32.589 2.261
B1W4 7.170 1 7.170 0.497
B2W4 5.058 1 5.058 0.351
B3W4 20.088 1 20.088 1.394
B4W4 15.396 1 15.396 1.068
B5W4 0.113 1 0.113 0.008
B6W4 19.822 1 19.822 1.375
Error 345.892 24 14.412
W5 3.955 1 3.955 0.538
B1W5 0.038 1 0.038 0.005
B2W5 0.109 1 0.109 0.015
B3W5 44.369 1 44.369 6.033
B4W5 8.013 1 8.013 1.090
B5W5 8.606 1 8.606 1.170
B6W5 5.928 1 5.928 0.806
Error 176.500 24 7.354
W6 10.339 1 10.339 0.739
B1W6 33.463 1 33.463 2.393
B2W6 13.178 1 13.178 0.942
B3W6 0.164 1 0.164 0.012
B4W6 0.921 1 0.921 0.066
B5W6 0.110 1 0.110 0.008
B6W6 1.057 1 1.057 0.076
Error 335.628 24 13.985
W7 21.532 1 21.532 4.687
B1W7 24.370 1 24.370 5.304
B2W7 23.707 1 23.707 5.160
B3W7 4.493 1 4.493 0.978
B4W7 1.385 1 1.385 0.301
B5W7 8.264 1 8.264 1.799
B6W7 1.385 1 1.385 0.301
Error 110.263 24 4.594
W8 2.889 1 2.889 0.364
B1W8 4.091 1 4.091 0.516
B2W8 0.735 1 0.735 0.093
B3W8 0.003 1 0.003 0.000
B4W8 0.056 1 0.056 0.007
B5W8 1.475 1 1.475 0.186
B6W8 2.101 1 2.101 0.265
Error 190.351 24 7.931
------------------------------------------------
Appendix A8.1 Participant Details
Non-musicians Gender Date of Birth Instrument Years Played Hours per Week
N1 f 2/08/1981 0 0
N2 f 5/03/1982 0 0
N3 m 20/10/1972 0 0
N4 f 10/03/1973 0 0
N5 f 6/07/1988 0 0
N6 m 30/10/1978 0 0
N7 f 9/04/1988 0 0
N8 f 4/08/1987 0 0
N9 f 27/04/1987 0 0
N10 m 22/10/1980 Guitar 2 0
N11 f 12/11/1988 0 0
N12 m 10/09/1986 0 0
N13 m 22/08/1985 0 0
N14 m 3/06/1984 0 0
N15 f 3/10/1976 0 0
N16 m 23/08/1979 0 0
N17 m 14/09/1979 0 0
N18 f 24/07/1987 0 0
Musicians Gender Date of Birth Instrument Years Played Hours per Week
M1 f 30/05/1977 Piano 11 15
Violin 11
Recorder 5
Percussion 16
Singing 5
Fife 5
M2 f 23/05/1981 Violin 5 2.5
Flute 5
Recorder 2
Singing 3
M2 f 23/05/1981 Violin 7 2.5
Flute 12
Recorder 6
M3 m 23/02/1982 Piano 17 10
Guitar 9
Singing 17
Bass 6
M4 m 5/12/1983 Piano 17 7
Guitar 13
Saxophone 8
Singing 2
M5 m 28/05/1982 Piano 15 6
Clarinet 8
Voice 10
M6 f 14/02/1966 Piano 5 3
Flute 2
Singing 12
M7 m 12/09/1973 Trumpet 22 3.5
French Horn 17
Clarinets 14
Percussion 14
Flute 2
Viola 1
Double Bass 1
Tuba 1
Violin 0.5
Voice 3
Piano 1
Oboe 0.5
Alto Sax 0.5
Tenor Horn 1
M8 f 4/10/1987 Clarinet 11 7
Cello 3
Drums 5
Keyboard 3
M9 f 12/05/1987 Saxophone 2 0
Clarinet 6
M10 m 23/08/1974 Piano 5 1
Guitar 2
Bass G 4
Recorder 4
Singing 5
M11 m 7/07/1987 Piano 2 3
Tuba 5
M12 f 22/08/1974 Piano 26 10
M13 f 5/04/1988 Piano 13 10
M14 m 12/03/1979 Guitar 9 14
Piano 3
M15 f 21/11/1986 Singing 15 17.5
Piano 14
Bass Guitar 7
Drums 5
M16 f 28/10/1946 Piano 2 3.5
Guitar 41
Singing 50
M17 f 1/08/1979 Piano 22 0
M18 m 16/03/1981 Trumpet 18 4
Tenor Horn 7
Soprano Cornet 5
Appendix A8.2 Consent Form and Questionnaire
MARCS Auditory Laboratories College of Arts, Education and Social Sciences
Denis Burnham
Professor of Psychology, Director MARCS Phone: (+612) 9772 6681 Fax: (+612) 9772 6736
Email: [email protected] Web: www.uws.edu.au/marcs/
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS April 2006
PARTICIPANT INFORMATION STATEMENT

You are invited to participate in a research study on human speech. The results of the study will be used to understand how adults produce and perceive speech and other auditory signals. The benefits of this study include increased understanding of how easily humans produce speech sounds in their native language and how easily humans perceive acoustic information in another's speech. We are interested in studying this for different languages, so this research is being conducted with speakers of Australian English. You are invited to participate because you are a native speaker of Australian English.

If you participate, you will complete a 90-minute session. You will be asked to identify and produce short sound items. After that, you will do a language aptitude test, a foreign language aptitude test and a musical memory test.

Participation is voluntary. You have a right not to participate in, or subsequently withdraw from, the study. Any decision not to participate will not affect any current or future relationship with the University of Western Sydney. If you agree to take part in this study, you will be asked to sign a consent form (see over).

If you would like additional information on the project or have any questions, please do not hesitate to contact Barbara Schwanhaeusser on 9772 6589. Please take time now to ask any questions you may have. Thank you for your time.

Denis Burnham
MARCS Auditory Laboratories & School of Psychology
University of Western Sydney (Bankstown)

NOTE: This study has been approved by the University of Western Sydney Human Research Ethics Committee. If you have any complaints or reservations about the ethical conduct of this research, you may contact the UWS Ethics Committee through the Research Ethics Officers (tel: (02) 4736 0883). Any issues you raise will be treated in confidence and investigated fully, and you will be informed of the outcome.
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS CONSENT FORM
Please read the information sheet before signing this.

1. Yes / No   I, .............................................................. (please print name) agree to participate as a participant in the study described in the participant information statement attached to this form.
2. Yes / No   I acknowledge that I have read the participant information statement, which explains why I have been selected, the aims of the experiment, its nature, and the possible risks of the investigation, and that the statement has been explained to me to my satisfaction.
3. Yes / No   I understand that I can withdraw from the study at any time, and I understand that my decision whether or not to participate in or subsequently withdraw from this study will not affect any current or future relationship with the University of Western Sydney.
4. Yes / No   I agree that research data gathered from the results of the study may be published, provided that I cannot be identified.
5. Yes / No   I agree that research data gathered from the results of the study may be provided to other researchers in conference presentations and in follow-up research, provided that I cannot be identified.
6. Yes / No   I understand that if I have any questions relating to my participation in this research, I may contact Barbara Schwanhaeusser (9772 6589) or Prof Denis Burnham (9772 6681), who will be happy to answer them.
7. Yes / No   I acknowledge receipt of a copy of the Participant Information Statement.
8. Yes / No   I agree to complete a questionnaire about my language background and other details relevant to the research before participating in the research.

Participant's signature: ………………………………….. Date: …………………………………..
PERCEPTION AND PRODUCTION OF SPEECH SOUNDS PARTICIPANT QUESTIONNAIRE
Please fill in the following details. This information is important for the study, and is the only information about you which will be retained.

1 Your name: ……………………..
2 Male / Female (please circle)
3 Date of birth: ……………..
4 Place of birth (City/town & Country): …………………………………..…
5 What is the official language in that country? ……………………………….
6 Hearing: Do you have normal hearing? Yes / No
  If No, please provide any details you can: ………………………………………………………………
7 Speech/language history: Do you have any history of speech/language problems? Yes / No
  If Yes, please provide any details you can: ………………………………………………………………
8 Language background: What is your native language/dialect, that is, the language/dialect which you learned from birth? Please list the percentage of time you use it in your everyday life now.
  Native Language / Dialect          Percentage of Time Spoken
  ………………………….                     ………………..
  If you learned more than one language from birth, please list these and the percentage of time you use them in your everyday life now.
  Additional Native Language / Dialect   Percentage of Time Spoken
  ………………………….                     ………………..
Please also list all the languages of which you have some knowledge, and indicate how old you were when you started learning the language and the percentage of time you use these in your everyday life.

  Other language/s that you        Age at which you started       Percentage of
  have knowledge of:               learning this language:        Time Spoken
  ………………………………….                  …………                           ………..
  ………………………………….                  …………                           ………..
  ………………………………….                  …………                           ………..

9 Do you play a musical instrument and/or have singing training? Yes / No
  If Yes, please list all the musical instruments which you have play/ed and indicate for how long you play/ed the instrument, e.g. violin, age 10, played for 5 years (singing counts!)
  Instrument:              Age started playing:     Number of Years Playing:
  ………………………              ……                        ……
  ………………………              ……                        ……
  ………………………              ……                        ……
10 Do you have formal secondary or tertiary level music education? Yes / No
  If so, please list details:
  Instrument:              Course:                  Grade or level attained:
  ………………………              ……                        ……
  ………………………              ……                        ……
11 Are you still playing music? Yes / No
  If not, when did you finish? ………………
  If you are still playing, how many hours per day do you play? …………
12 Do you have perfect pitch / absolute pitch? Yes / No / Don't know
  (recognition / production of a note without a reference to other notes)
  Where, when and how was that assessed? ……………………………………….
Appendix A8.4 Song List / Answer Sheet Musical Memory Test
Which is the original version of the song? (please tick the box if you know the song)

1: Cat Empire – Hello                     Version 1   Version 2
2: Nelly/Kelly – Dilemma                  Version 1   Version 2
3: Men at Work – Land Down Under          Version 1   Version 2
4: Franz Ferdinand – Take Me Out          Version 1   Version 2
5: Puff Daddy – I'll Be Missing You       Version 1   Version 2
6: Black Eyed Peas – Where Is the Love    Version 1   Version 2
7: Los del Rio – Macarena                 Version 1   Version 2
8: Outkast – Hey Ya!                      Version 1   Version 2
9: Michael Jackson – Black or White       Version 1   Version 2
10: Jamelia – Superstar                   Version 1   Version 2
11: Queen – Bohemian Rhapsody             Version 1   Version 2
12: Eamon – Don't Want You Back           Version 1   Version 2
Appendix A8.7 List of Thai Words and Translations
[The table here listed the Thai-script stimulus items for the three tone categories (Mid, tone 0; Low, tone 1; High, tone 3), with each item marked W or N; the Thai characters were corrupted in conversion and cannot be reproduced.]
Word list & Translation

Word   Translation
0 Fat (adj.)
0 Bo tree; pipal tree
3 Doing drugs (slang)
0 Year
0 Instrument used in gambling (Chinese origin)
0 Full; a lot of
1 Flute; pipe
1 Last (in playing game)
3 Naked; nude, obscene
0 Ribbon
Appendix A8.8 DMDX Script for Tone Identification Task
<ep><dwc 200200060 ><dbc 19> <fd 50> <s 27> <cr> <nfbt> <t 7000> <d 50> <id "keyboard"> <zil> <zor> <mpr +Left Shift> <mnr +Right Shift> <mnr +space><eop>
$0 m1=<umpr><umnr><mpr +Left Shift><mnr +Right Shift><mnr +space>=, m2=<umpr><umnr><mnr +Left Shift><mpr +Right Shift><mnr +space>=, m3=<umpr><umnr><mnr +Left Shift><mnr +Right Shift><mpr +space>=,
<ln -6>"In this experiment you’ll identify sounds.",
<ln -4>"You'll press the LEFT key when you hear one kind of sound.",
<ln -3>"You'll press the RIGHT key when you hear another kind of sound.",
<ln -2>"You'll press the SPACEBAR when you hear a third kind of sound.",
<ln 0>"Before we begin, let’s practice.",
<ln 1>"Press the SPACEBAR to continue.";
+501 ~1 "Now press the LEFT key after you hear this kind of sound:"/<wav 2>*"phi0.wav"/;
+502 ~3 "Press the SPACEBAR after you hear this kind of sound:"/<wav 2>*"phi3.wav"/;
+503 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"phi1.wav"/;
+504 ~1 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"pi0.wav"/;
+505 ~3 "Press the SPACEBAR key after you hear this kind of sound:"/<wav 2>*"pi3.wav"/;
+506 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"pi1.wav"/;
+507 ~1 "Press the LEFT key after you hear this kind of sound:"/<wav 2>*"bi0.wav"/;
+508 ~3 "Press the SPACEBAR key after you hear this kind of sound:"/<wav 2>*"bi3.wav"/;
+509 ~2 "Press the RIGHT key after you hear this kind of sound:"/<wav 2>*"bi1.wav"/;
0<ln -3>"Good. 
Now you'll train some more.",
<ln -2>"PLEASE RESPOND AS FAST AS POSSIBLE!",
<ln 0>"When you get 3 in a row correct,",
<ln 1>"you'll move on to the testing phase.",
<ln 3>"Please press the SPACEBAR to continue.";
2000<set 1,2>;$
\+601 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-4000>/;
+602 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-4000>/;
+603 ~3<set 1,2>"ready"/<wav 2>*"phi3.wav"<deciw 1><bicLE 1,1,-4000>/;\
$4000<bicGT 1,1,-12000>;$
\+604 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-6000>/;
+605 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-6000>/;
+606 ~3<set 1,2>"ready"/<wav 2>*"phi3.wav"<deciw 1><bicLE 1,1,-6000>/;\
$6000<bicGT 1,1,-12000>;$
\+607 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-8000>/;
+608 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-8000>/;
+609 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-8000>/;\
$8000<bicGT 1,1,-12000>;$
\+610 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-10000>/;
+611 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-10000>/;
+612 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-10000>/;\
$10000<bicGT 1,1,-12000>;$
\+613 ~1<set 1,2>"ready"/<wav 2>*"phi0.wav"<deciw 1><bicLE 1,1,-2000>/;
+614 ~2<set 1,2>"ready"/<wav 2>*"phi1.wav"<deciw 1><bicLE 1,1,-2000>/;
+615 ~3<set 1,2>"ready"/<wav 2>*"bi3.wav"<deciw 1><bicLE 1,1,-2000>/;\
$12000;
0<ln -3>"Great, 3 in a row!",
<ln -2>"Now you'll move on to the next part of the experiment",
<ln -1>"PLEASE RESPOND AS QUICKLY AND ACCURATELY AS POSSIBLE!!",
<ln 0> "Please press SPACEBAR to continue.";$
\+701 ~1<nfb>"ready"/<wav 2>*"phi0.wav"/;
+702 ~1<nfb>"ready"/<wav 2>*"pho0.wav"/;
+703 ~1<nfb>"ready"/<wav 2>*"phu0.wav"/;
+704 ~1<nfb>"ready"/<wav 2>*"bi0.wav"/;
+705 ~1<nfb>"ready"/<wav 2>*"bo0.wav"/;
+706 ~1<nfb>"ready"/<wav 2>*"bu0.wav"/;
+707 ~1<nfb>"ready"/<wav 2>*"pi0.wav"/;
+708 ~1<nfb>"ready"/<wav 2>*"po0.wav"/;
+709 ~1<nfb>"ready"/<wav 2>*"pu0.wav"/;
+710 ~2<nfb>"ready"/<wav 2>*"phi1.wav"/;
+711 ~2<nfb>"ready"/<wav 2>*"pho1.wav"/;
+712 ~2<nfb>"ready"/<wav 2>*"phu1.wav"/;
+713 ~2<nfb>"ready"/<wav 2>*"bi1.wav"/;
+714 ~2<nfb>"ready"/<wav 2>*"bo1.wav"/;
+715 ~2<nfb>"ready"/<wav 2>*"bu1.wav"/;
+716 ~2<nfb>"ready"/<wav 2>*"pi1.wav"/;
+717 ~2<nfb>"ready"/<wav 2>*"po1.wav"/;
+718 ~2<nfb>"ready"/<wav 2>*"pu1.wav"/;
+719 ~3<nfb>"ready"/<wav 2>*"phi3.wav"/;
+720 ~3<nfb>"ready"/<wav 2>*"pho3.wav"/;
+721 ~3<nfb>"ready"/<wav 2>*"phu3.wav"/;
+722 ~3<nfb>"ready"/<wav 2>*"bi3.wav"/;
+723 ~3<nfb>"ready"/<wav 2>*"bo3.wav"/;
+724 ~3<nfb>"ready"/<wav 2>*"bu3.wav"/;
+725 ~3<nfb>"ready"/<wav 2>*"pi3.wav"/;
+726 ~3<nfb>"ready"/<wav 2>*"po3.wav"/;
+727 ~3<nfb>"ready"/<wav 2>*"pu3.wav"/;\
\+801 ~1<nfb>"ready"/<wav 2>*"phi0.wav"/;
+802 ~1<nfb>"ready"/<wav 2>*"pho0.wav"/;
+803 ~1<nfb>"ready"/<wav 2>*"phu0.wav"/;
+804 ~1<nfb>"ready"/<wav 2>*"bi0.wav"/;
+805 ~1<nfb>"ready"/<wav 2>*"bo0.wav"/;
+806 ~1<nfb>"ready"/<wav 2>*"bu0.wav"/;
+807 ~1<nfb>"ready"/<wav 2>*"pi0.wav"/;
+808 ~1<nfb>"ready"/<wav 2>*"po0.wav"/;
+809 ~1<nfb>"ready"/<wav 2>*"pu0.wav"/;
+810 ~2<nfb>"ready"/<wav 2>*"phi1.wav"/;
+811 ~2<nfb>"ready"/<wav 2>*"pho1.wav"/;
+812 ~2<nfb>"ready"/<wav 2>*"phu1.wav"/;
+813 ~2<nfb>"ready"/<wav 2>*"bi1.wav"/;
+814 ~2<nfb>"ready"/<wav 2>*"bo1.wav"/;
+815 ~2<nfb>"ready"/<wav 2>*"bu1.wav"/;
+816 ~2<nfb>"ready"/<wav 2>*"pi1.wav"/;
+817 ~2<nfb>"ready"/<wav 2>*"po1.wav"/;
+818 ~2<nfb>"ready"/<wav 2>*"pu1.wav"/;
+819 ~3<nfb>"ready"/<wav 2>*"phi3.wav"/;
+820 ~3<nfb>"ready"/<wav 2>*"pho3.wav"/;
+821 ~3<nfb>"ready"/<wav 2>*"phu3.wav"/;
+822 ~3<nfb>"ready"/<wav 2>*"bi3.wav"/;
+823 ~3<nfb>"ready"/<wav 2>*"bo3.wav"/;
+824 ~3<nfb>"ready"/<wav 2>*"bu3.wav"/;
+825 ~3<nfb>"ready"/<wav 2>*"pi3.wav"/;
+826 ~3<nfb>"ready"/<wav 2>*"po3.wav"/;
+827 ~3<nfb>"ready"/<wav 2>*"pu3.wav"/;\
$0 <ln 0> "End of experiment. Please press the bell. THANK YOU!";$
Appendix A8.10 Rating Procedure – Production Task
Two native Thai-speaking raters rated the productions of consonants, vowels, and tones. In each of the rating tasks, three speakers' imitations had to be rated. Each task included both musicians' and non-musicians' productions, and the productions were randomised. The rater rated each item on a scale from 1 to 5, where 1 was "very bad", 2 "bad", 3 "average", 4 "good", and 5 "very good". The raters were instructed to ignore possible differences in recording quality and to concentrate only on the quality of the productions, in terms of the particular sound that was to be rated.

Due to time constraints it was not possible to have all items rated by both raters. In order to check inter-rater reliability, the productions of four of the participants were rated by both raters independently. The reliability between the two raters was high (mean Pearson product-moment correlation of r = .828), so it can be assumed that the raters were using equivalent rating criteria.
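The Pearson product-moment correlation used for this reliability check can be computed with a short script. The Python sketch below is illustrative only and is not part of the thesis; the rater_a/rater_b values are hypothetical ratings on the 1-5 scale, not data from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length rating lists."""
    n = len(xs)
    assert n == len(ys) and n > 1
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances around the means (unscaled; the scale cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical 1-5 ratings from two raters for the same eight items:
rater_a = [2, 3, 4, 4, 5, 3, 2, 4]
rater_b = [2, 4, 4, 5, 5, 3, 3, 4]
print(f"r = {pearson_r(rater_a, rater_b):.3f}")
```

Averaging such an r over the four doubly rated participants gives the mean reliability figure reported above.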
Participant   Speech sound   Rater A   Rater B   Correlation
m18 b 2.090909 2.022727 0.856317
p 3.94 4.066667
ph 3.522727 4
0 4.066667 4.266667
1 3.136364 3.704545
3 3.466667 4.4
i 3.590909 3.636364
o 4.355556 4.066667
u 3.369565 4.152174
n8 b 2.181818 1.931818 0.804418
p 3.340909 3.522727
ph 3.190476 3.880952
0 2.348837 2.976744
1 3 3.954545
3 2.5 2.863636
i 3.404762 3.428571
o 3.222222 3.2254
u 2.844444 3.244444
m10 b 3.266667 3.666667 0.873692
p 2.444444 2.955556
ph 3.4 4.222222
0 3.31288 4.022727
1 3.355556 4.066667
3 3.311111 3.488889
i 3.733333 4.066667
o 3.355556 4.222222
u 3.636364 4.377778
n9 b 2.088889 2.369565 0.708566
p 3.111111 3.355556
ph 2.733333 2.822222
0 3.822222 3.133333
1 2.431818 2.931818
3 2.81089 3.355556
i 3.133333 3.133333
o 2.355556 2.688889
u 2.81089 3.045455
Appendix A8.11 Raw Data – Musical Aptitude Test
participant musicianship tone score rhythm score
1 musician 93 97
2 musician 39 50
3 musician 79 85
4 musician 97 98
5 musician 75 60
6 musician 87 92
7 musician 98 92
8 musician 71 65
9 musician 50 45
10 musician 61 55
11 musician 66 70
12 musician 79 85
13 musician 29 50
14 musician 34 25
15 musician 83 80
16 musician 87 60
17 musician 97 95
18 musician 29 55
19 non-musician 44 25
20 non-musician 44 65
21 non-musician 93 85
22 non-musician 79 85
23 non-musician 24 7
24 non-musician 29 40
25 non-musician 50 35
26 non-musician 34 45
27 non-musician 75 75
28 non-musician 19 60
29 non-musician 75 60
30 non-musician 66 75
31 non-musician 61 65
32 non-musician 61 45
33 non-musician 50 30
34 non-musician 61 40
35 non-musician 56 55
36 non-musician 44 55
Appendix A8.12 SPSS Output – Musical Aptitude Test
Descriptive Statistics

Measure   musnon         Mean    Std. Deviation   N
tone      non-musician   52.72   20.008           18
          musician       70.17   20.563           18
          Total          61.44   21.865           36
rhythm    non-musician   53.61   20.003           18
          musician       69.67   24.034           18
          Total          61.64   23.264           36
total     non-musician   52.61   21.180           18
          musician       69.94   21.394           18
          Total          61.28   22.748           36
Tests of Between-Subjects Effects

Source            Measure   Type III Sum of Squares   df   Mean Square   F         Sig.   Partial Eta Squared
Corrected Model   tone      2738.778a                 1    2738.778      6.654     .014   .164
                  rhythm    2320.028b                 1    2320.028      4.745     .036   .122
                  total     2704.000c                 1    2704.000      5.967     .020   .149
Intercept         tone      135915.111                1    135915.111    330.218   .000   .907
                  rhythm    136776.694                1    136776.694    279.770   .000   .892
                  total     135178.778                1    135178.778    298.307   .000   .898
musnon            tone      2738.778                  1    2738.778      6.654     .014   .164
                  rhythm    2320.028                  1    2320.028      4.745     .036   .122
                  total     2704.000                  1    2704.000      5.967     .020   .149
Error             tone      13994.111                 34   411.592
                  rhythm    16622.278                 34   488.891
                  total     15407.222                 34   453.154
Total             tone      152648.000                36
                  rhythm    155719.000                36
                  total     153290.000                36
Corrected Total   tone      16732.889                 35
                  rhythm    18942.306                 35
                  total     18111.222                 35

a. R Squared = .164 (Adjusted R Squared = .139)
b. R Squared = .122 (Adjusted R Squared = .097)
c. R Squared = .149 (Adjusted R Squared = .124)
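The between-subjects tests reported in these appendices are standard one-way ANOVAs, which can be reproduced directly from the raw scores. The Python sketch below is not part of the thesis; it computes F from first principles using the musician and non-musician "tone score" columns of Appendix A8.11, and the resulting F(1, 34) = 4.745 matches the musnon effect reported for one of the dependent measures in the table above.

```python
# One-way between-subjects ANOVA (two groups), computed from first principles.
# Scores are the "tone score" column of Appendix A8.11.
musicians = [93, 39, 79, 97, 75, 87, 98, 71, 50, 61, 66, 79,
             29, 34, 83, 87, 97, 29]
non_musicians = [44, 44, 93, 79, 24, 29, 50, 34, 75, 19, 75, 66,
                 61, 61, 50, 61, 56, 44]

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way between-subjects design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-groups SS: group-size-weighted squared deviations of group means.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, df1, df2 = one_way_anova(musicians, non_musicians)
print(f"F({df1}, {df2}) = {f:.3f}")  # F(1, 34) = 4.745
```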
Appendix A8.13 Raw Data – Musical Memory Test
Song Shift Non-musicians Musicians
Franz Ferdinand -2 1 1
Los Del Rio -2 0.9375 1
Eamon -2 0.9333 1
Cat Empire -1 0.1429 0.4615
Nelly/Kelly -1 0.9375 0.8333
Jamelia -1 0.8125 1
Puff Daddy 1 0.875 0.5833
Outkast 1 1 0.9333
Queen 1 0.6 0.8571
Men at Work 2 0.9375 0.8
Black eyed Peas 2 1 0.8462
Michael Jackson 2 0.9375 0.9286
Appendix A8.14 SPSS Output – Musical Memory Test
Tests of Between-Subjects Effects
Dependent Variable: result

Source                    Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model           .315a                     7    .045          1.055     .434
Intercept                 17.267                    1    17.267        404.775   .000
musnon                    .001                      1    .001          .016      .900
updown                    .002                      1    .002          .056      .816
shift                     .217                      1    .217          5.095     .038
musnon * updown           .036                      1    .036          .850      .370
musnon * shift            .009                      1    .009          .218      .647
updown * shift            .049                      1    .049          1.144     .301
musnon * updown * shift   .000                      1    .000          .005      .943
Error                     .683                      16   .043
Total                     18.265                    24
Corrected Total           .998                      23

a. R Squared = .316 (Adjusted R Squared = .016)
Appendix A8.15 Raw Data – Language Aptitude Test
Participant Musicianship Part Five Part Six
1 musician 0.8 1
2 musician 0.5 1
3 musician 0.9667 0.9583
4 musician 0.9667 1
5 musician 0.9667 0.9167
6 musician 0.9333 1
7 musician 0.9333 1
8 musician 0.7 0.9583
9 musician 0.5667 0.9583
10 musician 0.9 1
11 musician 0.9333 0.9583
12 musician 0.8667 0.9583
13 musician 0.8667 1
14 musician 0.6667 0.875
15 musician 0.6 0.8333
16 musician 0.7 0.75
17 musician 0.9333 0.9583
18 musician 0.6 1
19 non-musician 0.6 0.8333
20 non-musician 0.5333 0.7917
21 non-musician 0.9667 1
22 non-musician 0.7333 1
23 non-musician 0.6 1
24 non-musician 0.5667 0.9167
25 non-musician 0.7333 0.9167
26 non-musician 0.4 0.9583
27 non-musician 0.7 0.9167
28 non-musician 0.8 1
29 non-musician 1 0.9167
30 non-musician 0.9 0.9583
31 non-musician 0.8 0.875
32 non-musician 0.6667 0.7917
33 non-musician 0.7 0.9583
34 non-musician 0.7333 0.875
35 non-musician 0.7667 1
36 non-musician 0.4667 0.9167
Appendix A8.16 SPSS Outputs – Language Aptitude Test
Tests of Between-Subjects Effects

Source            Measure   Type III Sum of Squares   df   Mean Square   F          Sig.   Partial Eta Squared
Corrected Model   five      .083a                     1    .083          3.242      .081   .087
                  six       .007b                     1    .007          1.462      .235   .041
                  total     .035c                     1    .035          3.761      .061   .100
Intercept         five      20.350                    1    20.350        790.469    .000   .959
                  six       31.641                    1    31.641        6662.903   .000   .995
                  total     25.685                    1    25.685        2788.953   .000   .988
musnon            five      .083                      1    .083          3.242      .081   .087
                  six       .007                      1    .007          1.462      .235   .041
                  total     .035                      1    .035          3.761      .061   .100
Error             five      .875                      34   .026
                  six       .161                      34   .005
                  total     .313                      34   .009
Total             five      21.309                    36
                  six       31.809                    36
                  total     26.033                    36
Corrected Total   five      .959                      35
                  six       .168                      35
                  total     .348                      35

a. R Squared = .087 (Adjusted R Squared = .060)
b. R Squared = .041 (Adjusted R Squared = .013)
c. R Squared = .100 (Adjusted R Squared = .073)
Appendix A8.17 Raw Data - Speech Perception Test Criterion Results
Participant Musicianship Consonant Tone Vowel
1 musician 6 3 3
2 musician 4 3 3
3 musician 6 4 3
4 musician 3 5 3
5 musician 8 8 4
6 musician 3 3 3
7 musician 3 3 3
8 musician 6 3 4
9 musician 12 3 3
10 musician 3 5 3
11 musician 3 5 3
12 musician 3 3 3
13 musician 6 3 3
14 musician 3 5 3
15 musician 3 6 3
16 musician 6 3 3
17 musician 13 10 3
18 musician 10 4 4
19 non-musician 4 4 3
20 non-musician 12 3 3
21 non-musician 4 3 3
22 non-musician 12 3 3
23 non-musician 4 3 3
24 non-musician 12 19 3
25 non-musician 6 5 3
26 non-musician 7 3 3
27 non-musician 5 3 3
28 non-musician 12 3 3
29 non-musician 5 3 3
30 non-musician 9 18 3
31 non-musician 10 10 3
32 non-musician 10 3 3
33 non-musician 7 4 3
34 non-musician 6 4 3
35 non-musician 9 19 3
36 non-musician 7 10 3
Appendix A8.18 PSY Output – Criterion Results
Analysis of Variance Summary Table
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 59.6671 1 59.6671 4.6995
Error 406.2917 32 12.6966
---------------------------------------------------
Within
---------------------------------------------------
W1 13.4447 1 13.4447 1.1991
B1W1 12.1391 1 12.1391 1.0827
Error 358.7958 32 11.2124
W2 237.6563 1 237.6563 48.1977
B1W2 24.8063 1 24.8063 5.0308
Error 157.7875 32 4.9309
---------------------------------------------------
Appendix A8.19 Raw Data - Speech Perception Test Results
Consonant Identification Tone Identification Vowel Identification
Participant Musicianship Overall b p ph Overall tone 0 tone 1 tone 3 Overall u i o
1 musician 0.731 1 0.31 0.83 0.944 0.89 0.94 1 1 1 1 1
2 musician 0.685 0.56 0.61 0.89 0.944 0.94 0.94 0.94 1 1 1 1
3 musician 0.833 0.94 0.67 0.89 0.63 0.44 0.5 0.94 0.963 1 0.89 1
4 musician 0.537 0.44 0.22 0.94 0.907 0.83 0.89 1 1 1 1 1
5 musician 0.698 0.94 0.35 0.78 0.463 0.33 0.17 0.89 1 1 1 1
6 musician 0.796 0.89 0.5 1 1 1 1 1 0.963 1 1 0.89
7 musician 0.741 0.61 0.67 0.94 0.852 0.78 0.78 1 0.981 0.94 1 1
8 musician 0.574 1 0.56 0.17 0.704 0.61 0.56 0.94 0.944 0.96 1 0
9 musician 0.593 1 0.22 0.56 0.926 0.83 0.94 1 0.87 0.76 1 0.83
10 musician 0.907 0.83 0.89 1 1.019 1 0.94 1 1 1 1 1
11 musician 0.648 0.72 0.22 1 0.519 0.17 0.39 1 0.981 1 1 0.94
12 musician 0.852 0.78 0.83 0.94 0.944 0.94 1 0.89 0.963 1 0.94 0.94
13 musician 0.5 0.89 0.17 0.44 0.852 0.72 0.89 0.94 0.907 0.72 1 1
14 musician 0.37 0.28 0.44 0.39 0.593 0.39 0.61 0.78 0.98 1 1 0.94
15 musician 0.481 1 0.11 0.33 0.648 0.33 0.67 0.94 0.981 0.94 1 1
16 musician 0.426 0.89 0.11 0.28 0.981 0.94 1 1 0.981 1 0.94 1
17 musician 0.692 0.69 0.72 0.67 0.87 0.72 0.89 1 0.889 0.78 1 0.89
18 musician 0.5 0.89 0.39 0.22 0.63 0.33 0.56 1 1 1 1 1
1 non-musician 0.385 0.89 0 0.25 0.648 0.44 0.5 1 1 1 1 1
2 non-musician 0.556 0.89 0.11 0.67 0.574 0.28 1 0.94 0.759 0.67 0.89 0.72
3 non-musician 0.815 0.89 0.72 0.83 0.537 0.33 0.65 1 1 1 1 1
4 non-musician 0.519 1 0.06 0.44 0.926 0.94 0.83 1 1 1 1 1
5 non-musician 0.407 1 0 0.22 0.556 0.28 0.5 0.89 0.963 1 0.94 0.94
6 non-musician 0.722 0.78 0.72 0.67 0.537 0.17 0.5 0.94 1 1 1 1
7 non-musician 0.5 0.89 0.11 0.5 0.5 0.11 0.39 1 0.963 0.94 0.94 1
8 non-musician 0.444 0.94 0 0.39 0.667 0.56 0.56 0.89 0.87 0.72 0.94 0.94
9 non-musician 0.519 1 0.06 0.44 0.926 0.94 0.83 1 1 1 1 1
10 non-musician 0.481 0.83 0.11 0.5 0.444 0.11 0.22 1 1 1 1 1
11 non-musician 0.556 1 0 0.67 1 1 1 1 1 1 1 1
12 non-musician 0.611 0.94 0 0.89 0.5 0.17 0.5 0.83 0.981 1 0.94 1
13 non-musician 0.528 0.83 0.18 0.56 0.537 0.17 0.44 1 0.926 0.83 0.94 1
14 non-musician 0.7 0.69 0.41 1 0.463 0.17 0.22 1 1 1 1 1
15 non-musician 0.593 0.5 0.44 0.83 0.722 0.5 0.83 0.83 0.34 0.22 0.35 0.44
16 non-musician 0.463 1 0.11 0.28 0.463 0.06 0.33 1 0.519 0.39 1 0.17
17 non-musician 0.627 0.88 0.33 0.67 0.574 0.39 0.33 1 0.981 1 1 0.94
18 non-musician 0.574 0.94 0.11 0.67 0.574 0.22 0.5 1 0.981 1 1 0.94
Appendix A8.20 Statistical Output – Speech Perception Test
Analysis of Variance Summary Table – Perception Overall
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.3295 1 0.3295 13.1538
Error 0.8518 34 0.0251
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0780 1 0.0780 3.1401
B1W1 0.0693 1 0.0693 2.7878
Error 0.8450 34 0.0249
W2 2.0417 1 2.0417 122.7228
B1W2 0.0027 1 0.0027 0.1652
Error 0.5657 34 0.0166
---------------------------------------------------
Analysis of Variance Summary Table – Tone Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.7257 1 0.7257 11.3065
Error 2.0539 32 0.0642
---------------------------------------------------
Within
---------------------------------------------------
W1 3.4631 1 3.4631 120.2344
B1W1 0.3845 1 0.3845 13.3508
Error 0.9217 32 0.0288
W2 0.3133 1 0.3133 20.0532
B1W2 0.0511 1 0.0511 3.2680
Error 0.5000 32 0.0156
---------------------------------------------------
Analysis of Variance Summary Table – Consonant Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.185 1 0.185 5.556
Error 1.662 32 0.052
---------------------------------------------------
Within
---------------------------------------------------
W1 2.913 1 2.913 46.106
B1W1 0.410 1 0.410 6.494
Error 2.022 32 0.063
W2 1.751 1 1.751 50.307
B1W2 0.089 1 0.089 2.571
Error 1.114 32 0.035
---------------------------------------------------
Analysis of Variance Summary Table - Vowel Perception
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.0605 1 0.0605 0.9460
Error 2.0448 32 0.0639
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0097 1 0.0097 0.9476
B1W1 0.0107 1 0.0107 1.0506
Error 0.3265 32 0.0102
W2 0.0632 1 0.0632 2.5817
B1W2 0.0009 1 0.0009 0.0382
Error 0.7833 32 0.0245
---------------------------------------------------
Appendix A8.21 Raw Data - Speech Production Rating Results
Consonant Production Tone Production Vowel Production
Participant Musicianship Overall b p ph Overall 0 1 3 Overall i o u
1 musician 3.044 3.18 2.44 3.51 3.044 3.47 3.18 2.49 3.284 3.32 3.4 3.13
2 musician 3.315 3.18 3.39 3.38 2.992 3.31 2.8 2.86 3.34 3.84 2.91 3.27
3 musician 2.952 2.64 2.84 3.38 2.898 2.67 3.25 2.78 2.878 3.04 2.57 3.02
4 musician 3.154 3.98 2.82 2.67 3.037 3.29 2.78 3.04 3.334 3.6 3.14 3.27
5 musician 3.115 3.5 2.27 3.58 3.253 3.3 3.41 3.05 3.319 3.44 2.98 3.53
6 musician 3.259 2.2 4.4 3.18 3.77 4.33 2.89 4.09 3.897 4.09 3.28 4.32
7 musician 3.399 3.64 2.82 3.73 3.17 3.02 3.31 3.18 3.41 3.36 3.47 3.41
8 musician 2.788 2.31 3.39 2.67 3.044 3.8 2.56 2.78 3.222 3.89 2.2 3.58
9 musician 3.045 2.04 3.09 4 3.311 3.09 4 2.84 3.45 3.48 3.71 3.16
10 musician 3.444 3.67 2.44 4.22 3.859 4.02 4.07 3.49 4.111 3.73 4.22 4.38
11 musician 3.23 3.02 3.04 3.62 3.643 4.18 3.24 3.51 3.978 4.42 3.33 4.18
12 musician 3.059 2.36 3.38 3.44 3.388 3.4 3.4 3.36 3.296 3.67 3.29 2.93
13 musician 3.091 2.52 3.26 3.49 3.548 3.67 3.89 3.09 3.578 3.96 3.36 3.42
14 musician 2.719 2.13 2.5 3.52 3.024 3.53 2.96 2.58 3.375 3.4 3.38 3.35
15 musician 2.741 1.91 3.27 3.04 3.356 3.71 3.71 2.64 2.822 3.29 2.36
16 musician 2.607 2.29 2.91 2.62 3.466 3.93 3.6 2.86 2.656 2.84 2.47
17 musician 3.234 3.82 2.11 3.77 3.395 3.34 4.11 2.73 3.274 3.47 3.31 3.05
18 musician 3.363 2.02 4.07 4 4.169 4.27 3.84 4.4 3.712 3.59 4.36 3.19
1 non-musician 2.932 2.64 2.52 3.63 3.192 3.27 3.47 2.84 3.285 3.6 3.77 2.49
2 non-musician 2.793 2.51 2.96 2.91 2.983 2.73 2.87 3.36 3.2 3.58 3.07 2.95
3 non-musician 3 2.71 2.31 3.98 3 3.13 3.36 2.51 3.281 3.27 3.04 3.53
4 non-musician 2.859 2.51 3 3.07 2.933 2.87 3.42 2.51 2.867 3.67 2.53 2.4
5 non-musician 2.775 2.27 2.66 3.4 2.593 2.98 2.93 1.87 3.135 3.2 2.98 3.22
6 non-musician 2.568 1.81 2.56 3.33 1.924 1.27 3 1.5 2.57 2.73 2.57 2.41
7 non-musician 3.081 2.14 3.57 3.53 3.284 3.56 2.91 3.39 3.641 3.71 3.59 3.62
8 non-musician 2.965 2.18 3.52 3.19 2.737 2.35 3 2.86 3.298 3.43 3.22 3.24
9 non-musician 2.768 2.37 3.11 2.82 3.14 3.13 2.93 3.36 2.956 3.13 2.69 3.05
10 non-musician 2.996 2.91 2.67 3.41 3.444 3.29 3.6 3.385 3.16 3.6 3.4
11 non-musician 3.141 2.49 3.51 3.42 3.415 3.57 3.22 3.45 3.552 4.18 3.5 2.98
12 non-musician 3.201 3.21 3.26 3.13 2.962 3.02 2.75 3.11 3.268 3.54 2.89 3.37
13 non-musician 3.105 3.16 3.24 2.91 2.881 2.67 3.07 2.91 3.11 3.33 2.89 3.11
14 non-musician 3.074 2.64 2.56 4.02 3.326 3.32 3.33 3.304 3.47 3.76 2.69
15 non-musician 3.062 3.67 2.49 3.02 3.154 3.35 2.91 3.2 2.947 3.32 3.02 2.5
16 non-musician 2.648 1.89 2.45 3.6 3.181 3.24 3.48 2.82 3.467 3.22 3.51 3.67
17 non-musician 2.941 2.76 2.76 3.31 3.267 3.89 2.44 3.47 3.815 4.36 3.09 4
18 non-musician 2.836 2.95 2.3 3.27 3.187 3.71 2.64 3.2 3.516 3.73 2.95 3.87
Appendix A8.22 Statistical Output – Speech Production Ratings
Analysis of Variance Summary Table – Overall Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 1.1035 1 1.1035 5.5592
Error 6.7490 34 0.1985
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0205 1 0.0205 0.4059
B1W1 0.1881 1 0.1881 3.7170
Error 1.7204 34 0.0506
W2 1.7514 1 1.7514 47.7272
B1W2 0.0032 1 0.0032 0.0861
Error 1.2477 34 0.0367
---------------------------------------------------
Analysis of Variance Summary Table – Tone Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 3.7316 1 3.7316 9.9563
Error 11.2440 30 0.3748
---------------------------------------------------
Within
---------------------------------------------------
W1 0.0900 1 0.0900 0.3065
B1W1 0.0105 1 0.0105 0.0356
Error 8.8107 30 0.2937
W2 1.4827 1 1.4827 15.8619
B1W2 0.3851 1 0.3851 4.1196
Error 2.8043 30 0.0935
---------------------------------------------------
Analysis of Variance Summary Table – Consonant Production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 0.7154 1 0.7154 5.3179
Error 4.3050 32 0.1345
---------------------------------------------------
Within
---------------------------------------------------
W1 4.6841 1 4.6841 12.3643
B1W1 0.0464 1 0.0464 0.1224
Error 12.1231 32 0.3788
W2 3.9548 1 3.9548 16.0600
B1W2 0.0139 1 0.0139 0.0563
Error 7.8800 32 0.2463
---------------------------------------------------
Analysis of Variance Summary Table – Vowel production
Source SS df MS F
---------------------------------------------------
Between
---------------------------------------------------
B1 1.1008 1 1.1008 3.6757
Error 8.9844 30 0.2995
---------------------------------------------------
Within
---------------------------------------------------
W1 0.1876 1 0.1876 1.3645
B1W1 0.0984 1 0.0984 0.7155
Error 4.1255 30 0.1375
W2 1.8119 1 1.8119 12.4173
B1W2 0.0014 1 0.0014 0.0095
Error 4.3776 30 0.1459
---------------------------------------------------
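Each F in these summary tables is the ratio of the effect mean square to its error mean square, with MS = SS/df. A minimal Python check (a sketch, not part of the original analysis) against the B1 effect in the Overall Production table above:

```python
# Recompute MS and F from reported sums of squares and degrees of freedom.
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    ms_effect = ss_effect / df_effect   # MS = SS / df
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Overall Production, between-subjects effect B1 vs its error term:
# SS = 1.1035 (df 1) against SS = 6.7490 (df 34); the table reports F = 5.5592.
f_b1 = f_ratio(1.1035, 1, 6.7490, 34)
print(round(f_b1, 4))
```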
Appendix A8.23 Raw Data – Musical Experience

Participant  Number of Instruments  Years of Training  Hours per Week
1 6 16 15
2 4 5 2.5
3 4 17 10
4 4 17 7
5 3 15 6
6 3 12 3
7 7 22 3.5
8 4 11 7
9 2 5 0
10 5 5 1
11 2 5 3
12 3 26 10
13 1 13 10
14 2 9 14
15 4 15 17.5
16 3 50 3.5
17 1 22 0
18 1 18 4
19 0 0 0
20 0 0 0
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 1 2 0
29 0 0 0
30 0 0 0
31 0 0 0
32 0 0 0
33 0 0 0
34 0 0 0
35 0 0 0
36 0 0 0
Appendix A8.24 Factor Analysis Output – Musical Training
Correlation Matrix

             instruments  years  hoursweek
instruments  1.000        .609   .627
years        .609         1.000  .506
hoursweek    .627         .506   1.000

Communalities

             Initial  Extraction
instruments  1.000    .781
years        1.000    .683
hoursweek    1.000    .699

Extraction Method: Principal Component Analysis.

Total Variance Explained

           Initial Eigenvalues                  Extraction Sums of Squared Loadings
Component  Total  % of Variance  Cumulative %  Total  % of Variance  Cumulative %
1          2.163  72.107         72.107        2.163  72.107         72.107
2          .495   16.495         88.602
3          .342   11.398         100.000

Extraction Method: Principal Component Analysis.

Component Matrix(a)

             Component 1
instruments  .884
years        .827
hoursweek    .836

Extraction Method: Principal Component Analysis.
a. 1 component extracted.
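The extraction results above follow from the eigendecomposition of the correlation matrix. A short sketch (using numpy, not the original SPSS run) that recovers the eigenvalues and percentages of variance from the three-variable matrix; small discrepancies in the percentages arise because the printed correlations are rounded to three decimals:

```python
import numpy as np

# Correlation matrix from the output above (instruments, years, hoursweek).
R = np.array([[1.000, 0.609, 0.627],
              [0.609, 1.000, 0.506],
              [0.627, 0.506, 1.000]])

# eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse them.
eigvals = np.linalg.eigvalsh(R)[::-1]
pct = 100 * eigvals / eigvals.sum()   # % of variance per component

print(np.round(eigvals, 3))  # first eigenvalue is approximately 2.163
print(np.round(pct, 3))      # component 1 accounts for approximately 72.1%
```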
Appendix A8.25 Raw Data – Sequential Regression

Participant  Musical Training  Language Aptitude  Musical Aptitude  Musical Memory  Tone Perception  Consonant Perception  Vowel Perception
1 2.08656 48 92 0.909 0.944 0.731 1
2 0.31568 39 49 0.625 0.944 0.685 1
3 1.32641 52 82 0.909 0.63 0.833 0.963
4 1.09071 53 96 1 0.907 0.537 1
5 0.74068 51 70 0.909 0.463 0.698 1
6 0.3996 52 88 0.889 1 0.796 0.963
7 1.59498 52 92 0.6 0.852 0.741 0.981
8 0.87997 44 70 0.833 0.704 0.574 0.944
9 -0.28318 40 47 0.917 0.926 0.593 0.87
10 0.39904 51 59 0.9 1.019 0.907 1
11 -0.04747 51 70 0.667 0.519 0.648 0.981
12 1.4413 49 82 0.917 0.944 0.852 0.963
13 0.58228 50 50 0.917 0.852 0.5 0.907
14 0.95727 41 26 0.917 0.593 0.37 0.98
15 1.84543 38 80 0.917 0.648 0.481 0.981
16 1.77355 39 76 0.8 0.981 0.426 0.981
17 0.11269 51 93 0.917 0.87 0.692 0.889
18 0.28648 42 41 0.917 0.63 0.5 1
19 -0.86122 38 32 0.667 0.648 0.385 1
20 -0.86122 35 56 0.917 0.574 0.556 0.759
21 -0.86122 59 88 0.778 0.537 0.815 1
22 -0.86122 46 82 1 0.926 0.519 1
23 -0.86122 42 12 0.667 0.556 0.407 0.963
24 -0.86122 39 32 0.444 0.537 0.722 1
25 -0.86122 44 41 0.833 0.5 0.5 0.963
26 -0.86122 35 38 0.833 0.667 0.444 0.87
27 -0.86122 43 76 0.917 0.926 0.519 1
28 -0.86122 48 50 0.917 0.463 0.315 0.815
29 -0.86122 52 70 1 1 0.556 1
30 -0.86122 50 72 0.833 0.667 0.66 0.963
31 -0.86122 45 65 0.833 0.537 0.528 0.926
32 -0.86122 39 32 0.917 0.593 0.519 0.926
33 -0.86122 44 38 0.833 0.722 0.593 0.34
34 -0.86122 43 35 0.833 . 0.509 0.833
35 -0.86122 47 32 0.9 0.593 0.444 1
36 -0.86122 36 50 0.917 0.574 0.574 0.981
Appendix A8.26 SPSS Output – Sequential Regression
Tone Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .195a  .038      .009               .183577                     .038             1.306     1    33   .261
2      .479b  .230      .182               .166808                     .192             7.969     1    32   .008
3      .494c  .244      .171               .167906                     .014             .583      1    31   .451
4      .522d  .272      .175               .167465                     .028             1.164     1    30   .289

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .044            1   .044         1.306   .261a
   Residual     1.112           33  .034
   Total        1.156           34
2  Regression   .266            2   .133         4.775   .015b
   Residual     .890            32  .028
   Total        1.156           34
3  Regression   .282            3   .094         3.336   .032c
   Residual     .874            31  .028
   Total        1.156           34
4  Regression   .315            4   .079         2.806   .043d
   Residual     .841            30  .028
   Total        1.156           34

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: perctone
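At each step, the Model Summary reports an F for the change in R Square when predictors are added. As a check, this statistic can be recomputed from the tabled R Square values (a sketch; small discrepancies arise because the printed R Square values are rounded to three decimals):

```python
# F for the change in R-square when df1 predictors are added at a step:
#   F_change = (delta_R2 / df1) / ((1 - R2_new) / df2)
def f_change(r2_new, r2_old, df1, df2):
    return ((r2_new - r2_old) / df1) / ((1 - r2_new) / df2)

# Step 2 of the Tone Perception model (adding amma): R Square .038 -> .230.
# The printed output reports F Change = 7.969 (from unrounded R Square values).
print(round(f_change(0.230, 0.038, 1, 32), 2))
```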
Tone Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .253a  .064      .036               .37495                      .064             2.320     1    34   .137
2      .256b  .066      .009               .38025                      .002             .059      1    33   .809
3      .494c  .244      .173               .34728                      .179             7.565     1    32   .010
4      .543d  .295      .204               .34082                      .051             2.224     1    31   .146

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .326            1   .326         2.320   .137a
   Residual     4.780           34  .141
   Total        5.106           35
2  Regression   .335            2   .167         1.157   .327b
   Residual     4.772           33  .145
   Total        5.106           35
3  Regression   1.247           3   .416         3.447   .028c
   Residual     3.859           32  .121
   Total        5.106           35
4  Regression   1.505           4   .376         3.240   .025d
   Residual     3.601           31  .116
   Total        5.106           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: toneprod
Consonant Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .536a  .287      .266               .125963                     .287             13.709    1    34   .001
2      .600b  .360      .322               .121128                     .073             3.769     1    33   .061
3      .662c  .438      .385               .115312                     .078             4.413     1    32   .044
4      .665d  .442      .370               .116774                     .004             .204      1    31   .655

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .218            1   .218         13.709  .001a
   Residual     .539            34  .016
   Total        .757            35
2  Regression   .273            2   .136         9.297   .001b
   Residual     .484            33  .015
   Total        .757            35
3  Regression   .331            3   .110         8.310   .000c
   Residual     .426            32  .013
   Total        .757            35
4  Regression   .334            4   .084         6.128   .001d
   Residual     .423            31  .014
   Total        .757            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: perccons
Consonant Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .519a  .269      .247               .19462                      .269             12.511    1    34   .001
2      .519b  .269      .225               .19748                      .001             .023      1    33   .881
3      .519c  .269      .201               .20054                      .000             .000      1    32   .998
4      .520d  .270      .176               .20362                      .001             .037      1    31   .848

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .474            1   .474         12.511  .001a
   Residual     1.288           34  .038
   Total        1.762           35
2  Regression   .475            2   .237         6.087   .006b
   Residual     1.287           33  .039
   Total        1.762           35
3  Regression   .475            3   .158         3.935   .017c
   Residual     1.287           32  .040
   Total        1.762           35
4  Regression   .476            4   .119         2.872   .039d
   Residual     1.285           31  .041
   Total        1.762           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: consprod
Vowel Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .167a  .028      -.001              .118821                     .028             .979      1    34   .329
2      .247b  .061      .004               .118528                     .033             1.168     1    33   .288
3      .288c  .083      -.003              .118983                     .021             .748      1    32   .394
4      .330d  .109      -.006              .119159                     .026             .905      1    31   .349

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .014            1   .014         .979    .329a
   Residual     .480            34  .014
   Total        .494            35
2  Regression   .030            2   .015         1.076   .353b
   Residual     .464            33  .014
   Total        .494            35
3  Regression   .041            3   .014         .961    .423c
   Residual     .453            32  .014
   Total        .494            35
4  Regression   .054            4   .013         .945    .451d
   Residual     .440            31  .014
   Total        .494            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: percvowel
Vowel Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .325a  .106      .079               .33015                      .106             4.019     1    34   .053
2      .484b  .234      .188               .31011                      .128             5.537     1    33   .025
3      .529c  .279      .212               .30547                      .045             2.009     1    32   .166
4      .531d  .282      .189               .30986                      .002             .100      1    31   .754

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .438            1   .438         4.019   .053a
   Residual     3.706           34  .109
   Total        4.144           35
2  Regression   .970            2   .485         5.046   .012b
   Residual     3.174           33  .096
   Total        4.144           35
3  Regression   1.158           3   .386         4.137   .014c
   Residual     2.986           32  .093
   Total        4.144           35
4  Regression   1.168           4   .292         3.040   .032d
   Residual     2.976           31  .096
   Total        4.144           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, amma
c. Predictors: (Constant), plab, amma, musmem
d. Predictors: (Constant), plab, amma, musmem, musical training
e. Dependent Variable: vowelprod
Appendix A8.27 SPSS Output – Alternative Sequential Regression
Tone Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .195a  .038      .009               .183577                     .038             1.306     1    33   .261
2      .407b  .166      .114               .173591                     .128             4.906     1    32   .034
3      .450c  .203      .125               .172457                     .037             1.422     1    31   .242
4      .522d  .272      .175               .167465                     .070             2.876     1    30   .100

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .044            1   .044         1.306   .261a
   Residual     1.112           33  .034
   Total        1.156           34
2  Regression   .192            2   .096         3.183   .055b
   Residual     .964            32  .030
   Total        1.156           34
3  Regression   .234            3   .078         2.624   .068c
   Residual     .922            31  .030
   Total        1.156           34
4  Regression   .315            4   .079         2.806   .043d
   Residual     .841            30  .028
   Total        1.156           34

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: perctone
Tone Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .253a  .064      .036               .37495                      .064             2.320     1    34   .137
2      .319b  .102      .047               .37286                      .038             1.384     1    33   .248
3      .519c  .269      .200               .34155                      .167             7.327     1    32   .011
4      .543d  .295      .204               .34082                      .026             1.137     1    31   .294

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .326            1   .326         2.320   .137a
   Residual     4.780           34  .141
   Total        5.106           35
2  Regression   .519            2   .259         1.865   .171b
   Residual     4.588           33  .139
   Total        5.106           35
3  Regression   1.373           3   .458         3.924   .017c
   Residual     3.733           32  .117
   Total        5.106           35
4  Regression   1.505           4   .376         3.240   .025d
   Residual     3.601           31  .116
   Total        5.106           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: toneprod
Consonant Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .536a  .287      .266               .125963                     .287             13.709    1    34   .001
2      .573b  .329      .288               .124110                     .041             2.023     1    33   .164
3      .616c  .379      .321               .121173                     .051             2.619     1    32   .115
4      .665d  .442      .370               .116774                     .062             3.457     1    31   .073

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .218            1   .218         13.709  .001a
   Residual     .539            34  .016
   Total        .757            35
2  Regression   .249            2   .124         8.072   .001b
   Residual     .508            33  .015
   Total        .757            35
3  Regression   .287            3   .096         6.518   .001c
   Residual     .470            32  .015
   Total        .757            35
4  Regression   .334            4   .084         6.128   .001d
   Residual     .423            31  .014
   Total        .757            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: perccons
Consonant Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .519a  .269      .247               .19462                      .269             12.511    1    34   .001
2      .519b  .269      .225               .19752                      .000             .009      1    33   .925
3      .519c  .269      .201               .20057                      .000             .002      1    32   .967
4      .520d  .270      .176               .20362                      .001             .048      1    31   .827

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .474            1   .474         12.511  .001a
   Residual     1.288           34  .038
   Total        1.762           35
2  Regression   .474            2   .237         6.078   .006b
   Residual     1.287           33  .039
   Total        1.762           35
3  Regression   .474            3   .158         3.930   .017c
   Residual     1.287           32  .040
   Total        1.762           35
4  Regression   .476            4   .119         2.872   .039d
   Residual     1.285           31  .041
   Total        1.762           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: consprod
Vowel Perception
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .167a  .028      -.001              .118821                     .028             .979      1    34   .329
2      .291b  .085      .029               .117028                     .057             2.050     1    33   .162
3      .314c  .099      .014               .117928                     .014             .498      1    32   .485
4      .330d  .109      -.006              .119159                     .010             .342      1    31   .563

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .014            1   .014         .979    .329a
   Residual     .480            34  .014
   Total        .494            35
2  Regression   .042            2   .021         1.529   .232b
   Residual     .452            33  .014
   Total        .494            35
3  Regression   .049            3   .016         1.170   .336c
   Residual     .445            32  .014
   Total        .494            35
4  Regression   .054            4   .013         .945    .451d
   Residual     .440            31  .014
   Total        .494            35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: percvowel
Vowel Production
Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .325a  .106      .079               .33015                      .106             4.019     1    34   .053
2      .359b  .129      .076               .33080                      .023             .866      1    33   .359
3      .384c  .148      .068               .33224                      .019             .716      1    32   .404
4      .531d  .282      .189               .30986                      .134             5.789     1    31   .022

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma

ANOVA(e)

Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   .438            1   .438         4.019   .053a
   Residual     3.706           34  .109
   Total        4.144           35
2  Regression   .533            2   .266         2.434   .103b
   Residual     3.611           33  .109
   Total        4.144           35
3  Regression   .612            3   .204         1.848   .158c
   Residual     3.532           32  .110
   Total        4.144           35
4  Regression   1.168           4   .292         3.040   .032d
   Residual     2.976           31  .096
   Total        4.144           35

a. Predictors: (Constant), plab
b. Predictors: (Constant), plab, musical training
c. Predictors: (Constant), plab, musical training, musmem
d. Predictors: (Constant), plab, musical training, musmem, amma
e. Dependent Variable: vowelprod