Overview
Background Methodology
Data Preliminary analysis Regression model
Results Discussion
Background
Keywords: VOT, variation, spontaneous speech
VOT (Voice Onset Time)
  The time between the release of the consonant and the onset of voicing in the following vowel
  Sensitive to speaker and speaking environment
[Figure: stop consonant timeline marking closure, release, and vowel onset]
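As a minimal sketch of the VOT definition, the value can be computed as the difference between two time stamps. The function name and the convention that both time stamps are given in seconds are assumptions made for illustration; they are not taken from the study.

```python
def vot_ms(release_s, voicing_onset_s):
    """Return VOT in milliseconds from two time stamps given in seconds.

    release_s: time of the consonant release (burst).
    voicing_onset_s: time at which voicing of the following vowel begins.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical token: release at 1.250 s, voicing onset at 1.307 s.
print(round(vot_ms(1.250, 1.307), 1))
```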
Background
What conditions the length of VOT?
  Place of articulation (POA)
    VOT increases as POA moves backward, i.e. [p] < [t] < [k]
  Following vowel
  Speaking rate
  Age, gender
  Dialectal background
  Speech disorders
  Lung volume
  Hormone level
  …
Background
Why use spontaneous speech data?
  Previous results are mostly based on experimental data or read speech.
  Large-scale transcribed speech corpora make it possible to study patterns with “naturalistic” data (cf. Bell et al. 1999, Gahl in press, Raymond et al. 2006, etc.).
Background
Experimental data
  Controlled content
  Easy to investigate individual factors
  Hard to see the general pattern of variation
  Not necessarily natural speech
Spontaneous data
  Uncontrolled content
  Need to statistically control for irrelevant factors
  Provides a general picture of variation
  More naturalistic; includes factors such as disfluency
Background
Purpose of this study
  To investigate some of the factors that have been shown to affect VOT in experiments, as well as those that have been proposed to influence spontaneous speech production
Main statistical tool
  Linear regression
  Adding variables step by step
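The step-by-step procedure can be sketched as follows: fit an ordinary least squares model, add one predictor at a time, and record R^2 after each step. This is an illustration only. The helper names (solve, r_squared, stepwise) and the toy numbers are invented here, and the study's actual software and variable-selection rule are not stated in the slides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y. Collinear columns will fail here.
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(X[i][j] * beta[j] for j in range(k))) ** 2
                 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def stepwise(named_columns, y):
    """Add predictors in the given order; return (name, model R^2) pairs."""
    used, steps = [], []
    for name, col in named_columns:
        used.append(col)
        steps.append((name, r_squared(used, y)))
    return steps


# Toy data (invented): VOT in ms, local speech rate in syll/s, log frequency.
vot = [62.0, 55.0, 41.0, 36.0, 30.0, 50.0]
rate = [3.0, 4.0, 5.0, 6.0, 7.0, 4.5]
freq = [2.0, 1.0, 4.0, 3.0, 5.0, 2.0]
for name, r2 in stepwise([("rate", rate), ("freq", freq)], vot):
    print(name, round(100 * r2, 1))
```

Plain R^2 can never decrease when a predictor is added to a nested OLS model, so any drop reported between steps would have to come from a different statistic (for example an adjusted R^2); the slides do not say which one is used.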
Data
Buckeye corpus (Pitt et al. 2005)
  40 speakers
  All residents of Columbus, Ohio
  Balanced in age and gender
  1-hr interview
  Transcribed at word and phone level
  19 speakers’ transcriptions were available at the time of this study
Data
Data from 2 speakers are used for this study
  F07: older, female, low speaking rate (4.022 syllables/sec)
  M08: younger, male, high speaking rate (6.434 syllables/sec)
Target tokens
  Word-initial transcribed voiceless stops (i.e., [p], [t], [k])
Data
Finding point of burst
  An automatic algorithm is used first (cf. Yao 2007).
  >70% of the tokens are checked manually. Error < 3.5 ms.
  Some tokens are rejected by the algorithm for not having a significant burst point.
Number of tokens
                                          F07   M08
  Target tokens                           231   618
  Target tokens with burst point found    210   466
VOT by speaker
F07: Mean = 57.41 ms, SD = 26.00 ms
M08: Mean = 34.86 ms, SD = 19.82 ms
Preliminary analysis: POA
[Figure: VOT by POA ([p], [t], [k]), one panel each for F07 and M08]
Preliminary analysis: Word class
Split the data set into three subsets
  Content words
  Function words
  Other (e.g. proper names)
Number of words of different classes
         Content   Function   Other
  F07    155       47         8
  M08    346       104        16
Preliminary analysis: Word class
[Figure: VOT by word class (function, content, other), one panel each for F07 and M08]
Word class distinction or general effect of frequency?
Preliminary analysis: word frequency
Two frequency measures
  Log of Celex frequency
  Log of Buckeye frequency (speaker-specific)
  The two measures are highly correlated (r = 0.826)
Effect: more frequent words have shorter VOT
Frequency effect
         Celex frequency        Buckeye frequency
         p         R^2 (%)      p         R^2 (%)
  F07    <0.001    5.1          <0.001    4.8
  M08    <0.001    4.9          <0.001    5.9
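A minimal sketch of how two log-frequency measures can be compared with a Pearson correlation. The counts below are invented, and the use of the natural logarithm on raw positive counts is an assumption; the slides do not state the log base or any smoothing.

```python
import math


def log_freq(count):
    """Natural log of a positive raw count (log base is an assumption)."""
    return math.log(count)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts for the same four words in two sources.
celex_counts = [12000, 350, 40, 5]
buckeye_counts = [90, 12, 3, 1]
r = pearson_r([log_freq(c) for c in celex_counts],
              [log_freq(c) for c in buckeye_counts])
print(round(r, 3))
```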
Word class vs. frequency
After factoring out the effect of word class, frequency is no longer significant in F07’s data (p = 0.277), but it remains significant in M08’s data (p = 0.003).
This suggests that the frequency effect found for F07 is mainly due to word class. In other words, the effect of word class has to be factored out before the effect of frequency can be studied.
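One way to illustrate "factoring out" word class is to subtract each class's mean from both VOT and frequency and then correlate the residuals. This is a sketch under that assumption, with invented toy data; the slides do not say which procedure the study used, and no significance test is computed here.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residualize(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def partial_r(xs, ys, groups):
    """Correlation of xs and ys after removing group means from both."""
    return pearson_r(residualize(xs, groups), residualize(ys, groups))


# Toy data: function words are frequent with short VOT, content words the
# reverse, and there is no frequency trend inside either class.
word_class = ["function"] * 4 + ["content"] * 4
log_freq = [8.0, 9.0, 8.0, 9.0, 3.0, 4.0, 3.0, 4.0]
vot_ms = [30.0, 30.0, 32.0, 32.0, 60.0, 60.0, 62.0, 62.0]

raw = pearson_r(log_freq, vot_ms)
controlled = partial_r(log_freq, vot_ms, word_class)
print(round(raw, 3), round(controlled, 3))
```

In this toy case the strong raw correlation disappears once class means are removed, which is the pattern described for F07.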
Linear regression model
We decide to model only the variation in the content word set
  F07: 155 tokens
  M08: 346 tokens
Factors investigated
  POA
  Word frequency
  Phonetic context
  Speech rate
  Utterance position
Regression: POA
The canonical pattern [p] < [t] < [k] appears only in M08’s data, not in F07’s.
                  F07      M08
  p               0.216    <0.001
  R-squared (%)   0        9.2
Regression: word frequency
In both speakers’ data, more frequent words tend to have shorter VOT, but neither trend reaches significance (all p > 0.05).
For both speakers, the Buckeye frequency measure performs slightly better than the Celex frequency measure.
Regression: word frequency
                                      p        R^2 (%)   R^2 change of the model (%)
F07
  Log Celex freq                      0.391    0.2       0 → 1.3
  Buckeye freq (speaker-specific)     0.577    0         0 → 1.7
M08
  Log Celex freq                      0.169    0.3       9.2 → 9.4
  Buckeye freq (speaker-specific)     0.067    0.7       9.2 → 9.6
Regression: phonetic context
Two measures
  Category of the previous phone
    Coded as C(onsonant), V(owel), O(ther sound), or N(on-linguistic)
  Category of the next phone
    Coded as C(onsonant), V(owel), O(ther sound), or N(on-linguistic)
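A four-level categorical predictor of this kind has to be turned into numeric columns before it can enter a linear regression. A minimal sketch using 0/1 indicator (dummy) coding; the choice of C as the reference level is an assumption made here, not something stated in the slides.

```python
CATEGORIES = ("C", "V", "O", "N")  # consonant, vowel, other sound, non-linguistic


def dummy_code(category, reference="C"):
    """Return 0/1 indicators for every level except the reference level."""
    if category not in CATEGORIES:
        raise ValueError("unknown phone category: %r" % (category,))
    return [1 if category == level else 0
            for level in CATEGORIES if level != reference]


# With C as the reference, the indicator columns are ordered V, O, N.
for cat in CATEGORIES:
    print(cat, dummy_code(cat))
```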
Regression: Phonetic context
                              p        R^2 (%)   R^2 change of the model (%)
F07
  Next phone category         0.563    0.4       1.7 → 0.6
  Previous phone category     0.141    0.7       1.7 → 1.9
M08
  Next phone category         0.036    0.9       9.6 → 10.08
  Previous phone category     0.127    0.4       9.6 → 9.27
Regression: phonetic context
[Figure: VOT by next phone category in M08; VOT by previous phone category in F07]
Regression: speech rate
Three speed measures
  Duration of the next phone, in ms
  Average speed of a 3-word stretch centered on the target word, in syll/s
  Average speed of the pause-bounded stretch that contains the target word, in syll/s
All three measures show the same trend: words in faster speech tend to have shorter VOT
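The 3-word measure can be sketched as the number of syllables in the window divided by the window's duration. The tuple layout (start, end, syllable count) and the truncation of the window at the edges of the list are assumptions for illustration; the slides do not specify how edge cases or pauses inside the window are handled.

```python
def local_rate(words, i):
    """Syllables per second over the 3-word window centered on words[i].

    words: list of (start_s, end_s, n_syllables) tuples in temporal order.
    At the edges of the list the window is truncated to the words available.
    """
    window = words[max(0, i - 1): i + 2]
    n_syll = sum(w[2] for w in window)
    duration = window[-1][1] - window[0][0]
    return n_syll / duration


# Hypothetical stretch of three words covering exactly one second.
words = [(0.0, 0.25, 1), (0.25, 0.75, 2), (0.75, 1.0, 1)]
print(local_rate(words, 1))
```

The pause-bounded measure would follow the same pattern, with the window running from the preceding pause to the following pause instead of one word on each side.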
Regression: speech rate
                                    p         R^2 (%)   R^2 change of the model (%)
F07
  Average of the 3-wd stretch       <0.001    10.93     1.9 → 11.8
  Duration of next phone            0.014     3.2       1.9 → 5.1
  Average of the local stretch      <0.001    6         1.9 → 7.1
M08
  Average of the 3-wd stretch       <0.001    4.1       10.08 → 12.85
  Duration of next phone            0.342     0         10.08 → 16.62
  Average of the local stretch      0.014     1.4       10.08 → 15.07
Regression: utterance position
Utterance-final lengthening has been documented in the literature extensively.
We code tokens for whether they are followed by silence.
Number of tokens
               F07    M08
  Non-final    146    312
  Final        9      34
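The coding rule ("followed by silence") can be sketched as a check on the next label in the transcription. The silence label "SIL" and the flat list of labels are hypothetical; the corpus's actual label set and file layout should be checked before reusing this.

```python
SILENCE = "SIL"  # hypothetical silence label; the corpus's labels may differ


def is_utterance_final(labels, i):
    """True if the item at index i is immediately followed by silence.

    The last item in the list has no following label and is coded as
    non-final here, which is an assumption.
    """
    return i + 1 < len(labels) and labels[i + 1] == SILENCE


labels = ["the", "cat", SILENCE, "ten"]
print([is_utterance_final(labels, i) for i in range(len(labels))])
```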
Regression: utterance position
[Figure: VOT by utterance position (non-final vs. final), one panel each for F07 and M08]
Regression: utterance position
                          p        R^2 (%)   R^2 change of the model (%)
F07
  Utterance position      0.021    2.8       11.8 → 19.11
M08
  Utterance position      0.652    0.2       16.62 → 13.31
F07: utterance position contributes to the variation in VOT
M08: utterance position doesn’t contribute to the variation in VOT
Regression: complete model
Model performance
F07
  Variable added                           R^2 (%)
  POA                                      0
  Buckeye frequency                        1.7
  Previous phone category                  1.9
  Average speed of the 3-word stretch      11.8
  Utterance position                       19.11
M08
  Variable added                           R^2 (%)
  POA                                      9.2
  Buckeye frequency                        9.6
  Next phone category                      10.08
  Duration of the next phone               16.62
Regression: trends observed
POA
  [p] < [t] < [k]
Word class
  Function words < content words
Word frequency
  ?? Higher frequency → shorter VOT
Regression: trends observed
Phonetic category
  ?? Preceded by vowel → shorter VOT
  ?? Followed by vowel → longer VOT
Speaking rate
  Faster speech → shorter VOT
Utterance position
  Utterance-final → longer VOT
Regression: trends observed
Missing from the picture
  Contextual predictability
  Stress
  Disfluency
  Emotion
Discussion
Individual differences
  Factors
  Measurements
Other between-subject factors
  Age
  Gender
  Average speaking rate
Discussion
Relatively little variation is explained in the full model (19.11% in F07 and 16.62% in M08)
  Factors missing from the picture: contextual predictability, stress, disfluency, etc.
  Limitations of the linear regression model
    Non-linear effects
    Non-homogeneous effects
    Mixture of categorical and continuous variables
Discussion
Echoing and challenging previous findings
  VOT and POA
    Canonical rule is observed in M08, but not in F07
  Word frequency effect
    Overshadowed by word class distinction
  Utterance-final lengthening
    Significant in F07, but not M08
Speaking style? Content words vs. function words? Speed measures?
Conclusion
Still a long way to go to model VOT variation in spontaneous speech…
Thanks! Any comments are welcome!
Thanks to
  Anonymous subjects
  Contributors to the Buckeye corpus
  Prof. Keith Johnson
  Members of the phonology lab at UC Berkeley
Selected references
Bell, A. et al. (1999). Forms of English function words: Effects of disfluencies, turn position, age and sex, and predictability. Proceedings of ICPhS-99.
Gahl, S. (in press). "Time" and "thyme" are not homophones: The effect of lemma frequency on word durations in a corpus of spontaneous speech. To appear in Language.
Pitt, M. et al. (2005). The Buckeye Corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45, 90-95.
Raymond et al. (2006). Word-internal /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors.
Yao, Y. (2007). Closure duration and VOT of word-initial voiceless plosives in English in spontaneous connected speech. UC Berkeley PhonLab report.