The Plan for Today
One announcement: On Thursday, we'll meet in the Tri-Faculty Computer Lab (SS 018) Section 1.
We'll be working on intonation transcription.
1. Automatic Pitch Tracking
2. (Brief) suprasegmentals review
3. The basics of English intonation
Slide 4
The Thin Blue Line
Praat can give us a representation of speech in which a blue line represents the fundamental frequency (F0) of the speaker's voice. This is also known as a pitch track.
How can we automatically track F0 in a sample of speech?
Slide 5
Pitch Tracking
Voicing: air flows through the vocal folds, which open and close rapidly due to the Bernoulli effect.
Each cycle sends an acoustic shockwave through the vocal tract, which takes the form of a complex wave.
The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.
Slide 6
Voicing Bars
Slide 7
Individual glottal pulses
Slide 8
Voicing = Complex Wave
Note: voicing is not perfectly periodic. There is always some random variation from one cycle to the next.
How can we measure the fundamental frequency of a complex wave?
Slide 9
The basic idea: figure out the period between successive cycles
of the complex wave. Fundamental frequency = 1 / period duration =
???
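As a rough illustration of "frequency = 1 / period", here is a minimal sketch that turns the times of successive glottal pulses into one F0 value per cycle. The pulse times are made up for the example (roughly 10 ms apart, with a little cycle-to-cycle variation), and the function name is hypothetical.

```python
def f0_from_pulse_times(pulse_times):
    """Return one F0 estimate (Hz) per cycle, given glottal pulse times in seconds."""
    # Each period is the time between one pulse and the next.
    periods = [later - earlier
               for earlier, later in zip(pulse_times, pulse_times[1:])]
    # Fundamental frequency = 1 / period duration.
    return [1.0 / period for period in periods]


# Hypothetical pulse times (not measured data): about 10 ms apart.
pulses = [0.0000, 0.0100, 0.0201, 0.0299]
f0_values = f0_from_pulse_times(pulses)
print([round(f0, 1) for f0 in f0_values])
```

Periods of about 10 ms come out at about 100 Hz, and the small differences between cycles show up as small differences in F0.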
Slide 10
Measuring F0
To figure out where one cycle ends and the next begins, the basic idea is to find how well successive chunks of a waveform match up with each other.
One period = the length of the chunk that matches up best with the next chunk.
Automatic pitch tracking parameters to think about:
1. Window size (i.e., chunk size)
2. Step size
3. Frequency range (= period range)
Slide 11
Window (Chunk) Size
Here's an example of a small window
Slide 12
Window (Chunk) Size
Here's an example of a large(r) window
Slide 13
Initial window of the waveform is compared to another window
(of the same size) at a later point in the waveform
Slide 14
Matching The waveforms in the two windows are compared to see
how well they match up. Correlation = measure of how well the two
windows match ???
Slide 15
Autocorrelation The measure of correlation = Sum of the
point-by-point products of the two chunks. The technical name for
this is autocorrelation because two parts of the same wave are
being matched up against each other.
Slide 16
Autocorrelation Example
Ex: consider window x, with n samples. What's its correlation with window y? (Note: window y must also have n samples.)
$x_1$ = first sample of window x
$x_2$ = second sample of window x
$x_n$ = nth (final) sample of window x
$y_1$ = first sample of window y, etc.
Correlation: $R = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$
The larger R is, the better the correlation.
Slide 17
By the Numbers
Sample        1      2      3      4      5      6
x           0.8    0.3   -0.2   -0.5    0.4    0.8
y          -0.3   -0.1    0.1    0.3    0.1   -0.1
product   -0.24  -0.03  -0.02  -0.15   0.04  -0.08
Sum of products = -0.48
These two chunks are poorly correlated with each other.
Slide 18
By the Numbers, part 2
Sample        1      2      3      4      5      6
x           0.8    0.3   -0.2   -0.5    0.4    0.8
y           0.7    0.4   -0.1   -0.4    0.1    0.4
product    0.56   0.12   0.02   0.20   0.04   0.32
Sum of products = 1.26
These two chunks are well correlated with each other (or at least better than the previous pair).
Note: matching peaks count for more than matches close to 0.
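The two "By the Numbers" slides can be checked with a few lines of code. This is a minimal sketch of the sum-of-products measure, using the sample values from the slides; the function and variable names are made up for the example.

```python
def correlation(x, y):
    """Sum of the point-by-point products of two equal-length chunks."""
    if len(x) != len(y):
        raise ValueError("both windows must have the same number of samples")
    return sum(a * b for a, b in zip(x, y))


# Sample values copied from the two slides.
x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y_poor = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]   # first slide
y_good = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]    # part 2

r_poor = correlation(x, y_poor)
r_good = correlation(x, y_good)
print(round(r_poor, 2), round(r_good, 2))  # -0.48 1.26
```

Most of the 1.26 comes from the first and last samples (0.56 and 0.32), which is what "matching peaks count for more" means in practice.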
Slide 19
Back to (Digital) Reality The waveforms in the two windows are
compared to see how well they match up. Correlation = measure of
how well the two windows match ??? These two windows are poorly
correlated
Slide 20
Next: the pitch tracking algorithm moves further down the
waveform and grabs a new window
Slide 21
The distance the algorithm moves forward in the waveform is called the step size.
Slide 22
Matching, again The next window gets compared to the original.
???
Slide 23
Matching, again The next window gets compared to the original.
??? These two windows are also poorly correlated
Slide 24
The algorithm keeps chugging, one step after another, and eventually...
Slide 25
Matching, again The best match is found. ??? These two windows
are highly correlated
Slide 26
The fundamental period can be determined by calculating the length of time between window 1 and window 2.
Slide 27
Mopping up
Frequency is 1 / period.
Q: How many possible periods does the algorithm need to check?
Frequency range (default in Praat: 75 to 600 Hz)
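One way to put a number on the question: in a digital recording, a candidate period is a whole number of samples, so the frequency range translates into a range of sample offsets. The sketch below assumes a 44,100 Hz sampling rate (an assumption for illustration; the slides do not specify one) and uses a hypothetical helper name.

```python
import math


def candidate_periods(sample_rate, f0_floor=75.0, f0_ceiling=600.0):
    """Return the shortest and longest candidate periods, in samples."""
    shortest = math.ceil(sample_rate / f0_ceiling)   # period of the highest F0
    longest = math.floor(sample_rate / f0_floor)     # period of the lowest F0
    return shortest, longest


SAMPLE_RATE = 44100  # assumed sampling rate, not taken from the slides
shortest, longest = candidate_periods(SAMPLE_RATE)
count = longest - shortest + 1
print(shortest, longest, count)  # 74 588 515
```

In time terms, 600 Hz corresponds to a period of about 1.7 ms and 75 Hz to about 13.3 ms. Narrowing the frequency range shrinks the number of periods that have to be checked.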
Slide 28
Moving on Another comparison window is selected and the whole
process starts over again.
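The whole procedure can be sketched in a few dozen lines. This is a bare-bones illustration of the steps described on the slides, not Praat's actual algorithm (which is considerably more refined). The test signal is synthetic, and the sampling rate, step size, and function names are assumptions made for the example. The window length follows the "three times the longest possible period" rule that the summary slide gives for Praat.

```python
import math


def correlation(x, y):
    """Sum of the point-by-point products of two equal-length chunks."""
    return sum(a * b for a, b in zip(x, y))


def estimate_f0(signal, start, sample_rate, f0_floor=75.0, f0_ceiling=600.0):
    """Estimate F0 (Hz) for the window that starts at sample index `start`."""
    shortest = math.ceil(sample_rate / f0_ceiling)   # shortest period to check
    longest = math.floor(sample_rate / f0_floor)     # longest period to check
    window = round(3 * sample_rate / f0_floor)       # three longest periods
    reference = signal[start:start + window]
    best_period, best_r = None, float("-inf")
    for period in range(shortest, longest + 1):
        # Grab a same-sized window further down the waveform and compare.
        later = signal[start + period:start + period + window]
        if len(later) < window:
            break  # ran out of waveform
        r = correlation(reference, later)
        if r > best_r:
            best_period, best_r = period, r
    return sample_rate / best_period if best_period else None


def pitch_track(signal, sample_rate, step_seconds=0.01,
                f0_floor=75.0, f0_ceiling=600.0):
    """Return one F0 estimate per step across the signal."""
    step = round(step_seconds * sample_rate)
    window = round(3 * sample_rate / f0_floor)
    longest = math.floor(sample_rate / f0_floor)
    track, start = [], 0
    while start + longest + window <= len(signal):
        track.append(estimate_f0(signal, start, sample_rate,
                                 f0_floor, f0_ceiling))
        start += step  # move on and start the whole process over
    return track


# Synthetic "voicing": a 100 Hz complex wave (fundamental plus one harmonic).
SAMPLE_RATE = 8000  # assumed for the example
signal = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
          for i in range(800)]
track = pitch_track(signal, SAMPLE_RATE)
print(track)
```

For this perfectly periodic signal the best match is always 80 samples away, so every step should report 100 Hz. Real speech is messier, which is where the problems discussed later come from.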
Slide 29
The algorithm ultimately spits out a pitch track. This one shows you the F0 value at each step.
[Pitch track figure, labeled with the words: "Uhm, I would like a flight to Seattle from Albuquerque"]
Thanks to Chilin Shih for making these materials available.
Slide 30
Pitch Tracking in Praat
Play with F0 range.
Create Pitch Object.
Also go To Manipulation > Pitch.
Also check out:
Slide 31
Summing Up
Pitch tracking uses three parameters:
1. Window size
   Ensures reliability.
   In Praat, the window size is always three times the longest possible period. E.g.: 3 × 1/75 = 0.04 sec.
2. Step size
   For temporal accuracy.
3. Frequency range
   Reduces computational load.
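To see how the parameters interact, here is a small sketch of the arithmetic. The window rule (three times the longest period) comes from the slide; the 16,000 Hz sampling rate, the 10 ms step, and the one-second recording are assumptions chosen only to make the numbers concrete.

```python
def window_duration(f0_floor):
    """Window length in seconds: three times the longest possible period."""
    return 3.0 / f0_floor


def frame_count(total_samples, sample_rate, f0_floor, step_seconds):
    """How many analysis windows fit into a recording, one per step."""
    window = round(window_duration(f0_floor) * sample_rate)
    step = round(step_seconds * sample_rate)
    if total_samples < window:
        return 0
    return (total_samples - window) // step + 1


SAMPLE_RATE = 16000      # assumed sampling rate
ONE_SECOND = SAMPLE_RATE  # samples in a hypothetical one-second recording

print(window_duration(75.0))                           # 3 x 1/75 = 0.04 s
print(frame_count(ONE_SECOND, SAMPLE_RATE, 75.0, 0.01))
```

Lowering the bottom of the frequency range makes the window longer (e.g., 3 × 1/50 = 0.06 sec), and shrinking the step size increases the number of F0 values in the pitch track.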
Slide 32
Deep Thought Questions
What might happen if:
The shortest period checked is longer than the fundamental period?
AND two fundamental periods fit inside a window?
Potential Problem #1: Pitch Halving
The pitch tracker thinks the fundamental period is twice as long as it is in reality.
It estimates F0 to be half of its actual value.
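Pitch halving is easy to provoke in a toy tracker. The sketch below is a simplified illustration, not Praat's algorithm: it tracks a synthetic 400 Hz tone twice, once with a 600 Hz ceiling and once with a 300 Hz ceiling. The sampling rate and function name are assumptions. When several periods match equally well, this sketch prefers the shortest one, which is a design choice made here so that the well-configured case comes out right.

```python
import math


def estimate_f0(signal, sample_rate, f0_floor, f0_ceiling):
    """Return the F0 (Hz) whose period gives the best sum-of-products match."""
    shortest = math.ceil(sample_rate / f0_ceiling)
    longest = math.floor(sample_rate / f0_floor)
    window = round(3 * sample_rate / f0_floor)
    reference = signal[:window]

    def score(period):
        later = signal[period:period + window]
        return sum(a * b for a, b in zip(reference, later))

    scores = {period: score(period) for period in range(shortest, longest + 1)}
    best = max(scores.values())
    # Among (near-)equally good matches, prefer the shortest period.
    tolerance = 1e-9 * abs(best)
    period = min(p for p, r in scores.items() if r >= best - tolerance)
    return sample_rate / period


SAMPLE_RATE = 8000  # assumed for the example
tone = [math.sin(2 * math.pi * 400 * i / SAMPLE_RATE) for i in range(400)]

ok = estimate_f0(tone, SAMPLE_RATE, f0_floor=150.0, f0_ceiling=600.0)
halved = estimate_f0(tone, SAMPLE_RATE, f0_floor=150.0, f0_ceiling=300.0)
print(ok, halved)  # 400.0 200.0
```

With the 300 Hz ceiling, the shortest period checked (27 samples) is longer than the true period (20 samples), so the best available match is two periods away and the estimate drops to 200 Hz.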
Slide 33
Pitch Halving pitch is halved Check out normal file in
Praat.
Slide 34
More Deep Thoughts
What might happen if:
The shortest period checked is less than half of the fundamental period?
AND the second half of the fundamental cycle is very similar to the first?
Potential Problem #2: Pitch Doubling
The pitch tracker thinks the fundamental period is half as long as it actually is.
It estimates the F0 to be twice as high as it is in reality.
Slide 35
Pitch Doubling pitch is doubled
Slide 36
Microperturbations
Another problem: Speech waveforms are partly shaped by the type of segment being produced.
Pitch tracking can become erratic at the juncture of two segments. In particular:
voiced to voiceless segments
sonorants to obstruents
These discontinuities in F0 are known as microperturbations.
Also: transitions between modal and creaky voicing tend to be problematic.
Slide 37
Back to Language
F0 is important because it can be used by languages to signal differences in meaning.
Note:
Acoustic = Fundamental Frequency
Perceptual = Pitch
Linguistic = Tone
Slide 38
A Typology
F0 is generally used in three different ways in language:
1. Tone languages (Chinese, Navajo, Igbo)
   Lexically determined tone on every syllable
   Syllable-based tone languages
2. Accentual languages (Japanese, Swedish)
   The location of an accent in a particular word is lexically marked.
   Word-based tone languages
3. Stress languages (English, Russian)
   It's complicated.
Slide 39
Mandarin Tone
Mandarin (Chinese) is a classic example of a tone language.
ma1: mother
ma2: hemp
ma3: horse
ma4: to scold
Slide 40
How to Transcribe Tone
Tones are defined by the pattern they make through a speaker's frequency range.
The frequency range is usually assumed to encompass five levels (1-5), although this can vary, depending on the language.
Levels run from 1 (lowest F0) to 5 (highest F0).
Slide 41
In Mandarin, tones span a frequency range of 1-5.
Each tone is denoted by its (numerical) path through the frequency range.
Each syllable can also be labeled with a tone number (e.g., ma1, ma2, ma3, ma4).
Slide 42
How to Transcribe Tone
Tone is relative, i.e., not absolute.
Each speaker has a unique frequency range. For example:
[Figure: levels 1-5 from lowest F0 to highest F0; female speaker 150 Hz to 350 Hz, male speaker 100 Hz to 200 Hz]
Slide 43
General Relativity
In ordinary conversation, for European languages (Fant, 1956):
Men have an average F0 of 120 Hz, with a range of 50-250 Hz.
Women have an average F0 of 220 Hz, with a range of 120-480 Hz.
Children have an average F0 of 330 Hz.
In a normal utterance, the F0 range is usually one octave, i.e., highest F0 = 2 * lowest F0.
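Since an octave is a doubling of frequency, the size of any F0 range in octaves is the base-2 logarithm of the ratio between its highest and lowest values. A quick sketch, with a hypothetical function name and the ranges quoted above as inputs:

```python
import math


def range_in_octaves(lowest_hz, highest_hz):
    """Size of an F0 range in octaves (one octave = a doubling of frequency)."""
    return math.log2(highest_hz / lowest_hz)


print(range_in_octaves(100, 200))   # one octave: highest F0 = 2 * lowest F0
print(range_in_octaves(120, 480))   # the overall range quoted for women
print(range_in_octaves(50, 250))    # the overall range quoted for men
```

The overall ranges quoted above span two octaves or more, so the one-octave figure describes a single utterance, not everything a speaker can do.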
Slide 44
Relativity, in Reality
The same tones may be denoted by completely different frequencies, depending on the speaker.
Tone is an abstract linguistic unit.
[Figure: ma, tone 1 (55), as produced by a female speaker and by a male speaker]
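To make the relativity concrete, the sketch below converts tone-level numbers into rough Hz targets for two speakers. Several things here are assumptions for illustration: the speaker ranges are hypothetical, the levels are spaced evenly on a logarithmic scale (one plausible choice, not something the slides specify), and the contours for tones 2-4 (35, 214, 51) are the values conventionally cited for Mandarin; only tone 1 (55) appears on the slide.

```python
def level_to_hz(level, low_hz, high_hz, levels=5):
    """Map a tone level (1 = lowest, `levels` = highest) onto a speaker's range.

    Assumes levels are evenly spaced on a logarithmic frequency scale.
    """
    fraction = (level - 1) / (levels - 1)
    return low_hz * (high_hz / low_hz) ** fraction


def contour_to_hz(contour, low_hz, high_hz):
    """Convert a contour such as "55" or "214" into a list of Hz targets."""
    return [level_to_hz(int(digit), low_hz, high_hz) for digit in contour]


# Conventional contour labels; only "55" for tone 1 is given on the slide.
MANDARIN_TONES = {"ma1": "55", "ma2": "35", "ma3": "214", "ma4": "51"}

# Hypothetical speaker ranges, for illustration only.
SPEAKERS = {"female": (150.0, 350.0), "male": (100.0, 200.0)}

for name, (low, high) in SPEAKERS.items():
    targets = contour_to_hz(MANDARIN_TONES["ma1"], low, high)
    print(name, [round(hz) for hz in targets])
```

The same label, 55, lands at the top of each speaker's own range, which is a different frequency for each speaker.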
Slide 45
Accent Languages
In accent languages, there is only one pitch accent associated with each word.
The pitch accent is realized on only one syllable in the word.
The other syllables in the word can have no accent.
Accent is lexically determined, so there can be minimal pairs.
Japanese is a pitch accent language
for some, but not all, words
for some, but not all, dialects
Slide 46
Japanese
Japanese words have one High accent; it attaches to one mora in the word.
A mora = a vowel, or a consonant following a vowel, within a syllable.
For example: [ni] 'two' has one mora. [san] 'three' has two morae.
The first mora, if not accented, has a Low F0.
Morae following the accent have Low F0.
It's actually slightly more complicated than this; for more info, see:
http://sp.cis.iwate-u.ac.jp/sp/lesson/j/doc/accent.html
Slide 47
Japanese Examples
asa 'morning': H-L
asa 'hemp': L-H
Slide 48
'chopsticks': H-L-L
'bridge': L-H-L
'edge': L-H-H
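The simplified rules from the Japanese slide are enough to generate the patterns in these examples. The sketch below assumes that morae between the first mora and the accent are High, and that a word with no accent stays High after its first mora; both are inferred from the examples rather than stated on the slides. The function name and the accent positions are supplied for illustration.

```python
def pitch_pattern(mora_count, accent=None):
    """Return an H/L pattern for a word with `mora_count` morae.

    `accent` is the 1-based position of the accented mora, or None if the
    word has no accent.
    """
    tones = []
    for position in range(1, mora_count + 1):
        if accent is not None and position > accent:
            tones.append("L")   # morae following the accent are Low
        elif position == 1 and accent != 1:
            tones.append("L")   # first mora is Low unless it is accented
        else:
            tones.append("H")
    return "-".join(tones)


print(pitch_pattern(2, accent=1))   # H-L,   like asa 'morning'
print(pitch_pattern(2, accent=2))   # L-H,   like asa 'hemp'
print(pitch_pattern(3, accent=1))   # H-L-L, like 'chopsticks'
print(pitch_pattern(3, accent=2))   # L-H-L, like 'bridge'
print(pitch_pattern(3))             # L-H-H, like 'edge'
```

The three-mora examples differ only in where the accent sits (or whether there is one at all), which is what makes them minimal pairs.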
Slide 49
Stress Languages
Stress is a suprasegmental property that applies to whole syllables.
It is defined by more than just differences in F0.
Stressed syllables are higher in pitch (usually).
Stressed syllables are longer (usually).
Stressed syllables are louder (usually).
Stressed syllables reflect more phonetic effort: more aspiration and less coarticulation in stressed syllables.
Vowels often reduce to schwa in unstressed syllables.
The combination of these factors gives stressed syllables more prominence than unstressed syllables.
Slide 50
Stress: Pitch (N) (V) Complicating factor: pitch tends to drift
downwards at the end of utterances
Slide 51
Intonation
Languages superimpose pitch contours on top of word-based stress or tone distinctions. This is called intonation.
It turns out that English has word-based stress and phrase-based pitch accents (intonation).
The pitch accents are pragmatically specified, rather than lexically specified, i.e., they change according to discourse context.
Slide 52
English Intonation
We'll analyze English intonation with a framework called TOBI: Tones and Break Indices.
Note: intonational patterns vary across dialects. The patterns and examples presented today might not match up with your own intonational system.
Also: this framework has only been applied to a few (primarily western) languages.
There's more info at http://www.ling.ohio-state.edu/~tobi/
Course in Phonetics, pp. 124-128
Slide 53
Levels of Prominence
In English, pitch accents align with stressed syllables.
Example: exploitation
                ex    ploi   ta    tion
vowel           X     X      X     X
full vowel      X     X      X
stress          X            X
pitch accent                 X
Normally, the accent falls on the last stressed syllable.
Slide 54
Pitch Accent Types
In English, pitch accents can be either high or low: H* or L*.
Examples:
High (H*)         Low (L*)
Yes.              Yes?
Magnification.    Magnification?
As with tones in tone languages, high and low pitch accents are defined relative to a speaker's pitch range.
My pitch range: H* = 155 Hz, L* = 100 Hz
Mary Beckman: H* = 260 Hz, L* = 130 Hz
Slide 55
Whole Utterances
The same pitch pattern can apply to an entire sentence:
H* H*: Manny came with Anna.
L* L*: Manny came with Anna?
H* H*: Marianna made the marmalade.
L* L*: Marianna made the marmalade?
Slide 56
Information
Note that there's a tendency to accent new information in the discourse.
4 different patterns for 4 different contexts:
H* H*: Manny came with Anna.
H* H*: Manny came with Anna.
L* L*: Manny came with Anna?
L* L*: Manny came with Anna?
Slide 57
Pitch Tracking
H* is usually associated with a peak in F0; L* is usually associated with a valley (trough) in F0.
Pitch tracking can help with the identification of pitch peaks and valleys.
Note: it's easier to analyze utterances with lots of sonorants.
Check out both productions of "Manny came with Anna" in Praat.
Note that there is more to the intonation contour than just pitch peaks and valleys:
The H* is followed by a falling pitch pattern.
The L* is followed by a rising pitch pattern.
Slide 58
Tone Transcription L* H%
Slide 59
Tone Types
There are two types of tones at play:
1. Pitch Accents
   associated with a stressed syllable
   may be either High (H) or Low (L)
   marked with a *
2. Boundary Tones
   appear at the end of a phrase
   not associated with a particular syllable
   may be either High (H) or Low (L)
   marked with a %
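The labeling scheme above is regular enough to decode mechanically: the letter gives the level and the trailing symbol gives the tone type. A minimal sketch, covering only the four labels described on this slide (the full framework has more), with a hypothetical function name:

```python
LEVELS = {"H": "High", "L": "Low"}
TYPES = {"*": "pitch accent", "%": "boundary tone"}


def describe_tone(label):
    """Decode a two-character tone label such as "H*" or "L%"."""
    if len(label) != 2 or label[0] not in LEVELS or label[1] not in TYPES:
        raise ValueError(f"unrecognized tone label: {label!r}")
    return TYPES[label[1]], LEVELS[label[0]]


for label in ["L*", "H%"]:
    tone_type, level = describe_tone(label)
    print(label, "=", level, tone_type)
```

So a transcription like L* H% reads as a Low pitch accent on a stressed syllable, followed by a High boundary tone at the end of the phrase.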
Slide 60
Phrases
Intonation organizes utterances into phrases ("chunks").
Boundary tones mark the end of intonational phrases.
Intonational phrases are the largest phrases.
In the transcription of intonation, phrase boundaries are marked with Break Indices.
Hence, TOBI: Tones and Break Indices.
Break Indices are denoted by numbers:
1 = break between words
4 = break between intonational phrases
Slide 61
Break Index Transcription
Tones: L* H%
Breaks: 1 1 1 4