
Pitch Tracking + Prosody

Date post: 24-Feb-2016
Upload: karah
Pitch Tracking + Prosody January 20, 2009
Transcript

Announcements, etc.

The Plan for Today
- One announcement: on Thursday, we'll meet in the Tri-Faculty Computer Lab (SS 018).
  - Section 1
  - We'll be working on intonation transcription.
- Automatic pitch tracking
- (Brief) suprasegmentals review
- The basics of English intonation

The Thin Blue Line
Praat can give us a representation of speech that looks like: (figure: Praat display with a blue pitch line)
- The blue line represents the fundamental frequency (F0) of the speaker's voice.
- This is also known as a pitch track.
- How can we automatically track F0 in a sample of speech?

Pitch Tracking
- Voicing:
  - Air flow through the vocal folds
  - Rapid opening and closing due to the Bernoulli effect
- Each cycle sends an acoustic shockwave through the vocal tract, which takes the form of a complex wave.
- The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.

Voicing Bars
(Figure label: individual glottal pulses)

Voicing = Complex Wave
- Note: voicing is not perfectly periodic; there is always some random variation from one cycle to the next.
- How can we measure the fundamental frequency of a complex wave?
- The basic idea: figure out the period between successive cycles of the complex wave.
- Fundamental frequency = 1 / period
(Figure label: duration = ???)

Measuring F0
- To figure out where one cycle ends and the next begins, the basic idea is to find how well successive chunks of a waveform match up with each other.
- One period = the length of the chunk that matches up best with the next chunk.
- Automatic pitch tracking parameters to think about:
  - Window size (i.e., chunk size)
  - Step size
  - Frequency range (= period range)

Window (Chunk) Size

Here's an example of a small window.

Window (Chunk) Size
Here's an example of a large(r) window.

An initial window of the waveform is compared to another window (of the same size) at a later point in the waveform.

Matching
- The waveforms in the two windows are compared to see how well they match up.
- Correlation = measure of how well the two windows match

Autocorrelation
- The measure of correlation = the sum of the point-by-point products of the two chunks.
- The technical name for this is autocorrelation, because two parts of the same wave are being matched up against each other.

Autocorrelation Example
- Ex: consider window x, with n samples.
- What's its correlation with window y? (Note: window y must also have n samples.)
  - x1 = first sample of window x
  - x2 = second sample of window x
  - xn = nth (final) sample of window x
  - y1 = first sample of window y, etc.
- Correlation (R) = x1*y1 + x2*y2 + ... + xn*yn
- The larger R is, the better the correlation.

By the Numbers
Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y         -.3   -.1    .1    .3    .1   -.1
product  -.24  -.03  -.02  -.15   .04  -.08
Sum of products = -.48
These two chunks are poorly correlated with each other.

By the Numbers, part 2
Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y          .7    .4   -.1   -.4    .1    .4
product   .56   .12   .02   .2    .04   .32
Sum of products = 1.26
These two chunks are well correlated with each other (or at least better than the previous pair).
Note: matching peaks count for more than matches close to 0.

Back to (Digital) Reality
- The waveforms in the two windows are compared to see how well they match up.
- Correlation = measure of how well the two windows match
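The sum-of-products calculation in the two "By the Numbers" tables can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `correlation` and the variable names are illustrative assumptions, and the sample values are the ones from the tables.

```python
def correlation(x, y):
    """Sum of the point-by-point products of two equal-length windows."""
    if len(x) != len(y):
        raise ValueError("both windows must have the same number of samples")
    return sum(xi * yi for xi, yi in zip(x, y))


# Window x and the two comparison windows from the tables.
x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y_poor = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]
y_good = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]

r_poor = correlation(x, y_poor)
r_good = correlation(x, y_good)

# Rounded to two decimals to hide floating-point noise.
print("poorly matched pair:", round(r_poor, 2))
print("well matched pair:  ", round(r_good, 2))
```

The larger products come from samples where both windows have large values of the same sign, which is why matching peaks count for more than matches close to 0.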

These two windows are poorly correlated.

Next: the pitch tracking algorithm moves further down the waveform and grabs a new window.

The distance the algorithm moves forward in the waveform is called the step size.
(Figure label: step)

Matching, again
- The next window gets compared to the original.
- These two windows are also poorly correlated.

The algorithm keeps chugging and, eventually...
(Figure label: another step)

Matching, again
- The best match is found.
- These two windows are highly correlated.

The fundamental period can be determined by calculating the length of time between window 1 and window 2.
(Figure label: period)

Mopping up
- Frequency is 1 / period.
- Q: How many possible periods does the algorithm need to check?
- Frequency range (default in Praat: 75 to 600 Hz)

Moving on
Another comparison window is selected and the whole process starts over again.
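The whole procedure can be sketched in Python as follows. This is a simplified illustration of the idea described in these slides, not Praat's actual algorithm. The function names, the 16 kHz sample rate, the 10 ms analysis step, and the synthetic 100 Hz test tone are all assumptions made for the example. The candidate window is shifted one sample at a time within the period range implied by the frequency range, and the analysis then restarts one step further along.

```python
import math


def correlation(x, y):
    """Sum of the point-by-point products of two equal-length windows."""
    return sum(a * b for a, b in zip(x, y))


def lag_range(sample_rate, f_min=75.0, f_max=600.0):
    """Convert a frequency range (Hz) into a range of periods in samples."""
    min_lag = math.ceil(sample_rate / f_max)   # shortest period checked
    max_lag = math.floor(sample_rate / f_min)  # longest period checked
    return min_lag, max_lag


def estimate_f0(samples, start, sample_rate, f_min=75.0, f_max=600.0):
    """Return the F0 (Hz) whose period best matches the window at `start`."""
    window_size = round(3 * sample_rate / f_min)  # three longest periods
    min_lag, max_lag = lag_range(sample_rate, f_min, f_max)
    reference = samples[start:start + window_size]
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        candidate = samples[start + lag:start + lag + window_size]
        r = correlation(reference, candidate)
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag  # frequency = 1 / period


def pitch_track(samples, sample_rate, step_seconds=0.01,
                f_min=75.0, f_max=600.0):
    """Return (time, F0) pairs, one per analysis step."""
    step = round(step_seconds * sample_rate)
    window_size = round(3 * sample_rate / f_min)
    _, max_lag = lag_range(sample_rate, f_min, f_max)
    track = []
    start = 0
    # Stop while the furthest candidate window still fits in the signal.
    while start + max_lag + window_size <= len(samples):
        f0 = estimate_f0(samples, start, sample_rate, f_min, f_max)
        track.append((start / sample_rate, f0))
        start += step
    return track


# Hypothetical test signal: 0.1 s of a steady 100 Hz sine wave.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(1600)]

min_lag, max_lag = lag_range(SAMPLE_RATE)
print("candidate periods to check:", max_lag - min_lag + 1)

track = pitch_track(tone, SAMPLE_RATE)
for time, f0 in track:
    print(f"{time:.2f} s  {f0:.1f} Hz")
```

Narrowing the frequency range shrinks the number of candidate periods, which is where the saving in computation comes from. A real tracker also has to decide which stretches are voiced at all, which this sketch does not attempt.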

(Figure: pitch track of the utterance "Uhm, I would like a flight to Seattle from Albuquerque")
- The algorithm ultimately spits out a pitch track.
- This one shows you the F0 value at each step.
Thanks to Chilin Shih for making these materials available.

Pitch Tracking in Praat
- Play with F0 range.
- Create Pitch Object.
- Also go To Manipulation Pitch.
- Also check out:

Summing Up
Pitch tracking uses three parameters:
- Window size
  - Ensures reliability
  - In Praat, the window size is always three times the longest possible period.
  - E.g.: 3 x 1/75 = .04 sec.
- Step size
  - For temporal accuracy
- Frequency range
  - Reduces computational load

Deep Thought Questions
What might happen if:
- The shortest period checked is longer than the fundamental period?
- AND two fundamental periods fit inside a window?
Potential Problem #1: Pitch Halving
- The pitch tracker thinks the fundamental period is twice as long as it is in reality.
- It estimates F0 to be half of its actual value.

Pitch Halving
(Figure label: pitch is halved)
Check out normal file in Praat.
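Pitch halving can be reproduced with a toy example. The sketch below is an illustration under stated assumptions, not a description of Praat's internals: it uses a synthetic 200 Hz sine wave at a 16 kHz sample rate, a window of 3 x 1/75 s, and a hypothetical helper `best_f0` that picks the period with the largest sum of products. When the frequency ceiling is set to 150 Hz, the true period (1/200 s) is shorter than the shortest period checked, so the best available match is two periods long.

```python
import math

SAMPLE_RATE = 16000
WINDOW = round(3 * SAMPLE_RATE / 75)  # 3 x 1/75 s = 0.04 s = 640 samples


def best_f0(samples, f_min, f_max):
    """Pick the period (in samples) with the largest sum of products."""
    min_lag = math.ceil(SAMPLE_RATE / f_max)   # shortest period checked
    max_lag = math.floor(SAMPLE_RATE / f_min)  # longest period checked
    reference = samples[:WINDOW]
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        candidate = samples[lag:lag + WINDOW]
        scores[lag] = sum(a * b for a, b in zip(reference, candidate))
    best_lag = max(scores, key=scores.get)
    return SAMPLE_RATE / best_lag


# Synthetic voice with a true F0 of 200 Hz (period = 80 samples).
tone = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(1600)]

# Ceiling below the true F0: the 80-sample period is never checked,
# so the tracker settles on two periods (160 samples).
halved = best_f0(tone, f_min=75, f_max=150)

# A range that includes the true period and excludes its double.
correct = best_f0(tone, f_min=120, f_max=600)

print(f"range 75-150 Hz:  {halved:.1f} Hz")
print(f"range 120-600 Hz: {correct:.1f} Hz")
```

For a perfectly periodic signal like this one, a lag of two periods matches exactly as well as a lag of one, so the second range is chosen to contain only the true period.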

More Deep Thoughts
What might happen if:
- The shortest period checked is less than half of the fundamental period?
- AND the second half of the fundamental cycle is very similar to the first?
Potential Problem #2: Pitch Doubling
- The pitch tracker thinks the fundamental period is half as long as it actually is.
- It estimates the F0 to be twice as high as it is in reality.

Pitch Doubling
(Figure label: pitch is doubled)
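Pitch doubling is harder to provoke with a clean signal, so the following toy example makes several assumptions: a synthetic 100 Hz wave whose second harmonic is much stronger than its fundamental (so the second half of each cycle closely resembles the first), a 16 kHz sample rate, and a slight amplitude decay that stands in for the cycle-to-cycle variation of real voicing. The helper `best_f0` is hypothetical and simply picks the period with the largest sum of products.

```python
import math

SAMPLE_RATE = 16000
WINDOW = round(3 * SAMPLE_RATE / 75)  # 0.04 s = 640 samples


def best_f0(samples, f_min=75, f_max=600):
    """Pick the period (in samples) with the largest sum of products."""
    min_lag = math.ceil(SAMPLE_RATE / f_max)   # shortest period checked
    max_lag = math.floor(SAMPLE_RATE / f_min)  # longest period checked
    reference = samples[:WINDOW]
    scores = {}
    for lag in range(min_lag, max_lag + 1):
        candidate = samples[lag:lag + WINDOW]
        scores[lag] = sum(a * b for a, b in zip(reference, candidate))
    return SAMPLE_RATE / max(scores, key=scores.get)


def voice(decay_per_second):
    """100 Hz wave: weak fundamental, strong second harmonic."""
    samples = []
    for i in range(1600):
        t = i / SAMPLE_RATE
        cycle = (0.1 * math.sin(2 * math.pi * 100 * t)
                 + math.sin(2 * math.pi * 200 * t))
        samples.append(math.exp(-decay_per_second * t) * cycle)
    return samples


# Perfectly steady signal: the full 10 ms period matches slightly better
# than the half period, so the estimate is the true F0.
steady_f0 = best_f0(voice(decay_per_second=0.0))

# Slightly decaying signal: nearer windows match better than later ones,
# which tips the near-tie toward the half period.
decaying_f0 = best_f0(voice(decay_per_second=20.0))

print(f"steady signal:   {steady_f0:.1f} Hz")
print(f"decaying signal: {decaying_f0:.1f} Hz")
```

The point of the contrast is that the half period and the full period score almost equally here, so a small irregularity is enough to make the tracker report roughly twice the true F0.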

Microperturbations
- Another problem: speech waveforms are partly shaped by the type of segment being produced.
- Pitch tracking can become erratic at the juncture of two segments.
- In particular:
  - voiced to voiceless segments
  - sonorants to obstruents
- These discontinuities in F0 are known as microperturbations.
- Also: transitions between modal and creaky voicing tend to be problematic.
Check out ef4 for transition from modal to creaky.

Back to Language
- F0 is important because it can be used by languages to signal differences in meaning.
- Note:
  - Acoustic = Fundamental Frequency
  - Perceptual = Pitch
  - Linguistic = Tone

A Typology
F0 is generally used in three different ways in language:
1. Tone languages (Chinese, Navajo, Igbo)
   - Lexically determined tone on every syllable
   - Syllable-based tone languages
2. Accentual languages (Japanese, Swedish)
   - The location of an accent in a particular word is lexically marked.
   - Word-based tone languages
3. Stress languages (English, Russian)
   - It's complicated.

Mandarin Tone

ma1: mother

ma2: hemp

ma3: horse

ma4: to scold

Mandarin (Chinese) is a classic example of a tone language.

How to Transcribe Tone
- Tones are defined by the pattern they make through a speaker's frequency range.
- The frequency range is usually assumed to encompass five levels (1-5), although this can vary, depending on the language.
(Figure: levels 1 to 5, with 5 the highest F0 and 1 the lowest F0)

- In Mandarin, tones span a frequency range of 1-5.
- Each tone is denoted by its (numerical) path through the frequency range.
- Each syllable can also be labeled with a tone number (e.g., ma1, ma2, ma3, ma4).
(Figure labels: Tone 1, 2, 3, 4)

How to Transcribe Tone
- Tone is relative, i.e., not absolute.
- Each speaker has a unique frequency range. For example:
(Figure: levels 1 to 5 between Lowest F0 and Highest F0 for a female and a male speaker; frequency labels: 100 Hz, 200 Hz, 350 Hz, 150 Hz)

General Relativity
In ordinary conversation, for European languages (Fant, 1956):
- Men have an average F0 of 120 Hz
  - A range of 50-250 Hz
- Women have an average F0 of 220 Hz
  - A range of 120-480 Hz
- Children have an average F0 of 330 Hz
- In a normal utterance, the F0 range is usually one octave, i.e., highest F0 = 2 * lowest F0.

Relativity, in Reality
- The same tones may be denoted by completely different frequencies, depending on the speaker.
- Tone is an abstract linguistic unit.
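The relativity of tone can be illustrated with a small function that places an F0 value on the five-level scale for a given speaker. This is a sketch under assumptions that are not in the slides: the two speaker ranges are hypothetical one-octave ranges, and the levels are spaced evenly on a logarithmic scale and rounded to the nearest level.

```python
import math


def tone_level(f0_hz, lowest_hz, highest_hz, levels=5):
    """Place an F0 value on a 1..levels scale within a speaker's range."""
    if not 0 < lowest_hz < highest_hz:
        raise ValueError("need 0 < lowest_hz < highest_hz")
    # Position in the range, 0.0 (lowest) to 1.0 (highest), on a log scale.
    position = math.log(f0_hz / lowest_hz) / math.log(highest_hz / lowest_hz)
    position = min(max(position, 0.0), 1.0)  # clamp values outside the range
    return 1 + round(position * (levels - 1))


# Hypothetical speakers, each with a one-octave range.
speaker_a = (100.0, 200.0)  # lowest F0, highest F0 in Hz
speaker_b = (200.0, 400.0)

# The same 200 Hz is the top of one range and the bottom of the other.
level_a = tone_level(200.0, *speaker_a)
level_b = tone_level(200.0, *speaker_b)
print("200 Hz for speaker A: level", level_a)
print("200 Hz for speaker B: level", level_b)

# A high level tone (55) sits at level 5 for both, at different frequencies.
print("level of 200 Hz (A) and 400 Hz (B):",
      tone_level(200.0, *speaker_a), tone_level(400.0, *speaker_b))
```

The same frequency maps to different levels for different speakers, and the same level maps to different frequencies, which is the sense in which tone is an abstract unit rather than an absolute F0 value.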

(Figure: ma, tone 1 (55), for a female speaker and a male speaker)

Accent Languages
- In accent languages, there is only one pitch accent associated with each word.
- The pitch accent is realized on only one syllable in the word.
- The other syllables in the word can have no accent.
- Accent is lexically determined, so there can be minimal pairs.
- Japanese is a pitch accent language
  - for some, but not all, words
  - for some, but not all, dialects

Japanese
- Japanese words have one High accent.
  - It attaches to one mora in the word.
- A mora = a vowel, or a consonant following a vowel, within a syllable.
- For example:
  - [ni] 'two' has one mora.
  - [san] 'three' has two morae.
- The first mora, if not accented, has a Low F0.
- Morae following the accent have Low F0.
It's actually slightly more complicated than this; for more info, see: http://sp.cis.iwate-u.ac.jp/sp/lesson/j/doc/accent.html

Japanese Examples
- asa 'morning': H-L
- asa 'hemp': L-H
- 'chopsticks': H-L-L
- 'bridge': L-H-L
- 'edge': L-H-H
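The two rules above (a Low first mora unless it is accented, and Low morae after the accent) can be written as a short function. This sketch adds assumptions that the slides do not state: any remaining mora is treated as High, a word may be given no accent at all (`accent=None`), and the accent positions used in the examples are inferred from the H/L patterns listed above. The function name `accent_pattern` is illustrative.

```python
def accent_pattern(n_morae, accent=None):
    """Return an H/L pattern for a word with n_morae morae.

    accent is the 1-based position of the accented mora, or None
    for a word treated as having no accent (an assumption).
    """
    tones = []
    for position in range(1, n_morae + 1):
        if accent is not None and position == accent:
            tones.append("H")   # the accented mora is High
        elif accent is not None and position > accent:
            tones.append("L")   # morae following the accent are Low
        elif position == 1:
            tones.append("L")   # unaccented first mora is Low
        else:
            tones.append("H")   # assumption: remaining morae are High
    return "-".join(tones)


# Accent positions inferred from the patterns in the examples.
examples = [
    ("morning", 2, 1),
    ("hemp", 2, 2),
    ("chopsticks", 3, 1),
    ("bridge", 3, 2),
    ("edge", 3, None),
]
for gloss, n_morae, accent in examples:
    print(f"{gloss}: {accent_pattern(n_morae, accent)}")
```

Because the only thing that varies is where (or whether) the accent falls, a single lexical specification per word is enough to derive the whole pattern.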

Stress Languages
- Stress is a suprasegmental property that applies to whole syllables.
- It is defined by more than just differences in F0.
  - Stressed syllables are higher in pitch (usually)
  - Stressed syllables are longer (usually)
  - Stressed syllables are louder (usually)
- Stressed syllables reflect more phonetic effort.
  - More aspiration, less coarticulation in stressed syllables.
  - Vowels often reduce to schwa in unstressed syllables.
- The combination of these factors gives stressed syllables more prominence than unstressed syllables.

Stress: Pitch

(N)

(V)

Complicating factor: pitch tends to drift downwards at the end of utterances

Intonation
- Languages superimpose pitch contours on top of word-based stress or tone distinctions.
- This is called intonation.
- It turns out that English:
  - has word-based stress
  - and phrase-based pitch accents (intonation)
- The pitch accents are pragmatically specified, rather than lexically specified.
  - = they change according to discourse context.

English Intonation
- We'll analyze English intonation with a framework called TOBI
  - Tones and Break Indices
- Note: intonational patterns vary across dialects.
  - The patterns and examples presented today might not match up with your own intonational system.
- Also: this framework has only been applied to a few (primarily western) languages.
- There's more info at http://www.ling.ohio-state.edu/~tobi/
- Course in Phonetics, pp. 124-128

Levels of Prominence
- In English, pitch accents align with stressed syllables.
- Example: exploitation

vowel:         X X X X
full vowel:    X X X
stress:        X X
pitch accent:  X

- Normally, the accent falls on the last stressed syllable.

Pitch Accent Types
- In English, pitch accents can be either high or low: H* or L*
- Examples:

High (H*)         Low (L*)
Yes.              Yes?
Magnification.    Magnification?

- As with tones in tone languages, high and low pitch accents are defined relative to a speaker's pitch range.
  - My pitch range: H* = 155 Hz, L* = 100 Hz
  - Mary Beckman: H* = 260 Hz, L* = 130 Hz

Whole Utterances
The same pitch pattern can apply to an entire sentence:
- H* H*: Manny came with Anna.
- L* L*: Manny came with Anna?
- H* H*: Marianna made the marmalade.
- L* L*: Marianna made the marmalade?

Information
- Note that there's a tendency to accent new information in the discourse.
- 4 different patterns for 4 different contexts:
  - H* H*: Manny came with Anna.
  - H* H*: Manny came with Anna.
  - L* L*: Manny came with Anna?
  - L* L*: Manny came with Anna?

Pitch Tracking
- H* is usually associated with a peak in F0; L* is usually associated with a valley (trough) in F0.
- Pitch tracking can help with the identification of pitch peaks and valleys.
- Note: it's easier to analyze utterances with lots of sonorants.
- Check out both productions of "Manny came with Anna" in Praat.
- Note that there is more to the intonation contour than just pitch peaks and valleys.
  - The H* is followed by a falling pitch pattern.
  - The L* is followed by a rising pitch pattern.

Tone Transcription

L* H%

Tone Types
There are two types of tones at play:
- Pitch Accents
  - associated with a stressed syllable
  - may be either High (H) or Low (L)
  - marked with a *
- Boundary Tones
  - appear at the end of a phrase
  - not associated with a particular syllable
  - may be either High (H) or Low (L)
  - marked with a %

Phrases
- Intonation organizes utterances into phrases (chunks).
- Boundary tones mark the end of intonational phrases.
- Intonational phrases are the largest phrases.
- In the transcription of intonation, phrase boundaries are marked with Break Indices.
  - Hence, TOBI: Tones and Break Indices
- Break Indices are denoted by numbers:
  - 1 = break between words
  - 4 = break between intonational phrases

Break Index Transcription
Tones: L* H%
Breaks: 1 1 1 4
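The labelling conventions above are simple enough to decode mechanically. The sketch below is illustrative only: the names `describe_tone` and `BREAK_MEANINGS` are assumptions, and it covers just the labels introduced in these slides (H or L followed by * or %, and break indices 1 and 4).

```python
TONE_LEVELS = {"H": "High", "L": "Low"}
TONE_TYPES = {"*": "pitch accent", "%": "boundary tone"}
BREAK_MEANINGS = {
    1: "break between words",
    4: "break between intonational phrases",
}


def describe_tone(label):
    """Decode a two-character tone label such as 'L*' or 'H%'."""
    if (len(label) != 2 or label[0] not in TONE_LEVELS
            or label[1] not in TONE_TYPES):
        raise ValueError(f"unsupported tone label: {label!r}")
    return TONE_TYPES[label[1]], TONE_LEVELS[label[0]]


# The transcription from the slide above.
tones = "L* H%".split()
breaks = [1, 1, 1, 4]

for label in tones:
    tone_type, level = describe_tone(label)
    print(f"{label}: {level} {tone_type}")

for index in breaks:
    print(f"{index}: {BREAK_MEANINGS[index]}")

# Each break index of 4 closes one intonational phrase.
phrase_count = breaks.count(4)
print("intonational phrases:", phrase_count)
```

Keeping the tones and the break indices as two separate sequences mirrors the two tiers of the transcription.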

(Figure: pitch track; axes: F0 (Hz), 200 to 400, against Time (s), 1 to 4)