Modeling of Speech Signal for Analysis Purposes - or...

Modeling of Speech Signal for Analysis Purposes

Yannis Stylianou

Outline of the talk

Modeling

Synthesis: Jitter and Shimmer

Analysis: Jitter and Shimmer

Acknowledgments

References

Modeling of Speech Signal for Analysis Purposes

or Mathematical modeling of jitter and shimmer

Yannis Stylianou

University of Crete, Computer Science Dept., Multimedia Informatics Lab

Limsi, France, 2008 August 13th




Multimedia Informatics Lab

4 Professors:
1. T. Mouchtaris (Audio and Speech Processing)
2. Y. Stylianou (Speech and Signal Processing)
3. P. Tsakalides (Signal Processing and Sensor Networks)
4. G. Tziritas (Image and Video Processing)

3 postdocs, 8 Ph.D. students, and many M.Sc. students

Strong connections with a Research Center: FORTH.




My current research topics

Speech Processing

Voice Quality Assessment
Algorithms for Speech Pathology
Non-linear speech modeling and processing
Inverse Filtering
Voice Transformation
Multimodal User Identification

Music Signal Processing

Marine Mammals Acoustics




Multimedia Informatics Lab

Selected Recent Projects:

FP6-IST NoE SIMILAR: Human-computer interaction similar to the way humans do it.

FP6-IST Strep PISTE: Personalized, Immersive Sports TV Experience

FP6-Marie Curie TOK: Collaborative Signal Processing for Efficient Wireless Sensor Networks

GSRT Wireless Sensor Networks: Theory and Applications in Structural Health Monitoring

GSRT AKMON: Advanced Algorithms for Voice Quality Assessment

GSRT TV++: Multimedia processing for Broadcast News

Industrial Partners: France Telecom, British Telecom, FORTH-Net




1 Modeling

2 Synthesis: Jitter and Shimmer
    Definitions and Estimators
    Mathematical Modeling of Jitter
    Mathematical Modeling of Shimmer

3 Analysis: Jitter and Shimmer
    Time-Frequency Representations
    Time-Frequency Analysis
    Modeling Jitter and Shimmer

4 Acknowledgments

5 References




Modeling

Modeling for ...

Coding

Modifications

Synthesis and Analysis




Synthesis: Jitter and Shimmer




Defining Jitter and Shimmer

Definition (Jitter)

Jitter is defined as perturbations of the glottal source signal that occur during vowel phonation and affect the glottal pitch period.

Definition (Shimmer)

Shimmer is defined as perturbations of the glottal source signal that occur during vowel phonation and affect the glottal energy.







Some Estimators ...

Let $u[n]$ be a pitch period sequence of $N$ samples. Absolute jitter:

$$\frac{1}{N-1}\sum_{n=1}^{N-1}\bigl|u[n+1]-u[n]\bigr|$$

Let $u[n]$ be a peak amplitude sequence of $N$ samples. Absolute shimmer:

$$\frac{1}{N-1}\sum_{n=1}^{N-1}\bigl|u[n+1]-u[n]\bigr|$$
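Both estimators are the same mean absolute first difference, applied either to pitch periods or to peak amplitudes. A minimal sketch, assuming the per-cycle values have already been extracted; the function name `absolute_perturbation` and the numbers are hypothetical and only illustrate the formula:

```python
import numpy as np

def absolute_perturbation(u):
    """Mean absolute difference between consecutive entries of u.

    u holds one value per glottal cycle: pitch periods for absolute
    jitter, peak amplitudes for absolute shimmer.
    """
    u = np.asarray(u, dtype=float)
    if u.size < 2:
        raise ValueError("need at least two cycles")
    # N - 1 consecutive differences, averaged over N - 1 terms
    return np.abs(np.diff(u)).sum() / (u.size - 1)

# Hypothetical per-cycle measurements, for illustration only.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]    # pitch periods in ms
peaks = [1.00, 0.95, 1.02, 0.97, 1.00]    # peak amplitudes

abs_jitter = absolute_perturbation(periods_ms)   # in ms
abs_shimmer = absolute_perturbation(peaks)       # in amplitude units
```

The result carries the units of the input sequence, so absolute jitter here is in milliseconds.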







Jitter: Aperiodicity through periodicity [1]

[Figure: glottal impulse train, amplitude versus time (samples); the spacing between consecutive impulses alternates between P − ε and P + ε.]




In mathematical terms

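We model the glottal impulse train as:

$$p[n] = \sum_{k=-\infty}^{+\infty}\delta[n-(2k)P] + \sum_{k=-\infty}^{+\infty}\delta[n+\varepsilon-(2k+1)P]$$

We may show that its power spectrum is then:

$$|P(\omega)|^2 = \frac{2}{P^2}\bigl(1+\cos[(\varepsilon-P)\omega]\bigr)\Bigl[\delta_{l\omega_0}(\omega)+\delta_{(l+\frac{1}{2})\omega_0}(\omega)\Bigr] = H(\varepsilon,\omega)+S(\varepsilon,\omega)$$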
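The shape of this power spectrum can be checked numerically. The sketch below assumes an integer jitter ε with 0 ≤ ε < P and ω0 = 2π/P, builds a finite number of impulse pairs, and reads the FFT at the spectral lines kω0/2. The function names are hypothetical, and because `np.fft.fft` is unnormalized the comparison holds up to a constant factor rather than reproducing the 2/P² scaling:

```python
import numpy as np

def jittered_train(P, eps, n_pairs):
    """Impulses at 2kP and at (2k+1)P - eps, for k = 0 .. n_pairs-1."""
    p = np.zeros(2 * P * n_pairs)
    k = np.arange(n_pairs)
    p[2 * k * P] = 1.0
    p[(2 * k + 1) * P - eps] = 1.0
    return p

def spectral_lines(p, P):
    """Power at the lines omega = m*pi/P (m = 0 .. 2P-1).

    Even m are the harmonics l*w0, odd m the subharmonics (l+1/2)*w0,
    with w0 = 2*pi/P.
    """
    n_pairs = len(p) // (2 * P)
    power = np.abs(np.fft.fft(p)) ** 2
    lines = power[::n_pairs]            # only these bins are non-zero
    omega = np.pi * np.arange(2 * P) / P
    return omega, lines

P, n_pairs = 64, 8
for eps in (0, 1, 2):
    omega, lines = spectral_lines(jittered_train(P, eps, n_pairs), P)
    # Relative level of the first subharmonic (w0/2) versus the first harmonic (w0)
    print(eps, lines[1] / lines[2])
```

With ε = 0 the odd-numbered (subharmonic) lines vanish, and they grow as ε increases, which is the property the jitter measure relies on.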







Examples of power spectrum

On synthetic glottal signal

[Figure: power (dB) versus radian frequency (ω), showing H(ε, ω) and S(ε, ω) for ε = 0, 1, 2.]




Experiments

Goal: discriminate pathological from normal voices, based on jitter

Database: Massachusetts Eye and Ear Infirmary (MEEI) [2]

Sustained vowels
53 subjects with normal voice
657 subjects with a wide variety of pathological conditions

Jitter estimation methods:

PRAAT 2007 (P. Boersma and D. Weenink) [3]
Multi-Dimensional Voice Program (MDVP) (Kay-Pentax Elemetrics, 2007) [4]
Our approach [1]







Results in ROC curves




Shimmer: Aperiodicity through periodicity [1]






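$$g[n] = A(1+\Delta)\,\delta_{(2k)P}[n] + A(1-\Delta)\,\delta_{(2k+1)P}[n]$$

We may show that its Fourier Transform is then:

$$G(\omega) = A\Bigl[(1+\Delta)+(1-\Delta)\,e^{-j2\pi\frac{\omega}{\omega_0}}\Bigr]\frac{\omega_0}{4\pi}\sum_{k=-\infty}^{+\infty}\delta\Bigl(\omega-k\frac{\omega_0}{2}\Bigr)$$

Splitting

$$G(l\omega_0) = \frac{A\omega_0}{2\pi}$$

$$G\bigl((l+1/2)\omega_0\bigr) = \frac{A\omega_0}{2\pi}\,\Delta$$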
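The split above says that the subharmonic lines are the harmonic lines scaled by Δ. A small numerical sketch of that ratio, assuming an impulse train of finite length with alternating amplitudes A(1 + Δ) and A(1 − Δ); the function names and the parameter values are hypothetical:

```python
import numpy as np

def shimmer_train(P, A, delta, n_pairs):
    """Impulses of amplitude A(1+delta) at 2kP and A(1-delta) at (2k+1)P."""
    g = np.zeros(2 * P * n_pairs)
    k = np.arange(n_pairs)
    g[2 * k * P] = A * (1 + delta)
    g[(2 * k + 1) * P] = A * (1 - delta)
    return g

def subharmonic_to_harmonic_ratio(g, P):
    """|G(w0/2)| / |G(w0)|, read from the FFT bins of the two lines."""
    n_pairs = len(g) // (2 * P)
    lines = np.abs(np.fft.fft(g))[::n_pairs]   # lines at multiples of w0/2
    return lines[1] / lines[2]

ratio = subharmonic_to_harmonic_ratio(
    shimmer_train(P=80, A=1.0, delta=0.1, n_pairs=10), P=80)
```

Under these assumptions `ratio` recovers Δ, so the amount of shimmer can be read directly from the relative level of the subharmonics.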







Examples of spectrum

On synthetic glottal signal




Experiment at 8kHz




Experiment at 16kHz




Analysis: Jitter and Shimmer




Short-Time Fourier Transform




Time-Frequency Distributions [5]




Modeling the Periodic part of Speech

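Sum of simple exponential functions

$$h_1(t) = \Re\left\{\sum_{k=1}^{L} a_k\, e^{j2\pi k\frac{f_0}{f_s}t}\right\}$$

Sum of exponential functions with complex slope (HNM2 [6])

$$h_2(t) = \Re\left\{\sum_{k=1}^{L} A_k(t)\,\exp\Bigl(j2\pi k\frac{f_0}{f_s}t\Bigr)\right\}$$

where $A_k(t) = a_k + t\,b_k$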
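The two models differ only in whether the complex amplitude of each harmonic is constant or varies linearly within the frame. A minimal synthesis sketch, assuming t is measured in samples (suggested by the f0/fs normalization) and using a hypothetical function name `harmonic_sum`:

```python
import numpy as np

def harmonic_sum(a, f0, fs, t, b=None):
    """Real part of sum_k (a_k + t*b_k) * exp(j*2*pi*k*(f0/fs)*t), k = 1..L.

    With b=None this is h1 (constant amplitudes); with complex slopes b
    it is the HNM2-style h2. t is assumed to be in samples.
    """
    a = np.asarray(a, dtype=complex)
    b = np.zeros_like(a) if b is None else np.asarray(b, dtype=complex)
    k = np.arange(1, len(a) + 1)[:, None]        # harmonic index (column)
    t = np.asarray(t, dtype=float)[None, :]      # time axis (row)
    A = a[:, None] + t * b[:, None]              # A_k(t) = a_k + t*b_k
    return np.real(np.sum(A * np.exp(2j * np.pi * k * f0 / fs * t), axis=0))

# Hypothetical frame: 3 harmonics of 100 Hz at 16 kHz, centred on t = 0.
t = np.arange(-160, 161)
a = [1.0, 0.5 + 0.2j, 0.25]
b = [1e-3, -5e-4j, 0.0]
h1 = harmonic_sum(a, 100.0, 16000.0, t)
h2 = harmonic_sum(a, 100.0, 16000.0, t, b)
```

At t = 0 the two signals coincide, and they drift apart toward the frame edges as the slope term grows.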







Revisiting HNM2

We recall that the periodic part of HNM2 is given by:

$$s(t) = \left(\sum_{k=-L}^{L} A_k(t)\,e^{2\pi j k f_0 t}\right) w(t)$$

with $A_k(t) = a_k + t\,b_k$, or in frequency domain:

$$S(f) = \sum_{k=-L}^{L}\bigl(a_k W(f-kf_0) + j\,b_k W'(f-kf_0)\bigr)$$

where $W(f)$ is the Fourier Transform of the window $w(t)$.




Time-Domain Properties of HNM2

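Instantaneous Amplitude:

$$m_k(t) = |a_k + t\,b_k| = \sqrt{(a_k^R + t\,b_k^R)^2 + (a_k^I + t\,b_k^I)^2}$$

Instantaneous Phase:

$$\phi_k(t) = 2\pi k f_0 t + \angle(a_k + t\,b_k) = 2\pi k f_0 t + \operatorname{atan}\frac{a_k^I + t\,b_k^I}{a_k^R + t\,b_k^R}$$

Instantaneous Frequency:

$$f_k(t) = \frac{1}{2\pi}\phi_k'(t) = k f_0 + \frac{1}{2\pi}\,\frac{a_k^R b_k^I - a_k^I b_k^R}{m_k^2(t)}$$

Here the superscripts $R$ and $I$ denote real and imaginary parts.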
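These closed forms are straightforward to evaluate for a single harmonic. The sketch below assumes scalar complex `a` and `b` and a time axis in seconds; the function names and the parameter values are hypothetical:

```python
import numpy as np

def inst_amplitude(a, b, t):
    """m_k(t) = |a_k + t*b_k|."""
    return np.abs(a + t * b)

def inst_frequency(a, b, t, k, f0):
    """f_k(t) = k*f0 + (a_R*b_I - a_I*b_R) / (2*pi*m_k(t)^2)."""
    cross = a.real * b.imag - a.imag * b.real
    return k * f0 + cross / (2 * np.pi * inst_amplitude(a, b, t) ** 2)

# Hypothetical parameters for the 3rd harmonic of a 100 Hz voice.
a, b, k, f0 = 1 + 0.5j, 0.2 - 0.3j, 3, 100.0
t = np.linspace(-0.01, 0.01, 2001)           # seconds, frame centred on 0
m = inst_amplitude(a, b, t)
f = inst_frequency(a, b, t, k, f0)
```

A quick sanity check is to differentiate the unwrapped phase `2*pi*k*f0*t + np.angle(a + t*b)` numerically with `np.gradient` and compare it with `f`; the deviation from k·f0 is governed entirely by the cross term.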







Frequency Domain properties of HNM2

Let $\vec{a}_k$ and $\vec{b}_k$ denote the vectors corresponding respectively to the complex numbers $a_k$ and $b_k$, and let us decompose $\vec{b}_k$ into two components:

one collinear to $\vec{a}_k$, and

one perpendicular to $\vec{a}_k$.

Thus, $\vec{b}_k$ is given by

$$\vec{b}_k = \rho_{1,k}\,\vec{a}_k + \rho_{2,k}\,\vec{a}_k^{\perp}$$
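In complex notation the decomposition reduces to one division. The sketch below assumes the perpendicular direction is $j a_k$ (a rotated by +90°), which is the convention the talk uses later when it writes $b_k = \rho_{1,k}a_k + \rho_{2,k}\,j a_k$; the function name is hypothetical:

```python
def decompose_slope(a, b):
    """Return (rho1, rho2) such that b = rho1*a + rho2*(1j*a).

    Assumes a != 0 and that the perpendicular of a is 1j*a.
    """
    ratio = b / a            # b/a = rho1 + 1j*rho2
    return ratio.real, ratio.imag

# Hypothetical values, for illustration only.
a, b = 1 + 0.5j, 0.2 - 0.3j
rho1, rho2 = decompose_slope(a, b)
reconstructed = rho1 * a + rho2 * (1j * a)
```

Under this convention ρ2 equals the cross term (a_R b_I − a_I b_R)/|a|² of the instantaneous frequency at t = 0, so ρ1 acts on the amplitude and ρ2 on the frequency.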







Let's look at the kth component

The kth component can be written as:

$$S_k(f) = a_k\bigl[W(f-kf_0) - \rho_{2,k}W'(f-kf_0) + j\rho_{1,k}W'(f-kf_0)\bigr]$$

For small values of $\rho_{2,k}$, using a first order approximation of the Taylor series of $W(f)$, we have:

$$W(f-kf_0) - \rho_{2,k}W'(f-kf_0) \approx W(f-kf_0-\rho_{2,k})$$

and then:

$$S_k(f) \approx a_k\bigl[W(f-kf_0-\rho_{2,k}) + j\rho_{1,k}W'(f-kf_0)\bigr]$$
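How small ρ2 has to be depends on the curvature of the window spectrum. A rough numerical sketch, assuming a Gaussian-shaped $W(f)$ with width `sigma` (a stand-in chosen because its derivative is known in closed form, not the window used in the talk) and hypothetical function names:

```python
import numpy as np

def window_spectrum(f, sigma):
    """Assumed Gaussian-shaped window spectrum W(f)."""
    return np.exp(-f ** 2 / (2 * sigma ** 2))

def window_spectrum_deriv(f, sigma):
    """Analytic derivative W'(f) of the Gaussian above."""
    return -f / sigma ** 2 * window_spectrum(f, sigma)

def shift_approximation_error(rho, sigma, f):
    """Max |W(f) - rho*W'(f) - W(f - rho)| over the frequency grid f."""
    first_order = window_spectrum(f, sigma) - rho * window_spectrum_deriv(f, sigma)
    shifted = window_spectrum(f - rho, sigma)
    return np.max(np.abs(first_order - shifted))

f = np.linspace(-300.0, 300.0, 6001)        # frequency offset from k*f0
err_small = shift_approximation_error(2.0, 50.0, f)
err_large = shift_approximation_error(20.0, 50.0, f)
```

For this window the error scales roughly as ρ²/(2σ²), so the frequency-shift reading of ρ2 is only reliable when ρ2 is small compared with the main-lobe width.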




Time-Frequency Analysis using HNM2: Healthy voice

[Figure: time-frequency representation; horizontal axis: samples (1000 to 3000), vertical axis: frequency (0 to 8000).]



Time-Frequency Analysis using HNM2: Pathologic voice

[Figure: time-frequency representation; horizontal axis: samples (1000 to 3000), vertical axis: frequency (0 to 8000).]




Sinusoidal model

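$$s(t) = \sum_{k=1}^{K(t)} A_k(t)\cos[\theta_k(t)]$$

where

$$A_k(t) = \underbrace{a_k(t)}_{\text{excitation}} \cdot \underbrace{M_k(t)}_{\text{vocal tract}}$$

and

$$\theta_k(t) = \underbrace{\phi_k(t)}_{\text{excitation}} + \underbrace{\Phi_k(t)}_{\text{vocal tract}}$$

$$\phi_k(t) = 2\pi k\int_0^t f_0(\tau)\,d\tau + \phi_k$$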




Jitter and Shimmer

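Jitter: $f_0(t) = f_0 - \delta\sin(\pi f_0 t)$

Shimmer: $a_k(t) = a_k[1 + \gamma_k\cos(\pi f_0 t)]$

so then:

$$s(t) = \sum_{k=-K}^{K} A_k[1+\gamma_k\cos(\pi f_0 t)]\,e^{j(2\pi k f_0 t + \delta_k\cos(\pi f_0 t) + \theta_k)}\,w(t)$$

and by writing $e^{j\delta_k\cos(\pi f_0 t)} \approx 1 + j\delta_k\cos(\pi f_0 t)$, then:

$$s(t) \approx \sum_{k=-K}^{K} A_k e^{j\theta_k}[1+(\gamma_k + j\delta_k)\cos(\pi f_0 t)]\,e^{j2\pi k f_0 t}\,w(t)$$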
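The approximation drops the second-order terms in γ_k and δ_k, so it is only as good as those parameters are small. A minimal sketch comparing the exact and linearized modulation factors for one harmonic; the function names and the values of γ and δ are hypothetical:

```python
import numpy as np

def modulation_exact(gamma, delta, c):
    """[1 + gamma*c] * exp(j*delta*c), with c = cos(pi*f0*t)."""
    return (1 + gamma * c) * np.exp(1j * delta * c)

def modulation_first_order(gamma, delta, c):
    """Linearized form 1 + (gamma + j*delta)*c."""
    return 1 + (gamma + 1j * delta) * c

f0, fs = 100.0, 16000.0
t = np.arange(640) / fs                    # two periods of cos(pi*f0*t)
c = np.cos(np.pi * f0 * t)
gamma, delta = 0.05, 0.10                  # hypothetical shimmer and jitter depths
max_err = np.max(np.abs(modulation_exact(gamma, delta, c)
                        - modulation_first_order(gamma, delta, c)))
```

Since |c| ≤ 1, the error is bounded by δ²/2 + γδ, which is about 0.01 for the values above.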




Jitter and Shimmer in HNM2

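Suggesting:

$$s(t) = \sum_{k=-K}^{K} [a_k + b_k\cos(\pi f_0 t)]\,e^{j2\pi k f_0 t}\,w(t)$$

and by letting $b_k = \rho_{1,k}a_k + \rho_{2,k}\,j a_k$, then:

$$s(t) = \sum_{k=-K}^{K} a_k[1+(\rho_{1,k} + j\rho_{2,k})\cos(\pi f_0 t)]\,e^{j2\pi k f_0 t}\,w(t)$$

comparing to what we would like to have:

$$s(t) \approx \sum_{k=-K}^{K} A_k e^{j\theta_k}[1+(\gamma_k + j\delta_k)\cos(\pi f_0 t)]\,e^{j2\pi k f_0 t}\,w(t)$$

Matching the two expressions term by term gives $a_k = A_k e^{j\theta_k}$, $\rho_{1,k} = \gamma_k$ (shimmer) and $\rho_{2,k} = \delta_k$ (jitter).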
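One way to see the correspondence between (ρ1, ρ2) and the shimmer and jitter depths is to fit the model to a synthetic component. The sketch below is an illustration under strong assumptions, not the estimation procedure of the talk: a single harmonic already demodulated to baseband, a rectangular window, a known f0, and a plain least-squares fit with `np.linalg.lstsq`. All names and values are hypothetical:

```python
import numpy as np

def fit_rho(y, c):
    """Least-squares fit y ~ a + b*c, returning (rho1, rho2) with b/a = rho1 + j*rho2."""
    basis = np.column_stack([np.ones_like(c), c]).astype(complex)
    (a, b), *_ = np.linalg.lstsq(basis, y, rcond=None)
    ratio = b / a
    return ratio.real, ratio.imag

f0, fs = 100.0, 16000.0
gamma, delta = 0.05, 0.10                  # hypothetical shimmer and jitter depths
t = np.arange(640) / fs                    # two full periods of cos(pi*f0*t)
c = np.cos(np.pi * f0 * t)

# Baseband harmonic with exact (not linearized) amplitude and phase modulation.
y = 0.8 * np.exp(0.3j) * (1 + gamma * c) * np.exp(1j * delta * c)

rho1, rho2 = fit_rho(y, c)                 # expected close to (gamma, delta)
```

The recovered values differ from γ and δ only by the second-order terms neglected in the linearization.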





Modeling Shimmer

[Figure: three panels plotted against samples (0 to 400); legend entries: Original, Reconstructed1, Error1, Error2.]




Modeling Jitter

[Figure: three panels plotted against samples (0 to 400); legend entries: Error1, Error2.]




Acknowledgments

I wish to thank T. Quatieri and Prentice Hall for giving me permission to use figures from Tom's book [7].

My students, Miltos Vassilakis and Yannis Pantazis, for their work on jitter and shimmer and on HNM2.




References

[1] M. Vasilakis and Y. Stylianou, “A mathematical model for accurate measurement of jitter,” in MAVEBA 2007, (Florence, Italy), 2007.

[2] Kay Elemetrics, “Disordered Voice Database (Version 1.03),” 1994.

[3] P. Boersma and D. Weenink, “Praat: doing phonetics by computer (Version 4.6.24) [Computer program],” 2007.

[4] Kay Elemetrics, “Multi-Dimensional Voice Program (MDVP) [Computer program],” 2007.

[5] L. Cohen, Time-Frequency Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[6] Y. Stylianou, “Modeling speech based on harmonic plus noise models,” in Nonlinear Speech Modeling and Applications (G. Chollet, A. Esposito, M. Faundez, and M. M, eds.), pp. 244–260, Springer-Verlag, 2005.

[7] T. F. Quatieri, Discrete-Time Speech Signal Processing. Englewood Cliffs, NJ: Prentice Hall, 2002.


