
Tools for Analysing the Voice Source

John Kane & Christer Gobl

April 24, 2009


Research Aims

Summary

To design efficient, accurate and robust methods for parameterising the human voice source.

To investigate new ways of utilising this information for different applications.



Why?

Basic Research: Understanding more about speech production

Speech Synthesis: formant synthesis, HMM-based synthesis, e.g. Cabral et al. (2008)

Voice Pathology: characterising speech pathologies, testing the effectiveness of treatment, etc.



Speech Production

Acoustic theory of speech production, Fant (1960)

Figure: taken from Gobl (2003)
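To make the source-filter idea concrete, here is a minimal Python sketch. It assumes a deliberately simple source (a plain impulse train rather than an LF pulse) and Klatt-style second-order digital resonators for the formants. The function names and the formant values are illustrative assumptions, not taken from Fant (1960) or Gobl (2003).

```python
import numpy as np


def resonator_coeffs(fc, bw, fs):
    """Klatt-style resonator coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    fc, bw: formant frequency and bandwidth in Hz; fs: sample rate in Hz.
    a is chosen so that the resonator has unity gain at 0 Hz.
    """
    t = 1.0 / fs
    c = -np.exp(-2.0 * np.pi * bw * t)
    b = 2.0 * np.exp(-np.pi * bw * t) * np.cos(2.0 * np.pi * fc * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(x, fc, bw, fs):
    """Pass x through a single formant resonator (zero initial state)."""
    a, b, c = resonator_coeffs(fc, bw, fs)
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for n in range(len(x)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = a * x[n] + b * y1 + c * y2
    return y


def synthesize(source, formants, fs):
    """Source-filter synthesis: cascade the source through one resonator per formant."""
    out = np.asarray(source, dtype=float)
    for fc, bw in formants:
        out = resonate(out, fc, bw, fs)
    return out


# Hypothetical example: a 100 Hz impulse train through three rough [a]-like formants.
fs = 10000
source = np.zeros(fs // 10)      # 100 ms of signal
source[:: fs // 100] = 1.0       # one impulse every 10 ms, i.e. f0 = 100 Hz
speech = synthesize(source, [(700, 80), (1200, 90), (2600, 120)], fs)
```

The cascade structure mirrors the model: the source is generated independently, and the vocal tract is a chain of resonances applied on top of it.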


Inverse Filtering

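Inverse filtering runs the speech production process in reverse: the speech waveform is put through a set of anti-resonators that cancel the formants of the vocal tract, leaving an estimate of the glottal source.

In practice this is far from straightforward, since the formant frequencies and bandwidths must first be estimated accurately.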
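An anti-resonator can be written as the exact FIR inverse of a Klatt-style resonator, so that cascading one per formant undoes the vocal tract filter. This minimal sketch assumes the formant frequencies and bandwidths are already known exactly; estimating them reliably is a large part of why inverse filtering is hard in practice. `resonator_coeffs`, `anti_resonate` and `inverse_filter` are hypothetical names.

```python
import numpy as np


def resonator_coeffs(fc, bw, fs):
    """Klatt-style resonator coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs
    c = -np.exp(-2.0 * np.pi * bw * t)
    b = 2.0 * np.exp(-np.pi * bw * t) * np.cos(2.0 * np.pi * fc * t)
    return 1.0 - b - c, b, c


def anti_resonate(y, fc, bw, fs):
    """Anti-resonator: the FIR inverse of the resonator with the same fc and bw.

    Solving y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for x[n] gives
    x[n] = (y[n] - b*y[n-1] - c*y[n-2]) / a.
    """
    a, b, c = resonator_coeffs(fc, bw, fs)
    y = np.asarray(y, dtype=float)
    y1 = np.concatenate(([0.0], y))[: len(y)]       # y[n-1], zero before the start
    y2 = np.concatenate(([0.0, 0.0], y))[: len(y)]  # y[n-2], zero before the start
    return (y - b * y1 - c * y2) / a


def inverse_filter(speech, formants, fs):
    """Cancel each (frequency, bandwidth) formant in turn; the result estimates the source."""
    out = np.asarray(speech, dtype=float)
    for fc, bw in formants:
        out = anti_resonate(out, fc, bw, fs)
    return out
```

Applied to the output of a matching resonator cascade with zero initial state, `inverse_filter` recovers the input up to rounding. With real speech, any error in the assumed formant values leaves residual formant ripple in the estimated source.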




LF model

The Liljencrants-Fant (LF) model is the most widely documented voice source model and has been shown to perform at least as well as other voice source models, e.g. Fujisaki & Ljungqvist (1986) and Strik (1998).


LF model

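The LF model is usually described by three time-based parameters and one amplitude parameter (EE).

R-parameter equations:

$$R_a = \frac{T_a}{T_0}, \qquad R_g = \frac{T_0}{2T_p}, \qquad R_k = \frac{T_n}{T_p}$$

where T0 is the fundamental period, Tp is the time from glottal opening to peak flow, Te is the instant of main excitation, Tn = Te - Tp, and Ta is the effective duration of the return phase.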
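The three R-parameters follow directly from the time values. A minimal sketch, assuming all times are given in the same unit and that Tn = Te - Tp as in the standard LF formulation; `r_parameters` is a hypothetical helper, and the example values are made up rather than measured.

```python
def r_parameters(t0, tp, te, ta):
    """Return (Ra, Rg, Rk) for the LF model from its time parameters.

    t0: fundamental period, tp: time of peak flow, te: time of main
    excitation, ta: return-phase time constant (all in the same unit).
    """
    tn = te - tp           # Tn = Te - Tp
    ra = ta / t0           # Ra = Ta / T0
    rg = t0 / (2.0 * tp)   # Rg = T0 / (2 Tp)
    rk = tn / tp           # Rk = Tn / Tp
    return ra, rg, rk


# Hypothetical example values in ms.
ra, rg, rk = r_parameters(t0=10.0, tp=4.0, te=5.0, ta=0.2)
```

With these values the function gives Ra = 0.02, Rg = 1.25 and Rk = 0.25, up to floating-point rounding. Because the R-parameters are ratios, the time unit cancels out.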




Manual Methods

Matching the model to the source pulse by varying four time markers and one amplitude marker.


Automatic Methods

Automatic methods for parameterising the voice source exist, e.g. Strik (1998) and Airas (2008).

However, most parameterisation systems involve the marking of time instants in the glottal waveform.
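As a simple illustration of marking time instants, the sketch below locates two of them in one period of glottal flow: Tp as the flow maximum, and Te as the most negative point of the flow derivative after Tp. This is a hypothetical, minimal approach written for illustration. It is not the method of Strik (1998) or Airas (2008), and it assumes a clean, single-period flow signal.

```python
import numpy as np


def mark_time_instants(flow):
    """Return (tp, te) as sample indices within one period of glottal flow.

    tp: sample of maximum flow.
    te: sample of the most negative flow derivative at or after tp
        (taken here as the main excitation).
    """
    flow = np.asarray(flow, dtype=float)
    dflow = np.gradient(flow)                  # central-difference derivative
    tp = int(np.argmax(flow))
    te = tp + int(np.argmin(dflow[tp:]))       # search only after the flow peak
    return tp, te
```

On real inverse-filtered signals, noise and ripple usually make such a naive peak search unreliable, which is one reason marking these instants robustly is difficult.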




Our Automatic Method


Amplitude Measures (EE and EI)


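EE is the magnitude of the main negative peak of the differentiated glottal flow, at the instant of main excitation (Te). EI is the maximum positive amplitude of the differentiated glottal flow.

Figure: differentiated glottal flow, amplitude vs. time (ms), illustrating the EE and EI measures.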
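Given period boundaries, EE and EI can be read off each period of the differentiated flow. In this minimal sketch, the boundaries are an assumed input (for example, from previously marked time instants), and `ee_ei_per_period` is a hypothetical name.

```python
import numpy as np


def ee_ei_per_period(dflow, boundaries):
    """Return a list of (EE, EI) pairs, one per period of the differentiated flow.

    dflow: differentiated glottal flow; boundaries: increasing sample indices
    marking period starts (assumed to be known in advance).
    """
    dflow = np.asarray(dflow, dtype=float)
    results = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        seg = dflow[start:stop]
        ee = -float(seg.min())   # magnitude of the main negative peak
        ei = float(seg.max())    # positive peak
        results.append((ee, ei))
    return results
```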


Amplitude Measures (UP)

UP is measured as the maximum peak of the glottal flow. This process, however, is complicated by the occurrence of zero drift, i.e. a slow wandering of the flow signal's zero baseline.


Figure: glottal flow in two panels, amplitude (x 10^-3) vs. time (0-1000 ms).
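The slides do not specify how zero drift is dealt with. One simple possibility, sketched below, assumes that the true flow returns to zero at each period boundary. Under that assumption, the sketch subtracts a straight line drawn between the two boundary samples and then takes the maximum. `up_per_period` and the linear-baseline assumption are illustrative only.

```python
import numpy as np


def up_per_period(flow, boundaries):
    """Estimate UP for each period after removing a linear baseline.

    Hypothetical assumption: the true flow is zero at each period boundary,
    so a line through the two boundary samples tracks the drifting zero level.
    The last boundary must be a valid index into flow.
    """
    flow = np.asarray(flow, dtype=float)
    ups = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        seg = flow[start:stop + 1]                          # include the closing boundary sample
        baseline = np.linspace(seg[0], seg[-1], len(seg))   # straight-line drift estimate
        ups.append(float(np.max(seg - baseline)))
    return ups
```

A straight line only removes drift that is roughly linear within a period. Faster baseline wander would need a different correction.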


R-parameters calculated from Amplitude Measures

Two of the time-based parameters can now be defined in terms of the amplitude measures (equations taken from Gobl (2003)).


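$$R_{ka} = \left(\frac{2}{\pi}\right)\left(\frac{EI}{EE}\right), \qquad R_{ga} = \frac{\left(\frac{1}{\pi}\right)\left(\frac{EI}{UP}\right)}{f_0}$$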
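The two amplitude-based estimates can be computed directly from the measures. A minimal sketch, assuming EE, EI and UP come from the same period and f0 is in Hz; `r_parameters_from_amplitudes` is a hypothetical helper.

```python
import math


def r_parameters_from_amplitudes(ee, ei, up, f0):
    """Return (Rka, Rga), the amplitude-based estimates of Rk and Rg.

    ee, ei: amplitudes of the differentiated flow; up: peak flow;
    f0: fundamental frequency in Hz. EI/UP has units of 1/s, so dividing
    by f0 makes Rga dimensionless.
    """
    rka = (2.0 / math.pi) * (ei / ee)
    rga = (1.0 / math.pi) * (ei / up) / f0
    return rka, rga
```

As a worked check under an assumed pulse shape (not taken from the slides), suppose the opening branch of the flow is a half-period raised cosine rising to UP at Tp. Then EI = pi * UP / (2 * Tp), and substituting into Rga gives exactly T0 / (2 * Tp) = Rg. For example, T0 = 10 ms (f0 = 100 Hz), Tp = 4 ms, EI = 1 and UP = 2 * EI * Tp / pi give Rga = 1.25.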




Frequency Domain Measure

The final parameter required to describe the LF model is Ra, which describes the return phase of the model.

Figure: FFT of the source waveform, amplitude (dB) vs. frequency (0-5000 Hz).
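The slides do not say exactly how Ra is extracted from the spectrum. The sketch below shows two ingredients one might use. The first is a windowed FFT magnitude spectrum in dB, as in the figure. The second is the standard LF-model relation in which the return phase acts approximately as a first-order low-pass filter with cutoff Fa = 1/(2*pi*Ta), so that Ra = Ta * f0 = f0/(2*pi*Fa). How Fa would be located in a measured spectrum is left open, and the function names are hypothetical.

```python
import numpy as np


def magnitude_spectrum_db(x, fs):
    """Return (frequencies in Hz, Hann-windowed FFT magnitude in dB) of a real signal."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x * np.hanning(len(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)  # tiny floor avoids log10(0)
    return freqs, mag_db


def ra_from_fa(fa, f0):
    """Ra = Ta / T0 = Ta * f0, using Fa = 1 / (2 * pi * Ta) for the return phase."""
    ta = 1.0 / (2.0 * np.pi * fa)
    return ta * f0
```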


Evaluation

Three speakers were recorded saying an [a] vowel in lax, modal and tense phonation types.

Utterances were inverse filtered manually and then analysed using both the manual and the automatic method.

Sounds were then synthesised using the parameter values from each method.

Amplitude and f0 were kept constant within each pair of synthesised sounds.



Perception Tests

Test 1: Participants listened to 45 groups of three stimuli and chose the synthesised stimulus that sounded most like the original.


Perception Tests

Test 2: Participants again listened to 45 groups of three stimuli and chose which of the synthesised stimuli was repeated.


Perception Tests

Test 1: % = participant preference for the automatic method

Test 2: % = participant ability to discriminate the stimuli

Test   Modal   Tense   Lax   Overall
1      50%     61%     40%   50%
2      55%     67%     73%   65%


Calculating Residual

We designed a method to calculate the residual by comparing the spectra of the original source signal and of its parameterisation.

Figure: spectra of the original source and the parameterised source, amplitude (dB) vs. frequency (0-5000 Hz).
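The exact residual measure is not given here. In this hypothetical sketch, the residual within a frequency band is taken as the mean absolute dB difference between the FFT magnitude spectra of the original source and the parameterised source. `spectral_residual` and its `band` argument are assumptions, not the authors' definition.

```python
import numpy as np


def spectral_residual(original, model, fs, band=(0.0, 5000.0)):
    """Hypothetical residual: mean absolute dB difference between two spectra in a band.

    original, model: equal-length source waveforms (e.g. the inverse-filtered
    source and its LF parameterisation); band: (low, high) limits in Hz.
    """
    original = np.asarray(original, dtype=float)
    model = np.asarray(model, dtype=float)
    freqs = np.fft.rfftfreq(len(original), d=1.0 / fs)
    db_orig = 20.0 * np.log10(np.abs(np.fft.rfft(original)) + 1e-12)
    db_model = 20.0 * np.log10(np.abs(np.fft.rfft(model)) + 1e-12)
    mask = (freqs >= band[0]) & (freqs <= band[1])   # keep only bins inside the band
    return float(np.mean(np.abs(db_orig[mask] - db_model[mask])))
```

Calling it with `band` set to (0, 1000), (1000, 3000) and (3000, 5000) Hz would give per-region values of the kind compared across frequency regions in this evaluation.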


Calculating Residual

Figure: Residual values for the automatic and manual methods across voice qualities (A: modal, tense, lax) and across four frequency regions (B: 0-5, 0-1, 1-3 and 3-5 kHz). Data expressed as mean ± SEM (independent t-test); * p < 0.05, ** p < 0.01.


Conclusions

Our automatic parameterisation method performed at least as well as the manual method in modal to tense phonation modes.

For lax voice qualities the manual method performed slightly better, although the quality of both methods was generally poorer for lax voice.




Directions in Parameterisation

Utilise frequency domain information more.

Include a method of analysing the noise component in speech, e.g. Gobl (2006).

Further optimise the analysis by using information from previous pulses.












Thanks!

Questions, criticisms, comments on my hair......

