Tools for Analysing the Voice Source

John Kane & Christer Gobl

April 24, 2009

John Kane & Christer Gobl () Tools for Analysing the Voice Source April 24, 2009 1 / 28

Research Aims

Summary

To design efficient, accurate and robust methods for parameterising the human voice source.

To investigate new ways of utilising this information for different applications.

Why?

Basic Research: Understanding more about speech production

Speech Synthesis: Formant synthesis, HMM-based synthesis, Cabral et al. (2008)

Voice Pathology: Characterising speech pathologies, testing the effectiveness of treatment, etc.

Speech Production

Acoustic theory of speech production, Fant (1960)

Figure: taken from Gobl (2003)

Inverse Filtering

The speech production process in reverse. Speech waveform is put through a set of anti-resonators.

In practice this is far from straightforward.
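
The idea of passing the waveform through a set of anti-resonators can be sketched as a cascade of second-order FIR sections, each placing a pair of zeros at one formant. This is only a minimal illustration: the function names, the formant frequencies and the bandwidths are hypothetical, and the talk does not say how the anti-resonators were set.

```python
import math

def anti_resonator(freq_hz, bandwidth_hz, fs):
    """FIR coefficients of one anti-resonator (a complex-conjugate zero pair)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # zero radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # zero angle from centre frequency
    b = [1.0, -2.0 * r * math.cos(theta), r * r]
    gain = sum(b)                                # response at 0 Hz
    return [c / gain for c in b]                 # normalise to unity gain at 0 Hz

def fir_filter(b, x):
    """Direct-form FIR filtering with zero initial conditions."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        y.append(acc)
    return y

def inverse_filter(speech, formants, fs):
    """Apply one anti-resonator per (frequency, bandwidth) pair, in cascade."""
    out = list(speech)
    for freq_hz, bandwidth_hz in formants:
        out = fir_filter(anti_resonator(freq_hz, bandwidth_hz, fs), out)
    return out

# Hypothetical formant values for illustration only.
example = inverse_filter([0.0, 1.0, 0.0, -1.0] * 8, [(700.0, 90.0), (1200.0, 110.0)], 10000.0)
```

Choosing the formant frequencies and bandwidths is the hard part, which is one reason the process is far from straightforward in practice.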

LF model

The LF model is the most documented voice source model and has been demonstrated to perform at least as well as other voice source models, e.g. Fujisaki & Ljungqvist (1986) or Strik (1998).

LF model

The LF model is usually described by three time-based parameters and one amplitude parameter (EE).

R parameter Equations

$$R_a = \frac{T_a}{T_0}, \qquad R_g = \frac{T_0}{2T_p}, \qquad R_k = \frac{T_n}{T_p}$$
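
As a small worked example, the three ratios Ra = Ta/T0, Rg = T0/(2Tp) and Rk = Tn/Tp can be computed directly from the timing values. The slide does not define the symbols, so the usual LF reading is assumed here: T0 is the fundamental period, Tp the time of peak flow, Tn the interval from peak flow to the main excitation, and Ta the effective duration of the return phase. The function name and the numbers are illustrative, not data from the talk.

```python
def lf_r_parameters(t0, tp, tn, ta):
    """Return (Ra, Rg, Rk) from LF timing values given in the same unit."""
    ra = ta / t0            # Ra = Ta / T0
    rg = t0 / (2.0 * tp)    # Rg = T0 / (2 Tp)
    rk = tn / tp            # Rk = Tn / Tp
    return ra, rg, rk

# Illustrative values only: a 10 ms period (f0 = 100 Hz), times in seconds.
ra, rg, rk = lf_r_parameters(t0=0.010, tp=0.004, tn=0.0012, ta=0.0003)
print(ra, rg, rk)
```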

Manual Methods

Matching the model to the source pulse by varying four time markers and one amplitude marker.

Automatic Methods

Automatic methods for parameterising the voice source exist, e.g. Strik (1998) and Airas (2008).

However, most parameterisation systems involve the marking of time instants in the glottal waveform.

Our Automatic Method

Amplitude Measures (EE and EI)

[Figure: amplitude (-1 to 0.4) against time (0 to 180 ms), with two points marked by asterisks.]

Amplitude Measures (UP)

UP is measured as the maximum peak of the glottal flow. This process, however, is complicated by the occurrence of zero drift.
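
One simple way to take the flow peak in the presence of drift is to subtract a straight baseline from each pulse before taking the maximum. The talk does not say how drift was handled, so the sketch below is an assumption: it treats the flow as zero at the first and last sample of the pulse and removes the line joining them. The function names are hypothetical.

```python
def remove_linear_drift(flow_pulse):
    """Subtract the straight line joining the first and last samples of one pulse."""
    n = len(flow_pulse)
    if n < 2:
        return [0.0] * n
    start, end = flow_pulse[0], flow_pulse[-1]
    return [v - (start + (end - start) * i / (n - 1))
            for i, v in enumerate(flow_pulse)]

def peak_flow(flow_pulse):
    """UP taken as the maximum of the drift-corrected glottal flow pulse."""
    return max(remove_linear_drift(flow_pulse))

# A triangular pulse [0, 1, 2, 1, 0] riding on a linear drift of 3 + 0.5 * i.
drifting = [3.0, 4.5, 6.0, 5.5, 5.0]
print(peak_flow(drifting))
```

Without the correction the maximum of this toy pulse would be read as 6.0, which shows how drift can bias UP.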

[Figure: two panels of amplitude (×10^-3) against time (0 to 1000 ms); the first spans -1 to 1 and the second 0 to 1.]

R-parameters calculated from Amplitude Measures

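Two of the time-based parameters can now be defined by amplitude representations (equations taken from Gobl (2003)).

R parameter Equations

$$R_{ka} = \frac{2}{\pi} \cdot \frac{\mathrm{EI}}{\mathrm{EE}}, \qquad R_{ga} = \frac{\frac{1}{\pi} \cdot \frac{\mathrm{EI}}{\mathrm{UP}}}{f_0}$$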
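
The two amplitude-based expressions above translate directly into code. This is a minimal sketch: the function name is hypothetical, EE and EI are assumed to be passed as positive magnitudes, and the input values are chosen only to make the arithmetic easy to check, not taken from the talk.

```python
import math

def r_from_amplitudes(ee, ei, up, f0):
    """Return (Rka, Rga) from the amplitude measures EE, EI, UP and f0 in Hz."""
    rka = (2.0 / math.pi) * (ei / ee)           # Rka = (2/pi)(EI/EE)
    rga = ((1.0 / math.pi) * (ei / up)) / f0    # Rga = ((1/pi)(EI/UP)) / f0
    return rka, rga

# Illustrative values only.
rka, rga = r_from_amplitudes(ee=2.0, ei=math.pi, up=0.01, f0=100.0)
print(rka, rga)
```

Because only peak amplitudes and f0 are needed, this route avoids marking time instants in the glottal waveform.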

Frequency Domain Measure

The final parameter required to describe the LF model is Ra. Ra describes the return phase of the model.

[Figure: FFT of the source waveform; amplitude (dB, -60 to 60) against frequency (0 to 5000 Hz).]
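
The slide does not say how Ra is read from the spectrum. One relation often quoted for the LF model, assumed here rather than taken from the talk, is that the return phase acts like a first-order low-pass filter with cutoff Fa = 1/(2π Ta). Combined with Ra = Ta/T0, this gives Ra = f0/(2π Fa). The function names below are hypothetical.

```python
import math

def ra_from_cutoff(fa_hz, f0_hz):
    """Ra implied by a spectral-tilt cutoff Fa, assuming Fa = 1 / (2 pi Ta)."""
    return f0_hz / (2.0 * math.pi * fa_hz)

def cutoff_from_ra(ra, f0_hz):
    """Inverse relation: the cutoff frequency Fa implied by a given Ra."""
    return f0_hz / (2.0 * math.pi * ra)

# Illustrative values only: Ra = 0.05 at f0 = 120 Hz.
fa = cutoff_from_ra(0.05, 120.0)
print(fa, ra_from_cutoff(fa, 120.0))
```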

Evaluation

Three speakers were recorded saying an [a] vowel in lax, modal and tense phonation types.

Utterances were inverse filtered manually and then analysed using both the manual and the automatic method.

Synthesised sounds were made using data from both methods.

Amplitude and f0 were kept constant in each pair of synthesised sounds.

Perception Tests

Test 1: Participants listened to 45 groups of three stimuli and chose the synthesised stimulus that sounded most like the original.

Test 2: Participants again listened to 45 groups of three stimuli and chose which of the synthesised stimuli was repeated.

Test 1: % = participant preference of automatic method

Test 2: % = participant ability to discriminate stimuli

Test    Modal    Tense    Lax    Overall
1       50%      61%      40%    50%
2       55%      67%      73%    65%

Calculating Residual

We designed a method to calculate the residual by comparing the spectra of the source signal and the parameterisation.

[Figure: spectra of the original source and the parameterised source; amplitude (dB, -80 to -10) against frequency (0 to 5000 Hz).]
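
A spectral comparison of this kind could be sketched as follows. The talk does not give the residual formula, so the measure used here (the sum of absolute dB differences between the two magnitude spectra within a frequency band) is an assumption, as are the function names. A direct DFT keeps the sketch dependency-free; it is only practical for short frames.

```python
import cmath
import math

def magnitude_db(signal):
    """Magnitude spectrum in dB for bins 0..N/2, computed with a direct DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        spectrum.append(20.0 * math.log10(abs(acc) + 1e-12))  # floor avoids log(0)
    return spectrum

def spectral_residual(source, model, fs, band_hz):
    """Sum of absolute dB differences between two spectra inside band_hz = (low, high)."""
    if len(source) != len(model):
        raise ValueError("signals must have the same length")
    n = len(source)
    src_db, mod_db = magnitude_db(source), magnitude_db(model)
    low, high = band_hz
    total = 0.0
    for k, (a, b) in enumerate(zip(src_db, mod_db)):
        freq = k * fs / n                 # centre frequency of bin k in Hz
        if low <= freq < high:
            total += abs(a - b)
    return total

# Toy signals for illustration: a sinusoid and a copy scaled by 10.
frame = [math.sin(2.0 * math.pi * 5 * i / 64) for i in range(64)]
scaled = [10.0 * v for v in frame]
print(spectral_residual(frame, scaled, 10000.0, (0.0, 1000.0)))
```

Calling the function with different bands, for example (0, 1000), (1000, 3000) and (3000, 5000), gives one residual per frequency region.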

Figure: Residual values (amplitude) for the automatic and manual methods across voice qualities (A: Modal, Tense, Lax) and across four frequency regions (B: 0-5, 0-1, 1-3 and 3-5 kHz). Data expressed as mean ± SEM (independent t-test); * p < 0.05, ** p < 0.01.

Conclusions

Our automatic parameterisation method performed at least as well as the manual method in modal to tense phonation modes.

For lax voice qualities the manual method performed slightly better. Quality in both methods was generally poorer.

Directions in Parameterisation

Utilise frequency domain information more.

Include a method of analysing the noise component in speech, e.g. Gobl (2006).

Further optimise the analysis by using information from previous pulses.

Thanks!

Questions, criticisms, comments on my hair......
