Tools for Analysing the Voice Source
John Kane & Christer Gobl
April 24, 2009
John Kane & Christer Gobl () Tools for Analysing the Voice Source April 24, 2009 1 / 28
Research Aims
Summary
To design efficient, accurate and robust methods for parameterising the human voice source.
To investigate new ways of utilising this information for different applications.
Why?
Basic Research: Understanding more about speech production
Speech Synthesis: Formant synthesis, HMM-based synthesis, e.g. Cabral et al (2008)
Voice Pathology: Characterising speech pathologies, testing the effectiveness of treatment, etc.
Speech Production
Acoustic theory of speech production, Fant (1960)
Figure: taken from Gobl (2003)
Inverse Filtering
The speech production process in reverse: the speech waveform is passed through a set of anti-resonators.
In practice this is far from straightforward.
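A minimal sketch of the idea, assuming the formant frequencies and bandwidths are already known (in practice estimating them is the hard part) and using the standard two-pole digital resonator together with its FIR inverse as the anti-resonator. The function names and formant values are illustrative, not taken from the authors' tools.

```python
import numpy as np


def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole digital resonator: y[n] = x[n] + c1*y[n-1] + c2*y[n-2]."""
    r = np.exp(-np.pi * bw_hz / fs)  # pole radius set by the bandwidth
    c1 = 2.0 * r * np.cos(2.0 * np.pi * freq_hz / fs)
    c2 = -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += c1 * y[n - 1]
        if n >= 2:
            y[n] += c2 * y[n - 2]
    return y


def antiresonator(x, freq_hz, bw_hz, fs):
    """FIR inverse of `resonator`: z[n] = x[n] - c1*x[n-1] - c2*x[n-2]."""
    r = np.exp(-np.pi * bw_hz / fs)
    c1 = 2.0 * r * np.cos(2.0 * np.pi * freq_hz / fs)
    c2 = -r * r
    b = np.array([1.0, -c1, -c2])
    # Truncating the full convolution to len(x) gives a causal FIR filter.
    return np.convolve(x, b)[: len(x)]


def inverse_filter(speech, formants, fs):
    """Cancel each hypothetical (frequency, bandwidth) formant with one anti-resonator."""
    out = np.asarray(speech, dtype=float)
    for freq_hz, bw_hz in formants:
        out = antiresonator(out, freq_hz, bw_hz, fs)
    return out
```

Passing an excitation through `resonator` for, say, formants [(700, 80), (1200, 90)] and then through `inverse_filter` with the same values returns the original excitation. With mismatched formant values, residual formant ripple remains in the estimated source, which is one reason inverse filtering is far from straightforward.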
LF model
The LF model is the most documented voice source model and has been demonstrated to perform at least as well as other voice source models, e.g. Fujisaki & Ljungqvist (1986) or Strik (1998).
LF model
The LF model is usually described by three time-based parameters and one amplitude parameter (EE).
R parameter equations:
$R_a = \frac{T_a}{T_0}$    $R_g = \frac{T_0}{2T_p}$    $R_k = \frac{T_n}{T_p}$
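The three time-based definitions can be sketched as follows. The `LFTiming` container and `r_parameters` helper are hypothetical names, and the code assumes Tn = Te − Tp (the interval from peak flow to the main excitation), which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class LFTiming:
    t0: float  # fundamental period T0 (s)
    tp: float  # instant of peak glottal flow Tp (s)
    te: float  # instant of main excitation Te (s)
    ta: float  # effective duration of the return phase Ta (s)


def r_parameters(t: LFTiming) -> dict:
    """Ra, Rg and Rk from LF timing values."""
    tn = t.te - t.tp  # assumption: Tn = Te - Tp
    return {
        "Ra": t.ta / t.t0,
        "Rg": t.t0 / (2.0 * t.tp),
        "Rk": tn / t.tp,
    }
```

For example, `r_parameters(LFTiming(t0=0.010, tp=0.004, te=0.0055, ta=0.0003))` gives Ra ≈ 0.03, Rg ≈ 1.25 and Rk ≈ 0.375.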
Manual Methods
Matching the model to the source pulse by varying 4 time markers and one amplitude marker.
Automatic Methods
Automatic methods for parameterising the voice source exist, e.g. Strik (1998) and Airas (2008).
However, most parameterisation systems involve marking time instants in the glottal waveform.
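As an illustration of marking time instants, the sketch below locates Tp and Te in one period of the differentiated glottal flow. It is not the method of Strik (1998), Airas (2008) or the authors; it simply assumes that Te is the negative peak of the derivative and that Tp, the instant of peak flow, is the last downward zero crossing of the derivative before Te. `mark_tp_te` is a hypothetical name.

```python
import numpy as np


def mark_tp_te(dflow, fs):
    """Return (tp, te) in seconds from the start of one period.

    Hypothetical marker logic:
    - te: sample of the negative peak of the flow derivative;
    - tp: first sample after the last positive-to-non-positive zero
      crossing that precedes te (the derivative is zero at peak flow).
    """
    d = np.asarray(dflow, dtype=float)
    i_te = int(np.argmin(d))
    # Indices n with d[n] > 0 and d[n + 1] <= 0, i.e. downward zero crossings.
    down = np.where((d[:-1] > 0) & (d[1:] <= 0))[0]
    down = down[down < i_te]
    if down.size == 0:
        raise ValueError("no downward zero crossing before the negative peak")
    i_tp = int(down[-1]) + 1
    return i_tp / fs, i_te / fs
```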
Our Automatic Method
Amplitude Measures (EE and EI)
[Figure: glottal source waveform, amplitude against time (0-180 ms), with the EE and EI measurement points marked (*)]
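A minimal sketch of the two amplitude measures, assuming EE is the magnitude of the negative peak of the differentiated glottal flow and EI is its largest positive value before that peak. `amplitude_measures` is a hypothetical helper that expects exactly one period.

```python
import numpy as np


def amplitude_measures(dflow):
    """Return (EE, EI) for one period of the differentiated glottal flow."""
    d = np.asarray(dflow, dtype=float)
    i_te = int(np.argmin(d))  # negative peak = main excitation
    ee = -float(d[i_te])  # EE reported as a positive magnitude
    # EI: largest positive derivative value before the negative peak
    ei = float(np.max(d[:i_te])) if i_te > 0 else 0.0
    return ee, ei
```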
Amplitude Measures (UP)
UP is measured as the maximum peak of the glottal flow. This process, however, is complicated by the occurrence of zero drift.
Amplitude Measures (UP)
[Figure: two panels of glottal flow amplitude (×10⁻³) against time (0-1000 ms); y-axis ranges -1 to 1 and 0 to 1 respectively]
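One simple way to reduce the effect of zero drift before measuring UP is sketched below. It assumes the flow should return to the same baseline at both ends of the analysed period and removes the straight-line trend between them; this linear correction is an illustrative assumption, not necessarily the correction the authors used, and `up_with_drift_correction` is a hypothetical name.

```python
import numpy as np


def up_with_drift_correction(flow):
    """Return (UP, corrected_flow) for one period of glottal flow.

    Assumption: the flow should sit on the same baseline at the first and
    last sample of the period, so any linear trend between them is drift.
    """
    u = np.asarray(flow, dtype=float)
    if len(u) < 2:
        raise ValueError("need at least two samples")
    n = np.arange(len(u))
    baseline = u[0] + (u[-1] - u[0]) * n / (len(u) - 1)
    corrected = u - baseline
    return float(np.max(corrected)), corrected
```

Without the correction, a drift that rises over the period adds to the apparent peak, so UP would be overestimated.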
R-parameters calculated from Amplitude Measures
Two of the time-based parameters can now be defined in terms of amplitude measures (equations taken from Gobl (2003)).
R parameter equations:
$R_{ka} = \frac{2}{\pi}\,\frac{E_I}{E_E}$    $R_{ga} = \frac{1}{\pi}\,\frac{E_I}{U_P}\,\frac{1}{f_0}$
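The two amplitude-based approximations translate directly into code. `rka` and `rga` are hypothetical names; EE and EI are assumed to be in the same derivative units, UP in flow units and f0 in Hz.

```python
import math


def rka(ee, ei):
    """Rk estimated from amplitudes: (2/pi) * (EI / EE)."""
    return (2.0 / math.pi) * (ei / ee)


def rga(ei, up, f0):
    """Rg estimated from amplitudes: (1/pi) * (EI / UP) / f0."""
    return (1.0 / math.pi) * (ei / up) / f0
```

As a sanity check, if the opening branch of the flow derivative were a half sine, E(t) = EI sin(πt/Tp), the peak flow would be UP = 2·EI·Tp/π, and `rga` would return exactly T0/(2Tp), i.e. Rg.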
Frequency Domain Measure
The final parameter required to describe the LF model is Ra. Ra describes the return phase of the model.
[Figure: FFT of the source waveform, amplitude (dB) against frequency (0-5000 Hz)]
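The slide does not say how Ra is derived from the spectrum. The sketch below only shows two ingredients one might use: a dB magnitude spectrum of a source segment, and the conversion from an estimated corner frequency Fa to Ra via the commonly used LF-model approximation Fa = 1/(2πTa), under which the return phase acts like a first-order low-pass filter. How Fa is estimated from the spectrum is left open, and both function names are hypothetical.

```python
import numpy as np


def spectrum_db(x, fs):
    """Hann-windowed magnitude spectrum of a source segment, in dB."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, 20.0 * np.log10(mag + 1e-12)  # small floor avoids log(0)


def ra_from_fa(fa_hz, f0_hz):
    """Ra = Ta / T0 = Ta * f0, with Ta taken from Fa = 1 / (2*pi*Ta)."""
    ta = 1.0 / (2.0 * np.pi * fa_hz)
    return ta * f0_hz
```

With f0 = 100 Hz and an estimated Fa of 1000 Hz, `ra_from_fa` gives Ra = 1/(20π), roughly 0.016.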
Evaluation
Three speakers were recorded saying an [a] vowel in lax, modal and tense phonation types.
Utterances were inverse filtered manually and then analysed using both methods (manual and automatic).
Sounds were then synthesised from the parameter values obtained with each method.
Amplitude and f0 were kept constant within each pair of synthesised sounds.
Perception Tests
Test 1: Participants listened to 45 groups of three stimuli and chose the synthesised stimulus that sounded most like the original.
Perception Tests
Test 2: Participants again listened to 45 groups of three stimuli and chose which of the synthesised stimuli was repeated.
Perception Tests
Test 1: % = participant preference for the automatic method
Test 2: % = participant ability to discriminate the stimuli

Test   Modal   Tense   Lax   Overall
1      50%     61%     40%   50%
2      55%     67%     73%   65%
Calculating Residual
We designed a method to calculate the residual by comparing the spectra of the original source signal and the parameterised source.
[Figure: spectra of the original source and the parameterised source, amplitude (dB) against frequency (0-5000 Hz)]
Calculating Residual
[Figure: bar charts of residual amplitude for the automatic and manual methods, (A) by voice quality (modal, tense, lax) and (B) by frequency region (0-5, 0-1, 1-3 and 3-5 kHz); significant differences marked with * or **]
Figure: Residual values across voice qualities (A) and across four frequency regions (B). Data expressed as mean ± SEM (independent t-test); * p < 0.05, ** p < 0.01.
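The residual measure is not specified in detail on the slides. As a hypothetical stand-in, the sketch below compares the dB magnitude spectra of an original and a parameterised source pulse and reports the mean absolute difference within each of the four frequency regions used in panel (B). The measure itself, the half-open band edges and the name `band_residuals` are all assumptions.

```python
import numpy as np

# Frequency regions from panel (B), in Hz (half-open intervals assumed).
BANDS_HZ = [(0, 5000), (0, 1000), (1000, 3000), (3000, 5000)]


def band_residuals(orig, model, fs, bands=BANDS_HZ):
    """Hypothetical residual: mean |dB difference| between two spectra per band."""
    orig = np.asarray(orig, dtype=float)
    model = np.asarray(model, dtype=float)
    if orig.shape != model.shape:
        raise ValueError("signals must have the same length")
    freqs = np.fft.rfftfreq(len(orig), d=1.0 / fs)
    eps = 1e-12  # floor to avoid log(0)
    db_orig = 20.0 * np.log10(np.abs(np.fft.rfft(orig)) + eps)
    db_model = 20.0 * np.log10(np.abs(np.fft.rfft(model)) + eps)
    diff = np.abs(db_orig - db_model)
    out = {}
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        out[(lo, hi)] = float(np.mean(diff[mask]))
    return out
```

A model that matches the original exactly gives zero residual in every band; a model scaled by a factor of two gives about 6 dB everywhere.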
Conclusions
Our automatic parameterisation method performed at least as well as the manual method in modal to tense phonation modes.
For lax voice qualities the manual method performed slightly better, although quality was generally poorer with both methods.
Directions in Parameterisation
Utilise frequency domain information more.
Include a method of analysing the noise component in speech, e.g. Gobl (2006).
Further optimise the analysis by using information from previous pulses.
Thanks!
Questions, criticisms, comments on my hair......