IMPROVEMENT OF PROBABILISTIC ACOUSTIC TUBE MODEL FOR SPEECH DECOMPOSITION
Yang Zhang1
Hasegawa-Johnson1
1 Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
2 Department of Electronic Engineering, Tsinghua University, Beijing, China
Motivation
Drawbacks of Current Model-based Methods
• Incomplete: they tend to model only some of the parameters of interest and disregard others that might also be important.
• Speech analysis may be inaccurate or even incorrect:
  • Chicken-and-egg effect;
  • LPC and MFCC corrupted by spectral tilt.
Highlights of PAT2
• A probabilistic generative model that jointly considers all speech parameters;
• Incorporates breathiness and glottal vibration;
• Incorporates phase modeling, and so completely defines a probabilistic model for the complex spectrum of speech;
• Makes U/V states a continuum by introducing a voiced amplitude and an unvoiced amplitude, which is closer to the nature of speech.
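PAT2 Signal Modeling
The Source Filter Model with Mixed Excitation
[Block diagram: an impulse train passes through the glottal filter to give the voiced excitation, which is mixed with the unvoiced excitation using amplitudes $a_t$ and $b_t$ and passed through the vocal tract filter. Voiced speech: $a_t$ usually dominates $b_t$. Unvoiced speech: $a_t = 0$.]
$$S_t(\omega) = \left[ a_t V_t(\omega) + b_t U_t(\omega) \right] H_t(\omega) \circledast W_t(\omega)$$
$$V_t(\omega) = G_t(\omega)\, e^{-j\omega\tau_t} \sum_k \delta(\omega - k\omega_{0t})$$
Glottal Filter
$$G_t(\omega) = \left( 1 + g_{1t} \cos\beta_t\, e^{-j\omega} + g_{1t}^2\, e^{-2j\omega} \right)^{-1} \left( 1 + g_{2t}\, e^{-j\omega} \right)^{-1}$$
Here $g_{1t}$ and $\beta_t$ are the magnitude and phase of the anti-causal poles.
Vocal Tract Filter
$$H_t(\omega) = \exp\left( \sum_{n=1}^{K} h(n) \exp\left( -j\, m(\omega)\, n \right) \right)$$
Here $h(n)$ are the mel-frequency complex cepstral coefficients. When the negative sign and group delay are removed, the complex cepstrum decays at the rate of $1/n$.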
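The signal model above can be sketched numerically for a single frame. The following NumPy sketch is illustrative only and is not the authors' implementation. The function names (`mel_warp`, `glottal_filter`, `vocal_tract_filter`, `frame_spectrum`) are hypothetical. The mel formula, the Hann window, the nearest-bin approximation of the impulse comb, and complex Gaussian noise for the unvoiced excitation are all assumptions. The convolution with the window spectrum is carried out as a multiplication in the time domain, which leaves the overall scaling convention unspecified.

```python
import numpy as np

def mel_warp(omega, fs):
    """Map omega in [0, pi] to a mel-warped frequency in [0, pi] (assumed mel formula)."""
    f = omega * fs / (2.0 * np.pi)
    mel = 2595.0 * np.log10(1.0 + f / 700.0)
    mel_max = 2595.0 * np.log10(1.0 + (fs / 2.0) / 700.0)
    return np.pi * mel / mel_max

def glottal_filter(omega, g1, beta, g2):
    """G(w) = 1 / [(1 + g1 cos(beta) e^{-jw} + g1^2 e^{-2jw}) (1 + g2 e^{-jw})]."""
    z = np.exp(-1j * omega)
    pole_pair = 1.0 + g1 * np.cos(beta) * z + g1**2 * z**2  # anti-causal pole pair
    single_pole = 1.0 + g2 * z                               # causal pole
    return 1.0 / (pole_pair * single_pole)

def vocal_tract_filter(omega, h, fs):
    """H(w) = exp(sum_{n=1..K} h[n] exp(-j m(w) n)), with m(w) the mel-warped frequency."""
    h = np.asarray(h)
    n = np.arange(1, len(h) + 1)
    m = mel_warp(omega, fs)
    return np.exp(np.exp(-1j * np.outer(m, n)) @ h)

def frame_spectrum(a, b, f0, tau, g1, beta, g2, h, fs=16000, n_fft=512, rng=None):
    """One frame of S = [(a V + b U) H] convolved with the window spectrum."""
    rng = np.random.default_rng(0) if rng is None else rng
    omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    # Impulse comb at harmonics k >= 1, snapped to the nearest DFT bin (an approximation).
    omega0 = 2.0 * np.pi * f0 / fs
    k = np.arange(1, int(np.pi / omega0) + 1)
    comb = np.zeros(omega.shape)
    comb[np.round(k * omega0 * n_fft / (2.0 * np.pi)).astype(int)] = 1.0
    V = glottal_filter(omega, g1, beta, g2) * np.exp(-1j * omega * tau) * comb
    # Unvoiced excitation: unit-variance complex Gaussian noise (assumption).
    U = (rng.standard_normal(omega.shape) + 1j * rng.standard_normal(omega.shape)) / np.sqrt(2.0)
    H = vocal_tract_filter(omega, h, fs)
    pre_window = (a * V + b * U) * H
    # Circular convolution with the window spectrum == windowing in the time domain.
    window = np.hanning(n_fft)
    return np.fft.rfft(np.fft.irfft(pre_window, n=n_fft) * window)

# Example: a mostly voiced frame with a little breathiness (illustrative values only).
S = frame_spectrum(a=1.0, b=0.1, f0=120.0, tau=3.0, g1=1.2, beta=0.5, g2=-0.5,
                   h=np.array([0.3, -0.1]))
```

Setting `a=0` gives a purely unvoiced frame, and varying the ratio of `a` to `b` moves continuously between voiced and unvoiced speech.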
Experimental Results
[Figure panels: Voiced Reconstruction; Voiced Reconstruction – Single Frame; GCI Location Estimation; Vocal Tract Filter Estimation; Voiced vs Whispered. Plot labels: Original Spectrogram, Voiced Reconstruction, Real Spectrum, Imaginary Spectrum, PAT2, MFCC.]
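PAT2 Probabilistic Modeling
Convert to DFT and Vectorize
$$\boldsymbol{s}_t = a_t \boldsymbol{\xi}_t + b_t \boldsymbol{\eta}_t, \qquad \boldsymbol{\eta}_t \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{H}_t)$$
Convert to Mel Frequency
$$\tilde{\boldsymbol{s}}_t = \boldsymbol{F}\boldsymbol{s}_t = a_t \boldsymbol{F}\boldsymbol{\xi}_t + b_t \boldsymbol{F}\boldsymbol{\eta}_t$$
$$\tilde{\boldsymbol{s}}_t \sim \mathcal{N}\left( a_t \boldsymbol{F}\boldsymbol{\xi}_t,\ b_t^2\, \boldsymbol{F}\boldsymbol{H}_t\boldsymbol{F}^T \right)$$
Add Prior
$$P_{\theta_t \mid \theta_{t-1}}(u \mid v) \propto \exp\left( -\frac{(u - v)^2}{\sigma_\theta^2} \right)$$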
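The two ingredients of the probabilistic model, the Gaussian likelihood of the mel-domain spectrum and the smoothness prior on a parameter track, can be evaluated as in the sketch below. It is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Gaussian is treated as circularly symmetric complex, and the covariance is assumed to be positive definite so that a Cholesky factorization exists.

```python
import numpy as np

def mel_log_likelihood(s_mel, a, b, F, xi, H):
    """log N(s_mel; a F xi, b^2 F H F^T) for a circular complex Gaussian (assumption)."""
    mean = a * (F @ xi)
    cov = b**2 * (F @ H @ F.T)      # assumed real, symmetric, positive definite
    r = s_mel - mean
    L = np.linalg.cholesky(cov)     # cov = L L^T
    z = np.linalg.solve(L, r)       # whitened residual
    quad = np.vdot(z, z).real       # r^H cov^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -len(r) * np.log(np.pi) - logdet - quad

def log_smoothness_prior(theta, sigma):
    """Sum over frames of -(theta_t - theta_{t-1})^2 / sigma^2, up to an additive constant."""
    d = np.diff(np.asarray(theta, dtype=float))
    return -np.sum(d**2) / sigma**2
```

The prior penalizes frame-to-frame jumps in a parameter, so a smaller `sigma` enforces a smoother track.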
Analysis with MAP
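One way to write the MAP objective implied by the likelihood and prior above is given below. It is a sketch rather than the authors' exact formulation. Here $\tilde{\boldsymbol{s}}_t$ denotes the mel-domain spectrum $\boldsymbol{F}\boldsymbol{s}_t$, $\Theta$ is an assumed symbol collecting all per-frame parameters, the inner sum runs over each parameter $\theta$ that carries a smoothness prior, and additive constants are dropped.

```latex
\hat{\Theta} = \arg\max_{\Theta} \sum_t \Big[
  \log \mathcal{N}\big( \tilde{\boldsymbol{s}}_t ;\ a_t \boldsymbol{F}\boldsymbol{\xi}_t ,\ b_t^2\, \boldsymbol{F}\boldsymbol{H}_t\boldsymbol{F}^T \big)
  - \sum_{\theta} \frac{(\theta_t - \theta_{t-1})^2}{\sigma_\theta^2}
\Big]
```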
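Notation for the signal model:
• $S_t(\omega)$: speech
• $V_t(\omega)$: voiced excitation
• $U_t(\omega)$: unvoiced excitation
• $H_t(\omega)$: vocal tract transfer function
• $G_t(\omega)$: glottal transfer function
• $W_t(\omega)$: windowing function
• $\delta(\cdot)$: Dirac delta function
• $\tau_t$: group delay
• $\omega_{0t}$: fundamental frequency
• $g_{2t}$: magnitude of the causal pole
• $n$: quefrency
• $m(\omega)$: mel-frequency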