Speech Cepstrum procedure
For FFT-based Mel Cepstrum:
Speech → Pre-emphasis → Framing → Windowing → FFT Spectrum → Mel Spectrum → Mel Cepstrum
OR
For LPC-based Mel Cepstrum:
Speech → Pre-emphasis → Framing → Windowing → LPC → LPC Spectrum → Mel Spectrum → Mel Cepstrum
Pre-emphasis/Framing/Windowing
Pre-emphasis → Framing → Windowing
• Pre-emphasis: $H(z) = 1 + a z^{-1}$, $a = -0.95$. Used to boost the signal spectrum by approximately 20 dB/decade.
• Framing: 10-20 ms frames
• Windowing: Hamming window, $w(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$. Reduces the edge effect when taking the FFT. Improves spectral estimate accuracy.
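The three front-end steps above can be sketched in a few lines of plain Python. This is a minimal sketch, not the presenters' implementation: the 8 kHz sampling rate, the 10 ms hop, and the helper names (`pre_emphasize`, `hamming`, `frames`, `windowed_frames`) are assumptions introduced here for illustration. Only the filter $H(z) = 1 + a z^{-1}$ with $a = -0.95$, the Hamming window, and the 10-20 ms frame length come from the slide.

```python
import math

def pre_emphasize(x, a=-0.95):
    """Apply H(z) = 1 + a z^-1, i.e. y[n] = x[n] + a * x[n-1] (x[-1] taken as 0)."""
    return [x[n] + (a * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def hamming(N):
    """w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)) for n = 0..N-1 (requires N > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

def frames(x, frame_len, hop):
    """Split x into overlapping frames; a trailing partial frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]

def windowed_frames(x, fs=8000, frame_ms=20, hop_ms=10):
    """Pre-emphasize, frame, and window a signal.

    fs and hop_ms are assumed values; the slide only gives the frame length.
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    w = hamming(frame_len)
    y = pre_emphasize(x)
    return [[s * wn for s, wn in zip(f, w)] for f in frames(y, frame_len, hop)]
```

With these assumed defaults, each frame holds 160 samples and consecutive frames overlap by half.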
LPC Spectrum
Frame (time domain) → LPC → LP coefficients $[a_i]$ → frequency response $|\cdot|^2$ → LPC Spectrum
$$H(z) = \frac{G}{1 + \sum_{i=1}^{M} a_i z^{-i}}$$
• Reduces the spectrum variance
• Eliminates various source information
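One way to obtain the LP coefficients and evaluate the squared frequency response is sketched below. The slide does not say how the coefficients are computed. The autocorrelation method with the Levinson-Durbin recursion is an assumption here, as are the function names and the common choice $G = \sqrt{\text{prediction error}}$.

```python
import cmath
import math

def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a[0..order] with a[0] = 1, so A(z) = 1 + sum_i a_i z^-i.

    Returns (a, err), where err is the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def lpc_power_spectrum(a, gain, n_fft):
    """|H|^2 = G^2 / |A(e^{jw})|^2 at the bins k = 0..n_fft/2."""
    spec = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        A = sum(a[i] * cmath.exp(-1j * w * i) for i in range(len(a)))
        spec.append(gain * gain / abs(A) ** 2)
    return spec
```

A typical call chain would be `r = autocorr(frame, M)`, then `a, err = levinson_durbin(r, M)`, then `lpc_power_spectrum(a, math.sqrt(err), n_fft)`. The model order `M` is left to the user, since the slide does not specify one.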
Mel Spectrum
FFT or LPC Spectrum $S(k)$ → Mel Filter Bank
Mel Filter bank
[Figure: Mel scale filter bank. Gain (0 to 1) versus frequency (0 to 4000 Hz).]
# samples = # filters in filter bank
Mel Filter bank frequency table
$M_l(k)$, $l = 0, 1, \ldots, L-1$, $k = 0, 1, \ldots, N/2$
Center frequency (frequency axis up to 4 kHz):
• $l = 1$: 100 Hz
• $l = 18$: 3031 Hz
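The two center frequencies that survive from the table (100 Hz for l = 1 and 3031 Hz for l = 18) are consistent with one common filter-bank layout: ten filters spaced linearly by 100 Hz up to 1 kHz, then five filters per octave above that. That layout is an inference from those two values, not something the slides state, and the function name and parameters below are hypothetical.

```python
def center_frequencies(n_filters=20, lin_step_hz=100.0, n_linear=10,
                       filters_per_octave=5):
    """Center frequencies in Hz for filters l = 1..n_filters.

    Assumed layout: linear spacing for the first n_linear filters,
    then a constant ratio of 2 ** (1 / filters_per_octave) between neighbours.
    """
    ratio = 2.0 ** (1.0 / filters_per_octave)
    centers = []
    for l in range(1, n_filters + 1):
        if l <= n_linear:
            centers.append(lin_step_hz * l)
        else:
            centers.append(lin_step_hz * n_linear * ratio ** (l - n_linear))
    return centers
```

Under these assumptions the list starts at 100 Hz, passes through about 3031 Hz at l = 18, and ends near 4 kHz at l = 20.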
Mel Filter bank
N/2 points → weighted sums (∑, one per filter) → L = 20 points, L < N
• Multiply the power spectrum by each of the triangular Mel weighting filters and add the result
• This performs a weighted averaging around each Mel frequency, similar to a weighted Daniell periodogram
Mel Filter bank
$$\tilde{S}(l) = \sum_{k=0}^{N/2} S(k)\, M_l(k), \quad l = 0, 1, \ldots, L-1$$
• $\tilde{S}(l)$: Mel spectrum
• $S(k)$: original spectrum; bin $k$ corresponds to frequency $k f_s / N$ Hz
• $M_l(k)$: the $l$-th filter from the filter bank
• $L$: total number of triangular Mel weighting filters (20)
• $N/2$: half the FFT size
• Will get the whole range of frequencies, but only $L$ samples
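The weighted sum over the filter bank can be sketched as follows. This is a minimal illustration: the triangle shape with unit peak gain matches the filter-bank plot, but the edge handling and the function names (`triangular_filter`, `mel_spectrum`) are assumptions made here.

```python
def triangular_filter(lo_hz, center_hz, hi_hz, n_fft, fs):
    """Weights M_l(k) for k = 0..n_fft/2; bin k sits at k * fs / n_fft Hz.

    The triangle rises from lo_hz to a gain of 1 at center_hz and falls
    back to 0 at hi_hz.
    """
    weights = []
    for k in range(n_fft // 2 + 1):
        f = k * fs / n_fft
        if lo_hz < f <= center_hz:
            weights.append((f - lo_hz) / (center_hz - lo_hz))
        elif center_hz < f < hi_hz:
            weights.append((hi_hz - f) / (hi_hz - center_hz))
        else:
            weights.append(0.0)
    return weights

def mel_spectrum(power_spectrum, filter_bank):
    """Weighted sum of S(k) * M_l(k) over k = 0..N/2, one value per filter l."""
    return [sum(s * m for s, m in zip(power_spectrum, filt))
            for filt in filter_bank]
```

For example, with `fs = 8000` and `n_fft = 16` the bins fall every 500 Hz, so a triangle spanning 0 to 2000 Hz with its center at 1000 Hz has weights 0.5, 1.0, 0.5 at 500, 1000, and 1500 Hz.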
Mel spectrum plots
[Figure: four panels, each plotted against frequency (0 to 4000 Hz): FFT based Spectrum (amplitude), LPC based Spectrum (amplitude), Mel-FFT based Spectrum (normalized amplitude), Mel-LPC based Spectrum (normalized amplitude).]
• The bandwidth (BW) of the peak expands because of the loss of resolution in the Mel scale
• Peaks that are not located exactly at the center Mel frequency of their corresponding filter may move to center on the Mel frequency, e.g. a 110 Hz peak moves to 100 Hz
Mel spectrum plots
[Figure: four panels, each plotted against frequency (0 to 4000 Hz): FFT based Spectrum (amplitude), LPC based Spectrum (amplitude), Mel-FFT based Spectrum (normalized amplitude), Mel-LPC based Spectrum (normalized amplitude).]
• Loss of resolution
• Maintains prominent peaks
Mel spectrum plots
[Figure: four panels, each plotted against frequency (0 to 4000 Hz): FFT based Spectrum (amplitude), LPC based Spectrum (amplitude), Mel-FFT based Spectrum (normalized amplitude), Mel-LPC based Spectrum (normalized amplitude).]
• Small peaks close to large ones are lumped into one main peak
Speech Cepstrum parameterization
Mel Spectrum → Log() → DCT → Mel Cepstrum
DCT Equation:
$$c(i) = \frac{2}{L} \sum_{m=1}^{L} \log\big(\tilde{S}(m)\big) \cos\left(\frac{\pi i}{L}(m - 0.5)\right), \quad i = 0, 1, \ldots, C-1$$
• $C$: number of cepstral coefficients desired
• $L$: total number of triangular Mel weighting filters (20)
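The log and DCT steps can be sketched directly from the DCT equation. The scale factor 2/L is read from the damaged equation and should be treated as an assumption. The `floor` argument, which guards against taking the log of zero, and the function name `mel_cepstrum` are also additions made here for illustration.

```python
import math

def mel_cepstrum(mel_spec, n_ceps, floor=1e-12):
    """c(i) = (2/L) * sum_{m=1}^{L} log(S~(m)) * cos(pi * i / L * (m - 0.5)).

    mel_spec holds the L Mel spectrum values S~(1)..S~(L); n_ceps is C,
    the number of cepstral coefficients desired. Values below `floor` are
    clamped before the log (an assumed safeguard, not from the slides).
    """
    L = len(mel_spec)
    log_spec = [math.log(max(s, floor)) for s in mel_spec]
    return [
        (2.0 / L) * sum(log_spec[m - 1] * math.cos(math.pi * i / L * (m - 0.5))
                        for m in range(1, L + 1))
        for i in range(n_ceps)
    ]
```

A flat Mel spectrum puts all of its energy into c(0) and leaves the higher coefficients at zero, which gives a quick sanity check for an implementation.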
Mel Scaled Cepstral Coefficients
[Figure: Mel Scaled Cepstral Coeff. Normalized amplitude (-1 to 0.5) versus coefficient number (2 to 14), with FFT based and LPC based curves.]
Why the DCT?
• The signal is real with mirror symmetry
• The IFFT requires complex arithmetic
• The DCT does NOT
• The DCT implements the same function as the FFT more efficiently by taking advantage of the redundancy in a real signal
• The DCT is more efficient computationally
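The relationship behind these points can be checked numerically. A DCT of N real samples gives the same numbers as a 2N-point DFT of the mirrored sequence, once a half-sample phase shift is removed. The sketch below uses direct O(N²) sums for clarity, so it demonstrates the equivalence rather than the speed advantage. The unnormalized DCT-II form and the function names are assumptions made here.

```python
import cmath
import math

def dct_ii(x):
    """Unnormalized DCT-II: X_k = sum_n x_n cos(pi * k * (n + 0.5) / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

def dct_via_mirrored_dft(x):
    """Same values via a 2N-point DFT of [x, reversed(x)] (complex arithmetic)."""
    N = len(x)
    y = list(x) + list(reversed(x))          # mirror-symmetric extension
    out = []
    for k in range(N):
        Y = sum(y[n] * cmath.exp(-2j * math.pi * k * n / (2 * N))
                for n in range(2 * N))
        # Removing the half-sample phase leaves a purely real value, 2 * X_k.
        out.append(0.5 * (cmath.exp(-1j * math.pi * k / (2 * N)) * Y).real)
    return out
```

Both functions return the same coefficients up to rounding error, but the DCT route uses only real arithmetic on half as many samples.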
Conclusions
• LPC approximates speech linearly at all frequencies
• LPC cepstral coefficients (LPCC) are more robust and reliable than LPC alone
• Mel-scaled LPCC and Mel-scaled FFT-CC are robust and also take into account the psychoacoustic properties of the human auditory system