RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Amin Fazel
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology
Wednesday, February 18, 2005
Outline
• Introduction
• Feature based methods
  – MFCC, RCC, CMN, PLP, RASTA
• Mean Normalization Root Cepstral Coefficients
• Experimental Results
  – Experiment 1 – Sharif CSR and TFARSDAT Database
  – Experiment 2 – HTK CSR and AURORA 2 Database
• Summary
Effect of Noise on ASR
• Two phases in most ASR systems
  – Training
  – Operation (testing)
• Mismatch between the two phases reduces accuracy
• Mismatch occurs because of
  – Environment
    • Microphone, babble, distance, transmission channel
  – Speaker
    • Specific speaker: speed, …
    • Various speakers: gender, age, accent, …
Effect of Noise on ASR
• Noise
  – Additive noise
    • Babble, car, subway
    • Exhibition, office, …
  – Convolutional noise
    • Channel, telephone line
    • Microphone effect
    • Distance of speaker to microphone
  – Others
    • Lombard effect, reflections from buildings
• Noise can be stationary or non-stationary
Effect of Noise on ASR
• Simple model
  [Figure: block diagram relating clean speech, convolutional noise, additive noise, and corrupted speech]
• Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
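The simple model can be sketched in a few lines of Python. The diagram itself did not survive, so the ordering used here (channel filtering first, additive noise second, i.e. y[n] = (s * h)[n] + d[n]) is the commonly used form and an assumption, as are the function names and the Gaussian noise.

```python
import random

def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def corrupt(clean, channel, noise_std, seed=0):
    """Assumed model: corrupted = (clean * channel) + additive Gaussian noise."""
    rng = random.Random(seed)
    filtered = convolve(clean, channel)          # convolutional noise
    return [v + rng.gauss(0.0, noise_std) for v in filtered]  # additive noise

clean = [1.0, 0.5, -0.25, 0.0]   # made-up samples
channel = [1.0, 0.3]             # made-up channel impulse response
noisy = corrupt(clean, channel, noise_std=0.05)
```

Setting `noise_std=0.0` leaves only the channel effect, which is the case treated later in the mean-normalization slides.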
Robustness Methods
• Signal
  – Speech enhancement
• Feature
  – Robust feature extraction
• Model
  – Change of the model parameters
  – Model training
[Figure: training phase: speech signal → feature extraction → features → model training → model; testing phase: speech signal → features → model]
Mel-Frequency Cepstral Coefficients (MFCC)
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Take the log of the outputs (for RCC we take a root instead of the log)
• Compute cepstral coefficients using the discrete cosine transform
• Smooth by dropping the higher-order coefficients
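The five steps above can be sketched for a single frame as follows. This is a minimal illustration, not the front end used in the experiments: the mel-scale formula, the filter count, the root value 0.1, the small energy floor, and all function names are assumptions, and the naive DFT is chosen only to keep the sketch dependency-free.

```python
import cmath
import math

def power_spectrum(frame):
    """Magnitude-squared DFT of one frame (naive O(N^2) DFT, bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters, num_bins, sample_rate):
    """Triangular weights whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bin_hz = (sample_rate / 2.0) / (num_bins - 1)
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(num_bins):
            f = k * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def cepstrum(frame, sample_rate, num_filters=8, num_ceps=5, root=None):
    """MFCC when root is None (log compression); RCC when root is e.g. 0.1."""
    spec = power_spectrum(frame)
    bank = triangular_filterbank(num_filters, len(spec), sample_rate)
    # The 1e-10 floor only guards against log(0) for empty bands.
    energies = [sum(w * s for w, s in zip(row, spec)) + 1e-10 for row in bank]
    if root is None:
        compressed = [math.log(e) for e in energies]
    else:
        compressed = [e ** root for e in energies]
    # DCT; keeping only the first num_ceps coefficients does the smoothing.
    p = num_filters
    return [sum(c * math.cos(i * (j + 0.5) * math.pi / p)
                for j, c in enumerate(compressed))
            for i in range(1, num_ceps + 1)]

frame = [0.5 ** t for t in range(64)]        # made-up test frame
mfcc = cepstrum(frame, sample_rate=8000)
rcc = cepstrum(frame, sample_rate=8000, root=0.1)
```

Because the DCT of a constant vanishes for these coefficient indices, the log-compressed coefficients are unchanged when the frame is scaled by a constant gain; the root-compressed ones are not.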
Temporal processing
• Captures the temporal features of the spectral envelope and provides robustness:
  – Delta features: first- and second-order differences; regression
  – Cepstral mean subtraction:
    • Normalizes for channel effects and adjusts for spectral slope
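Delta features by regression could look like the sketch below. The regression window of two frames on each side and the replication of edge frames are assumptions; the slides do not state either, and the function name is hypothetical.

```python
def delta(frames, window=2):
    """Regression-based delta over +/- window frames; edge frames are replicated."""
    num = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Made-up cepstra: two coefficients per frame, changing linearly over time.
cepstra = [[float(t), 2.0 * t] for t in range(6)]
first_order = delta(cepstra)        # delta features
second_order = delta(first_order)   # delta-delta features
```

For a linear trajectory the interior delta values equal the slope, which is a quick sanity check on the regression weights.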
Perceptual Linear Prediction (PLP)
• Compute the magnitude-squared of the Fourier transform
• Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
• Apply compressive nonlinearities
• Compute the discrete cosine transform
• Smooth using autoregressive modeling
• Compute cepstral coefficients using a linear recursion
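The last step, turning autoregressive coefficients into cepstral coefficients, is usually done with the standard recursion sketched here. The sign convention (all-pole model 1 / (1 - Σ a_k z^-k)) and the function name are assumptions; the slides do not spell them out.

```python
def lpc_to_cepstrum(a, num_ceps):
    """Cepstrum c[1..num_ceps] of the all-pole model 1 / (1 - sum_k a_k z^-k).

    `a` holds a_1..a_p as a[0..p-1]; the gain term c[0] is not computed.
    """
    p = len(a)
    c = [0.0] * (num_ceps + 1)
    for n in range(1, num_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k} (k / n) * c_k * a_{n-k}, with 1 <= n - k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Single-pole check: for a_1 = 0.5 the cepstrum is 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], num_ceps=3)
```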
PLP (Cont.)
• Algorithm
  Speech signal → Critical Band Analysis → Equal Loudness Pre-Emphasis → Intensity-Loudness Conversion → Inverse DFT → Find Autoregressive Coefficients → All pole model
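The back half of this chain (intensity-loudness conversion, inverse DFT, autoregressive coefficients) can be sketched as below. The critical-band analysis and equal-loudness pre-emphasis are skipped, the exponent 0.33 is the value usually quoted for PLP and is an assumption here, and all function names are hypothetical.

```python
import math

def intensity_to_loudness(band_energies, exponent=0.33):
    """Cube-root-like amplitude compression (exponent value is an assumption)."""
    return [e ** exponent for e in band_energies]

def autocorrelation_from_spectrum(spectrum, num_lags):
    """Inverse DFT of a one-sided spectrum (bins 0..M), treated as real and even."""
    m = len(spectrum) - 1
    r = []
    for lag in range(num_lags):
        acc = spectrum[0] + spectrum[m] * math.cos(math.pi * lag)
        acc += 2.0 * sum(spectrum[k] * math.cos(math.pi * k * lag / m)
                         for k in range(1, m))
        r.append(acc / (2.0 * m))
    return r

def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_order from autocorrelation r."""
    a = [0.0] * (order + 1)   # a[1..order]; a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k    # updated prediction error
    return a[1:], err

bands = [4.0, 3.0, 2.0, 1.5, 1.0]   # made-up auditory spectrum
loudness = intensity_to_loudness(bands)
r = autocorrelation_from_spectrum(loudness, num_lags=3)
coeffs, residual = levinson_durbin(r, order=2)
```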
RelAtive SpecTrAl (RASTA) Analysis
• Makes PLP (and possibly other short-term-spectrum-based techniques) more robust to linear spectral distortions
• The new spectral estimate is less sensitive to slow variations in the short-term spectrum
• Filters the temporal trajectories of some function of each spectral value to provide more reliable spectral features
  – The filter is usually a bandpass filter that maintains the linguistically important spectral envelope modulations (1-16 Hz)
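A sketch of the temporal filtering, applied to one trajectory (one spectral band followed across frames). The coefficients are the commonly cited RASTA filter, 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1) in causal form; the slides do not give the filter, so treat the numbers and the function name as assumptions.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one temporal trajectory (one band over successive frames)."""
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]   # sums to zero, so DC is rejected
    out = []
    for t in range(len(trajectory)):
        acc = sum(b * trajectory[t - i]
                  for i, b in enumerate(numer) if t - i >= 0)
        if t > 0:
            acc += pole * out[t - 1]      # single-pole recursive part
        out.append(acc)
    return out

# A constant offset in the log spectrum (e.g. a fixed channel) decays to zero.
constant_offset = [1.0] * 2000
filtered = rasta_filter(constant_offset)
```

Because the numerator taps sum to zero, any component that does not change over time is removed after the start-up transient, which is what makes the estimate insensitive to slow variations.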
RASTA (Cont.)
• Algorithm
  Speech signal → Spectral analysis → Bank of compressing static nonlinearities → Bank of linear band-pass filters → Bank of expanding static nonlinearities → Optional processing
RASTA-PLP
• Algorithm
RCC-Mean Normalization
• Root Cepstral Coefficients (RCC)
  – Derived using root compression rather than log compression on the filterbank energies
• Advantages of RCC over MFCC
  – More immune to noise
  – Faster decoding
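\[ e[j] = \sum_{k=0}^{N-1} w_j[k]\,\tilde{S}[k], \quad j = 1, 2, \ldots, P \]
\[ \mathrm{RCC}[i] = \sum_{j=1}^{P} \left(e[j]\right)^{\gamma} \cos\!\left(\frac{i\,(j-0.5)\,\pi}{P}\right), \quad i = 1, 2, \ldots, m \]
(\gamma: root-compression exponent)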
RCC-Mean Normalization
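• Mean normalization
\[ C_{RCC\_MN;i} = C_{y;i} - C_{y;avg}, \quad i = 1, \ldots, N \]
\[ C_{y;avg} = \frac{1}{N} \sum_{i=1}^{N} C_{y;i} \]
• If we approximate root with logarithm
\[ y(n) = s(n) * h(n) \;\Rightarrow\; C_y = C_s + C_h \]
\[ C_{y;avg} = C_{s;avg} + C_{h;avg}, \quad C_{s;avg} \approx 0 \]
\[ C_{RCC\_MN;i} = (C_{s;i} + C_h) - C_h = C_{s;i} \]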
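The cancellation in this derivation can be checked numerically. The sketch assumes the log approximation holds, so the channel appears as a constant additive offset in the cepstral domain; with true root compression the cancellation is only approximate. All numbers are made up and the function name is hypothetical.

```python
def mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of an utterance."""
    count = len(frames)
    dims = len(frames[0])
    avg = [sum(f[d] for f in frames) / count for d in range(dims)]
    return [[f[d] - avg[d] for d in range(dims)] for f in frames]

# Made-up speech cepstra C_{s;i}; each coefficient averages to zero over frames.
speech = [[1.0, -2.0], [3.0, 0.5], [-4.0, 1.5]]
# Made-up constant channel cepstrum C_h.
channel = [0.7, -0.2]
# Observed cepstra C_{y;i} = C_{s;i} + C_h.
observed = [[c + h for c, h in zip(frame, channel)] for frame in speech]

normalized = mean_normalize(observed)   # recovers the speech cepstra
```

Since the speech cepstra in this example have zero mean, the normalized output equals the clean speech cepstra, matching the last line of the derivation.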
Experiment 1
• Database
  – TFARSDAT
    • 64 speakers
    • 8 hours of telephone speech data
• ASR
  – Sharif ASR system
    • HMM based
    • Training: segmental K-means
    • Search: beam Viterbi
Experiment 1
• Test results
Feature      Accuracy (%)   Correctness (%)
MFCC         54.97          59.32
MFCC_CMS     51.62          56.63
RASTA_PLP    58.38          65.59
RCC          55.67          59.85
RCC_MN       56.89          64.31
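The slides do not define the two columns. A common convention, used for example by HTK's scoring tool, counts insertions against accuracy but not against correctness; the sketch below assumes that convention, and the counts in it are made up rather than taken from the experiment.

```python
def correctness(num_ref, deletions, substitutions):
    """Percent of reference units recognized correctly (insertions ignored)."""
    return 100.0 * (num_ref - deletions - substitutions) / num_ref

def accuracy(num_ref, deletions, substitutions, insertions):
    """Like correctness, but insertion errors are also penalized."""
    return 100.0 * (num_ref - deletions - substitutions - insertions) / num_ref

# Made-up error counts for 200 reference units.
corr = correctness(200, deletions=10, substitutions=30)
acc = accuracy(200, deletions=10, substitutions=30, insertions=20)
```

Under this convention accuracy can never exceed correctness, which is consistent with every row of the table.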
Experiment 2
• Aurora 2.0
  – Noisy connected-digit recognition
  – 4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
• HTK
  – HMM based
  – One model for each digit
    • 16 states with 3 Gaussian mixtures
Experiment 2
• Average results on AURORA
  – Averages taken over the various SNRs of each noise type
Experiment 2
• Subway noise in various SNRs
Experiment 2
• Babble noise in various SNRs
Experiment 2
• Car noise in various SNRs
Experiment 2
• Exhibition noise in various SNRs
Summary
• Various robust features were tested
• RCC_MN was introduced
• In the first experiment
  – RASTA-PLP performed best, although RCC_MN also performed well
• In the second experiment
  – RCC_MN performed best
Thanks for your patience!