HIWIRE MEETING
Paris, February 11, 2005
JOSÉ C. SEGURA LUNA
GSTC UGR
2 HIWIRE Meeting – Paris, 11 February, 2005 José C. Segura Luna
Schedule
AURORA 4 HTK-based setup
Baseline results (AURORA databases): MFCC with C0 and CMN; AFE
Additional results: CMVN, HEQ
Work in progress: WP1 (improved HEQ), WP2 (user independence & robustness)
AURORA 4 HTK-based setup
ETSI AURORA 4 evaluation: the baseline system is based on the ISIP speech recognition system.
Main drawbacks: CPU time for experiments (especially for decoding); the scripts are excessively complex to use.
Described in:
N. Parihar and J. Picone, "DSR Front End LVCSR Evaluation - AU/384/02," Aurora Working Group, ETSI, December 06, 2002.
G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task, Version 2.0," ETSI STQ-Aurora DSR Working Group, November 19, 2002.
AURORA 4 HTK-based setup
HTK-based setup for AURORA 4 evaluations
Features: 12 MFCC + C0 (CMS) + Δ + ΔΔ
Cross-word tree-based tied-state triphones: 3 states / 6 Gaussians per state
Back-off bigram language model (same as used in the ISIP setup)
Pruning is performed as in the ISIP setup
Available for partners at: http://www.hiwire.org
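The feature pipeline above (12 MFCCs + C0 with cepstral mean subtraction, plus Δ and ΔΔ) can be sketched with plain NumPy. This is a minimal sketch, assuming HTK-style delta regression with a ±2 frame window (the window size is not stated on the slide); MFCC extraction itself is taken as given:

```python
import numpy as np

def cms(feats):
    """Cepstral mean subtraction: remove the per-utterance mean of each coefficient."""
    return feats - feats.mean(axis=0, keepdims=True)

def deltas(feats, theta=2):
    """HTK-style delta regression over a +/-theta frame window."""
    padded = np.pad(feats, ((theta, theta), (0, 0)), mode="edge")
    num = sum(t * (padded[theta + t:len(feats) + theta + t] -
                   padded[theta - t:len(feats) + theta - t])
              for t in range(1, theta + 1))
    return num / (2 * sum(t * t for t in range(1, theta + 1)))

def make_features(static):
    """static: (frames, 13) array of 12 MFCCs + C0. Returns 39-dim features."""
    s = cms(static)
    d = deltas(s)
    return np.hstack([s, d, deltas(d)])
```

On a linear ramp the interior delta values come out as the slope, which is a quick sanity check for the regression weights.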
AURORA 4 HTK-based setup
Performance comparison (HTK-based setup vs. ISIP). Training clean models from scratch takes 3 h 52 min on a 2.66 GHz CPU.

Features: 12 MFCCs + C0 (CMS) + Δ + ΔΔ

                        Word error rate        Decoding time (s)
                        ISIP     HTK           ISIP               HTK
Test 01 (clean data)    16.2%    13.22%        7580 (6.16×RT)     3428 (2.78×RT)
Test 02 (car noise)     49.6%    24.68%        22195 (18.03×RT)   8002 (6.50×RT)
Test 03 (babble noise)  62.2%    46.00%        33203 (26.9×RT)    13747 (11.17×RT)
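The ×RT figures in the table are simply decoding time divided by the duration of the test audio. A minimal sketch; the audio duration below is not stated on the slide, it is back-derived from the ISIP Test 01 row, so it is an approximation:

```python
def rt_factor(decode_seconds, audio_seconds):
    """Real-time factor: seconds of CPU per second of audio."""
    return decode_seconds / audio_seconds

# Test-set duration implied by the ISIP Test 01 row (7580 s at 6.16xRT):
audio = 7580 / 6.16   # ~1230.5 s of audio (back-derived, approximate)
print(round(rt_factor(3428, audio), 2))
```

The result lands close to the 2.78×RT reported for the HTK setup on Test 01; small differences come from rounding in the slide's figures.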
AURORA 4 Baseline results
Averages are word error rates; the last three columns give the relative error reduction vs. the clean/166/none baseline.

PARAMETERS    TRAIN  TEST  LATTICE  Averages (WER)         Relative Error Reduction
              MODE   SIZE  SIZE     01-07  08-14  01-14    01-07    08-14    01-14
MFCC_0_D_A_Z  clean  166   none     40.53  50.60  45.57    ---      ---      ---
MFCC_0_D_A_Z  clean  166   sml      26.53  33.57  30.05
MFCC_0_D_A_Z  clean  166   mid      27.98  35.02  31.50
MFCC_0_D_A_Z  clean  330   none     40.72  50.78  45.75    -0.47%   -0.36%   -0.40%
MFCC_0_D_A_Z  clean  330   sml      25.75  32.93  29.34
MFCC_0_D_A_Z  clean  330   mid      27.18  34.25  30.71
MFCC_0_D_A_Z  multi  166   none     24.58  29.88  27.23    39.36%   40.96%   40.25%
MFCC_0_D_A_Z  multi  166   sml      17.32  18.87  18.09
MFCC_0_D_A_Z  multi  166   mid      18.83  20.16  19.50
MFCC_0_D_A_Z  multi  330   none     24.74  29.73  27.24    38.97%   41.24%   40.23%
MFCC_0_D_A_Z  multi  330   sml      16.70  17.80  17.25
MFCC_0_D_A_Z  multi  330   mid      18.26  19.33  18.79
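The relative error reduction columns follow directly from the WER averages. A quick check, reproducing the 40.25% figure for the multicondition 01-14 row against the clean baseline:

```python
def rer(wer_baseline, wer_system):
    """Relative error reduction vs. the baseline, in percent."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# Multicondition training vs. the clean 01-14 baseline:
print(round(rer(45.57, 27.23), 2))  # -> 40.25, as in the table
```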
AURORA 4 Additional results
Averages are word error rates; the last three columns give the relative error reduction vs. the clean MFCC baseline.

PARAMETERS                TRAIN  TEST  LATTICE  Averages (WER)         Relative Error Reduction
                          MODE   SIZE  SIZE     01-07  08-14  01-14    01-07    08-14    01-14
MFCC_0_D_A_Z              clean  166   none     40.53  50.60  45.57    ---      ---      ---
MFCC_0_D_A_Z              multi  166   none     24.58  29.88  27.23    39.36%   40.96%   40.25%
MFCC_0_D_A_Z MV           clean  166   none     36.12  48.50  42.31    10.88%   4.15%    7.14%
MFCC_0_D_A_Z MV DELTAS    clean  166   none     34.73  47.35  41.04    14.31%   6.43%    9.94%
AFE                       clean  166   none     27.57  34.99  31.28    31.99%   30.85%   31.36%
AFE noFD                  clean  166   none     27.69  35.26  31.48    31.67%   30.31%   30.92%
AFE noFD                  multi  166   none     22.33  27.67  25.00    44.90%   45.32%   45.13%
ECDF_WSJ_MULTI            clean  166   none     32.81  43.77  38.29    19.06%   13.50%   15.97%
ECDF_TID_MULTI            clean  166   none     31.36  40.87  36.12    22.61%   19.24%   20.74%
ECDF_WSJ_CLEAN            clean  166   none     32.19  42.75  37.47    20.58%   15.53%   17.78%
ECDF_TID_CLEAN            clean  166   none     31.75  41.95  36.85    21.67%   17.09%   19.13%
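CMVN (the MV rows above) extends cepstral mean subtraction with per-utterance variance normalization, so every coefficient ends up zero-mean and unit-variance. A minimal sketch:

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.
    feats: (frames, dim) array of cepstral features."""
    mu = feats.mean(axis=0, keepdims=True)
    sigma = feats.std(axis=0, keepdims=True)
    return (feats - mu) / (sigma + eps)
```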
Baseline results
HIWIRE baseline results: 12 MFCCs + C0 (CMS) + Δ + ΔΔ

AURORA 2 small vocabulary, absolute word accuracy. If an HTK output is WORD: %Corr=99.14, Acc=98.68 [H=……..], the value to enter is 98.68.

Multicondition training, multicondition testing (sets A, B, C):

SNR      Subway  Babble  Car     Exhib.  Avg A   Rest.   Street  Airport Station Avg B   Subw.M  Str.M   Avg C   Avg
Clean    98.46   98.46   98.36   98.73   98.50   98.46   98.46   98.36   98.73   98.50   98.53   98.46   98.50   98.50
20 dB    97.79   97.67   98.21   97.22   97.72   97.21   97.82   97.88   97.41   97.58   97.88   97.49   97.69   97.66
15 dB    97.11   97.13   97.67   97.13   97.26   96.81   96.89   97.02   96.42   96.79   97.05   97.13   97.09   97.04
10 dB    95.52   96.07   96.24   94.17   95.50   95.76   95.41   95.88   94.72   95.44   94.96   95.07   95.02   95.38
5 dB     90.30   90.24   87.41   87.75   88.93   89.75   89.06   90.69   87.87   89.34   90.11   88.18   89.15   89.14
0 dB     69.85   66.02   48.79   65.84   62.63   70.49   62.36   72.62   57.39   65.72   68.81   63.00   65.91   64.52
-5 dB    28.98   28.14   19.21   27.00   25.83   34.45   24.73   33.91   23.82   29.23   28.25   26.45   27.35   27.49
Average  90.11   89.43   85.66   88.42   88.41   90.00   88.31   90.82   86.76   88.97   89.76   88.17   88.97   88.75

Clean training, multicondition testing (sets A, B, C):

SNR      Subway  Babble  Car     Exhib.  Avg A   Rest.   Street  Airport Station Avg B   Subw.M  Str.M   Avg C   Avg
Clean    99.14   99.09   98.99   99.17   99.10   99.14   99.09   98.99   99.17   99.10   99.17   99.12   99.15   99.11
20 dB    96.22   97.64   97.70   96.42   97.00   98.10   97.01   98.03   98.06   97.80   96.53   97.16   96.85   97.29
15 dB    90.70   93.83   92.01   90.40   91.74   95.18   92.17   94.72   93.80   93.97   90.82   91.93   91.38   92.56
10 dB    71.23   79.47   68.77   69.27   72.19   83.60   74.18   84.58   78.28   80.16   71.91   74.21   73.06   75.55
5 dB     38.19   47.28   32.84   34.80   38.28   54.38   42.26   52.94   44.80   48.60   38.07   42.68   40.38   42.82
0 dB     21.40   23.34   19.95   18.45   20.79   26.19   22.52   27.97   23.14   24.96   21.89   22.07   21.98   22.69
-5 dB    13.82   12.48   12.38   10.18   12.22   13.23   12.15   15.39   13.88   13.66   13.79   11.88   12.84   12.92
Average  63.55   68.31   62.25   61.87   64.00   71.49   65.63   71.65   67.62   69.10   63.84   65.61   64.73   66.18
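The note on absolute word accuracy can be made concrete: with H hits, D deletions, S substitutions and I insertions, HTK's %Corr = H/N ignores insertions while Acc = (H-I)/N penalizes them, where N = H+D+S is the number of reference words. The counts below are hypothetical, chosen only to reproduce the %Corr=99.14 / Acc=98.68 example:

```python
def corr_and_acc(H, D, S, I):
    """HTK scoring: N = H + D + S reference words.
    %Corr ignores insertions; Acc subtracts them."""
    N = H + D + S
    return 100.0 * H / N, 100.0 * (H - I) / N

# Hypothetical counts consistent with the slide's example:
corr, acc = corr_and_acc(H=9914, D=50, S=36, I=46)
print(round(corr, 2), round(acc, 2))  # -> 99.14 98.68
```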
Baseline results
AFE front end, AURORA 2 small vocabulary, absolute word accuracy. If an HTK output is WORD: %Corr=99.14, Acc=98.68 [H=……..], the value to enter is 98.68.

Multicondition training, multicondition testing (sets A, B, C):

SNR      Subway  Babble  Car     Exhib.  Avg A   Rest.   Street  Airport Station Avg B   Subw.M  Str.M   Avg C   Avg
Clean    99.08   98.85   99.02   99.38   99.08   99.08   98.85   99.02   99.38   99.08   98.89   98.94   98.92   99.05
20 dB    98.74   98.28   98.78   98.92   98.68   98.50   98.13   98.54   99.07   98.56   98.62   98.25   98.44   98.58
15 dB    98.10   97.88   98.33   98.27   98.15   97.79   97.64   97.82   98.21   97.87   98.10   97.70   97.90   97.98
10 dB    95.64   96.16   97.08   96.17   96.26   95.98   95.83   96.66   96.95   96.36   95.55   95.65   95.60   96.17
5 dB     91.96   91.05   93.80   90.77   91.90   90.70   90.72   92.51   91.79   91.43   90.70   89.09   89.90   91.31
0 dB     77.13   71.10   81.54   76.06   76.46   72.18   75.51   79.54   77.85   76.27   71.11   70.31   70.71   75.23
-5 dB    44.01   35.68   43.15   45.56   42.10   37.36   42.10   46.47   45.57   42.88   35.83   36.10   35.97   41.18
Average  92.31   90.89   93.91   92.04   92.29   91.03   91.57   93.01   92.77   92.10   90.82   90.20   90.51   91.86

Clean training, multicondition testing (sets A, B, C):

SNR      Subway  Babble  Car     Exhib.  Avg A   Rest.   Street  Airport Station Avg B   Subw.M  Str.M   Avg C   Avg
Clean    99.39   99.00   99.28   99.51   99.30   99.39   99.00   99.28   99.51   99.30   99.20   99.24   99.22   99.28
20 dB    98.31   98.16   98.81   98.36   98.41   98.50   97.82   98.75   98.64   98.43   97.91   98.13   98.02   98.34
15 dB    96.90   96.74   98.00   96.91   97.14   95.92   96.55   97.52   97.19   96.80   96.65   96.49   96.57   96.89
10 dB    93.09   92.17   95.97   93.55   93.70   91.80   92.90   94.72   94.72   93.54   92.26   92.17   92.22   93.34
5 dB     85.26   81.47   89.29   84.91   85.23   80.78   84.16   86.04   86.45   84.36   83.42   82.56   82.99   84.43
0 dB     65.34   53.87   69.25   63.84   63.08   56.86   61.09   65.14   65.69   62.20   58.15   57.45   57.80   61.67
-5 dB    32.53   23.45   31.09   31.71   29.70   24.90   29.50   31.94   33.09   29.86   26.90   27.34   27.12   29.25
Average  87.78   84.48   90.26   87.51   87.51   84.77   86.50   88.43   88.54   87.06   85.68   85.36   85.52   86.93
Baseline results
AURORA 3 word error rates
MFCC + C0 (CMS) + Δ + ΔΔ:

             Italian  Spanish  German   Average
Well (×40%)  5.58%    10.69%   8.86%    8.38%
Mid (×35%)   12.98%   16.82%   18.81%   16.20%
High (×25%)  53.25%   34.50%   20.31%   36.02%
Overall      20.09%   18.79%   15.21%   18.03%

AFE:

             Italian  Spanish  German   Average
Well (×40%)  3.29%    3.39%    4.87%    3.85%
Mid (×35%)   7.47%    6.21%    10.40%   8.03%
High (×25%)  11.00%   9.23%    8.70%    9.64%
Overall      6.68%    5.84%    7.76%    6.76%
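The AURORA 3 overall figures are the weighted mix indicated by the ×40/×35/×25 labels on the well-, mid- and high-mismatch rows. A quick check against the Italian column of both tables:

```python
def overall_wer(well, mid, high):
    """AURORA 3 overall WER: weighted mix of the three mismatch conditions."""
    return 0.40 * well + 0.35 * mid + 0.25 * high

print(round(overall_wer(5.58, 12.98, 53.25), 2))  # Italian MFCC -> 20.09
print(round(overall_wer(3.29, 7.47, 11.00), 2))   # Italian AFE  -> 6.68
```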
Work in progress (WP1)
Improved equalization
Modeling Speech & Noise separately
First results with Gaussian models: very promising on AURORA 4; still to be evaluated on AURORA 2 & 3
Next steps: use more detailed / nonparametric models; incorporate dynamic features
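Plain histogram equalization, the starting point for the improved HEQ above, maps each feature component through its empirical CDF onto a reference distribution. A minimal sketch assuming a standard Gaussian reference; the slide's improved method, which models speech and noise separately, is not reproduced here:

```python
import numpy as np
from statistics import NormalDist

def heq(x):
    """Histogram equalization of one feature component:
    map the empirical CDF of x onto a standard normal reference."""
    ranks = np.argsort(np.argsort(x))      # rank of each sample
    p = (ranks + 0.5) / len(x)             # empirical CDF in (0, 1)
    ref = NormalDist()                     # reference distribution (Gaussian assumed)
    return np.array([ref.inv_cdf(pi) for pi in p])
```

The mapping is order-preserving, so it normalizes the distribution of each coefficient without reordering the frames.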
Preliminary results
Averages are word error rates; the last three columns give the relative error reduction vs. the clean MFCC baseline.

PARAMETERS                TRAIN  TEST  LATTICE  Averages (WER)         Relative Error Reduction
                          MODE   SIZE  SIZE     01-07  08-14  01-14    01-07    08-14    01-14
MFCC_0_D_A_Z              clean  166   none     40.53  50.60  45.57    ---      ---      ---
MFCC_0_D_A_Z              multi  166   none     24.58  29.88  27.23    39.36%   40.96%   40.25%
MFCC_0_D_A_Z (MV)         clean  166   none     36.12  48.50  42.31    10.88%   4.15%    7.14%
MFCC_0_D_A_Z (MV DELTAS)  clean  166   none     34.73  47.35  41.04    14.31%   6.43%    9.94%
AFE                       clean  166   none     27.57  34.99  31.28    31.99%   30.85%   31.36%
AFE noFD                  clean  166   none     27.69  35.26  31.48    31.67%   30.31%   30.92%
AFE noFD                  multi  166   none     22.33  27.67  25.00    44.90%   45.32%   45.13%
ECDF_WSJ_MULTI            clean  166   none     32.81  43.77  38.29    19.06%   13.50%   15.97%
ECDF_TID_MULTI            clean  166   none     31.36  40.87  36.12    22.61%   19.24%   20.74%
ECDF_WSJ_CLEAN            clean  166   none     32.19  42.75  37.47    20.58%   15.53%   17.78%
ECDF_TID_CLEAN            clean  166   none     31.75  41.95  36.85    21.67%   17.09%   19.13%
CLASIF N20 ref01          clean  166   none     28.29  33.87  31.08    30.19%   33.06%   31.79%
Work in progress (WP1)
VAD & Noise reduction
Baseline evaluations: AURORA 2 & 3 already done; AURORA 4 to be ready by June
Integration with parametric techniques: speech & noise equalization
Work in progress (WP2)
HEQ-based user robustness
Ready for AURORA 4; working on the WSJ1 baseline
HEQ-based user adaptation
MLLR baseline; estimation of MLLR transformations using HEQ; working on the WSJ1 baseline
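MLLR adapts the Gaussian means with an affine transform μ̂ = Aμ + b. Below is a least-squares sketch of estimating W = [A b] from mean/target pairs; the real MLLR estimate is occupancy-weighted and covariance-aware, which is omitted here, and the function names are illustrative:

```python
import numpy as np

def estimate_mllr_means(means, targets):
    """Least-squares fit of W = [A b] such that A @ mu + b ~ target.
    means, targets: (n_gaussians, dim) arrays of original and adapted means."""
    n, d = means.shape
    ext = np.hstack([means, np.ones((n, 1))])   # extended means [mu; 1]
    W, *_ = np.linalg.lstsq(ext, targets, rcond=None)
    return W.T                                  # (d, d+1): [A b]

def apply_mllr(W, mu):
    """Apply the estimated transform to one mean vector."""
    return W[:, :-1] @ mu + W[:, -1]
```

With enough Gaussians (more than dim+1), an exactly affine relation between means and targets is recovered exactly, which is a useful unit test for the estimator.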
HIWIRE MEETING
Paris, February 11, 2005
JOSÉ C. SEGURA LUNA
GSTC UGR