Speech recognition in MUMIS
Eric Sanders (KUN)
March 2003
People involved at KUN
Helmer Strik
Judith Kessens
Mirjam Wester
Janienke Sturm
Eric Sanders
Febe de Wet
Paul Tielen
Overview
Speech data
Baseline recognition
Adding data
Noise robustness
Word types
Conclusions
Examples of Data
Dutch: "op _t ogenblik wordt in dit stadion de opstelling voorgelezen" ("at the moment the line-up is being read out in this stadium")
English: "and they wanna make the change before the corner"
German: "und die beiden Tore die die Hollaender bekommen hat haben" ("and the two goals that the Dutch have conceded", with a disfluent self-correction)
From Yugoslavia – The Netherlands
Speech Data
All data
Language      Dutch    English     German
# matches         6          3         21
# words      40,296     34,684    127,265
Speech Data
Test data (# words)
Match                           Dutch    English    German
Yugoslavia – The Netherlands    5,922     10,188     3,998
England – Germany               5,798     13,488     7,280
Baseline recognition
Phone models (PMs): trained on the other test match
Lexicon: based on the other test set, with match-specific words added
Language model (LM): category LM, based on the other test match, with match-specific words added
Baseline recognition
[Chart: WER (%) per test match (Yug-NL, Eng-Ger) and language (Dutch, German, English); WERs range from 83.28% to 93.16%]
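The word error rates reported here follow the standard definition: the number of substitutions, deletions and insertions in the Levenshtein alignment of hypothesis against reference, divided by the reference length. A minimal sketch (illustrative only, not the MUMIS scoring tool; the example reuses the English fragment from the data slide):

```python
# Illustrative WER computation via Levenshtein alignment (not the MUMIS scorer).
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution / match
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
            )
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("and they wanna make the change", "and they want to make change"))  # 50.0
```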
Adding Data
Extra training data: Dutch = 4 matches, German = 19 matches, English = 1 match
Adding training data to train the lexicon and the language models (phone models trained on 1 match)
Adding Data (German)
[Chart: WER (%) vs. number of words used to train the LM (0 to 300,000); curves for Yug-NL with a lexicon from 1, 7 and 19 matches, and for Eng-Ger with a lexicon from 7 and 19 matches]
Noise Robustness: Dutch, English, German
Noise Robustness
[Chart: WER (%) vs. SNR (dB, 0 to 30) for each test match and language track: YugNL_NL, EngGer_NL, YugNL_ENG, EngGer_ENG, YugNL_GER (A), YugNL_GER (B), Eng-Ger_GER]
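The SNR values on the x-axis follow the usual definition: ten times the base-10 log of the ratio between signal power and noise power. A minimal sketch, assuming signal-only and noise-only segments are available separately (illustrative, not the MUMIS measurement code):

```python
import math

# Illustrative SNR computation (assumed, not from MUMIS): average power of a
# speech segment vs. a noise-only segment, expressed in decibels.
def snr_db(signal, noise):
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# A signal with 10x the noise amplitude has 100x the power, i.e. 20 dB SNR.
print(round(snr_db([1.0, -1.0] * 100, [0.1, -0.1] * 100), 1))  # 20.0
```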
Noise Robustness
Possible solutions:
Matching acoustic properties of train and test material
Training SNR-dependent phone models
Applying noise-robust feature extraction: Histogram Normalisation & FTNR
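Histogram Normalisation can be sketched as quantile mapping: each test feature value is passed through the empirical CDF of the test data and the inverse CDF of the (clean) training data, so that the test features follow the training distribution. A crude one-dimensional sketch (assumed, not the MUMIS implementation; in practice this is applied per feature dimension):

```python
import bisect

# Illustrative histogram normalisation by quantile mapping (assumed sketch,
# not the MUMIS implementation).
def histogram_normalise(test_values, train_values):
    train_sorted = sorted(train_values)
    test_sorted = sorted(test_values)
    n = len(test_sorted)
    out = []
    for x in test_values:
        # midpoint-rank estimate of the test-data CDF at x
        cdf = (bisect.bisect_right(test_sorted, x) - 0.5) / n
        # inverse CDF (quantile) of the training data
        idx = min(int(cdf * len(train_sorted)), len(train_sorted) - 1)
        out.append(train_sorted[idx])
    return out

# Noisy test features are mapped onto clean training quantiles.
print(histogram_normalise([10, 11, 12, 13], [0, 1, 2, 3]))  # [0, 1, 2, 3]
```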
Noise Robustness: Yug-NL, very noisy
[Chart: WER (%) for the semi-clean, noisy and very noisy conditions, comparing Baseline, HN, and HN + FTNR]
Word Types
Not all words are equally important for an information retrieval task
Categories:
- function words (prepositions, pronouns)
- application-specific words (player names)
- other content words
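The category split can be sketched as a simple lexicon lookup before scoring each category separately. The word lists below are assumed examples, not the MUMIS lexica:

```python
# Illustrative category assignment (assumed word lists, not the MUMIS lexica).
FUNCTION_WORDS = {"the", "a", "in", "on", "of", "they", "it"}   # assumed
PLAYER_NAMES = {"kluivert", "seedorf", "shearer", "bierhoff"}   # assumed

def categorise(word: str) -> str:
    w = word.lower()
    if w in PLAYER_NAMES:
        return "player name"
    if w in FUNCTION_WORDS:
        return "function word"
    return "content word"

print([categorise(w) for w in "Kluivert scores in the box".split()])
# ['player name', 'content word', 'function word', 'function word', 'content word']
```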
Word Types
[Chart: "WERs for different categories": WER (%) per language (NL, Ger, Eng) for the Yug-NL and Eng-Ger matches, with bars for all words, content words, function words and player names]
Conclusions
SNR values explain the WERs to a large extent
More data is not necessarily better
Applying noise-robust features leads to the best results
Overall WERs are very high, but application-specific words are recognised relatively well
The end