Duration modeling for speech recognition
Presented for BBN
Dr. Andrey Nikiforov
Department of Applied Mathematics and Statistics
State University of New York at Stony Brook
Additional topics
Computational and modeling issues in improving the performance of speech recognition algorithms
• Partial classification techniques
• Tree-dependence covariance models in HMM
• Fast search and computations for codebooks
• Interpolation for acoustic space
State duration in HMM
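As background for this slide: in a conventional HMM with a constant self-transition probability, the number of frames spent in a state follows a geometric distribution. The sketch below illustrates that fact only; the name `a_ii` and the value 0.8 are illustrative assumptions, not taken from the slides.

```python
def geometric_duration_pmf(a_ii, max_d):
    """P(D = d) for d = 1..max_d when the state keeps a constant
    self-loop probability a_ii (hypothetical parameter name)."""
    return [(a_ii ** (d - 1)) * (1.0 - a_ii) for d in range(1, max_d + 1)]


def expected_duration(a_ii):
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - a_ii)


# Illustrative value only: a state with self-loop probability 0.8.
pmf = geometric_duration_pmf(0.8, 200)
```

The probability mass is always largest at a duration of one frame and decays monotonically, which is the limitation that explicit duration modeling addresses.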
Duration distributions
Duration probability density functions (plotted against time):
• Exponential
• Rayleigh
• Weibull
• Normal
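The densities listed above can be written down in a few lines. The sketch below uses the standard textbook parameterizations; the parameter names (`k`, `lam`, `rate`, `sigma`, `mu`) are assumptions, since the slides do not give formulas. It also shows that the exponential and Rayleigh densities are special cases of the Weibull family (shape 1 and shape 2).

```python
import math


def weibull_pdf(t, k, lam):
    """Weibull density with shape k >= 1 and scale lam > 0, for t >= 0."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def weibull_hazard(t, k, lam):
    """Weibull hazard f(t) / (1 - F(t)); constant for k = 1, linear for k = 2."""
    return (k / lam) * (t / lam) ** (k - 1)


def exponential_pdf(t, rate):
    return rate * math.exp(-rate * t)


def rayleigh_pdf(t, sigma):
    return (t / sigma ** 2) * math.exp(-(t ** 2) / (2.0 * sigma ** 2))


def normal_pdf(t, mu, sigma):
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

For example, `weibull_pdf(t, 1.0, 1.0 / rate)` matches `exponential_pdf(t, rate)`, and `weibull_pdf(t, 2.0, sigma * math.sqrt(2))` matches `rayleigh_pdf(t, sigma)`.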
From …
… to
Progressive model
Time calculation
[Diagram: labels A and B at times t and t+1]
Time calculation (continued)
[Diagram: labels A and B at times t and t+1]
Probability calculations: from …
…to
Hazard function
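For a duration measured in whole frames, the hazard at duration d is the probability that the state ends after exactly d frames, given that it has lasted at least d frames. The sketch below uses this standard discrete-time definition (the slide's own formula is not recoverable from the text) and converts between a duration distribution and its hazard function. The function names are hypothetical.

```python
def pmf_to_hazard(pmf):
    """pmf[d-1] = P(D = d). Returns h with h[d-1] = P(D = d | D >= d)."""
    # Tail sums P(D >= d), accumulated from the longest duration backwards.
    tails = []
    tail = 0.0
    for p in reversed(pmf):
        tail += p
        tails.append(tail)
    tails.reverse()
    return [p / s if s > 0.0 else 1.0 for p, s in zip(pmf, tails)]


def hazard_to_pmf(hazard):
    """Inverse mapping: P(D = d) = h(d) * prod_{j < d} (1 - h(j))."""
    pmf = []
    still_in_state = 1.0  # probability of having lasted at least d frames
    for h in hazard:
        pmf.append(still_in_state * h)
        still_in_state *= 1.0 - h
    return pmf


hazard = pmf_to_hazard([0.1, 0.2, 0.3, 0.4])
recovered = hazard_to_pmf(hazard)
```

A constant hazard reproduces the geometric duration of a conventional HMM, so the hazard representation contains the conventional model as a special case.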
Hazard function estimation
“Nonparametric estimate”
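One common nonparametric estimate of a discrete hazard is the life-table ratio: the number of observed segments that ended at duration d, divided by the number that lasted at least d frames. The slides do not show which estimator was actually used, so the sketch below is an assumption, and the function name and sample data are hypothetical.

```python
from collections import Counter


def estimate_hazard(durations):
    """durations: observed state durations as positive integers (frames).
    Returns {d: estimated P(D = d | D >= d)} for d = 1..max(durations)."""
    counts = Counter(durations)
    at_risk = len(durations)  # segments that lasted at least d frames
    hazard = {}
    for d in range(1, max(durations) + 1):
        ended = counts.get(d, 0)
        hazard[d] = ended / at_risk
        at_risk -= ended
    return hazard


# Made-up durations, for illustration only.
h = estimate_hazard([1, 2, 2, 3, 3, 3, 3, 5])
```

With few observations at long durations the estimate becomes noisy, which is one reason a smoothed or parametric hazard may be preferred in practice.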
“Trajectories”
State duration correction
(Fant et al., 1991)
Word duration
[Figure: “Word duration distribution”. Count (axis 0 to 8) versus word length in frames (axis 30 to 65).]
State duration correction
[Figure: four “State duration distribution” panels showing count versus C4 and C5, two panels each.]
State duration correction (continued)
[Figure: four “State duration distribution” panels showing count versus C6 and C7, two panels each.]
Conclusions
• Representing the duration distribution via the hazard function is simple, effective, and convenient to program
• Speech recognition errors dropped by 20-25% across different tasks
• The time spent purely in Viterbi search or full probability calculation increased on average by 20% compared to the conventional HMM (almost completely compensated by the reduction in computation due to more adequate modeling)
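To illustrate how a hazard function can enter the Viterbi search, the sketch below tracks the number of frames already spent in the current state and uses a duration-dependent stay probability 1 - h(d) and leave probability h(d). It is a straightforward expanded-state formulation for a left-to-right chain, written under stated assumptions; it is not necessarily the progressive model presented in the slides, and all names (`viterbi_with_hazard`, `log_emis`, `hazards`) are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_with_hazard(log_emis, hazards):
    """log_emis[t][i]: log-likelihood of frame t in state i.
    hazards[i][d-1]: probability of leaving state i after exactly d frames,
    given it has lasted d frames; len(hazards[i]) caps the duration.
    Assumes a left-to-right chain that starts in state 0 and ends in the
    last state. Returns (best log score, best state sequence)."""
    n_states = len(hazards)
    # delta[(i, d)]: best log score of paths now in state i for d frames.
    delta = {(0, 1): log_emis[0][0]}
    backpointers = []
    for t in range(1, len(log_emis)):
        new_delta, back = {}, {}
        for (i, d), score in delta.items():
            h = hazards[i][d - 1]
            moves = []
            if d < len(hazards[i]):          # stay: duration grows by one
                moves.append(((i, d + 1), safe_log(1.0 - h)))
            if i + 1 < n_states:             # leave: next state, duration 1
                moves.append(((i + 1, 1), safe_log(h)))
            for key, log_trans in moves:
                cand = score + log_trans + log_emis[t][key[0]]
                if cand > new_delta.get(key, NEG_INF):
                    new_delta[key] = cand
                    back[key] = (i, d)
        delta = new_delta
        backpointers.append(back)
    finals = {k: v for k, v in delta.items() if k[0] == n_states - 1}
    if not finals:
        return NEG_INF, []
    key = max(finals, key=finals.get)
    best = finals[key]
    path = [key[0]]
    for back in reversed(backpointers):
        key = back[key]
        path.append(key[0])
    path.reverse()
    return best, path


# Toy example: uninformative emissions; state 0 must last exactly 2 frames.
score, path = viterbi_with_hazard([[0.0, 0.0]] * 5, [[0.0, 1.0], [0.1] * 10])
```

In this formulation the extra cost relative to a conventional HMM comes from tracking the elapsed duration per state, so the duration cap directly bounds the additional work.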
Partial classification techniques for speech recognition
Helps to create structure in speech HMMs
Useful in codebook(s) estimation
Initial estimates for HMMs and codebooks
More accurate estimates