Survey ICASSP 2007 Discriminative Training

Page 1: Survey ICASSP 2007 Discriminative Training

Survey ICASSP 2007 Discriminative Training

Reporter: Shih-Hung Liu 2007/04/30

Page 2: Survey ICASSP 2007 Discriminative Training


References

• Large-Margin Minimum Classification Error Training for Large-Scale Speech Recognition Tasks – Dong Yu, Li Deng, Xiaodong He, Alex Acero, Microsoft

• Approximate Test Risk Minimization Through Soft Margin Estimation – Jinyu Li, Sabato Marco, Chin-Hui Lee, Georgia Tech

• Unsupervised Training for Mandarin Broadcast News and Conversation Transcription – L. Wang, M.J.F. Gales, P.C. Woodland, Cambridge

• A New Minimum Divergence Approach to Discriminative Training – J. Du, P. Liu, H. Jiang, F.K. Soong, R.H. Wang, Microsoft Research Asia

Page 3: Survey ICASSP 2007 Discriminative Training


LM-MCE

• The basic idea of LM-MCE is to include a margin in the optimization criterion alongside the smoothed empirical error rate, so that correctly classified samples end up well away from the decision boundary

• To incorporate the margin successfully, the authors propose increasing the discriminative margin gradually over the training iterations (a sketch of the margin-embedded loss follows)
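
A minimal sketch of the margin-embedded loss (my reconstruction from the slide text, not a formula copied from the paper; d_i is the MCE misclassification measure, γ the sigmoid slope, m the margin):

    \ell_i(X;\Lambda) = \frac{1}{1 + e^{-\gamma\,(d_i(X;\Lambda) + m)}}

With m = 0 this is the conventional MCE loss; raising m over iterations also penalizes tokens that are classified correctly but lie within m of the decision boundary, pushing them further away.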

Page 4: Survey ICASSP 2007 Discriminative Training


LM-MCE

Using a Parzen window to estimate the Bayes risk (outline below)
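
In outline (a reconstruction under standard Parzen-window assumptions; K is the kernel and H its bandwidth), the Bayes risk is the probability that the misclassification measure d exceeds zero, and the unknown density p(d) is replaced by a Parzen estimate built from the N training tokens:

    \hat{p}(d) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{H}\,K\!\left(\frac{d-d_n}{H}\right),
    \qquad
    \hat{B} = \int_{0}^{\infty} \hat{p}(d)\,dd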

Page 5: Survey ICASSP 2007 Discriminative Training


LM-MCE

Define a symmetric kernel function K(z)

Margin-free Bayes risk: with a suitable kernel, the Parzen-smoothed risk reduces to the conventional MCE loss (worked example below)
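
As a worked example (the kernel choice here is an assumption, picked so the algebra closes), take the symmetric sigmoid-derivative kernel K(z) = e^{-z}/(1+e^{-z})^2. Substituting it into the Parzen estimate and integrating term by term gives

    \hat{B} = \frac{1}{N}\sum_{n=1}^{N}\int_{0}^{\infty}\frac{1}{H}\,K\!\left(\frac{d-d_n}{H}\right) dd
            = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{1+e^{-d_n/H}}

which is exactly the smoothed empirical error rate of conventional MCE with slope γ = 1/H and no margin, i.e. the margin-free Bayes risk the slide refers to.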

Page 6: Survey ICASSP 2007 Discriminative Training


LM-MCE

Page 7: Survey ICASSP 2007 Discriminative Training


LM-MCE

Page 8: Survey ICASSP 2007 Discriminative Training


Experiments

Page 9: Survey ICASSP 2007 Discriminative Training


Soft Margin Estimation

• The test risk bound is expressed as the sum of an empirical risk and a function of the VC dimension

• Approximate test risk minimization: minimize this bound rather than the empirical risk alone

• Define the loss function on the separation measure (sketch below)
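
A sketch of the SME objective as I understand it from the SME literature (treat the exact form as an assumption; ρ is the soft margin, d(x_i; Λ) the separation measure, λ a trade-off constant):

    \min_{\Lambda,\,\rho}\;\frac{\lambda}{\rho}
    + \frac{1}{N}\sum_{i=1}^{N}\bigl(\rho - d(x_i;\Lambda)\bigr)\,
      \mathbb{1}\bigl[d(x_i;\Lambda) \le \rho\bigr]

The first term penalizes a small margin, standing in for the VC-dimension term of the bound; the second is the empirical risk, charging every token that fails to clear the soft margin ρ.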

Page 10: Survey ICASSP 2007 Discriminative Training


Soft Margin Estimation

Page 11: Survey ICASSP 2007 Discriminative Training


Soft Margin Estimation on LVCSR

Page 12: Survey ICASSP 2007 Discriminative Training


Experiments

Page 13: Survey ICASSP 2007 Discriminative Training


Unsupervised Training

• Segmentation (a schematic sketch follows the list):
– First, advert removal is run: the arithmetic harmonic sphericity distance is used to detect repeated blocks of audio data, for example jingles or commercials.
– Acoustic segmentation is performed; the data is then split into wide-band and narrow-band speech.
– Sections of music are discarded.
– Finally, gender detection and speaker clustering are run.
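
A schematic sketch of this four-stage pipeline in Python. Every helper below is a stub with a hypothetical name; the actual Cambridge tools and their interfaces are not described in the survey.

    from typing import Dict, List, Tuple

    Chunk = Tuple[float, float]  # (start_sec, end_sec)

    def detect_repeated_blocks(audio: List[float]) -> List[Chunk]:
        # Stub: would use the arithmetic harmonic sphericity (AHS)
        # distance to find repeated blocks such as jingles or adverts.
        return []

    def acoustic_segmentation(audio: List[float],
                              adverts: List[Chunk]) -> List[Dict]:
        # Stub: would split the non-advert audio into wide-band speech,
        # narrow-band speech, and music segments.
        return [{"span": (0.0, 10.0), "kind": "wideband"}]

    def gender_and_speaker_clustering(segs: List[Dict]) -> List[Dict]:
        # Stub: would label gender and assign speaker-cluster ids.
        for i, seg in enumerate(segs):
            seg["gender"], seg["cluster"] = "unknown", i
        return segs

    def segment_show(audio: List[float]) -> List[Dict]:
        adverts = detect_repeated_blocks(audio)           # 1. advert removal
        segs = acoustic_segmentation(audio, adverts)      # 2. band split
        segs = [s for s in segs if s["kind"] != "music"]  # 3. drop music
        return gender_and_speaker_clustering(segs)        # 4. gender + clusters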

Page 14: Survey ICASSP 2007 Discriminative Training


Unsupervised Training

• Transcription generation (a schematic sketch follows the list):
– Initial transcriptions are generated using good acoustic models, MPE-trained in this work.
– P1: gender-independent models are used to generate initial transcriptions with a trigram language model and relatively tight beamwidths.
– P2: the 1-best hypothesis from the P1 stage is used to estimate adaptation transforms; here, least squares linear regression and diagonal variance transforms are estimated. Using the adapted models, lattices are generated with a trigram language model; these lattices are then rescored with a 4-gram language model.
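
The P1/P2 flow, again as a schematic Python sketch with hypothetical stubs (my paraphrase of the slide, not the actual CU-HTK toolchain):

    from typing import Dict, List

    def decode_1best(audio, models, lm, beam: str) -> str:
        return "initial hypothesis"       # stub: P1 decoding, tight beams

    def estimate_transforms(audio, hyp: str, models) -> Dict:
        # Stub: least squares linear regression (mean) and diagonal
        # variance transforms estimated against the P1 hypothesis.
        return {"lslr": None, "diag_var": None}

    def decode_lattices(audio, models, xforms: Dict, lm) -> List[str]:
        return ["lattice"]                # stub: adapted trigram decoding

    def rescore(lattices: List[str], lm4) -> str:
        return "final transcription"      # stub: 4-gram lattice rescoring

    def transcribe(audio, gi_mpe_models, lm3, lm4) -> str:
        hyp1 = decode_1best(audio, gi_mpe_models, lm3, beam="tight")  # P1
        xf = estimate_transforms(audio, hyp1, gi_mpe_models)          # P2
        lats = decode_lattices(audio, gi_mpe_models, xf, lm3)
        return rescore(lats, lm4)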

Page 15: Survey ICASSP 2007 Discriminative Training


Experiments

Page 16: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

• MD possesses the following advantages:
– 1. It has a higher resolution than any label-comparison-based error definition.
– 2. It is a general solution that handles any kind of model and phone set.
– As a result, MD outperforms other DT criteria on several tasks

• Notably, in MD the accuracy term is itself a function of the model parameters, so it can also be taken into account in the optimization process (criterion sketched below)
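
A sketch of the MD criterion in MPE-style notation (my reconstruction; the paper's exact formula may differ): the label-comparison accuracy of MPE is replaced by a negated Kullback-Leibler divergence between the hypothesis and the reference,

    F_{MD}(\Lambda) = \sum_{r}\sum_{W} P_\Lambda(W \mid O_r)\,
                      \bigl(-D(W_r \,\|\, W)\bigr)

where W_r is the reference transcription of utterance r and D accumulates the divergence between the corresponding HMM state sequences. Because D depends on Λ through the state output distributions, the accuracy term is itself a function of the model parameters, which is why it can enter the optimization.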

Page 17: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

• MD criterion

• Joint optimization

• It satisfies the conditions of the weak-sense auxiliary function
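
For reference (a standard definition, in Povey's sense, which the slide invokes): G(Λ, Λ̂) is a weak-sense auxiliary function for F(Λ) around Λ̂ if it is smooth and its gradient matches that of the objective at the current point,

    \left.\frac{\partial G(\Lambda,\hat\Lambda)}{\partial\Lambda}\right|_{\Lambda=\hat\Lambda}
    = \left.\frac{\partial F(\Lambda)}{\partial\Lambda}\right|_{\Lambda=\hat\Lambda}

G need not be a global lower bound on F; matching gradients is enough to guarantee that a small step increasing G also increases F to first order.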

Page 18: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

Page 19: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

• With the state/frame-independence assumption, the sequence-level divergence decomposes into frame-level terms (sketch below)
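
A sketch of the decomposition (my notation): with states treated as independent across frames, the sequence-level divergence reduces to a sum of frame-level divergences between the aligned state output distributions,

    D(W_r \,\|\, W) \approx \sum_{t=1}^{T} D\bigl(b_{s_t^r} \,\|\, b_{s_t}\bigr)

and for single-Gaussian output distributions each term has the closed form

    D\bigl(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
    = \tfrac{1}{2}\Bigl[\log\tfrac{|\Sigma_2|}{|\Sigma_1|}
      + \operatorname{tr}(\Sigma_2^{-1}\Sigma_1)
      + (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1) - d\Bigr]

(for Gaussian mixtures the KL divergence has no closed form and must be approximated).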

Page 20: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

• Statistics for EBW
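
For context, the generic EBW (Extended Baum-Welch) updates that consume such numerator/denominator statistics are (standard form, not copied from this paper; γ are occupancies, θ(O) and θ(O²) first- and second-order statistics, D_jm a smoothing constant):

    \hat\mu_{jm} = \frac{\theta^{num}_{jm}(O) - \theta^{den}_{jm}(O) + D_{jm}\,\mu_{jm}}
                        {\gamma^{num}_{jm} - \gamma^{den}_{jm} + D_{jm}}

    \hat\sigma^{2}_{jm} = \frac{\theta^{num}_{jm}(O^{2}) - \theta^{den}_{jm}(O^{2})
                                + D_{jm}\,(\sigma^{2}_{jm} + \mu^{2}_{jm})}
                               {\gamma^{num}_{jm} - \gamma^{den}_{jm} + D_{jm}} - \hat\mu^{2}_{jm}

In MD, as I read the slides, the per-arc weights entering these statistics come from the KL-based accuracy measure rather than from phone-label comparison.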

Page 21: Survey ICASSP 2007 Discriminative Training


A New Minimum Divergence Approach

Page 22: Survey ICASSP 2007 Discriminative Training


Experiments

