An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training

Mr. Yik-Cheung Tam and Dr. Brian Mak
Page 1

An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training

Mr. Yik-Cheung Tam

Dr. Brian Mak

Page 2

Outline

Motivation

Overview of MCE training

Problem using N-best hypotheses

Alternative: 1-nearest hypothesis (what, why, and how)

Evaluation

Conclusion

Page 3

MCE Overview

The MCE loss function l(.) is a 0-1 soft error-counting function (a sigmoid) applied to a distance measure d(X).

G(X), the competing-hypothesis term inside d(X), may be computed using the N-best hypotheses.

A gradient-descent method is used to obtain a better parameter estimate.
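The equations themselves did not survive the transcript. For reference, a standard MCE formulation consistent with these lines (log-likelihood discriminants g, an N-best anti-discriminant G(X), sigmoid slope gamma, offset theta, and learning rate epsilon) is, as a sketch:

\[
d(X) = -\,g_{W_0}(X;\Lambda) + G(X),
\qquad
G(X) = \frac{1}{\eta}\log\!\Big[\frac{1}{N}\sum_{n=1}^{N} e^{\,\eta\, g_{W_n}(X;\Lambda)}\Big],
\]
\[
\ell\big(d(X)\big) = \frac{1}{1 + e^{-\gamma d(X) + \theta}},
\qquad
\Lambda^{(k+1)} = \Lambda^{(k)} - \epsilon\,\nabla_{\Lambda}\,\ell\big(d(X);\Lambda\big)\Big|_{\Lambda=\Lambda^{(k)}},
\]

where W_0 is the correct transcription and W_1, ..., W_N are the N-best competing hypotheses.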

Page 4

Problem Using N-best Hypotheses

When d(X) gets large enough, it falls out of the steep trainable region of the sigmoid.

[Figure: sigmoid loss curve with the steep trainable region marked]
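This is not spelled out on the slide, but the reason such an utterance stops contributing to training is the vanishing sigmoid gradient:

\[
\frac{\partial \ell}{\partial d} = \gamma\,\ell(d)\,\big(1-\ell(d)\big) \;\to\; 0
\quad\text{as } |d(X)| \to \infty,
\]

so, by the chain rule, the gradient with respect to the HMM parameters is negligible once the sigmoid saturates.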

Page 5

What is the 1-nearest Hypothesis?

d(1-nearest) <= d(1-best)

The idea can be generalized to N-nearest hypotheses.
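The slide defines the 1-nearest hypothesis pictorially. A minimal Python sketch of one plausible selection rule, assuming the candidates come from an N-best list of (word string, log score) pairs and that "nearest" means closest in score to the correct transcription (all names here are illustrative):

```python
def distance(correct_score, competing_score):
    """MCE misclassification measure d(X) against a single competitor."""
    return competing_score - correct_score


def one_nearest(correct_words, correct_score, nbest):
    """Pick the competing hypothesis whose score is closest to the
    correct transcription's score."""
    competitors = [(w, s) for (w, s) in nbest if w != correct_words]
    return min(competitors, key=lambda ws: abs(distance(correct_score, ws[1])))


def n_nearest(correct_words, correct_score, nbest, n):
    """Generalization to the N-nearest competing hypotheses."""
    competitors = [(w, s) for (w, s) in nbest if w != correct_words]
    competitors.sort(key=lambda ws: abs(distance(correct_score, ws[1])))
    return competitors[:n]
```

Under this rule the chosen competitor never has a larger distance than the 1-best one (which is itself in the candidate set), consistent with d(1-nearest) <= d(1-best) on the slide.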

Page 6

Using the 1-nearest Hypothesis

It keeps the training data inside the steep trainable region of the sigmoid.

[Figure: sigmoid loss curve with the steep trainable region marked]

Page 7

How to Find the 1-nearest Hypothesis?

Method 1 (exact approach): use a stack-based N-best decoder.

Drawback: N may be very large, causing memory problems, so the size of N must be limited.

Method 2 (approximated approach): modify the Viterbi algorithm with a special pruning scheme.

Page 8

Approximated 1-nearest Hypothesis

Notation:

V(t+1, j): accumulated score at time t+1 and state j

a_ij: transition probability from state i to state j

b_j(o_{t+1}): observation probability at time t+1 in state j

V_c(t+1): accumulated score of the Viterbi path of the correct string at time t+1

Beam(t+1): beam width applied at time t+1
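The pruning scheme itself is not reproduced in the transcript. The sketch below is one possible reading, assuming that a partial path in the competing search space survives frame t only if its accumulated score lies within Beam(t) of the correct string's score V_c(t); the best surviving complete path then serves as the approximated 1-nearest hypothesis. All function and variable names are illustrative.

```python
NEG_INF = float("-inf")

def nearest_viterbi(obs, states, log_a, log_b, log_pi, v_correct, beam):
    """Viterbi search over the competing-hypothesis network with an assumed
    pruning rule tied to the correct string's Viterbi score.

    obs         : list of observation vectors o_1 .. o_T
    states      : list of state ids
    log_a[i][j] : log transition probability from state i to state j
    log_b(j, o) : log observation probability of o in state j
    log_pi[j]   : log initial probability of state j
    v_correct[t]: V_c(t), score of the correct string's Viterbi path at frame t
    beam[t]     : Beam(t), beam width applied at frame t
    """
    V = {j: log_pi[j] + log_b(j, obs[0]) for j in states}
    backpointers = [{}]

    for t in range(1, len(obs)):
        V_new, bp = {}, {}
        for j in states:
            # Best predecessor among the paths that survived frame t-1.
            best_i, best = None, NEG_INF
            for i, vi in V.items():
                if vi + log_a[i][j] > best:
                    best_i, best = i, vi + log_a[i][j]
            if best_i is None:
                continue
            score = best + log_b(j, obs[t])
            # Assumed pruning scheme: keep only paths whose score stays
            # within Beam(t) of the correct string's Viterbi score V_c(t).
            if abs(score - v_correct[t]) <= beam[t]:
                V_new[j], bp[j] = score, best_i
        V = V_new
        backpointers.append(bp)
        if not V:
            return None  # everything was pruned; a wider beam is needed

    # Backtrace from the best surviving final state.
    j = max(V, key=V.get)
    final_score, path = V[j], [j]
    for bp in reversed(backpointers[1:]):
        j = bp[j]
        path.append(j)
    return list(reversed(path)), final_score
```

The returned state path can then be mapped back to a word string and used as the competing hypothesis in the MCE distance d(X).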

Page 9

Approximated 1-nearest Hypothesis (cont.)

There exists some "nearest" path in the search space (the shaded area of the original figure).

Page 10

System Evaluation

Page 11

Corpus: Aurora

Noisy connected digits derived from TIDIGITS.

Multi-condition training (train on noisy conditions): {subway, babble, car, exhibition} x {clean, 20, 15, 10, 5 dB} (5 noise levels); 8440 training utterances.

Testing (test on matched noisy conditions): same as above, plus additional samples at 0 and -5 dB (7 noise levels); 28,028 testing utterances.

Page 12

System Configuration

Standard 39-dimensional MFCC features (static cepstra plus delta and delta-delta coefficients).

11 whole-word digit HMMs (0-9 and "oh"), each with 16 states and 3 Gaussians per state.

3-state silence HMM with 6 Gaussians per state.

1-state short-pause HMM tied to the 2nd state of the silence model.

Baum-Welch training to obtain the initial HMMs.

Corrective MCE training on the HMM parameters.

Page 13

System Configuration (cont.)

Compare 3 kinds of competing hypotheses:

1-best hypothesis

Exact 1-nearest hypothesis

Approx. 1-nearest hypothesis

Sigmoid parameters: various values of gamma (controlling the slope of the sigmoid); offset = 0.

Page 14

Experiment I: Effect of the Sigmoid Slope

Learning rate = 0.05, with different values of gamma: 0.1 (best test performance), 0.5 (steeper), 0.02 and 0.004 (flatter).

[Figure: word error rates. Baseline: 12.71%; 1-best: 11.01%; approx. 1-nearest: 10.71%; exact 1-nearest: 10.45%]

Page 15

Effective Amount of Training Data

A soft error < 0.95 is defined to be "effective".

The 1-nearest approach has more effective training data when the sigmoid slope is relatively steep.

[Figure: effective training data fractions. 1-best: 40%; approx. 1-nearest: 51%; exact 1-nearest: 67%]
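A small helper, purely illustrative, for the statistic used here: count an utterance as effective when its sigmoid soft error falls below 0.95.

```python
import math

def soft_error(d, gamma, theta=0.0):
    """MCE sigmoid loss for misclassification measure d."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))

def effective_fraction(distances, gamma, threshold=0.95):
    """Fraction of training utterances whose soft error is below the threshold."""
    effective = sum(1 for d in distances if soft_error(d, gamma) < threshold)
    return effective / len(distances)
```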

Page 16

Experiment II: Compensation with More Training Iterations

With 100% effective training data, apply more training iterations: gamma = 0.004, learning rate = 0.05.

Result: slow improvement compared with the best case (exact 1-nearest with gamma = 0.1).

Page 17

Experiment II: Compensation Using a Larger Learning Rate

Use a larger learning rate (0.05 -> 1.25).

Fix gamma = 0.004 (100% effective training data).

Result: the 1-nearest approach is better than the 1-best approach after compensation.

System               Before compensation   After compensation
Baseline             12.71%                12.71%
1-best               12.07%                11.55%
Approx. 1-nearest    12.27%                10.70%
Exact 1-nearest      12.16%                10.79%

Page 18

Using a Larger Learning Rate (cont.)

Training performance: MCE loss versus number of training iterations.

[Figure: MCE loss curves for 1-best, approx. 1-nearest, and exact 1-nearest]

Page 19

Using a Larger Learning Rate (cont.)

Test performance: WER versus number of training iterations.

[Figure: WER curves. Approx. 1-nearest: 10.70%; exact 1-nearest: 10.79%; 1-best: 11.55%]

Page 20

Conclusion

The 1-best and 1-nearest methods were compared in MCE training, examining the effect of the sigmoid slope and compensation when using a flat sigmoid.

The 1-nearest method is better than the 1-best approach: more trainable data are available with the 1-nearest approach.

The approximated and exact 1-nearest methods yield comparable performance.

Page 21

Questions and Answers

