
1

Less is More?

Yi Wu

Advisor: Alex Rudnicky

2

People say: "There is no data like more data!"

3

Goal: Use Less to Perform More

• Identify an informative subset of a large corpus for Acoustic Model (AM) training.

• Expectations for the selected set: good performance, fast selection.

4

Motivation

The improvement of the system becomes increasingly small as we keep adding data.

Training an acoustic model is time-consuming, so we need guidance on which data is most needed.

5

Approach Overview

• Applied to well-transcribed data

• Selection based on the transcription

• Choose a subset with a "uniform" distribution over speech units (word, phoneme, character)

6

How to sample data wisely? -- A simple example

k Gaussian distributions with known priors ω_i and unknown density functions f_i(μ_i, σ_i)

7

How to sample wisely? -- A simplified example

• We are given access to at most N examples.

• We may choose how many we want from each class.

• We train the model using the MLE estimator.

• When a new sample is generated, we use our model to determine its class.

Question: How should we sample to achieve minimum error?

8

The Optimal Bayes Classifier

argmax_i ω_i f_i(x) = argmax_i ( log(ω_i) + log(f_i(x)) )

If we have the exact form of f_i(x), the above classification rule is optimal.
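A minimal numerical sketch of this decision rule (all priors and Gaussian parameters below are made up for illustration, not from the slides):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters for k = 3 Gaussian classes.
priors = np.array([0.5, 0.3, 0.2])   # known priors omega_i
means  = np.array([-2.0, 0.0, 3.0])  # mu_i
stds   = np.array([1.0, 0.5, 1.5])   # sigma_i

def bayes_classify(x):
    """Return argmax_i of log(omega_i) + log(f_i(x))."""
    log_scores = np.log(priors) + norm.logpdf(x, loc=means, scale=stds)
    return int(np.argmax(log_scores))

print(bayes_classify(-1.5))  # class 0 wins for points near its mean
```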

9

To Approximate the Optimal

We use our MLE estimates f̂_i in place of the true f_i:

argmax_i ω_i f̂_i(x) = argmax_i ( log(ω_i) + log(f̂_i(x)) )

The true error is bounded by the optimal Bayes error plus an error bound for our worst-estimated f̂_i.

10

Sample Uniformly

We want to sample each class equally. The selected data will then have good coverage of each class, which gives a robust estimate for each class.
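A toy simulation of why this helps (the two-class setup and all numbers are assumptions for illustration): with a fixed budget, sampling in proportion to the priors leaves the rare class with few examples, so its MLE is noisy in expectation, while uniform sampling equalizes the estimation error across classes.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma_true = [0.0, 2.0], [1.0, 1.0]  # two classes; class 1 is rare
N = 100                                        # total labeled-example budget

def fit_mle(n_per_class):
    """Draw n samples from each class and return the MLE (mean, std) pairs."""
    fits = []
    for i, n in enumerate(n_per_class):
        x = rng.normal(mu_true[i], sigma_true[i], size=n)
        fits.append((x.mean(), x.std()))
    return fits

proportional = fit_mle([90, 10])  # follow the class priors (0.9 / 0.1)
uniform      = fit_mle([50, 50])  # sample each class equally

# The rare class gets 10 vs. 50 examples, so its estimate is noisier on
# average under proportional sampling (standard error ~ sigma / sqrt(n)).
print("rare-class mean, proportional:", proportional[1][0])
print("rare-class mean, uniform:     ", uniform[1][0])
```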

11

The Real ASR system

12

Data Selection for ASR System

The prior has been estimated independently by the language model.

To make the acoustic model accurate, we want to sample W uniformly.

We can take the unit to be a phoneme, character, or word, and we want its distribution to be uniform.

13

Entropy: A Measure of "Uniformness"

• Use the entropy of the word (phoneme) distribution for evaluation.

• Suppose the unit has sample distribution p_1, p_2, ..., p_n.

• Choose the subset with maximum entropy (see the sketch below): -p_1*log(p_1) - p_2*log(p_2) - ... - p_n*log(p_n)

• Maximizing entropy is equivalent to minimizing the KL distance from the uniform distribution.
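A small sketch of how such an entropy score might be computed over transcriptions (the helper name and tokenization are assumptions, not from the slides):

```python
import math
from collections import Counter

def unit_entropy(transcripts, unit="word"):
    """Entropy (in nats) of the empirical unit distribution of a transcript set."""
    counts = Counter()
    for t in transcripts:
        # words split on whitespace; characters ignore spaces
        counts.update(t.split() if unit == "word" else t.replace(" ", ""))
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

print(unit_entropy(["the cat sat", "the dog ran"]))     # word entropy
print(unit_entropy(["the cat sat"], unit="character"))  # character entropy
```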

14

Computational Issue

It is computationally intractable to find the transcription set that maximizes the entropy exactly.

Solution: forward greedy search.
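A sketch of the forward greedy search under these assumptions (it recomputes the full entropy for each candidate; a real implementation would use incremental updates for speed):

```python
import math
from collections import Counter

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def greedy_select(utterances, k):
    """Forward greedy search: at each step, add the utterance whose units
    most increase the entropy of the running selection."""
    utt_counts = [Counter(u) for u in utterances]  # each u is a list of units
    selected, counts = [], Counter()
    remaining = set(range(len(utterances)))
    for _ in range(k):
        best = max(remaining, key=lambda i: entropy(counts + utt_counts[i]))
        selected.append(best)
        counts += utt_counts[best]
        remaining.remove(best)
    return selected

# Toy usage: pick 2 of 3 word-tokenized utterances.
utts = [["a", "a", "a"], ["b", "c"], ["a", "b"]]
print(greedy_select(utts, 2))
```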

15

Combination

There are multiple entropies we want to maximize.

Combination Methods:

• Weighted sum (see the sketch below)

• Add sequentially
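For the weighted-sum method, the objective could look like the sketch below, reusing the entropy helper from the previous slide's sketch; the inputs are unit-count Counters, and the weight w is an assumed tuning knob, not a value from the slides.

```python
def combined_score(word_counts, phone_counts, w=0.5):
    """Weighted sum of the word-level and phoneme-level entropies."""
    return w * entropy(word_counts) + (1.0 - w) * entropy(phone_counts)
```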

16

Experiment Setup

System: Sphinx III

Features: 39-dimensional MFCC

Training corpus: Chinese BN 97 (30 hr) + GALE Y1 (810 hr)

Test set: RT04 (60 min)

17

Experiment 1 (using the word distribution)

Time (hours)   30     50     100    840
Random (all)   27.6   27.1   26.1   24.3
Max-entropy    27.0   26.2   24.8   -

Table 1

18

More Results

                    30 h        50 h        100 h       150 h       840 h
Random (all)        27.6        27.1        26.1        25.0        24.3
  cctv (bn)         17          15.7        13.2        13.6        12.9
  ntdtv (bn)        24.7        24.2        23.3        22.2        21.0
  rfa (bc)          42.9        43.6        44          41.1        41.0
  bc/bn (ratio)     15.4/14.6   25.7/24.3   51.2/49.8   76.8/73.2   431/409
Max-entropy (all)   27          26.2        24.8        -           -
  cctv (bn)         15          14          13          -           -
  ntdtv (bn)        23          22.3        21.1        -           -
  rfa (bc)          45.8        44.8        42.7        -           -
  bc/bn (ratio)     11.0/19.0   18.2/31.8   50.6/49.8   -           -

19

Experiment 2 (adding phoneme and character entropies sequentially, 150 hr)

                           CCTV   NTDTV   RFA    ALL
Random (150 h)             13.6   22.2    44.1   25.0
Max-entropy (word+char)    12.2   21.8    42.3   24.7
Max-entropy (word+phone)   13.1   20.5    41.8   24.4
All data (840 h)           12.9   21.0    41.0   24.3

Table 2

20

Experiments 1, 2

[Figure: word count distribution on log-log axes (x axis: word; y axis: count) for the max-entropy, random, and all-data selections]

21

Experiment 3 (with VTLN, vocal tract length normalization)

                       CCTV   NTDTV   RFA    ALL
150 hr (word+phone)    13.1   20.5    41.8   24.4
With VTLN              11.8   17.8    40.1   22.5

Table 3

22

Summary

• Choose data uniformly with respect to speech units

• Maximize entropy using a greedy algorithm

• Add data sequentially

Future Work

• Combine multiple sources

• Select un-transcribed data

