Lecture 24: Model Adaptation and Semi-Supervised Learning

Page 1: Lecture  24: Model Adaptation and Semi-Supervised Learning

Machine Learning

Lecture 24: Model Adaptation and Semi-Supervised Learning

Iain Murray’s MLSS lecture on videolectures.net: http://videolectures.net/mlss09uk_murray_mcmc/

Page 2: Lecture  24: Model Adaptation and Semi-Supervised Learning

Today

• Adaptation of Gaussian Mixture Models
  – Maximum A Posteriori (MAP)
  – Maximum Likelihood Linear Regression (MLLR)

• Application: Speaker Recognition
  – UBM-MAP + SVM

• Other Semi-Supervised Approaches
  – Self-Training
  – Co-Training

Page 3: Lecture  24: Model Adaptation and Semi-Supervised Learning

The Problem

• I have a little bit of labeled data, and a lot of unlabeled data.

• I can model the training data fairly well.

• But we always fit training data better than testing data.

• Can we use the wealth of unlabeled data to do better?

Page 4: Lecture  24: Model Adaptation and Semi-Supervised Learning

Let’s use a GMM

• GMMs to model labeled data.
• In simplest form, one mixture component per class.

Page 5: Lecture  24: Model Adaptation and Semi-Supervised Learning

Labeled training of GMM

• MLE estimators of parameters

• Or these can be used to seed EM.
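
For a one-component-per-class GMM, the standard MLE estimates (a sketch of the usual formulas; the slide's own equations did not survive the transcript) are:

    \hat{w}_c = N_c / N
    \hat{\mu}_c = (1 / N_c) \sum_{i : y_i = c} x_i
    \hat{\Sigma}_c = (1 / N_c) \sum_{i : y_i = c} (x_i - \hat{\mu}_c)(x_i - \hat{\mu}_c)^T

where N_c is the number of labeled points in class c and N is the total number of labeled points.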

Page 6: Lecture  24: Model Adaptation and Semi-Supervised Learning

Adapting the mixtures to new data

• Essentially, let EM start with MLE parameters as seeds.
• Expand the available data for EM, proceed until convergence.

Page 7: Lecture  24: Model Adaptation and Semi-Supervised Learning

Adapting the mixtures to new data

• Essentially, let EM start with MLE parameters as seeds.
• Expand the available data for EM, proceed until convergence.
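
A minimal sketch of this seeded EM, assuming scikit-learn and that mle_weights, mle_means, mle_covariances hold the labeled-data MLE estimates and X_labeled, X_unlabeled hold the two data sets (all names are illustrative):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Seed EM with the MLE parameters estimated from the labeled data,
    # then run it on labeled + unlabeled data together until convergence.
    gmm = GaussianMixture(
        n_components=len(mle_means),
        weights_init=mle_weights,
        means_init=mle_means,
        precisions_init=np.linalg.inv(mle_covariances),  # precisions = inverse covariances
    )
    gmm.fit(np.vstack([X_labeled, X_unlabeled]))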

Page 8: Lecture  24: Model Adaptation and Semi-Supervised Learning

Problem with EM adaptation

• The initial labeled seeds could contribute very little to the final model

Page 9: Lecture  24: Model Adaptation and Semi-Supervised Learning

One Problem with EM adaptation

• The initial labeled seeds could contribute very little to the final model

Page 10: Lecture  24: Model Adaptation and Semi-Supervised Learning

MAP Adaptation

• Constrain the contribution of unlabeled data.

• Let the alpha terms dictate how much weight to give to the new, unlabeled data compared to the existing estimates.
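
One common form of these updates (Reynolds-style relevance MAP, which matches the alpha weighting described above; the slide's exact equations are not in the transcript) adapts each component mean as:

    \gamma_t(k) = p(k \mid x_t),  \quad  n_k = \sum_t \gamma_t(k)
    E_k(x) = (1 / n_k) \sum_t \gamma_t(k) x_t
    \alpha_k = n_k / (n_k + r)
    \hat{\mu}_k = \alpha_k E_k(x) + (1 - \alpha_k) \mu_k

where r is a relevance factor: components that see little new data (small n_k) barely move from their existing estimates.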

Page 11: Lecture  24: Model Adaptation and Semi-Supervised Learning

MAP adaptation

• The movement of the parameters is constrained.

Page 12: Lecture  24: Model Adaptation and Semi-Supervised Learning

MLLR adaptation

• Another idea…
• “Maximum Likelihood Linear Regression”.
• Apply an affine transformation to the means.
• Don’t change the covariance matrices.

Page 13: Lecture  24: Model Adaptation and Semi-Supervised Learning

MLLR adaptation

• Another view on adaptation.
• Apply an affine transformation to the means.
• Don’t change the covariance matrices.
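
In the usual MLLR notation (an assumption about the exact symbols on the slide), the adapted means are an affine function of the old ones:

    \hat{\mu}_k = A \mu_k + b = W \xi_k,  \quad  \xi_k = [1, \mu_k^T]^T

so a single transform W = [b, A] is estimated from the adaptation data while the covariances \Sigma_k are left unchanged.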

Page 14: Lecture  24: Model Adaptation and Semi-Supervised Learning

MLLR adaptation

• The new means are the MLE of the means with the new data.

Page 15: Lecture  24: Model Adaptation and Semi-Supervised Learning

MLLR adaptation

• The new means are the MLE of the means with the new data.
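
Concretely, W is chosen to maximize the likelihood of the adaptation data under the transformed means (a sketch of the standard objective; the slide's own equation is missing):

    \hat{W} = \arg\max_W \sum_t \sum_k \gamma_t(k) \log \mathcal{N}(x_t ; W \xi_k, \Sigma_k)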

Page 16: Lecture  24: Model Adaptation and Semi-Supervised Learning

Why MLLR?

• We can tie the transformation matrices of mixture components.
• For example:
  – You know that the red and green classes are similar.
  – Assumption: Their transformations should be similar.

Page 17: Lecture  24: Model Adaptation and Semi-Supervised Learning

Why MLLR?

• We can tie the transformation matrices of mixture components.
• For example:
  – You know that the red and green classes are similar.
  – Assumption: Their transformations should be similar.

Page 18: Lecture  24: Model Adaptation and Semi-Supervised Learning

Application of Model Adaptation

• Speaker Recognition.
• Task: Given speech from a known set of speakers, identify the speaker.
• Assume there is training data from each speaker.
• Approach:
  – Model a generic speaker.
  – Identify a speaker by its difference from the generic speaker.
  – Measure this difference by adaptation parameters.

Page 19: Lecture  24: Model Adaptation and Semi-Supervised Learning

Speech Representation

• Extract a feature representation of speech.
• Samples every 10 ms.
• MFCC – 16 dims.
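
As a rough illustration, 16-dimensional MFCCs sampled every 10 ms could be extracted with librosa (the library choice and parameter names are assumptions, not something the lecture specifies):

    import librosa

    def extract_mfcc(path, n_mfcc=16, hop_seconds=0.010):
        """Return an (n_frames, n_mfcc) array of MFCC features for one utterance."""
        y, sr = librosa.load(path, sr=None)           # keep the file's native sample rate
        hop_length = int(hop_seconds * sr)            # 10 ms frame step
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
        return mfcc.T                                 # one row per frame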

Page 20: Lecture  24: Model Adaptation and Semi-Supervised Learning

Similarity of sounds

[Figure: speech frames in MFCC space (MFCC1 vs. MFCC2); similar sounds /s/, /b/, /o/, /u/ form separate clusters]

Page 21: Lecture  24: Model Adaptation and Semi-Supervised Learning

Universal Background Model

• If we had labeled phone information, that would be great.

• But it’s expensive and time consuming.
• So just fit a GMM to the MFCC representation of all of the speech you have.
  – Generally all but one example, but we’ll come back to this.
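
A minimal sketch of UBM fitting with scikit-learn, reusing the extract_mfcc helper above; the component count, diagonal covariances, and the background_paths list are illustrative assumptions:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Pool MFCC frames from all available speech (excluding the held-out speaker).
    background_frames = np.vstack([extract_mfcc(p) for p in background_paths])

    # The UBM is simply a large GMM fit to the pooled frames.
    ubm = GaussianMixture(n_components=512, covariance_type='diag', max_iter=100)
    ubm.fit(background_frames)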

Page 22: Lecture  24: Model Adaptation and Semi-Supervised Learning

MFCC Scatter

[Figure: scatter of pooled MFCC frames (MFCC1 vs. MFCC2) with clusters for /s/, /b/, /o/, /u/]

Page 23: Lecture  24: Model Adaptation and Semi-Supervised Learning

UBM fitting

[Figure: UBM Gaussian components fit over the MFCC1–MFCC2 scatter; cluster labels /s/, /b/, /o/, /u/]

Page 24: Lecture  24: Model Adaptation and Semi-Supervised Learning

MAP adaptation

• When we have a segment of speech to evaluate:
  – Generate MFCC features.
  – Use MAP adaptation on the UBM Gaussian Mixture Model.
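
A sketch of a mean-only relevance-MAP step against the fitted UBM (the relevance factor and the decision to adapt only the means are assumptions; the lecture's exact recipe may differ):

    import numpy as np

    def map_adapt_means(ubm, frames, relevance=16.0):
        """Adapt the UBM component means toward one utterance's MFCC frames."""
        gamma = ubm.predict_proba(frames)            # (n_frames, n_components) responsibilities
        n_k = gamma.sum(axis=0)                      # soft count of frames per component
        # Weighted mean of the new frames under each component (guard empty components).
        E_k = gamma.T @ frames / np.maximum(n_k, 1e-10)[:, None]
        alpha = (n_k / (n_k + relevance))[:, None]   # how far each mean is allowed to move
        return alpha * E_k + (1.0 - alpha) * ubm.means_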

Page 25: Lecture  24: Model Adaptation and Semi-Supervised Learning

MAP Adaptation

[Figure: MAP adaptation illustrated on the MFCC1–MFCC2 scatter; cluster labels /s/, /b/, /o/, /u/]

Page 26: Lecture  24: Model Adaptation and Semi-Supervised Learning

MAP Adaptation

[Figure: MFCC1–MFCC2 scatter after MAP adaptation, showing the shifted components; cluster labels /s/, /b/, /o/, /u/]

Page 27: Lecture  24: Model Adaptation and Semi-Supervised Learning

UBM-MAP

• Claim:
  – The differences between speakers can be represented by the movement of the mixture components of the UBM.

• How do we train this model?

Page 28: Lecture  24: Model Adaptation and Semi-Supervised Learning

UBM-MAP training

[Diagram: Training Data / Held-out Speaker N → UBM Training → MAP → Supervector]

• Supervector
  – A vector of the adapted means of the Gaussian mixture components.
• Train a supervised model with these labeled vectors.
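
A sketch of building supervectors and training the multiclass SVM, reusing the earlier helpers; train_paths, train_speakers and the linear kernel are illustrative assumptions (scikit-learn's SVC handles the multiclass case internally):

    from sklearn.svm import SVC

    def supervector(adapted_means):
        """Stack the adapted component means into one long vector."""
        return adapted_means.reshape(-1)

    # One supervector per training utterance, labeled with its speaker.
    X_train = [supervector(map_adapt_means(ubm, extract_mfcc(p))) for p in train_paths]
    y_train = train_speakers

    svm = SVC(kernel='linear')
    svm.fit(X_train, y_train)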

Page 29: Lecture  24: Model Adaptation and Semi-Supervised Learning

UBM-MAP training

[Diagram: Training Data / Held-out Speaker N → UBM Training → MAP → Supervector → Multiclass SVM Training]

• Repeat for all training data.

Page 30: Lecture  24: Model Adaptation and Semi-Supervised Learning

UBM-MAP Evaluation

[Diagram: Test Data → UBM → MAP → Supervector → Multiclass SVM → Prediction]
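
Evaluation reuses the same pieces; test_path is a hypothetical held-out utterance:

    test_sv = supervector(map_adapt_means(ubm, extract_mfcc(test_path)))
    predicted_speaker = svm.predict([test_sv])[0]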

Page 31: Lecture  24: Model Adaptation and Semi-Supervised Learning

Alternate View

• Do we need all this?
• What if we just train an SVM on labeled MFCC data?

[Diagram: Labeled Training Data → Multiclass SVM Training; Test Data → Multiclass SVM → Prediction]

Page 32: Lecture  24: Model Adaptation and Semi-Supervised Learning

Results

• UBM-MAP (with some variants) is the state of the art in Speaker Recognition.
  – Current state-of-the-art performance is about 97% accuracy (~2.5% EER) with a few minutes of speech.

• Direct MFCC modeling performs about half as well, at ~5% EER.

Page 33: Lecture  24: Model Adaptation and Semi-Supervised Learning

Model Adaptation

• Adaptation allows GMMs to be seeded with labeled data.

• Incorporation of unlabeled data gives a more robust model.

• The adaptation process can be used to differentiate members of the population (UBM-MAP).

Page 34: Lecture  24: Model Adaptation and Semi-Supervised Learning

Self-Training

• Train a supervised model based on training data, T.
• Generate predictions [t, x] for the unlabeled data, U.
• Add these to the training data.
• Retrain the supervised model.

• Alternatives (see the sketch below):
  – Only use the most confident predictions on U.
  – Weight the new predictions by the confidence.
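
A minimal self-training loop with a confidence threshold; the classifier, the threshold, and the stopping rule are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_rounds=10):
        """Iteratively add confident predictions on unlabeled data back into the training set."""
        model = LogisticRegression(max_iter=1000)
        for _ in range(max_rounds):
            model.fit(X_labeled, y_labeled)
            if len(X_unlabeled) == 0:
                break
            proba = model.predict_proba(X_unlabeled)
            confident = proba.max(axis=1) >= threshold   # only trust very confident predictions
            if not confident.any():
                break
            pseudo = model.predict(X_unlabeled)
            X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
            y_labeled = np.concatenate([y_labeled, pseudo[confident]])
            X_unlabeled = X_unlabeled[~confident]
        return model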

Page 35: Lecture  24: Model Adaptation and Semi-Supervised Learning

Self-Training

• Advantages
  – Simple to use
  – No classifier dependence

• Disadvantages
  – Bad decisions get reinforced
  – Uncertain convergence properties

Page 36: Lecture  24: Model Adaptation and Semi-Supervised Learning

Co-Training

• Train two supervised classifiers C1 and C2 using different (uncorrelated) feature representations of T.

• Generate predictions for U using C1 and C2.
• Add the most confident predictions made using C1 to the C2 training set, and vice versa.
• Repeat (a sketch follows below).
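
A rough co-training sketch over two feature views; the choice of classifier, the per-round transfer count, and the variable names are illustrative assumptions:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def co_train(Xa, Xb, y, Ua, Ub, rounds=10, per_round=5):
        """Xa, Xb: two feature views of the labeled set T (labels y); Ua, Ub: the same views of U."""
        Xa1, y1 = Xa.copy(), y.copy()    # C1's growing training set (view A)
        Xb2, y2 = Xb.copy(), y.copy()    # C2's growing training set (view B)
        c1, c2 = GaussianNB(), GaussianNB()
        for _ in range(rounds):
            c1.fit(Xa1, y1)
            c2.fit(Xb2, y2)
            if len(Ua) == 0:
                break
            # Most confident predictions from C1 go to C2's training set, and vice versa.
            pick1 = np.argsort(-c1.predict_proba(Ua).max(axis=1))[:per_round]
            pick2 = np.argsort(-c2.predict_proba(Ub).max(axis=1))[:per_round]
            Xb2 = np.vstack([Xb2, Ub[pick1]]); y2 = np.concatenate([y2, c1.predict(Ua[pick1])])
            Xa1 = np.vstack([Xa1, Ua[pick2]]); y1 = np.concatenate([y1, c2.predict(Ub[pick2])])
            # Remove the transferred examples from the unlabeled pool.
            keep = np.ones(len(Ua), dtype=bool)
            keep[np.concatenate([pick1, pick2])] = False
            Ua, Ub = Ua[keep], Ub[keep]
        return c1, c2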

Page 37: Lecture  24: Model Adaptation and Semi-Supervised Learning

Co-Training

• Pros
  – Simple wrapper method
  – Less sensitive to mistakes than self-training

• Disadvantages
  – Natural feature splits might not exist
  – Models using both features are probably better

Page 38: Lecture  24: Model Adaptation and Semi-Supervised Learning

Next Time

• Ensemble Techniques

