Loss-based Learning with Latent Variables
Page 1: Loss-based Learning with Latent Variables

Loss-based Learning with Latent Variables

M. Pawan Kumar
École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France

Joint work with Ben Packer, Daphne Koller, Kevin Miller and Danny Goodman

Page 2: Loss-based Learning with Latent Variables

Image Classification

Images xi with labels yi; boxes hi are latent:

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Example: image x with label y = "Deer".

Page 3: Loss-based Learning with Latent Variables

Image Classification

Feature Ψ(x,y,h) (e.g. HOG)

Score f : Ψ(x,y,h) → (-∞, +∞)

Scores f(Ψ(x,y1,h)) over the nine candidate boxes h:

0.00 0.00 0.00
0.00 0.75 0.00
0.00 0.00 0.00

Page 4: Loss-based Learning with Latent Variables

Image Classification

Feature Ψ(x,y,h) (e.g. HOG)

Score f : Ψ(x,y,h) → (-∞, +∞)

Scores f(Ψ(x,y1,h)):

0.00 0.00 0.00
0.00 0.75 0.00
0.00 0.00 0.00

Scores f(Ψ(x,y2,h)):

0.00 0.23 0.00
0.00 0.00 0.01
0.01 0.00 0.00

Prediction y(f)

Learn f

Page 5: Loss-based Learning with Latent Variables

Loss-based Learning

User-defined loss function Δ(y,y(f))

f* = argminf Σi Δ(yi,yi(f))

Minimize the loss between the predicted and ground-truth outputs

No restriction on the loss function

General framework (object detection, segmentation, …)

Page 6: Loss-based Learning with Latent Variables

• Latent SVM

• Max-Margin Min-Entropy Models

• Dissimilarity Coefficient Learning

Outline

Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

Page 7: Loss-based Learning with Latent Variables

Image Classification

(Running example repeated: images xi with labels yi, latent boxes hi; image x with y = "Deer".)

Page 8: Loss-based Learning with Latent Variables

Latent SVM

Scoring function: wTΨ(x,y,h), with parameters w and features Ψ

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Page 9: Loss-based Learning with Latent Variables

Learning Latent SVM

Training data {(xi,yi), i = 1,2,…,n}

w* = argminw Σi Δ(yi,yi(w))

Highly non-convex in w; cannot regularize w to prevent overfitting

Page 10: Loss-based Learning with Latent Variables

Learning Latent SVM

Training data {(xi,yi), i = 1,2,…,n}

Δ(yi,yi(w)) = wTΨ(xi,yi(w),hi(w)) + Δ(yi,yi(w)) - wTΨ(xi,yi(w),hi(w))

≤ wTΨ(xi,yi(w),hi(w)) + Δ(yi,yi(w)) - maxhi wTΨ(xi,yi,hi)

≤ maxy,h {wTΨ(xi,y,h) + Δ(yi,y)} - maxhi wTΨ(xi,yi,hi)

Page 11: Loss-based Learning with Latent Variables

Learning Latent SVM

Training data {(xi,yi), i = 1,2,…,n}

minw ||w||2 + C Σi ξi

wTΨ(xi,y,h) + Δ(yi,y) - maxhi wTΨ(xi,yi,hi) ≤ ξi, for all y, h

Difference-of-convex program in w

Local minimum or saddle point solution (CCCP)

Self-Paced Learning, NIPS 2010

Page 12: Loss-based Learning with Latent Variables

Recap

Scoring function: wTΨ(x,y,h)

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Learning: minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,y,h) + Δ(yi,y) - maxhi wTΨ(xi,yi,hi) ≤ ξi, for all y, h

Page 13: Loss-based Learning with Latent Variables

Image Classification

(Running example repeated: images xi with labels yi, latent boxes hi; image x with y = "Deer".)

Page 14: Loss-based Learning with Latent Variables

Image Classification

Score wTΨ(x,y,h) ∈ (-∞, +∞)

Scores wTΨ(x,y1,h) over the nine candidate boxes:

0.00 0.00 0.25
0.00 0.25 0.00
0.25 0.00 0.00

Page 15: Loss-based Learning with Latent Variables

Image Classification

Score wTΨ(x,y,h) ∈ (-∞, +∞)

Scores wTΨ(x,y1,h):

0.00 0.00 0.25
0.00 0.25 0.00
0.25 0.00 0.00

Scores wTΨ(x,y2,h):

0.00 0.24 0.00
0.00 0.00 0.00
0.01 0.00 0.00

Only the maximum score is used. No other useful cue? The uncertainty in h.

Page 16: Loss-based Learning with Latent Variables

• Latent SVM

• Max-Margin Min-Entropy (M3E) Models

• Dissimilarity Coefficient Learning

Outline

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

Page 17: Loss-based Learning with Latent Variables

M3E

Scoring function: Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x), where Z(x) is the partition function

Prediction: y(w) = argminy { Hα(Pw(h|y,x)) - log Pw(y|x) }

Hα is the Rényi entropy; Pw(y|x) is the marginalized probability. The minimized quantity is the Rényi entropy of the generalized distribution, Gα(y;x,w).

Page 18: Loss-based Learning with Latent Variables

Rényi Entropy

Gα(y;x,w) = (1/(1-α)) log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]

α = 1: Shannon entropy of the generalized distribution

G1(y;x,w) = - [ Σh Pw(y,h|x) log Pw(y,h|x) ] / [ Σh Pw(y,h|x) ]

Page 19: Loss-based Learning with Latent Variables

Rényi Entropy

Gα(y;x,w) = (1/(1-α)) log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]

α = ∞: minimum entropy of the generalized distribution

G∞(y;x,w) = - maxh log Pw(y,h|x)

Page 20: Loss-based Learning with Latent Variables

Rényi Entropy

Gα(y;x,w) = (1/(1-α)) log [ Σh Pw(y,h|x)^α / Σh Pw(y,h|x) ]

α = ∞: minimum entropy of the generalized distribution

G∞(y;x,w) = log Z(x) - maxh wTΨ(x,y,h)

Since log Z(x) does not depend on y, argminy G∞ gives the same prediction as latent SVM.

Page 21: Loss-based Learning with Latent Variables

Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

w* = argminw Σi Δ(yi,yi(w))

Highly non-convex in w; cannot regularize w to prevent overfitting

Page 22: Loss-based Learning with Latent Variables

Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

Δ(yi,yi(w)) = Gα(yi(w);xi,w) + Δ(yi,yi(w)) - Gα(yi(w);xi,w)

≤ Gα(yi;xi,w) + Δ(yi,yi(w)) - Gα(yi(w);xi,w)

≤ maxy { Gα(yi;xi,w) + Δ(yi,y) - Gα(y;xi,w) }

Page 23: Loss-based Learning with Latent Variables

Learning M3E

Training data {(xi,yi), i = 1,2,…,n}

minw ||w||2 + C Σi ξi

Gα(yi;xi,w) + Δ(yi,y) - Gα(y;xi,w) ≤ ξi, for all y

As α tends to infinity, M3E reduces to latent SVM

Other values of α can give better results

Page 24: Loss-based Learning with Latent Variables

Image Classification

Mammals Dataset: 271 images, 6 classes

90/10 train/test split, 5 folds

0/1 loss

Page 25: Loss-based Learning with Latent Variables

Image Classification

HOG-based model (Dalal and Triggs, 2005). [Results chart not reproduced in transcript.]

Page 26: Loss-based Learning with Latent Variables

Motif Finding

UniProbe Dataset: ~40,000 sequences

50/50 train/test split; 5 proteins, 5 folds

Binding vs. not-binding

Page 27: Loss-based Learning with Latent Variables

Motif + Markov background model (Yu and Joachims, 2009). [Results chart not reproduced in transcript.]

Motif Finding

Page 28: Loss-based Learning with Latent Variables

Recap

Scoring function: Pw(y,h|x) = exp(wTΨ(x,y,h))/Z(x)

Prediction: y(w) = argminy Gα(y;x,w)

Learning: minw ||w||2 + C Σi ξi

s.t. Gα(yi;xi,w) + Δ(yi,y) - Gα(y;xi,w) ≤ ξi, for all y

Page 29: Loss-based Learning with Latent Variables

• Latent SVM

• Max-Margin Min-Entropy Models

• Dissimilarity Coefficient Learning

Outline

Kumar, Packer and Koller, ICML 2012

Page 30: Loss-based Learning with Latent Variables

Object Detection

(Same running example: images xi, labels yi, boxes hi; image x with y = "Deer".)

Page 31: Loss-based Learning with Latent Variables

Minimizing General Loss

minw Σi Δ(yi,hi,yi(w),hi(w)) + Σi Δ'(yi,yi(w),hi(w))

The first sum is over supervised samples; the second is over weakly supervised samples, whose latent variable values are unknown.

Page 32: Loss-based Learning with Latent Variables

Minimizing General Loss

minw Σi Σhi Δ(yi,hi,yi(w),hi(w)) Pw(hi|xi,yi)

A single distribution to achieve two objectives

Page 33: Loss-based Learning with Latent Variables

Problem

Model Uncertainty in Latent Variables

Model Accuracy of Latent Variable Predictions

Page 34: Loss-based Learning with Latent Variables

Solution

Model Uncertainty in Latent Variables

Model Accuracy of Latent Variable Predictions

Use two different distributions for the two different tasks

Page 35: Loss-based Learning with Latent Variables

Solution

Use two different distributions for the two different tasks

Model uncertainty in latent variables: Pθ(hi|yi,xi), a distribution over hi

Model accuracy of latent variable predictions

Page 36: Loss-based Learning with Latent Variables

Solution

Use two different distributions for the two different tasks

Pθ(hi|yi,xi): a distribution over the latent variables hi (models uncertainty)

Pw(yi,hi|xi): a delta distribution at the prediction (yi(w),hi(w)) (models accuracy)

Page 37: Loss-based Learning with Latent Variables

The Ideal Case

No latent variable uncertainty, correct prediction: Pθ(hi|yi,xi) puts all its mass on the ground-truth hi, and the delta of Pw(yi,hi|xi) sits at (yi,hi(w)) with hi(w) = hi.

Page 38: Loss-based Learning with Latent Variables

In Practice

Restrictions in the representation power of the models: Pθ(hi|yi,xi) spreads its mass over hi, and the prediction (yi(w),hi(w)) of Pw need not match (yi,hi).

Page 39: Loss-based Learning with Latent Variables

Our Framework

Minimize the dissimilarity between the two distributions, using a user-defined dissimilarity measure

Page 40: Loss-based Learning with Latent Variables

Our Framework

Minimize Rao's Dissimilarity Coefficient. First term, the expected loss between Pθ and the prediction of Pw:

Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

Page 41: Loss-based Learning with Latent Variables

Our Framework

Minimize Rao's Dissimilarity Coefficient:

Hi(w,θ) = Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

minus the self-diversity of Pθ:

- β Σh,h' Δ(yi,h,yi,h') Pθ(h|yi,xi) Pθ(h'|yi,xi)

Page 42: Loss-based Learning with Latent Variables

Our Framework

Minimize Rao's Dissimilarity Coefficient:

Hi(w,θ) - β Hi(θ,θ) - (1-β) Δ(yi(w),hi(w),yi(w),hi(w))

The last term is zero, since the loss of an output against itself vanishes.

Page 43: Loss-based Learning with Latent Variables

Our Framework

Minimize Rao's Dissimilarity Coefficient:

minw,θ Σi { Hi(w,θ) - β Hi(θ,θ) }

Page 44: Loss-based Learning with Latent Variables

Optimization

minw,θ Σi Hi(w,θ) - β Hi(θ,θ)

Initialize the parameters to w0 and θ0

Repeat until convergence:

  Fix w and optimize θ

  Fix θ and optimize w

End

Page 45: Loss-based Learning with Latent Variables

Optimization of θ

minθ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

Case I: yi(w) = yi. [Diagram: Pθ(h|yi,xi) over h, with the prediction hi(w) marked.]

Page 47: Loss-based Learning with Latent Variables

Optimization of θ

minθ Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi) - β Hi(θ,θ)

Case II: yi(w) ≠ yi. [Diagram: Pθ(h|yi,xi) over h.]

Solved by stochastic subgradient descent

Page 49: Loss-based Learning with Latent Variables

Optimization of w

minw Σi Σh Δ(yi,h,yi(w),hi(w)) Pθ(h|yi,xi)

Expected loss: models the uncertainty in h

The form of the optimization is similar to latent SVM; if Δ is independent of h, it reduces to latent SVM

Concave-Convex Procedure (CCCP)

Page 50: Loss-based Learning with Latent Variables

Object Detection

Bison

Deer

Elephant

Giraffe

Llama

Rhino

Input x; output y = "Deer"; latent variable h

Mammals Dataset: 60/40 train/test split, 5 folds

Training data: inputs xi, outputs yi

Page 51: Loss-based Learning with Latent Variables

Results – 0/1 Loss

[Bar chart: average test loss on folds 1-5, LSVM vs. ours; the improvement is statistically significant.]

Page 52: Loss-based Learning with Latent Variables

Results – Overlap Loss

[Bar chart: average test loss on folds 1-5, LSVM vs. ours.]

Page 53: Loss-based Learning with Latent Variables

Action Detection

Input x; output y = "Using Computer"; latent variable h

PASCAL VOC 2011: 60/40 train/test split, 5 folds

Classes:

Jumping

Phoning

Playing Instrument

Reading

Riding Bike

Riding Horse

Running

Taking Photo

Using Computer

Walking

Training data: inputs xi, outputs yi

Page 54: Loss-based Learning with Latent Variables

Results – 0/1 Loss

[Bar chart: average test loss on folds 1-5, LSVM vs. ours; the improvement is statistically significant.]

Page 55: Loss-based Learning with Latent Variables

Results – Overlap Loss

[Bar chart: average test loss on folds 1-5, LSVM vs. ours; the improvement is statistically significant.]

Page 56: Loss-based Learning with Latent Variables

Conclusions

• Latent SVM ignores latent variable uncertainty

• M3E handles losses that are independent of the latent variables

• DISC handles losses that depend on the latent variables

• Both are strict generalizations of latent SVM

• Code available online

Page 57: Loss-based Learning with Latent Variables

Questions?

http://cvc.centrale-ponts.fr/personnel/pawan

