Page 1

ECE 5984: Introduction to Machine Learning

Dhruv Batra Virginia Tech

Topics: –  Unsupervised Learning: Kmeans, GMM, EM

Readings: Barber 20.1-20.3

Page 2

Midsem Presentations Graded
•  Mean: 8/10 = 80%
–  Min: 3
–  Max: 10

(C) Dhruv Batra 2

Page 3

Tasks

(C) Dhruv Batra 3

Supervised Learning:
–  Classification: x → y, y discrete
–  Regression: x → y, y continuous

Unsupervised Learning:
–  Clustering: x → c, c a discrete ID
–  Dimensionality Reduction: x → z, z continuous

Page 4

Unsupervised Learning •  Learning only with X

–  Y not present in training data

•  Some example unsupervised learning problems: –  Clustering / Factor Analysis –  Dimensionality Reduction / Embeddings –  Density Estimation with Mixture Models

(C) Dhruv Batra 4

Page 5

New Topic: Clustering

Slide Credit: Carlos Guestrin 5

Page 6

Synonyms •  Clustering

•  Vector Quantization

•  Latent Variable Models •  Hidden Variable Models •  Mixture Models

•  Algorithms: –  K-means –  Expectation Maximization (EM)

(C) Dhruv Batra 6

Page 7

Some Data

7 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 8

K-means

1.  Ask user how many clusters they’d like (e.g., k = 5).

8 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 9

K-means

1.  Ask user how many clusters they’d like (e.g., k = 5).

2.  Randomly guess k cluster Center locations.

9 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 10

K-means

1.  Ask user how many clusters they’d like (e.g., k = 5).

2.  Randomly guess k cluster Center locations.

3.  Each datapoint finds out which Center it’s closest to. (Thus each Center “owns” a set of datapoints.)

10 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 11

K-means

1.  Ask user how many clusters they’d like (e.g., k = 5).

2.  Randomly guess k cluster Center locations.

3.  Each datapoint finds out which Center it’s closest to.

4.  Each Center finds the centroid of the points it owns.

11 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 12

K-means

1.  Ask user how many clusters they’d like (e.g., k = 5).

2.  Randomly guess k cluster Center locations.

3.  Each datapoint finds out which Center it’s closest to.

4.  Each Center finds the centroid of the points it owns…

5.  …and jumps there

6.  …Repeat until terminated!

12 (C) Dhruv Batra Slide Credit: Carlos Guestrin

Page 13

K-means

•  Randomly initialize k centers:
–  $\mu^{(0)} = \mu_1^{(0)}, \ldots, \mu_k^{(0)}$

•  Assign:
–  Assign each point $i \in \{1, \ldots, N\}$ to the nearest center:
   $C(i) \leftarrow \arg\min_j \|x_i - \mu_j\|^2$

•  Recenter:
–  $\mu_j$ becomes centroid of its points

13 (C) Dhruv Batra Slide Credit: Carlos Guestrin
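A minimal NumPy sketch of these two alternating steps (a sketch under assumptions: data stacked as an N×d array X, a fixed iteration budget instead of a convergence test; this is not the course's reference code):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Lloyd's algorithm: alternate the Assign and Recenter steps."""
        rng = np.random.default_rng(seed)
        # Initialize: pick k distinct data points as the initial centers
        mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            # Assign: C(i) <- argmin_j ||x_i - mu_j||^2
            dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, k)
            C = dists.argmin(axis=1)
            # Recenter: mu_j <- centroid of the points it owns
            for j in range(k):
                if (C == j).any():
                    mu[j] = X[C == j].mean(axis=0)
        return mu, C

In practice one runs several random restarts and keeps the solution with the lowest objective, since the local optimum found depends on the initialization.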

Page 14

K-means •  Demo

–  http://www.kovan.ceng.metu.edu.tr/~maya/kmeans/
–  http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

(C) Dhruv Batra 14

Page 15

What is K-means optimizing?

•  Objective F(µ, C): function of centers µ and point allocations C:

   $F(\mu, C) = \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2$

•  Equivalently, with a 1-of-k encoding $a$ (where $a_{ij} = 1$ iff point i is allocated to center j):

   $F(\mu, a) = \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Optimal K-means:
–  $\min_{\mu} \min_{a} F(\mu, a)$

15 (C) Dhruv Batra
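A small evaluator for this objective, with the 1-of-k encoding made explicit (a sketch; the array names are mine):

    import numpy as np

    def kmeans_objective(X, mu, C):
        # F(mu, C) = sum_i ||x_i - mu_{C(i)}||^2
        return ((X - mu[C]) ** 2).sum()

    def one_of_k(C, k):
        # 1-of-k encoding: a[i, j] = 1 iff point i is allocated to center j
        a = np.zeros((len(C), k))
        a[np.arange(len(C)), C] = 1.0
        return a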

Page 16

Coordinate descent algorithms

16 (C) Dhruv Batra Slide Credit: Carlos Guestrin

•  Want: $\min_a \min_b F(a, b)$

•  Coordinate descent:
–  fix a, minimize over b
–  fix b, minimize over a
–  repeat

•  Converges!!! –  if F is bounded –  to a (often good) local optimum

•  as we saw in applet (play with it!)

•  K-means is a coordinate descent algorithm!
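A toy illustration of the scheme (my own example, not from the slides): alternating exact minimization of F(a, b) = (a − b)² + (b − 3)², which is bounded below and therefore converges, here to the global optimum a = b = 3:

    def coordinate_descent(n_iters=50):
        # F(a, b) = (a - b)**2 + (b - 3)**2
        a, b = 0.0, 0.0
        for _ in range(n_iters):
            a = b                # fix b, minimize over a: argmin_a (a - b)^2
            b = (a + 3.0) / 2.0  # fix a, minimize over b: set dF/db = 0
        return a, b              # -> (3.0, 3.0), approximately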

Page 17

K-means as Co-ordinate Descent

•  Optimize objective function:

   $\min_{\mu_1, \ldots, \mu_k} \min_{a_1, \ldots, a_N} F(\mu, a) = \min_{\mu_1, \ldots, \mu_k} \min_{a_1, \ldots, a_N} \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Fix µ, optimize a (or C)

17 (C) Dhruv Batra Slide Credit: Carlos Guestrin
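Spelling this step out (not written on the slide, but it follows directly): with µ fixed, F decomposes over points, so each row of a independently selects the closest center:

$a_{ij}^\star = 1$ if $j = \arg\min_{j'} \|x_i - \mu_{j'}\|^2$, and $a_{ij}^\star = 0$ otherwise,

which is exactly the K-means Assign step.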

Page 18

K-means as Co-ordinate Descent

•  Optimize objective function:

   $\min_{\mu_1, \ldots, \mu_k} \min_{a_1, \ldots, a_N} F(\mu, a) = \min_{\mu_1, \ldots, \mu_k} \min_{a_1, \ldots, a_N} \sum_{i=1}^{N} \sum_{j=1}^{k} a_{ij} \|x_i - \mu_j\|^2$

•  Fix a (or C), optimize µ

18 (C) Dhruv Batra Slide Credit: Carlos Guestrin
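Filling in the algebra for this step (the standard derivation, not reproduced on the slide): setting the gradient with respect to µj to zero,

$\frac{\partial F}{\partial \mu_j} = -2 \sum_{i=1}^{N} a_{ij} (x_i - \mu_j) = 0 \quad\Rightarrow\quad \mu_j = \frac{\sum_{i=1}^{N} a_{ij}\, x_i}{\sum_{i=1}^{N} a_{ij}},$

the centroid of the points assigned to cluster j, i.e., the Recenter step.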

Page 19

One important use of K-means •  Bag-of-word models in computer vision

(C) Dhruv Batra 19

Page 20

Bag of Words model

    aardvark  0
    about     2
    all       2
    Africa    1
    apple     0
    anxious   0
    ...
    gas       1
    ...
    oil       1
    Zaire     0

Slide Credit: Carlos Guestrin (C) Dhruv Batra 20

Page 21

Object Bag of ‘words’

Fei-Fei Li

Page 22

Fei-Fei Li

Page 23

Interest Point Features

Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]

Normalize patch

Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic

Page 24

Patch Features

Slide credit: Josef Sivic

Page 25

dictionary formation

Slide credit: Josef Sivic

Page 26

Clustering (usually k-means)

Vector quantization

Slide credit: Josef Sivic
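A sketch of this dictionary-formation plus vector-quantization pipeline (assumptions: local descriptors stacked as rows of a NumPy array, the kmeans sketch from earlier, and a toy codebook size; real systems use SIFT descriptors and much larger codebooks):

    import numpy as np

    def build_codebook(all_descriptors, k=1000):
        # Dictionary formation: cluster local descriptors into k "visual words"
        codebook, _ = kmeans(all_descriptors, k)  # kmeans sketch from above
        return codebook

    def bag_of_words(image_descriptors, codebook):
        # Vector quantization: snap each descriptor to its nearest codeword,
        # then describe the image by its histogram of codeword frequencies
        d = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        words = d.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / hist.sum()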

Page 27

Clustered Image Patches

Fei-Fei et al. 2005

Page 28

Visual synonyms and polysemy

Visual Polysemy. A single visual word occurring on different (but locally similar) parts of different object categories.

Visual Synonyms. Two different visual words representing a similar part of an object (wheel of a motorbike).

Andrew Zisserman

Page 29

Image representation

[Figure: histogram over codewords; the vertical axis is frequency, the horizontal axis lists the codewords]

Fei-Fei Li

Page 30

(One) bad case for k-means

•  Clusters may overlap
•  Some clusters may be “wider” than others

•  GMM to the rescue!

Slide Credit: Carlos Guestrin (C) Dhruv Batra 30

Page 31

GMM

(C) Dhruv Batra 31 Figure Credit: Kevin Murphy


Page 32

Recall Multi-variate Gaussians

(C) Dhruv Batra 32
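The transcript carries no text beyond the title here; for reference, the multivariate Gaussian density in d dimensions is

$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$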

Page 33

GMM

(C) Dhruv Batra 33


Figure Credit: Kevin Murphy
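The density the figure depicts is a mixture of Gaussians; writing it out (the standard definition, not transcribed from the slide):

$p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j), \qquad \pi_j = P(z = j), \quad \sum_{j=1}^{k} \pi_j = 1$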

Page 34

Hidden Data Causes Problems #1 •  Fully Observed (Log) Likelihood factorizes

•  Marginal (Log) Likelihood doesn’t factorize

•  All parameters coupled!

(C) Dhruv Batra 34
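Making the two bullets concrete (my reconstruction of the standard forms, since the details were worked on the board): with observed z the log-likelihood splits into separate per-point terms, whereas the marginal log-likelihood puts a sum inside the log, coupling all the parameters:

$\ell_{\text{obs}} = \sum_{i=1}^{N} \log p(x_i, z_i) = \sum_{i=1}^{N} \left[ \log \pi_{z_i} + \log \mathcal{N}(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]$

$\ell_{\text{marg}} = \sum_{i=1}^{N} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$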

Page 35

GMM vs Gaussian Joint Bayes Classifier •  On Board

–  Observed Y vs Unobserved Z –  Likelihood vs Marginal Likelihood

(C) Dhruv Batra 35

Page 36

Hidden Data Causes Problems #2

(C) Dhruv Batra 36


Figure Credit: Kevin Murphy

Page 37

Hidden Data Causes Problems #2 •  Identifiability

(C) Dhruv Batra 37

[Figures: a two-component dataset with means µ1 and µ2, and what appears to be the likelihood surface over (µ1, µ2), which is symmetric under swapping the two component labels]

Figure Credit: Kevin Murphy

Page 38

Hidden Data Causes Problems #3
•  Likelihood has singularities if one Gaussian “collapses”

(C) Dhruv Batra 38

[Figure: p(x) vs x, with one mixture component collapsing into a spike at a single datapoint]

Page 39

Special case: spherical Gaussians and hard assignments

•  If P(x | z = j) is spherical, with the same σ for all classes:

   $P(x_i \mid z = j) \propto \exp\left(-\frac{1}{2\sigma^2} \|x_i - \mu_j\|^2\right)$

•  If each xi belongs to one class C(i) (hard assignment), marginal likelihood:

   $\prod_{i=1}^{N} \sum_{j=1}^{k} P(x_i, z = j) \propto \prod_{i=1}^{N} \exp\left(-\frac{1}{2\sigma^2} \|x_i - \mu_{C(i)}\|^2\right)$

•  M(M)LE same as K-means!!!

Slide Credit: Carlos Guestrin (C) Dhruv Batra 39
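Taking logs makes the equivalence explicit (a one-line expansion of the slide's claim):

$\log \prod_{i=1}^{N} \exp\left(-\frac{1}{2\sigma^2} \|x_i - \mu_{C(i)}\|^2\right) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \|x_i - \mu_{C(i)}\|^2 = -\frac{1}{2\sigma^2} F(\mu, C),$

so maximizing this hard-assignment likelihood is the same as minimizing the K-means objective F(µ, C).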

Page 40

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

[Figure: data with three component means µ1, µ2, µ3]

Slide Credit: Carlos Guestrin (C) Dhruv Batra 40

Page 41

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following recipe:

[Figure: data with three component means µ1, µ2, µ3]

Slide Credit: Carlos Guestrin (C) Dhruv Batra 41

Page 42

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following recipe:

1.  Pick a component at random: choose component i with probability P(y = i)

[Figure: the chosen component’s mean µ2]

Slide Credit: Carlos Guestrin (C) Dhruv Batra 42

Page 43

The K-means GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix σ²I

Each data point is generated according to the following recipe:

1.  Pick a component at random: choose component i with probability P(y = i)

2.  Datapoint x ~ N(µi, σ²I)

[Figure: a sample x drawn from the component with mean µ2]

Slide Credit: Carlos Guestrin (C) Dhruv Batra 43

Page 44

The General GMM assumption

•  There are k components

•  Component i has an associated mean vector µi

•  Each component generates data from a Gaussian with mean µi and covariance matrix Σi

Each data point is generated according to the following recipe:

1.  Pick a component at random: choose component i with probability P(y = i)

2.  Datapoint x ~ N(µi, Σi)

[Figure: data with three component means µ1, µ2, µ3]

Slide Credit: Carlos Guestrin (C) Dhruv Batra 44
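The recipe translates directly into code; a sampling sketch (assumptions: NumPy, mixing weights pi with pi[i] = P(y = i), means mus, and covariances Sigmas as arrays; the names are mine):

    import numpy as np

    def sample_gmm(pi, mus, Sigmas, n, seed=0):
        """Draw n points from a GMM following the two-step recipe."""
        rng = np.random.default_rng(seed)
        # 1. Pick a component at random with probability P(y = i) = pi[i]
        z = rng.choice(len(pi), size=n, p=pi)
        # 2. Draw each datapoint from N(mu_z, Sigma_z)
        X = np.stack([rng.multivariate_normal(mus[i], Sigmas[i]) for i in z])
        return X, z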

Page 45

K-means vs GMM •  K-Means

–  http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

•  GMM –  http://www.socr.ucla.edu/applets.dir/mixtureem.html

(C) Dhruv Batra 45
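For experimenting beyond the linked applets, scikit-learn (if installed) provides both models; a minimal usage sketch on synthetic data (my example, not course code):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    X = np.random.default_rng(0).normal(size=(300, 2))

    km = KMeans(n_clusters=3, n_init=10).fit(X)
    hard_labels = km.labels_                       # each point owned by one center

    gmm = GaussianMixture(n_components=3).fit(X)   # fit by EM
    soft_labels = gmm.predict_proba(X)             # soft responsibilities per component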

