CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added)
Transcript
Page 1: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

CHAPTER 7:

Clustering

Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added)

Last updated: February 25, 2014

Page 2: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

2

k-Means Clustering

Problem: Find k reference vectors M = {m_1, …, m_k} (representatives/centroids) and a 1-of-k coding scheme b_i^t (b_i^t: X → M) that minimizes the "reconstruction error":

Both b_i^t and M have to be found so as to minimize E:

Step 1: find the best b_i^t:

Step 2: find M: m_i = (Σ_t b_i^t x^t) / (Σ_t b_i^t)

Reconstruction error:

E\big(\{m_i\}_{i=1}^{k} \,\big|\, X\big) \;=\; \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \,\lVert x^t - m_i \rVert^2

Best b_i^t (Step 1):

b_i^t \;=\; \begin{cases} 1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\ 0 & \text{otherwise} \end{cases}

Problem: b_i^t depends on M! Only an iterative solution can be found. (Significantly different from the book!)


Page 3: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

Lecture Notes for E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)

3

k-means Clustering

Page 4: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

4

k-Means Clustering Reformulated

Given: dataset X, k (and a random seed)

Problem: Find k vectors M = {m_1, …, m_k} ⊆ ℝ^d and a 1-of-k coding scheme b_i^t (b_i^t: X → M) that minimizes the following error:

E\big(\{m_i\}_{i=1}^{k} \,\big|\, X\big) \;=\; \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \,\lVert x^t - m_i \rVert^2

Subject to: b_i^t \in \{0,1\} and \sum_{i=1}^{k} b_i^t = 1 for t = 1, …, N ("coding scheme")


Page 5: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

5

k-Means Clustering Generalized

Given: dataset X, k, distance function d

Problem: Find k vectors M = {m_1, …, m_k} ⊆ ℝ^d and a 1-of-k coding scheme b_i^t (b_i^t: X → M) that minimizes the following error:

E'\big(\{m_i\}_{i=1}^{k} \,\big|\, X\big) \;=\; \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \, d(x^t, m_i)^2

Subject to: b_i^t \in \{0,1\} and \sum_{i=1}^{k} b_i^t = 1 for t = 1, …, N ("coding scheme")


Page 6: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

6

K-Means' Complexity: O(k·n·t·d)

K-means performs t iterations of E/M-steps; in each iteration the following computations have to be done:

E-step:
1. Find the nearest centroid for each object; because there are n objects and k centroids, n·k distances have to be computed and compared; computing one distance is O(d), yielding O(k·n·d).
2. Assign objects to clusters / update clusters: O(n).

M-step: Compute the centroid of each cluster: O(n).

We obtain: O(t·(k·n·d + n + n)) = O(t·k·n·d) (see the sketch below).
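A minimal sketch of these E/M-steps, assuming a NumPy array X of shape (n, d) and Euclidean distance; the function and variable names are illustrative, not from the original slides:

```python
import numpy as np

def k_means(X, k, t_max=100, seed=0):
    """Plain k-means: t iterations of E-step (assign) and M-step (recompute centroids)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    M = X[rng.choice(n, size=k, replace=False)]          # random initial centroids
    for _ in range(t_max):                               # t iterations
        # E-step: n*k distances, each O(d)  ->  O(k*n*d)
        dist = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)   # shape (n, k)
        labels = dist.argmin(axis=1)                     # nearest centroid per object
        # M-step: centroid of each cluster
        new_M = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else M[i]
                          for i in range(k)])
        if np.allclose(new_M, M):                        # stop when centroids no longer move
            break
        M = new_M
    return M, labels
```

Each pass through the loop is dominated by the n×k distance matrix, which is exactly the O(k·n·d) term in the bound above.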


Page 7: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

Lecture Notes for E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)

7

Semiparametric Density Estimation

Parametric: Assume a single model for p(x | C_i) (Chapters 4 and 5)

Semiparametric: p(x | C_i) is a mixture of densities; multiple possible explanations/prototypes

Nonparametric: No model; the data speaks for itself (Chapter 8)

Page 8: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

8

Mixture Densities

p(x) \;=\; \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)

where G_i are the components/groups/clusters, P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.

Gaussian mixture: p(x | G_i) ~ N(μ_i, Σ_i), with parameters Φ = {P(G_i), μ_i, Σ_i}_{i=1}^{k} estimated from an unlabeled sample X = {x^t}_t (unsupervised learning).

Idea: use a density function for each cluster or use multiple density functions for a single class
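As a quick illustration of the formula above, a minimal sketch of evaluating the mixture density p(x) = Σ_i p(x | G_i) P(G_i); the function and parameter names are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, priors, means, covs):
    """p(x) = sum_i P(G_i) * p(x | G_i) for a Gaussian mixture."""
    return sum(P * multivariate_normal(mean=m, cov=S).pdf(x)
               for P, m, S in zip(priors, means, covs))
```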

Page 9: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

9

Expectation-Maximization (EM) Clustering

Assumption: the dataset X is created by a mixture of k multivariate Gaussians G_1, …, G_k together with their priors P(G_1), …, P(G_k).

Example (2-D dataset):
p(x|G_1) ~ N((2,3), 1), P(G_1) = 0.5
p(x|G_2) ~ N((-4,1), 2), P(G_2) = 0.4
p(x|G_3) ~ N((0,-9), 3), P(G_3) = 0.1

EM's task: Given X, reconstruct the k Gaussians and their priors; each Gaussian is a model of a cluster in the dataset.

Forming of clusters: Assign x_i to the cluster G_r with the maximum value of p(x_i|G_r)·P(G_r); alternatively, assign x_i fractionally with weights h_i^r derived from p(x_i|G_r)·P(G_r) (soft EM).
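A sketch of this cluster-forming rule, instantiated with the 2-D example above; the scalar covariances 1, 2, 3 are read as σ²·I, and the test point is made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example mixture from the slide: p(x|G_r) ~ N(mu_r, sigma2_r * I), priors P(G_r)
priors = np.array([0.5, 0.4, 0.1])
means  = [np.array([2.0, 3.0]), np.array([-4.0, 1.0]), np.array([0.0, -9.0])]
covs   = [1.0 * np.eye(2), 2.0 * np.eye(2), 3.0 * np.eye(2)]

x = np.array([1.0, 2.0])                                  # some point to cluster
scores = np.array([P * multivariate_normal(m, S).pdf(x)   # p(x|G_r) * P(G_r)
                   for P, m, S in zip(priors, means, covs)])

hard_label = scores.argmax()          # hard assignment: cluster with maximum score
soft_h     = scores / scores.sum()    # soft EM: normalized weights h_r for x
```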

Page 10: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

10

Performance Measure of EM

EM maximizes the log likelihood of a mixture model:

We choose the Φ which makes X most likely.

Assume hidden variables z which, when known, make the optimization much simpler. z indicates the (unknown) membership of an object in a particular cluster; z_i^t is not known and has to be estimated; h_i^t denotes its estimator.

EM maximizes the log likelihood of the sample, but instead of maximizing L(Φ|X) it maximizes L(Φ|X,Z), in terms of x and z.

\mathcal{L}(\Phi \mid X) \;=\; \sum_{t} \log p(x^t \mid \Phi) \;=\; \sum_{t} \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)


Page 11: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

Lecture Notes for E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)

11

E- and M-steps

Iterate the two steps:
1. E-step: Estimate z given X and the current Φ.
2. M-step: Find the new Φ' given z, X, and the old Φ.

E-step: \;\; Q(\Phi \mid \Phi^{l}) \;=\; E\big[\, \mathcal{L}_C(\Phi \mid X, Z) \,\big|\, X, \Phi^{l} \,\big]

M-step: \;\; \Phi^{l+1} \;=\; \arg\max_{\Phi} \, Q(\Phi \mid \Phi^{l})

An increase in Q increases the incomplete likelihood:

\mathcal{L}(\Phi^{l+1} \mid X) \;\ge\; \mathcal{L}(\Phi^{l} \mid X)

(Z: assignments of samples to clusters; Φ^l: current model)

Remark: z_i^t denotes the unknown membership of the t-th example.

Page 12: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

12

EM in Gaussian Mixtures

z_i^t denotes the (unknown) membership of x^t in component G_i; assume p(x|G_i) ~ N(μ_i, Σ_i).

E-step: Estimate the mixture probabilities for X = {x^1, …, x^n}

M-step: Obtain Φ^{l+1}, namely (P(G_i), m_i, S_i)^{l+1} for i = 1, …, k

Use the estimated labels in place of the unknown labels.

E-step:

h_i^t \;=\; E\big[z_i^t \,\big|\, X, \Phi^{l}\big] \;=\; \frac{p(x^t \mid G_i, \Phi^{l})\, P(G_i)}{\sum_{j} p(x^t \mid G_j, \Phi^{l})\, P(G_j)}

M-step:

P(G_i) \;=\; \frac{\sum_{t} h_i^t}{N}

m_i^{\,l+1} \;=\; \frac{\sum_{t} h_i^t\, x^t}{\sum_{t} h_i^t}

S_i^{\,l+1} \;=\; \frac{\sum_{t} h_i^t\, (x^t - m_i^{\,l+1})(x^t - m_i^{\,l+1})^{T}}{\sum_{t} h_i^t}

Remarks: h_i^t plays the role of b_i^t in k-means; h_i^t acts as an estimator for the unknown labels z_i^t.
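A minimal sketch of one E-step/M-step pass implementing the formulas above, assuming data X of shape (N, d) and current parameters priors, means, covs; the small regularization term that keeps the covariances invertible is an addition for numerical robustness, not part of the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_iteration(X, priors, means, covs, reg=1e-6):
    """One EM step for a Gaussian mixture: compute h_i^t, then update P(G_i), m_i, S_i."""
    N, d = X.shape
    k = len(priors)

    # E-step: h_i^t = P(G_i) p(x^t|G_i) / sum_j P(G_j) p(x^t|G_j)
    h = np.column_stack([priors[i] * multivariate_normal(means[i], covs[i]).pdf(X)
                         for i in range(k)])             # shape (N, k)
    h /= h.sum(axis=1, keepdims=True)

    # M-step: re-estimate priors, means, and covariances from the soft labels h
    Nk = h.sum(axis=0)                                   # effective cluster sizes
    new_priors = Nk / N
    new_means = (h.T @ X) / Nk[:, None]
    new_covs = []
    for i in range(k):
        diff = X - new_means[i]
        S = (h[:, i][:, None] * diff).T @ diff / Nk[i]
        new_covs.append(S + reg * np.eye(d))             # keep S_i positive definite
    return new_priors, new_means, new_covs
```

Iterating this step until the log likelihood stops improving gives the full EM clustering loop.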

Page 13: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

13

EM Means Expectation-Maximization

1. E-step (Expectation): Determine object membership in clusters (from the model).
2. M-step (Maximization): Create the model by Maximum Likelihood Estimation (from the cluster memberships).

Computations in the two steps (K-means / EM):
1. Compute b_i^t / h_i^t
2. Compute the centroid / (prior, mean, co-variance matrix)

Remark: K-means uses a simple form of the EM-procedure!


Page 14: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

14

Commonalities between K-Means and EM

1. Both start from random clusters and rely on a two-step approach to minimize the objective function using the EM procedure (see the previous transparency).

2. Both use the same optimization procedure for an objective function f(a_1, …, a_m, b_1, …, b_k): basically, optimize the a-values (keeping the b-values fixed) and then the b-values (keeping the a-values fixed) until some convergence is reached. Consequently, both algorithms only find a local optimum of the objective function and are sensitive to initialization.

3. Both assume that the number of clusters k is known.


Page 15: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

15

Differences between K-Means and EM I

1. K-means is distance-based and relies on 1-NN queries to form clusters. EM is density-based/probabilistic; EM usually works with multivariate Gaussians but can be generalized to work with other probability distributions.

2. K-means minimizes the squared distance of an object to its cluster prototype (usually the centroid). EM maximizes the log-likelihood of a sample given a model, p(X|Φ); models are assumed to be mixtures of k Gaussians and their priors.

3. K-means is a hard clustering algorithm, EM is a soft clustering algorithm: h_i^t ∈ [0, 1].


Page 16: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

16

Differences between K-Means and EM II

4. K-means cluster models are just k centroids; EM models are k triples of "prior, mean, co-variance matrix".

5. EM directly deals with dependencies between attributes in its density estimation approach: the degree to which an object x belongs to a cluster c depends on the product of c's prior with a Gaussian density that involves the Mahalanobis distance between x and c's mean; therefore, EM clusters do not depend on the units of measurement or on the orientation of attributes in space.

6. The distance metric can be viewed as an input parameter when using k-means, and generalizations of k-means have been proposed which use different distance functions. EM implicitly relies on the Mahalanobis distance function, which is part of its density estimation approach.
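For reference, a small sketch of the squared Mahalanobis distance that appears inside EM's density estimation (names are illustrative); unlike the Euclidean distance, it is unaffected by rescaling or rotating the attributes, which is the point of item 5:

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - m)^T S^{-1} (x - m)."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))
```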


Page 17: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

Lecture Notes for E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)

17

Mixture of Mixtures

In classification, the input comes from a mixture of classes (supervised).

If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:

p(x \mid C_i) \;=\; \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij})

p(x) \;=\; \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)

Page 18: CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.

Lecture Notes for E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)

18

Choosing k

Defined by the application, e.g., image quantization

Plot the data (after PCA) and check for clusters

Incremental (leader-cluster) algorithm: add one cluster at a time until an "elbow" appears (in reconstruction error / log likelihood / intergroup distances)

Manual check for meaning

Run with multiple k values and compare the results (see the sketch below)
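A sketch of the "run with multiple k values" / elbow idea, here using scikit-learn's KMeans and its reconstruction error (inertia_); the library choice and the placeholder dataset are assumptions, not part of the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder dataset

errors = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    errors[k] = km.inertia_                           # sum of squared distances to centroids

# Inspect errors for the k where the decrease levels off (the "elbow")
for k, e in errors.items():
    print(k, round(e, 2))
```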

