Main Clustering Algorithms
K-Means
Hierarchical
SOM
K-Means
MacQueen, 1967
clusters defined by means/centroids
Many clustering algorithms are derivatives of K-Means
Widespread use in industry and academia, despite its many problems
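The basic loop — assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster — can be sketched in Python. This is a minimal illustration; the random initialization and stopping rule below are common choices, not something the slides prescribe:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Minimal K-Means: alternate nearest-centroid assignment and
    centroid (mean) update. Assumes no cluster goes empty, which is
    fine for this sketch."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids
```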
K-Means Example
Hierarchical Clustering
Starts by treating each point as its own cluster
Iteratively links most similar pair of clusters
User-defined threshold parameter specifies the output clusters
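The three bullets above can be sketched as a bottom-up merge loop. Single linkage (distance between the closest pair of members) is used here as one choice among the linkage methods; the helper and its threshold semantics are illustrative, not from the slides:

```python
import numpy as np

def agglomerative(X, threshold):
    """Start with one cluster per point, repeatedly merge the most
    similar (closest) pair of clusters, and stop once no pair is
    closer than `threshold`. Single linkage is used as the distance
    between clusters."""
    clusters = [[i] for i in range(len(X))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        d, a, b = min(pairs)
        if d > threshold:
            break  # remaining clusters are the output
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```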
Hierarchical Clustering Variants In Minitab©
Linkage methods: Average, Centroid, Complete, McQuitty, Median, Single, Ward
Distance measures: Euclidean, Manhattan, Pearson, Squared Euclidean, Squared Pearson
Hierarchical Clustering Example
Results
Still There are Problems
Clustering Documents: "bag of words"
Each document Di is a vector of word frequencies over the vocabulary W1, W2, …, Wn:
D1: (f11, f21, f31, …, fn1)
D2: (f12, f22, f32, …, fn2)
…
Dm: (f1m, f2m, f3m, …, fnm)
M: the matrix whose rows are the Di
Similarity between Di and Dj: the inner product <Di, Dj>
Cluster Centroid
Cluster defined by distance to its centroid C:
C = (1/m) Σi Di, where m is the number of vectors
Elevations
Elevation of D: El(D) = <C, D>
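The centroid and elevation definitions translate directly into code (a minimal sketch, with the document vectors as rows of a NumPy array):

```python
import numpy as np

def centroid(M):
    """C = (1/m) * sum_i Di: the mean of the document vectors (rows of M)."""
    return M.mean(axis=0)

def elevation(C, D):
    """El(D) = <C, D>: the inner product of D with the centroid,
    i.e. how 'high' D sits relative to the cluster centre."""
    return float(np.dot(C, D))
```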
[Figure: the clustering we get (Problem) vs. the clustering we would like]
Mapping to a Higher Dimension: Utilizing a Kernel Function K(X, Y)
K(X, Y) = <Φ(X), Φ(Y)>, where X, Y are vectors in R^n and Φ is a mapping into R^d, d >> n
Key element in Support Vector Machines
The data needs to appear only through dot products: <Di, Dj>
Kernel Function Examples
Polynomial: K(X, Y) = (<X, Y> + 1)^n
Feedforward neural network (sigmoid): K(X, Y) = tanh(β<X, Y> + b)
Radial basis: K(X, Y) = exp(−‖X − Y‖² / 2σ²)
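The three example kernels as plain functions. The parameter values (n, β, b, σ) below are illustrative defaults, not values from the slides:

```python
import numpy as np

def poly_kernel(X, Y, n=2):
    """Polynomial kernel: (<X, Y> + 1)^n."""
    return (np.dot(X, Y) + 1) ** n

def sigmoid_kernel(X, Y, beta=1.0, b=0.0):
    """'Feedforward neural network' kernel: tanh(beta*<X, Y> + b)."""
    return np.tanh(beta * np.dot(X, Y) + b)

def rbf_kernel(X, Y, sigma=1.0):
    """Radial basis kernel: exp(-||X - Y||^2 / (2*sigma^2))."""
    return np.exp(-np.linalg.norm(X - Y) ** 2 / (2 * sigma ** 2))
```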
First Step: Penalizing Outliers
Iterate, weighting each vector by its similarity to the normalized previous centroid:
Ck = (1/m) Σi <Di, N(Ck−1)> Di   (1)
Convergence: C = principal eigenvector of MᵀM, where M is the matrix of the Di's:
C = lim L→∞ (MᵀM)^L U   (2)
Both (1) and (2) are efficient methods of computing C.
In feature space, however, we cannot compute
Fk = (1/m) Σi <Φ(Di), N(Fk−1)> Φ(Di)
directly, nor use (2): MΦ, the matrix of the Φ(Di)'s, has unmanageable (eventually infinite) dimension.
So instead we track the coefficients (up to the normalization N):
αi^k = <Φ(Di), N(Fk−1)> = (1/m) Σj αj^(k−1) <Φ(Di), Φ(Dj)> = (1/m) Σj αj^(k−1) K(Di, Dj)   (3)
Using kernels replaces the explicit images Φ(D1), Φ(D2), …
Theorem
F = Σi αi* Φ(Di)
where αi* = lim k→∞ αi^k, and αi^k = (1/m) Σj αj^(k−1) K(Di, Dj)
El(D), the elevation of vector D: El(D) = Σi αi* K(Di, D)
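Iteration (3) can be run entirely on the kernel (Gram) matrix, never forming Φ(Di). A sketch, assuming the normalization N(·) amounts to rescaling the coefficient vector (the slides do not spell this out):

```python
import numpy as np

def kernel_elevations(K, n_iter=50):
    """Iterate alpha_i^k = (1/m) * sum_j alpha_j^(k-1) * K[i, j] on the
    kernel matrix K, normalizing each step (my reading of N(.)).
    This is power iteration, so alpha converges to the principal
    eigenvector of K; the return value gives the elevations
    El(Di) = sum_j alpha_j* K(Dj, Di)."""
    m = K.shape[0]
    alpha = np.full(m, 1.0 / m)          # uniform start
    for _ in range(n_iter):
        alpha = K @ alpha / m
        alpha /= np.linalg.norm(alpha)   # normalization step
    return K @ alpha                     # elevations El(Di)
```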
Zoomed Clusters
Clusters are defined through peaks. Peaks are all vectors which are the highest in their vicinity:
PEAKS = {Dj : El(Dj) ≥ El(Di) for all i with <Di, Dj> ≥ S}
S: sharpening/smoothing parameter
Cluster: the set of vectors in the vicinity of a peak
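The peak test can be sketched directly from the set definition above. The vicinity test `K[i, j] >= S` is my reading of the definition; larger S shrinks the vicinity (sharpening), smaller S grows it (smoothing):

```python
import numpy as np

def find_peaks(K, el, S=1.0):
    """Return the indices j such that Dj is a peak: no vector Di in
    its vicinity (K[i, j] >= S) has a higher elevation than Dj."""
    m = len(el)
    peaks = []
    for j in range(m):
        neighbours = [i for i in range(m) if K[i, j] >= S]
        if all(el[j] >= el[i] for i in neighbours):
            peaks.append(j)
    return peaks
```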
[Figure: scatter plot with clusters C1 and C2. Kernel: Linear, S: Default (1)]
Clustering Example
Zooming Example
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Linear, S: Default (1)]
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 2, S: 16]
Zoomed Clusters Results
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 8000, S: 1.5]
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 8000, S: Default (1)]
Clustering MicroArray Data
Rows: genes. Columns: experiments. Entry (i, j): the expression level of gene i during experiment j.
MicroArrays As Time Series
Clustering Time Series
Reveals groups of genes that react similarly to the experiments
Functionally related genes should cluster
Simulated Time Series
Simulated 180 time series, with 3 clusters and 9 sub-clusters (20 series per sub-cluster)
Each time series is a vector with 1000 components; each component is the expression level at a given time
Results
Kernel: Polynomial Degree 3, S: 6
Kernel: Polynomial Degree 3, S: 7
Kernel: Polynomial Degree 6, S: 15
HMM Parameter Estimation
Sequential K-Means → Initial HMM Model → Baum-Welch Algorithm or Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Parameter Estimation with Zoomed Clusters
Zoomed Clusters → Initial HMM Model → Baum-Welch Algorithm or Viterbi Algorithm → Refinement of HMM Model → Final HMM Model
Advantages:
• Flexibility with number of states
• Initial Model is closer to the final one
Consequences:
• Higher accuracy and faster convergence for either Baum-Welch or Viterbi
Example: Coins
HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT
Coin 1: 100% Heads
Coin 2: 100% Tails
Coin 3: 50% Heads, 50% Tails
• Regions with similar statistical distribution of Heads and Tails represent the states in the initial HMM Model
• Use Elevation Functions, separately for Heads and Tails to represent these distributions
HHHHH HHHHHHH H H H H H
TTTTTTT T T T T T TTTTTTTT
Step 1: Separate the letters (Heads and Tails)
Step 2: Calculate an elevation function for each letter
Step 3: For each position i in the sequence of throws, get the elevation functions for Heads and Tails and create a point Di in R^2 whose components are the two elevations: Di = [Eh, Et]
Step 4: Cluster all the points obtained from each position
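Steps 1-3 can be sketched as follows. Windowed letter frequencies stand in for the per-letter elevation functions here, which is an assumption — the slides do not specify how the elevations are computed for a sequence:

```python
def coin_points(seq, window=5):
    """For each position i in a string of 'H'/'T' throws, build a point
    Di = (Eh, Et) in R^2 whose components are local measures of Heads
    and Tails around i (windowed frequencies as a simple stand-in for
    the elevation functions)."""
    points = []
    n = len(seq)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        chunk = seq[lo:hi]
        eh = chunk.count('H') / len(chunk)   # local Heads level
        et = chunk.count('T') / len(chunk)   # local Tails level
        points.append((eh, et))
    return points
```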
What Clustering Achieves
Each cluster defines regions of similar distributions of heads and tails
Each Cluster is a state in the initial HMM model
State transition and emission probabilities are estimated from the clusters
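The final estimation step — counting transitions and emissions once each position has a cluster/state label — can be sketched as (a minimal count-based estimate; smoothing is omitted):

```python
from collections import Counter

def estimate_hmm(states, emissions):
    """Estimate transition and emission probabilities by counting,
    given the state (cluster) label and the emitted symbol at each
    position of the sequence."""
    # Transition probabilities: P(t | s) = count(s -> t) / count(s as a source).
    from_counts = Counter(states[:-1])
    trans = Counter(zip(states[:-1], states[1:]))
    A = {pair: c / from_counts[pair[0]] for pair, c in trans.items()}
    # Emission probabilities: P(o | s) = count(s emits o) / count(s).
    state_counts = Counter(states)
    emit = Counter(zip(states, emissions))
    B = {pair: c / state_counts[pair[0]] for pair, c in emit.items()}
    return A, B
```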
References
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281-297. University of California Press, Berkeley.
Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Computing Surveys 31(3).
http://www.gene-chips.com/ by Leming Shi, Ph.D.