Main Clustering Algorithms
K-Means
Hierarchical
SOM
K-Means
MacQueen, 1967
clusters defined by means/centroids
Many clustering algorithms are derivatives of K-Means
Widespread use in industry and academia, despite its many problems
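The basic loop — assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster — can be sketched in Python. This is a minimal illustration; the random initialization and stopping rule below are common choices, not something the slides prescribe:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Minimal K-Means: alternate nearest-centroid assignment and
    centroid (mean) update. Assumes no cluster goes empty, which is
    fine for this sketch."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return labels, centroids
```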
K-Means Example
Hierarchical Clustering
Starts by treating each point as its own cluster
Iteratively links most similar pair of clusters
User-defined threshold parameter specifies the output clusters
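The three bullets above can be sketched as a bottom-up merge loop. Single linkage (distance between the closest pair of members) is used here as one choice among the linkage methods; the helper and its threshold semantics are illustrative, not from the slides:

```python
import numpy as np

def agglomerative(X, threshold):
    """Start with one cluster per point, repeatedly merge the most
    similar (closest) pair of clusters, and stop once no pair is
    closer than `threshold`. Single linkage is used as the distance
    between clusters."""
    clusters = [[i] for i in range(len(X))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(np.linalg.norm(X[i] - X[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        d, a, b = min(pairs)
        if d > threshold:
            break  # remaining clusters are the output
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```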
Hierarchical Clustering Variants In Minitab©
Linkage methods: Average, Centroid, Complete, McQuitty, Median, Single, Ward
Distance measures: Euclidean, Manhattan, Pearson, Squared Euclidean, Squared Pearson
Hierarchical Clustering Example
Results
Still There are Problems
Clustering Documents: "bag of words"
Each document Di is a vector of word frequencies over the vocabulary W1, W2, …, Wn:
D1: (f11, f21, f31, …, fn1)
D2: (f12, f22, f32, …, fn2)
…
Dm: (f1m, f2m, f3m, …, fnm)
M: the matrix whose rows are the Di
Similarity between Di and Dj: the inner product <Di, Dj>
Cluster Centroid
Cluster defined by distance to its centroid C:
C = (1/m) Σi Di, where m is the number of vectors
Elevations
Elevation of D: El(D) = <C, D>
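The centroid and elevation definitions translate directly into code (a minimal sketch, with the document vectors as rows of a NumPy array):

```python
import numpy as np

def centroid(M):
    """C = (1/m) * sum_i Di: the mean of the document vectors (rows of M)."""
    return M.mean(axis=0)

def elevation(C, D):
    """El(D) = <C, D>: the inner product of D with the centroid,
    i.e. how 'high' D sits relative to the cluster centre."""
    return float(np.dot(C, D))
```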
[Figure: the clustering we get (Problem) vs. the clustering we would like]
Mapping to a Higher Dimension: Utilizing a Kernel Function K(X, Y)
K(X, Y) = <Φ(X), Φ(Y)>, where X, Y are vectors in R^n and Φ is a mapping into R^d, d >> n
Key element in Support Vector Machines
The data needs to appear only through dot products: <Di, Dj>
Kernel Function Examples
Polynomial: K(X, Y) = (<X, Y> + 1)^n
Feedforward neural network (sigmoid): K(X, Y) = tanh(β<X, Y> + b)
Radial basis: K(X, Y) = exp(−‖X − Y‖² / 2σ²)
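The three example kernels as plain functions. The parameter values (n, β, b, σ) below are illustrative defaults, not values from the slides:

```python
import numpy as np

def poly_kernel(X, Y, n=2):
    """Polynomial kernel: (<X, Y> + 1)^n."""
    return (np.dot(X, Y) + 1) ** n

def sigmoid_kernel(X, Y, beta=1.0, b=0.0):
    """'Feedforward neural network' kernel: tanh(beta*<X, Y> + b)."""
    return np.tanh(beta * np.dot(X, Y) + b)

def rbf_kernel(X, Y, sigma=1.0):
    """Radial basis kernel: exp(-||X - Y||^2 / (2*sigma^2))."""
    return np.exp(-np.linalg.norm(X - Y) ** 2 / (2 * sigma ** 2))
```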
First Step: Penalizing Outliers
Iterate, weighting each vector by its similarity to the normalized previous centroid:
Ck = (1/m) Σi <Di, N(Ck−1)> Di   (1)
Convergence: C = principal eigenvector of MᵀM, where M is the matrix of the Di's:
C = lim L→∞ (MᵀM)^L U   (2)
Both (1) and (2) are efficient methods of computing C.
In feature space, however, we cannot compute
Fk = (1/m) Σi <Φ(Di), N(Fk−1)> Φ(Di)
directly, nor use (2): MΦ, the matrix of the Φ(Di)'s, has unmanageable (eventually infinite) dimension.
So instead we track the coefficients (up to the normalization N):
αi^k = <Φ(Di), N(Fk−1)> = (1/m) Σj αj^(k−1) <Φ(Di), Φ(Dj)> = (1/m) Σj αj^(k−1) K(Di, Dj)   (3)
Using kernels replaces the explicit images Φ(D1), Φ(D2), …
Theorem
F = Σi αi* Φ(Di)
where αi* = lim k→∞ αi^k, and αi^k = (1/m) Σj αj^(k−1) K(Di, Dj)
El(D), the elevation of vector D: El(D) = Σi αi* K(Di, D)
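Iteration (3) can be run entirely on the kernel (Gram) matrix, never forming Φ(Di). A sketch, assuming the normalization N(·) amounts to rescaling the coefficient vector (the slides do not spell this out):

```python
import numpy as np

def kernel_elevations(K, n_iter=50):
    """Iterate alpha_i^k = (1/m) * sum_j alpha_j^(k-1) * K[i, j] on the
    kernel matrix K, normalizing each step (my reading of N(.)).
    This is power iteration, so alpha converges to the principal
    eigenvector of K; the return value gives the elevations
    El(Di) = sum_j alpha_j* K(Dj, Di)."""
    m = K.shape[0]
    alpha = np.full(m, 1.0 / m)          # uniform start
    for _ in range(n_iter):
        alpha = K @ alpha / m
        alpha /= np.linalg.norm(alpha)   # normalization step
    return K @ alpha                     # elevations El(Di)
```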
Zoomed Clusters
Clusters are defined through peaks. Peaks are all vectors which are the highest in their vicinity:
PEAKS = {Dj : El(Dj) ≥ El(Di) for all i with <Di, Dj> ≥ S}
S: sharpening/smoothing parameter
Cluster: the set of vectors in the vicinity of a peak
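The peak test can be sketched directly from the set definition above. The vicinity test `K[i, j] >= S` is my reading of the definition; larger S shrinks the vicinity (sharpening), smaller S grows it (smoothing):

```python
import numpy as np

def find_peaks(K, el, S=1.0):
    """Return the indices j such that Dj is a peak: no vector Di in
    its vicinity (K[i, j] >= S) has a higher elevation than Dj."""
    m = len(el)
    peaks = []
    for j in range(m):
        neighbours = [i for i in range(m) if K[i, j] >= S]
        if all(el[j] >= el[i] for i in neighbours):
            peaks.append(j)
    return peaks
```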
[Figure: scatter plot with clusters C1 and C2. Kernel: Linear, S: Default (1)]
Clustering Example
Zooming Example
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Linear, S: Default (1)]
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 2, S: 16]
Zoomed Clusters Results
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 8000, S: 1.5]
[Figure: scatter plot with clusters C1, C2, C3. Kernel: Polynomial Degree 8000, S: Default (1)]
Clustering MicroArray Data
Rows: genes. Columns: experiments. Entry (i, j): the expression level of gene i during experiment j.
MicroArrays As Time Series
Clustering Time Series
Reveals groups of genes that react similarly to the experiments
Functionally related genes should cluster
Simulated Time Series
Simulated 180 time series, with 3 clusters and 9 sub-clusters (20 series per sub-cluster)
Each time series is a vector with 1000 components; each component is the expression level at a given time
Results
Kernel: Polynomial Degree 3, S: 6
Kernel: Polynomial Degree 3, S: 7
Kernel: Polynomial Degree 6, S: 15
HMM Parameter Estimation
Sequential K-Means → Initial HMM Model → Baum-Welch Algorithm or Viterbi Algorithm → Refinement of HMM Model → Final HMM Model

Parameter Estimation with Zoomed Clusters
Zoomed Clusters → Initial HMM Model → Baum-Welch Algorithm or Viterbi Algorithm → Refinement of HMM Model → Final HMM Model
Advantages:
• Flexibility with number of states
• Initial Model is closer to the final one
Consequences:
• Higher accuracy and faster convergence for either Baum-Welch or Viterbi
Example: Coins
HHHHHTTTTTTTHHHHHHHTHTHTHTHTHTTTTTTTT
Coin 1: 100% Heads
Coin 2: 100% Tails
Coin 3: 50% Heads, 50% Tails
• Regions with similar statistical distribution of Heads and Tails represent the states in the initial HMM Model
• Use Elevation Functions, separately for Heads and Tails to represent these distributions
HHHHH HHHHHHH H H H H H
TTTTTTT T T T T T TTTTTTTT
Step 1: Separate the letters (Heads and Tails)
Step 2: Calculate an elevation function for each letter
Step 3: For each position i in the sequence of throws, get the elevation functions for Heads and Tails and create a point Di in R^2 whose components are the two elevations: Di = [Eh, Et]
Step 4: Cluster all the points obtained from each position
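Steps 1-3 can be sketched as follows. Windowed letter frequencies stand in for the per-letter elevation functions here, which is an assumption — the slides do not specify how the elevations are computed for a sequence:

```python
def coin_points(seq, window=5):
    """For each position i in a string of 'H'/'T' throws, build a point
    Di = (Eh, Et) in R^2 whose components are local measures of Heads
    and Tails around i (windowed frequencies as a simple stand-in for
    the elevation functions)."""
    points = []
    n = len(seq)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        chunk = seq[lo:hi]
        eh = chunk.count('H') / len(chunk)   # local Heads level
        et = chunk.count('T') / len(chunk)   # local Tails level
        points.append((eh, et))
    return points
```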
What Clustering Achieves
Each cluster defines regions of similar distributions of heads and tails
Each Cluster is a state in the initial HMM model
State transition and emission probabilities are estimated from the clusters
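The final estimation step — counting transitions and emissions once each position has a cluster/state label — can be sketched as (a minimal count-based estimate; smoothing is omitted):

```python
from collections import Counter

def estimate_hmm(states, emissions):
    """Estimate transition and emission probabilities by counting,
    given the state (cluster) label and the emitted symbol at each
    position of the sequence."""
    # Transition probabilities: P(t | s) = count(s -> t) / count(s as a source).
    from_counts = Counter(states[:-1])
    trans = Counter(zip(states[:-1], states[1:]))
    A = {pair: c / from_counts[pair[0]] for pair, c in trans.items()}
    # Emission probabilities: P(o | s) = count(s emits o) / count(s).
    state_counts = Counter(states)
    emit = Counter(zip(states, emissions))
    B = {pair: c / state_counts[pair[0]] for pair, c in emit.items()}
    return A, B
```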
References
MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281-297. University of California Press, Berkeley.
Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: a review. ACM Computing Surveys 31(3).
http://www.gene-chips.com/ by Leming Shi, Ph.D.