Optimizing these criteria is NP-hard.
[Diagram: Data (similarities) → Objective → Algorithm (spectral clustering, K-means)]
...but spectral clustering and K-means work well when a good clustering exists: NP-hardness is a worst-case statement, while data with a good clustering is the interesting case.
This talk:
- If a "good" clustering exists, it is "unique".
- If a "good" clustering is found, it is provably good.
Results summary

Given:
- objective = NCut or the K-means distortion
- data
- a clustering Y with K clusters

We compute a spectral lower bound on the distortion. If distortion(Y) minus the lower bound is small, then $d(Y, Y^{\mathrm{opt}})$ is small, where $Y^{\mathrm{opt}}$ = the best clustering with K clusters.
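In symbols, the schematic form of the results is (a reconstruction from the slide fragments; the exact constants appear in the NCut and K-means sections below):

\[
\text{distortion}(Y) - \text{(spectral lower bound)} \le \delta
\;\Longrightarrow\;
d(Y, Y^{\mathrm{opt}}) \le \epsilon(\delta),
\qquad
Y^{\mathrm{opt}} = \operatorname*{argmin}_{Y'\ \text{with}\ K\ \text{clusters}} \text{distortion}(Y').
\]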
A graphical view

[Figure: distortion as a function of the clustering, with the spectral lower bound drawn beneath]
Overview
- Introduction
- Matrix representations for clusterings
- Quadratic representation for the clustering cost
- The misclassification error distance
- Results for NCut (easier)
- Results for K-means distortion (harder)
- Discussion
Clusterings as matrices

A clustering of {1, 2, ..., n} with K clusters $(C_1, C_2, \dots, C_K)$ is represented by an n x K indicator matrix:
- unnormalized: $\tilde X_{ik} = 1$ if $i \in C_k$, 0 otherwise
- normalized: columns rescaled to unit length, $X_{ik} = 1/\sqrt{|C_k|}$ if $i \in C_k$, 0 otherwise

All such matrices have orthogonal columns.
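A minimal sketch of the two representations in Python/NumPy (the helper name indicator_matrix is mine, not from the talk):

    import numpy as np

    def indicator_matrix(labels, K, normalized=False):
        """n x K cluster indicator matrix for integer labels in {0, ..., K-1}."""
        n = len(labels)
        X = np.zeros((n, K))
        X[np.arange(n), labels] = 1.0      # unnormalized: X[i, k] = 1 iff i in C_k
        if normalized:
            sizes = X.sum(axis=0)          # cluster sizes |C_k|
            X = X / np.sqrt(sizes)         # each column rescaled to unit length
        return X

    labels = np.array([0, 0, 1, 2, 2, 2])
    X = indicator_matrix(labels, K=3, normalized=True)
    assert np.allclose(X.T @ X, np.eye(3))  # orthonormal columns: X^T X = I_K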
The distortion is quadratic in X: both NCut (defined on the similarities A) and the K-means distortion can be written as quadratic forms in X.
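As a hedged reconstruction of the standard quadratic forms (the exact normalization of A is not recoverable from the slide):

\[
\mathrm{NCut}(X) = K - \mathrm{tr}(X^T \hat{A} X),
\qquad
\mathrm{distortion}_{\text{K-means}}(X) = \mathrm{tr}(A) - \mathrm{tr}(X^T A X),
\]

with $\hat{A}$ the degree-normalized similarity matrix, $A$ the Gram matrix of the data points, and $X$ the normalized indicator matrix in both cases.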
The confusion matrix

Two clusterings of the same n points: $(C_1, C_2, \dots, C_K)$ and $(C'_1, C'_2, \dots, C'_{K'})$.

Their confusion matrix is the $K \times K'$ matrix $M$ with entries
\[ m_{kk'} = |C_k \cap C'_{k'}|. \]
The misclassification error distance

\[
d(C, C') = 1 - \frac{1}{n} \max_{\pi} \sum_{k} m_{k\,\pi(k)},
\]

where $\pi$ ranges over one-to-one matchings between the clusters of C and the clusters of C'; the maximum is computed from the confusion matrix by a maximal bipartite matching algorithm.
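A small sketch of this computation in Python, assuming the clusterings are given as integer label vectors; scipy's linear_sum_assignment solves the bipartite matching:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def misclassification_error(labels1, labels2, K1, K2):
        """d(C, C') = 1 - (best matched overlap) / n."""
        n = len(labels1)
        M = np.zeros((K1, K2))                   # confusion matrix m_kk'
        for a, b in zip(labels1, labels2):
            M[a, b] += 1
        rows, cols = linear_sum_assignment(-M)   # matching that maximizes overlap
        return 1.0 - M[rows, cols].sum() / n

    d = misclassification_error(np.array([0, 0, 1, 1]),
                                np.array([1, 1, 0, 0]), K1=2, K2=2)
    assert d == 0.0   # perfect match up to relabeling of the clusters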
Results for NCut

Given: data A ($n \times n$ similarity matrix), a clustering X ($n \times K$).

Lower bound for NCut (M02, YS03, BJ03):
\[
\mathrm{NCut}(X) \ge K - \sum_{k=1}^{K} \lambda_k,
\]
where $\lambda_1 \ge \lambda_2 \ge \dots$ are the largest eigenvalues of A.

Upper bound on the distance to $X^*$ (MSX'05): it holds whenever the slack $\mathrm{NCut}(X) - (K - \sum_{k=1}^{K} \lambda_k)$ is small w.r.t. the eigengap $\lambda_K - \lambda_{K+1}$; then X is close to X*.
Proof chain:
- two clusterings X, X' close to X* $\Rightarrow$ $\mathrm{tr}(X^T X')$ large
- $\mathrm{tr}(X^T X')$ large $\Rightarrow$ $d(X, X')$ small (convexity proof)
Relaxed minimization for NCut:
\[
\max_X \; \mathrm{tr}(X^T A X) \quad \text{s.t. } X \text{ an } n \times K \text{ orthogonal matrix.}
\]
Solution: $X^*$ = the K principal eigenvectors of A.
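A minimal sketch of the relaxed solution in NumPy; eigh returns eigenvalues in ascending order, so the principal eigenvectors are the last K columns:

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.random((10, 10))
    A = (B + B.T) / 2                    # toy symmetric similarity matrix
    K = 3

    evals, evecs = np.linalg.eigh(A)     # eigenvalues in ascending order
    X_star = evecs[:, -K:]               # X* = K principal eigenvectors of A
    eigengap = evals[-K] - evals[-K-1]   # lambda_K - lambda_{K+1}
    # X* maximizes tr(X^T A X) over all n x K matrices with orthonormal columns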
Why the eigengap matters

Example: A has 3 diagonal blocks but K = 2. Then gap(C) = gap(C') = 0 for two different clusterings C and C' (merging different pairs of blocks), yet C and C' are not close.
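A quick numerical illustration (my construction: three identical blocks of ones):

    import numpy as np
    from scipy.linalg import block_diag

    B = np.ones((4, 4))
    A = block_diag(B, B, B)                       # 3 identical diagonal blocks
    evals = np.sort(np.linalg.eigvalsh(A))[::-1]
    print(evals[:4])                              # [4. 4. 4. 0.]
    # For K = 2 the eigengap lambda_2 - lambda_3 is 0, so the theorem
    # certifies nothing: C = {blocks 1+2, block 3} and C' = {block 1,
    # blocks 2+3} are equally good 2-clusterings, yet far from each other.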
Remarks on the stability results
- No explicit conditions on S.
- Different in flavor from other stability results, e.g. Kannan et al. '00, Ng et al. '01, which assume S is "almost" block diagonal.
- But... the results apply only if a good clustering is found; there are S matrices for which no clustering satisfies the theorem.
- The bound depends on aggregate quantities like K and the cluster sizes (= probabilities).
- Points are weighted by their volumes (degrees): good in some applications; bounds for unweighted distances can also be obtained.
Is the bound ever informative? An experiment: S = perfect (block diagonal) + additive noise.
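A sketch of how such experimental data might be generated (block sizes and noise level are my choices, not the talk's):

    import numpy as np
    from scipy.linalg import block_diag

    rng = np.random.default_rng(1)
    S_perfect = block_diag(*[np.ones((m, m)) for m in (20, 30, 50)])
    noise = rng.uniform(0.0, 0.2, S_perfect.shape)
    S = S_perfect + (noise + noise.T) / 2   # symmetric additive noise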
We can try the same approach for the K-means distortion... but the K-th principal subspace is typically not stable.
K-means distortion

[Figure: example data set for the K-means experiments, K = 4, dim = 30]
New approach: use K-1 vectors, a non-redundant representation Y.
Distortion: a new expression in terms of Y... and a new (relaxed) optimization problem over Y.
Solution of the new problem

Relaxed optimization problem: given A, optimize the new distortion expression over relaxed Y.

Solution: built from U = the K-1 principal eigenvectors of A and W = a K x K orthogonal matrix with a fixed first row.
As for NCut:
- solve the relaxed minimization; slack small $\Rightarrow$ Y close to Y*
- clusterings Y, Y' close to Y* $\Rightarrow$ $\|Y^T Y'\|_F$ large
- $\|Y^T Y'\|_F$ large $\Rightarrow$ $d(Y, Y')$ small
Theorem: for any two clusterings Y, Y' with positive slacks, $d(Y, Y')$ is bounded whenever the slacks are small enough relative to the eigengap.

Corollary: taking $Y' = Y^{\mathrm{opt}}$ gives a bound on $d(Y, Y^{\mathrm{opt}})$.
Experiments

[Plot: true misclassification error and the bound vs. $p_{\min}$ (the smallest cluster proportion), over 20 replicates, K = 4, dim = 30]
Conclusions
- First (?) distribution-independent bounds on the clustering error.
- Data dependent: they hold when the data is well clustered (this is the case of interest).
- Tight? Not yet...
- In addition: an improved variational bound for the K-means cost, and a local equivalence between the "misclassification error" distance and the "Frobenius norm" distance (also known as the $\chi^2$ distance).
Related work
- Bounds for mixtures of Gaussians (Dasgupta; Vempala)
- Nearest K-flat to n points (Tseng)
- Variational bounds for sparse PCA (Moghaddam)