Clustering Techniques and Applications

Liang Shan ([email protected])
Clustering Techniques and Applications to Image Segmentation
Transcript

Clustering Techniques and Applications

Roadmap
- Unsupervised learning
- Clustering categories
- Clustering algorithms
  - K-means
  - Fuzzy c-means
  - Kernel-based
  - Graph-based
- Q&A

Unsupervised learning
- Definition 1
  - Supervised: human effort involved
  - Unsupervised: no human effort
- Definition 2
  - Supervised: learning the conditional distribution P(Y|X), X: features, Y: classes
  - Unsupervised: learning the distribution P(X), X: features

Slide credit: Min Zhang

Clustering
- What is clustering?

K-means
- Minimizes the functional (the slide formula is reconstructed below, after the notation)

- Iterative algorithm:
  1. Initialize the codebook V with vectors randomly picked from X
  2. Assign each pattern to the nearest cluster
  3. Recalculate the partition matrix
  4. Repeat steps 2-3 until convergence

- Notation: data set, clusters, codebook, partition matrix (the slide formulas did not survive extraction; see the reconstruction below)
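The K-means formulas on these slides were images and are missing from the transcript. A standard statement consistent with the notation named above (assuming Euclidean distance and hard assignments) is: data set $X = \{x_1, \dots, x_n\}$; codebook $V = \{v_1, \dots, v_c\}$ of cluster centers; partition matrix $U = [u_{ij}]$ with $u_{ij} \in \{0, 1\}$ and $\sum_{i=1}^{c} u_{ij} = 1$ for every pattern $x_j$. K-means minimizes

$$ J(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}\, \lVert x_j - v_i \rVert^2 . $$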

Clustering
- Definition: the assignment of a set of observations into subsets so that observations in the same subset are similar in some sense

Clustering: Hard vs. Soft
- Hard: an object can belong to only a single cluster
- Soft: an object can belong to different clusters (e.g., a Gaussian mixture model)
Slide credit: Min Zhang

Clustering: Flat vs. Hierarchical
- Flat: the clusters form a flat partition
- Hierarchical: the clusters form a tree
  - Agglomerative
  - Divisive

Hierarchical clustering: Agglomerative (Bottom-up)
1. Compute all pairwise pattern-pattern similarity coefficients
2. Place each of the n patterns into a class of its own
3. Merge the two most similar clusters into one
   - Replace the two clusters by the new cluster
   - Recompute inter-cluster similarity scores with respect to the new cluster
4. Repeat step 3 until k clusters are left (k can be 1)
A minimal sketch of this procedure follows.
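A minimal sketch of bottom-up clustering, assuming SciPy is available; `linkage` builds the full merge tree from pairwise distances and `fcluster` cuts it at k clusters (the "average" linkage method and the toy data are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def agglomerative(X, k, method="average"):
    """Bottom-up clustering: start from n singleton clusters, repeatedly merge
    the two most similar clusters, then cut the resulting tree at k clusters."""
    Z = linkage(X, method=method)                   # full sequence of merges (dendrogram)
    return fcluster(Z, t=k, criterion="maxclust")   # cluster labels in 1..k

# Usage on toy data:
X = np.random.default_rng(0).normal(size=(20, 2))
labels = agglomerative(X, k=3)
```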

Slide credit: Min Zhang

Hierarchical clustering: Agglomerative (Bottom-up)
[Figure sequence: the bottom-up merging process shown over iterations 1-5, ending when k clusters are left]

Hierarchical clustering: Divisive (Top-down)
- Start at the top with all patterns in one cluster
- Split the cluster using a flat clustering algorithm
- Apply this procedure recursively until each pattern is in its own singleton cluster

Slide credit: Min Zhang

Bottom-up vs. Top-down
- Which one is more complex? Top-down, because a flat clustering algorithm is needed as a subroutine.
- Which one is more efficient? Top-down. For a fixed number of top levels, and using an efficient flat algorithm such as K-means, divisive algorithms are linear in the number of patterns and clusters, whereas agglomerative algorithms are at least quadratic.
- Which one is more accurate? Top-down. Bottom-up methods make clustering decisions based on local patterns without initially taking the global distribution into account, and these early decisions cannot be undone. Top-down clustering benefits from complete information about the global distribution when making top-level partitioning decisions.

K-means: Disadvantages
- Dependent on initialization
  - Remedies: select random seeds whose pairwise distance is at least D_min, or run the algorithm many times and keep the best result (see the sketch below)
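A minimal NumPy sketch of the multiple-restarts remedy: run the iterative K-means algorithm described earlier from several random initializations and keep the run with the lowest objective (the function names and defaults here are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """One run of the iterative K-means algorithm described above."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # codebook from random patterns
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                             # nearest-cluster assignment
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):                 # converged
            break
        centers = new_centers
    objective = ((X - centers[labels]) ** 2).sum()            # within-cluster sum of squares
    return centers, labels, objective

def kmeans_restarts(X, k, n_restarts=10):
    """Mitigate initialization dependence: keep the restart with the lowest objective."""
    return min((kmeans(X, k, seed=r) for r in range(n_restarts)), key=lambda run: run[2])
```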

K-means: Disadvantages (continued)
- Sensitive to outliers
  - Remedy: use K-medoids, which replaces each cluster mean with a medoid, i.e. the cluster member whose total distance to the other members is smallest
- Can deal only with clusters whose points have a spherically symmetric distribution
  - Remedy: the kernel trick
- Deciding K
  - Try several values of K and compare the results

Deciding K (Images: Henry Lin)
- When k = 1, the objective function value is 873.0
- When k = 2, the objective function value is 173.1
- When k = 3, the objective function value is 133.6

Deciding K
- We can plot the objective function values for k = 1 to 6
- The abrupt change at k = 2 is highly suggestive of two clusters
- This is called "knee finding" or "elbow finding"
- Note that the results are not always as clear-cut as in this toy example; a sketch follows
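A short sketch of elbow finding, assuming scikit-learn is available; `inertia_` is the fitted K-means objective (within-cluster sum of squares):

```python
from sklearn.cluster import KMeans

def objective_curve(X, k_max=6):
    """K-means objective for k = 1..k_max; look for the 'elbow' where it stops dropping sharply."""
    return [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, k_max + 1)]

# For the toy example above, the curve behaves like [873.0, 173.1, 133.6, ...],
# so the abrupt change at k = 2 suggests two clusters.
```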

Image: Henry Lin

Fuzzy c-means
- Minimize the fuzzy objective function, subject to the membership constraint (both reconstructed below)
- How do we solve this constrained optimization problem? Introduce Lagrange multipliers
- Iterative optimization:
  - Fix V, optimize with respect to U
  - Fix U, optimize with respect to V
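The slide equations are missing from this transcript. The standard fuzzy c-means formulation (fuzzifier $m > 1$, memberships $u_{ij} \in [0, 1]$) and the update rules obtained from the Lagrangian are:

$$ J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m}\,\lVert x_j - v_i \rVert^2, \qquad \text{subject to } \sum_{i=1}^{c} u_{ij} = 1 \text{ for every } j. $$

Lagrangian: $\tilde{J} = J_m + \sum_{j=1}^{n} \lambda_j \bigl(1 - \sum_{i=1}^{c} u_{ij}\bigr)$.

Fix $V$, optimize w.r.t. $U$: $\; u_{ij} = \Bigl[\, \sum_{k=1}^{c} \bigl( \lVert x_j - v_i \rVert / \lVert x_j - v_k \rVert \bigr)^{2/(m-1)} \Bigr]^{-1}$

Fix $U$, optimize w.r.t. $V$: $\; v_i = \dfrac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}}$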

Application to image segmentation

[Figure: original images and FCM segmentations. Top: homogeneous intensity corrupted by 5% Gaussian noise, accuracy = 96.02%. Bottom: sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise, accuracy = 94.41%. Image: Dao-Qiang Zhang, Song-Can Chen]

Kernel fuzzy c-means (kernel substitution trick)

- Confine ourselves to the Gaussian RBF kernel
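The kernel equations are also missing from the transcript; the usual kernel substitution, for the Gaussian RBF kernel named above, replaces the Euclidean distance with the kernel-induced distance in feature space:

$$ K(x, v) = \exp\!\left( -\frac{\lVert x - v \rVert^2}{\sigma^2} \right), \qquad \lVert \Phi(x) - \Phi(v) \rVert^2 = K(x,x) + K(v,v) - 2K(x,v) = 2\bigl(1 - K(x,v)\bigr), $$

so the kernel fuzzy c-means objective takes the form $J = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \bigl( 1 - K(x_j, v_i) \bigr)$ (up to a constant factor).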

- Introduce a penalty term containing neighborhood information (a reconstruction follows the definitions below)

Equation: Dao-Qiang Zhang, Song-Can Chen

Spatially constrained KFCM (SKFCM)
- N_j: the set of neighbors that lie in a window around x_j
- N_R: the cardinality of N_j
- The parameter alpha controls the effect of the penalty term
- The penalty term is minimized when the membership value for x_j is large and is also large at the neighboring pixels, and vice versa
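The penalty term itself is not reproduced in the transcript. As a reconstruction of the form used in the cited Zhang and Chen paper (the exact exponents should be checked against the paper), the spatially constrained objective adds

$$ \frac{\alpha}{N_R} \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \sum_{r \in N_j} \bigl( 1 - u_{ir} \bigr)^{m} $$

to the kernel FCM objective, where $N_j$ is the neighborhood window around $x_j$, $N_R = |N_j|$, and $\alpha$ weights the penalty. The term is small exactly when a large membership $u_{ij}$ is accompanied by large memberships $u_{ir}$ at the neighboring pixels, matching the description above.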

[Figure: a grid of example membership values (0.9 vs. 0.1) at a pixel and its neighbors, illustrating when the penalty is small or large. Equation: Dao-Qiang Zhang, Song-Can Chen]

FCM applied to segmentation

Homogeneous intensity corrupted by 5% Gaussian noise (original images and segmentations; Image: Dao-Qiang Zhang, Song-Can Chen):
- FCM: accuracy = 96.02%
- KFCM: accuracy = 96.51%
- SFCM: accuracy = 99.34%
- SKFCM: accuracy = 100.00%

FCM applied to segmentation

Sinusoidal inhomogeneous intensity corrupted by 5% Gaussian noise (original images and segmentations; Image: Dao-Qiang Zhang, Song-Can Chen):
- FCM: accuracy = 94.41%
- KFCM: accuracy = 91.11%
- SFCM: accuracy = 98.41%
- SKFCM: accuracy = 99.88%

FCM applied to segmentation

[Figure: an original MR image corrupted by 5% Gaussian noise, with the FCM, KFCM, SFCM, and SKFCM results. Image: Dao-Qiang Zhang, Song-Can Chen]

[Figure slides introducing graph-based clustering. Slide credit: Jianbo Shi]

Slide credit: Jianbo Shi

Problem with minimum cuts
- The minimum cut criterion favors cutting off small sets of isolated nodes in the graph
- This is not surprising, since the cut value increases with the number of edges going across the two partitioned parts
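For reference, the normalized cut criterion of Shi and Malik addresses this bias by normalizing the cut by the total connection of each part to the whole graph:

$$ \text{Ncut}(A, B) = \frac{\text{cut}(A, B)}{\text{assoc}(A, V)} + \frac{\text{cut}(A, B)}{\text{assoc}(B, V)}, $$

where $\text{cut}(A, B) = \sum_{u \in A,\, t \in B} w(u, t)$ and $\text{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$, so cutting off a small isolated set no longer gives a trivially small value.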

Image: Jianbo Shi and Jitendra Malik

[Figure slide. Slide credit: Jianbo Shi]

Slide credit: Jianbo Shi

Algorithm (normalized cuts)
1. Given an image, set up a weighted graph and set the weight on the edge connecting two nodes to a measure of the similarity between the two nodes
2. Solve for the eigenvector with the second smallest eigenvalue
3. Use this second smallest eigenvector to bipartition the graph
4. Decide whether the current partition should be subdivided, and recursively repartition the segmented parts if necessary
A sketch of steps 2-3 follows.
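A minimal sketch of the bipartition step, assuming SciPy is available; Shi and Malik show that the relaxed Ncut minimizer is the generalized eigenvector of $(D - W)y = \lambda D y$ with the second smallest eigenvalue (thresholding at the median is an illustrative choice):

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Split a graph with symmetric affinity matrix W into two parts using
    the second smallest generalized eigenvector of (D - W) y = lambda D y."""
    D = np.diag(W.sum(axis=1))        # degree matrix
    L = D - W                         # graph Laplacian
    vals, vecs = eigh(L, D)           # generalized symmetric eigenproblem, eigenvalues ascending
    y = vecs[:, 1]                    # eigenvector with the second smallest eigenvalue
    return y >= np.median(y)          # boolean partition of the nodes
```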

Example
(a) A noisy step image; (b) the eigenvector with the second smallest eigenvalue; (c) the resulting partition

Image: Jianbo Shi and Jitendra Malik

Example
(a) A point set generated by two Poisson processes; (b) the partition of the point set

Example
(a) Three image patches forming a junction; (b)-(d) the top three components of the partition

Image: Jianbo Shi and Jitendra Malik

Example
Components of the partition with an Ncut value less than 0.04
Image: Jianbo Shi and Jitendra Malik

Example

Image: Jianbo Shi and Jitendra Malik

