Date post: 21-Dec-2015

CS281B Winter02, Yan Wang and Lihua Lin

K-means clustering


What are clustering algorithms?

What is clustering? Clustering is a method by which a large set of data is grouped into smaller sets (clusters) of similar data.

Example:

Balls of the same color are clustered into a group. (Figure: balls grouped by color.)

Thus, clustering means grouping data, or dividing a large data set into smaller data sets whose members are similar to one another.


What is a clustering algorithm?

A clustering algorithm attempts to find natural groups of components (or data) based on some measure of similarity.

The clustering algorithm also finds the centroid of each group of data points.

The centroid of a cluster is the point whose parameter values are the means of the parameter values of all the points in that cluster.
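
The centroid definition above can be illustrated with a minimal Python sketch. The function name `centroid` and the sample points are assumptions chosen for illustration; they do not come from the slides.

```python
def centroid(points):
    """Return the component-wise mean of a non-empty list of equal-length points."""
    if not points:
        raise ValueError("a cluster must contain at least one point")
    n = len(points)
    dims = len(points[0])
    # Average each coordinate (parameter) over all points in the cluster.
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(centroid(cluster))  # (3.0, 4.0)
```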


What is the common metric for clustering techniques?

Generally, the distance between two points is taken as the common metric for assessing the similarity among the components of a population. The most commonly used distance measure is the Euclidean metric, which defines the distance between two points p = (p1, p2, ..., pk) and q = (q1, q2, ..., qk) as:

$$d = \sqrt{\sum_{i=1}^{k} (p_i - q_i)^2}$$
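
As a small illustration, the Euclidean distance between two points can be computed directly from its definition. The function name `euclidean_distance` is an assumption made for this sketch.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```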


Uses of clustering algorithms

• Engineering sciences: pattern recognition, artificial intelligence, cybernetics, etc. Typical examples to which clustering has been applied include handwritten characters, samples of speech, fingerprints, and pictures.

• Life sciences (biology, botany, zoology, entomology, cytology, microbiology): the objects of analysis are life forms such as plants, animals, and insects.

• Information, policy and decision sciences: the various applications of clustering analysis to documents include votes on political issues, surveys of markets, surveys of products, surveys of sales programs, and R & D.


Types of clustering algorithms

The various clustering concepts available can be grouped into two broad categories:

• Hierarchical methods: Minimal Spanning Tree Method

• Nonhierarchical methods: K-means Algorithm


K-Means Clustering Algorithm

Definition:

This nonhierarchical method initially takes as many components of the population as the final required number of clusters, choosing them such that the points are mutually farthest apart; these serve as the initial clusters. Next, it examines each remaining component in the population and assigns it to the cluster whose centroid is at the minimum distance. The centroid's position is recalculated every time a component is added to the cluster, and this continues until all the components are grouped into the final required number of clusters.
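
The definition above can be sketched in Python as a single pass over the data. This is a sketch under stated assumptions, not the authors' implementation: the slides do not say how the mutually farthest points are found, so the greedy farthest-first heuristic in `farthest_first_seeds` is an assumption, as are all function names and the sample data.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def farthest_first_seeds(points, k):
    """Greedy heuristic (an assumption): pick k indices of points that lie far apart."""
    seeds = [0]  # starting from the first point is an arbitrary choice
    while len(seeds) < k:
        # Choose the point whose distance to its nearest seed is largest.
        best = max(
            (i for i in range(len(points)) if i not in seeds),
            key=lambda i: min(dist(points[i], points[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def sequential_kmeans(points, k):
    """Assign every point to one of k clusters in one pass, updating centroids as points arrive."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    seeds = farthest_first_seeds(points, k)
    centroids = [list(points[s]) for s in seeds]
    counts = [1] * k  # each seed is the first member of its cluster
    labels = [None] * len(points)
    for c, s in enumerate(seeds):
        labels[s] = c
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue  # seeds are already assigned
        # Assign the point to the cluster with the nearest centroid.
        c = min(range(k), key=lambda j: dist(p, centroids[j]))
        counts[c] += 1
        # Recalculate the centroid as a running mean that includes the new point.
        centroids[c] = [m + (x - m) / counts[c] for m, x in zip(centroids[c], p)]
        labels[i] = c
    return labels, [tuple(c) for c in centroids]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = sequential_kmeans(points, 2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

Because each point is visited once and never reassigned, the result of this variant depends on the order of the data and on the seed choice.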


K-Means Clustering Algorithm


The parameters and options for the k-means algorithm

• Initialization: different init methods.
• Distance measure: there are different distance measures that can be used (Manhattan distance and Euclidean distance).
• Termination: k-means should terminate when no more pixels are changing classes.
• Quality: the quality of the results provided by k-means classification.
• Parallelism: there are several ways to parallelize the k-means algorithm.
• What to do with dead classes: a class is "dead" if no pixels belong to it.
• Variants: one-pass, on-the-fly calculation of means.
• Number of classes: the number of classes is usually given as an input variable.
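
Several of these options can be seen together in a minimal batch k-means sketch in Python. Everything here is an assumption made for illustration: the function names, the choice to pass the initial centroids in explicitly, the `max_iter` safety cap, and the policy of leaving a dead class's centroid where it was. As a simplification, the mean update is kept whichever distance measure is selected.

```python
def manhattan(p, q):
    """Manhattan (city-block) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Squared Euclidean distance; it ranks centroids the same way as the Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, distance=squared_euclidean, max_iter=100):
    """Batch k-means that stops as soon as no point changes class."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # termination: no point changed class
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # dead class: keep the old centroid (one simple policy)
            centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```

Passing `distance=manhattan` switches the distance measure without changing the rest of the loop.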


Comments on the K-means Methods

Strengths of k-means:

• Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.

• Often terminates at a local optimum rather than the global optimum.

Weaknesses of k-means:

• Applicable only when the mean is defined, so categorical data cannot be handled directly.

• Need to specify k, the number of clusters, in advance.

• Unable to handle noisy data and outliers.

• Not suitable for discovering clusters with non-convex shapes.
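
The sensitivity to outliers follows from the centroid being a mean. A small Python illustration, using made-up one-dimensional values, shows how a single outlier drags the mean away from the bulk of the data:

```python
from statistics import mean

values = [1.0, 2.0, 3.0, 4.0]
with_outlier = values + [100.0]

print(mean(values))        # 2.5
print(mean(with_outlier))  # 22.0
```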


Direct k-means clustering algorithm


2 Initial Clusters

Demo (I)


2-means Clustering

Demo (I)


Demo (II) – Init Method: Random


Demo (II) – Init Method: Linear


Demo (II) – Init Method: Cube


Demo (II) – Init Method: Statistics


Demo (II) – Init Method: Possibility

