DATA MINING: Clustering Types
Page 1: DATA MINING:Clustering Types

Presented By,
Ashwin Shenoy M
4CB13SCS02

Page 2: DATA MINING:Clustering Types

Partitioning methods
Hierarchical methods
Density-based methods
Grid-based methods
Model-based methods (conceptual clustering, neural networks)

Page 3: DATA MINING:Clustering Types

A partitioning method: construct a partition of a database D of n objects into a set of k clusters such that
o each cluster contains at least one object
o each object belongs to exactly one cluster

Page 4: DATA MINING:Clustering Types

Methods:
o k-means: each cluster is represented by the center of the cluster (centroid).
o k-medoids: each cluster is represented by one of the objects in the cluster (medoid).

Page 5: DATA MINING:Clustering Types

Input to the algorithm: the number of clusters k, and a database of n objects

Algorithm consists of four steps:
1. partition the objects into k nonempty subsets/clusters
2. compute a seed point as the centroid (the mean of the objects in the cluster) for each cluster in the current partition
3. assign each object to the cluster with the nearest centroid
4. go back to Step 2; stop when there are no more new assignments
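
A minimal NumPy sketch of these four steps; the function name, the random initial partition, and the convergence test are illustrative choices rather than anything specified on the slides, and it assumes no cluster ever becomes empty:

import numpy as np

def kmeans_partition_first(X, k, max_iter=100, seed=0):
    """k-means starting from a random partition of the objects."""
    rng = np.random.default_rng(seed)
    # Step 1: partition the n objects into k nonempty subsets at random.
    labels = rng.integers(0, k, size=len(X))
    for _ in range(max_iter):
        # Step 2: compute the centroid (mean) of each current cluster.
        # (Assumes every cluster is nonempty; robust code would reseed empty ones.)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: assign each object to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop when there are no more new assignments.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids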

Page 6: DATA MINING:Clustering Types

Alternative algorithm also consists of four steps:
1. arbitrarily choose k objects as the initial cluster centers (centroids)
2. (re)assign each object to the cluster with the nearest centroid
3. update the centroids
4. go back to Step 2; stop when there are no more new assignments
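
This variant is the usual textbook k-means, and scikit-learn's KMeans implements it; a brief usage sketch (the toy data points are made up for illustration):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
# init="random" picks k objects as the initial centers, as in step 1.
km = KMeans(n_clusters=2, init="random", n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each object
print(km.cluster_centers_)  # final centroids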

Page 7: DATA MINING:Clustering Types

[Figure: k-means in action on points plotted in a 10 x 10 grid; successive panels show objects being reassigned and the centroids moving until the clusters stabilize.]

Page 8: DATA MINING:Clustering Types

Input to the algorithm: the number of clusters k, and a database of n objects

Algorithm consists of four steps:
1. arbitrarily choose k objects as the initial medoids (representative objects)
2. assign each remaining object to the cluster with the nearest medoid
3. select a nonmedoid and replace one of the medoids with it if this improves the clustering
4. go back to Step 2; stop when there are no more new assignments
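
A compact PAM-style sketch of these steps, assuming Euclidean distance; the exhaustive swap search is written for clarity rather than efficiency, and the function name is illustrative:

import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distance matrix
    medoids = rng.choice(n, size=k, replace=False)       # step 1: arbitrary initial medoids

    def cost(meds):
        # total distance from every object to its nearest medoid
        return D[:, meds].min(axis=1).sum()

    for _ in range(max_iter):
        improved = False
        for i in range(k):                               # step 3: try every medoid/nonmedoid swap
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                if cost(trial) < cost(medoids):          # keep the swap only if it improves the clustering
                    medoids, improved = trial, True
        if not improved:                                 # step 4: stop when no swap helps
            break
    labels = D[:, medoids].argmin(axis=1)                # step 2: assign objects to their nearest medoid
    return labels, medoids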

Page 9: DATA MINING:Clustering Types

A hierarchical method: construct a hierarchy of clusterings, not just a single partition of the objects.

The number of clusters k is not required as an input.

Uses a distance matrix as the clustering criterion.

A termination condition can be used (e.g., a desired number of clusters).

Page 10: DATA MINING:Clustering Types

1. agglomerative (bottom-up):
o place each object in its own cluster.
o merge in each step the two most similar clusters until there is only one cluster left or the termination condition is satisfied.

2. divisive (top-down):
o start with one big cluster containing all the objects.
o divide the most distinctive cluster into smaller clusters and proceed until there are n clusters or the termination condition is satisfied.
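
For the agglomerative case, SciPy's hierarchy module works directly from a distance matrix, matching the criterion above; a small sketch with made-up points:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

X = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 9]])
d = pdist(X)                          # condensed distance matrix as the clustering criterion
Z = linkage(d, method="average")      # repeatedly merge the two most similar clusters
# termination condition: cut the hierarchy once 3 clusters remain
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)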

Page 11: DATA MINING:Clustering Types

[Figure: hierarchical clustering of objects a, b, c, d, e. Reading steps 0-4 left to right, agglomerative clustering merges a+b and d+e, then c with de, and finally ab with cde; reading steps 4-0 right to left, divisive clustering performs the same splits in reverse.]

Page 12: DATA MINING:Clustering Types

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies.

Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering:

Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve its inherent clustering structure).

Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF tree.
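
scikit-learn ships this two-phase scheme as sklearn.cluster.Birch; a brief usage sketch (the parameter values and random data are illustrative):

import numpy as np
from sklearn.cluster import Birch

X = np.random.default_rng(0).normal(size=(200, 2))
# Phase 1 is controlled by threshold (max subcluster diameter) and
# branching_factor (max children per node); Phase 2 clusters the
# CF-tree leaves into n_clusters final groups.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3).fit(X)
print(model.labels_)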

Page 13: DATA MINING:Clustering Types

Clustering feature:

• summary of the statistics for a given subcluster: the 0th, 1st and 2nd moments of the subcluster from the statistical point of view.

• registers crucial measurements for computing clusters and utilizes storage efficiently.

A CF tree is a height-balanced tree that stores the clustering features for a hierarchical clustering:

• A nonleaf node in the tree has descendants or "children".

• The nonleaf nodes store sums of the CFs of their children.

A CF tree has two parameters:

• Branching factor: specifies the maximum number of children.

• Threshold: the maximum diameter of subclusters stored at the leaf nodes.
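
Concretely, a CF entry is usually the triple (N, LS, SS): the point count, the linear sum, and the sum of squared norms. A minimal sketch, assuming that representation, of why the triple suffices: CFs simply add when subclusters merge, and the centroid and diameter fall out of the three moments (the class is illustrative, not BIRCH's actual code):

import numpy as np
from dataclasses import dataclass

@dataclass
class CF:
    n: int = 0             # 0th moment: number of points
    ls: np.ndarray = None  # 1st moment: linear sum of the points
    ss: float = 0.0        # 2nd moment: sum of squared norms

    def add(self, x):
        x = np.asarray(x, dtype=float)
        self.ls = x.copy() if self.ls is None else self.ls + x
        self.n += 1
        self.ss += float(x @ x)

    def merge(self, other):
        # CFs are additive, which is why nonleaf nodes can store
        # the sums of their children's CFs.
        self.n += other.n
        self.ls = other.ls.copy() if self.ls is None else self.ls + other.ls
        self.ss += other.ss

    def centroid(self):
        return self.ls / self.n

    def diameter(self):
        # average pairwise distance, computable from the moments alone
        if self.n < 2:
            return 0.0
        sq = (2 * self.n * self.ss - 2 * float(self.ls @ self.ls)) / (self.n * (self.n - 1))
        return float(np.sqrt(max(sq, 0.0)))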

Page 14: DATA MINING:Clustering Types

[Figure: structure of a CF tree with branching factor B = 7 and leaf capacity L = 6. The root and nonleaf nodes hold CF entries (CF1, CF2, ...) with pointers to child nodes; leaf nodes hold CF entries and are chained together by prev/next pointers.]

Page 15: DATA MINING:Clustering Types

ROCK: RObust Clustering using linKs

Major idea:
• Use links to measure similarity/proximity.

Page 16: DATA MINING:Clustering Types

Links: # of common neighbors

• C1 <a, b, c, d, e>: {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}

• C2 <a, b, f, g>: {a, b, f}, {a, b, g}, {a, f, g}, {b, f, g}

Let T1 = {a, b, c}, T2 = {c, d, e}, T3 = {a, b, f}

• link(T1, T2) = 4, since they have 4 common neighbors: {a, c, d}, {a, c, e}, {b, c, d}, {b, c, e}

• link(T1, T3) = 3, since they have 3 common neighbors: {a, b, d}, {a, b, e}, {a, b, g}

T1 and T2 come from the same cluster C1 yet share only one item, while T1 and T3 come from different clusters yet share two; the link counts still rank T1 closer to T2 than to T3. Thus link is a better measure than the Jaccard coefficient.
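
These counts can be reproduced in a few lines of Python, assuming a neighbor is any other transaction with Jaccard similarity of at least 0.5 (the threshold is implied by the example rather than stated on the slide):

from itertools import combinations

def jaccard(a, b):
    return len(a & b) / len(a | b)

# all 3-subsets of C1 = {a,b,c,d,e} and C2 = {a,b,f,g}
data = [frozenset(c) for c in combinations("abcde", 3)]
data += [frozenset(c) for c in combinations("abfg", 3)]

def neighbors(t, theta=0.5):
    # a transaction is not counted as its own neighbor
    return {s for s in data if s != t and jaccard(s, t) >= theta}

def link(t1, t2):
    # link = number of common neighbors
    return len(neighbors(t1) & neighbors(t2))

T1, T2, T3 = frozenset("abc"), frozenset("cde"), frozenset("abf")
print(link(T1, T2))  # 4
print(link(T1, T3))  # 3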

Page 17: DATA MINING:Clustering Types

CHAMELEON: measures the similarity based on a dynamic model

• Two clusters are merged only if the interconnectivity and closeness (proximity) between the two clusters are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters.

A two-phase algorithm:

1. Use a graph-partitioning algorithm: cluster objects into a large number of relatively small subclusters.

2. Use an agglomerative hierarchical clustering algorithm: find the genuine clusters by repeatedly combining these subclusters.
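
For reference, the original CHAMELEON paper (Karypis, Han & Kumar, 1999) quantifies this merge criterion with relative interconnectivity RI and relative closeness RC, where EC denotes the total weight of the edges cut and \bar{S}_{EC} their average weight:

RI(C_i, C_j) = \frac{|EC_{(C_i, C_j)}|}{\frac{1}{2}\left(|EC_{C_i}| + |EC_{C_j}|\right)}

RC(C_i, C_j) = \frac{\bar{S}_{EC_{(C_i, C_j)}}}{\frac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}} + \frac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}

Cluster pairs maximizing RI(C_i, C_j) \cdot RC(C_i, C_j)^{\alpha} are merged, with \alpha a user parameter trading off the two terms.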

Page 18: DATA MINING:Clustering Types

[Figure: CHAMELEON's overall framework: Data Set → Construct Sparse Graph → Partition the Graph → Merge Partitions → Final Clusters]

Page 19: DATA MINING:Clustering Types

THANK YOU
