
Prepared by: Mahmoud Rafeek Al-Farra

Date post: 03-Jan-2016
Page 1: Prepared by: Mahmoud Rafeek Al-Farra

Prepared by: Mahmoud Rafeek Al-Farra

College of Science & Technology
Dep. of Computer Science & IT
BSc of Information Technology

Data Mining

2013
www.cst.ps/staff/mfarra

Chapter 6_2: Clustering Methods

Page 2: Prepared by: Mahmoud Rafeek Al-Farra

Course Outline

• Introduction
• Data Preparation and Preprocessing
• Data Representation
• Classification Methods
• Evaluation
• Clustering Methods
• Mid Exam
• Association Rules
• Knowledge Representation
• Special Case Study: Document Clustering
• Discussion of Case Studies by Students

Page 3: Prepared by: Mahmoud Rafeek Al-Farra

Outline

• Definition of Clustering
• Clustering Process
• Clustering Algorithms (Methods)
• Cluster Validation

Page 4: Prepared by: Mahmoud Rafeek Al-Farra

Definition

Clustering is a division of data into groups of similar objects.

Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups.

Clustering is an unsupervised classification problem.

Page 5: Prepared by: Mahmoud Rafeek Al-Farra

Clustering Process (Document Case)

Document samples

1. Preprocessing step
• Document cleaning
• Feature selection or extraction

2. Clustering algorithm
• Similarity measure
• Clustering criterion function

Clusters

3. Cluster validation
• External indices
• Internal indices
• Relative indices

4. Results interpretation

Knowledge

Page 6: Prepared by: Mahmoud Rafeek Al-Farra

Clustering algorithm design or selection

This step is usually combined with the selection of a corresponding proximity measure and the construction of a criterion function.

Obviously, the proximity measure directly affects the formation of the resulting clusters.

Almost all clustering algorithms are explicitly or implicitly connected to some definition of proximity measure.


Page 7: Prepared by: Mahmoud Rafeek Al-Farra

Clustering algorithm design or selection

To group similar data objects, a proximity measure is needed to determine which objects (or clusters) are similar.

Proximity can be measured either as how much two objects are alike (similarity) or as how much they differ (dissimilarity).

There is a large number of similarity metrics reported in the literature due to the large number of representation models and clustering algorithms.
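
As a minimal sketch of the two views of proximity, the following Python computes one similarity measure (cosine) and one dissimilarity measure (Euclidean distance) between term-frequency vectors. The slides do not name specific measures; these two, the function names, and the toy document vectors are assumptions chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Similarity: near 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an all-zero (empty) document vector
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Dissimilarity: 0.0 for identical vectors, growing as they move apart."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term-frequency vectors over a four-word vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [4, 2, 0, 0]   # same word proportions as doc_a, twice as long
doc_c = [0, 0, 3, 1]   # shares no words with doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
print(euclidean_distance(doc_a, doc_b))
```

Here doc_a and doc_b have the same word proportions, so their cosine similarity is essentially 1 even though their Euclidean distance is not zero. This is one reason the choice of measure affects which objects end up grouped together.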


Page 8: Prepared by: Mahmoud Rafeek Al-Farra

Clustering algorithm design or selection

[Figure: three document clusters, illustrating inter-cluster similarity (between clusters) and intra-cluster similarity (within a cluster).]

Page 9: Prepared by: Mahmoud Rafeek Al-Farra

Clustering Algorithms

Once a proximity measure is chosen, constructing a clustering criterion function turns the partitioning of the data into clusters into an optimization problem, which is mathematically well defined and has a rich set of solutions in the literature.
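
One widely used criterion function, given here only as an illustration because the slides do not specify one, is the sum of squared errors that k-means minimizes. The symbols K (number of clusters), C_k (the k-th cluster), x (a data object), and \mu_k (the centroid of C_k) are notation introduced for this sketch.

```latex
J(C_1, \dots, C_K) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^{2},
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

Under this criterion, clustering means finding the partition C_1, ..., C_K that makes J as small as possible, with squared Euclidean distance playing the role of the proximity measure.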


Page 10: Prepared by: Mahmoud Rafeek Al-Farra

Clustering Algorithms

• Hierarchical Clustering
  • Agglomerative (AHC)
  • Divisive (DHC)
• Partitional Clustering
  • K-means
  • Fuzzy C-means
  • Bisecting k-means
• Density Clustering
• Grid Clustering
• NN Clustering

Page 11: Prepared by: Mahmoud Rafeek Al-Farra

Hierarchical Clustering

Hierarchical techniques produce a nested sequence of partitions, with a single all-inclusive cluster at the top and singleton clusters of individual objects at the bottom.

The result of a hierarchical clustering algorithm can be viewed as a tree, called a dendrogram.

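[Figure: dendrogram over the objects a, b, c, d, e, showing the nested partitions from the top level to the bottom level.]

{a, b, c, d, e}
{a}, {b, c, d, e}
{a}, {b, c}, {d, e}
{a}, {b, c}, {d}, {e}
{a}, {b}, {c}, {d}, {e}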

Page 12: Prepared by: Mahmoud Rafeek Al-Farra

Hierarchical Clustering

AHC starts with each object as an individual cluster; then, at each step, it merges the two most similar clusters.

This process is repeated until a specified minimum number of clusters has been reached or, if a complete hierarchy is required, until only one cluster is left.
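
The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration and not an efficient implementation: the single-link rule for cluster distance, Euclidean distance as the proximity measure, and the names `agglomerative`, `single_link`, and `min_clusters` are all assumptions, since the slides do not fix them.

```python
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_link(c1, c2, points):
    """Cluster distance = distance of the closest pair of members (assumed linkage)."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points, min_clusters=1):
    # Start with every object as its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # one entry per merge step, recording the hierarchy
    while len(clusters) > max(min_clusters, 1):
        # Find the closest (most similar) pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_link(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges

# Hypothetical 2-D data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, merges = agglomerative(points, min_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Calling `agglomerative(points)` with the default `min_clusters=1` continues until a single cluster remains, and `merges` then records the complete hierarchy from which a dendrogram could be drawn.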


Page 13: Prepared by: Mahmoud Rafeek Al-Farra

Hierarchical Clustering

DHC methods work from top to bottom: they start with the whole data set as one cluster and, at each step, split a cluster until only singleton clusters of individual objects remain.

Page 14: Prepared by: Mahmoud Rafeek Al-Farra

Partitional Clustering

Partitional clustering techniques create a one-level (un-nested) partitioning of the data points.

If K is the desired number of clusters, the partitional approaches typically find all K clusters at once.

The best-known class of partitional clustering algorithms is the k-means algorithm and its variants.
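
A minimal sketch of the basic k-means iteration, assuming squared Euclidean distance and random initial centroids drawn from the data, is shown below. The function name `k_means` and the parameters `max_iter` and `seed` are hypothetical, and the toy points are invented for illustration.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels, centroids

# Hypothetical 2-D data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

All K clusters are produced in a single flat partition, in contrast to the nested sequence built by hierarchical methods. Which group receives label 0 or 1 depends on the random initialization.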

[Figure: partitional clusters with their centroids marked.]

Page 15: Prepared by: Mahmoud Rafeek Al-Farra

Neural Networks-Based Clustering

Neural networks (NNs) are able to learn complex relationships from data samples either in a supervised or unsupervised fashion.

In supervised learning, a labeled set of data is used to train the network to model the mapping from inputs to outputs prior to testing. Unsupervised networks, by contrast, do not use such a priori knowledge; they learn the underlying relationships directly from the data.

Page 16: Prepared by: Mahmoud Rafeek Al-Farra

Next:

• Cluster validation
• Examples of clustering algorithms

Prepare 2 slides for each of the following clustering algorithms:
• Density Clustering
• Grid Clustering

Page 17: Prepared by: Mahmoud Rafeek Al-Farra

Thanks