A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

Post on 22-Feb-2016



Intelligent Database Systems Lab

Presenter: Jian-Ren Chen

Authors: Natthakan Iam-On, Tossapon Boongoen, Simon Garrett, and Chris Price

2012, IEEE

A Link-Based Cluster Ensemble Approach for Categorical Data Clustering


Outlines

• Motivation

• Objectives

• Methodology

• Experiments

• Conclusions

• Comments


Motivation

• Cluster ensembles combine different clustering decisions in such a way as to achieve accuracy superior to that of any individual clustering.


Objectives

• A new link-based approach improves the conventional (binary) cluster-association matrix by discovering its unknown entries through the similarity between clusters in an ensemble.
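The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not give the formulas, so the sketch assumes that clusters are linked by the Jaccard overlap of their members, that the similarity between two clusters accumulates 1/W_k over every shared neighbour k (a Weighted Triple-Quality style score), and that a decay constant scales the result. The function name `refine_matrix` and the parameter `dc` with its default of 0.9 are hypothetical.

```python
def refine_matrix(ensemble, dc=0.9):
    """Sketch of a refined cluster-association matrix (assumed formulation).

    ensemble: list of base clusterings; each is a list of labels, one per object.
    dc: assumed decay constant in (0, 1] applied to the similarity scores.
    Returns (rm, clusters); clusters[j] is the (clustering index, label) of column j.
    """
    n = len(ensemble[0])
    # One graph vertex per (clustering index, label) pair.
    clusters = [(m, lab) for m, labels in enumerate(ensemble)
                for lab in sorted(set(labels))]
    members = [{i for i in range(n) if ensemble[m][i] == lab}
               for m, lab in clusters]
    c = len(clusters)

    # Edge weight: Jaccard overlap of member sets. Clusters of the same
    # base clustering are disjoint, so they are never directly linked.
    w = [[0.0] * c for _ in range(c)]
    for x in range(c):
        for y in range(c):
            if x != y:
                union = members[x] | members[y]
                w[x][y] = len(members[x] & members[y]) / len(union)
    neighbours = [{y for y in range(c) if w[x][y] > 0} for x in range(c)]
    total = [sum(w[k]) for k in range(c)]  # W_k: total edge weight at vertex k

    # WTQ-style score: every shared neighbour k contributes 1 / W_k.
    wtq = [[0.0] * c for _ in range(c)]
    for x in range(c):
        for y in range(c):
            if x != y:
                shared = neighbours[x] & neighbours[y]
                wtq[x][y] = sum(1.0 / total[k] for k in shared)
    wtq_max = max(max(row) for row in wtq) or 1.0
    sim = [[dc * v / wtq_max for v in row] for row in wtq]

    # Known entries stay 1; each former 0 becomes the similarity between
    # that cluster and the cluster the object belongs to in the same clustering.
    rm = []
    for i in range(n):
        row = []
        for j, (m, lab) in enumerate(clusters):
            if ensemble[m][i] == lab:
                row.append(1.0)
            else:
                own = clusters.index((m, ensemble[m][i]))
                row.append(sim[j][own])
        rm.append(row)
    return rm, clusters


rm, clusters = refine_matrix([[0, 0, 1, 1], [0, 0, 0, 1]])
```

In the binary matrix every entry for a cluster the object does not belong to is 0; here those entries take values between 0 and `dc`, so objects that sit in similar clusters end up with similar rows.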


Methodology

Creating a Cluster Ensemble

Generating a Refined Matrix

Applying a Consensus Function to RM


Methodology

Creating a Cluster Ensemble

• Type I (Direct ensemble)

• Type II (Full-space ensemble)

• Type III (Subspace ensemble)
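As a rough illustration of the first ensemble type, the sketch below assumes that a direct ensemble treats each categorical attribute as one base clustering, with the attribute's values acting as cluster labels. The slides do not spell this out, so both that reading and the function name `direct_ensemble` are assumptions. Types II and III, which need a base clustering algorithm, are not shown.

```python
def direct_ensemble(rows):
    """Build one base clustering per categorical attribute (assumed Type I).

    rows: list of records, each a tuple of categorical values.
    Returns a list of clusterings; clustering a holds, for every record,
    the value of attribute a, which serves as that record's cluster label.
    """
    n_attrs = len(rows[0])
    return [[row[a] for row in rows] for a in range(n_attrs)]


records = [("red", "small"), ("red", "large"), ("blue", "large")]
ensemble = direct_ensemble(records)
```

With this layout the ensemble has as many base clusterings as the data set has attributes, and no clustering algorithm needs to be run to create it.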


Methodology

• Given a graph G = (V, W), SPEC finds the eigenvectors of W with the K largest eigenvalues.

• These eigenvectors form the columns of another matrix U.
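The eigenvector step can be sketched with NumPy. The sketch assumes W is a symmetric weight matrix, which is what `numpy.linalg.eigh` requires; the function name `top_k_eigenvectors` is hypothetical, and any normalisation of W before this step is left out because the slides do not describe it.

```python
import numpy as np


def top_k_eigenvectors(W, k):
    """Return U, whose columns are the eigenvectors of symmetric W
    belonging to the k largest eigenvalues."""
    W = np.asarray(W, dtype=float)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(W)
    order = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    return vecs[:, order]


W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
U = top_k_eigenvectors(W, 2)  # one row per vertex, one column per eigenvector
```

In spectral clustering the rows of U are then commonly treated as points and grouped into K clusters, for example with k-means; whether the deck does exactly that is not recoverable from the transcript.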


Experiments

• Investigated Data Sets


Conclusions

• Constructing the RM is efficiently resolved by the similarity among categorical labels, using the Weighted Triple-Quality similarity algorithm.

• The link-based method usually achieves superior clustering results.


Comments

• Advantages

– The link-based method is efficient.

• Applications

– Categorical data clustering