Relative Validity Criteria for Community Mining Evaluation, ASONAM 2012. Reihaneh Rabbany, Mansoreh Takaffoli, Justin Fagnan, Osmar R. Zaïane and Ricardo J. G. B. Campello. Department of Computing Science, University of Alberta, Edmonton, Canada. Aug 2012

Relative Validity Criteria for Community Mining Evaluation

ASONAM 2012

Reihaneh Rabbany, Mansoreh Takaffoli, Justin Fagnan, Osmar R. Zaïane and Ricardo J. G. B. Campello

Department of Computing Science, University of Alberta, Edmonton, Canada

Aug 2012


Motivation

Applications in different domains: sociology, criminology

• Module identification in biological networks: clusters in protein-protein interaction networks correspond to protein complexes and parts of pathways; clusters in a protein similarity network correspond to protein families (R Guimerà et al., Functional cartography of complex metabolic networks, Nature 433, 2005)

Prerequisite of further analysis: targeted advertising, link prediction, recommendation

• Social networks: personalized news feed, easier privacy settings; Gmail's "Don't Forget Bob!" and "Got the Wrong Bob?" features (M Roth et al., Suggesting Friends Using the Implicit Social Graph, KDD 2010)

• Citation network of scholars: paper and collaborator recommendation, network visualization and navigation; e.g. CiteULike, Arnet Miner and Microsoft Academic

• Hyperlinks between web pages (WWW): detecting groups of closely related topics to refine search results (J Chen et al., An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities, Web Intelligence 2008)


Community

Loosely defined as a group of nodes that have relatively more links between themselves than to the rest of the network

• Nodes that have structural similarity (SCAN, Xu et al. 2007)

• Nodes that are connected through cliques (CFinder by Palla et al. 2005)

• Nodes within which a random walk is likely to be trapped (MCL by van Dongen, Walktrap by Pons and Latapy)

• Nodes that follow the same leader (TopLeaders, 2010)

• Nodes that make the graph compress efficiently (Infomap, Infomod, Rosvall and Bergstrom, 2011)

• Nodes that are separated from the rest by min cut or conductance (flow based methods, e.g. Kernighan-Lin (KL), betweenness of Newman)

• Nodes with more links between them than expected by chance (Newman's Q modularity, FastModularity, Blondel et al.'s Louvain)
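The "more links than chance" notion of a community can be made concrete with Newman's Q modularity. The following is a minimal sketch for an undirected, unweighted edge list; the function name `modularity` and the toy graph are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman's Q for an undirected, unweighted graph given as an edge list.

    Q = sum over communities c of (e_c / m) - (d_c / 2m)^2, where e_c is the
    number of edges inside c, d_c the total degree of c, and m the edge count.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community_of[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in range(6)}

q_split = modularity(edges, split)    # one community per triangle
q_merged = modularity(edges, merged)  # everything in one community
```

On this toy graph the split into two triangles scores higher than the single merged community, which scores zero.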


Evaluation; overlooked

Internal Evaluation: predefined quality/structure for the communities

• Graph partitioning measures (density, conductance)
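As an illustration of the two graph partitioning measures named above, here is a small sketch using their common textbook definitions: internal density as the fraction of possible internal edges that are present, and conductance as the cut size divided by the smaller of the two volumes. The function name and the toy graph are assumptions made for this example.

```python
def density_and_conductance(edges, members):
    """Internal density and conductance of one community in an undirected graph."""
    members = set(members)
    internal = cut = 0
    vol_in = vol_out = 0  # sum of degrees inside / outside the community
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            cut += 1
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    n = len(members)
    possible = n * (n - 1) / 2
    density = internal / possible if possible else 0.0
    smaller_volume = min(vol_in, vol_out)
    conductance = cut / smaller_volume if smaller_volume else 0.0
    return density, conductance

# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
density, conductance = density_and_conductance(edges, {0, 1, 2})
```

A dense, well separated community has density close to 1 and conductance close to 0.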

External Evaluation: agreement between the results and a given known ground truth

• Clustering similarity/agreement indexes; Rand Index, Jaccard

• Benchmarks with ground truth; GN (2002), LFR (2008)
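The Rand and Jaccard agreement indexes can both be computed by counting node pairs that are grouped together or apart in the two clusterings. The sketch below uses the standard pair-counting definitions; the function names and label vectors are illustrative.

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Count node pairs by whether each clustering puts them in the same group."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither

def rand_index(labels_a, labels_b):
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)

def jaccard_index(labels_a, labels_b):
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    return both / denom if denom else 1.0

ground_truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]  # node 2 placed in the wrong community
ri = rand_index(ground_truth, found)
ji = jaccard_index(ground_truth, found)
```

Jaccard ignores pairs separated in both clusterings, so it is usually the stricter of the two.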


The community structure is not known beforehand

No ground truth: no large data set with known ground truth

The synthetic benchmarks disagree with some real network characteristics

[Figure panels: LFR, GN, Karate]


Relative Validity Criteria

Validity criteria defined for clustering evaluation; they compare different clusterings of the same data set

We altered the criteria:

• Generalized distance; graph distance measures

• Generalized mean/centroid notion; averaging vs. medoid

e.g. Variance Ratio Criterion (VRC)

Same for: Dunn index, Silhouette Width Criterion (SWC), Alternative Silhouette, PBM, C-Index, Z-Statistics, Point-Biserial (PB)

Distance Alternatives: Edge Path (ED), Shortest Path Distance (SPD), Adjacency Relation Distance (ARD), Neighbour Overlap Distance (NOD), Pearson Correlation Distance (PCD), ICloseness Distance (ICD)
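The VRC formula from the slide did not survive in this transcript, so the following is only a sketch of how the two generalizations could fit together. It assumes the classic Calinski-Harabasz form with squared distances, replaces the Euclidean distance by shortest path length (SPD) and the centroid by a medoid. All function names are hypothetical, and the graph is assumed to be connected with more than one cluster.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop counts from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def medoid(nodes, dist):
    """Node with the smallest total distance to the other nodes in `nodes`."""
    return min(nodes, key=lambda c: sum(dist[c][x] for x in nodes))

def vrc(adj, clusters):
    """Variance Ratio Criterion with graph distance and medoids (assumed form)."""
    dist = {u: shortest_path_lengths(adj, u) for u in adj}
    all_nodes = list(adj)
    n, k = len(all_nodes), len(clusters)
    global_medoid = medoid(all_nodes, dist)
    between = within = 0.0
    for members in clusters:
        m = medoid(members, dist)
        between += len(members) * dist[m][global_medoid] ** 2
        within += sum(dist[x][m] ** 2 for x in members)
    return (between / (k - 1)) / (within / (n - k))

# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
vrc_good = vrc(adj, [[0, 1, 2], [3, 4, 5]])  # one cluster per triangle
vrc_bad = vrc(adj, [[0, 3, 4], [1, 2, 5]])   # clusters mixing both triangles
```

Higher values indicate compact, well separated clusters, so the triangle-respecting clustering is expected to score above the mixed one.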


Correlation with External Index

Correlation between relative criteria and external scores on different clusterings of the same data set

Random clusterings that range from very close to very far from the ground truth

For karate: [figure not preserved in the transcript]
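The experiment described on this slide can be sketched as follows: perturb the ground truth by a growing number of node moves, score every perturbed clustering with an external index and with a relative criterion, then correlate the two score lists. Everything here is an assumption for illustration: the perturbation scheme, the toy graph, and in particular `internal_edge_fraction`, which is only a stand-in for a real relative criterion.

```python
import random
from itertools import combinations

def perturb(labels, n_moves, rng):
    """Copy `labels` and move `n_moves` random nodes to a different cluster."""
    out = list(labels)
    groups = sorted(set(labels))
    for i in rng.sample(range(len(out)), n_moves):
        out[i] = rng.choice([g for g in groups if g != out[i]])
    return out

def rand_index(a, b):
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def internal_edge_fraction(edges, labels):
    """Stand-in relative criterion: fraction of edges that stay inside clusters."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): two 4-cliques joined by the edge (3, 4).
truth = [0, 0, 0, 0, 1, 1, 1, 1]
edges = [(u, v) for block in ([0, 1, 2, 3], [4, 5, 6, 7])
         for u, v in combinations(block, 2)] + [(3, 4)]
rng = random.Random(0)
samples = [perturb(truth, moves, rng) for moves in range(5) for _ in range(5)]
external = [rand_index(truth, s) for s in samples]
relative = [internal_edge_fraction(edges, s) for s in samples]
r_pearson = pearson(external, relative)
r_spearman = spearman(external, relative)
```

A criterion whose scores correlate strongly with the external index across such samples is one that would pick clusterings close to the ground truth.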


Ranking of Criteria on Real World Benchmarks

[Tables and figures not preserved in the transcript: data set statistics; overall ranking; difficulty analysis]


Ranking of Criteria on Synthetic Benchmarks

[Tables and figures not preserved in the transcript: data set statistics; overall ranking for very mixed communities; ranking for well separated communities]


Ranking varies

Criteria ranking is affected by:

• Choice of benchmarks, synthetic generator and its parameters

• Choice of external agreement index; ARI, NMI, AMI, Jaccard

• Choice of correlation measure; Pearson and Spearman correlation

• Choice of clustering randomization

Get the ranking in your setting:

www.cs.ualberta.ca/~rabbanyk/criteriaComparison
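To show what swapping the external agreement index involves, here is a minimal sketch of the Adjusted Rand Index (ARI) in its standard contingency-table form. The function name and the label vectors are illustrative and do not come from the authors' comparison tool.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI: pair agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both put all nodes together
        return 1.0
    return (pairs_joint - expected) / (maximum - expected)

ground_truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]  # node 2 placed in the wrong community
ari = adjusted_rand_index(ground_truth, found)
```

For this pair of clusterings the plain Rand index is 2/3 while the ARI is about 0.32, so the two indexes put the same result on quite different scales.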


Future Works

Evaluation issues

• Community mining specific agreement measure

• Realistic synthetic benchmarks

Extensions of criteria

• Incorporating attributes; combine clustering and community mining for cases for which we have both attributes and relations

• Incorporating uncertainty and edges with probability

• ...


End

Questions?


Alternative Distances

• Edge Path (ED)

• Shortest Path Distance (SPD)

• Adjacency Relation Distance (ARD)

• Neighbour Overlap Distance (NOD)

• Pearson Correlation Distance (PCD)

• ICloseness Distance (ICD)
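The slides name these distances without giving formulas. The sketch below shows one plausible reading of three of them for an unweighted graph, and each definition is an assumption: NOD as one minus the Jaccard overlap of the two neighbourhoods, ARD as the Euclidean distance between adjacency rows (ignoring the two nodes themselves), and PCD as one minus the Pearson correlation of the adjacency rows.

```python
from math import sqrt

def neighbour_overlap_distance(adj, i, j):
    """Assumed form: 1 - |N(i) & N(j)| / |N(i) | N(j)|."""
    union = adj[i] | adj[j]
    return 1.0 - len(adj[i] & adj[j]) / len(union) if union else 1.0

def adjacency_relation_distance(adj, i, j):
    """Assumed form: Euclidean distance of adjacency rows, skipping columns i, j."""
    return sqrt(len((adj[i] ^ adj[j]) - {i, j}))

def pearson_correlation_distance(adj, i, j):
    """Assumed form: 1 - Pearson correlation of the two adjacency rows."""
    nodes = list(adj)
    row_i = [1.0 if k in adj[i] else 0.0 for k in nodes]
    row_j = [1.0 if k in adj[j] else 0.0 for k in nodes]
    mean_i, mean_j = sum(row_i) / len(nodes), sum(row_j) / len(nodes)
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(row_i, row_j))
    var_i = sum((a - mean_i) ** 2 for a in row_i)
    var_j = sum((b - mean_j) ** 2 for b in row_j)
    if var_i == 0 or var_j == 0:  # isolated or fully connected node
        return 1.0
    return 1.0 - cov / sqrt(var_i * var_j)

# Toy graph (hypothetical): two triangles joined by the edge 2-3,
# stored as a dict mapping each node to its neighbour set.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
nod_near = neighbour_overlap_distance(adj, 0, 1)    # same triangle
nod_far = neighbour_overlap_distance(adj, 0, 4)     # different triangles
ard_near = adjacency_relation_distance(adj, 0, 1)
ard_far = adjacency_relation_distance(adj, 0, 4)
pcd_near = pearson_correlation_distance(adj, 0, 1)
pcd_far = pearson_correlation_distance(adj, 0, 4)
```

Under all three readings, nodes in the same triangle come out closer than nodes in different triangles.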


Relative criteria

• Variance Ratio Criterion (VRC)

• Dunn index

• Silhouette Width Criterion (SWC)

• Alternative Silhouette

• PBM

• Davies-Bouldin

• C-Index

• Point-Biserial (PB)
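As one example of how a criterion from this list works with a generalized distance, here is a sketch of the Silhouette Width Criterion written against a precomputed distance table, so any graph distance can be plugged in. It follows the standard silhouette definition; the function name is illustrative, and the sketch assumes at least two clusters and positive distances between distinct nodes.

```python
def silhouette_width(dist, labels):
    """Mean silhouette over all nodes; singleton clusters contribute 0."""
    clusters = {}
    for node, label in labels.items():
        clusters.setdefault(label, []).append(node)
    total = 0.0
    for i, own_label in labels.items():
        own = clusters[own_label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the node's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist[i][j] for j in members) / len(members)
                for label, members in clusters.items() if label != own_label)
        total += (b - a) / max(a, b)
    return total / len(labels)

# Toy example (hypothetical): path graph 0-1-2-3 with hop-count distances.
dist = {i: {j: abs(i - j) for j in range(4)} for i in range(4)}
swc_good = silhouette_width(dist, {0: "a", 1: "a", 2: "b", 3: "b"})
swc_bad = silhouette_width(dist, {0: "a", 1: "b", 2: "a", 3: "b"})
```

Values close to 1 mean nodes sit much nearer to their own cluster than to any other; negative values suggest misplaced nodes.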
