A General and Scalable Approach to Mixed Membership Clustering
Frank Lin ∙ William W. Cohen
School of Computer Science ∙ Carnegie Mellon University
December 11, 2012 ∙ International Conference on Data Mining
Transcript
Page 1

A General and Scalable Approach to Mixed Membership Clustering

Frank Lin ∙ William W. Cohen
School of Computer Science ∙ Carnegie Mellon University

December 11, 2012 ∙ International Conference on Data Mining

Page 2

Mixed Membership Clustering

Page 3

Motivation

• Spectral clustering is nice
• But two drawbacks:
  ◦ Computationally expensive
  ◦ No mixed-membership clustering

Page 4

Our Solution

• Convert a node-centric representation of the graph to an edge-centric one

• Adapt this representation to work with a scalable clustering method - Power Iteration Clustering

Page 5

Mixed Membership Clustering

Page 6

Perspective

• Since:
  ◦ an edge represents a relationship between two entities, and
  ◦ an entity can belong to as many groups as its relationships...
• Why don't we group the relationships instead of the entities?

Page 7

Edge Clustering

Page 8

Edge Clustering

• Assumptions:
  ◦ An edge represents a relationship between two nodes
  ◦ A node can belong to multiple clusters, but an edge can only belong to one

Quite general – we can allow parallel edges if needed

Page 9

Edge Clustering

• How to cluster edges?
• Need an edge-centric view of the graph G
  ◦ Traditionally: a line graph L(G)
    • Problem: potential (and likely) size blow-up! size(L(G)) = O(size(G)²)
  ◦ Our solution: a bipartite feature graph B(G)
    • Space-efficient: size(B(G)) = O(size(G))

Transform edges into nodes!

Side note: B(G) can also be used to represent tensors efficiently!
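To make the transformation concrete, here is a minimal sketch of building B(G) from an affinity matrix, assuming the graph is given as a symmetric SciPy sparse matrix; the function name and the choice to copy the edge weight onto both new links are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: build the bipartite feature graph B(G) from an affinity
# matrix A. Assumptions: A is a symmetric SciPy sparse matrix (undirected
# graph); names and the weight-copying choice are illustrative.
import numpy as np
import scipy.sparse as sp

def build_bfg(A):
    """Return B(G) as a (|V|+|E|) x (|V|+|E|) sparse matrix plus the edge list.

    Each edge (u, v) of G becomes a new node linked to its two endpoints, so
    B(G) has only O(|E|) nonzeros -- linear in the size of G, unlike L(G).
    """
    A = sp.coo_matrix(A)
    n = A.shape[0]
    keep = A.row < A.col                              # each undirected edge once
    edges = list(zip(A.row[keep], A.col[keep], A.data[keep]))

    rows, cols, vals = [], [], []
    for e_idx, (u, v, w) in enumerate(edges):
        e_node = n + e_idx                            # index of the edge-node
        for endpoint in (u, v):
            rows += [e_node, endpoint]                # edge-node <-> endpoint
            cols += [endpoint, e_node]
            vals += [w, w]
    size = n + len(edges)
    return sp.csr_matrix((vals, (rows, cols)), shape=(size, size)), edges

# Toy graph from the slides: nodes a..e with edges ab, ac, bc, cd, ce.
A = sp.csr_matrix(np.array([[0, 1, 1, 0, 0],
                            [1, 0, 1, 0, 0],
                            [1, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0],
                            [0, 0, 1, 0, 0]], dtype=float))
B, edges = build_bfg(A)
print(B.shape, B.nnz)   # (10, 10) with 20 nonzeros: 5 node-rows + 5 edge-rows
```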

Page 10

Edge Clustering

[Figure: the original graph G (nodes a–e; edges ab, ac, bc, cd, ce); the line graph L(G), which is costly for star-shaped structures; and BFG, the bipartite feature graph B(G), which only uses twice the space of G.]

Page 11

Edge Clustering

• A general recipe:
  1. Transform the affinity matrix A into B(A)
  2. Run the clustering method and get an edge clustering
  3. For each node, determine mixed membership based on the membership of its incident edges

The matrix dimensions of B(A) are very big – we can only use sparse methods on large datasets

Perfect for PIC and implicit manifolds! ☺

Page 12

Edge Clustering

What are the dimensions of the matrix that represents B(A)?

If A is a |V| x |V| matrix… then B(A) is a (|V|+|E|) x (|V|+|E|) matrix!

Need a clustering method that takes full advantage of the sparsity of B(A)!

Page 13

Power Iteration Clustering: Quick Overview

• Spectral clustering methods are nice, a natural choice for graph data
• But they are expensive (slow)
• Power iteration clustering (PIC) can provide a similar solution at a very low cost (fast)!

Page 14

The Power Iteration

Begins with a random vector; ends with a piece-wise constant vector!

[Figure: the overall absolute distance between points decreases; here we show relative distance.]

Page 15

Implication

• We know: the 2nd to kth eigenvectors of W = D⁻¹A are roughly piece-wise constant with respect to the underlying clusters, each separating a cluster from the rest of the data (Meila & Shi 2001)

• Then: a linear combination of piece-wise constant vectors is also piece-wise constant!

Page 16

Spectral Clustering

[Figure: example datasets; the values of the 2nd and 3rd smallest eigenvectors, plotted against point index, are roughly piece-wise constant across clusters 1, 2, 3 and together span the clustering space.]

Page 17

Linear Combination…

[Figure: adding two piece-wise constant vectors gives another piece-wise constant vector.]

Page 18

Power Iteration Clustering

[Figure: PIC results – the clusters are separated along the one-dimensional embedding v^t.]

Page 19

Power Iteration Clustering

• The algorithm:

Page 20

Power Iteration Clustering

Key idea: to do clustering, we may not need all the information in a full spectral embedding (e.g., distances between clusters in a k-dimensional eigenspace)

We just need the clusters to be separated in some space

Page 21

Mixed Membership Clustering with PIC

• Now we have
  ◦ a sparse matrix representation, and
  ◦ a fast clustering method that works on sparse matrices
• We're good to go!

Not so fast!!

Iterative methods like PageRank and power iteration don't work on bipartite graphs, and B(A) is a bipartite graph!

Solution: convert it to a unipartite (aperiodic) graph!

Page 22

Mixed Membership Clustering with PIC

• Define a similarity function:

The similarity between edges i and j… is proportional to the product of the incident nodes they have in common… and inversely proportional to the number of edges this node is incident to.

Then we simply use a matrix S where S(i, j) = s(i, j) in place of B(A)!

Page 23

Mixed Membership Clustering with PIC

• Now we have
  ◦ a sparse matrix representation, and
  ◦ a fast clustering method that works on sparse matrices, and
  ◦ a unipartite graph
• We're good to go?

Similar to line graphs, the matrix S may no longer be sparse (e.g., for star shapes)!

Back to where we started?

Page 24

Mixed Membership Clustering with PIC

• Observations:

S = N F Fᵀ (a product of sparse matrices)

Page 25

Mixed Membership Clustering with PIC

• Simply replace one line:

Page 26

Mixed Membership Clustering with PIC

• Simply replace one line:

We get the exact same result, but with all sparse matrix operations
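The "replace one line" trick can be sketched as follows. Here F is assumed to be the |E| × |V| edge–node incidence matrix and N a diagonal normalizer, following the S = N F Fᵀ observation above; the exact normalization in the paper may differ, so treat this as an illustration of computing Sv with only sparse operations rather than as the authors' implementation.

```python
# Sketch of the sparse trick: never materialize S = N F F^T; compute S @ v as
# N @ (F @ (F.T @ v)) inside the iteration. Assumptions: F is the |E| x |V|
# edge-node incidence matrix and N a diagonal row-normalizer standing in for
# the normalization implied by the similarity function.
import numpy as np
import scipy.sparse as sp

def incidence_matrix(edges, n_nodes):
    """|E| x |V| matrix with F[e, u] = F[e, v] = 1 for each edge e = (u, v)."""
    rows = np.repeat(np.arange(len(edges)), 2)
    cols = np.array([node for u, v in edges for node in (u, v)])
    data = np.ones(len(cols))
    return sp.csr_matrix((data, (rows, cols)), shape=(len(edges), n_nodes))

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4)]          # ab, ac, bc, cd, ce
F = incidence_matrix(edges, 5)
co = F @ F.T                                              # edge-edge co-incidence
N = sp.diags(1.0 / np.asarray(co.sum(axis=1)).ravel())    # diagonal normalizer
S = N @ co                                                # may lose sparsity (star shapes!)

v = np.random.rand(len(edges))
dense_step = S @ v                                        # one iteration step with S
sparse_step = N @ (F @ (F.T @ v))                         # same step, sparse ops only
print(np.allclose(dense_step, sparse_step))               # True
```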

Page 27

• That’s pretty cool. But how well does it work?

Page 28

Experiments

• Compare:
  ◦ NCut
  ◦ Node-PIC (single membership)
  ◦ MM-PIC using different cluster label schemes:
    • Max - pick the most frequent edge cluster (single membership)
    • T@40 - pick edge clusters with at least 40% frequency
    • T@20 - pick edge clusters with at least 20% frequency
    • T@10 - pick edge clusters with at least 10% frequency
    • All - use all incident edge clusters

From 1 cluster label … to many labels?
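For concreteness, a small sketch of these label-assignment schemes, assuming an edge clustering is already available; the function name, signature, and tie handling are illustrative, not the paper's.

```python
# Sketch of the cluster-label schemes (Max, T@p, All), given the edge clusters
# incident to one node; names and tie handling are illustrative.
from collections import Counter

def node_labels(incident_edge_clusters, scheme="max", threshold=0.4):
    """incident_edge_clusters: cluster ids of one node's incident edges."""
    counts = Counter(incident_edge_clusters)
    total = sum(counts.values())
    if scheme == "max":                        # single membership
        return {counts.most_common(1)[0][0]}
    if scheme == "threshold":                  # T@40, T@20, T@10, ...
        return {c for c, k in counts.items() if k / total >= threshold}
    if scheme == "all":                        # every incident edge cluster
        return set(counts)
    raise ValueError(scheme)

# A node with five incident edges assigned to edge clusters 0, 0, 0, 1, 2:
print(node_labels([0, 0, 0, 1, 2], "max"))              # {0}
print(node_labels([0, 0, 0, 1, 2], "threshold", 0.4))   # {0}        (T@40)
print(node_labels([0, 0, 0, 1, 2], "threshold", 0.2))   # {0, 1, 2}  (T@20)
print(node_labels([0, 0, 0, 1, 2], "all"))              # {0, 1, 2}
```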

Page 29

Experiments

• Data source:
  ◦ BlogCat1
    • 10,312 blogs and links
    • 39 overlapping category labels
  ◦ BlogCat2
    • 88,784 blogs and links
    • 60 overlapping category labels
• Datasets:
  ◦ Pick pairs of categories with enough overlap (at least 1%)
  ◦ BlogCat1: 86 category pair datasets
  ◦ BlogCat2: 158 category pair datasets

Page 30

Result

• F1 scores for clustering category pairs from the BlogCat1 dataset:

Max is better than Node!

Generally a lower threshold is better, but not All

Page 31

Result

• Important - MM-PIC wins where it matters:

[Figure: each point is a two-cluster dataset; x-axis: ratio of mixed membership instances; y-axis: difference in F1 score when the method "wins"; also noted: # of datasets where each method "wins". When MM-PIC does better, it does much better, and MM-PIC does better on datasets with more mixed membership instances.]

Page 32

MM-PIC Result

• F1 scores for clustering category pairs from the (bigger) BlogCat2 dataset:

More differences between the thresholds: threshold matters!

Did not use NCut because the datasets are too big...

Page 33

Result

• Again, MM-PIC wins where it matters:

Page 34

Questions?

[Thesis roadmap, organized along clustering and classification: ch2+3 PIC (ICML 2010); ch4+5 MRW (ASONAM 2010); ch6 Implicit Manifolds; ch6.1 IM-PIC (ECAI 2010); ch6.2 IM-MRW (MLG 2011); ch7 MM-PIC (in submission); ch8 GK SSL (in submission); ch9 Future Work]

Page 35

Additional Slides


Page 36

Power Iteration Clustering

• Spectral clustering methods are nice, a natural choice for graph data
• But they are expensive (slow)
• Power iteration clustering (PIC) can provide a similar solution at a very low cost (fast)!

Page 37

Background: Spectral Clustering

Normalized Cut algorithm (Shi & Malik 2000):
1. Choose k and a similarity function s
2. Derive A from s; let W = I - D⁻¹A, where D is a diagonal matrix with D(i,i) = Σⱼ A(i,j)
3. Find the eigenvectors and corresponding eigenvalues of W
4. Pick the eigenvectors of W with the 2nd to kth smallest corresponding eigenvalues
5. Project the data points onto the space spanned by these eigenvectors
6. Run k-means on the projected data points
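A compact sketch of the recipe above, assuming a dense symmetric NumPy affinity matrix (real implementations use sparse eigensolvers, and sklearn's KMeans stands in for step 6). It uses the equivalence noted later in the deck: the smallest eigenvalues of W = I - D⁻¹A correspond to the largest eigenvalues of D⁻¹A, which can be obtained from the generalized symmetric problem Ax = λDx.

```python
# Compact Normalized Cut sketch (dense matrices for clarity; illustrative only).
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def ncut(A, k):
    D = np.diag(A.sum(axis=1))
    vals, vecs = eigh(A, D)      # eigenvectors of D^-1 A, eigenvalues ascending
    embed = vecs[:, -k:-1]       # 2nd..kth largest (skip the trivial top one)
    return KMeans(n_clusters=k, n_init=10).fit_predict(embed)

# Two weakly connected blobs:
A = np.array([[0, 5, 5, 1, 0, 0],
              [5, 0, 5, 0, 0, 0],
              [5, 5, 0, 0, 0, 0],
              [1, 0, 0, 0, 5, 5],
              [0, 0, 0, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], dtype=float)
print(ncut(A, 2))   # e.g. [0 0 0 1 1 1] (cluster ids are arbitrary)
```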

Page 38

Background: Spectral Clustering

[Figure: example datasets; the values of the 2nd and 3rd smallest eigenvectors, plotted against point index, are roughly piece-wise constant across clusters 1, 2, 3 and together span the clustering space.]

Page 39

Background: Spectral Clustering

Normalized Cut algorithm (Shi & Malik 2000):
1. Choose k and a similarity function s
2. Derive A from s; let W = I - D⁻¹A, where D is a diagonal matrix with D(i,i) = Σⱼ A(i,j)
3. Find the eigenvectors and corresponding eigenvalues of W
4. Pick the eigenvectors of W with the 2nd to kth smallest corresponding eigenvalues
5. Project the data points onto the space spanned by these eigenvectors
6. Run k-means on the projected data points

Finding the eigenvectors and eigenvalues of a matrix is slow in general (there are more efficient approximation methods*)

Can we find a similar low-dimensional embedding for clustering without eigenvectors?

Note: the eigenvectors of I - D⁻¹A corresponding to the smallest eigenvalues are the eigenvectors of D⁻¹A corresponding to the largest

Page 40

The Power Iteration

• The power iteration is a simple iterative method for finding the dominant eigenvector of a matrix:

v^(t+1) = cWv^t

W : a square matrix
v^t : the vector at iteration t; v^0 is typically a random vector
c : a normalizing constant to keep v^t from getting too large or too small

Typically converges quickly; fairly efficient if W is a sparse matrix
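A tiny sketch of the update v^(t+1) = cWv^t, using L1 normalization as the constant c (a common choice; an assumption here).

```python
# Power iteration sketch: repeated multiplication converges toward the
# dominant eigenvector of W; c is implemented as L1 normalization.
import numpy as np

def power_iteration(W, n_iter=100, seed=0):
    v = np.random.default_rng(seed).random(W.shape[0])   # v^0: a random vector
    for _ in range(n_iter):
        v = W @ v
        v /= np.abs(v).sum()        # c keeps v from growing or shrinking
    return v

W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
print(power_iteration(W))           # approaches the dominant eigenvector of W
```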

Page 41

The Power Iteration

• The power iteration is a simple iterative method for finding the dominant eigenvector of a matrix:

v^(t+1) = cWv^t

What if we let W = D⁻¹A (like Normalized Cut)? i.e., a row-normalized affinity matrix

Page 42

The Power Iteration

Begins with a random vector; ends with a piece-wise constant vector!

[Figure: the overall absolute distance between points decreases; here we show relative distance.]

Page 43

Implication

• We know: the 2nd to kth eigenvectors of W = D⁻¹A are roughly piece-wise constant with respect to the underlying clusters, each separating a cluster from the rest of the data (Meila & Shi 2001)

• Then: a linear combination of piece-wise constant vectors is also piece-wise constant!

Page 44

Spectral Clustering

[Figure: example datasets; the values of the 2nd and 3rd smallest eigenvectors, plotted against point index, are roughly piece-wise constant across clusters 1, 2, 3 and together span the clustering space.]

Page 45

Linear Combination…

[Figure: adding two piece-wise constant vectors gives another piece-wise constant vector.]

Page 46

Power Iteration Clustering

[Figure: PIC results – the clusters are separated along the one-dimensional embedding v^t.]

Page 47

Power Iteration Clustering

Key idea: to do clustering, we may not need all the information in a full spectral embedding (e.g., distances between clusters in a k-dimensional eigenspace)

We just need the clusters to be separated in some space

Page 48

When to Stop

The power iteration written in terms of its eigen-components:

v^t = c1 λ1^t e1 + … + ck λk^t ek + … + cn λn^t en

If we normalize (divide through by c1 λ1^t):

v^t / (c1 λ1^t) = e1 + (c2/c1)(λ2/λ1)^t e2 + … + (ck/c1)(λk/λ1)^t ek + … + (cn/c1)(λn/λ1)^t en

At the beginning, v changes fast, "accelerating" to converge locally due to "noise terms" with small λ

When the "noise terms" have gone to zero, v changes slowly ("constant speed") because only the larger λ terms (2…k) are left, where the eigenvalue ratios are close to 1

Because they are raised to the power t, the eigenvalue ratios determine how fast v converges to e1

Page 49

Power Iteration Clustering

• A basic power iteration clustering (PIC) algorithm:

Input: a row-normalized affinity matrix W and the number of clusters k
Output: clusters C1, C2, …, Ck

1. Pick an initial vector v^0
2. Repeat:
   • Set v^(t+1) ← Wv^t
   • Set δ^(t+1) ← |v^(t+1) - v^t|
   • Increment t
   • Stop when |δ^t - δ^(t-1)| ≈ 0
3. Use k-means to cluster the points on v^t and return clusters C1, C2, …, Ck
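Putting the pieces together, a minimal sketch of the algorithm above, assuming a row-normalized (dense or SciPy sparse) affinity matrix W; the stopping threshold and the use of sklearn's KMeans for step 3 are illustrative choices, not the authors' exact settings.

```python
# Minimal PIC sketch following the pseudocode above; thresholds are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def pic(W, k, eps=1e-5, max_iter=1000, seed=0):
    n = W.shape[0]
    v = np.random.default_rng(seed).random(n)
    v /= v.sum()                                   # initial vector v^0
    delta_prev = np.inf
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()               # keep v at unit L1 norm
        delta = np.abs(v_new - v).sum()            # delta^(t+1) = |v^(t+1) - v^t|
        v = v_new
        if abs(delta - delta_prev) < eps / n:      # |delta^t - delta^(t-1)| ~ 0
            break
        delta_prev = delta
    # step 3: k-means on the 1-D embedding v^t
    return KMeans(n_clusters=k, n_init=10).fit_predict(v.reshape(-1, 1))

A = np.array([[0, 5, 5, 1, 0, 0],
              [5, 0, 5, 0, 0, 0],
              [5, 5, 0, 0, 0, 0],
              [1, 0, 0, 0, 5, 5],
              [0, 0, 0, 5, 0, 5],
              [0, 0, 0, 5, 5, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)               # row-normalized affinity
print(pic(W, 2))                                   # e.g. [0 0 0 1 1 1]
```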

Page 50

Evaluating Clustering for Network Datasets

Each dataset is an undirected, weighted, connected graph

Every node is labeled by a human to belong to one of k classes

Clustering methods are only given k and the input graph

Clusters are matched to classes using the Hungarian algorithm

We use classification metrics such as accuracy, precision, recall, and F1 score; we also use clustering metrics such as purity and normalized mutual information (NMI)
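The matching step can be sketched with SciPy's Hungarian-algorithm solver; the helper name and the accuracy-only output are illustrative.

```python
# Sketch: align predicted clusters with true classes via the Hungarian
# algorithm, then score with a classification metric (accuracy shown here).
import numpy as np
from scipy.optimize import linear_sum_assignment

def matched_accuracy(y_true, y_pred):
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # contingency[i, j] = number of points of class i placed in cluster j
    contingency = np.array([[np.sum((y_true == c) & (y_pred == g))
                             for g in clusters] for c in classes])
    rows, cols = linear_sum_assignment(-contingency)   # maximize matched counts
    return contingency[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 1, 0, 0, 2])       # cluster ids are arbitrary labels
print(matched_accuracy(y_true, y_pred))     # 5/6 ~ 0.83
```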

Page 51

PIC Runtime

[Figure: runtime comparison of Normalized Cut, Normalized Cut with faster eigencomputation, and PIC; one series is annotated "Ran out of memory (24GB)".]

Page 52

PIC Accuracy on Network Datasets

Upper triangle: PIC does better
Lower triangle: NCut or NJW does better

Page 53

Multi-Dimensional PIC

• One robustness question for vanilla PIC as data size and complexity grow:

• How many (noisy) clusters can you fit in one dimension without them “colliding”?

[Figure: two 1-D embeddings, one with cluster signals cleanly separated, one a little too close for comfort?]

Page 54

Multi-Dimensional PIC

• Solution:
  ◦ Run PIC d times with different random starts and construct a d-dimensional embedding
  ◦ It is unlikely that any pair of clusters collides on all d dimensions
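A sketch of this multi-dimensional variant, assuming the 1-D PIC embedding loop from earlier; running it d times with different seeds and stacking the vectors is the whole idea, and the parameter names are illustrative.

```python
# Multi-dimensional PIC sketch: d PIC embeddings from different random starts,
# stacked into an n x d embedding before k-means. Illustrative names/defaults.
import numpy as np
from sklearn.cluster import KMeans

def pic_embedding(W, eps=1e-5, max_iter=1000, seed=0):
    v = np.random.default_rng(seed).random(W.shape[0])
    v /= v.sum()
    delta_prev = np.inf
    for _ in range(max_iter):
        v_new = W @ v
        v_new /= np.abs(v_new).sum()
        delta = np.abs(v_new - v).sum()
        v = v_new
        if abs(delta - delta_prev) < eps / W.shape[0]:
            break
        delta_prev = delta
    return v

def multi_dim_pic(W, k, d=4):
    # one column per run; a cluster pair is unlikely to collide in every column
    embed = np.column_stack([pic_embedding(W, seed=s) for s in range(d)])
    return KMeans(n_clusters=k, n_init=10).fit_predict(embed)

# usage: labels = multi_dim_pic(W, k, d=4), with W a row-normalized affinity matrix
```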

Page 55

Multi-Dimensional PIC

• Results on network classification datasets:

[Figure legend: RED: PIC using 1 random start vector; GREEN: PIC using 1 degree start vector; BLUE: PIC using 4 random start vectors.]

1-D PIC embeddings lose on accuracy at higher k's (# of clusters) compared to NCut and NJW, but using 4 random vectors instead helps!

Note: # of vectors << k

Page 56

PIC Related Work

• Related clustering methods:

PIC is the only one using a reduced dimensionality – a critical feature for graph data!

Page 57

Multi-Dimensional PIC

• Results on name disambiguation datasets:

Again, using 4 random vectors seems to work!

Again, note # of vectors << k

Page 58

PIC: Versus Popular Fast Sparse Eigencomputation Methods

For Symmetric Matrices / For General Matrices / Improvement:
• Successive Power Method: basic; numerically unstable, can be slow
• Lanczos Method / Arnoldi Method: more stable, but requires lots of memory
• Implicitly Restarted Lanczos Method (IRLM) / Implicitly Restarted Arnoldi Method (IRAM): more memory-efficient

Method / Time / Space:
• IRAM: time (O(m^3) + (O(nm) + O(e)) × O(m-k)) × (# restarts); space O(e) + O(nm)
• PIC: time O(e) × (# iterations); space O(e)

Randomized sampling methods are also popular

Page 59

PIC: Another View

• PIC’s low-dimensional embedding, which we will call a power iteration embedding (PIE), is related to diffusion maps:

(Coifman & Lafon 2006)

Page 60

PIC: Another View

• PIC’s low-dimensional embedding, which we will call a power iteration embedding (PIE), is related to diffusion maps:

(Coifman & Lafon 2006)

Page 61

PIC: Another View

• PIC’s low-dimensional embedding, which we will call a power iteration embedding (PIE), is related to diffusion maps:

(Coifman & Lafon 2006)

Page 62

PIC: Another View

• Result:

PIE is a random projection of the data in the diffusion space W with scale parameter t

We can use results from diffusion maps for applying PIC!

We can also use results from random projection for applying PIC!

Page 63

PIC Extension: Hierarchical Clustering

• Real, large-scale data may not have a “flat” clustering structure

• A hierarchical view may be more useful

Good news: the dynamics of a PIC embedding display a hierarchically convergent behavior!

Page 64

PIC Extension: Hierarchical Clustering

• Why?
• Recall the PIC embedding at time t (in its normalized, component form):

v^t / (c1 λ1^t) = e1 + (c2/c1)(λ2/λ1)^t e2 + (c3/c1)(λ3/λ1)^t e3 + … + (cn/c1)(λn/λ1)^t en

The e's are eigenvectors (structure), ordered from big to small

Less significant eigenvectors / structures go away first, one by one; more salient structures stick around

There may not be a clear eigengap - a gradient of cluster saliency

Page 65

PIC Extension: Hierarchical Clustering

PIC already converged to 8 clusters… but let's keep on iterating…

"N" still a part of the "2009" cluster…

Similar behavior also noted in matrix-matrix power methods (diffusion maps, mean-shift, multi-resolution spectral clustering)

Same dataset you've seen
Yes (it might take a while)

Page 66

Distributed / Parallel Implementations

• Distributed / parallel implementations of learning methods are necessary to support large-scale data given the direction of hardware development

• PIC, MRW, and their path folding variants have at their core sparse matrix-vector multiplications

• Sparse matrix-vector multiplication lends itself well to a distributed / parallel computing framework

• We propose to use:
• Alternatives:

Existing graph analysis tool:
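To illustrate why the core operation distributes well, a toy sketch of row-block partitioning of a sparse matrix–vector product: each block of rows, together with the broadcast vector, can be handled by a separate worker or machine. This is only an illustration of the idea, not the proposed implementation or any specific framework.

```python
# Toy sketch: partition W by row blocks; each worker needs only its rows of W
# plus the broadcast vector v. Illustration only.
import numpy as np
import scipy.sparse as sp

def partitioned_matvec(W, v, n_workers=4):
    n = W.shape[0]
    bounds = np.linspace(0, n, n_workers + 1).astype(int)
    blocks = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):   # each pass = one worker's share
        blocks.append(W[lo:hi, :] @ v)
    return np.concatenate(blocks)

W = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
v = np.random.rand(1000)
print(np.allclose(partitioned_matvec(W, v), W @ v))   # True
```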

Page 67

Adjacency Matrix vs. Similarity Matrix

• Adjacency matrix: A
• Similarity matrix: S = A + αI
• Eigenanalysis:

Ax = λx
Ax + αx = λx + αx
(A + αI)x = (λ + α)x
Sx = (λ + α)x

Same eigenvectors and the same ordering of eigenvalues!

What about the normalized versions?

Page 68

Adjacency Matrix vs. Similarity Matrix

• Normalized adjacency matrix: D⁻¹A
• Normalized similarity matrix: D̂⁻¹Â, where Â = A + αI and D̂(i,i) = Σⱼ Â(i,j)
• Eigenanalysis:

D⁻¹Ax = λx
D̂⁻¹Âx = D̂⁻¹(A + αI)x = D̂⁻¹Ax + αD̂⁻¹x

The eigenvectors are the same if the degree is the same (e.g., if every node has degree d, then D̂ = (d + α)I and D̂⁻¹Âx = ((λd + α)/(d + α))x)

Recent work on the degree-corrected Laplacian (Chaudhuri 2012) suggests that it is advantageous to tune α for clustering graphs with a skewed degree distribution, and does further analysis
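A quick numerical check of these two slides, using a random symmetric A as a stand-in: A and S = A + αI share eigenvectors (eigenvalues shift by α), while the row-normalized versions generally do not once degrees differ.

```python
# Numerical check of the adjacency-vs-similarity eigenanalysis above.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)
alpha = 2.0
S = A + alpha * np.eye(5)

wA, VA = np.linalg.eigh(A)
wS, VS = np.linalg.eigh(S)
print(np.allclose(wS, wA + alpha))              # True: eigenvalues shifted by alpha
print(np.allclose(np.abs(VA), np.abs(VS)))      # True: same eigenvectors (up to sign)

W1 = A / A.sum(axis=1, keepdims=True)           # D^-1 A
W2 = S / S.sum(axis=1, keepdims=True)           # D_hat^-1 (A + alpha I)
w1, V1 = np.linalg.eig(W1)
v = V1[:, np.argsort(-w1.real)[1]].real         # 2nd eigenvector of D^-1 A
u = W2 @ v
cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.isclose(cos, 1.0))                     # generally False: not an eigenvector of W2
```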

