Community Detection

Laks V.S. Lakshmanan (based on: M. E. J. Newman & M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69, 026113 (2004); M. E. J. Newman, Fast algorithm for detecting community structure in networks, Physical Review E 69, 066133 (2004))

The Problem

Can we partition the network into groups such that the inter-group edges are sparse while the intra-group edges are dense?

Why is it interesting/useful?
◦ Understanding community structure is a means to understanding network structure.
◦ Graph partitioning is a similar problem: a graph of processes, with edges = communication; assign sub-graphs to processors to minimize inter-processor communication and balance processor load. (NP-hard in general.)
◦ Difference from graph partitioning: there, the number and sizes of the groups are usually fixed in advance; in community detection they are not known beforehand.
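To make "sparse between groups, dense within groups" concrete, here is a minimal Python sketch (not from the slides) that scores a given partition by its intra-group and inter-group edge densities. The function name `intra_inter_density` and the node-to-group-label dictionary are our own assumptions.

```python
from itertools import combinations


def intra_inter_density(nodes, edges, community):
    """Return (intra-group density, inter-group density) of a partition.

    community maps each node to a group label (assumed input format).
    Density = edges present / node pairs possible (within or across groups).
    """
    intra_pairs = inter_pairs = 0
    for u, v in combinations(nodes, 2):
        if community[u] == community[v]:
            intra_pairs += 1
        else:
            inter_pairs += 1
    intra_edges = sum(1 for u, v in edges if community[u] == community[v])
    inter_edges = len(edges) - intra_edges
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Two triangles joined by a single edge (2, 3)
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
bad = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}
print(intra_inter_density(nodes, edges, good))  # intra 1.0, inter 1/9
print(intra_inter_density(nodes, edges, bad))
```

The "good" split puts every edge but the bridge inside a group; the "bad" split scatters the triangles, so its intra density drops and its inter density rises.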

An Example with Three Communities

A Hierarchical Clustering Approach

1. Define a notion of similarity or affinity $W_{ij}$ between nodes $i$ and $j$.
2. E.g., $W_{ij}$ := #node-disjoint paths between $i$ and $j$.
3. $W_{ij}$ := #edge-disjoint paths between $i$ and $j$.
4. $W_{ij}$ := weighted sum of all paths between $i$ and $j$, with longer paths weighted down, e.g., Katz!
5. Qn: how can we compute #2 and #3 fast?
6. (Efficient algorithms for Katz have been developed.)
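A minimal plain-Python sketch of the Katz-style similarity in item 4. Assumptions (ours): an undirected, unweighted graph on nodes 0..n-1, a damping factor `beta`, and truncation of the series after `max_len` terms. Note that Katz counts walks (paths that may revisit nodes); the full series $W_{ij} = \sum_{\ell \ge 1} \beta^\ell (A^\ell)_{ij}$ converges only when `beta` is below $1/\lambda_{\max}(A)$.

```python
def katz_similarity(n, edges, beta=0.1, max_len=10):
    """Truncated Katz score W[i][j] = sum_{l=1..max_len} beta**l * (A**l)[i][j].

    (A**l)[i][j] counts walks of length l from i to j, so longer walks are
    weighted down geometrically. Assumes an undirected, unweighted graph on
    nodes 0..n-1; beta and max_len are illustrative defaults.
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    walks = [row[:] for row in adj]  # A**1
    score = [[0.0] * n for _ in range(n)]
    weight = beta
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                score[i][j] += weight * walks[i][j]
        # Next power of A: walks <- walks @ A
        walks = [[sum(walks[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        weight *= beta
    return score


# Path 0 - 1 - 2: neighbours score higher than nodes two hops apart
W = katz_similarity(3, [(0, 1), (1, 2)], beta=0.1, max_len=2)
print(W[0][1], W[0][2])
```

The dense matrix products keep the sketch short; for large graphs one would use sparse matrices or solve $(I - \beta A)^{-1} - I$ directly.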

Community detection via hierarchical clustering

Compute similarities for all pairs of nodes.

Starting from the nodes alone (no edges), repeatedly add edges between the pairs with the greatest similarity.

This leads to a tree (called a dendrogram).

A slice through the dendrogram represents a clustering, i.e., a community structure.
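A sketch of the agglomerative procedure above, assuming the similarities are already computed and passed in as a dictionary `{(i, j): W_ij}` (a hypothetical input format; the values in the usage lines are made up). A union-find structure tracks clusters: each added edge that joins two clusters is one internal node of the dendrogram, and stopping when `k` clusters remain corresponds to one slice.

```python
def hierarchical_communities(n, similarity, k):
    """Agglomerate nodes 0..n-1 by adding pairs in decreasing similarity order.

    similarity: dict {(i, j): W_ij} (assumed input format); pairs missing from
    it are never joined. Returns (merges, groups): merges lists each join of
    two clusters bottom-up (the dendrogram's internal nodes); groups are the
    clusters when k remain, i.e. one slice through the dendrogram.
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges, clusters = [], n
    for (i, j), w in sorted(similarity.items(), key=lambda item: -item[1]):
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two clusters: one dendrogram merge
            parent[ri] = rj
            merges.append((i, j, w))
            clusters -= 1
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return merges, sorted(groups.values())


sim = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.85,
       (3, 4): 0.9, (4, 5): 0.8, (3, 5): 0.7, (2, 3): 0.3}
merges, groups = hierarchical_communities(6, sim, k=2)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

Ties in similarity are broken by the (stable) input order, which mirrors the arbitrary tie-breaking of the method.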

Dendrogram example

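Limitations of HC approach

It "misplaces" nodes in the periphery: a node attached to the rest of the network by a single edge has low similarity to every other node, so it tends to be added late and left out of its natural community. E.g.:

[Figure: small example graph on nodes 1 to 5]

Which community should 5 belong to?

Alternative approach based on "edge betweenness".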

Key Intuition

An inter-community edge has a higher "betweenness" than an intra-community edge, i.e., more shortest paths between pairs of nodes pass through it.

Start with G. Repeatedly remove the edge with the highest betweenness until <some stopping criterion>.

Communities = resulting connected components.

Basic Algorithm

repeat {
◦ Calculate betweenness of all edges;
◦ Remove the one with highest betweenness, breaking ties arbitrarily;
} until no edges are left.

Remarks:
◦ Which betweenness score?
◦ Calculate upfront and reuse, or recalculate?
◦ Can we incrementally recalculate after each edge removal?
◦ Related algorithms for node betweenness by Newman and Brandes.
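A compact Python sketch of the basic algorithm, recalculating betweenness after every removal. Edge betweenness is computed by BFS from every node with Brandes-style accumulation of shortest-path counts; the function names, the adjacency-set representation, and recording a division each time the component count grows are our own choices, not the slides'.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge, via BFS from every node.

    adj maps node -> set of neighbours (undirected). Each unordered pair of
    nodes is counted once from each end, so scores are doubled; the ranking of
    edges is unaffected.
    """
    bet = {}
    for s in adj:
        # BFS from s: distance and number of geodesics (sigma) to each node
        dist, sigma, order = {s: 0}, {s: 1}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in adj[v]:
                if u not in dist:
                    dist[u], sigma[u] = dist[v] + 1, 0
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
        # Push path credit back towards s, farthest nodes first
        delta = dict.fromkeys(order, 0.0)
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:  # u precedes v on geodesics from s
                    share = sigma[u] / sigma[v] * (1 + delta[v])
                    edge = frozenset((u, v))
                    bet[edge] = bet.get(edge, 0.0) + share
                    delta[u] += share
    return bet


def components(adj):
    """Connected components as a sorted list of sorted node lists."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, stack = [], [s]
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(sorted(comp))
    return sorted(comps)


def girvan_newman(edges):
    """Remove the highest-betweenness edge repeatedly, recalculating each time.

    Returns the successive divisions (one per increase in the component count).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    divisions = [components(adj)]
    while any(adj.values()):
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)  # ties broken arbitrarily
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(divisions[-1]):
            divisions.append(comps)
    return divisions


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
for division in girvan_newman(edges):
    print(division)
```

On two triangles joined by (2, 3), the bridge carries every geodesic between the triangles, so it is removed first and the second division is the two triangles.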

A Real Example (Zachary’s Karate Club)

With recalculation of betweenness.

Without recalculation of betweenness.

Scalability Issues

Edge betweenness for all edges can be computed in $O(mn)$ time ($m$ = #edges, $n$ = #nodes) [Newman 2001]; details soon.

Recalculation makes the algorithm $O(m^2 n)$, so it is not feasible for large networks.

Computing edge betweenness: An Example

[Figure: example graph on nodes a, b, c, d, e, f, g; breadth-first search from target node g]

Breadth-first search: a means for doing many things.

Compute #geodesics from every node to g. BFS labels each node with its distance d from g and its weight w = #geodesics to g:
◦ g: d=0, w=1;
◦ two nodes at d=1, each with w=1;
◦ three nodes at d=2, each with w=2;
◦ one node at d=3, with w=4.

We now have all the info we need for edge betweenness.

Note: a and f are like leaves: no geodesic to g from other nodes passes through them.

Edge scores w.r.t. target g, working from the leaves towards g:
◦ 2/4 on each of two edges and 1/2 on each of two edges (the edges leaving a and f);
◦ ½(1+2/4) on each of four edges (from the other two d=2 nodes to the d=1 nodes);
◦ 1/1[1 + ½(1+2/4) + ½(1+2/4) + ½] on an edge from a d=1 node into g.
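Evaluating the expressions from the example (plain arithmetic):

$$\frac{2}{4} = \frac{1}{2}, \qquad \frac{1}{2}\left(1 + \frac{2}{4}\right) = \frac{3}{4}, \qquad \frac{1}{1}\left[1 + \frac{3}{4} + \frac{3}{4} + \frac{1}{2}\right] = 3.$$

As a sanity check: each of the six nodes other than g sends one unit of geodesic "flow" towards g, so the scores on the edges into g sum to 6. Since g has two neighbours (the two d=1 nodes), the other edge into g also scores 6 - 3 = 3.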

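EB Computation summary

For any one target node $t$, compute the weights of nodes by BFS: $w_i$ = #geodesics from $i$ to the target.

Suppose $i \to j \to$ rest of $G$ (containing the target), where $j$ is a neighbour of $i$ one step closer to the target.

Then, intuitively, a fraction $w_j / w_i$ of the geodesics from $i$ to the target node go through $j$.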

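EB Computation summary (contd.)

For any edge $(i, j)$ with $i$ further from the target $t$ than $j$ ($w$ = #geodesics to the target, $d$ = BFS distance from the target):

$$\mathrm{bet}_t(i,j) = \frac{w_j}{w_i}\Big(1 + \sum_{k:\ (k,i) \in E,\ d_k = d_i + 1} \mathrm{bet}_t(k,i)\Big)$$

The above is w.r.t. a specific target node $t$. The overall bet of any edge is the sum of its bet w.r.t. every node treated as the target node: $\mathrm{bet}(i,j) = \sum_t \mathrm{bet}_t(i,j)$.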
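A Python sketch of this rule, using exact fractions so the scores match the hand computation. The node names `u1`, `u2`, `v1`, `v2`, `v3`, `x` are hypothetical: they describe a graph consistent with the BFS labels of the worked example (two nodes at distance 1, three at distance 2, one at distance 3), not the slide's exact lettering.

```python
from collections import deque
from fractions import Fraction


def target_edge_scores(adj, target):
    """Edge scores w.r.t. one target t, using the rule on the slide:
    bet_t(i, j) = (w_j / w_i) * (1 + sum of bet_t(k, i) over children k of i),
    where i is one step further from t than j and w = #geodesics to t.
    Edges that lie on no geodesic to t (e.g. between equidistant nodes) get no entry.
    """
    dist, w, order = {target: 0}, {target: 1}, []
    queue = deque([target])
    while queue:  # BFS from the target gives d and w
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in dist:
                dist[u], w[u] = dist[v] + 1, 0
                queue.append(u)
            if dist[u] == dist[v] + 1:
                w[u] += w[v]
    inflow = {v: Fraction(0) for v in order}  # sum of scores on edges from children
    scores = {}
    for i in reversed(order):  # farthest nodes first, so children are done
        for j in adj[i]:
            if dist[j] == dist[i] - 1:  # j is one step closer to the target
                s = Fraction(w[j], w[i]) * (1 + inflow[i])
                scores[(i, j)] = s
                inflow[j] += s
    return dist, w, scores


def edge_betweenness(adj):
    """Overall bet of each edge = sum of its scores with every node as target."""
    total = {}
    for t in adj:
        _, _, scores = target_edge_scores(adj, t)
        for (i, j), s in scores.items():
            edge = frozenset((i, j))
            total[edge] = total.get(edge, 0) + s
    return total


# Hypothetical labels: g (d=0); u1, u2 (d=1); v1, v2, v3 (d=2); x (d=3) next to v1, v2
edges = [("g", "u1"), ("g", "u2"), ("x", "v1"), ("x", "v2")]
edges += [(u, v) for u in ("u1", "u2") for v in ("v1", "v2", "v3")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
dist, w, scores = target_edge_scores(adj, "g")
print(w["x"], scores[("x", "v1")], scores[("v1", "u1")], scores[("u1", "g")])  # 4 1/2 3/4 3
```

For an undirected graph, summing over every target counts each pair of nodes twice (once with each endpoint as target); this doubles all scores but does not change which edge is highest.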

EB computation – complexity analysis

For any one target node, BFS gives the bet of every edge w.r.t. that target node in $O(m)$ time.

Doing so for every node treated as the target node ⇒ $O(mn)$ time for the final betweenness score of every edge.

Quite elegant, but recalculation bumps up the complexity to $O(m^2 n)$.

Need more scalable approaches for CD.
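How the bounds combine (one full recalculation per edge removal, at most $m$ removals):

$$\underbrace{O(m)}_{\text{one BFS}} \times \underbrace{n}_{\text{target nodes}} = O(mn) \ \text{per full recalculation}, \qquad \underbrace{m}_{\text{edge removals}} \times O(mn) = O(m^2 n),$$

which is $O(n^3)$ for a sparse graph with $m = O(n)$.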

On scaling up the CD algorithm

Determine intelligently which edges need their bet recalculated when an edge is removed.
◦ When edge $e$ is removed, $\mathrm{bet}(e')$ needs to be recalculated only if $e'$ is in the same connected component as $e$ (before its removal). For a very large component, this doesn't prune much.
◦ Perhaps it's only important to determine the edge with the next highest bet.
◦ Can we maintain enough "state" so that when $e$ is removed, we can recalculate incrementally, i.e., not from scratch?

Point to ponder!
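A Python sketch of the first pruning idea only: after removing an edge, recompute betweenness just for the component(s) that contained it and keep the old scores everywhere else. This is valid because an edge's shortest-path betweenness depends only on node pairs inside its own component. The function names are hypothetical, and the sketch does not attempt the incremental idea in the last bullet.

```python
from collections import deque


def edge_betweenness(adj, nodes):
    """Shortest-path edge betweenness using only sources in `nodes`.

    `nodes` should be a union of whole connected components of adj, so the
    scores of edges inside it are exact (each pair counted from both ends).
    """
    bet = {}
    for s in nodes:
        dist, sigma, order, queue = {s: 0}, {s: 1}, [], deque([s])
        while queue:  # BFS: distances and geodesic counts from s
            v = queue.popleft()
            order.append(v)
            for u in adj[v]:
                if u not in dist:
                    dist[u], sigma[u] = dist[v] + 1, 0
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
        delta = dict.fromkeys(order, 0.0)
        for v in reversed(order):  # push credit back towards s
            for u in adj[v]:
                if dist[u] == dist[v] - 1:
                    share = sigma[u] / sigma[v] * (1 + delta[v])
                    edge = frozenset((u, v))
                    bet[edge] = bet.get(edge, 0.0) + share
                    delta[u] += share
    return bet


def component(adj, s):
    """Set of nodes reachable from s."""
    seen, stack = {s}, [s]
    while stack:
        for u in adj[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def remove_and_update(adj, bet, u, v):
    """Remove edge (u, v); recompute bet only in the component(s) that held it."""
    adj[u].discard(v)
    adj[v].discard(u)
    affected = component(adj, u) | component(adj, v)
    for edge in [e for e in bet if e & affected]:  # stale scores, incl. (u, v)
        del bet[edge]
    bet.update(edge_betweenness(adj, affected))  # other components keep old scores
    return affected


# Two triangles joined by (2, 3), plus a separate path 6 - 7 - 8
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (6, 7), (7, 8)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
bet = edge_betweenness(adj, list(adj))
untouched = bet[frozenset((6, 7))]
affected = remove_and_update(adj, bet, 2, 3)
print(sorted(affected))  # [0, 1, 2, 3, 4, 5]
```

As the slide warns, when almost all nodes sit in one giant component, `affected` is nearly the whole graph and the saving is small.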

Closing Remarks 1/2

Newman also proposed other bases for defining edge betweenness:
◦ Electrical current flow through the edge, where every edge is viewed as a unit resistance and we consider all source-sink pairs.
◦ Random walks.

Both are less effective and more expensive than geodesics (see paper for details).

What about the directed and weighted cases?

Closing Remarks 2/2

Goodness metric of a community division (modularity); helpful when we don't know the ground truth:

$$Q = \sum_i \left(e_{ii} - a_i^2\right)$$

where $E = (e_{ij})$ is the $k \times k$ matrix of a division into $k$ communities: $e_{ij}$ = fraction of edges linking community $i$ to community $j$, and $a_i = \sum_j e_{ij}$. $Q$ measures the fraction of intra-community edges in excess of what is expected by chance (assuming edges are placed at random, without regard to community structure).

See paper for details of experimental results.

It turns out that the study of influence/information propagation can suggest new ways of detecting communities: we will revisit this issue after we study influence propagation.
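A minimal Python sketch of the modularity formula $Q = \sum_i (e_{ii} - a_i^2)$ for an undirected, unweighted graph. Following the usual convention (our assumption here), an edge between two communities contributes half to $e_{ij}$ and half to $e_{ji}$, so $E$ is symmetric and its entries sum to 1. The function name and the node-to-label dictionary are hypothetical.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i**2) for an undirected, unweighted graph.

    e[i][j] = fraction of edges linking community i to community j; an edge
    between two communities is split half to e[i][j] and half to e[j][i], so E is
    symmetric and its entries sum to 1. a_i = sum_j e[i][j].
    """
    m = len(edges)
    labels = sorted(set(community.values()))
    index = {c: k for k, c in enumerate(labels)}
    k = len(labels)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(k))


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
whole = {n: "A" for n in range(6)}
print(modularity(edges, split))  # about 0.357 (= 5/14)
print(modularity(edges, whole))  # about 0: one community is no better than chance
```

For the two-triangle split, $e_{AA} = e_{BB} = 3/7$ and $a_A = a_B = 1/2$, so $Q = 2(3/7 - 1/4) = 5/14$.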

Recommended Reading

J. Ruan and W. Zhang. An Efficient Spectral Algorithm for Network Community Discovery and Its Applications to Biological and Social Networks. ICDM 2007.

M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences (USA) 103, 8577–8582 (2006). arXiv:physics/0602124.

J. Leskovec, K. J. Lang, and M. W. Mahoney. Empirical Comparison of Algorithms for Network Community Detection. WWW 2010.

M. E. J. Newman. Communities, modules and large-scale structure in networks. Nature Physics 8, 25–31 (2012). doi:10.1038/nphys2162.

