
Complex Networks
Tina Eliassi-Rad
Lawrence Livermore National Laboratory & Rutgers University

7/22/2010, Mathcamp '10


http://eliassi.org


[Figure: examples of complex networks — Internet, Terrorism, Reality Mining, Food Web, Enron Emails, Map of Science, HP Emails, Contagion of TB, NY State Power Grid, Protein Interactions, Friendship Network]


Common Patterns
• Scale-free
▫ Eigen exponent
• Small-world
• Triangle law
▫ k friends → ~k^1.6 triangles
• Small diameter
• Social influence
▫ Pr(A.class == B.class | A → B) > Pr(A.class == B.class)
• Social selection
▫ Pr(A → B | A.class == B.class) > Pr(A → B)


Problem #1

• Triangles are expensive to compute
▫ "Friends of friends are friends"
▫ Naively a 3-way join
• Q: Can we do this quickly?
• A: Yes!
▫ #triangles = (1/6) Σi λi^3, where λi are the eigenvalues of the adjacency matrix
▫ Because of skewness, we only need the top few eigenvalues
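A minimal sketch of this eigenvalue trick in Python (assuming NumPy/SciPy; the toy graph and the number of eigenvalues kept are illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

def triangle_count_exact(A):
    """Exact triangle count: trace(A^3) / 6 for a symmetric 0/1 adjacency matrix."""
    return int(round(np.trace(A @ A @ A) / 6))

def triangle_count_spectral(A_sparse, top_k=5):
    """Approximate #triangles = (1/6) * sum_i lambda_i^3 using only the top_k
    largest-magnitude eigenvalues; a skewed spectrum makes a few enough."""
    k = min(top_k, A_sparse.shape[0] - 1)
    vals = eigsh(A_sparse.asfptype(), k=k, return_eigenvectors=False)
    return (vals ** 3).sum() / 6.0

# Toy example: two triangles sharing the edge (1, 2).
A = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]:
    A[u, v] = A[v, u] = 1

print(triangle_count_exact(A))                    # 2
print(triangle_count_spectral(csr_matrix(A), 3))  # close to 2
```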


Communities
• Clusters, groups, modules
• Need to
▫ Formalize the notion of a community (hard vs. soft)
▫ Design an algorithm that will find sets of nodes that are "good" communities
▫ Formalize the evaluation of community structure


Clustering Objective Function: Conductance
• A good cluster S has
▫ Many edges internally
▫ Few edges pointing outside (to the rest of the graph, S')
• Simplest objective function: conductance
▫ Φ(S) = #edges outside S / #edges inside S
▫ Small conductance corresponds to good clusters
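A minimal sketch of this conductance score, using the slide's simplified "#edges outside / #edges inside" form (NetworkX and the barbell graph are just for illustration):

```python
import networkx as nx

def conductance(G, S):
    """Simplified conductance from the slide: edges leaving S divided by edges inside S."""
    S = set(S)
    inside = sum(1 for u, v in G.edges() if u in S and v in S)
    outside = sum(1 for u, v in G.edges() if (u in S) != (v in S))
    return outside / inside if inside > 0 else float("inf")

# Toy example: two 4-cliques joined by a single edge.
G = nx.barbell_graph(4, 0)
print(conductance(G, {0, 1, 2, 3}))  # 1/6 -- a good cluster
print(conductance(G, {2, 3, 4, 5}))  # 8/3 -- a bad cluster
```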


Network Community Profile (NCP) [Leskovec et al., WWW '08]
• Φ(k) = the score of the best cluster of size k
[Figure: NCP plot of log Φ(k) versus community size log k, with example clusters of size k = 5, 7, and 10 marked]
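A brute-force sketch of the NCP for a tiny graph, repeating the simplified conductance from the previous sketch; exhaustively checking every subset of each size is only feasible on toy examples:

```python
from itertools import combinations
import networkx as nx

def conductance(G, S):
    """Simplified conductance from the earlier slide."""
    S = set(S)
    inside = sum(1 for u, v in G.edges() if u in S and v in S)
    outside = sum(1 for u, v in G.edges() if (u in S) != (v in S))
    return outside / inside if inside > 0 else float("inf")

def ncp(G):
    """Brute-force NCP: best (lowest) conductance over every node subset of size k."""
    nodes = list(G.nodes())
    return {k: min(conductance(G, S) for S in combinations(nodes, k))
            for k in range(2, len(nodes))}

G = nx.barbell_graph(4, 0)            # two 4-cliques joined by one edge
for k, phi in sorted(ncp(G).items()):
    print(k, phi)                     # the dip at k = 4 marks the natural community size
```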


Clustering Objective Function: Modularity
• m = number of edges in the graph
• Avw = 1 if v → w; 0 otherwise
• kv = degree of vertex v
• δ(i, j) = 1 if i ≡ j; 0 otherwise
• Maximizes modularity, Q = (1/2m) Σvw [Avw − kvkw/(2m)] δ(cv, cw), where cv is the community of vertex v
▫ Fraction of all edges within communities minus the expected value of the same quantity in a network where the vertices have the same degrees but edges are placed at random
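A minimal sketch of computing Q for a given partition (the dense-matrix form and the toy graph are illustrative):

```python
import numpy as np

def modularity(A, communities):
    """Newman modularity Q = (1/2m) * sum_vw [A_vw - k_v k_w / (2m)] * delta(c_v, c_w)
    for an undirected graph given as a symmetric 0/1 adjacency matrix."""
    k = A.sum(axis=1)                               # degrees
    two_m = A.sum()                                 # 2m for an undirected graph
    c = np.asarray(communities)
    same = (c[:, None] == c[None, :])               # delta(c_v, c_w)
    expected = np.outer(k, k) / two_m               # k_v k_w / (2m)
    return ((A - expected) * same).sum() / two_m

# Toy example: two triangles joined by a single edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))   # ~0.357: a good split
print(modularity(A, [0, 1, 0, 1, 0, 1]))   # negative: a bad split
```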


Problem #2

• Q: Is maximizing modularity similar to local spectral partitioning?
• Notes:
▫ In spectral methods, eigenvectors are based on the unnormalized Laplacian of the graph: L = D − A
  D = degree matrix of the graph
  A = adjacency matrix of the graph
▫ Look into the Fiedler vector
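A minimal sketch of the spectral side of this question: build L = D − A, take the Fiedler vector, and split nodes by its sign (the toy graph is illustrative):

```python
import numpy as np

def fiedler_partition(A):
    """Spectral bisection with the unnormalized Laplacian L = D - A:
    split nodes by the sign of the Fiedler vector (the eigenvector of the
    second-smallest eigenvalue of L)."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]             # second-smallest eigenvalue's eigenvector
    return fiedler, fiedler >= 0

# Toy example: two triangles joined by a single edge (same graph as above).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
vec, side = fiedler_partition(A)
print(side)   # nodes {0, 1, 2} land on one side and {3, 4, 5} on the other
```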


Clustering Objective Function: Compression
• Organize into few, homogeneous communities
[Figure: adjacency matrix with rows and columns permuted into row groups and column groups, yielding a few homogeneous blocks]
• Good Clustering
  1. Similar nodes are grouped together
  2. As few groups as necessary
• Good Clustering implies Good Compression (a few, homogeneous blocks)


Total Encoding Cost Objective Function
• Partition the n × m adjacency matrix into k row groups (sizes n1, …, nk) and ℓ column groups (sizes m1, …, mℓ); the example uses k = 3 row groups and ℓ = 3 column groups
• pi,j = ei,j / (ni mj) = density of ones (edges) in block (i, j)
• Code cost = Σi,j ni mj H(pi,j) bits total (block size × entropy)
• Description cost = transmit #partitions + transmit each row-partition and column-partition description + transmit #edges ei,j per block
• Total encoding cost = code cost + description cost
[Figure: 3 × 3 block view of the adjacency matrix with block densities p1,1, …, p3,3]


Total Encoding Cost Objective Function
• pi,j = ei,j / (ni mj) = density of ones (edges) in block (i, j)
• Code cost = Σi,j ni mj H(pi,j) bits total (block size × entropy)
• Description cost ≈ log k + log ℓ + Σi,j ⌈log(ni mj)⌉ + n H(n1/n, …, nk/n) + m H(m1/m, …, mℓ/m)
• Total cost for the n × m adjacency matrix = code cost + description cost
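A minimal sketch of this cost for a given row/column partition of a 0/1 matrix; the exact form of the description cost varies across write-ups, so the simple per-block and membership terms below are illustrative rather than the slide's exact formula:

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits for a block density p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def group_entropy(sizes):
    """H(n1/n, ..., nk/n) in bits for group sizes."""
    q = np.asarray(sizes, dtype=float)
    q = q / q.sum()
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def total_encoding_cost(A, row_groups, col_groups):
    """Code cost sum_ij n_i m_j H(p_ij) plus a simple description cost
    (number of groups, group memberships, and per-block edge counts)."""
    A = np.asarray(A)
    n, m = A.shape
    row_groups, col_groups = np.asarray(row_groups), np.asarray(col_groups)
    k, l = row_groups.max() + 1, col_groups.max() + 1
    rows = [np.where(row_groups == i)[0] for i in range(k)]
    cols = [np.where(col_groups == j)[0] for j in range(l)]

    code = desc = 0.0
    for ri in rows:
        for cj in cols:
            block = A[np.ix_(ri, cj)]
            if block.size:
                code += block.size * binary_entropy(block.mean())  # n_i m_j H(p_ij)
                desc += np.ceil(np.log2(block.size + 1))           # transmit e_ij
    desc += (np.log2(k) if k > 1 else 0) + (np.log2(l) if l > 1 else 0)  # #partitions
    desc += n * group_entropy([len(r) for r in rows])   # row memberships
    desc += m * group_entropy([len(c) for c in cols])   # column memberships
    return code + desc

# Toy block-diagonal matrix: the block-aware grouping costs fewer bits.
A = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])
print(total_encoding_cost(A, [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]))  # small
print(total_encoding_cost(A, [0] * 6, [0] * 6))                        # larger
```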


Total Encoding Cost: cost vs. # of clusters
[Figure: total bit cost as a function of the number of row groups k and column groups ℓ, with the example configuration of k = 3 row groups and ℓ = 3 column groups marked]


Problem #3

• Q: How do you find the best partitioning of a graph based on compression?

• Notes:

▫ Requires a couple of lemmas that rely on

Concavity of entropy

Non-negativity of the KL-divergence
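A quick numeric check of the two facts these lemmas rely on (the example distributions are arbitrary):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def kl(p, q):
    """KL-divergence D(p || q) in bits (assumes q > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

p, q = np.array([0.7, 0.2, 0.1]), np.array([0.3, 0.3, 0.4])

# Non-negativity of the KL-divergence: D(p || q) >= 0, with equality iff p == q.
print(kl(p, q) >= 0, kl(p, p) == 0)

# Concavity of entropy: H of a mixture is at least the mixture of the H's.
lam = 0.4
print(entropy(lam * p + (1 - lam) * q) >= lam * entropy(p) + (1 - lam) * entropy(q))
```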


Evaluation based on Link Prediction

• A good factorization of a graph's connectivity structure should accurately predict links between nodes based on their respective communities

• P(s→t | s, t, cs, ct), where cs and ct are the communities of s and t

• Evaluate effectiveness by
  1. Randomly holding out a number of links
  2. Building a model
  3. Using the learnt model to predict the held-out links
  4. Measuring performance with area under the ROC curve (AUC)
[Figure: adjacency matrix with a subset of entries held out]
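A minimal sketch of this hold-out evaluation; the synthetic two-community graph and the simple block-density scorer stand in for whatever community model is being evaluated:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy directed graph: two communities of 20 nodes, dense inside, sparse across.
n, comm = 40, np.repeat([0, 1], 20)
prob = np.where(comm[:, None] == comm[None, :], 0.3, 0.02)
A = (rng.random((n, n)) < prob).astype(int)
np.fill_diagonal(A, 0)

# 1. Randomly hold out some observed links (and sample non-links as negatives).
links = np.argwhere(A == 1)
nonlinks = np.argwhere((A == 0) & ~np.eye(n, dtype=bool))
held_pos = links[rng.choice(len(links), 50, replace=False)]
held_neg = nonlinks[rng.choice(len(nonlinks), 50, replace=False)]
A_train = A.copy()
A_train[held_pos[:, 0], held_pos[:, 1]] = 0

# 2. "Model": block densities P(s->t | c_s, c_t) estimated from the training graph.
block = np.zeros((2, 2))
for a in range(2):
    for b in range(2):
        mask = (comm[:, None] == a) & (comm[None, :] == b)
        block[a, b] = A_train[mask].mean()

# 3.-4. Score the held-out pairs and measure AUC.
pairs = np.vstack([held_pos, held_neg])
y_true = np.r_[np.ones(len(held_pos)), np.zeros(len(held_neg))]
y_score = block[comm[pairs[:, 0]], comm[pairs[:, 1]]]
print("AUC:", roc_auc_score(y_true, y_score))
```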

Evaluation based on Variation of Information [Karrer, Levina, and Newman, Phys. Rev. E 2008]
• Perturb the graph by randomly reassigning a number of its links
▫ Rewiring parameter c ∈ [0,1] determines the fraction of links rewired
▫ Links are rewired in a way that preserves the expected degree of each node in the graph
• C = communities discovered on the original graph, where c = 0
• C' = communities discovered on the perturbed graphs, where c ≠ 0
• Variation of Information, Δ(C, C') = H(C|C') + H(C'|C)
▫ H(C'|C) measures the information needed to describe C' given C
▫ Δ(C, C') ∈ [0, log(N)] treats each assignment as a message
▫ It is a symmetric entropy-based measure of the distance between these messages
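A minimal sketch of Δ(C, C') for two label vectors (computed in nats; the example partitions are arbitrary):

```python
import numpy as np
from collections import Counter

def variation_of_information(C1, C2):
    """VI(C, C') = H(C|C') + H(C'|C), for two partitions given as per-node labels."""
    n = len(C1)
    p1, p2 = Counter(C1), Counter(C2)
    joint = Counter(zip(C1, C2))
    vi = 0.0
    for (a, b), nab in joint.items():
        p_ab = nab / n
        # p(a,b) * [log(p(a)/p(a,b)) + log(p(b)/p(a,b))]
        vi += p_ab * (np.log(p1[a] / n / p_ab) + np.log(p2[b] / n / p_ab))
    return vi

C  = [0, 0, 0, 1, 1, 1]          # communities on the original graph
Cp = [0, 0, 1, 1, 1, 1]          # communities on a perturbed graph
print(variation_of_information(C, C))    # 0.0: identical partitions
print(variation_of_information(C, Cp))   # > 0, bounded above by log(N)
```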


Summary: Clustering Objective Functions for Complex Networks
• Conductance
▫ J. Leskovec, K. J. Lang, M. W. Mahoney: Empirical comparison of algorithms for network community detection. WWW 2010: 631-640
• Modularity
▫ A. Clauset, M. E. J. Newman, C. Moore: Finding community structure in very large networks. Phys. Rev. E 70: 066111 (2004)
• Total encoding cost
▫ D. Chakrabarti, S. Papadimitriou, D. S. Modha, C. Faloutsos: Fully automatic cross-associations. KDD 2004: 79-88
• Maximum likelihood
▫ D. M. Blei, A. Y. Ng, M. I. Jordan: Latent Dirichlet Allocation. JMLR 3: 993-1022 (2003)
▫ E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing: Mixed Membership Stochastic Blockmodels. JMLR 9: 1981-2014 (2008)
▫ K. Henderson, T. Eliassi-Rad, S. Papadimitriou, C. Faloutsos: HCDF: A Hybrid Community Discovery Framework. SDM 2010: 754-765
• Clique percolation
▫ G. Palla, I. Derenyi, I. Farkas, T. Vicsek: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435: 814 (2005)

• Many more…

