
Hierarchically Modular Structure in Complex Networks

Aaron Clauset, Santa Fe Institute

3 November 2008, DIMACS / DyDAn

“Network Models of Biological and Social Contagion”

Modular Hierarchies

[Figure: grassland species network, with node types plant, herbivore, and parasite. *Thank you: Jennifer Dunne.]


The Task

How can we extract this hierarchical (multi-scale) structure from complex networks?

[Figure: network → ? → hierarchy]

One Approach

Model-based inference

1. describe how to generate hierarchies (a model)

2. “fit” model to empirical data

3. test “fitted” model

4. extract predictions + insight

5. profit!

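A Model of Hierarchy

[Figure: a dendrogram $D$ with a probability $p_r$ at each internal node $r$ (labels: "probability", "assortative modules"). The model $(D, \{p_r\})$ defines an "inhomogeneous" random graph, from which an instance is drawn.]

$\Pr(i, j \text{ connected}) = p_r$, where $r$ is the lowest common ancestor of $i$ and $j$ in $D$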
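The connection rule can be sketched in Python. This is a minimal illustration, not the author's code: the `Internal` node class, its fields, and the function names are hypothetical, and leaves are assumed to be integer vertex labels. Because every pair $(i, j)$ has exactly one lowest common ancestor, it suffices to visit each internal node $r$ and flip a $p_r$-coin for every pair with one leaf in each of its two subtrees.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass
class Internal:
    """Hypothetical internal node r of a binary dendrogram D, with probability p_r."""
    left: "Node"
    right: "Node"
    p: float


Node = Union[int, Internal]  # a leaf is an integer vertex label


def leaves(node: Node) -> list[int]:
    """All vertex labels below this node."""
    if isinstance(node, int):
        return [node]
    return leaves(node.left) + leaves(node.right)


def sample_graph(root: Node, rng: random.Random) -> set[tuple[int, int]]:
    """Draw one instance: join each pair (i, j) with probability p_r,
    where r is the lowest common ancestor of i and j."""
    edges: set[tuple[int, int]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue
        # Pairs split by this node (one leaf on each side) have it as their LCA.
        for i in leaves(node.left):
            for j in leaves(node.right):
                if rng.random() < node.p:
                    edges.add((min(i, j), max(i, j)))
        stack.extend([node.left, node.right])
    return edges


# Two dense modules of four vertices, sparsely joined at the root.
D = Internal(
    Internal(Internal(0, 1, 0.9), Internal(2, 3, 0.9), 0.9),
    Internal(Internal(4, 5, 0.9), Internal(6, 7, 0.9), 0.9),
    0.05,
)
G = sample_graph(D, random.Random(1))
```

Setting every $p_r$ to 1 yields a complete graph; setting only the root's $p_r$ to 0 leaves the two modules disconnected.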

Model Features

• explicit model = explicit assumptions

• very flexible (many parameters)

• captures structure at all scales

• arbitrary mixtures of assortative and disassortative structure

• learnable directly from data

Learning From Data

A direct approach:

• likelihood function $L = \Pr(\text{data} \mid \text{model})$ (scores the quality of a model)
• sample the good models via Markov chain Monte Carlo
• technical details in arXiv:physics/0610051
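For this model the likelihood can be written out explicitly. Each pair whose lowest common ancestor is $r$ is an independent $p_r$-coin flip, so if $E_r$ observed edges have ancestor $r$ and $L_r$, $R_r$ are the numbers of leaves in $r$'s left and right subtrees, then $L(D, \{p_r\}) = \prod_r p_r^{E_r} (1 - p_r)^{L_r R_r - E_r}$, which is maximized at $\hat{p}_r = E_r / (L_r R_r)$. The sketch below evaluates $\log L$ at those values; the node class and function names are hypothetical, and arXiv:physics/0610051 should be consulted for the exact formulation used in the talk.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Internal:
    """Hypothetical internal node of a binary dendrogram (p_r is fitted, not stored)."""
    left: "Node"
    right: "Node"


Node = Union[int, Internal]  # a leaf is an integer vertex label


def leaves(node: Node) -> list[int]:
    if isinstance(node, int):
        return [node]
    return leaves(node.left) + leaves(node.right)


def log_likelihood(root: Node, edges: set[tuple[int, int]]) -> float:
    """log L(D) with every p_r set to its maximum-likelihood value E_r / (L_r R_r).
    Edges are stored as (min, max) vertex pairs."""
    total = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue
        left, right = leaves(node.left), leaves(node.right)
        pairs = len(left) * len(right)  # L_r * R_r pairs have r as their LCA
        e_r = sum((min(i, j), max(i, j)) in edges for i in left for j in right)
        p_hat = e_r / pairs
        # Terms with p_hat in {0, 1} contribute 0 (convention 0 * log 0 = 0).
        if 0 < p_hat < 1:
            total += e_r * math.log(p_hat) + (pairs - e_r) * math.log(1 - p_hat)
        stack.extend([node.left, node.right])
    return total
```

With edges {(0, 1), (2, 3)}, the dendrogram ((0, 1), (2, 3)) reaches the maximum $\log L = 0$, while ((0, 2), (1, 3)) scores $4 \log \tfrac{1}{2}$.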

From Graph to Ensemble

• Given graph $G$
• run MCMC to equilibrium
• then, for each sampled dendrogram $D$, draw a resampled graph $G'$ from the ensemble
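The MCMC step can be sketched as a Metropolis sampler over dendrograms, each scored by its maximized likelihood. This is a deliberately simplified stand-in, not the method of arXiv:physics/0610051: its only move swaps the labels of two leaves, so the tree's shape never changes, whereas a full sampler must also rearrange subtrees. All names are hypothetical, and leaves are assumed to be labeled 0..n-1.

```python
import math
import random
from dataclasses import dataclass
from typing import Union


@dataclass
class Internal:
    """Hypothetical internal node of a binary dendrogram."""
    left: "Node"
    right: "Node"


Node = Union[int, Internal]  # a leaf is an integer vertex label in 0..n-1


def leaves(node: Node) -> list[int]:
    if isinstance(node, int):
        return [node]
    return leaves(node.left) + leaves(node.right)


def log_likelihood(root: Node, edges: set[tuple[int, int]]) -> float:
    """log L(D) with each p_r at its maximum-likelihood value E_r / (L_r R_r)."""
    total, stack = 0.0, [root]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue
        left, right = leaves(node.left), leaves(node.right)
        pairs = len(left) * len(right)
        e_r = sum((min(i, j), max(i, j)) in edges for i in left for j in right)
        if 0 < e_r < pairs:  # p_r of exactly 0 or 1 contributes nothing
            p = e_r / pairs
            total += e_r * math.log(p) + (pairs - e_r) * math.log(1 - p)
        stack.extend([node.left, node.right])
    return total


def relabel(node: Node, perm: list[int]) -> Node:
    """Copy of the dendrogram with leaf v renamed to perm[v]."""
    if isinstance(node, int):
        return perm[node]
    return Internal(relabel(node.left, perm), relabel(node.right, perm))


def metropolis_samples(root: Node, edges: set[tuple[int, int]], n: int,
                       steps: int, rng: random.Random) -> list[Node]:
    """Metropolis chain over leaf labelings; the swap proposal is symmetric,
    so a move is accepted with probability min(1, L'/L)."""
    current, current_ll = root, log_likelihood(root, edges)
    samples = []
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)
        perm = list(range(n))
        perm[a], perm[b] = b, a  # swap the labels of leaves a and b
        proposal = relabel(current, perm)
        proposal_ll = log_likelihood(proposal, edges)
        if proposal_ll >= current_ll or rng.random() < math.exp(proposal_ll - current_ll):
            current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples
```

In practice an initial burn-in portion of the chain would be discarded before the remaining dendrograms are used to draw resampled graphs.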

A test: do resampled graphs look like original?

[Figure: original graph $G$ → sampled dendrogram $D$ → resampled graph $G'$, for the grassland species network. Panels compare original and resampled graphs: (a) Degree Distribution, fraction of vertices with degree k vs. degree k; Clustering Coefficient, fraction of graphs with clustering coefficient c vs. c; (b) Distance Distribution, fraction of vertex pairs at distance d vs. distance d.]
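The resampling test compares summary statistics of $G$ with those of each $G'$. Below is a minimal sketch of the three statistics shown, for an undirected graph on vertices 0..n-1. It assumes the clustering coefficient is the global one (closed connected triples over all connected triples) and that the distance distribution is taken over connected pairs only; the slides do not state which definitions were used, and the function names are hypothetical.

```python
from collections import Counter, deque


def adjacency(n: int, edges) -> list[set[int]]:
    """Neighbor sets for an undirected graph on vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def degree_distribution(adj) -> dict[int, float]:
    """Fraction of vertices with degree k, for each observed k."""
    counts = Counter(len(nbrs) for nbrs in adj)
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def clustering_coefficient(adj) -> float:
    """Global clustering: closed connected triples / all connected triples."""
    closed = triples = 0
    for nbrs in adj:
        nb = sorted(nbrs)
        for a in range(len(nb)):
            for b in range(a + 1, len(nb)):
                triples += 1
                if nb[b] in adj[nb[a]]:
                    closed += 1
    return closed / triples if triples else 0.0


def distance_distribution(adj) -> dict[int, float]:
    """Fraction of connected vertex pairs at shortest-path distance d (via BFS)."""
    counts = Counter()
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts.update(d for t, d in dist.items() if t > s)  # count each pair once
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}
```

Computing these for $G$ and for many resampled $G'$ gives the kind of original-versus-resampled comparison plotted in the panels.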

Missing Links

A test: can model predict missing links?

Predicting is Hard

• remove $k$ edges from $G$
• how easy is it to guess a missing link?

With $n = 75$ and $m = 113$:

$p_{\text{guess}} = \dfrac{k}{\binom{n}{2} - m + k} = O(n^{-2})$ for fixed $k$, so here $p_{\text{guess}} = k/(2662 + k)$
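The constant 2662 follows from counting pairs: with $n = 75$ vertices there are $\binom{75}{2} = 2775$ vertex pairs, and after removing $k$ of the $m = 113$ edges, $2775 - 113 + k = 2662 + k$ pairs are unobserved, exactly $k$ of which are missing links. A quick check (the function name is hypothetical):

```python
from math import comb


def p_guess(n: int, m: int, k: int) -> float:
    """Chance that a uniformly random unobserved pair is one of the k removed edges."""
    unobserved = comb(n, 2) - (m - k)  # pairs absent from the observed graph
    return k / unobserved


print(comb(75, 2) - 113)  # 2662
print(p_guess(75, 113, 10))  # 10 / 2672, roughly 0.0037
```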

Predicting Missing Links

• Given an incomplete graph $G$
• run MCMC to equilibrium
• then, over the sampled $D$, compute the average $\langle p_r \rangle$ for links $(i, j) \notin G$
• predict that links with high $\langle p_r \rangle$ values are missing

Test the idea via leave-$k$-out cross-validation:

• perfect accuracy: AUC = 1
• no better than chance: AUC = 1/2
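One common way to compute the AUC in this setting is as the probability that a randomly chosen held-out (missing) link receives a higher score than a randomly chosen pair that is truly absent, counting ties as 1/2; this matches the two reference values (1 for a perfect ranking, 1/2 in expectation for chance). The sketch compares all pairs exhaustively, which is affordable for a few thousand candidates. The scores would be the averaged $\langle p_r \rangle$ values; the function name is hypothetical.

```python
def auc(missing_scores: list[float], absent_scores: list[float]) -> float:
    """Probability that a random missing link outscores a random truly absent pair,
    with ties counted as 1/2."""
    wins = 0.0
    for s in missing_scores:
        for t in absent_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(missing_scores) * len(absent_scores))


print(auc([0.9, 0.4], [0.1, 0.4, 0.2]))  # 5.5 / 6, about 0.917
```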

Missing Structure

[Figure: grassland species network. Area under the ROC curve (AUC) vs. fraction of edges observed, k/m, for pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, and hierarchical structure; curves annotated "hierarchy", "simple predictors", and "pure chance".]
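The simple predictors in the comparison score each candidate pair from local or path structure. The definitions below are the standard textbook forms and are an assumption, since the slides only name the predictors; `adj` is a hypothetical list of neighbor sets, and shortest-path scores are negated so that closer pairs rank higher.

```python
from collections import deque


def common_neighbors(adj: list[set[int]], i: int, j: int) -> int:
    """Number of neighbors shared by i and j."""
    return len(adj[i] & adj[j])


def jaccard(adj: list[set[int]], i: int, j: int) -> float:
    """Shared neighbors as a fraction of all neighbors of i or j."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0


def degree_product(adj: list[set[int]], i: int, j: int) -> int:
    """Product of the degrees of i and j (preferential-attachment style)."""
    return len(adj[i]) * len(adj[j])


def shortest_path_score(adj: list[set[int]], i: int, j: int) -> float:
    """Negative BFS distance from i to j; -inf if j is unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return -dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return float("-inf")


# Path 2-1-0-3: vertex 0 has neighbors {1, 3}.
adj = [{1, 3}, {0, 2}, {1}, {0}]
```

Any of these scores can be fed to the same AUC calculation as $\langle p_r \rangle$, which is what makes the curves directly comparable.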

Other Networks

[Figure: AUC vs. fraction of edges observed for (a) a terrorist association network and (b) the T. pallidum metabolic network, comparing pure chance, common neighbors, Jaccard coefficient, degree product, shortest paths, and hierarchical structure.]

Summary

• Many real networks are hierarchically modular
• Hierarchies can model multi-scale structure, generalize a single network, and predict missing links
• Model-based inference is very powerful

Acknowledgments: C. Moore, M.E.J. Newman, C.H. Wiggins, and C.R. Shalizi

Fin