Fast Counting of triangles in large networks: Algorithms and laws

Charalampos (Babis) Tsourakakis School of Computer Science Carnegie Mellon University http://www.cs.cmu.edu/~ctsourak. Fast Counting of triangles in large networks: Algorithms and laws. RPI Theory Seminar, 24 November 2008. Counting Triangles. - PowerPoint PPT Presentation
Page 1: Fast Counting of triangles in large networks:  Algorithms and laws

FAST COUNTING OF TRIANGLES IN LARGE NETWORKS: ALGORITHMS AND LAWS

RPI Theory Seminar, 24 November 2008

Charalampos (Babis) Tsourakakis, School of Computer Science, Carnegie Mellon University

http://www.cs.cmu.edu/~ctsourak

Page 2: Fast Counting of triangles in large networks:  Algorithms and laws

Counting Triangles

Given an undirected, simple graph G(V,E), a triangle is a set of 3 vertices such that any two of them are connected by an edge of the graph.

Related Problems
a) Decide if a graph is triangle-free.
b) Count the total number of triangles δ(G).
c) Count the number of triangles δ(v) that each vertex v participates in.
d) List the triangles that each vertex v participates in.

Our focus

$$\delta(v) = |\{(u,w) \in E : (v,u) \in E,\ (v,w) \in E\}|$$
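The four related problems can be illustrated on a toy graph. The following is a minimal sketch, not the method of the talk: the function name `list_triangles` and the example edge list are made up for illustration, and vertices are assumed to be comparable labels (here integers).

```python
def list_triangles(edges):
    """List each triangle of a simple undirected graph exactly once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        # Any common neighbor w of a and b closes a triangle; requiring
        # w > b keeps only one ordering (a < b < w) of each triangle.
        for w in adj[a] & adj[b]:
            if w > b:
                triangles.add((a, b, w))
    return sorted(triangles), adj

# Hypothetical example graph: two triangles sharing the edge (1, 2), plus a pendant vertex 4.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
triangles, adj = list_triangles(edges)   # problem (d): explicit listing

is_triangle_free = not triangles         # problem (a)
total = len(triangles)                   # problem (b): delta(G)
per_vertex = {v: 0 for v in adj}         # problem (c): delta(v)
for t in triangles:
    for v in t:
        per_vertex[v] += 1
```

Listing is the most expensive of the four tasks; the counting problems (b) and (c) can be answered without materializing the triangles.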

Page 3: Fast Counting of triangles in large networks:  Algorithms and laws

Why is triangle counting important*?

• Social Network Analysis: “Friends of friends are friends” [WF94]
• Web Spam Detection [BPCG08]
• Hidden Thematic Structure of the Web [EM02]
• Motif Detection, e.g. in biological networks [YPSB05]

*A few indicative reasons, from the graph mining perspective

Page 4: Fast Counting of triangles in large networks:  Algorithms and laws

Why is triangle counting important?

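Furthermore, two often used metrics are:

• Clustering Coefficient

$$CC(G) = \frac{1}{|V'|}\sum_{v \in V'} cc(v) = \frac{1}{|V'|}\sum_{v \in V'} \frac{\delta(v)}{\binom{d(v)}{2}}$$

where $V' = \{v : d(v) \ge 2\}$ and $\binom{d(v)}{2}$ is the number of triples at node $v$.

• Transitivity Ratio

$$TR(G) = \frac{3\,\delta(G)}{\sum_{v \in V}\binom{d(v)}{2}}$$

where $\delta(G) = \frac{1}{3}\sum_{v \in V}\delta(v)$ and the denominator is the total number of triples in $G$.

[Figure: a triple at node v and a triangle]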
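To make the two metrics concrete, here is a small sketch that evaluates them exactly on a toy graph. It assumes the usual definitions: cc(v) is δ(v) divided by the number of triples d(v)(d(v)-1)/2 at v, averaged over vertices of degree at least 2, and TR(G) is 3δ(G) over the total number of triples. The function name and the example graph are hypothetical.

```python
from fractions import Fraction
from math import comb

def clustering_and_transitivity(adj):
    """Return (CC(G), TR(G)) as exact fractions for an adjacency-set dict."""
    # delta[v]: pairs of neighbors of v that are themselves adjacent
    delta = {v: sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
             for v, nbrs in adj.items()}
    triples = {v: comb(len(nbrs), 2) for v, nbrs in adj.items()}
    v_prime = [v for v in adj if len(adj[v]) >= 2]   # assumes V' is non-empty
    cc = sum(Fraction(delta[v], triples[v]) for v in v_prime) / len(v_prime)
    total_triangles = sum(delta.values()) // 3       # each triangle is seen from 3 vertices
    tr = Fraction(3 * total_triangles, sum(triples.values()))
    return cc, tr

# Hypothetical example: two triangles sharing an edge, plus a pendant vertex.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
cc, tr = clustering_and_transitivity(adj)
```

On this graph the two metrics differ (2/3 versus 3/5), which shows that averaging per-vertex ratios is not the same as taking the ratio of totals.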

Page 5: Fast Counting of triangles in large networks:  Algorithms and laws

Outline

• Related Work
• Proposed Method
• Experiments
• Triangle-related Laws
• Triangles in Kronecker Graphs
• Future Work & Open Problems

Page 6: Fast Counting of triangles in large networks:  Algorithms and laws

Counting methods

Dense graphs

                    Fast                 Low space
Time complexity     $O(n^{2.37})$        $O(n^3)$
Space complexity    $O(n^2)$             $O(m)$

Sparse graphs

                    Fast                                  Low space
Time complexity     $O(m^{0.7} n^{1.2} + n^{2+o(1)})$     e.g. $O(n\,d_{max}^2)$
Space complexity    $\Theta(n^2)$ (eventually)            $\Theta(m)$

Page 8: Fast Counting of triangles in large networks:  Algorithms and laws

Outline of the Proposed Method

• EigenTriangle theorem
• EigenTriangleLocal theorem
• EigenTriangle algorithm
• EigenTriangleLocal algorithm
• Efficiency & Complexity
  • Power law degree distributions
  • Gershgorin discs
  • Real world network spectra

Page 9: Fast Counting of triangles in large networks:  Algorithms and laws

Theorem [EigenTriangle]

Theorem: The number of triangles δ(G) in an undirected, simple graph G(V,E) is given by:

$$\delta(G) = \frac{1}{6}\sum_{i=1}^{|V|} \lambda_i^3$$

where $\lambda_1, \lambda_2, \ldots, \lambda_{|V|}$ are the eigenvalues of the adjacency matrix of graph G.
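As a quick sanity check of the formula, consider the complete graph $K_n$ (an example chosen here for illustration, not taken from the talk). Its adjacency matrix has the eigenvalue $n-1$ once and the eigenvalue $-1$ with multiplicity $n-1$, so the sum of cubes gives:

```latex
\begin{align*}
\delta(K_n) &= \frac{1}{6}\left[(n-1)^3 + (n-1)(-1)^3\right] \\
            &= \frac{(n-1)\left[(n-1)^2 - 1\right]}{6}
             = \frac{n(n-1)(n-2)}{6} = \binom{n}{3}
\end{align*}
```

This is the number of ways to pick 3 of the n vertices, as expected for a complete graph.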

Page 10: Fast Counting of triangles in large networks:  Algorithms and laws

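Proof

Call A the adjacency matrix of the graph. Consider the i-th diagonal element of $A^3$, $\alpha_{ii}$. This element counts the closed walks of length 3 that start and end at vertex i, so it equals twice the number of triangles vertex i participates in (each triangle is traversed both as i-j-k and as i-k-j). Summing over the diagonal, the trace of $A^3$ is 6δ(G), because each triangle has 3 participating vertices and is counted twice at each of them. Furthermore, if Ax=λx, then $\lambda^3$ is an eigenvalue of $A^3$ (*), and vice versa: if λ is an eigenvalue of $A^3$, then $\sqrt[3]{\lambda}$ is an eigenvalue of A. Since the trace equals the sum of the eigenvalues, the theorem follows.

* $A^3 x = AAAx = AA\lambda x = \lambda AAx = \lambda A \lambda x = \lambda^2 Ax = \lambda^3 x$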
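The counting argument in the proof can be checked numerically without any eigenvalue solver. The sketch below uses plain Python lists and a hypothetical 5-vertex example graph; it only verifies the trace identity, it is not the EigenTriangle algorithm itself.

```python
def matmul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus the pendant edge (3,4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
n = 5
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

A3 = matmul(matmul(A, A), A)
diag = [A3[i][i] for i in range(n)]      # closed walks of length 3 per vertex
local = [d // 2 for d in diag]           # triangles through each vertex
total = sum(diag) // 6                   # trace(A^3) / 6
```

Here the diagonal of the cubed matrix is [2, 4, 4, 2, 0], so vertices 1 and 2 each lie on two triangles and the graph contains two triangles in total.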

Page 11: Fast Counting of triangles in large networks:  Algorithms and laws

Theorem [EigenTriangleLocal]

Theorem: The number of triangles δ(i) that vertex i participates in is equal to:

$$\delta(i) = \frac{1}{2}\sum_{j=1}^{|V|} \lambda_j^3 u_{ij}^2$$

where $u_{ij}$ is the i-th entry of the j-th eigenvector $u_j$.

Proof [Sketch]: Follows from the previous theorem and the fact that A is symmetric, therefore diagonalizable, and also

$$A^3 = U \Lambda^3 U^T$$
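A worked instance may help here. For the complete graph $K_n$ (an illustrative example, not from the talk), take an orthonormal eigenbasis with $u_1 = \frac{1}{\sqrt{n}}(1,\ldots,1)^T$ for $\lambda_1 = n-1$; all remaining eigenvectors belong to the eigenvalue $-1$. Writing $u_{ij}$ for the i-th entry of the j-th eigenvector, the rows of the orthogonal matrix $U$ have unit norm, so $\sum_{j \ge 2} u_{ij}^2 = 1 - \frac{1}{n}$, and therefore:

```latex
\begin{align*}
\delta(i) &= \frac{1}{2}\left[(n-1)^3 \cdot \frac{1}{n} + (-1)^3\left(1 - \frac{1}{n}\right)\right] \\
          &= \frac{(n-1)\left[(n-1)^2 - 1\right]}{2n}
           = \frac{(n-1)(n-2)}{2} = \binom{n-1}{2}
\end{align*}
```

Every vertex of $K_n$ indeed closes a triangle with each pair of the other $n-1$ vertices.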

Page 12: Fast Counting of triangles in large networks:  Algorithms and laws

EigenTriangle Algorithm

Page 13: Fast Counting of triangles in large networks:  Algorithms and laws

EigenTriangleLocal Algorithm

Why are these two algorithms efficient?

Page 14: Fast Counting of triangles in large networks:  Algorithms and laws

Skewed Degree Distributions

Skewed degree distributions are ubiquitous in nature! They have been termed “the signature of human activity” [FKP02], but they also appear in all other kinds of networks, e.g. biological ones. See [N05][M04] for generative models of power law distributions.

They are typically referred to as power laws (even if we sometimes abuse the strict definition of a power law, i.e. $\log(y) = a \log(x) + b$).
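The strict definition says that the data fall on a straight line in log-log scale. A minimal sketch of estimating the slope a and intercept b by ordinary least squares on the logarithms is shown below; the function name and the synthetic data are assumptions, and a least-squares fit on log-log data is only a crude estimator for real, noisy measurements.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = a*log(x) + b; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    a = (sum((p - mx) * (q - my) for p, q in zip(lx, ly))
         / sum((p - mx) ** 2 for p in lx))
    b = my - a * mx
    return a, b

# Synthetic data that follow y = 3 * x^(-2) exactly.
xs = [1, 2, 4, 8, 16]
ys = [3.0 * x ** -2 for x in xs]
a, b = fit_power_law(xs, ys)
```

On this noise-free input the fit recovers the exponent -2 and the intercept log(3) up to rounding.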

Page 15: Fast Counting of triangles in large networks:  Algorithms and laws

Examples of power laws

Newman [N05] demonstrated how often power laws appear, using many different types of networks, ranging from word frequencies to the population of cities.

[Figure: distribution of city populations. Many cities have a small population; few cities have a huge population.]

Page 16: Fast Counting of triangles in large networks:  Algorithms and laws

Gershgorin’s Discs

Theorem: Let B be an arbitrary n×n matrix. Then the eigenvalues λ of B are located in the union of the n discs

$$|\lambda - b_{kk}| \le \sum_{j \ne k} |b_{kj}|$$

For a proof see Demmel [D97], p.82.
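For the adjacency matrix of a simple graph the diagonal is zero and the k-th row sum is the degree of vertex k, so every disc is centered at 0 with radius d(k), and all eigenvalues lie in the interval [-d_max, d_max]. The sketch below computes the discs for a hypothetical star graph; the function name is made up for illustration.

```python
def gershgorin_discs(B):
    """Return one (center, radius) pair per row of the square matrix B."""
    n = len(B)
    return [(B[k][k], sum(abs(B[k][j]) for j in range(n) if j != k))
            for k in range(n)]

# Hypothetical example: a star with hub 0 and four leaves.
n = 5
A = [[0] * n for _ in range(n)]
for leaf in range(1, n):
    A[0][leaf] = A[leaf][0] = 1

discs = gershgorin_discs(A)
bound = max(center + radius for center, radius in discs)  # upper bound on any eigenvalue
```

The bound here is 4, while the largest eigenvalue of a star with four leaves is 2 (the square root of the number of leaves), a small instance of how loose these discs can be.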

Page 17: Fast Counting of triangles in large networks:  Algorithms and laws

Gershgorin Discs

[Figure: Gershgorin bounds on the airports network. Observe how loose they are.]

Page 18: Fast Counting of triangles in large networks:  Algorithms and laws

Typical real world spectra

[Figure: eigenvalue spectra of two networks. Panels: Airports, Political blogs]

Page 19: Fast Counting of triangles in large networks:  Algorithms and laws

Top Eigenvalues

Zooming in on the top eigenvalues and plotting the rank vs. the eigenvalue in log-log scale reveals that the top eigenvalues follow a power law [FFF99].

Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] proved this fact.

Page 20: Fast Counting of triangles in large networks:  Algorithms and laws

Our idea

Simple & clear: Use a low-rank approximation of $A^3$ to estimate the diagonal elements and the trace.

This also suggests a way of thinking: take advantage of special properties (e.g. power laws) to reduce the complexity of certain computational tasks in real-world networks.

Page 21: Fast Counting of triangles in large networks:  Algorithms and laws

Summing up: Why does it work?

The first main reason is the almost-symmetry of the spectrum around 0 for the bulk of the eigenvalues, except the top ones: cubing preserves the sign, so the contributions of the bulk largely cancel in the sum.

Cubing strongly amplifies this phenomenon!

Page 22: Fast Counting of triangles in large networks:  Algorithms and laws

Complexity Analysis

The main computational bottleneck, which determines the complexity, is the Lanczos method.

Lanczos runs in linear time with respect to the number of non-zero entries of the matrix, i.e. the edges, assuming that we compute a small, constant number of eigenvalues.

Convergence of Lanczos is fast due to the eigenvalue power law (see Kaniel-Paige theory [GL89]).
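The following sketch illustrates why the cost is driven by matrix-vector products over the edges. It is not the Lanczos method: plain power iteration with deflation is swapped in because it fits in a few lines of standard-library Python, and it converges more slowly and less robustly than Lanczos. All names (`matvec`, `top_eigenpairs`, `estimate_triangles`) and the test graph are hypothetical.

```python
import math
import random

def matvec(adj, x):
    """Compute y = A x from adjacency lists; the work is proportional to the number of edges."""
    return [sum(x[u] for u in adj[v]) for v in range(len(adj))]

def top_eigenpairs(adj, k, iters=200, seed=0):
    """Approximate the k eigenvalues of largest magnitude by power iteration with deflation."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        x = [rng.uniform(-1.0, 1.0) for _ in adj]
        for _ in range(iters):
            y = matvec(adj, x)
            # Deflation: project out the eigenvectors that were already found.
            for _, u in pairs:
                c = sum(a * b for a, b in zip(u, y))
                y = [a - c * b for a, b in zip(y, u)]
            norm = math.sqrt(sum(a * a for a in y))
            if norm < 1e-12:            # the remaining spectrum is numerically zero
                x = [0.0] * len(adj)
                break
            x = [a / norm for a in y]
        lam = sum(a * b for a, b in zip(x, matvec(adj, x)))  # Rayleigh quotient
        pairs.append((lam, x))
    return pairs

def estimate_triangles(adj, k):
    """Rank-k estimate of the triangle count: sum of the top-k cubed eigenvalues over 6."""
    return sum(lam ** 3 for lam, _ in top_eigenpairs(adj, k)) / 6.0

# Complete graph on 5 vertices: eigenvalues 4 and -1 (four times), 10 triangles.
k5 = [[u for u in range(5) if u != v] for v in range(5)]
rank1 = estimate_triangles(k5, 1)   # uses only the top eigenvalue
rank5 = estimate_triangles(k5, 5)   # uses the full spectrum
```

With a single eigenvalue the estimate is 64/6, about 10.67, already within 7% of the true count of 10; using the whole spectrum recovers 10 up to rounding. Graphs with eigenvalues of equal magnitude and opposite sign (e.g. bipartite graphs) would need a more careful solver than this sketch.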

Page 24: Fast Counting of triangles in large networks:  Algorithms and laws

Datasets

Page 25: Fast Counting of triangles in large networks:  Algorithms and laws

Competitor: Node Iterator

The Node Iterator algorithm considers one node at a time, looks at its neighbors, and checks how many pairs of them are connected to each other.

Complexity: $O(n\,d_{max}^2)$

We report the results as the speedup that the EigenTriangle algorithm gives compared to the running time of the Node Iterator.
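A minimal sketch of the Node Iterator, assuming the graph is given as a dictionary of neighbor sets (the function name and the example are hypothetical). Each node inspects every pair of its neighbors, which is where the quadratic dependence on the maximum degree comes from.

```python
from itertools import combinations

def node_iterator(adj):
    """Count triangles by testing every pair of neighbors of every node."""
    local = {}
    for v, nbrs in adj.items():
        # O(d(v)^2) pair checks for node v
        local[v] = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
    total = sum(local.values()) // 3    # each triangle is found at its 3 vertices
    return total, local

# Complete graph on 4 vertices: 4 triangles, 3 through each vertex.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
total, local = node_iterator(k4)
```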

Page 26: Fast Counting of triangles in large networks:  Algorithms and laws

Results: #Eigenvalues vs. Speedup

Page 27: Fast Counting of triangles in large networks:  Algorithms and laws

Results: #Edges vs. Speedup

Page 28: Fast Counting of triangles in large networks:  Algorithms and laws

Main points

Some interesting facts about the two scatterplots:

• The mean approximation rank required for at least 95% accuracy is 6.2.
• Speedups are between 33.7x and 1159x. The mean speedup is 250x.
• Notice the increasing speedup as the size of the network grows.

Page 29: Fast Counting of triangles in large networks:  Algorithms and laws

Zooming in

[Figure: zooming in on this point]

Page 30: Fast Counting of triangles in large networks:  Algorithms and laws

Evaluating the Local Counting Method

• Pearson’s correlation coefficient ρ
• Relative Reconstruction Error

$$RRE = \frac{1}{|V|}\sum_{i=1}^{|V|} \frac{|\delta(i) - \delta'(i)|}{\delta(i)}$$

where $\delta'(i)$ is the estimate of $\delta(i)$.

Political Blogs: RRE 7·10^-4, ρ 99.97%
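Both evaluation measures are short to compute. The sketch below assumes that RRE averages the relative error |δ(i) - δ'(i)| / δ(i) over the vertices and that every vertex in the input has δ(i) > 0; how vertices without triangles are handled is not stated in the slides, so excluding them beforehand is an assumption. The sample numbers are invented.

```python
import math

def rre(exact, approx):
    """Mean relative error between exact and estimated per-vertex counts."""
    return sum(abs(e - a) / e for e, a in zip(exact, approx)) / len(exact)

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

exact = [1, 2, 2, 1]              # invented exact counts
approx = [1.1, 1.9, 2.0, 1.0]     # invented estimates
error = rre(exact, approx)
rho = pearson(exact, approx)
```

For these invented values the RRE is (0.1 + 0.05 + 0 + 0) / 4 = 0.0375. Note that ρ only measures linear association, so a uniformly scaled estimate would still score ρ = 1; that is why both measures are reported together.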

Page 31: Fast Counting of triangles in large networks:  Algorithms and laws

#Eigenvalues vs. ρ for three networks

Observe how a low rank results in almost optimal results. This holds for surprisingly many real world networks.

Page 33: Fast Counting of triangles in large networks:  Algorithms and laws

Triangle Participation Law

Plots the number of triangles δ (x-axis) vs. the count of vertices that participate in δ triangles.

a) EPINIONS, who-trusts-whom network
b) ASN, social network
c) HEP_TH, collaboration network

Page 34: Fast Counting of triangles in large networks:  Algorithms and laws

Degree Triangle Law

Plots the degree $d_i$ (x-axis) vs. the mean number of triangles that nodes with degree $d_i$ participate in.

[Figure panels: Epinions, ASN]
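The data behind both kinds of plot can be derived from the per-vertex triangle counts. The sketch below is an assumption about how such plots might be prepared, not the authors' code: it builds the triangle participation histogram (how many vertices have a given δ) and the mean δ per degree for a hypothetical toy graph.

```python
from collections import Counter, defaultdict

def triangle_laws(adj):
    """Return (participation histogram, mean triangles per degree)."""
    # delta[v]: pairs of neighbors of v that are themselves adjacent
    delta = {v: sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
             for v, nbrs in adj.items()}
    participation = Counter(delta.values())      # delta -> number of vertices
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        by_degree[len(nbrs)].append(delta[v])
    mean_by_degree = {d: sum(c) / len(c) for d, c in by_degree.items()}
    return participation, mean_by_degree

# Hypothetical example: two triangles sharing an edge, plus a pendant vertex.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
participation, mean_by_degree = triangle_laws(adj)
```

On a real network both series would then be drawn on logarithmic axes to look for the straight-line pattern.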

Page 36: Fast Counting of triangles in large networks:  Algorithms and laws

Kronecker Graphs

This model was introduced in [LCKF05]. It is based on the simple operation of the Kronecker product to generate graphs that mimic real world networks.

• Deterministic Kronecker Graphs: Kronecker product of the adjacency matrix at the current step k with the initiator adjacency matrix (typically small).
• Stochastic Kronecker Graphs: Kronecker product of the matrix at the current step k with the initiator matrix. The initiator matrix contains probabilities. For more details see [LF07].

Page 37: Fast Counting of triangles in large networks:  Algorithms and laws

Triangles in Kronecker Graphs

Some notation first:
• A: n×n initiator adjacency matrix of the undirected, simple graph $G_A$
• $B = A^{[k]}$: the k-th Kronecker product, with $A^{[0]} = A$
• $\lambda = (\lambda_1, \ldots, \lambda_n)$: the eigenvalues of A
• $\Delta(G_A)$, $\Delta(G_B)$: #triangles of $G_A$, $G_B$

Theorem [KroneckerTRC]

$$\Delta(G_B) = 6^{k}\,\Delta(G_A)^{k+1}, \quad k \ge 0$$

Page 38: Fast Counting of triangles in large networks:  Algorithms and laws

Proof

We use induction on the number of recursion steps k. For k=0 the theorem trivially holds.

Assume now that KroneckerTRC holds for some $r \ge 0$. Call $C = A^{[r]}$, $D = A^{[r+1]}$, and let $[\mu_i]_{i=1..s}$ be the eigenvalues of C. By the assumption

$$\Delta(G_C) = 6^{r}\,\Delta(G_A)^{r+1}$$

The eigenvalues of D are given by the Kronecker product of the eigenvalue vectors of C and A, i.e. all products $\mu_i \lambda_j$. By the EigenTriangle theorem, the number of triangles in D is given by:

Page 39: Fast Counting of triangles in large networks:  Algorithms and laws

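Proof

$$\Delta(G_D) = \frac{1}{6}\sum_{i=1}^{s}\sum_{j=1}^{n} (\mu_i \lambda_j)^3 = \frac{1}{6}\sum_{i=1}^{s}\sum_{j=1}^{n} \mu_i^3 \lambda_j^3 = \frac{1}{6}\sum_{i=1}^{s} \mu_i^3 \cdot 6\,\Delta(G_A) = \Delta(G_A)\sum_{i=1}^{s}\mu_i^3 = 6\,\Delta(G_C)\,\Delta(G_A) = 6^{r+1}\,\Delta(G_A)^{r+2}$$

Therefore KroneckerTRC holds for all $k \ge 0$. Q.E.D.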
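The induction step says that one more Kronecker multiplication with the initiator turns a graph with Δ(G_C) triangles into one with 6·Δ(G_C)·Δ(G_A) triangles. The sketch below checks this numerically for the complete graph on 4 vertices as initiator; the helper names are hypothetical, and the indexing assumes that k counts recursion steps, so that k = 0 is the initiator itself.

```python
def kron(x, y):
    """Kronecker product of two square 0/1 matrices given as lists of lists."""
    n, m = len(x), len(y)
    return [[x[i // m][j // m] * y[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def count_triangles(a):
    """Number of triangles as trace(a^3) / 6."""
    n = len(a)
    a2 = [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    trace = sum(a2[i][k] * a[k][i] for i in range(n) for k in range(n))
    return trace // 6

# Initiator: complete graph on 4 vertices, which has 4 triangles.
A = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

counts = []
B = A
for k in range(3):          # k = 0, 1, 2 recursion steps
    counts.append(count_triangles(B))
    B = kron(B, A)
```

The counts come out as 4, 96 and 2304, i.e. 6^k · 4^(k+1) for k = 0, 1, 2, matching the closed form under this indexing.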

Page 41: Fast Counting of triangles in large networks:  Algorithms and laws

Theoretical Challenge I: Spectra of real world networks

Can we prove things about the distribution of the eigenvalues, adopting a random graph model such as the expected degree model G(w) [CLV03]?

An analog to Wigner’s semicircle law for random Erdos-Renyi graphs (see Furedi-Komlos [FK81]).

[Figure: spectrum of $G_{40,\frac{1}{2}}$ over 100000 iterations [S07]]

Page 42: Fast Counting of triangles in large networks:  Algorithms and laws

Theoretical Challenge I: Spectra of real world networks

Empirically, the rest of the spectrum follows a triangular-like distribution [FDBV01].

Can we prove something about this empirical observation?

Page 43: Fast Counting of triangles in large networks:  Algorithms and laws

Theoretical Challenge II: Eigenvectors of real world networks

Things are even “worse” than in the case of spectra: very little is known about the eigenvectors.

Related work: see [P08] for random graphs.

Page 44: Fast Counting of triangles in large networks:  Algorithms and laws

Theoretical Challenge III: Degree Triangle Law

Prove, using the expected degree random graph model G(w), the pattern we saw (see [S04]).

Conjecture: The relationship we observed probably appears only for some values of the slope of the degree distribution. Further recent experiments showed that for some graphs this pattern does not hold.

Page 45: Fast Counting of triangles in large networks:  Algorithms and laws

Experimental Challenge I: Compare with Streaming Methods

Streaming or semi-streaming methods perform one or O(1) passes over the graph [YKS02][BFLSS06][BPCG08].

Common underlying idea: sophisticated sampling methods.

Implement and compare.

Page 46: Fast Counting of triangles in large networks:  Algorithms and laws

Practical Challenge I: Triangles in Large Scale Graph Mining

• Many Giga-byte and Peta-byte sized graphs. How to handle these graphs? HADOOP
• EigenTriangle algorithms are based just on simple matrix-vector multiplications.
• Easy to parallelize on all sorts of architectures (distributed memory, shared memory). See [DHV93] for the details.

Page 47: Fast Counting of triangles in large networks:  Algorithms and laws

PEGASUS: Peta-Graph Mining from the Triangle perspective

On-going work with U Kang and Christos Faloutsos, in collaboration with Yahoo! Research.

Among others:
• Implement EigenTriangle algorithms in HADOOP and compare to other methods.
• Find outliers, with respect to triangles, in graphs with many billions of edges.

Soon… Stay tuned!

Page 48: Fast Counting of triangles in large networks:  Algorithms and laws

Curious about:

Page 49: Fast Counting of triangles in large networks:  Algorithms and laws

Acknowledgements

Christos Faloutsos and Yiannis Koutis, for the helpful discussions

Page 50: Fast Counting of triangles in large networks:  Algorithms and laws

Acknowledgements

Maria Tsiarli, for the PEGASUS logo

Page 52: Fast Counting of triangles in large networks:  Algorithms and laws

References

[WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)”
[EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web”
[BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs”
[FKP02] Fabrikant, Koutsoupias, Papadimitriou: “Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet”
[N05] Newman: “Power laws, Pareto distributions and Zipf's law”
[M04] Mitzenmacher: “A brief history of generative models for power law and lognormal distributions”
[FK81] Furedi, Komlos: “Eigenvalues of random symmetric matrices”

Page 53: Fast Counting of triangles in large networks:  Algorithms and laws

References

[S04] Sergi: “Random graph model with power-law distributed triangle subgraphs”
[D97] Demmel: “Applied Numerical Linear Algebra”
[LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication”
[LF07] Leskovec, Faloutsos: “Scalable Modeling of Real Graphs using Kronecker Multiplication”
[FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”
[MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law”
[CLV03] Chung, Lu, Vu: “Spectra of Random Graphs with given expected degrees”

Page 54: Fast Counting of triangles in large networks:  Algorithms and laws

References

[YKS02] Bar-Yossef, Kumar, Sivakumar: “Reductions in streaming algorithms, with an application to counting triangles in graphs”
[GL89] Golub, Van Loan: “Matrix Computations”
[BFLSS06] Buriol, Frahling, Leonardi, Spaccamela, Sohler: “Counting triangles in data streams”
[DHV93] Demmel, Heath, van der Vorst: “Parallel Numerical Linear Algebra”
[YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”
[P08] Mitra: “Entrywise Bounds for Eigenvectors of Random Graphs”
[FDBV01] Farkas, Derenyi, Barabasi, Vicsek: “Spectra of "real-world" graphs: Beyond the semi-circle law”
[S07] Spielman’s “Spectral Graph Theory and its Applications” class (YALE): http://www.cs.yale.edu/homes/spielman/eigs/

Page 55: Fast Counting of triangles in large networks:  Algorithms and laws

References

[F08] Faloutsos’ “Multimedia Databases and Data Mining” class (CMU): http://www.cs.cmu.edu/~christos/courses/826.S08

For more references, see also the paper: http://www.cs.cmu.edu/~ctsourak/tsourICDM08.pdf

