Date posted: 19-Dec-2015
CMU SCS
C. Faloutsos (CMU) #1
Large Graph Algorithms
Christos Faloutsos
CMU
McGlohon, Mary; Prakash, Aditya; Tong, Hanghang; Tsourakakis, Babis
Akoglu, Leman; Chau, Polo; Kang, U
OpenCirrus'10
ICDM-LDMTA 2009 C. Faloutsos 2
Graphs - why should we care?
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody ’01]
Graphs - why should we care?
• IR: bipartite graphs (doc-terms)
• web: hyper-text graph
• Social networking sites (Facebook, Twitter)
• Users posing and answering questions
• Click-streams (user – page bipartite graph)
• ... and more – any M:N db relationship
[Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM]
Our goal:
One-stop solution for mining huge graphs:
PEGASUS project (PEta GrAph mining System)
• www.cs.cmu.edu/~pegasus
• Open-source code and papers
Outline – Algorithms & results

                 Centralized   Hadoop/PEGASUS
Degree Distr.    old           old
Pagerank         old           old
Diameter/ANF     old           DONE
Conn. Comp       old           DONE
Triangles        DONE
Visualization    STARTED
HADI for diameter estimation
• Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec. SDM’10
• Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)
• Our HADI: linear on E (~10B)
  – Near-linear scalability wrt # machines
  – Several optimizations -> 5x faster
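The quantity behind a diameter estimate can be sketched on a toy graph. The snippet below is a minimal single-machine illustration, not the HADI implementation: it grows each node's set of reachable nodes one hop at a time and reads the effective diameter off the resulting neighborhood function. Two assumptions are made for brevity: exact bitmasks stand in for the approximate counting sketches that keep memory bounded at scale, and the effective diameter is taken as the smallest whole number of hops covering 90% of reachable pairs (no interpolation). The function names are hypothetical.

```python
def neighborhood_function(n, edges):
    """Return [N(0), N(1), ...], where N(h) counts ordered node pairs
    (including a node with itself) that are within h hops of each other.
    The graph is treated as undirected; nodes are numbered 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Bit u of reach[v] is set when u is within the current hop count of v.
    reach = [1 << v for v in range(n)]
    counts = [sum(bin(r).count("1") for r in reach)]
    while True:
        new_reach = list(reach)
        for v in range(n):
            for u in adj[v]:
                new_reach[v] |= reach[u]  # one more hop: OR in the neighbors' sets
        if new_reach == reach:  # nothing new reached, so the sets are complete
            return counts
        reach = new_reach
        counts.append(sum(bin(r).count("1") for r in reach))


def effective_diameter(counts, fraction=0.9):
    """Smallest h such that N(h) covers `fraction` of all reachable pairs."""
    target = fraction * counts[-1]
    for h, c in enumerate(counts):
        if c >= target:
            return h


# Toy example: a path 0 - 1 - 2 - 3.
counts = neighborhood_function(4, [(0, 1), (1, 2), (2, 3)])
print(counts, effective_diameter(counts))
```

Each pass touches every edge once, which is where the "linear on E" cost per iteration comes from; the number of passes is bounded by the diameter.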
YahooWeb graph (120 GB, 1.4B nodes, 6.6B edges)
• Largest publicly available graph ever studied.
[Figure: radius plot of the YahooWeb graph. X-axis: Radius; Y-axis: Count. Annotation: 19+? [Barabasi+]]
YahooWeb graph (120 GB, 1.4B nodes, 6.6B edges)
• effective diameter: surprisingly small.
• Multi-modality: probably mixture of cores.
Running time - Kronecker and Erdős-Rényi graphs with billions of edges.
Generalized Iterated Matrix Vector Multiplication (GIMV)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).
• PageRank
• proximity (RWR)
• Diameter
• Connected components
• (eigenvectors, Belief Prop., …)
Matrix-vector multiplication (iterated)
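How one iterated matrix-vector multiplication can serve several algorithms can be sketched as follows. This is a minimal in-memory illustration, not the Hadoop code: it follows the three operations named in the PEGASUS paper (combine2, combineAll, assign), while the function names, the edge-triple representation, and the stopping rules are assumptions made for this example.

```python
import math


def gimv_step(matrix, v, combine2, combine_all, assign):
    """One generalized matrix-vector multiplication.
    `matrix` is a list of (i, j, m_ij) triples for the non-zero entries."""
    n = len(v)
    partial = [[] for _ in range(n)]
    for i, j, m_ij in matrix:
        partial[i].append(combine2(m_ij, v[j]))  # combine2: pair m_ij with v_j
    # combineAll merges the partial results of row i; assign updates v_i.
    return [assign(v[i], combine_all(i, partial[i])) for i in range(n)]


def connected_components(n, undirected_edges):
    """Each node ends up labeled with the smallest node id in its component."""
    matrix = []
    for a, b in undirected_edges:
        matrix.append((a, b, 1))
        matrix.append((b, a, 1))
    labels = list(range(n))
    while True:
        new_labels = gimv_step(
            matrix, labels,
            combine2=lambda m, vj: m * vj,
            combine_all=lambda i, xs: min(xs, default=math.inf),
            assign=min,  # keep the smaller of the old label and the incoming one
        )
        if new_labels == labels:
            return labels
        labels = new_labels


def pagerank(n, directed_edges, c=0.85, iters=50):
    """Assumes every node has at least one out-edge (no dangling nodes)."""
    out_deg = [0] * n
    for src, _ in directed_edges:
        out_deg[src] += 1
    # Column-normalized adjacency: entry (dst, src) = 1 / out-degree(src).
    matrix = [(dst, src, 1.0 / out_deg[src]) for src, dst in directed_edges]
    v = [1.0 / n] * n
    for _ in range(iters):
        v = gimv_step(
            matrix, v,
            combine2=lambda m, vj: c * m * vj,
            combine_all=lambda i, xs: (1 - c) / n + sum(xs),
            assign=lambda old, new: new,
        )
    return v


print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))
print(pagerank(3, [(0, 1), (1, 2), (2, 0)]))
```

Only the three small functions change between the two algorithms; the loop over the non-zero matrix entries stays the same, which is the part a distributed implementation would need to optimize once.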
Example: GIM-V At Work
• Connected Components
[Figure: count vs. size of connected components. X-axis: Size; Y-axis: Count. Annotations: 300-size cmpt x 500, why? 1100-size cmpt x 65, why?]
Example: GIM-V At Work
• Connected Components
[Figure: count vs. size of connected components. X-axis: Size; Y-axis: Count. Annotation: suspicious financial-advice sites (not existing now)]
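The count-vs-size plot can be tabulated directly from per-node component labels. The snippet below is a small sketch under the assumption that each node already carries a component label (for example, the smallest node id in its component); the function name is hypothetical.

```python
from collections import Counter


def component_size_histogram(labels):
    """Map component size -> number of components of that size."""
    sizes = Counter(labels)          # component label -> number of nodes
    return Counter(sizes.values())   # component size  -> number of components


# Toy example: one component of 3 nodes and two components of 2 nodes.
hist = component_size_histogram([0, 0, 0, 3, 3, 5, 5])
for size, count in sorted(hist.items()):
    print(size, count)
```

Sizes whose count is far above that of neighboring sizes are the spikes worth inspecting by hand.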
ASONAM 2009 C. Faloutsos 20
Triangles
• Real social networks have a lot of triangles
  – Friends of friends are friends
• Q1: how to compute quickly?
• Q2: Any patterns?
Triangles : Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute (3-way join; several approx. algos)
Q: Can we do that quickly?
A: Yes!
$\#\text{triangles} = \frac{1}{6} \sum_i \lambda_i^3$, where $\lambda_i$ are the eigenvalues of the adjacency matrix
(and, because of skewness, we only need the top few eigenvalues!)
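The eigenvalue identity can be checked on a toy graph. The sketch below is an illustration only: it computes the exact count as trace(A^3)/6, which equals one sixth of the sum of cubed eigenvalues, and compares it with an estimate from the single largest eigenvalue. Plain power iteration is swapped in here for simplicity and is not the method used in the cited work; it assumes a connected, non-bipartite graph so that the dominant eigenvalue is unique in magnitude. All function names are hypothetical.

```python
import math


def matmul(A, B):
    """Dense product of two n x n matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangles_exact(A):
    """Exact triangle count of an undirected graph: trace(A^3) / 6."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6


def top_eigenvalue(A, iters=200):
    """Largest-magnitude eigenvalue via power iteration (assumption: it is unique)."""
    n = len(A)
    x = [1.0 + i for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iters):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n))  # Rayleigh quotient, x has unit norm


def triangles_top1(A):
    """Estimate that keeps only the largest eigenvalue in the sum."""
    return top_eigenvalue(A) ** 3 / 6


# Toy example: the complete graph on 4 nodes (eigenvalues 3, -1, -1, -1).
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(triangles_exact(K4), triangles_top1(K4))
```

On a graph this small the one-eigenvalue estimate is coarse; the slide's point is that on real graphs with skewed spectra a few top eigenvalues carry most of the sum.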
Triangles : Computations [Tsourakakis ICDM 2008]
1000x+ speed-up, high accuracy
Triangles
• Easy to implement on Hadoop: it only needs eigenvalues (working on it, using Lanczos)
Triangle Law: #1 [Tsourakakis ICDM 2008]
[Figure: three panels (ASN, HEP-TH, Epinions). X-axis: # of triangles a node participates in; Y-axis: count of such nodes]
Triangle Law: #2 [Tsourakakis ICDM 2008]
[Figure: three panels (SN, Reuters, Epinions). X-axis: degree; Y-axis: mean # triangles]
Notice: slope ~ degree exponent (insets)
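The two quantities plotted in the triangle-law figures can be computed on a small graph as follows. This is a brute-force sketch for illustration, assuming the graph fits in memory as a dictionary from node to neighbor set; the function names are hypothetical and nothing here reproduces the published measurements.

```python
from collections import Counter, defaultdict
from itertools import combinations


def triangles_per_node(adj):
    """For each node, count pairs of its neighbors that are themselves adjacent."""
    return {
        v: sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def triangle_participation_histogram(adj):
    """Law #1 axes: # of triangles a node participates in -> count of such nodes."""
    return Counter(triangles_per_node(adj).values())


def mean_triangles_by_degree(adj):
    """Law #2 axes: degree -> mean # of triangles over nodes with that degree."""
    tri = triangles_per_node(adj)
    by_degree = defaultdict(list)
    for v, nbrs in adj.items():
        by_degree[len(nbrs)].append(tri[v])
    return {d: sum(ts) / len(ts) for d, ts in by_degree.items()}


# Toy example: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(triangle_participation_histogram(adj))
print(mean_triangles_by_degree(adj))
```

Plotting either result on log-log axes for a real network is what would reveal the power-law shape the slides refer to.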
Visualization: ShiftR
• Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining Approaches. Aniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. Hong. Sensemaking Workshop at CHI 2009, April 4-5, Boston, MA, USA.