CMU SCS
Mining Large Graphs: Spectral Methods, Tensors and Influence propagation
Christos Faloutsos, CMU
Thanks
• Alex Smola
• Jia Yu (Tim) Pan
Google, June 2013 C. Faloutsos (CMU) 2
Roadmap
• Graph problems:
– G1: Fraud detection – BP
– G2: Botnet detection – spectral
– G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
– C1: spikeM model
• Conclusions
E-bay Fraud detection
with Polo Chau & Shashank Pandit, CMU [WWW ’07]
E-bay Fraud detection - NetProbe
Compatibility matrix (heterophily), over states F (fraudster), A (accomplice), H (honest):
F: 99%
A: 99%
H: 49%, 49%
Background 1: Belief Propagation Equations
$$m_{ij}(x_j) = \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{n \in N(i) \setminus j} m_{ni}(x_i)$$
$$b_i(x_i) = \eta\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)$$
[Pearl ’82] [Yedidia+ ’02] … [Pandit+ ’07] [Gonzalez+ ’09] [Chechetka+ ’10]
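The message and belief updates of this slide can be sketched in a few lines of Python. This is a minimal sketch, not NetProbe's code: it assumes a small undirected graph, one compatibility matrix `psi` shared by all edges, and a fixed number of synchronous iterations. The function name `belief_propagation` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def belief_propagation(edges, priors, psi, n_iters=20):
    """Loopy BP sketch: priors[i] is phi_i, psi[x_i, x_j] is the edge compatibility."""
    priors = np.asarray(priors, dtype=float)
    psi = np.asarray(psi, dtype=float)
    n, k = priors.shape
    neighbors = {i: set() for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # messages[(i, j)] is the message from node i to node j, indexed by x_j
    messages = {(i, j): np.full(k, 1.0 / k) for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        updated = {}
        for (i, j) in messages:
            # phi_i(x_i) times all messages into i, except the one coming from j
            incoming = priors[i].copy()
            for other in neighbors[i] - {j}:
                incoming *= messages[(other, i)]
            msg = psi.T @ incoming           # sum over x_i of incoming[x_i] * psi[x_i, x_j]
            updated[(i, j)] = msg / msg.sum()  # normalize for numerical stability
        messages = updated
    # belief: prior times all incoming messages, normalized (the eta factor)
    beliefs = priors.copy()
    for i in range(n):
        for j in neighbors[i]:
            beliefs[i] *= messages[(j, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    # Toy chain 0 - 1 - 2 with a heterophily matrix (values are made up)
    heterophily = [[0.1, 0.9], [0.9, 0.1]]
    priors = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
    print(belief_propagation([(0, 1), (1, 2)], priors, heterophily))
```

With a heterophily matrix, the belief of node 1 leans toward the opposite state of node 0, and node 2 leans back toward node 0's state.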
Popular press
And less desirable attention:
• E-mail from ‘Belgium police’ (‘copy of your code?’)
Roadmap
• Graph problems:
– G1: Fraud detection – BP
  • Ebay
  • Symantec
  • Unification
– G2: Botnet detection – spectral
– G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Conclusions
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
Polo Chau, Machine Learning Dept
Carey Nachenberg, Vice President & Fellow
Jeffrey Wilhelm, Principal Software Engineer
Adam Wright, Software Engineer
Prof. Christos Faloutsos, Computer Science Dept
PATENT PENDING
SDM 2011, Mesa, Arizona
Polonium: The Data
60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
• 50+ million machines
• 900+ million executable files
Constructed a machine-file bipartite graph (0.2 TB+)
• 1 billion nodes (machines and files)
• 37 billion edges
Polonium: Key Ideas
• Use “guilt-by-association” (i.e., homophily)
– E.g., files that appear on machines with many bad files are more likely to be bad
• Scalability: handles a 37-billion-edge graph
Polonium: One-Interaction Results
84.9% True Positive Rate, 1% False Positive Rate
• True Positive Rate: % of malware correctly identified
• False Positive Rate: % of non-malware wrongly labeled as malware
(Plot: True Positive Rate vs. False Positive Rate, with the ‘Ideal’ corner marked)
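The two rates defined on this slide can be computed from labels and predictions as follows. This is a minimal sketch; the function name `rates` and the toy labels are illustrative assumptions, not Polonium data.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate); 1 = malware, 0 = non-malware."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # share of malware correctly identified
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # share of non-malware wrongly flagged
    return tpr, fpr

if __name__ == "__main__":
    # Made-up example: 4 malware files, 6 benign files
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    guess = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(rates(truth, guess))
```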
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms
Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos
ECML PKDD, 5-9 September 2011, Athens, Greece
Problem Definition: GBA techniques
Given: graph, and a few labeled nodes
Find: labels of the rest (assuming network effects)
Homophily and Heterophily
(Figure: Step 1, Step 2; homophily vs. heterophily)
All methods handle homophily.
NOT all methods handle heterophily, BUT the proposed method does!
Are they related?
• RWR (Random Walk with Restarts)
– Google’s PageRank (‘if my friends are important, I’m important, too’)
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you believe about them
YES!
Correspondence of Methods
Method   Matrix               Unknown   Known
RWR      [I − c A D^-1]       × x       = (1 − c) y
SSL      [I + a (D − A)]      × x       = y
FaBP     [I + a D − c′ A]     × b_h     = φ_h
A: adjacency matrix, e.g. [0 1 0; 1 0 1; 0 1 0]
D: diagonal degree matrix, with entries d1, d2, d3
Unknown: final labels/beliefs
Known: prior labels/beliefs
We know when it converges!
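The correspondence means all three methods reduce to solving one linear system of the form Matrix × Unknown = Known. The sketch below solves the three systems with a dense solver on a 3-node chain. It is an illustration under assumptions: the function names and the parameter values for `c`, `a` and `c_prime` are hypothetical, picked only to keep the systems well conditioned, and a graph at the scale discussed in the talk would need sparse, iterative solvers instead.

```python
import numpy as np

def solve_rwr(A, y, c=0.85):
    """RWR: [I - c A D^-1] x = (1 - c) y."""
    D_inv = np.diag(1.0 / A.sum(axis=0))
    return np.linalg.solve(np.eye(len(A)) - c * A @ D_inv, (1 - c) * y)

def solve_ssl(A, y, a=0.5):
    """SSL: [I + a (D - A)] x = y."""
    D = np.diag(A.sum(axis=1))
    return np.linalg.solve(np.eye(len(A)) + a * (D - A), y)

def solve_fabp(A, phi_h, a=0.01, c_prime=0.1):
    """FaBP: [I + a D - c' A] b_h = phi_h (a and c' are illustrative values)."""
    D = np.diag(A.sum(axis=1))
    return np.linalg.solve(np.eye(len(A)) + a * D - c_prime * A, phi_h)

if __name__ == "__main__":
    # 3-node chain: the adjacency matrix used as the example on the slide
    A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
    prior = np.array([1.0, 0.0, 0.0])  # only node 0 carries a label
    print("RWR :", solve_rwr(A, prior))
    print("SSL :", solve_ssl(A, prior))
    print("FaBP:", solve_fabp(A, 0.1 * prior))
```

In this sketch a positive `c_prime` spreads the label of node 0 with the same sign along the chain (homophily), while a negative `c_prime` flips the sign at each hop (heterophily).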
Results: Scalability
FaBP is linear in the number of edges.
(Plot: runtime (min) vs. # of edges, Kronecker graphs)
Results: Parallelism
FaBP ~2x faster & wins/ties on accuracy.
(Plot: % accuracy vs. runtime (min))
Conclusions for BP
• ‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects.
• FaBP: fast & accurate (and gives convergence conditions)
EigenSpokes
B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.
EigenSpokes
• Eigenvectors of adjacency matrix
– equivalent to singular vectors (symmetric, undirected graph)
(Figure: N × N adjacency matrix)
EigenSpokes
• EE plot:
• Scatter plot of scores of u1 vs u2
• One would expect
– Many points @ origin
– A few scattered ~randomly
(Plot: u1, 1st principal component, vs. u2, 2nd principal component; annotated with a 90° angle)
EigenSpokes - pervasiveness
• Present in mobile social graph across time and space
• Patent citation graph
EigenSpokes - explanation
Near-cliques, or near-bipartite-cores, loosely connected
So what?
• Extract nodes with high scores
• High connectivity
• Good “communities”
(Figure: spy plot of top 20 nodes)
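The idea of extracting the nodes with high eigenvector scores can be tried on a toy graph. The sketch below is an assumption-laden illustration, not the paper's community chipping procedure: `toy_graph` builds a hypothetical graph with two cliques joined by a single edge plus a sparse chain, and `chip_community` simply takes the nodes with the largest absolute score on one eigenvector.

```python
import numpy as np

def toy_graph():
    """Hypothetical graph: cliques on nodes 0-7 and 8-13, loosely connected, plus a chain."""
    n = 30
    A = np.zeros((n, n))

    def add_edge(i, j):
        A[i, j] = A[j, i] = 1.0

    for block in (range(0, 8), range(8, 14)):
        for i in block:
            for j in block:
                if i < j:
                    add_edge(i, j)
    add_edge(7, 8)                 # single edge joining the two cliques
    for i in range(13, 29):
        add_edge(i, i + 1)         # sparse chain hanging off node 13
    return A

def top_eigenvectors(A, k=2):
    """Eigenpairs of a symmetric adjacency matrix with the k largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:k]
    return vals[order], vecs[:, order]

def chip_community(u, size):
    """Nodes with the highest absolute score on eigenvector u."""
    return set(np.argsort(-np.abs(u))[:size].tolist())

if __name__ == "__main__":
    vals, U = top_eigenvectors(toy_graph(), k=2)
    print(sorted(chip_community(U[:, 0], 8)))
    print(sorted(chip_community(U[:, 1], 6)))
```

On this toy graph each of the two leading eigenvectors is concentrated on one clique, so most points of the u1-vs-u2 scatter plot lie near one of the two axes.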
Bipartite Communities!
(Figure: magnified bipartite community)
• patents from same inventor(s)
• ‘cut-and-paste’ bibliography!
(maybe, botnets?)
Victim IPs?
Botnet members?
Exploring it with Dr. Eric Mao (III-Taiwan)
GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries
U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos
KDD’12
Background: Tensors
• Tensors (= multi-dimensional arrays) are everywhere
– Hyperlinks & anchor text [Kolda+, 05]
(Figure: sparse 3-mode tensor, URL 1 × URL 2 × Anchor Text, with anchor texts such as Java, C++, C#)
– Sensor stream (time, location, type)
– Predicates (subject, verb, object) in knowledge base
  e.g., “Barack Obama is president of U.S.”, “Eric Clapton plays guitar”
– Anomaly detection in computer networks (IP-source, IP-destination, time-stamp)
NELL (Never Ending Language Learner) data: modes of size 26M, 26M and 48M; nonzeros = 144M
Problem Definition
• How to decompose a billion-scale tensor?
– Corresponds to SVD in 2D case
(Figure: decomposition into concepts, e.g. ‘Politicians’, ‘Artists’)
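To make "decompose a tensor" concrete, here is a small dense CP (PARAFAC) decomposition fitted by alternating least squares. This is a sketch under assumptions, not GigaTensor itself: it works on a tiny in-memory array, whereas the talk targets sparse tensors larger than RAM. All function names and the two-concept toy tensor are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row u * V.shape[0] + v holds U[u] * V[v]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def unfold(X, mode):
    """Matricize a 3-mode tensor along `mode`."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def als_update(X, factors, mode):
    """Least-squares update of one factor matrix with the other two held fixed."""
    others = [factors[m] for m in range(3) if m != mode]
    gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
    return unfold(X, mode) @ khatri_rao(*others) @ np.linalg.pinv(gram)

def cp_als(X, rank, n_iters=200, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating updates."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iters):
        for mode in range(3):
            factors[mode] = als_update(X, factors, mode)
    return factors

def reconstruct(factors):
    return np.einsum('ir,jr,kr->ijk', *factors)

def toy_tensor():
    """Made-up (subject, verb, object) tensor with two non-overlapping concepts."""
    A = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
    B = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
    C = np.array([[1, 0], [0, 1], [0, 1]], dtype=float)
    return reconstruct([A, B, C]), [A, B, C]

if __name__ == "__main__":
    X, _ = toy_tensor()
    fitted = cp_als(X, rank=2)
    print(np.linalg.norm(X - reconstruct(fitted)) / np.linalg.norm(X))
```

Each rank-one component plays the role of one concept: its three factor columns say which subjects, verbs and objects belong together, in the same way a singular triplet does for a matrix.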
Problem Definition
Q1: Dominant concepts/topics?
Q2: Find synonyms to a given noun phrase?
(and how to scale up: |data| > RAM)
NELL (Never Ending Language Learner) data: modes of size 26M, 26M and 48M; nonzeros = 144M
Experiments
• GigaTensor solves 100x larger problem
(Plot: tensor modes I, J, K; number of nonzeros = I / 50; Tensor Toolbox runs out of memory, while GigaTensor handles sizes 100x larger)
A1: Concept Discovery
• Concept Discovery in Knowledge Base
A2: Synonym Discovery
Rise and Fall Patterns of Information Diffusion: Model and Implications
Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU)
KDD’12, Beijing, China
Rise and fall patterns in social media
• Meme (# of mentions in blogs)
– short phrases sourced from U.S. politics in 2008
– “you can put lipstick on a pig”
– “yes we can”
• four classes on YouTube [Crane et al. ’08]
• six classes on Meme [Yang et al. ’11]
• Can we find a unifying model, which includes these patterns?
• Answer: YES!
• We can represent all patterns by a single model
Main idea - SpikeM
1. Un-informed bloggers (uninformed about rumor)
2. External shock at time nb (e.g., breaking news)
3. Infection (word-of-mouth)
(Figure: snapshots at time n=0, n=nb and n=nb+1; infection rate β)
Infectiveness of a blog-post at age n:
– Strength of infection (quality of news)
– Decay function
-1.5 slope
(Plot, log-log: Prob(RT > x) vs. response time; slope -1.5)
J. G. Oliveira & A.-L. Barabási, Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
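The three ingredients of the main idea (a pool of un-informed bloggers, an external shock at time nb, and word-of-mouth infection whose strength decays with the age of a post) can be simulated in a few lines. The sketch below is a simplified illustration, not the SpikeM equation from the paper: the exact update rule, the function name `spikem_sketch`, and the parameter values are assumptions, the decay is taken as β · age^-1.5 to match the -1.5 slope, and periodicity and noise are left out.

```python
import numpy as np

def spikem_sketch(N=1000.0, beta=0.001, n_b=10, S_b=5.0, T=200):
    """Simulate newly informed bloggers per tick under a power-law decaying infection."""
    dB = np.zeros(T)   # dB[n]: bloggers newly informed at time n
    U = np.zeros(T)    # U[n]: bloggers still un-informed at time n
    U[0] = N
    for n in range(T - 1):
        t = np.arange(n_b, n + 1)                       # empty before the shock
        sources = dB[t] + np.where(t == n_b, S_b, 0.0)  # earlier posts plus the shock
        ages = (n + 1 - t).astype(float)
        pressure = np.sum(sources * beta * ages ** -1.5)
        dB[n + 1] = min(U[n], U[n] * pressure)          # cannot inform more than remain
        U[n + 1] = U[n] - dB[n + 1]
    return dB, U

if __name__ == "__main__":
    dB, U = spikem_sketch()
    print("peak at tick", int(dB.argmax()), "with", round(float(dB.max()), 1), "new bloggers")
```

Plotting `dB` on lin-log axes for the rise and on log-log axes for the fall is a quick way to inspect the shape of the simulated spike.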
SpikeM - with periodicity
• Full equation of SpikeM
Periodicity
• Bloggers change their activity over time (e.g., daily, weekly, yearly)
(Plot: activity vs. time n, with a peak at noon and a dip at 3am)
Details
• Analysis – exponential rise and power-law fall
(Plots: lin-log and log-log)
Rise-part: SI -> exponential; SpikeM -> exponential
Fall-part: SI -> exponential; SpikeM -> power law
Tail-part forecasts
• SpikeM can capture the tail part
“What-if” forecasting
e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date
(Figure labels: (1) First spike; (2) Release date; (3) Two weeks before release)
SpikeM can forecast upcoming spikes
Conclusions for spikes
• Exp rise; PL decay
• ‘spikeM’ captures all patterns, with a few parameters
– And can do extrapolation
– And forecasting
Roadmap
• Graph problems:
– G1: Fraud detection – BP
– G2: Botnet detection – spectral
– G3: Beyond graphs: tensors and ‘NELL’
• Influence propagation and spike modeling
• Future research
• Conclusions
Challenge #1: Time evolving networks / tensors
• Periodicities? Burstiness?
• What is ‘typical’ behavior of a node, over time
• Heterogeneous graphs (= nodes w/ attributes)
Challenge #2: ‘Connectome’ – brain wiring
• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy
N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell
Thanks
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Project info: PEGASUS
www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus + hadoop + M45
Apache license
Code, papers, manual, video
Prof. U Kang, Prof. Polo Chau
Cast
Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tong, Hanghang
Prakash, Aditya
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
References
• Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006)
• Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174
• Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12, pp. 6-14, Beijing, China, August 2012
• Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383
Overall Conclusions
• G1: fraud detection
– BP: powerful method
– FaBP: faster; equally accurate; known convergence
• G2: botnets -> EigenSpokes
• G3: Subject-Verb-Object -> Tensors/GigaTensor
• Spikes: ‘spikeM’ (exp rise; PL drop)