Counting Triangles in Real-World Networks

Date post: 19-Mar-2016
Page 1: Counting Triangles in Real-World Networks

Charalampos (Babis) E. Tsourakakis

CSE 2011, Reno, 1st March '11

CSE'11   1  

Page 2: Counting Triangles in Real-World Networks

Geoff Sanders, Lawrence Livermore

Page 3: Counting Triangles in Real-World Networks

Gary L. Miller, SCS, CMU

Mihail N. Kolountzakis, Math, University of Crete

Page 4: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions

Page 5: Counting Triangles in Real-World Networks

[Figure: a triangle on vertices A, B, C]

Friends of friends tend to become friends themselves! (Wasserman, Faust '94)

[Photo, left to right: Paul Erdős, Ronald Graham, Fan Chung Graham]

Page 6: Counting Triangles in Real-World Networks

Eckmann-Moses, Uncovering the Hidden Thematic Structure of the Web (PNAS, 2001)

Key Idea: Connected regions of high curvature (i.e., dense in triangles) indicate a common topic!

Page 7: Counting Triangles in Real-World Networks

Triangles used for Web Spam Detection (Becchetti et al., KDD '08)

Key Idea: The triangle distribution among spam hosts is significantly different from that among non-spam hosts!

Page 8: Counting Triangles in Real-World Networks

Triangles used for assessing Content Quality in Social Networks

Welser, Gleave, Fisher, Smith, Journal of Social Structure, 2007

Key Claim: The number of triangles in the self-centered social network of a user is a good indicator of the role of that user in the community!

Page 9: Counting Triangles in Real-World Networks

     


Page 10: Counting Triangles in Real-World Networks

     

(Watts, Strogatz '98)

Page 11: Counting Triangles in Real-World Networks

• Signed triangles in structural balance theory (Jon Kleinberg)

• Triangle closing models are also used to model the microscopic evolution of social networks (Leskovec et al., KDD '08)

Page 12: Counting Triangles in Real-World Networks

• CAD applications: e.g., solving systems of geometric constraints involves triangle counting! (Fudos, Hoffmann 1997)

Page 13: Counting Triangles in Real-World Networks

Numerous other applications, including:
• Motif Detection / Frequent Subgraph Mining (e.g., Protein-Protein Interaction Networks)
• Community Detection (Berry et al. '09)
• Outlier Detection (CET '08)
• Link Recommendation

Fast triangle counting algorithms are necessary.

Page 14: Counting Triangles in Real-World Networks

• There is no good general definition of a real-world network, but typical characteristics include:
  ▪ Skewed degree distributions
  ▪ High clustering coefficients
  ▪ "Small world" characteristics (six degrees of separation)

Page 15: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions

Page 16: Counting Triangles in Real-World Networks

Alon, Yuster, Zwick

Asymptotically the fastest algorithm, but not practical for large graphs.

In practice, one of the iterator algorithms is preferred:
• Node Iterator (count the edges among the neighbors of each vertex)
• Edge Iterator (count the common neighbors of the endpoints of each edge)

Both run asymptotically in O(mn) time.
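
The two iterator methods can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the adjacency-set representation, the integer vertex labels, and the names `build_adjacency`, `node_iterator`, and `edge_iterator` are assumptions made for the example, and the graph is assumed to be simple (no self-loops, no repeated edges).

```python
from itertools import combinations


def build_adjacency(edges):
    """Build a dict mapping each vertex to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_iterator(adj):
    """Count the edges among the neighbors of each vertex."""
    closed_pairs = 0
    for neighbors in adj.values():
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                closed_pairs += 1
    # Every triangle is seen once from each of its three vertices.
    return closed_pairs // 3


def edge_iterator(adj):
    """Count the common neighbors of the endpoints of each edge."""
    common = 0
    for u, neighbors in adj.items():
        for v in neighbors:
            if u < v:  # visit each undirected edge exactly once
                common += len(adj[u] & adj[v])
    # Every triangle is seen once from each of its three edges.
    return common // 3


# Complete graph on 4 vertices: it contains 4 triangles.
k4 = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(node_iterator(k4), edge_iterator(k4))
```

Both functions return the same count; they differ only in whether the outer loop runs over vertices or over edges.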

Page 17: Counting Triangles in Real-World Networks

Remarks

• The idea of partitioning the vertices into "large" and "small" degree and treating them appropriately appears in Alon, Yuster, Zwick.

• For more work, see the references in our paper:
  ▪ Itai, Rodeh (STOC '77)
  ▪ Papadimitriou, Yannakakis (IPL '81)
  ……

Page 18: Counting Triangles in Real-World Networks

• r independent samples of three distinct vertices

Then the following holds:

with probability at least 1-δ

Works for dense graphs, e.g., T3 on the order of n^2 log n
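
The estimator itself is not preserved in this transcript. A common form of triple sampling, assumed here for illustration, draws r triples of distinct vertices uniformly at random and scales the fraction that form triangles by the number of possible triples, C(n, 3). The function name and the adjacency-set input are hypothetical.

```python
import random
from math import comb


def triple_sampling_estimate(adj, r, seed=0):
    """Estimate the triangle count from r uniformly sampled vertex triples.

    adj maps each vertex to the set of its neighbors (assumed representation).
    """
    rng = random.Random(seed)
    vertices = list(adj)
    hits = 0
    for _ in range(r):
        a, b, c = rng.sample(vertices, 3)  # three distinct vertices
        if b in adj[a] and c in adj[a] and c in adj[b]:
            hits += 1
    # Scale the observed triangle fraction by the number of possible triples.
    return hits / r * comb(len(vertices), 3)


# In a complete graph every sampled triple is a triangle.
k5 = {v: {u for u in range(5) if u != v} for v in range(5)}
print(triple_sampling_estimate(k5, r=100))
```

In a sparse graph almost every sampled triple misses, which is consistent with the remark that the approach works for dense graphs.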

Page 19: Counting Triangles in Real-World Networks

• (Bar-Yossef, Kumar, Sivakumar '02) require n^2/polylog n edges

• More follow-up work:
  ▪ (Jowhari, Ghodsi '05)
  ▪ (Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler '06)
  ▪ (Becchetti, Boldi, Castillo, Gionis '08)

Page 20: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions

Page 21: Counting Triangles in Real-World Networks

[Formula annotations: eigenvalues of adjacency matrix; i-th eigenvector]

Key Idea: A few top eigenvalue-eigenvector pairs typically give a good approximation to the number of triangles.

CET [ICDM '08]
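
The formulas on this slide did not survive extraction. The sketch below relies on the standard identity that the number of triangles of an undirected simple graph equals trace(A^3)/6, i.e., one sixth of the sum of the cubed eigenvalues of the adjacency matrix A, and truncates that sum to the k eigenvalues of largest magnitude. It assumes NumPy and a dense symmetric 0/1 matrix, and uses a full eigendecomposition for brevity where a Lanczos-type solver would compute only the top pairs. The function names are hypothetical.

```python
import numpy as np


def eigen_triangle(A, k):
    """Estimate the triangle count from the k largest-magnitude eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(A)  # A is symmetric, so these are real
    order = np.argsort(-np.abs(eigenvalues))
    top = eigenvalues[order[:k]]
    return float(np.sum(top ** 3) / 6.0)


def exact_triangles(A):
    """Exact count via trace(A^3) / 6; only sensible for small dense matrices."""
    return float(np.trace(A @ A @ A) / 6.0)


# Complete graph on 4 vertices: eigenvalues are 3, -1, -1, -1.
A = np.ones((4, 4)) - np.eye(4)
print(exact_triangles(A), eigen_triangle(A, 1), eigen_triangle(A, 4))
```

With k equal to the number of vertices the estimate coincides with the exact count up to floating-point error; smaller k trades accuracy for speed.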

Page 22: Counting Triangles in Real-World Networks

Keep only the top 3 eigenvalues!

Political Blogs Network (1.2K, 17K) (Adamic, Glance '04)

Page 23: Counting Triangles in Real-World Networks

• The few top eigenvalues are significantly larger than the bulk of the eigenvalues ("Eigenvalue power law").

• Hence, they contribute a lot to the number of triangles, and cubing amplifies this even more.

• The bulk of the eigenvalues is almost symmetrically distributed around 0, so their cubes cancel out.

• The Lanczos method converges fast due to large eigengaps.

Page 24: Counting Triangles in Real-World Networks

Political Blogs Network (1.2K, 17K) (Adamic, Glance '04)

Pearson's correlation coefficient ρ=0.9997 using a rank 10 approximation

Page 25: Counting Triangles in Real-World Networks

Note: a rank 3 approximation already gives almost perfect results.

Page 26: Counting Triangles in Real-World Networks

Key idea (CET [KAIS '11]):

• Sample the i-th column A(i) of the adjacency matrix with probability proportional to the degree of the i-th vertex and scale it "appropriately".

• Compute a low rank approximation of the sampled matrix using SVD.

Page 27: Counting Triangles in Real-World Networks

• Observation 1: Eigendecomposition <-> SVD when the matrix is symmetric, i.e., eigenvectors = left singular vectors and λi = σi sgn(ui·vi), where λi and σi are the i-th eigenvalue and singular value, and ui and vi are the corresponding left and right singular vectors.

• Observation 2: We care about a rank-k approximation Ak of A, where k is small.
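
Observation 1 can be checked numerically. The sketch below, assuming NumPy and a small dense symmetric matrix, recovers signed eigenvalues from the SVD via λi = σi sgn(ui·vi). The function name is hypothetical. One caveat: if both λ and -λ are eigenvalues, the singular value is repeated and the singular vectors returned by the solver need not be eigenvectors, so the recovered signs are not reliable in that case.

```python
import numpy as np


def eigenvalues_from_svd(A, k):
    """Recover the k largest-magnitude eigenvalues of symmetric A from its SVD."""
    U, s, Vt = np.linalg.svd(A)
    # u_i . v_i is close to +1 for a positive eigenvalue, -1 for a negative one.
    dots = np.sum(U[:, :k] * Vt[:k, :].T, axis=0)
    return s[:k] * np.sign(dots)


# Complete graph on 4 vertices: eigenvalues are 3, -1, -1, -1.
A = np.ones((4, 4)) - np.eye(4)
signed = eigenvalues_from_svd(A, 4)
print(np.sort(signed), np.sum(signed ** 3) / 6.0)
```

Summing the cubes of the recovered values and dividing by 6 gives the same triangle estimate as working with the eigenvalues directly.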

Page 28: Counting Triangles in Real-World Networks

• Frieze, Kannan, Vempala:
  (1) Pick column i with probability proportional to its squared length.
  (2) Use the sampled matrix to obtain a good low rank approximation to the original one.

• Idea: Sample c columns, obtain Ã, and find Ãk instead of the optimal Ak. Recover the signs from the left and right singular vectors. Use EigenTriangle!

• Results: c=100, k=6 for Flickr (400k, 2M): 95.6% accuracy
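
A minimal sketch of the column-sampling step, assuming NumPy and a dense matrix. The scaling 1/sqrt(c·p_i) is the usual choice that makes the sampled matrix C satisfy E[C Cᵀ] = A Aᵀ; the slides only say the columns are scaled "appropriately", so this is an assumption, and the function names are hypothetical. For a 0/1 adjacency matrix the squared length of column i equals the degree of vertex i, which matches the degree-proportional sampling described earlier.

```python
import numpy as np


def sample_columns(A, c, rng):
    """Sample c columns with probability proportional to squared length."""
    squared_lengths = np.sum(A * A, axis=0)
    p = squared_lengths / squared_lengths.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    # Rescale each picked column (assumed scaling, see the note above).
    return A[:, idx] / np.sqrt(c * p[idx])


def approx_left_singular_vectors(A, c, k, seed=0):
    """Top-k left singular vectors and values of the sampled matrix."""
    rng = np.random.default_rng(seed)
    C = sample_columns(A, c, rng)
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    return U[:, :k], s[:k]


A = np.ones((5, 5)) - np.eye(5)  # complete graph on 5 vertices
U_k, s_k = approx_left_singular_vectors(A, c=3, k=2)
print(U_k.shape, s_k)
```

With this scaling the sampled matrix has the same Frobenius norm as A, whichever columns are drawn.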

Page 29: Counting Triangles in Real-World Networks

• Success is based on empirical properties: real-world networks typically, but not always, satisfy the properties shown before.

• Very little is known about the spectrum; most of what we know concerns the top eigenvalues.

• Even less is known about the eigenvectors of real-world networks.

Page 30: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Results, Conclusions

Page 31: Counting Triangles in Real-World Networks

• Approximate a given graph G with a sparse graph H, such that H is close to G under a certain notion.

• Examples:
  ▪ Cut-preserving sparsifiers (Benczúr-Karger)
  ▪ Spectral sparsifiers (Spielman-Teng)

What about Triangle Sparsifiers?

Page 32: Counting Triangles in Real-World Networks

     


Page 33: Counting Triangles in Real-World Networks

• Speedup: e.g., 1/p^2 if we use any standard iterator method.

• Setting p optimally using the "median boosting trick" (Jerrum, Valiant, Vazirani '86).

• Sampling in expected sublinear time O(pm).

• Can justify even O(n) speedups in graphs with sufficiently many triangles.

• Practice: huge speedups, high accuracy.
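
The slide that defines the sparsifier is not preserved in this transcript. The sketch below assumes the simplest edge-sampling scheme consistent with the bullets above: keep each edge independently with probability p, count triangles in the sparsified graph, and multiply by 1/p^3, since a triangle survives only if all three of its edges do. The function names are hypothetical, and the edge list is assumed to contain each undirected edge exactly once.

```python
import random


def count_triangles(edges):
    """Exact count with the edge-iterator method (each edge listed once)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = sum(len(adj[u] & adj[v]) for u, v in edges)
    return total // 3  # every triangle is seen once per edge


def sparsified_estimate(edges, p, seed=0):
    """Keep each edge with probability p, count, and rescale by 1/p^3."""
    rng = random.Random(seed)
    kept = [edge for edge in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# Complete graph on 30 vertices: C(30, 3) = 4060 triangles.
edges = [(i, j) for i in range(30) for j in range(i + 1, 30)]
print(count_triangles(edges), sparsified_estimate(edges, p=0.5))
```

The iterator then runs on roughly pm edges instead of m, which is where the speedup comes from; the estimate is unbiased but its variance grows as p shrinks.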

Page 34: Counting Triangles in Real-World Networks

McSherry, Achlioptas:
Sparsify the matrix A appropriately, then compute faster a low rank approximation which is "good" in terms of any reasonable norm (e.g., Frobenius, 2-norm).

CET et al. [ASONAM '09]: Speeds up spectral computations while not affecting triangle estimates.

MACH: Fast Randomized Tensor Decompositions (CET, SDM '10): Theoretical guarantees on HOSVD decompositions for dense tensors; works great in practice for Tucker decompositions too.

Page 35: Counting Triangles in Real-World Networks

Theorem: If […] then, with probability 1-1/n^(3-d), the sampled graph has a triangle count that ε-approximates the true number of triangles, for any 0<d<3.

Page 36: Counting Triangles in Real-World Networks

Every graph on n vertices with max. degree Δ(G) = k is (k+1)-colorable with all color classes differing in size by at most 1.

[Figure: color classes 1, 2, …, k+1]

Page 37: Counting Triangles in Real-World Networks

• Create an auxiliary graph where each triangle is a vertex and two vertices are connected iff the corresponding triangles share an edge.

• Observe: Δ(G) = O(n)

• Invoke the Hajnal-Szemerédi theorem and apply a Chernoff bound to each chromatic class. Finally, take a union bound. Q.E.D.

Page 38: Counting Triangles in Real-World Networks

     

Kolountzakis, Miller, Peng, CET, Int. Math. '11

Page 39: Counting Triangles in Real-World Networks

     


Page 40: Counting Triangles in Real-World Networks

     

Among graphs G with n vertices and m edges, which one maximizes the number of edges in the line graph L(G)?

Page 41: Counting Triangles in Real-World Networks

     


Page 42: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions

Page 43: Counting Triangles in Real-World Networks


Page 44: Counting Triangles in Real-World Networks

LiveJournal (5.4M, 48M)
Orkut (3.1M, 117M)
Web-EDU (9.9M, 46.3M)
YouTube (1.2M, 3M)
Flickr (1.9M, 15.6M)

Page 45: Counting Triangles in Real-World Networks

Social networks are abundant in triangles!

Page 46: Counting Triangles in Real-World Networks

[Figure: bar chart of running time in secs (axis 0 to 250) on Orkut, Flickr, Livejournal, Wiki-2006, and Wiki-2007 for the Exact, Triple Sampling, and Hybrid methods]

Accuracy ~99%

Page 47: Counting Triangles in Real-World Networks

• p was set to 0.1. More sophisticated techniques for setting p, using a doubling procedure, exist (CET, Kolountzakis, Miller).

• From our results, there is no clear winner, but the hybrid algorithm achieves both high accuracy and speed.

• Our code, even our exact algorithm, outperforms the fastest approximate counting competitors' code; hence we compared different versions of our own code!

Page 48: Counting Triangles in Real-World Networks

Outline: Motivation, Existing Work, Spectral Family, Combinatorial Family, Experimental Results, Conclusions

Page 49: Counting Triangles in Real-World Networks

• Real-world graphs thought of as "planar graphs": many problems can be solved more efficiently than in the general case.

• Spectral algorithm designed based on empirical, special spectral properties.

• Triangle Sparsifiers (fast, with strong theoretical guarantees).

Page 50: Counting Triangles in Real-World Networks

• "Interplay": combinatorial-spectral approach

• MACH for HOSVD

• Degree-based partitioning is a very practical "trick"

• State-of-the-art results for sampling-based and semi-streaming triangle counting algorithms

Page 51: Counting Triangles in Real-World Networks

• Triangles in Kronecker graphs [CET ICDM '08]
• Triangle Power Laws [CET ICDM '08]
• Random projections and counting triangles [Kolountzakis, Miller, Peng, CET '11]
• Semi-streaming model with low space usage and only 3 passes over the graph stream [Kolountzakis, Miller, Peng, CET '11]
• MapReduce implementation [CET et al., KDD '09]
• High quality code with optimized cache properties

Page 52: Counting Triangles in Real-World Networks

Remove edge (1,2)

Remove any weighted edge, with w sufficiently large

Spielman-Srivastava and Benczúr-Karger sparsifiers also don't work!

Page 53: Counting Triangles in Real-World Networks

THANK  YOU!  

QUESTIONS  


Page 54: Counting Triangles in Real-World Networks


Page 55: Counting Triangles in Real-World Networks


             621,963,073  

Page 56: Counting Triangles in Real-World Networks

Best method for our applications: best running time, high accuracy

Hybrid vs. Naïve Sampling: improves accuracy, increases running time

