
   

Computing Approximate b-Matchings in Large Graphs and an Application to k-Anonymity

Arif Khan
Adviser: Prof. Alex Pothen

Department of Computer Science, Purdue University

       

Problem Definition

Abstract  

Given a graph, the b-Matching problem is to find an edge-weighted matching of maximum weight under the constraint that every vertex v can match with at most b vertices. b-Matching is useful in various machine learning problems such as classification, spectral clustering, graph sparsification, graph embedding, and data privacy. The exact algorithms for this problem have high time and space complexities and are inherently sequential, so they are not practical on large problems. We propose a 1/2-approximation algorithm, which we call bSuitor, that runs in time linear in the number of edges and requires linear storage. We show that our algorithm can solve large problems with billions of edges and can obtain up to 97% of the weight of the optimal solution. We also show that our algorithm achieves speedups of up to 11x on 16 cores of Intel Xeon machines and up to 50x on 60 cores of Intel Xeon Phi machines.

References  

   

b-Suitor Algorithm

Experiments and Results

Motivation

The fastest exact algorithms for maximum edge-weighted b-Matching have time complexity O(|V|^(1/2) |E|). Therefore, it is not practical to use these algorithms to solve larger problems. It turns out that b-Matching has practical use in many machine learning applications where approximate solutions suffice. Therefore, a good approximation algorithm can be used instead of an exact algorithm. The approximation algorithm also has the benefit of being highly scalable. b-Matching has been shown to be useful in several applications: i) classification, ii) spectral clustering, iii) graph embedding, iv) graph sparsification, and v) data privacy, as in the k-Anonymity problem.

Application to k-Anonymity Privacy Problem

Contributions and Future Work

• We have shown that the bSuitor algorithm is faster than the other algorithms for approximate b-Matching that we compared it against. We also show that this algorithm demonstrates near-linear scalability on both Xeon and Xeon Phi multiprocessors.
• We identified an important application of bSuitor to a privacy problem called k-Anonymity.
• By using bSuitor, we can solve problems that are larger by a factor of 100 than those that could be solved before, without a significant change in the quality of the solution.
• Our goal is to continue developing faster b-Matching algorithms.
• We also plan to apply our algorithm to other contexts such as graph clustering and partitioning.

Consider an undirected graph G(V, E, w) with vertex set V, edge set E, and weight function w(e) >= 0 for each e ∈ E, and a function f : V → Z+ assigning non-negative integers to the vertices. (We assume without loss of generality that f(v) is less than or equal to the degree of the vertex v.) Then a b-matching on G is a subset of edges M of E such that every vertex v ∈ V has at most f(v) edges in M incident on it. The values f(v) may be the same for every vertex or may differ. The usual notion of matching has f(v) = 1 for all v, and we will call it a 1-matching. If every vertex v is required to have exactly f(v) edges of M incident on it, we call it a perfect b-matching. A maximum cardinality b-matching is a b-matching such that |M| is as large as possible. A maximum weight b-matching is a b-matching such that the total weight of the matched edges is as large as possible.
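The definition above can be checked mechanically. The following is a minimal sketch, not code from the poster: the function names `is_b_matching` and `matching_weight`, the representation of edges as `(u, v)` tuples, and the example graph are all assumptions made for illustration.

```python
from collections import Counter

def is_b_matching(matching, edges, f):
    """Check the definition: M is a subset of E and every vertex v
    has at most f[v] edges of M incident on it."""
    if not set(matching) <= set(edges):
        return False
    degree = Counter()
    for u, v in matching:
        degree[u] += 1
        degree[v] += 1
    return all(degree[v] <= f[v] for v in degree)

def matching_weight(matching, w):
    """Total weight of the matched edges."""
    return sum(w[e] for e in matching)

# Hypothetical example: the path a - b - c - d with edge weights 2, 3, 2.
w = {("a", "b"): 2.0, ("b", "c"): 3.0, ("c", "d"): 2.0}
edges = set(w)
f = {v: 1 for v in "abcd"}            # f(v) = 1 for all v: a 1-matching

M = {("a", "b"), ("c", "d")}
print(is_b_matching(M, edges, f), matching_weight(M, w))
```

With f(v) = 1 everywhere, the two outer edges form a valid matching, while the pair of edges sharing vertex b does not; raising f(b) to 2 makes the latter valid.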

• We apply our algorithm to solve the k-Anonymity privacy problem.
• We show that using approximate matching instead of exact matching makes the algorithm faster by two orders of magnitude (Table 1).

Boy
• I want to be your Suitor...

Girl
• Let me think...
• Are you better than my current Suitor?

Yes, he is: Bye bye, current Suitor. You're my new Suitor.

No, he is not...
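The proposal-and-displacement rule in the cartoon can be sketched in code. This is a sequential illustration under stated assumptions, not the authors' implementation: the function name `b_suitor`, the edge-list input, the per-vertex min-heap of current suitors, and the tie-breaking by vertex id are all choices made here for illustration. Each vertex proposes to its neighbors from heaviest to lightest; a full vertex accepts a proposal only if it beats its weakest current suitor, who is then displaced and must propose again.

```python
import heapq
from collections import defaultdict

def b_suitor(edges, b):
    """edges: iterable of (u, v, weight); b: dict vertex -> capacity.
    Returns matched edges as a set of (min(u, v), max(u, v)) tuples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Each vertex scans its neighbors once, heaviest first.
    order = {u: sorted(nbrs, key=lambda v: (nbrs[v], v), reverse=True)
             for u, nbrs in adj.items()}
    nxt = {u: 0 for u in adj}          # next neighbor each vertex will try
    suitors = {u: [] for u in adj}     # min-heap of (weight, proposer)

    # One work item per partner a vertex still has to find.
    work = [u for u in adj for _ in range(b[u])]
    while work:
        u = work.pop()
        while nxt[u] < len(order[u]):
            v = order[u][nxt[u]]
            nxt[u] += 1
            if b[v] == 0:
                continue
            offer = (adj[u][v], u)
            if len(suitors[v]) < b[v]:
                heapq.heappush(suitors[v], offer)    # v has room
                break
            if offer > suitors[v][0]:
                # "Bye bye, current Suitor": the weakest suitor is displaced.
                _, displaced = heapq.heapreplace(suitors[v], offer)
                work.append(displaced)
                break
            # "No, he is not": u moves on to its next neighbor.

    held = {v: {p for _, p in heap} for v, heap in suitors.items()}
    # Keep only edges on which both endpoints hold each other as suitors.
    return {(min(u, v), max(u, v))
            for v in held for u in held[v] if v in held[u]}

# Hypothetical example: vertex 1 may take two partners, the others one.
example = [(1, 2, 5.0), (1, 3, 4.0), (1, 4, 3.0), (2, 3, 2.0)]
print(b_suitor(example, {1: 2, 2: 1, 3: 1, 4: 1}))
```

Because the weakest accepted weight at a vertex never decreases, a rejected or displaced proposer never needs to revisit a neighbor, which is why one forward-moving pointer per vertex suffices in this sketch. Returning only mutually held edges keeps the result a valid b-matching even when weights tie.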

Acknowledgements  

Figure 1: Quality of the Approximation

Figure 2: Relative runtimes with other algorithms

Figure 4: Strong Scaling on Intel Xeon Phi with 60 cores, normalized by the time of 1 core (4 threads)

• F. Manne and M. Halappanavar, "New effective multithreaded matching algorithms," Proceedings of IPDPS 2014, to appear.

• A. Khan, A. Pothen, F. Manne and M. Halappanavar, "Computing Approximate b-Matchings," SIAM Workshop on CSC, Lyon, July 2014.

• J. Mestre, "Greedy in approximation algorithms," in Algorithms - ESA 2006, Lecture Notes in Computer Science, vol. 4168. Springer, 2006, pp. 528-539.

• B. C. Huang and T. Jebara, "Fast b-matching via sufficient selection belief propagation," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, ser. JMLR Proceedings, vol. 15. 2011, pp. 361-369.

• H. N. Gabow and R. E. Tarjan, "Faster scaling algorithms for network problems," SIAM Journal on Computing, vol. 18, no. 5, pp. 1013-1036, 1989.

• K. Choromanski, T. Jebara and K. Tang, "Adaptive Anonymity via b-Matching," Neural Information Processing Systems (NIPS), December 2013.

We also acknowledge the support of Fredrik Manne, Md. Mostofa Ali Patwary, Nadathur Satish and Narayan Sundaram. For our experiments we used the Purdue Community Cluster Conte. Each compute node contains two Intel® Xeon®¹ E5-2670 processors running at 2.60 GHz (16 cores in all). Each node also has an Intel® Xeon Phi™¹ coprocessor running at 1.1 GHz (61 cores in all).

¹ Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

Figure  3:  Strong  Scaling  on  Intel  Xeon  with  16  Cores  


Problems      Instances   Exact (sec)   Approx. (sec)   Speed up
Caltech36           768           854              10         85
Reed98              962         1,358              18         75
Haverford76       1,446         5,649              40        141
Simmons81         1,518         4,226              43         98

• We reduce the overall memory complexity of the k-Anonymity problem from quadratic to linear in the number of data points by using partially sorted adjacency lists in bSuitor.

• This enables us to solve k-Anonymity problems that are two orders of magnitude larger than previously reported.

Table 1: Comparing single-thread run times of the k-Anonymity problem using exact b-Matching and bSuitor.

Problems        Instances   Xeon (16 Cores)   Xeon Phi (240 Cores)   Speed up
UCI_Adult          32,561             21.85                   9.65       2.27
USCensus1990       55,285            111.17                  54.96       2.02
Poker_hands       100,000            268.67                 140.94       1.91

Table 2: Comparing the run times (seconds) of the bSuitor-based k-Anonymity algorithm on large problems.
