Computing Approximate b-Matchings in Large Graphs and an Application to k-Anonymity

Arif Khan
Adviser: Prof. Alex Pothen

Department of Computer Science, Purdue University

       

Abstract

Given a graph with edge weights, the b-Matching problem is to find a matching of maximum weight subject to the constraint that every vertex v is matched with at most b vertices. b-Matching is useful in various machine learning problems such as classification, spectral clustering, graph sparsification, graph embedding, and data privacy. The exact algorithms for this problem have high time and space complexities and are inherently sequential, and are therefore not practical for large problems. We propose a 1/2-approximation algorithm, which we call bSuitor, that runs in time linear in the number of edges and requires linear storage. We show that our algorithm can solve large problems with billions of edges and obtains up to 97% of the weight of the optimal solution. We also show that it achieves speed-ups of up to 11x on 16 cores of Intel Xeon machines and up to 50x on 60 cores of Intel Xeon Phi machines.

Motivation

The fastest exact algorithms for maximum edge-weighted b-Matching have time complexity O(|V|^{1/2} |E|). It is therefore not practical to use these algorithms to solve large problems. However, b-Matching is used in many machine learning applications where approximate solutions suffice, so a good approximation algorithm can be used in place of an exact one. An approximation algorithm also has the benefit of being highly scalable. b-Matching has been shown to be useful in several applications: (i) classification, (ii) spectral clustering, (iii) graph embedding, (iv) graph sparsification, and (v) data privacy, as in the k-Anonymity problem.

Contributions and Future Work

• We have shown that bSuitor is faster than the other approximate b-Matching algorithms we compared it with. We also show that it demonstrates near-linear scalability on both Xeon and Xeon Phi multiprocessors.
• We identified an important application of bSuitor to a privacy problem called k-Anonymity.
• By using bSuitor, we can solve problems that are larger by a factor of 100 than those that could be solved before, without a significant change in the quality of the solution.
• Our goal is to continue developing faster b-Matching algorithms.
• We also plan to apply our algorithm in other contexts, such as graph clustering and partitioning.

Problem Definition

Consider an undirected graph G(V, E, w) with vertex set V, edge set E, and weight function w(e) ≥ 0 for each e ∈ E, and a function f : V → Z+ assigning non-negative integers to the vertices. (We assume without loss of generality that f(v) is less than or equal to the degree of the vertex v.) Then a b-matching on G is a subset of edges M of E such that every vertex v ∈ V has at most f(v) edges in M incident on it. The values f(v) may be the same for all vertices or may differ from vertex to vertex. The usual notion of matching has f(v) = 1 for all v, and we call it a 1-matching. If every vertex v is required to have exactly f(v) edges of M incident on it, we call M a perfect b-matching. A maximum cardinality b-matching is a b-matching such that |M| is as large as possible. A maximum weight b-matching is a b-matching such that the total weight of the matched edges is as large as possible.
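The definition above can be checked mechanically. The following Python sketch is only an illustration of the definition: the function names `is_b_matching` and `matching_weight`, and the choice to store the graph as a dictionary from edges to weights, are our own assumptions and do not come from the poster.

```python
from collections import Counter

def is_b_matching(edges, M, f):
    """Check the b-matching condition from the definition.

    edges: dict mapping an edge (u, v) to its weight w(e) >= 0.
    M:     set of edges, each of which must be a key of `edges`.
    f:     dict mapping every vertex v to its bound f(v).
    """
    degree = Counter()
    for (u, v) in M:
        if (u, v) not in edges:
            return False            # M must be a subset of E
        degree[u] += 1
        degree[v] += 1
    # Every vertex may have at most f(v) edges of M incident on it.
    return all(degree[v] <= f[v] for v in degree)

def matching_weight(edges, M):
    """Total weight of the matched edges."""
    return sum(edges[e] for e in M)
```

For example, on the path 0-1-2-3 with weights 2, 3, 2 and f(v) = 1 for every vertex, the set {(0, 1), (2, 3)} is a 1-matching of weight 4, while {(0, 1), (1, 2)} is not a 1-matching because vertex 1 would have two matched edges.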

Application to k-Anonymity Privacy Problem

• We apply our algorithm to solve the k-Anonymity privacy problem.
• We show that using approximate matching instead of exact matching makes the algorithm faster by about two orders of magnitude (Table 1).

b-Suitor Algorithm

Boy: I want to be your Suitor...
Girl: Let me think... Are you better than my current Suitor?
• Yes, he is: Bye bye, current Suitor. You're my new Suitor.
• No, he is not...
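The exchange above can be read as the basic step of a suitor-style algorithm: a vertex proposes to a neighbor, the neighbor accepts only if the offer beats its weakest current suitor, and a displaced suitor has to look for another partner. The poster gives no pseudocode, so the Python below is a minimal sequential sketch under our own assumptions: vertices are integers, the graph is a symmetric adjacency dictionary, ties are broken by vertex id, and the names `b_suitor`, `suitors`, `proposed`, and `pending` are hypothetical. It is not the authors' parallel implementation, and because it rescans a vertex's whole adjacency list for each proposal it does not attain the linear running time claimed for bSuitor.

```python
from collections import deque

def b_suitor(adj, b):
    """Sequential suitor-style sketch.

    adj: dict mapping each vertex (an int) to {neighbor: weight}, symmetric.
    b:   dict mapping each vertex to its maximum number of partners.
    Returns a set of matched edges (u, v) with u < v.
    """
    suitors = {v: {} for v in adj}       # suitors[v][u] = weight of u's proposal to v
    proposed = {v: set() for v in adj}   # vertices that v currently proposes to

    def weakest(v):
        # The suitor a newcomer must beat, or None while v still has a free slot.
        if len(suitors[v]) < b[v]:
            return None
        return min(suitors[v], key=lambda x: (suitors[v][x], x))

    # One pending entry per proposal that a vertex may still make.
    pending = deque(u for u in adj for _ in range(b[u]))
    while pending:
        u = pending.popleft()
        best = None
        for v, w in adj[u].items():
            if v in proposed[u] or b[v] == 0:
                continue
            x = weakest(v)
            # "Are you better than my current suitor?" (ties broken by vertex id)
            if x is not None and (w, u) <= (suitors[v][x], x):
                continue
            if best is None or (w, v) > (adj[u][best], best):
                best = v
        if best is None:
            continue                      # nobody would accept u for this slot
        x = weakest(best)
        if x is not None:                 # "Bye bye, current suitor"
            del suitors[best][x]
            proposed[x].discard(best)
            pending.append(x)             # the displaced suitor searches again
        suitors[best][u] = adj[u][best]   # "You're my new suitor"
        proposed[u].add(best)

    # Keep the edges on which both endpoints propose to each other.
    return {(min(u, v), max(u, v))
            for v in suitors for u in suitors[v] if v in suitors[u]}


path = {0: {1: 2}, 1: {0: 2, 2: 3}, 2: {1: 3, 3: 2}, 3: {2: 2}}
print(b_suitor(path, {v: 1 for v in path}))  # {(1, 2)}
```

On this path with b = 1 the sketch returns the single edge (1, 2) of weight 3, whereas the optimal matching {(0, 1), (2, 3)} has weight 4, which is consistent with a 1/2-approximation guarantee. With b = 2 on the two inner vertices, all three edges are matched.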

Experiments and Results

Figure 1: Quality of the Approximation

Figure 2: Relative run times with other algorithms

Figure 4: Strong Scaling on Intel Xeon Phi with 60 cores, normalized by the time of 1 core (4 threads)


Acknowledgements

We acknowledge the support of Fredrik Manne, Md. Mostofa Ali Patwary, Nadathur Satish, and Narayan Sundaram. For our experiments we used the Purdue Community Cluster Conte. Each compute node contains two Intel Xeon E5-2670 processors running at 2.60 GHz (16 cores in all). Each node also has an Intel Xeon Phi coprocessor running at 1.1 GHz (61 cores in all).

Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.

Figure 3: Strong Scaling on Intel Xeon with 16 Cores

Problem       Instances   Exact (sec)   Approx. (sec)   Speed-up
Caltech36           768           854              10         85
Reed98              962         1,358              18         75
Haverford76       1,446         5,649              40        141
Simmons81         1,518         4,226              43         98

Table 1: Comparing single-thread run times of the k-Anonymity problem using exact b-Matching and bSuitor.

• We reduce the overall memory complexity of the k-Anonymity problem from quadratic to linear in the number of data points by using partially sorted adjacency lists in bSuitor.
• This enables us to solve k-Anonymity problems that are two orders of magnitude larger than previously reported.
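One way to picture the memory saving from partially sorted adjacency lists is to keep, for each data point, only its few heaviest candidate neighbors instead of the full quadratic table of pairwise weights. The Python sketch below illustrates that idea with a partial sort from the standard library. It is an assumption-laden illustration and not the authors' implementation: the function `top_neighbors`, the list length `m`, and the `matching_attributes` similarity are all hypothetical choices made for the example.

```python
import heapq

def matching_attributes(p, q):
    """Hypothetical similarity: number of attribute positions where p and q agree."""
    return sum(1 for a, c in zip(p, q) if a == c)

def top_neighbors(points, m, similarity=matching_attributes):
    """For each point index i, keep only its m most similar other points.

    Returns {i: [(weight, j), ...]} with each list sorted by decreasing weight.
    Storage is proportional to len(points) * m rather than len(points) ** 2.
    """
    lists = {}
    for i, p in enumerate(points):
        # Generator: the full row of pairwise weights is never stored.
        candidates = ((similarity(p, q), j)
                      for j, q in enumerate(points) if j != i)
        lists[i] = heapq.nlargest(m, candidates)
    return lists
```

Note that this sketch still evaluates every pair of points once per row, so it reduces storage only, not the number of similarity computations.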

Problem        Instances   Xeon (16 Cores)   Xeon Phi (240 Cores)   Speed-up
UCI_Adult         32,561             21.85                   9.65       2.27
USCensus1990      55,285            111.17                  54.96       2.02
Poker_hands      100,000            268.67                 140.94       1.91

Table 2: Comparing the run times (seconds) of the bSuitor-based k-Anonymity algorithm on large problems.
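The speed-up columns in Tables 1 and 2 appear to be the ratio of the two run-time columns. The short Python check below recomputes them from the reported values; the names `TABLE_1`, `TABLE_2`, `speed_up`, and `check` are ours. Small differences, such as 21.85 / 9.65 ≈ 2.26 against the reported 2.27, presumably come from rounding of the printed run times.

```python
# Rows copied from Tables 1 and 2:
# (problem, slower run time in seconds, faster run time in seconds, reported speed-up)
TABLE_1 = [
    ("Caltech36", 854, 10, 85),
    ("Reed98", 1358, 18, 75),
    ("Haverford76", 5649, 40, 141),
    ("Simmons81", 4226, 43, 98),
]
TABLE_2 = [
    ("UCI_Adult", 21.85, 9.65, 2.27),
    ("USCensus1990", 111.17, 54.96, 2.02),
    ("Poker_hands", 268.67, 140.94, 1.91),
]

def speed_up(slow, fast):
    """Speed-up as the ratio of the slower to the faster run time."""
    return slow / fast

def check(rows, rel_tol=0.01):
    """True if every reported speed-up is within rel_tol of the recomputed ratio."""
    return all(abs(speed_up(s, f) - r) <= rel_tol * r for _, s, f, r in rows)

if __name__ == "__main__":
    for name, slow, fast, reported in TABLE_1 + TABLE_2:
        print(f"{name}: computed {speed_up(slow, fast):.2f}, reported {reported}")
```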
