Computing Approximate b-Matchings in Large Graphs and an Application to k-Anonymity
Arif Khan Adviser: Prof. Alex Pothen
Department of Computer Science, Purdue University
Problem Definition
Abstract
Given a graph, the b-Matching problem is to find an edge-weighted matching of maximum weight subject to the constraint that every vertex v can be matched with at most b vertices. b-Matching is useful in various machine learning problems such as classification, spectral clustering, graph sparsification, graph embedding, and data privacy. The exact algorithms for this problem have high time and space complexities and are inherently sequential, and are therefore not practical for large problems. We propose a 1/2-approximation algorithm, which we call bSuitor, that runs in time linear in the number of edges and requires linear storage. We show that our algorithm can solve large problems with billions of edges and can obtain up to 97% of the weight of the optimal solution. We also show that our algorithm achieves speedups of up to 11x on 16 cores of Intel Xeon machines and up to 50x on 60 cores of Intel Xeon Phi machines.
References
b-Suitor Algorithm
Experiments and Results
Motivation
The fastest exact algorithms for maximum edge-weighted b-Matching have a time complexity of O(|V|^(1/2) |E|). Therefore, it is not practical to use these algorithms to solve large problems. b-Matching is useful in many machine learning applications where approximate solutions suffice, so a good approximation algorithm can be used instead of an exact one. The approximation algorithm also has the benefit of being highly scalable. b-Matching has been shown to be useful in several applications: i) classification, ii) spectral clustering, iii) graph embedding, iv) graph sparsification, and v) data privacy, as in the k-Anonymity problem.
Application to k-Anonymity Privacy Problem
Contributions and Future Work
• We have shown that bSuitor is faster than the other approximate b-Matching algorithms we compared it against. We also show that it demonstrates near-linear scalability on both Xeon and Xeon Phi multiprocessors.
• We identified an important application of bSuitor to a privacy problem called k-Anonymity.
• By using bSuitor, we can solve problems 100 times larger than could be solved before, without a significant change in the quality of the solution.
• Our goal is to continue developing faster b-Matching algorithms.
• We also plan to apply our algorithm to other contexts such as graph clustering and partitioning.
Consider an undirected graph G(V, E, w) with vertex set V, edge set E, and weight function w(e) >= 0 for each e ∈ E, and a function f : V → Z+ assigning non-negative integers to the vertices. (We assume without loss of generality that f(v) is less than or equal to the degree of the vertex v.) Then a b-matching on G is a subset of edges M of E such that every vertex v ∈ V has at most f(v) edges in M incident on it. The values f(v) may be the same for all vertices or may differ. The usual notion of matching has f(v) = 1 for all v, and we call it a 1-matching. If every vertex v is required to have exactly f(v) edges of M incident on it, we call M a perfect b-matching. A maximum cardinality b-matching is a b-matching such that |M| is as large as possible. A maximum weight b-matching is a b-matching such that the total weight of the matched edges is as large as possible.
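The definition above can be checked mechanically. The following is a minimal sketch, assuming edges are stored as vertex pairs and that f and w are plain dictionaries; the names is_b_matching and matching_weight and the 4-vertex example are hypothetical and do not come from the poster.

```python
from collections import Counter

def is_b_matching(M, f):
    """Return True if every vertex v has at most f[v] edges of M incident on it."""
    degree = Counter()
    for u, v in M:
        degree[u] += 1
        degree[v] += 1
    # Vertices missing from f are treated as having capacity 0 (assumption).
    return all(degree[v] <= f.get(v, 0) for v in degree)

def matching_weight(M, w):
    """Total weight of the matched edges."""
    return sum(w[e] for e in M)

# Hypothetical example: the path 0 - 1 - 2 - 3 with edge weights 3, 4, 3.
w = {(0, 1): 3.0, (1, 2): 4.0, (2, 3): 3.0}
f1 = {v: 1 for v in range(4)}          # f(v) = 1 everywhere: an ordinary 1-matching
f2 = {0: 1, 1: 2, 2: 2, 3: 1}          # f(v) equal to the degree of v

print(is_b_matching({(0, 1), (2, 3)}, f1))   # True
print(is_b_matching({(0, 1), (1, 2)}, f1))   # False: vertex 1 has two matched edges
print(is_b_matching(set(w), f2))             # True: here all edges form a perfect b-matching
print(matching_weight({(0, 1), (2, 3)}, w))  # 6.0
```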
• We apply our algorithm to solve the k-Anonymity privacy problem.
• We show that using approximate matching instead of exact matching makes the algorithm faster by two orders of magnitude [Table 1].
[Illustration: the Suitor idea as a proposal dialogue]
Boy: I want to be your Suitor.
Girl: Let me think. Are you better than my current Suitor?
Yes, he is: Bye bye, current Suitor. You're my new Suitor.
No, he is not.
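The proposal dialogue above can be turned into code. What follows is a minimal sequential sketch of a proposal-based heuristic in that spirit, written under stated assumptions; it is not the poster's implementation. The function name b_suitor, the dictionary inputs adj and b, and the rule of breaking weight ties by vertex ids are all hypothetical. As a conservative choice of this sketch, an edge is reported only when each endpoint ends up as a suitor of the other.

```python
import heapq

def b_suitor(adj, b):
    """Sketch of a proposal-based approximate b-matching (assumed interface).

    adj: {u: {v: weight}}, symmetric; b: {u: max matched edges at u}.
    Returns a set of edges (u, v) with u < v.
    """
    def key(u, v):
        # Same key from both endpoints; vertex ids break weight ties (assumption).
        return (adj[u][v], min(u, v), max(u, v))

    # Each vertex scans its neighbors from heaviest to lightest, once.
    order = {u: sorted(adj[u], key=lambda v: key(u, v), reverse=True) for u in adj}
    nxt = {u: 0 for u in adj}
    suitors = {u: [] for u in adj}   # min-heap of (edge key, proposer) held by u
    # One work item per proposal a vertex is still allowed to make.
    work = [u for u in adj for _ in range(b[u])]

    while work:
        u = work.pop()
        while nxt[u] < len(order[u]):
            v = order[u][nxt[u]]
            nxt[u] += 1
            k = key(u, v)
            if len(suitors[v]) < b[v]:
                heapq.heappush(suitors[v], (k, u))      # v still has a free slot
                break
            if b[v] > 0 and k > suitors[v][0][0]:
                # u beats v's worst current suitor x, who must propose again.
                _, x = heapq.heapreplace(suitors[v], (k, u))
                work.append(x)
                break

    held = {v: {u for _, u in suitors[v]} for v in adj}
    # Keep an edge only when each endpoint is a suitor of the other.
    return {(min(u, v), max(u, v)) for v in adj for u in held[v] if v in held[u]}

# Hypothetical example: the path 0 - 1 - 2 - 3 with edge weights 3, 4, 3.
path = {0: {1: 3.0}, 1: {0: 3.0, 2: 4.0}, 2: {1: 4.0, 3: 3.0}, 3: {2: 3.0}}
print(sorted(b_suitor(path, {v: 1 for v in path})))  # [(1, 2)]
print(sorted(b_suitor(path, {v: 2 for v in path})))  # [(0, 1), (1, 2), (2, 3)]
```

With capacity 1 everywhere, the sketch keeps only the heaviest edge (weight 4), while the optimal matching has weight 6; this is consistent with a 1/2-approximation guarantee. Because the weight of a vertex's worst suitor never decreases, a rejected or displaced vertex never needs to revisit a neighbor, so each adjacency list is scanned at most once.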
Acknowledgements
Figure 1: Quality of the Approximation
Figure 2: Relative run times with other algorithms
Figure 4: Strong Scaling on Intel Xeon Phi with 60 cores, normalized by the time of 1 core (4 threads)
• F. Manne and M. Halappanavar, "New effective multithreaded matching algorithms," Proceedings of IPDPS 2014, to appear.
• A. Khan, A. Pothen, F. Manne, and M. Halappanavar, "Computing Approximate b-Matchings," SIAM Workshop on CSC, Lyon, July 2014.
• J. Mestre, "Greedy in approximation algorithms," in Algorithms - ESA 2006, Lecture Notes in Computer Science, vol. 4168. Springer, 2006, pp. 528-539.
• B. C. Huang and T. Jebara, "Fast b-matching via sufficient selection belief propagation," in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, ser. JMLR Proceedings, vol. 15, 2011, pp. 361-369.
• H. N. Gabow and R. E. Tarjan, "Faster scaling algorithms for network problems," SIAM Journal on Computing, vol. 18, no. 5, pp. 1013-1036, 1989.
• K. Choromanski, T. Jebara, and K. Tang, "Adaptive Anonymity via b-Matching," Neural Information Processing Systems (NIPS), December 2013.
We also acknowledge the support of Fredrik Manne, Md. Mostofa Ali Patwary, Nadathur Satish, and Narayan Sundaram. For our experiments we used the Purdue Community Cluster Conte. Each compute node contains two Intel® Xeon®¹ E5-2670 processors running at 2.60 GHz (16 cores in all). Each node also has an Intel® Xeon Phi™¹ coprocessor running at 1.1 GHz (61 cores in all).
¹ Intel, Xeon, and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
Figure 3: Strong Scaling on Intel Xeon with 16 Cores
Problem       Instances   Exact (sec)   Approx. (sec)   Speedup
Caltech36           768           854              10        85
Reed98              962         1,358              18        75
Haverford76       1,446         5,649              40       141
Simmons81         1,518         4,226              43        98
• We reduce the overall memory complexity of the k-Anonymity problem from quadratic to linear in the number of data points by using partially sorted adjacency lists in bSuitor.
• This enables us to solve k-Anonymity problems that are two orders of magnitude larger than previously reported.
Table 1: Single-thread run times of the k-Anonymity problem using exact b-Matching and bSuitor.
Problem        Instances   Xeon (16 Cores)   Xeon Phi (240 Cores)   Speedup
UCI_Adult         32,561             21.85                   9.65      2.27
USCensus1990      55,285            111.17                  54.96      2.02
Poker_hands      100,000            268.67                 140.94      1.91
Table 2: Run times (seconds) of the bSuitor-based k-Anonymity algorithm on large problems.