Scalable Multi-threaded Community Detection in Social Networks
E. Jason Riedy1, David A. Bader1, and Henning Meyerhenke2
1 School of Comp. Science and Engineering, Georgia Inst. of Technology
2 Inst. of Theoretical Informatics, Karlsruhe Inst. of Technology (KIT)
25 May 2012
Exascale data analysis
• Health care: finding outbreaks, population epidemiology
• Social networks: advertising, searching, grouping
• Intelligence: decisions at scale, regulating algorithms
• Systems biology: understanding interactions, drug design
• Power grid: disruptions, conservation
• Simulation: discrete events, cracking meshes
• Graph clustering is common in all application areas.
MTAAP 2012—Scalable Community Detection—Jason Riedy 2/35
These are not easy graphs.
Yifan Hu’s (AT&T) visualization of the in-2004 data set
http://www2.research.att.com/~yifanhu/gallery.html
But no shortage of structure...
Protein interactions: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722–1736, 2003.
Jason’s network via LinkedIn Labs
• Locally, there are clusters or communities.
• First pass over a massive social graph:
  • Find smaller communities of interest.
  • Analyze / visualize top-ranked communities.
• Our part: community detection at massive scale. (Or kinda large, given available data.)
Outline
Motivation
Defining community detection and metrics
Shooting for massive graphs
Our parallel method
Implementation and platform details
Performance
Conclusions and plans
Community detection
What do we mean?
• Partition a graph’s vertices into disjoint communities.
• A community locally maximizes some metric.
  • Modularity, conductance, ...
• Trying to capture that vertices are more similar within one community than between communities.
Jason’s network via LinkedIn Labs
Community detection
Assumptions
• Disjoint partitioning of vertices.
• There is no one unique answer.
  • Many metrics are NP-complete to optimize (Brandes et al. [1]).
• The graph is a lossy representation.
• Want an adaptable detection method.
Jason’s network via LinkedIn Labs
Common community metric: Modularity
• Modularity: deviation of the connectivity within the community induced by a vertex set S from some expected background model of connectivity.
• We take Newman [2]’s basic uniform model.
• Let m count all edges in graph G, m_S the edges with both endpoints in S, and x_S the edge endpoints in S (an edge inside S contributes two). Modularity Q_S:
Q_S = (m_S − x_S²/(4m)) / m
• Total modularity: sum of modularities of disjoint subsets.
• A sufficiently positive modularity implies some structure.
• Known issues: Resolution limit, NP-complete opt. prob.
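As an illustrative sketch of the metric (a Python translation of the formula above, not the paper’s code; the function and argument names are mine):

```python
def community_modularity(edges, community):
    """Q_S = (m_S - x_S^2/(4m)) / m under Newman's uniform background
    model.  `edges` is a list of undirected (u, v) pairs; `community`
    is a set S of vertices."""
    m = len(edges)                                  # all edges in G
    m_S = sum(1 for u, v in edges
              if u in community and v in community) # edges inside S
    # x_S counts edge endpoints in S: 2 per internal, 1 per boundary edge
    x_S = sum((u in community) + (v in community) for u, v in edges)
    return (m_S - x_S * x_S / (4.0 * m)) / m
```

On two triangles joined by one bridge edge, each triangle scores about 0.179, and putting the whole graph in one community scores exactly 0, matching the model’s intent.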
Can we tackle massive graphs now?
Parallel, of course...
• Massive needs distributed memory, right?
• Well... Not really. Can buy a 2 TiB Intel-based Dell server on-line for around $200k USD, a 1.5 TiB from IBM, etc.
Image: dell.com.
Not an endorsement, just evidence!
• Publicly available “real-world” data fits...
• Start with shared memory to see what needs done.
• Specialized architectures provide larger shared-memory viewsover distributed implementations (e.g. Cray XMT).
Multi-threaded algorithm design points
A scalable multi-threaded graph analysis algorithm
• ... avoids global locks and frequent global synchronization.
• ... distributes computation over edges rather than only vertices.
• ... works with data as local to an edge as possible.
• ... uses compact data structures that agglomerate memory references.
Sequential agglomerative method
[Figure: example graph on vertices A–G]
• A common method (e.g. Clauset et al. [3]) agglomerates vertices into communities.
• Each vertex begins in its own community.
• An edge is chosen to contract.
  • Merging maximally increases modularity.
  • Priority queue.
• Known often to fall into an O(n²) performance trap with modularity (Wakita & Tsurumi [4]).
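A minimal sequential sketch of this priority-queue agglomeration (illustrative Python with invented names, not the CNM implementation; stale heap entries are invalidated lazily):

```python
import heapq

def agglomerate(n, edge_list):
    """Greedy sketch: repeatedly contract the merge that most increases
    modularity, using a lazily invalidated binary heap as the priority
    queue.  Returns a community label for each vertex."""
    m = len(edge_list)
    comm = list(range(n))              # current community of each vertex
    deg = [0.0] * n                    # degree sum d_C per community
    w = {}                             # edge weights between communities
    for u, v in edge_list:
        deg[u] += 1; deg[v] += 1
        if u != v:
            key = (min(u, v), max(u, v))
            w[key] = w.get(key, 0) + 1

    def dq(a, b):                      # modularity gain of merging a, b
        return w[a, b] / m - deg[a] * deg[b] / (2.0 * m * m)

    heap = [(-dq(a, b), a, b) for (a, b) in w]
    heapq.heapify(heap)
    while heap:
        negdq, a, b = heapq.heappop(heap)
        if (a, b) not in w or -negdq != dq(a, b):
            continue                   # stale entry: skip it
        if negdq >= 0:
            break                      # best merge no longer improves Q
        deg[a] += deg[b]               # contract community b into a
        del w[a, b]
        for key in [k for k in w if b in k]:
            other = key[0] if key[1] == b else key[1]
            wt = w.pop(key)
            nk = (min(a, other), max(a, other))
            w[nk] = w.get(nk, 0) + wt
        comm = [a if c == b else c for c in comm]
        for key in [k for k in w if a in k]:
            heapq.heappush(heap, (-dq(*key), *key))
    return comm
```

The inner loops over the whole weight map are where the quadratic trap bites; the real codes maintain per-community adjacency instead.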
Parallel agglomerative method
[Figure: example graph on vertices A–G]
• We use a matching to avoid the queue.
• Compute a heavy weight, large matching.
  • Simple greedy algorithm.
  • Maximal matching.
  • Within a factor of 2 in weight.
• Merge all matched communities at once.
• Maintains some balance.
• Produces different results.
• Agnostic to weighting, matching...
  • Can maximize modularity, minimize conductance.
  • Modifying the matching permits easy exploration.
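One illustrative way to realize the greedy heavy matching (a sequential Python sketch with invented names; the parallel code uses a local-max scheme instead, but this classic sorted greedy rule carries the same factor-of-2 weight guarantee):

```python
def greedy_matching(scored_edges):
    """Take (score, u, v) edges in decreasing score order; keep an edge
    only if both endpoints are still unmatched.  For non-negative scores
    this yields a maximal matching within a factor of 2 of the maximum
    weight."""
    matched = {}                        # vertex -> its matched partner
    for s, u, v in sorted(scored_edges, reverse=True):
        if u != v and u not in matched and v not in matched:
            matched[u] = v
            matched[v] = u
    return matched
```

All matched pairs are then contracted in one sweep, so each agglomeration phase does work proportional to the edge count rather than one merge at a time.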
Platform: Cray XMT2
Tolerates latency by massive multithreading.
• Hardware: 128 threads per processor
  • Context switch on every cycle (500 MHz)
  • Many outstanding memory requests (180/proc)
  • “No” caches...
• Flexibly supports dynamic load balancing
  • Globally hashed address space, no data cache
• Support for fine-grained, word-level synchronization
  • Full/empty bit on every memory word
• 64-processor XMT2 at CSCS, the Swiss National Supercomputing Centre.
• 500 MHz processors, 8192 threads, 2 TiB of shared memory
Image: cray.com
Platform: Intel® E7-8870-based server
Tolerates some latency by hyperthreading.
• “Westmere:” 2 threads/core, 10 cores/socket, four sockets.
• Fast cores (2.4 GHz), fast memory (1 066 MHz).
• Not so many outstanding memory requests (60/socket), but large caches (30 MiB L3 per socket).
• Good system support
  • Transparent hugepages reduce TLB costs.
  • Fast, user-level locking. (HLE would be better...)
  • OpenMP, although I didn’t tune it...
• mirasol, #17 on Graph500 (thanks to UCB)
• Four processors (80 threads), 256 GiB memory
• gcc 4.6.1, Linux kernel 3.2.0-rc5
Image: Intel® press kit
Platform: Other Intel®-based servers
Different design points
• “Nehalem” X5570: 2.93 GHz, 2 threads/core, 4 cores/socket, 2 sockets, 8 MiB cache/socket
• “Westmere” X5650: 2.66 GHz, 2 threads/core, 6 cores/socket, 2 sockets, 12 MiB cache/socket
• All with 1 066 MHz memory.
• Does the Westmere E7-8870’s scale affect performance?
• Nodes in the Georgia Tech CSE cluster jinx
• 24–48 GiB memory, small tests
Image: Intel® press kit
Implementation: Data structures
Extremely basic for graph G = (V,E)
• An array of (i, j; w) weighted edge pairs, each i, j stored only once and packed, uses 3|E| space
• An array to store self-edges, d(i) = w: |V| space
• A temporary floating-point array for scores: |E|
• Additional temporary arrays using 4|V| + 2|E| space to store degrees, matching choices, offsets...
• Weights count the number of agglomerated vertices or edges.
• Scoring methods (modularity, conductance) need only vertex-local counts.
• Storing an undirected graph in a symmetric manner reduces memory usage drastically and works with our simple matcher.
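A small sketch of this flat layout (illustrative Python; the function and variable names are mine, and the real code uses packed C arrays rather than lists of tuples):

```python
def pack_graph(n, edge_list):
    """Pack an undirected multigraph into the arrays described above:
    each undirected edge {i, j} stored once as an (i, j; w) triple,
    self-edges in a separate length-|V| array, and one float score slot
    per edge for the scoring phase."""
    d = [0] * n                          # self-edge weights d(i), |V| entries
    pairs = {}
    for u, v in edge_list:
        if u == v:
            d[u] += 1                    # self-loops go to the diagonal array
        else:
            key = (min(u, v), max(u, v)) # store each pair once, packed
            pairs[key] = pairs.get(key, 0) + 1
    ij_w = [(i, j, w) for (i, j), w in sorted(pairs.items())]  # 3|E| words
    scores = [0.0] * len(ij_w)           # |E| floats, filled when scoring
    return ij_w, d, scores
```

The weight field starts at the edge multiplicity and grows as merged communities accumulate parallel edges.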
Implementation: Data structures
Extremely basic for graph G = (V,E)
• The original code ignored order in the edge array, which killed OpenMP performance.
• New: roughly bucket the edge array by first stored index. Non-adjacent, CSR-like structure.
• New: hash i, j to determine storage order. Scatter among buckets.
Implementation: Routines
Three primitives: Scoring, matching, contracting
Scoring Trivial.
Matching Repeat until no ready, unmatched vertex:
1 For each unmatched vertex in parallel, find the best unmatched neighbor in its bucket.
2 Try to point the remote match at that edge (lock, check if best, unlock).
3 If pointing succeeded, try to point the self-match at that edge.
4 If both succeeded, yeah! If not, and there was some eligible neighbor, re-add self to the ready, unmatched list.
(Possibly too simple, but...)
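A deliberately simplified Python sketch of one round of this lock-based pointing protocol (identifier names are mine; the real code’s best-edge re-check and some race handling are elided):

```python
import threading

def match_round(neighbors, score, match, ready):
    """One round of the pointing protocol.  match[v] holds v's tentative
    partner or -1.  Each ready vertex picks its best currently-unmatched
    neighbor, points the neighbor's slot at itself under the neighbor's
    lock, then points its own slot; a lost race rolls back and re-enters
    the ready list for the next round."""
    locks = [threading.Lock() for _ in match]
    next_ready, nr_lock = [], threading.Lock()

    def try_match(u):
        if match[u] != -1:
            return                       # matched meanwhile: nothing to do
        cands = [v for v in neighbors[u] if match[v] == -1]
        if not cands:
            return                       # no eligible neighbor: u stays alone
        v = max(cands, key=lambda x: score[min(u, x), max(u, x)])
        with locks[v]:
            if match[v] != -1:           # remote slot taken: retry next round
                with nr_lock:
                    next_ready.append(u)
                return
            match[v] = u                 # point the remote match at the edge
        ok = False
        with locks[u]:
            if match[u] == -1:
                match[u] = v             # point the self-match: u-v matched
                ok = True
        if not ok:                       # u was claimed meanwhile: roll back
            with locks[v]:
                match[v] = -1
            with nr_lock:
                next_ready.append(u)

    threads = [threading.Thread(target=try_match, args=(u,)) for u in ready]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return next_ready
```

A driver simply repeats `ready = match_round(...)` until the ready list empties; locks are only ever taken one at a time, so the protocol cannot deadlock.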
Implementation: Routines
Contracting
1 Map each i, j to new vertices, re-order by hashing.
2 Accumulate counts for new i′ bins, prefix-sum for offset.
3 Copy into new bins.
• Only synchronizing in the prefix-sum. That could be removed if I don’t re-order the i′, j′ pairs; haven’t timed the difference.
• Actually, the current code copies twice... On the short list for fixing.
• Binning, as opposed to the original list-chasing, enabled Intel/OpenMP support with reasonable performance.
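The contraction steps can be sketched as follows (illustrative Python with invented names; a dict stands in for the bucketed scatter, and min/max ordering stands in for the hash-based ordering of each pair):

```python
from collections import defaultdict

def contract(edges, cmap):
    """Contract a weighted edge map through a community map.  `edges`
    maps ordered pairs (i, j) to weights; `cmap` maps old vertices to
    new community labels.  Returns bucketed edges and self-edge weights."""
    new_edges = defaultdict(int)       # (i', j') -> accumulated weight
    self_w = defaultdict(int)          # self-edges: d(i') = w
    for (u, v), wt in edges.items():
        a, b = cmap[u], cmap[v]        # 1: map endpoints to new vertices
        if a == b:
            self_w[a] += wt            # intra-community edges collapse
        else:
            new_edges[min(a, b), max(a, b)] += wt
    # 2-3: bucket by first stored index and copy in -- the counterpart of
    # the counting pass + prefix-sum that computes bucket offsets in the
    # parallel code.
    buckets = defaultdict(list)
    for (a, b), wt in sorted(new_edges.items()):
        buckets[a].append((b, wt))
    return dict(buckets), dict(self_w)
```

In the parallel version the only synchronization is the prefix-sum over the bucket counts; the per-bucket accumulation is independent work.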
Performance summary
Two moderate-sized graphs, one large
Graph             |V|          |E|            Reference
rmat-24-16         15 580 378    262 482 711  [5, 6]
soc-LiveJournal1    4 847 571     68 993 773  [7]
uk-2007-05        105 896 555  3 301 876 564  [8]
Peak processing rates in edges/second
Platform   rmat-24-16   soc-LiveJournal1   uk-2007-05
X5570      1.83×10^6    3.89×10^6          —
X5650      2.54×10^6    4.98×10^6          —
E7-8870    5.86×10^6    6.90×10^6          6.54×10^6
XMT        1.20×10^6    0.41×10^6          —
XMT2       2.11×10^6    1.73×10^6          3.11×10^6
Performance: Time to solution
[Chart: time to solution (s) vs. threads (OpenMP) / processors (XMT) for rmat-24-16 and soc-LiveJournal1 on X5570 (4-core), X5650 (6-core), E7-8870 (10-core), XMT, and XMT2.]
Performance: Rate (edges/second)
[Chart: processing rate (edges per second) vs. threads (OpenMP) / processors (XMT) for rmat-24-16 and soc-LiveJournal1 on the same five platforms.]
Performance: Modularity
[Chart: modularity vs. agglomeration step for coAuthorsCiteseer, eu-2005, and uk-2002 under three termination metrics: Coverage, Max, Average.]
• Timing results: Stop when coverage ≥ 0.5 (communities cover 1/2 edges).
• More work ⇒ higher modularity. Choice up to application.
Performance: Small-scale speedup
[Chart: speed-up over one thread (OpenMP) or processor (XMT) for rmat-24-16 and soc-LiveJournal1 on the five platforms.]
Performance: Large-scale time
[Chart: time (s) vs. threads (OpenMP) / processors (XMT) on uk-2007-05 for E7-8870 (10-core) and XMT2; labeled points include 504.9 s, 1063 s, 6917 s, and 31 568 s.]
Performance: Large-scale speedup
[Chart: speed-up over a single processor/thread on uk-2007-05 for E7-8870 (10-core) and XMT2; peak speed-ups of 13.7× and 29.6×.]
Conclusions and plans
• Code: http://www.cc.gatech.edu/~jriedy/community-detection/
• Some low-hanging fruit remains:
  • Eliminate one unnecessary copy during contraction.
  • Deal with stars.
• Then... practical experiments.
  • How volatile are modularity and conductance to perturbations?
  • What matching schemes work well?
  • How do different metrics compare in applications?
• Extending to streaming graph data!
  • Includes developing parallel refinement...
  • And possibly de-clustering or manipulating the dendrogram...
  • Very much WIP, more tricky than anticipated.
Acknowledgment of support
Bibliography I
[1] U. Brandes, D. Delling, M. Gaertler, R. Görke, M. Hoefer, Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEE Trans. Knowledge and Data Engineering, vol. 20, no. 2, pp. 172–188, 2008.
[2] M. Newman, “Modularity and community structure in networks,” Proc. of the National Academy of Sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
[3] A. Clauset, M. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E, vol. 70, no. 6, p. 66111, 2004.
[4] K. Wakita and T. Tsurumi, “Finding community structure in mega-scale social networks,” CoRR, vol. abs/cs/0702048, 2007.
Bibliography II
[5] D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: A recursive model for graph mining,” in Proc. 4th SIAM Intl. Conf. on Data Mining (SDM). Orlando, FL: SIAM, Apr. 2004.
[6] D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh, K. Madduri, W. Mann, and T. Meuse, HPCS SSCA#2 Graph Analysis Benchmark Specifications v1.1, Jul. 2005.
[7] J. Leskovec, “Stanford large network dataset collection,” at http://snap.stanford.edu/data/, Oct. 2011.
[8] P. Boldi, B. Codenotti, M. Santini, and S. Vigna, “UbiCrawler: A scalable fully distributed web crawler,” Software: Practice & Experience, vol. 34, no. 8, pp. 711–726, 2004.