Efficient Triangle Motif Counting in Large Scale Complex Networks with GPUs
Hakan Kardeş
CS 791v
Introduction
• Many systems are being modeled as complex networks to understand local and global characteristics of these systems.
• Studying network models of these systems provides a new direction towards understanding biological, chemical, technological or social systems in a better way.
Complex Networks Everywhere
[Figure: examples of complex networks. Panels: Aspirin; Yeast protein interaction network; An Internet; Web; Co-author network]
Why Graph Mining and Searching?
• In many cases, the systems under investigation are very large, and the corresponding graphs have a large number of nodes and edges, so graph mining techniques are required to derive information from them.
• Several graph mining techniques have been developed to extract useful information from graph representations and to analyze various features of complex networks.
Why is Triangle Counting important?
• Clustering coefficient
• Transitivity ratio
• Social Network Analysis fact: “Friends of friends are friends” [WF94]
• Hidden Thematic Structure of the Web (Eckmann et al., PNAS [EM02])
• Motif Detection (e.g., [YPSB05])
• Web Spam Detection (Becchetti et al., KDD ’08 [BBCG08])
[Figure: a triangle formed by nodes A, B, and C]
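To make the first bullet concrete: the local clustering coefficient of a node is the number of triangles through it divided by the number of neighbor pairs it has. The following is a minimal Python sketch, assuming an undirected graph stored as a dict of neighbor sets; the function name `local_clustering` and the toy graph are illustrative and do not come from the slides.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs, so no triangle can pass through v
    # Each connected neighbor pair (b, c) closes one triangle (v, b, c).
    triangles = sum(1 for b, c in combinations(neighbors, 2) if c in adj[b])
    return triangles / (k * (k - 1) / 2)

# Toy graph (assumption): triangle A-B-C plus a pendant node D attached to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

Here node A has three neighbor pairs and only (B, C) is connected, so one triangle out of three possible pairs passes through it.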
Related Work
• Hakan Kardes and M. H. Gunes. Structural Graph Indexing for Mining Complex Networks. IEEE ICDCS 2010 Workshop on Simplifying Complex Networks for Practitioners, Genoa, Italy, June 21, 2010.
– Our paper, in which we count all star, triangle, complete bipartite, and clique structures.
• Matthieu Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407, 1-3 (November 2008), 458-473.
– Survey paper, focused on space complexity.
• Charalampos Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification. International Conference on Advances in Social Network Analysis and Mining, pp. 66-71, 2009.
– Relies on the spectral properties of power-law networks and is focused on such networks.
Related Work
• Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4, 3, Article 13 (October 2010), 28 pages.
– They count the number of triangles for a given node.
• Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In Knowledge Discovery and Data Mining (KDD '09).
• Belkacem Serrour, Alex Arenas, and Sergio Gomez. Detecting communities of triangles in complex networks using spectral optimization. 2010.
• Bill Andreopoulos, Christof Winter, Dirk Labudde, and Michael Schroeder. Triangle network motifs predict complexes by complementing high-error interactomes with structural information. BMC Bioinformatics, 2009.
Methodology
Star
• We first index the star structure, in which a node has multiple neighbors, as shown in the figures below.
• All star structures within a graph G = (V, E) are represented as s(vi, nsi), where vi ∈ V and nsi is the set of all neighbors of vi.
• We index maximal star structures for each node.
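[Figure: star structures. A center v1 with neighbors ns1, ns2; a center v1 with neighbors ns1, ns2, ns3; and the general case of a center vi with neighbors ns1, ns2, ..., nsn]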
Star
Algorithm:
• First, build a star structure s(v, ø) without any neighbors for each node v ∈ V.
• Then, for each edge e(a, b) ∈ E, add b to the neighbor set of a and a to the neighbor set of b.
• Finally, remove star structures s(v, ns) that have fewer than two neighbors.
Nodes: a, b, c, d, e, f.
Edges: (a,b), (a,d), (a,f), (b,e), (c,f), (d,f)
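Star Structures (center: neighbors):
a: b, d, f
b: a, e
c: f
d: a, f
e: b
f: a, c, d
The stars centered at c and e have fewer than two neighbors, so the final step removes them.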
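The three steps of the star algorithm can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `index_stars` and the dict-of-sets representation are assumptions, not the author's implementation. The input is the example graph from the slide.

```python
def index_stars(nodes, edges):
    """Build star structures s(v, ns) as a dict: center -> neighbor set."""
    # Step 1: an empty star s(v, {}) for every node.
    stars = {v: set() for v in nodes}
    # Step 2: each edge (a, b) adds b to a's neighbor set and a to b's.
    for a, b in edges:
        stars[a].add(b)
        stars[b].add(a)
    # Step 3: drop stars with fewer than two neighbors.
    return {v: ns for v, ns in stars.items() if len(ns) >= 2}

# Example graph from the slide.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
stars = index_stars(nodes, edges)
```

Because each node keeps its full neighbor set, every remaining star is already maximal for its center.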
Triangle
Algorithm:
• Find the second-hop neighbors of ‘a’ by iterating over the ns set.
• Then, take the intersection of the second-hop neighbors of ‘a’ and the ns set.
• Grow the triangle set for each node isi in the intersection set is (isi ∈ is).
[Figure: node a with its neighbor set ns1, ns2, ..., nsn and the second-hop neighbors ls1, ls2, ..., lsn]
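One way to read the steps above: for each neighbor b of a, the neighbors of b are second-hop candidates, and any candidate that is also a neighbor of a closes a triangle. The sketch below is a hedged sequential illustration in Python; the name `list_triangles`, the dict-of-sets input, and the ordering trick used to report each triangle once are assumptions rather than details from the slides.

```python
def list_triangles(adj):
    """List each triangle once as a sorted tuple (a, b, c) with a < b < c."""
    triangles = []
    for a in adj:
        for b in adj[a]:
            if not a < b:
                continue  # handle each edge once, from its smaller endpoint
            # Neighbors of b that are also neighbors of a close a triangle.
            for c in adj[a] & adj[b]:
                if b < c:  # keep only the ordered representative
                    triangles.append((a, b, c))
    return triangles

# Example graph from the Star slide.
adj = {
    "a": {"b", "d", "f"},
    "b": {"a", "e"},
    "c": {"f"},
    "d": {"a", "f"},
    "e": {"b"},
    "f": {"a", "c", "d"},
}
triangles = list_triangles(adj)
```

On the example graph the only triangle is the one formed by a, d, and f.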
CUDA
• For the parallel algorithm, I will use CUDA.
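The slides do not say how the work would be split across GPU threads. One common decomposition, shown here purely as an assumption and in plain Python rather than CUDA, flattens the graph into CSR-style arrays and treats every edge as an independent task that intersects two sorted neighbor lists. All names (`to_csr`, `edge_task`, `count_triangles_per_edge`) are hypothetical.

```python
def to_csr(adj, order):
    """Flatten neighbor sets into offsets/targets arrays with integer node ids."""
    index = {v: i for i, v in enumerate(order)}
    offsets, targets = [0], []
    for v in order:
        targets.extend(sorted(index[u] for u in adj[v]))
        offsets.append(len(targets))
    return offsets, targets

def edge_task(offsets, targets, u, v):
    """Work of one hypothetical thread: count common neighbors w > v of edge (u, v)."""
    i, i_end = offsets[u], offsets[u + 1]
    j, j_end = offsets[v], offsets[v + 1]
    found = 0
    while i < i_end and j < j_end:  # merge-style intersection of two sorted lists
        if targets[i] < targets[j]:
            i += 1
        elif targets[i] > targets[j]:
            j += 1
        else:
            if targets[i] > v:  # count the triangle only from its smallest edge
                found += 1
            i += 1
            j += 1
    return found

def count_triangles_per_edge(offsets, targets):
    n = len(offsets) - 1
    # Every (u, v) task with u < v is independent of the others.
    return sum(
        edge_task(offsets, targets, u, v)
        for u in range(n)
        for v in targets[offsets[u]:offsets[u + 1]]
        if u < v
    )

adj = {
    "a": {"b", "d", "f"}, "b": {"a", "e"}, "c": {"f"},
    "d": {"a", "f"}, "e": {"b"}, "f": {"a", "c", "d"},
}
offsets, targets = to_csr(adj, sorted(adj))
total = count_triangles_per_edge(offsets, targets)
```

Flat integer arrays are used because they map directly onto device memory, and independent per-edge tasks are the kind of work a GPU kernel could run concurrently.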
Experiments
Possible Datasets for Experiments
• Router-level Internet topology (around 2.3M nodes and 4M edges)
– http://cheleby.cse.unr.edu/data.html
• Routing data on the Internet network (around 124K nodes and 207K edges)
– http://vlado.fmf.uni-lj.si/pub/networks/data/web/web.zip
• A mobile phone graph (around 2.7M nodes and 6M edges)
– Will be requested from the authors of “Structure of neighborhoods in a large social network”
• Biological data
– http://www.biomedcentral.com/1471-2105/10/196
• Wikipedia graph (around 1.6M nodes and 18.5M edges)
– I haven’t decided how to do it yet.
• I will generate sample graphs with different numbers of triangles
– I haven’t decided how to do it yet.
Results
• Triangle Counting CPU vs. GPU:
[Figure: execution time vs. no. of nodes (100, 1000, 10000, 100000); series: CPU, GPU]
Results
• Triangle Counting CPU vs. GPU:
[Figure: execution time vs. no. of edges while the no. of nodes is constant (x, 1.5x, 2x, 3x); series: CPU, GPU]
Results
• Triangle Counting with different triangle sizes:
[Figure: execution time vs. no. of triangles (x, 1.5x, 2x, 3x, 5x, 10x); series: CPU, GPU]
Results
• Triangle Counting with different block sizes:
[Figure: execution time vs. block size (x, 1.5x, 2x, 3x, 5x, 10x); series: GPU]
Future Work
Structural Graph Indexing (SGI)
• We propose an alternative structural indexing approach to search and process queries efficiently even in very large graphs.
• As indexing features, we use commonly observed graph structures: star, complete bipartite, triangle and clique.
• These structures are ubiquitous in biological, chemical, technological, social, and many other complex networks.
Structural Models
[Figure: structural models. Panels: 3-Star (K1,3); n-Star (K1,n); 2*3-Complete Bipartite (K2,3); 3*4-Complete Bipartite (K3,4); m*n-Complete Bipartite (Km,n); Triangle (K3); 4-Clique (K4); n-Clique (Kn)]
Structural Graph Indexing
• An important feature of these structures is that each one is composed of the previous one: a clique contains complete bipartite structures, and a complete bipartite structure contains star structures.
• So, we can index these structures within the original graph in a consecutive manner. We first identify star structures, and then the complete-bipartite and clique structures from the preceding ones.
Structural Graph Indexing
• An important difference between our approach and previous studies is that we do not limit the size of the subgraphs considered in indexing. We index all maximal subgraphs that match the structure formulation. For instance, a maximal clique is a clique that cannot be extended by adding one more vertex from the graph. However, the substructure size in indexing may be limited when needed, since the clique decision problem is known to be NP-complete.
Complete Bipartite
• The second structure we index is the complete bipartite structure, shown in the figures below.
• A complete bipartite graph G = (V1 ∪ V2, E) is a bipartite graph such that V1 and V2 are two disjoint sets and, for any two vertices vi ∈ V1 and vj ∈ V2, there is an edge between them (i.e., ∃ e(vi, vj) ∈ E).
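[Figure: complete bipartite examples. K3,4 with nodes v1, v2, v3 and d1, d2, d3, d4; K2,3 with nodes v1, v2 and d1, d2, d3; and the general Km,n with nodes v1, ..., vm and d1, ..., dn]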
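The definition translates into a short membership check. The sketch below is an assumption-laden illustration in Python: the function name `is_complete_bipartite` is hypothetical, and the check deliberately ignores any edges inside each side, on the assumption that the structure is matched as a (not necessarily induced) subgraph of the original graph.

```python
def is_complete_bipartite(adj, left, right):
    """True if the sides are disjoint and every left node is adjacent to every right node."""
    left, right = set(left), set(right)
    if not left or not right or left & right:
        return False
    return all(d in adj[v] for v in left for d in right)

# K2,3 as in the figure: v1 and v2 are each connected to d1, d2, d3.
adj = {
    "v1": {"d1", "d2", "d3"},
    "v2": {"d1", "d2", "d3"},
    "d1": {"v1", "v2"},
    "d2": {"v1", "v2"},
    "d3": {"v1", "v2"},
}
```

Calling `is_complete_bipartite(adj, {"v1", "v2"}, {"d1", "d2", "d3"})` checks all six required edges of the K2,3.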
Complete Bipartite
• The complete bipartite structure is ubiquitous in many complex networks.
– protein-protein interaction networks (Thomas et al.)
– the Internet (Fay et al.)
• We index all complete bipartite structures in the graph G using the indexed star structures.
• For each star structure s(a, ns), where a ∈ V and ns is the neighbor set of the node a, we identify the maximal complete bipartite structure involving the node ‘a’.
Complete Bipartite
Algorithm:
• Find the second-hop neighbors of ‘a’ by iterating over the ns set and unifying them under the Lcan set, which holds the candidates for the left side of the complete bipartite structure, while the ns set is the candidate set for the right-hand side.
• Then, find a K2,n and grow it to a Km,n. To find a K2,n, iterate over each candidate node in Lcan and determine its neighbor intersection with a. If the intersection set is larger than two, then these nodes belong to the right-hand side.
• Grow the K2,n by finding all nodes in the left-hand side (i.e., Lcan) that have all right-hand side nodes (i.e., Rnew) as neighbors.
[Figure: node a with the right candidate set ns1, ns2, ..., nsn and the left candidate set ls1, ls2, ..., lsn]
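The steps above can be sketched as follows. This is a rough Python illustration rather than the author's implementation: the function name `complete_bipartite_from_star` is hypothetical, and the `min_right=3` default is an assumption that mirrors the slide's "larger than two" wording.

```python
def complete_bipartite_from_star(adj, a, min_right=3):
    """Grow complete bipartite structures (left, right) that contain node a on the left."""
    ns = adj[a]  # right-hand candidates: the neighbors of a
    # Lcan: second-hop neighbors of a, i.e. neighbors of a's neighbors, without a itself.
    lcan = set().union(*(adj[n] for n in ns)) - {a}
    found = set()
    for cand in lcan:
        r_new = adj[cand] & ns  # K2,n: right side shared by a and cand
        if len(r_new) < min_right:
            continue
        # Grow to Km,n: every left candidate adjacent to all of r_new joins the left side.
        left = {a} | {c for c in lcan if r_new <= adj[c]}
        found.add((frozenset(left), frozenset(r_new)))
    return found

# Toy graph (assumption): v1, v2, v3 all see d1, d2, d3; v3 also sees d4.
adj = {
    "v1": {"d1", "d2", "d3"},
    "v2": {"d1", "d2", "d3"},
    "v3": {"d1", "d2", "d3", "d4"},
    "d1": {"v1", "v2", "v3"},
    "d2": {"v1", "v2", "v3"},
    "d3": {"v1", "v2", "v3"},
    "d4": {"v3"},
}
found = complete_bipartite_from_star(adj, "v1")
```

Starting from the star centered at v1, both left candidates lead to the same K3,3, so the result set holds a single structure. The node d4 is excluded because it is not a neighbor of v1.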
Clique
• Finally, we index the clique structures shown in the figures below.
• A clique in a graph G = (V, E) is a subset of the vertex set (i.e., C ⊆ V) such that there are edges between all node pairs (i.e., ∀ ci, cj ∈ C, ∃ e(ci, cj) ∈ E, when i ≠ j).
• We index all maximal clique structures in the graph using complete bipartite structures.
[Figure: clique examples. A 4-clique on v1, v2, v3, v4; a triangle on v1, v2, v3; and the general n-clique on v1, v2, v3, v4, ..., vn]
Clique
• This structure has been observed and utilized in many fields.
– computational biology: protein structure prediction (Samudrala et al.)
– electronic circuits (Cong et al.)
– chemicals in a chemical database (Rhodes et al.)
Clique
Algorithm:
• First, get the set of nodes from each complete bipartite structure Km,n and look for cliques that are formed by those nodes.
• The clique search algorithm works recursively, taking each node from the Km,n as the pivot node in the L1 set and considering the other nodes as candidate nodes in the L2 set.
• The function moves each node from the L2 set to the L1 set if it is connected to all nodes in L1, and then recursively tries to grow the structure with the remaining nodes as candidates.
• When there are no more candidates to consider in the L2 set, a clique has been identified.
[Figure: the nodes of a K3,4 (v1, v2, v3, d1, d2, d3, d4) listed as input to the clique search, with the two sets labeled Set1 and Set2]
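A minimal recursive sketch of this search, in Python, is given below. The names `grow_clique` and `cliques_among` are hypothetical, and the code is a naive reading of the slide: it revisits the same clique through different insertion orders and relies on a set to deduplicate, so it is only suitable for small node sets.

```python
def grow_clique(adj, l1, l2, found):
    """Recursively move nodes from the candidate set l2 into the clique l1."""
    # Keep only candidates connected to every node already in l1.
    candidates = {c for c in l2 if l1 <= adj[c]}
    if not candidates:
        found.add(frozenset(l1))  # no candidate left: l1 cannot grow any further
        return
    for c in candidates:
        grow_clique(adj, l1 | {c}, candidates - {c}, found)

def cliques_among(adj, nodes):
    """Cliques formed by `nodes`, trying each node once as the pivot."""
    nodes = set(nodes)
    found = set()
    for pivot in nodes:
        grow_clique(adj, {pivot}, nodes - {pivot}, found)
    return found

# Toy graph (assumption): triangle v1-v2-v3 plus an extra node d1 attached to v1.
adj = {
    "v1": {"v2", "v3", "d1"},
    "v2": {"v1", "v3"},
    "v3": {"v1", "v2"},
    "d1": {"v1"},
}
cliques = cliques_among(adj, adj.keys())
```

Each reported set is a clique that cannot be extended with any other node from the given node set, which matches the slide's stopping condition of having no candidates left in L2.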
Where to Submit
• Advances in Social Network Analysis and Mining (ASONAM 2011):
– Full paper submission deadline is March 1, 2011.
– Full paper manuscripts are limited to 8 pages (using the IEEE two-column template).
– Kaohsiung, Taiwan, 7/25-7/27
• Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2011)
• Workshop on Mining and Learning with Graphs (MLG 2011)
• Workshop on Social Network Mining and Analysis (SNAKDD 2011)
– Full paper submission deadline is May 4-10, 2011.
– Full paper manuscripts are limited to 10 pages (using the ACM two-column template).
– San Diego, CA, 8/21-8/24
• Simplifying Network Science for Practitioners (SIMPLEX 2011)
– Full paper submission deadline is Jan 31, 2011 – Feb 19, 2011.
– Full paper manuscripts are limited to 6-10 pages (using the IEEE two-column template).
– Minneapolis, Minnesota, USA, 6/20-6/24
Questions
Thank you