Efficient Triangle Motif Counting in Large Scale Complex Networks with GPUs

Hakan Kardeş

CS 791v

Introduction

• Many systems are being modeled as complex networks to understand local and global characteristics of these systems.

• Studying network models of these systems offers a new way to better understand biological, chemical, technological, and social systems.


Complex Networks Everywhere

[Figure: examples of complex networks: aspirin, a yeast protein interaction network, the Internet, the Web, a co-author network]

Why Graph Mining and Searching?

• In many cases, the systems under investigation are very large, and the corresponding graphs have so many nodes and edges that graph mining techniques are required to derive information from them.

• Several graph mining techniques have been developed to extract useful information from graph representations and to analyze various features of complex networks.


Why is Triangle Counting important?

• Clustering coefficient
• Transitivity ratio
• Social Network Analysis fact [WF94]:
“Friends of friends are friends”

[Figure: triangle on nodes A, B, C]

• Hidden thematic structure of the Web (Eckmann et al., PNAS [EM02])
• Motif detection (e.g., [YPSB05])
• Web spam detection (Becchetti et al., KDD ’08 [BBCG08])
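Both metrics above are simple functions of triangle counts, which is why counting triangles efficiently matters. As a minimal sketch using the standard definitions (global transitivity = 3 × number of triangles / number of connected triples; local clustering coefficient C(v) = 2T(v) / (d(v)(d(v) − 1)), where T(v) is the number of triangles containing v and d(v) is its degree), the Python below computes both for an undirected simple graph given as an edge list. The function names are illustrative and are not taken from the original work.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected neighbor sets from an edge list (self-loops are skipped)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles_per_node(adj):
    """T(v): number of pairs of v's neighbors that are themselves adjacent."""
    return {v: sum(1 for x, y in combinations(adj[v], 2) if y in adj[x])
            for v in adj}


def transitivity(adj):
    """Global transitivity: 3 * (#triangles) / (#connected triples)."""
    t = triangles_per_node(adj)
    triangles = sum(t.values()) // 3  # each triangle is seen from its 3 corners
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return 3 * triangles / triples if triples else 0.0


def local_clustering(adj, v):
    """C(v) = 2 T(v) / (d(v) (d(v) - 1)); taken as 0 when d(v) < 2."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    return 2 * triangles_per_node(adj)[v] / (d * (d - 1))


edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
adj = build_adjacency(edges)
print(transitivity(adj))            # 0.375 (1 triangle, 8 connected triples)
print(local_clustering(adj, "d"))   # 1.0
```

The edge list is the six-node example graph used for the star algorithm in these slides; its single triangle is {a, d, f}.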

Related Work

• Hakan Kardes and M. H. Gunes. Structural Graph Indexing for Mining Complex Networks. IEEE ICDCS 2010 Workshop on Simplifying Complex Networks for Practitioners, Genoa, Italy, June 21, 2010.
– Our paper, in which we count all star, triangle, complete bipartite, and clique structures.

• Matthieu Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical Computer Science 407(1-3), November 2008, pp. 458-473.
– Survey paper, focused on space complexity.

• Charalampos Tsourakakis, Petros Drineas, Eirinaios Michelakis, Ioannis Koutis, and Christos Faloutsos. Spectral Counting of Triangles in Power-Law Networks via Element-Wise Sparsification. 2009 International Conference on Advances in Social Network Analysis and Mining, pp. 66-71, 2009.
– Relies on the spectral properties of power-law networks, and therefore targets power-law networks.


Related Work

• Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. Efficient algorithms for large-scale local triangle counting. ACM Transactions on Knowledge Discovery from Data 4(3), Article 13, October 2010, 28 pages.
– They count the number of triangles for a given node.

• Charalampos E. Tsourakakis, U Kang, Gary L. Miller, and Christos Faloutsos. Doulion: Counting triangles in massive graphs with a coin. Knowledge Discovery and Data Mining (KDD '09), 2009.

• Belkacem Serrour, Alex Arenas, and Sergio Gomez. Detecting communities of triangles in complex networks using spectral optimization. 2010.

• Bill Andreopoulos, Christof Winter, Dirk Labudde, and Michael Schroeder. Triangle network motifs predict complexes by complementing high-error interactomes with structural information. BMC Bioinformatics, 2009.


Methodology


Star


• We first index star structures, in which a node has multiple neighbors, as shown in the figures below.

• All star structures within a graph G = (V, E) are represented as s(vi, nsi), where vi ∈ V and nsi is the set of all neighbors of vi.

• We index maximal star structures for each node.

[Figure: star structures centered at v1 with neighbors ns1, ns2 (and ns3), and a general star centered at vi with neighbors ns1, ..., nsn]

Star

Algorithm:

• First, build a star structure s(v, ø) with an empty neighbor set for each node v ∈ V.

• Then, for each edge e(a, b) ∈ E, add each endpoint to the other endpoint's neighbor set (b to the set of a, and a to the set of b).

• Finally, remove star structures s(v, ns) that have fewer than two neighbors.

Nodes: a, b, c, d, e, f.

Edges: (a,b), (a,d), (a,f), (b,e), (c,f), (d,f)

Star Structures:

s(a, {b, d, f}), s(b, {a, e}), s(d, {a, f}), s(f, {a, c, d})

(Nodes c and e have only one neighbor each, so their stars are removed.)
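The three steps of the star algorithm can be sketched in a few lines of Python. This is an illustrative sequential version under our own assumptions (nodes given as a list, edges as pairs, neighbor sets stored as Python sets); the function name `index_stars` is hypothetical. Run on the example graph, it yields the four stars that survive the final filtering step.

```python
def index_stars(nodes, edges):
    """Return {v: ns} for every node v whose maximal star s(v, ns) has >= 2 neighbors."""
    # Step 1: an empty star s(v, ø) for every node v in V.
    stars = {v: set() for v in nodes}
    # Step 2: for each edge e(a, b), add each endpoint to the other's neighbor set.
    for a, b in edges:
        stars[a].add(b)
        stars[b].add(a)
    # Step 3: remove stars with fewer than two neighbors.
    return {v: ns for v, ns in stars.items() if len(ns) >= 2}


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
for v, ns in sorted(index_stars(nodes, edges).items()):
    print(f"s({v}, {sorted(ns)})")
# s(a, ['b', 'd', 'f'])
# s(b, ['a', 'e'])
# s(d, ['a', 'f'])
# s(f, ['a', 'c', 'd'])
```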

Triangle


Algorithm:

• Find the second-hop neighbors of node ‘a’ by iterating over its neighbor set ns.

• Then, take the intersection (is) of the second-hop neighbors of ‘a’ and the ns set.

• Grow the triangle set for each isi ∈ is.

[Figure: node a with neighbor set ns1, ..., nsn and second-hop neighbors ls1, ..., lsn]
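A sequential Python sketch of the triangle steps, assuming the stars are given as a mapping from each node to its neighbor set (as produced by star indexing). One detail is our assumption rather than the slides': the intersection is taken separately for each neighbor b ∈ ns, so the third vertex of each triangle is known. Triangles are stored as frozensets so that repeated discoveries of the same triangle are recorded once. `find_triangles` is a hypothetical name.

```python
def find_triangles(stars):
    """Triangles found from each star s(a, ns) via second-hop neighbors.

    `stars` maps a node to its neighbor set; nodes with fewer than two
    neighbors may be missing, since they cannot lie on a triangle.
    """
    triangles = set()
    for a, ns in stars.items():
        for b in ns:
            # Second-hop neighbors of a reached through neighbor b.
            second_hop = stars.get(b, set())
            # Those that are also in ns close the triangle (a, b, c).
            for c in second_hop & ns:
                # frozenset: repeated discoveries of the same triangle are kept once.
                triangles.add(frozenset((a, b, c)))
    return triangles


stars = {"a": {"b", "d", "f"}, "b": {"a", "e"}, "d": {"a", "f"}, "f": {"a", "c", "d"}}
print([sorted(t) for t in find_triangles(stars)])  # [['a', 'd', 'f']]
```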

.

CUDA

• For the parallel algorithm, I will use CUDA.
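The slides do not specify how the work is divided among CUDA threads. One common decomposition, assumed here and not taken from the original work, is edge-parallel counting: orient every edge from the smaller to the larger node id, then, for each oriented edge (u, v), count the common out-neighbors |N+(u) ∩ N+(v)|. Each triangle u < v < w is then counted exactly once, at edge (u, v), and the per-edge counts are independent, so on a GPU each could be computed by one thread and combined with a parallel sum reduction. The sketch below is plain sequential Python that only illustrates this decomposition; all names are hypothetical.

```python
def oriented_adjacency(edges):
    """Keep each undirected edge once, pointing from the smaller to the larger id."""
    out = {}
    for u, v in edges:
        if u == v:
            continue
        lo, hi = min(u, v), max(u, v)
        out.setdefault(lo, set()).add(hi)
        out.setdefault(hi, set())
    return out


def count_for_edge(out, u, v):
    """Work item for one oriented edge (u, v): the number of common out-neighbors w."""
    return len(out[u] & out[v])


def count_triangles_edge_parallel(edges):
    out = oriented_adjacency(edges)
    # One independent work item per oriented edge (one GPU thread each, by assumption).
    work_items = [(u, v) for u in out for v in out[u]]
    partial = [count_for_edge(out, u, v) for u, v in work_items]
    # Combining the partial counts corresponds to a parallel sum reduction on a GPU.
    return sum(partial)


edges = [("a", "b"), ("a", "d"), ("a", "f"), ("b", "e"), ("c", "f"), ("d", "f")]
print(count_triangles_edge_parallel(edges))  # 1
```

Ordering by node id keeps the sketch short; ordering nodes by degree is a common alternative that shortens the out-neighbor lists of high-degree nodes.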


Experiments


Possible Datasets for Experiments

• Router-level Internet topology (around 2.3M nodes and 4M edges)
– http://cheleby.cse.unr.edu/data.html

• Routing data on the Internet network (around 124K nodes and 207K edges)
– http://vlado.fmf.uni-lj.si/pub/networks/data/web/web.zip

• A mobile phone graph (around 2.7M nodes and 6M edges)
– Will be requested from the authors of “Structure of neighborhoods in a large social network”

• Biological data
– http://www.biomedcentral.com/1471-2105/10/196

• Wikipedia graph (around 1.6M nodes and 18.5M edges)
– I haven’t decided how to do it yet.

• I will generate sample graphs with different numbers of triangles.
– I haven’t decided how to do it yet.
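How to generate the sample graphs is still open. One simple option, offered only as an illustrative assumption and not as the author's plan: start from a random bipartite graph, which cannot contain a triangle because every triangle is an odd cycle, and add k triangles on fresh nodes. The result has exactly k triangles, which makes it easy to check a counter's output. The planted triangles are isolated from the rest of the graph, so such graphs will not resemble real networks; the function names and parameters (n_left, n_right, p, k, seed) are hypothetical.

```python
import random
from itertools import combinations


def planted_triangle_graph(n_left, n_right, p, k, seed=0):
    """Random bipartite graph (triangle-free) plus k triangles on fresh nodes."""
    rng = random.Random(seed)
    # Each left-right pair becomes an edge with probability p; no edge is added
    # inside a side, so this part has no odd cycles and hence no triangles.
    edges = [(("L", i), ("R", j))
             for i in range(n_left) for j in range(n_right)
             if rng.random() < p]
    # Plant k disjoint triangles on new nodes, not connected to anything else.
    for t in range(k):
        a, b, c = ("T", t, 0), ("T", t, 1), ("T", t, 2)
        edges += [(a, b), (b, c), (a, c)]
    return edges


def count_triangles(edges):
    """Reference count: adjacent neighbor pairs summed over all nodes, divided by 3."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = sum(1 for v in adj for x, y in combinations(adj[v], 2) if y in adj[x])
    return closed // 3


edges = planted_triangle_graph(n_left=30, n_right=40, p=0.3, k=7, seed=1)
print(count_triangles(edges))  # 7
```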


Results

• Triangle Counting CPU vs. GPU:

[Figure: execution time vs. number of nodes (100; 1,000; 10,000; 100,000), CPU vs. GPU]

Results

• Triangle Counting CPU vs. GPU:

[Figure: execution time vs. number of edges (x, 1.5x, 2x, 3x) with the number of nodes held constant, CPU vs. GPU]

Results

• Triangle Counting with different numbers of triangles:

[Figure: execution time vs. number of triangles (x, 1.5x, 2x, 3x, 5x, 10x), CPU vs. GPU]

Results

• Triangle Counting with different block sizes:

[Figure: GPU execution time vs. block size (x, 1.5x, 2x, 3x, 5x, 10x)]

Future Work

Structural Graph Indexing (SGI)

• We propose an alternative structural indexing approach to search and process queries efficiently even in very large graphs.

• As indexing features, we use commonly observed graph structures: star, complete bipartite, triangle and clique.

• These structures are ubiquitous in biological, chemical, technological, social, and many other complex networks.


Structural Models

[Figure: structural models: 3-star (K1,3), n-star (K1,n), 2*3 complete bipartite (K2,3), 3*4 complete bipartite (K3,4), m*n complete bipartite (Km,n), triangle (K3), 4-clique (K4), n-clique (Kn)]

Structural Graph Indexing

• An important feature of these structures is that each one is built from the previous one: cliques contain complete bipartite structures, and complete bipartite structures contain star structures.

• So, we can index these structures within the original graph consecutively. We first identify star structures, and then the complete bipartite and clique structures from the preceding ones.


Structural Graph Indexing

• An important difference between our approach and previous studies is that we do not limit the size of the subgraphs considered in indexing. We index all maximal subgraphs that match the structure formulation. For instance, a maximal clique is a clique that cannot be extended by adding one more vertex from the graph. However, the substructure size in indexing may be limited when needed, since the clique problem is known to be NP-complete and a graph can contain exponentially many maximal cliques.


Complete Bipartite

SIMPLEX’10

• The second structure we index is the complete bipartite structure, shown in the figures below.

• A complete bipartite graph G = (V1 ∪ V2, E) is a bipartite graph in which V1 and V2 are two disjoint vertex sets and, for any two vertices vi ∈ V1 and vj ∈ V2, there is an edge between them (i.e., ∃ e(vi, vj) ∈ E).

[Figure: complete bipartite structures K3,4, K2,3, and Km,n]

Complete Bipartite


• Complete bipartite structures are ubiquitous in many complex networks:
– protein-protein interaction networks (Thomas et al.)
– the Internet (Fay et al.)

• We index all complete bipartite structures in the graph G using the indexed star structures.

• For each star structure s(a, ns), where a ∈ V and ns is the neighbor set of node a, we identify the maximal complete bipartite structures involving node ‘a’.


Complete Bipartite


Algorithm:

• Find the second-hop neighbors of ‘a’ by iterating over the ns set, and collect them in the Lcan set, which holds the candidates for the left-hand side of the complete bipartite structure, while the ns set is the candidate set for the right-hand side.

• Then, find a K2,n and grow it into a Km,n. To find a K2,n, iterate over each candidate node in Lcan and determine its neighbor intersection with a. If the intersection set contains more than two nodes, these nodes form the right-hand side (Rnew).

• Grow the K2,n by finding all left-hand side candidates (i.e., nodes in Lcan) that have every right-hand side node (i.e., every node in Rnew) as a neighbor.

[Figure: node a with right candidate set ns1, ..., nsn and left candidate set ls1, ..., lsn]
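A minimal Python sketch of the complete bipartite steps for one star s(a, ns), assuming the graph is given as an adjacency mapping from each node to its neighbor set. For a candidate c, Rnew = ns ∩ N(c) cannot be extended, because every right-hand node must be adjacent to both a and c; the left-hand side is then every candidate adjacent to all of Rnew. Two interpretations are ours: `min_right=3` encodes "more than two" nodes in the intersection, and results are stored as pairs of frozensets so that a structure found from different candidates is kept once. `maximal_bicliques_for_star` is a hypothetical name.

```python
def maximal_bicliques_for_star(adj, a, min_right=3):
    """Complete bipartite structures (left, right) grown from the star s(a, ns)."""
    ns = adj[a]  # candidate set for the right-hand side
    # Lcan: second-hop neighbors of a, collected by iterating over ns.
    lcan = set().union(*(adj[x] for x in ns)) - {a}
    found = set()
    for c in lcan:
        # K2,n: the neighbors shared by a and c form the right-hand side Rnew.
        r_new = ns & adj[c]
        if len(r_new) < min_right:  # min_right=3 reads "more than two" literally
            continue
        # Grow to Km,n: every candidate that has all of Rnew as neighbors.
        left = {a} | {cand for cand in lcan if r_new <= adj[cand]}
        found.add((frozenset(left), frozenset(r_new)))
    return found


left_side = ["v1", "v2", "v3"]
right_side = ["d1", "d2", "d3", "d4"]
adj = {v: set(right_side) for v in left_side}
adj.update({d: set(left_side) for d in right_side})
for left, right in maximal_bicliques_for_star(adj, "v1"):
    print(sorted(left), sorted(right))  # ['v1', 'v2', 'v3'] ['d1', 'd2', 'd3', 'd4']
```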

Clique


• Finally, we index clique structures, shown in the figures below.

• A clique in a graph G = (V, E) is a subset of the vertex set (i.e., C ⊆ V) such that there is an edge between every pair of its nodes (i.e., ∀ ci, cj ∈ C with i ≠ j, ∃ e(ci, cj) ∈ E).

• We index all maximal clique structures in the graph using the complete bipartite structures.

[Figure: 4-clique (K4), triangle (K3), and n-clique (Kn)]

Clique


• This structure has been observed and utilized in many fields:
– computational biology, e.g., protein structure prediction (Samudrala et al.)
– electronic circuits (Cong et al.)
– chemicals in a chemical database (Rhodes et al.)


Clique


Algorithm:

• First, get the set of nodes from each complete bipartite structure k(m,n) and look for cliques formed by those nodes.

• The clique search algorithm works recursively: each node of k(m,n) serves in turn as the pivot node in the L1 set, and the other nodes are considered as candidate nodes in the L2 set.

• The function moves each node from the L2 set to the L1 set if it is connected to all nodes in L1, and then recursively tries to grow the structure with the remaining nodes as candidates.

• When there are no more candidates to consider in the L2 set, a clique has been identified.

[Figure: complete bipartite structure K3,4 (v1, v2, v3 and d1, d2, d3, d4) with its nodes arranged in Set1 and Set2]
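A minimal Python sketch of the recursive clique search, assuming an adjacency mapping from each node to its neighbor set and the node set of one k(m,n) as input. Here L1 is the growing clique and L2 the candidate set; L2 is filtered to nodes connected to all of L1 before each step, so when it becomes empty, L1 is a maximal clique of the subgraph induced by the node set. Results are stored as frozensets because the same clique is reached from several pivots. The search is exponential in the worst case; the Bron–Kerbosch algorithm with pivoting is the standard refinement of this kind of backtracking. `cliques_in_node_set` is a hypothetical name.

```python
def cliques_in_node_set(adj, nodes):
    """Maximal cliques of the subgraph induced by `nodes`, via the L1/L2 recursion."""
    nodes = set(nodes)
    found = set()

    def grow(l1, l2):
        # Keep only candidates connected to every node already in L1.
        cands = {v for v in l2 if l1 <= adj[v]}
        if not cands:
            # No remaining node can extend L1, so L1 is a maximal clique here.
            found.add(frozenset(l1))
            return
        for v in cands:
            # Move v from L2 to L1 and recurse on the remaining candidates.
            grow(l1 | {v}, cands - {v})

    for pivot in nodes:
        grow({pivot}, nodes - {pivot})
    return found


edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 5), (2, 5)]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(sorted(c) for c in cliques_in_node_set(adj, adj.keys())))
# [[1, 2, 3, 4], [1, 2, 5]]
```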

Where to Submit

• Advances in Social Network Analysis and Mining (ASONAM 2011):
– Full paper submission deadline is March 1, 2011.
– Full paper manuscripts are limited to 8 pages (IEEE two-column template).
– Kaohsiung, Taiwan, 7/25-7/27

• Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2011)
• Workshop on Mining and Learning with Graphs (MLG 2011)
• Workshop on Social Network Mining and Analysis (SNAKDD 2011)
– Full paper submission deadline is May 4-10, 2011.
– Full paper manuscripts are limited to 10 pages (ACM two-column template).
– San Diego, CA, 8/21-8/24

• Simplifying Network Science for Practitioners (SIMPLEX 2011):
– Full paper submission deadline is Jan 31-Feb 19, 2011.
– Full paper manuscripts are limited to 6-10 pages (IEEE two-column template).
– Minneapolis, Minnesota, USA, 6/20-6/24


Questions


Thank you
