
High-performance Graph Analyticskxm85/papers/Madduri_UB2015_slides.pdf · 2015-11-15 ·...

Date post: 17-Feb-2020

Papers, code, slides at graphanalysis.info

High-performance Graph Analytics

Kamesh Madduri

Computer Science and Engineering

The Pennsylvania State University

[email protected]

Acknowledgments

• NSF grants ACI-1253881, CCF-1439075

• DOE Office of Science through the FASTMath SciDAC Institute

– Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

• Use of NERSC systems (supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231)

Madduri group research areas: scientific computing, parallel algorithms for irregular problems

Design and implementation of scalable parallel algorithms to accelerate irregular and data-intensive problems in computational science

– Complex network analysis (this talk)

– Computational genomics

– Fusion physics

– Computational cosmology

“Wordle” from recent paper abstracts

Talk Overview

• Why high-performance graph analytics?

• FASCIA: Fast Subgraph Counting

• PULP: Partitioning using Label Propagation

• Other research projects

Graph abstractions are pervasive

• Sources of ‘big data’: the Internet, intelligence and surveillance applications, sensor networks, medical applications, petascale simulations, experimental devices.

• New challenges for analysis: data sizes, data heterogeneity, uncertainty, data quality, and dynamic/temporal nature.

Astrophysics: e.g., outlier detection.

Bioinformatics: e.g., identifying drug target proteins.

Social informatics: e.g., advertising, modeling spread of information.

Network Analysis for Intelligence and Surveillance

• [Krebs ’04] Post-9/11 terrorist network analysis from public-domain information

• Plot masterminds correctly identified from interaction patterns: centrality

• A global view of entities is often more insightful

• Detect anomalous activities by exact/approximate subgraph isomorphism.

Image Source: http://www.orgnet.com/hijackers.html

Image Source: T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, CACM, 47 (3, March 2004): pp 45-47

Characterizing Graph-theoretic computations

[Diagram; labels grouped below. Axes: Data size, Problem Complexity]

Application Areas: Social Network Analysis, WWW, Computational Biology, Scientific Computing, Engineering

Methods/Problems: Find central entities, Community detection, Network dynamics; Marketing, Social Search; Gene regulation, Metabolic pathways, Genomics; Graph partitioning, Matching, Coloring; VLSI CAD, Route planning

Graph Algorithms: Traversal, Shortest Paths, Connectivity, Max Flow

Architectures: GPUs, FPGAs, x86 multicore servers, Massively multithreaded architectures, Multicore Clusters, Clouds

Characterizing Graph-theoretic computations

Input: Graph abstraction

Problem: Find ***

• paths

• clusters

• partitions

• matchings

• patterns

• orderings

Factors that influence choice of algorithm

• graph sparsity (m/n ratio)

• static/dynamic nature

• weighted/unweighted, weight distribution

• vertex degree distribution

• directed/undirected

• simple/multi/hyper graph

• problem size

• granularity of computation at nodes/edges

• domain-specific characteristics

Kernels

• traversal

• shortest path algorithms

• flow algorithms

• spanning tree algorithms

• …

Graph problems are often also recast as sparse linear algebra (e.g., partitioning) or linear programming computations

Parallel algorithm engineering challenges

• Graph topology assumptions in classical algorithms do not match real-world data

• Parallelization strategies conflict with techniques for enhancing memory locality

• Classical “work-efficient” graph algorithms may not fully exploit new architectural features

– Increasing complexity of memory hierarchy (x86), wide SIMD (GPUs, Xeon Phi).

• Tuning implementation to minimize parallel overhead is non-trivial

– Shared memory: minimizing overhead of locks, barriers.

– Distributed memory: bounding message buffer sizes, bundling messages, overlapping communication w/ computation.

Speedup Insight: Exploiting network structure

[Figure: vertex degree distribution of a human protein interaction network (18669 proteins, 43568 interactions); frequency vs. vertex degree, logarithmic axes]

[Figure captions: “High-dimensional data, low graph diameter, irregular degree distributions” and “Planar, high diameter”]

• Low graph diameter

– Key source of concurrency in graph traversal.

• Skewed degree distributions

– Parallel algorithm (or the load-balancing strategy) must be cognizant of this fact.

• Very sparse networks

– Choose graph representations, data structures accordingly.

• Exploit/enforce locality

– Vertex reordering, be frugal in memory utilization.

Our recent work: Algorithms and software for large-scale graph analysis

• Subgraph search and enumeration

• Complex network partitioning

• Scalable algorithms for multicore, manycore, distributed-memory, and cloud platforms

• Sparse Graph ↔ Sparse Matrix computations

• Dynamic graph computations

• Web data analytics, Protein interaction network analysis, RDF/SPARQL data analytics

Our recent work: publications

• Subgraph isomorphism-related [ParCo15, IPDPS14a, ICPP13]

• Graph partitioning: PULP [BigData14, DIMACS13]

• “Sub-quadratic work” graph algorithms

– Strongly connected components [IPDPS14b]

– Biconnected components [HiPC14]

– Single-source shortest paths [CSC14]

– Approx. Betweenness Centrality [SC12]

– BFS [SC11]

– Manycore graph algorithms [IPDPS15]

– Triad census, PageRank, Approx. K-core

• Applications

– Graph analysis in Hydrology [EGAS15, TPDS16/to appear]

– RDF data stores and SPARQL query processing [Chi14]

– Web Data Commons hyperlink graph analytics

– New community ranking method for weighted social networks

FASCIA: Fast Approximate Subgraph Counting

• Background

• Motivating Applications

• Color-coding overview

• Our new parallel algorithm and optimizations

• Results

Subgraph counting

Subgraph enumeration

Induced vs. Non-induced subgraphs
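The distinction can be illustrated with a small brute-force sketch. It is not taken from the slides: the function name `count_path3` and the toy graph are assumptions chosen for illustration. A non-induced occurrence of the 3-vertex path a-b-c only requires the two template edges to be present; an induced occurrence additionally requires that a and c are not adjacent.

```python
from itertools import permutations

def count_path3(edges, induced):
    """Count occurrences of the 3-vertex path a-b-c in an undirected graph.

    Non-induced: edges (a, b) and (b, c) must exist.
    Induced: additionally, (a, c) must NOT be an edge.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for a, b, c in permutations(adj, 3):
        # a < c avoids counting the same path from both ends.
        if a < c and b in adj[a] and c in adj[b]:
            if induced and c in adj[a]:
                continue
            count += 1
    return count

# Toy graph (hypothetical): triangle 0-1-2 plus a pendant vertex 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
non_induced = count_path3(edges, induced=False)
induced = count_path3(edges, induced=True)
```

The three paths inside the triangle are non-induced occurrences only, because their endpoints are adjacent; the two paths through the pendant vertex are also induced occurrences.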

Motivation: Why fast algorithms for subgraph counting?

• Important in social network analysis, bioinformatics, chemoinformatics, communication network analysis, etc.

• Forms the basis for higher-order network analyses

– Motif finding

– Graphlet frequency distance (GFD)

– Graphlet degree distributions (GDD)

– Graphlet degree signatures (GDS)

• Exact counting and enumeration on large networks is very compute-intensive: $O(n^k)$ work complexity for the naïve algorithm

Motifs: frequently-occurring subgraphs of certain size and structure

[Figure panels: C. elegans, E. coli, S. cerevisiae, H. pylori]

Graphlet Frequency Distance (GFD)

• Numerically compare occurrence frequency to other networks

Graphlet Degree Distribution (GDD)

• Numerically compare embedding distributions

Color-coding approximation strategy

• Alon et al., 1995: approximate counting of tree-like non-induced subgraphs

• Example: template with k = 3 vertices; larger graph with n = 12, m = 15

• Randomly “color” the vertices of the graph with k colors

• Identify colorful embeddings, i.e., embeddings of the template in which every vertex has a distinct color

• In the example, $\mathrm{Cnt}_{\mathrm{colorful}} = 3$; probability that an embedding is colorful $= 3!/3^3$

• Perform multiple ($\sim e^k$) coloring iterations

• Each iteration requires $O(m\,2^k)$ work
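The scaling step of the example can be worked through in a few lines. This is a minimal sketch under stated assumptions: the function names are illustrative, the estimate is the mean colorful count divided by the probability k!/k^k that a fixed embedding is colorful, and any template-specific correction (for example, for template automorphisms) is ignored.

```python
from math import factorial

def colorful_probability(k):
    """Probability that k vertices receive k distinct colors out of k."""
    return factorial(k) / k**k

def estimate_total(colorful_counts, k):
    """Scale the mean colorful count over iterations by 1 / P(colorful)."""
    mean_colorful = sum(colorful_counts) / len(colorful_counts)
    return mean_colorful / colorful_probability(k)

p3 = colorful_probability(3)          # 3!/3^3 = 6/27
estimate = estimate_total([3], k=3)   # one iteration with 3 colorful embeddings
```

With the slide's numbers (3 colorful embeddings, k = 3), a single iteration gives an estimate of 3 / (6/27) = 13.5; averaging over more random colorings reduces the variance of this estimate.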

FASCIA Algorithm and Parallelization Overview

• Template partitioning

• Count number of colorful occurrences of template

– Memory-intensive step

– $O(m\,2^k)$ work; $O(n\,2^k)$ space requirements

– Optimizations, parallelization of dynamic programming-based counting step

• Estimate the total number of occurrences
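To make the dynamic-programming counting step concrete, here is a minimal serial sketch restricted to path templates. It is not the FASCIA implementation: general tree templates require the template-partitioning step listed above, and the function names and adjacency-dict representation are assumptions. For every vertex, the table stores one counter per set of colors used, which is where the $O(n\,2^k)$ space and $O(m\,2^k)$ work per iteration come from.

```python
import random
from math import factorial

def count_colorful_paths(adj, colors, k):
    """Count paths on k >= 2 vertices whose vertices all have distinct colors.

    adj: dict vertex -> iterable of neighbors (undirected graph).
    colors: dict vertex -> color in range(k).
    """
    # table[v][mask] = number of colorful paths ending at v that use
    # exactly the colors in the bitmask `mask`.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for v in adj:
            bit = 1 << colors[v]
            for u in adj[v]:
                for mask, cnt in table[u].items():
                    if not mask & bit:  # color of v not used yet
                        nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    total = sum(sum(t.values()) for t in table.values())
    return total // 2  # each undirected path is found from both ends

def estimate_paths(adj, k, iterations, seed=0):
    """Average colorful counts over random colorings and rescale."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k**k
    total = 0
    for _ in range(iterations):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / iterations / p_colorful
```

Because distinct colors imply distinct vertices, the table never needs to remember which vertices a partial path visited, only which colors it used.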

FASCIA Algorithm

Counting step

Test network families, example templates

Network type                         # of networks   # of edges in largest network
PINs                                 8               22 K
Web crawls                           4               3.9 M
Social networks                      6               5.4 M
Road                                 5               2.8 M
Collaboration                        6               1.05 M
Large soc. net (Orkut)               1               117 M
Large synth. urban pop. (Portland)   1               31 M

Error (H. pylori network, subgraphs of size 7)

Error (Enron email network)

Graphlet degree distributions

[Figure panels: Enron, Portland, Slashdot, G(n,p) random]

Graphlet frequency distribution agreement scores heatmap

[Figure groups: Road networks, P2P, Collaboration networks, PINs]

How similar are PINs to each other?

Execution times for various template sizes

Portland network (31M edges)

Single node performance (Intel Sandy Bridge server, 16 cores)

Shared-memory strong scaling

Portland network (31M edges), U12-2 template

Single node performance (Intel Sandy Bridge server, 16 cores)

1 color-coding iteration

11.8x speedup

Multi-node strong scaling

Orkut network (117M edges)

Performance on an Intel Sandy Bridge cluster (1-15 nodes)

6.8x speedup for U12-2
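The reported speedups can be turned into parallel efficiencies with simple arithmetic. The sketch below assumes the 11.8x figure is relative to a single core on the 16-core node and the 6.8x figure is relative to a single node on the 15-node run; the slides do not state the baselines explicitly, so both are assumptions.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved."""
    return speedup / workers

# Assumed baselines: 1 core for the shared-memory run, 1 node for the cluster run.
eff_single_node = parallel_efficiency(11.8, 16)  # about 0.74
eff_multi_node = parallel_efficiency(6.8, 15)    # about 0.45
```

Under these assumptions the shared-memory run retains roughly three quarters of ideal scaling, while the multi-node run retains a bit less than half.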

Multi-node strong scaling (communication time)

Orkut network (117M edges)

Performance on an Intel Sandy Bridge cluster (1-15 nodes)

No scaling: communication volume is proportional to the number of MPI tasks!

Reduction in memory utilization

Our new parallel approach: FASCIA

• For subgraph counting: Parallel and memory-efficient implementation of an approximation algorithm based on the color-coding technique

– $O(2^k e^k m)$ work (exhaustive search requires $O(n^k)$ work)

• Significantly faster (at least 10X) than prior parallel color-coding implementations

• FASTPATH: Color-coding applied to enumerate high-scoring paths in biological networks

• fascia-psu.sourceforge.net, fastpath-psu.sourceforge.net

PULP: Partitioning using Label Propagation

• Multi-constraint, multi-objective method for partitioning complex networks

– But not multilevel!

• Constraints: for each vertex partition, ensure that

1. $(1 - \epsilon_L)\,\frac{n}{p} \le \text{partition size} \le (1 + \epsilon_U)\,\frac{n}{p}$

2. $\text{intra-partition edge count} \le (1 + \eta_U)\,\frac{2m}{p}$

• Objectives: reduce

1. Edge cut (total number of inter-partition edges)

2. Max inter-partition edge cut
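The two constraints and two objectives can be evaluated for any given partition with a short sketch. This is not PULP code: the function name, the dictionary-based partition representation, and the parameter values in the example are assumptions. The bounds are written as (1 - eps_l)·n/p, (1 + eps_u)·n/p, and (1 + eta_u)·2m/p, following the slide, and "max inter-partition edge cut" is interpreted as the largest number of cut edges incident to any single part.

```python
from collections import Counter

def check_partition(edges, part, p, eps_l, eps_u, eta_u):
    """Evaluate balance constraints and cut objectives for a p-way partition.

    edges: list of undirected (u, v) pairs; part: dict vertex -> part id in range(p).
    """
    n, m = len(part), len(edges)
    sizes = Counter(part.values())
    intra = Counter()         # edges with both endpoints in the same part
    cut_per_part = Counter()  # cut edges incident to each part
    edge_cut = 0
    for u, v in edges:
        pu, pv = part[u], part[v]
        if pu == pv:
            intra[pu] += 1
        else:
            edge_cut += 1
            cut_per_part[pu] += 1
            cut_per_part[pv] += 1
    lo, hi = (1 - eps_l) * n / p, (1 + eps_u) * n / p
    edge_hi = (1 + eta_u) * 2 * m / p
    return {
        "vertex_ok": all(lo <= sizes[q] <= hi for q in range(p)),
        "edge_ok": all(intra[q] <= edge_hi for q in range(p)),
        "edge_cut": edge_cut,
        "max_part_cut": max((cut_per_part[q] for q in range(p)), default=0),
    }

# Hypothetical example: a 4-cycle split into two halves.
stats = check_partition(
    edges=[(0, 1), (1, 2), (2, 3), (3, 0)],
    part={0: 0, 1: 0, 2: 1, 3: 1},
    p=2, eps_l=0.75, eps_u=0.1, eta_u=0.5,
)
```

In the example both parts hold 2 vertices and 1 internal edge, so both constraints hold, and the two cut edges touch both parts.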

PULP main results

• Memory-efficient: 8-40X reduction in memory utilization compared to competing methods

• Partitioning quality comparable to Metis and ParMetis for a collection of large web crawls and social networks.

• Fast: 42 s on sk-2005 (1.8 billion edges), 530 s on Twitter (1.6 billion edges) on a 16-core, 64 GB Intel system.

What are complex networks?

• We (humans, primarily the Network Science community) create them and term them complex

– Mostly “virtual” or physical topology + virtual interactions

– Complex network = Graph + vertex/edge heterogeneity + multi* + uncertainty + incompleteness + dynamics + vertex/edge metadata + finer-grained communication + …

• What aren’t complex networks?

– Road networks

– Meshes from scientific simulations

– Meshes with underlying 2D/3D topologies

Our definition of complex networks

• Low (O(log n)) graph diameter

• Low (typically O(1)) mean shortest path length

• Skewed vertex degree distributions

• Sparse: m = O(n log n)

• m > 10,000

• High-dimensional

Complex networks lack good partitions

• Our observation: several real-world graphs are expander-like; reduction in edge cut over a random partitioning may be less than 5%, for graphs with million+ edges and 32-way partitioning

• Leskovec et al. [WWW08] studied 100 large networks, observed presence of several tight communities of size 100 in most networks
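To put the "less than 5%" figure in context, the random-partitioning baseline can be computed directly. The sketch assumes each vertex is assigned to one of p parts independently and uniformly at random and that the graph has no self-loops; the function names are illustrative. Under that assumption an edge is cut with probability 1 - 1/p.

```python
import random

def expected_random_cut_fraction(p):
    """Expected fraction of edges cut by a uniform random p-way partition."""
    return 1.0 - 1.0 / p

def random_cut_fraction(edges, vertices, p, seed=0):
    """Measure the cut fraction of one random p-way partition."""
    rng = random.Random(seed)
    part = {v: rng.randrange(p) for v in vertices}
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return cut / len(edges)

baseline_32 = expected_random_cut_fraction(32)   # 0.96875
after_5_percent_reduction = 0.95 * baseline_32   # about 0.92
```

For 32-way partitioning the random baseline already cuts about 97% of the edges, so a 5% reduction still leaves roughly 92% of the edges cut.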

However …

• There is a substantial reduction in total edge cut (over random partitioning) for some networks

– Good results for web crawls with high average vertex degree (~ 100)

• Partition graphs in an exploratory manner?

Why partition complex networks?

• Reduce overhead of data replication

• Distributed-memory graph computations

– Reduced total edge cut may lead to reduced inter-processor communication

• In addition to vertex balance, edge balance is also very important

– Add it as a constraint

PULP algorithm

1. Assign each vertex to one of p partitions randomly

2. Degree-weighted label propagation (LP)

3. for k1 iterations do

       for k2 iterations do

           Balance partitions with LP to satisfy vertex constraint

           Heuristically improve partitions to reduce edge cut

       for k3 iterations do

           Balance partitions with LP to satisfy edge constraint and minimize max per-part edge cut

           Heuristically improve partitions to reduce edge cut

Label propagation

• Iteratively propagate vertex labels along links

• Popular algorithm for community detection [Raghavan et al., 2007]: iteratively assign to each vertex the label with the maximal count among its neighbors

• Theoretical convergence bounds for unweighted graphs

• Fast convergence in practice
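The basic label propagation rule can be sketched as follows. This is a plain, unweighted, serial version for illustration, not the degree-weighted or constraint-aware variants used inside PULP; the function name, the tie-breaking rule (keep the current label if it is among the most frequent, otherwise pick at random), and the iteration cap are assumptions.

```python
import random
from collections import Counter

def label_propagation(adj, max_iters=10, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj: dict vertex -> list of neighbors. Returns dict vertex -> label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every vertex starts with a unique label
    order = list(adj)
    for _ in range(max_iters):
        rng.shuffle(order)
        changed = False
        for v in order:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue  # current label is already a most frequent one
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # no vertex changed its label in a full sweep
    return labels

# Hypothetical example: two disjoint triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels = label_propagation(adj)
```

On the two disjoint triangles each component settles on a single label after the first sweep, and the two components keep different labels because labels only travel along edges.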

PULP with a toy example

Infectious network from KONECT (http://konect.uni-koblenz.de/): 410 vertices, 17298 edges

PULP Step 1. Random initialization

PULP Step 2. Degree-weighted label propagation

PULP Step 3. Satisfy vertex constraint, reduce total edge cut

PULP Step 4. Satisfy edge constraint, reduce max per-part edge cut

Experimental study

• Intel Xeon E5-2670 system, dual-socket, 8 cores per socket, 64 GB memory

• Test graphs

– LAW graphs from UF Sparse matrix collection

– Large graphs from SNAP, Koblenz, MPI repositories

– 60K-70M vertices, 275K-2B edges

• Quality and time comparisons to Metis (v5.1.0), Metis (v5.1.0) with multiple constraints, ParMetis (v4.0.3), KaFFPa-FastSocial (v0.62, serial)

• 2-128 partitions, serial and parallel time, peak memory use

Some of the graphs used

Peak memory use (128-way partitioning)

Balance constraints and other parameters

• Vertex lower balance: 0.25n/p

• Vertex upper balance: 1.1n/p

• Edge upper balance parameter: 1.5

• 3 iterations of degree-weighted label propagation

• 5 iterations of outer loop (k1)

– 5 iterations for objective 1 (k2)

– 10 iterations for objectives 1 and 2 (k3)

Time (p = 32)

PULP partitioning improves analytic performance

FASCIA performance on LJ graph: random partitioning vs. PULP partitioning

Current PULP-related work

• Partitioning with vertex weights

• Make it single-objective again: why do total edge cut?

• Swap order of edge and vertex balance constraints

• Parameter sensitivity

• Partitioning with vertex delegates

• Distributed-memory, scaling to larger graphs

• Performance of graph analytics before/after partitioning

Talk Overview

• Why high-performance graph analytics?

• FASCIA: Fast Subgraph Counting

• PULP: Partitioning using Label Propagation

• Other research projects

– Genomics: Accelerating the genetic variant detection workflow

– Fusion physics: Parallel particle-in-cell method

Thank you!

• Questions?

• [email protected]

• http://www.cse.psu.edu/~madduri

• graphanalysis.info

• sites.psu.edu/XPSGenomics

