Parallel Triangle Counting and K-Truss Identification Using Graph-Centric Methods
Chad Voegele, Yi-Shan Lu, Sreepathi Pai, Keshav Pingali
The University of Texas at Austin
09/13/2017
Page 1: Parallel Triangle Counting and K-Truss Identification Using Graph …yishanlu/slides/HPEC_2017... · 2017. 10. 18. · Parallel Triangle Counting and K-Truss Identification Using


Graph-Centric vs. Matrix-Centric Abstractions

Graph-centric
• Active element
  • Node/edge where computation is needed
• Operator
  • Computation at active element
  • Neighborhood: set of nodes/edges read/written by the update
• Parallelism
  • Disjoint updates
  • Read-only operators, e.g. triangle counting

Matrix-centric
• Bulk operations
  • Matrix-matrix/vector multiplication
  • Element-wise manipulation
  • Reduction
• Parallelism
  • Inside individual operations

[Figure: a graph with the active node and its neighborhood highlighted (graph-centric), and a sparse matrix product of the form matrix * matrix = matrix (matrix-centric).]
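As a rough illustration of a read-only operator, the Python sketch below counts triangles with one operator application per active node. It is not Galois code: `ADJ`, `count_at` and `count_triangles` are hypothetical names, the graph is the 6-node example that appears later in these slides, and a thread pool merely stands in for a parallel loop over active nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Adjacency lists of a small undirected example graph (assumed input).
ADJ = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 3, 4],
    3: [0, 1, 2, 4, 5],
    4: [2, 3, 5],
    5: [3, 4],
}


def count_at(u):
    """Operator for active node u: reads its neighborhood, writes nothing."""
    total = 0
    for v in ADJ[u]:
        if v <= u:
            continue
        # Count each triangle exactly once, as u < v < w.
        total += sum(1 for w in ADJ[v] if w > v and w in ADJ[u])
    return total


def count_triangles():
    """Apply the operator to every node and reduce the partial counts."""
    # Read-only operators never conflict, so all nodes can be active at once.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_at, ADJ))


print(count_triangles())
```

Because the operator only reads its neighborhood, no synchronization is needed beyond the final sum.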

Galois: Graph-Centric Programming Framework

Shared-Memory Galois [1] (C++ Library)
• Parallel data structures
  • Graphs, bags, etc.
• Parallel loops over active elements
  • for_each, do_all, etc.
• Support for
  • Load balancing
  • Scheduling
  • Dynamic work

IrGL [2] (Compiler)
• Translates Galois programs to CUDA
• Applies GPU-specific optimizations
  • Iteration outlining
  • Cooperative conversion
  • Nested parallelism

[1] D. Nguyen, A. Lenharth and K. Pingali. “A lightweight infrastructure for graph analytics,” in SOSP 2013.
[2] S. Pai and K. Pingali. “A compiler for throughput optimization of graph algorithms on GPUs,” in OOPSLA 2016.

Advantages of Graph-Centric Approach

Eliminating Barriers in a Round

K-truss flow
• K-Truss begins
• Enumerate triangles
• Count number of triangles for edges
• Remove edges w/ insufficient support
• Do all edges have sufficient support?
  • No: start another round
  • Yes: K-Truss done

Graph-centric methods: Operator for edges
• Each round applies one operator per edge: Operator for e1, Operator for e2, Operator for e3, ..., Operator for en
• Barrier between rounds

Matrix-centric methods: Matrix operation for each step
• Matrix operation for triangle enumeration
• Barrier in a round
• Matrix operation for counting # triangles for edges
• Barrier in a round
• Matrix operation for removing selected edges
• Barrier in a round
• Reduction to check for edges w/ insufficient support
• Barrier between rounds
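The graph-centric round structure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `k_truss`, `operator` and `EDGES` are hypothetical names, Python sets replace the real graph data structure, and the per-edge loop is sequential here, whereas a framework would run it in parallel with suitable conflict handling.

```python
# Undirected edges of a small example graph (assumed input).
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3),
         (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]


def k_truss(edges, k):
    """Return the edges of the k-truss of an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    alive = {tuple(sorted(e)) for e in edges}

    def operator(edge):
        """Fused per-edge step: enumerate triangles, count, remove if needed."""
        u, v = edge
        support = len(adj[u] & adj[v])   # triangles through this edge
        if support >= k - 2:
            return False
        adj[u].discard(v)
        adj[v].discard(u)
        alive.discard(edge)
        return True

    changed = True
    while changed:                        # one iteration = one round
        changed = False
        for edge in sorted(alive):        # stand-in for a parallel edge loop
            changed |= operator(edge)
        # The only synchronization point is here, between rounds.
    return alive


print(sorted(k_truss(EDGES, 4)))
```

Enumeration, counting and removal happen inside a single operator, so no phase has to wait for the previous phase to finish on every edge.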

Exploiting Domain Knowledge in Operators

Graph as Compressed Sparse Row (CSR), for an example graph with nodes 0 to 5

EdgeRange:   0 3 6 10 15 18 20
EdgeDst:     1 2 3 0 2 3 0 1 3 4 0 1 2 4 5 2 3 5 3 4
EdgeRemoved: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

• Sorted edge lists to speed up edge list intersection from O(deg(u)*deg(v)) to O(deg(u)+deg(v))
• Sorted edge lists to locate edges using binary search when removing edges
• Early termination when edge support reaches k - 2.
• Edge removals may be visible in current round, reducing the number of rounds.

[Flowchart: K-Truss begins, Enumerate triangles, Count number of triangles for edges, Remove edges w/ insufficient support, Do all edges have sufficient support? (No: repeat; Yes: K-Truss done).]
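The three optimizations on this slide can be sketched over the CSR arrays shown above. The code is a hedged Python illustration, not the Galois operator: `find_edge`, `has_support` and `remove_edge` are hypothetical helper names, and only the array contents come from the slide.

```python
from bisect import bisect_left

# CSR arrays of the 6-node example graph from this slide.
EDGE_RANGE = [0, 3, 6, 10, 15, 18, 20]
EDGE_DST = [1, 2, 3, 0, 2, 3, 0, 1, 3, 4, 0, 1, 2, 4, 5, 2, 3, 5, 3, 4]
edge_removed = [0] * len(EDGE_DST)


def find_edge(u, v):
    """Binary search for edge u->v in u's sorted edge list; -1 if absent."""
    lo, hi = EDGE_RANGE[u], EDGE_RANGE[u + 1]
    i = bisect_left(EDGE_DST, v, lo, hi)
    return i if i < hi and EDGE_DST[i] == v else -1


def has_support(u, v, k):
    """Merge-style intersection that stops once k - 2 triangles are found."""
    need = k - 2
    if need <= 0:
        return True
    i, i_end = EDGE_RANGE[u], EDGE_RANGE[u + 1]
    j, j_end = EDGE_RANGE[v], EDGE_RANGE[v + 1]
    found = 0
    while i < i_end and j < j_end:        # O(deg(u) + deg(v)) steps
        if edge_removed[i]:
            i += 1
        elif edge_removed[j]:
            j += 1
        elif EDGE_DST[i] < EDGE_DST[j]:
            i += 1
        elif EDGE_DST[i] > EDGE_DST[j]:
            j += 1
        else:                             # common neighbor closes a triangle
            found += 1
            if found >= need:             # early termination
                return True
            i += 1
            j += 1
    return False


def remove_edge(u, v):
    """Mark both directions of the undirected edge u-v as removed."""
    for a, b in ((u, v), (v, u)):
        idx = find_edge(a, b)
        if idx >= 0:
            edge_removed[idx] = 1
```

Marking entries in `edge_removed` in place is what lets a removal become visible to operators that run later in the same round.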

Avoiding Runtime Memory Management

Graph-centric methods: Load graphs and update node/edge data in the graphs

EdgeRange: 0 3 6 10 15 18 20
EdgeDst:   1 2 3 0 2 3 0 1 3 4 0 1 2 4 5 2 3 5 3 4
EdgeData:  e e e e e e e e e e e e e e e e e e e e
NodeData:  n n n n n n

Fixed after graphs are loaded.

Matrix-centric methods: Construct matrices at runtime

[Figure: a matrix product ("* =") with panels labeled Adjacency matrix, Incidence matrix, and Product matrix.]

Needs runtime memory management.

Advantages of Graph-Centric Approach

• Eliminates barriers in a round

• Exploits domain knowledge in operators

• Avoids runtime memory management

Experimental Setup

Platform
• CPU
  • Broadwell-EP Xeon E5-2650 v4 @ 2.2 GHz
  • 30 MB LLC, 192 GB RAM
  • g++ 4.9
  • 1, 12 or 24 threads
• GPU
  • Pascal-based NVIDIA GTX 1080
  • 8 GB RAM
  • NVCC 8.0

Baseline from IEEE HPEC static graph challenge [3]
• Triangle counting: serial miniTri in C++
• K-truss computation: reference implementation in Julia 0.60

Parameter
• Compute kmax-truss for each graph.
• kmax: the maximum k for which the graph has a non-empty k-truss.
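One way to read the kmax definition is as a search that raises k until the truss becomes empty. The Python sketch below is an assumption-laden illustration rather than the procedure used in the experiments: `k_truss` and `kmax_truss` are hypothetical names, and the search reuses the previous truss as input because the (k+1)-truss is always contained in the k-truss.

```python
def k_truss(edges, k):
    """Peel edges that lie in fewer than k - 2 triangles until none remain."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {(u, v) for u in adj for v in adj[u] if u < v}


def kmax_truss(edges):
    """Return (kmax, truss): the largest k with a non-empty k-truss."""
    k, truss = 2, {tuple(sorted(e)) for e in edges}
    while truss:
        candidate = k_truss(truss, k + 1)
        if not candidate:
            break
        k, truss = k + 1, candidate
    return k, truss


# Small example graph (assumed input): a 4-clique with a short tail.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3),
         (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
print(kmax_truss(EDGES))
```

For this example the search stops at the 4-clique on nodes 0 to 3, since none of its edges lies in three triangles.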

[Table not recovered: input graphs from [4], with the largest and smallest graphs marked.]

[3] S. Samsi et al. “Static graph challenge: subgraph isomorphism,” in IEEE HPEC, 2017.
[4] J. Leskovec and A. Krevl. SNAP datasets: Stanford large network dataset collection. Retrieved from http://snap.Stanford.edu/data, June 2014.

Runtime

K-Truss Runtime

[Figure: K-truss runtime chart. Lower is better. Axis labels include 4800 and a timeout marker.]

Speedup over Julia

Variant   Geo Mean
Julia         1.00
Cpu-01      428.87
Cpu-24      623.62
Gpu       2,213.14

End-to-end runtime after the graph is loaded and before the results are printed.

Maximum speedup of cpu-24 over cpu-01: 14.30X (~117M edges)
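The Geo Mean column is presumably the geometric mean of the per-graph speedups. A small Python sketch of that calculation follows; the function names are hypothetical and the runtimes are made up purely for illustration, they are not measurements from this talk.

```python
import math


def geo_mean(values):
    """Geometric mean: the n-th root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geo_mean_speedup(baseline_times, variant_times):
    """Geometric mean of the per-graph speedups baseline / variant."""
    return geo_mean([b / v for b, v in zip(baseline_times, variant_times)])


# Hypothetical runtimes in seconds for three graphs (not from the talk).
baseline = [8.0, 2.0, 27.0]
variant = [2.0, 1.0, 1.0]
print(round(geo_mean_speedup(baseline, variant), 2))
```

A geometric mean keeps one very large or very small graph from dominating the summary, which an arithmetic mean of ratios would allow.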

Triangles Runtime

[Figure: triangle counting runtime chart. Lower is better. Axis labels include 4800 and a timeout marker.]

Speedup over MiniTri

Variant   Geo Mean
MiniTri       1.00
Cpu-01      163.23
Cpu-24      380.57
Gpu       1,760.47

End-to-end runtime after the graph is loaded and before the results are printed.

Maximum speedup of cpu-24 over cpu-01: 17.22X (~15.7M edges)

Memory Usage

K-Truss Memory Usage

[Figure: K-truss memory usage chart. Lower is better. Reference lines: 192GB total CPU memory, 8GB total GPU memory.]

% over Julia

Variant   Geo Mean
Julia       100.00
Cpu-01        0.54
Cpu-24       11.05
Gpu           1.09

Measurement
• Julia: @time
• CPU: Galois’ internal allocator
• GPU: cudaMemGetInfo

Triangles Memory Usage

[Figure: triangle counting memory usage chart. Lower is better. Reference lines: 192GB total CPU memory, 8GB total GPU memory.]

% over MiniTri

Variant   Geo Mean
MiniTri     100.00
Cpu-01       94.31
Cpu-24      791.64
Gpu          50.14

Measurement
• CPU: Galois’ internal allocator
• GPU: cudaMemGetInfo

Energy Usage

K-Truss Energy Usage

[Figure: K-truss energy usage chart. Lower is better.]

% over Julia

Variant   Geo Mean
Julia       100.00
Cpu-01        2.27
Cpu-24        2.03
Gpu           0.48

Measurement
• Julia: Intel RAPL counters
• CPU: Intel RAPL counters
• GPU: nvprof

Triangles Energy Usage

[Figure: triangle counting energy usage chart. Lower is better.]

% over MiniTri

Variant   Geo Mean
MiniTri     100.00
Cpu-01       12.95
Cpu-24       12.07
Gpu           2.55

Measurement
• CPU: Intel RAPL counters
• GPU: nvprof

Conclusions

• Graph-centric methods deliver two to three orders of magnitude improvements over matrix-centric IEEE HPEC static graph challenge reference implementations.
• Advantages of graph-centric methods over matrix-centric methods
  • Eliminates barriers in a round.
  • Exploits domain knowledge in operators.
    • Early operator termination
    • On-the-spot edge removals
    • Sorting of edge lists for faster edge list intersections and edge removals
  • Avoids runtime memory management.

Thank you! Questions? Comments?
