Understanding the SIMD Efficiency of Graph Traversal on GPU

Date post: 06-Jan-2016
Page 1: Understanding the SIMD Efficiency of Graph Traversal on GPU

Understanding the SIMD Efficiency of Graph Traversal on GPU

Yichao Cheng, Hong An, Zhitao Chen, Feng Li, Zhaohui Wang, Xia Jiang and Yi Peng

University of Science and Technology of China

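Page 2: Understanding the SIMD Efficiency of Graph Traversal on GPU

Breadth-first Search (BFS)

[Figure: an example graph on vertices A through I, shown next to its BFS tree from the source vertex; the labels 1 to 4 mark the BFS level of each vertex.]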

Page 3: Understanding the SIMD Efficiency of Graph Traversal on GPU

Breadth-first Search (BFS)

[Figure: the example graph on vertices A through I.]

BFS_Iteration:
  for u ∈ Current Frontier do
    for v ∈ u's neighbors do
      if v has not been labeled then
        label v
        put v in Next Frontier
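The iteration above can be sketched in Python as a level-synchronous BFS. This is a minimal sequential sketch, not the authors' GPU code; the adjacency-dict representation, the name `bfs_levels`, and the example graph are assumptions made for illustration.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS; returns {vertex: level}, with the source at level 0."""
    level = {source: 0}          # a vertex counts as "labeled" once it has a level
    frontier = [source]          # Current Frontier
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []       # Next Frontier
        for u in frontier:
            for v in adj[u]:
                if v not in level:          # v has not been labeled
                    level[v] = depth        # label v
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Hypothetical example graph (not the one drawn on the slide).
adj = {
    "B": ["A", "C"],
    "A": ["B", "D"],
    "C": ["B", "E"],
    "D": ["A"],
    "E": ["C"],
}
levels = bfs_levels(adj, "B")
print(levels)  # {'B': 0, 'A': 1, 'C': 1, 'D': 2, 'E': 2}
```

Each pass of the `while` loop corresponds to one BFS_Iteration: it consumes the current frontier and produces the next one.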

Page 4: Understanding the SIMD Efficiency of Graph Traversal on GPU

Application of BFS
• Many real-world datasets are represented as graphs
  • VLSI circuits
  • Social relationships
  • Road connections
• Primitive for building complex algorithms
  • Path-finding
  • Belief propagation
  • Points-to Analysis (PTA)

Page 5: Understanding the SIMD Efficiency of Graph Traversal on GPU

The Problem
• GPU relies on high SIMD lane occupancy to boost performance
  • 100% efficiency is achieved only if all SIMD lanes follow the same path

do_something_common();
if (thread_id > 5) {
    do_something_red();
} else {
    do_something_blue();
}

100% utilization

Page 6: Understanding the SIMD Efficiency of Graph Traversal on GPU

The Problem

37.5% utilization

Page 7: Understanding the SIMD Efficiency of Graph Traversal on GPU

The Problem

62.5% utilization
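The three utilization figures shown with this code (100%, 37.5%, 62.5%) are consistent with an 8-lane SIMD group whose thread IDs run from 1 to 8. The slides do not state that width or numbering, so both are assumptions here. Under them, the common statement keeps all lanes busy, the `thread_id > 5` branch keeps 3 of 8 lanes busy, and the `else` branch keeps the other 5. A minimal Python sketch of that arithmetic, with the hypothetical helper `branch_utilization`:

```python
def branch_utilization(thread_ids, predicate):
    """Fraction of lanes active on the taken path and on the not-taken path."""
    ids = list(thread_ids)
    taken = sum(1 for t in ids if predicate(t))
    total = len(ids)
    return taken / total, (total - taken) / total


# Assumption: 8 lanes with thread IDs 1..8 (the slides do not state the width).
lanes = range(1, 9)
red, blue = branch_utilization(lanes, lambda t: t > 5)
print(f"common: 100.0%  red: {red:.1%}  blue: {blue:.1%}")
# common: 100.0%  red: 37.5%  blue: 62.5%
```

Because the two branches execute one after the other on SIMD hardware, the lanes that are idle in one branch cannot be reclaimed by the other.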

Page 8: Understanding the SIMD Efficiency of Graph Traversal on GPU

Traditional Implementation

GPU_BFS_Iteration
  u = C[tid]
  for v ∈ u's neighbors do
  end for

The # of sub-iterations depends on the size of u's adjacency list

task 1 = 4 sub-iterations
task 2 = 2 sub-iterations
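The cost of this mapping can be illustrated with a small calculation. The sketch below assumes that lanes run in lockstep until the longest adjacency list is exhausted and that the SIMD group holds exactly the tasks listed; the helper name `thread_per_vertex_utilization` is hypothetical.

```python
def thread_per_vertex_utilization(degrees):
    """SIMD utilization when each lane walks one vertex's adjacency list."""
    steps = max(degrees)      # lockstep: the group runs until the longest list is done
    useful = sum(degrees)     # lane-steps that actually visit a neighbor
    return useful / (steps * len(degrees))


# The two tasks from the slide: 4 and 2 sub-iterations.
util = thread_per_vertex_utilization([4, 2])
print(f"{util:.0%}")  # 75%
```

The lane handling task 2 sits idle for two of the four steps, which is where the lost 25% goes.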

Page 9: Understanding the SIMD Efficiency of Graph Traversal on GPU

Visualizing the Irregularity

[Figure annotations]
• vertex range < 8
• highly skewed; outliers exist
• irregular but concentrated
• distributed over a wide range

Page 10: Understanding the SIMD Efficiency of Graph Traversal on GPU

Alternative Way
• Assign each task to a warp of threads
• Vectorize the sub-iterations!

So, what's the relationship between graph topology and SIMD efficiency?

Page 11: Understanding the SIMD Efficiency of Graph Traversal on GPU

Topology and Utilization
• Assign each vertex to a group of threads

[Figure labels: Thread, Warp, Group]

task 1 = 2 sub-iterations
task 2 = 1 sub-iteration
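The sub-iteration counts follow from dividing each adjacency list among the threads of a group. A minimal sketch, assuming a vertex of degree d handled by a group of S threads needs ceil(d / S) sub-iterations; the group size of 2 is an assumption chosen because it reproduces the counts on the slide for tasks of size 4 and 2.

```python
import math


def sub_iterations(degree, group_size):
    """Sub-iterations needed when group_size threads share one adjacency list."""
    return math.ceil(degree / group_size)


degrees = [4, 2]                                         # the two example tasks
per_thread = [sub_iterations(d, 1) for d in degrees]     # one thread per vertex
per_group = [sub_iterations(d, 2) for d in degrees]      # assumed group size S = 2
print(per_thread, per_group)  # [4, 2] [2, 1]
```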

Page 12: Understanding the SIMD Efficiency of Graph Traversal on GPU

Topology and Utilization

Divide the SIMD underutilization into two parts
• InteR-group Underutilization (UR)
• IntrA-group Underutilization (UA)

[Figure label: SIMD Window]

Page 13: Understanding the SIMD Efficiency of Graph Traversal on GPU

Conclusions From the Model

• UR is induced by the heterogeneity of workloads
  • Affected by the graph topology
• UR is sensitive to the group size (S)
  • A large logical SIMD window can narrow the gap
  • When S = 32, UR = 0
• UA is determined by the intrinsic irregularity of vertex degree
  • It can be limited by shrinking S
  • When S = 1, UA = 0
• UR and UA can convert to each other
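These boundary cases can be checked with a small accounting model. The sketch below is an assumed simplification, not the authors' model: a 32-lane warp is split into 32 / S groups, each group expands one vertex in ceil(d / S) sub-iterations, and the warp runs until its slowest group finishes. Idle lane-slots inside a working group count as UA, and lane-slots of groups waiting for the slowest group count as UR. The function name `simd_breakdown` and the degree list are hypothetical.

```python
import math


def simd_breakdown(degrees, group_size, warp_size=32):
    """Return (UR, UA, ME) as fractions of issued lane-slots.

    Assumes group_size divides warp_size and at least one degree is nonzero.
    """
    groups_per_warp = warp_size // group_size
    total = useful = intra = inter = 0
    for start in range(0, len(degrees), groups_per_warp):
        batch = degrees[start:start + groups_per_warp]   # vertices sharing a warp
        iters = [math.ceil(d / group_size) for d in batch]
        longest = max(iters)                             # warp runs this many steps
        total += len(batch) * group_size * longest
        useful += sum(batch)
        intra += sum(i * group_size - d for i, d in zip(iters, batch))
        inter += sum((longest - i) * group_size for i in iters)
    return inter / total, intra / total, useful / total


# Hypothetical skewed frontier: many small vertices and one large one.
degrees = [2] * 16 + [6] * 15 + [64]
for s in (1, 2, 4, 8, 16, 32):
    ur, ua, me = simd_breakdown(degrees, s)
    print(f"S={s:2d}  UR={ur:.1%}  UA={ua:.1%}  ME={me:.1%}")
```

In this accounting, S = 1 gives UA = 0 because a single thread never pads its own list, and S = 32 gives UR = 0 because a warp then holds only one group and nothing waits on a neighbor.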

Page 14: Understanding the SIMD Efficiency of Graph Traversal on GPU

Comparing Different Mapping Strategies

[Figure: expansion rate (ME/s) of the mapping strategies, with scalability marked from poor to good along a low-to-high axis.]

Page 15: Understanding the SIMD Efficiency of Graph Traversal on GPU

Evaluating the SIMD Efficiency
• Metrics derived from the model:
  UR = inter-group underutilization
  UA = intra-group underutilization
  ME = mapping efficiency
  UR + UA + ME = 100%
  • Captures utilization trend with increasing S
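One way to make the three metrics concrete is sketched below. The formulation is an assumption for illustration and is not stated on the slides: a warp of $W$ lanes is split into $W/S$ groups of size $S$, group $i$ expands a vertex of degree $d_i$ in $t_i = \lceil d_i / S \rceil$ sub-iterations, and the warp runs for $T = \max_i t_i$ sub-iterations, issuing $W\,T$ lane-slots in total.

```latex
\begin{align*}
\mathrm{ME} &= \frac{\sum_{i=1}^{W/S} d_i}{W\,T}, \\
\mathrm{UA} &= \frac{\sum_{i=1}^{W/S} \left( S\,t_i - d_i \right)}{W\,T}, \\
\mathrm{UR} &= \frac{\sum_{i=1}^{W/S} S \left( T - t_i \right)}{W\,T}, \\
\mathrm{UR} + \mathrm{UA} + \mathrm{ME} &= \frac{\sum_{i=1}^{W/S} S\,T}{W\,T} = 1 .
\end{align*}
```

Under these definitions, $S = W$ leaves a single group per warp, so $T = t_1$ and $\mathrm{UR} = 0$; $S = 1$ gives $t_i = d_i$, so $\mathrm{UA} = 0$.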

Page 16: Understanding the SIMD Efficiency of Graph Traversal on GPU

Explaining the Result

[Figure: expansion rate (ME/s), with scalability marked from poor to good along a low-to-high axis.]

Alleviates UR while introducing only minor UA

Page 17: Understanding the SIMD Efficiency of Graph Traversal on GPU

Explaining the Result

[Figure: expansion rate (ME/s), with scalability marked from poor to good along a low-to-high axis.]

ME stays at a high level (~80%)

Page 18: Understanding the SIMD Efficiency of Graph Traversal on GPU

Explaining the Result

[Figure: expansion rate (ME/s), with scalability marked from poor to good along a low-to-high axis.]

Outweighed by the fast-growing UA

Page 19: Understanding the SIMD Efficiency of Graph Traversal on GPU

Explaining the Result

[Figure: expansion rate (ME/s), with scalability marked from poor to good along a low-to-high axis.]

Does little to help UR but leads to severe UA

Page 20: Understanding the SIMD Efficiency of Graph Traversal on GPU

Conclusion
• Study the link between graph topology and hardware utilization
• Present a model for analyzing the components of SIMD underutilization
• Discover that SIMD lanes are wasted due to:
  • imbalance of vertex degree distribution
  • heterogeneity of each vertex degree
• Develop 3 metrics for quantifying SIMD efficiency
• Provide a foundation for developing techniques of static analysis and runtime optimization

Page 21: Understanding the SIMD Efficiency of Graph Traversal on GPU

Q&A

