APACHE GIRAPH ON YARN
Chuan Lei and Mohammad Islam
Fast Scalable Graph Processing
What is Apache Giraph
Why do I need it
Giraph + MapReduce
Giraph + YARN
What is Apache Giraph
Giraph is a framework for performing offline batch processing of semi-structured graph data at massive scale
Giraph is loosely based upon Google’s Pregel graph processing framework
Giraph performs iterative calculations on top of an existing Hadoop cluster
Giraph uses Apache ZooKeeper to enforce atomic barrier waits and perform leader election
[Diagram: workers at a superstep barrier; two report "Done!" while one is still working, so all must wait]
Why do I need it?
Giraph makes graph algorithms easy to reason about and implement by following the Bulk Synchronous Parallel (BSP) programming model
In BSP, all algorithms are implemented from the point of view of a single vertex in the input graph performing a single iteration of the computation
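To make the vertex-centric view concrete, below is a minimal PageRank sketch in the style of Giraph's bundled example. It is a sketch, not code from this talk: it targets the BasicComputation API of later Giraph releases (the Giraph 1.0.0 release used in the experiments below exposed compute() on the Vertex class instead), and the superstep cap is an illustrative value.

```java
import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  private static final int MAX_SUPERSTEPS = 30;  // illustrative cap, not from the talk

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      // Sum the PageRank shares received from in-neighbors in the previous
      // superstep and add the small 1/N adjustment (weight factor 1 here).
      double sum = 0;
      for (DoubleWritable msg : messages) {
        sum += msg.get();
      }
      vertex.setValue(new DoubleWritable(sum + 1.0 / getTotalNumVertices()));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Send an equal share of this vertex's PageRank along every out-edge.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();  // computation ends once every vertex has voted to halt
    }
  }
}
```

Each call handles a single vertex during a single superstep; the framework delivers the messages sent in the previous superstep and enforces the barrier between supersteps.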
Giraph makes iterative data processing more practical for Hadoop users
Giraph can avoid costly disk and network operations that are mandatory in MR
No concept of message passing in MR
Each cycle of an iterative calculation on Hadoop means running a full MapReduce job
PageRank example
PageRank measures the relative importance of a document within a set of documents
1. All vertices start with the same PageRank (1.0 in this example)
2. Each vertex distributes an equal portion of its PageRank to all of its neighbors (a vertex with PageRank 1.0 and two out-edges sends 0.5 along each)
3. Each vertex sums the incoming values times a weight factor and adds a small adjustment of 1/(# vertices in graph); in the example the three vertices compute 1.5*1 + 1/3, 1*1 + 1/3, and 0.5*1 + 1/3
4. This value becomes the vertex's PageRank for the next iteration (1.83, 1.33, and 0.83 in the example)
5. Repeat until convergence (change in PageRank per iteration < epsilon)
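The arithmetic in steps 1-5 can be checked with a few lines of plain Java. The three-vertex graph below (A->B, B->C, C->A and C->B) is an assumed shape chosen to reproduce the numbers on the slides; Giraph itself is not involved in this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageRankSuperstepDemo {
  public static void main(String[] args) {
    Map<String, List<String>> outEdges = Map.of(
        "A", List.of("B"),
        "B", List.of("C"),
        "C", List.of("A", "B"));
    // Step 1: every vertex starts with PageRank 1.0.
    Map<String, Double> pageRank = new HashMap<>(Map.of("A", 1.0, "B", 1.0, "C", 1.0));
    int numVertices = pageRank.size();

    // Step 2: each vertex sends an equal share of its PageRank to its neighbors.
    Map<String, Double> incoming = new HashMap<>();
    outEdges.forEach((vertex, targets) -> {
      double share = pageRank.get(vertex) / targets.size();
      targets.forEach(t -> incoming.merge(t, share, Double::sum));
    });

    // Steps 3-4: sum the incoming shares (weight factor 1) and add 1/(# vertices).
    incoming.forEach((vertex, sum) -> pageRank.put(vertex, sum + 1.0 / numVertices));

    System.out.println(pageRank);  // roughly A=0.83, B=1.83, C=1.33 (order may vary)
  }
}
```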
PageRank on MapReduce
1. Load complete input graph from disk as [K = vertex ID, V = out-edges and PR]
2. Emit all input records (the full graph state) and, for each out-edge, emit [K = edge target, V = share of PR]
3. Sort and Shuffle this entire mess.
4. Sum incoming PR shares for each vertex, update PR values in graph state records
5. Emit full graph state to disk…
6. … and START OVER!
Awkward to reason about
I/O bound despite simple core business logic
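As a hedged illustration of steps 1-6, here is what one PageRank iteration looks like as a Hadoop MapReduce job. The record layout (one text line per vertex: id, PageRank, comma-separated out-edges), the GRAPH/PR marker strings, and the class names are assumptions for this sketch, not code from the talk; the point is that the full graph state passes through map, shuffle, and reduce on every single iteration.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PageRankIteration {

  public static class PRMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Input line: "vertexId <TAB> pageRank <TAB> comma-separated out-edges"
      String[] parts = line.toString().split("\t");
      double pageRank = Double.parseDouble(parts[1]);
      String[] targets = parts.length > 2 ? parts[2].split(",") : new String[0];

      // Step 2a: re-emit the full graph state so the reducer can rebuild the record.
      ctx.write(new Text(parts[0]),
          new Text("GRAPH\t" + parts[1] + "\t" + (parts.length > 2 ? parts[2] : "")));

      // Step 2b: emit this vertex's share of PageRank to every out-edge target.
      for (String target : targets) {
        ctx.write(new Text(target), new Text("PR\t" + (pageRank / targets.length)));
      }
    }
  }

  public static class PRReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text vertexId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0.0;
      String edges = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t");
        if (parts[0].equals("GRAPH")) {
          edges = parts.length > 2 ? parts[2] : "";   // recover the graph state
        } else {
          sum += Double.parseDouble(parts[1]);        // step 4: sum incoming shares
        }
      }
      // Step 5: write the full graph state back to disk (adjustment term omitted);
      // the driver then launches an identical job for the next iteration.
      ctx.write(vertexId, new Text(sum + "\t" + edges));
    }
  }
}
```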
PageRank on Giraph
1. Hadoop Mappers are “hijacked” to host Giraph master and worker tasks
2. Input graph is loaded once, maintaining code-data locality when possible
3. All iterations are performed on data in memory, optionally spilled to disk. Disk access is linear/scan-based
4. Output is written from the Mappers hosting the calculation, and the job run ends
This is all well and good, but must we manipulate Hadoop this way?
Heap and other resources are set once, globally for all Mappers in the computation
No control of which cluster nodes host which tasks
No control over how Mappers are scheduled
The Mapper and Reducer slot abstraction is meaningless for Giraph
Overview of YARN
YARN (Yet Another Resource Negotiator) is Hadoop's next-generation resource management platform
A general-purpose framework that is not tied to the MapReduce paradigm
Offers fine-grained control over each task's resource allocation
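As a hedged illustration (not Giraph's actual ApplicationMaster code), the sketch below shows how an ApplicationMaster uses the Hadoop 2 YARN client API to request containers with an explicit memory and CPU budget and, if desired, preferred hosts or racks. The worker count, heap size, and priority value are illustrative; this per-container control is exactly what the fixed Mapper/Reducer slot model cannot express.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class WorkerContainerRequests {
  /** Ask the ResourceManager for numWorkers containers of memoryMb / vcores each. */
  public static void requestWorkers(AMRMClient<ContainerRequest> amClient,
                                    int numWorkers, int memoryMb, int vcores) {
    Resource capability = Resource.newInstance(memoryMb, vcores);
    Priority priority = Priority.newInstance(10);  // illustrative priority
    String[] preferredNodes = null;                // null = run anywhere;
    String[] preferredRacks = null;                // set these to pin workers near their data
    for (int i = 0; i < numWorkers; i++) {
      amClient.addContainerRequest(
          new ContainerRequest(capability, preferredNodes, preferredRacks, priority));
    }
  }
}
```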
Giraph on YARN
It’s a natural fit!
Giraph on YARN
[Architecture diagram: a Client submits the job to the YARN ResourceManager; an ApplicationMaster plus the Giraph master and worker tasks run in containers on the NodeManagers, coordinated through ZooKeeper]
Giraph Architecture
[Diagram: one Master coordinates many Workers; ZooKeeper provides coordination between the Master and the Workers]
Metrics
Performance: processing time
Scalability: graph size (number of vertices and number of edges)
Optimization Factors
JVM: GC control (Parallel GC, Concurrent GC, Young Generation size), Memory Size
Giraph App: Number of Workers, Combiner, Out-of-Core, Object Reuse
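A minimal sketch of the JVM-side knobs above, assuming the workers run inside MapReduce map tasks as in the mapper-hosted mode described earlier; the heap and young-generation sizes are illustrative, and a pure-YARN launch would pass the same HotSpot flags through its own heap/JVM-options setting rather than this property.

```java
import org.apache.hadoop.conf.Configuration;

public class WorkerJvmOpts {
  public static void apply(Configuration conf) {
    String opts = String.join(" ",
        "-Xmx10g",                  // per-worker heap, e.g. 10 GB per worker
        "-Xmn2g",                   // young-generation size
        "-XX:+UseConcMarkSweepGC",  // concurrent old-generation collector
        "-XX:+UseParNewGC");        // parallel young-generation collector
    // Hadoop 2 property for the map-task child JVM; illustrative for this sketch.
    conf.set("mapreduce.map.java.opts", opts);
  }
}
```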
Experimental Settings
Cluster: 43 nodes, ~800 GB memory, Hadoop-2.0.3-alpha (non-secure), Giraph-1.0.0-release
Data: LinkedIn social network graph, approx. 205 million vertices and approx. 11 billion edges
Application: PageRank algorithm
Baseline Result
10 GB vs. 20 GB per worker; max memory 800 GB
Processing time: 10 GB per worker gives better performance
Scalability: 20 GB per worker gives higher scalability
[Charts: processing time (sec) vs. number of vertices (millions), and maximum number of vertices (millions) vs. number of workers, for 10 GB and 20 GB per worker; annotations mark 5-40 workers and the 400 GB / 800 GB totals]
Heap Dump w/o Concurrent GC
Iteration 3: reachable 1.5 GB, unreachable 3 GB
Iteration 27: reachable 1.5 GB, unreachable 6 GB
A big portion of the unreachable objects are messages created at each superstep
Concurrent GC
Significantly improves scalability, by a factor of 3
Degrades performance by 16%
[Charts: memory needed (GB) and processing time (sec) vs. number of vertices (millions), concurrent GC on vs. off, 20 GB per worker]
Using Combiner
Scales up 2x without any other optimizations
Speeds up performance by 50%
[Charts: maximum number of vertices (millions) and processing time (sec) vs. number of workers, combiner on vs. off, 20 GB per worker]
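A standalone sketch of what the combiner buys for PageRank: messages addressed to the same destination vertex can simply be summed, so many messages collapse into one before they are queued or sent over the network. In Giraph you would register a sum combiner for the message type rather than writing this by hand (the exact combiner class and interface names differ between Giraph releases); the Message record here is purely illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SumCombinerDemo {
  /** One outgoing message: a PageRank share addressed to a destination vertex. */
  record Message(long targetVertex, double prShare) {}

  /** Combine per destination: N messages to one vertex become a single summed value. */
  static Map<Long, Double> combine(List<Message> outgoing) {
    Map<Long, Double> combined = new HashMap<>();
    for (Message m : outgoing) {
      combined.merge(m.targetVertex(), m.prShare(), Double::sum);
    }
    return combined;
  }

  public static void main(String[] args) {
    List<Message> msgs = List.of(
        new Message(7, 0.5), new Message(7, 0.25), new Message(9, 1.0));
    System.out.println(combine(msgs));  // {7=0.75, 9=1.0}
  }
}
```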
Memory Distribution
More workers achieve better performance; larger memory per worker provides higher scalability
[Chart: processing time (sec) and maximum number of vertices (millions) for 20 workers (20 GB each), 40 (10 GB), 80 (5 GB), and 100 (4 GB), with total memory fixed at 400 GB]
Application – Object Reuse
Improves scalability 5x and performance 4x
Requires skill from application developers
[Charts: minimum memory needed (GB) and processing time (sec) vs. number of vertices (millions), memory reuse on vs. off, 20 GB per worker; annotated points: 650 GB, 29 mins]
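A hedged sketch of the object-reuse pattern: instead of allocating a new Writable for every vertex value and outgoing message (the per-superstep garbage seen in the heap-dump slide), a worker reuses a couple of mutable instances. The class and method below are illustrative, not from the talk; the pattern is only safe when the framework does not retain the returned reference, which is why it demands care from application developers.

```java
import org.apache.hadoop.io.DoubleWritable;

public class ReusablePageRankState {
  // Allocated once per worker thread and overwritten for every vertex,
  // instead of calling "new DoubleWritable(...)" on every update.
  private final DoubleWritable reusableValue = new DoubleWritable();
  private final DoubleWritable reusableMessage = new DoubleWritable();

  /** Compute the new rank and its outgoing share without any new allocations. */
  public DoubleWritable shareFor(double incomingSum, long numVertices, int numEdges) {
    double newRank = incomingSum + 1.0 / numVertices;
    reusableValue.set(newRank);               // overwrite in place
    reusableMessage.set(newRank / numEdges);  // overwrite in place
    return reusableMessage;                   // caller must serialize/copy before the next call
  }
}
```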
Problems of Giraph on YARN
Various knobs to tune to make Giraph applications run efficiently
Highly dependent on skillful application developers
Performance penalties when scaling up
Future Direction
C++ provides direct control over memory management
No need to rewrite the whole of Giraph; only the master and worker in C++
Conclusion
LinkedIn is the first adopter of Giraph on YARN
Improvements and bug fixes contributed as patches to Apache Giraph
The full LinkedIn graph runs on a 40-node cluster with 650 GB of memory
Various performance and scalability options evaluated