Apache Giraph on YARN


Apache Giraph on YARN
Chuan Lei and Mohammad Islam

Fast Scalable Graph Processing


What is Apache Giraph
Why do I need it
Giraph + MapReduce
Giraph + YARN

What is Apache Giraph

Giraph is a framework for performing offline batch processing of semi-structured graph data at massive scale

Giraph is loosely based upon Google’s Pregel graph processing framework

What is Apache Giraph

Giraph performs iterative calculation on top of an existing Hadoop cluster

What is Apache Giraph

Giraph uses Apache ZooKeeper to enforce atomic barrier waits and perform leader election

[Illustration: workers waiting at a superstep barrier; two report "Done!" while one is still working]
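To make the barrier idea concrete, here is a minimal, hypothetical Java sketch of a superstep barrier built directly on the ZooKeeper client API; the path layout, worker IDs and polling loop are illustrative assumptions, and Giraph's real coordination code is considerably more involved.

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Toy barrier: every worker checks in under a (pre-created) barrier znode,
// then waits until all workers have checked in before starting the next
// superstep. Giraph's actual implementation differs.
public class BarrierSketch {
  public static void waitAtBarrier(ZooKeeper zk, String barrierPath,
                                   String workerId, int numWorkers) throws Exception {
    // Announce this worker's arrival with an ephemeral znode, so the entry
    // disappears automatically if the worker dies.
    zk.create(barrierPath + "/" + workerId, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Poll until every worker has arrived (a real implementation would use
    // ZooKeeper watches instead of polling).
    while (zk.getChildren(barrierPath, false).size() < numWorkers) {
      Thread.sleep(100);
    }
  }
}
```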


Why do I need it?

Giraph makes graph algorithms easy to reason about and implement by following the Bulk Synchronous Parallel (BSP) programming model

In BSP, all algorithms are implemented from the point of view of a single vertex in the input graph performing a single iteration of the computation

Why do I need it?

Giraph makes iterative data processing more practical for Hadoop users

Giraph can avoid costly disk and network operations that are mandatory in MR

No concept of message passing in MR

Why do I need it?

Each cycle of an iterative calculation on Hadoop means running a full MapReduce job

PageRank example

PageRank measures the relative importance of a document within a set of documents

1. All vertices start with the same PageRank

[Diagram: three vertices, each starting with PageRank 1.0]

PageRank example

2. Each vertex distributes an equal portion of its PageRank to all neighbors

[Diagram: a vertex with PageRank 1.0 sends a 0.5 share along each of its two out-edges]

PageRank example

3. Each vertex sums the incoming values times a weight factor and adds in a small adjustment: 1/(# vertices in graph)

[Diagram: with 3 vertices the adjustment is 1/3, giving 1.5*1 + 1/3, 1*1 + 1/3 and 0.5*1 + 1/3]

PageRank example

4. This value becomes the vertex's PageRank for the next iteration

[Diagram: the three vertices now hold 1.83, 1.33 and 0.83]

PageRank example

5. Repeat until convergence: the change in PageRank per iteration is less than epsilon (a compute() sketch follows below)
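The five steps above map almost one-to-one onto a single vertex-centric compute() method. Below is a rough Java sketch in the style of Giraph's BasicComputation API (newer than the 1.0.0 release used later in this talk); the MAX_SUPERSTEPS cutoff and the weight factor of 1 follow the toy example and are assumptions, not the talk's actual code.

```java
import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;  // illustrative stopping point

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      // Step 3: sum incoming shares (weight factor 1) and add 1/(# vertices).
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      // Step 4: the result becomes this vertex's PageRank for the next iteration.
      vertex.setValue(new DoubleWritable(sum + 1.0 / getTotalNumVertices()));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Step 2: distribute an equal portion of the PageRank to all neighbors.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();  // Step 5: stop once the iteration budget is spent
    }
  }
}
```

Vertex values are initialized by the input format (step 1); each superstep is then one round of steps 2 through 4.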

PageRank on MapReduce

1. Load complete input graph from disk as [K = vertex ID, V = out-edges and PR]

PageRank on MapReduce

2. Emit all input records (the full graph state), and emit [K = edgeTarget, V = share of PR]

PageRank on MapReduce

3. Sort and Shuffle this entire mess.

PageRank on MapReduce

4. Sum incoming PR shares for each vertex, update PR values in graph state records

PageRank on MapReduce

5. Emit full graph state to disk…

PageRank on MapReduce

6. … and START OVER!
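For contrast, here is a rough, hypothetical sketch of one such iteration as a standalone MapReduce job; the line format (vertexId, PageRank, comma-separated out-edges, tab-delimited), the GRAPH/PR tags and the 0.85/0.15 factors are illustrative assumptions rather than the code behind these slides.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PageRankIteration {
  // Steps 1-2: re-emit the whole graph state and emit each neighbor's PR share.
  public static class PRMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");       // id, PR, out-edges
      String[] neighbors = parts[2].split(",");
      double share = Double.parseDouble(parts[1]) / neighbors.length;
      ctx.write(new Text(parts[0]), new Text("GRAPH\t" + parts[2]));
      for (String neighbor : neighbors) {
        ctx.write(new Text(neighbor), new Text("PR\t" + share));
      }
    }
  }

  // Steps 4-5: sum incoming shares, rebuild the graph record, write it to disk.
  public static class PRReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text vertexId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0;
      String outEdges = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t");
        if ("GRAPH".equals(parts[0])) {
          outEdges = parts[1];                  // carry the adjacency forward
        } else {
          sum += Double.parseDouble(parts[1]);  // accumulate PR shares
        }
      }
      double newRank = 0.85 * sum + 0.15;       // toy weight and adjustment
      ctx.write(vertexId, new Text(newRank + "\t" + outEdges));
    }
  }
}
```

The driver has to submit this job once per iteration, re-reading and re-writing the full graph each time, which is exactly the overhead the summary below calls out.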

PageRank on MapReduce

Awkward to reason about

I/O bound despite simple core business logic

PageRank on Giraph

1. Hadoop Mappers are “hijacked” to host Giraph master and worker tasks

PageRank on Giraph

2. Input graph is loaded once, maintaining code-data locality when possible

PageRank on Giraph

3. All iterations are performed on data in memory, optionally spilled to disk. Disk access is linear/scan-based

PageRank on Giraph

4. Output is written from the Mappers hosting the calculation, and the job run ends

PageRank on Giraph

This is all well and good, but must we manipulate Hadoop this way?

Heap and other resources are set once, globally for all Mappers in the computation

No control over which cluster nodes host which tasks

No control over how Mappers are scheduled

The Mapper and Reducer slot abstraction is meaningless for Giraph

Overview of YARN

YARN (Yet Another Resource Negotiator) is Hadoop's next-generation resource management platform

A general-purpose framework that is not tied to the MapReduce paradigm

Offers fine-grained control over each task's resource allocation

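To illustrate that fine-grained control, the fragment below shows how an ApplicationMaster can ask YARN for a container with a specific amount of memory and CPU per task, using the present-day AMRMClient API; the 2.0.3-alpha API used at the time differs in detail, and the 10 GB / 4 vcore figures are made-up examples.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ResourceRequestSketch {
  // Ask the ResourceManager for one worker container with its own
  // memory/CPU capability, rather than inheriting a global Mapper heap size.
  public static void requestWorkerContainer(AMRMClient<ContainerRequest> amRmClient) {
    Resource capability = Resource.newInstance(10 * 1024, 4);  // 10 GB, 4 vcores
    ContainerRequest request = new ContainerRequest(
        capability, null /* nodes */, null /* racks */, Priority.newInstance(0));
    amRmClient.addContainerRequest(request);
  }
}
```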

Giraph on YARN

It’s a natural fit!

Giraph on YARN

[Diagram: Giraph on the YARN architecture. A Client submits the job to the ResourceManager, which launches an Application Master; NodeManagers then host the App Master, the Giraph Master and the Workers, coordinated through ZooKeeper]

Giraph Architecture

[Diagram: one Master coordinating many Workers, with ZooKeeper alongside]

Metrics

Performance: processing time

Scalability: graph size (number of vertices and number of edges)

Optimization Factors

JVM: GC control (parallel GC, concurrent GC, young generation size)

Giraph App: memory size, number of workers, combiner, out-of-core, object reuse
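As a hypothetical illustration of the JVM-side knobs, heap and GC settings can be handed to the task JVMs that host Giraph workers roughly like this; the property name is the classic Hadoop one and the values are invented, not the settings tuned in these experiments.

```java
import org.apache.hadoop.conf.Configuration;

public class WorkerJvmOptions {
  // Example only: 20 GB heap, explicit young-generation size, concurrent (CMS) GC.
  public static Configuration withGcTuning() {
    Configuration conf = new Configuration();
    conf.set("mapred.child.java.opts",
        "-Xmx20g -Xmn2g -XX:+UseConcMarkSweepGC");
    return conf;
  }
}
```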

Experimental Settings

Cluster: 43 nodes, ~800 GB memory, Hadoop-2.0.3-alpha (non-secure), Giraph-1.0.0-release

Data: LinkedIn social network graph (approx. 205 million vertices, approx. 11 billion edges)

Application: PageRank algorithm

Baseline Result

[Chart: Processing Time (sec) vs. Number of Vertices (mil), for 10 GB and 20 GB per worker]

10 vs. 20 GB per worker, max memory 800 GB

Processing time: 10 GB per worker gives better performance

Scalability: 20 GB per worker gives higher scalability

[Chart: Number of Vertices (mil) vs. Number of Workers, for 10 GB and 20 GB per worker, annotated with worker counts from 5 to 40 and total memory of 400 GB and 800 GB]

Heap Dump w/o Concurrent GC

Iteration 3: reachable 1.5 GB, unreachable 3 GB

Iteration 27: reachable 1.5 GB, unreachable 6 GB

A big portion of the unreachable objects are messages created at each superstep

Concurrent GC

Significantly improves scalability (about 3x)

Suffers a performance degradation of 16%

[Charts, 20 GB per worker: Memory Needed (GB) and Processing Time (sec) vs. Number of Vertices (mil), with concurrent GC on vs. off and a linear fit for ConGC OFF]

Using Combiner

Scales up 2x without any other optimizations

Speeds up performance by 50%

[Charts, 20 GB per worker: Number of Vertices (mil) and Processing Time (sec) vs. Number of Workers, with combiner on vs. off]
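What the combiner buys here, roughly: partial PageRank shares addressed to the same target vertex are summed before delivery, so far fewer messages are buffered and shipped. The sketch below follows the shape of Giraph's message-combiner interface; the exact class and method names vary between Giraph versions, so treat them as assumptions.

```java
import org.apache.giraph.combiner.MessageCombiner;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Sum combiner for PageRank messages: fold each new share into the message
// already buffered for the target vertex.
public class PageRankSumCombiner
    implements MessageCombiner<LongWritable, DoubleWritable> {

  @Override
  public void combine(LongWritable vertexIndex, DoubleWritable originalMessage,
                      DoubleWritable messageToCombine) {
    originalMessage.set(originalMessage.get() + messageToCombine.get());
  }

  @Override
  public DoubleWritable createInitialMessage() {
    return new DoubleWritable(0);
  }
}
```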

Memory Distribution

More workers achieve better performance

Larger memory size per worker provides higher scalability

[Chart, total memory 400 GB: Processing Time (sec) and Number of Vertices (mil) for worker configurations of 20 (20 GB), 40 (10 GB), 80 (5 GB) and 100 (4 GB)]

Application – Object Reuse

Improves scalability by 5x

Improves performance by 4x

Requires skill from application developers

[Charts, 20 GB per worker: Minimum Memory Needed (GB) and Processing Time (sec) vs. Number of Vertices (mil), with and without memory reuse; chart annotations: 650G, 29 mins]
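The object-reuse trick is essentially this: instead of allocating a fresh Writable for every outgoing message (the short-lived garbage visible in the earlier heap dumps), keep one instance and overwrite it. A hypothetical fragment, assuming the framework serializes the message before the object is reused:

```java
import org.apache.hadoop.io.DoubleWritable;

public class ReusedMessageHolder {
  // One message object for the lifetime of the worker thread, instead of one
  // allocation per outgoing message per superstep.
  private final DoubleWritable outMessage = new DoubleWritable();

  public DoubleWritable shareOf(double pageRank, int numEdges) {
    outMessage.set(pageRank / numEdges);  // overwrite in place, no new allocation
    return outMessage;
  }
}
```

This is also why the slide flags the need for skilled application developers: reuse is only safe if nothing holds on to the returned object across calls.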

Problems of Giraph on YARN

Various knobs must be tuned to make Giraph applications work efficiently

Highly dependent on skilled application developers

Performance penalties when scaling up

Future Direction

C++ provides direct control over memory management

No need to rewrite the whole of Giraph: only the master and worker would be in C++

Conclusion

LinkedIn is the first adopter of Giraph on YARN

Improvements and bug fixes contributed as patches to Apache Giraph

Ran the full LinkedIn graph on a 40-node cluster with 650 GB of memory

Evaluated various performance and scalability options