Apache Giraph on YARN


Apache Giraph on YARN
Chuan Lei and Mohammad Islam

Fast Scalable Graph Processing


What is Apache Giraph
Why do I need it
Giraph + MapReduce
Giraph + YARN

What is Apache Giraph

Giraph is a framework for performing offline batch processing of semi-structured graph data at massive scale

Giraph is loosely based upon Google’s Pregel graph processing framework

What is Apache Giraph

Giraph performs iterative calculation on top of an existing Hadoop cluster

What is Apache Giraph

Giraph uses Apache ZooKeeper to enforce atomic barrier waits and perform leader election

[Illustration: workers waiting at a superstep barrier; two report "Done!" while one is still working]
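To make the barrier idea concrete, here is a minimal, hypothetical Java sketch of a superstep barrier built directly on the ZooKeeper client API; the path layout, worker IDs and polling loop are illustrative assumptions, and Giraph's real coordination code is considerably more involved.

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Toy barrier: every worker checks in under a (pre-created) barrier znode,
// then waits until all workers have checked in before starting the next
// superstep. Giraph's actual implementation differs.
public class BarrierSketch {
  public static void waitAtBarrier(ZooKeeper zk, String barrierPath,
                                   String workerId, int numWorkers) throws Exception {
    // Announce this worker's arrival with an ephemeral znode, so the entry
    // disappears automatically if the worker dies.
    zk.create(barrierPath + "/" + workerId, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Poll until every worker has arrived (a real implementation would use
    // ZooKeeper watches instead of polling).
    while (zk.getChildren(barrierPath, false).size() < numWorkers) {
      Thread.sleep(100);
    }
  }
}
```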


Why do I need it?

Giraph makes graph algorithms easy to reason about and implement by following the Bulk Synchronous Parallel (BSP) programming model

In BSP, all algorithms are implemented from the point of view of a single vertex in the input graph performing a single iteration of the computation

Why do I need it?

Giraph makes iterative data processing more practical for Hadoop users

Giraph can avoid costly disk and network operations that are mandatory in MR

No concept of message passing in MR

Why do I need it?

Each cycle of an iterative calculation on Hadoop means running a full MapReduce job

PageRank example

PageRank measures the relative importance of a document within a set of documents

1. All vertices start with the same PageRank

[Diagram: three vertices, each starting with PageRank 1.0]

PageRank example

2. Each vertex distributes an equal portion of its PageRank to all neighbors

[Diagram: a vertex with PageRank 1.0 sends a 0.5 share along each of its two out-edges]

PageRank example

3. Each vertex sums the incoming values times a weight factor and adds in a small adjustment: 1/(# vertices in graph)

[Diagram: with 3 vertices the adjustment is 1/3, giving 1.5*1 + 1/3, 1*1 + 1/3 and 0.5*1 + 1/3]

PageRank example

4. This value becomes the vertex's PageRank for the next iteration

[Diagram: the three vertices now hold 1.83, 1.33 and 0.83]

PageRank example

5. Repeat until convergence: the change in PageRank per iteration is less than epsilon (a compute() sketch follows below)
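The five steps above map almost one-to-one onto a single vertex-centric compute() method. Below is a rough Java sketch in the style of Giraph's BasicComputation API (newer than the 1.0.0 release used later in this talk); the MAX_SUPERSTEPS cutoff and the weight factor of 1 follow the toy example and are assumptions, not the talk's actual code.

```java
import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;  // illustrative stopping point

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      // Step 3: sum incoming shares (weight factor 1) and add 1/(# vertices).
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      // Step 4: the result becomes this vertex's PageRank for the next iteration.
      vertex.setValue(new DoubleWritable(sum + 1.0 / getTotalNumVertices()));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Step 2: distribute an equal portion of the PageRank to all neighbors.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();  // Step 5: stop once the iteration budget is spent
    }
  }
}
```

Vertex values are initialized by the input format (step 1); each superstep is then one round of steps 2 through 4.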

PageRank on MapReduce

1. Load complete input graph from disk as [K = vertex ID, V = out-edges and PR]

PageRank on MapReduce

2. Emit all input records (the full graph state), and emit [K = edgeTarget, V = share of PR]

PageRank on MapReduce

3. Sort and Shuffle this entire mess.

PageRank on MapReduce

4. Sum incoming PR shares for each vertex, update PR values in graph state records

PageRank on MapReduce

5. Emit full graph state to disk…

PageRank on MapReduce

6. … and START OVER!
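For contrast, here is a rough, hypothetical sketch of one such iteration as a standalone MapReduce job; the line format (vertexId, PageRank, comma-separated out-edges, tab-delimited), the GRAPH/PR tags and the 0.85/0.15 factors are illustrative assumptions rather than the code behind these slides.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PageRankIteration {
  // Steps 1-2: re-emit the whole graph state and emit each neighbor's PR share.
  public static class PRMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");       // id, PR, out-edges
      String[] neighbors = parts[2].split(",");
      double share = Double.parseDouble(parts[1]) / neighbors.length;
      ctx.write(new Text(parts[0]), new Text("GRAPH\t" + parts[2]));
      for (String neighbor : neighbors) {
        ctx.write(new Text(neighbor), new Text("PR\t" + share));
      }
    }
  }

  // Steps 4-5: sum incoming shares, rebuild the graph record, write it to disk.
  public static class PRReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text vertexId, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      double sum = 0;
      String outEdges = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t");
        if ("GRAPH".equals(parts[0])) {
          outEdges = parts[1];                  // carry the adjacency forward
        } else {
          sum += Double.parseDouble(parts[1]);  // accumulate PR shares
        }
      }
      double newRank = 0.85 * sum + 0.15;       // toy weight and adjustment
      ctx.write(vertexId, new Text(newRank + "\t" + outEdges));
    }
  }
}
```

The driver has to submit this job once per iteration, re-reading and re-writing the full graph each time, which is exactly the overhead the summary below calls out.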

PageRank on MapReduce

Awkward to reason about

I/O bound despite simple core business logic

PageRank on Giraph

1. Hadoop Mappers are “hijacked” to host Giraph master and worker tasks

PageRank on Giraph

2. Input graph is loaded once, maintaining code-data locality when possible

PageRank on Giraph

3. All iterations are performed on data in memory, optionally spilled to disk. Disk access is linear/scan-based

PageRank on Giraph

4. Output is written from the Mappers hosting the calculation, and the job run ends

PageRank on Giraph

This is all well and good, but must we manipulate Hadoop this way?

Heap and other resources are set once, globally for all Mappers in the computation

No control over which cluster nodes host which tasks

No control over how Mappers are scheduled

The Mapper and Reducer slot abstraction is meaningless for Giraph

Overview of YARN

YARN (Yet Another Resource Negotiator) is Hadoop's next-generation resource management platform

A general-purpose framework that is not tied to the MapReduce paradigm

Offers fine-grained control over each task's resource allocation

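To illustrate that fine-grained control, the fragment below shows how an ApplicationMaster can ask YARN for a container with a specific amount of memory and CPU per task, using the present-day AMRMClient API; the 2.0.3-alpha API used at the time differs in detail, and the 10 GB / 4 vcore figures are made-up examples.

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ResourceRequestSketch {
  // Ask the ResourceManager for one worker container with its own
  // memory/CPU capability, rather than inheriting a global Mapper heap size.
  public static void requestWorkerContainer(AMRMClient<ContainerRequest> amRmClient) {
    Resource capability = Resource.newInstance(10 * 1024, 4);  // 10 GB, 4 vcores
    ContainerRequest request = new ContainerRequest(
        capability, null /* nodes */, null /* racks */, Priority.newInstance(0));
    amRmClient.addContainerRequest(request);
  }
}
```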

Giraph on YARN

It’s a natural fit!

Giraph on YARN

[Diagram: Giraph on the YARN architecture. A Client submits the job to the ResourceManager, which launches an Application Master; NodeManagers then host the App Master, the Giraph Master and the Workers, coordinated through ZooKeeper]

Giraph Architecture

[Diagram: one Master coordinating many Workers, with ZooKeeper alongside]

Metrics

Performance: processing time

Scalability: graph size (number of vertices and number of edges)

Optimization Factors

JVM: GC control (parallel GC, concurrent GC, young generation size)

Giraph App: memory size, number of workers, combiner, out-of-core, object reuse
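As a hypothetical illustration of the JVM-side knobs, heap and GC settings can be handed to the task JVMs that host Giraph workers roughly like this; the property name is the classic Hadoop one and the values are invented, not the settings tuned in these experiments.

```java
import org.apache.hadoop.conf.Configuration;

public class WorkerJvmOptions {
  // Example only: 20 GB heap, explicit young-generation size, concurrent (CMS) GC.
  public static Configuration withGcTuning() {
    Configuration conf = new Configuration();
    conf.set("mapred.child.java.opts",
        "-Xmx20g -Xmn2g -XX:+UseConcMarkSweepGC");
    return conf;
  }
}
```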

Experimental Settings

Cluster: 43 nodes, ~800 GB memory, Hadoop-2.0.3-alpha (non-secure), Giraph-1.0.0-release

Data: LinkedIn social network graph (approx. 205 million vertices, approx. 11 billion edges)

Application: PageRank algorithm

Baseline Result

[Chart: Processing Time (sec) vs. Number of Vertices (mil), for 10 GB and 20 GB per worker]

10 vs. 20 GB per worker, max memory 800 GB

Processing time: 10 GB per worker gives better performance

Scalability: 20 GB per worker gives higher scalability

[Chart: Number of Vertices (mil) vs. Number of Workers, for 10 GB and 20 GB per worker, annotated with worker counts from 5 to 40 and total memory of 400 GB and 800 GB]

Heap Dump w/o Concurrent GC

Iteration 3: reachable 1.5 GB, unreachable 3 GB

Iteration 27: reachable 1.5 GB, unreachable 6 GB

A big portion of the unreachable objects are messages created at each superstep

Concurrent GC

Significantly improves scalability (about 3x)

Suffers a performance degradation of 16%

[Charts, 20 GB per worker: Memory Needed (GB) and Processing Time (sec) vs. Number of Vertices (mil), with concurrent GC on vs. off and a linear fit for ConGC OFF]

Using Combiner

Scales up 2x without any other optimizations

Speeds up performance by 50%

[Charts, 20 GB per worker: Number of Vertices (mil) and Processing Time (sec) vs. Number of Workers, with combiner on vs. off]
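What the combiner buys here, roughly: partial PageRank shares addressed to the same target vertex are summed before delivery, so far fewer messages are buffered and shipped. The sketch below follows the shape of Giraph's message-combiner interface; the exact class and method names vary between Giraph versions, so treat them as assumptions.

```java
import org.apache.giraph.combiner.MessageCombiner;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Sum combiner for PageRank messages: fold each new share into the message
// already buffered for the target vertex.
public class PageRankSumCombiner
    implements MessageCombiner<LongWritable, DoubleWritable> {

  @Override
  public void combine(LongWritable vertexIndex, DoubleWritable originalMessage,
                      DoubleWritable messageToCombine) {
    originalMessage.set(originalMessage.get() + messageToCombine.get());
  }

  @Override
  public DoubleWritable createInitialMessage() {
    return new DoubleWritable(0);
  }
}
```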

Memory Distribution

More workers achieve better performance

Larger memory size per worker provides higher scalability

[Chart, total memory 400 GB: Processing Time (sec) and Number of Vertices (mil) for worker configurations of 20 (20 GB), 40 (10 GB), 80 (5 GB) and 100 (4 GB)]

Application – Object Reuse

Improves scalability by 5x

Improves performance by 4x

Requires skill from application developers

[Charts, 20 GB per worker: Minimum Memory Needed (GB) and Processing Time (sec) vs. Number of Vertices (mil), with and without memory reuse; chart annotations: 650G, 29 mins]
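The object-reuse trick is essentially this: instead of allocating a fresh Writable for every outgoing message (the short-lived garbage visible in the earlier heap dumps), keep one instance and overwrite it. A hypothetical fragment, assuming the framework serializes the message before the object is reused:

```java
import org.apache.hadoop.io.DoubleWritable;

public class ReusedMessageHolder {
  // One message object for the lifetime of the worker thread, instead of one
  // allocation per outgoing message per superstep.
  private final DoubleWritable outMessage = new DoubleWritable();

  public DoubleWritable shareOf(double pageRank, int numEdges) {
    outMessage.set(pageRank / numEdges);  // overwrite in place, no new allocation
    return outMessage;
  }
}
```

This is also why the slide flags the need for skilled application developers: reuse is only safe if nothing holds on to the returned object across calls.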

Problems of Giraph on YARN

Various knobs must be tuned to make Giraph applications work efficiently

Highly dependent on skilled application developers

Performance penalties when scaling up

Future Direction

C++ provides direct control over memory management

No need to rewrite the whole of Giraph: only the master and worker would be in C++

Conclusion

LinkedIn is the first adopter of Giraph on YARN

Improvements and bug fixes contributed as patches to Apache Giraph

Ran the full LinkedIn graph on a 40-node cluster with 650 GB of memory

Evaluated various performance and scalability options