Multi-Tenant Data Cloud with YARN & Helix


Building applications on YARN with Helix

Transcript

Multi-Tenant Data Cloud with YARN & Helix

Kishore Gopalakrishna (@kishore_b_g)

LinkedIn - Data infra: Helix, Espresso
Yahoo - Ads infra: S4

What is YARN? Next Generation Compute Platform

Hadoop 1.0: MapReduce directly on HDFS.

Hadoop 2.0: YARN (cluster resource management) on HDFS; MapReduce and others (batch, interactive, online, streaming) run on top of YARN.

This enables containers from multiple applications (A, B, C) to share the same cluster.

YARN Architecture

The Client submits a job to the Resource Manager, with the app package staged in HDFS (a common area). Node Managers report node status to the Resource Manager; the Resource Manager serves container requests, and Node Managers launch the Application Master and worker containers.
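For orientation, a minimal sketch of that client-side submission flow using the standard YARN client API; the application name, AM launch command, and container resources below are illustrative placeholders, not values from the talk.

```java
// Minimal sketch: submit an application (and its Application Master) to the YARN RM.
// The app name, AM command, and resource sizes are illustrative placeholders.
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApp {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the Resource Manager for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("partitioned-data-server");

    // Describe how to launch the Application Master container
    // (the app package itself would be staged in HDFS as a local resource).
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("./run_app_master.sh"));
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(Resource.newInstance(1024 /* MB */, 1 /* vcores */));

    // Hand the job to the Resource Manager; it launches the AM on a Node Manager.
    yarnClient.submitApplication(appContext);
  }
}
```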

So, let’s build something


Example System

Generate data in Hadoop (an M/R job over HDFS) and use it for serving from a set of Redis servers.


Example System: Requirements

- Big Data :-)
- Partitioned, replicated
- Fault tolerant, scalable
- Efficient resource utilization

The Application Master for such a system must request containers, assign work, handle failures, and handle workload changes.

Allocation + Assignment

The M/R job generates partitioned data (p1-p6) in HDFS; multiple servers serve the partitioned data.

- Container Allocation - data affinity, rack-aware placement
- Partition Assignment - affinity, even distribution
- Replica Placement - replicas on different physical machines

[Diagram: partitions p1-p6, each with two replicas, spread across Server 1, Server 2, and Server 3]

Failure Handling

- On failure - distribute the load evenly across the surviving servers while waiting for a new container
- Acquire a new container, close to the data if possible
- Assign the failed partitions to the new container

[Diagram: Server 1 fails; its partitions are first absorbed by Server 2 and Server 3, then handed to a new Server 4]

Workload Changes

- Monitor - CPU, memory, latency, TPS
- Workload change - acquire/release containers
- Container change - redistribute work

[Diagram: as load grows, a new server is added and partitions are redistributed across the four servers]

Service Discovery

- Discover everything: what is running where
- Dynamically updated on changes

[Diagram: clients locate partitions across Server 1, Server 2, and Server 3 through service discovery]
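With Helix (introduced below), spectators get this discovery through a routing table that is updated as the cluster changes. A minimal sketch, assuming an illustrative cluster name, ZooKeeper address, resource name, and the SERVE state used later in this deck:

```java
// Sketch of a spectator using Helix's routing table for service discovery.
// Cluster, ZooKeeper address, resource, and partition names are illustrative.
import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.model.InstanceConfig;
import org.apache.helix.spectator.RoutingTableProvider;

public class DiscoveryClient {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "REDIS_CLUSTER", "client-1", InstanceType.SPECTATOR, "zk-host:2181");
    manager.connect();

    // The routing table tracks the external view and stays current on changes.
    RoutingTableProvider routingTable = new RoutingTableProvider();
    manager.addExternalViewChangeListener(routingTable);

    // Which instances are currently serving partition 1 of the data set?
    List<InstanceConfig> servers = routingTable.getInstances(
        "PartitionedDataServer", "PartitionedDataServer_1", "SERVE");
    for (InstanceConfig server : servers) {
      System.out.println(server.getHostName() + ":" + server.getPort());
    }
  }
}
```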

Building a YARN Application

Writing an Application Master is hard and error-prone; handling faults and workload changes is non-trivial and often overlooked.

- Request containers: how many containers, and where
- Assign work: place partitions & replicas, affinity
- Workload changes: acquire/release containers, minimize movement
- Fault handling: detect non-trivial failures, decide between new and reused containers
- Other: service discovery, monitoring

Is there something that can make this easy?

Apache Helix


What is Helix?

Built at LinkedIn, 2+ years in production

Generic cluster management framework

Contributed to Apache, now a TLP: helix.apache.org

Decoupling cluster management from core functionality


Helix at LinkedIn

OracleOracleOracleDB

Change Capture

ChangeConsumers

Index Search Index

User Writes

Data Replicator

In Production

ETL

HDFS

Analytics

14Thursday, June 5, 14

Helix at LinkedIn: In Production

- Over 1000 instances covering over 30,000 partitions
- Over 1000 instances for change capture consumers
- As many as 500 instances in a single Helix cluster

(all numbers are per-datacenter)

Others Using Helix



Helix Concepts

- Resource (database, index, topic, task)
- Partitions: p1, p2 ... p6
- Replicas: r1, r2, r3 for each partition
- Container processes that host the replicas

Assignment: which replicas run in which container processes?
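A sketch of how the example's resource, partitions, and replicas could be registered with Helix; the cluster name, resource name, and the custom state model name are illustrative (the state model itself is covered on the next slide):

```java
// Sketch: register a cluster and a partitioned, replicated resource with Helix.
// Cluster/resource names, partition count, and state model name are illustrative.
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;

public class SetupCluster {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");

    admin.addCluster("REDIS_CLUSTER");

    // A resource with 6 partitions, governed by a custom state model
    // (the state model definition is registered separately via addStateModelDef).
    admin.addResource("REDIS_CLUSTER", "PartitionedDataServer", 6, "BootstrapServeStop");

    // Ask Helix to assign replicas of each partition across the live container processes.
    admin.rebalance("REDIS_CLUSTER", "PartitionedDataServer", 3);
  }
}
```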


Helix Concepts: State Model and Constraints

State model: Bootstrap -> Serve -> Stop (StateCount for Serve = replication factor: 3)

State and transition constraints by scope:
- Partition: Serve: 3, Bootstrap: 0; max T1 transitions in parallel
- Resource: max T2 transitions in parallel
- Node: no more than 10 replicas; max T3 transitions in parallel
- Cluster: max T4 transitions in parallel
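A sketch of how this state model and its counts might be declared with Helix's state model definition builder; the state names follow the slide, while priorities are illustrative:

```java
// Sketch: declare the Bootstrap -> Serve -> Stop state model and its constraints.
// State names follow the slide; "R" means "as many replicas as the replication factor".
import org.apache.helix.model.StateModelDefinition;

public class BootstrapServeStopModel {
  public static StateModelDefinition build() {
    StateModelDefinition.Builder builder = new StateModelDefinition.Builder("BootstrapServeStop");

    builder.addState("SERVE", 1);      // highest-priority state
    builder.addState("BOOTSTRAP", 2);  // transient state while loading data
    builder.addState("STOP", 3);
    builder.initialState("STOP");

    builder.addTransition("STOP", "BOOTSTRAP");
    builder.addTransition("BOOTSTRAP", "SERVE");
    builder.addTransition("SERVE", "STOP");

    // StateCount: each partition should have R replicas in SERVE (R = replication factor, 3 here).
    builder.dynamicUpperBound("SERVE", "R");

    return builder.build();
  }
}
```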

Helix Architecture

[Diagram: the Controller (Target Provider, Provisioner, Rebalancer) assigns work to Participants via state-transition callbacks, moving partitions P1-P8 through the bootstrap/serve/stop states; Participants report metrics back; Spectators provide service discovery to Clients]
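On the participant side, "assign work via callback" means each container registers a state model whose transition methods Helix invokes; a sketch for the bootstrap/serve/stop model (class name and method bodies are illustrative):

```java
// Sketch of participant-side transition callbacks for the Bootstrap/Serve/Stop model.
// The class name is illustrative; the bodies would load, serve, and release a partition.
import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "STOP", states = {"STOP", "BOOTSTRAP", "SERVE"})
public class PartitionServer extends StateModel {

  @Transition(from = "STOP", to = "BOOTSTRAP")
  public void onBecomeBootstrapFromStop(Message message, NotificationContext context) {
    // e.g. copy the partition's generated data from HDFS into the local Redis instance
  }

  @Transition(from = "BOOTSTRAP", to = "SERVE")
  public void onBecomeServeFromBootstrap(Message message, NotificationContext context) {
    // e.g. start serving reads for this partition
  }

  @Transition(from = "SERVE", to = "STOP")
  public void onBecomeStopFromServe(Message message, NotificationContext context) {
    // e.g. stop serving and release local resources
  }
}
```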

Helix Controller: High-Level Overview

The Controller takes the resource config, constraints, and objectives as input. The TargetProvider computes the number of containers, the Provisioner acquires them from the YARN RM, and the Rebalancer computes the task-to-container mapping.

Helix Controller: Target Provider

Determines how many containers are required, along with their spec. Default implementations: Fixed, CPU, Memory, Bin Packing; a monitoring system provides usage information, and Bin Packing can be used to customize further.

Inputs:
- Resources: p1, p2 ... pn
- Existing containers: c1, c2 ... cn
- Health of tasks and containers: CPU, memory, health
- Allocation constraints: affinity, rack locality
- SLA: e.g. fixed 10 containers, CPU headroom 30%, memory usage 70%, time 5h

Outputs:
- Number of containers, with acquire and release lists
- Container spec: cpu x, memory y, location L
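The inputs and outputs above can be summarized as a contract; the interface below is a hypothetical sketch of that contract, not the actual Helix provisioning API, and every name in it is made up for illustration.

```java
// Hypothetical sketch of the target-provider contract described above.
// NOT the actual Helix API: all type and method names here are illustrative only.
import java.util.List;

public interface ContainerTargetProvider {

  /** Decide how many containers are needed and what each should look like. */
  TargetResponse computeTarget(List<String> partitions,          // p1, p2 ... pn
                               List<String> existingContainers,  // c1, c2 ... cn
                               ClusterHealth health,             // cpu, memory, task health
                               Constraints constraints);         // affinity, rack locality, SLA

  class TargetResponse {
    public int containerCount;           // total containers to run
    public List<ContainerSpec> acquire;  // new containers to request
    public List<String> release;         // containers to give back
  }

  class ContainerSpec {
    public int cpuVcores;
    public int memoryMb;
    public String location;              // preferred host or rack
  }

  class ClusterHealth { /* usage metrics reported by the monitoring system */ }

  class Constraints { /* affinity, rack locality, SLA targets such as TPS or headroom */ }
}
```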

Helix Controller: Provisioner

Given the container spec, it interacts with the YARN RM to acquire/release containers and with the NMs to start/stop them. The YARN provisioner talks to the RM and subscribes to its notifications.
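Under the hood, a YARN-backed provisioner would use the standard AM-side client libraries roughly as sketched below; resource sizes, priority, node preferences, and the launch command are illustrative.

```java
// Sketch: request a container from the YARN RM and start it on its Node Manager.
// Resource sizes, priority, node/rack preferences, and the command are illustrative.
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnProvisionerSketch {
  private final YarnConfiguration conf = new YarnConfiguration();

  /** Register as the AM and queue a request for one more container. */
  public AMRMClient<AMRMClient.ContainerRequest> requestContainer() throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    rmClient.registerApplicationMaster("", 0, "");

    Resource capability = Resource.newInstance(2048 /* MB */, 2 /* vcores */);
    Priority priority = Priority.newInstance(0);
    // The nodes/racks arguments can carry the data-affinity preference; null means "anywhere".
    rmClient.addContainerRequest(
        new AMRMClient.ContainerRequest(capability, null, null, priority));
    return rmClient;
  }

  /** Once the RM allocates a container (via allocate() heartbeats), start it on its NM. */
  public void launch(Container allocated) throws Exception {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setCommands(Collections.singletonList("./launch_participant.sh"));
    nmClient.startContainer(allocated, ctx);
  }
}
```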

Helix Controller: Rebalancer

Based on the current nodes in the cluster and the constraints, it finds an assignment of tasks to nodes. Modes: Auto, Semi-Auto, Static, User-defined.

Inputs:
- Tasks: t1, t2 ... tn
- Existing containers: c1, c2 ... cn
- Allocation constraints & objectives: affinity, rack locality, even distribution of tasks, minimize movement while expanding

Output:
- Assignment, e.g. C1: t1, t2; C2: t3, t4

Based on the FSM, it computes and fires the state transitions to the Participants.
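The mode is chosen per resource on its ideal state. A minimal sketch with the Helix admin API, reusing the illustrative cluster and resource names from the earlier sketches:

```java
// Sketch: choose the rebalance mode for a resource. FULL_AUTO lets the controller place
// partitions and replicas; SEMI_AUTO, CUSTOMIZED, and USER_DEFINED hand the application
// progressively more control. Cluster and resource names are illustrative.
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.IdealState.RebalanceMode;

public class ConfigureRebalancer {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");

    IdealState idealState = admin.getResourceIdealState("REDIS_CLUSTER", "PartitionedDataServer");
    idealState.setRebalanceMode(RebalanceMode.FULL_AUTO);
    idealState.setReplicas("2");
    admin.setResourceIdealState("REDIS_CLUSTER", "PartitionedDataServer", idealState);

    // Recompute placement: 2 replicas of each partition across the live participants.
    admin.rebalance("REDIS_CLUSTER", "PartitionedDataServer", 2);
  }
}
```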

Example System: Helix-Based Solution

- Configure App
- Configure Target Provider
- Configure Provisioner
- Configure Rebalancer

app_config_spec.yaml:

Configure App
  App Name: Partitioned Data Server
  App Master Package: /path/to/GenericHelixAppMaster.tar
  App Package: /path/to/RedisServerLauncher.tar
  App Config: DataDirectory: hdfs:/path/to/data

Configure Target Provider
  TargetProvider: RedisTargetProvider
  Goal: Target TPS: 1 million
  Min containers: 1
  Max containers: 25

Configure Provisioner
  YARN RM: host:port

Configure Rebalancer
  Partitions: 6
  Replicas: 2
  Max partitions per container: 4
  Rebalancer.Mode: AUTO
  Placement: Data Affinity
  FailureHandling: Even distribution
  Scaling: Minimize Movement

Launch Application

yarn_app_launcher.sh app_config_spec.yaml

Helix + YARN

1. The Client submits the job to the YARN Resource Manager.
2. The Resource Manager launches the Application Master, which hosts the Helix Controller (Target Provider, Provisioner, Rebalancer).
3. The Controller requests containers from the Resource Manager.
4. Node Managers launch the containers, which join the cluster as participants.
5. The Controller assigns work: partitions p1-p6 and their replicas are distributed across the participants on Server 1, Server 2, and Server 3.

Auto Scaling

Non-linear scaling from 0 to 1M TPS and back

Failure Handling: Random Faults

Recovering from faults at 1M TPS (5%, 10%, 20% failures/min)

Summary

The stack: applications (batch, interactive, online, streaming) run on Helix (container + task management), on YARN (cluster resource management), on HDFS.

- Generic Application Master
- Fault tolerance and expansion handled transparently
- Efficient resource utilization via the task model

Questions?

Website: helix.apache.org, #apachehelix
Twitter: @apachehelix, @kishore_b_g
Mail: user@helix.apache.org
Team: Kanak Biscuitwala, Zhen Zhang

We love helping & being helped!