• Applied research group: systems + database people building prototypes, publishing papers
• Collaborating with the Big Data product group at MS: shipping our code to production
• Open-sourcing our code: Apache Hadoop, REEF, Heron
Research areas: resource management, distributed tiered storage, query optimization, log analytics, stream processing.
[Figure: YARN architecture. A central Resource Manager coordinates a cluster of Node Managers; an application obtains resources in three steps.]

1. Request: the application asks the Resource Manager for containers
2. Allocation: the Resource Manager grants containers on specific Node Managers
3. Start task: the application launches its tasks in the allocated containers
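The three-step lifecycle can be sketched as a toy centralized scheduler; the class and method names below are illustrative, not YARN's actual API.

```python
class ResourceManager:
    """Toy centralized RM: tracks free slots per node and grants allocations."""

    def __init__(self, slots_per_node):
        self.free = dict(slots_per_node)  # node -> free slots

    def request(self, job, num_tasks):
        """Steps 1-2: the job requests containers; the RM allocates
        them on nodes that still have capacity."""
        allocations = []
        for node in sorted(self.free, key=self.free.get, reverse=True):
            while self.free[node] > 0 and len(allocations) < num_tasks:
                self.free[node] -= 1
                allocations.append((job, node))
        return allocations

    def start_task(self, job, node):
        """Step 3: the job launches a task in its allocated container."""
        return f"{job} running on {node}"

rm = ResourceManager({"N1": 2, "N2": 2})
allocs = rm.request("j1", 3)                         # 1. Request / 2. Allocation
started = [rm.start_task(j, n) for j, n in allocs]   # 3. Start task
```

Note that every allocation requires a round trip through the central RM; this is the source of the feedback-delay problem discussed later in the deck.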
Do we really need a Resource Manager?
[Figure: the Hadoop 1 World vs. the Hadoop 2 World, layer by layer]

Users: ad-hoc apps, Hive / Pig (in both worlds)
Application frameworks / programming model(s): Hadoop 1.x (MapReduce, MR v1) vs. MR v2, Tez, Giraph, Storm, Dryad, Spark, REEF, Heron, ...
Cluster OS (resource management): YARN (Hadoop 2 only)
File system: HDFS 1 vs. HDFS 2
Hardware

• Hadoop 1 was monolithic: MapReduce served as both programming model and resource manager
• Hadoop 2 allows reuse of the RM component (YARN) across frameworks
• YARN layering abstractions separate the cluster OS from the programming models
But is all this good enough for the Microsoft clusters?
Requirements for the Microsoft clusters:
• High resource utilization (the goal: close to 100% utilization)
• Scalability
• Workload heterogeneity: a wide variety of jobs
• Production jobs and predictability: recurring jobs (>60% of the workload), many with deadlines; today predictability comes from over-provisioned clusters
• Rayon/Morpheus:
• Mercury/Yaq: [Hadoop 3.0; ATC 2015, EuroSys 2016]
• YARN Federation:
• Medea:
4 Hadoop committers in CISL; 404 patches as of last night
[Figure: a centralized RM scheduling jobs j1 and j2 on nodes N1 and N2, one allocation round at a time]

• Feedback delays: nodes sit idle between allocations, waiting for the RM's next scheduling decision
• Actual slot utilization for different task durations:

Task duration  5 sec    10 sec   50 sec   Mixed-5-50   Cosmos-gm
Utilization    60.59%   78.35%   92.38%   78.54%       83.38%
• Introduce task queuing at nodes
  • Mask feedback delays
  • Improve cluster utilization
  • Improve task throughput (by up to 40%)
• Container types: GUARANTEED and OPPORTUNISTIC
  • Keep guarantees for important jobs
  • Use opportunistic execution to improve utilization
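A minimal sketch of the two container types; the class and method names are illustrative, not Mercury's actual implementation. A node runs GUARANTEED containers immediately, queues OPPORTUNISTIC ones when busy, and preempts opportunistic work to honor guarantees.

```python
from collections import deque

class NodeManager:
    """Toy node with Mercury-style container types (illustrative only)."""

    def __init__(self, slots):
        self.slots = slots
        self.running = []     # list of (container_id, type)
        self.queue = deque()  # queued opportunistic containers

    def start(self, cid, ctype):
        if len(self.running) < self.slots:
            self.running.append((cid, ctype))
        elif ctype == "GUARANTEED":
            # Preempt an opportunistic container to honor the guarantee.
            for i, (rid, rtype) in enumerate(self.running):
                if rtype == "OPPORTUNISTIC":
                    self.queue.appendleft((rid, rtype))  # re-queue preempted task
                    self.running[i] = (cid, ctype)
                    return
            raise RuntimeError("node saturated with guaranteed containers")
        else:
            self.queue.append((cid, ctype))  # queue instead of idling: masks RM delay

    def finish(self, cid):
        self.running = [(r, t) for r, t in self.running if r != cid]
        if self.queue and len(self.running) < self.slots:
            self.running.append(self.queue.popleft())

nm = NodeManager(slots=1)
nm.start("o1", "OPPORTUNISTIC")  # runs immediately
nm.start("o2", "OPPORTUNISTIC")  # node busy: queued
nm.start("g1", "GUARANTEED")     # preempts o1, which is re-queued
```

When `g1` finishes, the node pulls `o1` from its local queue with no RM round trip, which is how queuing masks the feedback delay.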
[Figure: the same scheduling animation, now with task queues at the nodes; queued tasks keep N1 and N2 busy between RM allocation rounds]
So all we need to do is use long queues?
Despite the utilization gains, long queues can be detrimental for job completion times. Proper queue management techniques are required.
[Figure: nodes N1, N2, N3, each with a task queue]

Yaq's queue management techniques:
• Place tasks to node queues
• Prioritize task execution (queue reordering)
• Bound queue lengths

Yaq improves median job completion time by 1.7x over YARN
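The third technique, bounding queue lengths, can be sketched as a simple admission check at each node; the static cap here is illustrative, not Yaq's actual bound.

```python
MAX_QUEUE = 4  # illustrative static cap, not Yaq's actual policy

def try_enqueue(queue, task, max_len=MAX_QUEUE):
    """Admit a task only if the node's queue is below its bound;
    otherwise the scheduler must place it elsewhere."""
    if len(queue) < max_len:
        queue.append(task)
        return True
    return False

q = []
admitted = [try_enqueue(q, f"t{i}") for i in range(6)]
# → [True, True, True, True, False, False]; q holds t0..t3
```

Bounding prevents any single node from hoarding tasks that would wait a long time behind its queue while other nodes drain.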
[Figure: the RM placing tasks across the queues of N1, N2, N3]

Task placement policies:
• Queue length: place each task on the node with the shortest queue
• Queue wait time: place each task on the node with the lowest estimated queue wait time
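The two placement policies can be sketched as follows; the function names are mine, and the per-task duration estimates are assumed to come from the history of recurring jobs.

```python
def place_by_queue_length(queues):
    """Pick the node with the fewest queued tasks."""
    return min(queues, key=lambda node: len(queues[node]))

def place_by_wait_time(queues, est_duration):
    """Pick the node with the lowest estimated queue wait time,
    i.e. the sum of estimated durations of already-queued tasks."""
    return min(queues, key=lambda node: sum(est_duration[t] for t in queues[node]))

queues = {"N1": ["t1", "t2"], "N2": ["t3"], "N3": ["t4"]}
est_duration = {"t1": 5, "t2": 5, "t3": 50, "t4": 8}

place_by_queue_length(queues)             # → "N2" (first of the tied shortest queues)
place_by_wait_time(queues, est_duration)  # → "N3" (8s wait beats N1's 10s, N2's 50s)
```

The example shows why wait time is the better signal: N2 has the shortest queue but the longest wait, because its one queued task is long.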
Queue reordering policies:
• Shortest Remaining Job First (SRJF)
• Least Remaining Tasks First (LRTF)

Example: j1 has 21 remaining tasks, j2 has 5, j3 has 9. A job-aware policy such as LRTF runs j2's queued tasks first, then j3's, then j1's. Job-unaware ordering leads to lower throughput and longer job completion times.
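A minimal sketch of LRTF reordering, assuming each queued task is tagged with its job id and the scheduler knows per-job remaining task counts (the "job:task" encoding is illustrative):

```python
def reorder_lrtf(queue, remaining_tasks):
    """Least Remaining Tasks First: run tasks of the job with the
    fewest remaining tasks first, so short jobs finish quickly."""
    return sorted(queue, key=lambda task: remaining_tasks[task.split(":")[0]])

remaining_tasks = {"j1": 21, "j2": 5, "j3": 9}
queue = ["j1:t1", "j2:t1", "j3:t1", "j1:t2", "j2:t2"]
reorder_lrtf(queue, remaining_tasks)
# → ["j2:t1", "j2:t2", "j3:t1", "j1:t1", "j1:t2"]
```

SRJF would look the same but sort by estimated remaining work (remaining tasks times estimated task duration) rather than raw task count.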
Takeaways:
• Proper queue management techniques improve both cluster utilization and job completion time (1.7x improvement in median JCT over YARN)
• Container types (GUARANTEED / OPPORTUNISTIC) enable distributed scheduling and can support any distributed scheduler
• Open directions: over-commitment, multi-tenancy, pricing