Page 1: NEPTUNE: Scheduling Suspendable Tasks for Unified Stream/Batch Applications

SoCC, Santa Cruz, California, November 2019

Panagiotis Garefalakis, Imperial College London
Konstantinos Karanasos, Microsoft (kokarana@microsoft.com)
Peter Pietzuch, Imperial College London (prp@imperial.ac.uk)

Page 2: Unified application example

[Figure: a unified stream/batch application. A training (batch) job iterates over historical data to produce a trained model; an inference (stream) job uses the trained model on real-time data to return low-latency responses.]

Panagiotis Garefalakis - Imperial College London

Page 3: Evolution of analytics frameworks

[Figure: timeline marked 2010, 2014, and 2018 with four groups of frameworks: batch frameworks, stream frameworks, frameworks with hybrid stream/batch applications, and unified stream/batch frameworks (Structured Streaming).]

Page 4: Stream/Batch application requirements

Requirements
> Latency: Execute inference job with minimum delay
> Throughput: Batch jobs should not be compromised
> Efficiency: Achieve high cluster resource utilization

Challenge: schedule stream/batch jobs to satisfy their diverse requirements

Page 5: Stream/Batch application scheduling

[Figure: the application code submits the app to the driver; the context runs each job through the DAG Scheduler. The example workload has two jobs:]
> Inference (stream) job: Stage 1 with 2 tasks of duration T, Stage 2 with 2 tasks of duration T
> Training (batch) job: Stage 1 with 4 tasks of duration 3T, Stage 2 with 3 tasks of duration T

Page 6: Stream/Batch application scheduling

> Static allocation: dedicate resources to each job

[Figure: the example workload (inference job: two stages of 2 tasks of T each; training job: 4 tasks of 3T, then 3 tasks of T) scheduled on the cores of two dedicated executors, executor 1 and executor 2, over a time axis marked 2T, 4T, 6T, 8T. Idle core time is labelled as wasted resources.]

Resources cannot be shared across jobs

Page 7: Stream/Batch application scheduling

> FIFO: first job runs to completion

[Figure: the inference (stream) and training (batch) tasks of the example workload scheduled on the cores of shared executors over a time axis marked 2T, 4T, 6T, 8T.]

Long batch jobs increase stream job latency

Page 8: Stream/Batch application scheduling

> FAIR: share resources across jobs by weight

[Figure: the inference (stream) and training (batch) tasks of the example workload scheduled on the cores of shared executors over a time axis marked 2T, 4T, 6T, 8T. Part of the schedule is labelled as queuing.]

Better packing with non-optimal latency

Page 9: Stream/Batch application scheduling

> KILL: avoid queueing by preempting batch tasks

[Figure: the inference (stream) and training (batch) tasks of the example workload scheduled on the cores of shared executors over a time axis marked 2T, 4T, 6T, 8T. Two batch tasks appear as a T segment followed by a full 3T rerun.]

Better latency at the expense of extra work

Page 10: Stream/Batch application scheduling

> NEPTUNE: minimize queueing and wasted work!

[Figure: the inference (stream) and training (batch) tasks of the example workload scheduled on the cores of shared executors over a time axis marked 2T, 4T, 6T, 8T. Two batch tasks appear as a T segment followed by a 2T remainder.]
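The difference between the KILL and NEPTUNE schedules above can be checked with a little arithmetic. The following is a minimal sketch, assuming a single core, a 3T batch task interrupted after T, and a stream task of length T; the function name and the single-core model are illustrative and do not come from the slides.

```python
def preempt_batch_task(task_len, preempt_at, stream_len, policy):
    """Simulate one core running a batch task that a stream task interrupts.

    All durations are in units of T.
    Returns (batch_finish_time, wasted_work).
    """
    if not 0 < preempt_at < task_len:
        raise ValueError("preempt_at must fall strictly inside the batch task")
    stream_done = preempt_at + stream_len
    if policy == "kill":
        # Progress is discarded: the batch task restarts from scratch.
        return stream_done + task_len, preempt_at
    if policy == "suspend":
        # Progress is kept: only the remaining work runs afterwards.
        return stream_done + (task_len - preempt_at), 0
    raise ValueError(f"unknown policy: {policy}")


for policy in ("kill", "suspend"):
    finish, wasted = preempt_batch_task(
        task_len=3, preempt_at=1, stream_len=1, policy=policy
    )
    print(f"{policy}: batch task finishes at {finish}T, wasted work {wasted}T")
```

Under these assumptions, killing reruns the full 3T and wastes the first T of progress, while suspending only leaves the 2T remainder to run.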

Page 11: Challenges

> How to minimize queuing for latency-sensitive jobs and wasted work?
  Implement suspendable tasks

> How to natively support stream/batch applications?
  Provide a unified execution framework

> How to satisfy different stream/batch application requirements and high-level objectives?
  Introduce custom scheduling policies

Page 12: NEPTUNE: Execution framework for Stream/Batch applications

> Support suspendable tasks
> Introduce pluggable scheduling policies
> Unified execution framework on top of Structured Streaming

Page 13: Typical tasks

> Tasks: apply a function to a partition of data
> Subroutines that run in executor to completion
> Preemption problem:
  > Loss of progress (kill)
  > Unpredictable preemption times (checkpointing)

[Figure: the executor stack holding a task run with its State, Context, Iterator, Function, and Value.]
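A run-to-completion task of this kind can be sketched as an ordinary function call. This is a minimal illustration in Python rather than Spark code; `run_task` and `double_all` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def run_task(
    func: Callable[[Iterator[A]], Iterator[B]], partition: Iterable[A]
) -> List[B]:
    """Run a task as a plain subroutine: it only returns once it is done."""
    return list(func(iter(partition)))


def double_all(records: Iterator[int]) -> Iterator[int]:
    """Example task function applied to every record of a partition."""
    for record in records:
        yield record * 2


result = run_task(double_all, [1, 2, 3])
print(result)
```

Once `run_task` is called, the caller cannot regain control before the whole partition has been processed, which is why preempting such a task means either killing it or checkpointing it.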

Page 14: Suspendable tasks

> Idea: use coroutines
  > Separate stacks to store task state
  > Yield points handing over control to the executor

> Cooperative preemption:
  > Suspend and resume in milliseconds
  > Work-preserving
  > Transparent to the user

[Figure: the executor stack (Task run, State, Context, Value) calls into a separate coroutine stack (Function, Context, Iterator), which yields control back to the executor.]

https://github.com/storm-enroute/coroutines
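The idea of yield points can be illustrated with Python generators. This is only an analogy under stated assumptions: Python generators are not the separate-stack Scala coroutines linked above, and `suspendable_task` and `ToyExecutor` are hypothetical names rather than part of NEPTUNE.

```python
from typing import Any, Callable, Generator, Iterable, List, Optional


def suspendable_task(
    func: Callable[[Any], Any], partition: Iterable[Any]
) -> Generator[None, None, List[Any]]:
    """Coroutine-style task: hands control back after every record."""
    results = []
    for record in partition:
        results.append(func(record))
        yield  # yield point: the executor may pause here and resume later
    return results


class ToyExecutor:
    """Toy executor that drives a task and can pause it at a yield point."""

    def run(self, task, max_steps: Optional[int] = None):
        """Advance the task; return its result, or None if it was suspended."""
        steps = 0
        while max_steps is None or steps < max_steps:
            try:
                next(task)
            except StopIteration as done:
                return done.value  # the task ran to completion
            steps += 1
        return None  # suspended: the generator object keeps the task state


executor = ToyExecutor()
batch = suspendable_task(lambda x: x * x, [1, 2, 3, 4])
suspended = executor.run(batch, max_steps=2)  # pause the batch task early
stream_result = executor.run(suspendable_task(str.upper, ["a", "b"]))
batch_result = executor.run(batch)  # resume where the batch task stopped
print(stream_result, batch_result)
```

The batch task keeps the two results it computed before being paused, so resuming it only processes the remaining records. That is the work-preserving property the slide refers to.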

Page 15: Execution framework

> Idea: centralized scheduler with pluggable policies
> Problem: not just assign but also suspend and resume

[Figure: an app with low- and high-priority jobs passes through the Incrementalizer and Optimizer to the DAG scheduler and then the Task Scheduler, which consults the scheduling policy. The Task Scheduler launches tasks on the executors, suspends running low-priority tasks to run high-priority ones (tasks are shown as running or paused), and receives metrics back from the executors.]
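The "suspend & run task" step can be sketched as a toy centralized scheduler. This is a minimal sketch, assuming a fixed number of task slots and two priority levels; `Task` and `ToyScheduler` are hypothetical and do not mirror Spark's or NEPTUNE's classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    high_priority: bool
    state: str = "pending"  # pending, running, paused, or finished


@dataclass
class ToyScheduler:
    slots: int
    running: List[Task] = field(default_factory=list)
    paused: List[Task] = field(default_factory=list)
    waiting: List[Task] = field(default_factory=list)

    def submit(self, task: Task) -> None:
        """Assign a task, suspending a low-priority one if needed."""
        if len(self.running) < self.slots:
            self._start(task)
            return
        victim = self._pick_victim() if task.high_priority else None
        if victim is None:
            self.waiting.append(task)  # nothing to suspend: queue the task
            return
        self.running.remove(victim)
        victim.state = "paused"
        self.paused.append(victim)
        self._start(task)

    def finish(self, task: Task) -> None:
        """Free the slot and resume paused work before queued work."""
        self.running.remove(task)
        task.state = "finished"
        if self.paused:
            self._start(self.paused.pop(0))
        elif self.waiting:
            self._start(self.waiting.pop(0))

    def _pick_victim(self) -> Optional[Task]:
        for candidate in self.running:
            if not candidate.high_priority:
                return candidate
        return None

    def _start(self, task: Task) -> None:
        task.state = "running"
        self.running.append(task)


scheduler = ToyScheduler(slots=2)
b1, b2 = Task("batch-1", False), Task("batch-2", False)
s1 = Task("stream-1", True)
for t in (b1, b2, s1):
    scheduler.submit(t)
states_during = (b1.state, b2.state, s1.state)
scheduler.finish(s1)
states_after = (b1.state, b2.state, s1.state)
print(states_during, states_after)
```

Which running task to suspend is exactly the decision a pluggable policy would make; here the first low-priority task found is chosen for simplicity.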

Page 16: Scheduling policies

> Idea: policies trigger task suspension and resumption
  > Guarantee that stream tasks bypass batch tasks
  > Satisfy higher-level objectives, e.g. balance cluster load
  > Avoid starvation by suspending a task only up to a bounded number of times

> Load-balancing (LB): takes into account executors' memory conditions and equalizes the number of tasks per node

> Locality- and memory-aware (LMA): respects task locality preferences in addition to load-balancing
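The two policies and the starvation guard can be sketched as small selection functions. The slide does not give the exact criteria, so the memory threshold, the tie-breaking, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ExecutorInfo:
    name: str
    running_tasks: int
    free_memory_mb: int


def pick_executor_lb(
    executors: List[ExecutorInfo], min_free_memory_mb: int
) -> Optional[ExecutorInfo]:
    """Load-balancing: least-loaded executor with enough free memory."""
    candidates = [e for e in executors if e.free_memory_mb >= min_free_memory_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.running_tasks)


def pick_executor_lma(
    executors: List[ExecutorInfo], preferred: Set[str], min_free_memory_mb: int
) -> Optional[ExecutorInfo]:
    """Locality- and memory-aware: try preferred executors first, then any."""
    local = [e for e in executors if e.name in preferred]
    choice = pick_executor_lb(local, min_free_memory_mb)
    if choice is not None:
        return choice
    return pick_executor_lb(executors, min_free_memory_mb)


def may_suspend(times_suspended: int, max_suspensions: int) -> bool:
    """Starvation guard: a task is suspended only a bounded number of times."""
    return times_suspended < max_suspensions


executors = [
    ExecutorInfo("exec-1", running_tasks=3, free_memory_mb=4096),
    ExecutorInfo("exec-2", running_tasks=1, free_memory_mb=512),
    ExecutorInfo("exec-3", running_tasks=2, free_memory_mb=2048),
]
lb_choice = pick_executor_lb(executors, min_free_memory_mb=1024)
lma_choice = pick_executor_lma(executors, {"exec-1"}, min_free_memory_mb=1024)
print(lb_choice.name, lma_choice.name)
```

In this sketch `exec-2` is skipped for lack of memory, so load-balancing picks the less loaded of the remaining two, while the locality-aware variant accepts a busier executor because it holds the task's data.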

Page 17: Implementation

> Built as an extension to Spark 2.4.0 (https://github.com/lsds/Neptune)

> Ported all ResultTask, ShuffleMapTask functionality across programming interfaces to coroutines

> Extended Spark's DAG Scheduler to allow job stages with different requirements (priorities)

> Added additional Executor performance metrics as part of the heartbeat mechanism

Page 18: Azure deployment

> Cluster
  – 75 nodes with 4 cores and 32 GB of memory each

> Workloads
  – LDA: ML training/inference application uncovering hidden topics from a group of documents
  – Yahoo Streaming Benchmark: ad-analytics on a stream of ad impressions
  – TPC-H decision support benchmark

Page 19: Benefit of NEPTUNE in stream latency

> LDA: training (batch) job using all available resources, with a latency-sensitive inference (stream) job using 15% of resources

[Figure: streaming latency in seconds (y-axis, 0 to 6), shown as 5th percentile, median, and 99th percentile, for Static allocation, FIFO, FAIR, KILL, Neptune LMA, Neptune LB, and Isolation. Annotations on the plot: 37%, 13%, 61%, 54%.]

NEPTUNE achieves latencies comparable to the ideal for the latency-sensitive jobs

Page 20: Impact of resource demands on performance

> YSB: increasing stream job resource demands while the batch job uses all available resources

[Figure: streaming latency in seconds (left y-axis, 0 to 6) and batch throughput in M events/s (right y-axis, 3.85 to 3.95) against the percentage of cores used for streaming (x-axis, 0% to 100%). Annotation on the plot: 1.5%.]

Efficiently share resources with low impact on throughput

Page 21: Summary

NEPTUNE supports complex unified applications with diverse job requirements!

> Suspendable tasks using coroutines
> Pluggable scheduling policies
> Continuous unified analytics

https://github.com/lsds/Neptune

Thank you! Questions?

Panagiotis Garefalakis

