Page 1

Quincy: Fair Scheduling for Distributed Computing Clusters

Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg @ Microsoft Research

Presenter: Weiyue Xu

22nd ACM Symposium on Operating Systems Principles

Page 2

Credit

• Modified version of

www.sigops.org/sosp/sosp09/slides/quincy/QuincyTestPage.html

www.cs.uiuc.edu/class/sp11/cs525/slides.021711.ppt

Page 3

Outline

• Introduction
• Goal of Quincy
• Baseline: Queue Based Scheduler
• Flow Based Scheduler: Quincy
• Evaluation
• Conclusion

Page 4

Motivation

• Popularity of data-intensive cluster computing

Fairness
• More than 50% of jobs are small (less than 30 minutes)
• A large job should not monopolize the cluster
• If job X takes t seconds when it runs exclusively on the cluster, then X should take no more than J·t seconds when the cluster has J concurrent jobs (for N computers and J jobs, each job should get at least N/J computers); the sketch below spells out the arithmetic
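A back-of-the-envelope check of this fairness bound; the numbers below are invented for the example, not figures from the paper.

    # Toy arithmetic for the fairness bound; all numbers are illustrative.
    N, J = 240, 8            # computers in the cluster, concurrent jobs
    t = 10 * 60              # seconds job X needs when it has the cluster to itself
    fair_share = N // J      # each job should get at least N/J = 30 computers
    worst_case = J * t       # job X should finish within J*t seconds = 80 minutes
    print(fair_share, worst_case)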

Data locality
• Large disks are directly attached to the computers
• Network bandwidth is expensive

Page 5

Problem setting & assumptions

• Homogeneous environment
• Dryad distributed execution platform
  Similar to MapReduce and Hadoop
  Each job contains one "root task" and several "worker tasks"
  Tasks are independent of each other

Page 6

• For MPI (message-passing) jobs: coarse-grain scheduling
  Devote a fixed set of computers to a particular job
  Static allocation; the allocation rarely changes
  Tasks have dependencies, so killing a task is costly
  No direct-attached storage
• For Dryad jobs: fine-grain resource sharing
  Multiplex all computers in the cluster between all jobs
  When one task completes, its computer may be reassigned to another job
  Tasks are independent (less costly to kill a task and restart it)
  Large datasets are attached to each computer

Page 7

Example of Coarse Grain Sharing

Page 8

Example of Fine Grain Sharing

Each job uses N/J computers at any one time, but the set of computers it uses varies over its lifetime

Page 9

Data Locality

• Data transfer cost depends on the size and location of the task's input data.

Page 10

Goal of Quincy

• Fairness + data locality
• With N computers and J concurrent jobs, each job gets at least N/J computers
• Data locality: place tasks near their data to avoid network bottlenecks
• Joint optimization of fairness and data locality: a multi-constrained optimization problem with trade-offs!

Page 11

Cluster Architecture

Page 12

Baseline: Queue Based Scheduler

Page 13

• Greedy (G): for each worker task, the root task computes locality by estimating how much data would need to be transferred if computer m were assigned to the task (preference list: Cm > Rl > X); fairness is not considered. A sketch of this queue-based dispatch appears after this list.
• Simple Greedy Fairness (GF): a "blocked" job is not assigned more computers, but pre-existing tasks of now-blocked jobs are allowed to run to completion (similar to the Hadoop Fair Scheduler)
• Fairness with Preemption (GFP): tasks over a job's quota are killed

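A minimal sketch of such a queue-based dispatcher, assuming a three-level queue structure (per-computer, per-rack, cluster-wide). The class and helper names are invented for illustration; this is not the paper's implementation.

    from collections import defaultdict, deque

    class QueueScheduler:
        """Greedy (G) baseline: prefer a task's own computer queue (Cm),
        then its rack queue (Rl), then the cluster-wide queue (X)."""
        def __init__(self, rack_of):
            self.rack_of = rack_of                   # maps computer name -> rack name
            self.computer_q = defaultdict(deque)     # Cm: tasks that want this computer
            self.rack_q = defaultdict(deque)         # Rl: tasks that want this rack
            self.cluster_q = deque()                 # X: any computer will do
            self.started = set()

        def submit(self, task, preferred_computers, preferred_racks):
            # The root task derives the preference lists from where the task's input lives.
            for c in preferred_computers:
                self.computer_q[c].append(task)
            for r in preferred_racks:
                self.rack_q[r].append(task)
            self.cluster_q.append(task)

        def computer_idle(self, computer):
            # When a computer frees up, take work from the most local queue first.
            for q in (self.computer_q[computer],
                      self.rack_q[self.rack_of[computer]],
                      self.cluster_q):
                while q:
                    task = q.popleft()
                    if task not in self.started:     # a task may sit in several queues
                        self.started.add(task)
                        return task
            return None                              # nothing runnable right now

On top of this structure, GF simply stops dispatching new tasks for a job that is over its quota (blocked), and GFP additionally kills that job's over-quota tasks.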

Page 14

Flow Based Scheduler: Quincy

• Main idea: matching = scheduling
  Construct a graph based on the scheduling constraints and the cluster architecture
  Assign costs to each matching
  Finding a min-cost flow on the graph is equivalent to finding a feasible schedule
  Each task is either scheduled on a computer or remains unscheduled
  Fairness constrains the number of tasks scheduled for each job

Page 15

New Goal

• Minimize the matching cost while obeying the fairness constraints
  Instead of making local decisions (as the queue-based schedulers do), solve the problem globally
• Issues:
  How to construct the graph?
  How to embed the fairness and locality constraints in the graph?

Page 16

Graph Construction

• Start with a directed graph representation of the cluster architecture

Page 17

Graph Construction (2)

• Add an unscheduled node Uj

• Each worker task has an edge to Uj

• There is a single edge from Uj to the sink

• High cost on edges from tasks to Uj.

• The cost and flow on the edge from Uj to the sink control fairness

• Fairness is controlled by adjusting the number of tasks allowed to run for each job

Page 18

Graph Construction (3)

• Add edges from tasks (T) to computers (C), racks (R), and the cluster (X)

• Control over data locality: cost(T-C) << cost(T-R) << cost(T-X)

• A zero-cost edge from each root task to its computer avoids preempting root tasks (see the flow-graph sketch below)
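To make the construction concrete, here is a small flow-network sketch written with networkx. This is an assumption-laden toy: Quincy uses its own incremental min-cost-flow solver and richer cost terms, and the node names, cost constants, and the data_cost helper below are invented for illustration.

    import networkx as nx

    def build_quincy_style_graph(jobs, racks, machines_per_rack, data_cost, min_share):
        """jobs: {job_id: [task_id, ...]}.
        data_cost(task, machine) -> cost of reading the task's input from that machine.
        min_share: number of computers each job is guaranteed (fairness).
        Returns a DiGraph whose min-cost flow encodes a schedule."""
        g = nx.DiGraph()
        machines = {f"R{r}": [f"M{r}_{m}" for m in range(machines_per_rack)]
                    for r in range(racks)}
        n_tasks = sum(len(ts) for ts in jobs.values())
        g.add_node("sink", demand=n_tasks)            # every task's unit of flow ends here

        # Cluster aggregator X -> rack aggregators -> computers -> sink (all zero cost).
        for rack, ms in machines.items():
            g.add_edge("X", rack, capacity=len(ms), weight=0)
            for m in ms:
                g.add_edge(rack, m, capacity=1, weight=0)
                g.add_edge(m, "sink", capacity=1, weight=0)   # at most one task per computer

        for job, tasks in jobs.items():
            u = f"U_{job}"                            # unscheduled aggregator for this job
            # Allowing at most len(tasks) - min_share units through U_job -> sink
            # guarantees the job at least min_share running tasks (fairness).
            g.add_edge(u, "sink", capacity=max(0, len(tasks) - min_share), weight=0)
            for t in tasks:
                g.add_node(t, demand=-1)              # each task supplies one unit of flow
                g.add_edge(t, u, weight=1000)         # high cost: stay unscheduled
                g.add_edge(t, "X", weight=100)        # run anywhere in the cluster
                for rack, ms in machines.items():
                    g.add_edge(t, rack, weight=10)    # run somewhere in this rack
                    for m in ms:
                        g.add_edge(t, m, weight=data_cost(t, m), capacity=1)
        return g

    # Toy run: two jobs on 2 racks x 2 computers; task "a0" has its input on M0_0.
    jobs = {"a": ["a0", "a1", "a2"], "b": ["b0", "b1"]}
    cost = lambda task, m: 0 if (task, m) == ("a0", "M0_0") else 5
    g = build_quincy_style_graph(jobs, racks=2, machines_per_rack=2,
                                 data_cost=cost, min_share=2)
    flow = nx.min_cost_flow(g)                        # node -> {neighbor -> flow units}
    for task in [t for ts in jobs.values() for t in ts]:
        target = next(dst for dst, f in flow[task].items() if f > 0)
        print(task, "->", target)                     # a computer, a rack, X, or U_<job>

Updating the edge costs as tasks wait or run (next slide) and re-solving yields the online scheduler; Quincy solves the updated flow problem incrementally rather than from scratch.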

Page 19

A Feasible Matching

• Cost of T-U edge increases over time

• A new cost is assigned to the T-C edge of an already-running task; it also increases over time

Page 20

Final Graph

Page 21

Evaluation

• Typical Dryad jobs (Sort, Join, PageRank, WordCount, Prime)

• Prime used as a worst-case job that hogs the cluster if started first

• 240 computers in the cluster: 8 racks with 29-31 computers per rack

• More than one metric used for evaluation

Page 22

Experiments

Page 23

Experiments (2)

Page 24

Experiments (3)

Page 25

Experiments (4)

Page 26

Experiments (5)

Page 27

Makespan when the network is the bottleneck(s)

Page 28

Data Transfer (TB)

Page 29

Conclusion

• A new computational model for data-intensive computing

• An elegant mapping of scheduling to a min-cost flow / matching problem

Page 30

Discussion

• Assumes a homogeneous environment
• Centralized Quincy controller: a single point of failure
• No theoretical stability guarantee
• Cost measure: fairness, cost of killing tasks

Page 31

Questions or Comments?

Thanks!

