Posted on 23-Dec-2015 (transcript)
Sparrow: Distributed, Low-Latency Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Sparrow schedules tasks in clusters using a decentralized, randomized approach, supports constraints and fair sharing, and provides response times within 12% of ideal.
Scheduling Setting
[Diagram: MapReduce/Spark/Dryad jobs, each composed of many tasks, submitted to the cluster]
Job Latencies Rapidly Decreasing
Timeline (latencies falling from 10 min. toward 1 ms):
2004: MapReduce batch job
2009: Hive query
2010: Dremel query
2010: In-memory Spark query
2012: Impala query
2013: Spark streaming
Scheduling challenges:
Millisecond Latency
Quality Placement
Fault Tolerant
High Throughput
Scheduler throughput needed for 1000 16-core machines:
10 min. tasks → 26 decisions/second
10 sec. tasks → 1.6K decisions/second
100 ms tasks → 160K decisions/second
1 ms tasks → 16M decisions/second
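The throughput figures above follow from simple arithmetic: 1000 machines × 16 cores gives 16,000 task slots, and a fully loaded cluster must make one placement decision every time a slot frees up. A minimal sketch of that calculation (only the slot count and task durations come from the slide):

```python
# Scheduler throughput needed to keep a 1000-machine, 16-core cluster busy:
# each finished task requires one new placement decision.
slots = 1000 * 16  # 16,000 concurrent task slots

for duration_ms, label in [(600_000, "10 min."), (10_000, "10 sec."),
                           (100, "100 ms"), (1, "1 ms")]:
    decisions_per_sec = slots * 1000 // duration_ms  # exact integer math
    print(f"{label:>7} tasks -> {decisions_per_sec:,} decisions/second")
```

This reproduces the slide's numbers: 26, 1.6K, 160K, and 16M decisions per second.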
Today: Completely Centralized → Less centralization → Sparrow: Completely Decentralized
Millisecond latency: centralized ✗, Sparrow ✓
Quality placement: centralized ✓, Sparrow ?
Fault tolerant: centralized ✗, Sparrow ✓
High throughput: centralized ✗, Sparrow ✓
Sparrow: Decentralized approach
Existing randomized approaches
Batch sampling
Late binding
Analytical performance evaluation
Handling constraints
Fairness and policy enforcement
Within 12% of ideal on 100 machines
Scheduling with Sparrow
[Diagram: multiple schedulers and workers; a job arrives at one scheduler]
Random
[Diagram: each task placed on a worker chosen uniformly at random]
Simulated Results
100-task jobs in 10,000-node cluster, exp. task durations
Omniscient: infinitely fast centralized scheduler
Per-task sampling
[Diagram: for each task, the scheduler probes two workers chosen at random]
Power of Two Choices
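The power-of-two-choices idea behind per-task sampling fits in a few lines: for each task, probe d = 2 workers at random and enqueue the task at the less-loaded one. A hedged sketch (the worker count and idle starting state are made up for illustration):

```python
import random

def place_per_task(queue_lens, num_tasks, d=2):
    """Per-task sampling (power of two choices): for each task, probe d
    random workers and enqueue the task at the one with the shortest queue."""
    for _ in range(num_tasks):
        probed = random.sample(range(len(queue_lens)), d)
        best = min(probed, key=lambda w: queue_lens[w])
        queue_lens[best] += 1
    return queue_lens

random.seed(0)
queues = place_per_task([0] * 1000, 100)  # one 100-task job on an idle cluster
print(max(queues))  # longest queue after placement
```

Probing two choices per task keeps the longest queues far shorter than uniformly random placement would.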
Simulated Results
100-task jobs in 10,000-node cluster, exp. task durations
70% cluster load
Response Time Grows with Tasks/Job!
Per-Task Sampling
[Diagram: Task 1 and Task 2 each probe two workers; probe responses (✓) come back per task]
Batch Sampling
Place m tasks on the least loaded of d·m slaves
4 probes (d = 2)
[Diagram: per-task vs. batch probing of workers]
Per-task versus Batch Sampling
Simulated Results: 100-task jobs in 10,000-node cluster, exp. task durations, 70% cluster load
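Batch sampling pools the probes: instead of d probes per task, the scheduler probes d·m workers once for an m-task job and takes the m shortest queues among them. A minimal sketch (the pre-loaded queue lengths are invented for illustration):

```python
import random

def place_batch(queue_lens, num_tasks, d=2):
    """Batch sampling: probe d*m workers once for an m-task job and place
    the m tasks on the m least-loaded probed workers."""
    probed = random.sample(range(len(queue_lens)), d * num_tasks)
    for w in sorted(probed, key=lambda w: queue_lens[w])[:num_tasks]:
        queue_lens[w] += 1
    return queue_lens

random.seed(0)
# Hypothetical pre-loaded 1000-worker cluster with random queue lengths.
queues = [random.randint(0, 4) for _ in range(1000)]
total_before = sum(queues)
place_batch(queues, 100)
```

Sharing one probe pool across the job avoids the per-task case where some task gets two unlucky probes, which is why response time no longer grows with tasks per job.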
Queue length is a poor predictor of wait time
[Diagram: two workers whose queues imply very different wait times (80 ms, 155 ms vs. 530 ms) despite similar lengths]
Poor performance on heterogeneous workloads
Late Binding
Place m tasks on the least loaded of d·m slaves
4 probes (d = 2)
[Diagram: probes place reservations in worker queues; when a reservation reaches the front of its queue, the worker requests a task from the scheduler]
Simulated Results
100-task jobs in 10,000-node cluster, exp. task durations
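Late binding changes what a probe means: rather than reporting a queue length, a probe leaves a placeholder reservation, and the worker asks the scheduler for a real task only when that reservation reaches the front of its queue. A simplified sketch of the mechanism (class and variable names are invented; the real system is RPC-based):

```python
import random
from collections import deque

class Scheduler:
    """Late binding sketch: probes enqueue reservations on workers; a worker
    requests the actual task only when a reservation reaches its queue front."""
    def __init__(self, tasks, workers, d=2):
        self.tasks = deque(tasks)
        # Probe d*m workers; each probe leaves a reservation (a reference
        # back to this scheduler) in the worker's queue.
        for worker in random.sample(workers, d * len(tasks)):
            worker.queue.append(self)

    def get_task(self):
        # Called by a worker; returns a task, or None if all tasks have
        # already been handed out (the late reservation is discarded).
        return self.tasks.popleft() if self.tasks else None

class Worker:
    def __init__(self):
        self.queue = deque()

    def run_next(self):
        # Pop reservations until one yields a real task or the queue empties.
        while self.queue:
            task = self.queue.popleft().get_task()
            if task is not None:
                return task
        return None

random.seed(1)
workers = [Worker() for _ in range(20)]          # hypothetical 20-worker cluster
sched = Scheduler(["t1", "t2", "t3"], workers)   # 3 tasks -> 6 reservations
launched = [t for w in workers for t in [w.run_next()] if t is not None]
print(launched)  # -> ['t1', 't2', 't3']: tasks go to the first workers to free up
```

Because tasks bind to workers only at execution time, they land on whichever probed workers actually free up first, sidestepping stale queue-length estimates.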
What about constraints?
Job Constraints
[Diagram: schedulers probe only workers that satisfy the job's constraint]
Restrict probed machines to those that satisfy the constraint
Per-Task Constraints
[Diagram: each task's probes go only to workers that satisfy that task's constraint]
Probe separately for each task
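Both constraint cases reduce to filtering the candidate pool before sampling: a job-level constraint filters once and still permits batch sampling, while per-task constraints force each task to filter and probe on its own. A hypothetical sketch (the "cached" constraint and all names are illustrative, not Sparrow's API):

```python
import random

def probe_set_for_job(workers, constraint, m, d=2):
    """Job-level constraint: filter once, then batch-sample d*m probes
    from the eligible workers for the whole m-task job."""
    eligible = [w for w in workers if constraint(w)]
    return random.sample(eligible, min(d * m, len(eligible)))

def probe_sets_per_task(workers, task_constraints, d=2):
    """Per-task constraints: each task filters and probes separately,
    so batch sampling no longer applies across the job."""
    probes = {}
    for task_id, constraint in task_constraints.items():
        eligible = [w for w in workers if constraint(w)]
        probes[task_id] = random.sample(eligible, min(d, len(eligible)))
    return probes

# Hypothetical constraint: the task's input data is cached on the worker.
random.seed(2)
workers = [{"id": i, "cached": i % 3 == 0} for i in range(30)]
probes = probe_sets_per_task(workers, {
    "t1": lambda w: w["cached"],   # constrained task
    "t2": lambda w: True,          # unconstrained task
})
```

The constrained task samples only among its eligible workers, so constraints cost nothing extra beyond shrinking the probe pool.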
Technique Recap
Batch sampling + Late binding + Constraint handling
[Diagram: schedulers and workers]
How does Sparrow perform on a real cluster?
Spark on Sparrow
Query: DAG of Stages
[Diagram: the Spark client submits each stage of the query DAG as a job to a Sparrow scheduler, which places its tasks on workers; later stages launch as their dependencies complete]
How does Sparrow compare to Spark’s native scheduler?
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: common benchmark for analytics workloads
Stack: Shark (SQL execution engine) on Spark (distributed in-memory analytics framework) on Sparrow (scheduler)
TPC-H Queries
100 16-core EC2 nodes, 10 schedulers, 80% load
[Box plot: response times at the 5th, 25th, 50th, 75th, and 95th percentiles]
TPC-H Queries
100 16-core EC2 nodes, 10 schedulers, 80% load
Within 12% of ideal; median queuing delay of 9 ms
Fault Tolerance
[Diagram: Spark Client 1's scheduler (Scheduler 1) fails (✗); the client fails over to Scheduler 2]
Timeout: 100 ms
Failover: 5 ms
Re-launch queries: 15 ms
When does Sparrow not work as well?
High cluster load
Related Work
Centralized task schedulers: e.g., Quincy
Two level schedulers: e.g., YARN, Mesos
Coarse-grained cluster schedulers: e.g., Omega
Load balancing: single task
Batch sampling + Late binding + Constraint handling
www.github.com/radlab/sparrow
Sparrow provides near-ideal job response times without global visibility
Backup Slides
Can we do better without losing simplicity?
Policy Enforcement
Priorities: serve queues based on strict priorities
[Diagram: slave with High Priority and Low Priority queues]
Fair shares: serve queues using weighted fair queuing
[Diagram: slave with queues for User A (75%) and User B (25%)]
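Because queues live on each slave, policy is enforced locally with no central coordination: a slave holding several queues picks which one to serve next. A minimal sketch of the strict-priority case (queue names and tasks are illustrative):

```python
from collections import deque

class SlaveQueues:
    """Per-slave queues served by strict priority: always drain the
    highest-priority non-empty queue first."""
    def __init__(self, priorities):
        self.order = priorities  # queue names, highest priority first
        self.queues = {name: deque() for name in priorities}

    def enqueue(self, name, task):
        self.queues[name].append(task)

    def next_task(self):
        for name in self.order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None

slave = SlaveQueues(["high", "low"])
slave.enqueue("low", "lo-1")
slave.enqueue("high", "hi-1")
print(slave.next_task())  # -> hi-1: high-priority queue is drained first
print(slave.next_task())  # -> lo-1
```

For fair shares, `next_task` would instead interleave dequeues across users' queues in proportion to their weights (e.g. 75% / 25%), i.e. weighted fair queuing.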