Post on 22-Jan-2016
1
Grid Scheduling
Cécile Germain-Renaud
2
Scheduling
• Job
– A computation to run on a machine
– Possibly with network access, e.g. input/output files (coarse grain) or communication with other jobs (the DAG model)
• Schedule
– s(J) = date to begin execution of task J
– Alloc(J) = machine assigned to J
• One of the oldest Computer Science problems
• Principles of classification:
[Graham et al. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math. 5, (1979), 287-326]
• Computer-aided classification of complexity results (4536 at the time of the paper) [Lageweg et al. Computer-aided complexity classification of combinatorial problems. CACM 25:11, 1982]
3
Classical scheduling in HPC
• Context: parallel computing/computers
• Application = Directed Acyclic Graph (T, E, w, c)
– T = set of sequential tasks
– E = dependence constraints
– w(t) = computational cost of task t
– c(t,t’) = communication cost (data sent from t to t’)
• Infrastructure
– P identical processors
– With or without preemption, dedicated (no sharing)
• An optimization problem with an objective function
– Makespan = total execution time S(T) = max over t in T of (s(t) + w(t))
• Complexity
– NP-complete for independent tasks and no communication (E = ∅, p = 2 and c = 0)
– NP-complete for UET-UCT graphs (w = c = 1)
– Very old: without communication, list scheduling provides a (2 - 1/p) approximation
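A minimal sketch of list scheduling for the simplest case named on this slide: independent tasks, no communication, p identical processors. The function name `list_schedule` and the sample durations are illustrative assumptions, not taken from the talk. Each task, taken in list order, goes to the processor that becomes free first; the code returns the start dates s(t), the allocation, and the makespan.

```python
import heapq

def list_schedule(durations, p):
    """Greedy list scheduling of independent tasks on p identical processors.

    durations: w(t) for each task, in list (priority) order.
    Returns (start, alloc, makespan).
    """
    # Heap of (time at which the processor becomes free, processor index).
    free_at = [(0, proc) for proc in range(p)]
    heapq.heapify(free_at)
    start, alloc = [], []
    for w in durations:
        t, proc = heapq.heappop(free_at)      # earliest available processor
        start.append(t)
        alloc.append(proc)
        heapq.heappush(free_at, (t + w, proc))
    # Makespan = max over tasks of s(t) + w(t); 0 for an empty task list.
    makespan = max((s + w for s, w in zip(start, durations)), default=0)
    return start, alloc, makespan

start, alloc, makespan = list_schedule([3, 3, 2, 2, 2], p=2)
print(start, alloc, makespan)  # [0, 0, 3, 3, 5] [0, 1, 0, 1, 0] 7
```

On this input the optimal makespan is 6 (3+3 on one processor, 2+2+2 on the other), so the greedy result of 7 stays within the (2 - 1/p) = 1.5 factor.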
4
Scheduling in Institutional Grids
• Institutional: federation of resources
– Accounted-for: fair-share on the medium to long time scale is a major constraint
– Partially autonomous local policies must be allowed
• Grid
– Permanent regime: on-line decisions
– Large scale: strongly distributed
• Information system
• Scheduling services
• Relevant contexts
– Autonomous, multi-agent systems
– Auction algorithms
– Service Level Agreement (SLA) technology
5
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: UI (several), Broker (several), Site (node), CE, Local scheduler, Proc]
6
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: UI (several), Broker (several), BDII, Site (node), CE, Local scheduler, Proc. Arrow: Publish]
7
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: UI (several), Broker, BDII, Site (node), CE, Local scheduler, Proc. Arrows: Publish, Query, Rank]
The information published is
– Static: e.g. which types of VO are accepted
– Dynamic: expected traversal time
8
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: UI (several), Broker, BDII, Site (node), CE, Local scheduler, Proc. Arrows: Publish, Query, Rank]
Rank: may be any user-defined function, e.g. avoid « bad » machines
Default is first locality, second expected traversal time
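The default ordering described here (locality first, expected traversal time second) can be sketched as a sort key. This is plain Python, not gLite's own job-description syntax; the `CE` record, its field names, and the `blacklist` parameter are hypothetical and stand in for whatever the information system actually publishes.

```python
from dataclasses import dataclass

@dataclass
class CE:
    name: str
    holds_input_data: bool           # hypothetical locality flag
    expected_traversal_time: float   # seconds, dynamic published value

def default_rank(ce):
    # Smaller keys are better: local CEs first, then shortest traversal time.
    return (not ce.holds_input_data, ce.expected_traversal_time)

def choose(ces, rank=default_rank, blacklist=frozenset()):
    """Pick the best CE under `rank`, skipping user-declared "bad" machines."""
    candidates = [ce for ce in ces if ce.name not in blacklist]
    return min(candidates, key=rank) if candidates else None

ces = [
    CE("ce-a", False, 60.0),
    CE("ce-b", True, 600.0),
    CE("ce-c", True, 300.0),
]
print(choose(ces).name)                      # ce-c
print(choose(ces, blacklist={"ce-c"}).name)  # ce-b
```

Passing a different `rank` function models the "any user-defined function" case; the blacklist is one simple way to avoid machines known to misbehave.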
9
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: UI (several), Broker, BDII, Site (node), CE, Local scheduler, Proc. Arrows: Publish, Query, Update. Caption: BDII broker cache]
10
Not only academic
[Figure: plot with axes "Execution time (s)" and "Overhead Ratio"]
• Long waiting times
• When EGEE was not so heavily loaded
11
Batch scheduling
• Very complex policies
• Maximise throughput under constraints
– Weighted fair-share: VOs, type of jobs
– Priorities
– Hardware requirements
– Advance reservations
• An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones
[B. Bode et al. The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]
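To make "weighted fair-share" concrete, here is a generic decayed-usage sketch. It is an assumption about one common way to compute such a priority, not the actual PBS or Maui formula: each VO has a target share, past usage is averaged with exponentially decreasing weights, and the priority is the gap between target and effective usage. The function name, the `decay` parameter, and the sample numbers are illustrative.

```python
def fair_share_priority(target_share, usage_history, decay=0.5):
    """Target share minus decayed average usage.

    usage_history: fraction of the cluster consumed per accounting window,
    most recent window first. Positive result = under-served VO.
    """
    weights = [decay ** k for k in range(len(usage_history))]
    total = sum(weights)
    if total == 0:
        return target_share  # no history yet
    effective = sum(w * u for w, u in zip(weights, usage_history)) / total
    return target_share - effective

# Two VOs with the same 50% target but opposite recent behavior.
heavy_recently = fair_share_priority(0.5, [0.8, 0.2])
light_recently = fair_share_priority(0.5, [0.2, 0.8])
print(heavy_recently < light_recently)  # True
```

The decay is what gives the medium-term memory: a VO that over-consumed in the latest window is penalized more than one that over-consumed earlier.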
12
Classical vs Grid
• (Relatively) easy:
– Throughput instead of makespan, plus a master-slave graph instead of a DAG: this makes it possible, for instance, to define cyclic schedules in polynomial time that are asymptotically optimal, but not local [Y. Robert] [A. Rosenberg]
• Moderately difficult: information about
– Applications: the same program on different data may run at very different speeds
– Infrastructures: the network performance is dynamic
• Really difficult
– Queues managed by local policies
– On-line decisions
13
Information and Scheduling (I)
• Considerable work has been done in predicting CPU load in shared environments (desktops, clusters, desktop grids) [P.A. Dinda, R. Wolski, J. Schopf]
– The basic technique is linear time-series analysis
– Self-similarity and epochal behavior
– Usual goal is the prediction of the next value
– Applied to soft real-time scheduling on shared clusters
– Practical application in NWS
$z_t = \mu + \dfrac{\theta(B)}{\phi(B)\,(1 - B)^{d}}\, a_t$
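As an illustration of next-value prediction with a linear time-series model, the sketch below fits the simplest case, an AR(1) model z_t - mu = phi (z_{t-1} - mu) + a_t, by least squares. This is a deliberately reduced stand-in chosen for brevity, not the models used in the cited work or in NWS; the function names and the sample load values are assumptions.

```python
def fit_ar1(series):
    """Least-squares fit of z_t - mu = phi * (z_{t-1} - mu) + a_t."""
    n = len(series)
    mu = sum(series) / n
    centered = [z - mu for z in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered[:-1])
    phi = num / den if den else 0.0   # constant series: fall back to the mean
    return mu, phi

def predict_next(series):
    """One-step-ahead forecast from the fitted AR(1) model."""
    mu, phi = fit_ar1(series)
    return mu + phi * (series[-1] - mu)

load = [0.9, 0.7, 0.8, 0.6, 0.7, 0.5]   # hypothetical CPU load samples
print(predict_next(load))
```

Higher-order or differenced models follow the same pattern: fit the coefficients on the history, then apply them to the most recent values.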
14
Information and scheduling (III)
• Less work on predicting the behavior of dedicated systems
• Papers are on parallel systems, mostly based on time-series techniques, but at least one based on a genetic algorithm [Downey, Foster, Wolski]
• The traces are much more difficult to access
• No time slice; irregular time series: the records are event-driven
• Which analysis?
– Average waiting time: clear but not very useful for prediction
– Fitting a distribution: not convincing for parallel systems
– Predicting an upper bound with a confidence interval: metric of success?
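One way to read the last option is sketched below, under the assumption that the bound is simply an empirical quantile of past waiting times. The success metric shown is coverage: the fraction of later jobs whose wait actually falls under the predicted bound. Function names and sample values are illustrative, and the cited papers may use different estimators.

```python
import math

def upper_bound(waits, quantile=0.95):
    """Empirical quantile of past waiting times, used as a predicted bound."""
    ordered = sorted(waits)
    k = math.ceil(quantile * len(ordered)) - 1
    return ordered[max(k, 0)]

def coverage(history, future, quantile=0.95):
    """Fraction of later waits under the bound: one possible success metric."""
    bound = upper_bound(history, quantile)
    return sum(w <= bound for w in future) / len(future)

history = [10, 40, 20, 30]   # hypothetical past waits, in seconds
future = [5, 15, 25, 35]     # hypothetical later waits
print(upper_bound(history, 0.5), coverage(history, future, 0.5))  # 20 0.5
```

A bound is well calibrated when the coverage stays close to the requested quantile; a bound that is always satisfied but far too large is correct yet useless, which is why the choice of metric is an open question.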
15
Information and grid
• We cannot directly log the entire state of the system
– Access rights
– Size
• Currently available data
– The lifecycle of jobs going through certain brokers
– The job ranking at the same brokers
– The detailed behavior of the queues on certain sites
– Certain = LAL + possibly other mainstream
• Easy to get
– Summary data about the lifecycle of all jobs
– From which it could be possible to reconstruct the detailed state and dynamics of the CE
16
What should we learn?
• Learning besides time series makes sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex numerical experiment
• Information as sketched before
– Beware: this will not be a steady-state system
• New users, new machines, new software is the expected regime for some years from now
• A community-based resource will tend to display correlated activity
– Is there an invariant social graph? Is it a feature?
• System algorithms, e.g. a site scheduler or the broker
– Validation?
• Scheduling algorithms
– Validation?