Grid Scheduling

transcript

Grid Scheduling

Cécile Germain-Renaud

Scheduling

• Job– A computation to run on a machine

– Possibly with network access e.g. input/output file (coarse grain) or communication with other jobs (the DAG model)

• Schedule– s(J) = date to begin execution of task J– Alloc(J) = machine assigned to J

• One of the oldest Computer Science problems

• Principles of classification:

[Graham et al. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math. 5, (1979), 287-326]

• Computer-aided classification of complexity results (4536 at the time of the paper) [Lageweg et al. Computer-Aided complexity classification of combinational problems. CACM 11:2, 1892]

Classical scheduling in HPC

• Context: parallel computing/computers• Application = Direct Acyclic Graph (T, E, w, c)

– T = set of sequential tasks– E = dependence constraints– w(t) = computational cost of task t– c(t,t’) = communication cost (data sent from t to t’)

• Infrastructure– P identical processors– With or without preemption, dedicated (no sharing)

• An optimization problem with objective functionMakespan = Total execution time S(T) = max (s(t) + w(t))

• Complexity– NP-complete for independant tasks and no communication E = vide, p =2 and c= 0

– NP-complete for UET-UCT graphs (w = c = 11)– Very old: without communication, list scheduling provides a (2-1/p)

approximation

Scheduling in Institutional Grids

• Institutional: federation of ressources– accounted-for: fair-share on the medium to long time scale is a

premium constraint– Partially autonomous local policies must be allowed

• Grid– Permanent regime: on-line decisions– Large scale: strongly distributed

• Information system • Scheduling services

• Relevant contexts– Autonomous, multi-agents systems– Auction algorithms– Service Level Agreement (SLA) technology

EGEE gLite Scheduling

BrokerUI

Localscheduler

Site (node)

Broker

BrokerUI

Localscheduler

Site (node)

Broker

Publish

Localscheduler

Site (node)

Broker

Publish

The information published isStatic: eg which type of VO is acceptedDynamic: expected traversal time

Localscheduler

Site (node)

Broker

Publish

Rank: may be any user-defined function, e.g. avoid « bad » machinesDefault is first locality, second expected traversal time

Localscheduler

Site (node)

Broker

Publish

Update

BDII broker cache

Not only academic

Execution time (s)

Overhead Ratio

• Long waiting times

• When EGEE was not so heavily loaded

Batch scheduling

• Very complex policies• Maximise throughput under constraints

– Weighted fair-share – VOs, type of jobs– Priorities– Hardware requirements– Advance reservations

• An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones

[B. Bode et al.The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]

Classical vs Grid

• (Relatively) easy:– Throughput instead of makespan + Master-slave graph instead

of DAG allow for instance to define cyclic schedules in polynomial time which are asymptotically optimal, but not local[Y. Robert] [A. Rosenberg]

• Moderately difficult: information about– Applications– Infrastructures

• The same program on different data may run at very different speed• The network performance is dynamic

• Really difficult– Queues managed by local policies– On-line decision

Information and Scheduling (I)

• Considerable work has been done in predicting CPU load in shared environments – desktops, clusters, desktop grids [P.A. Dinda, R. Wolski, J. Schopf]– The basic technique is linear time-series analysis

– Self-similarity and epochal behavior– Usual goal is the prediction of the next value– Applied to soft real-time scheduling on shared clusters– Practical application in NWS

zt = + (B)

(B)(1 – B)dat

Information and scheduling (III)

• Less work on predicting the behavior of dedicated systems

• Papers are on parallel systems, mostly based on time-series techniques, but at least one based on a genetic algorithm [Downey, Foster, Wolski]

• The traces are much more difficult to access• No time slice - Irregular time series: the records are

event-driven• Which analysis

– Average waiting time: clear but not very useful for prediction– Fitting a distribution: not convincing for // systems– Predicting an upper bound with a confidence interval: metric of

success?

Information and grid

• We cannot directly log the entire state of the system– Access rights– Size

• Currently available data– The lifecycle of jobs going through certain brokers– The job ranking at the same brokers– The detailed behavior of the queues on certain sites– Certain = LAL + possibly other mainstream

• Easy to get– Summary data about the lifecycle of all jobs – From which it could be possible to reconstruct the detailed state

and dynamic of the CE

What should we learn ?

• Learning besides time series make sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex digital experiment

• Information as sketched before– Beware: not be a steady-state system

• New users, new machines, new software is the expected regime for some years from now

• A community-based resource will tend display correlated activity

– Is there an invariant social graph? Is it a feature?

• System algorithms e.g. a site scheduler or the broker – Validation ?

• Scheduling algorithms– Validation ?

Grid Scheduling

Documents