Low-Cost Task Scheduling for Distributed-Memory Machines
Andrei Radulescu and Arjan J.C. van Gemund
Presented by Bahadır Kaan Özütam
Outline
- Introduction
- List Scheduling
- Preliminaries
- General Framework for LSSP
- Complexity Analysis
- Case Study
- Extensions for LSDP
- Conclusion
Introduction
Task scheduling heuristics can be classified along several axes:
- Shared-memory vs. distributed-memory
- Bounded vs. unbounded number of processors
- Multistep vs. single-step methods
- Duplicating vs. non-duplicating methods
- Static vs. dynamic priorities
List Scheduling
Two algorithm families: LSSP and LSDP.
LSSP (List Scheduling with Static Priorities):
- Tasks are scheduled in the order of their previously computed priorities, each on the task's "best" processor.
- The best processor is the one enabling the earliest start time if performance is the main concern, or the one becoming idle the earliest if speed is the main concern.
LSDP (List Scheduling with Dynamic Priorities):
- Priorities are defined for task-processor pairs, which is more complex.
List Scheduling
Reducing the LSSP time complexity from O(V log V + (E + V) P) to O(V log P + E), where
- V = number of tasks
- E = number of dependencies
- P = number of processors
by:
1. Considering only two candidate processors per task
2. Maintaining a partially sorted task priority queue holding a limited number of tasks
Preliminaries
- A parallel program is modeled as a DAG G = (V, E)
- Computation cost Tw(t)
- Communication cost Tc(t, t')
- Communication-to-computation ratio (CCR)
- Task graph width W

(Figure: example task graph with task nodes V connected by dependence edges E)
Preliminaries
- Entry and exit tasks
- The bottom level Tb(t) of a task
- A task is ready when all its parents are scheduled
- Start time Ts(t), finish time Tf(t)
- Partial schedule
- Processor ready time: Tr(p) = max { Tf(t) : t ∈ V, Pr(t) = p }
- Processor becoming idle the earliest (pr): Tr(pr) = min { Tr(p) : p ∈ P }
Preliminaries
- The last message arrival time: Tm(t) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E }
- The enabling processor pe(t): the processor from which the last message arrives
- Effective message arrival time: Te(t, p) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E, Pr(t') ≠ p }
- The start time of a ready task, once scheduled: Ts(t, p) = max { Te(t, p), Tr(p) }
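The definitions above can be sketched directly. This is a minimal illustration, assuming a dictionary-based representation that is not in the slides: `finish` holds Tf, `comm` holds Tc, `placed_on` holds Pr, and `ready` holds Tr.

```python
def effective_message_arrival(t, p, preds, finish, comm, placed_on):
    """Te(t, p): latest arrival time among messages from parents of t
    that are not placed on processor p (local messages cost nothing)."""
    return max((finish[u] + comm[(u, t)]
                for u in preds.get(t, []) if placed_on[u] != p),
               default=0)

def start_time(t, p, preds, finish, comm, placed_on, ready):
    """Ts(t, p) = max(Te(t, p), Tr(p)): the task can start only after
    its remote data has arrived and processor p has become idle."""
    te = effective_message_arrival(t, p, preds, finish, comm, placed_on)
    return max(te, ready[p])
```

Note that a parent on the same processor p is excluded from Te(t, p), which is exactly why scheduling a task on its enabling processor can lower its start time.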
General Framework for LSSP
The general LSSP algorithm has three parts:
- Task priority computation: O(E + V)
- Task selection: O(V log W)
- Processor selection: O((E + V) P)
General Framework for LSSP
Processor selection considers only two candidates:
1. The enabling processor pe(t)
2. The processor becoming idle first, pr
with Ts(t, p) = max { Te(t, p), Tr(p) }
General Framework for LSSP
Lemma 1. For any p ≠ pe(t): Te(t, p) = Tm(t)
Theorem 1. For a ready task t, one of the processors p ∈ {pe(t), pr} satisfies
Ts(t, p) = min { Ts(t, px) : px ∈ P }
This reduces processor selection from O((E + V) P) to O(V log P + E):
O(E + V) to traverse the task graph, plus O(V log P) to keep the processors sorted by ready time.
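Theorem 1 means only two start times ever need to be compared. A sketch of this two-candidate selection (parameter names are assumptions; `ts` is any function computing Ts(t, p), e.g. built from the earlier definitions):

```python
def select_processor(t, enabling_proc, ready, ts):
    """Per Theorem 1, the globally best processor for a ready task t is
    either the enabling processor pe(t) or the processor pr becoming
    idle first, so only those two start times are compared."""
    pr = min(ready, key=ready.get)          # processor becoming idle first
    candidates = {enabling_proc, pr}
    return min(candidates, key=lambda p: ts(t, p))
```

Finding pr is shown here as a linear scan for clarity; keeping the processors in a heap ordered by Tr(p) gives the O(log P) per-task cost quoted above.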
General Framework for LSSP
Task selection: the O(V log W) cost can be reduced by sorting only some of the tasks.
The task priority queue becomes:
1. A sorted list of size H
2. A FIFO list, with O(1) operations
This decreases the cost to O(V log H).
H needs to be adjusted; H = P is optimal, giving O(V log P).
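The hybrid queue can be sketched as follows. This is an illustrative sketch, not the paper's code: the class name and internal layout (a min-heap of negated priorities for the sorted part, a deque for the FIFO overflow) are assumptions.

```python
import heapq
from collections import deque

class PartialPriorityQueue:
    """A sorted part of at most h tasks plus a FIFO for the overflow.
    Only the sorted part pays the O(log h) cost per operation."""
    def __init__(self, h):
        self.h = h
        self.sorted_part = []      # min-heap of (-priority, task)
        self.overflow = deque()    # FIFO of (priority, task)

    def push(self, priority, task):
        if len(self.sorted_part) < self.h:
            heapq.heappush(self.sorted_part, (-priority, task))
        else:
            self.overflow.append((priority, task))

    def pop(self):
        """Return the highest-priority task in the sorted part, then
        refill the sorted part from the FIFO in arrival order."""
        _, task = heapq.heappop(self.sorted_part)
        if self.overflow:
            p, u = self.overflow.popleft()
            heapq.heappush(self.sorted_part, (-p, u))
        return task
```

The point of the structure is that it only approximates a fully sorted queue: a high-priority task sitting in the FIFO may be popped later than a lower-priority task already in the sorted part. That loss of strict order is the trade for the lower O(log H) cost.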
Complexity Analysis
- Computing task priorities: O(E + V)
- Task selection: O(V log W) fully sorted; O(V log H) for a partially sorted priority queue; O(V log P) for a queue of size P
- Processor selection: O(E + V) to traverse the task graph, plus O(V log P)
Total complexity:
- O(V (log W + log P) + E) fully sorted
- O(V log P + E) partially sorted
Case Study
MCP (Modified Critical Path): the task with the highest bottom level has the highest priority.
FCP (Fast Critical Path) example: 3 processors, a partially sorted priority queue of size 2, tasks t0 … t7.

(Figure: example task graph; each node is labeled t / Tw(t), with t0, t1, t2, t3, t6, t7 of cost 2 and t4, t5 of cost 3; edges are annotated with communication costs.)
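MCP's priority, the bottom level, can be computed in one backward pass over the graph. A sketch with assumed names (`succs` for the children lists, `w` for Tw, `comm` for Tc), using the common definition in which the bottom level includes communication costs along the path; variants omit the communication term.

```python
def bottom_levels(tasks, succs, w, comm):
    """Tb(t) = Tw(t) + max over children t' of (Tc(t, t') + Tb(t')),
    i.e. the length of the longest remaining path to an exit task."""
    tb = {}
    def visit(t):
        if t not in tb:
            tb[t] = w[t] + max((comm[(t, u)] + visit(u)
                                for u in succs.get(t, [])), default=0)
        return tb[t]
    for t in tasks:
        visit(t)
    return tb
```

Memoizing in `tb` makes each task and edge be visited once, matching the O(E + V) priority-computation cost quoted earlier.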
Case Study
(Same example task graph as on the previous slide.)

FCP trace: the ready-task queue (sorted part | FIFO, priorities in brackets) and the scheduling decision at each step:

sorted            FIFO             t    t -> p [Ts - Tf]
t0 [15]           -                t0   t0 -> p0 [0 - 2]
t1 [11]  t2 [9]   t3 [12]          t1   t1 -> p0 [2 - 4]
t3 [12]  t2 [9]   t4 [6]  t5 [8]   t3   t3 -> p1 [3 - 6]
t2 [9]   t4 [6]   t5 [8]           t2   t2 -> p0 [4 - 6]
t5 [8]   t4 [6]   t6 [6]           t5   t5 -> p2 [6 - 9]
t4 [6]   t6 [6]   -                t4   t4 -> p0 [6 - 9]
t6 [6]            -                t6   t6 -> p1 [7 - 9]
t7 [2]            -                t7   t7 -> p2 [11 - 13]
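The whole FCP-style loop driving such a trace can be sketched by combining the partially sorted queue with the two-candidate processor selection. All names below are assumptions, and this is an illustrative sketch of the scheme, not the paper's implementation.

```python
import heapq
from collections import deque

def fcp_schedule(tasks, preds, succs, w, comm, prio, n_procs, h):
    """List scheduling with a partially sorted ready queue (sorted part
    of size h plus a FIFO) and processor selection restricted to the
    enabling processor pe(t) and the earliest-idle processor pr."""
    ready_time = [0] * n_procs                     # Tr(p)
    finish, placed = {}, {}                        # Tf(t), Pr(t)
    enabling = {}                                  # pe(t)
    waiting = {t: len(preds.get(t, [])) for t in tasks}
    sorted_part, fifo = [], deque()

    def push(t):
        if len(sorted_part) < h:
            heapq.heappush(sorted_part, (-prio[t], t))
        else:
            fifo.append(t)

    def pop():
        _, t = heapq.heappop(sorted_part)
        if fifo:                                   # refill from the FIFO
            u = fifo.popleft()
            heapq.heappush(sorted_part, (-prio[u], u))
        return t

    def ts(t, p):                                  # Ts(t, p)
        te = max((finish[u] + comm[(u, t)]
                  for u in preds.get(t, []) if placed[u] != p), default=0)
        return max(te, ready_time[p])

    for t in tasks:
        if waiting[t] == 0:
            push(t)

    schedule = []
    while sorted_part:
        t = pop()
        pr = min(range(n_procs), key=lambda p: ready_time[p])
        candidates = {pr} | ({enabling[t]} if t in enabling else set())
        p = min(candidates, key=lambda q: ts(t, q))
        start = ts(t, p)
        finish[t], placed[t] = start + w[t], p
        ready_time[p] = finish[t]
        schedule.append((t, p, start, finish[t]))
        for u in succs.get(t, []):
            waiting[u] -= 1
            if waiting[u] == 0:
                # pe(u): processor of the parent whose message arrives last
                enabling[u] = placed[max(preds[u],
                                         key=lambda v: finish[v] + comm[(v, u)])]
                push(u)
    return schedule
```

Both per-task costs quoted earlier are visible here: the queue operations cost O(log h), and the processor choice compares at most two candidates.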
Extensions for LSDP
Extending the approach to dynamic priorities:
- ETF: the ready task that starts the earliest
- ERT: the ready task that finishes the earliest
- DLS: the task-processor pair with the highest dynamic level
General formula: π(t, p) = δ(t) + max { Te(t, p), Tr(p) }, where
- ETF: δ(t) = 0
- ERT: δ(t) = Tw(t)
- DLS: δ(t) = -Tb(t)
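The general formula makes the three heuristics differ only in the δ term, which a short sketch makes concrete (`tw` and `tb` are assumed lookup tables for Tw and Tb; the pair with the minimum value is selected, so DLS's maximization of the dynamic level appears here as a negated term):

```python
def dynamic_priority(delta, t, p, te, tr):
    """pi(t, p) = delta(t) + max(Te(t, p), Tr(p)); the task-processor
    pair minimizing this value is scheduled next."""
    return delta(t) + max(te(t, p), tr(p))

# Assumed example cost tables for a single task 't':
tw = {'t': 3}
tb = {'t': 10}

etf = lambda t: 0          # ETF: minimizes the start time
ert = lambda t: tw[t]      # ERT: minimizes the finish time
dls = lambda t: -tb[t]     # DLS: maximizes the dynamic level Tb - start
```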
Extensions for LSDP
- EP case: on each processor, the tasks are sorted; the processors are sorted as well.
- Non-EP case: consider the processor becoming idle first; if that processor turns out to be the enabling processor, it reduces to the EP case.
Extensions for LSDP
- 3 tries per selection: 1 for the EP case, 2 for the non-EP case
- Task priority queues maintained: P for the EP case, 2 for the non-EP case
- Each task is added to 3 queues: 1 for the EP case, 2 for the non-EP case
- Processor queues: 1 for the EP case, 1 for the non-EP case
Complexity
- Originally: O(W (E + V) P)
- Now: O(V (log W + log P) + E)
- This can be further reduced using a partially sorted priority queue; a size of P is required to maintain comparable performance, giving O(V log P + E)
Conclusion
LSSP can be performed at a significantly lower cost:
- Processor selection considers only two processors: the enabling processor and the processor becoming idle first.
- Task selection sorts only a limited number of tasks.
- Using the extension of this method, LSDP complexity can also be reduced.
- For large program and processor dimensions, this yields a superior cost-performance trade-off.
Thank You
Questions?