Date post: 11-Jan-2016

Low-Cost Task Scheduling for Distributed-Memory Machines

Andrei Radulescu and Arjan J.C. Van Gemund

Presented by Bahadır Kaan Özütam

Outline

- Introduction
- List Scheduling
- Preliminaries
- General Framework for LSSP
- Complexity Analysis
- Case Study
- Extensions for LSDP
- Conclusion

Introduction

- Task scheduling
- Scheduling heuristics
  - Shared-memory vs. distributed-memory
  - Bounded vs. unbounded number of processors
  - Multistep vs. single-step methods
  - Duplicating vs. nonduplicating methods
  - Static vs. dynamic priorities

List Scheduling

- LSDP and LSSP algorithms
- LSSP (List Scheduling with Static Priorities)
  - Tasks are scheduled in the order of their previously computed priorities, each on its "best" processor.
  - The best processor is:
    - the processor enabling the earliest start time, if performance is the main concern;
    - the processor becoming idle the earliest, if scheduling speed is the main concern.
- LSDP (List Scheduling with Dynamic Priorities)
  - Priorities are computed for task-processor pairs, which is more complex.

List Scheduling

Reducing LSSP time complexity:

$O(V \log V + (E + V) P) \Rightarrow O(V \log P + E)$

- V = expected number of tasks
- E = expected number of dependencies
- P = number of processors

The reduction comes from:

1. Considering only two processors
2. Maintaining a partially sorted task priority queue with a limited number of tasks

Preliminaries

- Parallel programs are modeled as a directed acyclic graph (DAG) $G = (V, E)$
- Computation cost $T_w(t)$
- Communication cost $T_c(t, t')$
- Communication-to-computation ratio (CCR)
- Task graph width ($W$)

[Figure: example task graph with nodes labeled V and edges labeled E]

Preliminaries

- Entry and exit tasks
- Bottom level $T_b(t)$ of a task
- Ready task: all of its parents have been scheduled
- Start time $T_s(t)$
- Finish time $T_f(t)$
- Partial schedule
- Processor ready time, where $Pr(t)$ is the processor on which $t$ is scheduled:
  $T_r(p) = \max_{t \in V,\ Pr(t) = p} T_f(t)$
- Processor becoming idle the earliest ($p_r$):
  $T_r(p_r) = \min_{p \in P} T_r(p)$
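
The slide names the bottom level without defining it. A minimal sketch follows, assuming the usual definition: the length of the longest path from a task to an exit task, counting both computation and communication costs. The four-task graph, the function names, and the dictionary layout are illustrative assumptions, not taken from the presentation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def bottom_levels(work, comm):
    """Return Tb(t) for every task (assumed definition: longest path to an exit task).

    work: task -> computation cost Tw(t)
    comm: (t, t') -> communication cost Tc(t, t'), one entry per edge
    """
    succs = {t: [] for t in work}
    for (t, child) in comm:
        succs[t].append(child)
    # TopologicalSorter takes node -> predecessors. Feeding it the successors
    # instead yields every task after all of its children, so each edge is
    # visited once and the whole pass costs O(E + V).
    tb = {}
    for t in TopologicalSorter(succs).static_order():
        tb[t] = work[t] + max((comm[(t, c)] + tb[c] for c in succs[t]), default=0)
    return tb

def earliest_idle_processor(ready_time):
    """pr: the processor with the smallest ready time Tr(p)."""
    return min(ready_time, key=ready_time.get)

# Hypothetical task graph: a -> b, a -> c, b -> d, c -> d
work = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}
tb = bottom_levels(work, comm)
print(tb["a"], tb["b"], tb["c"], tb["d"])               # 10 7 4 2
print(earliest_idle_processor({"p0": 5, "p1": 3, "p2": 4}))  # p1
```
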

Preliminaries

- Last message arrival time:
  $T_m(t) = \max_{(t', t) \in E} \{ T_f(t') + T_c(t', t) \}$
- Enabling processor $p_e(t)$: the processor from which the last message arrives
- Effective message arrival time:
  $T_e(t, p) = \max_{(t', t) \in E,\ Pr(t') \neq p} \{ T_f(t') + T_c(t', t) \}$
- Start time of a ready task $t$ once it is scheduled on processor $p$:
  $T_s(t, p) = \max \{ T_e(t, p), T_r(p) \}$
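
The four definitions above translate almost line for line into code. The sketch below is one possible reading; the `Schedule` container, the function names, and the three-task example are assumptions made for illustration. Entry tasks, which have no incoming messages, are given an arrival time of 0.

```python
from dataclasses import dataclass, field

@dataclass
class Schedule:
    """Partial schedule: where each scheduled task runs and when it finishes."""
    proc: dict = field(default_factory=dict)    # task -> processor, Pr(t)
    finish: dict = field(default_factory=dict)  # task -> finish time, Tf(t)

def last_message_arrival(t, preds, comm, s):
    """Tm(t): latest arrival over all incoming edges."""
    return max((s.finish[u] + comm[(u, t)] for u in preds[t]), default=0)

def enabling_processor(t, preds, comm, s):
    """pe(t): processor of the parent whose message arrives last."""
    if not preds[t]:
        return None
    last = max(preds[t], key=lambda u: s.finish[u] + comm[(u, t)])
    return s.proc[last]

def effective_arrival(t, p, preds, comm, s):
    """Te(t, p): like Tm(t), but parents already placed on p send no message."""
    return max((s.finish[u] + comm[(u, t)] for u in preds[t] if s.proc[u] != p),
               default=0)

def processor_ready(p, s):
    """Tr(p): finish time of the last task scheduled on p."""
    return max((f for task, f in s.finish.items() if s.proc[task] == p), default=0)

def start_time(t, p, preds, comm, s):
    """Ts(t, p) = max{Te(t, p), Tr(p)}."""
    return max(effective_arrival(t, p, preds, comm, s), processor_ready(p, s))

# Hypothetical state: a ran on p0 until time 2, b ran on p1 until time 3,
# and c depends on both.
s = Schedule(proc={"a": "p0", "b": "p1"}, finish={"a": 2, "b": 3})
preds = {"c": ["a", "b"]}
comm = {("a", "c"): 4, ("b", "c"): 1}

print(last_message_arrival("c", preds, comm, s))   # 6, the message from a
print(enabling_processor("c", preds, comm, s))     # p0
print([start_time("c", p, preds, comm, s) for p in ("p0", "p1", "p2")])  # [4, 6, 6]
```

Placing `c` on its enabling processor removes the most expensive message, which is why `p0` gives the earliest start here even though `p2` is completely idle.
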

General Framework for LSSP

General LSSP algorithm:

- Task priority computation: $O(E + V)$
- Task selection: $O(V \log W)$
- Processor selection: $O((E + V) P)$
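
The three steps can be sketched as a baseline list scheduler. This is an illustrative version that assumes bottom levels as the static priority, a fully sorted ready queue, and an exhaustive search over all processors, which is the expensive variant the rest of the talk improves on. All names and the example graph are hypothetical.

```python
import heapq

def lssp_schedule(work, comm, num_procs):
    """Baseline LSSP sketch. work: task -> Tw(t); comm: (t, t') -> Tc(t, t')."""
    preds = {t: [] for t in work}
    succs = {t: [] for t in work}
    for (u, v) in comm:
        preds[v].append(u)
        succs[u].append(v)
    index = {t: i for i, t in enumerate(work)}  # tie-breaker for equal priorities

    # Step 1: static priorities (bottom levels, computed recursively for brevity).
    bottom = {}
    def tb(t):
        if t not in bottom:
            bottom[t] = work[t] + max((comm[(t, v)] + tb(v) for v in succs[t]), default=0)
        return bottom[t]
    for t in work:
        tb(t)

    ready_time = [0] * num_procs                # Tr(p)
    proc, start, finish = {}, {}, {}
    waiting = {t: len(preds[t]) for t in work}  # parents not yet scheduled
    ready = [(-bottom[t], index[t], t) for t in work if not preds[t]]
    heapq.heapify(ready)

    while ready:
        # Step 2: task selection, highest priority first.
        _, _, t = heapq.heappop(ready)

        # Step 3: processor selection, trying every processor.
        def ts(p):
            te = max((finish[u] + comm[(u, t)] for u in preds[t] if proc[u] != p), default=0)
            return max(te, ready_time[p])
        p = min(range(num_procs), key=ts)

        proc[t], start[t] = p, ts(p)
        finish[t] = start[t] + work[t]
        ready_time[p] = finish[t]
        for v in succs[t]:
            waiting[v] -= 1
            if waiting[v] == 0:                 # all parents scheduled: v is ready
                heapq.heappush(ready, (-bottom[v], index[v], v))
    return proc, start, finish

# Hypothetical task graph: a -> b, a -> c, b -> d, c -> d
work = {"a": 2, "b": 3, "c": 1, "d": 2}
comm = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}
proc, start, finish = lssp_schedule(work, comm, num_procs=2)
print(proc, finish["d"])  # every task lands on processor 0; d finishes at time 8
```

In this small example communication is expensive relative to computation, so keeping every task on one processor beats using the second one.
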

General Framework for LSSP

Processor selection: select a processor from two candidates

1. The enabling processor
2. The processor becoming idle first

$T_s(t, p) = \max \{ T_e(t, p), T_r(p) \}$

General Framework for LSSP

Lemma 1. $\forall p \neq p_e(t): T_e(t, p) = T_m(t)$

Theorem 1. If $t$ is a ready task, one of the processors $p \in \{ p_e(t), p_r \}$ satisfies

$T_s(t, p) = \min_{p_x \in P} T_s(t, p_x)$

Processor selection complexity: $O((E + V) P) \Rightarrow O(V \log P + E)$

- $O(E + V)$ to traverse the task graph
- $O(V \log P)$ to maintain the processors sorted
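
Theorem 1 can be checked empirically. The sketch below compares an exhaustive search over all processors with a search restricted to the two candidates, on randomly generated partial schedules. The generator, its parameters, and the function names are assumptions for illustration; the linear scan for $p_r$ stands in for the sorted processor list mentioned on the slide.

```python
import random

def start_time(t, p, preds, comm, proc, finish, ready):
    """Ts(t, p) = max{Te(t, p), Tr(p)}."""
    te = max((finish[u] + comm[(u, t)] for u in preds[t] if proc[u] != p), default=0)
    return max(te, ready[p])

def best_exhaustive(t, preds, comm, proc, finish, ready):
    """Try every processor: P start-time evaluations per task."""
    return min(ready, key=lambda p: start_time(t, p, preds, comm, proc, finish, ready))

def best_two_candidates(t, preds, comm, proc, finish, ready):
    """Try only the earliest-idle processor pr and the enabling processor pe(t)."""
    candidates = [min(ready, key=ready.get)]  # pr (a heap would avoid this scan)
    if preds[t]:
        last = max(preds[t], key=lambda u: finish[u] + comm[(u, t)])
        candidates.append(proc[last])         # pe(t)
    return min(candidates, key=lambda p: start_time(t, p, preds, comm, proc, finish, ready))

def random_case(rng, num_procs=5, num_parents=4):
    """Hypothetical partial schedule: one ready task 't' with already-placed parents."""
    parents = [f"u{i}" for i in range(num_parents)]
    proc = {u: rng.randrange(num_procs) for u in parents}
    finish = {u: rng.randint(1, 20) for u in parents}
    comm = {(u, "t"): rng.randint(0, 10) for u in parents}
    ready = {p: 0 for p in range(num_procs)}
    for u in parents:
        ready[proc[u]] = max(ready[proc[u]], finish[u])
    return {"t": parents}, comm, proc, finish, ready

rng = random.Random(0)
mismatches = 0
for _ in range(1000):
    case = random_case(rng)
    full = start_time("t", best_exhaustive("t", *case), *case)
    fast = start_time("t", best_two_candidates("t", *case), *case)
    mismatches += full != fast
print(mismatches)  # 0: the two-candidate search never starts the task later
```

The two searches may pick different processors when several give the same start time, so the comparison is on start times rather than on processor identity.
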

General Framework for LSSP

Task selection

- $O(V \log W)$ can be reduced by sorting only some of the tasks.
- The task priority queue is split into:
  1. A sorted list of size $H$
  2. A FIFO list ($O(1)$ per operation)
- Task selection cost decreases to $O(V \log H)$.
- $H$ needs to be adjusted; $H = P$ is optimal ($O(V \log P)$).
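
One way to realize such a queue is sketched below. It assumes that a newly ready task enters the sorted part while there is room and the FIFO otherwise, that tasks are always removed from the sorted part, and that each removal pulls the oldest FIFO entry into the sorted part. The class name and the heap-based sorted part are illustrative choices, not details given on the slide.

```python
import heapq
from collections import deque

class PartiallySortedQueue:
    """Ready-task queue: a sorted part of at most H entries plus a FIFO overflow."""

    def __init__(self, capacity):
        self.capacity = capacity   # H
        self._sorted = []          # heap ordered by descending priority
        self._fifo = deque()
        self._seq = 0              # insertion counter, breaks priority ties

    def push(self, task, priority):
        entry = (-priority, self._seq, task)
        self._seq += 1
        if len(self._sorted) < self.capacity:
            heapq.heappush(self._sorted, entry)   # O(log H)
        else:
            self._fifo.append(entry)              # O(1)

    def pop(self):
        """Remove the best task of the sorted part, then refill from the FIFO."""
        _, _, task = heapq.heappop(self._sorted)
        if self._fifo:
            heapq.heappush(self._sorted, self._fifo.popleft())
        return task

    def __len__(self):
        return len(self._sorted) + len(self._fifo)

# Hypothetical tasks with priorities; H = 2.
q = PartiallySortedQueue(capacity=2)
for task, priority in [("a", 5), ("b", 3), ("c", 9)]:
    q.push(task, priority)
print([q.pop() for _ in range(len(q))])  # ['a', 'c', 'b']
```

Task `c` has the highest priority but arrives when the sorted part is full, so `a` is selected first. That loss of ordering is the price paid for the cheaper queue, and it is why the choice of $H$ matters.
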

Complexity Analysis

- Computing task priorities: $O(E + V)$
- Task selection:
  - $O(V \log W)$ for a fully sorted priority queue
  - $O(V \log H)$ for a partially sorted priority queue
  - $O(V \log P)$ for a queue of size $P$
- Processor selection: $O(E + V) + O(V \log P)$
- Total complexity:
  - $O(V (\log W + \log P) + E)$, fully sorted
  - $O(V \log P + E)$, partially sorted
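
The totals follow from adding the three per-step costs listed above. Written out as a worked sum, with the partially sorted case taken at queue size $H = P$:

```latex
\begin{align*}
T_{\text{fully sorted}}
  &= \underbrace{O(E + V)}_{\text{priorities}}
   + \underbrace{O(V \log W)}_{\text{task selection}}
   + \underbrace{O(E + V) + O(V \log P)}_{\text{processor selection}} \\
  &= O\bigl(V (\log W + \log P) + E\bigr) \\[4pt]
T_{\text{partially sorted},\ H = P}
  &= O(E + V) + O(V \log P) + O(E + V) + O(V \log P) \\
  &= O(V \log P + E)
\end{align*}
```

The linear $V$ term is absorbed by $V \log P$ whenever $P \geq 2$.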

Case Study

- MCP (Modified Critical Path): the task having the highest bottom level has the highest priority
- FCP (Fast Critical Path)
  - 3 processors
  - Partially sorted priority queue of size 2
  - 8 tasks (t0 to t7)

[Figure: example task graph. Nodes are labeled task / computation cost: t0 / 2, t1 / 2, t2 / 2, t3 / 2, t4 / 3, t5 / 3, t6 / 2, t7 / 2. Edge labels are communication costs.]

Case Study

Ready tasks (priorities in brackets) and scheduling decisions:

| Ready tasks: sorted | Ready tasks: FIFO | Selected t | t -> p [Ts - Tf]   |
|---------------------|-------------------|------------|--------------------|
| t0 [15]             | -                 | t0         | t0 -> p0 [0 - 2]   |
| t1 [11], t2 [9]     | t3 [12]           | t1         | t1 -> p0 [2 - 4]   |
| t3 [12], t2 [9]     | t4 [6], t5 [8]    | t3         | t3 -> p1 [3 - 6]   |
| t2 [9], t4 [6]      | t5 [8]            | t2         | t2 -> p0 [4 - 6]   |
| t5 [8], t4 [6]      | t6 [6]            | t5         | t5 -> p2 [6 - 9]   |
| t4 [6], t6 [6]      | -                 | t4         | t4 -> p0 [6 - 9]   |
| t6 [6]              | -                 | t6         | t6 -> p1 [7 - 9]   |
| t7 [2]              | -                 | t7         | t7 -> p2 [11 - 13] |
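
The task selection order in this trace can be replayed with a few lines of code. The sketch below takes the priorities from the bracketed values in the trace. The `released_by` mapping, which says which tasks become ready after each scheduling step, is read off the ready lists in the trace rather than from the task graph, so it is an inference. Processor assignment is not modeled, and the function name is hypothetical.

```python
import heapq
from collections import deque
from itertools import count

def selection_order(priority, released_by, entry_tasks, capacity):
    """Order in which tasks leave a sorted-list-plus-FIFO ready queue of size `capacity`."""
    sorted_part, fifo, order, tick = [], deque(), [], count()

    def add(task):
        item = (-priority[task], next(tick), task)  # earlier arrival wins ties
        if len(sorted_part) < capacity:
            heapq.heappush(sorted_part, item)
        else:
            fifo.append(item)

    for task in entry_tasks:
        add(task)
    while sorted_part:
        task = heapq.heappop(sorted_part)[2]
        order.append(task)
        if fifo:                                    # refill before new arrivals
            heapq.heappush(sorted_part, fifo.popleft())
        for child in released_by.get(task, []):
            add(child)
    return order

priority = {"t0": 15, "t1": 11, "t2": 9, "t3": 12,
            "t4": 6, "t5": 8, "t6": 6, "t7": 2}
# Inferred from the trace: tasks that become ready once the key task is scheduled.
released_by = {"t0": ["t1", "t2", "t3"], "t1": ["t4", "t5"],
               "t2": ["t6"], "t6": ["t7"]}

order = selection_order(priority, released_by, ["t0"], capacity=2)
print(order)  # ['t0', 't1', 't3', 't2', 't5', 't4', 't6', 't7']
```

Note that t1 is selected before t3 even though t3 has the higher priority: t3 became ready when the sorted part was already full and had to wait in the FIFO list for one step.
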

Extensions for LSDP

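Extend the approach to dynamic priorities:

- ETF (Earliest Task First): the ready task that starts the earliest
- ERT (Earliest Ready Task): the ready task that finishes the earliest
- DLS (Dynamic Level Scheduling): the task-processor pair having the highest dynamic level

General formula for the priority of a task-processor pair:

$\pi(t, p) = \lambda(t) + \max \{ T_e(t, p), T_r(p) \}$

- ETF: $\lambda(t) = 0$
- ERT: $\lambda(t) = T_w(t)$
- DLS: $\lambda(t) = -T_b(t)$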
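
The general formula adds a task-dependent term to the start time of the task on a given processor, and the three heuristics differ only in that term. The sketch below assumes the pair with the smallest value is selected, which matches "starts the earliest", "finishes the earliest", and "highest dynamic level" once the bottom level enters with a negative sign. The names `OFFSET`, `pair_priority`, and `select_pair`, and all numbers in the example, are hypothetical. The exhaustive loop over every ready task and processor is the unoptimized selection.

```python
# Task-dependent term of each heuristic, as listed on the slide.
OFFSET = {
    "ETF": lambda t, work, bottom: 0,
    "ERT": lambda t, work, bottom: work[t],      # Tw(t)
    "DLS": lambda t, work, bottom: -bottom[t],   # -Tb(t)
}

def pair_priority(heuristic, t, p, work, bottom, te, tr):
    """Task-dependent term plus the start time max{Te(t, p), Tr(p)}."""
    return OFFSET[heuristic](t, work, bottom) + max(te[(t, p)], tr[p])

def select_pair(heuristic, ready_tasks, work, bottom, te, tr):
    """Exhaustive search over all (ready task, processor) pairs; smallest value wins."""
    pairs = [(t, p) for t in ready_tasks for p in tr]
    return min(pairs, key=lambda tp: pair_priority(heuristic, *tp, work, bottom, te, tr))

# Hypothetical state: two ready tasks, two processors.
work = {"x": 2, "y": 5}                      # Tw
bottom = {"x": 10, "y": 4}                   # Tb
tr = {"p0": 3, "p1": 0}                      # Tr(p)
te = {("x", "p0"): 1, ("x", "p1"): 6,        # Te(t, p)
      ("y", "p0"): 2, ("y", "p1"): 2}

for h in ("ETF", "ERT", "DLS"):
    print(h, select_pair(h, ["x", "y"], work, bottom, te, tr))
# ETF ('y', 'p1')   y can start at time 2
# ERT ('x', 'p0')   x can finish at time 5
# DLS ('x', 'p0')   x has the larger bottom level
```
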

Extensions for LSDP

- EP (enabling processor) case
  - On each processor, the tasks are sorted
  - The processors are sorted
- Non-EP case
  - The processor becoming idle first
  - If this is the EP, it falls to the EP case

Extensions for LSDP

- 3 tries: 1 for EP case, 1 for non-EP case
- Task priority queues maintained: P for EP case, 2 for non-EP case
- Each task is added to 3 queues: 1 for EP case, 2 for non-EP case
- Processor queues: 1 for EP case, 1 for non-EP case

Complexity

- Originally: $O(W (E + V) P)$
- Now: $O(V (\log W + \log P) + E)$
- Can be further reduced using a partially sorted priority queue. A queue size of $P$ is required to maintain comparable performance: $O(V \log P + E)$

Conclusion

- LSSP can be performed at a significantly lower cost:
  - Processor selection considers only two processors: the enabling processor or the processor becoming idle first
  - Task selection sorts only a limited number of tasks
- Using the extension of this method, LSDP complexity can also be reduced
- For large program and processor dimensions, the approach offers a superior cost-performance trade-off

Thank You

Questions?

