Towards a Realistic Scheduling Model Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes IEEE TPDS, Vol....

Post on 20-Dec-2015

215 views 1 download

Tags:

transcript

Towards a Realistic Scheduling Model

Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes

IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006.

Parallel processing is the oldest discipline in computer science –

yet the general problem is far from solved

Why is parallel processing difficult?

• ”Jo flere kokker jo mere søl”– Partitioning and

transforming problems

– Load balancing– Inter-processor

communication– Granularity– Architecture

Implementing parallel systems

• Manually– MPI– PVM– Linda

• Automatically– Parallelising compilers (Fortran)– Static scheduling

Taskgraph scheduling:Representing static

computations

Modelling computations

A=B+C

Data dependencies

A

BC

Valid sequences:CBA, BCAInvalid sequences:ABC, ACB, CAB, BAC

Another example

A = (B-C)/DF = B+G

A

BCD

F

G

Scheduling

Static taskgraph scheduling techniques

The scheduling process

A

B C

D E

A

B

D

C

E

Taskgraph

Allocation

Schedule

p1 p2tim

e

c1 c2

c3 c4 c5

Topological sorting

• Topological sorting– to order the vertices of a graph such

that the precedence constraints are not violated

• All valid schedules represent a topological sort

• Scheduling algorithms differ in how they topologically sort the graph

The importance of abstraction

• Abstraction is important to preserve generality

• Too specificfloat sum = 0;for (int i=0;i<8;i++)

{sum += a[i];}

• General and flexiblefloat sum = sumArray(a);

Communication

Communication is a major bottleneck

• Typically from 1:50 to 1:10,000 difference between computation and communication

• Communication cost not very dependent on data size.

• Interconnection network topology affect the overall time.

Scheduling work prior to 1995

• Assumptions– Zero-interprocessor communication

costs– Fully connected processor

interconnection networks.

Amounts of data transferPublic transport is a good

thing?

Data-size not is not major factor

• Multiple single messages

• Single compound message

connect send connect send connect send

connect send send send

Interconnection topology

Fully connected

The ring

To send something from here..

…to here

Interprocessor communication

• Zero vs non-zero communication overheads

• Direct links vs connecting nodes

P1 P4P3P2

Bus

P11 P12 P13 P14

P21 P22 P23 P24

P31 P32 P33 P34

P41 P42 P43 P44

Shared memoryBus-based multiprocessor

Distributed memoryMesh multiprocessor

RAM

Avoiding communication overheads

Duplication

a

b c

a a

b c

a

b

c

a

b

a

c

p1 p2 p1 p2

t=1

t=2

t=3

t=1

t=2

11 1

1 1

11

1

11

1

duplication

allocation allocation

When considering communication overheads

Classic communication model: Assumptions

• Local communications have zero communication costs

• Communication is conducted by subsystem.

• Communication can be performed concurrently

• The network is fully connected

Implications

• Network contention (not modelled)– Tasks compete for communication

resources

• Contention can be modelled:– Different types of edges– Switch verticies (in addition to

processor verticies)

Processor involvement in communication I

Two-sided involvement

(TCP/IP PC-cluster)

Processor involvement in communication II

One-sided involvement

(Shared memory Cray T3E)

Processor involvement in communication III

Third party involvement

(Dedicated DMA hardware Meiko CS-2)

Problems

• All classic scheduling models assume third-party involvement.

• Very little hardware are equipped with dedicated hardware supporting third-party involvement.

• Estimated finish-times for tasks are hugely inaccurate.

• Scheduling algorithm are very sub-optimal.

Even more problems

Results

bobcat Sun E3500 3TE-900

The End