
Idempotent Work Stealing

Date post: 23-Feb-2016
Page 1: Idempotent Work Stealing

Idempotent Work Stealing

Maged M. Michael, Martin T. Vechev, Vijay A. Saraswat

PPoPP’09

Page 2: Idempotent Work Stealing

Outline

• Memory Operations Reordering
• Problem Definition: Idempotent Work-Stealing
• The algorithms
• Comparison to Previous Work
• Summary

Page 3: Idempotent Work Stealing

Memory Operations Reordering

• Some architectures reorder memory accesses to achieve faster execution
• This is a good optimization for uniprocessors, but it may be dangerous on multiprocessors

Operation pairs shown in the slide's diagram:
  read(a), read(b)
  write(a,1), write(b,2)
  read(a), write(b,2)
  write(a,1), read(b)

Page 4: Idempotent Work Stealing

Memory Operations Reordering

Memory (initial values):
  a = 0;
  b = 0;

P1:
  L1: if (read(a) = 0) goto L1
      print(read(b))

P2:
  write(b, 7)
  write(a, 1)

• What is the expected output of P1?
• What happens if P2 changes the order of its memory stores?
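The two questions on this slide can be explored with a small enumeration. The sketch below is only a model, not real hardware behavior: the helper name p1_outputs is hypothetical, and it assumes that P1's own reads happen in program order (a first, then b). It lists every value P1 could print for a given order in which P2's stores become visible.

```python
def p1_outputs(p2_stores):
    """Collect every value P1 can print for a given visibility order of P2's stores.

    Assumption: P1's own reads happen in program order (a first, then b).
    """
    outputs = set()
    n = len(p2_stores)
    # j = number of P2 stores visible when P1 reads `a` and leaves the loop
    # k = number of P2 stores visible when P1 then reads `b` (k >= j)
    for j in range(n + 1):
        for k in range(j, n + 1):
            seen_at_a = {"a": 0, "b": 0}
            seen_at_a.update(dict(p2_stores[:j]))
            if seen_at_a["a"] == 0:
                continue  # P1 is still spinning at L1
            seen_at_b = {"a": 0, "b": 0}
            seen_at_b.update(dict(p2_stores[:k]))
            outputs.add(seen_at_b["b"])  # print(read(b))
    return outputs


# Stores become visible in program order: write(b, 7), then write(a, 1).
in_order_outputs = p1_outputs([("b", 7), ("a", 1)])
# Stores become visible in swapped order: write(a, 1), then write(b, 7).
reordered_outputs = p1_outputs([("a", 1), ("b", 7)])
```

Under these assumptions the in-order case can only print 7, while the swapped case can also print the stale value 0.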

Page 5: Idempotent Work Stealing

Memory Fences

• Operations that synchronize memory accesses
• X-Y fence: all previous operations of type X must commit before any following operation of type Y starts
• Example: store-load

[Diagram: read1, write1, store-load fence, write2, read2]

• store-store?

Page 6: Idempotent Work Stealing

Memory Operations Reordering: With Memory Fences

Memory (initial values):
  a = 0;
  b = 0;

P1:
  L1: if (read(a) = 0) goto L1
      print(read(b))

P2:
  write(b, 1)
  store-store
  write(a, 7)

Page 7: Idempotent Work Stealing

Sequential Consistency

• A model where:
  ◦ All processors see all memory operations in the same order
  ◦ That order must adhere to the program order of each thread
• On architectures that reorder memory accesses, memory operations are not sequentially consistent
• This makes program verification a non-trivial task

Page 8: Idempotent Work Stealing

Sequential Consistency vs. Linearizability

• Linearizability is stronger than sequential consistency
• If operation A is executed before operation B (in real time), then A precedes B in the order (and not only for a single thread)


Page 10: Idempotent Work Stealing

Problem Definition: Idempotence

• Idempotence: the property of certain operations that they can be applied multiple times without changing the result (Wikipedia)
• In other words: f(f(x)) = f(x)
• Examples:
  1. The absolute value function
  2. The number 1 is idempotent under multiplication: 1 * 1 = 1
  3. An SQL query (without updates)
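The condition f(f(x)) = f(x) can be checked on sample inputs. The helper is_idempotent below is a hypothetical name introduced only for illustration, and passing the check on a few samples is evidence rather than proof.

```python
def is_idempotent(f, samples):
    """Return True if f(f(x)) == f(x) holds for every sample x."""
    return all(f(f(x)) == f(x) for x in samples)


samples = [-3, -1, 0, 2, 5]

# The absolute value function: applying it twice changes nothing.
abs_is_idempotent = is_idempotent(abs, samples)

# A counter-example: incrementing twice differs from incrementing once.
increment_is_idempotent = is_idempotent(lambda x: x + 1, samples)
```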

Page 11: Idempotent Work Stealing

Problem Definition: Work Stealing

• A policy for dividing procedure executions (jobs/tasks) efficiently among multiple processors
• Each processor has a deque (double-ended queue) of jobs

[Diagram: processors P1, P2, ..., Pk, each with its own queue of jobs]

Page 12: Idempotent Work Stealing

Problem Definition: Work Stealing

• Each processor can put a new job in its own queue
• Each processor can take a job from its own queue

[Diagram: processors P1, P2, ..., Pk, each with its own queue of jobs]

Page 13: Idempotent Work Stealing

Problem Definition: Work Stealing

• A processor without work can steal jobs from another processor

[Diagram: processors P1, P2, ..., Pk, each with its own queue of jobs]

Page 14: Idempotent Work Stealing

Work Stealing: Example

Computing the Fibonacci number fib(7):

  1. P1: take() -> fib(7)
  2. P1: put(fib(6)), put(fib(5))
  3. P1: take() -> fib(6)
  4. P2: steal(P1)
  5. P2: take() -> fib(5)
  6. P1: put(fib(5)), put(fib(4))
  7. P2: put(fib(4)), put(fib(3))
  8. P1: take() -> fib(5)
  9. P3: steal(P1)
  10. P3: take() -> fib(4)
  11. P2: take() -> fib(4)

[Diagram: the queues of P1, P2 and P3 holding the fib tasks listed above]

Page 15: Idempotent Work Stealing

Well…

• Work stealing seems like a good idea, but it can be expensive because it relies on:
  1. Locks
  2. Atomic read-modify-write operations
  3. Memory ordering fences
• Previous work-stealing algorithms use strong synchronization primitives
• Can work-stealing algorithms for idempotent tasks avoid using synchronization primitives?

Page 16: Idempotent Work Stealing

The answer

• Not exactly…
• Our goal:
  ◦ Making work stealing cheap when jobs are idempotent
• How?
  ◦ Making the owner’s operations (“put”, “take”) cheap, while “steal” remains expensive

Page 17: Idempotent Work Stealing

The Chase-Lev algorithm

A snippet of the Chase-Lev algorithm:

  Task take() {
  1.  b := bottom;
  2.  CircularArray a = activeArray;
  3.  b = b - 1;
  4.  bottom = b;
      (store-load fence)
  5.  t = top;
      …
  }


Page 19: Idempotent Work Stealing

The algorithms

• We will see 3 algorithms
• All algorithms insert (put) jobs at the tail
  1. Idempotent LIFO: tasks are extracted (take/steal) from the tail
  2. Idempotent FIFO: tasks are extracted (take/steal) from the head
  3. Idempotent double-ended: the owner takes tasks from the tail, and the others steal from the head

Page 20: Idempotent Work Stealing

1) Idempotent LIFO

• Each processor has:
  ◦ A dynamic array of tasks
  ◦ A capacity variable
  ◦ An anchor (tail index)

[Diagram: P1's empty tasks array with capacity = 7 and anchor = 0; put inserts at the tail, take/steal extract from the tail]

Page 21: Idempotent Work Stealing

Idempotent LIFO: put(task)

  void put(Task task) {
  1.  t := anchor;
  2.  if (t = capacity) { expand(); goto 1; }
  3.  tasks[t] := task;
      (store-store fence)
  4.  anchor := t + 1;
  }

[Diagram: task1 is written into the tasks array; capacity = 7, anchor goes from 0 to 1]

Page 22: Idempotent Work Stealing

Idempotent LIFO: take()

  Task take() {
  1.  t := anchor;
  2.  if (t = 0) return EMPTY;
  3.  task := tasks[t - 1];
  4.  anchor := t - 1;
  5.  return task;
  }

[Diagram: tasks array holding task1 task2 task3; capacity = 7, anchor goes from 3 to 2]

Page 23: Idempotent Work Stealing

Idempotent LIFO: steal()

  Task steal() {
  1.  t := anchor;
  2.  if (t = 0) return EMPTY;
  3.  a := tasks;
  4.  task := a[t - 1];
  5.  if !CAS(anchor, t, t-1) goto 1;
  6.  return task;
  }

Fences marked on the slide: load-load, load-CAS

[Diagram: tasks array holding task1 task2 task3; capacity = 7, anchor goes from 3 to 2]

• Why must tasks be idempotent?

Page 24: Idempotent Work Stealing

Idempotent tasks

  Task take() {
  1.  t := anchor;
  2.  if (t = 0) return EMPTY;
  3.  task := tasks[t - 1];
  4.  anchor := t - 1;
  5.  return task;
  }

  Task steal() {
  1.  t := anchor;
  2.  if (t = 0) return EMPTY;
  3.  a := tasks;
  4.  task := a[t - 1];
  5.  if !CAS(anchor, t, t-1) goto 1;
  6.  return task;
  }

[Animation: tasks array holding task1 task2 task3, capacity = 7, anchor = 3. The owner in take() and a thief in steal() both read t = 3 and both read task = task3; anchor ends at 2, so task3 is returned twice]
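The interleaving in this animation can be replayed step by step. The sketch below is a single-threaded model with hypothetical names (LifoModel, cas_anchor); it does not model real concurrency or memory ordering, it only replays one schedule of the take() and steal() pseudocode above.

```python
class LifoModel:
    """Single-threaded model of the LIFO queue state (no real concurrency)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.anchor = len(self.tasks)  # tail index

    def cas_anchor(self, expected, new):
        """Model of CAS(anchor, expected, new)."""
        if self.anchor != expected:
            return False
        self.anchor = new
        return True


queue = LifoModel(["task1", "task2", "task3"])

# Thief, steal() lines 1-4: read the anchor and the task, publish nothing yet.
thief_t = queue.anchor
thief_task = queue.tasks[thief_t - 1]

# Owner, take() lines 1-3: read the same anchor and the same task.
owner_t = queue.anchor
owner_task = queue.tasks[owner_t - 1]

# Thief, steal() line 5: the CAS from 3 to 2 succeeds.
thief_cas_ok = queue.cas_anchor(thief_t, thief_t - 1)

# Owner, take() line 4: a plain write, so the owner never notices the thief.
queue.anchor = owner_t - 1
```

Both operations return task3, which is harmless only if running task3 twice has the same effect as running it once.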

Page 25: Idempotent Work Stealing

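Preventing ABA

• How is ABA possible?

Owner:
  take();
  put(taskX);
  …
  put(taskY);

Thief:
  Task steal() {
  1.  t := anchor;
  2.  if (t = 0) return EMPTY;
  3.  a := tasks;
  4.  task := a[t - 1];
  5.  if !CAS(anchor, t, t-1) goto 1;
  6.  return task;
  }

[Animation: tasks array holding task1 task2 task3, capacity = 7. The thief reads t = 3 and task = task3. The owner's take() moves anchor from 3 to 2, and put(taskX) moves it back to 3. The thief's CAS then succeeds and sets anchor to 2: taskX is lost!]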
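One reading of this ABA scenario can be replayed in a single-threaded model. The class and variable names below are hypothetical, put() omits the expand() path, and the position of the thief's CAS inside the elided owner steps is an assumption.

```python
class LifoModel:
    """Single-threaded model of the untagged LIFO queue (no real concurrency)."""

    def __init__(self, tasks, capacity):
        self.tasks = list(tasks) + [None] * (capacity - len(tasks))
        self.anchor = len(tasks)  # tail index

    def put(self, task):
        t = self.anchor
        self.tasks[t] = task  # expand() is omitted in this sketch
        self.anchor = t + 1

    def take(self):
        t = self.anchor
        if t == 0:
            return None  # EMPTY
        task = self.tasks[t - 1]
        self.anchor = t - 1
        return task

    def cas_anchor(self, expected, new):
        if self.anchor != expected:
            return False
        self.anchor = new
        return True


queue = LifoModel(["task1", "task2", "task3"], capacity=7)
executed = []

# Thief, steal() lines 1-4: t = 3, task = task3.
thief_t = queue.anchor
thief_task = queue.tasks[thief_t - 1]

executed.append(queue.take())  # owner runs task3, anchor = 2
queue.put("taskX")             # anchor = 3 again: the "A-B-A" pattern

# Thief, steal() line 5: the CAS sees anchor = 3 and succeeds.
thief_cas_ok = queue.cas_anchor(thief_t, thief_t - 1)
if thief_cas_ok:
    executed.append(thief_task)  # task3 runs a second time

queue.put("taskY")  # overwrites the slot that held taskX

# Drain whatever the owner can still reach.
while (task := queue.take()) is not None:
    executed.append(task)
```

In this schedule task3 runs twice, which idempotence tolerates, but taskX never runs at all, which it does not.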

Page 26: Idempotent Work Stealing

Preventing ABA

• How can we prevent it?

  anchor: <integer, integer>; // <tail, tag>

  void put(Task task) {
  1.  <t,tag> := anchor;
  2.  if (t = capacity) { expand(); goto 1; }
  3.  tasks[t] := task;
  4.  anchor := <t + 1, tag + 1>;
  }

  Task steal() {
  1.  <t,tag> := anchor;
  2.  if (t = 0) return EMPTY;
  3.  a := tasks;
  4.  task := a[t - 1];
  5.  if !CAS(anchor, <t,tag>, <t-1,tag>) goto 1;
  6.  return task;
  }
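The effect of the tag can be seen by replaying the same schedule against a tagged anchor. This is a single-threaded sketch with hypothetical names; the starting tag of 0 and a take() that leaves the tag unchanged are assumptions, since the slide shows only put() and steal().

```python
class TaggedLifoModel:
    """Single-threaded model of the LIFO queue with a <tail, tag> anchor."""

    def __init__(self, tasks, capacity):
        self.tasks = list(tasks) + [None] * (capacity - len(tasks))
        self.anchor = (len(tasks), 0)  # <tail, tag>; initial tag 0 is assumed

    def put(self, task):
        t, tag = self.anchor
        self.tasks[t] = task  # expand() is omitted in this sketch
        self.anchor = (t + 1, tag + 1)

    def take(self):
        t, tag = self.anchor
        if t == 0:
            return None  # EMPTY
        task = self.tasks[t - 1]
        self.anchor = (t - 1, tag)  # assumption: take keeps the tag
        return task

    def cas_anchor(self, expected, new):
        if self.anchor != expected:
            return False
        self.anchor = new
        return True


queue = TaggedLifoModel(["task1", "task2", "task3"], capacity=7)

# Thief, steal() lines 1-4: reads <3, 0> and task3.
stale_anchor = queue.anchor
stale_task = queue.tasks[stale_anchor[0] - 1]

owner_task = queue.take()  # anchor = <2, 0>
queue.put("taskX")         # anchor = <3, 1>: same tail, different tag

# Thief, steal() line 5: the tag differs, so the CAS fails and steal() retries.
first_cas_ok = queue.cas_anchor(stale_anchor, (stale_anchor[0] - 1, stale_anchor[1]))

# Retry from line 1: the thief now sees taskX.
t, tag = queue.anchor
retry_task = queue.tasks[t - 1]
second_cas_ok = queue.cas_anchor((t, tag), (t - 1, tag))
```

Because put() bumps the tag, the thief's stale snapshot no longer matches, and the retry picks up taskX instead of losing it.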

Page 27: Idempotent Work Stealing

2) Idempotent FIFO

• Each processor has:
  ◦ A dynamic cyclic array of tasks
  ◦ A capacity variable
  ◦ A head index (always increasing)
  ◦ A tail index (always increasing)

[Diagram: P1's tasks array holding task2 task3 task4; capacity = 7, head = 1, tail = 4; put inserts at the tail, take/steal extract from the head]

Page 28: Idempotent Work Stealing

Idempotent FIFO: put(task)

  void put(Task task) {
  1.  h := head;
  2.  t := tail;
  3.  if (t = h + tasks.capacity) { expand(); goto 1; }
  4.  tasks.array[t%tasks.capacity] := task;
      (store-store fence)
  5.  tail := t + 1;
  }

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, head = 1, tail goes from 4 to 5]

Page 29: Idempotent Work Stealing

Idempotent FIFO: take()

  Task take() {
  1.  h := head;
  2.  t := tail;
  3.  if (h = t) return EMPTY;
  4.  task := tasks.array[h%tasks.capacity];
  5.  head := h + 1;
  6.  return task;
  }

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, tail = 4, head goes from 1 to 2]

Page 30: Idempotent Work Stealing

Idempotent FIFO: steal()

  Task steal() {
  1.  h := head;
  2.  t := tail;
  3.  if (h = t) return EMPTY;
  4.  a := tasks;
  5.  task := a.array[h%a.capacity];
  6.  if !CAS(head, h, h+1) goto 1;
  7.  return task;
  }

Fences marked on the slide: load-load, load-load, load-CAS

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, tail = 4, head goes from 1 to 2]
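The cyclic indexing in the FIFO pseudocode (head and tail only ever grow, and slots are addressed modulo the capacity) can be tried out in a small single-threaded model. The names below are hypothetical, the capacity of 4 is chosen only to force a wrap-around, and a full queue raises an error where the slide calls expand().

```python
class FifoModel:
    """Single-threaded model of the FIFO queue's put() and take()."""

    def __init__(self, capacity):
        self.array = [None] * capacity
        self.capacity = capacity
        self.head = 0  # always increasing
        self.tail = 0  # always increasing

    def put(self, task):
        h, t = self.head, self.tail
        if t == h + self.capacity:
            raise OverflowError("queue full; the slide calls expand() here")
        self.array[t % self.capacity] = task
        self.tail = t + 1

    def take(self):
        h, t = self.head, self.tail
        if h == t:
            return None  # EMPTY
        task = self.array[h % self.capacity]
        self.head = h + 1
        return task


queue = FifoModel(capacity=4)
for name in ["task1", "task2", "task3", "task4"]:
    queue.put(name)

first_two = [queue.take(), queue.take()]  # frees slots 0 and 1

# tail is now 4 and 5, which map to slots 0 and 1 of the array.
queue.put("task5")
queue.put("task6")

rest = [queue.take() for _ in range(4)]
```

The indices never wrap; only the slot computation does, so h = t remains an unambiguous test for an empty queue.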

Page 31: Idempotent Work Stealing

3) Idempotent double-ended

• Each processor has:
  ◦ A dynamic cyclic array of tasks
  ◦ A capacity variable
  ◦ An anchor (head, size)

[Diagram: P1's tasks array holding task2 task3 task4; capacity = 7, anchor = <1, 3>; put inserts at the tail, take extracts from the tail, steal extracts from the head]

Page 32: Idempotent Work Stealing

Idempotent double-ended: put(task)

  void put(Task task) {
  1.  <h, s> := anchor;
  2.  if (s = tasks.capacity) { expand(); goto 1; }
  3.  tasks.array[(h+s)%tasks.capacity] := task;
      (store-store fence)
  4.  anchor := <h, s + 1>;
  }

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, anchor goes from <1, 3> to <1, 4>]

Page 33: Idempotent Work Stealing

Idempotent double-ended: take()

  Task take() {
  1.  <h, s> := anchor;
  2.  if (s = 0) return EMPTY;
  3.  task := tasks.array[(h+s-1)%tasks.capacity];
  4.  anchor := <h, s - 1>;
  5.  return task;
  }

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, anchor goes from <1, 4> to <1, 3>]

Page 34: Idempotent Work Stealing

Idempotent double-ended: steal()

  Task steal() {
  1.  <h, s> := anchor;
  2.  if (s = 0) return EMPTY;
  3.  a := tasks;
  4.  task := a.array[h%a.capacity];
  5.  h2 := (h + 1) % a.capacity;
  6.  if !CAS(anchor, <h,s>, <h2,s-1>) goto 1;
  7.  return task;
  }

Fences marked on the slide: load-load, load-CAS

[Diagram: tasks array holding task2 task3 task4 task5; capacity = 7, anchor goes from <1, 4> to <2, 3>]
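The three double-ended operations can be exercised together on the state drawn in these slides. This is a single-threaded sketch with hypothetical names: the anchor is modeled as a Python tuple <head, size>, the CAS is an ordinary comparison, and a full queue raises an error where the slides call expand().

```python
class DoubleEndedModel:
    """Single-threaded model of the double-ended queue with anchor = <head, size>."""

    def __init__(self, capacity):
        self.array = [None] * capacity
        self.capacity = capacity
        self.anchor = (0, 0)  # <head, size>

    def cas_anchor(self, expected, new):
        if self.anchor != expected:
            return False
        self.anchor = new
        return True

    def put(self, task):
        h, s = self.anchor
        if s == self.capacity:
            raise OverflowError("queue full; the slides call expand() here")
        self.array[(h + s) % self.capacity] = task
        self.anchor = (h, s + 1)

    def take(self):
        h, s = self.anchor
        if s == 0:
            return None  # EMPTY
        task = self.array[(h + s - 1) % self.capacity]
        self.anchor = (h, s - 1)
        return task

    def steal(self):
        while True:
            h, s = self.anchor
            if s == 0:
                return None  # EMPTY
            task = self.array[h % self.capacity]
            h2 = (h + 1) % self.capacity
            if self.cas_anchor((h, s), (h2, s - 1)):
                return task


# Start from the state in the slides: task2..task4 in slots 1..3, anchor = <1, 3>.
queue = DoubleEndedModel(capacity=7)
queue.array[1:4] = ["task2", "task3", "task4"]
queue.anchor = (1, 3)

queue.put("task5")           # slot (1 + 3) % 7 = 4, anchor = <1, 4>
anchor_after_put = queue.anchor
stolen = queue.steal()       # oldest task, from the head
anchor_after_steal = queue.anchor
taken = queue.take()         # newest task, from the tail
```

The thief removes the oldest task while the owner removes the newest, so the two ends only meet when the queue is nearly empty.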


Page 36: Idempotent Work Stealing

Experimental evaluation

• Compared against the “Chase-Lev” and “Cilk THE” algorithms (after adding memory fences)
• Benchmarks:
  ◦ Micro-benchmarks: the common case, take() and put()
  ◦ Irregular graph applications

Page 37: Idempotent Work Stealing

Micro-benchmarks

• 2 scenarios:
  ◦ Both puts and takes (10^6 ops of each type)
  ◦ Only takes (10^6 ops), with pre-populated work queues


Page 39: Idempotent Work Stealing

Irregular Graph Applications

• Based on the SIMPLE framework
• 2D torus graph:
  ◦ Vertices lie on the torus
  ◦ Each vertex is connected to its 4 neighbors
• Task: build a spanning tree

Page 40: Idempotent Work Stealing

2D-Torus

• Up to 6% redundant work


Page 42: Idempotent Work Stealing

Summary

• Reordering memory operations improves execution times, but must be used with care on multiprocessors
• “Idempotent Work-Stealing” is useful for some workloads
• Idempotent LIFO gives good results on all benchmarks

Page 43: Idempotent Work Stealing

Thank You! Questions?

