CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution
Tom Bergan, Owen Anderson, Joe Devietti, Luis Ceze, Dan Grossman
To appear at ASPLOS 2010


CoreDet

Goal: Execution-level determinism ...
• of arbitrary, unmodified C/C++ pthreads programs
• without special hardware
• without sacrificing scalability

Implementation:
• a compiler pass in LLVM, plus
• a custom runtime library


CoreDet

An implementation of DMP in software

• DMP-Ownership
  • straightforward to implement
  • poorer scalability, fewer overheads

• DMP-Buffering (a new protocol)
  • better scalability, more overheads
  • no speculation (easier to implement than TM)
  • key insight: relaxed memory consistency (specifically, Weak Ordering)

CoreDet: Implementation

A compiler
• instruments the code with calls to the runtime
• applies static optimizations to remove instrumentation

A runtime library
• schedules threads
• tracks interthread communication
• provides deterministic wrappers for ...
  - pthreads
  - malloc

Outline

DMP-Ownership in Software

Why not DMP-TM in Software?

DMP-Buffering

A Few Implementation Details

Performance

DMP-Serial

Thread 1
  a := x
  b := y
  x := a * 2
  y := a + b

Thread 2
  c := x
  d := y
  x := a * 3
  y := a - b

[Figure label: quantum]

DMP-Serial

[Figure: timeline for threads T1, T2, T3; labels: time, quantum, end of round.]

Execution is completely serialized
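A minimal sketch of this serialized schedule, under stated assumptions: a thread is modeled only as a count of instructions left to run, and the names `sim_thread` and `dmp_serial_round` are hypothetical, not part of CoreDet. In each round, every thread in turn runs at most one quantum while the others wait, so the interleaving depends only on the fixed thread order and the quantum size.

```c
#include <stddef.h>

/* Hypothetical model: a thread is just a count of instructions left to run. */
typedef struct {
    int id;
    int remaining;
} sim_thread;

/* Run one round. Each thread, in fixed array order, executes up to
 * `quantum` instructions while all other threads wait. The id of the
 * thread that ran each instruction is appended to `trace` (capacity
 * `cap`). Returns the number of trace entries written. */
size_t dmp_serial_round(sim_thread *threads, size_t n, int quantum,
                        int *trace, size_t cap)
{
    size_t len = 0;
    for (size_t t = 0; t < n; t++) {
        for (int i = 0; i < quantum && threads[t].remaining > 0; i++) {
            if (len == cap)
                return len;          /* trace full: stop early */
            threads[t].remaining--;
            trace[len++] = threads[t].id;
        }
    }
    return len;
}
```

Calling `dmp_serial_round` repeatedly on the same initial state always yields the same trace, which is the property the slide describes, at the cost of running no two threads at once.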

DMP-Ownership

Parallel mode: no communication (can write only to private data)
Serial mode: arbitrary communication

[Figure: timeline for threads T1, T2, T3 with a parallel phase followed by a serial phase; labels: time, end of round.]

MOT
  x    owned-by T1
  y    shared
  z    owned-by T2
  ...  ...
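The ownership check can be sketched as a single lookup. This is an illustration under assumptions, not CoreDet's runtime: the names `mot_entry`, `MOT_SHARED`, and `mot_allows_parallel` are hypothetical, and the rule for loads (shared data may be read in parallel mode, another thread's private data may not be touched) follows the ownership-table design of the original DMP proposal rather than anything spelled out on this slide.

```c
#include <stdbool.h>

#define MOT_SHARED (-1)   /* hypothetical marker: the location is shared */

typedef enum { ACCESS_LOAD, ACCESS_STORE } access_kind;

/* One MOT entry per tracked location: the owning thread id, or MOT_SHARED. */
typedef struct {
    int owner;
} mot_entry;

/* Can thread `tid` perform this access in parallel mode, i.e. without
 * communicating with another thread?
 *   - data owned by `tid`:            loads and stores
 *   - shared data:                    loads only
 *   - data owned by another thread:   nothing (wait for serial mode) */
bool mot_allows_parallel(const mot_entry *e, int tid, access_kind kind)
{
    if (e->owner == tid)
        return true;
    if (e->owner == MOT_SHARED)
        return kind == ACCESS_LOAD;
    return false;
}
```

With the table from the slide (x owned by T1, y shared, z owned by T2), T1 may store to x in parallel mode, but a store to y or any access to z would have to wait for serial mode.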

DMP-Ownership in Software

Requirements:

Quantum Formation
• Count instructions: e.g. at the end of each basic block

Ownership Tracking
• Instrument every load/store
• Manipulate an MOT (in memory) via a runtime library

[Figure: threads T1, T2, T3 and an MOT: x owned-by T1, y shared, z owned-by T2, ...]
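Quantum formation by instruction counting might look like the sketch below. It assumes the compiler inserts one call at the end of each basic block and passes that block's instruction count; `QUANTUM_SIZE`, `quantum_counter`, and `end_of_basic_block` are hypothetical names, and the quantum length is an arbitrary illustrative value.

```c
#include <stdbool.h>

#define QUANTUM_SIZE 1000   /* hypothetical quantum length, in instructions */

typedef struct {
    long executed;   /* instructions counted so far in the current quantum */
    long quanta;     /* number of completed quanta */
} quantum_counter;

/* Call inserted at the end of a basic block containing `block_len`
 * instructions. Returns true when the current quantum is finished, at
 * which point the thread would stop and wait for the rest of the round. */
bool end_of_basic_block(quantum_counter *qc, long block_len)
{
    qc->executed += block_len;
    if (qc->executed < QUANTUM_SIZE)
        return false;
    qc->executed = 0;
    qc->quanta++;
    return true;
}
```

Because counting happens per block rather than per instruction, a quantum can overshoot `QUANTUM_SIZE` by up to one block. That is harmless for determinism as long as the same blocks execute in the same order on every run.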

DMP-Ownership in Software

We have implemented this in CoreDet ... but the scalability is not that great

[Figure: plot; series: splash mean, parsec mean.]


DMP-TM

[Figure: timeline for threads T1, T2, T3; labels: time, implicit transactions, commit, end of round.]

Execution is parallel and transactional

DMP-TM in Software

Requirements:

Quantum Formation
• Count instructions

Transactional Execution
• Instrument every load/store
• Use an off-the-shelf STM with minor modifications

This is really hard!

DMP-TM: Why not STM?

STMs make the wrong assumptions:
• Most code is not transactional
• Transactions are short
• Transactions are scoped

An unscoped transaction:

    void foo() {
      ...
      begin_transaction();
      return;
    }

DMP-TM: What Can We Learn?

➡ Speculation makes things hard
➡ Good scalability because of versioned memory (versions are modified in parallel)

DMP-Buffering: Use versioned memory without speculation


DMP-Buffering

Parallel mode: no communication (versioned memory via store buffering)
Commit mode: deterministically (serially) publish local store buffers
Serial mode: used for synchronization (e.g. atomic ops)

[Figures: timeline for threads T1, T2, T3 with parallel, commit, and serial phases before the end of the round.
 Parallel: the thread runs "x := .. y .." with its own store buffer; global memory is read only.
 Commit: the store buffer entry "x=.." is published to global memory; stores are reordered.
 Serial: the thread runs lock(x); global memory is read/write.]

DMP-Buffering

Parallel mode: buffer stores locally
• ends at synchronization (atomic ops and fences), and quantum boundaries

Commit mode: publish local store buffers
• happens serially for determinism
• executes in parallel for performance

Serial mode: used for synchronization (e.g. atomic ops)
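The parallel and commit modes can be sketched with a small per-thread store buffer. This is an illustration under assumptions rather than CoreDet's runtime: memory is an `int` array indexed by a small address, the caller keeps addresses in range, and `store_buffer`, `sb_store`, `sb_load`, and `sb_commit_all` are hypothetical names. The commit here is a plain serial loop in thread order.

```c
#include <stddef.h>

#define BUF_CAP 16   /* hypothetical buffer capacity */

typedef struct {
    size_t addr[BUF_CAP];
    int    val[BUF_CAP];
    size_t len;
} store_buffer;

/* Parallel mode: a store touches only the thread's private buffer.
 * Returns 0 on success, -1 if the buffer is full. */
int sb_store(store_buffer *sb, size_t addr, int v)
{
    for (size_t i = 0; i < sb->len; i++) {
        if (sb->addr[i] == addr) {   /* overwrite an earlier buffered store */
            sb->val[i] = v;
            return 0;
        }
    }
    if (sb->len == BUF_CAP)
        return -1;
    sb->addr[sb->len] = addr;
    sb->val[sb->len] = v;
    sb->len++;
    return 0;
}

/* Parallel mode: a load sees the thread's own buffered store if there is
 * one, otherwise the (read-only) global memory. */
int sb_load(const store_buffer *sb, const int *global, size_t addr)
{
    for (size_t i = 0; i < sb->len; i++)
        if (sb->addr[i] == addr)
            return sb->val[i];
    return global[addr];
}

/* Commit mode: publish the buffers one thread at a time in array order,
 * so a later thread's store to the same address wins. Buffers are emptied. */
void sb_commit_all(store_buffer *bufs, size_t nthreads, int *global)
{
    for (size_t t = 0; t < nthreads; t++) {
        for (size_t i = 0; i < bufs[t].len; i++)
            global[bufs[t].addr[i]] = bufs[t].val[i];
        bufs[t].len = 0;
    }
}
```

No thread can observe another thread's stores until the commit, which is why parallel mode involves no communication and needs no speculation or rollback.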

DMP-Buffering

Thread 1               Thread 2
  A = 1                  B = 1
  if (B == 0)            if (A == 0)
    ...                    ...

Dekker’s Algorithm (there is a data race)

DMP-Buffering

            Thread 1               Thread 2
parallel:   buffer[A] = 1          buffer[B] = 1
            if (B == 0)            if (A == 0)
              ...                    ...
commit:     A = buffer[A]          B = buffer[B]

[Figure annotation: reordered]

This is deterministic ... but not sequentially consistent (cycle in the happens-before graph)
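The outcome of this racy example can be traced with a small single-threaded model. It assumes both threads run their code within the same quantum, and it uses one-entry store buffers; `dekker_result` and `dekker_one_quantum` are hypothetical names introduced only for this sketch.

```c
#include <stdbool.h>

typedef struct {
    bool t1_entered;   /* Thread 1 observed B == 0 */
    bool t2_entered;   /* Thread 2 observed A == 0 */
    int  A, B;         /* global memory after the commit */
} dekker_result;

dekker_result dekker_one_quantum(void)
{
    int A = 0, B = 0;    /* global memory, read-only during parallel mode */
    int buf_A, buf_B;    /* one-entry store buffers for Thread 1 and Thread 2 */

    /* Parallel mode: each thread buffers its store, then loads the other
     * flag from global memory, which no commit has updated yet. */
    buf_A = 1;                  /* Thread 1: buffer[A] = 1 */
    bool t1 = (B == 0);         /* Thread 1: if (B == 0)   */
    buf_B = 1;                  /* Thread 2: buffer[B] = 1 */
    bool t2 = (A == 0);         /* Thread 2: if (A == 0)   */

    /* Commit mode: publish the buffers in thread order. */
    A = buf_A;
    B = buf_B;

    dekker_result r = { t1, t2, A, B };
    return r;
}
```

Both threads see 0 and take their branch on every run. That is deterministic, but no sequentially consistent interleaving allows it, since under sequential consistency at least one thread would have to observe the other's store.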

DMP-Buffering

Thread 1               Thread 2
  A = 1                  B = 1
  tmp1 = B               tmp2 = A
  if (tmp1 == 0)         if (tmp2 == 0)
    ...                    ...

Dekker’s Algorithm (again)
Let’s remove the data race ...

DMP-Buffering

Thread 1               Thread 2
  lock(L)                lock(L)
  A = 1                  B = 1
  tmp1 = B               tmp2 = A
  unlock(L)              unlock(L)
  if (tmp1 == 0)         if (tmp2 == 0)
    ...                    ...

Dekker’s Algorithm (no data race)

[Figure: the execution is divided into alternating "parallel + commit" and "serial" phases.]

Synchronization happens sequentially

Data race free programs are sequentially consistent (required by C++ and Java memory models)

DMP-Buffering: Parallel Commit

For determinism, the commit order must be deterministic, i.e. logically serial
For performance, the commit must happen in parallel

Basic idea:
• Publish store buffers in parallel
• Preserve the commit order on collisions

DMP-Buffering: Parallel Commit

Basic idea:
• Commit store buffers in parallel
• Preserve the commit order on collisions

Global Memory        Thread 1 store buffer    Thread 2 store buffer
addr   value         addr   value             addr   value
0xA    0             0xA    1                 0xC    2
0xB    0             0xB    1                 0xD    2
0xC    0             0xC    1                 0xE    2
0xD    0
0xE    0

Collision! Resolve for Thread 2
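One way to preserve the commit order on collisions is to tag each address with the commit rank of its last writer and let a store land only if its rank is at least as high. The slides do not say how CoreDet resolves collisions, so this mechanism is an assumption, as are the names `global_memory`, `gm_begin_commit`, and `gm_publish` and the mapping of 0xA..0xE to indices 0..4. The sketch runs sequentially; a truly parallel version would also need per-address synchronization.

```c
#include <stddef.h>

#define MEM_WORDS 5   /* assumption: 0xA..0xE are mapped to indices 0..4 */

typedef struct {
    size_t addr;
    int    val;
} buffered_store;

typedef struct {
    int value[MEM_WORDS];
    int writer_rank[MEM_WORDS];   /* rank of the last writer this round, -1 if none */
} global_memory;

/* Start a commit phase: no address has been written yet in this round. */
void gm_begin_commit(global_memory *gm)
{
    for (size_t i = 0; i < MEM_WORDS; i++)
        gm->writer_rank[i] = -1;
}

/* Publish one thread's buffer. `rank` is the thread's position in the
 * deterministic commit order. On a collision, the store with the higher
 * rank survives, regardless of which buffer is published first. */
void gm_publish(global_memory *gm, int rank,
                const buffered_store *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        size_t a = buf[i].addr;
        if (rank >= gm->writer_rank[a]) {
            gm->value[a] = buf[i].val;
            gm->writer_rank[a] = rank;
        }
    }
}
```

With the buffers from the slide, publishing Thread 2 before Thread 1 or the reverse leaves the same memory: 0xA, 0xB hold 1 and 0xC, 0xD, 0xE hold 2, so the collision on 0xC is resolved for Thread 2.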


Remove Instrumentation From ...

• Accesses to thread-local (non-escaping) objects

  DMP-Buffering: requires unification-based points-to analysis

      int local;
      int *p = (...) ? &local : &global;
      ...

  [callout: must access through the store buffer]

• Redundant accesses

      y = ... x ...
      z = ... x ...

  [callout: don’t need to instrument this]

  DMP-Buffering: this does not apply
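The effect of unification on the `local`/`global` example can be sketched with a union-find over abstract memory objects, in the style of Steensgaard's analysis. This is a toy illustration, not the analysis CoreDet runs: object ids are assigned by hand, and `pt_classes`, `pt_init`, `pt_find`, `pt_unify`, and `pt_same_class` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_OBJS 16   /* hypothetical limit on abstract memory objects */

/* Union-find over abstract memory objects, identified by small ints. */
typedef struct {
    int parent[MAX_OBJS];
} pt_classes;

/* Initially every object is alone in its own class. */
void pt_init(pt_classes *pc)
{
    for (int i = 0; i < MAX_OBJS; i++)
        pc->parent[i] = i;
}

/* Find the representative of x's class, with path halving. */
int pt_find(pt_classes *pc, int x)
{
    while (pc->parent[x] != x) {
        pc->parent[x] = pc->parent[pc->parent[x]];
        x = pc->parent[x];
    }
    return x;
}

/* Record that one pointer may point to both `a` and `b`: merge their classes. */
void pt_unify(pt_classes *pc, int a, int b)
{
    int ra = pt_find(pc, a);
    int rb = pt_find(pc, b);
    pc->parent[ra] = rb;
}

/* True if `a` and `b` must be treated the same way by the instrumentation. */
bool pt_same_class(pt_classes *pc, int a, int b)
{
    return pt_find(pc, a) == pt_find(pc, b);
}
```

For `int *p = (...) ? &local : &global;` the analysis would call `pt_unify` on the objects for `local` and `global`. Since `global` is shared, every object in its class, including `local`, has to be accessed through the store buffer, while a local that is never unified with shared data can stay uninstrumented.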

External Libraries

We do not instrument external shared libraries, such as the system libc
• External calls must be serialized

Preventing over-serialization:
• We check indirect calls at runtime
• We provide deterministic wrappers for common libc functions, e.g. memcpy and malloc
• We do not serialize pure libc functions, e.g. sqrt


Scalability

[Figure: scalability plot; series: splash mean, parsec mean.]

Overheads

[Figure: overheads plot; series: splash mean, parsec mean.]

Since we preserve scalability, we can overcome overheads by adding cores

Wrap Up

CoreDet
• execution-level determinism in software of arbitrary multithreaded programs

DMP-Buffering
• uses a relaxed memory consistency model
• scales comparably to nondeterministic execution

Thank you!
The CoreDet source code will eventually be released at http://sampa.cs.washington.edu

