
What is a Data Race?

Date post: 06-Jan-2016
Upload: ryann
Page 1: What is a Data Race?

What is a Data Race?

Two concurrent accesses to a shared location, at least one of them for writing. Indicative of a bug.

Thread 1    Thread 2
X++         T=Y
Z=2         T=X

Page 2: What is a Data Race?

How Can Data Races be Prevented?

Explicit synchronization between threads: locks, critical sections, barriers, mutexes, semaphores, monitors, events, etc.

Thread 1        Thread 2
Lock(m)
X++
Unlock(m)
                Lock(m)
                T=X
                Unlock(m)
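The locking pattern on this slide can be sketched in C++ with `std::mutex`. This is a minimal sketch rather than the author's code: the globals `X`, `T`, and `m` mirror the slide's names, while `thread1`, `thread2`, and `run_once` are hypothetical helpers introduced only for illustration.

```cpp
#include <mutex>
#include <thread>

// Shared state named after the slide; the helper names below are hypothetical.
int X = 0;
int T = 0;
std::mutex m;

// Thread 1: Lock(m); X++; Unlock(m)
void thread1() {
    std::lock_guard<std::mutex> guard(m);  // released when guard leaves scope
    X++;
}

// Thread 2: Lock(m); T=X; Unlock(m)
void thread2() {
    std::lock_guard<std::mutex> guard(m);
    T = X;
}

// Runs both threads to completion and returns the value thread 2 observed.
int run_once() {
    X = 0;
    T = 0;
    std::thread a(thread1);
    std::thread b(thread2);
    a.join();
    b.join();
    return T;
}
```

`run_once()` may return 0 or 1. The mutex removes the data race on `X`, but it does not decide which critical section runs first.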

Page 3: What is a Data Race?

Is This Sufficient?

Yes! No!

Programmer dependent
    Correctness: the programmer may forget to synchronize
    Need tools to detect data races
Expensive
    Efficiency: to achieve correctness, the programmer may overdo it
    Need tools to remove excessive synchronization

Page 4: What is a Data Race?

#define N 100
Type* g_stack = new Type[N];   // pointer to the heap-allocated array
int g_counter = 0;
Lock g_lock;

void push( Type& obj ) { lock(g_lock); ... unlock(g_lock); }
void pop( Type& obj )  { lock(g_lock); ... unlock(g_lock); }

void popAll( ) {
    lock(g_lock);
    delete[] g_stack;
    g_stack = new Type[N];
    g_counter = 0;
    unlock(g_lock);
}

int find( Type& obj, int number ) {
    int i;                              // declared outside the loop: used after it
    lock(g_lock);
    for (i = 0; i < number; i++)
        if (obj == g_stack[i]) break;   // Found!!!
    if (i == number) i = -1;            // Not found… Return -1 to caller
    unlock(g_lock);
    return i;
}

int find( Type& obj ) {
    return find( obj, g_counter );
}

Where is Waldo?

Page 5: What is a Data Race?

Can You Find the Race?

The code is the same as on page 4. The slide's two annotations mark the racing accesses to g_counter:

write: g_counter = 0; in popAll( ), performed while holding g_lock
read: g_counter in find( Type& obj ), evaluated as a call argument before find( obj, number ) acquires g_lock

A similar problem was found in java.util.Vector.
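One way to close this race is to read the element count under the same lock that guards the scan. The sketch below illustrates that idea and is not the author's fix: `LockedStack`, its `int` elements, and the `std::vector` storage are hypothetical stand-ins for the slide's `Type`, `g_stack`, `g_counter`, and `g_lock`.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical replacement for the slide's globals: the element count
// (items_.size()) is only ever read or written while m_ is held.
class LockedStack {
public:
    void push(int value) {
        std::lock_guard<std::mutex> guard(m_);
        items_.push_back(value);
    }

    void popAll() {
        std::lock_guard<std::mutex> guard(m_);
        items_.clear();  // plays the role of g_counter = 0
    }

    // Returns the index of value, or -1 if it is not on the stack.
    // The count is read inside the critical section, not by the caller.
    int find(int value) const {
        std::lock_guard<std::mutex> guard(m_);
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (items_[i] == value) return static_cast<int>(i);
        }
        return -1;
    }

private:
    mutable std::mutex m_;
    std::vector<int> items_;
};
```

The design point is that no caller ever passes a previously read count into the locked region, so a concurrent `popAll()` cannot invalidate it.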

Page 6: What is a Data Race?

Detecting Data Races?

NP-hard [Netzer & Miller 1990]
    Input size = # instructions performed
    Even for 3 threads only
    Even with no loops/recursion

Execution orders/scheduling: (#threads)^thread_length
# inputs
Detection code's side effects
Weak memory, instruction reordering, atomicity

Page 7: What is a Data Race?

Motivation

Run-time framework goals
    Collect a complete trace of a program's user-mode execution
    Keep the tracing overhead for both space and time low
    Re-simulate the traced execution deterministically based on the collected trace, with full fidelity down to the instruction level
    Full fidelity: user mode only, no tracing of the kernel, only user-mode I/O callbacks

Advantages
    Complete program trace that can be analyzed from multiple perspectives (replay analyzers: debuggers, locality, etc.)
    Trace can be collected on one machine and replayed on other machines (or analyzed live by streaming)

Challenges: Trace Size and Performance

Page 8: What is a Data Race?

Original Record-Replay Approaches

InstantReplay ’87
    Records the order of memory accesses
    Overhead may affect program behavior
RecPlay ’00
    Records only synchronizations
    Not deterministic if the program has data races
Netzer ’93
    Records an optimal trace
    Too expensive to keep track of all memory locations
Bacon & Goldstein ’91
    Records memory bus transactions with hardware
    High logging bandwidth

Page 9: What is a Data Race?

Motivation

Increasing use and development for multi-core processors
MT program behavior is non-deterministic
To effectively debug software, developers must be able to replay executions that exhibit concurrency bugs
Shared-memory updates happen in a different order

Page 10: What is a Data Race?

Related Concepts

Runtime interpretation/translation of binary instructions
    Requires no static instrumentation or special symbol information
    Handles dynamically generated code and self-modifying code
    Recording/logging: ~100-200x

More recent logging
    Proposed hardware support (for MT domain)
    FDR (Flight Data Recorder)
    BugNet (cache bits set on first load)
    RTR (Regulated Transitive Reduction)
    DeLorean (ISCA 2008, chunks of instructions)
    Strata (time layer across all the logs for the running threads)
    iDNA (Diagnostic infrastructure using Nirvana, Microsoft)

Page 11: What is a Data Race?

Deterministic Replay

Re-execute the exact same sequence of instructions as recorded in a previous run

Single-threaded programs
    Record load values needed for reproducing the behavior of a run (Load Log)
    Registers updated by system calls and signal handlers (Reg Log)
    Output of special instructions: RDTSC, CPUID (Reg Log)
    System calls (virtualization: cloning arguments, updates)
    Checkpointing (log summary ~10 million)

Multi-threaded programs
    Log interleaving among threads (shared-memory update ordering: SMO Log)

Page 12: What is a Data Race?

PinSEL – System Effect Log (SEL)

Logging program load values needed for deterministic replay:
– First access from a memory location
– Values modified by the system (system effect) and read by program
– Machine and time sensitive instructions (cpuid, rdtsc)

[Figure: program execution, with each load marked "Logged" or "Not Logged"]
    Store A; (A ← 111)
    Store B; (B ← 55)
    Load C; (C = 9)
    Load D; (D = 10)
    system call: modifies location (B -> 0) and (C -> 99)
    Load A; (A = 111)
    Load B; (B = 0)
    Load C; (C = 99)
    Load D; (D = 10)

• Trace size is ~4-5 bytes per instruction
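The first two logging rules can be sketched as a small recorder that logs a load only when the replayer could not reproduce its value on its own. This is an illustrative model, not PinSEL's implementation: `SelRecorder`, `system_write`, and `record_slide_trace` are hypothetical names, and the choice of which loads end up logged is derived from the rules above rather than read off the figure.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of a System Effect Log recorder (not PinSEL's real API).
struct SelRecorder {
    std::map<char, int> memory;  // actual memory contents
    std::map<char, int> known;   // values the replayer can reproduce without the log
    std::vector<std::pair<char, int>> log;  // (location, value) of each logged load

    // Program store: replay re-executes it, so the value becomes known.
    void store(char loc, int value) {
        memory[loc] = value;
        known[loc] = value;
    }

    // Write performed by the system during a system call: invisible to replay.
    void system_write(char loc, int value) { memory[loc] = value; }

    // Program load: logged on first access or when the system changed the value.
    int load(char loc) {
        const int value = memory[loc];
        const auto it = known.find(loc);
        if (it == known.end() || it->second != value) {
            log.emplace_back(loc, value);
        }
        known[loc] = value;
        return value;
    }
};

// Replays the slide's sequence of operations and returns the resulting log.
std::vector<std::pair<char, int>> record_slide_trace() {
    SelRecorder r;
    r.memory['C'] = 9;   // contents that exist before the trace starts
    r.memory['D'] = 10;
    r.store('A', 111);
    r.store('B', 55);
    r.load('C');             // first access to C
    r.load('D');             // first access to D
    r.system_write('B', 0);  // system call modifies B and C
    r.system_write('C', 99);
    r.load('A');             // program's own store, still valid
    r.load('B');             // changed by the system
    r.load('C');             // changed by the system
    r.load('D');             // already seen, unchanged
    return r.log;
}
```

Under this model four of the six loads are logged: the first reads of C and D, and the reads of B and C that follow the system call.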

Page 13: What is a Data Race?

Optimization: Trace select reads

Observation: Hardware caches eliminate most off-chip reads
Optimize logging:
    Logger and replayer simulate identical cache memories
    Simple cache (the memory copy structure) decides which values to log. No tags or valid bits to check. If the values mismatch, they are logged.

Average trace size is <1 bit per instruction

i = 1;
for (j = 0; j < 10; j++)
{
    i = i + j;
}
k = i; // value read is 46
System_call();
k = i; // value read is 0 (not predicted)

The only read that is not predicted, and is therefore logged, is the one that follows the system call
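The prediction scheme can be sketched as a recorder and a replayer that keep identical tagless caches, with the log holding only mispredicted loads. This is a toy model under stated assumptions, not the PinSEL/PinPLAY code: `Recorder`, `Replayer`, `run_program`, the sizes `kMemWords` and `kCacheSlots`, and the idea that the system call zeroes `i` are all hypothetical choices made to reproduce the slide's loop.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kMemWords = 16;   // assumed size of the toy address space
constexpr std::size_t kCacheSlots = 4;  // assumed size of the tagless cache
constexpr std::size_t kAddrI = 0;       // assumed address of the variable i

using LoadLog = std::vector<std::pair<std::size_t, int>>;  // (load index, value)

struct Recorder {
    std::array<int, kMemWords> memory{};
    std::array<int, kCacheSlots> cache{};  // no tags, no valid bits
    LoadLog log;
    std::size_t loads = 0;

    int load(std::size_t addr) {
        const int value = memory[addr];
        int& predicted = cache[addr % kCacheSlots];
        if (predicted != value) log.emplace_back(loads, value);  // mismatch: log it
        predicted = value;
        ++loads;
        return value;
    }
    void store(std::size_t addr, int value) {
        memory[addr] = value;
        cache[addr % kCacheSlots] = value;
    }
    // Stand-in for a system call whose side effect is to overwrite i.
    void system_call() { memory[kAddrI] = 0; }
};

struct Replayer {
    std::array<int, kCacheSlots> cache{};  // evolves exactly like the recorder's
    LoadLog log;
    std::size_t next = 0;
    std::size_t loads = 0;

    int load(std::size_t addr) {
        int& slot = cache[addr % kCacheSlots];
        // A log entry for this load index overrides the predicted value.
        if (next < log.size() && log[next].first == loads) slot = log[next++].second;
        ++loads;
        return slot;
    }
    void store(std::size_t addr, int value) { cache[addr % kCacheSlots] = value; }
    void system_call() {}  // not re-executed; its effect arrives through the log
};

// The slide's program; returns the two values assigned to k.
template <class Mem>
std::vector<int> run_program(Mem& mem) {
    std::vector<int> k_values;
    mem.store(kAddrI, 1);
    for (int j = 0; j < 10; ++j) mem.store(kAddrI, mem.load(kAddrI) + j);
    k_values.push_back(mem.load(kAddrI));
    mem.system_call();
    k_values.push_back(mem.load(kAddrI));
    return k_values;
}
```

Running `run_program` on a `Recorder`, copying its `log` into a `Replayer`, and running the program again reproduces the same two values of `k` from a one-entry log.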

Page 14: What is a Data Race?

Example Overhead

PinSEL and PinPLAY
Initial work (2006) with single-threaded programs:
    SPEC2000 ref runs: 130x slowdown for PinSEL and ~80x for PinPLAY (w/o in-lining)
    Working with a subset of SPLASH2 benchmarks: 230x slowdown for PinSEL

Now: Geo-mean SPEC2006
    Pin 1.4x
    Logger 83.6x
    Replayer 1.4x

Page 15: What is a Data Race?

Example: Microsoft iDNA Trace Writer Performance

Application   Simulated instructions   Trace file   Trace file          Native           Execution time   Execution
              (millions)               size         bits/instruction    execution time   while tracing    overhead
Gzip          24,097                   245 MB       0.09                11.7s            187s             15.98
Excel         1,781                    99 MB        0.47                18.2s            105s             5.76
PowerPoint    7,392                    528 MB       0.60                43.6s            247s             5.66
IE            116                      5 MB         0.50                0.499s           6.94s            13.90
Vulcan        2,408                    152 MB       0.53                2.74s            46.6s            17.01
Satsolver     9,431                    1300 MB      1.16                9.78s            127s             12.98

• Memchecker and valgrind are in the 30-40x range on CPU 2006
• iDNA ~11x (does not log shared-memory dependences explicitly)
• Uses a sequential number for every lock-prefixed memory operation: offline data race analysis

Page 16: What is a Data Race?

Logging Shared Memory Ordering (Cristiano’s PinSEL/PLAY Overview)

Emulation of Directory Based Cache Coherence
    Identifies RAW, WAR, WAW dependences
    Indexed by hashing effective address
    Each entry represents an address range

[Figure: program execution (Store A, Load B) → hash → Directory of Dir Entries]

Page 17: What is a Data Race?

Directory Entries

Every DirEntry maintains:
    Thread id of the last_writer
    A timestamp: the # of memory references the thread has executed
    Vector of timestamps of last access for each thread to that entry
On loads: update the timestamp for the thread in the entry
On stores: update the timestamp and the last_writer fields

[Figure: program execution and directory]
    Thread T1: 1: Store A, 2: Load A, 3: Store F
    Thread T2: 1: Load F, 2: Store A, 3: Load F
    Directory: DirEntry [A:D] and DirEntry [E:H], each with a last writer id and a vector of timestamps (T1, T2)

Page 18: What is a Data Race?

Detecting Dependences

RAW dependency between threads T and T’ is established if:
    T executes a load that maps to the directory entry A
    T’ is the last_writer for the same entry

WAW dependency between T and T’ is established if:
    T executes a store that maps to the directory entry A
    T’ is the last_writer for the same entry

WAR dependency between T and T’ is established if:
    T executes a store that maps to the directory entry A
    T’ has accessed the same entry in the past and T is not the last_writer

Page 19: What is a Data Race?

Example

[Figure: program execution and directory]
    Thread T1: 1: Store A, 2: Load A, 3: Store F
    Thread T2: 1: Load F, 2: Store A, 3: Load F
    Directory: DirEntry [A:D] and DirEntry [E:H], each showing the last_writer and the last access to the DirEntry by T1 and T2

SMO logs:
    WAW: T2 2  T1 1
        Thread T2 cannot execute memory reference 2 until T1 executes its memory reference 1
    RAW: T1 2  T2 2
        Thread T1 cannot execute memory reference 2 until T2 executes its memory reference 2
    WAR: T1 3  T2 3
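The directory rules can be sketched as a small emulator that assigns per-thread timestamps and emits one log record per detected dependence. This is a simplified sketch, not the PinSEL/PinPLAY implementation: `Directory`, `DirEntry`, `Dependence`, and `run_example` are hypothetical names, thread ids 0 and 1 stand for T1 and T2, each entry is assumed to cover four consecutive letters, and the global interleaving in `run_example` is an assumption chosen to be consistent with the SMO logs. A WAR record is emitted only for threads other than the last writer, so that one edge is not reported twice.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

constexpr int kThreads = 2;    // assumption: T1 has id 0, T2 has id 1
constexpr int kNoWriter = -1;

struct DirEntry {
    int last_writer = kNoWriter;
    std::array<unsigned, kThreads> last_access{};  // 0 means "never accessed"
};

// Reads as: (thread, ref) cannot execute until (on_thread, on_ref) has executed.
struct Dependence {
    std::string kind;  // "RAW", "WAW", or "WAR"
    int thread;
    unsigned ref;
    int on_thread;
    unsigned on_ref;
};

class Directory {
public:
    // Records one memory reference by thread t to address addr ('A', 'B', ...).
    void access(int t, char addr, bool is_store) {
        const unsigned ts = ++refs_[t];  // timestamp = # of references so far
        DirEntry& e = entries_[static_cast<std::size_t>(addr - 'A') / 4 % entries_.size()];
        const bool other_writer = e.last_writer != kNoWriter && e.last_writer != t;
        if (!is_store) {
            if (other_writer) add("RAW", t, ts, e.last_writer, e);
        } else {
            if (other_writer) add("WAW", t, ts, e.last_writer, e);
            if (e.last_writer != t) {
                for (int other = 0; other < kThreads; ++other) {
                    if (other == t || other == e.last_writer) continue;
                    if (e.last_access[other] != 0) add("WAR", t, ts, other, e);
                }
            }
            e.last_writer = t;
        }
        e.last_access[t] = ts;
    }

    const std::vector<Dependence>& log() const { return log_; }

private:
    void add(const char* kind, int t, unsigned ts, int other, const DirEntry& e) {
        log_.push_back(Dependence{kind, t, ts, other, e.last_access[other]});
    }

    std::array<DirEntry, 2> entries_{};  // [A:D] and [E:H]
    std::array<unsigned, kThreads> refs_{};
    std::vector<Dependence> log_;
};

// One interleaving of the slide's two threads (assumed, not given on the slide).
std::vector<Dependence> run_example() {
    Directory d;
    d.access(0, 'A', true);   // T1 ref 1: Store A
    d.access(1, 'F', false);  // T2 ref 1: Load F
    d.access(1, 'A', true);   // T2 ref 2: Store A
    d.access(0, 'A', false);  // T1 ref 2: Load A
    d.access(1, 'F', false);  // T2 ref 3: Load F
    d.access(0, 'F', true);   // T1 ref 3: Store F
    return d.log();
}
```

Under that interleaving the emulator emits three records, one each of WAW, RAW, and WAR, matching the three SMO log entries.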

Page 20: What is a Data Race?

Ordering Memory Accesses (Reducing log size)

Preserving order will reproduce execution
a→b: “a happens-before b”
Ordering is transitive: a→b, b→c means a→c
Two instructions must be ordered if:
    they both access the same memory, and
    one of them is a write

Page 21: What is a Data Race?

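Constraints: Enforcing Order

[Figure: P1 executes a then b; P2 executes c then d; label: overconstrained]

To guarantee a→d, any one of these constraints is enough, given program order within P1 and P2:
    a→d
    b→d
    a→c
    b→c

Suppose we need b→c:
    b→c is necessary
    a→d is redundant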

Page 22: What is a Data Race?

Problem Formulation

Reproduce exact same conflicts: no more, no less

[Figure: two panels, Recording and Replay, connected by a Log. Each panel shows Thread I and Thread J executing the instructions ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D. Conflicts are drawn in red, dependences in black.]

Page 23: What is a Data Race?

Log All Conflicts

Detect conflicts
Assign IC (logical timestamps)
Write log

[Figure: Replay panel with Thread I and Thread J, instructions numbered 1-6 in each thread: ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D]

Dependence Log (16 bytes per dependence)
    Log J: 2→3  1→4  3→5  4→6
    Log I: 2→3
Log Size: 5*16 = 80 bytes (10 integers)

But too many conflicts

Page 24: What is a Data Race?

Netzer’s Transitive Reduction

[Figure: Replay panel with Thread I and Thread J, instructions numbered 1-6 in each thread: ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D]

TR Reduced Log
    Log J: 2→3  3→5  4→6
    Log I: 2→3
Log Size: 64 bytes (8 integers)
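The reduction from the full log to the TR reduced log can be sketched for the two-thread case. This is a simplified illustration, not Netzer's algorithm as published: it only drops an edge when a single other edge in the same direction, combined with program order, already implies it. `Edge`, `reduce`, and `log_bytes` are hypothetical names, and the 16 bytes per dependence is taken from the log sizes on these slides.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Edge {src, dst}: instruction src of the other thread must execute before
// instruction dst of the thread that owns this log.
using Edge = std::pair<int, int>;

constexpr std::size_t kBytesPerDependence = 16;  // per the slides' log sizes

// Drops every edge that another edge already implies through program order:
// {s2, d2} implies {s1, d1} when s1 <= s2 (s1 runs no later than s2 in the
// source thread) and d2 <= d1 (d2 runs no later than d1 in the destination).
std::vector<Edge> reduce(const std::vector<Edge>& log) {
    std::vector<Edge> kept;
    for (std::size_t i = 0; i < log.size(); ++i) {
        bool implied = false;
        for (std::size_t j = 0; j < log.size() && !implied; ++j) {
            if (i == j) continue;
            const bool covers =
                log[j].first >= log[i].first && log[j].second <= log[i].second;
            const bool identical = log[j] == log[i];
            // Of two identical edges, keep only the first occurrence.
            implied = covers && (!identical || j < i);
        }
        if (!implied) kept.push_back(log[i]);
    }
    return kept;
}

// Size in bytes of a log holding the given number of dependences.
std::size_t log_bytes(std::size_t dependences) {
    return dependences * kBytesPerDependence;
}
```

Applied to Log J = {2→3, 1→4, 3→5, 4→6}, `reduce` drops 1→4, because 1 precedes 2 in Thread I and 3 precedes 4 in Thread J, so 2→3 already enforces it. With the single entry of Log I that leaves four dependences, or 64 bytes.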

Page 25: What is a Data Race?

RTR (Regulated Transitive Reduction): Stricter Dependences to Aid Vectorization

[Figure: Replay panel with Thread I and Thread J, instructions numbered 1-6 in each thread: ld A, st B, st C, sub, ld B, add, st C, ld B, st A, st C, ld D, st D; labels: stricter, Reduced]

New Reduced Log
    Log J: 2→3  4→5
    Log I: 2→3
Log Size: 48 bytes (6 integers)

4% Overhead RTR+FDR (simulated on GEMs).2 MB/core/second logging (Apache)

