
Date post: 24-Aug-2020
Blowing up the (C++11) atomic barrier Optimizing C++11 atomics in LLVM Robin Morisset, Intern at Google
Page 1

Blowing up the (C++11) atomic barrier
Optimizing C++11 atomics in LLVM

Robin Morisset, Intern at Google

Page 2

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 3

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 4

Thread 1          Thread 2
x <- 1;           y <- 1;
print y;          print x;

Can this possibly print 0-0?

Page 5

Thread 1          Thread 2
print y;          print x;
x <- 1;           y <- 1;

Can this possibly print 0-0?

Yes, if your compiler reorders accesses.

Page 6

Thread 1          Thread 2
x <- 1;           y <- 1;
mfence;           mfence;
print y;          print x;

Can this possibly print 0-0? Yes on x86 without the fences: each thread needs an mfence.

mfence: flush your (FIFO) store buffer

Page 7

Thread 1          Thread 2
x <- 42;          if (ready)
ready <- 1;           print x;

Can this possibly print 0?

Page 8

Thread 1          Thread 2
x <- 42;          if (ready)
dmb ish;              print x;
ready <- 1;

Can this possibly print 0? Yes on ARM

dmb ish: flush your (non-FIFO) store buffer

Page 9

Thread 1          Thread 2
x <- 42;          if (ready)
dmb ish;              dmb ish;
ready <- 1;           print x;

Can this possibly print 0? Yes on ARM: it needs 2 fences to prevent it.

dmb ish in Thread 1: flush your (non-FIFO) store buffer
dmb ish in Thread 2: don’t speculate reads across it

Page 10

C11/C++11 memory model
Doing it portably

● data race (dynamic) = undefined behavior

● no data race (using mutexes) = intuitive behavior (“Sequentially consistent”)

● for lock-free code: atomic accesses

Page 11

Sequentially consistent

Thread 1                     Thread 2
x.store(1, seq_cst);         y.store(1, seq_cst);
print(y.load(seq_cst));      print(x.load(seq_cst));
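The slide's pseudo-code can be written as real C++ roughly as follows. This is a minimal sketch, not code from the talk: the function name `store_buffering_seq_cst` and the use of lambdas are assumptions made for illustration. With `seq_cst` on all four accesses, the 0-0 outcome from the earlier slides is ruled out by the language itself.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs the two-thread test once and returns what each thread observed.
// The function name is hypothetical, chosen for this sketch only.
std::pair<int, int> store_buffering_seq_cst() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    // All seq_cst operations share one total order, so at least one of the
    // loads comes after the other thread's store: (0, 0) cannot be returned.
    return {r1, r2};
}
```

Any of (0, 1), (1, 0) and (1, 1) can still come back, depending on scheduling.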

Page 12

Release/acquire

Thread 1                       Thread 2
x = 42;                        if (ready.load(acquire))
ready.store(1, release);           print(x);
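A compilable version of this message-passing pattern might look like the sketch below. The name `message_passing` is an assumption, and the consumer spins until the flag is set (the slide uses a single `if`) so that the function always has a value to return.

```cpp
#include <atomic>
#include <thread>

// Returns the value of x seen by the consumer once it has observed `ready`.
// Function and variable names are hypothetical, for illustration only.
int message_passing() {
    int x = 0;                        // plain, non-atomic payload
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        x = 42;
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // The acquire load that reads `true` synchronizes with the release
        // store, so the write to x happens-before the read below.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = x;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Because the read of `x` is ordered after the write, there is no data race even though `x` is not atomic.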


Page 15

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 16

Compiler optimizations?

Before:

void foo(int *x, int n) {
    for (int i = 0; i < n; ++i) {
        *x *= 42;
    }
}

After LICM:

void foo(int *x, int n) {
    int tmp = *x;
    for (int i = 0; i < n; ++i) {
        tmp *= 42;
    }
    *x = tmp;
}

Page 17

Compiler optimizations?

Before:

void foo(int *x, int n) {
}

After LICM:

void foo(int *x, int n) {
    int tmp = *x;
    *x = tmp;
}

Page 18

Compiler optimizations?

Before:

void foo(int *x, int n) {
}

After LICM:

void foo(int *x, int n) {
    int tmp = *x;
    *x = tmp;
}

++(*x); // in another thread...

Page 19

Never introduce a store where there was none
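One way to keep the register promotion while respecting this rule is to guard it so the store only happens when the original loop would have stored at least once. The sketch below is an illustration under that assumption, not a description of what LLVM emits; `foo_promoted` is a hypothetical name.

```cpp
// Promotes *x to a local only when the loop body runs at least once.
// When n <= 0 the original function never touches *x, so neither does this.
void foo_promoted(int *x, int n) {
    if (n <= 0) {
        return;                // no load, no store: nothing is introduced
    }
    int tmp = *x;              // the original would have loaded *x anyway
    for (int i = 0; i < n; ++i) {
        tmp *= 42;
    }
    *x = tmp;                  // the original would have stored at least once
}
```

With the guard, a concurrent `++(*x)` in another thread is only affected in executions where the original program already wrote to `*x`.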

Page 20

Dead store elimination?

x = 42;
x = 43;

Page 21

Dead store elimination?

x = 42;
flag1.store(true, release);
while (!flag2.load(acquire))
    continue;
x = 43;

Page 22

Dead store elimination?

Thread 1                          Thread 2
x = 42;                           while (!flag1.load(acquire))
flag1.store(true, release);           continue;
while (!flag2.load(acquire))      print(x);
    continue;                     flag2.store(true, release);
x = 43;

Page 23

Dead store elimination?

Thread 1                          Thread 2
x = 42;                           print(x);
while (!flag2.load(acquire))      flag2.store(true, release);
    continue;
x = 43;

Race!

Page 24

Dead store elimination?

Thread 1                          Thread 2
x = 42;                           while (!flag1.load(acquire))
flag1.store(true, release);           continue;
x = 43;                           print(x);

Race!

Page 25

Anything can happen to memory between a release and an acquire
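The two-flag example from the earlier slide can be turned into a runnable sketch to show why the first store is not dead when a release is followed by an acquire. The function name `observe_first_store` is an assumption made for this illustration.

```cpp
#include <atomic>
#include <thread>

// Returns the value of x that the second thread reads between the two stores.
// Names are hypothetical; the structure mirrors the two-thread slide.
int observe_first_store() {
    int x = 0;
    std::atomic<bool> flag1{false}, flag2{false};
    int seen = -1;

    std::thread t1([&] {
        x = 42;                                        // looks dead, but is not
        flag1.store(true, std::memory_order_release);
        while (!flag2.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        x = 43;
    });
    std::thread t2([&] {
        while (!flag1.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = x;                                      // race-free read of 42
        flag2.store(true, std::memory_order_release);
    });

    t1.join();
    t2.join();
    return seen;
}
```

The program is race-free and must return 42, so removing `x = 42` would change its behavior. Drop either the release or the acquire from thread 1 and the read of `x` becomes a data race, which is what lets the compiler eliminate the store in those cases.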

Page 26

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 27

Fence elimination

int t = y.load(acquire);      ldr r0, [r0]
                              dmb ish
x.store(1, release);          dmb ish
                              str r2, [r1]

Page 28

[CFG figure] 2 fences on main path

ldr …
dmb ish
dmb ish
str …
str …

Page 29

[CFG figure] 1 fence on main path

ldr …
dmb ish
str …
dmb ish
str …

Page 30

[CFG figure] 1 fence on main path

ldr …
dmb ish
str …
str …
dmb ish

Pages 31-36

Build graph from CFG
Identify sources/sinks
Annotate with frequency
Find min-cut
Move fences

[Figure: flow graph from Source to Sink through the ldr … node and the two str … nodes, with edge weights 5, 5, 2, 2 and ∞. 2 + 5 = 7 is minimum.]

After moving the fences:

ldr …
dmb ish
str …
dmb ish
str …
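The min-cut step can be sketched with a textbook max-flow computation (Edmonds-Karp here, chosen for brevity; the talk does not say which algorithm is used). The graph shape in `example_fence_cut` is an assumption, since the figure did not survive extraction: it only reuses the edge weights 5, 5, 2, 2 and ∞ from the slide.

```cpp
#include <algorithm>
#include <climits>
#include <queue>
#include <vector>

// Edmonds-Karp max-flow on a dense capacity matrix. By max-flow/min-cut
// duality, the returned flow value equals the weight of a minimum cut.
long long min_cut_weight(std::vector<std::vector<long long>> cap,
                         int source, int sink) {
    const int n = static_cast<int>(cap.size());
    long long total = 0;
    for (;;) {
        // Breadth-first search for an augmenting path in the residual graph.
        std::vector<int> parent(n, -1);
        parent[source] = source;
        std::queue<int> work;
        work.push(source);
        while (!work.empty() && parent[sink] == -1) {
            int u = work.front();
            work.pop();
            for (int v = 0; v < n; ++v) {
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    work.push(v);
                }
            }
        }
        if (parent[sink] == -1) {
            return total;                 // no augmenting path left
        }
        long long push = LLONG_MAX;       // bottleneck capacity on the path
        for (int v = sink; v != source; v = parent[v]) {
            push = std::min(push, cap[parent[v]][v]);
        }
        for (int v = sink; v != source; v = parent[v]) {
            cap[parent[v]][v] -= push;
            cap[v][parent[v]] += push;
        }
        total += push;
    }
}

// Hypothetical graph: one load feeding a hot path (frequency 5) and a cold
// path (frequency 2), each ending in a store.
long long example_fence_cut() {
    const long long INF = 1000000;        // stands in for the ∞ edges
    const int Source = 0, Ldr = 1, Hot = 2, Cold = 3,
              StrHot = 4, StrCold = 5, Sink = 6, Count = 7;
    std::vector<std::vector<long long>> cap(
        Count, std::vector<long long>(Count, 0));
    cap[Source][Ldr] = INF;
    cap[Ldr][Hot] = 5;
    cap[Hot][StrHot] = 5;
    cap[Ldr][Cold] = 2;
    cap[Cold][StrCold] = 2;
    cap[StrHot][Sink] = INF;
    cap[StrCold][Sink] = INF;
    return min_cut_weight(cap, Source, Sink);
}
```

On this assumed graph the cheapest cut has weight 2 + 5 = 7, matching the number on the slide; the cut edges are where the fences would be placed.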

Page 37

while (flag.load(acquire)) {}

.loop:
    ldr r0, [r1]
    dmb ish
    bnz .loop

Page 38

while (flag.load(acquire)) {}

.loop:
    ldr r0, [r1]
    bnz .loop
    dmb ish
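At the source level, the same idea can be expressed by spinning with relaxed loads and issuing a single acquire fence after the loop. This is a sketch of the equivalent C++ idiom, not the compiler pass itself; `wait_while_set` and `hoisted_fence_demo` are hypothetical names.

```cpp
#include <atomic>
#include <thread>

// Spins with relaxed loads, then orders everything after the loop with one
// acquire fence instead of paying for acquire semantics on every iteration.
void wait_while_set(const std::atomic<bool>& flag) {
    while (flag.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
}

// Small driver: the writer publishes `data`, then clears the flag.
int hoisted_fence_demo() {
    int data = 0;
    std::atomic<bool> busy{true};

    std::thread writer([&] {
        data = 42;
        busy.store(false, std::memory_order_release);
    });

    wait_while_set(busy);
    int seen = data;   // ordered after the writer's store by the fence
    writer.join();
    return seen;
}
```

The relaxed load that finally reads `false`, followed by the acquire fence, synchronizes with the release store, so the read of `data` is still race-free.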

Page 39

.loop:
    ldr r0, [r1]
    dmb ish
    bnz .loop

[Figure: flow graph Source, memory access, Sink, with edge frequencies 98, 100 and 2.]

Page 40

.loop:
    ldr r0, [r1]
    bnz .loop
    dmb ish

[Figure: flow graph Source, memory access, Sink, with edge frequencies 98, 100 and 2.]

Page 41

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 42

x.load(release)?

Page 43

x.load(release)?

x.fetch_add(0, release)

Page 44

x.load(release)?

x.fetch_add(0, release)

x86:
    mov %eax, $0
    lock
    xadd (%ebx), %eax

Page 45

x.load(release)?

x.fetch_add(0, release)

x86, before:
    mov %eax, $0
    lock
    xadd (%ebx), %eax

x86, after:
    mfence
    mov %eax, (%ebx)

7200% speedup for a seqlock*
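For context, a seqlock reader is where `fetch_add(0, release)` typically shows up: the reader needs its data loads to stay before the second read of the sequence number. The sketch below is an illustration under stated assumptions (a single writer, two `int` fields, the hypothetical type `SeqLock`), not the benchmark from the talk.

```cpp
#include <atomic>
#include <utility>

// Minimal single-writer seqlock. All names are hypothetical.
struct SeqLock {
    std::atomic<unsigned> seq{0};     // odd while a write is in progress
    std::atomic<int> a{0}, b{0};      // protected data, accessed relaxed

    // Assumes at most one writer thread at a time.
    void write(int va, int vb) {
        unsigned s = seq.fetch_add(1, std::memory_order_acq_rel);
        a.store(va, std::memory_order_relaxed);
        b.store(vb, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);
    }

    std::pair<int, int> read() {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            int va = a.load(std::memory_order_relaxed);
            int vb = b.load(std::memory_order_relaxed);
            // "Load with release semantics": a read-modify-write that adds 0,
            // so the data loads above cannot be reordered after it.
            unsigned s1 = seq.fetch_add(0, std::memory_order_release);
            if (s0 == s1 && (s0 & 1u) == 0) {
                return {va, vb};      // no writer interfered
            }
        }
    }
};
```

Every `read()` executes one such `fetch_add(0, release)`, which is why turning it into a fence plus a plain load matters so much for this pattern.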

Page 46

                         Power        ARM
x.store(0, release)      hwsync       dmb sy
                         stw …        str …
x.load(acquire)          lwz …        ldr …
                         hwsync       dmb sy

Page 47

                         Power        ARM
x.store(0, release)      lwsync       dmb ish
                         stw …        str …
x.load(acquire)          lwz …        ldr …
                         lwsync       dmb ish

Page 48

                         Power        ARM (Swift)
x.store(0, release)      lwsync       dmb ishst
                         stw …        str …
x.load(acquire)          lwz …        ldr …
                         lwsync       dmb ish

Pages 49-52

Power

x.store(2, relaxed)

Shuffling:
    rlwinm r2, r3, 3, 27, 28
    li r4, 2
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3

Loop:
LBB4_1:
    lwarx r5, 0, r2        (load linked)
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2       (store conditional)
    bne cr0, LBB4_1

Page 53

Power

x.store(2, relaxed)      li r2, 2
                         stb r2, 0(r3)

Page 54

x86

x.store(2, relaxed)      mov %eax, $2
                         mov (%ebx), %eax

Page 55

x86

x.store(2, relaxed)      mov (%ebx), $2

Page 56

Background: C++11 atomics

Optimizing around atomics

Fence elimination

Miscellaneous optimizations

Further work: Problems with atomics

Page 57

Relaxed attribute

Thread 1                      Thread 2
print(y.load(relaxed));       print(x.load(relaxed));
x.store(1, relaxed);          y.store(1, relaxed);

Page 58

Relaxed attribute

Thread 1                      Thread 2
print(y.load(relaxed));       print(x.load(relaxed));
x.store(1, relaxed);          y.store(1, relaxed);

Can print 1-1
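In compilable form the test looks roughly like the sketch below (the name `load_buffering_relaxed` is an assumption). The C++11 model permits the (1, 1) result; whether a given run shows it depends on the compiler and the hardware, and x86 hardware on its own does not reorder a load with a later store.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs the two-thread relaxed test once and returns what each thread read.
// The function name is hypothetical, chosen for this sketch only.
std::pair<int, int> load_buffering_relaxed() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        r1 = y.load(std::memory_order_relaxed);
        x.store(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = x.load(std::memory_order_relaxed);
        y.store(1, std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    // Allowed results: (0, 0), (0, 1), (1, 0) and, per the standard, (1, 1).
    return {r1, r2};
}
```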

Page 59

Relaxed attribute

Thread 1                      Thread 2
t_y = y.load(relaxed);        t_x = x.load(relaxed);
x.store(t_y, relaxed);        y.store(t_x, relaxed);

x = y = ???

Page 60

Relaxed attribute

Thread 1                      Thread 2
if(y.load(relaxed))           if(x.load(relaxed))
x.store(1, relaxed);          y.store(1, relaxed);
print("foo");                 print("bar");

Can print foobar!

Page 61

Consume attribute

Thread 1                  Thread 2
*x = 42;                  t = x.load(acquire);
x.store(1, release);      print(*t);

Page 62

Consume attribute

Thread 1                  Thread 2
*x = 42;                  t = x.load(consume);
x.store(1, release);      print(*t);

Ordered

Page 63

Consume attribute

Thread 1                  Thread 2
*x = 42;                  t = x.load(consume);
x.store(1, release);      print(*y);

Unordered!

Page 64

Consume attribute

Thread 1                  Thread 2
*x = 42;                  t = x.load(consume);
x.store(1, release);      print(*(y + t - t));

???
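The "Ordered" case can be written in real C++ as a pointer publication, sketched below. The names (`publish_and_consume`, `payload`, `ptr`) are assumptions, and the slide's integer `x` is replaced by an `std::atomic<int*>` so that the dereference is well-typed. In practice compilers such as GCC and Clang treat `memory_order_consume` as `memory_order_acquire`, so this sketch shows the intended semantics rather than a cheaper code sequence.

```cpp
#include <atomic>
#include <thread>

// Publishes a pointer with release and reads it with consume.
// Only accesses that carry a dependency on the loaded pointer are ordered.
int publish_and_consume() {
    int payload = 0;
    std::atomic<int*> ptr{nullptr};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        ptr.store(&payload, std::memory_order_release);
    });
    std::thread consumer([&] {
        int* t = nullptr;
        while ((t = ptr.load(std::memory_order_consume)) == nullptr) {
            std::this_thread::yield();
        }
        seen = *t;   // address depends on t, so this read is ordered
    });

    producer.join();
    consumer.join();
    return seen;
}
```

A read through an unrelated pointer in the consumer would carry no dependency on `t` and would get no ordering from the consume load, which is the "Unordered" case above.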

Page 65

Conclusion

● Atomics = portable lock-free code in C11/C++11

● Tricky to compile, but can be done

● Lots of open questions

Page 66

Questions?

