Non-blocking Michael-Scott queue algorithm

Post on 22-Jan-2018

Alexey Fyodorov, JUG.ru Group

What is this talk about?

• Programming

• Algorithms

• Concurrency

Are you sure you need it?

What is this talk about?

For concurrency beginners

Sorry. Please go to another room.

For non-blocking programming beginners

A short introduction

For advanced concurrent programmers

CAS-based queue algorithm

You have another room!

12:10 Non-blocking Michael-Scott queue algorithm, Alexey Fyodorov

Easily scale enterprise applications using distributed data grids, Ondrej Mihaly

Main Models

Shared Memory (Concurrent Programming): write + read. Similar to how we program it.

Messaging (Distributed Programming): send + onReceive. Similar to how real hardware works.

Advantages of Parallelism

• Resource utilization: utilization of several cores/CPUs, aka PERFORMANCE

• Simplicity: complexity goes to magic frameworks (ArrayBlockingQueue, ConcurrentHashMap, Akka)

• Async handling: responsive services, responsive UI

Disadvantages of Locking

• Deadlocks

• Priority inversion

• Reliability: what happens if the lock owner dies?

• Performance: the scheduler can preempt the lock owner, and there is no parallelism inside a critical section!

Amdahl’s Law

α: non-parallelizable part of the computation

1-α: parallelizable part of the computation

p: number of threads

$$S = \frac{1}{\alpha + \frac{1 - \alpha}{p}}$$
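To get a feel for the numbers, here is a minimal sketch that evaluates the formula for a fixed serial fraction. The class name `Amdahl` and the method `speedup` are illustrative names, not something from the talk. With α = 0.10 the speedup approaches, but never reaches, 1/α = 10 however many threads are added.

```java
// Hypothetical helper (not from the talk): evaluates Amdahl's Law.
final class Amdahl {

    /** Speedup S for a serial fraction alpha (between 0 and 1) on p threads. */
    static double speedup(double alpha, int p) {
        return 1.0 / (alpha + (1.0 - alpha) / p);
    }

    public static void main(String[] args) {
        // Print the speedup for a growing number of threads with 10% serial work.
        for (int p : new int[] {1, 2, 8, 64, 1024}) {
            System.out.printf("alpha=0.10 p=%d S=%.2f%n", p, speedup(0.10, p));
        }
    }
}
```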

If-Modify-Write

volatile int value = 0;

if (value == 0) {
    value = 42;
}

Can we run it in a multithreaded environment? No: the check and the write are two separate steps, so there is no atomicity.

Compare-And-Set

int value = 0;

LOCK
if (value == 0) {
    value = 42;
}
UNLOCK

Introducing a Magic Operation

value.compareAndSet(0, 42);

int value = 0;
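On the slide `value` is a plain `int`, which has no `compareAndSet` method. A minimal runnable version, assuming `java.util.concurrent.atomic.AtomicInteger` as the holder, might look like this. The class `InitOnce` and its method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper (not from the talk) around the "magic operation".
final class InitOnce {
    private final AtomicInteger value = new AtomicInteger(0);

    /** Atomically writes 42 only if the current value is still 0. */
    boolean trySet() {
        return value.compareAndSet(0, 42);
    }

    int get() {
        return value.get();
    }
}
```

Only the first caller of `trySet()` wins; every later call sees 42 instead of 0 and returns `false` without writing.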

Simulated CAS

long value;

synchronized long get() {
    return value;
}

// Returns the value seen before the attempt; writes only on a match.
synchronized long compareAndSwap(long expected, long newValue) {
    long oldValue = value;
    if (oldValue == expected) {
        value = newValue;
    }
    return oldValue;
}

// Reports whether the swap happened.
synchronized boolean compareAndSet(long expected, long newValue) {
    return expected == compareAndSwap(expected, newValue);
}

Compare and Swap — Hardware Support

compare-and-swap (CAS): cmpxchg (x86)

load-link / store-conditional (LL/SC): ldrex/strex (ARM), lwarx/stwcx (PowerPC)

Atomics in JDK

java.util.concurrent.atomic

AtomicReference

• ref.get()

• ref.compareAndSet(v1, v2)

• ...

AtomicLong

• i.get()

• i.compareAndSet(42, 43)

• i.incrementAndGet()

• i.getAndAdd(5)

• ...
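The calls listed above can be tried out in a small sketch. The class `AtomicsTour` and its two methods are hypothetical names chosen for this example. Note that `incrementAndGet()` takes no argument and that `getAndAdd` returns the value from before the addition.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class (not from the talk).
final class AtomicsTour {

    /** Returns {swapped ? 1 : 0, value after increment, value before add, final value}. */
    static long[] longOps() {
        AtomicLong i = new AtomicLong(42);
        boolean swapped = i.compareAndSet(42, 43); // succeeds: 42 -> 43
        long afterInc = i.incrementAndGet();       // 43 -> 44, returns the new value
        long beforeAdd = i.getAndAdd(5);           // 44 -> 49, returns the old value
        return new long[] {swapped ? 1 : 0, afterInc, beforeAdd, i.get()};
    }

    static String refOps() {
        AtomicReference<String> ref = new AtomicReference<>("v1");
        // compareAndSet compares references with ==; the two "v1" literals
        // are the same interned object, so this swap succeeds.
        ref.compareAndSet("v1", "v2");
        return ref.get();
    }
}
```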

Example. Atomic Counter

AtomicLong value = new AtomicLong();

long get() {
    return value.get();
}

void increment() {
    long v;
    do {
        v = value.get();
    } while (!value.compareAndSet(v, v + 1));
}
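A small harness can check that the CAS loop loses no updates under contention. This is a sketch: the class name `CasCounter` and the helper `runConcurrently` are assumptions added for the example, while `get` and `increment` follow the slide.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasCounter {
    private final AtomicLong value = new AtomicLong();

    long get() {
        return value.get();
    }

    void increment() {
        long v;
        do {
            v = value.get();                          // read the current value
        } while (!value.compareAndSet(v, v + 1));     // retry if another thread got in first
    }

    /** Hypothetical harness: each of `threads` workers increments `perThread` times. */
    static long runConcurrently(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

In production code `AtomicLong.incrementAndGet()` does the same job; the explicit loop is kept here because it is the pattern the rest of the talk builds on.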

Atomics. Questions?

Wait-Free: per-thread progress is guaranteed

Lock-Free: overall progress is guaranteed

Obstruction-Free: overall progress is guaranteed if threads don’t interfere with each other

CAS-loop

do {
    v = value.get();
} while (!value.compareAndSet(v, v + 1));

A. Wait-Free

B. Lock-Free

C. Obstruction-Free

Answer: B, Lock-Free*. A CAS fails only when another thread's CAS has succeeded, so some thread always makes progress.

*for modern hardware supporting CAS or LL/SC

Stack & Concurrency

class Node<E> {
    final E item;
    Node<E> next;

    Node(E item) {
        this.item = item;
    }
}

[Figure: a stack holding E1, E2, E3 with a top pointer. Thread 1 pushes item1 while Thread 2 pushes item2 at the same time.]

We need synchronization

Non-blocking Stack

AtomicReference<Node<E>> top = new AtomicReference<>();

void push(E item) {
    Node<E> newHead = new Node<>(item);
    Node<E> oldHead;
    do {
        oldHead = top.get();          // snapshot the current top
        newHead.next = oldHead;       // link the new node above it
    } while (!top.compareAndSet(oldHead, newHead)); // publish, or retry if top moved
}

[Figure: push walkthrough on a stack holding E1, E2, E3. newHead wraps item, oldHead is the snapshot of top, and a successful CAS moves top to newHead.]

E pop() {
    Node<E> newHead;
    Node<E> oldHead;
    do {
        oldHead = top.get();
        if (oldHead == null) return null;   // empty stack
        newHead = oldHead.next;
    } while (!top.compareAndSet(oldHead, newHead));
    return oldHead.item;
}
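Putting push and pop together gives a complete class that can be compiled and tried out. This is a sketch assembled from the slide fragments: the class name `NonBlockingStack` is an assumption, and the design is the one commonly known as the Treiber stack.

```java
import java.util.concurrent.atomic.AtomicReference;

// Assembled from the slide fragments; the class name is hypothetical.
final class NonBlockingStack<E> {

    private static final class Node<E> {
        final E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    // Starts out holding null, which represents an empty stack.
    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead));
    }

    /** Returns the most recently pushed item, or null if the stack is empty. */
    E pop() {
        Node<E> newHead;
        Node<E> oldHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null;
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }
}
```

Because `pop()` uses null to signal an empty stack, this sketch cannot distinguish a stored null from emptiness; callers should not push null items.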

Non-blocking Stack. Questions?

Non-blocking Queue

Michael and Scott, 1996: https://www.research.ibm.com/people/m/michael/podc-1996.pdf

Threads help each other

class LinkedQueue<E> {

    static class Node<E> {
        E item;
        AtomicReference<Node<E>> next;

        // Wrap the successor so that next itself is never null;
        // otherwise curTail.next.compareAndSet(...) would throw a NullPointerException.
        Node(E item, Node<E> next) {
            this.item = item;
            this.next = new AtomicReference<>(next);
        }
    }

    Node<E> dummy = new Node<>(null, null);
    AtomicReference<Node<E>> head = new AtomicReference<>(dummy);
    AtomicReference<Node<E>> tail = new AtomicReference<>(dummy);
}

void put(E item) {
    Node<E> newNode = new Node<>(item, null);
    boolean success;
    do {
        Node<E> curTail = tail.get();
        // Step 1: try to link the new node after the current tail.
        success = curTail.next.compareAndSet(null, newNode);
        // Step 2: try to advance tail to whatever now follows curTail.
        tail.compareAndSet(curTail, curTail.next.get());
    } while (!success);
}

[Figure: a queue holding dummy, 1, 2 with head and tail pointers. newNode wraps item, curTail is this thread's snapshot of tail, and "another" is a node linked by another thread.]

Possible outcomes of the two CAS calls (CAS is the slides' shorthand for compareAndSet):

• next.CAS true, tail.CAS true: our node is linked and we moved tail to it.

• next.CAS true, tail.CAS false: our node is linked, and another thread has already moved tail for us.

• next.CAS false, tail.CAS false: another thread linked its node first and tail has already moved. Retry.

• next.CAS false, tail.CAS true: another thread linked its node first but has not moved tail yet. We HELP by moving tail to that node, then retry.

Synchronization

Blocking (lock-based): lock + unlock. Invariant: before & after.

Non-blocking (CAS-based): CAS-loop. Semi-invariant.

public void put(E item) {
    Node<E> newNode = new Node<>(item, null);
    while (true) {
        Node<E> currentTail = tail.get();
        Node<E> tailNext = currentTail.next.get();
        if (currentTail == tail.get()) {
            if (tailNext != null) {
                // tail is lagging: help move it forward, then retry
                tail.compareAndSet(currentTail, tailNext);
            } else {
                if (currentTail.next.compareAndSet(null, newNode)) {
                    // linked; moving tail may fail if another thread helped
                    tail.compareAndSet(currentTail, newNode);
                    return;
                }
            }
        }
    }
}

public E poll() {
    while (true) {
        Node<E> first = head.get();
        Node<E> last = tail.get();
        Node<E> next = first.next.get();
        if (first == head.get()) {
            if (first == last) {
                if (next == null) return null;      // queue is empty
                tail.compareAndSet(last, next);     // tail is lagging: help
            } else {
                E item = next.item;
                if (head.compareAndSet(first, next))
                    return item;
            }
        }
    }
}
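The two methods above can be combined into one self-contained class. This is a sketch under a few assumptions: the class name `MichaelScottQueue` and the helper `countAfterConcurrentPuts` are invented for the example, and each node creates its own `next` reference so that it is never null.

```java
import java.util.concurrent.atomic.AtomicReference;

// Assembled from the slide fragments; names are hypothetical.
final class MichaelScottQueue<E> {

    private static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();

        Node(E item) {
            this.item = item;
        }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    MichaelScottQueue() {
        Node<E> dummy = new Node<>(null);   // head always points at a dummy node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void put(E item) {
        Node<E> newNode = new Node<>(item);
        while (true) {
            Node<E> currentTail = tail.get();
            Node<E> tailNext = currentTail.next.get();
            if (currentTail != tail.get()) continue;           // stale snapshot
            if (tailNext != null) {
                tail.compareAndSet(currentTail, tailNext);     // help a lagging tail
            } else if (currentTail.next.compareAndSet(null, newNode)) {
                tail.compareAndSet(currentTail, newNode);      // may fail if helped
                return;
            }
        }
    }

    /** Returns the oldest item, or null if the queue is empty. */
    E poll() {
        while (true) {
            Node<E> first = head.get();
            Node<E> last = tail.get();
            Node<E> next = first.next.get();
            if (first != head.get()) continue;                 // stale snapshot
            if (first == last) {
                if (next == null) return null;                 // empty
                tail.compareAndSet(last, next);                // help a lagging tail
            } else {
                E item = next.item;
                if (head.compareAndSet(first, next)) return item;
            }
        }
    }

    /** Hypothetical harness: concurrent producers, then a single-threaded drain. */
    static int countAfterConcurrentPuts(int producers, int perProducer) {
        MichaelScottQueue<Integer> queue = new MichaelScottQueue<>();
        Thread[] workers = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perProducer; j++) queue.put(j);
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        while (queue.poll() != null) count++;
        return count;
    }
}
```

The harness only checks that no element is lost under concurrent puts; it is not a proof of correctness, which for non-blocking algorithms is much harder to establish.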

Non-blocking Queue in JDK

ConcurrentLinkedQueue is based on the Michael-Scott queue
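In application code the JDK class is used through the standard `Queue` interface rather than reimplemented. A minimal usage sketch follows; the class `JdkQueueDemo` and the method `roundTrip` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical demo class (not from the talk).
final class JdkQueueDemo {

    /** Offers all items, then polls until the queue reports empty (null). */
    static List<String> roundTrip(String... items) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        for (String item : items) {
            queue.offer(item);      // never blocks: the queue is unbounded
        }
        List<String> polled = new ArrayList<>();
        String next;
        while ((next = queue.poll()) != null) {   // poll returns null when empty
            polled.add(next);
        }
        return polled;
    }
}
```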

Non-blocking algorithms. Summary

• based on CAS-like operations

• use the CAS-loop pattern

• threads help one another

Non-blocking Queue. Questions?

ArrayBlockingQueue

[Figure: backing array with slots 0 1 2 3 4 ... N-1]

ArrayBlockingQueue.put()

void put(E e) throws InterruptedException {
    checkNotNull(e);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == items.length)
            notFull.await();              // block while the array is full
        final Object[] items = this.items;
        items[putIndex] = e;
        if (++putIndex == items.length)
            putIndex = 0;                 // wrap around
        count++;
        notEmpty.signal();
    } finally {
        lock.unlock();
    }
}
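For contrast with the CAS-based queue, the same lock-and-condition pattern can be written as a small standalone class. This is a teaching sketch, not the JDK source: the class `BoundedBuffer`, its `take` method and the `transfer` harness are assumptions added here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical teaching class; not the JDK implementation.
final class BoundedBuffer<E> {
    private final Object[] items;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private int putIndex, takeIndex, count;

    BoundedBuffer(int capacity) {
        items = new Object[capacity];
    }

    void put(E e) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == items.length)
                notFull.await();                 // wait until a slot is free
            items[putIndex] = e;
            if (++putIndex == items.length)
                putIndex = 0;                    // wrap around
            count++;
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    E take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == 0)
                notEmpty.await();                // wait until an item arrives
            E e = (E) items[takeIndex];
            items[takeIndex] = null;
            if (++takeIndex == items.length)
                takeIndex = 0;
            count--;
            notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }

    /** Hypothetical harness: one producer and one consumer share a buffer of capacity 2. */
    static List<Integer> transfer(int n) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) buffer.put(i);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        List<Integer> received = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) received.add(buffer.take());
            producer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return received;
    }
}
```

Every operation holds the single lock from start to finish, so the invariant holds before and after each call, at the price of no parallelism inside the critical section.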

Modifications

Optimistic Approach (Ladan-Mozes, Shavit, 2004, 2008)

Key idea: use a doubly linked list to avoid the 2nd CAS

http://people.csail.mit.edu/edya/publications/OptimisticFIFOQueue-journal.pdf

Baskets Queue (Hoffman, Shalev, Shavit, 2007)

http://people.csail.mit.edu/shanir/publications/Baskets%20Queue.pdf

• Throughput is better

• no FIFO any more

• usually you don’t need strong FIFO in real life

Summary

• Non-blocking algorithms are complicated

• Blocking algorithms are easier

• Correctness checking is difficult

• Difficult to support

• Sometimes non-blocking algorithms have better performance

Engineering is the art of trade-offs

Links & Books

Books

Links

• Nitsan Wakart: http://psy-lob-saw.blogspot.com/

• Alexey Shipilev: https://shipilev.net/

• concurrency-interest mailing list: http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest

Q & A