Post on 22-Jan-2018
Non-blocking Michael-Scott queue algorithm
Alexey Fyodorov, JUG.ru Group
What is this talk about?
• Programming
• Algorithms
• Concurrency
Are you sure you need it?
For concurrency beginners
Sorry. Please go to another room
For non-blocking programming beginners
A short introduction
For advanced concurrent programmers
CAS-based queue algorithm
You have another room!
12:10
Non-blocking Michael-Scott queue algorithm, Alexey Fyodorov
Easily scale enterprise applications using distributed data grids, Ondrej Mihaly
Main Models
Shared Memory: write + read
Similar to how we program it
Concurrent Programming
Messaging: send + onReceive
Similar to how real hardware works
Distributed Programming
Advantages of Parallelism
Resource utilization: utilization of several cores/CPUs, aka PERFORMANCE
Simplicity: complexity goes to magic frameworks
• ArrayBlockingQueue
• ConcurrentHashMap
• Akka
Async handling: responsive services, responsive UI
Disadvantages of Locking
• Deadlocks
• Priority Inversion
• Reliability
  • What will happen if the lock owner dies?
• Performance
  • Scheduler can push the lock owner out
  • No parallelism inside a critical section!
Amdahl's Law
α: non-parallelizable part of the computation
1-α: parallelizable part of the computation
p: number of threads
$$S = \frac{1}{\alpha + \frac{1 - \alpha}{p}}$$
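To get a feeling for the formula, here is a minimal sketch that evaluates it for a few thread counts. The class name AmdahlDemo and the value α = 0.1 are assumptions chosen for illustration only.

```java
// Amdahl's law: S = 1 / (alpha + (1 - alpha) / p)
class AmdahlDemo {
    /** alpha is the non-parallelizable fraction (0..1), p is the number of threads. */
    static double speedup(double alpha, int p) {
        return 1.0 / (alpha + (1.0 - alpha) / p);
    }

    public static void main(String[] args) {
        double alpha = 0.1; // hypothetical: 10% of the work cannot be parallelized
        for (int p : new int[] {1, 2, 4, 8, 1024}) {
            System.out.printf("p = %4d  S = %.2f%n", p, speedup(alpha, p));
        }
        // As p grows, S approaches 1 / alpha = 10 but never reaches it.
    }
}
```

With 10% of the work inside critical sections, no number of threads gives more than a 10x speedup.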
If-Modify-Write
volatile int value = 0;
Can we run it in a multithreaded environment?
if (value == 0) {
    value = 42;
}
No atomicity
Compare-And-Set
int value = 0;
LOCK
if (value == 0) {
    value = 42;
}
UNLOCK
Introducing a Magic Operation
int value = 0;
value.compareAndSet(0, 42);
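A plain Java int has no compareAndSet method, so the slide is shorthand. A minimal sketch of the same operation with java.util.concurrent.atomic.AtomicInteger follows; the class name MagicOperationDemo and the helper initOnce are made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MagicOperationDemo {
    /** Sets value to 42 only if it is still 0; returns true if this call did the update. */
    static boolean initOnce(AtomicInteger value) {
        return value.compareAndSet(0, 42);
    }

    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);
        boolean first = initOnce(value);  // value was 0, so it becomes 42
        boolean second = initOnce(value); // value is 42 now, so nothing changes
        System.out.println(first + " " + second + " " + value.get()); // true false 42
    }
}
```

The check and the write happen as one atomic step, so no other thread can slip in between them.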
Simulated CAS
long value;

synchronized long get() {
    return value;
}

synchronized long compareAndSwap(long expected, long newValue) {
    long oldValue = value;
    if (oldValue == expected) {
        value = newValue;
    }
    return oldValue;
}

synchronized boolean compareAndSet(long expected, long newValue) {
    return expected == compareAndSwap(expected, newValue);
}
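The difference between the two operations is only in what they return: compareAndSwap gives back the old value, compareAndSet gives back a success flag. A small sketch that wraps the methods above in a class (the class name SimulatedCAS is an assumption) and shows both:

```java
class SimulatedCAS {
    private long value;

    synchronized long get() {
        return value;
    }

    // Returns the value seen before the call, whether or not the swap happened.
    synchronized long compareAndSwap(long expected, long newValue) {
        long oldValue = value;
        if (oldValue == expected) {
            value = newValue;
        }
        return oldValue;
    }

    // Returns true only if the swap happened.
    synchronized boolean compareAndSet(long expected, long newValue) {
        return expected == compareAndSwap(expected, newValue);
    }

    public static void main(String[] args) {
        SimulatedCAS cas = new SimulatedCAS();
        System.out.println(cas.compareAndSwap(0, 42)); // 0: old value, swap happened
        System.out.println(cas.compareAndSwap(0, 7));  // 42: old value, no swap
        System.out.println(cas.compareAndSet(42, 43)); // true
        System.out.println(cas.get());                 // 43
    }
}
```

This version is lock-based, so it only simulates the semantics; it has none of the progress properties of a hardware CAS.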
Compare and Swap — Hardware Support
compare-and-swap (CAS): cmpxchg (x86)
load-link / store-conditional (LL/SC): ldrex/strex (ARM), lwarx/stwcx (PowerPC)
Atomics in JDK
java.util.concurrent.atomic
AtomicReference
• ref.get()
• ref.compareAndSet(v1, v2)
• ...
AtomicLong
• i.get()
• i.compareAndSet(42, 43)
• i.incrementAndGet()
• i.getAndAdd(5)
• ...
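A short sketch of what these calls return when run one after another on a single thread. The class name AtomicsDemo and the starting values are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

class AtomicsDemo {
    static String run() {
        StringBuilder out = new StringBuilder();

        AtomicLong i = new AtomicLong(42);
        out.append(i.compareAndSet(42, 43)).append(' '); // true, i is now 43
        out.append(i.incrementAndGet()).append(' ');     // 44 (new value)
        out.append(i.getAndAdd(5)).append(' ');          // 44 (old value), i is now 49
        out.append(i.get()).append(' ');                 // 49

        String v1 = "v1";
        String v2 = "v2";
        AtomicReference<String> ref = new AtomicReference<>(v1);
        // AtomicReference compares references with ==, not with equals()
        out.append(ref.compareAndSet(v1, v2)).append(' '); // true
        out.append(ref.get());                             // v2
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // true 44 44 49 true v2
    }
}
```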
Example. Atomic Counter
AtomicLong value = new AtomicLong();

long get() {
    return value.get();
}

void increment() {
    long v;
    do {
        v = value.get();
    } while (!value.compareAndSet(v, v + 1));
}
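To see that the CAS-loop loses no increments, the counter can be hammered from several threads. This is a sketch: the class name AtomicCounter, the helper countWithThreads and the thread counts are assumptions, not part of the talk.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounter {
    private final AtomicLong value = new AtomicLong();

    long get() {
        return value.get();
    }

    void increment() {
        long v;
        do {
            v = value.get();                          // read the current value
        } while (!value.compareAndSet(v, v + 1));     // retry if another thread got in first
    }

    /** Hypothetical helper: starts the threads, waits for them, returns the final count. */
    static long countWithThreads(int threads, int perThread) {
        AtomicCounter counter = new AtomicCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) counter.increment();
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000)); // 400000
    }
}
```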
Atomics. Questions?
Non-blocking Guarantees
Wait-Free: per-thread progress is guaranteed
Lock-Free: overall progress is guaranteed
Obstruction-Free: overall progress is guaranteed if threads don't interfere with each other
CAS-loop
do {
    v = value.get();
} while (!value.compareAndSet(v, v + 1));
A. Wait-Free
B. Lock-Free
C. Obstruction-Free
*for modern hardware supporting CAS or LL/SC
Stack & Concurrency
class Node<E> {
    final E item;
    Node<E> next;

    Node(E item) {
        this.item = item;
    }
}
[Figure: a linked stack of nodes E3, E1, E2 with a top pointer; Thread 1 pushes item1 and Thread 2 pushes item2 concurrently, and both try to move top]
We need synchronization
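The race in the picture can be replayed deterministically on one thread by writing out the interleaving by hand. This is only a sketch: the class name LostPushDemo is made up, and the order of E3, E1, E2 in the stack is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class LostPushDemo {
    static final class Node<E> {
        final E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    static List<String> run() {
        // Existing stack, top first: E3 -> E1 -> E2 (the order is an assumption).
        Node<String> e2 = new Node<>("E2");
        Node<String> e1 = new Node<>("E1");
        e1.next = e2;
        Node<String> top = new Node<>("E3");
        top.next = e1;

        // "Thread 1" and "Thread 2" both read the same top...
        Node<String> n1 = new Node<>("item1");
        n1.next = top;
        Node<String> n2 = new Node<>("item2");
        n2.next = top;

        // ...and then both write it back without checking what the other did.
        top = n1; // Thread 1
        top = n2; // Thread 2 overwrites top: item1 is no longer reachable

        List<String> items = new ArrayList<>();
        for (Node<String> n = top; n != null; n = n.next) items.add(n.item);
        return items;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [item2, E3, E1, E2]
    }
}
```

Both pushes "succeeded", but only one item is in the stack.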
Non-blocking Stack
AtomicReference<Node<E>> top = new AtomicReference<>();

void push(E item) {
    Node<E> newHead = new Node<>(item);
    Node<E> oldHead;
    do {
        oldHead = top.get();
        newHead.next = oldHead;
    } while (!top.compareAndSet(oldHead, newHead));
}

[Figure: step-by-step push of item onto the stack E3, E1, E2: newHead is created, oldHead is read from top, newHead.next is set to oldHead, and CAS moves top to newHead; when the CAS fails, the loop reads oldHead again and retries]
E pop() {
    Node<E> newHead;
    Node<E> oldHead;
    do {
        oldHead = top.get();
        if (oldHead == null) return null;
        newHead = oldHead.next;
    } while (!top.compareAndSet(oldHead, newHead));
    return oldHead.item;
}
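Putting push and pop together gives a small runnable class. This is a sketch under assumptions: the class name NonBlockingStack and the helper pushAndCount are not from the slides, and top starts as an empty AtomicReference so that null means "empty stack".

```java
import java.util.concurrent.atomic.AtomicReference;

class NonBlockingStack<E> {
    private static final class Node<E> {
        final E item;
        Node<E> next;

        Node(E item) {
            this.item = item;
        }
    }

    // null means "empty stack"
    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> newHead = new Node<>(item);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
        } while (!top.compareAndSet(oldHead, newHead)); // retry if top has changed
    }

    E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) return null;           // empty stack
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead)); // retry if top has changed
        return oldHead.item;
    }

    /** Hypothetical check: several threads push, then one thread counts what it can pop. */
    static int pushAndCount(int threads, int perThread) {
        NonBlockingStack<Integer> stack = new NonBlockingStack<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) stack.push(n);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        while (stack.pop() != null) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(pushAndCount(4, 10_000)); // 40000
    }
}
```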
Non-blocking Stack. Questions?
Non-blocking Queue
Michael and Scott, 1996
https://www.research.ibm.com/people/m/michael/podc-1996.pdf
Threads help each other
Non-blocking queue
class LinkedQueue<E> {
    static class Node<E> {
        E item;
        AtomicReference<Node<E>> next;

        Node(E item, Node<E> next) {
            this.item = item;
            // wrap the successor so that it can be updated with CAS;
            // a null AtomicReference here would make next.compareAndSet fail with an NPE
            this.next = new AtomicReference<>(next);
        }
    }

    Node<E> dummy = new Node<>(null, null);
    AtomicReference<Node<E>> head = new AtomicReference<>(dummy);
    AtomicReference<Node<E>> tail = new AtomicReference<>(dummy);
}
void put(E item) {
    Node<E> newNode = new Node<>(item, null);
    boolean success;
    do {
        Node<E> curTail = tail.get();
        success = curTail.next.compareAndSet(null, newNode);
        tail.compareAndSet(curTail, curTail.next.get());
    } while (!success);
}

[Figure: queue head -> dummy -> 1 -> 2 with tail on the last node; newNode holds item, curTail is this thread's snapshot of tail, and in some steps another thread has appended a node called another]

The two CAS calls can finish in four ways (CAS is short for compareAndSet):
• curTail.next.CAS: true, tail.CAS: true. newNode is linked and tail is moved to it
• curTail.next.CAS: true, tail.CAS: false. newNode is linked, and another thread has already moved tail
• curTail.next.CAS: false, tail.CAS: false. Another thread has linked its node and tail has already moved; retry
• curTail.next.CAS: false, tail.CAS: true. Another thread has linked its node but has not moved tail yet, so this thread moves tail for it (HELP) and retries
Synchronization
Blocking (lock-based): lock + unlock. Invariant: before & after
Non-blocking (CAS-based): CAS-loop. Semi-invariant
public void put(E item) {
    Node<E> newNode = new Node<>(item, null);
    while (true) {
        Node<E> currentTail = tail.get();
        Node<E> tailNext = currentTail.next.get();
        if (currentTail == tail.get()) {
            if (tailNext != null) {
                tail.compareAndSet(currentTail, tailNext);
            } else {
                if (currentTail.next.compareAndSet(null, newNode)) {
                    tail.compareAndSet(currentTail, newNode);
                    return;
                }
            }
        }
    }
}

public E poll() {
    while (true) {
        Node<E> first = head.get();
        Node<E> last = tail.get();
        Node<E> next = first.next.get();
        if (first == head.get()) {
            if (first == last) {
                if (next == null) return null;
                tail.compareAndSet(last, next);
            } else {
                E item = next.item;
                if (head.compareAndSet(first, next))
                    return item;
            }
        }
    }
}
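The two methods above can be assembled into one runnable class. This is a sketch under assumptions: the class name MichaelScottQueue and the helper sumAfterConcurrentPuts are made up, and Node takes only the item and creates its own empty next reference, which differs slightly from the constructor on the slides.

```java
import java.util.concurrent.atomic.AtomicReference;

class MichaelScottQueue<E> {
    private static final class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next = new AtomicReference<>();

        Node(E item) {
            this.item = item;
        }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    MichaelScottQueue() {
        Node<E> dummy = new Node<>(null); // head always points at a dummy node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void put(E item) {
        Node<E> newNode = new Node<>(item);
        while (true) {
            Node<E> currentTail = tail.get();
            Node<E> tailNext = currentTail.next.get();
            if (currentTail == tail.get()) {
                if (tailNext != null) {
                    // tail is lagging behind: help the other thread and retry
                    tail.compareAndSet(currentTail, tailNext);
                } else if (currentTail.next.compareAndSet(null, newNode)) {
                    // linked; moving tail may fail if somebody has already helped
                    tail.compareAndSet(currentTail, newNode);
                    return;
                }
            }
        }
    }

    E poll() {
        while (true) {
            Node<E> first = head.get();
            Node<E> last = tail.get();
            Node<E> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;  // queue is empty
                    tail.compareAndSet(last, next); // tail is lagging: help
                } else {
                    E item = next.item;
                    if (head.compareAndSet(first, next)) return item; // next becomes the new dummy
                }
            }
        }
    }

    /** Hypothetical check: each producer puts 1..perProducer, then everything is polled and summed. */
    static long sumAfterConcurrentPuts(int producers, int perProducer) {
        MichaelScottQueue<Integer> queue = new MichaelScottQueue<>();
        Thread[] workers = new Thread[producers];
        for (int t = 0; t < producers; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 1; n <= perProducer; n++) queue.put(n);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long sum = 0;
        Integer x;
        while ((x = queue.poll()) != null) sum += x;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAfterConcurrentPuts(4, 1_000)); // 2002000
    }
}
```

Note that poll() returns null for an empty queue, so null cannot be stored as an item.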
Non-blocking Queue in JDK
ConcurrentLinkedQueue is based on Michael-Scott queue
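From the user's side the JDK class looks like any other Queue. A minimal usage sketch (the class name JdkQueueDemo is an assumption):

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class JdkQueueDemo {
    static String run() {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("a");             // never blocks, the queue is unbounded
        queue.offer("b");
        String first = queue.poll();  // "a": FIFO order
        String second = queue.poll(); // "b"
        String third = queue.poll();  // null: the queue is empty, poll does not wait
        return first + " " + second + " " + third;
    }

    public static void main(String[] args) {
        System.out.println(run()); // a b null
    }
}
```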
Non-blocking algorithms. Summary
• based on CAS-like operations
• use the CAS-loop pattern
• threads help one another
Non-blocking Queue. Questions?
ArrayBlockingQueue
0 1 2 3 4 ... N-1
ArrayBlockingQueue.put()
void put(E e) throws InterruptedException {
    checkNotNull(e);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == items.length)
            notFull.await();
        final Object[] items = this.items;
        items[putIndex] = e;
        if (++putIndex == items.length)
            putIndex = 0;
        count++;
        notEmpty.signal();
    } finally {
        lock.unlock();
    }
}
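For comparison with the non-blocking queue, here is a usage sketch of the blocking one: put() waits while the array is full and take() waits while it is empty. The class name BlockingQueueDemo, the capacity of 2 and the five items are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BlockingQueueDemo {
    static int run() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // room for 2 items only

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) queue.put(i); // blocks while the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            for (int i = 0; i < 5; i++) sum += queue.take(); // blocks while the queue is empty
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 15
    }
}
```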
Modifications
Optimistic Approach
Ladan-Mozes, Shavit, 2004, 2008
Key IDEA: use Doubly Linked List to avoid 2nd CAS
http://people.csail.mit.edu/edya/publications/OptimisticFIFOQueue-journal.pdf
Baskets Queue
Hoffman, Shalev, Shavit, 2007
http://people.csail.mit.edu/shanir/publications/Baskets%20Queue.pdf
• Throughput is better
• no FIFO any more
• usually you don't need strong FIFO in real life
Summary
• Non-blocking algorithms are complicated
• Blocking algorithms are easier
• correctness checking is difficult
• difficult to support
• Sometimes it has better performance
Engineering is the art of trade-offs
Links & Books
Books
Links
• Nitsan Wakart: http://psy-lob-saw.blogspot.com/
• Alexey Shipilev: https://shipilev.net/
• concurrency-interest mailing list: http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest
Q & A