Date posted: 20-Aug-2015
Category: Technology
Upload: guy-korland
Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees
Yehuda Afek, Guy Korland, Maria Natanzon, and Nir Shavit
The Pool
Producer-consumer pools, that is, collections of unordered objects or tasks, are a fundamental element of modern multiprocessor software and a target of extensive research and development
[Figure: producers P1…Pn call Put(x), Put(y), Put(z) into the pool; consumers C1…Cn call Get()]
ED-Tree Pool
We present the ED-Tree, a distributed pool structure based on a combination of the elimination-tree and diffracting-tree paradigms, allowing high degrees of parallelism with reduced contention
Java JDK 6.0 offers several pools:
SynchronousQueue/Stack (Lea, Scott, and Scherer) - a pairing-up function without buffering; producers and consumers wait for one another.
LinkedBlockingQueue - producers deposit their value and leave; consumers wait for a value to become available.
ConcurrentLinkedQueue - producers deposit their value and leave; consumers return null if the pool is empty.
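A quick demo (ours, not from the slides) of the two non-synchronous semantics above: a ConcurrentLinkedQueue consumer returns null on an empty pool, while a LinkedBlockingQueue consumer waits for a value.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PoolSemanticsDemo {
    public static void main(String[] args) throws InterruptedException {
        Queue<String> clq = new ConcurrentLinkedQueue<>();
        System.out.println(clq.poll());          // null: pool is empty, consumer does not block
        clq.add("x");                            // producer deposits and leaves
        System.out.println(clq.poll());          // "x"

        LinkedBlockingQueue<String> lbq = new LinkedBlockingQueue<>();
        lbq.put("y");                            // producer deposits and leaves
        // A timed poll would block (here up to 1s) if the queue were empty:
        System.out.println(lbq.poll(1, TimeUnit.SECONDS)); // "y"
    }
}
```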
Drawback
All these structures are based on a centralized structure, such as a lock-free queue or stack, and are thus limited in their scalability: the head of the stack or queue is a sequential bottleneck and a source of contention.
Some Observations
A pool does not have to obey either LIFO or FIFO semantics.
Therefore, no centralized structure is needed to hold the items and to serve producer and consumer requests.
New approach: the ED-Tree, a combined variant of the diffracting-tree structure (Shavit and Zemach) and the elimination-tree structure (Shavit and Touitou).
The basic idea: use randomization to distribute the concurrent requests of threads onto many locations, so that they collide with one another and can exchange values, avoiding a central place through which all threads pass.
The result: a pool that allows both parallelism and reduced contention.
A little history
Both diffraction and elimination were presented years ago and were claimed, through simulation, to be effective.
However, elimination trees and diffracting trees were never used to implement real-world structures.
Elimination and diffraction were never combined in a single data structure.
Diffraction trees
A binary tree of objects called balancers [Aspnes-Herlihy-Shavit], each with a single input wire and two output wires.
Threads arrive at a balancer, which repeatedly sends them left and right, so that its top output wire always receives at most one more thread than the bottom one.
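A minimal sketch of a single balancer (our naming, not the paper's code): one atomic toggle bit decides, per arriving thread, whether it exits on the top or bottom output wire.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ToggleBalancer {
    private final AtomicBoolean toggle = new AtomicBoolean(true);

    /** Returns 0 for the top output wire, 1 for the bottom one. */
    public int traverse() {
        while (true) {
            boolean t = toggle.get();
            if (toggle.compareAndSet(t, !t))   // atomically flip the bit
                return t ? 0 : 1;              // the value seen picks the wire
        }
    }
}
```

Successive traversals alternate between the wires, so after any number of passes the top wire has received at most one more thread than the bottom one: the step property for a single balancer.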
Diffraction trees
In any quiescent state (when there are no threads in the tree), the tree preserves the step property: the outputs are balanced so that the top leaves have output at most one more element than the bottom ones, with no gaps.
[Figure: a three-level diffracting tree of balancers spreading inputs 1-10 evenly across its output wires with the step property; Shavit-Zemach]
Diffraction trees: connect each output wire to a lock-free queue.
To perform a push, a thread traverses the balancers from the root to a leaf and then pushes its item onto the corresponding queue. To perform a pop, a thread traverses the balancers from the root to a leaf and then pops from the corresponding queue, blocking if the queue is empty.
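The push/pop traversal above can be sketched as a small two-level tree (illustrative names, not the paper's code). Each balancer keeps separate producer and consumer toggle bits, so the k-th put and the k-th get follow the same path and meet at the same leaf queue.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class DiffTreePool<T> {
    static final class Balancer {
        final AtomicBoolean prod = new AtomicBoolean(true);
        final AtomicBoolean cons = new AtomicBoolean(true);
        int traverse(boolean producer) {
            AtomicBoolean toggle = producer ? prod : cons;
            while (true) {
                boolean t = toggle.get();
                if (toggle.compareAndSet(t, !t)) return t ? 0 : 1;
            }
        }
    }

    private final Balancer root = new Balancer();
    private final Balancer[] second = { new Balancer(), new Balancer() };
    private final ConcurrentLinkedQueue<T>[] leaves;

    @SuppressWarnings("unchecked")
    public DiffTreePool() {
        leaves = new ConcurrentLinkedQueue[4];
        for (int i = 0; i < 4; i++) leaves[i] = new ConcurrentLinkedQueue<>();
    }

    // Traverse root then second level; the two wire choices index the leaf.
    private int leaf(boolean producer) {
        int a = root.traverse(producer);
        return a * 2 + second[a].traverse(producer);
    }

    public void put(T x) { leaves[leaf(true)].add(x); }
    public T get()       { return leaves[leaf(false)].poll(); } // null if empty
}
```

This sketch uses a non-blocking get (null on empty), matching the ConcurrentLinkedQueue pool variant; the blocking variants attach blocking queues at the leaves instead.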
Diffraction trees
[Figure: a diffracting tree of balancers with 0/1 toggle bits routing items to the output queues]
Problem: each toggle bit is a hot spot.
Diffraction trees
Observation: if an even number of threads pass through a balancer, the outputs are evenly balanced on the top and bottom wires, but the balancer's state remains unchanged.
The approach: add a prism (diffraction) array in front of each toggle bit.
[Figure: a prism array placed in front of the 0/1 toggle bit]
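A sketch of the idea, using java.util.concurrent.Exchanger in place of the paper's prism array (our illustration, not the authors' code): two threads that meet in the prism leave on opposite wires without touching the toggle bit; a thread that finds no partner falls back to the toggle.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class PrismBalancer {
    private final Exchanger<Integer> prism = new Exchanger<>();
    private final AtomicBoolean toggle = new AtomicBoolean(true);
    private final AtomicInteger ticket = new AtomicInteger();

    public int traverse() throws InterruptedException {
        int myTicket = ticket.getAndIncrement();
        try {
            int other = prism.exchange(myTicket, 10, TimeUnit.MILLISECONDS);
            // Partner found: the lower ticket goes up, the higher goes down,
            // splitting the pair evenly with no contention on the toggle bit.
            return myTicket < other ? 0 : 1;
        } catch (TimeoutException e) {
            while (true) {              // no partner: fall back to the toggle
                boolean t = toggle.get();
                if (toggle.compareAndSet(t, !t)) return t ? 0 : 1;
            }
        }
    }
}
```

The real prism is an array of such exchange slots, with threads picking a random slot so that many pairs can diffract in parallel; this single-slot sketch only shows the pairing rule.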
Elimination
At any point while traversing the tree, if a producer and a consumer collide, there is no need for them to diffract and continue traversing the tree: the producer can hand its item to the consumer, and both can leave the tree.
Using elimination-diffraction balancers
Let the array at each balancer be a diffraction-elimination array: if two producer (or two consumer) threads meet in the array, they leave on opposite wires without touching the toggle bit, since it would anyhow remain in its original state.
If a producer and a consumer meet, they eliminate, exchanging items.
If a producer or consumer does not manage to meet another thread in the array, it toggles the respective bit of the balancer and moves on.
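The producer-consumer elimination rule can be sketched with java.util.concurrent.Exchanger standing in for one slot of the paper's array (names are ours): a producer offers its item, a consumer offers null, and a producer that receives null back knows a consumer took its item.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EliminationSlot<T> {
    private final Exchanger<T> slot = new Exchanger<>();

    /** Producer: returns true if the item was taken by a consumer. */
    public boolean tryPut(T item, long ms) throws InterruptedException {
        try {
            return slot.exchange(item, ms, TimeUnit.MILLISECONDS) == null;
        } catch (TimeoutException e) {
            return false;   // nobody met us: fall through to the toggle bit
        }
    }

    /** Consumer: returns the eliminated item, or null on timeout. */
    public T tryGet(long ms) throws InterruptedException {
        try {
            return slot.exchange(null, ms, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        }
    }
}
```

Note the simplification: if two producers met in this slot they would swap items, whereas the ED-Tree detects same-type collisions and diffracts the pair onto opposite wires instead.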
What about low concurrency levels?
We show that elimination and diffraction techniques can be combined to work well at both high and low loads.
To ensure good performance at low loads we use several techniques that make the algorithm adapt to the current contention level.
Adaptation mechanisms
Use backoff in space: randomly choose a cell within a certain range of the array.
If the cell is busy (already occupied by two threads), increase the range and repeat.
Otherwise, spin and wait for a collision.
If timed out (no collision), decrease the range and repeat.
If a certain number of timeouts is reached, spin on the first cell of the array for a period, and then move on to the toggle bit and the next level.
If a further number of timeouts is reached, do not try to diffract on any of the next levels; go straight to the toggle bit.
Each thread remembers the last range it used at the current balancer and starts from this range next time.
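The range-adaptation bookkeeping alone can be sketched as follows (no array or collisions here; class and method names are ours): the range doubles when the chosen cell is busy, halves on a timeout, and is remembered per thread for the next visit to this balancer.

```java
public class AdaptiveRange {
    private final int max;   // size of the elimination array
    private final ThreadLocal<Integer> range =
        ThreadLocal.withInitial(() -> 1);

    public AdaptiveRange(int arraySize) { this.max = arraySize; }

    public int current()    { return range.get(); }
    public void onBusy()    { range.set(Math.min(max, range.get() * 2)); } // widen
    public void onTimeout() { range.set(Math.max(1, range.get() / 2)); }   // narrow
}
```

Under contention the range widens so threads spread over more cells; under low load it narrows so the few threads present are more likely to meet.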
Starvation avoidance
Threads that failed to eliminate and propagated all the way to the leaves can wait a long time for their requests to complete, while new threads entering the tree and eliminating finish faster.
To avoid starvation, we limit the time a thread can be blocked in the queues before it retries the whole traversal.
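The bounded-wait rule can be sketched with a timed poll on a leaf queue (illustrative, not the paper's code): instead of blocking forever, the consumer waits a bounded time and then gets another chance.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedWaitConsumer<T> {
    public T get(LinkedBlockingQueue<T> leaf, long boundMs)
            throws InterruptedException {
        while (true) {
            T item = leaf.poll(boundMs, TimeUnit.MILLISECONDS);
            if (item != null) return item;
            // Timed out: in the real pool we would re-traverse the whole
            // tree here, getting a fresh chance to eliminate with one of
            // the producers that entered in the meantime.
        }
    }
}
```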
Implementation
Each balancer is composed of an elimination array, a pair of toggle bits, and two references, one to each of its child nodes.

public class Balancer {
    ToggleBit producerToggle, consumerToggle;
    Exchanger[] eliminationArray;
    Balancer leftChild, rightChild;
    ThreadLocal<Integer> lastSlotRange;
}
Implementation
public class Exchanger {
    AtomicReference<ExchangerPackage> slot;
}

public class ExchangerPackage {
    Object value;
    State state; // WAITING / ELIMINATION / DIFFRACTION
    Type type;   // PRODUCER / CONSUMER
}
Implementation
Starting from the root of the tree:
Enter the balancer.
Choose a cell in the array and try to collide with another thread, using the backoff mechanism described earlier.
If a collision with another thread occurred:
If both threads are of the same type, each leaves to the next-level balancer, in a separate direction.
If the threads are of different types, they exchange values and leave.
Else (no collision), use the appropriate toggle bit and move to the next level.
When a leaf is reached, go to the corresponding queue and insert or remove an item according to the thread type.
Performance evaluation
Sun UltraSPARC T2 Plus multi-core machine: 2 processors, each with 8 cores; each core with 8 hardware threads; 64-way parallelism on a processor and 128-way parallelism across the machine.
Most of the tests were done on one processor, i.e., at most 64 hardware threads.
Performance evaluation
A tree with 3 levels and 8 queues. The queues are SynchronousBlocking, LinkedBlocking, or ConcurrentLinked, according to the pool specification.
[Figure: a three-level tree of balancers feeding the 8 queues]