
Building Scalable Producer-Consumer Pools based on Elimination-Diffraction Trees

Date post: 20-Aug-2015

Yehuda Afek and Guy Korland and Maria Natanzon and Nir Shavit

The Pool

Producer-consumer pools, that is, collections of unordered objects or tasks, are a fundamental element of modern multiprocessor software and a target of extensive research and development

[Figure: producers P1, P2, ..., Pn call Put(x), Put(y), Put(z) on the pool; consumers C1, C2, ..., Cn call Get().]

ED-Tree Pool

We present the ED-Tree, a distributed pool structure based on a combination of the elimination-tree and diffracting-tree paradigms, allowing high degrees of parallelism with reduced contention

Existing pools in Java JDK 6.0:

SynchronousQueue/Stack (Lea, Scott, and Scherer) - a pairing-up function without buffering. Producers and consumers wait for one another.

LinkedBlockingQueue - Producers put their value and leave; consumers wait for a value to become available.

ConcurrentLinkedQueue - Producers put their value and leave; consumers return null if the pool is empty.
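The three behaviours listed above can be observed directly with the JDK classes. The following is a minimal sketch, not part of the original slides; the class name JdkPoolSemantics and its method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

class JdkPoolSemantics {
    // ConcurrentLinkedQueue: a consumer never waits; poll() returns null when empty.
    static String pollEmptyConcurrentLinkedQueue() {
        ConcurrentLinkedQueue<String> pool = new ConcurrentLinkedQueue<>();
        return pool.poll();
    }

    // LinkedBlockingQueue: the producer deposits and leaves; a later consumer finds the item.
    static String putThenTakeLinkedBlockingQueue() throws InterruptedException {
        LinkedBlockingQueue<String> pool = new LinkedBlockingQueue<>();
        pool.put("x");      // returns at once because the queue is unbounded
        return pool.take(); // would block if the pool were empty
    }

    // SynchronousQueue: no buffering, so a non-blocking offer fails when no consumer waits.
    static boolean offerWithoutConsumer() {
        SynchronousQueue<String> pool = new SynchronousQueue<>();
        return pool.offer("x");
    }

    // SynchronousQueue: a producer and a consumer pair up and hand the item over directly.
    static String handOff() throws InterruptedException {
        SynchronousQueue<String> pool = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                pool.put("x"); // blocks until a consumer takes the item
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        String item = pool.take(); // blocks until the producer arrives
        producer.join();
        return item;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(pollEmptyConcurrentLinkedQueue());  // null
        System.out.println(putThenTakeLinkedBlockingQueue());  // x
        System.out.println(offerWithoutConsumer());            // false
        System.out.println(handOff());                         // x
    }
}
```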

Drawback

All these structures are based on a centralized structure, such as a lock-free queue or stack, and are therefore limited in their scalability: the head of the stack or queue is a sequential bottleneck and a source of contention.

Some Observations

A pool does not have to obey either LIFO or FIFO semantics.

Therefore, no centralized structure is needed to hold the items and to serve the requests of producers and consumers.

New approach

ED-Tree: a combined variant of the diffracting-tree structure (Shavit and Zemach) and the elimination-tree structure (Shavit and Touitou).

The basic idea: Use randomization to distribute the concurrent requests of threads onto many locations so that they collide with one another and can exchange values, thus avoiding a central place through which all threads pass.

The result: A pool that allows both parallelism and reduced contention.

A little history

Both diffraction and elimination were presented years ago and were claimed to be effective on the basis of simulation.

However, elimination trees and diffracting trees were never used to implement real-world structures.

Elimination and diffraction were never combined in a single data structure.

Diffraction trees

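[Figure: a balancer b receiving threads 1 to 5 on its input wire and sending them alternately to its two output wires.]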

A binary tree of objects called balancers [Aspnes-Herlihy-Shavit], each with a single input wire and two output wires.

Threads arrive at a balancer and it sends them alternately to the top and bottom wires, so its top wire always carries at most one more thread than the bottom one.
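A single balancer can be sketched as one atomically flipped bit. This is an illustrative sketch only; the class name ToggleBalancer and the choice of AtomicBoolean with a compare-and-set loop are assumptions, not the authors' code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ToggleBalancer {
    // false: the next thread exits on the top wire (0); true: on the bottom wire (1).
    private final AtomicBoolean toggle = new AtomicBoolean(false);

    // Atomically flips the bit and returns the wire selected by the old value.
    int traverse() {
        while (true) {
            boolean old = toggle.get();
            if (toggle.compareAndSet(old, !old)) {
                return old ? 1 : 0;
            }
        }
    }

    // Sends n tokens through a fresh balancer and counts the exits per wire.
    static int[] countExits(int n) {
        ToggleBalancer balancer = new ToggleBalancer();
        int[] exits = new int[2];
        for (int i = 0; i < n; i++) {
            exits[balancer.traverse()]++;
        }
        return exits;
    }

    public static void main(String[] args) {
        int[] exits = countExits(5);
        System.out.println("top=" + exits[0] + " bottom=" + exits[1]); // top=3 bottom=2
    }
}
```

With five tokens the top wire carries three and the bottom wire two, which is the "at most one more" guarantee stated above.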

Diffraction trees

In any quiescent state (when there are no threads in the tree), the tree preserves the step property: the output items are balanced so that the top leaves have output at most one more element than the bottom ones, and there are no gaps.

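[Figure: a tree of seven balancers (b) with eight output wires. Tokens 1 to 10 enter at the root; wires 1 to 8 each output one token, and tokens 9 and 10 wrap around onto the top two wires.]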

[Shavit-Zemach]
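The step property can be checked with a small sequential simulation. The sketch below is an assumption-laden illustration: the class name CountingTree, the heap-style array of toggles, and the wire numbering (the top subtree feeds the even-numbered wires) are choices made here, and the code is deliberately not thread-safe.

```java
import java.util.Arrays;

class CountingTree {
    private final int depth;
    // One toggle per balancer in heap layout: children of node i are 2i+1 (top) and 2i+2 (bottom).
    private final boolean[] toggles;

    CountingTree(int depth) {
        this.depth = depth;
        this.toggles = new boolean[(1 << depth) - 1];
    }

    // Sequential root-to-leaf traversal; returns the index of the output wire reached.
    int traverse() {
        int node = 0;
        int wire = 0;
        for (int level = 0; level < depth; level++) {
            int dir = toggles[node] ? 1 : 0;
            toggles[node] = !toggles[node];
            wire |= dir << level;      // the top subtree feeds the even wires
            node = 2 * node + 1 + dir;
        }
        return wire;
    }

    // Sends the given number of tokens through a fresh tree and counts tokens per wire.
    static int[] distribute(int depth, int tokens) {
        CountingTree tree = new CountingTree(depth);
        int[] counts = new int[1 << depth];
        for (int i = 0; i < tokens; i++) {
            counts[tree.traverse()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Ten tokens through a three-level tree with eight output wires.
        System.out.println(Arrays.toString(distribute(3, 10))); // [2, 2, 1, 1, 1, 1, 1, 1]
    }
}
```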

Diffraction trees

Connect each output wire to a lock-free queue.

To perform a push, threads traverse the balancers from the root to the leaves and then push the item onto the appropriate queue. To perform a pop, threads traverse the balancers from the root to the leaves and then pop from the appropriate queue, blocking if the queue is empty.

[Figure: a tree of seven balancers (b).]
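A tree-based pool along these lines might look as follows. This is a sketch under stated assumptions: the class name DiffractingTreePool is hypothetical, each balancer gets separate push and pop toggles (mirroring the pair of toggle bits in the Balancer class shown later in this talk), and the leaves are ConcurrentLinkedQueue instances, so get() returns null on an empty leaf instead of blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

class DiffractingTreePool<T> {
    private final int depth;
    private final AtomicInteger[] pushToggles; // heap layout, one counter per balancer
    private final AtomicInteger[] popToggles;
    private final List<ConcurrentLinkedQueue<T>> leaves = new ArrayList<>();

    DiffractingTreePool(int depth) {
        this.depth = depth;
        int balancers = (1 << depth) - 1;
        pushToggles = new AtomicInteger[balancers];
        popToggles = new AtomicInteger[balancers];
        for (int i = 0; i < balancers; i++) {
            pushToggles[i] = new AtomicInteger();
            popToggles[i] = new AtomicInteger();
        }
        for (int i = 0; i < (1 << depth); i++) {
            leaves.add(new ConcurrentLinkedQueue<>());
        }
    }

    // Walks from the root to a leaf; the low bit of each counter acts as the toggle bit.
    private int traverse(AtomicInteger[] toggles) {
        int node = 0;
        for (int level = 0; level < depth; level++) {
            int dir = toggles[node].getAndIncrement() & 1;
            node = 2 * node + 1 + dir;
        }
        return node - ((1 << depth) - 1); // convert heap index to leaf index
    }

    void put(T item) {
        leaves.get(traverse(pushToggles)).add(item);
    }

    // Non-blocking variant: returns null if the leaf reached is empty.
    T get() {
        return leaves.get(traverse(popToggles)).poll();
    }

    int leafSize(int leaf) {
        return leaves.get(leaf).size();
    }

    public static void main(String[] args) {
        DiffractingTreePool<Integer> pool = new DiffractingTreePool<>(3);
        for (int i = 0; i < 10; i++) pool.put(i);
        for (int i = 0; i < 10; i++) System.out.print(pool.get() + " ");
        System.out.println();
    }
}
```

Because pushes and pops use separate toggles but follow the same pattern, the k-th pop in a sequential run reaches the leaf that received the k-th push.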

Diffraction trees

[Figure: threads 1, 2 and 3 traversing the tree; every balancer (b) holds a toggle bit (0/1).]

Problem: Each toggle bit is a hot spot.

Diffraction trees

Observation: If an even number of threads pass through a balancer, the outputs are evenly balanced on the top and bottom wires, but the balancer's state remains unchanged.

The approach: Add a diffraction array in front of each toggle bit.

[Figure: a toggle bit (0/1) with prism arrays in front of it.]

Elimination

At any point while traversing the tree, if a producer and a consumer collide, there is no need for them to diffract and continue traversing the tree.

The producer can hand its item to the consumer, and both can leave the tree.

Adding elimination

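[Figure: an array of k cells in front of a toggle bit (0/1); Put(x) and Get() collide in the array, Get() returns x and Put(x) returns ok.]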

Using elimination-diffraction balancers

Let the array at each balancer be a diffraction-elimination array:

If two producer (or two consumer) threads meet in the array, they leave on opposite wires without touching the bit, since it would remain in its original state anyway.

If a producer and a consumer meet, they eliminate, exchanging items.

If a producer or consumer call does not manage to meet another in the array, it toggles the respective bit of the balancer and moves on.
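The collision rules above can be sketched for a single array cell. Everything here is illustrative: the names EdSlotSketch, Request, tryWait, tryCollide and cancel are hypothetical, and the spinning, timeouts and wire selection of a full implementation are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

class EdSlotSketch {
    enum Type { PRODUCER, CONSUMER }
    enum State { WAITING, ELIMINATION, DIFFRACTION }

    static final class Request {
        final Type type;
        volatile Object value;                // producer's item; set for a consumer on elimination
        volatile State state = State.WAITING;
        Request(Type type, Object value) { this.type = type; this.value = value; }
    }

    private final AtomicReference<Request> slot = new AtomicReference<>();

    // First arriver: publish the request in an empty cell. Returns false if the cell is occupied.
    boolean tryWait(Request mine) {
        return slot.compareAndSet(null, mine);
    }

    // A waiter that timed out withdraws its request. A false result means a partner
    // has already claimed it, so the waiter should read its state instead.
    boolean cancel(Request mine) {
        return slot.compareAndSet(mine, null);
    }

    // Second arriver: claim the waiting request and decide the outcome for both threads.
    // Returns null if nobody was waiting or another thread claimed the request first.
    State tryCollide(Request mine) {
        Request other = slot.get();
        if (other == null || !slot.compareAndSet(other, null)) {
            return null;
        }
        if (other.type == mine.type) {
            // Same type: both diffract onto opposite wires; the toggle bit is untouched.
            mine.state = State.DIFFRACTION;
            other.state = State.DIFFRACTION;
        } else {
            // Different types: the producer's item is handed to the consumer.
            if (mine.type == Type.PRODUCER) other.value = mine.value;
            else mine.value = other.value;
            mine.state = State.ELIMINATION;
            other.state = State.ELIMINATION; // written last so the waiter sees the value
        }
        return mine.state;
    }

    public static void main(String[] args) {
        EdSlotSketch cell = new EdSlotSketch();
        Request producer = new Request(Type.PRODUCER, "x");
        Request consumer = new Request(Type.CONSUMER, null);
        cell.tryWait(producer);
        System.out.println(cell.tryCollide(consumer) + " " + consumer.value); // ELIMINATION x
    }
}
```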

ED-tree

What about low concurrency levels?

We show that elimination and diffraction techniques can be combined to work well at both high and low loads.

To ensure good performance under low loads we use several techniques that make the algorithm adapt to the current contention level.

Adaptation mechanisms

Use backoff in space:
    Randomly choose a cell in a certain range of the array.
    If the cell is busy (already occupied by two threads), increase the range and repeat.
    Else spin and wait for a collision.
        If timed out (no collision), decrease the range and repeat.
    If a certain number of timeouts is reached, spin on the first cell of the array for a period, and then move on to the toggle bit and the next level.
    If a certain number of timeouts was reached, don't try to diffract on any of the next levels; just go straight to the toggle bit.

Each thread remembers the last range it used at the current balancer and starts from this range the next time.
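The range bookkeeping in the steps above can be sketched as a small per-thread helper. The slides say only that the range is increased or decreased; doubling and halving, the class name AdaptiveRange, and its method names are assumptions made for this illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

class AdaptiveRange {
    private final int arraySize;
    private int range; // remembered between visits to the same balancer

    AdaptiveRange(int arraySize, int initialRange) {
        this.arraySize = arraySize;
        this.range = Math.max(1, Math.min(arraySize, initialRange));
    }

    // Randomly choose a cell within the current range of the array.
    int pickCell() {
        return ThreadLocalRandom.current().nextInt(range);
    }

    // The chosen cell was busy: contention is high, so spread out (assumed: double).
    void onBusyCell() {
        range = Math.min(arraySize, range * 2);
    }

    // Nobody arrived before the timeout: contention is low, so narrow down (assumed: halve).
    void onTimeout() {
        range = Math.max(1, range / 2);
    }

    int range() {
        return range;
    }

    public static void main(String[] args) {
        AdaptiveRange r = new AdaptiveRange(8, 2);
        r.onBusyCell();
        r.onBusyCell();
        r.onBusyCell();
        System.out.println(r.range()); // 8, capped at the array size
        r.onTimeout();
        System.out.println(r.range()); // 4
    }
}
```

In the Balancer class of the talk, the remembered value would correspond to the lastSlotRange thread-local.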

Starvation avoidance

Threads that failed to eliminate and propagated all the way to the leaves can wait a long time for their requests to complete, while new threads that enter the tree and eliminate finish faster.

To avoid starvation we limit the time a thread can be blocked in the queues before it retries the whole traversal.
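One way to express this bounded wait is a timed poll on the leaf queue followed by a fresh traversal. The sketch below is an assumption, not the authors' code: RetryingGet, the traverse supplier standing in for a root-to-leaf pass, and the maxAttempts bound (added so the demonstration terminates) are all hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class RetryingGet {
    // Waits at most timeoutMillis at the leaf reached, then repeats the whole traversal.
    static <T> T get(Supplier<LinkedBlockingQueue<T>> traverse,
                     long timeoutMillis, int maxAttempts) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            LinkedBlockingQueue<T> leaf = traverse.get(); // hypothetical root-to-leaf pass
            T item = leaf.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (item != null) {
                return item;
            }
            // Timed out: fall through and re-enter the tree from the root.
        }
        return null;
    }

    // The first traversal lands on an empty leaf, the second on a leaf holding "x".
    static String demo() throws InterruptedException {
        LinkedBlockingQueue<String> emptyLeaf = new LinkedBlockingQueue<>();
        LinkedBlockingQueue<String> fullLeaf = new LinkedBlockingQueue<>();
        fullLeaf.add("x");
        AtomicInteger traversals = new AtomicInteger();
        String item = RetryingGet.<String>get(
                () -> traversals.getAndIncrement() == 0 ? emptyLeaf : fullLeaf, 10, 4);
        return item + " after " + traversals.get() + " traversals";
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // x after 2 traversals
    }
}
```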

Implementation

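Each balancer is composed of an elimination array, a pair of toggle bits, and two references, one to each of its child nodes.

    public class Balancer {
        ToggleBit producerToggle, consumerToggle; // one toggle bit per request type
        Exchanger[] eliminationArray;             // cells in which threads try to collide
        Balancer leftChild, rightChild;           // next-level balancers
        ThreadLocal<Integer> lastSlotRange;       // range this thread last used at this balancer
    }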

Implementation

    import java.util.concurrent.atomic.AtomicReference;

    public class Exchanger {
        AtomicReference<ExchangerPackage> slot;
    }

    public class ExchangerPackage {
        Object value;
        State state; // WAITING/ELIMINATION/DIFFRACTION
        Type type;   // PRODUCER/CONSUMER
    }

Implementation

Starting from the root of the tree:
    Enter the balancer.
    Choose a cell in the array and try to collide with another thread, using the backoff mechanism described earlier.
    If a collision with another thread occurred:
        If both threads are of the same type, leave to the next-level balancer (each in a separate direction).
        If the threads are of different types, exchange values and leave.
    Else (no collision), use the appropriate toggle bit and move to the next level.
    If one of the leaves is reached, go to the appropriate queue and insert/remove an item according to the thread type.

Performance evaluation

Sun UltraSPARC T2 Plus multi-core machine: 2 processors, each with 8 cores, each core with 8 hardware threads. This gives 64-way parallelism on one processor and 128-way parallelism across the machine.

Most of the tests were done on one processor, i.e. with at most 64 hardware threads.

Performance evaluation

A tree with 3 levels and 8 queues. The queues are SynchronousBlocking/LinkedBlocking/ConcurrentLinked, according to the pool specification.

[Figure: a tree of seven balancers (b).]

Performance evaluation: Synchronous stack of Lea et al. vs. ED synchronous pool

Performance evaluation: Linked blocking queue vs. ED blocking pool

Performance evaluation: Concurrent linked queue vs. ED non-blocking pool

Adding a delay between accesses to the pool

32 consumers, 32 producers

Changing percentage of consumers vs. total number of threads

64 threads

25% producers, 75% consumers

Elimination rate

Elimination range

