
Håkan Sundell, [email protected]

Chalmers University of Technology

NOBLE: A Non-Blocking Inter-Process Communication Library

Håkan Sundell

Philippas Tsigas

Computing Science

Chalmers University of Technology

Systems

• Multi-processor systems: cache-coherent shared memory
– UMA (uniform memory access)
– NUMA (non-uniform memory access)

• Desktop computers

Synchronization

• A significant part of the work performed by today’s parallel applications is spent on synchronization

• Mutual exclusion (locks)
– Blocking
– Convoy effects
– Deadlocks

Convoy effects

• The slowdown of one process may cause the whole system to slow down

Research

• Non-blocking synchronization has been researched since the 1970s
– Lock-free
– Wait-free

• Non-blocking methods are based on the usage of
– atomic synchronization primitives
– shared memory

Non-blocking Synchronization

• Lock-Free Synchronization
– Retries until the operation is not interfered with by other operations
– Usually detects interference by using some kind of shared variable indicating a busy state or similar
– Guarantees liveness (some operation always makes progress) but not starvation-freedom

Change a flag to a unique value, or remember the current state

... do the operation while preserving the active structure ...

Check for the same value or state and then validate the changes, otherwise retry
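The three steps above can be sketched with C11 atomics as the push operation of a lock-free stack. This is the well-known Treiber stack, shown only to illustrate the pattern and not taken from NOBLE. The names `lf_stack`, `lf_node`, `lf_stack_init` and `lf_stack_push` are hypothetical. `atomic_compare_exchange_weak` performs the "check for the same value, otherwise retry" step: if another thread changed the top in the meantime, the CAS fails and reloads the current top, and the loop tries again.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical node and stack types, for illustration only. */
typedef struct lf_node {
    void *item;
    struct lf_node *next;
} lf_node;

typedef struct {
    _Atomic(lf_node *) top;
} lf_stack;

static void lf_stack_init(lf_stack *s) {
    atomic_init(&s->top, NULL);
}

/* Lock-free push: returns 0 on success, -1 if allocation fails. */
static int lf_stack_push(lf_stack *s, void *item) {
    lf_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->item = item;

    /* 1. Remember the current state (the current top). */
    lf_node *old_top = atomic_load(&s->top);
    do {
        /* 2. Prepare the change without disturbing the shared structure:
              the new node stays private until the CAS publishes it. */
        node->next = old_top;
        /* 3. Publish only if top still equals old_top; on failure the CAS
              stores the current top into old_top and the loop retries. */
    } while (!atomic_compare_exchange_weak(&s->top, &old_top, node));
    return 0;
}
```

A matching pop is deliberately left out: a lock-free pop must also deal with the ABA problem and with safe reclamation of removed nodes, which makes it considerably harder to get right.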

Non-blocking Synchronization

• Wait-free synchronization
– All concurrent operations can proceed independently of the others.
– Every process always finishes the protocol in a bounded number of steps, regardless of interleaving.
– No starvation
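Wait-freedom can be illustrated with a single-producer/single-consumer ring buffer. This is Lamport's classic construction, used here only as an example; it is not one of the data structures listed for NOBLE. Neither operation contains a retry loop. Each one finishes in a constant number of its own steps, whatever the other thread is doing, and reports "full" or "empty" instead of waiting. The names `spsc_queue`, `spsc_init`, `spsc_enqueue`, `spsc_dequeue` and the capacity `SPSC_CAPACITY` are hypothetical. The guarantee also rests on three assumptions: exactly one producer thread, exactly one consumer thread, and an `atomic_size_t` that is lock-free on the target.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SPSC_CAPACITY 8 /* hypothetical size; one slot is always kept empty */

typedef struct {
    void *slots[SPSC_CAPACITY];
    atomic_size_t head; /* next slot to read; written only by the consumer */
    atomic_size_t tail; /* next slot to write; written only by the producer */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer only. Returns false if the queue is full; never waits. */
static bool spsc_enqueue(spsc_queue *q, void *item) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t next = (tail + 1) % SPSC_CAPACITY;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false; /* full */
    q->slots[tail] = item;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

/* Consumer only. Returns false if the queue is empty; never waits. */
static bool spsc_dequeue(spsc_queue *q, void **item) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false; /* empty */
    *item = q->slots[head];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&q->head, (head + 1) % SPSC_CAPACITY,
                          memory_order_release);
    return true;
}
```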

Practice

• Non-blocking synchronization is still not used in many practical applications

• Non-blocking solutions are often
– complex
– based on non-standard or unclear interfaces
– impractical

• Many results show that non-blocking synchronization significantly improves the performance of parallel applications…

??

Non-blocking Synchronization – Practice

• P. Tsigas, Y. Zhang, “Evaluating the Performance of Non-Blocking Synchronization on Modern Shared Memory Multiprocessors”, ACM SIGMETRICS 2001

NOBLE: Brings Non-blocking closer to Practice

• Schedule
– Goals
– Design
– Examples
– Experiments
– Status
– Conclusions and Future work

Goals

• Create a non-blocking inter-process communication interface that has these properties:
– Attractive functionality
– Programmer friendly
– Easy to adapt existing solutions
– Efficient
– Portable
– Adaptable for different programming languages

Design: Attractive functionality

• Data structures for multi-threaded usage
– Queues: enqueue and dequeue
– Stacks: push and pop
– Singly linked lists: first, next, insert, delete and read
– Snapshots: update and scan

• Data structures for multi-process usage
– Shared Register: read and write

• Clear specifications

Design: Programmer friendly

• Hide the complexity as much as possible!

• Just one include file:
#include <Noble.h>

• Simple naming convention: every function name begins with the characters NBL
NBLQueueEnqueue()
NBLQueueDequeue()
…

Design: Easy to adapt solutions

• Support lock-based as well as non-blocking solutions.

• Several different create functions:
NBLQueue *NBLQueueCreateLF();
NBLQueue *NBLQueueCreateLB();

• Unified functions for the operations, independent of the synchronization method:
NBLQueueFree(handle);
NBLQueueEnqueue(handle,item);
NBLQueueDequeue(handle);

Design: Efficient

• Function pointers are used to minimize overhead

• In-line redirection through macros

typedef struct NBLQueue {
    void *data;
    void (*free)(void *data);
    void (*enqueue)(void *data, void *item);
    void *(*dequeue)(void *data);
} NBLQueue;

#define NBLQueueFree(handle) ((handle)->free((handle)->data))
#define NBLQueueEnqueue(handle,item) ((handle)->enqueue((handle)->data,(item)))
#define NBLQueueDequeue(handle) ((handle)->dequeue((handle)->data))
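The following self-contained sketch shows how a handle and these macros could fit together. It wires one possible backend into the `NBLQueue` struct: a small bounded FIFO queue protected by a test-and-set spin lock. The backend names (`ExampleQueueCreateLB`, `tas_queue` and its functions) and the fixed capacity are hypothetical illustrations, not NOBLE's actual implementation. The following choices are also assumptions of this sketch:

- `NBLQueueFree` passes only `handle->data` to the backend, so the handle is placed inside the same allocation as the queue state. One `free` then releases both.
- Enqueue silently drops items when the queue is full, because the `enqueue` pointer returns `void`.
- Dequeue returns `NULL` when the queue is empty.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Handle and macros as defined above. */
typedef struct NBLQueue {
    void *data;
    void (*free)(void *data);
    void (*enqueue)(void *data, void *item);
    void *(*dequeue)(void *data);
} NBLQueue;

#define NBLQueueFree(handle) ((handle)->free((handle)->data))
#define NBLQueueEnqueue(handle, item) ((handle)->enqueue((handle)->data, (item)))
#define NBLQueueDequeue(handle) ((handle)->dequeue((handle)->data))

/* Hypothetical lock-based backend: bounded ring buffer + TAS spin lock. */
#define TAS_QUEUE_CAPACITY 16

typedef struct {
    NBLQueue handle; /* handle and state share one allocation */
    atomic_flag lock;
    void *slots[TAS_QUEUE_CAPACITY];
    size_t head, count;
} tas_queue;

static void tas_lock(atomic_flag *f) {
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void tas_unlock(atomic_flag *f) {
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void tas_queue_free(void *data) {
    free(data); /* also releases the handle, which lives inside data */
}

static void tas_queue_enqueue(void *data, void *item) {
    tas_queue *q = data;
    tas_lock(&q->lock);
    if (q->count < TAS_QUEUE_CAPACITY) { /* when full, the item is dropped */
        q->slots[(q->head + q->count) % TAS_QUEUE_CAPACITY] = item;
        q->count++;
    }
    tas_unlock(&q->lock);
}

static void *tas_queue_dequeue(void *data) {
    tas_queue *q = data;
    void *item = NULL; /* NULL signals "empty" in this sketch */
    tas_lock(&q->lock);
    if (q->count > 0) {
        item = q->slots[q->head];
        q->head = (q->head + 1) % TAS_QUEUE_CAPACITY;
        q->count--;
    }
    tas_unlock(&q->lock);
    return item;
}

/* Hypothetical create function in the style of NBLQueueCreateLB(). */
static NBLQueue *ExampleQueueCreateLB(void) {
    tas_queue *q = malloc(sizeof *q);
    if (q == NULL)
        return NULL;
    atomic_flag_clear(&q->lock);
    q->head = 0;
    q->count = 0;
    q->handle.data = q;
    q->handle.free = tas_queue_free;
    q->handle.enqueue = tas_queue_enqueue;
    q->handle.dequeue = tas_queue_dequeue;
    return &q->handle;
}
```

A lock-free backend would plug in the same way. It would fill the same four fields from a different create function, so calling code that uses only the macros does not change.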

Design: Portable

Noble.h (exported definitions, identical on all platforms):
#define NBL...
#define NBL...
#define NBL...

Platform independent:
QueueLF.c: #include "Platform/Primitives.h" …
StackLF.c: #include "Platform/Primitives.h" …
. . .

Platform dependent:
SunHardware.asm: CAS, TAS, Spin-Locks…
IntelHardware.asm: CAS, TAS, Spin-Locks…
. . .
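The layering above can be sketched as a small primitives header. Platform-specific code provides CAS and TAS, and platform-independent code (here a spin lock) is written only against those primitives. This is an assumption-based illustration, not NOBLE's actual Primitives.h. The `prim_` names are hypothetical. Compiler atomics stand in for the hand-written assembly files: C11 `<stdatomic.h>` where available, with the GCC `__sync` builtins as a fallback.

```c
#include <stdbool.h>
#include <stdint.h>

/* ---- Platform dependent part (hypothetical) ---- */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>

typedef _Atomic intptr_t prim_word;

static inline void prim_init(prim_word *w, intptr_t v) { atomic_init(w, v); }

/* CAS: set *w to desired only if it still equals expected. */
static inline bool prim_cas(prim_word *w, intptr_t expected, intptr_t desired) {
    return atomic_compare_exchange_strong(w, &expected, desired);
}

/* TAS: set *w to 1 and return its previous value. */
static inline intptr_t prim_tas(prim_word *w) { return atomic_exchange(w, 1); }

static inline void prim_clear(prim_word *w) { atomic_store(w, 0); }

#elif defined(__GNUC__)
typedef volatile intptr_t prim_word;

static inline void prim_init(prim_word *w, intptr_t v) { *w = v; }

static inline bool prim_cas(prim_word *w, intptr_t expected, intptr_t desired) {
    return __sync_bool_compare_and_swap(w, expected, desired);
}

static inline intptr_t prim_tas(prim_word *w) {
    return __sync_lock_test_and_set(w, 1);
}

static inline void prim_clear(prim_word *w) { __sync_lock_release(w); }

#else
#error "No atomic primitives available for this platform"
#endif

/* ---- Platform independent part: uses only the primitives above ---- */
static inline void prim_spin_lock(prim_word *lock) {
    while (prim_tas(lock) != 0)
        ; /* busy-wait until the previous value was 0 (unlocked) */
}

static inline void prim_spin_unlock(prim_word *lock) { prim_clear(lock); }
```

Porting to a new platform then means supplying only the platform-dependent part. The spin lock, and any data structure built on CAS and TAS, stays unchanged.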

Design: Adaptable for different programming languages

• Implemented in C, all compiled into a library file.

• C++-compatible include files, and easy to make C++ wrappers:
class NOBLEQueue {
private:
    NBLQueue* queue;
public:
    NOBLEQueue(int type) {
        if(type==NBL_LOCKFREE) queue=NBLQueueCreateLF();
        else …
    }
    ~NOBLEQueue() {NBLQueueFree(queue);}
    inline void Enqueue(void *item) {NBLQueueEnqueue(queue,item);}
    ...

Examples

• First create a global variable handling the shared data object, for example a stack:
Globals:
#include <noble.h>
...
NBLStack* stack;

• Create the stack with the appropriate implementation:
Main:
stack=NBLStackCreateLF(10000);
...

• When some thread wants to do some operation:
Threads:
NBLStackPush(stack, item);
or
item=NBLStackPop(stack);

• When the data structure is not in use anymore:
Main:
...
NBLStackFree(stack);

Examples

Globals:
#include <noble.h>
...
NBLStack* stack;

Main:
stack=NBLStackCreateLB();
...
NBLStackFree(stack);

Threads:
NBLStackPush(stack, item);
or
item=NBLStackPop(stack);

• To change the synchronization mechanism, only one line of code has to be changed: the create call (NBLStackCreateLF(10000) becomes NBLStackCreateLB()).

Experiment

• A set of 50,000 random operations performed by multiple threads on each data structure, with either low or high contention

• Comparing the different synchronization mechanisms and implementations available

• Number of threads varied from 1 to 30

• Performed on multiprocessors:
– Sun Enterprise 10000 with 64 CPUs, Solaris
– Compaq PC with 2 CPUs, Win32

Experiments: Linked List

• Lock-Free nr. 1: J. Valois, “Lock-Free Data Structures”, Ph.D. thesis, 1995.

• Lock-Free nr. 2: T. Harris, “A Pragmatic Implementation of Non-Blocking Linked Lists”, Symposium on Distributed Computing, 2001.

• Lock-Based: spin locks (Test-And-Set).

[Figure: Experiments: Linked List (high contention)]

[Figure: Experiments: Linked List (low contention)]

[Figure: Experiments: Linked List (high contention), varying number of threads]

Experiments: Queues

• Lock-Free nr. 1: J. Valois, “Lock-Free Data Structures”, Ph.D. thesis, 1995.

• Lock-Free nr. 2: P. Tsigas, Y. Zhang, “A Simple, Fast and Scalable Non-Blocking Concurrent FIFO queue for Shared Memory Multiprocessor Systems”, ACM SPAA’01, 2001.

• Lock-Based: spin locks (Test-And-Set).

[Figure: Experiments: Queues (high contention)]

[Figure: Experiments: Queues (low contention)]

[Figure: Experiments: Queues (high contention), varying number of threads]

Status

• Multiprocessor support
– Sun Solaris (Sparc)
– Win32 (Intel x86)
– SGI (Mips): testing phase
– Linux (Intel x86): testing phase

• Extensive manual

• Web site up and running: http://www.cs.chalmers.se/~noble

Conclusions and Future work

• NOBLE: easy to use, efficient and portable

• In these experiments, non-blocking protocols always performed better than or similar to lock-based ones, especially on multiprocessor systems.

• To do:
– Use in real parallel applications
– Extend with more shared data object implementations
– Extend to other platforms, especially ones suitable for real-time systems

