UBI529 Distributed Algorithms
Instructor : Kayhan ErciyesCourse : T 13:30-16:30Office Hour : R 13:30-16:30http : //www.ube.ege.edu.tr/~erciyes/UBI529
2
Distributed Systems
Set of computers connected by a communication network. Gives the user an illusion of a single comp Old platform : Usually a number of WSs over a LAN Now, ranges from a LAN to a sensor network to a mobile network Each node in a DS :
is autonomous communicates by messages needs to synchronize with others
– To achieve a common goal (load balancing, fault tolerance, an application..)
3
Distributed Systems
A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and communicating through an unreliable communication medium.
Our interest in distributed systems in this course is about
Study of algorithms rather than systems We will model the distributed system as a graph G(V,E) where
– V is the set of nodes (processors, process on a device)– E is the set of edges (communication links, wired or
wireless)
4
Modern Distributed Applications
Collaborative computing Military command and control Shared white-board, shared editor, etc. Online strategy games
Stock Market
Distributed Real-time Systems Process Control Navigation systems, Airline Traffic Monitoring (ATM) in the U.S.
– largest DRTS
Mobile Ad hoc Networks
Rescue Operations, emergency operations, robotics
Wireless Sensor Networks
Habitat monitoring, intelligent farmingGrid
5
The Internet (Internet Mapping Project, color coded by ISPs)
6
Distributed Systems: Architecture
7
Issues in Building Distributed Applications
Reliable communication
Consistency same picture of game, same shared file
Fault-tolerance, high-availability failures, recoveries, partitions, merges
Scalability
How is the performance affected as the number of nodes increase ?
Performance What is the complexity of the designed algorithm ?
8
Distributed Real-time Systems
Correct and timely operation is imperative
Failure to respond within a deadline may result loss of lives and property
process control systems, navigation systems, nuclear plants, patient monitoring systems, Airline Traffic Control (ATC)
Hard real-time
Soft real-time banking systems, airline reservation
9
Distributed Real-time Systems : Architecture
10
Future : Mobile Distributed (and Real-Time ??) Computing Systems
Wireless data communications Mobile revolution is inevitable
Two important distributed algorithms Applications :- Mobile ad hoc Networks (MANETs)- Wireless Sensor Networks (WSNs)
Can we still apply DS principles ?
Problems : Location is dynamically changing info Security issues Limited storage on mobile hosts
11
Uncertainty in Distributed Systems
Uncertainty comes from differing processor speeds varying communication delays (partial) failures multiple input streams and interactive behavior
Uncertainty makes it hard to be confident that system is correctTo address this difficulty:
identify and abstract fundamental problems state problems precisely design algorithms to solve problems prove correctness of algorithms analyze complexity of algorithms (e.g., time, space, messages) prove impossibility results and lower bounds
12
Application Areas
These areas have provided classic problems in distributed/concurrent computing:
operating systems (distributed) database systems software fault-tolerance communication networks multiprocessor architectures
13
Course Overview
Part I : Distributed Graph Algorithms (appr. 5 weeks) :
Models of Distributed Computing Graph Theory and Algorithms Review Vertex and Tree Cover Distributed Tree based Communications Distributed MST Distributed Path Traversals Matching Independent Sets and Dominating Sets Clustering Graph Connectivity
14
Course Overview
Part II : Fundamental Algorithms (approx. 6 weeks):
Time Synchronization Distributed Mutual Exclusion Election Algorithms Global State Termination Detection Synchronizers Message Ordering
15
Course Overview
Part III : Fault Tolerance (approx. 3 weeks)
Self-stabilization Consensus Failure Detectors Recovery
For each topic :
describe algorithms, usually starting from a sequentail one and then describe well-known distributed algorithms
analyze the cost of these algorithms explore limitations
Also mention applications that use the techniques
16
Distributed Algorithms : A Perspective (K. Erciyes 2007)
17
A message passing model
System topology is a graph G = (V, E), where
V = set of nodes (sequential processes) E = set of edges (links or channels,
bi/unidirectional)
Four types of actions by a process:- Internal action -input action- Communication action -output action
18
A Reliable FIFO channel
Axiom 1. Message m sent message m received
Axiom 2. Message propagation delay is arbitrary but finite.
Axiom 3. m1 sent before m2 m1 received before m2.
P
Q
19
Life of a process
When a message m is rec’d
Evaluate a predicate with m and the local variables;
2. if predicate = true then- update internal variables;- send zero or more messages; else skip {do nothing}end if
20
Synchrony vs. Asynchrony
Synchronism may occur at various levels
Send & receive can be blocking or non-blocking
Postal communication is asynchronous:
Telephone communication is synchronous
Synchronous communication or not?
Remote Procedure Call,Email
Synchronous
clocks
Physical clocks are
synchronized
Synchronous processes Lock-step synchrony
Synchronous channels Bounded delay
Synchronous message-
order
First-in first-out channels
Synchronous
communication
Communication via
handshaking
21
Shared memory model
Address spaces of processes overlap
M1
1 32
4
M2
Concurrent operations on a shared variable are serialized
22
Modeling wireless networks
Communication via broadcastLimited rangeDynamic topologyCollision of broadcasts
(handled by CSMA/CA)
0
1
2
34
5
6
0
1
2
34
5
6
(a)
(b)
RTS RTS
CTS
23
Weak vs. Strong Models
One object (or operation) of a strong model = More than one objects (or operations) of a weaker model.
Often, weaker models are synonymous with fewer restrictions.
One can add layers (additional restrictions) to create a stronger model from weaker one.
ExamplesHLL model is stronger than assembly language model.
Asynchronous is weaker than synchronous.
Bounded delay is stronger than unbounded delay (channel)
24
Model transformation
Stronger models - simplify reasoning, but - needs extra work to implement
Weaker models - are easier to implement. - Have a closer relationship with the real world
“Can model X be implemented using model Y?” is an interesting question in computer science.
Sample problems
Non-FIFO to FIFO channelMessage passing to shared memoryNon-atomic broadcast to atomic broadcast
25
Non-FIFO to FIFO channel
P Q
buffer
m1m4m3m2
26
Non-FIFO to FIFO channel
{Sender process P} {Receiver process Q}var i : integer {initially 0} var k : integer {initially 0}
buffer: buffer[0..∞] of msg {initially k: buffer [k] = empty
repeat repeat send m[i],i to Q; {STORE} receive m[i],i from P; i := i+1 store m[i] into buffer[i];
forever {DELIVER} while buffer[k] ≠ empty do
begin deliver content of buffer [k]; buffer [k] := empty k :=
k+1; end
forever
27
Other classifications of models
Reactive vs Transformational systems
A reactive system never sleeps (like: a server)A transformational (or non-reactive systems) reaches a fixed point after which no further change occurs in the system (Examples?)
Named vs Anonymous systems
In named systems, process id is a part of the algorithm. In anonymous systems, it is not so. All are equal.
(-) Symmetry breaking is often a challenge.(+) Easy to switch one process by another with no side effect. Saves log N bits.
28
Configuration
Vector of processor states (including outbufs, i.e., channels), one per processor, is a configuration of the systemCaptures current snapshot of entire system: accessible processor states (local vars + incoming msg queues) as well as communication channels.
29
Termination
For technical reasons, admissible executions are defined as infinite.But often algorithms terminate.
To model algorithm termination, identify terminated states of processors: states which, once entered, are never left
Execution has terminated when all processors are terminated and no messages are in transit (in inbufs or outbufs)
30
Finite State Machines for Modelling Distributed Algorithms
A Finite State Machine is a 5-tuple :
I : Set of InputsS : Set of StatesS0 : Initial stateG : S X I -> S State Transfer Function O : Set of outputsÇ : S X I -> O Output Function
31
FSMs
Moore FSM : Output is dependent on the current state
Mealy FSM : Output is dependent on the input and the current state. It usually has less states than Moore FSM A Moore FSM example : Parity Checker
32
Process FSMs
Each process is a FSM. They execute the same FSM code but may be in different states.
33
Our Implementation Environment for Distributed Algorithms
- Design the distributed algorithm as an FSM
- Implement each FSM as Solaris/Linux Threads
- Use my “thread IPC” module for communication
34
The Unix fork
procid = fork()
Replicates calling process
Parent and child are identical except for the value of procid
Use procid to diverge parent and child:if (procid==0)do_child_processing else do_parent_processing
35
Process Concept
An operating system executes a variety of programs: Batch system – jobs Time-shared systems – user programs or tasks
the terms job and process almost used interchangeably.Process – a program in execution; process execution must progress in sequential fashion.A process includes:
program counter stack data section
36
UNIX Processes
Each process has its own address space
– Subdivided into text, data, & stack segment– a.out file describes the address space
OS kernel creates descriptor to manageprocess
Process identifier (PID): User handle forthe process (descriptor)
37
Creating/Destroying ProcessesUNIX fork() creates a process
– Creates a new address space– Copies text, data, & stack into new adress space– Provides child with access to open files
UNIX wait() allows a parent to wait for achild to terminate
UNIX execva() allows a child to run anew program
38
Creating a UNIX Processint pidValue;...pidValue = fork(); /* Creates a child process */
if(pidValue == 0) {/* pidValue is 0 for child, nonzero for parent *//* The child executes this code concurrently with parent */childsPlay(…); /* A procedure linked into a.out */exit(0);}/* The parent executes this code concurrently with child */parentsWork(..);wait(…);...
39
Child Executes Something else
int pid;.../* Set up the argv array for the child */.../* Create the child */if((pid = fork()) == 0) {/* The child executes its own absolute program */execve(childProgram.out, argv, 0);/* Only return from an execve call if it fails */printf(“Error in the exec … terminating the child …”);exit(0);}...wait(…); /* Parent waits for child to terminate */...
40
Example: Parent
#include <sys/wait.h>int main (void){
if ((c=fork()) == 0){ /* This is the child process */execve("child",NULL,NULL);exit(0); /* Should never get here, terminate */}
/* Parent code here */printf("Process[%d]: Parent in execution ...\n", getpid());sleep(2);if(wait(NULL) > 0) /* Child terminating */printf("Process[%d]: Parent detects terminating child \n",c);printf("Process[%d]: Parent terminating ...\n", getpid());
}
41
What is a thread?
process:
• an address space with 1 or more threads executing within that address space, and the required system resources for those threads
• a program that is running
thread:
• a sequence of control within a process• shares the resources in that process
42
Advantages and Drawbacks of ThreadsAdvantages:
• the overhead for creating a thread is significantly less than that for creating a process
• multitasking, i.e., one process serves multiple clients• switching between threads requires the OS to do much less
work than switching between processes
43
Drawbacks:
• not as widely available as longer established features• writing multithreaded programs require more careful thought• more difficult to debug than single threaded programs• for single processor machines, creating several threads in a
program may not necessarily produce an increase in performance (only so many CPU cycles to be had)
44
Single and Multithreaded Processes
45
User Threads
Thread management done by user-level threads library
Examples- POSIX Pthreads- Mach C-threads- Solaris threads
46
Kernel Threads
Supported by the Kernel
Examples- Windows 95/98/NT/2000
- Solaris- Tru64 UNIX- BeOS- Linux
47
Multithreading Models
Many-to-One
One-to-One
Many-to-Many
Many-to-One :
Many user-level threads mapped to single kernel thread.
Used on systems that do not support kernel threads.
48
Many-to-One Model
49
One-to-One
Each user-level thread maps to kernel thread.
Examples- Windows 95/98/NT/2000- OS/2
50
Pthreads
a POSIX standard (IEEE 1003.1c) API for thread creation and synchronization.
API specifies behavior of the thread library, implementation is up to development of the library.
Common in UNIX operating systems.
51
Solaris 2 Threads
52
Solaris Process
53
Windows 2000 Threads
Implements the one-to-one mapping.Each thread contains
- a thread id
- register set
- separate user and kernel stacks
- private data storage area
54
Linux Threads
Linux refers to them as tasks rather than threads.
Thread creation is done through clone() system call.
Clone() allows a child task to share the address space of the parent task (process)
55
Java Threads
Java threads may be created by:
Extending Thread class Implementing the Runnable interface
Java threads are managed by the JVM.
56
Java Thread States
57
POSIX Threads (pthreads)
IEEE's POSIX Threads Model:
• programming models for threads in a UNIX platform• pthreads are included in the international standards
ISO/IEC9945-1
pthreads programming model:
• creation of threads• managing thread execution• managing the shared resources of the process
58
main thread:
• initial thread created when main() are invoked by the process loader
• once in the main(), the application has the ability to create daughter threads
• if the main thread returns, the process terminates even if there are running threads in that process, unless special precautions are taken
• to explicitly avoid terminating the entire process, use pthread_exit()
59
thread termination methods:
• implicit termination:
– thread function execution is completed
• explicit termination:– calling pthread_exit() within the thread– calling pthread_cancel() to terminate other threads
for numerically intensive routines, it is suggested that the application calls p threads if there are p available processors
60
Sample Pthreads Program in C, C++
The program in C++ calls the pthread.h header file. Pthreads related statements are preceded by the pthread_ prefix (except for semaphores). Knowing how to manipulate pointers is important.
61
1 //****************************************************************2 // This is a sample threaded program in C++. The main thread
creates3 // 4 daughter threads. Each daughter thread simply prints out a
message4 // before exiting. Notice that I’ve set the thread attributes to
joinable and5 // of system scope.//****************************************************************#include <iostream.h>#include <stdio.h>#include <pthread.h> #define NUM_THREADS 4 void *thread_function( void *arg ); int main( void ){ int i, tmp; int arg[NUM_THREADS] = {0,1,2,3}; pthread_t thread[NUM_THREADS]; pthread_attr_t attr; // initialize and set the thread attributes pthread_attr_init( &attr ); pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_JOINABLE ); pthread_attr_setscope( &attr, PTHREAD_SCOPE_SYSTEM );
62
// creating threads for ( i=0; i<NUM_THREADS; i++ ) { tmp = pthread_create( &thread[i], &attr, thread_function, (void
*)&arg[i] ); if ( tmp != 0 ) { cout << "Creating thread " << i << " failed!" << endl; return 1; } } // joining threads for ( i=0; i<NUM_THREADS; i++ ) { tmp = pthread_join( thread[i], NULL ); if ( tmp != 0 ) { cout << "Joing thread " << i << " failed!" << endl; return 1; } } return 0;}
63
//***********************************************************// Thi s is the functi on each thread is going to run. It simply asks// the thread to pri nt out a message. Notice the pointer acrobatics.//***********************************************************void *thread_functi on( void *arg ){ int id; id = *((i nt *)arg); printf( "Hel lo from thread %d!\n", id ); pthread_exit( NULL );}
64
How to compile:• in Linux use:
> {C++ comp} –D_REENTRANT hello.cc –lpthread –o hello• it might also be necessary for some systems to define the
_POSIX_C_SOURCE (to 199506L)
Creating a thread:int pthread_create( pthread_t *thread, pthread_attr_t *attr, void
*(*thread_function)(void *), void *arg );• first argument – pointer to the identifier of the created thread• second argument – thread attributes• third argument – pointer to the function the thread will execute• fourth argument – the argument of the executed function (usually a
struct)• returns 0 for success
65
Waiting for the threads to finish:
int pthread_join( pthread_t thread, void **thread_return )
• main thread will wait for daughter thread thread to finish• first argument – the thread to wait for• second argument – pointer to a pointer to the return value
from the thread• returns 0 for success• threads should always be joined; otherwise, a thread might
keep on running even when the main thread has already terminated
66
Thread Attributes
detach state attribute:
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
• detached – main thread continues working without waiting for the daughter threads to terminate
• joinable – main thread waits for the daughter threads to terminate before continuing further
67
contention scope attribute:int pthread_attr_setscope(pthread_attr_t *attr, int
*scope);
• system scope – threads are mapped one-to-one on the OS's kernel threads (kernel threads are entities that scheduled onto processors by the OS)
• process scope – threads share a kernel thread with other process scoped threads
68
Threads Programming Model
pipeline model – threads are run one after the othermaster-slave model – master (main) thread doesn't do any work, it just waits for the slave threads to finish workingequal-worker model – all threads work
69
Thread Synchronization Mechanisms
Mutual exclusion (mutex):
• guard against multiple threads modifying the same shared data simultaneously
• provides locking/unlocking critical code sections where shared data is modified
• each thread waits for the mutex to be unlocked (by the thread who locked it) before performing the code section
70
Basic Mutex Functions:int pthread_mutex_init(pthread_mutex_t *mutex, const
pthread_mutexattr_t *mutexattr);
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
int pthread_mutex_destroy(pthread_mutex_t *mutex);
• a new data type named pthread_mutex_t is designated for mutexes
• a mutex is like a key (to access the code section) that is handed to only one thread at a time
• the attribute of a mutex can be controlled by using the pthread_mutex_init() function
• the lock/unlock functions work in tandem
71
#include <pthread.h>...pthread_mutex_t my_mutex; // should be of global scope...int main(){ int tmp; ... // initialize the mutex tmp = pthread_mutex_init( &my_mutex, NULL ); ... // create threads ... pthread_mutex_lock( &my_mutex ); do_something_private(); pthread_mutex_unlock( &my_mutex ); ...
return 0;}Whenever a thread reaches the lock/unlock block, it first determines if the mutex is locked. If so, it waits until it is unlocked. Otherwise, it takes the mutex, locks the succeeding code, then frees the mutex and unlocks the code when it's done.
72
Counting Semaphores:
• permit a limited number of threads to execute a section of the code
• similar to mutexes• should include the semaphore.h header file• semaphore functions do not have pthread_ prefixes; instead,
they have sem_ prefixes
73
Basic Semaphore Functions:• creating a semaphore:int sem_init(sem_t *sem, int pshared, unsigned int value);
– initializes a semaphore object pointed to by sem– pshared is a sharing option; a value of 0 means the
semaphore is local to the calling process– gives an initial value value to the semaphore
• terminating a semaphore:int sem_destroy(sem_t *sem);
– frees the resources allocated to the semaphore sem– usually called after pthread_join()– an error will occur if a semaphore is destroyed for which a
thread is waiting
74
• semaphore control:int sem_post(sem_t *sem);int sem_wait(sem_t *sem);
– sem_post atomically increases the value of a semaphore by 1, i.e., when 2 threads call sem_post simultaneously, the semaphore's value will also be increased by 2 (there are 2 atoms calling)
– sem_wait atomically decreases the value of a semaphore by 1; but always waits until the semaphore has a non-zero value first
75
#include <pthread.h>#include <semaphore.h>...void *thread_function( void *arg );...sem_t semaphore; // also a global variable just like mutexesint main(){ int tmp;
... // initialize the semaphore
tmp = sem_init( &semaphore, 0, 0 );...// create threadspthread_create( &thread[i], NULL, thread_function, NULL );...while ( still_has_something_to_do() ){
sem_post( &semaphore ); ...
}...pthread_join( thread[i], NULL );sem_destroy( &semaphore );
return 0;}
76
void *thread_function( void *arg ){
sem_wait( &semaphore );perform_task_when_sem_open();...pthread_exit( NULL );
}
• the main thread increments the semaphore's count value in the while loop
• the threads wait until the semaphore's count value is non-zero before performing perform_task_when_sem_open() and further
• daughter thread activities stop only when pthread_join() is called
77
Condition Variables:
• used for communicating information about the state of shared data
• can make the execution of sections of a codes by a thread depend on the state of a data structure or another running thread
• condition variables are for signaling, not for mutual exclusion; a mutex is needed to synchronize access to shared data
78
Erciyes Thread IPC Module 1998
Functions to write to and read from a FIFO channel between a pair of threads
- Create the FIFO by statically allocating the data structure fifo_t P1_to_P2 ; // a FIFO structure between P1 and P2
Inıtialize a fifoinit_fifo(fifo_t)
- Read from this FIFO (blocking)
- read_fifo(fifo_t fifo, msgptr_t mp)
- Write to this FIFO
- write_fifo(fifo_t fifo, msgptr_t mp)
79
FIFO Functions
/*************************************************************initialize a fifo
*************************************************************/ int init_fifo(fifoptr fp)
{ int fifoid;//sys_fifot.curridx++;fp->state=ALLOCATED; fp->fifo_size=FIFO_SIZE1;sema_init(&fp->fullsem,0,USYNC_THREAD,0);sema_init(&fp->emptysem,fp->fifo_size,USYNC_THREAD,0);mutex_init(&fp->fifomut,USYNC_THREAD,0);fp->read_idx=0;fp->write_idx=0;return(fifoid);
}
80
/**************************************************************read a buffer from a fifo **************************************************************/
bufptr read_fifo(fifoptr fp)
{ bufptr bp;sema_wait(&fp->fullsem);mutex_lock(&fp->fifomut);bp=fp->bufs[fp->read_idx++];fp->read_idx = fp->read_idx % fp->fifo_size; mutex_unlock(&fp->fifomut);sema_post(&fp->emptysem);
return(bp);}
81
/**************************************************************write a buffer to a fifo **************************************************************/
int write_fifo(fifoptr fp, bufptr bp)
{ sema_wait(&fp->emptysem);mutex_lock(&fp->fifomut);fp->bufs[fp->write_idx++]=bp;fp->write_idx = fp->write_idx % fp->fifo_size; mutex_unlock(&fp->fifomut);sema_post(&fp->fullsem);
}
82
Erciyes Thread IPC Module : Memory Management
For memory critical applications, use memory management primitivesCreate a memory pool by
pool_t make_pool();
Initialize the pool
init_pol(pool_t pool);
Get a buffer from the pool by :
bufptr_t get_buf(pool_t pool);
Return the buffer to the pool by
put_buf (pool_t pool, bufptr_t bp);
83
pool data structure
/*********************************************************************bufpool.h
*********************************************************************/ #define POOL_SIZE 100 /* size of allocated space */#define ERR_POOLEMPTY -1#define ERR_POOLFULL -2
typedef struct pool *poolptr;typedef struct pool{ int state ;
int pool_size;sema_t poolsem;mutex_t poolmut;bufptr front;bufptr rear;buf_t bufs[POOL_SIZE];} pool_t;
84
/**************************************************************initialize a buffer pool**************************************************************/
int init_pool (poolptr pp, int pool_length)
{ int i ;pp->pool_size=pool_length;sema_init(&pp->poolsem,pp-
>pool_size,USYNC_THREAD,0);mutex_init(&pp->poolmut,USYNC_THREAD,0);pp->front=&(pp->bufs[0]);
for (i=0; i<pp->pool_size - 1; i ++)pp->bufs[i].next = &(pp->bufs[i+1]);
pp->rear=&(pp->bufs[pp->pool_size-1]);}
85
/**************************************************************get a buffer from a pool
*************************************************************/
bufptr get_buf(poolptr pp)
{ bufptr bp;sema_wait(&pp->poolsem);mutex_lock(&pp->poolmut);bp= pp->front;pp->front=bp->next;mutex_unlock(&pp->poolmut);return(bp);
}
86
/*************************************************************put a buffer to a pool
**************************************************************/
int put_buf(poolptr pp, bufptr bp)
{ bp->next=NULL;mutex_lock(&pp->poolmut);pp->rear->next=bp;pp->rear=bp;if (pp->front==NULL) pp->front=bp;mutex_unlock(&pp->poolmut);sema_post(&pp->poolsem);
}
87
Exercise : Stop and Wait Data Link Protocol
1. Draw the FSM for the Sender
2. Draw the FSM for the Receiver
3. Implement the Sender and the Receiver as Linux/Solaris threads
4. Include ipc module
5. Use read_fifo and write_fifo for communication
88
Summary
-We will be looking mainly at asynchronous distributed algorithms in this course
- We will see how these algorithms work, prove their properties, show their limitations
-We will implement few small projects using threads and the ipc module
-Each thread will simulate a distributed algorithm process
-(You will have an account on departmental Sun servers)
89
References
Programming with POSIX threads, by D. Butenhof, Addison Wesley (1997).Beginning Linux Programming, by R. Stones and N. Matthew, Wrox Press Ltd (1999).www.opengroup.org/onlinepubs/007908799/xsh/threads.htmlwww.research.ibm.com/actc/Tools/thread.htmwww.unm.edu/cirt/introductions/aix_xlfortran/xlflr101.htmwww.emsl.pnl.gov:2080/capabs/mset/training/ibmSP/fthreads/fthreads.html
• Programming with Threads, by G. Bhanot and R. Walkup• Appendix: Mixed models with Pthreads and MPI, V. Sonnad, C.
Tamirisa, and G. Bhanot
Sukumar Ghosh, Distributed Systems An Algorithmic Approach course