
MultiCore Processing Workshop

Date post: 24-Feb-2016
Page 1: MultiCore  Processing Workshop

MultiCore Processing Workshop

Multithreaded Programming using POSIX Threads(Pthreads)

Syed Akbar Mehdi

Page 2: MultiCore  Processing Workshop

Outline

1. Preliminaries and Introduction
2. Thread Management
3. Synchronization
4. Exercises

Page 3: MultiCore  Processing Workshop

Part 1. Preliminaries

• OS Basics
• Virtual Address Space
• Program Execution Basics
• Processes vs Threads
• POSIX Threads

Page 4: MultiCore  Processing Workshop

Computer System Organization

Computer-system operation:

• One or more CPUs and device controllers connect through a common bus that provides access to shared memory.
• CPUs and devices execute concurrently and compete for memory cycles.

Page 5: MultiCore  Processing Workshop

What is an OS?

Software between applications and reality:

• abstracts hardware and makes it useful and portable
• makes finite into (near) infinite
• provides protection

[Figure: applications (Visual Studio, MS Word, Half-Life 2) sit on top of the OS, which sits on top of the hardware.]

Page 6: MultiCore  Processing Workshop

What is a Process?

A process is an “instance” of a program running. Modern OSes run multiple processes simultaneously.

Examples (can all run simultaneously):
• gcc file_A.c – compiler running on file A
• gcc file_B.c – compiler running on file B
• emacs – text editor
• firefox – web browser

Non-examples (implemented as one process):
• Multiple firefox tabs are part of one process.

Why processes?
• Simplicity of programming
• Higher throughput (better CPU utilization), lower latency

Page 7: MultiCore  Processing Workshop

What is a Process?

Each process Pi has its own view of the machine:
• Its own address space.
• Its own open files.
• Its own virtual CPU (through preemptive multitasking).
• *(char *)0xc000 is different in P1 and P2.

This greatly simplifies the programming model:
• gcc does not care that firefox is running.

Sometimes we want interaction between processes:
• Simplest is through files: emacs edits a file, gcc compiles it.
• More complicated: shell/command, window manager/app.
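The claim that each process has its own address space can be checked with a small sketch. It assumes a POSIX system that provides fork() and waitpid(); the function and variable names are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same virtual address in parent and child, but separate copies after fork(). */
static int process_private = 1;

/* Fork a child that overwrites the variable, then report what the parent sees.
   Returns the parent's copy, or -1 if fork() or waitpid() fails. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        process_private = 42;   /* changes only the child's copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return process_private;     /* the parent's copy is untouched */
}
```

The parent still reads 1 after the child wrote 42, because the write landed in the child's private address space.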

Page 8: MultiCore  Processing Workshop

More about Processes

Page 9: MultiCore  Processing Workshop

Process Switching

Page 10: MultiCore  Processing Workshop

Process Organization in Memory

Page 11: MultiCore  Processing Workshop

Basic Execution

Page 12: MultiCore  Processing Workshop

Basic Execution Environment

The same program appears on every slide of this sequence (Pages 12 to 20); only the memory picture changes from slide to slide.

int gvar = 100;

int foo2(int b)
{
    return b * gvar;
}

int foo1(int a)
{
    int lvar = a + gvar;
    return foo2(lvar);
}

int main()
{
    int var1, var2;
    var1 = 2;
    var2 = 3;
    var1 = foo1(var1);
    var2 = foo1(var2);
    return 0;
}

Memory layout (registers: IP, FP, SP):
  Text:   main(), foo1(), foo2()
  Global: gvar = 100
  Heap:   (unused)
  Stack:  main(): var1 = 2, var2 = 3

Page 13: MultiCore  Processing Workshop

Basic Execution Environment

main() calls foo1(var1); a frame for foo1() is pushed.
  Stack:  main(): var1 = 2, var2 = 3
          foo1(): a = 2, lvar = 102

Page 14: MultiCore  Processing Workshop

Basic Execution Environment

foo1() calls foo2(lvar); a frame for foo2() is pushed.
  Stack:  main(): var1 = 2, var2 = 3
          foo1(): a = 2, lvar = 102
          foo2(): b = 102

Page 15: MultiCore  Processing Workshop

Basic Execution Environment

foo2() returns; its frame is popped.
  Stack:  main(): var1 = 2, var2 = 3
          foo1(): a = 2, lvar = 102

Page 16: MultiCore  Processing Workshop

Basic Execution Environment

foo1() returns; its frame is popped and the result is stored in var1.
  Stack:  main(): var1 = 10200, var2 = 3

Page 17: MultiCore  Processing Workshop

Basic Execution Environment

main() calls foo1(var2); a new frame for foo1() is pushed.
  Stack:  main(): var1 = 10200, var2 = 3
          foo1(): a = 3, lvar = 103

Page 18: MultiCore  Processing Workshop

Basic Execution Environment

foo1() calls foo2(lvar); a frame for foo2() is pushed.
  Stack:  main(): var1 = 10200, var2 = 3
          foo1(): a = 3, lvar = 103
          foo2(): b = 103

Page 19: MultiCore  Processing Workshop

Basic Execution Environment

foo2() returns; its frame is popped.
  Stack:  main(): var1 = 10200, var2 = 3
          foo1(): a = 3, lvar = 103

Page 20: MultiCore  Processing Workshop

Basic Execution Environment

foo1() returns; its frame is popped and the result is stored in var2.
  Stack:  main(): var1 = 10200, var2 = 10300

Page 21: MultiCore  Processing Workshop

What is a thread?

What’s needed to run code on a CPU: an “execution stream in an execution context”.

Execution stream: a sequence of instructions executed in order.

CPU execution context (1 thread):
• State: stack, heap, registers
• Position: Instruction Pointer (IP) register

OS execution context (n threads): identity + open file descriptors, page table, …

Page 22: MultiCore  Processing Workshop

What is a thread?

Page 23: MultiCore  Processing Workshop

What is a thread?

• All threads in a process share the same address space.
  • *(char *)0xc000 means “the same” in threads T1 and T2.
• All threads share the same file descriptors.
  • This implies that they also share network sockets.
• All threads have access to the same heap and the same global variables.
  • Write access to global variables should be protected by a synchronization mechanism.
• Each thread has its own stack, instruction pointer and local variables.
  • Therefore each thread has its own independent flow of execution.
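A minimal sketch of the shared-versus-private distinction follows. It assumes a POSIX system; the names shared_counter, bump and run_two_threads_sequentially are hypothetical. The two threads are deliberately run one after the other (create, then join), so the shared global needs no lock here; with threads running concurrently the update would have to be protected.

```c
#include <pthread.h>
#include <stddef.h>

static int shared_counter = 0;   /* one copy, visible to every thread */

static void *bump(void *unused)
{
    int local = 5;               /* lives on this thread's own stack */
    (void)unused;
    local += 1;                  /* each thread starts from its own fresh 5 */
    shared_counter += local;     /* unprotected: safe only because threads do not overlap */
    return NULL;
}

/* Runs two threads back to back and returns the shared counter, or -1 on error. */
int run_two_threads_sequentially(void)
{
    pthread_t t;
    shared_counter = 0;
    for (int i = 0; i < 2; ++i) {
        if (pthread_create(&t, NULL, bump, NULL) != 0)
            return -1;
        if (pthread_join(t, NULL) != 0)   /* wait before starting the next one */
            return -1;
    }
    return shared_counter;
}
```

Each thread sees its own local (always 6 after the increment), while both add into the same global, so the function returns 12.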

Page 24: MultiCore  Processing Workshop

What is a thread?

Page 25: MultiCore  Processing Workshop

Pthreads

• Historically, hardware vendors have implemented their own proprietary versions of threads.
  • These implementations differed significantly from each other, resulting in reduced portability.
• In order to take full advantage of the capabilities provided by threads, a standardized programming interface was required.
  • For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995).
  • Implementations adhering to this standard are referred to as POSIX threads, or Pthreads.
  • Most hardware vendors now offer Pthreads in addition to their proprietary APIs.
• Pthreads are defined as a set of C language programming types and procedure calls, implemented with a pthread.h header/include file and a thread library.

Page 26: MultiCore  Processing Workshop

Pthreads

The subroutines which comprise the Pthreads API can be informally grouped into four major groups:

• Thread management: Routines that work directly on threads.
• Mutexes: Routines that deal with synchronization, called a "mutex", which is an abbreviation for "mutual exclusion".
• Condition variables: Routines that address communications between threads that share a mutex.
• Synchronization: Routines that manage read/write locks and barriers.

Page 27: MultiCore  Processing Workshop

Pthreads

Routine Prefix        Functional Group
pthread_              Threads themselves and miscellaneous subroutines
pthread_attr_         Thread attributes objects
pthread_mutex_        Mutexes
pthread_mutexattr_    Mutex attributes objects
pthread_cond_         Condition variables
pthread_condattr_     Condition attributes objects
pthread_key_          Thread-specific data keys
pthread_rwlock_       Read/write locks
pthread_barrier_      Synchronization barriers

Page 28: MultiCore  Processing Workshop

Part 2. Thread Management

• Creating and Terminating Threads
• Passing Arguments to Threads
• Joining and Detaching Threads
• Setting Thread Attributes
• Miscellaneous Routines

Page 29: MultiCore  Processing Workshop

Creating and Terminating Threads

The following functions are used for creating and terminating threads:

1. pthread_create (thread,attr,start_routine,arg)
2. pthread_exit (status)
3. pthread_attr_init (attr)
4. pthread_attr_destroy (attr)

Page 30: MultiCore  Processing Workshop

Creating Threads

• Initially, your main() program comprises a single, default thread. All other threads must be explicitly created by the programmer.
• The maximum number of threads that may be created by a process is implementation dependent.
• Once created, threads are peers, and may create other threads. There is no implied hierarchy or dependency between threads.

Page 31: MultiCore  Processing Workshop

Creating Threads

int pthread_create(pthread_t *thr, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg)

pthread_t *thr
Receives the newly created thread’s id. Must be passed by reference.

const pthread_attr_t *attr
The attributes that this thread will have. Use NULL for the default ones.

void *(*start_routine)(void *)
The function that the thread will run. It must take a single void pointer parameter and return a void pointer.

void *arg
The argument passed to start_routine, the function that forms the body of the thread.

A void pointer can reference ANY type of data, but it CANNOT be used to read or write that data without a cast.

Returns 0 on success and a nonzero error number on failure.
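The parameter descriptions above can be tied together in a short sketch, assuming a POSIX system. The names square_in_place and square_on_new_thread are hypothetical. It uses pthread_join() only so that the caller can safely read the result after the thread has finished.

```c
#include <pthread.h>
#include <stddef.h>

/* Start routine: takes a void pointer and returns a void pointer. */
static void *square_in_place(void *arg)
{
    int *value = (int *)arg;          /* cast back before touching the data */
    *value = (*value) * (*value);
    return NULL;
}

/* Squares *value on a newly created thread.
   Returns 0 on success, or the error number reported by Pthreads. */
int square_on_new_thread(int *value)
{
    pthread_t thr;                    /* filled in by pthread_create */
    int rc = pthread_create(&thr, NULL /* default attributes */,
                            square_in_place, value);
    if (rc != 0)
        return rc;                    /* nonzero means the thread was not created */
    return pthread_join(thr, NULL);   /* wait until the thread has finished */
}
```

For example, calling square_on_new_thread(&v) with v equal to 7 leaves 49 in v and returns 0.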

Page 32: MultiCore  Processing Workshop

Terminating Threads

There are several ways in which a Pthread may be terminated:

• The thread returns from its starting routine. For the initial thread, this means returning from main().
• The thread calls the pthread_exit() subroutine. Typically, pthread_exit() is called after a thread has completed its work and is no longer required to exist.
• The thread is canceled by another thread via the pthread_cancel() routine.
• The entire process is terminated by a call to either the exec or exit subroutines.

If main() finishes before the threads it has created:
• If it calls pthread_exit(), the other threads continue to execute.
• If it simply returns, they are terminated automatically.

Page 33: MultiCore  Processing Workshop

Misc. Useful Functions

pthread_t pthread_self(void)
Returns the id of the calling thread. The return type, pthread_t, is opaque: it is often an integer type, but portable code should compare ids with pthread_equal() rather than ==.
OpenMP counterpart: int omp_get_thread_num(void);

void pthread_exit(void *arg);
Ends the calling thread. The value passed in arg becomes the thread's return value, which another thread can retrieve with pthread_join().
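As a hedged sketch of how these two routines behave together (POSIX system assumed; exit_code, report_self and self_matches_created_id are hypothetical names): the thread records its own id with pthread_self() and hands a pointer back through pthread_exit(), and the creator checks both after joining.

```c
#include <pthread.h>
#include <stddef.h>

static int exit_code = 17;        /* static storage, so it outlives the thread */

static void *report_self(void *arg)
{
    pthread_t *slot = (pthread_t *)arg;
    *slot = pthread_self();       /* the id as seen from inside the thread */
    pthread_exit(&exit_code);     /* value handed to whoever joins this thread */
    return NULL;                  /* not reached */
}

/* Returns 1 if the id seen inside the thread equals the id filled in by
   pthread_create and the exit value arrives intact, 0 otherwise. */
int self_matches_created_id(void)
{
    pthread_t created, seen;
    void *result = NULL;

    if (pthread_create(&created, NULL, report_self, &seen) != 0)
        return 0;
    if (pthread_join(created, &result) != 0)
        return 0;
    return pthread_equal(created, seen)
        && result == &exit_code
        && *(int *)result == 17;
}
```

The pointer passed to pthread_exit() must refer to storage that is still valid after the thread ends, which is why the sketch uses a static variable rather than a local one.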

Page 34: MultiCore  Processing Workshop

“Hello World” Example

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4

/* Thread body: print this thread's id and exit. */
void *work(void *i)
{
    /* The cast assumes pthread_t is an integer type, as it is on Linux. */
    printf("Hello, world from %lu\n", (unsigned long)pthread_self());
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    pthread_t id[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&id[i], NULL, work, NULL)) {
            printf("Error creating the thread\n");
            exit(-1);
        }
    }
    printf("After creating the thread. My id is: %lu\n",
           (unsigned long)pthread_self());
    return 0;   /* main() returns without waiting for the threads */
}

Output of one run:
Hello, world from 2
Hello, world from 3
After creating the thread. My id is: 1
Hello, world from 4

Output of another run:
Hello, world from 2
Hello, world from 3
Hello, world from 4
After creating the thread. My id is: 1
Hello, world from 5

What happened to thread 5???

Page 35: MultiCore  Processing Workshop

Passing Arguments to Threads

Single Argument Passing
• Cast its value as a void pointer (a tricky pass by value).
• Cast its address as a void pointer (pass by reference).
  • The value stored at that address should NOT change between Pthread creations.

Multiple Argument Passing
• Heterogeneous: Create a structure with all the desired arguments and pass a pointer to an instance of that structure as a void pointer.
• Homogeneous: Create an array and pass it cast as a void pointer.
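The homogeneous case can be sketched as follows, assuming a POSIX system and an array length agreed on by both sides. N_VALUES, sum_into_last and sum_on_thread are hypothetical names, and the extra slot at the end of the buffer is just one simple way to get a result back.

```c
#include <pthread.h>
#include <stddef.h>

#define N_VALUES 4   /* hypothetical fixed length known to caller and thread */

/* Thread body: receives the array as a void pointer and sums it. */
static void *sum_into_last(void *arg)
{
    long *values = (long *)arg;          /* cast the void pointer back to the array */
    long total = 0;
    for (int i = 0; i < N_VALUES; ++i)
        total += values[i];
    values[N_VALUES] = total;            /* extra slot reserved for the result */
    return NULL;
}

/* Sums N_VALUES numbers on a separate thread. Returns the sum, or -1 on error. */
long sum_on_thread(const long in[N_VALUES])
{
    long buf[N_VALUES + 1];
    pthread_t thr;

    for (int i = 0; i < N_VALUES; ++i)
        buf[i] = in[i];
    buf[N_VALUES] = 0;

    if (pthread_create(&thr, NULL, sum_into_last, buf) != 0)
        return -1;
    if (pthread_join(thr, NULL) != 0)    /* buf must stay alive until the thread ends */
        return -1;
    return buf[N_VALUES];
}
```

Because buf lives on the caller's stack, the caller joins the thread before returning; otherwise the thread could read memory that no longer exists.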

Page 36: MultiCore  Processing Workshop

Passing a Single Argument

Wrong Method!!!!

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 10

void *work(void *i)
{
    int f = *((int *)(i));   /* reads the loop variable, which main() keeps changing */
    printf("Hello, world from %lu with value %i\n",
           (unsigned long)pthread_self(), f);
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    pthread_t id[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; ++i) {
        /* Every thread receives the address of the same variable i. */
        if (pthread_create(&id[i], NULL, work, (void *)(&i))) {
            printf("Error creating the thread\n");
            exit(-1);
        }
    }
    return 0;
}

Output:
Hello, world from 2 with value 1
Hello, world from 3 with value 2
Hello, world from 6 with value 5
Hello, world from 5 with value 5
Hello, world from 4 with value 4
Hello, world from 8 with value 9
Hello, world from 9 with value 9
Hello, world from 10 with value 9
Hello, world from 7 with value 6
Hello, world from 11 with value 10

Page 37: MultiCore  Processing Workshop

Passing a Single Argument

Right Method 1

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 10

void *work(void *i)
{
    /* The pointer itself carries the value; intptr_t avoids a size-mismatch cast. */
    int f = (int)(intptr_t)(i);
    printf("Hello, world from %lu with value %i\n",
           (unsigned long)pthread_self(), f);
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    pthread_t id[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&id[i], NULL, work, (void *)(intptr_t)(i))) {
            printf("Error creating the thread\n");
            exit(-1);
        }
    }
    return 0;
}

Output:
Hello, world from 2 with value 0
Hello, world from 3 with value 1
Hello, world from 4 with value 2
Hello, world from 5 with value 3
Hello, world from 6 with value 4
Hello, world from 7 with value 5
Hello, world from 8 with value 6
Hello, world from 10 with value 8
Hello, world from 11 with value 9

Page 38: MultiCore  Processing Workshop

Passing a Single Argument

Right Method 2

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 10

void *work(void *i)
{
    int f = *((int *)(i));   /* each thread reads its own array element */
    printf("Hello, world from %lu with value %i\n",
           (unsigned long)pthread_self(), f);
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    int y[NUM_THREADS];
    pthread_t id[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; ++i) {
        y[i] = i;   /* a separate, unchanging slot per thread */
        if (pthread_create(&id[i], NULL, work, (void *)(&y[i]))) {
            printf("Error creating the thread\n");
            exit(-1);
        }
    }
    return 0;
}

Output:
Hello, world from 2 with value 0
Hello, world from 4 with value 2
Hello, world from 5 with value 3
Hello, world from 6 with value 4
Hello, world from 7 with value 5
Hello, world from 8 with value 6
Hello, world from 9 with value 7
Hello, world from 3 with value 1
Hello, world from 10 with value 8
Hello, world from 11 with value 9

Page 39: MultiCore  Processing Workshop

Thread Joining

Joining is a way to accomplish “coarse grained” synchronization between threads.

The pthread_join() subroutine blocks the calling thread until the thread with the specified “id” terminates.

A joinable thread can be matched by exactly one pthread_join() call. It is a logical error to attempt multiple joins on the same thread.

Page 40: MultiCore  Processing Workshop

Thread Joining

The Joining of All Loose Ends: pthread_join

int pthread_join(pthread_t id, void **tr);

pthread_t id
The id of a created thread.

void **tr
A pointer to a location that receives the result of the thread.

Makes sure that the thread with this id has returned; otherwise waits for it.

OpenMP counterpart: #pragma omp barrier

Why use it? If the main thread dies, then all other threads die with it, even if they have not completed their work.

Returns 0 on success and a nonzero error number on failure.

[Figure: timelines of Main, T1, T2 and T3 illustrating premature thread death without a join, and a join point at which Main waits for the other threads.]
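The role of the second argument of pthread_join() can be sketched as below, assuming a POSIX system. WORKERS, times_ten and join_and_sum are hypothetical names, and carrying a small integer inside the pointer via intptr_t is a convenience for this sketch rather than the only way to return a result.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define WORKERS 3   /* hypothetical number of worker threads */

/* Thread body: multiplies its small integer argument by ten. */
static void *times_ten(void *arg)
{
    intptr_t n = (intptr_t)arg;        /* small integer carried in the pointer */
    return (void *)(n * 10);           /* returned value is collected by pthread_join */
}

/* Starts WORKERS threads with inputs 1..WORKERS, joins each one and sums
   the results. Returns the sum, or -1 on error (sketch: no cleanup on error). */
long join_and_sum(void)
{
    pthread_t id[WORKERS];
    long total = 0;

    for (intptr_t i = 0; i < WORKERS; ++i)
        if (pthread_create(&id[i], NULL, times_ten, (void *)(i + 1)) != 0)
            return -1;

    for (int i = 0; i < WORKERS; ++i) {
        void *res = NULL;
        if (pthread_join(id[i], &res) != 0)   /* blocks until thread i has ended */
            return -1;
        total += (long)(intptr_t)res;
    }
    return total;
}
```

With three workers the inputs are 1, 2 and 3, so the joined results are 10, 20 and 30 and the function returns 60.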

Page 41: MultiCore  Processing Workshop

Thread Joining

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4

void *work(void *i)
{
    /* The cast assumes pthread_t is an integer type, as it is on Linux. */
    printf("Hello, world from %lu\n", (unsigned long)pthread_self());
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    pthread_t id[NUM_THREADS];

    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_create(&id[i], NULL, work, NULL)) {
            exit(-1);
        }
    }
    printf("After creating the thread. My id is: %lu\n",
           (unsigned long)pthread_self());
    /* Wait for every thread before main() is allowed to return. */
    for (i = 0; i < NUM_THREADS; ++i) {
        if (pthread_join(id[i], NULL)) {
            exit(-1);
        }
    }
    printf("After joining\n");
    return 0;
}

Output:
Hello, world from 2
Hello, world from 3
Hello, world from 4
After creating the thread. My id is: 1
Hello, world from 5
After joining

Page 42: MultiCore  Processing Workshop

Thread Attributes

• By default, a thread is created with certain attributes. Some of these attributes can be changed by the programmer via the thread attribute object.
• Thread attributes help the programmer customize the behavior of thread execution.
• pthread_attr_init and pthread_attr_destroy are two functions used to initialize/destroy the thread attribute object.
• Other routines are then used to query/set specific attributes in the thread attribute object.

Page 43: MultiCore  Processing Workshop

Thread Attributes

int pthread_attr_init(pthread_attr_t *attr);
Initialize the attribute object with the default values:
• Default Schedule: SCHED_OTHER (?)
• Default Scope: PTHREAD_SCOPE_SYSTEM (?)
• Default Join State: PTHREAD_CREATE_JOINABLE (?)

int pthread_attr_destroy(pthread_attr_t *attr);
De-allocate any memory and state that the attribute object occupied. It is safe to delete the attribute object after the thread has been created.

int pthread_attr_setdetachstate(pthread_attr_t *attr, int JOIN_STATE);
Set the detach state in the attribute object to JOIN_STATE:
• PTHREAD_CREATE_JOINABLE: The thread can be joined at a join point. Its state must be saved after its function ends.
• PTHREAD_CREATE_DETACHED: The thread cannot be joined at a join point. Its state and resources are de-allocated immediately when it ends.
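A minimal sketch of creating a detached thread follows, assuming a POSIX system with C11 atomics. The names done_flag, background and run_detached are hypothetical. Since a detached thread cannot be joined, the sketch uses an atomic flag plus sched_yield() as a crude way to learn that the thread ran; real code would more likely use a condition variable or semaphore.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int done_flag;           /* set by the detached thread */

static void *background(void *unused)
{
    (void)unused;
    atomic_store(&done_flag, 1);       /* announce completion; nobody will join us */
    return NULL;
}

/* Starts a detached thread and waits until it has run.
   Returns 1 once the thread has signalled, or -1 on error. */
int run_detached(void)
{
    pthread_attr_t attr;
    pthread_t thr;
    int rc;

    atomic_store(&done_flag, 0);
    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&thr, &attr, background, NULL);
    pthread_attr_destroy(&attr);       /* the attribute object is no longer needed */
    if (rc != 0)
        return -1;

    while (atomic_load(&done_flag) == 0)
        sched_yield();                 /* crude wait, because joining is not allowed */
    return atomic_load(&done_flag);
}
```

Calling pthread_join() on thr here would be an error, which is exactly the trade-off of the detached state: no join point, but resources are reclaimed without one.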

Page 44: MultiCore  Processing Workshop

Thread Attributes

int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy)
Set the scheduling policy of the thread:
• SCHED_OTHER: Regular scheduling
• SCHED_RR: Round-robin (SU)
• SCHED_FIFO: First-in First-out (SU)

int pthread_attr_setschedparam(pthread_attr_t *attr, const struct sched_param *pr)
Contains the scheduling priority of the thread. Default: 0

int pthread_attr_setinheritsched(pthread_attr_t *attr, int inherit)
Tells whether the scheduling parameters are inherited from the parent or taken from the attribute object:
• PTHREAD_EXPLICIT_SCHED: Scheduling parameters from the attribute object will be used.
• PTHREAD_INHERIT_SCHED: The thread inherits the attributes from its parent.

int pthread_attr_setscope(pthread_attr_t *attr, int scope)
Contention parameter:
• PTHREAD_SCOPE_SYSTEM
• PTHREAD_SCOPE_PROCESS
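The setters above have matching getters, which allows a small round-trip sketch. It assumes a POSIX system on which priority 0 is valid for SCHED_OTHER (as on Linux); the function name sched_attrs_round_trip is hypothetical. No thread is created, so no special privileges are needed.

```c
#include <pthread.h>
#include <sched.h>

/* Sets scheduling attributes on an attribute object and reads them back.
   Returns 1 if the object reports what was set, 0 otherwise. */
int sched_attrs_round_trip(void)
{
    pthread_attr_t attr;
    int policy = -1, inherit = -1, ok;
    struct sched_param prio = { .sched_priority = 0 };  /* assumed valid for SCHED_OTHER */

    if (pthread_attr_init(&attr) != 0)
        return 0;

    ok = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) == 0  /* use attr values */
      && pthread_attr_setschedpolicy(&attr, SCHED_OTHER) == 0
      && pthread_attr_setschedparam(&attr, &prio) == 0
      && pthread_attr_getschedpolicy(&attr, &policy) == 0
      && pthread_attr_getinheritsched(&attr, &inherit) == 0;

    pthread_attr_destroy(&attr);
    return ok && policy == SCHED_OTHER && inherit == PTHREAD_EXPLICIT_SCHED;
}
```

Without PTHREAD_EXPLICIT_SCHED, a thread created from this object would inherit the creator's scheduling settings and the policy stored in the object would be ignored.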

Page 45: MultiCore  Processing Workshop

Thread Attributes

#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

struct args { int a; float b; char c; };

void *work(void *i)
{
    struct args *a = (struct args *)(i);
    /* The cast assumes pthread_t is an integer type, as it is on Linux. */
    printf("(%3i, %.3f, %3c) --> %lu\n", a->a, a->b, a->c,
           (unsigned long)pthread_self());
    pthread_exit(NULL);
}

int main(int argc, char **argv)
{
    int i;
    struct args a[NUM_THREADS];
    pthread_t id[NUM_THREADS];
    pthread_attr_t ma;

    pthread_attr_init(&ma);
    pthread_attr_setdetachstate(&ma, PTHREAD_CREATE_JOINABLE);
    for (i = 0; i < NUM_THREADS; ++i) {
        a[i].a = i;
        a[i].b = 1.0 / (i + 1);
        a[i].c = 'a' + (char)(i);
        pthread_create(&id[i], &ma, work, (void *)(&a[i]));
    }
    pthread_attr_destroy(&ma);   /* safe: the threads have already been created */
    for (i = 0; i < NUM_THREADS; ++i) {
        pthread_join(id[i], NULL);
    }
    return 0;
}

Output:
(  0, 1.000,   a) --> 2
(  3, 0.250,   d) --> 5
(  2, 0.333,   c) --> 4
(  1, 0.500,   b) --> 3

Page 46: MultiCore  Processing Workshop

Miscellaneous Useful Functions

int pthread_attr_getstackaddr (const pthread_attr_t *attr, void **stackaddr)
Return the stack address that this Pthread will be using.

int pthread_attr_getstacksize (const pthread_attr_t *attr, size_t *stacksize)
Return the stack size that this Pthread will be using.

int pthread_detach (pthread_t thr)
Make the thread that is identified by thr not joinable.

int pthread_once (pthread_once_t *once_control, void (*init_routine)(void));
Make sure that init_routine is executed by a single thread and only once. The once_control is a synchronization mechanism that can be defined as:
pthread_once_t once_control = PTHREAD_ONCE_INIT;
OpenMP counterpart: #pragma omp single

void pthread_yield ()
Relinquish the use of the processor. This routine is non-standard; sched_yield() from <sched.h> is the POSIX equivalent.
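The pthread_once() guarantee can be illustrated with a short sketch, assuming a POSIX system. CALLERS, init_calls, worker and count_init_calls are hypothetical names; once_control and init_routine follow the names used on the slide.

```c
#include <pthread.h>
#include <stddef.h>

#define CALLERS 4   /* hypothetical number of threads racing to initialize */

static pthread_once_t once_control = PTHREAD_ONCE_INIT;
static int init_calls = 0;             /* how many times init_routine actually ran */

static void init_routine(void)
{
    ++init_calls;                      /* executed by exactly one of the callers */
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_once(&once_control, init_routine);   /* every thread asks; one runs it */
    return NULL;
}

/* Starts CALLERS threads that all call pthread_once, waits for them,
   and returns the number of times init_routine ran. */
int count_init_calls(void)
{
    pthread_t id[CALLERS];
    int started = 0;

    for (; started < CALLERS; ++started)
        if (pthread_create(&id[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; ++i)
        pthread_join(id[i], NULL);
    return init_calls;
}
```

Even though four threads call pthread_once(), the counter ends at 1, and it stays at 1 if the function is called again because once_control remembers that the routine already ran.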

Page 47: MultiCore  Processing Workshop

Exercises

• Compile and run the example code from the slides.
• Implement vector addition using Pthreads.
• Implement matrix multiplication using Pthreads.
  • Try chunking and cyclic distribution for different matrix sizes and different core counts and observe the performance.
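As a starting point for the vector addition exercise, here is one possible sketch using a chunked distribution. It assumes a POSIX system; struct chunk, add_chunk, vector_add and the fixed NUM_THREADS value are hypothetical choices, not part of the original slides.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* hypothetical fixed thread count */

/* Work description for one thread: add elements in [begin, end). */
struct chunk { const double *a, *b; double *c; size_t begin, end; };

static void *add_chunk(void *arg)
{
    struct chunk *ch = (struct chunk *)arg;
    for (size_t i = ch->begin; i < ch->end; ++i)
        ch->c[i] = ch->a[i] + ch->b[i];
    return NULL;
}

/* Computes c = a + b over n elements, one contiguous chunk per thread.
   Returns 0 on success, or the error number from pthread_create. */
int vector_add(const double *a, const double *b, double *c, size_t n)
{
    pthread_t id[NUM_THREADS];
    struct chunk ch[NUM_THREADS];
    size_t per = (n + NUM_THREADS - 1) / NUM_THREADS;   /* ceiling of n / NUM_THREADS */
    int started = 0, rc = 0;

    for (int t = 0; t < NUM_THREADS; ++t) {
        size_t begin = (size_t)t * per;
        size_t end = begin + per;
        if (begin > n) begin = n;       /* trailing threads may get an empty chunk */
        if (end > n) end = n;
        ch[t] = (struct chunk){ a, b, c, begin, end };
        rc = pthread_create(&id[t], NULL, add_chunk, &ch[t]);
        if (rc != 0)
            break;
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(id[t], NULL);      /* ch[] must outlive the threads */
    return rc;
}
```

A cyclic distribution would instead let thread t handle indices t, t + NUM_THREADS, t + 2 * NUM_THREADS and so on; comparing the two for different sizes is the point of the exercise.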

