Date posted: 13-Jan-2016
Transcript
Page 1:

04/10/25 Parallel and Distributed Programming 1

Shared-memory Parallel Programming

Taura Lab M1 Yuuki Horita

Page 2:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 3:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 4:

Parallel Programming Model

Message Passing Model: covered in the previous talk by Imatake-kun

Shared Memory Model: memory is shared among all process elements
  Multiprocessor (SMP, SunFire, …)
  DSM (Distributed Shared Memory)

Process elements can communicate with each other through the shared memory

Page 5:

Shared Memory Model

[PE]  [PE]  [PE]  …
   \    |    /
     Memory

Page 6:

Shared Memory Model

Simplicity: no need to think about the location of the computation data

Fast communication (Multiprocessor): no need to use the network for inter-process communication

Dynamic load sharing: easy, for the same reason as simplicity

Page 7:

Shared Memory Parallel Programming

Parallel programming models for shared-memory multiprocessors:
  Multi-thread programming (Pthreads)
  OpenMP

Page 8:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 9:

Sample Sequential Program

FDM (Finite Difference Method):

…
loop {
    /* iterate over interior points only, so that the stencil
       a[i-1][j], a[i+1][j], a[i][j-1], a[i][j+1] stays in bounds */
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                           + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
…

Page 10:

Parallelization Procedure

Sequential Computation
  -> (Decomposition)  -> Tasks
  -> (Assignment)     -> Process Elements
  -> (Orchestration)
  -> (Mapping)        -> Processors

Page 11:

Parallelize the Sequential Program

Decomposition

…
loop {
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            /* each update of a[i][j] is a task */
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                           + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
…

Page 12:

Parallelize the Sequential Program

Assignment

[PE] [PE] [PE] [PE]

Divide the tasks equally among the process elements

Page 13:

Parallelize the Sequential Program

Orchestration

[PE] [PE] [PE] [PE]

The process elements need to communicate and to synchronize

Page 14:

Parallelize the Sequential Program

Mapping

[PE] [PE] [PE] [PE]  ->  Multiprocessor

Page 15:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 16:

Multi-thread Programming

A process element is a thread (cf. a process)

Memory is shared among all threads generated from the same process

Threads can communicate with each other through the shared memory

Page 17:

Fork-Join Model

Serialized section:    program starts (Main Thread)
Fork:                  Main Thread creates new threads
Parallelized section:  Main Thread and the new threads run in parallel
Join:                  other threads join the Main Thread
Serialized section:    Main Thread continues processing

Page 18:

Libraries for Thread Programming

Pthreads (C/C++): pthread_create(), pthread_join()

Java: Thread class / Runnable interface

Page 19:

Pthreads API (fork/join)

pthread_t   // thread variable

int pthread_create(
    pthread_t *thread,        // thread variable
    pthread_attr_t *attr,     // thread attributes
    void *(*func)(void *),    // start function
    void *arg                 // argument passed to the start function
);

int pthread_join(
    pthread_t thread,         // thread variable
    void **thread_return      // the thread's return value
);

Page 20:

Pthreads Parallel Programming

#include …

void do_sequentially(void)
{
    /* sequential execution */
}

int main()
{
    …
    do_sequentially();   // want to parallelize this call
    …
}

Page 21:

Pthreads Parallel Programming

#include …
#include <pthread.h>

/* the start function must have the signature void *(*)(void *) */
void *do_in_parallel(void *arg)
{
    /* parallel execution */
    return NULL;
}

int main()
{
    pthread_t tid;
    …
    pthread_create(&tid, NULL, do_in_parallel, NULL);
    do_in_parallel(NULL);      // the main thread does its share of the work
    pthread_join(tid, NULL);   // pthread_join also takes a return-value pointer
    …
}

Page 22:

Exclusive Access Control

int sum = 0;

thread_A() { sum++; }

thread_B() { sum++; }

A possible interleaving: both threads read sum while it is still 0, so one increment is lost and sum ends up 1, not 2.

  Thread A                Thread B                sum
  a <- read sum     (0)                            0
                          a <- read sum     (0)    0
  a = a + 1         (1)
                          a = a + 1         (1)
  write a -> sum                                   1
                          write a -> sum           1

Page 23:

Pthreads API (Exclusive Access Control)

Variable: pthread_mutex_t

Initialization:
int pthread_mutex_init(
    pthread_mutex_t *mutex,
    pthread_mutexattr_t *mutexattr
);

Lock / unlock:
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

Page 24:

Exclusive Access Control

int sum = 0;
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);

thread_A() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}

thread_B() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}

  Thread A          Thread B
  acquire lock      tries to acquire lock (blocked)
  sum++
  release lock      acquires lock
                    sum++
                    release lock

Page 25:

Pthreads API (Condition Variable)

Variable: pthread_cond_t

Initialization:
int pthread_cond_init(
    pthread_cond_t *cond,
    pthread_condattr_t *condattr
);

Condition functions:
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_broadcast(pthread_cond_t *cond);
int pthread_cond_signal(pthread_cond_t *cond);

Page 26:

Condition Wait

Thread A (waits for the condition):

pthread_mutex_lock(&mutex);
while (/* condition is not satisfied */) {
    /* atomically releases the lock and sleeps;
       re-acquires the lock when woken up */
    pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);

Thread B (updates the condition and wakes the waiters):

pthread_mutex_lock(&mutex);
update_condition();
pthread_cond_broadcast(&cond);   /* or pthread_cond_signal(&cond) */
pthread_mutex_unlock(&mutex);

Page 27:

Synchronization

Synchronization in the sample program (a one-shot barrier):

n = 0;
…
pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);

Page 28:

Characteristics of Pthreads

Troublesome to describe exclusive access control and synchronization

Prone to deadlock

Still hard to parallelize a given sequential program

Page 29:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 30:

What’s OpenMP?

A specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs

Fortran ver 1.0 API: Oct. 1997
C/C++ ver 1.0 API: Oct. 1998

Page 31:

Background of OpenMP

Spread of shared-memory multiprocessors

Need for common directives on shared-memory multiprocessors: each vendor had provided a different set of directives

Need for a simpler and more flexible interface for developing parallel applications: Pthreads makes parallel applications hard for developers to describe

Page 32:

OpenMP API

Directives Libraries Environment Variables

Page 33:

Directives

C/C++:    #pragma omp directive_name …
Fortran:  !$OMP directive_name …

If the user's compiler doesn't support OpenMP, the directives are ignored, so the program can still be executed as a sequential program.

Page 34:

Parallel Region: the part executed in parallel by a team of threads

#pragma omp parallel
{
    /* parallel region */
}

Threads are created at the beginning of the parallel region and join at the end of the parallel region

Page 35:

Parallel Region (threads)

Number of threads:
  omp_get_num_threads()               // get the current number of threads
  omp_set_num_threads(int nthreads)   // set the number of threads to nthreads
  OMP_NUM_THREADS                     // environment variable

Thread ID (0 to number of threads - 1):
  omp_get_thread_num()                // get the thread ID

Page 36:

Work Sharing Constructs

Specify the task assignment inside a parallel region:

for:       share loop iterations among threads
sections:  share sections among threads
single:    execute by only one thread

Page 37:

Example of Work Sharing

Sequential:

for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

Separate parallel and for directives:

omp_set_num_threads(4);

#pragma omp parallel
#pragma omp for
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

Combined form:

omp_set_num_threads(4);

#pragma omp parallel for
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

i and j are shared among the threads by default; the conflicting accesses to them make the computation slow (and incorrect), which the private clause fixes.

Page 38:

Data Scoping Attributes

Specify the data scoping at a parallel construct or a work sharing construct:

shared(var_list):   var_list is shared among the threads
private(var_list):  var_list is private to each thread
reduction(operator : var_list): var_list is private within the construct, and the private copies are combined into the original variable after the construct
  ex) #pragma omp for reduction(+: sum)

Page 39:

Example of Data Scoping Attributes

omp_set_num_threads(4);

#pragma omp parallel for private(i, j)
for (i = 1; i < N-1; i++) {
    for (j = 1; j < N-1; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                       + a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}

Page 40:

Synchronization

barrier: wait until all threads reach this line
  #pragma omp barrier

critical: execute exclusively
  #pragma omp critical [(name)]
  { … }

atomic: update a scalar variable atomically
  #pragma omp atomic
  …

Page 41:

Synchronization (Pthreads/OpenMP)

Synchronization in the sample program

<Pthreads>

pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);

<OpenMP>

#pragma omp barrier

Page 42:

Summary of OpenMP

Incremental parallelization of sequential programs

Portability

Easier to implement parallel applications than with Pthreads or MPI

Page 43:

Agenda
  Introduction
  Sample Sequential Program
  Multi-thread programming
  OpenMP
  Summary

Page 44:

Message Passing Model / Shared Memory Model

                 Message Passing    Shared Memory
Architecture     any                SMP or DSM
Programming      difficult          easier
Performance      good               better (SMP) / worse (DSM)
Cost             less expensive     very expensive (SunFire 15K: $4,140,830)

Page 45:

Thank you!

