Mohsan Jameel, Department of Computing, NUST School of Electrical Engineering and Computer Science

Date post: 18-Dec-2015
Transcript
Page 1

Mohsan Jameel
Department of Computing
NUST School of Electrical Engineering and Computer Science

Page 2

Outline

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 3

What is OpenMP?

An application program interface (API) used to explicitly direct multi-threaded, shared-memory parallelism.

Consists of:
• Compiler directives
• Run-time routines
• Environment variables

• Specification maintained by the OpenMP Architecture Review Board (http://www.openmp.org)
• Version 3.0 was released in May 2008

Page 4

What OpenMP is Not

• Not automatic parallelization: the user explicitly specifies parallel execution, and the compiler does not ignore user directives even if they are wrong
• Not just loop-level parallelism: it also provides functionality for coarse-grained parallelism
• Not meant for distributed-memory parallel systems
• Not necessarily implemented identically by all vendors
• Not guaranteed to make the most efficient use of shared memory

Page 5

History of OpenMP

• In the early 1990s, vendors of shared-memory machines supplied similar, directive-based Fortran programming extensions: the user would augment a serial Fortran program with directives specifying which loops were to be parallelized.
• The first attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed-memory machines became popular.
• The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off, as newer shared-memory machine architectures started to become prevalent.

Page 6

Goals of OpenMP

Standardization:
Provide a standard among a variety of shared-memory architectures/platforms

Lean and mean:
Establish a simple and limited set of directives for programming shared-memory machines

Ease of use:
Provide the capability to incrementally parallelize a serial program
Provide the capability to implement both coarse-grain and fine-grain parallelism

Portability:
Support Fortran (77, 90, and 95), C, and C++

Page 7

Outline

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 8

OpenMP Programming Model

Thread-Based Parallelism

Explicit Parallelism

Compiler Directive Based

Dynamic Threads

Nested Parallelism Support

Task parallelism support (OpenMP specification 3.0)

Page 9

Shared Memory Model

Page 10

Execution Model

(figure: the master thread, ID=0, forks a team of worker threads with IDs 1, 2, 3, …, N-1, which join back at the end of the parallel region)

Page 11

Terminology

• OpenMP team = master + workers
• A parallel region is a block of code executed by all threads simultaneously
• The master thread always has thread ID = 0
• Thread-count adjustment is done before entering a parallel region
• An "if" clause can be used with the parallel construct; if the condition evaluates to FALSE, the parallel region is skipped and the code runs serially
• A work-sharing construct is responsible for dividing work among the threads in a parallel region

Page 12

Example OpenMP Code Structure

Page 13

Components of OpenMP

Page 14

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 15

Go to helloworld.c

Page 16

Parallel Region Example (Fortran form of the directives)

!$OMP PARALLEL
write (*,*) "Hello"
!$OMP END PARALLEL

Sample output with 3 threads (ordering may vary between runs):

Hello world from thread = 0
Number of threads = 3
Hello world from thread = 1
Hello world from thread = 2

Page 17

OpenMP Directives

Page 18

OpenMP Scoping

Static extent:
The code textually enclosed between the beginning and end of a structured block
The static extent does not span other routines

Orphaned directive:
An OpenMP directive that appears independently of any enclosing parallel construct

Dynamic extent:
Includes both the static extent and orphaned directives

Page 19

OpenMP Parallel Regions

A block of code that will be executed by multiple threads

Properties

- Fork-join model

- The number of threads won't change inside a parallel region

- SPMD execution within the region

- The enclosed block of code must be structured; no branching into or out of the block

Format

#pragma omp parallel clause1 clause2 …

Page 20

OpenMP Threads

How many threads?
• Use of the omp_set_num_threads() library function
• Setting of the OMP_NUM_THREADS environment variable
• Implementation default

Dynamic threads:
By default, the same number of threads is used to execute each parallel region.
Two methods for enabling dynamic threads:
1. Use of the omp_set_dynamic() library function
2. Setting of the OMP_DYNAMIC environment variable

Page 21

OpenMP Work-sharing constructs

• Data parallelism
• Functional parallelism
• Serialize a section

Page 22

Example: Count3s in an array

Let's assume we have an array of N integers.

We want to find how many 3s are in the array.

We need a for loop, an if statement, and a count variable.

Let's look at its serial and parallel versions.

Page 23

Serial: Count3s in an array

int count = 0, n = 100;
int array[n]; // initialize array

for (i = 0; i < n; i++) {
    if (array[i] == 3)
        count++;
}

Page 24

Work-sharing construct: “for loop”

The "for loop" work-sharing construct can be thought of as a data-parallelism construct.

Page 25

Parallelize, 1st attempt: Count3s in an array

int count = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel for default(none) shared(n,array,count) private(i)
for (i = 0; i < n; i++) {
    if (array[i] == 3)
        count++;   // unsynchronized update of a shared variable: race condition!
}

Page 26

Work-sharing construct: Example of "for loop"

#pragma omp parallel for default(none) shared(n,a,b,c) private(i)
for (i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}

Page 27

Work-sharing construct: “section”

The "sections" work-sharing construct can be thought of as a functional-parallelism construct.

Page 28

Parallelize, 2nd attempt: Count3s in an array

• Say we also want to count 4s in the same array.
• Now we have two different functions, i.e. count 3s and count 4s.

int count3 = 0, count4 = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel sections default(none) shared(n,array,count3,count4) private(i)
{
    #pragma omp section
    for (i = 0; i < n; i++) {
        if (array[i] == 3)
            count3++;
    }

    #pragma omp section
    for (i = 0; i < n; i++) {
        if (array[i] == 4)
            count4++;
    }
}

No data race condition in this example. WHY?

Page 29

Work-sharing construct: Example 1 of "section"

#pragma omp parallel sections default(none) shared(a,b,c,d,e,n) private(i)
{
    #pragma omp section
    {
        printf("Thread %d executes 1st loop\n", omp_get_thread_num());
        for (i = 0; i < n; i++)
            a[i] = 3 * b[i];
    }

    #pragma omp section
    {
        printf("Thread %d executes 2nd loop\n", omp_get_thread_num());
        for (i = 0; i < n; i++)
            e[i] = 2 * c[i] + d[i];
    }
}
final_sum = sum(a,n) + sum(e,n);   // sum() is a helper that adds up an array
printf("FINAL_SUM is %d\n", final_sum);

Page 30

Work-sharing construct: Example 2 of "section" 1/2

Page 31

Work-sharing construct: Example 2 of "section" 2/2

Page 32

Work-sharing construct: Example of "single"

Inside a parallel region, a "single" block is executed by only one thread in the team of threads.

Let's look at an example.

Page 33

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 34

OpenMP Clauses: Data sharing 1/2

shared(list)
The shared clause specifies which data are shared among threads.
All threads can read and write these shared variables.
By default, all variables are shared.

private(list)
Private variables are local to each thread.
A typical example of a private variable is a loop counter, since each thread has its own loop counter, initialized at the entry point.

Page 35

A private variable is defined between the entry and exit points of the parallel region.

A private variable has no scope outside the parallel region.

The firstprivate and lastprivate clauses are used to extend the scope of a variable beyond the parallel region.

firstprivate: all variables in the list are initialized with the value the original object had before entering the parallel region.

lastprivate: the thread that executes the last iteration or section updates the value of the object in the list.

OpenMP Clauses: Data sharing 2/2

Page 36

Example: firstprivate and lastprivate

int main(){
    int i, n = 10;
    int C, B, A = 10;

    /*--- Start of parallel region ---*/
    #pragma omp parallel for default(none) shared(n) firstprivate(A) \
            lastprivate(B) private(i)
    for (i = 0; i < n; i++){
        …
        B = i + A;
        …
    }
    /*--- End of parallel region ---*/
    C = B;
}

Page 37

OpenMP Clauses: nowait

The nowait clause is used to avoid the implied barrier (synchronization) at the end of a work-sharing construct.

Page 38

OpenMP Clause: schedule

The schedule clause is supported on the loop construct only.
It controls the manner in which loop iterations are distributed over the threads.

Syntax: schedule(kind[,chunk_size])

Kinds:
• static[,chunk]: distributes iterations in blocks of size "chunk" over the threads in round-robin fashion
• dynamic[,chunk]: fixed-size portions of work, with the size controlled by the value of chunk; when a thread finishes its portion, it starts on the next one
• guided[,chunk]: same as "dynamic", but the size of the portions decreases exponentially
• runtime: the iteration scheduling scheme is set at run time through the environment variable OMP_SCHEDULE

Page 39

The Experiment with schedule clause

Page 40

OpenMP Critical construct

Example: summation of a vector

int main(){
    int i, sum = 0, n = 5;
    int a[5] = {1,2,3,4,5};
    /*--- Start of parallel region ---*/
    #pragma omp parallel for default(none) shared(sum,a,n) private(i)
    for (i = 0; i < n; i++){
        sum += a[i];   // unsynchronized update of sum: race condition!
    }
    /*--- End of parallel region ---*/
    printf("sum of vector a = %d", sum);
}

Page 41

OpenMP Critical construct

int main(){
    int i, sum = 0, local_sum, n = 5;
    int a[5] = {1,2,3,4,5};
    /*--- Start of parallel region ---*/
    #pragma omp parallel default(none) shared(sum,a,n) private(local_sum,i)
    {
        local_sum = 0;   // each thread starts from its own zeroed copy
        #pragma omp for
        for (i = 0; i < n; i++){
            local_sum += a[i];
        }
        #pragma omp critical
        {
            sum += local_sum;
        }
    }
    /*--- End of parallel region ---*/
    printf("sum of vector a = %d", sum);
}

Page 42

Parallelize, 3rd attempt: Count3s in an array

int count = 0, local_count, n = 100;
int array[n]; // initialize array

#pragma omp parallel default(none) shared(n,array,count) private(i,local_count)
{
    local_count = 0;
    #pragma omp for
    for (i = 0; i < n; i++){
        if (array[i] == 3)
            local_count++;
    }
    #pragma omp critical
    {
        count += local_count;
    }
} /*--- End of Parallel region ---*/

Page 43

OpenMP Clause: reduction

int main(){
    int i, sum = 0, n = 5;
    int a[5] = {1,2,3,4,5};
    /*--- Start of parallel region ---*/
    #pragma omp parallel for default(none) shared(a,n) private(i) \
            reduction(+:sum)
    for (i = 0; i < n; i++)
    {
        sum += a[i];
    }
    /*--- End of parallel region ---*/
    printf("sum of vector a = %d", sum);
}

• OpenMP provides a reduction clause, used with the for-loop and sections directives.

• The reduction variable must be shared among threads.

• The race condition is avoided implicitly.

Page 44

Parallelize, 4th attempt: Count3s in an array

int count = 0, n = 100;
int array[n]; // initialize array

#pragma omp parallel for default(none) shared(n,array) private(i) \
        reduction(+:count)
for (i = 0; i < n; i++){
    if (array[i] == 3)
        count++;
} /*--- End of Parallel region ---*/

Page 45

Tasking in OpenMP

Page 46

Tasking in OpenMP

In OpenMP 3.0 the concept of tasks was added to the OpenMP execution model.

The task model is useful in cases where the number of parallel pieces, and the work involved in each piece, varies and/or is unknown.

Before the inclusion of the task model, OpenMP was not well suited to unstructured problems.

Tasks are often set up within a "single" construct in a manager-worker model.

Page 47

Task Parallelism Approach 1/2

• Threads line up as workers, go through the queue of work to be done, and do a task.

• Threads do not wait, as in loop parallelism; rather, they go back to the queue and do more tasks.

• Each task is executed serially by the worker thread that encounters it in the queue.

• Load balancing occurs as short and long tasks are done as threads become available.

Page 48

Task Parallelism Approach 2/2

Page 49

Example: Task parallelism

Page 50

Best Practices

• Optimize barrier use

• Avoid the ordered construct

• Avoid large critical regions

• Maximize parallel regions

• Avoid multiple use of parallel regions

• Address poor load balance

Page 51

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 52

List of runtime library routines

Runtime library routines are declared in the omp.h header file:

void omp_set_num_threads(int num);
int omp_get_num_threads();
int omp_get_max_threads();
int omp_get_thread_num();
int omp_get_thread_limit();
int omp_get_num_procs();
double omp_get_wtime();
int omp_in_parallel(); // returns 0 (false) or non-zero (true)

…and a few more.

Page 53

More runtime library routines

These routines are new with OpenMP 3.0.

Page 54

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 55

Environment Variables

OMP_NUM_THREADS

OMP_DYNAMIC

OMP_THREAD_LIMIT

OMP_STACKSIZE

Page 56

I. Introduction to OpenMP

II. OpenMP Programming Model

III. OpenMP Directives

IV. OpenMP Clauses

V. Run-Time Library Routines

VI. Environment Variables

VII. Summary

Page 57

Summary

OpenMP provides a small yet powerful programming model.

Compilers with OpenMP support are widely available.

OpenMP is a directive-based shared-memory programming model.

The OpenMP API is a general-purpose parallel programming API with an emphasis on the ability to parallelize existing programs.

Scalable parallel programs can be written using parallel regions.

Work-sharing constructs enable efficient parallelization of the computationally intensive portions of a program.

Page 58

Thank You
and

Exercise Session

