Spring 2019 CS4823/6643 Parallel Computing 1

OpenMP with Examples

Wei Wang


The Components of OpenMP Programs

● Construct
– A block of code to be parallelized in a specific way
● Directive
– A special statement (called a pragma) used to declare a block to be a construct
● Clause
– Appended to a directive to declare additional attributes for a construct
– Usually used to specify how to handle variables, e.g., whether to make a variable shared or private
● Run-Time Library Functions
– Functions to be called from OpenMP programs
– Usually used to query execution-time information or to control execution


The Components of OpenMP

● Example:

    #pragma omp for private(i)
    for (i = 0; i < 100000; i++) { a[i] = 2 * i; }

– Directive: "#pragma omp for" declares the following for loop to be parallelized
– Clause: "private(i)" gives each thread its own copy of i (the array a is shared by all threads by default)
– Construct: the for loop to be parallelized


Constructs and Directives: What We Will Learn

● Parallel construct
– #pragma omp parallel
● Work sharing constructs
– "For" loops: #pragma omp for
– Sections: #pragma omp sections
– SINGLE code block: #pragma omp single
● Combined Parallel Work-Sharing Constructs
– #pragma omp parallel for/sections
● Synchronization constructs
– MASTER block: #pragma omp master
– CRITICAL block: #pragma omp critical
– BARRIER: #pragma omp barrier
– ATOMIC: #pragma omp atomic
● There are a few more constructs and directives


Parallel Construct

● Syntax:

    #pragma omp parallel
    {
        CODE BLOCK
    }

● Functionality:
– Creates multiple threads (number specified by the user) to run in parallel
– If not combined with other directives, each thread executes "CODE BLOCK" independently


Parallel Construct cont’d

● Example - code*:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        #pragma omp parallel
        {
            printf("Hello world\n");
            printf("Goodbye world\n");
        }
        return 0;
    }

● Example – output:

    Hello world
    Goodbye world
    Hello world
    Hello world
    Hello world
    Goodbye world
    Hello world
    Hello world
    Hello world
    Goodbye world
    Goodbye world
    Hello world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world

– Note the random sequence: the order is different on every run
– By default OpenMP creates one thread per core, e.g., 12 threads on the 12-core fox01 server; the sample output above comes from a run with 8 threads

*example file name: parallel_directive.c


Work Sharing Construct: For Loops

● Syntax:

    #pragma omp parallel
    {
        #pragma omp for
        for (...) {
            CODE BLOCK
        }
    }

– The enclosing "parallel" is needed, or there is only one thread

● Functionality:
– The parallel construct creates multiple threads (number specified by the user); the for directive makes each thread execute some of the iterations


Work Sharing Construct: For loops cont’d

● Example - code*:

    #include <stdio.h>
    #include <omp.h>

    int main() {
        int i;
        #pragma omp parallel
        {
            #pragma omp for
            for (i = 0; i < 16; i++)
                printf("i is %d\n", i);
        }
        return 0;
    }

● Example – output:

    i is 0
    i is 1
    i is 4
    i is 5
    i is 6
    i is 7
    i is 14
    i is 15
    i is 10
    i is 11
    i is 8
    i is 9
    i is 12
    i is 13
    i is 2
    i is 3

– Scrambled output indicates that multiple threads run concurrently
– 8 threads (by default) execute 16 iterations ==> 2 iterations/thread
– "i" goes from 0 to 15, indicating that only 16 iterations are executed in total, hence "work sharing"

*example file name: for_directive.c


Work Sharing Construct: For loops cont’d

● The for directive only parallelizes the outer loop

    #pragma omp parallel private(j)
    {
        #pragma omp for
        for (i = ...) {
            SOME CODE
            for (j = ...) {
                SOME OTHER CODE
            }
        }
    }

– Only the outer loop (the i loop) is parallelized
– The inner loop (the j loop) is executed in full by every thread, once for each outer iteration assigned to that thread
– Because the inner loop is executed by every thread, it is important that j is a private variable (more on private variables later)


Work Sharing Construct: Sections

● Syntax:

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {CODE BLOCK}
            #pragma omp section
            {CODE BLOCK}
        }
    }

● Functionality:
– Create multiple threads (number specified by the user), each executing some of the sections
– Can declare any number of sections
– For non-iterative (i.e., not for loops) parallel algorithms


Work Sharing Construct: Sections cont’d

● Example - code*:

    int i, j;
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            for (i = 0; i < 8; i++)
                printf("i is %d\n", i);
            #pragma omp section
            for (j = 0; j < 8; j++)
                printf("j is %d\n", j);
        }
    }

● Example – output:

    j is 0
    j is 1
    j is 2
    j is 3
    i is 0
    i is 1
    i is 2
    j is 4
    i is 3
    j is 5
    i is 4
    j is 6
    j is 7
    i is 5
    i is 6
    i is 7

– Intermingled i and j outputs indicate parallel execution
– Since we have only two sections, at most two threads will be used for execution
– Values of "i" or "j" always go sequentially from 0 to 7, indicating that each for loop is executed serially, i.e., each section is executed by one thread

*example file name: section_directive.c


Work Sharing Construct: SINGLE Directive

● Syntax:

    #pragma omp parallel
    {
        { CODE BLOCKS EXECUTED IN PARALLEL }
        #pragma omp single
        {CODE BLOCK TO BE EXECUTED BY ONE THREAD}
        { CODE BLOCKS EXECUTED IN PARALLEL }
    }

● Functionality:
– Specify that a block of code should be executed only once, by only one thread, within a parallel construct


Work Sharing Construct: SINGLE Directive cont’d

● Example - code*:

    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("This only prints once\n");
        }
        printf("Hello world\n");
        printf("Goodbye world\n");
    }

● Example – output:

    This only prints once
    Hello world
    Goodbye world
    Hello world
    Hello world
    Goodbye world
    Goodbye world
    Hello world
    Hello world
    Hello world
    Hello world
    Goodbye world
    Hello world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world

– The first line is printed only once, even within a parallel construct
– The other printfs are executed multiple times by multiple threads

*example file name: single_directive.c


Combined Parallel Work-Sharing Constructs

● For convenience, OpenMP allows combining the parallel and work sharing directives into one line

● For loops, separate directives:

    #pragma omp parallel
    {
        #pragma omp for
        for (...) {
            CODE BLOCK
        }
    }

● For loops, combined directive (the for loop must follow the directive immediately, with no enclosing braces):

    #pragma omp parallel for
    for (...) {
        CODE BLOCK
    }

● Sections, separate directives:

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {CODE BLOCK}
            #pragma omp section
            {CODE BLOCK}
        }
    }

● Sections, combined directive:

    #pragma omp parallel sections
    {
        #pragma omp section
        {CODE BLOCK}
        #pragma omp section
        {CODE BLOCK}
    }


Synchronization Construct: MASTER Directive

● Syntax:

    // within a parallel and/or work sharing construct
    {
        { CODE BLOCKS EXECUTED IN PARALLEL }
        #pragma omp master
        {CODE BLOCK TO BE EXECUTED BY MASTER THREAD}
        { CODE BLOCKS EXECUTED IN PARALLEL }
    }

● Functionality:
– Specify that a block of code should be executed only once, by the master thread, within a parallel construct
– The master thread is the thread that encounters the parallel construct (thread number 0), and it is usually responsible for coordinating the other threads


Synchronization Construct: MASTER Directive cont’d

● Example - code*:

    #pragma omp parallel
    {
        #pragma omp master
        {
            printf("I am the master!\n");
        }
        printf("Hello world\n");
        printf("Goodbye world\n");
    }

● Example – output:

    Hello world
    Hello world
    Goodbye world
    Hello world
    Hello world
    Hello world
    Goodbye world
    I am the master!
    Hello world
    Goodbye world
    Goodbye world
    Hello world
    Goodbye world
    Goodbye world
    Hello world
    Goodbye world
    Goodbye world

– The "I am the master!" line is printed only once, even within a parallel construct
– The other printfs are executed multiple times by multiple threads
– Note that the master block may not be the first one to be executed

*example file name: master_directive.c


Synchronization Construct: MASTER Directive cont’d

● MASTER vs. SINGLE:
– The MASTER construct is always executed by the master thread
– Many algorithms need a master thread to coordinate parallel tasks; it is usually good to know that this coordination work is always done by the same thread
– The SINGLE construct is executed by the first thread that reaches the SINGLE construct code
– Because of this dynamic scheduling nature, the SINGLE construct is slightly slower than the MASTER construct
– The SINGLE construct has an implicit barrier after it; the MASTER construct does not
● If you compare the outputs of the MASTER and SINGLE examples, you will find that
– The SINGLE printf is always executed before the other printfs, because the implicit barrier blocks the other threads from proceeding until the SINGLE printf has executed
– The MASTER printf can be executed after other printfs, because there is no barrier and the master thread may be scheduled to run after the other threads


Synchronization Construct: Barrier Directive

● Syntax:

    // within a parallel and/or work sharing construct
    {
        { CODE BLOCKS EXECUTED IN PARALLEL }
        #pragma omp barrier
        { CODE BLOCKS EXECUTED IN PARALLEL }
    }

● Functionality:
– When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier. All threads then resume executing in parallel the code that follows the barrier.


Synchronization Construct: Barrier Directive cont’d

● Example - code*:

    #pragma omp parallel
    {
        printf("Hello world\n");
        #pragma omp barrier
        printf("Goodbye world\n");
    }

● Example – output:

    Hello world
    Hello world
    Hello world
    Hello world
    Hello world
    Hello world
    Hello world
    Hello world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world
    Goodbye world

– All "Hello world" lines are always printed before any "Goodbye world", because the barrier directs the threads to wait for each other before proceeding
– Compare the output with that of the parallel construct example, where there is no barrier

*example file name: barrier_directive.c


Synchronization Construct: CRITICAL Directive

● Syntax:

    // within a parallel and/or work sharing construct
    {
        {CODE BLOCKS EXECUTED IN PARALLEL}
        #pragma omp critical
        {
            CODE BLOCK EXECUTED BY ONE THREAD AT A TIME
        }
        {CODE BLOCKS EXECUTED IN PARALLEL}
    }

● Functionality:
– The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.


Synchronization Construct: CRITICAL Directive cont’d

● Example – code* (control example without CRITICAL):

    int i = 0, j = 0;

    #pragma omp parallel
    {
        {
            i = i + 1;
            j = j + 1;
        }
    }
    printf("i = %d, j = %d\n", i, j);

● Example – output (three separate runs):

    i = 5, j = 6
    i = 4, j = 4
    i = 4, j = 5

– The values of i and j are not stable; because the unsynchronized updates race with each other, they can be any number up to the total number of threads

*example file name: critical_directive.c


Synchronization Construct: CRITICAL Directive cont’d

● Example - code*:

    int i = 0, j = 0;

    #pragma omp parallel
    {
        #pragma omp critical
        {
            i = i + 1;
            j = j + 1;
        }
    }
    printf("i = %d, j = %d\n", i, j);

● Example – output (three separate runs):

    i = 12, j = 12
    i = 12, j = 12
    i = 12, j = 12

– The values of i and j are stable, and are always the same as the number of threads (by default 12 threads on fox01)

*example file name: critical_directive.c


Synchronization Construct: ATOMIC Directive

● Syntax:

    // within a parallel and/or work sharing construct
    {
        {CODE BLOCKS EXECUTED IN PARALLEL}
        #pragma omp atomic
        STATEMENT EXECUTED ATOMICALLY
        {CODE BLOCKS EXECUTED IN PARALLEL}
    }

● Functionality:
– The ATOMIC directive specifies one statement, a simple update of a memory location such as x++ or x = x + 1, that must be executed by only one thread at a time.


Synchronization Construct: ATOMIC Directive cont’d

● Example – code* (control example without ATOMIC):

    int i = 0;

    #pragma omp parallel
    {
        i = i + 1;
    }
    printf("i = %d\n", i);

● Example – output (three separate runs):

    i = 5
    i = 4
    i = 4

– The value of i is not stable; because the unsynchronized updates race with each other, it can be any number up to the total number of threads

*example file name: atomic_directive.c


Synchronization Construct: ATOMIC Directive cont’d

● Example - code*:

    int i = 0;

    #pragma omp parallel
    {
        #pragma omp atomic
        i = i + 1;
    }
    printf("i = %d\n", i);

● Example – output (three separate runs):

    i = 8
    i = 8
    i = 8

– The value of i is stable, and is always the same as the number of threads (8 in these runs; the default on fox01 is 12)

*example file name: atomic_directive.c


CLAUSES

● In OpenMP, clauses are used to specify the properties and/or behavior of a construct, e.g.,
– How to use variables?
– How to map and partition work?
– How to create and execute threads?
● Clauses are appended to directives
● Clauses we will learn:
– Variables-related: private, shared, default, firstprivate, lastprivate, reduction
– Mapping/partitioning-related: schedule
– Execution-related: nowait


CLAUSE: PRIVATE

● Syntax (always after a directive):

    private (var1, var2, …)

● Description:
– The PRIVATE clause declares variables in its list to be private to each thread.
– For a private variable:
– OpenMP creates a new instance of this variable for each thread
– Each thread will only access this new instance instead of the original instance
– The new instance is not initialized, i.e., it starts with an indeterminate value
– After execution of the construct, all new instances are discarded


CLAUSE: PRIVATE cont’d

● Example – code*:

    int i = 11;

    printf("Before: i is %d\n", i);

    #pragma omp parallel private(i)
    {
        printf("Within: i is %d\n", i);
    }
    printf("After: i is %d\n", i);

● Example – output:

    Before: i is 11
    Within: i is 1
    Within: i is 1
    Within: i is 1
    Within: i is 1
    Within: i is 1
    Within: i is 32604
    Within: i is 1
    Within: i is 1
    After: i is 11

– The values of "i" before and after the parallel construct are always "11", indicating that this instance of "i" is not accessed by any thread
– The values of "i" within the construct are not the same for every thread, indicating that these "i"s are different from the original. The differences among these "i"s also suggest that they are not initialized

*example file name: private_clause.c


CLAUSE: FIRSTPRIVATE

● Syntax (always after a directive):

    firstprivate (var1, var2, …)

● Description:
– Same as PRIVATE, FIRSTPRIVATE variables are private to a thread
– Different from PRIVATE, FIRSTPRIVATE variables are initialized
– A FIRSTPRIVATE variable is initialized with the value of the original instance


CLAUSE: FIRSTPRIVATE cont’d

● Example – code*:

    int i = 3;

    printf("Before: i is %d\n", i);

    #pragma omp parallel firstprivate(i)
    {
        printf("Within: i's init value is %d\n", i);
        i++;
        printf("Within: i's new value is %d\n", i);
    }
    printf("After: i is %d\n", i);

*example file name: firstprivate_clause.c


CLAUSE: FIRSTPRIVATE cont’d

● Example – output:

    Before: i is 3
    Within: i's init value is 3
    Within: i's init value is 3
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's new value is 4
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's init value is 3
    Within: i's new value is 4
    Within: i's new value is 4
    After: i is 3

– The values of "i" before and after the parallel construct are always "3", indicating that this instance of "i" is not accessed by any thread
– For each thread, the value of "i" always starts with "3", indicating that "i" is initialized; the final value of "i" is always 4, indicating that "i" is private


CLAUSE: LASTPRIVATE

● Syntax (always after a directive):

    lastprivate (var1, var2, …)

● Description:
– Same as PRIVATE, a LASTPRIVATE variable is private to a thread
– Different from PRIVATE, the original instance of a LASTPRIVATE variable is changed after parallel execution
– The value copied back into the original variable is obtained from the sequentially last iteration or section of the enclosing construct


CLAUSE: LASTPRIVATE cont’d

● Example – code*:

    int i = 0, j = 0;

    #pragma omp parallel for lastprivate(j)
    for (i = 0; i < 8; i++) {
        j = i;
        printf("i=%d, j=%d\n", i, j);
    }
    printf("Final val of j is %d\n", j);

● Example – output:

    i=0, j=0
    i=7, j=7
    i=2, j=2
    i=1, j=1
    i=6, j=6
    i=5, j=5
    i=3, j=3
    i=4, j=4
    Final val of j is 7

– After parallel execution, the value of the original "j" is updated based on the last loop iteration

*example file name: lastprivate_clause.c


CLAUSE: SHARED

● Syntax (always after a directive):

    shared (var1, var2, …)

● Description:
– The SHARED clause declares variables in its list to be shared among all threads in the team.
– For a shared variable:
– There is only one instance of this variable
– Every thread accesses this same instance
– Accesses to shared variables should be properly protected if the shared variables will be written (e.g., using critical/atomic directives)


CLAUSE: SHARED cont’d

● Example – code*:

    int i = 0;

    printf("Before: i is %d\n", i);

    #pragma omp parallel shared(i)
    {
        #pragma omp critical
        {
            i++;
            printf("Within: i is %d\n", i);
        }
    }
    printf("After: i is %d\n", i);

● Example – output:

    Before: i is 0
    Within: i is 1
    Within: i is 2
    Within: i is 3
    Within: i is 4
    Within: i is 5
    Within: i is 6
    Within: i is 7
    Within: i is 8
    Within: i is 9
    Within: i is 10
    Within: i is 11
    Within: i is 12
    After: i is 12

– The value of "i" is incremented by each thread, from 1 to 12 (recall there are 12 threads by default); the CRITICAL directive ensures that the updates to "i" are sequential.

*example file name: shared_clause.c


CLAUSE: DEFAULT

● Syntax (always after a directive):

    default (shared | private | none)

● Description:
– The DEFAULT clause specifies the default scope (shared or not) for all variables in a construct.
– If "none" is the default, programmers must explicitly scope their variables
– The values actually allowed for DEFAULT are compiler-dependent
– Some compilers do not allow default(private)
● For a variable, if:
– the DEFAULT clause is not used;
– and the variable is not explicitly scoped;
– and the variable is not the loop variable of a for construct;
– then the variable is shared by default


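CLAUSE: REDUCTION

● Syntax (always after a directive):

    reduction (operator: var1, var2)

● Description:
– After the parallel execution, a reduction is performed on the variables using the specified operator
– Each thread gets a private copy of every variable in the REDUCTION clause; unlike a FIRSTPRIVATE variable, the copy is initialized to the identity value of the operator (e.g., 0 for +, 1 for *), and the copies are combined with the original value at the end
– You can use most C arithmetic, bitwise, and logical operators, e.g., +, *, &, |, ^, &&, ||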


CLAUSE: REDUCTION cont’d

● Example – code*:

    int i = 0;
    int a[4] = {0, 1, 2, 3};
    int b[4] = {4, 5, 6, 7};
    int result = 0;

    #pragma omp parallel for reduction(+:result)
    for (i = 0; i < 4; i++)
        result += a[i] * b[i];
    printf("Final result is %d\n", result);

● Example – output:

    Final result is 38

– Although each thread only adds its own multiplications to its private copy of "result", the final "result" is the reduction (sum in this case) of all multiplications.

*example file name: reduction_clause.c


CLAUSE: SCHEDULE

● Syntax (only valid for for constructs):

    #pragma omp for schedule(type [, chunk])

● Description:
– Describes how iterations of the loop are divided among the threads
– Schedule types: STATIC, DYNAMIC, GUIDED, RUNTIME, AUTO
– Chunk size is optional; it is used as a guide for how many iterations should be assigned to a thread at a time


CLAUSE: SCHEDULE cont’d

● Schedule types:
– STATIC: the iterations are split into chunks of the given size, which are assigned to the threads in round-robin order; if chunk is not specified, the iterations are evenly divided among the threads
– DYNAMIC: each thread takes a chunk of iterations. Once a thread finishes its chunk, a new chunk is assigned to it.
– GUIDED: similar to DYNAMIC, except that the chunk size shrinks gradually as threads request more work
– RUNTIME: allows users to specify the schedule type during execution by setting the environment variable OMP_SCHEDULE
– AUTO: lets the compiler and the OpenMP library decide


CLAUSE: NOWAIT

● Syntax (always after a work sharing directive):

    nowait

● Description:
– There is an implicit barrier after each work sharing construct

    #pragma omp parallel for
    for (i = 0; i < 10; i++) {SOME WORK}
    /* Implicit barrier here */
    #pragma omp parallel for
    for (j = 0; j < 10; j++) {SOME WORK}


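CLAUSE: NOWAIT cont’d

● Description cont’d:
– The NOWAIT clause removes the implicit barrier after a work sharing construct. Therefore, after a thread finishes the previous construct, it can proceed immediately to the next construct
– NOWAIT must be attached to the work sharing directive (e.g., omp for) inside a parallel construct; it is not allowed on the combined parallel for directive

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i = 0; i < 10; i++) {SOME WORK}
        /* NO barrier here */
        #pragma omp for
        for (j = 0; j < 10; j++) {SOME WORK}
    }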


CLAUSE: NOWAIT cont’d

● Example – code* (without nowait):

    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 8; i++)
            printf("i is %d\n", i);
        #pragma omp for
        for (j = 0; j < 8; j++)
            printf("j is %d\n", j);
    }

● Example – output:

    i is 0
    i is 1
    i is 6
    i is 7
    i is 5
    i is 2
    i is 3
    i is 4
    j is 0
    j is 7
    j is 2
    j is 1
    j is 5
    j is 3
    j is 4
    j is 6

– Loop i is always executed before loop j

*example file name: nowait_clause.c


CLAUSE: NOWAIT cont’d

● Example – code* (with nowait):

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i = 0; i < 8; i++)
            printf("i is %d\n", i);
        #pragma omp for
        for (j = 0; j < 8; j++)
            printf("j is %d\n", j);
    }

● Example – output:

    i is 0
    j is 0
    i is 1
    j is 1
    i is 2
    j is 2
    i is 3
    j is 3
    i is 4
    j is 4
    i is 7
    j is 7
    i is 5
    j is 5
    i is 6
    j is 6

– Some of loop i’s iterations are executed after some of loop j’s iterations

*example file name: nowait_clause.c


Runtime Function: omp_get_thread_num

● Function declaration:

    #include <omp.h>
    int omp_get_thread_num(void)

● Description:
– Returns the thread number (id) of the calling thread


Runtime Function: omp_get_thread_num cont’d

● Example – code*:

    int i, id;
    #pragma omp parallel private(id)
    {
        id = omp_get_thread_num();
        #pragma omp for
        for (i = 0; i < 16; i++)
            printf("thread %d handles iteration i = %d\n", id, i);
    }

● Example – output (the first number on each line is the thread number/id):

    thread 0 handles iteration i = 0
    thread 0 handles iteration i = 1
    thread 7 handles iteration i = 14
    thread 7 handles iteration i = 15
    thread 2 handles iteration i = 4
    thread 2 handles iteration i = 5
    thread 4 handles iteration i = 8
    thread 4 handles iteration i = 9
    thread 6 handles iteration i = 12
    thread 6 handles iteration i = 13
    thread 3 handles iteration i = 6
    thread 3 handles iteration i = 7
    thread 1 handles iteration i = 2
    thread 1 handles iteration i = 3
    thread 5 handles iteration i = 10
    thread 5 handles iteration i = 11

*example file name: ompgetthreadnum_clause.c


More Run-time Functions

    Function Name          Description
    omp_set_num_threads    Sets the number of threads that will be used in the next parallel region
    omp_get_num_threads    Returns the number of threads that are currently in the team executing the parallel region from which it is called
    omp_get_wtime          Provides a portable wall clock timing routine
    omp_get_wtick          Returns a double-precision floating point value equal to the number of seconds between successive clock ticks

*More functions can be found at: https://computing.llnl.gov/tutorials/openMP/#RunTimeLibrary

