Posted on 16-Mar-2020
Programming with OpenMP*
Objectives
Upon completion of this module you will be able to use OpenMP to:
• implement data parallelism
• implement task parallelism
Copyright © 2014, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
SOFTWARE AND SERVICES
Agenda
• What is OpenMP?
• Parallel regions
• Worksharing
• Data environment
• Synchronization
• Optional advanced topics
What Is OpenMP?
• Portable, shared-memory threading API
  – Fortran, C, and C++
  – Multi-vendor support for both Linux and Windows
• Standardizes task & loop-level parallelism
• Supports coarse-grained parallelism
• Combines serial and parallel code in a single source
• Standardizes ~20 years of compiler-directed threading experience
• http://www.openmp.org
• Spec covered in these slides: OpenMP 3.0 (318 pages, combined C/C++ and Fortran)
Programming Model
Fork-Join Parallelism:
• Master thread spawns a team of threads as needed
• Parallelism is added incrementally: that is, the sequential program evolves into a parallel program
[Figure: the master thread forks a team at each parallel region and joins back afterwards]
A Few Syntax Details to Get Started
• Most of the constructs in OpenMP are compiler directives or pragmas
• For C and C++, the pragmas take the form:
  #pragma omp construct [clause [clause]…]
• For Fortran, the directives take one of the forms:
  C$OMP construct [clause [clause]…]
  !$OMP construct [clause [clause]…]
  *$OMP construct [clause [clause]…]
• Header file or Fortran 90 module:
  #include "omp.h"
  use omp_lib
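A minimal sketch of the C/C++ syntax above, assuming a compiler with OpenMP enabled (for example `gcc -fopenmp`). `run_team` is a hypothetical helper. The `#ifdef _OPENMP` guard is a common idiom that lets the same source still compile serially when the pragmas are ignored.

```c
#ifdef _OPENMP
#include <omp.h>    /* only needed when OpenMP is enabled */
#else
/* Serial stand-ins so the same source builds if the pragmas are ignored. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: runs one parallel region with at most `requested`
   threads and returns the size of the team that actually ran it. */
int run_team(int requested)
{
    int team = 0;                               /* shared: declared before the region */

    #pragma omp parallel num_threads(requested) /* construct followed by one clause */
    {
        if (omp_get_thread_num() == 0)          /* only thread 0 writes, so no race */
            team = omp_get_num_threads();
    }                                           /* implicit join: team is visible again */

    return team;
}
```

`run_team(4)` may return fewer than 4 if the runtime supplies a smaller team.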
Parallel Region & Structured Blocks (C/C++)
• Most OpenMP constructs apply to structured blocks
• Structured block: a block with one point of entry at the top and one point of exit at the bottom
• The only "branches" allowed are STOP statements in Fortran and exit() in C/C++

A structured block:
#pragma omp parallel
{
    int id = omp_get_thread_num();
more: res[id] = do_big_job(id);
    if (!conv(res[id])) goto more;
}
printf("All done\n");

Not a structured block:
if (go_now()) goto more;
#pragma omp parallel
{
    int id = omp_get_thread_num();
more: res[id] = do_big_job(id);
    if (conv(res[id])) goto done;
    goto more;
}
done: if (!really_done()) goto more;
Activity 1: Hello Worlds
Modify the "Hello, Worlds" serial code to run multithreaded using OpenMP*
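One possible solution sketch for Activity 1, assuming the serial program simply prints a greeting (the lab's actual code may differ); `hello_worlds` is a hypothetical name. The output order varies from run to run because the threads print concurrently.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Every thread in the team prints one line. Returns the team size,
   which equals the number of lines printed. */
int hello_worlds(void)
{
    int team = 0;                        /* shared */

    #pragma omp parallel
    {
        int id = omp_get_thread_num();   /* declared inside the region: private */
        int n  = omp_get_num_threads();
        printf("Hello World from thread %d of %d\n", id, n);
        if (id == 0)
            team = n;                    /* a single writer avoids a race */
    }
    return team;
}
```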
Agenda
• What is OpenMP?
• Parallel regions
• Worksharing – Parallel For
• Data environment
• Synchronization
• Optional advanced topics
Worksharing
• Worksharing is the general term used in OpenMP to describe distribution of work across threads.
• Three examples of worksharing in OpenMP are:
  – omp for construct
  – omp sections construct
  – omp task construct
• Automatically divides work among threads
omp for construct
// assume N=12
#pragma omp parallel
#pragma omp for
for (i = 1; i < N+1; i++)
    c[i] = a[i] + b[i];

• Threads are assigned an independent set of iterations
• Threads must wait at the end of a work-sharing construct (implicit barrier)
[Figure: with three threads, one thread runs i = 1–4, another i = 5–8, and another i = 9–12; all then wait at the implicit barrier]
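A self-contained version of the loop above, written 0-based as is idiomatic in C; `vec_add` is a hypothetical name. The loop variable declared in the `for` statement is private to each thread automatically.

```c
/* Element-wise c = a + b; the loop iterations are split among the team. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    #pragma omp parallel
    {
        #pragma omp for               /* i is private automatically */
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }   /* implicit barriers at the end of omp for and of the parallel region */
}
```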
Combining constructs
• These two code segments are equivalent:

#pragma omp parallel
{
    #pragma omp for
    for (i = 0; i < MAX; i++) {
        res[i] = huge();
    }
}

#pragma omp parallel for
for (i = 0; i < MAX; i++) {
    res[i] = huge();
}
The Private Clause
• Reproduces the variable for each task
  – Variables are uninitialized; a C++ object is default constructed
  – The value the variable had outside the parallel region is undefined inside it

// a and b are assumed to be global arrays
void work(float* c, int N) {
    float x, y; int i;
    #pragma omp parallel for private(x,y)
    for (i = 0; i < N; i++) {
        x = a[i]; y = b[i];
        c[i] = x + y;
    }
}
Activity 2 – Parallel Mandelbrot
Objective: create a parallel version of Mandelbrot. Modify the code to add OpenMP worksharing clauses to parallelize the computation of Mandelbrot.
Follow the next Mandelbrot activity called Mandelbrot in the student lab doc.
The schedule clause
• The schedule clause affects how loop iterations are mapped onto threads
schedule(static [,chunk])
  – Blocks of iterations of size "chunk" are assigned to threads
  – Round-robin distribution
  – Low overhead, may cause load imbalance
schedule(dynamic [,chunk])
  – Threads grab "chunk" iterations
  – When done with its iterations, a thread requests the next set
  – Higher threading overhead, can reduce load imbalance
schedule(guided [,chunk])
  – Dynamic schedule starting with large blocks
  – The size of the blocks shrinks, but is no smaller than "chunk"
Schedule Clause Example
#pragma omp parallel for schedule(static, 8) reduction(+:gPrimesFound)
for (int i = start; i <= end; i += 2) {
    if (TestForPrime(i)) gPrimesFound++;
}
• Iterations are divided into chunks of 8
• If start = 3, then the first chunk is i = {3, 5, 7, 9, 11, 13, 15, 17}
• The reduction clause (covered later) keeps the shared counter gPrimesFound free of a data race
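A sketch of the same loop with `schedule(dynamic, 8)`, as Activity 2b asks for, assuming a simple trial-division test in place of the slide's `TestForPrime` (`is_prime` and `count_primes` are hypothetical names). Dynamic scheduling can help here because the cost of testing varies from one candidate to the next.

```c
/* Trial division for odd n >= 3 (hypothetical stand-in for TestForPrime). */
static int is_prime(int n)
{
    for (int d = 3; d * d <= n; d += 2)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Counts primes among start, start+2, ..., <= end (start odd and >= 3).
   Threads grab chunks of 8 iterations on demand; each keeps a private
   count that the reduction adds up at the end. */
int count_primes(int start, int end)
{
    int found = 0;
    #pragma omp parallel for schedule(dynamic, 8) reduction(+:found)
    for (int i = start; i <= end; i += 2)
        if (is_prime(i))
            found++;
    return found;
}
```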
Activity 2b – Mandelbrot Scheduling
Objective: create a parallel version of Mandelbrot that uses OpenMP dynamic scheduling.
Follow the next Mandelbrot activity called Mandelbrot Scheduling in the student lab doc.
Agenda
• What is OpenMP?
• Parallel regions
• Worksharing – Parallel Sections
• Data environment
• Synchronization
• Optional advanced topics
Task Decomposition
a = alice();
b = bob();
s = boss(a, b);
c = cy();
printf("%6.2f\n", bigboss(s, c));

[Figure: dependence graph in which alice and bob feed boss, and boss and cy feed bigboss]
• alice, bob, and cy can be computed in parallel
omp sections
#pragma omp sections
  – Must be inside a parallel region
  – Precedes a code block containing N blocks of code that may be executed concurrently by N threads
  – Encompasses each omp section
#pragma omp section
  – Precedes each block of code within the encompassing block described above
  – May be omitted for the first parallel section after the parallel sections pragma
  – Enclosed program segments are distributed for parallel execution among available threads
Functional Level Parallelism with sections
#pragma omp parallel sections
{
    #pragma omp section /* Optional */
    a = alice();
    #pragma omp section
    b = bob();
    #pragma omp section
    c = cy();
}
s = boss(a, b);
printf("%6.2f\n", bigboss(s, c));
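A runnable sketch of the sections example above; `alice`, `bob`, `cy`, `boss`, and `bigboss` are hypothetical stand-ins that return fixed values, so the printed result is always 9.00.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the independent functions on the slide. */
static double alice(void) { return 1.0; }
static double bob(void)   { return 2.0; }
static double cy(void)    { return 3.0; }
static double boss(double a, double b)    { return a + b; }
static double bigboss(double s, double c) { return s * c; }

double run_sections(void)
{
    double a, b, c;                 /* shared: declared outside the region */

    #pragma omp parallel sections
    {
        #pragma omp section         /* optional for the first section */
        a = alice();
        #pragma omp section
        b = bob();
        #pragma omp section
        c = cy();
    }   /* implicit barrier: a, b, and c are all set after this point */

    double s = boss(a, b);          /* depends on a and b, so runs afterwards */
    double result = bigboss(s, c);
    printf("%6.2f\n", result);
    return result;
}
```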
Advantage of Parallel Sections
• Independent sections of code can execute concurrently, reducing execution time

#pragma omp parallel sections
{
    #pragma omp section
    phase1();
    #pragma omp section
    phase2();
    #pragma omp section
    phase3();
}

[Figure: serial vs. parallel timeline of phase1, phase2, and phase3]
Agenda
• What is OpenMP?
• Parallel regions
• Worksharing – Tasks
• Data environment
• Synchronization
• Optional advanced topics
New Addition to OpenMP
• Tasks: main change for OpenMP 3.0
• Allows parallelization of irregular problems
  – unbounded loops
  – recursive algorithms
  – producer/consumer
What are tasks?
• Tasks are independent units of work
• Threads are assigned to perform the work of each task
  – Tasks may be deferred
  – Tasks may be executed immediately
  – The runtime system decides which of the above
• Tasks are composed of:
  – code to execute
  – data environment
  – internal control variables (ICV)
[Figure: serial vs. parallel execution of tasks]
Simple Task Example
#pragma omp parallel
// assume 8 threads
{
    #pragma omp single private(p)
    {
        …
        while (p) {
            #pragma omp task
            {
                processwork(p);
            }
            p = p->next;
        }
    }
}
• A pool of 8 threads is created at the omp parallel construct
• One thread gets to execute the while loop
• The single "while loop" thread creates a task for each instance of processwork()
Task Construct – Explicit Task View
#pragma omp parallel
{
    #pragma omp single
    { // block 1
        node * p = head;
        while (p) { // block 2
            #pragma omp task firstprivate(p)
            process(p);
            p = p->next; // block 3
        }
    }
}
• A team of threads is created at the omp parallel construct
• A single thread is chosen to execute the while loop; let's call this thread "L"
• Thread L operates the while loop, creates tasks, and fetches next pointers
• Each time L crosses the omp task construct it generates a new task and has a thread assigned to it
• Each task is run by one of the team's threads
• All tasks complete at the barrier at the end of the parallel region's single construct
• firstprivate(p) gives each task the value p had when the task was created; private(p) would leave the task's copy uninitialized
Why are tasks useful?
• Have potential to parallelize irregular patterns and recursive function calls

#pragma omp parallel
{
    #pragma omp single
    { // block 1
        node * p = head;
        while (p) { // block 2
            #pragma omp task
            process(p);
            p = p->next; // block 3
        }
    }
}

[Figure: timeline comparing single-threaded execution (block 1, then block 2 and block 3 for each of tasks 1–3) with four threads Thr1–Thr4, where one thread runs blocks 1 and 3 while the block 2 tasks run on the other threads; some threads sit idle, and the shorter timeline is labeled "Time Saved"]
Activity 3 – Linked List using Tasks
Objective: Modify the linked list pointer chasing code to implement tasks to parallelize the application.
Follow the Linked List task activity called LinkedListTask in the student lab doc.

while (p != NULL) {
    do_work(p->data);
    p = p->next;
}
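One possible task-based version of the Activity 3 loop, assuming a minimal hypothetical `node` type and a hypothetical `do_work` that takes the node and squares its payload (the lab's `do_work(p->data)` may differ). One thread chases the pointers and every node becomes a task.

```c
#include <stddef.h>

/* Hypothetical node type; the lab's actual list type may differ. */
typedef struct node {
    int data;
    struct node *next;
} node;

/* Hypothetical per-node work: squares the payload in place. */
static void do_work(node *p)
{
    p->data = p->data * p->data;
}

void process_list(node *head)
{
    #pragma omp parallel
    {
        #pragma omp single                       /* one thread walks the list... */
        {
            for (node *p = head; p != NULL; p = p->next) {
                #pragma omp task firstprivate(p) /* ...and creates one task per node */
                do_work(p);
            }
        }   /* implicit barrier at the end of single: every task has finished */
    }
}
```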
When are tasks guaranteed to be complete?
Tasks are guaranteed to be complete:
• At thread or task barriers
• At the directive: #pragma omp barrier
• At the directive: #pragma omp taskwait (which waits for the child tasks of the current task)
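A sketch of `taskwait` in use: two tasks each sum half of an array, and the generating thread may combine the partial sums only after `taskwait`. `sum_halves` is a hypothetical name chosen for illustration.

```c
/* Adds up v[0..n) with one task per half. */
long sum_halves(const int *v, int n)
{
    long lo = 0, hi = 0, total = 0;   /* declared outside the region: shared */

    #pragma omp parallel
    #pragma omp single                /* one thread generates both tasks */
    {
        #pragma omp task shared(lo)
        for (int i = 0; i < n / 2; i++)
            lo += v[i];

        #pragma omp task shared(hi)
        for (int i = n / 2; i < n; i++)
            hi += v[i];

        #pragma omp taskwait          /* both child tasks are complete here */
        total = lo + hi;              /* safe: lo and hi are final */
    }
    return total;
}
```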
Task Completion Example
#pragma omp parallel
{
    #pragma omp task
    foo();                  // multiple foo tasks created here, one for each thread
    #pragma omp barrier     // all foo tasks guaranteed to be completed here
    #pragma omp single
    {
        #pragma omp task
        bar();              // one bar task created here
    }                       // bar task guaranteed to be completed here
}
Data Scoping – What's shared
• OpenMP uses a shared-memory programming model
• Shared variable: a variable that can be read or written by multiple threads
• The shared clause can be used to make items explicitly shared
• Global variables are shared by default among tasks
  – File scope variables, namespace scope variables, static variables
  – Variables with const-qualified type having no mutable member are shared
  – Static variables which are declared in a scope inside the construct are shared
Data Scoping – What's private
But, not everything is shared...
• Examples of implicitly determined private variables:
  – Stack (local) variables in functions called from parallel regions are PRIVATE
  – Automatic variables within a statement block are PRIVATE
  – Loop iteration variables are private
  – Variables that are implicitly private in the context enclosing a task are treated as firstprivate within the task
• The firstprivate clause declares one or more list items to be private to a task, and initializes each of them with the value the original variable has when the construct is encountered
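A minimal `firstprivate` sketch (`add_offset` is a hypothetical name): each thread's copy of `offset` starts with the caller's value. With `private(offset)` instead, each copy would be uninitialized and the results undefined.

```c
/* Adds `offset` to every element of v[0..n). Each thread gets its own
   copy of offset, initialized from the original by firstprivate. */
void add_offset(int *v, int n, int offset)
{
    #pragma omp parallel for firstprivate(offset)
    for (int i = 0; i < n; i++)
        v[i] += offset;
}
```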
A Data Environment Example
float A[10];
int main(void)
{
    int index[10];
    #pragma omp parallel
    {
        Work(index);
    }
    printf("%d\n", index[1]);
}

extern float A[10];
void Work(int *index)
{
    float temp[10];
    static int count;
    <...>
}

Which variables are shared and which variables are private?
A, index, and count are shared by all threads, but temp is local to each thread.
[Figure: A, index, and count in shared memory; a separate temp for each thread]
Data Scoping Issue – fib example
int fib(int n)
{
    int x, y;
    if (n < 2) return n;
    #pragma omp task
    x = fib(n-1);
    #pragma omp task
    y = fib(n-2);
    #pragma omp taskwait
    return x+y;
}
• n is private in both tasks
• x and y are private (firstprivate) in their tasks
What's wrong here?
Can't use private variables outside of tasks: the values assigned to x and y inside the tasks are lost when the tasks finish, so x+y reads the uninitialized originals.
Data Scoping Example – fib example
int fib(int n)
{
    int x, y;
    if (n < 2) return n;
    #pragma omp task shared(x)
    x = fib(n-1);
    #pragma omp task shared(y)
    y = fib(n-2);
    #pragma omp taskwait
    return x+y;
}
• n is private in both tasks
• x & y are shared. Good solution: we need both values to compute the sum
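A runnable sketch of the corrected `fib`, assuming a driver like the hypothetical `parallel_fib` that opens a parallel region and lets one thread start the recursion. Real code would usually stop creating tasks below some cutoff value of `n`, since tasks this small cost more than they save.

```c
/* Recursive Fibonacci with one task per recursive call. */
static int fib(int n)
{
    int x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x)   /* shared, so the result outlives the task */
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait         /* wait for both children before reading x, y */
    return x + y;
}

/* Hypothetical driver: one thread starts the recursion, the team runs the tasks. */
int parallel_fib(int n)
{
    int result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib(n);
    return result;
}
```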
Data Scoping Issue – List Traversal
List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
    for (e = ml->first; e; e = e->next)
        #pragma omp task
        process(e);
}
What's wrong here?
Possible data race! e is shared, so the generating thread may advance it before a task reads it.
Data Scoping Example – List Traversal
List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single
{
    for (e = ml->first; e; e = e->next)
        #pragma omp task firstprivate(e)
        process(e);
}
Good solution: e is firstprivate
Data Scoping Example – List Traversal
List ml; //my_list
Element *e;
#pragma omp parallel
#pragma omp single private(e)
{
    for (e = ml->first; e; e = e->next)
        #pragma omp task
        process(e);
}
Good solution: e is private in the single construct, so each task gets its own firstprivate copy by default
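Data Scoping Example – List Traversal
List ml; //my_list
#pragma omp parallel
{
    Element *e;              // declared inside the region: private to each thread
    #pragma omp single       // one thread generates the tasks, so each element is processed once
    for (e = ml->first; e; e = e->next)
        #pragma omp task
        process(e);
}
Good solution: e is private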
Example: Dot Product
float dot_prod(float* a, float* b, int N) {
    float sum = 0.0;
    #pragma omp parallel for shared(sum)
    for (int i = 0; i < N; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
What is Wrong?
Race Condition
• A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable
• For example, suppose both Thread A and Thread B are executing the statement
  area += 4.0 / (1.0 + x*x);
Two Timings
Timing 1 (no overlap):
  Thread A: reads area = 11.667, adds 3.765, writes 15.432
  Thread B: reads area = 15.432, adds 3.563, writes 18.995
Timing 2 (overlap):
  Thread A: reads area = 11.667
  Thread B: reads area = 11.667
  Thread A: adds 3.765, writes 15.432
  Thread B: adds 3.563, writes 15.230
• Order of thread execution causes nondeterministic behavior in a data race
Protect Shared Data
• Must protect access to shared, modifiable data
float dot_prod(float* a, float* b, int N) {
    float sum = 0.0;
    #pragma omp parallel for shared(sum)
    for (int i = 0; i < N; i++) {
        #pragma omp critical
        sum += a[i] * b[i];
    }
    return sum;
}
OpenMP* Critical Construct
#pragma omp critical [(lock_name)]
• Defines a critical region on a structured block

float RES;
#pragma omp parallel
{
    float B;
    #pragma omp for
    for (int i = 0; i < niters; i++) {
        B = big_job(i);
        #pragma omp critical (RES_lock)
        consum(B, RES);
    }
}
• Threads wait their turn: only one at a time calls consum(), thereby protecting RES from race conditions
• Naming the critical construct RES_lock is optional
• Good Practice: name all critical sections
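A runnable version of the critical-section example, assuming hypothetical `big_job` and `consum` functions; here `consum` takes a pointer so it can actually update the result (the slide passes `RES` by value). `accumulate` is also a hypothetical name.

```c
/* Hypothetical stand-ins for the slide's big_job() and consum(). */
static float big_job(int i)              { return (float)(i * i); }
static void  consum(float b, float *res) { *res += b; }

float accumulate(int niters)
{
    float res = 0.0f;                      /* shared */

    #pragma omp parallel
    {
        float b;                           /* private: declared inside the region */
        #pragma omp for
        for (int i = 0; i < niters; i++) {
            b = big_job(i);                /* runs concurrently */
            #pragma omp critical (RES_lock)
            consum(b, &res);               /* one thread at a time */
        }
    }
    return res;
}
```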
OpenMP* Reduction Clause
reduction (op : list)
• The variables in "list" must be shared in the enclosing parallel region
• Inside parallel or work-sharing construct:
  – A PRIVATE copy of each list variable is created and initialized depending on the "op"
  – These copies are updated locally by threads
  – At end of construct, local copies are combined through "op" into a single value and combined with the value in the original SHARED variable
Reduction Example
#pragma omp parallel for reduction(+:sum)
for (i = 0; i < N; i++) {
    sum += a[i] * b[i];
}
• Local copy of sum for each thread
• All local copies of sum added together and stored in "global" variable
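The dot product from the earlier slide, rewritten as a minimal sketch with a reduction instead of a critical section. Each thread accumulates into its own copy of `sum`, so there is no per-iteration locking.

```c
float dot_prod(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    /* Each thread gets a private sum initialized to 0; the private sums
       are added into the original sum at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```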
Numerical Integration Example
[Figure: plot of f(x) for x from 0.0 to 1.0; vertical axis from 0.0 to 4.0]
$$f(x) = \frac{4.0}{1+x^2}, \qquad \int_0^1 \frac{4.0}{1+x^2}\,dx = \pi$$
#include <stdio.h>

static long num_steps=100000; double step, pi;

int main(void)
{ int i;
  double x, sum = 0.0;

  step = 1.0/(double) num_steps;
  for (i=0; i< num_steps; i++){
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0 + x*x);
  }
  pi = step * sum;
  printf("Pi = %f\n",pi);
  return 0;
}
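The loop is a midpoint-rule approximation of this integral: [0, 1] is split into N = num_steps slices of width Δx = step, and f is evaluated at each slice's midpoint, which is exactly what x = (i+0.5)*step computes.

```latex
\int_0^1 \frac{4.0}{1+x^2}\,dx = 4\arctan(x)\Big|_0^1 = 4\cdot\frac{\pi}{4} = \pi

\pi \approx \Delta x \sum_{i=0}^{N-1} \frac{4.0}{1+x_i^2},
\qquad x_i = (i+0.5)\,\Delta x, \quad \Delta x = \frac{1}{N}
```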
C/C++ Reduction Operations
A range of associative operators can be used with reduction
Initial values are the ones that make sense mathematically
Operator   Initial Value
+          0
*          1
-          0
^          0
&          ~0
|          0
&&         1
||         0
Activity 4 - Computing Pi
Parallelize the numerical integration code using OpenMP
What variables can be shared?
What variables need to be private?
What variables should be set up for reductions?
#include <stdio.h>

static long num_steps=100000; double step, pi;

int main(void)
{ int i;
  double x, sum = 0.0;

  step = 1.0/(double) num_steps;
  for (i=0; i< num_steps; i++){
    x = (i+0.5)*step;
    sum = sum + 4.0/(1.0 + x*x);
  }
  pi = step * sum;
  printf("Pi = %f\n",pi);
  return 0;
}
Single Construct
Denotes block of code to be executed by only one thread
Whichever thread the implementation chooses (typically the first to arrive) executes it
Implicit barrier at end
#pragma omp parallel
{
  DoManyThings();
  #pragma omp single
  {
    ExchangeBoundaries();
  } // threads wait here for single
  DoManyMoreThings();
}
Master Construct
Denotes block of code to be executed only by the master thread
No implicit barrier at end
#pragma omp parallel
{
  DoManyThings();
  #pragma omp master
  { // if not master skip to next stmt
    ExchangeBoundaries();
  }
  DoManyMoreThings();
}
Implicit Barriers
Several OpenMP* constructs have implicit barriers:
parallel: necessary barrier, cannot be removed
for
single
Unnecessary barriers hurt performance and can be removed with the nowait clause
The nowait clause is applicable to:
for construct
single construct
Nowait Clause
#pragma omp single nowait
{ [...] }

#pragma omp for nowait
for(...)
{...}
Use when threads unnecessarily wait between independent computations
#pragma omp parallel   // the for constructs below must be inside a parallel region
{
  #pragma omp for schedule(dynamic,1) nowait
  for(int i=0; i<n; i++)
    a[i] = bigFunc1(i);

  #pragma omp for schedule(dynamic,1)
  for(int j=0; j<m; j++)
    b[j] = bigFunc2(j);
}
Barrier Construct
Explicit barrier synchronization
Each thread waits until all threads arrive
#pragma omp parallel shared (A, B, C)
{
  DoSomeWork(A,B); // Processed A into B
  #pragma omp barrier
  DoSomeWork(B,C); // Processed B into C
}
Atomic Construct
Special case of a critical section
Applies only to simple update of a memory location
#pragma omp parallel for shared(x, y, index, n)
for (i = 0; i < n; i++) {
  #pragma omp atomic
  x[index[i]] += work1(i);
  y[i] += work2(i);
}
Agenda
What is OpenMP?
Parallel regions
Worksharing
Data environment
Synchronization
Optional Advanced topics
Advanced Concepts
Parallel Construct – Implicit Task View
Tasks are created in OpenMP even without an explicit task directive.
Let's look at how tasks are created implicitly for the code snippet below
#pragma omp parallel
{
  int mydata;
  code…
}
Thread encountering parallel construct packages up a set of implicit tasks
Team of threads is created.
Each thread in team is assigned to one of the tasks (and tied to it).
Barrier holds original master thread until all implicit tasks are finished.
[Figure: Threads 1, 2 and 3 each run one implicit task { mydata code }, then meet at a barrier]
Task Construct
#pragma omp task [clause[[,]clause] ...]
    structured-block
where clause can be one of:
if (expression)
untied
shared (list)
private (list)
firstprivate (list)
default( shared | none )
Tied & Untied Tasks
Tied Tasks:
A tied task gets a thread assigned to it at its first execution and the same thread services the task for its lifetime
A thread executing a tied task can be suspended and sent off to execute some other task, but eventually the same thread will return to resume execution of its original tied task
Tasks are tied unless explicitly declared untied
Untied Tasks:
An untied task has no long-term association with any given thread. Any thread not otherwise occupied is free to execute an untied task. The thread assigned to execute an untied task may only change at a "task scheduling point".
An untied task is created by appending “untied” to the task directive
Example: #pragma omp task untied
Task switching
task switching: The act of a thread switching from the execution of one task to another task.
• The purpose of task switching is to distribute threads among the unassigned tasks in the team to avoid piling up long queues of unassigned tasks
• Task switching, for tied tasks, can only occur at task scheduling points located within the following constructs
  • encountered task constructs
  • encountered taskwait constructs
  • encountered barrier directives
  • implicit barrier regions
  • at the end of the tied task region
• Untied tasks have implementation dependent scheduling points
Task switching example
The thread executing the “for loop”, AKA the generating task, generates many tasks in a short time so...
The SINGLE generating task will have to suspend for a while when the “task pool” fills up
Task switching is invoked to start draining the “pool”
• When the “pool” is sufficiently drained, the single task can begin generating more tasks again
#pragma omp single
{
  for (i=0; i<ONEZILLION; i++) {
    #pragma omp task firstprivate(i)   // each task keeps its own value of i
    process(item[i]);
  }
}
Optional foil - OpenMP* API
Get the thread number within a team
int omp_get_thread_num(void);
Get the number of threads in a team
int omp_get_num_threads(void);
Usually not needed for OpenMP codes
Can lead to code not being serially consistent
Does have specific uses (debugging)
Must include a header file
#include <omp.h>
Optional foil - Monte Carlo Pi
[Figure: circle of radius r inscribed in a square of side 2r]
$$\frac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}$$
$$\pi = 4 \cdot \frac{\#\text{ of darts hitting circle}}{\#\text{ of darts in square}}$$
loop 1 to MAX
  x.coor=(random#)
  y.coor=(random#)
  dist=sqrt(x^2 + y^2)
  if (dist <= 1)
    hits=hits+1
pi = 4 * hits/MAX
Optional foil - Making Monte Carlo’s Parallel
hits = 0
call SEED48(1)
DO I = 1, max
  x = DRAND48()
  y = DRAND48()
  IF (SQRT(x*x + y*y) .LT. 1) THEN
    hits = hits+1
  ENDIF
END DO
pi = REAL(hits)/REAL(max) * 4.0
What is the challenge here?
Optional Activity 5: Computing Pi
Use the Intel® Math Kernel Library (Intel® MKL) VSL:
Intel MKL’s VSL (Vector Statistics Libraries)
VSL creates an array, rather than a single random number
VSL can have multiple seeds (one for each thread)
Objective:
Use basic OpenMP* syntax to make Pi parallel
Choose the best code to divide the task up
Categorize all variables properly
Firstprivate Clause
Variables initialized from shared variable
C++ objects are copy-constructed
incr=0;
#pragma omp parallel for firstprivate(incr)
for (I=0;I<=MAX;I++) {
  if ((I%2)==0) incr++;
  A[I]=incr;
}
Lastprivate Clause
Variables update shared variable using value from the sequentially last iteration
C++ objects are updated as if by assignment
#include <math.h>   // sqrt

void sq2(int n, double *lastterm)   // a and b: double arrays declared elsewhere
{
  double x; int i;
  #pragma omp parallel
  #pragma omp for lastprivate(x)
  for (i = 0; i < n; i++){
    x = a[i]*a[i] + b[i]*b[i];
    b[i] = sqrt(x);
  }
  *lastterm = x;   // store through the pointer
}
Threadprivate Directive
Preserves global scope for per-thread storage
Legal for namespace-scope and file-scope variables
Use copyin to initialize from master thread
struct Astruct A;
#pragma omp threadprivate(A)
…
#pragma omp parallel copyin(A)
  do_something_to(&A);
…
#pragma omp parallel
  do_something_else_to(&A);
Private copies of “A” persist between regions
20+ Library Routines
Runtime environment routines:
Modify/check the number of threads
  omp_[set|get]_num_threads()
  omp_get_thread_num()
  omp_get_max_threads()
Are we in a parallel region?
  omp_in_parallel()
How many processors in the system?
  omp_get_num_procs()
Explicit locks
  omp_[set|unset]_lock()
And many more...
Library Routines
To fix the number of threads used in a program
Set the number of threads
Then save the number returned
#include <omp.h>

int main (void)
{
  int num_threads;
  // Request as many threads as you have processors.
  omp_set_num_threads (omp_get_num_procs ());
  #pragma omp parallel
  {
    int id = omp_get_thread_num ();
    // Protect this operation because memory stores are not atomic
    #pragma omp single
    num_threads = omp_get_num_threads ();
    do_lots_of_stuff (id);
  }
  return 0;
}
BACKUP