
Presentation on Shared Memory Parallel Programming

Description:
This presentation deals with how one can utilize multiple cores in C/C++ applications using an API called OpenMP. OpenMP is a shared memory programming model built on top of POSIX threads. The fork-join model and parallel design patterns are also discussed using Petri nets.
Transcript
Page 1: Presentation on Shared Memory Parallel Programming

Introduction to OpenMP

Presenter: Vengada Karthik Rangaraju

Fall 2012 Term

September 13th, 2012

Page 2: Presentation on Shared Memory Parallel Programming

What is OpenMP?

• Open standard for shared memory multiprocessing
• Goal: exploit multicore hardware with shared memory
• Programmer’s view: the OpenMP API
• Structure: three primary API components:
  – Compiler directives
  – Runtime library routines
  – Environment variables

Page 3: Presentation on Shared Memory Parallel Programming

Shared Memory Architecture in a Multi-Core Environment

Page 4: Presentation on Shared Memory Parallel Programming

The key components of the API and their functions

• Compiler directives
  – Spawning parallel regions (threads)
  – Synchronizing
  – Dividing blocks of code among threads
  – Distributing loop iterations

Page 5: Presentation on Shared Memory Parallel Programming

The key components of the API and their functions

• Runtime library routines (see the sketch below)
  – Setting and querying the number of threads
  – Nested parallelism
  – Control over locks
  – Thread information
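A minimal sketch of these routines in use (the 4-thread count and the printed message are arbitrary choices for illustration); all omp_* calls below are standard routines declared in omp.h:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;                      /* control over locks */
    omp_init_lock(&lock);

    omp_set_num_threads(4);               /* setting the number of threads */

    #pragma omp parallel
    {
        /* thread information */
        int tid  = omp_get_thread_num();
        int nthr = omp_get_num_threads(); /* querying the number of threads */

        omp_set_lock(&lock);              /* explicit lock around shared output */
        printf("Thread %d of %d (nested parallelism enabled: %d)\n",
               tid, nthr, omp_get_nested());
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    return 0;
}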

Page 6: Presentation on Shared Memory Parallel Programming

The key components of the API and their functions

• Environment variables (see the probe sketch below)
  – Setting the number of threads
  – Specifying how loop iterations are divided
  – Thread-to-processor binding
  – Enabling/disabling dynamic threads
  – Nested parallelism
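The variables listed above are standard OpenMP environment variables; a small probe program such as this hypothetical sketch can be used to observe their effect:

/* Typical settings in the shell before running an OpenMP program:
 *   OMP_NUM_THREADS=8         number of threads
 *   OMP_SCHEDULE="dynamic,4"  how loop iterations are divided (with schedule(runtime))
 *   OMP_PROC_BIND=true        thread-to-processor binding
 *   OMP_DYNAMIC=false         enable/disable dynamic adjustment of the thread count
 *   OMP_NESTED=true           enable nested parallelism
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("max threads: %d, dynamic: %d, nested: %d\n",
           omp_get_max_threads(), omp_get_dynamic(), omp_get_nested());
    return 0;
}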

Page 7: Presentation on Shared Memory Parallel Programming

Goals

• Standardization
• Ease of use
• Portability

Page 8: Presentation on Shared Memory Parallel Programming

Paradigm for using OpenMP

1. Write a sequential program.
2. Find the parallelizable portions of the program.
3. Insert directives/pragmas into the existing code; insert calls to runtime library routines and modify environment variables, if desired.
4. Compile with an OpenMP-capable compiler and run!

What happens here, at the compile step?

Page 9: Presentation on Shared Memory Parallel Programming

Compiler translation

#pragma omp <directive-type> <directive-clauses>
{
    ...   // block of code executed as per the directive's instruction
}

Page 10: Presentation on Shared Memory Parallel Programming

Basic Example in C

{
    ...   // sequential
}
#pragma omp parallel   // fork
{
    printf("Hello from thread %d.\n", omp_get_thread_num());
}                      // join
{
    ...   // sequential
}
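Completed into a compilable program, with the sequential parts replaced here by simple printf calls (an illustrative assumption, not part of the slide):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Sequential part, master thread only.\n");

    #pragma omp parallel            /* fork: a team of threads is created */
    {
        printf("Hello from thread %d.\n", omp_get_thread_num());
    }                               /* join: implicit barrier, team ends */

    printf("Sequential part again.\n");
    return 0;
}
/* Compile with GCC:  gcc -fopenmp hello.c -o hello
   Run:               OMP_NUM_THREADS=4 ./hello        */

With four threads the greeting is printed four times, in no particular order.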

Page 11: Presentation on Shared Memory Parallel Programming

What exactly happens when lines of code are executed in parallel?

• A team of threads is created
• Each thread can have its own set of private variables
• All threads can share variables
• The original thread is the master thread
• Fork-join model
• Nested parallelism

Page 12: Presentation on Shared Memory Parallel Programming

OpenMP life cycle – Petri net model

Page 13: Presentation on Shared Memory Parallel Programming

Compiler Directives – The Multicore Magic Spells!

<directive type>   Description
parallel           Each thread performs the same computation as the others
                   (replicated computation).
for / sections     These are called worksharing directives: portions of the
                   overall work are divided among threads (different
                   computations). They do not create threads; they must be
                   enclosed in a parallel directive for threads to take over
                   the divided work.

Page 14: Presentation on Shared Memory Parallel Programming

Compiler Directives – The Multicore Magic Spells!

• Types of worksharing directives (a combined sketch follows the list)

  for        Countable iterations [static]
  sections   One or more independent blocks of code, each executed by a single thread
  single     Serializes a section of code (executed by exactly one thread)
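A minimal sketch combining the three worksharing directives inside one parallel region; the array a and the printed messages are hypothetical:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    int a[N];

    #pragma omp parallel
    {
        #pragma omp for                 /* countable iterations divided among threads */
        for (int i = 0; i < N; i++)
            a[i] = i * i;

        #pragma omp sections            /* independent blocks, one thread each */
        {
            #pragma omp section
            printf("section 1 on thread %d\n", omp_get_thread_num());
            #pragma omp section
            printf("section 2 on thread %d\n", omp_get_thread_num());
        }

        #pragma omp single              /* serialized: executed by exactly one thread */
        printf("single block, a[N-1] = %d\n", a[N - 1]);
    }
    return 0;
}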

Page 15: Presentation on Shared Memory Parallel Programming

Compiler Directives – The Multicore Magic Spells!

• Clauses associated with each directive (example below)

<directive type>   <directive clauses>
parallel           if(expression)
                   private(var1, var2, ...)
                   firstprivate(var1, var2, ...)
                   shared(var1, var2, ...)
                   num_threads(integer value)
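A sketch exercising these clauses on the parallel directive (compilers expect the lowercase spellings if and num_threads); the variables n, x, y are hypothetical:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 1000;      /* problem size, decides whether the region forks at all */
    int x = 1;         /* private: each thread gets its own uninitialized copy  */
    int y = 2;         /* firstprivate: each thread's copy is initialized to 2  */

    #pragma omp parallel if(n > 100) num_threads(4) private(x) firstprivate(y) shared(n)
    {
        x = omp_get_thread_num();   /* private: each thread writes its own copy */
        printf("thread %d: y=%d (copied in), n=%d (shared)\n", x, y, n);
    }
    return 0;
}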

Page 16: Presentation on Shared Memory Parallel Programming

Compiler Directives – The Multicore Magic Spells!

• Clauses associated with each directive (example below)

<directive type>   <directive clauses>
for                schedule(type, chunk)
                   private(var1, var2, ...)
                   firstprivate(var1, var2, ...)
                   lastprivate(var1, var2, ...)
                   collapse(n)
                   nowait
                   reduction(operator : list)
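A sketch of the loop clauses not shown elsewhere in this presentation (schedule, collapse, lastprivate, nowait, reduction); the array and the bounds are hypothetical:

#include <stdio.h>
#include <omp.h>

#define N 4
#define M 4

int main(void)
{
    int a[N][M], last = -1, sum = 0;

    #pragma omp parallel
    {
        /* collapse(2) flattens the two loops into one iteration space;
           lastprivate(last) copies the value from the sequentially last iteration out */
        #pragma omp for schedule(static, 2) collapse(2) lastprivate(last)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++) {
                a[i][j] = i + j;
                last = a[i][j];
            }

        /* reduction(+:sum): each thread accumulates privately, results are combined;
           nowait is safe here because the parallel region's own barrier follows */
        #pragma omp for reduction(+:sum) nowait
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                sum += a[i][j];
    }
    printf("last=%d sum=%d\n", last, sum);
    return 0;
}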

Page 17: Presentation on Shared Memory Parallel Programming

Compiler Directives – The Multicore Magic Spells!

• Clauses associated with each directive

<directive type>   <directive clauses>
sections           private(var1, var2, ...)
                   firstprivate(var1, var2, ...)
                   lastprivate(var1, var2, ...)
                   reduction(operator : list)
                   nowait

Page 18: Presentation on Shared Memory Parallel Programming

Matrix Multiplication using loop directive

#pragma omp parallel private(i, j, k)
{
    #pragma omp for
    for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
            for (j = 0; j < M; j++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
}

Page 19: Presentation on Shared Memory Parallel Programming

Scheduling Parallel Loops

• Static
• Dynamic
• Guided
• Automatic
• Runtime

Page 20: Presentation on Shared Memory Parallel Programming

Scheduling Parallel Loops

• Static
  – The amount of work per iteration is the same
  – Iterations are divided into contiguous chunks, assigned to threads in round-robin fashion
  – 1 chunk = x iterations (the chunk size)

Page 21: Presentation on Shared Memory Parallel Programming

Scheduling Parallel Loops

• Dynamic
  – The amount of work per iteration varies
  – Each thread grabs a chunk of iterations and returns to grab another chunk when it has finished

• Guided
  – Same as dynamic, except that the chunk size starts as a large proportion of the remaining iterations and shrinks as the remaining work decreases (contrasted in the sketch below)
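A sketch contrasting the two schedules on a loop whose iterations do unequal amounts of work; workload() is a hypothetical stand-in for such work:

#include <stdio.h>
#include <omp.h>

/* hypothetical uneven workload: iteration i costs roughly i units of work */
static long workload(int i)
{
    long s = 0;
    for (int k = 0; k < i * 1000; k++)
        s += k;
    return s;
}

int main(void)
{
    long total = 0;

    /* dynamic,4: threads grab chunks of 4 iterations as they finish */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < 100; i++)
        total += workload(i);

    /* guided: chunk size starts large and shrinks with the remaining iterations */
    #pragma omp parallel for schedule(guided) reduction(+:total)
    for (int i = 0; i < 100; i++)
        total += workload(i);

    printf("total = %ld\n", total);
    return 0;
}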

Page 22: Presentation on Shared Memory Parallel Programming

Scheduling Parallel Loops

• Runtime
  – The schedule is determined at run time from the OMP_SCHEDULE environment variable; the omp_set_schedule() library routine is also provided (sketch below)

• Automatic (auto)
  – The implementation chooses the schedule
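With schedule(runtime) the schedule is picked up when the loop runs, either from OMP_SCHEDULE or as set by omp_set_schedule(); a minimal sketch, where the harmonic-sum loop is just an example workload:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* equivalent to setting OMP_SCHEDULE="dynamic,8" before running */
    omp_set_schedule(omp_sched_dynamic, 8);

    double sum = 0.0;
    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (int i = 1; i <= 1000000; i++)
        sum += 1.0 / i;

    printf("harmonic(1e6) = %f\n", sum);
    return 0;
}
/* schedule(auto) would instead let the compiler/runtime choose the schedule. */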

Page 23: Presentation on Shared Memory Parallel Programming

Matrix Multiplication using loop directive – with a schedule

#pragma omp parallel private(i, j, k)
{
    #pragma omp for schedule(static)
    for (i = 0; i < N; i++)
        for (k = 0; k < K; k++)
            for (j = 0; j < M; j++)
                C[i][j] = C[i][j] + A[i][k] * B[k][j];
}
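The same loop completed into a compilable program; the small fixed dimensions and the fill values are assumptions made here for illustration:

#include <stdio.h>
#include <omp.h>

#define N 4   /* rows of A and C      */
#define K 3   /* cols of A, rows of B */
#define M 5   /* cols of B and C      */

int main(void)
{
    double A[N][K], B[K][M], C[N][M] = {{0}};
    int i, j, k;

    for (i = 0; i < N; i++)            /* fill A and B with simple values */
        for (k = 0; k < K; k++)
            A[i][k] = i + k;
    for (k = 0; k < K; k++)
        for (j = 0; j < M; j++)
            B[k][j] = k - j;

    #pragma omp parallel private(i, j, k)
    {
        #pragma omp for schedule(static)   /* rows of C divided among threads */
        for (i = 0; i < N; i++)
            for (k = 0; k < K; k++)
                for (j = 0; j < M; j++)
                    C[i][j] = C[i][j] + A[i][k] * B[k][j];
    }

    printf("C[0][0]=%g  C[%d][%d]=%g\n", C[0][0], N - 1, M - 1, C[N - 1][M - 1]);
    return 0;
}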

Page 24: Presentation on Shared Memory Parallel Programming

OpenMP worksharing directive – sections

int g;

void foo(int m, int n)
{
    int p, i;

    #pragma omp sections firstprivate(g) nowait
    {
        #pragma omp section
        {
            p = f1(g);
            for (i = 0; i < m; i++)
                do_stuff;
        }
        #pragma omp section
        {
            p = f2(g);
            for (i = 0; i < n; i++)
                do_other_stuff;
        }
    }
    return;
}
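To actually divide the two sections between threads, the call must happen inside a parallel region; a runnable sketch with hypothetical stand-ins for f1, f2, do_stuff and do_other_stuff:

#include <stdio.h>
#include <omp.h>

int g = 10;                                   /* shared global from the slide     */
static int f1(int x) { return x + 1; }        /* hypothetical stand-ins for f1/f2 */
static int f2(int x) { return x * 2; }

void foo(int m, int n)
{
    int p = 0, i, busy = 0;
    #pragma omp sections firstprivate(g) nowait
    {
        #pragma omp section
        {
            p = f1(g);
            for (i = 0; i < m; i++) busy += p;   /* stands in for do_stuff       */
        }
        #pragma omp section
        {
            p = f2(g);
            for (i = 0; i < n; i++) busy += p;   /* stands in for do_other_stuff */
        }
    }
    /* every thread reports; a thread that ran no section prints zeros */
    printf("thread %d: p=%d busy=%d\n", omp_get_thread_num(), p, busy);
}

int main(void)
{
    #pragma omp parallel      /* the sections bind to this enclosing parallel region */
    foo(1000, 2000);
    return 0;
}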

Page 25: Presentation on Shared Memory Parallel Programming

Parallelizing when the number of iterations is unknown (dynamic)!

• OpenMP has a directive called task

Page 26: Presentation on Shared Memory Parallel Programming

Explicit Tasks

void processList(Node* list)
{
    #pragma omp parallel
    #pragma omp single
    {
        Node *currentNode = list;
        while (currentNode) {
            #pragma omp task firstprivate(currentNode)
            doWork(currentNode);
            currentNode = currentNode->next;
        }
    }
}
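Completed into a runnable program; the Node structure, the five-element list, and the printing doWork() are assumptions made here for illustration:

#include <stdio.h>
#include <omp.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

static void doWork(Node *n)               /* hypothetical per-node work */
{
    printf("node %d processed by thread %d\n", n->value, omp_get_thread_num());
}

static void processList(Node *list)
{
    #pragma omp parallel                  /* create the team                 */
    #pragma omp single                    /* one thread walks the list       */
    {
        Node *currentNode = list;
        while (currentNode) {
            /* each node becomes a task; firstprivate captures the pointer value */
            #pragma omp task firstprivate(currentNode)
            doWork(currentNode);
            currentNode = currentNode->next;
        }
    }                                     /* implicit barrier: all tasks finish here */
}

int main(void)
{
    Node nodes[5];
    for (int i = 0; i < 5; i++) {         /* build a small list 0 -> 1 -> ... -> 4 */
        nodes[i].value = i;
        nodes[i].next  = (i < 4) ? &nodes[i + 1] : NULL;
    }
    processList(nodes);
    return 0;
}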

Page 27: Presentation on Shared Memory Parallel Programming

Explicit Tasks – Petri net Model

Page 28: Presentation on Shared Memory Parallel Programming

Synchronization

• Barrier
• Critical
• Atomic
• Flush
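A minimal sketch showing all four constructs in one parallel region; the counters are hypothetical:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int counter = 0;        /* updated with atomic   */
    int hits    = 0;        /* updated with critical */

    #pragma omp parallel
    {
        #pragma omp atomic            /* lightweight: a single memory update */
        counter++;

        #pragma omp barrier           /* every thread waits here             */

        #pragma omp critical          /* general mutual-exclusion block      */
        {
            hits += counter;
            printf("thread %d sees counter=%d\n", omp_get_thread_num(), counter);
        }

        #pragma omp flush(counter)    /* make this thread's view of counter consistent with memory */
    }
    printf("counter=%d hits=%d\n", counter, hits);
    return 0;
}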

Page 29: Presentation on Shared Memory Parallel Programming

Performing Reductions

• A loop containing a reduction would normally have to run sequentially, since each iteration forms a result that depends on the previous iteration.

• OpenMP allows such loops to be parallelized, as long as the developer states that the loop contains a reduction and indicates the variable and the kind of reduction via clauses.

Page 30: Presentation on Shared Memory Parallel Programming

Without using reduction

#pragma omp parallel shared(array, sum) firstprivate(local_sum)
{
    #pragma omp for private(i, j)
    for (i = 0; i < max_i; i++) {
        for (j = 0; j < max_j; ++j)
            local_sum += array[i][j];
    }
    #pragma omp critical
    sum += local_sum;
}

Page 31: Presentation on Shared Memory Parallel Programming

Using reductions in OpenMP

sum = 0;
#pragma omp parallel shared(array)
{
    #pragma omp for reduction(+:sum) private(i, j)
    for (i = 0; i < max_i; i++) {
        for (j = 0; j < max_j; ++j)
            sum += array[i][j];
    }
}
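Completed into a compilable program; the array size and the fill value of 1 are assumptions made here so the result can be checked:

#include <stdio.h>
#include <omp.h>

#define MAX_I 100
#define MAX_J 100

int main(void)
{
    static int array[MAX_I][MAX_J];
    int i, j, sum = 0;

    for (i = 0; i < MAX_I; i++)           /* fill with 1s: expected sum is 10000 */
        for (j = 0; j < MAX_J; j++)
            array[i][j] = 1;

    #pragma omp parallel shared(array)
    {
        /* each thread keeps a private sum; the partial sums are combined
           at the end of the loop by the reduction clause */
        #pragma omp for reduction(+:sum) private(i, j)
        for (i = 0; i < MAX_I; i++)
            for (j = 0; j < MAX_J; j++)
                sum += array[i][j];
    }
    printf("sum = %d (expected %d)\n", sum, MAX_I * MAX_J);
    return 0;
}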

Page 32: Presentation on Shared Memory Parallel Programming

Programming for performance

• Use of the if clause before creating parallel regions (see the sketch after this list)
• Understanding cache coherence
• Judicious use of parallel and flush
• critical and atomic – know the difference!
• Avoid unnecessary computations in the critical region
• Use of barrier – a starvation alert!
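A sketch of the first point: the if clause keeps small problem sizes serial so that thread start-up cost is not paid for nothing; THRESHOLD and sum_squares() are hypothetical:

#include <stdio.h>
#include <omp.h>

#define THRESHOLD 10000   /* hypothetical cut-off below which threading is not worth it */

static double sum_squares(const double *x, int n)
{
    double s = 0.0;
    /* the region is only forked when n is large enough to amortize thread start-up */
    #pragma omp parallel for if(n > THRESHOLD) reduction(+:s)
    for (int i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

int main(void)
{
    double x[100];
    for (int i = 0; i < 100; i++)
        x[i] = 1.0;
    printf("small input (runs serially): %g\n", sum_squares(x, 100));
    return 0;
}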

Page 33: Presentation on Shared Memory Parallel Programming

References

• NUMA / UMA
  http://vvirtual.wordpress.com/2011/06/13/what-is-numa/
  http://www.e-zest.net/blog/non-uniform-memory-architecture-numa/

• OpenMP basics
  https://computing.llnl.gov/tutorials/openMP/

• Workshop on OpenMP SMP, by Tim Mattson from Intel (video)
  http://www.youtube.com/watch?v=TzERa9GA6vY

Page 34: Presentation on Shared Memory Parallel Programming

Interesting links

• OpenMP official page

http://openmp.org/wp/

• 32 OpenMP Traps for C++ Developers

http://www.viva64.com/en/a/0054/#ID0EMULM

