MPI – An introduction by Jeroen van Hunen
• What is MPI and why should we use it?
• Simple example + some basic MPI functions
• Other frequently used MPI functions
• Compiling and running code with MPI
• Domain decomposition
• Stokes solver
• Tracers/markers
• Performance
• Documentation
What is MPI?
• Mainly a data communication tool: “Message-Passing Interface”
• Allows parallel calculation on distributed memory machines
• Usually the Single-Program-Multiple-Data principle is used: all processors have similar tasks (e.g. in domain decomposition)
• Alternative: OpenMP for shared memory machines
Why should we use MPI?
• If sequential calculations take too long
• If sequential calculations use too much memory
Simple MPI example

Output for 4 processors:

Code (the annotations on the slide):
• mpi.h contains definitions, macros, function prototypes
• initialize MPI, ask the processor ‘rank’, ask the number of processors p
• stop MPI
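A minimal sketch in C of a program with this structure (a hedged reconstruction using the standard MPI calls, not necessarily the exact code shown in the talk):

#include <stdio.h>
#include <mpi.h>   /* contains definitions, macros, function prototypes */

int main(int argc, char *argv[])
{
    int rank, p;

    MPI_Init(&argc, &argv);                 /* initialize MPI        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* ask processor 'rank'  */
    MPI_Comm_size(MPI_COMM_WORLD, &p);      /* ask # processors p    */

    printf("Hello from processor %d of %d\n", rank, p);

    MPI_Finalize();                         /* stop MPI              */
    return 0;
}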
Other frequently used MPI calls
Sending and receiving at the same time with MPI_SENDRECV: no risk of deadlocks.
… or overwrite the send buffer with the received info (MPI_SENDRECV_REPLACE).
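A hedged sketch in C of both variants (the neighbour ranks ‘left’ and ‘right’ are illustrative, not from the talk):

#include <mpi.h>

/* Hypothetical example: exchange one double with neighbouring ranks;
   MPI_Init is assumed to have been called already. */
void exchange(double *sendval, double *recvval, int left, int right)
{
    MPI_Status status;

    /* send to the right neighbour and receive from the left one
       in a single call: no risk of deadlock */
    MPI_Sendrecv(sendval, 1, MPI_DOUBLE, right, 0,
                 recvval, 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, &status);

    /* ... or overwrite the send buffer with the received value */
    MPI_Sendrecv_replace(sendval, 1, MPI_DOUBLE, right, 0,
                         left, 0, MPI_COMM_WORLD, &status);
}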
Other frequently used MPI calls
Synchronizing the processors: wait for each other at the barrier (MPI_BARRIER).
Broadcasting a message from one processor to all the others: both sending and receiving processors use same call to MPI_BCAST
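A hedged sketch in C of both calls (the routine and variable names are illustrative, not from the talk):

#include <mpi.h>

/* Hypothetical example: processor 0 broadcasts one parameter to all
   others; all processors wait at a barrier first. */
void distribute(double *param)
{
    /* wait until every processor has reached this point */
    MPI_Barrier(MPI_COMM_WORLD);

    /* the same call is used by the sending processor (root = 0)
       and by all receiving processors */
    MPI_Bcast(param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}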
Other frequently used MPI calls
“Reducing” (combining) data from all processors with MPI_REDUCE: add, find the maximum/minimum, etc.
OP can be one of the following: MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_MAXLOC, MPI_MINLOC, MPI_LAND, MPI_LOR, MPI_BAND, MPI_BOR
For results to be available at all processors, use MPI_Allreduce:
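A hedged sketch in C of both calls (the variable names are illustrative, not from the talk):

#include <mpi.h>

/* Hypothetical example: combine a local value from all processors. */
void global_sum_and_max(double local, double *sum, double *max)
{
    /* result available on processor 0 only */
    MPI_Reduce(&local, sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* result available on all processors */
    MPI_Allreduce(&local, max, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
}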
Additional comments:
• ‘Wildcards’ are allowed in MPI calls for:
  • source: MPI_ANY_SOURCE
  • tag: MPI_ANY_TAG
• MPI_SEND and MPI_RECV are ‘blocking’: they wait until the job is done
Deadlocks:
• Deadlock: both processors receive first and then send; neither send is ever reached
• Depending on buffering: both processors send first; this works only if MPI buffers the messages internally
• Safe: one processor sends first while the other receives first
• Don’t let a processor send a message to itself; in this case use MPI_SENDRECV
Non-matching send/receive calls may block the code.
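A hedged sketch in C of the three cases, for two processors exchanging one value (illustrative, not from the talk):

#include <mpi.h>

/* Hypothetical example: two processors (ranks 0 and 1) exchange one double. */
void exchange_two_ranks(int rank, double *sendval, double *recvval)
{
    MPI_Status status;
    int other = 1 - rank;   /* the other of the two processors */

    /* Deadlock: both processors block in MPI_Recv, nobody ever sends.
    MPI_Recv(recvval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
    MPI_Send(sendval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    */

    /* Depending on buffering: both send first; works only if MPI
       buffers the outgoing messages internally.
    MPI_Send(sendval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
    */

    /* Safe: one processor sends first, the other receives first. */
    if (rank == 0) {
        MPI_Send(sendval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
        MPI_Recv(recvval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
    } else {
        MPI_Recv(recvval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
        MPI_Send(sendval, 1, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    }
}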
Compiling and running code with MPI
Compiling:
Fortran:
  mpif77 -o binary code.f
  mpif90 -o binary code.f
C:
  mpicc -o binary code.c

Running in general (no queueing system):
  mpirun -np 4 binary
  mpirun -np 4 -nolocal -machinefile mach binary

Running on Gonzales (with queueing system):
  bsub -n 4 -W 8:00 prun binary
Domain decomposition
(figure: 3-D computational domain, with axes x, y, z, divided into blocks)
• Total computational domain divided into ‘equal size’ blocks
• Each processor only deals with its own block
• At block boundaries some information exchange necessary
• Block division matters (see the sketch below):
  • surface/volume ratio
  • number of processor boundaries
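One common way to set up such a block division is through MPI’s Cartesian topology routines; a hedged sketch in C (not from the talk):

#include <mpi.h>

/* Hypothetical example: divide p processors over a 3-D grid of blocks
   and find the neighbouring blocks of each processor. */
void make_decomposition(int p)
{
    MPI_Comm cart;
    int dims[3]    = {0, 0, 0};    /* let MPI choose the block layout */
    int periods[3] = {0, 0, 0};    /* non-periodic domain             */
    int west, east, south, north, bottom, top;

    MPI_Dims_create(p, 3, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);

    /* ranks of the neighbouring blocks (MPI_PROC_NULL at domain edges) */
    MPI_Cart_shift(cart, 0, 1, &west,   &east);
    MPI_Cart_shift(cart, 1, 1, &south,  &north);
    MPI_Cart_shift(cart, 2, 1, &bottom, &top);
}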
Stokes equation: Jacobi iterative solver

• In the block interior: no MPI needed. The five-point stencil update of a point M from its neighbours N, S, E, W is
    M = 0.25*(N + S + E + W)
• At a block boundary: MPI needed. The stencil is split over the two neighbouring blocks:
    M1 = 0.25*(N1 + S1 + W) on the block that owns N1, S1 and W
    M2 = 0.25*(E) on the neighbouring block
    M = M1 + M2 (combined using MPI_SENDRECV), after which both copies are set to M1 = M2 = M
• A Gauss-Seidel solver performs better, but is also slightly more difficult to implement.
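A hedged sketch in C of one Jacobi sweep with the boundary exchange done through ghost columns (a simplification of the split-sum scheme drawn on the slide; the array sizes and neighbour ranks are illustrative, not from the talk):

#include <mpi.h>

#define NX 66   /* local block size, including one ghost row per side */
#define NY 66

/* Hypothetical example: one Jacobi sweep on a local block, exchanging
   ghost rows with the left/right neighbouring blocks first. */
void jacobi_sweep(double u[NX][NY], double unew[NX][NY],
                  int left, int right)
{
    MPI_Status status;
    int i, j;

    /* send first interior row to the left, receive right ghost row */
    MPI_Sendrecv(&u[1][0],      NY, MPI_DOUBLE, left,  0,
                 &u[NX - 1][0], NY, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, &status);
    /* send last interior row to the right, receive left ghost row */
    MPI_Sendrecv(&u[NX - 2][0], NY, MPI_DOUBLE, right, 1,
                 &u[0][0],      NY, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, &status);

    /* interior update: no MPI needed */
    for (i = 1; i < NX - 1; i++)
        for (j = 1; j < NY - 1; j++)
            unew[i][j] = 0.25 * (u[i][j - 1] + u[i][j + 1] +
                                 u[i - 1][j] + u[i + 1][j]);
}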
Tracers/Markers
(figure: a tracer trajectory crossing the boundary between proc n and proc n+1, with the two Runge-Kutta increments k1 and k2)
2nd order Runge-Kutta scheme:
  k1 = dt v(t, x(t))
  k2 = dt v(t + dt/2, x(t) + k1/2)
  x(t + dt) = x(t) + k2
Procedure:
• Calculate x(t+dt/2). If it lies in proc n+1:
  • proc n sends the tracer coordinates to proc n+1
  • proc n+1 reports the tracer velocity back to proc n
• Calculate x(t+dt). If it lies in proc n+1:
  • proc n sends the tracer coordinates + function values permanently to proc n+1
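A hedged sketch in C of the first hand-off, evaluating the midpoint velocity on the neighbouring block (the tags, variable names and the interpolation routine are illustrative, not from the talk):

#include <mpi.h>

/* Assumed local routine for interpolating the velocity at a point;
   not from the talk, defined elsewhere. */
void interpolate_velocity(const double x[3], double v[3]);

/* Hypothetical example: proc n sends a tracer position to proc n+1,
   which evaluates the velocity there and reports it back. Both
   processors are assumed to already know that this exchange is needed
   (e.g. after first exchanging tracer counts). */
void midpoint_velocity(double x[3], double v[3], int n, int rank)
{
    MPI_Status status;

    if (rank == n) {
        MPI_Send(x, 3, MPI_DOUBLE, n + 1, 10, MPI_COMM_WORLD);
        MPI_Recv(v, 3, MPI_DOUBLE, n + 1, 11, MPI_COMM_WORLD, &status);
    } else if (rank == n + 1) {
        MPI_Recv(x, 3, MPI_DOUBLE, n, 10, MPI_COMM_WORLD, &status);
        interpolate_velocity(x, v);
        MPI_Send(v, 3, MPI_DOUBLE, n, 11, MPI_COMM_WORLD);
    }
}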
Performance
For jobs that are too small, communication quickly becomes the bottleneck.
This problem:
• Rayleigh-Bénard convection (Ra = 10^6)
• 2-D: 64x64 finite elements, 10^4 time steps
• 3-D: 64x64x64 finite elements, 100 time steps
• Calculation on Gonzales
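Communication and computation times can be measured separately with MPI_Wtime; a minimal hedged sketch in C (not from the talk):

#include <stdio.h>
#include <mpi.h>

/* Hypothetical example: time a code section to see how much of it
   is spent in communication. */
void report_time(int rank)
{
    double t0, t1;

    t0 = MPI_Wtime();
    /* ... the computation or communication to be timed ... */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("elapsed time: %f s\n", t1 - t0);
}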