MPI: Message Passing Interface


Prabhaker Mateti, Wright State University


Overview

MPI Hello World!
Introduction to programming with MPI
MPI library calls


MPI Overview

Similar to PVM
Network of Heterogeneous Machines
Multiple implementations

– Open source: MPICH, LAM
– Vendor specific


MPI Features

Rigorously specified standard
Portable source code
Enables third-party libraries
Derived data types to minimize overhead
Process topologies for efficiency on MPP
Can fully overlap communication with computation
Extensive group communication


MPI-2

Dynamic process management
One-sided communication
Extended collective operations
External interfaces
Parallel I/O
Language bindings (C++ and Fortran-90)

http://www.mpi-forum.org/


MPI Overview

125+ functions; typical applications need only about 6 of them.
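The Overview slide promised an MPI "Hello World!"; here is a minimal sketch using only a handful of basic calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize). The program itself is not from the slides.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my rank within the group */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */
    printf("Hello World! from process %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut MPI down; called last */
    return 0;
}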


MPI: manager+workers

#include <mpi.h>

int main(int argc, char *argv[])
{
    int myrank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) manager(); else worker();
    MPI_Finalize();
    return 0;
}

MPI_Init initializes the MPI system.
MPI_Finalize is called last by all processes.
MPI_Comm_rank identifies a process by its rank.
MPI_COMM_WORLD is the group to which this process belongs.


MPI: manager()

void manager(void)
{
    int ntasks, i, work;
    double sub, pi;
    MPI_Status status;
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
    for (i = 1; i < ntasks; ++i) {
        work = nextWork();
        MPI_Send(&work, 1, MPI_INT, i, WORKTAG, MPI_COMM_WORLD);
    }
    …
    MPI_Reduce(&sub, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
}

MPI_Comm_size returns the number of processes in the group.
MPI_Send performs a blocking send.


MPI: worker()

void worker(void)
{
    int work;
    double result;
    MPI_Status status;
    for (;;) {
        MPI_Recv(&work, 1, MPI_INT, 0, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        result = doWork();
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
}

MPI_Recv performs a blocking receive.


MPI computes π

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int n, np, myid;
    double sub, pi;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    n = ...; /* intervals */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    sub = series_sum(n, np);
    MPI_Reduce(&sub, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("pi is %.16f\n", pi);
    MPI_Finalize();
    return 0;
}
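series_sum is left undefined on the slide. One plausible sketch, under the assumption that π is approximated by the midpoint rule applied to the integral of 4/(1+x²) over [0,1], with the n intervals dealt out cyclically to the np processes; only the names n, np, and series_sum come from the slide.

/* assumed: midpoint-rule approximation of pi; cyclic split of intervals */
double series_sum(int n, int np)
{
    int myid, i;
    double h = 1.0 / n, x, sum = 0.0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    for (i = myid; i < n; i += np) {       /* every np-th interval is mine */
        x = h * (i + 0.5);                 /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;                        /* partial sum; MPI_Reduce adds them up */
}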


Process groups

Group membership is static. There are no race conditions caused by processes independently entering and leaving a group.

New group formation is collective, and group membership information is distributed, not centralized.
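The slides do not show a group-forming call, but MPI_Comm_split is one standard way to form new groups collectively. The even/odd split below is purely illustrative and is assumed to run after MPI_Init.

int rank, subrank;
MPI_Comm subcomm;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Every process calls this together; those passing the same "color"
   (here, illustratively, even vs. odd rank) join the same new communicator. */
MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);
MPI_Comm_rank(subcomm, &subrank);   /* rank within the new group */
MPI_Comm_free(&subcomm);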


MPI_Send: blocking send

MPI_Send(&sendbuffer,  /* message buffer */
         n,            /* n items of */
         MPI_type,     /* data type in message */
         destination,  /* process rank */
         WORKTAG,      /* user chosen tag */
         MPI_COMM);    /* group */


MPI_Recv: blocking receive

MPI_Recv(&recvbuffer,     /* message buffer */
         n,               /* n data items */
         MPI_type,        /* of type */
         MPI_ANY_SOURCE,  /* from any sender */
         MPI_ANY_TAG,     /* any type of message */
         MPI_COMM,        /* group */
         &status);


Send-receive succeeds …

Sender’s destination is a valid process rank
Receiver specified a valid source process
Communicator is the same for both
Tags match
Message data types match
Receiver’s buffer is large enough
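A minimal matched pair that satisfies all of these conditions might look like the following fragment; the tag 42 and the single int payload are arbitrary choices, and the fragment is assumed to run inside an initialized MPI program with at least two processes.

int rank, value;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
    value = 17;
    /* same communicator, matching (arbitrary) tag 42 and type MPI_INT */
    MPI_Send(&value, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* receive buffer holds one int, so it is large enough */
    MPI_Recv(&value, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
}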


Message Order

If P sends m1 first and then m2 to Q, then Q will receive m1 before m2.

If P sends m1 to Q and then m2 to R, nothing can be concluded, in terms of a global wall clock, about whether R receives m2 before or after Q receives m1.


Blocking and Non-blocking

Send and receive can be blocking or non-blocking.
A blocking send can be coupled with a non-blocking receive, and vice versa.
A non-blocking send can use one of four modes:

– Standard mode: MPI_Isend
– Synchronous mode: MPI_Issend
– Buffered mode: MPI_Ibsend
– Ready mode: MPI_Irsend


MPI_Isend: non-blocking send

MPI_Isend(&buffer,      /* message buffer */
          n,            /* n items of */
          MPI_type,     /* data type in message */
          destination,  /* process rank */
          WORKTAG,      /* user chosen tag */
          MPI_COMM,     /* group */
          &handle);


MPI_Irecv: non-blocking receive

MPI_Irecv(&result,         /* message buffer */
          n,               /* n data items */
          MPI_type,        /* of type */
          MPI_ANY_SOURCE,  /* from any sender */
          MPI_ANY_TAG,     /* any type of message */
          MPI_COMM_WORLD,  /* group */
          &handle);


MPI_Wait

MPI_Wait(&handle,    /* request returned by MPI_Isend / MPI_Irecv */
         &status);


MPI_Wait, MPI_Test

MPI_Wait(&handle,    /* blocks until the operation completes */
         &status);

MPI_Test(&handle,    /* returns immediately */
         &flag,      /* flag is nonzero if the operation has completed */
         &status);
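A hedged sketch of the usual pattern: post the non-blocking calls, overlap useful computation, then test or wait before reusing the buffers. The partner rank and tag 0 are placeholders, and the fragment assumes an initialized MPI program.

int outgoing = 1, incoming, flag;
int partner = 0;                        /* destination rank; placeholder only */
MPI_Request sreq, rreq;
MPI_Status status;

MPI_Irecv(&incoming, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
          MPI_COMM_WORLD, &rreq);
MPI_Isend(&outgoing, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &sreq);

/* ... computation overlapped with the communication ... */

do {                                    /* poll; MPI_Wait(&rreq, &status) would block instead */
    MPI_Test(&rreq, &flag, &status);
} while (!flag);

MPI_Wait(&sreq, &status);               /* outgoing buffer may now be reused */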


Collective Communication


MPI_Bcast

MPI_Bcast(buffer, count, MPI_Datatype, root, MPI_Comm);

All processes use the same count, data type, root, and communicator. Before the operation, the root’s buffer contains a message. After the operation, all buffers contain the message from the root.
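A short usage fragment; root 0 and the value 100 are illustrative, and the fragment is assumed to run after MPI_Init.

int rank, n;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
    n = 100;                 /* only the root's value matters before the call */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* afterwards, n == 100 on every process in MPI_COMM_WORLD */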


MPI_Scatter

MPI_Scatter(sendbuffer, sendcount, MPI_Datatype,
            recvbuffer, recvcount, MPI_Datatype,
            root, MPI_Comm);

All processes use the same send and receive counts, data types, root and communicator. Before the operation, the root’s send buffer contains a message of length sendcount * N, where N is the number of processes. After the operation, the message is divided equally and dispersed to all processes (including the root) following rank order.
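A usage fragment in which the root deals one int to each process; the buffer sizes and values are illustrative, and the fragment assumes at most 64 processes and an initialized MPI program.

int rank, np, i;
int sendbuf[64];   /* significant at the root only; at most 64 processes assumed */
int recvbuf[1];    /* each process receives recvcount items */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &np);
if (rank == 0)
    for (i = 0; i < np; ++i)
        sendbuf[i] = 10 * i;            /* item destined for rank i */
MPI_Scatter(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* now recvbuf[0] == 10 * rank on every process, in rank order */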


MPI_Gather

MPI_Gather(sendbuffer, sendcount, MPI_Datatype,
           recvbuffer, recvcount, MPI_Datatype,
           root, MPI_Comm);

This is the “reverse” of MPI_Scatter(). After the operation the root process has in its receive buffer the concatenation of the send buffers of all processes (including its own), with a total message length of recvcount * N, where N is the number of processes. The message is gathered following rank order.
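The mirror-image fragment of the scatter example above: each process contributes one int and the root collects them in rank order. Again illustrative, assuming at most 64 processes and an initialized MPI program.

int rank, myvalue;
int allvalues[64];   /* significant at the root only; at most 64 processes assumed */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
myvalue = rank * rank;                  /* each process's contribution */
MPI_Gather(&myvalue, 1, MPI_INT, allvalues, 1, MPI_INT, 0, MPI_COMM_WORLD);
/* at the root: allvalues[i] == i * i for every rank i, in rank order */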


MPI_Reduce

MPI_Reduce(sndbuf, rcvbuf, count, MPI_Datatype, MPI_Op, root, MPI_Comm);

After the operation, the root process has in its receive buffer the result of the pair-wise reduction of the send buffers of all processes, including its own.


Predefined Reduction Ops

MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD
MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR
MPI_MAXLOC, MPI_MINLOC

(L = logical, B = bit-wise)
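MPI_MAXLOC and MPI_MINLOC reduce (value, index) pairs, so the root learns both the extreme value and which rank supplied it. A fragment using the standard pair type MPI_DOUBLE_INT; the local value is an illustrative placeholder, and the fragment is assumed to run after MPI_Init.

int rank;
struct { double value; int rank; } local, global;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local.value = (double)(rank % 5);       /* placeholder for a per-process quantity */
local.rank  = rank;
MPI_Reduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
/* at root 0: global.value is the largest value, global.rank the rank that had it */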


User Defined Reduction Ops

void myOperator(void *invector,
                void *inoutvector,
                int *length,
                MPI_Datatype *datatype)
{
    …
}
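A hedged sketch of how such an operator might be filled in, registered with MPI_Op_create, and used in a reduction. The element-wise maximum (re-implementing MPI_MAX) and the names myMax, localval, globalmax are illustrative only; myOperator follows the slide's signature.

/* assumes the reduction is called with MPI_DOUBLE elements; illustrative only */
void myOperator(void *invector, void *inoutvector,
                int *length, MPI_Datatype *datatype)
{
    double *in = (double *)invector, *inout = (double *)inoutvector;
    int i;
    for (i = 0; i < *length; ++i)
        if (in[i] > inout[i])
            inout[i] = in[i];           /* element-wise maximum */
}

/* registration and use, inside an initialized MPI program: */
MPI_Op myMax;
double localval = 3.14, globalmax;
MPI_Op_create(myOperator, 1, &myMax);   /* 1: the operation is commutative */
MPI_Reduce(&localval, &globalmax, 1, MPI_DOUBLE, myMax, 0, MPI_COMM_WORLD);
MPI_Op_free(&myMax);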


Ten Reasons to Prefer MPI over PVM

1. MPI has more than one free, high-quality implementation.
2. MPI can efficiently program MPPs and clusters.
3. MPI is rigorously specified.
4. MPI efficiently manages message buffers.
5. MPI has full asynchronous communication.
6. MPI groups are solid, efficient, and deterministic.
7. MPI defines a 3rd-party profiling mechanism.
8. MPI synchronization protects 3rd-party software.
9. MPI is portable.
10. MPI is a standard.


Summary

Introduction to MPI
Reinforced the manager-workers paradigm
Send, receive: blocking and non-blocking
Process groups


MPI resources

Open source implementations
– MPICH
– LAM

Books
– Using MPI, by William Gropp, Ewing Lusk, Anthony Skjellum
– Using MPI-2, by William Gropp, Ewing Lusk, Rajeev Thakur

On-line tutorials
– www.tc.cornell.edu/Edu/Tutor/MPI/