An Introduction to Parallel Programming with MPI

Transcript
Page 1: An Introduction to Parallel Programming with MPI

An Introduction to Parallel Programming with MPI

March 22, 24, 29, 31, 2005

David Adams – [email protected]
http://research.cs.vt.edu/lasca/schedule

Page 2: An Introduction to Parallel Programming with MPI

MPI and Classical References

MPI
M. Snir, W. Gropp, MPI: The Complete Reference (2-volume set), MIT Press, MA, (1998).

Parallel Computing
D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation, Prentice-Hall, Englewood Cliffs, NJ, (1989).

M. J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-Hill, NY, (1987).

Page 3: An Introduction to Parallel Programming with MPI

Outline

Disclaimers
Overview of basic parallel programming on a cluster with the goals of MPI
Batch system interaction
Startup procedures
Quick review
Blocking message passing
Non-blocking message passing
Collective communications

Page 4: An Introduction to Parallel Programming with MPI

Review

Messages are the only way processors can pass information.
MPI hides the low-level details of message transport, leaving the user to specify only the message logic.
Parallel algorithms are built by identifying the concurrency opportunities in the problem itself, not in the serial algorithm.
Communication is slow.
Partitioning and pipelining are two primary methods for exploiting concurrency.
To make good use of the hardware, we want to balance the computational load across all processors and maintain a compute-bound process rather than a communication-bound process.

Page 5: An Introduction to Parallel Programming with MPI

More Review

MPI messages specify a starting point, a length, and data type information.
MPI messages are read from contiguous memory.
These functions will generally appear in all MPI programs:

MPI_INIT
MPI_FINALIZE
MPI_COMM_SIZE
MPI_COMM_RANK

MPI_COMM_WORLD is the global communicator available at the start of all MPI runs.

Page 6: An Introduction to Parallel Programming with MPI

Hello World
Fortran90

PROGRAM Hello_World

  IMPLICIT NONE
  INCLUDE 'mpif.h'

  INTEGER :: ierr_p, rank_p, size_p
  INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status_p

  CALL MPI_INIT(ierr_p)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank_p, ierr_p)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size_p, ierr_p)

  IF (rank_p==0) THEN
    WRITE(*,*) 'Hello world! I am process 0 and I am special!'
  ELSE
    WRITE(*,*) 'Hello world! I am process', rank_p
  END IF

  CALL MPI_FINALIZE(ierr_p)

END PROGRAM Hello_World

Page 7: An Introduction to Parallel Programming with MPI

Hello World
C (case sensitive)

#include <stdio.h>
#include <mpi.h>

int main (int argc, char **argv)
{
  int rank_p, size_p;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank_p);
  MPI_Comm_size(MPI_COMM_WORLD, &size_p);

  if (rank_p==0) {
    printf("%d: Hello World! I am special!\n", rank_p);
  }
  else {
    printf("%d: Hello World!\n", rank_p);  /* print this process's rank */
  }

  MPI_Finalize();
  return 0;
}

Page 8: An Introduction to Parallel Programming with MPI

MPI Messages

Messages are non-overtaking.
All MPI messages are completed in two parts:

Send
Can be blocking or non-blocking.
Identifies the destination, data type and length, and a message type identifier (tag).
Identifies to MPI a space in memory specifically reserved for the sending of this message.

Receive
Can be blocking or non-blocking.
Identifies the source, data type and length, and a message type identifier (tag).
Identifies to MPI a space in memory specifically reserved for the completion of this message.

Page 9: An Introduction to Parallel Programming with MPI

Message Semantics (Modes)

Standard
The completion of the send does not necessarily mean that the matching receive has started, and no assumption should be made in the application program about whether the out-going data is buffered.
All buffering is made at the discretion of your MPI implementation.
Completion of an operation simply means that the message buffer space can now be modified safely again.

Buffered
Synchronous
Ready

Page 10: An Introduction to Parallel Programming with MPI

Message Semantics (Modes)

Standard
Buffered (not recommended)
The user can guarantee that a certain amount of buffer space is available.
The catch is that the space must be explicitly provided by the application program.
Making sure the buffer space does not become full is completely the user’s responsibility.

Synchronous
Ready

Page 11: An Introduction to Parallel Programming with MPI

Message Semantics (Modes)

Standard
Buffered (not recommended)
Synchronous
A rendezvous semantic between sender and receiver is used.
Completion of a send signals that the receive has at least started.

Ready

Page 12: An Introduction to Parallel Programming with MPI

Message Semantics (Modes)

Standard
Buffered (not recommended)
Synchronous
Ready (not recommended)
Allows the user to exploit extra knowledge to simplify the protocol and potentially achieve higher performance.
In a ready-mode send, the user asserts that the matching receive has already been posted.

Page 13: An Introduction to Parallel Programming with MPI

Blocking Message Passing (SEND)

MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

Performs a standard-mode, blocking send.
Blocking means that the code cannot continue until the send has completed.
Completion of the send means that the data has been buffered, either non-locally or locally, and that the message buffer is now free to modify.
Completion implies nothing about the matching receive.
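The argument list above can be made concrete with a short Fortran90 sketch; the count of 10, the tag value 99, the buffer contents, and the destination rank 1 are all assumptions made for illustration, not part of the slides:

```fortran
! Sketch: rank 0 sends ten integers to rank 1 with tag 99 (assumed values).
INTEGER, PARAMETER :: n = 10
INTEGER :: buf(n), rank, ierr

CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
IF (rank == 0) THEN
   buf = 42   ! fill the contiguous send buffer
   CALL MPI_SEND(buf, n, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
   ! On return, buf is safe to modify again; nothing is implied
   ! about whether rank 1 has even started its receive.
END IF
```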

Page 14: An Introduction to Parallel Programming with MPI

Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

BUF is an array. It can be an array of one object, but it must be an array.
The definition

  INTEGER :: X

DOES NOT EQUAL

  INTEGER :: X(1)

Page 15: An Introduction to Parallel Programming with MPI

Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

BUF is the parameter from which MPI determines the starting point of the memory space assigned to this message.
Recall that this memory space must be contiguous: allocatable arrays in Fortran90 are not necessarily contiguous, and array segments are certainly not, in general, contiguous.

Page 16: An Introduction to Parallel Programming with MPI

Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

Until the send is complete, the data inside BUF is undefined. Any attempt to change the data in BUF before the send completes is also an undefined operation (though possible).
Once a send operation begins, it is the user’s job to see that no modifications to BUF are made.
Completion of the send assures the user that it is safe to modify the contents of BUF again.

Page 17: An Introduction to Parallel Programming with MPI

DATATYPE
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

DATATYPE is an MPI-specific data type corresponding to the type of data stored in BUF.
An array of integers would be sent using the MPI_INTEGER data type.
An array of logical variables would be sent using the MPI_LOGICAL data type.
etc.

Page 18: An Introduction to Parallel Programming with MPI

MPI Types in Fortran 77

MPI_INTEGER – INTEGER
MPI_REAL – REAL
MPI_DOUBLE_PRECISION – DOUBLE PRECISION
MPI_COMPLEX – COMPLEX
MPI_LOGICAL – LOGICAL
MPI_CHARACTER – CHARACTER(1)
MPI_BYTE
MPI_PACKED

Page 19: An Introduction to Parallel Programming with MPI

MPI Types in C

MPI_CHAR – signed char
MPI_SHORT – signed short int
MPI_INT – signed int
MPI_LONG – signed long int
MPI_UNSIGNED_CHAR – unsigned char
MPI_UNSIGNED – unsigned int
MPI_UNSIGNED_LONG – unsigned long int
MPI_FLOAT – float
MPI_DOUBLE – double
MPI_LONG_DOUBLE – long double
MPI_BYTE
MPI_PACKED

Page 20: An Introduction to Parallel Programming with MPI

COUNT
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

COUNT specifies the number of entries of type DATATYPE in the buffer BUF.
From the combined information of COUNT, DATATYPE, and BUF, MPI can determine the starting point in memory for the message and the number of bytes to move.

Page 21: An Introduction to Parallel Programming with MPI

Communicator
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

COMM provides MPI with the reference point for the communication domain applied to this send.
For most MPI programs MPI_COMM_WORLD will be sufficient as the argument for this parameter.

Page 22: An Introduction to Parallel Programming with MPI

DESTINATION
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

DEST is an integer representing the rank of the process I am trying to send a message to.
The rank value is with respect to the communicator in the COMM parameter.
For MPI_COMM_WORLD, the value in DEST is the absolute rank of the processor you are trying to reach.

Page 23: An Introduction to Parallel Programming with MPI

TAG
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

The TAG parameter is an integer between 0 and some upper bound, where the upper bound is machine dependent. The value of the upper bound is found in the attribute MPI_TAG_UB.
This integer value can be used to distinguish messages, since send-receive pairs will only match if their TAG values also match.

Page 24: An Introduction to Parallel Programming with MPI

IERROR
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

Assuming everything is working as planned, the value of IERROR on exit will be MPI_SUCCESS.
Values not equal to MPI_SUCCESS indicate some error, but these values are implementation specific.

Page 25: An Introduction to Parallel Programming with MPI

Send Modes

Standard
  MPI_SEND
Buffered (not recommended)
  MPI_BSEND
Synchronous
  MPI_SSEND
Ready (not recommended)
  MPI_RSEND
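All four modes share the argument list of MPI_SEND, so switching modes is a one-line change. A hedged sketch (buf, n, and ierr are hypothetical names, as are the destination rank 1 and tag 99):

```fortran
! Standard-mode send:
CALL MPI_SEND (buf, n, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
! Same message, synchronous mode: completes only after the
! matching receive has at least started.
CALL MPI_SSEND(buf, n, MPI_INTEGER, 1, 99, MPI_COMM_WORLD, ierr)
```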

Page 26: An Introduction to Parallel Programming with MPI

Blocking Message Passing (RECEIVE)

MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

Performs a standard-mode, blocking receive.
Blocking means that the code cannot continue until the receive has completed.
Completion of the receive means that the data has been placed into the message buffer locally and that the message buffer is now safe to modify or use.
Completion implies nothing about the completion of the matching send (except that the send has started).
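A matched send/receive pair, as a Fortran90 sketch; the ranks 0 and 1, tag 7, and the REAL buffers of length 5 are assumptions for the example:

```fortran
INTEGER, PARAMETER :: n = 5
REAL :: sendbuf(n), recvbuf(n)
INTEGER :: rank, ierr
INTEGER :: status(MPI_STATUS_SIZE)

CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
IF (rank == 0) THEN
   sendbuf = 1.0
   CALL MPI_SEND(sendbuf, n, MPI_REAL, 1, 7, MPI_COMM_WORLD, ierr)
ELSE IF (rank == 1) THEN
   ! The pair matches because SOURCE/DEST, TAG, and COMM all agree.
   CALL MPI_RECV(recvbuf, n, MPI_REAL, 0, 7, MPI_COMM_WORLD, status, ierr)
END IF
```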

Page 27: An Introduction to Parallel Programming with MPI

BUFFER, DATATYPE, COMM, and IERROR

MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

The parameters BUF, DATATYPE, and IERROR follow the same rules as those of the send.
Send-receive pairs will only match if their SOURCE/DEST, TAG, and COMM information match.

Page 28: An Introduction to Parallel Programming with MPI

COUNT
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

Like in the send operation, the COUNT parameter indicates the number of entries of type DATATYPE in BUF.
The COUNT values of a send-receive pair, however, do not need to match.
It is the user’s responsibility to see that the buffer on the receiving end is big enough to store the incoming message. An overflow error is returned in IERROR when BUF is too small.

Page 29: An Introduction to Parallel Programming with MPI

Source
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

SOURCE is an integer representing the rank of the process I am willing to receive a message from.
The rank value is with respect to the communicator in the COMM parameter.
For MPI_COMM_WORLD, the value in SOURCE is the absolute rank of the processor you are willing to receive from.
The receiver can specify a wildcard value for SOURCE (MPI_ANY_SOURCE), indicating that any source is acceptable as long as the TAG and COMM parameters match.

Page 30: An Introduction to Parallel Programming with MPI

TAG
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

The TAG value is an integer that must be matched with the TAG value of the corresponding send.
The receiver can specify a wildcard value for TAG (MPI_ANY_TAG), indicating that it is willing to receive any tag value as long as the source and COMM values match.

Page 31: An Introduction to Parallel Programming with MPI

STATUS
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

The STATUS parameter is a returned parameter that contains information about the completion of the message.
When using wildcards you may need to find out who sent you a message, what it was about, and how long it was before continuing to process. This is the type of information found in STATUS.

Page 32: An Introduction to Parallel Programming with MPI

STATUS
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

  IN <type> BUF(*)
  IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

In FORTRAN77, STATUS is an array of integers of size MPI_STATUS_SIZE. The three constants MPI_SOURCE, MPI_TAG, and MPI_ERROR are the indices of the entries that store the source, tag, and error fields respectively.
In C, STATUS is a structure of type MPI_Status that contains three fields named MPI_SOURCE, MPI_TAG, and MPI_ERROR.
Notice that the length of the message doesn’t appear to be included…
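Putting the wildcards and STATUS together, a receive that accepts any message and then asks who sent it might be sketched like this in Fortran90 (the buffer size of 100 is an assumption):

```fortran
INTEGER :: buf(100), ierr, sender, msgtag
INTEGER :: status(MPI_STATUS_SIZE)

CALL MPI_RECV(buf, 100, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, &
              MPI_COMM_WORLD, status, ierr)
sender = status(MPI_SOURCE)   ! rank of the actual sender
msgtag = status(MPI_TAG)      ! tag the message arrived with
```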

Page 33: An Introduction to Parallel Programming with MPI

Questions/Answers

Question: What is the purpose of having the error returned in the STATUS data structure? It seems redundant.

Answer: It is possible for a single function such as MPI_WAITALL() to complete multiple messages in a single call. In these cases each individual message may produce its own error code, and that code is what is returned in the STATUS data structure.

Page 34: An Introduction to Parallel Programming with MPI

MPI_GET_COUNT
MPI_GET_COUNT (STATUS, DATATYPE, COUNT, IERROR)

  IN INTEGER STATUS(MPI_STATUS_SIZE), DATATYPE
  OUT INTEGER COUNT, IERROR

MPI_GET_COUNT will allow you to determine the number of entities of type DATATYPE that were received in the message.

For advanced users, see also MPI_GET_ELEMENTS.
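A hedged sketch of recovering the message length after a wildcard receive (the buffer size of 100 is an assumption):

```fortran
INTEGER :: buf(100), ierr, nreceived
INTEGER :: status(MPI_STATUS_SIZE)

CALL MPI_RECV(buf, 100, MPI_INTEGER, MPI_ANY_SOURCE, MPI_ANY_TAG, &
              MPI_COMM_WORLD, status, ierr)
! STATUS plus the data type is enough for MPI to report the length.
CALL MPI_GET_COUNT(status, MPI_INTEGER, nreceived, ierr)
! nreceived now holds the number of MPI_INTEGER entries in the message.
```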

Page 35: An Introduction to Parallel Programming with MPI

Six Powerful Functions

MPI_INIT
MPI_FINALIZE
MPI_COMM_RANK
MPI_COMM_SIZE
MPI_SEND
MPI_RECV

Page 36: An Introduction to Parallel Programming with MPI

Deadlock

MPI does not enforce a safe programming style.
It is the user’s responsibility to ensure that it is impossible for the program to fall into a deadlock condition.
Deadlock occurs when a process blocks to wait for an event that, given the current state of the system, can never happen.

Page 37: An Introduction to Parallel Programming with MPI

Deadlock examples

...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF
...

This program will always deadlock: each process posts a blocking receive first, so both wait forever for a message the other never gets to send.

Page 38: An Introduction to Parallel Programming with MPI

Deadlock examples

...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF
...

This program is unsafe. Why? Both processes post a standard-mode send first, and whether those sends complete before the matching receives are posted depends entirely on whether the implementation buffers the messages. If neither message is buffered, both sends block and the program deadlocks.

Page 39: An Introduction to Parallel Programming with MPI

Safe Way

...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF
...

This is a silly example... no one would ever try to do it the other ways... right?
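The safe ordering works even if the library buffers nothing at all, because at every moment one rank's send faces the other rank's matching receive. A sketch of that worst-case, rendezvous-only model (plain Python, not MPI; op lists stand in for the calls):

```python
def completes_without_buffering(progs):
    """Worst-case (rendezvous-only) model: a send finishes only when its
    target rank is sitting at the matching recv, and then both advance.
    Returns True iff every rank runs its op list to the end."""
    pc = [0] * len(progs)
    while not all(pc[r] == len(progs[r]) for r in range(len(progs))):
        stuck = True
        for r in range(len(progs)):
            if pc[r] < len(progs[r]) and progs[r][pc[r]][0] == 'send':
                d = progs[r][pc[r]][1]
                if pc[d] < len(progs[d]) and progs[d][pc[d]] == ('recv', r):
                    pc[r] += 1       # sender's MPI_SEND returns
                    pc[d] += 1       # receiver's MPI_RECV returns
                    stuck = False
        if stuck:
            return False             # no send can meet its recv: deadlock
    return True

safe = [[('send', 1), ('recv', 1)],  # rank 0: send, then receive
        [('recv', 0), ('send', 0)]]  # rank 1: receive, then send
print(completes_without_buffering(safe))   # True -- no buffer space needed
```

A program that passes this zero-buffering test is safe on any conforming MPI implementation.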

Page 40: An Introduction to Parallel Programming with MPI

Motivating Example for Deadlock

[Animation, repeated across pages 40-50: ten processes P1-P10 arranged in a ring, each passing a message to its neighbor. One message is delivered per step, so the whole ring exchange takes ten steps, ending with "Timestep: 10!"]
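The animation can be reproduced with a small synchronous simulation (plain Python, not MPI): assuming zero buffering, a send/receive pair completes only when both ranks have reached matching calls at the start of a timestep. With rank 0 sending first and everyone else receiving first, the ten messages trickle around the ring one timestep at a time:

```python
def count_timesteps(progs):
    """Synchronous, zero-buffering model: at the start of each timestep, find
    every (send, matching recv) pair whose two ranks are both ready; all such
    pairs complete together, then the next timestep begins. Returns the
    number of timesteps, or None on deadlock."""
    pc = [0] * len(progs)
    steps = 0
    while not all(pc[r] == len(progs[r]) for r in range(len(progs))):
        now = [progs[r][pc[r]] if pc[r] < len(progs[r]) else None
               for r in range(len(progs))]
        ready = set()
        for r, op in enumerate(now):
            if op is not None and op[0] == 'send' and now[op[1]] == ('recv', r):
                ready.update((r, op[1]))      # this pair meets: both advance
        if not ready:
            return None                       # no pair can meet: deadlock
        for r in ready:
            pc[r] += 1
        steps += 1
    return steps

N = 10
ring = [[('send', 1), ('recv', N - 1)]]            # rank 0 leads with its send
ring += [[('recv', i - 1), ('send', (i + 1) % N)]  # everyone else receives first
         for i in range(1, N)]
print(count_timesteps(ring))   # 10 -- one message per timestep, as in the slides
```

The communication is perfectly safe, but it serializes: only one send/receive pair can meet in any given timestep, so ten processes need ten steps.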

Page 51: An Introduction to Parallel Programming with MPI

Super Idea!

...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
    CALL MPI_SEND(sendbuf, count, MPI_REAL, 2, tag, comm, ierr)
    CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
ELSE IF (rank .EQ. 2) THEN
...

I’ll cleverly order my sends so that they all happen at the same time and all the communication will be completed in one time step!
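What does the "clever" all-send-first ordering actually do? Under a toy model (plain Python, not MPI; the buffered flag stands in for an eager-protocol MPI_SEND), it finishes quickly when the library happens to buffer the messages, and deadlocks the moment it cannot:

```python
def ring_all_send_first(n, buffered):
    """Every rank sends to the next rank and then receives from the previous
    one. With buffered=True a send parks its message and returns (eager);
    with buffered=False it needs its target already waiting in a recv."""
    progs = [[('send', (i + 1) % n), ('recv', (i - 1) % n)] for i in range(n)]
    pc = [0] * n
    mailbox = []                                  # buffered (src, dst) messages
    while not all(p == 2 for p in pc):
        moved = False
        for r in range(n):
            if pc[r] == 2:
                continue
            op, partner = progs[r][pc[r]]
            if op == 'send':
                if buffered:
                    mailbox.append((r, partner))
                    pc[r] += 1
                    moved = True
                elif pc[partner] < 2 and progs[partner][pc[partner]] == ('recv', r):
                    pc[r] += 1                    # rendezvous: both advance
                    pc[partner] += 1
                    moved = True
            elif (partner, r) in mailbox:         # recv a buffered message
                mailbox.remove((partner, r))
                pc[r] += 1
                moved = True
        if not moved:
            return 'deadlock'
    return 'completed'

print(ring_all_send_first(10, buffered=True))    # completed -- looks fine in testing
print(ring_all_send_first(10, buffered=False))   # deadlock -- every rank stuck in send
```

With no buffering, every rank is stuck in MPI_SEND and no rank ever reaches a receive: the full ten-node version of the unsafe two-rank program.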

Page 52: An Introduction to Parallel Programming with MPI

WRONG!

The code will be unsafe.
    "It worked perfectly for me, why doesn’t it work on this machine?"
    "It ran fine on Washday and now it doesn’t work. I haven’t changed anything!"
    "My code works if I send smaller messages. Maybe your machine can’t handle my optimized code."

Why? Standard-mode MPI_SEND is allowed, but not required, to buffer the message before the matching receive is posted. Whether it blocks depends on the implementation, the available buffer space, and the message size, so a program that relies on buffering can run fine in one setting and deadlock in another.

http://research.cs.vt.edu/lasca/schedule

Please send any additional questions to: [email protected]

