An Introduction to Parallel Programming with MPI
March 22, 24, 29, 31, 2005
David Adams – [email protected]
http://research.cs.vt.edu/lasca/schedule
MPI and Classical References

MPI
- M. Snir and W. Gropp, MPI: The Complete Reference (2-volume set), MIT Press, Cambridge, MA, 1998.

Parallel Computing
- D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation, Prentice-Hall, Englewood Cliffs, NJ, 1989.
- M. J. Quinn, Designing Efficient Algorithms for Parallel Computers, McGraw-Hill, New York, 1987.
Outline
- Disclaimers
- Overview of basic parallel programming on a cluster with the goals of MPI
- Batch system interaction
- Startup procedures
- Quick review
- Blocking message passing
- Non-blocking message passing
- Collective communications
Review
- Messages are the only way processors can pass information.
- MPI hides the low-level details of message transport, leaving the user to specify only the message logic.
- Parallel algorithms are built by identifying the concurrency opportunities in the problem itself, not in the serial algorithm.
- Communication is slow.
- Partitioning and pipelining are two primary methods for exploiting concurrency.
- To make good use of the hardware we want to balance the computational load across all processors and maintain a compute-bound process rather than a communication-bound process.
More Review
- MPI messages specify a starting point, a length, and data type information.
- MPI messages are read from contiguous memory.
- These functions will generally appear in all MPI programs:
  MPI_INIT, MPI_FINALIZE, MPI_COMM_SIZE, MPI_COMM_RANK
- MPI_COMM_WORLD is the global communicator available at the start of all MPI runs.
Hello World
Fortran90

PROGRAM Hello_World
  IMPLICIT NONE
  INCLUDE 'mpif.h'

  INTEGER :: ierr_p, rank_p, size_p
  INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status_p

  CALL MPI_INIT(ierr_p)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank_p, ierr_p)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size_p, ierr_p)

  IF (rank_p == 0) THEN
    WRITE(*,*) 'Hello world! I am process 0 and I am special!'
  ELSE
    WRITE(*,*) 'Hello world! I am process', rank_p
  END IF

  CALL MPI_FINALIZE(ierr_p)

END PROGRAM Hello_World
Hello World
C (case sensitive)

#include <stdio.h>
#include <mpi.h>

int main (int argc, char **argv)
{
  int rank_p, size_p;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank_p);
  MPI_Comm_size(MPI_COMM_WORLD, &size_p);

  if (rank_p == 0) {
    printf("%d: Hello World! I am special!\n", rank_p);
  } else {
    printf("%d: Hello World!\n", rank_p);
  }

  MPI_Finalize();
  return 0;
}
MPI Messages
- Messages are non-overtaking.
- All MPI messages are completed in two parts:

Send
- Can be blocking or non-blocking.
- Identifies the destination, data type and length, and a message type identifier (tag).
- Identifies to MPI a space in memory specifically reserved for the sending of this message.

Receive
- Can be blocking or non-blocking.
- Identifies the source, data type and length, and a message type identifier (tag).
- Identifies to MPI a space in memory specifically reserved for the completion of this message.
Message Semantics (Modes)

Standard
- The completion of the send does not necessarily mean that the matching receive has started, and no assumption should be made in the application program about whether the outgoing data is buffered.
- All buffering is done at the discretion of your MPI implementation.
- Completion of an operation simply means that the message buffer space can now be modified safely again.

Buffered
Synchronous
Ready
Message Semantics (Modes)

Standard
Buffered (not recommended)
- The user can guarantee that a certain amount of buffer space is available.
- The catch is that the space must be explicitly provided by the application program.
- Making sure the buffer space does not become full is completely the user's responsibility.

Synchronous
Ready
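As a sketch of what explicitly supplying the buffer looks like in C (an illustration written for these notes, not from the slides; the message size and ranks are assumptions), the application attaches its own buffer before any buffered send:

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Hypothetical two-process example of a buffered-mode send. */
int main(int argc, char **argv)
{
    int rank, tag = 0;
    double data[1000] = {0.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* The application supplies the buffer: payload size plus the
           per-message MPI_BSEND_OVERHEAD. */
        int bufsize = (int)sizeof(data) + MPI_BSEND_OVERHEAD;
        char *bsend_buf = malloc(bufsize);
        MPI_Buffer_attach(bsend_buf, bufsize);

        MPI_Bsend(data, 1000, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);

        /* Detach blocks until all buffered messages have drained. */
        MPI_Buffer_detach(&bsend_buf, &bufsize);
        free(bsend_buf);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(data, 1000, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();
    return 0;
}
```

If the attached buffer fills up, further MPI_Bsend calls fail, which is exactly the bookkeeping burden that makes this mode "not recommended."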
Message Semantics (Modes)

Standard
Buffered (not recommended)
Synchronous
- A rendezvous semantic between sender and receiver is used.
- Completion of a send signals that the receive has at least started.

Ready
Message Semantics (Modes)

Standard
Buffered (not recommended)
Synchronous
Ready (not recommended)
- Allows the user to exploit extra knowledge to simplify the protocol and potentially achieve higher performance.
- In a ready-mode send, the user asserts that the matching receive has already been posted.
Blocking Message Passing (SEND)

MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
  IN  <type> BUF(*)
  IN  INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER IERROR

- Performs a standard-mode, blocking send.
- Blocking means that the code cannot continue until the send has completed.
- Completion of the send means that the data has been buffered, locally or non-locally, and that the message buffer is now free to modify.
- Completion implies nothing about the matching receive.
Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- BUF is an array. It can be an array of one element, but it must be an array.
- The declaration
    INTEGER :: X
  DOES NOT EQUAL
    INTEGER :: X(1)
Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- BUF is the parameter from which MPI determines the starting point of the memory space allocated to this message.
- Recall that this memory space must be contiguous. Allocatable arrays in Fortran90 are not necessarily contiguous, and array segments are certainly not, in general, contiguous.
Buffer
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- Until the send is complete, the data inside BUF is undefined. Any attempt to change the data in BUF before the send completes is also an undefined operation (though possible).
- Once a send operation begins, it is the user's job to see that no modifications to BUF are made.
- Completion of the send assures the user that it is safe to modify the contents of BUF again.
DATATYPE
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- DATATYPE is an MPI-specific data type corresponding to the type of data stored in BUF.
- An array of integers would be sent using the MPI_INTEGER data type.
- An array of logical variables would be sent using the MPI_LOGICAL data type.
- etc.
MPI Types in Fortran 77
MPI_INTEGER          – INTEGER
MPI_REAL             – REAL
MPI_DOUBLE_PRECISION – DOUBLE PRECISION
MPI_COMPLEX          – COMPLEX
MPI_LOGICAL          – LOGICAL
MPI_CHARACTER        – CHARACTER(1)
MPI_BYTE
MPI_PACKED
MPI Types in C
MPI_CHAR           – signed char
MPI_SHORT          – signed short int
MPI_INT            – signed int
MPI_LONG           – signed long int
MPI_UNSIGNED_CHAR  – unsigned char
MPI_UNSIGNED_SHORT – unsigned short int
MPI_UNSIGNED       – unsigned int
MPI_UNSIGNED_LONG  – unsigned long int
MPI_FLOAT          – float
MPI_DOUBLE         – double
MPI_LONG_DOUBLE    – long double
MPI_BYTE
MPI_PACKED
COUNT
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- COUNT specifies the number of entries of type DATATYPE in the buffer BUF.
- From the combined information of COUNT, DATATYPE, and BUF, MPI can determine the starting point in memory for the message and the number of bytes to move.
Communicator
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- COMM provides MPI with the reference point for the communication domain applied to this send.
- For most MPI programs, MPI_COMM_WORLD will be sufficient as the argument for this parameter.
DESTINATION
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- DEST is an integer representing the rank of the process the message is being sent to.
- The rank value is with respect to the communicator in the COMM parameter.
- For MPI_COMM_WORLD, the value in DEST is the absolute rank of the processor you are trying to reach.
TAG
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- The TAG parameter is an integer between 0 and some machine-dependent upper bound. The value of the upper bound is found in the attribute MPI_TAG_UB.
- This integer value can be used to distinguish messages, since send-receive pairs will only match if their TAG values also match.
IERROR
MPI_SEND (BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)

- Assuming everything is working as planned, the value of IERROR on exit will be MPI_SUCCESS.
- Values not equal to MPI_SUCCESS indicate some error, but those values are implementation specific.
Send Modes
Standard                    MPI_SEND
Buffered (not recommended)  MPI_BSEND
Synchronous                 MPI_SSEND
Ready (not recommended)     MPI_RSEND
Blocking Message Passing (RECEIVE)

MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
  OUT <type> BUF(*)
  IN  INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER IERROR, STATUS(MPI_STATUS_SIZE)

- Performs a standard-mode, blocking receive.
- Blocking means that the code cannot continue until the receive has completed.
- Completion of the receive means that the data has been placed into the message buffer locally and that the message buffer is now safe to modify or use.
- Completion implies nothing about the completion of the matching send (except that the send has started).
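Putting the two halves together, a minimal two-process exchange might look like this in C (a sketch written for these notes; the tag and payload values are illustrative assumptions):

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0, tag = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 17;
        /* Blocking send: returns once `value` is safe to modify again. */
        MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: returns once the data has landed in `value`. */
        MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("%d: received %d\n", rank, value);
    }

    MPI_Finalize();
    return 0;
}
```

The pair matches because destination/source, tag, and communicator all agree; change any one of the three on either side and the receive blocks forever.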
BUFFER, DATATYPE, COMM, and IERROR
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- The parameters BUF, DATATYPE, COMM, and IERROR follow the same rules as those of the send.
- Send-receive pairs will only match if their SOURCE/DEST, TAG, and COMM information match.
COUNT
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- As in the send operation, the COUNT parameter indicates the number of entries of type DATATYPE in BUF.
- The COUNT values of a send-receive pair, however, do not need to match.
- It is the user's responsibility to see that the buffer on the receiving end is big enough to store the incoming message. An overflow error is returned in IERROR in the case when BUF is too small.
Source
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- SOURCE is an integer representing the rank of the process the receiver is willing to accept a message from.
- The rank value is with respect to the communicator in the COMM parameter.
- For MPI_COMM_WORLD, the value in SOURCE is the absolute rank of the processor you are willing to receive from.
- The receiver can specify a wildcard value for SOURCE (MPI_ANY_SOURCE), indicating that any source is acceptable as long as the TAG and COMM parameters match.
Tag
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- The TAG value is an integer that must be matched with the TAG value of the corresponding send.
- The receiver can specify a wildcard value for TAG (MPI_ANY_TAG), indicating that it is willing to receive any tag value as long as the SOURCE and COMM values match.
Status
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- The STATUS parameter is a returned parameter that contains information about the completion of the message.
- When using wildcards you may need to find out who sent you a message, what it was about, and how long the message was before continuing to process. This is the type of information found in STATUS.
Status
MPI_RECV (BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)

- In FORTRAN77, STATUS is an array of integers of size MPI_STATUS_SIZE. The three constants MPI_SOURCE, MPI_TAG, and MPI_ERROR are the indices of the entries that store the source, tag, and error fields, respectively.
- In C, STATUS is a structure of type MPI_Status that contains three fields named MPI_SOURCE, MPI_TAG, and MPI_ERROR.
- Notice that the length of the message doesn't appear to be included…
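A sketch of how the STATUS fields are typically inspected after a wildcard receive in C (written for these notes; the buffer size cap is an illustrative assumption):

```c
#include <stdio.h>
#include <mpi.h>

/* Fragment: receive from anyone, then recover who sent the message and
   which tag it carried.  Assumes MPI_Init has already been called and
   that 1024 doubles is enough for the largest message anyone posts. */
void receive_any(void)
{
    double buf[1024];
    MPI_Status status;

    MPI_Recv(buf, 1024, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    /* The wildcards are resolved by reading the status fields. */
    printf("message from rank %d with tag %d\n",
           status.MPI_SOURCE, status.MPI_TAG);
}
```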
Questions/Answers

Question: What is the purpose of having the error returned in the STATUS data structure? It seems redundant.

Answer: It is possible for a single function such as MPI_WAITALL() to complete multiple messages in a single call. In these cases each individual message may produce its own error code, and that code is what is returned in the STATUS data structure.
MPI_GET_COUNT
MPI_GET_COUNT (STATUS, DATATYPE, COUNT, IERROR)
  IN  INTEGER STATUS(MPI_STATUS_SIZE), DATATYPE
  OUT INTEGER COUNT, IERROR

- MPI_GET_COUNT allows you to determine the number of entities of type DATATYPE that were received in the message.
- For advanced users, see also MPI_GET_ELEMENTS.
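In C the same call recovers the "missing" length field noted above. A hedged sketch (the buffer cap of 1024 is an assumption made for this example):

```c
#include <stdio.h>
#include <mpi.h>

/* Fragment: receive a message of unknown length (up to a cap) and ask
   MPI how many entries actually arrived.  Assumes MPI is initialized. */
void receive_unknown_length(void)
{
    int buf[1024], nreceived;
    MPI_Status status;

    MPI_Recv(buf, 1024, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    /* The status plus the datatype recover the true message length. */
    MPI_Get_count(&status, MPI_INT, &nreceived);
    printf("received %d integers\n", nreceived);
}
```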
Six Powerful Functions
MPI_INIT
MPI_FINALIZE
MPI_COMM_RANK
MPI_COMM_SIZE
MPI_SEND
MPI_RECV
Deadlock
- MPI does not enforce a safe programming style.
- It is the user's responsibility to ensure that it is impossible for the program to fall into a deadlock condition.
- Deadlock occurs when a process blocks to wait for an event that, given the current state of the system, can never happen.
Deadlock Examples
...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF
...

This program will always deadlock.
Deadlock Examples
...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF
...

This program is unsafe. Why?
Safe Way
...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 0, tag, comm, ierr)
END IF
...

This is a silly example… no one would ever try to do it the other ways… right?
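The slides order the send and receive by hand. Another option, not covered in these slides, is MPI_SENDRECV, which hands the whole exchange to the library so the pairing cannot be gotten wrong. A hedged C sketch (the tag and `partner` argument are assumptions of this example):

```c
#include <mpi.h>

/* Fragment: a safe bidirectional exchange with one partner using
   MPI_Sendrecv, which avoids the ordering hazards of separate blocking
   Send/Recv calls.  Assumes MPI is initialized and `partner` holds the
   other process's rank. */
void exchange(double *sendbuf, double *recvbuf, int count, int partner)
{
    MPI_Status status;
    int tag = 0;

    MPI_Sendrecv(sendbuf, count, MPI_DOUBLE, partner, tag,
                 recvbuf, count, MPI_DOUBLE, partner, tag,
                 MPI_COMM_WORLD, &status);
}
```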
Motivating Example for Deadlock

[Diagram, shown over ten slides: processes P1 through P10 arranged in a ring, each needing to exchange a message with a neighbor. With blocking communication the exchanges serialize, advancing one at a time through Timestep 1, 2, 3, … until the ring finally completes at Timestep 10.]
Super Idea!
...
CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank .EQ. 0) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 1, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 1, tag, comm, status, ierr)
ELSE IF (rank .EQ. 1) THEN
  CALL MPI_SEND(sendbuf, count, MPI_REAL, 2, tag, comm, ierr)
  CALL MPI_RECV(recvbuf, count, MPI_REAL, 0, tag, comm, status, ierr)
ELSE IF (rank .EQ. 2) THEN
...

I'll cleverly order my sends so that they all happen at the same time, and all the communication will be completed in one time step!
WRONG!
The code will be unsafe.

"It worked perfectly for me, why doesn't it work on this machine?"
"It ran fine on Washday and now it doesn't work. I haven't changed anything!"
"My code works if I send smaller messages. Maybe your machine can't handle my optimized code."
Why?
http://research.cs.vt.edu/lasca/schedule

Please send any additional questions to: [email protected]