MpiP Evaluation Report Hans Sherburne, Adam Leko UPC Group HCS Research Laboratory University of...

mpiP Evaluation Report

Hans Sherburne, Adam Leko
UPC Group

HCS Research Laboratory
University of Florida


Basic Information

Name: mpiP
Developer: Jeffrey Vetter (ORNL), Chris Chambreau (LLNL)
Current version: mpiP v2.8
Website: http://www.llnl.gov/CASC/mpip/
Contacts:
    Jeffrey Vetter: [email protected]
    Chris Chambreau: [email protected]


mpiP: Lightweight, Scalable MPI Profiling

mpiP is a simple, lightweight tool for profiling MPI applications
Gathers information through the MPI profiling layer
    Probably not a good candidate to be extended for UPC or SHMEM
Supports many platforms running Linux, Tru64, AIX, UNICOS, IBM BG/L
Very simple to use, and the output file is very easy to understand
    Provides statistics for the top twenty MPI calls based on time spent in the call and total size of messages sent; also provides statistics for MPI I/O
    Callsite traceback depth is variable, so the user can differentiate between and examine the behavior of routines that are wrappers for MPI calls
An mpiP viewer, Mpipview, is available as part of Tool Gear
Some of its functionality is extended to developers through an API:
    Stackwalking, address-to-source translation, symbol demangling, timing routines, accessing the name of the executable
    These functions might be useful if source-code correlation is to be included in a UPC or SHMEM tool


What is mpiP Useful For?

The data collected by mpiP is useful for analyzing the scalability of parallel applications.
By examining the aggregate time and rank correlation of the time spent in each MPI call versus the total time spent in MPI calls while increasing the number of tasks, one can locate flaws in load balancing and algorithm design.
This technique is described in [1]: “Statistical Scalability Analysis of Communication Operations in Distributed Applications”, Vetter, J. & McCracken, M.

The following are courtesy of [1]:


The Downside…

mpiP does provide the measurements of aggregate callsite time and total MPI call time necessary for computing the rank correlation coefficient
mpiP does NOT automate the process of computing the rank correlation, which must utilize data from multiple experiments
Equations for calculation of coefficients of correlation (linear and rank), courtesy of [1]:
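Since mpiP leaves the correlation step to the user, the calculation can be scripted once the per-experiment numbers have been copied out of each report. The following is a minimal sketch, assuming the standard textbook definitions (Pearson for the linear coefficient, Spearman for the rank coefficient, computed as Pearson on ranks); whether these match the exact forms used in [1] is an assumption. The function names and the sample timings are hypothetical and are not part of mpiP.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values, one entry per experiment (e.g. 2, 4, 8, 16, 32 tasks),
# copied by hand from five separate mpiP reports.
callsite_time = [1.2, 1.9, 3.1, 6.0, 13.5]      # aggregate time of one callsite
total_mpi_time = [10.0, 12.5, 15.0, 22.0, 40.0]  # total time in all MPI calls

print("linear:", pearson(callsite_time, total_mpi_time))
print("rank:  ", spearman(callsite_time, total_mpi_time))
```

Each list holds one value per run, so the inputs necessarily come from multiple experiments, which is the manual step mpiP does not automate.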


Partial Sample of mpiP Output


Information Provided by mpiP

Information is displayed in terms of task assignments and callsites, which correspond to machines and MPI calls in source code, arranged in the following sections:
    Time per task (AppTime, MPITime, MPI%)
    Location of callsite in source code (callsite, line#, parent function, MPI call)
    Aggregate time per callsite (top twenty) (time, app%, MPI%, variance)
    Aggregate sent message size per callsite (top twenty) (count, total, avg. MPI%)
    Time statistics per callsite per task (all) (max, min, mean, app%, MPI%)
    Sent message size statistics per callsite per task (all) (count, max, min, mean, sum)
    I/O statistics per callsite per task (all) (count, max, min, mean, sum)
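To make the "Time per task" section concrete, the sketch below reads a small whitespace-separated table with Task, AppTime, and MPITime columns and derives MPI% for each task. The table layout, the sample numbers, and the function name are assumptions made for illustration and do not reproduce the exact mpiP report format; MPI% is taken here to mean MPITime as a percentage of AppTime.

```python
# Hypothetical excerpt; the real mpiP report layout may differ.
SAMPLE = """\
Task AppTime MPITime
0 10.0 1.5
1 10.0 4.0
2 10.0 2.5
"""


def parse_time_per_task(text):
    """Return one dict per task with the derived MPI% added."""
    rows = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        task, app_time, mpi_time = line.split()
        app_time = float(app_time)
        mpi_time = float(mpi_time)
        rows.append({
            "task": int(task),
            "app_time": app_time,
            "mpi_time": mpi_time,
            # Share of this task's run time spent inside MPI calls.
            "mpi_pct": 100.0 * mpi_time / app_time,
        })
    return rows


rows = parse_time_per_task(SAMPLE)
busiest = max(rows, key=lambda r: r["mpi_pct"])
print("task with highest MPI%:", busiest["task"])
```

A wide spread in MPI% across tasks is the kind of imbalance the per-task sections are meant to expose.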


mpiP Overhead

[Figure: mpiP Profiling Overhead. Bar chart of overhead (instrumented/uninstrumented) on a 0% to 8% axis, one bar per benchmark: CAMEL; NAS LU (8p, W); NAS LU (32p, B); PP: Big message; PP: Diffuse procedure; PP: Hot procedure; PP: Intensive server; PP: Ping pong; PP: Random barrier; PP: Small messages; PP: System time; PP: Wrong way. Bar labels, in the order extracted: 5%, 3%, 2%, 3%, 1%, 1%, 0%, 7%, 5%, 0%, 1%, 0%.]
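The overhead metric compares the run time of the instrumented program with that of the uninstrumented one. A minimal sketch of the arithmetic, assuming the percentage is the instrumented/uninstrumented ratio minus one (the report does not spell out the exact formula); the function name and timings are hypothetical:

```python
def overhead_percent(instrumented_s, uninstrumented_s):
    """Extra wall-clock time caused by profiling, as a percentage."""
    return 100.0 * (instrumented_s / uninstrumented_s - 1.0)


# Hypothetical wall-clock times in seconds for one benchmark.
baseline = 100.0       # run without mpiP linked in
with_profiler = 105.0  # same run linked against mpiP

print(round(overhead_percent(with_profiler, baseline), 1))
```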


Source Code Correlation in Mpipview


Bottleneck Identification Test Suite

Testing metric: what did profile data tell us?
CAMEL: TOSS-UP
    Profile showed that MPI time is a small percentage of overall application time
    Profile reveals some imbalance in the amount of time spent in certain calls, but doesn’t help the user understand the cause
    Profile does not provide information about what occurs when execution is not in MPI calls
    Difficult to grasp overall program behavior from profiling information alone
NAS LU: TOSS-UP
    Profile reveals that MPI function calls consume a significant portion of application time
    Profile reveals some imbalance in the amount of time spent in certain calls, but doesn’t help the user understand the cause
    Profile does not provide information about what occurs when execution is not in MPI calls
    Difficult to grasp overall program behavior from profiling information alone


Bottleneck Identification Test Suite (2)

Big message: PASSED
    Profile clearly shows that Send and Recv dominate the application time
    Profile shows a large number of bytes transferred
Diffuse procedure: FAILED
    Profile showed a large amount of time spent in barrier
    Time is diffused across processes
    Profile does not show that in each barrier a single process is always delaying completion
Hot procedure: FAILED
    No profile output, due to no MPI calls (other than setup and breakdown)
Intensive server: PASSED
    Profile showed one process spent very little time in MPI calls, while the remaining processes spent nearly all their time in Recvs
    Profile showed one process sent an order of magnitude more data than the others, and spent far more time in Send
Ping pong: PASSED
    Profile showed time spent in MPI function calls dominated the total application time
    Profile showed an excessive number of Sends and Recvs with little load imbalance
Random barrier: PASSED
    Profile shows that the majority of execution time is spent in Barrier called by processes not holding the “potato”
Small messages: PASSED
    Profile clearly shows a single process spends almost all of the total application time in Recv, and receives an excessive number of messages sent by all the other processes
System time: FAILED
    No profile output, due to no MPI calls (other than setup and breakdown)
Wrong way: TOSS-UP
    One process spends most of the execution time in sends; the other spends most of the execution time in receives
    Profile does not reveal the improperly ordered communication pattern


Evaluation (1)

Available metrics: 1/5
    Only provides a handful of statistics about time, message size, and frequency of MPI calls
    No hardware counter support
Cost: 5/5 (free)
Documentation quality: 4/5
    Though brief (a single webpage), documentation adequately covers installation and available functionality
Extensibility: 2/5
    mpiP is designed to use the MPI profiling layer, so it could not be readily adapted to UPC or SHMEM and would be of little use there
    The source code correlation functions work well
Filtering and aggregation: 2/5
    mpiP was designed to be lightweight, and presents statistics for the top twenty callsites
    Output size grows with number of tasks (machines)
Hardware support: 5/5
    64-bit Linux (Itanium and Opteron), IBM SP (AIX), AlphaServer (Tru64), Cray X1, Cray XD1, SGI Altix, IBM BlueGene/L
Heterogeneity support: 0/5 (not supported)


Evaluation (2)

Installation: 5/5
    About as easy as you could expect
Interoperability: 1/5
    mpiP has its own output format
Learning curve: 4/5
    Easy to use
    Simple statistics are easily understood
Manual overhead: 1/5
    All MPI calls are automatically instrumented for you when linking against the mpiP library
    No way to turn tracing on or off in places without relinking
Measurement accuracy: 4/5
    CAMEL overhead: ~5%
    Correctness of programs is not affected
    Overhead is low (less than 7% for all test suite programs)


Evaluation (3)

Multiple executions: 0/5 (not supported)
Multiple analyses & views: 2/5
    Statistics regarding MPI calls are displayed in the output file
    Source code location to callsite correlation is provided by Mpipview
Performance bottleneck identification: 2.5/5
    No automatic methods supported
    Some bottlenecks could be deduced by examining gathered statistics
    Lack of trace information makes some bottlenecks impossible to detect
Profiling/tracing support: 2/5
    Only supports profiling
    Profiling can be enabled for various regions of code by editing source code
    Turning profiling on or off requires recompilation
    (The documentation gives a runtime environment variable for deactivating profiling; when set, it is acknowledged in the profile output file, but profiling is not actually disabled)


Evaluation (4)

Response time: 3/5
    No results until after the run
    Quickly assembles the report at the end of the experimentation run
Searching: 0/5 (not supported)
Software support: 3/5
    Supports C, C++, Fortran
    Supports a large number of compilers
    Tied closely to MPI applications
Source code correlation: 4/5
    Line numbers of source code are provided for each MPI callsite in the output file
    Automatic source code correlation is provided by Mpipview
System stability: 5/5
    mpiP and Mpipview work very reliably
Technical support: 5/5
    Co-author Chris Chambreau responded quickly and provided good information, allowing us to correct a problem with one of our benchmark apps

Date post:	11-Jan-2016