Initial Design of a Test Suite
for Automatic Performance Analysis Tools
Bernd Mohr
Forschungszentrum Jülich
John von Neumann - Institut für Computing
Germany
Jesper Larsson Träff
NEC Europe Ltd.
C&C Research Labs
© 2003 Forschungszentrum Jülich, NIC-ZAM [2]
IST Working Group APART (since 1999)
Automatic Performance Analysis: Resources and Tools
• Forum for scientists and vendors
• About 20 partners in Europe and the U.S.
• http://www.fz-juelich.de/apart/
• Current Automatic Performance Tools Projects
  • Askalon http://www.par.univie.ac.at/project/askalon/
  • Kappa-Pi http://www.caos.uab.es/kpi.html
  • KOJAK http://www.fz-juelich.de/zam/kojak
  • Paradyn http://www.cs.wisc.edu/~paradyn/
  • Peridot http://wwwbode.cs.tum.edu/~gerndt/peridot/
(Full, Associated, and Former) Members
• European Research Centers and Universities
• U.S. Research Centers and Universities
• Vendors
APART Terminology
• Performance Property
  • Aspect of performance behavior of an application
    – E.g., communication dominated by waiting time
  • Specified as condition referring to performance data
  • Quantified and normalized in terms of behavior-independent metric (severity)
• Performance Problem
  • Performance property with “negative” implications
• Performance Bottleneck
  • Performance problem with highest severity
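The relationship between property, severity, and bottleneck can be illustrated with a short C sketch. It is only a model under stated assumptions: `property_result_t` and `find_bottleneck` are hypothetical names that do not come from APART, and every property with positive severity is treated as a performance problem for simplicity.

```c
#include <stddef.h>

/* Hypothetical record: one evaluated performance property. */
typedef struct {
    const char *name;      /* e.g. "late_sender" */
    double      severity;  /* normalized; > 0 is taken to mean the property holds */
} property_result_t;

/* Returns the index of the property with the highest positive severity
 * (the bottleneck in the terminology above), or -1 if no property holds. */
int find_bottleneck(const property_result_t *r, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; ++i) {
        if (r[i].severity > 0.0 &&
            (best < 0 || r[i].severity > r[best].severity)) {
            best = (int)i;
        }
    }
    return best;
}
```

Because severity is normalized, results for different properties can be compared directly, which is what makes a single "highest severity" well defined.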
Example: Performance Property “Message in Wrong Order”
[Figure: time-line diagram (location vs. time) of processes A, B, and C exchanging SEND and RECV operations, with waiting time marked at a RECV]
The APART Test Suite (ATS)
• Users rely on correct working of tools
  → Tools need to be especially well tested
  → Systematic approach needed
• APART Test Suite
  • Common project inside APART group
    – Every member needs this, so a shared suite minimizes resources
    – Ensures re-usability
    – Will also allow evaluation / comparison of the different member projects
  • Main focus: automatic performance analysis tools
  • But also useful for “regular” performance tools
    – http://www.fz-juelich.de/apart/ats/
Desired Functionality
• Tests to determine whether the semantics of the original program were altered
• Tests to see whether the recorded performance data is correct
• Synthetic positive test cases for each known and defined performance property and combinations of them
• Negative test cases which have no known performance problem
• “Real world” size parallel applications and benchmarks
• Can be partially based on existing validation suites
• Probably needs to be tool specific
• Collect available benchmarks and applications
Design and Implementation of an ATS Framework
Validation Suites and Kernel Benchmarks (I)
Validation
• MPI test / validation suites from Intel, IBM, ANL
  • http://www-unix.mcs.anl.gov/mpi/mpi-test/tsuite.html
MPI Benchmarks
• PARKBENCH (PARallel Kernels and BENCHmarks)
  • http://www.netlib.org/parkbench/
• PMB - Pallas MPI Benchmarks
  • http://www.pallas.com/e/products/pmb/
• SKaMPI (Special Karlsruher MPI – Benchmark)
  • http://liinwww.ira.uka.de/~skampi/
Kernel Benchmarks (II)
OpenMP Benchmarks
• EPCC OpenMP Microbenchmarks
  • http://www.epcc.ed.ac.uk/… research/openmpbench/openmp_index.html
Hybrid Benchmarks
• The Los Alamos MicroBenchmarks Suite (LAMB)
  • MPI and multi-threading (Pthreads and OpenMP) programming models based on SKaMPI and EPCC
“Real World” Applications and Benchmarks
• The NAS Parallel Benchmarks (NPB)
  • http://www.nas.nasa.gov/Software/NPB/
• The ASCI Purple and Blue Benchmark Codes
  • http://www.llnl.gov/… asci/purple/benchmarks/limited/code_list.html
  • … asci_benchmarks/asci/asci_code_list.html
• NCAR Benchmarks
  • http://www.scd.ucar.edu/css/software/bench/
Current Design of ATS Framework
[Figure: module diagram of the ATS framework]
• DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()
• WORK: do_work()
The Distribution Module
• Distribution specified by
  • Distribution function
  • Distribution parameters
• All distribution functions have the same signature
  • double distr_func (int me, int size, double sf, distr_t* dd)
    – me, size: member me of group of size size
    – sf: scaling factor
    – dd: distribution parameter descriptor
  • returns value for me calculated based on me, size, and dd, scaled by sf
• ATS provides set of predefined distribution functions
  • Can easily be extended if needed
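To make the common signature concrete, here is a minimal sketch of two distribution functions written against it. This is not the ATS source: the `sketch_` names and `val1_distr_t` are hypothetical, `distr_t` is assumed to be an opaque (`void`) descriptor type, and the choice that even-numbered members receive `low` is an assumption. The `low` and `high` fields of `val2_distr_t` follow the names used in the `late_sender` example.

```c
/* Assumption: distr_t is an opaque descriptor; the real ATS typedef is not shown. */
typedef void distr_t;

/* Common signature shared by all distribution functions. */
typedef double (*distr_func_t)(int me, int size, double sf, distr_t *dd);

typedef struct { double val; } val1_distr_t;               /* hypothetical */
typedef struct { double low; double high; } val2_distr_t;  /* field names as in late_sender */

/* Every member gets the same value. */
double sketch_df_same(int me, int size, double sf, distr_t *dd) {
    const val1_distr_t *d = (const val1_distr_t *)dd;
    (void)me; (void)size;            /* unused for this distribution */
    return sf * d->val;
}

/* Members alternate between low and high (assumed: even members get low). */
double sketch_df_cyclic2(int me, int size, double sf, distr_t *dd) {
    const val2_distr_t *d = (const val2_distr_t *)dd;
    (void)size;                      /* unused for this distribution */
    return sf * ((me % 2 == 0) ? d->low : d->high);
}
```

Because every function shares one signature, callers can hold any of them in a `distr_func_t` variable and swap distributions without changing the calling code.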
Predefined Distribution Functions
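[Figure: value assigned to each group member by the predefined distribution functions. Parameters shown per panel: same (val); block2, cyclic2, linear, and peak (low, high); block3 and cyclic3 (low, med, high). The figure also carries a label n.]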
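The shapes suggested by the names block2 and linear can be sketched as follows. The exact definitions in ATS are not given on the slides, so the formulas are assumptions: block2 is taken to give `low` to the first half of the group and `high` to the rest, and linear is taken to interpolate from `low` at member 0 to `high` at the last member. The `sketch_` function names are hypothetical.

```c
/* Two-value parameter descriptor; field names follow the late_sender example. */
typedef struct { double low; double high; } val2_distr_t;

/* Assumed block2 shape: first half of the group gets low, second half gets high. */
double sketch_block2(int me, int size, double sf, const val2_distr_t *dd) {
    return sf * ((me < size / 2) ? dd->low : dd->high);
}

/* Assumed linear shape: values rise evenly from low (member 0)
 * to high (member size-1). */
double sketch_linear(int me, int size, double sf, const val2_distr_t *dd) {
    if (size <= 1) {
        return sf * dd->low;         /* a single member has nothing to interpolate */
    }
    return sf * (dd->low + (dd->high - dd->low) * me / (size - 1));
}
```

With `low = 1.0`, `high = 3.0`, and five members, the linear sketch yields 1.0, 1.5, 2.0, 2.5, 3.0 for members 0 to 4.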
Current Design of ATS Framework
[Figure: module diagram of the ATS framework, extended by MPI and OpenMP modules]
• DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()
• WORK: do_work()
• MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift()
• OpenMP UTILS: par_do_omp_work()
• MPI PROPERTIES
• OpenMP PROPERTIES
Example: MPI Property Function late_sender
/* Each member of communicator c does the amount of work the
   distribution function assigns to it. */
void par_do_mpi_work(distr_func_t df, distr_t* dd, MPI_Comm c) {
  int me, sz;
  MPI_Comm_rank(c, &me);
  MPI_Comm_size(c, &sz);
  do_work(df(me, sz, 1.0, dd));
}

/* Repeats r times: unevenly distributed work followed by a
   send/receive pattern. */
void late_sender(double bwork, double ework, int r, MPI_Comm c) {
  val2_distr_t dd;
  int i;
  mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt);
  dd.low = bwork+ework;
  dd.high = bwork;

  for (i = 0; i<r; ++i) {
    par_do_mpi_work(df_cyclic2, &dd, c);
    mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c);
  }
  free_mpi_buf(buf);
}
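The example relies on `do_work()`, whose implementation the slides do not show. A minimal sketch of such a function follows, assuming its argument is a duration in seconds and that busy-waiting on CPU time is acceptable. The name `do_work_sketch` is hypothetical, and the real ATS routine may work quite differently.

```c
#include <time.h>

/* Hypothetical stand-in for do_work(): keeps the CPU busy until roughly
 * `secs` seconds of processor time have been consumed.
 * Assumption: the work amount is expressed in seconds. */
void do_work_sketch(double secs) {
    clock_t start = clock();
    /* Spin until the elapsed CPU time reaches the requested amount. */
    while ((double)(clock() - start) / CLOCKS_PER_SEC < secs) {
        /* busy wait */
    }
}
```

A busy loop is used rather than a sleep call so that the time shows up as computation in a performance tool, which is the effect a synthetic work routine needs.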
Currently Implemented Performance Property Functions
• MPI Point-to-Point Communication Performance Properties
  • late_sender(basework, extrawork, rf, MPI_Comm);
  • late_receiver(basework, extrawork, rf, MPI_Comm);
• MPI Collective Communication Performance Properties
  • imbalance_at_mpi_barrier(distr_func, distr_param, rf, MPI_Comm);
  • imbalance_at_mpi_alltoall(distr_func, distr_param, rf, MPI_Comm);
  • late_broadcast(basework, rootextrawork, root, rf, MPI_Comm);
  • late_scatter(basework, rootextrawork, root, rf, MPI_Comm);
  • late_scatterv(basework, rootextrawork, root, rf, MPI_Comm);
  • early_reduce(rootwork, baseextrawork, root, rf, MPI_Comm);
  • early_gather(rootwork, baseextrawork, root, rf, MPI_Comm);
  • early_gatherv(rootwork, baseextrawork, root, rf, MPI_Comm);
• OpenMP Performance Properties
  • imbalance_in_parallel_region(distr_func, distr_param, rf);
  • imbalance_at_barrier(distr_func, distr_param, rf);
  • imbalance_in_loop(distr_func, distr_param, rf);
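For the imbalance properties, the waiting time a tool ought to report can be estimated in advance from the work distribution. The sketch below assumes an idealized barrier: every member waits until the slowest one arrives, and the cost of the barrier itself is ignored. The function name `expected_barrier_wait` is hypothetical and not part of ATS.

```c
/* Idealized model: member `me` waits at the barrier for the difference
 * between the largest work time in the group and its own work time.
 * work[i] is the work time of member i; size is the group size. */
double expected_barrier_wait(const double *work, int size, int me) {
    double max = work[0];
    for (int i = 1; i < size; ++i) {
        if (work[i] > max) {
            max = work[i];
        }
    }
    return max - work[me];
}
```

Comparing such a prediction with what an analysis tool actually measures is one way a synthetic test case can serve as a positive test.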
Current Design of ATS Framework
[Figure: module diagram of the ATS framework, extended by test programs]
• DISTRIBUTION: df_same(), df_cyclic2(), df_block2(), df_linear(), df_peak(), df_cyclic3(), df_block3()
• WORK: do_work()
• MPI UTILS: par_do_mpi_work(), alloc_mpi_buf(), free_mpi_buf(), alloc_mpi_vbuf(), free_mpi_vbuf(), mpi_commpattern_sendrecv(), mpi_commpattern_shift()
• OpenMP UTILS: par_do_omp_work()
• MPI PROPERTIES
• OpenMP PROPERTIES
• TEST PROGRAMS
Performance Property Test Programs
• Single performance property testing
  • Programs can be generated automatically from performance property function signature
    – Generator based on Program Database Toolkit (PDT)
    – http://www.cs.uoregon.edu/research/paracomp/pdtoolkit/
  • Property parameters become test program arguments
  • More extensive tests through scripting languages or experiment management system (e.g., Zenturio)
    – http://www.par.univie.ac.at/project/zenturio/
• Composite performance property testing
  • Program containing multiple performance property functions
  • Complexity only limited by imagination
  • Currently: manually implemented
Example: Single Performance Property Test Program
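#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* atoi */
#include "mpi_pattern.h"

int main(int argc, char *argv[]) {
  /* Default distribution spec and repetition factor */
  distr_func_t df = atodf("b2:0.5:1.0");
  distr_t *dd = atodd("b2:0.5:1.0");
  int r = 1;

  MPI_Init(&argc, &argv);

  /* Cases fall through on purpose: argc == 3 also sets what argc == 2 sets */
  switch ( argc ) {
    case 3: r = atoi(argv[2]);    /* fall through */
    case 2: df = atodf(argv[1]);
            dd = atodd(argv[1]);  /* fall through */
    case 1: break;
    default:
      /* Too many arguments: report usage and stop instead of running */
      fprintf(stderr, "usage: %s <distf> <rfac>\n", argv[0]);
      MPI_Finalize();
      return 1;
  }

  imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}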
Example: Single Performance Property Test Program
• imbalance_at_mpi_barrier <distribution-spec> <repetition-factor>
  • Example arguments: b2:0.5:1.0 2 and b2:0.1:2.0 5
• Problem: additional property “MPI Setup/Termination Overhead” also holds!
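The distribution spec passed on the command line, such as b2:0.5:1.0, appears to consist of a function name followed by colon-separated parameters. How `atodf()` and `atodd()` actually parse it is not shown on the slides; the following is only a sketch of parsing a two-parameter spec, with `spec2_t` and `parse_spec2` as hypothetical names and the format inferred from the example arguments.

```c
#include <stdio.h>

/* Hypothetical parsed form of a spec like "b2:0.5:1.0". */
typedef struct {
    char   name[16];  /* distribution name, e.g. "b2" */
    double low;       /* first parameter */
    double high;      /* second parameter */
} spec2_t;

/* Parses "<name>:<low>:<high>". Returns 1 on success, 0 if the string
 * is malformed (missing fields or trailing characters). */
int parse_spec2(const char *s, spec2_t *out) {
    char extra;
    /* %c only matches if something follows the third field, so a valid
     * spec converts exactly three items. */
    int n = sscanf(s, "%15[^:]:%lf:%lf%c",
                   out->name, &out->low, &out->high, &extra);
    return n == 3;
}
```

Rejecting trailing characters keeps a mistyped spec from silently running a test with unintended parameters.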
Example: Collection of MPI Performance Properties
Examples: Detail MPI Properties
Example: MPI Properties in 2 Communicators
EXPERT Analysis of MPI 2 Communicator Example
Example: OpenMP Performance Property
ATS: Status and Future Work
• Initial prototype available from APART website
  • List of MPI, OpenMP, and hybrid validation and benchmark suites
  • 1st version of ATS framework including
    – C version of code
    – Single property test program generator
• Future Work
  • More complete collection of validation and benchmark suites
  • Real “real world” applications
  • ATS Framework
    – Fortran version
    – More complete list of property functions for MPI, OpenMP, hybrid, and sequential performance properties
    – Documentation