2010 02 instrumentation_and_runtime_measurement

Date post: 29-Nov-2014
Upload: ptihpa

Instrumentation and Run-Time Measurement

VampirTrace

Overview

• Instrumentation

– Automatic, manual and binary instrumentation

• Run-time measurement

– Behind the scenes, post-processing

– Trace file format, overhead

• Options, settings, parameters

– Environment variables

– PAPI hardware performance counters

– Memory allocation counters, application I/O calls

– Filtering, grouping

• FAQ and Issues

INSTRUMENTATION

Instrumentation in General

Edit – Compile – Run Cycle

Source Code → Compiler → Binary → Run → Results

Edit – Compile – Run Cycle with VampirTrace

Source Code → VT Wrapper (Compiler) → Binary → Run → Results + Traces

Compiler Wrappers

• Easiest way of using VampirTrace

• No source code modifications

• In the build system of your application, substitute calls to the regular compiler with calls to the VampirTrace compiler wrappers

– For compiling and linking

– e.g. in the makefile change icc to vtcc

• Rebuild the application

• Run the application to produce trace data

Instrumentation & Measurement

• What do you need to do for it?

– VampirTrace and a supported compiler

• Instrumentation (automatic with compiler wrappers)

• Re-compile & re-link

• Trace Run (run with appropriate test data set)

• More details later

Original makefile settings:

CC = icc
CXX = icpc
F90 = ifc
MPICC = mpicc

With VampirTrace compiler wrappers:

CC = vtcc
CXX = vtcxx
F90 = vtf90
MPICC = vtcc -vt:cc mpicc

Compiler Wrappers

Captured events:

• All user function entries and exits

– If supported by the compiler (Intel, GNU, PGI, NEC, IBM)

• MPI calls and messages

– If the application is MPI parallel

• OMP regions

– If the application is OpenMP parallel

Manual Instrumentation

• Allows for detailed source code instrumentation

– e.g. regions of functions such as loops

• Can be combined with automatic instrumentation

• Be sure to instrument all function exits!

– Otherwise post-mortem analysis will fail

• I personally consider this advanced usage of VampirTrace!

Manual Instrumentation

• Add the following to your source code to instrument a region, e.g. in C (also available for C++ and Fortran):

#include "vt_user.h"
...
VT_USER_START("Region_1");
...
VT_USER_END("Region_1");
...

• Compile with “-DVTRACE”

– Otherwise, the VampirTrace macros expand to empty blocks, producing zero overhead

vtcc -vt:inst manual prog.c -DVTRACE -o prog
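
A minimal sketch of a manually instrumented region inside a complete function. The function name sum_of_squares and the region name "sum_loop" are made up for illustration. The #else branch is an assumption of this sketch, not part of VampirTrace: it supplies no-op stand-ins so the file also builds with a plain compiler when neither -DVTRACE nor vt_user.h is available.

```c
#ifdef VTRACE
#include "vt_user.h"            /* VampirTrace user API */
#else
/* Assumption for this sketch: stand-ins so the file builds without VampirTrace. */
#define VT_USER_START(name) ((void)0)
#define VT_USER_END(name)   ((void)0)
#endif

/* Hypothetical example function: only the loop is recorded as "sum_loop". */
long sum_of_squares(int n)
{
    long sum = 0;

    VT_USER_START("sum_loop");
    for (int i = 1; i <= n; i++) {
        sum += (long)i * i;
    }
    VT_USER_END("sum_loop");    /* the region is closed before the only return */

    return sum;
}
```

The region has a single exit, which keeps START and END paired. A function with early returns would need a VT_USER_END before each of them.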

Binary Instrumentation

• Using DYNINST

– http://www.dyninst.org

• Source should be compiled with “-g” switch

• “vtunify” has to be run manually afterwards

vtcc -vt:inst dyninst prog.c -o prog

RUN-TIME MEASUREMENT

Behind the Scenes

Unifying - Post-Processing

OTF Open Trace Format

Tracing Overhead

Workflow

1) Instrumentation

– Hide instrumentation in compiler wrappers

– Use underlying compiler and add appropriate options

2) Test Run

– Use representative test input

– Set parameters, environment variables, etc.

– Selective tracing

3) Get Trace

Before: CC=mpicc
After: CC=vtcc -vt:cc mpicc

Automatic Function Tracing

• Uses compiler support to add tracing calls at every function entry and exit

• Compilers supported:

– GNU, Intel, PGI, PathScale, IBM, Sun Fortran, NEC

• Binary instrumentation via Dyninst

MPI and OpenMP Tracing

• Tracing of MPI-1 and MPI-IO events via PMPI interface

• Tracing of OpenMP directives via OPARI source-to-source instrumentation

Hardware Performance Counter

• Recording PAPI counter(s) at every function entry / exit

• PAPI allows access to hardware (mostly CPU) counters, e.g. floating point operations, cache misses, exceptions

• Can derive rates, e.g. GFlop/s of each function
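
How a rate follows from two counter samples can be sketched in a few lines. This is an illustration of the arithmetic only, not VampirTrace or Vampir code; the function name gflops and its parameters are hypothetical. It assumes a floating-point operation counter (such as PAPI_FP_OPS) sampled at function entry and exit, together with timestamps in seconds.

```c
#include <stdint.h>

/* Rate in GFlop/s from two samples of a floating-point operation counter
   taken at function entry and exit. Timestamps are in seconds.
   Returns 0.0 if no time has elapsed between the samples. */
double gflops(uint64_t fp_ops_enter, uint64_t fp_ops_exit,
              double t_enter_s, double t_exit_s)
{
    double elapsed = t_exit_s - t_enter_s;
    if (elapsed <= 0.0) {
        return 0.0;
    }
    /* operations per second, scaled to 10^9 operations per second */
    return (double)(fp_ops_exit - fp_ops_enter) / elapsed / 1e9;
}
```

For example, 2,000,000,000 operations counted over one second correspond to 2 GFlop/s.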

Memory and I/O Tracing

• Tracing of memory allocation calls via libc built-in hooks

– malloc, realloc, free, …

• Tracing of I/O calls, accessed files, transferred data volume via wrappers for I/O calls

– open, read, write, …

Instrumentation & Measurement

What does VampirTrace do in the background?

• Trace Run:

– Event data collection

– Precise time measurement

– Parallel timer synchronization

– Collecting parallel process/thread traces

– Collecting performance counters: from PAPI, memory usage, POSIX I/O calls, fork/system/exec calls, and more …

– Filtering and grouping of function calls

Behind the Scenes

• Trace data is written to a buffer in memory first

• When this buffer is full, data is flushed to storage

• After the application has run to completion, these trace files are unified to produce the final OTF trace

• Most aspects of this behavior can be customized with environment variables
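
The buffer-then-flush behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not VampirTrace's implementation: the type trace_buffer, the functions trace_record and trace_flush, and the tiny buffer size are all invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#define TRACE_BUF_EVENTS 4   /* tiny on purpose; real buffers are far larger */

typedef struct {
    unsigned long events[TRACE_BUF_EVENTS]; /* in-memory event buffer */
    size_t used;                            /* events currently buffered */
    size_t flushes;                         /* how often the buffer was flushed */
    FILE *out;                              /* intermediate trace file, may be NULL */
} trace_buffer;

/* Write buffered events to storage (if a file is attached) and empty the buffer. */
void trace_flush(trace_buffer *tb)
{
    if (tb->used == 0) {
        return;
    }
    if (tb->out != NULL) {
        fwrite(tb->events, sizeof tb->events[0], tb->used, tb->out);
    }
    tb->used = 0;
    tb->flushes++;
}

/* Record one event; flush first if the buffer is already full. */
void trace_record(trace_buffer *tb, unsigned long event)
{
    if (tb->used == TRACE_BUF_EVENTS) {
        trace_flush(tb);
    }
    tb->events[tb->used++] = event;
}
```

A final trace_flush call at program end writes whatever is still buffered. In this model a larger buffer means fewer flushes during the run, at the cost of more memory.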

File-based Workflow

Unifying - Post-Processing

• Normally, trace data is unified automatically after the application has run to completion

• This takes time, depending on the amount of trace data

• Can be switched off by an environment variable

• vtunify <number-of-trace-files> <trace-file-prefix>

vtunify 16 my_trace

How to Store Trace Data - Trace File

Various trace file formats (for HPC):

– VTF3 (TU Dresden)

– Tau Trace Format (Univ. of Oregon, LANL and JSC/Jülich)

– EPILOG (JSC/Jülich/Germany)

– STF (Pallas GmbH, now Intel)

– OTF (TU Dresden)

• ASCII or binary file formats

• single/multiple file(s) per trace

• merge process traces to single file

• multiple streams for parallel/selective I/O

OTF – Open Trace Format

• Open source trace file format

– Available from the homepage of TU Dresden, ZIH: http://www.tu-dresden.de/zih/otf/

• Includes powerful libotf for use in custom applications

• API / Interfaces

– High level interface for analysis tools

– Low level interface for trace libraries

• Actively developed

– In cooperation with the University of Oregon and Lawrence Livermore National Laboratory

Tracing Overhead

• Measured on SGI Altix 4700, Itanium 2 1.6 GHz

• Tracing overhead per function call (from test program with one million function calls, multiple repetitions)

• Suppressed inlining: icc -O2 -ip-no-inlining

                    Intel Trace Collector   VampirTrace
Filtered function   1.04 µs                 0.82 µs
Without PAPI        1.10 µs                 0.92 µs
1 PAPI counter      9.25 µs                 4.47 µs
3 PAPI counters     9.64 µs                 4.61 µs
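
The per-call figures translate into total run-time overhead by simple multiplication. The helper below is a back-of-the-envelope sketch; the function name tracing_overhead_s is hypothetical, and it assumes the overhead per call is constant over the run.

```c
/* Estimated extra run time in seconds: number of traced function calls
   times the measured per-call overhead in microseconds. */
double tracing_overhead_s(double calls, double overhead_us_per_call)
{
    return calls * overhead_us_per_call * 1e-6;
}
```

With the test program's one million calls and a per-call cost of 0.92 µs (one of the "Without PAPI" values above), the estimate comes to roughly 0.92 s of added run time.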

OPTIONS, SETTINGS, PARAMETERS

Environment Variables

PAPI hardware performance counters

Memory allocation counters

Application I/O calls

Filtering

Grouping

Environment Variables

• By default, trace data is written to the current working directory

• Almost all of this can be customized with environment variables

• Environment variables must be set prior to running the application, not prior to building the application

Environment Variables

VT_PFORM_GDIR    Directory where final trace file is stored
VT_PFORM_LDIR    Directory for intermediate trace files
VT_FILE_PREFIX   Trace file name
VT_BUFFER_SIZE   Internal trace buffer size
VT_MAX_FLUSHES   Max number of buffer flushes
VT_MEMTRACE      Enable memory allocation tracing
VT_IOTRACE       Enable I/O tracing
VT_MPITRACE      Enable MPI tracing
VT_FILTER_SPEC   Name of filter file
VT_GROUPS_SPEC   Name of function groups file
VT_COMPRESSION   Compress trace files
VT_METRICS       List of PAPI counters

PAPI Counter

• PAPI counters can be included in traces

– If PAPI is available on the platform

– If VampirTrace was built with PAPI support

• VT_METRICS can be used to specify a colon-separated list of PAPI counters

• VampirTrace versions after 5.8.1 will have a customizable separator, because Component-PAPI counters use colons in their counter names

export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM

Memory Counter

• Memory allocation counters can be included in traces

– If VampirTrace was built with memory allocation support

– If GNU glibc is used on the platform

• Memory functions in glibc such as “malloc” and “free” are traced

• Environment variable VT_MEMTRACE

export VT_MEMTRACE=yes

I/O Counter

• I/O counters can be included in traces

– If VampirTrace was built with I/O tracing support

• Standard I/O calls like “open” and “read” are recorded

• Environment variable VT_IOTRACE

export VT_IOTRACE=yes

User defined Counter

• Records program variables or any other numerical quantity

• Helps find “that one loop iteration” which causes trouble

#include "vt_user.h"

int main() {
    unsigned int i, cid, cgid;

    /* define a counter group, then a counter "i" with unit "#" in that group */
    cgid = VT_COUNT_GROUP_DEF("loopindex");
    cid = VT_COUNT_DEF("i", "#", VT_COUNT_TYPE_UNSIGNED, cgid);

    for (i = 1; i <= 100; i++) {
        /* record the current loop index as the counter value */
        VT_COUNT_UNSIGNED_VAL(cid, i);
    }
    return 0;
}

Function Filtering

• Filtering is one of the ways to reduce trace file size

• Environment variable VT_FILTER_SPEC

%> export VT_FILTER_SPEC=filter.spec

• Filter definition file contains a list of filters

my_*;test_* -- 1000
debug_* -- 0
calculate -- -1
* -- 1000000

• Filter rules can be global to all processes or only be assigned to specific ranks (see the manual for more details of rank specific filtering)

• See also the vtfilter tool

– Can generate a customized filter file

– Can reduce the size of existing trace files
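
How such rules could be evaluated is sketched below, using the rules from the example filter file. This is not VampirTrace's code. It assumes that the number after "--" is a call limit, that 0 means the function is never recorded, that -1 means unlimited, and that the first matching rule applies; the manual is authoritative on these points. The names filter_rule, example_rules and filter_call_limit are hypothetical, and pattern matching uses POSIX fnmatch.

```c
#include <fnmatch.h>
#include <stddef.h>

typedef struct {
    const char *pattern;   /* shell-style wildcard pattern for function names */
    long call_limit;       /* assumed: max recorded calls; 0 = never, -1 = unlimited */
} filter_rule;

/* Rules from the example filter file, with "my_*;test_*" split into two entries. */
static const filter_rule example_rules[] = {
    { "my_*",      1000 },
    { "test_*",    1000 },
    { "debug_*",   0 },
    { "calculate", -1 },
    { "*",         1000000 },
};

/* Return the call limit of the first rule whose pattern matches func,
   or default_limit if no rule matches. */
long filter_call_limit(const filter_rule *rules, size_t n,
                       const char *func, long default_limit)
{
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(rules[i].pattern, func, 0) == 0) {
            return rules[i].call_limit;
        }
    }
    return default_limit;
}
```

Because the catch-all "*" rule comes last, more specific patterns must be listed before it, otherwise they would never be reached under the first-match assumption.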

Switch Tracing On/Off

• Starting and stopping of tracing should be performed with care

• Tracing has to be activated on the same level as it was switched off to ensure the consistency of the trace file

• Useful if your program behaves in an iterative manner or if you are only interested in some parts of your application

• Recompile your source code with the compiler option “-DVTRACE”

#include "vt_user.h"
…
VT_OFF();
for (i = 1; i < 100; i++) {
    /* do something */
}
VT_ON();
…

%> vtcc … -DVTRACE source_code.c …
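
The "same level" rule is easiest to follow when VT_OFF and VT_ON sit in the same function. The sketch below skips tracing for a number of warm-up iterations and traces the remainder. The functions relax and iterate are hypothetical, and the #else branch is an assumption of this sketch that provides no-op stand-ins so the file builds without VampirTrace.

```c
#ifdef VTRACE
#include "vt_user.h"            /* VampirTrace user API */
#else
/* Assumption for this sketch: no-op stand-ins so it builds without VampirTrace. */
#define VT_OFF() ((void)0)
#define VT_ON()  ((void)0)
#endif

/* Hypothetical solver step, used only for illustration. */
static double relax(double x)
{
    return 0.5 * (x + 2.0 / x);
}

/* Run warm-up iterations untraced, then the remaining iterations traced.
   VT_OFF and VT_ON are called from the same function, so tracing resumes
   on the call level where it was switched off. */
double iterate(double x, int warmup, int traced)
{
    VT_OFF();
    for (int i = 0; i < warmup; i++) {
        x = relax(x);
    }
    VT_ON();

    for (int i = 0; i < traced; i++) {
        x = relax(x);
    }
    return x;
}
```

Calling VT_OFF inside relax and VT_ON in iterate would be the pattern to avoid, since the two calls would then sit on different call levels.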

Selective Instrumentation

• Selective instrumentation can help you reduce the size of your trace file so that only the parts of interest are recorded

• One option is to use manual instrumentation instead of automatic instrumentation

• Another option is to modify your Makefile so that automatic instrumentation (the default) is applied only to the source files of interest (functions of interest)

%> vtcc -vt:inst manual … source_code.c

Function Grouping

• Groups can be defined by the user to group related functions

– Groups can be assigned different colors in Vampir, highlighting application behavior

• Environment variable VT_GROUPS_SPEC

• Group file contains a list of groups with associated functions

export VT_GROUPS_SPEC=/path/to/groups.spec

CALC=calculate
MISC=my*;test
UNKNOWN=*

Advanced Performance Monitoring

• CUDA wrapper library

– Based on LD_PRELOAD

– Usable with dynamically linked libraries

– Little overhead (indirection)

– No re-compilation (neither application nor library)

[Figure: Application → Preload-Library (Wrapper-Function, enter / leave) → CUDA (Function, enter / leave)]

Advanced Performance Monitoring

• vtlibwrapgen

– Abstraction layer for process monitoring

– Dynamic and static libraries

– Requires library’s header file only

– Portable

[Figure: build flow with the elements foo.h, monitor-gen, callback.inc.*, libmonitor/src, vt_user.h, make, libmonitor.so]

vtlibwrapgen -g SDL -o SDLwrap.c /usr/include/SDL/*.h

vtlibwrapgen --build --shared -o libSDLwrap SDLwrap.c

LD_PRELOAD=$PWD/libSDLwrap.so <executable>

QUESTIONS?

