Hands-On
QUARRY / VAMPIRSERVER
Using PuTTY for port forwarding
Start VM
• If not done already, start the virtual machine:
1. Start -> All Programs -> MS Virtual PC
2. Next -> Add an existing VM -> Next
3. Browse -> Select Windows XP VM
4. Next -> Finish
5. Start Windows XP VM
• Full-screen mode is found under the Action menu entry
• The VM should resize the resolution automatically
The hands-on is done entirely in the VM
Start PuTTY
Select Quarry, click Load and then Open
Login to compute node
• There is a script in your home folder that should connect you to the correct node:
% ./logon_to_compute_node
Start VampirServer
• Once connected, type:
% vampirserver
Open Second PuTTY
Select Quarry, click Load, but do NOT click Open
Port forwarding
On the left, select SSH and then Tunnels
Source port: 30000
Destination: see the vampirserver output
Click Add
Click Open and then:
% ./logon_to_compute_node
This terminal can be used normally for compile & run commands
Vampir Remote Open
Open the GUI, click on File, then click on Remote Open
Server: 127.0.0.1
Port: 30000
Click on Connect
To avoid waiting for all user home folders to be loaded, add the path manually.
Path: /N/u/hpstrn#### (####: 01-15; see the vampirserver terminal for your specific username)
In your home under Quarry/traces/p_8, click on Semtex_original_8cpu and then Open
Vampir 7 GUI
Take time to get acquainted with the different displays and the options each display offers
Use of VampirTrace
• Instrument your application with VampirTrace
– Edit your Makefile and change the underlying compiler
– Tell VampirTrace the parallelization type of your application
– Optional: Choose the instrumentation type for your application
# Original compilers:
CC = icc
CXX = icpc
F77 = ifort
F90 = ifort
MPICC = icc
MPIF90 = ifort

# Changed to the VampirTrace compiler wrappers:
CC = vtcc
CXX = vtcxx
F77 = vtf77
F90 = vtf90
MPICC = vtcc
MPIF90 = vtf90
-vt:<seq|mpi|mt|hyb>
# seq = sequential
# mpi = parallel (uses MPI)
# mt = parallel (uses OpenMP/POSIX threads)
# hyb = hybrid parallel (MPI+Threads)
-vt:inst <gnu|pgi|sun|xl|ftrace|openuh|manual|dyninst>
# DEFAULT: automatic instrumentation by compiler
# manual: manual instrumentation using VT's API (see the manual)
# dyninst: binary instrumentation using Dyninst
HANDS-ON EXERCISE
Getting to know the GUI
Hands-on: The Ping-Pong Example
• Hands-on: The Ping-Pong example with VampirTrace and Vampir
– Go to the ping_pong.c example program
%> cd ./examples/ping_pong
– Compile and run the pristine version
• Always check that the target application compiles and runs without errors
%> mpicc -g -O3 ping_pong.c -o ping_pong
%> mpirun -np 2 ./ping_pong
– Compile with the VampirTrace compiler wrapper
– Run normally
%> vtcc -vt:cc mpicc -g -O3 ping_pong.c -o ping_pong
%> mpirun -np 2 ./ping_pong
Hands-on: The Ping-Pong Example
– After the trace run, there are additional output files in the working directory:
– The event trace in Open Trace Format (OTF)
• Anchor file *.otf
• Definitions in *.def.z
• Events in *.events.z, one per process/rank/thread by default
• Markers in *.markers.z for advanced usage
– Open *.otf with Vampir
– Command line tools can access or modify OTF traces
2.2K ping_pong.0.def.z
29 ping_pong.0.marker.z
954 ping_pong.1.events.z
935 ping_pong.2.events.z
12 ping_pong.otf
Hands-on: The Ping-Pong Example
Timeline and Profile: time mostly spent in VT init and MPI finish
Time interval indicator: entire time shown
Zoomed to the actual activity
Further zoomed: ping-pong messages become visible; MPI time still dominating! Average message bandwidth shown.
Zoomed to a single message pair: different behavior on both ranks; details for the selected second message
VAMPIR / VAMPIRTRACE HANDS-ON EXERCISE
Guided Exercise with NPB 3.3 BT-MPI
Center for Information Services and High Performance Computing (ZIH)
Hands-on: NPB 3.3 BT-MPI
– Move into the tutorial directory in your home directory
% cd NPB3.3-MPI
– Select the VampirTrace compiler wrappers
% vim config/make.def
-> Comment out line 32, resulting in:
32: #MPIF77 = mpif77
-> Remove the comment from line 38, resulting in:
38: MPIF77 = vtf77 -vt:f77 mpif77
-> Comment out line 88, resulting in:
88: #MPICC = mpicc
-> Remove the comment from line 94, resulting in:
94: MPICC = vtcc -vt:cc mpicc
Hands-on: NPB 3.3 BT-MPI
• Build benchmark
% make clean; make suite
• Launch as MPI application
% cd bin.vampir; export VT_FILE_PREFIX=bt_1_initial
% mpiexec -np 16 bt_W.16
NAS Parallel Benchmarks 3.3 -- BT Benchmark
Size: 24x 24x 24
Iterations: 200 dt: 0.0008000
Number of active processes: 16
Time step 1
...
Time step 180
[0]VampirTrace: Maximum number of buffer flushes reached \
(VT_MAX_FLUSHES=1)
[0]VampirTrace: Tracing switched off permanently
Time step 200
...
Hands-on: NPB 3.3 BT-MPI
• Resulting trace files
% ls -alh
4,1M bt_1_initial.16
4,9K bt_1_initial.16.0.def.z
  29 bt_1_initial.16.0.marker.z
 12M bt_1_initial.16.10.events.z
 12M bt_1_initial.16.1.events.z
 11M bt_1_initial.16.2.events.z
 12M bt_1_initial.16.3.events.z
...
 11M bt_1_initial.16.c.events.z
 12M bt_1_initial.16.d.events.z
 12M bt_1_initial.16.e.events.z
 12M bt_1_initial.16.f.events.z
  66 bt_1_initial.16.otf
• Visualization with Vampir7
Hands-on: NPB 3.3 BT-MPI
• Decrease number of buffer flushes by increasing the buffer size
• Set a new file prefix
• Launch as MPI application
% export VT_FILE_PREFIX=bt_2_buffer_120M
% export VT_MAX_FLUSHES=1 VT_BUFFER_SIZE=120M
% mpiexec -np 16 bt_W.16
Hands-on: NPB 3.3 BT-MPI
(Screenshots: on an SGI Altix4700)
Hands-on: NPB 3.3 BT-MPI
• Generate a filter specification file
% vtfilter -gen -fo filter.txt -r 10 -stats \
  -p bt_2_buffer_120M.otf
• Set a new file prefix and the filter
% export VT_FILE_PREFIX=bt_3_filter
% export VT_FILTER_SPEC=/path/to/filter.txt
• Launch as MPI application
% mpiexec -np 16 bt_W.16
• For reference, a manually written filter file:
matmul_sub*;matvec_sub*;binvcrhs* -- 0
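A filter file pairs semicolon-separated function name patterns with a call limit after `--`; a limit of 0 excludes the matching functions entirely, while a positive limit records only that many calls. A sketch (the `init_*` line is a made-up illustration; see the VampirTrace manual for the full syntax):

```
# record at most 100 calls per function matching init_*
init_* -- 100
# never record the numerical kernels
matmul_sub*;matvec_sub*;binvcrhs* -- 0
```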
Hands-on: NPB 3.3 BT-MPI
(Screenshots: on an SGI Altix4700)
PAPI
• PAPI counters can be included in traces
– If VampirTrace was built with PAPI support
– If PAPI is available on the platform
• VT_METRICS specifies a list of PAPI counters
• See also the PAPI commands papi_avail and papi_command_line
• PAPI is not available on Quarry
– View the provided traces (Large/Small) on your Windows machine
% export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM
Hands-on: NPB 3.3 BT-MPI
• Record I/O and memory counters
• Set a new file prefix
• Launch as MPI application
% export VT_FILE_PREFIX=bt_4_papi
% export VT_MEMTRACE=yes
% export VT_IOTRACE=yes
% mpiexec -np 16 bt_W.16
Hands-on: NPB 3.3 BT-MPI
(Screenshot: on an SGI Altix4700)
FREE TRAINING
Examples:
Filtering: filter_mpi_omp
Instrumenting: instrument_ring
Profiling: profile_heat
Mixed: Cannon
examples/filter_mpi_omp
• Look into the source code
– Artificial example made of three parts
• Matrix multiply, MPI-parallelized
• Matrix multiply, OpenMP-parallelized
• Dummy functions
• Use automatic instrumentation and visualize
• Filter out the dummy functions, run & visualize
• Create a group filter for the dummy functions and the matrix multiply functions
– Do not forget to switch off the function filter
examples/instrument_ring
• Look at the source code and makefiles
• Run and visualize both versions
• Add additional instrumentation for the while loop
• Run and visualize again
examples/profile_heat
• Compile via "make all"
• export GMON_OUT_PREFIX=name
• Run the binaries (change the prefix in between)
• Use gprof to combine the profiles: gprof -s
• View the output: gprof [-b] sum.txt | less
WRAP-UP
HOW TO SOLVE ISSUES WHEN USING VAMPIRTRACE
For more details on VampirTrace and its features see also the manual.
Incomplete Traces
• Issue: Tracing was switched off because the internal trace buffer was too small
• Result:
– Asynchronous behavior of the application due to buffer flushes of the measurement system
– No tracing information available after the flush operation
– Huge overhead due to the flush operation
[0]VampirTrace: Maximum number of buffer flushes reached \
(VT_MAX_FLUSHES=1)
[0]VampirTrace: Tracing switched off permanently
Incomplete Traces - Solutions
• Increase the trace buffer size
• Increase the number of allowed buffer flushes (not recommended)
• Use filter mechanisms to reduce the number of recorded events
• Switch tracing on/off if your application works in an iterative manner to reduce the number of recorded events
%> export VT_BUFFER_SIZE=150M
%> export VT_MAX_FLUSHES=2
%> export VT_FILTER_SPEC=$HOME/filter.spec
Way too large Traces
• Issue:
– Each function entry/exit, MPI event was recorded
• Result:
– Trace files become large even for short application runs
• Solutions:
– Use filter mechanisms to reduce the number of recorded events
– Use selective instrumentation of your application
– Switch tracing on/off if your application works in an iterative manner to reduce the number of recorded events
Overhead
• Issue:
– Runtime filtering will be called for each event
• Result:
– Runtime filtering increases the runtime overhead
• Solutions:
– Use selective instrumentation of your application
– Use manual source instrumentation (high effort, error prone)
– Only instrument interesting source files with VampirTrace
– Switch tracing on/off if your application works in an iterative manner to reduce the number of recorded events
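Instrumenting only the interesting source files can be arranged in the Makefile by compiling just those files with the VampirTrace wrapper. A sketch with hypothetical file names (`solver.c`, `io.c` are not from this tutorial):

```
# solver.c is interesting: compile with the VampirTrace wrapper
solver.o: solver.c
	vtcc -vt:cc mpicc -c solver.c
# io.c is not: compile with the plain MPI compiler, stays uninstrumented
io.o: io.c
	mpicc -c io.c
```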
Additional Information Needed
• Issue:
– I'm interested in more events and hardware counters. What do I have to do?
• Solutions:
– Use the environment option VT_METRICS to enable recording of additional hardware counters like PAPI, CPC, or NEC, if available.
– Use the environment option VT_RUSAGE to record the Unix resource usage counters.
– Use the environment option VT_MEMTRACE, if available on your system, to intercept the libc allocation functions and record memory allocation information.
– For more additional events and recorded hardware information, see chapter 4 in the VampirTrace manual.
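The options above could be combined like this (a sketch; `VT_RUSAGE=all` and the counter names are examples — check availability with `papi_avail` and the VampirTrace manual):

```shell
export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM  # hardware counters via PAPI
export VT_RUSAGE=all                       # Unix resource usage counters
export VT_MEMTRACE=yes                     # record libc memory allocation
```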
Thanks for your attention.