Page 1: 2010 05 hands_on

HandsOn

Page 2: 2010 05 hands_on

QUARRY / VAMPIRSERVER
Using PuTTY for port forwarding

Page 3: 2010 05 hands_on

Start VM

• If not done already – start the virtual machine:

1. Start → All Programs → MS Virtual PC

2. Next → Add an existing VM → Next

3. Browse → Select Windows XP VM

4. Next → Finish

5. Start Windows XP VM

• Full-screen mode is found under the Action menu

• The VM should resize the resolution automatically

The hands-on is done entirely in the VM

Page 4: 2010 05 hands_on

Start PuTTY

Select Quarry, click Load and then Open

Page 5: 2010 05 hands_on

Login to compute-node

• There is a script in your home-folder that should connect you to the correct node:

% ./logon_to_compute_node

Page 6: 2010 05 hands_on

Start VampirServer

Once connected to the compute node, type:

% vampirserver

Page 7: 2010 05 hands_on

Open Second PuTTY

Select Quarry and click Load, but do NOT click Open

Page 8: 2010 05 hands_on

Port forwarding

On the left, select SSH and then Tunnels

Page 9: 2010 05 hands_on

Port forwarding

Source port: 30000
Destination: see the vampirserver terminal
Click Add

Page 10: 2010 05 hands_on

Port forwarding

Click Open and then:

% ./logon_to_compute_node

This terminal can be used normally for compile and run commands

Page 11: 2010 05 hands_on

Vampir Remote Open

Open the Vampir GUI, click File, then Remote Open

Page 12: 2010 05 hands_on

Vampir Remote Open

Server: 127.0.0.1
Port: 30000
Click on Connect

Page 13: 2010 05 hands_on

Vampir Remote Open

To avoid having to wait for all user home folders to load, add the path manually.
Path: /N/u/hpstrn#### (####: 01-15; see the vampirserver terminal for your specific username)

Page 14: 2010 05 hands_on

Vampir Remote Open

In your Home under Quarry/traces/p_8, click on Semtex_original_8cpu and then Open

Page 15: 2010 05 hands_on

Vampir 7 GUI

Take time to get acquainted with the different displays and the options each display offers

Page 16: 2010 05 hands_on

Use of VampirTrace

• Instrument your application with VampirTrace

– Edit your Makefile and change the underlying compiler

– Tell VampirTrace the parallelization type of your application

– Optional: Choose instrumentation type for your application

Before (original compilers):

CC     = icc
CXX    = icpc
F77    = ifort
F90    = ifort
MPICC  = icc
MPIF90 = ifort

After (VampirTrace wrappers):

CC     = vtcc
CXX    = vtcxx
F77    = vtf77
F90    = vtf90
MPICC  = vtcc
MPIF90 = vtf90

-vt:<seq|mpi|mt|hyb>

# seq = sequential

# mpi = parallel (uses MPI)

# mt = parallel (uses OpenMP/POSIX threads)

# hyb = hybrid parallel (MPI+Threads)

-vt:inst <gnu|pgi|sun|xl|ftrace|openuh|manual|dyninst>

# DEFAULT: automatic instrumentation by compiler

# manual: manual instrumentation using VT’s API (see the manual)

# dyninst: binary instrumentation using Dyninst
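Putting the pieces from this slide together, a wrapped MPI compile entry in a Makefile might look like the following sketch (illustrative only: -vt:cc names the underlying compiler, -vt:mpi states the parallelization type, and the default compiler instrumentation is kept):

MPICC = vtcc -vt:cc mpicc -vt:mpi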

Page 17: 2010 05 hands_on

HANDS-ON EXERCISE
Getting to know the GUI

Page 18: 2010 05 hands_on

Hands-on: The Ping-Pong Example

• Hands-on: the ping-pong example with VampirTrace and Vampir

– Go to the ping_pong.c example program

– Compile and run the pristine version
  • Always check that the target application compiles and runs without errors

– Compile with the VampirTrace compiler wrapper

– Run normally

%> cd ./examples/ping_pong
%> mpicc -g -O3 ping_pong.c -o ping_pong
%> mpirun -np 2 ./ping_pong
%> vtcc -vt:cc mpicc -g -O3 ping_pong.c -o ping_pong
%> mpirun -np 2 ./ping_pong
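The contents of ping_pong.c are not reproduced on these slides; for orientation, a minimal MPI ping-pong between ranks 0 and 1 typically looks like the sketch below (message size, repetition count, and timing details will differ from the actual example):

/* minimal_ping_pong.c -- illustrative sketch, not the shipped example */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, msg = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 1000; i++) {
        if (rank == 0) {
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);                    /* ping */
            MPI_Recv(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); /* pong */
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("done after %d round trips\n", i);

    MPI_Finalize();
    return 0;
}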

Page 19: 2010 05 hands_on

Hands-on: The Ping-Pong Example

– After the trace run, there are additional output files in the working directory:
– The event trace is stored in the Open Trace Format (OTF)
  • Anchor file *.otf
  • Definitions in *.def.z
  • Events in *.events.z, one per process/rank/thread by default
  • Markers in *.marker.z, for advanced usage
– Open the *.otf file with Vampir
– Command line tools are available to access or modify OTF traces

2.2K ping_pong.0.def.z

29 ping_pong.0.marker.z

954 ping_pong.1.events.z

935 ping_pong.2.events.z

12 ping_pong.otf

Page 20: 2010 05 hands_on

Hands-on: The Ping-Pong Example

Timeline and Profile: time mostly spent in VT init and MPI finish

Time interval indicator: entire time shown

Page 21: 2010 05 hands_on

Hands-on: The Ping-Pong Example

Zoomed to the actual activity

Page 22: 2010 05 hands_on

Hands-on: The Ping-Pong Example

further zoomed, ping-pong messages become visible

MPI time still dominating!

average message bandwidth

Page 23: 2010 05 hands_on

Hands-on: The Ping-Pong Example

zoomed to single message pair

Different behavior on both ranks
Details for the selected second message

Page 24: 2010 05 hands_on

VAMPIR / VAMPIRTRACE HANDS-ON EXERCISE

Guided Exercise with NPB 3.3 BT-MPI

Center for Information Services and High Performance Computing (ZIH)

Page 25: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

– Move into the tutorial directory in your home directory

% cd NPB3.3-MPI

– Select the VampirTrace compiler wrappers

% vim config/make.def

-> comment out line 32, resulting in:
   32: #MPIF77 = mpif77
-> remove the comment from line 38, resulting in:
   38: MPIF77 = vtf77 -vt:f77 mpif77
-> comment out line 88, resulting in:
   88: #MPICC = mpicc
-> remove the comment from line 94, resulting in:
   94: MPICC = vtcc -vt:cc mpicc

Page 26: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

• Build the benchmark

% make clean; make suite

• Launch as an MPI application

% cd bin.vampir; export VT_FILE_PREFIX=bt_1_initial
% mpiexec -np 16 bt_W.16

NAS Parallel Benchmarks 3.3 -- BT Benchmark

Size: 24x 24x 24   Iterations: 200   dt: 0.0008000
Number of active processes: 16

Time step 1
...
Time step 180
[0]VampirTrace: Maximum number of buffer flushes reached (VT_MAX_FLUSHES=1)
[0]VampirTrace: Tracing switched off permanently
Time step 200
...

Page 27: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

• Resulting trace files

• Visualization with Vampir7

% ls -alh
4,1M bt_1_initial.16
4,9K bt_1_initial.16.0.def.z
  29 bt_1_initial.16.0.marker.z
 12M bt_1_initial.16.10.events.z
 12M bt_1_initial.16.1.events.z
 11M bt_1_initial.16.2.events.z
 12M bt_1_initial.16.3.events.z
...
 11M bt_1_initial.16.c.events.z
 12M bt_1_initial.16.d.events.z
 12M bt_1_initial.16.e.events.z
 12M bt_1_initial.16.f.events.z
  66 bt_1_initial.16.otf

Page 28: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

Page 29: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

Page 30: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

• Decrease the number of buffer flushes by increasing the buffer size

• Set a new file prefix

• Launch as MPI application

% export VT_FILE_PREFIX=bt_2_buffer_120M

% export VT_MAX_FLUSHES=1 VT_BUFFER_SIZE=120M

% mpiexec -np 16 bt_W.16

Page 31: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

On an SGI Altix4700

Page 32: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

On an SGI Altix4700

Page 33: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

• Generate filter specification file

• Set a new file prefix

• Launch as MPI application

• For reference, a manually written filter file:

% export VT_FILE_PREFIX=bt_3_filter
% vtfilter -gen -fo filter.txt -r 10 -stats \
    -p bt_2_buffer_120M.otf
% export VT_FILTER_SPEC=/path/to/filter.txt
% mpiexec -np 16 bt_W.16

matmul_sub*;matvec_sub*;binvcrhs* -- 0

Page 34: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

On an SGI Altix4700

Page 35: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

On an SGI Altix4700

Page 36: 2010 05 hands_on

PAPI

• PAPI counters can be included in traces

– If VampirTrace was built with PAPI support

– If PAPI is available on the platform

• VT_METRICS specifies a list of PAPI counters

• See also the PAPI commands papi_avail and papi_command_line

• PAPI is not available on Quarry

– View the Large/Small traces on your Windows machine

% export VT_METRICS=PAPI_FP_OPS:PAPI_L2_TCM

Page 37: 2010 05 hands_on


Hands-on: NPB 3.3 BT-MPI

• Record I/O and Memory counters

• Set a new file prefix

• Launch as MPI application

% export VT_FILE_PREFIX=bt_4_papi

% export VT_MEMTRACE=yes

% export VT_IOTRACE=yes

% mpiexec -np 16 bt_W.16

Page 38: 2010 05 hands_on

Hands-on: NPB 3.3 BT-MPI

On an SGI Altix4700

Page 39: 2010 05 hands_on

FREE TRAINING

Examples:
Filtering: filter_mpi_omp
Instrumenting: instrument_ring
Profiling: profile_heat
Mixed: Cannon

Page 40: 2010 05 hands_on

examples/filter_mpi_omp

• Look into the source-code

– Artificial example made of three parts

• Matrix multiply MPI-parallelized

• Matrix multiply OpenMP-parallelized

• Dummy functions

• Use automatic instrumentation and visualize

• Filter out the dummy functions, then run and visualize (a filter-file sketch follows this list)

• Create a group filter for the dummy functions and the matrix multiply functions

– Do not forget to switch off the function filter
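For illustration, a function filter that excludes the dummy functions could reuse the syntax of the reference filter file shown for BT above (the name pattern below is a guess; check the actual function names in the example source):

dummy* -- 0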

Page 41: 2010 05 hands_on

examples/instrument_ring

• Look at source-code and makefiles

• Run and visualize both versions

• Add additional instrumentation for the while loop (see the sketch after this list)

• Run and visualize again
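As an illustration of the manual instrumentation option (-vt:inst manual / VT’s user API), the while loop could be wrapped in a user-defined region roughly as sketched below. This is not the shipped instrument_ring source; it assumes compilation with a VampirTrace wrapper plus -DVTRACE so that vt_user.h is active, and the region name "ring_loop" is chosen here for illustration:

/* ring_sketch.c -- hypothetical ring example with a manually instrumented loop */
#include <mpi.h>
#include "vt_user.h"   /* provides VT_USER_START/VT_USER_END when VTRACE is defined */

int main(int argc, char **argv)
{
    int rank, size, token, rounds = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    VT_USER_START("ring_loop");           /* appears as its own region in Vampir */
    while (rounds < 5) {                   /* pass a token around the ring 5 times */
        if (rank == 0) {
            token = rounds;
            MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, left,  0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&token, 1, MPI_INT, left,  0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
        }
        rounds++;
    }
    VT_USER_END("ring_loop");

    MPI_Finalize();
    return 0;
}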

Page 42: 2010 05 hands_on

examples/profile_heat

• Compile via “make all”

• export GMON_OUT_PREFIX=name

• Run the binaries (change prefix in between)

• Use gprof to combine the profiles: gprof -s

• Watch the output: gprof [-b] sum.txt | less
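A possible command sequence, with hypothetical binary and prefix names, might look like this (GMON_OUT_PREFIX makes each run write its profile as <prefix>.<pid>, and gprof -s merges the profiles into gmon.sum):

% make all
% export GMON_OUT_PREFIX=heat_run1
% ./heat
% export GMON_OUT_PREFIX=heat_run2
% ./heat
% gprof -s ./heat heat_run1.* heat_run2.*
% gprof -b ./heat gmon.sum | less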

Page 43: 2010 05 hands_on

WRAP-UP
How to solve issues when using VampirTrace

Page 44: 2010 05 hands_on

HOW TO SOLVE ISSUES WHEN USING VAMPIRTRACE

For more details on VampirTrace and its features see also the manual.

Page 45: 2010 05 hands_on

Incomplete Traces

• Issue: Tracing was switched off because the internal trace buffer was too small

• Result:

• Asynchronous behavior of the application due to buffer flushes of the measurement system

• No tracing information available after flush operation

• Huge overhead due to flush operation

[0]VampirTrace: Maximum number of buffer flushes reached (VT_MAX_FLUSHES=1)

[0]VampirTrace: Tracing switched off permanently

Page 46: 2010 05 hands_on

Incomplete Traces - Solutions

• Increase trace buffer size

• Increase number of allowed buffer flushes (not recommended)

• Use filter mechanisms to reduce the number of recorded events

• Switch tracing on/off if your application works in an iterative manner, to reduce the number of recorded events (see the code sketch below)

%> export VT_BUFFER_SIZE=150M

%> export VT_MAX_FLUSHES=2

%> export VT_FILTER_SPEC=$HOME/filter.spec
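For the on/off approach, VampirTrace’s user API offers VT_ON() and VT_OFF() via vt_user.h (active only when compiled with -DVTRACE). A rough sketch, with illustrative step thresholds and a hypothetical solver loop:

#include "vt_user.h"   /* VT_ON()/VT_OFF(); they expand to nothing without -DVTRACE */

void solver(int nsteps)
{
    int step;

    for (step = 0; step < nsteps; step++) {
        if (step == 10)
            VT_OFF();               /* stop recording after the first 10 iterations */
        if (step == nsteps - 10)
            VT_ON();                /* resume recording for the last 10 iterations */

        /* ... one iteration of the actual computation ... */
    }
}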

Page 47: 2010 05 hands_on

Way too large Traces

• Issue:

– Every function entry/exit and MPI event was recorded

• Result:

– Trace files become large even for short application runs

• Solutions:

– Use filter mechanisms to reduce the number of recorded events

– Use selective instrumentation of your application

– Switch tracing on/off if your application works in an iterative manner to reduce the number of recorded events

Page 48: 2010 05 hands_on

Overhead

• Issue:
– Runtime filtering will be called for each event

• Result:
– Runtime filtering increases the runtime overhead

• Solutions:
– Use selective instrumentation of your application

– Use manual source instrumentation (high effort, error prone)

– Only instrument interesting source files with VampirTrace

– Switch tracing on/off if your application works in an iterative manner to reduce the number of recorded events

Page 49: 2010 05 hands_on

Additional Information needed

• Issue:
– I’m interested in more events and hardware counters. What do I have to do?

• Solutions:
– Use the environment option VT_METRICS to enable recording of additional hardware counters (PAPI, CPC, or NEC), if available.

– Use the environment option VT_RUSAGE to record the Unix resource usage counters.

– Use the environment option VT_MEMTRACE, if available on your system, to intercept the libc allocation functions and record memory allocation information.

– For more events and recording of hardware information, see chapter 4 of the VampirTrace manual.

Page 50: 2010 05 hands_on


Thanks for your attention.

