8/3/2019 MAE Mostafa
1/40
MAE Presentation - Nagy Mostafa 1
Software Performance Profiling
Major Area Presentation
Nagy Mostafa, 6-5-2008
Committee Members
Chandra Krintz (chair), Tim Sherwood, Tevfik Bultan
Motivation
[Figure: the software/hardware stack: Application, Application Server, Virtual Machine, O.S., Hardware]
Application: thousands of lines of code, several modules written in different languages, numerous features and usage scenarios
Hardware: heterogeneous; prefetching, data/control-flow speculation, pipelining, caching, etc.
Motivation
Static analysis fails to capture run-time behavior because:
S/W and H/W are complex
S/W-H/W interaction is hard to understand
Application usage scenarios are hard to predict
Solution: dynamic analysis based on run-time information
Profiling
Profiling: investigation of program behavior using run-time information
Profiler: conceptual module that collects/analyzes run-time data
Profile: a set of frequencies associated with run-time events
Profile usage examples:
Feedback-directed optimization [Calder 99] [Burrows 00]
Debugging and bug isolation [Liblit 03]
Coverage testing [Tikir 02]
Understanding program/architecture interaction [Hauswirth 04]
Profiling
Desired qualities of a profiler:
Accurate
Low overhead
Fast convergence
Flexible
Portable
Transparent
Low storage overhead
Outline
Types of profiles
Profiling techniques and implementations
Hybrid profiling systems
Software profiling systems
Path profiling
Types of profiles
Point profile: events are simple and independent, e.g., basic blocks, edges, methods, cache misses
Context profile: events are composed of simpler, ordered events, e.g.:
a call context is a sequence of method invocations
an execution path is a sequence of edges in a CFG
Profiling techniques
Exhaustive instrumentation
Instrumentation sampling, temporary instrumentation
Sampling
[Figure: a spectrum running from accurate (exhaustive instrumentation) to light-weight (sampling)]
Profiler implementations
Hybrid (HW-assisted):
1. Hardware performance monitors (HPMs)
2. Dedicated HW collectors that deliver data to a SW module
Fixed, low overhead
Software: pure software implementations
Portable, flexible, high overhead
Hybrid profiling systems
HPM-based system-wide profilers:
Event-based sampling
Support multiple events
Profile unmodified binaries
Low overhead
May be used in production environments
E.g., DCPI [Anderson 97], OProfile, VTune
Hybrid profiling systems
HPM sampling for dynamic compilation [Buytaert 07]:
HPM sampling in JikesRVM
More accurate, converges faster, and has lower overhead than timer-based sampling
Speed-up of 5-18%
Online optimizations using HPMs [Schneider 07]:
Cache-miss profiling
Co-allocation of objects
O.H. < 1%, speed-up 14%
[Figure: a parent object of Class A co-allocated with its child object of Class B]
Hybrid profiling systems
Rapid profiling via stratified sampling [Sastry 01]:
The data stream is divided into disjoint strata
Reduces the size of the output stream and improves accuracy
O.H. 4.5%, accuracy 97%
[Figure: events stream -> periodic sampler (P) -> samples -> hasher (H), which performs the stratified sampling -> samples]
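To make the accuracy argument concrete, the sketch below contrasts plain periodic sampling with generic stratified sampling. It is not the hardware design of [Sastry 01]: the CRC32 hash that assigns each event key to one of `num_strata` disjoint strata, and the use of one period per stratum, are illustrative assumptions.

```python
import zlib
from collections import Counter

def stratum_of(key, num_strata):
    # Disjoint strata: every event key maps to exactly one stratum (CRC32 is an assumption).
    return zlib.crc32(key.encode()) % num_strata

def periodic_estimate(events, period):
    """Plain periodic sampling: keep every `period`-th event of the whole stream."""
    sampled = Counter(events[period - 1::period])
    return {key: n * period for key, n in sampled.items()}

def stratified_estimate(events, num_strata, period):
    """Stratified sampling: keep every `period`-th event *within each stratum*,
    then scale the sample counts by `period` to estimate event frequencies."""
    seen = [0] * num_strata
    sampled = Counter()
    for key in events:
        s = stratum_of(key, num_strata)
        seen[s] += 1
        if seen[s] % period == 0:
            sampled[key] += 1
    return {key: n * period for key, n in sampled.items()}

events = ["a", "b"] * 10
print(periodic_estimate(events, 2))  # prints {'b': 20}
print(stratified_estimate(events, num_strata=4, period=2))
```

For the alternating stream, periodic sampling aliases with the program's pattern and sees only `"b"`. When the hash places the two keys in different strata, each stratum keeps its own counter and both keys are estimated at 10.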
Hybrid profiling systems
Phase-aware profiling [Nagpurkar 05]:
Programs exhibit repeating patterns of execution (phases)
Profiling a representative of each phase approximates the full profile
The phase-change detector is implemented in HW
O.H. reduced by 58% compared with periodic sampling, accuracy 95%
Source: Nagpurkar 05
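A software-only sketch of the phase-aware idea follows. It assumes each profiling interval is summarized by an event-frequency signature (a `Counter`), and that two intervals belong to the same phase when the Manhattan distance between their normalized signatures is at most a hypothetical `threshold`. Only the first interval of each new phase is profiled. The hardware phase-change detector of [Nagpurkar 05] is not modeled.

```python
from collections import Counter

def distance(sig1, sig2):
    """Manhattan distance between two normalized event-frequency signatures (0 to 2)."""
    n1, n2 = sum(sig1.values()), sum(sig2.values())
    return sum(abs(sig1[k] / n1 - sig2[k] / n2) for k in set(sig1) | set(sig2))

def phase_aware_profile(intervals, threshold=0.5):
    """Profile an interval only if it starts a phase not seen before; every other
    interval is approximated by the profile of its phase's representative."""
    representatives = []   # one signature per detected phase
    profiled = []          # indices of intervals that were actually profiled
    for i, sig in enumerate(intervals):
        if all(distance(sig, rep) > threshold for rep in representatives):
            representatives.append(sig)
            profiled.append(i)
    return profiled

A = Counter({"x": 90, "y": 10})   # hypothetical signature of phase 1
B = Counter({"z": 100})            # hypothetical signature of phase 2
print(phase_aware_profile([A, A, B, A, B]))  # prints [0, 2]
```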
Software profiling systems
Dynamic call tree: fully describes method invocations during execution
Calling context tree (CCT): aggregates calls with the same context
[Figure: dynamic call tree vs. calling context tree]
Source: Whaley 00
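A minimal sketch of CCT construction, assuming the profiler sees a stream of hypothetical `("call", method)` and `("ret",)` events. A call with the same chain of callers reuses an existing node and increments its count, whereas a dynamic call tree would create a new node for every invocation.

```python
class CCTNode:
    def __init__(self, method, parent=None):
        self.method = method
        self.parent = parent
        self.children = {}   # callee name -> CCTNode
        self.count = 0       # calls observed in exactly this context

def build_cct(events):
    """Build a CCT from ("call", method) and ("ret",) events. Calls that share the
    same chain of callers are merged into one node whose count is incremented."""
    root = CCTNode("<root>")
    current = root
    for event in events:
        if event[0] == "call":
            method = event[1]
            node = current.children.get(method)
            if node is None:
                node = current.children[method] = CCTNode(method, parent=current)
            node.count += 1
            current = node
        else:                      # "ret": go back to the caller's node
            current = current.parent
    return root

# main calls a twice, and each call of a calls c
events = [("call", "main"),
          ("call", "a"), ("call", "c"), ("ret",), ("ret",),
          ("call", "a"), ("call", "c"), ("ret",), ("ret",),
          ("ret",)]
cct = build_cct(events)
```

A dynamic call tree for the same events would contain two `a` nodes and two `c` nodes; the CCT has one of each, with count 2.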
Software profiling systems
Calling-context profiling
Goal: build an approximate calling context tree (CCT)
An exact CCT is expensive to collect via instrumentation
[Figure: approximate CCT vs. dynamic call tree]
Source: Whaley 00
Software profiling systems
Sampling-based approach [Whaley 00]:
Builds a partial calling context tree (PCCT)
Walks the stack on each interrupt, up to k frames
Overhead 2-4%, precision above 90%
Disadv.:
Precision is the correlation between PCCTs from different runs of the benchmark (not accuracy)
Relies on the O.S. timer interrupt (low frequency; relatively fewer samples on faster processors)
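The stack-walking step can be sketched as below, assuming each timer interrupt delivers a snapshot of the call stack as a list of method names (innermost last). For brevity the PCCT is kept as a `Counter` of truncated contexts rather than an explicit tree; `k` is the stack-walk depth.

```python
from collections import Counter

def truncated_context(stack, k):
    """The k innermost frames, ordered outermost first (stack[-1] is the running method)."""
    return tuple(stack[-k:])

def sample_pcct(snapshots, k):
    """One sample per timer interrupt, credited to its truncated calling context."""
    return Counter(truncated_context(s, k) for s in snapshots)

snapshots = [["main", "a", "b", "c"], ["main", "x", "b", "c"], ["main", "a", "b"]]
pcct = sample_pcct(snapshots, k=2)
```

With `k=2`, the two distinct contexts of `c` (through `a` and through `x`) collapse into the single entry `("b", "c")` with two samples, which is the price paid for the bounded stack walk.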
Software profiling systems
Approximating the CCT [Arnold 00]:
Uses event-based sampling (method counter)
No bound on the sampled stack depth
High accuracy; O.H. not analyzed (expected to be high)
Timer/event-based sampling [Arnold 05]:
Stride-enabled (a mix of timer-based and event-based sampling), O.H. < 0.3%, accuracy 60%
Probabilistic calling context (PCC) [Bond 07]:
Usage: residual testing, debugging, anomaly detection
Instrumentation-based (sampling is not suitable)
No CCT is maintained
O.H. 3%
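PCC replaces the tree with a single integer that is updated on every call. The sketch assumes the update function f(V, cs) = 3V + cs, truncated to 32 bits, which is our recollection of the PCC paper rather than something stated on this slide; the call-site IDs are hypothetical.

```python
MASK = 0xFFFFFFFF  # keep the context value in 32 bits

def pcc_update(caller_value, call_site):
    """Value computed on method entry from the caller's value and the call-site ID
    (assumed update function: 3 * V + cs, truncated to 32 bits)."""
    return (3 * caller_value + call_site) & MASK

def pcc_of(call_sites):
    """PCC value of a context, given its call-site IDs from the outermost call inward."""
    value = 0
    for cs in call_sites:
        value = pcc_update(value, cs)
    return value

# The same two call sites in a different order give a different value
print(pcc_of([1, 2]), pcc_of([2, 1]))  # prints 5 7
```

Because each frame keeps the value it computed on entry, returning needs no extra work. A client can record the current value at events of interest and compare it against previously seen values; distinct contexts map to distinct values only with high probability, hence the name.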
Path profiling
Frequency of execution paths in a CFG
The number of paths can be exponential in the number of CFG nodes
Hard to collect by sampling
A path profile is NOT an edge profile
Source: Ball 96
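A small example of why an edge profile cannot replace a path profile: the two hypothetical path profiles below differ, yet they imply identical edge profiles, so path frequencies cannot be recovered from edge counts alone.

```python
from collections import Counter

def edge_profile(path_profile):
    """Edge frequencies implied by a path profile (paths are strings of node names)."""
    edges = Counter()
    for path, freq in path_profile.items():
        for edge in zip(path, path[1:]):
            edges[edge] += freq
    return edges

# Two different path profiles over the CFG A->{B,C}->D->{E,F}->G
profile1 = {"ABDEG": 10, "ACDFG": 10}
profile2 = {"ABDFG": 10, "ACDEG": 10}
print(edge_profile(profile1) == edge_profile(profile2))  # prints True
```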
Path profiling
Source: Ball 96
Efficient path profiling [Ball 96]:
Compact enumeration of paths (path IDs)
O.H. average 31%, maximum 97%
Disadv.: high overhead; suitable only for offline settings
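The compact path enumeration can be sketched as follows for an acyclic CFG (the loop handling of the full algorithm is omitted). NumPaths is computed bottom-up, and each out-edge of a node is assigned the number of paths that go through the earlier successors, so summing edge values along a path yields a unique ID. The graph is the hypothetical diamond-diamond CFG A->{B,C}->D->{E,F}->G.

```python
def ball_larus(succ, entry, exit_):
    """Ball-Larus path numbering for an acyclic CFG.
    Returns (val, n): val[(v, w)] is added to the path register on edge v->w,
    and n is the number of entry->exit paths; path IDs fall in range(n)."""
    num_paths = {}

    def count(v):
        # NumPaths(v): number of paths from v to the exit
        if v not in num_paths:
            num_paths[v] = 1 if v == exit_ else sum(count(w) for w in succ[v])
        return num_paths[v]

    count(entry)
    val = {}
    for v in num_paths:
        offset = 0
        for w in succ.get(v, []):
            val[(v, w)] = offset      # paths through earlier successors come first
            offset += num_paths[w]
    return val, num_paths[entry]

def path_id(val, path):
    """What the instrumentation computes: r = 0 at entry, r += val[e] on each edge."""
    return sum(val[(a, b)] for a, b in zip(path, path[1:]))

succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"], "E": ["G"], "F": ["G"]}
val, n = ball_larus(succ, "A", "G")
print(n, [path_id(val, p) for p in ["ABDEG", "ABDFG", "ACDEG", "ACDFG"]])  # prints 4 [0, 1, 2, 3]
```

At the exit, the instrumentation increments `count[r]`; only edges with non-zero values need a register update.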
Path profiling
Online hot path prediction scheme [Duesterwald 00]:
Only the path head is instrumented, instead of the full path
The Next Executing Tail (NET) is predicted as hot
Average hit rate: 97% when 10% of the execution is profiled
Disadv.: cannot distinguish hot paths from warm paths (false positives)
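A sketch of NET-style prediction over a trace of executed blocks. It assumes, as a simplification, that a predicted path ends at the next path head encountered (e.g., the next visit to a loop header); `path_heads` and `threshold` are hypothetical parameters. Only the heads carry counters.

```python
def net_predict(trace, path_heads, threshold):
    """When a head's counter reaches `threshold`, record the blocks that execute
    next (up to the next path head) as the predicted hot path for that head."""
    counters = {}
    hot_paths = {}
    for i, block in enumerate(trace):
        if block not in path_heads or block in hot_paths:
            continue
        counters[block] = counters.get(block, 0) + 1   # the only instrumentation
        if counters[block] == threshold:
            tail = [block]
            for nxt in trace[i + 1:]:
                if nxt in path_heads:
                    break
                tail.append(nxt)
            hot_paths[block] = tuple(tail)
    return hot_paths

trace = ["H", "A", "H", "A", "H", "B", "H", "A"]
print(net_predict(trace, {"H"}, threshold=3))  # prints {'H': ('H', 'B')}
```

Here the loop head `H` usually continues into `A`, yet NET selects `H`-`B` because `B` happened to execute right after the counter reached the threshold: the scheme speculates on the next executing tail instead of collecting full path statistics.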
Path profiling
Selective path profiling (SPP) [Apiwattanapong 02]: instruments only the paths of interest
Usage: residual testing; profiling different subsets of paths over multiple copies of deployed software
Disadv.: the set of paths cannot be chosen completely arbitrarily
Preferential path profiling [Vaswani 07]: similar to SPP, but the paths of interest can be specified arbitrarily
O.H. average 15%
Path profiling
Variational path profiling [Perelman 05]:
Offline profiling scheme
Collects the execution-time variability of paths
Paths with high variability are good candidates for optimization
O.H. average 5%; speed-up from simple optimizations 8.5%
Disadv.: does not report constantly slow paths
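A sketch of the ranking step, assuming per-path execution times have already been collected and using the population standard deviation as the variability metric (an assumption; the paths and times are hypothetical).

```python
from statistics import pstdev

def variational_candidates(path_times, top=1):
    """Rank paths by the spread (population standard deviation) of their measured
    execution times and return the `top` most variable ones."""
    spread = {p: pstdev(ts) for p, ts in path_times.items() if len(ts) > 1}
    return sorted(spread, key=spread.get, reverse=True)[:top]

path_times = {
    "P1": [100, 100, 100],    # constantly slow: zero variability
    "P2": [10, 50, 12, 48],   # highly variable
    "P3": [5, 6, 5],
}
print(variational_candidates(path_times, top=3))  # prints ['P2', 'P3', 'P1']
```

`P1` is the slowest path yet ranks last, which mirrors the disadvantage noted above.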
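Path profiling
From an edge profile we can find:
the total flow (frequency)
an upper bound on the flow of each path
a lower bound on the flow of each path (definite flow)
[Figure: CFG with edges A->B (50), A->C (30), B->D, C->D, D->E (60), D->F (20), E->G, F->G; total flow 80]
Consider path ABDEG:
Max flow = min(50, 60) = 50
Min flow: let ACDFG = 0, ACDEG = 30, ABDFG = 20; then ABDEG = 80 - 0 - 30 - 20 = 30 (definite flow)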
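The two bounds can be computed directly from the edge profile. This sketch assumes an acyclic CFG whose entry is the first node of the path, and fills in the non-branch edges (B->D, C->D, E->G, F->G) by flow conservation, since the figure labels only the branch edges. The lower bound relies on the fact that any execution of a different path diverges from this one at some node through a sibling edge.

```python
def path_flow_bounds(path, edge_freq, succ):
    """Bounds on a path's frequency using only the edge profile.
    Upper: no path runs more often than its least frequent edge.
    Lower (definite flow): total flow minus all flow leaving the path
    through sibling edges at its branch points."""
    edges = list(zip(path, path[1:]))
    upper = min(edge_freq[e] for e in edges)
    total = sum(edge_freq[(path[0], w)] for w in succ[path[0]])
    escaped = sum(edge_freq[(v, w)]
                  for v, nxt in edges for w in succ[v] if w != nxt)
    return upper, max(0, total - escaped)

succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"], "E": ["G"], "F": ["G"]}
edge_freq = {("A", "B"): 50, ("A", "C"): 30, ("B", "D"): 50, ("C", "D"): 30,
             ("D", "E"): 60, ("D", "F"): 20, ("E", "G"): 60, ("F", "G"): 20}
print(path_flow_bounds("ABDEG", edge_freq, succ))  # prints (50, 30)
```

For ABDEG, the flow escaping at A (30 on A->C) and at D (20 on D->F) leaves 80 - 50 = 30, matching the definite flow above.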
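Path profiling
[Figure: CFG with nodes A-J; two defining edges are highlighted]
Obvious paths and their defining edges:
ABDFGHJ: FG
ABDFHJ: FH
ABDEHJ: DE
ABCEHJ: CE
AIJ: AI
Obvious path: a path whose flow can be derived from the edge profile (its flow equals the frequency of a defining edge that no other path traverses)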
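Finding defining edges only requires knowing which paths use each edge. The sketch assumes the five paths listed above are all the entry-exit paths of the CFG (our reading of the figure) and reports, for each path, the first edge in sorted order that no other path uses.

```python
from collections import Counter

def defining_edges(paths):
    """Map each path (a string of node names) to one edge that no other path uses,
    or None. A path with such an edge is obvious: its frequency equals that
    edge's count in the edge profile."""
    edges_of = {p: set(zip(p, p[1:])) for p in paths}
    usage = Counter(e for es in edges_of.values() for e in es)
    return {p: next((e for e in sorted(es) if usage[e] == 1), None)
            for p, es in edges_of.items()}

paths = ["ABDFGHJ", "ABDFHJ", "ABDEHJ", "ABCEHJ", "AIJ"]
result = defining_edges(paths)
```

For ABCEHJ this reports BC rather than CE; both edges are used only by that path, so either one defines it.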
Path profiling
Targeted path profiling (TPP) [Joshi 04]:
Suitable for staged dynamic optimizers
Relies on the edge profile to:
remove obvious paths
remove cold edges
O.H. 14%, accuracy > 97%
Disadv.: compilation O.H. 74%
Path profiling
Practical path profiling (PPP) [Bond 05]:
Uses less instrumentation than TPP
Reduces instrumentation overhead
Smart path numbering (avoids instrumenting hot edges)
O.H. average 5%, accuracy average 96%
Disadv.: compilation overhead not reported (expected to be high)
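One way to realize "avoid instrumenting hot edges" is to order each node's successors hottest-first during Ball-Larus-style numbering, so the hottest out-edge gets value 0 and needs no register update. This ordering heuristic is our simplification, not necessarily PPP's exact algorithm; the CFG and edge frequencies are hypothetical.

```python
def smart_numbering(succ, entry, exit_, edge_freq):
    """Ball-Larus-style numbering that visits each node's successors hottest-first,
    so the hottest out-edge of every node gets value 0 (no register update needed)."""
    num_paths = {}

    def count(v):
        if v not in num_paths:
            num_paths[v] = 1 if v == exit_ else sum(count(w) for w in succ[v])
        return num_paths[v]

    count(entry)
    val = {}
    for v in num_paths:
        offset = 0
        hottest_first = sorted(succ.get(v, []),
                               key=lambda s: edge_freq.get((v, s), 0), reverse=True)
        for w in hottest_first:
            val[(v, w)] = offset
            offset += num_paths[w]
    return val

succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"], "E": ["G"], "F": ["G"]}
freq = {("A", "B"): 10, ("A", "C"): 90, ("D", "E"): 80, ("D", "F"): 20}
val = smart_numbering(succ, "A", "G", freq)
instrumented = sorted(e for e, x in val.items() if x != 0)  # [('A', 'B'), ('D', 'F')]
```

Path IDs stay unique and dense, but the hot path ACDEG now executes no register updates at all.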
Path profiling
Two types of instrumentation:
Expensive profile updates
Cheap register updates
Path profiling
Path and edge profiling (PEP) [Bond 05]:
Orthogonal approach that samples profile updates
Piggybacks on the JikesRVM sampling profiler
The edge profile guides instrumentation placement
O.H. average 1.2%, accuracy 94%
Disadv.: VM-specific
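The split between cheap register updates and expensive profile updates, and the idea of sampling only the latter, can be simulated as below. The edge values are assumed to come from a Ball-Larus-style numbering (`edge_val` is hypothetical; unlisted edges are worth 0), and a simple every-`sample_every`-th-path counter stands in for the JikesRVM timer-driven sampling that PEP actually piggybacks on.

```python
from collections import Counter

def run_with_sampled_updates(executions, edge_val, sample_every):
    """Cheap register updates (r += val) run on every edge of every path;
    the expensive profile update (count[r] += ...) runs only on sampled paths."""
    counts = Counter()
    for n, path in enumerate(executions, start=1):
        r = 0                                   # path register
        for edge in zip(path, path[1:]):
            r += edge_val.get(edge, 0)          # cheap: always executed
        if n % sample_every == 0:
            counts[r] += sample_every           # expensive: sampled, then scaled
    return counts

edge_val = {("A", "C"): 2, ("D", "F"): 1}       # path IDs: ABDEG -> 0, ACDFG -> 3
executions = ["ABDEG", "ABDEG", "ACDFG"] * 4
counts = run_with_sampled_updates(executions, edge_val, sample_every=2)
```

Half of the profile updates are skipped, yet the scaled counts (8 executions of path 0, 4 of path 3) match the true frequencies for this regular trace; on real workloads the sampled profile is an estimate.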
Conclusion
Program behavior is hard to understand statically
Dynamic analysis based on run-time data is needed
Performance profiling investigates run-time behavior
Profiling techniques range from exhaustive instrumentation to sampling
Implementations can be in hardware, software, or hybrid
Context and path profiling techniques
Reducing profiling cost
Ephemeral instrumentation [Traub 00]: temporary instrumentation
Shadow profiling [Moseley 07]: O.H. < 1%, accuracy 94% (value profiling)
Profiling over adaptive ranges [Mysore 06]: adaptively reduces profile storage overhead
Reducing profiling cost
Sources: Arnold 01, Hirzel 01
Usage examples
Value profiling and optimization [Calder 99] [Burrows 00]
Code coverage testing [Tikir 02]
Bug isolation [Liblit 03]
Understanding OO applications [Hauswirth 04]
Offline/online feedback-directed optimization
Path profiling usage example
Source: Gupta et al. 1998
Sampling: triggering mechanism
Timer-based:
Easy to implement; relies on the O.S. timer interrupt (e.g., 4 ms on Linux 2.6, i.e., 250 samples/sec)
Disadv.: low sampling frequency, independent of processor speed, cannot always be correlated with program events
Event-based:
Relies on event counting in SW or HW (e.g., HPMs)
Correlates with program events
Adapts to processor speed
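The contrast between the two triggers can be made concrete. The first function does the arithmetic from the slide (4 ms ticks give 1000 / 4 = 250 samples/sec regardless of processor speed); the second takes one sample every `period` occurrences of a counted event, as an HPM overflow interrupt or a software counter would, so its sampling rate scales with the event rate. Event values and the period are illustrative.

```python
def timer_samples_per_second(tick_ms):
    """One sample per O.S. timer tick, independent of how fast the processor runs."""
    return 1000 / tick_ms

def event_based_sampler(events, period):
    """Take one sample every `period` occurrences of a counted event."""
    countdown = period
    samples = []
    for event in events:
        countdown -= 1
        if countdown == 0:
            samples.append(event)
            countdown = period
    return samples

print(timer_samples_per_second(4))               # prints 250.0
print(event_based_sampler(list(range(10)), 3))   # prints [2, 5, 8]
```

On a processor twice as fast, the timer still yields 250 samples/sec, so each sample covers twice as much work, while the event-based sampler yields the same number of samples per executed event.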
Sampling: collection mechanism
Polling-based:
Samples are taken only at polling points (e.g., method entries, back-edges)
Disadv.: not timely; overhead of flag checking; limited accuracy due to biased sampling (e.g., the polling point right after a long I/O operation)
Immediate:
A sample is taken right after the interrupt is raised
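The bias of polling-based collection can be reproduced with a small simulation. It assumes a hypothetical model in which the program runs for `duration` time units and then reaches a named polling point; the timer only sets a flag, and the flag is checked at polling points, so several ticks during one segment collapse into a single pending sample.

```python
from collections import Counter

def polling_samples(segments, interval):
    """segments: list of (duration, polling_point) in execution order.
    A timer sets a flag every `interval` time units; the sample is credited
    to the first polling point reached after the flag is set."""
    samples = Counter()
    now, next_tick, flag = 0, interval, False
    for duration, point in segments:
        now += duration
        while next_tick <= now:        # timer interrupts during this segment
            flag = True                # several ticks collapse into one pending flag
            next_tick += interval
        if flag:                       # flag check at the polling point
            samples[point] += 1
            flag = False
    return samples

# 1 unit of computation, then 9 units of I/O before reaching "after_io"
segments = [(1, "compute"), (9, "after_io")] * 10
print(polling_samples(segments, interval=10))  # prints Counter({'after_io': 10})
```

Every sample lands on the polling point right after the I/O, which is the over-sampled point in the figure that follows.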
Sampling: collection mechanism (2)
[Figure: problem with polling-based sampling: the timer interrupt fires during a long I/O operation, and the sample is taken at the next polling point, which becomes over-sampled]
[Figure: Arnold-Grove sampling [Arnold 05] with sample burst = 3 and stride = 3]
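A sketch of the burst/stride idea, under our reading of [Arnold 05]: after a timer interrupt, instead of sampling only the first polling point reached, the profiler takes `burst` samples spaced `stride` polling points apart. Exactly where the first sample falls in the original scheme may differ; polling points are represented by hypothetical integer IDs.

```python
def burst_stride_samples(poll_points, timer_at, burst=3, stride=3):
    """After a timer interrupt (pending at the indices in `timer_at`), take `burst`
    samples, one every `stride` polling points, instead of a single sample at the
    first polling point reached."""
    samples = []
    remaining = countdown = 0
    for i, point in enumerate(poll_points):
        if i in timer_at and remaining == 0:
            remaining, countdown = burst, stride   # start a new burst
        if remaining:
            countdown -= 1
            if countdown == 0:
                samples.append(point)
                remaining -= 1
                countdown = stride
    return samples

print(burst_stride_samples(list(range(20)), timer_at={0}))  # prints [2, 5, 8]
```

Spreading the samples over several polling points means the point right after the interrupt (for example, after a long I/O operation) no longer receives every sample.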