8/3/2019 Ch6 Measurement
1/85
Copyright 2004 David J. Lilja 1
Measurement tools and techniques
Fundamental strategies
Interval timers
Program profiling
Tracing
Indirect measurement
Events
Most measurement tools are based on events
Some predefined change to the system state
Definition depends on the metric being measured:
Memory reference
Disk access
Change in a register's state
Network message
Processor interrupt
Event Classification
Count metrics
The number of times event X occurs
Number of cache misses
Number of I/O operations
Event Classification
Secondary-event metrics
Record a value when triggered by some event
Record the block size for each I/O operation
Count the number of operations
Find the average I/O transfer size
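A minimal Python sketch of a secondary-event metric. It assumes a hypothetical hook `on_io(block_size)` that instrumented code calls once per I/O operation; the block sizes in the usage lines are illustrative, not measured data.

```python
class IoSizeMonitor:
    """Records a secondary value (block size) each time the primary event (an I/O) occurs."""

    def __init__(self) -> None:
        self.num_ops = 0      # count of primary events
        self.total_bytes = 0  # running sum of the secondary values

    def on_io(self, block_size: int) -> None:
        # Hypothetical hook: called by instrumented code on every I/O operation.
        self.num_ops += 1
        self.total_bytes += block_size

    def average_transfer_size(self) -> float:
        return self.total_bytes / self.num_ops if self.num_ops else 0.0


monitor = IoSizeMonitor()
for size in (4096, 8192, 512, 4096):  # illustrative block sizes
    monitor.on_io(size)
print(monitor.num_ops, monitor.average_transfer_size())  # 4 4224.0
```

Only the count and a running sum are kept, so the overhead per event stays constant regardless of how many I/Os occur.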
Event Classification
Profiles
Characterization of overall behavior
Aggregate, big-picture view of an application program
Time spent in each function
Event-Driven Strategies
Record the necessary information only when the selected event occurs
Modify the system to record the event
Dump the data when the program terminates
May also need intermediate dumps
E.g. a simple counter in the page fault routine
Event-Driven Strategies
System overhead
Incurred only when the event of interest actually occurs
Infrequent events cause little perturbation
Frequent events cause high perturbation
Is the measured behavior still typical?
Perturbation changes the system being measured
Event-Driven Strategies
Inter-event time is unpredictable
Depends on when events actually occur
Makes it hard to estimate perturbation
How long to measure?
Event-driven measurement tools
Good for low-frequency events
Event-Driven Strategies
Counts 8 events exactly
Figure: timeline in which each of the 8 events increments the counter by 1
Tracing
Similar to event-driven measurement
But records additional system state
Record that the event has occurred (count)
Plus additional information to uniquely identify the event
E.g. addresses that cause page faults
Overhead
Additional memory or disk storage
Time to save state
Relatively large system perturbation
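Extending the same idea for tracing: besides counting, each event now saves the faulting address, which is where the extra storage and time overhead come from. A Python sketch with a hypothetical fault routine, not a real OS interface:

```python
from typing import List, Tuple

trace: List[Tuple[int, int]] = []  # (event sequence number, faulting address)


def handle_page_fault(address: int) -> None:
    """Hypothetical fault routine instrumented for tracing rather than counting."""
    # Besides noting that the event occurred, save extra state (the address)
    # that uniquely identifies this particular event.
    trace.append((len(trace), address))


for addr in (0x1000, 0x2000, 0x1000):  # simulated faulting addresses
    handle_page_fault(addr)

event_count = len(trace)                 # the simple count is still recoverable
faulting_addrs = [a for _, a in trace]   # plus the per-event detail
```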
Tracing
Counts 8 events plus extra data
Figure: timeline in which each of the 8 events increments the count (+1) and records its address (Addr)
Sampling
Record necessary state at fixed time intervals
Overhead
Independent of the specific event frequency
Depends on the sampling frequency
Misses some events
Produces a statistical summary
May miss infrequent events
Each replication will produce different results
Sampling
Counts 3 events out of 5 samples
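Figure: timeline with 5 samples taken at fixed intervals; 3 of them fall on an event and add +1 to the count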
Comparisons
                Event count     Tracing          Sampling
Resolution      Exact count     Detailed info    Statistical summary
Overhead        Low             High             Constant
Perturbation    ~ # events      High             Fixed
Comparison
Event counting
Best for low frequency events
Required if exact counts are needed
Sampling
Best for high frequency events
If statistical summary is adequate
Tracing
When additional detail is required
Indirect Measurements
Used when the desired metric is not directly accessible
Measure one thing directly
Derive or deduce the desired metric
Highly dependent on the creativity of the performance analyst
Interval Timers
Fundamental tool of performance measurement
Measure the execution time of any portion of a program
Provide a time basis for sampling
Interval Timers
Actually count clock pulses between two events
Event 1: x1 = counter value; Event 2: x2 = counter value; clock period Tc
$T_e = (x_2 - x_1)T_c$
Using an Interval Timer
Within an application program
start_count = read_timer();
/* portion of program to be measured */
stop_count = read_timer();
elapsed_time = (stop_count - start_count) * clock_period;
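A runnable Python analogue of this pattern, offered as a sketch: `time.perf_counter_ns()` (standard library, Python 3.7+) plays the role of `read_timer()`, and because it reports nanoseconds, the multiplication by `clock_period` is already folded in. `measured_portion` is a hypothetical placeholder for the code being timed.

```python
import time


def measured_portion() -> int:
    # Hypothetical stand-in for the portion of the program being measured.
    return sum(i * i for i in range(100_000))


start_count = time.perf_counter_ns()   # x1: counter value at event 1
result = measured_portion()
stop_count = time.perf_counter_ns()    # x2: counter value at event 2

# perf_counter_ns counts in nanoseconds (its real resolution may be coarser),
# so (x2 - x1) * Tc with Tc = 1 ns is simply the difference.
elapsed_ns = stop_count - start_count
print(f"elapsed: {elapsed_ns / 1e6:.3f} ms")
```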
Hardware Timer
Figure: a clock with period Tc drives an n-bit counter, whose value is read through a CPU input port
$T_e = (x_2 - x_1)T_c$
Software Timer
$T_e = (x_2 - x_1)T_c$
Figure: a clock feeds a prescaler (divide-by-n) whose output, with period Tc, drives a CPU interrupt input; each interrupt increments a software counter
Quantization Errors
Figure: the same event timed against the clock twice; (a) the interval timer reports an event duration of n = 13 clock ticks, (b) it reports n = 14 clock ticks
Quantization Error
Timer resolution → quantization error
Repeated measurements of an event with $nT_c < T_e < (n+1)T_c$ report either n or n + 1 ticks
Te is rounded by up to ± one clock tick
Completely unpredictable rounding
Want Tc to be as small as possible
Timer Rollover
n-bit counter
$\text{count} \in [0, 2^n - 1]$
Rollover = transition from $2^n - 1$ to 0
If rollover occurs between the start and stop events
Then $\text{count} = x_2 - x_1 < 0$
Check for count < 0:
Measure again, or
Add $2^n$ to count
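The rollover check can be written as a small helper, sketched below in Python under the assumption that at most one rollover happens between the two reads (two or more cannot be detected from x1 and x2 alone):

```python
def elapsed_ticks(x1: int, x2: int, n_bits: int) -> int:
    """Ticks between two reads of an n-bit counter that wraps from 2**n - 1 to 0.

    Assumes at most one rollover occurred between the reads.
    """
    count = x2 - x1
    if count < 0:              # the counter rolled over between start and stop
        count += 2 ** n_bits
    return count


print(elapsed_ticks(100, 250, 16))   # no rollover: 150
print(elapsed_ticks(65530, 5, 16))   # wrapped once: 6 ticks to reach 0, then 5 more = 11
```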
Timer Rollover
Time until rollover, by resolution Tc and counter width n:
Tc        n = 16     n = 32     n = 64
10 ns     655 us     43 s       58.5 centuries
1 us      65.5 ms    1.2 h      5,850 centuries
100 us    6.55 s     5 days     585,000 centuries
1 ms      1.1 min    50 days    5,850,000 centuries
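Each table entry is just $2^n \times T_c$. A quick Python check, assuming a century of 100 Julian years (365.25 days each) for the last column:

```python
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600  # assumes 100 Julian years


def rollover_seconds(n_bits: int, tc_seconds: float) -> float:
    """Time for an n-bit counter that ticks every Tc seconds to wrap: 2**n * Tc."""
    return (2 ** n_bits) * tc_seconds


for tc in (10e-9, 1e-6, 100e-6, 1e-3):
    t16, t32, t64 = (rollover_seconds(n, tc) for n in (16, 32, 64))
    print(f"Tc = {tc:g} s: n=16 -> {t16:.3g} s, n=32 -> {t32:.3g} s, "
          f"n=64 -> {t64 / SECONDS_PER_CENTURY:.3g} centuries")
```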
Timer Overhead
start_count = read_timer();
/* portion of program to be measured */
stop_count = read_timer();
elapsed_time = (stop_count - start_count) * clock_period;
Reading the timer: at least 1 memory read or subroutine call
Storing its value: at least 1 memory write or subroutine call
Done once at start, and again at stop
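Timer Overhead
Figure: measurement timeline
read_timer() is initiated; T1 later, the current time is actually read
T2 later, the event being measured begins
The event lasts T3; when it ends, read_timer() is initiated again
T4 later, the current time is actually read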
Timer Overhead
T1 = time to read the counter value
T2 = time to store the counter value
T3 = time of the event we are measuring
T4 = time to read the counter value
T4 = T1
Timer Overhead
$T_e$ = event time = $T_3$
But what is actually measured is
$T_m = T_2 + T_3 + T_4$
$T_e = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$
Timer overhead: $T_{ovhd} = T_1 + T_2$
Timer Overhead
If $T_e \gg T_{ovhd}$: ignore the timer overhead
If $T_e \approx T_{ovhd}$: measurements will be highly suspect
Potentially large variations in $T_{ovhd}$
Good rule of thumb
Te should be 100-1000× larger than Tovhd
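Because an empty interval ($T_3 = 0$) measures exactly $T_m = T_2 + T_4 = T_1 + T_2$, one rough way to estimate the overhead is to take back-to-back timer reads. A Python sketch, with `time.perf_counter_ns()` standing in for `read_timer()`; taking the minimum over many trials is our choice (meant to filter out interrupts and other noise), not a method given in the slides:

```python
import time


def timer_overhead_ns(trials: int = 1000) -> int:
    """Estimate T_ovhd by timing an empty interval with back-to-back reads."""
    best = None
    for _ in range(trials):
        x1 = time.perf_counter_ns()
        x2 = time.perf_counter_ns()    # nothing measured in between: T3 = 0
        diff = x2 - x1
        if best is None or diff < best:
            best = diff
    return best


ovhd = timer_overhead_ns()
print(f"estimated timer overhead: {ovhd} ns; "
      f"by the rule of thumb, time events of at least {100 * ovhd} ns")
```

The overhead itself varies from call to call, which is exactly why the rule of thumb asks for a large margin rather than a simple subtraction.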
Approximate Measures of Short Intervals
How to measure an event that is shorter than the resolution of the clock?
Cannot directly measure events with $T_e < T_c$
Overhead makes it hard to measure even when $T_e > nT_c$, with n a small integer
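Approximate Measures of Short Intervals
Figure: an event of duration Te shorter than the clock period Tc
Case 1: a clock tick falls within the event, so count + 1
Case 2: no clock tick falls within the event, so count + 0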
Approximate Measures of Short Intervals
Bernoulli experiment
Outcome = +1 with probability p
Outcome = +0 with probability (1 - p)
Equivalent to flipping a biased coin
Repeat n times
Approximates a binomial distribution
Only approximate, since the measurements cannot be guaranteed to be independent
Usually close enough in practice
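A small simulation can make the Bernoulli argument concrete. It assumes each measurement starts at a uniformly random phase relative to the clock (an idealization that makes the trials independent) and that $T_e < T_c$; the function names are illustrative.

```python
import random


def ticks_in(start: float, duration: float, tc: float) -> int:
    """Clock ticks (at multiples of tc) falling in the interval (start, start + duration]."""
    return int((start + duration) // tc) - int(start // tc)


def simulate(te: float, tc: float, n: int, seed: int = 1) -> float:
    # Each measurement starts at a random phase, so it is a Bernoulli trial:
    # +1 with probability te/tc, otherwise +0 (valid because te < tc).
    rng = random.Random(seed)
    m = sum(ticks_in(rng.uniform(0, tc), te, tc) for _ in range(n))
    return m / n * tc   # estimate of te from the fraction m/n


print(simulate(te=3.0, tc=10.0, n=100_000))  # close to 3.0
```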
Approximate Measures of Short Intervals
m = number of times Case 1 occurs (count + 1)
n = total number of measurements
Average duration is the fraction m/n of a clock period:
$T_e \approx \frac{m}{n} T_c$
Use a confidence interval for proportions
Example
Clock resolution = 10 us
n = 8764 measurements
m = 467 clock ticks counted
Case 1: 467 times
Case 2: 8297 times
95% confidence interval?
Example
$(c_1, c_2) = \frac{467}{8764} \mp 1.96\sqrt{\frac{\frac{467}{8764}\left(1 - \frac{467}{8764}\right)}{8764}} = (0.0486,\ 0.0580)$
Scale by the clock period = 10 us
95% confidence that the event's average duration lies in (0.49, 0.58) us
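The same interval can be recomputed with the normal approximation for a proportion, $p \pm z\sqrt{p(1-p)/n}$ with $z = 1.96$ for 95% confidence. A minimal Python sketch (the helper name `proportion_ci` is ours):

```python
import math
from typing import Tuple


def proportion_ci(m: int, n: int, z: float) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion p = m/n."""
    p = m / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


c1, c2 = proportion_ci(m=467, n=8764, z=1.96)   # 95% confidence
tc_us = 10.0                                     # clock period in microseconds
print(f"p in ({c1:.4f}, {c2:.4f}); Te in ({c1 * tc_us:.2f}, {c2 * tc_us:.2f}) us")
# prints: p in (0.0486, 0.0580); Te in (0.49, 0.58) us
```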
Profiling
Overall view of a program's execution-time behavior
Fraction of total time spent in specific states
Fraction of time in each subroutine
Fraction of time in the OS kernel
Fraction of time doing I/O
Find bottlenecks, code hot-spots
Optimize those sections first
Statistical Sampling
Select a random subset of a population
Gather information on only this subset
Extrapolate this information to the overall population
Results are a statistical summary with corresponding error probabilities
PC Sampling
Periodically interrupt program at fixed intervals
Record appropriate state information in interrupt
service routine
Post-process to obtain overall profile
PC Sampling
At each interrupt
Examine the PC on the return address stack
Use an address map to translate this PC to subroutine i
Increment array element H[i]
Example address map:
0-1298: Subr 1
1299-3455: Subr 2
3456-5567: Subr 3
5568-9943: Subr 4
PC = 4582 falls in Subr 3, so the histogram counter is updated: H[3] = H[3] + 1
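The PC-to-subroutine lookup is a binary search over the start addresses in the example map. A Python sketch using the standard `bisect` module; `on_timer_interrupt` is a hypothetical stand-in for the interrupt service routine, and `H[0]` is left unused to keep the 1-based numbering of the example:

```python
import bisect

SUBR_STARTS = [0, 1299, 3456, 5568]  # start address of Subr 1..4 from the example map
LAST_ADDR = 9943


def subroutine_of(pc: int) -> int:
    """Translate a sampled PC to a 1-based subroutine number using the address map."""
    if not 0 <= pc <= LAST_ADDR:
        raise ValueError(f"PC {pc} is outside the address map")
    # Number of start addresses <= pc, which is the 1-based subroutine index.
    return bisect.bisect_right(SUBR_STARTS, pc)


H = [0] * (len(SUBR_STARTS) + 1)  # histogram counters H[1..4]


def on_timer_interrupt(pc: int) -> None:
    # Interrupt service routine body: map the PC and bump its histogram counter.
    H[subroutine_of(pc)] += 1


on_timer_interrupt(4582)
print(H[3])  # 1
```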
PC Sampling
Figure: histogram of sample counts H[1] through H[12], one bar per subroutine (vertical axis from 0 to 140)
PC Sampling
n total interrupts
Post-processing step
H[i]/n = fraction of time executing in subroutine i
H[i] × (interrupt period) = (H[i]/n) × (total execution time) = time in subroutine i
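The post-processing step, sketched in Python with hypothetical histogram counts; it assumes every interrupt was attributed to some subroutine, so n is the sum of all H[i]:

```python
def profile(h: dict, interrupt_period_s: float) -> dict:
    """Turn histogram counts into (fraction of samples, estimated seconds) per subroutine."""
    n = sum(h.values())   # total number of interrupts
    return {sub: (count / n, count * interrupt_period_s) for sub, count in h.items()}


# Hypothetical counts from a run sampled every 10 ms.
hist = {"Subr 1": 50, "Subr 2": 30, "Subr 3": 15, "Subr 4": 5}
for sub, (frac, secs) in profile(hist, 0.010).items():
    print(f"{sub}: {frac:.0%} of samples, about {secs:.2f} s")
```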
PC Sampling
This is a statistical process
Different counts each time the experiment is
performed
Infer behavior of entire program from small
sample
Apply confidence intervals to quantify
precision of results
Example
40 us interrupt
36,128 interrupts in subroutine A
Program runs for 10 seconds
Time in this subroutine?
90% confidence interval
m = 36,128
n = 10 s / 40 us = 250,000
p = m/n = 0.144512
Example
$(c_1, c_2) = 0.144512 \mp 1.645\sqrt{\frac{0.144512(0.855488)}{250000}} = (0.1434,\ 0.1457)$
90% chance that the program spent 14.3-14.6% of its time (about 1.43-1.46 s of the 10 s run) in subroutine A
Example
10 ms interrupt
12 interrupts in subroutine A
n = 800 samples
8 seconds total execution time
Time in this subroutine?
99% confidence interval
p = m/n = 0.015
Example
$(c_1, c_2) = 0.015 \mp 2.576\sqrt{\frac{0.015(1 - 0.015)}{800}} = (0.0039,\ 0.0261)$
99% chance that the program spent 31-209 ms in subroutine A
A pretty wide range! But it is based on only 12 samples
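Both profiling intervals can be recomputed with the normal approximation for a proportion. A Python sketch (the helper name is ours), using z = 1.645 for 90% and z = 2.576 for 99% confidence:

```python
import math


def proportion_ci(m, n, z):
    """Normal-approximation confidence interval (c1, c2) for the proportion p = m/n."""
    p = m / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Example 1: 40 us interrupt, 10 s run, 36,128 samples in A, 90% confidence.
n1 = round(10 / 40e-6)                         # 250,000 samples
a_lo, a_hi = proportion_ci(36_128, n1, 1.645)
print(f"Example 1: {a_lo:.2%} to {a_hi:.2%} of run time "
      f"({a_lo * 10:.2f} to {a_hi * 10:.2f} s)")

# Example 2: 10 ms interrupt, 8 s run, 12 of 800 samples in A, 99% confidence.
b_lo, b_hi = proportion_ci(12, 800, 2.576)
print(f"Example 2: {b_lo * 8000:.0f} to {b_hi * 8000:.0f} ms")
```

The second interval is wide mainly because p is estimated from very few hits; the half-width shrinks only as $1/\sqrt{n}$.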
Reducing the Confidence Interval Size
Use a lower confidence level
Obtain more samples
Run the program longer
May not be possible
Increase the sample rate
May be fixed by the system
Will increase overhead and perturbation
Run multiple times and add the samples from each run
PC Sampling
Interrupts must occur asynchronously with respect to any program events
Samples must be independent of each other
Otherwise, events synchronous with the interrupt are over- or under-sampled
Periodic versus random sampling
Basic Block Counting
Basic block
Sequence of instructions with no branches into it (except to its first instruction) and no branches out of it (except from its last)
When the first instruction is executed, all instructions in the block are guaranteed to be executed
Single entry, single exit
Basic Block Counting
Generate a program profile by inserting
additional instructions in each block
Increment a unique counter each time a block is
entered
Produces a histogram of program execution
Can post-process to find instruction execution
frequencies
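A hand-instrumented sketch of basic block counting in Python. Real tools insert the increments into the compiled code automatically; here the function `clamp_sum`, its block labels, and the instructions-per-block figures are all hypothetical, chosen only to show the counting and the post-processing into instruction frequencies:

```python
from collections import Counter

block_counts = Counter()  # one counter per basic block, keyed by a block label


def clamp_sum(values, limit):
    """Hypothetical instrumented function: each block bumps its counter on entry."""
    block_counts["B0"] += 1          # entry block
    total = 0
    for v in values:
        block_counts["B1"] += 1      # loop body up to the branch on v > limit
        if v > limit:
            block_counts["B2"] += 1  # taken side of the branch
            v = limit
        block_counts["B3"] += 1      # join point: accumulate
        total += v
    block_counts["B4"] += 1          # exit block
    return total


result = clamp_sum([1, 5, 9, 2], limit=4)

# Post-processing: block count times (hypothetical) instructions per block
# gives a dynamic instruction count.
INSTRS_PER_BLOCK = {"B0": 2, "B1": 3, "B2": 1, "B3": 2, "B4": 1}
dynamic_instrs = sum(block_counts[b] * k for b, k in INSTRS_PER_BLOCK.items())
print(dict(block_counts), dynamic_instrs)
```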
Comparison
                PC sampling                   Basic block counting
Output          Statistical estimate          Exact count
Overhead        Interrupt service routine     Extra instructions per block
Perturbation    Randomly distributed          High
Repeatability   Within statistical variance   Perfect
Event Tracing
Profile shows overall frequency-of-execution behavior
Ignores time-ordering of events
Program trace
Dynamic list of events generated by the program
Events = anything you want to instrument
Sequence of memory addresses
I/O blocks accessed
Typically used to drive a simulator
Trace Generation
Figure: the application program, modified to generate a trace, feeds a compress step; the trace is later uncompressed and passed to the trace consumer
Trace Generation
Figure: the trace from the modified application program reaches the trace consumer either through compress and uncompress steps or directly (online trace consumption)
Trace Generation
Source-code modification
Allows precise control of what events are traced and what data is recorded
Typically a manual process
Figure: source code → compiler → object code → processor (Proc) → trace
Trace Generation
Software exceptions
HW forces an exception before each instruction
Exception routine decodes instruction
Store instr type, PC, operand addresses, etc.
Trace bit in many processors
Tremendous slowdown
Trace Generation
Emulation
Make a system appear to be something else
Modify the emulator to generate a trace
E.g. Java Virtual Machine
Trace Generation
Microcode modification
Modify instruction execution directly
Allows tracing of all instructions
Including the operating system
Depends on access to lower levels of the processor
E.g. Transmeta Crusoe processor
Trace Generation
Compiler modification
Insert trace code directly in object file
Requires access to the compiler itself
Or write a post-compilation binary editor/rewriting tool
Trace Data
Tracing generates a tremendous volume of data
Tracing 100,000,000 instructions/sec
16 bits of data per event
190 Mbytes of data per second
11 Gbytes per minute
Huge perturbations
Due to tracing code
Time to store trace data
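The data rates above follow from simple arithmetic if Mbytes and Gbytes are read as binary units ($2^{20}$ and $2^{30}$ bytes), which is our assumption:

```python
events_per_sec = 100_000_000   # traced instructions per second
bits_per_event = 16

bytes_per_sec = events_per_sec * bits_per_event // 8   # 2 x 10**8 bytes
mib_per_sec = bytes_per_sec / 2**20                    # binary megabytes per second
gib_per_min = bytes_per_sec * 60 / 2**30               # binary gigabytes per minute
print(f"{mib_per_sec:.1f} MiB/s, {gib_per_min:.1f} GiB/min")  # 190.7 MiB/s, 11.2 GiB/min
```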
Trace Data Compression
Standard compression algorithms applied as the trace is written to disk
Uncompress when reading
Typical reduction: 20-70%
Tradeoff is compress-uncompress time
Figure: the trace from the modified application program is compressed, then uncompressed for the trace consumer
Online Trace Consumption
Use trace data as it is generated
Never stored on disk
Multitasking may lead to non-deterministic behavior
Repeatability issue
Before-and-after comparison tests
Is a difference due to a change in the system or a change in the trace?
Becomes a statistical comparison with n runs
Figure: the modified application program feeds the trace consumer directly (online trace consumption)
Abstract Execution
Use higher-level information to intelligently compress trace info
Two-step process
Compiler-style analysis to find a critical subset of the trace
Store only control flow information sufficient to reconstruct the trace later
Produce trace-regeneration code for subsequent use of the trace
Abstract Execution
1. if (i > 5)
2. then a = a + i;
3. else b = b + i;
4. i = i + 1;
Trace will be either 1-2-4 or 1-3-4
Store only 2 or 3
Combine with the compiler-generated control flow graph (1 → 2 or 3 → 4) to regenerate the trace
Slowdown = 2-10x
Compression = 10-100x
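A Python sketch of the two halves of abstract execution for this four-statement example: the recording side keeps only which of statements 2 or 3 executed, and the regeneration side walks the control flow graph to rebuild the full trace. The CFG encoding and function names are illustrative assumptions:

```python
# Control flow graph of the example: successors of each numbered statement.
CFG = {1: (2, 3), 2: (4,), 3: (4,), 4: ()}


def record(i_values):
    """Run the example once per i value, saving only the branch outcome (2 or 3)."""
    return [2 if i > 5 else 3 for i in i_values]   # statement 1 picks 2 or 3


def regenerate(outcomes):
    """Rebuild the full statement trace from the saved outcomes plus the CFG."""
    trace = []
    for taken in outcomes:
        node = 1
        while True:
            trace.append(node)
            succ = CFG[node]
            if not succ:                  # statement 4 ends this execution
                break
            node = taken if len(succ) > 1 else succ[0]
    return trace


saved = record([7, 2])
print(saved)               # [2, 3]
print(regenerate(saved))   # [1, 2, 4, 1, 3, 4]
```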
Trace Sampling
Save only subsequences of the overall trace
Drive the simulator with the samples
Results should be statistically similar to driving it with the complete trace
One sample = k consecutive events
Sampling interval = P (period)
Figure: samples of k consecutive events, repeating with period P
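A sketch of periodic trace sampling in Python, assuming P is measured from the start of one sample to the start of the next (so a fraction k/P of the trace is kept):

```python
def sample_trace(trace, k: int, period: int):
    """Keep k consecutive events at the start of every period-long stretch of the trace."""
    if not 0 < k <= period:
        raise ValueError("need 0 < k <= period")
    return [trace[start:start + k] for start in range(0, len(trace), period)]


full = list(range(20))               # stand-in for a trace of 20 events
samples = sample_trace(full, k=3, period=8)
print(samples)                       # [[0, 1, 2], [8, 9, 10], [16, 17, 18]]
```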
SimPoint
Find representative program samples
Match basic block execution frequencies
Clustering tool to automate process
Perform detailed timing simulation on only these samples
Fast-forward (functional simulation) between samples [Sherwood et al, ASPLOS, 2002]
SimPoint
Weight each sample's result by its execution frequency to produce the overall result
A relatively small number (10s) of SimPoints produced 3% error in IPC on SPEC
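A sketch of how per-SimPoint results might be combined. It assumes equal-length intervals measured in instructions, in which case CPI (not IPC) averages linearly with the weights, so the weighted CPI is converted to IPC at the end; the weights and CPIs below are hypothetical:

```python
def combine_simpoints(points):
    """points: list of (weight, cpi) pairs; weight = fraction of execution a SimPoint represents."""
    total_weight = sum(w for w, _ in points)
    cpi = sum(w * c for w, c in points) / total_weight   # weighted CPI
    return 1.0 / cpi                                      # overall IPC


# Hypothetical: three SimPoints representing 50%, 30% and 20% of the execution.
print(combine_simpoints([(0.5, 1.0), (0.3, 2.0), (0.2, 4.0)]))  # about 0.526
```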
SMARTS
Uses systematic sampling
Fixed sample interval
Apply statistical sampling techniques to determine j, k, P
Figure: timeline alternating functional simulation and detailed simulation, annotated with k, j, and the period P
Indirect Ad Hoc Techniques
Sometimes the desired metric cannot be measured directly
Use your creativity to measure one thing and then derive/infer the desired value
Example: System Load
What is system load?
Number of jobs in the run queue?
Number of jobs actively time-sharing?
Fraction of time the processor is not in the idle loop?
Others?
How to measure it?
Modify the OS
PC sampling
Indirect?
Example
Let the system run for a fixed time T
Note the value of the counter
Figure: with only the monitor program running, its counter reaches n in time T
Example
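Let the system run for a fixed time T
Compare the monitor's counter value on the loaded system to its count value on the unloaded system
Figure: in time T, the monitor alone counts to n; sharing the processor with App 1 it counts to n/2; sharing with App 1 and App 2 it counts to n/3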
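A Python sketch of the indirect load estimate. `run_monitor` is a hypothetical monitor program that just increments a counter for a fixed time; the estimate assumes round-robin time sharing with equal shares, so with k other runnable jobs the monitor gets 1/(k + 1) of the processor:

```python
import time


def run_monitor(seconds: float) -> int:
    """Hypothetical monitor: count loop iterations for a fixed wall-clock time T."""
    count = 0
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        count += 1
    return count


def competing_jobs(unloaded_count: float, loaded_count: float) -> float:
    """Estimate how many other jobs share the CPU: loaded = unloaded / (k + 1)."""
    return unloaded_count / loaded_count - 1


alone = run_monitor(0.05)            # baseline count on this machine, for T = 50 ms
n = 1_200_000                        # illustrative unloaded count
print(competing_jobs(n, n / 2))      # App 1 running       -> 1.0
print(competing_jobs(n, n / 3))      # App 1 and App 2     -> 2.0
```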
Perturbation
To obtain more information (higher resolution)
Use more instrumentation points
More instrumentation points → greater perturbation
Perturbation
Computer performance measurement uncertainty principle
Accuracy is inversely proportional to resolution
Figure: accuracy (vertical axis, low to high) versus resolution (horizontal axis, low to high); accuracy falls as resolution rises
Perturbation
Superposition does not work here
Non-linear
Non-additive
Doubling the instrumentation does not simply double its impact on performance
Some instrumentation effects cancel out
Some multiply the impact
No way to predict!
Instrumentation Code
Changes memory access patterns
Affects memory banking optimizations
Generates additional load/store instructions
More frequent cache flushes and replacements
But may reduce set associativity conflicts
Generates more I/O operations
Will increase overall execution time
More time-sharing context switches
Alters virtual memory paging behavior
Important Points
Event types
Simple counts of primary event
Secondary events triggered by some primary
event
Overall profiles
Important Points
Measurement strategies
Event-driven
Tracing
Sampling
Indirect approaches
Important Points
Interval timers
Stopwatch functionality
Rollover problem
Overhead
Quantization errors
Statistical measures of short intervals
Important Points
Profiling
PC sampling
Statistical view
Basic block counting
Exact behavior
High overhead and perturbation
Important Points
Trace generation
Source code modification
Force exceptions
Emulation
Microcode modification
Compiler modification
Object code editor
Online trace consumption
Trace sampling
Important Points
Indirect measurements when all else fails
System load example
Perturbations
Nobody likes them
Have to learn to live with them