Ch6 Measurement


Copyright 2004 David J. Lilja

Measurement tools and techniques

Fundamental strategies

Interval timers

Program profiling

Tracing

Indirect measurement




Events

Most measurement tools are based on events

Some predefined change to system state

Definition depends on the metric being measured

Memory reference

Disk access

Change in a register's state

Network message

Processor interrupt




Event Classification

Count metrics

The number of times event X occurs

Number of cache misses

Number of I/O operations





Secondary-event metrics

Record a value when triggered by some event

Record block size for each I/O operation

Count number of operations

Find average I/O transfer size
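
The I/O example can be sketched in Python as follows. This is only an illustration: the class name IoSizeMonitor, the method on_io, and the block sizes are hypothetical and not part of the original slides. Each I/O event triggers a handler that records the block size; the count and the average transfer size are derived afterwards.

```python
class IoSizeMonitor:
    """Records the block size each time an I/O event fires (hypothetical helper)."""

    def __init__(self):
        self.count = 0          # count metric: number of I/O operations
        self.total_bytes = 0    # running sum of the recorded block sizes

    def on_io(self, block_size):
        # Called once per I/O event; records the secondary value.
        self.count += 1
        self.total_bytes += block_size

    def average_size(self):
        # Average I/O transfer size; 0.0 if no events were seen.
        return self.total_bytes / self.count if self.count else 0.0


monitor = IoSizeMonitor()
for size in (512, 4096, 4096, 8192):   # made-up block sizes in bytes
    monitor.on_io(size)
```

After the loop, monitor.count holds the count metric and monitor.average_size() the derived secondary-event metric.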





Profiles

Characterization of overall behavior

Aggregate/big-picture view of an application program

Time spent in each function




Event-Driven Strategies

Record necessary information only when the selected event occurs

Modify system to record event

Dump data when program terminates

May need intermediate dumps also

E.g. a simple counter in the page fault routine





System overhead

Only when the event of interest actually occurs

Infrequent events: little perturbation

Frequent events: high perturbation

No longer typical behavior?

Perturbation changes system being measured





Inter-event time is unpredictable

Depends on when events actually occur

Makes it hard to estimate perturbation

How long to measure?

Event-driven measurement tools

Good for low-frequency events





Figure: event-driven counting. Counts 8 events exactly (+1 at each event).




Tracing

Similar to event-driven

But record additional system state

Event has occurred: count

Additional information to uniquely identify event

E.g. addresses that cause page faults

Overhead

Additional memory or disk storage

Time to save state

Relatively large system perturbation




Tracing

Figure: tracing. Counts 8 events plus extra data (+1 and an address, Addr, recorded at each event).




Sampling

Record necessary state at fixed time intervals

Overhead

Independent of specific event frequency

Depends on sampling frequency

Misses some events

Produces statistical summary

May miss infrequent events

Each replication will produce different results




Sampling

Figure: sampling. Counts 3 events out of 5 samples (+1 +1 +1).




Comparisons

              Event count    Tracing          Sampling
Resolution    Exact count    Detailed info    Statistical summary
Overhead      Low            High             Constant
Perturbation  ~ #events      High             Fixed




Comparison

Event counting

Best for low frequency events

Required if exact counts needed

Sampling

Best for high frequency events

If statistical summary is adequate

Tracing

When additional detail is required




Indirect Measurements

Used when desired metric is not directly accessible

Measure one thing directly

Derive or deduce desired metric

Highly dependent on creativity of performance analyst




Interval Timers

Fundamental tool of performance measurement

Measure execution time of any portion of a program

Provide time basis for sampling




Interval Timers

Actually count clock pulses between two events

Event 1: x1 = counter value

Event 2: x2 = counter value

Tc = clock period

Te = (x2 - x1) Tc




Using an Interval Timer

Within an application program

start_count = read_timer();

/* Portion of program to be measured */

stop_count = read_timer();

elapsed_time = (stop_count - start_count) * clock_period;
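
The same start/stop pattern can be sketched in Python. This is a sketch under assumptions: time.perf_counter_ns() from the standard library stands in for read_timer(), the helper name measure is hypothetical, and the counter reports nanoseconds even though the underlying clock may tick more coarsely.

```python
import time


def measure(fn, *args):
    """Return (result, elapsed_seconds) for one call of fn(*args)."""
    start_count = time.perf_counter_ns()   # read timer at start
    result = fn(*args)                     # portion of program to be measured
    stop_count = time.perf_counter_ns()    # read timer at stop
    clock_period = 1e-9                    # the counter is expressed in nanoseconds
    return result, (stop_count - start_count) * clock_period


result, elapsed = measure(sum, range(100_000))
```

The elapsed value includes the cost of the timer reads themselves, which matters when the measured region is short.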




Hardware Timer

Figure: Clock (Tc) -> n-bit counter -> to CPU input port

Te = (x2 - x1) Tc




Software Timer

Figure: Clock (Tc) -> Prescaler (divide-by-n) -> CPU interrupt input (labeled Tc in the figure) -> Software counter

Te = (x2 - x1) Tc




Quantization Errors

Figure: the same event measured against the clock at two different alignments.

(a) Interval timer reports event duration of n = 13 clock ticks.

(b) Interval timer reports event duration of n = 14 clock ticks.




Quantization Error

Timer resolution leads to quantization error

Repeated measurements: nTc < Te < (n+1)Tc

Te is rounded up or down to a whole number of clock ticks

Completely unpredictable rounding

Want Tc to be as small as possible




Timer Rollover

n-bit counter

count = [0, 2^n - 1]

Rollover = transition from (2^n - 1) to 0

If rollover occurs between start/stop events

Then count = (x2 - x1) < 0

Check for count < 0

Measure again

Add 2^n to count
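
The "add 2^n to count" correction can be sketched as below. The function name elapsed_ticks is hypothetical, and the sketch assumes that at most one rollover happened between the two reads; a longer interval cannot be recovered from x1 and x2 alone.

```python
def elapsed_ticks(x1, x2, n_bits):
    """Ticks between two reads of an n-bit counter, assuming at most one rollover."""
    count = x2 - x1
    if count < 0:               # rollover occurred between start and stop
        count += 1 << n_bits    # add 2^n to the count
    return count
```

For a 16-bit counter read at 65530 and then at 4, the raw difference is negative and the corrected count is 10 ticks.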




Timer Rollover

Time until rollover, by resolution and counter width n

Resolution (Tc)   n = 16     n = 32     n = 64
10 ns             655 us     43 s       58.5 centuries
1 us              65.5 ms    1.2 h      5,850 centuries
100 us            6.55 s     5 days     585,000 centuries
1 ms              1.1 min    50 days    5,850,000 centuries
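
The table entries follow from the time to rollover, 2^n * Tc. A small sketch for checking individual cells, where the function name rollover_seconds and the 365.25-day year are assumptions made for this illustration:

```python
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600   # assumes a 365.25-day year


def rollover_seconds(n_bits, tc_seconds):
    """Time until an n-bit counter with tick period Tc rolls over."""
    return (2 ** n_bits) * tc_seconds


t16 = rollover_seconds(16, 10e-9)                          # 16 bits at 10 ns, in seconds
t32_hours = rollover_seconds(32, 1e-6) / 3600              # 32 bits at 1 us, in hours
t64_centuries = rollover_seconds(64, 10e-9) / SECONDS_PER_CENTURY
```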




Timer Overhead

start_count = read_timer();

/* Portion of program to be measured */

stop_count = read_timer();

elapsed_time = (stop_count - start_count) * clock_period;

To access timer

Min of 1 memory read, subroutine call

Min of 1 memory write, subroutine call

Once at start, again at stop



Timer Overhead

Figure: timeline divided into intervals T1, T2, T3, T4, with these labeled points:

Event begins; initiate read_timer()

Current time actually read

Event being measured begins

Event ends; initiate read_timer()




Timer Overhead

T1 T2 T3 T4

T1 = time to read counter value

T2 = time to store counter value

T3 = time of the event we are measuring

T4 = time to read counter value

T4 = T1




Timer Overhead

T1 T2 T3 T4

Te = event time = T3

But actually measured

Tm = T2 + T3 + T4

Te = Tm - (T2 + T4) = Tm - (T1 + T2)

Timer overhead = Tovhd = T1 + T2




Timer Overhead

If Te >> Tovhd

Ignore the timer overhead

If Te ≈ Tovhd

Measurements will be highly suspect

Potentially large variations in Tovhd

Good rule of thumb

Te should be 100-1000 times larger than Tovhd
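
One rough way to apply the rule of thumb is to estimate Tovhd from back-to-back timer reads and compare it with the measured interval. The sketch below is an approximation under assumptions: the gap between two consecutive time.perf_counter_ns() calls is used as a stand-in for T1 + T2, and the function names are hypothetical.

```python
import time


def timer_overhead_ns(trials=10_000):
    """Smallest observed gap between two back-to-back timer reads, in ns."""
    best = None
    for _ in range(trials):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gap = t1 - t0
        if best is None or gap < best:
            best = gap
    return best


def long_enough(te_ns, tovhd_ns, factor=100):
    """Rule of thumb: Te should be at least `factor` times Tovhd."""
    return te_ns >= factor * tovhd_ns
```

On a clock with coarse resolution the smallest gap can come out as 0, in which case the estimate says nothing about the true overhead.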




Approximate Measures of Short Intervals

How to measure an event that is shorter than the resolution of the clock?

Cannot directly measure events with Te < Tc

Overhead makes it hard to measure even when Te > nTc, where n is a small integer





Short Intervals

Figure: an event of duration Te < Tc placed against clock ticks of period Tc.

Case 1: Count + 1

Case 2: Count + 0





Short Intervals

Bernoulli experiment

Outcome = +1 with probability p

Outcome = +0 with probability (1-p)

Equivalent to flipping a biased coin

Repeat n times

Approximates a binomial distribution

Only approximate since each measurement cannot be guaranteed to be independent

Usually close enough in practice





Short Intervals

m = number of times Case 1 occurs (Count + 1)

n = total number of measurements

Average duration is the ratio m/n, scaled by the clock period

Te = (m/n) Tc

Use confidence interval for proportions




Example

Clock resolution = 10 us

n = 8764 measurements

m = 467 clock ticks counted

95% confidence interval

Case 1 (Count + 1): 467

Case 2 (Count + 0): 8297




Example

$$(c_1, c_2) = \frac{467}{8764} \mp 1.96 \sqrt{\frac{\frac{467}{8764}\left(1 - \frac{467}{8764}\right)}{8764}} = (0.0486, 0.0580)$$

Scale by clock period = 10 us

95% chance that the measured event duration lies in (0.49, 0.58) us
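
The calculation in this example can be reproduced with a short Python sketch. The function name short_interval_ci is hypothetical; it uses the normal approximation for a proportion, with z = 1.96 for a 95% interval.

```python
import math


def short_interval_ci(m, n, tc, z):
    """Confidence interval for Te = (m/n) * Tc using the normal approximation."""
    p = m / n                                    # fraction of measurements that counted +1
    half_width = z * math.sqrt(p * (1 - p) / n)  # half-width of the interval on p
    return (p - half_width) * tc, (p + half_width) * tc


# Values from the example; tc is in microseconds, so the result is too.
low, high = short_interval_ci(m=467, n=8764, tc=10.0, z=1.96)
```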




Profiling

Overall view of program's execution-time behavior

Fraction of total time spent in specific states

Fraction of time in each subroutine

Fraction of time in OS kernel

Fraction of time doing I/O

Find bottlenecks, code hot-spots

Optimize those sections first




Statistical Sampling

Select a random subset of a population

Gather information on only this subset

Extrapolate this information to overall population

Results are a statistical summary with corresponding error probabilities




PC Sampling

Periodically interrupt program at fixed intervals

Record appropriate state information in interrupt service routine

Post-process to obtain overall profile




PC Sampling

At each interrupt

Examine PC on return address stack

Use address map to translate this PC to subroutine i

Increment array element H[i]

Address map

0-1298: Subr 1

1299-3455: Subr 2

3456-5567: Subr 3

5568-9943: Subr 4

PC: 4582

Histogram counters: H[3] = H[3] + 1
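
The lookup and increment can be sketched in Python using the address map from the slide. The names STARTS, END, subroutine_index, and on_interrupt are hypothetical, and a real profiler would perform this step inside the interrupt service routine rather than in ordinary application code.

```python
import bisect

# Start address of each subroutine, taken from the address map on the slide.
STARTS = [0, 1299, 3456, 5568]
END = 9943                       # last address covered by the map


def subroutine_index(pc):
    """Return i (1-based) such that pc falls in Subr i, or None if unmapped."""
    if pc < STARTS[0] or pc > END:
        return None
    return bisect.bisect_right(STARTS, pc)


H = [0] * (len(STARTS) + 1)      # histogram counters H[1]..H[4]; H[0] is unused


def on_interrupt(pc):
    """Increment the histogram counter for the subroutine containing pc."""
    i = subroutine_index(pc)
    if i is not None:
        H[i] += 1


on_interrupt(4582)               # the sampled PC from the slide lands in Subr 3
```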




PC Sampling

Figure: histogram of PC sample counts for H[1] through H[12] (vertical axis 0 to 140).




PC Sampling

n total interrupts

Post-processing step

H[i]/n = fraction of time executing in subroutine i

(H[i]/n) * (total execution time) = H[i] * (interrupt period) = time in each subroutine




PC Sampling

This is a statistical process

Different counts each time the experiment is performed

Infer behavior of entire program from small sample

Apply confidence intervals to quantify precision of results




Example

40 us interrupt

36,128 interrupts in subroutine A

Program runs for 10 seconds

Time in this subroutine?

90% confidence interval

m = 36,128

n = 10 sec / 40 us = 250,000

p = m/n = 0.144




Example

$$(c_1, c_2) = 0.144512 \mp 1.645 \sqrt{\frac{0.144512\,(0.855488)}{250000}} = (0.143, 0.146)$$

90% chance that the program spent 14.3-14.6% of its time in subroutine A




Example

10 ms interrupt

12 interrupts in subroutine A

n = 800 samples

8 seconds total execution time

Time in this subroutine?

99% confidence interval

p = m/n = 0.015




Example

$$(c_1, c_2) = 0.015 \mp 2.576 \sqrt{\frac{0.015\,(1 - 0.015)}{800}} = (0.0039, 0.0261)$$

99% chance that the program spent 31-210 ms in subroutine A

A pretty wide range! But only




Reducing the Interval Size

Use a lower confidence level

Obtain more samples

Run program longer

May not be possible

Increase sample rate

May be fixed by system

Will increase overhead and perturbation

Run multiple times and add samples from each run
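
The effect of these options on the interval can be illustrated numerically. The sketch below reuses p = 0.015 and n = 800 from the preceding example and assumes, for illustration only, that the observed proportion stays the same when more samples are collected; the function name half_width is hypothetical.

```python
import math


def half_width(p, n, z):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


base = half_width(0.015, 800, 2.576)        # 99% interval with 800 samples
more = half_width(0.015, 4 * 800, 2.576)    # four times as many samples
lower = half_width(0.015, 800, 1.645)       # 90% confidence instead of 99%
```

Quadrupling the number of samples halves the interval width, because the width shrinks with the square root of n.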




PC Sampling

Interrupts must occur asynchronously w.r.t. any program events

Samples must be independent of each other

Else over/under-sample events synchronous with interrupt

Periodic versus random sampling




Basic Block Counting

Basic block

Sequence of instructions with no branches into or out of the block

When first instruction is executed, guaranteed that all instructions in block will be executed

Single entry, single exit




Basic Block Counting

Generate a program profile by inserting additional instructions in each block

Increment a unique counter each time a block is entered

Produces a histogram of program execution

Can post-process to find instruction execution frequencies
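
The idea can be imitated by hand at the source level. In the sketch below a toy function is instrumented manually with one counter per block; the function, the block labels, and the input are hypothetical, and Python statements only approximate machine-level basic blocks, which a real tool would instrument in the compiler or in the binary.

```python
from collections import Counter

block_counts = Counter()   # one counter per instrumented block


def sum_of_evens(values):
    """Toy function with a counter increment inserted at the top of each block."""
    block_counts["B1_entry"] += 1
    total = 0
    for v in values:
        block_counts["B2_loop_body"] += 1
        if v % 2 == 0:
            block_counts["B3_even"] += 1
            total += v
    block_counts["B4_exit"] += 1
    return total


result = sum_of_evens([1, 2, 3, 4])
```

The counts are exact and repeat identically for the same input, at the cost of extra work in every block.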




Comparison

               PC sampling                    Basic block counting
Output         Statistical estimate           Exact count
Overhead       Interrupt service routine      Extra instructions per block
Perturbation   Randomly distributed           High
Repeatability  Within statistical variance    Perfect




Event Tracing

Profile shows overall frequency-of-execution behavior

Ignores time-ordering of events

Program trace

Dynamic list of events generated by program

Events = anything you want to instrument

Sequence of memory addresses

I/O blocks accessed

Typically used to drive a simulator




Trace Generation

Figure: Application program -> Compress -> Uncompress -> Trace consumer. Modify the application program to generate the trace.




Trace Generation

Figure: Application program -> Compress -> Uncompress -> Trace consumer; online trace consumption.





Trace Generation

Source-code modification

Allows precise control of what events are traced and what data is recorded

Typically a manual process

Figure: Source code -> Compiler -> Object code -> Proc -> Trace




Trace Generation

Software exceptions

HW forces an exception before each instruction

Exception routine decodes instruction

Store instr type, PC, operand addresses, etc.

Trace bit in many processors

Tremendous slowdown



Trace Generation

Emulation

Make a system appear to be something else

Modify emulator to generate trace

E.g. Java Virtual Machine





Trace Generation

Microcode modification

Modify instruction execution directly

Allows tracing of all instructions

Including operating system

Depends on access to lower levels of the processor

E.g. Transmeta Crusoe processor





Trace Generation

Compiler modification

Insert trace code directly in object file

Requires access to the compiler itself



Trace Generation


Insert trace code directly in object file

Requires access to the compiler itself

Write post-compilation binary editor/rewrite tool



Trace Data

Tracing generates a tremendous volume of data

Trace 100,000,000 instrs/sec

16 bits of data per event

190 Mbytes of data per second

11 Gbytes per minute

Huge perturbations

Due to tracing code

Time to store trace data
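
The data-rate figures above can be checked with a few lines of arithmetic. This sketch assumes binary units (1 Mbyte = 2^20 bytes, 1 Gbyte = 2^30 bytes), which is the interpretation that reproduces the numbers on the slide; the variable names are chosen for illustration.

```python
events_per_second = 100_000_000
bytes_per_event = 16 // 8                  # 16 bits of data per event

bytes_per_second = events_per_second * bytes_per_event
mbytes_per_second = bytes_per_second / 2**20          # about 190.7
gbytes_per_minute = bytes_per_second * 60 / 2**30     # about 11.2
```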




Trace Data Compression

Standard compression algorithms as trace is written to disk

Uncompress when reading

Typical reduction 20-70%

Tradeoff is compress-uncompress time

Figure: Application program -> Compress -> Uncompress -> Trace consumer





Online Trace Consumption

Use trace data as it is generated

Never stored on disk

Multitasking may lead to non-deterministic behavior

Repeatability issue

Before-and-after comparison tests

Difference due to change in system or change in trace?

Becomes statistical comparison with n runs

Figure: Application program -> Trace consumer (online trace consumption)





Abstract Execution

Use higher-level information to intelligently compress trace info

Two-step process

Compiler-style analysis to find critical subset of trace

Store only control flow information sufficient to reconstruct trace later

Produce trace-regeneration code for subsequent use of trace




Abstract Execution

Trace will be either

1-2-4

1-3-4

Store only 2 or 3

Combine with compiler-generated control flow graph to regenerate trace

Slowdown = 2-10x

Compress = 10-100x

1. if (i > 5)

2. then a = a + i;

3. else b = b + i;

4. i = i + 1;

Figure: control flow graph. Block 1 (if (i > 5)) branches to block 2 (a = a + i) or block 3 (b = b + i); both continue to block 4 (i = i + 1).
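
The regeneration step for this four-statement example can be sketched as follows. The dictionary CFG and the functions compress and regenerate are hypothetical names; the sketch covers a single pass through the code and is not the trace-regeneration code a real abstract execution tool would emit.

```python
# Control flow graph of the four-statement example: block -> possible successors.
CFG = {1: (2, 3), 2: (4,), 3: (4,), 4: ()}


def compress(trace):
    """Keep only the blocks chosen at branches that have more than one successor."""
    kept = []
    for prev, cur in zip(trace, trace[1:]):
        if len(CFG[prev]) > 1:
            kept.append(cur)
    return kept


def regenerate(decisions, start=1):
    """Rebuild the full trace from the stored branch decisions and the CFG."""
    decisions = iter(decisions)
    trace, block = [start], start
    while CFG[block]:                      # stop at a block with no successors
        successors = CFG[block]
        # Only a real branch consumes a stored decision; otherwise the path is forced.
        block = next(decisions) if len(successors) > 1 else successors[0]
        trace.append(block)
    return trace
```

Storing the single value 2 or 3 is enough to recover the whole trace 1-2-4 or 1-3-4.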




Trace Sampling

Save only subsequences of overall trace

Drive simulator with samples

Results should be statistically similar to driving with complete trace

One sample = k consecutive events

Sampling interval = P (period)

Figure: two samples of k events each, separated by the sampling interval P
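
Selecting the samples from a stored trace can be sketched as below. The function name sample_trace is hypothetical, and the sketch assumes that the period P is measured from the start of one sample to the start of the next.

```python
def sample_trace(trace, k, period):
    """Return samples of k consecutive events, one sample starting every `period` events."""
    if not 0 < k <= period:
        raise ValueError("need 0 < k <= period")
    # Only start a sample where k full events remain in the trace.
    return [trace[i:i + k] for i in range(0, len(trace) - k + 1, period)]


# Toy trace of 20 numbered events, sampled with k = 3 and P = 8.
samples = sample_trace(list(range(20)), k=3, period=8)
```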




SimPoint

Find representative program samples

Match basic block execution frequencies

Clustering tool to automate process

Perform detailed timing simulation on only these samples

Fast-forward (functional simulation) between samples

[Sherwood et al., ASPLOS, 2002]




SimPoint

Weight each sample's result by execution frequency to produce overall result

Relatively small number (10s) of SimPoints produced 3% error in IPC on SPEC




SMARTS

Uses systematic sampling

Fixed sample interval

Apply statistical sampling techniques to determine j, k, P

Figure: timeline with period P, showing functional simulation, detailed simulation, and the intervals j and k.




Indirect Ad Hoc Techniques

Sometimes the desired metric cannot be measured directly

Use your creativity to measure one thing and then derive/infer the desired value




Example: System Load

What is system load?

Number of jobs in run queue?

Number of jobs actively time-sharing?

Fraction of time processor is not in idle loop?

Others?

How to measure it?

Modify OS

PC sampling

Indirect?




Example

Let system run for fixed time T

Note value of counter

Figure: on the unloaded system the monitor runs alone for time T and its count reaches n.




Example

Let system run for fixed time T

Compare value of loaded system monitor counter to unloaded system count value

Figure: with one application (App 1) time-sharing with the monitor for time T, the monitor count reaches n/2 instead of n.




Example

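Let system run for fixed time T

Compare value of loaded system monitor counter to unloaded system count value

Figure: with two applications (App 1 and App 2) time-sharing with the monitor for time T, the monitor count reaches n/3; with one application it reaches n/2, and with none it reaches n.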
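
The inference in this example can be written as a small calculation. The function name estimated_competing_jobs is hypothetical, and the sketch assumes, as the figures do, that the monitor and the applications share the processor equally.

```python
def estimated_competing_jobs(unloaded_count, loaded_count):
    """Estimate how many other jobs shared the CPU equally with the monitor."""
    if loaded_count <= 0:
        raise ValueError("loaded_count must be positive")
    # A count of n/2 implies one competing job, n/3 implies two, and so on.
    return unloaded_count / loaded_count - 1
```

With an unloaded count of 900, a loaded count of 450 suggests one competing job and a count of 300 suggests two.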




Perturbation

To obtain more information (higher resolution)

Use more instrumentation points

More instrumentation points lead to greater perturbation




Perturbation

Computer performance measurement uncertainty principle

Accuracy is inversely proportional to resolution.

Figure: accuracy plotted against resolution (axes from low to high).




Perturbation

Superposition does not work here

Non-linear

Non-additive

Double instrumentation does not mean double impact on performance

Some instrumentation cancels out

Some multiplies impact

No way to predict!




Instrumentation Code

Changes memory access patterns

Affects memory banking optimizations

Generates additional load/store instructions

More frequent cache flushes and replacements

But may reduce set associativity conflicts

Generates more I/O operations

Will increase overall execution time

More time-sharing context switches

Alters virtual memory paging behavior




Important Points

Event types

Simple counts of primary event

Secondary events triggered by some primary event

Overall profiles




Important Points

Measurement strategies

Event-driven

Tracing

Sampling

Indirect approaches




Important Points

Interval timers

Stopwatch functionality

Rollover problem

Overhead

Quantization errors

Statistical measures of short intervals




Important Points

Profiling

PC sampling

Statistical view

Basic block counting

Exact behavior

High overhead and perturbation




Important Points

Trace generation

Source code modification

Force exceptions

Emulation

Microcode modification

Compiler modification

Object code editor

Online trace consumption

Trace sampling



Important Points

Indirect measurements when all else fails

System load example

Perturbations

Nobody likes them

Have to learn to live with them
