Lecture 2: Performance Measurement

Post on 12-Feb-2016

49 views 0 download

Tags:

description

Lecture 2: Performance Measurement. Performance Evaluation. The primary duty of software developers is to create functionally correct programs Performance evaluation is a part of software development for well-performing programs. Performance Analysis Cycle. - PowerPoint PPT Presentation

transcript

Lecture 2:Lecture 2:

Performance Performance MeasurementMeasurement

Performance Evaluation

The primary duty of software developers is to create functionally correct programs

Performance evaluation is a part of software development for well-performing programs

Performance Analysis Cycle

Have an optimization phase just like testing and debugging phase

Code Development

Measure

Modify / Tune

Analyze

Usage

Functionally complete and correct program

Complete, correct and well-performing program

Goals of Performance Analysis

The goal of performance analysis is to provide quantitative information about the performance of a computer system

Goals of Performance Analysis Compare alternatives

• When purchasing a new computer system, to provide quantitative information Determine the impact of a feature

• In designing a new system or upgrading, to provide before-and-after comparison System tuning

• To find the best parameters that produce the best overall performance Identify relative performance

• To quantify the performance relative to previous generations Performance debugging

• To identify the performance problems and correct them Set expectations

• To determine the expected capabilities of the next generation

Performance EvaluationPerformance Evaluation steps:

1. Measurement / Prediction• What to measure? How to measure?• Modeling for prediction

• Simulation • Analytical Modeling

2. Analysis & Reporting• Performance metrics

Performance Measurement

Interval Timers

• Hardware Timers

• Software Timers

Performance MeasurementHardware Timers

• Counter value is read from a memory location

• Time is calculated as

Clock Counter

Tc

n bits to processor memory bus

Time = (x2 - x1) x Tc

Performance MeasurementSoftware Timers

• Interrupt-based

• When interrupt occurs, interrupt-service routine increments the timer value which is read by a program

• Time is calculated as

Clock Prescaling Counter

Tc

to processor interrupt input

T’c

Time = (x2 - x1) x T’c

Performance Measurement

Timer Rollover

Occurs when an n-bit counter undergoes a transition from its maximum value 2n – 1 to zero

There is a trade-off between roll over time and accuracy

T’c 32-bit 64-bit10 ns 42 s 5850 years

1 s 1.2 hour 0.5 million years

1 ms 49 days 0.5 x 109 years

TimersSolution:

1. Use 64-bit integer (over half a million year)

2. Timer returns two values: • One represents seconds • One represents microseconds since the last secondWith 32-bit, the roll over is over 100 years

Performance Measurement

Interval Timers

T0 Read current timeEvent being timed ();T1 Read current time

Time for the event is: T1-T0

Performance MeasurementTimer Overhead

Initiate read_time

Current time is read

Event begins

Event ends; Initiate read_time

Current time is read

T1

T2

T3

T4

Measured time: Tm = T2 + T3 + T4

Desired measurement: Te = Tm – (T2 + T4) = Tm – (T1 + T2) since T1 = T4

Timer overhead: Tovhd = T1 + T2

Te should be 100-1000 times greater than Tovhd .

Performance MeasurementTimer Resolution

Resolution is the smallest change that can be detected by an interval timer.

nT’c < Te < (n+1)T’c

If Tc is large relative to the event being measured, it may be impossible to measure the duration of the event.

Performance MeasurementMeasuring Short Intervals

Te < Tc

Tc

Te

Tc

Te

1

0

Performance MeasurementMeasuring Short Intervals

Solution: Repeat measurements n times.

Average execution time: T’e = (m x Tc) / nm: number of 1s measured

Average execution time: T’e = (Tt / n ) – hTt : total execution time of n

repetitionsh: repetition overhead

Tc

Te

Tt

Performance Measurement Time

• Elapsed time / wall-clock time / response time• Latency to complete a task, including disk access,

memory access, I/O, operating system overhead, and everything (includes time consumed by other programs in a time-sharing system)

• CPU time• The time CPU is computing, not including I/O time or

waiting time• User time / user CPU time

• CPU time spent in the program• System time / system CPU time

• CPU time spent in the operating system performing tasks requested by the program

Performance Measurement UNIX time command

90.7u 12.9s 2:39 65%

Drawbacks:• Resolution is in milliseconds• Different sections of the code can not be timed

User time

System time

Elapsed time Percentage of elapsed time

Timers

Timer is a function, subroutine or program that can be used to return the amount of time spent in a section of code.

t0 = timer(); …< code segment > …t1 = timer();time = t1 – t0;

zero = 0.0;t0 = timer(&zero); …< code segment > …t1 = timer(&t0);time = t1;

Timers

Read Wadleigh, Crawford pg 130-136 for:time, clock, gettimeofday, etc.

TimersMeasuring Timer Resolution

main() { . . .zero = 0.0;t0 = timer(&zero);t1 = 0.0;j=0;while (t1 == 0.0) {

j++;zero=0.0;t0 = timer(&zero);foo(j);t1 = timer(&t0);

}printf (“It took %d iterations for a nonzero time\n”, j); if (j==1) printf (“timer resolution <= %13.7f seconds\n”, t1);else printf (“timer resolution is %13.7f seconds\n”, t1);

}foo(n){ . . .

i=0;for (j=0; j<n; j++)

i++;return(i);

}

TimersMeasuring Timer Resolution

Using clock():

Using times():

Using getrusage():

It took 682 iterations for a nonzero timetimer resolution is 0.0200000 seconds

It took 720 iterations for a nonzero timetimer resolution is 0.0200000 seconds

It took 7374 iterations for a nonzero timetimer resolution is 0.0002700 seconds

TimersSpin Loops

For codes that take less time to run than the resolution of the timer First call to a function may require an inordinate amount of time. Therefore the minimum of all times may be desired.

main() { . . .zero = 0.0;t2 = 100000.0;for (j=0; j<n; j++) {

t0 = timer(&zero);foo(j);t1 = timer(&t0); t2 = min(t2, t1);

}t2 = t2 / n;printf (“Minimum time is %13.7f seconds\n”, t2);

}foo(n){ . . .

< code segment >}

Profilers A profiler automatically insert timing calls into applications to

generate calls into applications

It is used to identify the portions of the program that consumes the largest fraction of the total execution time.

It may also be used to find system-level bottlenecks in a multitasking system.

Profilers may alter the timing of a program’s execution

Profilers Data collection techniques

• Sampling-based • This type of profilers use a predefined clock; every multiple of this clock tick

the program is interrupted and the state information is recorded. • They give the statistical profile of the program behavior. • They may miss some important events.

• Event-based• Events are defined (e.g. entry into a subroutine) and data about these events

are collected.• The collected information shows the exact execution frequencies. • It has substantial amount of run-time overhead and memory requirement.

Information kept• Trace-based: The compiler keeps all information it collects.• Reductionist: Only statistical information is collected.

Performance EvaluationPerformance Evaluation steps:

1. Measurement / Prediction• What to measure? How to measure?• Modeling for prediction

• Simulation • Analytical Modeling

• Queuing Theory

2. Analysis & Reporting• Performance metrics

Predicting Performance

Performance of simple kernels can be predicted to a high degree

Theoretical performance and peak performance must be close

It is preferred that the measured performance is over 80% of the theoretical peak performance

Performance EvaluationPerformance Evaluation steps:

1. Measurement / Prediction• What to measure? How to measure?• Modeling for prediction

• Simulation • Analytical Modeling

• Queuing Theory

2. Analysis & Reporting• Performance metrics

Performance Metrics

Measurable characteristics of a computer system: • Count of an event• Duration of a time interval

• Size of a parameter

Rate: • Operations executed per second

Performance MertricsClock Speed

Clock speed/frequency (f): the rate of clock pulses (ex: 1GHz)

Cycle time (Tc): time between two clock pulses (Tc = 1/f)

Tc

Performance MertricsInstruction Execution Rate

Cycles per Instruction (CPI): is an average depends on the design of micro-architecture

(hardwired/microprogrammed, pipelined)

Number of instructions: is the number of instructions executed at runtime Depends on

• instruction set architecture (ISA) • compiler

CPI =

n

i 1ii )ICPI(

n

i 1iI

CPIi: number of cycles required for instruction i

Ii: number of executed instructions of type i

Performance MetricsCPU Performance

CPU time of a program (T) = instructions x cycles x time program instruction cycle

CPI (cycles per instruction)

T = instruction count x CPI x 1 f

Performance MetricsCPU Performance

Drawbacks:

In modern computers, no program runs without some operating system running on the hardware

Comparing performance between machines with different operating systems will be unfair

Performance MetricsExecution time

• Elapsed time / wall-clock time / response time• Latency to complete a task, including disk access, memory

access, I/O, operating system overhead, and everything (includes time consumed by other programs in a time-sharing system)

• CPU time• The time CPU is computing, not including I/O time or

waiting time• User time / user CPU time

• CPU time spent in the program• System time / system CPU time

• CPU time spent in the operating system performing tasks requested by the program

Performance MetricsPerformance Comparison

Relative performance

Performancex = 1 . Execution timeX

Performance Ratio = PerformanceX = Execution timeY

PerformanceY Execution timeX

Performance MetricsRelative Performance

• If workload consists of more than one program, total execution time may be used.

• If there are more than one machine to be compared, one of them must be selected as a reference.

Performance MetricsThroughput

• Total amount of work done in a given time• Measured in tasks per time unit• Can be used for

• Operating system performance • Pipeline performance• Multiprocessor performance

Performance Metrics MIPS (Million instructions per second)

• Includes both integer and floating point performance

• Number of instructions in a program varies between different computers

• Number of instructions varies between different programs on the same computer

MIPS = Instruction count = Clock rate Execution time x 106 CPI x 106

Performance Metrics MFLOPS

(Million floating point operations per second)

• Give performance of only floating-point operations

• Different mixes of integer and floating-point operations may have different execution times:• Integer and floating-point units work independently• Instruction and data caches provide instruction and data

concurrently

Performance Metrics Utilization

Speciality ratio

• 1 general purpose

Utilization = Busy time . Total time

Speciality ratio = Maximum performance . Minimum performance

Performance Metrics Asymptotic and Half performance

• r – asymptotic performance

• n1/2 – half performanceT = r (n + n1/2)

r = 1/tn1/2 = t0/t

Slope = r-1

t0

-n1/2n1/2

2t0

Performance MetricsSpeedup

• Express how much faster is system 2 than system 1• Calculated directly from execution time

Performancex = 1 = 1 Execution timeX TX

Speedup2,1 = Performance2 = T1

Performance1 T2

Performance MetricsRelative Change

• It expresses the performance of system 2 relative to system 1

Performancex = 1 = 1 Execution timeX TX

Relative change2,1 = Performance2 - Performance1 = T1 - T2 = Speedup2,1 - 1 Performance1 T2

Performance MetricsStatistical Analysis

• Used to compare performance

• Workload consists of many programs

• Depends on the nature of the data as well as distribution of the test results

Performance MetricsIndices of Central Tendency

Used to summarize multiple measurements

• Mean

• Median

• Mode

Performance Metrics• Mean (average)

Gives equal weight to all measurements

Arithmetic mean = xi , 1 ≤ i ≤ n n

Measurement Execution time

X1 10

X2 20

X3 15

X4 18

X5 16

Mean 15.8

Performance Metrics• Median

1. Order all n measurements2. The middle value is the median. If n is even, median is the mean of the middle 2 values

Using Median instead of Mean reduces the skewing effect of the outliers.

Measurement Execution time

X1 10

X2 20

X3 15

X4 18

X5 16

X6 200

Mean 46.5

Measurement Execution time

X1 10

X3 15

X5 16

X4 18

X2 20

X6 200

254 XX Median =

= 17

Performance Metrics• ModeMode is the value that occurs most frequently

• If all values occur once, there is no mode• If there are several samples that all have the same value, there would be several

modes

Measurement Execution time

X1 10

X2 20

X3 36

X4 20

X5 20

X6 20

Mode = 20

Mean, Median, Mode

Mean Incorporates information from the entire measured values Sensitive to outliers

Median and Mode Less sensitive to outliers Do not effectively use all information

ex

Performance MetricsArithmetic mean (average)

• May be misleading if the data are skewed or scattered

Arithmetic mean = xi , 1 ≤ i ≤ n n

MA MB MC

Prog1 50 100 500

Prog2 400 800 800

Prog3 5550 5100 4700

Average 2000 2000 2000

Performance MetricsWeighted average

• weight is the frequency of each program in daily processing

• Results may change with a different set of execution frequencies

Weighted average = ∑ wi . xi 1 ≤ i ≤ n

weight MA MB MC

Prog1 60% 50 100 500

Prog2 30% 400 800 800

Prog3 10% 5550 5100 4700

Average 705 810 1010

Performance MetricsGeometric mean

• Results are stated in relation to the performance of a reference machine

Geometric mean = ( xi )1/n , 1 ≤ i ≤ n

MA Normalized to MA

MB

(reference)

Normalized to MB

MC Normalized to MC

Prog1 50 2 100 1 500 0.2

Prog2 400 2 800 1 800 1

Prog3 5550 0.92 5100 1 4700 1.085

Average 1.54 1 0.60

• Results are consistent no matter which system is chosen as reference

Performance MetricsHarmonic mean

• Used to compare performance results that are expressed as a rate (e.g. operations per second, throughput, etc.)

• Slowest rates have the greatest influence on the resultIt identifies areas where performance can be improved

Harmonic mean = n , 1 ≤ i ≤ n ∑ 1/xi

Performance Metrics

Characteristics of a good performance metric

• If the time values are averaged, then the resulting mean value must be directly proportional to the total time.

• If the rate values are averaged, then the resulting mean value must be inversely proportional to the total time.

Performance MetricsEx• n benchmark programs• Ti is the execution time of program i• F floating-point operations in each program• Mi = F / Ti is the execution rate of program i (MFLOP/s)

Arithmetic mean

• Inappropriate for summarizing rates

TA = TA is directly proportional to the total execution time

n

1iiT

n1

MA = =

n

1iiM

n1

n

1i iT1

nF MA is inversely proportional to

the total execution time

Performance MetricsHarmonic mean

• Inappropriate for summarizing execution times• Appropriate for summarizing rates

TH = TH is not directly proportional to the total execution time

n

1ii1/T

n

MH = =MH is inversely proportional to the total execution time

n

1ii1/M

n

n

1iiT

n F

Performance MetricsEx

Performance MetricsGeometric mean

• Inappropriate for summarizing execution times• Inappropriate for summarizing rates

TG = TG is not directly proportional to the total execution time

MH = =MH is not inversely proportional to the total execution time

n

n

1iiT

n

n

1iiM

n

n

1iiF/T

Performance MetricsGeometric mean

• Produces a consistent ordering of the systems but it is the wrong ordering

M1 M2 M3Prog1 417 244 134

Prog2 83 70 70

Prog3 66 153 135

Prog4 39499 33527 66000

Prog5 772 368 369

Geometric mean (Normalized wrt S1) 1.0 0.86 0.84

Geometric mean (Normalized wrt S2) 1.17 1.0 0.99

Rank 3 2 1

Total time 40787 34362 66798

Arithmetic mean 8157 6872 13342

Rank 2 1 3

Performance MetricsHistogram• Used to display the distribution of a set of measured

values (variability)

• First find the minimum and maximum values. Then divide the range into b subranges, called cells.

Histogram

Message size (kbytes)

Network A Network B

0 < xi ≤ 5 11 39

5 < xi ≤ 10 27 25

10 < xi ≤ 15 41 18

15 < xi ≤ 20 32 5

20 < xi ≤ 25 21 19

25 < xi ≤ 30 12 42

30 < xi ≤ 35 4 0

Performance MetricsIndex of Dispersion

Index of dispersion is used to compare the spread of measurements around the mean value

• Range is the simplest metric for an index of dispersion

• Range is sensitive to a few extreme values

)(min)(maxmax iiii xxR

Performance MetricsIndex of Dispersion

• Maximum of the absolute values of the difference of each measurement from the mean

• It is also sensitive to extreme values

|)(|maxmax xxii

Performance MetricsIndex of Dispersion

• Sample variance is the simplest metric for an index of dispersion

)1()(

2

1 1

22

nnxxn

sn

i

n

ii

Requires 2 passes through the data to calculate first x and then s2

Requires 1 pass

1)(

2

12

nxx

sn

i i

Performance MetricsIndex of Dispersion

• Standard deviation

• Coefficient of variance (COV): normalizes standard deviation wrt the mean

1)(

2

12

nxx

ssn

i i

xsCOV /

Performance Evaluation Methods

Benchmarking Monitoring Analytical Modeling Queuing Theory

Benchmarking

Benchmark is a program that is run on a computer to measure its performance and compare it with other machines

Best benchmark is the users’ workload – the mixture of programs and operating system commands that users run on a machine. Not practical

Standard benchmarks

BenchmarkingTypes of Benchmarks

Synthetic benchmarks

Toy benchmarks

Microbenchmarks

Program Kernels

Real Applications

BenchmarkingSynthetic benchmarks

Artificially created benchmark programs that represent the average frequency of operations (instruction mix) of a large set of programs

• Whetstone benchmark • Dhrystone benchmark• Rhealstone benchmark

BenchmarkingSynthetic benchmarks

• Whetstone benchmark • First written in Algol60 in 1972, today Fortran, C/C++,

Java versions are available• Represents the workload of numerical applications• Measures floating point arithmetic performance• Unit is Millions of Whetstone instructions per second

(MWIPS)• Shortcommings:

• Does not represent constructs in modern languages, such as pointers, etc.

• Does not consider cache effects

BenchmarkingSynthetic benchmarks

• Dhrystone benchmark• First written in Ada in1984, today • Represents the workload of C version is available• Statistics are collected on system software, such as operating

system, compilers, editors and a few numerical programs• Measures integer and string performance, no floating-point

operations• Unit is the number of program iteration completions per second • Shortcommings:

• Does not represent real life programs• Compiler optimization overstates system performance• Small code that may fit in the instruction cache

BenchmarkingSynthetic benchmarks

• Rhealstone benchmark• Multi-tasking real-time systems • Factors are:

• Task switching time• Pre-emption time• Interrupt latency time• Semaphore shuffling time• Dead-lock breaking time• Datagram throughput time

• Metric is Rhealstones per second

6

∑ wi . (1/ ti) i=1

BenchmarkingToy benchmarks

10-100 lines of code that the result is known before running the toy program

• Quick sort • Sieve of Eratosthenes

Finds prime numbers http://upload.wikimedia.org/wikipedia/commons/8/8c/New_Animation_Sieve_of_Eratosthenes.gif

func sieve( var N ) var PrimeArray as array of size N initialize PrimeArray to all true for i from 2 to N for each j from i + 1 to N, where i divides j

set PrimeArray( j ) = false

BenchmarkingMicrobenchmarks

Small, specially designed programs used to test some specific function of a system (eg. Floating-point execution, I/O subsystem, processor-memory interface, etc.)

• Provide values for important parameters of a system• Characterize the maximum performance if the overall

performance is limited by that single component

BenchmarkingKernels

Key pieces of codes from real applications.

• LINPACK and BLAS

• Livermore Loops

• NAS

BenchmarkingKernels

• LINPACK and BLAS Libraries• LINPACK – linear algebra package

• Measures floating-point computing power• Solves system of linear equations Ax=b with Gaussian

elimination• Metric is MFLOP/s• DAXPY - most time consuming routine• Used as the measure for TOP500 list

• BLAS – Basic linear algebra subprograms• LINPACK makes use of BLAS library

BenchmarkingKernels

• LINPACK and BLAS Libraries

• SAXPY – Scalar Alpha X Plus Y• Y = X + Y, where X and Y are vectors, is a scalar• SAXPY for single and DAXPY for double precision • Generic implementation:

for (int i = m; i < n; i++) { y[i] = a * x[i] + y[i];

}

BenchmarkingKernels

• Livermore Loops• Developed at LLNL• Originally in Fortran, now also in C• 24 numerical application kernels, such as:

• hydrodynamics fragment, • incomplete Cholesky conjugate gradient, • inner product, • banded linear systems solution, tridiagonal linear systems solution, • general linear recurrence equations, • first sum, first difference, • 2-D particle in a cell, 1-D particle in a cell, • Monte Carlo search, • location of a first array minimum, etc.

• Metrics are arithmetic, geometric and harmonic mean of CPU rate

BenchmarkingKernels

• NAS Parallel Benchmarks• Developed at NASA Advanced Supercomputing division• Paper-and-pencil benchmarks• 11 benchmarks, such as:

• Discrete Poisson equation,• Conjugate gradient• Fast Fourier Transform• Bucket sort• Embarrassingly parallel• Nonlinear PDE solution• Data traffic, etc.

BenchmarkingReal Applications

Programs that are run by many users• C compiler• Text processing software• Frequently used user applications• Modified scripts used to measure particular aspects of

system performance, such as interactive behavior, multiuser behavior

BenchmarkingBenchmark Suites

Desktop Benchmarks• SPEC benchmark suite

Server Benchmarks • SPEC benchmark suite• TPC

Embedded Benchmarks• EEMBC

BenchmarkingSPEC Benchmark Suite

Desktop Benchmarks• CPU-intensive

• SPEC CPU2000• 11 integer (CINT2000) and 14 floating-point (CFP2000) benchmarks• Real application programs:

• C compiler• Finite element modeling• Fluid dynamics, etc.

• Graphics intensive• SPECviewperf

• Measures rendering performance using OpenGL• SPECapc

• Pro/Engineer – 3D rendering with solid models• Solid/Works – 3D CAD/CAM design tool, CPU-intensive and I/O intensive tests• Unigraphics – solid modeling for an aircraft design

Server Benchmarks • SPECWeb – for web servers• SPECSFS – for NFS performance, throughput-oriented

BenchmarkingTPC Benchmark Suite

Server Benchmark Transaction processing (TP) benchmarks Real applications

• TPC-C: simulates a complex query environment• TPC-H: ad hoc decision support• TPC-R: business decision support system where users run a

standard set of queries• TPC-W: business-oriented transactional web server

Measures performance in transactions per second. Throughput performance is measured only when response time limit is met.

Allows cost-performance comparisons

BenchmarkingEEMBC Benchmarks

for embedded computing systems

34 benchmarks from 5 different application classes:• Automotive/industrial• Consumer• Networking• Office automation• Telecommunications

BenchmarkingBenchmarking Strategies

Fixed-computation benchmarks

Fixed-time benchmarks

Variable-computation and variable-time benchmarks

BenchmarkingBenchmarking Strategies

Fixed-computation benchmarks

Fixed-time benchmarks

Variable-computation and variable-time benchmarks

BenchmarkingFixed-Computation benchmarks

W: fixed workload (number of instructions, number of floating-point operations, etc)

T: measured execution time

R: speed

Compare

TWR

1

2

2

1

2

1

//

TT

TWTW

RRSpeedup

BenchmarkingFixed-Computation benchmarks

Amdahl’s Law

BenchmarkingFixed-Time benchmarks

On a faster system, a larger workload can be processed in the same amount of time

T: fixed execution timeW: workload R: speed

Compare

TWR

2

1

2

1

2

1

//

WW

TWTW

RRSizeup

BenchmarkingFixed-Time benchmarks

Scaled Speedup

BenchmarkingVariable-Computation and Variable-Time

benchmarks

In this type of benchmark, quality of the solution is improved.

Q: quality of the solutionT: execution time

Quality improvements per second:TQ

Quality of MeasurementCharacteristics of a measurement tool (timer)

Accuracy: Absolute difference of a measured value and the corresponding standard reference value (such as the duration of a second).

Precision: Reliability of the measurements made with the tool. Highly precise measurements are tightly clustered around a single value.

Resolution: Smallest incremental change that can be detected. Ex: interval between clock ticks

Quality of Measurement

accuracy

precision

mean value true value

Quality of Measurement

The uncertainties in the measurements are called errors or noise

Sources of errors: Accuracy, precision, resolution of the measurement tool Time required to read and store the current time value Time-sharing among multiple programs Processing of interrupts Cache misses, page faults

Quality of Measurement

Types of errors:

Systematic errors• Are the result of some experimental mistake• Usually constant across all measurementsEx: temperature may effect clock period

Random errors• Unpredictable, nondeterministic• Effect the precision of measurementEx: timer resolution ±T , effects measurements with equal probability

Quality of MeasurementExperimental measurements follow Gaussian (normal) distribution

Ex:x measured value±E random errorTwo sources of errors, each having 50% probability

Pg 48

Actual value of x is measured half of the time.

Error 1 Error 2 Measured value Probability

-E -E x-2E 1/4

-E +E x 1/4

+E -E x 1/4

+E +E x+2E 1/4

Confidence IntervalsUsed to find a range of values that has a given probability of

including the actual value.

Case 1: number of measurements is large (n≥30)

{x1, x2, … xn} - Samples Gaussian distribution – mean – standard deviation

Confidence interval: [ c1, c2 ]Confidence level: (1-)×100 Pr[ c1 ≤ x ≤ c2 ] = 1-

Pr[ x < c1 ] = Pr[ x > c2] = /2

Confidence Intervals

Case 1: number of measurements is large (n≥30)

Confidence interval: [ c1, c2 ]

nszxc 2/11

nszxc 2/12

x

s

- Sample mean

- Standard deviation

is obtained from the precomputed table

2/1 z

Confidence Intervals

Case 2: number of measurements is small (n<30)

Sample variances s2 can vary significantly.

t distribution:

nstxc n 1;2/11

x

s

- Sample mean

- Standard deviation

is obtained from the precomputed table

1;2/1 nt

nstxc n 1;2/12

Confidence Intervals

Ex: number of measurements is large (n<30)

Pg 51

90% confidence interval means that there is a 90% chance that the actual mean is within that interval.

Confidence Intervals

90% c1= 6.5 c2= 9.495% c1= 6.1 c2= 9.7 99% c1= 5.3 c2=10.6

Wider interval Less precise knowledge about the mean

Confidence IntervalsDetermining the Number of measurements Needed

nszxxec 2/11 )1(

))1(,)1((),( 21 xexecc

22/1

xeszn

Confidence IntervalsDetermining the Number of measurements Needed

Estimating s:1. Make small number of measurements.2. Estimate standard deviation s.3. Calculate n.4. Make n measurements.

22/1

xeszn

Confidence Intervals

Ex:

Pg 53

Confidence IntervalsConfidence Intervals for Proportions

When we are interested in the number of times events occur.

Bimonial distribution: If np≥10 it approximates Gaussian distribution with mean p and

variance p(1-p)/n

nppzpc )1(

2/11

n

m

- Total events recorded

- Number of times desired outcome occurs

is the sample proportion nmp /

nppzpc )1(

2/12

Confidence IntervalsConfidence Intervals for Proportions

Determining the number of measurements needed:

2

22/1

)()1()(

epppzn

nppzppe )1()1( 2/1

Confidence Intervals

Ex:

Pg 55

Comparing Alternatives

Three different cases:

Before-and-after comparison

Comparison of non-corresponding (impaired) measurements

Comparisons involving proportions

Comparing AlternativesBefore-and-after comparison

Used to determine whether some change made to a system has statistically significant impact on its performance.

1. Find a confidence interval for the mean of the differences of the paired observations

2. If this interval includes 0, then measured differences are not statistically significant.

Comparing AlternativesBefore-and-after comparison

Before measurements: b1, … bn After measurements: a1, … an

Differences: d1= a1, - b1 d2= a2, - b2 …

nszdc d

2/11

nszdc d

2/12

d

ds

- Arithmetic mean

- Standard deviation

n ≥ 30

Comparing AlternativesBefore-and-after comparison

Ex: pg 65

Comparing AlternativesNon-corresponding Measurements

There is no direct corresponding between pairs of measurements.

1. First system: n1 measurements, find x1 and s1 2. Second system: n2 measurements, find x2 and s2 3. Calculate the difference of means and standard deviation of

the difference of means

4. If confidence interval includes 0, then no significant difference

21 xxx 2

22

1

21

ns

nssx

Comparing AlternativesNon-corresponding Measurements

xszxc 2/11

xszxc 2/12 n1 ≥ 30 and n2 ≥ 30

Comparing AlternativesNon-corresponding Measurements

xn stxcdf;2/11

n1 < 30 or n2 < 30

xn stxcdf;2/12

)1()1( 2

2

22

2

1

2

12

1

2

2

22

1

21

nns

nns

ns

ns

ndf

Comparing AlternativesNon-corresponding Measurements

Ex: pg 67

Comparing AlternativesComparing Proportions

m1 is the number of times the event occurs in system 1 out of a total of n1 events measured.

If m1>10 and m2>10 the it approximates normal distribution with

means and

variance and

111 / nmp 222 / nmp

1p 2p

111 /)1( npp 222 /)1( npp

Comparing AlternativesComparing Proportions

Confidence intervals

where

Standard deviation2

22

1

11 )1()1(npp

nppsp

21 ppp

pszpc 2/11

pszpc 2/12

Comparing AlternativesComparing more than Two Alternatives

Comparing AlternativesComparing more than Two Alternatives

Comparing AlternativesComparing more than Two Alternatives

Comparing AlternativesComparing more than Two Alternatives

TimersRoll Over

Suppose a timer returns 32-bit integer data and measures microseconds.

It rolls over after 232 microseconds (= 1.2 hours) Timers that measure milliseconds and use 32-bit data roll

over after 232 milliseconds (= 49 days)

There is a trade-off between roll over time and accuracy.

Performance Evaluation

Performance Evaluation steps:

1. Measurement / Prediction• What to measure? How to measure?• Modeling for prediction

2. Analysis & Reporting• Performance metrics