Comparing alternatives

Prof. Anja Feldmann, based on slides by David J. Lilja
Copyright 2004 David J. Lilja


Comparing alternatives

ANOVA: Analysis of Variance

Partitions total variation in a set of measurements into:

  Variation due to real differences in alternatives
  Variation due to errors


Comparing two alternatives

1. Before-and-after
Did a change to the system have a statistically significant impact on performance?

2. Non-corresponding measurements
Is there a statistically significant difference between two different systems?


Before-and-after comparison

Assumptions
  Before-and-after measurements are not independent
  Variances in two sets of measurements may not be equal

→ Measurements are related

Use mean of differences


Before-and-after comparison

Measurement (i)   Before (bi)   After (ai)   Difference (di = bi – ai)
1                 85            86           -1
2                 83            88           -5
3                 94            90            4
4                 90            95           -5
5                 88            91           -3
6                 87            83            4


Before-and-after comparison

Mean of differences: $\bar{d} = -1$
Standard deviation: $s_d = 4.15$

From the mean of differences, it appears that the change reduced performance.
However, the standard deviation is large.


95% Confidence interval for mean of differences

(c1, c2) = [-5.36, 3.36]
Interval includes 0

→ With 95% confidence, there is no statistically significant difference between the two systems.
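The interval can be recomputed from the six before/after pairs. The following is a minimal sketch using only the Python standard library; the critical value 2.571 (two-sided 95%, 5 degrees of freedom) is an assumption taken from a standard t table and is not stated on the slides.

```python
import math
import statistics

# Before/after measurements from the before-and-after table.
before = [85, 83, 94, 90, 88, 87]
after = [86, 88, 90, 95, 91, 83]

# Paired differences d_i = b_i - a_i.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_mean = statistics.mean(diffs)   # mean of differences
s_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 divisor)

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 5 degrees of freedom, copied from a standard t table.
T_CRIT_95_DF5 = 2.571

half_width = T_CRIT_95_DF5 * s_d / math.sqrt(n)
ci = (d_mean - half_width, d_mean + half_width)
includes_zero = ci[0] <= 0.0 <= ci[1]

print(f"mean = {d_mean:.2f}, s_d = {s_d:.2f}")
print(f"95% CI = [{ci[0]:.2f}, {ci[1]:.2f}], includes 0: {includes_zero}")
```

Because the unrounded standard deviation is used, the endpoints can differ from the slide's values in the last digit.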


Noncorresponding measurements

No direct correspondence between pairs of measurements
Unpaired observations
  n1 measurements of system 1
  n2 measurements of system 2


Confidence interval for difference of means

1. Compute means
2. Compute difference of means
3. Compute standard deviation of difference of means
4. Find confidence interval for this difference
5. No statistically significant difference between systems if interval includes 0
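The five steps can be written out as formulas. This is a sketch under assumptions: the symbols $\bar{x}_1, \bar{x}_2$ (sample means), $s_1, s_2$ (sample standard deviations) and $n_{df}$ are names introduced here, and the degrees-of-freedom expression is the usual Welch-Satterthwaite approximation for unequal variances, which the slides do not spell out. For large $n_1$ and $n_2$ the t quantile can be replaced by the normal quantile.

```latex
\bar{x} = \bar{x}_1 - \bar{x}_2,
\qquad
s_x = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
(c_1, c_2) = \bar{x} \mp t_{1-\alpha/2;\, n_{df}} \; s_x

n_{df} \approx
\frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}
     {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```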


OS example

Initial operating system
  n1 = 1,300,203 interrupts (3.5 hours)
  m1 = 142,892 interrupts occurred in OS code
  p1 = 0.1099, or 11% of time executing in OS

Upgrade OS
  n2 = 999,382
  m2 = 84,876
  p2 = 0.0849, or 8.5% of time executing in OS

Statistically significant improvement?


OS example (2.)

p = p1 – p2 = 0.0250
sp = 0.0003911
90% confidence interval: (0.0242, 0.0257)

Statistically significant difference?
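The numbers in this example can be recomputed from the interrupt counts. The sketch below assumes the usual normal-approximation standard error for a difference of two proportions, sqrt(p1(1-p1)/n1 + p2(1-p2)/n2); that formula is not shown on the slides, although it reproduces sp = 0.0003911. The quantiles 1.645 and 1.960 are the standard normal values for two-sided 90% and 95% intervals. When recomputed this way, the endpoints printed on the slide appear to match z ≈ 1.96 more closely than 1.645; treat that reading as an observation, not a correction from the source.

```python
import math

# Interrupt counts from the OS example.
n1, m1 = 1_300_203, 142_892   # initial OS: total samples, samples in OS code
n2, m2 = 999_382, 84_876      # upgraded OS

p1 = m1 / n1
p2 = m2 / n2
p_diff = p1 - p2

# Assumption: normal-approximation standard error of a difference
# of two independent proportions.
s_p = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def interval(z):
    """Confidence interval for p1 - p2 given a normal quantile z."""
    return (p_diff - z * s_p, p_diff + z * s_p)


Z_90 = 1.645  # standard normal quantile, two-sided 90%
Z_95 = 1.960  # standard normal quantile, two-sided 95%

ci_90 = interval(Z_90)
ci_95 = interval(Z_95)

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p = {p_diff:.4f}, s_p = {s_p:.7f}")
print(f"z = {Z_90}: ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
print(f"z = {Z_95}: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

Either way the interval lies entirely above zero, so the difference is statistically significant at both levels.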


Important points

Use confidence intervals to determine if there are statistically significant differences

Before-and-after comparisons
  Find interval for mean of differences

Noncorresponding measurements
  Find interval for difference of means

If interval includes zero
→ No statistically significant difference


Comparing more than two alternatives

Naïve approach
  Compare confidence intervals


One-factor Analysis of Variance (ANOVA)

Very general technique
  Look at total variation in a set of measurements
  Divide into meaningful components

Also called
  One-way classification
  One-factor experimental design


One-factor Analysis of Variance (ANOVA)

Separates total variation observed in a set of measurements into:

1. Variation within one system
   Due to random measurement errors

2. Variation between systems
   Due to real differences + random error

Is variation(2) statistically > variation(1)?


ANOVA

Make n measurements of k alternatives
yij = ith measurement on jth alternative
Assumes errors are:
  Independent
  Gaussian (normal)


Measurements for all alternatives

               Alternatives
Measurements   1      2      …   j      …   k
1              y11    y12    …   y1j    …   y1k
2              y21    y22    …   y2j    …   y2k
…              …      …      …   …      …   …
i              yi1    yi2    …   yij    …   yik
…              …      …      …   …      …   …
n              yn1    yn2    …   ynj    …   ynk
Col mean       y.1    y.2    …   y.j    …   y.k
Effect         α1     α2     …   αj     …   αk


Column means: Average performance of one alternative

Error: Deviation from column mean

Overall mean: Average performance of all alternatives

Effect: Deviation from overall mean


Effects and errors

Effect is distance from overall mean
  Horizontally across alternatives

Error is distance from column mean
  Vertically within one alternative
  Error across alternatives, too

Individual measurements are then:

$$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$$
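For reference, the four quantities named on the preceding slides (column mean, overall mean, effect, error) can be written out explicitly. The notation $\bar{y}_{.j}$ and $\bar{y}_{..}$ follows the dot convention used in the measurement table; the exact typesetting is an assumption, since the original formulas did not survive extraction.

```latex
\bar{y}_{.j} = \frac{1}{n} \sum_{i=1}^{n} y_{ij},
\qquad
\bar{y}_{..} = \frac{1}{kn} \sum_{j=1}^{k} \sum_{i=1}^{n} y_{ij},
\qquad
\alpha_j = \bar{y}_{.j} - \bar{y}_{..},
\qquad
e_{ij} = y_{ij} - \bar{y}_{.j}
```

Adding the last two definitions gives $\bar{y}_{..} + \alpha_j + e_{ij} = y_{ij}$, which is the decomposition of an individual measurement.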


Sum of squares of differences

SST = differences between each measurement and overall mean
SSA = variation due to effects of alternatives
SSE = variation due to errors in measurements

$$SST = SSA + SSE$$


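Sum of squares of differences

$$SSA = n \sum_{j=1}^{k} \left( \bar{y}_{.j} - \bar{y}_{..} \right)^2$$

$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{.j} \right)^2$$

$$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} \left( y_{ij} - \bar{y}_{..} \right)^2$$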


ANOVA – Fundamental idea

Separates variation in measured values into:

1. Variation due to effects of alternatives
   • SSA – variation across columns
2. Variation due to errors
   • SSE – variation within a single column

If differences among alternatives are due to real differences,
   • SSA should be statistically > SSE


Comparing SSE and SSA

Simple approach
  SSA / SST = fraction of total variation explained by differences among alternatives
  SSE / SST = fraction of total variation due to experimental error

But is it statistically significant?


Comparing variances

Use F-test (statistics) to compare ratio of variances
If Fcomputed > Ftable

→ We have (1 – α) * 100% confidence that variation due to actual differences in alternatives, SSA, is statistically greater than variation due to errors, SSE.
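The computed F value is a ratio of mean squares, not of the raw sums of squares. A sketch of the usual one-factor form follows; the names $s_a^2$ and $s_e^2$ are introduced here for illustration, and the degrees of freedom $k-1$ and $k(n-1)$ are the standard ones for k alternatives with n measurements each.

```latex
s_a^2 = \frac{SSA}{k - 1},
\qquad
s_e^2 = \frac{SSE}{k(n - 1)},
\qquad
F_{\text{computed}} = \frac{s_a^2}{s_e^2},
\qquad
F_{\text{table}} = F_{[1-\alpha;\; k-1,\; k(n-1)]}
```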


ANOVA example

               Alternatives
Measurements   1         2         3         Overall mean
1              0.0972    0.1382    0.7966
2              0.0971    0.1432    0.5300
3              0.0969    0.1382    0.5152
4              0.1954    0.1730    0.6675
5              0.0974    0.1383    0.5298
Column mean    0.1168    0.1462    0.6078    0.2903
Effects        -0.1735   -0.1441   0.3175


Conclusions from example

SSA/SST = 0.7585/0.8270 = 0.917
→ 91.7% of total variation in measurements is due to differences among alternatives

SSE/SST = 0.0685/0.8270 = 0.083
→ 8.3% of total variation in measurements is due to noise in measurements

Computed F statistic > tabulated F statistic
→ 95% confidence that differences among alternatives are statistically significant.
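The sums of squares in this example can be recomputed directly from the fifteen measurements. The sketch below uses plain Python; the mean-square form of the F statistic and the tabulated value 3.89 (95%, 2 and 12 degrees of freedom, from a standard F table) are assumptions not printed on the slides.

```python
# One list per alternative, five measurements each (from the example table).
data = {
    1: [0.0972, 0.0971, 0.0969, 0.1954, 0.0974],
    2: [0.1382, 0.1432, 0.1382, 0.1730, 0.1383],
    3: [0.7966, 0.5300, 0.5152, 0.6675, 0.5298],
}

k = len(data)        # number of alternatives
n = len(data[1])     # measurements per alternative

col_means = {j: sum(col) / n for j, col in data.items()}
overall_mean = sum(sum(col) for col in data.values()) / (k * n)
effects = {j: col_means[j] - overall_mean for j in data}

# Sums of squares: between alternatives, within alternatives, total.
ssa = n * sum(a * a for a in effects.values())
sse = sum((y - col_means[j]) ** 2 for j, col in data.items() for y in col)
sst = sum((y - overall_mean) ** 2 for col in data.values() for y in col)

# F statistic as a ratio of mean squares.
f_computed = (ssa / (k - 1)) / (sse / (k * (n - 1)))

# Assumption: tabulated F value for 95% confidence with (2, 12) degrees
# of freedom, copied from a standard F table.
F_TABLE_95_2_12 = 3.89

print(f"SSA = {ssa:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"SSA/SST = {ssa / sst:.3f}, SSE/SST = {sse / sst:.3f}")
print(f"F computed = {f_computed:.1f}, significant: {f_computed > F_TABLE_95_2_12}")
```

Small differences in the last digit relative to the slide come from rounding of the intermediate means.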


Contrasts

ANOVA tells us that there is a statistically significant difference among alternatives
But it does not tell us where the difference is
Use method of contrasts to compare subsets of alternatives
  A vs B
  {A, B} vs {C}
  Etc.

Contrast = linear combination of effects of alternatives
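As an illustration, a contrast can be computed from the effects of the ANOVA example. This sketch makes two assumptions that the slides do not state: the labels A, B, C are mapped to alternatives 1, 2, 3, and the weights are required to sum to zero, which is the usual convention for contrasts. It only computes the contrast value; judging its significance would need a confidence interval for the contrast, which is not covered here.

```python
# Effects from the ANOVA example; the A/B/C labels are hypothetical.
effects = {"A": -0.1735, "B": -0.1441, "C": 0.3175}


def contrast(weights, effects):
    """Linear combination of effects; weights are expected to sum to zero."""
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * effects[name] for name, w in weights.items())


# A vs B: weights (1, -1, 0).
a_vs_b = contrast({"A": 1.0, "B": -1.0, "C": 0.0}, effects)

# {A, B} vs {C}: average of A and B against C.
ab_vs_c = contrast({"A": 0.5, "B": 0.5, "C": -1.0}, effects)

print(f"A vs B      = {a_vs_b:.4f}")
print(f"{{A, B}} vs C = {ab_vs_c:.4f}")
```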


Important Points

Use one-factor ANOVA to separate total variation into:
– Variation within one system
  Due to random errors
– Variation between systems
  Due to real differences (+ random error)

Is the variation due to real differences statistically greater than the variation due to errors?


Generalized design of experiments

Goals
  Isolate effects of each input variable.
  Determine effects of interactions.
  Determine magnitude of experimental error.
  Obtain maximum information for given effort.

Basic idea
  Expand 1-factor ANOVA to m factors


Terminology

Response variable
  Measured output value: e.g., total execution time

Factors
  Input variables that can be changed
  E.g.: cache size, clock rate, bytes transmitted

Levels
  Specific values of factors:
  Continuous (~bytes) or discrete (type of system)

Replication
  Completely re-run experiment with same input levels

Interaction
  Effect of input factor A depends on level of input factor B


Two-factor experiments

Two factors (inputs)
  A, B

Separate total variation in output values into:
  Effect due to A
  Effect due to B
  Effect due to interaction of A and B (AB)
  Experimental error


Example – User response time

A = degree of multiprogramming
B = memory size
AB = interaction of memory size and degree of multiprogramming

        B (Mbytes)
A       32      64      128
1       0.25    0.21    0.15
2       0.52    0.45    0.36
3       0.81    0.66    0.50
4       1.50    1.45    0.70


Two-factor ANOVA

Factor A – a input levels
Factor B – b input levels
n measurements for each input combination
abn total measurements


Two-factor ANOVA

Each individual measurement is composition of

  Overall mean
  Effects
  Interactions
  Measurement errors
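Written as a formula, this composition extends the one-factor model by one effect per factor plus an interaction term. The symbols below are assumptions chosen for illustration: $\alpha_i$ for the effect of level i of A, $\beta_j$ for level j of B, $\gamma_{ij}$ for the AB interaction, and k indexing the n replications (note that k here is a replication index, not the number of alternatives).

```latex
y_{ijk} = \bar{y}_{...} + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk},
\qquad i = 1..a,\; j = 1..b,\; k = 1..n

SST = SSA + SSB + SSAB + SSE
```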


Example

Output = user response time (seconds)
Want to separate effects due to
  A = degree of multiprogramming
  B = memory size
  AB = interaction
  Error

Need replications to separate error
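The main-effect sums of squares can be estimated from the response-time table even though it shows only one value per cell. The sketch below is an illustration under that limitation: with a single value per (A, B) combination, the remainder after removing SSA and SSB lumps interaction and error together, which is why replications are needed. The fractions it prints are therefore only approximately comparable to results from a replicated experiment.

```python
# Response times (seconds): rows are A = 1..4 (degree of multiprogramming),
# columns are B = 32, 64, 128 Mbytes.
table = [
    [0.25, 0.21, 0.15],
    [0.52, 0.45, 0.36],
    [0.81, 0.66, 0.50],
    [1.50, 1.45, 0.70],
]

a = len(table)       # levels of factor A
b = len(table[0])    # levels of factor B

grand_mean = sum(map(sum, table)) / (a * b)
row_means = [sum(row) / b for row in table]
col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]

# Main-effect sums of squares and total sum of squares.
ssa = b * sum((m - grand_mean) ** 2 for m in row_means)
ssb = a * sum((m - grand_mean) ** 2 for m in col_means)
sst = sum((y - grand_mean) ** 2 for row in table for y in row)

# Without replication, interaction and error cannot be separated.
residual = sst - ssa - ssb

print(f"SSA/SST = {ssa / sst:.3f}")
print(f"SSB/SST = {ssb / sst:.3f}")
print(f"(interaction + error)/SST = {residual / sst:.3f}")
```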


Conclusions from example

77.6% (SSA/SST) of all variation in response time due to degree of multiprogramming
11.8% (SSB/SST) due to memory size
9.9% (SSAB/SST) due to interaction
0.7% due to measurement error
95% confident that all effects and interactions are statistically significant


A problem

Full factorial design with replication
  Measure system response with all possible input combinations
  Replicate each measurement n times to determine effect of measurement error

m factors, v levels, n replications
→ $n v^m$ experiments
m = 5 input factors, v = 4 levels, n = 3
→ $3 \cdot 4^5$ = 3,072 experiments!


Fractional factorial designs: $n 2^m$ experiments

Special case of generalized m-factor experiments
Restrict each factor to two possible values
  High, low
  On, off

Find factors that have largest impact
Full factorial design with only those factors


Still too many experiments with $n 2^m$!

Plackett and Burman designs (1946)
Multifactorial designs
  Effects of main factors only
  Logically minimal number of experiments to estimate effects of m input parameters (factors)
  Ignores interactions

Requires O(m) experiments
Instead of O($2^m$) or O($v^m$)
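To make the growth rates concrete, the experiment counts for the three designs can be compared. This is a minimal sketch: the function names are hypothetical, and the Plackett and Burman count assumes the common rule that the number of runs is the smallest multiple of 4 strictly greater than m (without replication or foldover), which the slides summarize only as O(m).

```python
def full_factorial_runs(m, v, n):
    """n replications of every combination of m factors at v levels."""
    return n * v ** m


def two_level_runs(m, n):
    """n replications with each of m factors restricted to two levels."""
    return n * 2 ** m


def plackett_burman_runs(m):
    """Assumption: smallest multiple of 4 strictly greater than m."""
    return 4 * (m // 4 + 1)


m, v, n = 5, 4, 3
print("full factorial:  ", full_factorial_runs(m, v, n))
print("two-level:       ", two_level_runs(m, n))
print("Plackett-Burman: ", plackett_burman_runs(m))
```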
