Copyright 2004 David J. Lilja 1
Comparing alternatives
Prof. Anja Feldmann, based on slides by David J. Lilja
Comparing alternatives
ANOVA: Analysis of Variance
Partitions total variation in a set of measurements into
• Variation due to real differences in alternatives
• Variation due to errors
Comparing two alternatives
1. Before-and-after
   Did a change to the system have a statistically significant impact on performance?
2. Noncorresponding measurements
   Is there a statistically significant difference between two different systems?
Before-and-after comparison
Assumptions
• Before-and-after measurements are not independent
• Variances in two sets of measurements may not be equal
→ Measurements are related
Use mean of differences
Before-and-after comparison
Measurement (i)   Before (bi)   After (ai)   Difference (di = bi – ai)
1                 85            86           -1
2                 83            88           -5
3                 94            90            4
4                 90            95           -5
5                 88            91           -3
6                 87            83            4
Before-and-after comparison
From the mean of differences, it appears that the change reduced performance.
However, the standard deviation is large.
Mean of differences: $\bar{d} = -1$
Standard deviation: $s_d = 4.15$
95% Confidence interval for mean of differences
(c1, c2) = [-5.36, 3.36]
Interval includes 0
→ With 95% confidence, there is no statistically significant difference between the two systems.
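The interval for the before-and-after example can be reproduced with a few lines of Python. This is a minimal sketch: the data are the six before/after pairs from the example, and the t value 2.571 (0.975 quantile, 5 degrees of freedom) is an assumption taken from a standard t table, since the slides do not state which quantile was used.

```python
import math
import statistics

# Before/after measurements from the before-and-after example.
before = [85, 83, 94, 90, 88, 87]
after = [86, 88, 90, 95, 91, 83]

# Paired observations: work with the per-measurement differences d_i = b_i - a_i.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
d_mean = statistics.mean(diffs)   # mean of differences
s_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 denominator)

# Assumption: two-sided 95% interval, so the 0.975 quantile of the
# t distribution with n - 1 = 5 degrees of freedom, read from a t table.
T_CRIT_975_DF5 = 2.571

half_width = T_CRIT_975_DF5 * s_d / math.sqrt(n)
ci = (d_mean - half_width, d_mean + half_width)

# The change is statistically significant only if the interval excludes 0.
includes_zero = ci[0] <= 0 <= ci[1]
print(f"mean = {d_mean}, s_d = {s_d:.2f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
print("no significant difference" if includes_zero else "significant difference")
```

The bounds may differ from the slide in the last digit because the slide rounds the standard deviation to 4.15 before computing the interval.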
Noncorresponding measurements
No direct correspondence between pairs of measurements
Unpaired observations
• n1 measurements of system 1
• n2 measurements of system 2
Confidence interval for difference of means
1. Compute means
2. Compute difference of means
3. Compute standard deviation of difference of means
4. Find confidence interval for this difference
5. No statistically significant difference between systems if interval includes 0
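The five steps can be sketched as follows. Several things here are assumptions rather than content from the slides: the helper name `diff_of_means`, the use of the Welch-Satterthwaite approximation for the degrees of freedom, and the choice of data. For illustration only, the sketch reuses the twelve numbers of the before-and-after example and treats them as if they were unpaired.

```python
import math
import statistics

def diff_of_means(x1, x2):
    """Return (difference of means, its standard deviation, effective degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1   # s1^2 / n1
    v2 = statistics.variance(x2) / n2   # s2^2 / n2
    diff = statistics.mean(x1) - statistics.mean(x2)   # steps 1 and 2
    s_diff = math.sqrt(v1 + v2)                        # step 3
    # Assumption: Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, s_diff, df

# Illustration only: the before/after numbers, with the pairing ignored.
system1 = [85, 83, 94, 90, 88, 87]
system2 = [86, 88, 90, 95, 91, 83]

diff, s_diff, df = diff_of_means(system1, system2)

# Assumption: df rounds to 10 for this data; 2.228 is the 0.975 quantile of
# the t distribution with 10 degrees of freedom, read from a t table.
T_CRIT_975_DF10 = 2.228

# Step 4: two-sided 95% confidence interval for the difference of means.
lo = diff - T_CRIT_975_DF10 * s_diff
hi = diff + T_CRIT_975_DF10 * s_diff

# Step 5: no statistically significant difference if the interval includes 0.
print(f"diff = {diff:.2f}, df = {df:.1f}, CI = [{lo:.2f}, {hi:.2f}]")
print("no significant difference" if lo <= 0 <= hi else "significant difference")
```

For large n1 and n2 the t quantile can be replaced by the corresponding normal quantile.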
OS example
Initial operating system
• n1 = 1,300,203 interrupts (3.5 hours)
• m1 = 142,892 interrupts occurred in OS code
• p1 = 0.1099, or 11% of time executing in OS
Upgrade OS
• n2 = 999,382
• m2 = 84,876
• p2 = 0.0849, or 8.5% of time executing in OS
Statistically significant improvement?
OS example (2.)
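p = p1 – p2 = 0.0250
sp = 0.0003911
90% confidence interval: (0.0242, 0.0257)
Statistically significant difference?
→ The interval does not include 0, so the improvement is statistically significant at this confidence level.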
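The OS example can be recomputed from the interrupt counts. This is a sketch under stated assumptions: the two samples are treated as independent binomial proportions, and the normal approximation with a two-sided 90% interval (0.95 quantile) is used. The slides give the results but not the formula, so the printed bounds may differ from the slide in the last digit, depending on rounding and the quantile used.

```python
import math
from statistics import NormalDist

# Interrupt counts from the OS example.
n1, m1 = 1_300_203, 142_892   # initial OS: total samples, samples in OS code
n2, m2 = 999_382, 84_876      # upgraded OS

p1 = m1 / n1
p2 = m2 / n2
p = p1 - p2                   # difference of proportions

# Assumption: independent binomial samples, so the variances add.
s_p = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Assumption: normal approximation, two-sided 90% interval (0.95 quantile).
z = NormalDist().inv_cdf(0.95)
lo, hi = p - z * s_p, p + z * s_p

significant = not (lo <= 0 <= hi)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p = {p:.4f}, s_p = {s_p:.7f}")
print(f"90% CI = ({lo:.4f}, {hi:.4f}), significant = {significant}")
```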
Important points
Use confidence intervals to determine if there are statistically significant differences
Before-and-after comparisons
• Find interval for mean of differences
Noncorresponding measurements
• Find interval for difference of means
If interval includes zero
→ No statistically significant difference
Comparing more than two alternatives
Naïve approach
• Compare confidence intervals
One-factor Analysis of Variance (ANOVA)
Very general technique
• Look at total variation in a set of measurements
• Divide into meaningful components
Also called
• One-way classification
• One-factor experimental design
One-factor Analysis of Variance (ANOVA)
Separates total variation observed in a set of measurements into:
1. Variation within one system
   Due to random measurement errors
2. Variation between systems
   Due to real differences + random error
Is variation (2) statistically greater than variation (1)?
ANOVA
Make n measurements of each of k alternatives
yij = ith measurement on jth alternative
Assumes errors are:
• Independent
• Gaussian (normal)
Measurements for all alternatives
                       Alternatives
Measurements   1      2      …   j      …   k
1              y11    y12    …   y1j    …   y1k
2              y21    y22    …   y2j    …   y2k
…              …      …      …   …      …   …
i              yi1    yi2    …   yij    …   yik
…              …      …      …   …      …   …
n              yn1    yn2    …   ynj    …   ynk
Col mean       y.1    y.2    …   y.j    …   y.k
Effect         α1     α2     …   αj     …   αk
Column means: Average performance of one alternative
Error: Deviation from column mean
Overall mean: Average performance of all alternatives
Effect: Deviation from overall mean
Effects and errors
Effect is distance from overall mean
• Horizontally across alternatives
Error is distance from column mean
• Vertically within one alternative
• Error across alternatives, too
Individual measurements are then:
$y_{ij} = \bar{y}_{..} + \alpha_j + e_{ij}$
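Written out in symbols, the quantities named on the preceding slides (column mean, overall mean, effect, error) can be defined as follows. The notation follows the measurement table; the explicit formulas are the standard one-factor definitions and are an assumption in the sense that the slides describe them only in words.

```latex
\begin{align*}
\bar{y}_{.j} &= \frac{1}{n}\sum_{i=1}^{n} y_{ij}
  && \text{(column mean of alternative } j\text{)} \\
\bar{y}_{..} &= \frac{1}{kn}\sum_{j=1}^{k}\sum_{i=1}^{n} y_{ij}
  && \text{(overall mean)} \\
\alpha_j &= \bar{y}_{.j} - \bar{y}_{..}
  && \text{(effect: deviation from overall mean)} \\
e_{ij} &= y_{ij} - \bar{y}_{.j}
  && \text{(error: deviation from column mean)}
\end{align*}
```

Adding the last two lines to the overall mean gives back each individual measurement $y_{ij}$.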
Sum of squares of differences
SST = differences between each measurement and overall mean
SSA = variation due to effects of alternatives
SSE = variation due to errors in measurements
$SST = SSA + SSE$
Sum of squares of differences
$SSA = n \sum_{j=1}^{k} (\bar{y}_{.j} - \bar{y}_{..})^2$
$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n} (y_{ij} - \bar{y}_{.j})^2$
$SST = \sum_{j=1}^{k} \sum_{i=1}^{n} (y_{ij} - \bar{y}_{..})^2$
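Why the total splits exactly into the two parts is not shown on the slides. A short derivation, using only the definitions of the column mean and overall mean, is sketched below.

```latex
\begin{align*}
SST &= \sum_{j=1}^{k}\sum_{i=1}^{n} (y_{ij} - \bar{y}_{..})^2
     = \sum_{j=1}^{k}\sum_{i=1}^{n}
       \left[ (y_{ij} - \bar{y}_{.j}) + (\bar{y}_{.j} - \bar{y}_{..}) \right]^2 \\
    &= \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n} (y_{ij} - \bar{y}_{.j})^2}_{SSE}
     + \underbrace{n \sum_{j=1}^{k} (\bar{y}_{.j} - \bar{y}_{..})^2}_{SSA}
     + 2 \sum_{j=1}^{k} (\bar{y}_{.j} - \bar{y}_{..})
       \sum_{i=1}^{n} (y_{ij} - \bar{y}_{.j})
\end{align*}
```

The inner sum in the last term is zero for every $j$, because deviations from a column's own mean sum to zero. The cross term vanishes, leaving the sum of the two parts.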
ANOVA – Fundamental idea
Separates variation in measured values into:
1. Variation due to effects of alternatives
   • SSA – variation across columns
2. Variation due to errors
   • SSE – variation within a single column
If differences among alternatives are due to real differences,
   • SSA should be statistically greater than SSE
Comparing SSE and SSA
Simple approach
• SSA / SST = fraction of total variation explained by differences among alternatives
• SSE / SST = fraction of total variation due to experimental error
But is it statistically significant?
Comparing variances
Use F-test (statistics) to compare ratio of variances
If F_computed > F_table
→ We have (1 – α) * 100% confidence that variation due to actual differences in alternatives, SSA, is statistically greater than variation due to errors, SSE.
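The slide does not spell out how the computed F value is formed. In the usual one-factor setting with k alternatives and n measurements each, the sums of squares are first turned into variances by dividing by their degrees of freedom. The symbols $s_a^2$ and $s_e^2$ below are notation introduced here, not taken from the slides.

```latex
\begin{align*}
s_a^2 &= \frac{SSA}{k - 1}, \qquad
s_e^2 = \frac{SSE}{k(n - 1)} \\
F_{\text{computed}} &= \frac{s_a^2}{s_e^2}, \qquad
F_{\text{table}} = F_{[1-\alpha;\; k-1,\; k(n-1)]}
\end{align*}
```

The tabulated value is the $(1-\alpha)$ quantile of the F distribution with $k-1$ numerator and $k(n-1)$ denominator degrees of freedom.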
ANOVA example
                      Alternatives
Measurements   1         2         3         Overall mean
1              0.0972    0.1382    0.7966
2              0.0971    0.1432    0.5300
3              0.0969    0.1382    0.5152
4              0.1954    0.1730    0.6675
5              0.0974    0.1383    0.5298
Column mean    0.1168    0.1462    0.6078    0.2903
Effects        -0.1735   -0.1441   0.3175
Conclusions from example
SSA/SST = 0.7585/0.8270 = 0.917
→ 91.7% of total variation in measurements is due to differences among alternatives
SSE/SST = 0.0685/0.8270 = 0.083
→ 8.3% of total variation in measurements is due to noise in measurements
Computed F statistic > tabulated F statistic
→ 95% confidence that differences among alternatives are statistically significant.
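The numbers in these conclusions can be rechecked from the 15 measurements of the ANOVA example. The sketch below uses the standard one-factor degrees of freedom, k - 1 and k(n - 1). The tabulated value 3.89 (0.95 quantile of F with 2 and 12 degrees of freedom) is an assumption read from an F table, since the slide does not quote it.

```python
# Measurements from the ANOVA example: one list per alternative.
data = {
    1: [0.0972, 0.0971, 0.0969, 0.1954, 0.0974],
    2: [0.1382, 0.1432, 0.1382, 0.1730, 0.1383],
    3: [0.7966, 0.5300, 0.5152, 0.6675, 0.5298],
}
k = len(data)        # number of alternatives
n = len(data[1])     # measurements per alternative

col_mean = {j: sum(ys) / n for j, ys in data.items()}
overall = sum(sum(ys) for ys in data.values()) / (k * n)
effect = {j: col_mean[j] - overall for j in data}

# Sums of squares: between alternatives, within alternatives, total.
ssa = n * sum(a ** 2 for a in effect.values())
sse = sum((y - col_mean[j]) ** 2 for j, ys in data.items() for y in ys)
sst = sum((y - overall) ** 2 for ys in data.values() for y in ys)

# F statistic: ratio of the two variances (sum of squares / degrees of freedom).
f_computed = (ssa / (k - 1)) / (sse / (k * (n - 1)))

# Assumption: 0.95 quantile of F with (2, 12) degrees of freedom, from an F table.
F_TABLE_95_2_12 = 3.89

print(f"SSA = {ssa:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"SSA/SST = {ssa / sst:.3f}, SSE/SST = {sse / sst:.3f}")
print(f"F computed = {f_computed:.1f}, significant = {f_computed > F_TABLE_95_2_12}")
```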
Contrasts
ANOVA tells us that there is a statistically significant difference among alternatives
But it does not tell us where the difference is
Use method of contrasts to compare subsets of alternatives
• A vs B
• {A, B} vs {C}
• Etc.
Contrast = linear combination of effects of alternatives
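A contrast value is simply a weighted sum of the effects. The sketch below computes the two comparisons listed above for the effects of the ANOVA example, assuming A, B, C correspond to alternatives 1, 2, 3. The helper name `contrast` and the weight vectors are illustrative. The requirement that the weights sum to zero is the standard definition of a contrast and is not stated on the slide. Only the contrast value is computed here, not its confidence interval.

```python
# Effects of alternatives 1, 2, 3 from the ANOVA example.
effects = [-0.1735, -0.1441, 0.3175]

def contrast(weights, effects):
    """Return sum(w_j * alpha_j); the weights of a contrast must sum to zero."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * a for w, a in zip(weights, effects))

# A vs B: weights (1, -1, 0).
c_a_vs_b = contrast([1, -1, 0], effects)

# {A, B} vs {C}: average of A and B against C, weights (1/2, 1/2, -1).
c_ab_vs_c = contrast([0.5, 0.5, -1], effects)

print(f"A vs B:        {c_a_vs_b:.4f}")
print(f"{{A, B}} vs {{C}}: {c_ab_vs_c:.4f}")
```

Whether a contrast is statistically significant is then decided the same way as before: build a confidence interval around it and check whether it includes 0.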
Important Points
Use one-factor ANOVA to separate total variation into:
– Variation within one system
   Due to random errors
– Variation between systems
   Due to real differences (+ random error)
Is the variation due to real differences statistically greater than the variation due to errors?
Generalized design of experiments
Goals
• Isolate effects of each input variable
• Determine effects of interactions
• Determine magnitude of experimental error
• Obtain maximum information for given effort
Basic idea
• Expand 1-factor ANOVA to m factors
Terminology
Response variable
• Measured output value: e.g., total execution time
Factors
• Input variables that can be changed
• E.g.: cache size, clock rate, bytes transmitted
Levels
• Specific values of factors
• Continuous (~bytes) or discrete (type of system)
Replication
• Completely re-run experiment with same input levels
Interaction
• Effect of input factor A depends on level of input factor B
Two-factor experiments
Two factors (inputs)
• A, B
Separate total variation in output values into:
• Effect due to A
• Effect due to B
• Effect due to interaction of A and B (AB)
• Experimental error
Example – User response time
A = degree of multiprogramming
B = memory size
AB = interaction of memory size and degree of multiprogramming

         B (Mbytes)
A        32      64      128
1        0.25    0.21    0.15
2        0.52    0.45    0.36
3        0.81    0.66    0.50
4        1.50    1.45    0.70
Two-factor ANOVA
Factor A – a input levels
Factor B – b input levels
n measurements for each input combination
abn total measurements
Two-factor ANOVA
Each individual measurement is composition of
• Overall mean
• Effects
• Interactions
• Measurement errors
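In symbols, this composition is usually written as below. The notation is an assumption chosen to match the one-factor model: $\alpha_i$ is the effect of level $i$ of factor A, $\beta_j$ the effect of level $j$ of factor B, $\gamma_{ij}$ their interaction, and $e_{ijk}$ the error of the $k$th replication.

```latex
\begin{align*}
y_{ijk} &= \bar{y}_{...} + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk},
  \qquad i = 1,\dots,a;\; j = 1,\dots,b;\; k = 1,\dots,n \\
SST &= SSA + SSB + SSAB + SSE
\end{align*}
```

The second line is the two-factor counterpart of the one-factor split of the total variation, with one sum of squares per term of the model.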
Example
Output = user response time (seconds)
Want to separate effects due to
• A = degree of multiprogramming
• B = memory size
• AB = interaction
• Error
Need replications to separate error
Conclusions from example
• 77.6% (SSA/SST) of all variation in response time due to degree of multiprogramming
• 11.8% (SSB/SST) due to memory size
• 9.9% (SSAB/SST) due to interaction
• 0.7% due to measurement error
• 95% confident that all effects and interactions are statistically significant
A problem
Full factorial design with replication
• Measure system response with all possible input combinations
• Replicate each measurement n times to determine effect of measurement error
m factors, v levels, n replications
→ $n v^m$ experiments
m = 5 input factors, v = 4 levels, n = 3
→ $3 \cdot 4^5$ = 3,072 experiments!
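The count can be checked by enumerating the design. This is a small sketch: the generator name `full_factorial` and the use of level indices 0 to 3 in place of real factor values are illustrative assumptions.

```python
from itertools import product

def full_factorial(levels_per_factor, n):
    """Yield every (input combination, replication index) of a full factorial design."""
    for combo in product(*levels_per_factor):
        for rep in range(n):
            yield combo, rep

# m = 5 factors with v = 4 levels each (level indices stand in for real values).
levels = [range(4)] * 5
runs = sum(1 for _ in full_factorial(levels, n=3))

print(runs)   # n * v**m experiments
```

Every additional factor multiplies the number of experiments by v, which is why the full design quickly becomes impractical.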
Fractional factorial designs: $n 2^m$ experiments
Special case of generalized m-factor experiments
Restrict each factor to two possible values
• High, low
• On, off
Find factors that have largest impact
Full factorial design with only those factors
Still too many experiments with $n 2^m$!
Plackett and Burman designs (1946)
• Multifactorial designs
• Effects of main factors only
• Logically minimal number of experiments to estimate effects of m input parameters (factors)
• Ignores interactions
Requires O(m) experiments
• Instead of O(2^m) or O(v^m)
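To get a feel for the difference in growth, the run counts can be tabulated side by side. The sketch assumes the usual property of Plackett and Burman designs that the number of runs is a multiple of 4 and that a design with N runs accommodates up to N - 1 factors. This detail is not on the slide, and the function names are illustrative.

```python
def two_level_full_factorial_runs(m: int) -> int:
    """Runs for a two-level full factorial design with m factors (no replication)."""
    return 2 ** m

def plackett_burman_runs(m: int) -> int:
    """Assumed run count: smallest multiple of 4 strictly greater than m."""
    return 4 * (m // 4 + 1)

print(f"{'m':>3} {'2^m':>10} {'P-B runs':>9}")
for m in (3, 7, 11, 15, 19):
    print(f"{m:>3} {two_level_full_factorial_runs(m):>10} {plackett_burman_runs(m):>9}")
```

The Plackett and Burman column grows linearly in m, while the full factorial column doubles with every added factor.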