Statistical Comparison of Two or More Systems
The most relevant of all the Basic Theory Lectures.
No Holidays.
THE MISSION
Your analysis task involves manipulating conditions of the system of interest from a prescribed set of options.
Design of Experiments: Determine whether the different options are really different. Is the best one really statistically better?
Ranking and Selection: What is the probability that the best sample indicates the best system setting?
VOCABULARY
Factor: An element of the system that will be manipulated
Setting or Level: A value that a Factor may assume
EXAMPLE: Simulation model of Football (EA Sports)
Factors: Quarterback, Running Back, Strong Safety
Settings or Levels for Quarterback: Dante’, Bret, Johnny U.
TYPES OF DESIGNS
One Factor, Two Settings: Paired samples; Behrens-Fisher. Question: Which is Best?
More than one Factor: Factorial Designs; Partially Exhaustive Designs. Question: Are the settings significant difference-makers?
PAIRED SAMPLES
Example: Quarterback Controversy!
Simulate St. Louis Rams vs. Tampa Bay Bucs, recording the Quarterback Rating
Level 1: Kurt Warner
Level 2: Marc Bulger
Run the simulation 28 times for each player, resulting in the data set W1, W2, ..., W28 and B1, B2, ..., B28
Is E[B] > E[W]?
BRUTE FORCE
Confidence interval on the quantity E[W]-E[B]
If it doesn’t include 0.0, we have statistically significant evidence (at the interval's confidence level) that there is a difference
Equivalent to the Hypothesis Test H0: E[B]=E[W]
CALCULATIONS ON VARIANCES: SOME BASICS
Let X and Y be random variables
$$\mathrm{VAR}[X] = E\left[(X - E[X])^2\right] = \int (x - E[X])^2 \, dF_X(x)$$
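CALCULATIONS ON VARIANCES: SOME BASICS
Let X and Y be random variables
1) $\mathrm{VAR}[X] = E[X^2] - (E[X])^2$
2) $\mathrm{VAR}[X+Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] + 2\,\mathrm{COV}[X,Y]$
3) $\mathrm{COV}[X,Y] = E[XY] - E[X]\,E[Y]$
4) $\mathrm{VAR}[cX] = c^2\,\mathrm{VAR}[X]$
5) $\mathrm{VAR}[X-Y] = \mathrm{VAR}[X] + \mathrm{VAR}[Y] - 2\,\mathrm{COV}[X,Y]$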
COV=0 if X and Y are independent.
SAMPLE MEAN
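$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$\mathrm{VAR}[\bar{X}] = \mathrm{VAR}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{VAR}[X_i] = \frac{1}{n^2}\, n\,\mathrm{VAR}[X] = \frac{\mathrm{VAR}[X]}{n}$$
The second equality uses independence of the X_i, so all covariance terms are 0.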
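The result VAR[X-bar] = VAR[X]/n can be checked by simulation. The sketch below is only an illustration: the standard normal distribution, the replication count, and the seed are assumptions chosen for convenience, and n = 28 is borrowed from the quarterback example.

```python
import random
import statistics


def variance_of_sample_mean(n, replications=20000, seed=12345):
    """Estimate VAR[X-bar] by building many independent sample means of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    # Population variance of the simulated sample means
    return statistics.pvariance(means)


n = 28
simulated = variance_of_sample_mean(n)
theory = 1.0 / n  # VAR[X] = 1 for the assumed standard normal X
print(f"simulated VAR[X-bar] = {simulated:.5f}, VAR[X]/n = {theory:.5f}")
```

The two printed values should agree to within simulation noise.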
CONFIDENCE INTERVAL
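α/2 probability of Type I error on each end of the confidence interval
The basic interval for X-bar, with σ² = VAR[X], is
$$\bar{X} \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{X}]} \;=\; \bar{X} \pm Z_{\alpha/2}\sqrt{\frac{\sigma^2}{n}} \;=\; \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$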
BASIC CONFIDENCE INTERVAL
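$$(\bar{W} - \bar{B}) \pm Z_{\alpha/2}\sqrt{\mathrm{VAR}[\bar{W} - \bar{B}]}$$
$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W},\bar{B}] = \frac{\mathrm{VAR}[W] + \mathrm{VAR}[B]}{28} - 0$$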
SPREADSHEET HIGHLIGHTS 1
U is uniform on (0,1), e.g. RAND()
(U-0.5)*SQRT(12): zero mean, unit stddev
m + (U-0.5)*SQRT(12)*s: mean m, stddev s, uniform over an interval centered at m and s*SQRT(12) wide (half-width s*SQRT(12)/2)
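The spreadsheet transformation can be reproduced outside a spreadsheet. The sketch below assumes Python's `random.random()` as a stand-in for RAND(); the function name and the values m = 50 and s = 4 are hypothetical. Since U - 0.5 ranges over (-0.5, 0.5), the samples stay within s*SQRT(12)/2 of m on either side, for a full width of s*SQRT(12).

```python
import math
import random
import statistics


def uniform_with_moments(u, m=0.0, s=1.0):
    """Map u ~ Uniform(0,1) to a uniform variate with mean m and stddev s."""
    return m + (u - 0.5) * math.sqrt(12.0) * s


rng = random.Random(2024)  # arbitrary seed, for repeatability only
samples = [uniform_with_moments(rng.random(), m=50.0, s=4.0) for _ in range(100000)]

sample_mean = statistics.fmean(samples)
sample_stddev = statistics.pstdev(samples)
observed_width = max(samples) - min(samples)
full_width = 4.0 * math.sqrt(12.0)

print(f"mean {sample_mean:.3f}, stddev {sample_stddev:.3f}")
print(f"observed width {observed_width:.3f}, theoretical width {full_width:.3f}")
```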
COMMON RANDOM NUMBERS
Correlation is not always BAD!
Suppose we could INDUCE positive CORRELATION between the W’s and the B’s without adding any bias?
Reduces the theoretical variance of W-bar - B-bar, because the covariance term is subtracted
FREE POWER (power = the probability of correctly rejecting H0: equal means)
STREAMING
Segregate the random number generation task into streams connected to phenomena
Stream 1 (seed1): Inter-arrival times
Stream 2 (seed2): Service times
Each stream is generated by $Z_i = a Z_{i-1} \bmod m$
1. Change features of the service.
2. Use the exact same arrival stream for comparing each service setting.
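A minimal sketch of streaming for a single-server queue follows. Several things here are assumptions made for illustration: the exponential inter-arrival and service times, the seeds, the mean service times, and the use of the Lindley recursion for waiting times. Python's `random.Random` is a Mersenne Twister generator, not the multiplicative congruential generator on the slide; it is used only because each instance provides an independent, separately seeded stream.

```python
import random


def interarrival_times(seed, count, mean=1.0):
    """Stream 1: inter-arrival times only, fully determined by its own seed."""
    stream = random.Random(seed)
    return [stream.expovariate(1.0 / mean) for _ in range(count)]


def service_times(seed, count, mean):
    """Stream 2: service times only, fully determined by its own seed."""
    stream = random.Random(seed)
    return [stream.expovariate(1.0 / mean) for _ in range(count)]


def mean_wait(arrival_seed, service_seed, mean_service, customers=1000):
    """Average wait in queue for one service setting (Lindley recursion)."""
    gaps = interarrival_times(arrival_seed, customers)
    services = service_times(service_seed, customers, mean_service)
    wait, total = 0.0, 0.0
    for gap, service in zip(gaps, services):
        total += wait
        # Next customer's wait: leftover work after the inter-arrival gap
        wait = max(0.0, wait + service - gap)
    return total / customers


# Change only the service feature; both settings see the same arrival stream.
fast = mean_wait(arrival_seed=11, service_seed=22, mean_service=0.5)
slow = mean_wait(arrival_seed=11, service_seed=22, mean_service=0.8)
print(f"mean wait: fast service {fast:.3f}, slow service {slow:.3f}")
```

Because the arrival stream has its own seed, changing the service setting cannot shift which random numbers the arrivals consume.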
SPREADSHEET HIGHLIGHTS 2
Use the same results of RAND() for building the Bulger samples and the Warner samples
Note the CI shrinkage
Try with identical sigma
Discuss “Estimation”
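The spreadsheet experiment can be sketched in code. Everything numeric below is hypothetical (the means, the stddevs, the seed), and the ratings are generated with the uniform transformation m + (U-0.5)*SQRT(12)*s purely for illustration. The sketch compares the half-width of the Z-based interval on the mean of the differences W_i - B_i when the two quarterbacks use separate uniforms versus the same uniforms.

```python
import math
import random
import statistics

SQRT12 = math.sqrt(12.0)


def rating(u, mean, stddev):
    """Uniform rating with the given mean and stddev, driven by u in [0, 1)."""
    return mean + (u - 0.5) * SQRT12 * stddev


def ci_half_width(diffs, alpha=0.05):
    """Half-width of the Z-based interval on the mean of the differences."""
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(statistics.variance(diffs) / len(diffs))


n = 28
rng = random.Random(7)  # arbitrary seed
u_w = [rng.random() for _ in range(n)]
u_b = [rng.random() for _ in range(n)]

# Hypothetical rating parameters for Warner (W) and Bulger (B)
MEAN_W, SD_W = 90.0, 10.0
MEAN_B, SD_B = 85.0, 8.0

# Separate uniforms for each quarterback: no induced correlation
independent = [rating(uw, MEAN_W, SD_W) - rating(ub, MEAN_B, SD_B)
               for uw, ub in zip(u_w, u_b)]
# Common random numbers: both quarterbacks driven by the same uniforms
crn = [rating(u, MEAN_W, SD_W) - rating(u, MEAN_B, SD_B) for u in u_w]
# Identical sigma with common random numbers: the differences are constant
identical = [rating(u, MEAN_W, SD_W) - rating(u, MEAN_B, SD_W) for u in u_w]

print(f"independent half-width: {ci_half_width(independent):.3f}")
print(f"CRN half-width:         {ci_half_width(crn):.3f}")
print(f"CRN, identical sigma:   {ci_half_width(identical):.3g}")
```

With identical sigma the same uniform shifts both ratings by the same amount, so every difference equals the difference in means and the interval collapses.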
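Behrens-Fisher Problem
Comparison of Means
No pairing, equal sample sizes, or equal variances required
Remember that we are after the variance of W-bar - B-bar
Common use: New samples vs. History
$$\mathrm{VAR}[\bar{W} - \bar{B}] = \mathrm{VAR}[\bar{W}] + \mathrm{VAR}[\bar{B}] - 2\,\mathrm{COV}[\bar{W},\bar{B}] = \frac{\mathrm{VAR}[W]}{n_W} + \frac{\mathrm{VAR}[B]}{n_B} - 0$$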
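A minimal sketch of the unequal-sample-size interval, assuming independent samples (so the covariance term is 0) and plugging sample variances into VAR[W]/n_W + VAR[B]/n_B. The function name and both data sets are made up for illustration. The Z quantile follows the lecture's basic interval; with samples this small a t quantile would normally be preferred.

```python
import math
import statistics


def unequal_variance_ci(w, b, alpha=0.05):
    """Z-based CI on E[W]-E[B]; sample sizes and variances may differ."""
    var_of_difference = (statistics.variance(w) / len(w)
                         + statistics.variance(b) / len(b))
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    difference = statistics.fmean(w) - statistics.fmean(b)
    half_width = z * math.sqrt(var_of_difference)
    return difference - half_width, difference + half_width


# Hypothetical data: a few new samples compared against a longer history
new_samples = [88.0, 92.0, 95.0, 85.0, 90.0]
history = [80.0, 84.0, 86.0, 78.0, 82.0, 85.0, 79.0, 82.0]

low, high = unequal_variance_ci(new_samples, history)
print(f"interval on E[W]-E[B]: ({low:.2f}, {high:.2f})")
print("includes 0.0" if low <= 0.0 <= high else "excludes 0.0")
```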
SPREADSHEET HIGHLIGHTS
MULTI-SETTING CASE
Can involve many Factors or just one
Treatment i has mean $\mu_i$
Analysis of Variance (ANOVA)
Data from treatments 1, 2, ..., n
H0: $\mu_1 = \dots = \mu_{n-1} = \mu_n$
Are the treatments distinguishable?
DESIGN OF EXPERIMENT
1. Determine Factors and Settings
2. Collect Data According to Design (Design = Which Factors, Which Settings for each Treatment)
3. Perform ANOVA
4. State Conclusion
FULL FACTORIAL
Build a sample of All Combinations
Factors: Quarterback (2), Running Back (3), Strong Safety (3)
2x3x3=18 Treatments
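Enumerating a full factorial design is a Cartesian product over the factor settings. In the sketch below the setting labels (QB1, RB1, and so on) are placeholders invented for illustration; only the counts 2, 3, and 3 come from the example.

```python
from itertools import product

# Placeholder setting labels; only the number of settings per factor matters here
factors = {
    "Quarterback": ["QB1", "QB2"],
    "Running Back": ["RB1", "RB2", "RB3"],
    "Strong Safety": ["SS1", "SS2", "SS3"],
}

# One treatment per combination of settings, one setting from each factor
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

print(len(treatments), "treatments")
for treatment in treatments[:3]:
    print(treatment)
```

The count grows multiplicatively with each added factor.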
HOW ANOVA WORKS
$X_{i,j}$ is the ith sample from the jth treatment point
Assumed iid Normal (never!)
Decomposition of variability: Observation (Obs), Treatment vs. Grand Mean (Tr), Within Treatment (Res)
$$X_{i,j} = \mu_j + e_{i,j}$$
HYPOTHESIS H0
The treatment variability is random variability
The size of the treatment variability is in-scale with the residual variability
ANOVA uses sums of squares
g treatments
$n_t$ samples from treatment t
ANOVA TABLE
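Here $x_{j,t}$ is the jth sample from treatment t, $\bar{x}_t$ is the mean of treatment t, and $\bar{x}$ is the grand mean.
Treatment: $SS_{Tr} = \sum_{t=1}^{g} n_t(\bar{x}_t - \bar{x})^2$, degrees of freedom $g - 1$
Residual: $SS_{Res} = \sum_{t=1}^{g}\sum_{j=1}^{n_t}(x_{j,t} - \bar{x}_t)^2$, degrees of freedom $\sum_{t=1}^{g} n_t - g$
Observation: $SS_{Obs} = \sum_{t=1}^{g}\sum_{j=1}^{n_t}(x_{j,t} - \bar{x})^2$, degrees of freedom $\sum_{t=1}^{g} n_t - 1$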
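The three sums of squares and their degrees of freedom can be computed directly from the definitions: treatment means against the grand mean (Tr), samples against their own treatment mean (Res), and samples against the grand mean (Obs). The sketch below uses a small made-up data set with unequal sample sizes; the function name is hypothetical. It also shows that the treatment and residual sums of squares add up to the observation sum of squares.

```python
import statistics


def anova_sums_of_squares(treatments):
    """Return {source: (sum of squares, degrees of freedom)} for lists of samples."""
    g = len(treatments)
    n_total = sum(len(samples) for samples in treatments)
    grand_mean = sum(sum(samples) for samples in treatments) / n_total
    means = [statistics.fmean(samples) for samples in treatments]

    ss_tr = sum(len(samples) * (mean - grand_mean) ** 2
                for samples, mean in zip(treatments, means))
    ss_res = sum((x - mean) ** 2
                 for samples, mean in zip(treatments, means) for x in samples)
    ss_obs = sum((x - grand_mean) ** 2 for samples in treatments for x in samples)

    return {
        "Tr": (ss_tr, g - 1),
        "Res": (ss_res, n_total - g),
        "Obs": (ss_obs, n_total - 1),
    }


# Hypothetical data: g = 3 treatments with 3, 4, and 3 samples
data = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]]
table = anova_sums_of_squares(data)
for source, (ss, df) in table.items():
    print(f"{source:>3}: SS = {ss:.2f}, d.f. = {df}")
```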
REMEMBER chi-SQUARED?
From our Goodness-of-Fit Test
X~N(0,1): for n independent X’s, the sum of the n values of X^2 is chi-SQUARED with n degrees of freedom
If estimates (X-bar, sigma) were used to make the X’s N(0,1), lose one d.f. per estimate
F-distribution
X is chi-sq with n d.f.
Y is chi-sq with m d.f., independent of X
(X/n)/(Y/m) has an F distribution with (n, m) d.f.
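The degrees-of-freedom bookkeeping can be seen in a small simulation. A chi-squared variable with k degrees of freedom has mean k, so the average of many simulated sums of squares reveals the degrees of freedom. In this sketch (n = 5, the seed, and the replication count are arbitrary assumptions) the squares are taken either about the known mean 0 or about the estimated mean X-bar; only the mean is estimated, so one degree of freedom is lost.

```python
import random
import statistics


def sum_of_squares_draws(n, center_on_sample_mean, replications=20000, seed=99):
    """Simulate sums of n squared N(0,1) deviations about 0 or about X-bar."""
    rng = random.Random(seed)
    draws = []
    for _ in range(replications):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        center = statistics.fmean(xs) if center_on_sample_mean else 0.0
        draws.append(sum((x - center) ** 2 for x in xs))
    return draws


n = 5
about_known_mean = sum_of_squares_draws(n, center_on_sample_mean=False)
about_estimated_mean = sum_of_squares_draws(n, center_on_sample_mean=True)

print(f"known mean:     average = {statistics.fmean(about_known_mean):.3f} (n = {n})")
print(f"estimated mean: average = {statistics.fmean(about_estimated_mean):.3f} (n - 1 = {n - 1})")
```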
ANOVA HYPOTHESIS TEST
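$$\frac{SS_{Tr}/\mathrm{d.f.}_{Tr}}{SS_{Res}/\mathrm{d.f.}_{Res}} \sim F_{\,g-1,\;\sum_t n_t - g}$$
Under H0, each sum of squares divided by σ² is chi-squared with its degrees of freedom.
The normalizing σ cancels!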
ANOVA HYPOTHESIS TEST
Compare the test statistic to a table of F critical values
Reject if it's big and conclude that ... the Treatments are Different!
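A minimal sketch of the test itself, using a made-up data set of three treatments. The critical value 4.74 is the 5% point of the F distribution with (2, 7) degrees of freedom read from a standard F table; it applies only to this illustrative data set. Rescaling every observation by a constant leaves the statistic unchanged, which is the cancellation of the normalizing sigma in action.

```python
import statistics


def f_statistic(treatments):
    """Return (F, treatment d.f., residual d.f.) for lists of samples."""
    g = len(treatments)
    n_total = sum(len(samples) for samples in treatments)
    grand_mean = sum(sum(samples) for samples in treatments) / n_total
    ss_tr = sum(len(samples) * (statistics.fmean(samples) - grand_mean) ** 2
                for samples in treatments)
    ss_res = sum((x - statistics.fmean(samples)) ** 2
                 for samples in treatments for x in samples)
    df_tr, df_res = g - 1, n_total - g
    return (ss_tr / df_tr) / (ss_res / df_res), df_tr, df_res


# Hypothetical data: 3 treatments with 3, 4, and 3 samples
data = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]]
f_value, df_tr, df_res = f_statistic(data)

# 5% critical value for F with (2, 7) d.f., taken from a standard F table
F_CRITICAL_2_7 = 4.74
reject = f_value > F_CRITICAL_2_7
print(f"F = {f_value:.2f} on ({df_tr}, {df_res}) d.f.; reject H0: {reject}")

# Rescaling the data multiplies both sums of squares by the same factor
rescaled = [[10.0 * x for x in samples] for samples in data]
f_rescaled, _, _ = f_statistic(rescaled)
print(f"F after rescaling = {f_rescaled:.2f}")
```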
SPREADSHEET HIGHLIGHTS