Prediction of New Observations
Statistics Seminar: 6th talk, ETHZ FS2010
Martina Albers, 12 April 2010
Papers: Welham (2004), Yiang (2007)
Content
• Introduction
• Prediction of Mixed Effects
• Prediction of Future Observation
• Principles of Prediction
  – Prediction Process
  – Fixed and Random Terms
• Example: Split-Plot Design
Linear mixed model
\[ y = X\beta + Zb + \varepsilon = W\gamma + \varepsilon \]
• $y$: $n \times 1$ vector of observations
• $X$: $n \times p$ matrix associating observations with the appropriate combination of fixed effects
• $\beta$: $p \times 1$ vector of fixed effects
• $Z$: $n \times q$ matrix associating observations with the appropriate combination of random effects
• $b$: $q \times 1$ vector of random effects
• $\varepsilon$: $n \times 1$ vector of residual errors
• $W = (X \;\; Z)$, $\gamma = (\beta', b')'$: combined design matrix and combined vector of effects
Introduction
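Linear mixed model
\[ y = X\beta + Zb + \varepsilon = W\gamma + \varepsilon \]
\[ (Y \mid B = b) \sim N(X\beta + Zb,\ \sigma^2 I_n), \qquad B \sim N(0, \Sigma_\theta) \]
Assumption: $b \perp \varepsilon$
where $\Sigma_\theta = \operatorname{cov}(B)$ is $q \times q$ and symmetric, with
\[ \Sigma_\theta = \sigma^2 \Lambda_\theta \Lambda_\theta' \]
Use: $B = \Lambda_\theta U$ with $U \sim N(0, \sigma^2 I_q)$, so that
\[ (Y \mid U = u) \sim N(X\beta + Z\Lambda_\theta u,\ \sigma^2 I_n), \qquad U \sim N(0, \sigma^2 I_q) \]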
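Linear mixed model
Compute $\hat{u}$ and $\hat{\beta}_\theta$:
\[ \begin{pmatrix} \hat{u} \\ \hat{\beta}_\theta \end{pmatrix} = \arg\min_{u,\beta} \left\| \begin{pmatrix} y \\ 0 \end{pmatrix} - \begin{pmatrix} Z\Lambda_\theta & X \\ I_q & 0 \end{pmatrix} \begin{pmatrix} u \\ \beta \end{pmatrix} \right\|^2 \]
Normal equations:
\[ \begin{pmatrix} \Lambda_\theta' Z'Z \Lambda_\theta + I_q & \Lambda_\theta' Z'X \\ X'Z\Lambda_\theta & X'X \end{pmatrix} \begin{pmatrix} \hat{u} \\ \hat{\beta}_\theta \end{pmatrix} = \begin{pmatrix} \Lambda_\theta' Z'y \\ X'y \end{pmatrix} \]
Cholesky factorization of the coefficient matrix:
\[ \begin{pmatrix} \Lambda_\theta' Z'Z \Lambda_\theta + I_q & \Lambda_\theta' Z'X \\ X'Z\Lambda_\theta & X'X \end{pmatrix} = \begin{pmatrix} L_\theta & 0 \\ R_{ZX}' & R_X' \end{pmatrix} \begin{pmatrix} L_\theta' & R_{ZX} \\ 0 & R_X \end{pmatrix} \]
with
\[ L_\theta L_\theta' = \Lambda_\theta' Z'Z \Lambda_\theta + I_q, \qquad L_\theta R_{ZX} = \Lambda_\theta' Z'X, \qquad R_X' R_X = X'X - R_{ZX}' R_{ZX} \]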
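As a hedged aside, the least-squares problem on this slide can be read as a penalized least-squares problem. The derivation below is a sketch that only expands the stacked norm and sets the two gradients to zero; it uses the slide's symbols and introduces no new ones.

```latex
\[
\left\| \begin{pmatrix} y \\ 0 \end{pmatrix}
 - \begin{pmatrix} Z\Lambda_\theta & X \\ I_q & 0 \end{pmatrix}
   \begin{pmatrix} u \\ \beta \end{pmatrix} \right\|^2
 = \| y - X\beta - Z\Lambda_\theta u \|^2 + \| u \|^2
\]
% Gradient with respect to u set to zero:
\[
\Lambda_\theta' Z' (y - X\beta - Z\Lambda_\theta u) - u = 0
\;\Longleftrightarrow\;
(\Lambda_\theta' Z'Z\Lambda_\theta + I_q)\,u + \Lambda_\theta' Z'X\,\beta = \Lambda_\theta' Z'y
\]
% Gradient with respect to beta set to zero:
\[
X' (y - X\beta - Z\Lambda_\theta u) = 0
\;\Longleftrightarrow\;
X'Z\Lambda_\theta\,u + X'X\,\beta = X'y
\]
```

The term $\|u\|^2$ is what shrinks the random effects towards zero; without it the problem would be ordinary least squares in $(u, \beta)$.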
What do we mean by “prediction”?
• Estimation of effects in the model
  → prediction = linear combination of estimated effects
• What is needed?
  – Marginal vs. Conditional predictions
  – In Example: variety or nitrogen prediction?
Is there a general strategy?
Problem: Each situation needs to be analyzed “by hand”!
Questions that might arise:
• In- or exclusion of random model terms from the prediction?
• Different weighting schemes?
• etc.
Predictions
2 types of prediction:
→ Prediction of mixed effects
→ Prediction of a future observation
Linear mixed model
\[ y = X\beta + Zb + \varepsilon = W\gamma + \varepsilon \]
• $y$: $n \times 1$ vector of observations
• $X$: $n \times p$ matrix associating observations with the appropriate combination of fixed effects
• $\beta$: $p \times 1$ vector of fixed effects
• $Z$: $n \times q$ matrix associating observations with the appropriate combination of random effects
• $b$: $q \times 1$ vector of random effects
• $\varepsilon$: $n \times 1$ vector of residual errors
• $W = (X \;\; Z)$, $\gamma = (\beta', b')'$: combined design matrix and combined vector of effects
Prediction of Mixed effects
Prediction of Mixed Effects
“All parameters are known”, i.e. fixed effects + variance components are known.
Consider:
\[ \gamma = x'\beta + z'b = x'\beta + \xi, \qquad \xi = z'b \]
• $x$, $z$: known vectors
• $\beta$, $b$: vectors of fixed/random effects
Best predictor for $\xi$ (in the MSE sense):
\[ \hat{\xi} = E(\xi \mid y) = z' E(b \mid y) \]
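Assumption
\[ \begin{pmatrix} b \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ X\beta \end{pmatrix}, \begin{pmatrix} \operatorname{cov}(b) & \operatorname{cov}(b) Z' \\ Z \operatorname{cov}(b) & V \end{pmatrix} \right) \]
with $R = \operatorname{cov}(\varepsilon)$ and $V = \operatorname{cov}(y) = Z \operatorname{cov}(b) Z' + R$.
\[ \Rightarrow\ E(b \mid y) = \operatorname{cov}(b)\, Z' V^{-1} (y - X\beta) \]
\[ \Rightarrow\ \hat{\xi} = z' \operatorname{cov}(b)\, Z' V^{-1} (y - X\beta) \]
Best linear predictor of $\gamma$ is then
\[ \hat{\gamma} = x'\beta + z' \operatorname{cov}(b)\, Z' V^{-1} (y - X\beta) \]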
Example: IQ-Test
− Estimate the true IQ of a student scoring 130 in a test
− Model: $\mathrm{IQ} \sim N(100, 15^2)$, $\mathrm{score} \mid \mathrm{IQ} \sim N(\mathrm{IQ}, 5^2)$
  $y = \mu + b + \varepsilon$, with $y$ the student's test score and $b$ the realization of a random effect
− Predict: $\mu + b$, the student's true IQ (unobservable)
Result: $\widehat{\mathrm{IQ}} = 100 + \frac{15^2}{15^2 + 5^2}\,(130 - 100) = 127$
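The IQ result can be checked with a few lines of Python. This is a minimal sketch assuming all parameters are known, as on the slide; the function name `blup_true_iq` and its argument names are hypothetical and chosen only for this illustration.

```python
def blup_true_iq(score, mu=100.0, sd_iq=15.0, sd_test=5.0):
    """Best linear predictor of mu + b for a single test score y = mu + b + eps."""
    var_b = sd_iq ** 2      # variance of the random effect b (true IQ around mu)
    var_e = sd_test ** 2    # variance of the measurement error eps
    # cov(b) Z' V^{-1} reduces to a scalar shrinkage factor when Z = 1.
    shrinkage = var_b / (var_b + var_e)
    return mu + shrinkage * (score - mu)


predicted_iq = blup_true_iq(130.0)
print(round(predicted_iq, 2))
```

The shrinkage factor is 225 / 250 = 0.9, so the observed deviation of 30 points is pulled back to 27 points.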
Prediction of Mixed Effects
• Fixed effects + variance components unknown
• 20 Students
• each student: 5 tests
• Computations as described before
• See R-File: rf_prediction3.R
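The R file itself is not reproduced here. As a hedged stand-in, the sketch below simulates 20 students with 5 tests each and plugs ANOVA (method-of-moments) estimates of the variance components into the predictor of the true IQ. The simulation settings, the seed, the function names, and the use of ANOVA estimators instead of whatever the R file uses are all assumptions made for this illustration.

```python
import random


def simulate_scores(n_students=20, n_tests=5, mu=100.0, sd_iq=15.0, sd_test=5.0, seed=1):
    """Simulated test scores: one row per student (hypothetical data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_students):
        true_iq = rng.gauss(mu, sd_iq)
        data.append([rng.gauss(true_iq, sd_test) for _ in range(n_tests)])
    return data


def predict_true_iq(data):
    """Plug-in predictor of mu + b_i for a balanced one-way random-effects layout."""
    n_students, n_tests = len(data), len(data[0])
    student_means = [sum(row) / n_tests for row in data]
    grand_mean = sum(student_means) / n_students  # estimate of mu
    ms_between = n_tests * sum((m - grand_mean) ** 2 for m in student_means) / (n_students - 1)
    ms_within = sum(
        (y - m) ** 2 for row, m in zip(data, student_means) for y in row
    ) / (n_students * (n_tests - 1))
    var_e = ms_within
    var_b = max((ms_between - ms_within) / n_tests, 0.0)  # truncated at zero
    shrinkage = var_b / (var_b + var_e / n_tests)
    predictions = [grand_mean + shrinkage * (m - grand_mean) for m in student_means]
    return grand_mean, student_means, shrinkage, predictions


data = simulate_scores()
grand_mean, student_means, shrinkage, predictions = predict_true_iq(data)
print(round(shrinkage, 3))
```

With five tests per student the shrinkage factor is typically much closer to 1 than in the single-test case, because the student mean is a more precise measurement than one score.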
Prediction of Future Observations
AIM: construct prediction intervals, i.e. an interval in which a future observation will fall with a certain probability given what has been observed.
Examples:
1. Longitudinal studies
  – Prediction of a fut. obs. from an individual not previously observed
  – Less interest in predicting another observation from an observed individual, as the studies often aim at applications to a larger population
  – E.g.: drugs going to the market after clinical trials
2. Surveys
  – 2-step survey:
  – A) number of families randomly selected
  – B) some family members of each family are interviewed
  – Prediction for a non-selected family
Prediction Intervals
a. Assumption: fut. obs. has certain distribution
  – Distribution defined up to a finite number of unknown parameters
  – Estimate parameters → obtain prediction interval
  – BUT: if distribution assumption fails, the interval might be wrong
b. Distribution-free
  – Normality is not assumed
  – Distribution-free approach
  – Assumption: future observation is independent of current ones
c. Markov chains, Monte Carlo
d. et cetera…
Confidence vs. Prediction Intervals
Confidence Interval (CI)
• Interval estimate of population parameter
• → unobservable population parameter
• Predict distribution of estimate of unobservable quantity of interest (e.g. true pop. mean)
Prediction Interval (PI)
• Interval estimate of future observation
• → future observation
• Predict distribution of individual future points
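Confidence vs. Prediction Intervals (math.)
Fixed effect model:
\[ y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I_n) \]
\[ \hat{\beta} = (X'X)^{-1} X'y, \qquad \operatorname{var}(\hat{\beta}) = \operatorname{var}(\hat{\beta} - \beta) = \sigma^2 (X'X)^{-1} \]
• true: $\breve{y}_i = x_i'\beta$
• observed: $y_i = x_i'\beta + \varepsilon_i$
• modelled: $\hat{y}_i = x_i'\hat{\beta}$ (→ CI)
• predicted: $\hat{y}_{n+1} = x_{n+1}'\hat{\beta}$ (→ PI)
Interval (normal approximation): $\hat{y} \pm 1.96 \cdot \sqrt{\operatorname{var}}$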
Confidence vs. Prediction Intervals (math.)
\[ \text{CI:}\quad \operatorname{var}(\hat{y}_i) = \sigma^2\, x_i' (X'X)^{-1} x_i \]
\[ \text{PI:}\quad \operatorname{var}(\hat{y}_{n+1} - y_{n+1}) = \sigma^2 \left( x_{n+1}' (X'X)^{-1} x_{n+1} + 1 \right) \]
\[ \text{Confidence Interval:}\quad \hat{y}_i \pm 1.96 \cdot \sqrt{\sigma^2\, x_i' (X'X)^{-1} x_i} \]
\[ \text{Prediction Interval:}\quad \hat{y}_{n+1} \pm 1.96 \cdot \sqrt{\sigma^2 \left( x_{n+1}' (X'X)^{-1} x_{n+1} + 1 \right)} \]
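To make the difference between the two half-widths concrete, here is a minimal sketch for a straight-line fixed effect model with intercept and slope. The data are invented toy values, the function names are hypothetical, and the unknown σ² is replaced by the usual residual estimate; the factor 1.96 is the normal approximation used on the slides.

```python
import math


def fit_line(x, y):
    """Least squares for y = b0 + b1 * x; returns beta, (X'X)^{-1} and sigma^2 estimate."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # Closed-form inverse of the 2x2 matrix X'X for X = [1, x].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return (b0, b1), inv, rss / (n - 2)


def intervals(x_new, beta, inv, sigma2, z=1.96):
    """Point prediction with CI and PI half-widths at x_new."""
    x_vec = (1.0, x_new)
    leverage = sum(x_vec[i] * inv[i][j] * x_vec[j] for i in range(2) for j in range(2))
    y_hat = beta[0] + beta[1] * x_new
    ci_half = z * math.sqrt(sigma2 * leverage)
    pi_half = z * math.sqrt(sigma2 * (leverage + 1.0))
    return y_hat, ci_half, pi_half


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy data, not from the talk
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, inv, sigma2 = fit_line(x, y)
y_hat, ci_half, pi_half = intervals(6.0, beta, inv, sigma2)
print(ci_half < pi_half)
```

The prediction interval is always wider: its squared half-width exceeds that of the confidence interval by exactly 1.96² σ², the contribution of the future observation's own error.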
Confidence vs. Prediction Intervals
How do we construct Prediction Intervals for a more general model?
Mixed effects model: $y = X\beta + Zb + \varepsilon$
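Prediction Intervals
Model, with $n$ observations:
\[ y = X\beta + Zb + \varepsilon, \qquad b \perp \varepsilon, \qquad b \sim N(0, \Sigma_\theta), \qquad \varepsilon \sim N(0, \sigma^2 I_n) \]
\[ V = \operatorname{cov}(y) = Z \Sigma_\theta Z' + \sigma^2 I_n \]
\[ \operatorname{cov}(b) = \Sigma_\theta = \sigma_1^2 I_q, \qquad \operatorname{cov}(\varepsilon) = \sigma^2 I_n \]
Estimate $\beta$ and $b$:
\[ \hat{\beta} = (X' \hat{V}^{-1} X)^{-1} X' \hat{V}^{-1} y \]
\[ \tilde{b} = \hat{\sigma}_1^2\, Z' \hat{V}^{-1} (y - X\hat{\beta}) \]
where $\hat{V} = \hat{\sigma}_1^2 ZZ' + \hat{\sigma}^2 I_n$.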
Prediction Intervals
Marginal:
\[ \operatorname{cov}(\hat{y}_i) = \operatorname{cov}(x_i'\hat{\beta}) = x_i' \operatorname{cov}(\hat{\beta})\, x_i \]
\[ \operatorname{cov}(\hat{y}_{n+1} - y_{n+1}) = \operatorname{cov}\left( x_{n+1}'\hat{\beta} - (x_{n+1}'\beta + z_{n+1}'b + \varepsilon_{n+1}) \right) = x_{n+1}' \operatorname{cov}(\hat{\beta} - \beta)\, x_{n+1} + z_{n+1}' \Sigma_\theta z_{n+1} + \sigma^2 \]
Conditional (on all random effects):
\[ \operatorname{cov}(\hat{y}_{n+1} - y_{n+1} \mid b = \tilde{b}) = \sigma^2 \left( x_{n+1}' (X'X)^{-1} x_{n+1} + 1 \right) \]
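Prediction Intervals
Marginal:
\[ \text{Confidence Interval:}\quad \hat{y}_i \pm 1.96 \cdot \sqrt{x_i' \operatorname{cov}(\hat{\beta})\, x_i} \]
\[ \text{Prediction Interval:}\quad \hat{y}_{n+1} \pm 1.96 \cdot \sqrt{x_{n+1}' \operatorname{cov}(\hat{\beta} - \beta)\, x_{n+1} + z_{n+1}' \Sigma_\theta z_{n+1} + \sigma^2} \]
Conditional on all random effects:
\[ \text{Prediction Interval:}\quad \hat{y}_{n+1} \pm 1.96 \cdot \sqrt{\sigma^2 \left( x_{n+1}' (X'X)^{-1} x_{n+1} + 1 \right)} \]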
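A small numerical sketch may help to see how far apart the marginal and the conditional interval can be. It reuses the numbers of the IQ example (mean 100, standard deviations 15 and 5) and assumes, purely for illustration, that all parameters are known, so the terms for the uncertainty in the fixed-effect estimate drop out. The helper name `prediction_interval` and the value b = 27 taken as given are assumptions of this sketch.

```python
import math


def prediction_interval(center, variance, z=1.96):
    """Normal-approximation interval center +/- z * sqrt(variance)."""
    half = z * math.sqrt(variance)
    return center - half, center + half


mu = 100.0
var_b = 15.0 ** 2   # variance of the random effect (Sigma_theta with q = 1)
var_e = 5.0 ** 2    # residual variance sigma^2

# Marginal: test score of a new, previously unobserved student (z' Sigma z + sigma^2).
marginal = prediction_interval(mu, var_b + var_e)

# Conditional: next test score of a student whose random effect is taken as b = 27.
conditional = prediction_interval(mu + 27.0, var_e)

print([round(v, 1) for v in marginal], [round(v, 1) for v in conditional])
```

The marginal interval carries the between-student variance as well as the test error, which is why it is roughly three times as wide as the conditional one in this setting.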
Prediction Intervals
There is a difference between marginal and conditional predictions!
→ Which one is of interest?
The prediction process
Prediction:
• is a linear function of the best linear (unbiased) predictor of random effects with the best linear (unbiased) estimator of fixed effects in the model
• is typically associated with a combination of explanatory variables
• either averaged over, ignoring, or at a specific value of other explanatory variables in the model
Principles of Prediction
The prediction process
Partition of the explanatory variables (e.v.) into 3 sets:
1. Classifying set
  • e.v. for which predicted values are required
2. Averaging set
  • e.v. which have to be averaged over
3. Rest
  • e.v. which will be ignored
The role of fixed and random effects with respect to prediction
Fixed Terms
• have an associated set of effects (parameters) which have to be estimated
Random Terms
• associated effects are normally distributed with 0 mean and covariance matrix
• covariance matrix is a function of (usually) unknown parameters
• error terms due to randomization or other structure of the data
How to deal with Random Factor Terms
1. Evaluate at given value(s) specified by the user
2. Average over the set of random effects
  • Prediction specific to / conditional on the random effects observed
  • → “Conditional prediction” w.r.t. the term
3. Omit the random term from the model
  • Prediction at the population average (zero)
  • substitutes the assumed pop. mean for an unknown random effect
  • → “Marginal prediction” w.r.t. the term
How to deal with Fixed Factors
• no pre-defined population average
• no natural interpretation for a prediction derived by omitting a fixed term from the fitted values
• average over all the present levels to give a conditional average
• or: user should specify the value(s)
4 conceptual steps for the prediction process
1. Choose e.v. and their respective values for which predictive margins are required, i.e. determine the classifying set
2. Determine which variables should be averaged over, i.e. determine the averaging set
3. Determine terms that are needed to compute parameters and estimates
4. Choose the weighting for taking means over margins (for the averaging set)
Split-Plot Design
Experiment:
• 4 levels of nitrogen
• 3 oat varieties
• 6 “tries”, i.e. 6 blocks
• 3 whole-plots per block
• 4 subplots per whole-plot
• random allocation of nitrogen within a block
Fixed effects:
• treatment combination
Random effects:
• blocking factors (source of error variation)
AIM: estimate the performance of each treatment combination within the experiment
The Data-Set
The model: components
fixed    ~ constant + variety + nitrogen + variety:nitrogen
random   ~ blocks + blocks:wplots
residual ~ blocks:wplots:splots
Random terms:
• error terms used in estimation of treatment effects
• not otherwise relevant to the prediction of treatment effects
Conditional vs. Marginal prediction for the random effects
Conditional prediction:
• gives a prediction specific to the blocks and plots used in the experiment
• appropriate to inference for the specific instance that occurred in the dataset
Marginal prediction:
• the prediction corresponds to the yields expected from a similar experiment laid out using different blocks and plots
• appropriate when inference is required for members of the wider population
The model
\[ y_{ijk} = \mu + b_i + v_{r(ij)} + w_{ij} + n_{s(ijk)} + vn_{rs} + e_{ijk} \]
• $y_{ijk}$: yield on block $i = 1, \dots, 6$, whole-plot $j = 1, \dots, 3$, sub-plot $k = 1, \dots, 4$
• $\mu$: overall constant
• $b_i$: effect of block $i$
• $v_{r(ij)}$: effect of variety $r(ij)$
• $r(ij)$: randomization of varieties to whole-plots
• $w_{ij}$: effect of whole-plot $j$ in block $i$
• $n_{s(ijk)}$: effect of nitrogen level $s(ijk)$
• $s(ijk)$: randomization of nitrogen levels to sub-plots
• $vn_{rs}$: interaction of variety level $r$ with nitrogen level $s$
• $e_{ijk}$: residual error for block $i$, whole-plot $j$, sub-plot $k$
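Assumption
\[ y_{ijk} = \mu + b_i + v_{r(ij)} + w_{ij} + n_{s(ijk)} + vn_{rs} + e_{ijk} \]
For the following terms we assume a normal distribution:
\[ b = (b_1, b_2, \dots, b_6), \qquad w = (w_{11}, w_{12}, \dots, w_{63}), \qquad e = (e_{111}, e_{112}, \dots, e_{634}) \]
\[ \begin{pmatrix} b \\ w \\ e \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_b^2 I_6 & 0 & 0 \\ 0 & \sigma_w^2 I_{18} & 0 \\ 0 & 0 & \sigma^2 I_{72} \end{pmatrix} \right) \]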
ANOVA
• no variety × nitrogen interaction
• usually: drop non-significant terms
\[ y_{ijk} = \mu + b_i + v_{r(ij)} + w_{ij} + n_{s(ijk)} + vn_{rs} + e_{ijk} \]
Prediction Process
\[ y_{ijk} = \mu + b_i + v_{r(ij)} + w_{ij} + n_{s(ijk)} + vn_{rs} + e_{ijk} \]
Prediction of yield for each nitrogen level
• = general effect of different nitrogen applications
• → unweighted average across all varieties
• = prediction of yield for nitrogen level $l$ for an “average” block + whole-plot
• → marginal prediction w.r.t. block + whole-plot
Calculation: ignore random terms:
\[ \hat{\mu} + \hat{n}_l + \frac{1}{3} \sum_{j=1}^{3} \left( \hat{v}_j + \widehat{vn}_{jl} \right) \]
Prediction Process
\[ y_{ijk} = \mu + b_i + v_{r(ij)} + w_{ij} + n_{s(ijk)} + vn_{rs} + e_{ijk} \]
Prediction specific to blocks + whole-plots in experiment
• → conditional prediction
Calculation: include random terms:
\[ \hat{\mu} + \hat{n}_l + \frac{1}{3} \sum_{j=1}^{3} \left( \hat{v}_j + \widehat{vn}_{jl} \right) + \frac{1}{6} \sum_{i=1}^{6} \tilde{b}_i + \frac{1}{18} \sum_{i=1}^{6} \sum_{j=1}^{3} \tilde{w}_{ij} \]
Prediction Process
Explanatory variable   Set       Levels      Averaging
                       M    C    M     C     M    C
Variety                a    a    all   all   e    e
Nitrogen               c    c    all   all   n    n
Blocks                 x    a    -     all   -    e
Wplots                 x    a    -     all   -    e
Splots                 x    x    -     -     -    -
M: marginal pred., C: conditional pred.
a: averaging set, c: classifying set, x: excluded
e: equal weights, n: none
Prediction Process
Model term             In prediction?
                       M    C
Constant               +    +
Variety                +    +
Nitrogen               +    +
Variety:nitrogen       +    +
Blocks                 x    +
Blocks:wplots          x    +
Blocks:wplots:splots   x    x
+: used, x: ignored
The resulting predictions
Predictions for nitrogen application levels with SE and SED, using marginal (M) or conditional (C) values of blocks + whole-plots
Nitrogen application   Prediction   SE (M)   SE (C)
0.0 cwt/acre           79.4         7.18     3.14
0.2 cwt/acre           98.9         7.18     3.14
0.4 cwt/acre           114.2        7.18     3.14
0.6 cwt/acre           123.4        7.18     3.14
SED                                 4.44     4.44
The resulting variety × nitrogen predictions
Variety       Nitrogen application (cwt/acre)      Margin
              0.0      0.2      0.4      0.6
Golden Rain   80.00    98.50    114.67   124.83    104.50
Marvellous    86.67    108.50   117.17   126.83    109.79
Victory       71.50    89.67    110.83   118.50    97.63
Margin        79.39    98.89    114.22   123.39    103.97
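The margins of this table follow from the prediction process with equal weights: classify by one factor and average over the other. The sketch below recomputes them from the cell means; the Golden Rain cell at 0.4 cwt/acre is taken as 114.67, the value implied by the published row and column margins. Variable and function names are hypothetical.

```python
NITROGEN_LEVELS = [0.0, 0.2, 0.4, 0.6]  # cwt/acre

# Predicted cell means, one list per variety, ordered by nitrogen level.
CELL_MEANS = {
    "Golden Rain": [80.00, 98.50, 114.67, 124.83],
    "Marvellous": [86.67, 108.50, 117.17, 126.83],
    "Victory": [71.50, 89.67, 110.83, 118.50],
}


def nitrogen_margins(cells):
    """Classifying set: nitrogen. Averaging set: variety, with equal weights."""
    n_levels = len(next(iter(cells.values())))
    return [sum(row[k] for row in cells.values()) / len(cells) for k in range(n_levels)]


def variety_margins(cells):
    """Classifying set: variety. Averaging set: nitrogen, with equal weights."""
    return {variety: sum(row) / len(row) for variety, row in cells.items()}


n_margins = nitrogen_margins(CELL_MEANS)
v_margins = variety_margins(CELL_MEANS)
overall = sum(n_margins) / len(n_margins)

for level, value in zip(NITROGEN_LEVELS, n_margins):
    print(level, round(value, 2))
```

Changing the weights in either function is how a different weighting scheme for the averaging set would enter the calculation.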
The resulting predictions
• SE smaller for conditional predictions
  → because predictions are calculated conditional on the blocks + whole-plots observed!
• Using marginal values = “no information on block + whole-plot effect”
Special case: data missing
Data for all replicates of Golden Rain with 0 cwt nitrogen are missing
Special case: data missing
Variety       Nitrogen application (cwt/acre)      Margin
              0.0      0.2      0.4      0.6
Golden Rain   ???      98.50    114.67   124.83
Marvellous    86.67    108.50   117.17   126.83    109.79
Victory       71.50    89.67    110.83   118.50    97.63
Margin                 98.89    114.22   123.39
Corresponding cell cannot be estimated without additional assumptions!
Special case: data missing
No significant variety main effect / interactions present in the model
→ Approach chosen has no great influence on nitrogen prediction
→ consider variety predictions
Possible approaches
• set inestimable parameters to 0, average over all cells
• average over cells with data present
• average over levels of nitrogen for which all varieties are present
The resulting predictions
Variety predictions with ‘Golden Rain + 0 cwt nitrogen’ plots set to missing value
Variety       Inestimable        For data   On nitrogen       Margin
              parameters zero    present    levels 0.2-0.6    (complete data)
Golden Rain   105.67             112.67     112.67            104.50
Marvellous    109.79             109.79     117.50            109.79
Victory       97.63              97.63      106.33            97.63
The resulting predictions
Variety       Inestimable        For data   On nitrogen
              parameters zero    present    levels 0.2-0.6
Golden Rain   105.67             112.67     112.67
Marvellous    109.79             109.79     117.50
Victory       97.63              97.63      106.33
In 2nd case: variety ordering has changed
→ predictions not comparable because of the large nitrogen effect
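Two of the three approaches can be reproduced directly from the cell means, which makes the change in ordering visible. The sketch below covers only "average over cells with data present" and "average over nitrogen levels for which all varieties are present"; the third approach depends on the parametrization of the fitted model and is not attempted here. The Golden Rain cell at 0.4 cwt/acre is taken as 114.67, the value implied by the complete-data margins, and all names are hypothetical.

```python
# Cell means by variety, ordered by nitrogen level 0.0, 0.2, 0.4, 0.6 cwt/acre.
# None marks the missing 'Golden Rain + 0 cwt nitrogen' cell.
CELLS_WITH_GAP = {
    "Golden Rain": [None, 98.50, 114.67, 124.83],
    "Marvellous": [86.67, 108.50, 117.17, 126.83],
    "Victory": [71.50, 89.67, 110.83, 118.50],
}


def mean_over_present(cells):
    """Average each variety over the cells that have data."""
    result = {}
    for variety, row in cells.items():
        present = [v for v in row if v is not None]
        result[variety] = sum(present) / len(present)
    return result


def mean_over_common_levels(cells):
    """Average each variety over the nitrogen levels present for all varieties."""
    n_levels = len(next(iter(cells.values())))
    common = [k for k in range(n_levels) if all(row[k] is not None for row in cells.values())]
    return {variety: sum(row[k] for k in common) / len(common) for variety, row in cells.items()}


def ranking(predictions):
    """Varieties sorted from highest to lowest predicted yield."""
    return sorted(predictions, key=predictions.get, reverse=True)


present = mean_over_present(CELLS_WITH_GAP)
common = mean_over_common_levels(CELLS_WITH_GAP)
print(ranking(present), ranking(common))
```

Averaging over the cells present puts Golden Rain first only because its average omits the low-yield 0.0 cwt level; restricting all varieties to the common levels restores the ordering seen with complete data.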
Comparison
Data complete
Variety       Margin
Golden Rain   104.50
Marvellous    109.79
Victory       97.63
Margin        103.97
Data missing
Variety       Inest. param.   For data   On nitrogen
              zero            present    levels 0.2-0.6
Golden Rain   105.67          112.67     112.67
Marvellous    109.79          109.79     117.50
Victory       97.63           97.63      106.33
Margin        103.97
Other special cases: data missing
• First entry in the data is missing (i.e. Victory, 0.0 cwt/acre, Block I)
• Data in Block I for all replicates with 0.0 cwt nitrogen are missing
• All data for replicates with 0.0 cwt nitrogen are missing
Making the computations…
R-FILE!