Post on 29-Jun-2020
transcript
Statistical Methods for Analysis with Missing Data
Lecture 3: naïve methods: complete-case analysis and imputation
Mauricio Sadinle
Department of Biostatistics
Previous Lecture

Universe of missing-data mechanisms (nested classes: MCAR ⊂ MAR ⊂ MNAR):

- MCAR: p(R = r | z) = p(R = r)
  - Unreasonable in most cases
- MAR: p(R = r | z) = p(R = r | z_(r))
  - Hard to digest, in general
  - R ⊥⊥ Z_1 | Z_2, if Z_2 fully observed
- MNAR: p(R = r | z) ≠ p(R = r | z_(r))
  - Most realistic, but hard to handle
Today's Lecture

Naïve or ad-hoc methods:

- Complete-case / available-case analyses
- Different types of (single) imputation

Reading: Ch. 2 of Davidian and Tsiatis
Naïve or Ad-Hoc Methods

- Motivation: we know how to run analyses with complete (rectangular) datasets
- Idea: somehow "fix" the dataset so that the analysis for complete data can be run
Outline

Complete-Case and Available-Case Analysis
- Complete-Case Analysis
- Available-Case Analysis

Imputation
- Mean Imputation
- Mode Imputation
- Regression Imputation
- Hot-Deck Imputation
- Last Observation Carried Forward

Summary
Complete-Case Analysis

- Idea: ignore observations with missingness, run the intended analysis with the remaining data
Assumption for Complete-Case Analysis

Complete-case analysis implicitly assumes

  p(z) = p(z | R = 1_K)    (1)

where 1_K represents a vector (1, 1, ..., 1) of length K.

- By Bayes' theorem

  p(z | R = 1_K) = p(R = 1_K | z) p(z) / p(R = 1_K)

- Therefore, (1) is equivalent to

  p(R = 1_K | z) = p(R = 1_K)

- This doesn't require any assumptions on p(R = r | z) for r ≠ 1_K
- MCAR (Z ⊥⊥ R) is a sufficient condition for (1)
Complete-Case Analysis is Wasteful/Inefficient

Clearly, there can be a huge waste of information:

- Observed data with response patterns r ≠ 1_K should be informative about the distribution of Z_(r), which is informative about the distribution of Z:

  p(z_(r)) = ∫ p(z) dz_(r̄),   r ∈ {0, 1}^K,

  where z_(r̄) denotes the components of z with r_j = 0

- We might end up with very little data
  - Say R_1, ..., R_K are i.i.d. Bernoulli(π)
  - Then p(R = 1_K) = π^K → 0 as K → ∞
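The shrinking probability π^K of a fully observed record can be made concrete with a tiny Python sketch (the function name is mine, purely illustrative):

```python
def complete_case_fraction(pi, K):
    """Expected fraction of complete cases when the K response
    indicators R_1, ..., R_K are i.i.d. Bernoulli(pi):
    P(R = 1_K) = pi^K."""
    return pi ** K

# Even with 90% response per variable, a 20-variable dataset
# keeps only about 12% of its rows under complete-case analysis.
fraction = complete_case_fraction(0.9, 20)
```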
Example: Estimating a Mean

We'll see an alternative presentation of Example 1 in Section 1.4 of Davidian and Tsiatis.

- {(Y_i, R_i)}_{i=1}^n i.i.d. ~ F
- Y_i: numeric variable for individual i
- R_i: indicator of Y_i being observed
- If Y_i were always observed, we could estimate the mean of Y, µ = E(Y), as

  µ̂_full = (1/n) ∑_{i=1}^n Y_i
Example: Estimating a Mean

With missing data, we could use the complete cases:

  µ̂_cc = ∑_{i=1}^n Y_i R_i / ∑_{i=1}^n R_i

Is this any good?

HW1: show that the following holds

  E(µ̂_cc) = E(Y | R = 1)

for all sample sizes, provided that at least one Y_i is observed.

Hint: write E(µ̂_cc) = E[ E( ∑_{i=1}^n Y_i R_i / ∑_{i=1}^n R_i | R_1, ..., R_n ) ]
Example: Estimating a Mean

  E(µ̂_cc) = E(Y | R = 1)

Therefore:

- The complete-case estimator of the mean requires assuming

  E(Y) = E(Y | R = 1)

- In particular, it is valid under MCAR
- Otherwise, µ̂_cc is not valid for µ, as it estimates the wrong quantity
- HW1: if p(R = 1 | y) is an increasing function of y, show that

  E(Y | R = 1) > E(Y)
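The bias in the HW1 exercise is easy to see by simulation. A minimal Python sketch (the logistic selection model and all names are illustrative choices, not from the lecture): when P(R = 1 | y) increases with y, the complete-case mean overshoots E(Y).

```python
import math
import random

def simulate_cc_mean(n=100_000, seed=1):
    """Draw Y ~ N(0, 1) and observe each Y_i with probability
    increasing in y (a logistic selection model), then return
    (complete-case mean, full-data mean)."""
    rng = random.Random(seed)
    ys, obs = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        p_obs = 1.0 / (1.0 + math.exp(-y))   # P(R = 1 | y), increasing in y
        ys.append(y)
        obs.append(1 if rng.random() < p_obs else 0)
    cc_mean = sum(y for y, r in zip(ys, obs) if r) / sum(obs)
    full_mean = sum(ys) / n
    return cc_mean, full_mean
```

Running it, the complete-case mean lands well above the full-data mean (which is near 0), consistent with E(Y | R = 1) > E(Y).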
Available-Case Analysis

Sometimes what we need to estimate doesn't really require a "rectangular" dataset.

- If you can, just use whatever data are available for computing what you need
- Davidian and Tsiatis talk about generalized estimating equations (GEEs) and their Example 3 in Section 1.4 (we'll cover this when we get to Chapter 5)
- K normal random variables: under some missing-data assumption, it seems we could still obtain a good estimate of the distribution, as it only depends on univariate and bivariate quantities (means, variances, covariances)
Example of Available-Case Analysis

- Say the data are
  - Z_i = (Y_i1, ..., Y_iK)
  - R_i = (R_i1, ..., R_iK)
- Available-case estimators:

  µ̂_j^ac = ∑_{i=1}^n Y_ij R_ij / ∑_{i=1}^n R_ij,   j = 1, ..., K

  σ̂_jk^ac = ∑_{i=1}^n (Y_ij − µ̂_j^ac)(Y_ik − µ̂_k^ac) R_ij R_ik / (∑_{i=1}^n R_ij R_ik − 1),   j, k = 1, ..., K

- Better than complete-case analysis
- Valid under MCAR, but what are the minimal assumptions on the missing-data mechanism for this to be valid?
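The two estimators above translate directly to code. A minimal Python sketch (function names are mine): each mean uses only the entries observed for that variable, and each covariance uses only the pairwise-complete entries.

```python
def available_case_mean(y, r):
    """mu_hat_j^ac: mean of Y_j over the cases with R_ij = 1."""
    return sum(yi * ri for yi, ri in zip(y, r)) / sum(r)

def available_case_cov(yj, rj, yk, rk):
    """sigma_hat_jk^ac: covariance of Y_j and Y_k over the cases where
    both are observed, centred at the available-case means, with
    divisor (number of pairwise-complete cases - 1) as on the slide."""
    mj = available_case_mean(yj, rj)
    mk = available_case_mean(yk, rk)
    both = [a * b for a, b in zip(rj, rk)]   # R_ij * R_ik
    num = sum((yja - mj) * (yka - mk) * b
              for yja, yka, b in zip(yj, yk, both))
    return num / (sum(both) - 1)
```

Note a quirk of this construction: the means inside the covariance are computed from possibly different cases than the pairwise-complete set, which is one reason the resulting covariance matrix need not be positive semi-definite.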
Complete-Case and Available-Case Analysis

The moral:

- Complete-case analysis is wasteful and, most likely, invalid
- Available-case analysis is better, but still requires MCAR, or possibly a weaker assumption depending on what we need to compute
Imputation

- Idea: plug something "reasonable" into the holes of the dataset, then run the intended analysis with the completed data
Mean Imputation

- Numeric variables
- Impute the mean of the observed values
- Corresponds to imputing an estimate of E(Y_j | R_j = 1), j = 1, ..., K
- Leads to valid point estimates of means under MCAR
- Underestimates the true variance of estimators
Mean Imputation

Say the data are

- {(Z_i, R_i)}_{i=1}^n i.i.d. ~ F
- Z_i = (Y_i1, ..., Y_iK)
- R_i = (R_i1, ..., R_iK)

Mean imputation:

- Compute

  µ̂_j^1 = ∑_{i=1}^n Y_ij R_ij / ∑_{i=1}^n R_ij,   j = 1, ..., K

- Impute Y_ij with µ̂_j^1 whenever R_ij = 0
- Run your analysis as if your data were fully observed
Mean Imputation

  Before:               After mean imputation:

  Age   Income          Age        Income
  25    60,000          25         60,000
  ?     ?        =⇒     µ̂_Age^1   µ̂_Income^1
  51    ?               51         µ̂_Income^1
  ?     150,300         µ̂_Age^1   150,300
  ...   ...             ...        ...
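The table above can be produced by a few lines of code. A minimal Python sketch (names and the use of None for missing values are my choices), assuming every column has at least one observed value:

```python
def mean_impute(rows):
    """Column-wise mean imputation: replace each None with the mean
    of the observed values in that column (mu_hat_j^1)."""
    ncol = len(rows[0])
    means = []
    for j in range(ncol):
        obs = [row[j] for row in rows if row[j] is not None]
        means.append(sum(obs) / len(obs))   # assumes obs is non-empty
    return [[row[j] if row[j] is not None else means[j]
             for j in range(ncol)]
            for row in rows]

# The Age/Income example from the slide:
data = [[25, 60000], [None, None], [51, None], [None, 150300]]
completed = mean_impute(data)
```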
Example: Estimating a Mean

- Estimating a mean after mean imputation corresponds to using the estimator

  µ̂_j^mimp = (1/n) ∑_{i=1}^n [Y_ij R_ij + µ̂_j^1 (1 − R_ij)]

- µ̂_j^mimp is the mean of the imputed data, so its naïvely estimated variance is

  V̂_naïve(µ̂_j^mimp) = V̂_naïve(Y_j) / n,

  where

  V̂_naïve(Y_j) = (1/(n − 1)) ∑_{i=1}^n [R_ij (Y_ij − µ̂_j^mimp)² + (1 − R_ij)(µ̂_j^1 − µ̂_j^mimp)²]

- HW1: show that µ̂_j^mimp = µ̂_j^1
Example: Estimating a Mean

As a consequence, using the mean imputation method we:

- Underestimate the variance of each variable:

  V̂_naïve(Y_j) = (1/(n − 1)) ∑_{i=1}^n R_ij (Y_ij − µ̂_j^1)²

- Compare with an estimate based on the available cases:

  V̂^1(Y_j) = ∑_{i=1}^n R_ij (Y_ij − µ̂_j^1)² / (∑_{i=1}^n R_ij − 1)

- =⇒ V̂_naïve(Y_j) ≤ V̂^1(Y_j)
Example: Estimating a Mean

As a consequence, using the mean imputation method we:

- Underestimate the variance of µ̂_j^mimp:

  V̂_naïve(µ̂_j^mimp) = (1/(n(n − 1))) ∑_{i=1}^n R_ij (Y_ij − µ̂_j^1)²

- Compare with an estimate based on the available cases:

  V̂^1(µ̂_j^mimp) = ∑_{i=1}^n R_ij (Y_ij − µ̂_j^1)² / [(∑_{i=1}^n R_ij)(∑_{i=1}^n R_ij − 1)]

- =⇒ V̂_naïve(µ̂_j^mimp) ≤ V̂^1(µ̂_j^mimp)
- HW1: comment on the implications of mean imputation for the construction of confidence intervals
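The inequality V̂_naïve(Y_j) ≤ V̂^1(Y_j) is visible numerically: both estimates share the same sum of squared residuals over observed cases, but the naïve one divides by n − 1 instead of the number of observed cases minus 1. A Python sketch (the simulation setup and names are illustrative, assuming MCAR with about half the values missing):

```python
import random

def naive_vs_available_var(n=50_000, p_obs=0.5, seed=2):
    """Compare the naive post-mean-imputation variance estimate of Y_j
    with the available-case variance estimate, for Y ~ N(0, 1) with
    values missing completely at random."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = [1 if rng.random() < p_obs else 0 for _ in range(n)]
    m = sum(r)                                             # observed cases
    mu1 = sum(yi * ri for yi, ri in zip(y, r)) / m         # mu_hat_j^1
    ss = sum(ri * (yi - mu1) ** 2 for yi, ri in zip(y, r))
    v_naive = ss / (n - 1)   # treats the imputed values as real data
    v_avail = ss / (m - 1)   # uses only the observed cases
    return v_naive, v_avail
```

With roughly half the values imputed, the naïve estimate comes out near half the true variance, while the available-case estimate stays near 1; confidence intervals built from the naïve variance are therefore too short.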
Mode Imputation

- Categorical variables
- Impute the mode of the observed values
- Artificially inflates the frequency of the mode
- Leads to valid point estimates of marginal modes under MCAR
- Underestimates the true variance of estimators
Regression Imputation

- Regress one variable on the others based on observed data, then impute predicted values from the model
- Corresponds to imputing an estimate of E(Y_j | y_−j, R = 1_K), where y_−j = (y_1, ..., y_{j−1}, y_{j+1}, ..., y_K)
- Valid for means under MCAR
- Underestimates the true variance of estimators
- Validity depends on the model used for imputation
Example of Regression Imputation in Davidian and Tsiatis

- Z = (Y_1, Y_2), baseline and follow-up, Y_1 always observed
- R: indicator of response for Y_2
- Goal: to estimate µ_2 = E(Y_2)
- Say we posit a linear model E(Y_2 | y_1) = β_0 + β_1 y_1
- Impute Y_i2 with Ŷ_i2 = β̂_0 + β̂_1 Y_i1 when R_i = 0, with β̂_0 and β̂_1 obtained via least squares among the complete cases
- The regression imputation estimator for µ_2 is

  µ̂_2^rimp = (1/n) ∑_{i=1}^n [Y_i2 R_i + Ŷ_i2 (1 − R_i)]

- When is this valid? (When does µ̂_2^rimp → µ_2 as n → ∞?)
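The estimator µ̂_2^rimp can be sketched in a few lines of Python (names are mine; missing Y_2 entries are passed as placeholder values that the R_i flags mark as unobserved):

```python
def regression_impute_mean(y1, y2, r):
    """mu_hat_2^rimp: fit Y2 ~ Y1 by least squares on the complete
    cases (r[i] = 1), impute fitted values for the missing Y2, and
    average observed and imputed values together.
    Entries of y2 with r[i] = 0 are placeholders and are ignored."""
    cx = [x for x, ri in zip(y1, r) if ri]   # complete-case Y1
    cy = [y for y, ri in zip(y2, r) if ri]   # complete-case Y2
    m = len(cx)
    xbar, ybar = sum(cx) / m, sum(cy) / m
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(cx, cy)) \
         / sum((x - xbar) ** 2 for x in cx)
    b0 = ybar - b1 * xbar
    n = len(y1)
    total = sum(y2[i] if r[i] else b0 + b1 * y1[i] for i in range(n))
    return total / n
```

For instance, if the complete cases lie exactly on Y_2 = 1 + 2·Y_1, the imputed values extend that line and the estimator averages observed and fitted values.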
Example of Regression Imputation in Davidian and Tsiatis

Davidian and Tsiatis show that for µ̂_2^rimp → µ_2 as n → ∞ (µ̂_2^rimp →_p µ_2), we need these two requirements to hold simultaneously:

- E(Y_2 | y_1, R = 1) = E(Y_2 | y_1) (implied by MAR)
- E(Y_2 | y_1) is correctly specified, i.e., there really exist β_0* and β_1* such that E(Y_2 | y_1) = β_0* + β_1* y_1

However, even if these two conditions hold, single imputation leads to underestimation of variances, as seen with mean imputation.
Hot-Deck Imputation

- Replace missing values of a non-respondent (called the recipient) with observed values from a respondent (the donor)
- Recipient and donor need to be similar with respect to the variables observed for both cases
  - The donor can be selected randomly from a pool of potential donors
  - A single donor can be identified, e.g. a "nearest neighbour" based on some metric
- Andridge & Little (2010, Int. Stat. Rev.) reviewed this approach and concluded that
  - General patterns of missingness are difficult to deal with ("Swiss cheese" pattern)
  - There is a lack of theory to support this method
  - There is a lack of comparisons with other methods
  - Uncertainty from imputation is not taken into account (underestimation of variances)
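The nearest-neighbour variant mentioned above can be sketched as follows (a minimal Python illustration with names of my choosing; it assumes a single fully observed auxiliary variable and absolute distance as the metric):

```python
def hot_deck_impute(rows, target, aux):
    """Nearest-neighbour hot deck: for each recipient (a row whose
    `target` column is None), copy the target value from the donor
    whose `aux` value is closest. Assumes `aux` is fully observed
    and at least one donor exists."""
    donors = [row for row in rows if row[target] is not None]
    out = []
    for row in rows:
        if row[target] is None:
            donor = min(donors, key=lambda d: abs(d[aux] - row[aux]))
            row = list(row)            # copy, leave the input untouched
            row[target] = donor[target]
        out.append(row)
    return out

# Income (column 1) imputed from the donor with the closest age (column 0):
people = [[25, 60000], [51, None], [30, 70000]]
completed = hot_deck_impute(people, target=1, aux=0)
```

Note this single-imputation version shares the main criticism on the slide: treating the donated value as if it were observed ignores the imputation uncertainty.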
Last Observation Carried Forward

- Common in settings where a variable is measured repeatedly over time and there is dropout
- If there is dropout at time j, we don't observe Z_j, Z_{j+1}, ..., Z_T
- LOCF: replace all of Z_j, Z_{j+1}, ..., Z_T with Z_{j−1}
Last Observation Carried Forward

Example from Davidian and Tsiatis (figure): solid lines show observed data; dashed lines show data extrapolated with LOCF.
Last Observation Carried Forward

Attempts to justify LOCF:

- Interest in the last observed outcome measure (reasonable in some contexts?)
- Under some assumptions, it will lead to a conservative analysis
  - Say we have a clinical trial where the outcome under treatment is expected to improve over time
  - If the treatment is found to be superior even with LOCF, then the true effect should be even larger
  - This relies on the assumption of monotonic improvement over time!
Example of LOCF in Davidian and Tsiatis

Study participants' characteristic to be measured at T times:

- Y_j: measurement taken at time t_j
- D: participant dropout time
- Interest: µ_T = E(Y_T)
- The LOCF estimator of the mean is

  µ̂_T^LOCF = (1/n) ∑_{i=1}^n ∑_{j=1}^T I(D_i = j + 1) Y_ij

- The expected value of the LOCF estimator of the mean is

  E(µ̂_T^LOCF) = µ_T − ∑_{j=1}^{T−1} E[I(D = j + 1)(Y_T − Y_j)],

  so µ̂_T^LOCF is biased, in general
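The carry-forward mechanics and the estimator can be sketched in Python (names are mine; None marks measurements after dropout, and every participant is assumed to have at least one observation, in which case averaging the carried-forward final values matches the estimator above):

```python
def locf(series):
    """Carry the last observed value forward along one participant's
    series; None marks the times after dropout."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def locf_mean_at_T(all_series):
    """mu_hat_T^LOCF: average, over participants, of the carried-forward
    value at the final time T."""
    finals = [locf(s)[-1] for s in all_series]
    return sum(finals) / len(finals)
```

If outcomes tend to improve after dropout occurred, the carried-forward values sit below the unobserved Y_T, which is exactly the bias term ∑_j E[I(D = j + 1)(Y_T − Y_j)] on this slide.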
Summary

Main take-aways from today's lecture:

- Complete-case analyses are wasteful and potentially invalid unless MCAR holds
- Available-case analyses make better use of the available data, but still require MCAR (weaker assumptions may suffice, depending on the model/quantity being used/estimated)
- Imputation methods might be valid for some quantities under MCAR, but variances are underestimated =⇒ overconfidence in your results!

Next lecture:

- R session 1: imputation methods, some simulation studies
- Bring your laptops!