Some Practical Solutions to Analyzing Messy Data
Monnie McGee
Department of Statistical Science
Southern Methodist University, Dallas, Texas
Co-authored with N. Bergasa (SUNY Downstate Medical Center),
I. Ginsburg, and D. Engler (Columbia Presbyterian Medical Center)
ENAR Spring Meeting, March 20-23, 2005 – p.1/16
Gabapentin Study
Protocol called for 15 subjects in pre-post format
Half randomized to receive Gabapentin
Main outcomes: Hourly Scratching Activity (HSA) and Visual Analogue Score (VAS)
Two quantitations: at baseline and after 6 weeks
Each quantitation required a 48-hour hospital stay
Mixed Effects Model Analysis
Split-Plot Design, subjects nested within groups
Fixed Effects: group, treatment, and group-by-treatment interaction
Random Effects: subjects nested within group
Covariate: time of measurement
$y_i = X_i\beta + Z_i b_i + \varepsilon_i, \quad i = 1, \ldots, M$
$y_i$: $n_i$-dimensional response vector for subject $i$
$\beta$: $p$-dimensional vector of fixed effects
$X_i$ and $Z_i$: known regressor matrices
$b_i \sim N(0, \Sigma)$ and $\varepsilon_i \sim N(0, \sigma^2 I)$
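To make the split-plot structure concrete, the sketch below simulates data from a random-intercept version of this model: $Z_i$ is a column of ones, so $b_i$ is a scalar subject effect with $\Sigma$ = sd_subject². The fixed-effect values, variance components, group sizes (7 Gabapentin, 8 placebo) and 24 hourly readings per assessment are hypothetical, and the hour is recorded but given no effect. Fitting the model itself would need a mixed-model routine (for example SAS PROC MIXED), which is not shown.

```python
import random

# Hypothetical fixed effects (intercept, group, treat, group x treat); not study estimates.
BETA = {"intercept": 45.0, "gab": 30.0, "post": -35.0, "gab_post": 25.0}

def simulate_split_plot(n_gab=7, n_pbo=8, hours=24, beta=BETA,
                        sd_subject=20.0, sd_error=30.0, seed=1):
    """Simulate y = X beta + Z b + e for a random-intercept split-plot design.

    Subjects (whole plots) are nested in group; treat (Pre/Post) and hour vary
    within subject. Z_i is a column of ones, so b_i is a scalar subject effect.
    """
    rng = random.Random(seed)
    groups = ["Gab"] * n_gab + ["Pbo"] * n_pbo
    rows = []
    for subject, group in enumerate(groups):
        b_i = rng.gauss(0.0, sd_subject)  # b_i ~ N(0, sd_subject^2), shared by all of subject i's rows
        gab = 1 if group == "Gab" else 0
        for treat in ("Pre", "Post"):
            post = 1 if treat == "Post" else 0
            fixed = (beta["intercept"] + beta["gab"] * gab + beta["post"] * post
                     + beta["gab_post"] * gab * post)  # row of X_i times beta
            for hour in range(hours):
                e_ij = rng.gauss(0.0, sd_error)  # e_ij ~ N(0, sd_error^2)
                rows.append({"subject": subject, "group": group, "treat": treat,
                             "hour": hour, "y": fixed + b_i + e_ij})
    return rows

rows = simulate_split_plot()
print(len(rows))  # 15 subjects x 2 assessments x 24 hours = 720
```

Because every row of a subject shares the same b_i, observations within a subject are correlated, which is what the random effect for subjects nested within group captures.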
Mixed Model Results for HSA
With Time Covariate
Effect          Num DF  Den DF  F Value  Pr > F
Time                23     839     0.87  0.6461
Group                1      13     2.50  0.1376
Treat                1     839     7.65  0.0058
Group × Treat        1     839     2.12  0.1461
Without Time Covariate
Effect          Num DF  Den DF  F Value  Pr > F
Group                1      13     2.11  0.1700
Treat                1     846     7.45  0.0065
Group × Treat        1     846     1.34  0.2482
Estimates and Errors
Effect          Group  Treat  Estimate  Std. Error   P-value
Group           Gab              73.08       18.26    0.0015
Group           Pbo              26.51       23.08    0.2713
Treat                  Post      37.39       15.55    0.0167
Treat                  Pre       62.29       15.23  < 0.0001
Group × Treat   Gab    Post      67.16       19.56    0.0006
Group × Treat   Gab    Pre       79.00       19.04  < 0.0001
Group × Treat   Pbo    Post       7.44       24.19    0.7585
Group × Treat   Pbo    Pre       45.58       23.78    0.0556
Issues with the Data
Very small sample size
Disparate beginning times
A priori difference between gabapentin and placebo groups
HSA and VAS scaled differently for each subject
Psychological testing data to analyze
Non-random missing hourly quantitations
Entire pre and/or post assessments missing for 4 subjects
Brief Overview of Literature
Adjustment for data loss from pretest to posttest (Becker and Walstad, 1990)
Adjustments under non-random missingness for binomial data (Choi and Stablein, 1988)
Selection–regression effect (Maltz et al., 1980)
Non-ignorable dropout in longitudinal data (Hogan et al., 2004)
Multiple imputation (Rubin)
Multivariate regression analysis with missing values in the response variables (Tang et al., 2003)
Problem: Missing Observations
Data are missing due to severity of illness, equipment malfunctions, meal times, sleep times, etc.
Large chunks of the data are missing
First approach: fill in values with the mean or the last observation carried forward (LOCF)
Run the mixed-effects model with the filled-in values
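Both fill-in rules are short to state in code. This is a minimal sketch, assuming each subject's hourly series for one assessment is a Python list with None marking a missing hour, and that the mean is taken within that series (the slides do not say whether the mean is per subject or pooled). The function names are illustrative.

```python
from statistics import mean

def mean_fill(series):
    """Replace None entries with the mean of the observed values in the series."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    m = mean(observed)
    return [m if v is None else v for v in series]

def locf_fill(series):
    """Last observation carried forward: each None takes the most recent
    observed value. Leading Nones stay None (nothing to carry forward)."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last if v is None else v)
    return filled

hourly = [12.0, None, None, 30.0, 18.0, None]
print(mean_fill(hourly))  # [12.0, 20.0, 20.0, 30.0, 18.0, 20.0]
print(locf_fill(hourly))  # [12.0, 12.0, 12.0, 30.0, 18.0, 18.0]
```

Both rules plug in a single value per missing hour, so the filled series carry no extra uncertainty for the hours that were never observed; mean filling also pulls the series toward its centre, while LOCF repeats whatever level preceded a gap.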
Results: Mean-Filled Values
Significant effect: Treatment (p < 0.0001)
Effect          Group  Treat  Data          Estimate  Std. Error  Pr > |t|
Group           Gab           Mean-filled      77.25       19.39     0.002
Group           Pbo           Mean-filled      30.94       23.70     0.214
Group × Treat   Gab    Post   Mean-filled      65.63       19.80    0.0009
Group × Treat   Gab    Pre    Mean-filled      88.88       19.65  < 0.0001
Group × Treat   Pbo    Post   Mean-filled      24.14       24.14    0.6398
Group × Treat   Pbo    Post   Original          7.44       24.19    0.7585
Group × Treat   Pbo    Pre    Mean-filled      23.92       23.91    0.0346
Group × Treat   Pbo    Pre    Original         45.58       23.78    0.0556
Results: LOCF-Filled Values
Significant effect: Group-by-Treatment interaction (p < 0.0001)
Effect          Group  Treat  Data          Estimate  Std. Error  Pr > |t|
Treat                  Post   LOCF-filled      49.17       11.68  < 0.0001
Treat                  Post   Original         37.03       16.29     0.016
Treat                  Pre    LOCF-filled      57.15       11.46  < 0.0001
Treat                  Pre    Original         62.29       15.23  < 0.0001
Group × Treat   Gab    Post   LOCF-filled      80.76       14.84  < 0.0001
Group × Treat   Gab    Post   Original         67.16       19.56    0.0006
Group × Treat   Gab    Pre    LOCF-filled      56.85       14.60    0.0001
Group × Treat   Gab    Pre    Original         79.00       19.04  < 0.0001
Problem: Missing Quantitations
Pre or post assessments not available for 4 subjects
Most missing-data models assume ignorable missingness
Replace the missing pre/post assessment with that of a “like” individual, plus a random perturbation
Use the variance from the extant quantitation for the perturbation
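The replacement rule might look like this. It is a sketch under stated assumptions: the slides leave the definition of a “like” individual open (they list it as a remaining issue), so the closest-mean donor rule and the function name here are hypothetical; the perturbation is Gaussian with variance equal to the sample variance of the subject's extant quantitation.

```python
import random
from statistics import mean, variance

def impute_from_like_subject(target_extant, donors, rng):
    """Fill a subject's missing assessment from a 'like' donor plus noise.

    target_extant: hourly values from the assessment the subject completed.
    donors: list of (extant, other) pairs for complete subjects, where
        `extant` is the same assessment type as target_extant and `other`
        is the assessment the target subject is missing.
    Returns (imputed values, index of the donor used).
    """
    target_mean = mean(target_extant)
    # Hypothetical similarity rule: the donor whose extant-assessment mean is closest.
    best = min(range(len(donors)),
               key=lambda k: abs(mean(donors[k][0]) - target_mean))
    # Perturbation SD taken from the variance of the extant quantitation.
    sd = variance(target_extant) ** 0.5
    imputed = [v + rng.gauss(0.0, sd) for v in donors[best][1]]
    return imputed, best

donors = [([50.0, 52.0, 54.0], [40.0, 41.0, 42.0]),
          ([11.0, 13.0, 15.0], [9.0, 10.0, 11.0])]
values, used = impute_from_like_subject([10.0, 12.0, 14.0], donors, random.Random(0))
print(used)  # 1: donor means are 52 and 13; 13 is closest to the target mean of 12
```

Other similarity rules (same group, matched covariates, nearest neighbour on several features) would slot into the `key` function without changing the rest.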
A Simple Simulation
Pretest/Posttest Study with one normally distributed random variable (σ² = 1)
Remove 10, 20, or 30 percent of posttest values at random
Replace with randomly perturbed pre-test values
                 N = 10                  N = 30
% Missing    10%    20%    30%      10%    20%    30%
µd = 0       0.050  0.056  0.091    0.051  0.054  0.062
µd = 2       0.938  0.839  0.712    1      1      0.999
µd = 5       1      0.999  0.987    1      1      1
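A Monte Carlo of this kind can be sketched as follows. This is a minimal sketch, not the original simulation code: it assumes independent N(0, 1) pretest and N(mu_d, 1) posttest values, a paired t-test at the 5% level, N(0, 1) perturbation noise, and reads each table entry as the proportion of replications rejecting H0 (the near-0.05 entries at µd = 0 suggest this reading). The 5% critical values for df = 9 and 29 are hard-coded from standard t tables because the standard library has no t distribution. Its output will not reproduce the table exactly.

```python
import random
from statistics import mean, stdev

# Two-sided 5% critical values of Student's t, keyed by df = n - 1 (standard table values).
T_CRIT = {9: 2.262, 29: 2.045}

def paired_t_rejects(pre, post):
    """Paired t-test of H0: mean(post - pre) = 0 at the 5% level."""
    d = [b - a for a, b in zip(pre, post)]
    n = len(d)
    t = mean(d) / (stdev(d) / n ** 0.5)
    return abs(t) > T_CRIT[n - 1]

def rejection_rate(n, mu_d, frac_missing, reps=1000, seed=0):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        pre = [rng.gauss(0.0, 1.0) for _ in range(n)]     # sigma^2 = 1
        post = [rng.gauss(mu_d, 1.0) for _ in range(n)]   # shifted by mu_d
        # Delete a random subset of posttest values and replace each with
        # the subject's pre-test value plus N(0, 1) noise (assumed perturbation).
        for i in rng.sample(range(n), round(frac_missing * n)):
            post[i] = pre[i] + rng.gauss(0.0, 1.0)
        rejections += paired_t_rejects(pre, post)
    return rejections / reps

for mu_d in (0, 2, 5):
    print(mu_d, [rejection_rate(n, mu_d, f) for n in (10, 30) for f in (0.1, 0.2, 0.3)])
```

Each replaced posttest value contributes a difference with mean zero, which is why power falls as the missing fraction grows while the rejection rate under µd = 0 stays near the nominal level.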
A Slightly More Realistic Simulation
Pretest/Posttest Study with one normally distributed random variable
Remove 30% or 50% of successive observations
Replace with randomly perturbed pre-test values
                 N = 10              N = 30
% Missing    30%    50%      10%    30%    50%
µd = 0       0.052  0.053    0.050  0.050  0.051
µd = 2       0.662  0.341    0.999  0.995  0.904
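One reading of “successive observations” is a single contiguous run of posttest values. The sketch below assumes that reading (run length round(fraction × N), start position chosen uniformly where the run fits) and N(0, sd²) perturbation noise; the function names are hypothetical. It only changes the missingness step, so it can stand in for random-subset deletion inside a paired-t Monte Carlo loop.

```python
import random

def successive_block(n, frac_missing, rng):
    """Indices of one contiguous run of round(frac_missing * n) posttest values,
    starting at a uniformly chosen position where the run fits."""
    k = round(frac_missing * n)
    start = rng.randrange(n - k + 1)
    return list(range(start, start + k))

def replace_successive(pre, post, frac_missing, rng, sd=1.0):
    """Return a copy of `post` whose contiguous missing run is filled with the
    matching pre-test values plus N(0, sd^2) noise."""
    filled = list(post)
    for i in successive_block(len(post), frac_missing, rng):
        filled[i] = pre[i] + rng.gauss(0.0, sd)
    return filled

rng = random.Random(4)
print(successive_block(10, 0.5, rng))  # five consecutive indices
```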
Remaining Issues
Choosing “like” individuals for replacement values
Variance of random perturbation
Generating data substitutions from models
The previous scenarios are simple but unrealistic
Simulate pre/post data with n subjects and t time points per subject per measurement, where observations are white noise.
Simulate pre/post data from an AR(1) model with sequential values missing from the posttest observations.
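The AR(1) scenario could be generated along these lines. A minimal sketch: the stationary starting value, the parameter values, and the single run of missing posttest values per subject are illustrative assumptions, and the function names are hypothetical. Setting phi = 0 reduces each series to white noise, which covers the first scenario as well.

```python
import random

def ar1_series(t, phi, mu, sigma, rng):
    """Stationary Gaussian AR(1): x_k - mu = phi * (x_{k-1} - mu) + e_k,
    e_k ~ N(0, sigma^2), with |phi| < 1. phi = 0 gives white noise."""
    if not -1.0 < phi < 1.0:
        raise ValueError("|phi| must be < 1 for a stationary AR(1)")
    # Start from the stationary distribution N(mu, sigma^2 / (1 - phi^2)).
    x = mu + rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    series = [x]
    for _ in range(t - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma)
        series.append(x)
    return series

def simulate_pre_post(n, t, phi, mu_d, run_length, seed=0, sigma=1.0):
    """n subjects, t time points per assessment; each posttest series loses one
    run of `run_length` sequential observations (marked None)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        pre = ar1_series(t, phi, 0.0, sigma, rng)
        post = ar1_series(t, phi, mu_d, sigma, rng)
        start = rng.randrange(t - run_length + 1)
        post[start:start + run_length] = [None] * run_length
        data.append((pre, post))
    return data

data = simulate_pre_post(n=10, t=24, phi=0.6, mu_d=2.0, run_length=8)
```

With phi > 0, neighbouring hours are positively correlated, so a missing run removes less independent information than the same number of scattered missing hours would.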
A Priori Difference in Groups
Reassign subjects to groups at random, regardless of true assignment
Calculate a two-sample t-test for each assignment
1000 replications of 10000 assignments
Results: percentage of p-values < 0.05
Data              Min    Median  Max
Original          2.97   3.35    4.2
Mean replacement  0.07   0.21    0.38
LOCF replacement  11.6   12.5    13.4
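The reassignment check can be sketched as a randomization experiment. Assumptions: 15 subjects split into two groups (so the pooled two-sample t-test has 15 − 2 = 13 df and a hard-coded 5% critical value of 2.160), one summary value per subject, and made-up demo values; the slides do not say which outcome was tested. The defaults mirror 1000 replications of 10000 assignments, which is slow in pure Python, so the demo call uses fewer.

```python
import random
from statistics import median

# Two-sided 5% critical value of Student's t with 13 df (15 subjects, two groups).
T_CRIT_13 = 2.160

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)
    return (mx - my) / (sp2 * (1.0 / nx + 1.0 / ny)) ** 0.5

def percent_significant(values, n_first, n_assign, rng):
    """Percentage of random group reassignments whose t-test has p < 0.05."""
    hits = 0
    for _ in range(n_assign):
        shuffled = rng.sample(values, len(values))  # random relabelling of subjects
        if abs(pooled_t(shuffled[:n_first], shuffled[n_first:])) > T_CRIT_13:
            hits += 1
    return 100.0 * hits / n_assign

def reassignment_summary(values, n_first=7, n_rep=1000, n_assign=10000, seed=0):
    """Min, median and max over n_rep replications of the percentage significant."""
    if len(values) != 15:
        raise ValueError("T_CRIT_13 assumes 15 subjects")
    rng = random.Random(seed)
    pcts = [percent_significant(values, n_first, n_assign, rng) for _ in range(n_rep)]
    return min(pcts), median(pcts), max(pcts)

# Made-up per-subject values, for illustration only.
demo = [float(v) for v in (3, 8, 15, 22, 24, 31, 35, 40, 47, 52, 60, 66, 71, 85, 97)]
print(reassignment_summary(demo, n_rep=20, n_assign=500))
```

For well-behaved data the percentages should sit near 5, so a fill-in method that pushes them far above or below that level is distorting the between-group comparison.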
References
1. Becker, W. E. and Walstad, W. B. (1990). Data Loss from Pretest to Posttest as a Sample Selection Problem. The Review of Economics and Statistics, 72, 184-188.
2. Choi, S. C. and Stablein, D. M. (1988). Comparing Incomplete Paired Binomial Data Under Non-Random Mechanisms. Statistics in Medicine, 7, 929-939.
3. Hogan, J., Lin, X., and Herman, B. (2004). Mixtures of Varying Coefficient Models for Longitudinal Data with Discrete or Continuous Nonignorable Dropout. Biometrics, 60, 854-864.
4. Maltz, M. D., Gordon, A. C., McDowall, D., and McCleary, R. (1980). An Artifact in Pretest-Posttest Designs: How it Can Mistakenly Make Delinquency Programs Look Effective. Evaluation Review, 4, 225-240.
5. Tang, G., Little, R. J. A., and Raghunathan, T. E. (2003). Analysis of Multivariate Missing Data with Nonignorable Nonresponse. Biometrika, 90, 747-764.