Some Practical Solutions to Analyzing Messy Data

Monnie McGee

Department of Statistical Science

Southern Methodist University, Dallas, Texas

Co-authored with N. Bergasa (SUNY Downstate Medical Center), I. Ginsburg, and D. Engler (Columbia Presbyterian Medical Center)

ENAR Spring Meeting, March 20-23, 2005 – p.1/16

Gabapentin Study

Protocol called for 15 subjects in pre-post format

Half randomized to receive gabapentin

Main outcomes: Hourly Scratching Activity (HSA) & Visual Analogue Score (VAS)

Two quantitations: Baseline and After 6 weeks

Quantitations required a 48-hour stay in the hospital


Mixed Effects Model Analysis

Split-Plot Design, subjects nested within groups

Fixed Effects: group, treatment, and group by treatment interaction

Random Effects: Subjects nested within group

Covariate: Time of measurement

$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \quad i = 1, \ldots, M$

$y_i$: $n_i$-dimensional response vector

$\beta$: $p$-dimensional vector of fixed effects

$X_i$ and $Z_i$ are known regressor matrices

$b_i \sim N(0, \Sigma)$ and $\varepsilon_i \sim N(0, \sigma^2 I)$
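To make the notation concrete, the sketch below simulates data from a random-intercept special case of the model above, in which $Z_i$ is a column of ones and $b_i$ is a single subject-level shift. The function name, the parameter values, the 0/1 coding of group and treatment, and the alternating group assignment are all illustrative assumptions, not the settings of the gabapentin study.

```python
import random

def simulate_subject(beta, group, n_hours, sd_subject, sd_error, rng):
    """Simulate pre and post hourly series for one subject.

    Random-intercept special case (an assumption of this sketch):
        y = X beta + b + eps,  b ~ N(0, sd_subject^2),  eps ~ N(0, sd_error^2)
    beta = (intercept, group effect, treatment effect, interaction).
    """
    b = rng.gauss(0.0, sd_subject)  # one random effect shared by all rows of this subject
    rows = []
    for treat in (0, 1):  # 0 = pre, 1 = post
        x = (1.0, float(group), float(treat), float(group * treat))  # one row of X_i
        mean_y = sum(xj * bj for xj, bj in zip(x, beta)) + b
        for hour in range(n_hours):
            rows.append({"group": group, "treat": treat, "hour": hour,
                         "y": mean_y + rng.gauss(0.0, sd_error)})
    return rows

rng = random.Random(2005)
beta = (60.0, 10.0, -5.0, -20.0)  # hypothetical fixed effects
data = []
for subject in range(15):
    group = subject % 2  # hypothetical alternating assignment
    for row in simulate_subject(beta, group, 24, 15.0, 10.0, rng):
        row["subject"] = subject
        data.append(row)

print(len(data))  # 15 subjects x 2 assessments x 24 hours = 720 rows
```

Because every row of a subject shares the same draw of b, observations within a subject are correlated, which is the feature the mixed model is meant to capture.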

Mixed Model Results for HSA

With Time Covariate

Effect         Num DF  Den DF  F Value  Pr > F
Time               23     839     0.87  0.6461
Group               1      13     2.50  0.1376
Treat               1     839     7.65  0.0058
Group × Treat       1     839     2.12  0.1461

Without Time Covariate

Effect         Num DF  Den DF  F Value  Pr > F
Group               1      13     2.11  0.1700
Treat               1     846     7.45  0.0065
Group × Treat       1     846     1.34  0.2482


Estimates and Errors

Effect         Group  Treat  Estimate  Error  P-value
Group          Gab              73.08  18.26  0.0015
Group          Pbo              26.51  23.08  0.2713
Treat                 Post      37.39  15.55  0.0167
Treat                 Pre       62.29  15.23  < 0.0001
Group × Treat  Gab    Post      67.16  19.56  0.0006
Group × Treat  Gab    Pre       79.00  19.04  < 0.0001
Group × Treat  Pbo    Post       7.44  24.19  0.7585
Group × Treat  Pbo    Pre       45.58  23.78  0.0556


Issues with the Data

Very small sample size

Disparate beginning times

A priori difference in gabapentin and placebo groups

HSA and VAS scaled differently for each subject

Psychological testing data to analyze

Non-random missing hourly quantitations

Entire pre and/or post assessments missing for 4 subjects


Brief Overview of Literature

Adjustment for data loss from pretest to posttest (Becker and Walstad, 1990)

Adjustments under non-random missingness for binomial data (Choi and Stablein, 1988)

Selection-Regression Effect (Maltz et al., 1980)

Non-ignorable dropout in longitudinal data (Hogan et al., 2004)

Multiple Imputation (Rubin)

Multivariate regression analysis with missing values in the response variables (Tang et al., 2003)


Problem: Missing Observations

Data are missing due to severity of illness, equipment malfunctions, meal times, sleep times, etc.

Large chunks of the data are missing

First Approach: Fill in values with mean or last observation carried forward (LOCF)

Run mixed-effect model with filled-in values
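The two fill-in rules can be sketched as follows. The function names and the use of None to mark a missing hourly value are assumptions of this example. Whether the mean was taken per subject, per assessment, or overall is not stated on the slide, so the sketch simply averages whatever series it is given.

```python
from statistics import mean

def fill_mean(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average; leave the gaps in place
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_locf(values):
    """Replace each None with the most recent observed value (LOCF).

    Leading gaps have no earlier value to carry forward and stay None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

hourly = [30.0, None, None, 70.0, None, 20.0]  # hypothetical hourly series
print(fill_mean(hourly))  # [30.0, 40.0, 40.0, 70.0, 40.0, 20.0]
print(fill_locf(hourly))  # [30.0, 30.0, 30.0, 70.0, 70.0, 20.0]
```

Mean filling pulls every gap toward the center of the series, while LOCF repeats whatever level preceded the gap, so long runs of missing values can shift the filled series in different directions under the two rules.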


Results: Mean-Filled Values

Significant Effect: Treatment (p < 0.0001)

Effect         Group  Treat  Estimate  Error  Pr > |t|
Group          Gab              77.25  19.39  0.002
Group          Pbo              30.94  23.70  0.214
Group × Treat  Gab    Post      65.63  19.80  0.0009
Group × Treat  Gab    Pre       88.88  19.65  < 0.0001
Group × Treat  Pbo    Post      24.14  24.14  0.6398
  (original data)                7.44  24.19  0.7585
Group × Treat  Pbo    Pre       23.92  23.91  0.0346
  (original data)               45.58  23.78  0.0556


Results: LOCF-Filled Values

Significant Effect: Group by Treatment Interaction (p < 0.0001)

Effect         Group  Treat  Estimate  Error  Pr > |t|
Treat                 Post      49.17  11.68  < 0.0001
  (original data)               37.03  16.29  0.016
Treat                 Pre       57.15  11.46  < 0.0001
  (original data)               62.29  15.23  < 0.0001
Group × Treat  Gab    Post      80.76  14.84  < 0.0001
  (original data)               67.16  19.56  0.0006
Group × Treat  Gab    Pre       56.85  14.60  0.0001
  (original data)               79.00  19.04  < 0.0001


Problem: Missing Quantitations

Pre or post assessments not available for 4 subjects

Most Missing Variable Models Assume Ignorable Missingness

Replace missing pre/post assessment with that of a “like” individual with random perturbation

Use variance from extant quantitation for perturbation
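One way to read the replacement rule above is sketched below. How to choose a “like” individual is not specified on the slide, so the matching rule used here (same group, closest mean on the assessment the subject does have) is purely an assumption, as are the function name and the dictionary layout.

```python
import random
from statistics import mean, stdev

def impute_from_like_subject(target, donors, rng):
    """Fill a missing post assessment from a similar donor, plus noise.

    target: dict with keys "group" and "pre" (list of floats); "post" is missing.
    donors: list of dicts with "group" and complete "pre" and "post" lists.
    """
    same_group = [d for d in donors if d["group"] == target["group"]]
    if not same_group:
        raise ValueError("no donor in the same group")
    # "Like" here means: closest mean on the extant (observed) assessment.
    donor = min(same_group,
                key=lambda d: abs(mean(d["pre"]) - mean(target["pre"])))
    # Perturbation scale comes from the target's own extant quantitation.
    sd = stdev(target["pre"])
    return [v + rng.gauss(0.0, sd) for v in donor["post"]]

rng = random.Random(7)
donors = [
    {"group": "Gab", "pre": [80.0, 75.0, 90.0], "post": [60.0, 55.0, 70.0]},
    {"group": "Gab", "pre": [20.0, 25.0, 30.0], "post": [15.0, 10.0, 20.0]},
    {"group": "Pbo", "pre": [78.0, 82.0, 85.0], "post": [77.0, 80.0, 84.0]},
]
target = {"group": "Gab", "pre": [70.0, 85.0, 88.0], "post": None}
print(impute_from_like_subject(target, donors, rng))
```

The same function covers a missing pre assessment if the roles of "pre" and "post" are swapped in the inputs.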


A Simple Simulation

Pretest/Posttest Study with one normally distributed random variable ($\sigma^2 = 1$)

Remove 10, 20, or 30 percent of posttest values at random

Replace with randomly perturbed pre-test values

                    N = 10                  N = 30
% Missing      10%    20%    30%      10%    20%    30%
$\mu_d = 0$    0.050  0.056  0.091    0.051  0.054  0.062
$\mu_d = 2$    0.938  0.839  0.712    1      1      0.999
$\mu_d = 5$    1      0.999  0.987    1      1      1
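A simulation of this kind can be sketched as below. The slide does not give the details, so several choices are assumptions: pretest and posttest values are drawn independently with unit variance, the perturbation is N(0, 1), the test is a two-sided paired t-test at the 5% level, and the function name is hypothetical. The numbers it prints are therefore not expected to reproduce the table.

```python
import math
import random
from statistics import mean, stdev

# Two-sided 5% critical values of Student's t for df = n - 1 (df = 9 and 29).
T_CRIT = {10: 2.262, 30: 2.045}

def rejection_rate(n, mu_d, frac_missing, reps, rng):
    """Share of replications in which the paired t-test rejects mu_d = 0."""
    n_missing = round(frac_missing * n)
    rejections = 0
    for _ in range(reps):
        pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
        post = [rng.gauss(mu_d, 1.0) for _ in range(n)]
        # Remove posttest values at random; substitute perturbed pretest values.
        for i in rng.sample(range(n), n_missing):
            post[i] = pre[i] + rng.gauss(0.0, 1.0)
        diffs = [b - a for a, b in zip(pre, post)]
        t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
        if abs(t) > T_CRIT[n]:
            rejections += 1
    return rejections / reps

rng = random.Random(2005)
for n in (10, 30):
    for frac in (0.1, 0.2, 0.3):
        print(n, frac, rejection_rate(n, 2.0, frac, 2000, rng))
```

With the mean difference set to 0 the function estimates the type I error rate; with a nonzero difference it estimates power. Replaced values carry no treatment effect, so power should fall as the missing fraction grows.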


A Slightly More Realistic Simulation

Remove 30% or 50% of successive observations

                  N = 10            N = 30
% Missing      30%    50%      10%    30%    50%
$\mu_d = 0$    0.052  0.053    0.050  0.050  0.051
$\mu_d = 2$    0.662  0.341    0.999  0.995  0.904


Remaining Issues

Choosing “like” individuals for replacement values

Variance of random perturbation

Generating data substitutions from models

Previous scenarios are simple, but unrealistic

Simulate pre/post data with n subjects and t time points per subject per measurement where observations are white noise.

Simulate pre/post data from AR(1) model with sequential values missing from post test observations.
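The last two proposals can be sketched together, since an AR(1) series with coefficient 0 is white noise. The function names, the coefficient 0.6, the shift of 2 in the post series, and the choice of a single missing block are all assumptions made for illustration.

```python
import random

def ar1_series(n_points, phi, mean_level, sd_innov, rng):
    """Generate a stationary AR(1) series around mean_level (needs |phi| < 1)."""
    # Start from the stationary distribution so no burn-in period is needed.
    sd_stationary = sd_innov / (1.0 - phi ** 2) ** 0.5
    x = rng.gauss(0.0, sd_stationary)
    series = []
    for _ in range(n_points):
        series.append(mean_level + x)
        x = phi * x + rng.gauss(0.0, sd_innov)
    return series

def remove_block(series, frac_missing, rng):
    """Set one run of consecutive observations to None."""
    n_missing = round(frac_missing * len(series))
    start = rng.randrange(len(series) - n_missing + 1)
    return [None if start <= i < start + n_missing else v
            for i, v in enumerate(series)]

rng = random.Random(1)
pre = ar1_series(24, 0.6, 0.0, 1.0, rng)
post = remove_block(ar1_series(24, 0.6, 2.0, 1.0, rng), 0.5, rng)
print(sum(v is None for v in post))  # 12 of the 24 post values removed
```

Setting phi to 0.0 gives the white-noise scenario; the gaps can then be passed to any fill-in rule before the pre/post comparison is run.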


A Priori Difference in Groups

Reassign subjects to groups at random, regardless of true assignment

Calculate two-sample t-tests for each assignment

1000 replications of 10000 assignments

Results: Percentage of P-values < 0.05

Data        Min    Median  Max
Original    2.97   3.35    4.2
Mean Repl   0.07   0.21    0.38
LOCF Repl   11.6   12.5    13.4
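One replication of the reassignment procedure might look like the sketch below. It assumes one summary value per subject, an 8/7 split of the 15 subjects, and a pooled-variance t-test; the slides do not say which of these were actually used. The baseline values are simulated, not study data, and the example runs 2000 assignments rather than 10000 to keep it quick.

```python
import math
import random
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def share_significant(values, n_first, t_crit, n_assignments, rng):
    """Share of random regroupings whose |t| exceeds t_crit."""
    hits = 0
    for _ in range(n_assignments):
        shuffled = rng.sample(values, len(values))  # random reassignment to groups
        if abs(pooled_t(shuffled[:n_first], shuffled[n_first:])) > t_crit:
            hits += 1
    return hits / n_assignments

rng = random.Random(0)
baseline = [rng.gauss(50.0, 20.0) for _ in range(15)]  # hypothetical subject-level values
# 2.160 is the two-sided 5% critical value of Student's t with 13 df.
print(100.0 * share_significant(baseline, 8, 2.160, 2000, rng))
```

Repeating the call many times and recording the minimum, median, and maximum of the printed percentage gives a summary of the same shape as the table above.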


References

1. Becker, W.E. and Walstad, W.B. (1990). Data Loss from Pretest to Posttest as a Sample Selection Problem, The Review of Economics and Statistics, 72, 184-188.

2. Choi, S.C. and Stablein, D.M. (1988). Comparing Incomplete Paired Binomial Data Under Non-Random Mechanisms, Statistics in Medicine, 7, 929-939.

3. Hogan, J., Lin, X., and Herman, B. (2004). Mixtures of Varying Coefficient Models for Longitudinal Data with Discrete or Continuous Nonignorable Dropout, Biometrics, 60, 854-864.

4. Maltz, M.D., Gordon, A.C., McDowall, D., and McCleary, R. (1980). An Artifact in Pretest-Posttest Designs: How it Can Mistakenly Make Delinquency Programs Look Effective, Evaluation Review, 4, 225-240.

5. Tang, G., Little, R.J.A., and Raghunathan, T.E. (2003). Analysis of Multivariate Missing Data with Nonignorable Nonresponse, Biometrika, 90, 747-764.

