Date post: | 18-Jan-2018 |
Upload: | frederick-cunningham |
Confounding adjustment: Ideas in Action - a case study
Xiaochun Li, Ph.D., Associate Professor, Division of Biostatistics, Indiana University School of Medicine
Outline
• Description of the data set
• Quantity to be estimated
• Summary of baseline characteristics
• Approaches to data analyses
• Results
• Discussion
Simulation Setup
Lindner Center data described and analyzed in Kereiakes et al. (2000)
• 6-month follow-up data on 996 patients who underwent an initial Percutaneous Coronary Intervention (PCI) and were treated with “usual care” alone or usual care plus a relatively expensive blood thinner (IIb/IIIa cascade blocker)
• Has 10 variables
  Y: 2 outcomes, mort6mo (efficacy) and cardcost (cost)
  X: 1 treatment variable and 7 baseline covariates: stent, height, female, diabetic, acutemi, ejecfrac and ves1proc
Baseline characteristics
stent      coronary stent deployment
female     patient sex
diabetic   diabetes mellitus
acutemi    acute myocardial infarction
ves1proc   number of vessels involved in initial PCI
height     height in centimeters
ejecfrac   left ventricular ejection fraction (%)
The “LSIM10K” dataset
The simulation data set was based on the Lindner Center data
• 17 copies of the clustered Lindner data, with fudge factors added to ejfract and hgt, and some clipping
  same correlation among covariates, same clustering patterns
• Contains the values of 10 simulated variables for 10,325 hypothetical patients
• To simplify analyses, the data contain no missing values
• Details and dataset available from Bob’s website
What do we want to estimate?
The population average treatment effect (ATE), i.e.,
E(Y1) - E(Y0)
Y1 and Y0 are counterfactual outcomes
In plain words: what-if scenarios
The expected response if treatment had been assigned to the entire study population, minus the expected response if control had been assigned to the entire study population
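To make the what-if reading concrete, the sketch below simulates both counterfactual outcomes for every patient in a toy population (something real data never provide) and compares the true ATE with the naive treated-minus-control difference. The confounder, the risk values and the treatment probabilities are invented for illustration; none of them come from LSIM10K.

```python
import random

random.seed(0)

def simulate(n=20000):
    """Return (true_ate, naive_diff) for a toy population with one binary confounder."""
    y1_sum = y0_sum = 0.0
    treated, control = [], []
    for _ in range(n):
        sick = random.random() < 0.3          # hypothetical confounder
        risk0 = 0.10 if sick else 0.02        # assumed death risk under control
        risk1 = 0.05 if sick else 0.01        # assumed death risk under treatment
        u = random.random()
        y0 = 1 if u < risk0 else 0            # counterfactual outcome under control
        y1 = 1 if u < risk1 else 0            # counterfactual outcome under treatment
        y1_sum += y1
        y0_sum += y0
        # sicker patients are more likely to be treated (confounding)
        t = random.random() < (0.8 if sick else 0.3)
        (treated if t else control).append(y1 if t else y0)
    true_ate = y1_sum / n - y0_sum / n        # E(Y1) - E(Y0) over the whole population
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    return true_ate, naive

true_ate, naive = simulate()
print(f"true ATE   = {true_ate:.4f}")
print(f"naive diff = {naive:.4f}")
```

In this toy setup the naive difference understates the benefit of treatment because the treated group contains more high-risk patients, which is the problem the adjustment methods in this deck try to address.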
Baseline covariate balance assessment
Variable      C (Usual care alone)   T (Usual care + Abciximab)   P value
stent         63%                    69%                          <0.001
female        33%                    34%                          0.36
diabetic      23%                    19%                          <0.001
acutemi       7%                     15%                          <0.001
ves1proc      1.4 (±0.6)             1.3 (±0.6)                   <0.001
height (cm)   172.5 (±10)            171.5 (±10)                  <0.001
ejfract       53 (±8)                50 (±10)                     <0.001
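A comparison like the one in the table could be computed along the following lines. This is a minimal sketch: the group sizes and the simulated heights are hypothetical, the p-value uses a large-sample normal approximation, and the standardized difference is an extra summary added here that the slides do not report.

```python
import math
import random
import statistics

def balance(c_values, t_values):
    """Compare one covariate between control (C) and treated (T) groups.

    Returns (mean_c, mean_t, standardized_difference, two_sided_p).
    The p-value comes from a normal approximation to a Welch-type statistic.
    """
    mc, mt = statistics.fmean(c_values), statistics.fmean(t_values)
    vc, vt = statistics.variance(c_values), statistics.variance(t_values)
    std_diff = (mt - mc) / math.sqrt((vc + vt) / 2)
    z = (mt - mc) / math.sqrt(vc / len(c_values) + vt / len(t_values))
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided tail probability
    return mc, mt, std_diff, p

random.seed(1)
# Hypothetical heights in cm; sizes and distributions are made up for illustration.
height_c = [random.gauss(172.5, 10) for _ in range(10000)]
height_t = [random.gauss(171.5, 10) for _ in range(10000)]

mc, mt, d, p = balance(height_c, height_t)
print(f"C mean {mc:.1f}, T mean {mt:.1f}, std. diff {d:.3f}, p = {p:.2g}")
```

With samples this large, a 1 cm difference yields a very small p-value even though the standardized difference is only about a tenth of a standard deviation, so p-values alone say little about the size of an imbalance.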
Visualizing overall imbalance
[Figure: panels C and T; deep blue = high values]
Analytical Methods for confounding adjustment
The following methods were applied to lsim10k
• Outcome regression adjustment (OR)
• Propensity score (PS) stratification
• Inverse-probability-treatment-weighted (IPTW)
• Doubly robust estimation
• Matching by
  Mahalanobis distance
  PS only
ANALYSIS OF MORT6MO
OR model for mort6mo:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract
• Residual deviance: 2410.4 on 10323 degrees of freedom
PS model:
• saturated model for the five categorical covariates (main effects and interaction terms up to fifth order)
• main effects and quadratic terms for height and ejfract
Covariate Balance Evaluations based on PS Quintiles
[Figures: balance within PS quintiles for stent, female, diabetic, acutemi, ves1proc and height; for height, between-group differences in strata 2 (0.95 cm) and 3 (-1.50 cm)]
Height
• Existence of residual confounding after adjusting for PS quintiles
• The within-stratum between-group height difference
              mean     s.d.   p
Stratum 2:    0.949    0.44   0.032
Stratum 3:   -1.497    0.43   0.0005
Ejfract
[Figure: ejfract balance within PS quintiles; between-group differences in strata 1 (0.81), 2 (-1.32) and 3 (-0.72)]
• Existence of residual confounding after adjusting for PS quintiles
• The within-stratum between-group ejfract difference
              mean     s.d.   p-value
Stratum 1:    0.812    0.41   0.0475
Stratum 2:   -1.322    0.33   7.38e-5
Stratum 3:   -0.721    0.32   0.025
PS Stratification
• Residual confounding within strata
• In the PS stratification method, height and ejfract are further adjusted for; the model includes
  stratum-specific treatment effects
  height, ejfract main effects and their quadratic terms
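The basic stratification step can be sketched as follows. This is a simplified illustration, not the model fitted in the slides: it takes plain within-stratum mean differences and omits the further regression adjustment for height and ejfract. The toy data, the single confounder `x` and the known propensity score are all assumptions made for the example.

```python
import random

def ps_stratified_ate(ps, t, y, n_strata=5):
    """Average the within-stratum treated-minus-control mean differences,
    weighting each stratum by its size. Strata are quantile bins of the PS."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    estimate = 0.0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        y1 = [y[i] for i in idx if t[i] == 1]
        y0 = [y[i] for i in idx if t[i] == 0]
        if not y1 or not y0:
            raise ValueError(f"stratum {s + 1} lacks treated or control units")
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        estimate += diff * len(idx) / n
    return estimate

random.seed(2)
# Toy data: x drives both treatment and outcome; the true effect is -2.
ps, t, y = [], [], []
for _ in range(20000):
    x = random.random()
    p = 0.2 + 0.6 * x                  # propensity score, assumed known here
    ti = 1 if random.random() < p else 0
    ps.append(p)
    t.append(ti)
    y.append(10 * x - 2 * ti + random.gauss(0, 1))

n1 = sum(t)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / n1
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - n1))
strat = ps_stratified_ate(ps, t, y)
print(f"naive = {naive:.2f}, PS-stratified = {strat:.2f}, truth = -2.00")
```

Stratifying removes most of the bias of the naive comparison, but a small residual bias typically remains because `x` still varies within each quintile. That is the same phenomenon the slides report for height and ejfract, and it motivates the additional within-stratum adjustment.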
Results – mort6mo
Method               u1      u0      Δ        SE
Outcome Regression   0.010   0.043   -0.032   0.0038
PS strat.            0.012   0.044   -0.033   0.0039
IPTW1                0.011   0.045   -0.034   0.0038
IPTW2                0.011   0.045   -0.034   0.0037
DR                   0.011   0.043   -0.032   0.0037
Match: Mahalanobis   NA      NA      -0.037   0.0044
Match: PS            NA      NA      -0.036   0.0039
Results of all methods are consistent, providing evidence of treatment effectiveness at preventing death at 6 months.
True Δ = -0.036
ANALYSIS OF CARDCOST
cardcost model:
• treatment indicator (trtm)
• main effect terms for all seven covariates
• quadratic terms for both height and ejfract
PS model: same as before
cardcost model of CA with PS stratification:
• stratum-specific treatment effect
• height, ejfract main effects and their quadratic terms
Model checking – OR
Adjusted R-squared: 0.0386
Model checking – OR (log transformed)
Adjusted R-squared: 0.0693
Results – cardcost
Method                u1      u0      Δ      SE
OR: original scale    15308   15300   8      210
OR: log transformed   13536   13702   -166   111
PS strat.             13580   13639   -59    119
IPTW1                 15545   15226   -319   409
IPTW2                 15408   15303   -105   229
DR                    15393   15292   -101   226
Match: Mahalanobis    NA      NA      150    178
Match: PS             NA      NA      -3     215
IPTW 1 vs 2
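The slides do not define IPTW1 and IPTW2. A common pair, and the assumption made in this sketch, is an unnormalized estimator that divides the weighted sums by the sample size and a normalized one that divides by the sum of the weights. The toy data-generating model and the known propensity score are hypothetical.

```python
import random
import statistics

def iptw_unnormalized(ps, t, y):
    """Assumed 'IPTW1': weighted sums divided by the sample size n."""
    n = len(y)
    mu1 = sum(ti * yi / p for p, ti, yi in zip(ps, t, y)) / n
    mu0 = sum((1 - ti) * yi / (1 - p) for p, ti, yi in zip(ps, t, y)) / n
    return mu1 - mu0

def iptw_normalized(ps, t, y):
    """Assumed 'IPTW2': weighted sums divided by the sum of the weights."""
    w1 = [ti / p for p, ti in zip(ps, t)]
    w0 = [(1 - ti) / (1 - p) for p, ti in zip(ps, t)]
    mu1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mu0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mu1 - mu0

def one_trial(rng, n=500):
    """Simulate one toy data set (true effect -2) and return both estimates."""
    ps, t, y = [], [], []
    for _ in range(n):
        x = rng.random()
        p = 0.1 + 0.8 * x                       # propensity score, assumed known
        ti = 1 if rng.random() < p else 0
        ps.append(p)
        t.append(ti)
        y.append(100 + 10 * x - 2 * ti + rng.gauss(0, 1))
    return iptw_unnormalized(ps, t, y), iptw_normalized(ps, t, y)

rng = random.Random(3)
trials = [one_trial(rng) for _ in range(300)]
unnorm = [a for a, _ in trials]
norm = [b for _, b in trials]
print(f"unnormalized: mean {statistics.fmean(unnorm):.2f}, sd {statistics.stdev(unnorm):.2f}")
print(f"normalized:   mean {statistics.fmean(norm):.2f}, sd {statistics.stdev(norm):.2f}")
```

Under these assumptions the unnormalized version is far more variable across repeated samples when the outcome has a large mean, which would be in line with the larger standard error reported for IPTW1 in the cardcost results.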
Discussion
• All methods give consistent results on the 2 outcomes
• All PS-based results have similar variance except IPTW1
• IPTWs depend on an approximately correct PS model
• OR depends on an approximately correct outcome model
• DR is a fortuitous combination of OR and IPTW: it depends on at least one of the two models being right
• Nonparametric models for either model may be an alternative to parametric models
Double Robustness
Method   PS      outcome   Δ      SE
IPTW2    wrong   NA        464    214
DR       wrong   wrong     463    217
DR       wrong   right     166    214
DR       right   wrong     -131   233
• Wrong PS model: adjusts for one covariate, ‘acutemi’, only
• Wrong OR model for cardcost: adjusts for the treatment indicator ‘trtm’ and the ‘acutemi’ covariate
By “right”, we mean approximately right.
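The slides do not state which doubly robust estimator was used. The sketch below assumes the standard augmented IPW form and reproduces the pattern of the table on toy data: the estimate is close to the truth whenever either the PS model or the outcome model is right. The data-generating model, and the choice of "wrong" models that ignore the confounder entirely, are hypothetical.

```python
import random

def aipw(ps, t, y, m1, m0):
    """Augmented IPW estimate of E(Y1) - E(Y0).

    ps: fitted propensity scores; m1, m0: fitted outcome predictions under
    treatment and under control for every unit."""
    n = len(y)
    mu1 = sum(ti * yi / e - (ti - e) / e * a
              for e, ti, yi, a in zip(ps, t, y, m1)) / n
    mu0 = sum((1 - ti) * yi / (1 - e) + (ti - e) / (1 - e) * b
              for e, ti, yi, b in zip(ps, t, y, m0)) / n
    return mu1 - mu0

random.seed(4)
n = 20000
x = [random.random() for _ in range(n)]
true_ps = [0.2 + 0.6 * xi for xi in x]
t = [1 if random.random() < e else 0 for e in true_ps]
y = [10 * xi - 2 * ti + random.gauss(0, 1) for xi, ti in zip(x, t)]  # true effect -2

# "Right" working models use the true functional forms; "wrong" ones ignore x.
right_m1 = [10 * xi - 2 for xi in x]
right_m0 = [10 * xi for xi in x]
n1 = sum(t)
ybar1 = sum(yi for yi, ti in zip(y, t) if ti) / n1
ybar0 = sum(yi for yi, ti in zip(y, t) if not ti) / (n - n1)
wrong_m1, wrong_m0 = [ybar1] * n, [ybar0] * n
wrong_ps = [n1 / n] * n

estimates = {
    ("wrong", "wrong"): aipw(wrong_ps, t, y, wrong_m1, wrong_m0),
    ("wrong", "right"): aipw(wrong_ps, t, y, right_m1, right_m0),
    ("right", "wrong"): aipw(true_ps, t, y, wrong_m1, wrong_m0),
}
for (ps_model, out_model), est in estimates.items():
    print(f"PS {ps_model:5s} outcome {out_model:5s}: {est:6.2f}")
```

When both working models ignore the confounder, this estimator collapses to the naive difference in group means, so no protection is left.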
Propensity score estimation
• The majority of applications in the literature use a parametric logistic regression model that assumes covariates are linear and additive on the log-odds scale
  May include selected interactions and polynomial terms
• Accurate PS estimation is impeded by
  High-dimensional covariates: which ones should we de-confound?
  Unknown functional form: how do they relate to the treatment selection?
• PS model misspecification can substantially bias the estimated treatment effect
• A nonparametric approach, e.g., trees, is flexible enough to accommodate nonlinear/non-additive relationships of covariates to treatment assignment
Nonparametric regression techniques
• Generalized Boosted Models (GBM) to estimate the propensity score function
  Friedman, 2001; Madigan and Ridgeway, 2004; McCaffrey, Ridgeway, and Morral, 2004
  R package: twang
• Regression tree model to predict cardcost
  Ripley, 1996; Therneau and Atkinson, 1997
  R package: rpart
Generalized Boosted Models (GBM)
• A multivariate nonparametric regression technique
• Sum of a large set of simple regression trees modelling the log-odds
  gbm finds the MLE of g(x) = log(p(x)/(1-p(x))), p(x) = P(T=1|x)
• Predicts treatment assignment from a large number of pretreatment covariates, choosing among them adaptively
• Nonlinear
• No need to select variables
• Can model complex interactions
• Invariant to monotone transformations of x
  E.g., same PS estimates whether using age, log(age) or age^2
• Outperforms alternative methods in prediction error
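The invariance property can be illustrated with a much simplified boosting sketch. It is not the algorithm implemented in the gbm or twang R packages: it uses a single covariate, one-split trees and a fixed number of iterations, and the function name, learning rate and toy data are assumptions made for the example.

```python
import math
import random

def fit_boosted_stumps(x, t, n_trees=50, rate=0.1):
    """Boost one-split trees on the log-odds scale; return fitted P(T=1|x) per unit."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    pbar = sum(t) / n
    f = [math.log(pbar / (1 - pbar))] * n            # constant initial log-odds
    for _ in range(n_trees):
        p = [1 / (1 + math.exp(-fi)) for fi in f]
        r = [ti - pi for ti, pi in zip(t, p)]         # gradient of the Bernoulli log-likelihood
        total = sum(r)
        # pick the split position (in sorted order) that best fits the residuals
        best_gain, best_k, left = -1.0, None, 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            if x[order[k - 1]] == x[order[k]]:
                continue                              # cannot split between tied values
            gain = left * left / k + (total - left) ** 2 / (n - k)
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None:
            break                                     # all x values are identical
        for side in (order[:best_k], order[best_k:]):
            # Newton step for the leaf value on the log-odds scale
            step = sum(r[i] for i in side) / sum(p[i] * (1 - p[i]) for i in side)
            for i in side:
                f[i] += rate * step
    return [1 / (1 + math.exp(-fi)) for fi in f]

random.seed(5)
# Hypothetical data: older patients are more likely to be treated.
age = [random.randint(20, 80) for _ in range(300)]
t = [1 if random.random() < 1 / (1 + math.exp(-(a - 50) / 10)) else 0 for a in age]

p_age = fit_boosted_stumps(age, t)
p_log = fit_boosted_stumps([math.log(a) for a in age], t)
p_sq = fit_boosted_stumps([a ** 2 for a in age], t)
print("identical fitted PS:", p_age == p_log == p_sq)
```

Each tree depends on the covariate only through the ordering of its values, so any strictly increasing transformation (such as log or the square of a positive age) leads to the same splits and the same fitted propensity scores.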
Results – cardcost, nonparametric approach
Method                       u1      u0      Δ      SE
DR: parametric models        15393   15292   -101   226
DR: GBM + parametric model   15303   15213   -90    210
DR: GBM + tree               15233   15356   123    172
Future research
• People try quintiles or deciles for propensity score stratification; a data-driven approach (based on the bias-variance tradeoff) is needed for choosing the number of strata
• Model selection: PS model and outcome model
  Nonparametric estimation of models may be intuitive, but the properties of the resulting causal estimates are not clear
• Nonparametric caveat: still need to define a set of “confounders” based on knowledge of the causal relationships among treatment, outcome and covariates, rather than conditioning indiscriminately on all covariates that have associations with treatment and outcome