Comparative Effectiveness of Treatments in Large Healthcare Databases
The Value of High-Dimensional Propensity Score Approaches
Sebastian Schneeweiss, MD, ScD Professor of Medicine and Epidemiology
Division of Pharmacoepidemiology and Pharmacoeconomics, Dept of Medicine, Brigham & Women’s Hospital/ Harvard Medical School
Effectiveness Research with Healthcare Databases
• Reduce bias
 - Analyses that support causal interpretations
• Reduce investigator error
• Increase meaningfulness for decision making
 - Analyses that run in near real-time as data refresh
 - Analyses that produce absolute effect sizes
 - Analyses that are representative of routine care outcomes
 - Analyses that can be reproduced by others
Dealing with Confounding
Schneeweiss, PDS 2006
Confounding
  Measured Confounders
    Design
      • Restriction
      • Matching
    Analysis
      • Standardization
      • Stratification
      • Regression
    Propensity scores
      • Marginal Structural Models
  Unmeasured Confounders
    Unmeasured, but measurable in substudy
      • 2-stage sampling
      • External adjustment
      • Imputation
    Unmeasurable
      Design
        • Cross-over
        • Active comparator (restriction)
      Analysis
        • Instrumental variable
        • Proxy analysis
        • Sensitivity analysis
Claims data describe the sociology of health care and its recording practice in light of economic interests
Secondary Healthcare Databases
Schneeweiss J Clin Epi 2005
From healthcare encounters to data and analyzable databases
[Figure: healthcare encounters (visits with a Dx of Afib, hospitalization for stroke, pharmacy dispensings of warfarin or rivaroxaban) generate records in three files (Pharmacy, Medical Services, Hospitalization), each with the fields ID, Date, and Service; the files are combined into a linked database.]
---------- ID=********** dob=**/**/1948 sex=M eligdt=1/2000 indexdt=6/2001 ----------
Service   Site of                    ________Drug or Procedure________         ______Diagnosis______
Date      Service   Prov Type        Code   Description                     *  Code   Description
-----------------------------------------------------------------------------------------------------
10/01/00  OFFICE    Family Practice  90658  INFLUENZA VIRUS VACC/SPLIT         V048   VACC FOR INFLUEN
10/01/00  Rx        Pharmacy                CIPROFLOXACIN 500MG TABLETS 10
11/05/00  OFFICE    Family Practice  17110  DESTRUCT OF FLAT WARTS, UP         0781   VIRAL WARTS
11/07/00  Rx        Pharmacy                CIPROFLOXACIN 500MG TABLETS 10
01/15/01  Rx        Pharmacy                CIPROFLOXACIN 500MG TABLETS 10
06/25/01  OFFICE    Emerg Clinic     99070  SPECIAL SUPPLIES                *  84509  SPRAIN OF ANKLE
                                                                               E927   ACC OVEREXERTION
06/30/01  OFFICE    Orthopedist      99204  OV,NEW PT.,DETAILED H&P,LOW     *  72767  RUPT ACHILL TEND
06/30/01  OFFICE    Internist/Gener  99202  OV,NEW PT.,EXPD.PROB-FOCSD      *  84509  SPRAIN OF ANKLE
          OUTPT HP  Anesthesiologis  01472  REPAIR OF RUPTURED ACHILLES     *  84509  SPRAIN OF ANKLE
                    Hospital         27650  REPAIR ACHILLES TENDON          *  84509  SPRAIN OF ANKLE
                                     85018  BLOOD COUNT; HEMOGLOBIN         *  84509  SPRAIN OF ANKLE
                    Orthopedist      27650  REPAIR ACHILLES TENDON          *  84509  SPRAIN OF ANKLE
06/30/01  OFFICE    Orthopedist      29405  APPLY SHORT LEG CAST            *  72767  RUPT ACHILL TEND
07/30/01  OFFICE    Orthopedist      29405  APPLY SHORT LEG CAST            *  72767  RUPT ACHILL TEND
08/13/01  OFFICE    Orthopedist      L2116  AFO TIBIAL FRACTURE RIGID       *  72767  RUPT ACHILL TEND
Longitudinal insurance claims databases
Minimal Components for Causal Interpretations:*
1) Temporality
2) Baseline health (confounders)
3) Exposures
4) Outcomes
Data sources for each component, ordered along the time axis:
Health status: Claims data (in- and outpatient Dx); EHR data (clinical parameters, lifestyle, QoL); Registry data (PRO)
Exposure: Claims data (drug dispensing); EHR data (prescribing details); Registry data (device ID #)
Outcomes: Claims data (hospitalization for MI via ICD-9 codes); EHR data (functional status via natural language processing); Registry data (PRO)
*Sir Austin Bradford Hill. Proc Roy Soc Med 1965;58:295-300
Why we like propensity score matching when working with healthcare databases:
Propensity scores:
• Many exposed patients
• Few outcomes
• Many covariates
Matching:
• Transparency in the achieved balance
• Trimming of subjects that cannot be matched (areas of no support)
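The two properties of matching listed above can be illustrated with a minimal sketch, assuming propensity scores have already been estimated. The function names, the greedy nearest-neighbor rule, and the caliper of 0.05 are illustrative assumptions, not a procedure taken from the slides.

```python
import math


def greedy_caliper_match(ps_treated, ps_comparator, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    ps_treated, ps_comparator: dicts mapping patient id -> propensity score.
    Returns (pairs, unmatched_treated). Treated patients with no comparator
    inside the caliper are trimmed (areas of no support).
    """
    available = dict(ps_comparator)
    pairs, unmatched = [], []
    for t_id, t_ps in sorted(ps_treated.items(), key=lambda kv: kv[1]):
        if not available:
            unmatched.append(t_id)
            continue
        # Closest remaining comparator on the propensity score scale.
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
        else:
            unmatched.append(t_id)
    return pairs, unmatched


def standardized_difference(x_treated, x_comparator):
    """Standardized difference for a binary covariate (0/1 values)."""
    p1 = sum(x_treated) / len(x_treated)
    p0 = sum(x_comparator) / len(x_comparator)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return 0.0 if pooled_sd == 0 else (p1 - p0) / pooled_sd
```

Computing the standardized difference for each covariate before and after matching is one common way to make the achieved balance visible; the list of unmatched patients shows who was trimmed.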
Primary data collection
• Precisely identified covariates
• Well-defined measurement
• A small number of selected covariates
• Known constructs of covariates
Secondary use of data
• No control of covariate measurement
• Large numbers of covariates can be generated
Unobservable confounding and proxy measures
[Figure: causal diagram with the nodes E (Exposure), C, U, and Y (Outcome).]
E = exposure; Y = outcome of interest; C = observable confounder (serves as a proxy); U = unobservable confounder
Unobserved confounder | Observable proxy | Coding
Very frail health | Use of oxygen canister | CPT-4:
Acutely sick but not that bad off | Receiving a code for hypertension during a hospital stay | ICD-9:
Health seeking behavior | Regular check-up visit; regular screening exams | ICD-9, CPT-4, # GP visits
Fairly healthy senior | Receiving the first lipid-lowering medication at age 70 | NDC
Chronically sick | Regular visits with specialist, hospitalization; many prescription drugs | # specialist visits, NDC
Longitudinal insurance claims databases
Longitudinal patterns of codes of any type (Dx, Px, Rx, Lx etc.) are proxies of disease activity, severity and general health state.
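As a rough illustration of how such code patterns become analyzable covariates, the sketch below counts each code recorded in a baseline window before the index date. The records are a subset of the sample patient listing; the tuple layout, the function name, the 365-day lookback, and the choice of the 1st of the month for the index date (the listing gives only 6/2001) are assumptions made for this example.

```python
from collections import Counter
from datetime import date

# (service date, code type, code), taken from the sample patient record.
claims = [
    (date(2000, 10, 1), "Px", "90658"),
    (date(2000, 10, 1), "Rx", "CIPROFLOXACIN"),
    (date(2000, 11, 5), "Px", "17110"),
    (date(2000, 11, 5), "Dx", "0781"),
    (date(2000, 11, 7), "Rx", "CIPROFLOXACIN"),
    (date(2001, 1, 15), "Rx", "CIPROFLOXACIN"),
    (date(2001, 6, 30), "Dx", "72767"),
]


def baseline_code_counts(claims, index_date, lookback_days=365):
    """Count occurrences of each (code type, code) in the baseline window.

    Only records strictly before the index date and within the lookback
    window are counted, so follow-up information cannot leak into baseline.
    """
    counts = Counter()
    for service_date, code_type, code in claims:
        days_before = (index_date - service_date).days
        if 0 < days_before <= lookback_days:
            counts[(code_type, code)] += 1
    return counts


counts = baseline_code_counts(claims, index_date=date(2001, 6, 1))
```

Here the three ciprofloxacin dispensings are counted as a baseline covariate, while the Achilles tendon rupture diagnosis on 06/30/01 falls after the assumed index date and is excluded.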
Three main data dimensions (structured health data)
Data domains
• Inpatient diagnoses *
• Outpatient diagnoses *
• Inpatient procedures **
• Outpatient procedures **
• Medication dispensings ***
• Lab test results
• Unstructured text notes
Frequency/intensity
• Once
• Sporadic
• Frequent
Temporality
• Proximal to exposure
• Evenly distributed
• Distal to exposure start
Standard coding examples: * ICD: International Classification of Diseases; ** CPT: Current Procedural Terminology; *** NDC: National Drug Code, ATC: Anatomical Therapeutic Chemical classification
Schneeweiss et al. 2009, Rassen et al. 2011
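Confounding frequency and temporality patterns
[Figure: timeline with a covariate assessment period before the start of drug exposure and a follow-up period after it. Within the covariate assessment period, codes are characterized by frequency (sporadic, frequent) and by temporality pattern (even, distal, proximal).]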
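The frequency dimension can be sketched as follows, assuming the convention described for hdPS in Schneeweiss et al. 2009: each code yields three binary covariates (recorded at least once, at least the median number of times, at least the 75th percentile), with cut-offs taken among patients who have the code at all. The nearest-rank percentile and the function names are simplifying assumptions for this example.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (a simplifying choice)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def frequency_indicators(code_counts):
    """Turn per-patient counts of one code into three binary covariates.

    code_counts: dict patient id -> number of baseline occurrences (0 allowed).
    Returns dict patient id -> (once, sporadic, frequent).
    """
    positive = [n for n in code_counts.values() if n > 0]
    if not positive:
        return {pid: (0, 0, 0) for pid in code_counts}
    median = nearest_rank_percentile(positive, 50)
    p75 = nearest_rank_percentile(positive, 75)
    return {
        pid: (int(n >= 1), int(n >= median), int(n >= p75))
        for pid, n in code_counts.items()
    }


indicators = frequency_indicators({"p1": 0, "p2": 1, "p3": 2, "p4": 4, "p5": 8})
```

When the median equals 1, the first two indicators coincide; a real implementation would likely drop such duplicate covariates.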
Data adaptive adjustment using hdPS
[Figure: hdPS workflow from data sources through covariate generation, covariate prioritization, and estimation.]
Data sources: in-hospital Dx, in-hospital Px, outpatient Dx, outpatient Px, medications, nursing home Dx, lab results, structured EMR, unstructured EMR, HS intensity, age, sex, race, time; NLP/imputation
Covariate generation: prevalence of factors; frequency, temporal clustering
Covariate prioritization: basic covariate prioritization re confounding; interactions; boost through DRS machine learning
Estimation: PS estimation followed by matching, stratification; target parameter estimation for causal inference
e.g. top 200 most prevalent features
e.g. top predictors for outcomes and exposure (=bias prioritization)
Better confounding adjustment by identifying outcome predictors
Schneeweiss et al. 2009
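Bias prioritization can be sketched with the Bross bias multiplier, which, as I understand the published hdPS description (Schneeweiss et al. 2009), is used to rank candidate covariates by their potential for confounding. The function names and the dictionary layout are assumptions for this example, and the inputs (covariate prevalence among exposed and unexposed, covariate-outcome relative risk) are assumed to have been computed already.

```python
import math


def bross_bias_multiplier(p_c1, p_c0, rr_cd):
    """Multiplicative bias from leaving one binary covariate unadjusted.

    p_c1: prevalence of the covariate among the exposed
    p_c0: prevalence of the covariate among the unexposed
    rr_cd: covariate-outcome relative risk (must be > 0)
    """
    if rr_cd < 1:
        rr_cd = 1 / rr_cd  # treat protective associations symmetrically
    return (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)


def prioritize(covariates, top_k):
    """Return the names of the top_k covariates ranked by |log(bias)|."""
    ranked = sorted(
        covariates,
        key=lambda c: abs(
            math.log(bross_bias_multiplier(c["p_c1"], c["p_c0"], c["rr_cd"]))
        ),
        reverse=True,
    )
    return [c["name"] for c in ranked[:top_k]]


# Hypothetical candidate covariates, for illustration only.
candidates = [
    {"name": "A", "p_c1": 0.4, "p_c0": 0.2, "rr_cd": 2.0},
    {"name": "B", "p_c1": 0.3, "p_c0": 0.3, "rr_cd": 5.0},
    {"name": "C", "p_c1": 0.1, "p_c0": 0.5, "rr_cd": 3.0},
]
selected = prioritize(candidates, top_k=2)
```

Covariate B is a strong outcome predictor but is balanced between exposure groups, so its multiplier is 1 and it ranks last under this criterion.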
Performance in empirical database studies
[Figure: log(relative risk), axis from -0.60 to 0.60, across adjustment levels (Unadjust; asyr adjust; + spec. covars; + hd-PS adjust; only hd-PS) for seven drug-outcome pairs: Clopidogrel - MI (a); Statin - death (b); Neurontin - suicide (c); Coxib - UGB US Mcare (d); TCA - suicide (e); Coxib - UGB De (f); Coxib - UGB US comm. (g).]
hdPS is data source independent
Claims databases: U.S. Medicare, U.S. commercial, Canada, Germany
EHR databases: United Kingdom, Regenstrief
(a) Rassen JA, et al. Cardiovascular outcomes and mortality in patients using clopidogrel with proton pump inhibitors after percutaneous coronary intervention. Circulation 2009;120:2322-9.
(b + d) Schneeweiss S, et al. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 2009;20:512-22.
(c) Patorno E, et al. Anticonvulsant medications and the risk of suicide, attempted suicide, or violent death. JAMA 2010;303:1401-9.
(e) Schneeweiss S, et al. The comparative safety of antidepressant agents in children regarding suicidal acts. Pediatrics 2010;125:876-88.
(f) Garbe E, et al. High-dimensional versus conventional propensity scores in a comparative effectiveness study of coxibs and reduced upper gastrointestinal complications. Eur J Clin Pharmacol. 2012 Jul 5.
(g) Le, et al. Effects of aggregation of drug and diagnostic codes on the performance of the hdPS algorithm. BMC Med Res Methodology 2013;13:142.
Plasmode simulation studies confirmed excellent performance in real-world data
Plasmode simulations inject a defined causal effect of E on Y|C in a given healthcare database, preserving the underlying data structure and information content.
Franklin et al. Comp Stat Data Analysis 2013
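The idea can be sketched in a few lines, under simplifying assumptions: patients are resampled with replacement so that the observed joint distribution of exposure and covariates is preserved, and only the outcome is simulated from a logistic model with a chosen ("injected") exposure effect. In a real plasmode study the intercept and covariate coefficients would come from an outcome model fitted to the database; here they are hypothetical inputs, and all names are illustrative.

```python
import math
import random


def plasmode_sample(cohort, intercept, covariate_betas, true_log_or, n, seed=0):
    """Simulate outcomes with a known exposure effect on resampled real rows.

    cohort: list of (exposure, covariates) with exposure in {0, 1} and
            covariates a tuple of numbers, as observed in the database.
    true_log_or: the injected log odds ratio of exposure on the outcome.
    Returns a list of (exposure, covariates, simulated_outcome).
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n):
        # Resample whole rows: exposure-covariate structure stays intact.
        exposure, covariates = rng.choice(cohort)
        logit = intercept + true_log_or * exposure + sum(
            beta * value for beta, value in zip(covariate_betas, covariates)
        )
        p_outcome = 1.0 / (1.0 + math.exp(-logit))
        outcome = int(rng.random() < p_outcome)
        simulated.append((exposure, covariates, outcome))
    return simulated
```

Because the true effect is known by construction, an adjustment method can be judged by how closely its estimate recovers `true_log_or` across many simulated datasets.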
1) Variables that are unrelated to the exposure but related to the outcome should always be included in a PS model.
2) Including variables that are related to the exposure but not to the outcome will increase the variance of the estimated exposure effect without decreasing bias.
3) In small studies, the inclusion of variables that are strongly related to the exposure but only weakly related to the outcome can increase bias.
Schneeweiss et al. Epidemiology 2016 in press
Using Lasso to identify outcome predictors, then feeding them into the PS
Direct effect estimation (outcome model) with Lasso
Why PS? Why not use statistical learning techniques like Lasso for direct outcome estimation in a high-dimensional covariate space?
Franklin et al. AJE 2015
hd-PS small sample performance (simulations)
[Figure: simulation results by number of events: 277, 110, 83, 56, 27.]
Rassen et al. AJE 2011
Use case: Newly marketed medications
- Initially few exposed patients and a handful of events
- Want to know adverse events early on
- Sequential estimation as data refresh
New medications: Can historical data help with covariate identification when there are few exposed subjects?
Kumamaru et al. J Clin Epi 2016 in press
Automatic variable selection: When is it enough?
Anticonvulsants and suicidal action
Patorno et al. Epidemiology 2014
Change in estimate? (Schneeweiss)
Cross-validated outcome prediction via CTMLE? (van der Laan)
Performance of algorithmic EHR word stem adjustment
Rassen et al. 2013
1 Word: leukocytosi; oxycontin; haptic; extracrani; scleral; splenomengali; valium; cardizem; crp
2 Words: site cervix; categori within; specimen categori; peripher edema; maxillari sinus; differenti diagnos; high hpv; film #; comparison prior; see descripti; mildly enlarg; fractur right
3 Words: specimen site cervix; site cervix endocervix; categori within normal; impress ct abdomen; or 3 view; white female a; exam ct abdomen
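Features of this kind can be generated with a short sketch like the one below, which forms 1-, 2-, and 3-word sequences from a note that is assumed to be already tokenized and stemmed. The function names are illustrative, and the stemming step itself is not shown.

```python
def ngrams(stems, n):
    """All contiguous n-word sequences from a list of word stems."""
    return [" ".join(stems[i:i + n]) for i in range(len(stems) - n + 1)]


def ngram_features(stems, max_n=3):
    """Set of 1- to max_n-word features present in one note."""
    features = set()
    for n in range(1, max_n + 1):
        features.update(ngrams(stems, n))
    return features


# Stems taken from the examples listed above.
note = ["specimen", "site", "cervix", "endocervix"]
features = ngram_features(note)
```

Each distinct feature can then be treated like any other code: a binary covariate per patient that enters the same generation and prioritization steps as diagnoses, procedures, and drugs.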
From one-off to ongoing monitoring
Now that we have developed an automated approach to optimized confounding adjustment with healthcare data
[Figure: after Drug A launch (month 0), new users of Drug A and new users of Drug B, each with a baseline and a follow-up period, are identified along the time axis (3, 6, 9, 12 months) and compared after propensity score matching.]
Schneeweiss et al. CPT 2011
Evidence generation as data refresh
A sequential cohort design
[Figure: after Drug A launch (month 0), successive sets of new users of Drug A and new users of Drug B, each with a baseline and a follow-up period, are identified along the time axis (3, 6, 9, 12 months) and pooled into a combined cohort; two sequential cohorts are shown.]
[Figure, continued: the same sequential cohort design with a third set of new users of Drug A and Drug B added to the combined cohort.]
When is a benefit real?
[Figure series: benefit (+/-) of the new drug estimated as successive PS-matched cohorts accrue.]
With one PS-match: 7 per 1,000 person-year benefit of the new drug
With two PS-matches: 9 per 1,000 person-year benefit of the new drug
With three PS-matches: 13 per 1,000 person-year benefit of the new drug
With further PS-matches: ?
Acknowledgement: Dr. Joshua Gagne
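One way to picture the accumulating estimate is the sketch below, which pools events and person-years across data refreshes and reports the cumulative rate difference per 1,000 person-years with a simple Wald 95% interval. The input layout, the function name, and the numbers in the usage example are hypothetical. The interval assumes Poisson-distributed event counts and is not corrected for repeated looks at the data, which a real sequential monitoring plan would need to address.

```python
import math


def cumulative_rate_differences(looks):
    """Cumulative benefit (comparator rate minus new-drug rate) per 1,000 PY.

    looks: one dict per data refresh with events and person-years observed
           in that refresh's PS-matched cohort.
    Returns a list of (rate_difference, lower_95, upper_95), one per look.
    """
    e_new = py_new = e_cmp = py_cmp = 0.0
    results = []
    for look in looks:
        e_new += look["events_new"]
        py_new += look["py_new"]
        e_cmp += look["events_cmp"]
        py_cmp += look["py_cmp"]
        rd = (e_cmp / py_cmp - e_new / py_new) * 1000
        # Wald standard error assuming Poisson-distributed event counts.
        se = math.sqrt(e_cmp / py_cmp**2 + e_new / py_new**2) * 1000
        results.append((rd, rd - 1.96 * se, rd + 1.96 * se))
    return results


# Made-up numbers for illustration; not the data behind the slides.
looks = [
    {"events_new": 10, "py_new": 1000, "events_cmp": 15, "py_cmp": 1000},
    {"events_new": 10, "py_new": 1000, "events_cmp": 20, "py_cmp": 1000},
]
results = cumulative_rate_differences(looks)
```

With these inputs the point estimate moves from about 5 to about 7.5 per 1,000 person-years while the interval narrows, which is the pattern the question "When is a benefit real?" is asking about.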
Sequential approaches using healthcare databases for accelerated approval and adaptive licensing
Eichler et al., Clin Pharmacol Therap 2012; Woodcock J, CPT 2012
Summary
Tremendous possibilities:
• High-dimensional PS as a confounding adjustment strategy tailored towards healthcare databases:
 - Few outcomes
 - Many exposed patients
 - Many proxies of covariates
 - Automated and data adaptive
Practical notes:
• Used by the FDA Sentinel system
• Used by OMOP
• Training and guidelines
• Validated software tools
Not much value outside secondary databases