Estimating Causal Effects from Large Data Sets Using Propensity Scores Hal V. Barron, MD TICR May 2007
Page 1: Estimating Causal Effects from Large Data Sets Using Propensity Scores Hal V. Barron, MD TICR May 2007.

Estimating Causal Effects from Large Data Sets Using

Propensity Scores

Hal V. Barron, MD

TICR

May 2007


Estimating Causal Effects from Large Data Sets Using Propensity Scores

The aim of many analyses of large databases is to draw causal inferences about the effects of actions, treatments, or interventions

A complication of using large databases to achieve such aims is that their data are almost always observational rather than experimental


Is hospital A better than hospital B in treating metastatic colorectal cancer?


                        Hospital A   Hospital B   p value
Median survival         18.9         11.9         <0.001
Age                     49.2         59.7         <0.001
Age-adjusted survival   16.4         13.9         <0.05


What assumptions are made in modeling age-adjusted survival?


[Figure: RR/OR of death plotted against age, with a fitted trend line]


Ideally we would like to compare patients who are similar with respect to all covariates which are observed to influence the outcome


Estimating Causal Effects from Large Data Sets Using Propensity Scores

Standard methods of analysis using available statistical software (such as linear or logistic regression) can be deceptive for these objectives because they provide no warnings about their propriety

Propensity score methods may be a more reliable tool for addressing such objectives because the assumptions needed to make their answers appropriate are more assessable and transparent to the investigator


Propensity Scores

Propensity score technology essentially reduces the entire collection of background characteristics to a single composite characteristic that appropriately summarizes the collection

Thus, the PS is a device for constructing matched pairs or matched sets or strata that balance numerous observed covariates


Propensity Scores

This reduction from many characteristics to one composite characteristic allows the straightforward assessment of whether the treatment and control groups overlap enough with respect to background characteristics to allow a sensible estimation of treatment versus control effects from the data set

Moreover, when such overlap is present, the propensity score approach allows a straightforward estimation of treatment versus control effects that reflects adjustment for differences in all observed background characteristics
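
The overlap assessment described above can be sketched in a few lines. This is a minimal illustration, not a method from the talk: the function names `common_support` and `share_inside` and the score lists are hypothetical, and the "overlap" rule used here (the interval covered by both groups) is only one simple convention.

```python
def common_support(ps_treated, ps_control):
    """Return (low, high) of the score range covered by both groups, or None."""
    low = max(min(ps_treated), min(ps_control))
    high = min(max(ps_treated), max(ps_control))
    return (low, high) if low <= high else None

def share_inside(scores, support):
    """Fraction of one group's scores that fall inside the common range."""
    if support is None:
        return 0.0
    low, high = support
    return sum(low <= s <= high for s in scores) / len(scores)

# Hypothetical propensity scores, not data from the talk.
ps_a = [0.35, 0.50, 0.62, 0.71, 0.80, 0.91]
ps_b = [0.05, 0.12, 0.20, 0.33, 0.41, 0.55]

support = common_support(ps_a, ps_b)
print("common range:", support)
print("share of A inside:", share_inside(ps_a, support))
print("share of B inside:", share_inside(ps_b, support))
```

When only a small share of each group falls inside the common range, as in this made-up example, a treatment comparison rests on few comparable patients.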


Hospital A vs B

[Figure: frequency distributions of the propensity score for Hospital A and Hospital B]


Questions

Two subjects have the same propensity score. What does this mean?

Do the two subjects have the same age, gender, etc.?

Do their differences help predict which subject is more likely to receive the treatment?


True or False?

If we pair or group subjects with the same PS, then treated and control subjects in these groups will have similar patterns or distributions of each covariate


Background

The PS approach complements model-based procedures and is not a substitute for them (i.e., it is often used in conjunction with regression or log-linear models)


Sub-classification

Table 1. Comparison of Mortality Rates for Three Smoking Groups in Three Databases*

Annals of Internal Medicine, Part 2, 15 October 1997. 127:757-763


Sub-classification

Comparison of Mortality Rates for Three Smoking Groups in Three Databases*

Annals of Internal Medicine, Part 2, 15 October 1997. 127:757-763.


Sub-classification

A particular statistical model, such as a linear regression (or a logistic regression model, or in other settings, a hazard model) could be used to adjust for age, but sub-classification has three distinct advantages


Sub-classification vs MVA

First, if the treatment or exposure groups do not adequately overlap on the confounding covariate age, the investigator will see it immediately and be warned. In contrast, nothing in the standard output of any regression modeling software will display this critical fact
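
A minimal sketch of why sub-classification makes poor overlap visible: counting patients per age subclass and group exposes empty cells at a glance. The age cutpoints, the function names `age_band` and `overlap_table`, and the ages themselves are all hypothetical and are not taken from the talk.

```python
from collections import Counter

BANDS = ["<50", "50-59", "60-69", "70+"]

def age_band(age, cutpoints=(50, 60, 70)):
    """Label the age subclass for one patient (cutpoints are illustrative)."""
    for cut, label in zip(cutpoints, BANDS):
        if age < cut:
            return label
    return BANDS[-1]

def overlap_table(ages_by_group):
    """Count patients per (group, age band) so that empty cells stand out."""
    return {group: Counter(age_band(a) for a in ages)
            for group, ages in ages_by_group.items()}

# Hypothetical ages, not data from the talk.
ages = {
    "Hospital A": [38, 42, 45, 47, 49, 51, 53, 55, 58],
    "Hospital B": [52, 57, 59, 61, 64, 66, 68, 72, 75],
}

table = overlap_table(ages)
for band in BANDS:
    a, b = table["Hospital A"][band], table["Hospital B"][band]
    flag = "  <- no overlap" if min(a, b) == 0 else ""
    print(f"{band:>6}  A={a}  B={b}{flag}")
```

In this made-up example only the 50-59 subclass contains patients from both hospitals, so any comparison outside that band would be extrapolation.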


Sub-classification vs MVA

Second: Sub-classification does not rely on any particular functional form, such as linearity, for the relation between the outcome (death) and the covariate (age) within each treatment group, whereas models do


Sub-classification vs MVA

Third: Small differences in many covariates can accumulate into a substantial overall difference
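
One way to see how small differences accumulate is a back-of-the-envelope calculation under strong simplifying assumptions that are not from the talk: k independent covariates, each with unit variance, each shifted by the same small standardized difference d between groups, and combined with equal weights. The sum then shifts by k*d while its standard deviation is only sqrt(k), so the composite standardized difference is sqrt(k)*d. The function name is hypothetical.

```python
import math

def composite_standardized_difference(per_covariate_diff, n_covariates):
    """Standardized difference of an equal-weight sum of independent,
    unit-variance covariates that each differ by per_covariate_diff."""
    mean_shift = n_covariates * per_covariate_diff   # shifts add up
    sd_of_sum = math.sqrt(n_covariates)              # variances add up
    return mean_shift / sd_of_sum

for k in (1, 4, 25):
    print(k, "covariates ->", composite_standardized_difference(0.1, k))
```

Under these assumptions, 25 covariates that each differ by a seemingly negligible 0.1 standard deviations combine into a difference of about 0.5 standard deviations.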


Sub-classification

If standard models can be so dangerous, why are they commonly used for such adjustments when large databases are examined for estimates of causal effects?


Sub-classification

Which is easier???

How do you deal with multiple confounders??


Propensity Scores

Propensity score techniques are very much like sub-classification techniques with more than one covariate


Is there a benefit to early angiography in patients with ST-segment depression myocardial infarction? An observational study

Background: It remains unclear whether an aggressive treatment approach with very early (<6 hours) angiography and revascularization improves outcome over an early conservative approach. We compared the short-term outcome of patients who received very early (<6 hours) angiography with patients who received early conservative therapy for ST-segment depression MI

Methods: Patients seen within 12 hours with ST-segment depression on the initial electrocardiogram (ECG) were identified from the National Registry of Myocardial Infarction 2 (NRMI) database, which collected information from 1994 to 1998. Those who received very early (<6 hours) angiography were compared with those who received early conservative therapy. The short-term outcomes, including major bleeding episodes, cerebral vascular events, recurrent ischemia and angina, MI, and death, were compared on the basis of the initial therapy received

(Am Heart J 2002;143:488-96)


Results


Hospital outcome in the very early angiography group versus the early conservative therapy group


Clinical factors associated with increased hospital mortality

Very early angiography has an OR of 0.76 (95% CI 0.61-0.95)
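
As a worked reading of the interval quoted above, the standard error of the log odds ratio can be recovered from the confidence limits. This sketch assumes the interval is a symmetric Wald interval on the log scale (the usual output of logistic regression software, but not stated in the source); the function names are hypothetical.

```python
import math

def log_or_standard_error(ci_low, ci_high, z=1.96):
    """Standard error of log(OR), assuming a 95% Wald interval on the log scale."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)

def z_statistic(odds_ratio, ci_low, ci_high):
    """Wald z statistic for the null hypothesis OR = 1."""
    return math.log(odds_ratio) / log_or_standard_error(ci_low, ci_high)

se = log_or_standard_error(0.61, 0.95)
z = z_statistic(0.76, 0.61, 0.95)
print(f"SE of log(OR) = {se:.3f}, z = {z:.2f}")
```

A z statistic beyond 1.96 in absolute value corresponds to an interval that excludes 1, which is why this estimate reads as a statistically significant association.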


Adjustment With Propensity Score

Because of the substantial differences in baseline characteristics between the treatment groups, we used the propensity score method to attempt to find comparable patients treated with each strategy

In the first step, we identified factors that predicted receiving very early angiography. These were age, male sex, white race, history of MI, history of angina, history of CHF, previous PTCA, previous aortocoronary bypass surgery, diabetes mellitus, current smoker, Killip class I, pulse >100 beats/min, systolic blood pressure <=100 mm Hg, admission diagnosis of MI, chest pain at presentation, and transfer from an outside hospital

A stepwise multivariate logistic regression analysis was performed to predict receiving early angiography

Thus, the propensity score represents the probability that a patient will receive very early angiography. A higher score indicates a higher probability of receiving very early angiography. Similarly, patients with the same propensity score have the same probability of receiving very early angiography
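
The idea of a propensity score as a predicted treatment probability can be sketched with a small logistic regression fitted by gradient ascent. This is an illustration only: it is a plain (not stepwise) fit, it uses two made-up covariates rather than the sixteen listed above, and the names `fit_propensity_model` and `propensity_score` and all data values are hypothetical. Note that the outcome (death) never enters the fit.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_propensity_model(rows, treated, steps=2000, rate=0.5):
    """Logistic regression of treatment (0/1) on covariates by gradient ascent.
    Returns [intercept, coefficient_1, coefficient_2, ...]."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, t in zip(rows, treated):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            grad[0] += t - p
            for j, v in enumerate(x):
                grad[j + 1] += (t - p) * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_score(beta, x):
    """Predicted probability of receiving the treatment for covariates x."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Hypothetical patients: [age in decades relative to 65, transferred (0/1)].
rows = [[-2, 1], [-1.5, 0], [-1, 1], [-1, 0], [-0.5, 1], [0, 0],
        [0, 1], [0.5, 0], [1, 0], [1, 1], [1.5, 0], [2, 0]]
treated = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # 1 = very early angiography

beta = fit_propensity_model(rows, treated)
scores = [propensity_score(beta, x) for x in rows]
print([round(s, 2) for s in scores])
```

In practice one would use an established statistics package for the fit; the hand-written loop is here only to keep the example self-contained.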


Adjustment With Propensity Score

The predicted probability of receiving very early angiography (the propensity score) was calculated for each patient

Patients receiving very early angiography (cases) were matched to patients receiving early conservative therapy (controls) on propensity score using the nearest available pair matching method. The 4-digit match resulted in 58% of the cases being matched to controls, yielding 1405 patient matches with similar baseline characteristics

After the matched-pair analysis, the original multivariate logistic regression model to predict hospital death was rerun with the propensity score forced in. OR and 95% CIs were calculated
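
Nearest available pair matching can be sketched as a greedy loop: each case takes the closest still-unused control, and the pair is kept only if the scores are close enough. This is a simplified illustration, not the study's actual procedure; the `caliper` tolerance stands in for the "4-digit" agreement described above, and its value, the function name `greedy_match`, and the scores are assumptions.

```python
def greedy_match(cases, controls, caliper=0.0005):
    """Pair each case with the nearest unused control on propensity score.
    cases, controls: dict mapping patient id -> propensity score.
    Returns a list of (case_id, control_id); unmatched cases are dropped."""
    available = dict(controls)
    pairs = []
    for case_id, score in cases.items():
        if not available:
            break
        best = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((case_id, best))
            del available[best]          # each control is used at most once
    return pairs

# Hypothetical scores, not data from the study.
cases = {"c1": 0.4213, "c2": 0.7300, "c3": 0.1502}
controls = {"k1": 0.4211, "k2": 0.1490, "k3": 0.7302, "k4": 0.4214}

pairs = greedy_match(cases, controls)
print(pairs)
print(f"{len(pairs)} of {len(cases)} cases matched")
```

Cases without a sufficiently close control stay unmatched, which is how a study can end up matching only a fraction of its cases.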


Results

Comparing patients matched on propensity score showed mortality was similar in both treatment groups (5.6% vs 5.4%, P = .87), with no significant in-hospital mortality benefit of very early angiography in an MVA (OR = 0.89; 95% CI 0.71-1.13)


Summary: Propensity Scores

The basic idea of propensity score methods is to replace the collection of confounding covariates in an observational study with one function of these covariates, called the propensity score (that is, the propensity to receive treatment 1 rather than treatment 2). This score is then used just as if it were the only confounding covariate

Thus, the collection of predictors is collapsed into a single predictor

The propensity score is found by predicting treatment group membership (that is, the indicator variable for being in treatment group 1 as opposed to treatment group 2) from the confounding covariates, for example, by a logistic regression or discriminant analysis

In this prediction of treatment group membership, it is critically important that the outcome variable (for example, death) play no role; the prediction of treatment group must involve only the covariates


Summary: Propensity Scores

Each person in the database then has an estimated propensity score, which is the estimated probability (as determined by that person's covariate values) of being exposed to treatment 1 rather than treatment 2. This propensity score is then the single summarized confounding covariate to be used for sub-classification


Summary: Propensity Scores

If two persons, one exposed to treatment 1 and the other exposed to treatment 2, had the same value of the propensity score, these two persons would then have the same predicted probability of being assigned to treatment 1 or treatment 2. Thus, as far as we can tell from the values of the confounding covariates, a coin was tossed to decide who received treatment 1 and who received treatment 2. Now suppose that we have a collection of persons receiving treatment 1 and a collection of persons receiving treatment 2 and that the distributions of the propensity scores are the same in both groups (as is approximately true within each propensity subclass). In subclass 1, the persons who received treatment 1 were essentially chosen randomly from the pool of all persons in subclass 1, and analogously for each subclass

As a result, within each subclass, the multivariate distribution of the covariates used to estimate the propensity score differs only randomly between the two treatment groups
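
The balancing claim above can be checked with a small simulation. The cohort here is entirely made up: age is the only covariate, the true propensity score is known by construction, and younger patients are more likely to be treated. The function names are hypothetical. The sketch compares the treated-versus-control age gap in the whole cohort with the gap inside each of five propensity subclasses.

```python
import math
import random

def simulate_patients(n=20000, seed=0):
    """Hypothetical cohort in which younger patients are treated more often."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        age = rng.uniform(40, 80)
        score = 1.0 / (1.0 + math.exp((age - 60) / 10))  # true propensity
        treated = rng.random() < score                    # the "coin toss"
        patients.append((score, age, treated))
    return patients

def mean_age_gap(patients):
    """Mean age of treated patients minus mean age of controls."""
    treated = [age for _, age, t in patients if t]
    control = [age for _, age, t in patients if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def split_into_subclasses(patients, k=5):
    """Sort by propensity score and cut into k groups of nearly equal size."""
    ranked = sorted(patients)
    n = len(ranked)
    cuts = [round(i * n / k) for i in range(k + 1)]
    return [ranked[lo:hi] for lo, hi in zip(cuts, cuts[1:])]

patients = simulate_patients()
overall_gap = mean_age_gap(patients)
subclass_gaps = [mean_age_gap(s) for s in split_into_subclasses(patients)]

print(f"overall age gap: {overall_gap:.1f} years")
for i, gap in enumerate(subclass_gaps, start=1):
    print(f"subclass {i} age gap: {gap:.1f} years")
```

The within-subclass gaps should come out much smaller than the overall gap, though not exactly zero, because five subclasses still group patients whose scores are only approximately equal.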


Limitations of Propensity Scores

In observational studies, our confidence in causal conclusions is limited

Propensity score methods can only adjust for observed confounding covariates and not for unobserved ones

Propensity score methods work better in larger samples

A final possible limitation of propensity score methods is that a covariate related to treatment assignment but not to outcome is handled the same as a covariate with the same relation to treatment assignment but strongly related to outcome (potential for over-correcting or including irrelevant covariates)


Conclusion

Large databases have tremendous potential for addressing (although not necessarily settling) important medical questions, including important causal questions involving issues of policy

Addressing these causal questions using standard statistical models can be fraught with pitfalls because of their possible reliance on unwarranted assumptions and extrapolations without any warning

Propensity score methods are more reliable; they generalize the straightforward technique of sub-classification with one confounding covariate to allow simultaneous adjustment for many covariates

One critical advantage of propensity score methods is that they can warn the investigator that, because of inadequately overlapping covariate distributions, a particular database cannot address the causal question at hand without relying on untrustworthy model-dependent extrapolation or restricting attention to the type of person adequately represented in both treatment groups


Clinical Implications

A group of Biostatisticians and a group of clinicians were riding together on a train to joint scientific meetings. All the clinicians had tickets, but the Biostatisticians only had one ticket between them. Inquisitive by nature, the clinicians asked the Biostatisticians how they were going to get away with such a small sample of tickets when the conductor came through. The Biostatisticians said, "Easy. We have methods for dealing with that." Later, when the conductor came to punch tickets, all the Biostatisticians slipped quietly into the bathroom. When the conductor knocked on the door, the head Biostatistician slipped their one ticket under the door, thoroughly fooling the layman conductor.

After the joint meetings were over, the Biostatisticians and the clinicians again found themselves on the same train. Always quick to catch on, the clinicians had purchased one ticket between them. The Biostatisticians (always on the cutting edge) had purchased NO tickets for the trip home. Confused, the clinicians asked the Biostatisticians, "We understand how your methods worked when you had one ticket, but how can you possibly get away with no tickets?" "Easy," replied the Biostatisticians smugly, "we have different methods for dealing with that situation."

Later, when the conductor was in the next car, all the clinicians trotted off to the bathroom with their one ticket and all the Biostatisticians packed into the other bathroom. Shortly, the head Biostatistician crept over to where the clinicians were hiding and knocked authoritatively on the door. As they had been instructed, the clinicians slipped their one ticket under the door. The head Biostatistician took the clinicians' one and only ticket and returned triumphantly to the Biostatistician group. Of course, the clinicians were subsequently discovered and publicly humiliated.

Moral: Beware of using statistical techniques that you don't fully understand - it can only lead to trouble


Propensity Sub-classification

The U.S. General Accounting Office used propensity score methods on the SEER database to compare the two treatments for breast cancer

First, approximately 30 potential confounding covariates and interactions were identified

A logistic regression was then used to predict treatment (mastectomy compared with conservation therapy) from these confounding covariates on the basis of data from the 5326 women

Each woman was then assigned an estimated propensity score, which was her probability, on the basis of her covariate values, of receiving breast conservation therapy rather than mastectomy

The group was then divided into five subclasses of approximately equal size on the basis of the women's individual propensity scores

Before examining any outcomes (5-year survival results), the subclasses were checked for balance with respect to the covariates

If important within-subclass differences between treatment groups had been found on some covariates, then the propensity score prediction model would need to be reformulated
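
Once the subclasses have passed the balance check, the treatment comparison can be made within each subclass and then averaged. The sketch below shows that last step under stated assumptions: the function name `five_subclass_estimate`, the record layout, and the toy data are hypothetical, and weighting each subclass by its share of patients is one common choice rather than a detail given in the talk.

```python
def five_subclass_estimate(records, k=5):
    """records: list of (propensity_score, treated_flag, outcome).
    Returns (per-subclass outcome differences, size-weighted average)."""
    ranked = sorted(records, key=lambda r: r[0])
    n = len(ranked)
    cuts = [round(i * n / k) for i in range(k + 1)]
    diffs, weighted = [], 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        block = ranked[lo:hi]
        treated = [y for _, t, y in block if t]
        control = [y for _, t, y in block if not t]
        if not treated or not control:
            raise ValueError("a subclass lacks one treatment group (no overlap)")
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        diffs.append(diff)
        weighted += diff * len(block) / n
    return diffs, weighted

# Toy data: 5 subclasses of 4 patients; outcome 1 = survived 5 years.
records = []
for i in range(5):
    base = 0.1 + 0.2 * i
    records += [(base, 1, 1), (base + 0.01, 1, 1),
                (base + 0.02, 0, 1), (base + 0.03, 0, 0)]

diffs, overall = five_subclass_estimate(records)
print("per-subclass differences:", diffs)
print("weighted overall difference:", overall)
```

The function deliberately raises an error when a subclass contains only one treatment group, since a comparison there would have no observed counterpart.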


Annals of Internal Medicine, Part 2, 15 October 1997. 127:757-763

Propensity Sub-classification

Table 3. Estimated 5-Year Survival Rates for Node-Negative Patients in the SEER Database within Each of Five Propensity Score Subclasses*

