Causal Inference with Observational Data
2. Causality

Marc F. Bellemare

May 2018

© Marc F. Bellemare, 2018

Causality

I begin this class with a discussion of causality because, for all intents and purposes, (getting as close as possible to) identifying causal relationships is what the vast majority of applied microeconomists spend their time working on.

Even the staunchest of structural econometricians, whose time is usually not spent thinking about clever identification strategies, is usually interested in whether the exogenous variables in her models cause the endogenous variables.

(It is important not to confuse the theoretical and empirical definitions of exogeneity and endogeneity. Many disagreements and misconceptions stem from those homonyms. More on this in a minute.)

And for what it’s worth, it’s not just economists who care about identifying causal relationships: since I got my PhD in 2006, I have seen a methodological convergence take place in the social sciences.

My one purely econometric contribution (a paper in which my coauthors and I show that lagging explanatory variables will generally not exogenize them) was published in the Journal of Politics, and for my money, the methodological pieces published in the top political science journals are often a lot more useful to my work than those published in the top economics journals.

Likewise, social scientists in sociology, criminology, etc. are doing quantitative work with the goal of identifying causal relationships (Manzi, 2010).

But before delving into causality, I should discuss two things about myself that explain where my point of view comes from.

First, when I was doing my bachelor’s degree at the Université de Montréal in the late 1990s, the only social science students who took any serious statistics were economics majors. Students who majored in political science, sociology, or anthropology never took any math or stats classes.

In other words, social sciences at the Université de Montréal were done the old-fashioned French way.

Second, toward the end of my Master’s degree, I dated someone who was writing her doctoral dissertation on the relationship between a large multinational corporation and its employees, a large corporation she used to work for.

Worse, her structured interviews relied on a convenience sample of the friends she had made while she worked for that corporation.

So unlike the current generation of graduate students, my reference point for what constitutes rigorous empirical work in the social sciences was set extremely low to begin with.

Suppose we have the following theoretical relationship:

$$y = f(x). \tag{1}$$

This relationship is deterministic: if we know $x$ and $f(\cdot)$, we know $y$.

Beyond being deterministic, the relationship above might be causal: by saying that $y$ is a function of $x$, we are implying that $x$ causes $y$.

In other words, because convention dictates that $y$ should be on the left-hand side (LHS) and $x$ on the right-hand side (RHS) of equation 1, we suspect that causality flows from $x$ to $y$ in the same equation. Nothing, however, prevents us from writing the same relationship as $x = f^{-1}(y)$, provided $f$ is invertible.

Suppose we have data on $y$ and $x$ for a sample of observations $i = 1, \ldots, N$. Those variables need not be the same as in the equation above. Linearly projecting $y$ on $x$ yields

$$y_i = \alpha + \beta x_i + \varepsilon_i, \tag{2}$$

where the error term $\varepsilon_i$ is added because the relationship is now stochastic rather than deterministic.

Now suppose we estimate equation 2, ignoring for the moment how we do that (e.g., ordinary least squares, maximum likelihood, or method of moments).

Is the coefficient estimate $\hat{\beta}$ causally identified? (Let’s ignore for the time being the imprecisions of language surrounding the terms “causal” and “identified.”)

The answer is “Maybe, but probably not.”

For starters, we know that $\hat{\beta}$ is identified (or unbiased) if and only if $\text{Cov}(x, \varepsilon) = 0$, in which case $E(\hat{\beta}) = \beta$.

But even then, suppose you know for a fact that $\text{Cov}(x, \varepsilon) = 0$ (say, because you randomly assigned $x$). A true skeptic might still not buy your story: who is to say that you are not in the one case in 10, 20, or 100 (depending on your level of confidence) in which you reject the true null hypothesis that $\beta = 0$?

For a true skeptic, accepting any statement as causal requires no less of a leap of faith than is necessary to believe in the existence of a Great Architect of the Universe.
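The skeptic’s worry can be made concrete with a minimal simulation sketch. It assumes three things: a randomly assigned binary $x$, a true $\beta = 0$, and a two-sided test at roughly the 5 percent level ($|t| > 1.96$). The function name `rejection_rate` and every parameter value are illustrative choices, not taken from the slides. $\text{Cov}(x, \varepsilon) = 0$ holds by construction, yet about one simulated experiment in 20 should still reject the true null.

```python
import numpy as np

def rejection_rate(n_sims=2000, n=200, crit=1.96, seed=0):
    """Share of simulated experiments that reject H0: beta = 0 when beta is truly 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.integers(0, 2, size=n)            # randomly assigned binary x
        y = 1.0 + rng.normal(size=n)              # true beta = 0: x has no effect on y
        X = np.column_stack([np.ones(n), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - 2)          # error variance estimate
        se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(coef[1] / se_beta) > crit:         # two-sided test, roughly 5% level
            rejections += 1
    return rejections / n_sims

rate = rejection_rate()
print(f"Rejected a true null in {rate:.1%} of experiments")
```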

Luckily, in economics and other social sciences, the claims we are trying to find causal evidence for are far removed from theological claims.

Specifically, if you can convincingly argue on the basis of your research design that $\text{Cov}(x, \varepsilon) = 0$ (arguably a big “if”), and provided that you did not make any mistake in estimation, then your estimated relationship is causal beyond any reasonable doubt.

In other words, the name of the game in the applied micro version of econometrics is to estimate a version of

$$y_i = \alpha + \beta x_i + \varepsilon_i, \tag{3}$$

where we take as much “bad” stuff out of $\varepsilon$ as possible, with “bad” stuff defined as “stuff that is correlated with $x$.” This is what the title of this course, causal inference with observational data, refers to.

Corollary: Any causal claim is based on a selection-on-observables argument.
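A minimal simulation sketch can show what taking “bad” stuff out of $\varepsilon$ means. It assumes a single confounder `z` that drives both `x` and `y`. The data-generating process, the helper `ols`, and all coefficients are hypothetical. If `z` is left in the error term, the slope on `x` is biased. Controlling for `z` recovers the true value, but that works here only because `z` is observable by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta_true = 2.0

z = rng.normal(size=n)                   # confounder: "bad" stuff if left in the error term
x = 0.8 * z + rng.normal(size=n)         # x depends on z, so Cov(x, z) != 0
y = 1.0 + beta_true * x + 1.5 * z + rng.normal(size=n)

def ols(y, *regressors):
    """OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_short = ols(y, x)[1]       # z left in the error term, so Cov(x, error) != 0
b_long = ols(y, x, z)[1]     # z pulled out of the error term
print(f"omitting z: {b_short:.2f}; controlling for z: {b_long:.2f}; truth: {beta_true}")
```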

Bellemare and Novak (2017)

Selection on observables example.

We were interested in the impact of participation in contract farming ($D$) on the duration of the hungry season $y$ for household $i$, holding a number of potential confounders $x$ constant. So we are interested in the coefficient $\gamma$ in the equation

$$y_i = \alpha + \beta x_i + \gamma D_i + \varepsilon_i. \tag{4}$$

The issue is obviously that participation in contract farming is not randomly assigned to the households in the data: households choose (not) to participate on the basis of things which we typically do not observe (e.g., ambiguity aversion, discount rates, entrepreneurial ability, managerial ability, risk aversion, technical ability, etc.).

Luckily, the survey questionnaire included a series of questions aimed at eliciting (all) respondents’ willingness to pay (WTP) to participate in a hypothetical contract farming arrangement which would increase their income by 10%. Considering the following Roy model (Smith and Sweetman, 2016), household $i$ participates iff

$$y_{1i} - c_i \geq y_{0i}, \tag{5}$$

where $c_i$ is $i$’s cost of participation. For each household, we know $y_{0i}$, and we also know that $y_{1i} = y_{0i} + 0.1 y_{0i}$.
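Substituting $y_{1i} = 1.1\,y_{0i}$ into condition (5) shows what the participation rule reduces to in this setup:

$$y_{1i} - c_i \geq y_{0i} \iff 1.1\,y_{0i} - c_i \geq y_{0i} \iff c_i \leq 0.1\,y_{0i},$$

so household $i$ participates if and only if its cost of participation does not exceed the 10 percent income gain.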

By exogenously varying $c_i$ in the in-survey experiment and observing people’s yes or no answers to the hypothetical question, we can obtain for each respondent a measure of his WTP for participation in the hypothetical arrangement (let’s ignore how I do so in the interest of brevity), which proxies for his marginal utility $MU$ of participating in contract farming. This means I can estimate

$$y_i = \alpha + \beta x_i + \gamma D_i + \delta MU_i + \varepsilon_i. \tag{6}$$

Since a respondent’s marginal utility will be moved around by typically unobservable factors (e.g., ambiguity aversion, discount rates, entrepreneurial ability, managerial ability, risk aversion, technical ability, etc.), including a proxy for it should pull out of the error term those factors which account for selection.
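The logic of adding a proxy can be sketched with a small simulation. Everything in it is hypothetical; it does not use the Bellemare and Novak data or reproduce their estimation procedure. Households self-select into treatment `D` partly on an unobserved trait, `ability`. The variable `mu_proxy` stands in for the WTP-based proxy and is assumed to equal that trait plus noise. Comparing the coefficient on `D` from the analogues of equation 4 and equation 6 shows the proxy pulling most of the selection bias out of the error term. Because the proxy is noisy, it does not remove all of it in this setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
gamma_true = -1.0                        # hypothetical effect of D on the outcome

x = rng.normal(size=n)                   # observed controls
ability = rng.normal(size=n)             # unobserved by the econometrician
D = (ability + 0.5 * x + rng.normal(size=n) > 0).astype(float)   # self-selection on ability
y = 3.0 + 0.5 * x + gamma_true * D - 1.0 * ability + rng.normal(size=n)
mu_proxy = ability + 0.3 * rng.normal(size=n)   # hypothetical noisy proxy for marginal utility

def ols(y, *regressors):
    """OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

g_naive = ols(y, x, D)[2]                # as in equation 4: ability stays in the error term
g_proxy = ols(y, x, D, mu_proxy)[2]      # as in equation 6: the proxy absorbs most of it
print(f"without proxy: {g_naive:.2f}; with proxy: {g_proxy:.2f}; truth: {gamma_true}")
```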

Causality

What makes applied econometrics more art than science (more rhetoric than dialectic) is the fact that one cannot test for causality.

Rather, one has to argue that one’s research design yields a causal relationship. How easy this is depends in large part on your research design.

The reason I teach a course such as this one is that most graduate programs are better at teaching you how to run tests than they are at teaching you rhetorical skills!

Before anything, it should go without saying that identifying a causal relationship flowing from $x$ to $y$ does not mean that $y$ is only caused by $x$.

In Bellemare (2015), I showed beyond any reasonable doubt that rising food price levels cause food riots. At a policy conference in Washington, DC, I was taken to task by another participant (a physicist) for talking about causality... because food riots have more causes than just food prices.

Well, yeah. That isn’t the point of quantitative social science!

The claim that food riots have more causes than food prices alone is hard to dispute. When food prices go up, the fact that we are more likely to see food riots in Lagos than in New York City is clearly proof of that. But one has to be careful not to interpret the statement “$x$ causes $y$” as equivalent to the statement “$x$ is the only cause of $y$.”

The former can be identified with a good research design; the latter is akin to the Unmoved Mover of Aristotle’s Metaphysics.

Statistical Endogeneity

What makes $\text{Cov}(x, \varepsilon) \neq 0$? That is, what are the sources of statistical endogeneity? Broadly speaking, it is useful to break things down into three such sources.

The first is reverse causality, or simultaneity. This arises when $x$ causes $y$ but $y$ also causes $x$. If your observations cover a long enough time span, this is likely to happen. Alternatively, one might say that the expectation of $y$ might cause individuals (or firms, or households, etc.) to adjust $x$ accordingly.

In a regression of wage on education, for example, it is almost certain that individuals’ expectations regarding their future wage have driven how much education they have gotten.
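Here is a minimal sketch of simultaneity bias. It assumes the textbook two-equation system $y = \beta x + \varepsilon$ and $x = \theta y + u$, with independent standard-normal errors. The values $\beta = 1$ and $\theta = 0.5$ are illustrative, not estimates from any wage-education application. Solving the system for its reduced form shows that $x$ inherits part of $\varepsilon$ through the feedback from $y$. As a result, OLS overstates $\beta$ here; the population slope works out to 1.2.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta, theta = 1.0, 0.5        # hypothetical: x -> y (beta) and y -> x (theta)

eps = rng.normal(size=n)      # structural error in the y equation
u = rng.normal(size=n)        # structural error in the x equation
# Reduced form of the system y = beta*x + eps, x = theta*y + u
y = (beta * u + eps) / (1 - beta * theta)
x = (theta * eps + u) / (1 - beta * theta)

cov_x_eps = np.cov(x, eps)[0, 1]                      # nonzero because of the y -> x feedback
beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)     # OLS slope of y on x
print(f"Cov(x, eps) = {cov_x_eps:.2f}, OLS slope = {beta_hat:.2f}, true beta = {beta}")
```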

The second source of statistical endogeneity is unobserved heterogeneity, or omitted variables.

In most applications in applied microeconomics, this is the main source of statistical endogeneity. Individuals’ preferences, levels of ability, etc. are typically not observed by the econometrician, and they are likely to be correlated with what the econometrician can observe, in particular the variable of interest.
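The textbook omitted-variable-bias formula makes this precise. As an illustration, suppose the true model is $y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i$ with $\text{Cov}(x, \varepsilon) = \text{Cov}(z, \varepsilon) = 0$. Suppose further that $z$ (say, ability) is unobserved, so it ends up in the error term of a regression of $y$ on $x$ alone. Then

$$\operatorname{plim} \hat{\beta} = \beta + \gamma \, \frac{\text{Cov}(x, z)}{\text{Var}(x)}.$$

The bias therefore vanishes only if $\gamma = 0$ or $\text{Cov}(x, z) = 0$. Its sign is the product of the signs of $\gamma$ and $\text{Cov}(x, z)$.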

The third source of statistical endogeneity is measurement error.

This arises when one of your variables (in particular, your variable of interest) is systematically misreported or mismeasured, and the degree of misreporting or mismeasurement is correlated with what you can observe.

Note: This is distinct from classical measurement error, where something is measured with error at random.
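Here is a minimal sketch of systematic (non-classical) mismeasurement under a hypothetical reporting rule. Respondents compress their true value of `x` toward the sample mean, so the reporting error is negatively correlated with the true value. All numbers are illustrative. In this particular setup the slope is pushed away from zero: its population value is 1.4 rather than 1. Non-classical error therefore need not produce attenuation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta = 1.0

x_true = rng.normal(loc=5.0, scale=1.0, size=n)
y = 2.0 + beta * x_true + rng.normal(size=n)

# Hypothetical reporting rule: deviations from the mean (5) are under-reported by 30%,
# so the reporting error, -0.3 * (x_true - 5) plus a little noise, is correlated with x_true.
x_reported = x_true - 0.3 * (x_true - 5.0) + 0.1 * rng.normal(size=n)

def slope(y, x):
    """Bivariate OLS slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b_true_x = slope(y, x_true)
b_reported_x = slope(y, x_reported)
print(f"slope with true x: {b_true_x:.2f}, with misreported x: {b_reported_x:.2f}")
```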

In all three cases, there is something in the error term $\varepsilon$ in equation 2 which is correlated with $x$, which means that $\text{Cov}(x, \varepsilon) \neq 0$ and $E(\hat{\beta}) \neq \beta$.

Why do I talk of statistical endogeneity?

Because there is a vast difference between theoretical and statistical endogeneity. Theoretical endogeneity (exogeneity) refers to the case where the value of a variable is determined (taken as given) within a specific optimization problem. Statistical endogeneity (exogeneity) refers to cases where $\text{Cov}(x, \varepsilon) \neq 0$ ($\text{Cov}(x, \varepsilon) = 0$).

The two notions have little in common with each other.

Unfortunately, the fact that we use the same term is confusing. Some economists, in particular those who were trained before the Credibility Revolution (Angrist and Pischke, 2010) and failed to catch up on empirical methods, mistakenly believe the two to be identical.

This leads some people to think of reverse causality as the definition of statistical endogeneity. It is not; it is only one cause of statistical endogeneity.

The foregoing suggests a systematic way to think through and discuss identification issues when writing applied papers.

In my own applied work, I almost always include a point-by-point discussion of whether (i) reverse causality or simultaneity, (ii) unobserved heterogeneity or omitted variables, and (iii) measurement error are a source of bias in the application at hand, and of how I deal with those sources of statistical endogeneity that are present.

I have come to see such a discussion as second only to an article’s introduction in terms of importance, and I believe most young researchers would benefit from including such a discussion when using observational data.

Methodological Skepticism

David Hume (1711-1776) was one of the many philosophers of science who carefully thought and wrote about causality. According to Lorkowski (2016),

[I]f the denial of a causal statement is still conceivable, then its truth must be a matter of fact, and must therefore be in some way dependent upon experience. Though for Hume, this is true by definition for all matters of fact, he also appeals to our own experience to convey the point. Hume challenges us to consider any one event and meditate on it; for instance, a billiard ball striking another. He holds that no matter how clever we are, the only way we can infer if and how the second billiard ball will move is via past experience. There is nothing in the cause that will ever imply the effect in an experiential vacuum.

Extrapolating from the last two sentences to economics, for Hume, a good theoretical model is of no help in identifying causal relationships.

Worse, a theoretical model is completely useless without at least some data (this could be something as simple as stylized facts) to test it.

With the Credibility Revolution, applied microeconomists have adopted a position of methodological skepticism.

That is, before the average applied microeconomist can take a given relationship as causal, she has to be convinced of it.

The default position is to assume that any given correlation is just that, and not a causal relationship.

When a researcher claims that a given relationship is causal, the onus is on her to prove it beyond any reasonable doubt.

It is very much in this sense that much of applied microeconomics is a craft, and that much of our work is rhetorical: in the absence of an experiment or quasi-experiment, it is difficult to claim that a given relationship is causal.

So when someone is skeptical of another’s identification strategy at a seminar, this is (usually) not because the former person is being obnoxious. Rather, it is because that person is merely exhibiting the kind of methodological skepticism which (for better or for worse) is equated with critical thinking nowadays in our profession.

One problem is that you cannot test for endogeneity.

You can test for exogeneity (that is, you can run a test that assumes there is no statistical endogeneity, as in the Durbin-Wu-Hausman test), but a failure to reject the null in such cases is not convincing: with 90, 95, or 99 percent of the probability mass resting on the null, depending on your chosen level of confidence, you would expect to fail to reject the null in most cases.

Thus, a rejection of the null in this case is much more convincing than a failure to reject the null. The problem is that most people who run Durbin-Wu-Hausman tests want to “prove” there is no endogeneity. But it is difficult to prove a negative: you could spend a lifetime trying to prove that unicorns do not exist.
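A small Monte Carlo can illustrate why failing to reject is weak evidence. The sketch implements one common regression-based (control-function) version of the Durbin-Wu-Hausman test, in three steps:

1. Regress $x$ on an instrument `z`.
2. Add the first-stage residuals to the outcome regression.
3. Test whether their coefficient is zero.

The sketch assumes a valid instrument, $n = 100$, and mild endogeneity ($\text{Corr}(v, \varepsilon) = 0.2$). The helper names and all parameter values are hypothetical. In this setup, the test rejects in only a small minority of samples even though $x$ is endogenous.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def dwh_rejects(rho, n, rng, crit=1.96):
    """One draw of a regression-based Durbin-Wu-Hausman test; True if exogeneity is rejected."""
    z = rng.normal(size=n)                                     # instrument (valid by construction)
    v = rng.normal(size=n)                                     # first-stage error
    eps = rho * v + np.sqrt(1 - rho**2) * rng.normal(size=n)   # Corr(v, eps) = rho
    x = 0.5 * z + v                                            # x is endogenous whenever rho != 0
    y = 1.0 + 1.0 * x + eps
    ones = np.ones(n)
    Z = np.column_stack([ones, z])
    fs_coef, _ = ols_with_se(x, Z)
    v_hat = x - Z @ fs_coef                                    # first-stage residuals
    coef, se = ols_with_se(y, np.column_stack([ones, x, v_hat]))
    return abs(coef[2] / se[2]) > crit                         # H0: coefficient on v_hat is 0

rng = np.random.default_rng(5)
n_sims = 1000
size_exog = np.mean([dwh_rejects(0.0, 100, rng) for _ in range(n_sims)])
power_mild = np.mean([dwh_rejects(0.2, 100, rng) for _ in range(n_sims)])
print(f"rejection rate, exogenous x: {size_exog:.2f}; mildly endogenous x: {power_mild:.2f}")
```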

Pearl’s Contribution

A lot of ground has been covered since the days of David Hume when it comes to the study of causality. The leading researcher on causality nowadays is Judea Pearl, a computer scientist at UCLA. One of Pearl’s insights is that we simply do not have the notation to talk about causality.

Let us take equation 2 again. What we are interested in is estimating $P(y \mid x)$, i.e., the probability that $y$ will take a given value given that we know the value of $x$, which is such that

$$P(y \mid x) = \frac{P(y, x)}{P(x)}. \tag{7}$$

The problem is that equation 7 tells us nothing about whether the relationship between $y$ and $x$ is causal! We could also write

$$P(y, x) = P(x \mid y)\, P(y), \tag{8}$$

which also tells us nothing about causality. This is equivalent to saying that equation 2 could easily be rewritten as

$$x_i = \pi + \phi y_i + \nu_i, \tag{9}$$

where $\pi = -\alpha/\beta$, $\phi = 1/\beta$, and $\nu_i = -\varepsilon_i/\beta$. In other words, the same equation can be written in two ways, without there being any indication as to the direction of causality.
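The algebra behind equation 9 takes a single step (assuming $\beta \neq 0$):

$$y_i = \alpha + \beta x_i + \varepsilon_i \;\Longrightarrow\; x_i = \frac{y_i - \alpha - \varepsilon_i}{\beta} = \underbrace{-\frac{\alpha}{\beta}}_{\pi} + \underbrace{\frac{1}{\beta}}_{\phi}\, y_i + \underbrace{\left(-\frac{\varepsilon_i}{\beta}\right)}_{\nu_i}.$$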

Pearl (2009) suggests that we need a new notation, $do(x)$, which indicates that we “do something” to $x$. That is,

$$P(y \mid do(x)), \tag{10}$$

where $do(x)$ indicates that the econometrician controls $x$ in some way (for instance, via an experiment).

Only then can we truly talk of causality.
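A minimal simulation sketch can show the difference between conditioning on $x$ and intervening on it. It assumes a hypothetical structural model in which a confounder $z$ causes both $x$ and $y$, and $x$ has a true effect of 1 on $y$. “Doing” $x$ is implemented by overriding the equation for $x$, which cuts the arrow from $z$ to $x$. This is one standard way of reading $do(x)$ in a structural causal model. The function `simulate` and its coefficients are illustrative.

```python
import numpy as np

def simulate(n, rng, do_x=None):
    """Draw from a hypothetical structural model Z -> X, Z -> Y, X -> Y.
    If do_x is given, X is set by intervention instead of by its own equation."""
    z = rng.normal(size=n)
    if do_x is None:
        x = (z + rng.normal(size=n) > 0).astype(float)   # X chosen partly because of Z
    else:
        x = np.full(n, float(do_x))                        # do(X = do_x): cut the Z -> X arrow
    y = 1.0 * x + 2.0 * z + rng.normal(size=n)             # true causal effect of X on Y is 1
    return x, y

rng = np.random.default_rng(6)
n = 200_000
x_obs, y_obs = simulate(n, rng)
seeing = y_obs[x_obs == 1].mean() - y_obs[x_obs == 0].mean()   # contrast based on P(y | x)
_, y_do1 = simulate(n, rng, do_x=1)
_, y_do0 = simulate(n, rng, do_x=0)
doing = y_do1.mean() - y_do0.mean()                              # contrast based on P(y | do(x))
print(f"E[y|x=1] - E[y|x=0] = {seeing:.2f}; E[y|do(x=1)] - E[y|do(x=0)] = {doing:.2f}")
```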

Economists have been thinking about causality for a while. In two articles published more than sixty years ago, Herman Wold discussed the notion of causality in econometrics (Wold, 1954) as well as causal inference in observational data (Wold, 1956).

The study of causality was neglected in economics until the mid-1980s, if not the early 2000s. Even then, Kennedy (2008) discussed causality only briefly, in the context of Granger causality. He did so only to warn the reader that Granger causality is not causality: sales of holiday greeting cards have been found to Granger-cause the holidays.

Pearl also brought to the study of causality the use of directed acyclic graphs (DAGs).

Strictly speaking, a DAG is a finite, directed graph with no directed cycles.

In econometrics, DAGs are used to graph the effects that some variables have on other variables. As such, DAGs are useful in that they are visual representations of the inference problem at hand, and they can help us determine visually whether an estimated relationship is causally identified or not.
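As a small illustration of the definition, Python’s standard-library `graphlib` module (Python 3.9 or later) can check whether a graph is acyclic: it raises `CycleError` when no topological order exists. Both graphs below are hypothetical:

- The first encodes confounding ($Z \to X$, $Z \to Y$, $X \to Y$).
- The second encodes reverse causality as a two-way feedback between $X$ and $Y$, which is exactly what a DAG cannot contain.

Looking only at shared direct parents is a simplification; a full back-door analysis would consider every path between $X$ and $Y$.

```python
from graphlib import TopologicalSorter, CycleError

def is_dag(graph):
    """True if the directed graph, given as {node: set_of_direct_causes}, has no directed cycle."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

# graphlib maps each node to its predecessors, here read as its direct causes.
confounded = {"X": {"Z"}, "Y": {"X", "Z"}}   # Z -> X, Z -> Y, X -> Y
feedback = {"Y": {"X"}, "X": {"Y"}}          # X -> Y and Y -> X: a directed cycle

print(is_dag(confounded))   # True
print(is_dag(feedback))     # False

# Shared direct causes of X and Y open a back-door path X <- Z -> Y.
common_causes = confounded["X"] & confounded["Y"]
print(common_causes)        # {'Z'}
```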

Figure: Source: Bellemare, Masaki, and Pepinsky (2017).

Regression vs. Matching

I usually teach applied economists, who are familiar with the regression approach, so that’s what I focus on in this course.

But with a research design that allows assuming conditional independence, the matching approach is also valid. One distinct advantage of matching methods is that they sometimes allow estimating types of treatment effects which are otherwise impossible to estimate. In Bellemare and Novak (2017), for example, we rely primarily on a regression approach to estimate the average treatment effect (ATE) of interest. We then rely on a matching approach (i) to assess the robustness of our regression results, and (ii) to estimate both the average treatment effect on the treated (ATT) and the average treatment effect on the untreated (ATU).

In terms of notation, assuming a binary treatment variable D, let

$$\text{ATE} = E(y_{1i} - y_{0i}) = E(y_{1i}) - E(y_{0i}), \tag{11}$$

with the ATT and ATU defined as

$$\text{ATT} = E(y_{1i} \mid D_i = 1) - E(y_{0i} \mid D_i = 1), \tag{12}$$

where $y_{1i}$ and $y_{0i}$ respectively denote the value of the outcome variable for observation $i$ in cases where $i$ is treated and untreated, and

$$\text{ATU} = E(y_{1i} \mid D_i = 0) - E(y_{0i} \mid D_i = 0). \tag{13}$$
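Both potential outcomes are observable only in a simulation, so a short sketch can make the three definitions concrete. It assumes hypothetical heterogeneous gains and Roy-style self-selection, in which units with larger gains are more likely to be treated. None of the numbers come from Bellemare and Novak (2017). In this setup, ATT > ATE > ATU.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

y0 = rng.normal(loc=10.0, scale=2.0, size=n)    # potential outcome if untreated
gain = rng.normal(loc=1.0, scale=1.0, size=n)   # heterogeneous individual treatment effect
y1 = y0 + gain                                  # potential outcome if treated
D = gain + rng.normal(size=n) > 1.0             # hypothetical: larger gains, more likely to select in

ate = np.mean(y1 - y0)                # E(y1 - y0)
att = np.mean(y1[D] - y0[D])          # E(y1 - y0 | D = 1)
atu = np.mean(y1[~D] - y0[~D])        # E(y1 - y0 | D = 0)
print(f"ATE = {ate:.2f}, ATT = {att:.2f}, ATU = {atu:.2f}")
```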

Intuitively, then, the ATT measures the causal effect of changing $D$ from 0 to 1 for those who were treated, and the ATU measures the causal effect the same change would have on those who were not treated.

The issue with matching is that oftentimes, researchers who lack a credible research design will substitute matching on observables for that credible research design and claim that it allows making more credible statements than a regression approach would.

But matching on observables does not account for unobservables, which is usually what plagues economic applications.
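Here is a minimal sketch of the point using the simplest form of matching: exact matching on one discrete observable `x`. It stands in for richer propensity-score or nearest-neighbor matching and is not the procedure used in any paper cited here. The data-generating processes, the function `att_exact_match`, and the constant true effect of 1 are all hypothetical. When treatment depends only on `x`, matching recovers the effect. When treatment also depends on an unobserved `a` that affects the outcome, the matched estimate is badly biased.

```python
import numpy as np

def att_exact_match(y, D, x):
    """ATT by exact matching on a discrete covariate x: within each cell of x, compare
    treated and untreated means, then weight cells by their number of treated units."""
    effects, weights = [], []
    for cell in np.unique(x[D]):
        treated = (x == cell) & D
        controls = (x == cell) & ~D
        if controls.any():                      # skip cells with no comparison units
            effects.append(y[treated].mean() - y[controls].mean())
            weights.append(treated.sum())
    return np.average(effects, weights=weights)

rng = np.random.default_rng(8)
n = 200_000
x = rng.integers(0, 5, size=n)                  # observed covariate with 5 cells
a = rng.normal(size=n)                          # unobserved covariate
true_effect = 1.0

# Case 1: selection on the observable x only
D1 = rng.random(n) < 0.2 + 0.1 * x
y1 = true_effect * D1 + 0.5 * x + a + rng.normal(size=n)
# Case 2: selection also on the unobservable a, which affects y
D2 = 0.3 * x + a + rng.normal(size=n) > 1.0
y2 = true_effect * D2 + 0.5 * x + a + rng.normal(size=n)

print(f"selection on observables:   {att_exact_match(y1, D1, x):.2f}")
print(f"selection on unobservables: {att_exact_match(y2, D2, x):.2f}")
```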

What to Tackle, and In Which Order

Frances Woolley wrote:

[I]t is rare that I will have someone come to my office hours and ask “Have I chosen my sample appropriately?” Instead, year after year, students are obsessed about learning how to use probit or logit models, as if their computer would explode, or the god of econometrics would smite them down, if they were to try to explain a 0-1 dependent variable by running an ordinary least squares regression. I try to explain: “Look, it doesn’t matter. It doesn’t make much difference to your results. It’s hard to come up with an intuitive interpretation of what logit and probit coefficients mean, and it’s a hassle to calculate the marginal effects. You can run logit or probit if you want, but run a linear probability model as well, so I can tell whether or not anything weird is going on with the regression.” But they just don’t believe me.
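Woolley’s advice can be tried on simulated data. This is an illustration under stated assumptions, not her code. It draws a binary outcome from a logit model with one normally distributed regressor and fits the logit by Newton-Raphson, so only NumPy is needed. It then compares the logit average marginal effect with the linear probability model slope. The helper `fit_logit` and all parameter values are hypothetical.

```python
import numpy as np

def fit_logit(y, X, n_iter=25):
    """Logit coefficients by Newton-Raphson; X must include a column of ones."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)                          # score of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])     # information matrix (minus the Hessian)
        b = b + np.linalg.solve(info, grad)
    return b

rng = np.random.default_rng(9)
n = 5_000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * x)))      # hypothetical logit DGP
y = (rng.random(n) < p_true).astype(float)
X = np.column_stack([np.ones(n), x])

b_logit = fit_logit(y, X)
p_hat = 1.0 / (1.0 + np.exp(-X @ b_logit))
ame_logit = np.mean(b_logit[1] * p_hat * (1 - p_hat))  # average marginal effect of x
b_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)          # linear probability model
print(f"logit AME: {ame_logit:.3f}, LPM slope: {b_lpm[1]:.3f}")
```

The closeness here is partly a feature of the setup. With a single standard-normal regressor, Stein’s lemma implies that the population OLS slope equals the average derivative of the response probability, so the two estimates coincide in large samples. With several regressors, or with probabilities near 0 or 1, the gap can be larger, which is exactly what running both lets you check.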

Indeed, nothing screams “grad student” louder than an obsession with fancy estimators, usually of the maximum likelihood variety (e.g., probit, logit, tobit) and sometimes of the semiparametric variety. That obsession comes at the expense of asking whether one has reasonably identified one’s parameter of interest (via a research design that relies on a plausibly exogenous source of variation). It also comes at the expense of asking whether one’s findings have some reasonable claim at being externally valid via the use of a representative sample.

There is an ontological order of importance to things in applied work, which unfortunately goes unspoken in most econometrics classes. That order is roughly as follows:

1. Internal validity. Is your parameter of interest credibly identified?

2. Precision. Are your standard errors right?

3. External validity. Are your findings applicable to observations outside of your sample?

4. Data-generating process. Did you properly model the DGP?

Getting standard errors right is important. But it is not more important than internal validity. At least not these days.

Likewise, it is important to account for the fact that a dependent variable is ordered and categorical. But with 150 observations, one is better off relying on a good research design and using a linear regression than a likelihood-based procedure, whose desirable properties hold only asymptotically; n = 150 does not count as asymptotic.

Conversely, having “big data” in the form of millions or billions of observations will not make your work more likely to be published in good journals in the absence of solid identification.

Summary

- Unless you are dealing with experimental data, where causality is practically given, start from a position of methodological skepticism.

- Think carefully about what leads to $\text{Cov}(x, \varepsilon) \neq 0$ in your application. Your paper should have an Empirical Framework section, split into at least two sub-sections: Estimation Strategy and Identification Strategy. In the Identification Strategy sub-section, systematically list the three causes of statistical endogeneity and explain how your research design allows ruling them out as concerns.

- If your research design does not allow ruling one of those sources of statistical endogeneity out, be honest about it, and try to explain how that source biases your results. Drawing a DAG might help. In some cases, you might be able to analytically derive the sign and magnitude of the remaining bias. Any attenuation bias is good for your story when you reject the null, since it implies that what you have estimated is a lower bound on the magnitude of the true effect.

- Another thing which works well is to imagine what the perfect data set to answer your question would look like. Explain how the data you use in your paper differ from that ideal, and then explain how, given available data and methods, you are as close as possible to the ideal data set.

- Whatever you do, unless you have experimental or quasi-experimental data, or data from a randomized controlled trial, do not use causal language. Instead of talking about how $x$ causes $y$, talk about how your results suggest that $x$ causes $y$, or about how there is an association between the two. Papers get rejected when their authors use causal language where it is not warranted.

- Focus on internal validity, i.e., on identification, first and foremost. Your dependent variable might be a count variable, but as long as you have not done a good job of identifying whether your variable of interest causes it, estimating a Poisson or negative binomial regression remains secondary. At best, you can estimate those fancier regressions as robustness checks.
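The summary’s point about attenuation bias can be illustrated with the textbook case of classical measurement error in a regressor. If $x$ is observed as $x^* + u$, with $u$ independent noise, the OLS slope shrinks toward zero by the reliability ratio $\text{Var}(x^*)/(\text{Var}(x^*) + \text{Var}(u))$. The sketch below assumes equal variances for $x^*$ and $u$, giving a reliability of 0.5, and a true slope of 2. All values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100_000
beta = 2.0

x_star = rng.normal(size=n)                        # true regressor, Var(x*) = 1
y = 1.0 + beta * x_star + rng.normal(size=n)
x_obs = x_star + rng.normal(scale=1.0, size=n)     # classical error u, Var(u) = 1

reliability = 1.0 / (1.0 + 1.0)                    # Var(x*) / (Var(x*) + Var(u))
b_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
print(f"estimate: {b_obs:.2f}, predicted plim: {beta * reliability:.2f}, truth: {beta}")
```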
