
Journal of Economic Literature, Vol. XLV (December 2007), pp. 973–1000

Interesting Questions in Freakonomics

JOHN DINARDO∗

Freakonomics is more about “entertainment” than it is a serious attempt at popularization. Consequently, rather than conduct a comprehensive fact check, I use the book as a springboard for a broader inquiry into social science research and take issue with the book’s surprising premise that “Economics is a science with excellent tools for gaining answers but a serious shortage of interesting questions.” Using examples from Freakonomics, I argue that some of the questions the book addresses are “uninteresting” because it is impossible to even imagine what a good answer would look like. I conclude with some thoughts about the role of economic theory in generating interesting questions and/or answers.


1. Introduction

Freakonomics: A Rogue Economist Explores the Hidden Side of Everything (Harper Collins 2005) is certainly popular. Written jointly by the University of Chicago economist Steven Levitt and New York Times journalist and author Stephen Dubner (Confessions of a Hero-Worshiper and Turbulent Souls: A Catholic Son’s Return to his Jewish Family), the book has appeared on best seller lists internationally and has occupied the New York Times Best Sellers list for more than a year.1 Moreover, with the release of an Instructor’s Manual, as well as a Student Guide written by S. Clayton Palmer and J. Lon Carlson,2 Freakonomics may become part of the learning experience for many economics students.

However, Freakonomics is more about “entertainment” than it is a serious attempt at popularization. Consistent with its hagiographication of Levitt, the book lapses into “truthiness”—telling versions of the research that comport better with what (presumably) the audience wishes were true; the book’s nearly “photo negative” misdescription of the effect of the Romanian dictator’s abortion ban is a case in point. More generally, although some of the research discussed has been challenged by others, little of the substance of these debates is treated as central to the discussion; the controversies, when they appear, are often treated as a sideshow in the “blog” material. I provide the briefest sketch of some of the issues this raises in John DiNardo (2006a).

∗ DiNardo: University of Michigan.

1 See the book’s website, http://www.freakonomics.com, for information about the book. Since I began this review, Levitt and Dubner have produced a new “revised expanded” edition of Freakonomics that, among other things, corrects some minor mistakes and reorganizes the book in light of the fact that some readers found the vignettes that preceded each chapter in the original edition “intrusive (and/or egomaniacal, and/or sycophantic)” (Levitt and Dubner 2006a). The bulk of the changes regard their current view that they were “hoodwinked” regarding their description of Stetson Kennedy (Levitt and Dubner 2006b). The book also contains a new section of material freely available from their blog that I will largely avoid discussing, except where it seems especially pertinent to the “main” text. My page references will be taken from the original edition.

2 The manual is available by request from Harper Collins Academic, www.harperacademic.com. The student guide is free and available at http://www.freakonomics.com/pdf/StudentFREAKONOMICS.pdf.

While a comprehensive fact check of the claims of the book might be of some value, in light of the book’s apparent aims, it would seem beside the point.3 Rather, my hope is that Freakonomics might provide a springboard for a discussion of issues that I think apply more broadly to social science research.

One of the more surprising claims in Freakonomics is that “Economics is a science with excellent tools for gaining answers but a serious shortage of interesting questions” (p. xi). I do not wish to dispute that there is a wealth of uninteresting research and, when I look for entertaining or interesting insights into “human behavior,” I am more likely to turn to a good novel than the latest working paper in economics. However, this claim runs so contrary to my experience (and I suspect, to the experience of many economists and social scientists) that it seems worthwhile to explore.

There are many criteria for interesting questions that will be given short shrift, despite being among the most important: who is included in the discussion, for example, is often more important than the intellectual capacities of the debaters. The quest by Emperor Charles V of Spain, who “set out to discover the truth by experiment” (Lewis Hanke 1935) whether American Indians had the “capacity” for liberty, called forth a flurry of research and debate among the most serious Spanish intellectuals of the day. It would not have been made more “interesting” by a more thorough attention to matters of methodology.4 One suspects that few “American Indians” doubted their capacity for liberty despite the absence of social science research demonstrating otherwise.

Instead I would like to focus on criteria that might be used to distinguish good social science from good literature. Even if one stipulates that a good story need only sound “believable” or “entertaining”—in social science I believe we should aim for a different standard.

One sensible criterion is that claims in the social sciences should distinguish themselves by the “severity” of the “tests” to which they are put.5 Mayo (1996) cites the American philosopher Charles Sanders Peirce to provide a nice short account of what a “scientific” approach is and what is meant by a “severe test”:6

[After posing a question or theory], the next business in order is to commence deducing from it whatever experimental predictions are extremest and most unlikely . . . in order to subject them to the test of experiment.


3 From Noam Scheiber (2007), “‘There’s no question I have written some ridiculous papers,’ [Levitt] says. By way of explanation, [Levitt] draws an analogy to the fashion industry: haute couture versus prêt-à-porter. ‘Sometimes you write papers and they’re less about the actual result, more about your vision of how you think the profession should be. And so I think some of my most ridiculous papers actually fall in the high-fashion category.’”

4 Hanke’s useful book describes the “first social experiments in America” and makes for an informative yet harrowing read, in part because it was intended as a defense of the Emperor and because the aim of his book was to demonstrate that “the Emperor . . . was imbued with a spirit not unlike that of a modern sociologist” (Preface). The description of the wide-ranging experiments undertaken by the Spanish ends with the observation that “probably the mountain of evidence piled up during almost thirty years of social experimentation was high enough to convince the [Spanish] government that nothing could be gained by further attempts to make the Indians live like Christian laborers in Castile” (p. 71).

5 For an insightful and much more careful exposition of the notion of “severe” testing that motivates this discussion, see Deborah G. Mayo (1996).

6 Perhaps an even pithier summary was provided by Lucien Lecam (1977) in his critique of Bayesian solutions to the problems of inference: “the only precept or theory which really seems relevant is the following: ‘Do the best you can.’ This may be taxing for the old noodle, but even the authority of Aristotle is not an acceptable substitute.”


The process of testing it will consist, not in examining the facts in order to see how well they accord with the hypothesis, but on the contrary in examining such of the probable consequences of the hypothesis as would be capable of direct verification, especially those consequences that would be very unlikely or surprising in case the hypothesis were not true.

When the hypothesis has sustained a testing as severe as the present state of our knowledge . . . renders imperative, it will be admitted provisionally . . . subject of course to reconsideration.7

The context of Peirce’s remarks is a discussion of the importance and usefulness of bringing statistical reasoning to bear on history, though clearly they apply more broadly. While accepting the notion that putting our questions to a severe test is a good idea, for most problems there is no simple formula for assessing severity. Nonetheless, it seems like such a sensible criterion that it might come as a surprise that much economics research is of the first sort mentioned by Peirce—evaluating how well the facts accord with a given economic hypothesis. Undergraduate economics textbooks are filled with stories, very few of which have been forced to bear mild, let alone severe scrutiny, but are “broadly consistent” with the data.

A convenient place to begin is the issue, raised several times in Freakonomics (and the student guide, which refers to it as a “basic economics concept”), of whether an alleged relationship is “cause” or “correlation.” Indeed, Freakonomics invokes several different notions of causality and I begin by reviewing some of what it has to say on the subject. Stripped to its essence, my argument is that such a debate often seems beside the point: “cause” means many things. A more relevant question about a correlation is whether it provides a severe test of a hypothesis.

Next I turn to a description of the randomized controlled trial (RCT), not as an exemplar of what all, or even most, social science should be but rather as an exemplar of subjecting a hypothesis to a severe test.

A basic precondition to severe testing, of course, is to formulate questions that can be put to some kind of test. Unfortunately, many social science questions often fail to meet this precondition. I take a couple of examples from Freakonomics and argue that some of the questions it addresses are “uninteresting” because it is impossible to even imagine what a good answer would look like. Somewhat ironically, the issues in Freakonomics that have generated the most popular debate are the ones that seem to have no good answers.

I conclude with some thoughts about the role of economic theory in generating interesting questions and/or answers.

2. Correlation is Causation?

Causes make appearances in Freakonomics in many different and confusing ways.8 In some places, Freakonomics seems to invoke causation as “explanation” or “motive”:

What might lead one person to cheat or steal while another didn’t? How would one person’s seemingly innocuous choice, good or bad, affect a great number of people down the line? In [Adam] Smith’s era, cause and effect had begun to wildly accelerate; incentives were magnified tenfold. The gravity and shock of these changes were as overwhelming to the citizens of his time as the gravity and shock of modern life seem to us today (p. 15).

In another passage, the inability to reason about causation is described as an evolutionary by-product exploited by “experts”:


7 Peirce (1958) 7.182 (emphasis added) and 7.231 as cited in Mayo (1996).

8 This is unfortunate as “causation” is a notoriously difficult topic even when treated by a serious philosopher and I can not do it minimal justice here. One place to start for a more ably argued introduction to these issues in economics is Julian Reiss and Nancy Cartwright (2004). See also Cartwright (2007).


We have evolved with a tendency to link causality to things we can touch or feel, not to some distant or difficult phenomenon. We believe especially in near-term causes . . . a snake bites your friend, he screams with pain, and he dies. The snakebite, you conclude, must have killed him. Most of the time, such a reckoning is correct. But when it comes to cause and effect, there is often a trap in such open-and-shut thinking. We smirk now when we think of ancient cultures that embraced faulty causes: the warriors who believed, for instance, that it was their raping of a virgin that brought them victory on the battlefield. But we too embrace faulty causes, usually at the urging of an expert proclaiming a truth in which he has a vested interest (p. 140).

Confusion about correlation, when not being exploited by unsavory experts, is the product of soft-headed thinking:

The evidence linking increased punishment with lower crime rates is very strong. Harsh prison terms have been shown to act as both deterrent (for the would-be criminal on the street) and prophylactic (for the would-be criminal who is already locked up). Logical as this may sound, some criminologists have fought the logic. A 1977 academic study called “On Behalf of a Moratorium on Prison Construction” noted that crime rates tend to be high when imprisonment rates are high, and concluded that crime would fall if imprisonment rates could only be lowered. (Fortunately, jailers did not suddenly turn loose their wards and sit back waiting for crime to fall.) . . . The “Moratorium” argument rests on a fundamental confusion of correlation and causality (p. 123).

While war, rape, and experts wielding dubious metaphysics may be as old as humankind, confusion about “correlation versus causation” is arguably quite recent. Even the idea of “probability” as we might understand it today emerged only in the seventeenth century (Ian Hacking 1975). At that time, there was a great deal of reluctance to introduce any notion of “chance” into laws of nature. Several years after Smith’s Wealth of Nations, Laplace could still write “all events, even those which on account of their insignificance do not seem to follow the great laws of nature, are a result of it just as necessarily as the revolutions of the sun.”

Karl Pearson (1930), proponent of eugenics and an important contributor to modern statistics and scientific philosophy (who did much to popularize the idea of correlation), argued that “correlation” superseded the notion of “causation”:9

Up to 1889 [when Galton published Natural Inheritance], men of science had thought only in terms of causation . . . . In [the] future, they were to admit another working category, correlation which was to replace not only in the minds of many of us the old category of causation, but deeply to influence our outlook on the universe. The conception of causation—unlimitedly profitably to the physicists—began to crumble to pieces. In no case was B simply and wholly caused by A, nor, indeed by C, D, E, and F as well! It was really possible to go on increasing the number of contributory causes until they might involve all the factors of the universe.

To put Pearson’s views in context, he was reacting against a view held by many others that “stable” correlations—correlations that didn’t change much over time, for example—were informative about causes or causal laws—an idea that is coterminous with the idea of correlation itself. One example, perhaps one of the earliest predecessors to Freakonomics, is André-Michel Guerry’s (1883) Essay on the Moral Statistics of France.10

One of the most sensational of Guerry’s findings was his refutation of the view that “ignorance is the principal cause of crime, and that to make men better and happier, it is sufficient to give them an education.”


9 Pearson was a complex figure who made contributions in many areas. His book, The Grammar of Science (1892), for example, was on a list of books read by the famous “Olympia Academy” reading group of Albert Einstein, Conrad Habicht, and Maurice Solovine in 1902.

10 Guerry’s work “appears to be the first to test ‘armchair’ assumptions about the relationship of certain variables to criminal behavior” (Sue Titus Reid 1985). Like Freakonomics, it was an international hit and was popular among “amateurs” (Hacking 1990).


According to Guerry, this view was based on the observation that the departments where education is least widespread are those where the most crimes are committed (Guerry 1883, page 87, emphasis in original). Guerry was able to refute conventional wisdom on the subject by merely demonstrating (with better data) that the correlation between education and crime at the department level was not negative, but positive. Moreover, Guerry apparently felt little need to consider the possibility of what we might call “confounders.” Having established that the correlation was positive, he adduced further evidence that the conventional view was wrong by demonstrating the stability of the correlation over time: a law of sorts.
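
The point is easy to illustrate with simulated data. The sketch below (my own illustration, not drawn from Guerry or from the book) generates department-level “education” and “crime” figures that are driven entirely by a shared confounder, here labeled urbanization; education has no causal effect on crime in this data-generating process, yet the cross-sectional correlation is positive and roughly stable period after period. The variable names and parameter values are invented for the example only.

```python
# Hypothetical illustration: a positive department-level correlation between
# "education" and "crime" that is stable across repeated cross sections,
# produced entirely by a shared confounder ("urbanization"); education has
# no causal effect on crime in this data-generating process.
import numpy as np

rng = np.random.default_rng(0)
n_departments = 86  # arbitrary number of departments

for period in range(5):
    urbanization = rng.normal(size=n_departments)                   # confounder
    education = 0.8 * urbanization + rng.normal(0, 0.5, n_departments)
    crime = 0.8 * urbanization + rng.normal(0, 0.5, n_departments)
    r = np.corrcoef(education, crime)[0, 1]
    print(f"period {period}: corr(education, crime) = {r:.2f}")
```

Stability of the correlation here reflects nothing more than the stability of the data-generating process; it is not, by itself, evidence of a causal law.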

The impulse to embed statistical uncertainty in an otherwise “determinist” world view led to arguably one of the most bizarre intellectual strands in social science: the idea that statistical laws vitiated free will (Hacking 1990; Hacking 1983a). A wonderful illustration comes from Charles Dickens’s Hard Times (1854). Mr. Gradgrind (who it may be recalled named two of his sons “Malthus” and “Adam Smith”!) was in part the satirical embodiment of statistical fatalism.11 If the number and proportion of crimes displayed statistical regularities, could the criminals really have free will? When Mr. Gradgrind’s son Tom is revealed to be a thief, Tom responds to his father’s shock and dismay this way: “‘I don’t see why,’ grumbled the son. ‘So many people are employed in situations of trust; so many people, out of so many, will be dishonest.’ I have heard you talk, a hundred times, of its being a law. How can I help laws? You have condemned others to such things, Father. Comfort yourself” (book three, chapter 7).

While we have long abandoned the view (I hope) that statistical laws have anything to say about free will, still with us is the idea that statistical distributions are “laws” that regulate human behavior on some macro scale.12

Notwithstanding the paucity of economic “laws,” the idea that mere empirical regularities might embody “causes” can not always easily be dismissed. One might argue that Newton’s law of gravitation was an example of an empirical regularity—correlation—that became a “cause.” Surely we can talk about gravity causing my cup of coffee to fall off the table after I pushed it. However limited we might find such an account of gravity as a cause, the testable predictions from the law of gravitation can ultimately be put to rather severe tests in an awe-inspiring variety of contexts. Thus, one can sympathize with Leibniz’s opposition to the concept of gravity—which he dismissed as an “occult” force—while maintaining it is a useful and powerful idea. Whether gravity is “real,” a law of nature, or whether it is really a “cause” seems beside the point.

2.1 Distant and Subtle Causes

The word “cause,” unfortunately, can mean many different things. Herbert Simon once observed that “in view of the generally unsavory epistemological status of the notion of causality, it is somewhat surprising to find the term in rather common use in scientific writing” (Simon 1953 as cited in Zellner 1984). Indeed one of the most confusing themes in Freakonomics is that “distant and subtle causes can have dramatic effects.”

Their claim about “distant and subtle causes” is confusing in a couple of ways. First, it doesn’t seem to speak to the type of “manipulationist” notions of causality that concern many in social science. Second, the claim evokes an echo of the Laplacean determinism I discussed above. While it is not precisely clear what notion of “cause” is being invoked, it seems to speak to some “causal” antecedent which sets off a long chain of events ultimately resulting in a specific event. It is a common narrative device in fiction—many a character’s fate can be “traced back” to a single fateful act.

11 And economics as well. The colorful protagonist Sissy defines the basic principle of Political Economy as “To do unto others, as they would do unto me”!

12 See, for example, Arthur DeVany (forthcoming) and the references cited within.

The search for the single (or small number of) causal antecedent(s) of an event is surprisingly common among economists: “what caused wage inequality to increase in the 1980s” or “what caused the Great Depression” or “what caused crime to fall in the 1990s” (a question taken up in Freakonomics) are three examples that come to mind. I won’t deny that the search for answers to such questions can sometimes be informative. Nonetheless, except for the very simplest phenomenon, it is rarely clear what constitutes a good answer to such a question.13

Consider something as simple as “the cause of death.” Enumeration of such causes dates back at least to 1592: “the occasion of keeping an accompt of burials arose first from the plague” (John Graunt 1676). Not surprisingly, the victims of the plague were not drawn randomly from rich and poor; neither was the focus on the cause of death politically inert. Anne M. Fagot (1980) reports on one Doctor Vacher who, seeking to understand the dramatic increase in deaths during the 1870 siege of Paris, went back to study an even earlier four-month siege of Paris in 1590. After studying the data, he was led to conclude that one of the “effects of insufficient food” was that the lethality of diseases such as typhoid was much greater. Nonetheless, “hunger” or “lack of food” was rarely cited as a “cause” of death, although he identified undernutrition as an “underlying potential cause.”

This arbitrariness, of course, persists. In the United States, for example, most people die of more than a single cause of death; yet even on the death certificate, where up to twenty causes of death can be reported, the distinction between “underlying causes” and other types of cause remains! (Center For Health Statistics 1998).14 Despite its arbitrariness, such information can be useful. Indeed, if there is any clear doctrine on how to attribute the cause of death, perhaps it is the requirement that the classification scheme is somehow minimally “useful” (Fagot 1980). No amount of diligent record keeping, however, will be able to create a “complete” description of “why” some people die—debate on “why” Jesus died continues! (W. D. Edwards, W. J. Gabel, and F. E. Hosmer 1986; Anonymous 1986; C. G. Gosling 1987; B. Brenner 2005; H. ur Rehman 2005; W. R. Saliba 2006).

2.2 Cause as Explanations

Surely “crime” or other social science issues are at least as complicated as “death.” Yet it is surprising how much social science research seems dedicated to telling simple stories. This suggests another related notion that might be called “cause as explanation.” While such stories appear to have great appeal, I must confess I don’t understand why.

A well known reductio ad absurdum of this type of reasoning concerns the famous Dr. Pangloss in Candide.15 At one point Candide is reunited with his former teacher Dr. Pangloss, who has been reduced to a beggar with his nose half-eaten off, covered in scabs. Surprised by this (and a lot of other) misfortune, Candide “inquired into the cause and effect, as well as into the sufficing reason that had reduced Pangloss to so miserable a condition.” We learn that Dr. Pangloss had “tasted the pleasures of Paradise” with Pacquette, a pretty servant girl who had, as it turns out, been infected with a disease, the impressive genealogy of which Dr. Pangloss is able to trace back to a Countess, a Jesuit, a novitiate (among others), and ultimately Christopher Columbus. Candide asks why did Dr. Pangloss suffer such a horrific fate? What caused his degradation? For Dr. Pangloss, causal questions were straightforward: things could not be otherwise than they are, all things are created for some end, and thus all things are created for the best. In this case, Dr. Pangloss concludes his suffering was “a thing unavoidable, a necessary ingredient in the best of worlds” for had this disease not come to pass “we should have had neither chocolate nor cochineal.”16

13 For a more optimistic view about the types of questions that are “susceptible to empirical investigation,” see Zellner (1984).

14 The U.S. Implementation of the International Classification of Diseases includes this instruction: “A cause of death is the morbid condition or disease process, abnormality, injury, or poisoning leading directly or indirectly to death. The underlying cause of death is the disease or injury which initiated the train of morbid events leading directly or indirectly to death or the circumstances of the accident or violence which produced the fatal injury. A death often results from the combined effect of two or more conditions. These conditions may be completely unrelated, arising independently of each other or they may be causally related to each other, that is, one cause may lead to another which in turn leads to a third cause, etc.” (National Center for Health Statistics 2006, p. 6).

The humor in Candide comes from the creativity with which one can generate a “theoretically justified” explanation of “why” for any set of facts. One obvious problem with Dr. Pangloss’ explanations is the impossibility of putting such views to a severe test of any sort.17

Much economics as “explanation,” it seems to me, resembles Dr. Pangloss’s explanations. With enough cleverness one can dream up a mathematical model of utility-maximizing individuals to explain anything. It is not always clear what purpose such explanations serve. In a throw-away line in Freakonomics, for example, the authors attribute the putative fact that “the typical prostitute earns more than the typical architect” (p. 106) to a “delicate balance” of “four meaningful factors.”18 I don’t mean to deny that such factors often play some role—certainly, for example, an intervention to make a job more unpleasant may act to reduce the number of people willing and able to do that job. But even if we stipulate to these “four meaningful factors,” that is only the beginning of an explanation at best.19


15 Voltaire (1796) describes Pangloss this way: “[He] was a professor of metaphysico–theologo–cosmolo–nigology. He could prove, to admiration, that there is no effect without a cause; and, that in this the best of all possible worlds, the baron’s castle was the most magnificent of all castles, any lady the best of all possible baronesses. It is demonstrable, said he, that things cannot be otherwise than as they are: for all things having been created for some end, they must necessarily be created for the best end. Observe, that the nose is formed for spectacles, and therefore we wear spectacles. The legs are visibly designed for stockings, and therefore we come to wear stockings” (p. 4).

16 See chapter 4, page 14, of Voltaire (1796). The translator of this version of Voltaire’s story attributes this style of reasoning to the “maxims of Leibniz” and as put into the mouth of Dr. Pangloss is a “most Capital and pointed stroke of Satire.” Cochineal is apparently a red dye made from ground up insects.

17 John Maynard Keynes (1921, p. 297) argues that “the discussion of [Aristotelian] final causes and of the argument from design has suffered from its supposed connection with theology. But the logical problem is plain and can be determined upon formal and abstract considerations.” He illustrates the case with the evidentiary value of observing some unusual event and ascribing its cause to some “supposed conscious agent.” With a simple application of Bayes’ rule he does conclude that “no conclusion, therefore, which is worth having, can be based on the argument from design alone; like induction, this type of argument can only strengthen the probability of conclusions, for which there is something to be said on other grounds.”

18 “When there are a lot of people willing and able to do a job, that job doesn’t generally pay well . . . the others are the specialized skills a job requires, the unpleasantness of a job, and the demand for services that the job fulfills” (p. 105).

19 As to the truth of the claim about prostitute wages, it is too imprecise to verify or deny; moreover Dubner and Levitt provide no reference. In a previous version of this essay, I concluded that it would be a major project to verify such a claim. Putting aside the almost insuperable problems of defining prostitution and measurement of hours worked, a comparison of data from a probability sample of Los Angeles prostitutes, Lee A. Lillard (1998), revealed that, measured in 2004 dollars, the mean income for “Street Prostitutes” in Los Angeles was $36,325 in 1989. In May 2004, data from Occupational Employment Statistics for “Architects, Except Landscape and Naval” suggested an annual income from work of $66,230 (assuming 2,080 hours of work per year).


3. The Randomized Controlled Trial as One Type of Severe Test

Some writers have sought to define a cause as something that arises from the predictable consequence of an intervention that can be evaluated by something approximating randomized design. As the foregoing has made clear, this definition is too limited given the many different notions of the word “cause.” Rather than “no causation without manipulation” (Paul W. Holland 1986) it might be more truthful to say that discussions in social science about causes are more intelligible when they involve an intervention of some sort; moreover, a focus on such “policy evaluation” questions often leads to more interesting questions, and importantly often leads to situations when we may be able to subject our views to some kind of test. In a helpful discussion, Reiss and Cartwright (2004) suggest the slogan “disambiguate before you evaluate.”

My purpose in discussing a RCT is that it is useful to review a framework where what question is being asked, and the ground rules under which we might find an answer credible, is arguably more transparent than is usual. As is common practice, I will describe questions answered in such a framework as “causal,” although they are often causal in a very limited sense.20 Indeed, the origins of the RCT lie in the attempt to put some of the “squishiest” beliefs to a severe test—some of the earliest examples arose in the study of telepathy (Hacking 1988).21

In Freakonomics, regression analysis is described as the tool of someone who can’t conduct a RCT:

In a perfect world, an economist could run a controlled experiment just like a physicist or a biologist does: setting up two samples, randomly manipulating one of them, and measuring the effect. But an economist rarely has the luxury of such pure experimentation. (That’s why the school-choice lottery in Chicago was such a happy accident.) What an economist typically has is a data set with a great many variables, none of them randomly generated, some related and others not. From this jumble, he must determine which factors are correlated [sic] and which are not (p. 162).22

Putting aside whether this is a description of good practice, the view that regression is a (sometimes inadequate) substitute for a randomized controlled trial is not universally held by economists. More surprising, perhaps, is that as a philosophical matter “it is hard to think of a more controversial subject than that of randomization” (Patrick Suppes 1982, p. 464).23 Convincing Bayesian rationales for randomization, for example, are evidently difficult to generate,24 and this difficulty has been the source of criticism of Bayesian methods for their failure to recognize a “distinction between ‘experiences’ and ‘experiments’” (Lecam 1977, p. 137).25

20 For those who prefer a definition of “cause,” one that seems to capture some of the ideas I have in mind is due to James J. Heckman (2005): “Two ingredients are central to any definition [of causality]: (a) a set of possible outcomes (counterfactuals) generated by a function of a set of ‘factors’ or ‘determinants’ and (b) a manipulation where one (or more) of the ‘factors’ or ‘determinants’ is changed. An effect is realized as a change in the argument of a stable function that produces the same change in the outcome for a class of interventions that change the “factors” by the same amount. The outcomes are compared at different levels of the factors or generating variables. Holding all factors save one at a constant level, the change in the outcome associated with manipulation of the varied factor is called a causal effect of the manipulated factor” (p. 1). For a discussion of the limitations of any single definition of causality relevant for economists, see Cartwright (2007) and Reiss and Cartwright (2004). For a thoughtful and well-reasoned discussion of views about causation that do not have a central role for “manipulation,” see Zellner (1984). Zellner prefers to work with a definition proposed by Feigl: “the clarified (purified) concept of causation is defined in terms of predictability according to a set of laws.” By doing so, he appears to be able to consider many sorts of questions—albeit subject to a logical (Bayesian) calculus—which could not be put to a severe test in my way of viewing the issue.

21 It is also unsurprising that Peirce was one of the earliest to conduct a high quality RCT (Stephen M. Stigler 1978). Even economists played a role: Francis Ysidro Edgeworth (1885, 1887) wrote up two excellent analyses of the results of a trial involving randomization in the Journal of Psychical Research.

22 It is not clear what is intended. Even in a “jumble” of data, determining what variables are correlated is straightforward.

23 For a sample of this debate, see Suppes (1982), David A. Harville (1975), Zeno G. Swijtink (1982), and the illuminating debate in Leonard J. Savage (1962), especially pages 62–103.

24 There are many different flavors of Bayesian arguments against randomization. One argument, not necessarily the best, will be familiar to economists. From Scott M. Berry and Joseph B. Kadane (1997): “Suppose a decision maker has two decisions available d1 and d2. These two decisions have current (perhaps posterior to certain data collection) expected utilities U(d1) and U(d2) respectively. Then a randomized decision, taking d1 with probability λ and d2 with probability 1 − λ, would have expected utility λU(d1) + (1 − λ)U(d2). If the randomization is nontrivial, i.e., if 0 < λ < 1, then randomization could be optimal only when U(d1) = U(d2), and even then a nonrandomized decision would be as good.” Another rationale is that it helps “simplify” the appropriate likelihood (Donald B. Rubin 1978).

25 Although salient to this discussion, limitations of scope do not permit an extensive discussion of these issues. For a useful discussion and a defense/reformulation of classical statistical inference, see Mayo (1996).

26 Another way to proceed, which is often helpful, is to establish a notation for counterfactuals. Let Yi(1) be the outcome when the person is assigned to the treatment and let Yi(0) be that same person’s outcome when they are assigned to the control. The treatment effect for person i is then τi ≡ Yi(1) − Yi(0). It is generally impossible to observe τi since the individual is in one state or the other. We could then talk about trying to define E[τi] (for some population) as the object of interest. See Holland (1986) for an exposition along these lines. See Heckman (2005) for a critique of that approach and related points.

Rather than hold up the RCT as a paradigm for all research, I review it here because it represents a single case in which we sometimes have some hope of evaluating (limited, context dependent) causal claims, and because what constitutes a severe test is somewhat clearer.

Second, the RCT is a useful framework to discuss the “intelligibility” of putatively causal questions. That is, if one is discussing a “causal” question, whether or not one is discussing an RCT, the RCT often provides a useful template to evaluate whether the causal question is answerable. It allows us to try to answer the question “what do you mean by a causal effect?” as well as the related question “how credible is your inference about the ‘cause’?”

A natural by-product of considering the RCT is that the limitations of a research design to answer “interesting” questions (and what might provide evidence for and against the validity of the design) are easier to understand. Ironically, I suspect that some of the disenchantment with RCTs relates to the relatively transparent notion of “cause”—in particular the possibility that the putative cause under examination is “implementation-specific,” which I discuss below.

3.1 Randomized Controlled Trials

In an RCT, a single potential cause is randomly “assigned” to a treatment group and an (inert) placebo is assigned to the control group.

Let yi be an outcome which can be measured for all individuals, and let Ti = 1 signify that person i has been assigned to treatment and Ti = 0 otherwise. Suppose the following characterizes the true state of the world26:

(1)   yi = α + βTi + f(Xi) + εi ,

where α and β are constants, f(•) is some unknown function of all the observable characteristics that affect yi before being assigned to the treatment or control, and εi is all the other unmeasurable influences. Even at this level of generality, it takes a considerable leap of faith to think that this simple (partially) linear representation can yield anything but the most limited understanding of the effect of T even when some understanding is possible.

A fundamental problem we face is that, for an individual i, we can only observe the person in one of the two states—treatment or control. Another related problem is that we don’t observe everything that affects the outcome y. For any individual then, we can never be certain that some unobserved determinant of the outcome y is changing at the same time we are assigning the person to treatment or control.

The key to this design is that by coin toss, nature, or some other contrivance that generates “random numbers,” persons are next assigned to either treatment or control in a way that is independent of their characteristics. If this assignment is conducted on a random sample of individuals from a particular population, then the mean outcome for individuals in the treatment group—ȳT=1—is a good estimate of the average outcome of individuals from this population under the treatment—α + β + E[f(Xi)]. By similar logic, ȳT=0 is a good estimate of the average outcome for the control group—α + E[f(Xi)] (provided, of course, that there is in fact some stable relationship between the cause and the outcome).27 The difference between these two means is likewise a “good” estimate of the average treatment effect for this group.28

The assertion that the estimate so formed is a “good” one is fortunately not one that has to be taken solely on faith: it can be tested. While not “assumption free,” our confidence in estimates generated this way does not rely on us having complete knowledge of the data generation process given by equation (1). Specifically, it is reasonable to hope that we can get a good answer without having to hope that somehow we can “control” for all possible confounds.

In a typical RCT, in fact, the variables in X are generally not used for any purpose but to test the design. Under random assignment, any X should be the same on average for the two groups. This is, of course, a consequence of random assignment that is routinely tested in every RCT. If the groups look very different on average, this is generally considered evidence against the design, and one reason to have less confidence in the results. A related implication is that in an RCT, the answer should be insensitive to the addition of additional controls.29

It is the fact that the important X’s are the same on average that gives us some reason to believe that the same is true for the ε. Even in this simple case, we can never be sure that this is true. At best, the answers from identical experiments have the “tendency” to be correct.
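
To make the mechanics concrete, the following sketch (my own illustration, not taken from the article) simulates equation (1) with an arbitrary choice of α, β, and f(·), assigns treatment by “coin toss,” estimates the treatment effect by the difference in group means, and performs the routine design check that the observed X is balanced across the two groups. All parameter values and the functional form of f are assumptions made only for the example.

```python
# Minimal sketch of the RCT logic in equation (1):
#   y_i = alpha + beta * T_i + f(X_i) + eps_i.
# With T assigned at random, the difference in group means estimates beta,
# and the observed X should be balanced across groups (the design check).
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta = 10_000, 1.0, 0.5            # illustrative values only

x = rng.normal(size=n)                       # observed pre-treatment characteristic
eps = rng.normal(size=n)                     # unmeasured influences
f_x = np.sin(x) + 0.3 * x**2                 # some unknown function f(X_i)

t = rng.integers(0, 2, size=n)               # random assignment: the "coin toss"
y = alpha + beta * t + f_x + eps             # realized outcomes

ate_hat = y[t == 1].mean() - y[t == 0].mean()    # difference in means
x_balance = x[t == 1].mean() - x[t == 0].mean()  # routine balance check on X

print(f"difference in means: {ate_hat:.3f}  (true beta = {beta})")
print(f"difference in mean X: {x_balance:.3f}  (should be near zero)")
```

A large imbalance in X would count as evidence against the design, which is exactly the check described above; note that nothing in the estimate requires knowing f or controlling for X.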

Several attractive features of a well designed RCT that are usually too obvious to deserve mention become more important when one turns to the sorts of “approximations” we are often faced with in social science:

(1) Prespecified research design. In an RCT, the researcher specifies in advance to the extent possible the conditions that have to be satisfied, and what will be concluded under every possible result of the experiment. (This is articulated with the usual degree of tentativeness associated with any technique involving sampling.)30


27 Already this aspect of the RCT highlights its weakness for a lot of social science questions. Many social scientists are interested in “why” someone does what they do or why things turned out as they did. In the RCT, however, the credibility of the answer hinges on the fact that part of human choice has been handed over (implicitly or explicitly) to a (hypothetical) chance set up. This is also a source of the considerable ethical problems that are frequently involved in RCTs.

28 Even in this short description we have swept several very important issues under the rug that can arise even in a simple medical example. For instance, we are assuming that “general equilibrium” effects are unimportant so that one isn’t concerned that the controls are affected by the treatment also. These and related concerns become even more important when we raise our ambitions to seek to extrapolate the results of the experiment to other possibly different contexts. There is a long tradition in economics of seeking answers to these more difficult questions that dates back at least to the Cowles Commission (see Heckman (2000) and Heckman (2005) for useful discussions). I focus on “simpler” less ambitious questions (Heckman and Edward Vytlacil 2005).

29 This is, one is tempted to speculate, the source of the intuition, that many appear to have, that somehow if a result “survives” the inclusion of a long list of covariates, it is a more trustworthy estimate.

30 I don’t mean to advocate a simple-minded caricature of the Fisher or Neyman–Pearson significance testing approach. Long-standing criticisms of insisting on prespecification are that they are rarely strictly applied (with good reason). See Mayo (1996) for a discussion of the debate about “predesignation” and a helpful reformulation of Neyman–Pearson “error statistics.” In her framework, violations of predesignation are licensed when they don’t make the test of the hypothesis less “severe.” Surprisingly, some Bayesians argue for the irrelevance of predesignation on the grounds that the “mental state” of the person collecting the data should have no relevance for the evidential import of the data. See Mayo and Michael Kruse (2002) and the references therein for a useful discussion.



If we are assessing the efficacy of a drug, for instance, it is pointless to decide in advance that the drug “works” and then massage the data, sample, specification, etc. until we “reach” that conclusion. Doing so would seem to vitiate using the RCT (or regression more generally) as a method for anything but confirming our previously held beliefs.31 Indeed, historically and etymologically the notion of an “experiment” is intimately related to the effort to put one’s views to the test (DiNardo 2006b). Clearly, after-the-fact research design is less “severe.”

(2) “Transparent” research design. In the classical RCT, as one example, it is transparent what constitutes evidence against the design (for example, if the predetermined characteristics of the treatment and control are very different) and what comparison or regression coefficient constitutes evidence in favor of, or against, the claim.

Another set of assumptions—again usually too obvious to be discussed in the case of the RCT—deal with whether a question or set of questions are “well posed” or whether the answer suggested by the RCT addresses the “intended” question:

(1) We can identify a “treatment” or “policy.” At one level, since we are dealing with human beings, one often has to carefully distinguish between “assignment to treatment” and the “treatment.” You can assign someone to take a specific medicine but it isn’t always reasonable to assume that the person has taken the medicine. Even if we can ignore such distinctions it may be difficult to identify what our treatment is. Even the most routine, minor medical manipulation often comes bundled with other things. Many years ago it would have been a sound inference based on much unfortunate experience that the causal effect of a spinal tap (lumbar puncture) would be a serious headache afterward. Is this effect caused by the substance used to sterilize the needle? The type of needle? The size of the needle? Despite the fact that lumbar punctures have been performed for more than one hundred years (A. Sakula 1991), these questions continue to be the subject of debate despite many randomized controlled trials (Carmel Armon and Randolph W. Evans 2005).

31 For an illustration of evolving definitions of the “appropriate” specification after having seen the results, and the consequences of failing to adopt a prespecified research design, see the discussion of Finis R. Welch (1974), Frederic B. Siskind (1977), Welch (1976), and Welch (1977) in chapter 6 of David Edward Card and Alan B. Krueger (1995). Although the extent of this research style is unknown, I suspect that the example is unusual only because it is documented.

32 See Clive Granger (1986) for example.

(2) The effect of a treatment is always relative to the control. The state of being assigned to the control is the “counterfactual” against which the treatment is evaluated. An effect is a comparison of outcomes in different possible states.

(3) The treatment involves an “intervention” and/or is “manipulable.” In the RCT, this is so basic it hardly deserves mention; it is, however, a subject of some debate among economists.32 In the limited way I wish to use the word “cause,” it is not meaningful to question the effect of “being black” on one’s propensity for crime. Only in a fantasy world does it make sense to consider the fate of John DiNardo as a “black man.” If a misguided social scientist had been able to secretly reach back into the womb to manipulate John DiNardo’s DNA to make him “black” (something that would have no doubt come as a surprise to his Italian parents) would it even be meaningful to describe the person generated from that process as the “black John DiNardo” to which the “white John DiNardo” could be compared? The issue is not “Is such a manipulation possible?” but “Were such a manipulation conceivable, would it answer the question we are asking?” If the answer to that question is “no,” I would describe the question as ill-posed or unintelligible even if it is the answer to a different well-posed question. I have no wish to overstate this issue: some of the debate may be of no greater moment than questions of terminology. For example, I think it is possible to talk in a limited way about the effect of changing a person’s perception of the race of, say, a job applicant because it is perhaps meaningful to think about manipulating a person’s perception of race.33

(4) Related to this last issue is the hope that “how” the treatment is assigned is irrelevant to the effect (β) on the outcome. If the effect of the putative cause is implementation specific, it is often more helpful to abandon the effort to find the effect of the putative cause and “settle” for the effect of the “implemented cause.” For example, if the effect of aspirin on headache differs when it is given to a patient by a nurse than when it is given to a patient by a doctor, the most we may be able to do is describe the causal effect of “nurse administered aspirin” or “doctor administered aspirin.” In the limit, of course, if only the method of administration matters we might even wish to conclude that aspirin qua aspirin doesn’t cause anything to do with headache. At a very minimum, if such were the case, a debate about the causal effect of aspirin would be unintelligible.

(5) I would add, although this is not properly thought of as a “requirement,” that the most interesting studies involve manipulations that correspond to real policies. In these cases, even if we learn little about the “structure” of a true model, we have perhaps learned something about the consequences of one possible action we have taken.

I do not mean to suggest by the foregoing that a RCT is always or usually the “best” evidence. Quite to the contrary, I don’t even think that a singular focus on “well-posed” questions would be a good idea.34

I would go even further and suggest thatin many areas under study by economists,the focus on “treatments” can be, perhapsunintentionally, narrow. As David Thacher(2001) observes, “Reducing crime is clearlyone important goal for the police. But itmust compete with other goals like equity,due process, just desserts, and parsimony.”Rather I argue that if a putatively causalquestion can not be posed as some sort of

984 Journal of Economic Literature, Vol. XLV (December 2007)

33 Robert Moffitt (2005), for example, explains that“[The argument in Holland (1986) that race can not be acause because it can not be manipulated resultsfrom] . . . a mistaken application of the experimental anal-ogy, and the more basic counterfactual analogy is thesuperior and more general one. It does make conceptualsense to imagine that, at any point in the lifetime of (say)an African-American, having experienced everything shehas experienced up to that time, her skin color werechanged to white (this is sometimes called a gedanken, orthought, experiment). Although it is a well-defined ques-tion, it may nevertheless be unanswerable, and it may noteven be the main question of interest. For example, wouldthe individual in question move to a different neighbor-hood, live in a different family, and go to a differentschool? If not, the question is not very interesting” (p.105). While a distinction between comparisons one couldmake and those that are possible is important (I wish tothink of manipulable quite broadly), I find such discussionconfusing. If I were to wake up tomorrow and discoverthat my skin color had changed dramatically, one possible

reaction might be a visit to the Centers for DiseaseControl to learn if I had acquired an obscure disease!Whether or not I moved to a different neighborhood, ordivorced my wife, if that response were typical of otherwhite folks who woke up one day to find themselves“black,” I would nonetheless hesitate to say that the“causal effect of being black” (or white) is an increase inthe probability that one makes a visit to the CDC [asabove], though it could be so described. Again, absentsome discussion of a class of counterfactual states andhypothetical manipulations, for me it is hard to know whatto make of such causes, even when they can be defined.

34 In this regard, the philosopher Hacking has done a great deal to show that useful work can be done in areas that vary quite widely in how well posed the questions are. For a study of statistical questions, see Hacking (1965); on the role of experimentation in natural science, Hacking (1983b); on multiple personality disorder, Hacking (1995); and on the "social construction of reality," Hacking (2000), for example.


themselves" for short periods of time might help one lose weight but wouldn't necessarily promote longevity (although it might, who knows?). Similarly, we might expect weight loss that results from increased physical activity to be more protective than weight loss that results from increased life stress.

The experience in the United States with the drugs fenfluramine and dexfenfluramine (Redux) is a case in point. Despite good evidence that the causal effect of taking Redux was weight loss, the drugs were pulled from the market because a "side effect" of the medication was an increase in potentially serious heart problems (Food and Drug Administration 1997).

Indeed, it would appear that the presumption that obesity is a cause of ill health made it virtually impossible to debate whether nonobesity was the cause of the increased heart problems. Rather, the consensus seems to be that the heart problems were not caused by nonobesity, but rather by Redux's "side effects."36 I don't want to argue that "ideal weight" is bad for one's health, only that this example highlights the fact that the effect of weight loss or weight gain is inextricably implementation-specific. If one accepts this logic, much of the research claiming that being nonobese (and nonunderweight) is causally related to better health demonstrates no such thing. Indeed, this literature is filled with "anomalous" results.37 Moreover, "theory" seems to provide little help: even the weakest research


35 This point is not in any way unique to me. For different, but not unrelated, views of these issues with relevance to social science, see Holland (1986), David A. Freedman (1999), Judea Pearl (1997), Heckman (2005) and William R. Shadish, Thomas D. Cook, and Donald T. Campbell (2002), to name just a few.

36 I am merely stipulating to the existence of a distinction between "effects" and "side effects"; frequently the distinction seems to be based on marketing rather than scientific concerns.

37 In an excellent but unfortunately unpublished study, Jerome Timothy Gronniger (2003) finds that if one includes more careful OLS-type controls for income, the putative effect of obesity is actually protective for many income groups. In arguing against viewing obesity as a "causal" factor in all-cause mortality, he also observes that the salient policy question "is not what obesity does to people, but what removing obesity would do to people." A heavily abridged version of the article appears as Gronniger (2006).

"approximation" to a question satisfying the above desiderata, the burden of explaining what is meant in plain language should be borne by the author. Too frequently, however, it is not.35

4. Just Because We Can Manipulate It Doesn't Mean We Can Learn About It

One of the serious problems with a focus on the RCT is the misleading view that we can always learn about causes from manipulations. Cartwright (2007a, 2007b) makes the point with greater generality than I can here. Rather, I would like to focus on one class of problems with direct relevance for much of the research described in Freakonomics. My argument is simply that although we can learn about the effect of an intervention in a well-designed study, we aren't guaranteed to learn about the putative cause in question, because the cause under consideration may be inextricably implementation specific. Consider the "causal effect" of obesity on all-cause mortality. The literature hardly seems to doubt that it is possible to measure such an effect, though there may be problems—perhaps body mass index (BMI) is an inappropriate measure, for example.

Nonetheless, I would argue that it is unlikely that anyone will devise a severe test of the proposition that obesity causes an increase in all-cause mortality. Simply put, the effect of obesity (or of ideal weight) is inextricably implementation specific. That is, it is not helpful to think about the "effect" of obesity for the same reason it is not helpful to debate the "causal effect" of race on income (Granger 1986, p. 967).

Many of us suspect, for example, that encouraging obese individuals to "starve


38 Suppose the world were as complicated as Behavior = G + E − G • E, where G is some index summarizing "genes" and E is some index summarizing "environment." In this simple example, the fraction of variation in behavior induced by differences in genes isn't separable from the environment—indeed, the effect of genes is a function of the environment. In some environments, introducing differences in genes would introduce little change in behavior, and in some environments it would change behavior a lot. For a useful discussion that addresses this and other related points, see Heckman (1995).

surprisingly, those who win the lottery perform no better on the usual measures of "performance" (and sometimes worse) than lottery losers.

(3) Even though I can't come up with a simple "experiment" to test the hypothesis that "honesty may be more important to good parenting than spanking is to bad parenting" (p. 171), I think honesty is a good strategy (even if it didn't have a causal effect on a child's test scores; the salient issues have to do with ethical behavior).

In the setup to this discussion, Levitt and Dubner begin with a summary of previous work: "A long line of studies, including research into twins who were separated at birth, had already concluded that genes alone are responsible for perhaps 50 percent of a child's personality and abilities" (p. 154). As any student of regression knows, this statement doesn't even make sense unless the world is of the simplest sort imagined by regression's eugenicist forefathers.38

Obviously, as careful as Cullen, Jacob, and Levitt (2003) is, it is completely silent on this unanswerable question.
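To see why the "50 percent" claim is not even well defined outside the simplest possible world, consider a minimal simulation sketch of the interactive world described in footnote 38 (Behavior = G + E − G • E). The code and numbers below are purely illustrative assumptions of mine, not anything estimated in the studies under discussion: the same genetic process "explains" nearly all of the variation in behavior under one environment and essentially none of it under another.

```python
import numpy as np

# Toy version of footnote 38's world: Behavior = G + E - G*E.
# The "share of a child's personality and abilities due to genes" is not a
# fixed fact about G: it depends on the distribution of the environment.
rng = np.random.default_rng(0)
n = 200_000
G = rng.normal(size=n)  # index summarizing "genes"

for e_mean in (0.0, 1.0):
    E = rng.normal(loc=e_mean, scale=0.1, size=n)  # index summarizing "environment"
    behavior = G + E - G * E

    # R-squared from regressing behavior on G alone (with a constant).
    X = np.column_stack([np.ones(n), G])
    coef, *_ = np.linalg.lstsq(X, behavior, rcond=None)
    resid = behavior - X @ coef
    r2 = 1.0 - resid.var() / behavior.var()
    print(f"environment centered at {e_mean}: R^2 of behavior on G = {r2:.2f}")
```

With the environment centered at zero, G accounts for nearly all of the variation; with the environment centered at one, it accounts for essentially none, even though the "genetic" process is identical in both cases.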

Much of the chapter, a discussion of Roland G. Fryer and Levitt (2004b) (pp. 163–76), is a long hike in a forest of confusion. Surprisingly, the authors use it to deliver a short tutorial about regression analysis ("knowing what you now know about regression analysis, conventional wisdom, and the art of parenting") and they spend a great deal of time discussing what is essentially a pair of "kitchen sink regressions"

design often comes with an impressive theory. M. Cournot et al. (2007), for example, finds an "association" between obesity and lower "cognitive functioning" (verified by a simple cross-sectional design regressing measures of cognitive functioning on a small number of covariates and BMI) and posits one possible "theoretical" reason for why the link might be "causal": "direct action of adiposity on neuronal tissue through neurochemical mediators produced by the adipocyte" (fat cell).

My point is simple: when each way of "assigning" obesity that we can imagine would be expected to produce a different effect on all-cause mortality or other outcomes, it is not at all clear that it is helpful to debate the "effect of obesity." It seems more intelligible (and more policy relevant) to discuss the effect of Redux or exercise than it is to talk about the "effect" of obesity.

4.1 How Much Do Parents Matter?

Though some of the "interesting" questions in economics might admit of a meaningful causal (or other) interpretation, one often hopes for more explanation than is provided in several of the examples in Freakonomics. Indeed, the obesity example above is arguably a bit clearer than the question they pursue in two chapters—"how much do parents really matter?"

Let me begin by stating that there is much I agree with in the chapters:

(1) The advice of "parenting experts" should be met with deep skepticism at best.

(2) The research in Julie Berry Cullen, Brian A. Jacob, and Levitt (2003) justifies a longer discussion than the two pages the book provides. It is qualitatively several notches above most of the research done on school choice, evaluates an actual (not a hypothetical) policy, and is a marvel of clarity and honest reporting of results. They exploit a randomized lottery that determines whether some children get to "choose" the public school they attend. Perhaps


39 I think they mean "so imprecisely estimated that a null hypothesis of no correlation can not be rejected using standard procedures."

40 From Appendix A-2, when the dependent variable is math scores, the coefficient on WIC is −0.120 with a standard error of 0.020. When the dependent variable is reading scores, the coefficient on WIC is −0.104 with a standard error of 0.021.

(regressions with enormous numbers of covariates) from appendix A-2 of Fryer and Levitt (2004b) using data from the Early Childhood Longitudinal Study. In their presentation, they invite the reader to consider several things that are positively correlated with a child's test scores (presumably after conditioning on a huge laundry list of [unmentioned] variables):

the child has highly educated parents, the child's parents have high socio-economic status, the child's birth mother was thirty or older at the time of her first child's birth, the child had low birth weight, the child's parents speak English in the house, the child is adopted, the child's parents are involved in the PTA, the child has many books in his home.

As well as things that "aren't correlated"39: the child's family is intact, the child's parents recently moved into a better neighborhood, the child's mother didn't work between birth and kindergarten, the child attended Head Start, the child's parents regularly take him to museums, the child is regularly spanked, the child frequently watches television, the child's parents regularly read to him every day.

At some points, they seem to suggest that the results of this analysis speak to nothing causal: "the ECLS data don't say that books in the house [or any of the variables in their analysis] cause high test scores; it says only that the two are correlated." Elsewhere they seem to suggest the opposite:

Now a researcher is able to tease some insights from this very complicated set of data. He can line up all the children who share many characteristics—all the circuit boards that have their switches flipped in the same direction—and then pinpoint the single characteristic they don't share. This is how he isolates the true impact of that single switch—and, eventually, of every switch—becomes manifest (p. 162).

I would maintain that, even allowing for the simplification of the argument for a general audience, this is a bad description of what makes for credible research—nothing is being severely tested.

For example, whatever one thinks of Head Start, accepting Dubner and Levitt's observation that "according to the [kitchen sink regression using] ECLS data, Head Start does nothing for a child's future test scores" seems unwise at best. The research design can not credibly support that inference. To make this clear, consider other inferences (though not discussed in Freakonomics) from the same regressions. Why not, for example, observe that participation in WIC (Women, Infants, and Children) significantly lowers test scores?40 Perhaps such assistance actively harms children. I would argue that the good reason for avoiding that inference works just as well as a rationale for avoiding the inference they do make about Head Start: there is no reason to believe that (conditional on the other nonrandomly assigned regressors) a coefficient in a kitchen sink regression reliably informs us about causation in any sense.
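To make the worry concrete, here is a minimal simulation sketch—my own invented numbers, not the ECLS data or Fryer and Levitt's specification—in which a program with a true causal effect of exactly zero nonetheless receives a sizable negative "kitchen sink" coefficient, simply because participation is concentrated among children facing an unobserved disadvantage.

```python
import numpy as np

# Invented data: a program whose true effect on test scores is zero picks up
# a negative coefficient in a "kitchen sink" regression because participation
# is selected on an unobserved disadvantage.
rng = np.random.default_rng(1)
n = 50_000

disadvantage = rng.normal(size=n)          # unobserved by the researcher
program = (disadvantage + rng.normal(size=n) > 1.0).astype(float)
books = rng.normal(size=n)                 # innocuous observed covariates
pta = rng.normal(size=n)

true_effect = 0.0
score = -1.5 * disadvantage + 0.3 * books + true_effect * program + rng.normal(size=n)

# OLS of score on the program dummy and the observed covariates only.
X = np.column_stack([np.ones(n), program, books, pta])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"'kitchen sink' coefficient on the program: {coef[1]:.2f} "
      f"(true causal effect: {true_effect})")
```

Nothing about the regression output itself signals whether the negative coefficient reflects harm or selection; only knowledge of how participation was assigned could.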

Again, even kitchen sink regressions have their place: one can sometimes make a case for inclusion of scores of covariates in some very selected contexts. However, an algorithm which allows the researcher to decide which coefficients represent "causal" effects and which ones are regression artifacts after one has seen the regression output is unlikely to result in much progress in understanding. It is the very antithesis of a severe test.

4.2 Can Regression Help Distinguish "Cause" from "Consequence"?

Chapter 6, "Perfect Parenting, Part II; or: Would a Roshanda by Any Other Name Smell as Sweet?" begins this way:

Levitt thinks he is onto something with a new paper about black names. He wanted to


41 The most notorious example perhaps is the controversy over the 1840 census that involved the putative negative correlation between the number of "insane and idiotic colored persons" living in a state and the proportion that were slaves. The data, which are still available today from the ICPSR, show that incidence of insanity was far, far lower in the South, and the implication for the debate on slavery was clear (Gerald N. Grob 1978). (A far different version of "acting white" is mentioned several times in Freakonomics.)

42 I am stipulating, of course, that Levitt and Fryer's measure of "distinctiveness" of a "Black" name (BNI)—crudely put, a function of the relative frequency with which a specific name is chosen for black children and the relative frequency with which the same name is chosen for white children—provides a measure of whatever "culture" is. A lot of nonobvious measurement issues arise. A few moments' reflection, for instance, makes clear that the level of "black culture" is, by definition, a function of "white" culture, although one doubts this research design would have found much appeal as a study of "white culture." Second, a white man named Maurice Ravel might be measured as having more black culture than a black man named Paul Robeson Jr. regardless of their actual "culture" if Maurice was relatively more popular among blacks than Paul.

involving their measure of "black culture," the "Black Name Index" (BNI).42

It is not clear whether the BNI is an x or a y: superficially, it would appear that they run the regressions "both ways": in one set of regressions, BNI is an independent variable; in a second set, it plays the role of a dependent variable. As is well appreciated, this is a problem even when it occurs in different literatures (John Kennan 1989).

Further inspection suggests that this is not strictly the case: in the first set of regressions (see table 2, "Determinants of Name Choices Among Blacks," of Fryer and Levitt 2004a) the dependent variable is the BNI of a given child, and the explanatory variables are a number of things, many of which are presumably correlated with outcomes (mother's age at time of birth, father's age at time of birth, months of prenatal care, percentage of Black babies in zip code, per capita income in the birth place, parental education, etc.). In another set (table 3, "The Relationship Between Names and Life Outcomes"), BNI becomes an explanatory variable and the dependent variables are outcomes such as "percent Black in residential zip code as an adult," years of education (the woman herself), the woman's age at first birth, etc.
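A small simulation sketch may clarify why running the regressions "both ways" cannot settle the cause-versus-consequence question. The data below are invented and bear no relation to Fryer and Levitt's: a single common factor generates both a BNI-like index and a later outcome, so both regressions produce healthy coefficients even though, by construction, neither variable causes the other.

```python
import numpy as np

# Invented data: one common factor (think of it as measured family or
# neighborhood circumstances) drives both a BNI-like naming index and an
# adult outcome.
rng = np.random.default_rng(2)
n = 100_000

circumstances = rng.normal(size=n)
bni = 0.8 * circumstances + rng.normal(size=n)       # no causal role, by construction
outcome = -0.7 * circumstances + rng.normal(size=n)  # e.g., an adult economic outcome

def ols_slope(y, x):
    """OLS slope of y on x, with a constant."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Regressions run "both ways," loosely in the spirit of tables 2 and 3:
print(f"outcome on BNI:        slope = {ols_slope(outcome, bni):+.2f}")
print(f"BNI on circumstances:  slope = {ols_slope(bni, circumstances):+.2f}")
# Both coefficients are "significant" in a sample this large, yet by
# construction BNI neither causes nor is caused by the outcome.
```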

Fryer and Levitt (2004a) are forthright in admitting that their evidence is consistent with a number of very plausible (but very different) alternatives—alternatives consistent with their regressions but not necessarily with their conclusion: "With respect to this

know if someone with a distinctly black name suffers an economic penalty. His answer—contrary to other recent research—is no. But now he has a bigger question: Is black culture a cause of racial inequality or is it a consequence? For an economist, even for Levitt, this is new turf—"quantifying culture" he calls it. As a task, he finds it thorny, messy, perhaps impossible, and deeply tantalizing (p. 177).

As with eugenics, the history of social science suggests that scholarly research into race that makes extensive use of correlations should be taken with a large grain of salt.41

When talking about race, it is my view that being clear about what is meant is even more important.

As someone who is frequently called upon as an econometric "script doctor" to "fix the econometrics" of some existing paper which is putatively about "causation," I have found it useful to begin with two seemingly simple questions:

(1) What is y, the outcome, you wish to explain?

(2) What are your key x variables and what potential "causes" or "interventions" are you interested in?

As a practical matter, the inability to provide a simple reply to these questions is a good predictor of my inability to understand the empirical work.

The above quote from Freakonomics is in a chapter which, inter alia, discusses research from Fryer and Levitt (2004a) and (far more briefly) Marianne Bertrand and Sendhil Mullainathan (2004). In Fryer and Levitt (2004a), much of the evidence on whether "black names" are cause or consequence comes from two types of regressions


43 The paper seems to suggest that they have the usual "manipulationist" version of cause in mind. For example, there is a brief mention of the fact that there are no obvious instrumental variables, which would be of no moment unless they conceived of a potential manipulation.

44 The fact that employers call back "Jamals" much less frequently than "Johns" may not be based solely on self-conscious racial hatred, but might reflect "only" "statistical discrimination" (i.e., employers are merely acting as sophisticated econometricians, extracting all the useful information not provided by a résumé about the likely productivity of workers based on their first names, and then choosing based exclusively on "merit") or some other mechanism (although this may be of little comfort to Jamal or John). See Thacher (2002) for a thoughtful discussion of the issues involved in "profiling."

Even if one agrees to stipulate that a limitation of such studies is their inability to explain "why" (although the concern for "why" is not pressed very hard elsewhere in Freakonomics regarding the motives of Sumo wrestlers or school teachers, for example), Bertrand and Mullainathan (2004) clearly explain that they are not interested in the lifetime "economic cost" of a black-sounding name—which is not obviously an interesting or well-posed question. Rather they are interested in "experimentally manipulat[ing] [an employer's] perception of race." In contrast to the thought experiment of manipulating a person's "culture" or "black name," Bertrand and Mullainathan seem to ask a well-posed question: it is much easier to conceive of a salient experiment manipulating "perceptions" than a salient experiment manipulating the naming decisions of parents. One can argue that the causal effect of manipulating perceptions of race is "uninteresting" on a number of grounds, not the least of which is that the manipulation itself doesn't suggest an intervention we might wish to undertake as a society. On the other hand, in contrast with some experiments in "experimental economics," their study is embedded rather more deeply in "real life" than experiments that occur in a lab. Nonetheless, the question seems well-posed and may be answerable with regression, even if one wants to argue that it is uninteresting on other grounds.44

Second, although Dubner and Levitt are correct to argue that studies involving résumé randomization are unlikely to provide convincing evidence on "why DeShawn gets fewer callbacks," it is not clear what a satisfactory explanation of "why" would look like.

particular aspect of distinctive Black culture, we conclude that carrying a black name is primarily a consequence rather than a cause of poverty and segregation."

I have no wish to dispute their conclusion; rather, I wish to suggest that there is no configuration of the data of which I am aware which would credibly support the view held by Fryer and Levitt and not support very different alternatives. In short, this is because it is very difficult to know what is being asked and what would constitute an answer. Put differently, there is at least one ill-posed question floating about. Is it possible to talk meaningfully about "manipulating" culture? (And if one could, would one want to?)43 Might reasonable people agree on some variable or policy that served exclusively to manipulate black culture and affected economic outcomes only through its effect on "culture"? It is not even clear that "culture" and "economic outcomes" or "racial inequalities" are distinct entities. Indeed, as the word is often understood, culture often includes the distribution of "economic outcomes." For instance, one might remark: "the fact that Bill Gates earns several times more in a year than the sum earned by all Chicago Public School teachers is a distressing fact about U.S. culture."

Further muddling the issue is the way Levitt and Dubner discuss studies such as Bertrand and Mullainathan (2004):

So how does it matter if you have a very white name or a very black name? . . . In a typical audit study, a researcher would send two identical (and fake) résumés, one with a traditionally minority-sounding name, to potential employers. The "white" résumés have always gleaned more job interviews . . . . The implication is that black-sounding names carry an economic penalty. Such studies are tantalizing but severely limited, for they can't explain why [someone with a black sounding name like] DeShawn didn't get the call (p. 186).


45 Eugenics, often popular among "progressive" members of the elite, was a leading motive for the development of regression. Sir Francis Galton, who gave us the word "regression," was an ardent eugenicist. For example, what is now the "Galton Laboratory, Department of Human Genetics and Biometry" at University College London was originally named the "Galton Laboratory of National Eugenics."

It is even harder to understand how the type of regressions performed in Fryer and Levitt (2004a) would, in principle, be relevant to this discussion. (Again, they might be, but the link is not obvious to me.) Perhaps like Dr. Pangloss, we could trace Jamal's bad luck with employers to necessity: it is necessary for this to be the case, for us to be able to live in this, the best of all possible worlds.

More generally, reasoning backward from a single effect (not calling back Jamal) to a "cause" (why employers don't call Jamal) in social science is fraught with peril; people are complicated enough that there is rarely a single answer to the question "why"—often there are many interacting "reasons." Absent some fairly articulated model of how the world works, it seems difficult even to know what would constitute a good answer. A severe test of the claim seems even more unlikely. Moreover, it often seems that putative explanations of "why" some complex human interaction occurs are frequently used as a device to end a debate just at the point when the issue begins to get interesting. If X is the reason Y occurs, why look further? Many readers might be familiar with this aspect of some answers to "why" questions: one thinks of a parent who tries to end a long conversation with a child who, in response to the parent's increasingly complicated responses, keeps asking "Why?" Again, it is not that a satisfactory answer to such questions is not desirable: it just seems like way too much to hope for from a small set of OLS regressions.

Finally, in asking a regression to distinguish "black culture" as a cause from black culture as a consequence of economic conditions, we are very far from the types of questions I discussed in section 3. But there is no clear discussion in Freakonomics of what question is being asked nor of the "ground rules" that we might use to determine when the question has been answered satisfactorily. It is possible that the question is well posed, but at a minimum, it is not very obvious. After reading Freakonomics and the original source material, I haven't gained any understanding of the issues involved or even how to think about what the answerable questions are.

4.3 Why a Transparent Research Design Helps—Abortion as a "Cause"

For me the most confusing section of Freakonomics is the discussion of "Why do drug dealers live with their moms?" and "Where have all the criminals gone?" Between them, the chapters contain references to scores of articles of varying degrees of scholarship. Much of the former chapter discusses Levitt's work with sociologist Sudhir Alladi Venkatesh, who collected a large amount of detailed data on one Chicago gang. For those surprised as to why gang members don't frequently live in the nicest homes in town, it will be a useful corrective. (For an earlier discussion that covers similar ground, see Peter Reuter, Robert MacCoun, and Patrick Murphy 1990.) The discussion also includes the conclusions of some very careful work by Douglas V. Almond, Kenneth Y. Chay, and Michael Greenstone (2003) that documents the key role that hospital integration in Mississippi played in improving the appalling infant mortality rate of black children—before integration, these infants were often left to die of very preventable causes such as diarrhea and pneumonia.

Much of the chapter on "where have all the criminals gone?" deals with Romania's abortion ban, which I have discussed elsewhere (DiNardo 2006a). This chapter also includes the controversial material on whether "abortion lowers crime rates."

As a purely personal matter, given the long, deep, and ugly relationship between statistical analysis and eugenics, what might emerge from this debate seems too meager to justify the effort on this subject. I don't find the question "interesting."45 Merely


46 Indeed, the debate has grown coarser. Consider this partial transcript and discussion by Levitt of remarks by William Bennett. (For clarity, in what follows, text and transcript material from the blog is in italics.) Bennett, a former government official, after appearing to dismiss the "abortion–crime" hypothesis in Freakonomics, remarked on a talk show that:

BENNETT: Well, I don't think it is either, I don't think it is either, because, first of all, there is just too much that you don't know. But I do know that it's true that if you wanted to reduce crime, you could—if that were your sole purpose, you could abort every black baby in this country, and your crime rate would go down. That would be an impossible, ridiculous, and morally reprehensible thing to do, but your crime rate would go down.

Everyone agrees with Bennett that "it would be a morally reprehensible thing to do." On the other hand, his premise that "you could abort every black baby in this country and the crime rate would go down" is unsupportable at best, racist at worst. Levitt's thoughts on the subject (as well as a transcript of the relevant portion of Bennett's remarks) are available at the website http://freakonomics.blogs.nytimes.com/2005/09/30/bill-bennett-and-freakonomics/. For what it's worth, Levitt's remarks are a mixture of what strike me as reasonable assertions and others that are confusing at best, wrong at worst. For example, consider Levitt's points 6 and 7:

6) If we lived in a world in which the government chose who gets to reproduce, then Bennett would be correct in saying that "you could abort every black baby in this country, and your crime rate would go down." Of course, it would also be true that if we aborted every white, Asian, male, Republican, and Democratic baby in that world, crime would also fall. Immediately after he made the statement about blacks, he followed it up by saying, "That would be an impossible, ridiculous, and morally reprehensible thing to do, but your crime rate would go down." He made a factual statement (if you prohibit any group from reproducing, then the crime rate will go down), and then he noted that just because a statement is true, it doesn't mean that it is desirable or moral. That is, of course, an incredibly important distinction and one that we make over and over in Freakonomics.

7) There is one thing I would take Bennett to task for: first saying that he doesn't believe our abortion–crime hypothesis but then revealing that he does believe it with his comments about black babies. You can't have it both ways.

As far as I can tell, Levitt's statement about lowering the level of crime by aborting Native American, Republican, . . . fetuses is a non sequitur at best. Bennett is clearly talking about the rate of crime. I can only make sense of the statement by construing it to mean that ridding the planet of human life would eliminate crime (at least that caused by humans). As to the rest of the explanation, Levitt gives no reason to believe that "if we lived in a world in which the government chose who gets to reproduce, then Bennett would be correct in saying that 'you could abort every black baby in this country, and your crime rate would go down.'"

Contrary to Levitt's claim, I do not think it necessary to believe that the termination of black fetuses would lower the crime rate even if the "causal effect of abortion legalization" in the United States had been a reduction in crime. As I explain below, even if one stipulates that crime reduction was a causal effect of abortion legalization in the United States, this would tell us nothing about the causal consequences of aborting black (or any) fetuses.

47 One could conceive of cases where abortion might be thought of (for better or worse) as a treatment: that is generally true when the subject of interest is child-bearing women (not their fetuses). The question of what happened to the welfare of women who were given the choice of having an abortion, relative to those who were denied such choice, is well posed. One merely would seek to compare a group of women given the opportunity to have an abortion to those who were not. Even putting aside the serious ethical questions, this is much easier said than done (and indeed is the subject of much of the pre–Donohue and Levitt (2001) work by economists on the consequences of abortion legalization).

48 I have not been able to figure out what role this hypothesis plays in the empirical work. See Christopher L. Foote and Christopher F. Goetz (forthcoming) for an attempt to make sense of this; they come to different conclusions than Donohue and Levitt (2001).

(p. 386).48 While possibly "simple," it is amazingly difficult to articulate clearly in a regression framework where the unit of observation is the individual. At its core this hypothesis appears to include the implicit assertion that, among other things, my mother's decision not to abort the fetal John DiNardo caused some other children's propensity to commit crime to increase. (Although it should be said, it clearly raised mine!) Such effects are difficult to identify, even in the easiest cases (Charles F. Manski 1993).

A far more subtle mechanism is distinct from the first, although it could certainly

participating in the discussion one runs the risk of coarsening the debate on how we treat the poor—the usual target of eugenic policies.46 Caveats aside, here goes.

In their original article, John J. Donohue and Levitt (2001) cite two possible "theories" about the consequences of abortion legalization. Neither of them fits well into the framework described in section 3.47 Donohue and Levitt (2001) discuss two possible mechanisms at length.

Donohue and Levitt (2001) first argue that "The simplest way in which legalized abortion reduces crime is through smaller cohort sizes"


49 Indeed, this or a similar identification strategy is employed in such work as Kerwin Kofi Charles and Melvin Stephens (2006), Jonathan Gruber, Phillip Levine, and Douglas Staiger (1999), Marianne Bitler and Madeline Zavodny (2002), as well as Joyce (2004). Gruber, Levine, and Staiger (1999) detect a rather small (and brief) effect on the total number of children born from this identification strategy. Note, of course, that such an experiment would provide us essentially no information on the "mechanisms"—the "effect of abortion legalization" could be a complicated interaction of many things having little to do with selective abortion or cohort size per se. Merely the option of having an abortion might change outcomes for many reasons.

earlier than the other forty-five states and the District of Columbia. Between 1988 and 1994, violent crime in the early-legalizing states fell 13 percent compared to the other states; between 1994 and 1997, their murder rates fell 23 percent more than those of the other states (p. 140).

Of the identification strategies employed in this literature, this is the most transparent. To understand what is going on, assume that pre-Roe legalization provided a Brandeisian natural experiment. Instead of the individual being the unit of observation, think of each state as a sort of identical petri dish to which a drop of abortion legalization is being added. Fifteen to twenty-five years later, the petri dishes will be checked again to see how much per capita crime is occurring. If legalization had been an actual experiment (perhaps run by a dictator), we might have expected half the states to be legalizers and the other half to never legalize (assume that items in the petri dishes can't jump into other petri dishes). That, of course, did not happen. In this case, the experimenter added a drop of legalization to five states in 1970, and then added a drop to the remaining states a scant three years later. Of course, it wouldn't be clear that even in this experiment you could detect an "effect" on crime, unless the effect was large relative to the variation across the petri dishes expected in the absence of any experiment.49
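The petri-dish logic can be illustrated with a toy calculation using entirely invented numbers: with only five early-legalizing "dishes," the early-versus-late difference in crime changes bounces around considerably even when legalization has no effect at all, so only a very large true effect could stand out against that background noise.

```python
import numpy as np

# Toy "petri dish" calculation with invented numbers: how much does the
# early-minus-late gap in crime changes move around when legalization has
# no effect at all?
rng = np.random.default_rng(3)
n_states, n_early, n_reps = 51, 5, 10_000
sd_state_change = 10.0  # assumed cross-state sd of percent changes in crime

gaps = np.empty(n_reps)
for r in range(n_reps):
    change = rng.normal(scale=sd_state_change, size=n_states)  # no treatment effect
    early = np.zeros(n_states, dtype=bool)
    early[rng.choice(n_states, size=n_early, replace=False)] = True
    gaps[r] = change[early].mean() - change[~early].mean()

print(f"sd of the early-minus-late gap under no effect: {gaps.std():.1f} points")
# A true "effect" would have to be large relative to this spread before the
# five-state comparison could plausibly detect it.
```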

Though one would not know it from reading Freakonomics, Donohue and Levitt (2001) argue that this research design is

interact with it. "Far more interesting from our perspective is the possibility that abortion has a disproportionate effect on the births of those who are most at risk of engaging in criminal behavior" (Donohue and Levitt 2001, p. 386).

Even if we could agree that the effect of abortion legalization is independent of other aspects of the society (access to birth control, women's rights in other spheres, etc.), for anyone who has given the problem of "missing data" some thought, it is difficult to be sanguine about the possibility of inferring much about the criminal propensities of those who are never born. Even in the context of a medical RCT, the analogous problem of attrition is often distressingly difficult to cope with. Moreover, the problem is so difficult that in the RCT one often abandons hope of modeling nonresponse or sample selection and seeks merely to bound the difference between the treated and control groups (Joel L. Horowitz and Manski 1998).

Moreover, as Donohue and Levitt (2001) observe, there are many mechanisms besides abortion either to stop the "criminogenic" fetus from being born or to prevent the child from becoming a "criminal" once born: "Equivalent reductions in crime could in principle be obtained through alternatives for abortion, such as more effective birth control, or providing better environments for those children at greatest risk for future crime" (p. 415).

Ironically, this observation points to a lot of (unasked) questions which are interesting and might conceivably be put to a severe test. The focus in Freakonomics unfortunately is elsewhere:

How, then, can we tell if the abortion–crime link is a case of causality rather than simply correlation? One way to test the effect of abortion on crime would be to measure crime data in the five states where abortion was made legal before the Supreme Court extended abortion rights to the rest of the country . . . And indeed, those early-legalizing states saw crime begin to fall


50 They argue against the identification strategy both on a priori grounds and on ex post grounds (the implausibility of the results so obtained). In Donohue and Levitt (2001), for example, when they deploy that identification strategy, they report that "the cumulative decrease in crime between 1982–97 for early-legalizing states compared with the rest of the nation is 16.2 percent greater for murder, 30.4 percent greater for violent crime, and 35.3 percent greater for property crime. Realistically, these crime decreases are too large to be attributed to the three-year head start in the early-legalizing states." The reservations in Donohue and Levitt (2001) about the estimates generated with this identification strategy do not appear in Freakonomics, which selectively discusses some comparison between early and late legalizing states.

51 The asterisk appears to be undefined in the text and may be a typographical error.

52 As it turns out, the description of the regressions in the text and the actual regressions run are not always the same. See Foote and Goetz (forthcoming).

53 This is perhaps more than we should stipulate to: our knowledge of the number of illegal abortions today or abortions that preceded abortion legalization in the 1970s is meager at best. Moreover, Donohue and Levitt (2001) and other researchers typically do not have data on the amount of crime committed by individuals of a given age. At best one has very crude proxies. See Charles and Stephens (2006) or Joyce (2004) for discussion.

54 In the published version of the paper, the word "endogeneity" appears only regarding a discussion of two right-hand-side variables—number of police and prisons—which are "lagged to minimize endogeneity." The word "exogeneity" appears in a confusing discussion about the difference between high and low abortion states (p. 401).

inadequate.50 Consequently, much of this is beside the point. Donohue and Levitt (2001) argue that evidence from such a research design is only "suggestive."

The bulk of their argument centers on their attempt to analyze the relationship "more systematically," using state-level crime data and lagged "abortion rates."

Consider equation (1) from Donohue and Levitt (2001):

(1)  Effective Abortion_t ≡ A_t = Σ_a Abortion_(t−a) ∗ (Arrests_a / Arrests_total).

They label A_t the "effective abortion rate." The "a" subscript denotes a particular "age" group.51 Using data on state s at time t, they then divide this by the number of live births to get an "effective abortion ratio":

A_st = Effective Abortion_st / LB_st.

Much of the more "systematic" evidence on the link between abortion legalization and crime is a result of regressions of the form:

(2)  log Crime Per Capita_st = β1 A_st + X_st θ + γ_s + λ_t + ε_st,

where each observation is the relevant state/year average or value. The X_st are a set of covariates, the γ_s are a set of state dummy variables, and the λ_t are a set of year fixed effects. The ε_st is a random disturbance that is presumably uncorrelated with any of the regressors. Up to a constant that differs by state, absent variation in X or the (modified) abortion ratio, it is assumed that trends across states in crime would be the same.52
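For readers who want to see the mechanics, the following is a sketch of the specification in equations (1) and (2) on fabricated data—the arrest weights, abortion rates, and crime figures are all invented, and the X_st covariates and the division by live births are omitted for brevity. It shows only how the "effective abortion rate" is constructed and entered into a two-way fixed effects regression; it says nothing about what the resulting coefficient means, which is precisely the point at issue.

```python
import numpy as np

# Fabricated panel illustrating equations (1) and (2). Arrest weights,
# abortion rates, and crime are all invented; covariates X_st and the
# division by live births are omitted for brevity.
rng = np.random.default_rng(4)
n_states, n_years, n_ages = 51, 13, 10

arrest_share = np.full(n_ages, 1.0 / n_ages)         # Arrests_a / Arrests_total
abortion_lagged = rng.uniform(0, 300, size=(n_states, n_years, n_ages))
A = (abortion_lagged * arrest_share).sum(axis=2)     # equation (1), by state and year

log_crime = rng.normal(size=(n_states, n_years))     # placeholder outcome

# Equation (2): log crime per capita on A_st plus state and year fixed effects.
s_idx, t_idx = np.meshgrid(np.arange(n_states), np.arange(n_years), indexing="ij")
state_dummies = np.eye(n_states)[s_idx.ravel()]
year_dummies = np.eye(n_years)[t_idx.ravel()][:, 1:]  # drop one year to avoid collinearity
X = np.column_stack([A.ravel(), state_dummies, year_dummies])
coef, *_ = np.linalg.lstsq(X, log_crime.ravel(), rcond=None)
print(f"beta_1 on the effective abortion rate: {coef[0]:+.4f}")
```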

Stipulating that all of the data used to generate this specification are fine,53 I find it impossible to interpret the coefficients at all. In common econometric parlance, the abortion ratio is "endogenous." Indeed, some work has looked at the effect of economic and other conditions on abortion (Rebecca M. Blank, Christine C. George, and Rebecca A. London 1996): that is, something akin to A is the dependent variable in the regression. Donohue and Levitt (2001), however, spend surprisingly little time discussing the issue.54

What are the "ground rules" that a skeptical but persuadable person should use for evaluating this regression? Other than that "the coefficients look reasonable," what would speak to the credibility of the research design, or what should lead me to reject it?

Not obvious is the notion that we should be reassured about the existence of an "abortion–crime" link because the OLS coefficient on A in a regression like equation (2) is robust to the inclusion of some covariates or slight modifications of the sample. One "intuition" that motivates investigating whether a result is "robust" to the inclusion


55 To make this clear, consider an analysis made by officials responsible for New York's Powerball lottery. In the March 30, 2005, drawing, a startling number of persons (110) got five out of six numbers correct. According to a news report (Jennifer 8 Lee, "Who Needs Giacomo? Bet on the Fortune Cookie," New York Times, May 11, 2005), past experience with the lottery had led them to believe that, in the 29 states where the game is played, the average number of winners would be more like four or five. After considering numerous hypotheses including fraud, lottery officials finally concluded that some of the winners had chosen their number on the basis of a fortune cookie. Lottery investigators finally even managed to locate the fortune cookie maker who verified that his factory had produced the fortune cookies with the winning number.

in question, such considerations can remain important.

One example is the discussion entitled "What Do Schoolteachers and Sumo Wrestlers Have in Common?" in which the authors discuss some work by Levitt on detecting "teacher cheating." In the telling, the cast of heroes includes the CEO of the Chicago Public School system, and the villains include the school teachers and their labor union ("When [Duncan] took over the public schools, his allegiance lay more with the schoolchildren and their families than with teachers and their unions," p. 36).

The basic method is to analyze the pattern of test answers. Answers that depart from the posited (ad hoc) data generation process are flagged as "cheating." For obvious reasons, at no point in the process are actual data on observed teacher cheating used. As a consequence, the algorithm described has no way of discriminating between the case in which a teacher selectively "corrects" a subset of answers for a class and those cases in which the students (unknown to the teacher) have obtained copies of a subset of the answers, to name one (perhaps unlikely) situation. At a most basic level, of course, there is no perfect way to "detect teacher cheating" with statistical analysis.55

Indeed, the chapter indicates that the "teacher cheating" algorithm was not the sole method used to assess guilt (as one hopes) but remarks with little further curiosity that "the evidence was strong enough

of a large number of explanatory variables comes from the RCT. On average, if we repeat the experiment, the answer we get from including covariates and from excluding covariates should be the same.

On the other hand, clearly it makes no sense to think of A as "randomly assigned." Indeed, if abortion legalization is all about "selection"—i.e., the difference in the crime propensities of those born and those not born—pure random assignment of abortion (a thought too grotesque to even contemplate) would not merely leave the statistical problem unsolved; it would answer a different (even more uninteresting) question. For example, in one version of the Donohue–Levitt story, abortion matters for crime because it is the consequence of choices made by women to selectively abort some fetuses and not others. "Random abortion," on the other hand, would produce no "selection effect"—studying such "random" variation in abortion ratios would be silent about the putative effects of legalizing abortion.

If thinking about the regression as an approximation to some sort of RCT doesn't help, how is one to even assess or interpret the specification? What covariates "should" be included? Missing from this research is either a similarity to the simple type of question I described in section 3 or an explicit model of the link between abortion legalization and cohort size. With an explicit structural model, one might in principle be able to wrap one's mind around what question is being asked. The mere presence of an explicit model might help, although we would still be faced with the task of putting it to some sort of test. Absent that, it is hard to understand why this (or similar evidence) should persuade anyone (one way or the other).

4.4 Catching Cheaters

I have suggested that a focus on actual policies and their potentially predictable consequences often goes a long way to making some queries intelligible. Even where questions about causes are not the only ones


56 The calculation is:

1 − Pr(C|D) = 1 − [Pr(D|C) · Pr(C)] / [Pr(D|C) · Pr(C) + Pr(D|~C) · (1 − Pr(C))]
= 1 − (.9)(.04) / [(.9)(.04) + (.06)(.96)]
= 1 − 0.385
= 0.615.

It should also be noted that the usual way to minimize this problem is to test the teacher more than once: this works, of course, only in the (highly improbable) case that errors from the proposed procedure can be considered independent.

Algorithm|Not Engaged in Cheating) = .06
Pr(C) ≡ Pr(Engaged in Cheating) = .04.

I wasn't able to locate the actual numbers in

Freakonomics and the ones I have chosen seem a bit optimistic for the algorithm they describe (albeit a bit pessimistic about the fraction of cheating teachers). If they were correct, however, it would explain why only a handful of those identified by the algorithm were finally identified as cheaters—despite the large pool of potential cheaters. Many statistically naive readers might conclude that virtually all of those identified as guilty were indeed guilty. The test looks pretty accurate. Few detected cheaters are innocent, and cheaters have a good chance of being caught. However, even in this example, of the roughly 9 percent of teachers classified as cheating on the basis of the algorithm, the majority (about 62 percent) would actually be innocent. This strikes me as a frighteningly high percentage, but perhaps others will disagree.56 A more thoughtful analysis would go even further: does it treat different but morally homogeneous groups differently? It would almost certainly give one a moment's pause if an algorithm were only (or mostly) able to detect cheating among the lowest-paid teachers with the most difficult students, as well as being scarcely able to detect cheating among the most affluent. Freakonomics unfortunately discusses none of these issues.
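A back-of-the-envelope version of the base-rate arithmetic may be useful. The error rates and the 4 percent base rate below are my assumptions (Freakonomics does not report the algorithm's operating characteristics); the point is only that a seemingly accurate test can still flag mostly innocent teachers.

```python
# Back-of-the-envelope version of the calculation in footnote 56
# (assumed numbers, not reported in Freakonomics).
p_detect_given_cheat = 0.90   # 1 minus the Type II error rate
p_detect_given_honest = 0.06  # Type I error rate
p_cheat = 0.04                # assumed base rate of cheating teachers

p_flagged = (p_detect_given_cheat * p_cheat
             + p_detect_given_honest * (1.0 - p_cheat))
p_innocent_given_flagged = p_detect_given_honest * (1.0 - p_cheat) / p_flagged

print(f"share of all teachers flagged:      {p_flagged:.1%}")               # about 9 percent
print(f"share of flagged who are innocent:  {p_innocent_given_flagged:.1%}")  # about 62 percent
```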

5. The Power of Theory

If what we mean by theory is an articulation of the premises of the questions we are

only to get rid of a dozen of them." Given the rest of the discussion, this might come as quite a surprise. Why would such a clever algorithm work so poorly in a situation when there was so much cheating?

Anything but a perfect "test" for the existence or "nonexistence" of something (a virus, for example) commits two types of error—in unhelpful terminology, Type I and Type II. I find the legal metaphor the easiest way to remember the distinction. The legal system in the United States putatively attempts to minimize Type I error—sending an innocent person to jail. Type II error is the opposite mistake—exonerating the guilty. In practice, there is a trade-off between the two types. One way to avoid Type II error is to declare everyone guilty; declare everyone innocent and one avoids Type I error at the expense of Type II error.

This is of course relevant if one is interested in the causal impact of implementing a cheating detection algorithm. Here I focus only on the narrowest causal question: how many innocents are punished by such a system? There are others, to be sure.

If the fact that only a "handful" were caught is a surprise to the reader, it wouldn't be a surprise to those familiar with the findings of Amos Tversky and Daniel Kahneman (1974), who argue that people are frequently inattentive to "base rates" (although that interpretation is the subject of a lively debate). The canonical problem can be illustrated by making a few assumptions about the algorithm discussed in Freakonomics. Suppose that the probability that one's cheating is detected, given that one cheats, is 0.90—the probability of Type II error is 0.1. Also assume that the probability the algorithm incorrectly identifies you as a cheater when you are not is .06—Type I error. Further suppose that 4 percent of teachers cheat—this is the crucial "base rate." Slightly more formally:

Pr(D|C) ≡ Pr(Detected Cheating by Algorithm|Engaged in Cheating) = .90

Pr(D|~C) ≡ Pr(Detected Cheating by


57 This gloss is clearly inadequate. See Cartwright (2007d). See also Alexandrova (2006).

attempting to answer, it is fair to say that a little bit of theory can go a long way.

Despite Freakonomics's various encomiums to the "powerful toolkit of economics" (apart from "regression," which manages to rise to a level no higher than "art"), I could detect very little in the research that depended on "economic theory" in anything but the most superficial way. Perhaps that is all for the best. Cartwright (2007d) makes a convincing argument that the "trouble with [more mathematically sophisticated theory models] is not that [they are] too rigorous, but rather . . . not rigorous enough." Unrealistic assumptions are necessary in all theorizing, but the problem is that the alleged "tendencies" isolated in such models are usually model-dependent in a way that vitiates any "external validity."57 It's one thing to find that a model helps illuminate some small aspect of human responses in a few contexts. In my field of labor economics, for example, the canonical labor–leisure model might help me predict how working mothers in female-headed households adjust their hours of market work to a small change in an existing government welfare program, but I wouldn't use the model to tell me much else about working mothers.

If one were to judge Economics by reading Freakonomics alone, however, it would appear that some economists are successfully going after much bigger game, with economic theory leading the way. How else could a study about professional Sumo speak to anything but the unusual context of Sumo? Presumably some "model" takes us from a finding for wrestlers to something of more general interest. On the other hand, all we learn about economic theory is that it appears to be the premise that "incentives matter." Whatever enthusiasm one might have for the power of that insight, it is not clear what an incentive "is." The helpful index to the book lists the following: incen-

tives, bright line versus murky, as a cornerstone of modern life, criminal, definitions of, discovery and understanding, economic, of experts, invention and enactment of, moral, negative versus positive, power of, of real estate agents, schemes based on, of schoolteachers, social, study, tinkering with, trade-offs inherent in.

Indeed, in Dubner and Levitt's hands, the assertion that incentives are the "cornerstone of modern life" often comes off as a two-part tautology. The first part of the tautology is: "when incentives matter, they matter." The second part of the tautology is that when incentives don't matter, it is because of "moral incentives."

Less than a theory, perhaps, it describes a world view that evokes a sort of neo-Skinnerian behaviorism that in popular writing was most cogently demolished by Noam Chomsky (1971). For example, it was quite easy for me to get confused, when reading Freakonomics, about whether negative and positive incentives were merely synonyms for the Skinnerian notions of negative and positive reinforcement.

Perhaps I read more into the use of the word incentives than is there. However, consider Dubner and Levitt's description of the "typical economist's view" of incentives:

Economists love incentives. They love to dream them up and enact them, study them, and tinker with them. The typical economist believes the world has not yet invented a problem that he can not fix if given a free hand to design the proper incentive scheme. His solution may not always be pretty—it may involve coercion or exorbitant penalties or the violation of civil liberties—but the original problem, rest assured, will be fixed. An incentive is a bullet, a lever, a key: an often tiny object with astonishing power to change a situation (p. 20).

Nonetheless, as elastic as the notion of incentives is, I think it is still way too narrow. Chomsky's (1971) words about B. F. Skinner's views of the power of "reinforcement" seem to apply with equal force to Dubner and


58 Buchanan (1975), page 92; Chapter 6, "The Paradox of 'Being Governed'" at Buchanan (1999) http://www.econlib.org/LIBRARY/Buchanan/buchCv7Contents.html.

59 See Cartwright (1983), especially essay 2 ("The Truth Doesn't Explain Much") and essay 3 ("Do the Laws of Physics State the Facts?") for a discussion of the case of physics.

be epistemologically privileged. Even if social scientists were to learn something "deep," "fundamental," or "primordial" about human behavior that was previously unknown to the skilled novelist, it is unlikely to inform us very much about the type of policies we might like to pursue. As Lecam (1977) observed about a genuine example from real science, "even those physicists who are most fascinated by the kinetic theory of gases would hesitate to use it to compute the size of wood beams for their own abode." Simply put, questions about the predictable consequences of our actions are not well-answered by untested or untestable insights from some "general" economic theory; rather we might learn a little about the predictable consequences of our actions—if we are lucky—by formulating ideas that can be put to a test.

REFERENCES

Alexandrova, Anna. 2006. "Connecting Economic Models to the Real World: Game Theory and the FCC Spectrum Auctions." Philosophy of the Social Sciences, 36(2): 173–92.

Almond, Douglas V., Kenneth Y. Chay, and MichaelGreenstone. 2003. “Civil Rights, the War on Poverty,and Black–White Convergence in Infant Mortality inMississippi.” Unpublished.

Anonymous. 1986. “On the Physical Death of JesusChrist.” Journal of the American Medical Association,255(20): 2752–60.

Armon, Carmel, and Randolph W. Evans. 2005.“Addendum to Assessment: Prevention of Post-lumbarPuncture Headaches: Report of the Therapeutics andTechnology Assessment Subcommittee of theAmerican Academy of Neurology.” Neurology, 65(4):510–12.

Berry, Scott M., and Joseph B. Kadane. 1997. “OptimalBayesian Randomization.” Journal of the RoyalStatistical Society: Series B (Statistical Methodology),59(4): 813–19.

Bertrand, Marianne, and Sendhil Mullainathan. 2004.“Are Emily and Greg More Employable than Lakishaand Jamal? A Field Experiment on Labor MarketDiscrimination.” American Economic Review, 94(4):991–1013.

Bitler, Marianne, and Madeline Zavodny. 2002. “DidAbortion Legalization Reduce the Number ofUnwanted Children? Evidence from Adoptions.”Perspectives on Sexual and Reproductive Health,34(1): 25–33.

Blank, Rebecca M., Christine C. George, and RebeccaA. London. 1996. “State Abortion Rates: The Impactof Policies, Providers, Politics, Demographics, andEconomic Environment.” Journal of Health

Levitt’s typical economist: “Humans are notmerely dull mechanisms formed by a historyof reinforcement and behaving predictablywith no intrinsic needs apart from the needfor physiological satiation. Then humans arenot fit subjects for manipulation, and we willseek to design a social order accordingly.”

I do not mean to suggest that Dubner andLevitt believe that humans are merely “dullmechanisms” formed only by a history of“incentives.” My point is merely that as aframework for understanding human behav-ior the typical economists’ focus on “incen-tives” ignores much that is important. JamesM. Buchanan, for example, writes that “anyperson’s ideal situation is one that allows himfull freedom of action and inhibits thebehavior of others so as to force adherenceto his own desires. That is to say, each personseeks mastery over a world of slaves.”58

I hope I never live to meet this sort ofhomo economicus. At a minimum,Buchanan’s ideal appears more like a dystopi-an nightmare, notwithstanding the fact thathis description of homo economicus followslogically from his premises about humanmotivation. It also highlights the problemwith the view that “theory is evidence too”(June O’Neill as cited in Angus Deaton1996)—most commonly our models are per-fect environments to do “experiments” onpeople we would never hope to meet, in sit-uations in which they could never find them-selves. If even “the laws of physics lie”59 andphysicists often fruitfully use different andmutually inconsistent models, it seems farmore modesty about economic theory/mod-els is warranted than Freakonomics seems tosuggest. This is not to deny that simple eco-nomic models can be put to good use, buteconomic theory—whatever it is—shouldn’t


Brenner, B. 2005. “Did Jesus Die of Pulmonary Embolism?” Journal of Thrombosis and Haemostasis, 3(9): 2130–31.

Buchanan, James M. 1975. The Limits of Liberty: Between Anarchy and Leviathan. Chicago: University of Chicago Press.

Buchanan, James M. 1999. The Limits of Liberty: Between Anarchy and Leviathan. Indianapolis: Liberty Fund, Inc.

Card, David Edward, and Alan B. Krueger. 1995. Myth and Measurement: The New Economics of the Minimum Wage. Princeton: Princeton University Press.

Cartwright, Nancy. 1983. How the Laws of Physics Lie. Oxford and New York: Oxford University Press.

Cartwright, Nancy. 1999. “The Vanity of Rigour in Economics: Theoretical Models and Galilean Experiments.” Centre for the Philosophy of Natural and Social Science Discussion Paper, no. 43/99.

Cartwright, Nancy. 2003a. “Causation: One Word; Many Things.” Causality: Metaphysics and Methods Technical Report, no. CTR 07-03.

Cartwright, Nancy. 2003b. “Counterfactuals in Economics: A Commentary.” In Proceedings from INPC 2003: Explanation and Causation: Topics in Contemporary Philosophy, Volume 4, ed. O’Rourke. Cambridge and London: MIT Press.

Cartwright, Nancy. 2007a. “Causation: One Word, Many Things.” In Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge and New York: Cambridge University Press, 9–23.

Cartwright, Nancy. 2007b. “Counterfactuals in Economics: A Commentary.” In Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge and New York: Cambridge University Press, 236–61.

Cartwright, Nancy. 2007c. Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge and New York: Cambridge University Press.

Cartwright, Nancy. 2007d. “The Vanity of Rigour in Economics: Theoretical Models and Galilean Experiments.” In Hunting Causes and Using Them: Approaches in Philosophy and Economics. Cambridge and New York: Cambridge University Press, 217–35.

Center for Health Statistics. 1998. “Data Matters: Multiple Cause of Death in California.” U.S. Department of Health and Human Services, Centers for Disease Control and Prevention Technical Report.

Charles, Kerwin Kofi, and Melvin Stephens, Jr. 2006. “Abortion Legalization and Adolescent Substance Use.” Journal of Law and Economics, 49(2): 481–505.

Chomsky, Noam. 1971. “The Case against B.F. Skinner.” New York Review of Books, December 30.

Cournot, M., J. C. Marquié, D. Ansiau, C. Martinaud, H. Fonds, J. Ferrières, and J. B. Ruidavets. 2006. “Relation between Body Mass Index and Cognitive Function in Healthy Middle-Aged Men and Women.” Neurology, 67(7): 1208–14.

Cullen, Julie Berry, Brian A. Jacob, and Steven D. Levitt. 2003. “The Effect of School Choice on Student Outcomes: Evidence from Randomized Lotteries.” NBER Working Papers, no. 10113.

Deaton, Angus. 1996. “Letters from America: The Minimum Wage.” Newsletter of the Royal Economic Society, 95: 13.

DeVany, Arthur. Forthcoming. “Steroids, Home Runs and the Law of Genius.” Economic Inquiry.

Dickens, Charles. 1854. Hard Times. London: Bradbury & Evans.

DiNardo, John. 2006a. “Freakonomics: Scholarship in the Service of Storytelling.” American Law and Economics Review, 8(3): 615–26.

DiNardo, John. 2006b. “Natural Experiments.” In The New Palgrave Dictionary of Economics, ed. Steven N. Durlauf and Lawrence E. Blume. Houndmills, U.K. and New York: Palgrave Macmillan.

Donohue, John J., III, and Steven D. Levitt. 2001. “The Impact of Legalized Abortion on Crime.” Quarterly Journal of Economics, 116(2): 379–420.

Edgeworth, Francis Ysidro. 1885. “The Calculus of Probabilities Applied to Psychic Research, I.” Proceedings of the Society for Psychical Research, 3: 190–99.

Edgeworth, Francis Ysidro. 1887. “The Calculus of Probabilities Applied to Psychic Research, II.” Proceedings of the Society for Psychical Research, 4: 189–208.

Edwards, W. D., W. J. Gabel, and F. E. Hosmer. 1986. “On the Physical Death of Jesus Christ.” Journal of the American Medical Association, 255(11): 1455–63.

Fagot, Anne M. 1980. “Probabilities and Causes: On Life Tables, Causes of Death, and Etiological Diagnoses.” In Probabilistic Thinking, Thermodynamics and the Interaction of the History and Philosophy of Science, ed. Jakko Hintikka, David Gruender, and Evandro Agazzi. Studies in Epistemology, Logic, Methodology, and Philosophy of Science, vol. 146. Dordrecht; Boston and London: D. Reidel Publishing Company, 41–104.

Food and Drug Administration. 1997. “FDA Announces Withdrawal of Fenfluramine and Dexfenfluramine (Fen-Phen).” Press Release: September 15.

Foote, Christopher L., and Christopher F. Goetz. Forthcoming. “The Impact of Legalized Abortion on Crime: Comment.” Quarterly Journal of Economics.

Freedman, David A. 1999. “From Association to Causation: Some Remarks on the History of Statistics.” Statistical Science, 14(3): 243–58.

Fryer, Roland G., Jr., and Steven D. Levitt. 2004a. “The Causes and Consequences of Distinctively Black Names.” Quarterly Journal of Economics, 119(3): 767–805.

Fryer, Roland G., Jr., and Steven D. Levitt. 2004b. “Understanding the Black–White Test Score Gap in the First Two Years of School.” Review of Economics and Statistics, 86(2): 447–64.

Gosling, C. G. 1987. “Comments on ‘The Physical Death of Jesus Christ.’” Journal of Biocommunication, 14(4): 2–3.

Granger, Clive. 1986. “Statistics and Causal Inference: Comment.” Journal of the American Statistical Association, 81(396): 967–68.

Graunt, John. 1676. Natural and Political Observations Mentioned in a Following Index and Made upon the Bills of Mortality. With Reference to the Government, Religion, Trade, Growth, Air, Diseases, and the Several Changes of the Said City. First edition. John Martyn. Rendered into HTML format by Ed Stephan, 25 January 1996. Accessed June 1, 2005 from http://www.ac.wwu.edu/~stephan/Graunt/bills.html.


Grob, Gerald N. 1978. Edward Jarvis and the Medical World of Nineteenth-Century America. Knoxville: University of Tennessee Press.

Gronniger, Jerome Timothy. 2003. “Fat and Happy: Dissecting the Obesity–Mortality Relationship.” Unpublished.

Gronniger, Jerome Timothy. 2006. “A Semiparametric Analysis of the Relationship of Body Mass Index to Mortality.” American Journal of Public Health, 96(1): 173–78.

Gruber, Jonathan, Phillip Levine, and Douglas Staiger. 1999. “Abortion Legalization and Child Living Circumstances: Who Is the ‘Marginal Child?’” Quarterly Journal of Economics, 114(1): 263–91.

Guerry, André-Michel. 1883. Essai sur la statistique morale de la France. Edited and translated by Hugh P. Whitt and Victor W. Reinking as A Translation of André-Michel Guerry’s Essay on the Moral Statistics of France: A Sociological Report to the French Academy of Science, 2002.

Hacking, Ian. 1965. The Logic of Statistical Inference. Cambridge: Cambridge University Press.

Hacking, Ian. 1975. The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference. Cambridge: Cambridge University Press.

Hacking, Ian. 1983a. “Nineteenth Century Cracks in the Concept of Determinism.” Journal of the History of Ideas, 44(3): 455–75.

Hacking, Ian. 1983b. Representing and Intervening: Introductory Topics in the Philosophy of Natural Science. Cambridge: Cambridge University Press.

Hacking, Ian. 1988. “Telepathy: Origins of Randomization in Experimental Design.” Isis, 79(3): 427–51.

Hacking, Ian. 1990. The Taming of Chance. Ideas in Context, no. 17. Cambridge: Cambridge University Press.

Hacking, Ian. 1995. Rewriting the Soul: Multiple Personality and the Sciences of Memory. Princeton: Princeton University Press.

Hacking, Ian. 2000. The Social Construction of What? Cambridge and London: Harvard University Press.

Hanke, Lewis. 1935. The First Social Experiments in America: A Study in the Development of Spanish Indian Policy in the Sixteenth Century. Cambridge, Mass. and London: Harvard University Press.

Harville, David A. 1975. “Experimental Randomization: Who Needs It?” American Statistician, 29(1): 27–31.

Heckman, James J. 1995. “Lessons from the Bell Curve.” Journal of Political Economy, 103(5): 1091–1120.

Heckman, James J. 2000. “Causal Parameters and Policy Analysis in Economics: A Twentieth Century Retrospective.” Quarterly Journal of Economics, 115(1): 45–97.

Heckman, James J. 2005. “The Scientific Model of Causality.” Sociological Methodology, 35(1): 1–98.

Heckman, James J., and Edward Vytlacil. 2005. “Structural Equations, Treatment Effects, and Econometric Policy Evaluation.” Econometrica, 73(3): 669–738.

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association, 81(396): 945–60.

Horowitz, Joel L., and Charles F. Manski. 1998. “Censoring of Outcomes and Regressors Due to Survey Nonresponse: Identification and Estimation Using Weights and Imputations.” Journal of Econometrics, 84(1): 37–58.

Joyce, Ted. 2004. “Further Tests of Abortion and Crime.” NBER Working Papers, no. 10564.

Kennan, John. 1989. “Simultaneous Equations Bias in Disaggregated Econometric Models.” Review of Economic Studies, 56(1): 151–56.

Keynes, John Maynard. 1921. A Treatise on Probability. London: Macmillan.

Laplace, Pierre Simon de. 1795. Essai philosophique sur les probabilités. Translated from the sixth French edition by Frederick Wilson Truscott and Frederick Lincoln Emory as A Philosophical Essay on Probabilities; first U.S. edition published in 1902. New York: John Wiley & Sons.

Lecam, Lucien. 1977. “A Note on Metastatistics or ‘An Essay toward Stating a Problem in the Doctrine of Chances.’” Synthese, 36(1): 133–60.

Levitt, Steven D., and Stephen J. Dubner. 2005. Freakonomics: A Rogue Economist Explores the Hidden Side of Everything. New York: Harper Collins.

Levitt, Steven D., and Stephen J. Dubner. 2006a. “Freakonomics 2.0.” http://freakonomics.blogs.nytimes.com/2006/09/20/freakonomics-20/. Accessed November 1, 2007.

Levitt, Steven D., and Stephen J. Dubner. 2006b. “Hoodwinked.” New York Times Magazine, January 8.

Lillard, Lee A. 1998. “The Market for Sex: Street Prostitution in Los Angeles.” Unpublished.

Manski, Charles F. 1993. “Identification of Endogenous Social Effects: The Reflection Problem.” Review of Economic Studies, 60(3): 531–42.

Mayo, Deborah G. 1996. Error and the Growth of Experimental Knowledge. Science and Its Conceptual Foundations series. Chicago: University of Chicago Press.

Mayo, Deborah G., and Michael Kruse. 2002. “Principles of Inference and Their Consequences.” In Foundations of Bayesianism, ed. D. Corfield and J. Williamson. Applied Logic Series, vol. 24. Dordrecht: Kluwer Academic, 381–404.

Moffitt, Robert. 2005. “Remarks on the Analysis of Causal Relationships in Population Research.” Demography, 42(1): 91–108.

National Center for Health Statistics. 2006. “Instructions for Classifying Underlying Cause-of-Death.” NCHS Instruction Manual Part 2-a.

Pearl, Judea. 1997. “The New Challenge: From a Century of Statistics to the Age of Causation.” Computing Science and Statistics, 29(2): 415–23.


Pearson, Karl. 1892. The Grammar of Science. London: A. and C. Black.

Pearson, Karl. 1930. Life, Letters and Labours of Francis Galton. Vol. 3A, Correlation, Personal Identification and Eugenics. Cambridge: Cambridge University Press.

Peirce, Charles Sanders. 1958. Collected Papers. Vol. 7–8, ed. A. Burks. Cambridge: Harvard University Press.

Reid, Sue Titus. 1985. Crime and Criminology, Second edition. New York: Holt, Rinehart and Winston.

Reiss, Julian, and Nancy Cartwright. 2004. “Uncertainty in Econometrics: Evaluating Policy Counterfactuals.” In Economic Policy under Uncertainty: The Role of Truth and Accountability in Policy Advice, ed. P. Mooslechner, H. Schuberth, and M. Schürz. Cheltenham, U.K. and Northampton, Mass.: Elgar, 204–32.

Reuter, Peter, Robert MacCoun, and Patrick Murphy. 1990. “Money from Crime: A Study of the Economics of Drug Dealing in Washington, D.C.” RAND Research Report, no. R-3894-RF.

Rubin, Donald B. 1978. “Bayesian Inference for Causal Effects: The Role of Randomization.” Annals of Statistics, 6(1): 34–58.

Sakula, A. 1991. “A Hundred Years of Lumbar Puncture: 1891–1991.” Journal of the Royal College of Physicians of London, 25(2): 171–75.

Saliba, W. R. 2006. “Did Jesus Die of Pulmonary Embolism?” Journal of Thrombosis and Haemostasis, 4(4): 891–92.

Savage, Leonard J. 1961. “Discussion.” In The Foundations of Statistical Inference: A Discussion, ed. G. A. Barnard and D. R. Cox. London and Colchester: Spottiswoode Ballantyne and Company, 62–103.

Savage, Leonard J., M. S. Bartlett, G. A. Barnard, D. R. Cox, E. S. Pearson, C. A. B. Smith, et al. 1962. The Foundations of Statistical Inference: A Discussion, ed. G. A. Barnard and D. R. Cox. Methuen’s Monographs on Applied Probability and Statistics. London and Colchester: Spottiswoode Ballantyne & Co. A discussion opened by L. J. Savage at the Joint Statistics Seminar of Birkbeck and Imperial Colleges; discussants also include H. Ruben, I. J. Good, D. V. Lindley, P. Armitage, C. B. Winsten, R. Syski, E. D. Van Rest, and G. M. Jenkins.

Scheiber, Noam. 2007. “How Freakonomics Is Ruining the Dismal Science.” The New Republic, April 2.

Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

Simon, Herbert A. 1953. “Causal Ordering and Identifiability.” In Studies in Econometric Method, ed. W. Hood and T. Koopmans. Cowles Commission for Research in Economics Monograph, no. 14. New York: Wiley, 49–74.

Siskind, Frederic B. 1977. “Minimum Wage Legislation in the United States: Comment.” Economic Inquiry, 15(1): 135–38.

Stigler, Stephen M. 1978. “Mathematical Statistics in the Early States.” Annals of Statistics, 6(2): 239–65.

Suppes, Patrick. 1982. “Arguments for Randomizing.” Philosophy of Science Association Proceedings, 2: 464–75.

Swijtink, Zeno G. 1982. “A Bayesian Justification of Experimental Randomization.” Philosophy of Science Association Proceedings, 1: 159–68.

Thacher, David. 2001. “Policing Is Not a Treatment: Alternatives to the Medical Model of Police Research.” Journal of Research in Crime and Delinquency, 38(4): 387–415.

Thacher, David. 2002. “From Racial Profiling to Racial Equality: Rethinking Equity in Police Stops and Searches.” Gerald R. Ford School of Public Policy Working Paper, no. 02-006.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment under Uncertainty: Heuristics and Biases.” Science, 185(4157): 1124–31.

Ur Rehman, H. 2005. “Did Jesus Christ Die of Pulmonary Embolism? A Rebuttal.” Journal of Thrombosis and Haemostasis, 3(9): 2131–33.

Voltaire. 1796. The History of Candid; or All for the Best. London: C. Cooke.

Welch, Finis R. 1974. “Minimum Wage Legislation in the United States.” Economic Inquiry, 12(3): 285–318.

Welch, Finis R. 1976. “Minimum Wage Legislation in the United States.” In Evaluating the Labor Market Effects of Social Programs, ed. Orley Ashenfelter and James Blum. Princeton: Princeton University Press.

Welch, Finis R. 1977. “Minimum Wage Legislation in the United States: Reply.” Economic Inquiry, 15(1): 139–42.

Williams, Richard H., Bruno D. Zumbo, Donald Ross, and Donald W. Zimmerman. 2003. “On the Intellectual Versatility of Karl Pearson.” Human Nature Review, 3: 296–301.

Zellner, Arnold. 1984. “Causality and Econometrics.” In Basic Issues in Econometrics, ed. A. Zellner. Chicago and London: University of Chicago Press, 35–74.
