
Harvard Econ

Date post: 26-Dec-2015
Upload: scribewriter1990
Description:
econometrics, harvard, questions
Page 1: Harvard Econ


Introduction

Some programs aimed at reducing juvenile crime are based on the theory that a good way to keep teenagers out of trouble is to keep them busy and off the streets. These programs include supervised Saturday and evening sports activities, supervised clubs, and similar programs. Another way to keep teenagers under supervision is to keep them in school longer.

Here, you will examine the effect on juvenile crime of whether a given day is a “school day” (a day that a teenager is supposed to be in school) or a “no-school day” (a day when a teenager does not have school).

The measure of juvenile crime is the number of reported incidents of juvenile property crimes. (Juvenile means under 18 years old. Property crime includes theft and vandalism; it excludes violent crime and drug offenses.) The data are daily, for weekdays only (no data for Saturday or Sunday) for 1995 to 1999 for 29 cities in the U.S. with populations between 30,000 and 600,000; an observation corresponds to a day in a city. If data were available for all cities for all days, the number of observations would be 5 years × 261 weekdays/year × 29 cities = 37,845, but for some cities data are available only for shorter periods of time, so the actual sample size is n = 27,389.
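
The sample-size arithmetic above can be checked with a short sketch. The variable names are illustrative only, and the coverage ratio is a derived figure that the exam itself does not report.

```python
# Potential vs. actual number of city-day observations, using the figures in the text.
years = 5
weekdays_per_year = 261
cities = 29

potential_obs = years * weekdays_per_year * cities  # full panel if no data were missing
actual_obs = 27_389                                 # sample size stated in the text

coverage = actual_obs / potential_obs               # share of the full panel observed
print(potential_obs, round(coverage, 3))
```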

The variables are defined in Table 1 and regression results are summarized in Table 2.

Table 1
Variable definitions

Variable     Mean   Definition
#incidents   3.35   Daily number of reported incidents of property crime involving a juvenile offender. Thus #incidents = 4 means that four juvenile property crime incidents were reported on that day in that city.
breakday     .08    = 1 if the day is a school break day (Thanksgiving break, Christmas break, etc.); = 0 otherwise.
teacherday   .01    = 1 if the day is a teacher meeting day; = 0 otherwise. A teacher meeting day normally would be a school day (that is, the day falls in the regular school year, not in the summer, and is not a school break day), except no classes are held because the day is used instead for teacher meetings, professional development, etc.
summer       .23    = 1 if the day is during summer vacation; = 0 otherwise.
pop          1.19   Population of the city in hundreds of thousands, so pop = 1.19 means the city has a population of 119,000.


Table 2
Summary of Regression Results

Dependent variable: #incidents

Regressor         (1)          (2)          (3)          (4)          (5)          (6)
teacherday        .76 (.26)    –            .87 (.23)    .40 (.32)    .87 (.23)    -.82 (.67)
breakday          –            .173 (.114)  .129 (.108)  .129 (.108)  .142 (.106)  .142 (.108)
summer            –            –            .292 (.055)  .292 (.055)  .293 (.055)  .292 (.055)
pop               –            –            1.58 (0.03)  1.57 (.03)   4.05 (.17)   4.02 (.18)
pop²              –            –            –            –            -1.51 (.09)  -1.50 (.09)
pop³              –            –            –            –            .211 (.012)  .209 (.012)
teacherday×pop    –            –            –            .40 (.31)    –            3.05 (1.53)
teacherday×pop²   –            –            –            –            –            -1.21 (.78)
teacherday×pop³   –            –            –            –            –            .13 (.10)
intercept         3.35 (.03)   3.34 (.03)   1.39 (0.04)  1.39 (0.04)  .50 (.09)    .52 (.09)

F-tests testing the hypothesis that the population coefficients on the indicated regressors are all zero:
teacherday, teacherday×pop: (4) 7.27 (p < .001)
teacherday, teacherday×pop, teacherday×pop², teacherday×pop³: (6) 4.01 (p = .003)
teacherday×pop, teacherday×pop², teacherday×pop³: (6) 1.75 (p = .155)
pop, pop², and pop³: (5) 938.3 (p < .001); (6) 919.8 (p < .001)
pop² and pop³: (5) 171.5 (p < .001); (6) 167.7 (p < .001)
pop, pop², pop³, teacherday×pop, teacherday×pop², teacherday×pop³: (6) 470.2 (p < .001)

R²                .0004        .0001        .1347        .1348        .1519        .1520
SER               4.18         4.18         3.89         3.89         3.85         3.85
(SER = regression RMSE)

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust. All regressions have n = 27,389 observations. The regressions were estimated using data for weekdays only.
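
As a reading aid for Table 2, the sketch below turns one reported coefficient and its robust standard error into a t-statistic and a 95% confidence interval. It uses the summer coefficient from regression (3), .292 with standard error .055, and assumes the large-sample normal critical value 1.96; the function name is hypothetical and not part of the exam.

```python
def t_and_ci(coef, se, critical=1.96):
    """Return the t-statistic and a large-sample 95% confidence interval."""
    t_stat = coef / se
    half_width = critical * se
    return t_stat, coef - half_width, coef + half_width

# Summer coefficient and robust standard error from regression (3) in Table 2.
t_stat, ci_low, ci_high = t_and_ci(0.292, 0.055)
print(round(t_stat, 2), round(ci_low, 3), round(ci_high, 3))
```

A coefficient is statistically significant at the 5% level (two-sided) when the absolute t-statistic exceeds 1.96, which is equivalent to the interval excluding zero.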


Question 1 (24 points)

The measure of a “no-school day” used in regression (1) (column (1) in Table 2) is whether the day is a teacher meeting day. The measure of a “no-school day” in regression (2) is whether the day is a break day (Thanksgiving break, etc.).

a) (9 points) For regression (1):
   (i) Provide the estimated effect on the number of incidents of having a “no-school day.”
   (ii) Is this estimated effect large in a real-world sense? Briefly, explain.
   (iii) Test the hypothesis that this effect is zero, against the alternative that it is nonzero, at the 5% significance level.
b) (9 points) Repeat (i) – (iii) for regression (2).
c) (6 points) Suggest a plausible reason, based on the definitions of the variables in regressions (1) and (2), why the two estimates differ.

Question 2 (30 points)

a) (6 points) Explain what is meant by the SER in regression (3).
b) (6 points) Interpret the coefficient on teacherday in regression (3).
c) (6 points) Suggest a reason why the errors in regression (3) might be heteroskedastic; explain.
d) (6 points) Using regression (3), compute the predicted value of the number of incidents on a teacher meeting day for a city with a population of 200,000 (so pop = 2).
e) (6 points) The school superintendent in a city with population 200,000 is contemplating changing a normal school day into a teacher meeting day. Use regression (6) (not regression (3)) to estimate the effect of this decision on the number of juvenile property crime incidents.

Question 3 (26 points)

a) (8 points) One possibility is that pop enters the population regression function nonlinearly. What does regression (5) tell us about this possibility? Briefly explain (be precise).
b) (8 points) Another possibility is that the effect on crime of a “no-school day” is different in bigger cities than in smaller cities. What do the regression results tell us about this possibility? Briefly explain (be precise).
c) (10 points) In words, briefly summarize your conclusions from Table 2 about the effect on juvenile property crime of having a “no-school day” because of a teacher meeting day.

Question 4 (20 points)

a) (10 points) Suggest a policy intervention – that is, a specific program to provide teenagers with some form of supervision – for which these results would be externally valid. Suggest a different policy intervention for which these results would not be externally valid. Explain (be precise).
b) (10 points) Suggest two potential threats to the internal validity of your conclusions in 3(c). That is, provide two potential threats to the internal validity of the regression analysis summarized in Table 2. Explain why each threat could be relevant to this study (be precise).


Department of Economics          Economics 1123
Harvard University               Fall 2005

Midterm Exam

11:40 a.m., Thursday October 27, 2005

Instructions

1. Do not turn this page until so instructed.
2. This exam ends promptly at 1:00 PM.
3. The exam has five parts for a total of 100 points. Please put each part in a separate blue book. Put your name and Harvard ID number on the cover of each blue book.
4. You are permitted one two-sided 8½” x 11” sheet of notes, plus a calculator. No computers or wireless devices without prior permission. You may not share resources with anyone else.
5. Some questions ask you to draw a real-world judgment in a problem of practical importance. The quality of that judgment counts. For example, consider the question: “It is 10°F outside. In your judgment, why are so many people wearing heavy coats?” The answer, “To stay warm” would receive more points than the answer, “Because they are fashion-conscious.”
6. You may keep or discard this exam; you need not turn it in.

Introduction

The reputation of a university depends in part on teaching quality, which is primarily measured by course evaluations. This exam considers an empirical analysis of course evaluations for n = 463 courses, sampled for the academic years 2000–2002 at a major U.S. university (the University of Texas at Austin). The objective of the study is to quantify the causal effect on professorial productivity, as measured by course evaluations, of the physical appearance of the instructor (Beauty). The dependent variable is the “Course Overall” course evaluation rating, on a scale of 1 (very unsatisfactory) to 5 (excellent) (the same question and scale as at Harvard).

The physical appearance (Beauty) of the instructor was measured by a paid panel of six students, working independently, who assigned a numeric grade to the physical appearance of all the instructors in the data set based on photographs on the instructors’ Web sites. The panelists were told to focus on physical characteristics and to make their ratings independent of age. The six grades were averaged, centered, and rescaled so that the average score for Beauty across all instructors is zero. Other relevant data were also collected.
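
The construction of Beauty described above can be sketched as follows. The raw grades are invented for illustration (four instructors, six panelists), and only the averaging and centering steps are shown, since the text does not say how the scores were rescaled. The last line builds the indicator for Beauty > 0 that appears in Table 1.

```python
import numpy as np

# Hypothetical raw grades: one row per instructor, one column per panelist.
raw = np.array([
    [6.0, 7.0, 5.0, 6.0, 8.0, 7.0],
    [3.0, 4.0, 4.0, 2.0, 3.0, 5.0],
    [5.0, 5.0, 6.0, 4.0, 5.0, 5.0],
    [8.0, 9.0, 7.0, 8.0, 9.0, 8.0],
])

panel_mean = raw.mean(axis=1)            # average the six grades for each instructor
beauty = panel_mean - panel_mean.mean()  # center: mean across instructors is zero
d_beauty_pos = (beauty > 0).astype(int)  # 1 if Beauty > 0, 0 otherwise

print(beauty.round(3), d_beauty_pos)
```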


Table 1
Variable Definitions and Summary Statistics

Variable            Mean   Std. Dev.  Definition
Course Overall      4.022  .525       “Course overall” teaching evaluation score, on a scale of 1 (very unsatisfactory) to 5 (excellent)
Beauty              0      .83        Rating of instructor physical appearance by a panel of six students, averaged across the six panelists, shifted to have mean zero.
DBeauty>0           .51    .50        = 1 if Beauty > 0; = 0 if Beauty ≤ 0
Female              .36    .48        = 1 if the instructor is female; = 0 if the instructor is male
Minority            .10    .30        = 1 if the instructor is non-White; = 0 if the instructor is White
Non-native English  .04    .20        = 1 if the instructor is not a native English speaker; = 0 if the instructor is a native English speaker
tenure track        .85    .36        = 1 if the instructor is in a tenure-track job (Asst., Assoc., Full Professor); = 0 if the instructor is not in a tenure-track job (lecturer)
intro course        .34    .47        = 1 if the course is introductory (mainly large Freshman and Sophomore courses); = 0 if the course is not introductory
one-credit course   .03    .17        = 1 if the course is a single-credit elective (yoga, aerobics, dance, etc.); = 0 otherwise
dresses well        .31    .46        = 1 if the instructor is wearing a tie in his Web photo (male) or is wearing a blouse and jacket in her Web photo (female); = 0 otherwise


Table 2
Regression Results

Dependent variable: “Course Overall” evaluation score

Data subset: regressions (1) – (4) use all instructors; regression (5) uses male instructors; regression (6) uses female instructors.

Regressor           (1)           (2)           (3)           (4)           (5)           (6)
Beauty              .410 (.081)   .275 (.059)   .229 (.047)   .237 (.096)   .384 (.076)   .128 (.064)
Female              -.166 (.098)  -.239 (.085)  -.210 (.075)  -.255 (.088)  –             –
Minority            -.284 (.015)  -.249 (.012)  -.206 (.014)  -.221 (.012)  .060 (.101)   -.260 (.139)
Non-native English  -.344 (.152)  -.253 (.134)  -.288 (.112)  -.251 (.132)  -.427 (.143)  -.262 (.151)
tenure track        -.150 (.114)  -.136 (.094)  -.156 (.110)  -.131 (.092)  -.056 (.089)  -.041 (.133)
intro course        -.071 (.134)  -.046 (.111)  -.079 (.102)  -.052 (.110)  .005 (.129)   -.228 (.164)
one-credit course   –             .687 (.166)   .823 (.129)   .694 (.170)   .768 (.119)   .517 (.232)
dresses well        –             –             .243 (.088)   –             –             –
Beauty × DBeauty>0  –             –             –             .081 (.135)   –             –
Intercept           4.27 (.071)   4.25 (0.56)   4.22 (.054)   4.21 (.054)   4.35 (.081)   4.08 (.088)

Summary statistics
R²                  .224          .279          .302          .285          .359          .162
n                   463           463           463           463           268           195

One-credit courses are yoga, aerobics, dance, and short electives.

Notes: Each column represents a different regression. Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients.
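
The notes refer to heteroskedasticity-robust standard errors. The sketch below shows one common way to compute them, the HC1 sandwich estimator; the exam does not say which variant was used, so treat that choice as an assumption. The function name `ols_robust` and the simulated data are hypothetical and only loosely mimic the scale of the Beauty variable.

```python
import numpy as np

def ols_robust(X, y):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)  # HC1 degrees-of-freedom scaling
    return beta, np.sqrt(np.diag(cov))

# Simulated data whose error variance depends on the regressor.
rng = np.random.default_rng(0)
n_obs = 463
beauty_sim = rng.normal(0.0, 0.83, size=n_obs)
errors = rng.normal(0.0, 0.3 + 0.2 * np.abs(beauty_sim))
score_sim = 4.0 + 0.3 * beauty_sim + errors

X_sim = np.column_stack([np.ones(n_obs), beauty_sim])  # intercept plus one regressor
beta_hat, robust_se = ols_robust(X_sim, score_sim)
print(beta_hat.round(3), robust_se.round(3))
```

With only an intercept in the regression, this estimator reduces to the usual standard error of a sample mean, which gives a quick sanity check on the formula.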


Part 1 (25 points)

1) (5 points) Interpret the coefficient on Beauty in regression (2).
2) (5 points) Using regression (2), compute a 95% confidence interval for the population coefficient on Beauty.
3) (5 points) Define a 95% confidence interval.
4) (5 points) Professor Stock is male, not a minority, is a native English speaker, and is tenure track. Ec1123 is not an introductory course, nor is it a one-credit elective. Suppose that Professor Stock has average beauty, so his value of Beauty is zero. Use regression (2) to compute the predicted “course overall” course evaluation score for Ec1123 this semester.
5) (5 points) The professor in Ec1123 next semester is a tenure-track white male Australian. Suppose he has a Beauty score of 1.66. Use regression (2) to compute a 95% confidence interval for the difference between the Ec1123 Course Overall evaluation score next semester and the Course Overall score this semester.

Part 2 (24 points)

1) Suppose you want to estimate a version of regression (2) in which the coefficients on all regressors except Beauty are the same for men and women, however the effect of Beauty can differ for men and women.
   a) (4 points) Provide a regression specification that achieves this (be specific).
   b) (2 points) In your specification in (a), how would you test the hypothesis that the effect of Beauty is the same for men and women (be specific)?
2) The coefficient on Beauty drops from .410 in regression (1) to .275 in regression (2).
   a) (4 points) Explain why. What does this drop imply about the relation between Beauty and One-credit course?
   b) (4 points) Is your reason in (a) for this decline plausible in a real-world sense? Explain.
3) The following variables are not in regression (2):
   a) The amount of time the instructor spends on course preparation per class.
   b) The marital status of the instructor.
   For each, explain whether omission of this variable from regression (2) will, in your judgment, plausibly result in omitted variable bias for the estimated effect of Beauty. Briefly explain. (5 points each)


Part 3 (21 points)

1) Suppose you have data on years of teaching experience (Experience) of the instructor, and you are considering choosing among three possible specifications:
   (i) regression (2) plus Experience
   (ii) regression (2) plus Experience, Experience², and Experience³
   (iii) regression (2) plus log(Experience)
   a) (6 points) In your judgment (before you know the results of these regressions), which specification, (i), (ii), or (iii), is the most appropriate? Explain.
   b) (4 points) Suppose you estimated regressions for specifications (i) and (ii). How would you decide, based on the empirical evidence, whether (i) or (ii) is more appropriate.
2) Consider regression (4).
   a) (2 points) Test, at the 5% level, the hypothesis that the coefficient on Beauty × DBeauty>0 is zero, against the alternative that it is nonzero.
   b) (4 points) In real-world terms, describe the null hypothesis you just tested, the alternative, and the conclusion you draw from the hypothesis test.
3) (5 points) Test (at the 5% significance level) the hypothesis that the effect on course evaluations of Beauty is the same for men and for women, against the alternative that these effects differ.

Part 4 (14 points)

1) (6 points) Suppose you have data on marital status of the instructor (the data record three possibilities: single and never married, single and divorced, married). Provide a regression specification that modifies (2) so as to control for marital status (be specific).
2) (8 points) Based on the facts given in the following statement and on the empirical results presented in Table 2, in your judgment is the conclusion in the following statement justified or not? Explain.

“Regression (2) does not control for innate teaching ability. To do so, I obtained data on the instructor’s average teaching evaluations in the previous year and added it to regression (2). The coefficient on Beauty fell to .051 and was not statistically significant (SE = .079). Therefore I conclude that the Beauty coefficient in regression (2) is subject to omitted variable bias and that the true causal effect on course evaluations of Beauty is effectively zero.”


Part 5 (16 points)

A FAS committee on improving undergraduate teaching needs your help before reporting to Dean Kirby. The committee seeks your advice, as an econometric expert, about whether FAS should take physical appearance into account when hiring teaching faculty. (This is legal as long as doing so is blind to race, religion, age, and gender.) You do not have time to collect your own data so you must base your recommendations on the regression results in Table 2. Based on your analysis of Table 2, what is your advice? Justify your advice based on a careful and complete assessment of the internal and external validity of the results in Table 2.

Notes on Part 5:

• Assume the committee knows econometrics and econometric jargon at the level of this course.
• The committee has experts on ethics, law, and university policy, and it is uninterested in your views about the ethics or practicality of this proposed policy, whether the university should be in the business of maximizing course evaluation ratings, etc.; not that these are unimportant issues, they are simply not the question asked of you.


Department of Economics          Economics 1123
Harvard University               Fall 2006

Midterm Exam

11:40 a.m., Thursday October 26, 2006

Instructions

1. Do not turn this page until so instructed.
2. This exam ends promptly at 1:00 PM.
3. The exam has four parts for a total of 100 points. Please put each part in a separate blue book. Put your name and Harvard ID number on the cover of each blue book.
4. You are permitted one two-sided 8½” x 11” sheet of notes, plus a calculator. No computers, wireless, or other electronic devices without prior permission. You may not share resources with anyone else.
5. Some questions ask you to draw a real-world judgment in a problem of practical importance. The quality of that judgment counts. For example, consider the question: “It is 10°F outside. In your judgment, why are so many people wearing heavy coats?” The answer, “To stay warm” would receive more points than the answer, “Because they are fashion-conscious.”
6. You may keep or discard this exam; you need not turn it in.

Introduction

Many concerned college administrators view binge drinking by college students as a problem. Binge drinking can lead to other risky behavior or, in rare cases, death by drunk driving or alcohol poisoning. Some of these concerned college administrators think that the Greek system – fraternities for men, sororities for women – promotes a culture that encourages binge drinking. According to this “Animal House” view, the elimination of fraternities and sororities (and their replacement with dorm or off-campus housing) would go a long ways towards solving the problem of binge drinking among college students.

In this exam, you will examine the link between the Greek system and binge drinking using data from the National College Health Risk Behavior Survey, a survey conducted in 1995 by the U.S. Centers for Disease Control. The study randomly selected individuals from 136 randomly selected two- and four-year colleges. The survey was mailed to the selected students, who filled it in and returned it by mail; the response rate was 65%. The data used here are for n = 1333 students at four-year colleges only.


Table 1
Variable Definitions and Summary Statistics

Data source: 1995 National College Health Risk Behavior Survey

Variable        Mean   Std. Dev.  Min  Max  Definition
binge30         2.35   4.04       0    25   Number of days out of the last 30 days in which the student binge drank, defined as consuming at least five alcoholic drinks (e.g. five bottles of beer) in two hours
alcohol30       5.12   5.94       0    30   Number of days out of the last 30 days in which the student consumed any alcohol
Greek           0.19   0.39       0    1    = 1 if student belongs to a sorority or fraternity; = 0 otherwise
female          0.59   0.49       0    1    = 1 if female; = 0 if male
age             20.33  1.56       18   24   Age of student in years
sports          0.32   0.47       0    1    = 1 if on a sports team (intramural or intercollegiate); = 0 otherwise
Freshman        0.21   0.40       0    1    = 1 if Freshman; = 0 otherwise
Sophomore       0.25   0.43       0    1    = 1 if Sophomore; = 0 otherwise
Junior          0.24   0.43       0    1    = 1 if Junior; = 0 otherwise
Black           0.14   0.34       0    1    = 1 if Black; = 0 otherwise
Hispanic/other  0.20   0.40       0    1    = 1 if Hispanic, Asian, or other non-White, non-Black; = 0 otherwise


Table 2. Binge Drinking and Fraternity/Sorority Membership: Regression Results

Dependent variable: binge30

Regressor       (1)            (2)            (3)            (4)            (5)            (6)
Greek           1.87** (.32)   1.62** (.32)   1.48** (.31)   1.47** (.31)   2.69** (.56)   .37* (.18)
female          -1.33** (.23)  -1.01** (.23)  -.96** (.23)   -.97** (.23)   -.59* (.24)    -.25+ (.14)
age             .02 (.06)      .05 (.06)      .09 (.10)      3.53+ (1.81)   .09 (.10)      .01 (.07)
age²            __             __             __             -.081 (.062)   __             __
Greek × female  __             __             __             __             -2.06** (.66)  __
alcohol30       __             __             __             __             __             .54** (.02)
sports          __             1.29** (.26)   1.15** (.25)   1.16** (.25)   1.07** (.25)   .41** (.15)
Freshman        __             __             .35 (.48)      .70 (.55)      .38 (.48)      .76** (.29)
Sophomore       __             __             .00 (.36)      .08 (.36)      .03 (.36)      .36 (.22)
Junior          __             __             .22 (.34)      .14 (.34)      .24 (.33)      .46* (.20)
Black           __             __             -2.08** (.24)  -2.09** (.24)  -2.12** (.24)  -.33+ (.17)
Hispanic/other  __             __             -1.54** (.22)  -1.52** (.22)  -1.55** (.22)  -.32* (.15)
Intercept       2.40 (1.31)    1.16 (1.36)    .91 (2.28)     -34.96 (19.26) 1.68 (2.27)    -.92 (1.44)

F-statistics testing the hypothesis that the population coefficients on the indicated regressors are all zero:
age, age²: (4) 1.95 (.142)
Freshman, Sophomore, Junior: (3) .53 (.663); (4) .96 (.413); (5) .53 (.659); (6) 2.81 (.038)
Black, Hispanic/other: (3) 46.01 (<.0001); (4) 45.89 (<.0001); (5) 46.80 (<.0001); (6) 3.45 ( )

Regression summary statistics:
R²              .061           .081           .125           .128           .135           .672
Adjusted R²     .059           .078           .119           .121           .128           .670
SER             3.919          3.879          3.791          3.788          3.772          2.322
n               1333           1333           1333           1333           1333           1333

Notes: Heteroskedasticity-robust standard errors are given in parentheses under estimated coefficients, and p-values are given in parentheses under F-statistics. The F-statistics are heteroskedasticity-robust. Coefficients are individually statistically significant at the +10%, *5%, **1% significance level.
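
The significance markers in the notes can be reproduced from each coefficient and its standard error. The sketch below assumes two-sided tests with large-sample normal critical values (1.645, 1.96, 2.576); the exam does not state which critical values were used, and the function name `stars` is hypothetical.

```python
def stars(coef, se):
    """Return the Table 2 significance marker for a coefficient and its standard error."""
    t = abs(coef / se)
    if t >= 2.576:   # significant at the 1% level
        return "**"
    if t >= 1.96:    # significant at the 5% level
        return "*"
    if t >= 1.645:   # significant at the 10% level
        return "+"
    return ""

# Entries taken from Table 2: Greek in (1), Greek in (6), female in (6), age in (1).
for coef, se in [(1.87, 0.32), (0.37, 0.18), (-0.25, 0.14), (0.02, 0.06)]:
    print(coef, se, stars(coef, se))
```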


Part 1 (25 points)

1) (5 points) Interpret the coefficient on Greek in regression (1).
2) (5 points) Explain why the coefficient on Greek decreases from regression (1) to regression (2).
3) (5 points) Define heteroskedasticity and suggest a reason why the error in regression (3) might be heteroskedastic.
4) (5 points) Using regression (3), predict the number of binge-drinking days in a 30-day period for an 18-year old white male Freshman who belongs to a fraternity and is on a sports team.
5) (5 points) All the respondents are either Freshmen, Sophomores, Juniors, or Seniors, yet Freshman, Sophomore, Junior, age, and the “constant” regressor (the intercept) are not perfectly multicollinear in regression (3). Describe a counterfactual situation in which these variables would be perfectly multicollinear.

Part 2 (26 points)

1) (6 points) Consider two white male frat-member non-sports Sophomores, one of whom is 18 years old and the other is 20 years old. Using regression (5):
   a) (3 points) Compute the difference in the predicted values of binge30 for these two students;
   b) (3 points) Compute a 95% confidence interval for the difference in part (a).
2) (5 points) Use regression (4) to test the null hypothesis that the relationship between age and binge drinking is linear, against the alternative hypothesis that the relationship is possibly a quadratic, at the 5% significance level. Is the null hypothesis rejected?
3) (5 points) Suppose you hypothesized that female athletes are not prone to binge drinking, even though male athletes might be. How would you modify regression (3) to test this hypothesis? Be precise.
4) (5 points) The p-value is missing in Table 2 for one of the F-tests based on regression (6). Estimate this missing p-value and briefly explain how you did so.
5) (5 points) In everyday language, what is the difference in the interpretation of the coefficient on Greek in regression (3) and regression (6)?


Note to Parts 3 and 4

Your answers should be based on the results in Table 2 and your knowledge of econometrics, not on your beliefs about personal choice, equity, probity, etc.; these are questions about the empirical results, not about your opinions concerning the Greek system, drinking, or such matters.

Part 3 (24 points)

Do you agree or disagree with the following statements? Briefly explain why. (8 points each)

1) Binge drinking is a problem that primarily involves only a segment of the student population.
2) Sororities are just as bad as fraternities, at least from the perspective of binge drinking.
3) Freshmen, who are learning how to cope with the new freedoms of college, have the highest incidence of binge drinking; as students gain college experience, binge drinking becomes much less of a problem.

Part 4 (25 points)

1) (8 points) Summarize the results in Table 2 about the effect on binge drinking of fraternity and sorority membership. For the purpose of this question, take the results in the table at face value, that is, do not consider threats to the validity of these results.
2) (10 points) Provide two threats that, in your judgment, are the most important threats to the internal validity of the results discussed in your response to Part 4/Question 1 (be specific and explain your reasoning).
3) (7 points) Consider the concerned college administrator of the introduction, who would like to ban the Greek system and replace it with dorms or off-campus housing. All things considered, do the results in Table 2 support this recommendation? Specifically, why or why not?


Background for Parts I and II: Political Corruption

Parts I and II examine the relationship between the general level of education of citizens and the level of political corruption. The data are cross-sectional for the 50 U.S. states on the following variables:

Variables in the Corruption Data Set

Variable             Mean   Std. Dev.  Definition
Corruption rate      3.9    2.1        Convictions in that state of federal, state, and local public officials on corruption charges during the period 1990-2002, per 100,000 state residents
LowEd share          .35    .07        Share (fraction) of adults in 1990 with at most a high school diploma (LowEd = .35 means 35% of adults have at most a high school diploma)
Urban share          .68    .15        Share (fraction) of adults in 1990 living in an urban area
Foreign-born share   .02    .02        Share (fraction) of adults in 1990 born outside the U.S.
ln(Pop)              14.93  1.01       Logarithm of 1990 state population
Voting share         .58    .07        Share (fraction) of adults voting in the 1992 presidential election
Manufacturing share  .17    .06        Share (fraction) of jobs that are in the manufacturing sector in 1990
HS1928               .30    .12        High school graduation rate in 1928
LnInc1940            6.23   .79        Logarithm of per capita state income in 1940


Part I: Corruption (A)

The questions in Part I refer to Table 1.

Table 1
The Determinants of Corruption: OLS Regressions Results

Dependent variable: Corruption Rate

Regressor                 (1)           (2)           (3)
LowEd share               10.4 (5.2)    18.4 (8.7)    -9.7 (54.8)
Urban share               –             .4 (3.1)      -.5 (3.2)
Foreign-born share        –             21.9 (13.9)   21.3 (14.3)
ln(Pop)                   –             -.61 (.38)    -.56 (.41)
Voting share              –             5.5 (6.0)     -11.1 (32.7)
LowEd share×Voting share  –             –             47.7 (94.8)
R²                        .069          .173          .177
N                         50            50            50

F-statistics testing the hypothesis of zero coefficients on groups of variables:
Urban share, Foreign share, ln(Pop), Voting share: (2) .93 (p = .455); (3) 1.15 (p = .345)
LowEd share, LowEd share×Voting share: (3) 2.25 (p = .118)
Voting share, LowEd share×Voting share: (3) 0.52 (p = .600)
Urban share, Foreign share, ln(Pop), Voting share, LowEd share×Voting share: (3) 0.93 (p = .470)

Notes: Heteroskedasticity-robust standard errors appear in parentheses under regression coefficients, and p-values appear in parentheses under F-statistics. All regressions include an estimated intercept, which is not reported. All regressions are estimated using a cross-sectional data set consisting of 50 US states.


Questions for Par t I (25 points) Answer these questions in blue book #1 1) (3 points) Using regression (2), construct a 95% confidence interval for the effect on the

corruption rate of an increase in LowEd share of .01 (that is, of a 1 percentage point increase in the percent of the adult population with at most a high school degree).

2) Consider regression (3):

(a) (3 points) Test the hypothesis that the population coefficient on LowEd share×Voting

share is zero, against the alternative that it is nonzero. (b) (3 points) Test the hypothesis that citizen participation, specifically the presidential

voting share, does not affect corruption, against the alternative that the voting share affects corruption.

3) Do you agree or disagree with the following statements? Explain (3 points each).

(a) Because immigrants are less knowledgeable about the U.S. legal system, they are more susceptible to governmental corruption. The regression results in Table 1 show that this is true: more foreign-born citizens, more corruption.

(b) The R2 of regression (2) is low. Thus there are important determinants of corruption omitted, and therefore the coefficient on LowEd share in regression (2) is biased because of omitted variable bias.

(c) The regression results in Table 1 are flawed because they use heteroskedasticity-robust standard errors: if the errors really are homoskedastic, then these standard errors will be incorrect. The table should instead report standard errors that are correct even under homoskedasticity.

4) Suppose that high levels of corruption result in low-quality public institutions, including low-quality schools, which in turn result in lower levels of education.

(a) (3 points) If so, what are the implications for the estimated effect on corruption of education in Table 1? Briefly explain.

(b) Consider the following potential instrumental variables for LowEd share in regression (3):
(i) Newspapers = average number of newspapers per capita in 1990
(ii) Alphabet = 1 if the state falls in the first half of the alphabet, = 0 otherwise (e.g. = 1 for Alabama, = 0 for Wyoming)
(2 points each) For each proposed instrument, is the variable arguably a valid instrumental variable? Briefly explain.


Part II: Corruption (B)

The questions in Part II refer to Table 2.

Table 2
The Determinants of Corruption: Two Stage Least Squares Regression Results

Dependent variable: Corruption Rate

                           (1)         (2)         (3)         (4)         (5)         (6)
Endogenous regressor
LowEd share                29.4        131.0       32.9        32.5        54.8        35.4
                           (11.7)      (114.4)     (12.8)      (10.2)      (36.4)      (11.4)
Exogenous regressors
Urban share                1.3         18.4        1.9         -.4         2.7         -.1
                           (2.8)       (18.4)      (2.9)       (2.5)       (5.6)       (2.5)
Foreign-born share         22.4        69.3        24.0        7.0         12.6        7.7
                           (14.4)      (48.9)      (14.7)      (9.4)       (14.8)      (9.5)
ln(Pop)                    -.43        -2.20       -.49        .34         .18         -.32
                           (.45)       (1.97)      (.49)       (.34)       (.54)       (.35)
Voting share               14.4        80.4        16.6        17.4        32.1        19.2
                           (8.2)       (73.3)      (18.8)      (7.2)       (24.1)      (7.8)
Manufacturing share                                            -22.2       -28.5       -23.0
                                                               (6.3)       (10.7)      (6.1)
Instrumental variables     HS1928      LnInc1940   HS1928,     HS1928      LnInc1940   HS1928,
                                                   LnInc1940                           LnInc1940
First-stage F-statistic*   19.0        0.7         10.6        19.7        2.6         11.3
J-test of overidentifying                          3.95                                0.48
restrictions                                       (p = .047)                          (p = .487)
N                          50          50          50          50          50          50

Notes: Heteroskedasticity-robust standard errors appear in parentheses under regression coefficients, and p-values appear in parentheses under F-statistics. All regressions include an estimated intercept, which is not reported. All regressions are estimated using a cross-sectional data set consisting of 50 US states, where the variables are defined in Table 1.
*The first-stage F-statistic is the F-statistic testing the hypothesis that the coefficients on the instruments in the first stage regression all equal zero.

Questions for Part II (25 points)
Answer these questions in blue book #2

1) (15 points) From the regressions in Table 2, select one or more preferred regressions that you believe provide the most reliable basis for inference about the effect of low education levels on corruption. Carefully explain your reasoning.

2) (5 points) Based on your preferred regression(s), what conclusions do you draw about the effect on corruption of the level of education? Explain.

3) (5 points) In your judgment, what are the most important threats to the internal validity of the estimates in your preferred regression(s), upon which you based your answer to question 2?
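
As a reading aid for Table 2, the sketch below screens the six regressions with the common rule of thumb that a first-stage F-statistic below 10 signals weak instruments. The threshold of 10 and the name `flag_weak_instruments` are assumptions made for illustration, not part of the exam; the F-statistics are copied from Table 2. A screen like this is only one input into choosing a preferred regression.

```python
# First-stage F-statistics copied from Table 2, keyed by regression number.
FIRST_STAGE_F = {1: 19.0, 2: 0.7, 3: 10.6, 4: 19.7, 5: 2.6, 6: 11.3}


def flag_weak_instruments(f_stats, threshold=10.0):
    """Return the regression numbers whose first-stage F is below the threshold.

    The default threshold of 10 is a conventional rule of thumb (an assumption here).
    """
    return sorted(k for k, f in f_stats.items() if f < threshold)


weak = flag_weak_instruments(FIRST_STAGE_F)
print("Regressions with first-stage F below 10:", weak)
```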


Background for Parts III and IV: The 2001 Tax Rebate

Because of an income tax cut enacted in May 2001, most U.S. taxpayers received a Federal tax rebate check between July and September 2001. Taxpayers received the check if they paid taxes in 2000 and if their income in 2000 was high enough. The maximum check size was $300 per taxpayer. Typically a family of two adults and two children with 2000 income at least $25,000 would have received the maximum $600; if their income was half that, they did not get a check. Because there were so many rebate checks, they were mailed over a ten-week period between July and September 2001. The week in which a check was mailed was determined by the second-to-last digit of the recipient’s social security number, a digit that is in effect randomly assigned. Approximately 20% of the checks were mailed in July, approximately 40% were mailed in August, and approximately 40% were mailed in September.

This study uses monthly, household-level panel data on consumption, personal characteristics, and the tax rebate (size and date of receipt). The data set consists of N = 13,066 households and T = 6 months. Of the 13,066 households, 7,709 received a rebate check, while 5,357 did not. Variable definitions are:

Variables in the Tax Rebate data set

Variable          Definition
C_it              dollars of consumption spending (i.e. spending on food, gasoline, insurance, rent, movies, etc.) by household i in month t
Rebate_it         dollar value of rebate check(s) received by household i in month t
AnyChildren_it    = 1 if household i in month t includes any children age 12 or less, = 0 otherwise
HHAge_it          age (in years) of head of household i in month t
#Adults_it        number of adults in household i in month t
LowIncome_i       = 1 if household income < $34,000 in June, = 0 otherwise


Part III: Tax Rebates (A)

The questions in this part refer to Table 3. The dependent variable in the regressions in Table 3 is the monthly change in the dollar value of consumption for a given household, ΔC_it = C_it – C_{i,t–1}.

Table 3
OLS Regression Results

Estimated using households who received a rebate check (all 6 months of data)
Dependent variable: ΔC_it

                          (1)         (2)
Rebate_t                  .247        .130
                          (.114)      (.185)
Rebate_{t–1}              -.172       -.067
                          (.097)      (.172)
Rebate_{t–2}              -.034
                          (.121)
LowIncome*                –           -5.1
                                      (13.8)
Rebate_t×LowIncome        –           .624
                                      (.266)
Rebate_{t–1}×LowIncome    –           -.459
                                      (.248)
Month fixed effects?      yes         yes
R²                        .022        .024
N                         7,709       7,709

F-statistics testing the hypothesis of zero coefficients on groups of variables:
Rebate_{t–1}, Rebate_{t–2}
                          3.36
                          (p = .032)
Rebate_t×LowIncome, Rebate_{t–1}×LowIncome
                          –           4.10
                                      (p = .024)

Notes: Heteroskedasticity-robust standard errors appear in parentheses below estimated coefficients. All regressions include month fixed effects (values of these coefficients are not reported). The regressions are estimated using panel data with T = 6 months of data (June through November) for the 7,709 households that received a rebate check.
*LowIncome does not vary over time (= 1 for all t if household income < $34,000 in June, = 0 for all t otherwise)


Questions for Part III (29 points)
Answer these questions in blue book #3

1) Using regression (1):

(a) (2 points) What is the estimated effect of a $600 rebate on consumption in the month in which the rebate is received?

(b) (2 points) Test the hypothesis that a rebate received in month t has no effect on the change in consumption in the second month after which it is received, that is, on ΔC_{t+2}.

2) Consider regression (1):

(a) (2 points) Would you expect the error term in this regression to be serially correlated? Why or why not?

(b) Whatever your answer to 2(a), suppose that this error term is in fact serially correlated.
(i) (2 points) What are the implications of this serial correlation for bias in the estimated causal effects? Explain.
(ii) (2 points) What are the implications of this serial correlation for the standard errors reported in the table? Explain.

3) Using the results of regression (1):

(a) Draw the following graphs. Clearly label the axes and provide the numerical values of the points (3 points each).
(i) The effect of a $1 rebate on the change of consumption, ΔC_t, in the month the rebate is received and the two subsequent months.
(ii) The effect of a $1 rebate on the level of consumption, C_t, in the month the rebate is received and the two subsequent months.

(b) (2 points) Of a $1 rebate received in July, how much is estimated to remain unspent by the end of September?

4) (3 points) During this period, the economy was emerging from a recession. A skeptic says: “The regression results show that, on average, consumption is increasing over this six-month period, but this could just be a consequence of the general economic recovery. Therefore, these regressions confuse the effect on consumption of the rebate with the broader effect of the overall economic recovery.” Do you agree or disagree? Why?

5) Using the results in regression (2), compare the estimated dynamic causal effects of the rebate for low-income families vs. non-low income families.

(a) (3 points) Is there statistically significant evidence that the dynamic effects differ for these two groups?

(b) (3 points) According to the estimated coefficients, which group (if any) has spent more of the rebate check after two months, and (if so) by how much? Briefly, explain.

(c) (2 points) Do these results accord with economic reasoning, or do they pose a puzzle? Briefly, explain.
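
One way to organize the arithmetic behind questions 1 and 3 is sketched below. The coefficients in regression (1) of Table 3 are read as the effect of a $1 rebate on the change in consumption at each lag, and their running sum as the effect on the level of consumption. The variable names are illustrative, and the running-sum reading assumes the distributed lag specification of Table 3 with no effects beyond two lags.

```python
from itertools import accumulate

# Coefficients on Rebate_t, Rebate_{t-1}, Rebate_{t-2} from Table 3, regression (1).
change_effects = [0.247, -0.172, -0.034]

# Effect of a $1 rebate on the level of consumption: running sum of the change effects.
level_effects = list(accumulate(change_effects))

# Estimated effect of a $600 rebate on the change in consumption in the month of receipt.
first_month_600 = 600 * change_effects[0]

for lag, (change, level) in enumerate(zip(change_effects, level_effects)):
    print(f"month t+{lag}: change effect {change:+.3f}, level effect {level:+.3f}")
print(f"$600 rebate, month of receipt: {first_month_600:.1f} dollars")
```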


Part IV: The 2001 Tax Rebate (B)

The questions in Part IV refer to Tables 4 and 5. Part IV uses the subset of the data for June and July for the following groups of households:

Group
I. 2-adult households that received a full $600 rebate in July;
II. 2-adult households that received a full $600 rebate in August or September;
III. 2-adult households that never received a rebate (were ineligible for a rebate)

Table 4 summarizes average consumption for these groups, by month:

Table 4
Group Average Consumption for June and July

Group     I (received in July)    II (received later)    III (never received)
June      C̄_{I,June}              C̄_{II,June}            C̄_{III,June}
July      C̄_{I,July}              C̄_{II,July}            C̄_{III,July}

For example, C̄_{I,June} is the average consumption of households in Group I in June.

Table 5 summarizes a probit regression, estimated using data for July only for groups I and II.

Table 5
Probit Regression Results

Dependent variable: = 1 if check received in July, = 0 otherwise
Data used for estimation: groups I and II, July only

                  Probit Coefficient
Intercept         -0.75
                  (0.04)
AnyChildren       0.11
                  (0.12)
HHAge             -.008
                  (.006)
F-statistic testing whether the coefficients on AnyChildren and HHAge are zero
                  1.42
                  (p = .241)

Notes: robust standard errors are given in parentheses below the estimated probit coefficients.


Questions for Part IV (21 points)
Answer these questions in blue book #4

For purposes of Part IV, the “rebate effect” is the effect of receiving a $600 tax rebate on household consumption of eligible households, in the month in which the rebate is received, holding all else constant.

1) Consider the following estimators of the rebate effect:

(a) C̄_{I,July} – C̄_{I,June}

(b) C̄_{I,July} – C̄_{II,July}

(c) C̄_{I,July} – C̄_{III,July}

(d) (C̄_{I,July} – C̄_{I,June}) – (C̄_{II,July} – C̄_{II,June})

(2½ points each) For each estimator (a) – (d), is this an unbiased estimator of the rebate effect? Briefly explain.

2) (3 points) Provide a regression equation by which the estimator in 1(d) can be computed by OLS regression estimated with household-level data for June and July.

3) Consider the probit regression in Table 5.

(a) (5 points) Using Table 5, compute the probability of receiving a check in July for an eligible household with one child, aged 6 years, in which the head of household is 30 years old.

(b) (3 points) Do the results in Table 5 support, or cast doubt on, the government’s claim that the month in which checks were mailed is effectively random? Explain.
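
The mechanics of a probit prediction such as the one in question 3(a) can be sketched as follows: form the linear index from the Table 5 coefficients, then evaluate the standard normal CDF at that index. The function names are illustrative assumptions. Setting AnyChildren = 1 for a household with a 6-year-old child follows the variable definition (any children age 12 or less).

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probit coefficients copied from Table 5.
INTERCEPT = -0.75
B_ANY_CHILDREN = 0.11
B_HH_AGE = -0.008


def probit_index(any_children, hh_age):
    """Linear index of the probit model in Table 5."""
    return INTERCEPT + B_ANY_CHILDREN * any_children + B_HH_AGE * hh_age


# Eligible household with one child aged 6 (AnyChildren = 1) and a 30-year-old head.
z = probit_index(any_children=1, hh_age=30)
p = standard_normal_cdf(z)
print(f"index = {z:.2f}, predicted probability = {p:.3f}")
```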


Background for Parts I and II: Voting on Women’s Issues

Parts I and II examine the relationship between the gender of a U.S. representative’s children and his/her voting record on “women’s issues.” The data pertain to votes taken during the 105th Congress (1997-1998; each Congress lasts two years). The observational unit is a U.S. representative (House of Representatives only – no senators). There are 435 representatives, but the study focuses on the 371 who have at least one child (regressions with fewer than n = 371 reflect some missing opinion survey data). Among these 371 representatives with at least one child, 89% are men and the mean age is 53.

Two voting measures are considered. The first (“Teen contraceptive”) is binary: whether the representative voted to support a specific bill that would increase teenagers’ access to contraceptives. The second (“NOW”) is a score ranging from 0 to 100 based on votes on multiple bills related to women’s issues, computed by the National Organization for Women (NOW), measuring the agreement between the representative’s votes and the voting recommendations made by NOW (0 to 100, with 100 = perfect agreement). The data set contains variables that measure the characteristics of the representative’s district and the results of a political opinion survey administered to voters in his/her district.

Variables in the Voting Data Set

Variable                      Definition
Teen contraceptive            = 1 if the representative voted in favor of a specific bill increasing teen access to contraception, = 0 otherwise
NOW                           Composite NOW voting score: 0 = complete disagreement with NOW’s positions, 100 = complete agreement with NOW’s positions
Fraction daughters            fraction of the representative’s children who are female (range is 0 to 1)

District characteristics
Registered Democrat           proportion of voters registered as Democratic Party (0 to 1)
District income               median income in district (thousands of dollars)
Fraction white                fraction of district voters who are white (0 to 1)
Fraction college grads        fraction of district voters who are college graduates (0 to 1)

District opinions
Abortion should be legal      fraction of survey respondents in district who agree (0 to 1)
Women are equal to men        fraction of survey respondents in district who agree (0 to 1)
Anti-crime spending should increase
                              fraction of survey respondents in district who agree (0 to 1)
Social service spending should increase
                              fraction of survey respondents in district who agree (0 to 1)
Should be laws to protect homosexuals from discrimination
                              fraction of survey respondents in district who agree (0 to 1)


The questions in Parts I and II refer to Table 1.

Table 1
The Effect of Having Daughters on Representatives’ Votes

Dependent variable: (1) and (2) Teen contraceptives?; (3) and (4) NOW; (5) Fract. daughters

                            (1)         (2)         (3)         (4)         (5)
Estimation method           Probit      OLS         OLS         OLS         OLS

Regressors
Intercept                   -0.51**     0.38**      40.2**      38.6**      0.07
                            (0.10)      (0.06)      (4.1)       (2.3)       (0.29)
Fraction daughters          0.36**      0.13**      6.18*       6.01*
                            (0.12)      (0.05)      (2.67)      (2.86)

District characteristics
Registered Democrat         0.71**      0.23**      84.27**     82.1**      0.20
                            (0.28)      (0.09)      (11.57)     (15.8)      (0.26)
District income                                                 0.21        0.00
                                                                (0.20)      (0.00)
Fraction white                                                  -8.6        0.08
                                                                (9.5)       (0.19)
Fraction college grads                                          -108.5      -1.72
                                                                (77.7)      (1.58)

District opinions
Abortion should be legal                                        41.0*       -0.32
                                                                (20.6)      (0.40)
Women are equal to men                                          -20.6       0.25
                                                                (23.1)      (0.29)
Anti-crime spending should increase
                                                                30.2        0.82
                                                                (18.7)      (0.52)
Social service spending should increase
                                                                -14.8       -1.53**
                                                                (16.8)      (0.47)
Should be laws to protect homosexuals from discrimination
                                                                10.2        -0.06
                                                                (13.9)      (0.36)

N                           371         371         371         331         331

F-statistics testing that the coefficients on variables in a group are all zero
District characteristics                                        0.93        1.10
                                                                (0.46)      (0.36)
District opinions                                               1.98        1.41
                                                                (0.081)     (0.220)

Notes: Heteroskedasticity-robust standard errors appear in parentheses under regression coefficients, and p-values appear in parentheses under F-statistics. The regressions are estimated using data on U.S. representatives during the 105th Congress (1997-1998). Significant at the: **1%, *5% significance level.


Questions for Part I (18 points). Please answer these questions in Blue Book I

1) Interpret the coefficient on Fraction daughters in regression (2). (3 points)

2) Consider a representative with 2 daughters and 1 son, from a district in which 55% of voters are registered Democrats.

a) Using regression (1), compute the probability that this representative voted in favor of the bill on teen access to contraception. (3 points)

b) Using regression (2), compute the probability that this representative voted in favor of the bill on teen access to contraception. (3 points)

3) Does the coefficient on Fraction daughters change substantially (in a real-world sense) from regression (3) to regression (4)? What does this tell you about the additional variables that were included in regression (4)? (3 points)

4) A critic asserts that a shortfall of this study is that it focuses exclusively on daughters, indicating gender bias by the author. The critic suggests adding one more regressor to regression (4), specifically, Fraction sons, which is the fraction of males among the representative’s children. What would be learned from this regression? Be specific. (3 points)

5) Another critic suggests that more conservative districts might elect representatives with fewer daughters, so that Fraction daughters is endogenous. The author responds that regression (5) provides evidence against this hypothesis, because Fraction daughters is (with only one exception) unpredictable by the other regressors and thus is exogenous. Do you agree or disagree with the author’s response? Why? Be precise. (3 points)
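
The contrast between the probit in regression (1) and the linear probability model in regression (2) can be made concrete with a short sketch. It uses the intercept, Fraction daughters, and Registered Democrat coefficients from Table 1 and the representative described in question 2 (2 daughters among 3 children, 55% registered Democrats). The helper names are illustrative assumptions.

```python
import math


def standard_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# (intercept, Fraction daughters, Registered Democrat) coefficients from Table 1.
PROBIT_1 = (-0.51, 0.36, 0.71)  # regression (1), probit
LPM_2 = (0.38, 0.13, 0.23)      # regression (2), OLS linear probability model


def linear_index(coefs, fraction_daughters, registered_democrat):
    """Intercept plus coefficients times regressors."""
    b0, b1, b2 = coefs
    return b0 + b1 * fraction_daughters + b2 * registered_democrat


fraction_daughters = 2 / 3  # 2 daughters out of 3 children
registered_democrat = 0.55  # 55% of voters registered as Democrats

# Probit: the predicted probability is the normal CDF of the index.
p_probit = standard_normal_cdf(linear_index(PROBIT_1, fraction_daughters, registered_democrat))
# Linear probability model: the predicted probability is the index itself.
p_lpm = linear_index(LPM_2, fraction_daughters, registered_democrat)

print(f"probit: {p_probit:.3f}, linear probability model: {p_lpm:.3f}")
```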


Questions for Part II (24 points). Please answer these questions in Blue Book II

1) The following questions concern regression (4):

a) Provide a potential reason why the coefficient on district income in (4) is subject to omitted variable bias. (2 points)

b) Comment on the following statement: Your answer to the previous question implies that the conditional mean of the error term in (4) is nonzero, given the regressors in (4). Therefore, the first least squares assumption is violated and the coefficient on Fraction daughters in (4) does not have a causal interpretation. (3 points)

For the remaining questions, suppose (hypothetically) that the data set is extended to be panel data for T = 3 Congresses, the 105th (1997-1998), 106th (1999-2000), and 107th (2001-2002) Congresses. The observational unit would be a representative (his/her votes, children, and district) in a given Congressional session. The data set would consist of all representatives who were elected to Congress for all three sessions. Suppose n = 300, so there is a total of 900 observations (representatives are elected for two-year terms, and almost all who run for reelection are reelected).

2) Representatives in the 105th Congress who retire, are not reelected, or die would be in the cross-sectional data set used in Table 1, but would not be in the panel data set. Would this introduce sample selection bias into the panel data estimate of the effect of Fraction daughters? (3 points)

Regardless of your answer to question (2), for the rest of these questions, ignore the possibility of sample selection bias.

3) To what extent would including representative fixed effects address the “endogeneity” criticism raised in the first sentence of Part I, question 5? Explain. (3 points)

4) Would it be appropriate to include time fixed effects, in addition to representative fixed effects, in the panel data regression? Explain. (3 points)

5) Consider a hypothetical panel data version of regression (4) in Table 1, in which both representative fixed effects and time fixed effects are included. Call this hypothetical regression (P4) (“P” for panel).

a) What is the problem that is solved by “clustered” or “HAC” standard errors, and how do clustered standard errors solve that problem? (3 points)

b) In regression (P4), which would you recommend using: conventional (heteroskedasticity-robust) standard errors or clustered standard errors? Explain, with specific reference to regression (P4). (3 points)

c) Suppose that the author estimated regression (P4), using the standard errors you recommended in part (b). Using your judgment, do you think that these standard errors in hypothetical panel regression (P4) would be smaller, larger, or about the same as those in the cross-section regression (4) in Table 1? Explain. (3 points)


Background to Parts III and IV: Female Labor Supply

Harvard economist Claudia Goldin attributes much of the rise of professional women in the U.S. labor force to their ability to engage in family planning after the introduction of the birth-control pill. In developing countries early childbearing is associated with lower levels of education and more dependency of women on their husband’s earnings. This question examines the effect of family size on female labor supply. The data set consists of observations on n = 254,654 married women, aged 21 – 35, who have at least two children. The data come from the 1980 U.S. Census of the Population (the data pertain to the full calendar year of 1979).

Variables in the Female Labor Supply Data Set

Variable                      Definition
Wife’s weeks worked           No. of weeks wife worked for pay in 1979
Husband’s weeks worked        No. of weeks husband worked for pay in 1979
Same sex                      = 1 if first two children have same sex, = 0 otherwise
2 boys                        = 1 if first two children are boys, = 0 otherwise
2 girls                       = 1 if first two children are girls, = 0 otherwise
Kids>2                        = 1 if family has more than 2 children, = 0 otherwise
Boy first                     = 1 if first child is a boy, = 0 otherwise
Current age of mother         age of mother in 1979
Age of mother at 1st birth    age of mother at birth of first child
Black                         = 1 if black
Hispanic                      = 1 if Hispanic
Other race                    = 1 if nonwhite/nonblack/nonHispanic


The questions in Parts III and IV refer to Table 2.

Table 2
Child Sex Composition, Family Size, and Labor Supply

Dependent variable: (1) and (2) Kids>2; (3), (4), and (5) Wife’s weeks worked; (6) Husband’s weeks worked
Estimation method: (1), (2), and (3) OLS; (4), (5), and (6) TSLS
Instruments: (4) Same sex; (5) 2 boys, 2 girls; (6) Same sex

                              (1)        (2)        (3)        (4)        (5)        (6)
Regressors
Same sex                      .0694**
                              (.0018)
2 boys                                   .0599**
                                         (.0026)
2 girls                                  .0789**
                                         (.0026)
Kids>2                                              -8.04**    -5.40**    -5.16**    1.01
                                                    (0.09)     (1.21)     (1.20)     (0.63)
Boy first                     -.0011     -.0015     -0.05      -0.02      -0.02      0.03
                              (.0019)    (.0026)    (0.08)     (0.08)     (0.08)     (0.08)
Current age of mother         .0304**    .0304**    1.33**     1.25**     1.25**     0.10*
                              (.0003)    (.0003)    (0.01)     (0.04)     (0.04)     (0.04)
Age of mother at 1st birth    -.0436**   -.0436**   -1.36**    -1.24**    -1.24**    -0.21**
                              (.0003)    (.0003)    (0.17)     (0.05)     (0.05)     (0.06)
Black                         .0680**    .0680**    10.83**    10.66**    10.64**    -4.10**
                              (.0042)    (.0042)    (0.19)     (0.21)     (0.21)     (0.26)
Hispanic                      .1260**    .1260**    -0.04      -0.38      -0.41      -2.61**
                              (.0039)    (.0039)    (0.18)     (0.23)     (0.23)     (0.23)
Other race                    .0480**    .0480**    2.82**     2.70**     2.69**     2.02**
                              (.0044)    (.0044)    (0.20)     (0.21)     (0.21)     (0.18)

N                             254,654    254,654    254,654    254,654    254,654    254,654
F-statistic on Same sex       1413.0
F-statistic on 2 boys,                   725.9
2 girls
J-statistic                                                               3.24

Notes: Regressions (4), (5), and (6) are estimated by two stage least squares (TSLS) regression, in which the included endogenous variable is Kids>2. Heteroskedasticity-robust standard errors appear in parentheses under regression coefficients, and p-values appear in parentheses under F-statistics. All regressions include an estimated intercept, which is not reported. Regressions (1) – (5) are estimated using data on married women for 1979; regression (6) is estimated using data for the husbands of those married women. Significant at the: **1%, *5% significance level.


Questions for Part III (21 points). Please answer these questions in Blue Book III

1) Give the best reason you can why the OLS estimator of the coefficient on Kids>2 in Table 2, column (3) might be biased. (3 points)

2) Consider the hypothesis that, on average, U.S. parents want to have children of both genders (that is, they prefer at least one girl and one boy to all girls or all boys). Does Table 2 provide evidence in favor of this hypothesis, against this hypothesis, or neither? Explain. (3 points)

3) Consider the following potential instrumental variables for Kids>2 in regression (3):

a) Whether wife came from large family (binary) (3 points)

b) The teen pregnancy rate in the wife’s city or town of residence (3 points)

For each proposed instrument, is the variable arguably a valid instrumental variable? Briefly explain.

4) Based on a combination of your judgment and the empirical results in Table 2:

a) Is Same sex a valid instrument in regression (4)? (3 points)

b) Is the pair of variables, 2 boys and 2 girls, a valid set of instruments in regression (5)? (3 points)

5) The estimated coefficient on Kids>2 differs in regressions (3) and (4) (the OLS estimate is more negative than the TSLS estimate). Provide a real-world explanation (an interpretation of the results) that explains why the OLS estimate is more negative than the TSLS estimate. (3 points)


Questions for Part IV (17 points). Please answer these questions in Blue Book IV

1) Consider a hypothetical regression (7),

Wife’s weeks worked_i = β_0 + β_1 Kids>2_i + u_i    (7)

which would be estimated by TSLS, using Same sex as an instrument (so regression (7) is regression (4) without the variables Boy first, …, Other race). For this question, assume that Same sex is a valid instrument in regression (4) and in addition that Same sex is distributed independently of all the control variables in regression (4), so E(Boy first|Same sex) = E(Boy first), …, E(Other race|Same sex) = E(Other race).

a) Explain why Same sex would be a valid instrument in regression (7). (3 points)

b) Provide a reason why, despite the validity of Same sex as an instrument in regression (7), you would still prefer regression (4). (3 points)

2) Some women are more ambitious professionally than others. Suppose that the effect on labor force participation of having a large family is not the same for every woman; specifically, the more ambitious the woman, the smaller is the effect (the most ambitious women will work whether or not they have a large family). How – if at all – would this change your interpretation of the results in regressions (4) and (5)? Explain your reasoning. (5 points)

Use Table 2 to comment on the following statements. For each statement, do you agree or disagree with the statement, and explain why (be specific).

3) Families with large numbers of children tend to be unusual in certain ways, in some cases coming from certain religious/ethnic backgrounds (traditional Catholic families, Mormons, etc.). So the analysis in regressions (4) and (5) is not providing a valid estimate of the effect of family size on labor supply; it just reflects this religious/ethnic effect. (3 points)

4) Even though having large families reduces female labor force participation, this is only half of the story because their husbands will work more to compensate for the loss of the wife’s earnings. (3 points)


Background to Part V: The Term Spread and Output Growth

The U.S. Treasury issues bonds of different maturities. A 10-year bond is debt that is paid off over 10 years. A one-year bond is debt that is paid off over one year. Usually, the rate of interest on a 10-year bond exceeds the rate of interest on a one-year bond. If short-term interest rates are unusually high, however, then the rate of interest on a one-year bond can exceed the rate of interest on a 10-year bond. The difference between the rate of interest on a long-term bond (here, the 10-year bond) and the rate of interest on a short-term bond (here, the one-year bond) is called the Term Spread. If the 10-year rate is 4.5 (percent) and the 1-year rate is 3.5 (percent), then the spread is 1.0 (percentage points).

The Term Spread is often viewed as a measure of monetary policy. If monetary policy is especially tight, then short-term interest rates are high, relative to long-term interest rates, and the term spread is negative. Over the past few months, the Term Spread in the U.S. has fallen, and just recently it became negative for the first time since the onset of the recession in 2000.

The Term Spread data set contains quarterly time series data for the U.S. from the first quarter of 1960 (1960:I) through the third quarter of 2005 (2005:III). The data are plotted in Figure 1.

Variables in Term Spread Data Set

Variable       Definition
GDP growth     quarterly growth rate of GDP, expressed in percent at an annual rate (computed using the logarithmic approximation, GDP growth = 400 ln(GDP_t/GDP_{t–1}), where GDP_t is the real Gross Domestic Product of the U.S. in quarter t). (Quarterly GDP is the total value of final goods and services produced in the United States in that quarter.)
Term Spread    the interest rate on a 10-year U.S. Treasury bill, minus the interest rate on a 1-year U.S. Treasury bill.
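
The two variable definitions translate directly into code. The sketch below is illustrative only: the function names and the GDP levels (100.0 and 101.0) are hypothetical, while the interest rates 4.5 and 3.5 are the example values from the background text.

```python
import math


def annualized_gdp_growth(gdp_now, gdp_prev):
    """Quarterly growth in percent at an annual rate: 400 * ln(GDP_t / GDP_{t-1})."""
    return 400.0 * math.log(gdp_now / gdp_prev)


def term_spread(rate_10_year, rate_1_year):
    """Term Spread: long-term rate minus short-term rate, in percentage points."""
    return rate_10_year - rate_1_year


# Hypothetical real GDP levels: a 1% rise from one quarter to the next.
growth = annualized_gdp_growth(101.0, 100.0)

# Example rates from the background text.
spread = term_spread(4.5, 3.5)

print(f"GDP growth: {growth:.2f}% at an annual rate; Term Spread: {spread:.1f}")
```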


[Figure 1: two panels plotted against time, with the horizontal axis running from 1960q1 to 2010q1: “Quarterly GDP growth at an annual rate” and “Term Spread: 10-year minus 1-year”.]

Figure 1. Time series plots of quarterly GDP growth and Term Spread, 1960:I – 2005:III


The questions in Part V refer to Table 3.

Table 3
GDP Growth and the Term Spread

Dependent variable: GDP growth_t
Sample period: (1), (2), and (3) 1960:I – 2005:III; (4) 1960:I – 1984:IV; (5) 1985:I – 2005:III

                      (1)        (2)        (3)        (4)        (5)
Regressors
Intercept             2.42**     2.04**     1.85**     2.05**     2.07**
                      (0.38)     (0.52)     (0.45)     (0.56)     (0.57)
GDP growth_{t–1}      0.27**     0.24**     0.26**     0.23*      0.25*
                      (0.08)     (0.08)     (0.07)     (0.10)     (0.12)
GDP growth_{t–2}                 0.18
                                 (0.14)
GDP growth_{t–3}                 -0.06
                                 (0.08)
GDP growth_{t–4}                 0.01
                                 (0.10)
Term Spread_{t–1}                           0.67**     1.56**     0.18
                                            (0.25)     (0.44)     (0.20)

Quandt Likelihood Ratio (QLR) statistic (p-value in parentheses)
                      1.18       1.71       5.37       2.59       2.88
                      (0.41)     (0.32)     (0.03)     (0.26)     (0.24)
T                     183        183        183        100        83
SER                   3.3        3.2        3.1        3.8        1.93

F-statistic testing zero coefficients on GDP growth_{t–2}, GDP growth_{t–3}, and GDP growth_{t–4} (p-value in parentheses)
                                 1.27
                                 (0.29)

Notes: Estimation is by OLS, with heteroskedasticity-robust standard errors in parentheses. The regressions are estimated over the sample period given in the first row. The QLR statistic is for all the regressors in the regression, including the intercept. Significant at the: **1%, *5%, +10% significance level.


Questions for Part V (20 points). Please answer these questions in Blue Book V

1) The value of GDP growth in 2005:III was 4.1 (that is, in the third quarter of 2005, GDP grew by 4.1% at an annual rate).

a) Use regression (1) in Table 3 to compute a forecast of GDP growth for 2005:IV. (3 points)

b) Suppose that the errors in regression (1) are normally distributed. Compute a 95% prediction interval (forecast interval) for GDP growth in 2005:IV. (3 points)

c) Suppose that forecast errors come in clusters, for example, some years have more volatile GDP growth than others, so that GDP growth is more difficult to predict in some years than in others. Suggest a modification of regression (1) in Table 3 that would produce more reliable forecast intervals if there is this forecast error volatility clustering. (2 points)

2) Table 3 reports heteroskedasticity-robust standard errors. Should it report HAC standard errors instead? Explain. (2 points)

3) In Business Week Online (January 9, 2006), David Wyss, chief economist for Standard and Poor’s, wrote about how the recent decline of Term Spread has created worries about a slowdown in U.S. economic growth. Based on the results in Table 3, do you think that these worries are justified? Fully explain your reasoning. (5 points)

4) Suppose the U.S. Federal Reserve Bank is considering setting Term Spread to 1.0, that is, increasing Term Spread from its current value of approximately zero by 1.0 percentage point. (Suppose that, because long rates are more sluggish than short rates, the Fed can do this by lowering short-term interest rates until Term Spread equals 1.0.)

a) Use regression (5) to estimate the effect of this easing. (1 point)

b) In your judgment, do you think that your answer in (a) provides a good estimate of the effect of this proposed policy intervention by the Fed? Why or why not? (4 points)
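
For the forecasting calculations in question 1, a minimal sketch is given below. It plugs the 2005:III value of GDP growth into the first-order autoregression in regression (1) of Table 3, then forms an approximate 95% interval as the forecast plus or minus 1.96 times the SER. Using the SER in place of the root mean squared forecast error ignores estimation error in the coefficients; that simplification, and the variable names, are assumptions of this sketch.

```python
# Regression (1) in Table 3: GDP growth_t = 2.42 + 0.27 * GDP growth_{t-1}, SER = 3.3.
INTERCEPT = 2.42
AR1_COEFFICIENT = 0.27
SER = 3.3

gdp_growth_2005q3 = 4.1  # given in question 1

# One-step-ahead forecast for 2005:IV.
forecast_2005q4 = INTERCEPT + AR1_COEFFICIENT * gdp_growth_2005q3

# Approximate 95% forecast interval under normal errors, with the SER standing in
# for the root mean squared forecast error (an assumption of this sketch).
half_width = 1.96 * SER
interval_95 = (forecast_2005q4 - half_width, forecast_2005q4 + half_width)

print(f"forecast: {forecast_2005q4:.3f}")
print(f"95% interval: ({interval_95[0]:.3f}, {interval_95[1]:.3f})")
```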


Background for Par ts I and II : Nature vs. Nur ture What is the relative importance of “nature” (genes) vs. “nurture” (social and family environment) in determining economic outcomes? This part examines this question using data from a large adoption agency that placed Korean children in American families between 1964 and 1985. At this agency, the parents must file an application, pass a criminal background check, and attend adoption classes; if all goes well, they are then deemed eligible. Children are then matched with eligible parents on a first-come, first-serve basis. The data set contains data on the parents and their children, both adopted and non-adopted (natural), at the time of adoption and also at the end of the study when they are adults. Some households have multiple adoptees; for the purpose of this analysis, assume that the Korean adoptees in the same household are not related by blood. The analysis is restricted to adoptees who are at least 25 years of age at the end of the study.

Variables in the Adoption Data Set

Variable                  Definition

Child’s characteristics upon adoption
Adopted                   = 1 if adoptee, = 0 if non-adopted
Weight at adoption        Weight of child upon adoption (pounds)
Height at adoption        Height of child upon adoption (inches)

Child’s characteristics at end of study (as an adult)
Child’s education         Years of education of adult child
College grad              = 1 if adult child graduates from a 4-year college, = 0 otherwise
Child’s income            Income of adult child
Child’s BMI               BMI of adult child. The BMI is the Body Mass Index, which is weight (in kilograms) divided by the square of height (in meters), so units are kg/m².
Child drinks              = 1 if adult child drinks alcohol, = 0 otherwise

Parent characteristics
Mother’s education        Years of education of mother
Father’s education        Years of education of father
Log Parent's Income       Natural logarithm of parent’s income in dollars
Mother's BMI              BMI of mother (kg/m²)
Father’s BMI              BMI of father (kg/m²)
Mother drinks             = 1 if mother drinks alcohol, = 0 otherwise
Father drinks             = 1 if father drinks alcohol, = 0 otherwise
Year binary variables     Binary variables indicating the year of adoption (the first year of the program is the omitted or “base” year)
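The BMI definition in the variable list can be written as a one-line function. The sketch below is only an illustration: the function names, the unit-conversion helper, and the example person (70 kg, 1.75 m) are assumptions introduced here, not part of the data set.

```python
# Conversion factors (exact by definition of the international pound and inch).
LB_TO_KG = 0.45359237
IN_TO_M = 0.0254


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """Same BMI, starting from pounds and inches (the units used at adoption)."""
    return bmi(weight_lb * LB_TO_KG, height_in * IN_TO_M)


# Hypothetical adult, not taken from the data: 70 kg and 1.75 m tall.
example_bmi = bmi(70.0, 1.75)
print(f"BMI = {example_bmi:.2f} kg/m^2")
```

The imperial helper is included because weight and height at adoption are recorded in pounds and inches, while BMI is defined in metric units.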


Table 1. Regression of adoptee height and weight at adoption on pre-adoption parental characteristics

                               (1)         (2)         (3)         (4)
Dependent variable:      Weight at   Height at   Weight at   Height at
                          adoption    adoption    adoption    adoption
                          (pounds)    (inches)    (pounds)    (inches)

Regressors:
Mother's Education          -0.008      -0.067       0.017      -0.038
                           (0.097)     (0.095)     (0.088)     (0.078)
Father's Education          -0.028       0.046      -0.047       0.005
                           (0.077)     (0.077)     (0.073)     (0.069)
Log Parent's Income        0.508**     0.707**      -0.119       0.034
                           (0.197)     (0.201)     (0.277)     (0.244)
Mother's BMI                -0.019      -0.023      -0.051      -0.065
                           (0.039)     (0.039)     (0.037)     (0.034)
Father's BMI                 0.000       0.022      -0.029      -0.033
                           (0.046)     (0.046)     (0.047)     (0.048)
Mother Drinks               -0.125      -0.187       0.017       0.050
                           (0.463)     (0.454)     (0.456)     (0.416)
Father Drinks                0.241      -0.271       0.288      -0.266
                           (0.479)     (0.471)     (0.473)     (0.426)

Year binary variables?          No          No         Yes         Yes
Observations                   989        1038         989        1038
Adjusted R-squared            0.02        0.03        0.14        0.28

F-statistic testing: coefficients on parental variables = 0 (p-value)
                              2.87        2.73        0.66        0.84
                           (0.008)     (0.009)     (0.709)     (0.553)

Notes: All regressions are estimated by OLS. Clustered standard errors are given in parentheses, where the clustering occurs at the level of the family. All regressions include an intercept, which is not reported. * significant at 5%; ** significant at 1%.


Table 2. Regression of adoptee outcome variables on pre-adoption parental characteristics

                                          (1)          (2)          (3)          (4)          (5)          (6)
Dependent variable:                   Child's      Child's      College  Log Child's      Child's        Child
                                     Years of     Years of         Grad       Income          BMI       Drinks
                                    Education    Education

Regressors:
Mother's Education                    0.097**      0.084**       0.021*        0.016       -0.081        0.010
                                      (0.027)      (0.031)      (0.008)      (0.013)      (0.061)      (0.009)
Father's Education                     -0.001       -0.041       -0.004       -0.004       -0.037        0.010
                                      (0.032)      (0.055)      (0.007)      (0.011)      (0.052)      (0.007)
Log parent's income                    -0.018       -0.005        0.011        0.024       -0.412        0.015
                                      (0.113)      (0.032)      (0.027)      (0.040)      (0.219)      (0.028)
Mother's BMI                         -0.088**        0.180     -0.017**       -0.004        0.006       -0.001
                                      (0.024)      (0.183)      (0.004)      (0.006)      (0.028)      (0.004)
Father's BMI                            0.007       -0.008       -0.000       -0.000       -0.004        0.004
                                      (0.020)      (0.112)      (0.004)      (0.007)      (0.038)      (0.004)
(Mother's BMI)²                            --       -0.091           --           --           --           --
                                                   (0.118)
(Father's BMI)²                            --       -0.081           --           --           --           --
                                                   (0.202)
(Mother's BMI) x (Father’s BMI)            --        0.274           --           --           --           --
                                                   (0.206)
Mother Drinks                          -0.039     -0.715**       -0.043       -0.007       -0.345      0.135**
                                      (0.205)      (0.175)      (0.046)      (0.066)      (0.392)      (0.045)
Father Drinks                           0.263        0.000        0.050        0.030        0.580        0.061
                                      (0.212)      (0.002)      (0.048)      (0.070)      (0.396)      (0.046)
Child is Male                        -0.723**       -0.004     -0.159**     -0.259**      1.927**        0.068
                                      (0.177)      (0.003)      (0.041)      (0.059)      (0.301)      (0.040)
Constant                             16.902**        0.002      0.766**      3.758**     31.183**        0.121
                                      (1.063)      (0.004)      (0.264)      (0.466)      (2.350)      (0.315)

Year binary variables?                    Yes          Yes          Yes          Yes          Yes          Yes

F-statistic testing: (Mother's BMI)², (Father’s BMI)², (Mother’s BMI) x (Father’s BMI) = 0 (p-value)
                                           --         0.57           --           --           --           --
                                                   (0.634)

Observations                              897          897          897          874          878          893
Adjusted R-squared                       0.03         0.03         0.01         0.06         0.04         0.04

Notes: All regressions are estimated by OLS. Clustered standard errors are given in parentheses, where the clustering occurs at the level of the family. * significant at 5%; ** significant at 1%.


Table 3. Probit regressions of outcome variables on pre-adoption parental characteristics for adoptee and non-adoptee children

                                 (1)           (2)           (3)           (4)
Dependent variable:     College Grad  Child Drinks  College Grad  Child Drinks
Data are for:               Adoptees      Adoptees  Non-adoptees  Non-adoptees

Regressors:
Mother's Education           0.057**         0.013       0.097**         0.032
                             (0.019)       (0.021)       (0.025)       (0.025)
Father's Education            -0.010         0.022       0.105**         0.021
                             (0.017)       (0.018)       (0.020)       (0.022)
Log Parent's Income            0.008         0.079         0.108        -0.067
                             (0.064)       (0.066)       (0.076)       (0.077)
Mother's BMI                -0.086**         0.000      -0.108**         0.000
                             (0.019)       (0.009)       (0.022)       (0.013)
Father's BMI                  -0.003         0.000       -0.030*         0.010
                             (0.010)       (0.010)       (0.012)       (0.015)
Mother Drinks                 -0.054       0.374**         0.039       0.489**
                             (0.109)       (0.106)       (0.128)       (0.131)
Father Drinks                  0.042         0.211         0.134       0.611**
                             (0.112)       (0.110)       (0.132)       (0.132)
Child is Male               -0.397**        0.203*        -0.063       0.355**
                             (0.090)       (0.097)       (0.098)       (0.100)
Constant                       0.142       -1.300*      -1.680**       -1.396*
                             (0.566)       (0.570)       (0.607)       (0.669)

Year binary variables?           Yes           Yes           Yes           Yes
Observations                    1088          1083           943           933

Notes: All regressions are probit. Clustered standard errors are given in parentheses, where the clustering occurs at the level of the family. * significant at 5%; ** significant at 1%.


Part I (24 points)
Please answer these questions in Blue Book I

The questions in Part I refer to the results in Tables 1 and 2.

1) Using regression (1) in Table 2:

a) (3 points) Compute the estimated effect on the child’s years of education of an increase of four years in the mother’s education.

b) (2 points) Compute a 95% confidence interval for your estimated effect in (a).

2) Consider the relationship between the child’s years of education and parental BMI, holding constant the regressors in Table 2, column (1) other than parental BMI.

a) (2 points) Suggest a reason why this effect might be nonlinear.

b) (2 points) Can you reject the null hypothesis that the effect of parental BMI on the child’s years of education is linear? Explain.

3) Consider the regressions in Table 1.

a) (2 points) Explain why these regressions can be used to examine the proposition that the assignment process of adoptees to families was in effect random.

b) (2 points) Using regressions (1) and (2), can you reject the hypothesis of random assignment? Explain.

c) (2 points) Using regressions (3) and (4), can you reject the hypothesis of random assignment? Explain.

d) (3 points) Explain what your answers to (b) and (c) imply about the program. Explain, in real-world, concrete terms, how you might reconcile any discrepancy between your answers to (b) and (c).

4) The standard errors reported in Tables 1 and 2 are “clustered” standard errors, clustered at the level of the household.

a) (3 points) Explain specifically what this means, that is, what are clustered standard errors, clustered at the level of the household? Be precise.

b) (3 points) Provide a reason why the clustered standard errors could be larger than the conventional heteroskedasticity-robust standard errors for the regressions in Table 2.
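The arithmetic behind question 1 of Part I (scaling a coefficient by the size of the change and building a 95% confidence interval around the scaled effect) can be sketched as follows. This is an illustrative calculation, not an official answer key: the function name is hypothetical, and the 1.96 critical value assumes the usual large-sample normal approximation. The coefficient and standard error are read from the Mother's Education row of regression (1) in Table 2.

```python
def scaled_effect_ci(coef: float, se: float, change: float, z: float = 1.96):
    """Effect of changing a regressor by `change` units, with a normal-based CI.

    The standard error of (change * coef) is |change| * se, so the interval
    is simply the coefficient's interval multiplied by the size of the change.
    """
    effect = coef * change
    effect_se = abs(change) * se
    return effect, effect_se, (effect - z * effect_se, effect + z * effect_se)


# Table 2, regression (1): Mother's Education, 0.097 with clustered SE 0.027.
effect, effect_se, (lower, upper) = scaled_effect_ci(coef=0.097, se=0.027, change=4)
print(f"effect = {effect:.3f} years, SE = {effect_se:.3f}, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval excludes zero, which is consistent with the two stars on the coefficient in the table.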


Part II (22 points)
Please answer these questions in Blue Book II

The questions in Part II refer to the results in Tables 2 and 3.

1) Consider a female adoptee whose adoptive mother has 14 years of education, whose father has 16 years of education, whose parents’ income is $50,000, mother’s BMI is 23, father’s BMI is 24, the mother does not drink, and the father does not drink. Also suppose that the child was adopted in the initial program year (so all binary year variables equal zero).

a) (3 points) Using regression (2) in Table 3, compute the predicted probability that the adoptee grows up to be a drinker.

b) (2 points) What is the difference in the predicted probabilities of drinking for the adoptee in (a), compared with an adoptee whose parents have the same characteristics as those in (a) except that the mother drinks?

c) (2 points) Now use the linear probability model from Table 2 to estimate the change in predicted probabilities for the comparison in 1(b) (that is, a nondrinking vs. a drinking mother, with the values of the other regressors given at the beginning of this question).

2) Using the results in Tables 2 and 3, do you agree or disagree with the following statements? Explain.

a) (5 points) Many countries impose restrictions on foreign adopting parents, including limits on parental BMI and parents’ education. The results in Tables 2 and 3 support these policies in the sense that Tables 2 and 3 show that high parental BMI and low parental education both are associated with worse outcomes for adoptees.

b) (5 points) The results in Tables 2 and 3 show that dieting by overweight mothers has positive benefits for children. Specifically, consider a mother who decreases her BMI by 10 (for an obese woman, this corresponds to a weight drop of approximately 25%). On average, holding other family characteristics constant, we would expect to see this weight loss lead to an economically substantial increase in the child’s years of education and in the child’s probability of graduating from college.

c) (5 points) The results in Table 3 shed light on the “nature-nurture” debate. These tables show that paternal characteristics (such as drinking and being overweight) are transmitted primarily through a genetic path, whereas maternal characteristics seem to be transmitted primarily through a non-genetic (that is, environmental) path.
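The probit calculation requested in question 1 of Part II can be sketched in a few lines. This is a hedged illustration rather than an answer key: the dictionary keys and function names are made up for this example, the coefficients are copied from regression (2) of Table 3, and "Log Parent's Income" is taken to be the natural log of $50,000 as the variable definitions state. The linear probability comparison uses the Mother Drinks coefficient from regression (6) of Table 2.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the standard library's error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Table 3, regression (2): probit for Child Drinks, adoptees.
COEF = {
    "constant": -1.300,
    "mother_educ": 0.013,
    "father_educ": 0.022,
    "log_parent_income": 0.079,
    "mother_bmi": 0.000,
    "father_bmi": 0.000,
    "mother_drinks": 0.374,
    "father_drinks": 0.211,
    "child_male": 0.203,
}


def predicted_probability(x: dict) -> float:
    """Probit prediction: Phi(sum over regressors of coefficient * value)."""
    index = sum(COEF[name] * x[name] for name in COEF)
    return normal_cdf(index)


# The adoptee described in question 1 (year dummies are all zero).
base = {
    "constant": 1.0,
    "mother_educ": 14.0,
    "father_educ": 16.0,
    "log_parent_income": math.log(50_000),
    "mother_bmi": 23.0,
    "father_bmi": 24.0,
    "mother_drinks": 0.0,
    "father_drinks": 0.0,
    "child_male": 0.0,
}
with_drinking_mother = dict(base, mother_drinks=1.0)

p_base = predicted_probability(base)
p_drinking_mother = predicted_probability(with_drinking_mother)
probit_change = p_drinking_mother - p_base

# Table 2, regression (6): in a linear probability model the change is the coefficient.
lpm_change = 0.135

print(f"probit: {p_base:.3f} -> {p_drinking_mother:.3f} (change {probit_change:.3f})")
print(f"linear probability model change: {lpm_change:.3f}")
```

With these inputs the probit change comes out at roughly 0.14, close to the 0.135 implied by the linear probability model; the two need not agree in general because the probit change depends on where the index starts.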


Background for Part III: Fast-Food TV Advertising and Childhood Obesity

Childhood obesity is a health problem of significant concern. In the 1960s, approximately 4 percent of American children ages 6 to 11 were overweight; by 1999, 13 percent of American children were overweight. Measured in terms of BMI, the average BMI for children rose from 16.63 in the 1960s to 17.37 in 1999, an increase of almost 5%; this is a large increase in historical and medical terms. [The BMI is the body mass index, which is weight (in kilograms) divided by the square of height (in meters), so the units of the BMI are kg/m².]

A shift to a high-fat, high-calorie childhood diet – the sort of food found at fast-food restaurants – is one possible reason for the increase in childhood BMI. This section considers whether exposure to fast-food advertising on TV plays a role in this increase. The data set is a cross-sectional data set on children aged 6-11 in the U.S. in 1997. It contains data on children’s characteristics, family characteristics, TV viewing by the child, and characteristics of the child’s county.

Variables in the Childhood BMI Data Set

Variable                        Definition

Child characteristics
BMI                             Child’s BMI (kilograms/meter²)
TV Exposure                     Number of hours per week of fast-food TV ads seen by the child
Age                             Child’s age (years)
Other individual variables      Child’s race and sex, family income, mother’s BMI, and mother employed/not employed

County characteristics
Price of TV advertising         Average price of TV advertising in the child’s county in 1997 ($/second)
Number of households with TV    Number of households in the child’s county with a TV (hundreds of thousands)
Temperature                     Average annual temperature in child’s county (degrees Fahrenheit)
Other county variables          Number of fast-food restaurants per capita, number of full-service restaurants per capita, and price indexes for fast-food restaurant meals, full-service restaurant meals, and at-home restaurant meals


Table 4. Children’s BMI and Fast-Food TV Advertising

                                            (1)              (2)              (3)
Dependent variable:                         BMI      TV exposure              BMI
Estimation method:                          OLS              OLS        Two Stage
                                                                  Least Squares^a

Regressors:
TV exposure                              .315**               --            .336*
                                         (.111)                            (.150)
Age                                      .429**            .021*           .388**
                                         (.028)           (.010)           (.048)
Price of TV advertising                      --          -.148**               --
                                                          (.013)
Number of households with TV                 --            .100+               --
                                                          (.064)
Temperature                                  --            4.711               --
                                                          (5.50)

Other individual variables?                 Yes              Yes              Yes
Other county variables?                     Yes              Yes              Yes

F-statistic testing: coefficients on Price of TV advertising, no. households with TV, and Temperature = 0
                                             --            41.92               --
J-statistic                                  --               --             .308
Number of observations                    6,818            6,818            6,818

Notes: Heteroskedasticity-robust standard errors appear in parentheses under regression coefficients, and p-values appear in parentheses under F-statistics. All regressions contain the other individual variables (child’s race, male/female, family income, mother’s BMI, mother employed/not employed) and the other county variables (number of fast-food restaurants per capita, number of full-service restaurants per capita, price indexes for fast-food restaurant meals, full-service restaurant meals, and at-home restaurant meals).

^a Instruments for the TSLS regression are the Price of TV Advertising, Number of households with TV, and Temperature.

Significant at the: **1%, *5%, +10% significance level.


Questions for Part III (34 points)
Please answer these questions in Blue Book III

The questions in Part III refer to Table 4.

1) (3 points) Suggest a reason why TV exposure might be endogenous in regression (1).

2) Regression (3) uses three variables as instrumental variables for TV exposure. For each instrument, explain whether, in your judgment, the instrument plausibly is exogenous:

a) (2 points) the Price of TV advertising in the county;

b) (2 points) the Number of households with TV in the county;

c) (2 points) the average annual county Temperature.

3) Consider regression (3).

a) (3 points) Suppose the instruments in regression (3) are weak. If so, what would the consequence be for interpreting the results in column (3), specifically the coefficient on TV exposure and its standard error?

b) (3 points) Based on the results in Table 4, are the instruments weak, are they strong, or do you need more information before you can decide? Explain.

4) Consider the J-statistic in column (3).

a) (3 points) Suppose you were to reject the null hypothesis using this J-statistic. What would you conclude?

b) (3 points) Using the J-statistic actually reported in column (3), do you reject the null hypothesis at the 5% significance level? Explain how you reached this conclusion (be precise).

5) (3 points) A researcher suggests using as instruments a full set of county binary variables (county dummy variables). What would be the effect of adding a full set of county dummy variables to regression (2)?

6) (5 points) Another researcher suggests replacing the instruments in regression (3) with a new instrumental variable, ProSports, that equals one if at least one local professional sports team was in the playoffs during the study period, and equals zero otherwise. For the purposes of this question, suppose that ProSports is a valid instrument. Describe, in concrete and everyday terms, a reason why the local average treatment effect obtained using ProSports would differ from the average treatment effect. In your example, is the local average treatment effect greater than or less than the average treatment effect?

7) (5 points) Do you agree or disagree with the following statement? Explain fully. (The sample average of TV Exposure is approximately 0.5 hours.)

The results in Table 4 indicate that a ban on TV fast-food advertisements would reduce the BMI among children by an amount that is statistically significant and meaningful in a real-world sense.
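The mechanics behind questions 3(b) and 4(b) of Part III can be sketched numerically. The snippet below is a rough illustration under stated assumptions, not an answer key: it assumes the J-statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of instruments minus the number of endogenous regressors, and it uses the common rule of thumb that a first-stage F-statistic above 10 suggests the instruments are not weak. Variable names are hypothetical; the numbers come from Table 4.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 2 degrees
    of freedom. For df = 2 the survival function has the closed form exp(-x/2)."""
    return math.exp(-x / 2.0)


# Overidentification: three instruments, one endogenous regressor (TV exposure).
n_instruments = 3
n_endogenous = 1
degrees_of_freedom = n_instruments - n_endogenous
assert degrees_of_freedom == 2  # the closed form above only applies when df = 2

j_statistic = 0.308                      # Table 4, column (3)
critical_5pct = -2.0 * math.log(0.05)    # 5% critical value of chi-squared(2)
j_p_value = chi2_sf_df2(j_statistic)
reject_exogeneity = j_statistic > critical_5pct

# Instrument strength: first-stage F on the three instruments, Table 4, column (2).
first_stage_f = 41.92
RULE_OF_THUMB = 10.0                     # assumed threshold, a rule of thumb only
instruments_look_strong = first_stage_f > RULE_OF_THUMB

print(f"J = {j_statistic}, 5% critical value = {critical_5pct:.2f}, "
      f"p-value = {j_p_value:.3f}, reject = {reject_exogeneity}")
print(f"first-stage F = {first_stage_f}, exceeds {RULE_OF_THUMB:.0f}: "
      f"{instruments_look_strong}")
```

For other degrees of freedom the closed form no longer applies, and a statistical library or a chi-squared table would be needed instead.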


Background to Part IV: Baseball

U.S. Major League Baseball (MLB) has experienced a number of changes that could change the competitive balance across teams. On occasion, the league has expanded by creating new teams. In 1976, MLB introduced free agency, which gives the players the right to sell their services to the highest bidder upon the expiration of their contract or under certain other conditions; previously, players could switch teams only if they were traded or released by their team. What have been the effects of league expansion and free agency on the competitive balance among teams? The measure of competitiveness used here is the standard deviation of the end-of-season winning percentages of MLB teams in year t. A team’s winning percentage is the percent of games won that year, for example, 55%. Thus, if the standard deviation of the winning percentage is large in a given year, there is a large spread in the won/loss record among teams, indicating a non-competitive year. The data are annual time series data from 1950 to 2001. The variables SDWP and FreeAgents are plotted in Figure 1.
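To make the competitiveness measure concrete, the sketch below computes the standard deviation of winning percentages for two made-up six-team leagues. The winning percentages are hypothetical, and the choice of the population standard deviation (dividing by the number of teams) is an assumption; the source does not say which formula was used.

```python
import statistics

# Hypothetical end-of-season winning percentages (in percentage points).
balanced_league = [52.0, 50.0, 49.0, 51.0, 48.0, 50.0]
lopsided_league = [65.0, 58.0, 50.0, 47.0, 42.0, 38.0]

# Population standard deviation; statistics.stdev would divide by n - 1 instead.
sdwp_balanced = statistics.pstdev(balanced_league)
sdwp_lopsided = statistics.pstdev(lopsided_league)

print(f"balanced league SDWP: {sdwp_balanced:.2f} percentage points")
print(f"lopsided league SDWP: {sdwp_lopsided:.2f} percentage points")
```

Both leagues average 50%, but the lopsided league has the larger standard deviation, which is what the text describes as a less competitive year.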

Variables in the Baseball Data Set: Annual Time Series Data, 1950 - 2001

Variable             Definition
SDWP_t               Standard deviation of the end-of-season winning percentages of MLB teams in year t (units are percentage points)
FreeAgents_t         Number of players who declared free agency in year t, divided by 10 (so units are tens of players)
Expansion Year_t     = 1 if MLB expanded the number of teams in year t, = 0 otherwise

[Figure 1 is not reproduced here. Horizontal axis: YEAR, 1940 to 2000; vertical axis: 0 to 15; plotted series: SDWP and FreeAgents.]

Figure 1. SDWP (solid line) and the first lag of FreeAgents (circles) plotted against time


Table 5. Time Series Models of Baseball Competitiveness

Dependent variable: the Standard Deviation of Winning Percentages (SDWP_t)

                                 (1)         (2)         (3)
Regressors:
FreeAgents_{t-1}              -0.109      -0.106      -0.110
                             (0.040)     (0.037)     (0.038)
                             [0.046]     [0.047]     [0.046]
Expansion Year_t                  --       1.536       1.544
                                         (0.569)     (0.579)
                                         [0.363]     [0.386]
Expansion Year_{t-1}              --          --      -0.583
                                                     (0.577)
                                                     [0.289]
Expansion Year_{t-2}              --          --      -0.293
                                                     (0.579)
                                                     [0.368]
Constant                       7.908       7.714       7.624
                             (0.252)     (0.247)     (0.264)
                             [0.408]     [0.458]     [0.533]

Observations                      50          50          50
R²                              0.14        0.25        0.27

F-statistics testing coefficients on Expansion Year_{t-1}, Expansion Year_{t-2} = 0:
Heteroskedasticity-robust F-statistic (p-value)
                                  --          --        0.67
                                                     (0.516)
Newey-West F-statistic (p-value)
                                  --          --        2.09
                                                     (0.136)

Notes: All regressions are OLS and are estimated using data from 1952 – 2001, with earlier observations for initial values of lagged regressors. Under the estimated coefficients are heteroskedasticity-robust standard errors in parentheses ( ) and Newey-West standard errors with four lags in square brackets [ ].


Questions for Part IV (20 points)
Please answer these questions in Blue Book III

The questions in Part IV refer to Table 5.

1) In an expansion year, new teams are added to the league.

a) (3 points) What is the immediate, or impact, effect of an expansion on competitiveness? (Provide a numerical estimate and interpret.)

b) (3 points) What is the cumulative dynamic effect of the expansion on competitiveness, two years after the expansion? (Provide a numerical estimate and interpret.)

c) (3 points) Compute the standard error for the cumulative dynamic effect in (b). If you do not have enough information to do so, explain how you would compute this standard error and what additional information you would need.

2) (3 points) Table 5 reports two sets of standard errors, heteroskedasticity-robust standard errors and Newey-West standard errors. Which should be used here? Explain.

3) (3 points) A critic of this analysis asserts that the relationship in regression (3) might be unstable and suggests computing the QLR statistic (with trimming of 15% on each end of the sample, as is conventional). Is this a good recommendation for the purpose of assessing the stability of regression (3)? Explain why or why not.

4) (5 points) Baseball owners assert that free agency reduces competitiveness across baseball teams because rich teams can outbid poor teams, increasing talent disparities across teams. Based on the results in Table 5, do you agree, disagree, or can you not reach a conclusion? Explain.
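The impact and cumulative dynamic effects asked about in question 1 of Part IV are running sums of the distributed-lag coefficients. The sketch below illustrates that arithmetic using the Expansion Year coefficients from regression (3) of Table 5; the variable and function names are hypothetical. The helper for the standard error of a sum is included only to show the formula: it needs the full covariance matrix of the three coefficients, and Table 5 reports standard errors but not covariances.

```python
import math
from itertools import accumulate

# Table 5, regression (3): coefficients on Expansion Year at lags 0, 1, and 2.
dynamic_multipliers = [1.544, -0.583, -0.293]

# Cumulative dynamic multipliers: running sums of the lag coefficients.
cumulative_multipliers = list(accumulate(dynamic_multipliers))

for lag, (beta, cumulative) in enumerate(zip(dynamic_multipliers,
                                              cumulative_multipliers)):
    print(f"lag {lag}: multiplier {beta:+.3f}, cumulative {cumulative:+.3f}")


def se_of_sum(covariance_matrix: list) -> float:
    """Standard error of a sum of estimates: the square root of the sum of
    every entry (variances and covariances) of their covariance matrix."""
    total_variance = sum(sum(row) for row in covariance_matrix)
    return math.sqrt(total_variance)
```

The first entry of the running sum is the impact effect and the last is the two-year cumulative effect, both measured in percentage points of SDWP.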

