Chapter 10: Instrumental Variables Regression

Multiple Choice

1) Estimation of the IV regression model

a. requires exact identification.

b. allows only one endogenous regressor, which is typically correlated with the error term.

c. requires exact identification or overidentification.

d. is only possible if the number of instruments is the same as the number of regressors.

Answer: c

2) Two stage least squares is calculated as follows: in the first stage

a. Y is regressed on the exogenous variables only. The predicted value of Y is then regressed on the instrumental variables.

b. the unknown coefficients in the reduced form equation are estimated by OLS, and the predicted values are calculated. In the second stage, Y is regressed on these predicted values and the other exogenous variables.

c. the exogenous variables are regressed on the instruments. The predicted value of the exogenous variables is then used in the second stage, together with the instruments, to predict the dependent variable.

d. the unknown coefficients in the reduced form equation are estimated by weighted least squares, and the predicted values are calculated. In the second stage, Y is regressed on these predicted values and the other exogenous variables.

Answer: b
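
The two stages described in option (b) can be sketched in a few lines of Python for the simplest case of one endogenous regressor and one instrument. This is only an illustration under stated assumptions: the helper names `ols_simple` and `tsls_simple` and the six data points are hypothetical and do not come from the textbook.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def tsls_simple(y, x, z):
    """TSLS point estimates with one endogenous regressor x and one instrument z."""
    # First stage: estimate the reduced form X = pi0 + pi1*Z + v by OLS
    pi0, pi1 = ols_simple(z, x)
    x_hat = [pi0 + pi1 * zi for zi in z]
    # Second stage: regress Y on the first-stage predicted values
    return ols_simple(x_hat, y)

# Hypothetical data, for illustration only
z = [1, 2, 3, 4, 5, 6]
x = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
y = [3.0, 4.1, 6.2, 6.8, 9.1, 9.9]

b0, b1 = tsls_simple(y, x, z)
print(f"TSLS intercept = {b0:.3f}, slope = {b1:.3f}")
```

The sketch returns point estimates only. The standard errors that OLS would report for the second stage are not the correct TSLS standard errors, which is the point of question 16.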

3) The conditions for valid instruments do not include the following:

a. each instrument must be uncorrelated with the error term.

b. each one of the instrumental variables must be normally distributed.

c. at least one of the instruments must enter the population regression of X on the Z’s and the W’s.

d. perfect multicollinearity between the predicted endogenous variables and the exogenous variables must be ruled out.

Answer: b

4) The IV regression assumptions include all of the following with the exception of

a. the error terms must be normally distributed.

b. E(ui|W1i,…, Wri) = 0.

c. the X’s, W’s, Z’s, and u all have nonzero, finite fourth moments.

d. (X1i,…, Xki, W1i,…, Wri, Z1i,…, Zmi, Yi) are i.i.d. draws from their joint distribution.

Answer: a

5) The rule-of-thumb for checking for weak instruments is as follows: for the case of a single endogenous regressor,

a. a first stage F must be statistically significant to indicate a strong instrument.

b. a first stage F > 1.96 indicates that the instruments are weak.

c. the t-statistic on each of the instruments must exceed at least 1.64.

d. a first stage F < 10 indicates that the instruments are weak.

Answer: d
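
As a rough illustration of the rule of thumb, the sketch below computes a first-stage F-statistic for a single instrument and compares it with 10. It assumes the homoskedasticity-only form of the F-statistic; the function names `first_stage_f` and `is_weak` and all data values are hypothetical.

```python
from statistics import mean

def first_stage_f(x, z):
    """Homoskedasticity-only F from regressing x on a constant and one instrument z."""
    n = len(x)
    xbar, zbar = mean(x), mean(z)
    szx = sum((a - zbar) * (b - xbar) for a, b in zip(z, x))
    szz = sum((a - zbar) ** 2 for a in z)
    sxx = sum((b - xbar) ** 2 for b in x)
    r2 = szx ** 2 / (szz * sxx)        # R^2 of the first-stage regression
    return r2 / ((1 - r2) / (n - 2))   # one restriction, n - 2 degrees of freedom

def is_weak(f_stat, threshold=10.0):
    """Rule of thumb: a first-stage F below 10 signals weak instruments."""
    return f_stat < threshold

# Hypothetical data: the first regressor tracks z closely, the second barely at all
z = [1, 2, 3, 4, 5, 6]
x_strong = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
x_weak = [1, -1, 1, -1, 1, -1]

for name, x in (("strong", x_strong), ("weak", x_weak)):
    f = first_stage_f(x, z)
    print(f"{name}: F = {f:.2f}, weak instrument? {is_weak(f)}")
```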

6) The J-statistic

a. tells you if the instruments are exogenous.

b. provides you with a test of the hypothesis that the instruments are exogenous for the case of exact identification.

c. is distributed $\chi^2_{m-k}$, where m − k is the degree of overidentification.

d. is distributed $\chi^2_{m-k}$, where m − k is the number of instruments minus the number of regressors.

Answer: c

7) In the case of the simple regression model Yi = β0 + β1Xi + ui, i = 1,…, n, when X and u are correlated, then

a. the OLS estimator is biased in small samples only.

b. OLS and TSLS produce the same estimate.

c. X is exogenous.

d. the OLS estimator is inconsistent.

Answer: d

8) The following will not cause correlation between X and u in the simple regression model:

a. simultaneous causality.

b. omitted variables.

c. irrelevance of the regressor.

d. errors in variables.

Answer: c

9) The distinction between endogenous and exogenous variables is

a. that exogenous variables are determined inside the model and endogenous variables are determined outside the model.

b. dependent on the sample size: for n > 100, endogenous variables become exogenous.

c. dependent on the distribution of the variables: when they are normally distributed, they are exogenous; otherwise they are endogenous.

d. whether or not the variables are correlated with the error term.

Answer: d

10) The two conditions for a valid instrument are

a. corr(Zi, Xi) = 0 and corr(Zi, ui) ≠ 0.

b. corr(Zi, Xi) = 0 and corr(Zi, ui) = 0.

c. corr(Zi, Xi) ≠ 0 and corr(Zi, ui) = 0.

d. corr(Zi, Xi) ≠ 0 and corr(Zi, ui) ≠ 0.

Answer: c

11) Instrument relevance

a. means that the instrument is one of the determinants of the dependent variable.

b. is the same as instrument exogeneity.

c. means that some of the variance in the regressor is related to variation in the instrument.

d. is not possible since X and u are correlated and Z and u are not correlated.

Answer: c

12) Consider a competitive market where the demand and the supply depend on the current price of the good. Then fitting a line through the quantity-price outcomes will

a. give you an estimate of the demand curve.

b. estimate neither a demand curve nor a supply curve.

c. enable you to calculate the price elasticity of supply.

d. give you the exogenous part of the demand in the first stage of TSLS.

Answer: b

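13) When there is a single instrument and single regressor, the TSLS estimator for the slope can be calculated as follows

a. $\hat{\beta}_1^{TSLS} = \dfrac{s_{ZY}}{s_{ZX}}$.

b. $\hat{\beta}_1^{TSLS} = \dfrac{s_{XY}}{s_X^2}$.

c. $\hat{\beta}_1^{TSLS} = \dfrac{s_{ZX}}{s_{ZY}}$.

d. $\hat{\beta}_1^{TSLS} = \dfrac{s_{ZY}}{s_Z^2}$.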

Answer: a

14) The TSLS estimator is

a. consistent and has a normal distribution in large samples.

b. unbiased.

c. efficient in small samples.

d. F-distributed.

Answer: a

15) The reduced form equation for X

a. regresses the endogenous variable X on the smallest possible subset of regressors.

b. relates the endogenous variable X to all the available exogenous variables, both those included in the regression of interest and the instruments.

c. uses the predicted values of X from the first stage as a regressor in the original equation.

d. uses smaller standard errors, such as homoskedasticity-only standard errors, for inference.

Answer: b

16) When calculating the TSLS standard errors

a. you do not have to worry about heteroskedasticity, since it was eliminated in the first stage.

b. you can use the standard errors reported by OLS estimation of the second stage regression.

c. the critical values from the standard normal table should be adjusted for the proper degrees of freedom.

d. you should use heteroskedasticity-robust standard errors.

Answer: d

17) Having more relevant instruments

a. is a problem because instead of being just identified, the regression now becomes overidentified.

b. is like having a larger sample size in that more information is available for use in the IV regressions.

c. typically results in larger standard errors for the TSLS estimator.

d. is not as important for inference as having the same number of endogenous variables as instruments.

Answer: b

18) Weak instruments are a problem because

a. the TSLS estimator may not be normally distributed, even in large samples.

b. they result in the instruments not being exogenous.

c. the TSLS estimator cannot be computed.

d. you cannot predict the endogenous variables any longer in the first stage.

Answer: a

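19) (Requires Appendix Material) The relationship between the TSLS slope and the corresponding population parameter is:

a. $\hat{\beta}_1^{TSLS} - \beta_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.

b. $\hat{\beta}_1^{TSLS} - \beta_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.

c. $\hat{\beta}_1^{TSLS} - \beta_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})^2}$.

d. $\hat{\beta}_1^{TSLS} - \beta_1 = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$.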

Answer: a
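
The expression in option (a) can be recovered in a few steps from the covariance-ratio form of the estimator. The derivation below is a reminder sketch for the single-regressor, single-instrument model $Y_i = \beta_0 + \beta_1 X_i + u_i$; it uses the fact that $\sum_{i}(Z_i - \bar{Z})\bar{u} = 0$.

```latex
\begin{aligned}
\hat{\beta}_1^{TSLS}
  &= \frac{s_{ZY}}{s_{ZX}}
   = \frac{\sum_{i=1}^{n}(Z_i-\bar{Z})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(Z_i-\bar{Z})(X_i-\bar{X})}, \\[4pt]
Y_i-\bar{Y}
  &= \beta_1 (X_i-\bar{X}) + (u_i-\bar{u}), \\[4pt]
\sum_{i=1}^{n}(Z_i-\bar{Z})(Y_i-\bar{Y})
  &= \beta_1 \sum_{i=1}^{n}(Z_i-\bar{Z})(X_i-\bar{X}) + \sum_{i=1}^{n}(Z_i-\bar{Z})u_i, \\[4pt]
\hat{\beta}_1^{TSLS} - \beta_1
  &= \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i-\bar{Z})u_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i-\bar{Z})(X_i-\bar{X})}.
\end{aligned}
```

The numerator converges to cov(Z, u), which is zero for an exogenous instrument, and the denominator converges to cov(Z, X), which is nonzero for a relevant one.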

20) If the instruments are not exogenous,

a. you cannot perform the first stage of TSLS.

b. then, in order to conduct proper inference, it is essential that you use heteroskedasticity-robust standard errors.

c. your model becomes overidentified.

d. then TSLS is inconsistent.

Answer: d

21) In the case of exact identification,

a. you can use the J-statistic in a test of overidentifying restrictions.

b. you cannot use TSLS for estimation purposes.

c. you must rely on your personal knowledge of the empirical problem at hand to assess whether the instruments are exogenous.

d. OLS and TSLS yield the same estimate.

Answer: c

22) To calculate the J-statistic you regress the

a. squared values of the TSLS residuals on all exogenous variables and the instruments. The statistic is then the number of observations times the regression $R^2$.

b. TSLS residuals on all exogenous variables and the instruments. You then multiply the homoskedasticity-only F-statistic from that regression by the number of instruments.

c. OLS residuals from the reduced form on the instruments. The F-statistic from this regression is the J-statistic.

d. TSLS residuals on all exogenous variables and the instruments. You then multiply the heteroskedasticity-robust F-statistic from that regression by the number of instruments.

Answer: b
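
The recipe in option (b) can be sketched with NumPy for one endogenous regressor, m instruments, and no included exogenous regressors other than the constant. Everything here is an assumption made for illustration: the function names `ols` and `j_statistic`, the simulated data-generating process, and the choice of one valid and one deliberately invalid second instrument.

```python
import numpy as np

def ols(A, b):
    """Return OLS coefficients and residuals from regressing b on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef, b - A @ coef

def j_statistic(y, x, Z):
    """J = m*F for one endogenous regressor x and an (n x m) instrument matrix Z."""
    n, m = Z.shape
    const = np.ones((n, 1))
    ZZ = np.hstack([const, Z])
    # TSLS: first-stage fitted values, then second-stage coefficients
    pi, _ = ols(ZZ, x)
    x_hat = ZZ @ pi
    beta, _ = ols(np.hstack([const, x_hat[:, None]]), y)
    # TSLS residuals are formed with the actual x, not the fitted values
    u_hat = y - beta[0] - beta[1] * x
    # Regress the residuals on the constant and the instruments
    _, e = ols(ZZ, u_hat)
    ssr_u = e @ e
    ssr_r = np.sum((u_hat - u_hat.mean()) ** 2)   # restricted model: constant only
    F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * F, m - 1   # statistic and degrees of freedom (m - k with k = 1)

# Simulated example (hypothetical data-generating process)
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)
z1 = rng.normal(size=n)
z2_good = rng.normal(size=n)             # exogenous second instrument
z2_bad = rng.normal(size=n) + 0.8 * u    # second instrument correlated with u

def simulate(z2):
    x = z1 + z2 + 0.5 * u + rng.normal(size=n)
    y = 1.0 + 2.0 * x + u
    return j_statistic(y, x, np.column_stack([z1, z2]))

J_good, df = simulate(z2_good)
J_bad, _ = simulate(z2_bad)
print(f"df = {df}, J with valid instruments = {J_good:.2f}, J with one invalid = {J_bad:.2f}")
```

With one degree of overidentification the statistic is compared with the $\chi^2_1$ critical values; the invalid instrument should push J far above them in a sample of this size.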

23) (Requires Chapter 8) When using panel data and in the presence of endogenous regressors,

a. the TSLS does not exist.

b. you do not have to worry about the validity of instruments, since there are so many fixed effects.

c. the OLS estimator is consistent.

d. application of the TSLS estimator is straightforward if you use two time periods and difference the data.

Answer: d

24) In practice, the most difficult aspect of IV estimation is

a. finding instruments that are both relevant and exogenous.

b. that you have to use two stages in the estimation process.

c. calculating the J-statistic.

d. finding instruments that are exogenous. Relevant instruments are easy to find.

Answer: a

25) Consider a model with one endogenous regressor and two instruments. Then the J-statistic will be large

a. if the number of observations is very large.

b. if the coefficients are very different when estimating the coefficients using one instrument at a time.

c. if the TSLS estimates are very different from the OLS estimates.

d. when you use homoskedasticity-only standard errors.

Answer: b

Essays

1) Write a short essay about the overidentifying restrictions test. What is meant exactly by “overidentification”? State the null hypothesis. Describe how to calculate the J-statistic and what its distribution is. Use an example of two instruments and one endogenous variable to explain under what situation the test will be likely to reject the null hypothesis. What does this example tell you about the exactly identified case? If your variables pass the test, is this sufficient for these variables to be good instruments?

Answer: The regression coefficients in the regression model with endogenous regressors can be either underidentified, exactly identified, or overidentified. If the number of instruments (m) equals the number of endogenous regressors (k), then the coefficients are exactly identified. If there are more instruments than the number of endogenous regressors, then the regression coefficients are overidentified. For the instrumental variable estimator to exist, there must be at least as many instruments as endogenous regressors ($m \geq k$). In the case of overidentification, the exogeneity of the instruments can be tested. Under the null hypothesis, all instruments are exogenous. Under the alternative hypothesis, at least one of the instruments is endogenous. Technically, the overidentifying restrictions test uses the TSLS residuals to see if these are correlated with the instruments. The residuals are regressed on the instruments and the included exogenous regressors. Under the null hypothesis, the coefficients on all the instruments are zero. Since this is a case of joint hypothesis testing, the F-statistic is computed, and from it the J-statistic, where $J = mF$. In large samples the distribution of this statistic is $\chi^2_{m-k}$. Calculating the J-statistic amounts to comparing different IV estimates. In the case of two instruments and one endogenous regressor, where the degree of overidentification is one, two such estimates exist. Due to sample variation, these estimates will differ, although they should be similar, or “close” to each other. If one or both of the instruments is not exogenous, then the estimates will not be similar, or the difference between the two will be sufficiently large so as not to be the result of pure sampling variation. In this situation the null hypothesis will be rejected. This procedure can only be executed when the coefficients are overidentified, since there is no comparison possible for the case of exactly identified coefficients. Passing the test is not sufficient for the instruments to be valid since, in addition to being exogenous, they must also be relevant, i.e. they must be correlated with the endogenous regressor.

2) Using some of the examples from your textbook, describe econometric studies which required instrumental variable techniques. In each case emphasize why the need for instrumental variables arises and how authors have approached the problem. Make sure to include a discussion of overidentification, the validity of instruments, and testing procedures in your essay.

Answer: The textbook mentions several studies which used instrumental variable estimation techniques, starting with Wright’s problem to estimate demand and supply elasticities on animal and vegetable oils and fats. This is a case of simultaneous causality bias since the price and quantity in the market are determined by both the supply and demand for the commodity. Wright used the weather, which shifted the supply curve only and thereby traced out the demand curve. Since there was only a single instrument, the coefficients are exactly identified, and the validity of the instrument cannot be tested.

Another example mentioned is the effect of class size on test scores. The reason for a correlation between class size and the error term potentially stems from omitted variable bias here, such as the quality of the teaching staff and outside opportunities for some of the students. In the hypothetical example of an earthquake, some schools may receive more students than usual depending on the closeness to the epicenter, if the school was unaffected structurally. The increase in class size is related to the closeness to the epicenter, but this distance should be uncorrelated with the ability of the teaching staff and the outside opportunities. As in the previous study, there is only a single instrument and hence no possibility to use the overidentification test.

The primary example of instrumental variable estimation in the chapter involves estimation of the demand elasticity for cigarettes. Due to simultaneity bias for the demand equation, sales taxes are used as an instrument first in a cross section of states in a single year and later in a panel. Prices and quantities are determined simultaneously by supply and demand, and as a result, prices will be correlated with the error term in the demand equation. Sales taxes are fairly highly correlated with prices, explaining almost half of the variation in these. It is argued that, because choices about public finance differ across states for political reasons, sales taxes are exogenous. Only one instrument is used in the cross-section and hence there is no degree of overidentification. Later another instrument is introduced, cigarette-specific taxes. With two instruments and one endogenous regressor, the J-statistic can be computed for the overidentifying restrictions test.

Other examples discussed in the textbook include the effect of an increase in the prison population on crime rates, further discussion of class size and test scores, and aggressive treatment of heart attacks and the potential for saving lives.

3) Describe the consequences of estimating an equation by OLS in the presence of an endogenous regressor. How can you overcome these obstacles? Present an alternative estimator and state its properties.

Answer: In the case of an endogenous regressor, there is correlation between the variable and the error term. In this case, the OLS estimator is inconsistent. To get a consistent estimator in this situation, instrumental variable techniques, such as TSLS, should be used. If one or more valid instruments can be found, meaning that the instrument must be relevant and exogenous, then a consistent estimator can be derived. The relevance of instruments can be checked using the rule of thumb (a first-stage F-statistic of more than 10 in the TSLS estimator). The exogeneity of the instruments can be tested using the J-statistic. The test requires that there is at least one more instrument than endogenous regressors, i.e., that the equation is overidentified. In large samples the sampling distribution of the TSLS estimator is approximately normal, so that statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic. However, inference based on these statistics will be misleading in the case where instruments are not valid.

4) Write an essay about where valid instruments come from. Part of your answer must deal with checking the validity of instruments and what the consequences of weak instruments are.

Answer: In order for instruments to be valid, they have to be relevant and exogenous. To find valid instruments, two approaches are typically used. First, economic theory can serve as a guide. In the case of simultaneous causality in a market, for example, theory predicts shifts in one curve but not the other as a result of changes in an instrumental variable. The second approach focuses on shifts in the endogenous regressor that are caused by an “exogenous source of variation” in the variable resulting from a random phenomenon. The textbook uses the example of an earthquake which changes student-teacher ratios as students in affected areas have to be redistributed.

To check the validity of instruments, there is the rule of thumb to determine whether or not an instrument is weak. It states that the F-statistic in the first stage of the TSLS procedure should exceed 10. Instrument exogeneity can be tested only in the case of overidentification. If there are more instruments than endogenous regressors, then the J-statistic can be calculated. The null hypothesis of exogeneity will be rejected, in essence, if the TSLS residuals are correlated with the instruments.

If instruments are weak, then the TSLS estimator is biased and statistical inference does not yield reliable confidence intervals even in large samples.

5) You have estimated a government reaction function, i.e., a multiple regression equation, where a government instrument, say the federal funds rate, depends on past government target variables, such as inflation and unemployment rates. In addition, you added the previous period’s popularity deficit of the government, e.g. the approval rating of the president minus 50%, as one of the regressors. Your idea is that the Federal Reserve, although formally independent, will try to expand the economy if the president is unpopular. One of your peers, a political science student, points out that approval ratings depend on the state of the economy and thereby indirectly on government instruments. It is therefore endogenous and should be estimated along with the reaction function. Initially you want to reply by using a phrase that includes the words “money neutrality” but are worried about a lengthy debate. Instead you state that as an economist, you are not concerned about government approval ratings, and that government approval ratings are determined outside your (the economic) model. Does your whim make the regressor exogenous? Why or why not?

Answer: In general, the question of whether or not a variable is endogenous or exogenous depends on its correlation with the error term, not on the size of the underlying model. The point to make is that just because a variable is endogenous does not imply that its determinants have to be modeled. If the purpose of the exercise is to eventually simulate the model for policy purposes, then the feedback envisioned by the political science student is potentially important. However, if the aim is simply to forecast the behavior of the government reaction function, then the issue of endogeneity or exogeneity is only relevant for questions regarding the type of estimator to be used. Of course, if a regressor is endogenous, then instrumental variable techniques must be used to ensure desirable properties of the estimator.

Mathematical and Graphical Problems

1) To analyze the year-to-year variation in temperature data for a given city, you regress the daily high temperature (Temp) in 1998 on the daily high temperature on the same date in 1997, using 100 randomly selected days for Phoenix. The results are (heteroskedasticity-robust standard errors in parentheses):

$\widehat{Temp}^{PHX}_{1998} = 15.63 + 0.80 \times Temp^{PHX}_{1997}$; $R^2$ = 0.65, SER = 9.63

(6.46) (0.10)

(a) Calculate the predicted temperature for the current year if the temperature in the previous year was 40ºF, 78ºF, and 100ºF. How does this compare with your prior expectation? Sketch the regression line and compare it to the 45 degree line. What are the implications?

Answer: The three predicted temperatures will be 47.6, 78.0, and 95.6 respectively. The initial expectation should be that the temperature in 1998 is the same as in 1997 for a given date. The regression line and the 45 degree line are sketched in the accompanying figure. The implication is mean reversion: if the temperature was low (40 degrees), then it will also be low the following year, but not as low. Alternatively, if the temperature was high (100º), then it will be high again, but not as high. If this prediction is extrapolated into the future, then eventually all temperatures should be the same for all days. This obviously does not make sense.

[Figure: 1997 and 1998 Temperature. Horizontal axis: Degrees in 1997; vertical axis: Degrees in 1998. Series shown: Predicted Temp 1998 and the 45 degree line.]
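
The arithmetic behind the three predictions, and the point where the estimated line crosses the 45 degree line, can be checked with a short script. The function name `predict_1998` is a label chosen for this sketch; the coefficients are the OLS estimates reported above.

```python
def predict_1998(temp_1997, intercept=15.63, slope=0.80):
    """Predicted 1998 temperature from the estimated OLS line."""
    return intercept + slope * temp_1997

for t in (40, 78, 100):
    print(f"1997: {t}F -> predicted 1998: {predict_1998(t):.1f}F")

# The regression line crosses the 45 degree line where T = 15.63 + 0.80*T
fixed_point = 15.63 / (1 - 0.80)
print(f"Crossing point: {fixed_point:.2f}F")
```

Below the crossing point (about 78ºF) the prediction lies above the previous year's value, and above it the prediction lies below, which is the mean reversion described in the answer.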

(b) You recall having studied errors-in-variables before. Although the Web site you received your data from seems quite reliable in measuring data accurately, what if the temperature contained measurement error in the following sense: for any given day, say January 28, there is a true underlying seasonal temperature (X), but each year there are different temporary weather patterns (v, w) which result in a temperature $\tilde{X}$ different from X. For the two years in your data set, the situation can be described as follows:

$\tilde{X}_{1997} = X + v_{1997}$ and $\tilde{X}_{1998} = X + w_{1998}$

Subtracting $\tilde{X}_{1997}$ from $\tilde{X}_{1998}$, you get $\tilde{X}_{1998} = \tilde{X}_{1997} + w_{1998} - v_{1997}$. Hence the population parameters for the intercept and slope are zero and one, as expected. It is not difficult to show that the OLS estimator for the slope is inconsistent, where

$\hat{\beta}_1 \xrightarrow{p} 1 - \dfrac{\sigma_v^2}{\sigma_x^2 + \sigma_v^2}$

As a result you consider estimating the slope and intercept by TSLS. You think about an instrument and consider the temperature one month ahead of the observation in the previous year. Discuss instrument validity for this case.

Answer: For an instrument to be valid, two conditions have to hold. First, the instrument has to be relevant, and second, the instrument has to be exogenous. If temperatures one month ahead can predict the current temperature, as they certainly do in Phoenix, then the instrument is relevant, i.e. correlated with the current month’s temperature. If in addition, whatever caused the temperature in the current month to deviate from its long-term value is only a temporary phenomenon, such as a weather system created by a storm in the Pacific, then next month’s temperature should not be correlated with this event. Hence the instrument would be exogenous.
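
A small simulation can make the attenuation result and the role of the instrument concrete. The sketch below is built on simplifying assumptions that are not in the problem: normally distributed components, a seasonal standard deviation of 2 and weather-shock standard deviation of 1, and an instrument that shares the seasonal component X but has its own independent weather shock.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equally long lists."""
    abar, bbar = mean(a), mean(b)
    return sum((p - abar) * (q - bbar) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20000
sigma_x, sigma_v = 2.0, 1.0   # hypothetical standard deviations

true_x = [random.gauss(75, sigma_x) for _ in range(n)]    # seasonal temperature X
x97 = [x + random.gauss(0, sigma_v) for x in true_x]      # X + v (1997)
x98 = [x + random.gauss(0, sigma_v) for x in true_x]      # X + w (1998)
z = [x + random.gauss(0, sigma_v) for x in true_x]        # instrument, own shock

ols_slope = cov(x97, x98) / cov(x97, x97)
iv_slope = cov(z, x98) / cov(z, x97)
plim_ols = 1 - sigma_v**2 / (sigma_x**2 + sigma_v**2)

print(f"OLS slope = {ols_slope:.3f} (probability limit {plim_ols:.2f})")
print(f"IV slope  = {iv_slope:.3f} (population value 1)")
```

Under these assumptions the OLS slope settles near 0.8 rather than 1, while the IV slope is close to 1 because the instrument's weather shock is unrelated to the 1997 and 1998 shocks.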

(c) The TSLS estimation result is as follows:

$\widehat{Temp}^{PHX}_{1998} = -6.24 + 1.07 \times Temp^{PHX}_{1997}$

(5.0) (0.06)

Perform a t-test on whether or not the slope is now significantly different from 1.

Answer: The t-statistic is 1.17, and hence you cannot reject the null hypothesis that the slope equals 1.
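
The test statistic in the answer follows directly from the reported slope and standard error. A minimal check, with `t_statistic` as a helper name chosen for this sketch and 1.96 as the usual two-sided 5% critical value:

```python
def t_statistic(estimate, hypothesized, std_error):
    """t-statistic for H0: coefficient equals the hypothesized value."""
    return (estimate - hypothesized) / std_error

t = t_statistic(1.07, 1.0, 0.06)
reject_5pct = abs(t) > 1.96   # two-sided test using the normal critical value
print(f"t = {t:.2f}, reject at 5%: {reject_5pct}")
```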

2) Consider the following population regression model relating the dependent variable Yi and regressor Xi,

Yi = β0 + β1Xi + ui, i = 1,…, n.

Xi ≡ Yi + Zi

where Z is a valid instrument for X.

(a) Explain why you should not use OLS to estimate β1.

Answer: Substitution of the first equation into the identity shows that X is correlated with the error term. Hence estimation with OLS results in an inconsistent estimator.

(b) To generate a consistent estimator for β1, what should you do?

Answer: The instrumental variable estimator is consistent and in this case is $\hat{\beta}_1^{TSLS} = \dfrac{s_{ZY}}{s_{ZX}}$.

Adventurous students will derive this estimator along the lines shown in Appendix 10.2.

(c) The two equations above make up a system of equations in two unknowns. Specify the two reduced form equations in terms of the original coefficients. (Hint: Substitute the identity into the first equation and solve for Y. Similarly, substitute Y into the identity and solve for X.)

Answer:

$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$
$X_i = \beta_0 + \beta_1 X_i + u_i + Z_i$

or

$(1 - \beta_1) Y_i = \beta_0 + \beta_1 Z_i + u_i$
$(1 - \beta_1) X_i = \beta_0 + Z_i + u_i$

Hence

$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$
$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$

where $\pi_0 = \pi_3 = \dfrac{\beta_0}{1 - \beta_1}$, $\pi_2 = \dfrac{\beta_1}{1 - \beta_1}$, $\pi_4 = \dfrac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \dfrac{1}{1 - \beta_1} u_i$.

(d) Do the two reduced form equations satisfy the OLS assumptions? If so, can you find consistent estimators of the two slopes? What is the ratio of the two estimated slopes? This estimator is called “indirect least squares.” How does it compare to the TSLS in this example?

Answer: Since Z is a valid instrument by assumption, it must be uncorrelated with the error term, and hence using OLS results in a consistent estimator.

$\dfrac{\hat{\pi}_2}{\hat{\pi}_4} = \dfrac{s_{YZ}/s_{ZZ}}{s_{XZ}/s_{ZZ}} = \dfrac{s_{YZ}}{s_{XZ}}$,

which is identical to the TSLS estimator.
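
The equality of indirect least squares and TSLS in this system can also be verified numerically. The sketch below simulates the two reduced form equations with hypothetical values β0 = 1 and β1 = 0.5 and standard normal Z and u; these choices are assumptions made only for the illustration.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equally long lists."""
    abar, bbar = mean(a), mean(b)
    return sum((p - abar) * (q - bbar) for p, q in zip(a, b)) / (len(a) - 1)

random.seed(2)
n = 5000
beta0, beta1 = 1.0, 0.5   # hypothetical population coefficients

z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
# Reduced form for Y, then the identity X = Y + Z
y = [(beta0 + beta1 * zi + ui) / (1 - beta1) for zi, ui in zip(z, u)]
x = [yi + zi for yi, zi in zip(y, z)]

pi2_hat = cov(y, z) / cov(z, z)   # OLS slope of Y on Z
pi4_hat = cov(x, z) / cov(z, z)   # OLS slope of X on Z
ils = pi2_hat / pi4_hat           # indirect least squares
tsls = cov(z, y) / cov(z, x)      # TSLS as a covariance ratio
ols = cov(x, y) / cov(x, x)       # OLS of Y on X, for comparison

print(f"ILS = {ils:.4f}, TSLS = {tsls:.4f}, OLS = {ols:.4f}, true beta1 = {beta1}")
```

ILS and TSLS agree up to rounding error and sit near 0.5, while OLS of Y on X is pulled away from β1 because X is correlated with u.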

3) Here are some examples of the instrumental variables regression model. In each case you are given the number of instruments and the J-statistic. Find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5% significance level, and make a decision whether or not to reject the null hypothesis.

(a) $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, i = 1,…, n; $Z_{1i}, Z_{2i}$ are valid instruments, J = 2.58.

Answer: The test statistic is distributed $\chi^2_1$ and the critical values are 6.63 and 3.84 at the 1% and 5% significance levels. Hence you cannot reject the null hypothesis that all the instruments are exogenous.

(b) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, i = 1,…, n; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, J = 9.63.

Answer: The test statistic is distributed $\chi^2_2$ and the critical values are 9.21 and 5.99 at the 1% and 5% significance levels. Hence you can reject the null hypothesis that all the instruments are exogenous.

(c) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \beta_3 W_{2i} + \beta_4 W_{3i} + u_i$, i = 1,…, n; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, J = 11.86.

Answer: The test statistic is distributed $\chi^2_3$ and the critical values are 11.34 and 7.81 at the 1% and 5% significance levels. Hence you can reject the null hypothesis that all the instruments are exogenous.
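
The three decisions can be reproduced with a short lookup. The critical values in the table are the ones quoted in the answers above; the function name `j_test` and the dictionary layout are choices made for this sketch.

```python
# Chi-squared critical values quoted in the answers: df -> (5% value, 1% value)
CRITICAL = {1: (3.84, 6.63), 2: (5.99, 9.21), 3: (7.81, 11.34)}

def j_test(j_stat, m, k):
    """Overidentifying restrictions test with m instruments and k endogenous regressors."""
    df = m - k   # degree of overidentification
    crit_5, crit_1 = CRITICAL[df]
    return {"df": df, "reject_5pct": j_stat > crit_5, "reject_1pct": j_stat > crit_1}

# (J, m, k) for parts (a), (b), and (c)
cases = {"a": (2.58, 2, 1), "b": (9.63, 4, 2), "c": (11.86, 4, 1)}
for label, (j, m, k) in cases.items():
    print(label, j_test(j, m, k))
```

Note that the included exogenous regressors (the W's) do not enter the degrees of freedom; only the number of instruments and the number of endogenous regressors do.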

4) To study the determinants of growth among the countries of the world, researchers have used panels of countries and observations spanning long periods of time (e.g. 1965-1975, 1975-1985, 1985-1990). Some of these studies have focused on the effect that inflation has on growth, and found that although the effect is small for a given time period, it accumulates over time and therefore has an important negative effect.

(a) Explain why the OLS estimator may be biased in this case.

Answer: The presence of simultaneous causality is highly likely since inflation may respond to growth. Depending on the list of regressors, omitted variables can also bias the estimator for the effect of the inflation rate.

(b) Explain how methods using panel data could potentially alleviate the problem.

Answer: Country fixed effects or differencing the data can solve the problem if inflation stays relatively constant over time from one country to the other. Unfortunately, if the effect of inflation on growth is the focus of the study, then much of the cross-sectional information is lost using this approach.

(c) Some authors have suggested using an index of central bank independence as an instrument. Discuss whether or not such an index would be a valid instrument.

Answer: For this index to be valid, central bank independence has to be relevant and exogenous. If inflation rates are correlated with the index, then central bank independence is a relevant instrument. Although there is a high correlation for developed countries, there is little to no correlation when data for all countries are considered. Whether or not the index is exogenous cannot be tested unless the coefficients of the equation are overidentified. Otherwise personal judgment is the only guide. An argument that central bank independence is exogenous would have to rely on it being based on institutional arrangements which are independent of inflation. Although the independence of central banks in many countries was initially determined by concerns independent of inflation, there have been many situations where the institutional arrangements were altered as a result of high inflation.

5) (Requires Matrix Algebra) The population multiple regression model can be written in matrix form as

Y = Xβ + U

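where

$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, $U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$, $X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} & W_{11} & \cdots & W_{r1} \\ 1 & X_{12} & \cdots & X_{k2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$, and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k+r} \end{pmatrix}$

Note that the X matrix contains both k endogenous regressors and (r + 1) included exogenous regressors (the constant is obviously exogenous).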

The instrumental variable estimator for the overidentified case is

$\hat{\beta}^{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'Y$,

where Z is a matrix which contains two types of variables: first the r included exogenous regressors plus the constant, and second, m instrumental variables.

$Z = \begin{pmatrix} 1 & Z_{11} & \cdots & Z_{m1} & W_{11} & \cdots & W_{r1} \\ 1 & Z_{12} & \cdots & Z_{m2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & Z_{1n} & \cdots & Z_{mn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$

It is of order n × (m + r + 1).

For this estimator to exist, both $(Z'Z)$ and $[X'Z(Z'Z)^{-1}Z'X]$ must be invertible. State the conditions under which this will be the case and relate them to the degree of overidentification.

Answer: In order for a matrix to be invertible, it must have full rank. Since Z'Z is of order (m + r + 1) × (m + r + 1), then in order to invert Z'Z, it must have rank (m + r + 1). In the case of a product such as Z'Z, the rank is less than or equal to the rank of Z' or Z, whichever is smaller. Z is of order n × (m + r + 1), and assuming that there is no perfect multicollinearity, will have either rank n or rank (m + r + 1), whichever is the smaller of the two. Hence if there are fewer observations than the number of instrumental variables plus exogenous variables, then the rank of Z will be n < (m + r + 1), and the rank of Z'Z is also n < (m + r + 1). Hence Z'Z does not have full rank, and therefore cannot be inverted. The IV estimator does not exist as a result. In the past, this was considered a strong possibility with large econometric models, where many predetermined variables entered.

If there are more observations than instruments, then the rank of Z'Z is (m + r + 1). X'Z will be of order (k + r + 1) × (m + r + 1), which will have rank (k + r + 1) if m ≥ k, i.e. if the coefficients are exactly identified or overidentified. Furthermore $[X'Z(Z'Z)^{-1}Z'X]$ is of order (k + r + 1) × (k + r + 1) and will have full rank, since the rank of a product of the three matrices involved is at most the smallest of the ranks of the three matrices X'Z, Z'Z, and Z'X.
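
The estimator and the two rank conditions can be sketched with NumPy. The function name `iv_estimator`, the simulated data, and the choice of k = 1, r = 1, and m = 2 are all assumptions made for this illustration; the formula itself is the one stated in the question.

```python
import numpy as np

def iv_estimator(Y, X, Z):
    """Compute [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'Y after checking ranks."""
    cols_z = Z.shape[1]
    if np.linalg.matrix_rank(Z) < cols_z:
        raise ValueError("Z'Z is singular: Z lacks full column rank")
    ZtZ_inv_ZtX = np.linalg.solve(Z.T @ Z, Z.T @ X)   # (Z'Z)^{-1} Z'X
    ZtZ_inv_ZtY = np.linalg.solve(Z.T @ Z, Z.T @ Y)   # (Z'Z)^{-1} Z'Y
    A = X.T @ Z @ ZtZ_inv_ZtX                         # X'Z (Z'Z)^{-1} Z'X
    if np.linalg.matrix_rank(A) < X.shape[1]:
        raise ValueError("X'Z(Z'Z)^{-1}Z'X is singular: coefficients not identified")
    return np.linalg.solve(A, X.T @ Z @ ZtZ_inv_ZtY)

# Simulated example (hypothetical): one endogenous x, one exogenous w, two instruments
rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)
w = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.5 + z1 + 0.5 * z2 + 0.3 * w + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * w + u

ones = np.ones(n)
X = np.column_stack([ones, x, w])        # n x (k + r + 1)
Z = np.column_stack([ones, z1, z2, w])   # n x (m + r + 1)

beta_iv = iv_estimator(y, X, Z)
print("IV estimates (constant, x, w):", np.round(beta_iv, 3))
```

Passing only the first three rows of the data reproduces the failure case discussed in the answer: with n = 3 observations and m + r + 1 = 4 columns, Z cannot have full column rank and the function raises an error.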

6) Consider the following model of demand and supply of coffee:

Demand: $Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$

Supply: $Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather_i + v_i$

(Variables are measured in deviations from means, so that the constant is omitted.)

What are the expected signs of the various coefficients in this model? Assume that the price of tea and Weather are exogenous variables. Are the coefficients in the supply equation identified? Are the coefficients in the demand equation identified? Are they overidentified? Is this result surprising given that there are more exogenous regressors in the second equation?

Answer: Changes in Weather will shift the supply equation and thereby trace out the demand equation. Hence the coefficients of the demand equation are exactly identified, since the number of instruments equals the number of endogenous regressors. However, the coefficients of the supply equation are underidentified, since there is no instrumental variable available for estimation. The result is not surprising, since it is not the number of exogenous regressors in the equation that matters when determining whether or not the coefficients are identified. Instead, what matters is the number of instruments available relative to the number of endogenous regressors. It is possible that the regression coefficients can be (over)identified even if there are no exogenous regressors present in the equation.
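The identification argument for the demand equation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the structural coefficient values, the standard normal shocks, and all variable names are hypothetical choices made for illustration and do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical structural coefficients (assumptions, not taken from the question)
b1, b2 = -1.0, 0.5            # demand: own price, price of tea
b3, b4, b5 = 1.0, -0.3, 1.0   # supply: own price, price of tea, weather

# Exogenous variables and structural errors, all in deviations from means
tea = rng.normal(size=n)
weather = rng.normal(size=n)
u = rng.normal(size=n)        # demand shock
v = rng.normal(size=n)        # supply shock

# Market equilibrium: set demand equal to supply and solve for the price
price = ((b4 - b2) * tea + b5 * weather + v - u) / (b1 - b3)
quantity = b1 * price + b2 * tea + u

# Demand equation: price is endogenous, the price of tea is exogenous.
X = np.column_stack([price, tea])
# Weather is excluded from demand, so it serves as the instrument for price.
Z = np.column_stack([weather, tea])

ols_demand = np.linalg.solve(X.T @ X, X.T @ quantity)  # biased by simultaneity
iv_demand = np.linalg.solve(Z.T @ X, Z.T @ quantity)   # exactly identified IV

print("true demand coefficients:", (b1, b2))
print("OLS estimates:           ", ols_demand)
print("IV estimates:            ", iv_demand)
```

In this setup the IV estimate of the price coefficient should be close to the assumed value of `b1`, while OLS is not. No analogous estimator can be built for the supply equation, because no exogenous variable is excluded from it.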

7) You started your econometrics course by studying the OLS estimator extensively, first for the simple regression case and then for extensions of it. You have now learned about the instrumental variable estimator. Under what situation would you prefer one to the other? Be specific in explaining under which situations one estimation method generates superior results.


Answer: Under the OLS assumptions, the OLS estimator is unbiased and consistent. The sampling distribution of the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.

One major concern throughout the text has been the development of new estimation techniques for the case where one of the OLS assumptions is violated, specifically when there is correlation between the error term and at least one of the regressors. This may be the result of omitted variables, errors-in-variables, or simultaneous causality bias. These make up three of the threats to internal validity. In each of these cases, OLS becomes biased and an alternative estimator should be used.

Even if the OLS assumptions are violated and the OLS estimator is biased because of omitted variable bias, simultaneous causality, or errors-in-variables, using TSLS will not improve the situation if the instruments are not valid. If the instruments are not exogenous, TSLS will yield inconsistent estimators. If the instruments are weak, the TSLS estimator will be biased and statistical inference will not be valid; furthermore, the estimator will not even be normally distributed in large samples.

If the instruments are valid and the other IV regression assumptions hold, then the TSLS estimator is consistent and therefore preferable to the OLS estimator. Although its distribution is complicated in small samples, the sampling distribution of the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.

8) Your textbook gave an example of attempting to estimate the demand for a good in a market, but being unable to do so because the demand function was not identified. Is this the case for every market? Consider, for example, the demand for sports events. One of your peers estimated the following demand function after collecting data over two years for every one of the 162 home games of the 2000 and 2001 seasons for the Los Angeles Dodgers.

Attend = 15,005 + 201×Temperat + 465×DodgNetWin + 82×OppNetWin
         (8,770)  (121)          (169)            (26)

       + 9,647×DFSaSu + 1,328×Drain + 1,609×D150m + 271×DDiv - 978×D2001;
         (1,505)        (3,355)       (1,819)       (1,184)    (1,143)

R^2 = 0.416, SER = 6,983

Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin are the net wins of the Dodgers before the game (wins minus losses), and OppNetWin is the opposing team's net wins at the end of the previous season. DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a value of 1 if, respectively, the game was played on a weekend, it rained during that day, the opposing team was within a 150-mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during 2001. Numbers in parentheses are heteroskedasticity-robust standard errors.

Even if there is no identification problem, is it likely that all regressors are uncorrelated with the error term? If not, what are the consequences?

Answer: In the case of sports events, often price and quantity are not simultaneously determined by supply and demand. For baseball games, the supply of seats is fixed at the capacity level of the stadium. In addition, prices for games are also fixed in advance and do not vary with the attractiveness of the opponent. Therefore the supply curve is infinitely elastic up to the point where the game is sold out. This situation is complicated by ticket scalping and the fact that teams stage special events (fireworks, etc.). Taking these considerations into account may result in simultaneous causality bias or a threat to internal validity because of the identification problem.

However, assuming that there is no identification problem, there may still be omitted variable bias or errors-in-variables bias. For example, attendance typically increases the tighter the race for a play-off spot towards the end of the season. Furthermore, it is not the opposing team's net wins at the end of the previous season that account for the attractiveness of the opponent, but its performance during the current season. If the opposing team's current performance is related to its performance in the previous season, then the OLS estimator is biased.

9) Earnings functions, whereby the log of earnings is regressed on years of education, years of on-the-job training, and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the returns to education, others on discrimination, union and non-union differentials, etc. For all these studies, a major concern has been the fact that ability should enter as a determinant of earnings, but that it is close to impossible to measure and therefore represents an omitted variable.

Assume that the coefficient on years of education is the parameter of interest. Given that education is positively correlated with ability, since, for example, more able students attract scholarships and hence receive more years of education, the OLS estimator for the returns to education could be upward-biased. To overcome this problem, various authors have used instrumental variables estimation techniques. For each of the potential instruments listed below, briefly discuss instrument validity.

(a) The individual’s postal zip code.

Answer: Instrument validity has two components: instrument relevance ($corr(Z_i, X_i) \neq 0$) and instrument exogeneity ($corr(Z_i, u_i) = 0$). The individual's postal zip code is likely to be uncorrelated with the omitted variable, ability, even though some zip codes may attract more able individuals. However, this is an example of a weak instrument, since it is also uncorrelated with years of education.

(b) The individual’s IQ or test-score on a work-related exam.

Answer: There is instrument relevance in this case, since, on average, individuals who do well in intelligence scores or other work-related test scores will have more years of education. Unfortunately, there is bound to be a high correlation with the omitted variable ability, since this is what these tests are supposed to measure.

(c) Years of education for the individual’s mother or father.

Answer: A non-zero correlation between the mother's or father's years of education and the individual's years of education can be expected. Hence this is a relevant instrument. However, it is not clear that the parents' years of education are uncorrelated with the parents' ability, which, in turn, can be a major determinant of the individual's ability. If this is the case, then years of education of the mother or father is not a valid instrument.

(d) Number of siblings the individual has.

Answer: There is some evidence that the larger the number of siblings of an individual, the fewer years of education the individual receives. Hence number of siblings is a relevant instrument. It has been argued that number of siblings is uncorrelated with an individual's ability. In that case it also represents an exogenous instrument. However, there is the possibility that ability depends on the attention an individual receives from parents, and this attention is shared with other siblings.

10) The two conditions for instrument validity are $corr(Z_i, X_i) \neq 0$ and $corr(Z_i, u_i) = 0$. The reason for the inconsistency of OLS is that $corr(X_i, u_i) \neq 0$. But if X and Z are correlated, and X and u are also correlated, then how can Z and u not be correlated? Explain.


Answer: The introduction to Chapter 10 on instrumental variables regression and Section 10.1 explained this problem at length. The major idea is that the variation in $X_i$ has two parts: one that is uncorrelated with $u_i$ and a second that is correlated with it, which is what makes $corr(X_i, u_i) \neq 0$. The trick is to isolate the uncorrelated part of X. For the instrument to be valid, $corr(Z_i, u_i) = 0$ and $corr(Z_i, X_i) \neq 0$ must hold. TSLS then generates predicted values of X in the first stage by using a linear combination of the instruments. As long as both conditions hold, the part of X which is uncorrelated with the error term is extracted through the prediction. In the second stage, this captured exogenous variation in X is then used to estimate the effect of X on Y.
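The two stages described in this answer can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a simulated data set; the coefficient values, the data-generating process, and the helper `ols` are hypothetical and only meant to show that Z can be correlated with X while remaining uncorrelated with u.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
beta = 2.0  # hypothetical true effect of X on Y

z = rng.normal(size=n)                       # instrument: independent of u
u = rng.normal(size=n)                       # error term
x = 0.8 * z + 0.6 * u + rng.normal(size=n)   # X is correlated with both Z and u
y = beta * x + u


def ols(regressors: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares coefficients of target on the columns of regressors."""
    return np.linalg.lstsq(regressors, target, rcond=None)[0]


const = np.ones(n)

# First stage: regress X on the instrument and keep the predicted values.
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Second stage: regress Y on the predicted values of X.
tsls_slope = ols(np.column_stack([const, x_hat]), y)[1]

# OLS of Y on X for comparison; it also picks up the part of X that moves with u.
ols_slope = ols(np.column_stack([const, x]), y)[1]

# The predicted values contain only the variation in X that comes from Z.
corr_xhat_u = np.corrcoef(x_hat, u)[0, 1]

print("TSLS slope:", tsls_slope)
print("OLS slope: ", ols_slope)
print("corr(x_hat, u):", corr_xhat_u)
```

The second-stage standard errors from a plain OLS routine would not be correct, so this sketch reports only the point estimates.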

