Lab8: Instrumental Variable Estimation in R
Introduction to Econometrics, Fall 2019
Zhaopeng Qu
Nanjing University
12/24/2020
Zhaopeng Qu (Nanjing University) Lab8: Instrumental Variable Estimation in R 12/24/2020 1 / 37
Section 1
Instrumental Variable
Instrumental Variable: Tax and Cigarettes
# load the required packages
library(tidyverse)
library(AER)
library(stargazer)
Demand for Cigarettes
# load the data set and get an overview
data("CigarettesSW")
str(CigarettesSW)
## 'data.frame': 96 obs. of 9 variables:
##  $ state     : Factor w/ 48 levels "AL","AR","AZ",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ year      : Factor w/ 2 levels "1985","1995": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cpi       : num 1.08 1.08 1.08 1.08 1.08 ...
##  $ population: num 3973000 2327000 3184000 26444000 3209000 ...
##  $ packs     : num 116 129 105 100 113 ...
##  $ income    : num 4.60e+07 2.62e+07 4.40e+07 4.47e+08 4.95e+07 ...
##  $ tax       : num 32.5 37 31 26 31 ...
##  $ price     : num 102.2 101.5 108.6 107.8 94.3 ...
##  $ taxs      : num 33.3 37 36.2 32.1 31 ...
Demand for Cigarettes
We are interested in estimating $\beta_1$ in

$\log(Q_i^{cigarettes}) = \beta_0 + \beta_1 \log(P_i^{cigarettes}) + u_i$ (1)

where $Q_i^{cigarettes}$ is the number of cigarette packs per capita sold and $P_i^{cigarettes}$ is the after-tax average real price per pack of cigarettes in state $i$.
The instrumental variable is $SalesTax$, the portion of taxes on cigarettes arising from the general sales tax. $SalesTax$ is measured in dollars per pack.
The idea is that $SalesTax$ is a relevant instrument because it is included in the after-tax average price per pack. It is also plausible that $SalesTax$ is exogenous, since the sales tax does not influence the quantity sold directly but only indirectly through the price.
Data transformations
# compute real per capita prices
CigarettesSW$rprice
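A minimal sketch of the transformations this slide relies on, under stated assumptions: the real price is taken to be `price / cpi`, the real sales tax per pack to be `(taxs - tax) / cpi`, and the 1995 cross-section is stored under the hypothetical name `c1995`. The column names come from the `str()` output above; the formulas and the object name are assumptions, not the slide's confirmed code.

```r
library(AER)  # provides the CigarettesSW data set
data("CigarettesSW")

# assumed definition: real price = nominal price deflated by the CPI
CigarettesSW$rprice <- with(CigarettesSW, price / cpi)

# assumed definition: real sales tax = (tax incl. sales tax - excise tax) / CPI
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)

# hypothetical name for the 1995 cross-section (one row per state)
c1995 <- subset(CigarettesSW, year == "1995")
```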
The first stage
# perform the first stage regression
cig_s1
##              Estimate Std. Error  t value  Pr(>|t|)
## (Intercept) 4.6165463  0.0289177 159.6444 < 2.2e-16 ***
## salestax    0.0307289  0.0048354   6.3549 8.489e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
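The first stage can be sketched as below. The model formula `log(rprice) ~ salestax` is the one reported in the F-test output that follows; the variable definitions and the subset name `c1995` are assumptions.

```r
library(AER)  # loads CigarettesSW, coeftest() and vcovHC()
data("CigarettesSW")

# assumed variable definitions (not shown on the slide)
CigarettesSW$rprice   <- with(CigarettesSW, price / cpi)
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# first stage: regress the endogenous log real price on the instrument
cig_s1 <- lm(log(rprice) ~ salestax, data = c1995)

# heteroskedasticity-robust (HC1) coefficient table
coeftest(cig_s1, vcov = vcovHC, type = "HC1")
```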
The first stage
# check instrument relevance: the F-statistic
linearHypothesis(cig_s1, "salestax = 0", vcov = vcovHC, type = "HC1")
## Linear hypothesis test
##
## Hypothesis:
## salestax = 0
##
## Model 1: restricted model
## Model 2: log(rprice) ~ salestax
##
## Note: Coefficient covariance matrix supplied.
##
##   Res.Df Df      F    Pr(>F)
## 1     47
## 2     46  1 35.713 3.145e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2SLS Manually
Estimating the model by TSLS thus yields

$\widehat{\log(Q_i^{cigarettes})} = \underset{(1.70)}{9.72} - \underset{(0.36)}{1.08} \log(P_i^{cigarettes})$ (2)

# store the predicted values from first stage
lcigp_pred
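A sketch of the manual two-step procedure, assuming the first stage `log(rprice) ~ salestax` on the 1995 cross-section. The names `c1995` and `cig_s2` and the variable definitions are hypothetical, chosen only for illustration.

```r
library(AER)
data("CigarettesSW")

# assumed variable definitions (not shown on the slide)
CigarettesSW$rprice   <- with(CigarettesSW, price / cpi)
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# stage 1: predict the log real price from the instrument
cig_s1 <- lm(log(rprice) ~ salestax, data = c1995)
lcigp_pred <- fitted(cig_s1)

# stage 2: regress log packs on the predicted log price
# (cig_s2 is a hypothetical name)
cig_s2 <- lm(log(c1995$packs) ~ lcigp_pred)
coef(cig_s2)
```

The point estimates from this two-step procedure coincide with TSLS, but the standard errors reported for the second `lm()` ignore that `lcigp_pred` is itself estimated, which is one reason to prefer `ivreg()`.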
ivreg() in the AER package
Command format

ivreg(Y ~ X + W | W + Z, ...)

Endogenous variables (X) can only appear before the vertical line.
Instruments (Z) can only appear after the vertical line.
Exogenous regressors that are not instruments (W) must appear both before and after the vertical line.
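As an illustration of the formula layout, the simplest case (one endogenous regressor X, one instrument Z, no controls W) might look like the sketch below. The variable definitions and the object names `c1995` and `fit_xz` are assumptions.

```r
library(AER)  # provides ivreg()
data("CigarettesSW")

# assumed variable definitions (not shown on the slides)
CigarettesSW$rprice   <- with(CigarettesSW, price / cpi)
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# Y = log(packs), X = log(rprice) before "|", Z = salestax after "|"
fit_xz <- ivreg(log(packs) ~ log(rprice) | salestax, data = c1995)
coef(fit_xz)
```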
Using ivreg()
# perform TSLS using 'ivreg()'
cig_ivreg
##             Estimate Std. Error t value  Pr(>|t|)
## (Intercept)  9.71988    1.52832  6.3598 8.346e-08 ***
## log(rprice) -1.08359    0.31892 -3.3977  0.001411 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
More Covariates: Income
State income may affect the demand for cigarettes and may be correlated with the sales tax.
States with high personal income tend to raise tax revenue through income taxes and rely less on sales taxes. Consequently, state income should be included in the regression model.

$\log(Q_i^{cigarettes}) = \beta_0 + \beta_1 \log(P_i^{cigarettes}) + \beta_2 \log(income_i) + u_i$ (3)

# add rincome to the dataset
CigarettesSW$rincome
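A sketch of model (3), assuming real per capita income is defined as `income / population / cpi`. That definition, the other variable definitions and the name `c1995` are assumptions. Because `log(rincome)` is treated as exogenous, it appears on both sides of the vertical bar.

```r
library(AER)
data("CigarettesSW")

# assumed variable definitions (not shown on the slide)
CigarettesSW$rprice   <- with(CigarettesSW, price / cpi)
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)
# assumed: real per capita income = income / population / CPI
CigarettesSW$rincome  <- with(CigarettesSW, income / population / cpi)
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# exogenous control log(rincome) on both sides, instrument salestax after "|"
cig_ivreg2 <- ivreg(log(packs) ~ log(rprice) + log(rincome) |
                      log(rincome) + salestax, data = c1995)
coeftest(cig_ivreg2, vcov = vcovHC, type = "HC1")
```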
2SLS Estimate
# estimate the model
cig_ivreg2
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   9.43066    1.25939  7.4883 1.935e-09 ***
## log(rprice)  -1.14338    0.37230 -3.0711  0.003611 **
## log(rincome)  0.21452    0.31175  0.6881  0.494917
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Another Instrumental Variable
We add the cigarette-specific tax ($cigtax_i$) as a second instrument; it may be more closely related to the price.
# add cigtax to the data set
CigarettesSW$cigtax
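With two instruments for one endogenous regressor the model is overidentified. A sketch, assuming the real cigarette-specific tax is `tax / cpi`; this definition, the other variable definitions and the name `c1995` are assumptions.

```r
library(AER)
data("CigarettesSW")

# assumed variable definitions (not shown on the slides)
CigarettesSW$rprice   <- with(CigarettesSW, price / cpi)
CigarettesSW$salestax <- with(CigarettesSW, (taxs - tax) / cpi)
CigarettesSW$rincome  <- with(CigarettesSW, income / population / cpi)
# assumed: real cigarette-specific tax = excise tax / CPI
CigarettesSW$cigtax   <- with(CigarettesSW, tax / cpi)
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# both instruments are listed after "|", together with the exogenous control
cig_ivreg3 <- ivreg(log(packs) ~ log(rprice) + log(rincome) |
                      log(rincome) + salestax + cigtax, data = c1995)
coeftest(cig_ivreg3, vcov = vcovHC, type = "HC1")
```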
2SLS Estimate
# estimate the model
cig_ivreg3
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   9.89496    0.95922 10.3157 1.947e-13 ***
## log(rprice)  -1.27742    0.24961 -5.1177 6.211e-06 ***
## log(rincome)  0.28040    0.25389  1.1044    0.2753
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reduced form
# estimate the model
cig_ivreg5
##               Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   5.325015   0.669483  7.9539 4.023e-10 ***
## salestax     -0.031323   0.011233 -2.7883  0.007736 **
## log(rincome) -0.230581   0.252173 -0.9144  0.365394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reduced forms
cig_ivreg6
##                Estimate Std. Error t value  Pr(>|t|)
## (Intercept)   4.5411823  0.6426287  7.0666 8.119e-09 ***
## cigtax       -0.0147577  0.0033733 -4.3748 7.128e-05 ***
## log(rincome)  0.1930657  0.2584028  0.7472    0.4589
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Application to the Demand for Cigarettes
Are the general sales tax and the cigarette-specific tax valid instruments? If not, TSLS is not helpful for estimating the demand elasticity for cigarettes. Both variables are likely to be relevant, but whether they are exogenous is a different question.
Cigarette-specific taxes could be endogenous because there might be state-specific historical factors, such as the economic importance of tobacco farming and the cigarette production industry, that lead to lobbying for low cigarette-specific taxes.
If we had data on the size of the tobacco and cigarette industry, we could address this potential issue by including that information in the regression. Unfortunately, we do not.
Because the role of the tobacco and cigarette industry can be assumed to differ across states but not over time, we can exploit the panel structure of the data: working with changes between two time periods eliminates such state-specific, time-invariant effects.
We therefore use changes in the variables between 1985 and 1995, which amounts to estimating the long-run elasticity of the demand for cigarettes.
Application to the Demand for Cigarettes
The model to be estimated by TSLS, using the general sales tax and the cigarette-specific sales tax as instruments, is therefore

$\log(Q_{i,1995}^{cigarettes}) - \log(Q_{i,1985}^{cigarettes}) = \beta_0 + \beta_1 [\log(P_{i,1995}^{cigarettes}) - \log(P_{i,1985}^{cigarettes})] + \beta_2 [\log(income_{i,1995}) - \log(income_{i,1985})] + u_i$ (4)
Before and After Model
First, create differences from 1985 to 1995 for the dependent variable, the regressors and both instruments.
# subset data for year 1985
c1985
differences from 1985 to 1995
salestaxdiff
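The differencing step can be sketched as follows. The variable names match those used in the regressions later in the slides, but the exact definitions (logs of real per capita quantities, real tax levels) and the name `c1995` are assumptions.

```r
library(AER)
data("CigarettesSW")

c1985 <- subset(CigarettesSW, year == "1985")
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# row-wise differences require both subsets to list the states in the same order
stopifnot(identical(as.character(c1985$state), as.character(c1995$state)))

# 1985-1995 differences in logs (assumed definitions)
packsdiff  <- log(c1995$packs) - log(c1985$packs)
pricediff  <- log(c1995$price / c1995$cpi) - log(c1985$price / c1985$cpi)
incomediff <- log(c1995$income / c1995$population / c1995$cpi) -
  log(c1985$income / c1985$population / c1985$cpi)

# differences in the real tax instruments (levels, not logs)
salestaxdiff <- (c1995$taxs - c1995$tax) / c1995$cpi -
  (c1985$taxs - c1985$tax) / c1985$cpi
cigtaxdiff   <- c1995$tax / c1995$cpi - c1985$tax / c1985$cpi
```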
First stage regression
# first-stage regressions
mod_relevance1
First stage regression
# robust coefficient summary for 1.
coeftest(mod_relevance1, vcov = vcovHC, type = "HC1")
# robust coefficient summary for 2.
coeftest(mod_relevance2, vcov = vcovHC, type = "HC1")
# robust coefficient summary for 3.
coeftest(mod_relevance3, vcov = vcovHC, type = "HC1")
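A sketch of the three first-stage regressions in differences. The model names and the regressors `salestaxdiff`, `cigtaxdiff` and `incomediff` appear in the slides; the exact formulas and the definitions of the differenced variables are assumptions.

```r
library(AER)
data("CigarettesSW")

c1985 <- subset(CigarettesSW, year == "1985")
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# assumed definitions of the differenced variables
pricediff  <- log(c1995$price / c1995$cpi) - log(c1985$price / c1985$cpi)
incomediff <- log(c1995$income / c1995$population / c1995$cpi) -
  log(c1985$income / c1985$population / c1985$cpi)
salestaxdiff <- (c1995$taxs - c1995$tax) / c1995$cpi -
  (c1985$taxs - c1985$tax) / c1985$cpi
cigtaxdiff   <- c1995$tax / c1995$cpi - c1985$tax / c1985$cpi

# first stages: price change on each instrument set plus the income control
mod_relevance1 <- lm(pricediff ~ salestaxdiff + incomediff)
mod_relevance2 <- lm(pricediff ~ cigtaxdiff + incomediff)
mod_relevance3 <- lm(pricediff ~ incomediff + salestaxdiff + cigtaxdiff)

coeftest(mod_relevance3, vcov = vcovHC, type = "HC1")
```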
First stage regression: F-test
# check instrument relevance for model (1)
linearHypothesis(mod_relevance1,
                 "salestaxdiff = 0",
                 vcov = vcovHC, type = "HC1")
# check instrument relevance for model (2)
linearHypothesis(mod_relevance2,
                 "cigtaxdiff = 0",
                 vcov = vcovHC, type = "HC1")
# check instrument relevance for model (3)
linearHypothesis(mod_relevance3,
                 c("salestaxdiff = 0", "cigtaxdiff = 0"),
                 vcov = vcovHC, type = "HC1")
Table: First stage
# generate table
rob_se_fs
Table: First stage
Dependent Variable: price

               First Stage:             First Stage:             First Stage:
               salestax                 cigtax                   salestax, cigtax
               (1)                      (2)                      (3)
salestaxdiff   0.025***                                          0.013***
               (0.004)                                           (0.003)
cigtaxdiff                              0.010***                 0.008***
                                        (0.001)                  (0.001)
incomediff     −0.224                   0.029                    −0.029
               (0.219)                  (0.131)                  (0.124)
Constant       0.184***                 0.155***                 0.144***
               (0.030)                  (0.020)                  (0.018)
Observations   48                       48                       48
Adjusted R2    0.493                    0.665                    0.763
F Statistic    23.857*** (df = 2; 45)   47.722*** (df = 2; 45)   51.362*** (df = 3; 44)
Reduced forms
# reduced forms
mod_reduced1
Reduced forms
# robust coefficient summary for 1.
coeftest(mod_reduced1, vcov = vcovHC, type = "HC1")
# robust coefficient summary for 2.
coeftest(mod_reduced2, vcov = vcovHC, type = "HC1")
# robust coefficient summary for 3.
coeftest(mod_reduced3, vcov = vcovHC, type = "HC1")
Reduced forms
# gather robust standard errors in a list
rob_se
Table: Reduced forms
Dependent Variable: 1985-1995 Difference in Log Packs per Capita

               Reduced Form:            Reduced Form:            Reduced Form:
               salestax                 cigtax                   salestax, cigtax
               (1)                      (2)                      (3)
salestaxdiff   −0.024***                                         −0.003
               (0.006)                                           (0.006)
cigtaxdiff                              −0.014***                −0.013***
                                        (0.002)                  (0.002)
incomediff     0.736                    0.389                    0.403
               (0.487)                  (0.291)                  (0.297)
Constant       −0.291***                −0.225***                −0.222***
               (0.067)                  (0.043)                  (0.045)
Observations   48                       48                       48
Adjusted R2    0.227                    0.583                    0.577
F Statistic    7.885*** (df = 2; 45)    33.915*** (df = 2; 45)   22.372*** (df = 3; 44)
2SLS estimation
# estimate the three models
cig_ivreg_diff1
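A sketch of the three TSLS models in differences, assuming the differenced variables are built from real prices, real per capita income and real taxes. The model names and the regressors `pricediff` and `incomediff` appear in the output on the following slides; the definitions and the name `c1995` are assumptions.

```r
library(AER)
data("CigarettesSW")

c1985 <- subset(CigarettesSW, year == "1985")
c1995 <- subset(CigarettesSW, year == "1995")  # hypothetical object name

# assumed definitions of the differenced variables
packsdiff  <- log(c1995$packs) - log(c1985$packs)
pricediff  <- log(c1995$price / c1995$cpi) - log(c1985$price / c1985$cpi)
incomediff <- log(c1995$income / c1995$population / c1995$cpi) -
  log(c1985$income / c1985$population / c1985$cpi)
salestaxdiff <- (c1995$taxs - c1995$tax) / c1995$cpi -
  (c1985$taxs - c1985$tax) / c1985$cpi
cigtaxdiff   <- c1995$tax / c1995$cpi - c1985$tax / c1985$cpi

# (1) sales tax only, (2) cigarette tax only, (3) both instruments
cig_ivreg_diff1 <- ivreg(packsdiff ~ pricediff + incomediff |
                           incomediff + salestaxdiff)
cig_ivreg_diff2 <- ivreg(packsdiff ~ pricediff + incomediff |
                           incomediff + cigtaxdiff)
cig_ivreg_diff3 <- ivreg(packsdiff ~ pricediff + incomediff |
                           incomediff + salestaxdiff + cigtaxdiff)
```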
2SLS estimation
# robust coefficient summary for 1.
coeftest(cig_ivreg_diff1, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept) -0.117962   0.068217 -1.7292   0.09062 .
## pricediff   -0.938014   0.207502 -4.5205 4.454e-05 ***
## incomediff   0.525970   0.339494  1.5493   0.12832
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# robust coefficient summary for 2.
coeftest(cig_ivreg_diff2, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept) -0.017049   0.067217 -0.2536    0.8009
## pricediff   -1.342515   0.228661 -5.8712 4.848e-07 ***
## incomediff   0.428146   0.298718  1.4333    0.1587
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# robust coefficient summary for 3.
coeftest(cig_ivreg_diff3, vcov = vcovHC, type = "HC1")
##
## t test of coefficients:
##
##              Estimate Std. Error t value  Pr(>|t|)
## (Intercept) -0.052003   0.062488 -0.8322    0.4097
## pricediff   -1.202403   0.196943 -6.1053 2.178e-07 ***
## incomediff   0.462030   0.309341  1.4936    0.1423
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
2SLS estimation
# gather robust standard errors in a list
rob_se2
Table: 2SLS estimation
Dependent Variable: 1985-1995 Difference in Log Packs per Capita

               IV: salestax   IV: cigtax   IVs: salestax, cigtax
               (1)            (2)          (3)
pricediff      −0.938***      −1.343***    −1.202***
               (0.208)        (0.229)      (0.197)
incomediff     0.526          0.428        0.462
               (0.339)        (0.299)      (0.309)
Constant       −0.118*        −0.017       −0.052
               (0.068)        (0.067)      (0.062)
Observations   48             48           48
Adjusted R2    0.530          0.498        0.526
Overidentification test
# compute the J-statistic
cig_iv_OR
##   Res.Df     RSS Df Sum of Sq Chisq Pr(>Chisq)
## 1     46 0.37472
## 2     44 0.33695  2  0.037769 4.932    0.08492 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Compute the correct p-value for the J-statistic
In the case above, the p-value reported by linearHypothesis() is wrong because the degrees of freedom are set to 2.
The correct degree of overidentification is $m - k = 2 - 1 = 1$, where $m$ is the number of instruments and $k$ is the number of endogenous regressors.
# compute correct p-value for J-statistic
cig_OR_test
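A minimal sketch of the correction using only base R. It takes the J-statistic 4.932 from the `Chisq` column printed above and evaluates the chi-squared tail probability with 1 degree of freedom instead of 2. The names `j_stat`, `p_reported` and `p_corrected` are hypothetical.

```r
# J-statistic as printed in the Chisq column of the output above
j_stat <- 4.932

# p-value as reported by linearHypothesis(): df = 2 (number of restrictions)
p_reported <- pchisq(j_stat, df = 2, lower.tail = FALSE)

# corrected p-value: df = m - k = 2 instruments - 1 endogenous regressor
p_corrected <- pchisq(j_stat, df = 1, lower.tail = FALSE)

c(reported = p_reported, corrected = p_corrected)
```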
Overidentification test
Since the corrected p-value (about 0.026) is smaller than 0.05, we reject the hypothesis that both instruments are exogenous at the 5% level. This means one of the following:
1 The sales tax is an invalid instrument for the per-pack price.
2 The cigarettes-specific sales tax is an invalid instrument for the per-pack price.
3 Both instruments are invalid.
Because the cigarettes-specific sales tax is the more likely of the two to be endogenous, it is better to drop it.