
Econometrics II

Seppo Pynnonen

Department of Mathematics and Statistics, University of Vaasa, Finland

Spring 2018

Part II

Panel Data

As of Jan 23, 2018

1 Panel Data

Pooling independent cross section across time

Fixed effects model

Two-period panel data analysis

More than two time periods

Fixed effects method

Dummy variable regression

Fixed effects or first differencing?

Balanced and unbalanced panels

Random effects models

Random effects or fixed effects

Hausman specification test

Policy analysis with panel data

Dynamic Panel Models


Data sets that combine time series and cross-sectional data are common in economics.

Independently pooled cross section:

Data are obtained by randomly sampling a large population at different points in time (e.g., yearly).

Allows one to investigate the effect of time, e.g., whether relationships have changed.

Typically raises only minor statistical complications.

Important feature:

The data set consists of independently sampled observations.


A panel data set (longitudinal data):

is a sample in which the same individuals, families, firms, cities, ... are followed across time.

E.g., OECD statistics contain numerous series observed yearly for several countries.

Similarly, time series data on several firms, industries, etc., are data of this type.


Pooling independent cross section across time


Example 1

Women's fertility over time: Data from the General Social Survey contain samples collected in even years from 1972 to 1984.

Model for explaining the total number of children born to a woman.

The data are available on the course web site (password protected).


dfr <- read.table(file = "http://www.econometrics.com/comdata/wooldridge/FERTIL1.shd",
                  stringsAsFactors = FALSE, na.strings = "-999") # read data; -999 codes missing values
vnames <- unlist(strsplit("year educ meduc feduc age kids
black east northcen west farm othrural
town smcity y74 y76 y78 y80
y82 y84 agesq y74educ y76educ y78educ
y80educ y82educ y84educ", split = "[ \n]+")) # variable names
vnames # check that OK
colnames(dfr) <- vnames # rename columns
str(dfr) # description of data frame dfr
'data.frame': 1129 obs. of 27 variables:
 $ year    : int 72 72 72 72 72 72 72 72 72 72 ...
 $ educ    : int 12 17 12 12 12 8 12 10 12 12 ...
 $ meduc   : int 8 8 7 12 3 8 12 12 8 6 ...
 $ feduc   : int 8 18 8 10 8 8 10 5 8 13 ...
 $ age     : int 48 46 53 42 51 50 47 46 41 36 ...
 $ kids    : int 4 3 2 2 2 4 0 1 2 4 ...
 $ black   : int 0 0 0 0 0 0 0 0 0 0 ...
 $ east    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ northcen: int 1 0 1 1 0 1 1 1 1 1 ...
 $ west    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ farm    : int 0 0 0 0 1 1 0 0 0 1 ...
 $ othrural: int 0 1 1 0 0 0 0 0 0 0 ...
 $ town    : int 0 0 0 1 0 0 1 0 0 0 ...
 $ smcity  : int 0 0 0 0 0 0 0 0 0 0 ...
 ...
 etc.


## average number of children per woman in years 1972 to 1984
avkids <- tapply(dfr$kids, INDEX = dfr$year, FUN = mean, na.rm = TRUE) # averages per year
round(avkids, digits = 1) # averages per year
 72  74  76  78  80  82  84
3.0 3.2 2.8 2.8 2.8 2.4 2.2
table(dfr$year) # number of observations (families) per year
 72  74  76  78  80  82  84
156 173 152 143 142 186 177


[Figure: Average Number of Children per Woman in 1972 to 1984. Horizontal axis: Year; vertical axis: average number of children.]

The average number of children per woman has clearly declined over the years.


The analysis can be substantially elaborated by regression analysis.

After controlling for other factors (education, age, etc.), what has happened to the fertility rate?

Build a regression with year dummies: y74 for 1974, ..., y84 for 1984.

Year 1972 is the base year.


lm(formula = kids ~ educ + age + I(age^2) + black + east + northcen +
    west + farm + y74 + y76 + y78 + y80 + y82 + y84, data = dfr)

Residuals:
    Min      1Q  Median      3Q     Max
-3.9493 -1.0420 -0.0663  0.9324  4.7785

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.894707   3.051590  -2.587 0.009805 **
educ        -0.124241   0.018149  -6.846 1.25e-11 ***
age          0.538145   0.138400   3.888 0.000107 ***
I(age^2)    -0.005868   0.001564  -3.751 0.000185 ***
black        1.083783   0.173404   6.250 5.83e-10 ***
east         0.227601   0.131252   1.734 0.083180 .
northcen     0.371391   0.119968   3.096 0.002012 **
west         0.218869   0.166352   1.316 0.188547
farm        -0.091881   0.122027  -0.753 0.451637
y74          0.258628   0.172716   1.497 0.134569
y76         -0.101236   0.178732  -0.566 0.571228
y78         -0.067151   0.181449  -0.370 0.711393
y80         -0.075120   0.182707  -0.411 0.681042
y82         -0.532352   0.172339  -3.089 0.002058 **
y84         -0.538395   0.174472  -3.086 0.002080 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.556 on 1114 degrees of freedom
Multiple R-squared: 0.1263, Adjusted R-squared: 0.1153
F-statistic: 11.51 on 14 and 1114 DF, p-value: < 2.2e-16


There is a sharp drop in fertility in the early 1980s (the other year dummies are not statistically significant).

E.g., the coefficient on y82 indicates that, holding other factors fixed (educ, age, and others), there were about 53 fewer children per 100 women in 1982 than in 1972.

In particular, since education is controlled for, this decline is separate from the decline due to the increase in education.

Women with more education have fewer children (the coefficient −0.12 is highly statistically significant with t = −6.85 and p-value < 0.0005).

Other things equal, 100 women with a college education (four more years of schooling) tend to have about 100 × 4 × 0.124 ≈ 50 fewer children than 100 women with only a high school education.


In summary, problems involving pooled cross section data (independent samples) can be analyzed with dummy variables.


Fixed effects model

Two-period panel data analysis

From each individual (people, firms, schools, cities, countries, etc.) data are collected at two time points, $t = 1$ and $t = 2$.

In ordinary regression, one major source of bias stems from omitted (important) variables.

For example, if the true model is

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 z_i + u_i, \tag{1}$$

but we estimate

$$y_i = \beta_0 + \beta_1 x_i + v_i, \tag{2}$$

where

$$v_i = \beta_2 z_i + u_i, \tag{3}$$

the bias in the OLS estimator $\hat\beta_1$ from model (2) is

$$E[\hat\beta_1] - \beta_1 = \beta_2 \frac{\sum_{i=1}^{n}(x_i - \bar{x})\, z_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \tag{4}$$

which can be substantial if $x$ and $z$ are correlated and $\beta_2$ is large.
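The bias formula (4) has an exact in-sample counterpart that is easy to check numerically. The following is a minimal sketch on simulated data; the sample size, the coefficient values 2 and 3, and all variable names are assumptions made for illustration, not part of the course data.

```r
set.seed(1)
n <- 500
z <- rnorm(n)                       # the variable that will be omitted
x <- 0.8 * z + rnorm(n)             # regressor correlated with z
y <- 1 + 2 * x + 3 * z + rnorm(n)   # "true" model with beta1 = 2, beta2 = 3

b_long  <- coef(lm(y ~ x + z))      # regression that includes z
b_short <- coef(lm(y ~ x))          # regression that omits z

# the ratio appearing in (4)
ratio <- sum((x - mean(x)) * z) / sum((x - mean(x))^2)

# in-sample identity: short slope = long slope + (coefficient on z) * ratio
implied <- unname(b_long["x"] + b_long["z"] * ratio)
short   <- unname(b_short["x"])
print(c(short = short, implied = implied))
```

The two printed numbers coincide, and both lie well above the true slope of 2 because x and z are positively correlated and the coefficient on z is positive.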

The problem is that we usually do not know whether important variables are missing from our model!

Use of panel data makes it possible to eliminate the omitted variable bias in certain cases.

Suppose that we have the following situation in terms of model (1)

$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 z_i + u_{it}, \tag{5}$$

where $i$ refers to individual $i$ and $t$ to time point $t$.

Thus, we have panel data where data are collected from each individual $i$ at different time points $t$ (in the two-period case, $t = 1, 2$).

Note that in (5) $z_i$ does not have a time index, which implies that variable $z$ is time invariant (or at least changes very slowly with time).


Suppose that from each of the $n$ individuals we have observations on $y_{it}$ and $x_{it}$ at time points $t = 1$ and $t = 2$, thus altogether $2n$ observations.

However, we do not observe $z_i$.

Suppose further that we allow the intercept $\beta_0$ to differ between the time points, such that (5) can be written as

$$y_{it} = \beta_0 + \delta_0 D_t + \beta_1 x_{it} + \beta_2 z_i + u_{it}, \tag{6}$$

where $D_t = 0$ for $t = 1$ and $D_t = 1$ for $t = 2$ (time dummy).


Then, taking differences

$$\Delta y_i = y_{i2} - y_{i1},$$

the model in (6) becomes

$$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i, \tag{7}$$

i.e., the (unobserved) omitted variable disappears, and the OLS estimator of the slope parameter $\beta_1$ is unbiased (provided $\Delta u_i$ is uncorrelated with $\Delta x_i$).
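To see the effect of differencing, one can simulate a two-period panel in which the unobserved z is correlated with x. This is only a sketch under assumed parameter values (slope 2, coefficient 3 on z, time effect 0.5); none of these numbers come from the course data.

```r
set.seed(2)
n  <- 1000
z  <- rnorm(n)                              # unobserved, time-invariant
x1 <- 0.8 * z + rnorm(n)                    # x in period 1, correlated with z
x2 <- 0.8 * z + rnorm(n)                    # x in period 2
y1 <- 1 +       2 * x1 + 3 * z + rnorm(n)   # period 1 (D = 0)
y2 <- 1 + 0.5 + 2 * x2 + 3 * z + rnorm(n)   # period 2 (D = 1), delta0 = 0.5

# pooled OLS with a time dummy, z left out as in practice
pool     <- data.frame(y = c(y1, y2), x = c(x1, x2), D = rep(0:1, each = n))
b_pooled <- unname(coef(lm(y ~ D + x, data = pool))["x"])

# first-difference regression, as in (7)
dy   <- y2 - y1
dx   <- x2 - x1
b_fd <- coef(lm(dy ~ dx))

print(c(pooled = b_pooled, fd = unname(b_fd["dx"]), fd_intercept = unname(b_fd[1])))
```

In this setup the pooled slope is pulled away from 2 by the omitted z, while the first-difference slope is close to 2 and the first-difference intercept is close to the assumed time effect 0.5.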


The above generalizes immediately: if we denote

$$a_i = z_i'\gamma = \gamma_1 z_{i1} + \gamma_2 z_{i2} + \cdots + \gamma_q z_{iq} \tag{8}$$

and extend (6) to

$$y_{it} = \beta_0 + \delta_0 D_t + \beta_1 x_{it} + a_i + u_{it}, \tag{9}$$

taking differences again reduces the problem to estimating model (7).

The above model is called the fixed effect (FE) model, in which $a_i$ is fixed over the time periods ($a_i$ can be a random variable, and can correlate with the explanatory variable $x_{it}$).

If $a_i$ is not correlated with the explanatory variables, the model is called the random effect (RE) model. It is estimated with different techniques that yield more efficient estimators of the $\beta$-parameters than the fixed effect methods (which are basically OLS methods). We will return to the RE model later.

In the FE case the estimators of the regression parameters obtained by applying OLS to the first-differenced equation are called the first-differenced estimators (FD estimators).

We will deal with other fixed effect estimators later.

In summary:

Differencing eliminates all unobserved time invariant factors from the model.

A major pitfall is that differencing also wipes out observed time invariant variables (like gender) from the model!

FE cannot be used in these cases (if we want to estimate these effects), or in cases where the explanatory variables change very slowly across time (the difference is nearly zero).


In many cases the FD method is useful, however.

The following example highlights the biasing effect of unobserved factors and how panel estimation with the simple FD method is likely to solve the problem.

Example 2

Data set crime2 (Wooldridge) contains data on crime and unemployment rates for 46 US cities for 1982 (t = 1) and 1987 (t = 2).

Running a simple cross section regression of crmrte on unem using only 1987 yields


lm(formula = crmrte ~ unem, data = cdfr, subset = year == 87)

Residuals:
   Min     1Q Median     3Q    Max
-57.55 -27.01 -10.56  18.01  79.75

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  128.378     20.757   6.185  1.8e-07 ***
unem          -4.161      3.416  -1.218     0.23
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

F-statistic: 1.483 on 1 and 44 DF, p-value: 0.2297


The coefficient of unem is negative, −4.16!

However, it is not statistically significant.

The regression likely suffers from an omitted variables problem (age distribution, gender distribution, education levels, ...).

Most of these can be expected to be fairly stable across time. Thus, use of panel data techniques may be helpful.

Before proceeding to the panel data estimation, let us see what happens if we simply pool the two years and estimate

$$crmrte = \beta_0 + \delta_0 D87 + \beta_1 unem + u, \tag{10}$$

where $D87$ is the year 1987 dummy.


lm(formula = crmrte ~ d87 + unem, data = cdfr)

Residuals:
    Min      1Q  Median      3Q     Max
-53.474 -21.794  -6.266  18.297  75.113

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  93.4203    12.7395   7.333 9.92e-11 ***
d87           7.9404     7.9753   0.996    0.322
unem          0.4265     1.1883   0.359    0.720
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.01221, Adjusted R-squared: -0.009986

The situation does not change much qualitatively!


R, SAS, Stata, and EViews all have sophisticated panel data procedures.

We discuss some of them later.

R has the plm package for panel data analysis. To use it, variables that identify the individuals and the time periods must be available. In crime2, year is the time index, but city identifiers must be defined (call the variable city). With these, the FD method can be applied by setting model = "fd" and index = c("city", "year") (in this order!) in the call to plm; see the example below.

In Stata the FD method can be applied with the regress routine after first declaring the data to be panel data with the xtset command (Menu: Statistics > Longitudinal/panel data > Setup and utilities > Declare dataset to be panel data).


EViews: Proc > Structure/Resize Current Page..., and follow the instructions.

SAS: proc panel data = crime2; id city year; model crmrte = unem; run; Before applying proc panel the data must be sorted with proc sort.

Whichever software is used, identifiers for the individuals (in particular) are needed to indicate the multiple measurements on an individual.


After declaring the panel structure for the program, the model

$$\Delta crmrte = \delta_0 + \beta_1 \Delta unem + \Delta u \tag{11}$$

can be estimated with the FD method in R as follows:

plm(formula = crmrte ~ unem, data = cdfr, model = "fd", index = c("city", "year"))

Balanced Panel: n=46, T=2, N=92
Observations used in estimation: 46

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
 -36.90  -13.40   -5.51   12.40   52.90

Coefficients :
            Estimate Std. Error t-value Pr(>|t|)
(intercept) 15.40219    4.70212  3.2756  0.00206 **
unem         2.21800    0.87787  2.5266  0.01519 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares: 20256
Residual Sum of Squares: 17690
R-Squared: 0.1267
Adj. R-Squared: 0.10685

In EViews, after the data have been reshaped to panel data, the FD estimation can be carried out using Quick > Estimate Equation... to open the Equation Estimation window and entering d(crmrte) c d(unem), which gives results similar to those above.

The coefficient estimate $\hat\beta_1 \approx 2.22$ is now statistically significant at the 5% level (p ≈ 0.015) and of the expected sign.

The model predicts that a one percentage point increase in the unemployment rate increases the crime rate by about 2.2 crimes per 1,000 people.

The constant term indicates that even if the change in the unemployment rate were zero, the crime rate would have increased between 1982 and 1987 by about 15.4 crimes per 1,000 people.


Note that the time dummy component $\delta_0$ in (11) captures all unobserved time effects that are common to all cross-sectional individuals.

That is, we can consider $\delta_0$ to represent

$$\delta_0 = \delta' z_t = \delta_1 z_{1t} + \delta_2 z_{2t} + \cdots + \delta_p z_{pt},$$

where the $z_t$'s are common trend components affecting all crime rates in $t = 1987$.


More than two time periods

Differencing can also be used for fixed effect estimation with more than two time periods.

As an example, consider a three-period model

$$y_{it} = \delta_1 + \delta_2 D_{2t} + \delta_3 D_{3t} + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + u_{it} \tag{12}$$

for $t = 1, 2, 3$, where $D_{2t} = 1$ for period $t = 2$ and zero otherwise, and $D_{3t} = 1$ for $t = 3$ and zero otherwise.

Differencing yields

$$\Delta y_{it} = \delta_2 \Delta D_{2t} + \delta_3 \Delta D_{3t} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \tag{13}$$

$t = 2, 3$.

Note: For $t = 2$, $\Delta D_{2t} = 1$ and $\Delta D_{3t} = 0 = D_{3t}$; for $t = 3$, $\Delta D_{2t} = -1$ and $\Delta D_{3t} = 1 = D_{3t}$.

Again the model is simple to estimate by OLS.


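Remark 1

A model of the form (13) is usually reparametrized into an equivalent form

$$\Delta y_{it} = \alpha_0 + \alpha_3 D_{3t} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \tag{14}$$

where $\alpha_0 = \delta_2$ and $\alpha_3 = \delta_3 - 2\delta_2$, so that the period-3 intercept $\alpha_0 + \alpha_3$ equals $\delta_3 - \delta_2$, the period-3 intercept implied by (13).

This generalizes to $T$ time periods with time dummies $D_{1t}, D_{2t}, \ldots, D_{Tt}$, in which $D_{jt} = 1$ if $j = t$ and zero otherwise, $j = 1, 2, \ldots, T$:

$$\Delta y_{it} = \alpha_0 + \alpha_3 D_{3t} + \cdots + \alpha_T D_{Tt} + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \tag{15}$$

where $\alpha_0 = \delta_2$ and $\alpha_0 + \alpha_j = \delta_j - \delta_{j-1}$, i.e., $\alpha_j = \delta_j - \delta_{j-1} - \delta_2$, $j = 3, \ldots, T$.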


Fixed effects method

An alternative method, which in certain cases works better than the FD method, is called the fixed effects method.

Consider the simple model

$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \tag{16}$$

$i = 1, \ldots, n$, $t = 1, \ldots, T$.

Thus there are altogether $n \times T$ observations.

Define means over the $T$ time periods

$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \quad \bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}, \quad \bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}. \tag{17}$$


Then

$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i. \tag{18}$$

Note that

$$\frac{1}{T}\sum_{t=1}^{T} a_i = \frac{1}{T}\, T a_i = a_i.$$

Thus, subtracting (18) from (16) eliminates $a_i$ and gives

$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i) \tag{19}$$

or

$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}, \tag{20}$$

where, e.g., $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on $y$.

This transformation is also called the within transformation, and the resulting OLS estimators of the regression parameters applied to (20) are called fixed effect estimators or within estimators.
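The within transformation can be carried out by hand in base R with ave(), which returns each individual's mean repeated over that individual's observations. The sketch below uses simulated data; the slope 2, the panel dimensions, and the variable names are illustrative assumptions.

```r
set.seed(3)
n <- 200
periods <- 4
id <- rep(seq_len(n), each = periods)   # individual identifier
a  <- rnorm(n)[id]                      # unobserved effect a_i, repeated over t
x  <- 0.8 * a + rnorm(n * periods)      # x_it correlated with a_i
y  <- 2 * x + a + rnorm(n * periods)    # model (16) with beta1 = 2

# time-demeaned data: subtract each individual's own mean, as in (19)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)

b_within <- unname(coef(lm(y_dm ~ 0 + x_dm)))   # within (FE) estimator from (20)
b_pooled <- unname(coef(lm(y ~ x))["x"])        # pooled OLS, ignores a_i

print(c(within = b_within, pooled = b_pooled))
```

Here the within estimate is close to 2, whereas pooled OLS overstates the slope because a_i is positively correlated with x_it.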


In the two-period case the FD and FE methods lead to identical results.
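The two-period equivalence can be verified numerically. In this sketch (simulated data, assumed slope 2, and no time dummy in either regression) the FD slope and the within slope agree up to rounding error.

```r
set.seed(4)
n  <- 50
a  <- rnorm(n)                       # unobserved effect
x1 <- a + rnorm(n)
x2 <- a + rnorm(n)
y1 <- 2 * x1 + a + rnorm(n)
y2 <- 2 * x2 + a + rnorm(n)

# FD: regress the change in y on the change in x (no intercept)
dy   <- y2 - y1
dx   <- x2 - x1
b_fd <- unname(coef(lm(dy ~ 0 + dx)))

# FE: stack both periods, demean within individual, regress without intercept
id   <- rep(seq_len(n), times = 2)
y    <- c(y1, y2)
x    <- c(x1, x2)
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_fe <- unname(coef(lm(y_dm ~ 0 + x_dm)))

print(c(fd = b_fd, fe = b_fe))
```

With T = 2 the demeaned values are just plus and minus half of the first difference, which is why the two slopes coincide.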

Remark 2

The slope coefficient $\beta_1$ estimated from (18) is called the between estimator; $v_i = a_i + \bar{u}_i$ is the error term. The estimator is biased, however, if the unobserved component $a_i$ is correlated with $x$.

Remark 3

When estimating the unobserved effects model by the fixed effect (FE) method, it is unfortunately not clear how the goodness-of-fit R-square should be computed. Stata produces three different R-squares: within, between, and total.


Remark 4

Usually a full set of year dummies (i.e., year dummies for all years but the first) is included in FE estimation to capture time variation. However, then the effect of any variable whose change across time is constant cannot be estimated (an example of such a variable is experience measured by the number of years; experience increases every year by one).

Remark 5

Although time invariant variables cannot be included by themselves in an FE model, their interactions with year dummies can. For example, in a wage equation the interactions (year dummy) × (education) measure the change in the return to education over time.


Dummy variable regression

Yet another method is to introduce dummy variables for the cross section units (N − 1 dummy variables) and (possibly) for the periods (T − 1 dummies).

If N and T are large this is not very practical.

The method gives the same estimates of the regression coefficients as the time-demeaned method, and the standard errors and major statistics are the same.
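The equivalence of the dummy variable regression and the within regression can be checked on a small simulated panel. This is a sketch with assumed dimensions and an assumed slope of 2. One caveat it illustrates: plain lm() on demeaned data does not know that n unit means were estimated, so its standard error needs a degrees-of-freedom correction before it matches the dummy variable regression (panel routines apply this correction themselves).

```r
set.seed(5)
n <- 30
periods <- 5
N  <- n * periods
id <- rep(seq_len(n), each = periods)   # cross-section identifier
a  <- rnorm(n)[id]                      # unobserved effect a_i
x  <- a + rnorm(N)                      # regressor correlated with a_i
y  <- 2 * x + a + rnorm(N)

# dummy variable regression: intercept plus n - 1 unit dummies
lsdv <- lm(y ~ x + factor(id))

# within regression on time-demeaned data
y_dm   <- y - ave(y, id)
x_dm   <- x - ave(x, id)
within <- lm(y_dm ~ 0 + x_dm)

b_lsdv    <- unname(coef(lsdv)["x"])
b_within  <- unname(coef(within)["x_dm"])
se_lsdv   <- summary(lsdv)$coefficients["x", "Std. Error"]
se_within <- summary(within)$coefficients["x_dm", "Std. Error"]

# rescale from N - 1 to N - n - 1 residual degrees of freedom
se_within_adj <- se_within * sqrt((N - 1) / (N - n - 1))

print(c(b_lsdv = b_lsdv, b_within = b_within,
        se_lsdv = se_lsdv, se_within_adj = se_within_adj))
```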


Example 3

Papke (1994), Journal of Public Economics 54, 37–49, studied the effect of the Indiana enterprise zone program on unemployment in the years 1980–1988 (Wooldridge's data base, file: ezunem). Six zones were designated in 1984 and four more in 1985. Twelve cities did not receive a zone (control group).

An evaluation model of the policy is

$$\log(uclms_{it}) = \theta_t + \beta_1 D_{it} + a_i + u_{it}, \tag{21}$$

where $\theta_t$ indicates a time varying intercept, $uclms_{it}$ is the number of unemployment claims during year $t$ in city $i$, and $D_{it} = 1$ if city $i$ had the zone in year $t$ and zero otherwise.

First difference estimates for $\beta_1$:


plm(formula = log(uclms) ~ d81 + d82 + d83 + d84 + d85 + d86 +
    d87 + d88 + ez, data = udfr, model = "fd", index = c("city",
    "year"))

Balanced Panel: n = 22, T = 9, N = 198
Observations used in estimation: 176

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-0.4925469 -0.1426738 -0.0091983  0.1494605  0.6062251

Coefficients:
     Estimate Std. Error t-value  Pr(>|t|)
d81 -0.321632   0.046064 -6.9823 6.547e-11 ***
d82  0.135496   0.065144  2.0799  0.039059 *
d83 -0.219255   0.079785 -2.7481  0.006654 **
d84 -0.558025   0.094564 -5.9011 1.958e-08 ***
d85 -0.556576   0.108961 -5.1080 8.809e-07 ***
d86 -0.586054   0.118298 -4.9541 1.769e-06 ***
d87 -0.853738   0.126950 -6.7250 2.666e-10 ***
d88 -1.192423   0.135049 -8.8296 1.382e-15 ***
ez  -0.181878   0.078186 -2.3262  0.021209 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares: 20.678
Residual Sum of Squares: 7.7958
R-Squared: 0.623


The estimate $\hat\beta_1 = -0.182$ indicates that the presence of an EZ causes about a 16.6% fall in unemployment claims ($e^{-0.182} - 1 \approx -0.166$), which is both economically and statistically significant (t-value −2.33).


Fixed effect estimation results:

plm(formula = log(uclms) ~ d81 + d82 + d83 + d84 + d85 + d86 +
    d87 + d88 + ez, data = udfr, model = "within", index = c("city",
    "year"))

Balanced Panel: n = 22, T = 9, N = 198

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max.
-0.5761817 -0.1083693 -0.0097701  0.1136396  0.4962292

Coefficients:
     Estimate Std. Error  t-value  Pr(>|t|)
d81 -0.321632   0.060457  -5.3200 3.297e-07 ***
d82  0.135496   0.060457   2.2412 0.0263323 *
d83 -0.219255   0.060457  -3.6266 0.0003811 ***
d84 -0.579152   0.062318  -9.2935 < 2.2e-16 ***
d85 -0.591787   0.065495  -9.0355 3.919e-16 ***
d86 -0.621265   0.065495  -9.4856 < 2.2e-16 ***
d87 -0.888949   0.065495 -13.5727 < 2.2e-16 ***
d88 -1.227633   0.065495 -18.7438 < 2.2e-16 ***
ez  -0.104415   0.055419  -1.8841 0.0612906 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares: 42.388
Residual Sum of Squares: 6.7144
R-Squared: 0.8416


Dummy variable regression:

lm(formula = log(uclms) ~ d81 + d82 + d83 + d84 + d85 + d86 +
    d87 + d88 + c2 + c3 + c4 + c5 + c6 + c7 + c8 + c9 + c10 +
    c11 + c12 + c13 + c14 + c15 + c16 + c17 + c18 + c19 + c20 +
    c21 + c22 + ez, data = udfr)

Residuals:
     Min       1Q   Median       3Q      Max
-0.57618 -0.10837 -0.00977  0.11364  0.49623

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.67615    0.08008 145.807  < 2e-16 ***
d81         -0.32163    0.06046  -5.320 3.30e-07 ***
d82          0.13550    0.06046   2.241 0.026332 *
d83         -0.21926    0.06046  -3.627 0.000381 ***
d84         -0.57915    0.06232  -9.293  < 2e-16 ***
d85         -0.59179    0.06550  -9.036 3.92e-16 ***
d86         -0.62126    0.06550  -9.486  < 2e-16 ***
d87         -0.88895    0.06550 -13.573  < 2e-16 ***
d88         -1.22763    0.06550 -18.744  < 2e-16 ***
(city dummies deleted)
ez          -0.10441    0.05542  -1.884 0.061291 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results show that the FE and dummy variable regression results are exactly the same.

Using the FE results, the coefficient −0.104 implies about a 10 percent drop in unemployment claims due to the program. At the 5% level the estimate is significant in one-tailed testing but not in two-tailed testing (two-sided p-value 0.061).
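Because the dependent variable is in logs, a dummy coefficient b corresponds to an exact percentage change of 100(exp(b) − 1). A small helper (the name pct_effect is hypothetical) applied to the two ez estimates reported above:

```r
# exact percentage change implied by a coefficient b in a log-level model
pct_effect <- function(b) 100 * (exp(b) - 1)

fd_effect <- pct_effect(-0.182)   # FD estimate of the ez coefficient
fe_effect <- pct_effect(-0.104)   # FE estimate of the ez coefficient

print(round(c(FD = fd_effect, FE = fe_effect), 1))   # about -16.6 and -9.9
```

For small coefficients the approximation 100·b is close (−10.4 versus −9.9 for the FE estimate); the gap widens as the coefficient grows in absolute value.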


Fixed effects or first differencing?

If the number of periods is 2 ($T = 2$), FE and FD give identical results.

When $T \ge 3$, FE and FD are not the same. Both are unbiased under assumptions FE.1–FE.4.

FE.1 For each $i$, the model is $y_{it} = \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}$, $t = 1, \ldots, T$.

FE.2 We have a random sample from the cross section.

FE.3 Each explanatory variable changes over time, and the explanatory variables are not perfectly collinear.

FE.4 $E[u_{it} \mid X_i, a_i] = 0$ for all time periods ($X_i$ stands for all explanatory variables).

FE.5 $\mathrm{var}[u_{it} \mid X_i, a_i] = \sigma_u^2$ for all $t = 1, \ldots, T$.

FE.6 $\mathrm{cov}[u_{it}, u_{is}] = 0$ for all $t \ne s$.

FE.7 $u_{it} \mid X_i, a_i \sim NID(0, \sigma_u^2)$.

Both are consistent under assumptions FE.1–FE.4 for fixed $T$ as $n \to \infty$.


If $u_{it}$ is serially uncorrelated, FE is more efficient than FD (because of this FE is more popular).

If $u_{it}$ is (highly) serially correlated, $\Delta u_{it}$ may be less serially correlated, which may favor FD over FE. However, typically $T$ is rather small, such that serial correlation is difficult to detect.

In sum, there are no clear-cut guidelines for choosing between the two. Thus, good advice is to compute both and, if there is a big difference, try to determine why they differ.


Balanced and unbalanced panels

A data set is called a balanced panel if the same number of time series observations is available for each cross section unit. That is, $T$ is the same for all individuals. The total number of observations in a balanced panel is $nT$.

All the above examples are balanced panel data sets.

If some cross section units have missing observations, so that for individual $i$ there are $T_i$ time period observations, $i = 1, \ldots, n$, with $T_i \ne T_j$ for some $i$ and $j$, we call the data set an unbalanced panel. The total number of observations in an unbalanced panel is $T_1 + \cdots + T_n$.

In most cases unbalanced panels do not cause major problems for fixed effect estimation.

Modern software packages make the appropriate adjustments to the estimation results.


Random effects models


Consider the simple unobserved effects model

$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}, \tag{22}$$

$i = 1, \ldots, n$, $t = 1, \ldots, T$.

Typically time dummies are also included in (22).

Using FD or FE eliminates the unobserved component $a_i$.

However, if $a_i$ is uncorrelated with $x_{it}$, random effect (RE) estimation can lead to more efficient estimation of the regression parameters.


Generally, we call the model in equation (22) the random effects model if $a_i$ is uncorrelated with all explanatory variables, i.e.,

$$\mathrm{cov}[x_{it}, a_i] = 0, \quad t = 1, \ldots, T. \tag{23}$$

How to estimate $\beta_1$ efficiently?

If (23) holds, $\beta_1$ can be estimated consistently from a single cross section.

Obviously this discards a lot of useful information.

Also, if (23) holds, $\beta_1$ can be estimated consistently from the pooled data set by OLS. However, the errors are correlated within individuals (and possibly heteroskedastic), which leads to underestimation of the standard errors.


If the data set is simply pooled and the error term is denoted as $v_{it} = a_i + u_{it}$, we have the regression

$$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}. \tag{24}$$

Then

$$\mathrm{corr}[v_{it}, v_{is}] = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} \tag{25}$$

for $t \ne s$, where $\sigma_a^2 = \mathrm{var}[a_i]$ and $\sigma_u^2 = \mathrm{var}[u_{it}]$.

That is, the error terms $v_{it}$ are (positively) autocorrelated, which biases the standard errors of the OLS estimator $\hat\beta_1$.
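The correlation in (25) can be illustrated by simulation. In the sketch below the standard deviations 1.0 and 1.5 are arbitrary assumptions; the sample correlation between the composite errors of two periods should be close to the theoretical value.

```r
set.seed(6)
n       <- 5000
sigma_a <- 1.0                      # sd of the individual effect (assumed)
sigma_u <- 1.5                      # sd of the idiosyncratic error (assumed)

a  <- rnorm(n, sd = sigma_a)        # a_i, shared by both periods
v1 <- a + rnorm(n, sd = sigma_u)    # composite error in period t
v2 <- a + rnorm(n, sd = sigma_u)    # composite error in period s != t

rho_theory <- sigma_a^2 / (sigma_a^2 + sigma_u^2)   # right-hand side of (25)
rho_sample <- cor(v1, v2)

print(c(theory = rho_theory, sample = rho_sample))
```

The only source of correlation between the two periods is the shared a_i, so the correlation grows with the share of the error variance that is due to the individual effect.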


If $\sigma_a^2$ and $\sigma_u^2$ were known, optimal estimators (BLUE) would be obtained by generalized least squares (GLS), which in this case reduces to estimating the regression slope coefficients from the quasi-demeaned equation

$$y_{it} - \lambda \bar{y}_i = \beta_0 (1 - \lambda) + \beta_1 (x_{it} - \lambda \bar{x}_i) + (v_{it} - \lambda \bar{v}_i), \tag{26}$$

where

$$\lambda = 1 - \left( \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right)^{1/2}. \tag{27}$$

In practice $\sigma_u^2$ and $\sigma_a^2$ are unknown, but they can be estimated.
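The weight in (27) is a simple function of the two variances and the number of periods. A minimal sketch (the function name re_lambda and the input values are hypothetical):

```r
# quasi-demeaning weight lambda from (27)
re_lambda <- function(sigma2_u, sigma2_a, periods) {
  1 - sqrt(sigma2_u / (sigma2_u + periods * sigma2_a))
}

l_none  <- re_lambda(sigma2_u = 1, sigma2_a = 0,   periods = 8)  # 0
l_equal <- re_lambda(sigma2_u = 1, sigma2_a = 1,   periods = 8)  # 1 - sqrt(1/9) = 2/3
l_large <- re_lambda(sigma2_u = 1, sigma2_a = 1e6, periods = 8)  # very close to 1

print(c(none = l_none, equal = l_equal, large = l_large))
```

The weight is 0 when there is no individual effect variance and approaches 1 as the individual effect variance (or the number of periods) grows relative to the idiosyncratic variance.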


One method is to estimate (24) from the pooled data set and use the OLS residuals $\hat{v}_{it}$ to estimate $\sigma_a^2$ and $\sigma_u^2$ and plug them into (27).

The resulting GLS estimators of the regression slope coefficients are called random effects estimators (RE estimators).

Under the random effects assumptions² the estimators are consistent, but not unbiased.

They are also asymptotically normal as $n \to \infty$ for fixed $T$.

However, with small $n$ and large $T$ the properties of the RE estimator are largely unknown.

² The ideal random effects assumptions include FE.1, FE.2, FE.4–FE.6. FE.3 is replaced with
RE.3: There are no perfect linear relationships among the explanatory variables.
RE.4: In addition to FE.4, $E[a_i \mid X_i] = 0$.


It is notable that $\lambda = 0$ in (26) results in the pooled regression, and FE is obtained with $\lambda = 1$.

RE estimation is available in modern statistical packages with different options.

Example 4

Data set wagepan.xls (Wooldridge): n = 545, T = 8.

Is there a wage premium in belonging to a labor union?

$$\log(wage_{it}) = \beta_0 + \beta_1 educ_{it} + \beta_2 exper_{it} + \beta_3 exper_{it}^2 + \beta_4 married_{it} + \beta_5 union_{it} + a_i + u_{it}$$

The data cover 1980–1987; year dummies for 1981–1987 are included (1980 is the base year).

It is notable that including a full set of year dummies implies that one cannot estimate with the FE method the effects of variables that change by a constant amount over time. Experience (exper) is such a variable.


Pooled OLS:

             Estimate  Std. Error
(Intercept) -0.0343056    0.06326
educ         0.0989945    0.00462
exper        0.0861696    0.01014
I(exper^2)  -0.0027349    0.00071
married      0.1230113    0.01557
union        0.1685243    0.01707

Comparison (standard errors in parentheses):

---------------------------------------------------
lwage   |   Pooled       Random       Fixed
        |   OLS          Effects      Effects
--------+------------------------------------------
educ    |   .0989945     .0906150     ..
        |  (.0046227)   (.0105807)
exper   |   .0861696     .1027934     ..
        |  (.0101415)   (.0153853)
exper2  |  -.0027349    -.0046859    -.0051855
        |  (.0007099)   (.0006896)   (.0007044)
married |   .1230113     .0678821     .0466804
        |  (.0155714)   (.0167369)   (.0183104)
union   |   .1685243     .1031103     .0800019
        |  (.0170652)   (.0178388)   (.0193103)
---------------------------------------------------

It is notable that the OLS standard errors tend to be smaller than in the RE or FE cases.

OLS standard errors underestimate the true standard errors.

OLS coefficient estimates also suffer from the omitted variable problem that panel estimation accounts for.

The Stata estimate of the correlation in (25) is .464.


Random effects or fixed effects


FE is widely considered preferable because it allows correlation between $a_i$ and the $x$ variables.

Provided that the common effects, aggregated into $a_i$, are not correlated with the $x$ variables, an obvious advantage of RE is that it also allows estimation of the effects of factors that do not change over time (like education in the above example).

Typically the condition that the common effects $a_i$ are not correlated with the regressors ($x$ variables) should be considered the exception rather than the rule, which favors FE.


Hausman specification test


Hausman (1978) devised a test for the orthogonality of the common effects ($a_i$) and the regressors.

The test compares the fixed effect (OLS) and random effect (GLS) estimates using the Wald testing approach.


The basic idea of the test relies on the fact that under the null hypothesis of orthogonality both OLS and GLS are consistent, while under the alternative hypothesis GLS is not consistent.

Thus, under the null hypothesis the OLS and GLS estimates should not differ much from each other.

The test compares these estimates with a Wald statistic.
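The Wald statistic behind the test has the quadratic form (b − B)'(V_b − V_B)^(-1)(b − B), compared with a chi-square distribution whose degrees of freedom equal the number of coefficients compared. The sketch below implements this form in base R; the function name hausman_stat and the input numbers are made up for illustration and are not taken from any estimation in these notes.

```r
# Hausman-type Wald statistic for comparing FE (b) and RE (B) estimates
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d  <- b_fe - b_re                       # difference in coefficient estimates
  V  <- V_fe - V_re                       # variance of the difference under H0
  H  <- as.numeric(t(d) %*% solve(V) %*% d)
  df <- length(d)
  c(statistic = H, df = df,
    p.value = pchisq(H, df = df, lower.tail = FALSE))
}

# hypothetical inputs with two coefficients
b_fe <- c(1.0, 2.0)
b_re <- c(0.5, 1.5)
V_fe <- diag(c(0.2, 0.3))
V_re <- diag(c(0.1, 0.1))

res <- hausman_stat(b_fe, b_re, V_fe, V_re)
print(res)   # statistic = 0.25/0.1 + 0.25/0.2 = 3.75
```

In finite samples the difference V_fe − V_re need not be positive definite, in which case the statistic should be interpreted with care.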

In Stata, performing the Hausman test requires that both the OLS and GLS regression results are stored so that they are available to the postestimation test procedure (hausman).


Example 5

Applying the Hausman test to the case of Example 4 in Stata yields:


* Estimate fixed effects
xtreg lwage y81 y82 y83 y84 y85 y86 y87 exper2 married union, fe
* store the results into "hfixed"
estimates store hfixed
* Estimate the random effects model
xtreg lwage y81 y82 y83 y84 y85 y86 y87 educ exper exper2 married union, re
* store the results into "hrandom"
estimates store hrandom
* Hausman test
hausman hfixed hrandom

                 ---- Coefficients ----
         |     (b)         (B)         (b-B)     sqrt(diag(V_b-V_B))
         |    hfixed     hrandom     Difference         S.E.
---------+----------------------------------------------------------
     y81 |  .1511912    .0427498     .1084414            .
     y82 |  .2529709    .035577      .2173939            .
     y83 |  .3544437    .0270943     .3273494            .
     y84 |  .4901148    .052207      .4379078            .
     y85 |  .6174822    .0690524     .5484299            .
     y86 |  .7654965    .1053229     .6601736            .
     y87 |  .9250249    .1505464     .7744785            .
  exper2 | -.0051855   -.0046859    -.0004996         .000144
 married |  .0466804    .0678821    -.0212017         .0074261
   union |  .0800019    .1031103    -.0231085         .0073935
--------------------------------------------------------------------
  b = consistent under Ho and Ha; obtained from xtreg
  B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

  chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
           = 26.77
  Prob>chi2 = 0.0028
  (V_b-V_B is not positive definite)


The test rejects the orthogonality condition. Thus, FE should be used.

In EViews the Hausman test is obtained by first estimating the model as a random effect model and then selecting View > Fixed/Random Effects Testing > Correlated Random Effects - Hausman Test.


Policy analysis with panel data


Panel data are useful for policy analysis, in particular for program evaluation.

Example 6

Continue Example 1.2, where the effect of a training program on worker productivity was evaluated.

The data include three years: 1987, 1988, and 1989.

The training program was first implemented in 1988.

We focus on the years 1987 (no program) and 1988 (program implemented) to see whether the program benefits firms.

The panel model is

$$\log(scrap_{it}) = \beta_0 + \delta_0\, y88_t + \beta_1 grant_{it} + a_i + u_{it}, \tag{28}$$

where $y88_t$ is the year 1988 dummy (= 1 for year 1988 and = 0 otherwise) and $a_i$ includes the unobserved firm effects (worker skill, etc.).


Panel Data


Ignoring panel structure OLS results suggested no improvement.

Dependent Variable: LOG(SCRAP)

Method: Panel Least Squares

Sample: 1 471 IF YEAR < 1989

Periods included: 2

Cross-sections included: 54

Total panel (balanced) observations: 108

=====================================================

Variable Coefficient Std. Error t-Statistic Prob.

-----------------------------------------------------

C 0.523144 0.159783 3.274086 0.0014

GRANT -0.058018 0.380949 -0.152299 0.8792

-----------------------------------------------------

R-squared 0.000219

Adjusted R-squared -0.009213

S.E. of regression 1.507393

F-statistic 0.023195

Prob(F-statistic) 0.879241

=====================================================

The coefficient for grant is not statistically significant, suggesting that the program does not help in reducing the scrap rate.
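One reason a pooled regression can mislead is that grants need not be assigned independently of the firm effect $a_i$. The simulation below is a minimal sketch under made-up assumptions (2000 hypothetical firms, a true grant effect of -0.3, and grants going more often to firms with high $a_i$). It does not use the course data. It compares pooled OLS with a regression on first differences, in which $a_i$ cancels.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n_firms = 2000
beta1, delta0 = -0.3, -0.2        # hypothetical true effects, not estimates

pooled_x, pooled_y, d_grant, d_y = [], [], [], []
for firm in range(n_firms):
    a = random.gauss(0, 1)                        # unobserved firm effect a_i
    # Assumption: firms with a high a_i (high scrap) get grants more often
    grant88 = 1.0 if a + random.gauss(0, 1) > 0.5 else 0.0
    y87 = 0.5 + a + random.gauss(0, 0.5)
    y88 = 0.5 + delta0 + beta1 * grant88 + a + random.gauss(0, 0.5)
    pooled_x += [0.0, grant88]                    # grant is 0 for all firms in 1987
    pooled_y += [y87, y88]
    d_grant.append(grant88)                       # change in grant status
    d_y.append(y88 - y87)                         # change in log scrap

pooled_ols = slope(pooled_x, pooled_y)    # ignores a_i
first_diff = slope(d_grant, d_y)          # a_i differenced away
print(f"pooled OLS: {pooled_ols:.3f}  first difference: {first_diff:.3f}  true: {beta1}")
```

In this design the pooled slope picks up the correlation between grant status and $a_i$, while the first-difference slope stays near the assumed effect.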


Accounting for possible firm effects and also including the year dummy to capture a possible time effect yields

=====================================================

Variable Coefficient Std. Error t-Statistic Prob.

-----------------------------------------------------

C 0.568716 0.048603 11.70126 0.0000

GRANT -0.317058 0.163875 -1.934753 0.0585

-----------------------------------------------------

Effects Specification

Cross-section fixed (dummy variables)

Period fixed (dummy variables)

R-squared 0.964308

Adjusted R-squared 0.926556

S.E. of regression 0.406642

F-statistic 25.54364

Prob(F-statistic) 0.000000


The estimated coefficient for grant is negative. It is close to statistically significant in two-sided testing (p-value 0.0585) and significant at the 5% level in one-sided testing (the program improves productivity) against the alternative

H1 : β1 < 0

with p-value 0.0265 under the normal approximation (0.0585/2 ≈ 0.029 using the t-distribution).

According to the estimate, participating in the program decreases the scrap rate by about 32% on average (more accurately 27%, since exp(−0.317058) − 1 ≈ −0.272).
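The two numbers quoted above can be reproduced directly from the GRANT row of the fixed effects output. The snippet is a small sketch; the one-sided p-value uses a standard normal approximation to the t-statistic, which is an assumption made here because it matches the quoted 0.0265.

```python
import math

coef, t_stat = -0.317058, -1.934753    # GRANT row of the fixed effects output

approx_pct = 100 * coef                    # log-point approximation
exact_pct = 100 * (math.exp(coef) - 1)     # exact percentage change in scrap

# One-sided p-value for H1: beta1 < 0, standard normal approximation: P(Z < t)
p_one_sided = 0.5 * math.erfc(-t_stat / math.sqrt(2))

print(f"approx {approx_pct:.1f}%  exact {exact_pct:.1f}%  one-sided p = {p_one_sided:.4f}")
```

The gap between the two percentages grows with the size of the coefficient, so the exact transformation is the safer one to report for effects of this magnitude.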


Dynamic Panel Models


Many economic relationships are dynamic.

These may be characterized by the presence of lagged dependent variables

$$y_{it} = \delta\, y_{i,t-1} + x_{it}'\beta + v_{it}, \qquad (29)$$

where

$$v_{it} = a_i + u_{it} \qquad (30)$$

with $a_i \sim \text{iid}(0, \sigma_a^2)$ and $u_{it} \sim \text{iid}(0, \sigma_u^2)$ independent of each other, $i = 1, \dots, n$, $t = 1, \dots, T$.


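Alternatively, the one-way error component model in (30) can be replaced by a two-way specification such that

$$v_{it} = a_i + b_t + u_{it}, \qquad (31)$$

where all the components are again assumed independent.

After differencing (29) with the one-way error (30), the effect $a_i$ drops out and we have

$$\Delta y_{it} = \delta\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta u_{it}. \qquad (32)$$

The regressor $\Delta y_{i,t-1}$ contains the lagged term $y_{i,t-1}$, which is correlated with the $u_{i,t-1}$ in $\Delta u_{it}$. This causes problems in estimation.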
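The correlation can be written out in one line. The sketch assumes, as in (30), that $u_{it}$ is iid with variance $\sigma_u^2$, and additionally that $x_{it}$ is uncorrelated with the $u$'s, so that $y_{i,s}$ depends only on errors dated $s$ or earlier.

```latex
\begin{aligned}
\operatorname{cov}(\Delta y_{i,t-1}, \Delta u_{it})
  &= \operatorname{cov}(y_{i,t-1} - y_{i,t-2},\; u_{it} - u_{i,t-1}) \\
  &= -\operatorname{cov}(y_{i,t-1}, u_{i,t-1}) = -\sigma_u^2 \neq 0 .
\end{aligned}
```

All other cross terms vanish because $u_{it}$ is later than both $y_{i,t-1}$ and $y_{i,t-2}$, and $u_{i,t-1}$ is later than $y_{i,t-2}$.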


Once regressor variables are correlated with the error term, OLS and GLS estimators become inconsistent.

A typical solution to the problem is to apply some kind of instrumental variable estimation.

These are least squares (LS) or other types of methods in which instrumental variables are used to remove the inconsistency due to the correlation between the error term and the regressors.

A variable is suitable as an instrumental variable if it is not correlated with the error term but is correlated with the regressors.

Thus, those regressors that are not correlated with the error term can also be used as instruments.
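To illustrate the point for a dynamic panel, the sketch below simulates a pure autoregressive panel (no x-variables, hypothetical $\delta = 0.5$) and estimates the differenced equation twice: by OLS, and by using the level $y_{i,t-2}$ as an instrument for $\Delta y_{i,t-1}$. The latter is the Anderson-Hsiao idea, which is named here for orientation and is not covered in these slides. All parameter values are assumptions chosen for the illustration.

```python
import random

random.seed(7)
delta = 0.5                   # hypothetical true coefficient on y_{i,t-1}
n_units, burn_in = 20000, 20  # many units, short burn-in toward stationarity

num_ols = den_ols = num_iv = den_iv = 0.0
for unit in range(n_units):
    a = random.gauss(0, 0.5)          # unit effect a_i
    y = a / (1 - delta)               # start at the unit's long-run mean
    path = []
    for step in range(burn_in + 3):
        y = delta * y + a + random.gauss(0, 1)
        path.append(y)
    y_tm2, y_tm1, y_t = path[-3:]     # last three observations of the unit
    dy = y_t - y_tm1                  # dependent variable: change in y at t
    dy_lag = y_tm1 - y_tm2            # regressor: change in y at t-1
    num_ols += dy_lag * dy
    den_ols += dy_lag ** 2
    num_iv += y_tm2 * dy              # instrument: level y_{i,t-2}
    den_iv += y_tm2 * dy_lag

fd_ols = num_ols / den_ols    # OLS on the differenced equation
fd_iv = num_iv / den_iv       # simple IV on the differenced equation
print(f"true delta: {delta}  differenced OLS: {fd_ols:.3f}  IV: {fd_iv:.3f}")
```

In this stylized stationary design the OLS estimate on differences converges to $(\delta - 1)/2$ rather than $\delta$, while the IV estimate stays close to the assumed value.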


Example 7

2SLS (two-stage least squares).

Consider a standard regression model

$$y_i = x_i'\beta + u_i, \qquad (33)$$

where $x_i$ is a k-vector of regressors (including the constant term) with $\operatorname{cov}[x_i, u_i] \neq 0$, $i = 1, \dots, n$.

Suppose we have $m \geq k$ additional variables in $z_i$ (an m-vector) such that $\operatorname{cov}[z_i, u_i] = 0$ but $\operatorname{cov}[z_i, x_i] \neq 0$.

The 2SLS solution to the problem is to first (first stage) use OLS to regress the x-variables on the z-variables.

In the second stage, replace the original regressors $x_i$ by the predicted variables $\hat{x}_i$ from the first stage, and estimate β from the regression

$$y_i = \hat{x}_i'\beta + u_i. \qquad (34)$$

The estimator

$$\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y \qquad (35)$$

is called the 2SLS estimator of β.
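In matrix form the two stages collapse into a single expression. In the sketch below, $Z$ denotes the $n \times m$ matrix of instruments and $P_Z$ the projection onto its columns; both symbols are introduced here for illustration.

```latex
\hat{X} = Z(Z'Z)^{-1}Z'X = P_Z X,
\qquad
\hat{\beta}_{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'y = (X'P_Z X)^{-1} X'P_Z\, y .
```

The second equality uses the fact that $P_Z$ is symmetric and idempotent, so $\hat{X}'\hat{X} = X'P_Z X$ and $\hat{X}'y = X'P_Z y$.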


In particular, if m = k then (35) becomes

$$\hat{\beta}_{IV} = (Z'X)^{-1}Z'y, \qquad (36)$$

which is called the instrumental variable (IV) estimator of β.
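The equivalence can be checked numerically in the simplest case of one endogenous regressor and one instrument (plus a constant). The sketch below uses simulated data with made-up parameter values. It runs the two stages explicitly and compares the slope with the ratio cov(z, y)/cov(z, x), which is the scalar analogue of (36).

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (divisor n); cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

random.seed(42)
n, beta1 = 5000, 2.0                                   # hypothetical values
z = [random.gauss(0, 1) for _ in range(n)]             # instrument
u = [random.gauss(0, 1) for _ in range(n)]             # structural error
x = [zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous
y = [1.0 + beta1 * xi + ui for xi, ui in zip(x, u)]

ols = cov(x, y) / cov(x, x)          # inconsistent because cov(x, u) != 0

# First stage: regress x on z (with intercept) and keep the fitted values
b = cov(z, x) / cov(z, z)
a = mean(x) - b * mean(z)
x_hat = [a + b * zi for zi in z]

# Second stage: regress y on the fitted values
two_sls = cov(x_hat, y) / cov(x_hat, x_hat)

iv = cov(z, y) / cov(z, x)           # just-identified IV slope
print(f"OLS {ols:.3f}  2SLS {two_sls:.3f}  IV {iv:.3f}  true {beta1}")
```

The 2SLS and IV slopes coincide up to floating-point error, while the OLS slope is pulled away from the assumed value by the correlation between x and u.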


Example 8

(Data: http://eu.wiley.com/college/baltagi/ > Student companion site > datasets)

Demand for cigarettes in 46 US states (annual data, 1963–1992). The estimated equation is

$$c_{it} = \alpha + \beta_1 c_{i,t-1} + \beta_2 p_{it} + \beta_3 y_{it} + \beta_4 pn_{it} + v_{it}, \qquad (37)$$

where

$$v_{it} = a_i + b_t + u_{it}, \qquad (38)$$

$a_i$ and $b_t$ are fixed effects, $u_{it} \sim \text{NID}(0, \sigma_u^2)$, and all the observable variables are in logarithms:

$c_{it}$ = real per capita sales of cigarettes by persons of smoking age (14 and older)

$p_{it}$ = real average retail price of a pack of cigarettes

$y_{it}$ = real per capita disposable income

$pn_{it}$ = the minimum real price of cigarettes in any neighboring state (proxy for casual smuggling effect across state borders)

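$c_{i,t-1}$ is very likely correlated with the error term: it depends on $u_{i,t-1}$, which enters both the within-transformed error $u_{it} - \bar{u}_i$ and the differenced error $\Delta u_{it}$.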


For reference purposes, estimating with panel OLS (within-group regression with time dummies) yields

Fixed-effects (within) regression Number of obs = 1334

Group variable: state Number of groups = 46

R-sq: within = 0.9283 Obs per group: min = 29

between = 0.9859 avg = 29.0

overall = 0.9657 max = 29

F(32,1256) = 508.07

corr(u_i, Xb) = 0.4743 Prob > F = 0.0000

-----------------------------------------------------

lc | Coef. Std. Err. t P>|t|

-------------+---------------------------------------

lc |

L1. | .8302514 .0126242 65.77 0.000

|

lp | -.2916822 .0230847 -12.64 0.000

ly | .1068698 .0233417 4.58 0.000

lpn | .0354559 .02656 1.33 0.182

_cons | .8204374 .2228775 3.68 0.000

-------------+---------------------------------------

sigma_u | .02738301

sigma_e | .03504776

rho | .37905103 (fraction of variance due to u_i)

-----------------------------------------------------

F test that all u_i=0: F(45, 1256) = 4.52

Prob > F = 0.0000


Several methods have been proposed for estimation when there is potential correlation between the error term and (some) regressors.

GMM (Generalized Method of Moments) estimation has lately gained much popularity, in particular when there are non-linear moment restrictions.

Stata has the xtdpd procedure, which produces the Arellano-Bond or the Arellano-Bover/Blundell-Bond estimator. These are GMM estimators in which the instruments are defined in a particular way (the idea will be discussed in the classroom).


xtdpd l(0/1).lc lp ly lpn y66-y92, div(lp ly lpn y66-y92) dgmmiv(lc)

Dynamic panel-data estimation Number of obs = 1334

Group variable: state Number of groups = 46

Time variable: year

Obs per group: min = 29

avg = 29

max = 29

Number of instruments = 437 Wald chi2(31) = 13273.45

Prob > chi2 = 0.0000

One-step results

-----------------------------------------------------

lc | Coef. Std. Err. z P>|z|

-------------+---------------------------------------

lc |

L1. | .8201729 .0161446 50.80 0.000

|

lp | -.3607549 .0311244 -11.59 0.000

ly | .1871102 .0334027 5.60 0.000

lpn | -.0215713 .0399233 -0.54 0.589

-----------------------------------------------------

Instruments for differenced equation

GMM-type: L(2/.).lc

Standard: D.lp D.ly D.lpn D.y66 D.y67 D.y68

D.y69 D.y70 D.y71 D.y72 D.y73 D.y74 D.y75

D.y76 D.y77 D.y78 D.y79 D.y80 D.y81 D.y82

D.y83 D.y84 D.y85 D.y86 D.y87 D.y88 D.y89

D.y90 D.y91 D.y92

Instruments for level equation

Standard: _cons


Test for the orthogonality conditions of the instruments

Sargan test of overidentifying restrictions

H0: overidentifying restrictions are valid

chi2(405) = 561.5047

Prob > chi2 = 0.0000

The orthogonality conditions are rejected.
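The degrees of freedom of the Sargan test equal the number of instruments minus the number of estimated parameters, and both counts can be reconstructed from the output. The sketch below assumes that the 27 year dummies and the constant are estimated although they are not shown in the abridged coefficient table. The p-value uses the Wilson-Hilferty normal approximation to the chi-square tail, which is a swapped-in shortcut and not necessarily what Stata computes.

```python
import math

T = 30                       # annual observations per state, 1963-1992
# Differenced equations exist for t = 3, ..., T. Equation t can use lc dated
# t-2 and earlier, which gives t-2 GMM-type instruments for that period.
gmm_type = sum(t - 2 for t in range(3, T + 1))
standard = 3 + 27            # D.lp, D.ly, D.lpn plus D.y66, ..., D.y92
level = 1                    # _cons in the level equation
n_instruments = gmm_type + standard + level

n_params = 1 + 3 + 27 + 1    # L1.lc, lp ly lpn, 27 year dummies, constant
sargan_df = n_instruments - n_params

# Wilson-Hilferty: (X/df)^(1/3) is approximately normal with
# mean 1 - 2/(9 df) and variance 2/(9 df)
stat = 561.5047              # Sargan statistic from the output
c = 2.0 / (9.0 * sargan_df)
z = ((stat / sargan_df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
p_approx = 0.5 * math.erfc(z / math.sqrt(2))

print(n_instruments, sargan_df, f"{p_approx:.1e}")
```

The counts match the reported 437 instruments and chi2(405), and the approximate p-value is far below any conventional significance level, in line with the reported 0.0000.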

The reason may be that the errors are MA(1), which implies that the GMM instruments ($lc_{t-2}, \dots$) are correlated with the error term.

One can try to fix this by starting the instruments from t − 3 with the command · · · dgmmiv(lc, lagrange(3 .)).
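A short derivation shows why MA(1) errors invalidate the lag-2 instrument but not the lag-3 one. As an illustration, suppose $u_{it} = e_{it} + \theta e_{i,t-1}$ with $e_{it}$ iid$(0, \sigma_e^2)$ (the symbols $e_{it}$ and $\theta$ are introduced here), and that the other regressors are uncorrelated with the $e$'s.

```latex
\begin{aligned}
\Delta u_{it} &= e_{it} + (\theta - 1)\, e_{i,t-1} - \theta\, e_{i,t-2}, \\
\operatorname{cov}(c_{i,t-2}, \Delta u_{it}) &= -\theta\, \sigma_e^2 \neq 0,
\qquad
\operatorname{cov}(c_{i,t-3}, \Delta u_{it}) = 0 .
\end{aligned}
```

The level $c_{i,t-2}$ contains $e_{i,t-2}$, which also appears in $\Delta u_{it}$, whereas $c_{i,t-3}$ depends only on shocks dated $t-3$ or earlier.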

Doing this improved the situation slightly but still led to rejection of the orthogonality conditions.

However, we do not continue the analysis further here.

