Lecture 6
Panel Data Econometrics
What are Panel Data?
• Panel data are a form of longitudinal data, where observations on cross-section units are
regularly repeated
o Cross-section units can be individuals, households, plants, firms, municipalities,
states or countries
o Repeat observations are usually across time periods (e.g. five-year intervals, annual,
quarterly, weekly, daily, etc.) or across units within clusters (e.g. siblings within a family,
firms within an industry, workers within a firm, etc.)
• An important characteristic of panel data is that we cannot assume that the observations
are independently distributed across time
o e.g. unobserved factors that affect a person’s wage in 1990 will also affect that
person’s wage in 1991
• Independently pooled cross-section data are obtained by sampling randomly from a large
population at different points in time.
o Such data consist of independently sampled observations, which rules out
correlation in the error terms across different observations
• Examples of Panel Data
o Firm or company data
o Longitudinal data on patterns of individual behaviour over the life-cycle.
o Comparative country-specific macroeconomic data over time.
• Examples of Panel Datasets
o Panel Study of Income Dynamics (PSID)
o National Longitudinal Surveys of Labor Market Experience (NLS)
o German Socio-Economic Panel (GSOEP)
o The British Household Panel Survey (BHPS)
o Swedish Agriculture Farm Level Survey (JEU)
o Finnish Company Database (Yritystietokanta)
o Luxembourg Income Study (LIS)
• Common features of Panel Data:
o The sample of individuals N is typically relatively large
o The number of time periods T is generally short
o Time series dimension of aggregate data tends to be longer (e.g. the Penn World
Tables, World Development Indicators)
• Why use panel data methods?
o Increased precision of regression estimates
o Repeated observations on individuals allow for possibility of isolating effects of
unobserved differences between individuals
o We can study dynamics
o The ability to make causal inference is enhanced by temporal ordering
o The ability to model temporal effects and control for variables that vary over time
o Some phenomena are inherently longitudinal (e.g. poverty persistence, unstable
employment)
• But there are limits to the benefits of panel data:
o Variation between people usually far exceeds variation over time for an individual
o A panel with T waves doesn’t give T times the information of a cross-section
o Variation over time may not exist for some important variables or may be inflated
by measurement error
o Panel data impose a fixed timing structure; continuous-time survival analysis may
be more informative
o We still need very strong assumptions to draw clear inferences from panels:
sequencing in time does not necessarily reflect causation
• Advantages of panel estimation methods
o Large number of data points (observations)
o Increased degrees of freedom
o Reduces the collinearity among the explanatory variables
o Improved efficiency of econometric estimates
o More variability, less aggregation over firm and individuals
o Better able to study dynamics of adjustment in unemployment, income mobility,
etc
o More reliable and stable parameter estimates
o Identify and measure effects not detectable in pure cross-section (CS) or time-
series (TS) data, and control for unobservable individual heterogeneity and
dynamics, which is not possible in TS (N=1) or CS (T=1) data. Example: a 50%
labour-force participation rate among married women could mean that each woman
has a 50% chance of being in the labour force in any given year, or alternatively
that 50% always work and 50% never do; only panel data can distinguish these
o Dynamic effects cannot be estimated using CS data.
• Disadvantages of panel estimation methods
o Complicated survey design, stratification
o Changing structure of population (use of rotating panel data)
o Incomplete coverage of the population of interest
o Data collection and management problem
o Distortions of measurement errors due to faulty response, unclear questions, …
o Selectivity problems (e.g. self-selection out of work because the reservation
wage exceeds the offered wage)
o Non-response (partial or complete) due to lack of cooperation
o Attrition: non-response tends to increase over time
o Short time-series dimension; increasing N is costly, while increasing T worsens attrition
o New estimation problems
o Imputations of unit non-response/missing
Pooled Data
• Pooling Independent Cross-Sections across Time
o Since a random sample is drawn at each time period, pooling the resulting random
samples gives us an independently pooled cross-section
o As such, we can use standard OLS methods
o Advantage of pooling is to increase the sample size, thereby obtaining more precise
estimates and test statistics with greater power
o Pooling is only useful in this regard if the relationship between the dependent variable
and at least some of the independent variables remains constant over time
o To reflect the fact that the population may have different distributions in different time
periods, the intercept is usually allowed to differ across time periods (can be
accomplished by including year dummies)
o The coefficients on the year dummies may be of interest (e.g. after controlling for other
factors has the pattern of fertility changed over time?)
o Year dummies can also be interacted with other explanatory variables to see if the
effect of that variable has changed over time
• Testing for Structural Change across Time
o Consider a pooled dataset of two time periods, t = 1 and t = 2
o Interact each variable with a year dummy for t = 2
o Test for the joint significance of the year dummy and all of the interaction terms
o Since the intercept in a regression model often changes over time, the Chow test can
detect such changes. It is usually more interesting to allow for an intercept difference and
then to test whether certain slope coefficients change over time
o This can be extended to more than two time periods
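The interaction-based test above can be sketched in a small simulation (numpy only; the data, sample sizes and the size of the slope shift are all invented for illustration). We pool two periods, interact the intercept and the regressor with the period-2 dummy, and compare restricted and unrestricted fits with an F statistic:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
d2 = np.repeat([0.0, 1.0], n)            # period-2 dummy for the pooled sample
x = rng.normal(0, 1, 2 * n)
y = 1.0 + 0.5 * x + 0.4 * d2 * x + rng.normal(0, 1, 2 * n)  # slope shifts in period 2

def ssr(X, yv):
    # sum of squared residuals from OLS of yv on X
    b, *_ = np.linalg.lstsq(X, yv, rcond=None)
    e = yv - X @ b
    return e @ e

ones = np.ones_like(x)
X_r = np.column_stack([ones, x])              # restricted: coefficients stable over time
X_u = np.column_stack([ones, x, d2, d2 * x])  # unrestricted: intercept and slope may shift
q, k = 2, 4                                   # number of restrictions, unrestricted params
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / (2 * n - k))
print(F > 3.0)  # True: F is far above conventional critical values, so we reject stability
```

With a genuine slope shift in period 2, the F statistic is large and the hypothesis of stable coefficients is rejected.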
Policy Analysis with Pooled Cross-Sections
• Difference-in-Difference Estimation
o Methodology
� Examine the effect of some sort of treatment by comparing the treatment group
after treatment both to the treatment group before treatment and to some other
control group.
� Standard case: outcomes are observed for two groups for two time periods. One of
the groups is exposed to a treatment in the second period but not in the first period.
The second group is not exposed to the treatment during either period. Structure
can apply to repeated cross sections or panel data.
� Usually related to a so-called natural (or quasi-) experiment, when some exogenous
event – often a change in government policy – changes the environment in which
individuals, families, firms or cities operate.
Example
o A state offers a tax break to firms providing employees with health insurance. To estimate
the impact of the bill on the percentage of firms offering health insurance we could use
data on a state that didn’t implement such a law as a control group. It is not correct just
to compare pre- and post-law changes in the percentage of firms offering health
insurance, i.e.
y = β0 + δ0·d2 + u   (1)
where d2 is a dummy for period two.
o Here the coefficient estimate δ̂0 gives an estimate of the difference in the percentage of
firms offering health insurance between periods one and two
o The coefficient doesn’t necessarily provide a (causal) estimate of the impact of the tax
break however, since there could be a trend towards more employers offering health
insurance over time
• With repeated cross sections, let A be the control group and B the treatment group. Write
y = β0 + β1·dB + δ0·d2 + δ1·d2·dB + u   (2)
where:
− y is the outcome of interest (e.g. percentage of firms offering health insurance in each State)
− dB captures possible differences between the treatment and control groups prior to the
policy change (e.g. State A versus State B)
− d2 captures aggregate factors that would cause changes in y over time even in the absence
of a policy change, i.e. for both States (e.g. time dummies)
− The coefficient of interest is δ1, which gives an estimate of the change in health insurance
take-up for firms in State B, and which is called the difference-in-difference estimator.
            State A             State B
Year 1      β0                  β0 + β1
Year 2      β0 + δ0             β0 + β1 + δ0 + δ1

Coefficient     Calculation
β0              ȳA,1
β1              ȳB,1 − ȳA,1
δ0              ȳA,2 − ȳA,1
δ1              (ȳB,2 − ȳB,1) − (ȳA,2 − ȳA,1)
• The difference-in-differences (DD) estimator can be written as:
δ̂1 = (ȳB,2 − ȳB,1) − (ȳA,2 − ȳA,1)   (3)
In other words, δ̂1 represents the difference in the changes over time.
• Assuming that both states have the same health insurance trends over time, we have now
controlled for a possible national time trend, and can now identify what the true impact of
the tax deductibility is on employers offering insurance.
• Inference based on moderate sample sizes in each of the four groups is straightforward, and
is easily made robust to different group/time period variances in regression framework.
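As a quick illustration, the DD estimate in equation (3) is just arithmetic on the four group/period means. The numbers below (shares of firms offering health insurance) are invented purely for illustration:

```python
# Hypothetical group/period means; made-up numbers for illustration only.
ybar = {
    ("A", 1): 0.60, ("A", 2): 0.64,   # control state: common trend of +0.04
    ("B", 1): 0.55, ("B", 2): 0.67,   # treated state: changes by +0.12
}

# delta1_hat = (ybar_B2 - ybar_B1) - (ybar_A2 - ybar_A1)
did = (ybar[("B", 2)] - ybar[("B", 1)]) - (ybar[("A", 2)] - ybar[("A", 1)])
print(round(did, 2))  # 0.08: the change in B net of the common trend
```

Under the common-trends assumption, 0.04 of B's 0.12 increase is attributed to the aggregate trend, leaving 0.08 as the treatment effect.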
• Can refine the definition of treatment and control groups.
o Example: change in state health care policy aimed at elderly. Could use data only on
people in the state with the policy change, both before and after the change, with the
control group being people 55 to 65 (say) and the treatment group being people over
65
o This DD analysis assumes that the paths of health outcomes for the younger and older
groups would not be systematically different in the absence of intervention
o Alternatively, use the over-65 population from another state as an additional control
• Let dE be a dummy equal to one for someone over 65:
y = β0 + β1·dB + β2·dE + β3·dB·dE + δ0·d2 + δ1·d2·dB + δ2·d2·dE + δ3·d2·dB·dE + u   (4)
• The OLS estimate δ̂3 is
δ̂3 = [(ȳB,E,2 − ȳB,E,1) − (ȳB,N,2 − ȳB,N,1)] − [(ȳA,E,2 − ȳA,E,1) − (ȳA,N,2 − ȳA,N,1)]   (5)
where the A subscript means the state not implementing the policy and the N subscript
represents the non-elderly. This is the difference-in-difference-in-differences (DDD) estimate.
• Can add covariates to either the DD or DDD analysis to control for compositional changes.
• Can use multiple time periods and groups.
• This methodology has a number of applications, particularly when the data arise from a
natural experiment (or quasi experiment)
o Occurs when some exogenous event – often a change in government policy – changes
the environment in which individuals, families, firms or cities operate
• A natural experiment always has a control group, which is not affected by the policy change,
and a treatment group thought to be affected by the policy change
• Unlike a true experiment, the control and treatment groups in natural experiments
arise from the particular policy change and are not randomly assigned
• Let C denote the control group and T the treatment group, let dT equal one for those in
the treatment group T and zero otherwise, and let d2 denote a dummy for the
second (post-policy change) time period. The equation of interest is:
y = β0 + δ0·d2 + β1·dT + δ1·d2·dT + other factors,
where δ1 measures the effect of the policy
• Without other factors in the regression, δ̂1 will be the difference-in-difference estimator:
δ̂1 = (ȳ2,T − ȳ2,C) − (ȳ1,T − ȳ1,C)
where the bar denotes the average
                        Before          After                   After − Before
Control                 β0              β0 + δ0                 δ0
Treatment               β0 + β1         β0 + δ0 + β1 + δ1       δ0 + δ1
Treatment − Control     β1              β1 + δ1                 δ1
• The parameter δ1 – sometimes called the average treatment effect – can be estimated in
two ways:
1. Compute the differences in averages between the treatment and control groups in
each time period, and then difference the results over time
2. Compute the change in averages over time for each of the treatment and control
groups, and then difference these changes, i.e. write
δ̂1 = (ȳ2,T − ȳ1,T) − (ȳ2,C − ȳ1,C)
• When explanatory variables are added to the regression, the OLS estimate of δ1 no longer
has a simple form, but its interpretation is similar
Panel Data
• A balanced panel has the same number of time observations (T) for each of the N individuals
• An unbalanced panel has different numbers of time observations (Ti) on each individual
• A compact panel covers only consecutive time periods for each individual – there are no
“gaps”
• Attrition is the process of drop-out of individuals from the panel, leading to an unbalanced
(and possibly non-compact) panel
• A short panel has a large number of individuals but few time observations on each, (e.g. the
British Household Panel Survey has 5,500 households and 14 waves)
• A long panel has a long run of time observations on each individual, permitting separate
time-series analysis for each (e.g. Penn World Tables has data from 1960)
• While panel data can be analyzed using standard OLS techniques, it is better to use some
techniques specifically designed to take advantage of panel data.
o Specifically, you know that in a panel data set there is a special relationship between
the multiple observations of a particular individual.
• Consider the following regression specification:
y_it = x_it′β + u_it
• A common assumption used in panel data is that we can write the error term as:
u_it = ε_it + α_i
• where α_i is called a fixed- (or random-) effect that doesn’t vary over time
• One assumption of OLS is that E(u_it|x_it) = 0. If the α_i are correlated with x_it, OLS
will therefore provide inconsistent estimates of the parameters
• Panel data methods allow us to estimate the parameters consistently using so-called fixed
effects (or related) methods
• We replace the assumption that E(u_it|x_it) = E(ε_it + α_i|x_it) = 0 with the weaker
assumption that E(ε_it|x_i) = 0
Individual Effects in Panel Data
• It may look as though there is a positive relationship between X and Y and you would be
tempted to draw some sort of positively sloped linear relationship
• However, if you know that this is panel data, you might consider which points in the graph
are observations on the same individuals and, perhaps, circle observations of particular
individuals. In this case, you might find the following:
• This reveals a very different relationship between X and Y. The knowledge of which
observations came from the same person, company, plant, unit, state, city or whatever can
dramatically impact the results of your research.
• In this case, we see a negative relationship between the variables, but we might imagine
that there is a separate intercept for each person. This is one type of panel data model.
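The situation described above can be reproduced in a small simulated sketch (all parameters are invented): each individual has a negative slope of y on x, but individuals with larger intercepts α_i also have larger x, so pooled OLS on the scatter suggests a positive relationship while within-individual demeaning recovers the negative one:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_per = 50, 8
alpha = rng.normal(0, 5, n_ind)                              # individual intercepts
x = 0.4 * alpha[:, None] + rng.normal(0, 1, (n_ind, n_per))  # x correlated with alpha_i
y = alpha[:, None] - 1.0 * x + rng.normal(0, 0.5, (n_ind, n_per))  # true slope is -1

# pooled OLS slope: ignores which points belong to the same individual
pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# within slope: demean x and y by individual before regressing
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
within = (xd @ yd) / (xd @ xd)

print(pooled > 0, within < 0)  # pooled slope is positive, within slope is negative
```

The pooled slope is driven entirely by the between-individual correlation of x with α_i, not by the causal slope.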
Panel Data Methods
• Distinguish between Fixed Effects and Random Effects Models
o In fixed effects models, α_i and x_it are allowed to be correlated
o In random effects models, α_i and x_it are assumed to be uncorrelated
The Fixed effects panel data model
• Consider the following model
y_it = α_i + x_it′β + ε_it   (1)
for i = 1, . . . , N individuals over t = 1, . . . , T periods
• Model includes
o An individual effect, α_i (constant over time).
o Marginal effects β for x_it (common across i and t)
• The model can be estimated using the (pooled) Ordinary Least Squares (OLS) estimator
o The simplest approach to the estimation.
o Individual effects α_i are fixed and common across economic agents, such that α_i = α
for all i = 1, . . . , N
o OLS produces consistent and efficient estimates of α and β.
• One assumption of OLS is that there is a zero correlation between the error terms of any
two observations.
• The problem with panel data is that we would expect there to be correlations between error
terms for a particular individual across different time periods.
• So, if unobserved variables for an individual tend to make its error term positive in one
period, they will tend to make its error term positive in other periods as well. For example, if
a county has a particularly high rate of unemployment in one year, it is likely to have a high
rate the next year, too.
• This correlation between error terms is a violation of one of the assumptions of OLS. This
violation means that OLS is not the best estimator.
Bias from Ignoring Fixed Effects
Accounting for Fixed Effects
First Differencing the Data
• First differencing is the easiest way of dealing with the fixed effects
• The lagged value of y_it is:
y_it−1 = α_i + x_it−1′β + ε_it−1
• Taking first differences gives us:
(y_it − y_it−1) = (α_i − α_i) + (x_it − x_it−1)′β + (ε_it − ε_it−1)
• Or:
Δy_it = Δx_it′β + Δε_it
• OLS on this transformed equation will yield consistent estimates of β since α_i has been
removed through first-differencing
• Δx_it and Δε_it are assumed uncorrelated because of the assumption that E(ε_it|x_it) = 0
• First differencing tends to introduce a negative correlation across observations since:
Cov(Δε_it, Δε_it−1) = Cov(ε_it − ε_it−1, ε_it−1 − ε_it−2) = −Var(ε_it−1)
• First-differenced regression is less efficient than other (fixed effects) methods (except when
errors follow a random walk)
• In addition to first-differences, one could use longer differences (e.g. five-year differences).
Such estimators often have useful properties (e.g. robust to measurement error)
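A minimal numpy sketch of the first-difference estimator on simulated data (parameters invented): differencing removes α_i, so OLS on the differenced data recovers β even though pooled OLS on the levels is biased, and the differenced residuals show the negative serial correlation derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 200, 5, 1.5
alpha = rng.normal(0, 2, N)
x = 0.8 * alpha[:, None] + rng.normal(0, 1, (N, T))   # x correlated with alpha_i
y = alpha[:, None] + beta * x + rng.normal(0, 1, (N, T))

dX = np.diff(x, axis=1)   # (N, T-1): alpha_i drops out of the differences
dY = np.diff(y, axis=1)
beta_fd = (dX.ravel() @ dY.ravel()) / (dX.ravel() @ dX.ravel())  # close to 1.5

# pooled OLS on levels for comparison: biased upward because alpha_i sits in the error
b_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# differenced errors are negatively serially correlated:
# Cov(de_t, de_{t-1}) = -Var(eps), i.e. a correlation of about -0.5
dE = dY - beta_fd * dX
r = np.corrcoef(dE[:, :-1].ravel(), dE[:, 1:].ravel())[0, 1]
```

Here `beta_fd` is consistent while `b_pooled` is not, and `r` is close to −1/2, the MA(1) correlation implied by differencing i.i.d. errors.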
The Within-Groups (WG) estimator
o Can be used if individual effects α_i are fixed but not common across i = 1, . . . , N
o Eliminates the fixed effect α_i by differencing with respect to the individual mean
• Let ȳ_i = T⁻¹ Σ_t y_it and x̄_i = T⁻¹ Σ_t x_it
• Define x*_it = x_it − x̄_i and y*_it = y_it − ȳ_i
• Then ȳ_i = α_i + x̄_i′β + ε̄_i
• Subtracting this from (1) gives:
y_it − ȳ_i = (α_i − α_i) + (x_it − x̄_i)′β + (ε_it − ε̄_i)
y_it − ȳ_i = (x_it − x̄_i)′β + (ε_it − ε̄_i)
• Or y*_it = x*_it′β + ε*_it
• Which can then be estimated by OLS
• The individual effects can be estimated as α̂_i = ȳ_i − β̂′x̄_i
• The estimator of the slope parameters, β̂, is consistent if either N or T becomes large
• The estimator of the individual effects, α̂_i, is consistent only if T becomes large
• The number of degrees of freedom needs to be adjusted.
o Usually the degrees of freedom would be NT − K, but with individual effects we have
NT − N − K (software packages usually make this correction when running their panel
commands)
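The within-groups steps above can be sketched in numpy on simulated data (all parameters invented): demean by individual, run OLS on the demeaned data, recover the α̂_i, and use the adjusted NT − N − K degrees of freedom for the error variance:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 100, 6, 0.7
alpha = rng.normal(0, 3, N)
x = 0.5 * alpha[:, None] + rng.normal(0, 1, (N, T))
y = alpha[:, None] + beta * x + rng.normal(0, 1, (N, T))

xs = x - x.mean(axis=1, keepdims=True)     # x*_it = x_it - xbar_i
ys = y - y.mean(axis=1, keepdims=True)     # y*_it = y_it - ybar_i
beta_w = (xs.ravel() @ ys.ravel()) / (xs.ravel() @ xs.ravel())

# recover the individual effects: alphahat_i = ybar_i - betahat * xbar_i
alpha_hat = y.mean(axis=1) - beta_w * x.mean(axis=1)

# degrees of freedom: NT - N - K, not NT - K
K = 1
dof = N * T - N - K
resid = ys - beta_w * xs
s2 = (resid ** 2).sum() / dof              # error-variance estimate, close to 1 here
```

`beta_w` is close to the true 0.7, and `alpha_hat` tracks the true α_i (up to ε̄_i noise, which only shrinks as T grows, matching the T-consistency claim above).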
• Drawback with the Within-Groups estimator
o Eliminates time-invariant characteristics z_i from a model of the form
y_it = α_i + x_it′β + z_i′γ + ε_it
o As such, we cannot distinguish between observed and unobserved heterogeneity
The Least Squares Dummy Variable (LSDV) Model
• Define a series of group-specific dummy variables d_jit = 1(j = i)
• This gives:
y_it = α_i + x_it′β + ε_it   (2)
y_it = α_1·d1_it + α_2·d2_it + ⋯ + α_N·dN_it + x_it′β + ε_it
• Estimate by standard OLS (excluding a constant)
• Here the constant terms vary by individual, but the slopes are the same for all individuals
• A test for individual effects:
H0: α_1 = α_2 = ⋯ = α_N
which can be tested using an F-test.
o Note: equation (2) can be written as y_it = α + μ_i + x_it′β + ε_it, where α is the average
individual effect and μ_i is the deviation from the average
o The model can thus be estimated by including a constant and N − 1 individual
dummies
• Problems
o Incidental parameters – the number of dummies grows as N increases, so the usual
proof of consistency does not hold for LSDV models
o Inverting an (N + K) × (N + K) matrix can be impossible, and even when possible
impractical and/or inaccurate
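A small numpy sketch (simulated data, parameters invented) confirms the equivalence behind LSDV: OLS with one dummy per individual and no overall constant gives exactly the same slope as the within-groups estimator:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta = 30, 5, 2.0
alpha = rng.normal(0, 2, N)
x = 0.5 * alpha[:, None] + rng.normal(0, 1, (N, T))
y = alpha[:, None] + beta * x + rng.normal(0, 1, (N, T))

# LSDV design matrix: [x, D1, ..., DN] with no overall constant
D = np.kron(np.eye(N), np.ones((T, 1)))        # NT x N block of individual dummies
X = np.column_stack([x.ravel(), D])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
beta_lsdv, alpha_lsdv = coef[0], coef[1:]

# within estimator for comparison
xs = (x - x.mean(axis=1, keepdims=True)).ravel()
ys = (y - y.mean(axis=1, keepdims=True)).ravel()
beta_w = (xs @ ys) / (xs @ xs)
print(np.isclose(beta_lsdv, beta_w))  # True: the two estimators coincide
```

The dummy coefficients also equal ȳ_i − β̂x̄_i, the same α̂_i as the within approach; the practical difference is only the (N + K)-sized system that LSDV must solve.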
Random Effects Models
• In the random effects model the 40 are treated as random variables, rather than fixed
constants
o The α_i are usually assumed to be independent of the errors ε_it and also mutually
independent, i.e.
o α_i ~ IID(0, σα²)
o ε_it ~ IID(0, σε²)
o α_i and ε_it are independently distributed
• Since the α_i are now random, the errors take the following form: u_it = α_i + ε_it
• The presence of α_i produces a correlation among the errors of the same cross-section unit
(i.e. Cov(u_it, u_is) ≠ 0), though the errors from different cross-section units are
independent (i.e. Cov(u_it, u_jt) = 0)
o OLS is thus inefficient in the random effects model, and yields incorrect standard errors
• Since the errors are correlated, we use Generalised Least Squares (GLS) to estimate the
model
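The GLS estimator can be sketched via the textbook quasi-demeaning transformation: regress y_it − θȳ_i on x_it − θx̄_i, with θ = 1 − √(σε² / (σε² + Tσα²)). The simulation below (parameters invented) treats the variance components as known; feasible GLS would estimate them first:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, beta = 300, 4, 1.0
sig_a, sig_e = 1.5, 1.0
alpha = rng.normal(0, sig_a, N)          # random effect, independent of x here
x = rng.normal(0, 1, (N, T))
y = alpha[:, None] + beta * x + rng.normal(0, sig_e, (N, T))

# quasi-demeaning parameter; theta = 0 gives pooled OLS, theta = 1 the within estimator
theta = 1 - np.sqrt(sig_e**2 / (sig_e**2 + T * sig_a**2))
xq = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
yq = (y - theta * y.mean(axis=1, keepdims=True)).ravel()

c = np.full_like(xq, 1 - theta)          # quasi-demeaned constant
X = np.column_stack([c, xq])
coef, *_ = np.linalg.lstsq(X, yq, rcond=None)
beta_re = coef[1]                        # random-effects GLS slope, close to 1.0
```

Because α_i is drawn independently of x here, the random-effects assumption holds and `beta_re` is consistent and more efficient than the within estimator.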
One possibility is to estimate a regression model with panel data using OLS. This imposes the assumption that the fixed effects are the same for each individual.
. regress lwage educ exper expersq black married hisp, vce(robust)

Linear regression                                 Number of obs =    4360
                                                  F(6, 4353)    =  141.60
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1659
                                                  Root MSE      =  .48676

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0989097   .0045823    21.59   0.000     .0899261    .1078932
       exper |   .0940639   .0101683     9.25   0.000     .0741289    .1139989
     expersq |  -.0032342    .000678    -4.77   0.000    -.0045634   -.0019051
       black |  -.1142367   .0248047    -4.61   0.000    -.1628665   -.0656069
     married |   .1165982   .0154429     7.55   0.000     .0863222    .1468742
        hisp |   .0266568   .0198779     1.34   0.180     -.012314    .0656276
       _cons |  -.0065678   .0649791    -0.10   0.919      -.13396    .1208244
------------------------------------------------------------------------------
In this case it is often useful to cluster standard errors on the individual. Stata's cluster option specifies that the standard errors allow for intragroup correlation, relaxing the usual requirement that the observations be independent. That is, the observations are independent across groups (clusters) but not necessarily within groups.

. regress lwage educ exper expersq black married hisp, vce(cluster nr)

Linear regression                                 Number of obs =    4360
                                                  F(6, 544)     =   58.28
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1659
                                                  Root MSE      =  .48676

                              (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0989097   .0092261    10.72   0.000     .0807865    .1170328
       exper |   .0940639    .012361     7.61   0.000     .0697827    .1183451
     expersq |  -.0032342   .0008643    -3.74   0.000    -.0049321   -.0015364
       black |  -.1142367   .0520134    -2.20   0.028    -.2164083    -.012065
     married |   .1165982   .0266349     4.38   0.000     .0642784     .168918
        hisp |   .0266568   .0403928     0.66   0.510    -.0526881    .1060018
       _cons |  -.0065678    .119913    -0.05   0.956    -.2421171    .2289815
------------------------------------------------------------------------------
To allow for differences in the distribution in different time periods it is often desirable to allow for differences in the intercept over time. This can be achieved by including a set of time dummies.

. regress lwage educ exper expersq black married hisp d8*, vce(cluster nr)

Linear regression                                 Number of obs =    4360
                                                  F(13, 544)    =   43.29
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1682
                                                  Root MSE      =  .48649

                              (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0920318   .0111277     8.27   0.000     .0701732    .1138903
       exper |   .0789065   .0201169     3.92   0.000     .0393902    .1184228
     expersq |  -.0030804   .0010397    -2.96   0.003    -.0051227   -.0010381
       black |  -.1101816   .0524171    -2.10   0.036    -.2131463   -.0072168
     married |   .1173324   .0266045     4.41   0.000     .0650723    .1695925
        hisp |   .0272366   .0403107     0.68   0.500    -.0519471    .1064203
         d81 |   .0500808   .0285791     1.75   0.080    -.0060582    .1062197
         d82 |   .0496073   .0378311     1.31   0.190    -.0247058    .1239203
         d83 |   .0417228   .0473305     0.88   0.378    -.0512501    .1346957
         d84 |   .0677402   .0594756     1.14   0.255    -.0490897    .1845702
         d85 |    .079509   .0686687     1.16   0.247    -.0553792    .2143973
         d86 |   .1092778   .0782723     1.40   0.163    -.0444753    .2630308
         d87 |   .1512288   .0872289     1.73   0.084     -.020118    .3225756
       _cons |   .0958252   .1620766     0.59   0.555    -.2225475    .4141978
------------------------------------------------------------------------------
To estimate the model using panel techniques (i.e. fixed and random effects) it is necessary to tell Stata which is the group and which is the time identifier.

. tsset nr year
       panel variable:  nr (strongly balanced)
        time variable:  year, 1980 to 1987
                delta:  1 unit

Running a fixed effects model is then straightforward using the xtreg command.

. xtreg lwage educ exper expersq married black hisp, fe vce(robust)
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8
                                                F(3,544)           =    135.44
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

                              (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0106843    10.94   0.000     .0959496    .1379245
     expersq |  -.0043329   .0006857    -6.32   0.000    -.0056799   -.0029859
     married |   .0473384   .0211735     2.24   0.026     .0057465    .0889303
       black |  (omitted)
        hisp |  (omitted)
       _cons |   1.085044   .0367222    29.55   0.000      1.01291    1.157179
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Similarly, a random effects model can be estimated as:

. xtreg lwage educ exper expersq married black hisp, re vce(robust)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8
                                                Wald chi2(6)       =    517.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                              (Std. Err. adjusted for 545 clusters in nr)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0088919    11.37   0.000     .0836756    .1185311
       exper |   .1128358   .0104931    10.75   0.000     .0922696    .1334019
     expersq |  -.0041483   .0006719    -6.17   0.000    -.0054652   -.0028314
     married |    .065336   .0192047     3.40   0.001     .0276956    .1029765
       black |  -.1269633   .0514673    -2.47   0.014    -.2278373   -.0260892
        hisp |    .026507   .0407199     0.65   0.515    -.0533025    .1063165
       _cons |  -.0845855   .1154202    -0.73   0.464     -.310805    .1416339
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
------------------------------------------------------------------------------
It is also possible to estimate the fixed effects model using the least squares dummy variable approach. Initially we have to define a set of dummy variables – one for each individual.

. tab(nr), gen(dumi)

Then estimate the regression model including the dummy variables using OLS.

. regress lwage educ exper expersq married black hisp dumi*, noc vce(robust)
note: dumi375 omitted because of collinearity
note: dumi395 omitted because of collinearity
note: dumi462 omitted because of collinearity

Linear regression                                 Number of obs =    4360
                                                  F(548, 3812)  = 1048.92
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.9639
                                                  Root MSE      =  .35204

------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0826707   .0040599    20.36   0.000     .0747108    .0906305
       exper |   .1169371   .0091402    12.79   0.000     .0990168    .1348573
     expersq |  -.0043329   .0005988    -7.24   0.000    -.0055069   -.0031589
     married |   .0473384   .0181832     2.60   0.009     .0116886    .0829882
       black |   .1943954   .0733538     2.65   0.008     .0505789    .3382118
        hisp |   .4369434   .0835889     5.23   0.000     .2730601    .6008268
       dumi1 |  -.3174653   .3215929    -0.99   0.324     -.947976    .3130454
       dumi2 |  -.0474871   .0700694    -0.68   0.498    -.1848643       .0898
         ... |
     dumi542 |   .5402538    .078385     6.89   0.000     .3865731    .6939344
     dumi543 |  -.3221958   .1511144    -2.13   0.033    -.6184686    -.025923
     dumi544 |   .7369184   .0553026    13.33   0.000     .6284928    .8453439
     dumi545 |  -.0496064   .0986855    -0.50   0.615    -.2430879    .1438751
------------------------------------------------------------------------------

Warning: we now have estimates for educ, black, etc., but things are not as they appear!
Fixed effects can also be eliminated by differencing.

. regress d.lwage d.educ d.exper d.expersq d.black d.married d.hisp, vce(robust)
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity
note: _delete omitted because of collinearity

Linear regression                                 Number of obs =    3815
                                                  F(2, 3812)    =    5.08
                                                  Prob > F      =  0.0063
                                                  R-squared     =  0.0030
                                                  Root MSE      =  .44326

------------------------------------------------------------------------------
             |               Robust
     D.lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |
         D1. |  (omitted)
       exper |
         D1. |  (omitted)
     expersq |
         D1. |  -.0038615   .0013887    -2.78   0.005    -.0065841   -.0011388
       black |
         D1. |  (omitted)
     married |
         D1. |   .0383504   .0233253     1.64   0.100    -.0073809    .0840818
        hisp |
         D1. |  (omitted)
       _cons |   .1155318   .0211452     5.46   0.000     .0740747    .1569889
------------------------------------------------------------------------------
The Two-Way Fixed Effects Model
• In the one-way model, we assume that there exists an unobserved individual heterogeneity,
but that the model is homogenous over time
• Two-way panel models allow for unobserved heterogeneity across both time and individuals
• The two-way panel model can be written as
y_it = α_i + λ_t + x_it′β + ε_it, or
y_it = α + μ_i + τ_t + x_it′β + ε_it
where Σ_i μ_i = 0 and Σ_t τ_t = 0
• We can then define:
o The individual/time effect: α_it = α + μ_i + τ_t
o The average effect: α = ᾱ_·· = (1/NT)·Σ_i Σ_t α_it
o The individual effect: α + μ_i = ᾱ_i· = (1/T)·Σ_t α_it
o The time effect: α + τ_t = ᾱ_·t = (1/N)·Σ_i α_it
• Using these we can write α_it − ᾱ_i· − ᾱ_·t + ᾱ_·· = 0
• The two-way fixed effects panel model can be estimated using the LSDV approach by
including time dummies f_sit = 1(s = t) in addition to individual dummies, thus estimating:
y_it = α_1·d1_it + ⋯ + α_N·dN_it + λ_1·f1_it + ⋯ + λ_T·fT_it + x_it′β + ε_it
• One problem with estimating the two-way panel model using dummy variables is that there
is an incidental parameters problem as either N or T goes to infinity
o A new within transformation can remove these effects:
ỹ_it = y_it − ȳ_i· − ȳ_·t + ȳ_··
o The two-way within model can then be written as
ỹ_it = x̃_it′β + ε̃_it
• The average, individual and time effects can now be estimated as
o α̂ = ȳ_·· − β̂′x̄_··
o α̂_i· = ȳ_i· − β̂′x̄_i·
o α̂_·t = ȳ_·t − β̂′x̄_·t
• Consistency:
o α̂ and β̂ are consistent as either N or T tends to infinity
o α̂_i· is only T-consistent
o α̂_·t is only N-consistent
• The two-way within transformation removes both observed and unobserved heterogeneity
for both individual and time effects
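The two-way within transformation can be sketched in a few lines of numpy on simulated data (parameters invented): subtracting individual means, time means, and adding back the grand mean removes both additive effects exactly, so OLS on the transformed data recovers β:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, beta = 80, 10, 0.5
mu = rng.normal(0, 2, N)       # individual effects
tau = rng.normal(0, 2, T)      # time effects
x = 0.5 * mu[:, None] + 0.5 * tau[None, :] + rng.normal(0, 1, (N, T))
y = mu[:, None] + tau[None, :] + beta * x + rng.normal(0, 1, (N, T))

def demean2(a):
    # ytilde_it = y_it - ybar_i. - ybar_.t + ybar_..
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

xt, yt = demean2(x).ravel(), demean2(y).ravel()
beta_2w = (xt @ yt) / (xt @ xt)   # close to the true 0.5
```

Both `mu` and `tau` enter x here, so pooled OLS would be biased in both dimensions; the two-way demeaning removes them simultaneously.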
• The two-way model can also be estimated using a random effects model by GLS
• In one-way models the individual effects are either fixed or random. In a two-way model the
individual and time effects can each be fixed or random
o i.e. we may have mixed random effects / fixed effects models where the time effect is
assumed fixed and the individual effect random for example
o if � is small for example, one may estimate a one-way random effects model on a set
of exogenous variables and time dummies
• In certain cases a dataset might have more than two dimensions: e.g. firm-industry-year;
country-region-year; individual-household-year; employee-firm-year; farm-region-year.
• This class of data can be analyzed using nested error component models:
y_ijt = α + β′x_ijt + u_ijt,   u_ijt = μ_i + ν_j + τ_t + η_ij + ε_ijt
• which can be estimated using LSDV methods.
Other Estimators
Between Effects Regression
• Between Groups estimation involves OLS on the cross-section equation:
��0 = 2̅0� + (4�0 + �0̅)
• i.e. we average out all of the within-individual variation, leaving only the between-individual
variation
• The model can be estimated using OLS by either
o Using one group-mean observation per individual
o Or using �0 copies of the individual group mean data for individual 9
• The latter is equivalent to a weighted regression of ��0 on 2̅0, with the weights given by �0 for
individual 9. It is often desirable to give more weight to individuals with many time series.
• Consistency requires that �(20140) = 0
• Between groups estimation is not efficient
• Usually only used to obtain an estimate of σ²_α when using feasible GLS
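As an illustrative sketch (not part of the lecture's Stata examples), the between estimator can be computed by averaging out the within variation and running OLS on the group means; the simulated design and all names below are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5
alpha = rng.normal(size=(N, 1))                       # individual effects
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + alpha + rng.normal(size=(N, T))   # true slope = 2

# One group-mean observation per individual, then plain OLS.
xbar, ybar = x.mean(axis=1), y.mean(axis=1)
X = np.column_stack([np.ones(N), xbar])
beta_be = np.linalg.lstsq(X, ybar, rcond=None)[0]     # [intercept, slope]
```

Because alpha is drawn independently of x here, E(x̄_i α_i) = 0 holds and the slope estimate should be near 2.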
Seemingly Unrelated Regression
• Seemingly unrelated regressions (SUR) involves estimating a different model for each
individual within the dataset, with all of the equations estimated jointly as a system.
• For example, if you are trying to estimate high school graduation rates and you have data on
all fifty states in the U.S. from a particular time period (1985 through 2005, for example),
you estimate a separate model for each state.
o This generates some coefficient estimates and some error terms.
o However, error terms are likely to be correlated between states for any particular year.
- For example, there might have been some national event in 1990 that caused
graduation rates across the country to be unusually high, so we would expect to
see that error terms are correlated across observations in this year.
o This correlation is a violation of the assumptions of OLS, and the SUR takes advantage
of this to improve upon OLS.
• The process uses the information about the correlation between the error terms to improve
upon the OLS estimates and come up with improved coefficient estimates.
• The statistical technique that is used to compute these improved estimates is called
generalized least squares.
• One potential problem with the SUR model is that you might have more explanatory
variables than you have observations for any individual in the data, which would essentially
make this approach impossible.
Mixed Models or Random Coefficient Models
• In linear random-intercept models, the overall level of the response, conditional on 2, could
vary across clusters
• In random coefficients models, we also allow the marginal effect of the covariates to vary
across clusters
• Consider the model:
y_it = (α + α_i) + (β + β_i)x_it + ε_it
• This allows for the intercept and slope coefficients to vary across individuals (we could also
allow for the coefficients to change across time also)
• Such models are in many cases not estimable due to a degrees of freedom problem
• One solution is to assume that each regression coefficient is a random variable with a
probability distribution
• This reduces the number of parameters to be estimated significantly
• In particular we may assume that:
α_i ~ N(0, ψ_α)
β_i ~ N(0, ψ_β)
cov(α_i, β_i) = ψ_αβ
• In the above equation we can consider β to be the common mean coefficient vector and
the β_i's as the individual deviations from the mean.
• Rewriting the above equation we have:
y_it = (α + βx_it) + (α_i + β_i x_it) + ε_it
w_it = (α_i + β_i x_it) + ε_it
var(w_it) = ψ_α + 2ψ_αβ x_it + ψ_β x_it² + σ²_ε
• Since the variance of w_it depends on x_it there is heteroscedasticity
• The model can be estimated by Generalised Least Squares
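The heteroscedasticity implied by the composite error can be made concrete with a small numerical sketch; the parameter values ψ_α, ψ_β, ψ_αβ and σ² below are made up purely for illustration:

```python
import numpy as np

# Assumed (made-up) variance components for the composite error
# w_it = alpha_i + beta_i * x_it + eps_it:
psi_a, psi_b, psi_ab, sigma2 = 1.0, 0.5, 0.2, 1.0

def var_w(x):
    """var(w_it) = psi_a + 2*psi_ab*x + psi_b*x^2 + sigma2."""
    return psi_a + 2 * psi_ab * x + psi_b * x ** 2 + sigma2

variances = var_w(np.array([0.0, 1.0, 2.0]))  # grows with x: heteroscedastic
```

The composite-error variance is a quadratic in x_it, which is exactly why GLS (rather than OLS) is the natural estimator here.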
Example of Snijders and Bosker
The dataset allows one to consider verbal IQ as a predictor of language scores
Data are collected on individuals within schools
To estimate a Random Coefficients model we can use the xtmixed command in Stata
To let Stata know that we want the covariance between the intercept and slope to be estimated we specify covariance(unstructured)
. xtmixed langpost iqvc || schoolnr: iqvc, mle covariance(unstructured)

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:   log likelihood = -7615.9951
Iteration 1:   log likelihood = -7615.3893
Iteration 2:   log likelihood = -7615.3887
Computing standard errors:

Mixed-effects ML regression                     Number of obs      =      2287
Group variable: schoolnr                        Number of groups   =       131
                                                Obs per group: min =         4
                                                               avg =      17.5
                                                               max =        35
                                                Wald chi2(1)       =    962.03
Log likelihood = -7615.3887                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        iqvc |    2.52637   .0814522    31.02   0.000     2.366726    2.686013
       _cons |   40.70956   .3042423   133.81   0.000     40.11325    41.30586
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolnr: Unstructured       |
                    sd(iqvc) |   .4583713   .1100965      .286264     .7339526
                   sd(_cons) |   3.058354   .2491357     2.607043     3.587791
            corr(iqvc,_cons) |  -.8168636   .1743621    -.9744848    -.1196644
-----------------------------+------------------------------------------------
                sd(Residual) |    6.44051   .1004244     6.246659     6.640377
------------------------------------------------------------------------------
LR test vs. linear regression: chi2(3) = 246.91           Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
• The expected language score for a child with average IQ averages 40.7 across schools, with a standard deviation of 3.1.
• The expected gain in language score per point of IQ averages 2.5, with a standard deviation of 0.46.
• The intercept and slope have a negative correlation of -0.82 across schools, so schools with higher language scores for a child with average verbal IQ tend
to show smaller average gains.
Although random effects are not directly estimated, you can form best linear unbiased predictions (BLUPs) of them (and standard errors) – using predict after
xtmixed.
The next step is to predict fitted values as well as the random effects (we can also verify that we can reproduce the fitted values)
. predict yhat2, fitted              // yhat for model 2
. predict rb2 ra2, reffects          // residual slope and intercept for model 2
. gen check = (_b[_cons]+ra2) + (_b[iqvc]+rb2)*iqvc
. list yhat2 check in 1/10

     +---------------------+
     |    yhat2      check |
     |---------------------|
  1. | 20.78043   20.78043 |
  2. | 24.53934   24.53934 |
  3. | 27.04527   27.04527 |
  4. | 30.80418   30.80418 |
  5. | 33.31012   33.31012 |
     |---------------------|
  6. | 34.56309   34.56309 |
  7. | 34.56309   34.56309 |
  8. | 34.56309   34.56309 |
  9. | 34.56309   34.56309 |
 10. | 35.81606   35.81606 |
     +---------------------+
• The graph of fitted lines shows clearly how school differences are more pronounced at lower than at higher verbal IQs.
• Fixed or Random Effects
o Random Effects is more efficient
o Can estimate all parameters (e.g. time invariant variables) with random effects
o Random effects is inconsistent if fixed effects are present
o Fixed effects allow for arbitrary correlation between the individual effects and the
regressors
o The group effect can be thought of as random if we can think of the sample as being
drawn from a larger population.
o The fixed effects model is appropriate when differences between individuals may be viewed as
parametric shifts in the regression function (considered reasonable when the sample is
broadly exhaustive of the population)
o Random effects more applicable when we want to draw inferences for the whole
population
o Random effects preferred when there is no correlation between the individual effects and
the regressors (can test this using the Hausman test)
o LSDV model often results in a large loss in degrees of freedom
o The fixed effects model eliminates a large portion of the total variation if the between sum of
squares is large relative to the within sum of squares
o The α_i are the sum of many factors specific to the cross-section units and thus represent
"specific ignorance"; they can be treated as random variables in the same manner as the ε_it,
which represent "general ignorance"
Comparison of Estimators
                  (1)           (2)            (3)           (4)           (5)
                  OLS           OLS            FE            RE            BE
                            (clustered s.e.)
educ           0.0989***     0.0989***                    0.101***      0.0941***
              (0.00458)     (0.00923)                    (0.00889)     (0.0112)
exper          0.0941***     0.0941***     0.117***       0.113***     -0.00271
              (0.0102)      (0.0124)      (0.0107)       (0.0105)      (0.0511)
expersq       -0.00323***   -0.00323***   -0.00433***    -0.00415***    0.00212
              (0.000678)    (0.000864)    (0.000686)     (0.000672)    (0.00327)
married        0.117***      0.117***      0.0473**       0.0653***     0.161***
              (0.0154)      (0.0266)      (0.0212)       (0.0192)      (0.0423)
black         -0.114***     -0.114**                     -0.127**      -0.0963*
              (0.0248)      (0.0520)                     (0.0515)      (0.0498)
hisp           0.0267        0.0267                       0.0265        0.0233
              (0.0199)      (0.0404)                     (0.0407)      (0.0439)
Observations   4360          4360          4360           4360          4360
(standard errors in parentheses; educ, black and hisp drop out of the FE column)
• Chow Test
o Provides a test of the pooled (restricted model) versus the fixed effects (unrestricted)
model
o This is simply a joint test of whether the fixed effects are significant
F = [(RRSS − URSS)/(N − 1)] / [URSS/(NT − N − K)]
where RRSS and URSS are the residual sums of squares from the restricted and unrestricted
models respectively. This is distributed F_{N−1, NT−N−K} under the null of no fixed effects.
o If there are a number of observed individual specific variables in the model, these are
included in the pooled model, but not the fixed effects model (i.e. we want to test for
unobserved heterogeneity)
. regress lwage educ exper expersq black married hisp i.nr
note: 11892.nr omitted because of collinearity
note: 12220.nr omitted because of collinearity
note: 12548.nr omitted because of collinearity

      Source |       SS       df       MS              Number of obs =    4360
-------------+------------------------------          F(547, 3812)   =   11.27
       Model |   764.09314   547   1.3968796          Prob > F       =  0.0000
    Residual |  472.436482  3812  .123934019          R-squared      =  0.6179
-------------+------------------------------          Adj R-squared  =  0.5631
       Total |  1236.52962  4359  .283672774          Root MSE       =  .35204

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0290989   .0352816     0.82   0.410    -.0400737    .0982715
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |   .8273663   .1536651     5.38   0.000     .5260925     1.12864
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |   .6551428   .1539299     4.26   0.000     .3533498    .9569357
             |
          nr |
         17  |   .2164064   .1614937     1.34   0.180    -.1002159    .5330287
         18  |   .5947677   .1539649     3.86   0.000     .2929061    .8966293
         45  |    .496684   .1534798     3.24   0.001     .1957735    .7975946
        …
      12500  |  -.1118741   .1536758    -0.73   0.467    -.4131688    .1894206
      12534  |   .8936683   .1537159     5.81   0.000      .592295    1.195042
      12548  |  (omitted)
             |
       _cons |   .4325396   .4165826     1.04   0.299    -.3842066    1.249286
. testparm i.nr

 (  1)  17.nr = 0
 (  2)  18.nr = 0
 (  3)  45.nr = 0
 (  4)  110.nr = 0
 (  5)  120.nr = 0
 …
 (536)  12420.nr = 0
 (537)  12433.nr = 0
 (538)  12451.nr = 0
 (539)  12477.nr = 0
 (540)  12500.nr = 0
 (541)  12534.nr = 0

       F(541, 3812) =     8.34
            Prob > F =   0.0000
• So, we reject the null that all fixed effects are zero (and thus prefer the fixed effects over the pooled model)
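The Chow arithmetic is easy to reproduce by hand. In the hedged sketch below, N, T and K echo the example above and URSS is the (rounded) fixed-effects residual sum of squares, but RRSS is an invented pooled value used only to illustrate the formula:

```python
# Illustrative values: URSS ~ 472 and N = 545, T = 8, K = 6 mirror the
# wage example; RRSS = 900 is made up for the sake of the arithmetic.
RRSS, URSS = 900.0, 472.0
N, T, K = 545, 8, 6

F = ((RRSS - URSS) / (N - 1)) / (URSS / (N * T - N - K))
# Reject the pooled model if F exceeds the F(N-1, NT-N-K) critical value.
```

With these numbers F is about 6.3, far above any conventional critical value, so the (hypothetical) pooled model would be rejected.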
• Hausman Test
o Usually applied to test for fixed versus random effects models
o Compares directly the random effects estimator, β̂_RE, to the fixed effects estimator, β̂_FE
o In the presence of a correlation between the individual effects and the regressors the
GLS estimates are inconsistent, while the OLS fixed effects results are consistent
o If there is no correlation between the fixed effects and the regressors both estimators
are consistent, but the OLS fixed effects estimator is inefficient
o Construct q̂ = β̂_FE − β̂_RE and Var(q̂) = Var(β̂_FE) − Var(β̂_RE)
o Test statistic: H = q̂′[Var(q̂)]⁻¹q̂, distributed as a χ² statistic with k degrees of freedom
(where k is the dimension of β)
o The null hypothesis is that the preferred model is a random effects model and the
alternative that the fixed effects model is preferred
. xtreg lwage educ exper expersq black married hisp, fe
note: educ omitted because of collinearity
note: black omitted because of collinearity
note: hisp omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1741                         Obs per group: min =         8
       between = 0.0014                                        avg =       8.0
       overall = 0.0534                                        max =         8
                                                F(3,3812)          =    267.93
corr(u_i, Xb)  = -0.1289                        Prob > F           =    0.0000

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  (omitted)
       exper |   .1169371   .0084385    13.86   0.000     .1003926    .1334815
     expersq |  -.0043329   .0006066    -7.14   0.000    -.0055222   -.0031436
       black |  (omitted)
     married |   .0473384   .0183445     2.58   0.010     .0113725    .0833043
        hisp |  (omitted)
       _cons |   1.085044    .026295    41.26   0.000     1.033491    1.136598
-------------+----------------------------------------------------------------
     sigma_u |  .40387667
     sigma_e |  .35204264
         rho |  .56824994   (fraction of variance due to u_i)

F test that all u_i=0:  F(544, 3812) = 8.29            Prob > F = 0.0000

. estimates store coeff_consistent
. xtreg lwage educ exper expersq black married hisp, re

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8
                                                Wald chi2(6)       =    901.13
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0091567    11.04   0.000     .0831565    .1190501
       exper |   .1128358   .0082738    13.64   0.000     .0966195    .1290521
     expersq |  -.0041483   .0005928    -7.00   0.000    -.0053101   -.0029864
       black |  -.1269633   .0488629    -2.60   0.009    -.2227328   -.0311938
     married |    .065336   .0168465     3.88   0.000     .0323175    .0983546
        hisp |    .026507   .0437909     0.61   0.545    -.0593215    .1123355
       _cons |  -.0845855   .1135289    -0.75   0.456    -.3070982    .1379271
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)

. estimates store coeff_efficient
. hausman coeff_consistent coeff_efficient

                 ---- Coefficients ----
             |      (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
             | coeff_cons~t coeff_effi~t   Difference          S.E.
-------------+--------------------------------------------------------------
       exper |    .1169371     .1128358      .0041013        .0016594
     expersq |   -.0043329    -.0041483     -.0001846        .0001287
     married |    .0473384      .065336     -.0179977        .0072605
-----------------------------------------------------------------------------
           b = consistent under Ho and Ha; obtained from xtreg
           B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test:  Ho:  difference in coefficients not systematic

        chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 11.79
        Prob>chi2 = 0.0081
• So the fixed effects model is preferred
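The Hausman quadratic form itself takes one line to compute. In this sketch the coefficients and (diagonal) variance matrices are illustrative stand-ins, not the xtreg estimates above, which would require the full covariance matrices:

```python
import numpy as np

# Made-up two-coefficient example: "consistent" FE vs "efficient" RE.
b_fe = np.array([0.117, -0.0043])
b_re = np.array([0.113, -0.0041])
V_fe = np.diag([0.0107, 0.0007]) ** 2      # squared standard errors
V_re = np.diag([0.0105, 0.0006]) ** 2

q = b_fe - b_re
Vq = V_fe - V_re                           # Var(q) = Var(b_FE) - Var(b_RE)
H = float(q @ np.linalg.inv(Vq) @ q)       # compare with chi2(k), k = 2
```

Note that Vq is only guaranteed positive semi-definite asymptotically; in finite samples Stata's hausman command can report a non-positive-definite difference matrix.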
• Breusch and Pagan Test
o Provides a test of the random effects model against the pooled OLS model
o Tests the null hypothesis that σ²_α = 0, which is the case where the individual effects do
not exist and OLS is applicable (i.e. the random effects model reduces to the pooled
one if the variance of the individual effects is zero)
o Denote the residuals from the OLS (pooled) regression as ε̂_it
o Define: S₁ = Σ_i (Σ_t ε̂_it)²
and S₂ = Σ_i Σ_t ε̂_it²
o Test statistic: LM = [NT/(2(T − 1))] (S₁/S₂ − 1)², distributed as a χ² statistic with 1 degree of
freedom under the null hypothesis
. xtreg lwage educ exper expersq black married hisp, vce(robust)

Random-effects GLS regression                   Number of obs      =      4360
Group variable: nr                              Number of groups   =       545
R-sq:  within  = 0.1739                         Obs per group: min =         8
       between = 0.1548                                        avg =       8.0
       overall = 0.1635                                        max =         8
                                                Wald chi2(6)       =    517.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

                              (Std. Err. adjusted for 545 clusters in nr)
             |               Robust
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1011033   .0088919    11.37   0.000     .0836756    .1185311
       exper |   .1128358   .0104931    10.75   0.000     .0922696    .1334019
     expersq |  -.0041483   .0006719    -6.17   0.000    -.0054652   -.0028314
       black |  -.1269633   .0514673    -2.47   0.014    -.2278373   -.0260892
     married |    .065336   .0192047     3.40   0.001     .0276956    .1029765
        hisp |    .026507   .0407199     0.65   0.515    -.0533025    .1063165
       _cons |  -.0845855   .1154202    -0.73   0.464     -.310805    .1416339
-------------+----------------------------------------------------------------
     sigma_u |  .33561173
     sigma_e |  .35204264
         rho |  .47611949   (fraction of variance due to u_i)
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        lwage[nr,t] = Xb + u[nr] + e[nr,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                   lwage |   .2836728       .5326094
                       e |    .123934       .3520426
                       u |   .1126352       .3356117

        Test:   Var(u) = 0
                             chibar2(01) =  3425.60
                          Prob > chibar2 =   0.0000
• We reject the null hypothesis, which indicates that there are significant differences across individuals and that random effects is more appropriate than pooled OLS
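The LM arithmetic can be sketched on simulated pooled-OLS residuals (the design and all names below are assumptions of the example); because the simulated residuals contain an individual effect, the statistic should land far in the rejection region:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 300, 6
u = rng.normal(size=(N, 1))              # individual effects (so H0 is false)
e = u + rng.normal(size=(N, T))          # stand-in pooled-OLS residuals

S1 = (e.sum(axis=1) ** 2).sum()          # sum_i (sum_t e_it)^2
S2 = (e ** 2).sum()                      # sum_i sum_t e_it^2
LM = (N * T) / (2 * (T - 1)) * (S1 / S2 - 1) ** 2   # ~ chi2(1) under H0
```

Under the null S₁/S₂ would hover near 1; clustering inflates S₁ relative to S₂ and hence LM.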
Misspecification Tests
It is difficult to investigate the time-series properties (e.g. autocorrelation, stationarity) of
panel data when T is small
• Testing for heteroscedasticity is possible with small T using the Bickel version of the
Breusch-Pagan test
o This is a test of both within and between heteroscedasticity
o This is a test of γ₁ = ⋯ = γ_p = 0 in the regression model
ε̂_it² = γ₀ + γ₁ŷ_it + ⋯ + γ_p ŷ_it^p + v_it
where ε̂_it and ŷ_it are the residuals and fitted values respectively from the within regression
• For medium and larger values of T a Bartlett-type test is used.
o This assumes homoscedasticity within individuals, and tests for heteroscedasticity
between individuals
o Using the residuals from the within regression calculate the total residual variance,
s² = [1/(NT − N − K)] Σ_i Σ_t ε̂_it², and the within-individual variances, s_i² = [1/(T − 1)] Σ_t ε̂_it²
o Calculate the Bartlett statistic, B = (T − 1)[N ln s² − Σ_i ln s_i²] / [1 + (N + 1)/(3N(T − 1))],
which is distributed as χ²_{N−1} under the null hypothesis
• A test for first-order within-individual autocorrelation is calculated from the within
regression residuals as
r = [Σ_i Σ_t ε̂_it ε̂_i,t−1] / [Σ_i Σ_t ε̂_it²]
o The simplest test is then the Breusch-Godfrey test, LM = √(NT²/(T − 1)) · r, which is distributed
N(0,1) under the null hypothesis
o Given the slow convergence to normality, a superior alternative due to Fisher is often
used: F = √((NT − N − K)/2) · ln[(1 + r)/(1 − r)], which is also distributed N(0,1) under the null
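Both autocorrelation statistics can be sketched on simulated within residuals (the design below is invented for illustration); with serially independent errors, both statistics should be close to zero:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 400, 8, 3
e = rng.normal(size=(N, T))              # serially independent residuals

r = (e[:, 1:] * e[:, :-1]).sum() / (e ** 2).sum()   # first-order autocorr.
BG = np.sqrt(N * T ** 2 / (T - 1)) * r              # Breusch-Godfrey, ~N(0,1)
F = np.sqrt((N * T - N - K) / 2) * np.log((1 + r) / (1 - r))  # Fisher, ~N(0,1)
```

With autocorrelated errors, both statistics would drift away from zero at rate √(NT), which is why even modest residual correlation is detected in large panels.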
• If evidence of heteroscedasticity or autocorrelation is discovered, one could try to model the
heteroscedasticity and/or correlations
o This can be difficult even for large T, but is generally impossible for small T
o An alternative is to accept the coefficient estimates, but use robust standard errors
- If heteroscedasticity is a problem we can use White's robust standard errors
- If heteroscedasticity and/or within-individual autocorrelation is suspected we can
use Arellano's robust standard errors
o The White method is often included in statistical packages, with the variance-
covariance matrix given by Var(β̂) = (X̃′X̃)⁻¹[Σ_i Σ_t ε̂_it² x̃_it′x̃_it](X̃′X̃)⁻¹, where X̃ is the
(NT × K) "difference-from-mean" matrix of all exogenous variables and x̃_it the (1 × K)
row vector of variables for a given observation
o The Arellano method is less standard, with the variance-covariance matrix given by
Var(β̂) = (X̃′X̃)⁻¹[Σ_i X̃_i′ε̂_i ε̂_i′X̃_i](X̃′X̃)⁻¹, where X̃_i is the (T × K) "difference-from-
mean" matrix of exogenous variables, and ε̂_i is the (T × 1) vector of residuals for the ith
individual
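A minimal sketch of the Arellano sandwich for a one-regressor within estimator, on simulated data (the names and design are illustrative assumptions, not the lecture's example):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 150, 5
x = rng.normal(size=(N, T))
y = 2.0 * x + rng.normal(size=(N, T))      # true slope 2

xd = x - x.mean(axis=1, keepdims=True)     # "difference-from-mean" regressor
yd = y - y.mean(axis=1, keepdims=True)
beta = (xd * yd).sum() / (xd ** 2).sum()   # scalar within estimator
e = yd - beta * xd                         # within residuals

bread = 1.0 / (xd ** 2).sum()              # (X'X)^{-1} for one regressor
meat = ((xd * e).sum(axis=1) ** 2).sum()   # sum_i (X_i' e_i)(e_i' X_i)
var_arellano = bread * meat * bread
```

The "meat" sums outer products by individual, so arbitrary heteroscedasticity and within-individual autocorrelation are accommodated, exactly as in cluster-robust standard errors.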
A user-written command in Stata (xttest3) allows one to test for heteroscedasticity. This tests for
heteroscedasticity within groups.
. ssc install xttest3
. xtreg lwage educ exper expersq black married hisp, fe
  (fixed-effects output identical to that reported above)
. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (545)  =   2.2e+05
Prob>chi2   =   0.0000

So, we reject the null of homoscedasticity
Note: the power of the test is weak in large N, small T panels.
. net sj 3-2 st0039
. net install st0039
. xtserial lwage educ exper expersq black married hisp

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,     544) =     24.809
           Prob > F =      0.0000
Linear Dynamic Panel Models
• Why model dynamics?
o Many economic relationships are dynamic in nature and one of the advantages of panel
data is that they allow the researcher to better understand the dynamics of adjustment
o Current outcomes might depend on past values of the explanatory variables, i.e. include
lags of the x's in the model (a distributed lag model)
- In this case we can use similar techniques to those described above
o A simple dynamic model regresses y_it on a polynomial in time
- e.g. the growth curve of child height or IQ as the child grows older
- use the previous models with x_it a polynomial in time or age
o Adjustment might be partial: this year's outcome y depends not only on x, but also on
last year's outcome, i.e. include lags of y
- Note: this is equivalent to including an infinite number of lagged x's
- This further implies that we have in the equation the entire history of the RHS
variables, meaning that any measured influence is conditioned on this history
Linear Dynamic Panel Models with Individual Effects
• It is common to consider an AR(1) model with individual fixed effects
y_it = γy_i,t−1 + x_it′β + α_i + ε_it
• Though more general models can be used (e.g. error correction models, ARMA models).
• Consider the within-group transformation (i.e. the mean difference):
y_it − ȳ_i = (x_it − x̄_i)′β + γ(y_i,t−1 − ȳ*_i) + (ε_it − ε̄_i)
where ȳ*_i = (1/T) Σ_t y_i,t−1 ≠ ȳ_i = (1/T) Σ_t y_it
• We have got rid of the individual effect. But what are the statistical properties of a
regression of y_it − ȳ_i on (x_it − x̄_i) and (y_i,t−1 − ȳ*_i)?
Properties of the Within-Group Estimator
• Obtain an expression for y_it that involves only α_i, the x's, the ε's and y_i0 (the starting value or initial
condition of y)
y_it = γy_i,t−1 + x_it′β + α_i + ε_it
• By substitution:
y_it = γ(γy_i,t−2 + x_i,t−1′β + α_i + ε_i,t−1) + x_it′β + α_i + ε_it
y_it = γ²y_i,t−2 + x_it′β + γx_i,t−1′β + (1 + γ)α_i + ε_it + γε_i,t−1
y_it = γ²(γy_i,t−3 + x_i,t−2′β + α_i + ε_i,t−2) + x_it′β + γx_i,t−1′β + (1 + γ)α_i + ε_it + γε_i,t−1
y_it = γ³y_i,t−3 + x_it′β + γx_i,t−1′β + γ²x_i,t−2′β + (1 + γ + γ²)α_i + ε_it + γε_i,t−1 + γ²ε_i,t−2
• And so on, until we arrive at t = 0 (i.e. initial conditions are important)
• Hence the statement that this is essentially estimating a model including an infinite number
of lags of the x's
Distributed-lag form:
y_it = Σ_{s=0}^{t−1} γ^s (x_i,t−s′β + ε_i,t−s) + [(1 − γ^t)/(1 − γ)] α_i + γ^t y_i0
y_i,t−1 is a function of ε_i,t−1, …, ε_i1 (and y_i0)
ȳ*_i = (1/T) Σ_t y_i,t−1 is a function of ε_i,T−1, …, ε_i1 and y_i0
Hence (y_i,t−1 − ȳ*_i) is correlated with (ε_it − ε̄_i)
• Bias in within-group regression coefficients
o Bias of the within-groups estimator is caused by eliminating the individual effect from
the equation. This causes a correlation between the transformed error term and the
transformed lagged dependent variable
o The bias is generally negative for small T
o For large T the bias is small, but with panel data T tends to be small
• As with the within-groups estimator there is a bias in the estimator when first-differencing
the model
• This is also true for other models (e.g. pooled OLS, random effects,…)
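The negative within-group (Nickell) bias can be illustrated by simulation; the design below (N, T, γ and the start-value rule) is an assumption of the example, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, gamma = 2000, 5, 0.5
alpha = rng.normal(size=N)

# Generate y_i0, ..., y_iT from y_it = gamma*y_i,t-1 + alpha_i + eps_it,
# starting near the stationary mean alpha_i / (1 - gamma).
y = np.empty((N, T + 1))
y[:, 0] = alpha / (1 - gamma) + rng.normal(size=N)
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

# Within-group OLS of demeaned y_it on demeaned y_i,t-1 (no x's).
ylag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
ycur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
gamma_wg = (ylag * ycur).sum() / (ylag ** 2).sum()
bias = gamma_wg - gamma        # clearly negative for T this small
```

Even with N = 2000 the estimate sits well below the true γ = 0.5: increasing N does not help, only increasing T does.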
• Consider first-differencing to eliminate the individual effects
(y_it − y_i,t−1) = (x_it − x_i,t−1)′β + γ(y_i,t−1 − y_i,t−2) + (ε_it − ε_i,t−1)
• OLS is inconsistent since (y_i,t−1 − y_i,t−2) is correlated with (ε_it − ε_i,t−1) (even under the
assumption that ε_it is serially uncorrelated)
o The transformed error term (ε_it − ε_i,t−1) is an MA(1) process which contains ε_i,t−1, and
is thus correlated with (y_i,t−1 − y_i,t−2)
• There are several IV estimators which correct for endogeneity of the lagged dependent
variable.
o Similar to the method of Hausman and Taylor (see below) the instruments come from
within the model.
o Examples include Anderson and Hsiao, Arellano and Bond, and Blundell and Bond
• What we need is a set of instruments that are correlated with (y_i,t−1 − y_i,t−2), but not with
(ε_it − ε_i,t−1)
o All lagged x_it and y_i,t−2, …, y_i0 are valid instruments if {ε_it} is serially independent
• Since y_i,t−2 is not correlated with (ε_it − ε_i,t−1), Anderson and Hsiao suggested using y_i,t−2
as an instrument for (y_i,t−1 − y_i,t−2), alongside x_it, x_i,t−1 and x_i,t−2
Problems
• If y_it is (or is close to) a random walk then y_i,t−2 is not correlated with Δy_i,t−1 and is not a
valid instrument
• Methods based solely on the differenced equation ignore potentially valuable information
contained in the initial conditions
• What is the optimal point on the trade-off between the number of lags used as instruments
and the number of time periods retained in the estimation sample?
System Estimators
• The time-differenced model:
(y_it − y_i,t−1) = (x_it − x_i,t−1)′β + γ(y_i,t−1 − y_i,t−2) + (ε_it − ε_i,t−1)
Δy_it = Δx_it′β + γΔy_i,t−1 + Δε_it,   t = 2, …, T_i   (1)
• This can be considered a system of T_i − 1 linear equations with cross-correlated errors (since
Δε_it is correlated with Δε_i,t−1 and Δε_i,t+1)
• There is also some related process generating the initial conditions, y_i0 and y_i1, which could
provide further equations
• A different number of instruments is available for each of the equations in (1), for example:
o The equation for t = 2 has only (x_i1, …, x_iT, y_i0)
o The equation for t = T_i has (x_i1, …, x_iT, y_i0, …, y_i,T−2)
The Method of Moments
• The method of moments is a way of getting consistent estimates of model parameters
• Specify moment conditions (e.g. means, covariances) implied by the model as a function of
its parameters (population moments)
• Write down the “sample analogues” of these moment conditions, i.e. expressions into which
you can plug the sample data, as a function of parameter estimates
• Choose values for the parameter estimates which “solve” the sample moment conditions
• Consider the mean of a random variable y
o The mean of y is defined as μ = E(y)
o Rearrange this as a moment condition: m(y; μ) = E(y − μ) = 0
o The sample analogue is: m_N(y; μ) = (1/N) Σ_i (y_i − μ) = 0
o Solve to get the MM estimator: μ̂ = (1/N) Σ_i y_i
• Often there are more moment conditions than parameters to be estimated. Then the
moment conditions don’t have a unique solution
• In this case, we minimise a (weighted) sum of the squares of the sample moments. In vector
notation this is written in the general case as m̄(y, x; θ)′ W m̄(y, x; θ)
• This is called the generalised method of moments (GMM)
• IV estimators are members of the class of GMM estimators (e.g. 2SLS)
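A toy GMM sketch (not from the lecture): estimate a mean from two moment conditions, E[y − μ] = 0 and, assuming unit variance, E[(y − μ)² − 1] = 0, by minimising the quadratic form with an identity weight matrix over a grid; the data-generating process is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(loc=3.0, scale=1.0, size=5000)   # true mean 3, variance 1

def gbar(mu):
    """Sample analogues of E[y - mu] = 0 and E[(y - mu)^2 - 1] = 0."""
    return np.array([(y - mu).mean(), ((y - mu) ** 2 - 1.0).mean()])

grid = np.linspace(2.0, 4.0, 2001)
Q = [gbar(mu) @ gbar(mu) for mu in grid]   # identity weight matrix W = I
mu_gmm = float(grid[int(np.argmin(Q))])
```

With two moments and one parameter the model is over-identified, so the moments cannot be solved exactly; the minimiser trades them off, exactly as in the panel GMM estimators above.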
System Estimation of Dynamic Panel Models
• Arellano and Bond is a variation of Anderson and Hsiao that uses an unbalanced set of
instruments with further lags as instruments.
• Instead of regarding (1) as one equation, think of it as a system of equations for t = 3, …, T:
t=3: Δy_i3 = Δx_i3′β + γΔy_i2 + Δε_i3, instruments: Z_i3 = (y_i1, Δx_i3)
t=4: Δy_i4 = Δx_i4′β + γΔy_i3 + Δε_i4, instruments: Z_i4 = (Z_i3, y_i2, Δx_i4)
…
t=T: Δy_iT = Δx_iT′β + γΔy_i,T−1 + Δε_iT, instruments: Z_iT = (Z_i,T−1, y_i,T−2, Δx_iT)
• It is the use of different instruments for equations of different time periods that defines the
A&B method relative to conventional IV estimation, which uses the same instrument set for
all endogenous variables.
• Conventional instruments can also be used in the analysis
• A problem arises with the Arellano and Bond method if the variables are close to a random
walk, with lagged levels being poor instruments for the first differences
• Arellano and Bover (1995) and Blundell and Bond (1998) show that adding the original
equation in levels to the system can increase the number of moment conditions and
increase efficiency
o In the levels equations endogenous variables are instrumented with lags of their first
differences
• XTABOND2 in Stata fits both the Arellano and Bond difference GMM estimator and the
Blundell and Bond system (i.e. levels and differences) GMM estimator
Specification Testing in Dynamic Panel Models
• Tests for Overidentifying Restrictions (i.e. whether the instruments appear exogenous)
o Can be tested using the standard Sargan test
o Stata also reports the Hansen J test (since the Sargan test is not robust to heteroscedasticity
or autocorrelation)
• Testing for Residual Serial Correlation
o If the ε_it are serially independent, then
E(Δε_it Δε_i,t−1) = E[(ε_it − ε_i,t−1)(ε_i,t−1 − ε_i,t−2)] = −E(ε_i,t−1²) = −σ²_ε
o Thus, we would expect first-order serial correlation in the differenced residuals
o We would not, however, expect any second-order serial correlation, i.e.
E(Δε_it Δε_i,t−2) = E[(ε_it − ε_i,t−1)(ε_i,t−2 − ε_i,t−3)] = 0
o One should therefore test for second-order serial correlation
o The presence of second-order serial correlation indicates a specification error
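These two covariances are easy to verify numerically on simulated white noise (an illustrative check, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(6)
eps = rng.normal(size=(100_000, 6))        # white noise, sigma^2 = 1
d = np.diff(eps, axis=1)                   # first-differenced errors

cov1 = (d[:, 1] * d[:, 0]).mean()          # should be close to -sigma^2 = -1
cov2 = (d[:, 2] * d[:, 0]).mean()          # should be close to 0
```

This is exactly the pattern the Arellano-Bond AR(1) and AR(2) tests in the output below look for: AR(1) significant, AR(2) not.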
use http://www.stata-press.com/data/r7/abdata.dta, clear
xtabond2 n l.n w k l.w l.k yr1980 yr1981 yr1982 yr1983 yr1984, gmm(l.n) iv(yr1980-yr1984) noleveleq

Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       751
Time variable : year                            Number of groups   =       140
Number of instruments = 33                      Obs per group: min =         5
Wald chi2(10) = 1235.04                                        avg =      5.36
Prob > chi2   = 0.000                                          max =         7
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           n |
         L1. |    .554094   .1037099     5.34   0.000     .3508264    .7573616
             |
           w |  -.7749952   .2021376    -3.83   0.000    -1.171178   -.3788128
           k |     .48439   .1422565     3.41   0.001     .2055723    .7632077
             |
           w |
         L1. |   .3597913   .1214988     2.96   0.003      .121658    .5979247
             |
           k |
         L1. |   -.334291   .1447846    -2.31   0.021    -.6180637   -.0505183
             |
      yr1980 |  -.0139178   .0134693    -1.03   0.301    -.0403172    .0124815
      yr1981 |  -.0466677   .0231872    -2.01   0.044    -.0921137   -.0012217
      yr1982 |   -.038262   .0386544    -0.99   0.322    -.1140232    .0374993
      yr1983 |  -.0311078   .0519977    -0.60   0.550    -.1330214    .0708058
      yr1984 |  -.0303459   .0642001    -0.47   0.636    -.1561758     .095484
Instruments for first differences equation
  Standard
    D.(yr1980 yr1981 yr1982 yr1983 yr1984)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/.).L.n
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.29  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -0.27  Pr > z =  0.788
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(23)   =  49.90  Prob > chi2 =  0.001
  (Not robust, but not weakened by many instruments.)

Difference-in-Sargan tests of exogeneity of instrument subsets:
  iv(yr1980 yr1981 yr1982 yr1983 yr1984)
    Sargan test excluding group:     chi2(18)   =  40.82  Prob > chi2 =  0.002
    Difference (null H = exogenous): chi2(5)    =   9.08  Prob > chi2 =  0.106
. xtabond2 n l.n w k l.w l.k yr1980 yr1981 yr1982 yr1983 yr1984, gmm(l.n) iv(yr1980-yr1984)
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, one-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       891
Time variable : year                            Number of groups   =       140
Number of instruments = 41                      Obs per group: min =         6
Wald chi2(10) = 6066.80                                        avg =      6.36
Prob > chi2   = 0.000                                          max =         8
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         L.n |   .8658989   .0572771    15.12   0.000     .7536379    .9781598
           w |  -.7153568   .1246113    -5.74   0.000    -.9595905   -.4711232
           k |   .5904417    .091638     6.44   0.000     .4108345     .770049
         L.w |   .5637949   .1067572     5.28   0.000     .3545547    .7730351
         L.k |  -.4850106   .0941143    -5.15   0.000    -.6694714   -.3005499
      yr1980 |   -.000474   .0123318    -0.04   0.969    -.0246438    .0236958
      yr1981 |  -.0172628   .0170324    -1.01   0.311    -.0506458    .0161201
      yr1982 |   .0163149   .0195558     0.83   0.404    -.0220137    .0546435
      yr1983 |   .0206385   .0204995     1.01   0.314    -.0195397    .0608167
      yr1984 |   .0126074   .0295783     0.43   0.670    -.0453651    .0705799
       _cons |   .6428872   .3137909     2.05   0.040     .0278683    1.257906
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard: D.(yr1980 yr1981 yr1982 yr1983 yr1984)
  GMM-type (missing=0, separate instruments for each period unless collapsed): L(1/.).L.n
Instruments for levels equation
  Standard: _cons yr1980 yr1981 yr1982 yr1983 yr1984
  GMM-type (missing=0, separate instruments for each period unless collapsed): D.L.n
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -8.39  Pr > z = 0.000
Arellano-Bond test for AR(2) in first differences: z = -0.20  Pr > z = 0.838
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(30) = 60.24  Prob > chi2 = 0.001
  (Not robust, but not weakened by many instruments.)
Difference-in-Sargan tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Sargan test excluding group:     chi2(23) = 54.29  Prob > chi2 = 0.000
    Difference (null H = exogenous): chi2(7)  =  5.95  Prob > chi2 = 0.546
  iv(yr1980 yr1981 yr1982 yr1983 yr1984)
    Sargan test excluding group:     chi2(25) = 49.46  Prob > chi2 = 0.002
    Difference (null H = exogenous): chi2(5)  = 10.77  Prob > chi2 = 0.056
Endogeneity Revisited
• Consider the following wage regression:
w_it = α_i + β1 educ_i + β2 female_i + β3 age_it + β4 tenure_it + u_it
• We can think of two possible forms of endogeneity:
o Two-way causation – experience is rewarded with high pay and workers tend to stay
in high-paid jobs
o Unobserved common factors – ability is rewarded with high pay and high ability
people stay in education longer
Two-way causation
Tenure model:
tenure_it = δ w_it + v_it
tenure_it = δ(α_i + β1 educ_i + β2 female_i + β3 age_it + β4 tenure_it + u_it) + v_it
tenure_it = [δ(α_i + β1 educ_i + β2 female_i + β3 age_it + u_it) + v_it] / (1 − δβ4)
Cov(tenure_it, α_i) = δσ²_α / (1 − δβ4)
Cov(tenure_it, u_it) = δσ²_u / (1 − δβ4)
• To deal with this kind of endogeneity we can estimate a within-group IV regression model
o The within-group transformation eliminates the α_i's and the IV deals with the
covariance between tenure_it and u_it
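The reduced form above can be verified numerically. A minimal sketch with made-up parameter values (alpha, b_age, b_ten, delta and the shocks are all illustrative, not estimates; educ and female are dropped for brevity):

```python
# Two-way causation system from the notes (simplified):
#   w      = alpha + b_age*age + b_ten*tenure + u
#   tenure = delta*w + v
# Hypothetical parameter values, for illustration only:
alpha, b_age, b_ten, delta = 1.0, 0.05, 0.10, 0.5
age, u, v = 30.0, 0.2, -0.1

# Reduced form: tenure = [delta*(alpha + b_age*age + u) + v] / (1 - delta*b_ten)
tenure_rf = (delta * (alpha + b_age * age + u) + v) / (1.0 - delta * b_ten)

# Check that the reduced form solves both structural equations simultaneously
w = alpha + b_age * age + b_ten * tenure_rf + u
assert abs(tenure_rf - (delta * w + v)) < 1e-9
```

Because tenure appears on both sides of the system, any shock u that raises wages also raises tenure, which is exactly the covariance the IV step has to remove.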
Unobserved common factors: α_i represents high ability and high ability people stay in education
longer
educ_i = γα_i + other variables, (γ > 0)
Cov(educ_i, α_i) = γσ²_α
Cov(educ_i, u_it) = 0
• To deal with this kind of endogeneity we can estimate a within group regression model
o The within-group transformation eliminates the α_i's
o It also eliminates time-invariant variables, but there are approaches (e.g. Hausman-
Taylor) to obtain coefficients on these variables
Instrumental Variables Regression with Panel Data
• The standard 2SLS estimator for cross-section can be easily extended to the panel context
• Consider the model:
y_it = α_i + x_it β + u_it
• Where a subset of the x_it's are considered to be endogenous
• Partition x_it, i.e. x_it = (x1_it, x2_it)
• Where x2_it represents the endogenous covariates
Cov(x1_it, u_it) = 0 and Cov(x2_it, u_it) ≠ 0
• Obtain a set of instruments z2_it (at least as many as in x2_it)
o Where Cov(z2_it, u_it) = 0
• The full set of instruments is thus z_it = (x1_it, z2_it)
• Within-group transformation:
y_it − ȳ_i = (x_it − x̄_i)β + (u_it − ū_i)
• The within-groups IV estimator then uses (z_it − z̄_i) as instruments
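The within-group transformation itself is simple to demonstrate. A pure-Python sketch (the notes use Stata; the data here are made up, with no idiosyncratic noise so the slope is recovered exactly):

```python
# Sketch: demeaning within each unit eliminates a time-invariant effect alpha_i.
# True slope beta = 2; alpha_i is deliberately correlated with x.
beta = 2.0
alphas = {1: 5.0, 2: -3.0}                      # unit effects (made up)
data = [(i, x, alphas[i] + beta * x)            # rows: (unit, x_it, y_it)
        for i, xs in [(1, [1.0, 2.0, 3.0]), (2, [10.0, 11.0, 12.0])]
        for x in xs]

def group_means(rows):
    """Per-unit means of x and y."""
    sums, counts = {}, {}
    for i, x, y in rows:
        sx, sy = sums.get(i, (0.0, 0.0))
        sums[i] = (sx + x, sy + y)
        counts[i] = counts.get(i, 0) + 1
    return {i: (sx / counts[i], sy / counts[i]) for i, (sx, sy) in sums.items()}

means = group_means(data)
demeaned = [(x - means[i][0], y - means[i][1]) for i, x, y in data]

# OLS slope on demeaned data (no intercept needed after demeaning)
beta_within = sum(dx * dy for dx, dy in demeaned) / sum(dx * dx for dx, _ in demeaned)
assert abs(beta_within - beta) < 1e-12   # alpha_i has been swept out
```

The within-groups IV estimator applies exactly the same demeaning to the instruments z_it before the 2SLS step.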
Other IV Estimators
• By applying the between-group transformation or the random-effects GLS transformation to
the model and instruments, we can define between-group and random effects IV estimators
analogous to the regression case
o As with standard regression these estimators are not robust with respect to correlation
between the α_i and x_it
. webuse nlswork, clear
. xtreg ln_wage tenure age not_smsa, fe

Fixed-effects (within) regression               Number of obs      =     28093
Group variable: idcode                          Number of groups   =      4699
R-sq:  within  = 0.1335                         Obs per group: min =         1
       between = 0.2484                                        avg =       6.0
       overall = 0.1862                                        max =        15
                                                F(3,23391)         =   1201.75
corr(u_i, Xb)  = 0.1840                         Prob > F           =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .0209418   .0008001    26.17   0.000     .0193735      .02251
         age |   .0123481   .0004125    29.93   0.000     .0115395    .0131566
    not_smsa |   -.099398   .0097221   -10.22   0.000     -.118454     -.080342
       _cons |   1.280688   .0112142   114.20   0.000     1.258707    1.302668
-------------+----------------------------------------------------------------
     sigma_u |  .38143467
     sigma_e |  .29745202
         rho |  .62184184   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4698, 23391) = 7.33             Prob > F = 0.0000
. xtivreg ln_wage age not_smsa (tenure = union south), fe

Fixed-effects (within) IV regression            Number of obs      =     19007
Group variable: idcode                          Number of groups   =      4134
R-sq:  within  = .                              Obs per group: min =         1
       between = 0.1277                                        avg =       4.6
       overall = 0.0879                                        max =        12
                                                Wald chi2(3)       = 141873.28
corr(u_i, Xb)  = -0.6875                        Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .2452291   .0386354     6.35   0.000     .1695051    .3209531
         age |  -.0651322   .0127701    -5.10   0.000    -.0901611   -.0401034
    not_smsa |  -.0159519   .0346643    -0.46   0.645    -.0838927    .0519888
       _cons |   2.831893   .2443845    11.59   0.000     2.352908    3.310878
-------------+----------------------------------------------------------------
     sigma_u |  .71942007
     sigma_e |  .64359089
         rho |  .55546192   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:  F(4133,14870) = 1.38              Prob > F = 0.0000
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age not_smsa union south
. xtreg ln_wage tenure age not_smsa, re

Random-effects GLS regression                   Number of obs      =     28093
Group variable: idcode                          Number of groups   =      4699
R-sq:  within  = 0.1322                         Obs per group: min =         1
       between = 0.2638                                        avg =       6.0
       overall = 0.1979                                        max =        15
                                                Wald chi2(3)       =   4879.47
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |    .025503   .0007467    34.15   0.000     .0240395    .0269666
         age |   .0118608   .0003859    30.74   0.000     .0111044    .0126171
    not_smsa |  -.1580611   .0077541   -20.38   0.000    -.1732589   -.1428633
       _cons |   1.289867   .0116069   111.13   0.000     1.267118    1.312616
-------------+----------------------------------------------------------------
     sigma_u |  .32162515
     sigma_e |  .29745202
         rho |  .53898759   (fraction of variance due to u_i)
. xtivreg ln_wage age not_smsa (tenure = union south), re

G2SLS random-effects IV regression              Number of obs      =     19007
Group variable: idcode                          Number of groups   =      4134
R-sq:  within  = 0.0607                         Obs per group: min =         1
       between = 0.1725                                        avg =       4.6
       overall = 0.1192                                        max =        12
                                                Wald chi2(3)       =    929.08
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |   .1768498   .0110283    16.04   0.000     .1552346    .1984649
         age |  -.0333235   .0030544   -10.91   0.000      -.03931     -.027337
    not_smsa |  -.2135208   .0129503   -16.49   0.000    -.2389029   -.1881386
       _cons |    2.20578   .0581674    37.92   0.000     2.091774    2.319787
-------------+----------------------------------------------------------------
     sigma_u |  .32796027
     sigma_e |  .64359089
         rho |  .20614163   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age not_smsa union south
. xtivreg ln_wage age not_smsa (tenure = union south), fd

First-differenced IV regression
Group variable: idcode                          Number of obs      =      5934
Time variable: year                             Number of groups   =      3461
R-sq:  within  = 0.1235                         Obs per group: min =         1
       between = 0.2071                                        avg =       4.3
       overall = 0.0892                                        max =        11
                                                Wald chi2(3)       =      5.83
corr(u_i, Xb)  = -0.4766                        Prob > chi2        =    0.1203
------------------------------------------------------------------------------
   D.ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    D.tenure |   .1365949   .0778382     1.75   0.079    -.0159652     .289155
       D.age |  -.0048762   .0135226    -0.36   0.718      -.03138    .0216277
  D.not_smsa |  -.0633273   .0382332    -1.66   0.098     -.138263    .0116083
       _cons |  -.0694077   .0598777    -1.16   0.246    -.1867658    .0479503
-------------+----------------------------------------------------------------
     sigma_u |  .53692033
     sigma_e |  .28615582
         rho |  .77878957   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:  tenure
Instruments:   age not_smsa union south
Simultaneity involving only individual effects: the Hausman-Taylor case
Consider the model:
y_it = α_i + x_it β + f_i γ + u_it   (1)
Where f_i are a set of individual-specific (non-time-varying) variables (e.g. education)
Partition x_it and f_i: x_it = (x1_it, x2_it), f_i = (f1_i, f2_i)
Where:
E(α_i|x1_it) = 0, E(α_i|f1_i) = 0 → x1_it, f1_i are exogenous
E(α_i|x2_it) ≠ 0, E(α_i|f2_i) ≠ 0 → x2_it, f2_i are endogenous
We must assume that
E(u_it|x_it) = 0, E(u_it|f_i) = 0 for all x- and f-variables
Identification condition: number of x1-variables ≥ number of f2-variables
Method: use x1_it as instruments for f2_i
Hausman-Taylor IV Estimator
• Uses exogenous time-varying regressors x1_it from periods other than the current one as
instruments
• One benefit of this approach is that it allows the estimation of a coefficient of a time-
invariant regressor in a fixed effects model (which is not possible using the standard FE
model)
• Step 1: compute the within-group estimator for β:
o Regress y_it − ȳ_i on (x_it − x̄_i), which gives us the estimates β̂_W (which are consistent
estimates of the parameters)
• Step 2: construct within-group residuals and estimate σ²_u
ê_it = y_it − ȳ_i − (x_it − x̄_i)β̂_W
σ̂²_u = Σ_i Σ_t ê²_it / (N(T − 1) − K)
• Step 3: estimate a model for d̂_i = ȳ_i − x̄_i β̂_W:
d̂_i = μ + f_i γ + residual
o To do this, stack the group means of these residuals in a full sample length data vector
o Use as IVs z_i = (x1_it, f1_i) (which requires that the number of x1's exceeds the
number of f2's)
o This provides a consistent estimator of the γ's
• Step 4: Construct d̂*_i = ȳ_i − f_i γ̂ − x̄_i β̂_W; estimate σ²_α from ê_it and d̂*_i. These form the
weights in the GLS (random effects) estimation
• Step 5: Estimate (1) as a random effects model using as IVs z_it = (f1_i, (x1_it − x̄1_i), (x2_it − x̄2_i), x̄1_i)
• This estimator was first proposed as a way of estimating wage regressions:
o Given that unobserved ability is omitted from the regression model, random effects
estimation will suffer from an endogeneity bias
o Fixed effects estimation can eliminate this bias, but also prevents us from estimating
the coefficients on schooling as well as other time-invariant (dummy) variables
webuse psidextract, clear
regress lwage wks south smsa ms exp exp2 occ ind union fem blk ed, vce(cluster id)

Linear regression                               Number of obs      =      4165
                                                F( 12,   594)      =     65.91
                                                Prob > F           =    0.0000
                                                R-squared          =    0.4286
                                                Root MSE           =    .34936
                             (Std. Err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         wks |   .0042161    .001542     2.73   0.006     .0011877    .0072444
       south |  -.0556374   .0261593    -2.13   0.034    -.1070134   -.0042614
        smsa |   .1516671   .0241026     6.29   0.000     .1043303    .1990039
          ms |   .0484485   .0409438     1.18   0.237    -.0319638    .1288608
         exp |   .0401046   .0040764     9.84   0.000     .0320987    .0481106
        exp2 |  -.0006734   .0000913    -7.37   0.000    -.0008527     -.000494
         occ |  -.1400093   .0272428    -5.14   0.000    -.1935133   -.0865054
         ind |   .0467886   .0236627     1.98   0.048     .0003159    .0932614
       union |   .0926267   .0236719     3.91   0.000      .046136    .1391175
         fem |  -.3677852   .0455743    -8.07   0.000    -.4572917   -.2782788
         blk |  -.1669376   .0443291    -3.77   0.000    -.2539986   -.0798767
          ed |   .0567042   .0055646    10.19   0.000     .0457756    .0676328
       _cons |   5.251124   .1235461    42.50   0.000     5.008483    5.493764
Fixed-effects (within) regression               Number of obs      =      4165
Group variable: id                              Number of groups   =       595
R-sq:  within  = 0.6581                         Obs per group: min =         7
       between = 0.0261                                        avg =       7.0
       overall = 0.0461                                        max =         7
                                                F(9,594)           =    377.62
corr(u_i, Xb)  = -0.9100                        Prob > F           =    0.0000
                             (Std. Err. adjusted for 595 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         wks |   .0008359   .0008658     0.97   0.335    -.0008644    .0025363
       south |  -.0018612   .0893013    -0.02   0.983    -.1772459    .1735235
        smsa |  -.0424691   .0294829    -1.44   0.150    -.1003726    .0154343
          ms |  -.0297259   .0268702    -1.11   0.269    -.0824979    .0230462
         exp |   .1132083   .0040499    27.95   0.000     .1052543    .1211622
        exp2 |  -.0004184   .0000824    -5.07   0.000    -.0005803   -.0002564
         occ |  -.0214765   .0189947    -1.13   0.259    -.0587815    .0158285
         ind |   .0192101   .0226818     0.85   0.397    -.0253361    .0637564
       union |   .0327849   .0250658     1.31   0.191    -.0164436    .0820133
         fem |  (omitted)
         blk |  (omitted)
          ed |  (omitted)
       _cons |   4.648767   .0780057    59.60   0.000     4.495567    4.801968
-------------+----------------------------------------------------------------
     sigma_u |  1.0338102
     sigma_e |  .15199444
         rho |  .97884144   (fraction of variance due to u_i)
. xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 occ ind union ed) constant(fem blk ed)

Hausman-Taylor estimation                       Number of obs      =      4165
Group variable: id                              Number of groups   =       595
                                                Obs per group: min =         7
                                                               avg =         7
                                                               max =         7
Random effects u_i ~ i.i.d.                     Wald chi2(12)      =   6874.89
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
TVexogenous  |
         wks |    .000909   .0005988     1.52   0.129    -.0002647    .0020827
       south |   .0071377    .032548     0.22   0.826    -.0566553    .0709306
        smsa |  -.0417623   .0194019    -2.15   0.031    -.0797893   -.0037352
          ms |   -.036344   .0188576    -1.93   0.054    -.0733041    .0006161
TVendogenous |
         exp |   .1129718   .0024697    45.74   0.000     .1081313    .1178122
        exp2 |  -.0004191   .0000546    -7.68   0.000    -.0005261   -.0003121
         occ |  -.0213946   .0137801    -1.55   0.121     -.048403    .0056139
         ind |   .0188416   .0154404     1.22   0.222     -.011421    .0491043
       union |   .0303548   .0148964     2.04   0.042     .0011583    .0595513
TIexogenous  |
         fem |  -.1368468   .1272797    -1.08   0.282    -.3863104    .1126169
         blk |  -.2818287   .1766269    -1.60   0.111     -.628011    .0643536
TIendogenous |
          ed |   .1405254   .0658715     2.13   0.033     .0114197    .2696311
       _cons |   2.884418   .8527775     3.38   0.001     1.213004    4.555831
-------------+----------------------------------------------------------------
     sigma_u |  .94172547
     sigma_e |  .15180273
         rho |  .97467381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: TV refers to time varying; TI refers to time invariant.
Binary Response Models with Panel Data
• A Reminder
o Whenever the variable that we want to model is binary, it is natural to think in terms of
probabilities, examples include:
� What is the probability that a firm with given characteristics exports?
� If a female has a child what is the effect on the probability of being in the labour
force?
o Binary Response models can be estimated in a number of ways.
� The most straightforward is to use the Linear Probability Model (LPM)
o In this type of model the probability of success (i.e. y = 1) is a linear function of the
explanatory variables in the vector x.
� The model is estimated using linear regression techniques
� e.g. the model can be estimated using OLS or the within groups estimator.
� The particular linear estimator used in the case of panel data will depend on the
relationship between the observed explanatory variables and the unobserved
individual effects
Properties of the LPM
• One undesirable property of the LPM is that we can get predicted "probabilities" either less
than zero or greater than one
• A related problem is that, conceptually, it does not make sense to say that a probability is
linearly related to a continuous independent variable for all possible values. If it were, then
continually increasing this explanatory variable would eventually drive P(y = 1|x) above
one or below zero
• A third problem with the LPM, is that the residuals are heteroscedastic. The easiest way of
solving this problem is to obtain estimates of the standard errors that are robust to
heteroscedasticity
• A fourth and related problem is that the residual is not normally distributed. This implies
that inference in small samples cannot be based on the usual normality-based distributions
such as the t-test
• The LPM is easy to estimate and its results are easy to interpret (e.g. the marginal effects
are simply given by the estimated coefficients)
• Certain econometric problems are easier to address within the LPM framework than with
probit and logit models (e.g. when using instrumental variables whilst controlling for fixed
effects)
• The two main problems with the LPM were: nonsense predictions are possible (there is
nothing to bind the predicted probability to the (0,1) range); and linearity doesn’t make much
sense conceptually. To address these problems we can use a nonlinear binary response model
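The out-of-range prediction problem is easy to exhibit. A pure-Python sketch with a tiny made-up dataset (x is firm size, y an export dummy; the numbers are illustrative only):

```python
# Sketch: an LPM fitted by OLS can produce fitted "probabilities" outside [0, 1].
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up firm sizes
ys = [0, 0, 1, 1, 1]             # made-up export indicator

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
# Closed-form bivariate OLS slope and intercept
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# Extrapolating the fitted line to a large firm gives a "probability" above one
p_hat = b0 + b1 * 10.0
assert p_hat > 1.0
```

Nothing in the linear functional form prevents this; probit and logit avoid it by construction.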
• A general index model has the form:
P(y = 1|x) = G(xβ)
• for some G : ℝ → (0,1). That is, 0 < G(∙) < 1. In most cases, G(∙) is a cumulative
distribution function for a continuous random variable with density g(∙). Then, G(∙) is
strictly increasing, and the estimates are easier to interpret.
• The leading cases are:
o The Probit Model: G(z) = Φ(z) ≡ ∫_{−∞}^{z} φ(v)dv, where φ(v) is the standard normal
density φ(v) = (2π)^(−1/2) exp(−v²/2)
o The Logit Model: G(z) = Λ(z) = exp(z)/[1 + exp(z)], i.e. the cumulative distribution
function for a standard logistic random variable
• Such models can be estimated using Maximum Likelihood (ML) techniques
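The two leading link functions are one-liners in any language. A pure-Python sketch using only the standard library (math.erf gives the normal CDF without scipy):

```python
import math

def probit_G(z):
    """Standard normal CDF, Phi(z) = 0.5*(1 + erf(z/sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_G(z):
    """Standard logistic CDF, Lambda(z) = exp(z)/(1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

# Both are CDFs: strictly increasing, equal to 0.5 at zero, bounded in (0, 1)
assert abs(probit_G(0.0) - 0.5) < 1e-12 and abs(logit_G(0.0) - 0.5) < 1e-12
assert 0.0 < probit_G(-3.0) < probit_G(3.0) < 1.0
```

The logistic CDF has fatter tails than the normal, which is why logit and probit coefficients differ in scale even when fitted to the same data.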
• One can think about these models in terms of an underlying latent variable model
• Estimating the effect of x_j on the probability of success P(y = 1|x) is complicated by the
non-linear nature of G(∙)
o For continuous variables this can be calculated using calculus
• If x_j is a roughly continuous variable, its partial effect on p(x) = P(y = 1|x) is obtained
from
∂p(x)/∂x_j = g(xβ)β_j , where g(z) ≡ dG(z)/dz
• Suppose we estimate a probit modelling the probability that a firm does some exporting as a
function of firm size.
• For simplicity, abstract from other explanatory variables. Our model is thus:
P(export = 1|size) = Φ(β0 + β1 size)
• where size is defined as the natural logarithm of employment.
• The probit results are:
       Coefficient   t-statistic
β0        -2.85         16.6
β1         0.54         13.4
• Since the coefficient on size is positive, we know that the marginal effect must be positive.
Treating size as a continuous variable, it follows that the marginal effect is equal to:
∂P(export = 1|size)/∂size = φ(β0 + β1 size)·β1
∂P(export = 1|size)/∂size = φ(−2.85 + 0.54 size)·0.54
• where φ(∙) is the standard normal density function:
φ(z) = (2π)^(−1/2) exp(−z²/2)
• Assuming that the mean value of log employment is 3.4 (i.e. about 30 workers), we can
evaluate the marginal effect at the mean (this is the partial effect at the average):
PEA_j = g(β̂0 + β̂1 x̄1 + β̂2 x̄2 + ⋯ + β̂K x̄K)·β̂_j
• We then have:
∂P(export = 1|size)/∂size |_(size=3.4) = (1/√(2π)) exp(−(−2.85 + 0.54 × 3.4)²/2) × 0.54
∂P(export = 1|size)/∂size |_(size=3.4) ≈ 0.13
• So, evaluated at log employment = 3.4, the results imply that an increase in log employment
by a small amount (∆) raises the probability of exporting by approximately 0.13∆
• A second way of calculating the marginal effect is the average partial effect:
APE_j = N^(−1) Σ_{i=1}^{N} g(x_i β̂)·β̂_j = [N^(−1) Σ_{i=1}^{N} g(x_i β̂)]·β̂_j
• With discrete variables we do not need to rely on calculus
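Both summary effects are easy to compute directly. A pure-Python sketch using the probit coefficients from the export example (the list of firm sizes for the APE is made up for illustration):

```python
import math

def npdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

b0, b1 = -2.85, 0.54             # probit estimates from the notes

# Partial effect at the average (PEA), at mean log employment 3.4
pea = npdf(b0 + b1 * 3.4) * b1
assert abs(pea - 0.13) < 0.005   # matches the worked calculation in the notes

# Average partial effect (APE): average g(x*beta) over the sample, scale by b1.
# These firm sizes are invented, purely to show the computation.
sizes = [1.2, 2.5, 3.4, 4.1, 5.8]
ape = b1 * sum(npdf(b0 + b1 * s) for s in sizes) / len(sizes)
assert 0.0 < ape < b1            # g() is a density, so the effect is below b1*g(0) < b1
```

The PEA evaluates the density at one point; the APE averages it over the observed covariate distribution, which is usually the more policy-relevant number.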
Binary Choice Models for Panel Data
• We now consider the estimation of logit and probit models in the case of a panel dataset,
where we allow for unobserved individual effects.
• Using the latent variable framework, we can write the panel binary choice model as:
y*_it = x_it β + α_i + e_it
y_it = 1[y*_it > 0]
And
P(y_it = 1|x_it, α_i) = G(x_it β + α_i)
• Where G(∙) is either the standard normal CDF (probit) or the logistic CDF (logit)
• In linear panel models it is easy to eliminate the individual effects (i.e. the α_i's) by first
differencing or by using the within-groups transformation
• This is not possible in this case because of the non-linear nature of the model (i.e. the G(∙)
function)
• If we attempt to estimate the α_i's through the inclusion of dummy variables in the probit
and logit specification we will get biased estimates of β unless T is large.
• This is the incidental parameters problem
o With small T the estimates of the α_i's are inconsistent (and increasing N doesn’t solve
this problem).
o Unlike in the linear model, the inconsistency of the α_i's has a ‘knock-on’ effect in the
sense that the estimate of β becomes inconsistent too.
Example:
• Consider the logit model in which T = 2, β is a scalar, and x_it is a time dummy such that
x_i1 = 0, x_i2 = 1. Thus,
P(y_i1 = 1|x_i1, α_i) = exp(β·0 + α_i)/[1 + exp(β·0 + α_i)] ≡ Λ(β·0 + α_i)
P(y_i2 = 1|x_i2, α_i) = exp(β·1 + α_i)/[1 + exp(β·1 + α_i)] ≡ Λ(β·1 + α_i)
• Suppose we attempt to estimate this model with N dummy variables included to control for
the individual fixed effects. There would thus be N + 1 parameters to estimate. It can be
shown that in this case the probability limit of the estimator of our parameter of interest, β, is:
plim_{N→∞} β̂ = 2β
• That is, the probability limit of the logit dummy variable estimator – for this admittedly very
special case – is double the true value of �. With a bias of 100% in very large (infinite)
samples, this is not a very useful approach. This form of inconsistency also holds in more
general cases: unless T is large, the logit dummy variable estimator will not work.
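The plim = 2β result can be illustrated by simulation. For this special T = 2 time-dummy case, the unconditional (dummy-variable) MLE can be shown to equal twice the conditional logit estimate, and the conditional estimate is ln(n01/n10), where n01 and n10 count the (0,1) and (1,0) sequences. A pure-Python sketch (the normal distribution for α_i and the seed are arbitrary choices):

```python
import math
import random

random.seed(42)
beta = 1.0          # true coefficient on the time dummy
N = 200_000         # many individuals, T = 2 periods each

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

n01 = n10 = 0
for _ in range(N):
    a = random.gauss(0.0, 1.0)                  # individual effect alpha_i
    y1 = random.random() < logistic(a)          # t = 1: x = 0
    y2 = random.random() < logistic(beta + a)   # t = 2: x = 1
    if (y1, y2) == (False, True):
        n01 += 1
    elif (y1, y2) == (True, False):
        n10 += 1

beta_cond = math.log(n01 / n10)   # conditional (fixed effects) logit: consistent
beta_dummy = 2.0 * beta_cond      # dummy-variable MLE in this special case
assert abs(beta_cond - beta) < 0.05        # close to the true beta = 1
assert 1.8 < beta_dummy < 2.2              # close to 2*beta, the 100% bias
```

Only the "switchers" (sequences (0,1) and (1,0)) carry information about β, a point the conditional logit below exploits directly.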
• So how can we proceed?
• One possibility is to use the fixed effects or first differenced LPM
o Wooldridge (2010) argues that this may provide reasonable estimates of APEs and has
one or two useful properties
• A further possibility is to employ a pooled probit or logit model
o Wooldridge (2010) notes problems with this approach, and notes that a robust variance
matrix is required to account for serial correlation
• Three more common approaches:
o The traditional random effects (RE) probit (or logit) model
o The conditional fixed effects logit model
o The Mundlak-Chamberlain approach
The Traditional Random Effects (RE) Probit
Model:
y*_it = x_it β + α_i + e_it
y_it = 1[y*_it > 0]
And
P(y_it = 1|x_it, α_i) = Φ(x_it β + α_i)
• The key assumptions underlying this estimator are:
o The α_i and x_it are independent
o The x_it are strictly exogenous (this will be necessary for it to be possible to write the
likelihood of observing a given series of outcomes as the product of individual
likelihoods).
o α_i has a normal distribution with zero mean and variance σ²_α (note: homoscedasticity).
o y_i1, …, y_iT are independent conditional on (x_i, α_i) – this rules out serial correlation in
y_it, conditional on (x_i, α_i). This assumption enables us to write the likelihood of
observing a given series of outcomes as the product of individual likelihoods. The
assumption can easily be relaxed (see Wooldridge, 2002).
• These are restrictive assumptions, especially since endogeneity in the explanatory variables
is ruled out. The only advantage over a simple pooled probit model is that the RE model
allows for serial correlation in the unobserved factors determining y_it, i.e. in (α_i + e_it)
• However, it is fairly straightforward to extend the model and allow for correlation between
α_i and x_it – this is what the Mundlak-Chamberlain approach does (see below)
• If α_i had been observed, the likelihood of observing individual i would have been:
∏_{t=1}^{T} Φ(x_it β + α_i)^(y_it) [1 − Φ(x_it β + α_i)]^(1−y_it)
• and it would be straightforward to maximize the sample likelihood conditional on x_it, α_i and
y_it
• Because the α_i are unobserved however, they cannot be included in the likelihood function.
As discussed above, a dummy variables approach cannot be used, unless T is large.
What can we do?
• We must make an additional assumption about the relationship between α_i and x_i, namely:
α_i | x_i ~ Normal(0, σ²_α)
• This is a strong assumption, as it implies that α_i and x_i are independent, and that α_i has a
normal distribution
• The assumption that E(α_i) = 0 is without loss of generality provided x_it contains an
intercept
• With the above assumption we can integrate out the α_i from the likelihood function
• Recall from basic statistics (Bayes’ theorem for probability densities) that, in general,
f_{X|Y}(x|y) = f_{XY}(x, y) / f_Y(y)
• where f_{X|Y}(x|y) is the conditional density of X given Y = y, f_{XY}(x, y) is the joint
distribution of random variables X, Y, and f_Y(y) is the marginal density of Y. Thus,
f_{XY}(x, y) = f_{X|Y}(x|y) f_Y(y)
• The marginal density of X can be obtained by integrating out y from the joint density
f_X(x) = ∫ f_{XY}(x, y)dy = ∫ f_{X|Y}(x|y) f_Y(y)dy
• We can think about f_X(x) as a likelihood contribution. For a linear model, we might write:
f_y(y) = ∫ f_{yα}(y, α)dα = ∫ f_{y|α}(y|α) f_α(α)dα
• Where e_it = y_it − (x_it β + α_i)
• In the context of the traditional RE probit, we integrate out α_i from the likelihood as follows:
L_i(y_i1, …, y_iT|x_i1, …, x_iT; β, σ²_α)
= ∫ ∏_{t=1}^{T} Φ(x_it β + α)^(y_it) [1 − Φ(x_it β + α)]^(1−y_it) (1/σ_α) φ(α/σ_α)dα
• Which can be maximised w.r.t. β and σ²_α (or σ_α)
• In general, there is no analytical solution here, and so numerical methods have to be used.
The most common approach is to use a Gauss-Hermite quadrature method
• To form the sample log likelihood, we simply compute weighted sums in this fashion for
each individual in the sample, and then add up all the individual likelihoods expressed in
natural logarithms:
log L = Σ_{i=1}^{N} log L_i(y_i1, …, y_iT|x_i1, …, x_iT; β, σ²_α)
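One likelihood contribution L_i can be computed numerically in a few lines. A pure-Python sketch using a simple trapezoid rule in place of the Gauss-Hermite quadrature that xtprobit uses (all data values and parameters are made up):

```python
import math
from itertools import product

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def L_i(ys, xs, beta, sigma_a, grid=2000, width=8.0):
    """Integrate prod_t Phi(x*b+a)^y [1-Phi(x*b+a)]^(1-y) * (1/s)phi(a/s)
    over a in [-width*s, width*s] by the composite trapezoid rule."""
    lo = -width * sigma_a
    h = 2.0 * width * sigma_a / grid
    total = 0.0
    for k in range(grid + 1):
        a = lo + k * h
        contrib = (1.0 / sigma_a) * phi(a / sigma_a)
        for y, x in zip(ys, xs):
            p = Phi(x * beta + a)
            contrib *= p if y == 1 else 1.0 - p
        total += contrib * (h / 2.0 if k in (0, grid) else h)
    return total

xs, beta, sigma_a = [-0.5, 0.3, 1.1], 0.8, 1.0   # one individual, T = 3
lik = L_i([0, 1, 1], xs, beta, sigma_a)
assert 0.0 < lik < 1.0

# Sanity check: the contributions of all 2^T possible sequences sum to one
total_prob = sum(L_i(list(ys), xs, beta, sigma_a) for ys in product([0, 1], repeat=3))
assert abs(total_prob - 1.0) < 1e-3
```

Gauss-Hermite quadrature achieves the same integral with far fewer evaluation points, which matters when this must be done for every individual at every iteration of the maximiser.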
• This model can be estimated in Stata using the xtprobit command
• There is a logit counterpart to this, but it is generally less desirable than the probit case
• Since β and σ²_α can be estimated, the partial effects at α = 0 as well as the APEs can be
estimated
• Marginal effects at α_i = 0 can be computed using standard techniques, with the APE again a
useful effect to calculate
• Since α_i ~ Normal(0, σ²_α), the APE for a continuous x_tj is:
[β_j/(1 + σ²_α)^(1/2)] φ(x_t β/(1 + σ²_α)^(1/2))
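The scaling in this APE formula is worth seeing in numbers. A pure-Python sketch with made-up values for the coefficient, the index, and the random-effect variance:

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

beta_j, sigma_a2 = 0.6, 1.5   # made-up coefficient and Var(alpha_i)
xb = 0.4                      # made-up index value x_t * beta

scale = math.sqrt(1.0 + sigma_a2)
ape = (beta_j / scale) * phi(xb / scale)

# For these values the population-averaged effect is smaller than the
# naive phi(xb)*beta_j that ignores the random effect entirely
assert 0.0 < ape < beta_j * phi(xb)
```

Intuitively, averaging the probit response over the Normal(0, σ²_α) heterogeneity flattens the curve, which is why both β_j and the index are divided by (1 + σ²_α)^(1/2).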
• Whilst perhaps elegant, the above model does not allow for a correlation between α_i and
the explanatory variables, and so does not achieve anything in terms of addressing an
endogeneity problem. We now turn to more useful models in that context.
The "Fixed Effects" Logit Model
• Now return to the panel logit model:
P(y_it = 1|x_it, α_i) = Λ(x_it β + α_i)
• One important advantage of this model over the probit model is that it will be possible to
obtain a consistent estimator of β without making any assumptions about how α_i is related
to x_it (however, we need strict exogeneity to hold).
• This is possible, because the logit functional form enables us to eliminate α_i from the
estimating equation
• What we do is find the joint distribution of y_i ≡ (y_i1, …, y_iT)′ conditional on x_i, α_i and
n_i ≡ Σ_{t=1}^{T} y_it
• It turns out in the logit case that this conditional distribution does not depend upon α_i, so
that it is also the distribution of y_i given x_i and n_i
• To see this, assume T = 2, and consider the following conditional probability:
P(y_i1 = 0, y_i2 = 1|x_i1, x_i2, α_i, y_i1 + y_i2 = 1)
• The key thing to note here is that we condition on y_i1 + y_i2 = 1, i.e. that y_it changes
between the two time periods. For the logit functional form, we have:
P(y_i1 + y_i2 = 1|x_i1, x_i2, α_i)
= [exp(x_i1 β + α_i)/(1 + exp(x_i1 β + α_i))] × [1/(1 + exp(x_i2 β + α_i))]
+ [1/(1 + exp(x_i1 β + α_i))] × [exp(x_i2 β + α_i)/(1 + exp(x_i2 β + α_i))]
• Or simply:
P(y_i1 + y_i2 = 1|x_i1, x_i2, α_i) = [exp(x_i1 β + α_i) + exp(x_i2 β + α_i)] / {[1 + exp(x_i1 β + α_i)][1 + exp(x_i2 β + α_i)]}
• Furthermore:
P(y_i1 = 0, y_i2 = 1|x_i1, x_i2, α_i) = [1/(1 + exp(x_i1 β + α_i))] × [exp(x_i2 β + α_i)/(1 + exp(x_i2 β + α_i))]
• Hence conditional on y_i1 + y_i2 = 1:
P(y_i1 = 0, y_i2 = 1|x_i1, x_i2, α_i, y_i1 + y_i2 = 1) = exp(x_i2 β + α_i)/[exp(x_i1 β + α_i) + exp(x_i2 β + α_i)]
P(y_i1 = 0, y_i2 = 1|x_i1, x_i2, α_i, y_i1 + y_i2 = 1) = exp(∆x_i2 β)/[1 + exp(∆x_i2 β)], where ∆x_i2 = x_i2 − x_i1
• The key result is that the α_i have been eliminated. It follows that:
P(y_i1 = 1, y_i2 = 0|x_i1, x_i2, α_i, y_i1 + y_i2 = 1) = 1/[1 + exp(∆x_i2 β)]
• Notes:
o These probabilities condition on y_i1 + y_i2 = 1
o These probabilities are independent of α_i
• Hence, by maximizing the following conditional log likelihood function:
log L = Σ_{i=1}^{N} { w_i ln[exp(∆x_i2 β)/(1 + exp(∆x_i2 β))] + (1 − w_i) ln[1/(1 + exp(∆x_i2 β))] }
where w_i = 1 if (y_i1, y_i2) = (0,1) and w_i = 0 if (y_i1, y_i2) = (1,0)
• We obtain consistent estimates of β, regardless of whether α_i and x_it are correlated
• The trick is thus to condition the likelihood on the outcome series (y_i1, y_i2), and in the more
general case (y_i1, y_i2, …, y_iT). For example, if T = 3, we can condition on Σ_t y_it = 1, with
possible sequences {1,0,0}, {0,1,0} and {0,0,1}, or on Σ_t y_it = 2 with possible sequences
{1,1,0}, {1,0,1} and {0,1,1}.
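The cancellation of α_i can be checked numerically. A pure-Python sketch for the T = 2 case (the x values, β, and the α values tried are all made up):

```python
import math

def logistic(z):
    """Logistic CDF, Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def p01_given_switch(x1, x2, beta, alpha):
    """P(y1=0, y2=1 | y1+y2=1), built from the unconditional probabilities."""
    p01 = (1.0 - logistic(x1 * beta + alpha)) * logistic(x2 * beta + alpha)
    p10 = logistic(x1 * beta + alpha) * (1.0 - logistic(x2 * beta + alpha))
    return p01 / (p01 + p10)

x1, x2, beta = 0.2, 1.5, 0.7
for alpha in (-3.0, 0.0, 4.2):
    # Whatever the fixed effect, the conditional probability is Lambda(delta_x * beta)
    assert abs(p01_given_switch(x1, x2, beta, alpha) - logistic((x2 - x1) * beta)) < 1e-12
```

This is exactly why the conditional likelihood can be maximised over β alone: α_i divides out of the ratio before estimation even begins.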
• Stata does this for us, of course. This estimator is requested in Stata by using xtlogit with the
fe option.
• Note that the logit functional form is crucial for it to be possible to eliminate the α_i in this
fashion. It won’t be possible with probit. So this approach is not really very general.
• Another awkward issue concerns the interpretation of the results. The estimation procedure
just outlined implies we do not obtain estimates of α_i, which means we can’t compute
marginal effects.
o We can’t estimate the partial effects on the response probabilities unless we plug in a
value for α
o Because the distribution of α_i is unrestricted – and in particular E(α_i) is not necessarily
zero – it is hard to know what to plug in for α
o We can’t estimate APEs, since doing so would require finding E[Λ(x_t β + α_i)] – a task
that requires specifying a distribution for α_i
Modelling the Random Effect as a Function of x-variables
• The previous two methods are useful, but:
o The traditional random effects probit/logit model requires strict exogeneity and zero
correlation between the explanatory variables and α_i
o The fixed effects logit relaxes the latter assumption but we can’t obtain consistent
estimates of α_i and hence we can’t compute the conventional marginal effects in
general.
• We will now discuss an approach which, in some ways, can be thought of as representing a
middle way. Start from the latent variable model:
y*_it = x_it β + c_i + e_it
y_it = 1[y*_it > 0]
• Consider writing c_i as an explicit function of the x-variables (i.e. allowing for correlation
between c_i and x_i), for example as follows:
c_i = ψ + x̄_i ξ + a_i   (1)
or
c_i = η + x_i λ + r_i   (2)
• where x̄_i is the average of x_it over time for individual i (hence time invariant); x_i contains x_it
for all t; a_i is assumed uncorrelated with x̄_i; r_i is assumed uncorrelated with x_i. Equation (1)
is easier to implement and so we will focus on it.
• Assume that Var(a_i) = σ_a² is constant (i.e. there is homoscedasticity) and that a_i is
normally distributed – the model that then results is known as Chamberlain's random effects
probit model.
• The latent variable formulation can be written as:
y*_it = x_it β + ψ + x̄_i ξ + a_i + e_it
• Equation (1) may be considered restrictive, in the sense that functional form assumptions
are made, but it at least allows for non-zero correlation between c_i and the regressors x_it.
• The probability that y_it = 1 can now be written as:
P(y_it = 1 | x_it, c_i) = P(y_it = 1 | x_it, x̄_i, a_i) = Φ(x_it β + ψ + x̄_i ξ + a_i)
• We now see that, after having added x̄_i to the RHS, we arrive at the traditional random
effects probit model:
f_i(y_i1, …, y_iT | x_i1, …, x_iT; β, σ_a²) = ∫ { Π_{t=1}^{T} [Φ(x_it β + ψ + x̄_i ξ + a)]^{y_it} [1 − Φ(x_it β + ψ + x̄_i ξ + a)]^{1−y_it} } (1/σ_a) φ(a/σ_a) da
• This can be estimated using standard RE probit software
• Effectively, we are adding x̄_i as control variables to allow for some correlation between the
random effect c_i and the regressors.
• If x_it contains time-invariant variables, then clearly they will be collinear with their mean
values for individual i, thus preventing separate identification of the β-coefficients on time-
invariant variables.
• Notice also that this model nests the simpler and more restrictive traditional random effects
probit: under the (easily testable) null hypothesis that ξ = 0, the model reduces to the
traditional model discussed earlier.
• We can easily compute marginal effects at the mean of c_i, since:
E(c_i) = ψ + E(x̄_i) ξ
• APEs can be evaluated using:
N^(−1) Σ_{i=1}^{N} Φ(x_t β_a + ψ_a + x̄_i ξ_a)
• where the a subscripts indicate that the coefficients have been scaled by (1 + σ_a²)^(−1/2)
• For a discrete variable, the above expression can be evaluated at two different values of x
• For a continuous variable x_j, the APE can be evaluated by using the average across i of
β_aj φ(x_t β_a + ψ_a + x̄_i ξ_a) to get the approximate APE of a one-unit increase in x_j
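As a numerical sketch of these two APE formulas, the fragment below plugs made-up "estimates" (beta, psi, xi, sigma_a2 are all hypothetical, not from any estimation output) and simulated individual means x̄_i into the scaled expressions:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Hypothetical "estimates" standing in for RE probit output
beta, psi, xi, sigma_a2 = -0.40, 0.30, -0.41, 0.80
scale = (1.0 + sigma_a2) ** -0.5          # (1 + sigma_a^2)^(-1/2)

xbar = rng.normal(size=500)               # stand-in individual means x-bar_i
x_t = 1.0                                 # evaluation point for x_t

# Continuous regressor: average beta_a * phi(scaled index) across individuals
index = scale * (x_t * beta + psi + xbar * xi)
ape_cont = np.mean(scale * beta * norm.pdf(index))

# Discrete regressor: difference of average Phi at x = 1 versus x = 0
ape_disc = (np.mean(norm.cdf(scale * (1 * beta + psi + xbar * xi)))
            - np.mean(norm.cdf(scale * (0 * beta + psi + xbar * xi))))
print(ape_cont, ape_disc)
```

This is the same calculation the Stata normalden/egen steps below perform with the actual estimates.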
Linear Fixed Effects Model

. xtreg lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, robust fe

Fixed-effects (within) regression               Number of obs      =     28315
Group variable: id                              Number of groups   =      5663
R-sq:  within  = 0.0031                         Obs per group: min =         5
       between = 0.0103                                        avg =       5.0
       overall = 0.0091                                        max =         5
                                                F(6,5662)          =      5.61
corr(u_i, Xb)  = -0.0073                        Prob > F           =    0.0000

                              (Std. Err. adjusted for 5663 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         lfp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.0388976   .0091682    -4.24   0.000    -.0568708   -.0209244
       lhinc |  -.0089439   .0045947    -1.95   0.052    -.0179513    .0000635
        educ |  (omitted)
       black |  (omitted)
         age |  (omitted)
       agesq |  (omitted)
        per1 |   .0176797   .0048541     3.64   0.000     .0081637    .0271957
        per2 |   .0133998   .0045121     2.97   0.003     .0045544    .0222453
        per3 |   .0067844   .0039786     1.71   0.088    -.0010152     .014584
        per4 |   .0053795   .0032723     1.64   0.100    -.0010354    .0117944
        per5 |  (omitted)
       _cons |   .7913419   .0373148    21.21   0.000     .7181905    .8644933
-------------+----------------------------------------------------------------
     sigma_u |  .42247488
     sigma_e |  .21363541
         rho |  .79636335   (fraction of variance due to u_i)
Pooled Probit MLE

. probit lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5
note: per5 omitted because of collinearity

Iteration 0:  log likelihood = -17709.021
Iteration 1:  log likelihood = -16561.609
Iteration 2:  log likelihood = -16556.671
Iteration 3:  log likelihood = -16556.671

Probit regression                               Number of obs      =     28315
                                                LR chi2(10)        =   2304.70
                                                Prob > chi2        =    0.0000
Log likelihood = -16556.671                     Pseudo R2          =    0.0651

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1989144   .0074815   -26.59   0.000    -.2135779   -.1842509
       lhinc |  -.2110738   .0130489   -16.18   0.000    -.2366493   -.1854984
        educ |   .0796863    .003201    24.89   0.000     .0734125    .0859601
       black |   .2209396   .0334141     6.61   0.000     .1554492    .2864301
         age |   .1449159   .0061536    23.55   0.000      .132855    .1569767
       agesq |  -.0019912   .0000756   -26.34   0.000    -.0021393     -.001843
        per1 |   .0577767    .025249     2.29   0.022     .0082896    .1072637
        per2 |   .0453522   .0252187     1.80   0.072    -.0040756    .0947799
        per3 |   .0252589   .0251707     1.00   0.316    -.0240749    .0745926
        per4 |   .0116797    .025157     0.46   0.642    -.0376272    .0609865
        per5 |  (omitted)
       _cons |  -1.122226   .1369621    -8.19   0.000    -1.390667     -.853785
. margins, dydx(kids lhinc)

Average marginal effects                        Number of obs      =     28315
Model VCE    : OIM

Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.0660184    .002395   -27.56   0.000    -.0707126   -.0613242
       lhinc |   -.070054   .0042791   -16.37   0.000    -.0784409   -.0616671
Chamberlain’s RE Probit – Pooled MLE

. sort id period
. by id: egen kids_bar = mean(kids)
. by id: egen lhinc_bar = mean(lhinc)
. probit lfp kids lhinc kids_bar lhinc_bar educ black age agesq per1 per2 per3 per4 per5
note: per5 omitted because of collinearity

Iteration 0:  log likelihood = -17709.021
Iteration 1:  log likelihood = -16521.245
Iteration 2:  log likelihood = -16516.437
Iteration 3:  log likelihood = -16516.436

Probit regression                               Number of obs      =     28315
                                                LR chi2(12)        =   2385.17
                                                Prob > chi2        =    0.0000
Log likelihood = -16516.436                     Pseudo R2          =    0.0673

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.1173749   .0372874    -3.15   0.002    -.1904569     -.044293
       lhinc |  -.0288098   .0248077    -1.16   0.246     -.077432     .0198125
    kids_bar |  -.0856913   .0380322    -2.25   0.024     -.160233    -.0111495
   lhinc_bar |  -.2501781   .0290625    -8.61   0.000    -.3071396    -.1932167
        educ |   .0841338   .0032539    25.86   0.000     .0777562     .0905114
       black |   .2030668   .0335069     6.06   0.000     .1373945      .268739
         age |   .1516424   .0062081    24.43   0.000     .1394748     .1638101
       agesq |  -.0020672   .0000762   -27.13   0.000    -.0022166    -.0019179
        per1 |   .0552425   .0252773     2.19   0.029     .0056999     .1047851
        per2 |   .0416724   .0252544     1.65   0.099    -.0078254     .0911701
        per3 |   .0220434   .0252037     0.87   0.382     -.027355     .0714417
        per4 |   .0162108   .0251878     0.64   0.520    -.0331564     .0655779
        per5 |  (omitted)
       _cons |  -.7812987   .1426149    -5.48   0.000    -1.060819    -.5017785
. margins, dydx(kids lhinc)

Average marginal effects                        Number of obs      =     28315
Model VCE    : OIM

Expression   : Pr(lfp), predict()
dy/dx w.r.t. : kids lhinc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |   -.038852   .0123363    -3.15   0.002    -.0630307    -.0146734
       lhinc |  -.0095363    .008211    -1.16   0.245    -.0256296     .0065571
. xtprobit lfp kids lhinc kids_bar lhinc_bar educ black age agesq per1 per2 per3 per4 per5

Random-effects probit regression                Number of obs      =     28315
Group variable: id                              Number of groups   =      5663
Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5
                                                Wald chi2(12)      =    623.40
Log likelihood = -8609.9002                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.3970102   .0701298    -5.66   0.000     -.534462    -.2595584
       lhinc |    -.10034   .0469979    -2.13   0.033    -.1924541    -.0082258
    kids_bar |  -.4085664   .0898875    -4.55   0.000    -.5847427    -.2323901
   lhinc_bar |  -.8941069   .1199703    -7.45   0.000    -1.129244    -.6589695
        educ |   .3189079    .024327    13.11   0.000     .2712279      .366588
       black |   .6388783   .1903525     3.36   0.001     .2657943     1.011962
         age |   .7282056   .0445623    16.34   0.000      .640865     .8155461
       agesq |  -.0098358   .0005747   -17.11   0.000    -.0109623    -.0087094
        per1 |    .200357    .049539     4.04   0.000     .1032624     .2974515
        per2 |   .1551917   .0499822     3.10   0.002     .0572284      .253155
        per3 |   .0756514   .0499737     1.51   0.130    -.0222952      .173598
        per4 |   .0646736    .049747     1.30   0.194    -.0328288     .1621759
        per5 |  (omitted)
       _cons |  -5.559732   1.000528    -5.56   0.000     -7.52073    -3.598733
-------------+----------------------------------------------------------------
    /lnsig2u |   2.947234   .0435842                      2.861811     3.032657
-------------+----------------------------------------------------------------
     sigma_u |   4.364995   .0951224                      4.182484      4.55547
         rho |   .9501326    .002065                       .945926     .9540279
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 1.6e+04   Prob >= chibar2 = 0.000
σ̂_a² = 4.364995² = 19.05318, so (1 + σ̂_a²)^(−1/2) = 0.22331

. gen temp = normalden(_b[kids]*0.22331*kids + _b[lhinc]*0.22331*lhinc + _b[kids_bar]*0.22331*kids_bar + _b[lhinc_bar]*0.22331*lhinc_bar + _b[educ]*0.22331*educ + _b[black]*0.22331*black + _b[age]*0.22331*age + _b[agesq]*0.22331*agesq + _b[per1]*0.22331*per1 + _b[per2]*0.22331*per2 + _b[per3]*0.22331*per3 + _b[per4]*0.22331*per4 + _b[_cons]*0.22331)
. egen temp1 = mean(temp)

temp1 = 0.3250374 (this is the scale factor)

So the APE for kids is (−0.3970102 × 0.22331) × 0.3250374 = −0.0288, and the APE for lhinc is (−0.10034 × 0.22331) × 0.3250374 = −0.007283.

Estimating the LPM by FE gives estimated coefficients of roughly −0.039 and −0.009 on the kids and lhinc variables respectively. That is, each additional child reduces the labour force participation probability by about 0.039 (roughly four percentage points), while a 10% increase in a husband's income lowers the probability by about 0.0009.
The APEs become much larger when we use the probit and assume that c_i is independent of x_i.
Using Chamberlain's model gives results similar to the LPM.
There is not much difference between the pooled and full random effects versions of the Chamberlain model.
. xtlogit lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, fe

Conditional fixed-effects logistic regression   Number of obs      =      5275
Group variable: id                              Number of groups   =      1055
                                                Obs per group: min =         5
                                                               avg =       5.0
                                                               max =         5
                                                LR chi2(6)         =     57.27
Log likelihood = -2003.4184                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        kids |  -.6438386   .1247828    -5.16   0.000    -.8884084    -.3992688
       lhinc |  -.1842911   .0826019    -2.23   0.026    -.3461878    -.0223943
        educ |  (omitted)
       black |  (omitted)
         age |  (omitted)
       agesq |  (omitted)
        per1 |   .3563745   .0888354     4.01   0.000     .1822604     .5304886
        per2 |   .2635706   .0886977     2.97   0.003     .0897262      .437415
        per3 |   .1315756   .0880899     1.49   0.135    -.0410774     .3042286
        per4 |   .1084422   .0879067     1.23   0.217    -.0638517     .2807361
        per5 |  (omitted)
Coefficient estimates from the fixed effects logit are difficult to interpret directly. The relative size is 0.644/0.184 = 3.5, which is not too different from the corresponding ratio from the pooled MLE Chamberlain model, for example.
Dynamic Unobserved Effects Models
• Dynamic models that also contain unobserved effects are important in testing theories and
evaluating policies
• We have seen that using lagged dependent variables as explanatory variables complicates
the estimation of standard linear panel data models.
• Conceptually, similar problems arise for nonlinear models, but since we don’t rely on
differencing the steps involved for dealing with the problems are a little different.
• Suppose we date our observations at t = 0, so that y_i0 is the first observation on y. For
t = 1, …, T we are interested in the dynamic unobserved effects model:
P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i, c_i) = G(z_it δ + ρ y_{i,t−1} + c_i)   (1)
• where z_it is a vector of contemporaneous explanatory variables, z_i = (z_i1, …, z_iT), and G is
the probit or logit function. In the case of the probit model we have:
P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i, c_i) = Φ(z_it δ + ρ y_{i,t−1} + c_i)
• The z_it are assumed to be strictly exogenous (conditional on c_i)
• The probability of success at time t is allowed to depend on the outcome in t − 1 as well as
on unobserved heterogeneity, c_i
• The unobserved effect c_i is correlated with y_{i,t−1} by definition
• The coefficient ρ is often referred to as the state dependence parameter. If ρ ≠ 0, then the
outcome y_{i,t−1} influences the outcome in period t, y_it
• If Var(c_i) > 0, so that there is unobserved heterogeneity, we cannot use a pooled probit to
test H0: ρ = 0. The reason is that under Var(c_i) > 0 there will be serial correlation in the y_it.
• We can write the density function as:
f(y_1, y_2, …, y_T | y_0, z, c; δ) = Π_{t=1}^{T} f(y_t | y_{t−1}, …, y_0, z_t, c; δ)
f(y_1, y_2, …, y_T | y_0, z, c; δ) = Π_{t=1}^{T} G(z_t δ + ρ y_{t−1} + c)^{y_t} [1 − G(z_t δ + ρ y_{t−1} + c)]^{1−y_t}   (2)
• Due to the presence of the unobserved effects it is not possible to construct a log-likelihood
function that can be used to estimate δ consistently.
• Treating the c_i as parameters to be estimated – i.e. including N individual dummies – does
not result in consistent estimators of δ and ρ
• We need to integrate c_i out of the distribution
• After integrating out the c_i, the likelihood function in the dynamic probit model is:
f_i(y_i1, …, y_iT | z_i1, …, z_iT; δ, σ_c²) = ∫ { Π_{t=1}^{T} [Φ(z_it δ + ρ y_{i,t−1} + c)]^{y_it} [1 − Φ(z_it δ + ρ y_{i,t−1} + c)]^{1−y_it} } f_{y0|z,c}(y_i0 | z_i, c) (1/σ_c) φ(c/σ_c) dc
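This integral over the unobserved effect is what RE probit routines approximate numerically, typically by Gauss-Hermite quadrature. The sketch below shows just that integration step for one individual, treating the index as a given matrix product and ignoring the y_i0 term for brevity; it illustrates the mechanics only, not any particular package's implementation, and all numbers are made up:

```python
import numpy as np
from scipy.stats import norm

# Probabilists' Gauss-Hermite nodes/weights: integrate f(a) exp(-a^2/2) da
nodes, weights = np.polynomial.hermite_e.hermegauss(15)

def lik_i(y, X, beta, sigma_a):
    """Likelihood contribution for one individual: integrate the product of
    probit probabilities over a ~ Normal(0, sigma_a^2) by quadrature."""
    total = 0.0
    for a, w in zip(nodes, weights):
        p = norm.cdf(X @ beta + sigma_a * a)
        total += w * np.prod(np.where(y == 1, p, 1.0 - p))
    return total / np.sqrt(2.0 * np.pi)  # normalising constant of phi

# Tiny check with T = 3 and one regressor (illustrative values)
y = np.array([1, 0, 1])
X = np.array([[0.5], [-0.2], [1.0]])
print(lik_i(y, X, beta=np.array([0.8]), sigma_a=1.0))
```

With sigma_a = 0 the quadrature collapses exactly to the pooled-probit product of probabilities, which is a useful sanity check.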
• There remains an endogeneity problem: in f_{y0|z,c}(y_i0 | z_i, c) the "regressor" y_i0 is correlated
with the unobserved random effect.
o This is called the initial conditions problem.
• As T gets large the initial conditions problem becomes less serious (since a smaller weight
falls on the problematic term), but with T small it can cause substantial bias.
• How do we deal with this endogeneity?
o i.e. how do we treat the initial observations, y_i0?
• Heckman (1981) suggests approximating the conditional density of y_i0 given (z_i, c_i) and
then specifying a density for c_i given z_i
o e.g. we might specify that y_i0 follows a probit model with success probability
Φ(θ + z_i π + γ c_i) and specify the density of c_i given z_i as normal
• Once these two densities are specified, they can be multiplied by (2) and c can be integrated
out to approximate the density of (y_i0, y_i1, y_i2, …, y_iT) given z_i
• An alternative approach suggested by Wooldridge (2005) is to obtain the joint distribution of
(y_i1, y_i2, …, y_iT) conditional on (y_i0, z_i). This approach allows one to remain agnostic about
the distribution of y_i0 given (z_i, c_i)
• If we can find the density of (y_i1, y_i2, …, y_iT) given (y_i0, z_i) in terms of δ and other
parameters, then we can use standard conditional MLE methods
• To obtain f(y_1, y_2, …, y_T | y_i0, z_i) we need to propose a density for c_i given (y_i0, z_i)
• This approach is similar to Chamberlain's in the static probit case with unobserved effects
(except that now we condition on y_i0 also)
• Given a density h(c | y_0, z, γ), which depends on the parameters γ, we have
f(y_1, y_2, …, y_T | y_0, z, θ) = ∫_{−∞}^{+∞} f(y_i1, y_i2, …, y_iT | y_0, z, c; δ) h(c | y_0, z, γ) dc
• The integral can be replaced by a weighted average if the distribution of c is discrete.
o When G = Φ in (1), a convenient choice for h(c | y_0, z, γ) is Normal(ψ + ξ_0 y_0 + z ξ_1, σ_a²),
which follows by writing c_i = ψ + ξ_0 y_i0 + z_i ξ_1 + a_i, where a_i ~ Normal(0, σ_a²)
and is independent of (y_i0, z_i)
• Then we can write:
y_it = 1[ψ + z_it δ + ρ y_{i,t−1} + ξ_0 y_i0 + z_i ξ_1 + a_i + e_it > 0]
• so that y_it given (y_{i,t−1}, …, y_i0, z_i, a_i) follows a probit model and a_i given (y_i0, z_i) is
distributed as Normal(0, σ_a²)
• This gives a density in exactly the same form as that for conditional MLE above, with a_i and σ_a
replacing c_i and σ_c
• This means that we can use standard RE probit commands to estimate these dynamic
models
• We simply expand the list of explanatory variables to include y_i0 and z_i in each time period
• It is then simple to test whether ρ = 0, meaning that there is no state dependence once we
control for an unobserved effect
• In estimating the dynamic model it is important to remember that it is not possible to obtain
consistent estimates of the parameters using a pooled probit of y_it on 1, z_it, y_{i,t−1}, y_i0, z_i.
o While P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i, a_i) = Φ(ψ + z_it δ + ρ y_{i,t−1} + ξ_0 y_i0 + z_i ξ_1 + a_i), it is
not true that P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i) = Φ(ψ_a + z_it δ_a + ρ_a y_{i,t−1} + ξ_{0a} y_i0 + z_i ξ_{1a})
unless a_i is identically zero.
o Correlation between y_{i,t−1} and a_i means that P(y_it = 1 | y_{i,t−1}, …, y_i0, z_i) does not
follow a probit model with an index that depends on the scaled coefficients of interest
• We can estimate Average Partial Effects, but we must now average out the initial condition
along with leads and lags of all strictly exogenous variables.
• Let z_t and y_{t−1} be given values of the explanatory variables
• Then the "Average Structural Function":
E[Φ(z_t δ + ρ y_{t−1} + c_i)] = E[Φ(ψ_a + z_t δ_a + ρ_a y_{t−1} + ξ_{0a} y_i0 + z_i ξ_{1a})]
• can be consistently estimated as:
ASF(z_t, y_{t−1}) = N^(−1) Σ_{i=1}^{N} Φ(ψ̂_a + z_t δ̂_a + ρ̂_a y_{t−1} + ξ̂_{0a} y_i0 + z_i ξ̂_{1a})
• where the a subscript denotes that the original coefficients have been multiplied by
(1 + σ̂_a²)^(−1/2), and ψ̂, δ̂, ρ̂, ξ̂_0, ξ̂_1 and σ̂_a² are the estimates reported by the statistical package
• We can then take derivatives of this expression w.r.t. continuous elements of z_t, or take
differences with respect to discrete elements
• A particularly interesting case is to alternatively set y_{t−1} = 1 and y_{t−1} = 0 and obtain the
change in the probability that y_it = 1 when y_{t−1} goes from zero to one
• To obtain a single APE we can also average across all time periods
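The state-dependence APE just described can be sketched as follows; the scaled coefficients (rho, xi0, the constant and sigma_a2) are invented stand-ins for actual estimates, and the index of the remaining regressors is collapsed into a single simulated term:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 1000

# Hypothetical coefficients standing in for dynamic RE probit estimates
rho, xi0, psi, sigma_a2 = 1.5, 2.5, -2.8, 1.12
scale = (1.0 + sigma_a2) ** -0.5

y0 = rng.integers(0, 2, size=n)     # initial conditions y_i0
zg = rng.normal(size=n)             # stand-in for the z_it*delta + z_i*xi_1 part

idx1 = scale * (psi + rho * 1 + xi0 * y0 + zg)   # y_{t-1} set to 1
idx0 = scale * (psi + rho * 0 + xi0 * y0 + zg)   # y_{t-1} set to 0
ape_state = np.mean(norm.cdf(idx1) - norm.cdf(idx0))
print(ape_state)
```

Averaging the Φ-difference over individuals (and, if desired, over time periods) yields the single APE for state dependence.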
Example: Dynamic Women’s Labour Force Participation
• To estimate a dynamic women’s labour force participation equation using the method
described above we look to estimate the following model:
• P(lfp_it = 1 | kids_it, lhinc_it, lfp_{i,t−1}, c_i)
• We further include time-constant variables: black, educ, age, agesq and a full set of time
dummies
• We include among the regressors: lfp_i0, the kids variables kids_1 through kids_4, and the lhinc variables lhinc_1 through lhinc_4 (constructed below)
. by id: gen lfp_0 = lfp[_n] if _n == 1
(22652 missing values generated)
. by id: replace lfp_0 = sum(lfp_0)
(22652 real changes made)
. by id: gen lhinc_1 = lhinc[_n] if _n == 2
(22652 missing values generated)
. by id: gen lhinc_2 = lhinc[_n] if _n == 3
(22652 missing values generated)
. by id: gen lhinc_3 = lhinc[_n] if _n == 4
(22652 missing values generated)
. by id: gen lhinc_4 = lhinc[_n] if _n == 5
(22652 missing values generated)
. by id: replace lhinc_1 = sum(lhinc_1)
(22652 real changes made)
. by id: replace lhinc_2 = sum(lhinc_2)
(22652 real changes made)
. by id: replace lhinc_3 = sum(lhinc_3)
(22652 real changes made)
. by id: replace lhinc_4 = sum(lhinc_4)
(22652 real changes made)
. by id: gen kids_1 = kids[_n] if _n == 2
(22652 missing values generated)
. by id: gen kids_2 = kids[_n] if _n == 3
(22652 missing values generated)
. by id: gen kids_3 = kids[_n] if _n == 4
(22652 missing values generated)
. by id: gen kids_4 = kids[_n] if _n == 5
(22652 missing values generated)
. by id: replace kids_1 = sum(kids_1)
(22652 real changes made)
. by id: replace kids_2 = sum(kids_2)
(22652 real changes made)
. by id: replace kids_3 = sum(kids_3)
(22652 real changes made)
. by id: replace kids_4 = sum(kids_4)
(22652 real changes made)
. xtprobit lfp l.lfp lfp_0 kids kids_1 kids_2 kids_3 kids_4 lhinc lhinc_1 lhinc_2 lhinc_3 lhinc_4 educ black age agesq per2 per3 per4 per5

Random-effects probit regression                Number of obs      =     22652
Group variable: id                              Number of groups   =      5663
Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4
                                                Wald chi2(19)      =   4093.48
Log likelihood = -5039.9867                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   1.536826   .0665669    23.09   0.000     1.406358     1.667295
             |
       lfp_0 |   2.545303   .1556541    16.35   0.000     2.240227      2.85038
        kids |  -.3591127   .0646669    -5.55   0.000    -.4858574    -.2323679
      kids_1 |   .2595521   .0665754     3.90   0.000     .1290668     .3900374
      kids_2 |   .0206725   .0355721     0.58   0.561    -.0490476     .0903925
      kids_3 |   .0024968   .0358429     0.07   0.944    -.0677541     .0727477
      kids_4 |    .047252   .0366343     1.29   0.197      -.02455      .119054
       lhinc |  -.0745293   .0485506    -1.54   0.125    -.1696867     .0206281
     lhinc_1 |  -.0747431   .0531001    -1.41   0.159    -.1788173     .0293311
     lhinc_2 |  -.0080621   .0501491    -0.16   0.872    -.1063524     .0902283
     lhinc_3 |   .0088362   .0511642     0.17   0.863    -.0914437     .1091162
     lhinc_4 |  -.1189348   .0610491    -1.95   0.051    -.2385888     .0007193
        educ |   .0459694   .0098917     4.65   0.000      .026582     .0653569
       black |   .1281378   .0984119     1.30   0.193    -.0647459     .3210216
         age |   .1383024    .019357     7.14   0.000     .1003634     .1762414
       agesq |  -.0017838   .0002402    -7.43   0.000    -.0022545    -.0013131
        per2 |  -.7521025   .5635128    -1.33   0.182    -1.856567     .3523623
        per3 |  -.7700739   .5206839    -1.48   0.139    -1.790596     .2504478
        per4 |  -.8158966   .4836915    -1.69   0.092    -1.763915     .1321213
        per5 |  (omitted)
       _cons |  -2.818611   .5587894    -5.04   0.000    -3.913818    -1.723403
-------------+----------------------------------------------------------------
    /lnsig2u |   .1151956   .1209124                     -.1217884     .3521796
-------------+----------------------------------------------------------------
     sigma_u |   1.059289   .0640406                      .9409228     1.192545
         rho |   .5287671    .030128                      .4695905      .587146
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 164.94   Prob >= chibar2 = 0.000

From the above, σ̂_a² = 1.059289² = 1.122093, so (1 + σ̂_a²)^(−1/2) = 0.686464

. gen temp = normalden(_b[lfp_0]*0.686464*lfp_0 + _b[kids]*0.686464*kids + _b[kids_1]*0.686464*kids_1 + _b[kids_2]*0.686464*kids_2 + _b[kids_3]*0.686464*kids_3 + _b[kids_4]*0.686464*kids_4 + _b[lhinc]*0.686464*lhinc + _b[lhinc_1]*0.686464*lhinc_1 + _b[lhinc_2]*0.686464*lhinc_2 + _b[lhinc_3]*0.686464*lhinc_3 + _b[lhinc_4]*0.686464*lhinc_4 + _b[educ]*0.686464*educ + _b[black]*0.686464*black + _b[age]*0.686464*age + _b[agesq]*0.686464*agesq + _b[per2]*0.686464*per2 + _b[per3]*0.686464*per3 + _b[per4]*0.686464*per4 + _b[_cons]*0.686464 + _b[L1.]*0.686464) - normalden(_b[lfp_0]*0.686464*lfp_0 + _b[kids]*0.686464*kids + _b[kids_1]*0.686464*kids_1 + _b[kids_2]*0.686464*kids_2 + _b[kids_3]*0.686464*kids_3 + _b[kids_4]*0.686464*kids_4 + _b[lhinc]*0.686464*lhinc + _b[lhinc_1]*0.686464*lhinc_1 + _b[lhinc_2]*0.686464*lhinc_2 + _b[lhinc_3]*0.686464*lhinc_3 + _b[lhinc_4]*0.686464*lhinc_4 + _b[educ]*0.686464*educ + _b[black]*0.686464*black + _b[age]*0.686464*age + _b[agesq]*0.686464*agesq + _b[per2]*0.686464*per2 + _b[per3]*0.686464*per3 + _b[per4]*0.686464*per4 + _b[_cons]*1.45674)

. summarize temp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-------------------------------------------------------
        temp |     28315    .0830207    .2166758   -.3857821   .3968095
• Averaged across all women and all time periods, the probability of being in the labour force
at time t is about 0.08 higher if the woman was in the labour force at time t − 1
It is instructive to compare this APE with the estimate from a dynamic probit model that ignores c_i.

. xtprobit lfp l.lfp kids lhinc educ black age agesq per1 per2 per3 per4 per5, re

Random-effects probit regression                Number of obs      =     22652
Group variable: id                              Number of groups   =      5663
Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4
                                                Wald chi2(10)      =  12071.51
Log likelihood = -5332.529                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         lfp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lfp |
         L1. |   2.875683   .0269811   106.58   0.000     2.822801     2.928565
             |
        kids |  -.0607933    .012217    -4.98   0.000    -.0847381    -.0368484
       lhinc |  -.1143188   .0211668    -5.40   0.000    -.1558051    -.0728325
        educ |   .0291874   .0052362     5.57   0.000     .0189246     .0394501
       black |    .079251   .0536696     1.48   0.140    -.0259396     .1844415
         age |    .084404   .0099983     8.44   0.000     .0648076     .1040004
       agesq |  -.0010991   .0001236    -8.90   0.000    -.0013413     -.000857
        per1 |  (omitted)
        per2 |   .0304145    .037152     0.82   0.413     -.042402      .103231
        per3 |  -.0036646   .0369207    -0.10   0.921    -.0760278     .0686986
        per4 |   .0326971   .0371438     0.88   0.379    -.0401035     .1054977
        per5 |  (omitted)
       _cons |  -2.201223   .2218053    -9.92   0.000    -2.635954    -1.766493
-------------+----------------------------------------------------------------
    /lnsig2u |  -15.70567   14.44481                     -44.01697     12.60564
-------------+----------------------------------------------------------------
     sigma_u |   .0003886    .002807                      2.77e-10      546.109
         rho |   1.51e-07   2.18e-06                      7.65e-20     .9999966
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 0.00   Prob >= chibar2 = 1.000
. gen temp = normprob(_b[kids]*kids + _b[lhinc]*lhinc + _b[educ]*educ + _b[black]*black + _b[age]*age + _b[agesq]*agesq + _b[per2]*per2 + _b[per3]*per3 + _b[per4]*per4 + _b[_cons] + _b[L1.]) - normprob(_b[kids]*kids + _b[lhinc]*lhinc + _b[educ]*educ + _b[black]*black + _b[age]*age + _b[agesq]*agesq + _b[per2]*per2 + _b[per3]*per3 + _b[per4]*per4 + _b[_cons])
. summarize temp
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-------------------------------------------------------
        temp |     28315    .8375263    .0121556   .6019519    .849521
• The APE for state dependence is much higher in this case than when heterogeneity is
controlled for.
o Averaged across all women and all time periods, the probability of being in the labour
force at time t is about 0.84 higher if the woman was in the labour force at time t − 1
• Therefore, much of the persistence in labour force participation of married women is
accounted for by unobserved heterogeneity.
• There is some state dependence, but its value is much smaller than a simple dynamic probit
indicates.