Application 3: Estimating the Effect of Education on
Earnings
Methods of Economic Investigation
Lecture 9
Quick Asymptotics reminder…
- In class: not really about “proving” consistency or asymptotic bias in estimates
- When appropriate, will mention bias terms that are asymptotically zero but not zero in finite samples
What should you know?
- What happens to something in its probability limit
- That our estimates, in the limit as N goes to infinity and under regularity conditions, satisfy:
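$$
\operatorname{plim}\,(X'X)^{-1}X'y = \operatorname{plim}\left(\tfrac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\tfrac{1}{N}X'y\right) = \beta + \operatorname{plim}\left(\tfrac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\tfrac{1}{N}X'\varepsilon\right)
$$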
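The role of N in this limit can be checked numerically. The following is a minimal sketch, not part of the lecture: it assumes a one-regressor model with a true slope of 2, standard normal regressor and error, and hypothetical helper names `ols_slope` and `simulate_slope`.

```python
import random

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit: sample Cov(x, y) / sample Var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate_slope(n, beta=2.0, seed=0):
    """Draw n observations from y = 1 + beta * x + error and return the OLS slope."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]
    return ols_slope(x, y)

# The estimate should settle near the assumed true slope (2.0) as N grows.
slopes = {n: simulate_slope(n) for n in (100, 10_000, 200_000)}
for n, b in slopes.items():
    print(n, round(b, 4))
```

Because the error is drawn independently of x, the second term of the probability limit vanishes and the slope approaches the assumed value as N grows.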
What you do not need to know
- Behind these results are various theorems
  - Laws of Large Numbers for plims
  - Central Limit Theorems for asymptotic normality
  - Various mathematical conditions, e.g. the continuous mapping theorem
- You do not have to know which theorems you are using
- You do not have to be able to prove these results with the theorems
Bottom Line…
- Understand the role of N→∞
  - The mean of the sample mean is μ
  - The variance of the sample mean goes to zero
  - If something is scaled by $\sqrt{N}$, it can converge in distribution
- So far, we have typically relied on the concept of “bias”, but in large samples consistency is the more useful term
  - If bias is decreasing as the sample is increasing, then worry less about it
  - If even in large samples our estimate is not close to the true value, worry more about it
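A small simulation can illustrate these points about the sample mean. This is a sketch under stated assumptions: the draws are Uniform(0, 1), so μ = 0.5 and σ² = 1/12, and the sample sizes, replication count, and the helper name `sample_means` are arbitrary choices made for illustration.

```python
import random
import statistics

MU, SIGMA2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0, 1)

def sample_means(n, reps=1000, seed=1):
    """Return `reps` sample means, each computed from n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]

results = {}
for n in (10, 400):
    means = sample_means(n)
    var_of_mean = statistics.pvariance(means)                 # shrinks like 1/N
    scaled = [n ** 0.5 * (m - MU) for m in means]             # sqrt(N) scaling
    var_scaled = statistics.pvariance(scaled)                 # stays near sigma^2
    results[n] = (var_of_mean, var_scaled)
    print(n, round(var_of_mean, 6), round(var_scaled, 4))
```

The unscaled variance collapses toward zero as N grows, while the variance of the √N-scaled deviation stays close to σ², which is why the scaled quantity can have a non-degenerate limiting distribution.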
Today’s Lecture
- Review error component models
  - Fixed Effects
  - Random Effects
- Application: Estimating the Effects of Education on Earnings
  - Difficulty in Causal Estimation
  - Within-family estimator
- Some limitations of fixed effects
Error has different components
- Suppose we had to estimate
$$Y_{ij} = \beta X_{ij} + \varepsilon_{ij}$$
  where
$$\varepsilon_{ij} = \alpha_j + \eta_{ij}$$
- If unobserved factors are uncorrelated with X’s: can do OLS w/ robust standard errors or FGLS
- If unobserved factors are correlated with X’s: can include a group-specific fixed effect
Fixed effects versus “Dummy Variables”
- These are not mutually exclusive categories
- A dummy variable is just a categorical variable that is zero sometimes and one sometimes
  - “Control” variables, which have a direct meaning, may sometimes be dummy variables
  - Fixed effects, which tell us something about the structure of our error term, are also dummy variables
Motivation for today’s example…
- Want to know why people earn different amounts
- Specifically, what are the returns, in terms of increased wages, to the various investments people make?
- Most common labor-improving investment: Education
Motivation-2
- Simple linear regression first introduced by Mincer:
$$\log(y_{ij}) = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + e_{ij}$$
- Measure of schooling (S): we’re going to use years of education
- Experience (X): we’re going to include a quadratic specification, which is the most commonly used
- Index this by individual i in group j
Basic Problem with estimating this
- Lots of reasons why different people may invest at different levels of education
- Some of those reasons are probably correlated with how much money a person would earn as well as with how much they will invest in education
  - Unobserved “ability”
  - Family factors, such as income, parental involvement, genetic stuff, etc.
How might these bias our estimates?
- Let’s say what we want to estimate is:
$$\log(y_{ij}) = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + f_j + e_{ij}$$
- Interpret higher f as something like family income or family investment
- Recall the OVB formula: we care about two things
  - Correlation between f and y: probably positive
  - Correlation between f and S: positive
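The direction and size of the omitted variable bias can be illustrated with simulated data. This is a sketch only: every parameter value below is an assumption chosen for illustration, experience is left out to keep the example to one regressor, and `slope` is a hypothetical helper.

```python
import random

def slope(x, y):
    """One-regressor OLS slope: sample Cov(x, y) / sample Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

B_TRUE = 0.08                      # assumed true return to a year of schooling
rng = random.Random(42)

S, F, logy = [], [], []
for j in range(20_000):            # families
    f_j = rng.gauss(0, 0.2)        # unobserved family effect
    for i in range(2):             # two members per family
        s_ij = 12 + 5 * f_j + rng.gauss(0, 1.5)   # schooling rises with f_j
        S.append(s_ij)
        F.append(f_j)
        logy.append(1.0 + B_TRUE * s_ij + f_j + rng.gauss(0, 0.3))

b_ols = slope(S, logy)             # regression that omits f_j
ovb = slope(S, F)                  # Cov(S, f) / Var(S); f enters with coefficient 1
print(round(b_ols, 4), round(B_TRUE + ovb, 4))
```

With f positively related to both earnings and schooling, the cross-sectional slope overstates the assumed return, by roughly Cov(S, f) / Var(S).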
Why is OLS biased?
[Figure: Y plotted against S, with labels E[S_ij | j = 1], E[S_ij | j = 2], E[S_ij | j = 3], and E_OLS[S]]
How could we fix this?
- Some of the unobserved differences that bias a cross-sectional comparison of education and earnings are based on family characteristics
- Key Assumption: within families, these differences should be fixed
- If we observe multiple individuals with exactly the same family effect, then we can difference out the group effect
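For two members of the same family j, the differencing step can be written out as follows. This is a worked sketch in the notation of the earnings equation with family effect $f_j$, assuming that effect really is identical for both members:

$$
\log(y_{1j}) - \log(y_{2j}) = b(S_{1j} - S_{2j}) + c(X_{1j} - X_{2j}) + d(X_{1j}^2 - X_{2j}^2) + (e_{1j} - e_{2j})
$$

Both the intercept $a$ and the family effect $f_j$ cancel, so b is identified from within-family differences in schooling alone.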
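Estimating Family Averages
- Can look at differences within a family
- Think of this as a different CEF for each family (the intercept a and the family effect f_j drop out of the deviations from family means):
$$E[Y_{ij} - \bar{Y}_j \mid S, X, f] = b(S_{ij} - \bar{S}_j) + c(X_{ij} - \bar{X}_j) + d(X_{ij}^2 - \overline{X^2}_j)$$
- The way we estimate this:
$$\log(y_{ij}) = \hat{a} + \hat{b}S_{ij} + \hat{c}X_{ij} + \hat{d}X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}$$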
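A deviation-from-family-means estimator can be sketched in a few lines. The data-generating values are assumptions chosen for illustration (family effect correlated with schooling, two members per family, no experience terms), and `within_slope` is a hypothetical helper rather than a library function.

```python
import random

def within_slope(groups):
    """Slope from regressing group-demeaned y on group-demeaned x.

    `groups` is a list of families, each a list of (x, y) pairs.
    """
    sxy = sxx = 0.0
    for members in groups:
        mx = sum(x for x, _ in members) / len(members)
        my = sum(y for _, y in members) / len(members)
        for x, y in members:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx

B_TRUE = 0.08                      # assumed true return to schooling
rng = random.Random(7)
families = []
for j in range(20_000):
    f_j = rng.gauss(0, 0.2)        # family effect, correlated with schooling
    members = []
    for i in range(2):
        s_ij = 12 + 5 * f_j + rng.gauss(0, 1.5)
        members.append((s_ij, 1.0 + B_TRUE * s_ij + f_j + rng.gauss(0, 0.3)))
    families.append(members)

b_within = within_slope(families)  # f_j is removed by demeaning within family
print(round(b_within, 4))
```

Demeaning within each family removes f_j, so the slope lands near the assumed return even though schooling is correlated with the family effect.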
What makes this believable
- No within-family differences
- Might be a problem with siblings generally
  - Parents invest differently
  - Cohort-related differences influence siblings differently
  - Different “inherited” endowment
- More believable with identical twins
A twins sample
- Collect data at the Twins festival in Twinsburg, Ohio
- Survey twins:
  - Are you identical? If both say yes, then included
  - Ever worked in past two years
  - Earnings, education, and other characteristics
- Useful because we also get two measures of shared characteristics, so can control for measurement error
Twins sample issues…
- Sample at Twinsburg is NOT a random sample of twins
  - Benefit: more likely to be similar because attendees are into their “twinness”
  - Cost: not necessarily generalizable, even to other twins
- Attendees are a select segment of the population
  - Generally richer, whiter, more educated, etc.
  - Worry about heterogeneity of effects across some of these categories
External validity
- Twins may not be very comparable to other families: they face different costs and benefits to schooling
- Twinsburg sample not representative of twins
  - Maybe not even externally valid for twins
  - Worry that selection into the sample will give us an estimate that is not consistent with the population average
- No family effect, cross-section regression
- Control for avg. family schooling (“ability” measure)
- Fixed effects (same as first difference w/ only two obs/family)
Where’s the variation
- Recall our estimating equation:
$$\log(y_{ij}) = \hat{a} + \hat{b}S_{ij} + \hat{c}X_{ij} + \hat{d}X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}$$
- If S_ij is the same in both twins, there is no contribution to the estimate of b
- b is only estimated off of twins who differ from each other in schooling investments
Correlation Matrix for Twins
Education of twin 1, reported by twin 1
Education of twin 1, reported by twin 2
ALL of the identification for b comes from the 25% of twins who don’t have the same schooling
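The point that only twins with different schooling identify b can be verified directly. This is a sketch on simulated pairs: the 75% share of identical schooling, the schooling levels, and the return of 0.08 are assumptions, and `fd_slope` is a hypothetical helper for the first-difference estimator with two observations per family.

```python
import random

def fd_slope(pairs):
    """First-difference slope (no intercept): sum(dS * dy) / sum(dS ** 2)."""
    num = sum((s1 - s2) * (y1 - y2) for (s1, y1), (s2, y2) in pairs)
    den = sum((s1 - s2) ** 2 for (s1, y1), (s2, y2) in pairs)
    return num / den

B_TRUE = 0.08                      # assumed true return to schooling
rng = random.Random(3)
pairs = []
for j in range(10_000):
    f_j = rng.gauss(0, 0.2)        # shared family effect, differenced out below
    s1 = rng.choice([12, 14, 16])
    # Assumed: 75% of twin pairs have identical schooling.
    s2 = s1 if rng.random() < 0.75 else s1 + rng.choice([-2, -1, 1, 2])
    y1 = 1.0 + B_TRUE * s1 + f_j + rng.gauss(0, 0.3)
    y2 = 1.0 + B_TRUE * s2 + f_j + rng.gauss(0, 0.3)
    pairs.append(((s1, y1), (s2, y2)))

differ = [p for p in pairs if p[0][0] != p[1][0]]   # pairs with dS != 0
b_all = fd_slope(pairs)
b_differ = fd_slope(differ)
share = len(differ) / len(pairs)
print(round(share, 3), round(b_all, 4), round(b_differ, 4))
```

Pairs with identical schooling add zero to both the numerator and the denominator, so dropping them leaves the estimate unchanged; the effective sample is the minority of pairs that differ.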
Measurement Error
- It seems that twins are not perfect at reporting each other’s schooling: 5-10% measurement error
- May be generating a different bias
- Can use instrumental variables to try to address this (more on this after we do Instrumental Variables methods)
- Need to worry about data quality too; can’t just worry about OVB
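One way to see why measurement error matters more in a within-twin design is a simulation with two noisy reports of each twin's schooling. Everything here is an assumption for illustration: classical, independent reporting errors whose variance is about 5% of reported schooling variance in levels, a true return of 0.10, and the hypothetical helpers `fd_slope` and `iv_slope`.

```python
import random

def fd_slope(dx, dy):
    """No-intercept slope of within-pair differences: sum(dx*dy) / sum(dx^2)."""
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def iv_slope(dx, dz, dy):
    """Simple IV slope using dz as an instrument for dx: sum(dz*dy) / sum(dz*dx)."""
    return sum(z * y for z, y in zip(dz, dy)) / sum(z * x for z, x in zip(dz, dx))

B_TRUE = 0.10                      # assumed true return to schooling
rng = random.Random(11)
d_true, d_self, d_cotwin, d_logy = [], [], [], []
for j in range(20_000):
    f_j = rng.gauss(0, 1)          # family effect shared by both twins
    s = [12 + 2 * f_j + rng.gauss(0, 1) for _ in range(2)]      # true schooling
    y = [B_TRUE * s_i + f_j + rng.gauss(0, 0.3) for s_i in s]   # log earnings
    self_rep = [s_i + rng.gauss(0, 0.5) for s_i in s]           # own report
    cotwin_rep = [s_i + rng.gauss(0, 0.5) for s_i in s]         # co-twin's report
    d_true.append(s[0] - s[1])
    d_self.append(self_rep[0] - self_rep[1])
    d_cotwin.append(cotwin_rep[0] - cotwin_rep[1])
    d_logy.append(y[0] - y[1])

b_fd_true = fd_slope(d_true, d_logy)            # infeasible: true schooling
b_fd_noisy = fd_slope(d_self, d_logy)           # attenuated toward zero
b_iv = iv_slope(d_self, d_cotwin, d_logy)       # second report as instrument
print(round(b_fd_true, 4), round(b_fd_noisy, 4), round(b_iv, 4))
```

Differencing removes most of the true variation in schooling between twins but none of the reporting noise, so under these assumptions the noise share rises from about 5% in levels to 20% in differences and the within-twin slope is attenuated by about that much. Using the second report as an instrument undoes the attenuation in this setup, provided the two reporting errors are independent.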
Limitations of Fixed Effects
- Relies on within variation
  - Not transparent what is generating that variation
  - The variation that’s left may be “random” but may be limited in its external validity
- Must be the case that there is NO within-group variation in the unobserved effect AND homogeneous effects between groups (i.e. b is the same across groups)
  - May be less believable if family inputs have non-linear effects on income or education
What did we learn today
- When we have unobserved group effects, there can be two issues:
  - Uncorrelated with X’s: OLS not efficient, can fix this with GLS
  - Correlated with X’s: OVB, can include “fixed effects”
- Fixed effects, within-group differences, and deviations from means can all remove bias from an unobserved group effect
Next Class
- Application: The effect of schooling on wages
  - Ability bias
  - Fixing this with “twins” and “siblings” models