Application 3: Estimating the Effect of Education on Earnings


Methods of Economic Investigation

Lecture 9


Quick Asymptotics reminder…

In class: not really about “proving” consistency or asymptotic bias in estimates

When appropriate, will mention bias terms that are asymptotically zero but nonzero in finite samples


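What should you know?

What happens to something in its probability limit

That our estimates will, in the limit as N goes to infinity and under regularity conditions, behave as follows:

$$\operatorname{plim}\,(X'X)^{-1}X'y = \operatorname{plim}\left(\tfrac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\tfrac{1}{N}X'y\right) = \beta + \operatorname{plim}\left(\tfrac{1}{N}X'X\right)^{-1}\operatorname{plim}\left(\tfrac{1}{N}X'\varepsilon\right)$$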
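A minimal sketch of what the probability limit means in practice, on simulated data. The true slope of 2.0, the standard normal errors, and the function names are all assumptions made for illustration; nothing here comes from the lecture's data.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slope(n, beta=2.0, seed=0):
    """Draw n observations from y = 1 + beta * x + error and estimate beta."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # The error is drawn independently of x, so plim (1/N) X'e = 0.
    y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x]
    return ols_slope(x, y)

# Print the estimate at a few sample sizes (hypothetical values of N).
for n in (100, 10_000, 100_000):
    print(n, round(simulate_slope(n), 4))
```

In any single draw the estimate differs from 2.0, but the gap is typically much smaller at the largest N, which is the sense in which the second term of the plim expression vanishes.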

What you do not need to know

Behind these results are various theorems:

Laws of Large Numbers for plims

Central Limit Theorems for asymptotic normality

Various mathematical conditions, e.g. the continuous mapping theorem

You do not have to know which theorems you are using

You do not have to be able to prove these results with the theorems

Bottom Line…

Understand the role of N→∞:

The mean of the sample mean is μ

The variance of the sample mean goes to zero

If something is scaled appropriately (e.g. the deviation of the sample mean from μ multiplied by $\sqrt{N}$), it can converge in distribution

So far, we have typically relied on the concept of “bias”, but in large samples consistency is the more useful term

If bias is decreasing as the sample is increasing, then worry less about it

If even in large samples our estimate is not close to the true value, worry more about it
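The three statements about the sample mean can be checked with a small simulation. This is only a sketch: the normal population with μ = 5 and σ = 2, the sample sizes, and the function names are assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(n, reps=1000, mu=5.0, sigma=2.0, seed=1):
    """Return `reps` sample means, each computed from n normal draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

def summarize(n, mu=5.0):
    """Mean of the sample means, their variance, and the sqrt(N)-scaled variance."""
    means = sample_means(n, mu=mu)
    var_mean = statistics.pvariance(means)
    # Scaling the deviation by sqrt(N) keeps the spread from collapsing.
    scaled = [(n ** 0.5) * (m - mu) for m in means]
    var_scaled = statistics.pvariance(scaled)
    return statistics.fmean(means), var_mean, var_scaled

for n in (10, 400):
    mean_of_means, var_mean, var_scaled = summarize(n)
    print(n, round(mean_of_means, 3), round(var_mean, 4), round(var_scaled, 2))
```

The unscaled variance shrinks roughly like σ²/N, while the scaled variance stays near σ² = 4 at both sample sizes. That stable spread is why a scaled quantity can have a limiting distribution.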

Today’s Lecture

Review: error component models (Fixed Effects, Random Effects)

Application: Estimating the Effects of Education on Earnings

Difficulty in causal estimation

Within-family estimator

Some limitations of fixed effects

Error has different components

Suppose we had to estimate

$$Y_{ij} = \beta X_{ij} + \varepsilon_{ij}, \quad \text{where} \quad \varepsilon_{ij} = \eta_j + \nu_{ij}$$

If the unobserved factors are uncorrelated with the X’s: can do OLS with robust standard errors, or FGLS

If the unobserved factors are correlated with the X’s: can include a group-specific fixed effect

Fixed effects versus “Dummy Variables”

These are not mutually exclusive categories

A dummy variable is just a categorical variable that is sometimes zero and sometimes one

“Control” variables, which have a direct meaning, may sometimes be dummy variables

Fixed effects, which tell us something about the structure of our error term, are also dummy variables

Motivation for today’s example…

Want to know why people earn different amounts

Specifically, what are the returns, in terms of increased wages, to the various investments people make?

Most common labor-improving investment: education

Motivation-2

Simple linear regression first introduced by Mincer:

$$\log(y_{ij}) = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + e_{ij}$$

Measure of schooling S: we’re going to use years of education

Experience X: we’re going to include a quadratic specification, which is the most commonly used

Index this by individual i in group j
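A short worked reading of the coefficients, as a sketch that takes the Mincer equation above at face value. The symbol $X^*$ for the experience level at which earnings peak is introduced here only for illustration.

$$\frac{\partial \log(y_{ij})}{\partial S_{ij}} = b, \qquad \frac{\partial \log(y_{ij})}{\partial X_{ij}} = c + 2dX_{ij}$$

So one more year of schooling changes earnings by roughly $100 \cdot b$ percent when b is small. If $c > 0$ and $d < 0$, setting $c + 2dX_{ij} = 0$ gives an experience profile that peaks at $X^* = -c/(2d)$.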

Basic Problem with estimating this

Lots of reasons why different people may invest at different levels of education

Some of those reasons are probably correlated with how much money a person would earn as well as how much they will invest in education:

Unobserved “ability”

Family factors, such as income, parental involvement, genetic factors, etc.

How might these bias our estimates?

Let’s say what we want to estimate is:

$$\log(y_{ij}) = a + bS_{ij} + cX_{ij} + dX_{ij}^2 + f_j + e_{ij}$$

Interpret higher f as something like family income or family investment

Recall the OVB formula. We care about two things:

Correlation between f and y: probably positive

Correlation between f and S: positive

With both correlations positive, omitting f biases the OLS estimate of b upward
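The direction and size of the omitted variable bias can be illustrated on simulated data. This is a sketch under loud assumptions: every parameter value is invented, the experience terms are left out to keep it short, and the family factor enters log earnings with a coefficient of one, as in the equation above.

```python
import random

def cov(x, y):
    """Sample covariance (dividing by n) of two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n

def simulate_families(n_families=5000, b=0.08, seed=2):
    """Two siblings per family; f_j raises both schooling and log earnings."""
    rng = random.Random(seed)
    schooling, family, log_y = [], [], []
    for _ in range(n_families):
        f_j = rng.gauss(0, 0.3)  # unobserved family factor (hypothetical scale)
        for _ in range(2):
            # Schooling depends on f_j, so Corr(f, S) > 0 by construction.
            s = 12 + 3.0 * f_j + rng.gauss(0, 1.5)
            schooling.append(s)
            family.append(f_j)
            log_y.append(1.0 + b * s + f_j + rng.gauss(0, 0.3))
    return schooling, family, log_y

S, f, log_y = simulate_families()
b_short = cov(S, log_y) / cov(S, S)  # OLS slope when f is omitted
ovb_term = cov(S, f) / cov(S, S)     # slope from regressing f on S
print(round(b_short, 3), round(0.08 + ovb_term, 3))
```

The short-regression slope lands well above the assumed true return of 0.08 and close to 0.08 plus the slope of f on S, which is what the OVB formula predicts for this setup.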

Why is OLS biased?

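[Figure: earnings Y plotted against schooling S, with a separate conditional expectation line for each family (labeled j = 1, j = 2 and j = 3) and the pooled OLS line through all families.]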

How could we fix this?

Some of the unobserved differences that bias a cross-sectional comparison of education and earnings are based on family characteristics

Key assumption: within families, these differences are fixed

If we observe multiple individuals with exactly the same family effect, then we can difference out the group effect

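Estimating Family Averages

Can look at differences within a family

Think of this as a different CEF for each family (the intercept a and the family effect f_j difference out):

$$E[Y_{ij} - \bar{Y}_j \mid S, X, f] = b(S_{ij} - \bar{S}_j) + c(X_{ij} - \bar{X}_j) + d(X_{ij}^2 - \overline{X^2}_j)$$

The way we estimate this:

$$\log(y_{ij}) = \hat{a} + \hat{b}S_{ij} + \hat{c}X_{ij} + \hat{d}X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}$$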
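A minimal sketch of the deviation-from-family-means estimator, compared with pooled OLS on simulated sibling data. The data-generating process, the parameter values, and the helper names (`demean_by_family`, `simulate`) are assumptions for illustration, and experience is again omitted for brevity.

```python
import random

def slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def demean_by_family(values, family_ids):
    """Subtract each family's mean from its members' values."""
    totals, counts = {}, {}
    for v, j in zip(values, family_ids):
        totals[j] = totals.get(j, 0.0) + v
        counts[j] = counts.get(j, 0) + 1
    return [v - totals[j] / counts[j] for v, j in zip(values, family_ids)]

def simulate(n_families=5000, b=0.08, seed=5):
    """Two siblings per family with a shared, unobserved family effect f_j."""
    rng = random.Random(seed)
    ids, schooling, log_y = [], [], []
    for j in range(n_families):
        f_j = rng.gauss(0, 0.3)
        for _ in range(2):
            s = 12 + 3.0 * f_j + rng.gauss(0, 1.5)
            ids.append(j)
            schooling.append(s)
            log_y.append(1.0 + b * s + f_j + rng.gauss(0, 0.3))
    return ids, schooling, log_y

ids, S, log_y = simulate()
b_pooled = slope(S, log_y)  # ignores the family effect
b_within = slope(demean_by_family(S, ids), demean_by_family(log_y, ids))
print(round(b_pooled, 3), round(b_within, 3))
```

Because f_j is constant within a family, subtracting family means removes it, and the within estimate sits near the assumed true value of 0.08 while the pooled estimate does not.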

What makes this believable

No within-family differences in the unobserved factors

Might be a problem with siblings generally:

Parents invest differently

Cohort-related differences influence siblings differently

Different “inherited” endowment

More believable with identical twins

A twins sample

Collect data at the Twins festival in Twinsburg, Ohio

Survey twins:

Are you identical? If both say yes, then included

Ever worked in past two years

Earnings, education, and other characteristics

Useful because we also get two measures of shared characteristics, so can control for measurement error

Twins sample issues…

Sample at Twinsburg is NOT a random sample of twins

Benefit: more likely to be similar because attendees are into their “twinness”

Cost: not necessarily generalizable, even to other twins

Attendees are a select segment of the population

Generally richer, whiter, more educated, etc.

Worry about heterogeneity of effects across some of these categories

External validity

Twins may not be very comparable to other families: they face different costs and benefits to schooling

Twinsburg sample not representative of twins

Maybe not even externally valid for twins

Worry that selection into the sample will give us an estimate that is not consistent with the population average


[Table of regression estimates; annotations on its columns:]

No family effect: cross-section regression

Control for average family schooling, an “ability” measure

Fixed effects (same as first difference with only two observations per family)

Where’s the variation?

Recall our estimating equation:

$$\log(y_{ij}) = \hat{a} + \hat{b}S_{ij} + \hat{c}X_{ij} + \hat{d}X_{ij}^2 + \hat{f}_j + \hat{e}_{ij}$$

If S_ij is the same for both twins, the pair makes no contribution to the estimate of b

b is estimated only off of twins who differ from each other in schooling investments
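With two observations per family, the fixed-effects estimate of b can be written as a no-intercept regression of the within-pair earnings difference on the within-pair schooling difference. The sketch below uses that form to show that pairs with identical schooling drop out. The simulated data, the 25% share of discordant pairs, and the function names are assumptions for illustration.

```python
import random

def fd_slope(pairs):
    """No-intercept slope from (delta_S, delta_log_y) twin-pair differences."""
    numerator = sum(d_s * d_y for d_s, d_y in pairs)
    denominator = sum(d_s * d_s for d_s, _ in pairs)
    return numerator / denominator

def simulate_twin_pairs(n_pairs=4000, b=0.08, share_discordant=0.25, seed=3):
    """Return within-pair differences; most pairs share the same schooling."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        f_j = rng.gauss(0, 0.3)  # family effect, identical for both twins
        s1 = rng.choice([12, 14, 16])
        s2 = s1
        if rng.random() < share_discordant:
            s2 = s1 + rng.choice([-4, -2, -1, 1, 2, 4])
        y1 = 1.0 + b * s1 + f_j + rng.gauss(0, 0.2)
        y2 = 1.0 + b * s2 + f_j + rng.gauss(0, 0.2)
        pairs.append((s1 - s2, y1 - y2))
    return pairs

pairs = simulate_twin_pairs()
discordant = [p for p in pairs if p[0] != 0]
b_all = fd_slope(pairs)
b_discordant = fd_slope(discordant)
print(len(discordant) / len(pairs), round(b_all, 4), round(b_discordant, 4))
```

Pairs with a zero schooling difference add zero to both the numerator and the denominator, so the estimate from all pairs matches the estimate from the discordant pairs alone.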

[Table: Correlation Matrix for Twins. Labeled entries: education of twin 1 as reported by twin 1, and education of twin 1 as reported by twin 2.]

ALL of the identification for b comes from the 25% of twins who don’t have the same schooling

Measurement Error

Seems that twins are not perfect at reporting each other’s schooling: 5-10% measurement error

May be generating a different bias

Can use instrumental variables to try to address this (more on this after we do Instrumental Variables methods)

Need to worry about data quality too; can’t just worry about OVB
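One way measurement error can matter more after differencing is sketched below. It assumes classical measurement error (noise independent of true schooling and of earnings), which the lecture does not state, and every parameter value is invented. By construction the noise is about 9% of the variance of reported schooling in levels but about a third of the variance of within-pair differences.

```python
import random

def fd_slope(d_x, d_y):
    """No-intercept slope of within-pair differences d_y on d_x."""
    return sum(a * b for a, b in zip(d_x, d_y)) / sum(a * a for a in d_x)

def simulate(n_pairs=10_000, b=0.08, noise_sd=0.7, seed=4):
    """Twin pairs with true schooling and noisy reported schooling."""
    rng = random.Random(seed)
    d_true, d_reported, d_log_y = [], [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 2.0)  # schooling component common to both twins
        f_j = rng.gauss(0, 0.3)     # family effect on earnings
        s = [12 + shared + rng.gauss(0, 1.0) for _ in range(2)]
        # Reported schooling = true schooling + classical noise (assumption).
        reported = [s_i + rng.gauss(0, noise_sd) for s_i in s]
        y = [1.0 + b * s_i + f_j + rng.gauss(0, 0.2) for s_i in s]
        d_true.append(s[0] - s[1])
        d_reported.append(reported[0] - reported[1])
        d_log_y.append(y[0] - y[1])
    return d_true, d_reported, d_log_y

d_true, d_reported, d_log_y = simulate()
b_true_schooling = fd_slope(d_true, d_log_y)
b_reported_schooling = fd_slope(d_reported, d_log_y)
print(round(b_true_schooling, 4), round(b_reported_schooling, 4))
```

Differencing removes the shared schooling component but not the reporting noise, so the estimate based on reported schooling is pulled toward zero. This sketch does not implement the instrumental variables fix.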


Limitations of Fixed Effects

Relies on within variation

Not transparent what is generating that variation

The variation that’s left may be ‘random’ but may be limited in its external validity

Must be the case that there is NO within-group variation in the unobserved effect AND homogeneous effects between groups (i.e. b is the same across groups)

May be less believable if family inputs have non-linear effects on income or education

What did we learn today?

When we have unobserved group effects, there can be two issues:

Uncorrelated with X’s: OLS not efficient, can fix this with GLS

Correlated with X’s: OVB, can include “fixed effects”

Fixed effects, within-group differences, and deviations from group means can all remove bias from an unobserved group effect


Next Class

Application: The effect of schooling on wages

Ability bias

Fixing this with “twins” and “siblings” models


Date post:	11-Feb-2016
Upload:	armand