
Advanced Econometrics

Dr. Andrea Beccarini

Center for Quantitative Economics

Winter 2013/2014

Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156

General information: Aims and prerequisites

Objective: learn to understand and use advanced econometric estimation techniques

Applications in micro and macro econometrics and finance

Prerequisites: Statistical Foundations (random vectors, stochastic convergence, estimators)


General information: Literature

Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.

Various textbooks


General information: Schedule

Least squares estimation and method of moments

Maximum likelihood estimation

Instrumental variables estimation

GMM

Indirect Inference


Least squares: Linear regression

Multiple linear regression model

$$ y = X\beta + u, \qquad u \sim N(0, \sigma^2 I) $$

OLS estimator

$$ \hat\beta = (X'X)^{-1} X'y $$

Covariance matrix

$$ \mathrm{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1} $$

Gauss-Markov theorem


Least squares: Nonlinear regression

Notation of Davidson and MacKinnon (2004):

$$ y_t = x_t(\beta) + u_t, \qquad u_t \sim \mathrm{IID}(0, \sigma^2) $$

$x_t(\beta)$ is a nonlinear function of the parameter vector $\beta$

Example:

$$ y_t = \beta_1 + \beta_2 x_{t1} + \frac{1}{\beta_2} x_{t2} + u_t $$


Least squares: Nonlinear regression

Minimize the sum of squared residuals

$$ \sum_{t=1}^{T} \left( y_t - x_t(\beta) \right)^2 $$

with respect to $\beta$

Usually, the minimization must be done numerically


Method of moments: Definition of moments

Raw moment of order $p$

$$ \mu_p = E(X^p) $$

Empirical raw moment of order $p$

$$ \hat\mu_p = \frac{1}{n} \sum_{i=1}^{n} X_i^p $$

for a simple random sample $X_1, \ldots, X_n$


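Method of moments: Basic idea (Step 1)

Write $r$ theoretical moments as functions of $r$ unknown parameters

$$ \mu_1 = g_1(\theta_1, \ldots, \theta_r), \quad \ldots, \quad \mu_r = g_r(\theta_1, \ldots, \theta_r) $$

Of course, central moments may be used as well

Method of moments: Basic idea (Step 2)

Invert the system of equations: write the $r$ unknown parameters as functions of the $r$ theoretical moments

$$ \theta_1 = h_1(\mu_1, \ldots, \mu_r), \quad \ldots, \quad \theta_r = h_r(\mu_1, \ldots, \mu_r) $$

Method of moments: Basic idea (Step 3)

Replace all theoretical moments by empirical moments

$$ \hat\theta_1 = h_1(\hat\mu_1, \ldots, \hat\mu_r), \quad \ldots, \quad \hat\theta_r = h_r(\hat\mu_1, \ldots, \hat\mu_r) $$

The estimators $\hat\theta_1, \ldots, \hat\theta_r$ are moment estimators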


Method of moments: Properties of moment estimators

Moment estimators are consistent (provided $h_1$ is continuous), since

$$ \mathrm{plim}\, \hat\theta_1 = \mathrm{plim}\, h_1(\hat\mu_1, \hat\mu_2, \ldots) = h_1(\mathrm{plim}\, \hat\mu_1, \mathrm{plim}\, \hat\mu_2, \ldots) = h_1(\mu_1, \mu_2, \ldots) = \theta_1 $$

In general, moment estimators are not unbiased and not efficient

Since the empirical moments are asymptotically normal (why?), moment estimators are also asymptotically normal → delta method [P]


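Method of moments: Example

Let $X \sim \mathrm{Exp}(\lambda)$ with unknown parameter $\lambda$ and let $X_1, \ldots, X_n$ be a random sample

Step 1: We know that $E(X) = \mu_1 = 1/\lambda$

Step 2 (inversion): $\lambda = 1/\mu_1$

Step 3: The estimator is

$$ \hat\lambda = \frac{1}{\hat\mu_1} = \frac{1}{\frac{1}{n} \sum_i X_i} = \frac{1}{\bar X_n} $$

Is $\hat\lambda$ unbiased?

Alternative: $\mathrm{Var}(X) = 1/\lambda^2$, then $\hat\lambda = 1/\sqrt{S^2}$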
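The two moment estimators of this example can be compared on simulated data. The following is a minimal Python sketch, not one of the course's R scripts; the sample size, the seed, the true value λ = 2 and the function name `moment_estimates` are assumptions made for illustration.

```python
import math
import random
import statistics

def moment_estimates(sample):
    """Return the two moment estimators of lambda for an Exp(lambda) sample."""
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)  # S^2 with divisor n - 1
    return 1.0 / mean, 1.0 / math.sqrt(var)

random.seed(1)
true_lambda = 2.0  # assumed value, used only to generate the data
sample = [random.expovariate(true_lambda) for _ in range(10_000)]
lam_mean, lam_var = moment_estimates(sample)
print(f"1 / mean      : {lam_mean:.3f}")
print(f"1 / sqrt(S^2) : {lam_var:.3f}")
```

Both estimators are consistent, so in a large sample both values should be close to the λ used in the simulation.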










Maximum likelihood: Basic idea

The basic idea is very natural:

Choose the parameters such that the probability (likelihood) of the observations $x_1, \ldots, x_n$ as a function of the unknown parameters $\theta_1, \ldots, \theta_r$ is maximized

Likelihood function

$$ L(\theta; x_1, \ldots, x_n) = \begin{cases} P(X_1 = x_1, \ldots, X_n = x_n; \theta) & \text{(discrete case)} \\ f_{X_1,\ldots,X_n}(x_1, \ldots, x_n; \theta) & \text{(continuous case)} \end{cases} $$

For simple random samples

$$ L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i; \theta) $$

Maximize the likelihood

$$ L(\hat\theta; x_1, \ldots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n) $$

ML estimate: $\hat\theta = \arg\max L(\theta; x_1, \ldots, x_n)$

ML estimator: $\hat\theta = \arg\max L(\theta; X_1, \ldots, X_n)$



Because sums are easier to deal with than products, and because sums are subject to limit laws, it is common to maximize the log-likelihood

$$ \ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta) $$

The ML estimator is the same as before, since the logarithm is strictly increasing:

$$ \hat\theta = \arg\max \ln L(\theta; X_1, \ldots, X_n) = \arg\max L(\theta; X_1, \ldots, X_n) $$



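Usually, we find $\hat\theta$ by solving the system of equations

$$ \partial \ln L / \partial \theta_1 = 0, \quad \ldots, \quad \partial \ln L / \partial \theta_r = 0 $$

The gradient vector $g(\theta) = \partial \ln L(\theta) / \partial \theta$ is called the score vector or score

If the log-likelihood is not differentiable, other maximization methods must be used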


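Maximum likelihood: Example

Let $X \sim \mathrm{Exp}(\lambda)$ with density $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $f(x; \lambda) = 0$ otherwise

Likelihood of an i.i.d. random sample

$$ L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} $$

Log-likelihood

$$ \ln L(\lambda; x_1, \ldots, x_n) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i $$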



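Set the derivative to zero

$$ \frac{\partial \ln L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \overset{!}{=} 0, $$

hence

$$ \hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar x} $$

The ML estimator for $\lambda$ is

$$ \hat\lambda = \frac{1}{\bar X} $$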
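As a quick numerical check of this derivation, the log-likelihood can be evaluated at the closed-form estimate and at nearby values. This is a minimal Python sketch under assumed settings (simulated data with λ = 3, n = 500, fixed seed); the function name `log_likelihood` is chosen here for illustration.

```python
import math
import random

def log_likelihood(lam, xs):
    """Exponential log-likelihood n * ln(lam) - lam * sum(x)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

random.seed(2)
xs = [random.expovariate(3.0) for _ in range(500)]  # assumed true lambda = 3
lam_ml = len(xs) / sum(xs)  # closed-form ML estimate 1 / x-bar

# The log-likelihood at the closed-form estimate should beat nearby values.
for factor in (0.9, 0.99, 1.01, 1.1):
    assert log_likelihood(lam_ml, xs) > log_likelihood(factor * lam_ml, xs)
print(f"ML estimate: {lam_ml:.3f}")
```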


Maximum likelihood: Properties of ML estimators (Preliminaries)

The log-likelihood and the score vector are

$$ \ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta) $$

$$ \frac{\partial \ln L(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \ln f_X(X_i; \theta)}{\partial \theta} $$

The contributions $\ln f_X(X_i; \theta)$ are random variables

The contributions $\partial \ln f_X(X_i; \theta) / \partial \theta$ are random vectors

Hence, limit laws can be applied to the (normalized) sums



For all $\theta$

$$ \int e^{\ln L(\theta)} \, dx = \int L(\theta; x_1, \ldots, x_n) \, dx = 1 $$

since $L(\theta)$ is a joint density function of $X_1, \ldots, X_n$



Define the matrix $G(\theta, X_1, \ldots, X_n)$ of gradient contributions

$$ G_{ij}(\theta, X_i) = \frac{\partial \ln f_X(X_i; \theta)}{\partial \theta_j} $$

The column sums are the gradient vector with elements

$$ g_j(\theta) = \sum_{i=1}^{n} G_{ij}(\theta, X_i) $$

The expected gradient vector is $E_\theta(g(\theta)) = 0$ [P]



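The covariance matrix of the gradient vector

$$ \mathrm{Cov}(g(\theta)) = E\left( g(\theta) g(\theta)' \right) $$

is called the information matrix (and often denoted $I(\theta)$)

Information matrix equality [P]

$$ \mathrm{Cov}(g(\theta)) = -E(H(\theta)) $$

$$ \mathrm{Cov}\left( \frac{\partial \ln L(\theta)}{\partial \theta} \right) = -E\left( \frac{\partial^2 \ln L(\theta)}{\partial \theta \partial \theta'} \right) $$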
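The information matrix equality can be verified by hand for the exponential example treated earlier. The following worked derivation is an illustration added here, not the proof referred to by [P]; it only uses $E(X_i) = 1/\lambda$ and $\mathrm{Var}(X_i) = 1/\lambda^2$.

```latex
\begin{aligned}
g(\lambda) &= \frac{\partial \ln L(\lambda)}{\partial \lambda}
            = \frac{n}{\lambda} - \sum_{i=1}^{n} X_i,
  \qquad E(g(\lambda)) = \frac{n}{\lambda} - \frac{n}{\lambda} = 0, \\
\operatorname{Var}(g(\lambda)) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{n}{\lambda^2}, \\
H(\lambda) &= \frac{\partial^2 \ln L(\lambda)}{\partial \lambda^2} = -\frac{n}{\lambda^2}
  \quad\Longrightarrow\quad
  -E(H(\lambda)) = \frac{n}{\lambda^2} = \operatorname{Var}(g(\lambda)).
\end{aligned}
```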


Maximum likelihood: Properties of ML estimators

1 Equivariance: If $\hat\theta$ is the ML estimator for $\theta$, then $h(\hat\theta)$ is the ML estimator for $h(\theta)$

2 Consistency: $\mathrm{plim}\, \hat\theta_n = \theta$

3 Asymptotic normality:

$$ \sqrt{n} (\hat\theta_n - \theta) \xrightarrow{d} U \sim N(0, V(\theta)) $$

4 Asymptotic efficiency: $V(\theta)$ is the Cramér-Rao bound

5 Computability (analytical or numerical); the covariance matrix of the estimator is a by-product of the numerical method



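Equivariance:

Let $\hat\theta$ be the ML estimator of $\theta$

Let $\psi = h(\theta)$ be a one-to-one function of $\theta$ with inverse $h^{-1}(\psi) = \theta$

Then the ML estimator of $\psi$ satisfies

$$ \frac{d \ln L(h^{-1}(\psi))}{d\psi} = \frac{d \ln L(\theta)}{d\theta} \cdot \frac{d h^{-1}(\psi)}{d\psi} = 0 $$

which holds at $\hat\psi = h(\hat\theta)$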



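Consistency

The parameter $\theta$ is identified if for all $\theta' \ne \theta$ and data $x_1, \ldots, x_n$

$$ \ln L(\theta' \mid x_1, \ldots, x_n) \ne \ln L(\theta \mid x_1, \ldots, x_n) $$

The parameter $\theta$ is asymptotically identified if for all $\theta' \ne \theta_0$

$$ \mathrm{plim}\, \frac{1}{n} \ln L(\theta') \ne \mathrm{plim}\, \frac{1}{n} \ln L(\theta_0) $$

where $\theta_0$ is the true value of the parameter [P]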



Asymptotic normality

By definition, the ML estimator satisfies

$$ g(\hat\theta) = 0 $$

A first-order Taylor series expansion of $g$ around the true parameter vector $\theta_0$ gives [P]

$$ g(\hat\theta) = g(\theta_0) + H(\theta_0) (\hat\theta - \theta_0) + \text{remainder} $$


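Maximum likelihood: Covariance matrix estimation

The (approximate) covariance matrix of $\hat\theta$ is

$$ \mathrm{Cov}(\hat\theta) = -\left[ E(H(\theta_0)) \right]^{-1} = -\left[ E\left( \frac{\partial^2 \ln L(\theta_0)}{\partial \theta_0 \partial \theta_0'} \right) \right]^{-1} $$

A consistent estimator of $\mathrm{Cov}(\hat\theta)$ is

$$ \widehat{\mathrm{Cov}}(\hat\theta) = -\left[ H(\hat\theta) \right]^{-1} = -\left( \frac{\partial^2 \ln L(\hat\theta)}{\partial \theta \partial \theta'} \right)^{-1} $$

Often, $H(\hat\theta)$ is a by-product of numerical optimization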


Maximum likelihood: Covariance matrix estimation

An alternative consistent covariance matrix estimator is

$$ \widehat{\mathrm{Cov}}(\hat\theta) = \left[ G(\hat\theta; X_1, \ldots, X_n)' \, G(\hat\theta; X_1, \ldots, X_n) \right]^{-1} $$

This estimator is called the outer-product-of-the-gradient (OPG) estimator

Advantage: Only the first derivatives are required

Disadvantage: Less reliable in small samples



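Numerical estimation of the parameters of $N(\mu, \sigma^2)$

Let $X_1, \ldots, X_{50}$ be a random sample from $X \sim N(\mu, \sigma^2)$ with $\mu = 5$ and $\sigma^2 = 9$

Density function

$$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \cdot \frac{(x - \mu)^2}{\sigma^2} \right) $$

Log-likelihood function $\ln L(\mu, \sigma^2) = \sum_{i=1}^{n} \ln f_X(x_i)$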



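See numnormal.R

Point estimates

$$ \begin{pmatrix} \hat\mu \\ \hat\sigma^2 \end{pmatrix} = \begin{pmatrix} 3.64025 \\ 6.90869 \end{pmatrix} $$

Estimated covariance matrix derived numerically from $H(\hat\theta)$

$$ \widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & -0.00016 \\ -0.00016 & 1.90918 \end{pmatrix} $$

Estimated covariance matrix derived from theory

$$ \widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & 0 \\ 0 & 1.90920 \end{pmatrix} $$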
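The theory-based covariance matrix follows from the inverse information matrix of the normal model, which is diagonal with entries $\sigma^2/n$ and $2\sigma^4/n$. The sketch below is a Python illustration, not the numnormal.R script: it computes the closed-form ML estimates and plugs them into these two expressions. The simulated data use a different random number generator, so the numbers will not match the slide; the function name `normal_ml` is an assumption.

```python
import random

def normal_ml(xs):
    """ML estimates of (mu, sigma^2) and their theoretical covariance matrix."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # ML uses divisor n
    # Inverse information matrix evaluated at the estimates.
    cov = [[sigma2_hat / n, 0.0],
           [0.0, 2.0 * sigma2_hat ** 2 / n]]
    return mu_hat, sigma2_hat, cov

random.seed(3)
xs = [random.gauss(5.0, 3.0) for _ in range(50)]  # mu = 5, sigma^2 = 9
mu_hat, sigma2_hat, cov = normal_ml(xs)
print(mu_hat, sigma2_hat, cov)
```

With the point estimate $\hat\sigma^2 = 6.90869$ and $n = 50$ from the slide, the same two expressions give $6.90869/50 \approx 0.13817$ and $2 \cdot 6.90869^2 / 50 \approx 1.90920$, which agrees with the theory-based matrix shown above.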


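Maximum likelihood: Example of violated regularity conditions

Let $X$ be uniformly distributed on the interval $[0, \theta]$

The density function is

$$ f_X(x) = \begin{cases} 1/\theta & \text{for } 0 \le x \le \theta \\ 0 & \text{else} \end{cases} $$

The likelihood function is

$$ L(\theta \mid x_1, \ldots, x_n) = \begin{cases} \left( \frac{1}{\theta} \right)^n & \text{for } \theta \ge \max_i x_i \\ 0 & \text{else} \end{cases} $$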


Maximum likelihood: Example of violated regularity conditions

[Figure: likelihood $L(\theta)$ plotted against $\theta$, horizontal axis from 0 to 6]

$L(\theta)$ is not differentiable at $\max_i x_i$

Maximum is at $\hat\theta = \max_i x_i$

The estimator is consistent but not asymptotically normal

Illustration in R
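The non-standard behaviour can also be seen in a small simulation. The following Python sketch is an illustration under assumed settings (θ = 5, n = 200, 2000 replications); it is not the R illustration referred to on the slide. It records the scaled error $n(\theta - \hat\theta)$, which is never negative and stays of order one, so the estimator converges at rate $n$ rather than $\sqrt{n}$ and its limiting distribution is one-sided.

```python
import random

def uniform_mle(xs):
    """ML estimate of theta for U[0, theta]: the sample maximum."""
    return max(xs)

random.seed(4)
theta = 5.0  # assumed true value
n = 200
errors = []
for _ in range(2000):
    xs = [random.uniform(0.0, theta) for _ in range(n)]
    errors.append(n * (theta - uniform_mle(xs)))  # scaled estimation error
print(min(errors), sum(errors) / len(errors))
```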


Maximum likelihood: Dependent observations

Maximum likelihood estimation is still possible if the observations are dependent

The joint density of the observations

$$ f_{X_1,\ldots,X_T}(x_1, \ldots, x_T) $$

can be factorized as

$$ f_{X_1}(x_1) \cdot \prod_{t=2}^{T} f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) $$


Maximum likelihood: Dependent observations

Log-likelihood

$$ \ln L = \ln f_{X_1}(x_1) + \sum_{t=2}^{T} \ln f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) $$

If $T$ is large, one may ignore $\ln f_{X_1}(x_1)$

Computing the log-likelihood is straightforward if

$$ f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) = f_{X_t \mid X_{t-1} = x_{t-1}}(x_t) $$
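A standard case in which this Markov property holds is the Gaussian AR(1) process. The example below is an illustration chosen here (it does not appear on the slides); the symbols $\rho$ and $\sigma^2$ are assumptions of the example. Dropping the term $\ln f_{X_1}(x_1)$, the log-likelihood is a sum of normal log-densities with conditional mean $\rho x_{t-1}$.

```latex
\begin{aligned}
x_t &= \rho x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)
  \quad\Longrightarrow\quad
  X_t \mid X_{t-1} = x_{t-1} \sim N(\rho x_{t-1}, \sigma^2), \\
\ln L(\rho, \sigma^2) &\approx -\frac{T-1}{2} \ln(2\pi\sigma^2)
  - \frac{1}{2\sigma^2} \sum_{t=2}^{T} (x_t - \rho x_{t-1})^2 .
\end{aligned}
```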


Maximum likelihood: The three classical tests

Wald test, Lagrange multiplier test and likelihood ratio test (W, LM, LR)

Hypotheses: $H_0: r(\theta) = 0$ vs $H_1: r(\theta) \ne 0$

Often, $r$ is a scalar-valued function and $\theta$ is a scalar

The function $r$ may be non-linear!



Basic test ideas:

Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0

Likelihood ratio test: If $r(\theta) = 0$ is true, then $\ln L(\hat\theta_R)$ will not be far below $\ln L(\hat\theta_{ML})$

Lagrange multiplier test: If $r(\theta) = 0$ is true, the score function $g(\hat\theta_R) = \partial \ln L(\hat\theta_R) / \partial \theta$ will be close to 0



Example:

Let $X_1, \ldots, X_n$ be a random sample from $X \sim \mathrm{Exp}(\lambda)$

Test $H_0: \lambda = 4$ against $H_1: \lambda \ne 4$

Different notation: $H_0: r(\lambda) = 0$ where $r(\lambda) = \lambda - 4$

See threetests.R


Maximum likelihood: Wald test

Wald test

Hypotheses

$$ H_0: r(\theta) = 0 \quad \text{vs} \quad H_1: r(\theta) \ne 0 $$

with functions $r = (r_1, \ldots, r_m)$

$m$ is the number of restrictions

Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0



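Asymptotically, under $H_0$ (by delta method!)

$$ r(\hat\theta_{ML}) \sim N\left( 0, \mathrm{Cov}(r(\hat\theta_{ML})) \right) $$

with

$$ \mathrm{Cov}(r(\hat\theta_{ML})) = \frac{\partial r(\hat\theta_{ML})}{\partial \theta'} \cdot \mathrm{Cov}(\hat\theta_{ML}) \cdot \frac{\partial r(\hat\theta_{ML})'}{\partial \theta} $$

Remember: If $X \sim N(\mu, \Sigma)$ is $m$-dimensional, then $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2_m$

Wald test statistic

$$ W = r(\hat\theta_{ML})' \left[ \mathrm{Cov}(r(\hat\theta_{ML})) \right]^{-1} r(\hat\theta_{ML}) \overset{asy}{\sim} \chi^2_m $$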



Remarks:

Reject $H_0$ if $W$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution

Usually, $\mathrm{Cov}(r(\hat\theta_{ML}))$ must be replaced by its estimate $\widehat{\mathrm{Cov}}(r(\hat\theta_{ML}))$

The Wald test is not invariant with respect to re-parametrizations

The Wald test only requires the unrestricted ML estimator

Ideal if $\hat\theta_{ML}$ is much easier to calculate than $\hat\theta_R$


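Maximum likelihood: Likelihood ratio test

Likelihood ratio test

Is $\ln L(\hat\theta_{ML})$ significantly larger than $\ln L(\hat\theta_R)$?

LR test statistic

$$ LR = -2 \ln\left( \frac{L(\hat\theta_R)}{L(\hat\theta_{ML})} \right) = -2 \left( \ln L(\hat\theta_R) - \ln L(\hat\theta_{ML}) \right) $$

Asymptotic distribution: $LR \overset{asy}{\sim} \chi^2_m$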


Maximum likelihood: Likelihood ratio test

Remarks:

Reject $H_0$ if $LR$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution

To compute $LR$, one requires both the unrestricted estimator $\hat\theta_{ML}$ and the restricted estimator $\hat\theta_R$

Ideal if both $\hat\theta_{ML}$ and $\hat\theta_R$ are easy to calculate

The LR test is often used to compare different models to each other


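Maximum likelihood: Lagrange multiplier test

Lagrange multiplier test

Is $g(\hat\theta_R)$ significantly different from 0?

The test is based on the restricted estimator $\hat\theta_R$

Lagrange approach: $\max_\theta \ln L(\theta)$ s.t. $r(\theta) = 0$

LM test statistic

$$ LM = g(\hat\theta_R)' \cdot \left[ I(\hat\theta_R) \right]^{-1} \cdot g(\hat\theta_R) \overset{asy}{\sim} \chi^2_m $$

with

$$ I(\hat\theta_R) = -E\left( \frac{\partial^2 \ln L(\hat\theta_R)}{\partial \theta \partial \theta'} \right) $$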


Maximum likelihood: Lagrange multiplier test

Remarks:

Reject $H_0$ if $LM$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution

The LM test only requires the restricted estimator

Ideal if $\hat\theta_R$ is much easier to calculate than $\hat\theta_{ML}$

The LM test is often used to test misspecifications (heteroskedasticity, autocorrelation, omitted variables etc.)

Asymptotically, the three tests are equivalent
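For the exponential example with $H_0: \lambda = \lambda_0$, all three statistics have closed forms, because $\hat\lambda_{ML} = 1/\bar x$, the score is $n/\lambda - \sum_i x_i$ and the information is $n/\lambda^2$. The following Python sketch is an illustration of these formulas, not the threetests.R script; the function name `three_tests`, the sample size and the seed are assumptions.

```python
import math
import random

def three_tests(xs, lam0):
    """Wald, LR and LM statistics for H0: lambda = lam0 in an Exp(lambda) sample."""
    n, total = len(xs), sum(xs)
    lam_ml = n / total  # unrestricted ML estimate

    def loglik(lam):
        return n * math.log(lam) - lam * total

    # Wald: r(lam) = lam - lam0, Var(lam_ml) estimated by lam_ml^2 / n.
    wald = (lam_ml - lam0) ** 2 / (lam_ml ** 2 / n)
    # LR: twice the log-likelihood difference.
    lr = 2.0 * (loglik(lam_ml) - loglik(lam0))
    # LM: score and information evaluated at the restricted estimate lam0.
    score = n / lam0 - total
    info = n / lam0 ** 2
    lm = score ** 2 / info
    return wald, lr, lm

random.seed(5)
xs = [random.expovariate(4.0) for _ in range(200)]  # data generated under H0
wald, lr, lm = three_tests(xs, lam0=4.0)
print(f"W = {wald:.3f}, LR = {lr:.3f}, LM = {lm:.3f}")
```

With one restriction ($m = 1$), each statistic is compared with the 95% quantile of the $\chi^2_1$-distribution, which is about 3.84. The three values differ in finite samples but are typically close when $H_0$ is true.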



Multivariate case

Example: Production function

$$ Y_i = X_{i1}^{a_1} \cdot X_{i2}^{a_2} + u_i $$

where $u_i \sim N(0, 0.05^2)$

Log-likelihood function $\ln L(a_1, a_2)$

ML estimators $\hat a_1$ and $\hat a_2$

Hypothesis test of $a_1 + a_2 = 1$ or $a_1 + a_2 - 1 = 0$

See classtest.R


Instrumental variables: Preliminaries

OLS is not consistent if $E(u_t \mid X_t) \ne 0$

Define an information set $\Omega_t$ (a σ-algebra), such that

$$ E(u_t \mid \Omega_t) = 0 $$

This moment condition can be used for estimation

Variables in $\Omega_t$ are called instrumental variables (or instruments)

We denote the instrument vector by $W_t$


Instrumental variables: Correlation between errors and disturbances (I)

Errors in variables

Consider the model

$$ y_t = \alpha + \beta x_t^* + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2) $$

The exogenous variable $x_t^*$ is unobservable

We can only observe

$$ x_t = x_t^* + v_t $$

where $v_t \sim \mathrm{iid}(0, \sigma_v^2)$ are independent of everything else

OLS estimators of $y_t = \alpha + \beta x_t + u_t$ are inconsistent [P]
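The inconsistency can be made visible with simulated data. In the errors-in-variables model the OLS slope converges to $\beta \, \sigma_{x^*}^2 / (\sigma_{x^*}^2 + \sigma_v^2)$, so it is attenuated towards zero. The Python sketch below is an illustration under assumed parameter values ($\alpha = 1$, $\beta = 2$, all variances equal to one); the function name `ols_slope` is hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(6)
n, alpha, beta = 20_000, 1.0, 2.0  # assumed values
x_star = [random.gauss(0.0, 1.0) for _ in range(n)]      # unobserved regressor
y = [alpha + beta * xs + random.gauss(0.0, 1.0) for xs in x_star]
x_obs = [xs + random.gauss(0.0, 1.0) for xs in x_star]   # measurement error v_t

slope_true_x = ols_slope(x_star, y)  # close to beta = 2
slope_obs_x = ols_slope(x_obs, y)    # attenuated towards beta * 1 / (1 + 1) = 1
print(slope_true_x, slope_obs_x)
```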


Instrumental variables: Correlation between errors and disturbances (II)

Omitted variables bias

Let

$$ y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t $$

If $x_2$ is unobservable, one estimates

$$ y_t = \alpha + \beta_1 x_{1t} + u_t $$

where $u_t = \beta_2 x_{2t} + \varepsilon_t$

If $x_{2t}$ and $x_{1t}$ are correlated then so are $u_t$ and $x_{1t}$


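Instrumental variables: Correlation between errors and disturbances (III)

Endogeneity

Standard example: supply and demand curves determine both price and quantity

$$ q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d $$

$$ q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s $$

Solve for $q_t$ and $p_t$

$$ \begin{bmatrix} q_t \\ p_t \end{bmatrix} = \begin{bmatrix} 1 & -\gamma_d \\ 1 & -\gamma_s \end{bmatrix}^{-1} \left( \begin{bmatrix} X_t^d \beta_d \\ X_t^s \beta_s \end{bmatrix} + \begin{pmatrix} u_t^d \\ u_t^s \end{pmatrix} \right) $$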


Instrumental variables: Correlation between errors and disturbances (III)

Since $q_t$ and $p_t$ depend on both $u_t^d$ and $u_t^s$, single-equation OLS estimation of

$$ q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d $$

$$ q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s $$

is inconsistent

The right-hand-side variable $p_t$ is correlated with the error term

The condition $E(u_t \mid \Omega_t) = 0$ is violated if $p_t$ is in $\Omega_t$


Instrumental variables: Correlation between errors and disturbances

Warning! Inconsistency is not always a problem

If we simply want to forecast, we can use inconsistent estimators

Trivial example:

[Figure: scatter plot of y against x, titled "Positive correlation between u and X", with the true regression line drawn in]


Instrumental variables: The simple IV estimator

Let $W$ denote the $T \times K$ matrix of instruments

All columns of $X$ with $X_t \in \Omega_t$ should be included in $W$

Then $E(u_t \mid W_t) = 0$ implies the moment condition

$$ E(W'u) = E\left( W'(y - X\beta) \right) = 0 $$

The IV estimator is a method of moments estimator

The solution is

$$ \hat\beta_{IV} = (W'X)^{-1} W'y $$
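With a constant, one regressor and one instrument, the slope component of $(W'X)^{-1} W'y$ reduces to the ratio of sample covariances $\mathrm{cov}(w, y) / \mathrm{cov}(w, x)$. The Python sketch below illustrates this on simulated data; the data-generating process (true slope 2, an error that shares the component `e` with the regressor) and all names are assumptions made for the example.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n) of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

random.seed(7)
n, beta = 20_000, 2.0  # assumed values
w = [random.gauss(0.0, 1.0) for _ in range(n)]  # instrument
e = [random.gauss(0.0, 1.0) for _ in range(n)]  # source of endogeneity
x = [wi + ei for wi, ei in zip(w, e)]           # regressor, correlated with u via e
u = [0.8 * ei + random.gauss(0.0, 1.0) for ei in e]
y = [1.0 + beta * xi + ui for xi, ui in zip(x, u)]

beta_ols = cov(x, y) / cov(x, x)  # inconsistent: plim is 2 + 0.8 / 2 = 2.4
beta_iv = cov(w, y) / cov(w, x)   # simple IV slope with W = [1, w], X = [1, x]
print(beta_ols, beta_iv)
```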


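Instrumental variables: Properties

The simple IV estimator is consistent if

$$ \mathrm{plim}\, \frac{1}{n} W'X = S_{WX} $$

is deterministic and nonsingular [P]

The simple IV estimator is asymptotically normal,

$$ \sqrt{n} (\hat\beta_{IV} - \beta) \xrightarrow{d} U \sim N\left( 0, \sigma^2 (S_{WX})^{-1} S_{WW} (S_{WX}')^{-1} \right) $$

where $S_{WW} = \mathrm{plim}\, \frac{1}{n} W'W$ [P]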


Instrumental variables: How to find instruments

Instruments must be

1 exogenous, i.e. $\mathrm{plim}\, \frac{1}{n} W'u = 0$

2 valid, i.e. $\mathrm{plim}\, \frac{1}{n} W'X = S_{WX}$ non-singular

Natural experiments (weather, earthquakes, . . . )

Angrist and Pischke (2009):

Good instruments come from a combination of institutional knowledge and ideas about the processes determining the variable of interest.



Examples

Natural experiments

1 Brückner and Ciccone: Rain and the democratic window of opportunity, Econometrica 79 (2011) 923-947

2 Angrist and Evans: Children and their parents’ labor supply: Evidence from exogenous variation in family size, American Economic Review 88 (1998) 450-77.



Examples

Institutional arrangements

1 Angrist and Krueger: Does Compulsory School Attendance Affect Schooling and Earnings?, Quarterly Journal of Economics 106 (1991) 979-1014.

2 Levitt: The Effect of Prison Population Size on Crime Rates: Evidence from Prison Overcrowding Litigation, Quarterly Journal of Economics 111 (1996) 319-351.



In a time series context, one can sometimes use lagged endogenous regressors as instrumental variables

Example: $y_t = \alpha + \beta x_t + u_t$ with $E(u_t \mid x_t) \ne 0$

If $\mathrm{Cov}(x_t, x_{t-1}) \ne 0$ but $\mathrm{Cov}(u_t, x_{t-1}) = 0$, then $x_{t-1}$ can be used as an instrumental variable

Attention: $\mathrm{Cov}(u_t, x_{t-1}) = 0$ is not always obvious



Example (Measurement error in time series)

Consider the model

$$ y_t = \alpha + \beta x_t^* + u_t $$

$$ x_t^* = \rho x_{t-1}^* + \varepsilon_t $$

$$ x_t = x_t^* + v_t. $$

Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$, and $\alpha$ and $\beta$ will be estimated consistently.



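Example (Omitted variable bias in time series)

Consider the model

$$ y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t $$

$$ x_{1t} = \rho_{11} x_{1,t-1} + \rho_{12} x_{2,t-1} + \varepsilon_{1t} $$

$$ x_{2t} = \rho_{21} x_{1,t-1} + \rho_{22} x_{2,t-1} + \varepsilon_{2t} $$

Then $x_{1,t-1}$ is not a valid instrument for a regression of $y_t$ on $x_{1t}$, and $\alpha$ and $\beta_1$ will not be estimated consistently.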



Example (Endogeneity in time series)

Consider the model

$$ y_t = \alpha + \beta_1 x_t + \beta_2 y_{t-1} + u_t $$

$$ x_t = \gamma + \delta_1 y_t + \delta_2 x_{t-1} + v_t $$

Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$ and $y_{t-1}$, and $\alpha$, $\beta_1$ and $\beta_2$ will be estimated consistently.


Instrumental variables: Generalized IV estimation

If the number of instruments $L$ is larger than the number of parameters $K$, the model is overidentified

Right-multiply the $T \times L$ matrix $W$ by an $L \times K$ matrix $J$ to obtain a $T \times K$ instrument matrix $WJ$

Linear combinations of the instruments in $W$

One can show that the asymptotically optimal matrix is $J = (W'W)^{-1} W'X$



The generalized IV estimator is

$$ \hat\beta_{IV} = \left( (WJ)'X \right)^{-1} (WJ)'y = \left( X'W (W'W)^{-1} W'X \right)^{-1} X'W (W'W)^{-1} W'y = (X' P_W X)^{-1} X' P_W y $$

with $P_W = W (W'W)^{-1} W'$

Consistency and asymptotic normality still hold



The two-stage-least-squares (2SLS) interpretation

The matrix $J$ is similar to $\hat\beta$ in the standard OLS model,

$$ J = (W'W)^{-1} W'X $$

Hence, $WJ$ is similar to $X\hat\beta$

The optimal instruments are obtained if we regress the endogenous regressors on the instruments (1st stage), and then use the fitted values as regressors (2nd stage)
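The equivalence between the two-stage procedure and the generalized IV estimator can be written out in one line. The step below is a short worked derivation added for illustration; it uses only the fact that the projection matrix $P_W$ is symmetric and idempotent, and the notation $\hat X$ for the first-stage fitted values is introduced here.

```latex
\begin{aligned}
\hat X &= P_W X = W (W'W)^{-1} W'X = WJ, \\
\hat\beta_{2SLS} &= (\hat X'\hat X)^{-1} \hat X'y
  = (X' P_W' P_W X)^{-1} X' P_W' y
  = (X' P_W X)^{-1} X' P_W y
  = \hat\beta_{IV}.
\end{aligned}
```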


Instrumental variables: Finite sample properties

The finite sample properties of IV estimators are complex

In the overidentified case, the first $L - K$ moments exist, but higher moments do not

If the expectation exists, IV estimators are in general biased

The simple IV estimator has very heavy tails; even the first moment does not exist!

The estimator can be extremely far off the true value

See ivfinite.R


Instrumental variables: Hypothesis testing

Exact hypothesis tests are usually not feasible

Asymptotic tests are based on the asymptotic normality

An estimator of the covariance matrix of $\hat\beta_{IV}$ is

$$ \widehat{\mathrm{Cov}}(\hat\beta_{IV}) = \hat\sigma^2 (X' P_W X)^{-1} $$

with

$$ P_W = W (W'W)^{-1} W' $$

$$ \hat\sigma^2 = \frac{1}{n} (y - X\hat\beta_{IV})' (y - X\hat\beta_{IV}) $$



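Asymptotic t-test

$$ H_0: \beta_i = \beta_{i0} \quad \text{vs} \quad H_1: \beta_i \ne \beta_{i0} $$

Under the null hypothesis, the test statistic

$$ t = \frac{\hat\beta_i - \beta_{i0}}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_i)}} $$

is asymptotically $N(0, 1)$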



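Asymptotic Wald test (similar to an F-test)

$$ H_0: \beta_2 = \beta_{20}, \qquad H_1: \beta_2 \ne \beta_{20} $$

where $\beta_2$ is a length-$L$ subvector of $\beta$

$$ W = (\hat\beta_2 - \beta_{20})' \left[ \widehat{\mathrm{Cov}}(\hat\beta_2) \right]^{-1} (\hat\beta_2 - \beta_{20}) $$

is asymptotically $\chi^2$ with $L$ degrees of freedom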



Testing overidentifying restrictions

The identifying restrictions are

$$ E(u_t \mid W_t) = 0 \quad \text{or} \quad E(W'u) = 0 $$

If the model is just identified, the validity of the restrictions cannot be tested

If the model is overidentified, one can test if the overidentifying restrictions hold, i.e. if the instruments are valid and exogenous



Basic test idea: Check if the IV residuals can be explained by the full set of instruments

Compute the IV residuals $\hat u$

Regress the residuals on all instruments $W$

Under the null hypothesis, asymptotically

$$ nR^2 \sim \chi^2_m $$

where $m$ is the degree of overidentification



Davidson and MacKinnon (2004, p. 338): Even if we do not know quite how to interpret a significant value of the overidentification test statistic, it is always a good idea to compute it. If it is significantly larger than it should be by chance under the null hypothesis, one should be extremely cautious in interpreting the estimates, because it is quite likely either that the model is specified incorrectly or that some of the instruments are invalid.



Durbin-Wu-Hausman test

$$ H_0: E(X'u) = 0 \quad \text{vs} \quad H_1: E(W'u) = 0 $$

Test if IV estimation is really necessary or if OLS would do

Under $H_1$, OLS is inconsistent, but IV is still consistent

Basic test idea: Compare $\hat\beta_{OLS}$ and $\hat\beta_{IV}$. If they are ‘too different’, reject $H_0$



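The difference between the estimators is

$$ \begin{aligned} \hat\beta_{IV} - \hat\beta_{OLS} &= (X' P_W X)^{-1} X' P_W y - (X'X)^{-1} X'y \\ &= (X' P_W X)^{-1} \left( X' P_W y - (X' P_W X) (X'X)^{-1} X'y \right) \\ &= (X' P_W X)^{-1} \left( X' P_W \left( I - X (X'X)^{-1} X' \right) y \right) \\ &= (X' P_W X)^{-1} (X' P_W M_X y) \end{aligned} $$

where $M_X = I - X (X'X)^{-1} X'$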







We need to test if $X' P_W M_X y$ is significantly different from 0

This term is identically equal to zero for all variables in $X$ that are instruments (i.e. that are also in $W$)

Denote by $\tilde X$ all possibly endogenous regressors

To test if $\tilde X' P_W M_X y$ is significantly different from zero, perform a Wald test of $\delta = 0$ in the regression

$$ y = X\beta + P_W \tilde X \delta + u $$


GMM: Model description

Hansen, L. (1982), Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054:

In this paper we study the large sample properties of a class of generalized method of moments (GMM) estimators which subsumes many standard econometric estimators. To motivate this class, consider an econometric model whose parameter vector we wish to estimate. The model implies a family of orthogonality conditions that embed any economic theoretical restrictions that we wish to impose or test.

John Cochrane (2005), Asset Pricing, p. 196:

Most of the effort involved with GMM is simply mapping a given problem into the very general notation.



Describe the model by elementary zero functions

$$ E_\theta(f_t(\theta, y_t)) = 0 $$

where everything can be vector-valued

Parameter vector $\theta$ of length $K$

Observation vectors $y_t$

Identification condition

$$ E_{\theta_0}(f_t(\theta, y_t)) \ne 0 \quad \text{for all } \theta \ne \theta_0 $$



Example (Linear regression model)

Consider the standard model

$$ y = X\beta + u, \qquad u \sim N(0, \sigma^2 I), \text{ independent of } X $$

Parameter vector $\theta = ?$

Observations $y_t = ?$

Elementary zero functions $f_t(\theta, y_t) = ?$



Example (Lognormal distribution)

Suppose there is a random sample $X_1, \ldots, X_n$ from

$$ X \sim LN(\mu, \sigma^2) $$




Example (Asset pricing)

The basic asset pricing formula is

$$ p_t = E(m_{t+1} x_{t+1} \mid \Omega_t) $$

with asset price $p$, stochastic discount factor $m$, payoff $x$, and information set $\Omega_t$.




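Stack all elementary zero functions

$$ f(\theta, y) = \begin{pmatrix} f_1(\theta, y_1) \\ \vdots \\ f_n(\theta, y_n) \end{pmatrix} $$

Covariance matrix

$$ E\left( f(\theta, y) f(\theta, y)' \right) = \Omega $$

Dimension of $\Omega$ depends on dimension of $f_t(\theta, y_t)$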




Example (Linear regression model)

The covariance matrix $\Omega$ is

$$ E\left( f(\theta, y) f(\theta, y)' \right) = E(u u') = \sigma^2 I $$

If there are autocorrelation and heteroskedasticity

$$ E(u u') = \Omega $$





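Example (Lognormal distribution)

$$ E\left( f(\theta, y) f(\theta, y)' \right) = E \begin{pmatrix} f_{11}^2 & f_{11} f_{12} & \cdots & f_{11} f_{n1} & f_{11} f_{n2} \\ f_{12} f_{11} & f_{12}^2 & \cdots & f_{12} f_{n1} & f_{12} f_{n2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ f_{n1} f_{11} & f_{n1} f_{12} & \cdots & f_{n1}^2 & f_{n1} f_{n2} \\ f_{n2} f_{11} & f_{n2} f_{12} & \cdots & f_{n2} f_{n1} & f_{n2}^2 \end{pmatrix} = \ ? $$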



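Example (Asset pricing)

$$ E\left( f(\theta, y) f(\theta, y)' \right) = E \begin{pmatrix} f_{11}^2 & \cdots & f_{11} f_{n1} \\ \vdots & \ddots & \vdots \\ f_{n1} f_{11} & \cdots & f_{n1}^2 \end{pmatrix} = \ ? $$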


GMM: Estimating equations

To estimate $\theta$, we need $K$ estimating equations

In general, they are weighted averages of the $f_t$

In most cases, the estimating equations are based on $L \ge K$ instrumental variables $W$

If $L > K$, we need to form linear combinations

Let $W$ be the $n \times L$ matrix of instruments and $J$ be an $L \times K$ matrix of full rank

Define the $n \times K$ matrix $Z = WJ$



Theoretical moment conditions (orthogonality conditions)

$$ E\left( Z_t' f_t(\theta, y_t) \right) = 0 $$

The estimating equations are the empirical counterpart

$$ \frac{1}{n} Z' f(\theta, y) = 0 $$

Solving this system yields the GMM estimator $\hat\theta$




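Example (Linear regression model)

The $K$ moment conditions for the linear regression model are

$$ E\left( Z_t' f_t(\theta, y_t) \right) = E\left( X_t' (y_t - X_t' \beta) \right) = 0 $$

and the estimating equations are

$$ \frac{1}{n} X' (y - X\beta) = 0. $$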




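Example (Lognormal distribution)

The two moment conditions for the lognormal distribution are

$$ E\left( Z_t' f_t(\theta, y_t) \right) = E\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} f_{t1}(\theta, y_t) \\ f_{t2}(\theta, y_t) \end{bmatrix} \right) = E\left( \begin{bmatrix} X_t - \exp\left( \mu + \frac{1}{2}\sigma^2 \right) \\ X_t^2 - \exp\left( 2\mu + 2\sigma^2 \right) \end{bmatrix} \right) = \begin{pmatrix} 0 \\ 0 \end{pmatrix} $$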



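Example (contd)

. . . and the estimating equations are

$$ \frac{1}{n} Z' f(\theta, y) = \frac{1}{n} \begin{bmatrix} 1 & 0 & 1 & 0 & \cdots & 1 & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ \vdots \\ f_{n1} \\ f_{n2} \end{bmatrix} = \begin{bmatrix} \frac{1}{n} \sum_{t=1}^{n} \left( X_t - \exp\left( \mu + \frac{1}{2}\sigma^2 \right) \right) \\ \frac{1}{n} \sum_{t=1}^{n} \left( X_t^2 - \exp\left( 2\mu + 2\sigma^2 \right) \right) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} $$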
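Because this system is just identified, the two estimating equations can be solved in closed form: taking logarithms of the two sample moments $m_1$ and $m_2$ gives $\hat\sigma^2 = \ln m_2 - 2 \ln m_1$ and $\hat\mu = 2 \ln m_1 - \frac{1}{2} \ln m_2$. The Python sketch below is an illustration on simulated data; the parameter values ($\mu = 0$, $\sigma = 0.5$), the sample size and the function name `lognormal_mm` are assumptions.

```python
import math
import random

def lognormal_mm(xs):
    """Solve the two sample moment equations for (mu, sigma^2) in closed form."""
    n = len(xs)
    m1 = sum(xs) / n                 # sample analogue of exp(mu + sigma^2 / 2)
    m2 = sum(x * x for x in xs) / n  # sample analogue of exp(2 mu + 2 sigma^2)
    sigma2 = math.log(m2) - 2.0 * math.log(m1)
    mu = 2.0 * math.log(m1) - 0.5 * math.log(m2)
    return mu, sigma2

random.seed(8)
xs = [random.lognormvariate(0.0, 0.5) for _ in range(50_000)]  # mu = 0, sigma^2 = 0.25
mu_hat, sigma2_hat = lognormal_mm(xs)
print(mu_hat, sigma2_hat)
```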


GMM: Properties of GMM estimators

Consistency

Assume that a law of large numbers applies to $\frac{1}{n} Z' f(\theta, y)$

Define the limiting estimation functions

$$ \alpha(\theta) = \mathrm{plim}\, \frac{1}{n} Z' f(\theta, y) $$

and the limiting estimation equations $\alpha(\theta) = 0$

The GMM estimator $\hat\theta$ is consistent if the asymptotic identification condition holds, $\alpha(\theta) \ne \alpha(\theta_0)$ for all $\theta \ne \theta_0$ [P]


GMM: Properties of GMM estimators

Asymptotic normality

Simplified notation: $f_t(\theta) = f_t(\theta, y_t)$, $f(\theta) = f(\theta, y)$

Additional assumption: $f_t(\theta)$ is continuously differentiable at $\theta_0$

First-order Taylor series expansion of

$$ \frac{1}{n} Z' f(\hat\theta) = 0 $$

in $\hat\theta$ around $\theta_0$ [P]


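GMM: Asymptotic efficiency

The asymptotic distribution of $\sqrt{n} (\hat\theta - \theta_0)$ is normal with mean 0 and covariance matrix

$$ \left( \mathrm{plim}\, \frac{1}{n} Z' F(\theta_0) \right)^{-1} \left( \mathrm{plim}\, \frac{1}{n} Z' \Omega Z \right) \left( \mathrm{plim}\, \frac{1}{n} F(\theta_0)' Z \right)^{-1} $$

where $F(\theta)$ denotes the matrix of derivatives of $f(\theta)$ with respect to $\theta'$

What is the optimal choice of $Z$ in the estimating equations?

The optimal choice depends on assumptions about the matrices $F(\theta)$ and $\Omega$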



If $\Omega = \sigma^2 I$ and $E(F_t(\theta_0) f_t(\theta_0)) = 0$, the optimal choice is

$$ Z = F(\theta_0) $$

Problem: $Z$ depends on the unknown $\theta_0$

Solution: Solve the estimating equations

$$ \frac{1}{n} F'(\theta) f(\theta) = 0 $$



If $\Omega = \sigma^2 I$ and $E(F_t(\theta_0) f_t(\theta_0)) \ne 0$ but $W_t \in \Omega_t$, the optimal choice is

$$ Z = P_W F(\theta_0) $$

Problem: $Z$ depends on the unknown $\theta_0$

Solution: Solve the estimating equations

$$ \frac{1}{n} F'(\theta) P_W f(\theta) = 0 $$



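Suppose the covariance matrix $\Omega$ is unknown

Since $Z = WJ$, the covariance matrix of $\sqrt{n} (\hat\theta - \theta_0)$ is

$$ \left( \mathrm{plim}\, \frac{1}{n} J'W'F_0 \right)^{-1} \left( \mathrm{plim}\, \frac{1}{n} J'W'\Omega W J \right) \left( \mathrm{plim}\, \frac{1}{n} F_0' W J \right)^{-1} $$

For the optimal $J = (W'\Omega W)^{-1} W'F_0$ this becomes

$$ \left( \mathrm{plim}\, \frac{1}{n} F_0' W (W'\Omega W)^{-1} W'F_0 \right)^{-1} $$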



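Although $\Omega$ cannot be estimated consistently, the term $\frac{1}{n} W'\Omega W$ can be estimated consistently (we will do that later)

If $\hat\Sigma$ is an estimator of $\frac{1}{n} W'\Omega W$, the optimal estimating equations are

$$ \frac{1}{n} J'W'f(\theta) = \frac{1}{n} F(\theta)' W \hat\Sigma^{-1} W' f(\theta) = 0 $$

and the estimated covariance matrix of $\hat\theta$ is

$$ \widehat{\mathrm{Cov}}(\hat\theta) = n \left( \hat F' W \hat\Sigma^{-1} W' \hat F \right)^{-1} $$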


GMM: Alternative notation

Attention

Many textbooks use a different notation (and so does the gmm package in R)

The two approaches are equivalent

The moment conditions are written as

$$ E(g(\theta, y_t)) = E\left( W_t' f_t(\theta, y_t) \right) = 0 $$

The number of moment conditions $L$ can be larger than the number of parameters $K$



If $L > K$, the $L$ estimating equations can in general not be solved exactly

$$ g_n(\theta, y) = \frac{1}{n} \sum_{t=1}^{n} g(\theta, y_t) = 0 $$

The GMM estimator is defined by

$$ \hat\theta = \arg\min_\theta \; g_n(\theta, y)' A_n \, g_n(\theta, y) $$

where $A_n$ is a sequence of $L \times L$ weighting matrices (which can be chosen by the user) with limit $A$



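The GMM estimator based on $g_n$ is consistent, $\hat\theta \xrightarrow{p} \theta$

Asymptotic normality: Define the $L \times K$ matrix

$$ G(\theta) = \frac{\partial g_n(\theta, y)}{\partial \theta'} = \frac{1}{n} \sum_{t=1}^{n} \frac{\partial g(\theta, y_t)}{\partial \theta'} $$

Assume that $\sqrt{n}\, g_n(\theta, y) \xrightarrow{d} N(0, V)$, then [P]

$$ \sqrt{n} (\hat\theta - \theta_0) \xrightarrow{d} N\left( 0, (G'AG)^{-1} G'AVAG (G'A'G)^{-1} \right) $$

Asymptotically optimal weighting matrix $A$ [P]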


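GMM: Equivalence

The two GMM approaches (based on $f_t$ and $g$) are equivalent

The first-order condition of $g(\theta)' A g(\theta)$ is

$$ \underset{K \times L}{G'} \; \underset{L \times L}{A} \; \underset{L \times 1}{g} = \underset{K \times 1}{0} $$

which is the same as

$$ \underset{K \times L}{J'} \; \underset{L \times n}{W'} \; \underset{n \times 1}{f} = \underset{K \times 1}{0} $$

List of equivalences [P]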


GMM: Covariance matrix estimation

The covariance matrix of the elementary zero functions

$$ E\left( f(\theta, y) f(\theta, y)' \right) = \Omega $$

is often unknown

There may be heteroskedasticity and autocorrelation in $\Omega$

Although $\Omega$ cannot be estimated consistently, the term $\frac{1}{n} W'\Omega W$ can be estimated consistently



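Write

$$ \Sigma = \mathrm{plim}_{n \to \infty} \frac{1}{n} W'\Omega W $$

Assume that a suitable law of large numbers holds,

$$ \Sigma = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \sum_{s=1}^{n} E\left( f_t f_s W_t' W_s \right) $$

where $f_t = f_t(\theta, y_t)$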



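Define the autocovariance matrices

$$ \Gamma(j) = \begin{cases} \frac{1}{n} \sum_{t=j+1}^{n} E\left( f_t f_{t-j} W_t' W_{t-j} \right) & \text{for } j \ge 0 \\ \frac{1}{n} \sum_{t=-j+1}^{n} E\left( f_{t+j} f_t W_{t+j}' W_t \right) & \text{for } j < 0 \end{cases} $$

Then

$$ \Sigma = \lim_{n \to \infty} \sum_{j=-n+1}^{n-1} \Gamma(j) = \lim_{n \to \infty} \left( \Gamma(0) + \sum_{j=1}^{n-1} \left( \Gamma(j) + \Gamma'(j) \right) \right) $$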



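The autocovariance matrix $\Gamma(j)$, $j \ge 0$, can be estimated by

$$ \hat\Gamma(j) = \frac{1}{n} \sum_{t=j+1}^{n} \hat f_t \hat f_{t-j} W_t' W_{t-j} $$

Newey-West estimator of $\Sigma$

$$ \hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} \left( 1 - \frac{j}{p + 1} \right) \left( \hat\Gamma(j) + \hat\Gamma'(j) \right) $$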
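The Newey-West formula is easiest to follow in the scalar case. The Python sketch below is a simplified illustration, assuming a single instrument so that the products $\hat f_t W_t$ form a scalar series `z` and $\hat\Gamma(j) + \hat\Gamma'(j) = 2\hat\Gamma(j)$; the function name `newey_west` and the toy input are assumptions.

```python
def newey_west(z, p):
    """Newey-West estimate of the long-run variance of a scalar series z_t.

    Here z_t stands for the product f_t * W_t with a single instrument, so the
    matrices Gamma(j) reduce to scalars and Gamma(j) + Gamma(j)' = 2 * Gamma(j).
    """
    n = len(z)

    def gamma(j):
        # Sample autocovariance of order j (uncentered, divisor n).
        return sum(z[t] * z[t - j] for t in range(j, n)) / n

    sigma = gamma(0)
    for j in range(1, p + 1):
        weight = 1.0 - j / (p + 1)  # Bartlett weights
        sigma += weight * 2.0 * gamma(j)
    return sigma

z = [1.0, -1.0, 1.0, -1.0]
print(newey_west(z, p=0))  # 1.0
print(newey_west(z, p=1))  # 0.25
```

The declining weights guarantee that the estimate stays non-negative (positive semi-definite in the matrix case), which an unweighted truncated sum would not.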


GMM: Test of overidentifying restrictions

The GMM estimators minimize the criterion function

$$ \frac{1}{n} f'(\theta) W \hat\Sigma^{-1} W' f(\theta) $$

Asymptotically, the minimized value (Hansen’s J statistic, Hansen’s overidentification statistic, Hansen-Sargan statistic) is distributed as $\chi^2_{L-K}$ if the overidentifying restrictions hold

If the null hypothesis is rejected, then something went wrong, e.g. the model is misspecified


Indirect inference: Basic idea

Anthony Smith, Jr. (New Palgrave Dictionary of Economics): Indirect inference is a simulation-based method for estimating the parameters of economic models. Its hallmark is the use of an auxiliary model to capture aspects of the data upon which to base the estimation. The parameters of the auxiliary model can be estimated using either the observed data or data simulated from the economic model. Indirect inference chooses the parameters of the economic model so that these two estimates of the parameters of the auxiliary model are as close as possible.


Indirect inference: The true model

Economic model

$$ y_t = G(y_{t-1}, x_t, u_t; \beta), \qquad t = 1, \ldots, T $$

Exogenous variables $x_t$ and endogenous variables $y_t$

Random errors $u_t$, i.i.d. with cdf $F$

Parameter vector $\beta$ of dimension $K$

Suppose that standard estimation methods for $\beta$ are intractable

It must be possible (and easy) to simulate $y_1, \ldots, y_T$ given $y_0$ (assumed to be known), $x_1, \ldots, x_T$ and $\beta$


Indirect inference: The auxiliary model

The true model is too complicated for estimation of $\beta$

Instead estimate an auxiliary model with parameter vector $\theta$

The dimension $L$ of $\theta$ must be at least as large as the dimension $K$ of $\beta$

The auxiliary model must be

“suitable” (but is allowed to be misspecified)

easy and fast to estimate

Often, the auxiliary model is a standard time series model


Indirect inference: Estimating the auxiliary model

For given $\beta$ (and $y_0, x_1, \ldots, x_T$), the auxiliary model’s parameters $\theta$ are estimated

1 from the observed data $x_1, \ldots, x_T, y_1, \ldots, y_T$, resulting in estimator $\hat\theta$

2 from $H$ simulated datasets $x_1, \ldots, x_T, \tilde y_1^{(h)}, \ldots, \tilde y_T^{(h)}$ for $h = 1, \ldots, H$, resulting in estimators $\tilde\theta^{(h)}(\beta)$

Define

$$ \tilde\theta(\beta) = \frac{1}{H} \sum_{h=1}^{H} \tilde\theta^{(h)}(\beta) $$


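Indirect inference: Optimization

Compute the difference between the vectors $\hat\theta$ and $\tilde\theta(\beta)$

$$ Q(\beta) = \left( \hat\theta - \tilde\theta(\beta) \right)' W \left( \hat\theta - \tilde\theta(\beta) \right) $$

where $W$ is a positive definite weighting matrix

The indirect inference estimator of $\beta$ is

$$ \hat\beta = \arg\min_\beta Q(\beta) $$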


Indirect inference: Remarks

The simulations have to be done with the same set of random errors

Indirect inference is similar to GMM: the auxiliary parameters are the “moments”

The asymptotic distribution of $\hat\beta$ can be derived (see Gourieroux et al., 1993)

The weighting matrix $W$ can be chosen optimally


Indirect inference: A simple example (Gourieroux et al., 1993)

Consider the MA(1) process

$$ y_t = \varepsilon_t - \beta \varepsilon_{t-1} $$

with $\varepsilon_t \sim N(0, 1)$ and $\beta = 0.5$ for $t = 1, \ldots, 250$

The maximum likelihood estimator $\hat\beta_{ML}$ is not trivial

Indirect inference estimator $\hat\beta_{II}$ of $\beta$?

Auxiliary model: AR(3) with parameters $\theta$

No weighting, the matrix $W$ is the identity matrix


Indirect inference: A simple example (Gourieroux et al., 1993)

Compare the distribution of $\hat\beta_{ML}$ and $\hat\beta_{II}$

Step 1: Simulate a time series $y_1, \ldots, y_{250}$

Step 2: Compute $\hat\beta_{ML}$

Step 3: Estimate $\hat\theta$ from $y_1, \ldots, y_{250}$

Step 4: For given $\beta$, simulate 10 paths $\tilde y_1^{(h)}, \ldots, \tilde y_{250}^{(h)}$

Step 5: Estimate $\tilde\theta(\beta)$ from the simulated paths

Step 6: Repeat steps 4 and 5 for different $\beta$ until the difference between $\hat\theta$ and $\tilde\theta(\beta)$ is minimized

Step 7: Save $\hat\beta_{II}$ and start again at step 1
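Steps 3 to 6 can be sketched in a few lines of Python. This is a deliberately simplified illustration and differs from the slide's setup in two labeled ways: the auxiliary model is an AR(1) instead of an AR(3), so $\theta$ is a single coefficient, and the minimization is a plain grid search over $\beta \in [0, 0.95]$. Because an AR(1) carries less information than an AR(3), the resulting estimates are noisier. All function names are hypothetical. The simulation errors are drawn once and reused for every candidate $\beta$, as required by the remark on using the same set of random errors.

```python
import random

def simulate_ma1(beta, eps):
    """MA(1) path y_t = eps_t - beta * eps_{t-1} from a fixed error sequence."""
    return [eps[t] - beta * eps[t - 1] for t in range(1, len(eps))]

def ar1_coef(y):
    """Auxiliary parameter: OLS slope of y_t on y_{t-1} without intercept."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(v * v for v in y[:-1])
    return num / den

def indirect_inference(y_obs, n_paths=10, seed=0):
    """Grid-search estimate of beta that matches the auxiliary AR(1) coefficient."""
    rng = random.Random(seed)
    T = len(y_obs)
    # Draw the simulation errors once and reuse them for every candidate beta.
    errors = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(n_paths)]
    theta_hat = ar1_coef(y_obs)
    best_beta, best_q = None, float("inf")
    for k in range(0, 96):  # candidate values 0.00, 0.01, ..., 0.95
        beta = k / 100.0
        theta_sim = sum(ar1_coef(simulate_ma1(beta, e)) for e in errors) / n_paths
        q = (theta_hat - theta_sim) ** 2  # identity weighting
        if q < best_q:
            best_beta, best_q = beta, q
    return best_beta

rng = random.Random(9)
eps = [rng.gauss(0.0, 1.0) for _ in range(251)]
y_obs = simulate_ma1(0.5, eps)  # 250 observations with true beta = 0.5
print(indirect_inference(y_obs))
```

Wrapping the last four lines in a loop over many simulated series (step 7) would give the sampling distribution of the indirect inference estimator.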


Bootstrap: Basic idea

Point of departure: unknown distribution function $F$ (univariate or multivariate)

Unknown parameter vector

$$ \theta = \theta(F) $$

Simple random sample $X_1, \ldots, X_n$ from $F$

Estimator $\hat\theta = \hat\theta(X_1, \ldots, X_n)$

Why is the distribution of $\hat\theta$ of interest?


Bootstrap: Basic idea

Basic bootstrap idea: Approximate the unknown distribution of

$$ \hat\theta(X_1, \ldots, X_n) \quad \text{for } X_1, \ldots, X_n \text{ i.i.d. from } F $$

by the distribution of

$$ \hat\theta(X_1^*, \ldots, X_n^*) \quad \text{for } X_1^*, \ldots, X_n^* \text{ i.i.d. from } \hat F $$

The distribution of $\hat\theta$ under $\hat F$ is usually found by Monte-Carlo simulations based on resamples (pseudo samples)


Bootstrap: Basic idea

How is $F$ estimated?

parametric → parametric bootstrap

nonparametric → nonparametric bootstrap

smoothed → smooth bootstrap

model based

Applications

bias and standard errors

confidence intervals

hypothesis tests


Bootstrap: Example 1

Nonparametric bootstrap of the standard error of

$$\hat\theta = \bar X = \frac{1}{n} \sum_{i=1}^{n} X_i$$

Simple random sample $X_1, \ldots, X_{20}$

Estimation of the unknown cdf $F$ by the empirical distribution function

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \le x)$$


Bootstrap: Example 1 (contd)

How is $\bar X$ distributed under $F$?

How is $\bar X$ distributed under $\hat F = F_n$?

Estimation of the distribution of $\bar X$ under $F_n$ by Monte Carlo simulation

Calculation of the standard deviation of $\bar X$ under $F_n$

The distribution of $\bar X$ under $F_n$ is an approximation of the distribution of $\bar X$ under $F$


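Bootstrap: Example 1 (still contd): The algorithm

1 Draw a random sample $X^*_1, \ldots, X^*_{20}$ from $F_n$ (resampling)

2 Compute

$$\bar X^* = \frac{1}{20} \sum_{i=1}^{20} X^*_i$$

3 Repeat steps 1 and 2 a large number $B$ of times, save the results as $\bar X^*_1, \ldots, \bar X^*_B$

4 Compute the standard error (bootex1.R)

$$SE(\bar X) = \sqrt{\frac{1}{B-1} \sum_{i=1}^{B} \left(\bar X^*_i - \bar X^*_{\cdot}\right)^2}$$

where $\bar X^*_{\cdot} = \frac{1}{B} \sum_{i=1}^{B} \bar X^*_i$ is the mean of the $B$ bootstrap means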
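The four steps can be sketched in Python with the standard library only. The slides refer to an R script (bootex1.R) that is not reproduced here, so the function name, the seed, the value of $B$ and the simulated data below are assumptions made for illustration.

```python
import math
import random
import statistics


def bootstrap_se_mean(sample, n_boot=2000, seed=0):
    """Nonparametric bootstrap standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = []
    for _ in range(n_boot):                              # step 3: repeat B times
        resample = rng.choices(sample, k=n)              # step 1: draw from F_n
        boot_means.append(statistics.fmean(resample))    # step 2: resample mean
    return statistics.stdev(boot_means)                  # step 4: divisor B - 1


data_rng = random.Random(42)
x = [data_rng.gauss(10.0, 2.0) for _ in range(20)]       # hypothetical sample, n = 20
se_boot = bootstrap_se_mean(x)
se_formula = statistics.stdev(x) / math.sqrt(len(x))     # textbook s / sqrt(n)
print(se_boot, se_formula)
```

For the mean the bootstrap is not really needed, since $s/\sqrt{n}$ is available in closed form; the comparison is only a plausibility check. The two numbers should be close, with the bootstrap value slightly smaller because $F_n$ has variance $\frac{n-1}{n} s^2$.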


Bootstrap: Example 2

Parametric bootstrap of the bias of

$$\hat\theta = \hat\lambda = \frac{1}{\bar X}$$

for the exponential distribution $X \sim Exp(\lambda)$

Estimation of the unknown distribution function $F$ by

$$F_{\hat\lambda}(x) = 1 - \exp(-\hat\lambda x)$$



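How is $\hat\lambda$ distributed under $F$?

How is $\hat\lambda$ distributed under $\hat F = F_{\hat\lambda}$?

Estimation of the distribution of $\hat\lambda$ under $F_{\hat\lambda}$ by Monte Carlo simulation

Find the expectation of $\hat\lambda$ under $F_{\hat\lambda}$

The distribution of $\hat\lambda$ under $F_{\hat\lambda}$ approximates the distribution of $\hat\lambda$ under $F$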


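Bootstrap: Example 2 (still contd): The algorithm

1 Compute $\hat\lambda = 1/\bar X$ from $X_1, \ldots, X_8$

2 Draw a simple random sample $X^*_1, \ldots, X^*_8$ from $F_{\hat\lambda}$

3 Compute $\hat\lambda^* = 1/\bar X^*$

4 Repeat steps 2 and 3 a large number $B$ of times, save the results as $\hat\lambda^*_1, \ldots, \hat\lambda^*_B$

5 Estimate the bias by (bootex2.R)

$$\left(\frac{1}{B} \sum_{b=1}^{B} \hat\lambda^*_b\right) - \hat\lambda$$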
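A minimal Python sketch of this parametric bootstrap follows. The eight data values, the function name and the choice $B = 5000$ are hypothetical; `random.expovariate` takes the rate $\lambda$ as its argument, which matches the parametrization $F_\lambda(x) = 1 - \exp(-\lambda x)$.

```python
import random
import statistics


def bootstrap_bias_rate(sample, n_boot=5000, seed=0):
    """Parametric bootstrap estimate of the bias of lambda_hat = 1 / mean."""
    rng = random.Random(seed)
    n = len(sample)
    lam_hat = 1.0 / statistics.fmean(sample)                     # step 1
    lam_star = []
    for _ in range(n_boot):                                      # step 4
        resample = [rng.expovariate(lam_hat) for _ in range(n)]  # step 2
        lam_star.append(1.0 / statistics.fmean(resample))        # step 3
    return statistics.fmean(lam_star) - lam_hat, lam_hat         # step 5


x = [0.12, 0.85, 0.33, 1.40, 0.05, 0.61, 0.27, 0.94]   # hypothetical data, n = 8
bias, lam_hat = bootstrap_bias_rate(x)
print(bias, lam_hat / (len(x) - 1))
```

In this special case the simulation can be checked analytically: under $Exp(\hat\lambda)$ the expectation of $1/\bar X^*$ is $n\hat\lambda/(n-1)$, so the bootstrap bias should be close to $\hat\lambda/(n-1)$, the second number printed.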


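Bootstrap: General approach for bootstrap standard errors

original sample $X_1, \ldots, X_n$

−→ edf $\hat F = F_n$ or $\hat F = F_{\hat\theta}$

−→ resamples

1. resample: $X^*_1, \ldots, X^*_n \to \hat\theta^*_1$

2. resample: $X^*_1, \ldots, X^*_n \to \hat\theta^*_2$

...

B. resample: $X^*_1, \ldots, X^*_n \to \hat\theta^*_B$

−→

$$SE(\hat\theta) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat\theta^*_b - \bar\theta^*\right)^2}, \qquad \bar\theta^* = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^*_b$$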


Bootstrap: Bootstrapping confidence intervals

General definition: An interval

$$\left[\hat\theta_{low}(X_1, \ldots, X_n);\ \hat\theta_{high}(X_1, \ldots, X_n)\right]$$

is called a $(1-\alpha)$-confidence interval if

$$P\left(\hat\theta_{low} \le \theta \le \hat\theta_{high}\right) = 1 - \alpha$$

If the equality holds only asymptotically, the interval is called an asymptotic $(1-\alpha)$-confidence interval

Note: The interval limits are random variables


Bootstrap: Naive bootstrap confidence intervals

The naive confidence intervals are sometimes called the “other” percentile method

Generate a large number $B$ of resamples and compute $\hat\theta^*_1, \ldots, \hat\theta^*_B$

Let $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \ldots \le \hat\theta^*_{(B)}$ be the order statistics

The naive $(1-\alpha)$-confidence interval is

$$\left[\hat\theta^*_{((\alpha/2)B)};\ \hat\theta^*_{((1-\alpha/2)B)}\right]$$

Why is this approach often problematic? (bootnaiv.R)


Bootstrap: Percentile bootstrap confidence intervals

To determine confidence intervals we look at the distribution of

$$\hat\theta - \theta$$

Let $c_1$ and $c_2$ be the $\alpha/2$- and $(1-\alpha/2)$-quantiles, i.e.

$$P\left(c_1 \le \hat\theta - \theta \le c_2\right) = 1 - \alpha$$

Then

$$\left[\hat\theta - c_2,\ \hat\theta - c_1\right]$$

is a $(1-\alpha)$-confidence interval



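Approximate the distribution of $\hat\theta - \theta$ by bootstrapping

$$\hat\theta^* - \hat\theta$$

Let $c^*_1$ and $c^*_2$ be the $\alpha/2$- and $(1-\alpha/2)$-quantiles, i.e.

$$P\left(c^*_1 \le \hat\theta^* - \hat\theta \le c^*_2\right) = 1 - \alpha$$

We obtain $c^*_1 = \hat\theta^*_{((\alpha/2)B)} - \hat\theta$ and $c^*_2 = \hat\theta^*_{((1-\alpha/2)B)} - \hat\theta$ and

$$\left[\hat\theta - c^*_2,\ \hat\theta - c^*_1\right] = \left[2\hat\theta - \hat\theta^*_{((1-\alpha/2)B)};\ 2\hat\theta - \hat\theta^*_{((\alpha/2)B)}\right]$$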



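Algorithm of the percentile method:

Compute $\hat\theta$ from the original sample $X_1, \ldots, X_n$

Generate a large number $B$ of resamples and compute $\hat\theta^*_1, \ldots, \hat\theta^*_B$

Let $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \ldots \le \hat\theta^*_{(B)}$ be the order statistics

The bootstrap $(1-\alpha)$-confidence interval is

$$\left[2\hat\theta - \hat\theta^*_{((1-\alpha/2)B)};\ 2\hat\theta - \hat\theta^*_{((\alpha/2)B)}\right]$$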
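The naive interval and the percentile interval, as these slides define them, use the same two order statistics and differ only in how they are placed around $\hat\theta$. The following Python sketch computes both from one set of nonparametric resamples; the function name, the simulated data and the use of the sample mean as the statistic are assumptions for illustration.

```python
import random
import statistics


def bootstrap_intervals(sample, stat, alpha=0.05, n_boot=2000, seed=0):
    """Return (theta_hat, naive interval, percentile interval)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = stat(sample)
    theta_star = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Order statistics (alpha/2)B and (1 - alpha/2)B, shifted to 0-based indices.
    q_low = theta_star[round(alpha / 2 * n_boot) - 1]
    q_high = theta_star[round((1 - alpha / 2) * n_boot) - 1]
    naive = (q_low, q_high)
    percentile = (2 * theta_hat - q_high, 2 * theta_hat - q_low)
    return theta_hat, naive, percentile


data_rng = random.Random(1)
x = [data_rng.expovariate(1.0) for _ in range(30)]    # hypothetical skewed sample
theta_hat, naive, percentile = bootstrap_intervals(x, statistics.fmean)
print(theta_hat, naive, percentile)
```

Both intervals have the same length; the percentile interval is the naive interval reflected at $\hat\theta$, so the two coincide only when the bootstrap distribution is symmetric around $\hat\theta$.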


Bootstrap: Example 3

Parametric bootstrap 0.95-confidence interval for $\lambda$ of an exponential distribution

Estimate $\lambda$ by $\hat\lambda = 1/\bar X$

Estimate the unknown distribution function $F$ by

$$F_{\hat\lambda}(x) = 1 - \exp(-\hat\lambda x)$$



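The algorithm (bootex3.R)

1 Compute $\hat\lambda = 1/\bar X$ from $X_1, \ldots, X_8$

2 Draw a simple random sample $X^*_1, \ldots, X^*_8$ from $F_{\hat\lambda}$

3 Compute $\hat\lambda^* = 1/\bar X^*$

4 Repeat steps 2 and 3 a large number $B$ of times, save the results as $\hat\lambda^*_1, \ldots, \hat\lambda^*_B$

5 The bootstrap 0.95-confidence interval (i.e. $\alpha = 0.05$) is

$$\left[2\hat\lambda - \hat\lambda^*_{((1-\alpha/2)B)};\ 2\hat\lambda - \hat\lambda^*_{((\alpha/2)B)}\right]$$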
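The five steps translate into a few lines of Python. This is a sketch under assumptions: the data values, the function name and $B = 4000$ are hypothetical and are not taken from bootex3.R.

```python
import random
import statistics


def exp_rate_ci(sample, alpha=0.05, n_boot=4000, seed=0):
    """Parametric bootstrap (1 - alpha) interval for the rate of Exp(lambda)."""
    rng = random.Random(seed)
    n = len(sample)
    lam_hat = 1.0 / statistics.fmean(sample)                        # step 1
    lam_star = sorted(                                              # steps 2 to 4
        1.0 / statistics.fmean([rng.expovariate(lam_hat) for _ in range(n)])
        for _ in range(n_boot)
    )
    q_low = lam_star[round(alpha / 2 * n_boot) - 1]
    q_high = lam_star[round((1 - alpha / 2) * n_boot) - 1]
    return lam_hat, (2 * lam_hat - q_high, 2 * lam_hat - q_low)     # step 5


x = [0.12, 0.85, 0.33, 1.40, 0.05, 0.61, 0.27, 0.94]   # hypothetical data, n = 8
lam_hat, ci = exp_rate_ci(x)
print(lam_hat, ci)
```

With only eight observations the distribution of $\hat\lambda^*$ is strongly right-skewed, so the lower limit of this interval can fall below zero even though $\lambda > 0$. That is a property of the interval construction, not a bug in the code.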


Bootstrap: Hypothesis testing

Test the hypotheses

$$H_0: \theta = \theta_0$$

$$H_1: \theta \ne \theta_0$$

at significance level $\alpha$

Assumption: Random sample (univariate or multivariate)

Test statistic

$$T = \hat\theta - \theta_0$$



Reject $H_0$ if the value of the test statistic is less than the $\alpha/2$-quantile of $T$ or greater than the $(1-\alpha/2)$-quantile of $T$

The p-value of the test is $P(|T| > |t|)$, where $t$ is the observed value of the test statistic

How can we estimate the distribution of $T$ under $H_0$?

Wald approach: bootstrap distribution

$$T^* = \hat\theta^* - \hat\theta$$

$\hat\theta^* = \hat\theta(X^*_1, \ldots, X^*_n)$ is calculated from resamples drawn under the alternative hypothesis



Lagrange multiplier approach: bootstrap distribution

$$T^\# = \hat\theta^\# - \theta_0$$

Attention: $\hat\theta^\# = \hat\theta(X^\#_1, \ldots, X^\#_n)$ is calculated from resamples drawn under the null hypothesis!

This approach is particularly suitable for the parametric bootstrap (but can also be used for other bootstraps)


Bootstrap: Hypothesis testing: General algorithm

1 Compute the test statistic $T$ from $X_1, \ldots, X_n$

2 Draw a resample under the null hypothesis, $X^\#_1, \ldots, X^\#_n$, or draw a resample under the alternative hypothesis, $X^*_1, \ldots, X^*_n$

3 Compute the test statistic $T^*$ or $T^\#$ for the resample

4 Repeat steps 2 and 3 a large number $B$ of times; save the results as $T^\#_1, \ldots, T^\#_B$ or $T^*_1, \ldots, T^*_B$

5 Calculate the $\alpha/2$-quantile $c^\#_1$ (or $c^*_1$) and the $(1-\alpha/2)$-quantile $c^\#_2$ (or $c^*_2$)

6 Reject $H_0$ if the test statistic $T$ is less than $c^\#_1$ (or $c^*_1$) or greater than $c^\#_2$ (or $c^*_2$)


Bootstrap: Example 4

Parametric bootstrap for the parameter $\lambda$ of an exponential distribution $X \sim Exp(\lambda)$

Random sample $X_1, \ldots, X_8$

Hypotheses $H_0: \lambda = \lambda_0 = 2$ against $H_1: \lambda \ne \lambda_0$ (at level $\alpha = 0.05$)

Test statistic

$$T = \hat\lambda - 2$$

Bootstrap of the distribution of $T$ under the alternative hypothesis (Wald approach) (bootex4a.R)



Bootstrap of the distribution of $T$ under the null hypothesis (LM approach) (bootex4b.R)

Under the null hypothesis, $X^\# \sim Exp(\lambda_0)$ with $\lambda_0 = 2$

Hence, the distribution of $T^\#$ is found by an ordinary Monte Carlo simulation!

If $T < T^\#_{((\alpha/2)B)}$ or $T > T^\#_{((1-\alpha/2)B)}$, reject $H_0$
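The LM approach for this example can be sketched in Python as below. Because the null hypothesis fixes the distribution completely, the resamples do not depend on the data at all. The two data sets, the function name and $B = 4000$ are hypothetical choices for illustration and are not taken from bootex4b.R.

```python
import random
import statistics


def lm_bootstrap_test(sample, lam0=2.0, alpha=0.05, n_boot=4000, seed=0):
    """Two-sided test of H0: lambda = lam0, resampling under the null."""
    rng = random.Random(seed)
    n = len(sample)
    t_obs = 1.0 / statistics.fmean(sample) - lam0              # T = lambda_hat - lam0
    t_null = sorted(                                           # T# from Exp(lam0)
        1.0 / statistics.fmean([rng.expovariate(lam0) for _ in range(n)]) - lam0
        for _ in range(n_boot)
    )
    c1 = t_null[round(alpha / 2 * n_boot) - 1]
    c2 = t_null[round((1 - alpha / 2) * n_boot) - 1]
    return t_obs, c1, c2, (t_obs < c1 or t_obs > c2)


x_a = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]     # hypothetical data, mean 0.5
x_b = [10 * v for v in x_a]                        # hypothetical data, mean 5
t_a, c1_a, c2_a, reject_a = lm_bootstrap_test(x_a)
t_b, c1_b, c2_b, reject_b = lm_bootstrap_test(x_b)
print(t_a, reject_a, t_b, reject_b)
```

The first sample has $\hat\lambda = 2$, so $T = 0$ and $H_0$ is retained; the second has $\hat\lambda = 0.2$, far below the lower critical value, so $H_0$ is rejected.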


Bootstrap: Example 5

Nonparametric test for equality of two expectations

Two independent variables $X$ and $Y$ with expectations $\mu_X$, $\mu_Y$ and unknown variances $\sigma^2_X$, $\sigma^2_Y$

Hypotheses $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \ne \mu_Y$

Samples $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$

Test statistic

$$T = \frac{\hat\mu_X - \hat\mu_Y}{\sqrt{\hat\sigma^2_X + \hat\sigma^2_Y}}$$



Case I: resampling under the alternative hypothesis (bootex5a.R)

Draw $X^*_1, \ldots, X^*_m$ with replacement from $X_1, \ldots, X_m$ and $Y^*_1, \ldots, Y^*_n$ with replacement from $Y_1, \ldots, Y_n$

Compute the test statistic $T^*$

Repeat this $B$ times; calculate the quantiles of $T^*$

Reject $H_0$ at level $\alpha = 0.05$ if $T < T^*_{(0.025B)}$ or $T > T^*_{(0.975B)}$


Bootstrap: Example 5 (still contd)

Case II: resampling under the null hypothesis (bootex5b.R)

Estimate the joint expectation by

$$\hat\mu = \frac{m \hat\mu_X + n \hat\mu_Y}{n + m}$$

Translate $X_1, \ldots, X_m$ such that their mean is $\hat\mu$

Translate $Y_1, \ldots, Y_n$ such that their mean is $\hat\mu$

Resample from the translated data (i.e. under the null hypothesis); then continue as before
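Case II can be sketched in Python as follows. The test statistic is coded as the mean difference divided by the square root of the sum of the two sample variances, which is how the formula reads in these slides; the original may scale the variances differently, so treat that denominator as an assumption. Function names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def t_stat(x, y):
    """Mean difference divided by sqrt(var_x + var_y)."""
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        statistics.variance(x) + statistics.variance(y))


def two_sample_null_bootstrap(x, y, n_boot=2000, seed=0):
    """Test H0: mu_X = mu_Y by resampling from data translated to a common mean."""
    rng = random.Random(seed)
    m, n = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    mu = (m * mx + n * my) / (m + n)             # joint expectation
    x0 = [v - mx + mu for v in x]                # translated X, mean mu
    y0 = [v - my + mu for v in y]                # translated Y, mean mu
    t_obs = t_stat(x, y)
    t_null = sorted(t_stat(rng.choices(x0, k=m), rng.choices(y0, k=n))
                    for _ in range(n_boot))
    c1 = t_null[round(0.025 * n_boot) - 1]
    c2 = t_null[round(0.975 * n_boot) - 1]
    return t_obs, c1, c2, (t_obs < c1 or t_obs > c2)


data_rng = random.Random(11)
x = [data_rng.gauss(0.0, 1.0) for _ in range(30)]    # hypothetical sample, mean 0
y = [data_rng.gauss(3.0, 1.0) for _ in range(25)]    # hypothetical sample, mean 3
t_obs, c1, c2, reject = two_sample_null_bootstrap(x, y)
t_same, c1_same, c2_same, reject_same = two_sample_null_bootstrap(x, list(x))
print(t_obs, (c1, c2), reject, reject_same)
```

Translating both samples leaves their shapes and variances untouched and changes only the location, so the resamples obey $H_0$ without assuming equal variances.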


Bootstrap: Example 6

Nonparametric bootstrap for independence

Bivariate distribution $(X, Y)$

Hypothesis $H_0$: $X$ and $Y$ are stochastically independent

Sample $(X_1, Y_1), \ldots, (X_n, Y_n)$

Test statistic: Empirical coefficient of correlation

$$T = \widehat{Corr}(X, Y) = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}}$$



Resampling under the null hypothesis (bootex6.R)

Draw $X^\#_1, \ldots, X^\#_n$ with replacement from $X_1, \ldots, X_n$

Independently, draw $Y^\#_1, \ldots, Y^\#_n$ with replacement from $Y_1, \ldots, Y_n$

Bootstrap distribution of

$$T^\# = \widehat{Corr}(X^\#, Y^\#)$$

Reject $H_0$ if $T < T^\#_{(0.025B)}$ or $T > T^\#_{(0.975B)}$
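Drawing the two coordinates independently destroys any dependence between them, which is what imposes $H_0$. A short Python sketch of the test, with hypothetical function names and simulated data:

```python
import math
import random
import statistics


def corr(x, y):
    """Empirical correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def independence_test(x, y, n_boot=2000, seed=0):
    """Bootstrap test of independence, resampling X and Y separately."""
    rng = random.Random(seed)
    n = len(x)
    t_obs = corr(x, y)
    # Independent draws of the two coordinates impose the null hypothesis.
    t_null = sorted(corr(rng.choices(x, k=n), rng.choices(y, k=n))
                    for _ in range(n_boot))
    c1 = t_null[round(0.025 * n_boot) - 1]
    c2 = t_null[round(0.975 * n_boot) - 1]
    return t_obs, c1, c2, (t_obs < c1 or t_obs > c2)


data_rng = random.Random(21)
x = [data_rng.gauss(0.0, 1.0) for _ in range(50)]          # hypothetical data
y = [xi + 0.3 * data_rng.gauss(0.0, 1.0) for xi in x]      # strongly dependent on x
t_obs, c1, c2, reject = independence_test(x, y)
print(t_obs, (c1, c2), reject)
```

Note that the correlation coefficient only detects linear dependence, so the test has little power against dependence that leaves the correlation near zero.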


Bootstrap: Resampling methods: Parametric bootstrap

Parametric bootstrap under the alternative hypothesis

1 Estimate $\theta$ from the original data $X_1, \ldots, X_n$

2 The estimated distribution function is $\hat F = F_{\hat\theta}$

3 Draw $X^*_1, \ldots, X^*_n$ from $F_{\hat\theta}$ and compute $\hat\theta^*$

4 Repeat step 3 a large number of times to determine the required distribution


Bootstrap: Resampling methods: Parametric bootstrap

Parametric bootstrap under the null hypothesis

1 The estimated distribution function is $\hat F = F_{\theta_0}$. If the distribution function is not completely specified by $\theta_0$, choose $\hat F$ “as close as possible” to $\hat\theta$

2 Draw $X^\#_1, \ldots, X^\#_n$ from $F_{\theta_0}$ and compute $\hat\theta^\#$



Bootstrap: Resampling methods: Nonparametric bootstrap

Nonparametric bootstrap under the alternative hypothesis

1 The estimated distribution function is $\hat F = F_n$ (empirical distribution function)

2 Draw $X^*_1, \ldots, X^*_n$ with replacement from $X_1, \ldots, X_n$ and compute $\hat\theta^*$



Bootstrap: Resampling methods: Nonparametric bootstrap

Nonparametric bootstrap under the null hypothesis

1 The estimated distribution function $\hat F$ is a weighted empirical distribution function

2 Draw $X^\#_1, \ldots, X^\#_n$ with replacement (but with different probabilities) from $X_1, \ldots, X_n$

The probabilities are chosen such that $\hat F$ satisfies $H_0$. If they are not unique, choose an optimality criterion, e.g. maximal entropy



Bootstrap: Resampling methods: Smooth bootstrap

Smooth bootstrap under the alternative hypothesis

Kernel density estimation (e.g. with Gaussian kernel $\phi$)

$$\hat f_X(x) = \frac{1}{nh} \sum_{i=1}^{n} \phi\left(\frac{x - X_i}{h}\right)$$

Estimated distribution function $\hat F(x) = \int_{-\infty}^{x} \hat f_X(z)\,dz$

Draw $X^*_1, \ldots, X^*_n$ from $\hat F(x)$


Bootstrap: Resampling methods: Smooth bootstrap

Drawing from $\hat F(x)$ is equivalent to the following method:

1 Draw $Z_1, \ldots, Z_n$ with replacement from $X_1, \ldots, X_n$

2 Draw $\varepsilon_1, \ldots, \varepsilon_n$ from a standard normal distribution

3 For $i = 1, \ldots, n$, compute

$$X^*_i = Z_i + h \varepsilon_i$$

Smooth bootstrap: nonparametric bootstrap with additional noise
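The three-step recipe avoids computing $\hat F$ explicitly. A minimal Python sketch follows, applied to the standard error of the sample median. The slides do not say how to choose the bandwidth $h$; Silverman's rule of thumb is used here as an assumption, and the function name and data are hypothetical.

```python
import random
import statistics


def smooth_resample(sample, h, rng):
    """Draw one smooth-bootstrap resample: X*_i = Z_i + h * eps_i."""
    z = rng.choices(sample, k=len(sample))               # step 1
    return [zi + h * rng.gauss(0.0, 1.0) for zi in z]    # steps 2 and 3


rng = random.Random(5)
x = [rng.gauss(0.0, 1.0) for _ in range(25)]             # hypothetical sample
# Silverman's rule of thumb for the bandwidth (an assumption, not from the slides).
h = 1.06 * statistics.stdev(x) * len(x) ** (-1 / 5)
medians = [statistics.median(smooth_resample(x, h, rng)) for _ in range(2000)]
se_median = statistics.stdev(medians)
print(h, se_median)
```

Setting `h = 0.0` switches the noise off and reduces the procedure to the ordinary nonparametric bootstrap.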


Bootstrap: Warning

The bootstrap approximates the distribution of $\hat\theta$ (or some transformations of $\hat\theta$) if the model is correctly specified

Bias due to misspecification cannot be found by bootstrapping!

Example: Errors-in-variables, omitted variables

The validity of the bootstrap approximation can usually be shown only asymptotically, i.e. for $B \to \infty$ and $n \to \infty$

Experience shows that the bootstrap often yields good approximations of the small-sample distribution of $\hat\theta$


Bootstrap: Regression

Simple linear regression model

$$y_i = \alpha + \beta x_i + u_i$$

for $i = 1, \ldots, n$ with i.i.d. error terms $u_i$

Let $E(u_i | x_i) = 0$ for all $i = 1, \ldots, n$

OLS estimator of $\beta$ is

$$\hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}$$


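Bootstrap: Regression

OLS estimator of $\alpha$ is $\hat\alpha = \bar y - \hat\beta \bar x$

Fitted values

$$\hat y_i = \hat\alpha + \hat\beta x_i$$

Residuals

$$\hat u_i = y_i - \hat y_i$$

Estimated error term variance

$$\hat\sigma^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2$$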


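Bootstrap: Regression

How can we construct a $(1-\alpha)$-confidence interval for $\beta$? (Here $\alpha$ denotes the error level, not the intercept)

Usual approach: Normal approximation, shown for the 0.95 level

$$\left[\hat\beta - 1.96 \cdot SE(\hat\beta);\ \hat\beta + 1.96 \cdot SE(\hat\beta)\right]$$

with standard errors $SE(\hat\beta) = \sqrt{\hat\sigma^2 / \sum (x_i - \bar x)^2}$

Alternative method (1): bootstrap the residuals

Alternative method (2): bootstrap the observations $(x_i, y_i)$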


Bootstrap: Regression

Bootstrap the residuals

The unknown distribution function $F$ is the distribution function of the error terms

The estimated distribution function $\hat F$ is the (parametrically or nonparametrically) estimated distribution function of the residuals $\hat u_1, \ldots, \hat u_n$

The x-values are kept constant

Only the error terms are resampled


Bootstrap: Regression

Algorithm (nonparametric) (bootregr1.R)

1 Estimate the model ($\hat\alpha$, $\hat\beta$) from the data and calculate $\hat u_1, \ldots, \hat u_n$

2 Draw a resample $u^*_1, \ldots, u^*_n$ with replacement from $\hat u_1, \ldots, \hat u_n$

3 For $i = 1, \ldots, n$ generate

$$y^*_i = \hat\alpha + \hat\beta x_i + u^*_i$$

4 Compute $\hat\beta^*$ from $(x_1, y^*_1), \ldots, (x_n, y^*_n)$

5 Proceed as usual
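A Python sketch of the residual bootstrap follows, where "proceed as usual" is taken to mean computing the bootstrap standard error of $\hat\beta$. That reading, the function names and the simulated data are assumptions; the referenced R script is not reproduced here.

```python
import random
import statistics


def ols(x, y):
    """Return (alpha_hat, beta_hat) of the simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    return my - beta * mx, beta


def residual_bootstrap_se(x, y, n_boot=2000, seed=0):
    """Bootstrap standard error of beta_hat, resampling the OLS residuals."""
    rng = random.Random(seed)
    alpha_hat, beta_hat = ols(x, y)                                      # step 1
    resid = [b - alpha_hat - beta_hat * a for a, b in zip(x, y)]
    betas = []
    for _ in range(n_boot):
        u_star = rng.choices(resid, k=len(x))                            # step 2
        y_star = [alpha_hat + beta_hat * a + u for a, u in zip(x, u_star)]  # step 3
        betas.append(ols(x, y_star)[1])                                  # step 4, x fixed
    return beta_hat, statistics.stdev(betas)


data_rng = random.Random(3)
x = [data_rng.uniform(0.0, 10.0) for _ in range(40)]           # hypothetical regressor
y = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x]    # true slope 2
beta_hat, se_boot = residual_bootstrap_se(x, y)
print(beta_hat, se_boot)
```

The same list of $\hat\beta^*$ values could instead be sorted to form the percentile interval $[2\hat\beta - \hat\beta^*_{((1-\alpha/2)B)};\ 2\hat\beta - \hat\beta^*_{((\alpha/2)B)}]$.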


Bootstrap: Regression

Bootstrap of the observations

The unknown distribution function $F$ is the joint distribution function of $(x_i, y_i)$

The estimated distribution function $\hat F$ is the (usually nonparametrically) estimated multivariate distribution function of the observations $(x_1, y_1), \ldots, (x_n, y_n)$

The x-values are different in each resample


Bootstrap: Regression

Algorithm (bootregr2.R)

1 Estimate $\hat\beta$ from the data

2 Draw a resample $(x^*_1, y^*_1), \ldots, (x^*_n, y^*_n)$ with replacement from $(x_1, y_1), \ldots, (x_n, y_n)$

3 Compute $\hat\beta^*$ from $(x^*_1, y^*_1), \ldots, (x^*_n, y^*_n)$

4 Proceed as usual
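The bootstrap of the observations resamples whole pairs, so each resample has its own x-values. In the Python sketch below, "proceed as usual" is interpreted as forming the percentile interval defined earlier in these slides; that interpretation, the function names and the simulated data are assumptions.

```python
import random
import statistics


def ols_slope(pairs):
    """OLS slope from a list of (x, y) pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((a - mx) * (b - my) for a, b in pairs)
            / sum((a - mx) ** 2 for a in xs))


def pairs_bootstrap_ci(pairs, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-method interval for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(pairs)
    beta_hat = ols_slope(pairs)                                       # step 1
    beta_star = sorted(ols_slope(rng.choices(pairs, k=n))             # steps 2 and 3
                       for _ in range(n_boot))
    q_low = beta_star[round(alpha / 2 * n_boot) - 1]
    q_high = beta_star[round((1 - alpha / 2) * n_boot) - 1]
    return beta_hat, (2 * beta_hat - q_high, 2 * beta_hat - q_low)


data_rng = random.Random(7)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(40)]                 # hypothetical data
pairs = [(xi, 1.0 + 2.0 * xi + data_rng.gauss(0.0, 1.0)) for xi in xs]
beta_hat, ci = pairs_bootstrap_ci(pairs)
print(beta_hat, ci)
```

Unlike the residual bootstrap, this variant does not rely on the error terms being identically distributed across observations, since each error stays attached to its own x-value.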

