Advanced Econometrics
Dr. Andrea Beccarini
Center for Quantitative Economics
Winter 2013/2014
Andrea Beccarini (CQE) Econometrics Winter 2013/2014 1 / 156
General information: Aims and prerequisites
Objective: learn to understand and use advanced econometric estimation techniques
Applications in micro and macro econometrics and finance
Prerequisites: Statistical Foundations (random vectors, stochastic convergence, estimators)
General information: Literature
Russell Davidson and James MacKinnon, Econometric Theory and Methods, Oxford University Press, 2004.
Various textbooks
General information: Schedule
Least squares estimation and method of moments
Maximum likelihood estimation
Instrumental variables estimation
GMM
Indirect inference
Least squares: Linear regression
Multiple linear regression model
$$ y = X\beta + u, \qquad u \sim N(0, \sigma^2 I) $$
OLS estimator
$$ \hat\beta = (X'X)^{-1} X'y $$
Covariance matrix
$$ \mathrm{Cov}(\hat\beta) = \sigma^2 (X'X)^{-1} $$
Gauss-Markov theorem
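The OLS formulas above can be sketched for the special case of one regressor plus an intercept, where the normal equations reduce to a 2x2 system. This is a minimal illustration, not course material: the function name `ols_simple` and the data are made up, and $\sigma^2$ is replaced by the usual unbiased estimate.

```python
def ols_simple(x, y):
    """OLS for y_t = b0 + b1 * x_t + u_t via the normal equations (X'X) b = X'y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx                     # determinant of X'X
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)    # estimate of sigma^2
    # Estimated Cov(beta_hat) = s2 * (X'X)^{-1}
    cov = [[s2 * sxx / det, -s2 * sx / det],
           [-s2 * sx / det, s2 * n / det]]
    return (b0, b1), cov


# Hypothetical data, roughly on the line y = 2x
(b0_hat, b1_hat), cov_hat = ols_simple([1.0, 2.0, 3.0, 4.0, 5.0],
                                       [2.1, 3.9, 6.2, 7.8, 10.1])
print(b0_hat, b1_hat, cov_hat)
```

With several regressors the same computation would be done with a linear algebra library; the closed form is used here only to keep the example self-contained.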
Least squares: Nonlinear regression
Notation of Davidson and MacKinnon (2004):
$$ y_t = x_t(\beta) + u_t, \qquad u_t \sim \mathrm{IID}(0, \sigma^2) $$
$x_t(\beta)$ is a nonlinear function of the parameter vector $\beta$
Example:
$$ y_t = \beta_1 + \beta_2 x_{t1} + \frac{1}{\beta_2} x_{t2} + u_t $$
Least squares: Nonlinear regression
Minimize the sum of squared residuals
$$ \sum_{t=1}^{T} \left( y_t - x_t(\beta) \right)^2 $$
with respect to $\beta$
Usually, the minimization must be done numerically
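As a rough sketch of such a numerical minimization, consider the example model $y_t = \beta_1 + \beta_2 x_{t1} + \frac{1}{\beta_2} x_{t2} + u_t$. For a given $\beta_2$ the optimal $\beta_1$ is a simple mean, so the problem reduces to a one-dimensional search over $\beta_2$, done here by golden-section search. The data are made up and noise-free, the search interval is an assumption, and golden-section search is only one of many possible optimizers (it presumes the concentrated objective is unimodal on the interval, which holds for this particular data set).

```python
import math


def concentrated_ssr(b2, x1, x2, y):
    """SSR of y_t = b1 + b2*x_t1 + (1/b2)*x_t2 + u_t with b1 concentrated out."""
    partial = [yi - b2 * a - c / b2 for yi, a, c in zip(y, x1, x2)]
    b1 = sum(partial) / len(partial)            # optimal intercept for this b2
    return sum((p - b1) ** 2 for p in partial), b1


def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):                         # minimum lies in [lo, d]
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:                                   # minimum lies in [c, hi]
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


# Hypothetical noise-free data generated with beta1 = 1, beta2 = 2
x1 = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
y = [1.0 + 2.0 * a + c / 2.0 for a, c in zip(x1, x2)]

b2_hat = golden_section(lambda b: concentrated_ssr(b, x1, x2, y)[0], 0.2, 10.0)
b1_hat = concentrated_ssr(b2_hat, x1, x2, y)[1]
print(b1_hat, b2_hat)                           # should be close to 1 and 2
```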
Method of moments: Definition of moments
Raw moment of order $p$
$$ \mu_p = E(X^p) $$
Empirical raw moment of order $p$
$$ \hat\mu_p = \frac{1}{n} \sum_{i=1}^{n} X_i^p $$
for a simple random sample $X_1, \ldots, X_n$
Method of moments: Basic idea (Step 1)
Write $r$ theoretical moments as functions of $r$ unknown parameters
$$ \begin{aligned} \mu_1 &= g_1(\theta_1, \ldots, \theta_r) \\ &\;\;\vdots \\ \mu_r &= g_r(\theta_1, \ldots, \theta_r) \end{aligned} $$
Of course, central moments may be used as well
Method of moments: Basic idea (Step 2)
Invert the system of equations: write the $r$ unknown parameters as functions of the $r$ theoretical moments
$$ \begin{aligned} \theta_1 &= h_1(\mu_1, \ldots, \mu_r) \\ &\;\;\vdots \\ \theta_r &= h_r(\mu_1, \ldots, \mu_r) \end{aligned} $$
Method of moments: Basic idea (Step 3)
Replace all theoretical moments by empirical moments
$$ \begin{aligned} \hat\theta_1 &= h_1(\hat\mu_1, \ldots, \hat\mu_r) \\ &\;\;\vdots \\ \hat\theta_r &= h_r(\hat\mu_1, \ldots, \hat\mu_r) \end{aligned} $$
The estimators $\hat\theta_1, \ldots, \hat\theta_r$ are moment estimators
Method of moments: Properties of moment estimators
Moment estimators are consistent (for continuous $h_1$) since
$$ \begin{aligned} \operatorname{plim} \hat\theta_1 &= \operatorname{plim} h_1(\hat\mu_1, \hat\mu_2, \ldots) \\ &= h_1(\operatorname{plim} \hat\mu_1, \operatorname{plim} \hat\mu_2, \ldots) \\ &= h_1(\mu_1, \mu_2, \ldots) \\ &= \theta_1 \end{aligned} $$
In general, moment estimators are neither unbiased nor efficient
Since the empirical moments are asymptotically normal (why?), moment estimators are also asymptotically normal → delta method [P]
Method of moments: Example
Let $X \sim \mathrm{Exp}(\lambda)$ with unknown parameter $\lambda$ and let $X_1, \ldots, X_n$ be a random sample
Step 1: We know that $E(X) = \mu_1 = 1/\lambda$
Step 2 (inversion): $\lambda = 1/\mu_1$
Step 3: The estimator is
$$ \hat\lambda = \frac{1}{\hat\mu_1} = \frac{1}{\frac{1}{n} \sum_i X_i} = \frac{1}{\bar X_n} $$
Is $\hat\lambda$ unbiased?
Alternative: $\mathrm{Var}(X) = 1/\lambda^2$, then $\hat\lambda = 1/\sqrt{S^2}$
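Both moment estimators from the exponential example can be tried on simulated data. The following is only a sketch: the true value $\lambda = 2$, the sample size and the seed are arbitrary choices, not values from the course.

```python
import random
import statistics

random.seed(1)
lam_true = 2.0                                  # illustrative value
sample = [random.expovariate(lam_true) for _ in range(100_000)]

# Estimator based on the first moment, E(X) = 1/lambda
lam_from_mean = 1.0 / (sum(sample) / len(sample))

# Alternative based on the variance, Var(X) = 1/lambda^2 (S^2 = sample variance)
lam_from_var = 1.0 / statistics.variance(sample) ** 0.5

print(lam_from_mean, lam_from_var)              # both should be near lam_true
```

The two estimators generally differ in finite samples, which illustrates that moment estimators are not unique.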
Maximum likelihood: Basic idea
The basic idea is very natural:
Choose the parameters such that the probability (likelihood) of the observations $x_1, \ldots, x_n$ as a function of the unknown parameters $\theta_1, \ldots, \theta_r$ is maximized
Likelihood function
$$ L(\theta; x_1, \ldots, x_n) = \begin{cases} P(X_1 = x_1, \ldots, X_n = x_n; \theta) & \text{(discrete case)} \\ f_{X_1, \ldots, X_n}(x_1, \ldots, x_n; \theta) & \text{(continuous case)} \end{cases} $$
Maximum likelihood: Basic idea
For simple random samples
$$ L(\theta; x_1, \ldots, x_n) = \prod_{i=1}^{n} f_X(x_i; \theta) $$
Maximize the likelihood
$$ L(\hat\theta; x_1, \ldots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n) $$
ML estimate: $\hat\theta = \arg\max L(\theta; x_1, \ldots, x_n)$
ML estimator: $\hat\theta = \arg\max L(\theta; X_1, \ldots, X_n)$
Maximum likelihood: Basic idea
Because sums are easier to deal with than products, and because sums are subject to limit laws, it is common to maximize the log-likelihood
$$ \ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta) $$
The ML estimator is the same as before, since
$$ \hat\theta = \arg\max \ln L(\theta; X_1, \ldots, X_n) = \arg\max L(\theta; X_1, \ldots, X_n) $$
Maximum likelihood: Basic idea
Usually, we find $\hat\theta$ by solving the system of equations
$$ \begin{aligned} \partial \ln L / \partial\theta_1 &= 0 \\ &\;\;\vdots \\ \partial \ln L / \partial\theta_r &= 0 \end{aligned} $$
The gradient vector $g(\theta) = \partial \ln L(\theta) / \partial\theta$ is called the score vector or score
If the log-likelihood is not differentiable, other maximization methods must be used
Maximum likelihood: Example
Let $X \sim \mathrm{Exp}(\lambda)$ with density $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $f(x; \lambda) = 0$ else
Likelihood of an i.i.d. random sample
$$ L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} $$
Log-likelihood
$$ \ln L(\lambda; x_1, \ldots, x_n) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i $$
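Maximum likelihood: Example
Set the derivative to zero
$$ \frac{\partial \ln L(\lambda)}{\partial\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \overset{!}{=} 0, $$
hence
$$ \hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar x} $$
The ML estimator for $\lambda$ is
$$ \hat\lambda = \frac{1}{\bar X} $$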
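The closed-form result $\hat\lambda = 1/\bar x$ can be cross-checked by solving the score equation numerically. The sketch below uses Newton's method on the score $n/\lambda - \sum_i x_i$; the observations, the starting value and the number of steps are made-up choices, and the starting value must lie between 0 and $2/\bar x$ for this iteration to converge.

```python
import math


def loglik(lam, x):
    """Exponential log-likelihood: n*ln(lam) - lam*sum(x)."""
    return len(x) * math.log(lam) - lam * sum(x)


def newton_ml(x, lam=0.1, steps=50):
    """Solve the score equation n/lam - sum(x) = 0 by Newton's method."""
    n, s = len(x), sum(x)
    for _ in range(steps):
        score = n / lam - s                     # first derivative of ln L
        hessian = -n / lam ** 2                 # second derivative of ln L
        lam -= score / hessian
    return lam


x = [0.3, 1.2, 0.7, 0.1, 2.4, 0.9]              # made-up observations
lam_hat = newton_ml(x)
print(lam_hat, len(x) / sum(x))                 # numerical vs. closed form
```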
Maximum likelihood: Properties of ML estimators (Preliminaries)
The log-likelihood and the score vector are
$$ \ln L(\theta) = \sum_{i=1}^{n} \ln f_X(X_i; \theta) $$
$$ \frac{\partial \ln L(\theta)}{\partial\theta} = \sum_{i=1}^{n} \frac{\partial \ln f_X(X_i; \theta)}{\partial\theta} $$
The contributions $\ln f_X(X_i; \theta)$ are random variables
The contributions $\partial \ln f_X(X_i; \theta) / \partial\theta$ are random vectors
Hence, limit laws can be applied to the (normalized) sums
Maximum likelihood: Properties of ML estimators (Preliminaries)
For all $\theta$
$$ \int e^{\ln L(\theta)}\,dx = \int L(\theta; x_1, \ldots, x_n)\,dx = 1 $$
since $L(\theta)$ is a joint density function of $X_1, \ldots, X_n$
Maximum likelihood: Properties of ML estimators (Preliminaries)
Define the matrix $G(\theta, X_1, \ldots, X_n)$ of gradient contributions
$$ G_{ij}(\theta, X_i) = \frac{\partial \ln f_X(X_i; \theta)}{\partial\theta_j} $$
The column sums are the gradient vector with elements
$$ g_j(\theta) = \sum_{i=1}^{n} G_{ij}(\theta, X_i) $$
The expected gradient vector is $E_\theta(g(\theta)) = 0$ [P]
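One way to see why the expected score is zero is to differentiate the identity $\int L(\theta; x)\,dx = 1$ with respect to $\theta$. The sketch below assumes regularity conditions that allow differentiation and integration to be interchanged (in particular, the support of $X$ must not depend on $\theta$); it is not necessarily the proof given in the lecture.

```latex
\begin{align*}
\int L(\theta; x)\,dx &= 1 \\
\Rightarrow\quad \int \frac{\partial L(\theta; x)}{\partial\theta}\,dx &= 0 \\
\Rightarrow\quad \int \frac{\partial \ln L(\theta; x)}{\partial\theta}\,L(\theta; x)\,dx &= 0
  \qquad \text{since } \frac{\partial L}{\partial\theta} = \frac{\partial \ln L}{\partial\theta}\,L \\
\Rightarrow\quad E_\theta\big(g(\theta)\big) &= 0
\end{align*}
```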
Maximum likelihood: Properties of ML estimators (Preliminaries)
The covariance matrix of the gradient vector
$$ \mathrm{Cov}(g(\theta)) = E\left( g(\theta)\, g(\theta)' \right) $$
is called the information matrix (and often denoted $I(\theta)$)
Information matrix equality [P]
$$ \mathrm{Cov}(g(\theta)) = -E(H(\theta)) $$
$$ \mathrm{Cov}\left( \frac{\partial \ln L(\theta)}{\partial\theta} \right) = -E\left( \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'} \right) $$
Maximum likelihood: Properties of ML estimators
1 Equivariance: If $\hat\theta$ is the ML estimator for $\theta$, then $h(\hat\theta)$ is the ML estimator for $h(\theta)$
2 Consistency: $\operatorname{plim} \hat\theta_n = \theta$
3 Asymptotic normality:
$$ \sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} U \sim N(0, V(\theta)) $$
4 Asymptotic efficiency: $V(\theta)$ is the Cramér-Rao bound
5 Computability (analytical or numerical); the covariance matrix of the estimator is a by-product of the numerical method
Maximum likelihood: Properties of ML estimators
Equivariance:
Let $\hat\theta$ be the ML estimator of $\theta$
Let $\psi = h(\theta)$ be a one-to-one function of $\theta$ with inverse $h^{-1}(\psi) = \theta$
Then the ML estimator of $\psi$ satisfies
$$ \frac{d \ln L(h^{-1}(\psi))}{d\psi} = \frac{d \ln L(\theta)}{d\theta} \cdot \frac{d h^{-1}(\psi)}{d\psi} = 0 $$
which holds at $\hat\psi = h(\hat\theta)$
Maximum likelihood: Properties of ML estimators
Consistency
The parameter $\theta$ is identified if for all $\theta' \neq \theta$ and data $x_1, \ldots, x_n$
$$ \ln L(\theta' \mid x_1, \ldots, x_n) \neq \ln L(\theta \mid x_1, \ldots, x_n) $$
The parameter $\theta$ is asymptotically identified if for all $\theta' \neq \theta_0$
$$ \operatorname{plim} \frac{1}{n} \ln L(\theta') \neq \operatorname{plim} \frac{1}{n} \ln L(\theta_0) $$
where $\theta_0$ is the true value of the parameter [P]
Maximum likelihood: Properties of ML estimators
Asymptotic normality
By definition, the ML estimator satisfies
$$ g(\hat\theta) = 0 $$
A first-order Taylor series expansion of $g$ around the true parameter vector $\theta_0$ gives [P]
$$ g(\hat\theta) = g(\theta_0) + H(\theta_0)(\hat\theta - \theta_0) + \text{remainder} $$
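Maximum likelihood: Covariance matrix estimation
The (approximate) covariance matrix of $\hat\theta$ is
$$ \mathrm{Cov}(\hat\theta) = -\left[ E(H(\theta_0)) \right]^{-1} = -\left[ E\left( \frac{\partial^2 \ln L(\theta_0)}{\partial\theta_0\,\partial\theta_0'} \right) \right]^{-1} $$
A consistent estimator of $\mathrm{Cov}(\hat\theta)$ is
$$ \widehat{\mathrm{Cov}}(\hat\theta) = -\left[ H(\hat\theta) \right]^{-1} = -\left( \frac{\partial^2 \ln L(\hat\theta)}{\partial\theta\,\partial\theta'} \right)^{-1} $$
Often, $H(\hat\theta)$ is a by-product of numerical optimization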
Maximum likelihood: Covariance matrix estimation
An alternative consistent covariance matrix estimator is
$$ \widehat{\mathrm{Cov}}(\hat\theta) = \left[ G(\hat\theta; X_1, \ldots, X_n)'\, G(\hat\theta; X_1, \ldots, X_n) \right]^{-1} $$
This estimator is called the outer-product-of-the-gradient (OPG) estimator
Advantage: Only the first derivatives are required
Disadvantage: Less reliable in small samples
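The Hessian-based and OPG estimators can be compared in the scalar exponential model, where $H(\lambda) = -n/\lambda^2$ and the gradient contributions are $G_i = 1/\lambda - x_i$. This is a sketch on simulated data; the true value, sample size and seed are arbitrary assumptions.

```python
import random

random.seed(2)
lam_true = 2.0                                  # illustrative value
x = [random.expovariate(lam_true) for _ in range(50_000)]
n = len(x)
lam_hat = n / sum(x)                            # ML estimator 1 / mean

# Hessian-based estimate: -[H(lam_hat)]^{-1} with H(lam) = -n / lam^2
var_hessian = lam_hat ** 2 / n

# OPG estimate: [G'G]^{-1} with contributions G_i = 1/lam_hat - x_i
var_opg = 1.0 / sum((1.0 / lam_hat - xi) ** 2 for xi in x)

print(var_hessian, var_opg)                     # close to each other for large n
```

In small samples the two numbers can differ noticeably, in line with the remark that the OPG estimator is less reliable there.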
Maximum likelihood: Example
Numerical estimation of the parameters of $N(\mu, \sigma^2)$
Let $X_1, \ldots, X_{50}$ be a random sample from $X \sim N(\mu, \sigma^2)$ with $\mu = 5$ and $\sigma^2 = 9$
Density function
$$ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \cdot \frac{(x - \mu)^2}{\sigma^2} \right) $$
Log-likelihood function
$$ \ln L(\mu, \sigma^2) = \sum_{i=1}^{n} \ln f_X(x_i) $$
Maximum likelihood: Example
See numnormal.R
Point estimates
$$ \begin{pmatrix} \hat\mu \\ \hat\sigma^2 \end{pmatrix} = \begin{pmatrix} 3.64025 \\ 6.90869 \end{pmatrix} $$
Estimated covariance matrix derived numerically from $H(\hat\theta)$
$$ \widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & -0.00016 \\ -0.00016 & 1.90918 \end{pmatrix} $$
Maximum likelihood: Example
See numnormal.R
Point estimates
$$ \begin{pmatrix} \hat\mu \\ \hat\sigma^2 \end{pmatrix} = \begin{pmatrix} 3.64025 \\ 6.90869 \end{pmatrix} $$
Estimated covariance matrix derived from theory
$$ \widehat{\mathrm{Cov}}(\hat\mu, \hat\sigma^2) = \begin{pmatrix} 0.13817 & 0 \\ 0 & 1.90920 \end{pmatrix} $$
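The theoretical matrix can be reproduced by hand. For the normal model the inverse information matrix is diagonal with entries $\sigma^2/n$ and $2\sigma^4/n$; the sketch below evaluates these at the reported point estimate, assuming this is how the slide's values were obtained. The variable names are illustrative.

```python
n = 50
sigma2_hat = 6.90869                    # point estimate reported above

var_mu = sigma2_hat / n                 # sigma^2 / n
var_sigma2 = 2.0 * sigma2_hat ** 2 / n  # 2 sigma^4 / n

print(round(var_mu, 5), round(var_sigma2, 5))
```

Both values agree with the diagonal of the theoretical matrix to five decimal places.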
Maximum likelihood: Example of violated regularity conditions
Let $X$ be uniformly distributed on the interval $[0, \theta]$
The density function is
$$ f_X(x) = \begin{cases} 1/\theta & \text{for } 0 \le x \le \theta \\ 0 & \text{else} \end{cases} $$
The likelihood function is
$$ L(\theta \mid x_1, \ldots, x_n) = \begin{cases} (1/\theta)^n & \text{for } \theta \ge \max_i x_i \\ 0 & \text{else} \end{cases} $$
Maximum likelihood: Example of violated regularity conditions
[Figure: likelihood $L(\theta)$ plotted against $\theta$ for $\theta$ between 0 and 6]
$L(\theta)$ is not differentiable at $\max_i x_i$
Maximum is at $\hat\theta = \max_i x_i$
The estimator is consistent but not asymptotically normal
Illustration in R
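A small simulation hints at why asymptotic normality fails here. It is not the R illustration from the lecture; the true $\theta$, the sample size and the seed are arbitrary assumptions. Since $\hat\theta = \max_i x_i$ can never exceed $\theta$, the estimation error is one-sided.

```python
import random

random.seed(3)
theta_true, n = 5.0, 1000                       # illustrative values
draws = [random.uniform(0.0, theta_true) for _ in range(n)]
theta_hat = max(draws)                          # ML estimator: sample maximum

# n * (theta - theta_hat) is nonnegative by construction, so its limiting
# distribution cannot be a normal distribution centered at zero
scaled_error = n * (theta_true - theta_hat)
print(theta_hat, scaled_error)
```

Note that the error shrinks at rate $n$ rather than $\sqrt{n}$, which is why it is scaled by $n$ above.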
Maximum likelihood: Dependent observations
Maximum likelihood estimation is still possible if the observations are dependent
The joint density of the observations
$$ f_{X_1, \ldots, X_T}(x_1, \ldots, x_T) $$
can be factorized as
$$ f_{X_1}(x_1) \cdot \prod_{t=2}^{T} f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) $$
Maximum likelihood: Dependent observations
Log-likelihood
$$ \ln L = \ln f_{X_1}(x_1) + \sum_{t=2}^{T} \ln f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) $$
If $T$ is large, one may ignore $\ln f_{X_1}(x_1)$
Computing the log-likelihood is straightforward if
$$ f_{X_t \mid X_1 = x_1, \ldots, X_{t-1} = x_{t-1}}(x_t) = f_{X_t \mid X_{t-1} = x_{t-1}}(x_t) $$
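A standard case where the conditional density depends only on the previous observation is a Gaussian AR(1) process, $X_t \mid X_{t-1} = x_{t-1} \sim N(\rho x_{t-1}, \sigma^2)$. The AR(1) model is an assumption made here for illustration and does not appear on the slides. The sketch drops $\ln f_{X_1}(x_1)$, as suggested above for large $T$.

```python
import math
import random


def cond_loglik(rho, sigma2, x):
    """Conditional Gaussian AR(1) log-likelihood, ignoring ln f(x_1)."""
    total = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        total += (-0.5 * math.log(2.0 * math.pi * sigma2)
                  - 0.5 * (cur - rho * prev) ** 2 / sigma2)
    return total


random.seed(4)
rho_true, T = 0.6, 2000                         # illustrative values
x = [0.0]
for _ in range(T - 1):
    x.append(rho_true * x[-1] + random.gauss(0.0, 1.0))

# Maximizer of the conditional log-likelihood in rho (closed form)
rho_hat = (sum(p * c for p, c in zip(x[:-1], x[1:]))
           / sum(p * p for p in x[:-1]))
print(rho_hat, cond_loglik(rho_hat, 1.0, x))
```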
Maximum likelihood: The three classical tests
Wald test, Lagrange multiplier test and likelihood ratio test (W, LM, LR)
Hypotheses
$$ H_0: r(\theta) = 0 \quad \text{vs} \quad H_1: r(\theta) \neq 0 $$
Often, $r$ is a scalar-valued function and $\theta$ is a scalar
The function $r$ may be non-linear!
Maximum likelihood: The three classical tests
Basic test ideas:
Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0
Likelihood ratio test: If $r(\theta) = 0$ is true, then $\ln L(\hat\theta_R)$ will not be far below $\ln L(\hat\theta_{ML})$
Lagrange multiplier test: If $r(\theta) = 0$ is true, the score function $g(\hat\theta_R) = \partial \ln L(\hat\theta_R) / \partial\theta$ will be close to 0
Maximum likelihood: The three classical tests
Example:
Let $X_1, \ldots, X_n$ be a random sample from $X \sim \mathrm{Exp}(\lambda)$
Test $H_0: \lambda = 4$ against $H_1: \lambda \neq 4$
Different notation: $H_0: r(\lambda) = 0$, where $r(\lambda) = \lambda - 4$
See threetests.R
Maximum likelihood: Wald test
Wald test
Hypotheses
$$ H_0: r(\theta) = 0 \qquad H_1: r(\theta) \neq 0 $$
with functions $r = (r_1, \ldots, r_m)$
$m$ is the number of restrictions
Wald test: If $r(\theta) = 0$ is true, then $r(\hat\theta_{ML})$ will be close to 0
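Maximum likelihood: Wald test
Asymptotically, under $H_0$ (by the delta method!)
$$ r(\hat\theta_{ML}) \sim N\left( 0, \mathrm{Cov}(r(\hat\theta_{ML})) \right) $$
with
$$ \mathrm{Cov}(r(\hat\theta_{ML})) = \frac{\partial r(\hat\theta_{ML})}{\partial\theta'} \cdot \mathrm{Cov}(\hat\theta_{ML}) \cdot \frac{\partial r(\hat\theta_{ML})'}{\partial\theta} $$
Remember: If $X \sim N(\mu, \Sigma)$ with $X$ of dimension $m$, then $(X - \mu)' \Sigma^{-1} (X - \mu) \sim \chi^2_m$
Wald test statistic
$$ W = r(\hat\theta_{ML})' \left[ \mathrm{Cov}(r(\hat\theta_{ML})) \right]^{-1} r(\hat\theta_{ML}) \overset{asy}{\sim} \chi^2_m $$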
Maximum likelihood: Wald test
Remarks:
Reject $H_0$ if $W$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
Usually, $\mathrm{Cov}(r(\hat\theta_{ML}))$ must be replaced by its estimate $\widehat{\mathrm{Cov}}(r(\hat\theta_{ML}))$
The Wald test is not invariant with respect to re-parametrizations
The Wald test only requires the unrestricted ML estimator
Ideal, if $\hat\theta_{ML}$ is much easier to calculate than $\hat\theta_R$
Maximum likelihood: Likelihood ratio test
Likelihood ratio test
Is $\ln L(\hat\theta_{ML})$ significantly larger than $\ln L(\hat\theta_R)$?
LR test statistic
$$ LR = -2 \ln\left( \frac{L(\hat\theta_R)}{L(\hat\theta_{ML})} \right) = -2 \left( \ln L(\hat\theta_R) - \ln L(\hat\theta_{ML}) \right) $$
Asymptotic distribution: $LR \overset{asy}{\sim} \chi^2_m$
Maximum likelihood: Likelihood ratio test
Remarks:
Reject $H_0$ if $LR$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
To compute $LR$, one requires both the unrestricted estimator $\hat\theta_{ML}$ and the restricted estimator $\hat\theta_R$
Ideal, if both $\hat\theta_{ML}$ and $\hat\theta_R$ are easy to calculate
The LR test is often used to compare different models to each other
Maximum likelihood: Lagrange multiplier test
Lagrange multiplier test
Is $g(\hat\theta_R)$ significantly different from 0?
The test is based on the restricted estimator $\hat\theta_R$
Lagrange approach: $\max_\theta \ln L(\theta)$ s.t. $r(\theta) = 0$
LM test statistic
$$ LM = g(\hat\theta_R)' \cdot \left[ I(\hat\theta_R) \right]^{-1} \cdot g(\hat\theta_R) \overset{asy}{\sim} \chi^2_m $$
with
$$ I(\hat\theta_R) = -E\left( \frac{\partial^2 \ln L(\hat\theta_R)}{\partial\theta\,\partial\theta'} \right) $$
Maximum likelihood: Lagrange multiplier test
Remarks:
Reject $H_0$ if $LM$ is larger than the $(1 - \alpha)$-quantile of the $\chi^2_m$-distribution
The LM test only requires the restricted estimator
Ideal, if $\hat\theta_R$ is much easier to calculate than $\hat\theta_{ML}$
The LM test is often used to test for misspecification (heteroskedasticity, autocorrelation, omitted variables etc.)
Asymptotically, the three tests are equivalent
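For the exponential example with $H_0: \lambda = 4$, all three statistics have closed forms, because $\ln L(\lambda) = n \ln\lambda - \lambda \sum_i x_i$, $g(\lambda) = n/\lambda - \sum_i x_i$ and $I(\lambda) = n/\lambda^2$. The sketch below is not the course script threetests.R; the function name and the observations are made up, and the variance in the Wald statistic is estimated at the unrestricted estimate.

```python
import math


def three_tests(x, lam0):
    """Wald, LR and LM statistics for H0: lambda = lam0 in the Exp(lambda) model."""
    n, s = len(x), sum(x)
    lam_hat = n / s                             # unrestricted ML estimator

    def loglik(lam):
        return n * math.log(lam) - lam * s

    wald = (lam_hat - lam0) ** 2 / (lam_hat ** 2 / n)
    lr = 2.0 * (loglik(lam_hat) - loglik(lam0))
    score0 = n / lam0 - s                       # score at the restricted value
    info0 = n / lam0 ** 2                       # information at the restricted value
    lm = score0 ** 2 / info0
    return wald, lr, lm


x = [0.31, 0.05, 0.22, 0.47, 0.12, 0.09, 0.58, 0.16]   # made-up observations
wald, lr, lm = three_tests(x, lam0=4.0)
critical = 3.841                                # 0.95-quantile of chi2 with 1 df
print(wald, lr, lm, [stat > critical for stat in (wald, lr, lm)])
```

In finite samples the three statistics differ numerically even though they share the same asymptotic distribution.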
Maximum likelihood: The three classical tests
Multivariate case
Example: Production function
$$ Y_i = X_{i1}^{a_1} \cdot X_{i2}^{a_2} + u_i $$
where $u_i \sim N(0, 0.05^2)$
Log-likelihood function $\ln L(a_1, a_2)$
ML estimators $\hat a_1$ and $\hat a_2$
Hypothesis test of $a_1 + a_2 = 1$ or $a_1 + a_2 - 1 = 0$
See classtest.R
Instrumental variables: Preliminaries
OLS is not consistent if $E(u_t \mid X_t) \neq 0$
Define an information set $\Omega_t$ (a $\sigma$-algebra), such that
$$ E(u_t \mid \Omega_t) = 0 $$
This moment condition can be used for estimation
Variables in $\Omega_t$ are called instrumental variables (or instruments)
We denote the instrument vector by $W_t$
Instrumental variables: Correlation between regressors and disturbances (I)
Errors in variables
Consider the model
$$ y_t = \alpha + \beta x_t^* + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2) $$
The exogenous variable $x_t^*$ is unobservable
We can only observe
$$ x_t = x_t^* + v_t $$
where the $v_t \sim \mathrm{iid}(0, \sigma_v^2)$ are independent of everything else
OLS estimators of $y_t = \alpha + \beta x_t + u_t$ are inconsistent [P]
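The inconsistency can be made explicit with a short derivation. It assumes, in addition to the slide, that $x_t^*$ has a finite variance $\sigma_{x^*}^2 > 0$ (a symbol introduced here) and that a law of large numbers applies; it is a sketch and not necessarily the proof given in class.

```latex
\begin{align*}
y_t &= \alpha + \beta x_t + u_t, \qquad u_t = \varepsilon_t - \beta v_t \\
\mathrm{Cov}(x_t, u_t) &= \mathrm{Cov}(x_t^* + v_t,\ \varepsilon_t - \beta v_t) = -\beta \sigma_v^2 \\
\operatorname{plim} \hat\beta_{OLS} &= \beta + \frac{\mathrm{Cov}(x_t, u_t)}{\mathrm{Var}(x_t)}
  = \beta - \frac{\beta \sigma_v^2}{\sigma_{x^*}^2 + \sigma_v^2}
  = \beta\,\frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_v^2}
\end{align*}
```

The OLS slope is therefore biased towards zero whenever $\sigma_v^2 > 0$ and $\beta \neq 0$.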
Instrumental variables: Correlation between regressors and disturbances (II)
Omitted variables bias
Let
$$ y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t $$
If $x_2$ is unobservable, one estimates
$$ y_t = \alpha + \beta_1 x_{1t} + u_t $$
where $u_t = \beta_2 x_{2t} + \varepsilon_t$
If $x_{2t}$ and $x_{1t}$ are correlated, then so are $u_t$ and $x_{1t}$
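Instrumental variables: Correlation between regressors and disturbances (III)
Endogeneity
Standard example: supply and demand curves determine both price and quantity
$$ q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d $$
$$ q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s $$
Solve for $q_t$ and $p_t$
$$ \begin{bmatrix} q_t \\ p_t \end{bmatrix} = \begin{bmatrix} 1 & -\gamma_d \\ 1 & -\gamma_s \end{bmatrix}^{-1} \left( \begin{bmatrix} X_t^d \beta_d \\ X_t^s \beta_s \end{bmatrix} + \begin{bmatrix} u_t^d \\ u_t^s \end{bmatrix} \right) $$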
Instrumental variables: Correlation between regressors and disturbances (III)
Since $q_t$ and $p_t$ depend on both $u_t^d$ and $u_t^s$, single equation OLS estimation of
$$ q_t = \gamma_d p_t + X_t^d \beta_d + u_t^d $$
$$ q_t = \gamma_s p_t + X_t^s \beta_s + u_t^s $$
is inconsistent
The right-hand side variable $p_t$ is correlated with the error term
The condition $E(u_t \mid \Omega_t) = 0$ is violated if $p_t$ is in $\Omega_t$
Instrumental variables: Correlation between regressors and disturbances
Warning! Inconsistency is not always a problem
If we simply want to forecast, we can use inconsistent estimators
Trivial example:
[Figure: scatter plot of y against x titled "Positive correlation between u and X", with the true regression line]
Instrumental variables: The simple IV estimator
Let $W$ denote the $T \times K$ matrix of instruments
All columns of $X$ with $X_t \in \Omega_t$ should be included in $W$
Then $E(u_t \mid W_t) = 0$ implies the moment condition
$$ E(W'u) = E\left( W'(y - X\beta) \right) = 0 $$
The IV estimator is a method of moments estimator
The solution is
$$ \hat\beta_{IV} = (W'X)^{-1} W'y $$
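With a single regressor and a single instrument (and no intercept), the formulas reduce to ratios of inner products. The simulation below is a sketch under made-up assumptions: the data-generating process, the true $\beta = 1.5$, the sample size and the seed are all illustrative. The regressor is made endogenous through a shock that also enters the disturbance.

```python
import random

random.seed(5)
beta_true, n = 1.5, 20_000                      # illustrative values
w, x, y = [], [], []
for _ in range(n):
    wi = random.gauss(0.0, 1.0)                 # instrument
    common = random.gauss(0.0, 1.0)             # shock shared by x and u
    xi = wi + common + random.gauss(0.0, 1.0)   # endogenous regressor
    ui = common + random.gauss(0.0, 1.0)        # disturbance, correlated with x
    w.append(wi)
    x.append(xi)
    y.append(beta_true * xi + ui)


def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


beta_ols = dot(x, y) / dot(x, x)                # (X'X)^{-1} X'y
beta_iv = dot(w, y) / dot(w, x)                 # (W'X)^{-1} W'y
print(beta_ols, beta_iv)
```

In this design OLS overestimates $\beta$ because the regressor and the disturbance are positively correlated, while the IV estimate should be close to the true value.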
Instrumental variables: Properties
The simple IV estimator is consistent if
$$ \operatorname{plim} \frac{1}{n} W'X = S_{WX} $$
is deterministic and nonsingular [P]
The simple IV estimator is asymptotically normal,
$$ \sqrt{n}\,(\hat\beta_{IV} - \beta) \xrightarrow{d} U \sim N\left( 0, \sigma^2 (S_{WX})^{-1} S_{WW} (S_{WX}')^{-1} \right) $$
where $S_{WW} = \operatorname{plim} \frac{1}{n} W'W$ [P]
Instrumental variables: How to find instruments
Instruments must be
1 exogenous, i.e. $\operatorname{plim} \frac{1}{n} W'u = 0$
2 valid, i.e. $\operatorname{plim} \frac{1}{n} W'X = S_{WX}$ non-singular
Natural experiments (weather, earthquakes, ...)
Angrist and Pischke (2009):
Good instruments come from a combination of institutional knowledge and ideas about the processes determining the variable of interest.
Instrumental variables: How to find instruments
Examples
Natural experiments
1 Brückner and Ciccone: Rain and the democratic window of opportunity, Econometrica 79 (2011) 923-947.
2 Angrist and Evans: Children and their parents' labor supply: Evidence from exogenous variation in family size, American Economic Review 88 (1998) 450-477.
Instrumental variables: How to find instruments
Examples
Institutional arrangements
1 Angrist and Krueger: Does Compulsory School Attendance Affect Schooling and Earnings?, Quarterly Journal of Economics 106 (1991) 979-1014.
2 Levitt: The Effect of Prison Population Size on Crime Rates: Evidence from Prison Overcrowding Litigation, Quarterly Journal of Economics 111 (1996) 319-351.
Instrumental variables: How to find instruments
In a time series context, one can sometimes use lagged endogenous regressors as instrumental variables
Example:
$$ y_t = \alpha + \beta x_t + u_t $$
with $E(u_t \mid x_t) \neq 0$
If $\mathrm{Cov}(x_t, x_{t-1}) \neq 0$ but $\mathrm{Cov}(u_t, x_{t-1}) = 0$, then $x_{t-1}$ can be used as an instrumental variable
Attention: $\mathrm{Cov}(u_t, x_{t-1}) = 0$ is not always obvious
Instrumental variables: How to find instruments
Example (Measurement error in time series)
Consider the model
$$ \begin{aligned} y_t &= \alpha + \beta x_t^* + u_t \\ x_t^* &= \rho x_{t-1}^* + \varepsilon_t \\ x_t &= x_t^* + v_t \end{aligned} $$
Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$, and $\alpha$ and $\beta$ will be estimated consistently.
Instrumental variables: How to find instruments
Example (Omitted variable bias in time series)
Consider the model
$$ \begin{aligned} y_t &= \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t \\ x_{1t} &= \rho_{11} x_{1,t-1} + \rho_{12} x_{2,t-1} + \varepsilon_{1t} \\ x_{2t} &= \rho_{21} x_{1,t-1} + \rho_{22} x_{2,t-1} + \varepsilon_{2t} \end{aligned} $$
Then $x_{1,t-1}$ is not a valid instrument for a regression of $y_t$ on $x_{1t}$, and $\alpha$ and $\beta_1$ will not be estimated consistently.
Instrumental variables: How to find instruments
Example (Endogeneity in time series)
Consider the model
$$ \begin{aligned} y_t &= \alpha + \beta_1 x_t + \beta_2 y_{t-1} + u_t \\ x_t &= \gamma + \delta_1 y_t + \delta_2 x_{t-1} + v_t \end{aligned} $$
Then $x_{t-1}$ is a valid instrument for a regression of $y_t$ on $x_t$ and $y_{t-1}$, and $\alpha$, $\beta_1$ and $\beta_2$ will be estimated consistently.
Instrumental variables: Generalized IV estimation
If the number of instruments $L$ is larger than the number of parameters $K$, the model is overidentified
Right-multiply the $T \times L$ matrix $W$ by an $L \times K$ matrix $J$ to obtain a $T \times K$ instrument matrix $WJ$
The columns of $WJ$ are linear combinations of the instruments in $W$
One can show that the asymptotically optimal matrix is $J = (W'W)^{-1} W'X$
Instrumental variables: Generalized IV estimation
The generalized IV estimator is
$$ \begin{aligned} \hat\beta_{IV} &= \left( (WJ)'X \right)^{-1} (WJ)'y \\ &= \left( X'W (W'W)^{-1} W'X \right)^{-1} X'W (W'W)^{-1} W'y \\ &= (X' P_W X)^{-1} X' P_W y \end{aligned} $$
with $P_W = W (W'W)^{-1} W'$
Consistency and asymptotic normality still hold
Instrumental variables: Generalized IV estimation
The two-stage-least-squares (2SLS) interpretation
The matrix $J$ is similar to $\hat\beta$ in the standard OLS model,
$$ J = (W'W)^{-1} W'X $$
Hence, $WJ$ is similar to $X\hat\beta$
The optimal instruments are obtained if we regress the endogenous regressors on the instruments (1st stage), and then use the fitted values as regressors (2nd stage)
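The two-stage interpretation can be checked numerically in an overidentified toy model with one endogenous regressor and two instruments ($K = 1$, $L = 2$, no intercept). Everything in this sketch, from the data-generating process to the variable names, is an illustrative assumption.

```python
import random


def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


random.seed(6)
beta_true, n = 1.5, 20_000                      # illustrative values
w1, w2, x, y = [], [], [], []
for _ in range(n):
    a, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)   # two instruments
    common = random.gauss(0.0, 1.0)                          # source of endogeneity
    xi = a + 0.5 * b + common + random.gauss(0.0, 1.0)
    ui = common + random.gauss(0.0, 1.0)
    w1.append(a)
    w2.append(b)
    x.append(xi)
    y.append(beta_true * xi + ui)

# 1st stage: J = (W'W)^{-1} W'x, solved as a 2x2 system
s11, s12, s22 = dot(w1, w1), dot(w1, w2), dot(w2, w2)
r1, r2 = dot(w1, x), dot(w2, x)
det = s11 * s22 - s12 ** 2
j1 = (s22 * r1 - s12 * r2) / det
j2 = (s11 * r2 - s12 * r1) / det
x_fit = [j1 * a + j2 * b for a, b in zip(w1, w2)]            # fitted values P_W x

# 2nd stage: OLS of y on the fitted values
beta_2sls = dot(x_fit, y) / dot(x_fit, x_fit)
# Generalized IV formula (x'P_W x)^{-1} x'P_W y
beta_giv = dot(x_fit, y) / dot(x_fit, x)
print(beta_2sls, beta_giv)
```

The two numbers coincide up to rounding because $P_W$ is idempotent, so $x' P_W x = (P_W x)'(P_W x)$.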
Instrumental variables: Finite sample properties
The finite sample properties of IV estimators are complex
In the overidentified case, the first $L - K$ moments exist, but higher moments do not
If the expectation exists, IV estimators are in general biased
The simple IV estimator has very heavy tails: even the first moment does not exist!
The estimator can be extremely far off the true value
See ivfinite.R
Instrumental variables: Hypothesis testing
Exact hypothesis tests are usually not feasible
Asymptotic tests are based on the asymptotic normality
An estimator of the covariance matrix of $\hat\beta_{IV}$ is
$$ \widehat{\mathrm{Cov}}(\hat\beta_{IV}) = \hat\sigma^2 (X' P_W X)^{-1} $$
with
$$ P_W = W (W'W)^{-1} W', \qquad \hat\sigma^2 = \frac{1}{n} (y - X\hat\beta_{IV})' (y - X\hat\beta_{IV}) $$
Instrumental variables: Hypothesis testing
Asymptotic t-test
$$ H_0: \beta_i = \beta_{i0} \qquad H_1: \beta_i \neq \beta_{i0} $$
Under the null hypothesis, the test statistic
$$ t = \frac{\hat\beta_i - \beta_{i0}}{\sqrt{\widehat{\mathrm{Var}}(\hat\beta_i)}} $$
is asymptotically $N(0, 1)$
Instrumental variables: Hypothesis testing
Asymptotic Wald test (similar to an $F$-test)
$$ H_0: \beta_2 = \beta_{20}, \qquad H_1: \beta_2 \neq \beta_{20} $$
where $\beta_2$ is a length $L$ subvector of $\beta$
Under the null hypothesis, the test statistic
$$ W = (\hat\beta_2 - \beta_{20})' \left[ \widehat{\mathrm{Cov}}(\hat\beta_2) \right]^{-1} (\hat\beta_2 - \beta_{20}) $$
is asymptotically $\chi^2$ with $L$ degrees of freedom
Instrumental variables: Hypothesis testing
Testing overidentifying restrictions
The identifying restrictions are
$$ E(u_t \mid W_t) = 0 \quad \text{or} \quad E(W'u) = 0 $$
If the model is just identified, the validity of the restrictions cannot be tested
If the model is overidentified, one can test if the overidentifying restrictions hold, i.e. if the instruments are valid and exogenous
Instrumental variables: Hypothesis testing
Basic test idea: Check if the IV residuals can be explained by the full set of instruments
Compute the IV residuals $\hat u$
Regress the residuals on all instruments $W$
Under the null hypothesis, the test statistic satisfies
$$ nR^2 \overset{asy}{\sim} \chi^2_m $$
where $m$ is the degree of overidentification
Instrumental variables: Hypothesis testing
Davidson and MacKinnon (2004, p. 338):
Even if we do not know quite how to interpret a significant value of the overidentification test statistic, it is always a good idea to compute it. If it is significantly larger than it should be by chance under the null hypothesis, one should be extremely cautious in interpreting the estimates, because it is quite likely either that the model is specified incorrectly or that some of the instruments are invalid.
Instrumental variables: Hypothesis testing
Durbin-Wu-Hausman test
$$ H_0: E(X'u) = 0 \qquad H_1: E(W'u) = 0 $$
Test if IV estimation is really necessary or if OLS would do
Under $H_1$, OLS is inconsistent, but IV is still consistent
Basic test idea: Compare $\hat\beta_{OLS}$ and $\hat\beta_{IV}$. If they are 'too different', reject $H_0$
Instrumental variables: Hypothesis testing
The difference between the estimators is
$$ \begin{aligned} \hat\beta_{IV} - \hat\beta_{OLS} &= (X' P_W X)^{-1} X' P_W y - (X'X)^{-1} X'y \\ &= (X' P_W X)^{-1} \left( X' P_W y - (X' P_W X)(X'X)^{-1} X'y \right) \\ &= (X' P_W X)^{-1} \left( X' P_W \left( I - X (X'X)^{-1} X' \right) y \right) \\ &= (X' P_W X)^{-1} \left( X' P_W M_X y \right) \end{aligned} $$
Instrumental variables: Hypothesis testing
We need to test if $X' P_W M_X y$ is significantly different from 0
This term is identically equal to zero for all variables in $X$ that are instruments (i.e. that are also in $W$)
Denote by $\tilde X$ all possibly endogenous regressors
To test if $\tilde X' P_W M_X y$ is significantly different from zero, perform a Wald test of $\delta = 0$ in the regression
$$ y = X\beta + P_W \tilde X \delta + u $$
GMM: Model description
Hansen, L. (1982), Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054:
In this paper we study the large sample properties of a class of generalized method of moments (GMM) estimators which subsumes many standard econometric estimators. To motivate this class, consider an econometric model whose parameter vector we wish to estimate. The model implies a family of orthogonality conditions that embed any economic theoretical restrictions that we wish to impose or test.
GMM: Model description
John Cochrane (2005), Asset Pricing, p. 196:
Most of the effort involved with GMM is simply mapping a given problem into the very general notation.
GMM: Model description
Describe the model by elementary zero functions
$$ E_\theta\left( f_t(\theta, y_t) \right) = 0 $$
where everything can be vector-valued
Parameter vector $\theta$ of length $K$
Observation vectors $y_t$
Identification condition
$$ E_{\theta_0}\left( f_t(\theta, y_t) \right) \neq 0 \quad \text{for all } \theta \neq \theta_0 $$
GMM: Model description
Example (Linear regression model)
Consider the standard model
$$ y = X\beta + u, \qquad u \sim N(0, \sigma^2 I), \text{ independent of } X $$
Parameter vector $\theta = \,?$
Observations $y_t = \,?$
Elementary zero functions $f_t(\theta, y_t) = \,?$
GMM: Model description
Example (Lognormal distribution)
Suppose there is a random sample $X_1, \ldots, X_n$ from
$$ X \sim LN(\mu, \sigma^2) $$
Parameter vector $\theta = \,?$
Observations $y_t = \,?$
Elementary zero functions $f_t(\theta, y_t) = \,?$
GMM: Model description
Example (Asset pricing)
The basic asset pricing formula is
$$ p_t = E(m_{t+1} x_{t+1} \mid \Omega_t) $$
with asset price $p$, stochastic discount factor $m$, payoff $x$, and information set $\Omega_t$.
Parameter vector $\theta = \,?$
Observations $y_t = \,?$
Elementary zero functions $f_t(\theta, y_t) = \,?$
GMM: Model description
Stack all elementary zero functions
$$ f(\theta, y) = \begin{pmatrix} f_1(\theta, y_1) \\ \vdots \\ f_n(\theta, y_n) \end{pmatrix} $$
Covariance matrix
$$ E\left( f(\theta, y)\, f(\theta, y)' \right) = \Omega $$
Dimension of $\Omega$ depends on the dimension of $f_t(\theta, y_t)$
GMM: Model description
Example (Linear regression model)
The covariance matrix $\Omega$ is
$$ E\left( f(\theta, y)\, f(\theta, y)' \right) = E(u u') = \sigma^2 I $$
If there are autocorrelation and heteroskedasticity
$$ E(u u') = \Omega $$
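GMM: Model description
Example (Lognormal distribution)
The covariance matrix $\Omega$ is
$$ E\left( f(\theta, y)\, f(\theta, y)' \right) = E \begin{pmatrix} f_{11}^2 & f_{11} f_{12} & \cdots & f_{11} f_{n1} & f_{11} f_{n2} \\ f_{12} f_{11} & f_{12}^2 & \cdots & f_{12} f_{n1} & f_{12} f_{n2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ f_{n1} f_{11} & f_{n1} f_{12} & \cdots & f_{n1}^2 & f_{n1} f_{n2} \\ f_{n2} f_{11} & f_{n2} f_{12} & \cdots & f_{n2} f_{n1} & f_{n2}^2 \end{pmatrix} = \,? $$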
GMM: Model description
Example (Asset pricing)
The covariance matrix $\Omega$ is
$$ E\left( f(\theta, y)\, f(\theta, y)' \right) = E \begin{pmatrix} f_{11}^2 & \cdots & f_{11} f_{n1} \\ \vdots & \ddots & \vdots \\ f_{n1} f_{11} & \cdots & f_{n1}^2 \end{pmatrix} = \,? $$
GMM: Estimating equations
To estimate $\theta$, we need $K$ estimating equations
In general, they are weighted averages of the $f_t$
In most cases, the estimating equations are based on $L \ge K$ instrumental variables $W$
If $L > K$, we need to form linear combinations
Let $W$ be the $n \times L$ matrix of instruments and $J$ be an $L \times K$ matrix of full rank
Define the $n \times K$ matrix $Z = WJ$
GMM: Estimating equations
Theoretical moment conditions (orthogonality conditions)
$$ E\left( Z_t' f_t(\theta, y_t) \right) = 0 $$
The estimating equations are the empirical counterpart
$$ \frac{1}{n} Z' f(\theta, y) = 0 $$
Solving this system yields the GMM estimator $\hat\theta$
GMM: Estimating equations
Example (Linear regression model)
The $K$ moment conditions for the linear regression model are
$$ E\left( Z_t' f_t(\theta, y_t) \right) = E\left( X_t' (y_t - X_t \beta) \right) = 0 $$
where $X_t$ denotes the $t$-th row of $X$, and the estimating equations are
$$ \frac{1}{n} X'(y - X\beta) = 0. $$
GMMEstimating equations
Example (Lognormal distribution)
The two moment conditions for the lognormal distribution are
E(Z ′t ft (θ, yt)
)= E
([1 00 1
] [ft1 (θ, yt)ft2 (θ, yt)
])= E
([Xt − exp
(µ+ 1
2σ2)
X 2t − exp
(2µ+ 2σ2
) ])=
(00
)
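
GMM: Estimating equations
Example (contd)
. . . and the estimating equations are
$$\frac{1}{n} Z' f(\theta, y)
= \frac{1}{n}\begin{bmatrix} 1 & 0 & 1 & 0 & \cdots & 1 & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 & 1 \end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ \vdots \\ f_{n1} \\ f_{n2} \end{bmatrix}
= \begin{bmatrix} \frac{1}{n}\sum_{t=1}^{n}\left(X_t - \exp\left(\mu + \frac{1}{2}\sigma^2\right)\right) \\ \frac{1}{n}\sum_{t=1}^{n}\left(X_t^2 - \exp\left(2\mu + 2\sigma^2\right)\right) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$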
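
The two estimating equations of the lognormal example can be solved in closed form by taking logs of the empirical moments. The following Python sketch illustrates this under assumptions that are not part of the slides: the function name `lognormal_gmm`, the simulated sample and the true values μ = 0.5, σ = 0.4 are chosen purely for illustration.

```python
import math
import random

def lognormal_gmm(x):
    """Solve the two sample moment conditions for (mu, sigma^2)."""
    n = len(x)
    m1 = sum(x) / n                   # empirical first raw moment
    m2 = sum(v * v for v in x) / n    # empirical second raw moment
    # m1 = exp(mu + sigma^2 / 2) and m2 = exp(2 mu + 2 sigma^2), hence:
    sigma2 = math.log(m2) - 2.0 * math.log(m1)
    mu = 2.0 * math.log(m1) - 0.5 * math.log(m2)
    return mu, sigma2

# Illustrative data: simulated lognormal sample (hypothetical parameter values)
rng = random.Random(1)
sample = [rng.lognormvariate(0.5, 0.4) for _ in range(100_000)]
mu_hat, sigma2_hat = lognormal_gmm(sample)
print(mu_hat, sigma2_hat)
```

Because the model is exactly identified (two equations, two parameters), no weighting matrix is needed here.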

GMM: Properties of GMM estimators
Consistency
Assume that a law of large numbers applies to $\frac{1}{n} Z' f(\theta, y)$
Define the limiting estimation functions
$$\alpha(\theta) = \operatorname{plim} \frac{1}{n} Z' f(\theta, y)$$
and the limiting estimation equations $\alpha(\theta) = 0$
The GMM estimator $\hat\theta$ is consistent if the asymptotic identification condition holds, $\alpha(\theta) \ne \alpha(\theta_0)$ for all $\theta \ne \theta_0$ [P]

GMM: Properties of GMM estimators
Asymptotic normality
Simplified notation: $f_t(\theta) = f_t(\theta, y_t)$, $f(\theta) = f(\theta, y)$
Additional assumption: $f_t(\theta)$ is continuously differentiable at $\theta_0$
First order Taylor series expansion of
$$\frac{1}{n} Z' f(\hat\theta) = 0$$
in $\hat\theta$ around $\theta_0$ [P]

GMM: Asymptotic efficiency
The asymptotic distribution of $\sqrt{n}\,(\hat\theta - \theta_0)$ is normal with mean 0 and covariance matrix
$$\left(\operatorname{plim} \frac{1}{n} Z' F(\theta_0)\right)^{-1}\left(\operatorname{plim} \frac{1}{n} Z' \Omega Z\right)\left(\operatorname{plim} \frac{1}{n} F(\theta_0)' Z\right)^{-1}$$
What is the optimal choice of Z in the estimating equations?
The optimal choice depends on assumptions about the matrices F(θ) and Ω

GMM: Asymptotic efficiency
If $\Omega = \sigma^2 I$ and $E\left(F_t(\theta_0) f_t(\theta_0)\right) = 0$, the optimal choice is
$$Z = F(\theta_0)$$
Problem: Z depends on the unknown $\theta_0$
Solution: Solve the estimating equations
$$\frac{1}{n} F'(\theta) f(\theta) = 0$$

GMM: Asymptotic efficiency
If $\Omega = \sigma^2 I$ and $E\left(F_t(\theta_0) f_t(\theta_0)\right) \ne 0$ but $W_t \in \Omega_t$, the optimal choice is
$$Z = P_W F(\theta_0)$$
Problem: Z depends on the unknown $\theta_0$
Solution: Solve the estimating equations
$$\frac{1}{n} F'(\theta) P_W f(\theta) = 0$$
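
GMM: Asymptotic efficiency
Suppose the covariance matrix Ω is unknown
Since Z = WJ, the covariance matrix of $\sqrt{n}\,(\hat\theta - \theta_0)$ is
$$\left(\operatorname{plim} \frac{1}{n} J' W' F_0\right)^{-1}\left(\operatorname{plim} \frac{1}{n} J' W' \Omega W J\right)\left(\operatorname{plim} \frac{1}{n} F_0' W J\right)^{-1}$$
For the optimal $J = (W' \Omega W)^{-1} W' F_0$ this becomes
$$\left(\operatorname{plim} \frac{1}{n} F_0' W \left(W' \Omega W\right)^{-1} W' F_0\right)^{-1}$$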

GMM: Asymptotic efficiency
Although Ω cannot be estimated consistently, the term $\frac{1}{n} W' \Omega W$ can be estimated consistently (we will do that later)
If $\hat\Sigma$ is an estimator of $\frac{1}{n} W' \Omega W$, the optimal estimating equations are
$$\frac{1}{n} J' W' f(\theta) = \frac{1}{n} F(\theta)' W \hat\Sigma^{-1} W' f(\theta) = 0$$
and the estimated covariance matrix of $\hat\theta$ is
$$\widehat{\operatorname{Cov}}(\hat\theta) = n \left(\hat F' W \hat\Sigma^{-1} W' \hat F\right)^{-1}$$

GMM: Alternative notation
Attention
Many textbooks use a different notation (and so does the gmm package in R)
The two approaches are equivalent
The moment conditions are written as
$$E\left(g(\theta, y_t)\right) = E\left(W_t' f_t(\theta, y_t)\right) = 0$$
The number of moment conditions L can be larger than the number of parameters K

GMM: Alternative notation
If L > K, the L estimating equations cannot in general be solved exactly
$$g_n(\theta, y) = \frac{1}{n}\sum_{t=1}^{n} g(\theta, y_t) = 0$$
The GMM estimator is defined by
$$\hat\theta = \arg\min_{\theta}\; g_n(\theta, y)'\, A_n\, g_n(\theta, y)$$
where $A_n$ is a sequence of L × L weighting matrices (which can be chosen by the user) with limit A

GMM: Alternative notation
The GMM estimator based on $g_n$ is consistent, $\hat\theta \xrightarrow{p} \theta_0$
Asymptotic normality: Define the L × K matrix
$$G(\theta) = \frac{\partial g_n(\theta, y)}{\partial \theta'} = \frac{1}{n}\sum_{t=1}^{n} \frac{\partial g(\theta, y_t)}{\partial \theta'}$$
Assume that $\sqrt{n}\, g_n(\theta_0, y) \xrightarrow{d} N(0, V)$, then [P]
$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N\left(0, \left(G' A G\right)^{-1} G' A V A G \left(G' A' G\right)^{-1}\right)$$
Asymptotically optimal weighting matrix A [P]

GMM: Equivalence
The two GMM approaches (based on $f_t$ and $g$) are equivalent
The first order condition of $g(\theta)' A g(\theta)$ is
$$\underset{K \times L}{G'}\;\underset{L \times L}{A}\;\underset{L \times 1}{g} = \underset{K \times 1}{0}$$
which is the same as
$$\underset{K \times L}{J'}\;\underset{L \times n}{W'}\;\underset{n \times 1}{f} = \underset{K \times 1}{0}$$
List of equivalences [P]

GMM: Covariance matrix estimation
The covariance matrix of the elementary zero functions
$$E\left(f(\theta, y)\, f(\theta, y)'\right) = \Omega$$
is often unknown
There may be heteroskedasticity and autocorrelation in Ω
Although Ω cannot be estimated consistently, the term $\frac{1}{n} W' \Omega W$ can be estimated consistently

GMM: Covariance matrix estimation
Write
$$\Sigma = \operatorname{plim}_{n \to \infty} \frac{1}{n} W' \Omega W$$
Assume that a suitable law of large numbers holds,
$$\Sigma = \lim_{n \to \infty} \frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n} E\left(f_t f_s W_t' W_s\right)$$
where $f_t = f_t(\theta, y_t)$
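
GMM: Covariance matrix estimation
Define the autocovariance matrices
$$\Gamma(j) = \begin{cases}
\frac{1}{n}\sum_{t=j+1}^{n} E\left(f_t f_{t-j} W_t' W_{t-j}\right) & \text{for } j \ge 0 \\
\frac{1}{n}\sum_{t=-j+1}^{n} E\left(f_{t+j} f_t W_{t+j}' W_t\right) & \text{for } j < 0
\end{cases}$$
Then
$$\Sigma = \lim_{n \to \infty} \sum_{j=-n+1}^{n-1} \Gamma(j) = \lim_{n \to \infty} \left(\Gamma(0) + \sum_{j=1}^{n-1}\left(\Gamma(j) + \Gamma'(j)\right)\right)$$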

GMM: Covariance matrix estimation
The autocovariance matrix Γ(j), j ≥ 0, can be estimated by
$$\hat\Gamma(j) = \frac{1}{n}\sum_{t=j+1}^{n} f_t f_{t-j} W_t' W_{t-j}$$
Newey-West estimator of Σ
$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p}\left(1 - \frac{j}{p+1}\right)\left(\hat\Gamma(j) + \hat\Gamma'(j)\right)$$
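
A minimal Python sketch of the Newey-West formula for the special case of a single instrument (L = 1), so that $g_t = f_t W_t$ is a scalar and $\hat\Gamma(j) + \hat\Gamma'(j) = 2\hat\Gamma(j)$. The function name `newey_west` and the toy series are assumptions for illustration only; the matrix case follows the same pattern with outer products.

```python
def newey_west(g, p):
    """Newey-West estimate for a scalar series g_t = f_t * W_t with lag truncation p."""
    n = len(g)

    def gamma(j):
        # Sample autocovariance of order j; no demeaning, because E(g_t) = 0
        # under the moment conditions
        return sum(g[t] * g[t - j] for t in range(j, n)) / n

    sigma = gamma(0)
    for j in range(1, p + 1):
        weight = 1.0 - j / (p + 1)          # Bartlett weight
        sigma += weight * 2.0 * gamma(j)    # Gamma(j) + Gamma(j)' in the scalar case
    return sigma

# Toy series with strong negative first-order autocorrelation
g = [1.0, -1.0, 1.0, -1.0]
print(newey_west(g, 0), newey_west(g, 1))
```

The Bartlett weights shrink higher-order autocovariances, which keeps the estimate nonnegative (positive semidefinite in the matrix case).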

GMM: Test of overidentifying restrictions
The GMM estimator minimizes the criterion function
$$\frac{1}{n} f'(\theta)\, W \hat\Sigma^{-1} W' f(\theta)$$
Asymptotically, the minimized value (Hansen's J statistic, Hansen's overidentification statistic, Hansen-Sargan statistic) is distributed as $\chi^2_{L-K}$ if the overidentifying restrictions hold
If the null hypothesis is rejected, then something went wrong, e.g. the model is misspecified

Indirect inference: Basic idea
Anthony Smith, Jr. (New Palgrave Dictionary of Economics): Indirect inference is a simulation-based method for estimating the parameters of economic models. Its hallmark is the use of an auxiliary model to capture aspects of the data upon which to base the estimation. The parameters of the auxiliary model can be estimated using either the observed data or data simulated from the economic model. Indirect inference chooses the parameters of the economic model so that these two estimates of the parameters of the auxiliary model are as close as possible.

Indirect inference: The true model
Economic model
$$y_t = G(y_{t-1}, x_t, u_t; \beta), \quad t = 1, \ldots, T$$
Exogenous variables xt and endogenous variables yt
Random errors ut, i.i.d. with cdf F
Parameter vector β of dimension K
Assume that standard estimation methods for β are intractable
It must be possible (and easy) to simulate y1, . . . , yT given y0 (assumed to be known), x1, . . . , xT and β

Indirect inference: The auxiliary model
The true model is too complicated for estimation of β
Instead estimate an auxiliary model with parameter vector θ
The dimension L of θ must be at least as large as the dimension K of β
The auxiliary model must be
  “suitable” (but is allowed to be misspecified)
  easy and fast to estimate
Often, the auxiliary model is a standard time series model

Indirect inference: Estimating the auxiliary model
For given β (and y0, x1, . . . , xT), the auxiliary model's parameters θ are estimated
1 from the observed data $x_1, \ldots, x_T, y_1, \ldots, y_T$, resulting in the estimator $\hat\theta$
2 from H simulated datasets $x_1, \ldots, x_T, \tilde y_1^{(h)}, \ldots, \tilde y_T^{(h)}$ for $h = 1, \ldots, H$, resulting in the estimators $\tilde\theta^{(h)}(\beta)$
Define
$$\tilde\theta(\beta) = \frac{1}{H}\sum_{h=1}^{H} \tilde\theta^{(h)}(\beta)$$

Indirect inference: Optimization
Measure the distance between the vectors $\hat\theta$ (from the observed data) and $\tilde\theta(\beta)$ (from the simulated data)
$$Q(\beta) = \left(\hat\theta - \tilde\theta(\beta)\right)' W \left(\hat\theta - \tilde\theta(\beta)\right)$$
where W is a positive definite weighting matrix
The indirect inference estimator of β is
$$\hat\beta = \arg\min_{\beta} Q(\beta)$$

Indirect inference: Remarks
The simulations have to be done with the same set of random errors for every value of β
Indirect inference is similar to GMM: the auxiliary parameters are the “moments”
The asymptotic distribution of $\hat\beta$ can be derived (see Gourieroux et al., 1993)
The weighting matrix W can be chosen optimally

Indirect inference: A simple example (Gourieroux et al., 1993)
Consider the MA(1) process
$$y_t = \varepsilon_t - \beta \varepsilon_{t-1}$$
with εt ∼ N(0, 1) and β = 0.5 for t = 1, . . . , 250
The maximum likelihood estimator $\hat\beta_{ML}$ is not trivial
Indirect inference estimator $\hat\beta_{II}$ of β?
Auxiliary model: AR(3) with parameters θ
No weighting, the matrix W is the identity matrix

Indirect inference: A simple example (Gourieroux et al., 1993)
Compare the distributions of $\hat\beta_{ML}$ and $\hat\beta_{II}$
Step 1: Simulate a time series $y_1, \ldots, y_{250}$
Step 2: Compute $\hat\beta_{ML}$
Step 3: Estimate $\hat\theta$ from $y_1, \ldots, y_{250}$
Step 4: For given β, simulate 10 paths $\tilde y_1^{(h)}, \ldots, \tilde y_{250}^{(h)}$
Step 5: Estimate $\tilde\theta(\beta)$ from the simulated paths
Step 6: Repeat steps 4 and 5 for different β until the distance between $\hat\theta$ and $\tilde\theta(\beta)$ is minimized
Step 7: Save $\hat\beta_{II}$ and start again at step 1
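
Steps 1 and 3 to 6 can be sketched in Python for a single replication (the ML estimator of step 2 and the outer Monte Carlo loop of step 7 are left out). Everything below is an illustrative assumption rather than the course's own code: the helper names, the OLS fit of the AR(3) without intercept, and the grid search over β in place of a numerical optimizer.

```python
import random

def simulate_ma1(beta, eps):
    """y_t = eps_t - beta * eps_{t-1}; eps has length T + 1."""
    return [eps[t] - beta * eps[t - 1] for t in range(1, len(eps))]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            factor = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= factor * m[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x

def fit_ar3(y):
    """OLS estimate of the auxiliary AR(3) (no intercept)."""
    rows = [(y[t - 1], y[t - 2], y[t - 3]) for t in range(3, len(y))]
    target = y[3:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(3)]
    return solve(xtx, xty)

def indirect_inference(y, shocks, grid):
    theta_hat = fit_ar3(y)                        # step 3

    def q(beta):
        sims = [fit_ar3(simulate_ma1(beta, e)) for e in shocks]   # steps 4 and 5
        theta_sim = [sum(s[i] for s in sims) / len(sims) for i in range(3)]
        return sum((a - b) ** 2 for a, b in zip(theta_hat, theta_sim))  # W = I

    return min(grid, key=q)                       # step 6 (grid search)

rng = random.Random(0)
T, H = 250, 10
y_obs = simulate_ma1(0.5, [rng.gauss(0, 1) for _ in range(T + 1)])      # step 1
shocks = [[rng.gauss(0, 1) for _ in range(T + 1)] for _ in range(H)]    # same errors for every beta
beta_ii = indirect_inference(y_obs, shocks, [i / 50 for i in range(50)])
print(beta_ii)
```

The shocks are drawn once and reused for every candidate β, so that Q(β) is a smooth function of β and not of the simulation noise.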

Bootstrap: Basic idea
Point of departure: unknown distribution function F (univariate or multivariate)
Unknown parameter vector
$$\theta = \theta(F)$$
Simple random sample X1, . . . , Xn from F
Estimator
$$\hat\theta = \hat\theta(X_1, \ldots, X_n)$$
Why is the distribution of $\hat\theta$ of interest?

Bootstrap: Basic idea
Basic bootstrap idea: Approximate the unknown distribution of
$$\hat\theta(X_1, \ldots, X_n) \quad \text{for } X_1, \ldots, X_n \text{ i.i.d. from } F$$
by the distribution of
$$\hat\theta(X_1^*, \ldots, X_n^*) \quad \text{for } X_1^*, \ldots, X_n^* \text{ i.i.d. from } \hat F$$
The distribution of $\hat\theta$ under $\hat F$ is usually found by Monte Carlo simulations based on resamples (pseudo samples)

Bootstrap: Basic idea
How is F estimated?
  parametric → parametric bootstrap
  nonparametric → nonparametric bootstrap
  smoothed → smooth bootstrap
  model based
Applications
  bias and standard errors
  confidence intervals
  hypothesis tests

Bootstrap: Example 1
Nonparametric bootstrap of the standard error of
$$\hat\theta = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Simple random sample X1, . . . , X20
Estimation of the unknown cdf F by the empirical distribution function
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1(X_i \le x)$$

Bootstrap: Example 1 (contd)
How is $\bar X$ distributed under F?
How is $\bar X$ distributed under $\hat F = F_n$?
Estimation of the distribution of $\bar X$ under $F_n$ by Monte Carlo simulation
Calculation of the standard deviation of $\bar X$ under $F_n$
The distribution of $\bar X$ under $F_n$ is an approximation of the distribution of $\bar X$ under F

Bootstrap: Example 1 (still contd): The algorithm
1 Draw a random sample $X_1^*, \ldots, X_{20}^*$ from $F_n$ (resampling)
2 Compute
$$\bar X^* = \frac{1}{20}\sum_{i=1}^{20} X_i^*$$
3 Repeat steps 1 and 2 a large number B of times, save the results as $\bar X_1^*, \ldots, \bar X_B^*$
4 Compute the standard error (bootex1.R)
$$\widehat{SE}(\bar X) = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\bar X_i^* - \bar X_{\cdot}^*\right)^2}, \quad \text{where } \bar X_{\cdot}^* = \frac{1}{B}\sum_{i=1}^{B} \bar X_i^*$$
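
The four steps can be sketched in Python as follows. This is not the script bootex1.R, which is not reproduced here; the function name, the simulated sample of size 20 and B = 2000 are assumptions for illustration.

```python
import random
import statistics

def bootstrap_se_mean(x, B, rng):
    """Nonparametric bootstrap standard error of the sample mean."""
    n = len(x)
    means = []
    for _ in range(B):
        resample = rng.choices(x, k=n)      # step 1: draw with replacement from F_n
        means.append(sum(resample) / n)     # step 2: resample mean
    return statistics.stdev(means)          # step 4: divides by B - 1

rng = random.Random(42)
x = [rng.gauss(10, 2) for _ in range(20)]   # illustrative original sample
se_boot = bootstrap_se_mean(x, 2000, rng)
se_plugin = statistics.pstdev(x) / 20 ** 0.5   # exact standard deviation of the mean under F_n
print(se_boot, se_plugin)
```

For the mean, the standard deviation under $F_n$ is known in closed form, which makes this example a convenient check of the simulation.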

Bootstrap: Example 2
Parametric bootstrap of the bias of
$$\hat\theta = \hat\lambda = \frac{1}{\bar X}$$
for the exponential distribution X ∼ Exp(λ)
Simple random sample X1, . . . , X8
Estimation of the unknown distribution function F by
$$F_{\hat\lambda}(x) = 1 - \exp\left(-\hat\lambda x\right)$$

Bootstrap: Example 2 (contd)
How is $\hat\lambda$ distributed under F?
How is $\hat\lambda$ distributed under $\hat F = F_{\hat\lambda}$?
Estimation of the distribution of $\hat\lambda$ under $F_{\hat\lambda}$ by Monte Carlo simulation
Find the expectation of $\hat\lambda$ under $F_{\hat\lambda}$
The distribution of $\hat\lambda$ under $F_{\hat\lambda}$ approximates the distribution of $\hat\lambda$ under F

Bootstrap: Example 2 (still contd): The algorithm
1 Compute $\hat\lambda = 1/\bar X$ from X1, . . . , X8
2 Draw a simple random sample $X_1^*, \ldots, X_8^*$ from $F_{\hat\lambda}$
3 Compute $\hat\lambda^* = 1/\bar X^*$
4 Repeat steps 2 and 3 a large number B of times, save the results as $\hat\lambda_1^*, \ldots, \hat\lambda_B^*$
5 Estimate the bias by (bootex2.R)
$$\left(\frac{1}{B}\sum_{b=1}^{B} \hat\lambda_b^*\right) - \hat\lambda$$
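
A Python sketch of this parametric bootstrap (not the script bootex2.R; the function name, the simulated data and B = 20000 are illustrative assumptions). For the exponential distribution the bias under $F_{\hat\lambda}$ is known to be $\hat\lambda/(n-1)$, which serves as a benchmark.

```python
import random

def bootstrap_bias_rate(x, B, rng):
    """Parametric bootstrap estimate of the bias of lambda_hat = 1 / mean(x)."""
    n = len(x)
    lam_hat = n / sum(x)                                   # step 1
    lam_star = []
    for _ in range(B):
        resample = [rng.expovariate(lam_hat) for _ in range(n)]   # step 2: draw from Exp(lam_hat)
        lam_star.append(n / sum(resample))                 # step 3
    return sum(lam_star) / B - lam_hat, lam_hat            # step 5

rng = random.Random(7)
x = [rng.expovariate(2.0) for _ in range(8)]   # illustrative sample with true rate 2
bias, lam_hat = bootstrap_bias_rate(x, 20000, rng)
print(bias, lam_hat / 7)    # bootstrap bias vs. theoretical bias lam_hat / (n - 1)
```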
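
Bootstrap: General approach for bootstrap standard errors
original sample $X_1, \ldots, X_n$
→ edf $\hat F = F_n$ or $\hat F = F_{\hat\theta}$
→ 1st resample: $X_1^*, \ldots, X_n^* \to \hat\theta_1^*$
  2nd resample: $X_1^*, \ldots, X_n^* \to \hat\theta_2^*$
  ...
  B-th resample: $X_1^*, \ldots, X_n^* \to \hat\theta_B^*$
→ $$\widehat{SE}(\hat\theta) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta_b^* - \bar\theta^*\right)^2}, \quad \text{where } \bar\theta^* = \frac{1}{B}\sum_{b=1}^{B}\hat\theta_b^*$$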

Bootstrap: Bootstrapping confidence intervals
General definition: An interval
$$\left[\hat\theta_{low}(X_1, \ldots, X_n);\ \hat\theta_{high}(X_1, \ldots, X_n)\right]$$
is called a (1 − α)-confidence interval if
$$P\left(\hat\theta_{low} \le \theta \le \hat\theta_{high}\right) = 1 - \alpha$$
If the equality holds only asymptotically, the interval is called an asymptotic (1 − α)-confidence interval
Note: The interval limits are random variables

Bootstrap: Naive bootstrap confidence intervals
The naive confidence intervals are sometimes called the “other” percentile method
Generate a large number (B) of resamples and compute $\hat\theta_1^*, \ldots, \hat\theta_B^*$
Let $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \ldots \le \hat\theta^*_{(B)}$ be the order statistics
The naive (1 − α)-confidence interval is
$$\left[\hat\theta^*_{((\alpha/2)B)};\ \hat\theta^*_{((1-\alpha/2)B)}\right]$$
Why is this approach often problematic? (bootnaiv.R)

Bootstrap: Percentile bootstrap confidence intervals
To determine confidence intervals we look at the distribution of
$$\hat\theta - \theta$$
Let c1 and c2 be the α/2- and (1 − α/2)-quantiles, i.e.
$$P\left(c_1 \le \hat\theta - \theta \le c_2\right) = 1 - \alpha$$
Then
$$\left[\hat\theta - c_2,\ \hat\theta - c_1\right]$$
is the (1 − α)-confidence interval
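
Bootstrap: Percentile bootstrap confidence intervals
Approximate the distribution of $\hat\theta - \theta$ by bootstrapping
$$\hat\theta^* - \hat\theta$$
Let $c_1^*$ and $c_2^*$ be the α/2- and (1 − α/2)-quantiles, i.e.
$$P\left(c_1^* \le \hat\theta^* - \hat\theta \le c_2^*\right) = 1 - \alpha$$
We obtain $c_1^* = \hat\theta^*_{((\alpha/2)B)} - \hat\theta$ and $c_2^* = \hat\theta^*_{((1-\alpha/2)B)} - \hat\theta$ and
$$\left[\hat\theta - c_2^*,\ \hat\theta - c_1^*\right] = \left[2\hat\theta - \hat\theta^*_{((1-\alpha/2)B)};\ 2\hat\theta - \hat\theta^*_{((\alpha/2)B)}\right]$$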

Bootstrap: Percentile bootstrap confidence intervals
Algorithm of the percentile method:
Compute $\hat\theta$ from the original sample X1, . . . , Xn
Generate a large number B of resamples and compute $\hat\theta_1^*, \ldots, \hat\theta_B^*$
Let $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \ldots \le \hat\theta^*_{(B)}$ be the order statistics
The bootstrap (1 − α)-confidence interval is
$$\left[2\hat\theta - \hat\theta^*_{((1-\alpha/2)B)};\ 2\hat\theta - \hat\theta^*_{((\alpha/2)B)}\right]$$

Bootstrap: Example 3
Parametric bootstrap 0.95-confidence interval for λ of an exponential distribution
Simple random sample X1, . . . , X8
Estimate λ by $\hat\lambda = 1/\bar X$
Estimate the unknown distribution function F by
$$F_{\hat\lambda}(x) = 1 - \exp\left(-\hat\lambda x\right)$$

Bootstrap: Example 3 (contd)
The algorithm (bootex3.R)
1 Compute $\hat\lambda = 1/\bar X$ from X1, . . . , X8
2 Draw a simple random sample $X_1^*, \ldots, X_8^*$ from $F_{\hat\lambda}$
3 Compute $\hat\lambda^* = 1/\bar X^*$
4 Repeat steps 2 and 3 a large number B of times, save the results as $\hat\lambda_1^*, \ldots, \hat\lambda_B^*$
5 The bootstrap 0.95-confidence interval (α = 0.05) is
$$\left[2\hat\lambda - \hat\lambda^*_{((1-\alpha/2)B)};\ 2\hat\lambda - \hat\lambda^*_{((\alpha/2)B)}\right]$$
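
A Python sketch of this algorithm (not the script bootex3.R; the function name, the simulated sample and B = 2000 are assumptions made for illustration):

```python
import random

def bootstrap_ci_rate(x, B, alpha, rng):
    """Parametric percentile bootstrap confidence interval for the rate of Exp(lambda)."""
    n = len(x)
    lam_hat = n / sum(x)                                     # step 1
    lam_star = sorted(
        n / sum(rng.expovariate(lam_hat) for _ in range(n))  # steps 2 and 3
        for _ in range(B)                                    # step 4
    )
    lo_q = lam_star[round(alpha / 2 * B) - 1]          # order statistic number (alpha/2) B
    hi_q = lam_star[round((1 - alpha / 2) * B) - 1]    # order statistic number (1 - alpha/2) B
    return lam_hat, 2 * lam_hat - hi_q, 2 * lam_hat - lo_q   # step 5

rng = random.Random(3)
x = [rng.expovariate(2.0) for _ in range(8)]   # illustrative sample with true rate 2
lam_hat, lower, upper = bootstrap_ci_rate(x, 2000, 0.05, rng)
print(lower, lam_hat, upper)
```

Note that the lower limit can become negative in small samples, because the upper bootstrap quantile is reflected around $\hat\lambda$.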

Bootstrap: Hypothesis testing
Test the hypotheses
$$H_0: \theta = \theta_0 \qquad H_1: \theta \ne \theta_0$$
at significance level α
Assumption: Random sample (univariate or multivariate)
Test statistic
$$T = \hat\theta - \theta_0$$

Bootstrap: Hypothesis testing
Reject H0 if the value of the test statistic is less than the α/2-quantile or greater than the (1 − α/2)-quantile of the distribution of T under H0
The p-value of the test is P(|T| > |t|)
How can we estimate the distribution of T under H0?
Wald approach: bootstrap distribution
$$T^* = \hat\theta^* - \hat\theta$$
$\hat\theta^* = \hat\theta(X_1^*, \ldots, X_n^*)$ is calculated from resamples drawn under the alternative hypothesis

Bootstrap: Hypothesis testing
Lagrange multiplier approach: bootstrap distribution
$$T^\# = \hat\theta^\# - \theta_0$$
Attention: $\hat\theta^\# = \hat\theta(X_1^\#, \ldots, X_n^\#)$ is calculated from resamples drawn under the null hypothesis!
This approach is particularly suitable for the parametric bootstrap (but can also be used for other bootstraps)

Bootstrap: Hypothesis testing: General algorithm
1 Compute the test statistic T from X1, . . . , Xn
2 Draw a resample under the null hypothesis, $X_1^\#, \ldots, X_n^\#$, or draw a resample under the alternative hypothesis, $X_1^*, \ldots, X_n^*$
3 Compute the test statistic $T^*$ or $T^\#$ for the resample
4 Repeat steps 2 and 3 a large number B of times; save the results as $T_1^\#, \ldots, T_B^\#$ or $T_1^*, \ldots, T_B^*$
5 Calculate the α/2-quantile $c_1^\#$ (or $c_1^*$) and the (1 − α/2)-quantile $c_2^\#$ (or $c_2^*$)
6 Reject H0 if the test statistic T is less than $c_1^\#$ (or $c_1^*$) or greater than $c_2^\#$ (or $c_2^*$)

Bootstrap: Example 4
Parametric bootstrap for the parameter λ of an exponential distribution X ∼ Exp(λ)
Random sample X1, . . . , X8
Hypotheses $H_0: \lambda = \lambda_0 = 2$ against $H_1: \lambda \ne \lambda_0$ (at level α = 0.05)
Test statistic
$$T = \hat\lambda - 2$$
Bootstrap of the distribution of T under the alternative hypothesis (Wald approach) (bootex4a.R)

Bootstrap: Example 4 (contd)
Bootstrap of the distribution of T under the null hypothesis (LM approach) (bootex4b.R)
Under the null hypothesis, $X^\# \sim Exp(\lambda_0)$ with $\lambda_0 = 2$
Hence, the distribution of $T^\#$ is found by an ordinary Monte Carlo simulation!
If $T < T^\#_{((\alpha/2)B)}$ or $T > T^\#_{((1-\alpha/2)B)}$, reject H0
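
A Python sketch of the LM approach for this example (not the script bootex4b.R; the function name and the two artificial samples are assumptions for illustration):

```python
import random

def lm_bootstrap_test(x, lam0, B, alpha, rng):
    """Parametric bootstrap test of H0: lambda = lam0, resampling under the null."""
    n = len(x)
    t_obs = n / sum(x) - lam0                    # T = lambda_hat - lambda_0
    t_null = sorted(
        n / sum(rng.expovariate(lam0) for _ in range(n)) - lam0   # T# from Exp(lam0) samples
        for _ in range(B)
    )
    c1 = t_null[round(alpha / 2 * B) - 1]        # alpha/2-quantile of T#
    c2 = t_null[round((1 - alpha / 2) * B) - 1]  # (1 - alpha/2)-quantile of T#
    return t_obs, c1, c2, (t_obs < c1 or t_obs > c2)

rng = random.Random(0)
# Artificial samples: the first has lambda_hat = 20, the second lambda_hat = 2
far = lm_bootstrap_test([0.05] * 8, 2.0, 2000, 0.05, rng)
near = lm_bootstrap_test([0.5] * 8, 2.0, 2000, 0.05, rng)
print(far[3], near[3])
```

Since the null distribution is fully specified, the resamples do not depend on the observed data at all.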

Bootstrap: Example 5
Nonparametric test for equality of two expectations
Two independent variables X and Y with expectations $\mu_X, \mu_Y$ and unknown variances $\sigma_X^2, \sigma_Y^2$
Hypotheses $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \ne \mu_Y$
Samples X1, . . . , Xm and Y1, . . . , Yn
Test statistic
$$T = \frac{\hat\mu_X - \hat\mu_Y}{\sqrt{\hat\sigma_X^2 + \hat\sigma_Y^2}}$$

Bootstrap: Example 5 (contd)
Case I: resampling under the alternative hypothesis (bootex5a.R)
Draw $X_1^*, \ldots, X_m^*$ with replacement from X1, . . . , Xm and $Y_1^*, \ldots, Y_n^*$ with replacement from Y1, . . . , Yn
Compute the test statistic $T^*$
Repeat this B times; calculate the quantiles of $T^*$
Reject H0 at level α = 0.05 if $T < T^*_{(0.025B)}$ or $T > T^*_{(0.975B)}$

Bootstrap: Example 5 (still contd)
Case II: resampling under the null hypothesis (bootex5b.R)
Estimate the common expectation by
$$\hat\mu = \frac{m\hat\mu_X + n\hat\mu_Y}{n + m}$$
Translate X1, . . . , Xm such that their mean is $\hat\mu$
Translate Y1, . . . , Yn such that their mean is $\hat\mu$
Resample from the translated data (i.e. under the null hypothesis); then continue as before
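
Case II can be sketched in Python as follows (not the script bootex5b.R; the function names, the simulated samples and B = 1000 are illustrative assumptions, and the test statistic uses the sample variances exactly as in the formula of Example 5).

```python
import random
import statistics

def t_stat(x, y):
    """Difference of means divided by the root of the summed sample variances."""
    return (statistics.fmean(x) - statistics.fmean(y)) / (
        statistics.variance(x) + statistics.variance(y)) ** 0.5

def two_sample_null_bootstrap(x, y, B, rng):
    m, n = len(x), len(y)
    mu = (sum(x) + sum(y)) / (m + n)                 # pooled estimate of the common expectation
    x0 = [v - statistics.fmean(x) + mu for v in x]   # translated samples: both have mean mu
    y0 = [v - statistics.fmean(y) + mu for v in y]
    t_obs = t_stat(x, y)
    # Resample from the translated data, i.e. under H0
    # (degenerate resamples with zero variance are ignored in this sketch)
    t_null = sorted(t_stat(rng.choices(x0, k=m), rng.choices(y0, k=n)) for _ in range(B))
    return t_obs, t_null[round(0.025 * B) - 1], t_null[round(0.975 * B) - 1]

rng = random.Random(5)
x = [rng.gauss(0.0, 1.0) for _ in range(30)]   # illustrative samples with different means
y = [rng.gauss(3.0, 1.0) for _ in range(30)]
t_obs, c1, c2 = two_sample_null_bootstrap(x, y, 1000, rng)
reject = t_obs < c1 or t_obs > c2
print(t_obs, c1, c2, reject)
```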

Bootstrap: Example 6
Nonparametric bootstrap for independence
Bivariate distribution (X, Y)
Hypothesis H0: X and Y are stochastically independent
Sample (X1, Y1), . . . , (Xn, Yn)
Test statistic: Empirical coefficient of correlation
$$T = \widehat{\operatorname{Corr}}(X, Y) = \frac{\sum\left(X_i - \bar X\right)\left(Y_i - \bar Y\right)}{\sqrt{\sum\left(X_i - \bar X\right)^2 \sum\left(Y_i - \bar Y\right)^2}}$$

Bootstrap: Example 6 (contd)
Resampling under the null hypothesis (bootex6.R)
Draw $X_1^\#, \ldots, X_n^\#$ with replacement from X1, . . . , Xn
Independently, draw $Y_1^\#, \ldots, Y_n^\#$ with replacement from Y1, . . . , Yn
Bootstrap distribution of
$$T^\# = \widehat{\operatorname{Corr}}(X^\#, Y^\#)$$
Reject H0 if $T < T^\#_{(0.025B)}$ or $T > T^\#_{(0.975B)}$
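
A Python sketch of this independence test (not the script bootex6.R; function names and the strongly dependent illustrative data are assumptions):

```python
import random

def corr(x, y):
    """Empirical coefficient of correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def independence_bootstrap(x, y, B, rng):
    n = len(x)
    t_obs = corr(x, y)
    # Drawing X# and Y# independently of each other imposes H0 on the resamples
    t_null = sorted(corr(rng.choices(x, k=n), rng.choices(y, k=n)) for _ in range(B))
    return t_obs, t_null[round(0.025 * B) - 1], t_null[round(0.975 * B) - 1]

rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [a + 0.1 * rng.gauss(0.0, 1.0) for a in x]   # illustrative data: Y depends strongly on X
t_obs, c1, c2 = independence_bootstrap(x, y, 1000, rng)
reject = t_obs < c1 or t_obs > c2
print(t_obs, c1, c2, reject)
```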

Bootstrap: Resampling methods: Parametric bootstrap
Parametric bootstrap under the alternative hypothesis
1 Compute the estimate $\hat\theta$ from the original data X1, . . . , Xn
2 The estimated distribution function is $\hat F = F_{\hat\theta}$
3 Draw $X_1^*, \ldots, X_n^*$ from $F_{\hat\theta}$ and compute $\hat\theta^*$
4 Repeat step 3 a large number of times to determine the required distribution

Bootstrap: Resampling methods: Parametric bootstrap
Parametric bootstrap under the null hypothesis
1 The estimated distribution function is $\hat F = F_{\theta_0}$. If the distribution function is not completely specified by $\theta_0$, choose $\hat F$ “as close as possible” to $\hat\theta$
2 Draw $X_1^\#, \ldots, X_n^\#$ from $F_{\theta_0}$ and compute $\hat\theta^\#$
3 Repeat step 2 a large number of times to determine the required distribution

Bootstrap: Resampling methods: Nonparametric bootstrap
Nonparametric bootstrap under the alternative hypothesis
1 The estimated distribution function is $\hat F = F_n$ (empirical distribution function)
2 Draw $X_1^*, \ldots, X_n^*$ with replacement from X1, . . . , Xn and compute $\hat\theta^*$
3 Repeat step 2 a large number of times to determine the required distribution

Bootstrap: Resampling methods: Nonparametric bootstrap
Nonparametric bootstrap under the null hypothesis
1 The estimated distribution function $\hat F$ is a weighted empirical distribution function
2 Draw $X_1^\#, \ldots, X_n^\#$ with replacement (but with different probabilities) from X1, . . . , Xn
The probabilities are chosen such that $\hat F$ satisfies H0. If they are not unique, choose an optimality criterion, e.g. maximal entropy
3 Repeat step 2 a large number of times to determine the required distribution

Bootstrap: Resampling methods: Smooth bootstrap
Smooth bootstrap under the alternative hypothesis
Kernel density estimation (e.g. with Gaussian kernel φ)
$$\hat f_X(x) = \frac{1}{nh}\sum_{i=1}^{n} \phi\left(\frac{x - X_i}{h}\right)$$
Estimated distribution function $\hat F(x) = \int_{-\infty}^{x} \hat f_X(z)\,dz$
Draw $X_1^*, \ldots, X_n^*$ from $\hat F(x)$

Bootstrap: Resampling methods: Smooth bootstrap
Drawing from $\hat F(x)$ is equivalent to the following method:
1 Draw Z1, . . . , Zn with replacement from X1, . . . , Xn
2 Draw ε1, . . . , εn from a standard normal distribution
3 For i = 1, . . . , n, compute
$$X_i^* = Z_i + h\varepsilon_i$$
Smooth bootstrap: nonparametric bootstrap with additional noise
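
The three steps translate directly into code. A minimal Python sketch, where the function name `smooth_resample`, the sample values and the bandwidth h = 0.3 are assumptions for illustration:

```python
import random

def smooth_resample(x, h, rng):
    """One smooth-bootstrap resample: resampled observation plus Gaussian kernel noise."""
    # rng.choice(x) is Z_i (step 1), rng.gauss(0, 1) is eps_i (step 2), the sum is X*_i (step 3)
    return [rng.choice(x) + h * rng.gauss(0.0, 1.0) for _ in range(len(x))]

rng = random.Random(2)
x = [1.2, 0.7, 3.4, 2.2, 1.9]
print(smooth_resample(x, 0.3, rng))
```

With h = 0 the method reduces to the ordinary nonparametric bootstrap.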

Bootstrap: Warning
The bootstrap approximates the distribution of $\hat\theta$ (or some transformations of $\hat\theta$) if the model is correctly specified
Bias due to misspecification cannot be found by bootstrapping!
Example: Errors-in-variables, omitted variables
The validity of the bootstrap approximation can usually be shown only asymptotically, i.e. for B → ∞ and n → ∞
Experience shows that the bootstrap often yields good approximations of the small-sample distribution of $\hat\theta$

Bootstrap: Regression
Simple linear regression model
$$y_i = \alpha + \beta x_i + u_i$$
for i = 1, . . . , n with i.i.d. error terms ui
Let E(ui | xi) = 0 for all i = 1, . . . , n
OLS estimator of β is
$$\hat\beta = \frac{\sum_{i=1}^{n}\left(x_i - \bar x\right)\left(y_i - \bar y\right)}{\sum_{i=1}^{n}\left(x_i - \bar x\right)^2}$$

Bootstrap: Regression
OLS estimator of α is $\hat\alpha = \bar y - \hat\beta \bar x$
Fitted values
$$\hat y_i = \hat\alpha + \hat\beta x_i$$
Residuals
$$\hat u_i = y_i - \hat y_i$$
Estimated error term variance
$$\hat\sigma^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat u_i^2$$

Bootstrap: Regression
How can we construct a (1 − α)-confidence interval for β?
Usual approach: Normal approximation (here for α = 0.05)
$$\left[\hat\beta - 1.96 \cdot SE(\hat\beta);\ \hat\beta + 1.96 \cdot SE(\hat\beta)\right]$$
with standard error $SE(\hat\beta) = \sqrt{\hat\sigma^2 / \sum\left(x_i - \bar x\right)^2}$
Alternative method (1): bootstrap the residuals
Alternative method (2): bootstrap the observations (xi, yi)

Bootstrap: Regression
Bootstrap the residuals
The unknown distribution function F is the distribution function of the error terms
The estimated distribution function $\hat F$ is the (parametrically or nonparametrically) estimated distribution function of the residuals $\hat u_1, \ldots, \hat u_n$
The x-values are kept constant
Only the error terms are resampled

Bootstrap: Regression
Algorithm (nonparametric) (bootregr1.R)
1 Estimate the model ($\hat\alpha$, $\hat\beta$) from the data and calculate the residuals $\hat u_1, \ldots, \hat u_n$
2 Draw a resample $u_1^*, \ldots, u_n^*$ with replacement from $\hat u_1, \ldots, \hat u_n$
3 For i = 1, . . . , n generate
$$y_i^* = \hat\alpha + \hat\beta x_i + u_i^*$$
4 Compute $\hat\beta^*$ from $(x_1, y_1^*), \ldots, (x_n, y_n^*)$
5 Repeat steps 2 to 4 a large number B of times and proceed as usual (standard errors, confidence intervals, tests)
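
A Python sketch of the residual bootstrap (not the script bootregr1.R; the helper names `ols` and `residual_bootstrap` and the simulated data are assumptions for illustration):

```python
import random

def ols(x, y):
    """OLS estimates (alpha_hat, beta_hat) of the simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - beta * mx, beta

def residual_bootstrap(x, y, B, rng):
    alpha_hat, beta_hat = ols(x, y)                                    # step 1
    resid = [b - alpha_hat - beta_hat * a for a, b in zip(x, y)]
    betas = []
    for _ in range(B):
        u_star = rng.choices(resid, k=len(x))                          # step 2
        y_star = [alpha_hat + beta_hat * a + u for a, u in zip(x, u_star)]   # step 3: x kept fixed
        betas.append(ols(x, y_star)[1])                                # step 4
    return beta_hat, betas

rng = random.Random(4)
x = [i / 10 for i in range(30)]
y = [1.0 + 2.0 * a + rng.gauss(0.0, 0.5) for a in x]   # illustrative data
beta_hat, betas = residual_bootstrap(x, y, 1000, rng)
print(beta_hat, min(betas), max(betas))
```

The list `betas` holds the bootstrap replications $\hat\beta_1^*, \ldots, \hat\beta_B^*$, from which standard errors or percentile intervals can be computed as on the earlier slides.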

Bootstrap: Regression
Bootstrap of the observations
The unknown distribution function F is the joint distribution function of (xi, yi)
The estimated distribution function $\hat F$ is the (usually nonparametrically) estimated multivariate distribution function of the observations (x1, y1), . . . , (xn, yn)
The x-values are different in each resample

Bootstrap: Regression
Algorithm (bootregr2.R)
1 Estimate $\hat\beta$ from the data
2 Draw a resample $(x_1^*, y_1^*), \ldots, (x_n^*, y_n^*)$ with replacement from (x1, y1), . . . , (xn, yn)
3 Compute $\hat\beta^*$ from $(x_1^*, y_1^*), \ldots, (x_n^*, y_n^*)$
4 Repeat steps 2 and 3 a large number B of times and proceed as usual (standard errors, confidence intervals, tests)
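
A Python sketch of the bootstrap of the observations (not the script bootregr2.R; the helper names and the simulated data are assumptions for illustration). Resamples in which all x-values coincide are skipped, because the slope is not defined for them; this safeguard is an addition of the sketch.

```python
import random

def ols_slope(x, y):
    """OLS slope estimate of the simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def pairs_bootstrap(x, y, B, rng):
    beta_hat = ols_slope(x, y)                                   # step 1
    pairs = list(zip(x, y))
    betas = []
    for _ in range(B):
        xs, ys = zip(*rng.choices(pairs, k=len(pairs)))          # step 2: resample whole pairs
        if len(set(xs)) > 1:                                     # skip degenerate resamples
            betas.append(ols_slope(xs, ys))                      # step 3
    return beta_hat, betas

rng = random.Random(6)
x = [rng.uniform(0.0, 3.0) for _ in range(30)]
y = [1.0 + 2.0 * a + rng.gauss(0.0, 0.5) for a in x]   # illustrative data
beta_hat, betas = pairs_bootstrap(x, y, 1000, rng)
print(beta_hat, min(betas), max(betas))
```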