Gerzensee, Econometrics Week 3, March 2020
Linear Models with Serially Correlated Data
Review from Bo's Week 2…
Background: Asymptotics for Serially Correlated Processes – LLN and CLT
(1) LLN and Ergodicity. A process {z_t} is ergodic if its elements are asymptotically independent – that is, if random variables that are far apart in the sequence are essentially statistically independent of one another (see Hayashi, page 101, and Hamilton, pages 46-47). Ergodicity is important because, together with stationarity, it leads to a SLLN:

Ergodic Theorem: Suppose {z_t} is stationary and ergodic with E(z_t) = µ. Then

T^{-1} ∑_{t=1}^{T} z_t →_{a.s.} µ.

This is a generalization of the SLLN. (For a proof of the theorem and a more detailed discussion see Karlin and Taylor (1975).)

If {z_t} is stationary and ergodic, then so is x_t = f(z_t) for an arbitrary function f.
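The following is a minimal simulation sketch (not part of the original notes) illustrating the ergodic theorem for a stationary AR(1) process: the sample mean converges to the population mean even though the observations are serially dependent. The parameter values are illustrative.

```python
import numpy as np

# Minimal illustration of the ergodic theorem: the sample mean of a
# stationary, ergodic AR(1) process converges to its population mean.
rng = np.random.default_rng(0)

mu, phi, sigma = 2.0, 0.8, 1.0   # illustrative parameter values

def ar1_sample_mean(T):
    # Simulate z_t - mu = phi*(z_{t-1} - mu) + e_t, starting from the
    # stationary distribution, and return the sample mean.
    z = np.empty(T)
    z[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - phi**2))
    for t in range(1, T):
        z[t] = mu + phi * (z[t - 1] - mu) + rng.normal(scale=sigma)
    return z.mean()

for T in (100, 1_000, 10_000, 100_000):
    print(T, ar1_sample_mean(T))   # approaches mu = 2.0 as T grows
```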
(2) CLT for martingale difference sequences (mds): Let {g_t} be a (possibly vector-valued) mds that is stationary and ergodic with E(g_t g_t′) = Σ_gg. Then

√T ḡ = (1/√T) ∑_{t=1}^{T} g_t →_d N(0, Σ_gg).

See Hayashi (p. 106). Notes: While {g_t} is serially uncorrelated, it may be serially dependent (through higher-order moments). E(g_t g_t′) = Σ_gg concerns the "unconditional" variance. The conditional variance may be non-constant.
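As a concrete (hypothetical) example of the note above, the sketch below simulates an ARCH-type mds: g_t has conditional variance depending on g_{t−1}, so it is serially uncorrelated but not independent, and √T ḡ is still approximately normal with variance equal to the unconditional variance Σ_gg. The recursion and parameters are my own illustration.

```python
import numpy as np

# A martingale difference sequence that is serially dependent through its
# second moments (ARCH-type conditional variance), illustrating that "mds"
# does not mean "independent".  Parameters are illustrative.
rng = np.random.default_rng(1)
T, reps = 500, 2000
omega, alpha = 1.0, 0.5          # conditional variance: omega + alpha*g_{t-1}^2

stats = []
for _ in range(reps):
    g = np.empty(T)
    g_prev = 0.0
    for t in range(T):
        sigma_t = np.sqrt(omega + alpha * g_prev**2)
        g[t] = sigma_t * rng.standard_normal()
        g_prev = g[t]
    stats.append(np.sqrt(T) * g.mean())

stats = np.array(stats)
# Unconditional variance Sigma_gg = omega/(1 - alpha) for this recursion.
print("simulated var of sqrt(T)*gbar:", stats.var())
print("Sigma_gg = omega/(1-alpha)   :", omega / (1 - alpha))
```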
Review: Linear Model with Serially Correlated Regressors
These notes will work through the linear regression model and present conditions under which the results in the i.i.d. setting continue to hold when the data are serially correlated. I will not work through the linear IV model, but under analogous conditions the i.i.d. results will also continue to hold. Chapter 3 of Hayashi provides the details. Replace the i.i.d. assumptions with the following:
(i) {y_t, x_t} is a stationary and ergodic process.
(ii) E(ε_t x_t) = 0, or, letting g_t = ε_t x_t, E(g_t) = 0.
(iii) E(x_t x_t′) = Σ_xx, which is non-singular.
(iv) {g_t} is a mds with E(g_t g_t′) = Σ_gg.

Notes: E(ε_t x_t) = 0 is weaker than E(ε_t | X) = 0 and E(ε_t | x_t) = 0. It is sometimes (as in Hamilton) called the assumption of predetermined regressors. We will return to the difference between these assumptions when we discuss GLS in the context of serially correlated errors. E(g_t g_t′) = Σ_gg allows the errors to be heteroskedastic conditional on the regressors – that is, it does not require that Σ_gg = σ_ε² Σ_xx. "{g_t} is a mds" is a "high-level" assumption concerning the cross product of ε_t and x_t. At a more primitive level it is implied by

E(ε_t | {ε_i, x_i}_{i=1}^{t−1}, x_t) = 0.
Properties of β̂

Consistency: β̂ →_p β.

Proof:

β̂ − β = ((1/T) ∑ x_t x_t′)^{-1} ((1/T) ∑ g_t),

and {x_t x_t′} is stationary and ergodic, with E(x_t x_t′) = Σ_xx, so that

(1/T) ∑ x_t x_t′ →_p Σ_xx,

which is non-singular. Also {g_t} is stationary and ergodic with E(g_t) = 0. Thus

(1/T) ∑ g_t →_p 0,

and the result follows from Slutsky's theorem.
Asymptotic Normality: √T(β̂ − β) →_d N(0, V_β̂), where V_β̂ = Σ_xx^{-1} Σ_gg Σ_xx^{-1}.

Proof:

From earlier:

√T(β̂ − β) = ((1/T) ∑ x_t x_t′)^{-1} ((1/√T) ∑ g_t).

Also

(1/T) ∑ x_t x_t′ →_p Σ_xx (nonsingular)

and

(1/√T) ∑ g_t →_d N(0, Σ_gg),

which follows by the CLT for a mds. The result then follows by Slutsky's theorem.
Feasible inference:

(1) Let Σ̂_gg be a consistent estimator of Σ_gg. Then V̂_β = S_xx^{-1} Σ̂_gg S_xx^{-1} →_p V_β̂, where S_xx = (1/T) ∑ x_t x_t′.

Proof: (You should fill in – use Slutsky's theorem.)

(2) t_j = √T(β̂_j − β_j) / √((V̂_β)_{jj}) →_d N(0, 1).

Proof: (You should fill in.)

(3) Let the Wald statistic be written as

ξ_W = T(Rβ̂ − Rβ)′ (R V̂_β R′)^{-1} (Rβ̂ − Rβ),

then

ξ_W →_d χ²_m and ξ_W / m →_d F_{m,∞},

where rank(R) = m.

Proof: (You should fill in – use Slutsky's theorem and the continuous mapping theorem.)
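A minimal sketch (my own illustration, not from the notes) of feasible inference under the mds assumption: estimate β by OLS, form S_xx, the estimator Σ̂_gg = (1/T)∑ ĝ_t ĝ_t′ discussed in (4) below, the sandwich V̂_β, and the t-statistic from result (2). The data-generating process is made up for illustration.

```python
import numpy as np

# Sketch: OLS with the "sandwich" variance Vhat = Sxx^{-1} Sgg_hat Sxx^{-1},
# where Sgg_hat = (1/T) * sum of (ehat_t * x_t)(ehat_t * x_t)'.
# The DGP below (AR(1) regressor, heteroskedastic error) is illustrative.
rng = np.random.default_rng(2)
T = 2000
beta = np.array([1.0, 0.5])

# regressors: a constant and a stationary AR(1)
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.6 * z[t - 1] + rng.standard_normal()
X = np.column_stack([np.ones(T), z])
eps = rng.standard_normal(T) * np.sqrt(0.5 + 0.5 * z**2)   # conditional heteroskedasticity
y = X @ beta + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ehat = y - X @ beta_hat
Sxx = X.T @ X / T
g = X * ehat[:, None]                 # ghat_t = ehat_t * x_t  (T x k)
Sgg = g.T @ g / T
V_hat = np.linalg.inv(Sxx) @ Sgg @ np.linalg.inv(Sxx)

se = np.sqrt(np.diag(V_hat) / T)      # standard errors of beta_hat
t_stats = (beta_hat - beta) / se      # approximately N(0,1) under this DGP
print("beta_hat:", beta_hat, "se:", se, "t:", t_stats)
```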
(4) Suppose that E[(x_{t,i} x_{t,j})²] is finite for all i and j. Let ε̂_t = y_t − x_t′β̂, ĝ_t = ε̂_t x_t, and Σ̂_gg = (1/T) ∑ ĝ_t ĝ_t′. Then Σ̂_gg →_p Σ_gg.

Proof (in the model with k = 1, for simplicity):

ε̂_t = y_t − x_tβ − x_t(β̂ − β) = ε_t − x_t(β̂ − β),

so that

Σ̂_gg = (1/T) ∑ ε̂_t² x_t² = (1/T) ∑ ε_t² x_t² + (β̂ − β)² (1/T) ∑ x_t⁴ − 2(β̂ − β) (1/T) ∑ ε_t x_t³.

Now

(1/T) ∑ ε_t² x_t² = (1/T) ∑ g_t² →_p Σ_gg

by the ergodic theorem. Since (β̂ − β) →_p 0 and (1/T) ∑ x_t⁴ →_p E(x_t⁴), we have

(β̂ − β)² (1/T) ∑ x_t⁴ →_p 0.

Note that E(ε_t x_t³) is finite since

|E(ε_t x_t³)| ≤ [E(ε_t² x_t²) E(x_t⁴)]^{1/2}  (Cauchy–Schwarz),

and thus

(1/T) ∑ ε_t x_t³ →_p E(ε_t x_t³),

so that

(β̂ − β) (1/T) ∑ ε_t x_t³ →_p 0,

and the result follows.
Application to the AR(p) model

Some preliminaries:

(a) Suppose y_t follows the MA process

y_t = θ(L)ε_t = ∑_{i=0}^{∞} θ_i ε_{t−i},

where ε_t ~ iid(0, σ²). Let λ_i = E(y_t y_{t+i}) denote the i-th autocovariance of {y_t}. If

∑_{i=0}^{∞} |θ_i| < ∞,

then

∑_{i=0}^{∞} |λ_i| < ∞,

and the process is stationary and ergodic.

Proof: Hamilton, pages 69-70, and Dhrymes, page 370.
(b) Suppose φ(L)y_t = ε_t, where ε_t ~ iid(0, σ²) and φ(L) = 1 − φ_1 L − … − φ_p L^p has roots outside the unit circle. The AR model can be inverted to yield y_t = θ(L)ε_t with ∑_{i=0}^{∞} |θ_i| < ∞.

This can be verified using the results that we worked out for AR models earlier in the week.
Consider using OLS to estimate the coefficients of the AR model. Maintain the assumptions that ε_t is iid(0, σ²) and that the roots of φ(z) are outside the unit circle. Write the model as

y_t = x_t′β + ε_t,

where x_t = (y_{t−1}, y_{t−2}, ..., y_{t−p})′ and β = (φ_1, φ_2, ..., φ_p)′. Then

β̂ →_p β

and

√T(β̂ − β) →_d N(0, V_β),

where

V_β = σ² Σ_xx^{-1},

with Σ_xx = E(x_t x_t′).

Proof: Key points:

{y_t, x_t} is stationary and ergodic, with [E(x_t x_t′)]_{ij} = E(y_{t−i} y_{t−j}) = λ_{|i−j|}.

g_t = ε_t x_t, where ε_t is independent of x_{t+i} for i ≤ 0, independent of ε_{t+i} for i < 0, and E(ε_t) = 0. Thus E(ε_t | {ε_i, x_i}_{i=1}^{t−1}, x_t) = 0, and so g_t is a mds.

E(g_t g_t′) = E(ε_t² x_t x_t′) = E[E(ε_t² x_t x_t′ | x_t)] = σ² Σ_xx.

The result then follows from the general results given above.
AR(1) Example

y_t = βx_t + ε_t with β = φ and x_t = y_{t−1}. Then

Σ_xx = var(y_{t−1}) = var(y_t) = σ²/(1 − φ²)

and

√T(φ̂ − φ) →_d N(0, V_φ)

with

V_φ = σ² Σ_xx^{-1} = 1 − φ²,

so that

φ̂ ~^a N(φ, (1 − φ²)/T),

and an approximate 95% confidence interval for φ is given by

φ̂ ± 1.96 √((1 − φ̂²)/T).
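Below is a minimal simulation sketch of this AR(1) confidence interval: estimate φ by regressing y_t on y_{t−1} and form φ̂ ± 1.96√((1 − φ̂²)/T). The true φ and sample size are illustrative.

```python
import numpy as np

# AR(1): estimate phi by OLS of y_t on y_{t-1} and form the approximate
# 95% confidence interval phi_hat +/- 1.96 * sqrt((1 - phi_hat^2)/T).
rng = np.random.default_rng(3)
phi_true, T = 0.7, 500                      # illustrative values

y = np.zeros(T + 1)
for t in range(1, T + 1):
    y[t] = phi_true * y[t - 1] + rng.standard_normal()

y_lag, y_cur = y[:-1], y[1:]
phi_hat = (y_lag @ y_cur) / (y_lag @ y_lag)  # OLS without intercept
se = np.sqrt((1 - phi_hat**2) / T)
print("phi_hat:", phi_hat)
print("95% CI :", (phi_hat - 1.96 * se, phi_hat + 1.96 * se))
```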
Dropping the Assumption that {g_t} is a mds

Reference: Hayashi, Chapter 6.

1. Implications for standard GMM inference

The asymptotic distributions of the OLS and GMM estimators are driven by the asymptotic normality of (1/√T) ∑_{t=1}^{T} g_t.

What happens when g_t is serially correlated (so that g_t is not a mds)?

It turns out that, as long as g_t is "weakly dependent," then

(1/√T) ∑_{t=1}^{T} g_t →_d N(0, Ω),

but Ω ≠ Σ_gg.
Before stating the result, it is useful to work out an expression for Ω. Suppose g_t is stationary with autocovariances λ_j. Then

var(T^{-1/2} ∑_{t=1}^{T} g_t) = (1/T){Tλ_0 + (T−1)(λ_1 + λ_{−1}) + (T−2)(λ_2 + λ_{−2}) + ... + 1·(λ_{T−1} + λ_{1−T})}
= ∑_{j=−(T−1)}^{T−1} λ_j − (1/T) ∑_{j=1}^{T−1} j(λ_j + λ_{−j}).

If the autocovariances satisfy ∑_{j=1}^{∞} j|λ_j| < ∞ (jargon: they are "1-summable"), then

var(T^{-1/2} ∑_{t=1}^{T} g_t) → ∑_{j=−∞}^{∞} λ_j.

This will be the expression for Ω. Recall that we defined the autocovariance generating function (ACGF) as

λ(z) = ∑_{j=−∞}^{∞} λ_j z^j.

Looking at this, and the expression for Ω, you can see that Ω can be computed from the ACGF by setting z = 1. That is, Ω = λ(1).
Now, a few results:

CLT for the MA(∞) model

Let

y_t = µ + ∑_{j=0}^{∞} θ_j ε_{t−j},

where ε_t ~ iid(0, σ²) and ∑ |θ_j| < ∞. Then

√T(ȳ − µ) →_d N(0, ∑_{j=−∞}^{∞} λ_j).

Proof: Anderson (1971, page 429) or Hamilton (page 195).
Example: MA(1)

Suppose y_t = µ + ε_t − θε_{t−1}. Then

√T(ȳ − µ) = (1/√T) ∑_{t=1}^{T} ε_t − θ(1/√T) ∑_{t=0}^{T−1} ε_t = (1 − θ)(1/√T) ∑_{t=1}^{T} ε_t + θ(1/√T)(ε_T − ε_0).

Notice

(1 − θ)(1/√T) ∑_{t=1}^{T} ε_t →_d N(0, σ²(1 − θ)²)

and

θ(1/√T)(ε_T − ε_0) →_p 0.

Finally, to reconcile with earlier notation: for the MA model, λ(z) = σ²θ(z)θ(z^{-1}), so that

λ(1) = ∑_{j=−∞}^{∞} λ_j = σ²θ(1)² = σ²(1 − θ)²,

and this is the result stated in the theorem.
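A small simulation sketch (illustrative, not from the notes) checking that for this MA(1) the variance of √T(ȳ − µ) is close to the long-run variance σ²(1 − θ)², rather than to the unconditional variance σ²(1 + θ²).

```python
import numpy as np

# MA(1): y_t = eps_t - theta*eps_{t-1}.  The variance of sqrt(T)*ybar is
# approximately sigma^2*(1-theta)^2, not the unconditional variance
# sigma^2*(1+theta^2).  Parameter values are illustrative.
rng = np.random.default_rng(4)
theta, sigma, T, reps = 0.5, 1.0, 1000, 5000

stats = np.empty(reps)
for r in range(reps):
    e = rng.normal(scale=sigma, size=T + 1)
    y = e[1:] - theta * e[:-1]
    stats[r] = np.sqrt(T) * y.mean()

print("var of sqrt(T)*ybar (simulated)      :", stats.var())
print("long-run var sigma^2*(1-theta)^2     :", sigma**2 * (1 - theta) ** 2)
print("unconditional var sigma^2*(1+theta^2):", sigma**2 * (1 + theta**2))
```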
CLT for ergodic and stationary processes

Suppose that {y_t} is stationary and ergodic with finite variance. Then, under a set of "dependence" assumptions analogous to absolutely summable MA coefficients,

√T(ȳ − µ) →_d N(0, ∑_{j=−∞}^{∞} λ_j).

The additional assumptions are given in White (1984), Theorem 5.15.

Properties of OLS and GMM when T^{-1/2} ∑_{t=1}^{T} g_t →_d N(0, Ω)

These are the same as the results presented above, but with Ω replacing Σ_gg.
2. Digression and a Review of GLS

Consider the regression model Y = Xβ + u, where Y is n×1, X is n×k, and so forth. Suppose that E(u | X) = 0 and Var(u | X) = Λ. If Λ = σ²I, then the OLS estimator of β, say β̂_OLS, is the best linear unbiased estimator of β conditional on X (the Gauss-Markov theorem). Moreover, if the errors have a conditional Gaussian distribution, β̂_OLS is the MLE and achieves the CR lower bound, and therefore is the minimum variance unbiased estimator conditional on X.

When Λ ≠ σ²I, β̂_OLS does not (in general) have these efficiency properties. Another estimator, the "generalized least squares" estimator, does. This estimator is

β̂_GLS = (X′Λ^{-1}X)^{-1} X′Λ^{-1}Y.

To motivate this estimator, write Λ = Λ^{1/2}Λ^{1/2}′, so that Λ^{-1} = Λ^{-1/2}′Λ^{-1/2} and Λ^{-1/2}ΛΛ^{-1/2}′ = I. Multiplying the regression relation by Λ^{-1/2} yields

Λ^{-1/2}Y = Λ^{-1/2}Xβ + Λ^{-1/2}u, or Ỹ = X̃β + ũ,

where Ỹ = Λ^{-1/2}Y, X̃ = Λ^{-1/2}X, and ũ = Λ^{-1/2}u. Note that E(ũ | X̃) = 0 and Var(ũ | X̃) = I. Because Ỹ and X̃ are nonsingular transformations of Y and X, the best linear unbiased estimator of β conditional on X is (from the Gauss-Markov theorem):

β̂_GLS = (X̃′X̃)^{-1} X̃′Ỹ = (X′Λ^{-1}X)^{-1} X′Λ^{-1}Y,

where the final equality follows from the definitions of Ỹ and X̃.
An alternative way to derive the estimator is to write the Gaussian conditional likelihood function: Y | X ~ N(Xβ, Λ), so that

f(Y | X) = (2π)^{-n/2} |Λ|^{-1/2} exp{ −(1/2)(Y − Xβ)′Λ^{-1}(Y − Xβ) },

so that the MLE solves

min_b (Y − Xb)′Λ^{-1}(Y − Xb).

Carrying out this minimization yields β̂_GLS.
Examples:

(1) Weighted least squares:

Suppose that Λ = diag(σ_i²). Then Λ^{-1/2} = diag(1/σ_i), and ỹ_i = y_i/σ_i, x̃_i = x_i/σ_i.

The GLS estimator can be constructed as the OLS regression of ỹ_i onto x̃_i. Notice that this estimator is OLS after re-weighting the observations, where the weight applied to the i-th observation is 1/σ_i (so that observations corresponding to u_i with a low variance receive more weight).
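A minimal sketch of weighted least squares as described above: divide each observation by σ_i and run OLS on the transformed data. The σ_i are treated as known and, like the rest of the DGP, are made up for illustration.

```python
import numpy as np

# Weighted least squares: with Lambda = diag(sigma_i^2) known, GLS is OLS of
# y_i/sigma_i on x_i/sigma_i.  The DGP is illustrative.
rng = np.random.default_rng(5)
n = 500
x = np.column_stack([np.ones(n), rng.standard_normal(n)])
sigma = 0.5 + 2.0 * rng.random(n)            # known heteroskedastic std devs
beta = np.array([1.0, 2.0])
y = x @ beta + sigma * rng.standard_normal(n)

y_tilde = y / sigma
x_tilde = x / sigma[:, None]
beta_gls = np.linalg.solve(x_tilde.T @ x_tilde, x_tilde.T @ y_tilde)
beta_ols = np.linalg.solve(x.T @ x, x.T @ y)
print("beta_GLS:", beta_gls)
print("beta_OLS:", beta_ols)
```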
(2) GLS in time series models:

Suppose that u_t = c(L)ε_t, where ε_t ~ iid(0, σ²). Then (ignoring effects associated with initial conditions) ũ_t = c(L)^{-1}u_t = ε_t, so that the GLS estimator can be constructed by regressing c(L)^{-1}y_t onto c(L)^{-1}x_t via OLS.

As an example: suppose that (1 − ρL)u_t = ε_t, so that c(L) = (1 − ρL)^{-1}. Then the GLS estimator is formed by regressing (1 − ρL)y_t = y_t − ρy_{t−1} onto (1 − ρL)x_t = x_t − ρx_{t−1}. Notice that one observation is "lost" using this transformation. A calculation shows that the initial observation should be constructed as ỹ_1 = (1 − ρ²)^{1/2}y_1 and x̃_1 = (1 − ρ²)^{1/2}x_1. Because the first observation is asymptotically negligible relative to the information in the other T−1 observations, the first observation is often dropped from the analysis.
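A minimal sketch of the AR(1) quasi-differencing transformation just described, with ρ assumed known and the first observation rescaled by √(1 − ρ²) as in the notes (a Prais-Winsten-type treatment, my label); the DGP is illustrative.

```python
import numpy as np

# GLS with AR(1) errors and known rho: regress y_t - rho*y_{t-1} on
# x_t - rho*x_{t-1}, with the first observation scaled by sqrt(1-rho^2).
rng = np.random.default_rng(6)
T, rho, beta = 1000, 0.8, np.array([1.0, 2.0])

x = np.column_stack([np.ones(T), rng.standard_normal(T)])
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + rng.standard_normal()
y = x @ beta + u

# quasi-differenced data
y_qd = np.empty(T)
x_qd = np.empty_like(x)
y_qd[0] = np.sqrt(1 - rho**2) * y[0]
x_qd[0] = np.sqrt(1 - rho**2) * x[0]
y_qd[1:] = y[1:] - rho * y[:-1]
x_qd[1:] = x[1:] - rho * x[:-1]

beta_gls = np.linalg.solve(x_qd.T @ x_qd, x_qd.T @ y_qd)
print("beta_GLS:", beta_gls)
```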
Feasible GLS:

To construct the GLS estimator, you need to know Λ. Suppose that it is unknown, but depends on a small number of parameters, say θ, so that Λ = Λ(θ). This suggests that Λ can be estimated as Λ̂ = Λ(θ̂). The "feasible" GLS estimator is

β̂_FGLS = (X′Λ̂^{-1}X)^{-1} X′Λ̂^{-1}Y.

In many models it is possible to show that √n(β̂_FGLS − β̂_GLS) →_p 0, so that, in large samples, β̂_FGLS shares the same efficiency properties as β̂_GLS.
When should you use OLS and HAC standard errors instead of GLS?

Consider the regression model

y_t = x_t′β + u_t,

where u_t is the regression error. Suppose that E(u_t | x_t) = 0, so that E(u_t x_t) = 0. Given the other assumptions discussed above, the OLS estimator of β is consistent and asymptotically normal.

Now suppose that u_t = φu_{t−1} + ε_t, where ε_t ~ iid(0, σ²). This suggests that the GLS estimator might be used. The GLS estimator can be formed as the OLS estimator applied to the transformed model:

ỹ_t = x̃_t′β + ε_t,

where ỹ_t = y_t − φy_{t−1} and similarly for x̃_t. As discussed above, under certain assumptions the GLS estimator is BLUE, and hence is more efficient than the OLS estimator. The key assumption underlying the consistency of the GLS estimator is that E(ε_t x̃_t) = 0. Alternatively, noting that ε_t = u_t − φu_{t−1}, this can be written as

E[(u_t − φu_{t−1})(x_t − φx_{t−1})] = E(u_t x_t) + φ²E(u_{t−1}x_{t−1}) − φE(u_t x_{t−1}) − φE(u_{t−1}x_t) = 0.

For this to be true for all values of φ it must be the case that

E(u_t x_t) = 0       (Term 1)
E(u_{t−1} x_{t−1}) = 0   (Term 2)
E(u_t x_{t−1}) = 0     (Term 3)
E(u_{t−1} x_t) = 0     (Term 4)

The first two of these are implied by E(u_t | x_t) = 0, the same assumption used for the consistency of OLS. The last two restrictions are not.
These two restrictions are implied by stronger assumptions. Term 3 is implied by E(u_t | x_t, x_{t−1}, ...) = 0. When this holds, the regressors are said to be predetermined (or sometimes the term exogenous is used). Term 4 is implied by the stronger assumption E(u_t | ..., x_{t+1}, x_t, x_{t−1}, ...) = 0. When this holds, the regressors are said to be strictly exogenous. Evidently, GLS requires the assumption of strict exogeneity.

Examples to be discussed:
(a) Multi-period forecast efficiency
(b) Orange juice prices and the weather
HAC and HAR inference

(Same setup as above.) y_t = x_t′β + ε_t, where (y_t, x_t) is stationary and ergodic, and so forth.

g_t = ε_t x_t, and

√T S_Xε = √T ḡ = (1/√T) ∑_{t=1}^{T} g_t →_d N(0, Ω),

with Ω = ∑_{j=−∞}^{∞} λ_j and λ_j = E(g_t g_{t+j}′).

√T(β̂ − β) →_d N(0, V_β) with V_β = Σ_XX^{-1} Ω Σ_XX^{-1}.

Let V̂ = S_XX^{-1} Ω̂ S_XX^{-1} and ξ_W = T(β̂ − β)′ V̂^{-1}(β̂ − β).

If Ω̂ →_p Ω, then V̂ →_p V_β, and ξ_W →_d χ²_k.
Focus on estimators of Ω and how this affects inference.

Simple case: x_t = 1 and β = 0. Thus the g_t (= y_t) are the data, with E(g_t) = 0,

β̂ − β = ḡ = T^{-1} ∑_{t=1}^{T} g_t,

√T ḡ →_d N(0, Ω),

and

ξ_W(Ω) = T ḡ Ω^{-1} ḡ →_d χ²_1;

and let ξ_W(Ω̂) = T ḡ Ω̂^{-1} ḡ.
Estimators of Ω:

HAC: The goal is to estimate

Ω = ∑_{j=−∞}^{∞} λ_j.

With a finite sample of data it is impossible to consistently estimate Ω for all possible sequences {λ_j}. But for special sequences, consistent estimation is possible. Two examples:
Example 1: Suppose λ_j = 0 for |j| > 1 (so g_t follows an MA(1) process). Then one need only estimate the variance and first autocovariance of the process. These can be estimated consistently. Thus

Ω̂ = ∑_{j=−1}^{1} λ̂_j

is consistent.
Example 2: Suppose g_t ~ AR(1). In this case λ_j = σ²φ^{|j|}/(1 − φ²), and (from the formula for the ACGF)

Ω = ∑_{j=−∞}^{∞} λ_j = σ²/(1 − φ)².

This can be consistently estimated by estimating the two parameters characterizing the AR(1) process, σ² and φ, which yields

Ω̂ = σ̂²/(1 − φ̂)².
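A sketch of Example 2 in code (my own illustration): fit an AR(1) to a scalar g_t by OLS and form Ω̂ = σ̂²/(1 − φ̂)².

```python
import numpy as np

# Parametric HAC estimator for a scalar g_t modeled as an AR(1):
# fit g_t = phi*g_{t-1} + e_t by OLS and set Omega_hat = sigma2_hat/(1-phi_hat)^2.
def ar1_lrv(g):
    g = np.asarray(g, dtype=float)
    g_lag, g_cur = g[:-1], g[1:]
    phi_hat = (g_lag @ g_cur) / (g_lag @ g_lag)
    resid = g_cur - phi_hat * g_lag
    sigma2_hat = resid.var()
    return sigma2_hat / (1.0 - phi_hat) ** 2

# illustrative check against the true value sigma^2/(1-phi)^2
rng = np.random.default_rng(7)
phi, T = 0.6, 5000
g = np.zeros(T)
for t in range(1, T):
    g[t] = phi * g[t - 1] + rng.standard_normal()
print("Omega_hat:", ar1_lrv(g), " true:", 1.0 / (1 - phi) ** 2)
```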
These two examples are easily generalized. The logic of Example 1 accommodates λ_j = 0 for |j| > k, where k is finite. The logic of Example 2 accommodates any (vector) finite-order ARMA model. And even these can be generalized, if they hold "approximately." There is a large literature on this.
Truncated estimators:

Ω̂ = ∑_{j=−k}^{k} λ̂_j, with λ̂_j = T^{-1} ∑_{t=1}^{T−j} g_t g_{t+j}′.

Truncated estimators are not guaranteed to be PSD. (They can generate values of Ω̂ that are not PSD.)
Weighted truncated estimators:

Ω̂(w) = ∑_{j=−k}^{k} w_j λ̂_j,

where the w_j are weights. Carefully chosen weights ensure PSD estimators.
The most widely used estimator is the "Newey-West" estimator:

Ω̂_NW = ∑_{j=−k}^{k} w_j λ̂_j, where w_j = (k + 1 − |j|)/(k + 1) (called "Bartlett" weights).

k is chosen as a function of T. (The Stock-Watson UG textbook suggests k = 0.75T^{1/3}.)

Ω̂_NW is PSD.
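A minimal sketch of the Newey-West estimator for a scalar series g_t, with Bartlett weights w_j = (k + 1 − |j|)/(k + 1) and the textbook bandwidth rule k = 0.75T^{1/3}; this is my own illustration rather than a library call.

```python
import numpy as np

def newey_west_lrv(g, k=None):
    """Newey-West (Bartlett-weighted) estimate of the long-run variance of a
    scalar series g_t.  Autocovariances use the 1/T convention."""
    g = np.asarray(g, dtype=float)
    g = g - g.mean()
    T = g.size
    if k is None:
        k = int(np.floor(0.75 * T ** (1 / 3)))   # Stock-Watson textbook rule
    omega = g @ g / T                            # lambda_0
    for j in range(1, k + 1):
        w = (k + 1 - j) / (k + 1)                # Bartlett weight
        lam_j = g[:-j] @ g[j:] / T               # lambda_j (= lambda_{-j})
        omega += 2 * w * lam_j
    return omega

# illustrative use on an AR(1) series with phi = 0.5 (Omega = 1/(1-0.5)^2 = 4);
# the Bartlett downweighting biases the estimate somewhat toward zero.
rng = np.random.default_rng(8)
T, phi = 5000, 0.5
g = np.zeros(T)
for t in range(1, T):
    g[t] = phi * g[t - 1] + rng.standard_normal()
print("NW estimate:", newey_west_lrv(g), " true Omega:", 1 / (1 - phi) ** 2)
```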
With Bartlett (i.e., Newey-West) weights, Andrews (1991) shows:

Ω̂_NW →_p Ω if k = k(T) with k(T) → ∞ and k(T)/T → 0.

MSE(Ω̂_NW) is minimized with k(T) = O(T^{1/3}).
With g_t ~ AR(1) with coefficient φ, applying Andrews' formula yields:

k* = 1.1447 × 4^{1/3} × (φ²)^{1/3} × T^{1/3} = 1.82 × φ^{2/3} × T^{1/3}

φ      k*             T = 100   T = 400   T = 1000
0.00   0              0         0         0
0.25   0.72×T^{1/3}   3         5         7
0.50   1.15×T^{1/3}   5         8         11
0.75   1.50×T^{1/3}   6         11        15
0.90   1.70×T^{1/3}   7         12        17
0.95   1.76×T^{1/3}   8         12        17
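A tiny sketch evaluating the slide's bandwidth rule k* = 1.82 φ^{2/3} T^{1/3}; it approximately reproduces the table above, though individual entries can differ by one depending on how the rounding is done.

```python
import numpy as np

# Bandwidth rule from the slide: k* = 1.82 * phi^(2/3) * T^(1/3),
# rounded down to an integer (table entries may differ by one).
def k_star(phi, T):
    return int(np.floor(1.82 * phi ** (2 / 3) * T ** (1 / 3)))

for phi in (0.0, 0.25, 0.5, 0.75, 0.9, 0.95):
    print(phi, [k_star(phi, T) for T in (100, 400, 1000)])
```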
Using rules such as these,

ξ_W(Ω̂_NW) − ξ_W(Ω) →_p 0,

so

ξ_W(Ω̂_NW) →_d χ².
This is all fine, 'asymptotically,' but it turns out that size distortions can be large using χ² critical values for ξ_W(Ω̂_NW).

Example: g_t is AR(1) with coefficient φ. Size of 10% tests of µ_g = 0 when T = 250:

φ      Size
0.00   0.10
0.25   0.13
0.50   0.15
0.75   0.21
0.90   0.33
0.95   0.46
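A Monte Carlo sketch in the spirit of this example (my own, with fewer replications than one would use in practice): the rejection rate of a nominal 10% test of µ_g = 0 based on the Newey-West variance and χ²_1 critical values, when g_t is AR(1) with φ = 0.75 and T = 250. It re-implements the Bartlett/Newey-West variance from the earlier sketch so the block is self-contained.

```python
import numpy as np
from scipy.stats import chi2

# Size of a nominal 10% test of mu_g = 0 using the Newey-West variance and
# chi-square(1) critical values, when g_t is AR(1).  Illustrative settings.
rng = np.random.default_rng(9)
T, phi, reps = 250, 0.75, 2000
crit = chi2.ppf(0.90, df=1)

def newey_west_lrv(g, k):
    g = g - g.mean()
    n = g.size
    omega = g @ g / n
    for j in range(1, k + 1):
        omega += 2 * ((k + 1 - j) / (k + 1)) * (g[:-j] @ g[j:] / n)
    return omega

k = int(np.floor(0.75 * T ** (1 / 3)))
rej = 0
for _ in range(reps):
    g = np.zeros(T)
    for t in range(1, T):
        g[t] = phi * g[t - 1] + rng.standard_normal()
    gbar = g.mean()
    xi = T * gbar**2 / newey_west_lrv(g, k)
    rej += xi > crit
print("rejection rate:", rej / reps)   # well above the nominal 0.10 for persistent g_t
```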
Can we go beyond 'first-order asymptotics'? (A calculation in Phillips and Sun (2008).)

Since ξ(Ω̂) = (Ω/Ω̂) ξ(Ω),

P(ξ(Ω̂) > c) = P(ξ(Ω) > (Ω̂/Ω)c).

Let

F(c, Ω̂) = P(ξ(Ω) > (Ω̂/Ω)c | Ω̂).

Then

P(ξ(Ω̂) > c) = ∫ P(ξ(Ω) > (Ω̂/Ω)c | Ω̂) f(Ω̂) dΩ̂ = ∫ F(c, Ω̂) f(Ω̂) dΩ̂ = E_Ω̂[F(c, Ω̂)].

Write

F(c, Ω̂) = F(c, Ω) + (Ω̂ − Ω)F′(c, Ω) + (1/2)(Ω̂ − Ω)²F″(c, Ω) + ...,

then

P(ξ(Ω̂) > c) ≈ F(c, Ω) + E(Ω̂ − Ω)F′(c, Ω) + (1/2)E(Ω̂ − Ω)² F″(c, Ω),

or

P(ξ(Ω̂) > c) ≈ F(c, Ω) + Bias(Ω̂)F′(c, Ω) + (1/2)MSE(Ω̂)F″(c, Ω).

If √T ḡ is independent of Ω̂ (as it would be, for example, when the data are Gaussian), F(c, Ω) = P(ξ(Ω) > c), which can be computed (exactly) from the χ² distribution when the data are Gaussian. This is the first-order asymptotic term. The Bias and MSE are higher-order terms. Evidently, bias is important (above and beyond its role in MSE). This suggests that k should be larger than the value chosen to minimize MSE.
Lazarus, Lewis, and Stock (2019) analyze the tradeoff between size distortion and power loss. They suggest k = 1.3T^{1/2} (together with modified critical values). (For T = 400: SW textbook k = 0.75T^{1/3} ≈ 6 … LLS k = 1.3T^{1/2} ≈ 26.)

There are alternative approaches; the most promising follows from Müller (2004). Lazarus, Lewis, Stock, and Watson (2018) compare inference using various HAR estimators.
… Think about how serially correlated g_t is likely to be in practice. Examples: