Gerzensee, Econometrics Week 3, March 2020

Linear Models with Serially Correlated Data

Review from Bo's Week 2…

Background: Asymptotics for Serially Correlated Processes – LLN and CLT

(1) LLN and Ergodicity

A process is ergodic if its elements are asymptotically independent – that is, if random variables that are far apart in the sequence are essentially statistically independent of one another (see Hayashi, page 101, and Hamilton, pages 46-47). Ergodicity is important because, together with stationarity, it leads to a SLLN:

Ergodic Theorem: Suppose $\{z_t\}$ is stationary and ergodic with $E(z_t) = \mu$. Then

$$T^{-1} \sum_{t=1}^{T} z_t \xrightarrow{a.s.} \mu.$$

This is a generalization of the SLLN. (For a proof of the theorem and a more detailed discussion see Karlin and Taylor (1975).)

If $\{z_t\}$ is stationary and ergodic, then so is $x_t = f(z_t)$ for an arbitrary function $f$.
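To make the ergodic theorem concrete, here is a minimal simulation sketch (not part of the original notes; the AR(1) parameter values are illustrative assumptions): a stationary AR(1) is stationary and ergodic, so its sample mean converges to $\mu$ as $T$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_path(T, mu=2.0, phi=0.8, sigma=1.0):
    """Simulate a stationary AR(1): z_t = mu + phi*(z_{t-1} - mu) + e_t."""
    z = np.empty(T)
    # draw z_0 from the stationary distribution so the whole path is stationary
    z[0] = mu + rng.normal(scale=sigma / np.sqrt(1 - phi**2))
    e = rng.normal(scale=sigma, size=T)
    for t in range(1, T):
        z[t] = mu + phi * (z[t - 1] - mu) + e[t]
    return z

for T in (100, 1_000, 100_000):
    print(T, ar1_path(T).mean())   # sample means approach mu = 2.0
```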


(2) CLT for martingale difference sequences (mds)

Let $\{g_t\}$ be a (possibly vector-valued) mds that is stationary and ergodic with $E(g_t g_t') = \Sigma_{gg}$. Then

$$\sqrt{T}\,\bar{g} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} g_t \xrightarrow{d} N(0, \Sigma_{gg}).$$

See Hayashi (p. 106). Notes: While $\{g_t\}$ is serially uncorrelated, it may be serially dependent (through higher-order moments). $E(g_t g_t') = \Sigma_{gg}$ concerns the "unconditional" variance; the conditional variance may be non-constant.
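A simple simulated illustration of these notes (a sketch, with assumed distributions): $g_t = \varepsilon_t \varepsilon_{t-1}$ with $\varepsilon_t \sim$ iid $N(0,1)$ is a mds (so serially uncorrelated) but serially dependent through second moments, and $\sqrt{T}\,\bar{g}$ is approximately $N(0, \Sigma_{gg})$ with $\Sigma_{gg} = E(g_t^2) = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

def scaled_mean(T):
    """g_t = e_t * e_{t-1}, an mds: E(g_t | past) = e_{t-1} * E(e_t) = 0,
    yet g_t^2 is autocorrelated (dependence through higher moments)."""
    e = rng.normal(size=T + 1)
    g = e[1:] * e[:-1]
    return np.sqrt(T) * g.mean()

draws = np.array([scaled_mean(1_000) for _ in range(5_000)])
print(draws.mean(), draws.var())   # approx 0 and 1, consistent with N(0, 1)
```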


Review: Linear Model with Serially Correlated Regressors

These notes work through the linear regression model and present conditions under which the results from the i.i.d. setting continue to hold when the data are serially correlated. I will not work through the linear IV model, but under analogous conditions the i.i.d. results continue to hold there as well. Chapter 3 of Hayashi provides the details.

Replace the i.i.d. assumptions with:

• $\{y_t, x_t\}$ is a stationary and ergodic process
• $E(\varepsilon_t x_t) = 0$, or, letting $g_t = \varepsilon_t x_t$, $E(g_t) = 0$
• $E(x_t x_t') = \Sigma_{xx}$, which is non-singular
• $\{g_t\}$ is a mds with $E(g_t g_t') = \Sigma_{gg}$

Notes:

• $E(\varepsilon_t x_t) = 0$ is weaker than $E(\varepsilon_t \mid X) = 0$ and $E(\varepsilon_t \mid x_t) = 0$. It is sometimes (as in Hamilton) called the assumption of predetermined regressors. We will return to the difference between these assumptions when we discuss GLS in the context of serially correlated errors.

• $E(g_t g_t') = \Sigma_{gg}$ allows the errors to be heteroskedastic conditional on the regressors – that is, it does not require that $\Sigma_{gg} = \sigma_\varepsilon^2 \Sigma_{xx}$.

• "$\{g_t\}$ is a mds" is a "high-level" assumption concerning the cross product of $\varepsilon_t$ and $x_t$. At a more primitive level it is implied by $E(\varepsilon_t \mid \{\varepsilon_i, x_i\}_{i=1}^{t-1}, x_t) = 0$.


Properties of $\hat{\beta}$

Consistency: $\hat{\beta} \xrightarrow{p} \beta$.

Proof:

$$\hat{\beta} - \beta = \left( \frac{1}{T} \sum x_t x_t' \right)^{-1} \left( \frac{1}{T} \sum g_t \right),$$

and $\{x_t x_t'\}$ is stationary and ergodic with $E(x_t x_t') = \Sigma_{xx}$, so that

$$\frac{1}{T} \sum x_t x_t' \xrightarrow{p} \Sigma_{xx},$$

which is non-singular. Also, $\{g_t\}$ is stationary and ergodic with $E(g_t) = 0$, so

$$\frac{1}{T} \sum g_t \xrightarrow{p} 0,$$

and the result follows from Slutsky's theorem.


Asymptotic Normality: $\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, V_{\hat{\beta}})$, where $V_{\hat{\beta}} = \Sigma_{xx}^{-1} \Sigma_{gg} \Sigma_{xx}^{-1}$.

Proof:

$$\sqrt{T}(\hat{\beta} - \beta) = \left( \frac{1}{T} \sum x_t x_t' \right)^{-1} \left( \frac{1}{\sqrt{T}} \sum g_t \right).$$

From earlier:

$$\frac{1}{T} \sum x_t x_t' \xrightarrow{p} \Sigma_{xx} \quad \text{(non-singular)}.$$

Also,

$$\frac{1}{\sqrt{T}} \sum g_t \xrightarrow{d} N(0, \Sigma_{gg}),$$

which follows by the CLT for a mds. The result then follows by Slutsky's theorem.


Feasible inference:

(1) Let $\hat{\Sigma}_{gg}$ be a consistent estimator of $\Sigma_{gg}$. Then

$$\hat{V}_{\hat{\beta}} = S_{xx}^{-1} \hat{\Sigma}_{gg} S_{xx}^{-1} \xrightarrow{p} V_{\hat{\beta}}, \quad \text{where} \quad S_{xx} = \frac{1}{T} \sum x_t x_t'.$$

Proof: (You should fill in – use Slutsky's theorem.)

(2)

$$t_j = \frac{\sqrt{T}(\hat{\beta}_j - \beta_j)}{\sqrt{(\hat{V}_{\hat{\beta}})_{jj}}} \xrightarrow{d} N(0, 1).$$

Proof: (You should fill in.)

(3) Let the Wald statistic be written as

$$\xi_W = T (R\hat{\beta} - R\beta)' (R \hat{V}_{\hat{\beta}} R')^{-1} (R\hat{\beta} - R\beta).$$

Then

$$\xi_W \xrightarrow{d} \chi_m^2 \quad \text{and} \quad \frac{\xi_W}{m} \xrightarrow{d} F_{m,\infty},$$

where $\mathrm{rank}(R) = m$.

Proof: (You should fill in – use Slutsky's theorem and the continuous mapping theorem.)
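To illustrate items (1)-(3), here is a sketch (not from the notes; the function and variable names are mine) that computes $\hat{\beta}$, the sandwich estimator $\hat{V}_{\hat{\beta}} = S_{xx}^{-1}\hat{\Sigma}_{gg}S_{xx}^{-1}$ with $\hat{\Sigma}_{gg} = T^{-1}\sum \hat{\varepsilon}_t^2 x_t x_t'$ (valid under the mds assumption), and the Wald statistic for $H_0: R\beta = r$.

```python
import numpy as np

def ols_robust_wald(y, X, R, r):
    """OLS with an mds/heteroskedasticity-robust sandwich covariance and
    the Wald statistic for H0: R @ beta = r."""
    T, k = X.shape
    Sxx = X.T @ X / T
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta
    g = X * e[:, None]                      # rows are g_t' = e_t * x_t'
    Sgg = g.T @ g / T                       # Sigma_gg-hat under the mds assumption
    V = np.linalg.solve(Sxx, Sgg) @ np.linalg.inv(Sxx)   # Sxx^{-1} Sgg Sxx^{-1}
    dev = R @ beta - r
    RVR = R @ V @ R.T / T                   # Var(R beta-hat) approx R V R' / T
    xi_W = dev @ np.linalg.solve(RVR, dev)  # approx chi^2_m under H0
    return beta, V / T, xi_W
```

For example, with R = np.eye(k) and r = np.zeros(k), xi_W tests that all $k$ coefficients are zero.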


(4) Suppose that $E[(x_{t,i} x_{t,j})^2]$ is finite for all $i$ and $j$. Let $\hat{\varepsilon}_t = y_t - x_t'\hat{\beta}$, $\hat{g}_t = \hat{\varepsilon}_t x_t$, and $\hat{S}_{gg} = \frac{1}{T} \sum \hat{g}_t \hat{g}_t'$. Then $\hat{S}_{gg} \xrightarrow{p} \Sigma_{gg}$.

Proof: (in the model with $k = 1$, for simplicity, where $\hat{S}_{gg} = \frac{1}{T} \sum \hat{g}_t^2 = \frac{1}{T} \sum \hat{\varepsilon}_t^2 x_t^2$)

$$\hat{\varepsilon}_t = y_t - x_t\beta - x_t(\hat{\beta} - \beta) = \varepsilon_t - x_t(\hat{\beta} - \beta),$$

so that

$$\hat{S}_{gg} = \frac{1}{T} \sum \varepsilon_t^2 x_t^2 + (\hat{\beta} - \beta)^2 \frac{1}{T} \sum x_t^4 - 2(\hat{\beta} - \beta) \frac{1}{T} \sum \varepsilon_t x_t^3.$$

Now

$$\frac{1}{T} \sum \varepsilon_t^2 x_t^2 = \frac{1}{T} \sum g_t^2 \xrightarrow{p} \Sigma_{gg}$$

by the ergodic theorem.

For the second term, $(\hat{\beta} - \beta) \xrightarrow{p} 0$ and $\frac{1}{T} \sum x_t^4 \xrightarrow{p} E(x_t^4)$, so that

$$(\hat{\beta} - \beta)^2 \frac{1}{T} \sum x_t^4 \xrightarrow{p} 0.$$

For the third term, note that $E(\varepsilon_t x_t^3)$ is finite since

$$|E(\varepsilon_t x_t^3)| \le [E(\varepsilon_t^2 x_t^2)\, E(x_t^4)]^{1/2} \quad \text{(Cauchy-Schwarz)},$$

and thus

$$\frac{1}{T} \sum \varepsilon_t x_t^3 \xrightarrow{p} E(\varepsilon_t x_t^3),$$

so that

$$(\hat{\beta} - \beta) \frac{1}{T} \sum \varepsilon_t x_t^3 \xrightarrow{p} 0,$$

and the result follows.


Application to the AR(p) model

Some preliminaries:

(a) Suppose $y_t$ follows the MA process

$$y_t = \theta(L)\varepsilon_t = \sum_{i=0}^{\infty} \theta_i \varepsilon_{t-i},$$

where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. Let $\lambda_i = E(y_t y_{t+i})$ denote the $i$'th autocovariance of $\{y_t\}$. If

$$\sum_{i=0}^{\infty} |\theta_i| < \infty,$$

then

$$\sum_{i=0}^{\infty} |\lambda_i| < \infty,$$

and the process is stationary and ergodic.

Proof: Hamilton, pages 69-70, and Dhrymes, page 370.


(b) Suppose $\phi(L) y_t = \varepsilon_t$, where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$ and $\phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ has roots outside the unit circle. The AR model can be inverted to yield $y_t = \theta(L)\varepsilon_t$ with $\sum_{i=0}^{\infty} |\theta_i| < \infty$. This can be verified using the results that we worked out for AR models earlier in the week.


Consider using OLS to estimate the coefficients of the AR model. Maintain the assumptions that $\varepsilon_t$ is $\text{iid}(0, \sigma^2)$ and that the roots of $\phi(z)$ are outside the unit circle. Write the model as

$$y_t = x_t'\beta + \varepsilon_t,$$

where $x_t = (y_{t-1}, y_{t-2}, \dots, y_{t-p})'$ and $\beta = (\phi_1, \phi_2, \dots, \phi_p)'$. Then

$$\hat{\beta} \xrightarrow{p} \beta$$

and

$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, V_{\hat{\beta}}), \quad \text{where} \quad V_{\hat{\beta}} = \sigma^2 \Sigma_{xx}^{-1}$$

with $\Sigma_{xx} = E(x_t x_t')$.

Proof: Key points

• $\{y_t, x_t\}$ is stationary and ergodic, with $[E(x_t x_t')]_{ij} = E(y_{t-i} y_{t-j}) = \lambda_{|i-j|}$.

• $g_t = \varepsilon_t x_t$: $\varepsilon_t$ is independent of $x_{t+i}$ for $i \le 0$, independent of $\varepsilon_{t+i}$ for $i < 0$, and $E(\varepsilon_t) = 0$. Thus $E(\varepsilon_t \mid \{\varepsilon_i, x_i\}_{i=1}^{t-1}, x_t) = 0$, and so $g_t$ is a mds.

• $E(g_t g_t') = E(\varepsilon_t^2 x_t x_t') = E[E(\varepsilon_t^2 x_t x_t' \mid x_t)] = \sigma^2 \Sigma_{xx}$.

The result then follows from the general results given above.
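A minimal sketch of this in code (simulated data; the AR(2) coefficients are assumed for illustration): regress $y_t$ on its first $p$ lags and compare $\hat{\beta}$ with the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_ar_ols(y, p):
    """OLS estimate of an AR(p): regress y_t on (y_{t-1}, ..., y_{t-p})."""
    X = np.column_stack([y[p - j - 1 : len(y) - j - 1] for j in range(p)])
    # column j holds y_{t-(j+1)}; rows run over t = p, ..., T-1
    return np.linalg.solve(X.T @ X, X.T @ y[p:])

# simulate an AR(2) with phi = (0.5, 0.3); roots are outside the unit circle
T, phi = 10_000, np.array([0.5, 0.3])
y, e = np.zeros(T), rng.normal(size=T)
for t in range(2, T):
    y[t] = phi[0] * y[t - 1] + phi[1] * y[t - 2] + e[t]

print(fit_ar_ols(y, p=2))   # approx [0.5, 0.3]
```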


AR(1) Example: $y_t = \beta x_t + \varepsilon_t$ with $\beta = \phi$ and $x_t = y_{t-1}$. Then

$$\Sigma_{xx} = \mathrm{var}(y_{t-1}) = \mathrm{var}(y_t) = \frac{\sigma^2}{1 - \phi^2}$$

and

$$\sqrt{T}(\hat{\phi} - \phi) \xrightarrow{d} N(0, V_{\phi}) \quad \text{with} \quad V_{\phi} = \sigma^2 \Sigma_{xx}^{-1} = 1 - \phi^2,$$

so that

$$\hat{\phi} \overset{a}{\sim} N\!\left( \phi,\ \frac{1}{T}(1 - \phi^2) \right),$$

and an approximate 95% confidence interval for $\phi$ is given by

$$\hat{\phi} \pm 1.96 \sqrt{\frac{1 - \hat{\phi}^2}{T}}.$$
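In code (a sketch with an assumed $\phi = 0.6$):

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate an AR(1) with phi = 0.6
T, phi = 500, 0.6
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()

# OLS of y_t on y_{t-1} and the approximate 95% confidence interval
phi_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
se = np.sqrt((1 - phi_hat**2) / T)
print(phi_hat, (phi_hat - 1.96 * se, phi_hat + 1.96 * se))
```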


Dropping the Assumption that {gt} is a mds

Reference: Hayashi, Chapter 6.

1. Implications for standard GMM inference

The asymptotic distributions of the OLS and GMM estimators are driven by the asymptotic normality of $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} g_t$.

What happens when $g_t$ is serially correlated (so that $g_t$ is not a mds)? It turns out that, as long as $g_t$ is "weakly dependent,"

$$\frac{1}{\sqrt{T}} \sum_{t=1}^{T} g_t \xrightarrow{d} N(0, \Omega),$$

but $\Omega \ne \Sigma_{gg}$.


Before stating the result, it is useful to work out an expression for $\Omega$. Suppose $g_t$ is stationary with autocovariances $\lambda_j$. Then

$$\mathrm{var}\!\left( T^{-1/2} \sum_{t=1}^{T} g_t \right) = \frac{1}{T}\left\{ T\lambda_0 + (T-1)(\lambda_1 + \lambda_{-1}) + (T-2)(\lambda_2 + \lambda_{-2}) + \dots + 1 \times (\lambda_{T-1} + \lambda_{1-T}) \right\} = \sum_{j=-T+1}^{T-1} \lambda_j - \frac{1}{T} \sum_{j=1}^{T-1} j(\lambda_j + \lambda_{-j}).$$

If the autocovariances satisfy $\sum_{j=1}^{\infty} j|\lambda_j| < \infty$ (jargon: they are "1-summable"), then

$$\mathrm{var}\!\left( T^{-1/2} \sum_{t=1}^{T} g_t \right) \to \sum_{j=-\infty}^{\infty} \lambda_j.$$

This will be the expression for $\Omega$. Recall that we defined the autocovariance generating function (AGF) as

$$\lambda(z) = \sum_{j=-\infty}^{\infty} \lambda_j z^j.$$

Looking at this, and the expression for $\Omega$, you can see that $\Omega$ can be computed from the AGF by setting $z = 1$. That is, $\Omega = \lambda(1)$.


Now, a few results:

CLT for the MA(∞) model

Let

$$y_t = \mu + \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j},$$

where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$ and $\sum |\theta_j| < \infty$. Then

$$\sqrt{T}(\bar{y} - \mu) \xrightarrow{d} N\!\left( 0,\ \sum_{j=-\infty}^{\infty} \lambda_j \right).$$

Proof: Anderson (1971, page 429) or Hamilton (page 195).


Example: MA(1)

Suppose $y_t = \mu + \varepsilon_t - \theta\varepsilon_{t-1}$. Then

$$\sqrt{T}(\bar{y} - \mu) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \varepsilon_t - \theta \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} \varepsilon_t = (1 - \theta) \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \varepsilon_t + \theta \frac{1}{\sqrt{T}} (\varepsilon_T - \varepsilon_0).$$

Notice

$$(1 - \theta) \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \varepsilon_t \xrightarrow{d} N(0, \sigma^2 (1 - \theta)^2)$$

and

$$\theta \frac{1}{\sqrt{T}} (\varepsilon_T - \varepsilon_0) \xrightarrow{p} 0.$$

Finally, to reconcile with earlier notation: for the MA model, $\lambda(z) = \sigma^2 \theta(z)\theta(z^{-1})$, so that

$$\lambda(1) = \sum_{j=-\infty}^{\infty} \lambda_j = \sigma^2 \theta(1)^2 = \sigma^2 (1 - \theta)^2,$$

and this is the result stated in the theorem.
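A quick numerical check of this result (a sketch; $\theta = 0.5$ and $\sigma = 1$ are assumed values): simulate many MA(1) samples and compare the variance of $\sqrt{T}(\bar{y} - \mu)$ with $\sigma^2(1 - \theta)^2$.

```python
import numpy as np

rng = np.random.default_rng(4)

theta, T, reps = 0.5, 2_000, 5_000
stats = np.empty(reps)
for r in range(reps):
    e = rng.normal(size=T + 1)     # sigma = 1
    y = e[1:] - theta * e[:-1]     # MA(1) with mu = 0
    stats[r] = np.sqrt(T) * y.mean()

print(stats.var())   # approx sigma^2 * (1 - theta)^2 = 0.25
```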


CLT for ergodic and stationary processes

Suppose that $\{y_t\}$ is stationary and ergodic with finite variance. Then, under a set of "dependence" assumptions analogous to absolutely summable MA coefficients,

$$\sqrt{T}(\bar{y} - \mu) \xrightarrow{d} N\!\left( 0,\ \sum_{j=-\infty}^{\infty} \lambda_j \right).$$

The additional assumptions are given in White (1984), Theorem 5.15.

Properties of OLS and GMM when $T^{-1/2} \sum_{t=1}^{T} g_t \xrightarrow{d} N(0, \Omega)$:

These are the same as the results presented above, but with $\Omega$ replacing $\Sigma_{gg}$.


2. Digression and a Review of GLS

Consider the regression model $Y = X\beta + u$, where $Y$ is $n \times 1$, $X$ is $n \times k$, and so forth. Suppose that $E(u \mid X) = 0$ and $\mathrm{Var}(u \mid X) = \Lambda$. If $\Lambda = \sigma^2 I$, then the OLS estimator of $\beta$, say $\hat{\beta}_{OLS}$, is the best linear unbiased estimator of $\beta$ conditional on $X$ (the Gauss-Markov theorem). Moreover, if the errors have a conditional Gaussian distribution, $\hat{\beta}_{OLS}$ is the MLE and achieves the Cramér-Rao lower bound, and is therefore the minimum variance unbiased estimator conditional on $X$.

When $\Lambda \ne \sigma^2 I$, $\hat{\beta}_{OLS}$ does not (in general) have these efficiency properties. Another estimator, the "generalized least squares" estimator, does. This estimator is

$$\hat{\beta}_{GLS} = (X'\Lambda^{-1}X)^{-1} X'\Lambda^{-1}Y.$$

To motivate this estimator, write $\Lambda = \Lambda^{1/2}\Lambda^{1/2\prime}$, so that $\Lambda^{-1} = (\Lambda^{-1/2})'\Lambda^{-1/2}$ and $\Lambda^{-1/2}\Lambda(\Lambda^{-1/2})' = I$. Multiplying the regression relation by $\Lambda^{-1/2}$ yields

$$\Lambda^{-1/2}Y = \Lambda^{-1/2}X\beta + \Lambda^{-1/2}u, \quad \text{or} \quad \tilde{Y} = \tilde{X}\beta + \tilde{u},$$

where $\tilde{Y} = \Lambda^{-1/2}Y$, $\tilde{X} = \Lambda^{-1/2}X$, and $\tilde{u} = \Lambda^{-1/2}u$. Note that $E(\tilde{u} \mid \tilde{X}) = 0$ and $\mathrm{Var}(\tilde{u} \mid \tilde{X}) = I$. Because $\tilde{Y}$ and $\tilde{X}$ are nonsingular transformations of $Y$ and $X$, the best linear unbiased estimator of $\beta$ conditional on $X$ is (from the Gauss-Markov theorem)

$$\hat{\beta}_{GLS} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y} = (X'\Lambda^{-1}X)^{-1} X'\Lambda^{-1}Y,$$

where the final equality follows from the definitions of $\tilde{Y}$ and $\tilde{X}$.
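A sketch of GLS in code (illustrative only; the AR(1)-type $\Lambda$ below is an assumed example): factor $\Lambda = LL'$ by Cholesky, transform the data by $L^{-1}$, and run OLS.

```python
import numpy as np

def gls(y, X, Lam):
    """GLS via the transformation Y~ = L^{-1} Y and X~ = L^{-1} X,
    where Lam = L L' is the Cholesky factorization of Var(u | X)."""
    L = np.linalg.cholesky(Lam)
    y_t = np.linalg.solve(L, y)     # plays the role of Lambda^{-1/2} Y
    X_t = np.linalg.solve(L, X)     # plays the role of Lambda^{-1/2} X
    return np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)

# example: errors with AR(1)-type correlation, rho = 0.7 (assumed values)
rng = np.random.default_rng(5)
n, rho = 200, 0.7
idx = np.arange(n)
Lam = rho ** np.abs(np.subtract.outer(idx, idx))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.cholesky(Lam) @ rng.normal(size=n)   # Var(u) = Lam
y = X @ np.array([1.0, 2.0]) + u
print(gls(y, X, Lam))   # approx [1.0, 2.0]
```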


An alternative way to derive the estimator is to write the Gaussian conditional likelihood function: $Y \mid X \sim N(X\beta, \Lambda)$, so that

$$f(Y \mid X) = (2\pi)^{-n/2} |\Lambda|^{-1/2} \exp\left\{ -\frac{1}{2} (Y - X\beta)'\Lambda^{-1}(Y - X\beta) \right\},$$

so that the MLE solves

$$\min_{\beta}\ (Y - X\beta)'\Lambda^{-1}(Y - X\beta).$$

Carrying out this minimization yields $\hat{\beta}_{GLS}$.


Examples:

(1) Weighted least squares:

Suppose that $\Lambda = \mathrm{diag}(\sigma_i^2)$. Then $\Lambda^{-1/2} = \mathrm{diag}(1/\sigma_i)$, $\tilde{y}_i = y_i / \sigma_i$, and $\tilde{x}_i = x_i / \sigma_i$. The GLS estimator can be constructed as the OLS regression of $\tilde{y}_i$ onto $\tilde{x}_i$. Notice that this estimator is OLS after re-weighting the observations, where the weight applied to the $i$'th observation is $1/\sigma_i$ (so that observations corresponding to $u_i$ with a low variance receive more weight).


(2) GLS in time series models:

Suppose that $u_t = c(L)\varepsilon_t$, where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. Then (ignoring effects associated with initial conditions) $\tilde{u}_t = c(L)^{-1} u_t = \varepsilon_t$, so the GLS estimator can be constructed by regressing $c(L)^{-1} y_t$ onto $c(L)^{-1} x_t$ via OLS.

As an example, suppose that $(1 - \rho L) u_t = \varepsilon_t$, so that $c(L) = (1 - \rho L)^{-1}$. Then the GLS estimator is formed by regressing $(1 - \rho L) y_t = y_t - \rho y_{t-1}$ onto $(1 - \rho L) x_t = x_t - \rho x_{t-1}$. Notice that one observation is "lost" using this transformation. A calculation shows that the initial observation should be constructed as $\tilde{y}_1 = (1 - \rho^2)^{1/2} y_1$ and $\tilde{x}_1 = (1 - \rho^2)^{1/2} x_1$. Because the first observation is asymptotically negligible relative to the information in the other $T - 1$ observations, it is often dropped from the analysis.
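A sketch of this transformation in code ($\rho$ is taken as known here; it would be estimated in practice, as in the feasible GLS discussion below). The $(1 - \rho^2)^{1/2}$ treatment of the first observation is the Prais-Winsten construction.

```python
import numpy as np

def quasi_difference(z, rho):
    """Apply (1 - rho L) to a series, with the (1 - rho^2)^{1/2} scaling
    of the initial observation."""
    zt = np.empty_like(z, dtype=float)
    zt[0] = np.sqrt(1 - rho**2) * z[0]
    zt[1:] = z[1:] - rho * z[:-1]
    return zt

def gls_ar1(y, X, rho):
    """GLS for y_t = x_t' b + u_t with (1 - rho L) u_t = e_t:
    OLS on the quasi-differenced data."""
    y_t = quasi_difference(y, rho)
    X_t = np.apply_along_axis(quasi_difference, 0, X, rho)
    return np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)
```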


Feasible GLS:

To construct the GLS estimator, you need to know $\Lambda$. Suppose that it is unknown, but depends on a small number of parameters, say $\theta$, so that $\Lambda = \Lambda(\theta)$. This suggests that $\Lambda$ can be estimated as $\hat{\Lambda} = \Lambda(\hat{\theta})$. The "feasible" GLS estimator is

$$\hat{\beta}_{FGLS} = (X'\hat{\Lambda}^{-1}X)^{-1} X'\hat{\Lambda}^{-1}Y.$$

In many models it is possible to show that $\sqrt{n}(\hat{\beta}_{FGLS} - \hat{\beta}_{GLS}) \xrightarrow{p} 0$, so that, in large samples, $\hat{\beta}_{FGLS}$ shares the same efficiency properties as $\hat{\beta}_{GLS}$.
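Continuing the AR(1)-error example, a sketch of the feasible version (here $\theta = \rho$, estimated from the OLS residuals; gls_ar1 is the helper sketched above):

```python
import numpy as np

def fgls_ar1(y, X):
    """Feasible GLS with AR(1) errors: estimate rho from the OLS
    residuals, then apply the quasi-differencing GLS sketched above."""
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b_ols
    rho_hat = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])   # OLS of u_t on u_{t-1}
    return gls_ar1(y, X, rho_hat), rho_hat
```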


When should you use OLS and HAC standard errors instead of GLS?

Consider the regression model

$$y_t = x_t'\beta + u_t,$$

where $u_t$ is the regression error. Suppose that $E(u_t \mid x_t) = 0$, so that $E(u_t x_t) = 0$. Given the other assumptions discussed above, the OLS estimator of $\beta$ is consistent and asymptotically normal.

Now suppose that $u_t = \phi u_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. This suggests that the GLS estimator might be used. The GLS estimator can be formed as the OLS estimator applied to the transformed model

$$\tilde{y}_t = \tilde{x}_t'\beta + \varepsilon_t,$$

where $\tilde{y}_t = y_t - \phi y_{t-1}$, and similarly for $\tilde{x}_t$. As discussed above, under certain assumptions the GLS estimator is BLUE, and hence more efficient than the OLS estimator.

The key assumption underlying the consistency of the GLS estimator is that $E(\varepsilon_t \tilde{x}_t) = 0$. Alternatively, noting that $\varepsilon_t = u_t - \phi u_{t-1}$, this can be written as

$$E[(u_t - \phi u_{t-1})(x_t - \phi x_{t-1})] = E(u_t x_t) + \phi^2 E(u_{t-1} x_{t-1}) - \phi E(u_t x_{t-1}) - \phi E(u_{t-1} x_t) = 0.$$

For this to be true for all values of $\phi$, it must be the case that

$$E(u_t x_t) = 0 \ \text{(Term 1)}, \quad E(u_{t-1} x_{t-1}) = 0 \ \text{(Term 2)}, \quad E(u_t x_{t-1}) = 0 \ \text{(Term 3)}, \quad E(u_{t-1} x_t) = 0 \ \text{(Term 4)}.$$

The first two of these are implied by $E(u_t \mid x_t) = 0$, the same assumption used for the consistency of OLS. The last two restrictions are not.


These two restrictions are implied by stronger assumptions. Term 3 is implied by $E(u_t \mid x_t, x_{t-1}, \dots) = 0$; when this holds, the regressors are said to be predetermined (or sometimes the term exogenous is used). Term 4 is implied by the stronger assumption $E(u_t \mid \dots, x_{t+1}, x_t, x_{t-1}, \dots) = 0$; when this holds, the regressors are said to be strictly exogenous. Evidently, GLS requires the assumption of strict exogeneity.

Examples to be discussed: (a) Multi-period forecast efficiency (b) Orange juice prices and the weather


HAC and HAR inference

(Same setup as above.) $y_t = x_t'\beta + \varepsilon_t$, where $\{y_t, x_t\}$ is stationary and ergodic, and so forth; $g_t = \varepsilon_t x_t$ and

$$\sqrt{T}\, S_{X\varepsilon} = \sqrt{T}\,\bar{g} = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} g_t \xrightarrow{d} N(0, \Omega),$$

with $\Omega = \sum_{j=-\infty}^{\infty} \lambda_j$ and $\lambda_j = E(g_t g_{t-j}')$. Then

$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, V_{\hat{\beta}}), \quad \text{with} \quad V_{\hat{\beta}} = \Sigma_{XX}^{-1}\, \Omega\, \Sigma_{XX}^{-1}.$$

Let $\hat{V} = S_{XX}^{-1} \hat{\Omega} S_{XX}^{-1}$ and $\xi_W = T(\hat{\beta} - \beta)' \hat{V}^{-1} (\hat{\beta} - \beta)$. If $\hat{\Omega} \xrightarrow{p} \Omega$, then $\hat{V} \xrightarrow{p} V_{\hat{\beta}}$ and $\xi_W \xrightarrow{d} \chi_k^2$.


Focus on estimators of $\Omega$ and how this choice affects inference.

Simple case: $x_t = 1$ and $\beta = 0$, so the $g_t$ ($= y_t$) are the data and $E(g_t) = 0$. Then

$$\hat{\beta} - \beta = \bar{g} = T^{-1} \sum_{t=1}^{T} g_t, \qquad \sqrt{T}\,\bar{g} \xrightarrow{d} N(0, \Omega),$$

and

$$\xi_W(\Omega) = T\,\bar{g}\,\Omega^{-1}\bar{g} \xrightarrow{d} \chi_1^2.$$

Let $\xi_W(\hat{\Omega}) = T\,\bar{g}\,\hat{\Omega}^{-1}\bar{g}$.


Estimators of $\Omega$:

HAC: The goal is to estimate

$$\Omega = \sum_{j=-\infty}^{\infty} \lambda_j.$$

With a finite sample of data it is impossible to consistently estimate $\Omega$ for all possible sequences $\{\lambda_j\}$. But for special sequences, consistent estimation is possible. Two examples:


Example 1: Suppose $\lambda_{|j|} = 0$ for $|j| > 1$ (so $g_t$ follows an MA(1) process). Then one only needs to estimate the variance and first autocovariance of the process. These can be estimated consistently. Thus

$$\hat{\Omega} = \sum_{j=-1}^{1} \hat{\lambda}_j$$

is consistent.


Example 2: Suppose $g_t \sim$ AR(1). In this case $\lambda_j = \sigma^2 \phi^{|j|}/(1 - \phi^2)$, and (from the formula for the ACGF)

$$\Omega = \sum_{j=-\infty}^{\infty} \lambda_j = \sigma^2 / (1 - \phi)^2.$$

This can be consistently estimated by estimating the two parameters characterizing the AR(1) process, $\sigma^2$ and $\phi$, which yields

$$\hat{\Omega} = \hat{\sigma}^2 / (1 - \hat{\phi})^2.$$


These two examples are easily generalized. The logic of Example 1 accommodates $\lambda_{|j|} = 0$ for $|j| > k$, where $k$ is finite. The logic of Example 2 accommodates any (vector) finite-order ARMA model. And even these can be generalized if they hold "approximately." There is a large literature on this.


Truncated estimators:

$$\hat{\Omega} = \sum_{j=-k}^{k} \hat{\lambda}_j \quad \text{with} \quad \hat{\lambda}_j = T^{-1} \sum_{t=1}^{T-j} g_t g_{t+j}'.$$

Truncated estimators are not guaranteed to be PSD (they can generate values of $\hat{\Omega}$ that are not positive semi-definite).


Weighted truncated estimators:

$$\hat{\Omega}(w) = \sum_{j=-k}^{k} w_j \hat{\lambda}_j,$$

where the $w_j$ are weights. Carefully chosen weights ensure PSD estimators.


The most widely used estimator is the "Newey-West" estimator:

$$\hat{\Omega}_{NW} = \sum_{j=-k}^{k} w_{|j|} \hat{\lambda}_j, \quad \text{where} \quad w_{|j|} = \frac{k + 1 - |j|}{k + 1}$$

(called "Bartlett" weights). $k$ is chosen as a function of $T$. (The Stock-Watson undergraduate textbook suggests $k = 0.75\,T^{1/3}$.)

$\hat{\Omega}_{NW}$ is PSD.
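A sketch of the Newey-West estimator in code (scalar case, matching the simple example above; the bandwidth default is the Stock-Watson rule of thumb quoted in the notes):

```python
import numpy as np

def newey_west(g, k=None):
    """Newey-West long-run variance of a scalar series g_t:
    Bartlett-weighted sum of sample autocovariances."""
    g = np.asarray(g, dtype=float)
    T = len(g)
    if k is None:
        k = int(0.75 * T ** (1 / 3))    # Stock-Watson rule of thumb
    gd = g - g.mean()
    omega = gd @ gd / T                 # lambda_0 hat
    for j in range(1, k + 1):
        lam_j = gd[:-j] @ gd[j:] / T    # lambda_j hat
        w = (k + 1 - j) / (k + 1)       # Bartlett weight
        omega += 2 * w * lam_j          # lambda_{-j} = lambda_j in the scalar case
    return omega

# example: AR(1) g_t with phi = 0.5, so Omega = sigma^2/(1 - phi)^2 = 4
rng = np.random.default_rng(6)
T, phi = 50_000, 0.5
g = np.zeros(T)
for t in range(1, T):
    g[t] = phi * g[t - 1] + rng.normal()
print(newey_west(g))   # roughly 4 (Bartlett weighting biases it slightly downward)
```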


With Bartlett (i.e., Newey-West) weights, Andrews (1991) shows: $\hat{\Omega}_{NW} \xrightarrow{p} \Omega$ if $k = k(T)$ with $k(T) \to \infty$ and $k(T)/T \to 0$.

MSE($\hat{\Omega}_{NW}$) is minimized with $k(T) = O(T^{1/3})$.


With $g_t \sim$ AR(1) with coefficient $\phi$, applying Andrews' formula yields:

$$k^* = 1.1447 \times 4^{1/3} \times (\phi^2)^{1/3} \times T^{1/3} = 1.82 \times \phi^{2/3} \times T^{1/3}$$

  phi    k*               T = 100   T = 400   T = 1000
  0.00   0                      0         0          0
  0.25   0.72 T^{1/3}           3         5          7
  0.50   1.15 T^{1/3}           5         8         11
  0.75   1.50 T^{1/3}           6        11         15
  0.90   1.70 T^{1/3}           7        12         17
  0.95   1.76 T^{1/3}           8        12         17
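A quick check of the table's arithmetic (a sketch; my reading is that each entry is the two-decimal constant times $T^{1/3}$, truncated to an integer):

```python
def k_star(phi, T):
    """Reproduce the table: round 1.82 * phi^(2/3) to two decimals,
    multiply by T^(1/3), and truncate to an integer."""
    c = round(1.82 * phi ** (2 / 3), 2)
    return int(c * T ** (1 / 3) + 1e-9)   # epsilon guards exact-integer products

for phi in (0.0, 0.25, 0.5, 0.75, 0.9, 0.95):
    print(phi, [k_star(phi, T) for T in (100, 400, 1000)])
```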


Using rules such as these,

$$\xi_W(\hat{\Omega}_{NW}) - \xi_W(\Omega) \xrightarrow{p} 0, \quad \text{so} \quad \xi_W(\hat{\Omega}_{NW}) \xrightarrow{d} \chi^2.$$


This is all fine "asymptotically," but it turns out that size distortions can be large using $\chi^2$ critical values for $\xi_W(\hat{\Omega}_{NW})$.

Example: $g_t$ is AR(1) with coefficient $\phi$. Size of 10% tests of $\mu_g = 0$ when $T = 250$:

  phi    Size
  0.00   0.10
  0.25   0.13
  0.50   0.15
  0.75   0.21
  0.90   0.33
  0.95   0.46
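A sketch of the kind of Monte Carlo behind a table like this (using the newey_west helper sketched above; the notes do not state the bandwidth rule behind the table, so I assume the $0.75\,T^{1/3}$ default, and the resulting sizes will differ somewhat):

```python
import numpy as np

rng = np.random.default_rng(7)

def rejection_rate(phi, T=250, reps=2_000):
    """Monte Carlo size of a nominal 10% test of mu_g = 0 based on
    chi^2_1 critical values and a Newey-West variance estimate."""
    crit = 2.706                                 # chi^2_1 10% critical value
    rej = 0
    for _ in range(reps):
        g = np.zeros(T)
        for t in range(1, T):
            g[t] = phi * g[t - 1] + rng.normal()
        xi = T * g.mean() ** 2 / newey_west(g)   # newey_west defined above
        rej += xi > crit
    return rej / reps

for phi in (0.0, 0.5, 0.9):
    print(phi, rejection_rate(phi))
```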


Can we go beyond 'first-order asymptotics'? (A calculation in Phillips and Sun (2008).)

Note that $\xi_W(\hat{\Omega}) = (\Omega/\hat{\Omega})\,\xi_W(\Omega)$, so

$$P\big(\xi_W(\hat{\Omega}) > c\big) = P\!\left( \xi_W(\Omega) > \frac{\hat{\Omega}}{\Omega}\, c \right).$$

Write

$$P\!\left( \xi_W(\Omega) > \frac{\hat{\Omega}}{\Omega}\, c \,\Big|\, \hat{\Omega} \right) = F(c, \hat{\Omega});$$

then

$$P\!\left( \xi_W(\Omega) > \frac{\hat{\Omega}}{\Omega}\, c \right) = \int P\!\left( \xi_W(\Omega) > \frac{\hat{\Omega}}{\Omega}\, c \,\Big|\, \hat{\Omega} \right) f(\hat{\Omega})\, d\hat{\Omega} = \int F(c, \hat{\Omega})\, f(\hat{\Omega})\, d\hat{\Omega} = E_{\hat{\Omega}}\big( F(c, \hat{\Omega}) \big).$$

Expanding $F$ around $\Omega$,

$$F(c, \hat{\Omega}) = F(c, \Omega) + (\hat{\Omega} - \Omega) F'(c, \Omega) + \tfrac{1}{2}(\hat{\Omega} - \Omega)^2 F''(c, \Omega) + \dots,$$

so

$$P\big(\xi_W(\hat{\Omega}) > c\big) \approx F(c, \Omega) + E(\hat{\Omega} - \Omega)\, F'(c, \Omega) + \tfrac{1}{2} E(\hat{\Omega} - \Omega)^2\, F''(c, \Omega),$$

or

$$P\big(\xi_W(\hat{\Omega}) > c\big) \approx F(c, \Omega) + \mathrm{Bias}(\hat{\Omega})\, F'(c, \Omega) + \tfrac{1}{2} \mathrm{MSE}(\hat{\Omega})\, F''(c, \Omega).$$

If $\sqrt{T}\,\bar{g}$ is independent of $\hat{\Omega}$ (as it would be, for example, when the data are Gaussian), $F(c, \Omega) = P(\xi_W(\Omega) > c)$, which can be computed (exactly) from the $\chi^2$ distribution. This is the first-order asymptotic term; the bias and MSE terms are higher-order. Evidently, bias is important (above and beyond its role in MSE). This suggests that $k$ should be larger than the value chosen to minimize MSE.


Lazarus, Lewis, and Stock (2019) provide an analysis that considers the size-distortion versus power-loss tradeoff. They suggest $k = 1.3\,T^{1/2}$ (together with modified critical values). (For $T = 400$: the SW textbook rule gives $k = 0.75\,T^{1/3} \approx 6$, while the LLS rule gives $k = 1.3\,T^{1/2} \approx 26$.) There are alternative approaches; the most promising follows from Müller (2004). Lazarus, Lewis, Stock, and Watson (2018) compare inference using various HAR estimators.


… Think about how serially correlated $g_t$ is likely to be in practice. Examples:

