1 Stationary Process

Model: time-invariant mean
$$y_t = \mu + \varepsilon_t$$

1.1 Definitions

1. Autocovariance
$$\gamma_{jt} = E(y_t - \mu)(y_{t-j} - \mu) = E(\varepsilon_t \varepsilon_{t-j})$$

2. Stationarity: If neither the mean $\mu$ nor the autocovariances $\gamma_j$ depend on the date $t$, then the process $y_t$ is said to be covariance stationary or weakly stationary:
$$E(y_t) = \mu \quad \text{for all } t$$
$$E(y_t - \mu)(y_{t-j} - \mu) = \gamma_j \quad \text{for all } t \text{ and any } j$$

3. Ergodicity:

(a) A covariance stationary process is said to be ergodic for the mean if
$$\frac{1}{T}\sum_{t=1}^{T} y_t \to_p E(y_t).$$
A sufficient condition is absolute summability of the autocovariances, $\sum_{j=0}^{\infty} |\gamma_j| < \infty$.

(b) A covariance stationary process is said to be ergodic for second moments if
$$\frac{1}{T-j}\sum_{t=j+1}^{T} (y_t - \mu)(y_{t-j} - \mu) \to_p \gamma_j \quad \text{for all } j.$$

4. White Noise: A series $\varepsilon_t$ is a white noise process if
$$E(\varepsilon_t) = 0, \quad E(\varepsilon_t^2) = \sigma^2, \quad E(\varepsilon_t \varepsilon_s) = 0 \ \text{ for all } t \neq s.$$
1.2 Moving Average

The first-order MA process, MA(1):
$$y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}, \quad \varepsilon_t \sim iid(0, \sigma_\varepsilon^2)$$
$$E(y_t - \mu)^2 = \gamma_0 = (1 + \theta^2)\sigma_\varepsilon^2$$
$$E(y_t - \mu)(y_{t-1} - \mu) = \gamma_1 = \theta\sigma_\varepsilon^2$$
$$E(y_t - \mu)(y_{t-2} - \mu) = \gamma_2 = 0$$

MA(2):
$$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$$
$$\gamma_0 = (1 + \theta_1^2 + \theta_2^2)\sigma_\varepsilon^2, \quad \gamma_1 = (\theta_1 + \theta_2\theta_1)\sigma_\varepsilon^2, \quad \gamma_2 = \theta_2\sigma_\varepsilon^2, \quad \gamma_3 = \gamma_4 = \cdots = 0$$

MA($\infty$):
$$y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$$
$y_t$ is stationary if $\sum_{j=0}^{\infty}\psi_j^2 < \infty$ (square summability).
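As a quick numerical sanity check of the MA(1) autocovariances above (a minimal sketch of my own, not from the notes; the sample size and parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
T, mu, theta, sigma2 = 200_000, 1.0, 0.5, 2.0

# Simulate y_t = mu + eps_t + theta * eps_{t-1}
eps = rng.normal(0.0, np.sqrt(sigma2), T + 1)
y = mu + eps[1:] + theta * eps[:-1]

def acov(x, j):
    """Sample autocovariance at lag j (divide by T)."""
    xc = x - x.mean()
    return (xc[j:] * xc[:len(x) - j]).sum() / len(x)

g0, g1, g2 = acov(y, 0), acov(y, 1), acov(y, 2)
print(g0, (1 + theta**2) * sigma2)  # gamma_0 = (1 + theta^2) sigma^2 = 2.5
print(g1, theta * sigma2)           # gamma_1 = theta sigma^2 = 1.0
print(g2, 0.0)                      # gamma_2 = 0
```

With a large $T$ the sample autocovariances sit on top of the theoretical values, and every lag beyond the MA order is (approximately) zero.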
1.3 Autoregressive Process

AR(1):
$$y_t = a + u_t, \quad u_t = \rho u_{t-1} + \varepsilon_t$$
$$y_t = a(1 - \rho) + \rho y_{t-1} + \varepsilon_t$$
$$y_t = a + \varepsilon_t + \rho\varepsilon_{t-1} + \rho^2\varepsilon_{t-2} + \cdots,$$
so that $y_t$ is MA($\infty$), and for $|\rho| < 1$
$$\sum_{j=0}^{\infty}\rho^{2j} = \frac{1}{1-\rho^2} < \infty,$$
$$\gamma_0 = \frac{\sigma_\varepsilon^2}{1-\rho^2}, \quad \gamma_1 = \frac{\rho\,\sigma_\varepsilon^2}{1-\rho^2}, \quad \gamma_t = \rho\gamma_{t-1}.$$

AR(p):
$$y_t = a(1-\rho) + \rho_1 y_{t-1} + \cdots + \rho_p y_{t-p} + \varepsilon_t,$$
where
$$\rho = \sum_{j=1}^{p}\rho_j.$$
$$\gamma_t = \rho_1\gamma_{t-1} + \cdots + \rho_p\gamma_{t-p} \quad \text{: Yule-Walker equations}$$

Augmented form for AR(2):
$$y_t = a(1-\rho) + (\rho_1+\rho_2)y_{t-1} - \rho_2 y_{t-1} + \rho_2 y_{t-2} + \varepsilon_t = a(1-\rho) + \rho y_{t-1} - \rho_2\Delta y_{t-1} + \varepsilon_t$$

Unit root testing form:
$$\Delta y_t = a(1-\rho) + (\rho - 1)y_{t-1} - \rho_2\Delta y_{t-1} + \varepsilon_t$$
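The AR(2) rearrangement above is pure algebra, so it can be checked line by line on simulated data (a minimal sketch of my own; the coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, rho1, rho2 = 0.3, 0.5, 0.2
rho = rho1 + rho2
T = 500

# Simulate the AR(2): y_t = a(1 - rho) + rho1 y_{t-1} + rho2 y_{t-2} + eps_t
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = a * (1 - rho) + rho1 * y[t - 1] + rho2 * y[t - 2] + eps[t]

# Augmented form: y_t = a(1 - rho) + rho * y_{t-1} - rho2 * dy_{t-1} + eps_t
lhs = y[2:]
rhs = a * (1 - rho) + rho * y[1:-1] - rho2 * (y[1:-1] - y[:-2]) + eps[2:]
print(np.max(np.abs(lhs - rhs)))  # ~0: the two forms are identical
```

The two parameterizations agree to machine precision, which is exactly why the unit root testing form can be estimated in place of the levels AR(2).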
1.4 Sources of MA Terms

Example 1:
$$y_t = \rho y_{t-1} + u_t, \quad x_t = \phi x_{t-1} + e_t,$$
where $u_t$ and $e_t$ are white noise. Consider the variable $z_t$ defined by
$$z_t = x_t + y_t.$$
Does $z_t$ follow an AR(1)?
$$z_t = \rho(y_{t-1} + x_{t-1}) + (\phi - \rho)x_{t-1} + u_t + e_t = \rho z_{t-1} + \varepsilon_t,$$
where
$$\varepsilon_t = (\phi - \rho)\sum_{j=0}^{\infty}\phi^j e_{t-j-1} + u_t + e_t,$$
so that $z_t$ becomes an ARMA(1,1).

Example 2:
$$y_s = \rho y_{s-1} + u_s, \quad s = 1, \dots, S.$$
Suppose you observe only the even-numbered observations. Then we have
$$y_s = \rho^2 y_{s-2} + \rho u_{s-1} + u_s.$$
Let
$$x_t = y_s \quad \text{for } t = 1, \dots, T; \ s = 2, 4, \dots, S.$$
Then we have
$$x_t = \rho^2 x_{t-1} + \varepsilon_t, \quad \varepsilon_t = \rho u_{s-1} + u_s,$$
so that $x_t$ follows an ARMA(1,1).
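Example 1 can be illustrated numerically: the sum of two independent AR(1) processes with different coefficients is no longer an AR(1), which shows up as serial correlation in the residual $z_t - \rho z_{t-1}$ (a minimal sketch of my own with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho, phi = 200_000, 0.7, 0.3

u, e = rng.normal(size=T), rng.normal(size=T)
y = np.zeros(T)
x = np.zeros(T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + u[t]
    x[t] = phi * x[t - 1] + e[t]
z = x + y

# eps_t = z_t - rho * z_{t-1} is serially correlated (an MA(1)-type error),
# so z_t is ARMA(1,1), not AR(1)
eps = z[1:] - rho * z[:-1]
r1 = np.corrcoef(eps[1:], eps[:-1])[0, 1]
print(r1)  # clearly nonzero first-order autocorrelation
```

If instead $\phi = \rho$, the residual autocorrelation collapses to zero and $z_t$ really is an AR(1), consistent with the $(\phi - \rho)$ factor in the formula for $\varepsilon_t$.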
1.5 Model Selection

1.5.1 Information Criteria

Consider a criterion function given by
$$c_T(k) = -\frac{2\ln L(k)}{T} + k\,\frac{\varphi(T)}{T},$$
where $\varphi(T)$ is a deterministic function. The model (lag length) is selected by minimizing the criterion function with respect to $k$; that is,
$$\hat k = \arg\min_k c_T(k).$$
There are three famous criterion functions:

AIC: $\varphi(T) = 2$
BIC (Schwarz): $\varphi(T) = \ln T$
Hannan-Quinn: $\varphi(T) = 2\ln(\ln T)$

Let $k^*$ be the true lag length. Then the likelihood function must be maximized at $k^*$ asymptotically; that is,
$$\text{plim}_{T\to\infty}\,T^{-1}\ln L(k^*) > \text{plim}_{T\to\infty}\,T^{-1}\ln L(k) \quad \text{for any } k < k^*.$$

Now, consider two cases. First, $k < k^*$. Then we have
$$\lim_{T\to\infty}\Pr\left[c_T(k^*) \geq c_T(k)\right] = \lim_{T\to\infty}\Pr\left[-\frac{2\ln L(k^*)}{T} + k^*\frac{\varphi(T)}{T} \geq -\frac{2\ln L(k)}{T} + k\frac{\varphi(T)}{T}\right]$$
$$= \lim_{T\to\infty}\Pr\left[\frac{\ln L(k^*)}{T} - \frac{\ln L(k)}{T} \leq \frac{1}{2}(k^* - k)\frac{\varphi(T)}{T}\right] = 0$$
for all $\varphi(T)$ with $\varphi(T)/T \to 0$.

Next, consider the case $k > k^*$. Then we know that the likelihood ratio test satisfies
$$2\left[\ln L(k) - \ln L(k^*)\right] \to_D \chi^2_{k-k^*}.$$
Now consider AIC first:
$$T\left(c_T(k^*) - c_T(k)\right) = 2\left[\ln L(k) - \ln L(k^*)\right] - 2(k - k^*) \to_D \chi^2_{k-k^*} - 2(k - k^*).$$
Hence we have
$$\lim_{T\to\infty}\Pr\left[c_T(k^*) \geq c_T(k)\right] = \Pr\left[\chi^2_{k-k^*} \geq 2(k - k^*)\right] > 0,$$
so that AIC may asymptotically over-estimate the lag length.

Consider the other two criteria. In both cases,
$$\lim_{T\to\infty}\varphi(T) = \infty.$$
Hence we have
$$T\left(c_T(k^*) - c_T(k)\right) = 2\left[\ln L(k) - \ln L(k^*)\right] - (k - k^*)\varphi(T) \to_D \chi^2_{k-k^*} - (k - k^*)\varphi(T) \to -\infty,$$
so that
$$\lim_{T\to\infty}\Pr\left[c_T(k^*) \geq c_T(k)\right] = \lim_{T\to\infty}\Pr\left[\chi^2_{k-k^*} \geq (k - k^*)\varphi(T)\right] = 0.$$
Hence BIC and Hannan-Quinn consistently estimate the true lag length.
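A minimal sketch of lag selection in practice (my own illustration, not from the notes): fit AR(k) by OLS for k = 1, ..., kmax on a common sample of a simulated AR(2), and minimize the Gaussian versions of AIC and BIC.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):               # true model: AR(2)
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

def ar_criteria(y, k, kmax):
    """Fit AR(k) by OLS on a common sample; return (AIC, BIC)."""
    Y = y[kmax:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[kmax - j:len(y) - j] for j in range(1, k + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    sig2 = np.mean((Y - X @ beta) ** 2)
    n = len(Y)
    aic = np.log(sig2) + 2 * (k + 1) / n
    bic = np.log(sig2) + (k + 1) * np.log(n) / n
    return aic, bic

kmax = 8
aics, bics = zip(*(ar_criteria(y, k, kmax) for k in range(1, kmax + 1)))
k_aic = 1 + int(np.argmin(aics))
k_bic = 1 + int(np.argmin(bics))
print(k_aic, k_bic)  # both near the true lag length 2; AIC may overshoot
```

Consistent with the derivation above, BIC's growing penalty pins down the true order, while AIC retains a positive probability of selecting too many lags.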
1.5.2 General-to-Specific (GS) Method

In practice, the so-called general-to-specific method is also popular. The GS method involves the following sequential steps.

Step 1: Run AR($k_{\max}$) and test whether the last coefficient is significantly different from zero.

Step 2: If it is not, let $k_{\max} = k_{\max} - 1$ and repeat Step 1 until the last coefficient is significant.

The general-to-specific methodology applies conventional statistical tests. So if the significance level for the tests is fixed, then the order estimator inevitably allows a nonzero probability of overestimation. Furthermore, as is typical in sequential tests, this overestimation probability is bigger than the significance level when there are multiple steps between $k_{\max}$ and $p$, because the probability of false rejection accumulates as $k$ steps down from $k_{\max}$ to $p$.

These problems can be mitigated (and overcome at least asymptotically) by letting the level of the test depend on the sample size. More precisely, following Bauer, Pötscher and Hackl (1988), we can set the critical value $C_T$ in such a way that (i) $C_T \to \infty$, and (ii) $C_T/\sqrt{T} \to 0$ as $T \to \infty$. The critical value corresponds to the standard normal critical value for the significance level $\alpha_T = 1 - \Phi(C_T)$, where $\Phi(\cdot)$ is the standard normal c.d.f. Conditions (i) and (ii) are equivalent to the requirement that the significance level $\alpha_T \to 0$ and $(\log\alpha_T)/T \to 0$ (proved in equation (22) of Pötscher, 1983).
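The two steps above can be sketched as follows (my own illustration; the t-test uses OLS standard errors and a fixed 5% critical value of 1.96, which is exactly the fixed-level setting the text warns can overestimate the order):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):               # true order p = 2
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

def last_coef_tstat(y, k):
    """OLS t-statistic on the k-th (last) lag of an AR(k) fit."""
    Y = y[k:]
    X = np.column_stack([np.ones(len(Y))] +
                        [y[k - j:len(y) - j] for j in range(1, k + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sig2 = resid @ resid / (len(Y) - X.shape[1])
    cov = sig2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

k = 8                                # k_max
while k > 1 and abs(last_coef_tstat(y, k)) < 1.96:
    k -= 1                           # step down until the last lag is significant
print(k)  # selected order
```

Replacing 1.96 with a sample-size-dependent critical value $C_T$ satisfying (i) and (ii) is the Bauer-Pötscher-Hackl modification that makes this sequential procedure consistent.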
References

[1] Bauer, P., Pötscher, B. M., and P. Hackl (1988). Model Selection by Multiple Test Procedures. Statistics, 19, 39-44.

[2] Pötscher, B. M. (1983). Order Estimation in ARMA-Models by Lagrangian Multiplier Tests. Annals of Statistics, 11, 872-885.
2 Asymptotic Distribution for Stationary Processes

2.1 Law of Large Numbers for a Covariance Stationary Process

Consider first the asymptotic properties of the sample mean,
$$\bar y_T = \frac{1}{T}\sum_{t=1}^{T}y_t, \quad E(\bar y_T) = \mu.$$
Next,
$$E(\bar y_T - \mu)^2 = E\left[\frac{1}{T^2}\left\{\sum_{t=1}^{T}(y_t - \mu)\right\}^2\right] = \frac{1}{T^2}\left[T\gamma_0 + 2(T-1)\gamma_1 + \cdots + 2\gamma_{T-1}\right]$$
$$= \frac{1}{T}\left[\gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}\right] \leq \frac{1}{T}\left[\gamma_0 + 2\gamma_1 + \cdots + 2\gamma_{T-1}\right] \ \ (\text{for } \gamma_j \geq 0).$$
Hence we have
$$\lim_{T\to\infty}T\cdot E(\bar y_T - \mu)^2 = \sum_{j=-\infty}^{\infty}\gamma_j.$$

Example: $y_t = u_t$, $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim iid(0, \sigma^2)$. Then we have
$$E\left(\frac{1}{T}\sum u_t\right)^2 = \frac{1}{T}\left[\gamma_0 + 2\frac{T-1}{T}\gamma_1 + \cdots + \frac{2}{T}\gamma_{T-1}\right]$$
$$= \frac{1}{T}\frac{\sigma^2}{1-\rho^2}\left[1 + 2\frac{T-1}{T}\rho + 2\frac{T-2}{T}\rho^2 + \cdots + \frac{2}{T}\rho^{T-1}\right]$$
$$= \frac{1}{T}\frac{\sigma^2}{1-\rho^2}\left\{1 + \frac{2}{T}\frac{\rho}{(1-\rho)^2}\left(T - T\rho + \rho^T - 1\right)\right\}$$
$$= \frac{1}{T}\frac{\sigma^2}{1-\rho^2}\left\{1 + \frac{2}{T}\frac{T\rho(1-\rho)}{(1-\rho)^2} + \frac{2}{T}\frac{\rho\left(\rho^T - 1\right)}{(1-\rho)^2}\right\}$$
$$= \frac{1}{T}\frac{\sigma^2}{1-\rho^2}\left\{1 + \frac{2\rho}{1-\rho}\right\} + O(T^{-2})$$
$$= \frac{1}{T}\frac{\sigma^2}{(1-\rho)(1+\rho)}\cdot\frac{1+\rho}{1-\rho} + O(T^{-2}) = \frac{1}{T}\frac{\sigma^2}{(1-\rho)^2} + O(T^{-2}),$$
where note that
$$\sum_{t=1}^{T-1}\frac{2(T-t)}{T}\rho^t = \frac{2}{T}\frac{\rho}{(1-\rho)^2}\left(T - T\rho + \rho^T - 1\right).$$
Now as $T \to \infty$, we have
$$\lim_{T\to\infty}T\cdot E\left(\frac{1}{T}\sum u_t\right)^2 = \frac{\sigma^2}{(1-\rho)^2}.$$
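The long-run variance result above is easy to confirm by Monte Carlo (a minimal sketch of my own; the parameter values are arbitrary): $T \cdot \text{Var}(\bar u_T)$ should approach $\sigma^2/(1-\rho)^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
T, R, rho, sigma = 500, 3000, 0.5, 1.0

means = np.empty(R)
for r in range(R):
    eps = rng.normal(0.0, sigma, T)
    u = np.zeros(T)
    for t in range(1, T):
        u[t] = rho * u[t - 1] + eps[t]
    means[r] = u.mean()

lrv = T * means.var()
print(lrv, sigma**2 / (1 - rho)**2)  # simulated vs theoretical value (= 4)
```

Note that the naive iid formula would predict $T \cdot \text{Var}(\bar u_T) \to \gamma_0 = \sigma^2/(1-\rho^2) \approx 1.33$ here; the positive autocorrelation triples the variance of the sample mean.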
2.2 CLT for Martingale Difference Sequences

If
$$E(y_t) = 0 \ \text{ and } \ E(y_t \mid \mathcal{F}_{t-1}) = 0 \ \text{ for all } t,$$
then $y_t$ is called a martingale difference sequence (m.d.s.). If $y_t$ is an m.d.s., then $y_t$ is not serially correlated.

CLT for an m.d.s. Let $\{y_t\}_{t=1}^{\infty}$ be a scalar m.d.s. with $\bar y_T = T^{-1}\sum_{t=1}^{T}y_t$. Suppose that (a) $E(y_t^2) = \sigma_t^2 > 0$ with $T^{-1}\sum_{t=1}^{T}\sigma_t^2 \to \sigma^2 > 0$, (b) $E|y_t|^r < \infty$ for some $r > 2$ and all $t$, and (c) $T^{-1}\sum_{t=1}^{T}y_t^2 \to_p \sigma^2$. Then
$$\sqrt{T}\,\bar y_T \to_d N(0, \sigma^2).$$

CLT for a stationary stochastic process. Let
$$y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j},$$
where $\varepsilon_t$ is iid with $E(\varepsilon_t^2) < \infty$ and $\sum_{j=0}^{\infty}|\psi_j| < \infty$. Then
$$\sqrt{T}(\bar y_T - \mu) \to_d N\left(0, \sum_{j=-\infty}^{\infty}\gamma_j\right).$$

Example 1:
$$y_t = a + u_t, \quad u_t = \rho u_{t-1} + e_t, \quad e_t \sim iid(0, \sigma_e^2).$$
Then we have
$$y_t = a + \sum_{j=0}^{\infty}\rho^j e_{t-j}.$$
Hence
$$\sqrt{T}(\bar y_T - a) \to_d N\left(0, \frac{\sigma_e^2}{(1-\rho)^2}\right).$$

Example 2:
$$y_t = a(1-\rho) + \rho y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim iid(0, \sigma^2),$$
$$\hat\rho = \rho + \frac{\sum(y_{t-1} - \bar y_{T,-1})(\varepsilon_t - \bar\varepsilon_T)}{\sum(y_{t-1} - \bar y_{T,-1})^2}.$$
Show that
$$\lim_{T\to\infty}E\,\frac{1}{T}\sum(y_{t-1} - \bar y_{T,-1})^2 = Q^2 < \infty,$$
where $Q^2 = \sigma^2/(1-\rho^2)$. Calculate
$$\lim_{T\to\infty}E\left[\frac{1}{\sqrt{T}}\sum(y_{t-1} - \bar y_{T,-1})(\varepsilon_t - \bar\varepsilon_T)\right]^2.$$
Show that
$$\sqrt{T}(\hat\rho - \rho) \to_d N(0, 1-\rho^2).$$
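A quick Monte Carlo check of the last claim (my own sketch with arbitrary parameters): the simulated variance of $\sqrt{T}(\hat\rho - \rho)$ should be near $1 - \rho^2$.

```python
import numpy as np

rng = np.random.default_rng(6)
T, R, rho = 800, 2000, 0.6

stats = np.empty(R)
for r in range(R):
    eps = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + eps[t]
    x, z = y[:-1], y[1:]
    xd = x - x.mean()
    rho_hat = (xd * (z - z.mean())).sum() / (xd * xd).sum()
    stats[r] = np.sqrt(T) * (rho_hat - rho)

print(stats.var(), 1 - rho**2)  # simulated vs asymptotic variance (0.64)
```

The asymptotic variance $1 - \rho^2$ shrinks as $\rho \to 1$, a first hint of the faster (super-consistent) convergence in the unit root case treated later in the notes.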
3 Finite Sample Properties

3.1 Calculating Bias Using a Simple Taylor Expansion

Unknown constant case:
$$y_t = a + \rho y_{t-1} + e_t, \quad e_t \sim iidN(0, 1).$$
First, show that
$$E\left(\frac{A}{B}\right) = \frac{EA}{EB}\left(1 - \frac{\text{Cov}(A, B)}{E(A)E(B)} + \frac{\text{Var}(B)}{E(B)^2}\right) + O(T^{-2})$$
$$= \frac{EA}{EB} - \frac{E(A-a)(B-b)}{E(B)^2} + \frac{EA\cdot E(B-b)^2}{E(B)^3} + O(T^{-2}).$$
Let $EA = a$, $EB = b$, and take the Taylor expansion of $A/B$ around $a$ and $b$:
$$\frac{A}{B} = \frac{a}{b} + \frac{1}{b}(A-a) - \frac{a}{b^2}(B-b) - \frac{1}{b^2}(A-a)(B-b) + \frac{a}{b^3}(B-b)^2 + R_n.$$
Take expectations:
$$E\frac{A}{B} = \frac{a}{b} + \frac{1}{b}E(A-a) - \frac{a}{b^2}E(B-b) - \frac{1}{b^2}E(B-b)(A-a) + \frac{a}{b^3}E(B-b)^2 + ER_n$$
$$= \frac{a}{b} - \frac{1}{b^2}\text{Cov}(A, B) + \frac{a}{b^3}\text{Var}(B) + O(T^{-2}).$$
Now consider
$$E\hat\rho = E\,\frac{\sum\tilde y_t\tilde y_{t-1}}{\sum\tilde y_{t-1}^2} = ?$$
Note that in this example we have
$$E(A) = E(B) = \frac{\sigma_e^2}{1-\rho^2} - \frac{\sigma_e^2}{T(1-\rho)^2} + O\left(\frac{1}{T^2}\right)$$
and
$$E(x_tx_{t+k}x_{t+k+l}x_{t+k+l+m}) = \frac{\rho^{k+m}\left(1 + 2\rho^{2l}\right)}{(1-\rho^2)^2}$$
if $u_t$ is normal. From this, we can calculate all the moments. For example, we have
$$\frac{1}{T^2}E\left(\sum x_t^2\right)^2 = \frac{1}{T^2}\left[\frac{3T}{(1-\rho^2)^2} + 2\sum_{t=1}^{T-1}(T-t)\frac{1 + 2\rho^{2t}}{(1-\rho^2)^2}\right].$$
Then we finally have
$$E\hat\rho = E\,\frac{\sum\tilde y_t\tilde y_{t-1}}{\sum\tilde y_{t-1}^2} = \rho - \frac{1 + 3\rho}{T} + O(T^{-2}),$$
so that
$$E(\hat\rho - \rho) = -\frac{1 + 3\rho}{T} + O(T^{-2}).$$
For the no-constant case,
$$x_t = \rho x_{t-1} + e_t,$$
$$E(\hat\rho - \rho) = -\frac{2\rho}{T} + O(T^{-2}).$$
For the trend case,
$$y_t = a + bt + \rho y_{t-1} + e_t,$$
$$E(\hat\rho - \rho) = -\frac{2(1 + 2\rho)}{T} + O(T^{-2}).$$
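The $-(1+3\rho)/T$ bias formula for the fitted-constant case can be checked by Monte Carlo (a minimal sketch of my own, arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(7)
T, R, rho = 100, 20000, 0.5

bias = 0.0
for r in range(R):
    e = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + e[t]
    x, z = y[:-1], y[1:]
    xd, zd = x - x.mean(), z - z.mean()
    bias += (xd * zd).sum() / (xd * xd).sum() - rho

print(bias / R, -(1 + 3 * rho) / T)  # mean bias vs -(1 + 3*rho)/T = -0.025
```

Even at $T = 100$ the first-order approximation is quite accurate; the remaining discrepancy is of order $T^{-2}$.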
3.2 Approximating Statistical Inference Using Edgeworth Expansions

For the no-constant case (Phillips, 1977), we have
$$\hat\rho - \rho = \frac{\sum y_{t-1}u_t}{\sum y_{t-1}^2} = \frac{\sum y_{t-1}(y_t - \rho y_{t-1})}{\sum y_{t-1}^2} = \frac{\sum y_{t-1}y_t - \rho\sum y_{t-1}^2}{\sum y_{t-1}^2},$$
so that $(\hat\rho - \rho)$ can be expressed as a function of moments. Let
$$\sqrt{T}(\hat\rho - \rho) = \sqrt{T}\,e(m),$$
where $m$ stands for a vector of moments. Then taking a Taylor expansion yields
$$\sqrt{T}\,e(m) = \sqrt{T}\left[e_rm_r + \frac{1}{2}e_{rs}m_rm_s + \frac{1}{6}e_{rst}m_rm_sm_t + O_p\left(\frac{1}{T^2}\right)\right],$$
where
$$e_r = \frac{\partial e(0)}{\partial m_r}, \ \text{etc.}$$
Solving for all the moments yields
$$\Pr\left[\frac{\sqrt{T}(\hat\rho - \rho)}{\sqrt{1-\rho^2}} \leq w\right] = \Phi(w) + \frac{\phi(w)}{\sqrt{T}}\,\frac{\rho}{\sqrt{1-\rho^2}}\left(w^2 + 1\right),$$
where $w = x/\sqrt{1-\rho^2}$.

For the constant case (Tanaka, 1983),
$$\Pr\left[\frac{\sqrt{T}(\hat\rho - \rho)}{\sqrt{1-\rho^2}} \leq w\right] = \Phi(w) + \frac{\phi(w)}{\sqrt{T}}\,(\cdot),$$
where the $O(T^{-1/2})$ correction term now also reflects estimation of the constant (see Tanaka, 1983, for the exact expression).
4 Covariance-Stationary Vector Processes

Consider the following simple VAR(1) with two variables:
$$y_{1t} = a_1 + b_{11}y_{1t-1} + b_{12}y_{2t-1} + e_{1t}$$
$$y_{2t} = a_2 + b_{21}y_{1t-1} + b_{22}y_{2t-1} + e_{2t};$$
alternatively, we can rewrite it as
$$y_t = a + by_{t-1} + e_t,$$
where
$$a = \begin{bmatrix}a_1 \\ a_2\end{bmatrix}, \quad b = \begin{bmatrix}b_{11} & b_{12} \\ b_{21} & b_{22}\end{bmatrix}, \quad e_t = \begin{bmatrix}e_{1t} \\ e_{2t}\end{bmatrix}.$$
Usually,
$$Ee_te_s' = \Omega \ \text{ for } t = s, \quad = 0 \ \text{ otherwise.}$$

4.1 State Space Representation

Consider a VAR(p) with two variables given by
$$y_t = a + b_1y_{t-1} + \cdots + b_py_{t-p} + e_t.$$
Then we can rewrite it as
$$\begin{bmatrix}y_t - \mu \\ \vdots \\ y_{t-p+1} - \mu\end{bmatrix} = \begin{bmatrix}b_1 & b_2 & \cdots & b_p \\ I_2 & 0 & \cdots & 0 \\ 0 & I_2 & \cdots & 0 \\ & & \ddots & \\ 0 & 0 & \cdots & 0\end{bmatrix}\begin{bmatrix}y_{t-1} - \mu \\ \vdots \\ y_{t-p} - \mu\end{bmatrix} + \begin{bmatrix}e_t \\ 0 \\ \vdots \\ 0\end{bmatrix},$$
or
$$\xi_t = F\xi_{t-1} + v_t,$$
so that any VAR(p) can be rewritten as a VAR(1). We call this form the "state space representation."

4.2 Stationarity Condition

The eigenvalues of the matrix $F$ satisfy
$$\left|I\lambda^p - b_1\lambda^{p-1} - b_2\lambda^{p-2} - \cdots - b_p\right| = 0,$$
where the $\lambda$ are the eigenvalues. As long as all $|\lambda| < 1$, $y_t$ is stationary.
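The stationarity condition can be checked mechanically by building the companion matrix $F$ and inspecting its eigenvalues (a minimal sketch of my own for a bivariate VAR(2) with arbitrary coefficients):

```python
import numpy as np

def companion(b_list):
    """Stack VAR(p) coefficient matrices b_1, ..., b_p into the companion matrix F."""
    k = b_list[0].shape[0]
    p = len(b_list)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(b_list)
    F[k:, :-k] = np.eye(k * (p - 1))
    return F

b1 = np.array([[0.5, 0.1], [0.0, 0.4]])
b2 = np.array([[0.2, 0.0], [0.1, 0.1]])
F = companion([b1, b2])
eigs = np.linalg.eigvals(F)
stationary = bool(np.all(np.abs(eigs) < 1))
print(np.abs(eigs).max(), stationary)  # largest modulus < 1 => stationary
```

The eigenvalues of $F$ are exactly the roots $\lambda$ of the determinantal equation above, so this one routine covers any lag order $p$.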
Vector MA($\infty$) Representation

If the eigenvalues of $F$ all lie inside the unit circle, then $F^s \to 0$ as $s \to \infty$, and
$$\xi_t = \sum_{j=0}^{\infty}F^jv_{t-j},$$
or equivalently
$$y_t = \mu + \sum_{j=0}^{\infty}\Psi_je_{t-j}.$$
If $\sum_{j=0}^{\infty}|\Psi_j| < \infty$ (absolute summability), then

1. the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it} - \mu_i)(y_{j,t-s} - \mu_j)$, exists and is given by the row $i$, column $j$ element of
$$\Gamma_s = \sum_{k=0}^{\infty}\Psi_{s+k}\,\Omega\,\Psi_k' \quad \text{for } s = 0, 1, \dots;$$

2. the sequence of matrices $\{\Gamma_k\}_{k=0}^{\infty}$ is absolutely summable.

4.3 Autocovariance

Note that
$$Ey_{1t}y_{1t-1} = Ey_{1t}y_{1t+1}, \quad \text{but} \quad Ey_{1t}y_{2t-1} \neq Ey_{1t}y_{2t+1}.$$
Let
$$E(y_t) = \mu, \quad E(y_t - \mu)(y_{t-j} - \mu)' = \Gamma_{-j}, \quad E(y_{t+j} - \mu)(y_t - \mu)' = \Gamma_j,$$
and note that
$$\Gamma_{-j} \neq \Gamma_j.$$
Similar to the univariate case, we have
$$\bar y_T \to \mu$$
and
$$\lim_{T\to\infty}T\cdot E\left[(\bar y_T - \mu)(\bar y_T - \mu)'\right] = \sum_{j=-\infty}^{\infty}\Gamma_j \neq \Gamma_0 + 2\sum_{j=1}^{\infty}\Gamma_j.$$
4.4 Model Selection

Suppose that a VAR(1) with two variables is the correct specification. Then we have
$$y_{1t} = a_1 + b_{11}y_{1t-1} + b_{12}y_{2t-1} + e_{1t}$$
$$= a_1 + b_{11}y_{1t-1} + b_{12}\left(a_2 + b_{21}y_{1t-2} + b_{22}y_{2t-2} + e_{2t-1}\right) + e_{1t}$$
$$= a_1 + a_2b_{12} + b_{11}y_{1t-1} + b_{12}b_{21}y_{1t-2} + b_{12}b_{22}y_{2t-2} + e_{1t} + b_{12}e_{2t-1}$$
$$\vdots$$
$$= a + \sum_{j=1}^{\infty}b_jy_{1t-j} + \sum_{j=1}^{\infty}c_je_{2t-j} + e_{1t},$$
so that the single equation for $y_{1t}$ becomes an AR($\infty$) with a moving-average error. Similarly, if a VAR(1) with three variables is the true model, then any two of the variables follow a VAR of infinite order.

Now consider the lag selection criteria:
$$\text{AIC}: \ c_T(p) = \ln\left|\hat\Sigma_p\right| + 2\,\frac{1+p}{T}$$
$$\text{BIC}: \ c_T(p) = \ln\left|\hat\Sigma_p\right| + \frac{1+p}{T}\ln T$$
$$\text{H-Q}: \ c_T(p) = \ln\left|\hat\Sigma_p\right| + 2\,\frac{1+p}{T}\ln\ln T$$
For all three criteria, the selected lag length is identical across equations. However, it is not hard to conjecture that single-equation criteria can be used for selecting individual lag lengths.

Alternatively, the GS method can also be used here. However, there is no joint criterion available for the GS method.
4.5 Finite Sample Properties

Let $b = \sum_{i=1}^{p}b_i$, where $b_i$ is defined in
$$y_t = a + b_1y_{t-1} + \cdots + b_py_{t-p} + e_t.$$
The first-order bias was derived by Nicholls and Pope (1988):
$$E\left(\hat b - b\right) = -\frac{1}{T}C + O(T^{-2}),$$
where
$$C = G\left[(I - b')^{-1} + b'\left(I - b'^2\right)^{-1} + \sum_{j=1}^{p}\lambda_j\left(I - \lambda_jb'\right)^{-1}\right]\Gamma(0)^{-1},$$
the $\lambda_j$ are the eigenvalues of $b$, and $G$ is the variance-covariance matrix of $e_t$.

Note that this formula is similar to the panel VAR case, which we will consider very soon.

4.6 Granger Causality

There is no Granger causality (from $y$ to $x$) if
$$E(x_{t+s}\mid x_t, x_{t-1}, \dots) = E(x_{t+s}\mid x_t, x_{t-1}, \dots, y_t, y_{t-1}, \dots) \ \text{ for all } s > 0.$$
Testing: under the null hypothesis of no Granger causality (from $y_{2t}$ to $y_{1t}$), the corresponding upper off-diagonal elements of the VAR coefficient matrices should all be zero.
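In a bivariate VAR, the null of no Granger causality from $y_2$ to $y_1$ can be tested by comparing restricted and unrestricted OLS fits of the $y_1$ equation with an F statistic (my own sketch; the lag length of one and the coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 2000
e = rng.normal(size=(T, 2))
y = np.zeros((T, 2))
for t in range(1, T):
    # y2 Granger-causes y1 (coefficient 0.3), but not vice versa
    y[t, 0] = 0.5 * y[t - 1, 0] + 0.3 * y[t - 1, 1] + e[t, 0]
    y[t, 1] = 0.4 * y[t - 1, 1] + e[t, 1]

def ssr(X, z):
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return resid @ resid

z = y[1:, 0]
ones = np.ones(T - 1)
X_u = np.column_stack([ones, y[:-1, 0], y[:-1, 1]])  # unrestricted
X_r = np.column_stack([ones, y[:-1, 0]])             # restricted: drop lagged y2
q = 1                                                 # number of restrictions
F = ((ssr(X_r, z) - ssr(X_u, z)) / q) / (ssr(X_u, z) / (len(z) - X_u.shape[1]))
print(F)  # large F => reject no-causality from y2 to y1
```

Running the same comparison on the $y_2$ equation (dropping lagged $y_1$) gives a small F statistic, matching the one-directional causality built into the simulated system.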
5 Processes with Deterministic Time Trends

First consider a simple trend regression given by
$$y_t = bt + \varepsilon_t,$$
where we assume
$$\varepsilon_t \sim iidN(0, \sigma^2).$$
Let us derive the limiting distribution:
$$\hat b = b + \frac{\sum t\varepsilon_t}{\sum t^2}.$$
Next, add a constant:
$$y_t = a + bt + \varepsilon_t.$$
Derive the limiting distributions of $\hat a$ and $\hat b$:
$$\begin{pmatrix}\hat a - a \\ \hat b - b\end{pmatrix} = \begin{pmatrix}\sum 1 & \sum t \\ \sum t & \sum t^2\end{pmatrix}^{-1}\begin{pmatrix}\sum\varepsilon_t \\ \sum t\varepsilon_t\end{pmatrix},$$
$$\begin{pmatrix}T^{1/2} & 0 \\ 0 & T^{3/2}\end{pmatrix}\begin{pmatrix}\hat a - a \\ \hat b - b\end{pmatrix} = \begin{pmatrix}T^{1/2} & 0 \\ 0 & T^{3/2}\end{pmatrix}\begin{pmatrix}\sum 1 & \sum t \\ \sum t & \sum t^2\end{pmatrix}^{-1}\begin{pmatrix}\sum\varepsilon_t \\ \sum t\varepsilon_t\end{pmatrix}.$$
Consider another estimator given by
$$\tilde b = \frac{y_T - y_1}{T - 1} = b + \frac{\varepsilon_T - \varepsilon_1}{T - 1}.$$
Find the limiting distribution of $\tilde b$.
6 Univariate Processes with Unit Roots

$$y_t = a + u_t, \quad u_t = u_{t-1} + \varepsilon_t.$$
Then we have
$$y_t = y_{t-1} + \varepsilon_t.$$
Let
$$y_t = \varepsilon_1 + \cdots + \varepsilon_t,$$
where
$$\varepsilon_t \sim iidN(0, 1).$$
Then we have
$$y_t \sim N(0, t)$$
and
$$y_t - y_{t-1} = \varepsilon_t \sim N(0, 1), \quad y_t - y_s \sim N(0, t-s) \ \text{ for } t > s.$$
Rewrite this as a continuous-time stochastic process.

Standard Brownian Motion: $W(\cdot)$ is a continuous-time stochastic process associating each date $r \in [0, 1]$ with the scalar $W(r)$ such that

1. $W(0) = 0$;
2. $[W(s) - W(t)] \sim N(0, s-t)$ for $s > t$;
3. $W(1) \sim N(0, 1)$.

Transition from discrete to continuous time. First let $\varepsilon_t \sim iidN(0, 1)$. Then
$$\frac{1}{\sqrt{T}}\,\frac{1}{T}\sum_{t=1}^{T}y_t = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\frac{y_t}{T} = \frac{1}{\sqrt{T}}\left(\frac{\varepsilon_1}{T} + \frac{\varepsilon_1 + \varepsilon_2}{T} + \cdots + \frac{1}{T}\sum_{t=1}^{T}\varepsilon_t\right) \to_d \int_0^1 W(r)\,dr,$$
$$\frac{1}{T^2}\sum_{t=1}^{T}y_t^2 \to_d \int_0^1 W^2\,dr,$$
$$\frac{1}{T^{5/2}}\sum_{t=1}^{T}ty_{t-1} \to_d \int_0^1 rW\,dr.$$
If $\varepsilon_t \sim N(0, \sigma^2)$, then we have
$$T^{-3/2}\sum y_t \to_d \sigma\int_0^1 W\,dr,$$
etc.
More limiting distribution building blocks. Let
$$y_t = y_{t-1} + e_t, \quad e_t \sim iid(0, \sigma^2), \quad y_0 = O_p(1).$$
As $T \to \infty$, we have
$$T^{-1/2}\sum e_t \to_d \sigma W(1) = N(0, \sigma^2),$$
while $T^{-1/2}\sum y_{t-1}e_t$ diverges: the proper normalization for $\sum y_{t-1}e_t$ is $T^{-1}$.

Next, consider
$$\frac{1}{\sqrt{t}}y_t = \frac{1}{\sqrt{t}}\sum_{s=1}^{t}e_s \to_d N(0, \sigma^2),$$
so that
$$\frac{1}{\sigma^2t}y_t^2 = \frac{1}{\sigma^2t}\left(\sum_{s=1}^{t}e_s\right)^2 \to_d N(0, 1)^2 = \chi_1^2 \ \text{ for large } t.$$
Now we are ready to prove
$$T^{-1}\sum y_{t-1}e_t \to_d \frac{1}{2}\sigma^2\left[W(1)^2 - 1\right].$$
Proof: Consider first
$$y_t^2 = (y_{t-1} + e_t)^2 = y_{t-1}^2 + e_t^2 + 2y_{t-1}e_t,$$
so that we have
$$y_{t-1}e_t = \frac{1}{2}\left(y_t^2 - y_{t-1}^2 - e_t^2\right).$$
Taking the time series average yields
$$\frac{1}{T}\sum y_{t-1}e_t = \frac{1}{2T}\left(y_T^2 - y_0^2\right) - \frac{1}{2}\,\frac{1}{T}\sum e_t^2.$$
Now let $y_0 = 0$; then we have
$$\frac{1}{T}\sum y_{t-1}e_t = \frac{1}{2}\,\frac{y_T^2}{T} - \frac{1}{2}\,\frac{1}{T}\sum e_t^2 \to_d \frac{1}{2}\sigma^2\chi_1^2 - \frac{1}{2}\sigma^2 = \frac{1}{2}\sigma^2\left[W(1)^2 - 1\right].$$
Next, consider
$$\frac{1}{T^{3/2}}\sum y_{t-1} = \frac{1}{T^{3/2}}\left(e_1 + (e_1 + e_2) + \cdots + (e_1 + \cdots + e_{T-1})\right)$$
$$= \frac{1}{T^{3/2}}\left((T-1)e_1 + (T-2)e_2 + \cdots + e_{T-1}\right)$$
$$= \frac{1}{T^{3/2}}\sum_{t=1}^{T}(T-t)e_t = \frac{1}{T^{1/2}}\sum_{t=1}^{T}e_t - \frac{1}{T^{3/2}}\sum_{t=1}^{T}te_t.$$
Now we are ready to prove
$$T^{-3/2}\sum te_t \to_d \sigma W(1) - \sigma\int_0^1 W(r)\,dr.$$
Proof:
$$T^{-3/2}\sum te_t = \frac{1}{T^{1/2}}\sum_{t=1}^{T}e_t - \frac{1}{T^{3/2}}\sum y_{t-1} \to_d \sigma W(1) - \sigma\int W(r)\,dr.$$

Questions:

1. Consider
$$y_t = y_{t-1} + e_t, \quad u_t \sim iid(0, \sigma_u^2), \quad Ee_tu_s = 0 \ \text{for all } t, s.$$
Then derive the limiting distribution of
$$T^{-1}\sum y_{t-1}u_t.$$

2. Prove the following:
(a) $T^{-5/2}\sum ty_{t-1} \to_d \sigma\int rW(r)\,dr$
(b) $T^{-3}\sum ty_{t-1}^2 \to_d \sigma^2\int rW(r)^2\,dr$
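These building blocks can be spot-checked by simulation (my own sketch): with $\sigma = 1$, $T^{-1}\sum y_{t-1}e_t$ should behave like $\frac{1}{2}\left[W(1)^2 - 1\right]$, which has mean 0 and variance 1/2.

```python
import numpy as np

rng = np.random.default_rng(9)
T, R = 2000, 4000

vals = np.empty(R)
for r in range(R):
    e = rng.normal(size=T)
    y = np.concatenate(([0.0], np.cumsum(e)))  # y_0 = 0, y_t = e_1 + ... + e_t
    vals[r] = (y[:-1] * e).sum() / T

print(vals.mean(), vals.var())  # approximately 0 and 1/2
```

The distribution is visibly skewed to the right (it inherits the chi-squared shape of $W(1)^2$), which is the first hint that unit root inference is not Gaussian.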
6.1 Limiting Distribution of the Unit Root Process I (No Constant)

Consider the simple AR(1) regression without a constant,
$$y_t = \rho y_{t-1} + e_t.$$
Then the OLS estimator satisfies
$$\hat\rho - \rho = \frac{\sum y_{t-1}e_t}{\sum y_{t-1}^2}.$$
When $\rho = 1$, we know
$$\frac{1}{T}\sum y_{t-1}e_t \to_d \frac{1}{2}\sigma^2\left[W(1)^2 - 1\right], \quad \frac{1}{T^2}\sum_{t=1}^{T}y_t^2 \to_d \sigma^2\int_0^1 W^2\,dr.$$
Hence we have
$$T(\hat\rho - 1) = \frac{\frac{1}{T}\sum y_{t-1}e_t}{\frac{1}{T^2}\sum y_{t-1}^2} \to_d \left(\int W^2\,dr\right)^{-1}\frac{1}{2}\left[W(1)^2 - 1\right] = \left(\int W^2\,dr\right)^{-1}\int W\,dW.$$
Consider the t-ratio statistic given by
$$t_\rho = \frac{\hat\rho - 1}{\hat\sigma_{\hat\rho}} = \frac{\hat\rho - 1}{\left(\hat\sigma_T^2\big/\sum y_{t-1}^2\right)^{1/2}},$$
where
$$\hat\sigma_T^2 = \frac{1}{T}\sum(y_t - \hat\rho y_{t-1})^2 \to_p \sigma^2.$$
Hence we have
$$t_\rho = \frac{\hat\rho - 1}{\left(\hat\sigma_T^2\big/\sum y_{t-1}^2\right)^{1/2}} \to_d \left(\int W^2\,dr\right)^{-1/2}\frac{1}{2}\left[W(1)^2 - 1\right].$$
Note that the lower and upper 2.5 (5.0) % critical values are -2.23 (-1.95) and 1.62 (1.28), which are very different from 1.96 (1.65).
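The nonstandard distribution is easy to visualize by simulation (a minimal sketch of my own): generate random walks, compute the t-ratio, and look at its lower quantiles, which should be far below the normal values.

```python
import numpy as np

rng = np.random.default_rng(10)
T, R = 500, 4000

tstats = np.empty(R)
for r in range(R):
    e = rng.normal(size=T)
    y = np.concatenate(([0.0], np.cumsum(e)))
    x, z = y[:-1], y[1:]
    rho_hat = (x * z).sum() / (x * x).sum()
    s2 = ((z - rho_hat * x) ** 2).mean()
    tstats[r] = (rho_hat - 1) / np.sqrt(s2 / (x * x).sum())

q025, q05 = np.quantile(tstats, [0.025, 0.05])
print(q025, q05)  # near -2.23 and -1.95 rather than the normal -1.96 and -1.64
```

Tabulating such simulated quantiles is essentially how the Dickey-Fuller critical values quoted above were originally produced.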
6.2 Limiting Distribution of the Unit Root Process II (Constant)

Now we have
$$y_t = a + \rho y_{t-1} + e_t.$$
When $\rho = 1$, $a = 0$; however, we don't know whether $\rho = 1$ or not. Under the null of a unit root, the OLS estimators are given by
$$\begin{bmatrix}\hat a - 0 \\ \hat\rho - 1\end{bmatrix} = \begin{bmatrix}T & \sum y_{t-1} \\ \sum y_{t-1} & \sum y_{t-1}^2\end{bmatrix}^{-1}\begin{bmatrix}\sum e_t \\ \sum y_{t-1}e_t\end{bmatrix}.$$
Consider
$$\begin{bmatrix}\sqrt{T} & 0 \\ 0 & T\end{bmatrix}\begin{bmatrix}\hat a - 0 \\ \hat\rho - 1\end{bmatrix} = \begin{bmatrix}1 & \frac{1}{T^{3/2}}\sum y_{t-1} \\ \frac{1}{T^{3/2}}\sum y_{t-1} & \frac{1}{T^2}\sum y_{t-1}^2\end{bmatrix}^{-1}\begin{bmatrix}\frac{1}{\sqrt{T}}\sum e_t \\ \frac{1}{T}\sum y_{t-1}e_t\end{bmatrix}$$
$$\to_d \begin{bmatrix}1 & \sigma\int W(r)\,dr \\ \sigma\int W(r)\,dr & \sigma^2\int W(r)^2\,dr\end{bmatrix}^{-1}\begin{bmatrix}\sigma W(1) \\ \sigma^2\,\frac{1}{2}\left[W(1)^2 - 1\right]\end{bmatrix}.$$
Hence
$$T(\hat\rho - 1) \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right] - W(1)\int W\,dr}{\int W^2\,dr - \left(\int W\,dr\right)^2} = \frac{\int\tilde W\,dW}{\int\tilde W^2\,dr},$$
where $\tilde W$ is demeaned Brownian motion. Similarly, the t-ratio statistic is given by
$$t_\rho \to_d \frac{\frac{1}{2}\left[W(1)^2 - 1\right] - W(1)\int W\,dr}{\left\{\int W^2\,dr - \left(\int W\,dr\right)^2\right\}^{1/2}}.$$
Exercise: derive the limiting distribution for the trend case.
6.3 Unit Root Test

For an AR(p), we have
$$y_t = a + \rho y_{t-1} + \sum_j\phi_j\Delta y_{t-j} + e_t.$$
Note that
$$\frac{1}{T}\sum\Delta y_{t-j} = O_p\left(\frac{1}{\sqrt{T}}\right),$$
so that as $T \to \infty$ the augmented terms become negligible. Hence the limiting distribution does not change at all.
7 The Meaning of Nonstationarity

Let's find an economic meaning of nonstationarity.

1. No steady state. No fixed mean or average exists: the series wanders randomly and never converges to a mean.

2. No equilibrium, since there is no steady state. One cannot forecast or predict its future value without considering other nonstationary variables.

3. Fast convergence rate (of estimators involving such regressors).
7.1 Unit Root Tests and Stationarity Tests

Note that rejection of the null of a unit root does not imply that a series is stationary. To see this, let
$$x_t = \rho x_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sqrt{t}\,e_t, \quad e_t \sim iid(0, \sigma^2).$$
Further let $\rho = 0$. Now $x_t$ is not weakly stationary, since its variance is time varying. However, at the same time, $x_t$ does not follow a unit root process, since $\rho = 0$. To see this, let us derive the limiting distribution of $\hat\rho$:
$$\hat\rho - \rho = \frac{\sum x_{t-1}\varepsilon_t}{\sum x_{t-1}^2},$$
and
$$E\left(\sum x_{t-1}\varepsilon_t\right)^2 = E\left(\sum\varepsilon_{t-1}\varepsilon_t\right)^2 = E\left(\sum t\,e_{t-1}e_t\right)^2 = \left(\sigma^2\right)^2\frac{T^2}{2} + O(T),$$
so that
$$\frac{1}{T}\sum x_{t-1}\varepsilon_t \to_d N\left(0, \frac{\sigma^4}{2}\right),$$
$$\sum x_{t-1}^2 = \sum\varepsilon_{t-1}^2 = \sum t\,e_{t-1}^2 \to_p \sigma^2\frac{T^2}{2} + O(T).$$
Hence we have
$$T(\hat\rho - \rho) = \frac{\frac{1}{T}\sum x_{t-1}\varepsilon_t}{\frac{1}{T^2}\sum x_{t-1}^2} \to_d N(0, 2) = \sqrt{2}\,W(1).$$
Therefore, we can see that the convergence rate is still $T$, but the limiting distribution is not a function of Brownian motion at all.
8 Unit Root Tests Considering Finite Sample Bias

Consider the following recursive mean adjustment:
$$\bar y_t = \frac{1}{t-1}\sum_{s=1}^{t-1}y_s.$$
Under the null of a unit root, we have
$$y_t - \bar y_t = a + \rho(y_{t-1} - \bar y_t) + (\rho - 1)\bar y_t + e_t.$$
Since $a = 0$ and $\rho = 1$, we have
$$y_t - \bar y_t = \rho(y_{t-1} - \bar y_t) + e_t.$$
The limiting distribution of $\hat\rho_{RD}$ is given by
$$T(\hat\rho_{RD} - 1) \to_d \left(\int_0^1\tilde W(r)^2\,dr\right)^{-1}\int_0^1\tilde W(r)\,dW(r), \quad \tilde W(r) = W(r) - \frac{1}{r}\int_0^r W(s)\,ds,$$
where $\tilde W$ is the recursively demeaned Brownian motion. The finite sample performance is usually better than that of the ADF test.

Exercise: Read "Uniform Asymptotic Normality in Stationary and Unit Root Autoregression" by Han, Phillips and Sul (2010) and construct the X-differencing unit root test.
9 Weak Stationarity and Local to Unity

Joon Park (2007) and Phillips and Magdalinos (2007, JoE).

Consider the following DGP:
$$y_t = \rho_ny_{t-1} + u_t, \quad t = 1, \dots, n,$$
where
$$\rho_n = 1 - \frac{c}{n^\alpha}, \quad 0 \leq \alpha < 1 \ \text{and} \ c > 0.$$
Then we have
$$\sqrt{n}(\hat\rho_n - \rho_n) \to_d N\left(0, 1 - \rho_n^2\right),$$
so that
$$\frac{\sqrt{n}}{\sqrt{1-\rho_n^2}}(\hat\rho_n - \rho_n) \to_d N(0, 1).$$
Note that
$$1 - \rho_n^2 = 1 - 1 + \frac{2c}{n^\alpha} - \frac{c^2}{n^{2\alpha}} = \frac{2c}{n^\alpha} - \frac{c^2}{n^{2\alpha}} = O\left(n^{-\alpha}\right) + O\left(n^{-2\alpha}\right);$$
hence
$$\sqrt{1-\rho_n^2} = \sqrt{\frac{2c}{n^\alpha}}\left(1 + O\left(n^{-\alpha}\right)\right),$$
and
$$\frac{\sqrt{n}}{\sqrt{1-\rho_n^2}} = \frac{n^{1/2}n^{\alpha/2}}{\sqrt{2c}} + O\left(n^{-\alpha+1/2}\right).$$
Finally we have
$$n^{\frac{1+\alpha}{2}}(\hat\rho_n - \rho_n) \to_d N(0, 2c).$$

For the case $\alpha = 1$, we call this local to unity, and the limiting distribution is different (Phillips, 1987, Biometrika; 1988, Econometrica). Consider the following simple DGP:
$$y_t = \rho_ny_{t-1} + u_t, \quad \rho_n = \exp\left(\frac{c}{T}\right) \simeq 1 + \frac{c}{T}.$$
Now, define
$$J(r) = \int_0^r e^{(r-s)c}\,dW(s),$$
where $J(r)$ is a Gaussian process which, for fixed $r > 0$, has the distribution
$$J(r) \sim N\left(0, \frac{1}{2}\,\frac{e^{2rc} - 1}{c}\right);$$
it is called an Ornstein-Uhlenbeck process. Alternatively we have
$$J(r) = W(r) + c\int_0^r e^{(r-s)c}W(s)\,ds.$$
The limiting distribution of $\hat\rho_n$ is given by
$$n(\hat\rho_n - \rho_n) \to_d \left(\int J\,dW + \frac{1}{2}\left(1 - \frac{\sigma_u^2}{\omega^2}\right)\right)\left(\int J^2\,dr\right)^{-1},$$
where $\omega^2$ is the long-run variance of $u_t$. Now we have
$$n\rho_n = n + c,$$
so that
$$n(\hat\rho_n - 1) - c \to_d \left(\int J\,dW + \frac{1}{2}\left(1 - \frac{\sigma_u^2}{\omega^2}\right)\right)\left(\int J^2\,dr\right)^{-1};$$
letting $\sigma_u^2 = \omega^2$ (the AR(1) case), we have
$$n(\hat\rho_n - 1) \to_d c + \left(\int J\,dW\right)\left(\int J^2\,dr\right)^{-1} \quad \text{for } c < 0.$$
See Phillips for the case $c \to \infty$.

Explosive Series
$$\rho_n = 1 + \frac{c}{n^\alpha}, \quad c > 0.$$
As $n \to \infty$, $\rho_n \to 1$, but for fixed $n$, $\rho_n > 1$. Note that if $\rho > 1$ (fixed) and $y_0 = 0$, then the limiting distribution (derived by White, 1958) is given by
$$\frac{\rho^n}{\rho^2 - 1}(\hat\rho - \rho) \to_d \mathcal{C} \ \text{ as } n \to \infty,$$
where $\mathcal{C}$ is a Cauchy distribution. From this, consider
$$\rho_n^2 - 1 = \frac{2c}{n^\alpha} + \frac{c^2}{n^{2\alpha}},$$
so that
$$\frac{\rho_n^n}{\rho_n^2 - 1} \simeq \frac{\rho_n^n}{2c\,n^{-\alpha}} = \frac{\rho_n^n\,n^\alpha}{2c}.$$
Hence we have
$$\frac{\rho_n^n\,n^\alpha}{2c}(\hat\rho_n - \rho_n) \to_d \mathcal{C}.$$
Note: White considered the moment generating function first and then inverted it to obtain the p.d.f.

Read "Explosive Behavior in the 1990s Nasdaq: When Did Exuberance Escalate Asset Values?" by Peter C. B. Phillips, Yangru Wu, and Jun Yu (2009).
10 Cointegration

10.1 Multivariate Integrated Processes (Chap 18)

Let
$$y_t = y_{t-1} + u_t,$$
where $y_t$ is a vector of nonstationary processes. We further assume that
$$u_t = \sum_{s=0}^{\infty}\Psi_s\varepsilon_{t-s},$$
where $\sum_{s=0}^{\infty}s\,|\psi_{s,ij}| < \infty$. Let $E(\varepsilon_t\varepsilon_t') = \Omega$; then
$$\Gamma_s = E\left(u_tu_{t-s}'\right) = \sum_{v=0}^{\infty}\Psi_{s+v}\,\Omega\,\Psi_v'.$$
Further define
$$\Omega = PP'$$
and
$$\Psi(1) = \Psi_0 + \Psi_1 + \cdots, \quad \Lambda = \Psi(1)P.$$
Then we have
$$T^{-1/2}\sum u_t \to_d \Lambda W(1),$$
$$T^{-1}\sum y_{t-1}u_t' \to_d \Lambda\left(\int W\,dW'\right)\Lambda' + \sum_{v=1}^{\infty}\Gamma_v,$$
$$T^{-2}\sum y_{t-1}y_{t-1}' \to_d \Lambda\left(\int WW'\,dr\right)\Lambda'.$$

We are now ready to analyze time series regressions with integrated processes. Consider
$$y_t = \gamma x_t + u_t, \quad u_t = u_{t-1} + \varepsilon_t.$$
Then we have
$$\hat\gamma - \gamma = \left(\sum x_t^2\right)^{-1}\left(\sum x_tu_t\right) \to_d\ ?$$
To find out, we can choose between two ways. The first way is to define the bivariate process of $y_t$ and $x_t$; by doing this, we can see what $\hat\gamma$ converges to. See Hamilton, pp. 558-559. The second way is simply to use the following results. Let
$$z_t = (x_t, u_t)';$$
then
$$T^{-2}\sum z_tz_t' = \begin{pmatrix}T^{-2}\sum x_t^2 & T^{-2}\sum x_tu_t \\ T^{-2}\sum x_tu_t & T^{-2}\sum u_t^2\end{pmatrix} \to_d \Lambda\left(\int WW'\,dr\right)\Lambda',$$
where $\Lambda$ is defined via
$$\Delta z_t = \Psi(L)\varepsilon_t, \quad \Lambda = \Psi(1)P,$$
and $\Lambda\Lambda'$ is the long-run covariance matrix of $\Delta z_t$. Now consider the rate of convergence. Since $u_t$ is $I(1)$, $\sum x_tu_t$ must be divided by $T^2$, and $\sum x_t^2$ as well. Hence $\hat\gamma - \gamma = O_p(1)$: the estimator does not converge. What does that mean? (This is the spurious regression problem.)

10.2 A Common Factor across Variables

Consider a process given by
$$y_t = \lambda\theta_t + u_t, \quad \theta_t = \theta_{t-1} + m_t, \quad u_t \sim iidN(0, 1).$$
Note that $y_t$ is $I(1)$. Next, consider a similar process which also contains $\theta_t$:
$$x_t = \theta_t + \varepsilon_t, \quad \varepsilon_t \sim iidN(0, 1).$$
Then we have
$$y_t - \lambda x_t = u_t - \lambda\varepsilon_t = I(0).$$
In this case, we say $y_t$ is cointegrated with $x_t$.
10.2.1 When the Common Factor Is a Linear Trend

Part I. Before we proceed to formal asymptotics for this case, consider the following simple trend regression. Let
$$y_t = bt + z_t, \quad x_t = t + s_t,$$
where
$$z_t \sim iidN(0, 1), \quad s_t \sim iidN(0, 1), \quad Ez_ts_m = 0 \ \text{for all } t \ \text{and } m.$$
Q1: If you run the regression
$$y_t = \beta x_t + \varepsilon_t,$$
identify $\beta$ and $\varepsilon_t$:
$$y_t = bt + z_t = b(t + s_t) + z_t - bs_t = bx_t + \varepsilon_t.$$
Q2: Is $x_t$ correlated with $\varepsilon_t$?
$$E(x_t\varepsilon_t) = E(t + s_t)(z_t - bs_t) = -b \neq 0.$$
Q3: Is $\hat\beta$ consistent?
$$\hat\beta = b + \frac{\sum x_t\varepsilon_t}{\sum x_t^2}.$$
Consider $\sum x_t^2$ first:
$$\frac{1}{T^3}\sum x_t^2 = \frac{1}{T^3}\sum\left(t^2 + 2ts_t + s_t^2\right) = \frac{1}{3} + O_p\left(\frac{1}{T}\right) + O_p\left(\frac{1}{T^2}\right).$$
Next,
$$\frac{1}{T^{3/2}}\sum x_t\varepsilon_t = \frac{1}{T^{3/2}}\sum\left(t\varepsilon_t + s_t\varepsilon_t\right) = O_p(1) + O_p\left(\frac{1}{T}\right).$$
Hence we have
$$T^{3/2}\left(\hat\beta - b\right) = \frac{T^{-3/2}\sum x_t\varepsilon_t}{T^{-3}\sum x_t^2} \to_d N(0, V).$$

Part II. Now suppose instead
$$x_t = a_tt = (1 + s_t)t = t + s_tt.$$
Q1: If you run
$$y_t = \beta x_t + \varepsilon_t,$$
identify $\beta$ and $\varepsilon_t$:
$$y_t = bt + z_t = b(t + s_tt) - bs_tt + z_t = bx_t + \varepsilon_t.$$
Q2: Is $x_t$ correlated with $\varepsilon_t$?
$$E(x_t\varepsilon_t) = E(t + s_tt)(z_t - bs_tt) = -bt^2Es_t^2 = -bt^2 \neq 0.$$
Q3: Is $\hat\beta$ consistent?
$$E\,\frac{1}{T^3}\sum x_t^2 = E\,\frac{1}{T^3}\sum t^2(1 + s_t)^2 = \frac{1}{T^3}\sum t^2\left(1 + Es_t^2\right) = \frac{2}{3} + O\left(\frac{1}{T}\right).$$
Next,
$$E\sum x_t\varepsilon_t = ?$$
Derive it.
10.2.2 When the Common Factor Is Stochastic

Consider the following processes again:
$$y_t = \lambda\theta_t + u_t, \quad \theta_t = \theta_{t-1} + m_t, \quad u_t \sim iidN(0, 1), \quad m_t \sim iidN(0, \sigma^2).$$
We assume further that
$$Em_tu_s = 0 \ \text{for all } t \ \text{and } s.$$
Let's derive the limiting distribution of
$$\frac{1}{T}\sum\theta_tu_t.$$
First, its mean is zero. Next,
$$E\left(\sum\theta_tu_t\right)^2 = E\left(\theta_1^2u_1^2 + \cdots + \theta_T^2u_T^2\right) = \sigma^2(1 + 2 + \cdots + T) = \frac{\sigma^2}{2}T^2 + O(T).$$
Hence we have
$$\frac{1}{T}\sum\theta_tu_t \to_d N\left(0, \frac{\sigma^2}{2}\right).$$
Now consider
$$y_t = \lambda\theta_t + u_t, \quad x_t = \theta_t + e_t.$$
Q1: If you run the regression
$$y_t = \gamma x_t + \varepsilon_t,$$
identify $\gamma$ and $\varepsilon_t$:
$$y_t = \lambda\theta_t + u_t = \lambda(\theta_t + e_t) + u_t - \lambda e_t = \lambda x_t + \varepsilon_t.$$
Q2: Is $x_t$ correlated with $\varepsilon_t$?
$$E(x_t\varepsilon_t) = E(\theta_t + e_t)(u_t - \lambda e_t) = -\lambda \neq 0.$$
Q3: Is $\hat\gamma$ consistent?
$$\hat\gamma = \lambda + \frac{\sum x_t\varepsilon_t}{\sum x_t^2}.$$
Consider $\sum x_t^2$ first:
$$\frac{1}{T^2}\sum x_t^2 \to_d \sigma^2\int W^2\,dr.$$
Next,
$$\frac{1}{T}\sum x_t\varepsilon_t \to_d N(0, V).$$
Find $V$, so that we have
$$T(\hat\gamma - \lambda) \to_d N(0, Q).$$
Find $Q$.
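Superconsistency is visible in simulation (my own sketch, arbitrary parameters): because $x_t$ is dominated by the $I(1)$ common factor, the OLS slope converges at rate $T$ even though $x_t$ is correlated with the error.

```python
import numpy as np

rng = np.random.default_rng(11)
lam, R = 2.0, 500

def slope_err(T):
    errs = np.empty(R)
    for r in range(R):
        m = rng.normal(size=T)
        theta = np.cumsum(m)               # random walk common factor
        u, e = rng.normal(size=T), rng.normal(size=T)
        y = lam * theta + u
        x = theta + e
        errs[r] = (x * y).sum() / (x * x).sum() - lam
    return np.abs(errs).mean()

e1, e2 = slope_err(200), slope_err(1600)
print(e1, e2, e1 / e2)  # error shrinks roughly 8x when T grows 8x (rate T)
```

Under the usual root-T rate the error would shrink by only $\sqrt{8} \approx 2.8$, so the observed ratio near 8 is the signature of rate-$T$ (superconsistent) estimation of a cointegrating coefficient.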
11 Cointegration Tests

Consider a simple case,
$$y_t = \gamma x_t + u_t, \quad u_t = u_{t-1} + \varepsilon_t,$$
where $x_t$ is independent of $u_t$. Let
$$\hat u_t = y_t - \hat\gamma x_t,$$
and run
$$\hat u_t = \rho\hat u_{t-1} + e_t.$$
Then
$$Z_\rho = T(\hat\rho - 1) = \frac{\frac{1}{T}\sum\hat u_{t-1}e_t}{\frac{1}{T^2}\sum\hat u_{t-1}^2}.$$
Note that
$$\hat u_t = y_t - \left(\frac{\sum x_ty_t}{\sum x_t^2}\right)x_t.$$
Let
$$Q(r) = W_y(r) - \left(\int W_yW_x'\right)\left(\int W_xW_x'\right)^{-1}W_x(r).$$
Then we have
$$\frac{1}{T^2}\sum\hat u_{t-1}^2 \to_d \int_0^1 Q(r)^2\,dr$$
and
$$\frac{1}{T}\sum\hat u_{t-1}e_t \to_d \int_0^1 Q\,dQ.$$
Hence, finally,
$$Z_\rho \to_d \left(\int Q\,dQ\right)\left(\int_0^1 Q(r)^2\,dr\right)^{-1}.$$
Note that $Q(r)$ depends on the number of regressors. Intuitively, as we add more regressors, the variability of $\hat\gamma$ increases, so the variability of $Q(r)$ increases as well. That is, for the two-regressor case, we have
$$\hat u_t = y_t - \hat\gamma_1x_{1t} - \hat\gamma_2x_{2t} = u_t - \left(\hat\gamma_1 - \gamma_1\right)x_{1t} - \left(\hat\gamma_2 - \gamma_2\right)x_{2t}.$$
Therefore, the limiting distribution depends on the number of regressors. As more regressors enter, the critical values get larger (in absolute value).
12 Error Correction Model (ECM)

Here I introduce the ECM in an intuitive way. Consider the cointegrated case first. Then we have
$$u_t = \rho u_{t-1} + e_t,$$
or
$$\Delta u_t = (\rho - 1)u_{t-1} + e_t,$$
or
$$\Delta y_t - \gamma\Delta x_t = (\rho - 1)(y_{t-1} - \gamma x_{t-1}) + e_t.$$
This regression model can be further decomposed into
$$\Delta y_t = \alpha_1(y_{t-1} - \gamma x_{t-1}) + e_{1t},$$
$$\Delta x_t = \alpha_2(y_{t-1} - \gamma x_{t-1}) + e_{2t},$$
with
$$\alpha_1 - \gamma\alpha_2 = \rho - 1.$$
Note that the lagged term is called the "error correction" term, hence the name error correction model.

If $u_t$ follows an AR(p), then the general ECM is given by
$$\Delta y_t = \alpha_1(y_{t-1} - \gamma x_{t-1}) + \sum_{j=1}^{p-1}\phi_{yj}\Delta y_{t-j} + \sum_{j=1}^{p-1}\phi_{xj}\Delta x_{t-j} + e_{1t},$$
$$\Delta x_t = \alpha_2(y_{t-1} - \gamma x_{t-1}) + \sum_{j=1}^{p-1}\tilde\phi_{yj}\Delta y_{t-j} + \sum_{j=1}^{p-1}\tilde\phi_{xj}\Delta x_{t-j} + e_{2t}.$$
Now let's compare the no-cointegration case. If $y_t$ is not cointegrated with $x_t$, we have
$$\Delta y_t = \sum_{j=1}^{p-1}\phi_{yj}\Delta y_{t-j} + \sum_{j=1}^{p-1}\phi_{xj}\Delta x_{t-j} + e_{1t},$$
$$\Delta x_t = \sum_{j=1}^{p-1}\tilde\phi_{yj}\Delta y_{t-j} + \sum_{j=1}^{p-1}\tilde\phi_{xj}\Delta x_{t-j} + e_{2t},$$
so that there is no error correction term in the VAR system.
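The cointegration test and the ECM fit together in the Engle-Granger-style two-step procedure, which can be sketched as follows (my own illustration; the DGP is arbitrary, and a proper test would compare the residual-based statistic against Engle-Granger critical values rather than eyeballing its size):

```python
import numpy as np

rng = np.random.default_rng(12)
T, gamma = 2000, 1.5

# Cointegrated DGP: x_t is a random walk, y_t = gamma*x_t + stationary error
x = np.cumsum(rng.normal(size=T))
u = rng.normal(size=T)               # I(0) error => cointegration
y = gamma * x + u

# Step 1: estimate the cointegrating coefficient by OLS
g_hat = (x * y).sum() / (x * x).sum()
uhat = y - g_hat * x

# Step 2: residual-based unit root statistic Z_rho = T(rho_hat - 1)
num = (uhat[:-1] * (uhat[1:] - uhat[:-1])).sum() / T
den = (uhat[:-1] ** 2).sum() / T**2
z_rho = num / den
print(g_hat, z_rho)  # g_hat near 1.5; z_rho large negative => cointegration

# Step 3: ECM regression of dy on the lagged residual (error correction term)
dy = y[1:] - y[:-1]
ect = uhat[:-1]
alpha1 = (ect * dy).sum() / (ect * ect).sum()
print(alpha1)  # negative: y adjusts back toward the long-run relation
```

If the `u` line is replaced by a random walk (no cointegration), $Z_\rho$ stays small and the estimated error-correction coefficient is attracted to zero, which is exactly the no-ECM case described above.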
13 Midterm Exam

1. Find the AR order.
2. Test for Granger causality (assume y and x are stationary). Use bootstrap critical values.
3. Test for a unit root.
4. Test for cointegration.
5. Run the ECM.