TOPICS IN ADVANCED TIME
SERIES ANALYSIS
Victor Solo
Preface
These notes were prepared for a series of 20 lectures given at
the invitation of Professors Rolando Rebolledo and Guido del Pino in
July 1982 at the Primera Escuela de Invierno de Probabilidad y
Estadistica held at the Universidad Catolica de Chile. I thank them
for an enjoyable and useful time in Santiago.
One aim of these lectures is to provide a discussion of some
issues not easily found elsewhere. It is hoped some parts of the
notes will benefit both theoreticians and practitioners. I have
tried as much to put ideas across as to give technical detail: thus
where appropriate scalar and bivariate time series have been used to
illustrate issues and motivate the general multivariate case.
I would like to thank Narie Sheehan who did an
amazing job of typing an enormous amount of material
in a very short time.
i.
CON £ENTS
INTRODUCTION
I.A Motivation
I.B Some Ideas on Time Series Modelling
I.C A Selection of Time Series and Econometric
Models
APPENDIX A.I Matrix Calculations
page
171
2. SOME PROPERTIES OF TIME SERIES MODELS
2.A Some Aspects of the Spectrum
2.B Minimum Phase and Minimum Delay
2.C Spectral Factorization
179
3. REVIEW OF PREDICTION I
3.A Prediction: General Ideas
3.B The Wiener Filter
3.C The Levinson, Whittle Algorithm
APPENDIX A.3 Projection
185
REVIEW OF PREDICTION II
4.A The Kalman Filter
4.B The Output Statistics Kalman Filter
4.C The Kalman Filter for an ARMA Model
4.D Likelihood Functions for Time Series
192
5.
6,
THEORY OF IDENTIFIABILITY I
5.A Introduction and Examples
5.B Basic Ideas
5.C Identifiability and Estimability
IDENTIFIABILITY II
6.A Simultaneous Equation Models
6.B Examples
6.C Local Identifiability
APPENDIX A°6 A Basic Lemma of Identifiability
197
201
7.
8,
9.
10.
11.
12.
13.
168
IDENTIFIABILITY III
7.A Identifiability and Kullback-Liebler Information
7.B Consistency and Identifiability
IDENTIFIABILITY OF SOME TIME SERIES MODELS
8,A ARMA Models
8,B Transfer Function Models
APPENDIX A.8 Basic Lemmas in Time Series Identifiability
LAGS IN MULTIVARIATE ARMA MODELS I
9.A Lags, Kronecker Indices and McMillan Degrees
9.B Determination of AR Parameters
9oC Construction of a State Space Form
APPENDIX A.9 Matrix Polynomials
LAGS IN MULTIVARIATE ARMAMODELS II
10.A Matrix Fraction Descriptions
APPENDIX A°IO The Predictable Degree Property of Row Reduced Polynomial Matrices
MARTINGALE LIMIT THEOREMS
II.A Martingales
II.B Martingale Central Limit Theorem
APPENDIX A.II Basics of Martingales
APPENDIX B.II Some Useful Results in Probability
APPENDIX C.II A Strong Law of Large Numbers
LEAS TSQUARES ASYMPTOTICS IN REGRESSION MODEL~
12.A Introduction
12.B Regression Without Feedback
12.C Regression With Feedback
12,D A General Central Limit Theorem
APPENDIX A.12 Convergence of a Series
LEAST SQUARES ASYMPTOTICS IN AR AND ARX MODELS
13.A Consistency in AR Models
13.B Central Limit Theorem in AR Models
13.C Consistency in ARX Models
210
214
224
237
243
255
263
14.
15.
169
LEAST S~UARES ASYMPTOTICS FOR ARMA MODELS
14.A Preliminaries
14.B Consistency of Least Squares
14.C A Central Limit Theorem
APPENDIX A.14 Consistency of Sample Autocovariances
ASYMPTOTICALLY EFFICIENT ESTIMATORS IN ARMA MODELS
16.
17.
18.
19.
20.
15.A The One-Step Gauss Newton Scheme
15.B Adjoint Variables for Quick Gradient Calculation
HYPOTHESIS TESTS IN TIME SERIES
16.A Lagrange Multiplier Tests
16.B Testing for Autocorrelation in a Regression
16.C Choice of Order and AIC
IDENTIFIABILITY OF CLOSED LOOP SYSTEMS I
17.A Introduction
17.B Basic Issues in Closed Loop Identifiability
17.C Identifiability of the Forward Loop
IDENTIFIABILITY OF CLOSED LOOP SYSTEMS II
18.A Identifiability of the Closed Loop
APPENDIX A,18 Closed Loop Models from Spectral Factors
APPENDIX B.18 Nongeneric Pole Zero Cancellations
LINEARLY FEEDBACK FREE PROCESSES
19.A Introduction
19.B Linear Exogeneity
19.C Linear Feedback Free Processes
TESTS FOR ABSENCE OF LINEAR FEEDBACK
20.A Tests for Linear Feedback
20.B Identifiability and Weak Linear Feedback
270
280
288
295
303
308
315
References 320
NOTATION
AR
ARX
ARMA
ARMAX
a(L), ~(L)
b(e), ~(L)
~,b
c(L)
6
~k
e k
f
f (~), ~(~)
F
L
m
P
Pi
R -S
SEM
Z
S k
r(e)
v k, w k
x k
-Yk' Yk
z k , Z k
z~ Z
autoregression
autoregressive, exogenous
autoregressive moving average
autoregressive moving average exogenous
AR polynomials; a(L) = E~ a i L i etc.
exogenous polynomials
coefficients on regressors
noise or disturbance polynomial
McMillan degree
white noise
prediction error, innovations, residual
frequency; ~ = 2~f
spectrum
coefficients on endogenous variables
lag or backwards operator
dimension of Z, Y
dimension of ~,
order of AR; maximum Kronecker index
Kronecker indices
state covariance matrix; reduced term parameters
autocovariance sequence
simultaneous equation model
covariance matrix
s igna 1
rational transfer function
correlated noise sequence
state vector; regression vector
observed sequence
exogenous or input sequence
z transform or forwards operator
LECTURE i: INTRODUCTION
IA. Motivation
In many areas of statistical application such as economics, business,
engineering, earth sciences, environmental sciences, data are collected
over time; typically quarterly, daily, hourly, in seconds etc. Often sucb
data is available on several related variables. An example is the daily
measurement of dissolved oxygen concentration and biochemical oxygen demand
in a river as indicators of water quality. Another example concerns the
study of the relation between in-migration and out-migration from a region
of a country as they relate to economic variables such as capital growth.
There are two basic reasons for analyzing the data jointly as opposed
to singly:
(i) To understand the dynamic relations among the series. The effect of
a change in one series may show in another series simultaneously or
over a period of time and with varying amplitude of effect. Some
series may lead others and there may be feedback connections between
them.
(2) To provide more accurate forecasts. The joint modelling allows the
use of joint information; intuitively this enables forecast errors
to be reduced.
Note.
IB.
See also Tiao and Box (1979).
Some Ideas on Time Series Modelling
There seem to be three threads in the time domain modelling of
time series. One old one, one more recent and one just developing.
(i) A traditional view
(ii) The view popularized by Box and Jenkins
(iii) A very new idea
172
(i)
form
In the traditional view the data YI' "''' Yn is modelled in the
= + ~t + vt Yt Tt
where T t is a trend such as a mean or a polynomial; ~t is a seasonal com-
ponent e.g., a sum of sinusoids of various frequencies; v t is a correlated
noise sequence or disturbance sequence of bounded variance. The trend is
supposed to capture the "smooth" part of Yt; the seasonal component captures
the "regular" part; while the noise captures the "rough" or unpredictable
part. The model is fitted by a combination of regression and smoothing to
remove the trend and the seasonal component. The noise is modelled as a
stationary process. In general, the noise Vl, ...,v n is one realization
(i.e. a sample of size I) from a multivariate distribution with covarianee
1 matrix having ~n(n+ i) elements. The stationarity assumption reduces this
to n elements (which puts the analysis in the realm of classical nonpara-
metric statistics). Finally, finite parameter models (AR, MA, ARMA) reduce
this to a fixed dimension. A careful discussion of this methodology is the
book of T. W. Anderson (1971).
(ii) The idea here is to model time series in terms of random
wandering (RW) models: this term which connotes random walk is preferred
to nonstationary since confusion has resulted with time varying parameter
models of bounded variance. The data is differenced to produce a stationary
model which is then fit by an AR, ~ or A~MA scheme. Typical differencing
operators are 1 - L, (I-L) 2, 1 - L4; this last one removes quarterly
effects from monthly series. Fuller (1976) has provided tests for deter-
mining how many if any factors of the form 1 - L should be employed. At
first this approach seemed to contradict the older one. More recently
(Abraham and Box (1978)) it has been realized that data containing deter-
ministic components such as sinusoids can be handled by allowing cancella-
tion of AR and MA factors with roots on the unit circle viz. the model
173
Yt = A cos wt + gt i.i
is equivalent to the ARMA (z,2) model
(I-2L cosw+L2)Yt = (I-2L cosw+L2)~ t 1.2
L is the backward or lag operator. If such a cancellation is found
(fitting must be done with an exact likelihood routine) then it was
recommended to fit the model (i.i). The fact that this is not necessary
is the substance of some very recent ideas. (Some related ideas to this
section are in Parzen (1981)).
(iii) Recently Harvey (1981) pointed out that if a unit root cancel-
lation is observed in fitting an ARMA model then the Kalman filter can be
used to produce forecasts without refitting (a model such as (i.i)). The
idea (see Lecture 4A) is that the Kalman filter applies to RW models.
Actually, the standard stability theory of Kalman filtering does not
apply to unit root cancellation models. Fortunately, recently Goodwin
and coworkers working independently (Goodwin et al. (1982), Chan et al.
(1982)) have extended the stability theory to deal with just such an issue.
There are still some issues to be resolved. There is another related
issue, since the Kalman filter does not require a stable model for its
validity there is no reason why general AR~ models cannot be fitted with
unit root factors in the AR, MA terms (some cancelling, some not). In
particular orders of differencing would not need to be specified, just an
overall model order. These ideas need some sorting out: some recent
pertinent work is Tiao and Tsay (1981).
IC. A Selection of Time Series and Econometric Models
Some models are listed here togehter with some miscellaneous comments.
It is sometimes useful to write the models in the general form
data = signal + noise.
(i) Autoregressive Model of Order i AR(p).
174
Yk = Sk + gk
s k = a(L)Y k
where L is the lag or backwards operator LY k Yk-l; a(L) = ~ aiLi = ; C k
is a white noise sequence or uncorrelated sequence E(gkS t) = ~kt ~2,
E(C k) = O. A minimal additional requirement to allow statistical inference
is that E(eklgk_ 1 ..... Cl) = O. This means that gk is uncorrelated with,
or orthogonal to, any function of the past y's; it is called a martingale
difference sequence or nonlinear innovations process. The behavior of
the process Yk is determined by the roots of the polynomial equation
zP(I+a(Z-I)) = O
zP + ~ aizP-i = 0
-i where Z = L is the Z transform or forwards operator ZY k = Yk+l" The
basic behaviors are captured by the second order model
1 + a(L) = 1 + all + a2L2.
If the roots of the equation Z 2 + alZ + a 2 = 0 are real the behavior is
geometric decay after each new "shock" gk; if the roots are a complex
conjugate pair the behavior is oscillatory. Readable accounts are given
in Fuller (]976, Chapter 2) and Box and Jenkins (1976).
(ii) Moving Average Model of Order ~ MA(q).
q L i where C(L) = ~ici "
in Lecture 2B.
Yk = Sk + gk
S k = C(L)~ k
Some issues relating to this model are discussed
Remark. Roughly speaking we can describe the difference between AR and ~
models as follows. AR models are smooth, MA models are "rough". To see
what is meant here consider the signal + noise decomposition Yk = Sk ÷ gk"
Intuitively we think as follows.
175
S k = signal = the part of Yk predictable from the past
Sk = noise = unpredictable part of v k.
This "predictability" or smoothness can be measured by the signal to noise
ratio (SNR)
SNR = var(Sk)/var(g k)
= (var(Yk)/var(Ek)) - I
viz. we have the comparison
MA(1) : Yk = Sk + 0£k-I
AR(1) : Yk = eYk-i + Ek
: SNR = 02
: SNR = 82/(1-82).
For .5 < le] < i the SNIR for AR(1) is much higher than that for the MA(1).
For higher order models it is harder to discriminate on this basis but the
idea is a useful intuitive one. Box and Tiao (1977) have mentioned SNR.
It is a co~on measure in the engineering literature.
<iii) Autoregressive Exogenous Mode=]_ ARX
Yk = Sk + ek
Sk = a(L)Yk + ~'~k
Here ~k is a sequence of deterministic trends such as a constant, polynomials,
sinusoids. Alternatively, ~'~k is of the form (or contains components of the
form b(L)Z k where Z k is an exogenous or input sequence.
(iv) Autoregressive Moving Avera&eModel ARMA
Yk = Sk + Ck
S k = a(L)Y k + C(L)g k
The appropriate method for generating the autocovariances of this model
seems to be the scheme suggested by Hwang (1978) and Wilson (1979).
176
(v) ARMA + Exogenous_ (ARMAX)
Yk = Sk + Ek
Sk = a(L)Yk + ~'~k + c(e)~k-
Two related models are the TFg (transfer function white noise) model
= (I +a(L))-Ib(L)Z k S k
and the TFARMA (TF + A~MA noise) model
Yk = Sk + Vk
= (i +a(L))-ID(L)Z k S k
v k = (l+d(L))-l(l +c(L))Sk.
One of the advantages of the ARMAX form is its linearity.
(vi) State Space or Markov Model
Yk = Sk + Vk
Sk = ~'~k
~k+l = ~k + ~k
where Vk, ~k are white noises with
Provided M # 0 there is a i -i relation between Markov models and ARMA
models. Now while the A~MA model can be written in state space or Markov
form in infinitely many ways only a handful turn out to be of practical
ase.
One particular representation is the observer form
Yk = Sk + Ek
177
S k = h'_x k
~k+l = [0Xk + ~gk
F_O =
al10 -a k 0 0
k 1
k= kp
c I - a 1
Cp - ap
Notice that v k = ~k and ~k = ~gk are contemForaneously correlated. Other
forms are discussed in Kailath (1980, Chapter 2).
(vii) Simultaneous Equation Model (SEM)
+ = ~t' i = i, ..., N
where Y is an £ × 1 vector of endogenous variables i.e. variables whose --t
relations among themselves are determined within the system (whether it be
a market, group of markets, or whole economy) being studied• Then Z is -t
an m × 1 vector of exogenous variables i.e. variables whose values are
determined outside the system being studied. Finally, ¢ is an £ × 1 -t
vector of uncorrelated noises or disturbances with covariance matrix E:
N is the sample size•
These equations are called structural equations because they represent
the structure of the system being studied• They might for example be pro-
vided by economic theory. An excellent account of econometrics exhibiting
the links between the economic models and the econometric equations is the
book by Intriligator (1978).
The equations can be collected together as
YF + ZB = E
where Y = (~i .... , YN )' is N × £, etc. If the matrix of endogenous co-
efficients F is of full rank the equations can be written as a multivariate
regression in "reduced" form
Y = Z~+W
178
R :-~[-i w:K -1
The covariance matrix of Y is ~ ~N (where ~is the Kronecker product:
see Appendix AI). The dynamic simultaneous equation model allows lagging
on the Y and Z sequences viz.
~'(L)Yt + B'(L)z = 8 -t -t
_ FI L i F'(L) = ~8 -l etc.
Further background for this model is available in Intriligator (1978) or
Theil (1971).
APPENDIX AI: MATRIX CALCUALTIONS
To deal easily with multivariate models it is useful to employ the
"vec" or stacking operator. If M is a p × q matrix with columns
t
~i ..... Mq then vec M = vec(M) = (~i ..... ~p)' ~s'pq'× 1 i.e. vec stacks
the columns of M one under another. We also need the Kronecker product
of two matrices. If A is p × q and B is r x s then A~B is the pr x qs
block matrix with i, j block entries [aijB] , 1 ! i ! p, 1 ! j ! q.
The vec operator has the following easily verified properties (we
write vec' X = (vec X)').
vec ABC = C' ~A vec B
tr(_XY_) = trace(_XY_) = vec' X vec X'
tr(_A~_Cp = vec' ABC vec D' = vec'B C QA vec D'
LECTURE 2
SOME PROPERTIES OF TIME SERIES MODELS
2A. Some Aspects3f the Spectrum
A p dimensional time series of multivariate time series is a collec-
= X p t ) ' tion of random vectors ~t (Xlt , ..., where t belongs to an index
set T. We call X discrete if T is discrete. If T = 0, i, 2, ... (where -t
the sampling interval A has been suppressed) the series is equispaced.
Otherwise it is unequispaced; if then T = {tl, t2, ...} often Xt I is de-
noted ~1 e t c .
A multivariate time series X is strictly stationary if the joint -t
distributions
[XI...X (~I ..... ~n ) = P(~l!~l ..... ~n!~n ) -n
(where e.g., ~i ! ~i means Xjl ! Xjl, j = i, ..., p) obey the shift in-
variance property
•X "'"
_tl+h'''~tn+h (~I' '~n ) = rx ...x (~l ..... Xn)
-t I -t n
for all t~,± ..., t and h c T. A multivariate time series _X t n
weakly (or wide sense) stationary if for all t,h c T
is called
c°v(~t' ~t+h ) = ~h"
Then _~ is called the autocovariance matrix of ~t" Observe that
T
_R h = R_h. (2.1)
The block matrix [_Ri_j]l!i,j< m is positive semidefinite for any m. (2.2)
To see t h i s c a l c u l a t e t h e v a r i a n c e o f an a r b i t r a r y l i n e a r c o m b i n a t i o n
m E 1 -:o~LX i . _ A p a r t of s t a t i s t i c a l t ime s e r i e s m o d e l l i n g i n v o l v e s b u i l d i n g
parametric modelling for autocovariance sequences.
180
Some further comments are worth making. To keep things clear we
work with the scalar case. The vector arguments are nearly identical.
We introduce the covariance generating function
¢(z) = ~:~' z -h ~-oo ~ (2.3)
In view of (2.1), (2.2) this has the properties
~(z) = ~(z -I) (2.4a)
~(z) is positive on i zl = I. (2.4b)
The function
oo ¢(e i~) = R 0 + 2 ~iRkCOSk~ (2.5)
is the spectrum of X t. We now investigate conditions under which the
series is finite. In fact the following square summability condition
is enough
Zt < o~. (2.6)
To see this regard ~k(C0) = cos k~ as a sequence of random variables on
the sample space {[-~, z), Borel o fields, Lebesgue measure}. They are
I uncorrelated with variance ~ : i.e.
e(~k~ ~) = F l cos i~ cos k~0 df =- 6ki; f = ~/2"~. -r 2
Thus we deduce mean square convergence of
~n(CO) = R 0 + 2 ~,~R k c o s k ~
i . e .
F [ _ j-~ I* n ¢I 2df -' 0 as n ÷ ~o.
Further this is enough to establish the inversion formula
= lim /~_~,n(L0)e-i°Ohdf : /n_~¢(C°)e-i°°hdf. n-+Oo
We simply observe that
I f~ (~n- ~)e-iWhdf[ ~ f i~n- ¢[df ~ ( / i¢ n-¢I2df)½.1 -+ 0.
2B.
181
Minimum Phase and Minimum Delay
Let gk be a unit variance white noise sequence and consider the two
MA(1) models
Yk = cOCk + Clek-1 = (Co +elL)sk
Yk = ClCk + C0Sk-i = (Cl +c0L)sk"
First observe that Yk' Yk have the same autocovariances
var(Yk) = C2o + c~ = var(Y k)
cov(Y k, Yk_l ) = c0c I = cov(Yk, Yk_l ).
The ordered pair (Co, c I) is called a wavelet (or an impulse response).
The energy in the wavelet is defined to be cg + c~. Thus the wavelets
(c o , Cl), (c I, c o ) have the same energy and same autocovariances. They
are distinguished in other ways.
A minimum delay wavelet is one which delivers its energy as early
as possible. A maximum delay wavelet delivers its energy as late as
possible. So if we take ic01 > iCll then (c o , c I) is minimum delay while
(Cl, c o ) is maximum delay.
Now consider the amplitude and phase properties of these two wavelets.
We write
The amplitude is
Ao(eiW ) = e 0 + cleiW
= c o + c I cosw + i c Isinw.
2 2 ½ A0(W) : ~IA0 (eiw) I = (c o +c I +2c0c I cos to)
• + c I cos w c I sin 0~ => ~ + i A0(elW) = X0(~)[ c0 A0(w ) ~ J
: ~o(~)e i~o6°)
where TO(W) = tan-l[cl sin~/(c 0 +c I cosw)].
182
Similarly we find
2 cos 2 AI(~) = (c0+2c0e I ~+Cl) 2 = A0(~)
• i(~) = tan-l[c0 sin~/(c l+c 0 cos~)].
Thus the two wavelets have the same amplitude but different phases. The
two phase spectra are plotted on (0, ~) (sufficient since T(~) is an odd
1 function) for the case (c0, Cl) = (i, ~)
(~) degrees
180
135
90
45
0
~/2 "~
t0-+
We observe that TI(~) exceeds T0(w). Thus the wavelet (Co, c I) is
also the minimum phase wavelet.
Finally, observe that the minimum delay, minimum phase wavelet is
characterized by being the one with all its zeros inside Izl < i. Thus
its inverse is a stable filter. When MA models are fitted it is the
minimum phase or minimum delay or stably invertible models that are fit.
An interesting application of these ideas is revealed by the all pass
filter
(c O + Clh)/(c I + CoL)
which produces the ARMA(I,I) model
(c I +c0L)Y k = (c 0+clL)C k.
Then Yk is stationary and is itself a white noise. The issue of minimum
phase/delay is important in feedback detection.
Notes. The discussion here is adapted from Robinson and Treitel (1980).
183
See also Robinson (1962) and Robinson (1967, Appendix i). The ideas here
readily extend to higher order MA models.
2C. Spectral Factorization
As mentioned earlier the covariance generating function
-k ~(Z) = ~_~E(YoYk)Z has two basic properties (2.4). The spectral
factorization problem is to represent the process Yk as the output of
a linear system driven by white noise. That is to show
THEOREM 2.1. If ~(Z) is a real rational full rank covariance generating
function then there exists a unique factorization
¢(z) = 9(z)~[*(z)
where W(Z) is real, rational, stable, minimum phase i.e.
W-I(z) is analytic in !Z 1 ~ i (2,7)
with W(~) = I. Q is symmetric positive definite. W(Z) is called the
normalized minimum phase spectral factor (NMSF) of ~(Z). Further
1 deg W(Z) = ~ deg #(Z).
Proof. A very readable (multivariate) proof is given by Whittle (1953).
With ¢(Z) as a rational function an (easier) proof is given by Robinson
and Treitel (1980, p. 228), Hannah (1970). See also Ng et al. (1977).
Remark. The significance of (2.7) is the following. If and only if (2.7)
holds then by a standard result in complex variables W-I(z) has a (one-sided)
Taylor series expansion valid in [Z 1 ~ 1 (so W-I has no poles there)
W-I(z) = E0~ CkZ-k'
to put this in more familiar terms. If and only if (2.7) holds then
W-I(z) must be the Z transform of a stable causal (=one-sided) filter.
This is the relevance of analycity; it enables us to calculate the white
noise causally stably as gk = W-I(Z)Yk"
184
If we remark that this problem is the same as finding a Cholesky
factor for the infinite Toeplitz matrix
R 0 R 1 R 2 °.•
R_ 1 R 0 ...... Too =
R_2 .........
then some of the methods used to find the polynomial of Theorem 2.1 are
understandable.
If we look at Cholesky factorization as generalized square rooting
it is not surprising that there is a Newton Raphson scheme for finding
W(Z) from @(g) which generalizes exactly the well known scheme for finding
the square root of a number namely
1 an+ 1 = ~ (a n+a/a n ) ÷ a2
see e.g., G. T. Wilson (1972) and B. D. O. Anderson (1978)•
We might also expect to find a spectral factor by doing Cholesky
factorizations of increasing order on the Toeplitz matrices
T n
R 0 ..... Rn_ 1
=
K n+ I ..... R 0
The celebrated Levinson, Whittle algorithm (Lecture 3C) finds Cholesky
factors of T-I• A similar algorithm that finds Cholesky factors of T -I n n
is given by Justice (1972).
If @(Z) is rational then the (output statistics) Kalman filter per-
forms spectral factorization (see Lectures 4A, 4B). Alternatively, we
1 ~ -k can write ~(Z) g(Z) + g(z-l), g(Z) ~ R 0 + = = EI~Z • Then g(Z) is
called positive real since Re(g(ei~)) > O. It is possible to do spectral
factorization of ~(Z) from an ARMA formula or state space formula for g(Z)°
This is accomplished via the so called positive real lemma (see Anderson
et al. (1974)). All the above discussion extends naturally to the multi-
variate case•
LECTURE 3: REVIEW OF PREDICTION I
3A. Prediction and Smoothing: General Ideas
Consider the problem of estimating one time series x from measure- -t
ments on a r e l a t e d t i m e s e r i e s I t : c o l l e c t t h e m e a s u r e m e n t s ~ i = ~ t . ' 1
i = i, ..., n (which need not be equispaced) into a vector
n ! v Y = ~1 = ( Y l ' " ' ' ' Yn )" I f t > t t h e p r o b l e m i s c a l l e d p r e d i c t i o n ; i f - - - n
t < t n it is called smoothing. The basis of most time series estimation
f o r m u l a s i s l i n e a r l e a s t s q u a r e s ( l t s ) e s t i m a t i o n . We c o n s i d e r l i n e a r
c o m b i n a t i o n s ( w e i g h t e d a v e r a g e s ) o f t h e o b s e r v a t i o n s s u c h a s
and choose the weights ~ti
estimate is then given by
^ n ~ t l n = ~I ~ t i X i
tO ensure EIIxt-~tln I12 is minimized. The lls
-Xt!n = E(xtl-Yl)
E-I = E(_xtx') y Y-
E = E(Z~'). (The wide sense projection or conditional expectation symbol -y
is discussed in Appendix A3. If the data is Gaussian it is conditional
expection .) Thus
(~tl .... Htn) = E(~tX') E-I _ -y
= {E(xtZl) ..... E(XtZ~)} ~yl.
Observe that to solve the estimation problem we need only to know the
covariance of the data y i.e. ~ and the cross covariance between the process - -y
x and the process yt. This basic point is often forgotten. There is a -t
problem with the solution as it stands in that it involves a matrix inversion.
There is however a second approach.
We use the data one step at a time as follows
186
where
~t]n = E(~tlZl ) = E(~t lZn ' n-1 X1 )
= E(xtlS n, Z~ -1)
n-i
en : Yn - E(YnlY-I ) = Y-n - -Yn[n-l"
This follows by the basic property of (wide sense) conditional expectation.
The computation of e is discussed in a moment. -n
n-i uncorrelated with v_ so
hl
Now by definition e is -n
_~tl n : ~(xtl_en ) + ~(xt n-l) ly I •
Applying this projection repeatedly yields
ei = -Yi - E(vilY[ -I); el = -YI"
(3.1)
The sequence e. is called the innovation sequence since by definition it --I
contains the new information in the data Yi not predicted from previous data
i-i -Yl Finally, note that since the innovations sequence is an uncorrelated
sequence (3.1) is
n
xtl n = E 1 E(_xtei)E_~.l e i
v
z. =E(e_e ) -z i i "
(3.2)
Actually, (3.2) is a generalized finite data Wold decomposition: we
see that it terminates if ever E. = O. -I -
It is not clear in the above calculation how the ~i are to be produced.
In fact they are produced by a Cholesky factorization of the covariance
matrix ~ as -y
E = LDL' --y -----
where L is a block lower triangular matrix with identities on its diagonal;
D is block diagonal with entries E . The e. sequence is generated from the - -i -i
y by
From this we verify that
187
(el, ...,e ) = L -I -n -Y °
E(_e I ..... _e n)(e I ..... en)' = _D = diag (~I' "''' -nE ).
The inversion of triangular matrices is very easy. Also since the
inverse of a triangular matrix is triangular we see that e I , ..., e i are
linearly equivalent to -YI' "''' ~i i.e. the e i and Yi are causally equiva-
lent. Further consequences of these ideas are expanded on in various
articles by Kailath (see e.g., (1974)). It should be emphasized that
these formulae apply whether x evolves in continuous or discrete time. -t
Now given a covariance matrix there are standard numerical techniques
for generating Cholesky triangular decompositions (see e.g., Stewart (1973)).
However, the special structure of Y~ that arises in time series models means -y
that more efficient methods can be found.
There are basically two ideas involved.
(i) Stationarity. If the -Yt process is stationary then the covariance
matrix E is a Toeplitz matrix -y
~0 ~n- 1 =
-y
~-n+l "'" 20
The Levinson, Whittle algorithm (Lecture 3C) generates th~ inverse
-i triangular factor L by using this Toeplitz structure.
(ii) Markov Structure. The Kalman filter (Lecture 4A) generates the
inverse factor L -I directly even for nonstationary models by using
the Markov structure of Zt.
Finally, it is possible to combine these structures and produce an
algorithm for stationary Markov processes that improves on (i), (ii).
This is the so-called Chandrasekhar algorithm (see Lindquist (1974) and
Sidhu and Kailath (1974)) and Rissanen (1973a, 1973b). The aut:~or has
recently used this algorithm to construct likelihood functions (Solo (1982)).
There is one other general point worth making here. If we deal with
188
processes ~t' Zt having a joint Markov structure then the problem of
minimizing E[Ixt-~tlnll 2 can be converted to a variational problem. This
point which enables the estimation problem to be cast as a regression
problem has been discovered by a number of authors: see e.g. Rauch et al.
(1965), Sage (1968), Kimeldorf and Wahba (1970), Duncan and Horn (1972),
Paige and Saunders (1977), also Silverman (1976). The variational view
underlies the relation between splines and time series.
3B. Wiener Filter
Using the above ideas it is possible to give a very simple derivation
of the discrete time Wiener filter formulae (cf. Anderson and Moore, 1979,
p. 255). For simplicity we take the scalar case. Suppose ~, Yk are
jointly wide sense stationary. Consider the infinite data problem where
the infinite past of Yk is available. Introduce the autocovariance gen-
erating function of Yk by
~y(Z) = ~7~E(YoYk)Z-k"
Now the quantity that corresponds to the Cholesky factor of ~ is the -y
spectral factor of ~y(Z). According to Theorem 2.1 ~y(Z) has a factoring
~y(Z) = W(Z)W(Z-I)d 2 where W-I(z) is analytic in IZl ~ i (W(m) = l) so
that W-I(z) is a causal operator. Thus we can introduce the innovations
sequence
~k = W-I(Z)Yk"
We rapidly check that the autocovariance generating function for V is
E ~ z-k ~(Z) = -~E(VkV 0)
= ~ W-I(z)E(YkY0)W-I(z-I)z -k
= W-I(Z)~y(Z)W-I(z -I) = 02 "
This confirms that ~k is a white noise sequence. Next by the generalized
189
Wold decomposition
~I k = £(XklYkYk_ 1 -..) = E(XklVkVk_ I ...)
= E 0 E(Xk~k_j)Vk_ j
oo
= E 0 E(Xo~ j)~k_ j by stationarity.
Introduce Dj = E(Xo~ j) then
co co --"
~Ik_ I = Z OD j~Jk_ j = Z 0 DjZ J~k = D+(Z-I)vk
where the generating function D(Z) has been written as a sum of causal
D+(Z -I) and non causal D_(Z) parts viz.
D(Z) = Z °o D.Z -i = E°°D Z -j + Zl D .Z j - ~ ] 0 j - . l
= D+(Z -1) + D_(Z).
^ D+(Z-I) That is to say the Z transform linking Dk to Y~ik_ 1 is . Thus
the transform linking Yk to ~ik_ I is
Xklk_ 1 = D+(z-l)~jk
= D+(Z-I)w-I(Z)yk ,
Finally, we need to obtain D+(Z -I) in terms of more easily available
quantities: observe that
co -j D(Z) = E_~E(XoV j)e
= Z °o E(XoW-I(z)y_j)z-J -co
= w-l(z)¢xy(Z -I)
Cxy(Z -l) ~ -j where = Z_o E(XoY j)Z is the cross covariance generating function.
Thus we obtain the celebrated formula
^ c ~x (Z-l) Yk
Xklk-i = I i~W(Z) }+ W(Z) °
t90
3C. Levinson, Whittle Algorithm
The Levinson algorithm is a means of producing the inverse Cholesky
factor of the data covariance matrix when the process is stationary.
There are a number of ways of deriving the algorithm (each having their
advantage). The idea is that we can produce the triangular factor by
fitting increasing order autoregressions.
An important aspect of the Levinson algorithm is the fact that the
AR polynomial it produces is stable. In fact this idea can be used in
reverse to produce a stability test for an arbitrary polynomial. It is
the Cohn criterion (see Veiera and Kailath (1981)).
A derivation of the Levinson algorithm using properties of Toeplitz
matrices is available in Anderson and Moore (1979, p. 258); also Robinson
and Trietel (1980, p. 1963). A derivation using projection arguments is
given by Findley (1981). A discussion of the algorithm from the lattice
point of view is given by Makhoul (1978).
APPENDIX A3: WIDE SENSE CONDITIONAL EXPECTATION
A basic role is played in the linear estimation theory by the following
elementary considerations. Let X, Z be two zero mean correlated random
vectors. Suppose initially they are jointly Gaussian. Then the following
conditional mean expression is well known (and easily derived)
E(ZIX) -I = EZX ~XX X
ZZX = E(ZX'); ~XX = E(XX').
Now suppose X, Z are not Gaussian. We introduce the wide sense conditional
expectation
~(zlx) -1 = ~ZX ExxX"
The label conditional expectation is justified in view of the basic prop-
erties of
E(Z-E(ZIX))X' = 0 (A3.1)
191
This is called the orthogonality condition.
~(zlx, Y) : ~(zlx) + ~(z[~)
~(z]x, Y,w) = ~(z[x, Y,~y)
= Y - E(YIX); Wy = W - E(WIY).
Also E can be viewed as a projection. Denote by R(X) the column space of
X i.e. R(X) consists of all vectors that are linear combinations of columns
of X. Then
EI]Z-qll 2 = E(ZIX) : r]C R(X). (A3.4) arg m~n
That is E(ZIX ) is the projection of Z onto R(X). We can reinterpret
(A3.2) as showing that the projection of Z onto R(X, Y) can be accomplished
by first projecting onto ~(X) and then onto that part of R(Y) that is
orthogonal to R(X).
Remark I. It is a useful exercise to derive the following well known
matrix inversion formulae by using (A3.2)
-i
~>
Z = A + BCD
= A -I _ A-IB(DA-IB +C-I)-IDA -I
= A -I _ Z-IBcDA -I
= A -I _ A-IBcDZ -I
C -I _ DZ-IB = (CDA-IB +I) -I.
If Y, W are other random vectors
(A3.2)
(A3.3)
Remark 2. If X, Z are not stochastic we can still define E as
E(XIZ ) = (XZ')(ZZ')-Iz
provided the inverse exists. The properties derived above continue to
hold, This definition of E is useful in regression analysis,
Notes. Discussions of wide sense conditional expectation are given by
Doob (1953) and Parzen (1967).
LECTURE 4: REVIEW OF PREDICTION II
4A. The Kalman Filter
where
Here we suppose that x, y are related through a joint Markov model
_Xk+ 1 = Fk_X k + w k
-Yk = h k ~ + Vk
[ _wk] _v k
] _N k -Nk
(4.1)
Denote the data Y-I .... 'Y-k by _yk. Now apply the basic projection lemma
to see
~ k-I = E(-~+ii-ek'-Yl )
-~k ~k ~klk-1 -Yk ~Zkl k-1 . . . . Yl )"
(4.2)
We deduce then
-~+llk = -~+11e-i + ~(~+iiek )
T
= E(_~k+le k)
~k = E(eke k).
To proceed further we need to use the Markov property
-Xk+11k-1 -k+l Yl )
~ k-i = E(Fx, +w, Yl ) --E -K I ±
= FkE(XklY k-l)
(4.3)
(4.4)
(4.5)
(4.6)
193
= Fk_Xk i k_l
Thus we obtain the first part of the Kalman filter
(4.7a)
since m~Yl Wk ) = O.
k+llk = rk Ik-i +
In the sequel replace the subscript k+llk by k+l. Introduce the error
covarianee matrix
!k = E(~k- ~k ) (~k- ~k )'
= ~k - ~k
= A ^, ~k (4.8) ~k E(~kXk )' = E(XkXk )
Take covariances in (4.7a) to see
^ ! --1 T
~k+l = ~k~kgk + -~k -~" (4.9)
On the other hand taking covariances in (4.1) gives
v
~k+l = ~k~k~k + ~k" (4.10)
Thus from (4.9), (4.10) we find on subtraction
--1!
~k+l = ~k~k~k + ~k - ~k ~k -~" (4.7b)
We need to add that
!
~k = E(-Xk+l~k)
!
= E(FkXk+Wk)(Zk + (_Xk-_~)'_~)
= ~k~k)k + ~k (4.11a)
as well as deducing
!
~k = ~k~k)k + -~ (4.11b)
The Kalman filter is the equation set (4.7), (4.11) (Kalman (1960).
Remark i. If ~ obeys a continuous time model rather than the discrete
one (4.1) the same argument will produce the so-called continuous discrete
194
filter (see Jazwinski (1970)).
Remark 2. An important aspect of this setting is that stationarity is
not assumed so unstable models are allowed. A very simple discussion of
the behavior of the filter in such cases, using only mean square ideas is
given by Aasnes and Kailath (1974). This contrasts with complicated tradi-
tional analyses (Jazwinski (1970)).
Remark 3. It appears that to operate this filter it is necessary to know
the quantities Q, N, R and much work has been devoted to methods for esti-
mating these off-line as well as on-line. This is unfortunate since these
quantities are not identifiable nor in fact are they needed to operate the
filter. This is discussed in the next section.
4B. The Output Statistics Kalman Filter
Return to equation (4.5) and observe
T
~k = E(~k+lek)
v ~ v
= FX_k+l(Yk -_Xk~ k)
v
= E(Xk+iZk) - Fk~khk.
Similarly we find
(4.12a)
v ~A
-k % = E([kZk) - _hkPkh k._ _ (4.12b)
The so-called output statistics Kalman filter consists of equations (4.7a),
(4.9), (4.12). It is clear that to operate the filter what is necessary is
the variance sequence of yk, the cross variance between the state x and the
output or observation Z and the transition matrix F. These quantities can
be obtained in a model fitting exercise (see Solo (1982)).
Notes. The output statistics Kalman filter is due to Son and Anderson (1971).
The use of covariance information to produce a filter is due to Rissanen and Barbosa (1964).
195
4C. Kalman Filter for an ARMA Model
Now suppose the Markov model comes from a scalar ARMA model. Suppose
in particular F = ~0' the observer form matrix. It follows immediately
by inspection that if we join
Yk = ~'~k + ek
to (4.7a) we can write
- - ... = e k + + ,.. + c e (4.13) Yk alYk-i - apYk-p el,k-lek-i p,k-p k-p
where c.j,k_j (Kj,k_j -aj)/Zk_j. This exhibits the Kalman filter as a
time varying ARMA model. It is an interesting exercise to show that
Ci,k_ i = E(Wkek_ i) ~ E(Wkgk- i) = c i
w k = (i -a(l,))y k.
Notes. This filter form was first derived by Rissanen and Barbosa (1969).
A simple, direct derivation is given by Aasnes and Kailath (1973).
4D. Likelihood Functions for Time Series Models
Assuming a Gaussian distribution for the data [, the log likelihood
function is
1 1 y, -i log L n = -~ log l~y I - ~ _ ~y
= E(yy'). Now we can simplify this by using the Cholesky n where y : Yl; ~y
factorization to see
i ~n 1 n ~ -i log L = -~ L 1 log I~il ~ Z 1 - !i~i ~i" (4.14) n
Another way to obtain this is to observe that since the likelihood
is just the joint distribution of the data we have by the definition of
conditional density
L n = f(Zl ..... Yn ) = f(Y~)
196
f(ynly~-l) n-I = - f(Yl )"
However since yn is Gaussian log f(ynl n-i - Yl ) must have the form
1 ] e' E-le -~ log ]~nl - ~ -n-n -n
mince -n ~ = E(~n~ ) and ~n = In - E(!nlY . Now iterating the conditional
calculation gives the form (4.14). The advantage of (4.14) is that we can
generate ~i' -iZ" from the Kalman filter: see Gardner et al. (1980); an alter-
native method using Chandrasekhar equations is discussed in Solo (1982).
LECTURE 5: THEORY OF IDENTIFIABILITY I
5A. Introduction and Examples
The identifiability problem refers to the fact that two different
parameter values in a probability model may give rise to the same distri-
bution and hence be indistinguishable. The issue arises especially in
econometric modelling of simultaneous relations (such as demand and supply
relations) and in statistics in the analysis of variance. The problem is
illustrated with two simple examples.
Example (i). Econometrics
Consider a pair of equations to describe the demand and supply of a
qt = 60 + ~iPt + gt demand
qt = $0 + 6'iPt + g't supply
qt = quantity sold at time t
Pt = price of the good at time t
~t~ ~ are white noises. Now clearly an attempt to estimate the ~'s by t
linear regression will return only two values b0, b I. The problem is that
the two equations
E(qt) = ~0 + ~l(Pt )
T t
E(qt) = 60 + BI(P t)
are identical. So DO, BI; BO, B 1 cannot be distinguished. The way to
resolve the problem is to build a larger model incorporating additional
exogeneous variables i.e. variables describing goods whose prices are
determined in other markets vizo,
E(qt) = ~0 + ~l(Pt) + ~2(rt )
single good
198
I t v
E(qt) = 80 + BI(p t) + ~2(ct)
where c is the price of a raw material used to produce the good while r t t
is the price of a complementary good. In larger systems of equations we
can see that the basic issues involve the existence of linear dependencies
between the structural equations. A readable discussion (on which this
example was based) is in Theil (1971, p. 446-449).
Example (ii). Analysis of Variance
Consider the following one way classification model
Y.. = ~ + ~. + g.., i = i, 2; j = i, ..., n 13 I l j
where Y.. denotes the response to treatment i on the jth trial, ~. measures lj l
the effects of treatments, ~.. are noises. The aim is to compare the treat- lj
ments through ~I - a2" The overall mean effect ~ is not identifiable or
estimable from the data.
5B. Basic Ideas
In the ensuing discussion it may be helpful to think about the identi-
fiability problem form two points of view. (i) What conditions are needed
on the parameters of a model to ~nsure it can be simulated with prescribed
properties, (ii) How much can the data reveal about parameter values.
To formalize the identifiability issue recall the definition of a
model. Suppose y denotes some time series data. A model M for ~ is a
family of probability distribution Po(y) or P(ylO) indexed by a d-dimensional
_ c ~d; IR@ is an parameter Q. The parameter belongs to a parameter space IR e
open set. A member of M i.e, a particular probability distribution is
called a structure or (a parameter value) and denoted M@ (or ~). (The
bar under y, @ will be dropped when no confusion results.)
DEFINITION i. Two structures M6, M@, (or parameter values Q, 0') are
called observationally equivalent if
Po(y) = e ~ , ( y ) .
199
DEFINITION 2. (Global Identifiability). The model M is identifiable
at O 0 (or MOO i s i d e n t i f i a b l e ) (or 00 i s i d e n t i f i a b l e ) i f
P0(y) = Pg0(y) => 0 = 80
i . e . t h e r e i s no o t h e r 9 ¢ IR e which i s o b s e r v a t i o n a l l y e q u i v a l e n t to 90 .
DEFINITION 3. (Local Identifiability). The parameter value O 0 is locally
identifiable if there is an open neighborhood of @ 0 containing no other
9 c IR 9 which is observationally equivalent to 80 .
Often we are only interested in specific functions of the parameters
(e.g., ~I - ~2 in the analysis of variance).
DEFINITION 4. A function g(0) is identifiable if
PgI(y) = P92(y)
=> g(91) = g(92)
i.e. observationally equivalent structures give the same value to g(!).
There are now two basic approaches to identifiability, one via the
theory of estimable functions, the other through Kullback-Liebler information.
5C. Identifiability and Estimabili~
DEFINITION 5. A function g(0) is called estimable if there exists a
function of the data ~g(y) with g(9) = Eg(~g(X)) = f~g(y)Pg(dy). The
use of this notion appears in the following result.
LEMMA 5.1. A function g(9) is identifiable if it is estimable.
Proof. Let (y) (y). Then Pgl = P92
g(01 ) = E g l ( ~ g ( y ) ) = f ~ g ( y ) P 0 1 (dy)
f~g(Y)P92(dY) = g (92) .
200
Remark i. This le~aa forms the basis of the common intuitive method of
investigating identifiability.
Remark 2. In linear models the converse holds as follows
LEMMA 5.2. In the linear model y = ~ + 8, a ~ N(O, I), ~ = XS. Then if
g(~) = £'~ is identifiable it is estimable.
Proof. The function c'@ is clearly identifiable iff it is uniquely deter-
mined by ~ since ~ determines P@(Z). That is there exists a with £'2 = a'q.
Now however ~'G is estimable by @g(Z) = ~'l"
Notes. Lepta 5.1 generalizes a lemma in Rothenberg (1971). Estimable
functions were introduced into econometrics by Richmond (1974). Lemma 5.1
provides a very powerful method for generating identifiability results.
Lemma 5.2 is in an unfortunately forgotten paper by Riersol (1963). The
theory of estimability in linear models is clearly discussed in Scheffe
(1959). Some application is now given to SEM that will show how these
le~m~as are used as well as motivate some general theorems later on.
Remark 3. To apply the above theory we will usually require that all
distributions are multivariate Gaussian (MVG). Since a MVG is determined
by its mean vector and covariance matrix we can avoid the MVG assumption
by instead defining observationally equivalent structures as those whose
means and covariances agree. With this in mind the above formulation
will be still used.
LECTURE 6: IDENTIFIABILITY OF ECONOMETRIC MODELS
6A. Simultaneous Equations Mode~l
Consider the SEM
YF + ZB = E
where Y is N x %, F is i x %, Z is N × m, B is m x ~. E has zero mean,
covariance ~Q I. The parameters are ~' = (vec' [, vec' B). The reduced
form is the multivariate regression
Y= ZI~+U
!:-Bf-I [= ~f-l.
Now apply Lemma 5.1 to the equalities
E(Z,Z)-Iz,y =
E(! [)(x-i)' = z = E(<U')
where Y = Z~ = Z(Z'Z)-Iz'Y. We deduce that the reduced form parameters
If, E are identifiable. Also observe that they determine P@(Y). Now the
idea is to obtain part or all of [, B from ~. We may need restrictions
on @ to ensure this can be done. We write the restrictions generally as
_~_~ = $, @, g known. (6.1)
The restrictions might typically be that 8. = 0 i.e. e! 8 = 0, l 0 -I 0 -
e. = vector of O's with 1 in i position. To (6.1) we add the relation -i
HF = -B or
I~®~ vecr :-vec B. (6.2)
To summarize (6.1), (6.2) we can write
202
[A] where A = A(]I) = (-19~ ~ ~ : Igm).
ability of a particular linear function say t'_@ = g(_G).
THEOREM 6.1. The function t'9 is identified iff
Rank(A' : 0' : t) = Rank(_A' :_~').
Now let us investigate the identifi-
(6 .4)
Proof. By definition t'@ is identified if
PoI(Y) = P02(v) :> t'61 = t'~ 2.
Now P01(Y) = P~2(Y) ~=> ~i = ~2 => ~(~i ) = ~(~2 ) = ~ say. Thus we have
0 !2"
Now by Lemma A6.3 ~ '~1 = £ '~2 i . e . ~ '~ i s unique i f f (6 .4) ho lds .
Remark. This theorem is given in terms of reduced form parameters. It
is more useful (especially for simulation) to restate it in terms of
structural parameters.
THEOREM 6.2. Introduce M = (I£ ~If' : I~ ~B'). Then t'~ is identified
iff
Rank(M~' : M~) = Rank(M~'). (6.5)
Proof. Observe that, on partitioning 0 appropriately
W =
01 0 2 I 0M'
:l£m I T +2
(6.6)
T =
~ ® r -1 ] _ : 0
J
203
Also, T is nonsingular and
T-i: M' £m
Now by Theorem 6.1 t'0 is identified iff
Rank(W':t) = Rank(W')
i.e.
Rank = Rank(W). T
Now multiply on the rhs by T -I (this does not change the ranks). Thus
(6.4) holds iff
Rank
0 : I
¢H' : 9 2
t'M' : t 2
= Rank 0 : I I.
[~M' @2 )
That is iff
I @M'
Rank i t'M' = Rank <¢M')
i.e. iff (6.5) holds.
These results can now be specialized to obtain identifiability of
v individual parameters. The ith element of 0 is 0 i = _ei0, i = i, ..., £(£+m).
From W form the matrix W. obtained by deleting the ith column; call the ith 1
column w.. Then -I
THEOREM 6.3. The ith element of @ is identified iff
Rank (W)_ = Rank (W i)_; + i.
t Proof. By Theorem 6.2, @i = -l-e'0 is identified iff ]%_ ) (A' :~')I_ = ~i'
! t i.e. (Wilwi)A- = -le" i.e. iff~'.l_~ = 0 and w_'il = I. But this occurs iff there
is no solution to Wi_ ~ = w i i.e. iff the ith column is linearly independent
of the remaining columns.
204
If we call m. the ith column of M we have
THEOREM 6.4. The ith element of ~ is identified iff
Rank (M}' : mi) = Rank (M~').
Proof. Straightforward.
From Theorem 6.3 it follows by contradiction
THEOREM 6.5. The whole vector @ is identified iff W has full column rank
i.e. Rank (W) = ~(~+m).
For structural parameters the result is
THEOREM 6.6. The whole vector ~ is identified iff }j~' has full column
rank i.e. Rank (M4') = ~2.
Proof. It follows from equation (6.6) that:
Rank (W) = Rank [ 0
[ < 4M'
l~m] .
4 2 ]
Then @ is identified iff Rank (W) = ~2 + ~m i.e. iff
Rank (M~') = ~2.
Notes• The above discussion is based on Richmond (1974). See also
Hsiao (1980).
Remark. Derivation of classical result where the constraints are on the
parameters of a particular equation. Here ~ takes the form
4 = ii 0 °°
~g
• 0
0 0 I • . ~g is mg x
Sg is m x m 1 g
0 " O~
then
205
W =
0
~i 0 I
0 ~r 0
0
Cg " ;g
" 0
0
I
SO
Rank = Rank ¢ ¢g Sg
Now apply Theorem 3 by removing % + m columns from W corresponding to the
gth column of v, gth column of B. We find that the (parameters of) gth
equation (are) is identified if
Rank W = Rank "~ : I
leg : ~g =0+~+m.
In terms of structural parameters this is
Rank (F'~)g + B'~'g : I) = Raak (F'¢'g + B' )
<=> Rank ~g ~ = f,
6B. Examples
The following example is based on Hsiao (1980).
equation system (m = ~ = 2)
[Ylt Y2t]I BII ~:i11 + [Zlt z2t]I vFYII
$21 B22-I L'2i
Consider the two
~I~ = (Clt E2t) Y22 ~
The parameters are
206
(vec' g :vec' B) = [711 721 712 722 : BII 821 912 ~22 ].
Clearly normalization restrictions are ~ii = 1 = ~22" For the identifi-
ability of the gth equation we require
[Sg g] x 4 matrix, Clearly we must have where :$ is an mg × (~+m) = mg
m ~ 2 to allow identifiability at the gth equation. This entails adding g
one constraint to the normalization constraint: e.g., 721 = 0 => equation i
is not identified; equation 2 maybe, We rapidly find
(as expected)
Rank [@i :@l]I'~ = Rank [0010] IBI = I < 2
o p Rank [@2 :$2 ] = Rank 0 0 ~
= Rank ~ 721 T221 = Rank ~ 0 7221 = 2
hg21 g22 i Lg2] I j
So equation 2 is identified. We can also consider doing this through
cross-equation restrictions. We have
M = F ' 0 : B' 0 I
F' : 0 B'
and require Rank (MS') = Rank (SM') = L2 = 4o For this we need at least
4 restrictions; there are 2 normalization restrictions leaving 2 independent
conditions viz., try YI2 = 0; Yll + 721 = 0. Then
S =
li ° 0
0
i
0 0 I i
0 0 0
I 0 0
0 0 0 °°il 0 0
0 0
0 0
Then
207
Rank (}M') = Rank
I ~!i 612 0 0 ;
0 621 822
0 YII YI2
LYII+Y21 YI2 +Y22 0 0
0 621 = Rank = 4
0 YII
Y12 +Y22 0
Thus both equations are identified.
6C. Local Identifiability
Guided by the theorems of 6A we can formulate some general theorems
for nonlinear situations. Suppose we are interested in a function t(~)
of parameters 0 describing a probability model P@(Y). Suppose there are
some "reduced form" parameters ~ that determine Pe(Y) and a relation
a(9, ~) = 0 is known. Also, 9 is known to obey some restrictions ~(2) = O.
Then the identifiability problem becomes one of finding under what condi-
tions t(2) is unique for each ! satisfying a(0, N) = 0, ~(~) = 0. The
following result intuitively agrees with Theorem 6.1.
THEOREM 6.7. (Richmond (1976)). Suppose
Rank -- :- De ~
is constant in a neighborhood of 20.
at g0 iff
~a' ~' Rank -- :--
~e 0 ~_0 0
Then t(0) is locally identifiable
= Rank -- : - -
Proof. See Richmond (1976).
208
Remark i. From this result, results analogous to Theorems 6.2 -6.6 can be
developed. See Richmond (1976).
Remark 2. Hsiao (1980) shows how Theorem 6.7 can be used to deal with
identifiability for errors in variables problems.
APPENDIX A6: A BASIC LEMMA OF IDENTIFIABILITY
First recall some basic results in the solution of linear equations.
LEPTA A6.1. (Fredholm Alternative).
p × 1 vector b
but not both.
For any p × q matrix A and any
either
or
Ax = b has a solution x
~X with ~'Z = 0 and b'y = 1
Proof. This is a classical result. An excellent intuitive discussion of
this and many other topics in applied linear algebra is given in the book
of Strang (1974). A short proof is now given.
q xia i = b. This Call ~I' "'''-qa the columns of A_ then Ax = b says E 1 _
says b is a linear combination of columns of A. That is Ax = b has a
solution iff b c R(A) = column space of A = space spanned by columns of A.
Now by the fundamental theorem of linear algebra (Strang (1974, p. 69, p. 81))
either b c R(A) or b c N(A') = null space of A' = {v :A'y = 0} in the second
case choose y e N(A') and scale it so ~'Z = i.
There is another classical condition for solution of equations
LEMMA A6.2. Ax = b has a solution x iff
Rank (A : b) = Rank (A) = dim (R(A)).
Proof. Ax = b has a solution iff b e R(A) which is what the condition says.
209
Finally, the following result, which is used in the theory of
estimability of linear models in another guise, is basic to the identi-
fiability results.
LEMMA A6.3. (Richmond, 1974, 1976).
x satisfying Ax = b iff
The function t'x is unique for each
i.e. iff
~_y ) A'y = t
Rank (A' : t) = Rank (A').
Proof.
(i)
(ii)
If 3 Y ~ ~'! = ~ then for each x~ Ax = b, ~'~ = !'~ = ~'~ which
does not depend on x.
If ~'x is unique observe that if ~0 is a particular solution of
_Ax_ = b then all solutions are of the form x = ~0 + ~ where
E N(A) i.e. AZ = 2" But then ~'x = ~'~0 + ~'~ so ~'x unique
=> t'Z = 0. Thus the equation AZ = 0 and t'Z = 1 has no solution
A ~ so by Lemma A6.1 ~ _y~ _ y = t.
LECTURE 7: IDENTIFIABILITY AND KULLBACK-LIEBLER INFOrmaTION
7A. Identifiability b~_Kullback-Liebler Information
It is possible to express the identifiability issues in a form that
does not involve the fu]l distribution function by using Kullback-Liebler
information. Introduce
H(8, ~') = E0, [ In(Ps(dY)/P 8,(dY))]
= f l u ( P s ( d y ) / p g , ( d Y ) ) P e , ( d Y ) .
Then we have
LEMNA 7.1. pc(y) ¢ Ps,(y) iff H(e, 8') < O.
Proof. By Jensen's inequality
with equality if
H(8, 8') ! In I = 8
We can thus say
(i)
(ii)
P@(Y) = Pg,(Y).
9, 9' are observationally equivalent iff H(8, 8') = 0.
80 is identifiable iff H(9, 80 ) = 0 ~> 8 = @ 0 •
Now since 0 is the maximum value of H(8, 80 ) if H(8,80) is differenti-
able near 60 we can check the local identifiability of @ 0 by looking for
maxima of H(8, 80). In particular the following result is reasonable
THEOREM 7.1
(i) The parameter O 0 is locally identifiable if the information matrix
I N In ps(Y) ~ in ps(Y) ]
92t{/3838 IO 0 I80eo ~8~ , , = -Es0 980
is of full rank (Ps(Y) is the density of Ps(Y)).
211
(ii) If ~2H/~839' has constant rank in a neighborhood of ~0 and ~0
is identified then 7~000 has full rank.
Remark, The second result is useful in studying central limit theorems
for efficient estimators.
Proof.
here.)
density ps(Y). We have to investigate solutions of the equation
0 = ~H(@, @0)/~@ = / poo(Y)8 In p@(Y)/~@dY
= f (pso(Y)/p@(Y))~ps(Y)/~8 dY.
Note that
(This proof lacks a little rigor but all the essential ideas are
For ease of discussion take Ps(Y) to be absolutely continuous with
Thus, ~H(@, 80)/28180 0.
$2H(8,@0)/~@$8'[90
/p@(Y)dY = i => / l~p@(Y)/$@ dY = 0.
Also it follows easily
= -E(:~ ]np@(Y)/98)($ Inp0(Y)/~8')I@ 0
= _78088 (the information matrix).
is negative definite then H(8, 80 ) has a unique local Hence if ~@090
maximum at !0"
(7.1)
Now expand ~H/~@ in a Taylor series to find
~H/~e = 0 + ~2H/~0' (9-e 0)
where I18-0011 ! 119-9011. Now provided ~2H/~8~@' has constant rank in a
neighborhood of 80; if 80 is identified so that ~H/$8 = 0 for 8 # 90 cannot
occur we must have ~2H/$93~' of full rank. By assumption this entails
~2H/$@~@'180 = 18080 of full rank.
Constraints such as i(~) = $ can be allowed by searching for constrained
maxima of H(@, 80). We do this by constructing the Lagrangian function
Lie, e o) = H(e, e o) + _z'(}(o) - g ) .
212
Again we must investigate conditions for unique local solution to the
equation
~L/~O : 0 = :~H/~ ! + ( ~ i ' / ~ O ) ~ = 0
- ~ L / ~ = g = i ( ~ ) "
THEOREM 7.2. The parameter 9 is locally identifiable at 00 subject to
constraints ~(0) = g if
!
Le0e 0 ~0 0
~eo : 2
has full rank (20 = ~/~') or equivalently if
Rank (I 0 ~' : (Ieoeo)- _ 000 _00) = Rank
Proof. Similar to proof of Theorem 7.1.
We can also obtain a result for the identifiability of a specified
function t(e). We look for conditions on the solutions to ~L/~O = 0 that
make t(0) locally unique.
THEOREM 7.3. The function t(0) is locally identifiable at 00 subject
to constraints ~(0) = g if
Rank (19090
where 30 = St/D0.
Proof. Left as an exercise.
Lemma A6.3.)
Remark.
:_e o¢' :~e o) =Ra~k(fOOSO:¢~O )
(Use a joint Taylor series on L, t(0) plus
Using this result we can reproduce Theorem 6.Z beginning from
H(9, 00 ) = in I_~I - tr(~-l(H-~0)(_~ -~_0 )')
= in I_~I - tr(_~-I(_F-B~0)(_F-B_~0)').
213
Notes. The idea of using Kullback-Liebler information is due to Bowden
(1973). A result like Theorem 7,2 has been given by Kohn (1977).
Theorem 7.3 is new but closely related to ideas of Richmond (1976)
(cf. Theorem 6.7),
7B. Consistency and Identifiabilit~
There is a straightforward connection between weak consistency and
asymptotic identifiability.
TIIEOREM 7,4. If ~n (n= dim (y)) is a weakly consistent estimator of e 0 then
e 0 is asymptotically identifiable i.e. H(e, 00) + -~ as n + ~.
Proof. Consider that
H(e, %) = Hn(9,% ) = fln[ pe(Y) ] p%%j p%(~) dY
= f I~n_ @ l>g In(Po(Y)/Pg0(Y))Pg0(Y)dY
] I~-o'>~)+o. 21nP9 ( n
But P0 (l~n- O I > g) + 0 s o H(@, e 0) .... .
Remark. In time series models often the following limit exists
n-iHn(0, 00) * H(~, 00).
Then we say
(i) The parameter values are (asymptotically) observationally equivalent
if H(e, e') = o.
(ii) The parameter value 90 is identifiable if H(9, 90) = 0 => 9 = 90 •
Often H(9, 90) takes the form H(@, @0) = L(e) - L(90). Then e.g., 90 is
identifiable if L(0) = [(90) => 9 = 90 etc.
Remark. A result like lheorem 7.4 is quoted in Wu (1979).
LECTURE 8: IDENTIFIABILITY OF SOME TIME SERIES MODELS
8A. ARMAModels
Consider the scalar stationary AgMA (p, q) model (q!p)
(l+a(e))y k = (I+C(L))c k. (8.1)
Define _0 = (a I, ..., apc I, ..., Cq)'. Observe that
E0(yiYi+ k) = ~(0), k = 0, ±i
Thus by the estimability Lemma 5.1, the autocovariances are identifiable.
Further they determine the covariance matrix E~(yy') = [R (8) u -- [i-j] - ]lJi,j<_n
and hence, under Gaussian assumptions the full distribution. Thus in time
series the autocovariances are often taken as the starting point for
identifiability considerations.
Now take covariances through (8.1) with Yk-j
following equations
Hp(0)ap = ~p(!)
a = (a I ..... ap)' -p
RI(0) R2(0) ... R (8) -- -- p
Re(0) . . . . . . . . . . Hp(0) = :
for j ~ p to form the
% ( 8 ) . . . . . . . . . . R2p_l(0)
; r p ( O ) = [ Rp+l(0)
R2p(0)
(8.2)
The matrix H (e) is called a Hankel matrix; it is characterized by having -p
the same entries on cross diagonals. Now suppose p is unknown we might
expect to recover a by solving equation (8.2) for increasing p. -p
THEOREM 5.1. In the model (8.1), a is identifiable iff Rank (H (8) = p. -p -p -
Proof. The parameters a (O) are identifiable iff --p --
215
P@I (y) = P~2(Y) ~> ~pl = ~p2"
Thus, since ~(@) determines P (y), ap is identifiable iff
P,k(@l) = P,k(82) Vk-> ~pl = ~p2"
Now we have
Hp(@l)apl = rp(e I)
Hp(~2)~p2 = ~p(82 ).
However, ~(@i ) = ~(@2 ) Vk --> H_p(@l ) = Hp(O 2)_ = Hp say and rp(@l ) = ~p(e 2)
= r say. Thus -p
~p~pl = ~p = ~p~p2"
We deduce ~pl = ~p2 iff Rank (Hp) = p.
Remark i. A standard numerical analysis method for investigating Rank is
the Singular Value Decomposition (SVD) (see Strang (1974), Stewart (1973)).
From a statistical point of view it is natural to scale H by using -p
= R -½H R -½' where R ½ is a square root of R = [RIi- j I In -p -p -p-p -p -p . i ] l~i,J <_P"
practice then we replace ~ by
~= N-I N-k E 1 YiYi+k
and perform a SVD of H where ~ is an upper bound on the ARMA order.
Now observe that if we introduce the vectors
-'~ - = _ _ , ) v .
-Yi = (YiYi+l ' ""'Yi+~-I )' Yi (Yi lYi 2' "'" Yi-~
= +--I Then H E(ZO~ 0 ). This reveals that the SVD is equivalent to a canonical
correlation analysis of the two vectors y+, y (a clear discussion of
canonical correlation and its relation to SVD is given by Kshirsagar (1974)).
The ~proach to finding ARMA order through canonical correlation analysis
is due to Akaike (1976) who discussed it for multivariate ARMA models.
Surprisingly the scalar version given here is little known. Actually,
216
Akaike proposes to choose the order by using
AIC(p) ~ -2 %~+i in(l -°2)i - £n (v- p)2 where Oi are the canonical correla-
tions. The canonical correlations are obtained through the SVD.
Remark 2. In view of Theorem 5.1 we can define the order of the scalar
ARMA model as the minimum p such that
Rank (Hp) = Rank (Hp+i), i ~ i. (8.3)
There is also another way to view the order that is useful for multivariate
time series. Denote
Yk+i[k-i = E(Yk+ilEk-i ..... gl )"
For the ARMA model 8.I we find that
Yk+j+plk_ I # alYk+j+p_llk_ 1 + ... + apYk+jlk_ I = 0, j ~ 0. (8.4)
Thus we can say
The order is the smallest index p such that ]
Yk+p]k-i is linearly dependent on its pre- I
decessors Yk+ilk_l, 0 < i < p -i.
(8.5)
that
Now this links up with the defintion (8.3) as follows.
+ -i I Yk
H = E(y0y 0 ) = E[ " (Yk-i ..... Yk-~ )
Yk+v-i
Simply observe
Thus H
Ykik-l
= E " (Yk-i ..... Yk-~ )
Yk+~-llk-i
!
E(Yklk-i ~k )
=
• !
E(Yk+v-IIk-I ~k )
has rank p iff it has only p linearly independent rows which
clearly occurs iff (8.5) holds.
217
Remark 3. Given the autocovariances of an ARMA process we can then find
its order. Suppose instead we are given two polynomials 1 + a(L), 1 + c(L)
to be used in a simulation. We should like to be sure they have no common
factors before we can know the order. Usually in simulation the problem
is avoided by building polynomials in factored form. Two polynomials A(L),
C(L) are called relatively prime if they have no common factor. The greatest
common divisor gcd of two polynomials A, C is the common factor of highest
degree. We should like to have a test for primeness as well as a means of
extracting the gcd (so at least we can check someone else's simulation!).
The classical method for testing two polynomials for relative primeness is
the Sylvester Resultant. The classic method for finding a gcd is the
Euclidean division algorithm. These and newer techniques are discussed
in Kailath (1980).
Remark 4. In connection with the last point is the issue of stability
of a polynomial A. Again this is often avoided by building models in
factored form. There are many methods available, one of the best known
being the Schur-Cohn criterion (see Kailath (1980)).
8B. Transfer Function Models
Consider the scalar TFARMA model
Yk = Sk + Vk
s k = (l+a(L))-ib(L)Zk
v k = (l+d(L))-l(l+c(L))~k
where s k is a white noise sequence of zero mean variance i.
The model can also be written
ek(e) = (l+d)-l(l+e)(Yk - (l+a)-ibz k)
_0 = (~ :~.) = (al...ap bl...bp : d l...d m Cl...c m).
We suppose there is a true value @ 0 so that ek(@ O) = Sk" Actually, a
2t8
similar analysis can be made even without this assumption.
m are assumed known. We consider only asymptotic identifiability.
the l+a, l+d, l+c polynomials are stable i.e.
the roots of zP(l+a(z-l)) = 0 etc. ] (8 .6 )
are inside the unit circle.
Also suppose z k is wide sense stationary i.e. the following limits exist
E(zizi+ k) = lim n-i ~l-t~-kzizi+k = <, k = O~ fl, ..,
(In a late~i lecture some models are discussed where these last two condi-
If the Sk were Gaussian we should be led to consider tions do not hold.)
the function
The order p,
Suppose
- 2 H ( e , e o) = L(e) - L(e o)
L(9) = E (e~ (9 ) ) = n-~oolim n-i ~ne~(9)'}"l
It is straightforward to see (provided Zk, £k are uncorrelated)
1 + c 2 * ,, L(9) = /_~ i-~ [¢y - ToCyz - r0¢yz + ITgI2¢z ]df
where Cy = Cy(ei~); Cz = Cz (ei~) are the spectra of y, z; Cyz = Cyz (ei~)
is the cross spectrum; T 9 = Tg(e1~ ) where
Te(L) = (I+a(L))-~(L); the asterisk denotes complex conjugate.
By adding and subtracting ICyzl2/¢z to the term inside the square bracket
we can reorganize the expression as
L(9) = Ls(9) + Lv(O) (8 .7a)
Cv(ei~Ig) Cz - TeI2df (8.7b)
Cy Lv(9) = f~% (ei~ (I - Iyyzi2)df (8.7e)
Cv e)
219
where ~v(L]@) = [i + c(L)[2/ll + d(L) l 2 is the noise spectrum and
Iyyz 12 = I ~ y z ( e i ~ ) 1 2 / ~ y ( e i ~ ) ~ z ( e i W ) i s t h e s q u a r e d c o h e r e n c y be tween i n p u t
arid o u t p u t . Note t h a t ( s u b s c r i p t z e r o d e n o t e s a t r u e v a l u e )
Tso(eim ) = ~sz(e )/$z (eIW) = %yz/%z
Ls(8) _> O, /v(8) _> i = Lv(90)
since @y(e i~) : Cv(ei~)(l - ]yyzl2).
LEMMA 8.1. The two parameter values 0, ~0 are asymptotically observationally
equivalent iff Lv(8 ) = i, L (9) = O. s
Proof. L(8) = L(8 O) <:~> Ls(8 ) + 6v(8 ) = Ls(@0) + Lv(80) = i. But
Lv(9) ~ i ~> Ls(8) = O, /v(9) = i since i (8), L (9) are >0. Converse is S V --
obvious
LEMMA 8.2. Lv(0) = Lv(8 0) = i :> -~ = ~0"
Proof. Lv(6) = l iff ~v(eZ~I0) = ~v(eiC°)
LEMMA 8.3. Ls (@) = Ls(@O) = 0 => _B = 50 i.e. 50 is a-identifiable if
z k is persistently exciting of order 2p.
(See Appendix A8 for a definition of this.)
(8.8)
Proof. Theorem 7.1 could be used but it is quicker to proceed directly.
Let us observe
where
Ls(@) = lim ! n ((TGo(L) T@(L))~k)2 n+oo n E1 - = 0
l+d(L) Zk or z k l+c(L) ~k i+c(L) = i¥d(L) ~k"
Now in view of (8.6) and Lemma A8.3 we conclude
n+oolim ~n Eln ((T~o(L) _ T0(L))Zk )
Introduce Sk(@ ) = (i +a(L))-ib(L)Zk so
2 = 0.
Thus we have
It follows that
Now see that
s k
220
= Sk(e 0) = (I + s0(L))-ibo(L)z k.
lim n-i Eln (Sk(e) _ Sk)2 = O.
lim n-i Eln [(i + a(L))(Sk(0) - Sk)]2 = O. (8.9)
(l+a(L))(Sk(8) - s k) = -(a(L) -ao(L))s k + (b(L) -bo(L))z k.
= (~ - ~o ) ' ~k where we have i n t r o d u c e d
_x k = (Sk_l...Sk_ p Zk_l'''Zk_p)'
Now call
N t (X'X) = E 1 XkX k. n - -
(8.10)
In view of (8.10), (8.9) we have
lira (~- G0 )'(n-l(X'X)n )(~- 00) = 0.
Hence by the corollary to persistent excitation Lemma A8.2 ~ = ~0 if (8.8)
holds.
Remark. These calculations go through in the multivariate case with little
change. The arguments above can be used to establish equivalence between
the discussion of Kohn (1978 b) and Deistler (1976). A related discussion
of identifiability is given by Grewal and Glover (1976).
APPENDIX A8. BASIC LEMMAS IN TIME SERIES IDENTIFIABILITY
Let ~, x k be two sequences related by a filtering operation (not
necessarily stable)
A0(L) ~ = Xk,
Consider the lagged vectors
_x k = (Xk_l...Xk_p)',
A 0(L) = Zg aiLi.
& = (~_l...~_p)'
(AS. i)
221
and the cross product matrices
~ ~ n ~ ~ , n ,
(X'X)n = ZI XkXk, (X'X) n = ~iXk_Xk .
Let ~ be an arbitrary fixed vector.
LEMMA A8.1. If %, x k are related by (AS.I) then
(ii) if ~(X'X) ÷ ~ => g(X'X) °+ n
(iii) if ~(X'X) ÷ 0 :> g(X'X) + 0 n n
where O(M) = smallest eigenvalue of M.
P ~i Li Proof. Introduce ~(L) = l I so we can write ~'x k =
= ~'_~. Then
n ~'(X'X) n~ = Z I (~(L)~K) 2
~(L)Xk; ~(L)%
n (c~(L)A0 (L)%) 2 = Z 1
n 2 p2 _< cE 1 (<~(L)x k) , c = p E a i
This establishes (i).
a(x'x) n ±. a(x'x) n.
= c~'(x'b ~. - n
Now (ii), (iii) follows since clearly
Corollary. Suppose n-l(x'X)n + ~ which is positive definite. Then if -- -xx
A is stable n-l(x'X)n ÷ I~ which is positive definite.
Proof. If ~~ exists it is clearly positive definite. That it exists -xx
follows by an argument similar to that used in Theorem 14.2.
Remark. Lemma A8.1 also clearly holds for vector sequences.
Suppose Sk, z k are two sequences related by
Ao(L)s k = Bo(L)z k (A8.2)
222
q i Ao(L ) = ~ aiLi, go(L) = 2obiL
Ao(L) could be unstable. Introduce the lagged vectors
~k (Sk-l'''Sk-p) ~k (Zk'''Zk-q
~k = (Zk-l"'Zk-p-q-l)'
and the cross product matrices
(Z'Z)n Eln ZkZk"
Let ~ be a fixed arbitrary p +q+ I vector. LEMMA A8.2. Persistent excitation lemma.
~ :> ~(W'W) + ~ (i) ~(z'z) n n
(ii) o(W'W) ÷ 0 => o(Z'Z) + O. n n
Remark. Since e.g., ~(Z'Z) + oo iff ~'(Z'Z) ~ ÷ ~o the condition o(Z'Z) + oo n -- n-- n
is called generalized persistent excitation of order p +q +I meaning via (i)
that the system A8.2 is excited sufficiently to ensure o(W'W) ~ oo. n
_ = ! ~v v Proof. Partition ~ (_Tp :_q+l ) and observe that _~'_s k can be written
_~'s k = ~(L)s k + $(h)z k
P TiLi; B(L) q $i Li. T(L) = E 1 = Z 0
_p+q+l L i Now given _~ consider the p+q+l vector 7__ defined through 7(L) = L I Yi
= T(L)~o(L) + B(L)Ao(L). Then
_ n )2 n y'(Z'Z)n ] = E 1 (Y(L)z k = Z 1 [T(L)Bo(L) + B(L)Ao(L))Zk ]2
2 n = E 1 (T(L)Ao(L)s k + B(L)A0(L)z k)
n (T(L)s k + B(L) Zk)2 _<cZ I
223
c ~'(W'W) ~. -- n-
~¢7'7~,~ ~, ÷ ~ yet o(W'W) + D < ~ . So ~ ~'(W'W) ~ + E < m => ~ T 71 Suppose n n - ~ - n- -
y'(Z'Z) y + F < m which contradicts U(Z'Z) ÷ m. Hence (i) follows. n- n
Similarly, (ii) can be established,
Corollar X. We say z k is persistently exciting of order p+q +I if
n-l(z'Z)n + -zzl which is positive definite. In this case, if Ao(L) is
-i stable then n (W'W) ÷ I which is positive definite.
n -ww
Proof. Straightforward.
3. if Sk ~kukl~_s = E~h L s LEMMA A8. = z and H(L) is stable then s 'i s
,n 2 < c~ 2 H(ei0b g ls k _ Zk, c = essup I 1 2.
Proof. Call Zk = Zk' k ~ n; Zk = 0, k > n. Then set Sk = H(L)~k for all
k. So Sk = Sk' k ~ n. Then
n 2 n-2 co -2 = Ii Sk < E1 Sk E 1 s k
= fl~ IZ~keik~12df; f = ~/2~
~ ik~k - 12df = fl~I~l ° ~lhk-s~s
= f~ i~l eiS~zs E~s ei(k-S)~hk-s 12df
m 2 ~ it~ 2 fZ~Iz leis~-z sl l~l e htl df
co isc0- _< cf_~ l~le ZsI2df
oa_2 = c E 1 z s
= c 2 1 z 2 s"
Remark. A similar result holds in the multivariate case. If H(L) is
rational the essential supremum is a maximum.
LECTURE 9: LAGS IN MULTIVARIATE ARMAMODELS I
9A. Kronecker Indices and MeMillan Degree
From the discussion in Lecture 8 of the order of the scalar ARMA model
(l+a(L))y k = (l+c(L))g k
there are three ways of considering the order p
(a) If 1 + a, 1 + c are coprime
p = deg(l+a).
(b) p = Rank H~ where ~ is an upper bound on p and H
matrix
H
R I R 2 R 3
R 2 R 3 ... =
R 3 ...
m~ "."
• . . m
R2~- i
is the Hankel
(c) The order is the minimum index p ~ y(k+plk- i) is linearly
dependent on y(k+p-ilk-l), 1 ! i ! P.
The interrelations among these three were discussed in Lecture 8.
these three notions will be pursued for the multivariate case. Here we
discuss (b), (c); (a) is discussed in Lecture i0.
Let ~k be an ~ × 1 vector ARMA process
A(L)Ik = C(L)!k
A(L) = ~0 p ~i LI; _C(L) = I pO~i LI; ~k is white noise.
From definition (b) we introduce
(B) The multivariate order or McMillan degree
Now
225
= Rank (H v)
where H is the block Hankel matrix
H - ' 0
v
_a 1
-~2 =
I
R
-~2 "" " R' -,,)
_R 3
~2V-I
where -iRi = E(y0y' i)_ -- = ~-i" (Suppose throughout we have infinite data.)
Note that 6 need not be a multiple of ~: if yk is white noise ~ = 0•
Intuitively there should be more than one order for the vector process.
Rather there should be % orders, one for each of the % variates in ~k"
These are found from definition (C).
(C) The ith Kronecker index or output index or output lag is the minimum
index Pi such that Yi(k+Pilk-l) is linearly dependent on its predecessors
[Yi_l(k+Pilk-l)...Yl(k+Pilk-l) :!'(k+Pi- Ilk-l) : ... :y'(klk-l)]
where
y(k+jlk- i) = !k+jlk_l = E(!k+jl~k_l, ~k-2 .... )"
There is apparently a difficulty here in that the definition depends on
the ordering of the Yik in ~k: it will be seen later that this is not a
problem; in fact, the set {pi } is unique but Pl is not associated with Yl
etc.
Now the relation between definitions (B), (C) is explored by considering
the Hankel matrix more carefully. Observe that the ~i + j row of H is
! _v
r%i+j = E(yj(k+ilk-l)Yk_ I)
, , , )
-Yk = (-Yk-]•''!k-v "
Then definition (C) says to search the rows of ~v for linear dependencies•
The positions of these linear dependencies determines the indices of output
226
lags Pi" (Linear dependence can be investigated by singular value decomposition:
see Strang (1974).)
To clarify all this consider the following example of how the linear
dependencies among the rows of H might occur when ~ = 4. We suppose always
the process ~k is of full rank i.e. the covariance generating function or
spectrum is positive definite: otherwise the dimension can be reduced
appropriately.
Exhibit 9.1. Example of linear dependencies in the rows of the Hankel
matrix for a 4-vector A~MA model.
X = independent row # of linearly
Row # 0 = dependent independent rows
(on previous rows)
1 X 1 2 X 2 3 X 3 4 X 4
5 X 5
6 0 P2 = 1 - 7 X 6 8 X 7
9 0 Pl = 2 - i0 0 - ii X 8
12 0 P4 = 2 -
13 0 - 14 0 - 15 X 9 16 0
17 0
18 0 19 0 p3 = 4
Notice that if row j is dependent on previous rows then rows
j +4, j+ 8, ... (can be and so) are omitted from further consideration. Also
rows 1 2 3 4 are given first consideration.
Note that there are 9 linearly independent rows in the Hankel matrix
227
(clearly all subsequent rows are linearly dependent on these). Thus
6 = Rank (H) = 9.
Also, clearly by construction
~ p_ = 2 + i + 4 + 2 = 9 = Rank (H) = 6, i -I U
This relates the McMi!lan degree to the output indices. If ~ = pg so
Pi = p' i = l...g the process is called block identifiable.
9B. Determination of AR parameters
Now working from the Kronecker index calculations (i.e. linear depen-
dencies) we can determine the AR coefficients of the A~MA scheme by solving
Take cross covariances with ~k-j Hankel equations.
to find
in equation (9.1), j Z P
~ v = (~p ' "~o ) Ev = P"
Each row of A specifies that a linear combination of rows of H - -O
this amounts to ~g equations. So these equations must match up with the
order calculations of Exhibit 9.1. In particular it is clear that
P = maxi Pi"
In the example p = 4. The index p is called the observability index, it is
the maximum lag in the ARMA process. Now we can write out equations (9.2a)
symbolically to correspond to Exhibit 9.1 as follows (now X is a generic
non-zero quantity; 0 is zero).
(9.2a)
is zero:
Exhibit 9.2. AR parameters from Hankel equations.
Kron. lag 4 3 2 1 0 index variable
0 =
F O 0 0 0
I O 0 0 0
[XXXX
~ 0 0 0 0
0000
0000
XOXX
0000
XXXX
0000
00X0
XXXX
XOXX
XXXX
00XO
XOXX
X0001 2 a
XX00 I b H
00X01 -~ 4 c
00XX~ 2 d
(9.2b)
228
To see how the exhibit is constructed note viz. the second row of A entails
a linear dependence among
rows (13, 14, 15, 16, 17, 18) of H .
This is the same as a dependence among
rows (I, 2, 3, 4, 5, 6) of H
which is what is depicted in Exhibit 9.1. This makes it clear that the
pattern of O's, X's in the A. is determined by the linear dependencies of -i
the rows of H . To keep subsequent calculations clear let us agree to
interchange the equations so that the Pi'S are ordered decreasing top to
bottom; we find
Kron. 4 3 2 i 0 index variable
0 = XXXX 0000
0000
0000
XOXX
0000
0000
0000
00X0
XXXX
XXXX
0000
00X0
XOXX
XOXX
XXXX
00x i 4 a 00 xx H 2 b
X00 ~0 2 c
XX 0 i d
(9.2c)
Next observe that permuting the positions occupied by (Ya' Yb, Yc' Yd ) in
Z merely interchanges columns in ~i (and corresponding rows in H ) and does
not affect the lag structure. Let us do this to make AO lower triangular
index variable
0 =
~X X XX[X XX 0
0000 0000
0000 0000
L0000i0000
X000
XXXX
XXXX
0000
XO00
XXX0
XXXO
XXXX
10001H 4 c XIO0 2 d
0010 ~ 2 a
00XI~ 1 b
(9,2d)
The diagonal entries of ~0 have been made unity by scaling each row. We
do not scale by since this destroys the lag structure. Notice how the
apparent association Pi to i is altered. The Pi are rather associated
with the equations. In Lecture I0 it will be seen that the set {pi } is
229
unique. Note that it is clear from (9.2d) that block identifiability
occurs iff Rank (Ap) = 4.
As mentioned earlier each row of (9.2d) entails ~£ equations. However,
Rank (%) = 6 means that there are only 6 independent equations (column rank
= row rank). There are £ rows so for each row we have 6 independent equations
available (a total of 64). Now count the number of unknowns i.e. parameters.
(In our example ~ = 9, £ = 4 so 6£ = 36.) We have
row 1, row 2, row 3, row 4
9 8 7 5 (=> 29 total).
So we can certainly solve for these. Actually the zeroes in the lower half
of AO are non-generic (aside from the 3,2 entry: since number of indices = 2
is 2). So adding these gives (9, 8, 8, 7)(=> 32). The generic case corre-
sponding to Exhibit 9.1 would have
row, 1234 67819101112113141S1617 dependency X X X X X X X 0 X 0 0 0 X 0 0 0 I 0
In general there are 6£ - i parameters, i depends on the Pi" Repeating the
argument of Theorem 8.1 shows that there are 64 identifiable parameters.
What happened to the I parameters will be clarified in 9C.
Because of the definition (C) of the Pi the MA matrices C. must have -l
the same structure as the A. except for two things. -i
(i) ~0 is lower triangular and so has £(%+1)/2 parameters. Alter-
natively, ~0 = ~ and there are 4(£+1)/2 parameters in the innovations
covariance matrix.
(ii) If the bottom a. rows of A. are null then (so are the bottom i -i
a i rows of -IC" but) the top £ - a.1 rows of -IC" have all non-zero entries,
viz. in the example we have
230
A 2 C 2 rx0001 xxx i I XXXX XXXX
XXX X X XX
0000 000
The number of parameters in C "''~i is then -p
1 %(~+i) + ~ Rank (Ci) = ~ + ~ %(~+i)
2 -- "
Thus the total number of identifiable parameters is
~_](~+i) + ~ + ~6 - k = 2~6 - ~ + ~(~+i) 2 2
The determination of the MA parameters is not discussed since they are not
needed for modelling. This point, which is a consequence of the output
statistics Kalman filter is elaborated on elsewhere. In this connection
though it is worth seeing how a state space model may be constructed.
9C. Construction of a State Space Form
The idea is fairly straightforward. Firstly the dimension of the
state vector has to be the system dimension 6. It is easiest to proceed
via the example. The first <5 ( = 9) linearly independent rows were
(i, 2, 3, 4, 5, 7, 8, ii, 15) so take as the state vector
Y-k+l I k
Yl(k+21k)
Y3(k +21k)
_Xk+l] k = Y4(k + 21k)
Y3(k+31 k)
Y3(k + 41 k )
23t
Then we see e.g.~ that E(_Xk+ll k y k) are these very rows. The state space
model will be
~k+llk = ~-Xklk-i + ~k
Zk = Zklk-i + ~k = (1%0...0) _Xklk_ I + ~k
where F, K are to be determined. First observe that F, K are defined by
the relations
T
E(-Xk+lle -Xklk_ I) = F E(_Xklk_ I ~klk-i )
v
E(_Xk+ll k ek ) = K E(e k ek ).
We concentrate on F. By iterated conditional expectation we see
T
E(-xk+llk-1 ~klk~l ) = FE(~k lk_ l ~kle-1 )"
However, the linear dependencies spelled out in calculations such as
Exhibit 9.1 entail relations of the form
-Xk+l I k-i = FO -Xk I k-l"
This immediately yields F 0 = _F. Thus in the example we find
Xk+lLk_ 1 =
-Yl(k+IIk- l)--
Y2(k+llk- I)
Y3(k+llk- I)
Y4(k+llk- i)
Yl(k+21k- i)
YB(k+21k - i)
Y4(k+elk- i)
yz(k+31k-l)
YB(k+41k-l)_
0000 i0000-
XXXXX0000
000001000
000000100
XXXXXXX00
000000010
XXXXXXXX0
0 0 0 0 0 0 0 0 1
XXXXXXXXX
- Yl(klk- i)
Y2(klk- i)
Y3(klk- i)
Y4(klk- I)
Yl(k+llk-l)
j
Y3(k +llk - I)
Y4(k+llk-l)
Y3(k+21k-l)
~Y3(k+3 k-l)
(9.3)
Again the pattern of O's, l's, X's is specified by the linear dependencies.
Also the X's in F are the same ones appearing in the A.. The parameter count --i
232
is especially easy yielding
5, 7, 8, 9 (==> 29 total).
Again there are 3 non-generic O's mai~ing 32 parameters. There are then
~% - % parameters in F, ~% in K, ~(% + 1)/2 in the innovations covariance
matrix; a total of 2~% - % + ~(%+1)/2 as before. Now we can see that the
rows with X's can be filled out without changing the lag structure. This
gives 4 x 9 (=%~) = 36 parameters in F.
Remark i. Identifiability. It has already been proven the ~ - % parameters
in F are identifiable: clearly the %(%+1)/2 parameters in the zero lag
covariance matrix are identifiable. It follows from the form of the output
statistics Kalman filter in 4B that observationally equivalent structures
I have the same value for G = E(Xl y0 ) . It is straightforward to show the
parameters G are identifiable. This is further discussed elsewhere.
Remark 2. If the ordered distinct values assumed by the Pi are ~i' ""'~m
(viz. in the example m = 3, ~i~2~3 = 1 24), then % = ~(Pl .... 'P~)
= ~m-lj=l jn(~m_j,) where n(~ i) = number of indices equal to ~i" This is readily
verified by a simple counting argument in equation (9.3) (take the generic
case).
Notes. The discussion of this lecture is the author's. However, in various
forms most of the ideas here are in (especially) Akaike (1974a, 1974b, 1975,
1976) (who shows how to apply these ideas to analyze data); Wertz et al.
(1981), Ljung and Rissanen (1976), Rissanen and Ljung (1975), Tse and Weinert
(1975), Glover and Williams (1974). Early work on identifiability stressing
block identifiability is Hannan (1969). Related work for econometric models
is Hannan (1971, 1976), Deistler (1975, 1976, ]978), Kohn (1978b, 1979b).
An interesting note relating state space calculations to Hankel matrices is
Koussiouris et al. (1981): see also Rosenbrok (1970). The form obtained
in (9.2d) is related to the so-called polynomial echelon form: see Kailath
(1980). The idea of periodic autoregression is to attack lag structure too
(Pagano (1978), Newton (1982)). The present discussion is its logical completion.
233
APPENDIX A9: MATRIX POLYNOMIALS
A polynomial matrix (PM) is a matrix whose entries are polynomials
in L (or Z = L -I)
p k A(L) = [alj (L) ] = [ E0 aijk L ]l<_i<_~;l_%jf~m"
We can also write A(L) as a polynomial with matrix coefficients
_A(L) = ~A iLI.
A PM is called nonsingular if det(A(L)) # 0.
as two PM's
If a PM A(L) can be factored
A(L) = C(L)A(L)
then C(L) is called a left divisor (Id) of A(L). Two PM's A(L), D(L) have
a common left divisor if there is a PM C(L) and PM's A(L), D(L) with
A(L) = C(L)~(L)
D(L) = C(L)D(L).
A greatest common id (gcld) of two PM's A, D is a common id that is left
divisible by any other common Id.
A unimodular matrix is a square PM whose determinant is constant.
Elementary row and column operations or PM's can be represented by uni-
modular matrices. Thus, e.g., if %(L) is a polynomial the matrix
1 0
U(L) : 0 1
0 0
~(L)
0
i
has determinant i; it adds %(L) times row 3 to row 1 when it multiplied
on the left.
LEMMA A9-1. A PM matrix U(L) is unimodular if and only if its inverse is
also a PM.
Proof. If U(L) is unimodular then clearly
234
is a PM.
we deduce
U-I(L) = adjoint(U(L))/det(U(L))
On the other hand, if U-I(L) is a PM then from U(L)U-I(L) = !
~>
det U(L) det U-I(L) = 1
det U(L) = constant
LEMMA A9-2. If C(L) is a gcld of A(L), D(L) then every gcld of A(L), D(L)
has the form C(L)U(L) where U(L) is unimodular.
Proof. If F is another gcld there exists PM's P, Q
C = FP, F = CQ
- > c = CQP ~ > QP = ~ = > I Q l i P I = 1 .
But Q, P are PM's => IQI = constant, [Pl := constant.
Corollary. If one geld is nonsingular then all are. If one gcld is uni-
modular, all are. Thus elementary row and column operations cannot change
the rank of a PM.
Two PM's are called relatively ]eft prime or left coprime if all their
gcld's are unimodular.
If H(L) is a matrix of rational functions and A, B are PM's with A
nonsingular and with H = A-IB then A-IB is called a left matrix fraction
description (MFD) of H. If A, B are left coprime A-IB is called an irre-
ducible MFD. If 8, ~ are square H = A-IB is irreducible the poles of
are the zeros of det A, the zeros of H are the zeros of det 2"
LEMMA A9-3. Construction of a gcld. Given the ~ × ~ PM 8, the ~ x m PM
B find elementary row operations so that at least m of the bottom rows of
the rhs of the following expression are zero
m ~
m U21 B' 0 m
235
Then R' is a gcld of A, B. (U.. are FM's.) -i 3
Remark. Kailath (1980, p. 401) mentions an alternative via Sylvester's
Resultant.
Proof. Call
so R is a cld of A, B.
i Ill I u-1 -u11 ~12 = Vll -v12 ~ v
u21 _u22 _v21 _v22
:> A = ~Ii ' B = ~Y21
But also
R' = [l~' + [129'
so if ~i is another cld with say
A = ~l~l
T T
=> R = RI(§IUI I +~IU12)
so that ~i is aid of R; so R is a gcld.
Now we observe that from equation (A2.1) we can obtain an irreducible
right MFD from the left MFD A-IB.
LEMMA A9-4. If A is nonsingular
(a) [22 is nonsingular
<h> A IB =
(C) ~21' [22 are left coprime
(d) If 9, 9 are eoprime then deg ]~I = deg ]U22 ] .
Proof. Recall A' = VIIR' so A nonsingular => YlI' g' are nonsingular.
Next, by a well known formula for partitioned matrices
(A2. I)
236
t~22 t / tY l lL = [~t ~ o => L~22I ~ o
so (a) holds. Next (b) holds since
U2tA' + U22B' = O.
For (d) if A, B are coprime then R is unimodular and
deg l~I = deg lYll I.
But U is unimodular so (A2.2)
deg I~l = deg IVII I.
Finally, (c) follows since U is unimodular thus any subset of its rows
is irreducible. For further details see Kailath (1980, Section 6.3).
(A2.2)
LECTURE i0: LAGS IN MULTIVARIATE ARMA MODELS II
10A. Matrix Fraction Descriptions
Consider the multivariate ARMA model
A(L)y k = ~(L) ~k (i0.i)
where A(L), C(L) are matrices of polynomials (PM's) such as A(L) = [~Aiel;
-- ! = -- ~! ) ~k is a white noise sequence with E(g k ~k ) = I. The sequence ~t E(Yk Zk-t
is called the impulse response sequence for the A~MA model and the relation
Z0~H'Li-I = H(L)_ = A-I(L)C(L)_ _ = A-Ic
is called a left matrix fraction description (MFD) of H in terms of A, C.
A brief discussion of }~D's was given in Appendix A9 (see also Rosenbrock
(1970), Wolovich (1974), Kailath (1980)). The causal nature of H(L) is
expressed by the fact that IIH(0)!I < ~.
Recalling Definition A (Lecture 9) of the order of a scalar ARMA model
as the degree of A(L) (deg A(L) or deg A) in a coprime description A-Ic,
we might expect an equality between McMillan degree ~ and deg det A. First
howe~er observe that if A, C have a cld G(L) so
A = GA, C = GC
where A, C are PM's then deg det A = deg det G + deg det A ~ deg det A.
Extraction of a gcld of A, C will reduce the deg det A to a minimum value.
If all the gcld's of A, C are unimodular then the MFD H = A-Ic is irre-
ducible and any other irreducible MFD of H has the same value for deg det A.
Hence the definition
(A') The order of the multivariate ARMA model (i0.i) is 6' = deg det A
where H = A-Ic is any irreducible MFD of H. We have still to argue that
6' = ~.
238
Introduce Pi degree of the ith row of A(L) ( =deg rowi(A)) = maximum
of the degrees of the polynomials in the ith row of A(L). Then deg det A
5_ E 1 Pi" This is most easily seen with the help of an example. Take p = 4,
Exhibit i0.i.
A(L) =
(x is a generic nonzero value).
F x+ xL+ xL 2 x+ xL I
x+xL x
x + x L + x L 2 + x L 3 x + x L 2
x + xL + xL 2
x + x L + x L 2 + x L 3
L 3 L 2 L 1 L 0
0 0 0 X 0 X X XX XX X7
0 00 iO 00I I X 0 X i X X Oj
Lxo x I x x x I x o x i x x x
!
Pi
2
i
3
i
P i
2
1
3
We can write
I
P. _A(L) = _Ahc diag (L i) + _R(L)
where Ahc is the matrix of highest degree coefficients
XOX
XOX
XOX
L 20 0
0 L 0
0 0 L 3
+ R(L)
(i0.3)
and R(L) is a PM whose row degrees are less than those of A(L).
E ~ det A(L) = (det %c ) L Ipi + lower degree terms.
Thus deg det A(L) = E lpi iff
-~c is of full rank.
If (10.2) holds we say A(L) is row reduced. Thus in the example A(L) is
% , not row reduced so deg det A(L) < E lpi = 6. Further it is clear that if
A-Ic is irreducible then iff A is row reduced we have 6' ~ ' = E1 Pi" Any PM
Then clearly,
(i0.2)
239
can be row reduced by elementary row operations (i.e. left multiplications
by unimodular matrices). An example makes this clear
A(L) =
4+I+L2 2L+L 2 ]
3+2L+2L 3 5+2L 3
SO
11] = 0. det (~c) = det 22
It helps to list the coefficients A. --I
L 3 L 2 L 1 L 0
2 00 20 3
Clearly the operation row 2 * row 2 - 2L row I i.e. left multiplication by
Ii -2L) L 3 U(L) = 0 1 i removes the terms. The ~ew ~c matrix is
[i l det HA~c = det ~ 0. - -4
v
So the new matrix is row reduced. Before we can link 6' to 6 and Pi to Pi
we need a lemma.
LEMMA i0.i. Invariance of row degrees. All row reduced PM's related by
unimodular matrices have the same row degrees i.e. if A, A are row reduced
with row degrees arranged in say increasing order and
A = UA, U unimodu]ar
then A, A have the same row degrees.
Corollary. All irreducible left MFD's H = A-Ic with A row reduced have
the same row degrees and the same order.
Proof of Lemma i0.I, See Appendix AIO.
Proof of Corollar X. If A-Ic, ~-i~ are two irreducible left MFD's then for
! -!
some unimodular U A = UA; but A, A are supposed row reduced. Thus Pi = Pi'
240
i = i, ...,~ and 5' = Z 1 Pi
Remark. Thus the set {pi} and the order are unique. To establish the
connection to Pi' 6 simply observe that A(L) in (9.2d) is row reduced.
Thus 6' = Z ~ 1 Pi = 6.
There is one other result that is of interest.
LEMMA 10.2.
proper) iff
If A(L) is row reduced then H(L) = A-I(L)C(L) is causal (or
deg fOWl(C) ! deg rowi(A). (10.5)
Proof.
(i)
(ii)
(cf. Kailath (1980, p. 385).)
If H is causal then from AH = C we find
row.(C) = (row.(A))'H. i I
But every element of H(L) is causal (i.e. no terms L -I, L -2, ...)
so result follows.
If (10.5) holds (with <) consider that the i, j element of H(L)
is
h..(L) = det AiJ(L)/det A(L) l]
where A lj is the matrix obtained by replacing the j th column of
A(L) by the ith column of C(L). We can write as in (10.3)
AiJ(L) = (AiJ)he diag (L pi) + RiJ(L)
(AiJ)h c agrees with ~e except for the jth column which is where
zero since each entry in the jth column of C(L) has degree lower
than the corresponding entry in the j th column of A(L). So
but
deg det A = Z I Pi
deg det A lj < Z lpi.
241
Thus hij(L) is strictly proper: so then is H(L).
in (i0,5) the argument is similar.
When = holds
Remark. This result relates to the comment in Lecture 9 that the pattern
of O's, X's in the _ Ci's matches that in the _ Ai's.
APPENDIX AI0. THE PREDICTABLE DEGREE PROPERTY
The discussion here of row reduced polynomial matrices follows Kailath
(1980, p. 387-388).
LEMMA AI0.1, Let A(L) be a PM of full row rank. Let p(L) be a polynomial
L-vector; let q'(L) = p'(L)A(L). Then A(L) is row reduced iff
deg q(L) = max {deg Pi(L)+Pi } i=Pi(L)#0
where Pi(L) is the ith entry of p(L); Pi = deg rowi(A).
Proof. Call
d = max {deg Pi(L)+Pi }. i=Pi(e)~0
Obviously deg q(s) ! d. We have to show equality. Now {Pi(L)} must have
the form
d-P i Pi(L) = ~i L + lower degree terms,
also not all the ~, can be zero. Now note i
q-0 =-Ahc-°' -~= (~l ..... ~L )
where q(L) = gO Ld + ... + ~d" So that if ~he has full rank then qo # O.
Conversely, (AI0.1) must hold for arbitrary ~ with q0 = 2 => ~hc has full
rank.
Proof of Lemma i0.I. Invariance of row degrees. Proof by contradiction.
Suppose the row degrees of A, A are respectively pl!P2~.,.~p~;
(AI0.1)
242
Pl!P2 !'''!p% and that for some j
Pi = Pi' i < j but pj
Now write out A = UA as
> p j-
A =
j -I %-j+l
J IUll u12] ~-j U21 U22
The predictable degree property of Lema~a AIO.I => UI2 = 0. Hence,
Rank lull UI2] cannot exceed j- i: thus Rank U cannot exceed
j -i + ~-j = %-I i.e. U is singular hence cannot be unimodular.
LECTURE ii: MARTINGALE LIMIT THEOREMS
IIA. Martingales
The contemporary approach to deriving consistency and central limit
results for time series models is based on the use of martingale (MG)
convergence theorems. From a time series point of view this is rather
natural since the basic quantities in the martingale calculations are non-
linear innovation sequences (called martingale differences) that are ortho-
gonal to all functions of the past as opposed to linear innovations which
are only uncorrelated with linear functions of the past. The past or history
is naturally represented by increasing sequences of o-fields.
The other aspect of martingale calculations is that they often use a.s.
(almost sure) or a.e. results. While for most statistical inference it is
sufficient to prove weak consistency (convergence in probability) there are
some time series applications where strong consistency (a.s. or a.e. con-
vergence) is appropriate e.g., analysis of algorithms operating in real
time. Further it is often easier or just as easy to prove a.s. convergence
than convergence in probability.
An informal discussion of martingales is now given. A formal presenta-
tion is included in Appendix All.
Let Y be a sequence of random variables. Denote the history of the n
sequence namely the values YI' "'''Yn-I by Sn-i ~y (these are the increasing
G-fields generated by {Yn}~).± Then Y is a martingale (MG) if n
E(YnI~-I) ° Yn-i"
Now we introduce the nonlinear innovations
gn = Yn - E(YnI~Y-I)~ n
that is for any random variable Zn_ I that is a function of the history of
244
Yn i.e. a function of YI' "'''Yn-i (i.e. ~Yn_l measurable) we have
E(enZn_ I) = O.
Now we can write by the MG property
y =Y +g n n-i n
: vn
~i gs"
Thus a MG is an accumulated nonlinear innovation.
The importance of MG's stems from the Martingale Convergence Theorem
MGCT.
THEOREM Ii.i. If Y n
EIY I < ~ and Y with
is a MG and suPn EiYnl < oo then J a random variable
Y ÷y a.s. n
Proof. The proof is rather involved: see Doob (1953) or Billingsley
(1979). However, a weaker result is easy to prove.
THEOREM Ii. 2. If Y n
Y with E(Y 2) < co and
is a MG and SUPn E(Y~) < ~ then J a random variable
Y -+Y a.s. n
Proof. See Appendix All.
Remark. The following version of the MGCT is often useful in time series.
THEOREM 11.4 (MGCT*). If Jn are increasing o-fields on a probability
space ~, 3, P and if Qn-l' ~n' ~n are 7n_ 1 measurable non-negative random
variables and
Then if E 1 B n
further E la n < ~a.s.
0 _< E(Qn[~n-I ) <-- Qn-i - °'n + ~n'
< ~ a.s. ] Q < ~ a.s. such that E(Q) < ~ and Q ÷ Q a.s. n
Proof. See Neveu (1975). Let S = E 1 ~n if we suppose additionally that
245
E(S) < ~ then a very simple proof is available. Introduce S
note that
E(Sn+l lJ p = E(E(S.I~n+ 1) LJ n) = E(SI~ ' ) = S n
= E(SLTn),
so by MGCT Sn ÷ S. Also, Sn E~ @k £ 0. Then consider that
n n _ -Z1 -I @k + Zl-l~k- E(qn+Sn-El #k + El~kl]n-l) < qn-i + Sn-i
Hence by the MGCT (still true with an inequality)
Qn + s - ,n n n E1 Bk + El~k ÷ L a.s., E(L) < co.
n n But Sn - E1 ~k ÷ 0 a.s. and EI~ k is increasing and bounded by L so
E~ ~k + ~ ! L. Thus Qn ÷ L - ~ £ 0.
Remark. " In applications Qn is often a quadratic form, Then this theorem
is often referred to a stochastic Lyapunov function theorem.
liB. Martingale Central Limit Theorem
Suppose we investigate the asymptotic behavior of a least squares
regression estimator in
Yn = ~ + ~n
namely
n = Zl s s ;
so
i (Z'Z)n = ElnZs_Z s
To f i n d a c e n t r a l l i m i t t h e o r e m we l o o k a t an a r b i t r a r y l i n e a r c o m b i n a t i o n
s a y
~'(~ - ~ 0 ) = ~ ( z ' z ) ~ ~ n - - n - -' 21}s~s"
n T h i s h a s t h e f o r m Z i Z n s w h e r e E ( Z n s t g s _ l . . . c 1 ) = O. We a r e n a t u r a l l y l e d
then to formulate a central limit theorem for an array.
246
k An array {Xnk , ~nk}l n is called a martingale difference array (mda)
k {X n } n is aMG or nonlinear innovations array if for each fixed n k' ~nk 1
difference i.e. (fix n, let k vary) E(Xnk I ~n,k_l) = O; ~nk =~n,k-l"
Introduce the conditional variance
k
Vk2 = ZlnE(X2nj]~n,j_ I) n
There are a number of different MG central limit theorems available (the
connections are completely worked out in Hall and Heyde (1981)). Since
much time series theory is done under finite variance assumptions the
following version proves quite useful.
k THEOREM 11.5 (MGCLT). If A, B, C hold then ~i n Xnk '~'> N(0, i)
2_2+ 1 (A) V k n
(B) E(Vk2 ) ÷ 1 n
k
(c) Z l n E ( X ~ i I ( [ X n i ! > c) l J n , i _ l ) -~+ 0 Vg
Proof. See Hall and Heyde (1981) or Scott (1973). For a very short proof
of a similar theorem see McLeish (1974).
Remark. Often X . has the form X = Z .g. where Z . are measurable nl ni nl I nl
~n,i-i = %-1 = ~(~l'''gi-i )" For simplicity suppose c.l are independent
and uni formly i n t e g r a b l e ( see Appendix B l l ) . Ca l l g i (x ) = E[s~ I ( g ~ > t / x ) ]
so g i (x ) a re un i fo rmly bounded, a l s o n o n - d e c r e a s i n g in x, and sup i g i (x ) + 0
as x + ~. Finally, introduce
Z 2 Z 2 = max = max E(X l~n,i_l ).
n l<i<n ni l<i<n i
Then we can deal with (C) as follows
k k 2 ( 2
ZlnE(X2i~(Ixil > ~)13n,i 1) = Zln Znig i Zni)
247
_< k Z2.g .(z2) _< suPi gi(Z2n )(~KnLI Lni)~2. Eln nl I n
= suPi gi(Z2n)V 2 • n
Thus by u.i. and (A), we deduce (C) if Z 2 P-~ 0. n
C o r o l l a r y . I f X . = Z .~ . w h e r e ~. a r e i n d e p e n d e n t , u n i f o r m l y i n t e g r a b l e Hi Hi 1 1
and ~n,i = ~i = °(~l'''~i ) while Zni are ~n,i-i measurable then
k n X => N(0 i)
Z1 n i '
if A, B, C' hold
(c') max E(X~ I ~ P 0. l~_i<_n i ~n,i-i )
APPENDIX All. BASICS OF MARTINGALES
Let {Yn}l be a sequence of random variables on a probability space
(~,3, P). Let {3 } be a sequence of sub o-fields of 7. Often J will be n n
t h e i n c r e a s i n g o - f i e l d s g e n e r a t e d by {Y }. However , t h e y c o u l d be l a r g e r n
e.g., generated by {Yn' Z } where Z is another sequence of interest. If n n
~n = ~y = O ( Y I ' ' ' Y n ) a r e g e n e r a t e d by Y t h e n i n t u i t i v e l y we can t h i n k o f n
~Y as containing the "history" of Y . Another way of thinking of this is n n
t h a t any m e a s u r a b l e f u n c t i o n ( l i n e a r o r n o n l i n e a r ) o f Y I ' " ' ' ' Y n i s
m e a s u r a b l e w i t h r e s p e c t to J Y . A r e a d a b l e i n t r o d u c t i o n t o MG's i s g i v e n n
by Heyde ( 1 9 7 2 ) .
The pair (Y , ~ ) is called a MG if n n
(1) ~n+1 = ~n"
(2) Yn is Jn measurable; then we say Yn is adapted to In"
(3) EIYnl < ~: this ensures conditional expectations exist.
(4) E(Yn+llSn) = Yn"
Often there is no confusion as to which history or o-fields are being used.
248
Then we say Y is a MG. If {Y ] ~ is a MG and sY is the increasing o-
fields generated by Y then {Y ,~Y} is a MG. The proof is a simple exercise n n n
in iterated conditional expectation. Note (4) is equivalent to
E(YmI-%) = Yn' m An.
LEMMA AII.I. {Yn ']y)n is a MG iff E(Yn+I[YI...Y n) = Yn a.s.
This equality is often taken as the definition of a MG. Before giving
the proof we recall the projection definition of conditional expectation.
If G is a sub o-field in ~ and Y is an ~-measurable random variable then
E(YIG) is defined by the orthogonality condition
fA (Y -E(YIG))dP = 0 for all A e G
or equivalently
f (Y-E(YIG))ZdP = 0 for all C-measurable functions Z.
Proof of Lemma AII.I. Let A e O(YI'''Yn ) - = ~Yn then A must be the inverse
i m a g e u n d e r Y = ( Y 1 . . . Y ) ' o f a B o r e l s e t H e ]]{ n i . e . A = Y - I ( H ) - n
= {m : Y ( ~ ) e H}. Then by a c h a n g e o f v a r i a b l e i f py i s t h e j o i n t d i s t r i b u t i o n
of Y we have
[A Yn+l dP = fYgHYn+l dP = fH E(Yn+I IY = y)dbtf (11.i)
fAYn dP= fHE(Ynl Y =Y)d~Y = fHYndPy" (11.2)
THEOREM AII.I. If {Yn' In } is a MG and sup n E(Y~) < ~ then ~ a random
variable Y with E(Y 2) < ~ and Y + Y a.s. To prove this we need to use n
THEOREM AII.2.
Y > 0 then n
Proof. Let A I
Then
Kolmogorov's Maximal Inequality. If Yn' % is a MG,
] E(Yn) P max Y > o'. <
l~i~n i -- -- ~, "
= {~ :YI(~) >__ C~}, A k = {max Y. < ~ < Yk i<k i _ }, k = 2, ...,n.
249
n A = h A k = h
l<_i<_n
Also ~ 6 ~k = o(YI'''Yk )" Then consider that (since ~ are disjoint)
E(IEn) = E(Y n EiIak) (I A is the indicator a function)
n = E 1 E (YnI%)
= Eln E(E(Y n I e kl 3k))
= Eln E(I~Yk ) by the MG property.
But on %, Yk > ~ so
nE(l ) =~E(IA) =o.P(A) E(IAY n) >_ ~ E 1 A k
i.e.
max
l<iin 1 n
Proof of Theorem AII.2. Recall the following necessary and sufficient
condition for a.s. convergence (see e.g., Chung (1974)). A sequence Y n
converges a.s. iff
lim lim P{IYn+k-Ynl > g for some i <k <m} = 0. n+oo n+oa
This holds by upper bounding if
lim lim P ( max IYn+k-Ynl2 > 2] = n -~°° n -~° 1%k<m
By the maximal inequality this is bounded by
0.
-2 )2 lim lim E(Yn+ m-Yn (11.3) n-+oo m->~o
However, by the MG property
2 E (Y2n+m) E(y2) E (Yn+m - Yn ) = - "
But E(Y2n ) converges to a limit. To see this observe SUPn E(Yn 2) is bounded
by assumption and E(Y 2) is non-decreasing. This is because if we call
250
an = Yn - E(Ynl~n-i ) = Yn - Yn-i then Yn = an + Yn-i .... > E(y2) =n E(Y2n-I ) + E(~2)
>_ E(Y2n_I ). Thus the limit in (11.3) is zero.
APPENDIX BII. SOME USEFUL PROBABILITY RESULTS
i. Dominated Convergence Theorems
Let Y be a sequence of random variables (RV's) on a probability space n
Q, ], P. Let f be a sequence of Borel measurable functions on a measure n
space ~, ~, .
LEMMA BII.I.
(i) If f n
Dominated Convergence Theorem.
÷ f in measure or a.e. and 3 g ~ [fnl ~ g, fgd~ < ~ then
f fndD ÷ /fd~ or f if n - fld~ + 0 as n ......
(ii) On a probability space this reads
If Yn ~ Y and ] a RV Z ) I Y n I ! Z
and E(Z) < ~ then EIYn-Y I -+ O.
Proof. See Billingsley (1980).
Now in the probability setting we can improve on the dominated con-
vergence theorem. In fact we can give an iff condition for
Yn ~ Y to imply ElY n - Y I ÷ O.
The condition is uniform integrability.
A sequence of random v a r i a b l e s Y i s c a l l e d un i formly i n t e g r a b l e ( u . i . ) n
if
(a) sup n EIYnl <
(b) given g 3 (~ ) P(E) < 6 => a
fE IYn IdP < a Vn i.e. lim SUPn fA [Yn idP = 0 or lim P (A) ÷0 P (A) ÷0
= O.
sup n E(IAiYnl)
251
Remark. If Yn is u.i. and EIYI,: < ~ then ~IY n-Yl is u.i.
LEMMA BII.2. If Y ~ Y then E - n [Yn Y1 + 0 iff Y is u.i. n
Proof. The u.i. condition drops out naturally from the classical proof of
dominated convergence as follows.
EIYn-Y I = f IYn-YI dP
= flYn-YI>a: [Yn- YIdP + fIYn-YI<¢ IYn-YIdP
! sup m flyn_yl>g IYm -YI dP + g
+ g by u.i. "" P(IYn-YI > g) ÷ O.
But g is arbitrary so EIY n-YI ÷ 0. The converse is a little harder.
very simple proof i s given in B i l l i n g s l e y (1979).
Corollary.__ If Yn ~ Y and SUPn E(Y~) < ~ then EIY n-Yl ÷ 0.
A
Proof. Clearly SUPn EIYnl ! sup n (E(Y~)) ½ < ~ also SUPn E(IAIYnl)
! /P(A) /SUPn E(Y~) so Y is u.i. n
Remark. Lemma AII.2 also entails EIYnl ÷ EIY I since ElY - EIY-Yn
EIYnl ! EIY I + EIY-Ynl.
2. Kronecker's Lemma and the Toeplitz Lemma
Suppose Z is a sequence of numbers Z ~ Z. If w is an array of n n n:
n weights with Wni A 0; E lwni = i; Wn:. + 0 each i (i.e. no weight dominates)
n + Z. That this is then intuitively we expect the weighted average Z 1 WniZ i
true is the Toeplitz Lemma. A proof is straightforward.
LEMMA BII.3. Kronecker's Lemma. Let b n
a non-decreasing sequence V + oo. Then n
be a sequence of numbers and V ii
Vnl n nbk/V k ÷ S < oo Z 1 b k ÷ 0 if I 1
Proof. Call S = n bk/V k n E1 so Sn + S; S O = 0.
252
V-I .n V-I n n ~i bk n - • = Z 1 Vk(S k - S k i )
V-I n = n [EiVkSk - Vk-iSk-i + (Vk-l- Vk)Sk-l]
n = S - V~ I E 1 (V k - n Vk-l)Sk-i
+ S - S = 0 by the Toeplitz lemma.
Remark.
3.
(i)
(ii)
(iii)
The Kronecker lemma is much used in a.s. convergence proofs.
Some Useful Lemmas
If X__> 0, E(Ix> ~ X) = tiP(X> ~) + f~P(X > t)dt.
Proof: straightforward (see BillJngsley (1979)).
oo co EIP([XI >_ k) < E]X 1 ! i + >21P(]XI > k). This is called the
moments lemma.
Proof :
~o _ ~ ~ . . . . /i+l lIP(IX I > k) = llfkdP = El>~i=k ~ i dP
co co = Z li ~i+idp < ~i+l ~i ~I -i IxldP -< EIXI
_ oo ~i+]dP 1 + °°P(IX I > k). < El (i+l) ~i = E1 --
oe co n If ~ _> 0 and E 1E(~) < ~ then L I~ < oo a.s.i.e. >]i ~ converges
a.s.
Proof: see Billingsley (1979).
APPENDIX CII. STRONG LAW OF LARGE NUMBERS
The following result, which generalizes Khinchin's strong law of large
numbers (SLLN) is known but there does not seem to be a published proof.
THEOREM CII.I. Generalized Khinchin SLLN (KSLLN*). If X is a sequence n
of independent random variables with E(X ) = m and ] a RV Z with n
e(txl >~) <_cF(Izl >_~), Elzl <
then
253
Remark.
-i n CII i n EIXi + m a.s,
Condition CII.I will be called stron~uniform integrability (SUI)
since it is about the minJ~al condition that ensures uniform integrability.
It can be interpreted as saying that while the X are not identically n
distributed the different mechanisms that produce them yield outliers at
about the same rate.
Proof,
(i)
This is very similar to the usual proof. Let Yk = l(IXkl ! k).
ZIP(~Y k) = ~IP(I~I > k) ! C ZIP(IZ I ~ k) ! CEIZ I < ~ so
P(~#Yk i.o.) = 0 by the Borel Cantelli lemma. So it is enough
to show
-i n y + n ~i k m.
(ii)
(iii)
Now n -I n ZIE(Y k) ÷ m as follows.
m(Yk) = m k = f[Xkl<k~dP = m- fiXkl>_k~dP
But
IflXkl>kXkdP[ = kP(I~ [ > k) + /kP(~>t)dt
i C~[e[>_k IZIHP ÷ 0 as k -~ co
since EIZ I < 0% Thus E(Y k) ÷ m so n -I n Z I E (Yk) + m.
So we need only show n -I n Z I (Yk-E(Yk: -* 0 a.s. By Kronecker's
oo lemma this occurs if Z I (Yk-E(Yk))/k < co. Since Yk are independent
oo E(Yk)) 2/k2 this occurs (by MGCT) if Z I (Yk- < oo which occurs if
EiE(Y k- E(Yk))2/k2 < o~ It will do to show Z E(Y )/k 2 < o~.
k 2 k 2 10 t>dt < t0 >t>dt
k 2 <C tO P(Z2>t)dt = C fk p(iz I >_ u)Eudu.
Thus
oo 2 o~ -2 k ZIE( Y )/k2 < 2cZi k f0uP(IZl > u)du
2 5 4
~ k-2 _k i 2 ~ 1 ~1 7i-1 ~p(!zl z u)du
~k-2 k i 2o~I z 1 i / i _ l P ( I z l z u)d~
~ ~ - 2 . i 2cZi= I ~k=i k i/i-I P(IZI ~ u)du
i 4cZi: I /~_iP(iZl ~ u)du
= 4cEiZi < ~.
Remark. It is possible to give a theorem with ~ a martingale difference
sequence but the added generality is not useful since the conditions could
hardly be checked in practice.
LECTURE 12: ASYMPTOTICS OF REGRESSION MODELS
12A. Introduction
The aim in the next few lectures is to consider the asymptotic theory
(consistency and central limit theorem) of least squares estimators (Ise)
for the two basic linear (in parameters) time series models. The AR
where
and the ARX
Yn = ~-O-Yn-I + ~n' n h i
-Yn-i = (Yn-i .... ' Yn-p )'
v T
Yn = -~0Yn-i + -$0Zn-i + gn
where Z is an m-vector of exogenous variables: s is a white noise sequence -n n
(specific properties are specified below).
These two models allow us to see how general an analysis can be given.
These models can be included in the regression model
Yn = 20Xn + ~ ' (12.1) n
If we allow x to be ~g,z -n ~n-I measurable (i.e. we suppose ~n is a function of
[n-i ..... [0' ~n-i ..... ~0 ) i.e. yn , ~n are fed back into -nX : ~0 is an
r-vector. The exogenous variables are taken to be independent of the a . n
As a sort of bench mark it is first worth reviewing the asymptotic
theory of Ise for the regression model with x independent of s • -n n
12B. Regression Without Feedback
The Ise is
T = (x,x)[1 n (x'x) = n~s~s. -n E1 ~sYs ' n E1
Thus subtracting off the true value ~0 gives
256
~ 20 (x,x)~l n = - = ~1 Xs~s" - ~ - n -
T Then the variance of 8 is
var(2n) = (X'X)-in 02.
We have then
(12.2)
THEOREM 12.1. In the regression model (12.1) if x are independent of -n
gn: Sn are uncorrelated zero mean with E(~) = 02 then -n ~ ~ ~0 if
O(X'X) ÷ oo n
(12.3)
where o(X'X) n = smallest eigenvalue of (X'X) . n
D Proof. 0 _5+ 0 if for an arbitrary fixed vector ~, ~'@ ÷ 0 in mean
n . . . . n
square and this follows via C h e b s h e v ' s i n e q u a l i t y f rom ( 1 2 . 2 ) , ( 1 2 . 3 ) .
Since a'(X'X) -I ~ ÷ 0 for an arbitrary ~ iff o(X'X) ÷ ~. - n - - n
Remark. The condition (12,3) can be called a generalized persistently
exciting condition.
The question of a.s. convergence of -n
has only recently been obtained.
is much harder. The basic result
THEOREM 12.2. In (12.1) if -nX are independent of gn; sn are a nonlinear
innovations sequence or martingale difference sequence i.e. E(enI6n_l...el)
= 0 and E(~I~n_I...£1) 2 = a.s, then -n ~ ÷ 20 a.s. if (12.3) holds.
Proof. See Lai et al. (1979). In the Gaussian case there is a very easy
proof as follows. Consider that
-n n-i + (X,X)nl = (X'X)nl E 1 _XsY s XnY n
(X'X)n I A = (X'X)n_I@n_ 1 + (X'X)-I x y n -n-n
= - n-1 + (X X)nl n (Yn 1 )
257
=> ~n = ~n-i + (X'X)nlxne n- (12.4)
= -- ~ en Yn ~n~n-i"
It is easy to show the residuals e are uncorrelated and have variance n
°2(I+x'(X'X)71~n ) ' - n n Also it is easy to see the en are linear combinations
of the Sn hence if ~n are Gaussian so are en," but en uncorrelated => en
independent . Now l e t ~ be an a r b i t r a r y f ixed vec to r and l e t ~n = d ( e l " ' e n ) "
Then if T = ~'6 we have from (12.4) n - -n
E(T21~n-I ) = T2n-i + 0"2(1 +-xn(x'X) n l x n ) X'-n (X'X>n2-Xn"
Hence by MGCT* T converges a.s. if n
E l~ (l+_nX' (X'X)n I Xn)(Xn (X'X)n 2xn ) < ~°. (12.5)
Once (12.5) is established we employ (12.3) and Theorem 12.1 to see that
T -~P~ 0 hence T + 0 a.s. Now (12.5) is proved in Appendix AI2. n n
= Remark i. In the scalar case (r = i) Yn Xn@O + g ) n
12.2 is well known. We simply observe that
the result of Theorem
n (90 On = (~i x2)-IEns ± = _ x ~ + 0 a.s.
$ s
oo n 2 if E lxsgs/V s < oo a.s. (Vn= ElXs) by Kronecker's lemma. Now by MGCT
T = n Xsgs/Vs converges a.s. if SUPn E(T2n ) < EiX2s/V2s < o% But n El
1 ~ 2s 2 2s/VsVs_l = ZTVs I _ V-I = Vl I ÷ o% The difficulty in x /V _< Z x i s if V n
the vector case is that the Kronecker lemma holds only under restrictive
conditions,
Remark 2. Actually the result of Lai et al. (1999) allows g n
correlated noise. If g has the property n
to be a
2 ~ (12.6) ZI a n < ~ ==> E lane n converges a.s.
Then Theorem (12.2) holds. Condition (12.6) is satisfied by certain types
of strictly stationary linear processes (see Solo (1981)).
258
Remark 3. If g do not satisfy (12.6) but say are only uncorrelated or n
stationary with bounded spectrum then the natural result is
÷ ~0 a.s. if ~ -n p+2 log s/(sq(X'X)s ) < ~.
This result is proved in Solo (1981).
We turn now to the CLT.
THEOREM 12.3. In (12.1) if x are independent of E • Suppose g inde- -n n n
pendent and 2 s t r o n g l y u n i f o r m l y i n t e g r a b l e . L e t (X'X) ½ be a s y m m e t r i c n n
positive definite square root of (X'X) . Then n
if
) l)
max , -Ix. x. (X'X) ÷ O. --i n --l
(12.7)
Remark. If g. are iid the result is true. 1
Corol!ary. Condition (12.7) is implied by
o(X'X) + oo n
T -i x (X'X) x + O. -n n -n
(12.3)
(12.8)
Proof.
that
Let ~ be an arbitrary fixed vector (but ~'~ = i). Then consider
~ ~ n ~' (x'x)~(en-e0) ~' x'x 2 n =
= - ( )n Z1 5sea ~iZns~s
where Zns = ~' (X'X) 2Xn-s" Set ]n,s = ~s = d(Sl'''s- s )" Then Zns are trivially
]s-i measurable. We need then only check the conditions of the corollary
to Theorem 11.5. The conditional variance is
while
V 2 = En Z 2 = i n i ns
Z 2 , -I max < max x. (X'X) x. * O. l<i<n ni -- l<i<n -l n -l
The proof of the corollary is straightforward.
259
12C. Regression With Feedback
Now we return to the case where the regressors -nX may be -n~-i measur-
able i.e. depend on the past. A result like Theorem 12.2 is not possible.
We begin with a basic result.
~IEOREM 12.4. In the regression (12.]) with x -n
÷ ~0 a.s. if (12.3) holds and -n
being <-i measurable then
x' ~I ~r+l-n (X'X) Xn/O(X'X) n < ~. (12.9)
Proof. Begin by observing
~o ~ (x'x)[ 1 ~ - = = E I XsC s --n -n -
Also then
If we introduce
n-I (X,X)~I Xngn = ~n ~ (x'x)~ I El ~s~ +
(X'X) ~n = ,n-i n E1 ~sgs + ~ngn"
n = -n [' (X'X)n-Qn
Xs s )
=> E(QnI3 _i ) = (Zl-lXsSs)'(X'X) + (X'X) x --n -n
Qn-i + O2-nx' (X'X)n I }n"
Now divide throughout by ~ = o(X'X) ; call Qn = Qn/°(x'X)n n n
dcn = ~ - => =O(X'X) ) n °n-i (On n
and
E(QnI~_ I) ! Qn-I - Qn-ld~n/On + 2_nX, (X'X)n l_xn/o(X'X) n.
T h u s b y ( 1 2 . 9 ) a n d MGCT e we d e d u c e
(i2.1o)
(12.ii)
also
Qn converges a.s. to Q < ~ a.s,
co
E1 Qn_idOn/On < o%
260
Thus to avoid contradicting (12.3) we must have Q = 0 a.s.
Finally, the result follows since Qn - > Ii~n-~0 [12"
THEOREM 12.4'. Call i(X'X) = % = largest eigenvalue of (X'X) n. If n n
then
r+l (log ln/%n_l)/O(X'X)n < co
÷20 a.s. -1%
Proof. Introduce V = det(X'X) . From the identity n n
x' n = (V n- Vn_ l)/v n" (X'X)nl x n
We find (call o = o(X'X) n) n
ENr+I -nX' (X'X)nl -Xn/~n = ENr+I (Vn - Vn-l)/Vn°n
n dx < O n d~x = Zr+l n n -- Zr+l - x
= EN o -I (in V - in V n i ) r+l n n -
In V N EN in Vn_ I
< O N + r+l On_ I
du n (sum by parts).
(7 n
However,
In V < r in 1 n n
so
i in l N ~N in In_ I don]
_< r o N + r+l On_ 1 O n
= ~N in(%n/%n-l) + c. r ~i o
n
The result follows from Theorem 12.4 on letting N ÷ ~.
(12.12)
Remark.
that
From the identity (12.12) we also deduce the interesting fact
lln_sx, (X'X)s l~s = Eln(V s_Vs_l)/V s ~ Enl in(Vs/Vs-l)
~ in V ~ in %(X'X) n. n
261
12D. A General Central Limit Theorem
Consider the quantity
= (XX> ½ _ _ ~iXs~s •
2 THEOREM 12.5. In 12.1 if ~ independent, 6 strongly uniformly integrable;
n n
if for each n there is a positive definite symmetric non-random matrix B -n
such that
~n 2(X'X)~ ~p ! (12.13)
E(B n2(x'x)nBn½ ) ----+ ~ (12.14)
and if
max i~" <iSm
x: (X'X)-Ix. P--~ O. (12.15) --l n -I
Then (X'X)~ (@n- 00) => N(9' !)"
Proof. Let us write
(X'X)~ (in--00) = (X'X)½B-½n-n -n B½ Eln~sgs"
Thus in view of (12.13) we consider B½-nEln-Xsgs" Let ~ be an arbitrary fixed
vector (but ~'~ = i). Introduce the array X = Z Es, Z = ~'B -½x , - - ns ns ms - -n -s
Fns = Fs = o(~ l...Es). Then we can apply the corollary to Theorem 11.5.
The conditional variance is
V 2 = ~'B -½ (X'X) B -½~ P-~+ 1 by (12,13). --n n - n -
Also E(V~) + i by (12.14). Finally
, 1 xi -~ max Z 2. _< max x i (X'X) n ~'B-~i(X'X) Bn2~ ~ 0 by (12.15) and (12.13). l<i<_<_<_n n: l<i<n - - -n n - -
Remark. If ~(X'X) n ÷ ~ then (12.15) may be replaced by
x' (X'X)~ Ix n P_Z+ O. -n
Notes. Theorem (12.1) is due to Eicker (1963). Theorem 12.2 is due to
262
Lai et al. (1979). The Gaussian result was proved (by a much longer argument)
by Anderson and Taylor (1976). The proof given is basically due to Sternby
(1977); see also Solo (1981). Theorem 12.4 is due to Solo (1978). With another
proof Theorem 12.4' is due to Lai et al. (1981). Theorem 12.4' with an extra
condition was proved in Solo (1978). Theorem 12.3, 12.5 are well known. A
theorem like 12.5 is quoted in Lai et al. (1981).
APPENDIX AI2. CONVERGENCE OF A SERIES
We show if (X'X)n = Eln_xsX~_ then
z I (i+ ~' (x'x)~ 1 x' ~2 -n ~n ) -n (X'X) ~n < ~"
We begin from
= (X'X)n_ 1 (X'X) n x x' -n-n
(AI2. i)
~> I = (X'X)n(X'X)nl I - XnX n (X'X)n_I 1
=> (X'X)nl : (X'X)n_I 1 - (X'X)nlxnx n (X'X)nl 1
~> tr(X'X)nl = tr(X'X)nl I - _x n (X'X)nI(X'X)n_IlXn
, => ~r+l-n (X' i Xn -< tr(X'X)-in
(AI2.2)
(AI2.3)
On the other hand from (A12.2)
x; (x X)n l :
Using this in (A12.3) gives (AI2.1).
x' (X'X) -I --n n
x < x ' x ) x 1 + - n n - n
LECTURE 13: LEAST SQUARES ASYMPTOTICS IN AR AND ARX MODELS
13A. Consistency in ARModels
Consider once more the AR model
Y = ~ ' ~ n - 1 n
where Sn i s a mds w i t h E ( ~ I F n _ I ) = o2:
+ c n
t
F n = <~(Sl...¢n ) :
(13.1)
= (~l'"c~). P
of the poly- The behavior of (13.1) depends critically on the roots ~i
nomial equation
zP(I-~(Z-I)) Z p -P a Z p-i 0. (13.2) = - L1 i =
If all I~il < 1 the process has bounded variance and is as~totically
stationary. If some ~i are on the unit circle the process has growing
variance and it will be called a random wanderinl model (RW_) (this derives
from random walk - the simplest case). If some roots are outside the unit
circle it will have geometrically growing variance and will be called an
e__xplosive model. From a time series point of view the ease l~il ! 1 seems
to be most interesting so only this will be treated.
We can prove the strong consistency of the least squares estimate (ise)
= (y,y)~l n _ n , -n EiYs_l~s; Y'Y = Z IYs_IYs_l using Theorem 12.4'. To do this we
have to investigate the relative growth rates of I(Y'Y) n and °(Y'Y)n the
largest and smallest eigenvalues of (Y'Y) . n
Suppose then that the roots of (13.2) are inside or on the unit circle.
Let A = number of roots on the unit circle and factor
1 - ~(L) = (I+A(L))(I+a(L)) where zA(I+A(Z-I)) = 0 has only roots on the
unit circle and zP-A(I +a(z-l)) = 0 has only roots inside the unit circle.
Introduce two auxiliary sequences
u t = (I+A(L))Yt, v t = (I+a(L))Y t
=> (l+a(L))ut = ~t' (l+A(L))vt = ~Tt" (13.3)
264
Now i~troduce u -t
Now we can write
where
= (Yt-l" " "Ut-A) ' T
, V t = (Vt_ l...vt_p+A)'', W t = (Ut, Vt)"
= Tit
T =
p-A
-i A 1 ... A A 0 0
0 1 A 1 ..... A A
0 ..... 0 I...A A
1 a I 0 0 .... ap_ A
_0 ......... 0 1 a I. .ap_ A
Futher T is nonsingular.
So we need only deal with ~ . -n
observe firstly that
So if we introduce (W'W) = n n E 1 Wt_lWt_l we find
n ! -1 ~ : !-RY'Y)~IzI !s-l%
(T'(Y'Y)nT)-IE I [[s_iEs
(W,W)~I ~n = ~ say. = Z1 ~s-lSs -n
To calculate the eigenvalues of (W'W) n
(i) Largest eigenvalue
A(W,W) n < p(Ei u 2 + n 2 -- t ZlVt). (13.4)
Since i + a(L) is stable it follows by the basic Lemma A8.3 that for some c
n 2 nu2 _< c s t Z1 t E1 "
±@ Now consider 1 + A(L), all the roots have the form e
two sequences d t, b t related by
(l_e-i@L)dt = b t
= eit8 t -is@~ => d t E 1 e o s
(13.5)
Consider then
265
idtl2 t 2 => ~ t ~1 Ibsl
n idti2 -> I I n .t [bs[2 E1 Ibsl2 zn --< ZI t ~I = s t
n 12 ~n 1 ~n ibsl2 --< Y~] Ibs ~i t = ~ n(n+ i) 1
=> ZI idtl2 n 2 n 12. n ~ El ib s
Continuing we see that since 1 + A(L) has A roots
2A vn 2 n v 2 < n ~i ZI t -- ~t"
Thus from (13.4) we find
%(W,W) n ~ n2A Zln 2_t ~ n2A+l
(ii) Smallest eigenvalue.
k (E'E)n = Z l~s_l~s_l •
A8.1 that
clearly
= ), Introduce ~t-I (St-l'''St-n '
In view of (13.3) it follows from the basic Lemma
c o(W'W) ~ o(E'E) n n
n 2 ~7(E'E) ~ p - ~ pn.
n ZI -t (13. Sb)
(13.6)
(13.7)
(13.8a)
THEOREM 13.1. In the AR model (13.1) if all the roots of (13.2) are !l
then ~n ÷ ~0 a.s.
Proof. In view of (13.7), (13.8) the result follows from Theorem 12.4'
(with a little extra work).
Remark i. An obvious question related to the AR model is the consistent
estimation of the autocovariances when all the roots of (13.2) are <i.
This can be proved quickly by an argument similar to that used in Theorem
14.2 (see Appendix AI4).
266
13B. Central Limit Theorem
The situation here is rather complicated but interesting. 7~o cases
will be discussed firstly where all roots of (13.2) are < i; secondly,
where one root =i.
THEOREM 13.2. In the AR model (13.1) if all roots of (13.2) are < i and
g. i n d e p e n d e n t , g2. s t r o n g l y u n i f o r m l y i n t e g r a b l e and (Y'Y)½ i s a s y m m e t r i c 1 1 n
positive definite square root of (Y'Y) then n
Proof.
n ½ (Y'Y)~ (~ -~0 ) => N(0, I ) -n - -p
It f o l l o w s f rom Remark 1 a b o v e t h a t
-i n (Y'Y)n ÷ [Ri-j]l<_i,j<n = R (13.9)
where ~ = lira E(YiYi+k). i-~o
apply Theorem 12.5 with B -n
theorem we need only show
Also R is positive definite. Thus we can
= R-in ½ . In view of the remark following that
y, (y,y)-I y p~ O. -n n -n
From (13.9) this holds if Y' R-Iy /n P+ 0 i.e. if II!nII2/n p-E+ 0 which -n - -n
holds since EII!nll2/n ÷ 0.
When at least one unit root is allowed the limit theory is deeper:
non Gaussian limit distributions are obtained. The idea is briefly indi-
cated.
Consider the simple model
Yt = ~Yt-i + gt; ~ = i (i3.10)
where ~ = i, Y0 = O, E(gt~s) = 6ts , E(ct) = O. The ise is
(E 1 2 -i .n = Yt_l ) >21 YtYt_l
-I => d- I= V U
n n
U ~n V n y2 n = EiYt-lgt ; n = E1 t-l"
267
t 2 Since Yt = E 1 a s so E(Y ) = to we see from Un = Un-1
V = + y2 that n Vn-i n-i
1 4 1 2 2 E(U ) = ~ n(n+l) ~ ~'o n
+ enYn_l and
1 2 2 s(v n) ~ 7o n .
Clearly the usual Gaussian limit is not obtained. The natural quantities
to consider are n-fUn , n-2Vn" First observe that Wnt = El gs/~f~n behave s t
like a Wiener process on [0, i] viz. E(WntWns) = min (t/n, s/n). So writing
n-2V as n
n-2Vn = n-I ~i vn(E~ -I Cs/~)2
suggests the Riemann sum converges to
n-2Vn ~ f~W2(s)ds
when W(s) is standard Brownian motion (see Billingsley (1979)). Similarly
we are led to the calculation
t-I n-iU n ~t E1 as ~,n fl 0 = -- L I dW W W(t)dW(t). n E1 /~ /~ nt nt
This has to be interpreted as an Ito integral to give a value
1 n-iUn ~ ~ (W2(1) - i).
Alternatively, we can calculate directly that
t o t l (E st)2 = E g + 2 EI~; t E 1 a s
1 -~ n 1 n-i n 2 => n-Iu n = ~(n ~E la t) 2 _ ~ llgt
--> n-iU i n ~ 2 (W2(1) - i)
as before. Thus
268
The consequences of these calculations are discussed further by Dickey and
Fuller (1979, 1981).
13C. Consistency in ARX Models
The model being considered is
Yn = -~'Yn-i + -B'Z-n-I + gn (13.11)
where Z is an exogenous sequence independent of g ; ~ is a p-vector; -n n -
is an m-vector. Again the roots of (13.2) determine the behavior of
(13.11). Again only the case with roots ~i seems of interest in time
series. Consider the case _B'Z_n_I = ~(L)Z n where ~(L) = Elm BiL i. Introduce
the m+p-vector Z = (Zn_ I. .Z )' and (Z'Z) = n ~ ~ The -n " n-m-p n E1 ~s-I ~s-l"
following result is available.
THEOREM 13.3. In the ARX model (13.11) suppose all the roots of (13.2) are
<i. Call t = n ~2 E 1Z k + n. Call _e = (~'$)'_ _ , let -n ~ be the ise of _0" n
(i) If E 1 in(tn/tn-i )/~(~'~)n < ~ then -n ~ ÷ ~0 a.s.
(ii) If in tn/O(Z'Zn) + 0 then -n ~ -~p+ ~0"
Proof. See Solo (1982) (the vector case and the case roots !l are also
dealt with).
Remark i. In either (i) or (ii) we must have in t /o(Z'Z) n n
implies in n/O(Z'Z)n ÷ O,
÷ 0 which
n 2 c Remark 2. Suppose E I Z k ~ n c > i. Then in t
n
convergence we need
~ c in n so for a.s.
Z 1 (no(Z'Z))-i < ~. n
For weak convergence o(Z'Z) /in n ÷ ~ will do. If n
n 2 c ZlZk ~ n , c < i.
Then in t ~ in n so we still need (13.12). n
(13.12)
269
Remark 3. Another obvious question here is the issue of identifiability.
It is clear from Lecture 7B that -n ~ consistent for ~0 => ~0 is a-identifiable.
The question is whether the conditions of Theorem 13,3 are minimal in any
sense for a-identifiability. The K-L information function is
-2Hn(0, 60) = E(e~(O)) - E(~)
where en(e) = Yn - ~(L)Y n - $(L)Z n. Thus
-2Hn(e, e O) = (e- ~0)'(W'W)n( 2 - 20)
n Er~k-i n ~k-l] , ' (W'W)n = E 1 ~Zk_ 1 (!k-l~k-l)' > Z 1 Zk_lJ(~k-l~k-i )
where S k = (i - ~o(L))-I$o(L)Zk o Now if ~ # 20, Hn(8 , 80) ÷ -~ iff
d(W'W)n ÷ ~ this occurs if ~ a p +m vector %_ = (T_ : with
~ (L)s k + %$(L)Z k = 0
(or T (L)y k + ~(L)Z k = 0).
Alternatively by Lemma A8.2, ~(W'W) n ÷ ~ if d(Z'Z) n
Notes, Theorem 13.1 is due to Kawashima (1980) with a different, very
tedious proof. This result has also been proved by Lai et al. (1982 b)
(not available at the time of writing). Some central limit theorem results
related to Theorem 13.3 are given by Fuller et al. (1981). Lai et al. (1982 a)
proved a.s. convergence of the ise but the conditions they give are not
easily checked. The ones in Theorem 13.3 are. When some roots are < i, some
are >i, Stigum (1976) proved a.s. convergence: see also Lai et al. (1982 c)
(who seem to have overlooked Kawashima and Stigum). Dickey and Fuller (1979,
1981) discuss distributional theory associated with models like (13o10).
LECTURE 14: LEAST SQUARES ASYMPTOTICS FOR ARMA MODELS
14A. Preliminaries
We turn now to consider parameter estimation in the ARMA model
(l+a(L))y k = (l+c(L))~k, k ~ 1 (14.1)
~k is white noise, E(~) = d 2. Now the presence of the MA terms makes
the estimation and analysis of this model technically complicated. The
basic reason is that the ~ parameters enter non-linearly into the least
squares function. The desirable way to estimate the parameters in (14.1)
is to maximize the likelihood function. However, the analysis of the
asymptotic properties of the maximum likelihood estimator is surprisingly
technically complicated. Consequently, only least squares estimates will
be considered here.
Before continuing it will be useful to list some assumptions. Denote
= (al...a p bl...bq) , Q £ IR8 which is a compact subset of the open set
on which
zP(I+a(Z-I)) = O, Zq(l+c(z-l)) = 0 have all roots <i in modulus. (14.2)
Under Assumption (14.2) we have a Taylor series
(l+c(L))-l(l+a(L)) = ~0 hs (~)Ls' h0(~) = 1
under Assumption (14.2) h (8) will comsist of geometrically damped sines s
and cosines. Thus the following will hold
% < 1 ~ V~ e ~@, lhs(~)] < %s Vs. (14,2a)
Also it follows that
H(ei~l~) = Eohs(e)elS(~ is continuous in 6 e ~6" (14.2b)
Since ]R 9
2 ~k are independent; ~k
(see Appendix BII).
271
compact H(ei~10) is uniformly continuous on ~, is
are strongly uniformly integrable
Yk is second order stationary.
(14.3)
In place of (14.3) typical assumptions are that ~k is a strictly stationary
ergodic martingale difference sequence. Conditions (14.3) have the advan-
tage that they are not placed on the joint distributions of the ~k'S only
on the marginal distributions. Also introduce the spectrum
~(~19) = 2 Ii + c(ei~,)12/11+ a(ei~)12 (14.5)
and note that
¢-i(~I0 ) = a-2 [Z0hs(0)ei~Sl2. (14.6)
The obvious way to generate an error sequence for a sum of squares
function is from
ek(~) = (l+c(L))-l(l+a(L))y k. (14.7)
However, this involves the infinite past. If we start (14.7) up from
some initial conditions then we get
ek(0)- = ~0~'khk-s(0)Ys- + gk(~)_ y_ 1 (14.8)
where gk(O ) are geometrically decaying initial conditions and in view of
(14.2a) can be neglected in the ensuing discussion (actually they can be
subsumed in the sum). Also y_ 1 : (y_l...y_p) are initial conditions (e.g.,
k-I for ARMA(I,I) gk(9) = c a, [-i = Y-1 )"
The sum of squares function is then
1 -i so~n Sn(~) = n e (0).
Since Yk is stationary there is, for each 0 an initial condition y_l(e)
which makes ek(!) stationary. If this initial condition is used rename
ek(0) as $k(9). Then we assume
(14.4)
272
t h e r e i s a t r u e v a l u e -~0 ) ek(-~O ) = ~:k" ( 1 4 . 9 )
The ensuing analysis can however be done without this assumption.
The least squares estimate (Ise) is found by minimizing S (8) over n
IR e . This is also called the prediction error method since ek(@) in (14.7)
is a prediction error. To analyze the behavior of the ise we have to look
at the asymptotics of S (0). From (14.8) n
= = e ¢ ( c / ) d f ; ¢ (w) = ¢ ( c o l e o ) . Rk E(yiYi+k) /_~ iko0
THEOREM 14.1. Under conditions (14.1) to (14.4)
lira E(e2(_@)) = E 0 ~ohs(~)hj(e)Rs_j
= f_~ ¢(uo)/¢(wte)df = s(e)
uniformly in _0 c IR~.
Remark. It follows from the remark after (14.8) that S(@) is uniformly
continuous in e c IR e .
Proof. The proof is a consequence of the following observations, the
dominated convergence theorem and (14.6). We have
.k, (~. iscol2 k~khs(e)hj(8)R s j f_~ IEon s v ) e ¢(0J)df Z 0 _ _ _ =
khs(e)eiS°~ 1 < ~O %s = (l-l) -I V e ~ IRe {Zo
and
E k hs(e)eiS~ + Zohs(8)eiS~° uniformly in e.
We now prove the same result for S (8). n
THEOREM 14.2. Under conditions (14.1) to (14.4)
1 1 sn(e) ÷~ s(_e) = ~ f_~ ¢(o~)/¢(~o[e)df
uniformly in _8 e IR e .
a.s.
(14.10a)
Proof.
now agree that y_j
273
n-i n 2 n-I n Z0ek(_@) = E0ek(_@) Zkhs(_0)Yk_ s
n-i ~ n hs(9 ) n = s= 0 - Ek= sek(-9)yk- s
=0, j >_0
n = n -I Zn=ohs(e ) Zk=Oek(_e)Yk_ s
= n n k -i xn=0hs(9) Zk=0Yk_s EoNg(-6)Yk-%
n n
= n -I zn=0hs(9 ) E~=0N£(@) E£=kYk_sYk_ ~"
Now recall the convention above so
1 in n hs(O)h£(9) n-i ~n Sn(-@) = 2 s=0 Z2=0 - - max(s,£) Yk-sYk-£"
Now call
Rs,~(n) = (n -I Z n max(s,£) Yk-sYk-£ ) I (max (s,£) ! n).
Then for each s, % a.s. Rs,£(n) ÷ Rs_ ~ (see Appendix AI4). Further
iRs,£(n) I < (n-I En 2 -IEn 2 -- max(s,£) Yk-s n max(s,£) Yk-~ )
< n-i n 2 -- Z0Yt = R0(n) say,
Now R0(n) ÷ R 0 a.s. so given g 3 n0(g) ~ V n > n0(E)
Now set
R 0 = max IR0(n) I. n<_n0(O
rt
Then Vn IR0(n) I ! R 0
Thus Vn
Now rewrite
. v
R 0 = max [R 0, IR01 + c].
~v
IRs,~(n) l _< R 0
tRo(n)- RO] < ~.
i ~ ~ -s -£ AsA~ Sn(e) = ~ Z£=0 Es= 0% hs(0)% h£(~)Rs,£(n) •
(14.10b)
274
Then by the dominated convergence theorem and (14.2a) the result follows.
Remark. For future reference it is worth noting that the following results
can be similarly established.
Remark.
where I (W) [ n y = Z 1 Yj
From (14.10b) follows the approximation (cf. (14.10a))
i f~ly(C~)/$( w Sn(-~) = 2- [8)df
eiWJ ! 2 I •
uniformly in
dS de k n + d S (ek(8) ~- d@ d@ = lim E ) a.s.
k-+oo
= -L% @(w) d log @ df
uniformly in O
I (8) = - - n
d2Sn d2S
dSdQ' dSdS'
d2ek I dek dek] EIek d~d, ] lim E ~ - - + lim
k-~o dS'J k-~o
~ d 2 log @
= - f-g O(wl@) dOdO' df + /~ @(w) d log ~ dlog @ df
.d@ dO'
In(8) is the sample information matrix.
Note that since f~ -~ ~(wl@)df = 1 we deduce dS/d8 0 = 0 and
16080 = i(@0) d2S , f~_ $(w) d log $ d log, $ df.
d@od@ 0 d~ d@
(14.ii)
(14.12)
275
14B. Consistency of Least S~uares
Now strong consistency of the ise can be proved.
as the value such that Sn(9 n)_ ~ Sn(8)_ _9 c
The ]se 9 is defined -n
THEOREM ]4.3. If the model (]4,8) is a-identified at ~0 i.e.
S(!) = S(20) :> ! = ~0 then with conditions (14.1) to (14.4), ~n ÷ ~0 a.s.
Remark. The model orders p, q are assumed known: this eases the a-identi-
fiahility issue considerably.
Proof. Now -n ~ belongs to 11%@ so is a bounded sequence. Let 0* be a limit
point of -n ~ " We show S(~*) = S(90) which proves the result.
Consider the following We have S(£*) Z S(90) and Vn Sn(~n) ! Sn(90).
sequence of i nequa l i t i es (n to be chosen)
0 ! S(9") - S(80) ! S(8") - S(@n) + S(gn) - Sn(0n)
+ Sn(6 n) - Sn(00) + Sn(00) - S(Q0)
S(8*) - S(gn) + S(~n) - Sn(~n) + Sn(90) - S(90).
£ Vn > n 1
< [: Vn > n 2
< £ Vn > n 2.
Now take moduli
by uniform continuity i S(0* ) -s(e n) I
by uniformity of limit IS(~ )- S (9) n n n
by uniformity of limit ISn(90)-S(90)
Thus take n > max (nl, n2) giving
0 <_ S(O*) - S(90) _< 2s
but s is arbitrary => S(@*) = S(90): hence result.
14C. A Central Limit Theorem
The basic idea is the classic one for producing central limit theorems.
Expand the equation dSn/dRn = 0 in a Taylor series about 00
276
0 = dSn/d~ n = dSn/d80 + In(8:)(6 n-80 )
where Q* is an intermediate value with II_~:- e_0!l _< []0n--°011 provided ]IR@ is -rl
convex. The idea then is to invert the Taylor series to see
dS #-n (-~n --GO) = [In(8:) ]-i ~nn d8 n
n
However we need In(e:) full rank to do this. If we suppose
l(B) is continuous for e e IR 8 (hence uniformly continuous) (14.13)
then we can show In(8:) + l(Oo) a.s. Then Vn > n o say, In(8*)n is full
rank (since 1(80) is). Then it is enough to provide a CLT for
n -½ dSn/d@ 0.
THEOREM 14.4. If conditions (14.1) - (14.4) hold, and (14.8) is identified
at 80 then if (14.13) holds and IR% is convex
(~n-~O) => N(O, Z-I(80)).
8" Proof. Now -n ~ ÷ ~0 a.s. so -n ÷ ~0 a.s. Further
IlZn(e : ) - t (eo:li < II I n ( 8 : ) - I (8:) i l + II t (8 : ) - I (90)ii
by uniformity of limit II~n(e:)-~(e~)ll < ~ Vn > n 1
by uniform continuity
Thus
In(e: ) ÷ !(80 ) = Z8080
so we have to show n ½ dSn/d~ 0 ~> N(O I~l~ ) . -' -¥oUo
lli(e~)-Z(eo)ll < ~ vn > n 2
a.s.
Now
n½ dSn/dS0 _½ n = n I I ek(eo)dek/d8 0
= n-½~Igkdek/d@ 0 + n -½ Zln (ek(e0) _gk)dek/d@o.
Now the second term-~P~ 0 if
(14.14)
n -~2 E 1Elek(~0 ) - 8kllldek/d_0011 ÷ O.
By the Toeplitz lemma this holds if
This is bounded by
277
E lek(00) - gk1[l dek/de011 ~ ÷ 0.
(kE(ek(00) -~k )2 EIldek/d~oiI2) ½
so it is enough to show
2 kE(ek(00) -£k ) ÷ 0
since the second term ~ tr(l(@o)). We show (14.15) at the end.
is shown that the first term in (14.14) obeys the required CLT.
Introduce the array
-~ , F ~ X = Z Z = n ~ des/dO O, F = • nS ~SCS ~ NS ns s
(14.15)
Now it
In view of the corollary to MGCLT Theorem II.5 we need only show
V 2 n Z 2 p+ d2_,l(@o ) = _ (% n Y~I ns
E(V2n ) d 2 , ÷ ~ _~(eo)
max Z 2 --P+ O. l<s<n ns
The first two follow by arguments already given cf. Theorems (14.1), (14.2).
The last will follow if n -I llden/d@0112_ -P-P+ 0. But this holds since
n-i II den/d_00112 = n-I E1 II des/d_eoIl 2 _ (i - n -1) ( (n- I)-I ii-i II des/d20112)
tr(I(@0)) - tr(I(@0)) = 0
Finally, (14.15) must be established. Recall the discussion between (14.8)
and (14.10) then
ek(@O) - F~ k = ek(@ O) - ~k(80) = gk(@_o)(Y_l(eO)-Y_I ).
Thus
E(ek(00) -gk )2 <_ [l_gk(00) 1] 2 c
278
and
However, iigk(8O)II 2
c : EIIZ_I(G O) -Z_III 2 < ~J.
÷ 0 geometrically so (14.15) holds.
Notes. Consistency and CLT are discussed by Jennrich (1969), Hannah (1973).
Hannah and Nichols (1973), Caines (1976), Ljung (1976), Kohn (1978),
Rissannen and Caines (1979), Dunsmuir and Hannah (1976). Early results are
due to Walker (1965). The proofs given here are the author's, many of the
elements can be found in the above references. Also, Caines and Ljung
(1976) emphasize analysis when the stationary process Yk is not A~MA but
an ARMA model is fitted to it.
Extensions. The extension of these results to ARMAX models involves some
careful considerations concerning the exogenous quantities: see Hannan and
Nicholls (1976). It is also possible to deal with parameter estimation
subject to constraints. The theory can be developed in a manner similar
to that of Silvey (1959), Aitcheson et al. (1958); see Kohn (1979 a).
APPENDIX AI4: CONSISTENCY OF SAMPLE AUTOCOVARI~NCES
There are a number of ways to develop this. We suppose E(g~) = 2 K
and
2 ~k are independent; c k are strongly uniformly integrable.
Now suppose Yk are given by
k Yk = ~0 gk-s~s
this covers the ARMA case if we add a geometrically decaying initial
(AI4.1)
(AI4.2)
condition: it is easily dispensed with in the ensuing discussion hence
~-k its omission. We have to show n-12 yjyj+k ÷ ~ = lim E(YIYI+ k) a.s.
-i n 2 Actually we only need to show n lly j + R 0 a.s. To see this let _~ be
an arbitrary k-vector. We have our result if we show
279
-i n (~(L)y.)2 ~+ , R<~; R = [R i "]i< n E 1 3 . . . . 3 i,j<k"
But define yj = ~(L)yj so we are back to n -I Xl~nyj~2 ÷ R0"
It follows exactly as in the proof of Theorem 14.2 that
-i n 2 n n n-l,.n n E0Yj : E0 E0gsg£ ~max(s,£) tk-s k-£"
Now proceed as for the rest of that proof noting
c o
for each k n -I ~k~jaj_k -~ 0 a.s. if F. k~!:jsj_k/j < ~
which occurs by MGCT if Z~E(g~_k)/j
(A14.1) and Appendix B l l
2 < oo which it is. While in view of
-I n 2 2 n E0g k + 0 a.s.
we find then n-i ~0"nyj2 ÷ v °°~0 g2Js = RO"
LECTURE 15: ASYMPTOTICALLY EFFICIENT ESTIMATION FOR ARMA MODELS
15A. The One-Step Gauss Newton Scheme
While the present discussion gives an interesting asymptotic theory
it must be emphasized that the scheme may not work with finite data.
For AJ~ models with MA roots near the unit circle (or with cancellation
of unit root factors) it seems mandatory to use an exact likelihood
algorithm. This is firstly for numerical reasons and secondly because the
least squares estimates will be biased. The advantage of a least squares
approach though is that it gives a rapid view of the asymptotic theory.
Before it was easy to compute exact likelihood functions there was a
great interest in finding iterative linear methods for producing asymptoti-
cally efficient and consistent schemes. These are still worth studying at
least for the insight they provide about time series models. Further they
are still of relevance in providing starting values for an exact likelihood
calculation.
The usual method of finding starting values for ARMA parameters is as
follows
(i) AR parameters
Solve the Hankel equations H a = r .
(ii) MA parameters
Solve the (quadratic) equations for c(L) given by
2 12 L s c o Ii+ c(L) = ~q -qs
where R are the autocovariances of s
Ws = (l+a(L))y s.
There are basically two iterative methods for this; a linearly
281
convergent one and a quadratically convergent one (see Box and
Jenkins (1976, Appendix A6.2)).
With this in mind, consider now how we might solve the equation
dSn/dgn = 0 = n -I ~n~1 ekdek/d@l ~ t
N
A simple idea now is to use the Gauss Newton (GN) procedure. Suppose
~in is a consistent estimate of -@0 (the ARMA parameters) obtained e.g.,
as described in (i), (ii) above. We use -@in to initiate an iteration
as follows. Consider the Taylor series expansion for a new value ~2n
dS dS d2S n n + ___n ....
d02n dgln dg*de*'n n (-@2n -6in )
and ll@n*-91nl] % ll_@2n-@inl]. Since 9* is unknown it is natural to replace
2 it by ~ln" Fu r the r in the l e a s t squares s e t t i n g d Sn/dOlndeln takes the
form
d2S de k de k d2ek n 1 n 1 ,,n n E1 -- + --n ~lek(91n) - -
d ' deln 91n dgln dgln dglnd01n
Since 9in is consistent the second term will be negligible (compared to the
first) so the second term is dropped. The GN algorithm results on the
setting dSn/de2n = 0
-@2n = -@in - ~i (-@in)dSn/d@in
in(9 ) 1 n dek dek
= n El de dg'
Since 91n is consistent, ~n(Oln) should be of full rank (if n is "large
enough"). It would be interesting to prove that for fixed n this iteration
converges to the least squares estimate ~ . If n is "large enough" this -n
can be done (see Kohn (1979)). Here we settle for the interesting observa-
tion that 22 n is asymptotically efficient. This phenomenon is known in
other areas of statistical inference (see Cox and Hinkley (1974)) so its
occurrence in time series should not be surprising. The proof is now
sketched.
From the Taylor series
where
We find
282
dSn/dSln = dSn/d80 + In(8~)(81 n-80 )
I (8) = d2S /dgd@'. n
~2n = ~In - Z~ 1 (~In) [~n(SP (~ln- ~0 ) dSn/dSO]
:> _ = (@in)dSn/d@ 8
+ In I (~in)([n(@In ) - In(8~))(~in- ~0 )
:> ~ (82n-90) = (@In)/n dSn/d@ 0
+ In I (81n)(~n(@in) - In(Sn )) ~-nn (~in- i0 )"
by the argument used earlier the first term => N(0, I-i(80)). For Now
the second term since !IIn(Oln ) - In(8~)ll ÷ 0 (again by modifying a previous
argument) we will have the result provided say~nn (if n -90 ) converges in
distribution. In practice this is usually straightforward enough to show.
While this shows that one step of a GN iteration provides an asymptot-
ically efficient estimate it is recommended to continue to iterate to
(approximate) convergence,
There is a second way of looking at the GN scheme namely as a regression,
Consider the approximate Taylor series expansion
ek(8) = ek(8 I) + (8-81)' dek/d8 I.
Now viewing this as a regression of ek(91) on -dek/d@ 1 yields the ise
- 81 = I-ln (81)dSn/dSl
which is exactly the GN scheme.
However, in the time series setting there is one further interesting
283
feature. Because the parameters typically occur as coefficients on lagged
quantities there is yet a third way to look at the GN scheme (as a regression).
It will usually be possible to write the model naturally as a sort of
regression
ek(@l) = ek + (dek/d@l)'
where ek is a sequence depending on the model.
Now observe that
@i (15.1)
n dSn/d@ I = n -I E lek(@l)dek/d@ 1
= rn(@ I) + In(@l)@ 1
r(@) = n -I Elnek dek/d@l"
Thus the GN scheme becomes
92 = 91 - I-ln (@l)(rn(@l) + In(@l)91)
= -I-ln (@l)rn(el)
Thus the GN often reduces to the following prescription
regress ek on - dek/dO 1.
This idea is now pursued for two examples.
Example i. MA(q) model. We have
ek(6) = (l+c(L))-lyk ; 9_ = (Cl...Cq)'
=> dek/d@ = -(l+c(L)) -lek; e k = (ek_l...ek_ q)'
Now introduce ek = (l+c(L))-i ek and note that dek/d@ = -~k" Further
- (dek/d9) '@. e k
Now the fact that the sign here is different from that in (15.1) produces
the unusual formula
284
~2n = 2~in + !n I (~In)ln<!l)"
Actually, recalling the earlier comments about MA model fitting (=
factorization) as generalized square rooting makes the factor 2 a little
less mysterious.
Example 2. XAR model, Here
ek(e) = (i +c(L))Vk(e)
Vk(@) = Yk - b(L) Zk
9 = (c'; b')' = (Cl...c p : bl...b )' - m
= )' ~> dek/d~ = ~k (Vk-l'"Vk-p
dek/d~ = - ~k = -(Zk-l'''Zk-m )'
and
~k = (l+c(L))z k.
We can write ek(@) in two ways
ek(9 ) = v k + Zk£
ek(6) = Yk - ~k~
and
Yk = (i + c(L))y k.
So introduce
~bn(81 ) n-i n ~ = ZlYk~ k
N ~cn(@l) = n -I Z I Vk~ k
and observe that asymptotically we expect
n-l~n dek ___dek -1 nZkVk ÷ O. 1 dc db' n Z 1 _ _ _
So we obtain the two regressions
(15.3)
spectral
285
and
= _~-± b-2n -bn (-~i) rbn (-~in)
= _i -I
C2n -cn (21)rcn(-~In)
n~ ~
ibn(21 ) = n-i Z 1 ZkZ k
n
!cn(~l) = n -I Zllk[ k
This has the following interpretation. Using !In filter Yk' Zk to
produce Yk' Zk then do a regression of Yk on ~k to get ~2n" Using ~in
form Vk(01n) and regress it on Vk_l, ...,Vk_ p to obtain !2 n. This agrees
with the intuitive method of proceeding.
15B. Adjoint Variables for Gradient Calculation__ss
One motivation for the GN method is to avoid calculating terms
d2ek/d0d0' in the Hessian. However, in the time series setting there is
a computationally efficient method of generating the gradient dS /d@ and n
the Hessian d2S /d@d@' by avoiding such calculations. Again it is all n
because of the lag structure. The idea is easily seen for an ARMA model.
We have
i -i n 2 Sn = 7 n E le (0)
ek(~) = (l+c(L))-l(l+a(L))y k
Then
2 = (a'; _c')'.
dSn/da_ = n-I zlnekdek/da_
n dS /dc = n -I E lekdek/dc
Ii -
which suggests that to calculate dS /d0 we have to generate the p +q-vector n -
dek/d ~. Let us calculate these, we find
(l+c(L))dek/d ! = !k_l = (Yk_l..-Yk_p)'
286
(l+c(L))dek/d ~ = -!k-I = -(ek-l'''ek-q )'
The fact that the filtering is by 1 + c(L) in both cases enables the following
device to be used. Consider the backwards or adjoint sequence
%i (%n+l = 0 = ... = %n+q)
z = L -I. (l+c(z))~i = -el;
Then calculate as follows
dSn/da_ = n-i zlneidei/da_ (15.4)
= -n-i Zln (I +c(z))~idei/da
-i n + = -n ~1%i(i c(L))de./dal _ (write it out)
-I .n
= -n ~I li-Yi-l" (15.5)
Similarly we find
dS /dc = n -I ~n n - 'i %i~i-i"
These two expressions clearly cut down the computation enormously.
One scalar sequence %. replaces the p +q-vector sequence de./dg. This i i -
is especially useful if only a gradient scheme is being used to maximize
the sum of squares. However if the Hessian or d2Sn/dQd@ ' matrix is
needed then dek/d ~ must be generated anyway but (15.5) is still superior
to (15.4). Also the adjoint variables can be used to avoid calculating
d2ek/dgd8 '. This is left as an exercise.
The above discussion shows the role played by the adjoint sequence
%. in filtering. However there is a natural interpretation as Lagrange I
multipliers as follows.
Consider the problem
I -I ~k e2 i minimize ~ n 1 ei,a,_c
subject to
287
ei = Yi + a(L)Yi - c(L)ei"
We can solve this using Lagrange multipliers by forming
i n-i n 2 n -I E~%i(ei+ - a(L)Yi) Hn = 2 El ei + c(L)ei-Yi
and solving
~H /De. = O, SH /~a = O, SH /~c = 0. n 1 n - n -
We find
$H /~a = -n -I n n - E 1%iZi_ I = 0
SH /~c = n -I ~n n - Z 1%iei_ 1 = 0
SH /De. = e. + (l+c(z))%. = 0 n i l 1
The equivalences are now obvious.
Notes
(A) The GN idea was revealed in times series by Akaike (1973). The
unusual iteration (15.3) is due to Hannah (1970) who derived it
in a different way. Actually there are some differences since
Hannan uses spectral methods to produce the I (~) matrices: see -n
Kohn (1977). Some related work is in Nicholls (1976). Finally,
it should be said that many other ad hoc procedures for producing
asymptotically efficient schemes for particular models have turned
out to be Gauss Newton schemes.
(B) The use of adjoint variables is due to Goodwin (1968) and Kashyap
(1970).
LECTURE 16: HYPOTHESIS TESTS IN TIME SERIES ANALYSIS
16A. Lagrange Multiplier Tests
Some part of time series analysis is involved with tests of hypothesis.
This includes tests for order, order of differencing, the presence of auto-
correlation. It is always possible to construct tests by the likelihood
ratio principle and in small data problems this may be the best idea. Again
however with larger data sets other methods have advantages and insights
to offer.
There are three basic methods for constructing tests of multiparameter
hypotheses, the Wald (W) Test, the Likelihood Ratio (LR) Test and the
Lagrange Multiplier (LM) or score test. The LM test has the great advantage
that it only requires parameter estimation under the null hypothesis.
Recently, it has become clear that many previously suggested ad hoc pro-
cedures are actually LM tests.
Suppose the hypothesis to be tested is posed as a set of restrictions
on the parameter r-vector @, say
H 0 = h(~) = O; h is a p-vector,
If we denote -2 log likelihood by L(@) then to estimate ! under H 0 the LM
approach is to minimize the Lagrangian
L(@) + %' h(@)
where i is a p-vector of Lagrange multipliers. This yields a set of first
order equations
+ H), : 0 (16,1)
h(~) = 0 (16.2)
2 8 9
where D = ~L/$~, H = ~h'/$@. Also 2 is the restricted maximum likelihood
estimate (mle).
The W and LM statistics are based on two Taylor series expansions.
Let 8 be the unrestricted mle then since $L/$~ = 0
e(8) = L(8) + (6 - @) '3(@*) (8 - @)
3(6) = ~2L/~8~6'; I]8"-611 ! lJS-SJj. In the previous lecture we had I = J/n.
Now the LR test is based on t(@) - e(@). If J(6) is replaced by say
J(e) = E(J(@)) and @* by @ we obtain the W test
n(G- ~) 'Jd) (~ - ~)
On the other hand consider the other Taylor series
0 = dL/d8 = dL/d8 + J(@*)(@-@).
This leads to
_ ~ _- _j-l(~) ~LI~ = ]-iB
so that we obtain the score test
Applying (16.1) to this gives the LM test
LM=~'~'7-1~
Asymptotically the LR, W, LM tests can be expected to be equivalent, all
X$. However, in small sample situations there are some interesting being
inequalities between them (Breuseh (1979)).
A common form for the restrictions occurs when 8 is partitioned
8= ( :_
i.e.
and the test is
HO : 21 = 21o
Ii'l H 0 : h(_@) = (Im : O) - 210 = O.
2
(16.3)
(16.4)
290
On partitioning D, J appropriately we find
since D2
Often it occurs that J12
J21 J22
= 0 by (16.1). Then the LM statistic becomes
. . . . . . i- -i DI ~li DI" (16.5) LM = DI(JII-JI2 J22321 ) =
= J21 = 0 so that J is block diagonal, then
LM = D1 Jll] DJ (16.6)
Now there is an interesting way to compute the LM statistic using the
one step GN method mentioned earlier. Suppose we perform one step of
Fisher's scoring algorithm with starting value @ = 8. The step will yield
=> LM = (el - ~ ) ' J(@z- ~)"
This quantity is what would be obtained by doing a Wald test after one step.
We can also view this idea in GN terms when the likelihood is a least
I n ~ squares function: L(@) = ~ Z le (0)/d 2. Now consider, as earlier, an
approximate Taylor series
ek(Q) = gk + (dek/de)'(0 - ~)
where ek = ek(e) are the residuals after fitting the restricted model e.
So if we regress ek on -~ = dek/d~ we find
_ ~ = j-1 ~2 n
and
n
n = Jn (~) = Z1 dek/dSdek/d@' = ~'~
n ~2D = Z I ekdek/d0 = dL/d@ = X'e
291
where _e = (el' ..., ~n ),; ~2 = n-i Zln ek'~2
we find
~-2 Thus on approximating J by J
- - II
LM -- D' ]-i~ $-2 n
= (~- ~)' ~ (6- ~) ~-2 n
R 2
-dek/de.
(1)
(2)
(3)
2 The coefficient of determination is LM ~ Xr.
= ~,(~,~)-i ~/$2
= n R 2
is the coefficient of determination for the regression of $k
So to generate the LM in such a case is straightforward.
Obtain 0 under H 0.
Generate ek(0) = ek' dek/d~'
Regress ek on dek/d8
x n
on
An example is now given.
16B. Testing for Autocorrelation in a Regression
Consider the regression model
Yk = -~ ~ + Vk
b Vk = a(L)Vk + ~k; a(L) = E l a i L i .
So the disturbance is AR(p). We test the hypothesis:
= (a I, ..., ap). The model in prediction error form is
v
ek(8) = (l-a(L))(yk-~k p"
Then
dek/d ~ = Xk = -(Vk_ 1 ..... Vk_p)'
dek/d~ = -~k = -(l-a(L)) _x k.
Ho:a=_O,
We put @ = (_81:_0'2) = (a' : @')'; = 2~o 0
292
Now the restricted estimates are a 0 and ~ - = - = ~OLS (OLS = ordinary
least squares). Further ek = Vk = OLS residuals. Finally,
dek/d~ = -Xk; dek/d~ = ~k"
Thus to obtain the LM we regress e k = v k on (ZkXk). Thus
~-2 n . . . . , = E 1 Vk(Vk_ I ..... Vk_p_X k)
= n(Rl ..... Rp~') = (DI : ~) (as expected)
where R. are the autocorrelations of the OLS residuals. J
If ~k has no lagged Yk'S then J12 = 0 so
~, ~-i DI LM = D 1 Jll
= ~;(~p)-l~p.
This is equivalent to the intuitively reasonable procedure of fitting an
AR(p) to the residuals and testing that against the null hypothesis p = 0.
If, however, the _x k has lagged y's then J12 # 0 and the full regression
of Vk on lagged Vk'S and _Xk'S should be performed. In the simple case of
an AR(1) model the limit of n -I J12 can be evaluated in a straightforward
way. The h-test of Durbin (1970) results.
16C. Choice of Order and AIC
The classical approach to order determination involves testing a nested
sequence of hypotheses. The difficulty involves finding a stopping point.
Some time ago Akaike (1970, 1976) suggested a new idea for chosing model
order based on an information theoretic argument. The idea is to consider
AIC = -2 log (maximized likelihood) P
+ 2 # of free parameters (degrees of freedom).
The idea behind this is easily understood through the AR model.
293
Suppose we use data YI' "''' Yn to fit an AR(p) model (by regression)
^
producing a parameter estimate a . Now the fit will be evaluated on a new -p
data set -±Y~' ..., _v n. The model is
Yn = apYn-i + gn; Vn-i = (Yn-I ..... Yn-p )
The prediction error sequence is
n = Yn - apY_n_l
= g + (a-a)'Y n - -p -n-l"
An average measure of predictive performance on the new data set for this
particular fit is the expected mean squared error (emse)
~2 = E~(e~) = 0~(1 + (a-~p) ' Rp(a- ~p))
where 0 2 is the variance of the innovations from a pth order model; also P
R = [E(y 0 Yj_k)]l< ~ ,kip" -p
From the original fit we know
(a-~p) = N(O, 0 2 R -In-l). _ _ p-p
Notice that the distribution of emse is thus approximately
~2 e ~ o2(1+x2/n). P P
(16.7)
Thus, if we compute the average emse based on the original sample (this
giving an average measure of predictive performance of our procedure of
having fit a pth order autoregression) we find
Ey(e 2) = EyEy(e~) = ~(l+tr Ip/n)
= ~2(l+p/n).
P (16.8)
Calculations like these led Akaike to propose choosing AR model order by
minimizing an expression such as (16.8). In view of (16.7) it is not
surprising that this method does not produce a consistent estimate of p.
294
Notes. A number of authors have stressed LM tests recently. The presenta-
tion here draws from Breusch and Pagan (1980). See also Godfrey (1979),
Hosking (1980) and Silvey (1959). Some calculations related to the AIC
are given by Broomfield (1972). An alternative to AIC is Parzen's (1974)
CAT. S~derstr~m (1977) gives an interesting discussion of the relation
between AIC and hypothesis tests. Shibata (1976) studies the asymptotic
behavior of AIC. See also Hannan and Quinn (1979).
LECTURE 17: IDENTIFIABILITY OF CLOSED LOOP SYSTEMS I
17A. Introduction
A common situation in time series modelling is when the data are
collected on a system in which the output variables are used to produce
(feedback) the inputs or exogenous variables. In control engineering the
overall system is called a closed loop system (since there are dynamic
relations from z to y and back to z). These schemes can be usefully
represented by block diagrams which are basically pictures of z transforms.
A basic scheme is shown in Figure i.
Figure i.
reference t< aignal~
A Closed Leo E System
.• input z 1 system P
i or plant i
vfl forward loop
~controller C
feedback I
loop noise Iv b
The closed loop system is described then by two dynamic equations
Forward loop Yk = PZk + Vfk
Feedback loop z k = Cy k + Vbk
where P = P(L), C = C(L) are rational causal transfer functions of L.
It is convenient to write this in matrix form
(17.1)
(17.2)
296
i -P 1
-C i
=>
z k
Yk } =
z k
1 Vbk
[ 1 P Vfk S
C 1 Vbk
The quantity S = (I-PC) -I is called the return difference. If the loop
is broken anywhere then the transfer function from the rhs of the break
to the lhs is S. The basic problem for model fitting is the issue of
identifiability.
(17.3)
(17.4)
17B. Basic Issues in Closed Loop Identifiability
To reveal the ideas consider the simple ARX model
Yk = a(L)Yk + b(L)Zk + Ck
gk is a white noise, a(L) = E i ~ aiLi etc. Suppose we propose to estimate
e = (a', b') = (a I ..... ap bl, .... b )' by least squares. We know from -- _ -- p
Lecture 13C (and it is anyway intuitively clear) that @ is a-ldentifiable
provided a linear dependency of the form
~(L)y k + B(L)z k = 0
(deg ~(L) ! p, deg $(L) ! p) is not possible.
There are three ways to avoid this
(i) If we allow a linear controller
-I z k = -(I+F(L))
(ii)
R(L)Y k
we must have it of sufficient complexity i.e. deg F ~ p.
(17.5)
Alternatively, we can introduce a dither signal in the feedback
loop
297
z k = -(I+F(L)) -IQ(L)y k + ~bk"
Then provided {gbk } is not linearly dependent on {Vfk} (17.5)
cannot occur.
(iii) Allow a time varying or nonlinear controller.
In the ensuing discussion case (ii) will be investigated. To keep
matters simple only scalar y, z will be treated: this already reveals
most of the issues involved, One last definition will be useful. A
rational transfer function H(L) = B(L)/A(L) (where B, A are polynomials)
is called causal (or proper) if ]H(0) I < ~: this simply means H(L)z k
requires only Zk, Zk_l, ... for its calculation. H(L) is called strictly
causal (or strictly proper) if H(0) = 0: this means there is a delay in
H(L) i.e. H(L)z k requires only Zk_l, Zk_2, ... for its calculation. All
transfer functions in the ensuing discussion are assumed to be causal.
17C. a-Identifiability of the Forward Loop
To simplify the discussion we suppose infinite data is available. In
Equations (17.1, 17.2) take vf, v b to be stationary. Then v b can be
interpreted as a dither signal or a noise in the feedback loop or a sum
of the two.
Intuitively in identifying the forward loop we can allow vf, v b to
be correlated. Let Vfk have an innovations representation or Wold decom-
position as (subscript zero is a true value)
Vfk = N0(L)~fk;
while Vbk is given by
and E(~fk gbj) = 0 Vk, j.
E(gfkSfj) = o2fSkj
Vbk = R0(L)Vfk + M0(L)abk
Clearly, it is necessary that
298
Ro(L) is causal.
The closed loop structure becomes
(17.6)
lyk) s01 1 P01 z k -C O I
No oll fkl RoNo MO gbk
(i7.7)
To proceed further, conditions will be imposed that ensure (Yk' Zk) is
stationary. It will be convenient to intoduce the forward loop and feedback
loop ARMAX descriptions (or irreducible matrix fraction descriptions)
(Po :N0) = Af~(L)(Bf0(L ) :Cfo(L))
(C O :M 0 : RoNo) = A;~(L)(Bbo(L ) : Cb0(L) : Db0(L)) where
deg Af0 = deg Bfo = deg Cfo = pf
deg ~0 = deg Bb0 = deg Cb0 = deg Db0 = Pb
and pf, Pb are assumed known. Then a necessary and sufficient condition
for (Yk' Zk) to be stationary is
(17.8a)
(17.8b)
[ z pf 0 det
~ 0 z pb
Afo(z_l) -Bf0(z-i )
-Bbo(Zb I) ,0(z-l) ]} =
has no roots in izl ~ i
This is obvious once we write the full model in A~MAX form as
(17.9a)
i i -Bbo ~0
Yk =
z k
I Cfo
Dbo
0 ( gfk
Cbo gbk
Alternatively, given Equations (17.8) observe from (17.7) that
1 AfoAbo
SO 1 - PoCo Afo~o - BfoBbo
and the numerator of S O cancels all the other denominators in (17.7). This
299
further makes it clear tbat the other way to express the iff condition for
(Yk' Zk) to be stationary is that
S O , SoM 0, SoPoM 0, SoNo(I-PoRo ), SoN0(R O-C O ) are stable.
Now let 2 denote the ARMAX parameters in a forward loop model P, N.
Introduce the prediction error sequence
ek(@) = N-l(yk-PZk)
= (N -1 : N-ip)
z
Clearly to produce a stationary error sequence it is necessary that
N -I, N-Ip are stable
or equivalently
N is minimum phase, N-Ip is stable
afortiori these entail
minimum phase, NoIp 0 N O
or alternatively, in view of (17.7)
if
stable
Pfcf0(z-i ) z = 0 has all roots < i.
The least squares function is L(e) = E(e~(@)) and @0
L(e ) = L (e o) => e = e o.
Now since ek(@0) = gfk we have to ensure
L(8 ) > 2f = E(2fk)
For this it is necessary that
re.
either
or
is a-identifiable
Po(hence P) has a delay i.e. is strictly causal
R 0 and C O each have a delay.
These statements entail
(17.9b)
(17.10a)
(17.10b)
(17.11a)
(17.11b)
(17.12)
(17.13a)
(17.13b)
300
Sn(O) = 1 i.e. there is a delay somewhere in the loop.
To see all this calculate
(Yk) ek(@) = N-I(1 l-P)[ Zk
= N_I( I :-P) SO ( 1 C o i RoN0 M0 gbk j
irN0 P000 0M0]i k I N-I(1 :-P) S O CON0+ R0N 0 M 0 J Ebk
_ : [~fk = N - I S O [(1-PC0)N 0 + (P0 P)RoN0 (P0- P)M0]
(~bk
: [ Cfk] = N-I(N0 + S0(P 0-P)(C 0+R0)N 0 S0(P 0-P)M0)
ebkJ
= Cfk + Tfe Efk + Tbe~bk
Tf0 = N -IN O - I + N -IS0(P 0-P)(C 0+R0)N 0
Tbe = N -IS0(P_P)M 0.
(17.13c)
= Note Tf0 0 0 = Tb@ 0. Thus, clearly (17.13) ensures Tfe is strictly causal:
then (17.12) follows.
The following result can now be proved. THEOREM 17,1. For the closed loop model (17.1), (17.2) suppose
(Yk' Zk) is stationary i.e. (17,9a) or (17,9b) (17.9)
N O is minimum phase
f No1P 0 i s s t ab le
R 0 is causal (17.6)
There is a delay in P0 or in C o and R 0 17.13)
Then P0' NO are a-identifiable.
17.11a)
301
-i Remark. Note that P0 need not be stable i.e. Afo need not be stable.
Proof. Consider that
ek(@ ) - ek(80) = ek(0 ) - Efk
We have to show E<e~<@)) = E<e~(@0) ) = ~ => @ = 80 . In view of the
discussion above the equality is equivalent to
)2 E(ek(8) -ek(@ O) = O.
Then N -I = NOI; N-Ip = NoIP 0 follows from a lemma.
LEMMA 17.1. If F(L), Fo(L ) are stable causal, rational transfer functions
and x k is a stationary process, then
E[(F(L) -Fo(L))Xk ]2 = 0 => F(L) = Fo(L).
Proof. Let F(L) = H-I(L)G(L), Fo(L ) = HoI(L)Go(L). Introduce S k
0 = F0(L)Xk. We can always take G(O) = i. Consider that S k
N(L)(S k-S~) = -(H(L) -H0(L))S ~ + (G(L) -G0(e))x k.
Now
E(Sk-SO)2 = 0 => E[H(L)(Sk-Sk0)] 2 = 0.
But calling 0 = (hi, ..., hp gl' "''' gp)'; p = deg (G(L)) we see
0 2 E[H(L)(Sk-Sk) ] = (_~-GO) J(_G-_@ O)
J = E(WkWk) ; w e = (Sk_ l ..... Sk_ p Xk_ I ..... Xk_ p)
and J is positive definite hence O = 00.
(17.14)
= F(L)Xk,
Notes. The present arguments extend straightforwardly to the multivariate
case. Theorem 17.1 is due to SSderstrom et al. (1976). Their method of
proof is a little different; perhaps the present one more clearly reveals
302
the origin of the conditions. ~iderson and Gevers (1981) have also proved
similar results by rather longer arguments. (It is possible to obtain their
theorem (5.2) also by modifying the argument given here: see Lecture 18.)
A discussion of case (i) is given in Ng et al. (1977). A general discussion
of the related topic of stochastic control is in Astr~m's (1970) book: see
also Kailath (1980).
LECTURE 18: IDENTIFIABILITY OF CLOSED LOOP SYSTEMS II
18A.
as the forward loop (17.1).
system is now
Clearly, it is necessary that R 0 = 0.
Identifiability of the Closed Loo~
Now consider the identifiability of the feedback loop (17.2) as well
The
Yk = P0Zk + NoCfk (18.1)
Zk = CoY k + M0gbk (18.2)
where (gfkgbk) is a white noise. The stationarity of (Yk' Zk) follows
iff (cf. 17.9b)
SO, 80Mo, SoPoM0, SON0, SoNoC0 are stable. (18.3)
By exactly the same argument as used in Theorem 17.1 the following holds.
THEOREM 18.1. In the system (18.1), (18.2) if (18.3) holds or equivalently
(Yk' Zk) is stationary
N O , M 0 are both minimum phase (18.4a)
NoIPo , MoICo are both stable (18.4b)
there is a delay in P0 or C O i.e. S0(0) = 1 (18.5)
E(~fkCbk) = O. (18.6)
Then, P0' NO' CO' MO are a-identifiable.
Remark. Actually, it is possible to remove conditions (18.4) as follows.
Recall from the ARMAX descriptions (17.8) that
(N~I :N~IP0) = Cfo(Af 0-I :Bf0) (18.7a)
304
("o 1:"o1~o ) ° ~ ( % o : ~o) (18.~b)
Now if we carry out identification with a minimum phase N, M and
stable N-Ip, M-Ic we will obtain e.g.,
( ~ i ~glP0 ) ~-1 : = Cf0(Afo :Bf0)
where Cf0~-I is stable. Thus we can recover PO = Af0Bfo-i (which may be unstable)
but only obtain Cf0 = CfoVf0 where Vf0 is an all pass transfer function that
removes the unstable parts of Cf0. The following result is then not un-
expected
THEOREM 18.2. In the system (18.1), (18.2) if
(Yk' Zk) is stationary
there is a delay somewhere in the loop i.e. S0(O) = 1 (18.5)
E(gfk Sbk ) = O. (18.6)
Then PO' CO are a-identifiable and NO, M 0 are a-identifiable up to multi-
plication by an all pass transfer function.
Proof. See Anderson and Gevers (1981). (They use the word paraunitary
rather than all pass.) Also their approach is different being based on
spectral decomposition. Recall the overall system is
lYk) = S0 I NO -PoM0
0o 0If I °I I ' Zk ~bk ~bk
Now they treat this as a bivariate model. Just how P, C, N, M are
recovered from W is discussed in Appendix AI8.
Remark. There is a problem with condition (18.6). In a phyiscal setting
it may be straightforward to determine whether (18.6) holds. But e.g.,
in econometric modelling it may not be clear. This issue may be resolved
by the following result. This theorem will be best understood after
305
studying the discussion of feedback free processes in Lecture 19.
A model P, N, C, M is called generic if certain special pole zero
cancellations among them are prohibited. These are set out in Appendix BI8.
THEOREM 18.3. Suppose (Yk' Zk) is stationary,
there is a delay in the loop i.e. S0(0) = 1
N0(O), M0(O) are both non zero
P0' NO' CoM0 is generic
P, N, C, M (obtained from the NMSF as in Appendix AI8) is generic
Then
from the NMSF is block diagonal.
El ~fk](~ Cbk)' is block diagonal Q= fk
and PO' NO' CO' MO are a- ident if iable and given by P, VNN, C, VNI~ respectively
where VN, V M are all pass transfer functions.
Proof. See Gevers and Anderson (1981).
Remark. The conditions of the theorem can also be expressed in terms of
conditions on W 0 e.g., the genericity requirement includes
1 deg W 0 = ~ deg ~(z).
1 Note that (Lecture 3) for the NI~F deg W = ~ deg ~(z): again see Gevers
and Anderson (1981).
Notes. The discussion surrounding Theorems 18.1, 18.2 is new. Though
most of the arguments are different this lecture draws heavily on the
series of papers by Anderson and Gevers (1981) and Gevers and Anderson
(1980, 1981). Also note that Theorem 18.2 is closely related to some
theorems of Ljung and Caines (see Caines and Chan (1976, Theorems 3.3, 3.4)).
306
APPENDIX AIS. Closed Loop Models from Spectral Factors
quoted in Lecture 2.
comparing
Starting from the autocovariance generating function recall the NMSF
From the NMSF W we can recover a set P, N, C, M by
W21 W22
=> SM = W22 , SPM = WI2
:~> ~ - --i = w12w22 (AI8. i)
SN = WII, SCN = W21
~> ~ - --i
= w21Wll •
Consider two expressions for det
. . . . ~-l- det W = SNM = WII(W22-W21 IIWI2 )
(W22 - --i- => = - W21WIIWI2)
det W = SNM = W22(WII-WI2W221W21 )
--> N = WII - WI2W22~]21.
(A18.2)
(A18.3)
(A18.4)
Remark. Once P, N, C, M have been computed we can deduce the closed loop
is stable i.e.
S, SM, S>M, SN, SNC are stable
by the stability and minimum phase property of W.
APPENDIX BI8. Nongeneric Pole Zero Cancellations
Introduce the notation N(P) = the poles of P; z(P) = the zeroes of P.
307
The special pole-zero cancellations are avoided by
p(N o) n {z(N o) UP(~'O)} =
p(M 0) I] {z(M 0) Up(C0)} = (~
where e.g., M; is the complex conjugate of M O.
LECTURE 19: LINEARLY FEEDBACK FREE PROCESSES
19A. Introduction
In Lecture 17 it was pointed out that much time series modelling must
be done on systems operating in closed loop. A common question in the
econometric situation is the presence or absence of the forward loop or
feedback loop. Intuitively, for example, if the feedback loop is absent
we feel z is exogenous to y and vice versa. It would be nice then also to
say that z causes y. However, here we enter the classic problem of the
relation between two random variables in observational studies. If it is
known that y (lung cancer) cannot cause z (smoking) then an observed corre-
lation between y, z cannot be interpreted causally (perhaps another variable
W causes them both). The usual procedure for yet attempting to draw causal
inferences involves adjustment for other variables, existence of the relation
in different settings and theoretical explanation. Rather than entering into
these issues we concentrate on establishing the time series equivalent of
"lung cancer does not cause smoking" i.e. exogeneity. The process (y, z)
will be called linearly feedback free (or we say z is linearly exogenous to
y; or y does not cause z) if the feedback loop is missing.
19B. Linear Exogeneity
Here some general definitions and properties of linear exogeneity are
b described. The symbol Ya will denote the collection of observations
Ya' Ya+l' "''' Yb: similarly for z b. For simplicity we suppose an infinite a
data set is available. Note that the following definition and equivalences
do not rely on stationarity.
(i) Weak linear exogeneity
The following three definitions are shown to be equivalent.
309
DEFINITION a. z t is weakly linearly exogenous (wle) to Yt if
E(Ytlzt )_ = E(YtlZ~, Zt+l). (a)
(The wide sense conditional expectation or projection E is discussed in
Appendix A3.) This says that the one-sided (causal) and two sided (smoothing)
filters for the estimation of y from z are identical. To put it another
t way, the (residuals from the) regression of Yt on z_~ is (are) the same as
t (the residuals from) the regression of Yt on z_~, zt+ I.
DEFINITION b. zt is wle to Yt if
~(zi(y t ~(ytl t - z_~)) = 0 ~i > t. (b)
(This automatically holds for i J t by the definition of E.) This says
that future z's carry no (linear) information on present y's once the
effect of past z's has been extracted.
DEFINITION c. z t is wle to Yt if
) = t t E(zilz_~ , y_<) i > t. (c)
The interpretation here is that past y's carry no information on future
z's not present in past z's. Alternatively, z t is wle to Yt if the
(residuals from the) regression of z on its own past (are) is the same
as (the residuals from) the regression of z t on the past of z and of the
past of y.
Proofs of Equivalence. First observe that
~(ytl t ~ ~(ytlz t ~(yt I~~ z_oo, zt+ I) = _~) + zt+ I)
where
Thus
zt+ 1 = zt+ 1 - E(Zt+llzt_co )"
(a) <=~> E yt(zi- E(zilz~ )) = 0 i > t
<=~> E(y t- E(Ytizto) zi = 0 i > t
310
which is (b).
Next, since
where
Then (c) =~
which is (b).
then (b) =>
which is (c).
~(zilzt , t ) z t ~t _ Y_~ = E(zil _~) + E(zilY_ )
~t t ~, t ,z t ) Y_~ = Y_~ - ~Y_~L _~ "
E zi(y t- E(YtlZ~ )) = 0
On the other hand, introducp
wt = Yt - E(Yt Izt~)-
i > t
E(ziwt) = 0 Vi, t
,-> ~(zilwt ~) = 0
~ t ~(Zi ztoo, W t => E(zilz_~) = I -~)
The last equality follows since for any x
t-l) E(xlzL ' Ytoo ) = E(xl zt-co' Yt' Y_oo
= E(xl t t-l) z_oo~ w t' Y-oo
(see Appendix A3)
t-i = E(x]zt~ I' Y_oo ' z t'wt)"
Now iterating the argument gives the above equality.
(ii) Strong line@r exogenei ~
The following three definitions are equivalent.
DEFINITION a'. z is strongly linearly exogenous (sle) to Yt if t
Iz_~ ) = E(Ytl _~ , zt). E(Yt t-i z t-I ~ (a')
The difference between a' and a clearly is that here instantaneous corre-
lation between y and z is not allowed.
311
DEFINITION b'. zt is sle to Yt if
~ ~ t-i E(zi(Yt-E(YtlZ_o ° )) = 0
(again this automatically holds for i < t).
correlation that is excluded.
DEFINITION c'. zt is sle to Yt if
~(zilzt21) ~(zi I t-1 t _ = z_~ , y~)
Proofs of equivalence are similar to before.
Vi >_t
Again it is instantaneous
i >_t.
(b')
(c')
19C. Linear Feedback Free Processes
Consider two processes y, z and suppose the joint dynamic relation
between them is described by the closed loop system
Yk = P0Zk + Nogfk
z k = CoY k + Mogbk
El ~fk) ~bj) Q . gbk (~fj = -~kj
Suppose that (Yk' Zk) is stationary. As pointed out in Lectures 17, 18
this is guaranteed if
1 - PoCo = S O , SoM O, SoPoM 0, S0N O, SoNoC 0 are stable.
Actually, if (Yk' Zk) is stationary we can always construct such a
closed loop model from any spectral factor of the spectrum of (Yk' Zk)
(cf. Appendix AI8). However, the identifiability question is exactly the
fact that this may not give PO' NO' CO' MO of the true system (if there
is one).
Identifiability will require there be a delay in the loop i.e.
(19.1)
(19.2)
(19.3)
(18.3)
s0(0) = l .
312
Then we can solve (19.1), (19.2) to find
f Yk] 1 PO NO Cfk] = S O •
[ z k C O I MO ~bk J
The identifiability question leads to the following defintions.
We say the structure (or system) P0' NO' CO' M0 is
(i) weakly linearly feedback free (wlff) if
C O = 0
(ii) strongly linearly feedback free (slff) if
C O = 0 and Q is diagonal.
If (i) fails we say the structure PO' NO' CO' MO is weakly linearly
feedback connected.
We also formulate the following definitions. The ordered pair of
processes (y,z) is
(i) wlff if
(ii) slff if
~=0
= 0, Q is diagonal
where P, N, C, M, Q are obtained from the NMSF as in Appendix AI8 and
Lecture 2.
The connections between the structural definitions and the process
definitions are discussed in Lecture 20. For the moment we relate the
process definitions to weak and strong linear exogeneity of the previous
section.
THEOREM 19.1. If (Yk' Zk) is stationary then the process (y, z) is wlff
iff z k is wle to Yk"
THEOREM 19.2. If (Yk' Zk) is stationary then the process (y, z) is slff
(19.4)
(19.5a)
(19.6a)
(19.5b)
(19.6b)
313
iff z k is sle to Yk"
Proof of Theorem 19.1. We show the equivalence of (b) and (19.5b).
suppose (b) holds. Now by iterating the projection calculation
First
~ - t-2 E(ztlz_tool ) = E(ztlz_o ° , zt_ I)
z t-2
= E(ztlCt_ I,, z_oo)
= + (z Iz- Z 2)
where c~ : z t E<z tlzt~ I) - - : we can produce an innovations representation
(IR) or Wold decomposition for z as t
co z = Z0 D c zt i t - i
z Z ~
Di = E(zt a t - i )Z- lz 2z = E(S~S t ); D O = I.
Stationarity ensures D. does not depend on t. Now define i
t Yt ~(ytl t ~(ytl zt w = - = - s °° ) . z_oo) Y t
Then iff (b) holds w t, z i are orthogonal. Let
= w co w
wt st + ZIAiE t-i
= st-i) Ewl; ~-w = Ct ); A0 I Ai E(wt w E(g t w' =
be an IR of w . Then we have the joint IR representation t
Yt = A(L) B(L) c t
z t 0 D(L) c
(19.5c)
i z 7j-i A(L) = Z 0AiL etc.; B i = E(y tst_i) z "
Now introduce B(L) = B(L) - A(L)B 0 and rewrite this as
z t 0 D(L) Ebt J Sbt
with
314
[nil I 0]i I Cbt j 0 I E~
So gft' gbt are instantaneously correlated; B(0) = O. The result is
thus established since W(O) = I.
Now suppose (19.5b) holds i.e. we have an IR of the form (19.5c)
but we do not know yet that c~, ~$ are innovation sequences so call them
~w ~z gt' gt" Then we have
z t = D(L)~.
z ~z But this immediately ~*> g = ~ .
t t
Further we then deduce
~ zt-l) Yt - A(L)g~ = Yt - E(Ytl ~_oo
= w say. t
t Now also w t B(L)~ . Hence
]
--> E zt(y j
Vt,j '~> E(z t w.) = 0 Vt,j ]
- E(yjlzj-l)) = 0 Vt, j
which is (b).
Proof of Theorem 19.2. Similar.
Notes. Definition (a) is the one used by Pierce and Haugh (1977) and
introduced by Sims (1972). Definition (c) is due to Granger (1969).
Some of the discussion here is adapted from Caines and Chan (1976) and
Caines (1976). Some related work is Geweke (1978). Caines and Chan
define feedback free processes in terms of a spectral factorization of
(y, z) so a closed loop model is not needed. The present discussion
raises the issue of identifiability discussed in the next lecture.
LECTURE 20: TESTS FOR ABSENCE OF LINEAR FEEDBACK
20A. Test for Linear Feedback
Now it is time to consider how the various equivalences expressed
in Theorems 19.1, 19.2 can be used to generate inferential procedures
(tests of hypothesis at least) for determining the presence or absence
of linear feedback. Here we discuss tests of feedback for processes.
To connect this to feedback testing of a structure (or closed loop system)
we have to discuss identifiability. This is done in the next section.
The equivalences in Theorems 19.1, 19.2 suggest several ways to test
for linear feedback of processes.
(a) Weak linear feedback
(i) Fit a closed loop (ARMA) model and test whether the transfer
function C = 0.
(ii) Fit a univariate time series model to each of y, z and then
replaced them with their (estimated) innovations sequences,
(lii)
say Syt' Szt" Now compare the two sided regression of gy t
t ~ with the one sided regression of ~yt on gz,_~, gz,t+l on
t g (cf. Definition (a)).
Replace z t by its innovations sequence ~zt" Regress Yt on
t z_~ to p~oduce residual gyzt" Now test czt, gyzt for inde-
pendence (cf. Definition (b)).
(b) Strong linear feedback
(i)' Fit a closed loop n odel and test whether C = 0, Q is (block)
diagonal.
t-i (ii)' As for (ii) but co~are the regression of ~ on ~ with
yt z,-~
t-i E~ (cf. Definition (a')). that of gyt on ~z,-m' z , t t-i
(Jii)' As for (iii) but regress Yt on z_~ to produce Eyzt. Now
316
test s and g for independence (cf. Definition (b')). zt yzt
20B. Identifiabilit X and Weak Linear Feedback
It has been mentioned that there is a problem with condition (18.6):
we may not know whether it is true. The nature of the problem is revealed
in the following example.
EXAMPLE 20.1. Consider the simple system (structure)
i + 2L Yk i + .4L Sfk
i + 4L Zk i + .6L Sbk"
So P0 = CO = 0. Suppose E(gfk Cbj) = 6kjl. Thus the structure is wlff.
Suppose we use method (ii) to test for wle of z t with respect to Yt
(i.e. whether the process (y, z) is wlff). We will only be able to fit a
minimum phase model namely
1 1 i+ ~L ~ i+ 7L ~
Yk 1 + .4L Sfk' Zk I + .6L ~bk
where then
~ 1 + 2L ~ 1 + 4L 1 gfk' ebk = 1 Sfk' gfk i + ~L 1 + %L
Now ~f, Sb are white noises but they are cross serially correlated because
1 + 2L 1 + 4L the all pass filters - - , 1 smear the instantaneous correlation
1 + ½L 1 + ~ L
of ef, Sb over the whole lag axis. Thus we will deduce the process (y, z)
is weakly linearly feedback connected.
On the other hand, if E(efkebj) = 0 Vk, j then also E(~fkSbj) = 0
Vk, j so if the structure is slff then when we use method (ii) (or any other)
to test the process (y, z) we will deduce that the process (y, z) ia slff.
This reveals an identifiabiiity problem with detecting wle of the structure:
we cannot do it unless we know NO, M 0 are minimum phase. If they are not
then we can still detect sle of the structure.
To put it carefully, if the structure is slff then so is the process.
317
But if the process is slff can we be sure the system (or structure) we
modelled exhibits sle? The answer is yes provided certain special pole
zero cancellations are prohibited. This is brought out in the following
example.
EXAMPLE 20.2. Consider the system
1 1 + .6L PO 1 + .5L ' NO = 1 + .8L
1 + .5L C O = 0 ' MO - 1 + .75L
E(Cfk Sbj) = 0 Vk, j.
So the system shows sle or is sill. We can construct the overall structure
in a number of ways. It is convenient here to construct the irreducible
M~D's or AR~X forms
(P0 :No) = A~I(Bf :Cf)
= [(I +.5L)(I+ .8L)]-I[(I+.8L)L : (I+.6L)(I+.5L)]
(C O :M 0) = <I(B b :C b)
= (I+.75L)-I[0 :i + .5L].
Then
W 0 = ~-B b Abl k 0 CbJ
= [ (I+.5L)(I+.8L)0 -(I+,8L)LI + .75L ]-i [ (I+'6L)(I+'5L)0 1 +0 ]..5L
This factorization is not coprime. A coprime factorization is
WO = [ 5(I+.8L)L ] [ 5(I+.6L) 6L
0 1 + .75L 0 1 + .5L
318
Note that W(0) = I. det W has no poles or zeroes in [z I ~ 1 so it is stable,
minimum phase so W 0 = W the NMSF. Also, Q : I, W21 = 0 => the process
(y, z) is slff (this follows any way since the structure is slff).
Now consider the following spectral factor
[ Ill = 5(I+.8L) L 5 + 13.913L 4.034L
0 1 + .75L j -I.085L 1 + .882L
.197 .813
This defines a structure P, N, C, M. Note that W(0) = I (so P(0) = 0)
but W21 # 0 and Q is not (block) diagonal. Thus the process (y, z) is slff
but the system or structure from which it could have come is not. The
problem is then the special pole-zero cancellation.
The situation can be summed up in the following result.
THEOREM 20.2. If the structure P0' NO' CO' M0 is generic (see Lecture 18)
Q0 is block diagonal, the process (y, z) is stationary,
P0(O) = 0 = Co(0)
N0(0), M0(0) are both non zero.
Then if the process (y, z) is slff so is the structure P0' NO' CO' M0
i.e.
= 0, Q (block) diagonal
'=> C 0 = 0, Q0 (block) diagonal.
Proof. See Gevers and Anderson (1981).
Summary. Suppose then we have a system described by a closed loop model.
If we know the noise transfer functions are minimum phase we can test wle
and sle. If this is unknown then provided the system is generic we can
test for sle. If the system is actually wle (but N_0, M_0 are not minimum
phase) we can never know that; if the process is wle we cannot be sure the
system is (unless N_0, M_0 are minimum phase). These points, first raised
by Gevers and Anderson (1981), seem of some importance for testing linear
exogeneity.
Notes. A discussion of tests of linear exogeneity is given in Hsiao (1979)
who uses method (i) by fitting autoregressions. Pierce and Haugh (1977)
use method (ii). Also Caines and Chan (1976) use method (i) but fit ARMA
models. The material in this lecture draws heavily on Gevers and Anderson
(1981).
REFERENCES
Aasnes, H.B. and Kailath, T.: An Innovations Approach to Least Squares Estimation - Part VII: Some Applications of Vector ARMA Models. IEEE Trans. Autom. Contr. AC-18, p.601-607, 1973
Aasnes, H.B. and Kailath, T.: Initial-Condition Robustness of Linear Least Squares Filtering Algorithms. IEEE Trans. Autom. Contr., p.393-397, 1974
Abraham, B. and Box, G.E.P.: Deterministic and Forecast-adaptive Time-dependent Models. J.R.S.S. (C) 27, p.120, 1978
Aitchison, J. and Silvey, S.D.: Maximum Likelihood Estimation of Parameters Subject to Restraints. Ann. Math. Stat. 29, p.813- 828, 1958
Akaike, H.: Autoregressive Model Fitting for Control. Ann. Inst. Stat. Math. 23, p.163, 1971
Akaike, H.: Maximum Likelihood Identification of Gaussian ARMA Models. Biometrika 60, p.225, 1973
Akaike, H.: Stochastic Theory of Minimal Realization. IEEE Trans. Autom. Contr. AC-19, p.667, 1974a
Akaike, H.: Markovian Representation of Stochastic Processes and its Application to the Analysis of ARMA processes. Ann. Inst. Stat. Math., p.363, 1974b
Akaike, H.: Markovian Representation of Stochastic Processes by Canonical Variables. SIAM J. Control 13, p.162, 1975
Akaike, H.: Canonical Correlation Analysis of Time Series and the Use of an Information Criterion. In System Identification, Advances and Case Studies, Mehra and Lainiotis, eds., Academic Press, 1976
Anderson, B.D.O.: Covariance Factorization via Newton-Raphson Iteration. IEEE Trans. Inform. Theory IT-24, p.187, 1978
Anderson, B.D.O., Hitz, K.L. and Diem, N.D.: Recursive Algorithm for Spectral Factorization. IEEE Trans. C.A.S. 21, p.742, 1974
Anderson, B.D.O. and Moore, J.B.: Optimal Filtering. Prentice Hall, 1979
Anderson, B.D.O. and Gevers, M.R.: Identifiability of Linear Stochastic Systems Operating under Feedback. Automatica 1981
Anderson, T.W.: The Statistical Analysis of Time Series. John
Wiley, 1971
Anderson, T.W. and Taylor, J.B.: Strong Consistency of Least Squares Estimates in Normal Linear Regression. Ann. Stat. 4, p.788, 1976
Åström, K.J.: Introduction to Stochastic Control Theory. Academic Press, 1970
Billingsley, P.: Probability and Measure. John Wiley, 1979
Bloomfield, P.: On the Error of Prediction of a Time Series. Biometrika 59, p.501-507, 1972
Breusch, T.S.: Conflict among Criteria for Testing Hypotheses: Extension and Comment. Econometrica 47, 203-208, 1979.
Breusch, T.S. and Pagan, A.R.: The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics. Rev. Econ. Stud. 47, p.239, 1980
Bowden, R.: The Theory of Parametric Identification. Econometrica 41, p.1069-1074, 1973
Box, G.E.P. and Jenkins, G.M.: Time Series Analysis Forecasting and Control. Holden-Day, Revised Ed., 1976
Box, G.E.P. and Tiao, G.C.: A Canonical Analysis of Multiple Time Series. Biometrika, p.355-365, 1977
Caines, P.E.: Weak and Strong Feedback Free Processes. IEEE Trans. Autom. Contr., p.737, 1976
Caines, P.E.: Prediction Error Identification Methods for Stationary Stochastic Processes. IEEE Trans. Autom. Contr., p.500, 1976
Caines, P.E. and Ljung, L.: Asymptotic Normality and Accuracy of Prediction Error Estimators. Tech. Report #7602, Dept. Elec. Eng., Univ. Toronto, 1976
Caines, P.E. and Chan, C.W.: Estimation Identification and Feedback. In System Identification Advances and Case Studies, Mehra and Lainiotis, eds., Academic Press, 1976
Chan, S.W., Goodwin, G.C. and Sin, K.S.: Convergence and Properties of the Solutions of the Riccati Difference Equation. Tech. Report #8201, Dept. Elec. Cmptr. Eng., Univ. Newcastle, NSW, Australia, 1982
Cox, D.R. and Hinkley, D.V.: Theoretical Statistics. Chapman and Hall 1974
Deistler, M.: Z-Transform and Identification of Linear Econometric Models with Autocorrelated Errors. Metrika 22, p.13, 1975
Deistler, M.: The Identifiability of Linear Econometric Models with Autocorrelated Errors. Intl. Econ. Rev. 17, p.26-46, 1976
Deistler, M.: The Structural Identifiability of Linear Models with Autocorrelated Errors in the Case of Cross Equation Restrictions. Jl. Econometrics 8, p.23, 1978
Dickey, D.A. and Fuller, W.: Distribution of the Estimators for Autoregressive Time Series with a Unit Root. J.A.S.A. 74, p.427, 1979
Dickey, D.A. and Fuller, W.: Distribution of Likelihood Ratio Test Statistics for Nonstationary Time Series. J.A.S.A. 1981
Doob, J.L.: Stochastic Processes. John Wiley 1953
Duncan, D.B. and Horn, S.D.: Linear Dynamic Recursive Estimation from the Viewpoint of Regression Analysis. J.A.S.A. 67, p.815 1972
Dunsmuir, W. and Hannan, E.J.: Vector Linear Time Series Models. Adv. Appl. Prob. 8, p.339, 1976
Durbin, J.: Testing for Serial Correlation in Least Squares Regression When Some of the Regressors are Lagged Dependent Variables. Econometrica 38, p.410, 1970
Eicker, F.: Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions. Ann. Math. Stat. 34, p.447, 1963
Findley, D.F.: Geometrical and Lattice Versions of Levinson's General Algorithm. In Applied Time Series Analysis II, Findley, ed., Academic Press, 1981
Fuller, W.: Introduction to Statistical Time Series. John Wiley 1976
Fuller, W.A., Hasza, D.P. and Goebel, J.J.: Estimation of the Parameters of Stochastic Difference Equations. Ann. Stat. 9, p.531-543, 1981
Gardner, G.A.C., Harvey, A.C. and Phillips, G.D.A.: An Algorithm for Exact Maximum Likelihood Estimation of ARMA Models by Means of Kalman Filtering. J.R.S.S. (C) 29, p.311, 1980
Gevers, M.R. and Anderson, B.D.O.: On Jointly Stationary Feedback Free Stochastic Processes. Manuscript, 1980
Gevers, M.R. and Anderson, B.D.O.: Representations of Jointly Stationary Stochastic Feedback Processes. Int. Jl. Control., 1981
Geweke, J.: Testing the Exogeneity Specification in the Complete Dynamic Simultaneous Equation Model. Jl. Econometrics 7, p.163, 1978
Godfrey, L.G.: Testing the Adequacy of a Time Series Model. Biometrika 66, p.67, 1979
Goodwin, G.C.: A Simplified Method for the Determination of the Curvature of an Error Index with Respect to Parameters and Initial States. Int. Jl. Control. 8, p.253, 1968
Goodwin, G.C. and Chan, S.W.: Restricted Complexity Predictors for Time Series with Deterministic and Nondeterministic Components. Unpublished manuscript 1982
Granger, C.W.J.: Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 37, 1969
Glover, K. and Willems, J.C.: Parameterizations of Linear Dynamical Systems: Canonical Forms and Identifiability. IEEE Trans. Autom. Contr. AC-19, p.640, 1974
Grewal, M.S. and Glover, K.: Identifiability of Linear and Nonlinear Dynamical Systems. IEEE Trans. Autom. Contr., p.833, 1976
Hall, P. and Heyde, C.C.: Martingale Limit Theory and its Application. Academic Press, 1980
Hannan, E.J.: The Identification of Vector Mixed ARMA Systems. Biometrika 56, p.223, 1969
Hannan, E.J.: Multiple Time Series. John Wiley, 1970
Hannan, E.J.: The Identification Problem for Multiple Equation Systems with Moving Average Errors. Econometrica 39, p.751, 1971
Hannan, E.J.: The Asymptotic Theory of Linear Time Series Models. J. Appl. Prob. 10, p.130, 1973
Hannan, E.J.: The Identification and Parameterization of ARMAX and State Space Forms. Econometrica 44, p.713, 1976
Hannan, E.J. and Nicholls, D.F.: The Estimation of Mixed Regression, Autoregression Moving Average and Distributed Lag Models. Econometrica 40, p.529, 1972
Harvey, A.C.: Finite Sample Prediction and Overdifferencing. Jl. Time Series 2, p.221, 1982
Heyde, C.C.: Martingales: A Case for a Place in the Statistician's Repertoire. Aust. Jl. Stat. 14, p.1, 1972
Hosking, J.R.M.: Lagrange Multiplier Tests of Time Series. J.R.S.S. (B) 42, p.170, 1980
Hsiao, C.: Causality Tests in Econometrics. Jl. Econ. Dyn. Contr. 1, p.321, 1979
Hsiao, C.: Identification. Tech. Report #311, Economic Series, Inst. Math. Stud. Soc. Sci., Stanford University, 1980
Hwang, S.Y.: Solution of Complex Integrals Using the Laurent Expansion. IEEE Trans. ASSP 26, p.263-265, 1978
Intriligator, M.D.: Econometric Models, Techniques and Applications. Prentice Hall, 1978
Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press, 1980
Jennrich, R.I.: Asymptotic Properties of Non-linear Least Squares Estimators. A.M.S. 40, p.633-643, 1969
Justice, J.H.: An Algorithm for Inverting Positive Definite Toeplitz Matrices. SIAM J. Appl. Math. 23, p.289-291, 1972
Kailath, T.: A View of Three Decades of Filtering Theory. IEEE Trans. Inform. Theory. IT-20, 2, p.146, 1974
Kailath, T.: Linear Systems. Prentice Hall, 1980
Kalman, R.E.: A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng., Trans. ASME, Series D, 82, p.35-45, 1960
Kashyap, R.L.: Maximum Likelihood Identification of Stochastic Linear Systems. IEEE Trans. Autom. Contr. AC-15, p.25, 1970
Kawashima, H.: Parameter Estimation of Autoregressive Integrated Processes by Least Squares. Ann. Stat. 8, p.423-435, 1980
Kimeldorf, G. and Wahba, G.: A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines. Ann. Math. Stat. 41, p.495, 1970
Kohn, R.: Note Concerning the Akaike and Hannan Estimation Procedures for an ARMA Process. Biometrika 64, p.622, 1977
Kohn, R.: Asymptotic Properties of Time Domain Gaussian Estimators. Adv. Appl. Prob. 10, p.339-359, 1978a
Kohn, R.: Local and Global Identification and Strong Consistency in Time Series Models. Jl. Econometrics 8, p.269-293, 1978b
Kohn, R.: Asymptotic Estimation and Hypothesis Testing Results for Vector Linear Time Series Models. Econometrica 47, p.1005-1030, 1979a
Kohn, R.: Identification Results for ARMAX Structures. Econometrica 47, p.1295, 1979b
Koussiouris, T.G. and Kafiris, G.P.: Controllability Indices, Observability Indices, and the Hankel Matrix. Int. Jl. Control 33, p.723, 1981
Kshirsagar, A.M.: Multivariate Analysis. Marcel Dekker, 1974.
Lai, T.L., Robbins, H. and Wei, C.Z.: Strong Consistency of Least Squares Estimates in Multiple Regression II. J. Mult. Anal. 9, p.343-361, 1979
Lai, T.L. and Wei, C.Z.: Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems. Ann. Stat. 10, p.154-166, 1982a
Lai, T.L. and Wei, C.Z.: Asymptotic Properties of Projections with Applications to Stochastic Regression Problems. J. Mult. Anal. To appear, 1982b
Lai, T.L. and Wei, C.Z.: Asymptotic Properties of General Autoregressive Models and Strong Consistency of Least Squares Estimates of Their Parameters. J. Mult. Anal. To appear, 1982c
Lindquist, A.: A New Algorithm for Optimal Filtering of Discrete-Time Stationary Processes. SIAM J. Control 4, p.736-747, 1974
Ljung, L.: On the Consistency of Prediction Error Identification Methods. In System Identification, Advances and Case Studies, Mehra, R.K. and Lainiotis, D.G. (eds), Academic Press, 1976
Ljung, L. and Caines, P.E.: Asymptotic Normality of Prediction Error Estimators for Approximate System Models. Manuscript, 1976
Ljung, L. and Rissanen, J.: On Canonical Forms, Parameter Identifiability and the Concept of Complexity. 4th IFAC Symp. on Identification and Parameter Estimation, Tbilisi, U.S.S.R., 1976
McLeish, D.L.: Dependent Central Limit Theorems and Invariance Principles. Ann. Prob. 2, p.620, 1974
Makhoul, J.: A Class of All-Zero Lattice Digital Filters: Properties and Applications. IEEE Trans. ASSP 2, p.304-314, 1978
Neveu, J.: Discrete Parameter Martingales. North-Holland, 1975, p.34
Newton, H.J.: Using Periodic Autoregressions for Multiple Spectral Estimation. Technometrics 24, 2, p.109, 1982
Ng, T.S., Goodwin, G.C., Anderson, B.D.O.: Identifiability of MIMO Linear Dynamic Systems Operating in Closed Loop. Automatica 13, p.477-485, 1977
Nicholls, D.: The Efficient Estimation of Vector Linear Time Series Models. Biometrika 63, p.381, 1976
Pagano, M.: On Periodic and Multiple Autoregressions. Ann. Stat. 6, p.1310, 1978
Paige, C.C. and Saunders, M.A.: Least Squares Estimation of Discrete Linear Dynamic Systems Using Orthogonal Transformations. SIAM J. Numer. Anal. 14, p.180, 1977
Parzen, E.: Some Recent Advances in Time Series Modeling. IEEE Trans. Autom. Contr. AC-19, p.723, 1974
Parzen, E.: Statistical Inference on Time Series by Hilbert Space Methods. In E. Parzen, Time Series Analysis Papers, Holden-Day, 1967
Parzen, E.: Time Series Model Identification and Prediction Variance Horizon. In Applied Time Series Analysis II, Findley, D.F. (ed), Academic Press, 1981
Pierce, D.A. and Haugh, L.D.: Causality in Temporal Systems: Characterizations and a Survey. Jl. Econometrics 5, p.263, 1977
Rauch, H.E., Tung, F. and Striebel, C.T.: Maximum Likelihood Estimates of Linear Dynamic Systems. AIAA Jl., p.1445-1450, 1965
Richmond, J.: Identifiability in Linear Models. Econometrica 42, p.731-736, 1974
Richmond, J.: Aggregation and Identification. Int. Ec. Rev. 17 p.47-56, 1976
Reiersøl, O.: Identifiability, Estimability, Pheno-restricting Specifications, and Zero Lagrange Multipliers in the Analysis of Variance. Skand. Aktuar. 46, p.131-142, 1963
Rissanen, J.L. and Barbosa, D.: Properties of Infinite Variance Covariance Matrices and Stability of Optimum Predictors. Inform. Sci. 1, p.221-236, 1969
Rissanen, J.L. and Ljung, L.: Estimation of Optimum Structures and Parameters for Linear Systems. Proc. CNR-CISM Symp. on Algebraic System Theory, Udine, 1975
Rissanen, J.L. and Caines, P.E.: The Strong Consistency of Maximum Likelihood Estimators for ARMA Processes. Ann. Stat. 7, p.297, 1979
Robinson, E.A.: Random Wavelets and Cybernetic Systems. Charles Griffin, London, 1962
Robinson, E.A.: Multichannel Time Series Analysis with Digital Computer Programs. Holden-Day, 1967
Robinson, E.A. and Treitel, S.: Geophysical Signal Analysis. Prentice-Hall, 1980
Rosenbrock, H.H.: State-Space and Multivariable Theory. John Wiley, 1970
Rothenberg, T.J.: Identification in Parametric Models. Econometrica 34, p.577-591, 1971
Sage, A.P.: Optimum Systems Control. Prentice-Hall 1968
Scheffe, H.: The Analysis of Variance. John Wiley, 1959
Scott, D.J.: Central Limit Theorems for Martingales and for Processes with Stationary Increments Using a Skorokhod Representation Approach. Adv. Appl. Prob 5, p. 119, 1973.
Shibata, R.: Selection of the Order of an Autoregressive Model by Akaike's Information Criterion. Biometrika 63, p.117-126, 1976
Sidhu, G.S. and Kailath, T.: Development of New Estimation Algorithms by Innovations Analysis and Shift Invariance Properties. IEEE Trans. I.T., p.759-762, 1974
Silverman, L.M.: Discrete Riccati Equations: Alternative Algorithms, Asymptotic Properties and System Theory Interpretations. In Control and Dynamic Systems, Advances in Theory and Applications, Leondes, C.T. (ed), Vol. 12, 1976
Silvey, S.D.: The Lagrangian Multiplier Test. A.M.S. 30, p.389-407, 1959
Sims, C.A.: Money, Income and Causality. Am. Ec. Rev. 62, p.540, 1972
Söderström, T., Ljung, L. and Gustavsson, I.: Identifiability Conditions for Linear Multivariable Systems Operating Under Feedback. IEEE Trans. Autom. Contr., p.837-840, 1976
Söderström, T.: On Model Structure Testing in System Identification. Int. Jl. Control 26, p.1-18, 1977
Solo, V.: Time Series Recursions and Stochastic Approximation. Unpublished Ph.D. Thesis, Aust. Nat. Univ., 1978
Solo, V.: Strong Consistency of Least Squares Estimates in Regression with Correlated Disturbances. Ann. Stat., 1981
Solo, V.: Consistency of Least Squares Estimates in ARX models. In preparation 1982
Solo, V.: ARMA models without MA parameters. In Preparation, 1982
Son, L.H. and Anderson, B.D.O.: Design of Kalman Filters Using Signal Model Output Statistics. Proc. IEE, Vol. 120, p.312-318, 1973
Sternby, J.: On Consistency for the Method of Least Squares Using Martingale Theory. IEEE Trans. Autom. Contr. AC-22, p. 346, 1977
Stewart, G.W.: Introduction to Matrix Computations. Academic Press, 1973
Stigum, B.P.: Least Squares and Stochastic Difference Equations. Jl. Econometrics 4, p.349-370, 1976
Strang, G.: Linear Algebra and its Applications. Academic Press 1976
Theil, H.: Principles of Econometrics, John Wiley, 1971
Tiao, G.C. and Box, G.E.P.: An Introduction to Applied Multiple Time Series Analysis. Tech. Report #582, Dept. Statistics, Univ. Wisconsin, 1979
Tiao, G.C. and Tsay, R.S.: Identification of Nonstationary and Stationary ARMA Models. Tech. Report #647, Dept. Statistics, Univ. Wisconsin, 1981
Tse, E. and Weinert, H.L.: Structure Determination and Parameter Identification for Multivariable Stochastic Linear Systems. IEEE Trans. Autom. Contr. AC-20, p.603, 1975
Vieira, A. and Kailath, T.: On Another Approach to the Schur-Cohn Criterion. IEEE Trans. C.A.S., p.218-220, 1977
Walker, A.M.: Asymptotic Properties of Least Squares Estimates of Parameters of the Spectrum of a Stationary Non-Deterministic Time Series. J. Aust. Math. Soc. 4, p.363, 1964
Wertz, V., Gevers, M. and Hannan, E.J.: The Determination of Optimum Structures for the State Space Representation of Multivariate Stochastic Processes. Unpublished manuscript, 1981
Whittle, P.: The Analysis of Multiple Time Series. J.R.S.S. (B) 15, p.125-139, 1953
Wilson, G.T.: The Factorization of Matrical Spectral Densities. SIAM Jl. Appl. Math. 24, p.420-426, 1972
Wilson, G.T.: Some Efficient Computational Procedures for High Order ARMA Models. J. Stat. Comp. Simul. 8, p.301-309, 1979
Wolovich, W.A.: Linear Multivariable Systems. Springer-Verlag Applied Math. Sci., Vol. 11, 1974
Wu, C.F.: Asymptotic Theory of Nonlinear Least Squares Estimation. Ann. Stat. 1981
Rissanen, J.: Algorithms for Triangular Decomposition of Block Hankel and Toeplitz Matrices with Application to Factoring Positive Matrix Polynomials. Math. Comp. 27, p.147-154, 1973a
Rissanen, J.: A Fast Algorithm for Optimum Linear Prediction. IEEE Trans. Autom. Contr. AC-18, p.555, 1973b
Hannan, E.J. and Quinn, B.G.: The Determination of the Order of an Autoregression. J.R.S.S. (B) 41, p.190-195, 1979