+ All Categories
Home > Documents > Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM...

Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM...

Date post: 02-May-2020
Category:
Upload: others
View: 17 times
Download: 0 times
Share this document with a friend
39
Econometric Theory, 2, 374-412. Printed in the UnitedStates of America. ASYMPTOTIC NORMALITY OF MAXIMUMLIKELIHOOD ESTIMATORS OBTAINEDFROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo D. H. HEIJMANS University of Amsterdam JAN R. MAGNUS London School of Economics In thisarticle we aimto establish intuitively appealing and verifiable conditions for the first-order efficiency and asymptotic normality of ML estimators in a multi-parameter framework, assuming joint normality but neitherthe inde- pendence nor the identical distribution of the observations. We present five theorems (and a large number of lemmas andpropositions), eachbeing a special case of its predecessor. Tout le monde y croit cependant, car les experimenteurs s'imaginent que c'est un theoreme de mathematiques, et les mathematiciens que c'est un fait experimental.1 (Poincare [15]) 1. INTRODUCTION In econometric models the observations are, as a rule, not independent and identically distributed (i.i.d.).For example, the observations {Yt}in the linear regression model Yt = xJ3 + Et (t = 1,... , n) are not i.i.d. even if the errors { e} are, unless the regressors {xtj are random and i.i.d., or xt = c (a vector of constants). Most central limit theorems are, however, for the case where the observations are i.i.d., and hence not applicable to regression models. A (very) preliminary version of this paper was presented at the 1979 European meeting of the Econometric Society in Athens. We thank the participants of workshops at LSE, Essex, Manchester, Nuffield College, ANU, Monash, and Sydney for their helpful comments. We are particularly grateful to Peter Robinson, B. B. van der Genugten, and the referees for their constructive remarks. Magnus' research was supported, in part, by a grant from the Netherlands Organization for the Advancement of Pure Research (Z.W.O.). 374 ? 1986 Cambridge University Press
Transcript
Page 1: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

Econometric Theory, 2, 374-412. Printed in the United States of America.

ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD

ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS

RISTo D. H. HEIJMANS

University of Amsterdam

JAN R. MAGNUS London School of Economics

In this article we aim to establish intuitively appealing and verifiable conditions for the first-order efficiency and asymptotic normality of ML estimators in a multi-parameter framework, assuming joint normality but neither the inde- pendence nor the identical distribution of the observations. We present five theorems (and a large number of lemmas and propositions), each being a special case of its predecessor.

Tout le monde y croit cependant, car les experimenteurs s'imaginent que c'est un theoreme de mathematiques, et les mathematiciens que c'est un fait experimental.1 (Poincare [15])

1. INTRODUCTION

In econometric models the observations are, as a rule, not independent and identically distributed (i.i.d.). For example, the observations {Yt} in the linear regression model Yt = xJ3 + Et (t = 1,... , n) are not i.i.d. even if the errors { e} are, unless the regressors {xtj are random and i.i.d., or xt = c (a vector of constants). Most central limit theorems are, however, for the case where the observations are i.i.d., and hence not applicable to regression models.

A (very) preliminary version of this paper was presented at the 1979 European meeting of the Econometric Society in Athens. We thank the participants of workshops at LSE, Essex, Manchester, Nuffield College, ANU, Monash, and Sydney for their helpful comments. We are particularly grateful to Peter Robinson, B. B. van der Genugten, and the referees for their constructive remarks. Magnus' research was supported, in part, by a grant from the Netherlands Organization for the Advancement of Pure Research (Z.W.O.).

374 ? 1986 Cambridge University Press

Page 2: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 375

Among the useful exceptions we mention central limit theorems by Malin- vaud [12, p. 250-253] for the case where the observations are independent but not identically distributed, Rozanov [17, p. 190-198] and Hannan [4, p. 220-229] for stationary stochastic processes, and Schonfeld [19] for m- dependent observations.

In searching for conditions which imply asymptotic normality of the maximum likelihood (ML) estimator of the parameters in nonlinear models, we again face the problem that the observations from which the ML esti- mator is obtained are not i.i.d. In Heijmans and Magnus [6] we studied this problem in a more general framework and established conditions which appear to be weaker and more readily applicable than usual. In particular, the regularity conditions never require convergences in probability to be uniform.

The present article is based on and extends from [6]. We consider a set Y= (Y,Y2, . . ., yJ) of observations, not necessarily independent or identi- cally distributed, whose joint distribution is known to be normal,

y N(u(yo),Q(yo)), (1)

where both the mean vector p and the covariance matrix Q may depend on an unknown parameter vector yo to be estimated. This set-up, apart from the assumed unconditional normality, is rather wide since it contains not only the nonlinear regression model with "fixed" regressors, but also linear models with lagged dependent observations, random regressors or random parameters. Notice that the covariance matrix of the observations may depend on parameters in the mean. We shall discuss the generality and limitations of our set-up more fully in Section 3.

The existence and consistency of ML estimators are assumed both in [6] and in the present article. We discuss both issues in detail in [7].

A brief review of the literature on ML estimation with generally dependent observations can be found in [6]. Here we only mention two predecessors of the present study. First, Anderson [1, p. 183-211], generalizing the im- portant paper by Mann and Wald [13], considered the linear regression model with exogenous and lagged endogenous variables

k p

Yt = E ixti + E bjYt-j + 8t, (t - 1, ... ., n), (2) i=l j=1

where the <S are i.i.d. N(0, a2). We note that Rubin [18] extended [13] to allow for exogenous variables much before Anderson, and that Schonfeld [19] gave the same extension as Anderson did. Many authors have improved upon Anderson's work in various ways. See for example, Hannan [5]. Secondly, Magnus [10] studied the linear regression model

y = X13o + E, (3)

Page 3: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

376 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

where X is nonrandom and of full rank, E is distributed as N,(0,Q(00)), and the parameters in 3,, are functionally independent from those in Ho Magnus' [10, Theorem 5] conditions go back to Weiss [20] and are very stringent. Both equations (2) and (3) are, of course, special cases of the model of equation (1).

This article contains five theorems, four propositions, and sixteen lemmas (of which seven are in Appendix A), and is organized as follows. In Section 2 we explain our notation, and state Theorem 1, which contains the extension of the central limit theorem to dependent observations previously obtained by us in [6]. In Section 3 we list the three basic assumptions; these are discussed briefly. The score vector, Hessian matrix, and Information matrix are derived in Section 4. In Section 5 we prove first- and second-order regularity of the loglikelihood function. Denoting by lj(y0) the score vector (i.e., the derivative of the loglikelihood function) evaluated at the true value yO, we prove in Sections 6 and 7 some auxiliary results regarding the com- ponents of the random "difference" vector ljy) - 1,, - 1(yo), and show that the sequence {lj(yj)} is a vector martingale.

All these results are finite results; the remainder of the article concerns asymptotic results. The first of these is the asymptotic normality of {n- 1121(o)} established in Section 8. Denoting the matrix of second-order partial derivatives of the loglikelihood function (the Hessian matrix) by Rj(y), we show in Section 9 that (1/n)RJ(yJ) converges in probability. to minus the Information matrix; in Section 10 conditions are given that guarantee suf- ficient continuity of (1/n)Rn(y) that this function is in some sense close to (1/n)Rn(yo), when y is near yo and n is large. Our main result (Theorem 2) is stated in Section 11. A weaker version of Theorem 2 (Theorem 3), easier to apply in practice, is stated and discussed in Section 12. Section 13 contains the special, but important case where the covariance matrix and its derivatives up to the second order have locally bounded eigenvalues. As an example we work out (Section 14) the conditions for asymptotic normality in the case of first-order autocorrelation (Theorem 5) and compare these conditions with the literature. Two appendices containing the proofs conclude the article.

2. NOTATION AND SET-UP

The following notation is used. The defining equality is denoted =, so that x:= y defines x in terms of y. N= {1,2, .. . }, and R' denotes the n- dimensional Euclidean space. The eigenvalues of an n x n matrix A are denoted it(A), t = 1, . . . , n. To indicate the dimension of a vector or matrix, we often write /I(n) :(=( 1, Y2, * * *, inY)', and Qn for the n x n matrix Q. If f(x) is an m x 1 vector function of an n x 1 vector x, then af(x)/ax' denotes the m x n matrix of partial derivatives, and af(x)/0xi the m x 1 vector of partial derivatives with respect to xi; if F(x) is an m x p matrix function of x, then

Page 4: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 377

aF(x)/axi denotes the m x p matrix of partial derivatives of F with respect to xi.

The general set-up is as follows. (See also Heijmans and Magnus [6, Section 1.2].) Let JY1,Y2, .. } be a sequence of random variables, not necessarily independent or identically distributed. The joint density function of Y(n) := (Y1,Y2, . . . , yJ) e n is denoted hn(.; y0), and is assumed to be known except, of course, for y0, the true value of the parameter vector to be estimated. We assume that y, E F, the interior of the parameter space F c RP. For every (fixed) y E RAn the real-valued function LJ(y; y): = hn(y; y), y e F, is called the likelihood (function), and every value Yn(Y) e F with

L.('9(y); y) = sup L.(y; y) (4) yoF

is called an ML estimate of y, Let Mn denote the set of y E lRn for which an ML estimate exists, i.e.,

Mn:= U {Y: y R 9 Ln(y; y) = sup Ln(Ok; Y)} n E N. y c-r o r

If there exists, for every n e N, a measurable function n from R' into F such that equation (4) holds for every y E M", and a measurable subset M'n of Mn such that P(M') -* 1 as n -+ oo, then we say that an ML estimator {n(ny)} of yo E F exists asymptotically almost surely (a.a.s.).

All probabilities and expectations are taken with respect to the true underlying distribution. Thus we write P instead of P.0, E instead of EY0, etc.

For fixed y E Rn, An(Y) = log Ln(y; y) is the loglikelihood function and, if An is twice differentiable, ln(y) = aAn(y)/3y denotes the p x 1 score vector and RJ(y) := a2An(y)/ayDy' the p x p Hessian matrix with elements Rnij. We define Lo(y) := 1 and let gn(y) = Ln(y)/L -1(y) be the "conditional likeli- hood". Finally, we define cnj = a log 9g(YX)/aYi (j = 1,... , p).

In Heijmans and Magnus [6] we considered the first-order efficiency and asymptotic normality of ML estimators obtained from generally dependent observations, assuming that the joint density of (Y1, . . ., yJ) is known (except, of course, for yJ), but without specifying this function. The following result, which will serve as our starting point, was proved there.

THEOREM 1. Assume that

1. for every (fixed) n E N and Y(n) c -R , the loglikelihood An(y; Y(n)) is twice con- tinuously differentiable on r;

2. EQ4. < 00 (n E ,j=1. p); 3. E(Ynil.Y y ,y- ) = 0 as. (n 1 2, j = 1., p); 4. p lim (1/n) max j =O ? = 1, . . ., P);

n -cc 1 tSn

Page 5: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

378 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

n

5. lim (1/n2) var S E(ctiXtj JY1 , Yt- 1) = 0 (i, = 1. p);

n

6. lim (1/n2) E var (4tiX,j-E(4,iXtjI y1, * )- l) = 0 (i,j = . p); nr - toI= 1

7. there exists a finite positive definite p x p matrix Go such that

lim (1/n)Eln(yo)ln(yo) = Go;

8. p lim (l/n)R.(y0) =- ;

9. for every a > 0 there exists a neighborhood N(y0) of y0 such that for i,j = 1, p

lim P [(1/n) sup IRniJy) - Rnijyo)| > ] = O0 n - oo yeN(y.)

Assume further that an ML estimator {YU} of Y. E T exists asymptotically almost surely, and is weakly consistent. Then the sequence {In} is first-order efficient and asymptotically normally distributed, i.e.,

p limn ( A

- yo) - (1/ n)G`1l&(y0)) = 0 n - oo

and

-S/n(y-n Y) -+ N(O, Go-').

Proof. This is the special case of Heijmans and Magnus [6, Theorem 2] obtained by choosing ]n = in: = El1(yo)ln(Y0) and Go = Ko, and assuming that in/n tends to a positive constant as n -o . U

In the remainder of this article we shall assume that the joint density is normal. The precise framework is described in the next section.

3. BASIC ASSUMPTIONS

The first assumption defines the structure which generates the observations.

Assumption 1. For every (fixed) n E RJ, y(n) (Y1,Y2,.. , y.)Y E R' fol- lows an n-variate normal distribution

0

Y(n) - N(u(n)(7o), f2n(Yo)), yO G F c HP,

where yo is the true (but unknown) value of the parameter vector to be esti-

mated, and p, the dimension of F, is independent of n.

We further specify the functions involved in the next two assumptions.

Page 6: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 379

Assumption 2. p(fl): F -r R' and Qn: F R- n I n are known twice continu- ously differentiable vector (matrix) functions on F, the interior of F, for every (fixed) n E N.

Assumption 3. The matrix QJ(y) is positively definite (hence nonsingular) 0

for every n E NkJ and y e F.

The assumption of unconditional normality is, of course, rather strong, but not as strong as it may seem. Let us consider the class of models included in this assumption. First, the classical nonlinear regression model

Yt = O(xe, l0) + et, (t = 1,.. ., n),

where /(.) is the known response function, xt is a nonrandom vector con- taining the values observed for each of the explanatory variables at time t, the Et are unobservable errors whose joint distribution is known to be

,8(n) - N(O, Q2n(00)),

and fto and 00 are the true values of the parameter vectors to be estimated. Note, however, that our set-up allows for the fact that the covariance matrix of the observations may depend on parameters in the mean, thus including cases such as the type of heteroskedasticity where the variance of Ye is pro- portional to the square of the mean. Secondly, linear models with lagged dependent observations. For example,

Yt=co+floYt-i +7oXt+et, t= 1,2,...I

where {x,J is a sequence of observations on the nonstochastic regressor, yo is fixed, and {etJ is i.i.d. N(O, a2). Then (Yl, Y2, . . , Yn) is n-variate normally distributed, with

e- et-1

Pt: = Eyt = yof3t + 0 Eo,BJ + yoZ E f3xtj, t n1,..., n j=O j=O

and 6(s,t)

Wse9 cov(ys,ye) = 9fils-tl E z 2j, S, t = 1,... n, j=O

where 3(s, t) = min (s - 1, t - 1). The covariance matrix Q = (wst) depends of course on /Jo. The situation where the errors {ct} are not i.i.d. also falls within our framework; only the expression for w0st becomes more complicated. Thirdly, linear models with stochastically varying coefficients or stochastic regressors. The linearity, except in certain pathological situations, is essential in order to maintain unconditional normality. However, models that are linear in the stochastic regressors or lagged endogenous variables are allowed to be nonlinear in the (nonrandom) parameters, and models that are linear

Page 7: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

380 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

in the stochastic parameters may be nonlinear in the (strictly exogenous) regressors.

The fact that the observations {Yt} are scalars is not restrictive, because of the assumed general dependence. The observations may be finite dimensional vectors and even the orders of these vectors may vary with t. The only requirement is that their joint distribution is multivariate normal.

Three further points are worth noting. First, the number of parameters is assumed to be fixed and independent of the number of observations. Secondly, the parameter space F is a subset of lRP, the Euclidean space of (fixed) dimension p > 1, but not necessarily a p-dimensional interval. Thirdly, for fixed n, the covariance matrix ,&(y) (and therefore also Q 1(y)) is required to be nonsingular for every value of y E F. But, for n -+ oo, there will in general be values of y such that

IQn() I+ 0 or Itn(T) I + ?

4. THE DERIVATIVES OF THE LIKELIHOOD FUNCTION

The loglikelihood function based on n observations Y, ... , Yn is

An(7) = -(n/2) log 27t - I log IQn(T)I - 1(Y(n )- (n)()M)A- 1(7)(T(n) - P(n)(7y)), (7 E IF). (5)

LEMMA 1. Given Assumptions 1-3, the loglikelihood An(y) is, for every (fixed) n E NJ and y(.) E Rn, twice continuously differentiable on F.

Proof. Obvious. a

Lemma 1 ensures that the (log) likelihood is twice continuously differentiable in a neighborhood of the true parameter value y,.

Let us define the p x 1 score vector

In(y) := ay (Y e F), (6)

with components lnj{)O, j = 1,... , p; the symmetric p x p Hessian matrix

a2An(Y) R.(y) := -, ( F), (7) ayay' with elements Rnij(y), i,j = 1, . . . , p; and the symmetric p x p Information matrix2

G(Yo): = - ER&(JY), (8)

with elements G.ij(y0), i,] = 1 . . . , p. We can now establish Proposition 1.

Page 8: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 381

Proposition 1.3 Given Assumptions 1-3, the typical elements of the p x 1 score vector lQ7), the p x p Hessian matrix Rn(y), and the p x p Information matrix GQ(y0), are given by

1 (aQnn(7 N _____7

(n = 2 tr Q(v)) + (Y(n) - _(_)))_nn _ () _

2(Y(n) IL(n)(Y)) '

(Y(n) - (n)(7));

R a, _a(n ) Q-1() o(n)(7)

- tr aQn(v) Qn() n (7) Qn()

+ 2 tr (a () Qn()J) + (Y(n) - P(f)(y)) qnJ

- (Y(n) - I(n)(Y)) M '

(Y(n) - yo)(7))

where

qnij (y) e = + e., + nn () 1 n ;

and

Gnij(7,o= (aflr). Q n(7) a3(

+ - tr ( ( Qno) a y QR1co)).

From Proposition 1 we easily obtain a compact expression for the Infor- mation matrix (rather than for a typical element), namely4

G~(0 =(a/In)Y0)' I

'(an)Go ko) 7f O-

ay l()\\ 0 ~ a7) ay ) (0 (a vecQ G0)( v0a vec ))

'0)70

+2 ( aV' (Qn(o) (D 9Qn(vo) ) ( aVt ) (10)

which shows that the Information matrix is positive semidefinite for every ne RN.

For the special case where ,u is a function of "structural" parameters , Q is a function of "covariance" parameters 0, and ,B and 0 are functionally

Page 9: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

382 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

unrelated, we note that the Information matrix is block-diagonal (see also Magnus [10, Theorem 3]):

GQ(f Oo) = (Gn( O') ?O) (11)

where

GI(f, 0) = ( ) (0)(aC (12)

and G'(0) is a symmetric matrix with typical element

(GO(O))ij = tr (a,(oQ) aQ) Q(O)) (13) n ~~aoj n)(O) o n)

5. REGULARITY

Our basic assumptions also suffice to prove first- and second- order regularity.

Proposition 2.5 Given Assumptions 1-3, the loglikelihood function An(y) is regular with respect to its first and second derivatives, i.e.,

Elk(yo) = 0, -ERn(yo) = Elb(yo)ln(yo).

6. FINITE PROPERTIES OF Xtj

Let g,(y) := L,(y)/L, - 1(y) denote the "conditional likelihood" function, and let Lo(y) := 1. In this section we are interested in the partial derivatives of

log g,(y) evaluated at the true value y0, i.e., Qtj : = a log gj(y.), = 1, . . .

First, however, we recall some properties of the Cholesky decomposition. Given any positive definite n x n matrix Q,n, there exists a unique diagonal

n x n matrix An with positive diagonal elements, and a unique unit upper triangular6 n x n matrix Z,, such that

A= ZnA'Z' (14)

This is the Cholesky decomposition of Qn 9 . The diagonal elements c, ... of An are called the Cholesky numbers. The first t components of the t-th column of Z. form a t x 1 vector, which we denote zt (t = 1, ... , n). Notice that the last (t-th) component of z, is 1. The vectors zl, . . ., Zn are called the Cholesky vectors.

Page 10: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 383

Now let Q,n be the n x n upper-left submatrix of a positive definite (n + 1) x (n + 1) matrix On+1 The matrix Qn+l possesses n + 1 Cholesky numbers (vectors). A crucial property of the Cholesky decomposition is that the first n Cholesky numbers (vectors) of Qn+l are precisely the Cholesky numbers (vectors) of Q,n. Hence, a, (and zt) unambiguously denotes the t-th Cholesky number (vector), irrespective of the value of n.

The following two properties are easily established:

n

IQnl = H OCt (15) t = 1

and

C(n)Qn C(n) = t (?(7 ZtC(t)), (16) t= 1

given any set of real numbers c 1,... , cn, where C(t) = (c 1, *. . , c)', t = 1,. n. Using equations (15) and (16), we rewrite the loglikelihood function An,

given in equation (5), as

An(y) = -(n/2) log 27r - 2 log Ifn(vM) -i

(Y(n) - (n)(MY ) n- 1(7)(Y(n) 8 (n)(7) n

- (n/2) log 27r - 2 log H t(y) t= 1

n I

E (a -1/2(y)z,(y)(y(t) -_(t)(y))) t= 1

n =-2 EZ (log 2noct(y) + (at- 1/2(y)Z,(y)(Y(t) -(t)(y)))2).

t= 1

Hence,

log gt(y) =-2 log 2r1at(y) - 4(o5 12(y)Z(y)(Y(t) -_ (l)(y)))2' (17)

and its first-order partial derivatives :tj = 0 log gt(y0)/Oyj evaluated at y, can be conveniently expressed as

tj -atj(l -U2) + (6tj - W,j) (18)

where

actj = 0 log at(7)/ayj' (19)

btj : = (1/ ;(yO))( (yO)/OyJ)'zt(yO) (20)

t= (1/c)Xt(yo)))z(yo)(t), (21)

Page 11: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

384 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

Wtj= (1/ cXt(yo))a )/)'C(t) (22)

6(t) = Y(t) - IL(t)(Yo). (23)

We note three important features of ut and wtj which render equation (18) for Xtj particularly useful. First, since ?(n) - N(0,,Q(y0)) and Q-1 = ZA-'Z', we have

An- 1/2 (7,jZ'(7,jg(.) N(0, In)-

Hence, u1, U2,... are independent and identically distributed as N(O, 1). Secondly, if ek (k = 1, . . . , t) denotes the k-th column of I,, and Ek the k-th

component of E(t), then, for k = 1, . . ., t -1,

EekUt = E(e6(t))(x7 12(Yo)8t)Zt(Yo)) - 2- 112(y )e'Qt(To)zt(7)(

= at- 12(y0)e'(at((y)et) = x1/22(yo)e'et = 0,

because

7tt= (z') 'AZ 'Zet = (Z<) 'Atet = tt(Z')- let = atet-

Hence, ut is independent from P1,.. .,- 1. Thirdly, since wtj depends only on 6(t 1) and not on et (because the last

component of Oz,(7oJIOTj is zero), we also have that ut and wtj are independent. Let us now establish some properties of Xtj.

LEMMA 2. Given Assumptions 1-3, Ed4j < oo for every n E FN and j =

1, ... , p.

Proof. Obvious. U

It is also easy to prove that

E((CtjIc(t-1)) = 0 a.s. (24)

Further, we have

XtiXtj = *xtioc (1- U2)2 + (6ti - Wti)(tj Wtj)U

- 1((bti - Wti)tj + (6tj - Wtj)ati)(l -U2)Ut,

and hence

E((:tjtj| e(t - 1)) = 'a 4c4 ; + (6ti - wti)(tj - wtj). (25)

Page 12: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 385

From equations (24) and (25) we easily obtain

a2 =var Qtj = E42 = EE(2 |(t-)) = Ia2 + 32 + (IILXt(yo))(OZt(yo)layj) Qt(yo)(aZt(yo)layj). (26)

Finally, let us consider

Vtij = -ti tj-E(4tjdtj I e(t - 1))

= 4aitX4-U2- )(bti _Wti (jWJ(-2) = *tiut - 2u - 1)-(ti- ~)(~ - w,j)(l - ut)

2 ((ti-Wti)xtj + (btj - Wtj)Xti)(j - U2)Ut.

Using Eu6 = 15 and Eu' = 105, we have

2E(v2 jI2(t )) = 7a2 + 4(ttii-w)2(tj -Wtj)

+ 5((bti - Wti)tj + (3tj -Wtj)ti)2

+ 4ctiL 46ti - wti)(btj -wtj)

and hence, after some elementary calculations,

var vtij = EE(v2ijl(- 1)) 2 14of2fj. (27)

7. {1/(yo)} IS A VECTOR MARTINGALE

We shall now show that lj(y), i.e., the derivative of the loglikelihood function evaluated at yo, is a vector martingale. (See Heijmans and Magnus [6, Section 3] for a definition and some properties of vector martingales.) The relevance of this fact lies in the possibility to use (vector) martingale limit theory to prove the asymptotic normality of {In 1 21n(y0)} We first establish Lemma 3.

LEMMA 3. Given Assumptions 1-3,

E(4njly,...y * -1)= a.s. (n>2,j=1, . ..,p).

Proof. This follows from equation (24). U

An immediate consequence of Lemma 3 is Proposition 3.

Proposition 3. Given Assumptions 1-3, the sequence {ln(yo), n E NJ} is a vector martingale.

Proof. Since Inj(yo) = , t Lemma 3 implies that E(1,"yj) I yi'.... Yn- 1) = In - lj(vy). Further lj(y0) is a continuous (hence measurable) function of Yi, . ., Yn with finite (in fact, zero) expectation. Hence, the result follows from Lemma 1 of Heijmans and Magnus [6]. m

Page 13: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

386 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

8. ASYMPTOTIC NORMALITY OF {n-'121.(y0)}

So far, all our results were finite results. From this section onwards, all results are asymptotic results. The first asymptotic result is the asymptotic normal- ity of {n` /21 (7 )} for which we make two further assumptions.

0

Assumption 4. If Gj(y), y E F, denotes the positive semidefinite p x p ma- trix whose ij-th element is given by

Gnij(y) =(ago)( ) 9n- (Y (am)J ) =I Qn-, '(}ji n-, y

+ - trQK(7)n(Y) aQ?v QR(7))

0

then, for all y E r, the matrix (1/n)Gn(y) converges, as n -+ oo, to a positive definite p x p matrix Go(y).

Assumption 5. In terms of the (unique) Cholesky decomposition of Q7-1: Q'-1 = Z A,1 Z,, where An is a diagonal n x n matrix with positive diagonal elements and Zn is a unit upper triangular n x n matrix, we have, for every y E F,

lim (1/n2)tr , An (7)(af l) Q0(7)= 0, ( = 1,... , p).

Assumption 4 ensures that the Information matrix G&(yJ), which is posi- tive semidefinite for every n E N\1 (see section 4), is in fact positive definite for n sufficiently large. In the linear regression model y = Xf3O + e, with E N(0, cr2 V), V known, we commonly require that (1/n)X'V - 'X tends to a positive definite matrix as n -+ oo. Assumption 4 is the equivalent condition for our case.

Assumption 5 requires that (1/n2)tr<(C-2(y)) - 0 as n oo, where

_(y) (az072(y) (aZn( n 1 Zn(Y) IYl/2(V) c ~~~~~~~A, 'Qy) C0iy n K a7i K a7i )

But since

tr K 7 a (7 Q )) = 2 tr(Cni(y)) + trKA-T'(y) aAn (y)2

we obtain

Page 14: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 387

and so

(Il/n2)p2i(7) <, (Iln 2) tr (C2E(7)) <, ((1/n)9ni(7))((1/n) tr (Cni(7))) -< ( (11n)ynj(7))( ( 1n)Gnii(7A

where

Pni(7Y)= max 4jCni(y)). 1 StSn

Hence, given Assumption 4, Assumption 5 is equivalent to

lim (1/n)ynij(y) = 0, (i = 1, . . p). n -o

This shows that Assumption 5 is a weak assumption. For, knowing that the average of the n eigenvalues of C,j(7) is bounded (for every fixed y E F), we do not require that each eigenvalue of Cni(y) is bounded, but only that each eigenvalue is o(n).

The following three lemmas can now be established.

LEMMA 4. Given Assumptions 1-4,

p lim (1/n) max X2 = O, (j = 1, . . ., p). n- 0o 1 $t<n

LEMMA 5. Given Assumptions 1-5,

n lim (1/n2) var E E(4tjXtj I Yi, , t- J = 0, (i,j = 1, . * p). n-oo t= 1

LEMMA 6. Given Assumptions 1-4,

n lim (1/n2) E var(tj tj-E(4tiXtjIy1, , Yt*. = 0, n-oo t=1

(i,j= l.,p).

We now have all the ingredients to prove the asymptotic normality of the sequence {n- /21n(Y0)}

Proposition 4. Given Assumptions 1-5, the sequence {n 1-/21 (yJ) n Ec is asymptotically normally distributed, i.e.,

Page 15: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

388 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

9. CONVERGENCE (IN PROBABILITY) OF THE HESSIAN MATRIX

In this section we shall prove that the Hessian matrix Rn(y0) divided by n converges in probability to minus the asymptotic Information matrix Gj(y0). First, however, we establish Lemma 7, which we shall need later in the proof of Theorem 2.

LEMMA 7. Given Assumptions 1-4,

lim (1/n)El1(jo)ln(70) = G(7j). n oo

Proof. We have (1/n)E1j(yj)l'(y7) = -(1/n)ERj(y0) = (1/n)G(0) Gjj, as n -s o, using second-order regularity (Proposition 2), Definition (8) and

Assumption 4. U

Let us now consider the convergence of (1/n)Rn(yT) to - G0(y0). We shall make the following two additional assumptions.

0

Assumption 6. For every y E F,

lim (1/n2) tr (a"(7 ) Q_ ( )\) = 0, (i,j= 1,, P).

0

Assumption 7. For every y E F,

(i) lim (l/n') (0,(n)() , Otn

n(Y) OfT 01 )?

(i2j = 0,

and

(ii) lim (-ln2a (47) ,

1(n ) (a02 )(Y)) = 0, (ij = p).

Assumption 6 looks similar to Assumption 5, but neither is implied by the other. A necessary condition for Assumption 6 to hold is that

(1 () ?

(1/n) max At n~ ?n(Y)j 0

Page 16: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 389

as n o co; sufficient is that

(1/ ,n) max At( a2- (iQ()~ -0 ( /S <tn t(<r aTiaj7j

())

as n -+ oc. Assumption 7 is implied by the following conditions:

(i) (1/n) max 2I(Qn(v)) -+ 0, 1 <tSn

as n -* oo, for every 1 E F, and

(ii) there exists a continuous function M: F R such that for all n E NJ and o

/ J(fl(n)M~' n~ l( tt)n2 / (/ In) a ) a ) a(y)) sO M(y), (i,j 1 p),

and

(1/n) a )n 2(y)( ( (n(7)) hj M(y), (i,j = 1,... p).

The main result of this section is the following lemma.

LEMMA 8. Given Assumptions 1-4, 6, and 7,

p lim (Iln)Rn(7o) =-G(70). n oo

10. FINAL LEMMA

One more result is needed before we can prove the asymptotic normality of the ML estimator. To establish this result we shall make the following two additional assumptions.

0

Assumption 8. For every y E F, (i) there exists a finite positive number M(y) such that for every n E P,

(1 n) tr(QAM) -< M(y),

and (ii)

lim (1/n2)tr(K22(y)) = 0. 0n

Page 17: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

390 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

0

Assumption 9. For every 0 e F and a > 0, there exists a neighborhood 0

N(O) c: F and an integer nO (depending only on a and N(O)) such that the following ten conditions hold for every n > no, y e N(0), and i,j = 1,. ,p:

(i) (1/n) y([(n)(7) - I(n)(0)) ai a!7(T)1 ,) <

(iii) (Iln) tr (r) Qn(4) a (4) Q (4))

- tr(C nn(?0 d) Qn(l) 1< ?t,

(iv) (l/n) tr ( ' - i?n(7))-tr (a2Q1 '(4))?)

(v) (l /n) j(t: nn (; (yj ) y (ni )y ( 7J )

(vii) (1/n) (U(.)(Y) - M(n)())) (a ) ((n)(Y) - f(n)(4))) < Ot,

(Vlll) (Iln/(n)Y _ Itn()f nY()0)< t ayi ayj aTayj y

(viii) (l/n) n (Q n(7) aIo)(7) _ n (4)) a(I2 ())) a7i a7j a7i a7j

(ix) (l/n)(Q- ,(y) a k(n7) - Q-l(4)) 0 INA)(0))

n\f a7ja71 -n 'yi"a7y7 j ? ayi@'Yi a!yiayj

02(n- 1(y) d/2d(/-Q- l(o)I(()<2

(x) max ,> (-a'n a2)- nR(4)i 1 < t \ n a7ia7j a7a71j

At first glance the conditions in Assumption 9 may appear to be 0

overly restrictive; in fact, they are not. We already know that for every 0 e F, x > 0, and n e IQ, there exists a neighborhood Nn(5b) c F such that all ten conditions of Assumption 9 hold for every y e Nn(. All that is required in addition is that the intersection nn=l Nn(O) does not degenerate to a single

Page 18: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 391

point k, but remains a true neighborhood (however small) of 0. Notice that Assumption 9 is a local, not a global, condition: we require that it holds for points close to 0, but not necessarily for every point in F.

Nevertheless, to verify Assumption 9 in practical situations, i.e., when ,u and Q are specified, can be arduous. Therefore we shall provide later (Theorem 3) conditions which are slightly stronger than Assumption 9 but easier to verify.

Let us now use Assumptions 8 and 9 to prove Lemma 9.

LEMMA 9. Given Assumptions 1-3 and 7-9, there exists,for every os > 0, a neighborhood N(y0) c F of 70 such that for i,j = 1, ... ., p

lim P [(1/n) sup RJO-{y)- Rn,ij(T)l > oc =. n- oo yeN(y.)

11. THE CENTRAL LIMIT THEOREM FOR NORMALLY DISTRIBUTED (BUT DEPENDENT) OBSERVATIONS

We can now prove our main result.

THEOREM 2. Suppose that Assumptions 1-9 are all satisfied. Suppose further that an ML estimator {Yn(y(n))} exists asymptotically almost surely, and is weakly consistent. Then the sequence {Jn(Y(n))} isfirst-order efficient and asymptotically normally distributed, i.e.,

- y~)-~ N(, G~'(ye)). (^n(A()-tO N(O, Go(7)

Proof. Lemmas 1-9 imply that conditions 1-9 of Theorem 1 are all satisfied. Thus the result follows. U

12. A SET OF STRONGER ASSUMPTIONS

Some of the assumptions that underlie Theorem 2, notably Assumption 9, while weak and verifiable in principle, are still unappealing. For many pur- poses (for example in the case of first-order autocorrelation, see section 14) they are also unnecessarily general. Let us replace Assumptions 2 and 6-9 with the following set of assumptions which, while somewhat stronger, are more appealing and easier to verify.

Assumption 2'. For every (fixed) n c- N, [(n): F Rn is a known three

times differentiable vector function on F and Q : F Rn nX is a known twice

continuously differentiable matrix function on F.

Page 19: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

392 RISTO D. H. HEIJMANS AND JAN R. MAGNUS 0

Assumption 6'. There exists a continuous function M: F R such that for all n e RN and y e F,

(i) (I1/n) tr (Q'(y)) <, M(y),

(ii) (l/n)trn- I

-Q(y)) 4

( i1 ,p

(iii) (lln) tr( f ()n(^Y)) M(y), (i, = l, I.,IP)

oyi~~~~~~~~~~~~~~~

0

Assumption 7'. There exists a continuous function M: F R W such that

for all n e N and y e F,

(i (l/n)(n)() 0(n/t)(y)) () i=1 )

(ii) (Iln) (t,(Y)) (yi ) Q(Y) I (7) (@(9)) .. M(),

ay / ay, / .,p (ij = )

(iii) (1/n) (a2L)(Y)) Q- '(y) (:2 Y(n)(Y) s M(y), (i,j = ,.. , p),

) (1ni) j )) y)2 (I0y )) sj

(an)I~(y) N' (aM(nl)(Y) N

(v) (1/n) Q- 2(y) I ) MQ), (i,j,k = 1,..., p). \aOYiaYj,Y n1 ~ aYiaY aYk/m() j k-1 )

Assumption 8'. There exist real-valued functions M1 and M2 defined and 0 0 0 0

continuous on F x F such that for all n e N, y e F and 0 e F,

(i) (l/In) ( 2 (} { n- ) M(y,), (i = 1, ,P), ) 2, / \Yi /))

(i) 9ln(n)(Y)) n_a () (/a'(9n)(Y/))M(,, (ii) ON ayi aYJaYk a7i / I(' )

(i,j, k = 1,. . .,p),

~02n- 1(y) i02n- 1(0))| (iii) max /, <--~/ M2(Y, 0), (1ij = ,.., w 1 ( 0 ay -+ 0ii. j

with M2(Y, 0) O-) as y -+0.

Page 20: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 393

Discussion. In Assumption 2' we require that the mean vector ,u is three times differentiable rather than twice continuously differentiable. The only reason for strengthening Assumption 2 in this way is that it enables us to replace Assumption 9 (ix) by the much simpler Assumption 7' (v), as we shall see in the proof of Theorem 3. If ,u is not three times differentiable, we can retain Assumption 9 (ix).

The eight conditions in Assumptions 6' and 7' are all of the following form: "there exists a real-valued function M, defined and continuous on 0 0

F, such that (1/n)q2(y) < M(y) for all n E FM and y E F". This implies that for

every y E F a neighborhood N(y) can be found such that the sequence of functions {j2} is uniformly bounded on N(y). That is, it means that {jq} is locally uniformly bounded. Similar remarks apply to Assumption 8'.

In the linear regression model y = XyO + e, N(O, Q(yo)), Assumptions 7'(iii)-(v) are trivially satisfied.

If there exists a y E F such that Qn(y) = In (which is almost always the case), then Assumption 7'(i) follows from Assumption 8'(i).

Let us now state Theorem 3.

THEOREM 3. Suppose that Assumptions 1, 2', 3-5, 6'-8' are all satisfied. Suppose further that an ML estimator {I0(y(0))} exists asymptotically almost surely, and is weakly consistent. Then the sequence {Yn(Y(n))} is first-order efficient and asymptotically normally distributed, i.e.,

(9n(Y(n))-Yo) N(O, Go ̀()Jo) ).

13. THE CASE OF UNIFORMLY BOUNDED EIGENVALUES

In the frequently encountered case where the eigenvalues of Q), Q-21, and its first and second derivatives are known to be locally uniformly bounded, the conditions simplify considerably. The following set of assumptions then re- places Assumptions 6'-8'.

0

Assumptions 6". There exists a continuous function M: F R such that for all n E Rd and y E F,

(i) max t,(QKn(y)) < M(y 1 $;tSn

(ii) max t(Qn '()) ) M(y), 1 <t<n

(iii) 1max Atn - ( n`7)) < M(y), (i- = 1... 1 P).

(iv) aSx

At nri )| M(A) (i,j= LI, P).

Page 21: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

394 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

Assumption 7". There exists a continuous function M: F -r R such that

for all n e N and y e F,

(i) (1/n)( ai ) ( ayi ) 4y), ( = .

(8[ (y() N ( a2,(f)(y < Ay) (i,j = 1, . ., ,

(iii) (I/n) Kay.a7 ) .8?k,)Kya ) )< M(7), (ij k = 1,. , P)

o o

Assumption 8". There exists a continuous function M2: F x F R such 0 0

that for all ne -J, y e F, and 0 e F,

a2 ( Q2 - 1(y) a2_ l lQ9))| M2(y 0), (i, = 1, . ,

< t S n a^yia7^j 3ayia^j

with M2(y, 0) -* 0 as y -. 0.

THEOREM 4. Suppose that Assumptions 1,2', 3-5, 6"-8" are all satisfied. Suppose further that an ML estimator {$ j(y(n)} exists asymptotically almost surely, and is weakly consistent. Then the sequence {9(Y()} is first-order

efficient and asymptotically normally distributed, i.e.,

(Y-YO) N(O, G- '(yo)).

Proof. Immediate from Theorem 3, making repeated use of the fact that x'AxI < x'x max IAV(A)I for any symmetric matrix A and vector x. U

1 <t<n

14. AN EXAMPLE: FIRST-ORDER AUTOCORRELATION

By discussing a relatively simple example (first-order autocorrelation), we now hope to convince the reader that the conditions of Theorem 4 are easy to verify in practice, and lead to conditions which are weaker than those known from the literature (in the case of first-order autocorrelation, in par- ticular Hildreth [8]). We wish to demonstrate Theorem 5.

THEOREM 5. Let {Y1,Y2,... } be a sequence of random variables, and assume that

C1. for every (fixed) n e N\, Y(n) = (Yi, Y2, YJ) e R0follows an n-variate normal distribution,

Page 22: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 395

where Vj(p) is symmetric n x n matrix whose ij-th element is given by pli-il/

(1 - p2), and & e B c k, po U:= (-1, 1), and a'2 R (0, c) are the true values of the k + 2 parameters /3:= (/3f. / . .k), p and a2 to be estimated;

C2. for every (fixed) n E N, P(n): B -- Rn is a known three times differentiable vector

function on B, the interior of B;

C3. for every /3 E B and p E U, there exists a finite positive definite k x k matrix HO(/3, p) such that

lim (1/n) (8) H)), V 7 (P)( ) = H0(/3,P);

C4. there exists a continuous function M: B -+ DR such that for all n E NJ and /3 E B,

(i) (1/n)t )() <1 M(:), (i=1... k),

(ii) (1/n) (8P(fl)(I3)) (P/3e) < M(/3), (i,j = 1,. k),

(ii3i ( n/ 3/) 838/3n)8/3l) ), (i,j, = 1. k).

Let y:= (P, a2) denote the vector of k + 2 parameters and let y0 =

(/W po, 2). Assume that an ML estimator {9(yn)} exists asymptotically almost surely, and is weakly consistent. Then the sequence {27(Y(n))} is first-order efficient and asymptotically normally distributed, i.e.,

(7-n(Y(n)) t0) N(O, W(yO))

with

( a2H- I(/30 pO) 0 )

W(,/o) = ' (I _ p2) 0

0' 0 2(4

Discussion. We have assumed that the structural parameters ,B and the covariance parameters a2 and p are functionally unrelated. This restriction can easily be removed.

Let il'(fl):= (1/)(M(l)t)a/)(()(p)/afli). From C3 we obtain q'(fl) 0

H"(fl, 0) > 0 as n -+ oo, for every /B E B. (H" denotes the i-th diagonal element 0

of Ho.) This implies that, for every ,B E B, the sequence {jn(f)} is bounded. It does not imply that for every ,B E B a neighborhood N(,B) can be found such that the sequence of functions {i'} is uniformly bounded on N(,B). This last requirement, however, is what we need, and condition C4(i) guarantees the existence of such a neighborhood. Notice again that C4(i) implies that {6n}

Page 23: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

396 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

is locally uniformly bounded; we do not require that it is globally uniformly bounded. Similar remarks apply to conditions C4(ii) and (iii).

In the special case of the linear regression model

y = X/3o + , E - N(O,caV(pO)), (28)

conditions C2 and C4 are redundant, and condition C3 boils down to

C3'. for every p E U, the matrix (l/n)X'V -(p)X converges, as n -+ oo, to a positive definite k x k matrix H,(p).

Let us compare our results with those of Hildreth [8]. Hildreth consid- ers the linear model of equation (28). His conditions for asymptotic nor- mality (p. 584) consist of our conditions Cl and C3', and two additional ones involving the matrix X = (x,j), namely (i) Ix,iJ is bounded (i = 1, ... , k; t = 1,2,. . .), and (ii) (l/n) n=. xuixt_ + ,j converges, as n -+ oo, to a finite limit (i,j = 1, . . . , k; z = 1, . . ., n). We conclude that our conditions are sub- stantially weaker than those of Hildreth.

NOTES

1. "Everybody believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact." Poincare [15, p. 171] attributes this remark to Lippmann. Also quoted by Cram6r [3, p. 232].

2. Since we have adopted the convention that each expectation is evaluated under the true value y, it would be incorrect to define G,(y) =-ERJ(y).

3. See also Holly [9]. 4. We use the fact that tr ABCB = (vec A)'(B 0 B) vec C for symmetric matrices A, B, and

C of the same order. 5. See also Magnus [10, Lemma 5] and Holly [9]. 6. A square matrix is said to be unit upper triangular if its diagonal elements are 1 and all

elements below the diagonal are zero. 7. Lemma A.4 replaces an earlier and less general one. We are indebted to Peter Robinson

for suggesting this lemma and providing the proof. 8. See also formula (37) in Magnus [10, p. 306] and the correction on p. 261.

REFERENCES

1. Anderson, T. W. The statistical analysis of time series. New York: John Wiley, 1971. 2. Apostol, T. M. Mathematical analysis (2nd edition). Reading, Mass.: Addison-Wesley, 1974. 3. Cramer, H. Mathematical methods of statistics. Princeton: Princeton University Press, 1946. 4. Hannan, E. J. Multiple time series. New York: John Wiley, 1970. 5. Hannan, E. J. The asymptotic theory of linear time-series models. Journal of Applied Proba-

bility 10 (1973): 130-145. 6. Heijmans, R.D.H. and J. R. Magnus. On the first-order efficiency and asymptotic normality

of maximum likelihood estimators obtained from dependent observations. Statistica Neer- landica 40 (1986): 169-188.

7. Heijmans, R.D.H. and J. R. Magnus. Consistent maximum-likelihood estimation with de- pendent observations: the general (non-normal) case and the normal case. Journal of Eco- nometrics 32 (1986): 253-285.

Page 24: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 397

8. Hildreth, C. Asymptotic distribution of maximum likelihood estimators in a linear model with autoregressive disturbances. Annals of Mathematical Statistics 40 (1969): 583-594.

9. Holly, A. Problem 85.1.2. Econometric Theory 1 (1985): 143-144. Solution in Econometric Theory 2 (1986): forthcoming.

10. Magnus, J. R. Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix. Journal of Econometrics 7 (1978): 281-312. Corrigenda in: Journal of Econometrics 10 (1979): 261.

11. Magnus, J. R. and H. Neudecker. The commutation matrix: some properties and applica- tions. The Annals of Statistics 7 (1979): 381-394.

12. Malinvaud, E. Statistical methods of econometrics (2nd edition). Amsterdam: North-Holland, 1970.

13. Mann, H. B. and A. Wald. On the statistical treatment of linear stochastic difference equa- tions. Econometrica 11 (1943): 173-220.

14. Marcus, M. and H. Minc. A survey of matrix theory and matrix inequalities. Boston: Allyn and Bacon, 1964.

15. Poincare, H. Calcul des probabilites (2nd edition). Paris, 1912. 16. Rao, C. R. Linear statistical inference and its applications (2nd edition). New York: John

Wiley, 1973. 17. Rozanov, Yu. A. Stationary random processes. San Francisco: Holden-Day, 1967. 18. Rubin, H. Some results on the asymptotic distribution of maximum- and quasi-maximum-

likelihood estimates (abstract). Annals of Mathematical Statistics 19 (1948): 598. 19. Schonfeld, P. A useful central limit theorem for m-dependent variables. Metrika 17 (1971):

116-128. 20. Weiss, L. Asymptotic properties of maximum likelihood estimators in some nonstandard

cases, II. Journal of the American Statistical Association 68 (1973): 428-430.

Page 25: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

APPENDIX A: SEVEN AUXILIARY LEMMAS

In this Appendix we prove seven lemmas which are applied in the main text and in Appendix B.

LEMMA A.l. (Anderson [1, p. 25]). Let 1, 2,'... be a sequence of nonnegative real numbers. Then (1/n) max , -+ 0 as n -* oo if and only if (1/n)Mn -* 0 as

1 <t<n

n -+ oo.

Proof. If (1/n) max,-ITh-n --* 0, then (l/n). <, (1/n) max,-<t-<nt -O 0. To prove the converse, assume that (1/n)Pn 0. Let t(n) be the largest index t (1 < t < n) such that I-t(n)

as n -+ oo, then (1/n) max11_t _uA (I/t(n))t(n)O - 0. U

LEMMA A.2. Let it1, P2, . . . be a sequence of nonnegative real numbers. Suppose that (1/n) Et= 1 ,-+ ,o (finite), as n -+ oc. Then

(1/n) max pt -+ 0, as n -+ oo 1 <t n

Proof. Immediate from Lemma A. 1. U

LEMMA A.3. Let {xn, n E NJ} be a sequence of random variables with finite second moments. If there exists a sequence of random variables {Yn, n Ec N} such that

Xn = xn(y1 . * X Yn)

and

E(x. I y1. Yn- 1)-?

then the random variables {xn, n E NJI are uncorrelated.

Proof. For i <j, cov(xi,xj) = Exixj = EE(xixjIyl, yj-1) = E[xiE(xjIyl,. , Yi-01) = ? U

LEMMA A.4.7 Let vl V2, ... be a sequence of random variables satisfying E v Vt| C < oo for some 6 > 0, and let 1, 12,... be a sequence of nonnegative real numbers. If there exists an a > 0 such that

n-atn -? 0 as n -+ oo

398

Page 26: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 399

and

n

n-ao Z Po is bounded in nfor some 0 e (0, 5), t= 1

then

0

lim E n-' max !Ltvt = 0. n-0 ao -1 t<n

Proof. Let dn = n- max 1Ut. Let E > 0 be arbitrary, and define 1 <t<n

Jvt, if |vt| > /d, }O, otherwise.

Then,

\ n n

E max goIvf0) E y4EIv'j = E jyE(jv'j-(f-l)|v'j6) 1 <t<n t=1 t=1

n n

< Z /Ll4(dn/1)0EIV,I6 i C(d /e)6 E Ito t=l_ t= 1

Hence,

0 ( En` max Ilttvt n OE max ,otIvtl) 1 <t<n 1 F?t<n

n-E (max /4Ivhl) + n- E(max i?lvt -vtl) I -<t-n 1 <t,<n

n

< C(dn,f) -n _a E / + n max pu(E1dj) t= 1 1 <t<n

< C'(d/,,F)"-0 + eO.

Since dn -O 0 as n -+ o (using Lemma A.1), and E was chosen arbitrarily, the result follows. U

LEMMA A.5. Let vl, v2,... be a sequence of X2(i) distributed random variables, not necessarily independent, and let PI, 12,... be a sequence of nonnegative real numbers. Suppose that (1/n) Zt= 1 It - po (finite), as n -o cxo. Then

p lim (1/n) max p,v' = 0 nfooo 1 erty0n

for every a > O.

Page 27: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

400 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

Proof. Immediate from Lemma A.4. U

LEMMA A.6. Let An and Qn be given n x n matrices, positive definitefor every 0 0

n E , and let 8(n) - N(O, Qn). Let F be an open set in RP, and let y0 be a given point in F. For every (fixed) n E N, let /n be a real-valuedfunction, b(n) an n x 1 vector function,

and Bn an n x n symmetric matrixfunction, each defined on F. Assume thatfor every a > 0 there exists a neighborhood N(7y) c F of 7. and an integer nO (depending only on cx and N(7o)) such that for every n > nO and / e N(y0),

(i) (l1n)jl.n(T)j <. cc,

(ii) (11n)b(n)(T)A b(n)(7) < Y.,

(iii) max 'inJ') a, 1 StSn

where )t(y), t = 1, . . ., n, denote the eigenvalues of the symmetric n x n matrix A' 12Bn(y)Al'2

Assume further that

(iv) (1/n) tr (A -1 'n)< K, (n e N),

for some finite positive number K, and

(v) lim (1/n2) tr (An- 'Qn)2 = 0. nos~~~~~ n-

oc~~~~~~~~~~~~~~~~ Then there exists for every a > 0 a neighborhood N*(y0) c F of y0 such that

lim P[(1/n) sup f,,n(y) + b'n)(T)e(n) + e(n)B&(y)I(n) > a 0- n- oo yEN*(y.)

Note. In practice we shall choose either An = In or An = Qn. In the latter case conditions (iv) and (v) are automatically satisfied for K = 1.

Proof. Let a > 0 be arbitrary. Given conditions (i)-(iii), there exists a neigh-

borhood Nk(oy0) c F of yo and an integer nO (depending on ox and NO(yo)) such that, for every n > n, and y E NOGA

(l1n)jlpn(Y)j < x/2,

(l/n)b'.)(y)A.b(() L x2/(32K),

and

max 1inty)j < x/(8K). 1 StSn

Page 28: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

ASYMPTOTIC NORMALITY OF ML ESTIMATORS 401

Let Nl(y0) be another neighborhood of y0 such that its closure N,(y0) is a subset of Nj(y0). Then

(1/n) sup lfl.(y)| < a/2, (n > no), (A.1) y e NI(y.)

(1/n) sup b'n,)('y)Anb(n)(y) ? Lx2/(32K), (n > no), (A.2) y e-Nj(y.,)

and

sup max cx/(8K), (n > nj). (A.3) yeNj(y,,) 1 St-<n

To avoid cumbersome notation let us abbreviate ?(n)5 fln(Y), b(n)(y), and Bn(y) to ?, fi, b, and B; also, each supremum is understood to be over all y in Nl(y0). Now consider the random variable wn := (1/n)e'A- '?, whose distribution does not depend on y, and note from (iv) and (v) that

Ewn < K and var wn -+ 0 as n -- oo.

Let n := (l/n) sup b'Ab. Then, since

(b'g))' < (b'Ab)(E'A -le),

we obtain

P((Iln) sup lb'E| > a/4) < P(N/Onw > a/4)

= P(Wn > LX2/(160n)) < P(wn > 2K),

using (A.2). Similarly, let tfr,n = sup max1 t( n lInt(Y)I. Then, since

|9 Bel < max Int(y)|('AA '?), 1 StSn

we obtain

P((l/n) sup E'BEI > a/4) < P(OnWn > a/4) = P(wn > a/(4fr)) < P(wn > 2K),

using (A.3). Hence, for n > no,

P((1/n) sup |3 + b'? + E'B?l > a) <i P((l/n) sup 1fl + (1/n) sup lb'EI + (1/n) sup |?'BEf > cx)

< P((1/n) sup |b'E + (1/n) sup |E'BE > a/2) (using A 1))

< P(max {(1/n) sup Ib'eI, (1/n) sup |?'B?I } > ae/4)

(P((1/n) sup Ib'EI > a/4) + P((1/n) sup VE'BE > a/4) < 2P(wn > 2K) < 2P(wn - EWn > K)

< 2P(Iwn - EWnI > K) < (2/K2) var wn -+ 0

as n -+ oo. This concludes the proof.

Page 29: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

402 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

LEMMA A.7. Let u N(O, I.), x = Au, and y = Bu, where A and B are real m x n matrices. Then

var x'y < '(var x'x + var y'y).

Proof. We have

var (x'x + y'y) - 4 var x'y = var u'(A'A + B'B)u - var u'(A'B + B'A)u

= 2 tr (A'A + B'B)2 - 2 tr (A'B + B'A)2

= 2 tr(A + B)'(A + B)(A - B)'(A- B) > 0,

and hence

2(var x'x + var y'y) = var (x'x + y'y) + var (x'x - y'y)

> var(x'x + y'y) > 4 var x'y. U

Page 30: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

APPENDIX B: PROOFS

Proof of Proposition 1. The loglikelihood AJ(y) is given by equation (5). Since

log QM(y)I = -tr aQ (v) Qn(v))T)

and

a - (Y(n) -(,)(y))A,(y)(Y(n)- (n)CY))

- 2(y(n-. I()(Y) )Af( ) 8()

+ (Y(n) -(n)(y)) ( A&) (Y(n) -yI)(7))) (B.1)

for any symmetric n x n matrix function A defined and differentiable on F, we obtain the expression for 1,3{y). Further, since

a (Qn ____ MQ_ 1(y2) (T) aQ2 ( ) tr "~(Y)' = tr Mn"7 aJi Oyj ~ ~ay7i ayj

+ tr d a71

03(y-j (a^(())' Q-1(5) (aI(n)(Y))

+ (y) - 9(n)())' (aQ,7'(y) )

a7, a71 +1 jd1i )

and using (B. 1) we obtain the expression for R,,Jy). Finally, since

E(Y(n) - Y(n)(Yo)) = 0, E(Y(n) - /(f)(7Y))(Y(n) - Y(n)(7o)) = ln(Y.),

we obtain the expression for Gnij{yo)-

Proof of Proposition 2. Recall from Proposition 1 that

1nh)-t (aQ'(70 y) \(nv\~__ 1a~(

2= 2

tr2 a71 (7O)}+ ay ) n(y0)-2 aQ7

403

Page 31: Econometric Theory, ASYMPTOTIC NORMALITY OF MAXIMUM ... · ASYMPTOTIC NORMALITY OF MAXIMUM LIKELIHOOD ESTIMATORS OBTAINED FROM NORMALLY DISTRIBUTED BUT DEPENDENT OBSERVATIONS RISTo

404 RISTO D. H. HEIJMANS AND JAN R. MAGNUS

where e = y - (yj - N(O, Q(7y)). Clearly we have El,,j(y0) = 0 (first-order regularity). To prove second-order regularity, we first note that

E(e'AE)E = 0

and

cov (e'Ae, E'Be) = 2 tr (AQ(yj)BQ(y0))

for any pair of symmetric n x n matrices A and B (see, e.g., Magnus and Neudecker [11, Corollary 4.1]). It is then easy to show that

Elni(y0J)l1.C{Y) = coV (Ini(v0) In{(o))

+ ? trQ )(YO) Q(m) aQy(o) ?(4Y))

= ERnij(To),

according to Proposition 1. U

Proof of Lemma 4. From equation (18) we have

$$\chi_{tj} = -\tfrac{1}{2}\lambda_{tj}(1 - u_t^2) + (\delta_{tj} - w_{tj})u_t,$$

and hence

$$\chi_{tj}^2 \le \lambda_{tj}^2 + \lambda_{tj}^2u_t^4 + 4\delta_{tj}^2u_t^2 + 4w_{tj}^2u_t^2.$$

Let

$$\tau_{tj}^2 := (1/\lambda_t(\gamma_0))\left(\frac{\partial z_t(\gamma_0)}{\partial\gamma_j}\right)'\Omega_t(\gamma_0)\left(\frac{\partial z_t(\gamma_0)}{\partial\gamma_j}\right). \tag{B.2}$$

Then the random variable $w_{tj}$, defined in equation (22), is distributed as $N(0, \tau_{tj}^2)$, and so the random variable $v_{tj} := w_{tj}^2/\tau_{tj}^2$ is distributed as $\chi^2(1)$. Thus we obtain

$$\chi_{tj}^2 \le 4\sigma_{tj}^2(1 + u_t^2 + u_t^4 + v_{tj}u_t^2),$$

where we recall from equation (26) that

$$\sigma_{tj}^2 := \operatorname{var}\chi_{tj} = \tfrac{1}{2}\lambda_{tj}^2 + \delta_{tj}^2 + \tau_{tj}^2,$$

using (B.2). Since $\{\chi_{tj},\ 1 \le t \le n\}$ is a sequence of uncorrelated random variables (Lemma A.3 and equation (24)), we have

$$\sum_{t=1}^n\sigma_{tj}^2 = \sum_{t=1}^n\operatorname{var}\chi_{tj} = \operatorname{var}\sum_{t=1}^n\chi_{tj} = \operatorname{var}l_{nj}(\gamma_0) = G_{njj}(\gamma_0),$$

using Propositions 1 and 2, so that

$$(1/n)\sum_{t=1}^n\sigma_{tj}^2 = (1/n)G_{njj}(\gamma_0) \to G_{jj}^0(\gamma_0), \tag{B.3}$$

as $n \to \infty$ (Assumption 4). Hence

$$\operatorname*{p\,lim}_{n\to\infty}\,(1/n)\max_{1\le t\le n}\chi_{tj}^2 \le 4\operatorname*{p\,lim}_{n\to\infty}\,(1/n)\max_{1\le t\le n}\sigma_{tj}^2(1 + u_t^2 + u_t^4 + v_{tj}u_t^2) = 0,$$

using Lemmas A.2 and A.5 in Appendix A. ■
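The closing step rests on $(1/n)\max_{1\le t\le n}$ of variables with bounded Cesàro-mean variances tending to zero in probability (Lemmas A.2 and A.5). A quick illustrative simulation, with a hypothetical bounded variance sequence of our own choosing:

```python
import numpy as np

# (1/n) max_t sigma_t^2 (1 + u_t^2 + u_t^4 + v_t u_t^2) -> 0 in probability,
# simulated with u_t ~ N(0,1) and v_t ~ chi2(1) as in the proof.
rng = np.random.default_rng(4)
for n in (10**2, 10**4, 10**6):
    u = rng.standard_normal(n)
    v = rng.chisquare(1, n)
    sigma2 = 1.0 + 0.5 * np.sin(np.arange(n))   # hypothetical bounded variances
    print(n, (sigma2 * (1 + u**2 + u**4 + v * u**2)).max() / n)
```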

Proof of Lemma 5. Recall from equation (25) that

$$E(\chi_{ti}\chi_{tj}\mid y_1, \ldots, y_{t-1}) = \tfrac{1}{2}\lambda_{ti}\lambda_{tj} + (\delta_{ti} - w_{ti})(\delta_{tj} - w_{tj}).$$

Let us define the strictly upper triangular $n \times n$ matrices

$$Q_{nj} := (\partial Z_n(\gamma_0)/\partial\gamma_j)\Delta_n^{-1/2}(\gamma_0) \qquad (j = 1, \ldots, p)$$

and the $n \times 1$ vectors

$$P_{(n)j} := \Delta_n^{-1/2}(\gamma_0)Z_n(\gamma_0)(\partial\mu_{(n)}(\gamma_0)/\partial\gamma_j) \qquad (j = 1, \ldots, p),$$

whose components are $\delta_{tj}$ $(1 \le t \le n)$. Then

$$\operatorname{var}\sum_{t=1}^nE(\chi_{ti}\chi_{tj}\mid y_1, \ldots, y_{t-1}) = \operatorname{var}\sum_{t=1}^n\bigl(w_{ti}w_{tj} - (\delta_{ti}w_{tj} + \delta_{tj}w_{ti})\bigr)$$
$$= \operatorname{var}\bigl(\varepsilon_{(n)}'Q_{ni}'Q_{nj}\varepsilon_{(n)} - (P_{(n)i}'Q_{nj} + P_{(n)j}'Q_{ni})\varepsilon_{(n)}\bigr)$$
$$= \operatorname{var}\bigl(\varepsilon_{(n)}'Q_{ni}'Q_{nj}\varepsilon_{(n)}\bigr) + \operatorname{var}\bigl(P_{(n)i}'Q_{nj}\varepsilon_{(n)} + P_{(n)j}'Q_{ni}\varepsilon_{(n)}\bigr)$$
$$\le \tfrac{1}{2}\operatorname{var}\bigl(\varepsilon_{(n)}'Q_{ni}'Q_{ni}\varepsilon_{(n)}\bigr) + \tfrac{1}{2}\operatorname{var}\bigl(\varepsilon_{(n)}'Q_{nj}'Q_{nj}\varepsilon_{(n)}\bigr) + 2\operatorname{var}\bigl(P_{(n)i}'Q_{nj}\varepsilon_{(n)}\bigr) + 2\operatorname{var}\bigl(P_{(n)j}'Q_{ni}\varepsilon_{(n)}\bigr)$$

(using Lemma A.7)

$$= \operatorname{tr}\bigl(Q_{ni}\Omega_n(\gamma_0)Q_{ni}'\bigr)^2 + \operatorname{tr}\bigl(Q_{nj}\Omega_n(\gamma_0)Q_{nj}'\bigr)^2 + 2P_{(n)i}'Q_{nj}\Omega_n(\gamma_0)Q_{nj}'P_{(n)i} + 2P_{(n)j}'Q_{ni}\Omega_n(\gamma_0)Q_{ni}'P_{(n)j}.$$

Now, since

$$P_{(n)i}'Q_{nj}\Omega_n(\gamma_0)Q_{nj}'P_{(n)i} \le \bigl(P_{(n)i}'P_{(n)i}\bigr)\max_{1\le t\le n}\lambda_t\bigl(Q_{nj}\Omega_n(\gamma_0)Q_{nj}'\bigr) \le G_{nii}(\gamma_0)\max_{1\le t\le n}\lambda_t\bigl(Q_{nj}\Omega_n(\gamma_0)Q_{nj}'\bigr)$$

(because $\delta_{ti}^2 \le \sigma_{ti}^2$), and since $\max_{1\le t\le n}\lambda_t(M) \le \bigl(\operatorname{tr}M^2\bigr)^{1/2}$ for symmetric positive semidefinite $M$, we obtain

$$(1/n^2)\operatorname{var}\sum_{t=1}^nE(\chi_{ti}\chi_{tj}\mid y_1, \ldots, y_{t-1}) \le (1/n^2)\operatorname{tr}\bigl(Q_{ni}\Omega_n(\gamma_0)Q_{ni}'\bigr)^2 + (1/n^2)\operatorname{tr}\bigl(Q_{nj}\Omega_n(\gamma_0)Q_{nj}'\bigr)^2$$
$$+ (2/n)G_{nii}(\gamma_0)\sqrt{(1/n^2)\operatorname{tr}\bigl(Q_{nj}\Omega_n(\gamma_0)Q_{nj}'\bigr)^2} + (2/n)G_{njj}(\gamma_0)\sqrt{(1/n^2)\operatorname{tr}\bigl(Q_{ni}\Omega_n(\gamma_0)Q_{ni}'\bigr)^2} \to 0,$$

as $n \to \infty$, by Assumptions 4 and 5. ■
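The step from the largest eigenvalue to the square root of the trace uses the elementary bound $\max_t\lambda_t(M) \le (\operatorname{tr}M^2)^{1/2}$ for symmetric positive semidefinite $M$, checked below on a random example:

```python
import numpy as np

# max eigenvalue of PSD M is at most sqrt(tr(M^2)).
rng = np.random.default_rng(5)
Q = rng.standard_normal((6, 6))
M = Q @ Q.T                               # stands in for Q_nj Omega_n Q_nj'
lam_max = np.linalg.eigvalsh(M).max()
print(lam_max, np.sqrt(np.trace(M @ M)))
assert lam_max <= np.sqrt(np.trace(M @ M)) + 1e-12
```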

Proof of Lemma 6. Let

$$v_{tij} := \chi_{ti}\chi_{tj} - E(\chi_{ti}\chi_{tj}\mid y_1, \ldots, y_{t-1}) \qquad (i, j = 1, \ldots, p),$$

and recall from equation (27) that its variance is bounded by

$$\operatorname{var}v_{tij} \le 14\sigma_{ti}^2\sigma_{tj}^2.$$

Then

$$(1/n^2)\sum_{t=1}^n\operatorname{var}v_{tij} \le (14/n^2)\sum_{t=1}^n\sigma_{ti}^2\sigma_{tj}^2 \le 14\left((1/n)\max_{1\le t\le n}\sigma_{ti}^2\right)\left((1/n)\sum_{t=1}^n\sigma_{tj}^2\right) \to 0,$$

as $n \to \infty$, because of (B.3) and Lemma A.2. ■

Proof of Proposition 4. Let

$$\chi_{tj} := l_{tj}(\gamma_0) - l_{t-1,j}(\gamma_0) \qquad (j = 1, \ldots, p),$$

where $l_{0j}(\gamma_0) = 0$ since $L_0(\gamma) = 1$ for all $\gamma$, and let $\chi_t := (\chi_{t1}, \ldots, \chi_{tp})'$. We wish to demonstrate that

$$n^{-1/2}\sum_{t=1}^n\chi_t \xrightarrow{d} N\bigl(0, G^0(\gamma_0)\bigr).$$

Now, $\{l_t(\gamma_0)\}$ is a vector martingale (Proposition 3), so that $\{n^{-1/2}\chi_t,\ 1 \le t \le n,\ n \in \mathbb{N}\}$ is a vector martingale difference array. (See Heijmans and Magnus [6, section 3] for details.) Hence, if we can show that

(i) $(1/n)E\max_{1\le t\le n}\chi_{tj}^2$ is bounded in $n$, $(j = 1, \ldots, p)$,

(ii) $\operatorname*{p\,lim}_{n\to\infty}(1/n)\max_{1\le t\le n}\chi_{tj}^2 = 0$, $(j = 1, \ldots, p)$,

(iii) $\operatorname*{p\,lim}_{n\to\infty}(1/n)\sum_{t=1}^n\chi_{ti}\chi_{tj} = G_{ij}^0(\gamma_0)$, $(i, j = 1, \ldots, p)$,

then the result will follow from Proposition 1 of [6]. To prove (i), we note that

$$(1/n)E\max_{1\le t\le n}\chi_{tj}^2 \le (1/n)E\sum_{t=1}^n\chi_{tj}^2 = (1/n)\sum_{t=1}^n\operatorname{var}\chi_{tj} = (1/n)G_{njj}(\gamma_0) \to G_{jj}^0(\gamma_0)$$

as $n \to \infty$, using (B.3). Hence (i) holds. Condition (ii) follows from Lemma 4. Since $E\chi_{ti}\chi_{sj} = 0$ $(t \ne s)$, we obtain

$$(1/n)E\sum_{t=1}^n\chi_{ti}\chi_{tj} = (1/n)E\left(\sum_{t=1}^n\chi_{ti}\right)\left(\sum_{t=1}^n\chi_{tj}\right) = (1/n)E\,l_{ni}(\gamma_0)l_{nj}(\gamma_0) = (1/n)G_{nij}(\gamma_0) \to G_{ij}^0(\gamma_0)$$

as $n \to \infty$. Also, in view of Lemmas 5 and 6,

$$(1/n^2)\operatorname{var}\sum_{t=1}^n\chi_{ti}\chi_{tj} \to 0,$$

as $n \to \infty$. Hence condition (iii) holds as well. ■
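The martingale-difference CLT of [6] invoked here can be illustrated with a toy array of our own making: $\chi_t = u_tu_{t-1}$ with $u_t$ i.i.d. $N(0,1)$ is a martingale difference sequence with unit variance, so $n^{-1/2}\sum_t\chi_t$ should be approximately standard normal.

```python
import numpy as np

# Toy illustration of a martingale-difference CLT:
# chi_t = u_t u_{t-1} is an MDS with var chi_t = 1.
rng = np.random.default_rng(6)
n, reps = 1000, 2000
u = rng.standard_normal((reps, n + 1))
s = (u[:, 1:] * u[:, :-1]).sum(axis=1) / np.sqrt(n)
print("mean:", s.mean(), " var:", s.var())
print("P(|S| > 1.96) =", np.mean(np.abs(s) > 1.96))   # approx 0.05
```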

Proof of Lemma 8. We recall, from Proposition 1, that

$$R_{nij}(\gamma_0) = G_{nij}(\gamma_0) - q_{nij}'(\gamma_0)\varepsilon_{(n)} + \bigl(\varepsilon_{(n)}'B_{nij}(\gamma_0)\varepsilon_{(n)} - E\,\varepsilon_{(n)}'B_{nij}(\gamma_0)\varepsilon_{(n)}\bigr),$$

where

$$q_{nij}(\gamma) := \frac{\partial\Omega_n^{-1}(\gamma)}{\partial\gamma_i}\,\frac{\partial\mu_{(n)}(\gamma)}{\partial\gamma_j} + \frac{\partial\Omega_n^{-1}(\gamma)}{\partial\gamma_j}\,\frac{\partial\mu_{(n)}(\gamma)}{\partial\gamma_i} + \Omega_n^{-1}(\gamma)\,\frac{\partial^2\mu_{(n)}(\gamma)}{\partial\gamma_i\partial\gamma_j}$$

and

$$B_{nij}(\gamma) := \tfrac{1}{2}\,\frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j}.$$

In view of Assumption 4, it is sufficient to show that

$$\operatorname*{p\,lim}_{n\to\infty}(1/n)q_{nij}'(\gamma_0)\varepsilon_{(n)} = 0$$

and

$$\operatorname*{p\,lim}_{n\to\infty}(1/n)\bigl(\varepsilon_{(n)}'B_{nij}(\gamma_0)\varepsilon_{(n)} - E\,\varepsilon_{(n)}'B_{nij}(\gamma_0)\varepsilon_{(n)}\bigr) = 0.$$

Sufficient (and in the first case also necessary) for these two conditions is that

$$(1/n^2)\operatorname{var}q_{nij}'(\gamma_0)\varepsilon_{(n)} \to 0$$

and

$$(1/n^2)\operatorname{var}\varepsilon_{(n)}'B_{nij}(\gamma_0)\varepsilon_{(n)} \to 0,$$

as $n \to \infty$. The former of these limits follows from Assumption 7; the latter follows from Assumption 6. ■

Proof of Lemma 9. From Proposition 1 we obtain

$$R_{nij}(\gamma) - R_{nij}(\gamma_0) = \beta_n(\gamma, \gamma_0) + b_{(n)}'(\gamma, \gamma_0)\varepsilon_{(n)} + \varepsilon_{(n)}'B_n(\gamma, \gamma_0)\varepsilon_{(n)},$$

where $\varepsilon_{(n)} = y_{(n)} - \mu_{(n)}(\gamma_0)$ is distributed $N(0, \Omega_n(\gamma_0))$ and, writing $d_{(n)}(\gamma) := \mu_{(n)}(\gamma_0) - \mu_{(n)}(\gamma)$ and

$$a_{nij}(\gamma) := -\tfrac{1}{2}\operatorname{tr}\left(\frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j}\,\Omega_n(\gamma)\right) - \tfrac{1}{2}\operatorname{tr}\left(\frac{\partial\Omega_n^{-1}(\gamma)}{\partial\gamma_i}\,\frac{\partial\Omega_n(\gamma)}{\partial\gamma_j}\right) + \left(\frac{\partial\mu_{(n)}(\gamma)}{\partial\gamma_i}\right)'\Omega_n^{-1}(\gamma)\left(\frac{\partial\mu_{(n)}(\gamma)}{\partial\gamma_j}\right),$$

the three components are

$$\beta_n(\gamma, \gamma_0) := a_{nij}(\gamma) - a_{nij}(\gamma_0) - q_{nij}'(\gamma)d_{(n)}(\gamma) + \tfrac{1}{2}d_{(n)}'(\gamma)\,\frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j}\,d_{(n)}(\gamma),$$

$$b_{(n)}(\gamma, \gamma_0) := q_{nij}(\gamma_0) - q_{nij}(\gamma) + \frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j}\,d_{(n)}(\gamma),$$

and

$$B_n(\gamma, \gamma_0) := \tfrac{1}{2}\left(\frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j} - \frac{\partial^2\Omega_n^{-1}(\gamma_0)}{\partial\gamma_i\partial\gamma_j}\right),$$

with $q_{nij}$ as defined in the proof of Lemma 8.

To complete the proof we need only show that conditions (i)-(v) of Lemma A.6 are satisfied. But this is easy in view of our assumptions: Taking $A_n = I_n$, (i) follows from Assumption 9(i)-(vi), (ii) follows from Assumption 9(vii)-(ix), (iii) from Assumption 9(x), and (iv) and (v) follow from Assumption 8. ■

Proof of Theorem 3. Assumption 2 follows from Assumption 2', Assumption 6 follows from Assumption 6'(iii), Assumptions 7(i) and (ii) follow from Assumptions 7'(ii) and (iii), and Assumptions 8(i) and (ii) follow from Assumption 6'(i), using the fact that $((1/n)\operatorname{tr}W)^2 \le (1/n)\operatorname{tr}W^2$ for any symmetric $n \times n$ matrix $W$. Hence, if we can show that Assumption 9 holds, then Theorem 3 will follow from Theorem 2.
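The trace inequality used here is Cauchy-Schwarz applied to the eigenvalues of $W$; a random numerical check:

```python
import numpy as np

# ((1/n) tr W)^2 <= (1/n) tr W^2 for symmetric W.
rng = np.random.default_rng(7)
n = 8
W = rng.standard_normal((n, n))
W = 0.5 * (W + W.T)
lhs = (np.trace(W) / n) ** 2
rhs = np.trace(W @ W) / n
print(lhs, rhs)
assert lhs <= rhs + 1e-12
```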

To prove conditions (i)-(x) of Assumption 9, let $\phi \in \Gamma$ and $\alpha > 0$ be arbitrary. Let $N_0(\phi)$ be a bounded neighborhood of $\phi$ such that Assumptions 6'-8' hold for all $n \in \mathbb{N}$, $\gamma \in N_0(\phi)$, and $\theta \in N_0(\phi)$. Let $N_1(\phi)$ be a "smaller" neighborhood of $\phi$ such that its closure $\bar N_1(\phi)$ is a subset of $N_0(\phi)$. Then

$$M_0 := \sup\{M(\gamma): \gamma \in \bar N_1(\phi)\}$$

and

$$M_1 := \sup\{M_1(\gamma, \theta): (\gamma, \theta) \in \bar N_1(\phi) \times \bar N_1(\phi)\}$$

are finite nonnegative numbers, because $M(\cdot)$ and $M_1(\cdot,\cdot)$ are continuous functions on a compact set. Now let $\delta$ be an appropriately chosen (small) positive number depending on $\alpha$, $M_0$, and $M_1$. In view of Assumption 8'(iii) we may then choose a neighborhood $N_2(\phi) \subset N_1(\phi)$ such that

$$\sup_{\gamma\in N_2(\phi)}\max_{1\le t\le n}\left|\lambda_t\left(\frac{\partial^2\Omega_n^{-1}(\gamma)}{\partial\gamma_i\partial\gamma_j} - \frac{\partial^2\Omega_n^{-1}(\phi)}{\partial\gamma_i\partial\gamma_j}\right)\right| \le \delta \qquad (i, j = 1, \ldots, p)$$

and

$$\sup_{\gamma\in N_2(\phi)}(\gamma - \phi)'(\gamma - \phi) \le \delta.$$

We now claim that Assumption 9 holds for every $n \in \mathbb{N}$ and $\gamma \in N_2(\phi)$. In particular, 7'(ii) + 8'(i) ⇒ 9(i); 7'(iii) + 8'(i) ⇒ 9(ii); 6'(ii) + (iii) ⇒ 9(iii); 6'(i) + (ii) + (iii) + 8'(iii) ⇒ 9(iv); 7'(ii) + (iii) + 8'(i) ⇒ 9(v); 7'(i) + 8'(ii) ⇒ 9(vi); 8'(ii) ⇒ 9(vii); 7'(iv) + 8'(ii) ⇒ 9(viii); 2' + 7'(iv) + (v) ⇒ 9(ix); and 8'(iii) ⇒ 9(x).

A detailed proof of these ten implications is tedious but straightforward, and is available from the authors upon request. Repeated use is made of the Cauchy-Schwarz inequality, the mean-value theorem, and the mean-value theorem for vector functions (Apostol [2, p. 355]). ■

Proof of Theorem 5. We shall verify the conditions of Theorem 4, i.e., Assumptions 1, 2', 3-5, and 6''-8''.

Assumption 1 follows from C1. Assumption 2' follows from C2 and the fact that $\sigma^2\rho^{t-1}/(1 - \rho^2)$ is a twice continuously differentiable function of $\sigma^2$ and $\rho$ for every $t \in \mathbb{N}$. To prove Assumption 3 we note that $|V_n(\rho)| = 1/(1 - \rho^2)$, so that $\Omega_n(\sigma^2, \rho) = \sigma^2V_n(\rho)$ is positive definite for every $n \in \mathbb{N}$, $\sigma^2 > 0$, $|\rho| < 1$.


To demonstrate Assumption 4 we define the $n \times n$ matrices

$$E_n := \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} \qquad\text{and}\qquad A_n := \operatorname{diag}(0, 1, 1, \ldots, 1, 0).$$

The inverse of $\Omega_n$ can now be expressed as

$$\Omega_n^{-1}(\sigma^2, \rho) = (1/\sigma^2)\bigl(I_n + \rho^2A_n - \rho(E_n + E_n')\bigr),$$

from which we obtain

$$\partial\Omega_n^{-1}(\sigma^2, \rho)/\partial\sigma^2 = -(1/\sigma^2)\Omega_n^{-1}(\sigma^2, \rho)$$

and

$$\partial\Omega_n^{-1}(\sigma^2, \rho)/\partial\rho = (1/\sigma^2)\bigl(2\rho A_n - (E_n + E_n')\bigr).$$

Hence,

$$\tfrac{1}{2}\operatorname{tr}\bigl((\partial\Omega_n^{-1}/\partial\sigma^2)\Omega_n\bigr)^2 = n/(2\sigma^4),$$

$$\tfrac{1}{2}\operatorname{tr}\bigl((\partial\Omega_n^{-1}/\partial\rho)\Omega_n\bigr)^2 = \frac{n-1}{1-\rho^2} + \frac{2\rho^2}{(1-\rho^2)^2},$$

and

$$\tfrac{1}{2}\operatorname{tr}\bigl((\partial\Omega_n^{-1}/\partial\sigma^2)\Omega_n(\partial\Omega_n^{-1}/\partial\rho)\Omega_n\bigr) = \rho/(\sigma^2(1-\rho^2)).$$
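These closed-form traces (the non-$\beta$ entries of $G_n$ below) can be verified numerically from the AR(1) covariance matrix; the parameter values in the sketch are arbitrary.

```python
import numpy as np

# Stationary AR(1): Omega_n has (i,j) element sigma^2 rho^{|i-j|}/(1-rho^2).
n, sigma2, rho = 7, 1.3, 0.6
i = np.arange(n)
V = rho ** np.abs(i[:, None] - i[None, :]) / (1 - rho**2)
Omega = sigma2 * V

E = np.diag(np.ones(n - 1), k=1)                  # ones on the superdiagonal
A = np.diag(np.r_[0, np.ones(n - 2), 0])
Oinv = (np.eye(n) + rho**2 * A - rho * (E + E.T)) / sigma2
print(np.allclose(Oinv, np.linalg.inv(Omega)))    # True

dOinv_ds2 = -Oinv / sigma2
dOinv_dr = (2 * rho * A - (E + E.T)) / sigma2
tr2 = lambda M: 0.5 * np.trace(M @ M)
print(tr2(dOinv_ds2 @ Omega), n / (2 * sigma2**2))
print(tr2(dOinv_dr @ Omega),
      (n - 1) / (1 - rho**2) + 2 * rho**2 / (1 - rho**2)**2)
print(0.5 * np.trace(dOinv_ds2 @ Omega @ dOinv_dr @ Omega),
      rho / (sigma2 * (1 - rho**2)))
```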

Now, with $H_n(\beta, \rho) := (\partial\mu_{(n)}(\beta)/\partial\beta')'V_n^{-1}(\rho)(\partial\mu_{(n)}(\beta)/\partial\beta')$, the symmetric $(k + 2) \times (k + 2)$ matrix $G_n(\beta, \rho, \sigma^2)$ takes the form

$$G_n(\beta, \rho, \sigma^2) = \begin{pmatrix} (1/\sigma^2)H_n(\beta, \rho) & 0 & 0 \\ 0' & \dfrac{n-1}{1-\rho^2} + \dfrac{2\rho^2}{(1-\rho^2)^2} & \dfrac{\rho}{\sigma^2(1-\rho^2)} \\ 0' & \dfrac{\rho}{\sigma^2(1-\rho^2)} & n/(2\sigma^4) \end{pmatrix}$$

and, using C3, the matrix $(1/n)G_n(\beta, \rho, \sigma^2)$ converges, as $n \to \infty$, to

$$G^0(\beta, \rho, \sigma^2) = \begin{pmatrix} (1/\sigma^2)H^0(\beta, \rho) & 0 & 0 \\ 0' & 1/(1-\rho^2) & 0 \\ 0' & 0 & 1/(2\sigma^4) \end{pmatrix}.$$

Furthermore, the matrix $G^0(\beta, \rho, \sigma^2)$ is positive definite for every $\beta \in B$, $|\rho| < 1$, $\sigma^2 > 0$, because $H^0(\beta, \rho)$ is positive definite.

To demonstrate Assumption 5 we define the $n \times n$ matrices

$$Z_n(\rho) := I_n - \rho E_n' \qquad\text{and}\qquad \Lambda_n(\sigma^2, \rho) := \sigma^2\bigl(I_n + (\rho^2/(1 - \rho^2))e_{(n)}e_{(n)}'\bigr),$$

where $e_{(n)}$ is the first column of $I_n$. Then $\Omega_n^{-1} = Z_n'\Lambda_n^{-1}Z_n$ is the Cholesky decomposition of $\Omega_n^{-1}$, and

$$(1/n^2)\operatorname{tr}\bigl((\partial Z_n/\partial\rho)\Lambda_n^{-1}(\partial Z_n/\partial\rho)'\Omega_n\bigr)^2 = (1/n^2)\operatorname{tr}\bigl(E_n'(I_n - \rho^2e_{(n)}e_{(n)}')E_nV_n\bigr)^2$$
$$\le (1/n^2)\operatorname{tr}(E_n'E_nV_n)^2 = (1/n^2)\operatorname{tr}(V_{n-1}^2)$$
$$= \frac{(n-1)(1-\rho^4) - 2\rho^2(1 - \rho^{2(n-1)})}{n^2(1-\rho^2)^4} \to 0,$$

as $n \to \infty$.
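Both the factorization $\Omega_n^{-1} = Z_n'\Lambda_n^{-1}Z_n$ and the $O(1/n)$ behavior of the trace term can be confirmed numerically (small illustrative parameter values):

```python
import numpy as np

# Cholesky factorization of the AR(1) precision matrix and the
# trace term of Assumption 5, with its closed-form upper bound.
n, sigma2, rho = 6, 0.8, -0.4
i = np.arange(n)
V = rho ** np.abs(i[:, None] - i[None, :]) / (1 - rho**2)
E = np.diag(np.ones(n - 1), k=1)
e1 = np.eye(n)[:, 0]

Z = np.eye(n) - rho * E.T
Lam = sigma2 * (np.eye(n) + (rho**2 / (1 - rho**2)) * np.outer(e1, e1))
Oinv = Z.T @ np.linalg.inv(Lam) @ Z
print(np.allclose(Oinv, np.linalg.inv(sigma2 * V)))   # True

M = E.T @ (np.eye(n) - rho**2 * np.outer(e1, e1)) @ E @ V
val = np.trace(M @ M) / n**2
m = n - 1
bound = (m * (1 - rho**4) - 2 * rho**2 * (1 - rho**(2 * m))) \
        / (n**2 * (1 - rho**2)**4)
print(val, bound)     # val <= bound, both O(1/n)
```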

Next, let us verify Assumptions 6''-8''. We shall need the following result. Let $A = (a_{ij})$ be an arbitrary (possibly complex) $n \times n$ matrix and let

$$R := \max_{1\le i\le n}\sum_{j=1}^n|a_{ij}| \qquad\text{and}\qquad T := \max_{1\le j\le n}\sum_{i=1}^n|a_{ij}|.$$

Then

$$\max_{1\le t\le n}|\lambda_t(A)| \le \min(R, T).$$

Proof. See [14, p. 145]. ■
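This is the familiar bound of the spectral radius by the maximal absolute row and column sums (the induced $\infty$- and 1-norms); a random check:

```python
import numpy as np

# Every eigenvalue of A is at most min(R, T) in modulus.
rng = np.random.default_rng(8)
A = rng.standard_normal((6, 6))
R = np.abs(A).sum(axis=1).max()     # max absolute row sum
T = np.abs(A).sum(axis=0).max()     # max absolute column sum
lam = np.abs(np.linalg.eigvals(A)).max()
print(lam, min(R, T))
assert lam <= min(R, T) + 1e-12
```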

Using this result, and noting that

$$\partial^2\Omega_n^{-1}(\sigma^2, \rho)/\partial\sigma^2\partial\sigma^2 = (2/\sigma^6)V_n^{-1}(\rho), \tag{B.4}$$

$$\partial^2\Omega_n^{-1}(\sigma^2, \rho)/\partial\rho\partial\rho = (2/\sigma^2)A_n, \tag{B.5}$$

$$\partial^2\Omega_n^{-1}(\sigma^2, \rho)/\partial\sigma^2\partial\rho = -(1/\sigma^4)\bigl(2\rho A_n - (E_n + E_n')\bigr), \tag{B.6}$$

we obtain

$$\max_{1\le t\le n}\lambda_t\bigl(\Omega_n(\sigma^2, \rho)\bigr) \le 2\sigma^2/(1 - |\rho|)^2$$

and

$$\max_{1\le t\le n}\bigl|\lambda_t\bigl(\Omega^{(h)}(\sigma^2, \rho)\bigr)\bigr| \le 4(\sigma^4 + 2)/\sigma^6 \qquad (h = 1, \ldots, 6),$$

where

$$\Omega^{(1)} := \Omega_n^{-1},\qquad \Omega^{(2)} := \partial\Omega_n^{-1}/\partial\sigma^2,\qquad \Omega^{(3)} := \partial\Omega_n^{-1}/\partial\rho,$$
$$\Omega^{(4)} := \partial^2\Omega_n^{-1}/\partial\sigma^2\partial\sigma^2,\qquad \Omega^{(5)} := \partial^2\Omega_n^{-1}/\partial\rho\partial\rho,\qquad \Omega^{(6)} := \partial^2\Omega_n^{-1}/\partial\sigma^2\partial\rho.$$

This shows that Assumption 6'' holds. Assumption 7'' follows from C4. Finally, to prove that Assumption 8'' holds, we note from (B.4)-(B.6) that

$$\Omega^{(4)}(\sigma^2, \rho) - \Omega^{(4)}(\sigma_0^2, \rho_0) = 2(\sigma^2\sigma_0^2)^{-3}\bigl((\sigma_0^6 - \sigma^6)I_n + (\rho^2\sigma_0^6 - \rho_0^2\sigma^6)A_n + (\rho_0\sigma^6 - \rho\sigma_0^6)(E_n + E_n')\bigr),$$

$$\Omega^{(5)}(\sigma^2, \rho) - \Omega^{(5)}(\sigma_0^2, \rho_0) = 2(\sigma^2\sigma_0^2)^{-1}(\sigma_0^2 - \sigma^2)A_n,$$

and

$$\Omega^{(6)}(\sigma^2, \rho) - \Omega^{(6)}(\sigma_0^2, \rho_0) = (\sigma^2\sigma_0^2)^{-2}\bigl((2\rho_0\sigma^4 - 2\rho\sigma_0^4)A_n + (\sigma_0^4 - \sigma^4)(E_n + E_n')\bigr).$$

Hence we obtain (after some elementary arithmetic), for $h = 4, 5, 6$,

$$\max_{1\le t\le n}\bigl|\lambda_t\bigl(\Omega^{(h)}(\sigma^2, \rho) - \Omega^{(h)}(\sigma_0^2, \rho_0)\bigr)\bigr| \le \bigl(2(1 + \sigma^2)(1 + \sigma_0^2)/(\sigma^2\sigma_0^2)\bigr)^3\bigl(|\sigma^2 - \sigma_0^2| + |\rho - \rho_0|\bigr).$$

This concludes the proof of Theorem 5. ■
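As a final sanity check, the Lipschitz-type eigenvalue bound above can be verified numerically for the three second-derivative matrices (B.4)-(B.6); the parameter values below are arbitrary.

```python
import numpy as np

# Spot-check of the Lipschitz-type eigenvalue bound closing the proof.
def second_derivs(n, s2, r):
    E = np.diag(np.ones(n - 1), k=1)
    A = np.diag(np.r_[0, np.ones(n - 2), 0])
    Vinv = np.eye(n) + r**2 * A - r * (E + E.T)
    return ((2 / s2**3) * Vinv,                      # (B.4)
            (2 / s2) * A,                            # (B.5)
            -(1 / s2**2) * (2 * r * A - (E + E.T)))  # (B.6)

n = 8
s2, r, s2_0, r0 = 1.4, 0.5, 1.1, 0.3
D = [x - y for x, y in zip(second_derivs(n, s2, r),
                           second_derivs(n, s2_0, r0))]
lhs = max(np.abs(np.linalg.eigvalsh(M)).max() for M in D)
rhs = (2 * (1 + s2) * (1 + s2_0) / (s2 * s2_0))**3 * (abs(s2 - s2_0) + abs(r - r0))
print(lhs, rhs)     # lhs should not exceed rhs
```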

