
Asymptotic Distributions of Quasi-Maximum Likelihood Estimators

for Spatial Econometric Models II:

Mixed Regressive, Spatial Autoregressive Models

by Lung-fei Lee*

Department of Economics

The Ohio State University

Columbus, Ohio

August 1999; April 2001

Abstract

Asymptotic properties of MLEs and QMLEs of mixed regressive, spatial autoregressive models are investigated. The stochastic rates of convergence of the MLE and QMLE for such models may be less than the √n-rate under some circumstances, even though their limiting distributions are asymptotically normal. When spatially varying regressors are relevant, the MLE and QMLE of the mixed regressive, autoregressive model may not have this problem and they can converge at the √n-rate. If the spatially varying regressors are irrelevant, estimates of various components of unknown parameters may possess different rates of convergence.

Key Words: Spatial autoregression, mixed regressive spatial autoregressive model, maximum likelihood estimator, quasi-maximum likelihood estimator, rate of convergence, asymptotic distribution, infill asymptotics, increasing-domain asymptotics.

Correspondence Address: Lung-fei Lee, Department of Economics, The Ohio State University, 410 Arps Hall, 1945 N. High St., Columbus, OH 43210-1172. E-mail: [email protected]

* This paper is a revised and expanded version of the second part of a paper previously circulated under the title Asymptotic Distributions of Maximum Likelihood Estimators for Spatial Autoregressive Models. An earlier version of the paper was presented in seminars at HKUST, NWU, OSU, Princeton U., PSU, U. of Florida, U. of Illinois, and USC. I appreciate comments from participants of those seminars.


1. Introduction

Spatial econometrics has recently received much more attention in both the theoretical and empirical econometric literature. Spatial econometric models provide structures intended to capture spatial interactions of economic units or agents located in an economic or geographical space.

The spatial autoregressive models popularized by Whittle (1954), Mead (1967) and Ord (1975) have received the most attention. Earlier development in testing and estimation of the spatial autoregressive models has been summarized in Paelinck and Klaassen (1979), Doreian (1980), Anselin (1988, 1992), Haining (1990), Kelejian and Robinson (1993), Cressie (1993), Anselin and Florax (1995), Anselin and Rey (1997), and Anselin and Bera (1998). Empirical applications include, for example, Case (1991, 1992), Case et al. (1993), and Kelejian and Robinson (1993). The autoregressive models can be estimated by the method of maximum likelihood (ML) as well as methods of moments (Kelejian and Prucha 1999a). Computational methods for the ML estimator (MLE) have been proposed in Ord (1975) and most recently by Smirnov and Anselin (1999). Lee (2001) has studied some asymptotic properties of the MLE and the quasi-maximum likelihood estimator (QMLE) based on the normal likelihood specification for the pure (first-order) spatial autoregressive process. His analysis relates the asymptotic distribution of the estimates to the expansion of the spatial weights matrix as the sample size increases.¹

This paper investigates the mixed regressive, spatial (first-order) autoregressive model (mixed regressive model, for short). The mixed regressive model differs from the pure spatial autoregressive process in the presence of regressors in the equation. The presence of spatially varying regressors plays a distinctive role in the model. As compared with time series counterparts, the pure spatial autoregressive process corresponds to an autoregressive process, and the mixed regressive model corresponds to a dynamic regression model with a lagged dependent variable. However, the spatial models have the distinguishing feature of simultaneity in econometric equilibrium models. In the presence of spatially varying regressors, in addition to the ML method, econometricians have considered the estimation of the mixed regressive model by the method of instrumental variables (IV) (Anselin 1988; Kelejian and Prucha 1998; and Lee 1999a). However, their IV estimation method will break down when all the spatial regressors are really irrelevant, and in their IV framework, one cannot test the joint significance of the regressors (Kelejian and Prucha 1998). These are so because the dependent variable cannot be explained by the existing regressors when they are irrelevant.

¹ Manski (1993) has criticized the literature on spatial correlation models for not specifying how the spatial weights matrix should change as the sample size increases.


The ML method, however, is still applicable. This paper will follow up on the analysis of Lee (2001) to

consider asymptotic properties of the QMLE for the mixed regressive model.

In Section 2, we will show that when some of the spatially varying regressors are relevant, identification of parameters can be assured if there is no multicollinearity among the regressors and a spatially generated regressor. The QMLE can be consistent and asymptotically normal under some regularity conditions on the spatial weights matrix. In the event that multicollinearity occurs because the spatially generated regressor is collinear with the original regressors, the spatial interactive parameter needs to be identified through spatial correlation of outcomes. For the latter, in the scenario that each unit will be influenced by only a few neighboring units, the QMLE can have the √n-rate of convergence and be asymptotically normal. Section 3 considers the multicollinearity case with special emphasis on the spatial scenario that each unit can be influenced by many neighbors. In this case, irregularity of the information matrix occurs and various components of the QMLEs may have different rates of convergence. We discuss results for the mixed regressive model where all the included spatially varying regressors are irrelevant. In Section 4, an example of the inconsistency of the QMLE for the mixed regressive model is presented, and this phenomenon is related to the notion of infill asymptotics (Cressie 1993). Section 5 provides the conclusions. Some useful lemmas and all the proofs are collected in the Appendix. Additional lemmas related to the proofs are referred to Lee (2001).

2. Mixed Regressive, Spatial Autoregressive Models

2.1 The Model

A mixed regressive model differs from a spatial autoregressive process in that regressors are present as explanatory variables. The (first-order) mixed regressive, spatial autoregressive model is specified as

Yn = Xnβ + λWn,nYn + Vn, (2.1)

where Xn is an n × k dimensional matrix of regressors, Wn,n is a specified spatial weights matrix, and Vn is an n-dimensional vector consisting of i.i.d. disturbances with zero mean and finite variance σ². This spatial model is an equilibrium model in that the observed outcomes Yn for the spatial units are determined by Xn and Vn. Denote Sn(λ) = In − λWn,n for any value of λ. To distinguish the true parameter vector from other possible values in a parameter space, θ0 = (β0′, λ0, σ0²)′ denotes the true parameter vector of the model.

Assumption 1. At the true parameter λ0 of λ, Sn, where Sn = Sn(λ0), is nonsingular.


Under this assumption, the equilibrium outcome vector is

$$Y_n = S_n^{-1}(X_n\beta_0 + V_n). \tag{2.2}$$
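As a concrete numerical illustration of the data-generating process (2.1)-(2.2), a small simulation can be sketched. This is a hypothetical example, not part of the paper's analysis: the circular row-normalized weights matrix, the parameter values, and the random seed are all assumptions chosen only for the demonstration.

```python
import numpy as np

# Hypothetical simulation of the model (2.1): Y_n = X_n b0 + lam0 W Y_n + V_n,
# solved via the equilibrium representation (2.2): Y_n = S_n^{-1}(X_n b0 + V_n).
rng = np.random.default_rng(0)
n, lam0 = 200, 0.4
beta0 = np.array([1.0, -2.0])

# Assumed weights: each unit's two circle neighbors, row-normalized,
# so h_n is bounded (in line with Assumptions 3-5).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
V = rng.standard_normal(n)
S = np.eye(n) - lam0 * W                  # S_n, nonsingular here (Assumption 1)
Y = np.linalg.solve(S, X @ beta0 + V)     # equilibrium outcome vector (2.2)
```

Substituting the solved Y back into (2.1) reproduces the structural equation exactly, which is the defining property of the equilibrium representation.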

In this paper, our main concern is to investigate asymptotic properties of the MLE under the normal distribution of Vn. In the event that Vn is not really normally distributed, the estimator corresponds to a QMLE. We will consider the asymptotic properties of the QMLE, of which the MLE is a special case. Let Vn(δ) = Yn − Xnβ − λWn,nYn, where δ = (β′, λ)′. Thus, Vn = Vn(δ0) at the true parameter vector. The log likelihood function of the model (2.1) is

$$\ln L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}V_n'(\delta)V_n(\delta), \tag{2.3}$$

where θ = (β′, λ, σ²)′. The QMLE θ̂n is the extremum estimator derived from the maximization of (2.3).

To provide a rigorous analysis of the QMLE, some basic regularity conditions are assumed for the model

below. Additional regularity conditions will be subsequently listed as needed.

Assumption 2. The disturbances vi, i = 1, . . . , n, are i.i.d. with mean zero and variance σ². The moment E(|v|^{4+2δ}) for some δ > 0 exists.

Assumption 3. The elements wn,ij of the weights matrix Wn,n are of order O(1/hn) uniformly in all i, j, i.e., there exists a constant c such that |wn,ij| ≤ c/hn for all i, j and n, where the sequence of rates {hn} can be bounded or divergent.

Assumption 4. The ratio hn/n → 0 as n goes to infinity.

Assumption 5. The matrices Wn,n and Sn⁻¹ are uniformly bounded in both row and column sums.

Assumption 6. The regressors xi are vectors of constants and are uniformly bounded. The limit limn→∞ (1/n)Xn′Xn exists and is nonsingular.

Assumptions 2-5 provide the essential features of the disturbances and the weights matrix for the model. Assumption 5 originates from Kelejian and Prucha (1998, 1999a, 1999b).² The uniform boundedness of the weights matrix Wn,n and its associated Sn⁻¹ are conditions that limit the spatial correlation to a manageable degree. Assumptions 3 and 4 link directly the expression of the spatial weights matrix Wn,n to the sample size n. These conditions are also assumed in Lee (2001) for the analysis of the pure spatial autoregressive process. Examples of spatial models in empirical applications which satisfy

² Related conditions have also been adopted in Pinkse (1999) in a different context. Useful properties of boundedness in row and column sums can be found in Horn and Johnson (1985).


the above assumptions include conventional spatial weights matrices where neighboring units are defined by only a few adjacent ones, and the models of Case (1991, 1992) where all spatial units in a district are neighbors of each spatial unit in the same district. Discussions of the above assumptions, in particular Assumptions 3-5, and examples can be found in Lee (2001). The mixed regressive model differs from the pure spatial autoregressive model in the presence of regressors in the model. As the mixed regressive model is used for analyzing cross-sectional units located in a space, it is meaningful to assume that the values of the regressors are bounded, as in Assumption 6. Multicollinearity among the regressors of Xn is ruled out as usual in that assumption.

2.2 Identification and the Consistency of the QMLE

For the possible consistency of the QMLE θ̂n, it shall be shown that (1/n)[ln Ln(θ) − E(ln Ln(θ))] converges to zero in probability uniformly on a compact parameter space Θ of θ and that θ0 is identifiable. The uniform convergence can be easily established as in the following proposition.

Proposition 2.1 Under the assumed regularity conditions, (1/n)[ln Ln(θ) − E(ln Ln(θ))] →p 0 uniformly in θ in any compact parameter space Θ.

For identification, it is known that nonsingularity of the information matrix is sufficient for the local identification of parameters in a model via the likelihood function (Rothenberg 1971). The first-order derivatives of the log likelihood (2.3) are

$$\frac{\partial \ln L_n(\theta)}{\partial\beta} = \frac{1}{\sigma^2}X_n'V_n(\delta), \qquad
\frac{\partial \ln L_n(\theta)}{\partial\lambda} = -\mathrm{tr}\big(W_{n,n}S_n^{-1}(\lambda)\big) + \frac{1}{\sigma^2}Y_n'W_{n,n}'V_n(\delta), \tag{2.4}$$

and ∂ln Ln(θ)/∂σ² = −n/(2σ²) + (1/(2σ⁴))Vn′(δ)Vn(δ). The second-order derivatives of (2.3) are

$$\begin{aligned}
\frac{\partial^2 \ln L_n(\theta)}{\partial\beta\,\partial\beta'} &= -\frac{1}{\sigma^2}X_n'X_n, \qquad
\frac{\partial^2 \ln L_n(\theta)}{\partial\beta\,\partial\lambda} = -\frac{1}{\sigma^2}X_n'W_{n,n}Y_n, \qquad
\frac{\partial^2 \ln L_n(\theta)}{\partial\beta\,\partial\sigma^2} = -\frac{1}{\sigma^4}X_n'V_n(\delta), \\
\frac{\partial^2 \ln L_n(\theta)}{\partial\lambda^2} &= -\mathrm{tr}\big([W_{n,n}S_n^{-1}(\lambda)]^2\big) - \frac{1}{\sigma^2}Y_n'W_{n,n}'W_{n,n}Y_n, \qquad
\frac{\partial^2 \ln L_n(\theta)}{\partial\sigma^2\,\partial\lambda} = -\frac{1}{\sigma^4}Y_n'W_{n,n}'V_n(\delta), 
\end{aligned} \tag{2.5}$$

and ∂²ln Ln(θ)/∂σ²∂σ² = n/(2σ⁴) − (1/σ⁶)Vn′(δ)Vn(δ). Denote Gn = Wn,nSn⁻¹. At the true parameter vector θ0,

$$-\frac{1}{n}\frac{\partial^2 \ln L_n(\theta_0)}{\partial\theta\,\partial\theta'} =
\begin{pmatrix}
\frac{1}{\sigma_0^2 n}X_n'X_n & \frac{1}{\sigma_0^2 n}X_n'W_{n,n}Y_n & \frac{1}{\sigma_0^4 n}X_n'V_n \\
\frac{1}{\sigma_0^2 n}Y_n'W_{n,n}'X_n & \frac{1}{n}\mathrm{tr}(G_n^2)+\frac{1}{\sigma_0^2 n}Y_n'W_{n,n}'W_{n,n}Y_n & \frac{1}{\sigma_0^4 n}Y_n'W_{n,n}'V_n \\
\frac{1}{\sigma_0^4 n}V_n'X_n & \frac{1}{\sigma_0^4 n}V_n'W_{n,n}Y_n & -\frac{1}{2\sigma_0^4}+\frac{1}{\sigma_0^6 n}V_n'V_n
\end{pmatrix}, \tag{2.6}$$

and the average Hessian matrix (the information matrix when the v's are normal) is

$$-E\Big(\frac{1}{n}\frac{\partial^2 \ln L_n(\theta_0)}{\partial\theta\,\partial\theta'}\Big) =
\begin{pmatrix}
\frac{1}{\sigma_0^2 n}X_n'X_n & \frac{1}{\sigma_0^2 n}X_n'G_nX_n\beta_0 & 0 \\
\frac{1}{\sigma_0^2 n}(X_n\beta_0)'G_n'X_n & \frac{1}{\sigma_0^2 n}(X_n\beta_0)'G_n'G_n(X_n\beta_0)+\frac{1}{n}\big[\mathrm{tr}(G_n'G_n)+\mathrm{tr}(G_n^2)\big] & \frac{1}{\sigma_0^2 n}\mathrm{tr}(G_n) \\
0 & \frac{1}{\sigma_0^2 n}\mathrm{tr}(G_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}. \tag{2.7}$$
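The second-order derivative formulas behind (2.6) and (2.7) can be spot-checked numerically. The sketch below (hypothetical simulated data; the weights matrix, parameter values, and seed are assumptions for the example) compares the analytic (λ, λ) entry of (2.5) with a finite-difference second derivative of the log likelihood (2.3), holding β and σ² fixed.

```python
import numpy as np

# Hypothetical spot-check of the (lambda, lambda) second derivative in (2.5):
#   d^2 lnL_n / d lambda^2 = -tr([W S_n^{-1}(lam)]^2) - (1/s2) Y'W'WY,
# with beta and s2 held fixed, against a finite-difference second derivative.
rng = np.random.default_rng(2)
n = 60
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # assumed circular neighbors
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta, s2 = np.array([1.0, 0.5]), 1.0
Y = np.linalg.solve(np.eye(n) - 0.3 * W, X @ beta + rng.standard_normal(n))

def loglik(lam):              # log likelihood (2.3) as a function of lambda only
    e = (Y - lam * (W @ Y)) - X @ beta
    _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)
    return -n / 2 * np.log(2 * np.pi * s2) + logdet - e @ e / (2 * s2)

lam, h = 0.2, 1e-4
G_lam = W @ np.linalg.inv(np.eye(n) - lam * W)     # W_{n,n} S_n^{-1}(lam)
analytic = -np.trace(G_lam @ G_lam) - (W @ Y) @ (W @ Y) / s2
numeric = (loglik(lam + h) - 2 * loglik(lam) + loglik(lam - h)) / h**2
```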

5

Page 6: Asymptotic Distributions of Quasi-Maximum Likelihood ......Under this assumption, the equilibrium outcome vector is Y n= S−1(Xnβ0 +Vn). (2.2) In this paper, our main concern is

Let Mn = In − Xn(Xn′Xn)⁻¹Xn′ be the projector onto the space orthogonal to the column space of Xn. Denote Cn = Gn − (tr(Gn)/n)In.

Proposition 2.2 If limn→∞ (1/n)(GnXnβ0)′Mn(GnXnβ0) ≠ 0 or limn→∞ (1/n)tr[(Cn′ + Cn)(Cn′ + Cn)′] ≠ 0, then Σθ is nonsingular, where Σθ = −limn→∞ E[(1/n)∂²ln Ln(θ0)/∂θ∂θ′].

The nonsingularity of Σθ implies the local identifiable uniqueness condition of θ0 in the following theorem.³

Theorem 2.1 Under the assumed regularity conditions and that Σθ is nonsingular, there exists a neighborhood Θ1 of θ0 such that, for any neighborhood Nε(θ0) of θ0 with radius ε,

$$\limsup_{n\to\infty}\,\max_{\theta\in\Theta_1\setminus N_\epsilon(\theta_0)}\Big[E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big)\Big] < 0.$$

The QMLE θ̂n derived from the maximization of ln Ln(θ) with θ ∈ Θ1 is consistent.

The presence of relevant spatially varying regressors in the model is a distinctive feature. From (2.1) and (2.2), the reduced-form equation of Yn can be represented as

$$Y_n = X_n\beta_0 + \lambda_0 G_nX_n\beta_0 + S_n^{-1}V_n \tag{2.2'}$$

because In + λ0Gn = Sn⁻¹. The sufficient condition in Theorem 2.1 that limn→∞ (1/n)(GnXnβ0)′Mn(GnXnβ0) ≠ 0 is equivalent to the spatially generated regressor Zn = GnXnβ0 being linearly independent of Xn in the reduced-form equation (2.2′). This condition is not only a local identification condition, but also a global identification condition. This can, indeed, be seen from the expectation of the difference of the log likelihood functions in (2.3). Let

$$\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n}\,\mathrm{tr}\big(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\big). \tag{2.8}$$

One has

$$\begin{aligned}
E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big)
&= \frac{1}{2}(\ln\sigma_0^2 - \ln\sigma^2) + \frac{\sigma^2-\sigma_0^2}{2\sigma^2} + \frac{1}{n}\big(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|\big) - \frac{\sigma_n^2(\lambda)-\sigma_0^2}{2\sigma^2} \\
&\quad - \frac{1}{2\sigma^2 n}\,[X_n(\beta_0-\beta) + (\lambda_0-\lambda)G_nX_n\beta_0]'[X_n(\beta_0-\beta) + (\lambda_0-\lambda)G_nX_n\beta_0].
\end{aligned}$$

The term involving Xn is analogous to the difference of the log likelihood functions of a normal regression model of Yn with (known) regressors Xn and Zn, where Zn = GnXnβ0. The presence of relevant regressors plays the interesting role as if the spatial model were a regression model of Yn with regressors Xn and Zn.⁴ β0 and λ0 can be identified as the regression coefficients of Xn and GnXnβ0, respectively.
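The identity In + λ0Gn = Sn⁻¹ and the reduced form (2.2)′ underlying this regression interpretation can be verified numerically. The sketch below is hypothetical: the circular weights matrix and all parameter values are assumptions chosen for the check.

```python
import numpy as np

# Hypothetical check of I_n + lam0 G_n = S_n^{-1} and the reduced form (2.2)':
#   Y_n = X_n b0 + lam0 G_n X_n b0 + S_n^{-1} V_n.
rng = np.random.default_rng(3)
n, lam0 = 30, 0.5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # assumed circular neighbors
S_inv = np.linalg.inv(np.eye(n) - lam0 * W)
G = W @ S_inv                                     # G_n = W_{n,n} S_n^{-1}

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta0 = np.array([1.0, 2.0])
V = rng.standard_normal(n)
Y = S_inv @ (X @ beta0 + V)                                   # (2.2)
Y_reduced = X @ beta0 + lam0 * (G @ (X @ beta0)) + S_inv @ V  # (2.2)'
```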

³ See White (1994) for the formal definition of this condition.
⁴ For cases with limn→∞ hn = ∞, the OLS regression of Yn on Xn and Wn,nYn may even provide consistent estimates of β and λ in (2.1) (see Lee 1999b).


Theorem 2.2 Under the assumed regularity conditions and that limn→∞ (1/n)(Xn, GnXnβ0)′(Xn, GnXnβ0) exists and is nonsingular, θ0 is globally identifiable and θ̂n is a consistent estimator of θ0.

For global identification in the case when GnXnβ0 and Xn are multicollinear, the possible identification condition can be more comprehensively stated in terms of the concentrated log likelihood function in λ. From the log likelihood function (2.3) of the model, given a λ, the QMLE of β is

$$\hat\beta_n(\lambda) = (X_n'X_n)^{-1}X_n'S_n(\lambda)Y_n, \tag{2.9}$$

and the QMLE of σ² is

$$\hat\sigma_n^2(\lambda) = \frac{1}{n}\big[S_n(\lambda)Y_n - X_n\hat\beta_n(\lambda)\big]'\big[S_n(\lambda)Y_n - X_n\hat\beta_n(\lambda)\big] = \frac{1}{n}Y_n'S_n'(\lambda)M_nS_n(\lambda)Y_n. \tag{2.10}$$

The concentrated log likelihood function of λ is

$$\ln L_n(\lambda) = -\frac{n}{2}\big(\ln(2\pi)+1\big) - \frac{n}{2}\ln\hat\sigma_n^2(\lambda) + \ln|S_n(\lambda)|. \tag{2.11}$$

The QMLE λ̂n of λ maximizes the concentrated likelihood (2.11). The QMLEs of β and σ² are, respectively, β̂n(λ̂n) and σ̂n²(λ̂n). It is clear from (2.9) and (2.10) that β0 and σ0² will be identifiable once λ0 is identifiable. Define the function Qn(λ) = max over (β, σ²) of E(ln Ln(θ)), which is related to the concentrated log likelihood function

of λ. The optimal solutions of this problem are β*n(λ) = (Xn′Xn)⁻¹Xn′Sn(λ)Sn⁻¹Xnβ0 and

$$\begin{aligned}
\sigma_n^{*2}(\lambda) &= \frac{1}{n}E\big\{[S_n(\lambda)Y_n - X_n\beta_n^*(\lambda)]'[S_n(\lambda)Y_n - X_n\beta_n^*(\lambda)]\big\} \\
&= \frac{1}{n}\big[S_n(\lambda)S_n^{-1}X_n\beta_0\big]'M_n\big[S_n(\lambda)S_n^{-1}X_n\beta_0\big] + \frac{\sigma_0^2}{n}\,\mathrm{tr}\big[S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\big] \\
&= \frac{1}{n}(\lambda_0-\lambda)^2(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) + \frac{\sigma_0^2}{n}\,\mathrm{tr}\big[S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}\big].
\end{aligned} \tag{2.12}$$

Hence,

$$Q_n(\lambda) = -\frac{n}{2}\big(\ln(2\pi)+1\big) - \frac{n}{2}\ln\sigma_n^{*2}(\lambda) + \ln|S_n(\lambda)|. \tag{2.13}$$

For the case with GnXnβ0 and Xn being multicollinear, MnGnXnβ0 = 0 and σ*n²(λ) = σn²(λ) in (2.8), which does not involve Xn.
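The concentration steps (2.9)-(2.11) can be sketched as code. This is a hypothetical implementation on simulated data: β and σ² are concentrated out in closed form, and λ is then estimated by a simple grid search over (2.11) (the paper does not prescribe a particular optimizer; the weights matrix, parameters, and seed are assumptions for the example).

```python
import numpy as np

# Hypothetical sketch of the QMLE via concentration, (2.9)-(2.11).
rng = np.random.default_rng(1)
n = 100
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # assumed circular neighbors
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = np.linalg.solve(np.eye(n) - 0.3 * W,
                    X @ np.array([1.0, -1.0]) + rng.standard_normal(n))

def beta_hat(lam):            # (2.9): OLS of S_n(lam) Y_n on X_n
    return np.linalg.solve(X.T @ X, X.T @ (Y - lam * (W @ Y)))

def sigma2_hat(lam):          # (2.10)
    e = (Y - lam * (W @ Y)) - X @ beta_hat(lam)
    return e @ e / n

def conc_loglik(lam):         # concentrated log likelihood (2.11)
    _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)
    return -n / 2 * (np.log(2 * np.pi) + 1) - n / 2 * np.log(sigma2_hat(lam)) + logdet

grid = np.linspace(-0.9, 0.9, 361)
lam_qmle = grid[np.argmax([conc_loglik(l) for l in grid])]
beta_qmle, s2_qmle = beta_hat(lam_qmle), sigma2_hat(lam_qmle)
```

By construction, evaluating the full likelihood (2.3) at the concentrated estimates for any λ reproduces (2.11) exactly, which is a useful sanity check on an implementation.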

Both the log likelihood function (2.3) and the concentrated log likelihood function (2.11) involve the determinant |Sn(λ)|; its being nonzero, in turn, makes Sn⁻¹(λ) exist and relevant for the identification analysis. The assumption in Assumption 5 that Sn⁻¹ is bounded in both row and column sums is a sufficient condition relevant for local identification, but the uniform boundedness of Sn⁻¹ at λ0 can imply only that Sn⁻¹(λ) is uniformly bounded in row and column sums in a neighborhood of λ0 (see Lee (2001)). For global identification via (2.13) without Xn, one needs, in general, a stronger uniform boundedness property for Sn⁻¹(λ) on the parameter space of λ.

Assumption 7. Sn⁻¹(λ) are uniformly bounded in either row or column sums, uniformly in λ in the parameter space Λ, which is assumed to be a compact set.

This assumption is needed to deal with the logarithmic value of the determinant of Sn(λ), so that (1/n)ln|Sn(λ)| can be uniformly equicontinuous in λ in Λ. It can be shown that if ‖Wn,n‖ ≤ 1 for all n, where ‖·‖ is a matrix norm, then ‖Sn⁻¹(λ)‖ are uniformly bounded in any bounded subset of (−1, 1) (Lemma A.4 in Lee, 2001). In particular, if Wn,n is a row-normalized matrix, Sn⁻¹(λ) is uniformly bounded in row sums uniformly in any bounded subset of (−1, 1).

Theorem 2.3 For the case that GnXnβ0 and Xn are multicollinear, suppose that, for any λ ≠ λ0,

$$\lim_{n\to\infty}\Big(\frac{1}{n}\ln\big|\sigma_0^2 S_n^{-1}S_n'^{-1}\big| - \frac{1}{n}\ln\big|\sigma_n^2(\lambda)S_n^{-1}(\lambda)S_n'^{-1}(\lambda)\big|\Big) \neq 0. \tag{2.14}$$

Then, for any ε > 0, lim sup as n→∞ of [max over θ in Θ\Nε(θ0) of E((1/n)ln Ln(θ)) − E((1/n)ln Ln(θ0))] < 0, where Nε(θ0) is an open neighborhood of θ0 with radius ε in any compact parameter space Θ where Assumption 7 holds. The QMLE θ̂n is a consistent estimator of θ0 in Θ.

For the model, as Yn = Sn⁻¹Xnβ0 + Sn⁻¹Vn, the variance matrix of Yn is σ0²Sn⁻¹Sn′⁻¹. The global identification condition in Theorem 2.3 is related to the uniqueness of the limiting sample average of the logarithmic value of the determinant of this variance matrix.

2.3 Asymptotic Distribution of the QMLE

The asymptotic distribution of the QMLE θ̂n can now be derived from the Taylor expansion of the first-order condition ∂ln Ln(θ̂n)/∂θ = 0 at θ0.

At θ0, the first-order derivatives of the log likelihood function in (2.4) imply

$$\frac{1}{\sqrt{n}}\frac{\partial \ln L_n(\theta_0)}{\partial\beta} = \frac{1}{\sigma_0^2\sqrt{n}}X_n'V_n \quad\text{and}\quad \frac{1}{\sqrt{n}}\frac{\partial \ln L_n(\theta_0)}{\partial\sigma^2} = \frac{1}{2\sigma_0^4\sqrt{n}}(V_n'V_n - n\sigma_0^2),$$

and

$$\frac{1}{\sqrt{n}}\frac{\partial \ln L_n(\theta_0)}{\partial\lambda} = \frac{1}{\sigma_0^2\sqrt{n}}(X_n\beta_0)'G_n'V_n + \frac{1}{\sigma_0^2\sqrt{n}}\big(V_n'G_n'V_n - \sigma_0^2\,\mathrm{tr}(G_n)\big). \tag{2.15}$$

These involve both linear and quadratic functions of Vn. Their asymptotic distributions may be derived

from relevant central limit theorems for linear-quadratic functions. The variance matrix of the score vector


in (2.15) is E[(1/√n)(∂ln Ln(θ0)/∂θ) · (1/√n)(∂ln Ln(θ0)/∂θ′)] = −E[(1/n)∂²ln Ln(θ0)/∂θ∂θ′] + Ωθ,n, where

$$\Omega_{\theta,n} = \begin{pmatrix}
0 & * & * \\
\frac{\mu_3}{\sigma_0^4 n}\sum_{i=1}^{n} G_{n,ii}x_{i,n}' & \frac{2\mu_3}{\sigma_0^4 n}\sum_{i=1}^{n} G_{n,ii}G_{in}X_n\beta_0 + \frac{\mu_4-3\sigma_0^4}{\sigma_0^4 n}\sum_{i=1}^{n} G_{n,ii}^2 & * \\
\frac{\mu_3}{2\sigma_0^6 n}\,l_n'X_n & \frac{1}{2\sigma_0^6 n}\big[\mu_3\,l_n'G_nX_n\beta_0 + (\mu_4-3\sigma_0^4)\,\mathrm{tr}(G_n)\big] & \frac{\mu_4-3\sigma_0^4}{4\sigma_0^8}
\end{pmatrix} \tag{2.16}$$

is a symmetric matrix with μj = E(v_i^j), j = 2, 3, 4, being, respectively, the second, third, and fourth moments of v, where G_{in} is the ith row of Gn and G_{n,ij} is the (i, j)th entry of Gn. For the case of hn being a bounded sequence, the central limit theorem for linear-quadratic forms in Kelejian and Prucha (1999b) is applicable to (2.15). For the case that limn→∞ hn = ∞, (1/(σ0²√n))(Xnβ0)′Gn′Vn will dominate the quadratic term of (1/√n)∂ln Ln(θ0)/∂λ. This occurs because var((1/√n)Vn′Gn′Vn) = ((μ4 − 3σ0⁴)/n) Σ_{i=1}^n G_{n,ii}² + (σ0⁴/n)[tr(Gn′Gn) + tr(Gn²)] = O(1/hn) from Lemma A.10 in Lee (2001), and hence (1/√n)(Vn′Gn′Vn − σ0²tr(Gn)) = oP(1) while (1/√n)(Xnβ0)′Gn′Vn = OP(1). For the case that limn→∞ hn = ∞, the limiting distribution of the score vector, based on (1/(σ0²√n))(Xnβ0)′Gn′Vn, follows from the Kolmogorov central limit theorem for the independence case.

Proposition 2.3 Under the assumed regularity conditions and that Σθ is positive definite,

$$\frac{1}{\sqrt{n}}\frac{\partial \ln L_n(\theta_0)}{\partial\theta} \xrightarrow{D} N(0,\,\Sigma_\theta + \Omega_\theta), \tag{2.17}$$

where Ωθ = limn→∞ Ωθ,n. If the vi's are normally distributed, (1/√n)∂ln Ln(θ0)/∂θ →D N(0, Σθ).

Proposition 2.4 Under the assumed regularity conditions, (1/n)∂²ln Ln(θn)/∂θ∂θ′ − (1/n)∂²ln Ln(θ0)/∂θ∂θ′ →p 0 for any θn that converges in probability to θ0.

Proposition 2.5 Under the assumed regularity conditions, (1/n)∂²ln Ln(θ0)/∂θ∂θ′ − E[(1/n)∂²ln Ln(θ0)/∂θ∂θ′] →p 0. For the case that limn→∞ hn = ∞,

$$E\Big(\frac{1}{n}\frac{\partial^2 \ln L_n(\theta_0)}{\partial\theta\,\partial\theta'}\Big) + \frac{1}{\sigma_0^2}
\begin{pmatrix}
\frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 & 0 \\
\frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) & 0 \\
0 & 0 & \frac{1}{2\sigma_0^2}
\end{pmatrix} \xrightarrow{p} 0. \tag{2.18}$$

Theorem 2.4 Under the assumed regularity conditions and that Σθ is nonsingular,

$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{D} N\big(0,\,\Sigma_\theta^{-1} + \Sigma_\theta^{-1}\Omega_\theta\Sigma_\theta^{-1}\big). \tag{2.19}$$

If the vi's are normally distributed, then √n(θ̂n − θ0) →D N(0, Σθ⁻¹).

The conditions that either (i) limn→∞ (1/n)(GnXnβ0)′Mn(GnXnβ0) ≠ 0 or (ii) limn→∞ (1/n)tr[(Cn′ + Cn)(Cn′ + Cn)′] ≠ 0 in Proposition 2.2 guarantee that the limiting (information) matrix Σθ is nonsingular. These conditions are (local) identification conditions for the parameters of the model. Condition (i) is relevant only for mixed regressive models as it involves the regressors Xn. The second condition (ii) is relevant for spatial autoregressive models with or without regressors. However, condition (ii) is applicable only for the cases where hn is a bounded sequence. This is so because tr(Gn), tr(Gn²) and tr(Gn′Gn) are all of order O(n/hn) from Lemma 2; they imply that (1/n)tr[(Cn′ + Cn)(Cn′ + Cn)′] = O(1/hn), which will converge to zero when limn→∞ hn = ∞. For the case with hn being a divergent sequence, the QMLEs for all the parameters can still be √n-consistent when there are relevant spatially varying regressors Xn in the model such that GnXnβ0 is not linearly dependent on Xn. However, when GnXnβ0 and Xn are multicollinear, β0 and λ0 cannot be identified just from the regression structures as in Theorem 2.2, because they will not be separately identifiable as coefficients of the regressors Xn and Zn in (2.2)′. Identification of λ0 will then depend crucially on the covariances of spatial outcomes. For the case that hn is a bounded sequence, the limiting matrix Σθ is in general nonsingular when condition (ii) is satisfied. The MLE θ̂n will then be √n-consistent and asymptotically normal as in Theorems 2.1 and 2.3. When limn→∞ tr(Gn)/n in (2.7) is finite and nonzero, the MLEs λ̂n and σ̂n² are asymptotically dependent. Anselin and Bera (1998) discussed the implication of this dependence on statistical inference problems with nuisance parameters. For the case that hn is a divergent sequence and the disturbances v's are normally distributed, Ωθ = 0 and limn→∞ tr(Gn)/n = 0; in this case, the MLEs λ̂n and σ̂n² are asymptotically independent.

3. Mixed Regressive Models with Singular Information Matrices

When limn→∞ hn = ∞, Σθ can be nonsingular only if GnXnβ0 is not multicollinear with Xn. If spatially varying regressors are not present or are irrelevant, the multicollinearity problem can occur and, consequently, there are irregularities in the information matrix for the case when limn→∞ hn = ∞.

3.1 Cases with Singular Information Matrices

There are several concrete cases in which irregularity can occur when Wn,n is row-normalized.

If there are no spatially varying regressors, Xn will consist of a constant term. As Wn,n is row-normalized, Xn = ln implies that Wn,nXn = ln and GnXn = ln/(1 − λ0). The latter holds because Snln = (1 − λ0)ln implies Sn⁻¹ln = ln/(1 − λ0).⁵ In this case, when limn→∞ hn = ∞, the limiting information matrix Σθ will be singular.

⁵ Note that this relation is established without using the series expansion Sn⁻¹ = In + λ0Wn,n + λ0²Wn,n² + ··· = Σ_{i=0}^∞ λ0^i Wn,n^i. The series expansion is valid for |λ| < 1 with a row-normalized Wn,n (Horn and Johnson 1985).


This is so because the submatrix

$$\frac{1}{n}\begin{pmatrix} X_n' \\ (X_n\beta_0)'G_n' \end{pmatrix}\big(X_n,\; G_nX_n\beta_0\big) =
\begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 \\ \frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) \end{pmatrix} =
\begin{pmatrix} 1 & \frac{\beta_0}{1-\lambda_0} \\ \frac{\beta_0}{1-\lambda_0} & \big(\frac{\beta_0}{1-\lambda_0}\big)^2 \end{pmatrix} \tag{3.1}$$

of (2.7) is singular and (1/n)[tr(Gn′Gn) + tr(Gn²)] = O(1/hn) → 0 as n → ∞.

The other case is when all spatially varying regressors are irrelevant but are included in estimation. Let Xn = (ln, X2n), where X2n are the spatially varying regressors. In that case, the coefficient subvector β02 of X2n in β0 = (β01, β02′)′ is zero. Consequently, Xnβ0 = lnβ01 and

$$G_nX_n\beta_0 = \beta_{01}W_{n,n}S_n^{-1}l_n = \beta_{01}W_{n,n}\Big(\frac{l_n}{1-\lambda_0}\Big) = \frac{\beta_{01}}{1-\lambda_0}\,l_n.$$

It follows that

$$\begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 \\ \frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) \end{pmatrix} =
\begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{\beta_{01}}{1-\lambda_0}\cdot\frac{1}{n}X_n'l_n \\ \frac{\beta_{01}}{1-\lambda_0}\cdot\frac{1}{n}l_n'X_n & \big(\frac{\beta_{01}}{1-\lambda_0}\big)^2\cdot\frac{1}{n}l_n'l_n \end{pmatrix}, \tag{3.2}$$

which is singular because the last column is proportional to the first one.
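The rank deficiency in (3.2) is easy to reproduce numerically. In the hypothetical sketch below (the weights matrix, parameters, and seed are assumptions for the example), a row-normalized weights matrix and an irrelevant spatially varying regressor (β02 = 0) make GnXnβ0 proportional to ln.

```python
import numpy as np

# Hypothetical reproduction of the singularity in (3.2): with a row-normalized
# W and an irrelevant spatially varying regressor (beta_02 = 0), the spatially
# generated regressor G_n X_n beta_0 is proportional to l_n.
rng = np.random.default_rng(4)
n, lam0 = 50, 0.5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # rows sum to one
G = W @ np.linalg.inv(np.eye(n) - lam0 * W)        # G_n = W S_n^{-1}

X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # X_n = (l_n, X_2n)
beta0 = np.array([2.0, 0.0])                               # beta_02 = 0
Z = G @ (X @ beta0)                    # equals beta_01/(1 - lam0) * l_n
D = np.column_stack([X, Z])            # (X_n, G_n X_n beta_0)
```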

There is another case that causes some related issues. Consider the spatial weights matrix Wn,n = (1/(n−1))(lnln′ − In). Sample observations are from a single spatial economy where a spatial unit is influenced equally by all the other spatial units. Because the spatial weights matrix has a particular pattern, the inverse of Sn can be derived analytically and has the form Sn⁻¹ = (1 + λ0/(n−1))⁻¹(In + (λ0/(1−λ0)) · lnln′/(n−1)). It follows that Gn = (1/(n−1+λ0))(lnln′/(1−λ0) − In) and

$$G_nX_n\beta_0 = \frac{1}{n-1+\lambda_0}\Big(\frac{l_n'X_n\beta_0}{1-\lambda_0}\,l_n - X_n\beta_0\Big) = \frac{n}{n-1+\lambda_0}\Big(\frac{l_n'X_n\beta_0}{(1-\lambda_0)n}\,l_n - \frac{X_n\beta_0}{n}\Big), \tag{3.3}$$

which is a linear combination of the columns of Xn and its constant term, even though the linear coefficients change as n changes. However, this case has hn = n and it does not satisfy Assumption 4 that hn/n goes to zero.
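The analytic expressions for Sn⁻¹ and Gn in this single-district example can be checked directly. The following is a hypothetical numerical verification with a small, arbitrarily chosen n and λ0:

```python
import numpy as np

# Hypothetical numerical check of the analytic inverse for the single-district
# weights matrix W = (l l' - I)/(n-1):
#   S_n^{-1} = (1 + lam0/(n-1))^{-1} (I + lam0/(1-lam0) * l l'/(n-1)),
#   G_n = (l l'/(1-lam0) - I)/(n-1+lam0).
n, lam0 = 8, 0.4
J = np.ones((n, n))                       # l_n l_n'
W = (J - np.eye(n)) / (n - 1)
S = np.eye(n) - lam0 * W

S_inv_formula = (np.eye(n) + lam0 / (1 - lam0) * J / (n - 1)) / (1 + lam0 / (n - 1))
G_formula = (J / (1 - lam0) - np.eye(n)) / (n - 1 + lam0)
```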

The singularity of the information matrix has implications for the rate of convergence of the estimators, which will be analyzed in the following sections.

3.2 Identification and Consistency

The preceding cases imply that GnXnβ0 lies in the space spanned by the columns of Xn and, therefore, MnGnXnβ0 = 0. For subsequent analyses, the following assumption will be maintained.

Assumption 8. MnGnXnβ0 = 0.

This assumption may cover more irregular cases than those mentioned.


It is revealing to carry out the subsequent analysis with the concentrated log likelihood function ln Ln(λ) in (2.11) and the corresponding deterministic function Qn(λ) in (2.13). The following proposition shows the rate of uniform convergence of (ln Ln(λ) − Qn(λ)) to zero. For the case limn→∞ hn = ∞, the uniform convergence can be achieved on any bounded set of λ.⁶

Proposition 3.1 Under the assumed regularity conditions, (hn/n)(ln Ln(λ) − Qn(λ)) →p 0 uniformly in λ in any bounded set when limn→∞ hn = ∞.

As ∂σ̂n²(λ)/∂λ = −(2/n)Yn′Wn,n′MnSn(λ)Yn, the first and second order derivatives of the concentrated log likelihood in (2.11) are

$$\frac{\partial \ln L_n(\lambda)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda)}Y_n'W_{n,n}'M_nS_n(\lambda)Y_n - \mathrm{tr}\big(W_{n,n}S_n^{-1}(\lambda)\big), \tag{3.4}$$

and

$$\frac{\partial^2 \ln L_n(\lambda)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda)}\big(Y_n'W_{n,n}'M_nS_n(\lambda)Y_n\big)^2 - \frac{1}{\hat\sigma_n^2(\lambda)}Y_n'W_{n,n}'M_nW_{n,n}Y_n - \mathrm{tr}\big([W_{n,n}S_n^{-1}(\lambda)]^2\big). \tag{3.5}$$
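The concentrated score formula (3.4) can be cross-checked against a numerical derivative of the concentrated log likelihood (2.11). The data below are hypothetical simulated values; the weights matrix, parameters, and seed are assumptions made only for the check.

```python
import numpy as np

# Hypothetical cross-check of the concentrated score (3.4):
#   d lnL_n(lam)/d lam = (1/s2_n(lam)) Y'W'M_n S_n(lam) Y - tr(W S_n^{-1}(lam)).
rng = np.random.default_rng(5)
n = 80
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # assumed circular neighbors
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # projector M_n
Y = np.linalg.solve(np.eye(n) - 0.3 * W, X @ np.array([1.0, 1.0]) + rng.standard_normal(n))

def sigma2_hat(lam):          # (2.10), using M_n S_n(lam) Y_n
    r = M @ (Y - lam * (W @ Y))
    return r @ r / n

def conc_loglik(lam):         # concentrated log likelihood (2.11)
    _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)
    return -n / 2 * (np.log(2 * np.pi) + 1) - n / 2 * np.log(sigma2_hat(lam)) + logdet

def score(lam):               # analytic first derivative (3.4)
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    return (W @ Y) @ (M @ (Y - lam * (W @ Y))) / sigma2_hat(lam) - np.trace(W @ S_inv)

lam, h = 0.2, 1e-5
numeric = (conc_loglik(lam + h) - conc_loglik(lam - h)) / (2 * h)
```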

As MnGnXnβ0 = 0, the first and second derivatives in (3.4) and (3.5) at λ0 can be simplified into

$$\frac{\partial \ln L_n(\lambda_0)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda_0)}V_n'G_n'M_nV_n - \mathrm{tr}(G_n), \tag{3.6}$$

and

$$\frac{\partial^2 \ln L_n(\lambda_0)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda_0)}\big(V_n'G_n'M_nV_n\big)^2 - \frac{1}{\hat\sigma_n^2(\lambda_0)}V_n'G_n'M_nG_nV_n - \mathrm{tr}(G_n^2). \tag{3.7}$$

As MnXn = 0, σ̂n²(λ0) = (1/n)Vn′MnVn from (2.10), and σ̂n²(λ0) →p σ0² because

$$\hat\sigma_n^2(\lambda_0) = \frac{1}{n}V_n'M_nV_n = \frac{1}{n}V_n'V_n - \frac{1}{n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n = \frac{1}{n}\sum_{i=1}^{n} v_i^2 + o_P(1) = \sigma_0^2 + o_P(1) \tag{3.8}$$

by the Markov inequality, as E[(1/n)Vn′Xn(Xn′Xn)⁻¹Xn′Vn] = σ0²k/n = o(1). Define the function

$$P_n(\lambda_0) = \frac{2}{n\sigma_0^4}\big(V_n'G_n'M_nV_n\big)^2 - \frac{1}{\sigma_0^2}V_n'G_n'M_nG_nV_n - \mathrm{tr}(G_n^2). \tag{3.9}$$

∂²ln Ln(λ0)/∂λ² in (3.7) differs from Pn(λ0) in that σ̂n²(λ0) of the former is replaced by σ0².

Proposition 3.2 Under the regularity conditions for this model,

$$E(P_n(\lambda_0)) = \frac{2}{n}\mathrm{tr}^2(M_nG_n) - \mathrm{tr}(M_nG_nG_n') - \mathrm{tr}(G_n^2) + O\Big(\frac{1}{h_n}\Big) = -\big[\mathrm{tr}(C_nC_n') + \mathrm{tr}(C_n^2)\big] + O(1), \tag{3.10}$$

$^6$ For the case that $h_n$ is a bounded sequence, it can be shown that the uniform convergence can occur on a neighborhood of $\lambda_0$ without extra conditions. We focus on the divergent $h_n$ case as it is relevant for the irregularity issue.


and $\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))] \xrightarrow{p} 0$. Furthermore, $\frac{h_n}{n}\big(\frac{\partial^2 \ln L_n(\lambda_0)}{\partial\lambda^2} - E(P_n(\lambda_0))\big) \xrightarrow{p} 0$.

Hence,
$$\frac{h_n}{n}\frac{\partial^2 \ln L_n(\lambda_0)}{\partial\lambda^2} \xrightarrow{p} \lim_{n\to\infty}\frac{h_n}{n}E(P_n(\lambda_0)) \tag{3.11}$$
if the latter limit exists. The limit is nonpositive, as $\mathrm{tr}(C_nC_n') + \mathrm{tr}(C_n^2) = \frac{1}{2}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)']$ is always nonnegative, and it is strictly positive unless $C_n' + C_n = 0$.

Assumption 9. $\Sigma_{\lambda\lambda} = \lim_{n\to\infty}\frac{h_n}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)']$ exists and is positive.

This assumption modifies the local identification condition of Proposition 2.2 for the case that $\lim_{n\to\infty} h_n = \infty$. While $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)'] = 0$ when $\lim_{n\to\infty} h_n = \infty$, the magnifying factor $h_n$ corrects for the rate of convergence so that the limit $\Sigma_{\lambda\lambda}$ in Assumption 9 can be finite and nonzero. The following theorem shows that Assumption 9 is a sufficient condition for the local identification of $\lambda_0$.
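The limit in Assumption 9 can be evaluated directly for a Case-type block design with $R$ districts of $m$ members each, so that $h_n = m-1$ and $n = mR$. The sketch below assumes $C_n = G_n - (\mathrm{tr}(G_n)/n)I_n$, which is the definition consistent with equation (3.10); the values $m = 5$ and $\lambda_0 = 0.4$ are arbitrary illustrative choices. Because the design is block diagonal, the quantity is constant in $R$ and strictly positive, so its limit as $R \to \infty$ (with $m$ fixed) exists and is positive:

```python
import numpy as np

def sigma_ll(m, R, lam0=0.4):
    """(h_n/n) tr[(C_n' + C_n)(C_n' + C_n)'] for W_n = I_R kron B_m,
    where B_m = (l_m l_m' - I_m)/(m-1), so h_n = m - 1 and n = m R."""
    B = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    W = np.kron(np.eye(R), B)
    n = m * R
    G = W @ np.linalg.inv(np.eye(n) - lam0 * W)
    C = G - (np.trace(G) / n) * np.eye(n)   # assumed: C_n = G_n - (tr(G_n)/n) I_n
    A = C.T + C
    return (m - 1) / n * np.trace(A @ A.T)

# Constant in the number of districts R, and strictly positive.
vals = [sigma_ll(5, R) for R in (2, 4, 8)]
```

Increasing $R$ leaves the value unchanged, illustrating why a finite positive $\Sigma_{\lambda\lambda}$ is a natural requirement under $h_n/n \to 0$.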

Theorem 3.1. Under the assumed conditions, there exists a set $\Lambda$ with $\lambda_0$ in its interior such that, for any neighborhood $N_\epsilon(\lambda_0)$ of $\lambda_0$ with radius $\epsilon$,
$$\limsup_{n\to\infty}\Big(\max_{\lambda\in\Lambda\setminus N_\epsilon(\lambda_0)}\frac{h_n}{n}Q_n(\lambda) - \frac{h_n}{n}Q_n(\lambda_0)\Big) < 0.$$
Furthermore, the QMLE $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ with $\lambda$ in $\Lambda$ is a consistent estimator.

For global identification, $\lambda_0$ would be globally identifiable if
$$\lim_{n\to\infty}\Big(\frac{h_n}{n}\ln|\sigma_0^2S_n^{-1}S_n'^{-1}| - \frac{h_n}{n}\ln|\sigma_n^2(\lambda)S_n^{-1}(\lambda)S_n'^{-1}(\lambda)|\Big) \neq 0$$
when $\lambda \neq \lambda_0$. This condition modifies that of Theorem 2.3 by adjusting the magnifying factor $h_n$. It is equivalent to $\lim_{n\to\infty}\frac{h_n}{n}\big(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)| - \frac{n}{2}(\ln\sigma_n^2(\lambda) - \ln\sigma_n^2(\lambda_0))\big) \neq 0$ for any $\lambda \neq \lambda_0$. A direct verification of the global identification of $\lambda_0$ would depend on the specific sequence of weights matrices $W_{n,n}$ in a model. For some models with important weights matrices, when $\lim_{n\to\infty} h_n = \infty$, the concentrated log likelihood function in (2.11) can be globally concave. For those cases, local identification under Assumption 9 implies global identification, as the maximizer is unique. These cases include the spatial weights matrix in Case (1991, 1992), which is symmetric, and the familiar case of Ord (1975) where $W_{n,n}$ is row-normalized and the eigenvalues of $W_{n,n}$ are real. Specifically, when $\lim_{n\to\infty} h_n = \infty$, if the weights matrix $W_{n,n}$ is symmetric or $W_{n,n} = \Lambda_n^{-1}D_n$, where $\Lambda_n$ is a diagonal matrix and $D_n$ is a symmetric matrix, then, for large enough $n$, $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$ is concave on any bounded set of $\lambda$. With the local identification in Theorem 3.1, $\lambda_0$ is its unique global maximizer and, hence, $\lambda_0$ is uniquely identifiable on


any bounded parameter space. The estimator $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ in (2.11) over any compact parameter space of $\lambda$ will be consistent. The proofs of these results are the same as those in Theorem 4 of Lee (2001) for the pure spatial autoregressive process, because $Q_n(\lambda)$ of the mixed regressive model has the same expression as that of the pure spatial autoregressive model when $M_nG_nX_n\beta_0 = 0$.

3.3 Asymptotic Distribution

The asymptotic distribution of the QMLE $\hat\lambda_n$ can be derived from the concentrated log likelihood function (2.11). The asymptotic distributions of the QMLEs $\hat\beta_n$ and $\hat\sigma_n^2$ from (2.9) and (2.10) then follow once that of $\hat\lambda_n$ is available.

Proposition 3.3. Under the assumed regularity conditions of the model,
$$\frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2} - \frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}\Big) = o_P(1)$$
for any $\tilde\lambda_n$ that converges to $\lambda_0$ in probability.

Let $Q_n = V_n'C_n'M_nV_n$. It follows from (3.6) and (3.8) that $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{Q_n}{\hat\sigma_n^2(\lambda_0)}$. The limiting distribution of this first-order derivative depends on that of the quadratic form $Q_n$ in $V_n$. The original central limit theorem in Kelejian and Prucha (1999b) is not directly applicable to the case with $h_n$ being a divergent sequence, but their theorem and its proof have been modified in Lee (2001) to cover the divergent case. Assumption 4 needs to be slightly strengthened for the latter.

Assumption 4$'$. $\frac{h_n^{1+\eta}}{n}$, for some $\eta > 0$, tends to zero as $n$ goes to infinity.

Proposition 3.4. Under the assumed regularity conditions,
$$\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} N(0, \Sigma_{\lambda\lambda} + \Omega_\lambda), \tag{3.12}$$
where $\Omega_\lambda = \big(\frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\big)\lim_{n\to\infty}\frac{h_n}{n}\sum_{i=1}^n C_{n,ii}^2$. For the cases that the $v_i$'s are normally distributed or $\lim_{n\to\infty}h_n = \infty$,
$$\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}). \tag{3.13}$$
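The variance expression behind $\Sigma_{\lambda\lambda} + \Omega_\lambda$ rests on the standard identity $\mathrm{var}(V'AV) = (\mu_4 - 3\sigma_0^4)\sum_i a_{ii}^2 + \sigma_0^4[\mathrm{tr}(AA') + \mathrm{tr}(A^2)]$ for a quadratic form in i.i.d. disturbances. A quick Monte Carlo check of that identity in the normal case ($\mu_4 = 3\sigma_0^4$, so the first term drops), with an arbitrary fixed matrix $A$ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 40, 20000
A = rng.standard_normal((n, n)) / n               # arbitrary fixed matrix

mean_theory = np.trace(A)                         # E(V'AV) = sigma0^2 tr(A), sigma0^2 = 1
var_theory = np.trace(A @ A.T) + np.trace(A @ A)  # normal case: mu4 - 3 sigma0^4 = 0

draws = np.empty(reps)
for r in range(reps):
    v = rng.standard_normal(n)                    # V_n with sigma0^2 = 1
    draws[r] = v @ A @ v
```

The simulated mean and variance of `draws` match `mean_theory` and `var_theory` up to Monte Carlo error.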

The asymptotic distribution of $\hat\lambda_n$ will then follow from the expansion
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) = -\Big(\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\Big)^{-1}\cdot\sqrt{\frac{h_n}{n}}\,\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda},$$
where $\bar\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$,


and the stochastic convergence of the Þrst and second order derivatives in Propositions 3.3 and 3.4 and that

in (3.11).

Theorem 3.2. Under the assumed regularity conditions for this model,
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}^{-1} + \Sigma_{\lambda\lambda}^{-1}\Omega_\lambda\Sigma_{\lambda\lambda}^{-1}). \tag{3.14}$$
For the cases that the $v_i$'s are normally distributed or $\lim_{n\to\infty}h_n = \infty$,
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}^{-1}). \tag{3.15}$$

With $\hat\lambda_n$, the QMLEs for $\beta$ and $\sigma^2$ are $\hat\beta_n = (X_n'X_n)^{-1}X_n'S_n(\hat\lambda_n)Y_n$ and $\hat\sigma_n^2 = \frac{1}{n}Y_n'S_n'(\hat\lambda_n)M_nS_n(\hat\lambda_n)Y_n$. As these estimators are functions of $\hat\lambda_n$, one has to take into account the possible effect of $\hat\lambda_n$ on their asymptotic distributions.

Theorem 3.3. Under the assumed regularity conditions, as $\lim_{n\to\infty}h_n = \infty$,
$$\begin{aligned}\sqrt{\tfrac{n}{h_n}}(\hat\beta_n - \beta_0) &= \sqrt{\tfrac{n}{h_n}}(X_n'X_n)^{-1}X_n'V_n - \sqrt{\tfrac{n}{h_n}}(\hat\lambda_n - \lambda_0)\,(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_p\Big(\tfrac{1}{\sqrt{h_n}}\Big)\\ &\xrightarrow{D} N\Big(0,\ \Sigma_{\lambda\lambda}^{-1}\cdot\lim_{n\to\infty}(X_n'X_n)^{-1}(X_n'G_nX_n\beta_0)(X_n'G_nX_n\beta_0)'(X_n'X_n)^{-1}\Big),\end{aligned} \tag{3.16}$$
and $\sqrt{n}(\hat\sigma_n^2 - \sigma_0^2) = \frac{1}{\sqrt{n}}\sum_{i=1}^n(v_i^2 - \sigma_0^2) + o_P(1) \xrightarrow{D} N(0, \mu_4 - \sigma_0^4)$. However, when $\beta_0 = 0$, $\sqrt{n}\hat\beta_n \xrightarrow{D} N\big(0, \sigma_0^2\lim_{n\to\infty}(\frac{X_n'X_n}{n})^{-1}\big)$.

The asymptotic distribution of the QMLE $\hat\beta_n$ in this situation is determined by the asymptotic distribution of $\hat\lambda_n$, as the latter forms the leading term in an asymptotic expansion which converges at the slow $\sqrt{n/h_n}$-rate. However, when $\beta_0 = 0$, this leading term vanishes and $\hat\beta_n$ can converge at the usual $\sqrt{n}$-rate. The asymptotic distribution of $\hat\sigma_n^2$ has the usual $\sqrt{n}$-rate of convergence, as expected.

3.4 The Spatial Autoregressive Model with Irrelevant Regressors or Only an Intercept

As $G_nX_n\beta_0$ and $X_n$ are multicollinear in the case under consideration, there exists a column vector $c_n$ such that $G_nX_n\beta_0 = X_nc_n$, and the asymptotic distribution of $\hat\beta_n$ in (3.16) can be rewritten as
$$\sqrt{\frac{n}{h_n}}(\hat\beta_n - \beta_0) \xrightarrow{D} N\Big(0,\ \Sigma_{\lambda\lambda}^{-1}\cdot\lim_{n\to\infty}c_nc_n'\Big). \tag{3.16$'$}$$
If some components of $c_n$ are zero, the corresponding diagonal entries of the limiting variance matrix will be zero and the corresponding distributions of the estimates in $\hat\beta_n$ will be degenerate. In that case, those component estimates may converge at a rate faster than $\sqrt{n/h_n}$, while the estimates of the remaining components will


converge at the $\sqrt{n/h_n}$-rate. A concrete example is a mixed regressive model where all the included spatially varying regressors are irrelevant, i.e., the unknown coefficients of the spatially varying regressors in $X_n$ are zero. Let $X_n = (l_n, X_{2n})$ and $\beta = (\beta_1, \beta_2')'$ be the conformable partition. The model with all spatial regressors being irrelevant has the true parameter (sub)vector $\beta_{02}$ of $\beta_2$ being zero. A spatial autoregressive model with only an intercept but no other spatial regressors corresponds to the same model but with $\beta_{02} = 0$ known and imposed in estimation.

In the irrelevant regressors case, one estimates both of the unknown parameters $\beta_1$ and $\beta_2$. Because $X_n\beta_0 = \beta_{01}l_n$, $G_nX_n\beta_0 = \beta_{01}G_nl_n$. If $G_nl_n$ is not linearly dependent on $l_n$, then $l_n$ and $G_nl_n\beta_{01}$ can be distinguished as regressors in (2.2)$'$. In that case, the results in Proposition 2.2 and Theorems 2.2 and 2.4 are applicable, and the QMLE $\hat\theta_n$ of all the parameters in the model can be $\sqrt{n}$-consistent. In the event that $G_nl_n$ and $l_n$ are multicollinear but $h_n$ is a bounded sequence, Proposition 2.2 and Theorems 2.3 and 2.4 are applicable and $\hat\theta_n$ is still $\sqrt{n}$-consistent.

The irregular case occurs when $G_nl_n$ and $l_n$ are multicollinear and $\lim_{n\to\infty}h_n = \infty$. This irregular case is the concern of this section. If $\beta_{01}$ were zero, it would correspond to $\beta_0 = 0$, which is covered by the last part of Theorem 3.3. Thus, for the subsequent analysis, it is implicitly assumed that $\beta_{01} \neq 0$. More generally, consider $c_n = (c_{1n}', c_{2n}')'$ where $c_{1n} \neq 0$ but $c_{2n} = 0$. Conformably, let $X_n = (X_{1n}, X_{2n})$ and $\beta = (\beta_1', \beta_2')'$.

For the model with all spatial regressors being irrelevant and the weights matrix being row-normalized, $W_{n,n}l_n = l_n$ and $G_nl_n = \frac{l_n}{1-\lambda_0}$, so $G_nX_n\beta_0 = \frac{\beta_{01}}{1-\lambda_0}l_n$ and, hence, $c_{1n} = \frac{\beta_{01}}{1-\lambda_0}$ and $c_{2n} = 0$. It follows from (3.16) that
$$\begin{aligned}\hat\beta_n - \beta_0 &= (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n - \lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{\sqrt{h_n}}{n}\Big)\\ &= (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n - \lambda_0)c_n + O_P\Big(\frac{\sqrt{h_n}}{n}\Big).\end{aligned} \tag{3.17}$$

As $c_{1n} \neq 0$ but $c_{2n} = 0$, (3.17) suggests that $\hat\beta_{n1}$ may be affected by the limiting distribution of $\hat\lambda_n$ but $\hat\beta_{n2}$ will not. When $\lim_{n\to\infty}h_n = \infty$, it will be shown that $\hat\lambda_n$ and $\hat\beta_{n1}$ have a different rate of convergence from that of $\hat\beta_{n2}$. Let $M_{1n} = I_n - X_{1n}(X_{1n}'X_{1n})^{-1}X_{1n}'$ and $M_{2n} = I_n - X_{2n}(X_{2n}'X_{2n})^{-1}X_{2n}'$. Using a matrix partition of $(X_n'X_n)^{-1}$, $\hat\beta_{n1} - \beta_{01} = (X_{1n}'M_{2n}X_{1n})^{-1}X_{1n}'M_{2n}V_n - c_{1n}(\hat\lambda_n - \lambda_0) + O_p(\frac{\sqrt{h_n}}{n})$, and $\hat\beta_{n2} - \beta_{02} = (X_{2n}'M_{1n}X_{2n})^{-1}X_{2n}'M_{1n}V_n + O_p(\frac{\sqrt{h_n}}{n})$. It follows that
$$\begin{aligned}\sqrt{\tfrac{n}{h_n}}(\hat\beta_{n1} - \beta_{01}) &= \frac{1}{\sqrt{h_n}}\Big(\frac{1}{n}X_{1n}'M_{2n}X_{1n}\Big)^{-1}\frac{1}{\sqrt{n}}X_{1n}'M_{2n}V_n - c_{1n}\cdot\sqrt{\tfrac{n}{h_n}}(\hat\lambda_n - \lambda_0) + O_p\Big(\frac{1}{\sqrt{n}}\Big)\\ &= -c_{1n}\cdot\sqrt{\tfrac{n}{h_n}}(\hat\lambda_n - \lambda_0) + O_P\Big(\frac{1}{\sqrt{h_n}}\Big),\end{aligned} \tag{3.18}$$


and
$$\sqrt{n}(\hat\beta_{n2} - \beta_{02}) = \Big(\frac{1}{n}X_{2n}'M_{1n}X_{2n}\Big)^{-1}\cdot\frac{1}{\sqrt{n}}X_{2n}'M_{1n}V_n + O_P\Big(\sqrt{\frac{h_n}{n}}\Big). \tag{3.19}$$
This indicates that $\hat\beta_{n2}$ has the usual $\sqrt{n}$-rate of convergence regardless of whether $h_n$ is divergent or not. As $\hat\lambda_n$ has only a $\sqrt{n/h_n}$-rate of convergence, $\hat\beta_{n2}$ has a faster rate of convergence than $\hat\lambda_n$ for the case $\lim_{n\to\infty}h_n = \infty$. These results are summarized in the following theorem.

Theorem 3.4. Under the assumed regularity conditions for the model (2.1) with $G_nX_n\beta_0 = X_{1n}c_{1n}$, where $X_n = (X_{1n}, X_{2n})$, when $\lim_{n\to\infty}h_n = \infty$, $\sqrt{\frac{n}{h_n}}(\hat\beta_{n1} - \beta_{01}) \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}^{-1}c_1c_1')$, where $c_1 = \lim_{n\to\infty}c_{1n}$, but $\sqrt{n}(\hat\beta_{n2} - \beta_{02}) \xrightarrow{D} N\big(0, \sigma_0^2(\lim_{n\to\infty}\frac{1}{n}X_{2n}'M_{1n}X_{2n})^{-1}\big)$.

It follows that, for the regressive model where the included spatially varying regressors are all irrelevant and the intercept $\beta_{01} \neq 0$, when $\lim_{n\to\infty}h_n = \infty$, $\hat\beta_{n1}$ has the same low rate of convergence as $\hat\lambda_n$ and its limiting distribution is determined by that of $\hat\lambda_n$. But $\hat\beta_{n2}$ will converge in probability to its true value, zero, at the usual $\sqrt{n}$-rate. These results are for the regressive model (2.1) where $\beta_{02} = 0$ but the constraint $\beta_{02} = 0$ is not imposed in the estimation. One may also consider the situation where the constraint is correctly imposed in the estimation. With the constraint $\beta_{02} = 0$ imposed, the model for estimation is a spatial autoregressive model with an unknown intercept:

$$Y_n = \beta_1 l_n + \lambda W_{n,n}Y_n + V_n. \tag{3.20}$$
The unknown parameters for estimation are $\beta_1$, $\lambda$ and $\sigma^2$. Given a $\lambda$, the QMLE of $\beta_1$ is
$$\hat\beta_{n1}(\lambda) = \frac{1}{n}l_n'S_n(\lambda)Y_n, \tag{3.21}$$
and the estimate of $\sigma^2$ is $\hat\sigma_n^2(\lambda) = \frac{1}{n}Y_n'S_n'(\lambda)M_{1n}S_n(\lambda)Y_n$, where $M_{1n} = I_n - \frac{l_nl_n'}{n}$. The concentrated log likelihood function of $\lambda$ is the one in (2.11) with $M_n$ in (2.10) replaced by $M_{1n}$. Because $M_{1n}$ can be regarded as a special case of $M_n$ (with $X_n = l_n$), the results in Theorems 3.1 and 3.2 are applicable to the restricted estimate $\hat\lambda_n$. The same holds for the estimate $\hat\sigma_n^2$ with the results in Theorem 3.3. From (3.17) with $X_n = l_n$, $\hat\beta_{n1} - \beta_{01} = \frac{1}{n}l_n'V_n - c_{1n}(\hat\lambda_n - \lambda_0) + O_P(\frac{\sqrt{h_n}}{n})$. Therefore,
$$\begin{aligned}\sqrt{\tfrac{n}{h_n}}(\hat\beta_{n1} - \beta_{01}) &= \frac{1}{\sqrt{h_n}}\cdot\frac{l_n'V_n}{\sqrt{n}} - c_{1n}\sqrt{\tfrac{n}{h_n}}(\hat\lambda_n - \lambda_0) + O_P\Big(\frac{1}{\sqrt{n}}\Big)\\ &= -c_{1n}\sqrt{\tfrac{n}{h_n}}(\hat\lambda_n - \lambda_0) + O_P\Big(\frac{1}{\sqrt{h_n}}\Big),\end{aligned} \tag{3.22}$$
which is the same as (3.18). Thus, we can conclude that the results in Theorems 3.2, 3.3 and 3.4 also hold for the restricted parameter estimates $\hat\lambda_n$, $\hat\beta_{n1}$ and $\hat\sigma_n^2$.


4. Inconsistency When hn = O(n)

The preceding results are derived for cases with $\lim_{n\to\infty}\frac{h_n}{n} = 0$; that is, if $h_n$ is a divergent sequence, it diverges to infinity at a rate slower than $n$. In this section, we provide an example which shows that the QMLE $\hat\theta_n$ cannot be consistent if $h_n$ has the order of $n$.

Consider the weights matrix $W_{n,n} = \frac{1}{n-1}(l_nl_n' - I_n)$, which would be the weights matrix in Case (1991, 1992) when sample data are collected from only a single district. In this case, $h_n = n - 1$ is of order $O(n)$. It can be shown that the QMLE $\hat\lambda_n$ cannot be a consistent estimator of $\lambda_0$, and there are consequences for estimating the regression coefficients. For simplicity, we consider the case that $\sigma_0^2$ is known and $\sigma_0^2 = 1$. With this $W_{n,n}$, $S_n^{-1}(\lambda) = (1 + \frac{\lambda}{n-1})^{-1}\big(I_n + \frac{\lambda}{(1-\lambda)}\frac{l_nl_n'}{(n-1)}\big)$ and $W_{n,n}S_n^{-1}(\lambda) = \frac{1}{(n-1+\lambda)}\big(\frac{l_nl_n'}{1-\lambda} - I_n\big)$. It follows that $G_nX_n\beta_0 = \frac{n}{(n-1+\lambda_0)}\big(\frac{l_n'X_n\beta_0}{(1-\lambda_0)n}l_n - \frac{X_n\beta_0}{n}\big)$, which is multicollinear with $X_n$.
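The closed forms above are easy to verify numerically (a sketch; $n$ and $\lambda$ are arbitrary):

```python
import numpy as np

n, lam = 60, 0.3
l = np.ones((n, 1))
W = (l @ l.T - np.eye(n)) / (n - 1)      # single-district Case weights matrix
S = np.eye(n) - lam * W

# Closed forms from the text.
S_inv = (1 + lam / (n - 1)) ** (-1) * (np.eye(n) + lam / ((1 - lam) * (n - 1)) * (l @ l.T))
WS_inv = ((l @ l.T) / (1 - lam) - np.eye(n)) / (n - 1 + lam)
trace_G = n * lam / ((n - 1 + lam) * (1 - lam))   # tr(W S^{-1}(lam))
```

Each closed form matches the direct matrix computation.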

As $\sigma_0^2 = 1$ is known, the unknown parameter vector is $\delta = (\beta', \lambda)'$ and the corresponding log likelihood function is $\ln L_n(\delta) = -\frac{n}{2}\ln(2\pi) + \ln|S_n(\lambda)| - \frac{V_n'(\delta)V_n(\delta)}{2}$. Given $\lambda$, the QMLE of $\beta_0$ is $\hat\beta_n(\lambda) = (X_n'X_n)^{-1}X_n'S_n(\lambda)Y_n$ and the concentrated log likelihood function of $\lambda$ is
$$\ln L_n(\lambda) = -\frac{n}{2}\ln(2\pi) + \ln|S_n(\lambda)| - \frac{1}{2}Y_n'S_n'(\lambda)M_nS_n(\lambda)Y_n. \tag{4.1}$$

Because $M_nG_nX_n\beta_0 = 0$, the first order derivative of (4.1) is
$$\begin{aligned}\frac{\partial\ln L_n(\lambda)}{\partial\lambda} &= -\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda)) + Y_n'S_n'(\lambda)M_nW_{n,n}Y_n\\ &= -\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda)) + V_n'M_nG_nV_n + V_n'G_n'M_nG_nV_n(\lambda_0 - \lambda),\end{aligned} \tag{4.2}$$
and the second order derivative is
$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2} = -\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^2] - V_n'G_n'M_nG_nV_n. \tag{4.3}$$

At $\lambda_0$, because $\mathrm{tr}(G_n) = \frac{n}{(n-1+\lambda_0)}\big(\frac{\lambda_0}{1-\lambda_0}\big)$ and $M_nG_n = -\frac{M_n}{(n-1+\lambda_0)}$,
$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = -\mathrm{tr}(G_n) + V_n'M_nG_nV_n = -\frac{n}{(n-1+\lambda_0)}\Big(\frac{\lambda_0}{1-\lambda_0}\Big) - \frac{1}{(n-1+\lambda_0)}V_n'M_nV_n. \tag{4.4}$$
As $\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^2] = \big(\frac{n}{n-1+\lambda}\big)^2\big[\frac{1-2(1-\lambda)/n}{(1-\lambda)^2} + \frac{1}{n}\big]$ and $G_n'M_nG_n = \frac{M_n}{(n-1+\lambda_0)^2}$, one has
$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2} = -\Big(\frac{n}{n-1+\lambda}\Big)^2\Big[\frac{1-2(1-\lambda)/n}{(1-\lambda)^2} + \frac{1}{n}\Big] - \frac{V_n'M_nV_n}{(n-1+\lambda_0)^2}. \tag{4.5}$$

By the mean value theorem, $\hat\lambda_n = \lambda_0 - \big(\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\big)^{-1}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}$, where $\bar\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$. Suppose $\hat\lambda_n$ were consistent; we shall show that there would be a contradiction. If $\hat\lambda_n$ were consistent, it would


imply that $\bar\lambda_n \xrightarrow{p} \lambda_0$ and, hence, $\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2} \xrightarrow{p} -\frac{1}{(1-\lambda_0)^2}$. As $\frac{1}{n}V_n'M_nV_n = \frac{1}{n}V_n'V_n + o_P(1) \xrightarrow{p} 1$, (4.4) gives $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{p} -\frac{\lambda_0}{1-\lambda_0} - 1 = -\frac{1}{1-\lambda_0}$. Consequently, $\hat\lambda_n \xrightarrow{p} \lambda_0 - (1-\lambda_0)$, which does not equal $\lambda_0$ in general, a contradiction.$^7$ So $\hat\lambda_n$ could not be a consistent estimator.

Let $\hat\beta_{n,L} = (X_n'X_n)^{-1}X_n'Y_n$ be the OLS estimator of $\beta$. As $(X_n'X_n)^{-1}X_n'W_{n,n} = \frac{1}{n-1}\big(e_{k,1}l_n' - (X_n'X_n)^{-1}X_n'\big)$, where $e_{k,1}$ is the first unit vector of the $k$-dimensional Euclidean space,
$$\begin{aligned}\hat\beta_n &= (X_n'X_n)^{-1}X_n'S_n(\hat\lambda_n)Y_n\\ &= (X_n'X_n)^{-1}X_n'S_nY_n - (\hat\lambda_n - \lambda_0)(X_n'X_n)^{-1}X_n'W_{n,n}Y_n\\ &= \beta_0 + (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n - \lambda_0)\Big[\frac{l_n'Y_n}{(n-1)}e_{k,1} - \frac{\hat\beta_{n,L}}{(n-1)}\Big].\end{aligned} \tag{4.6}$$

Because $l_n'S_n^{-1} = (1 + \frac{\lambda_0}{n-1})^{-1}\big(1 + \frac{n\lambda_0}{(n-1)(1-\lambda_0)}\big)l_n'$,
$$\frac{l_n'Y_n}{(n-1)} = \Big(1 + \frac{\lambda_0}{n-1}\Big)^{-1}\Big(1 + \frac{n\lambda_0}{(n-1)(1-\lambda_0)}\Big)\Big[\frac{l_n'X_n\beta_0}{n-1} + \frac{l_n'V_n}{n-1}\Big] \xrightarrow{p} \frac{1}{(1-\lambda_0)}\mu_{xb},$$
where $\mu_{xb} = \lim_{n\to\infty}l_n'X_n\beta_0/n$. For the OLS estimator, because $(X_n'X_n)^{-1}X_n'S_n^{-1}X_n\beta_0 = (1 + \frac{\lambda_0}{n-1})^{-1}\big[\beta_0 + \big(\frac{\lambda_0}{1-\lambda_0}\big)\frac{l_n'X_n\beta_0}{(n-1)}e_{k,1}\big]$,
$$\hat\beta_{n,L} = (X_n'X_n)^{-1}X_n'(S_n^{-1}X_n\beta_0 + S_n^{-1}V_n) = \Big(1 + \frac{\lambda_0}{n-1}\Big)^{-1}\Big[\beta_0 + \Big(\frac{\lambda_0}{1-\lambda_0}\Big)\frac{l_n'X_n\beta_0}{(n-1)}e_{k,1}\Big] + o_P(1) \xrightarrow{p} \beta_0 + \Big(\frac{\lambda_0}{1-\lambda_0}\Big)\mu_{xb}e_{k,1}. \tag{4.7}$$

Hence,
$$\hat\beta_n = \beta_0 - (\hat\lambda_n - \lambda_0)\Big[\frac{\mu_{xb}}{1-\lambda_0}e_{k,1} + O_p\Big(\frac{1}{n}\Big)\Big] + O_P\Big(\frac{1}{\sqrt{n}}\Big). \tag{4.8}$$
So, as $\hat\lambda_n - \lambda_0$ does not go to zero in probability, $\hat\beta_n$ will not converge to $\beta_0$; that is, $\hat\beta_n$ is not consistent. If $\hat\lambda_n - \lambda_0$ were stochastically bounded, it could be seen from (4.8) that the inconsistency of $\hat\beta_n$ would fall mainly on the estimate of the intercept term $\beta_{01}$, while the estimates $\hat\beta_{2n}$ of the coefficients $\beta_{02}$ could be consistent. Indeed, this is so for the OLS estimator $\hat\beta_{n,L}$ in (4.7).

However, $\hat\lambda_n$ might not be stochastically bounded, and even the QMLE $\hat\beta_{2n}$ could be problematic. This may be seen as follows. $\hat\lambda_n$ is supposed to solve the likelihood equation $\frac{\partial\ln L_n(\lambda)}{\partial\lambda} = 0$ from (4.2). As $M_nW_{n,n} = -\frac{M_n}{n-1}$, $M_nS_n^{-1} = (1 + \frac{\lambda_0}{n-1})^{-1}M_n$ and $M_nS_n^{-1}X_n = 0$, one has $Y_n'M_nW_{n,n}Y_n = -\frac{1}{n-1}Y_n'M_nY_n =$

$^7$ For the pure spatial autoregressive model, a similar argument leads to the convergence of $\hat\lambda_n$ to a nondegenerate random variable instead of a constant. This is so because of the presence of $M_n$ in (4.4) and (4.5) in the mixed regressive model.


$-(1 + \frac{\lambda_0}{n-1})^{-2}\frac{V_n'M_nV_n}{n-1} \xrightarrow{p} -1$ and $Y_n'W_{n,n}'M_nW_{n,n}Y_n = \frac{1}{(n-1+\lambda_0)^2}V_n'M_nV_n = o_p(1)$. It follows from (4.2) that
$$\frac{\partial\ln L_n(\lambda)}{\partial\lambda} = -\Big(\frac{\lambda}{1-\lambda}\Big)\frac{n}{(n-1+\lambda)} + Y_n'M_nW_{n,n}Y_n - \lambda Y_n'W_{n,n}'M_nW_{n,n}Y_n \xrightarrow{p} D_\infty(\lambda),$$
where $D_\infty(\lambda) = -\frac{\lambda}{1-\lambda} - 1 = -\frac{1}{(1-\lambda)}$, uniformly in $\lambda$ on any bounded set bounded away from 1. For $\lambda < 1$, $D_\infty(\lambda) < 0$ and, for $\lambda > 1$, $D_\infty(\lambda) > 0$. $D_\infty(\lambda)$ is a strictly decreasing function on each of the regions $\lambda < 1$ and $\lambda > 1$. $D_\infty(\lambda) \neq 0$ for any finite $\lambda$, but $\lim_{\lambda\to\pm\infty}D_\infty(\lambda) = 0$. Thus, either $\hat\lambda_n$ does not exist or $\hat\lambda_n$ may diverge to infinity for large $n$.
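The absence of a root can be illustrated numerically: on simulated data from this single-district design, the concentrated score stays close to $D_\infty(\lambda) = -1/(1-\lambda)$, which is bounded away from zero on compact sets of $\lambda$ away from 1 (a sketch; the sample size, regressors, and parameter values are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam0 = 400, 0.4
l = np.ones((n, 1))
W = (l @ l.T - np.eye(n)) / (n - 1)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # intercept + one regressor
Y = np.linalg.solve(np.eye(n) - lam0 * W,
                    X @ np.array([1.0, 0.5]) + rng.standard_normal(n))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

def score(lam):
    # first-order derivative (4.2) of the concentrated log likelihood
    Slam = np.eye(n) - lam * W
    return -np.trace(W @ np.linalg.inv(Slam)) + Y @ Slam.T @ M @ W @ Y

grid = np.linspace(-0.9, 0.9, 7)
vals = np.array([score(x) for x in grid])
limit = -1 / (1 - grid)    # D_infinity(lambda): never zero at any finite lambda
```

On the grid, the score tracks the strictly negative limit function, so the likelihood equation has no root in this range.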

The above example has a sample obtained from a single district. Increasing $n$ means increasing the number of spatial units in the same district, which corresponds to the notion of infill asymptotics (Cressie 1993, p. 101). The example shows that the QMLE under infill asymptotics alone may not be consistent. If there are many separate districts from which samples are obtained, the QMLEs can be consistent if the number of districts, i.e., $n/h_n$, increases to infinity. The latter scenario corresponds to the notion of increasing-domain asymptotics (Cressie 1993, p. 100). Consistency of the QMLE can be achieved with increasing-domain asymptotics. From our results, the rate of convergence of the QMLE under increasing-domain asymptotics alone can be the usual $\sqrt{n}$-rate. But when both infill and increasing-domain asymptotics are operating, the rates of convergence of the QMLEs for the various parameters can be different, and some of them may have a slower rate than the usual one. These results were first observed for the pure spatial autoregressive process in Lee (2001). Here, we have demonstrated that even though the presence of relevant spatially varying regressors plays a distinctive role in the identification and consistent estimation of the mixed regressive model, the requirement of increasing-domain asymptotics is still needed regardless of the regressors.

5. Conclusions

In this paper, we have investigated the asymptotic distributions of the QMLE of a (normal) likelihood function for mixed regressive, spatial autoregressive models. Our analysis reveals that the asymptotic properties of the QMLE depend on some important features of the spatial weights matrix. We have considered the distinctive cases in which each spatial unit can be influenced by only a few neighboring units, and cases in which each unit can be influenced by many spatial units whose individual influences are uniformly small.

The mixed regressive model differs from a pure spatial autoregressive process in the presence of regressors. When relevant spatial regressors are present in the model, the regressors play the dominant role in the identification of the spatial interaction and regression coefficients. In models without spatially varying


regressors but only a constant intercept term, a spatial weights matrix (without row normalization) might interact with the constant term to generate a spatially varying regressor for identification. However, if the spatial weights matrix is row-normalized in the formulation of the spatial model, it will not interact with the constant term to generate a spatially varying regressor. In the presence of relevant spatially varying regressors (generated or not), the QMLEs of all the parameters in the mixed regressive model can be $\sqrt{n}$-consistent and asymptotically normal, regardless of whether a spatial unit is influenced by a few or by many neighbors.

If the spatial regressors are irrelevant, or cannot be generated from the interaction of the spatial weights matrix with the included constant term, the spatial interaction parameter can only be identified through the correlation of spatial units. The QMLE can still be $\sqrt{n}$-consistent in the scenario where each spatial unit is influenced by only a few neighboring units. For the case that each spatial unit is influenced by many neighboring units, the asymptotic distributions of the QMLEs of the various parameters in the model may have different rates of convergence. The QMLEs of the regression coefficients of the irrelevant spatially varying regressors may still converge at the $\sqrt{n}$-rate to their true parameter values, namely zero, but the QMLEs of the spatial interaction parameter and the constant intercept term will converge at a slower rate.$^8$

In the spatial scenario introduced by Case (1991, 1992), the requirement that $\frac{h_n}{n}$ goes to zero corresponds to the notion of increasing-domain asymptotics. Our analysis illustrates that the consistency and asymptotic distributions of the parameter estimates are subject to the requirement of increasing-domain asymptotics, regardless of the presence or absence of relevant spatial regressors.

$^8$ The latter slow rate of convergence occurs also for the QMLE of the spatial interaction parameter in the pure spatial autoregressive process (Lee, 2001).


Appendix I: Some Useful Lemmas

In this Appendix, the assumption that the elements $w_{n,ij}$ of the weights matrix $W_{n,n}$ are of order $O(\frac{1}{h_n})$ uniformly in $i$, $j$ will be maintained. In general, $W_{n,n}$ may not be row-normalized. The following lemmas are mostly relevant for the regressive model. The first three lemmas have been established in Lee (2001); they are listed here for frequent reference. Additional lemmas needed for our subsequent proofs can be found in Lee (2001).

Lemma 1. Suppose that the column sums of $A_n$ are uniformly bounded.
1) $w_{i,n}A_ne_{nj} = O(\frac{1}{h_n})$ and $w_{n,i}'A_ne_{nj} = O(\frac{1}{h_n})$ uniformly in $i$ and $j$, where $e_{nj}$ is the $j$th unit column vector of the $n$-dimensional Euclidean space, and $w_{i,n}$ and $w_{n,i}$ are, respectively, the $i$th row and column of $W_{n,n}$.
2) $w_{i,n}A_nb_{j,n}' = O(\frac{1}{h_n})$ and $w_{n,i}'A_nb_{j,n}' = O(\frac{1}{h_n})$ uniformly in $i$ and $j$, where $b_{j,n}$ is the $j$th row of $B_n$, when $B_n$ is uniformly bounded in row sums.
Proof: This is Lemma A.7 in Lee (2001). Q.E.D.

Lemma 2. Suppose that $W_{n,n}$ and $S_n^{-1}$ are uniformly bounded in row sums, and $A_n$ is uniformly bounded in column sums. Then the matrix $H_n = G_n'A_n$ has the properties that $e_{ni}'H_n^me_{nj}(= e_{nj}'H_n'^me_{ni}) = O(\frac{1}{h_n})$ uniformly in $i$ and $j$, and $\mathrm{tr}(H_n^m) = \mathrm{tr}(H_n'^m) = O(\frac{n}{h_n})$ for any integer $m \geq 1$.
Furthermore, if $W_{n,n}$, $S_n^{-1}$ and $A_n$ are uniformly bounded in both row and column sums, then $e_{ni}'(H_nH_n')^me_{nj} = O(\frac{1}{h_n})$ uniformly in $i$ and $j$, and $\mathrm{tr}[(H_nH_n')^m] = \mathrm{tr}[(H_n'H_n)^m] = O(\frac{n}{h_n})$. In addition, $e_{ni}'(G_n'G_n)^me_{nj} = O(\frac{1}{h_n})$.
Proof: This is Lemma A.8 in Lee (2001). Q.E.D.

Lemma 3. Assume that $W_{n,n}$, $S_n^{-1}$ and $A_n$ are uniformly bounded in both row and column sums and that $v_1, \ldots, v_n$ are i.i.d. with zero mean and finite variance $\sigma^2$. Let $H_n = G_n'A_n$. Then $E(V_n'H_nV_n) = O(\frac{n}{h_n})$, $\mathrm{var}(V_n'H_nV_n) = O(\frac{n}{h_n})$, and $V_n'H_nV_n = O_P(\frac{n}{h_n})$. Furthermore, if $\lim_{n\to\infty}\frac{h_n}{n} = 0$, then $\frac{h_n}{n}V_n'H_nV_n - \frac{h_n}{n}E(V_n'H_nV_n) = o_P(1)$.
Proof: This is a part of Lemma A.12 in Lee (2001). Q.E.D.

Lemma 4. Suppose that the elements of the $n\times k$ matrices $X_n$ are uniformly bounded for all $n$, and that $\lim_{n\to\infty}\frac{1}{n}X_n'X_n$ exists and the limiting matrix is nonsingular. Then the projectors $M_n$ and $(I_n - M_n)$, where $M_n = I_n - X_n(X_n'X_n)^{-1}X_n'$, are uniformly bounded in both row and column sums.

Proof: Let $B_n = (\frac{1}{n}X_n'X_n)^{-1}$. From the assumption of the lemma, $B_n$ converges to a finite limit.


Therefore, there exists a constant $c_b$ such that $|b_{n,ij}| \leq c_b$ for all $n$, where $b_{n,ij}$ is the $(i,j)$th element of $B_n$. By the uniform boundedness of $X_n$, there exists a constant $c_x$ such that $|x_{n,ij}| \leq c_x$ for all $n$. Let $A_n = \frac{1}{n}X_nB_nX_n' = X_n(X_n'X_n)^{-1}X_n'$, which can be rewritten as $A_n = \frac{1}{n}\sum_{s=1}^k\sum_{r=1}^k b_{n,rs}x_{n,r}x_{n,s}'$, where $x_{n,r}$ is the $r$th column of $X_n$. It follows that $\sum_{j=1}^n|a_{n,ij}| = \sum_{j=1}^n|\frac{1}{n}\sum_{s=1}^k\sum_{r=1}^k b_{n,rs}x_{n,ir}x_{n,js}| \leq k^2c_bc_x^2$ for all $i = 1, \cdots, n$. Similarly, $\sum_{i=1}^n|a_{n,ij}| = \sum_{i=1}^n|\frac{1}{n}\sum_{s=1}^k\sum_{r=1}^k b_{n,rs}x_{n,ir}x_{n,js}| \leq k^2c_bc_x^2$ for all $j = 1, \cdots, n$. That is, $(I_n - M_n) = X_n(X_n'X_n)^{-1}X_n'$ is uniformly bounded in both row and column sums. Consequently, $M_n$ is also uniformly bounded in both row and column sums. Q.E.D.
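A small numeric illustration of Lemma 4 (a sketch; an intercept plus uniform$(-1,1)$ regressors satisfy the boundedness assumptions): the maximum absolute row sum of $M_n$ stays bounded as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)

def max_abs_row_sum_M(n, k=3):
    # Bounded regressors: an intercept and uniform(-1, 1) columns.
    X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, k - 1))])
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    return np.abs(M).sum(axis=1).max()

norms = [max_abs_row_sum_M(n) for n in (50, 200, 800)]   # does not grow with n
```

The row-sum norms stay well below the lemma's theoretical bound $1 + k^2c_bc_x^2$ at every sample size.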

Lemma 5. Suppose that $A_n$ is uniformly bounded in both row and column sums, the elements of the $n\times k$ matrices $X_n$ are uniformly bounded, and $\lim_{n\to\infty}\frac{1}{n}X_n'X_n$ exists and is nonsingular. Then, with $M_n = I_n - X_n(X_n'X_n)^{-1}X_n'$,
(i) $\mathrm{tr}(M_nA_n) = \mathrm{tr}(A_n) + O(1)$,
(ii) $\mathrm{tr}(A_n'M_nA_n) = \mathrm{tr}(A_n'A_n) + O(1)$,
(iii) $\mathrm{tr}[(M_nA_n)^2] = \mathrm{tr}(A_n^2) + O(1)$, and
(iv) $\mathrm{tr}[(A_n'M_nA_n)^2] = \mathrm{tr}[(M_nA_nA_n')^2] = \mathrm{tr}[(A_nA_n')^2] + O(1)$.
Furthermore, if $A_{n,ij} = O(\frac{1}{h_n})$ for all $i$ and $j$, then
(a) $\mathrm{tr}^2(M_nA_n) = \mathrm{tr}^2(A_n) + O(\frac{n}{h_n})$ and
(b) $\sum_{i=1}^n(M_nA_n)_{ii}^2 = \sum_{i=1}^nA_{n,ii}^2 + O(\frac{1}{h_n})$.

Proof: The assumptions imply that the elements of the $k\times k$ matrix $(\frac{1}{n}X_n'X_n)^{-1}$ are uniformly bounded for large enough $n$. Lemma A.5 in Lee (2001) implies that the elements of the $k\times k$ matrices $\frac{1}{n}X_n'A_nX_n$, $\frac{1}{n}X_n'A_nA_n'X_n$ and $\frac{1}{n}X_n'A_n^2X_n$ are also uniformly bounded. It follows that
$$\mathrm{tr}(M_nA_n) = \mathrm{tr}(A_n) - \mathrm{tr}[(X_n'X_n)^{-1}X_n'A_nX_n] = \mathrm{tr}(A_n) + O(1),$$
$$\mathrm{tr}(A_n'M_nA_n) = \mathrm{tr}(A_n'A_n) - \mathrm{tr}[(X_n'X_n)^{-1}X_n'A_nA_n'X_n] = \mathrm{tr}(A_n'A_n) + O(1),$$
and $\mathrm{tr}((M_nA_n)^2) = \mathrm{tr}(A_n^2) - 2\mathrm{tr}[(X_n'X_n)^{-1}X_n'A_n^2X_n] + \mathrm{tr}([(X_n'X_n)^{-1}X_n'A_nX_n]^2) = \mathrm{tr}(A_n^2) + O(1)$. By (iii), $\mathrm{tr}[(A_n'M_nA_n)^2] = \mathrm{tr}[(M_nA_nA_n')^2] = \mathrm{tr}[(A_nA_n')^2] + O(1)$.
When $A_{n,ij} = O(\frac{1}{h_n})$, from (i), $\mathrm{tr}^2(M_nA_n) = (\mathrm{tr}(A_n) + O(1))^2 = \mathrm{tr}^2(A_n) + 2\mathrm{tr}(A_n)\cdot O(1) + O(1) = \mathrm{tr}^2(A_n) + O(\frac{n}{h_n})$. Because $A_n$ is uniformly bounded in column sums and the elements of $X_n$ are uniformly bounded, $X_n'A_ne_{ni} = O(1)$ for all $i$. Hence, $\sum_{i=1}^n(M_nA_n)_{ii}^2 = \sum_{i=1}^n\big(A_{n,ii} - x_{i,n}(X_n'X_n)^{-1}X_n'A_ne_{ni}\big)^2 = \sum_{i=1}^n\big(A_{n,ii} + O(\frac{1}{n})\big)^2 = \sum_{i=1}^n\big[A_{n,ii}^2 + 2A_{n,ii}\cdot O(\frac{1}{n}) + O(\frac{1}{n^2})\big] = \sum_{i=1}^nA_{n,ii}^2 + O(\frac{1}{h_n})$. Q.E.D.
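Lemma 5's trace approximations can likewise be checked numerically (a sketch with an illustrative banded $A_n$, whose row and column sums are uniformly bounded):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 300, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, k - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# A banded A_n with uniformly bounded row and column sums.
A = 0.5 * np.eye(n) + 0.25 * np.eye(n, k=1) + 0.25 * np.eye(n, k=-1)

d1 = np.trace(M @ A) - np.trace(A)              # (i):   O(1)
d2 = np.trace(A.T @ M @ A) - np.trace(A.T @ A)  # (ii):  O(1)
d3 = np.trace(M @ A @ M @ A) - np.trace(A @ A)  # (iii): O(1)
```

All three discrepancies remain of modest size even though the traces themselves grow like $n$.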


Appendix II: Proofs of Theorems

Proof of Proposition 2.1: As $W_{n,n}Y_n = G_nX_n\beta_0 + G_nV_n$,
$$Y_n'W_{n,n}'W_{n,n}Y_n - E(Y_n'W_{n,n}'W_{n,n}Y_n) = 2(G_nX_n\beta_0)'G_nV_n + V_n'G_n'G_nV_n - E(V_n'G_n'G_nV_n),$$
$$Y_n'W_{n,n}'V_n - E(Y_n'W_{n,n}'V_n) = (G_nX_n\beta_0)'V_n + V_n'G_n'V_n - E(V_n'G_n'V_n),$$
and $X_n'W_{n,n}Y_n - E(X_n'W_{n,n}Y_n) = X_n'G_nV_n$. Because
$$\begin{aligned}V_n'(\delta)V_n(\delta) &= (\beta - \beta_0)'X_n'X_n(\beta - \beta_0) + (\lambda - \lambda_0)^2Y_n'W_{n,n}'W_{n,n}Y_n + V_n'V_n\\ &\quad + 2(\lambda - \lambda_0)(\beta - \beta_0)'X_n'W_{n,n}Y_n + 2(\beta_0 - \beta)'X_n'V_n + 2(\lambda_0 - \lambda)Y_n'W_{n,n}'V_n,\end{aligned}$$
one has
$$\begin{aligned}\frac{1}{n}\ln L_n(\theta) - E\Big(\frac{1}{n}\ln L_n(\theta)\Big) = &-\frac{(\lambda-\lambda_0)^2}{2\sigma^2}\Big\{\frac{2}{n}(G_nX_n\beta_0)'G_nV_n + \frac{1}{n}\big[V_n'G_n'G_nV_n - E(V_n'G_n'G_nV_n)\big]\Big\}\\ &-\frac{1}{2\sigma^2}\cdot\frac{1}{n}(V_n'V_n - n\sigma_0^2) - \frac{(\lambda-\lambda_0)(\beta-\beta_0)'}{\sigma^2}\cdot\frac{X_n'G_nV_n}{n} - \frac{(\beta_0-\beta)'}{\sigma^2}\frac{X_n'V_n}{n}\\ &-\frac{(\lambda_0-\lambda)}{\sigma^2}\Big\{\frac{(G_nX_n\beta_0)'V_n}{n} + \frac{1}{n}\big(V_n'G_n'V_n - E(V_n'G_n'V_n)\big)\Big\}.\end{aligned}$$

Lemma A.14 in Lee (2001) implies that $\frac{(G_nX_n\beta_0)'G_nV_n}{n}$, $\frac{(G_nX_n\beta_0)'V_n}{n}$ and $\frac{X_n'G_nV_n}{n}$ are of order $O_P(\frac{1}{\sqrt{n}})$. By the central limit theorem for independent variables, $\frac{1}{n}(V_n'V_n - n\sigma_0^2)$ and $\frac{1}{n}X_n'V_n$ are of order $O_p(\frac{1}{\sqrt{n}})$. Lemma 3 implies that $\frac{1}{n}(V_n'G_n'G_nV_n - E(V_n'G_n'G_nV_n))$ and $\frac{1}{n}(V_n'G_n'V_n - E(V_n'G_n'V_n))$ are of order $o_P(\frac{1}{h_n})$. As $\sigma^2$ is bounded away from zero, and $\beta$ and $\lambda$ are bounded in $\Theta$, it follows that $\sup_{\theta\in\Theta}|\frac{1}{n}\ln L_n(\theta) - E(\frac{1}{n}\ln L_n(\theta))| = o_P(1)$. Q.E.D.

Proof of Proposition 2.2: Let $\alpha = (\alpha_1', \alpha_2, \alpha_3)'$ be a column vector of constants such that $\Sigma_\theta\alpha = 0$. It is sufficient to show that $\alpha = 0$. From the first row block of the linear equation system $\Sigma_\theta\alpha = 0$, one has $\lim_{n\to\infty}\frac{X_n'X_n}{n}\alpha_1 + \lim_{n\to\infty}\frac{X_n'G_nX_n\beta_0}{n}\alpha_2 = 0$ and, therefore, $\alpha_1 = -\lim_{n\to\infty}(X_n'X_n)^{-1}X_n'G_nX_n\beta_0\cdot\alpha_2$. From the last equation of the linear system, one has $\alpha_3 = -2\sigma_0^2\lim_{n\to\infty}\frac{\mathrm{tr}(G_n)}{n}\cdot\alpha_2$. By eliminating $\alpha_1$ and $\alpha_3$, the remaining equation becomes
$$\Big\{\lim_{n\to\infty}\frac{1}{n\sigma_0^2}(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) + \lim_{n\to\infty}\frac{1}{n}\Big[\mathrm{tr}(G_n'G_n) + \mathrm{tr}(G_n^2) - 2\frac{\mathrm{tr}^2(G_n)}{n}\Big]\Big\}\alpha_2 = 0.$$
Because $\mathrm{tr}(G_nG_n') + \mathrm{tr}(G_n^2) - \frac{2}{n}\mathrm{tr}^2(G_n) = \frac{1}{2}\mathrm{tr}[(C_n' + C_n)(C_n' + C_n)'] \geq 0$, the assumed conditions imply that $\alpha_2 = 0$ and, hence, $\alpha = 0$. Q.E.D.

Proof of Theorem 2.1: As $E(\frac{1}{n}\ln L_n(\theta)) = -\frac{\ln(2\pi)}{2} - \frac{\ln\sigma^2}{2} + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{2\sigma^2n}E(V_n'(\delta)V_n(\delta))$, where $E(V_n'(\delta)V_n(\delta)) = (S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta)'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta) + \sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})$, its first order derivatives are $\frac{\partial E(\frac{1}{n}\ln L_n(\theta))}{\partial\beta} = \frac{1}{\sigma^2n}X_n'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta)$,
$$\frac{\partial E(\frac{1}{n}\ln L_n(\theta))}{\partial\lambda} = -\frac{1}{n}\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda)) + \frac{1}{\sigma^2n}\big[(G_nX_n\beta_0)'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta) + \sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)G_n)\big],$$


and $\frac{\partial E(\frac{1}{n}\ln L_n(\theta))}{\partial\sigma^2} = -\frac{1}{2\sigma^2} + \frac{E(V_n'(\delta)V_n(\delta))}{2\sigma^4n}$. At $\theta_0$, $\frac{\partial E(\frac{1}{n}\ln L_n(\theta_0))}{\partial\theta} = 0$. The second order derivatives are
$$\frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\beta\partial\beta'} = -\frac{X_n'X_n}{\sigma^2n},\qquad \frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\beta\partial\lambda} = -\frac{1}{\sigma^2n}X_n'G_nX_n\beta_0,$$
$$\frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\beta\partial\sigma^2} = -\frac{1}{\sigma^4n}X_n'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta),$$
$$\frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\lambda^2} = -\frac{1}{n}\mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2) - \frac{1}{\sigma^2n}\big[(G_nX_n\beta_0)'(G_nX_n\beta_0) + \sigma_0^2\mathrm{tr}(G_n'G_n)\big],$$
$$\frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\sigma^2\partial\lambda} = -\frac{1}{n\sigma^4}\big[(G_nX_n\beta_0)'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta) + \sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)G_n)\big],$$
and $\frac{\partial^2E(\frac{1}{n}\ln L_n(\theta))}{\partial\sigma^2\partial\sigma^2} = \frac{1}{2\sigma^4} - \frac{1}{\sigma^6n}E(V_n'(\delta)V_n(\delta))$.

By a Taylor expansion of $E(\frac{1}{n}\ln L_n(\theta))$ at $\theta_0$ and rearrangement,
$$\begin{aligned}E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big) = &-\frac{1}{2}\Big[\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}\cdot\Sigma_\theta\cdot\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|}\Big]\|\theta-\theta_0\|^2\\ &+\frac{1}{2}(\theta-\theta_0)'\Big[\frac{\partial^2}{\partial\theta\partial\theta'}E\Big(\frac{1}{n}\ln L_n(\bar\theta_n)\Big) + \Sigma_\theta\Big](\theta-\theta_0),\end{aligned}$$
where $\bar\theta_n$ lies between $\theta$ and $\theta_0$. Because $\Sigma_\theta$ is positive definite, there exists a constant $c > 0$ such that $\sup_\theta\frac{1}{2}\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}\cdot(-\Sigma_\theta)\cdot\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|} < -c$. As $\frac{1}{n}\mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2) - \frac{1}{n}\mathrm{tr}(G_n^2) = \frac{2}{n}\mathrm{tr}[(W_{n,n}S_n^{-1}(\bar\lambda_n))^3](\lambda - \lambda_0)$, where $\bar\lambda_n$ lies between $\lambda$ and $\lambda_0$, and $\frac{1}{n}\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^3] = O(\frac{1}{h_n})$ uniformly in $\lambda\in\Lambda_1$ by Lemma 2, it follows that $\lim_{n\to\infty}\big[\frac{\partial^2}{\partial\theta\partial\theta'}E(\frac{1}{n}\ln L_n(\bar\theta_n)) - \frac{\partial^2}{\partial\theta\partial\theta'}E(\frac{1}{n}\ln L_n(\theta_0))\big] = 0$ whenever $\lim_{n\to\infty}\bar\theta_n = \theta_0$. By Proposition 2.2, $\lim_{n\to\infty}\frac{1}{n}E\big(\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\big) = -\Sigma_\theta$, so there exists a neighborhood $\Theta_1$ of $\theta_0$ (contained in $\Lambda_1$) such that $\sup_{\theta\in\Theta_1}\|\frac{\partial^2}{\partial\theta\partial\theta'}E(\frac{1}{n}\ln L_n(\theta)) + \Sigma_\theta\| \leq c/2$ for large enough $n$. Hence,
$$E\Big(\frac{1}{n}\ln L_n(\theta)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big) \leq \frac{1}{2}\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}(-\Sigma_\theta)\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|}\|\theta-\theta_0\|^2 + \frac{c}{4}\|\theta-\theta_0\|^2 < -\frac{c}{4}\|\theta-\theta_0\|^2;$$
that is, the identification uniqueness property holds on $\Theta_1$. Finally, the consistency of $\hat\theta_n$ follows from the identification uniqueness and the uniform convergence $\frac{1}{n}[\ln L_n(\theta) - E(\ln L_n(\theta))] \xrightarrow{p} 0$ in $\theta$ from Proposition 2.1 (see White 1994). Q.E.D.

Proof of Theorem 2.2: The expected log likelihood of (2.3) is
$$\begin{aligned}E(\ln L_n(\theta)) = &-\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)|\\ &-\frac{1}{2\sigma^2}\Big\{[S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta]'[S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta] + \sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})\Big\}.\end{aligned}$$
As $S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta = X_n(\beta_0 - \beta) + (\lambda_0 - \lambda)G_nX_n\beta_0$, it follows that $\frac{1}{n}E(\ln L_n(\theta)) - \frac{1}{n}E(\ln L_n(\theta_0)) = T_{1n} - \frac{1}{2\sigma^2}T_{2n}$, where
$$T_{1n}(\lambda,\sigma^2) = \frac{1}{2}(\ln\sigma_0^2 - \ln\sigma^2) + \frac{(\sigma^2 - \sigma_0^2)}{2\sigma^2} - \frac{(\sigma_n^2(\lambda) - \sigma_0^2)}{2\sigma^2} + \frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|),$$

25

Page 26: Asymptotic Distributions of Quasi-Maximum Likelihood ......Under this assumption, the equilibrium outcome vector is Y n= S−1(Xnβ0 +Vn). (2.2) In this paper, our main concern is

and T2n(β,λ) = (β0 − β00,λ− λ0)(Xn, GnXnβ0)0(Xn, GnXnβ0)(β0 − β00,λ− λ0)0.

Consider the pure spatial autoregressive process Yn = λWn,nYn + Vn where Vn is N(0,σ2In). The log

likelihood function of this process is

lnLp,n(λ,σ2) = −n

2ln(2π)− n

2lnσ2 + ln |Sn(λ)|− 1

2σ2Y 0nS

0n(λ)Sn(λ)Yn.

Let Ep(·) be the expectation operator for Yn based on this pure spatial autoregressive process. It follows

that

Ep(1

nlnLp,n(λ,σ

2))− Ep( 1nlnLp,n(λ0,σ

20))

=1

2(1 + lnσ20 − lnσ2) +

1

n(ln |Sn(λ)|− ln |Sn(λ0)|)− σ20

2σ2ntr(S

0−1n S0n(λ)Sn(λ)S

−1n )

=1

2(lnσ20 − lnσ2) +

(σ2 − σ20)2σ2

+1

n(ln |Sn(λ)|− ln |Sn(λ0)|)− (σ

2n(λ)− σ20)2σ2

,

which equals to T1n(λ,σ2). By the information inequality, Ep(

1n lnLp,n(λ,σ

2))− Ep( 1n lnLp,n(λ0,σ20)) ≤ 0.

Thus, T1n(λ,σ2) ≤ 0 for any (λ,σ2).

T2n(β,λ) is a quadratic function of β and λ. Under the assumed condition, T2n(β,λ) < 0 whenever

(β,λ) 6= (β0,λ0). Thus, β0 and λ0 are globally identiÞed. At λ0, σ20 is the unique maximizer of T1n(λ0,σ2).

Thus our assumption guarantees that E( 1n lnLn(θ))− E( 1n lnLn(θ0)) satisÞes the identiÞcation uniqueness

condition and θ0 is globally identiÞable. The consistency of θn follows from this identiÞcation and uniform

convergence in Proposition 2.1. Q.E.D.
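As an illustrative numerical sanity check (not part of the formal argument), the key algebraic identity used above, $S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta = X_n(\beta_0-\beta) + (\lambda_0-\lambda)G_nX_n\beta_0$, can be verified for an arbitrary row-normalized weights matrix; all matrices and parameter values below are made-up examples, not quantities from the paper.

```python
import numpy as np

# Check the exact identity
#   S_n(lam) S_n^{-1} X beta0 - X beta = X (beta0 - beta) + (lam0 - lam) G X beta0,
# where S_n(lam) = I - lam*W, S_n = S_n(lam0), G = W S_n^{-1}.
rng = np.random.default_rng(0)
n, k = 8, 2
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)       # row-normalized weights (illustrative)
X = rng.standard_normal((n, k))
beta0, beta = np.array([1.0, -0.5]), np.array([0.3, 0.7])
lam0, lam = 0.4, 0.1

S = lambda l: np.eye(n) - l * W            # S_n(lambda)
Sn_inv = np.linalg.inv(S(lam0))
G = W @ Sn_inv                             # G_n = W S_n^{-1}

lhs = S(lam) @ Sn_inv @ X @ beta0 - X @ beta
rhs = X @ (beta0 - beta) + (lam0 - lam) * (G @ X @ beta0)
assert np.allclose(lhs, rhs)
```

The identity is exact because $S_n(\lambda)S_n^{-1} = I_n + (\lambda_0-\lambda)G_n$, which the check confirms numerically.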

Proof of Theorem 2.3 Suppose that $\limsup_{n\to\infty}\big[\max_{\theta\in\bar N_\epsilon(\theta_0)} E(\frac{1}{n}\ln L_n(\theta)) - E(\frac{1}{n}\ln L_n(\theta_0))\big] < 0$ does not hold for some $\epsilon>0$. Then there exists a sequence $\theta_n = (\beta_n',\sigma_n^2,\lambda_n)'$ which converges to a vector $\theta_+ = (\beta_+',\sigma_+^2,\lambda_+)'$ with $\theta_+\ne\theta_0$ such that $\lim_{n\to\infty}[E(\frac{1}{n}\ln L_n(\theta_n)) - E(\frac{1}{n}\ln L_n(\theta_0))] = 0$. If $E(\frac{1}{n}\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$, this implies $\lim_{n\to\infty}[E(\frac{1}{n}\ln L_n(\theta_n)) - E(\frac{1}{n}\ln L_n(\theta_+))] = 0$ and hence $\lim_{n\to\infty}[E(\frac{1}{n}\ln L_n(\theta_+)) - E(\frac{1}{n}\ln L_n(\theta_0))] = 0$. We want to show that this contradicts the assumed condition (2.14).

First, we show that $E(\frac{1}{n}\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$. Let $\theta_1$ and $\theta_2$ be any two vectors in $\Theta$. As $E(\frac{1}{n}\ln L_n(\theta)) = -\frac{\ln(2\pi)}{2} - \frac{\ln\sigma^2}{2} + \frac{1}{n}\ln|S_n(\lambda)| - \frac{1}{2\sigma^2 n}E(V_n'(\delta)V_n(\delta))$,
\[
E\Big(\frac{1}{n}\ln L_n(\theta_2)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_1)\Big)
= \frac{1}{2}(\ln\sigma_1^2 - \ln\sigma_2^2) + \frac{1}{n}(\ln|S_n(\lambda_2)| - \ln|S_n(\lambda_1)|)
- \frac{1}{2}\Big(\frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2}\Big)\frac{1}{n}E(V_n'(\delta_2)V_n(\delta_2))
+ \frac{1}{2\sigma_1^2}\cdot\frac{1}{n}\big[E(V_n'(\delta_1)V_n(\delta_1)) - E(V_n'(\delta_2)V_n(\delta_2))\big].
\]
By expansion,
\[
\frac{1}{n}E(V_n'(\delta)V_n(\delta)) = \frac{1}{n}\big[(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta)'(S_n(\lambda)S_n^{-1}X_n\beta_0 - X_n\beta) + \sigma_0^2\,\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})\big]
\]
\[
= (\beta-\beta_0)'\frac{X_n'X_n}{n}(\beta-\beta_0) + (\lambda-\lambda_0)^2\frac{1}{n}\big[(X_n\beta_0)'G_n'G_nX_n\beta_0 + \sigma_0^2\,\mathrm{tr}(G_n'G_n)\big] + \sigma_0^2
+ 2(\lambda-\lambda_0)(\beta-\beta_0)'\frac{1}{n}X_n'G_nX_n\beta_0 + 2(\lambda_0-\lambda)\sigma_0^2\frac{\mathrm{tr}(G_n)}{n},
\]
and
\[
\frac{1}{n}\big[E(V_n'(\delta_1)V_n(\delta_1)) - E(V_n'(\delta_2)V_n(\delta_2))\big]
= (\beta_1-\beta_2)'\frac{X_n'X_n}{n}(\beta_1+\beta_2-2\beta_0) + \big[(\lambda_1-\lambda_0)^2 - (\lambda_2-\lambda_0)^2\big]\frac{1}{n}\big[(X_n\beta_0)'G_n'G_nX_n\beta_0 + \sigma_0^2\,\mathrm{tr}(G_n'G_n)\big]
\]
\[
+ 2\big[(\lambda_1-\lambda_0)(\beta_1-\beta_0)' - (\lambda_2-\lambda_0)(\beta_2-\beta_0)'\big]\frac{1}{n}X_n'G_nX_n\beta_0 + 2(\lambda_2-\lambda_1)\sigma_0^2\frac{\mathrm{tr}(G_n)}{n}.
\]
By Lemma A.5 in Lee (2001), $\frac{1}{n}X_n'X_n$, $\frac{1}{n}(X_n\beta_0)'G_n'G_nX_n\beta_0$ and $\frac{1}{n}X_n'G_nX_n\beta_0$ are of order $O(1)$. By Lemma 2, $\frac{1}{n}\mathrm{tr}(G_n)$ and $\frac{1}{n}\mathrm{tr}(G_n'G_n)$ are of order $O(\frac{1}{h_n})$. Thus, as $\Theta$ is compact, $\frac{1}{n}E(V_n'(\delta)V_n(\delta))$ is uniformly bounded and uniformly equicontinuous on $\Theta$. By the mean value theorem, $\frac{1}{n}(\ln|S_n(\lambda_2)| - \ln|S_n(\lambda_1)|) = -\frac{1}{n}\mathrm{tr}(W_{n,n}S_n^{-1}(\bar\lambda_n))(\lambda_2-\lambda_1)$, where $\bar\lambda_n$ lies between $\lambda_1$ and $\lambda_2$. By the uniform boundedness Assumption 7, Lemma 2 implies that $\frac{1}{n}\mathrm{tr}(W_{n,n}S_n^{-1}(\bar\lambda_n)) = O(\frac{1}{h_n})$. Hence $\frac{1}{n}\ln|S_n(\lambda)|$ is uniformly equicontinuous in $\lambda$ on $\Lambda$. As $\Theta$ is compact, $\ln\sigma^2$ is uniformly continuous and, consequently, $E(\frac{1}{n}\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$.

Consider first the case that $\lambda_+ = \lambda_0$. If this were the case,
\[
E\Big(\frac{1}{n}\ln L_n(\theta_+)\Big) - E\Big(\frac{1}{n}\ln L_n(\theta_0)\Big) = -\frac{\ln\sigma_+^2}{2} - \frac{1}{2\sigma_+^2 n}\big[(\beta_0-\beta_+)'X_n'X_n(\beta_0-\beta_+) + n\sigma_0^2\big] + \frac{\ln\sigma_0^2}{2} + \frac{1}{2},
\]
which would correspond to the difference of the average log likelihood functions of the normal regression model $S_nY_n = X_n\beta + V_n$ (as if $\lambda_0$ were known) at $(\beta_+,\sigma_+^2)$ and $(\beta_0,\sigma_0^2)$. In a linear regression model, $\beta_0$ and $\sigma_0^2$ are uniquely identifiable in its limiting process under the condition that $\lim_{n\to\infty}\frac{X_n'X_n}{n}$ is nonsingular. So $\lambda_+$ could not be equal to $\lambda_0$.

By the Jensen inequality, $E(\ln L_n(\theta)) \le E(\ln L_n(\theta_0))$ for any $\theta$. At $\lambda_0$, $\beta_n^*(\lambda_0) = \beta_0$, $\sigma_n^{*2}(\lambda_0) = \sigma_0^2$ and $Q_n(\lambda_0) = E(\ln L_n(\theta_0))$. It follows that $E(\frac{1}{n}\ln L_n(\theta_+)) \le \frac{1}{n}Q_n(\lambda_+) \le \frac{1}{n}Q_n(\lambda_0) = \frac{1}{n}E(\ln L_n(\theta_0))$. In turn, these imply that $\lim_{n\to\infty}\frac{1}{n}[Q_n(\lambda_+) - Q_n(\lambda_0)] = 0$, which is a contradiction. This is so because the assumed condition (2.14) is equivalent to $\lim_{n\to\infty}\big[\frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n|) - \frac{1}{2}(\ln\sigma_n^{*2}(\lambda) - \ln\sigma_0^2)\big] = \lim_{n\to\infty}\frac{1}{n}[Q_n(\lambda) - Q_n(\lambda_0)] \ne 0$ for $\lambda\ne\lambda_0$, as $\bar\sigma_n^2(\lambda) = \sigma_n^{*2}(\lambda)$ when $M_nG_nX_n\beta_0 = 0$.

The consistency of $\hat\theta_n$ then follows from the above identification uniqueness and the uniform convergence $\frac{1}{n}[\ln L_n(\theta) - E(\ln L_n(\theta))] \xrightarrow{p} 0$ in Proposition 2.1. Q.E.D.


Proof of Proposition 2.3 The matrix $G_n$ of the quadratic form is uniformly bounded in row sums. As all the regressors are bounded, the elements of $G_nX_n\beta_0$ are uniformly bounded for all $n$ (Lemma A.5 in Lee (2001)). When $h_n = O(1)$, as $\Sigma_\theta$ is positive definite, the variances of $\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}$ in (2.15) are bounded away from zero. With the existence of the higher-order moments of $v$ in Assumption 2, the sufficient conditions of the central limit theorem for quadratic forms of double arrays of Kelejian and Prucha (1999b) are satisfied, and the limiting distribution of the score vector follows. For the case that $\lim_{n\to\infty}h_n = \infty$, $\frac{1}{\sqrt n}(V_n'G_n'V_n - \sigma_0^2\,\mathrm{tr}(G_n)) \xrightarrow{p} 0$ because $\mathrm{var}(V_n'G_n'V_n) = O(\frac{n}{h_n})$. The remaining terms converge in distribution to a relevant normal vector by the Kolmogorov central limit theorem. Q.E.D.

Proof of Proposition 2.4 As $\frac{X_n'X_n}{n} = O(1)$, $\frac{X_n'W_{n,n}Y_n}{n} = O_P(1)$ and $\hat\sigma_n^2 \xrightarrow{p} \sigma_0^2$, (2.5) implies that
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\beta'} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\beta'} = \Big(\frac{1}{\sigma_0^2} - \frac{1}{\hat\sigma_n^2}\Big)\frac{X_n'X_n}{n} = o_P(1),
\]
and
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\lambda} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\lambda} = \Big(\frac{1}{\sigma_0^2} - \frac{1}{\hat\sigma_n^2}\Big)\frac{X_n'W_{n,n}Y_n}{n} = o_P(1).
\]
As $V_n(\hat\delta_n) = Y_n - X_n\hat\beta_n - \hat\lambda_nW_{n,n}Y_n = X_n(\beta_0-\hat\beta_n) + (\lambda_0-\hat\lambda_n)W_{n,n}Y_n + V_n$,
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\sigma^2} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\sigma^2} = \Big(\frac{1}{\sigma_0^4} - \frac{1}{\hat\sigma_n^4}\Big)\frac{X_n'V_n}{n} + \frac{X_n'X_n}{\hat\sigma_n^4 n}(\hat\beta_n-\beta_0) + \frac{X_n'W_{n,n}Y_n}{\hat\sigma_n^4 n}(\hat\lambda_n-\lambda_0) = o_P(1).
\]
By the mean value theorem, $\mathrm{tr}(G_n^2(\hat\lambda_n)) = \mathrm{tr}(G_n^2) + 2\,\mathrm{tr}(G_n^3(\bar\lambda_n))\cdot(\hat\lambda_n-\lambda_0)$; therefore,
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\lambda^2} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\lambda^2} = -\frac{2\,\mathrm{tr}(G_n^3(\bar\lambda_n))}{n}(\hat\lambda_n-\lambda_0) + \Big(\frac{1}{\sigma_0^2} - \frac{1}{\hat\sigma_n^2}\Big)\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n} = o_P(1).
\]
On the other hand, because
\[
\frac{1}{n}Y_n'W_{n,n}'V_n(\hat\delta_n) = \frac{Y_n'W_{n,n}'X_n}{n}(\beta_0-\hat\beta_n) + (\lambda_0-\hat\lambda_n)\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n} + \frac{Y_n'W_{n,n}'V_n}{n} = \frac{Y_n'W_{n,n}'V_n}{n} + o_P(1)
\]
and
\[
\frac{1}{n}V_n'(\hat\delta_n)V_n(\hat\delta_n) = (\hat\beta_n-\beta_0)'\frac{X_n'X_n}{n}(\hat\beta_n-\beta_0) + (\hat\lambda_n-\lambda_0)^2\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n} + \frac{V_n'V_n}{n}
+ 2(\hat\lambda_n-\lambda_0)(\hat\beta_n-\beta_0)'\frac{X_n'W_{n,n}Y_n}{n} + 2(\beta_0-\hat\beta_n)'\frac{X_n'V_n}{n} + 2(\lambda_0-\hat\lambda_n)\frac{Y_n'W_{n,n}'V_n}{n} = \frac{V_n'V_n}{n} + o_P(1),
\]
it follows that
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\sigma^2\partial\lambda} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\sigma^2\partial\lambda}
= -\frac{Y_n'W_{n,n}'V_n(\hat\delta_n)}{\hat\sigma_n^4 n} + \frac{Y_n'W_{n,n}'V_n}{\sigma_0^4 n}
= \frac{Y_n'W_{n,n}'X_n}{\hat\sigma_n^4 n}(\hat\beta_n-\beta_0) + \frac{Y_n'W_{n,n}'W_{n,n}Y_n}{\hat\sigma_n^4 n}(\hat\lambda_n-\lambda_0) + \Big(\frac{1}{\sigma_0^4} - \frac{1}{\hat\sigma_n^4}\Big)\frac{Y_n'W_{n,n}'V_n}{n} = o_P(1),
\]
and
\[
\frac{1}{n}\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\sigma^2\partial\sigma^2} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\sigma^2\partial\sigma^2}
= \frac{1}{2\hat\sigma_n^4} - \frac{V_n'(\hat\delta_n)V_n(\hat\delta_n)}{n\hat\sigma_n^6} - \frac{1}{2\sigma_0^4} + \frac{V_n'V_n}{n\sigma_0^6}
= \frac{1}{2}\Big(\frac{1}{\hat\sigma_n^4} - \frac{1}{\sigma_0^4}\Big) + \Big(\frac{1}{\sigma_0^6} - \frac{1}{\hat\sigma_n^6}\Big)\frac{V_n'V_n}{n} + o_P(1) = o_P(1).
\]
Q.E.D.

Proof of Proposition 2.5 By Lemma A.14 in Lee (2001), $\frac{1}{n}X_n'G_nV_n = o_P(1)$ and $\frac{1}{n}X_n'G_n'G_nV_n = o_P(1)$. It follows that $\frac{1}{n}X_n'W_{n,n}Y_n = \frac{1}{n}X_n'G_nX_n\beta_0 + o_P(1)$, $\frac{1}{n}Y_n'W_{n,n}'V_n = \frac{1}{n}V_n'G_n'V_n + o_P(1)$, and
\[
\frac{1}{n}Y_n'W_{n,n}'W_{n,n}Y_n = \frac{1}{n}(X_n\beta_0)'G_n'G_nX_n\beta_0 + \frac{1}{n}V_n'G_n'G_nV_n + o_P(1).
\]
$E(V_n'G_n'V_n) = \sigma_0^2\,\mathrm{tr}(G_n)$ and $\mathrm{var}(\frac{1}{n}V_n'G_n'V_n) = \frac{(\mu_4-3\sigma_0^4)}{n^2}\sum_{i=1}^n G_{n,ii}^2 + \frac{\sigma_0^4}{n^2}[\mathrm{tr}(G_nG_n') + \mathrm{tr}(G_n^2)] = O(\frac{1}{nh_n})$ by Lemma A.10 in Lee (2001). Similarly, $E(V_n'G_n'G_nV_n) = \sigma_0^2\,\mathrm{tr}(G_n'G_n)$ and
\[
\mathrm{var}\Big(\frac{1}{n}V_n'G_n'G_nV_n\Big) = \frac{(\mu_4-3\sigma_0^4)}{n^2}\sum_{i=1}^n (G_n'G_n)_{ii}^2 + \frac{2\sigma_0^4}{n^2}\mathrm{tr}((G_n'G_n)^2) = O\Big(\frac{1}{nh_n}\Big).
\]
By the law of large numbers for i.i.d. random variables, $\frac{1}{n}V_n'V_n \xrightarrow{p} \sigma_0^2$. With these properties, the convergence result follows.

Lemma 2 implies that $\frac{1}{n}\mathrm{tr}(G_n) = O(\frac{1}{h_n})$, $\frac{1}{n}\mathrm{tr}(G_n^2) = O(\frac{1}{h_n})$ and $\frac{1}{n}\mathrm{tr}(G_n'G_n) = O(\frac{1}{h_n})$. Thus, if $\lim_{n\to\infty}h_n = \infty$, then $\frac{1}{n}\mathrm{tr}(G_n)$, $\frac{1}{n}\mathrm{tr}(G_n^2)$ and $\frac{1}{n}\mathrm{tr}(G_n'G_n)$ vanish. Q.E.D.
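As an illustrative Monte Carlo sketch (not part of the proof), the moment $E(V_n'G_n'V_n) = \sigma_0^2\,\mathrm{tr}(G_n)$ used above can be checked by simulation for normal disturbances; the weights matrix, sample size and variance below are arbitrary assumptions chosen only for the experiment.

```python
import numpy as np

# Simulate v ~ N(0, sigma2 I) and compare the sample mean of v'Gv
# (= v'G'v, since a scalar equals its transpose) with sigma2 * tr(G).
rng = np.random.default_rng(42)
n, sigma2 = 20, 1.5
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)            # row-normalized weights (illustrative)
G = W @ np.linalg.inv(np.eye(n) - 0.3 * W)      # G_n = W S_n^{-1}, with lambda_0 = 0.3 assumed

Vs = np.sqrt(sigma2) * rng.standard_normal((100_000, n))
qf = np.sum((Vs @ G) * Vs, axis=1)              # row b: v_b' G v_b
assert abs(qf.mean() - sigma2 * np.trace(G)) < 0.05
```

With the seed fixed, the Monte Carlo average lies well within the stated tolerance of $\sigma_0^2\,\mathrm{tr}(G_n)$.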

Proof of Theorem 2.4 The asymptotic distribution of $\hat\theta_n$ follows from the expansion $\sqrt n(\hat\theta_n - \theta_0) = -\big(\frac{1}{n}\frac{\partial^2\ln L_n(\bar\theta_n)}{\partial\theta\partial\theta'}\big)^{-1}\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}$ and Propositions 2.3 and 2.4. Q.E.D.

Proof of Proposition 3.1 As $M_nS_n(\lambda)S_n^{-1}X_n\beta_0 = (\lambda_0-\lambda)M_nG_nX_n\beta_0 = 0$ when $M_nG_nX_n\beta_0 = 0$, $\sigma_n^{*2}(\lambda)$ in (2.12) reduces to
\[
\sigma_n^{*2}(\lambda) = \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}) = \sigma_0^2\Big[1 + 2(\lambda_0-\lambda)\frac{\mathrm{tr}(G_n)}{n} + (\lambda_0-\lambda)^2\frac{\mathrm{tr}(G_n'G_n)}{n}\Big].
\]
As $\frac{\mathrm{tr}(G_n)}{n}$ and $\frac{\mathrm{tr}(G_n'G_n)}{n}$ are of order $O(\frac{1}{h_n})$, it follows that, when $\lim_{n\to\infty}h_n = \infty$, $\sigma_n^{*2}(\lambda) - \sigma_0^2 = o(1)$ uniformly in $\lambda$ in any bounded set. As
\[
\hat\sigma_n^2(\lambda) - \sigma_n^{*2}(\lambda) = \frac{1}{n}Y_n'S_n'(\lambda)M_nS_n(\lambda)Y_n - \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})
= \frac{1}{n}V_n'S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}V_n - \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}) - T_n(\lambda),
\]
where $T_n(\lambda) = \frac{1}{n}V_n'S_n'^{-1}S_n'(\lambda)X_n(X_n'X_n)^{-1}X_n'S_n(\lambda)S_n^{-1}V_n$. By Lemma A.14 in Lee (2001),
\[
\frac{1}{\sqrt n}X_n'S_n(\lambda)S_n^{-1}V_n = \frac{1}{\sqrt n}X_n'V_n + (\lambda_0-\lambda)\frac{1}{\sqrt n}X_n'G_nV_n = O_P(1).
\]
As $\frac{h_n}{n}$ goes to zero, $h_nT_n(\lambda) = \frac{h_n}{n}\big(\frac{1}{\sqrt n}X_n'S_n(\lambda)S_n^{-1}V_n\big)'\big(\frac{X_n'X_n}{n}\big)^{-1}\big(\frac{1}{\sqrt n}X_n'S_n(\lambda)S_n^{-1}V_n\big)$ goes to zero in probability. By Lemma 3, $\frac{h_n}{n}\big[V_n'S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}V_n - \sigma_0^2\,\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})\big] = o_P(1)$. Therefore, $h_n(\hat\sigma_n^2(\lambda) - \sigma_n^{*2}(\lambda)) = o_P(1)$ uniformly in $\lambda$ in any bounded set. The uniformity in $\lambda$ is apparent because the relevant expressions are polynomials in $\lambda$.

Because $\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda)) = -\frac{h_n}{2}(\ln\hat\sigma_n^2(\lambda) - \ln\sigma_n^{*2}(\lambda))$, the mean value theorem gives $\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda)) = -\frac{h_n}{2\tilde\sigma_n^2(\lambda)}(\hat\sigma_n^2(\lambda) - \sigma_n^{*2}(\lambda))$, where $\tilde\sigma_n^2(\lambda)$ lies between $\hat\sigma_n^2(\lambda)$ and $\sigma_n^{*2}(\lambda)$. As $\hat\sigma_n^2(\lambda)$ and $\sigma_n^{*2}(\lambda)$ converge to $\sigma_0^2$ uniformly in any bounded set $\Lambda$ when $\lim_{n\to\infty}h_n = \infty$, $\sup_{\lambda\in\Lambda}\frac{1}{\hat\sigma_n^2(\lambda)} = O_P(1)$ and $\sup_{\lambda\in\Lambda}\frac{1}{\sigma_n^{*2}(\lambda)} = O(1)$. As $\tilde\sigma_n^2(\lambda)$ lies between $\hat\sigma_n^2(\lambda)$ and $\sigma_n^{*2}(\lambda)$, $\frac{1}{\tilde\sigma_n^2(\lambda)} \le \frac{1}{\hat\sigma_n^2(\lambda)} + \frac{1}{\sigma_n^{*2}(\lambda)}$ is stochastically bounded uniformly in $\lambda\in\Lambda$. Hence
\[
\frac{h_n}{n}\sup_{\lambda\in\Lambda}|\ln L_n(\lambda) - Q_n(\lambda)| \le \sup_{\lambda\in\Lambda}\frac{1}{2\tilde\sigma_n^2(\lambda)}\cdot h_n\sup_{\lambda\in\Lambda}|\hat\sigma_n^2(\lambda) - \sigma_n^{*2}(\lambda)| = o_P(1).
\]
Q.E.D.
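The closed form for $\sigma_n^{*2}(\lambda)$ above rests on the exact expansion $\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}) = n + 2(\lambda_0-\lambda)\mathrm{tr}(G_n) + (\lambda_0-\lambda)^2\mathrm{tr}(G_n'G_n)$, which follows from $S_n(\lambda)S_n^{-1} = I_n + (\lambda_0-\lambda)G_n$. A deterministic numerical check, with an arbitrary illustrative weights matrix and parameter values:

```python
import numpy as np

# Verify sigma*_n^2(lambda) = sigma0^2 [1 + 2(lam0-lam) tr(G)/n + (lam0-lam)^2 tr(G'G)/n].
rng = np.random.default_rng(1)
n, lam0, lam, sig2 = 10, 0.5, 0.2, 2.0
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)          # row-normalized weights (illustrative)
S = lambda l: np.eye(n) - l * W
Sn_inv = np.linalg.inv(S(lam0))
G = W @ Sn_inv                             # G_n = W S_n^{-1}
A = S(lam) @ Sn_inv                        # S_n(lambda) S_n^{-1} = I + (lam0-lam) G
lhs = sig2 / n * np.trace(A.T @ A)
rhs = sig2 * (1 + 2 * (lam0 - lam) * np.trace(G) / n
              + (lam0 - lam) ** 2 * np.trace(G.T @ G) / n)
assert np.isclose(lhs, rhs)
```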

Proof of Proposition 3.2 As Lemma A.10 in Lee (2001) gives
\[
E(V_n'G_n'M_nV_n)^2 = (\mu_4-3\sigma_0^4)\sum_{i=1}^n (G_n'M_n)_{ii}^2 + \sigma_0^4\big[\mathrm{tr}^2(G_n'M_n) + \mathrm{tr}(G_n'M_nG_n) + \mathrm{tr}((G_n'M_n)^2)\big],
\]
it follows that
\[
E(P_n(\lambda_0)) = \frac{2}{n}\Big[\frac{\mu_4-3\sigma_0^4}{\sigma_0^4}\sum_{i=1}^n (G_n'M_n)_{ii}^2 + \mathrm{tr}^2(G_n'M_n) + \mathrm{tr}(G_n'M_nG_n) + \mathrm{tr}((G_n'M_n)^2)\Big] - \mathrm{tr}(G_n'M_nG_n) - \mathrm{tr}(G_n^2).
\]
The orders of the relevant terms $\mathrm{tr}(G_n'M_n)$, $\mathrm{tr}(G_n'M_nG_n)$, $\mathrm{tr}((G_n'M_n)^2)$ and $\mathrm{tr}((G_n'M_nG_n)^2)$ are needed. From Lemma 5, $\sum_{i=1}^n (G_n'M_n)_{ii}^2 = \sum_{i=1}^n G_{n,ii}^2 + O(\frac{1}{h_n})$, $\mathrm{tr}^2(G_n'M_n) = \mathrm{tr}^2(G_n) + O(\frac{n}{h_n})$, $\mathrm{tr}(G_n'M_nG_n) = \mathrm{tr}(G_n'G_n) + O(1)$, $\mathrm{tr}[(G_n'M_n)^2] = \mathrm{tr}(G_n^2) + O(1)$, and $\mathrm{tr}[(G_n'M_nG_n)^2] = \mathrm{tr}[(G_nG_n')^2] + O(1)$. The simplified expression for $E(P_n(\lambda_0))$ follows also from these expansions.

From Lemma 2, $\sum_{i=1}^n G_{n,ii}^2 = O(\frac{n}{h_n^2})$, and $\mathrm{tr}(G_n'M_nG_n)$, $\mathrm{tr}((G_n'M_n)^2)$ and $\mathrm{tr}((G_n'M_nG_n)^2)$ have the order $O(\frac{n}{h_n})$. As $\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))] = \frac{2}{\sigma_0^4}\Delta_{n1} - \frac{1}{\sigma_0^2}\Delta_{n2} + o(1)$, where $\Delta_{n1} = \frac{h_n}{n^2}[(V_n'G_n'M_nV_n)^2 - \sigma_0^4\,\mathrm{tr}^2(G_n'M_n)]$ and $\Delta_{n2} = \frac{h_n}{n}[V_n'G_n'M_nG_nV_n - \sigma_0^2\,\mathrm{tr}(G_n'M_nG_n)]$, $\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))]$ will converge to zero in probability if both $\Delta_{n1}$ and $\Delta_{n2}$ converge to zero in probability. By Lemma A.10 in Lee (2001) and the orders of the relevant terms,
\[
E(\Delta_{n1}) = \frac{h_n}{n^2}\mathrm{var}(V_n'G_n'M_nV_n) = \frac{h_n}{n^2}\Big\{(\mu_4-3\sigma_0^4)\sum_{i=1}^n (G_n'M_n)_{ii}^2 + \sigma_0^4\big[\mathrm{tr}(G_n'M_nG_n) + \mathrm{tr}((G_n'M_n)^2)\big]\Big\} = O\Big(\frac{1}{n}\Big)
\]
and
\[
E(\Delta_{n2}^2) = \Big(\frac{h_n}{n}\Big)^2\mathrm{var}(V_n'G_n'M_nG_nV_n) = \Big(\frac{h_n}{n}\Big)^2\Big[(\mu_4-3\sigma_0^4)\sum_{i=1}^n (G_{n,i}'M_nG_{n,i})^2 + 2\sigma_0^4\,\mathrm{tr}((G_n'M_nG_n)^2)\Big] = O\Big(\frac{h_n}{n}\Big),
\]
which go to zero.

Finally,
\[
\frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} - E(P_n(\lambda_0))\Big)
= \frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} - P_n(\lambda_0)\Big) + \frac{h_n}{n}\big(P_n(\lambda_0) - E(P_n(\lambda_0))\big)
\]
\[
= \Big(\frac{1}{\hat\sigma_n^4(\lambda_0)} - \frac{1}{\sigma_0^4}\Big)\cdot\frac{2h_n}{n^2}(V_n'G_n'M_nV_n)^2 - \Big(\frac{1}{\hat\sigma_n^2(\lambda_0)} - \frac{1}{\sigma_0^2}\Big)\cdot\frac{h_n}{n}V_n'G_n'M_nG_nV_n + o_P(1).
\]
Lemma 3 implies that $\frac{h_n}{n}V_n'G_n'M_nG_nV_n = O_P(1)$ and $\frac{h_n}{n^2}(V_n'G_n'M_nV_n)^2 = \big(\frac{\sqrt{h_n}}{n}V_n'G_n'M_nV_n\big)^2 = O_P(\frac{1}{h_n})$. Because $\hat\sigma_n^2(\lambda_0)$ is a consistent estimate of $\sigma_0^2$, the final result follows from Slutsky's lemma. Q.E.D.

Proof of Theorem 3.1 When $M_nG_nX_n\beta_0 = 0$, $\sigma_n^{*2}(\lambda) = \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})$ from (2.12). For this case, $Q_n(\lambda)$ happens to have the same expression used in Lee (2001) for the pure spatial autoregressive process. The identification uniqueness condition holds as in Theorem 3 of Lee (2001). The consistency of the estimator $\hat\lambda_n$ follows from the unique identification and the uniform convergence of $\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda))$ to zero in probability. Q.E.D.

Proof of Proposition 3.3 Because $M_nG_nX_n\beta_0 = 0$, $M_nX_n = 0$ and $S_n(\lambda) = S_n + (\lambda_0-\lambda)W_{n,n}$, one has
\[
Y_n'W_{n,n}'M_nW_{n,n}Y_n = (G_nX_n\beta_0 + G_nV_n)'M_n(G_nX_n\beta_0 + G_nV_n) = V_n'G_n'M_nG_nV_n,
\]
and
\[
Y_n'W_{n,n}'M_nS_n(\lambda)Y_n = Y_n'W_{n,n}'M_nS_nY_n + (\lambda_0-\lambda)Y_n'W_{n,n}'M_nW_{n,n}Y_n
= (G_nX_n\beta_0 + G_nV_n)'M_n(X_n\beta_0 + V_n) + (\lambda_0-\lambda)V_n'G_n'M_nG_nV_n
= V_n'G_n'M_nV_n + (\lambda_0-\lambda)V_n'G_n'M_nG_nV_n.
\]
Therefore,
\[
\frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\hat\lambda_n)}{\partial\lambda^2} - \frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}\Big)
= 2\Big(\frac{1}{\hat\sigma_n^4(\hat\lambda_n)} - \frac{1}{\hat\sigma_n^4(\lambda_0)}\Big)\frac{h_n}{n^2}(V_n'G_n'M_nV_n)^2 + \frac{4(\lambda_0-\hat\lambda_n)}{\hat\sigma_n^4(\hat\lambda_n)}\Big(\frac{1}{n}V_n'G_n'M_nG_nV_n\Big)\Big(\frac{h_n}{n}V_n'G_n'M_nV_n\Big)
\]
\[
+ \frac{2(\lambda_0-\hat\lambda_n)^2}{\hat\sigma_n^4(\hat\lambda_n)}\cdot\frac{h_n}{n^2}(V_n'G_n'M_nG_nV_n)^2 + \Big(\frac{1}{\hat\sigma_n^2(\lambda_0)} - \frac{1}{\hat\sigma_n^2(\hat\lambda_n)}\Big)\Big(\frac{h_n}{n}V_n'G_n'M_nG_nV_n\Big) + \frac{2h_n}{n}\mathrm{tr}(G_n^3(\bar\lambda_n))(\hat\lambda_n-\lambda_0)
= o_P(1),
\]
because $V_n'G_n'M_nV_n = O_P(\frac{n}{h_n})$ and $V_n'G_n'M_nG_nV_n = O_P(\frac{n}{h_n})$ by Lemma 3, $\hat\sigma_n^2(\lambda_0) = \frac{1}{n}V_n'M_nV_n = O_P(1)$ by Lemma A.11 in Lee (2001), and $\frac{h_n}{n}\mathrm{tr}(G_n^3(\bar\lambda_n)) = O(1)$ by Lemma 2. Q.E.D.


Proof of Proposition 3.4 The mean of $Q_n$ is $E(Q_n) = \sigma_0^2\,\mathrm{tr}(M_nC_n) = -\sigma_0^2\,\mathrm{tr}[(X_n'X_n)^{-1}X_n'C_nX_n] = O(1)$ because $\frac{X_n'X_n}{n} = O(1)$ by Assumption 6 and $\frac{X_n'C_nX_n}{n} = O(1)$ from Lemma A.5 in Lee (2001). The variance of $Q_n$ from Lemma A.10 in Lee (2001) is
\[
\sigma_{Q_n}^2 = (\mu_4-3\sigma_0^4)\sum_{i=1}^n (C_n'M_n)_{ii}^2 + \sigma_0^4\big[\mathrm{tr}(C_n'M_nC_n) + \mathrm{tr}((C_n'M_n)^2)\big]
= (\mu_4-3\sigma_0^4)\sum_{i=1}^n C_{n,ii}^2 + \sigma_0^4\big[\mathrm{tr}(C_n'C_n) + \mathrm{tr}(C_n^2)\big] + O(1),
\]
where the last expression follows by using
\[
\sum_{i=1}^n (C_n'M_n)_{ii}^2 = \sum_{i=1}^n C_{n,ii}^2 + O\Big(\frac{1}{h_n}\Big), \quad \mathrm{tr}(C_n'M_nC_n) = \mathrm{tr}(C_n'C_n) + O(1) \quad\text{and}\quad \mathrm{tr}[(C_n'M_n)^2] = \mathrm{tr}(C_n^2) + O(1)
\]
from Lemma 5. The central limit theorem for quadratic functions (Lemma A.13 in Lee (2001)) implies that $\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} \xrightarrow{D} N(0,1)$. As $\sqrt{\frac{h_n}{n}}\,E(Q_n) = O(\sqrt{\frac{h_n}{n}})$, which goes to zero,
\[
\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}
= \frac{\sqrt{\frac{h_n}{n}}\,\sigma_{Q_n}}{\hat\sigma_n^2(\lambda_0)}\cdot\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} + \sqrt{\frac{h_n}{n}}\frac{E(Q_n)}{\hat\sigma_n^2(\lambda_0)}
= \frac{\sqrt{\frac{h_n}{n}}\,\sigma_{Q_n}}{\hat\sigma_n^2(\lambda_0)}\cdot\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} + o_P(1)
\xrightarrow{D} N\Big(0, \lim_{n\to\infty}\frac{h_n}{n}\Big[\frac{\mu_4-3\sigma_0^4}{\sigma_0^4}\sum_{i=1}^n C_{n,ii}^2 + \mathrm{tr}(C_nC_n') + \mathrm{tr}(C_n^2)\Big]\Big).
\]
As $C_{n,ii}^2 = O(\frac{1}{h_n^2})$, $\frac{h_n}{n}\sum_{i=1}^n C_{n,ii}^2 = O(\frac{1}{h_n})$, which goes to zero when $\lim_{n\to\infty}h_n = \infty$. Q.E.D.

Proof of Theorem 3.2 The asymptotic distribution follows from the Taylor expansion, the convergence results in Propositions 3.3 and 3.4, and that in (3.11). Q.E.D.

Proof of Theorem 3.3 From (2.9), as $S_n(\hat\lambda_n) = S_n + (\lambda_0-\hat\lambda_n)W_{n,n}$,
\[
\hat\beta_n(\hat\lambda_n) - \beta_0 = (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'W_{n,n}Y_n
= (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nV_n.
\]
As $(\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nV_n = O_P(\frac{\sqrt{h_n}}{n})$ because $\frac{X_n'X_n}{n} = O(1)$, $\frac{X_n'G_nV_n}{\sqrt n} = O_P(1)$ by Lemma A.14 in Lee (2001) and $\hat\lambda_n - \lambda_0 = O_P(\sqrt{\frac{h_n}{n}})$ by Theorem 3.2,
\[
\hat\beta_n(\hat\lambda_n) - \beta_0 = (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{\sqrt{h_n}}{n}\Big).
\]
In general,
\[
\sqrt{\frac{n}{h_n}}(\hat\beta_n(\hat\lambda_n) - \beta_0) = \frac{1}{\sqrt{h_n}}\Big(\frac{X_n'X_n}{n}\Big)^{-1}\frac{X_n'V_n}{\sqrt n} - \sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{1}{\sqrt n}\Big)
= -\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{1}{\sqrt{h_n}}\Big).
\]
If $\beta_0$ is zero, $\sqrt n(\hat\beta_n(\hat\lambda_n) - \beta_0) = \big(\frac{X_n'X_n}{n}\big)^{-1}\frac{X_n'V_n}{\sqrt n} + O_P(\sqrt{\frac{h_n}{n}}) \xrightarrow{D} N\big(0, \sigma_0^2\lim_{n\to\infty}(\frac{X_n'X_n}{n})^{-1}\big)$.

As
\[
\hat\sigma_n^2 = \frac{1}{n}Y_n'S_n'(\hat\lambda_n)M_nS_n(\hat\lambda_n)Y_n
= \frac{1}{n}Y_n'S_n'M_nS_nY_n + 2(\lambda_0-\hat\lambda_n)\frac{1}{n}Y_n'W_{n,n}'M_nS_nY_n + (\lambda_0-\hat\lambda_n)^2\frac{1}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n
\]
and $\frac{1}{n}Y_n'S_n'M_nS_nY_n = \frac{1}{n}V_n'M_nV_n$, it follows that
\[
\sqrt n(\hat\sigma_n^2 - \sigma_0^2) = \frac{1}{\sqrt n}(V_n'V_n - n\sigma_0^2) - \frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n
- 2\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nS_nY_n + \sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)^2\cdot\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n.
\]
Because $M_nX_n = 0$ and $M_nG_nX_n\beta_0 = 0$,
\[
\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nS_nY_n = \frac{\sqrt{h_n}}{n}(G_nX_n\beta_0 + G_nV_n)'M_n(X_n\beta_0 + V_n) = \frac{\sqrt{h_n}}{n}V_n'G_n'M_nV_n = O_P\Big(\frac{1}{\sqrt{h_n}}\Big)
\]
and
\[
\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n = \frac{\sqrt{h_n}}{n}(G_nX_n\beta_0 + G_nV_n)'M_n(G_nX_n\beta_0 + G_nV_n) = \frac{\sqrt{h_n}}{n}V_n'G_n'M_nG_nV_n = O_P\Big(\frac{1}{\sqrt{h_n}}\Big)
\]
by Lemma 3. As $E\big(\frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n\big) = \frac{\sigma_0^2}{\sqrt n}\mathrm{tr}(X_n(X_n'X_n)^{-1}X_n') = \frac{\sigma_0^2 k}{\sqrt n}$ goes to zero, the Markov inequality implies that $\frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n = o_P(1)$. Hence, as $\lim_{n\to\infty}h_n = \infty$, $\sqrt n(\hat\sigma_n^2 - \sigma_0^2) = \frac{1}{\sqrt n}(V_n'V_n - n\sigma_0^2) + o_P(1) \xrightarrow{D} N(0, \mu_4 - \sigma_0^4)$. Q.E.D.
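The first decomposition in the proof above is an exact algebraic identity given the estimator formula $\hat\beta_n(\lambda) = (X_n'X_n)^{-1}X_n'S_n(\lambda)Y_n$ assumed from (2.9); a deterministic numerical check, with illustrative simulated data and arbitrary parameter values:

```python
import numpy as np

# Verify beta_n(lam) - beta0 = (X'X)^{-1}X'V - (lam - lam0)(X'X)^{-1}X'W Y
# exactly, using Y = S_n^{-1}(X beta0 + V) with S_n = I - lam0 W.
rng = np.random.default_rng(7)
n, k, lam0, lam = 12, 2, 0.4, 0.25
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)          # row-normalized weights (illustrative)
X = rng.standard_normal((n, k))
beta0 = np.array([1.0, 2.0])
V = rng.standard_normal(n)
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + V)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ ((np.eye(n) - lam * W) @ Y)   # beta_n(lambda)
decomp = XtX_inv @ X.T @ V - (lam - lam0) * (XtX_inv @ X.T @ (W @ Y))
assert np.allclose(beta_hat - beta0, decomp)
```

The check confirms that the only randomness in $\hat\beta_n(\lambda) - \beta_0$ enters through $V_n$ and the estimation error $\lambda - \lambda_0$, as the proof exploits.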

Proof of Theorem 3.4 The asymptotic distributions of $\hat\beta_{n1}$ and $\hat\beta_{n2}$ follow from (3.18) and (3.19). Q.E.D.


References

Anselin, L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, The Netherlands.

Anselin, L. (1992), Space and Applied Econometrics, Anselin, ed., Special Issue, Regional Science and Urban Economics 22.

Anselin, L. and A.K. Bera (1998), Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics, in: Handbook of Applied Economic Statistics, A. Ullah and D.E.A. Giles, eds., Marcel Dekker, NY.

Anselin, L. and R. Florax (1995), New Directions in Spatial Econometrics, Springer-Verlag, Berlin.

Anselin, L. and S. Rey (1991), Properties of Tests for Spatial Dependence in Linear Regression Models, Geographical Analysis 23, 110-131.

Anselin, L. and S. Rey (1997), Spatial Econometrics, Anselin, L. and S. Rey, eds., Special Issue, International Regional Science Review 20.

Case, A.C. (1991), Spatial Patterns in Household Demand, Econometrica 59, 953-965.

Case, A.C. (1992), Neighborhood Influence and Technological Change, Regional Science and Urban Economics 22, 491-508.

Case, A.C., H.S. Rosen, and J.R. Hines (1993), Budget Spillovers and Fiscal Policy Interdependence: Evidence from the States, Journal of Public Economics 52, 285-307.

Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.

Dhrymes, P.J. (1978), Mathematics for Econometrics, Springer-Verlag, New York.

Doreian, P. (1980), Linear Models with Spatially Distributed Data, Spatial Disturbances, or Spatial Effects, Sociological Methods and Research 9, 29-60.

Haining, R. (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cambridge University Press, Cambridge.

Horn, R., and C. Johnson (1985), Matrix Analysis, Cambridge University Press, New York.

Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate Finance and Economics 17, 99-121.

Kelejian, H.H., and I.R. Prucha (1999a), A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model, International Economic Review 40, 509-533.

Kelejian, H.H., and I.R. Prucha (1999b), On the Asymptotic Distribution of the Moran I Test Statistic with Applications, Manuscript, Department of Economics, University of Maryland.

Kelejian, H.H., and D. Robinson (1993), A Suggested Method of Estimation for Spatial Interdependent Models with Autocorrelated Errors, and an Application to a County Expenditure Model, Papers in Regional Science 72, 297-312.

Lee, L.F. (1999a), Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances, Manuscript, Department of Economics, HKUST, Hong Kong.

Lee, L.F. (1999b), Consistency and Efficiency of Least Squares Estimation for Mixed Regressive, Spatial Autoregressive Models, Manuscript, forthcoming in Econometric Theory.

Lee, L.F. (2001), Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Econometric Models I: Spatial Autoregressive Processes, Manuscript, Department of Economics, HKUST, 1999; Revised Manuscript, Department of Economics, OSU, 2001.

Manski, C.F. (1993), Identification of Endogenous Social Effects: The Reflection Problem, The Review of Economic Studies 60, 531-542.

Mead, R. (1967), A Mathematical Model for the Estimation of Interplant Competition, Biometrics 23, 189-205.

Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of the American Statistical Association 70.

Paelinck, J. and L. Klaassen (1979), Spatial Econometrics, Saxon House, Farnborough.

Pinkse, J. (1999), Asymptotic Properties of Moran and Related Tests and Testing for Spatial Correlation in Probit Models, Manuscript, U. of British Columbia.

Rothenberg, T.J. (1971), Identification in Parametric Models, Econometrica 39, 577-591.

Smirnov, O. and L. Anselin (1999), Fast Maximum Likelihood Estimation of Very Large Spatial Autoregressive Models: A Characteristic Polynomial Approach, Manuscript, School of Social Sciences, University of Texas at Dallas.

White, H. (1994), Estimation, Inference and Specification Analysis, Cambridge University Press, New York.

Whittle, P. (1954), On Stationary Processes in the Plane, Biometrika 41, 434-449.
