Asymptotic Distributions of Quasi-Maximum Likelihood Estimators
for Spatial Econometric Models II:
Mixed Regressive, Spatial Autoregressive Models
by Lung-fei Lee*
Department of Economics
The Ohio State University
Columbus, Ohio
August 1999; April 2001
Abstract
Asymptotic properties of MLEs and QMLEs of mixed regressive, spatial autoregressive models are investigated. The stochastic rates of convergence of the MLE and QMLE for such models may be less than the √n-rate under some circumstances, even though their limiting distributions are asymptotically normal. When spatially varying regressors are relevant, the MLE and QMLE of the mixed regressive, autoregressive model may not have this problem, and they can converge at the √n-rate. If the spatially varying regressors are irrelevant, estimates of various components of the unknown parameters may possess different rates of convergence.
Key Words:
Spatial autoregression, mixed regressive spatial autoregressive model, maximum likelihood estimator, quasi-maximum likelihood estimator, rate of convergence, asymptotic distribution, infill asymptotics, increasing-domain asymptotics.
Correspondence Address:
Lung-fei Lee, Department of Economics, The Ohio State University, 410 Arps Hall, 1945 N. High St.,
Columbus, OH 43210-1172.
E-mail: [email protected]
* This paper is a revised and expanded version of the second part of a paper previously circulated under the title "Asymptotic Distributions of Maximum Likelihood Estimators for Spatial Autoregressive Models." Earlier versions of the paper were presented in seminars at HKUST, NWU, OSU, Princeton U., PSU, U. of Florida, U. of Illinois, and USC. I appreciate comments from participants of those seminars.
1. Introduction
Spatial econometrics has recently received much attention in both the theoretical and empirical econometric literatures. Spatial econometric models provide structures intended to capture spatial interactions of economic units or agents located in an economic or geographical space.
The spatial autoregressive models popularized by Whittle (1954), Mead (1967) and Ord (1975) have
received the most attention. Earlier development in testing and estimation of the spatial autoregressive
models has been summarized in Paelinck and Klaassen (1979), Doreian (1980), Anselin (1988, 1992), Haining
(1990), Kelejian and Robinson (1993), Cressie (1993), Anselin and Florax (1995), Anselin and Rey (1997),
and Anselin and Bera (1998). Empirical applications are, for example, Case (1991, 1992), Case et al.
(1993), and Kelejian and Robinson (1993). The autoregressive models can be estimated by the method of
maximum likelihood (ML) as well as methods of moments (Kelejian and Prucha 1999a). Computational
methods for the ML estimator (MLE) have been proposed in Ord (1975) and most recently by Smirnov and
Anselin (1999). Lee (2001) has studied some asymptotic properties of the MLE and the quasi-maximum likelihood estimator (QMLE) based on the normal likelihood specification for the pure (first-order) spatial autoregressive process. His analysis relates the asymptotic distribution of the estimates to how the spatial weights matrix expands as the sample size increases.1
This paper investigates the mixed regressive, spatial (first-order) autoregressive model (mixed regressive model, for short). The mixed regressive model differs from the pure spatial autoregressive process in the presence of regressors in the equation. The presence of spatially varying regressors plays a distinctive role in the model. Compared with their time series counterparts, the pure spatial autoregressive process corresponds to an autoregressive process, and the mixed regressive model corresponds to a dynamic regression model with a lagged dependent variable. However, the spatial models have the distinguishing feature of simultaneity found in econometric equilibrium models. In the presence of spatially varying regressors, in addition to the ML method, econometricians have considered the estimation of the mixed regressive model by the method of instrumental variables (IV) (Anselin 1988; Kelejian and Prucha 1998; Lee 1999a). However, their IV estimation method breaks down when all the spatial regressors are in fact irrelevant, and, in their IV framework, one cannot test the joint significance of the regressors (Kelejian and Prucha 1998). This is so because the dependent variable cannot be explained by the existing regressors when they are irrelevant.
1 Manski (1993) has criticized the literature on spatial correlation models for not specifying how the spatial weights matrix should change as the sample size increases.
The ML method, however, is still applicable. This paper follows up on the analysis of Lee (2001) to consider asymptotic properties of the QMLE for the mixed regressive model.
In Section 2, we show that when some of the spatially varying regressors are relevant, identification of the parameters can be assured if there is no multicollinearity among the regressors and a spatially generated regressor. The QMLE can be consistent and asymptotically normal under some regularity conditions on the spatial weights matrix. In the event that multicollinearity occurs, i.e., the spatially generated regressor is collinear with the original regressors, the spatial interaction parameter must be identified through the spatial correlation of outcomes. For the latter, in the scenario that each unit is influenced by only a few neighboring units, the QMLE can converge at the √n-rate and be asymptotically normal. Section 3 considers the multicollinearity case with special emphasis on the spatial scenario in which each unit can be influenced by many neighbors. In this case, irregularity of the information matrix occurs, and various components of the QMLEs may have different rates of convergence. We discuss results for the mixed regressive model where all the included spatially varying regressors are irrelevant. In Section 4, an example of the inconsistency of the QMLE for the mixed regressive model is presented, and this phenomenon is related to the notion of infill asymptotics (Cressie 1993). Section 5 provides the conclusions. Some useful lemmas and all the proofs are collected in the Appendix. Additional lemmas related to the proofs can be found in Lee (2001).
2. Mixed Regressive, Spatial Autoregressive Models
2.1 The Model
A mixed regressive model differs from a spatial autoregressive process in that regressors are present as explanatory variables. The (first-order) mixed regressive, spatial autoregressive model is specified as
$$Y_n = X_n\beta + \lambda W_{n,n}Y_n + V_n, \qquad (2.1)$$
where $X_n$ is an $n \times k$ matrix of regressors, $W_{n,n}$ is a specified spatial weights matrix, and $V_n$ is an $n$-dimensional vector of i.i.d. disturbances with zero mean and finite variance $\sigma^2$. This spatial model is an equilibrium model in that the observed outcomes $Y_n$ for the spatial units are determined by $X_n$ and $V_n$. Denote $S_n(\lambda) = I_n - \lambda W_{n,n}$ for any value of $\lambda$. To distinguish the true parameter vector from other possible values in a parameter space, $\theta_0 = (\beta_0', \lambda_0, \sigma_0^2)'$ denotes the true parameter vector of the model.

Assumption 1. $S_n = S_n(\lambda_0)$ is nonsingular at the true parameter $\lambda_0$.
Under this assumption, the equilibrium outcome vector is
$$Y_n = S_n^{-1}(X_n\beta_0 + V_n). \qquad (2.2)$$
In this paper, our main concern is to investigate asymptotic properties of the MLE under the normal
distribution of Vn. In the event that Vn were not really normally distributed, the estimator corresponds to
a QMLE. We will consider the asymptotic properties of the QMLE, of which the MLE is a special case. Let $V_n(\delta) = Y_n - X_n\beta - \lambda W_{n,n}Y_n$, where $\delta = (\beta', \lambda)'$. Thus, $V_n = V_n(\delta_0)$ at the true parameter vector. The log likelihood function of the model (2.1) is
$$\ln L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}V_n'(\delta)V_n(\delta), \qquad (2.3)$$
where $\theta = (\beta', \lambda, \sigma^2)'$. The QMLE $\hat\theta_n$ is the extremum estimator derived from the maximization of (2.3).
To provide a rigorous analysis of the QMLE, some basic regularity conditions are assumed for the model
below. Additional regularity conditions will be subsequently listed as needed.
Assumption 2. The disturbances $v_i$, $i = 1, \cdots, n$, are i.i.d. with mean zero and variance $\sigma^2$. The moment $E(|v|^{4+2\delta})$ exists for some $\delta > 0$.

Assumption 3. The elements $w_{n,ij}$ of the weights matrix $W_{n,n}$ are of order $O(1/h_n)$ uniformly in all $i, j$; i.e., there exists a constant $c$ such that $|w_{n,ij}| \le c/h_n$ for all $i, j$ and $n$, where the sequence of rates $\{h_n\}$ can be bounded or divergent.

Assumption 4. The ratio $h_n/n \to 0$ as $n$ goes to infinity.
Assumption 5. The matrices $W_{n,n}$ and $S_n^{-1}$ are uniformly bounded in both row and column sums.
Assumption 6. The regressors $x_i$ are vectors of constants and are uniformly bounded. The limit $\lim_{n\to\infty}\frac{1}{n}X_n'X_n$ exists and is nonsingular.
Assumptions 2-5 provide the essential features of the disturbances and the weights matrix for the model. Assumption 5 originates with Kelejian and Prucha (1998, 1999a, 1999b).2 The uniform boundedness of the weights matrix $W_{n,n}$ and of the associated $S_n^{-1}$ are conditions that limit the spatial correlation to a manageable degree. Assumptions 3 and 4 link the expansion of the spatial weights matrix $W_{n,n}$ directly to the sample size $n$. These conditions are also assumed in Lee (2001) for the analysis of the pure spatial autoregressive process. Examples of spatial models in empirical applications which satisfy the above assumptions include conventional spatial weights matrices where neighboring units are defined by only a few adjacent ones, and the models of Case (1991, 1992), where all spatial units in a district are neighbors of each spatial unit in the same district. Discussions of the above assumptions, in particular Assumptions 3-5, and examples can be found in Lee (2001). The mixed regressive model differs from the pure spatial autoregressive model in the presence of regressors in the model. As the mixed regressive model is used for analyzing cross-sectional units located in a space, it is meaningful to assume that the values of the regressors are bounded, as in Assumption 6. Multicollinearity among the regressors of $X_n$ is ruled out as usual in that assumption.

2 Related conditions have also been adopted in Pinkse (1999) in a different context. Useful properties of boundedness in row and column sums can be found in Horn and Johnson (1985).
2.2 Identification and the Consistency of the QMLE
For the possible consistency of the QMLE $\hat\theta_n$, it shall be shown that $\frac{1}{n}\ln L_n(\theta) - E(\frac{1}{n}\ln L_n(\theta))$ converges to zero in probability uniformly on a compact parameter space $\Theta$ of $\theta$, and that $\theta_0$ is identifiable. The uniform convergence can be easily established as in the following proposition.

Proposition 2.1 Under the assumed regularity conditions, $\frac{1}{n}[\ln L_n(\theta) - E(\ln L_n(\theta))] \xrightarrow{p} 0$ uniformly in $\theta$ in any compact parameter space $\Theta$.
For identification, it is known that nonsingularity of the information matrix is sufficient for the local identification of parameters in a model via the likelihood function (Rothenberg 1971). The first-order derivatives of the log likelihood (2.3) are
$$\frac{\partial\ln L_n(\theta)}{\partial\beta} = \frac{1}{\sigma^2}X_n'V_n(\delta), \qquad \frac{\partial\ln L_n(\theta)}{\partial\lambda} = -\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda)) + \frac{1}{\sigma^2}Y_n'W_{n,n}'V_n(\delta), \qquad (2.4)$$
and $\frac{\partial\ln L_n(\theta)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}V_n'(\delta)V_n(\delta)$. The second-order derivatives of (2.3) are
$$\frac{\partial^2\ln L_n(\theta)}{\partial\beta\partial\beta'} = -\frac{1}{\sigma^2}X_n'X_n, \qquad \frac{\partial^2\ln L_n(\theta)}{\partial\beta\partial\lambda} = -\frac{1}{\sigma^2}X_n'W_{n,n}Y_n, \qquad \frac{\partial^2\ln L_n(\theta)}{\partial\beta\partial\sigma^2} = -\frac{1}{\sigma^4}X_n'V_n(\delta),$$
$$\frac{\partial^2\ln L_n(\theta)}{\partial\lambda^2} = -\mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2) - \frac{1}{\sigma^2}Y_n'W_{n,n}'W_{n,n}Y_n, \qquad \frac{\partial^2\ln L_n(\theta)}{\partial\sigma^2\partial\lambda} = -\frac{1}{\sigma^4}Y_n'W_{n,n}'V_n(\delta), \qquad (2.5)$$
and $\frac{\partial^2\ln L_n(\theta)}{\partial(\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}V_n'(\delta)V_n(\delta)$. Denote $G_n = W_{n,n}S_n^{-1}$. At the true parameter vector $\theta_0$,
$$-\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'} = \begin{pmatrix} \frac{1}{\sigma_0^2 n}X_n'X_n & \frac{1}{\sigma_0^2 n}X_n'W_{n,n}Y_n & \frac{1}{\sigma_0^4 n}X_n'V_n \\ \frac{1}{\sigma_0^2 n}Y_n'W_{n,n}'X_n & \frac{1}{n}\mathrm{tr}(G_n^2) + \frac{1}{\sigma_0^2 n}Y_n'W_{n,n}'W_{n,n}Y_n & \frac{1}{\sigma_0^4 n}Y_n'W_{n,n}'V_n \\ \frac{1}{\sigma_0^4 n}V_n'X_n & \frac{1}{\sigma_0^4 n}V_n'W_{n,n}Y_n & -\frac{1}{2\sigma_0^4} + \frac{1}{\sigma_0^6 n}V_n'V_n \end{pmatrix}, \qquad (2.6)$$
and the average Hessian matrix (the information matrix when the $v$'s are normal) is
$$-E\left(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\right) = \begin{pmatrix} \frac{1}{\sigma_0^2 n}X_n'X_n & \frac{1}{\sigma_0^2 n}X_n'G_nX_n\beta_0 & 0 \\ \frac{1}{\sigma_0^2 n}(X_n\beta_0)'G_n'X_n & \frac{1}{\sigma_0^2 n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) + \frac{1}{n}[\mathrm{tr}(G_n'G_n) + \mathrm{tr}(G_n^2)] & \frac{1}{\sigma_0^2 n}\mathrm{tr}(G_n) \\ 0 & \frac{1}{\sigma_0^2 n}\mathrm{tr}(G_n) & \frac{1}{2\sigma_0^4} \end{pmatrix}. \qquad (2.7)$$
Let $M_n = I_n - X_n(X_n'X_n)^{-1}X_n'$ be the projector onto the space orthogonal to the column space of $X_n$. Denote $C_n = G_n - \frac{\mathrm{tr}(G_n)}{n}I_n$.

Proposition 2.2 If $\lim_{n\to\infty}\frac{1}{n}(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) \ne 0$ or $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)'] \ne 0$, then $\Sigma_\theta$ is nonsingular, where $\Sigma_\theta = -\lim_{n\to\infty}E\left(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\right)$.
The nonsingularity of $\Sigma_\theta$ implies the local identifiable uniqueness condition for $\theta_0$ in the following theorem.3
Theorem 2.1 Under the assumed regularity conditions and the nonsingularity of $\Sigma_\theta$, there exists a neighborhood $\Theta_1$ of $\theta_0$ such that, for any neighborhood $N_\epsilon(\theta_0)$ of $\theta_0$ with radius $\epsilon$,
$$\limsup_{n\to\infty}\max_{\theta\in\Theta_1\setminus N_\epsilon(\theta_0)}\left[E\left(\frac{1}{n}\ln L_n(\theta)\right) - E\left(\frac{1}{n}\ln L_n(\theta_0)\right)\right] < 0.$$
The QMLE $\hat\theta_n$ derived from the maximization of $\ln L_n(\theta)$ with $\theta \in \Theta_1$ is consistent.
The presence of relevant spatially varying regressors in the model is a distinctive feature. From (2.1) and (2.2), the reduced form equation for $Y_n$ can be represented as
$$Y_n = X_n\beta_0 + \lambda_0 G_nX_n\beta_0 + S_n^{-1}V_n \qquad (2.2)'$$
because $I_n + \lambda_0 G_n = S_n^{-1}$. The sufficient condition in Theorem 2.1 that $\lim_{n\to\infty}\frac{1}{n}(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) \ne 0$ is equivalent to the requirement that the spatially generated regressor $Z_n = G_nX_n\beta_0$ be linearly independent of $X_n$ in the reduced form equation (2.2)'. This condition is not only a local identification condition but also a global identification condition. This can, indeed, be seen from the expectation of the difference of the log likelihood functions in (2.3). Let
$$\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}). \qquad (2.8)$$
One has
$$E\left(\frac{1}{n}\ln L_n(\theta)\right) - E\left(\frac{1}{n}\ln L_n(\theta_0)\right) = \frac{1}{2}(\ln\sigma_0^2 - \ln\sigma^2) + \frac{(\sigma^2 - \sigma_0^2)}{2\sigma^2} + \frac{1}{n}(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)|)$$
$$\qquad - \frac{(\sigma_n^2(\lambda) - \sigma_0^2)}{2\sigma^2} - \frac{1}{2\sigma^2 n}[X_n(\beta_0-\beta) + (\lambda_0-\lambda)G_nX_n\beta_0]'[X_n(\beta_0-\beta) + (\lambda_0-\lambda)G_nX_n\beta_0].$$
The term involving $X_n$ is analogous to the difference of the log likelihood functions of a normal regression model of $Y_n$ with (known) regressors $X_n$ and $Z_n$, where $Z_n = G_nX_n\beta_0$. The presence of relevant regressors thus plays an interesting role: it is as if the spatial model were a regression model of $Y_n$ with regressors $X_n$ and $Z_n$.4 $\beta_0$ and $\lambda_0$ can be identified as the regression coefficients of $X_n$ and $G_nX_n\beta_0$, respectively.

3 See White (1994) for the formal definition of this condition.
4 For cases with $\lim_{n\to\infty}h_n = \infty$, the OLS regression of $Y_n$ on $X_n$ and $W_{n,n}Y_n$ may even provide consistent estimates of $\beta$ and $\lambda$ in (2.1) (see Lee 1999b).
Theorem 2.2 Under the assumed regularity conditions and the condition that $\lim_{n\to\infty}\frac{1}{n}(X_n, G_nX_n\beta_0)'(X_n, G_nX_n\beta_0)$ exists and is nonsingular, $\theta_0$ is globally identifiable and $\hat\theta_n$ is a consistent estimator of $\theta_0$.
For global identification in the case when $G_nX_n\beta_0$ and $X_n$ are multicollinear, the possible identification condition can be stated more comprehensively in terms of the concentrated log likelihood function in $\lambda$. From the log likelihood function (2.3) of the model, given a $\lambda$, the QMLE of $\beta$ is
$$\hat\beta_n(\lambda) = (X_n'X_n)^{-1}X_n'S_n(\lambda)Y_n, \qquad (2.9)$$
and the QMLE of $\sigma^2$ is
$$\hat\sigma_n^2(\lambda) = \frac{1}{n}[S_n(\lambda)Y_n - X_n\hat\beta_n(\lambda)]'[S_n(\lambda)Y_n - X_n\hat\beta_n(\lambda)] = \frac{1}{n}Y_n'S_n'(\lambda)M_nS_n(\lambda)Y_n. \qquad (2.10)$$
The concentrated log likelihood function of $\lambda$ is
$$\ln L_n(\lambda) = -\frac{n}{2}(\ln(2\pi) + 1) - \frac{n}{2}\ln\hat\sigma_n^2(\lambda) + \ln|S_n(\lambda)|. \qquad (2.11)$$
The QMLE $\hat\lambda_n$ of $\lambda$ maximizes the concentrated likelihood (2.11). The QMLEs of $\beta$ and $\sigma^2$ are, respectively, $\hat\beta_n(\hat\lambda_n)$ and $\hat\sigma_n^2(\hat\lambda_n)$. It is clear from (2.9) and (2.10) that $\beta_0$ and $\sigma_0^2$ will be identifiable once $\lambda_0$ is identifiable.
Define the function $Q_n(\lambda) = \max_{\beta,\sigma^2}E(\ln L_n(\theta))$, which is related to the concentrated log likelihood function of $\lambda$. The optimal solutions of this problem are $\beta_n^*(\lambda) = (X_n'X_n)^{-1}X_n'S_n(\lambda)S_n^{-1}X_n\beta_0$ and
$$\sigma_n^{*2}(\lambda) = \frac{1}{n}E[S_n(\lambda)Y_n - X_n\beta_n^*(\lambda)]'[S_n(\lambda)Y_n - X_n\beta_n^*(\lambda)]$$
$$= \frac{1}{n}[S_n(\lambda)S_n^{-1}X_n\beta_0]'M_n[S_n(\lambda)S_n^{-1}X_n\beta_0] + \frac{\sigma_0^2}{n}\mathrm{tr}[S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}]$$
$$= \frac{1}{n}(\lambda_0-\lambda)^2(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) + \frac{\sigma_0^2}{n}\mathrm{tr}[S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}]. \qquad (2.12)$$
Hence,
$$Q_n(\lambda) = -\frac{n}{2}(\ln(2\pi) + 1) - \frac{n}{2}\ln\sigma_n^{*2}(\lambda) + \ln|S_n(\lambda)|. \qquad (2.13)$$
For the case with $G_nX_n\beta_0$ and $X_n$ being multicollinear, $M_nG_nX_n\beta_0 = 0$ and $\sigma_n^{*2}(\lambda) = \sigma_n^2(\lambda)$ in (2.8), which does not involve $X_n$.
Both the log likelihood function (2.3) and the concentrated log likelihood function (2.11) involve the determinant $|S_n(\lambda)|$, which in turn makes the existence of $S_n^{-1}(\lambda)$ relevant for the identification analysis. The assumption in Assumption 5 that $S_n^{-1}$ is bounded in both row and column sums is a sufficient condition relevant for local identification, but the uniform boundedness of $S_n^{-1}$ at $\lambda_0$ can imply only that $S_n^{-1}(\lambda)$ is uniformly bounded in row and column sums uniformly in a neighborhood of $\lambda_0$ (see Lee (2001)). For global identification via (2.13) without $X_n$, one needs, in general, a stronger uniform boundedness property for $S_n^{-1}(\lambda)$ on the parameter space of $\lambda$.
Assumption 7. $S_n^{-1}(\lambda)$ is uniformly bounded in either row or column sums, uniformly in $\lambda$ in the parameter space $\Lambda$, which is assumed to be a compact set.

This assumption is needed to deal with the logarithmic value of the determinant of $S_n(\lambda)$, so that $\frac{1}{n}\ln|S_n(\lambda)|$ can be uniformly equicontinuous in $\lambda$ on $\Lambda$. It can be shown that if $\|W_{n,n}\| \le 1$ for all $n$, where $\|\cdot\|$ is a matrix norm, then $\|S_n^{-1}(\lambda)\|$ is uniformly bounded in any bounded subset of $(-1, 1)$ (Lemma A.4 in Lee, 2001). In particular, if $W_{n,n}$ is a row-normalized matrix, $S_n^{-1}(\lambda)$ is uniformly bounded in row sums uniformly in any bounded subset of $(-1, 1)$.
Theorem 2.3 For the case that $G_nX_n\beta_0$ and $X_n$ are multicollinear, suppose that, for any $\lambda \ne \lambda_0$,
$$\lim_{n\to\infty}\left(\frac{1}{n}\ln|\sigma_0^2S_n^{-1}S_n'^{-1}| - \frac{1}{n}\ln|\sigma_n^2(\lambda)S_n^{-1}(\lambda)S_n'^{-1}(\lambda)|\right) \ne 0. \qquad (2.14)$$
Then, for any $\epsilon > 0$, $\limsup_{n\to\infty}[\max_{\theta\in\Theta\setminus N_\epsilon(\theta_0)}E(\frac{1}{n}\ln L_n(\theta)) - E(\frac{1}{n}\ln L_n(\theta_0))] < 0$, where $N_\epsilon(\theta_0)$ is an open neighborhood of $\theta_0$ with radius $\epsilon$ in any compact parameter space $\Theta$ on which Assumption 7 holds. The QMLE $\hat\theta_n$ is a consistent estimator of $\theta_0$ in $\Theta$.
For the model, as $Y_n = S_n^{-1}X_n\beta_0 + S_n^{-1}V_n$, the variance matrix of $Y_n$ is $\sigma_0^2S_n^{-1}S_n'^{-1}$. The global identification condition in Theorem 2.3 is related to the uniqueness of the limiting sample average of the logarithm of the determinant of this variance matrix.
2.3 Asymptotic Distribution of the QMLE
The asymptotic distribution of the QMLE $\hat\theta_n$ can now be derived from the Taylor expansion of the first-order condition $\frac{\partial\ln L_n(\hat\theta_n)}{\partial\theta} = 0$ around $\theta_0$.
At $\theta_0$, the first-order derivatives of the log likelihood function in (2.4) imply
$$\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\beta} = \frac{1}{\sigma_0^2\sqrt n}X_n'V_n \quad\text{and}\quad \frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\sigma^2} = \frac{1}{2\sigma_0^4\sqrt n}(V_n'V_n - n\sigma_0^2),$$
and
$$\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\lambda} = \frac{1}{\sigma_0^2\sqrt n}(X_n\beta_0)'G_n'V_n + \frac{1}{\sigma_0^2\sqrt n}(V_n'G_n'V_n - \sigma_0^2\mathrm{tr}(G_n)). \qquad (2.15)$$
These involve both linear and quadratic functions of $V_n$. Their asymptotic distributions may be derived from relevant central limit theorems for linear-quadratic forms. The variance matrix of the score vector in (2.15) is $E\left(\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}\cdot\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta'}\right) = -E\left(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\right) + \Omega_{\theta,n}$, where
$$\Omega_{\theta,n} = \begin{pmatrix} 0 & * & * \\ \frac{\mu_3}{\sigma_0^4 n}\sum_{i=1}^nG_{n,ii}x_{i,n} & \frac{2\mu_3}{\sigma_0^4 n}\sum_{i=1}^nG_{n,ii}G_{in}X_n\beta_0 + \frac{\mu_4-3\sigma_0^4}{\sigma_0^4 n}\sum_{i=1}^nG_{n,ii}^2 & * \\ \frac{\mu_3}{2\sigma_0^6 n}l_n'X_n & \frac{1}{2\sigma_0^6 n}[\mu_3l_n'G_nX_n\beta_0 + (\mu_4-3\sigma_0^4)\mathrm{tr}(G_n)] & \frac{\mu_4-3\sigma_0^4}{4\sigma_0^8} \end{pmatrix} \qquad (2.16)$$
is a symmetric matrix, with $\mu_j = E(v_i^j)$, $j = 2, 3, 4$, being, respectively, the second, third, and fourth moments of $v$, where $G_{in}$ is the $i$th row of $G_n$ and $G_{n,ij}$ is the $(i, j)$th entry of $G_n$. For the case of $h_n$ a bounded sequence, the central limit theorem for linear-quadratic forms in Kelejian and Prucha (1999b) is applicable to (2.15). For the case that $\lim_{n\to\infty}h_n = \infty$, $\frac{1}{\sigma_0^2\sqrt n}(X_n\beta_0)'G_n'V_n$ dominates the quadratic term of $\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\lambda}$. This occurs because $\mathrm{var}\left(\frac{1}{\sqrt n}V_n'G_n'V_n\right) = \frac{\mu_4-3\sigma_0^4}{n}\sum_{i=1}^nG_{n,ii}^2 + \frac{\sigma_0^4}{n}[\mathrm{tr}(G_n'G_n) + \mathrm{tr}(G_n^2)] = O(1/h_n)$ from Lemma A.10 in Lee (2001), and, hence, $\frac{1}{\sqrt n}(V_n'G_n'V_n - \sigma_0^2\mathrm{tr}(G_n)) = o_P(1)$ while $\frac{1}{\sqrt n}(X_n\beta_0)'G_n'V_n = O_P(1)$. For the case that $\lim_{n\to\infty}h_n = \infty$, the limiting distribution of the score vector based on $\frac{1}{\sigma_0^2\sqrt n}(X_n\beta_0)'G_n'V_n$ follows from the Kolmogorov central limit theorem for the independent case.
Proposition 2.3 Under the assumed regularity conditions and the positive definiteness of $\Sigma_\theta$,
$$\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta} \xrightarrow{D} N(0, \Sigma_\theta + \Omega_\theta), \qquad (2.17)$$
where $\Omega_\theta = \lim_{n\to\infty}\Omega_{\theta,n}$. If the $v_i$'s are normally distributed, $\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta} \xrightarrow{D} N(0, \Sigma_\theta)$.
Proposition 2.4 Under the assumed regularity conditions, $\frac{1}{n}\frac{\partial^2\ln L_n(\theta_n)}{\partial\theta\partial\theta'} - \frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'} \xrightarrow{p} 0$ for any $\theta_n$ that converges in probability to $\theta_0$.
Proposition 2.5 Under the assumed regularity conditions, $\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'} - E\left(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\right) \xrightarrow{p} 0$. For the case that $\lim_{n\to\infty}h_n = \infty$,
$$E\left(\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'}\right) + \frac{1}{\sigma_0^2}\begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 & 0 \\ \frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) & 0 \\ 0 & 0 & \frac{1}{2\sigma_0^2} \end{pmatrix} \xrightarrow{p} 0. \qquad (2.18)$$
Theorem 2.4 Under the assumed regularity conditions and the nonsingularity of $\Sigma_\theta$,
$$\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma_\theta^{-1} + \Sigma_\theta^{-1}\Omega_\theta\Sigma_\theta^{-1}). \qquad (2.19)$$
If the $v_i$'s are normally distributed, then $\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{D} N(0, \Sigma_\theta^{-1})$.
The conditions that either (i) $\lim_{n\to\infty}\frac{1}{n}(G_nX_n\beta_0)'M_n(G_nX_n\beta_0) \ne 0$ or (ii) $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)'] \ne 0$ in Proposition 2.2 guarantee that the limiting (information) matrix $\Sigma_\theta$ is nonsingular. These conditions are (local) identification conditions for the parameters of the model. Condition (i) is relevant only for mixed regressive models, as it involves the regressors $X_n$. Condition (ii) is relevant for spatial autoregressive models with or without regressors. However, condition (ii) is applicable only for cases in which $h_n$ is a bounded sequence. This is so because $\mathrm{tr}(G_n)$, $\mathrm{tr}(G_n^2)$ and $\mathrm{tr}(G_n'G_n)$ are all of order $O(n/h_n)$ from Lemma 2; hence $\frac{1}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)'] = O(1/h_n)$, which converges to zero when $\lim_{n\to\infty}h_n = \infty$. For the case with $h_n$ a divergent sequence, the QMLEs for all the parameters can still be √n-consistent when there are relevant spatially varying regressors $X_n$ in the model such that $G_nX_n\beta_0$ is not linearly dependent on $X_n$. However, when $G_nX_n\beta_0$ and $X_n$ are multicollinear, $\beta_0$ and $\lambda_0$ cannot be identified just from the regression structure as in Theorem 2.2, because they will not be separately identifiable as coefficients of the regressors $X_n$ and $Z_n$ in (2.2)'. Identification of $\lambda_0$ will then depend crucially on the covariances of spatial outcomes. For the case that $h_n$ is a bounded sequence, the limiting matrix $\Sigma_\theta$ is in general nonsingular when condition (ii) is satisfied; the MLE $\hat\theta_n$ will then be √n-consistent and asymptotically normal, as in Theorems 2.1 and 2.3. When $\lim_{n\to\infty}\frac{\mathrm{tr}(G_n)}{n}$ in (2.7) is finite and nonzero, the MLEs $\hat\lambda_n$ and $\hat\sigma_n^2$ are asymptotically dependent. Anselin and Bera (1998) discussed the implication of this dependence for statistical inference problems with nuisance parameters. For the case that $h_n$ is a divergent sequence and the disturbances $v$'s are normally distributed, $\Omega_\theta = 0$, $\lim_{n\to\infty}\frac{\mathrm{tr}(G_n)}{n} = 0$ and, in this case, the MLEs $\hat\lambda_n$ and $\hat\sigma_n^2$ are asymptotically independent.
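The order claims above can be checked numerically. The sketch below (with arbitrary illustrative values $\lambda_0 = 0.4$ and $R = 10$ districts) uses the Case-type district weights matrix, for which $h_n = m - 1$: the unscaled trace term $\frac{1}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)']$ decays like $1/h_n$ as the district size $m$ grows, while the $h_n$-scaled version stays of the same order:

```python
import numpy as np

def district_W(R, m):
    """Case-type weights: R districts of m members, each unit influenced
    equally by the other m - 1 units in its district (h_n = m - 1)."""
    n = R * m
    W = np.zeros((n, n))
    for b in range(R):
        s = slice(b * m, (b + 1) * m)
        W[s, s] = 1.0 / (m - 1)
    np.fill_diagonal(W, 0.0)
    return W

lam0, R = 0.4, 10
traces = {}
for m in (5, 80):                                  # h_n = m - 1 grows
    n = R * m
    W = district_W(R, m)
    G = W @ np.linalg.inv(np.eye(n) - lam0 * W)    # G_n = W_n S_n^{-1}
    C = G - (np.trace(G) / n) * np.eye(n)          # C_n
    D = (C.T + C) @ (C.T + C).T
    # (unscaled trace term, h_n-scaled trace term)
    traces[m] = (np.trace(D) / n, (m - 1) * np.trace(D) / n)
```

This is the numerical counterpart of the statement that condition (ii) fails in the limit when $h_n$ diverges, while the magnified quantity in Assumption 9 below can remain finite and nonzero.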
3. Mixed Regressive Models with Singular Information Matrices
When $\lim_{n\to\infty}h_n = \infty$, $\Sigma_\theta$ can be nonsingular only if $G_nX_n\beta_0$ is not multicollinear with $X_n$. If spatially varying regressors are not present or are irrelevant, the multicollinearity problem can occur and, consequently, there are irregularities in the information matrix when $\lim_{n\to\infty}h_n = \infty$.
3.1 Cases with Singular Information Matrices
There are several concrete cases in which irregularity can occur when $W_{n,n}$ is row-normalized.
If there are no spatially varying regressors, $X_n$ consists of a constant term only. As $W_{n,n}$ is row-normalized, $X_n = l_n$ implies that $W_{n,n}X_n = l_n$ and $G_nX_n = \frac{l_n}{1-\lambda_0}$. The latter holds because $S_nl_n = (1-\lambda_0)l_n$ implies $S_n^{-1}l_n = \frac{l_n}{1-\lambda_0}$.5 In this case, when $\lim_{n\to\infty}h_n = \infty$, the limiting information matrix $\Sigma_\theta$ will be singular.
5 Note that this relation is established without using the series expansion $S_n^{-1} = I_n + \lambda_0W_{n,n} + \lambda_0^2W_{n,n}^2 + \cdots = \sum_{i=0}^\infty\lambda_0^iW_{n,n}^i$. The series expansion is valid for $|\lambda| < 1$ with a row-normalized $W_{n,n}$ (Horn and Johnson 1985).
This is so because the submatrix
$$\frac{1}{n}\begin{pmatrix} X_n' \\ (X_n\beta_0)'G_n' \end{pmatrix}(X_n, G_nX_n\beta_0) = \begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 \\ \frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) \end{pmatrix} = \begin{pmatrix} 1 & \frac{\beta_0}{1-\lambda_0} \\ \frac{\beta_0}{1-\lambda_0} & \left(\frac{\beta_0}{1-\lambda_0}\right)^2 \end{pmatrix} \qquad (3.1)$$
of (2.7) is singular, and $\frac{1}{n}[\mathrm{tr}(G_n'G_n) + \mathrm{tr}(G_n^2)] = O(1/h_n) \to 0$ as $n \to \infty$.
The other case is when all spatially varying regressors are irrelevant but are included in estimation. Let $X_n = (l_n, X_{2n})$, where $X_{2n}$ are the spatially varying regressors. In that case, the coefficient subvector $\beta_{02}$ of $X_{2n}$ in $\beta_0 = (\beta_{01}, \beta_{02}')'$ is zero. Consequently, $X_n\beta_0 = l_n\beta_{01}$ and
$$G_nX_n\beta_0 = \beta_{01}W_{n,n}S_n^{-1}l_n = \beta_{01}W_{n,n}\left(\frac{l_n}{1-\lambda_0}\right) = \frac{\beta_{01}}{1-\lambda_0}l_n.$$
It follows that
$$\begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{1}{n}X_n'G_nX_n\beta_0 \\ \frac{1}{n}(X_n\beta_0)'G_n'X_n & \frac{1}{n}(X_n\beta_0)'G_n'G_n(X_n\beta_0) \end{pmatrix} = \begin{pmatrix} \frac{1}{n}X_n'X_n & \frac{\beta_{01}}{1-\lambda_0}\cdot\frac{1}{n}X_n'l_n \\ \frac{\beta_{01}}{1-\lambda_0}\cdot\frac{1}{n}l_n'X_n & \left(\frac{\beta_{01}}{1-\lambda_0}\right)^2\cdot\frac{1}{n}l_n'l_n \end{pmatrix}, \qquad (3.2)$$
which is singular because the last column is proportional to the first one.
There is another case that raises related issues. Consider the spatial weights matrix $W_{n,n} = \frac{1}{n-1}(l_nl_n' - I_n)$. Sample observations are from a single spatial economy in which each spatial unit is influenced equally by all the other spatial units. Because the spatial weights matrix has this particular pattern, the inverse of $S_n$ can be derived analytically and has the form $S_n^{-1} = \left(1 + \frac{\lambda_0}{n-1}\right)^{-1}\left(I_n + \frac{\lambda_0}{1-\lambda_0}\frac{l_nl_n'}{n-1}\right)$. It follows that $G_n = \frac{1}{n-1+\lambda_0}\left(\frac{l_nl_n'}{1-\lambda_0} - I_n\right)$ and
$$G_nX_n\beta_0 = \frac{1}{n-1+\lambda_0}\left(\frac{l_n'X_n\beta_0}{1-\lambda_0}l_n - X_n\beta_0\right) = \frac{n}{n-1+\lambda_0}\left(\frac{l_n'X_n\beta_0}{(1-\lambda_0)n}l_n - \frac{X_n\beta_0}{n}\right), \qquad (3.3)$$
which is a linear combination of $X_n$ and its constant term, even though the linear coefficients change as $n$ changes. However, this case has $h_n = n$ and does not satisfy Assumption 4 that $h_n/n$ goes to zero.
The singularity of the information matrix has implications for the rate of convergence of the estimators, which will be analyzed in the following sections.
3.2 Identification and Consistency
The preceding cases imply that $G_nX_n\beta_0$ lies in the space spanned by the columns of $X_n$ and, therefore, $M_nG_nX_n\beta_0 = 0$. For subsequent analyses, the following assumption will be maintained.
Assumption 8. MnGnXnβ0 = 0.
This assumption may cover more irregular cases than those mentioned.
The subsequent analysis is revealing when carried out with the concentrated log likelihood function $\ln L_n(\lambda)$ in (2.11) and the corresponding deterministic function $Q_n(\lambda)$ in (2.13). The following proposition shows the rate of uniform convergence of $(\ln L_n(\lambda) - Q_n(\lambda))$ to zero. For the case $\lim_{n\to\infty}h_n = \infty$, the uniform convergence can be achieved on any bounded set of $\lambda$, as in the following proposition.6

Proposition 3.1 Under the assumed regularity conditions, $\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda)) \xrightarrow{p} 0$ uniformly in $\lambda$ in any bounded set when $\lim_{n\to\infty}h_n = \infty$.
As $\frac{\partial\hat\sigma_n^2(\lambda)}{\partial\lambda} = -\frac{2}{n}Y_n'W_{n,n}'M_nS_n(\lambda)Y_n$, the first- and second-order derivatives of the concentrated log likelihood in (2.11) are
$$\frac{\partial\ln L_n(\lambda)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda)}Y_n'W_{n,n}'M_nS_n(\lambda)Y_n - \mathrm{tr}(W_{n,n}S_n^{-1}(\lambda)), \qquad (3.4)$$
and
$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda)}(Y_n'W_{n,n}'M_nS_n(\lambda)Y_n)^2 - \frac{1}{\hat\sigma_n^2(\lambda)}Y_n'W_{n,n}'M_nW_{n,n}Y_n - \mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2). \qquad (3.5)$$
As $M_nG_nX_n\beta_0 = 0$, the first and second derivatives in (3.4) and (3.5) at $\lambda_0$ simplify to
$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{1}{\hat\sigma_n^2(\lambda_0)}V_n'G_n'M_nV_n - \mathrm{tr}(G_n), \qquad (3.6)$$
and
$$\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} = \frac{2}{n\hat\sigma_n^4(\lambda_0)}(V_n'G_n'M_nV_n)^2 - \frac{1}{\hat\sigma_n^2(\lambda_0)}V_n'G_n'M_nG_nV_n - \mathrm{tr}(G_n^2). \qquad (3.7)$$
As $M_nX_n = 0$, $\hat\sigma_n^2(\lambda_0) = \frac{1}{n}V_n'M_nV_n$ from (2.10), and $\hat\sigma_n^2(\lambda_0) \xrightarrow{p} \sigma_0^2$ because
$$\hat\sigma_n^2(\lambda_0) = \frac{1}{n}V_n'M_nV_n = \frac{1}{n}V_n'V_n - \frac{1}{n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n = \frac{1}{n}\sum_{i=1}^nv_i^2 + o_P(1) = \sigma_0^2 + o_P(1) \qquad (3.8)$$
by the Markov inequality, as $E\left(\frac{1}{n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n\right) = \frac{\sigma_0^2k}{n} = o(1)$. Define the function
$$P_n(\lambda_0) = \frac{2}{n\sigma_0^4}(V_n'G_n'M_nV_n)^2 - \frac{1}{\sigma_0^2}V_n'G_n'M_nG_nV_n - \mathrm{tr}(G_n^2). \qquad (3.9)$$
$\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}$ in (3.7) differs from $P_n(\lambda_0)$ in that $\hat\sigma_n^2(\lambda_0)$ of the former is replaced by $\sigma_0^2$.
Proposition 3.2 Under the regularity conditions for this model,
$$E(P_n(\lambda_0)) = \frac{2}{n}\mathrm{tr}^2(M_nG_n) - \mathrm{tr}(M_nG_nG_n') - \mathrm{tr}(G_n^2) + O\left(\frac{1}{h_n}\right) = -[\mathrm{tr}(C_nC_n') + \mathrm{tr}(C_n^2)] + O(1), \qquad (3.10)$$
and $\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))] \xrightarrow{p} 0$. Furthermore, $\frac{h_n}{n}\left(\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} - E(P_n(\lambda_0))\right) \xrightarrow{p} 0$.

6 For the case that $h_n$ is a bounded sequence, it can be shown that the uniform convergence can occur on a neighborhood of $\lambda_0$ without extra conditions. We focus on the divergent $h_n$ case, as it is relevant for the irregularity issue.
Hence,
$$\frac{h_n}{n}\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} \xrightarrow{p} \lim_{n\to\infty}\frac{h_n}{n}E(P_n(\lambda_0)) \qquad (3.11)$$
if the latter limit exists. This limit is nonpositive, as $\mathrm{tr}(C_nC_n') + \mathrm{tr}(C_n^2) = \frac{1}{2}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)']$ is always nonnegative and is strictly positive unless $C_n' + C_n = 0$.

Assumption 9. $\Sigma_{\lambda\lambda} = \lim_{n\to\infty}\frac{h_n}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)']$ exists and is positive.
This assumption modifies the local identification condition in Proposition 2.2 for the case that $\lim_{n\to\infty}h_n = \infty$. While $\lim_{n\to\infty}\frac{1}{n}\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)'] = 0$ when $\lim_{n\to\infty}h_n = \infty$, the magnifying factor $h_n$ corrects the rate of convergence so that the limit $\Sigma_{\lambda\lambda}$ in Assumption 9 can be finite and nonzero. The following theorem shows that Assumption 9 is a sufficient condition for the local identification of $\lambda_0$.
Theorem 3.1 Under the assumed conditions, there exists a set $\Lambda$ with $\lambda_0$ in its interior such that, for any neighborhood $N_\epsilon(\lambda_0)$ of $\lambda_0$ with radius $\epsilon$,
$$\limsup_{n\to\infty}\left(\max_{\lambda\in\Lambda\setminus N_\epsilon(\lambda_0)}\frac{h_n}{n}Q_n(\lambda) - \frac{h_n}{n}Q_n(\lambda_0)\right) < 0.$$
Furthermore, the QMLE $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ with $\lambda$ in $\Lambda$ is a consistent estimator.
For global identification, $\lambda_0$ would be globally identifiable if
$$\lim_{n\to\infty}\left(\frac{h_n}{n}\ln|\sigma_0^2S_n^{-1}S_n'^{-1}| - \frac{h_n}{n}\ln|\sigma_n^2(\lambda)S_n^{-1}(\lambda)S_n'^{-1}(\lambda)|\right) \ne 0$$
when $\lambda \ne \lambda_0$. This condition modifies that of Theorem 2.3 by the magnifying factor $h_n$. It is equivalent to $\lim_{n\to\infty}\frac{h_n}{n}\left(\ln|S_n(\lambda)| - \ln|S_n(\lambda_0)| - \frac{n}{2}(\ln\sigma_n^2(\lambda) - \ln\sigma_n^2(\lambda_0))\right) \ne 0$ for any $\lambda \ne \lambda_0$. A direct verification of the global identification of $\lambda_0$ would depend on the specific sequence of weights matrices $W_{n,n}$ in a model. For some models with important weights matrices, when $\lim_{n\to\infty}h_n = \infty$, the concentrated log likelihood function in (2.11) can be globally concave. For those cases, local identification under Assumption 9 implies global identification, as the maximizer in $\lambda$ is unique. These cases include the spatial weights matrix in Case (1991, 1992), which is symmetric, and the familiar case of Ord (1975), where $W_{n,n}$ is row-normalized and the eigenvalues of $W_{n,n}$ are real. Specifically, when $\lim_{n\to\infty}h_n = \infty$, if the weights matrix $W_{n,n}$ is symmetric or $W_{n,n} = \Lambda_n^{-1}D_n$, where $\Lambda_n$ is a diagonal matrix and $D_n$ is a symmetric matrix, then, for large enough $n$, $\frac{h_n}{n}(Q_n(\lambda) - Q_n(\lambda_0))$ is concave on any bounded set of $\lambda$. With the local identification in Theorem 3.1, $\lambda_0$ is its unique global maximizer and, hence, $\lambda_0$ is uniquely identifiable on any bounded parameter space. The estimator $\hat\lambda_n$ derived from the maximization of $\ln L_n(\lambda)$ in (2.11) over any compact parameter space of $\lambda$ will be consistent. The proofs of these results are the same as those for Theorem 4 of Lee (2001) for the pure spatial autoregressive process, because $Q_n(\lambda)$ of the mixed regressive model has the same expression as that of the pure spatial autoregressive model when $M_nG_nX_n\beta_0 = 0$.
3.3 Asymptotic Distribution
The asymptotic distribution of the QMLE $\hat\lambda_n$ can be derived from the concentrated log likelihood function (2.11). Subsequently, the asymptotic distributions of the QMLEs $\hat\beta_n$ and $\hat\sigma_n^2$ from (2.9) and (2.10) can be derived once that of $\hat\lambda_n$ is available.
Proposition 3.3 Under the assumed regularity conditions of the model,
$$\frac{h_n}{n}\left(\frac{\partial^2\ln L_n(\lambda_n)}{\partial\lambda^2} - \frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}\right) = o_P(1)$$
for any $\lambda_n$ that converges to $\lambda_0$ in probability.
Let $Q_n = V_n'C_n'M_nV_n$. It follows from (3.6) and (3.8) that $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{Q_n}{\hat\sigma_n^2(\lambda_0)}$. The limiting distribution of this first-order derivative depends on that of the quadratic form $Q_n$ of $V_n$. The original central limit theorem in Kelejian and Prucha (1999b) is not directly applicable to the case with $h_n$ a divergent sequence, but their theorem and its proof have been modified in Lee (2001) to cover the divergent case. Assumption 4 needs to be slightly strengthened for the latter.

Assumption 4'. $\frac{h_n^{1+\eta}}{n} \to 0$ for some $\eta > 0$ as $n$ goes to infinity.
Proposition 3.4 Under the assumed regularity conditions,
$$\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} N(0, \Sigma_{\lambda\lambda} + \Omega_\lambda), \qquad (3.12)$$
where $\Omega_\lambda = \frac{\mu_4-3\sigma_0^4}{\sigma_0^4}\lim_{n\to\infty}\frac{h_n}{n}\sum_{i=1}^nC_{n,ii}^2$. For the cases that the $v_i$'s are normally distributed or $\lim_{n\to\infty}h_n = \infty$,
$$\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}). \qquad (3.13)$$
The asymptotic distribution of $\hat\lambda_n$ then follows from the expansion
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) = -\left(\frac{h_n}{n}\frac{\partial^2\ln L_n(\bar\lambda_n)}{\partial\lambda^2}\right)^{-1}\cdot\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda},$$
where $\bar\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$, together with the stochastic convergence of the first- and second-order derivatives in Propositions 3.3 and 3.4 and in (3.11).
Theorem 3.2 Under the assumed regularity conditions for this model,
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}^{-1} + \Sigma_{\lambda\lambda}^{-1}\Omega_\lambda\Sigma_{\lambda\lambda}^{-1}). \qquad (3.14)$$
For the cases that the $v_i$'s are normally distributed or $\lim_{n\to\infty}h_n = \infty$,
$$\sqrt{\frac{n}{h_n}}(\hat\lambda_n - \lambda_0) \xrightarrow{D} N(0, \Sigma_{\lambda\lambda}^{-1}). \qquad (3.15)$$
With $\hat\lambda_n$, the QMLEs for $\beta$ and $\sigma^2$ are $\hat\beta_n=(X_n'X_n)^{-1}X_n'S_n(\hat\lambda_n)Y_n$ and $\hat\sigma_n^2=\frac1nY_n'S_n'(\hat\lambda_n)M_nS_n(\hat\lambda_n)Y_n$. As these estimators are functions of $\hat\lambda_n$, one has to take into account the possible effect of $\hat\lambda_n$ on their asymptotic distributions.
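To make the estimation procedure concrete, the following sketch (all names hypothetical; `numpy` assumed; not part of the original paper) computes the closed-form QMLEs $\hat\beta_n(\lambda)$ and $\hat\sigma_n^2(\lambda)$ and maximizes the concentrated log likelihood of (2.9)-(2.11) over a grid of $\lambda$ for a simulated mixed regressive, spatial autoregressive model with a "few neighbors" circular weights matrix.

```python
import numpy as np

def concentrated_qmle(Y, X, W, lam):
    """Given a trial lambda, return the closed-form beta, sigma^2 and the
    concentrated log likelihood of S(lambda) Y = X beta + V, a sketch of
    (2.9)-(2.11)."""
    n = len(Y)
    e = Y - lam * (W @ Y)                        # S_n(lambda) Y
    beta = np.linalg.solve(X.T @ X, X.T @ e)     # beta_hat(lambda)
    resid = e - X @ beta
    sigma2 = resid @ resid / n                   # sigma_hat^2(lambda)
    _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)
    loglik = -n/2*np.log(2*np.pi) - n/2*np.log(sigma2) + logdet - n/2
    return beta, sigma2, loglik

# simulate: each unit has two neighbors (bounded h_n), row-normalized W
rng = np.random.default_rng(0)
n, lam0, beta0 = 80, 0.4, np.array([1.0, 2.0])
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + rng.normal(size=n))

# maximize the concentrated likelihood over a grid of lambda
grid = np.linspace(-0.95, 0.95, 191)
lam_hat = max(grid, key=lambda lam: concentrated_qmle(Y, X, W, lam)[2])
beta_hat, sigma2_hat, _ = concentrated_qmle(Y, X, W, lam_hat)
```

With relevant spatially varying regressors and bounded $h_n$, Theorem 2.4 indicates that all three estimates are $\sqrt n$-consistent; the one-dimensional grid search only illustrates that the problem concentrates to a scalar maximization in $\lambda$.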
Theorem 3.3 Under the assumed regularity conditions, as $\lim_{n\to\infty}h_n=\infty$,
$$\sqrt{\frac{n}{h_n}}(\hat\beta_n-\beta_0)=\sqrt{\frac{n}{h_n}}(X_n'X_n)^{-1}X_n'V_n-\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot(X_n'X_n)^{-1}X_n'G_nX_n\beta_0+O_P\!\left(\frac{1}{\sqrt{h_n}}\right)$$
$$\stackrel{D}{\to}N\!\left(0,\ \Sigma_{\lambda\lambda}^{-1}\cdot\lim_{n\to\infty}(X_n'X_n)^{-1}(X_n'G_nX_n\beta_0)(X_n'G_nX_n\beta_0)'(X_n'X_n)^{-1}\right),\qquad(3.16)$$
and $\sqrt n(\hat\sigma_n^2-\sigma_0^2)=\frac{1}{\sqrt n}\sum_{i=1}^n(v_i^2-\sigma_0^2)+o_P(1)\stackrel{D}{\to}N(0,\mu_4-\sigma_0^4)$.
However, when $\beta_0=0$, $\sqrt n\,\hat\beta_n\stackrel{D}{\to}N\!\left(0,\ \sigma_0^2\lim_{n\to\infty}\left(\frac{X_n'X_n}{n}\right)^{-1}\right)$.
The asymptotic distribution of the QMLE $\hat\beta_n$ in this situation is determined by that of $\hat\lambda_n$, as the latter forms the leading term of an asymptotic expansion which converges at the slow rate $\sqrt{n/h_n}$. However, when $\beta_0=0$, this leading term vanishes and $\hat\beta_n$ can converge at the usual $\sqrt n$-rate. The asymptotic distribution of $\hat\sigma_n^2$ has the usual $\sqrt n$-rate of convergence, as expected.
3.4 The Spatial Autoregressive Model with Irrelevant Regressors or Only an Intercept
As $G_nX_n\beta_0$ and $X_n$ are multicollinear under the present consideration, there exists a column vector $c_n$ such that $G_nX_n\beta_0=X_nc_n$, and the asymptotic distribution of $\hat\beta_n$ in (3.16) can be rewritten as
$$\sqrt{\frac{n}{h_n}}(\hat\beta_n-\beta_0)\stackrel{D}{\to}N\!\left(0,\ \Sigma_{\lambda\lambda}^{-1}\cdot\lim_{n\to\infty}c_nc_n'\right).\qquad(3.16')$$
If some components of $c_n$ are zero, the corresponding diagonal entries of the limiting variance matrix will be zero and the limiting distributions of the corresponding component estimates in $\hat\beta_n$ will be degenerate. In that case, those component estimates may converge at a rate faster than $\sqrt{n/h_n}$, while the estimates of the remaining components will converge at the $\sqrt{n/h_n}$-rate. A concrete example is a mixed regressive model where all the included spatially varying regressors are irrelevant, i.e., the unknown coefficients of the spatially varying regressors in $X_n$ are zero. Let $X_n=(l_n,X_{2n})$ and $\beta=(\beta_1,\beta_2')'$ be the conformable partition. The model with all spatial regressors being irrelevant has the true parameter (sub)vector $\beta_{02}$ of $\beta_2$ equal to zero. A spatial autoregressive model with only an intercept but no other spatial regressors corresponds to the same model but with $\beta_{02}=0$ known and imposed in estimation.
In the irrelevant-regressors case, one estimates both of the unknown parameters $\beta_1$ and $\beta_2$. Because $X_n\beta_0=\beta_{01}l_n$, $G_nX_n\beta_0=\beta_{01}G_nl_n$. If $G_nl_n$ is not linearly dependent on $l_n$, then $l_n$ and $G_nl_n\beta_{01}$ can be distinguished as regressors in (2.2)'. In that case, the results in Proposition 2.2 and Theorems 2.2 and 2.4 are applicable and the QMLE $\hat\theta_n$ of all the parameters in the model can be $\sqrt n$-consistent. In the event that $G_nl_n$ and $l_n$ are multicollinear but $h_n$ is a bounded sequence, Proposition 2.2 and Theorems 2.3 and 2.4 are applicable and $\hat\theta_n$ is still $\sqrt n$-consistent.

The irregular case occurs when $G_nl_n$ and $l_n$ are multicollinear and $\lim_{n\to\infty}h_n=\infty$; it is the concern of this section. If $\beta_{01}$ were zero, this would correspond to $\beta_0=0$, which is covered by the last part of Theorem 3.3. Thus, for the subsequent analysis, it is implicitly assumed that $\beta_{01}\ne0$. In general, consider $c_n=(c_{1n}',c_{2n}')'$, where $c_{1n}\ne0$ but $c_{2n}=0$. Conformably, let $X_n=(X_{1n},X_{2n})$ and $\beta=(\beta_1',\beta_2')'$. For the model with all spatial regressors being irrelevant and the weights matrix being row-normalized, as $W_{n,n}l_n=l_n$ and $G_nl_n=\frac{l_n}{1-\lambda_0}$, $G_nX_n\beta_0=\frac{\beta_{01}}{1-\lambda_0}l_n$ and, hence, $c_{1n}=\frac{\beta_{01}}{1-\lambda_0}$ and $c_{2n}=0$. It follows from (3.16) that
$$\hat\beta_n-\beta_0=(X_n'X_n)^{-1}X_n'V_n-(\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0+O_P\!\left(\frac{\sqrt{h_n}}{n}\right)=(X_n'X_n)^{-1}X_n'V_n-(\hat\lambda_n-\lambda_0)c_n+O_P\!\left(\frac{\sqrt{h_n}}{n}\right).\qquad(3.17)$$
As $c_{1n}\ne0$ but $c_{2n}=0$, (3.17) suggests that $\hat\beta_{n1}$ may be affected by the limiting distribution of $\hat\lambda_n$ but $\hat\beta_{n2}$ will not. When $\lim_{n\to\infty}h_n=\infty$, it will be shown that $\hat\lambda_n$ and $\hat\beta_{n1}$ have a different rate of convergence from that of $\hat\beta_{n2}$. Let $M_{1n}=I_n-X_{1n}(X_{1n}'X_{1n})^{-1}X_{1n}'$ and $M_{2n}=I_n-X_{2n}(X_{2n}'X_{2n})^{-1}X_{2n}'$. Using a partitioned inverse for $(X_n'X_n)^{-1}$, $\hat\beta_{n1}-\beta_{01}=(X_{1n}'M_{2n}X_{1n})^{-1}X_{1n}'M_{2n}V_n-c_{1n}(\hat\lambda_n-\lambda_0)+O_P(\frac{\sqrt{h_n}}{n})$, and $\hat\beta_{n2}-\beta_{02}=(X_{2n}'M_{1n}X_{2n})^{-1}X_{2n}'M_{1n}V_n+O_P(\frac{\sqrt{h_n}}{n})$. It follows that
$$\sqrt{\frac{n}{h_n}}(\hat\beta_{n1}-\beta_{01})=\frac{1}{\sqrt{h_n}}\left(\frac1nX_{1n}'M_{2n}X_{1n}\right)^{-1}\frac{1}{\sqrt n}X_{1n}'M_{2n}V_n-c_{1n}\cdot\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)+O_P\!\left(\frac{1}{\sqrt n}\right)=-c_{1n}\cdot\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)+O_P\!\left(\frac{1}{\sqrt{h_n}}\right),\qquad(3.18)$$
and
$$\sqrt n(\hat\beta_{n2}-\beta_{02})=\left(\frac1nX_{2n}'M_{1n}X_{2n}\right)^{-1}\cdot\frac{1}{\sqrt n}X_{2n}'M_{1n}V_n+O_P\!\left(\sqrt{\frac{h_n}{n}}\right).\qquad(3.19)$$
This indicates that $\hat\beta_{n2}$ has the usual $\sqrt n$-rate of convergence regardless of whether $h_n$ is divergent. As $\hat\lambda_n$ has only a $\sqrt{n/h_n}$-rate of convergence, $\hat\beta_{n2}$ has a faster rate of convergence than $\hat\lambda_n$ when $\lim_{n\to\infty}h_n=\infty$. These results are summarized in the following theorem.
Theorem 3.4 Under the assumed regularity conditions for the model (2.1) with $G_nX_n\beta_0=X_{1n}c_{1n}$, where $X_n=(X_{1n},X_{2n})$, when $\lim_{n\to\infty}h_n=\infty$, $\sqrt{\frac{n}{h_n}}(\hat\beta_{n1}-\beta_{01})\stackrel{D}{\to}N(0,\Sigma_{\lambda\lambda}^{-1}c_1c_1')$, where $c_1=\lim_{n\to\infty}c_{1n}$, but $\sqrt n(\hat\beta_{n2}-\beta_{02})\stackrel{D}{\to}N(0,\ \sigma_0^2(\lim_{n\to\infty}\frac1nX_{2n}'M_{1n}X_{2n})^{-1})$.
It follows that, for the regressive model where the included spatially varying regressors are all irrelevant and the intercept $\beta_{01}\ne0$, when $\lim_{n\to\infty}h_n=\infty$, $\hat\beta_{n1}$ has the same low rate of convergence as $\hat\lambda_n$ and its limiting distribution is determined by that of $\hat\lambda_n$. But $\hat\beta_{n2}$ will converge to its true value, zero, at the usual $\sqrt n$-rate. These results are for the regressive model (2.1) where $\beta_{02}=0$ but the constraint $\beta_{02}=0$ is not imposed in the estimation. One may also consider the situation where the constraint is correctly imposed in the estimation. With the constraint $\beta_{02}=0$ imposed, the model for estimation is a spatial autoregressive model with an unknown intercept:
Yn = β1ln + λWn,nYn + Vn. (3.20)
The unknown parameters for estimation are $\beta_1$, $\lambda$ and $\sigma^2$. Given a $\lambda$, the QMLE of $\beta_1$ is
$$\hat\beta_{n1}(\lambda)=\frac1n l_n'S_n(\lambda)Y_n\qquad(3.21)$$
and the estimate of $\sigma^2$ is $\hat\sigma_n^2(\lambda)=\frac1nY_n'S_n'(\lambda)M_{1n}S_n(\lambda)Y_n$, where $M_{1n}=I_n-\frac{l_nl_n'}{n}$. The concentrated log likelihood function of $\lambda$ is the one in (2.11) with $M_n$ in (2.10) replaced by $M_{1n}$. Because $M_{1n}$ can be regarded as a special case of $M_n$ (with $X_n=l_n$), the results in Theorems 3.1 and 3.2 are applicable to the restricted estimate $\hat\lambda_n$. The same is true for the estimate $\hat\sigma_n^2$ with the results in Theorem 3.3. From (3.17) with $X_n=l_n$, $\hat\beta_{n1}-\beta_{01}=\frac1nl_n'V_n-c_{1n}(\hat\lambda_n-\lambda_0)+O_P(\frac{\sqrt{h_n}}{n})$. Therefore,
$$\sqrt{\frac{n}{h_n}}(\hat\beta_{n1}-\beta_{01})=\frac{1}{\sqrt{h_n}}\cdot\frac{l_n'V_n}{\sqrt n}-c_{1n}\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)+O_P\!\left(\frac{1}{\sqrt n}\right)=-c_{1n}\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)+O_P\!\left(\frac{1}{\sqrt{h_n}}\right),\qquad(3.22)$$
which is the same as (3.18). Thus, we can conclude that the results in Theorems 3.2, 3.3 and 3.4 hold also for the restricted parameter estimates $\hat\lambda_n$, $\hat\beta_{n1}$ and $\hat\sigma_n^2$.
4. Inconsistency When $h_n = O(n)$

The preceding results are derived for cases with $\lim_{n\to\infty}h_n/n=0$; that is, if $h_n$ is a divergent sequence, it diverges to infinity at a rate slower than $n$. In this section, we provide an example which shows that the QMLE $\hat\theta_n$ cannot be consistent if $h_n$ has the order of $n$.
Consider the weights matrix $W_{n,n}=\frac{1}{n-1}(l_nl_n'-I_n)$, which would be the weights matrix in Case (1991, 1992) when sample data are collected from only a single district. In this case, $h_n=n-1$ is of order $O(n)$. It can be shown that the QMLE $\hat\lambda_n$ cannot be a consistent estimator of $\lambda_0$, and there are consequences for the estimation of the regression coefficients. For simplicity, we consider the case that $\sigma_0^2$ is known and $\sigma_0^2=1$. With this $W_{n,n}$,
$$S_n^{-1}(\lambda)=\left(1+\frac{\lambda}{n-1}\right)^{-1}\left(I_n+\frac{\lambda}{(1-\lambda)}\frac{l_nl_n'}{n-1}\right)\quad\text{and}\quad W_{n,n}S_n^{-1}(\lambda)=\frac{1}{n-1+\lambda}\left(\frac{l_nl_n'}{1-\lambda}-I_n\right).$$
It follows that $G_nX_n\beta_0=\frac{n}{n-1+\lambda_0}\left(\frac{l_n'X_n\beta_0}{(1-\lambda_0)n}l_n-\frac{X_n\beta_0}{n}\right)$, which is multicollinear with $X_n$.
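The closed forms above are easy to verify numerically. The following sketch (`numpy` assumed; the check itself is not part of the paper) builds the single-district weights matrix, confirms the form of $S_n^{-1}(\lambda)$, the trace of $G_n$, and that $G_nX_n\beta_0$ falls in the column space of $X_n$, so that $M_nG_nX_n\beta_0=0$.

```python
import numpy as np

n, lam0 = 20, 0.3
l = np.ones(n)
W = (np.outer(l, l) - np.eye(n)) / (n - 1)      # Case design: h_n = n - 1
S = np.eye(n) - lam0 * W

# closed form of S^{-1}(lambda_0) stated in the text
S_inv = (1 + lam0/(n - 1))**-1 * (
    np.eye(n) + lam0/((1 - lam0)*(n - 1)) * np.outer(l, l))

G = W @ S_inv                                   # G_n = W_{n,n} S_n^{-1}
rng = np.random.default_rng(1)
X = np.column_stack([l, rng.normal(size=(n, 2))])   # intercept + regressors
beta0 = np.array([1.0, -0.5, 2.0])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # projector M_n
tr_G_formula = n/(n - 1 + lam0) * lam0/(1 - lam0)   # tr(G_n) from (4.4) below
```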
As $\sigma_0^2=1$ is known, the unknown parameter vector is $\delta=(\beta',\lambda)'$ and the corresponding log likelihood function is $\ln L_n(\delta)=-\frac n2\ln(2\pi)+\ln|S_n(\lambda)|-\frac{V_n'(\delta)V_n(\delta)}{2}$. Given $\lambda$, the QMLE of $\beta_0$ is $\hat\beta_n(\lambda)=(X_n'X_n)^{-1}X_n'S_n(\lambda)Y_n$ and the concentrated log likelihood function of $\lambda$ is
$$\ln L_n(\lambda)=-\frac n2\ln(2\pi)+\ln|S_n(\lambda)|-\frac12Y_n'S_n'(\lambda)M_nS_n(\lambda)Y_n.\qquad(4.1)$$
Because $M_nG_nX_n\beta_0=0$, the first order derivative of (4.1) is
$$\frac{\partial\ln L_n(\lambda)}{\partial\lambda}=-\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda))+Y_n'S_n'(\lambda)M_nW_{n,n}Y_n=-\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda))+V_n'M_nG_nV_n+V_n'G_n'M_nG_nV_n(\lambda_0-\lambda),\qquad(4.2)$$
and the second order derivative is
$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2}=-\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^2]-V_n'G_n'M_nG_nV_n.\qquad(4.3)$$
At $\lambda_0$, because $\mathrm{tr}(G_n)=\frac{n}{n-1+\lambda_0}\left(\frac{\lambda_0}{1-\lambda_0}\right)$ and $M_nG_n=-\frac{M_n}{n-1+\lambda_0}$,
$$\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}=-\mathrm{tr}(G_n)+V_n'M_nG_nV_n=-\frac{n}{n-1+\lambda_0}\left(\frac{\lambda_0}{1-\lambda_0}\right)-\frac{1}{n-1+\lambda_0}V_n'M_nV_n.\qquad(4.4)$$
As $\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^2]=\left(\frac{n}{n-1+\lambda}\right)^2\left[\frac{1-2(1-\lambda)/n}{(1-\lambda)^2}+\frac1n\right]$ and $G_n'M_nG_n=M_n/(n-1+\lambda_0)^2$, one has
$$\frac{\partial^2\ln L_n(\lambda)}{\partial\lambda^2}=-\left(\frac{n}{n-1+\lambda}\right)^2\left[\frac{1-2(1-\lambda)/n}{(1-\lambda)^2}+\frac1n\right]-\frac{V_n'M_nV_n}{(n-1+\lambda_0)^2}.\qquad(4.5)$$
By the mean value theorem, $\hat\lambda_n=\lambda_0-\left(\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\right)^{-1}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}$, where $\tilde\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$. Suppose $\hat\lambda_n$ were consistent; we shall show that there would be a contradiction. If $\hat\lambda_n$ were consistent, it would imply that $\tilde\lambda_n\stackrel{p}{\to}\lambda_0$ and, hence, $\frac{\partial^2\ln L_n(\tilde\lambda_n)}{\partial\lambda^2}\stackrel{p}{\to}-\frac{1}{(1-\lambda_0)^2}$. As $\frac1nV_n'M_nV_n=\frac1nV_n'V_n+o_P(1)\stackrel{p}{\to}1$, $\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda}\stackrel{p}{\to}-\frac{\lambda_0}{1-\lambda_0}-1=-\frac{1}{1-\lambda_0}$. Consequently, $\hat\lambda_n\stackrel{p}{\to}\lambda_0-(1-\lambda_0)=-(1-2\lambda_0)$, which would not equal $\lambda_0$ in general, a contradiction.7 So, $\hat\lambda_n$ could not be a consistent estimator.
Let $\hat\beta_{n,L}=(X_n'X_n)^{-1}X_n'Y_n$ be the OLSE of $\beta$. As $(X_n'X_n)^{-1}X_n'W_{n,n}=\frac{1}{n-1}(e_{k,1}l_n'-(X_n'X_n)^{-1}X_n')$, where $e_{k,1}$ is the first unit vector of the $k$-dimensional Euclidean space,
$$\hat\beta_n=(X_n'X_n)^{-1}X_n'S_n(\hat\lambda_n)Y_n=(X_n'X_n)^{-1}X_n'S_nY_n-(\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'W_{n,n}Y_n=\beta_0+(X_n'X_n)^{-1}X_n'V_n-(\hat\lambda_n-\lambda_0)\left[\frac{l_n'Y_n}{n-1}e_{k,1}-\frac{\hat\beta_{n,L}}{n-1}\right].\qquad(4.6)$$
Because $l_n'S_n^{-1}=\left(1+\frac{\lambda_0}{n-1}\right)^{-1}\left(1+\frac{n\lambda_0}{(n-1)(1-\lambda_0)}\right)l_n'$,
$$\frac{l_n'Y_n}{n-1}=\left(1+\frac{\lambda_0}{n-1}\right)^{-1}\left(1+\frac{n\lambda_0}{(n-1)(1-\lambda_0)}\right)\left[\frac{l_n'X_n\beta_0}{n-1}+\frac{l_n'V_n}{n-1}\right]\stackrel{p}{\to}\frac{1}{1-\lambda_0}\mu_{xb},$$
where $\mu_{xb}=\lim_{n\to\infty}l_n'X_n\beta_0/n$. For the OLSE, because $(X_n'X_n)^{-1}X_n'S_n^{-1}X_n\beta_0=\left(1+\frac{\lambda_0}{n-1}\right)^{-1}\left[\beta_0+\left(\frac{\lambda_0}{1-\lambda_0}\right)\frac{l_n'X_n\beta_0}{n-1}e_{k,1}\right]$,
$$\hat\beta_{n,L}=(X_n'X_n)^{-1}X_n'(S_n^{-1}X_n\beta_0+S_n^{-1}V_n)=\left(1+\frac{\lambda_0}{n-1}\right)^{-1}\left[\beta_0+\left(\frac{\lambda_0}{1-\lambda_0}\right)\frac{l_n'X_n\beta_0}{n-1}e_{k,1}\right]+o_P(1)\stackrel{p}{\to}\beta_0+\left(\frac{\lambda_0}{1-\lambda_0}\right)\mu_{xb}e_{k,1}.\qquad(4.7)$$
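The deterministic part of the OLSE limit in (4.7) rests on the identity $(X_n'X_n)^{-1}X_n'S_n^{-1}X_n\beta_0=(1+\frac{\lambda_0}{n-1})^{-1}[\beta_0+(\frac{\lambda_0}{1-\lambda_0})\frac{l_n'X_n\beta_0}{n-1}e_{k,1}]$, which the following sketch verifies numerically (`numpy` assumed; intercept in the first column of $X_n$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, lam0 = 30, 3, 0.4
l = np.ones(n)
X = np.column_stack([l, rng.normal(size=(n, k - 1))])
b0 = rng.normal(size=k)
W = (np.outer(l, l) - np.eye(n)) / (n - 1)          # single-district weights
S_inv = np.linalg.inv(np.eye(n) - lam0 * W)
lhs = np.linalg.solve(X.T @ X, X.T @ (S_inv @ (X @ b0)))
e1 = np.zeros(k); e1[0] = 1.0                       # e_{k,1}
rhs = (1 + lam0/(n - 1))**-1 * (
    b0 + lam0/(1 - lam0) * (l @ (X @ b0))/(n - 1) * e1)
```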
Hence,
$$\hat\beta_n=\beta_0-(\hat\lambda_n-\lambda_0)\left[\frac{\mu_{xb}}{1-\lambda_0}e_{k,1}+o_P(1)\right]+O_P\!\left(\frac{1}{\sqrt n}\right).\qquad(4.8)$$
So, as $\hat\lambda_n-\lambda_0$ does not go to zero in probability, $\hat\beta_n$ will not converge to $\beta_0$; that is, $\hat\beta_n$ is not consistent. If $\hat\lambda_n-\lambda_0$ were stochastically bounded, it could be seen from (4.8) that the inconsistency of $\hat\beta_n$ would fall mainly on the estimate of the intercept term $\beta_{01}$, and the estimates $\hat\beta_{2n}$ of the coefficients $\beta_{02}$ could be consistent. Indeed, this is the case for the OLSE $\hat\beta_{n,L}$ in (4.7).
However, $\hat\lambda_n$ might not be stochastically bounded, and then even the QMLE $\hat\beta_{2n}$ could be problematic. This may be seen as follows. $\hat\lambda_n$ is supposed to solve the likelihood equation $\frac{\partial\ln L_n(\lambda)}{\partial\lambda}=0$ from (4.2). As $M_nW_{n,n}=-\frac{M_n}{n-1}$, $M_nS_n^{-1}=\left(1+\frac{\lambda_0}{n-1}\right)^{-1}M_n$ and $M_nS_n^{-1}X_n=0$, $Y_n'M_nW_{n,n}Y_n=-\frac{1}{n-1}Y_n'M_nY_n=-\left(1+\frac{\lambda_0}{n-1}\right)^{-2}\frac{V_n'M_nV_n}{n-1}\stackrel{p}{\to}-1$ and $Y_n'W_{n,n}'M_nW_{n,n}Y_n=\frac{1}{(n-1+\lambda_0)^2}V_n'M_nV_n=o_P(1)$. It follows from (4.2) that
$$\frac{\partial\ln L_n(\lambda)}{\partial\lambda}=-\left(\frac{\lambda}{1-\lambda}\right)\frac{n}{n-1+\lambda}+Y_n'M_nW_{n,n}Y_n-\lambda Y_n'W_{n,n}'M_nW_{n,n}Y_n\stackrel{p}{\to}D_\infty(\lambda),$$
where $D_\infty(\lambda)=-\frac{\lambda}{1-\lambda}-1=-\frac{1}{1-\lambda}$, uniformly in any bounded set of $\lambda$ bounded away from 1. For $\lambda<1$, $D_\infty(\lambda)<0$ and, for $\lambda>1$, $D_\infty(\lambda)>0$. $D_\infty(\lambda)$ is a strictly decreasing function on each of the regions $\lambda<1$ and $\lambda>1$. $D_\infty(\lambda)\ne0$ for any finite $\lambda$, but $\lim_{\lambda\to\pm\infty}D_\infty(\lambda)=0$. Thus, either $\hat\lambda_n$ does not exist or $\hat\lambda_n$ may diverge to infinity for large $n$.

7 For the pure spatial autoregressive model, a similar argument leads to the convergence of $\hat\lambda_n$ to a nondegenerate random variable instead of a constant. This is so because of the presence of $M_n$ in (4.4) and (4.5) in the mixed regressive model.
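A quick numeric sketch of the limit function (illustration only; `numpy` assumed) makes the nonexistence of a consistent root visible: $D_\infty(\lambda)$ is negative for $\lambda<1$, positive for $\lambda>1$, strictly decreasing on each region, and approaches zero only as $|\lambda|\to\infty$.

```python
import numpy as np

def D_inf(lam):
    # limiting score of the concentrated likelihood in the single-district design
    return -lam/(1.0 - lam) - 1.0

left = D_inf(np.linspace(-5.0, 0.9, 60))       # region lambda < 1
right = D_inf(np.linspace(1.1, 8.0, 60))       # region lambda > 1
```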
In the above example, the sample is obtained from a single district; increasing $n$ means adding spatial units within the same district. That corresponds to the notion of infill asymptotics (Cressie 1993, p. 101). The example shows that the QMLE under infill asymptotics alone may not be consistent. If there are many separate districts from which samples are obtained, the QMLEs can be consistent if the number of districts, i.e., $n/h_n$, increases to infinity. The latter scenario corresponds to the notion of increasing-domain asymptotics (Cressie 1993, p. 100). Consistency of the QMLE can thus be achieved with increasing-domain asymptotics. From our results, the rate of convergence of the QMLE under increasing-domain asymptotics alone can be the usual $\sqrt n$-rate. But, when both infill and increasing-domain asymptotics are operating, the rates of convergence of the QMLEs of the various parameters can be different and some of them may have a slower rate than the usual one. These results were first observed for the pure spatial autoregressive process in Lee (2001). Here, we have demonstrated that, even though the presence of relevant spatially varying regressors plays a distinctive role in the identification and consistent estimation of the mixed regressive model, the requirement of increasing-domain asymptotics is still needed regardless of the regressors.
5. Conclusions

In this paper, we have investigated the asymptotic distributions of the QMLE of a (normal) likelihood function for mixed regressive, spatial autoregressive models. Our analysis reveals that asymptotic properties of the QMLE depend on some important features of the spatial weights matrix. We have considered the distinctive cases where each spatial unit can be influenced by only a few neighboring units, and where each unit can be influenced by many spatial units whose individual weights are uniformly small.

The mixed regressive model differs from a pure spatial autoregressive process in the presence of regressors. In the presence of relevant spatial regressors in the model, the regressors play the dominant role in the identification of the spatial interaction and regression coefficients. In models without spatially varying regressors but with only a constant intercept term, a spatial weights matrix (without row normalization) might interact with the constant term to generate a spatially varying regressor for identification. However, if the spatial weights matrix is row-normalized in the formulation of the spatial model, it will not interact with the constant term to generate a spatially varying regressor. In the presence of relevant spatially varying regressors (generated or not), the QMLEs of all the parameters in the mixed regressive model can be $\sqrt n$-consistent and asymptotically normal, regardless of whether a spatial unit is influenced by a few or many neighbors.

If the spatial regressors are irrelevant, or cannot be generated from the interaction of the spatial weights matrix with the included constant term, the spatial interaction parameter can be identified only through the correlation of spatial units. The QMLE can still be $\sqrt n$-consistent in the scenario where each spatial unit is influenced by only a few neighboring units. For the case where each spatial unit is influenced by many neighboring units, the QMLEs of the various parameters in the model may have different rates of convergence. The QMLEs of the regression coefficients of the irrelevant spatially varying regressors may still converge at the $\sqrt n$-rate to their true parameter values, namely zero, but the QMLEs of the spatial interaction parameter and the constant intercept term will converge at a slower rate.8

In the spatial scenario introduced by Case (1991, 1992), the requirement that $h_n/n$ goes to zero corresponds to the notion of increasing-domain asymptotics. Our analysis illustrates that the consistency and asymptotic distributions of the parameter estimates are subject to the requirement of increasing-domain asymptotics, regardless of the presence or absence of relevant spatial regressors.
8 The latter slow rate of convergence occurs also for the QMLE of the spatial interaction parameter in the pure spatial autoregressive process (Lee, 2001).
Appendix I: Some Useful Lemmas
In this Appendix, the assumption that the elements $w_{n,ij}$ of the weights matrix $W_{n,n}$ are of order $O(\frac{1}{h_n})$ uniformly in $i$ and $j$ will be maintained. In general, $W_{n,n}$ may not be row-normalized. The following lemmas are mostly relevant for the regressive model. The first three lemmas have been established in Lee (2001); they are listed here for ease of reference. Additional lemmas needed for our subsequent proofs can be found in Lee (2001).
Lemma 1. Suppose that the column sums of $A_n$ are uniformly bounded.
1) $w_{i,n}A_ne_{nj}=O(\frac{1}{h_n})$ and $w_{n,i}'A_ne_{nj}=O(\frac{1}{h_n})$ uniformly in $i$ and $j$, where $e_{nj}$ is the $j$th unit column vector of the $n$-dimensional Euclidean space and $w_{i,n}$ and $w_{n,i}$ are, respectively, the $i$th row and column of $W_{n,n}$.
2) $w_{i,n}A_nb_{j,n}'=O(\frac{1}{h_n})$ and $w_{n,i}'A_nb_{j,n}'=O(\frac{1}{h_n})$ uniformly in $i$ and $j$, where $b_{j,n}$ is the $j$th row of $B_n$, when $B_n$ is uniformly bounded in row sums.
Proof: This is Lemma A.7 in Lee (2001). Q.E.D.
Lemma 2. Suppose that $W_{n,n}$ and $S_n^{-1}$ are uniformly bounded in row sums, and $A_n$ is uniformly bounded in column sums. Then the matrix $H_n=G_n'A_n$ has the properties that $e_{ni}'H_n^me_{nj}\,(=e_{nj}'H_n'^me_{ni})=O(\frac{1}{h_n})$ uniformly in $i$ and $j$, and $\mathrm{tr}(H_n^m)=\mathrm{tr}(H_n'^m)=O(\frac{n}{h_n})$, for any integer $m\ge1$.
Furthermore, if $W_{n,n}$, $S_n^{-1}$ and $A_n$ are uniformly bounded in both row and column sums, then $e_{ni}'(H_nH_n')^me_{nj}=O(\frac{1}{h_n})$ uniformly in $i$ and $j$, and $\mathrm{tr}[(H_nH_n')^m]=\mathrm{tr}[(H_n'H_n)^m]=O(\frac{n}{h_n})$. In addition, $e_{ni}'(G_n'G_n)^me_{nj}=O(\frac{1}{h_n})$.
Proof: This is Lemma A.8 in Lee (2001). Q.E.D.
Lemma 3. Assume that $W_{n,n}$, $S_n^{-1}$ and $A_n$ are uniformly bounded in both row and column sums and $v_1,\dots,v_n$ are i.i.d. with zero mean and finite variance $\sigma^2$. Let $H_n=G_n'A_n$. Then $E(V_n'H_nV_n)=O(\frac{n}{h_n})$, $\mathrm{var}(V_n'H_nV_n)=O(\frac{n}{h_n})$, and $V_n'H_nV_n=O_P(\frac{n}{h_n})$. Furthermore, if $\lim_{n\to\infty}\frac{h_n}{n}=0$, then $\frac{h_n}{n}V_n'H_nV_n-\frac{h_n}{n}E(V_n'H_nV_n)=o_P(1)$.
Proof: This is a part of Lemma A.12 in Lee (2001). Q.E.D.
Lemma 4. Suppose that the elements of the $n\times k$ matrices $X_n$ are uniformly bounded for all $n$, and $\lim_{n\to\infty}\frac1nX_n'X_n$ exists and the limiting matrix is nonsingular. Then the projectors $M_n$ and $(I_n-M_n)$, where $M_n=I_n-X_n(X_n'X_n)^{-1}X_n'$, are uniformly bounded in both row and column sums.
Proof: Let $B_n=(\frac1nX_n'X_n)^{-1}$. From the assumption of the lemma, $B_n$ converges to a finite limit. Therefore, there exists a constant $c_b$ such that $|b_{n,ij}|\le c_b$ for all $n$, where $b_{n,ij}$ is the $(i,j)$th element of $B_n$. By the uniform boundedness of $X_n$, there exists a constant $c_x$ such that $|x_{n,ij}|\le c_x$ for all $n$. Let $A_n=\frac1nX_n(\frac{X_n'X_n}{n})^{-1}X_n'$, which can be rewritten as $A_n=\frac1n\sum_{s=1}^k\sum_{r=1}^kb_{n,rs}x_{n,r}x_{n,s}'$, where $x_{n,r}$ is the $r$th column of $X_n$. It follows that $\sum_{j=1}^n|a_{n,ij}|=\sum_{j=1}^n|\frac1n\sum_{s=1}^k\sum_{r=1}^kb_{n,rs}x_{n,ir}x_{n,js}|\le k^2c_bc_x^2$ for all $i=1,\dots,n$. Similarly, $\sum_{i=1}^n|a_{n,ij}|=\sum_{i=1}^n|\frac1n\sum_{s=1}^k\sum_{r=1}^kb_{n,rs}x_{n,ir}x_{n,js}|\le k^2c_bc_x^2$ for all $j=1,\dots,n$. That is, $(I_n-M_n)=X_n(X_n'X_n)^{-1}X_n'$ is uniformly bounded in both row and column sums. Consequently, $M_n$ is also uniformly bounded in both row and column sums. Q.E.D.
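The bound in the proof is concrete enough to check numerically. The following sketch (`numpy` assumed; illustration only) verifies that every row and column sum of absolute values of $X_n(X_n'X_n)^{-1}X_n'$ stays below $k^2c_bc_x^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, k - 1))])
P = X @ np.linalg.solve(X.T @ X, X.T)            # I_n - M_n
c_x = np.abs(X).max()                            # uniform bound on |x_{n,ij}|
c_b = np.abs(np.linalg.inv(X.T @ X / n)).max()   # bound on |b_{n,ij}|
row_sums = np.abs(P).sum(axis=1)                 # sum_j |a_{n,ij}|
col_sums = np.abs(P).sum(axis=0)                 # sum_i |a_{n,ij}|
bound = k**2 * c_b * c_x**2
```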
Lemma 5. Suppose that $A_n$ is uniformly bounded in both row and column sums, the elements of the $n\times k$ matrices $X_n$ are uniformly bounded, and $\lim_{n\to\infty}\frac{X_n'X_n}{n}$ exists and is nonsingular. Then, with $M_n=I_n-X_n(X_n'X_n)^{-1}X_n'$,
(i) $\mathrm{tr}(M_nA_n)=\mathrm{tr}(A_n)+O(1)$,
(ii) $\mathrm{tr}(A_n'M_nA_n)=\mathrm{tr}(A_n'A_n)+O(1)$,
(iii) $\mathrm{tr}[(M_nA_n)^2]=\mathrm{tr}(A_n^2)+O(1)$, and
(iv) $\mathrm{tr}[(A_n'M_nA_n)^2]=\mathrm{tr}[(M_nA_nA_n')^2]=\mathrm{tr}[(A_nA_n')^2]+O(1)$.
Furthermore, if $A_{n,ij}=O(\frac{1}{h_n})$ for all $i$ and $j$, then
(a) $\mathrm{tr}^2(M_nA_n)=\mathrm{tr}^2(A_n)+O(\frac{n}{h_n})$ and
(b) $\sum_{i=1}^n(M_nA_n)_{ii}^2=\sum_{i=1}^nA_{n,ii}^2+O(\frac{1}{h_n})$.
Proof: The assumptions imply that the elements of the $k\times k$ matrix $(\frac1nX_n'X_n)^{-1}$ are uniformly bounded for large enough $n$. Lemma A.5 in Lee (2001) implies that the elements of the $k\times k$ matrices $\frac1nX_n'A_nX_n$, $\frac1nX_n'A_nA_n'X_n$ and $\frac1nX_n'A_n^2X_n$ are also uniformly bounded. It follows that
$$\mathrm{tr}(M_nA_n)=\mathrm{tr}(A_n)-\mathrm{tr}[(X_n'X_n)^{-1}X_n'A_nX_n]=\mathrm{tr}(A_n)+O(1),$$
$$\mathrm{tr}(A_n'M_nA_n)=\mathrm{tr}(A_n'A_n)-\mathrm{tr}[(X_n'X_n)^{-1}X_n'A_nA_n'X_n]=\mathrm{tr}(A_n'A_n)+O(1),$$
and $\mathrm{tr}[(M_nA_n)^2]=\mathrm{tr}(A_n^2)-2\,\mathrm{tr}[(X_n'X_n)^{-1}X_n'A_n^2X_n]+\mathrm{tr}([(X_n'X_n)^{-1}X_n'A_nX_n]^2)=\mathrm{tr}(A_n^2)+O(1)$. By (iii), $\mathrm{tr}[(A_n'M_nA_n)^2]=\mathrm{tr}[(M_nA_nA_n')^2]=\mathrm{tr}[(A_nA_n')^2]+O(1)$.
When $A_{n,ij}=O(\frac{1}{h_n})$, from (i), $\mathrm{tr}^2(M_nA_n)=(\mathrm{tr}(A_n)+O(1))^2=\mathrm{tr}^2(A_n)+2\,\mathrm{tr}(A_n)\cdot O(1)+O(1)=\mathrm{tr}^2(A_n)+O(\frac{n}{h_n})$. Because $A_n$ is uniformly bounded in column sums and the elements of $X_n$ are uniformly bounded, $X_n'A_ne_{ni}=O(1)$ for all $i$. Hence, $\sum_{i=1}^n(M_nA_n)_{ii}^2=\sum_{i=1}^n(A_{n,ii}-x_{i,n}(X_n'X_n)^{-1}X_n'A_ne_{ni})^2=\sum_{i=1}^n(A_{n,ii}+O(\frac1n))^2=\sum_{i=1}^n[A_{n,ii}^2+2A_{n,ii}\cdot O(\frac1n)+O(\frac1{n^2})]=\sum_{i=1}^nA_{n,ii}^2+O(\frac{1}{h_n})$. Q.E.D.
Appendix II: Proofs of Theorems
Proof of Proposition 2.1 As $W_{n,n}Y_n=G_nX_n\beta_0+G_nV_n$,
$$Y_n'W_{n,n}'W_{n,n}Y_n-E(Y_n'W_{n,n}'W_{n,n}Y_n)=2(G_nX_n\beta_0)'G_nV_n+V_n'G_n'G_nV_n-E(V_n'G_n'G_nV_n),$$
$$Y_n'W_{n,n}'V_n-E(Y_n'W_{n,n}'V_n)=(G_nX_n\beta_0)'V_n+V_n'G_n'V_n-E(V_n'G_n'V_n),$$
and $X_n'W_{n,n}Y_n-E(X_n'W_{n,n}Y_n)=X_n'G_nV_n$. Because
$$V_n'(\delta)V_n(\delta)=(\beta-\beta_0)'X_n'X_n(\beta-\beta_0)+(\lambda-\lambda_0)^2Y_n'W_{n,n}'W_{n,n}Y_n+V_n'V_n+2(\lambda-\lambda_0)(\beta-\beta_0)'X_n'W_{n,n}Y_n+2(\beta_0-\beta)'X_n'V_n+2(\lambda_0-\lambda)Y_n'W_{n,n}'V_n,$$
one has
$$\frac1n\ln L_n(\theta)-E\!\left(\frac1n\ln L_n(\theta)\right)=-\frac{(\lambda-\lambda_0)^2}{2\sigma^2}\left\{\frac2n(G_nX_n\beta_0)'G_nV_n+\frac1n[V_n'G_n'G_nV_n-E(V_n'G_n'G_nV_n)]\right\}-\frac{1}{2\sigma^2}\cdot\frac1n(V_n'V_n-n\sigma_0^2)-\frac{(\lambda-\lambda_0)(\beta-\beta_0)'}{\sigma^2}\cdot\frac{X_n'G_nV_n}{n}-\frac{(\beta_0-\beta)'}{\sigma^2}\frac{X_n'V_n}{n}-\frac{(\lambda_0-\lambda)}{\sigma^2}\left\{\frac{(G_nX_n\beta_0)'V_n}{n}+\frac1n(V_n'G_n'V_n-E(V_n'G_n'V_n))\right\}.$$
Lemma A.14 in Lee (2001) implies that $\frac{(G_nX_n\beta_0)'G_nV_n}{n}$, $\frac{(G_nX_n\beta_0)'V_n}{n}$ and $\frac{X_n'G_nV_n}{n}$ are of order $O_P(\frac{1}{\sqrt n})$. By the central limit theorems for independent variables, $\frac1n(V_n'V_n-n\sigma_0^2)$ and $\frac1nX_n'V_n$ are of $O_P(\frac{1}{\sqrt n})$. Lemma 3 implies that $\frac1n(V_n'G_n'G_nV_n-E(V_n'G_n'G_nV_n))$ and $\frac1n(V_n'G_n'V_n-E(V_n'G_n'V_n))$ have the order $o_P(\frac{1}{h_n})$. As $\sigma^2$ is bounded away from zero and $\beta$ and $\lambda$ are bounded on $\Theta$, it follows that $\sup_{\theta\in\Theta}|\frac1n\ln L_n(\theta)-E(\frac1n\ln L_n(\theta))|=o_P(1)$. Q.E.D.
Proof of Proposition 2.2 Let $\alpha=(\alpha_1',\alpha_2,\alpha_3)'$ be a column vector of constants such that $\Sigma_\theta\alpha=0$. It is sufficient to show that $\alpha=0$. From the first row block of the linear equation system $\Sigma_\theta\alpha=0$, one has $\lim_{n\to\infty}\frac{X_n'X_n}{n}\alpha_1+\lim_{n\to\infty}\frac{X_n'G_nX_n\beta_0}{n}\alpha_2=0$ and, therefore, $\alpha_1=-\lim_{n\to\infty}(X_n'X_n)^{-1}X_n'G_nX_n\beta_0\cdot\alpha_2$. From the last equation of the linear system, one has $\alpha_3=-2\sigma_0^2\lim_{n\to\infty}\frac{\mathrm{tr}(G_n)}{n}\cdot\alpha_2$. By eliminating $\alpha_1$ and $\alpha_3$, the remaining equation becomes
$$\left\{\lim_{n\to\infty}\frac{1}{n\sigma_0^2}(G_nX_n\beta_0)'M_n(G_nX_n\beta_0)+\lim_{n\to\infty}\frac1n\left[\mathrm{tr}(G_n'G_n)+\mathrm{tr}(G_n^2)-2\frac{\mathrm{tr}^2(G_n)}{n}\right]\right\}\alpha_2=0.$$
Because $\mathrm{tr}(G_nG_n')+\mathrm{tr}(G_n^2)-\frac2n\mathrm{tr}^2(G_n)=\frac12\mathrm{tr}[(C_n'+C_n)(C_n'+C_n)']\ge0$, the assumed conditions imply that $\alpha_2=0$ and, so, $\alpha=0$. Q.E.D.
Proof of Theorem 2.1 As $E(\frac1n\ln L_n(\theta))=-\frac{\ln(2\pi)}{2}-\frac{\ln\sigma^2}{2}+\frac1n\ln|S_n(\lambda)|-\frac{1}{2\sigma^2n}E(V_n'(\delta)V_n(\delta))$, where $E(V_n'(\delta)V_n(\delta))=(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)+\sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})$, its first order derivatives are $\frac{\partial E(\frac1n\ln L_n(\theta))}{\partial\beta}=\frac{1}{\sigma^2n}X_n'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)$,
$$\frac{\partial E(\frac1n\ln L_n(\theta))}{\partial\lambda}=-\frac1n\mathrm{tr}(W_{n,n}S_n^{-1}(\lambda))+\frac{1}{\sigma^2n}[(G_nX_n\beta_0)'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)+\sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)G_n)],$$
and $\frac{\partial E(\frac1n\ln L_n(\theta))}{\partial\sigma^2}=-\frac{1}{2\sigma^2}+\frac{E(V_n'(\delta)V_n(\delta))}{2\sigma^4n}$. At $\theta_0$, $\frac{\partial E(\frac1n\ln L_n(\theta_0))}{\partial\theta}=0$. The second order derivatives are
$$\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\beta\partial\beta'}=-\frac{X_n'X_n}{\sigma^2n},\qquad\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\beta\partial\lambda}=-\frac{1}{\sigma^2n}X_n'G_nX_n\beta_0,$$
$$\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\beta\partial\sigma^2}=-\frac{1}{\sigma^4n}X_n'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta),$$
$$\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\lambda^2}=-\frac1n\mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2)-\frac{1}{\sigma^2n}[(G_nX_n\beta_0)'(G_nX_n\beta_0)+\sigma_0^2\mathrm{tr}(G_n'G_n)],$$
$$\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\sigma^2\partial\lambda}=-\frac{1}{\sigma^4n}[(G_nX_n\beta_0)'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)+\sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)G_n)],$$
and $\frac{\partial^2E(\frac1n\ln L_n(\theta))}{\partial\sigma^2\partial\sigma^2}=\frac{1}{2\sigma^4}-\frac{1}{\sigma^6n}E(V_n'(\delta)V_n(\delta))$.
By Taylor's expansion of $E(\frac1n\ln L_n(\theta))$ at $\theta_0$ and rearrangement,
$$E\!\left(\frac1n\ln L_n(\theta)\right)-E\!\left(\frac1n\ln L_n(\theta_0)\right)=-\frac12\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}\cdot\Sigma_\theta\cdot\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|}\,\|\theta-\theta_0\|^2+\frac12(\theta-\theta_0)'\left[\frac{\partial^2}{\partial\theta\partial\theta'}E\!\left(\frac1n\ln L_n(\bar\theta_n)\right)+\Sigma_\theta\right](\theta-\theta_0),$$
where $\bar\theta_n$ lies between $\theta$ and $\theta_0$. Because $\Sigma_\theta$ is positive definite, there exists a constant $c>0$ such that $\sup_\theta\frac12\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}\cdot(-\Sigma_\theta)\cdot\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|}<-c$. As $\frac1n\mathrm{tr}([W_{n,n}S_n^{-1}(\lambda)]^2)-\frac1n\mathrm{tr}(G_n^2)=\frac2n\mathrm{tr}[(W_{n,n}S_n^{-1}(\bar\lambda_n))^3](\lambda-\lambda_0)$, where $\bar\lambda_n$ lies between $\lambda$ and $\lambda_0$, and $\frac1n\mathrm{tr}[(W_{n,n}S_n^{-1}(\lambda))^3]=O(\frac{1}{h_n})$ uniformly in $\lambda\in\Lambda_1$ by Lemma 2, it follows that $\lim_{n\to\infty}\left[\frac{\partial^2E(\frac1n\ln L_n(\bar\theta_n))}{\partial\theta\partial\theta'}-\frac{\partial^2E(\frac1n\ln L_n(\theta_0))}{\partial\theta\partial\theta'}\right]=0$ whenever $\lim_{n\to\infty}\bar\theta_n=\theta_0$. By Proposition 2.2, $\lim_{n\to\infty}\frac1nE(\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\partial\theta'})=-\Sigma_\theta$, so there exists a neighborhood $\Theta_1$ of $\theta_0$ (contained in $\Lambda_1$) such that $\sup_{\theta\in\Theta_1}\|\frac{\partial^2}{\partial\theta\partial\theta'}E(\frac1n\ln L_n(\theta))+\Sigma_\theta\|\le c/2$ for large enough $n$. Hence,
$$E\!\left(\frac1n\ln L_n(\theta)\right)-E\!\left(\frac1n\ln L_n(\theta_0)\right)\le\frac12\frac{(\theta-\theta_0)'}{\|\theta-\theta_0\|}\cdot(-\Sigma_\theta)\cdot\frac{(\theta-\theta_0)}{\|\theta-\theta_0\|}\,\|\theta-\theta_0\|^2+\frac c4\|\theta-\theta_0\|^2<-\frac c4\|\theta-\theta_0\|^2,$$
that is, the identification uniqueness property holds on $\Theta_1$. Finally, the consistency of $\hat\theta_n$ follows from the identification uniqueness and the uniform convergence $\frac1n[\ln L_n(\theta)-E(\ln L_n(\theta))]\stackrel{p}{\to}0$ in $\theta$ from Proposition 2.1 (see White 1994). Q.E.D.
Proof of Theorem 2.2 The expected average log likelihood of (2.3) is
$$E(\ln L_n(\theta))=-\frac n2\ln(2\pi)-\frac n2\ln\sigma^2+\ln|S_n(\lambda)|-\frac{1}{2\sigma^2}\left\{[S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta]'[S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta]+\sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})\right\}.$$
As $S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta=X_n(\beta_0-\beta)+(\lambda_0-\lambda)G_nX_n\beta_0$, it follows that $\frac1nE(\ln L_n(\theta))-\frac1nE(\ln L_n(\theta_0))=T_{1n}-\frac{1}{2\sigma^2}T_{2n}$, where
$$T_{1n}(\lambda,\sigma^2)=\frac12(\ln\sigma_0^2-\ln\sigma^2)+\frac{\sigma^2-\sigma_0^2}{2\sigma^2}-\frac{\sigma_n^{*2}(\lambda)-\sigma_0^2}{2\sigma^2}+\frac1n(\ln|S_n(\lambda)|-\ln|S_n(\lambda_0)|),$$
with $\sigma_n^{*2}(\lambda)=\frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})$, and $T_{2n}(\beta,\lambda)=\frac1n(\beta'-\beta_0',\lambda-\lambda_0)(X_n,G_nX_n\beta_0)'(X_n,G_nX_n\beta_0)(\beta'-\beta_0',\lambda-\lambda_0)'$.
Consider the pure spatial autoregressive process $Y_n=\lambda W_{n,n}Y_n+V_n$, where $V_n$ is $N(0,\sigma^2I_n)$. The log likelihood function of this process is
$$\ln L_{p,n}(\lambda,\sigma^2)=-\frac n2\ln(2\pi)-\frac n2\ln\sigma^2+\ln|S_n(\lambda)|-\frac{1}{2\sigma^2}Y_n'S_n'(\lambda)S_n(\lambda)Y_n.$$
Let $E_p(\cdot)$ be the expectation operator for $Y_n$ based on this pure spatial autoregressive process. It follows that
$$E_p\!\left(\frac1n\ln L_{p,n}(\lambda,\sigma^2)\right)-E_p\!\left(\frac1n\ln L_{p,n}(\lambda_0,\sigma_0^2)\right)=\frac12(1+\ln\sigma_0^2-\ln\sigma^2)+\frac1n(\ln|S_n(\lambda)|-\ln|S_n(\lambda_0)|)-\frac{\sigma_0^2}{2\sigma^2n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})=\frac12(\ln\sigma_0^2-\ln\sigma^2)+\frac{\sigma^2-\sigma_0^2}{2\sigma^2}+\frac1n(\ln|S_n(\lambda)|-\ln|S_n(\lambda_0)|)-\frac{\sigma_n^{*2}(\lambda)-\sigma_0^2}{2\sigma^2},$$
which equals $T_{1n}(\lambda,\sigma^2)$. By the information inequality, $E_p(\frac1n\ln L_{p,n}(\lambda,\sigma^2))-E_p(\frac1n\ln L_{p,n}(\lambda_0,\sigma_0^2))\le0$. Thus, $T_{1n}(\lambda,\sigma^2)\le0$ for any $(\lambda,\sigma^2)$.
$T_{2n}(\beta,\lambda)$ is a quadratic form in $\beta$ and $\lambda$. Under the assumed condition, $T_{2n}(\beta,\lambda)>0$ whenever $(\beta,\lambda)\ne(\beta_0,\lambda_0)$. Thus, $\beta_0$ and $\lambda_0$ are globally identified. At $\lambda_0$, $\sigma_0^2$ is the unique maximizer of $T_{1n}(\lambda_0,\sigma^2)$. Thus our assumption guarantees that $E(\frac1n\ln L_n(\theta))-E(\frac1n\ln L_n(\theta_0))$ satisfies the identification uniqueness condition and $\theta_0$ is globally identifiable. The consistency of $\hat\theta_n$ follows from this identification and the uniform convergence in Proposition 2.1. Q.E.D.
Proof of Theorem 2.3 Suppose that $\limsup_{n\to\infty}\max_{\theta\in\bar N_\epsilon(\theta_0)}[E(\frac1n\ln L_n(\theta))-E(\frac1n\ln L_n(\theta_0))]<0$ does not hold for some $\epsilon>0$, where $\bar N_\epsilon(\theta_0)$ is the complement in $\Theta$ of an $\epsilon$-neighborhood of $\theta_0$. Then there exists a sequence $\theta_n=(\beta_n',\sigma_n^2,\lambda_n)'$ which converges to a vector $\theta_+=(\beta_+',\sigma_+^2,\lambda_+)'$, where $\theta_+\ne\theta_0$, such that $\lim_{n\to\infty}[E(\frac1n\ln L_n(\theta_n))-E(\frac1n\ln L_n(\theta_0))]=0$. If $E(\frac1n\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$, this will imply that $\lim_{n\to\infty}[E(\frac1n\ln L_n(\theta_n))-E(\frac1n\ln L_n(\theta_+))]=0$. Hence, $\lim_{n\to\infty}[E(\frac1n\ln L_n(\theta_+))-E(\frac1n\ln L_n(\theta_0))]=0$. We want to show that this contradicts our assumed condition in (2.14).
First, we show that $E(\frac1n\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$. Let $\theta_1$ and $\theta_2$ be any two vectors in $\Theta$. As $E(\frac1n\ln L_n(\theta))=-\frac{\ln(2\pi)}{2}-\frac{\ln\sigma^2}{2}+\frac1n\ln|S_n(\lambda)|-\frac{1}{2\sigma^2n}E(V_n'(\delta)V_n(\delta))$,
$$E\!\left(\frac1n\ln L_n(\theta_2)\right)-E\!\left(\frac1n\ln L_n(\theta_1)\right)=\frac12(\ln\sigma_1^2-\ln\sigma_2^2)+\frac1n(\ln|S_n(\lambda_2)|-\ln|S_n(\lambda_1)|)-\frac12\left(\frac{1}{\sigma_2^2}-\frac{1}{\sigma_1^2}\right)\frac1nE(V_n'(\delta_2)V_n(\delta_2))+\frac{1}{2\sigma_1^2}\cdot\frac1n[E(V_n'(\delta_1)V_n(\delta_1))-E(V_n'(\delta_2)V_n(\delta_2))].$$
By expansion,
$$\frac1nE(V_n'(\delta)V_n(\delta))=\frac1n[(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)'(S_n(\lambda)S_n^{-1}X_n\beta_0-X_n\beta)+\sigma_0^2\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})]=(\beta-\beta_0)'\frac{X_n'X_n}{n}(\beta-\beta_0)+(\lambda-\lambda_0)^2\frac1n[(X_n\beta_0)'G_n'G_nX_n\beta_0+\sigma_0^2\mathrm{tr}(G_n'G_n)]+\sigma_0^2+2(\lambda-\lambda_0)(\beta-\beta_0)'\frac1nX_n'G_nX_n\beta_0+2(\lambda_0-\lambda)\sigma_0^2\frac{\mathrm{tr}(G_n)}{n},$$
and
$$\frac1n[E(V_n'(\delta_1)V_n(\delta_1))-E(V_n'(\delta_2)V_n(\delta_2))]=(\beta_1-\beta_2)'\frac{X_n'X_n}{n}(\beta_1+\beta_2-2\beta_0)+[(\lambda_1-\lambda_0)^2-(\lambda_2-\lambda_0)^2]\frac1n[(X_n\beta_0)'G_n'G_nX_n\beta_0+\sigma_0^2\mathrm{tr}(G_n'G_n)]+2[(\lambda_1-\lambda_0)(\beta_1-\beta_0)'-(\lambda_2-\lambda_0)(\beta_2-\beta_0)']\frac1nX_n'G_nX_n\beta_0+2(\lambda_2-\lambda_1)\sigma_0^2\frac{\mathrm{tr}(G_n)}{n}.$$
By Lemma A.5 in Lee (2001), $\frac1nX_n'X_n$, $\frac1n(X_n\beta_0)'G_n'G_nX_n\beta_0$ and $\frac1nX_n'G_nX_n\beta_0$ are of order $O(1)$. By Lemma 2, $\frac1n\mathrm{tr}(G_n)$ and $\frac1n\mathrm{tr}(G_n'G_n)$ are of order $O(\frac{1}{h_n})$. Thus, as $\Theta$ is compact, $\frac1nE(V_n'(\delta)V_n(\delta))$ is uniformly bounded and uniformly equicontinuous on $\Theta$. By the mean value theorem, $\frac1n(\ln|S_n(\lambda_2)|-\ln|S_n(\lambda_1)|)=\frac1n\mathrm{tr}(W_{n,n}S_n^{-1}(\bar\lambda_n))(\lambda_2-\lambda_1)$, where $\bar\lambda_n$ lies between $\lambda_1$ and $\lambda_2$. By the uniform boundedness Assumption 7, Lemma 2 implies that $\frac1n\mathrm{tr}(W_{n,n}S_n^{-1}(\bar\lambda_n))=O(\frac{1}{h_n})$. Hence, $\frac1n\ln|S_n(\lambda)|$ is uniformly equicontinuous in $\lambda$ on $\Lambda$. As $\Theta$ is compact, $\ln\sigma^2$ is uniformly continuous and, consequently, $E(\frac1n\ln L_n(\theta))$ is uniformly equicontinuous in $\theta$.
Consider first the case that $\lambda_+=\lambda_0$. If this were the case,
$$E\!\left(\frac1n\ln L_n(\theta_+)\right)-E\!\left(\frac1n\ln L_n(\theta_0)\right)=-\frac{\ln\sigma_+^2}{2}-\frac{1}{2\sigma_+^2n}[(\beta_0-\beta_+)'X_n'X_n(\beta_0-\beta_+)+n\sigma_0^2]+\frac{\ln\sigma_0^2}{2}+\frac12,$$
which would correspond to the difference of the average log likelihood functions of the normal regression model $S_nY_n=X_n\beta+V_n$ (as if $\lambda_0$ were known) at $(\beta_+,\sigma_+^2)$ and $(\beta_0,\sigma_0^2)$. In a linear regression model, $\beta_0$ and $\sigma_0^2$ are uniquely identifiable in the limiting process under the condition that $\lim_{n\to\infty}\frac{X_n'X_n}{n}$ is nonsingular. So $\lambda_+$ could not be equal to $\lambda_0$.
By the Jensen inequality, for any $\theta$, $E(\ln L_n(\theta))\le E(\ln L_n(\theta_0))$. At $\lambda_0$, $\beta_n^*(\lambda_0)=\beta_0$, $\sigma_n^{*2}(\lambda_0)=\sigma_0^2$ and $Q_n(\lambda_0)=E(\ln L_n(\theta_0))$. It follows that $E(\frac1n\ln L_n(\theta_+))\le\frac1nQ_n(\lambda_+)\le\frac1nQ_n(\lambda_0)=\frac1nE(\ln L_n(\theta_0))$. In turn, these imply that $\lim_{n\to\infty}\frac1n[Q_n(\lambda_+)-Q_n(\lambda_0)]=0$, which is a contradiction. This is so because the assumed condition (2.14) is equivalent to $\lim_{n\to\infty}[\frac1n(\ln|S_n(\lambda)|-\ln|S_n|)-\frac12(\ln\sigma_n^{*2}(\lambda)-\ln\sigma_0^2)]=\lim_{n\to\infty}\frac1n[Q_n(\lambda)-Q_n(\lambda_0)]\ne0$ for $\lambda\ne\lambda_0$, as $\sigma_n^2=\sigma_n^{*2}$ when $M_nG_nX_n\beta_0=0$.
The consistency of $\hat\theta_n$ follows from the above identification uniqueness and the uniform convergence $\frac1n(\ln L_n(\theta)-E(\ln L_n(\theta)))\stackrel{p}{\to}0$ in Proposition 2.1. Q.E.D.
Proof of Proposition 2.3 The matrix $G_n$ of the quadratic form is uniformly bounded in row sums. As all the regressors are bounded, the elements of $G_nX_n\beta_0$ are uniformly bounded for all $n$ (Lemma A.5 in Lee (2001)). When $h_n=O(1)$, as $\Sigma_\theta$ is positive definite, the variances of $\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}$ in (2.15) are bounded away from zero. With the existence of higher order moments of $v$ in Assumption 2, the sufficient conditions of the central limit theorem for quadratic forms of double arrays in Kelejian and Prucha (1999b) are satisfied and the limiting distribution of the score vector follows. For the case that $\lim_{n\to\infty}h_n=\infty$, $\frac{1}{\sqrt n}(V_n'G_n'V_n-\sigma_0^2\mathrm{tr}(G_n))\stackrel{p}{\to}0$ because $\mathrm{var}(V_n'G_n'V_n)=O(\frac{n}{h_n})$. The remaining terms converge in distribution to a relevant normal vector by Kolmogorov's central limit theorem. Q.E.D.
Proof of Proposition 2.4 As $\frac{X_n'X_n}{n}=O(1)$, $\frac{X_n'W_{n,n}Y_n}{n}=O_P(1)$ and $\hat\sigma_n^2\stackrel{p}{\to}\sigma_0^2$, (2.5) implies that
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\beta'}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\beta'}=\left(\frac{1}{\sigma_0^2}-\frac{1}{\hat\sigma_n^2}\right)\frac{X_n'X_n}{n}=o_P(1),$$
and
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\lambda}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\lambda}=\left(\frac{1}{\sigma_0^2}-\frac{1}{\hat\sigma_n^2}\right)\frac{X_n'W_{n,n}Y_n}{n}=o_P(1).$$
As $V_n(\hat\delta_n)=Y_n-X_n\hat\beta_n-\hat\lambda_nW_{n,n}Y_n=X_n(\beta_0-\hat\beta_n)+(\lambda_0-\hat\lambda_n)W_{n,n}Y_n+V_n$,
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\beta\partial\sigma^2}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\partial\sigma^2}=\left(\frac{1}{\sigma_0^4}-\frac{1}{\hat\sigma_n^4}\right)\frac{X_n'V_n}{n}+\frac{X_n'X_n}{\hat\sigma_n^4n}(\hat\beta_n-\beta_0)+\frac{X_n'W_{n,n}Y_n}{\hat\sigma_n^4n}(\hat\lambda_n-\lambda_0)=o_P(1).$$
By the mean value theorem, $\mathrm{tr}(G_n^2(\hat\lambda_n))=\mathrm{tr}(G_n^2)+2\,\mathrm{tr}(G_n^3(\bar\lambda_n))\cdot(\hat\lambda_n-\lambda_0)$; therefore,
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\lambda^2}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\lambda^2}=-2\frac{\mathrm{tr}(G_n^3(\bar\lambda_n))}{n}(\hat\lambda_n-\lambda_0)+\left(\frac{1}{\sigma_0^2}-\frac{1}{\hat\sigma_n^2}\right)\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n}=o_P(1).$$
On the other hand, because
$$\frac1nY_n'W_{n,n}'V_n(\hat\delta_n)=\frac{Y_n'W_{n,n}'X_n}{n}(\beta_0-\hat\beta_n)+(\lambda_0-\hat\lambda_n)\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n}+\frac{Y_n'W_{n,n}'V_n}{n}=\frac{Y_n'W_{n,n}'V_n}{n}+o_P(1)$$
and
$$\frac1nV_n'(\hat\delta_n)V_n(\hat\delta_n)=(\hat\beta_n-\beta_0)'\frac{X_n'X_n}{n}(\hat\beta_n-\beta_0)+(\hat\lambda_n-\lambda_0)^2\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{n}+\frac{V_n'V_n}{n}+2(\hat\lambda_n-\lambda_0)(\hat\beta_n-\beta_0)'\frac{X_n'W_{n,n}Y_n}{n}+2(\beta_0-\hat\beta_n)'\frac{X_n'V_n}{n}+2(\lambda_0-\hat\lambda_n)\frac{Y_n'W_{n,n}'V_n}{n}=\frac{V_n'V_n}{n}+o_P(1),$$
it follows that
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\sigma^2\partial\lambda}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\sigma^2\partial\lambda}=-\frac{Y_n'W_{n,n}'V_n(\hat\delta_n)}{\hat\sigma_n^4n}+\frac{Y_n'W_{n,n}'V_n}{\sigma_0^4n}=\frac{Y_n'W_{n,n}'X_n}{\hat\sigma_n^4n}(\hat\beta_n-\beta_0)+\frac{Y_n'W_{n,n}'W_{n,n}Y_n}{\hat\sigma_n^4n}(\hat\lambda_n-\lambda_0)+\left(\frac{1}{\sigma_0^4}-\frac{1}{\hat\sigma_n^4}\right)\frac{Y_n'W_{n,n}'V_n}{n}=o_P(1),$$
and
$$\frac1n\frac{\partial^2\ln L_n(\hat\theta_n)}{\partial\sigma^2\partial\sigma^2}-\frac1n\frac{\partial^2\ln L_n(\theta_0)}{\partial\sigma^2\partial\sigma^2}=\frac{1}{2\hat\sigma_n^4}-\frac{V_n'(\hat\delta_n)V_n(\hat\delta_n)}{n\hat\sigma_n^6}-\frac{1}{2\sigma_0^4}+\frac{V_n'V_n}{n\sigma_0^6}=\frac12\left(\frac{1}{\hat\sigma_n^4}-\frac{1}{\sigma_0^4}\right)+\left(\frac{1}{\sigma_0^6}-\frac{1}{\hat\sigma_n^6}\right)\frac{V_n'V_n}{n}+o_P(1)=o_P(1).$$
Q.E.D.
Proof of Proposition 2.5 By Lemma A.14 in Lee (2001), $\frac1nX_n'G_nV_n=o_P(1)$ and $\frac1nX_n'G_n'G_nV_n=o_P(1)$. It follows that $\frac1nX_n'W_{n,n}Y_n=\frac1nX_n'G_nX_n\beta_0+o_P(1)$, $\frac1nY_n'W_{n,n}'V_n=\frac1nV_n'G_n'V_n+o_P(1)$, and
$$\frac1nY_n'W_{n,n}'W_{n,n}Y_n=\frac1n(X_n\beta_0)'G_n'G_nX_n\beta_0+\frac1nV_n'G_n'G_nV_n+o_P(1).$$
$E(V_n'G_n'V_n)=\sigma_0^2\mathrm{tr}(G_n)$ and $\mathrm{var}(\frac1nV_n'G_n'V_n)=\frac{\mu_4-3\sigma_0^4}{n^2}\sum_{i=1}^nG_{n,ii}^2+\frac{\sigma_0^4}{n^2}[\mathrm{tr}(G_nG_n')+\mathrm{tr}(G_n^2)]=O(\frac{1}{nh_n})$ by Lemma A.10 in Lee (2001). Similarly, $E(V_n'G_n'G_nV_n)=\sigma_0^2\mathrm{tr}(G_n'G_n)$ and
$$\mathrm{var}\!\left(\frac1nV_n'G_n'G_nV_n\right)=\frac{\mu_4-3\sigma_0^4}{n^2}\sum_{i=1}^n(G_n'G_n)_{ii}^2+2\frac{\sigma_0^4}{n^2}\mathrm{tr}((G_n'G_n)^2)=O\!\left(\frac{1}{nh_n}\right).$$
By the law of large numbers for i.i.d. random variables, $\frac1nV_n'V_n\stackrel{p}{\to}\sigma_0^2$. With these properties, the convergence result follows.
Lemma 2 implies that $\frac1n\mathrm{tr}(G_n)=O(\frac{1}{h_n})$, $\frac1n\mathrm{tr}(G_n^2)=O(\frac{1}{h_n})$ and $\frac1n\mathrm{tr}(G_n'G_n)=O(\frac{1}{h_n})$. Thus, if $\lim_{n\to\infty}h_n=\infty$, $\frac1n\mathrm{tr}(G_n)$, $\frac1n\mathrm{tr}(G_n^2)$ and $\frac1n\mathrm{tr}(G_n'G_n)$ vanish. Q.E.D.
Proof of Theorem 2.4 The asymptotic distribution of $\hat\theta_n$ follows from the expansion $\sqrt n(\hat\theta_n-\theta_0)=-\left(\frac1n\frac{\partial^2\ln L_n(\bar\theta_n)}{\partial\theta\partial\theta'}\right)^{-1}\frac{1}{\sqrt n}\frac{\partial\ln L_n(\theta_0)}{\partial\theta}$ and Propositions 2.3 and 2.4. Q.E.D.
Proof of Proposition 3.1 As $M_nS_n(\lambda)S_n^{-1}X_n\beta_0=(\lambda_0-\lambda)M_nG_nX_n\beta_0=0$ when $M_nG_nX_n\beta_0=0$, $\sigma_n^{*2}(\lambda)$ in (2.12) reduces to
$$\sigma_n^{*2}(\lambda)=\frac{\sigma_0^2}{n}\mathrm{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})=\sigma_0^2\left[1+2(\lambda_0-\lambda)\frac{\mathrm{tr}(G_n)}{n}+(\lambda_0-\lambda)^2\frac{\mathrm{tr}(G_n'G_n)}{n}\right].$$
As tr(Gn)n and
tr(G0nGn)n are of order O( 1hn ), it follows that, when limn→∞ hn = ∞, σ∗2n (λ) − σ20 = op(1)
uniformly in λ in any bounded set. As
σ2n(λ)− σ∗2n (λ) =1
nY 0nS
0n(λ)MnSn(λ)Yn − σ
20
ntr(S
0−1n S0n(λ)Sn(λ)S
−1n )
=1
nV 0nS
0−1n S0n(λ)Sn(λ)S
−1n Vn − σ
20
ntr(S
0−1n S0n(λ)Sn(λ)S
−1n )− Tn(λ),
where Tn(λ) =1nV
0nS
0−1n S0n(λ)Xn(X
0nXn)
−1X 0nSn(λ)S
−1n Vn. By Lemma A.14 in Lee (2001),
1√nX 0nSn(λ)S
−1n Vn =
1√nX 0nS
−1n Vn − λ 1√
nX 0nGnVn = Op(1).
29
As hnn goes to zero, hnTn(λ) =hnn (
1√nX 0nSn(λ)S
−1n Vn)
0(X0nXn
n )−1( 1√nX 0nSn(λ)S
−1n Vn)
0 goes to zero in prob-
ability. By Lemma 3, hnn [V
0nS
0−1n S0n(λ)Sn(λ)S
−1n Vn − σ20tr(S
0−1n S0n(λ)Sn(λ)S
−1n )] = oP (1). Therefore,
hn(σ2n(λ) − σ∗2n (λ)) = oP (1) uniformly in λ in any bounded set. The latter uniform convergence in λ in
any bounded set is apparent because the relevant expressions are linear in λ.
Because hnn (lnLn(λ)−Qn(λ)) = −hn
2 (ln σ2n(λ)− lnσ∗2n (λ)), by the mean value theorem hn
n (lnLn(λ)−
Qn(λ)) = − hn2σ2n(λ)
(σ2n(λ) − σ∗2n (λ)), where σ2n(λ) lies between σ2n(λ) and σ∗2n (λ). As σ2n(λ) and σ∗2n (λ)
converges to σ20 uniformly in any bounded set Λ when limn→∞ hn = ∞, supλ∈Λ 1σn(λ)
= OP (1) and
supλ∈Λ1
σ∗2n (λ) = O(1). As σ2n(λ) lies between σ2n(λ) and σ
∗2n (λ),
1σ2n(λ)
≤ 1σ2n(λ)
+ 1σ∗2n (λ) is stochastically
bounded uniformly in λ ∈ Λ. Hence
hnnsupλ∈Λ
| lnLn(λ)−Qn(λ)| ≤ supλ∈Λ
1
2σ2n(λ)· hn sup
λ∈Λ|σ2n(λ)− σ∗2n (λ)| = op(1).
Q.E.D.
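The algebraic identity behind the reduced form of $\sigma_n^{*2}(\lambda)$, namely $\frac{1}{n}\operatorname{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1}) = 1 + 2(\lambda_0-\lambda)\operatorname{tr}(G_n)/n + (\lambda_0-\lambda)^2\operatorname{tr}(G_n'G_n)/n$, can be checked mechanically; the circular "two nearest neighbors" weight matrix below is an illustrative assumption, not the paper's design.

```python
import numpy as np

# Small circulant weights: each unit's two nearest neighbors, row-normalized.
n, lam0, lam = 12, 0.5, 0.2
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        W[i, j % n] = 0.5

S0 = np.eye(n) - lam0 * W            # S_n at the true lambda_0
Sl = np.eye(n) - lam * W             # S_n(lambda)
G = W @ np.linalg.inv(S0)            # G_n = W_n S_n^{-1}

# (1/n) tr(S'^{-1} S(l)' S(l) S^{-1}) versus its polynomial expansion in (l0 - l):
A = np.linalg.inv(S0).T @ Sl.T @ Sl @ np.linalg.inv(S0)
lhs = np.trace(A) / n
rhs = (1 + 2 * (lam0 - lam) * np.trace(G) / n
         + (lam0 - lam) ** 2 * np.trace(G.T @ G) / n)
print(lhs, rhs)
```

The identity is exact, since $S_n(\lambda)S_n^{-1} = I_n + (\lambda_0-\lambda)G_n$, so `lhs` and `rhs` agree to machine precision for any valid $\lambda$.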
Proof of Proposition 3.2 Lemma A.10 in Lee (2001) gives
\[
E(V_n'G_n'M_nV_n)^2 = (\mu_4-3\sigma_0^4)\sum_{i=1}^n (G_n'M_n)_{ii}^2 + \sigma_0^4[\operatorname{tr}^2(G_n'M_n) + \operatorname{tr}(G_n'M_nG_n) + \operatorname{tr}((G_n'M_n)^2)];
\]
hence
\[
E(P_n(\lambda_0)) = \frac{2}{n}\Big[\frac{\mu_4-3\sigma_0^4}{\sigma_0^4}\sum_{i=1}^n(G_n'M_n)_{ii}^2 + \operatorname{tr}^2(G_n'M_n) + \operatorname{tr}(G_n'M_nG_n) + \operatorname{tr}((G_n'M_n)^2)\Big] - \operatorname{tr}(G_n'M_nG_n) - \operatorname{tr}(G_n^2).
\]
The orders of the relevant terms $\operatorname{tr}(G_n'M_n)$, $\operatorname{tr}(G_n'M_nG_n)$, $\operatorname{tr}((G_n'M_n)^2)$ and $\operatorname{tr}((G_n'M_nG_n)^2)$ are needed. From Lemma 5, $\sum_{i=1}^n(G_n'M_n)_{ii}^2 = \sum_{i=1}^n G_{n,ii}^2 + O(\frac{1}{h_n})$, $\operatorname{tr}^2(G_n'M_n) = \operatorname{tr}^2(G_n) + O(\frac{n}{h_n})$, $\operatorname{tr}(G_n'M_nG_n) = \operatorname{tr}(G_n'G_n) + O(1)$, $\operatorname{tr}[(G_n'M_n)^2] = \operatorname{tr}(G_n^2) + O(1)$, and $\operatorname{tr}[(G_n'M_nG_n)^2] = \operatorname{tr}[(G_nG_n')^2] + O(1)$. The simplified expression for $E(P_n(\lambda_0))$ follows from these expansions.
From Lemma 2, $\sum_{i=1}^n G_{n,ii}^2 = O(\frac{n}{h_n^2})$, and $\operatorname{tr}(G_n'M_nG_n)$, $\operatorname{tr}((G_n'M_n)^2)$ and $\operatorname{tr}((G_n'M_nG_n)^2)$ have the order $O(\frac{n}{h_n})$. As $\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))] = \frac{2}{\sigma_0^4}\Delta_{n1} - \frac{1}{\sigma_0^2}\Delta_{n2} + o(1)$, where
\[
\Delta_{n1} = \frac{h_n}{n^2}[(V_n'G_n'M_nV_n)^2 - \sigma_0^4\operatorname{tr}^2(G_n'M_n)] \quad\text{and}\quad \Delta_{n2} = \frac{h_n}{n}[V_n'G_n'M_nG_nV_n - \sigma_0^2\operatorname{tr}(G_n'M_nG_n)],
\]
$\frac{h_n}{n}[P_n(\lambda_0) - E(P_n(\lambda_0))]$ will converge to zero in probability if both $\Delta_{n1}$ and $\Delta_{n2}$ converge to zero in probability. By Lemma A.10 in Lee (2001) and the orders of the relevant terms,
\[
E(\Delta_{n1}) = \frac{h_n}{n^2}\operatorname{var}(V_n'G_n'M_nV_n) = \frac{h_n}{n^2}\Big\{(\mu_4-3\sigma_0^4)\sum_{i=1}^n(G_n'M_n)_{ii}^2 + \sigma_0^4[\operatorname{tr}(G_n'M_nG_n) + \operatorname{tr}((G_n'M_n)^2)]\Big\} = O\Big(\frac{1}{n}\Big)
\]
and
\[
E(\Delta_{n2}^2) = \Big(\frac{h_n}{n}\Big)^2\operatorname{var}(V_n'G_n'M_nG_nV_n) = \Big(\frac{h_n}{n}\Big)^2\Big[(\mu_4-3\sigma_0^4)\sum_{i=1}^n(G_{n,i}'M_nG_{n,i})^2 + 2\sigma_0^4\operatorname{tr}((G_n'M_nG_n)^2)\Big] = O\Big(\frac{h_n}{n}\Big),
\]
where $G_{n,i}$ denotes the $i$th column of $G_n$; both go to zero.
Finally,
\[
\begin{aligned}
\frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} - E(P_n(\lambda_0))\Big)
&= \frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2} - P_n(\lambda_0)\Big) + \frac{h_n}{n}(P_n(\lambda_0) - E(P_n(\lambda_0))) \\
&= \Big(\frac{1}{\hat\sigma_n^4(\lambda_0)} - \frac{1}{\sigma_0^4}\Big)\cdot\frac{2h_n}{n^2}(V_n'G_n'M_nV_n)^2 - \Big(\frac{1}{\hat\sigma_n^2(\lambda_0)} - \frac{1}{\sigma_0^2}\Big)\cdot\frac{h_n}{n}V_n'G_n'M_nG_nV_n + o_P(1).
\end{aligned}
\]
Lemma 3 implies that $\frac{h_n}{n}V_n'G_n'M_nG_nV_n = O_P(1)$ and $\frac{h_n}{n^2}(V_n'G_n'M_nV_n)^2 = \big(\frac{\sqrt{h_n}}{n}V_n'G_n'M_nV_n\big)^2 = O_P(\frac{1}{h_n})$. Because $\hat\sigma_n^2(\lambda_0)$ is a consistent estimate of $\sigma_0^2$, the final result follows from Slutsky's lemma. Q.E.D.
Proof of Theorem 3.1 When $M_nG_nX_n\beta_0 = 0$, $\sigma_n^{*2}(\lambda) = \frac{\sigma_0^2}{n}\operatorname{tr}(S_n'^{-1}S_n'(\lambda)S_n(\lambda)S_n^{-1})$ from (2.12). For this case, $Q_n(\lambda)$ happens to have the same expression used in Lee (2001) for the pure spatial autoregressive process. The unique identification condition holds as in Theorem 3 in Lee (2001). The consistency of the estimator $\hat\lambda_n$ follows from the unique identification and the uniform convergence of $\frac{h_n}{n}(\ln L_n(\lambda) - Q_n(\lambda))$ to zero in probability. Q.E.D.
Proof of Proposition 3.3 Because $M_nG_nX_n\beta_0 = 0$, $M_nX_n = 0$ and $S_n(\lambda) = S_n + (\lambda_0-\lambda)W_{n,n}$, one has
\[
Y_n'W_{n,n}'M_nW_{n,n}Y_n = (G_nX_n\beta_0 + G_nV_n)'M_n(G_nX_n\beta_0 + G_nV_n) = V_n'G_n'M_nG_nV_n,
\]
and
\[
\begin{aligned}
Y_n'W_{n,n}'M_nS_n(\lambda)Y_n &= Y_n'W_{n,n}'M_nS_nY_n + (\lambda_0-\lambda)Y_n'W_{n,n}'M_nW_{n,n}Y_n \\
&= (G_nX_n\beta_0 + G_nV_n)'M_n(X_n\beta_0 + V_n) + (\lambda_0-\lambda)V_n'G_n'M_nG_nV_n \\
&= V_n'G_n'M_nV_n + (\lambda_0-\lambda)V_n'G_n'M_nG_nV_n.
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
\frac{h_n}{n}\Big(\frac{\partial^2\ln L_n(\hat\lambda_n)}{\partial\lambda^2} - \frac{\partial^2\ln L_n(\lambda_0)}{\partial\lambda^2}\Big)
&= 2\Big(\frac{1}{\hat\sigma_n^4(\hat\lambda_n)} - \frac{1}{\hat\sigma_n^4(\lambda_0)}\Big)\frac{h_n}{n^2}(V_n'G_n'M_nV_n)^2 + \frac{4(\lambda_0-\hat\lambda_n)}{\hat\sigma_n^4(\hat\lambda_n)}\Big(\frac{1}{n}V_n'G_n'M_nG_nV_n\Big)\Big(\frac{h_n}{n}V_n'G_n'M_nV_n\Big) \\
&\quad + \frac{2(\lambda_0-\hat\lambda_n)^2}{\hat\sigma_n^4(\hat\lambda_n)}\cdot\frac{h_n}{n^2}(V_n'G_n'M_nG_nV_n)^2 + \Big(\frac{1}{\hat\sigma_n^2(\lambda_0)} - \frac{1}{\hat\sigma_n^2(\hat\lambda_n)}\Big)\Big(\frac{h_n}{n}V_n'G_n'M_nG_nV_n\Big) \\
&\quad + \frac{2h_n}{n}\operatorname{tr}(G_n^3(\tilde\lambda_n))(\hat\lambda_n - \lambda_0) = o_P(1),
\end{aligned}
\]
where $\tilde\lambda_n$ lies between $\hat\lambda_n$ and $\lambda_0$, because $V_n'G_n'M_nV_n = O_P(\frac{n}{h_n})$ and $V_n'G_n'M_nG_nV_n = O_P(\frac{n}{h_n})$ by Lemma 3, $\hat\sigma_n^2(\lambda_0) = \frac{1}{n}V_n'M_nV_n = O_P(1)$ by Lemma A.11 in Lee (2001), and $\frac{h_n}{n}\operatorname{tr}(G_n^3(\tilde\lambda_n)) = O(1)$ by Lemma 2. Q.E.D.
Proof of Proposition 3.4 The mean of $Q_n$ is $E(Q_n) = \sigma_0^2\operatorname{tr}(M_nC_n) = -\sigma_0^2\operatorname{tr}[(X_n'X_n)^{-1}X_n'C_nX_n] = O(1)$ because $\frac{X_n'X_n}{n} = O(1)$ by Assumption 6 and $\frac{X_n'C_nX_n}{n} = O(1)$ from Lemma A.5 in Lee (2001). The variance of $Q_n$ from Lemma A.10 in Lee (2001) is
\[
\begin{aligned}
\sigma_{Q_n}^2 &= (\mu_4-3\sigma_0^4)\sum_{i=1}^n(C_n'M_n)_{ii}^2 + \sigma_0^4[\operatorname{tr}(C_n'M_nC_n) + \operatorname{tr}((C_n'M_n)^2)] \\
&= (\mu_4-3\sigma_0^4)\sum_{i=1}^n C_{n,ii}^2 + \sigma_0^4[\operatorname{tr}(C_n'C_n) + \operatorname{tr}(C_n^2)] + O(1),
\end{aligned}
\]
where the last expression follows by using
\[
\sum_{i=1}^n(C_n'M_n)_{ii}^2 = \sum_{i=1}^n C_{n,ii}^2 + O\Big(\frac{1}{h_n}\Big),\quad \operatorname{tr}(C_n'M_nC_n) = \operatorname{tr}(C_n'C_n) + O(1),\quad\text{and}\quad \operatorname{tr}[(C_n'M_n)^2] = \operatorname{tr}(C_n^2) + O(1)
\]
from Lemma 5. The central limit theorem for quadratic functions (Lemma A.13 in Lee (2001)) implies that $\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} \xrightarrow{D} N(0,1)$. As $\sqrt{\frac{h_n}{n}}E(Q_n) = O(\sqrt{\frac{h_n}{n}})$, which goes to zero,
\[
\sqrt{\frac{h_n}{n}}\frac{\partial\ln L_n(\lambda_0)}{\partial\lambda} = \frac{\sqrt{\frac{h_n}{n}}\,\sigma_{Q_n}}{\hat\sigma_n^2(\lambda_0)}\cdot\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} + \sqrt{\frac{h_n}{n}}\frac{E(Q_n)}{\hat\sigma_n^2(\lambda_0)} = \frac{\sqrt{\frac{h_n}{n}}\,\sigma_{Q_n}}{\hat\sigma_n^2(\lambda_0)}\cdot\frac{Q_n - E(Q_n)}{\sigma_{Q_n}} + o_P(1)
\xrightarrow{D} N\Big(0,\ \lim_{n\to\infty}\frac{h_n}{n}\Big[\frac{\mu_4-3\sigma_0^4}{\sigma_0^4}\sum_{i=1}^n C_{n,ii}^2 + \operatorname{tr}(C_nC_n') + \operatorname{tr}(C_n^2)\Big]\Big).
\]
As $C_{n,ii}^2 = O(\frac{1}{h_n^2})$, $\frac{h_n}{n}\sum_{i=1}^n C_{n,ii}^2 = O(\frac{1}{h_n})$, which goes to zero when $\lim_{n\to\infty}h_n = \infty$. Q.E.D.
Proof of Theorem 3.2 The asymptotic distribution follows from the Taylor expansion and the convergence
results in Propositions 3.3 and 3.4 and that in (3.11). Q.E.D.
Proof of Theorem 3.3 From (2.9), as $S_n(\hat\lambda_n) = S_n + (\lambda_0 - \hat\lambda_n)W_{n,n}$,
\[
\begin{aligned}
\hat\beta_n(\hat\lambda_n) - \beta_0 &= (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'W_{n,n}Y_n \\
&= (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nV_n.
\end{aligned}
\]
As $(\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nV_n = O_P(\frac{\sqrt{h_n}}{n})$ because $\frac{X_n'X_n}{n} = O(1)$, $\frac{X_n'G_nV_n}{\sqrt n} = O_P(1)$ by Lemma A.14 in Lee (2001), and $\hat\lambda_n - \lambda_0 = O_P(\sqrt{\frac{h_n}{n}})$ by Theorem 3.2,
\[
\hat\beta_n(\hat\lambda_n) - \beta_0 = (X_n'X_n)^{-1}X_n'V_n - (\hat\lambda_n-\lambda_0)(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{\sqrt{h_n}}{n}\Big).
\]
In general,
\[
\begin{aligned}
\sqrt{\frac{n}{h_n}}(\hat\beta_n(\hat\lambda_n) - \beta_0) &= \frac{1}{\sqrt{h_n}}\Big(\frac{X_n'X_n}{n}\Big)^{-1}\frac{X_n'V_n}{\sqrt n} - \sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{1}{\sqrt n}\Big) \\
&= -\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot(X_n'X_n)^{-1}X_n'G_nX_n\beta_0 + O_P\Big(\frac{1}{\sqrt{h_n}}\Big).
\end{aligned}
\]
If $\beta_0$ is zero,
\[
\sqrt n(\hat\beta_n(\hat\lambda_n) - \beta_0) = \Big(\frac{X_n'X_n}{n}\Big)^{-1}\frac{X_n'V_n}{\sqrt n} + O_P\Big(\sqrt{\frac{h_n}{n}}\Big) \xrightarrow{D} N\Big(0,\ \sigma_0^2\lim_{n\to\infty}\Big(\frac{X_n'X_n}{n}\Big)^{-1}\Big).
\]
As
\[
\begin{aligned}
\hat\sigma_n^2 &= \frac{1}{n}Y_n'S_n'(\hat\lambda_n)M_nS_n(\hat\lambda_n)Y_n \\
&= \frac{1}{n}Y_n'S_n'M_nS_nY_n + 2(\lambda_0-\hat\lambda_n)\frac{1}{n}Y_n'S_n'M_nW_{n,n}Y_n + (\lambda_0-\hat\lambda_n)^2\frac{1}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n
\end{aligned}
\]
and $\frac{1}{n}Y_n'S_n'M_nS_nY_n = \frac{1}{n}V_n'M_nV_n$, it follows that
\[
\begin{aligned}
\sqrt n(\hat\sigma_n^2 - \sigma_0^2) &= \frac{1}{\sqrt n}(V_n'V_n - n\sigma_0^2) - \frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n \\
&\quad - 2\sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)\cdot\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nS_nY_n + \sqrt{\frac{n}{h_n}}(\hat\lambda_n-\lambda_0)^2\cdot\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n.
\end{aligned}
\]
Because $M_nX_n = 0$ and $M_nG_nX_n\beta_0 = 0$,
\[
\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nS_nY_n = \frac{\sqrt{h_n}}{n}(G_nX_n\beta_0 + G_nV_n)'M_n(X_n\beta_0 + V_n) = \frac{\sqrt{h_n}}{n}V_n'G_n'M_nV_n = O_P\Big(\frac{1}{\sqrt{h_n}}\Big)
\]
and
\[
\frac{\sqrt{h_n}}{n}Y_n'W_{n,n}'M_nW_{n,n}Y_n = \frac{\sqrt{h_n}}{n}(G_nX_n\beta_0 + G_nV_n)'M_n(G_nX_n\beta_0 + G_nV_n) = \frac{\sqrt{h_n}}{n}V_n'G_n'M_nG_nV_n = O_P\Big(\frac{1}{\sqrt{h_n}}\Big)
\]
by Lemma 3. As $E(\frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n) = \frac{\sigma_0^2}{\sqrt n}\operatorname{tr}(X_n(X_n'X_n)^{-1}X_n') = \frac{\sigma_0^2 k}{\sqrt n}$ goes to zero, the Markov inequality implies that $\frac{1}{\sqrt n}V_n'X_n(X_n'X_n)^{-1}X_n'V_n = o_P(1)$. Hence, as $\lim_{n\to\infty}h_n = \infty$,
\[
\sqrt n(\hat\sigma_n^2 - \sigma_0^2) = \frac{1}{\sqrt n}(V_n'V_n - n\sigma_0^2) + o_P(1) \xrightarrow{D} N(0,\ \mu_4 - \sigma_0^4).
\]
Q.E.D.
Proof of Theorem 3.4 The asymptotic distributions of $\hat\beta_{n1}$ and $\hat\beta_{n2}$ follow from (3.18) and (3.19). Q.E.D.
References
Anselin, L. (1988), Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, The Nether-
lands.
Anselin, L. (1992), Space and Applied Econometrics, Anselin, ed. Special Issue, Regional Science and Urban
Economics 22.
Anselin, L. and A.K. Bera (1998), Spatial Dependence in Linear Regression Models with an Introduction
to Spatial Econometrics, in: Handbook of Applied Economics Statistics, A. Ullah and D.E.A. Giles, eds.,
Marcel Dekker, NY.
Anselin, L. and R. Florax (1995), New Directions in Spatial Econometrics, Springer-Verlag, Berlin.
Anselin, L. and S. Rey (1991), Properties of Tests for Spatial Dependence in Linear Regression Models,
Geographical Analysis 23, 110-131.
Anselin, L. and S. Rey (1997), Spatial Econometrics, Anselin, L. and S. Rey, ed. Special Issue, International
Regional Science Review 20.
Case, A.C. (1991), Spatial Patterns in Household Demand, Econometrica 59, 953-965.
Case, A.C. (1992), Neighborhood Influence and Technological Change, Regional Science and Urban Economics 22, 491-508.
Case, A.C., H.S. Rosen, and J.R. Hines (1993), Budget Spillovers and Fiscal Policy Interdependence:
Evidence from the States, Journal of Public Economics 52, 285-307.
Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.
Dhrymes, P.J. (1978), Mathematics for Econometrics, Springer-Verlag, New York.
Doreian, P. (1980), Linear Models with Spatially Distributed Data, Spatial Disturbances, or Spatial Effects,
Sociological Methods and Research 9, 29-60.
Haining, R. (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cambridge U. Press,
Cambridge.
Horn, R., and C. Johnson (1985), Matrix Analysis, New York: Cambridge University Press.
Kelejian, H.H. and I.R. Prucha (1998), A Generalized Spatial Two-Stage Least Squares Procedure for
Estimating a Spatial Autoregressive Model with Autoregressive Disturbances, Journal of Real Estate
Finance and Economics 17, 99-121.
Kelejian, H.H., and I.R. Prucha (1999a), A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. International Economic Review 40, 509-533.
Kelejian, H.H., and I.R. Prucha (1999b), On the Asymptotic Distribution of the Moran I Test Statistic
with Applications. Manuscript, Department of Economics, University of Maryland.
Kelejian, H.H., and D. Robinson (1993), A suggested method of estimation for spatial interdependent models
with autocorrelated errors, and an application to a county expenditure model, Papers in Regional Science
72, 297-312.
Lee, L.F. (1999a), Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model
with Autoregressive Disturbances, Manuscript, Department of Economics, HKUST, Hong Kong.
Lee, L.F. (1999b), Consistency and Efficiency of Least Squares Estimation for Mixed Regressive, Spatial
Autoregressive Models. Manuscript, forthcoming in Econometric Theory.
Lee, L.F. (2001), Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Econo-
metric Models I: Spatial Autoregressive Processes. Manuscript, Department of Economics, HKUST,
1999; Revised Manuscript, Department of Economics, OSU, 2001.
Manski, C.F. (1993), Identification of Endogenous Social Effects: The Reflection Problem, The Review of Economic Studies 60, 531-542.
Mead, R. (1967), A Mathematical Model for the Estimation of Interplant Competition, Biometrics 23,
189-205.
Ord, J.K. (1975), Estimation Methods for Models of Spatial Interaction, Journal of the American Statistical Association 70.
Paelinck, J. and L. Klaassen (1979), Spatial Econometrics, Saxon House, Farnborough.
Pinkse, J. (1999), Asymptotic Properties of Moran and Related Tests and Testing for Spatial Correlation
in Probit Models, manuscript, U. of British Columbia.
Rothenberg, T.J. (1971), Identification in Parametric Models, Econometrica 39, 577-591.
Smirnov, O. and L. Anselin (1999), Fast Maximum Likelihood Estimation of Very Large Spatial Autoregres-
sive Models: A Characteristic Polynomial Approach. Manuscript, School of Social Sciences, University
of Texas at Dallas.
White, H. (1994), Estimation, Inference and Specification Analysis, Cambridge University Press, New York,
New York.
Whittle, P. (1954), On Stationary Processes in the Plane, Biometrika 41, 434-449.