Department of Economics
Working Paper No. 173
Fixed Effects and Random Effects Estimation of Higher-Order Spatial
Autoregressive Models with Spatial Autoregressive and Heteroskedastic
Disturbances
Harald Badinger Peter Egger
April 2014
Harald Badinger
Department of Economics, Vienna University of Economics and Business;
Austrian Institute of Economic Research (WIFO)
Peter Egger
Department of Management, Technology, and Economics at ETH Zürich; CEPR
Abstract: This paper develops a unified framework for fixed and random effects estimation
of higher-order spatial autoregressive panel data models with spatial autoregressive
disturbances and heteroskedasticity of unknown form in the idiosyncratic error component.
We derive the moment conditions and optimal weighting matrix without distributional
assumptions for a generalized moments (GM) estimation procedure of the spatial
autoregressive parameters of the disturbance process and define both a random effects and a
fixed effects spatial generalized two-stage least squares estimator for the regression
parameters of the model. We prove consistency of the proposed estimators and derive their
joint asymptotic distribution, which is robust to heteroskedasticity of unknown form in the
idiosyncratic error component. Finally, we derive a robust Hausman test of the spatial random
against the spatial fixed effects model.
JEL codes: C13, C21, C23
Keywords: Higher-order spatial dependence; Generalized moments estimation;
Heteroskedasticity; Two-stage least squares; Asymptotic statistics
I. Introduction
This paper considers the estimation of panel data models with higher-order spatially
autocorrelated error components and spatially autocorrelated dependent variables. Spatial
interactions in data may originate from various sources such as strategic interaction between
jurisdictions (to attract firms or other mobile agents) and firms (in their price, quantity, or
quality setting) or general equilibrium effects which disseminate with spatial decay due to
their transmission through trade flows, migration, or input-output relationships.1 Data sets
used in empirical studies often share three features: first, they are available in the form of
panel data, with a large cross-sectional and a small time series dimension; second, spatial
interactions of various kinds co-exist (such as geography-related, trade-related, and
migration-related interactions), or the decay function of a single spatial interaction is
unknown; third, it is unclear whether spatial interactions are local (affecting only immediate
neighbors) or global (affecting second-, third-, and higher-order neighbors, with
repercussions). The estimator proposed here addresses these three features in a unified
framework. It allows for panel data with a fixed but arbitrary number of channels or decay
segments of spatial interaction in both the error components and the dependent variable,
referred to as SARAR(R,S).
Estimation and testing of both random and fixed effects spatial regressive panel data models
with homoskedastic error terms have been considered in the recent literature using a maximum
likelihood framework (Baltagi, Song, and Koh, 2003; Lee and Yu, 2010) or a generalized
moments approach (Kapoor, Kelejian and Prucha, 2007; Mutl and Pfaffermayr, 2011). The
present paper builds on Kapoor et al. (2007). They propose a generalized moments (GM)
estimator for the parameters of the spatial regressive error process in a homoskedastic random
effects panel data model without endogenous explanatory variables (such as spatial lags of the
dependent variable), derive a simplified weighting matrix for the moment conditions under
the assumption of normally and identically distributed error components, and prove
consistency of the GM estimates. They also establish the asymptotic distribution of the
feasible generalized least squares (FGLS) estimates of the parameters of the exogenous
regressors.
The present paper extends and generalizes the analysis in Kapoor et al. (2007) in several
respects. First, we allow the explanatory variables to be related to the time-invariant error
component, i.e., we provide an estimation framework that nests both the fixed and the random
effects setup. Second, we allow for higher-order rather than only first-order spatial regressive
processes in both the dependent variable and the error process, enabling a more flexible
design and specification tests of the ‘spatial’ interdependence decay function.2 Third, we
allow for endogenous variables, including spatial lags of the dependent variable in the main
equation, which is shown to affect the optimal weighting matrix for the moment conditions as
well as the distribution of the GM estimates. Fourth, we not only prove consistency of the
estimates of the model parameters but also derive their joint asymptotic distribution (which is
affected by the presence of endogenous variables in a nontrivial way). Fifth, we dispense with
the assumption of normally distributed error components, used by Kapoor et al. (2007) to
derive a simplified weighting matrix of the moments. In particular, we relax the restrictive
assumption that the idiosyncratic errors are identically distributed and allow for
heteroskedasticity of arbitrary form over cross-sectional units and time in the idiosyncratic
error terms. Under these assumptions, we derive a robust variance-covariance matrix, drawing
on recent results by Stock and Watson (2008). We emphasize that, in the framework of the
present paper, the advantage of the GM approach over maximum likelihood (ML) estimation
goes beyond imposing less restrictive distributional assumptions and computational
simplicity, since ML yields inconsistent parameter estimates in the SARAR(R,S) framework
with heteroskedasticity of unknown form (see Lee and Yu, 2010). Sixth, we derive a
Hausman test of the spatial fixed effects against the random effects model in the presence of
heteroskedasticity. Seventh and finally, we provide some limited Monte Carlo evidence on the
small sample performance of the proposed estimation procedures. In sum, this provides a
fairly flexible framework for applied work, allowing specification tests, estimation, and
inference in random and fixed effects panel data models with potentially higher-order
cross-sectional interdependence and heteroskedasticity.

1 See Cliff and Ord (1973, 1981), Anselin (1988), and Cressie (1993) for classic references
about spatial econometric models in general. Recent theoretical contributions to spatial panel
data models include Baltagi, Song, and Koh (2003), Baltagi, Song, Jung, and Koh (2007),
Kapoor et al. (2007), Baltagi, Egger, and Pfaffermayr (2008), and Lee and Yu (2008). Recent
applications of spatial panel data models include Arbia, Basile, and Piras (2005), Egger,
Pfaffermayr, and Winner (2005), Baltagi, Egger, and Pfaffermayr (2007), and Badinger and
Egger (2009).
The remainder of the paper is organized as follows. Section II introduces the basic model
specification, discusses the fixed versus the random effects model, and provides an overview
of the key assumptions of the proposed estimation procedure. Section III proposes GM
estimators for the parameters of spatial dependence in the error components. Section IV
derives a two-stage least squares (TSLS) and spatial generalized TSLS procedure for
estimation of the regression parameters of the model and derives a joint heteroskedasticity-
robust asymptotic variance-covariance matrix of the GM and TSLS estimates of the model
parameters. Section V derives a consistent estimator of the variance-covariance matrix.
Section VI proposes a Hausman-type test of the random versus the fixed effects model.
Section VII presents results of a Monte Carlo simulation exercise. Section VIII summarizes
our main findings and concludes. The detailed proofs are relegated to a technical appendix.
2 In a cross-sectional framework, estimation of higher order spatial regressive models is
considered by Lee and Liu (2010) under homoskedasticity and by Badinger and Egger (2008)
under heteroskedasticity.
II. The Basic Model
1. Specification and Key Assumptions
We consider an R-th order spatial regressive panel data model with S-th order spatial
regressive error components, referred to as SARAR(R,S) panel data error components model.
The basic model comprises $i = 1,\dots,N$ cross-sectional units and $t = 1,\dots,T$ time periods.
Throughout, subscript N indicates that the variables or parameters are allowed to depend on
sample size. For time period t, the model reads

$$y_{t,N} = X_{t,N}\beta_N + \sum_{r=1}^{R}\lambda_{r,N}W_{r,N}y_{t,N} + u_{t,N}, \quad \text{or} \quad (1a)$$

$$y_{t,N} = Z_{t,N}\delta_N + u_{t,N}, \quad (1b)$$
where $y_{t,N}$ is an $N \times 1$ vector with cross-sectional observations of the dependent variable in
year t, $X_{t,N}$ is an $N \times K$ matrix of observations on K non-stochastic explanatory variables,
i.e., $X_{t,N} = (x_{1t,N}', \dots, x_{Nt,N}')'$, where each of the N vectors $x_{it,N} = (x_{1,it,N}, \dots, x_{K,it,N})$ is of
dimension $1 \times K$, containing the observations on the K explanatory variables for cross-section
i and period t. For later reference, define the $T \times K$ matrix $X_{i,N} = (x_{i1,N}', \dots, x_{iT,N}')'$ as
observations on the K explanatory variables for cross-section i and all periods $t = 1,\dots,T$.
The structure of spatial dependence in $y_{t,N}$ is determined by the time-invariant $N \times N$
matrices $W_{r,N}$, $r = 1,\dots,R$, whose elements $w_{ij,r,N}$ are assumed to be known and will often
(but need not) be specified as a decreasing function of geographical distance between the
cross-sectional units i and j. The expression $y_{r,t,N} = W_{r,N}y_{t,N}$ is referred to as the r-th spatial
lag of $y_N$. The specification of a higher-order process allows the strength of spatial
interdependence in the dependent variable (reflected in the spatial autoregressive parameters
$\lambda_{r,N}$, $r = 1,\dots,R$) to vary across a fixed number of R subsets of relations between
cross-sectional units.
In equation (1b), the $N \times (K+R)$ design matrix is given by $Z_{t,N} = (X_{t,N}, Y_{t,N})$, with
$Y_{t,N} = [y_{1,t,N}, \dots, y_{R,t,N}]$, and $\delta_N = (\beta_N', \lambda_N')'$, where the $K \times 1$ parameter vector of the
exogenous variables is given by $\beta_N = (\beta_{1,N}, \dots, \beta_{K,N})'$ and the $R \times 1$ vector of spatial
autoregressive parameters of $y_N$ is defined as $\lambda_N = (\lambda_{1,N}, \dots, \lambda_{R,N})'$.
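The $N \times 1$ vector of error terms $u_{t,N} = (u_{1t,N}, \dots, u_{Nt,N})'$ is assumed to follow a spatial
autoregressive process given by

$$u_{t,N} = \sum_{s=1}^{S}\rho_{s,N}M_{s,N}u_{t,N} + \varepsilon_{t,N}, \quad (1c)$$

$$\varepsilon_{t,N} = \mu_N + v_{t,N}, \quad (1d)$$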
where $\rho_{s,N}$ and $M_{s,N}$ denote the time-invariant, unknown parameters and the known $N \times N$
matrices of spatial interdependence, respectively. The structure of spatial correlation in the
disturbances is determined by the S different, time-invariant $N \times N$ matrices $M_{s,N}$. As in
equation (1a), the specification of a higher-order process allows the strength of spatial
interdependence in the disturbances (reflected in the parameters $\rho_{s,N}$, $s = 1,\dots,S$) to vary
across a fixed number of S subsets of relations between cross-sectional units. This enables a
more flexible parameterization of the decay of spatial dependence than with a first-order
process along two lines: by capturing more than just one channel of interdependence and by
allowing for estimation of several parameters $\rho_{s,N}$ for S segments of the decay function
(e.g., rings of neighbors or segments of distance). The expression $u_{s,t,N} = M_{s,N}u_{t,N}$ is referred
to as the s-th spatial lag of $u_N$. The $S \times 1$ vector of the spatial autoregressive parameters of
$u_{t,N}$ is defined as $\rho_N = (\rho_{1,N}, \dots, \rho_{S,N})'$.
Finally, the $N \times 1$ vector of error terms $\varepsilon_{t,N}$ consists of two error components, a cross-section
specific, time-invariant error component $\mu_N$ and an idiosyncratic error component $v_{t,N}$,
which is specific to both the cross-sectional unit and the time period. The typical elements of
$\varepsilon_{t,N}$ and $v_{t,N}$ are the scalars $\varepsilon_{it,N}$ and $v_{it,N}$, respectively, and the $N \times 1$ vector of unit-specific
error components is given by $\mu_N = (\mu_{1,N}, \dots, \mu_{N,N})'$.
Stacking observations for all time periods such that t is the slow index and i is the fast index
with all vectors and matrices, the model reads

$$y_N = X_N\beta_N + Y_N\lambda_N + u_N, \quad \text{or} \quad (2a)$$

$$y_N = Z_N\delta_N + u_N, \quad (2b)$$

with the $NT \times K$ regressor matrix $X_N = (X_{1,N}', \dots, X_{T,N}')'$, and $Y_N = (y_{1,N}, \dots, y_{R,N})$, where
$y_{r,N} = (y_{r,1,N}', \dots, y_{r,T,N}')'$ is the $NT \times 1$ vector of observations on the r-th spatial lag of the
dependent variable. The $NT \times 1$ vector of disturbances $u_N = (u_{1,N}', \dots, u_{T,N}')'$ for the
spatial autoregressive process of order S is given by

$$u_N = \sum_{s=1}^{S}\rho_{s,N}(I_T \otimes M_{s,N})u_N + \varepsilon_N, \quad (2c)$$
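where $I_T$ is an identity matrix of dimension $T \times T$. The $NT \times 1$ vector $\varepsilon_N = (\varepsilon_{1,N}', \dots, \varepsilon_{T,N}')'$ is
specified as

$$\varepsilon_N = (e_T \otimes I_N)\mu_N + v_N, \quad (3a)$$

where $e_T$ is a $T \times 1$ vector of ones and $I_N$ is an identity matrix of dimension
$N \times N$. In light of (2c), the error term can also be written as

$$\varepsilon_N = u_N - \sum_{s=1}^{S}\rho_{s,N}(I_T \otimes M_{s,N})u_N = \Big[I_T \otimes \Big(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N}\Big)\Big]u_N. \quad (3b)$$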
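It follows that

$$u_N = \Big[I_T \otimes \Big(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N}\Big)^{-1}\Big]\varepsilon_N = \Big[I_T \otimes \Big(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N}\Big)^{-1}\Big]\big[(e_T \otimes I_N)\mu_N + v_N\big], \quad \text{and} \quad (4a)$$

$$y_N = \Big[I_T \otimes \Big(I_N - \sum_{r=1}^{R}\lambda_{r,N}W_{r,N}\Big)^{-1}\Big]X_N\beta_N + \Big[I_T \otimes \Big(I_N - \sum_{r=1}^{R}\lambda_{r,N}W_{r,N}\Big)^{-1}\Big]u_N. \quad (4b)$$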
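To make the reduced forms (4a) and (4b) concrete, the following NumPy sketch simulates a small SARAR(2,2) panel. It is an illustration under assumptions that are not taken from the paper: the circular "ring" weights, the parameter values, and the normal draws are hypothetical choices, and the helper names `ring_weights` and `simulate_sarar_panel` are ours. Instead of forming the $NT \times NT$ Kronecker products, it solves the $N \times N$ systems period by period, which is equivalent because of the block-diagonal structure $I_T \otimes (\cdot)^{-1}$.

```python
import numpy as np


def ring_weights(N, k):
    """Hypothetical row-normalized weights linking unit i to its k-th neighbours on a circle."""
    W = np.zeros((N, N))
    for i in range(N):
        W[i, (i - k) % N] = W[i, (i + k) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def simulate_sarar_panel(X, beta, lam, rho, W_list, M_list, sigma_mu, sigma_v, rng):
    """Draw (y_N, u_N, eps_N) from (3a), (4a), and (4b).

    X is (N*T, K), stacked with t as the slow and i as the fast index;
    sigma_v holds the N*T idiosyncratic standard deviations (heteroskedastic).
    """
    N = W_list[0].shape[0]
    T = X.shape[0] // N
    A = np.eye(N) - sum(l * W for l, W in zip(lam, W_list))  # I_N - sum_r lambda_r W_r
    B = np.eye(N) - sum(r * M for r, M in zip(rho, M_list))  # I_N - sum_s rho_s M_s
    mu = sigma_mu * rng.standard_normal(N)
    eps = np.tile(mu, T) + sigma_v * rng.standard_normal(N * T)  # (e_T kron I_N) mu + v
    u = np.empty(N * T)
    y = np.empty(N * T)
    for t in range(T):
        sl = slice(t * N, (t + 1) * N)
        u[sl] = np.linalg.solve(B, eps[sl])               # period-t block of (4a)
        y[sl] = np.linalg.solve(A, X[sl] @ beta + u[sl])  # period-t block of (4b)
    return y, u, eps


rng = np.random.default_rng(42)
N, T = 6, 3
W1, W2 = ring_weights(N, 1), ring_weights(N, 2)
X = rng.standard_normal((N * T, 2))
beta, lam, rho = np.array([1.0, -0.5]), [0.3, 0.2], [0.4, 0.1]
sigma_v = np.linspace(0.5, 1.5, N * T)  # heteroskedastic idiosyncratic errors
y, u, eps = simulate_sarar_panel(X, beta, lam, rho, [W1, W2], [W1, W2], 1.0, sigma_v, rng)
```

Because $\sum_r|\lambda_r| = 0.5 < 1$ and the weights are row-normalized, the matrices solved against here are nonsingular, in line with Assumption 2(c) below.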
A more general specification of (4a) would allow the spatial regressive parameters (and
possibly the weighting matrices) associated with the two error components $\mu_N$ and $v_N$ to
differ, as in Baltagi, Egger, and Pfaffermayr (2009). With a higher-order process as considered
in the present paper, such a specification would be both difficult to identify and
computationally involved. Hence, as in Kapoor et al. (2007), we assume the pattern of the
spatial regressive disturbance process to be the same for $\mu_N$ and $v_N$.
2. Key Assumptions
As is standard in the spatial econometric panel data literature, we assume that the
explanatory variables collected in $X_N$ are nonstochastic with elements that are bounded
uniformly in absolute value.3 Without loss of generality, we further assume that each
explanatory variable changes over time, at least for some cross-section i. (Under random
effects estimation, this assumption could be relaxed in a straightforward way without
invalidating the asymptotic results.) Beyond those, the following assumptions are maintained
throughout this paper.

3 See Kapoor, Kelejian, and Prucha (2007, p. 100), Lee and Yu (2008, p. 3), Lee and Yu
(2010, Assumption 6, p. 6), or Mutl and Pfaffermayr (2011, p. 51).

Assumption 1.
Let T be a fixed positive integer. (a) For all $1 \le t \le T$ and $1 \le i \le N$, $N \ge 1$, the error
components $v_{it,N}$ are (mutually) independently distributed with $E(v_{it,N}) = 0$, $E(v_{it,N}^2) = \sigma_{v,it}^2$,
where $0 < \sigma_{v,it}^2 < \infty$, and $E|v_{it,N}|^{4+\eta} < \infty$ for some $\eta > 0$. Hence, the idiosyncratic disturbances
exhibit heteroskedasticity of unknown form.
(b) For all $1 \le i \le N$, $N \ge 1$, the unit-specific error components $\mu_{i,N}$ are identically and
(mutually) independently distributed with $E(\mu_{i,N}^2) = \sigma_\mu^2$, where $0 < \sigma_\mu^2 < b$, and
$E|\mu_{i,N}|^{4+\eta} < \infty$ for some $\eta > 0$. Following Mundlak (1978), it is assumed that
$\mu_{i,N} = x_{it,N}\pi + w_{it,N}$. Averaging over time periods $t = 1,\dots,T$, we obtain $\mu_{i,N} = \bar{x}_{i,N}\pi + \bar{w}_{i,N}$ for
the $i = 1,\dots,N$ “between-transformed” observations, where $\bar{x}_{i,N} = T^{-1}\sum_{t=1}^{T}x_{it,N}$ and
$\bar{w}_{i,N} = T^{-1}\sum_{t=1}^{T}w_{it,N}$ denote time averages, and $\bar{w}_{i,N} \sim (0, \sigma_{w,N}^2)$. In the random effects model,
we have $\pi = 0$, which implies that the time-invariant error component is uncorrelated with
the explanatory variables $x_{it,N}$ in any time period t. In the fixed effects model, we have $\pi \neq 0$,
i.e., the explanatory variables are correlated with the time-invariant error component. More
precisely, in the random effects specification we have $E(\mu_{i,N} \mid X_N) = 0$, whereas in the fixed
effects model it holds that $E(\mu_{i,N} \mid X_N) = f(X_N) \neq 0$.4
(c) The processes $\{v_{it,N}\}$ and $\{\mu_{i,N}\}$ are independent of each other.5
We emphasize that the estimation framework considered here assumes that the spatial
regressive structure of the empirical model given by (1a) to (1d) is identical under the fixed
effects and the random effects setup, i.e., the time-invariant error component displays the
same spatial regressive structure through equations (1b) and (1c), irrespective of the
properties of the covariates. This differs from the specification of the spatial regressive fixed
effects models in Lee and Yu (2010) as well as Mutl and Pfaffermayr (2011), who exclude the
time-invariant error component from the spatial regressive error process.6
4 Strictly speaking, with non-stochastic regressors, the two expectations could also be stated
unconditionally (see Greene, 2008, p. 18).

5 Assumption 1 is maintained throughout the paper. For some results, in particular for
consistent estimation of the variance-covariance matrix of the GM estimates without
distributional assumptions, Assumption 1 will have to be strengthened, assuming that
$E|v_{it,N}|^{8+\eta} < \infty$ for all $1 \le t \le T$ and $1 \le i \le N$.

6 Lee and Yu (2008) consider maximum likelihood estimation of a homoskedastic spatial
regressive fixed effects panel data model; Mutl and Pfaffermayr (2008) consider a Hausman
test of random versus fixed effects in the first-order SARAR(1,1) model with homoskedastic
error components. Both partial out the time-invariant error component $\mu_N$ from the spatial
regressive disturbance process under fixed effects estimation. This choice implies that the
time-invariant error component displays the spatial regressive structure of the dependent
variable. The difference between Mutl and Pfaffermayr (2008) and our approach is apparent
from the specification of the ‘Mundlak assumption’ in (9). In matrix form and using notation
in an obvious way, we assume that $\mu_N = \bar{X}_N\pi + \bar{w}_N$, whereas Mutl and Pfaffermayr (2008)
assume that $\mu_N = (I_N - \lambda_N W_N)\bar{X}_N\pi + \bar{w}_N$ (which differs from the specification of their
random effects model).

As we will see below, our specification implies that the spatial generalized least squares
(GLS) transformed model nests the standard fixed and random effects panel data models.
Hence, we regard the nature of the spatial regressive process and the properties of the
explanatory variables under random versus fixed effects as two separate sets of assumptions.
Our approach allows for cross-sectional interdependence (with a known spatial structure) not
only in unobserved variables captured by $v_N$ but also in unobserved time-invariant variables
subsumed in $\mu_N$ in SARAR(0,S) models (i.e., without a spatially lagged dependent variable).
More importantly, this approach allows us to use the same set of four moment conditions
under both random effects and fixed effects estimation. Finally, when considering a Hausman
test of the random effects versus the fixed effects model in Section VI, we wish to compare
two model specifications whose assumptions regarding the spatial regressive structure of the
error components and the nature of heteroskedasticity are identical and which differ only with
regard to whether or not $\pi = 0$ in Assumption 1(b).

Assumption 1 implies that

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = \sigma_\mu^2 + \sigma_{v,it}^2 \quad \text{for } i = j \text{ and } t = s, \quad (5a)$$

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = \sigma_\mu^2 \quad \text{for } i = j \text{ and } t \neq s, \quad (5b)$$

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = 0 \quad \text{otherwise}. \quad (5c)$$

As a consequence, the variance-covariance matrix of the stacked error term $\varepsilon_N$ reads

$$\Omega_{\varepsilon,N} = E(\varepsilon_N\varepsilon_N') = \sigma_\mu^2(J_T \otimes I_N) + \Sigma_N, \quad (6a)$$

where $J_T = e_Te_T'$ is a $T \times T$ matrix with unitary elements, $I_{NT}$ is an identity matrix of
dimension $NT \times NT$, and

$$\Sigma_N = E(v_Nv_N') = \mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big) = \mathrm{diag}_{n=1}^{NT}\big(E \varepsilon_{n,N}^2\big) - \sigma_\mu^2 I_{NT}. \quad (6b)$$

Note that we use single indexation $n = 1,\dots,NT$ in equation (6b) to denote elements of the
stacked vectors or matrices. We will adopt this convention at several points in the paper in
order to simplify notation, when there is no possibility for confusion.
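The covariance structure in (5a) to (6b) can be checked numerically. The sketch below is a hypothetical illustration with made-up variances: it builds $\Omega_{\varepsilon,N}$ from (6a) and compares it with the sample second moments of simulated draws of $\varepsilon_N = (e_T \otimes I_N)\mu_N + v_N$ from (3a). Normal draws are an assumption made only for the simulation, since the paper imposes no distributional form; the helper names `omega_eps` and `single_index` are ours.

```python
import numpy as np


def omega_eps(sigma_mu2, sigma_v2, N, T):
    """Omega_eps = sigma_mu^2 (J_T kron I_N) + Sigma_N, eq. (6a); sigma_v2 has length N*T."""
    return sigma_mu2 * np.kron(np.ones((T, T)), np.eye(N)) + np.diag(sigma_v2)


def single_index(i, t, N):
    """Single index n of unit i in period t (0-based), with t slow and i fast."""
    return t * N + i


N, T = 3, 2
sigma_mu2 = 0.8
sigma_v2 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])  # heteroskedastic across i and t
Om = omega_eps(sigma_mu2, sigma_v2, N, T)

# Monte Carlo check of (6a): each row of eps is one draw of (e_T kron I_N) mu + v.
rng = np.random.default_rng(0)
draws = 200_000
mu = np.sqrt(sigma_mu2) * rng.standard_normal((draws, N))
v = np.sqrt(sigma_v2) * rng.standard_normal((draws, N * T))
eps = np.tile(mu, (1, T)) + v
Om_hat = eps.T @ eps / draws
```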
Next, we define two matrices $Q_{0,N}$ and $Q_{1,N}$, which are central to the estimation of error
component models and the moment conditions of the GM estimator:

$$Q_{0,N} = \Big(I_T - \frac{J_T}{T}\Big) \otimes I_N, \quad (7a)$$

$$Q_{1,N} = \frac{J_T}{T} \otimes I_N. \quad (7b)$$

Pre-multiplying an $NT \times 1$ vector with $Q_{0,N}$ transforms its elements into deviations from
cross-section specific sample means taken over time (“within-transformation”). We will refer
to “within-transformed” vectors or matrices with an underbar, e.g., $\underline{Z}_N = Q_{0,N}Z_N$.
Pre-multiplying a vector by $Q_{1,N}$ transforms its elements into cross-section specific sample means
(“between-transformation”). Notice that $Q_{0,N}$ and $Q_{1,N}$ are both of order $NT \times NT$,
symmetric, idempotent, orthogonal to each other, and sum up to $I_{NT}$.7
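The stated properties of $Q_{0,N}$ and $Q_{1,N}$ are easy to verify numerically. The following NumPy sketch (illustrative dimensions, hypothetical helper name `q_matrices`) builds both matrices from (7a) and (7b) and checks that $Q_{1,N}$ returns unit-specific time means and $Q_{0,N}$ the deviations from them.

```python
import numpy as np


def q_matrices(N, T):
    """Within (Q0) and between (Q1) transformation matrices of eqs. (7a)-(7b)."""
    J_T = np.ones((T, T))
    Q0 = np.kron(np.eye(T) - J_T / T, np.eye(N))
    Q1 = np.kron(J_T / T, np.eye(N))
    return Q0, Q1


N, T = 4, 3
Q0, Q1 = q_matrices(N, T)
z = np.arange(N * T, dtype=float)          # stacked with t slow and i fast
unit_means = z.reshape(T, N).mean(axis=0)  # time average for each unit i
between = Q1 @ z                           # each unit's mean, repeated for every t
within = Q0 @ z                            # deviations from the unit means
```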
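Assumption 2.
(a) All diagonal elements of the matrices $W_{r,N}$, $r = 1,\dots,R$, and $M_{s,N}$, $s = 1,\dots,S$, are zero.
(b) The admissible parameter space for the spatial lag of the dependent variable is given by
$\lambda_{r,N} \in (-a_{L,N}^{r}, a_{U,N}^{r})$, with $0 < a_{L,N}^{r}, a_{U,N}^{r} \le a^{\lambda} < \infty$, $r = 1,\dots,R$, and
$\sum_{r=1}^{R}|\lambda_{r,N}| < A_{\lambda}$, where we define $a^{\lambda}$ such that $\max_{r=1,\dots,R}(a_{L,N}^{r}, a_{U,N}^{r}) \le a^{\lambda}$ holds for all N.
Analogous assumptions are made for the parameters of the spatial autoregressive error
process: $\rho_{s,N} \in (-a_{L,N}^{s}, a_{U,N}^{s})$, with $0 < a_{L,N}^{s}, a_{U,N}^{s} \le a^{\rho} < \infty$, $s = 1,\dots,S$, and
$\sum_{m=1}^{S}|\rho_{m,N}| < A_{\rho}$, where we define $a^{\rho}$ such that $\max_{s=1,\dots,S}(a_{L,N}^{s}, a_{U,N}^{s}) \le a^{\rho}$ holds for all N.
(c) The matrices $(I_N - \sum_{r=1}^{R}\lambda_{r,N}W_{r,N})$ and $(I_N - \sum_{m=1}^{S}\rho_{m,N}M_{m,N})$ are nonsingular for
$\lambda_{r,N} \in (-a_{L,N}^{r}, a_{U,N}^{r})$ and $\rho_{s,N} \in (-a_{L,N}^{s}, a_{U,N}^{s})$, respectively.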
Part (a) of Assumption 2 is standard. Assumption 2(b) requires the spatial regressive
parameters to be bounded. The admissible values of the scalars $A_\lambda$ ($A_\rho$) will generally depend on
the properties of the weights matrices $W_{r,N}$ ($M_{s,N}$). For example, with row-normalized
matrices $W_{r,N}$, $r = 1,\dots,R$, choosing $A_\lambda = 1$ ensures that $(I_N - \sum_{r=1}^{R}\lambda_{r,N}W_{r,N})$ is invertible, as
required in Assumption 2(c), since the maximum absolute row sum of $\sum_{r=1}^{R}\lambda_{r,N}W_{r,N}$ is then at
most $\sum_{r=1}^{R}|\lambda_{r,N}| < 1$.8 Finally, Assumption 2(c) ensures that $u_N$ and $y_N$ are uniquely
identified through equations (4a) and (4b).

7 See Remark A.2 in Appendix A for further properties of $Q_{0,N}$ and $Q_{1,N}$.
Assumption 3.
The row and column sums of the matrices $W_{r,N}$, $r = 1,\dots,R$, $M_{s,N}$, $s = 1,\dots,S$,
$(I_N - \sum_{r=1}^{R}\lambda_{r,N}W_{r,N})^{-1}$, and $(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N})^{-1}$ are bounded uniformly in absolute value.
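The row-normalization argument after Assumption 2 can be illustrated numerically. The sketch below assumes hypothetical row-normalized ring weights (first- and second-order neighbours) and $\sum_r|\lambda_{r,N}| = 0.8 < A_\lambda = 1$. It checks that the spectral radius of $\sum_r\lambda_{r,N}W_{r,N}$ is below one and that the absolute row and column sums of $(I_N - \sum_r\lambda_{r,N}W_{r,N})^{-1}$ stay below the Neumann-series bound $1/(1-0.8) = 5$, in the spirit of Assumption 3 (the column bound holds here because these ring matrices happen to be symmetric). For one fixed N this only illustrates the bound; it does not establish uniformity in N.

```python
import numpy as np


def ring_weights(N, k):
    """Hypothetical row-normalized weights linking unit i to its k-th neighbours on a circle."""
    W = np.zeros((N, N))
    for i in range(N):
        W[i, (i - k) % N] = W[i, (i + k) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)


N = 50
W_list = [ring_weights(N, 1), ring_weights(N, 2)]
lam = np.array([0.45, -0.35])  # sum of |lambda_r| = 0.8 < A_lambda = 1
S_lam = sum(l * W for l, W in zip(lam, W_list))
spectral_radius = np.max(np.abs(np.linalg.eigvals(S_lam)))
inv = np.linalg.inv(np.eye(N) - S_lam)
max_abs_row_sum = np.abs(inv).sum(axis=1).max()
max_abs_col_sum = np.abs(inv).sum(axis=0).max()
bound = 1.0 / (1.0 - np.abs(lam).sum())  # Neumann-series bound on both sums
```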
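In light of Assumptions 1-3 and Remark A.1 in the Appendix, it follows that $E(u_N) = 0$ and
the variance-covariance matrix of $u_N$ is given by

$$\Omega_{u,N} = E(u_Nu_N') = \Big[I_T \otimes \Big(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N}\Big)^{-1}\Big]\,\Omega_{\varepsilon,N}\,\Big[I_T \otimes \Big(I_N - \sum_{s=1}^{S}\rho_{s,N}M_{s,N}'\Big)^{-1}\Big]. \quad (8)$$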
For the sake of generality, all explanatory variables and parameters (except for the variances of
the error components $\mu_N$ and $v_N$) are allowed to depend on sample size N. (Of course, all
results also hold in the case where parameters do not depend on N.) In spatial econometric
models this degree of generality is important, given that spatial lags (and disturbance
processes) depend on normalized weights matrices. Depending on the weighting scheme, both
the spatial weights and the corresponding parameters will change with the size of the
cross-sectional dimension, N, since a growing N (e.g., a growing number of countries or regions)
requires renormalizing the weights matrices. Such a specification is consistent, for example,
with models where the weights matrices are row-normalized and the number of neighbours of
a given cross-sectional unit depends on sample size (see Kapoor et al., 2007, p. 102) or where
the strength of interdependence (in terms of the spatial autoregressive parameters) changes
with the number of neighbours.
As a result, the model specification in equations (1a)-(1c) is fairly general, allowing for
higher-order spatial dependence in the dependent variable, the explanatory variables, and the
disturbances, and enabling specification tests to determine the proper structure of
cross-sectional interdependence in applied work.
3. Overview of Estimation Procedure
In the following, we outline the estimation procedure proposed in the present paper. Details
and proofs of the claims made here are given in the subsequent sections.
8 If the matrices $W_{r,N}$ are not row-normalized, Assumption 2(c) is implied by
$A_\lambda \max_{r=1,\dots,R}\|W_{r,N}\| < 1$ for some matrix norm $\|\cdot\|$ (see Lee and Liu, 2010; Horn and Johnson,
1985, p. 301).
In a first step, the regression parameters in model (1a), i.e., $\delta_N$, are consistently
estimated by fixed effects two-stage least squares (TSLS), ignoring the spatial
regressive structure of $u_N$ (see Amemiya, 1971; Baltagi, 2005). Under the maintained
assumptions, this yields consistent estimates of the disturbances $\hat{u}_N = y_N - Z_N\hat{\delta}_N$.
Under stronger assumptions (e.g., $\pi = 0$ in Assumption 1(b)), consistent estimates can also
be obtained by pooled two-stage least squares or two-stage least squares with random effects.
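A minimal sketch of this first step follows. It implements a within (fixed effects) TSLS estimator with NumPy for the case R = S = 1. The instrument set (the regressors and their first- and second-order spatial lags, in the spirit of Kelejian and Prucha), the simulated data, and all function names are assumptions made for illustration, not the paper's specification, which is given in Section IV. The simulated time-invariant effect is correlated with the unit means of the regressors, so the setting corresponds to $\pi \neq 0$ in Assumption 1(b).

```python
import numpy as np


def ring_weights(N, k=1):
    """Hypothetical row-normalized weights linking unit i to its k-th neighbours on a circle."""
    W = np.zeros((N, N))
    for i in range(N):
        W[i, (i - k) % N] = W[i, (i + k) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def within_tsls(y, X, W_list, T):
    """Fixed effects TSLS of y on Z = [X, Y], ignoring the spatial error process.

    Returns delta_hat = (beta', lambda')' and u_hat = y - Z delta_hat.
    """
    N = W_list[0].shape[0]
    Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))  # eq. (7a)

    def lag(W, a):
        return np.kron(np.eye(T), W) @ a  # (I_T kron W) a

    Z = np.hstack([X, np.column_stack([lag(W, y) for W in W_list])])
    # Assumed instruments: X and its first- and second-order spatial lags.
    H = np.hstack([X] + [lag(W, X) for W in W_list] + [lag(W, lag(W, X)) for W in W_list])
    Zw, yw, Hw = Q0 @ Z, Q0 @ y, Q0 @ H                   # within transformation
    Z_fit = Hw @ np.linalg.lstsq(Hw, Zw, rcond=None)[0]   # first-stage fitted values
    delta_hat = np.linalg.solve(Z_fit.T @ Zw, Z_fit.T @ yw)
    return delta_hat, y - Z @ delta_hat


rng = np.random.default_rng(1)
N, T = 400, 4
W = ring_weights(N)
beta, lam, rho = np.array([1.0, -1.0]), 0.4, 0.3
X = rng.standard_normal((N * T, 2))
x_bar = X.reshape(T, N, 2).mean(axis=0)            # unit-specific time means
mu = 0.5 * x_bar[:, 0] + rng.standard_normal(N)    # correlated effect (pi != 0)
eps = np.tile(mu, T) + rng.standard_normal(N * T)
y = np.empty(N * T)
for t in range(T):
    sl = slice(t * N, (t + 1) * N)
    u_t = np.linalg.solve(np.eye(N) - rho * W, eps[sl])               # (4a), period t
    y[sl] = np.linalg.solve(np.eye(N) - lam * W, X[sl] @ beta + u_t)  # (4b), period t
delta_hat, u_hat = within_tsls(y, X, [W], T)
```

As in the text, the residuals are computed as $\hat{u}_N = y_N - Z_N\hat{\delta}_N$, so they still contain the time-invariant error component.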
Based on the estimates of the disturbances $\hat{u}_N$, a generalized moments (GM)
estimator can be used to obtain consistent estimates of the parameters of the spatial
regressive disturbance process ($\rho_N$) and the variance of the time-invariant error
component ($\sigma_\mu^2$), denoted as $\tilde{\rho}_N$ and $\tilde{\sigma}_{\mu,N}^2$.
The joint variance-covariance matrix for the estimates of the regression parameters $\delta_N$
and the spatial regressive parameters $\rho_N$ derived in the present paper, which is robust
to both the spatial dependence in $u_N$ and arbitrary heteroskedasticity in the
idiosyncratic error term $v_N$, can be used for specification tests to determine the proper
form of the interdependence decay function.9
To improve efficiency, (the estimates of) the parameters $\rho_N$ can be used to obtain a
(feasible) spatial generalized least squares (GLS) transformed variant of model (1a)
(i.e., by pre-multiplying equation (2b) by $I_T \otimes (I_N - \sum_{s=1}^{S}\tilde{\rho}_{s,N}M_{s,N})$, cf. (3b)),
which corresponds to a “standard” (fixed or random effects) panel data model without
spatial dependence in the disturbances but with heteroskedasticity of unknown form in
the idiosyncratic error term $v_N$. Using this transformed model, feasible spatial
generalized two-stage least squares (TSLS) estimates of the regression parameters $\hat{\tilde{\delta}}_N^*$
can be obtained. (The asterisk indicates that the estimates are based on a transformed
model; the tilde indicates that the model transformation is based on $\tilde{\rho}_N$, i.e., the GM
estimates of $\rho_N$.) Again, a heteroskedasticity-robust joint variance-covariance matrix of $\hat{\tilde{\delta}}_N^*$
and $\tilde{\rho}_N$ is derived, allowing for joint inference regarding the regression parameters
and the spatial regressive parameters of the model.
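The spatial GLS transformation can be sketched as follows, assuming (as in (3b)) that pre-multiplication by $I_T \otimes (I_N - \sum_s\rho_{s,N}M_{s,N})$ is applied to both $y_N$ and $Z_N$. The function name and the test setup (random row-normalized weights with zero diagonals) are hypothetical. With the true $\rho_N$ the transformed disturbances coincide with $\varepsilon_N$, whereas in practice $\tilde{\rho}_N$ would be plugged in. The filter is applied period by period via `numpy.einsum` instead of building the $NT \times NT$ Kronecker product.

```python
import numpy as np


def spatial_gls_transform(a, rho, M_list, T):
    """Pre-multiply a stacked vector or matrix a (t slow, i fast) by I_T kron (I_N - sum_s rho_s M_s)."""
    N = M_list[0].shape[0]
    B = np.eye(N) - sum(r * M for r, M in zip(rho, M_list))
    a = np.asarray(a, dtype=float)
    blocks = a.reshape(T, N, -1)  # one N x k block per period
    return np.einsum("ij,tjk->tik", B, blocks).reshape(a.shape)


rng = np.random.default_rng(7)
N, T = 10, 3
M_list = []
for _ in range(2):
    M = rng.uniform(size=(N, N))
    np.fill_diagonal(M, 0.0)                         # Assumption 2(a)
    M_list.append(M / M.sum(axis=1, keepdims=True))  # row-normalized
rho = [0.3, 0.2]
B = np.eye(N) - sum(r * M for r, M in zip(rho, M_list))
eps = rng.standard_normal(N * T)
u = np.concatenate([np.linalg.solve(B, eps[t * N:(t + 1) * N]) for t in range(T)])  # (4a)
Z = rng.standard_normal((N * T, 3))
```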
The estimation procedure can also be implemented in an iterative way, i.e., the
feasible spatial generalized TSLS estimates $\hat{\tilde{\delta}}_N^*$ can be used to obtain iterated
estimates of the disturbances $u_N$, which can in turn be used to obtain a new set of
estimates for $\rho_N$, etc.

The obtained (feasible or iterated) heteroskedasticity-robust fixed and random effects
models can then be tested against each other by a Hausman test, which is derived in
this paper.

9 The possibility that joint hypotheses about $\delta_N$ and $\rho_N$ may be formulated and tested is an
advantage of the proposed two-step approach over the use of (spatial-dependence and
heteroskedasticity) robust standard errors. In particular, it allows for specification tests à la
Anselin et al. (1996) in a higher-order setting and under less restrictive distributional
assumptions.
To keep the analysis general, we first consider only the GM estimation of the disturbance
process (1c), without assuming a particular form of model (1a) or how consistent estimates of
the residuals $\tilde{u}_N$ of model (1a) are obtained. The advantage of this approach is that the results
are potentially applicable to the disturbances of a wider class of regression models, e.g.,
nonlinear specifications of equation (1a). Then, we consider the estimation of the main
equation (1a), using a modular approach with general notation that covers the four estimators
considered in the present paper: random effects and fixed effects estimation of both the
original and the spatial GLS transformed model.
III. GM Estimation of a SAR(S) Process
In the following, we consider GM estimators for the spatial regressive parameters $\rho_N$ of the
disturbance process in equation (1c) and the variance of the time-invariant error component $\sigma_\mu^2$,
and establish their asymptotic joint distribution. In this section, we only consider the
process in equation (1c) for the disturbances $u_N$, but not necessarily the one in equation (1a)
for $y_N$. These disturbances $u_N$ are unknown and thus have to be obtained in a first step,
using consistent estimates of $\delta_N$ in the main equation (1a) (or from some other model),
ignoring the spatial regressive error structure in $u_N$. The assumptions sufficient to establish
the asymptotic properties of the GM estimates (consistency and normality) are stated in
general terms in Assumptions 4 to 7 in this section and will be made more specific in Section
IV, where we consider TSLS and spatial generalized TSLS estimation of model (1a). It will
also become apparent in this section that the asymptotic distribution of the (second-step) GM
estimates of $\rho_N$, which are based on estimated disturbances $\tilde{u}_N$, is affected in a non-trivial
way by the properties of the first-step estimation (fixed versus random effects) and by the
presence of endogenous right-hand-side variables.
1. Moment Conditions
A set of three moment conditions for GM estimation of first-order spatial regressive error
processes was introduced in the seminal paper by Kelejian and Prucha (1999) for the case of a
single cross-section under homoskedasticity. The extension of this estimator to a random
effects panel data error component model by Kapoor et al. (2007) (under homoskedasticity)
yields a set of six moment conditions. Heteroskedasticity has so far only been considered in
the cross-sectional SARAR(1,1) framework by Kelejian and Prucha (2010), who use two of
the three moment conditions in Kelejian and Prucha (1999), and in the SARAR(0,1)
framework by Lin and Lee (2010).
An analogous approach to Kelejian and Prucha (2010) is pursued here in the derivation of the
moment conditions under heteroskedasticity, but for (both fixed and random effects) panel
data models. For this, we use four of the six moment conditions, akin to the ones in Kapoor et
al. (2007). Moreover, with an S-th-order rather than a first-order process (SAR(S), with $S > 1$),
additional moment conditions are available, associated with each weights matrix $M_{s,N}$,
$s = 1,\dots,S$, and each pair of weights matrices $M_{s,N}$, $M_{s',N}$, $s = 1,\dots,S$; $s' = s,\dots,S$. Define

$$\varepsilon_{s,N} = (I_T \otimes M_{s,N})\varepsilon_N = (I_T \otimes M_{s,N})\Big[u_N - \sum_{m=1}^{S}\rho_{m,N}(I_T \otimes M_{m,N})u_N\Big]. \quad (10)$$

Under Assumptions 1 to 3, we then have the following set of moment conditions for $T \ge 2$
and $s = 1,\dots,S$; $s' = s,\dots,S$:
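$$M_1^{s,s'}:\quad E\Big[\frac{1}{N(T-1)}\varepsilon_{s,N}'Q_{0,N}\varepsilon_{s',N}\Big] = \frac{1}{N(T-1)}\mathrm{tr}\Big[\mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big)\,Q_{0,N}(I_T \otimes M_{s,N}'M_{s',N})\Big], \quad (11a)$$

$$M_2^{s}:\quad E\Big[\frac{1}{N(T-1)}\varepsilon_{s,N}'Q_{0,N}\varepsilon_N\Big] = 0, \quad (11b)$$

$$M_3^{s,s'}:\quad E\Big[\frac{1}{N}\varepsilon_{s,N}'Q_{1,N}\varepsilon_{s',N}\Big] = \frac{T\sigma_\mu^2}{N}\mathrm{tr}(M_{s,N}'M_{s',N}) + \frac{1}{N}\mathrm{tr}\Big[\mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big)\,Q_{1,N}(I_T \otimes M_{s,N}'M_{s',N})\Big], \quad (11c)$$

$$M_4^{s}:\quad E\Big[\frac{1}{N}\varepsilon_{s,N}'Q_{1,N}\varepsilon_N\Big] = 0. \quad (11d)$$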
Unless some pairs of weights matrices are orthogonal, there are $4S + S(S-1)$ moment
conditions.10 For the case of a first-order spatial regressive process, i.e., $S = 1$, they nest the
moment conditions of the aforementioned GM estimators as special cases. Under
homoskedasticity, i.e., $\mathrm{diag}_{n=1}^{NT}(E v_{n,N}^2) = \sigma_v^2 I_{NT}$, the corresponding four moment conditions in
Kapoor et al. (2007) are obtained. In the cross-sectional case, i.e., for $T = 1$ (and
$Q_{0,N} = 0$), the moment conditions M1 and M2 become uninformative, and M3 and M4 reduce to
the two corresponding moment conditions in Kelejian and Prucha (2010) under
heteroskedasticity with the $N \times N$ matrix $\mathrm{diag}_{i=1}^{N}(E v_{i}^2)$, or to the two moment conditions in
Kelejian and Prucha (1999) under homoskedasticity with $\mathrm{diag}_{i=1}^{N}(E v_{i,N}^2) = \sigma_{v,N}^2 I_N$.

10 If some pairs of matrices are orthogonal, i.e., $M_{s,N}'M_{s',N} = 0$ for some $s \neq s'$, the
corresponding moment condition is trivially satisfied for any set of (finite) parameter values.
Hence, if all weights matrices were pairwise orthogonal, there would be $4S$ moment conditions.
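Note that the moment conditions can also be written as quadratic forms in the vector $\varepsilon_N$:

$$M_1^{s,s'}:\quad E\Big[\frac{1}{N(T-1)}\varepsilon_N'A_{1,N}^{s,s'}\varepsilon_N\Big] = 0, \quad \text{with} \quad A_{1,N}^{s,s'} = Q_{0,N}(I_T \otimes M_{s,N}'M_{s',N}) - Q_{0,N}\,\mathrm{diag}_{n=1}^{NT}\big[(I_T \otimes M_{s,N}'M_{s',N})_{nn}\big]\,Q_{0,N}. \quad (12a)$$

$$M_2^{s}:\quad E\Big[\frac{1}{N(T-1)}\varepsilon_N'A_{2,N}^{s}\varepsilon_N\Big] = 0, \quad \text{with} \quad A_{2,N}^{s} = Q_{0,N}(I_T \otimes M_{s,N}). \quad (12b)$$

$$M_3^{s,s'}:\quad E\Big[\frac{1}{N}\varepsilon_N'A_{3,N}^{s,s'}\varepsilon_N\Big] - \frac{T\sigma_\mu^2}{N}\mathrm{tr}(M_{s,N}'M_{s',N}) = 0, \quad \text{with} \quad A_{3,N}^{s,s'} = Q_{1,N}(I_T \otimes M_{s,N}'M_{s',N}) - \frac{T}{T-1}\,Q_{0,N}\,\mathrm{diag}_{n=1}^{NT}\big\{[Q_{1,N}(I_T \otimes M_{s,N}'M_{s',N})]_{nn}\big\}\,Q_{0,N}. \quad (12c)$$

$$M_4^{s}:\quad E\Big[\frac{1}{N}\varepsilon_N'A_{4,N}^{s}\varepsilon_N\Big] = 0, \quad \text{with} \quad A_{4,N}^{s} = Q_{1,N}(I_T \otimes M_{s,N}). \quad (12d)$$

The diagonal corrections in $A_{1,N}^{s,s'}$ and $A_{3,N}^{s,s'}$ remove the terms involving the unknown
$\mathrm{diag}_{n=1}^{NT}(E v_{n,N}^2)$, and pre- and post-multiplication of these corrections by $Q_{0,N}$ ensures that they
do not involve $\mu_N$.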
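Because the conditions in (12a) to (12d) are exact statements about expectations, they can be checked without simulation: for $\varepsilon_N$ as in (3a), $E(\varepsilon_N'A\varepsilon_N) = \mathrm{tr}(A\,\Omega_{\varepsilon,N})$ with $\Omega_{\varepsilon,N}$ from (6a). The NumPy sketch below does this for two hypothetical row-normalized weights matrices with zero diagonals and arbitrary heteroskedastic variances. It assumes the quadratic-form matrices $A_1 = Q_{0,N}P - Q_{0,N}\,\mathrm{diag}(P)\,Q_{0,N}$, $A_2 = Q_{0,N}(I_T \otimes M_{s,N})$, $A_3 = Q_{1,N}P - \frac{T}{T-1}Q_{0,N}\,\mathrm{diag}(Q_{1,N}P)\,Q_{0,N}$, and $A_4 = Q_{1,N}(I_T \otimes M_{s,N})$ with $P = I_T \otimes M_{s,N}'M_{s',N}$, and it works with unscaled quadratic forms (the factors $1/(N(T-1))$ and $1/N$ do not affect whether an expectation is zero). Function names are ours.

```python
import numpy as np


def q_matrices(N, T):
    """Within (Q0) and between (Q1) transformation matrices of eqs. (7a)-(7b)."""
    J_T = np.ones((T, T))
    return np.kron(np.eye(T) - J_T / T, np.eye(N)), np.kron(J_T / T, np.eye(N))


def quad_form_matrices(Ms, Mt, T):
    """A_1, ..., A_4 for the pair (M_s, M_s'); A_2 and A_4 involve M_s only."""
    N = Ms.shape[0]
    Q0, Q1 = q_matrices(N, T)
    P = np.kron(np.eye(T), Ms.T @ Mt)  # I_T kron M_s' M_s'
    A1 = Q0 @ P - Q0 @ np.diag(np.diag(P)) @ Q0
    A2 = Q0 @ np.kron(np.eye(T), Ms)
    A3 = Q1 @ P - (T / (T - 1)) * Q0 @ np.diag(np.diag(Q1 @ P)) @ Q0
    A4 = Q1 @ np.kron(np.eye(T), Ms)
    return A1, A2, A3, A4


def expected_quad_form(A, sigma_mu2, sigma_v2, N, T):
    """E[eps' A eps] = tr(A Omega_eps), with Omega_eps from eq. (6a)."""
    om = sigma_mu2 * np.kron(np.ones((T, T)), np.eye(N)) + np.diag(sigma_v2)
    return np.trace(A @ om)


def random_weights(rng, N):
    """Hypothetical row-normalized weights with zero diagonal (Assumption 2(a))."""
    W = rng.uniform(size=(N, N))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(3)
N, T = 7, 4
M1, M2 = random_weights(rng, N), random_weights(rng, N)
sigma_mu2, sigma_v2 = 0.7, rng.uniform(0.5, 2.0, N * T)  # heteroskedastic v
A1, A2, A3, A4 = quad_form_matrices(M1, M2, T)
e = [expected_quad_form(A, sigma_mu2, sigma_v2, N, T) for A in (A1, A2, A3, A4)]
target3 = T * sigma_mu2 * np.trace(M1.T @ M2)  # the sigma_mu^2 term in (12c), unscaled
```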
Substituting equations (3a), (3b), (6b), and (10) into the $4S+S(S-1)$ moment conditions (11) yields a system of $4S+S(S-1)$ equations in $(\rho_{1,N},\dots,\rho_{S,N},\sigma^2_{\mu,N})$, which can be written as

$$\gamma_N-\Gamma_Nb_N=0,\qquad(13a)$$

where $b_N$ is a $[2S+S(S-1)/2+1]\times1$ vector given by

$$b_N=(\rho_{1,N},\dots,\rho_{S,N},\rho_{1,N}^2,\dots,\rho_{S,N}^2,\rho_{1,N}\rho_{2,N},\dots,\rho_{1,N}\rho_{S,N},\dots,\rho_{S-1,N}\rho_{S,N},\sigma^2_{\mu,N})',$$

i.e., $b_N$ contains $S$ linear terms $\rho_{m,N}$, $m=1,\dots,S$, $S$ quadratic terms $\rho_{m,N}^2$, $m=1,\dots,S$, $S(S-1)/2$ cross products $\rho_{m,N}\rho_{l,N}$, $m=1,\dots,S-1$, $l=m+1,\dots,S$, as well as $\sigma^2_{\mu,N}$. For later reference, we define the $(S+1)\times1$ vector of all parameters as $\theta_N=(\rho_N',\sigma^2_{\mu,N})'=(\rho_{1,N},\dots,\rho_{S,N},\sigma^2_{\mu,N})'$.

$\gamma_N$ is a $[4S+S(S-1)]\times1$ vector with elements $\gamma_{i,N}$, $i=1,\dots,4S+S(S-1)$, and $\Gamma_N$ is a $[4S+S(S-1)]\times[2S+S(S-1)/2+1]$ matrix with elements $\gamma_{ij,N}$, $i=1,\dots,4S+S(S-1)$, $j=1,\dots,2S+S(S-1)/2+1$. The elements $\gamma_{i,N}$ and $\gamma_{ij,N}$ will be defined below.
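The ordering of $b_N$ can be made concrete with a short NumPy sketch. It is an illustration under our own conventions (the function name `b_vector` is hypothetical, and indices are 0-based in the code), not code from the paper: linear terms come first, then squares, then cross products ordered $(1,2),(1,3),\dots,(S-1,S)$, and $\sigma^2_{\mu,N}$ last.

```python
import numpy as np


def b_vector(rho, sigma2_mu):
    """Stack b_N = (rho_1..rho_S, rho_1^2..rho_S^2, rho_m*rho_l (m<l), sigma^2_mu)'.

    Cross products are ordered (1,2), (1,3), ..., (1,S), (2,3), ..., (S-1,S).
    """
    rho = np.asarray(rho, dtype=float)
    S = rho.size
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return np.concatenate([rho, rho**2, np.array(cross, dtype=float), [sigma2_mu]])


b = b_vector([0.3, 0.5], 1.2)
print(b.size)  # 2S + S(S-1)/2 + 1 = 6 for S = 2
```

The length $2S+S(S-1)/2+1$ equals the number of columns of $\Gamma_N$.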
Throughout the paper, we adopt the following convention with respect to the ordering of the rows in equation system (13). The first four rows are associated with the four moment conditions $M_1^{s,s'}$, $M_2^{s}$, $M_3^{s,s'}$, and $M_4^{s}$ with $s=s'=1$. The next four rows are associated with $M_1^{s,s'}$, $M_2^{s}$, $M_3^{s,s'}$, and $M_4^{s}$ with $s=s'=2$, and so forth up to $s=s'=S$. This yields $4S$ rows of the equation system. These moment conditions are always available under Assumptions 1 and 2. Unless some of the weights matrices are orthogonal, there are $S(S-1)$ further moment conditions available, resulting from $M_1^{s,s'}$ and $M_3^{s,s'}$ with $s=1,\dots,S-1$, $s'=s+1,\dots,S$. These are added to the equation system, starting from row $4S+1$, as follows. The next row ($4S+1$) is associated with $M_1^{s,s'}$ with $s=1$ and $s'=2$, the next row with $M_1^{s,s'}$ with $s=1$ and $s'=3$, and so forth up to $s=1$ and $s'=S$, continuing with $s=2$ and $s'=3,\dots,S$, up to $s=S-1$ and $s'=S$; this yields $S(S-1)/2$ rows. We then proceed with $M_3^{s,s'}$ in the same way, yielding another $S(S-1)/2$ rows.
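The row convention can be checked mechanically. The following sketch (hypothetical helper `row_index`, 1-based rows as in the text) encodes the ordering just described. For $S=3$, the $4S+S(S-1)=18$ conditions should occupy rows 1 to 18 exactly once.

```python
def row_index(c, s, s2, S):
    """1-based row of moment condition M_c^{s,s2} in system (13).

    c = 1..4 indexes M_1..M_4; s, s2 = 1..S index the weights matrices.
    Rows 1..4S hold the conditions with s == s2 (four per s); the S(S-1)/2
    cross conditions of M_1 follow, then those of M_3.
    """
    if s == s2:
        return 4 * (s - 1) + c
    if c not in (1, 3) or not 1 <= s < s2 <= S:
        raise ValueError("cross conditions exist only for c in {1, 3} and s < s'")
    # position of (s, s2) in the order (1,2), (1,3), ..., (1,S), (2,3), ..., (S-1,S)
    offset = (s - 1) * S - s * (s - 1) // 2 + (s2 - s)
    base = 4 * S if c == 1 else 4 * S + S * (S - 1) // 2
    return base + offset


S = 3
rows = [row_index(c, s, s, S) for s in range(1, S + 1) for c in range(1, 5)]
rows += [row_index(c, s, s2, S) for c in (1, 3)
         for s in range(1, S) for s2 in range(s + 1, S + 1)]
print(rows)
```

The offset formula is the same one that yields the row indices $4S+(s-1)S-s(s-1)/2+(s'-s)$ used for $M_1^{s,s'}$ below.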
The sample analogue to equation system (13a) is given by

$$\tilde\gamma_N-\tilde\Gamma_Nb_N=\upsilon_N(\theta_N),\qquad(13b)$$

where the elements of $\tilde\gamma_N$ and $\tilde\Gamma_N$ are equal to those of $\gamma_N$ and $\Gamma_N$ with the expectations operator suppressed and the disturbances $u_N$ replaced by (consistent) estimates $\tilde u_N$.
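GM estimates of the parameters $\rho_{1,N},\dots,\rho_{S,N}$, $\sigma^2_{\mu,N}$ are then obtained as the solution to

$$(\tilde\rho_{1,N},\dots,\tilde\rho_{S,N},\tilde\sigma^2_{\mu,N})=\arg\min\,[\tilde\gamma_N-\tilde\Gamma_Nb_N]'\tilde\Theta_N[\tilde\gamma_N-\tilde\Gamma_Nb_N]=\arg\min\,\upsilon_N(\theta_N)'\tilde\Theta_N\upsilon_N(\theta_N),\qquad(14)$$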
i.e., the parameter estimates can be obtained from a (weighted) nonlinear least squares regression of $\tilde\gamma_N$ on the columns of $\tilde\Gamma_N$; $\upsilon_N(\theta_N)$ can then be viewed as a vector of regression residuals. The optimal choice of the $[4S+S(S-1)]\times[4S+S(S-1)]$ weighting matrix $\Theta_N$ and its estimation will be discussed below.
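Because $b_N$ is quadratic in $\rho_N$, the minimization in (14) is a small nonlinear least squares problem. Below is a minimal Gauss-Newton sketch. It assumes NumPy, noise-free illustrative moments built from a random $\Gamma$, and starting values close to the truth (in practice, an initial estimate such as the unweighted GM step). It ignores the parameter bounds that appear in (18), and all function names are hypothetical.

```python
import numpy as np


def b_vector(theta):
    """b_N(theta) for theta = (rho_1, ..., rho_S, sigma^2_mu)."""
    rho, s2 = theta[:-1], theta[-1]
    S = rho.size
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return np.concatenate([rho, rho**2, np.array(cross, dtype=float), [s2]])


def b_jacobian(theta):
    """d b_N / d theta' (rows ordered as in b_N)."""
    rho = theta[:-1]
    S = rho.size
    rows = [np.eye(S, S + 1), np.hstack([np.diag(2.0 * rho), np.zeros((S, 1))])]
    for m in range(S - 1):
        for l in range(m + 1, S):
            r = np.zeros(S + 1)
            r[m], r[l] = rho[l], rho[m]
            rows.append(r[None, :])
    last = np.zeros((1, S + 1))
    last[0, S] = 1.0
    rows.append(last)
    return np.vstack(rows)


def gm_estimate(gamma, Gamma, Theta, theta0, max_iter=100, tol=1e-12):
    """Gauss-Newton minimisation of v' Theta v with v(theta) = gamma - Gamma b(theta)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(max_iter):
        v = gamma - Gamma @ b_vector(theta)
        J = Gamma @ b_jacobian(theta)                  # J = -dv/dtheta'
        step = np.linalg.solve(J.T @ Theta @ J, J.T @ Theta @ v)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta


rng = np.random.default_rng(0)
S = 2
Gamma = rng.normal(size=(4 * S + S * (S - 1), 2 * S + S * (S - 1) // 2 + 1))
theta_true = np.array([0.4, -0.2, 0.8])
gamma = Gamma @ b_vector(theta_true)                   # noise-free moments
theta_hat = gm_estimate(gamma, Gamma, np.eye(Gamma.shape[0]), [0.38, -0.18, 0.78])
print(theta_hat)
```

The Jacobian used in each step is $\Gamma\,\partial b/\partial\theta'$, which anticipates the structure of $J_N$ discussed in Subsection 3.2.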
In the following, we define the elements of $\gamma_N$ and $\Gamma_N$, grouped by the corresponding moment conditions. To this end, we use the following notation:

$$\bar u_{s,N}=(I_T\otimes M_{s,N})u_N,\quad s=1,\dots,S,\qquad(15a)$$

$$\bar u_{sm,N}=(I_T\otimes M_{s,N})(I_T\otimes M_{m,N})u_N=(I_T\otimes M_{s,N}M_{m,N})u_N,\quad s=1,\dots,S,\ m=1,\dots,S.\qquad(15b)$$
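In the derivation of the elements of $\gamma_N$ and $\Gamma_N$, we also make use of the fact that

$$E[\operatorname{diag}_{n=1}^{NT}(v_{n,N}^2)]=E\Big[\operatorname{diag}_{n=1}^{NT}(u_{n,N}^2)-2\sum_{m=1}^S\rho_{m,N}\operatorname{diag}_{n=1}^{NT}(u_{m,n,N}u_{n,N})+\sum_{m=1}^S\sum_{l=1}^S\rho_{m,N}\rho_{l,N}\operatorname{diag}_{n=1}^{NT}(u_{m,n,N}u_{l,n,N})\Big]-\sigma^2_{\mu,N}I_{NT},\qquad(16)$$

where $u_{m,n,N}$ denotes the $n$-th element of the vector $\bar u_{m,N}$.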
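Equation (16) rests on an elementwise expansion. Since $\varepsilon_N=u_N-\sum_m\rho_{m,N}\bar u_{m,N}$, each squared element satisfies $\varepsilon_n^2=u_n^2-2\sum_m\rho_m u_{m,n}u_n+\sum_m\sum_l\rho_m\rho_l u_{m,n}u_{l,n}$. Taking expectations and using $E(\varepsilon_n^2)=E(v_n^2)+\sigma^2_\mu$ under the error-component structure then leads to (16). The sketch below checks only the algebraic expansion numerically. It assumes period-major stacking of $u_N$ and randomly drawn, row-normalized weights matrices, chosen purely for illustration.

```python
import numpy as np


def eps_squared_via_u(u, M_list, rho, T):
    """Expansion of eps_n^2 in terms of u and ubar_m = (I_T kron M_m) u, cf. (15a)."""
    ubar = [np.kron(np.eye(T), M) @ u for M in M_list]
    lin = sum(r * um * u for r, um in zip(rho, ubar))
    quad = sum(rho[m] * rho[l] * ubar[m] * ubar[l]
               for m in range(len(rho)) for l in range(len(rho)))
    return u**2 - 2.0 * lin + quad


rng = np.random.default_rng(1)
N, T = 5, 3
M_list = []
for _ in range(2):
    M = rng.uniform(size=(N, N))
    np.fill_diagonal(M, 0.0)
    M_list.append(M / M.sum(axis=1, keepdims=True))   # row-normalised weights
rho = np.array([0.3, -0.2])
u = rng.normal(size=N * T)
eps = u - sum(r * np.kron(np.eye(T), M) @ u for r, M in zip(rho, M_list))
print(np.allclose(eps**2, eps_squared_via_u(u, M_list, rho, T)))
```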
Moment Condition $M_1^{s,s'}$

Due to the adopted convention regarding the ordering of the rows in equation system (13), the row index for moment condition $M_1^{s,s'}$, denoted as $\mathrm{row}(M_1^{s,s'})$, is given by $4(s-1)+1$ for $s=s'$ and by $4S+(s-1)S-s(s-1)/2+(s'-s)$ for $s<s'$. Hence, moment condition $M_1^{s,s'}$ delivers $S$ rows of the equation system (rows $1,5,\dots,4S-3$) for $s=s'$ and $S(S-1)/2$ rows of the equation system (rows $4S+1,\dots,4S+S(S-1)/2$) for $s<s'$. The corresponding elements of $\gamma_N$ and $\Gamma_N$ are defined as follows:¹¹
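$$\gamma_{\mathrm{row}(M_1^{s,s'})}=\frac{1}{N(T-1)}E\{\bar u_s'Q_0\bar u_{s'}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_n^2)\,Q_0(I_T\otimes M_s'M_{s'})]\}=\frac{1}{N(T-1)}E(u'\bar A_1^{s,s'}u),\qquad(17a)$$

where $\bar A_1^{s,s'}=Q_0(I_T\otimes M_s'M_{s'})-\operatorname{diag}_{n=1}^{NT}\{[Q_0(I_T\otimes M_s'M_{s'})]_{nn}\}$.

$$\gamma_{\mathrm{row}(M_1^{s,s'}),m}=\frac{2}{N(T-1)}E\{\bar u_{sm}'Q_0\bar u_{s'}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}u_n)\,Q_0(I_T\otimes M_s'M_{s'})]\}=\frac{2}{N(T-1)}E[u'(I_T\otimes M_m')\bar A_1^{s,s'}u],$$

associated with $\rho_m$, $m=1,\dots,S$.

$$\gamma_{\mathrm{row}(M_1^{s,s'}),S+m}=-\frac{1}{N(T-1)}E\{\bar u_{sm}'Q_0\bar u_{s'm}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}^2)\,Q_0(I_T\otimes M_s'M_{s'})]\}=-\frac{1}{N(T-1)}E[u'(I_T\otimes M_m')\bar A_1^{s,s'}(I_T\otimes M_m)u],$$

associated with $\rho_m^2$, $m=1,\dots,S$.

$$\gamma_{\mathrm{row}(M_1^{s,s'}),2S+(m-1)S-m(m-1)/2+(l-m)}=-\frac{2}{N(T-1)}E\{\bar u_{sm}'Q_0\bar u_{s'l}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}u_{l,n})\,Q_0(I_T\otimes M_s'M_{s'})]\}=-\frac{2}{N(T-1)}E[u'(I_T\otimes M_m')\bar A_1^{s,s'}(I_T\otimes M_l)u],$$

associated with $\rho_m\rho_l$, $m=1,\dots,S-1$, $l=m+1,\dots,S$.

$$\gamma_{\mathrm{row}(M_1^{s,s'}),2S+S(S-1)/2+1}=-\frac{1}{N}\operatorname{tr}(M_s'M_{s'}),$$

associated with $\sigma^2_{\mu}$.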
¹¹ For simplicity, the subscript $N$ is dropped in the definition of the elements of $\gamma_N$ and $\Gamma_N$.

Moment Condition $M_2^{s}$
Due to the adopted convention with respect to the ordering of the rows in equation system (13a), the row index for moment condition $M_2^{s}$ is given by $4(s-1)+2$. (For $M_2$ we always have $s=s'$, so that we use only a single superscript.) Hence, moment condition $M_2^{s}$ delivers $S$ rows of the equation system (rows $2,6,\dots,4S-2$). The corresponding elements of $\gamma_N$ and $\Gamma_N$ are defined as follows:
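$$\gamma_{4(s-1)+2}=\frac{1}{N(T-1)}E(\bar u_s'Q_0u)=\frac{1}{N(T-1)}E(u'\bar A_2^su),\quad\text{where }\bar A_2^s=A_2^s,\qquad(17b)$$

$$\gamma_{4(s-1)+2,m}=\frac{1}{N(T-1)}E(\bar u_{sm}'Q_0u+\bar u_s'Q_0\bar u_m)=\frac{1}{N(T-1)}E[u'(I_T\otimes M_m')(\bar A_2^s+\bar A_2^{s\prime})u],$$

$$\gamma_{4(s-1)+2,S+m}=-\frac{1}{N(T-1)}E(\bar u_{sm}'Q_0\bar u_m)=-\frac{1}{N(T-1)}E[u'(I_T\otimes M_m')\bar A_2^s(I_T\otimes M_m)u],$$

$$\gamma_{4(s-1)+2,2S+(m-1)S-m(m-1)/2+(l-m)}=-\frac{1}{N(T-1)}E(\bar u_{sm}'Q_0\bar u_l+\bar u_{sl}'Q_0\bar u_m)=-\frac{1}{N(T-1)}E[u'(I_T\otimes M_l')(\bar A_2^s+\bar A_2^{s\prime})(I_T\otimes M_m)u],$$

$$\gamma_{4(s-1)+2,2S+S(S-1)/2+1}=0.$$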
Moment Condition $M_3^{s,s'}$

Due to the adopted convention regarding the ordering of the rows in equation system (13), the row index for moment condition $M_3^{s,s'}$, denoted as $\mathrm{row}(M_3^{s,s'})$, is given by $4(s-1)+3$ for $s=s'$ and by $4S+S(S-1)/2+(s-1)S-s(s-1)/2+(s'-s)$ for $s<s'$. Hence, moment condition $M_3^{s,s'}$ delivers $S$ rows of the equation system (rows $3,7,\dots,4S-1$) for $s=s'$ and $S(S-1)/2$ rows of the equation system (rows $4S+S(S-1)/2+1,\dots,4S+S(S-1)$) for $s<s'$. The corresponding elements of $\gamma_N$ and $\Gamma_N$ are defined as follows:
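$$\gamma_{\mathrm{row}(M_3^{s,s'})}=\frac{1}{N}E\{\bar u_s'Q_1\bar u_{s'}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_n^2)\,Q_1(I_T\otimes M_s'M_{s'})]\}=\frac{1}{N}E(u'\bar A_3^{s,s'}u),\qquad(17c)$$

where $\bar A_3^{s,s'}=Q_1(I_T\otimes M_s'M_{s'})-\operatorname{diag}_{n=1}^{NT}\{[Q_1(I_T\otimes M_s'M_{s'})]_{nn}\}$.

$$\gamma_{\mathrm{row}(M_3^{s,s'}),m}=\frac{2}{N}E\{\bar u_{sm}'Q_1\bar u_{s'}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}u_n)\,Q_1(I_T\otimes M_s'M_{s'})]\}=\frac{2}{N}E[u'(I_T\otimes M_m')\bar A_3^{s,s'}u],$$

associated with $\rho_m$, $m=1,\dots,S$.

$$\gamma_{\mathrm{row}(M_3^{s,s'}),S+m}=-\frac{1}{N}E\{\bar u_{sm}'Q_1\bar u_{s'm}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}^2)\,Q_1(I_T\otimes M_s'M_{s'})]\}=-\frac{1}{N}E[u'(I_T\otimes M_m')\bar A_3^{s,s'}(I_T\otimes M_m)u],$$

associated with $\rho_m^2$, $m=1,\dots,S$.

$$\gamma_{\mathrm{row}(M_3^{s,s'}),2S+(m-1)S-m(m-1)/2+(l-m)}=-\frac{2}{N}E\{\bar u_{sm}'Q_1\bar u_{s'l}-\operatorname{Tr}[\operatorname{diag}_{n=1}^{NT}(u_{m,n}u_{l,n})\,Q_1(I_T\otimes M_s'M_{s'})]\}=-\frac{2}{N}E[u'(I_T\otimes M_m')\bar A_3^{s,s'}(I_T\otimes M_l)u],$$

associated with $\rho_m\rho_l$, $m=1,\dots,S-1$, $l=m+1,\dots,S$.

$$\gamma_{\mathrm{row}(M_3^{s,s'}),2S+S(S-1)/2+1}=\frac{T-1}{N}\operatorname{tr}(M_s'M_{s'}),$$

associated with $\sigma^2_{\mu}$.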
Moment Condition $M_4^{s}$

The row index for moment condition $M_4^{s}$ is given by $4(s-1)+4$, i.e., moment condition $M_4^{s}$ delivers $S$ rows of the equation system (rows $4,8,\dots,4S$). The corresponding elements of $\gamma_N$ and $\Gamma_N$ are defined as follows:
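$$\gamma_{4(s-1)+4}=\frac{1}{N}E(\bar u_s'Q_1u)=\frac{1}{N}E(u'\bar A_4^su),\quad\text{where }\bar A_4^s=A_4^s,\qquad(17d)$$

$$\gamma_{4(s-1)+4,m}=\frac{1}{N}E(\bar u_{sm}'Q_1u+\bar u_s'Q_1\bar u_m)=\frac{1}{N}E[u'(I_T\otimes M_m')(\bar A_4^s+\bar A_4^{s\prime})u],$$

$$\gamma_{4(s-1)+4,S+m}=-\frac{1}{N}E(\bar u_{sm}'Q_1\bar u_m)=-\frac{1}{N}E[u'(I_T\otimes M_m')\bar A_4^s(I_T\otimes M_m)u],$$

$$\gamma_{4(s-1)+4,2S+(m-1)S-m(m-1)/2+(l-m)}=-\frac{1}{N}E(\bar u_{sm}'Q_1\bar u_l+\bar u_{sl}'Q_1\bar u_m)=-\frac{1}{N}E[u'(I_T\otimes M_l')(\bar A_4^s+\bar A_4^{s\prime})(I_T\otimes M_m)u],$$

$$\gamma_{4(s-1)+4,2S+S(S-1)/2+1}=0.$$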
This completes the specification of the elements of $\gamma_N$ and $\Gamma_N$. The similarity between the structure of the expressions resulting from moment conditions $M_1^{s,s'}$ and $M_2^{s}$ on the one hand and $M_3^{s,s'}$ and $M_4^{s}$ on the other hand is apparent. Apart from a slight discrepancy in the definition of the element corresponding to $\sigma^2_{\mu}$ between $M_1^{s,s'}$ and $M_3^{s,s'}$, the other elements differ only by the normalization factor and the corresponding matrix of quadratic forms, $Q_{0,N}$ and $Q_{1,N}$, respectively.
2. Definition of GM Estimator

It is a well-known result from the literature on generalized method of moments estimation that, for the weighting matrix $\tilde\Theta_N$ in (14), it is optimal to use the inverse of the (properly normalized) variance-covariance matrix of the sample moments, evaluated at the true parameter values. Denote the optimal weighting matrix, which will be derived in Subsection 3.2, by $\Psi_N^{-1}$ and its estimate by $\tilde\Psi_N^{-1}$. The optimally weighted GM estimator uses $\tilde\Theta_N=\tilde\Psi_N^{-1}$ and is defined as

$$(\tilde\rho_{1,N},\dots,\tilde\rho_{S,N},\tilde\sigma^2_{\mu,N})=\arg\min\big\{\upsilon_N(\theta_N)'\tilde\Theta_N\upsilon_N(\theta_N):\ \rho_{s,N}\in[-a,a],\ s=1,\dots,S,\ \sigma^2_{\mu,N}\in[0,b]\big\},$$

$$\text{with}\quad\upsilon_N(\theta_N)=\upsilon_N(\rho_N,\sigma^2_{\mu,N})=\tilde\gamma_N-\tilde\Gamma_Nb_N.\qquad(18)$$
In a first step, we assume, as in Kapoor et al. (2007), that $\mu_{i,N}$ and $v_{it,N}$ are normally distributed when deriving the optimal weighting matrix $\Psi_N^{-1}$. In the Appendix, $\Psi_N^{-1}$ is derived without distributional assumptions (apart from those in Assumption 1). It is worth emphasizing that the use of estimated disturbances together with the presence of endogenous variables in (1a) introduces a difference between the optimal weighting matrix $\Psi_N^{-1}$ and the inverse of the variance-covariance matrix of the sample moments. Under fixed effects, this is true even if there are no endogenous variables in the main equation (1a). This will become apparent in Subsection 3.2, where the optimal weighting matrix $\Psi_N^{-1}$ and an estimate $\tilde\Psi_N^{-1}$ are derived.
3. Asymptotic Properties of the GM Estimator for $\theta_N$

3.1 Consistency

In order to prove consistency of the estimator $\tilde\theta_N$, the following additional assumptions are introduced:

Assumption 4.
Assume that $\tilde u_N-u_N=D_N\Delta_N$, i.e., $\tilde u_{n,N}-u_{n,N}=d_{n.,N}\Delta_N$ for $n=1,\dots,NT$, where $D_N$ is an $NT\times P$ matrix, the $1\times P$ vector $d_{n.,N}$ denotes the $n$-th row of $D_N$, and $\Delta_N$ is a $P\times1$ vector. Let $d_{nj,N}$ be the $j$-th element of $d_{n.,N}$. We assume that $E|d_{nj,N}|^{2+\eta}\le c_d$ for some $\eta>0$, where $c_d$ does not depend on $N$, and that $N^{1/2}\|\Delta_N\|=O_p(1)$.
Assumption 4 will be fulfilled in many settings, e.g., if model (1a) contains endogenous variables (such as spatial lags of $y_N$) and is estimated by fixed or random effects two-stage least squares. In that case, $\Delta_N$ denotes the difference between the parameter estimates and the true parameter values, and $d_{n.,N}$ is the (negative of the) $n$-th row of the design matrix $Z_N$ under random effects, or of the within-transformed design matrix $\underline Z_N=Q_{0,N}Z_N$ under fixed effects (see subsection 2 of Section IV). Under certain conditions, Assumption 4 will also be satisfied if model (1a) involves a nonlinear specification (see Kelejian and Prucha, 2010). Finally, Assumption 4 implies that $(NT)^{-1}\sum_{n=1}^{NT}\|d_{n.,N}\|^2$ is $O_p(1)$.
Assumption 5.
(a) The smallest eigenvalues of $\Gamma_N'\Gamma_N$ are bounded uniformly away from zero, i.e., $\lambda_{\min}(\Gamma_N'\Gamma_N)\ge\lambda_*>0$. (b) $\tilde\Theta_N-\Theta_N=o_p(1)$, where the $\Theta_N$ are $[4S+S(S-1)]\times[4S+S(S-1)]$ non-stochastic, symmetric, positive definite matrices. (c) The largest eigenvalues of $\Theta_N$ are bounded uniformly from above, i.e., $\lambda_{\max}(\Theta_N)\le\lambda^*_\Theta<\infty$, and the smallest eigenvalues of $\Theta_N$ are bounded uniformly away from zero, i.e., $\lambda_{\min}(\Theta_N)\ge\lambda_{*\Theta}>0$.

Assumption 5 implies that the smallest eigenvalues of $\Gamma_N'\Theta_N\Gamma_N$ are bounded uniformly away from zero, ensuring that the true parameter vector $\theta_N$ is identifiably unique. Moreover, by the equivalence of matrix norms, it follows from Assumption 5 that $\Theta_N$ and $\Theta_N^{-1}$ are $O(1)$.
Assumptions 1-5 ensure consistency of the GM estimators for $\theta_N=(\rho_N',\sigma^2_{\mu,N})'$. We summarize these results in the following theorem, which is proven in Appendix B.

Theorem 1. Consistency of Weighted GM Estimator $\tilde\theta_N$
Suppose Assumptions 1-5 hold. Then, provided the optimization space contains the parameter space, the weighted GM estimators $\tilde\theta_N(\tilde\Theta_N)=[\tilde\rho_{1,N}(\tilde\Theta_N),\dots,\tilde\rho_{S,N}(\tilde\Theta_N),\tilde\sigma^2_{\mu,N}(\tilde\Theta_N)]'$ defined by (18) are consistent for $\rho_{1,N},\dots,\rho_{S,N}$ and $\sigma^2_{\mu,N}$, i.e.,

$$\tilde\rho_{s,N}(\tilde\Theta_N)-\rho_{s,N}\xrightarrow{p}0,\quad s=1,\dots,S,\qquad\text{and}\qquad\tilde\sigma^2_{\mu,N}(\tilde\Theta_N)-\sigma^2_{\mu,N}\xrightarrow{p}0\quad\text{as }N\to\infty.$$

This result holds for an arbitrary weighting matrix that satisfies Assumption 5. Hence, it applies both to the optimally weighted GM estimator defined by (18) with $\tilde\Theta_N=\tilde\Psi_N^{-1}$ and to the initial unweighted GM estimator with $\tilde\Theta_N=I$.
3.2 Asymptotic Distribution of GM Estimator for $\theta_N$

In the following, we consider the asymptotic distribution of the optimally weighted GM estimator $\tilde\theta_N$. To establish asymptotic normality of $\tilde\theta_N=(\tilde\rho_N',\tilde\sigma^2_{\mu,N})'$, we need some additional assumptions.

Assumption 6.
Let $D_N$ be defined as in Assumption 4, such that $\tilde u_N-u_N=D_N\Delta_N$. For any real $NT\times NT$ matrix $A_N$ whose row and column sums are bounded uniformly in absolute value, it holds that $N^{-1}D_N'A_Nu_N-N^{-1}E(D_N'A_Nu_N)=o_p(1)$.

A sufficient condition for Assumption 6 is, e.g., that the columns of $D_N$ are of the form $\pi_N+\Pi_N\varepsilon_N$, where the elements of $\pi_N$ are bounded uniformly in absolute value and the row and column sums of $\Pi_N$ are bounded uniformly in absolute value (see Remark A.1 in the Appendix). This will be the case in many applications, e.g., for model (1a), when $D_N$ equals (the negative of) the design matrix $Z_N$ or the within-transformed design matrix $\underline Z_N$ (compare subsection 2 of Section IV).
Assumption 7.
Let $\Delta_N$ be defined as in Assumption 4. Then,

$$(NT)^{1/2}\Delta_N=(NT)^{-1/2}T_N'\xi_N+o_p(1),\quad\text{with}\quad T_N=(T_{v,N}',T_{\mu,N}')',\quad\xi_N=(v_N',\mu_N')',$$

i.e.,

$$(NT)^{1/2}\Delta_N=(NT)^{-1/2}T_{v,N}'v_N+(NT)^{-1/2}T_{\mu,N}'\mu_N+o_p(1),$$

where $T_N$ is an $(NT+N)\times P$-dimensional real non-stochastic matrix whose elements are bounded uniformly in absolute value; its submatrices $T_{v,N}$ and $T_{\mu,N}$ are of dimension $NT\times P$ and $N\times P$, respectively. As remarked above, $\Delta_N$ typically denotes the difference between the parameter estimates and the true parameter values. Assumption 7 is kept general and will be satisfied by many estimators, which differ in the definition of $T_N$. In Section IV, we verify that it holds if the model in equation (1a) is estimated by (random or fixed effects) two-stage least squares (TSLS) or feasible spatial generalized TSLS.
In Appendix B, the limiting distribution of the GM estimator of $\theta_N$ is shown to depend on (the inverse of) the matrix $J_N'\Theta_NJ_N$ and on the variance-covariance matrix of a vector of quadratic forms in $v_N$ and $\mu_N$, denoted as $q_N$. We consider each of these expressions in the following.
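The $[4S+S(S-1)]\times(S+1)$ matrix $J_N$ of derivatives of the $[4S+S(S-1)]\times1$ vector of moment conditions in (11) is given by

$$J_N=\frac{\partial[\gamma_N-\Gamma_Nb_N(\theta_N)]}{\partial\theta_N'},\quad\text{with $i$-th row}\quad(j_{i1,N},\dots,j_{iS,N},j_{i,S+1,N}),\qquad(19a)$$

$$j_{is,N}=\frac{\partial(\gamma_{i,N}-\Gamma_{i.,N}b_N)}{\partial\rho_{s,N}},\quad i=1,\dots,4S+S(S-1),\ s=1,\dots,S,$$

$$j_{i,S+1,N}=\frac{\partial(\gamma_{i,N}-\Gamma_{i.,N}b_N)}{\partial\sigma^2_{\mu,N}},\quad i=1,\dots,4S+S(S-1),$$

where $\gamma_{i,N}$ and $\Gamma_{i.,N}$ denote the $i$-th element of $\gamma_N$ and the $i$-th row of $\Gamma_N$, respectively.

Using $\partial\gamma_N/\partial\theta_N'=0$ and ignoring the negative sign, we have

$$J_N=\Gamma_N\frac{\partial b_N(\theta_N)}{\partial\theta_N'}=\Gamma_NB_N,\qquad(19b)$$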
where $\Gamma_N$ is defined above and is of dimension $[4S+S(S-1)]\times[2S+S(S-1)/2+1]$, and $B_N$ is a $[2S+S(S-1)/2+1]\times(S+1)$ matrix of the form

$$B_N=(B_1',B_{2,N}',B_{3,N}',B_{4,N}')',\qquad(20a)$$

with $B_1=(I_S,0_{S\times1})$ and $B_{2,N}=[\operatorname{diag}_{s=1}^S(2\rho_{s,N}),0_{S\times1}]$. The $[S(S-1)/2]\times(S+1)$ matrix $B_{3,N}=[(B_{3,1,N}',\dots,B_{3,S-1,N}')',0_{S(S-1)/2\times1}]$ consists of $(S-1)$ vertically arranged blocks $B_{3,m,N}$, $m=1,\dots,S-1$, which have the following structure:

$$B_{3,m,N}=(C_{m,N},d_{m,N},E_{m,N}),\qquad(20b)$$

where $C_{m,N}$ is an $(S-m)\times(m-1)$ matrix of zeros,¹² $d_{m,N}$ is an $(S-m)\times1$ vector, defined as $d_{m,N}=(\rho_{m+1,N},\dots,\rho_{S,N})'$, and $E_{m,N}=\rho_{m,N}I_{S-m}$. Finally, $B_{4,N}$ is a $1\times(S+1)$ vector, defined as

$$B_{4,N}=(0_{1\times S},1).\qquad(20c)$$

¹² I.e., there is no block $C_{1,N}$ in $B_{3,1,N}$.
For later reference, note that $B_N$ has full column rank $S+1$; as a consequence, the $(S+1)\times(S+1)$ matrix $B_N'B_N$ is positive definite (see, e.g., Greene, 2003, p. 835).
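The structure of $B_N$ in (20a) to (20c) can be verified against a finite-difference Jacobian of $b_N(\theta_N)$. The sketch below is purely illustrative. It uses NumPy, hypothetical function names, $S=3$, and arbitrary parameter values, and it also confirms that $B_N$ has full column rank $S+1$.

```python
import numpy as np


def b_vector(theta):
    rho, s2 = theta[:-1], theta[-1]
    S = rho.size
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return np.concatenate([rho, rho**2, np.array(cross, dtype=float), [s2]])


def B_matrix(theta):
    """B_N = (B_1', B_2N', B_3N', B_4N')' as in (20a)-(20c)."""
    rho = theta[:-1]
    S = rho.size
    B1 = np.hstack([np.eye(S), np.zeros((S, 1))])
    B2 = np.hstack([np.diag(2.0 * rho), np.zeros((S, 1))])
    blocks = []
    for m in range(1, S):                        # m = 1, ..., S-1 as in the text
        C = np.zeros((S - m, m - 1))             # C_{m,N}: zeros
        d = rho[m:].reshape(-1, 1)               # d_{m,N} = (rho_{m+1}, ..., rho_S)'
        E = rho[m - 1] * np.eye(S - m)           # E_{m,N} = rho_m I_{S-m}
        blocks.append(np.hstack([C, d, E]))
    B3 = np.vstack(blocks) if blocks else np.zeros((0, S))
    B3 = np.hstack([B3, np.zeros((B3.shape[0], 1))])
    B4 = np.zeros((1, S + 1))
    B4[0, S] = 1.0
    return np.vstack([B1, B2, B3, B4])


theta = np.array([0.3, -0.4, 0.6, 1.1])          # (rho_1, rho_2, rho_3, sigma^2_mu)
h = 1e-6
fd = np.column_stack([(b_vector(theta + h * e) - b_vector(theta - h * e)) / (2 * h)
                      for e in np.eye(theta.size)])
print(np.allclose(B_matrix(theta), fd, atol=1e-8), np.linalg.matrix_rank(B_matrix(theta)))
```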
We next consider the vector $q_N$ and its limiting distribution. First, define $q_N(\theta_N,\Delta_N)$ as the $[4S+S(S-1)]\times1$ vector of sample moments with the expectation operator suppressed, evaluated at the true parameter values, and ignoring the deterministic constants. It is made up of the following quadratic forms in $\tilde u_N$:

$$q^{s,s'}_{c,N}(\theta_N,\Delta_N)=N^{-1}\tilde u_N'C^{s,s'}_{c,N}\tilde u_N\quad\text{for }c=1,\dots,4\text{ and }s,s'=1,\dots,S.\qquad(21)$$

Hence, each element of this vector corresponds to a particular moment condition, indexed by $c$, each of which is associated with a particular weights matrix $M_{s,N}$ through (12b) and (12d) for moment conditions $M_2^{s}$ and $M_4^{s}$, or through (12a) and (12c) with a pair of weights matrices $M_{s,N}$ and $M_{s',N}$ for moment conditions $M_1^{s,s'}$ and $M_3^{s,s'}$. The arrangement of the elements is the same as in equation system (13).
In light of (12), the matrices $C^{s,s'}_{c,N}$, $c=1,\dots,4$, and $s,s'=1,\dots,S$, are defined as follows:

$$C^{s,s'}_{1,N}=\frac{1}{2(T-1)}R_N'(A^{s,s'}_{1,N}+A^{s,s'\prime}_{1,N})R_N,\qquad C^{s}_{2,N}=\frac{1}{2(T-1)}R_N'(A^{s}_{2,N}+A^{s\prime}_{2,N})R_N,$$

$$C^{s,s'}_{3,N}=\frac12R_N'(A^{s,s'}_{3,N}+A^{s,s'\prime}_{3,N})R_N,\qquad C^{s}_{4,N}=\frac12R_N'(A^{s}_{4,N}+A^{s\prime}_{4,N})R_N,\qquad(22)$$

where we have used the definition $R_N=I_T\otimes\big(I_N-\sum_{m=1}^S\rho_{m,N}M_{m,N}\big)$.
By Assumption 3 and Remark A.1 in Appendix A, the row and column sums of the symmetric $NT\times NT$ matrices $C^{s,s'}_{c,N}$, $c=1,\dots,4$, and $s,s'=1,\dots,S$, are bounded uniformly in absolute value. Using equation (21) and invoking Lemma B.1 (see Appendix B), the elements of $N^{1/2}q_N(\theta_N,\Delta_N)$ can be expressed as

$$N^{-1/2}\tilde u_N'C^{s,s'}_{c,N}\tilde u_N=N^{-1/2}u_N'C^{s,s'}_{c,N}u_N+\alpha^{s,s'\prime}_{c,N}N^{1/2}\Delta_N+o_p(1),\qquad(23)$$

with $\alpha^{s,s'}_{c,N}=N^{-1}E[D_N'(C^{s,s'}_{c,N}+C^{s,s'\prime}_{c,N})u_N]=2N^{-1}E(D_N'C^{s,s'}_{c,N}u_N)$, since $C^{s,s'}_{c,N}$ is symmetric. By Lemma B.1, the elements of the $P\times1$ vectors $\alpha^{s,s'}_{c,N}$, $c=1,\dots,4$, and $s,s'=1,\dots,S$, are bounded uniformly in absolute value. As evident from (23), $\alpha^{s,s'}_{c,N}=0$ when $E(D_N'C^{s,s'}_{c,N}u_N)=0$. This is the case under random effects estimation if there are no endogenous variables, since $D_N=-Z_N$ is then non-stochastic and $E(u_N)=0$.
Note that $Q_{0,N}\varepsilon_N=Q_{0,N}v_N$ and that, for symmetric $N\times N$ matrices $A_N$, we have $\varepsilon_N'Q_{1,N}(I_T\otimes A_N)Q_{1,N}\varepsilon_N=v_N'Q_{1,N}(I_T\otimes A_N)Q_{1,N}v_N+T\mu_N'A_N\mu_N+2v_N'(e_T\otimes A_N)\mu_N$. Using (22), (23), and Assumption 7, we can rewrite the vector of sample moments as

$$N^{1/2}q_N(\theta_N,\Delta_N)=N^{-1/2}q^*_N+o_p(1)=q_N+o_p(1),\qquad(24)$$

where each element of the $[4S+S(S-1)]\times1$ vector $q^*_N=(q^{*s,s'}_{c,N})$ can be written as a linear quadratic form of the $(NT+N)\times1$ vector $\xi_N=(v_N',\mu_N')'$:

$$q^{*s,s'}_{c,N}=[\xi_N'A^{s,s'}_{c,N}\xi_N+a^{s,s'\prime}_{c,N}\xi_N]+o_p(1)=[\xi_N'A^{s,s'}_{c,N}\xi_N+(a^{s,s'\prime}_{c,v,N}v_N+a^{s,s'\prime}_{c,\mu,N}\mu_N)]+o_p(1),\qquad(25)$$
where

$$A^{s,s'}_{c,N}=\begin{pmatrix}A^{s,s'}_{c,v,N}&A^{s,s'}_{c,v\mu,N}\\A^{s,s'\prime}_{c,v\mu,N}&A^{s,s'}_{c,\mu,N}\end{pmatrix},\qquad a^{s,s'}_{c,N}=T^{-1}T_N\alpha^{s,s'}_{c,N},\quad c=1,\dots,4,\ s,s'=1,\dots,S,$$

or

$$a^{s,s'}_{c,N}=(a^{s,s'\prime}_{c,v,N},a^{s,s'\prime}_{c,\mu,N})'=[(T^{-1}T_{v,N}\alpha^{s,s'}_{c,N})',(T^{-1}T_{\mu,N}\alpha^{s,s'}_{c,N})']',\quad c=1,\dots,4,\ s,s'=1,\dots,S.$$

Observe that the elements of $a^{s,s'}_{c,N}$, $c=1,\dots,4$, and $s,s'=1,\dots,S$, are bounded uniformly in absolute value by Assumption 7 and Lemma B.1. The matrices $A^{s,s'}_{c,N}$, $A^{s,s'}_{c,v,N}$, $A^{s,s'}_{c,v\mu,N}$, and $A^{s,s'}_{c,\mu,N}$ (all of them symmetric except the rectangular block $A^{s,s'}_{c,v\mu,N}$) are of dimension $(NT+N)\times(NT+N)$, $NT\times NT$, $NT\times N$, and $N\times N$, respectively, and are defined as follows.
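For moment condition $M_1^{s,s'}$, we have

$$A^{s,s'}_{1,v,N}=\frac{1}{2(T-1)}(A^{s,s'}_{1,N}+A^{s,s'\prime}_{1,N}),\quad A^{s,s'}_{1,v\mu,N}=0_{NT\times N},\quad\text{and}\quad A^{s,s'}_{1,\mu,N}=0_{N\times N}.\qquad(26a)$$

For moment condition $M_2^{s}$, we have

$$A^{s}_{2,v,N}=\frac{1}{2(T-1)}(A^{s}_{2,N}+A^{s\prime}_{2,N}),\quad A^{s}_{2,v\mu,N}=0_{NT\times N},\quad\text{and}\quad A^{s}_{2,\mu,N}=0_{N\times N}.\qquad(26b)$$

For moment condition $M_3^{s,s'}$, we have

$$A^{s,s'}_{3,v,N}=\frac12(A^{s,s'}_{3,N}+A^{s,s'\prime}_{3,N}),\quad A^{s,s'}_{3,v\mu,N}=\frac12[e_T\otimes(M'_{s,N}M_{s',N}+M'_{s',N}M_{s,N})],\quad\text{and}\quad A^{s,s'}_{3,\mu,N}=\frac T2(M'_{s,N}M_{s',N}+M'_{s',N}M_{s,N}).\qquad(26c)$$

For moment condition $M_4^{s}$, we have

$$A^{s}_{4,v,N}=\frac12(A^{s}_{4,N}+A^{s\prime}_{4,N}),\quad A^{s}_{4,v\mu,N}=\frac12[e_T\otimes(M_{s,N}+M'_{s,N})],\quad\text{and}\quad A^{s}_{4,\mu,N}=\frac T2(M_{s,N}+M'_{s,N}).\qquad(26d)$$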
Note that the row and column sums of the matrices $A^{s,s'}_{c,N}$, $A^{s,s'}_{c,v,N}$, $A^{s,s'}_{c,v\mu,N}$, and $A^{s,s'}_{c,\mu,N}$ are bounded uniformly in absolute value by Assumption 3 and Remark A.1 in the Appendix. Moreover, the elements of $\xi_N=(v_N',\mu_N')'$ are independently distributed by Assumption 1, and the variance-covariance matrix of $\xi_N$ is

$$\Omega_{\xi,N}=\begin{pmatrix}\Sigma_N&0_{NT\times N}\\0_{N\times NT}&\sigma^2_{\mu,N}I_N\end{pmatrix}.\qquad(27)$$
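The two identities used before (24), $Q_{0,N}\varepsilon_N=Q_{0,N}v_N$ and the decomposition of $\varepsilon_N'Q_{1,N}(I_T\otimes A_N)Q_{1,N}\varepsilon_N$ for symmetric $A_N$, are easy to confirm numerically. The minimal check below assumes observations stacked by period, so that $\varepsilon_N=(e_T\otimes I_N)\mu_N+v_N$, $Q_{0,N}=(I_T-J_T/T)\otimes I_N$, and $Q_{1,N}=(J_T/T)\otimes I_N$, where $J_T$ is the $T\times T$ matrix of ones. The random draws are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 3
A = rng.normal(size=(N, N))
A = A + A.T                                         # symmetric N x N matrix
e_T = np.ones((T, 1))
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))
Q1 = np.kron(np.ones((T, T)) / T, np.eye(N))
mu = rng.normal(size=N)
v = rng.normal(size=N * T)
eps = np.kron(e_T, np.eye(N)) @ mu + v              # eps = (e_T kron I_N) mu + v
IA = np.kron(np.eye(T), A)

lhs = eps @ Q1 @ IA @ Q1 @ eps
rhs = v @ Q1 @ IA @ Q1 @ v + T * mu @ A @ mu + 2 * v @ np.kron(e_T, A) @ mu
print(np.isclose(lhs, rhs), np.allclose(Q0 @ eps, Q0 @ v))
```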
In order to calculate the variance-covariance matrix of $q_N$, given by the $[4S+S(S-1)]\times[4S+S(S-1)]$ matrix $\Psi_N=N^{-1}E(q^*_Nq^{*\prime}_N)$, we invoke Lemma A.1 in Kelejian and Prucha (2010). For the moment, assume that the error components $\mu_N$ and $v_N$ are normally distributed.¹³ The distribution of the GM estimates without distributional assumptions (apart from Assumption 1) is considered in the Appendix. Under normality, the covariance between two elements of the vector $q_N$ is given by:

$$\begin{aligned}
E^{s,s';t,t'}_{c,c',N}&=N^{-1}\operatorname{Cov}(q^{*s,s'}_{c,N},q^{*t,t'}_{c',N})\\
&=N^{-1}\operatorname{Cov}\big(\xi_N'A^{s,s'}_{c,N}\xi_N+a^{s,s'\prime}_{c,N}\xi_N,\ \xi_N'A^{t,t'}_{c',N}\xi_N+a^{t,t'\prime}_{c',N}\xi_N\big)\\
&=N^{-1}\operatorname{Cov}\big[\xi_N'A^{s,s'}_{c,N}\xi_N+(a^{s,s'\prime}_{c,v,N}v_N+a^{s,s'\prime}_{c,\mu,N}\mu_N),\ \xi_N'A^{t,t'}_{c',N}\xi_N+(a^{t,t'\prime}_{c',v,N}v_N+a^{t,t'\prime}_{c',\mu,N}\mu_N)\big]\\
&=2N^{-1}\operatorname{Tr}\big(A^{s,s'}_{c,v,N}\Sigma_NA^{t,t'}_{c',v,N}\Sigma_N+2\sigma^2_{\mu,N}A^{s,s'}_{c,v\mu,N}A^{t,t'\prime}_{c',v\mu,N}\Sigma_N+\sigma^4_{\mu,N}A^{s,s'}_{c,\mu,N}A^{t,t'}_{c',\mu,N}\big)\\
&\quad+N^{-1}\big(a^{s,s'\prime}_{c,v,N}\Sigma_Na^{t,t'}_{c',v,N}+\sigma^2_{\mu,N}a^{s,s'\prime}_{c,\mu,N}a^{t,t'}_{c',\mu,N}\big),
\end{aligned}\qquad(28a)$$

¹³ In that case, in Assumption 1, the requirement of finite higher-order moments of the error components can be relaxed to the requirement of finite variances.
with $c,c'=1,\dots,4$, with $s,t=1,\dots,S$ for $s=s'$ and $t=t'$, and with $s=1,\dots,S-1$, $s'>s$ (analogously for $t,t'$) otherwise. Note that each combination of indices $c,s,s'$ (and also $c',t,t'$) is associated with a particular row of $q_N$. Hence, $E^{s,s';t,t'}_{c,c',N}$ is the covariance between the element of $q_N$ associated with moment condition $M_c^{s,s'}$ and the element of $q_N$ associated with moment condition $M_{c'}^{t,t'}$. (For the second and fourth moment conditions we always have $s=s'$ and $t=t'$.)
In equation (28), $a^{s,s'}_{c,v,nn,N}$ and $a^{s,s'}_{c,\mu,ii,N}$ denote the $n$-th and $i$-th main diagonal elements of the matrices $A^{s,s'}_{c,v,N}$ and $A^{s,s'}_{c,\mu,N}$, respectively, and $a^{s,s'}_{c,v,n,N}$ and $a^{s,s'}_{c,\mu,i,N}$ denote the $n$-th and $i$-th elements of the vectors $a^{s,s'}_{c,v,N}$ and $a^{s,s'}_{c,\mu,N}$, respectively.
The arrangement of the elements $(\Psi_N)_{ij}$, $i,j=1,\dots,4S+S(S-1)$, is straightforward and follows naturally from the ordering of the elements in the vector $q_N$, though it is notationally burdensome to state in the general case. The expression in (28) holds generally. Some of the elements of $\Psi_N$ can be stated in simpler terms: in particular, the submatrices $A^{s,s'}_{c,\mu,N}$ are zero for $c=1$ and $c=2$, such that $E^{*}_{\mu,N}$ drops out for the respective elements. If both submatrices associated with $\mu_N$ are zero ($c=1$ or $c=2$, and $c'=1$ or $c'=2$), $E^{**}_{\mu,N}$ drops out as well. Under fixed effects estimation, the terms $E^{**}_{\mu,N}$ (the expressions involving $a^{s,s'}_{c,\mu,N}$) are equal to zero. Finally, since the main diagonal elements of the matrices $A^{s}_{2,N}$ and $A^{s}_{4,N}$ are zero, the term $E^{*}_{v,N}$ does not show up for elements where $c=2$ or $c=4$ (or where $c'=2$ or $c'=4$).
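Under normality, (28a) is the block form of the standard result $\operatorname{Cov}(\xi'A\xi+a'\xi,\ \xi'B\xi+b'\xi)=2\operatorname{tr}(A\Omega B\Omega)+a'\Omega b$ for $\xi\sim N(0,\Omega)$ and symmetric $A$ and $B$, with $\Omega=\Omega_{\xi,N}$ from (27). The sketch below uses NumPy, hypothetical helper names, and random illustrative blocks, and it omits the factor $N^{-1}$. It checks that the blockwise expression matches the full-matrix formula.

```python
import numpy as np


def cov_lq_full(A, a, B, b, Omega):
    """Cov(xi'A xi + a'xi, xi'B xi + b'xi) for xi ~ N(0, Omega), A and B symmetric."""
    return 2.0 * np.trace(A @ Omega @ B @ Omega) + a @ Omega @ b


def cov_lq_blocks(Av, Avm, Am, av, am, Bv, Bvm, Bm, bv, bm, Sigma, s2):
    """Same covariance written blockwise as in (28a), without the factor 1/N."""
    quad = (np.trace(Av @ Sigma @ Bv @ Sigma)
            + 2.0 * s2 * np.trace(Avm @ Bvm.T @ Sigma)
            + s2**2 * np.trace(Am @ Bm))
    return 2.0 * quad + av @ Sigma @ bv + s2 * am @ bm


def sym(X):
    return 0.5 * (X + X.T)


rng = np.random.default_rng(3)
N, T = 3, 2
NT = N * T
Sigma = np.diag(rng.uniform(0.5, 2.0, size=NT))    # heteroskedastic Sigma_N
s2 = 0.7                                           # sigma^2_mu
Av, Bv = sym(rng.normal(size=(NT, NT))), sym(rng.normal(size=(NT, NT)))
Am, Bm = sym(rng.normal(size=(N, N))), sym(rng.normal(size=(N, N)))
Avm, Bvm = rng.normal(size=(NT, N)), rng.normal(size=(NT, N))
av, bv = rng.normal(size=NT), rng.normal(size=NT)
am, bm = rng.normal(size=N), rng.normal(size=N)

A = np.block([[Av, Avm], [Avm.T, Am]])
B = np.block([[Bv, Bvm], [Bvm.T, Bm]])
Omega = np.block([[Sigma, np.zeros((NT, N))], [np.zeros((N, NT)), s2 * np.eye(N)]])
full = cov_lq_full(A, np.r_[av, am], B, np.r_[bv, bm], Omega)
blocks = cov_lq_blocks(Av, Avm, Am, av, am, Bv, Bvm, Bm, bv, bm, Sigma, s2)
print(np.isclose(full, blocks))
```

The non-normal case adds fourth-moment terms involving the main diagonal elements of the $A$ blocks; these are not covered by this sketch.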
To derive the asymptotic distribution of $q_N$ and $\tilde\theta_N$, we invoke the central limit theorem for vectors of linear quadratic forms given by Kelejian and Prucha (2010, Theorem A.1) and Corollary F4 in Pötscher and Prucha (1997). We summarize the results regarding the asymptotic distribution of $\tilde\theta_N$ in the following theorem, which is proven in Appendix B.

Theorem 2. Asymptotic Normality of $\tilde\theta_N$
Let $\tilde\theta_N$ be the GM estimator defined by (18). Suppose Assumptions 1-7 hold and, furthermore, that $\lambda_{\min}(\Psi_N)\ge c^*_\Psi>0$. Then, provided the optimization space contains the parameter space, we have

$$N^{1/2}(\tilde\theta_N-\theta_N)=(J_N'\Theta_NJ_N)^{-1}J_N'\Theta_N\Psi_N^{1/2}\xi_{q,N}+o_p(1),\quad\text{with}\quad J_N=\Gamma_N\frac{\partial b_N}{\partial\theta_N'}=\Gamma_NB_N,$$

and

$$\xi_{q,N}=\Psi_N^{-1/2}q_N\xrightarrow{d}N(0,I_{4S+S(S-1)}),$$

where $\Psi_N=E(q_Nq_N')$ and $\Psi_N^{1/2}(\Psi_N^{1/2})'=\Psi_N$. Furthermore, $N^{1/2}(\tilde\theta_N-\theta_N)=O_p(1)$ and

$$\Omega_{\tilde\theta_N}=(J_N'\Theta_NJ_N)^{-1}J_N'\Theta_N\Psi_N\Theta_NJ_N(J_N'\Theta_NJ_N)^{-1},$$

where $\Omega_{\tilde\theta_N}$ is positive definite.
Theorem 2 implies that the difference between the cumulative distribution function of $N^{1/2}(\tilde\theta_N-\theta_N)$ and that of $N(0,\Omega_{\tilde\theta_N})$ converges pointwise to zero, which justifies the use of the latter as an approximation of the former.¹⁴ Theorem 2 holds under both normality and non-normality of the error components, the difference being only the definition of the elements of $\Psi_N$ (and the requirement regarding the finiteness of the moments of the error components in Assumption 1).

Note that $\Omega_{\tilde\theta_N}(\Psi_N^{-1})=(J_N'\Psi_N^{-1}J_N)^{-1}$ and that $\Omega_{\tilde\theta_N}(\Theta_N)-\Omega_{\tilde\theta_N}(\Psi_N^{-1})$ is positive semidefinite. Thus, using a consistent estimator of $\Psi_N^{-1}$ (which will be derived below) as weighting matrix $\Theta_N$ leads to the efficient GM estimator. We add that $\Psi_N$ is not exactly equal to the variance-covariance matrix of the moments if there is an endogenous right-hand-side variable in equation (1), since the GM estimates are based on estimated rather than true disturbances (see also the discussion surrounding equation (23)).
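The efficiency statement can be illustrated numerically. This sketch uses NumPy, and $J$ and $\Psi$ are random stand-ins rather than quantities from the model. It computes the sandwich matrix of Theorem 2 and checks two things: that $\Theta_N=\Psi_N^{-1}$ gives $(J_N'\Psi_N^{-1}J_N)^{-1}$, and that another weighting (here $\Theta_N=I$) yields a positive semidefinite excess.

```python
import numpy as np


def gm_avar(J, Theta, Psi):
    """Omega = (J'Theta J)^{-1} J'Theta Psi Theta J (J'Theta J)^{-1}  (Theorem 2)."""
    bread = np.linalg.inv(J.T @ Theta @ J)
    return bread @ J.T @ Theta @ Psi @ Theta @ J @ bread


rng = np.random.default_rng(4)
k, p = 10, 3                    # 4S + S(S-1) moments and S + 1 parameters for S = 2
J = rng.normal(size=(k, p))
L = rng.normal(size=(k, k))
Psi = L @ L.T + k * np.eye(k)   # positive definite stand-in for Psi_N
Psi_inv = np.linalg.inv(Psi)

omega_opt = gm_avar(J, Psi_inv, Psi)
omega_eye = gm_avar(J, np.eye(k), Psi)
print(np.allclose(omega_opt, np.linalg.inv(J.T @ Psi_inv @ J)))
print(np.linalg.eigvalsh(omega_eye - omega_opt).min() >= -1e-10)
```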
IV. Estimation of Regression Parameters $\delta_N$ and Joint Asymptotic Distribution

In the following, we consider estimators for the regression parameters $\delta_N$ in model (1a) and establish their joint asymptotic distribution with the GM estimates $\tilde\theta_N$ derived in Section III.

We first keep the analysis general, which allows us to state our results in a succinct way that nests both random and fixed effects estimation of the original model as well as of the spatial GLS transformed model. We will then be more specific about the properties and the respective expressions for the TSLS and spatial generalized TSLS estimation of model (1a).
¹⁴ Compare Corollary F4 in Pötscher and Prucha (1997).
1. General Statement of Estimator and Joint Asymptotic Distribution

Key to establishing the asymptotic properties of the GM estimates $\tilde\theta_N$, which are based on the estimated disturbances of model (1a), is Assumption 7. It states that the (properly normalized) difference between the estimates and the true parameters ($\Delta_N$) is asymptotically linear in the stacked vector of error terms, i.e., $(NT)^{1/2}\Delta_N=(NT)^{-1/2}T_N'\xi_N+o_p(1)$.
For all estimators of $\delta_N$ in model (1a) considered in the present paper, the matrix $T_N$ has the following structure:

$$T_N=F_NP_N\quad\text{with}\quad F_N=(F_{v,N}',F_{\mu,N}')',\qquad(29a)$$

which can also be written as

$$T_N=(T_{v,N}',T_{\mu,N}')'\quad\text{with}\quad T_{v,N}=F_{v,N}P_N,\quad T_{\mu,N}=F_{\mu,N}P_N,\qquad(29b)$$

where $F_{v,N}$ is a real non-stochastic $NT\times P^*$ matrix, $F_{\mu,N}$ is a real non-stochastic $N\times P^*$ matrix, and $P_N$ is a real non-stochastic $P^*\times P$ matrix, with $P$ as in Assumption 7. The definitions of $P_N$, $F_{v,N}$, and $F_{\mu,N}$ will be seen to depend on the estimated model (original versus spatial GLS transformed model) and on the estimation approach (random versus fixed effects). In general, $P_N$ is a function of the original or within-transformed design matrix $Z_N$ and of a real non-stochastic $NT\times P^*$ matrix of instruments $H_N$ (or spatial GLS transformed variants thereof); $F_{v,N}$ and $F_{\mu,N}$ depend on the original or within-transformed instruments $H_N$ (or spatial GLS transformed variants thereof) and, in the untransformed model, on the matrix $[I_T\otimes(I_N-\sum_{m=1}^S\rho_{m,N}M_{m,N})]^{-1}$.
Since both $N^{1/2}(\tilde\theta_N-\theta_N)$ and $(NT)^{1/2}\Delta_N$, and thus also $N^{1/2}\Delta_N$, are asymptotically linear or linear quadratic forms in $\xi_N$, the joint distribution of the vector $[N^{1/2}\Delta_N',N^{1/2}(\tilde\theta_N-\theta_N)']'$ can be derived by invoking the central limit theorem for vectors of linear quadratic forms by Kelejian and Prucha (2010).
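Consider the $[P^*+4S+S(S-1)]\times1$ vector of linear and linear quadratic forms in $\xi_N$:

$$w_N=\begin{pmatrix}(NT)^{-1/2}F_N'\xi_N\\q_N\end{pmatrix}.\qquad(30)$$

Its variance-covariance matrix is of dimension $[P^*+4S+S(S-1)]\times[P^*+4S+S(S-1)]$ and is given by:

$$\Psi_{w,N}=\operatorname{Var}(w_N)=\begin{pmatrix}(NT)^{-1}F_N'E(\xi_N\xi_N')F_N&(NT)^{-1/2}F_N'E(\xi_Nq_N')\\(NT)^{-1/2}E(q_N\xi_N')F_N&E(q_Nq_N')\end{pmatrix}=\begin{pmatrix}\Psi_{\Delta\Delta,N}&\Psi_{\Delta\theta,N}\\\Psi_{\Delta\theta,N}'&\Psi_N\end{pmatrix},\qquad(31)$$

where the $[4S+S(S-1)]\times[4S+S(S-1)]$ matrix $\Psi_N$ is defined above in (28).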
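The $P^*\times P^*$ matrix $\Psi_{\Delta\Delta,N}$ is defined as

$$\Psi_{\Delta\Delta,N}=\Psi^v_{\Delta\Delta,N}+\Psi^\mu_{\Delta\Delta,N},\quad\text{with}\quad\Psi^v_{\Delta\Delta,N}=(NT)^{-1}F_{v,N}'\Sigma_NF_{v,N}\quad\text{and}\quad\Psi^\mu_{\Delta\Delta,N}=(NT)^{-1}\sigma^2_{\mu,N}F_{\mu,N}'F_{\mu,N}.\qquad(32a)$$

The $P^*\times[4S+S(S-1)]$ matrix $\Psi_{\Delta\theta,N}$ is given by

$$\Psi_{\Delta\theta,N}=(NT)^{-1/2}E(F_N'\xi_Nq_N'),\qquad(32b)$$

which is made up of $4S+S(S-1)$ columns of dimension $P^*\times1$, each of them associated with a set of indices $c$, $s$, and $s'$ and thus with a particular moment condition. Under normality of $\mu_N$ and $v_N$, the columns are defined as

$$\psi_{\Delta\theta,.(c,s,s'),N}=\psi^v_{\Delta\theta,.(c,s,s'),N}+\psi^\mu_{\Delta\theta,.(c,s,s'),N},\quad c=1,\dots,4,\ s,s'=1,\dots,S,\qquad(32c)$$

with $\psi^v_{\Delta\theta,.(c,s,s'),N}=N^{-1}T^{-1/2}F_{v,N}'\Sigma_Na^{s,s'}_{c,v,N}$ and $\psi^\mu_{\Delta\theta,.(c,s,s'),N}=N^{-1}T^{-1/2}\sigma^2_{\mu,N}F_{\mu,N}'a^{s,s'}_{c,\mu,N}$.

In Appendix 1.2, $\Psi_{\Delta\theta,N}$ is defined for the general case without distributional assumptions (apart from Assumption 1).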
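Regarding the joint limiting distribution of $N^{1/2}(\tilde\theta_N-\theta_N)$ and $(NT)^{1/2}\Delta_N$, we now have the following result, which is proven in Appendix B.

Theorem 3. Joint Distribution of $\tilde\theta_N$ and Regression Parameters
Suppose that Assumptions 1-7 hold. Moreover, assume that $H_N=O(1)$ (see Assumption 9 below) and that $F_N=O(1)$; the latter assumption will be verified once we have defined the matrix $F_N$ for the particular estimators used. Finally, assume that $\lambda_{\min}(\Psi_{w,N})\ge c^*_{\Psi_w}>0$. Then,

$$\begin{pmatrix}N^{1/2}\Delta_N\\N^{1/2}(\tilde\theta_N-\theta_N)\end{pmatrix}=\begin{pmatrix}T^{-1/2}P_N'&0\\0&(J_N'\Theta_NJ_N)^{-1}J_N'\Theta_N\end{pmatrix}\Psi_{w,N}^{1/2}\xi_{o,N}+o_p(1),$$

with

$$\xi_{o,N}=\Psi_{w,N}^{-1/2}w_N=\Psi_{w,N}^{-1/2}[(NT)^{-1/2}\xi_N'F_N,\ q_N']'\xrightarrow{d}N(0,I_{P^*+4S+S(S-1)}),$$

and

$$\Omega_{w,N}=\begin{pmatrix}T^{-1/2}P_N'&0\\0&(J_N'\Theta_NJ_N)^{-1}J_N'\Theta_N\end{pmatrix}\Psi_{w,N}\begin{pmatrix}T^{-1/2}P_N&0\\0&\Theta_NJ_N(J_N'\Theta_NJ_N)^{-1}\end{pmatrix}.$$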
Theorem 3 implies that the difference between the joint cumulative distribution function of $[N^{1/2}\Delta_N',N^{1/2}(\tilde\theta_N-\theta_N)']'$ and that of $N(0,\Omega_{w,N})$ converges pointwise to zero, which justifies the use of the latter distribution as an approximation of the former.

Remark 2.
Theorem 3 holds under both normality and non-normality of the error components, the difference being the definition of the elements of $\Psi_{w,N}$, in particular those of $\Psi_N$ and $\Psi_{\Delta\theta,N}$. Theorem 3 can also be used to obtain the joint distribution of $N^{1/2}(\tilde\theta_N-\theta_N)$ and some other estimator $N^{1/2}\check\Delta_N$, where $(NT)^{1/2}\check\Delta_N=(NT)^{-1/2}\check T_N'\xi_N+o_p(1)$ and $\check T_N=\check F_N\check P_N$, assuming that analogous assumptions are maintained for this estimator. In particular, the results remain valid, but with $F_N$ and $P_N$ replaced by $\check F_N$ and $\check P_N$ in the definitions of $\Psi_{\Delta\Delta,N}$ as well as $\Psi_{\Delta\theta,N}$.
2. Two-Stage Least Squares (TSLS) and Spatial Generalized TSLS Estimation of $\delta_N$

Obviously, $E(Y_N'u_N)\neq0$ in model (1a). In the following, we consider four TSLS estimators for $\delta_N$. First, depending on whether $\pi=0$ or not in equation (9), we consider random effects or fixed effects estimation. Second, we consider (both fixed and random effects) estimation of the original model (1a) as well as of the spatial generalized LS transformed model, which is obtained by premultiplying model (1a) with the transformation matrix $R_N=I_T\otimes(I_N-\sum_{m=1}^S\rho_{m,N}M_{m,N})$. Regarding notation, we use an underbar to refer to within-transformed variables, e.g., $\underline Z_N=Q_{0,N}Z_N$. Spatial generalized LS transformed variables are indicated by an asterisk, e.g., $Z_N^*=R_NZ_N$. Matrices and vectors that are both within- and spatial GLS transformed are indicated accordingly, e.g., $\underline Z_N^*=Q_{0,N}Z_N^*=Q_{0,N}R_NZ_N$. By the properties of $Q_{0,N}$, an equivalent way of writing this is $\underline Z_N^*=R_N\underline Z_N=R_NQ_{0,N}Z_N$, i.e., the order in which the transformations are performed is immaterial.
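That the within and spatial GLS transformations commute follows from the mixed-product rule $(A\otimes B)(C\otimes D)=AC\otimes BD$, since $Q_{0,N}=(I_T-J_T/T)\otimes I_N$ and $R_N=I_T\otimes(I_N-\sum_m\rho_{m,N}M_{m,N})$. A quick numerical confirmation follows. It assumes period-major stacking and random row-normalized weights matrices, both chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, S = 4, 3, 2
Ms = []
for _ in range(S):
    M = rng.uniform(size=(N, N))
    np.fill_diagonal(M, 0.0)
    Ms.append(M / M.sum(axis=1, keepdims=True))
rho = np.array([0.3, 0.2])
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))
R = np.kron(np.eye(T), np.eye(N) - sum(r * M for r, M in zip(rho, Ms)))
Z = rng.normal(size=(N * T, 2))
print(np.allclose(Q0 @ (R @ Z), R @ (Q0 @ Z)))   # True: the order of the transformations is immaterial
```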
2.1 Assumptions

Some properties of the regressor matrix $X_N$ have already been discussed in subsection 3 of Section II. The following further assumptions are maintained.

Assumption 8.
The non-stochastic instrument matrix $H_N$ has full column rank $P^*\ge K+R$ (for $N$ large enough). Furthermore, the elements of $H_N$ are bounded uniformly in absolute value. Under fixed effects estimation, we also assume that each instrument changes over time (at least for some cross-section $i$). Moreover, $Q_{HH}=\lim_{N\to\infty}[(NT)^{-1}H_N'H_N]$ and $Q_{HZ}=\operatorname{plim}_{N\to\infty}[(NT)^{-1}H_N'Z_N]$ are finite, with $Q_{HH}$ nonsingular and $Q_{HZ}$ of full column rank.
Regarding the choice of instruments, note that

$$E[(I_T\otimes W_{r,N})y_N]=(I_T\otimes W_{r,N})E(y_N)=(I_T\otimes W_{r,N})\Big[I_{NT}-\sum_{r'=1}^R\lambda_{r',N}(I_T\otimes W_{r',N})\Big]^{-1}X_N\beta_N=(I_T\otimes W_{r,N})\sum_{i=0}^{\infty}\Big[\sum_{r'=1}^R\lambda_{r',N}(I_T\otimes W_{r',N})\Big]^iX_N\beta_N,$$

provided that $\|\sum_{r=1}^R\lambda_{r,N}W_{r,N}\|<1$ for some matrix norm (compare Horn and Johnson, 1985, p. 301). The instrument matrix $H_N$ is used to instrument $Z_N=(X_N,Y_N)$ in a least squares regression of $Z_N$ on $H_N$, obtaining $\hat Z_N=P_{H_N}Z_N$, where $P_{H_N}=H_N(H_N'H_N)^{-1}H_N'$. It is thus reasonable to select $H_N$ to include $X_N$ and a subset of the linearly independent columns of the terms of the sum $\sum_{i=1}^{Q}[I_T\otimes(\sum_{r=1}^RW_{r,N})]^iX_N$, where $Q$ is some predefined constant.¹⁵ Note that such a choice of $H_N$ implies that the second part of Assumption 9 will be fulfilled (by Assumptions 3 and 8) and that $X_N$ is projected on itself.
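A minimal sketch of this instrument choice for a single spatial lag ($R=1$) and $Q=2$ is given below. It assumes NumPy, period-major stacking, a row-normalized $W_N$, and that the first column of $X_N$ is a constant. The spatial lags of that constant are again constant, so they are dropped to keep the columns of $H_N$ linearly independent. The helper names are hypothetical.

```python
import numpy as np


def row_normalised_weights(N, rng):
    W = rng.uniform(size=(N, N))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)


def instruments(X, IW, Q=2):
    """H = (X, (I_T kron W) X, ..., (I_T kron W)^Q X), dropping the lagged constant.

    Assumes the first column of X is a constant; with a row-normalised W its spatial
    lags are again constant and hence collinear with X.
    """
    cols, lagged = [X], X
    for _ in range(Q):
        lagged = IW @ lagged
        cols.append(lagged[:, 1:])
    return np.column_stack(cols)


rng = np.random.default_rng(5)
N, T = 30, 4
IW = np.kron(np.eye(T), row_normalised_weights(N, rng))
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
H = instruments(X, IW)
P_H = H @ np.linalg.solve(H.T @ H, H.T)             # P_H = H (H'H)^{-1} H'
print(H.shape, np.allclose(P_H @ X, X), np.allclose(P_H @ P_H, P_H))
```

The check `P_H @ X == X` corresponds to the remark that $X_N$ is projected on itself.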
Analogous assumptions are maintained for the within-transformed regressor and instrument matrices $\underline X_N$ and $\underline H_N$. Assumption 8 then also holds for the spatial GLS transformed variables $X_N^*$ and $H_N^*$ (under random effects estimation) or $\underline X_N^*$ and $\underline H_N^*$ (under fixed effects estimation).

¹⁵ Kelejian, Prucha, and Yuzefovich (2004) consider the results using alternative sets of instruments in the estimation of a cross-section SARAR(1,1) model. Their Monte Carlo simulation results suggest that choosing $Q=2$ will be sufficient in many applications.
2.2 Definition of TSLS Estimator and Asymptotic Results

2.2.1 Random Effects Estimation

The random effects TSLS estimator of model (1a) is defined as

$$\tilde\delta_N=(\hat Z_N'\hat Z_N)^{-1}\hat Z_N'y_N,\qquad(33)$$

where $\hat Z_N=P_{H_N}Z_N=(X_N,\hat Y_N)$ and $\hat Y_N=P_{H_N}Y_N$ with $P_{H_N}=H_N(H_N'H_N)^{-1}H_N'$.
As already mentioned, under random effects estimation, the matrix $Z_N$ typically includes a constant. The following lemma shows that the various assumptions maintained in Section III are automatically satisfied by the random effects TSLS estimator $\tilde\delta_N$ and the corresponding residuals $\tilde u_N=y_N-Z_N\tilde\delta_N$, which are used in the GM estimation of the parameters $\rho_{s,N}$, $s=1,\dots,S$, and $\sigma^2_{\mu,N}$. A proof of Lemma 1 is given in Appendix B.
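Lemma 1
Suppose that Assumptions 1-3 and 8 hold, and that $\sup_N\|\beta_N\|<b$. Let $D_N=-Z_N$. Then the fourth moments of the elements of $D_N$ are bounded uniformly in absolute value, Assumption 6 holds, and

(a) $(NT)^{1/2}(\tilde\delta_N-\delta_N)=(NT)^{-1/2}T_{v,N}'v_N+(NT)^{-1/2}T_{\mu,N}'\mu_N+o_p(1)$, where $T_{v,N}=F_{v,N}P_N$, $T_{\mu,N}=F_{\mu,N}P_N$,

$$P_N=Q_{HH}^{-1}Q_{HZ}(Q_{HZ}'Q_{HH}^{-1}Q_{HZ})^{-1},$$

$$F_{v,N}=\Big[I_T\otimes\Big(I_N-\sum_{m=1}^S\rho_{m,N}M_{m,N}'\Big)\Big]^{-1}H_N,\quad\text{and}\quad F_{\mu,N}=(e_T'\otimes I_N)\Big[I_T\otimes\Big(I_N-\sum_{m=1}^S\rho_{m,N}M_{m,N}'\Big)\Big]^{-1}H_N;$$

(b) $(NT)^{-1/2}T_{v,N}'v_N+(NT)^{-1/2}T_{\mu,N}'\mu_N=O_p(1)$;

(c) $P_N=O_p(1)$ and $\tilde P_N-P_N=o_p(1)$, with

$$\tilde P_N=[(NT)^{-1}H_N'H_N]^{-1}[(NT)^{-1}H_N'Z_N]\big\{[(NT)^{-1}Z_N'H_N][(NT)^{-1}H_N'H_N]^{-1}[(NT)^{-1}H_N'Z_N]\big\}^{-1}.$$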
Note that (a) and (b) together imply that $\tilde\delta_N$ is an $N^{1/2}$-consistent estimator of $\delta_N$. Regarding Assumption 4, we now have $\tilde u_N-u_N=D_N\Delta_N$, where $D_N=-Z_N$ and $\Delta_N=\tilde\delta_N-\delta_N$. Lemma 1 shows that, under Assumptions 1-3 and 8, the TSLS residuals automatically satisfy the conditions postulated in Assumptions 4, 6, and 7 with respect to $D_N$, $\Delta_N$, and $T_N$. Hence, Theorems 1 and 2 apply to the GM estimator $\tilde\theta_N$, which is based on the TSLS residuals. The lemma also establishes that the fourth moments of the elements of $D_N=-Z_N$ are bounded uniformly in absolute value, gives explicit expressions for $P_N$ and $\tilde P_N$, and verifies that the conditions concerning these matrices made in Theorem 3 are fulfilled. Hence, Theorem 3 covers the GM estimator $\tilde\theta_N$ and the TSLS estimator $\tilde\delta_N$, and gives the joint limiting distribution of $N^{1/2}(\tilde\theta_N-\theta_N)$ and $N^{1/2}(\tilde\delta_N-\delta_N)$, where the matrices $P_N$, $\tilde P_N$, $F_{v,N}$, and $F_{\mu,N}$ are as in Lemma 1.
2.2.2 Fixed Effects Estimation
The fixed effects TSLS estimator of model (1a) is defined as
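$\tilde{\ddot{\delta}}_N = (\hat{\ddot{Z}}_N'\hat{\ddot{Z}}_N)^{-1}\hat{\ddot{Z}}_N'\ddot{y}_N$, where   (34)
$\hat{\ddot{Z}}_N = P_{\ddot{H}_N}\ddot{Z}_N = P_{\ddot{H}_N}Z_N$ with $P_{\ddot{H}_N} = \ddot{H}_N(\ddot{H}_N'\ddot{H}_N)^{-1}\ddot{H}_N'$.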
The fixed effects estimates $\tilde{\ddot{\delta}}_N$ can then be used to obtain consistent estimates of the disturbances, $\tilde{u}_N = y_N - Z_N\tilde{\ddot{\delta}}_N$. These are used for the GM estimation of the parameters $\rho_{s,N}$, $s = 1,\dots,S$, and $\sigma_\mu^2$. They should not be confused with the fixed effects residuals $\tilde{\ddot{u}}_N = \ddot{y}_N - \ddot{Z}_N\tilde{\ddot{\delta}}_N$, which are an estimate of $Q_{0,N}u_N$.
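The within transformation $Q_{0,N} = (I_T - J_T/T) \otimes I_N$ and the fixed effects TSLS estimator (34) can be sketched as follows. The sketch assumes the data are stacked with the time index running slowest (observation $(i,t)$ in row $(t-1)N + i$), which matches the $I_T \otimes (\cdot)$ and $e_T \otimes I_N$ structure used above. The function names are hypothetical.

```python
import numpy as np


def within(a, N, T):
    """Apply Q_0 = (I_T - J_T/T) kron I_N to data stacked with t slow and i fast."""
    a = np.asarray(a, dtype=float)
    blocks = a.reshape(T, N, -1)  # blocks[t, i, :] holds observation (i, t)
    demeaned = blocks - blocks.mean(axis=0, keepdims=True)  # remove unit-specific time means
    return demeaned.reshape(a.shape)


def fe_tsls(y, Z, H, N, T):
    """Fixed effects TSLS as in (34): TSLS on within-transformed y, Z and H."""
    y_dd, Z_dd, H_dd = within(y, N, T), within(Z, N, T), within(H, N, T)
    coef, *_ = np.linalg.lstsq(H_dd, Z_dd, rcond=None)
    Z_hat = H_dd @ coef  # P_{H_dd} Z_dd
    delta, *_ = np.linalg.lstsq(Z_hat, y_dd, rcond=None)
    return delta
```

Because $Q_{0,N}$ is idempotent and annihilates $(e_T \otimes I_N)\mu_N$, time-invariant unit effects drop out before the two projections are carried out.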
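The results for the fixed effects estimation are exactly as in Lemma 1, with the following changes:
- $T_N$, $P_N$, $H_N$ are replaced with their within-transformed counterparts $\ddot{T}_N$, $\ddot{P}_N$, $\ddot{H}_N$;
- $\ddot{T}_{\mu,N} = 0$ and $\ddot{F}_{\mu,N} = 0$;
- $\ddot{F}_{v,N} = Q_{0,N}[I_T \otimes (I_N - \sum_{m=1}^{S}\rho_{m,N}M_{m,N})^{-1}]'H_N = [I_T \otimes (I_N - \sum_{m=1}^{S}\rho_{m,N}M_{m,N})^{-1}]'\ddot{H}_N$.16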
16
By the idempotency of the within-transformation matrix $Q_{0,N}$, one could equivalently use the fixed effects residuals $\ddot{v}_N = Q_{0,N}v_N$ in the expression $(NT)^{1/2}\ddot{\Delta}_N = (NT)^{-1/2}\ddot{T}_{v,N}'v_N + o_p(1)$. However, the derivation of the heteroskedasticity-robust variance-covariance matrix relies on the use of the original residuals. We therefore also define the fixed effects estimator as a linear form in the original residuals $v_N$.
3. Definition of Spatial Generalized Two-Stage Least Squares (GTSLS) Estimator and Asymptotic Results
3.1. Random Effects Estimation
The spatial GLS transformed version of model (1b) is given by
$y_N^* = Z_N^*\delta_N + u_N^*$,   (34)
where $y_N^* = R_N y_N$, $Z_N^* = R_N Z_N$, and $u_N^* = R_N u_N = \varepsilon_N$. The transformation matrix $R_N$ is given by
$R_N = I_T \otimes (I_N - \sum_{m=1}^{S}\rho_{m,N}M_{m,N})$.
The random effects spatial GTSLS estimator, denoted as $\hat{\delta}_N^*$, is then obtained as a TSLS estimator applied to the transformed model, using the transformed instruments $H_N^* = R_N H_N$, i.e.,
$\hat{\delta}_N^* = (\hat{Z}_N^{*\prime}\hat{Z}_N^*)^{-1}\hat{Z}_N^{*\prime}y_N^*$,   (35a)
with $\hat{Z}_N^* = P_{H_N^*}Z_N^*$ and $P_{H_N^*} = H_N^*(H_N^{*\prime}H_N^*)^{-1}H_N^{*\prime}$.
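The feasible random effects spatial GTSLS estimator, denoted as $\hat{\tilde{\delta}}_N^*$, is defined analogously. It replaces the transformation matrix $R_N$ by its estimate $\tilde{R}_N = I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})$, i.e.,
$\hat{\tilde{\delta}}_N^* = (\hat{\tilde{Z}}_N^{*\prime}\hat{\tilde{Z}}_N^*)^{-1}\hat{\tilde{Z}}_N^{*\prime}\tilde{y}_N^*$,   (35b)
where the tilde indicates that the transformation is based on the estimate of $R_N$.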
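The transformation $R_N$ and the estimators (35a) and (35b) can be sketched in NumPy. This is an illustration under the following assumptions:
- the data are stacked with the time index running slowest, consistent with the $I_T \otimes (\cdot)$ structure;
- the weights matrices are passed as a list `Ms`;
- the function names are hypothetical.

Passing GM estimates $\tilde{\rho}_{m,N}$ instead of the true $\rho_{m,N}$ gives the feasible variant (35b).

```python
import numpy as np


def spatial_transform(a, rhos, Ms, T):
    """Apply R = I_T kron (I_N - sum_m rho_m M_m) to data stacked with t slow, i fast."""
    N = Ms[0].shape[0]
    A = np.eye(N) - sum(rho * M for rho, M in zip(rhos, Ms))
    a = np.asarray(a, dtype=float)
    blocks = a.reshape(T, N, -1)
    out = np.einsum("ij,tjk->tik", A, blocks)  # multiply every period block by A
    return out.reshape(a.shape)


def gtsls(y, Z, H, rhos, Ms, T):
    """Spatial GTSLS as in (35a)/(35b): TSLS of R y on R Z with instruments R H."""
    y_s, Z_s, H_s = (spatial_transform(v, rhos, Ms, T) for v in (y, Z, H))
    coef, *_ = np.linalg.lstsq(H_s, Z_s, rcond=None)
    Z_hat = H_s @ coef  # P_{H*} Z*
    delta, *_ = np.linalg.lstsq(Z_hat, y_s, rcond=None)
    return delta
```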
The following lemma shows that the various assumptions maintained in Section III are automatically satisfied by the (feasible) random effects spatial GTSLS estimator $\hat{\tilde{\delta}}_N^*$ and the corresponding residuals $u_N(\hat{\tilde{\delta}}_N^*) = y_N - Z_N\hat{\tilde{\delta}}_N^*$. The proof is given in Appendix B.
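Lemma 2.
Suppose the Assumptions of Lemma 1 hold, and let $\hat{\tilde{\delta}}_N^*$ be defined as in (35b), where $\tilde{\theta}_N$ is any $N^{1/2}$-consistent estimator of $\theta_N$ (such as the GM estimator $\tilde{\theta}_N$ based on the TSLS residuals). Then
(a) $(NT)^{1/2}\Delta_N^* = (NT)^{-1/2}T_{v,N}^{*\prime}v_N + (NT)^{-1/2}T_{\mu,N}^{*\prime}\mu_N + o_p(1)$, where
$T_{v,N}^* = F_{v,N}^*P_N^*$, $T_{\mu,N}^* = F_{\mu,N}^*P_N^*$,
$P_N^* = Q_{H^*H^*}^{-1}Q_{H^*Z^*}(Q_{H^*Z^*}'Q_{H^*H^*}^{-1}Q_{H^*Z^*})^{-1}$,
$F_{v,N}^* = H_N^*$, $F_{\mu,N}^* = (e_T' \otimes I_N)H_N^*$;
(b) $(NT)^{-1/2}T_{v,N}^{*\prime}v_N + (NT)^{-1/2}T_{\mu,N}^{*\prime}\mu_N = O_p(1)$;
(c) $P_N^* = O(1)$ and $\tilde{P}_N^* - P_N^* = o_p(1)$ for
$\tilde{P}_N^* = [(NT)^{-1}\tilde{H}_N^{*\prime}\tilde{H}_N^*]^{-1}[(NT)^{-1}\tilde{H}_N^{*\prime}\tilde{Z}_N^*]\{[(NT)^{-1}\tilde{Z}_N^{*\prime}\tilde{H}_N^*][(NT)^{-1}\tilde{H}_N^{*\prime}\tilde{H}_N^*]^{-1}[(NT)^{-1}\tilde{H}_N^{*\prime}\tilde{Z}_N^*]\}^{-1}$.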
In light of Lemmata 1 and 2, the joint limiting distribution of the (feasible) spatial GTSLS estimator $\hat{\tilde{\delta}}_N^*$ and the GM estimator $\tilde{\theta}_N$ follows from Theorem 3 and the discussion thereafter, with $\Delta_N^* = \hat{\tilde{\delta}}_N^* - \delta_N$.
Note that, in light of Lemma 2, the residuals $\hat{u}_N(\hat{\tilde{\delta}}_N^*) = y_N - Z_N\hat{\tilde{\delta}}_N^* = u_N + D_N\Delta_N^*$ can be used to estimate $\theta_N$ by the GM estimator defined by (18); the discussion surrounding Lemma 1 applies analogously here. Taking this argument one step further, $\theta_N$ and $\delta_N$ can also be estimated by an iterative procedure.
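One way to organise such an iterative procedure is to alternate between the GM step and the GTSLS step until the regression estimates settle down. The sketch below assumes two caller-supplied functions:
- `gm_step` maps a $\delta$ estimate to $\theta$ estimates. It stands in for the GM estimator (18) applied to the residuals $y_N - Z_N\delta$.
- `gtsls_step` maps $\theta$ to the feasible GTSLS estimate.

Both names, the stopping rule, and the tolerance are hypothetical choices, not taken from the paper.

```python
import numpy as np


def iterate_gm_gtsls(delta0, gm_step, gtsls_step, tol=1e-8, max_iter=100):
    """Alternate GM estimation of theta and GTSLS estimation of delta.

    gm_step(delta) -> theta and gtsls_step(theta) -> delta are supplied by the user.
    Returns (theta, delta, converged).
    """
    delta = np.asarray(delta0, dtype=float)
    theta = gm_step(delta)
    for _ in range(max_iter):
        delta_new = np.asarray(gtsls_step(theta), dtype=float)
        theta = gm_step(delta_new)  # re-estimate theta from the updated residuals
        if np.max(np.abs(delta_new - delta)) < tol:
            return theta, delta_new, True
        delta = delta_new
    return theta, delta, False
```

The asymptotic results above refer to the two-step estimators. This loop is only a computational device, and its properties are not covered by Theorem 3.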
3.2. Fixed Effects Estimation
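The fixed effects spatial GTSLS estimator, denoted as $\hat{\ddot{\delta}}_N^*$, is defined as
$\hat{\ddot{\delta}}_N^* = (\hat{\ddot{Z}}_N^{*\prime}\hat{\ddot{Z}}_N^*)^{-1}\hat{\ddot{Z}}_N^{*\prime}\ddot{y}_N^*$,   (36a)
with $\hat{\ddot{Z}}_N^* = P_{\ddot{H}_N^*}\ddot{Z}_N^*$ and $P_{\ddot{H}_N^*} = \ddot{H}_N^*(\ddot{H}_N^{*\prime}\ddot{H}_N^*)^{-1}\ddot{H}_N^{*\prime}$.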
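The feasible fixed effects spatial GTSLS estimator, denoted as $\hat{\tilde{\ddot{\delta}}}_N^*$, is defined analogously. It uses the estimate of the transformation matrix, $\tilde{R}_N = I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})$, i.e.,
$\hat{\tilde{\ddot{\delta}}}_N^* = (\hat{\tilde{\ddot{Z}}}_N^{*\prime}\hat{\tilde{\ddot{Z}}}_N^*)^{-1}\hat{\tilde{\ddot{Z}}}_N^{*\prime}\tilde{\ddot{y}}_N^*$.   (36b)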
The results for the fixed effects estimation are exactly as in Lemma 2, with the following changes:
- $T_N^*$, $P_N^*$, $H_N^*$ are replaced with their within-transformed counterparts $\ddot{T}_N^*$, $\ddot{P}_N^*$, $\ddot{H}_N^*$;
- $\ddot{T}_{\mu,N}^* = 0$ and $\ddot{F}_{\mu,N}^* = 0$;
- $\ddot{F}_{v,N}^* = \ddot{H}_N^*$.
Again, notice that it is not the fixed effects residuals but the estimated disturbances $u_N(\hat{\tilde{\ddot{\delta}}}_N^*) = y_N - Z_N\hat{\tilde{\ddot{\delta}}}_N^*$ that can be used in the GM estimation of $\theta_N$.
V. Variance-Covariance Matrix Estimation
As evident from Theorem 3, the matrix $\Omega_{w,N}$ is of sandwich form. Under both random and fixed effects estimation, the “sandwiched” middle term, $\Psi_{w,N}$, depends (among other things) on the idiosyncratic error terms $v_N$. A complication in deriving a consistent estimator for $\Psi_{w,N}$ arises from a well-known fact. One can only obtain consistent estimates of the vector of fixed effects residuals $\ddot{v}_N = (\ddot{v}_{it,N})$, i.e., the within-transformed residuals, but not of the original idiosyncratic errors $v_N$. This is a manifestation of the so-called incidental parameter problem (Lancaster, 2000).
This point was prominently made in a recent paper by Stock and Watson (2008), who suggest a heteroskedasticity-robust, bias-corrected variance-covariance matrix estimator for nonspatial fixed effects panel data models. A closely related issue arises in the estimation of the variance-covariance matrix of the GM estimates $\tilde{\theta}_N$ given by (28). In the following, we derive bias-corrected estimators for the joint asymptotic variance-covariance matrix of all model parameters under both fixed and random effects estimation. We pursue an approach analogous to that in Stock and Watson (2008).
1. Estimation of $\Psi_{w,N}$
In the following, we derive estimators for each block of $\Psi_{w,N}$:
- We start by defining an estimator for $\Psi_{\Delta\Delta,N}$, required for inference with respect to the parameters $\delta_N$ of the main equation (1a).
- Next, we turn to the estimation of $\Psi_N$, the inverse of the optimal weighting matrix for the GM estimation. It is also a key element in the estimation of the variance-covariance matrix of the GM estimates of $\theta_N$.
- Finally, we turn to the estimation of $\Psi_{\Delta\theta,N}$, required for joint tests regarding $\theta_N$ and $\delta_N$.
1.1 Estimation of $\Psi_{\Delta\Delta,N}$
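Consider
$\Psi_{\Delta\Delta,N} = \Psi_{\Delta\Delta,N}^{v} + \Psi_{\Delta\Delta,N}^{\mu}$, where
$\Psi_{\Delta\Delta,N}^{v} = (NT)^{-1}F_{v,N}'\Sigma_{v,N}F_{v,N}$ and $\Psi_{\Delta\Delta,N}^{\mu} = \sigma_\mu^2(NT)^{-1}F_{\mu,N}'F_{\mu,N}$.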
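Under random effects estimation, the estimators for $F_N$ (original model) and $F_N^*$ (spatial GLS transformed model) are defined as
$\tilde{F}_{v,N} = [I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})^{-1}]'H_N$,   (37a)
$\tilde{F}_{\mu,N} = (e_T' \otimes I_N)[I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})^{-1}]'H_N$,
and
$\tilde{F}_{v,N}^* = \tilde{H}_N^* = [I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})]H_N$,   (37b)
$\tilde{F}_{\mu,N}^* = (e_T' \otimes I_N)\tilde{H}_N^* = (e_T' \otimes I_N)[I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})]H_N$.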
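Under fixed effects estimation, the estimators for $F_N$ and $F_N^*$ are defined as
$\tilde{\ddot{F}}_{v,N} = Q_{0,N}\tilde{F}_{v,N}$, $\tilde{\ddot{F}}_{\mu,N} = 0$, and   (37c)
$\tilde{\ddot{F}}_{v,N}^* = Q_{0,N}\tilde{F}_{v,N}^*$, $\tilde{\ddot{F}}_{\mu,N}^* = 0$.   (37d)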
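Hence, under random effects estimation of the untransformed model, the estimator for $\Psi_{\Delta\Delta,N}^{\mu}$ is given by
$\tilde{\Psi}_{\Delta\Delta,N}^{\mu} = \tilde{\sigma}_\mu^2\frac{1}{NT}\tilde{F}_{\mu,N}'\tilde{F}_{\mu,N}$,   (38)
where $\tilde{\sigma}_\mu^2$ is the GM estimate of $\sigma_\mu^2$. It is based on the residuals generated using the random effects estimator, $\tilde{u}_N(\tilde{\delta}_N) = y_N - Z_N\tilde{\delta}_N$. For the other estimators considered, $\tilde{\Psi}_{\Delta\Delta,N}^{\mu}$ is defined in the same way, properly replacing the F-matrices and the estimates of the disturbances $\tilde{u}_N$.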
As already mentioned above, a bias correction is required for two reasons: $v_N$ is heteroskedastic, and the variance-covariance matrix depends on the idiosyncratic error terms in levels, $v_N$, rather than on the fixed effects residuals $\ddot{v}_N$. As shown in Lemma C.2 of the Appendix, adopting an approach analogous to that in Stock and Watson (2008) in the present framework yields the following bias-corrected estimator for $\Psi_{\Delta\Delta,N}^{v}$:17
$\tilde{\Psi}_{\Delta\Delta,N}^{v} = \frac{1}{N(T-2)}\tilde{F}_{v,N}'\tilde{\Sigma}_N^{HR}\tilde{F}_{v,N}$,   (39)
where $\tilde{\Sigma}_N^{HR} = \mathrm{diag}_{n=1}^{NT}[(\tilde{\sigma}_{n,N}^{HR})^2]$ with
$(\tilde{\sigma}_{it,N}^{HR})^2 = \tilde{\ddot{v}}_{it,N}^2 - \frac{1}{T(T-1)}\sum_{r=1}^{T}\tilde{\ddot{v}}_{ir,N}^2$.
The estimates of the fixed effects residuals are given by $\tilde{\ddot{v}}_N = (\tilde{\ddot{v}}_{it,N}) = Q_{0,N}\tilde{\varepsilon}_N = Q_{0,N}[I_T \otimes (I_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}M_{m,N})]\tilde{u}_N$.
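A short moment calculation shows the role of the correction term and of the factor $T/(T-2)$ in (39). It is a sketch under simplifying assumptions of our own: the within-transformed errors $\ddot{v}_{it,N}$ are treated as observed (estimation error in $\tilde{\rho}_N$ and $\tilde{\delta}_N$ is ignored), and the $v_{it,N}$ are taken to be independent over $t$ with mean zero and variances $\sigma_{it,N}^2$.

```latex
\ddot{v}_{it,N} = v_{it,N} - \frac{1}{T}\sum_{r=1}^{T} v_{ir,N},
\qquad
E[\ddot{v}_{it,N}^2] = \frac{T-2}{T}\,\sigma_{it,N}^2 + \frac{1}{T^2}\sum_{r=1}^{T}\sigma_{ir,N}^2,
\qquad
E\Big[\sum_{r=1}^{T}\ddot{v}_{ir,N}^2\Big] = \frac{T-1}{T}\sum_{r=1}^{T}\sigma_{ir,N}^2,

\Rightarrow\quad
E\Big[\ddot{v}_{it,N}^2 - \frac{1}{T(T-1)}\sum_{r=1}^{T}\ddot{v}_{ir,N}^2\Big] = \frac{T-2}{T}\,\sigma_{it,N}^2 .
```

Subtracting $\frac{1}{T(T-1)}\sum_r \ddot{v}_{ir,N}^2$ therefore removes the cross-period contamination. The remaining proportional factor $(T-2)/T$ is undone by scaling with $T/(T-2)$, which turns $(NT)^{-1}$ into $1/(N(T-2))$.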
Again, the modifications of (39) for other estimators are straightforward, replacing $\tilde{F}_{v,N}$ properly.
We summarize the consistency result for the estimators given by (38) and (39) in the following theorem.
Theorem 4a. Consistency of $\tilde{\Psi}_{\Delta\Delta,N}$
Let $\tilde{\Psi}_{\Delta\Delta,N} = \tilde{\Psi}_{\Delta\Delta,N}^{\mu} + \tilde{\Psi}_{\Delta\Delta,N}^{v}$, with $\tilde{\Psi}_{\Delta\Delta,N}^{\mu}$ and $\tilde{\Psi}_{\Delta\Delta,N}^{v}$ defined in (38) and (39). Suppose that the Assumptions of Theorem 3, apart from Assumptions 5 and 7, hold, and that, additionally, all of the fourth moments of the elements of $D_N$ are bounded uniformly. Suppose furthermore that
(a) $\sup_N\sum_{s=1}^{S}|\rho_{s,N}| < 1$, and the row and column sums of $M_{s,N}$ are bounded uniformly in absolute value by one and some finite constant, respectively; and
(b) $\tilde{P}_N - P_N = o_p(1)$ with $P_N = O(1)$.
Then $\tilde{\Psi}_{\Delta\Delta,N} - \Psi_{\Delta\Delta,N} = o_p(1)$ and $\tilde{\Psi}_{\Delta\Delta,N}^{-1} - \Psi_{\Delta\Delta,N}^{-1} = o_p(1)$.
Proof. Theorem 4a follows from Lemmata C.2 and C.3 in Appendix C.18
Remark 3: Under estimation of the spatial GLS transformed model (where the inverse of $R_N$ cancels out), condition (a) can be dropped. Under TSLS (or spatial GTSLS) estimation, condition (b) in Theorem 4a is automatically fulfilled (see Lemmata 1 and 2).
17
The result in Stock and Watson (2008) is obtained as a special case for $\rho_N = 0$ and if there are no endogenous right-hand side variables, i.e., $F_N = X_N$.
18
Note that Lemma C.2 uses a slightly different definition of $\tilde{\Sigma}_N^{HR}$, factoring out $T/(T-2)$, for notational convenience of the proof.
1.2 Estimation of $\Psi_N$
Consider the elements of $\Psi_N$ as defined in (28). For estimation, it will turn out convenient to rewrite the part of the elements of $\Psi_N$ given by (28a), with the main diagonal elements of the matrices $A_{v,N}^{c,s,s'}$ set to zero in the first expression of the trace in the first line. Furthermore, to simplify the exposition, we drop the indices $c$, $s$, and $s'$ in the following derivation and adopt the following notational conventions:
- The matrix $A_{v,N}^{c,s,s'}$, associated with the set of indices $(c,s,s')$, is written $A_{v,N} = (a_{v,nn',N}) = (a_{v,it,js,N})$. With its main diagonal elements set to zero, it is written $\bar{A}_{v,N} = A_{v,N} - \mathrm{diag}_{n=1}^{NT}(a_{v,nn,N})$.
- Analogously, the matrix $A_{v,N}^{c,t,t'}$, associated with the set of indices $(c,t,t')$, is denoted as $B_{v,N} = (b_{v,nn',N}) = (b_{v,it,js,N})$, and $\bar{B}_{v,N} = B_{v,N} - \mathrm{diag}_{n=1}^{NT}(b_{v,nn,N})$.
- The same convention applies to the matrices $A_{\mu,N}^{c,s,s'}$, henceforth denoted as $A_{\mu,N} = (a_{\mu,nn',N}) = (a_{\mu,it,js,N})$, and $A_{\mu,N}^{c,t,t'}$, henceforth denoted as $B_{\mu,N}$.
- The same convention also applies to the vectors $a_{v,N}^{c,s,s'}$ and $a_{v,N}^{c,t,t'}$, henceforth denoted $a_{v,N} = (a_{v,n,N}) = (a_{v,it,N})$ and $b_{v,N} = (b_{v,n,N}) = (b_{v,it,N})$, respectively.
- Products of equally indexed elements of $A_{v,N}$ and $B_{v,N}$ are written $c_{it,js,N} = a_{v,it,js,N}b_{v,it,js,N}$ (or $c_{nn',N} = a_{v,nn',N}b_{v,nn',N}$).
- Finally, we define $d_{v,n,N} = (a_{v,nn,N}b_{v,n,N} + b_{v,nn,N}a_{v,n,N})$ and $d_{\mu,n,N} = (a_{\mu,nn,N}b_{\mu,n,N} + b_{\mu,nn,N}a_{\mu,n,N})$.
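In that case, equation (28a) can be written, for a given pair of index sets $(c,s,s')$ and $(c,t,t')$, as
$E_N = E_{v,N}^{*,1} + E_{v,N}^{*,2} + E_{v\mu,N}^{*} + E_{\mu,N}^{*} + E_{v,N}^{**} + E_{\mu,N}^{**}$,   (40)
where
$E_{v,N}^{*,1} = 2N^{-1}\mathrm{Tr}(\bar{A}_{v,N}\Sigma_{v,N}\bar{B}_{v,N}\Sigma_{v,N}) = 2N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{j=1}^{N}\sum_{s=1, js \neq it}^{T}c_{it,js,N}\sigma_{v,it,N}^2\sigma_{v,js,N}^2$,
$E_{v,N}^{*,2} = 2N^{-1}\sum_{n=1}^{NT}c_{nn,N}\sigma_{v,n,N}^4 = \frac{2}{3}N^{-1}\sum_{n=1}^{NT}c_{nn,N}\mu_{v,n,N}^{(4)}$,
$E_{v\mu,N}^{*} = 4\sigma_\mu^2N^{-1}\mathrm{Tr}(A_{\mu v,N}\Sigma_{v,N}B_{v\mu,N})$,
$E_{\mu,N}^{*} = 2\sigma_\mu^4N^{-1}\mathrm{Tr}(A_{\mu,N}B_{\mu,N})$,
$E_{v,N}^{**} = N^{-1}a_{v,N}'\Sigma_{v,N}b_{v,N}$,
$E_{\mu,N}^{**} = \sigma_\mu^2N^{-1}a_{\mu,N}'b_{\mu,N}$.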
Notice that the terms $c_{nn,N}\sigma_{v,n,N}^4$, $n = 1,\dots,NT$, associated with the main diagonal elements of $A_{v,N}$ and $B_{v,N}$ in the expression $\mathrm{Tr}(A_{v,N}\Sigma_{v,N}B_{v,N}\Sigma_{v,N})$, are not included in $E_{v,N}^{*,1}$. To rewrite $E_{v,N}^{*,2}$, we have used the fact that $\sigma_{v,n,N}^4 = \mu_{v,n,N}^{(4)}/3$ under normality, where $\mu_{v,n,N}^{(4)}$ is the fourth moment of $v_{n,N}$.
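We next define the estimates for $a_{v,N}$ and $a_{\mu,N}$:
$\tilde{a}_{v,N} = (\tilde{a}_{v,n,N}) = \tilde{T}_{v,N}\tilde{\alpha}_N$, $\tilde{a}_{\mu,N} = (\tilde{a}_{\mu,n,N}) = \tilde{T}_{\mu,N}\tilde{\alpha}_N$, with   (41)
$\tilde{T}_{v,N} = \tilde{F}_{v,N}\tilde{P}_N$, $\tilde{T}_{\mu,N} = \tilde{F}_{\mu,N}\tilde{P}_N$, and $\tilde{\alpha}_N = 2N^{-1}(D_N'\tilde{C}_N\tilde{u}_N)$.
The (properly indexed) matrices $\tilde{C}_N$, i.e., $\tilde{C}_N^{c,s,s'}$, are given by (22) with $\rho_N$ replaced by $\tilde{\rho}_N$. The estimates of the disturbances are given by $\tilde{u}_N(\tilde{\delta}_N) = y_N - Z_N\tilde{\delta}_N$. Expression (41) as written holds for random effects estimation of the original model. The modifications for the other estimators are obvious, appropriately replacing $D_N = -Z_N$, $\tilde{T}_N$, and $\tilde{u}_N$. Of course, analogous definitions apply to $b_{v,N}$ and $b_{\mu,N}$.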
We next define estimators for the terms in (40), starting with the “homoskedastic” terms, which involve only the time-invariant error component $\mu_{i,N}$.
1.2.1. Estimation of “homoskedastic” terms
Consistent estimators of the expressions in (40) associated (only) with the homoskedastic, time-invariant error component $\mu_{i,N}$ are given by
$\tilde{E}_{\mu,N}^{*} = 2\tilde{\sigma}_\mu^4N^{-1}\mathrm{Tr}(A_{\mu,N}B_{\mu,N})$,   (42a)
$\tilde{E}_{\mu,N}^{**} = \tilde{\sigma}_\mu^2N^{-1}\tilde{a}_{\mu,N}'\tilde{b}_{\mu,N}$.   (42b)
The consistency proofs for the estimators defined in (42) are special cases of those for the heteroskedastic terms considered in the next section. They are therefore omitted for the sake of brevity.
1.2.2. Estimation of “heteroskedastic” terms
Consider first $E_{v,N}^{*,1}$ as defined in (40). Its estimation is simplified by the fact that the matrices $A_{v,N}$ and $B_{v,N}$, and thus the elements $c_{it,js,N}$, are time-invariant, i.e., $c_{it,js,N} = c_{ij,N}$. As shown in Lemma C.5 in Appendix C, a consistent estimate of $E_{v,N}^{*,1}$ is given by
$\tilde{E}_{v,N}^{*,1} = \frac{2T^2}{N(T-1)^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{t=1}^{T}\sum_{s=1}^{T}c_{ij,N}\tilde{\ddot{v}}_{it,N}^2\tilde{\ddot{v}}_{js,N}^2$.   (43a)
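Next, consider $E_{v,N}^{*,2} = 2N^{-1}\sum_{n=1}^{NT}c_{nn,N}\sigma_{v,n,N}^4$. The elements $c_{it,N}$ are time-invariant ($c_{it,N} = c_{i,N}$). Under normality, $E_{v,N}^{*,2}$ can therefore also be written as a weighted sum of fourth moments, $E_{v,N}^{*,2} = \frac{2}{3}N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}c_{i,N}\mu_{v,it,N}^{(4)}$, which can be estimated consistently using
$\tilde{E}_{v,N}^{*,2} = \frac{2}{3}\left[\frac{k_0}{1-k_1m_1}\tilde{\varphi}_N - \frac{k_1m_0}{1-k_1m_1}\tilde{a}_N\right]$, where   (43b)
$\tilde{\varphi}_N = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}c_{i,N}\tilde{\ddot{v}}_{it,N}^4$,
$\tilde{a}_N = N^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}c_{i,N}\tilde{\ddot{v}}_{it,N}^2\frac{1}{T-1}\sum_{r=1, r\neq t}^{T}\tilde{\ddot{v}}_{ir,N}^2$,
$m_0 = \frac{T^3}{T^3-2T^2-3T+9}$, $m_1 = \frac{2T-3}{T^3-2T^2-3T+9}$,
$k_0 = \frac{T^2(T-1)}{T^3-4T^2+6T-3}$, $k_1 = \frac{(T-1)(6T-9)}{T^3-4T^2+6T-3}$.
The derivation, using a bias correction in the spirit of Stock and Watson (2008), and the proof of consistency are given in Lemma C.6a in Appendix C.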
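In light of the previous results, estimation of $E_{v\mu,N}^{*}$ is straightforward. Exploiting the fact that the weights matrices are time-invariant, a consistent estimate is given by
$\tilde{E}_{v\mu,N}^{*} = 4\tilde{\sigma}_\mu^2\frac{T}{T-1}\frac{1}{N}\mathrm{Tr}(A_{\mu v,N}\tilde{\Sigma}_{v,N}B_{v\mu,N})$,   (44a)
where $\tilde{\Sigma}_{v,N} = \mathrm{diag}_{n=1}^{NT}[(\tilde{\ddot{v}}_{n,N})^2]$.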
Finally, an estimate of $E_{v,N}^{**}$ is given by
$\tilde{E}_{v,N}^{**} = \frac{T}{N(T-2)}\tilde{a}_{v,N}'\tilde{\Sigma}_N^{HR}\tilde{b}_{v,N}$.   (44b)
That $\tilde{E}_{v,N}^{**} - E_{v,N}^{**} = o_p(1)$ follows from Lemmata C.2 and C.3 and Remark C.1 thereafter in Appendix C.
We summarize the results of section 1.2 in the following theorem.
Theorem 4b. Consistency of $\tilde{\Psi}_N$
Suppose that all of the assumptions of Theorem 4a and Assumption 7 hold, and that $v_N$ and $\mu_N$ are normally distributed. Let the elements of $\tilde{\Psi}_N$ be defined as above (from (39) to (44)). Then $\tilde{\Psi}_N - \Psi_N = o_p(1)$ and $\tilde{\Psi}_N^{-1} - \Psi_N^{-1} = o_p(1)$.
Remark 4: Under non-normality, Theorem 4b holds under additional assumptions regarding the moments of $v_N$ and $\mu_N$ and with augmented definitions of the elements of $\Psi_N$ and $\tilde{\Psi}_N$; details are given in the Appendix.
1.3 Estimation of $\Psi_{\Delta\theta,N}$
It remains to provide an estimate of $\Psi_{\Delta\theta,N}$. It is required for tests of joint hypotheses concerning the regression parameters $\delta_N$ and the parameters $\theta_N$ associated with the spatial autoregressive disturbance process.
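As evident from the results in section 1.2, the assumptions maintained in Theorem 4b are sufficient to prove that the following expressions consistently estimate the columns of $\Psi_{\Delta\theta,N}$ as defined in light of (32c), provided that $v_N$ and $\mu_N$ are normally distributed:
$\tilde{\psi}_{\Delta\theta,.,(c,s,s'),N} = \tilde{\psi}_{\Delta\theta,.,(c,s,s'),N}^{v} + \tilde{\psi}_{\Delta\theta,.,(c,s,s'),N}^{\mu}$, $c = 1,\dots,4$, $s,s' = 1,\dots,S$, with   (45)
$\tilde{\psi}_{\Delta\theta,.,(c,s,s'),N}^{v} = \frac{T^{1/2}}{N(T-2)}\tilde{F}_{v,N}'\tilde{\Sigma}_N^{HR}\tilde{a}_{v,N}^{c,s,s'}$, and
$\tilde{\psi}_{\Delta\theta,.,(c,s,s'),N}^{\mu} = \tilde{\sigma}_\mu^2\frac{1}{N}\frac{1}{T^{1/2}}\tilde{F}_{\mu,N}'\tilde{a}_{\mu,N}^{c,s,s'}$.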
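Theorem 4c. Consistency of $\tilde{\Psi}_{\Delta\theta,N}$
Suppose the assumptions of Theorem 4b hold and let (the columns of) $\tilde{\Psi}_{\Delta\theta,N}$ be defined by (45). Then we have $\Psi_{\Delta\theta,N} = O(1)$, $\tilde{\Psi}_{\Delta\theta,N} - \Psi_{\Delta\theta,N} = o_p(1)$, and $\tilde{\Psi}_{\Delta\theta,N} = O_p(1)$.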
Remark 5: Under non-normality, Theorem 4c holds under additional assumptions and with augmented definitions of the columns of $\Psi_{\Delta\theta,N}$ and $\tilde{\Psi}_{\Delta\theta,N}$; details are given in the Appendix.
2. Estimation of $\Omega_{w,N}$
The estimate of $J_N$ is given by
$\tilde{J}_N = \tilde{\Gamma}_N\tilde{B}_N$.   (46)
The elements of $\tilde{\Gamma}_N$ are defined in (17), with the expectations operator suppressed and the disturbances $u_N$ replaced by their estimated counterparts. For simplicity of notation, the estimated disturbances are denoted as $\tilde{u}_N$ throughout. They are generated by the respective estimators $\tilde{\delta}_N$, $\tilde{\ddot{\delta}}_N$, $\hat{\tilde{\delta}}_N^*$, or $\hat{\tilde{\ddot{\delta}}}_N^*$ defined above. For example, under fixed effects (feasible) spatial GTSLS estimation, we have $\tilde{u}_N = y_N - Z_N\hat{\tilde{\ddot{\delta}}}_N^*$. The matrix $\tilde{B}_N$ is given by (20) with $\rho_{s,N}$ replaced by the GM estimates $\tilde{\rho}_{s,N}$, $s = 1,\dots,S$.
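Theorem 5. Consistency of $\tilde{\Omega}_{w,N}$
Suppose that Assumptions 1-7 hold. Let $\tilde{\Psi}_{w,N}$ be defined as above (from (39) to (45)). Define
$\tilde{\Omega}_{w,N} = \begin{pmatrix}T^{-1/2}\tilde{P}_N' & 0\\ 0 & (\tilde{J}_N'\tilde{\Theta}_N\tilde{J}_N)^{-1}\tilde{J}_N'\tilde{\Theta}_N\end{pmatrix}\tilde{\Psi}_{w,N}\begin{pmatrix}T^{-1/2}\tilde{P}_N & 0\\ 0 & \tilde{\Theta}_N\tilde{J}_N(\tilde{J}_N'\tilde{\Theta}_N\tilde{J}_N)^{-1}\end{pmatrix}$.
It follows that $\tilde{\Omega}_{w,N} - \Omega_{w,N} = o_p(1)$, $\Omega_{w,N} = O(1)$, and $\tilde{\Omega}_{w,N} = O_p(1)$.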
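Proof.
Above we showed that $\tilde{\Psi}_{w,N} - \Psi_{w,N} = o_p(1)$. By assumption, $\tilde{P}_N - P_N = o_p(1)$, $P_N = O(1)$, and $\tilde{P}_N = O_p(1)$, as well as $\tilde{\Theta}_N - \Theta_N = o_p(1)$, $\Theta_N = O(1)$, and $\tilde{\Theta}_N = O_p(1)$. In the proof of Theorem 2 it was shown that $\tilde{J}_N - J_N = o_p(1)$, $J_N = O(1)$, and $\tilde{J}_N = O_p(1)$. It was also shown that $(\tilde{J}_N'\tilde{\Theta}_N\tilde{J}_N)^{-1} - (J_N'\Theta_NJ_N)^{-1} = o_p(1)$, $(J_N'\Theta_NJ_N)^{-1} = O(1)$, and $(\tilde{J}_N'\tilde{\Theta}_N\tilde{J}_N)^{-1} = O_p(1)$. It now follows that $\tilde{\Omega}_{w,N} - \Omega_{w,N} = o_p(1)$ and $\Omega_{w,N} = O(1)$, and thus $\tilde{\Omega}_{w,N} = O_p(1)$.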
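The sandwich $\tilde{\Omega}_{w,N}$ can be assembled from its ingredients as follows. The sketch assumes the block-diagonal outer factor $\mathrm{diag}(T^{-1/2}\tilde{P}_N', (\tilde{J}_N'\tilde{\Theta}_N\tilde{J}_N)^{-1}\tilde{J}_N'\tilde{\Theta}_N)$ and a symmetric $\tilde{\Theta}_N$. The function and argument names are hypothetical.

```python
import numpy as np


def omega_w(P, J, Theta, Psi_w, T):
    """L Psi_w L' with L = diag(T^{-1/2} P', (J'Theta J)^{-1} J'Theta).

    P: (p, k) instrument weighting matrix, J: (m, q), Theta: symmetric (m, m),
    Psi_w: (p + m, p + m) middle term.
    """
    A = np.linalg.solve(J.T @ Theta @ J, J.T @ Theta)  # (J'Theta J)^{-1} J'Theta, shape (q, m)
    p, k = P.shape
    q, m = A.shape
    L = np.zeros((k + q, p + m))
    L[:k, :p] = P.T / np.sqrt(T)
    L[k:, p:] = A
    # With Theta symmetric, L' = diag(T^{-1/2} P, Theta J (J'Theta J)^{-1})
    return L @ Psi_w @ L.T
```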
Remark 6: Under non-normality, Theorem 5 holds under additional assumptions and with augmented definitions of $\Psi_{w,N}$ and $\tilde{\Psi}_{w,N}$; details are given in the Appendix.
VI. Random vs. Fixed Effects: A Heteroskedasticity-Robust Hausman Test
In the following, we derive a Hausman-type test of the spatial random effects versus the spatial fixed effects model under heteroskedasticity of unknown form. Both estimators considered are based on the spatial GLS transformed model (which removes the cross-sectional interdependence) and use a heteroskedasticity-robust variance-covariance matrix for inference. In general, neither of these two estimators will be efficient, so we use a generalized Hausman test for inference (see Weesie, 1999; Creel, 2004).
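Consider the stacked vector of random and fixed effects estimates of the regression parameters, which is given by
$d_N = \begin{pmatrix}N^{1/2}\Delta_N^*\\ N^{1/2}\ddot{\Delta}_N^*\end{pmatrix} = \begin{pmatrix}N^{1/2}(\hat{\delta}_N^* - \delta_N)\\ N^{1/2}(\hat{\ddot{\delta}}_N^* - \delta_N)\end{pmatrix}$.   (47)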
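By Theorem A.1 in Kelejian and Prucha (2010),
$d_N \xrightarrow{d} N(0, \Omega_N)$, with $\Omega_N = \begin{pmatrix}\Omega_{\hat{\delta}^*} & \Omega_{\hat{\delta}^*\hat{\ddot{\delta}}^*}\\ \Omega_{\hat{\ddot{\delta}}^*\hat{\delta}^*} & \Omega_{\hat{\ddot{\delta}}^*}\end{pmatrix}$.   (48)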
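As evident from (48), $\Omega_{\hat{\delta}^*} = T^{-1}P_N^{*\prime}\Psi_{\Delta\Delta,N}^*P_N^*$ and $\Omega_{\hat{\ddot{\delta}}^*} = T^{-1}\ddot{P}_N^{*\prime}\ddot{\Psi}_{\Delta\Delta,N}^*\ddot{P}_N^*$. The off-diagonal block of $\Omega_N$ is given by
$\Omega_{\hat{\delta}^*\hat{\ddot{\delta}}^*} = \frac{1}{T}P_N^{*\prime}\left[\frac{1}{NT}F_{v,N}^{*\prime}\Sigma_{v,N}\ddot{F}_{v,N}^*\right]\ddot{P}_N^*$.   (49a)
By the same logic as for $\Omega_{\hat{\delta}^*}$ and $\Omega_{\hat{\ddot{\delta}}^*}$, it can be estimated consistently using
$\tilde{\Omega}_{\hat{\delta}^*\hat{\ddot{\delta}}^*} = \frac{1}{T}\tilde{P}_N^{*\prime}\left[\frac{1}{N(T-2)}\tilde{F}_{v,N}^{*\prime}\tilde{\Sigma}_N^{HR}\tilde{\ddot{F}}_{v,N}^*\right]\tilde{\ddot{P}}_N^*$.   (49b)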
The Hausman test is derived under the null hypothesis that the random effects model as specified in section II is the true model. It takes the form of a Wald-type test of the restriction $\hat{\delta}_N^* = \hat{\ddot{\delta}}_N^*$. Define the discrepancy vector $\hat{m}_N = R\hat{q}_N$, where $\hat{q}_N = (\hat{\delta}_N^{*\prime}, \hat{\ddot{\delta}}_N^{*\prime})'$. Note that, typically, the dimension of the parameter vector under random effects exceeds that under fixed effects by 1 due to the inclusion of a constant. Hence, for comparison of the two estimators, we focus on a joint test regarding the slope parameters, i.e., we test
$H_0: Rq_N = 0$ against $H_1: Rq_N \neq 0$,   (50)
where $R = (0_{P\times 1}, I_P, -I_P)$, assuming that the constant appears in the first row of the random effects estimator $\hat{\delta}_N^*$. We use a generalized Wald-type test (e.g., Greene, 2003, pp. 95, 487), which takes the form19
$\hat{m}_N'(R\tilde{Q}_NR')^{-1}\hat{m}_N \stackrel{a}{\sim} \chi^2(P)$,   (51)
where $\tilde{Q}_N = N^{-1}\tilde{\Omega}_N$, with $\tilde{\Omega}_N$ the estimate of $\Omega_N$ in (48). P is the number of restrictions, which is equal to the number of slope parameters in the present case.20
19
If one of the estimators is efficient, its covariance with the difference of the two estimators is zero. The off-diagonal blocks then equal the variance-covariance matrix of the efficient estimator, and equation (51) reduces to the standard Hausman test.
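The test statistic (51) and its p-value can be computed as sketched below. The sketch assumes the following inputs:
- `q_hat` stacks the random effects estimates (constant first) and the fixed effects estimates;
- `Q_tilde` is the full estimated covariance $N^{-1}\tilde{\Omega}_N$, including the off-diagonal block (49b);
- `n_slopes` is $P$.

The helper `chi2_sf` is a hypothetical closed-form chi-square tail probability for integer degrees of freedom. It is written out only to keep the example free of dependencies beyond NumPy and the standard library.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:  # even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) sum_r x^{r-1/2} / (1*3*...*(2r-1))
    term, acc = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(h)) + math.sqrt(2.0 / math.pi) * math.exp(-h) * acc


def generalized_hausman(q_hat, Q_tilde, n_slopes):
    """Wald statistic m'(R Q R')^{-1} m with R = (0, I_P, -I_P), as in (50)-(51)."""
    P = n_slopes
    R = np.hstack([np.zeros((P, 1)), np.eye(P), -np.eye(P)])
    m = R @ q_hat
    stat = float(m @ np.linalg.solve(R @ Q_tilde @ R.T, m))
    return stat, chi2_sf(stat, P)
```

Keeping the off-diagonal block in `Q_tilde` is what makes this the generalized rather than the standard Hausman statistic. If that block is replaced by the variance of an efficient estimator, the middle matrix reduces to the familiar difference of variances.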
VII. Some Monte Carlo Evidence
In the following we provide some limited Monte Carlo evidence on the performance of the
estimation procedure suggested in the present paper. A comprehensive assessment, using a
broad range of parameter constellations, alternative distributional assumptions, and alternative
specifications of the weights matrices is beyond the scope of the present paper and left for
future research. We consider a SARAR(2,2) specification with two explanatory variables, assuming that $M_N = W_N$:21
$y = x_1\beta_1 + x_2\beta_2 + \sum_{r=1}^{2}\lambda_r(I_T \otimes W_r)y + u$,   (52a)
$u = \sum_{s=1}^{2}\rho_s(I_T \otimes W_s)u + \varepsilon$.   (52b)
We consider three sample sizes, $N = 50$, $N = 100$, and $N = 250$, and assume $T = 5$ throughout. For each Monte Carlo experiment, we consider 1000 draws. The explanatory variables $x_1$ and $x_2$ are generated as random draws from a standard normal distribution, scaled with a factor of five, and treated as fixed in repeated samples. The parameters are as follows: $\beta_1 = \beta_2 = 1$, $\lambda_1 = 0.5$, $\lambda_2 = 0.25$, $\rho_1 = 0.4$, and $\rho_2 = 0.2$.
The unnormalized $N \times N$ matrix $W^0$ consists of two $N \times N$ matrices $W_1^0$ and $W_2^0$, where $W^0 = W_1^0 + W_2^0$. The matrices $W_1^0$ and $W_2^0$ are specified such that each contains the elements of $W^0$ for a different band of neighbours; otherwise, they have zero elements. In line with Kelejian and Prucha (2010), we choose the following design:
- $W_1^0$ corresponds to an ‘up to 3 ahead and up to 3 behind’ specification;
- $W_2^0$ corresponds to a ‘4 to 6 ahead and 4 to 6 behind’ specification.

The final weights matrices $W_1$ and $W_2$ are obtained by individually row-normalizing $W_1^0$ and $W_2^0$. As already mentioned, we have $M_1 = W_1$ and $M_2 = W_2$.
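The ‘ahead-behind’ weights matrices can be generated as sketched below. The text does not say how units at the edges are treated. The sketch assumes a circular arrangement (unit $N$ is followed by unit 1), which is an assumption of ours. The function names are hypothetical.

```python
import numpy as np


def ahead_behind(N, first, last):
    """Unnormalized band matrix linking unit i to units i+k and i-k for first <= k <= last (circular)."""
    W0 = np.zeros((N, N))
    for i in range(N):
        for k in range(first, last + 1):
            W0[i, (i + k) % N] = 1.0  # k ahead
            W0[i, (i - k) % N] = 1.0  # k behind
    return W0


def row_normalize(W0):
    return W0 / W0.sum(axis=1, keepdims=True)


N = 50
W1 = row_normalize(ahead_behind(N, 1, 3))  # 'up to 3 ahead and up to 3 behind'
W2 = row_normalize(ahead_behind(N, 4, 6))  # '4 to 6 ahead and 4 to 6 behind'
```

With $N > 12$ the two bands never overlap, so $W_1^0 + W_2^0$ reproduces the combined ‘up to 6 ahead and behind’ matrix $W^0$.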
20
The theory underlying Hausman tests with estimators that are not fully efficient is derived in White (1982, 1994). In a non-spatial context, such a generalized Hausman test is considered, e.g., in Weesie (1999) or Creel (2004). Sufficient assumptions to ensure well-behaved asymptotic properties of generalized Wald tests are derived and discussed in Andrews (1987) and Vuong (1987).
21
For simplicity of notation, the subscript $N$ is suppressed in the following.
Regarding the choice of instruments, we include linearly independent terms of up to second-order spatial lags of the exogenous variables. In particular, the matrix of untransformed instruments $H$ contains 12 columns and is given by
$H = [X, (I_T \otimes W_1)X, (I_T \otimes W_2)X, (I_T \otimes W_1^2)X, (I_T \otimes W_2^2)X, (I_T \otimes W_1W_2)X]$.   (53)
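The instrument matrix can be built with Kronecker products as sketched below. The sketch assumes the data are stacked with the time index running slowest. It also assumes the six blocks appear in the order $X$, $W_1X$, $W_2X$, $W_1^2X$, $W_2^2X$, $W_1W_2X$, which is our reading of (53). The function name is hypothetical.

```python
import numpy as np


def instrument_matrix(X, W1, W2, T):
    """H in (53): X and its first- and second-order spatial lags; data stacked t slow, i fast."""
    I_T = np.eye(T)

    def lag(W):
        return np.kron(I_T, W) @ X  # (I_T kron W) X

    return np.hstack([X, lag(W1), lag(W2), lag(W1 @ W1), lag(W2 @ W2), lag(W1 @ W2)])
```

With two regressors in $X$ this yields the 12 columns mentioned in the text. For large $N$, applying $W$ period by period avoids forming the $NT \times NT$ Kronecker product.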
The elements of the error term $\varepsilon$ are specified as $\varepsilon_{it} = \mu_i + v_{it}$, where the idiosyncratic error is
given by itititititit xxv )1.01.05.0(5.0 2
,2
2
,1 . Thereby it and it are draws from a
standard normal distribution and it is a draw from a uniform distribution with support
[0.5, 1.5], which is treated as fixed in repeated samples. Hence, $v_{it}$ exhibits both conditional and unconditional heteroskedasticity.
The individual effect is specified as $\mu_i = \pi_1x_{1,i} + \pi_2x_{2,i} + w_{i,N}$, where $w_{i,N}$ is a draw from a normal distribution with variance 0.5. We consider two specifications:
- In the random effects model, $\pi_1 = \pi_2 = 0$ (and, hence, $\mathrm{Var}(\mu_{i,N}) = \mathrm{Var}(w_{i,N})$).
- In the fixed effects model, $\pi_1 = \pi_2 = 0.25$ (and, hence, $\mathrm{Var}(\mu_{i,N}) = \mathrm{Var}(\pi_1x_{1,i} + \pi_2x_{2,i}) + \mathrm{Var}(w_{i,N})$).
Results for the estimates of $\rho_1$ and $\rho_2$ are obtained by the GM estimator defined in equation (18), using the optimal weighting matrix under normality, $(\tilde{\Psi}_N)^{-1}$. The estimates reported for the regression parameters are FGTSLS estimates as defined in (35) and (36), using the transformed set of instruments $\tilde{H}^*$. For each single coefficient and each parameter constellation, we report:
- the average bias;
- the root mean squared error (RMSE);
- the rejection rate for the test that the coefficient is equal to the true parameter value.

For the random effects models, we also show the results for the Hausman test.
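The three summary measures can be computed from the Monte Carlo draws as sketched below. The sketch assumes the rejection rate refers to a two-sided t-test at the nominal 5 percent level, based on the estimated standard errors. The function name and arguments are hypothetical.

```python
import numpy as np


def mc_summary(estimates, true_value, t_stats, crit=1.959963984540054):
    """Bias, RMSE and rejection rate of a two-sided t-test across Monte Carlo draws."""
    err = np.asarray(estimates, dtype=float) - true_value
    bias = err.mean()
    rmse = np.sqrt((err ** 2).mean())
    rejection_rate = (np.abs(np.asarray(t_stats, dtype=float)) > crit).mean()
    return bias, rmse, rejection_rate
```

The averages quoted in the discussion below are simple means of the per-parameter entries of Table 1. For example, 0.0008385 is the mean of the $\lambda_1$ and $\lambda_2$ biases (0.001082 and 0.000595) under random effects with $N = 50$.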
< Table 1 >
Table 1 reports the results of the Monte Carlo analysis for the three different sample sizes considered, under both the random and the fixed effects specification. Given that the natural habitat of GM estimation is large samples, the performance in the smallest sample with $N = 50$ is acceptable. In the random effects (fixed effects) specification, the average bias and RMSE amount to:
- 0.0008385 and 0.0246475 (0.001719 and 0.027935) for the estimates of $(\lambda_1, \lambda_2)$;
- -0.0096335 and 0.2563835 (-0.0106385 and 1.050696) for the estimates of $(\rho_1, \rho_2)$.

With average rejection rates of 0.0685 and 0.139 (0.0650 and 0.1225), the performance of the single hypothesis tests referring to $\lambda$ and $\rho$ is not too bad either. The Hausman test is oversized, with a rejection rate of 0.106.
For moderately sized samples with $N = 250$, the bias has virtually disappeared. In relative terms, it amounts to 0.01560 (0.0102) percent for the estimates of $(\lambda_1, \lambda_2)$ and to -0.1647 (-0.280) percent for the estimates of $(\rho_1, \rho_2)$ under random effects (fixed effects). Under random effects (fixed effects), the average RMSE of the estimates of $(\lambda_1, \lambda_2)$ shrinks to 0.011376 (0.011466), and that of the estimates of $(\rho_1, \rho_2)$ to 0.213485 (0.800393).

The size of the tests improves, but it approaches the nominal size of 5 percent relatively slowly. There are two reasons:
- The data for $x_1$ and $x_2$ are generated as random draws.
- The specific ‘ahead-behind’ design of the spatial weights matrices, together with the properties of the explanatory variables, results in a fairly high correlation between spatial lags of different orders.

With explanatory variables as in many empirical applications and less artificial spatial weights matrices, there will be less correlation between the spatial lags of the explanatory variables and between spatial lags of different orders. The size of tests can then be expected to approach the nominal size faster than in the chosen design. Regarding the GM estimates of $\rho$, the average size amounts to 0.139 (0.123); that for the FGTSLS estimates of $\lambda$ amounts to 0.0555 (0.0465). The performance of the Hausman test is worth mentioning: it has already approached its nominal size, with a rejection rate of 0.056.
The final two columns in Table 1 consider the case with $N = 250$ where the sum of the parameters of the spatial lag of the dependent variable is closer to 1, i.e., with $\lambda_1 = 0.6$ and $\lambda_2 = 0.35$. As can be seen from the results, the performance in terms of bias and size is comparable to that for the parameter constellation where the sum of $\lambda_1$ and $\lambda_2$ is smaller in magnitude.
Overall, the Monte Carlo experiments illustrate that the proposed estimators work reasonably well in terms of bias and RMSE, even in very small samples. Some care is warranted in interpreting the estimated variance-covariance matrix of the parameter estimates in small samples, in particular the parts relating to the disturbance process. As Table 1 shows, the tests regarding $\rho$ over-reject the null, and their rejection rates approach the nominal size only slowly, for the reasons mentioned in the previous paragraph.

It should also be emphasized that the results here are based on a correctly specified model with a high signal-to-noise ratio. A comprehensive Monte Carlo study using alternative distributional assumptions and ‘real world’ explanatory variables and weights matrices therefore remains to be done. Beyond that, an interesting extension for future research would be to explore small-sample corrections or resampling methods for the GM estimators considered in the present paper. Such methods could improve the performance in small samples or in empirical models with poor fit.
VIII. Conclusions
This paper derived a two-step estimation procedure for spatial autoregressive panel data models with spatial autoregressive disturbances of the SARAR(R,S) type. The procedure covers both random and fixed effects assumptions and allows for heteroskedasticity of arbitrary form in the idiosyncratic error terms. In the first step, the regression model is estimated by two-stage least squares (TSLS) to obtain consistent estimates of the disturbances. In the second step, these are used to obtain generalized moments (GM) estimates of the parameters of the spatial autoregressive disturbance process.
We provide a detailed study of the asymptotic properties of the proposed two-step TSLS and GM estimators of the model parameters, prove their consistency, and establish asymptotic normality. For both the original model and the spatial generalized least squares (GLS) transformed model, we derive the joint asymptotic variance-covariance matrix, which is robust to (cross-sectional interdependence and) heteroskedasticity of unknown form. This enables robust tests of the general SARAR(R,S) model against restricted alternatives such as SARAR(0,S), SARAR(R,0), or SARAR(1,1) in random and fixed effects panel data models under heteroskedasticity. We also propose a generalized Hausman-type test of the spatial random versus the spatial fixed effects model.
The framework suggested in the present paper provides a flexible tool for applied econometric research on empirical models with cross-sectional interdependence. It allows researchers to study the strength and pattern of spatial interdependence more flexibly, and under less restrictive assumptions, than existing SARAR(1,1) models assuming homoskedasticity. Allowing for alternative modes of interdependence and determining the proper pattern of the interdependence decay function is of interest in itself. It is also a prerequisite for a correct model specification and valid inference.
Table 1. Monte Carlo Results, 1000 draws
(The last two columns refer to the design with λ1 = 0.6 and λ2 = 0.35; all other parameters as stated.)
                 N = 50                N = 100               N = 250               N = 250
                 RE         FE         RE         FE         RE         FE         RE         FE
λ1 = 0.5 (0.6)
  Bias           0.001082   0.002029   -0.000113  0.000359   0.000237   -0.0000468 0.000161   0.000587
  RMSE           0.02395    0.026601   0.018157   0.016493   0.010799   0.011269   0.009209   0.009012
  Rej. Rate      0.074      0.068      0.045      0.049      0.046      0.042      0.056      0.048
λ2 = 0.25 (0.35)
  Bias           0.000595   0.001409   0.000259   0.000791   -0.000120  -0.0000301 -0.000073  -0.000654
  RMSE           0.025345   0.029269   0.019095   0.017378   0.011953   0.011663   0.010053   0.009981
  Rej. Rate      0.063      0.062      0.05       0.054      0.065      0.051      0.047      0.052
β1 = 1
  Bias           -0.000313  0.000429   -0.000564  -0.000313  0.000187   -0.000447  0.000286   0.00000197
  RMSE           0.017017   0.01953    0.013611   0.012649   0.008170   0.008024   0.008164   0.007508
  Rej. Rate      0.049      0.057      0.064      0.064      0.049      0.054      0.061      0.042
β2 = 1
  Bias           -0.000125  -0.000613  0.000103   -0.000262  0.000005   -0.0000729 -0.000561  0.000248
  RMSE           0.018706   0.019158   0.012461   0.011945   0.008016   0.007815   0.00777    0.008334
  Rej. Rate      0.057      0.087      0.054      0.047      0.047      0.053      0.048      0.052
ρ1 = 0.4
  Bias           -0.002348  0.014094   0.000859   -0.00071   0.000162   0.003757   -0.000614  0.004045
  RMSE           0.184249   0.954723   0.146618   0.670901   0.121458   0.701775   0.126152   0.747754
  Rej. Rate      0.147      0.122      0.126      0.115      0.146      0.131      0.125      0.133
ρ2 = 0.2
  Bias           -0.016919  -0.035371  -0.004592  -0.010192  -0.000656  -0.005854  -0.005464  -0.011615
  RMSE           0.328518   1.146669   0.314663   0.864678   0.305513   0.899012   0.312235   0.945605
  Rej. Rate      0.131      0.123      0.131      0.116      0.132      0.115      0.118      0.128
Hausman test
  Rej. Rate      0.106                 0.058                 0.056                 0.054
References
Amemiya, T. (1971). The estimation of the variances in a variance-components model.
International Economic Review, 12, 1-13.
Andrews, D. (1987). Asymptotic results of generalized Wald tests. Econometric Theory, 3, 348-358.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer Academic Publishers.
Anselin, L., Bera, A.K., Florax, R. and Yoon, M.J. (1996). Simple diagnostic tests for spatial
dependence. Regional Science and Urban Economics, 26, 77-104.
Arbia, G., Basile, R., and Piras, G. (2005). Using spatial panel data in modeling regional growth and convergence. ISAE Working Paper no. 55, Rome.
Arraiz, I., Drukker, D.M., Kelejian, H., and Prucha, I. (2010). A spatial Cliff-Ord-type model with heteroskedastic innovations: Small and large sample results. Journal of Regional Science, 50(2), 592-614.
Audretsch, D.B. and Feldman, M.P. (1996). R&D spillovers and the geography of innovation and production. American Economic Review, 86, 630-640.
Badinger, H. and Egger, P. (2008). GM estimation of higher-order spatial autoregressive
processes in cross-section models with heteroskedastic disturbances. CESifo Working
Paper no. 2356, Munich.
Badinger, H. and Egger, P. (2009). Horizontal versus vertical interdependence in
multinational activity. CESifo Working Paper no. 2327, Munich.
Baltagi, B.H. (2005). Econometric Analysis of Panel Data, third edition. Chichester: Wiley.
Baltagi, B.H. (2006). Random effects and spatial autocorrelation with equal weights.
Econometric Theory, 22(5), 973-984.
Baltagi, B.H., Egger, P., and Pfaffermayr, M. (2007). Estimating models of complex FDI: Are
there third-country effects? Journal of Econometrics, 140(1), 260-281.
Baltagi, B.H., Egger, P., and Pfaffermayr, M. (2009). A generalized spatial panel data model
with random effects. Working Paper No. 113, Center for Policy Research, Syracuse
University.
Baltagi, B.H. and Li, D. (2001). LM test for functional form and spatial error correlation.
International Regional Science Review, 24, 194-225.
Baltagi, B.H., Song, S.H., and Koh, W. (2003). Testing panel data regression models with
spatial error correlation. Journal of Econometrics, 117, 123-150.
Baltagi, B.H., Song, S.H., Jung, B.C., and Koh, W. (2007). Testing for serial correlation,
spatial autocorrelation and random effects using panel data. Journal of Econometrics,
140, 5-51.
Bell, K.P. and Bockstael, N.E. (2000). Applying the generalized-moments estimation
approach to spatial problems involving microlevel data. The Review of Economics and
Statistics, 82(1), 72–82.
Besley, T. and Case, A. (1995). Incumbent behavior: Vote-seeking, tax-setting, and yardstick
competition. American Economic Review, 85, 25-45.
Case, A., Hines Jr., J. and Rosen, H. (1993). Budget spillovers and fiscal policy
interdependence: Evidence from the states. Journal of Public Economics, 52, 285-307.
Cliff, A. and Ord, J. (1973). Spatial Autocorrelation. London: Pion.
Cliff, A. and Ord, J. (1981). Spatial Processes, Models and Applications. London: Pion.
Cohen, J.P. and Morrison Paul, C. (2007). The impacts of transportation infrastructure on
property values: A higher-order spatial econometrics approach. Journal of Regional
Science, 47(3), 457-478.
Cohen, J.P. and Morrison Paul, C.J. (2004). Public infrastructure investment, interstate spatial
spillovers, and manufacturing costs. The Review of Economics and Statistics, 86(2),
551-560.
Conley, T. (1999). GMM estimation with cross sectional dependence. Journal of
Econometrics, 92, 1-45.
Creel, M. (2004). Modified Hausman tests for inefficient estimators. Applied Economics,
36(21), 2373-2376.
Egger, P., Pfaffermayr, M. and Winner, H. (2005). An unbalanced spatial panel data approach
to US state tax competition. Economics Letters, 88, 329-335.
Gilbert, S. (2002). Testing the distribution of error components in panel data models.
Economics Letters, 77, 47-53.
Greene, W.H. (2003). Econometric Analysis, fifth edition. Pearson, Upper Saddle River, New
Jersey.
Holtz-Eakin, D. (1994). Public sector capital and the productivity puzzle. Review of
Economics and Statistics, 76, 12-21.
Horn, R.A. and Johnson, C.R. (1985). Matrix Analysis. Cambridge: Cambridge University
Press.
Kapoor, M., Kelejian, H.H., and Prucha, I.R. (2007). Panel data models with spatially
correlated error components. Journal of Econometrics, 140, 97-130.
Kelejian, H. and Robinson, D. (1992). Spatial autocorrelation: A new computationally simple
test with an application to per capita county police expenditures. Regional Science and
Urban Economics, 22, 317–331.
Kelejian, H.H. and Prucha, I.R. (1998). A generalized spatial two-stage least squares
procedure for estimating a spatial autoregressive model with autoregressive
disturbances. Journal of Real Estate Finance and Economics, 17, 99-121.
Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the
autoregressive parameter in a spatial model. International Economic Review, 40, 509-533.
Kelejian, H.H. and Prucha, I.R. (2004). Estimation of simultaneous systems of spatially
interrelated cross sectional equations. Journal of Econometrics, 118, 27-50.
Kelejian, H.H. and Prucha, I.R. (2007). HAC estimation in a spatial framework. Journal of
Econometrics, 140(1), 131-154.
Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive
models with autoregressive and heteroskedastic disturbances. Journal of Econometrics,
157(1), 53-67.
Kelejian, H.H., Prucha, I.R. and Yuzefovich, E. (2004). Instrumental variable estimation of a
spatial autoregressive model with autoregressive disturbances: Large and small sample
results. In: LeSage, J. and Pace, K. (eds.), Advances in Econometrics: Spatial and
Spatiotemporal Econometrics. Elsevier, New York, 163-198.
Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics,
95(2), 391–413.
Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for
spatial autoregressive models. Econometrica, 72(6), 1899-1925.
Lee, L.-F. and Liu, X. (2010). Efficient GMM estimation of high order spatial autoregressive
models. Econometric Theory, 26, 187-230.
Lee, L.-F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed
effects. Journal of Econometrics, 154(2), 165-185.
Lin, X. and Lee, L.-F. (2010). GMM estimation of spatial autoregressive models with unknown
heteroskedasticity. Journal of Econometrics, 157(1), 34-52.
Mittelhammer, R.C. (1996). Mathematical Statistics for Economics and Business. New York:
Springer.
Mundlak, Y. (1978). On Pooling Time Series and Cross Section Data, Econometrica, 46(1),
69-85.
Mutl, J. and Pfaffermayr, M. (2011). The Hausman test in a Cliff and Ord panel model.
Econometrics Journal, 14, 48-76.
Pinkse, J. and Slade, M.E. (1998). Contracting in space: An application of spatial statistics to
discrete-choice models. Journal of Econometrics, 85, 125-154.
Pinkse, J., Slade, M.E., and Brett, C. (2002). Spatial price competition: A semiparametric
approach. Econometrica, 70, 1111-1153.
Pötscher, B.M. and Prucha, I.R. (1997). Dynamic Nonlinear Econometric Models, Asymptotic
Theory. New York: Springer.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd edition. New York:
Wiley.
Resnick, S.I. (1999). A Probability Path. Boston: Birkhäuser.
Shroder, M. (1995). Games the States don’t play: Welfare benefits and the theory of fiscal
federalism. Review of Economics and Statistics, 77, 183-191.
Stock, J.H. and Watson, M.W. (2008). Heteroskedasticity-robust standard errors for fixed
effects panel data regression. Econometrica, 76(1), 155-174.
Topa, G. (2001). Social interactions, local spillovers and unemployment. Review of Economic
Studies, 68, 261-295.
Van der Vaart, H.R. and Yen, H.E. (1968). Weak sufficient conditions for Fatou’s Lemma
and Lebesgue’s Dominated Convergence Theorem. Mathematics Magazine, 41(3), 109-
117.
Vuong, Q.H. (1987). Generalized inverses and asymptotic properties of Wald test. Economics
Letters, 24, 343-347.
Weesie, J. (1999). Seemingly unrelated estimation and the cluster-adjusted sandwich
estimator. Stata Technical Bulletin, STB-52.
APPENDIX. Variance-Covariance Matrix Under Non-Normality of Error Components
As already mentioned in the main text, Theorems 4b and 4c as well as Theorem 5 also hold
under non-normality with different definitions of $\boldsymbol\Psi_N$ and $\boldsymbol\Psi_{\theta\Delta,N}$, respectively. In the
following, we provide the definitions of the respective elements under non-normality and
define consistent estimates for them.
1.1 Distribution of GM Estimates under Non-Normality (Definition of $\boldsymbol\Psi_N$)
If we drop the assumption that $\boldsymbol\mu_N$ and $\mathbf v_N$ are normally distributed, equation (28b) becomes
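$$
\begin{aligned}
\psi_{cc',N}^{ss',tt'} &= N\,\mathrm{Cov}\big(q_{c,N}^{ss'},\,q_{c',N}^{tt'}\big)\\
&= N^{-1}\mathrm{Cov}\big[\boldsymbol\xi_N'\mathbf A_{c,N}^{ss'}\boldsymbol\xi_N+\mathbf a_{c,N}^{ss'\prime}\boldsymbol\xi_N,\ \boldsymbol\xi_N'\mathbf A_{c',N}^{tt'}\boldsymbol\xi_N+\mathbf a_{c',N}^{tt'\prime}\boldsymbol\xi_N\big]\\
&= N^{-1}\mathrm{Cov}\big[\boldsymbol\xi_N'\mathbf A_{c,N}^{ss'}\boldsymbol\xi_N+\mathbf a_{v,c,N}^{ss'\prime}\mathbf v_N+\mathbf a_{\mu,c,N}^{ss'\prime}\boldsymbol\mu_N,\ \boldsymbol\xi_N'\mathbf A_{c',N}^{tt'}\boldsymbol\xi_N+\mathbf a_{v,c',N}^{tt'\prime}\mathbf v_N+\mathbf a_{\mu,c',N}^{tt'\prime}\boldsymbol\mu_N\big]\\
&= 2N^{-1}\mathrm{Tr}\big(\mathbf A_{v,c,N}^{ss'}\boldsymbol\Sigma_N\mathbf A_{v,c',N}^{tt'}\boldsymbol\Sigma_N+2\sigma_\mu^2\mathbf A_{v\mu,c,N}^{ss'}\mathbf A_{\mu v,c',N}^{tt'}\boldsymbol\Sigma_N+\sigma_\mu^4\mathbf A_{\mu,c,N}^{ss'}\mathbf A_{\mu,c',N}^{tt'}\big)\\
&\quad+N^{-1}\big(\mathbf a_{v,c,N}^{ss'\prime}\boldsymbol\Sigma_N\mathbf a_{v,c',N}^{tt'}+\sigma_\mu^2\,\mathbf a_{\mu,c,N}^{ss'\prime}\mathbf a_{\mu,c',N}^{tt'}\big)\\
&\quad+N^{-1}\sum_{n=1}^{NT}a_{v,c,nn,N}^{ss'}a_{v,c',nn,N}^{tt'}\big(\sigma_{v,n,N}^{(4)}-3\sigma_{v,n,N}^4\big)+N^{-1}\sum_{i=1}^{N}a_{\mu,c,ii,N}^{ss'}a_{\mu,c',ii,N}^{tt'}\big(\sigma_\mu^{(4)}-3\sigma_\mu^4\big)\\
&\quad+N^{-1}\sum_{n=1}^{NT}\sigma_{v,n,N}^{(3)}\big(a_{v,c,nn,N}^{ss'}a_{v,c',n,N}^{tt'}+a_{v,c',nn,N}^{tt'}a_{v,c,n,N}^{ss'}\big)+N^{-1}\sum_{i=1}^{N}\sigma_\mu^{(3)}\big(a_{\mu,c,ii,N}^{ss'}a_{\mu,c',i,N}^{tt'}+a_{\mu,c',ii,N}^{tt'}a_{\mu,c,i,N}^{ss'}\big).
\end{aligned}
\tag{A.1a}
$$
Here $a_{v,c,nn,N}^{ss'}$ ($a_{\mu,c,ii,N}^{ss'}$) denotes a main diagonal element of $\mathbf A_{v,c,N}^{ss'}$ ($\mathbf A_{\mu,c,N}^{ss'}$), and $a_{v,c,n,N}^{ss'}$ ($a_{\mu,c,i,N}^{ss'}$) denotes an element of the vector $\mathbf a_{v,c,N}^{ss'}$ ($\mathbf a_{\mu,c,N}^{ss'}$).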
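Adopting the notational convention introduced in section V, subsection 1.2, (A.1a) can be
written as
$$
\begin{aligned}
\psi_{cc',N}^{ss',tt'} &= \mathbf E^{*}_{v,N}+\mathbf E^{*}_{\mu,N}+\mathbf E^{*}_{v\mu,N}+\mathbf E^{**}_{v,N}+\mathbf E^{**}_{\mu,N}\\
&\quad+\mathbf E^{***}_{v,N}+\mathbf E^{***}_{\mu,N}+\mathbf E^{****}_{v,N}+\mathbf E^{****}_{\mu,N},
\end{aligned}
\tag{A.1b}
$$
where the terms in the first row are defined as in section V, subsection 1.2, and the additional terms, appearing in the second row of (A.1b), are defined as: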
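$$\mathbf E^{***}_{v,N}=N^{-1}\sum_{n=1}^{NT}c_{v,n,N}\big(\sigma^{(4)}_{v,n,N}-3\sigma^{4}_{v,n,N}\big), \tag{A.2a}$$
$$\mathbf E^{***}_{\mu,N}=N^{-1}\sum_{i=1}^{N}c_{\mu,i,N}\big(\sigma^{(4)}_{\mu}-3\sigma^{4}_{\mu}\big), \tag{A.2b}$$
$$\mathbf E^{****}_{v,N}=N^{-1}\sum_{n=1}^{NT}d_{v,n,N}\,\sigma^{(3)}_{v,n,N}, \tag{A.2c}$$
$$\mathbf E^{****}_{\mu,N}=N^{-1}\sum_{i=1}^{N}d_{\mu,i,N}\,\sigma^{(3)}_{\mu}, \tag{A.2d}$$
where $\sigma^{(3)}_\mu$ and $\sigma^{(4)}_\mu$ ($\sigma^{(3)}_{v,n,N}$ and $\sigma^{(4)}_{v,n,N}$) denote the third and fourth moments of $\boldsymbol\mu_N$ ($\mathbf v_N$),
respectively.22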
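Equation (A.1a) rests on the covariance of two linear-quadratic forms in independent, zero-mean random variables with symmetric matrices: a trace term, a linear term, a fourth-moment correction $(\sigma^{(4)}-3\sigma^4)$ attached to the main diagonal elements, and a third-moment cross term. The Python sketch below checks that formula exactly by enumerating all outcomes of a small discrete example instead of simulating. The generic vector `x` stands in for $\boldsymbol\xi_N$; the two-point distribution, the dimension, and the function names are illustrative assumptions.

```python
import itertools
import numpy as np

def lq_cov_formula(A, a, B, b, sigma2, m3, m4):
    """Cov(x'Ax + a'x, x'Bx + b'x) for independent zero-mean x_i with
    symmetric A, B, variances sigma2, third moments m3, fourth moments m4."""
    S = np.diag(sigma2)
    dA, dB = np.diag(A), np.diag(B)
    trace_term = 2.0 * np.trace(A @ S @ B @ S)
    linear_term = a @ S @ b
    fourth_term = np.sum(dA * dB * (m4 - 3.0 * sigma2 ** 2))
    third_term = np.sum(m3 * (dA * b + dB * a))
    return trace_term + linear_term + fourth_term + third_term

def lq_cov_exact(A, a, B, b, support, probs):
    """Exact covariance by enumerating every outcome of independent discrete
    x_i (all x_i are assumed to have supports of equal size)."""
    n = A.shape[0]
    e1 = e2 = e12 = 0.0
    for idx in itertools.product(range(len(probs[0])), repeat=n):
        x = np.array([support[i][k] for i, k in enumerate(idx)])
        p = np.prod([probs[i][k] for i, k in enumerate(idx)])
        q1 = x @ A @ x + a @ x
        q2 = x @ B @ x + b @ x
        e1 += p * q1
        e2 += p * q2
        e12 += p * q1 * q2
    return e12 - e1 * e2

# Hypothetical skewed, heteroskedastic example: x_i takes -s_i w.p. 2/3 and
# 2 s_i w.p. 1/3, so E x_i = 0, var = 2 s^2, third = 2 s^3, fourth = 6 s^4.
scales = np.array([1.0, 0.5, 2.0])
support = [[-s, 2.0 * s] for s in scales]
probs = [[2.0 / 3.0, 1.0 / 3.0]] * len(scales)
sigma2, m3, m4 = 2 * scales ** 2, 2 * scales ** 3, 6 * scales ** 4

rng = np.random.default_rng(0)
M1, M2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
A, B = (M1 + M1.T) / 2, (M2 + M2.T) / 2          # symmetric, as in (A.1a)
a, b = rng.normal(size=3), rng.normal(size=3)

print(lq_cov_formula(A, a, B, b, sigma2, m3, m4),
      lq_cov_exact(A, a, B, b, support, probs))
```

The two printed numbers agree up to floating-point rounding. Under normality the third- and fourth-moment terms vanish ($\sigma^{(3)}=0$, $\sigma^{(4)}=3\sigma^4$), which is why only the terms in the first row of (A.1b) remain in that case.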
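As shown in Lemma C.4 of Appendix C, the third and fourth moments of $\mu_{i,N}$, denoted as
$\sigma^{(3)}_\mu$ and $\sigma^{(4)}_\mu$, can be estimated consistently using
$$\tilde\sigma^{(3)}_\mu=\frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1,s\neq t}^{T}\tilde\varepsilon^{2}_{is,N}\,\tilde\varepsilon_{it,N},\quad\text{and} \tag{A.3a}$$
$$
\begin{aligned}
\tilde\sigma^{(4)}_\mu&=\frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1,s\neq t}^{T}\tilde\varepsilon^{3}_{is,N}\,\tilde\varepsilon_{it,N}\\
&\quad-\frac{3}{NT(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1,s\neq t}^{T}\tilde\varepsilon_{is,N}\tilde\varepsilon_{it,N}\Bigg(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde\varepsilon^{2}_{it,N}-\frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{t=1}^{T}\sum_{s=1,s\neq t}^{T}\tilde\varepsilon_{is,N}\tilde\varepsilon_{it,N}\Bigg),
\end{aligned}
\tag{A.3b}
$$
where $\tilde{\boldsymbol\varepsilon}_N=\big[\mathbf I_T\otimes\big(\mathbf I_N-\sum_{m=1}^{S}\tilde\rho_{m,N}\mathbf M_{m,N}\big)\big]\tilde{\mathbf u}_N$.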
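The estimators (A.3a) and (A.3b) only require the residuals $\tilde\varepsilon_{it,N}$ and sums over pairs of distinct periods $s\neq t$. A minimal NumPy sketch follows. It assumes the residuals are arranged as an $N\times T$ array (a vector stacked as in the paper, $n=(t-1)N+i$, can be reshaped with `eps_vec.reshape(T, N).T`); the function name `mu_moments` and the simulated design are hypothetical. The double sums over $s\neq t$ are computed from per-unit totals rather than explicit loops, and the bracketed term in (A.3b) is the average idiosyncratic variance (total second moment minus the estimate of $\sigma^2_\mu$).

```python
import numpy as np

def mu_moments(eps):
    """Return (sigma2_mu, third, fourth) estimates in the spirit of (A.3a)-(A.3b).

    eps: N x T array with eps[i, t] = mu_i + v_it (residuals after the
    spatial Cochrane-Orcutt transformation).
    """
    N, T = eps.shape
    pairs = N * T * (T - 1)                     # number of (i, s, t) with s != t
    row_sum = eps.sum(axis=1)                   # sum_t eps_it
    row_sq = (eps ** 2).sum(axis=1)             # sum_t eps_it^2
    row_cube = (eps ** 3).sum(axis=1)
    row_four = (eps ** 4).sum(axis=1)
    # sum_t sum_{s != t} eps_is * eps_it = (sum_t eps_it)^2 - sum_t eps_it^2
    cross11 = (row_sum ** 2 - row_sq).sum()
    # sum_t sum_{s != t} eps_is^2 * eps_it = sum_s eps_is^2 * (row_sum - eps_is)
    cross21 = (row_sq * row_sum - row_cube).sum()
    # sum_t sum_{s != t} eps_is^3 * eps_it = sum_s eps_is^3 * (row_sum - eps_is)
    cross31 = (row_cube * row_sum - row_four).sum()

    sigma2_mu = cross11 / pairs
    third = cross21 / pairs                                  # (A.3a)
    sigma2_v_bar = row_sq.sum() / (N * T) - sigma2_mu        # bracket in (A.3b)
    fourth = cross31 / pairs - 3.0 * sigma2_mu * sigma2_v_bar  # (A.3b)
    return sigma2_mu, third, fourth

# Hypothetical check: mu_i = -1 w.p. 2/3 and 2 w.p. 1/3 (variance 2, third
# moment 2, fourth moment 6); v_it normal, heteroskedastic over i.
rng = np.random.default_rng(123)
N, T = 200_000, 4
mu = rng.choice([-1.0, 2.0], size=N, p=[2 / 3, 1 / 3])
sd = rng.uniform(0.5, 1.5, size=N)
eps = mu[:, None] + rng.normal(size=(N, T)) * sd[:, None]
est = mu_moments(eps)
print(est)   # should be close to (2, 2, 6)
```

The identity $E(\varepsilon_{is}^3\varepsilon_{it})=\sigma^{(4)}_\mu+3\sigma^2_\mu\sigma^2_{v,is}$ for $s\neq t$ is what motivates the subtraction in (A.3b); the cross-period products remove the idiosyncratic component without any distributional assumption on $v_{it,N}$.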
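Hence, consistent estimators of the expressions in (A.2b) and (A.2d), associated (only) with the
homoskedastic, time-invariant error component $\mu_{i,N}$, are given by
$$\tilde{\mathbf E}^{***}_{\mu,N}=N^{-1}\sum_{i=1}^{N}c_{\mu,i,N}\big(\tilde\sigma^{(4)}_\mu-3\tilde\sigma^{4}_\mu\big), \tag{A.4a}$$
$$\tilde{\mathbf E}^{****}_{\mu,N}=N^{-1}\sum_{i=1}^{N}\tilde d_{\mu,i,N}\,\tilde\sigma^{(3)}_\mu\quad\text{with}\quad\tilde d_{\mu,i,N}=a_{\mu,ii,N}\tilde b_{\mu,i,N}+b_{\mu,ii,N}\tilde a_{\mu,i,N}. \tag{A.4b}$$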
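Next turn to $\mathbf E^{***}_{v,N}$, which we rewrite as
$$\mathbf E^{***}_{v,N}=\mathbf E^{***}_{1,v,N}-\mathbf E^{***}_{2,v,N}\quad\text{with}\quad\mathbf E^{***}_{1,v,N}=N^{-1}\sum_{n=1}^{NT}c_{v,n,N}\,\sigma^{(4)}_{v,n,N}\quad\text{and}\quad\mathbf E^{***}_{2,v,N}=3N^{-1}\sum_{n=1}^{NT}c_{v,n,N}\,\sigma^{4}_{v,n,N}. \tag{A.5a}$$
We first consider $\mathbf E^{***}_{1,v,N}$ and note that the elements $c_{v,n,N}$ are time-invariant. By Lemma C.6a,
a consistent estimator of $\mathbf E^{***}_{1,v,N}$ is given by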
22
For the elements where both $c=2$ (or $c=4$) and $c'=2$ (or $c'=4$), the terms involving
the third and fourth moments drop out.
N
i
T
t
T
trr
NirNitNi
N
i
T
t
NitNiN vT
vT
cNmk
mkvc
NTmk
k
1 1 1
2
,
2
,,,
11
01
1 1
4
,,,
11
0***
,1~
1
1~11
1
~1
1
~vvvE , (A.5a)
where
932 23
3
0
TTT
Tm ,
932
32231
TTT
Tm ,
)364(
)1(23
2
0
TTT
TTk ,
)364(
)96)(1(231
TTT
TTk .
Next consider $\mathbf E^{***}_{2,v,N}$, which involves a weighted sum of the squared variances. Without
distributional assumptions and with unknown heteroskedasticity over both cross-sections and time,
it is not possible to obtain an estimate of (a weighted sum of) the squared variances. (Using
the fourth power of the residuals estimates a weighted sum of the fourth moments.) Hence, an
approximation is required, assuming that the idiosyncratic error components are
heteroskedastic only over cross-sections, but not over time, i.e., $v_{it,N}\sim\text{i.d.}(0,\sigma^2_{i,N})$. Under that
assumption, the following expression consistently estimates $\mathbf E^{***}_{2,v,N}$, as shown in Lemma C.6b
in Appendix C:
***
,2
~NvE
N
i
T
t
NitNi
N
i
T
t
T
trr
NirNitNi vcNTkm
kmv
Tv
Tc
Nkm
m
1 1
4
,,,
11
01
1 1 1
2
,
2
,,,
11
0 ~1
1
~
1
1~11
1vv
, (A.5b)
where $m_0$, $m_1$, $k_0$, and $k_1$ are defined as above.
Finally, a consistent estimate of $\mathbf E^{****}_{v,N}$ is given by
NT
n
HR
NnNnN vdTN
T
1
3
,,,
****
, )~(~
)1(
~vvE with )
~~(~
,,,,,,,,,, NnNnnNnnNnNn babad vvvvv , (A.6)
where ]~
)1(
1~[33
)1()~(
1
3
,3
3
,2
3
,
T
r
NirNit
HR
Nit vT
vTT
TTv . The consistency of
$\tilde{\mathbf E}^{****}_{v,N}$ follows from Lemma C.7 and Remark C.3 thereafter in Appendix C.
1.2 Joint Distribution of Regression Parameters and GM Estimates under Non-Normality (Definition of $\boldsymbol\Psi_{\theta\Delta,N}$)
Under non-normality, equation (32c) becomes augmented by terms involving the third
moments of the error components as follows:
Nssc ),,,(,., Δθψ μ
Δθ
v
Δθ ψψ NsscNssc ),,,(,.,),,,(,., , 4,...,1c , Sss ,...,1, , with (A.7a)
v
Δθψ Nssc ),,,(,., )]([11 ,
,,
)3(
,2/1 ,,,
ss
NcNNN ssNcTN
vAv aΣκΣF
v
, and
μ
Δθψ Nssc ),,,(,., )([11 ,
,,
2)3(
,2/1 ,,,
ss
NcN ssNcTN
μAμ aκF
μ ,
where $\boldsymbol\Sigma^{(3)}_N=\mathrm{diag}(\sigma^{(3)}_{v,n})$ is an $NT\times NT$ diagonal matrix with third moments $\sigma^{(3)}_{v,n}$,
$n=1,\dots,NT$, $\boldsymbol\kappa_{\mathbf A^{ss'}_{v,c,N}}$ is an $NT\times1$ vector with the main diagonal elements of $\mathbf A^{ss'}_{v,c,N}$, and
$\boldsymbol\kappa_{\mathbf A^{ss'}_{\mu,c,N}}$ is an $N\times1$ vector with the main diagonal elements of $\mathbf A^{ss'}_{\mu,c,N}$.
In light of (32c) and the results of section 1.2, the assumptions maintained in Theorem 4b are
sufficient to prove that the following expressions consistently estimate the elements of $\boldsymbol\Psi_{\theta\Delta,N}$:
Nssc ),,,(,.,~
Δθψ μ
Δθ
v
Δθ ψψ NsscNssc ),,,(,.,),,,(,.,~~
, 4,...,1c , Sss ,...,1, , with (A.7b)
v
Δθψ Nssc ),,,(,., )]~~~([
)1(
1 ,
,,
)3(,
,
2/1
,,,
ss
Nc
HR
N
HR
NN ssNcT
T
N
vAv aΣκΣF
v
, and
μ
Δθψ Nssc ),,,(,.,~
)~~~([11 ,
,,
2)3(
,2/1 ,,,
ss
NcN ssNcTN
μAμ aκF
μ .
TECHNICAL APPENDIX
APPENDIX A
Notation
We adopt the standard convention to refer to matrices and vectors with symbols in boldface.
Let $\mathbf A_N$ denote some matrix. Its elements are referred to as $a_{ij,N}$; $\mathbf a_{i.,N}$ and $\mathbf a_{.i,N}$ denote the $i$-th row and the $i$-th column of $\mathbf A_N$, respectively. If $\mathbf A_N$ is a square matrix, $\mathbf A_N^{-1}$ denotes its
inverse; if $\mathbf A_N$ is singular, $\mathbf A_N^{+}$ denotes its generalized inverse. The (submultiplicative)
matrix norm is defined as $\|\mathbf A_N\|=[\mathrm{Tr}(\mathbf A_N'\mathbf A_N)]^{1/2}$. In several places, we use single
indexation, e.g., $n=1,\dots,NT$, to denote elements of the vectors or matrices that are stacked
over time periods.23
Remark A.1
i) Definition of row and column sum boundedness (Kapoor, Kelejian, and Prucha, 2007, p.
99): Let $\mathbf A_N$, $N\ge1$, be some sequence of $NT\times NT$ matrices with $T$ some fixed positive
integer. We will then say that the row and column sums of the (sequence of) matrices $\mathbf A_N$ are
bounded uniformly in absolute value if there exists a constant $c$, which does not depend
on $N$, such that
$$\max_{1\le n\le NT}\sum_{j=1}^{NT}|a_{nj,N}|\le c\quad\text{and}\quad\max_{1\le j\le NT}\sum_{n=1}^{NT}|a_{nj,N}|\le c\quad\text{for all }N\ge1.$$
ii) Let $\mathbf A_N$ be a (sequence of) $N\times N$ matrices whose row and column sums are bounded
uniformly in absolute value, and let $\mathbf S$ be some $T\times T$ matrix (with $T\ge1$ fixed). Then the
row and column sums of the matrix $\mathbf S\otimes\mathbf A_N$ are bounded uniformly in absolute value
(compare Kapoor, Kelejian, and Prucha, 2007, p. 118).
iii) If $\mathbf A_N$ and $\mathbf B_N$ are (sequences of) $NT\times NT$ matrices (with $T\ge1$ fixed), whose row and
column sums are bounded uniformly in absolute value (by $c_A$ and $c_B$), then so are the row
and column sums of $\mathbf A_N\mathbf B_N$ and $\mathbf A_N+\mathbf B_N$ (by $c_Ac_B$ and $c_A+c_B$). If $\mathbf Z_N$ is a (sequence of)
$NT\times P$ matrices whose elements are bounded uniformly in absolute value, then so are the
elements of $\mathbf A_N\mathbf Z_N$ and $(NT)^{-1}\mathbf Z_N'\mathbf A_N\mathbf Z_N$. Of course, this also covers the case $(NT)^{-1}\mathbf Z_N'\mathbf Z_N$
for $\mathbf A_N=\mathbf I_{NT}$ (compare Kapoor, Kelejian, and Prucha, 2007, p. 119).
23
Take the vector $\mathbf v_N=(v_{1,N},\dots,v_{NT,N})'$, for example. Using indexation $n=1,\dots,NT$, the
elements $v_{n,N}$, $n=1,\dots,N$, refer to period $t=1$, elements $v_{n,N}$, $n=N+1,\dots,2N$, refer to $t=2$,
etc., and elements $v_{n,N}$, $n=(T-1)N+1,\dots,NT$, refer to period $t=T$.
iv) Suppose that the row and column sums of the $NT\times NT$ matrices $\mathbf A_N=(a_{ij,N})$ are
bounded uniformly in absolute value by some finite constant $c_A$; then $\sum_{n=1}^{NT}|a_{nj,N}|^q\le c_A^q$ for
$q\ge1$ (see Kelejian and Prucha, 2009, Remark C.1).
v) Let $\boldsymbol\xi_N$ and $\boldsymbol\eta_N$ be $NT\times1$ random vectors (with $T\ge1$ fixed), where, for each $N$, the
elements are independently distributed with zero mean and finite variances. Then the elements
of $(NT)^{-1/2}\mathbf Z_N'\boldsymbol\xi_N$ are $O_p(1)$ and $(NT)^{-1}\boldsymbol\xi_N'\mathbf A_N\boldsymbol\eta_N$ is $O_p(1)$.24
vi) Let $\boldsymbol\zeta_N$ be an $NT\times1$ random vector (with $T\ge1$ fixed), where, for each $N$, the elements are
distributed with zero mean and finite fourth moments. Let $\boldsymbol\pi_N$ be some nonstochastic $NT\times1$
vector whose elements are bounded uniformly in absolute value, and let $\boldsymbol\Pi_N$ be an $NT\times NT$
nonstochastic matrix whose row and column sums are bounded uniformly in absolute value.
Define the column vector $\mathbf d_N=\boldsymbol\pi_N+\boldsymbol\Pi_N\boldsymbol\zeta_N$. It follows that the elements of $\mathbf d_N$ have finite
fourth moments.25
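Row and column sum boundedness is easy to check numerically for a given matrix. The sketch below computes the smallest constant $c$ in Remark A.1(i) and illustrates parts (ii) to (iv) for a hypothetical ring-shaped spatial weights matrix $\mathbf W$ (two neighbours with weight 1/2 each) and $(\mathbf I_N-\rho\mathbf W)^{-1}$ with $\rho=0.4$. Because this $\mathbf W$ is row- and column-stochastic, the bound for the inverse equals $1/(1-\rho)$ for every $N$, which is the kind of uniformity in $N$ the remark refers to. The weights matrix, $\rho$, and all names are illustrative assumptions.

```python
import numpy as np

def row_col_bound(A):
    """Smallest c with max_n sum_j |a_nj| <= c and max_j sum_n |a_nj| <= c."""
    absA = np.abs(A)
    return max(absA.sum(axis=1).max(), absA.sum(axis=0).max())

def ring_weights(N):
    """Hypothetical N x N weights: each unit's two neighbours on a circle,
    weight 1/2 each, so every row and column sums to one."""
    W = np.zeros((N, N))
    idx = np.arange(N)
    W[idx, (idx - 1) % N] = 0.5
    W[idx, (idx + 1) % N] = 0.5
    return W

rho = 0.4
bounds = {}
for N in (10, 100, 500):
    W = ring_weights(N)
    bounds[N] = row_col_bound(np.linalg.inv(np.eye(N) - rho * W))
print(bounds)   # the same value, 1 / (1 - rho), for every N

# Parts (ii)-(iv) for fixed N and T
N, T = 100, 3
W = ring_weights(N)
A = np.linalg.inv(np.eye(N) - rho * W)        # c_A = 1 / (1 - rho)
B = W - 0.3 * np.eye(N)                       # c_B = 1.3
S = np.array([[1.0, -0.5, 0.2],
              [0.3, 2.0, 0.0],
              [0.1, 0.1, 0.1]])               # arbitrary fixed T x T matrix
cA, cB, cS = row_col_bound(A), row_col_bound(B), row_col_bound(S)
tol = 1e-10
checks = {
    "iii: c(AB) <= cA * cB": row_col_bound(A @ B) <= cA * cB + tol,
    "iii: c(A + B) <= cA + cB": row_col_bound(A + B) <= cA + cB + tol,
    "ii: c(S kron A) <= cS * cA": row_col_bound(np.kron(S, A)) <= cS * cA + tol,
    "iv: max_j sum_n |a_nj|^3 <= cA^3": (np.abs(A) ** 3).sum(axis=0).max() <= cA ** 3 + tol,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Part (iv) holds because every $|a_{nj,N}|$ is itself bounded by a column sum, so $\sum_n|a_{nj,N}|^q\le c_A^{q-1}\sum_n|a_{nj,N}|\le c_A^q$.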
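Remark A.2
The matrices $\mathbf Q_{0,N}$ and $\mathbf Q_{1,N}$ have the following properties (see Kapoor, Kelejian, and Prucha,
2007, p. 101):
$$
\begin{gathered}
\mathrm{tr}(\mathbf Q_{0,N})=N(T-1),\quad\mathrm{tr}(\mathbf Q_{1,N})=N,\quad\mathbf Q_{0,N}(\mathbf e_T\otimes\mathbf I_N)=\mathbf 0,\quad\mathbf Q_{1,N}(\mathbf e_T\otimes\mathbf I_N)=\mathbf e_T\otimes\mathbf I_N,\\
\mathbf Q_{0,N}\boldsymbol\varepsilon_N=\mathbf Q_{0,N}\mathbf v_N,\quad\mathbf Q_{1,N}\boldsymbol\varepsilon_N=(\mathbf e_T\otimes\mathbf I_N)\boldsymbol\mu_N+\mathbf Q_{1,N}\mathbf v_N,\quad(\mathbf I_T\otimes\mathbf D_N)\mathbf Q_{0,N}=\mathbf Q_{0,N}(\mathbf I_T\otimes\mathbf D_N),\\
(\mathbf I_T\otimes\mathbf D_N)\mathbf Q_{1,N}=\mathbf Q_{1,N}(\mathbf I_T\otimes\mathbf D_N),\quad\mathrm{tr}[(\mathbf I_T\otimes\mathbf D_N)\mathbf Q_{0,N}]=(T-1)\,\mathrm{tr}(\mathbf D_N),\quad\mathrm{tr}[(\mathbf I_T\otimes\mathbf D_N)\mathbf Q_{1,N}]=\mathrm{tr}(\mathbf D_N),
\end{gathered}
$$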
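24
Kelejian and Prucha (2004) consider the case $T=1$ and identically distributed elements of
$\boldsymbol\xi_N$ and $\boldsymbol\eta_N$. Results hold up for $T>1$ (fixed) and under heteroskedasticity, as long as the
variances of the elements of $\boldsymbol\xi_N$ and $\boldsymbol\eta_N$ are bounded uniformly in absolute value.
25
Kelejian and Prucha (2009, Lemma C.2) give a proof for $T=1$ and independent elements
of $\boldsymbol\zeta_N$. The extension to $T>1$ (fixed) is obvious. Independence of the elements of $\boldsymbol\zeta_N$ is not
required for the result to hold. The fourth moments of the elements of $\mathbf d_N=\boldsymbol\pi_N+\boldsymbol\Pi_N\boldsymbol\zeta_N$ are
given by
$$
\begin{aligned}
E|d_{i,N}|^4&=E\Big|\pi_{i,N}+\sum_{j=1}^{NT}\pi_{ij,N}\zeta_{j,N}\Big|^4\le2^{4-1}\Big[|\pi_{i,N}|^4+E\Big|\sum_{j=1}^{NT}\pi_{ij,N}\zeta_{j,N}\Big|^4\Big]\\
&\le2^{4-1}\Big[|\pi_{i,N}|^4+\sum_{j=1}^{NT}\sum_{k=1}^{NT}\sum_{l=1}^{NT}\sum_{m=1}^{NT}|\pi_{ij,N}\pi_{ik,N}\pi_{il,N}\pi_{im,N}|\,E|\zeta_{j,N}\zeta_{k,N}\zeta_{l,N}\zeta_{m,N}|\Big]\le K,
\end{aligned}
$$
by Hölder's inequality, as long as the fourth moments of the elements of $\boldsymbol\zeta_N$ are bounded uniformly.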
where $\mathbf D_N$ is an arbitrary $N\times N$ matrix. Obviously, the row and column sums of $\mathbf Q_{0,N}$ and
$\mathbf Q_{1,N}$ are bounded uniformly in absolute value.
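The properties listed in Remark A.2 can be verified numerically. The sketch below assumes the Kapoor, Kelejian and Prucha (2007) definitions $\mathbf Q_{0,N}=(\mathbf I_T-\mathbf J_T/T)\otimes\mathbf I_N$ and $\mathbf Q_{1,N}=(\mathbf J_T/T)\otimes\mathbf I_N$ with $\mathbf J_T=\mathbf e_T\mathbf e_T'$, which are not restated in this appendix, together with time-major stacking and $\boldsymbol\varepsilon_N=(\mathbf e_T\otimes\mathbf I_N)\boldsymbol\mu_N+\mathbf v_N$. The random inputs for $\mathbf D_N$, $\boldsymbol\mu_N$ and $\mathbf v_N$ are illustrative.

```python
import numpy as np

def q_matrices(N, T):
    """Within (Q0) and between (Q1) transformation matrices, assuming the
    Kapoor-Kelejian-Prucha (2007) definitions with time-major stacking."""
    J_bar = np.ones((T, T)) / T
    Q0 = np.kron(np.eye(T) - J_bar, np.eye(N))
    Q1 = np.kron(J_bar, np.eye(N))
    return Q0, Q1

N, T = 5, 4
rng = np.random.default_rng(0)
Q0, Q1 = q_matrices(N, T)
eT_IN = np.kron(np.ones((T, 1)), np.eye(N))   # e_T kron I_N, an NT x N matrix
D = rng.normal(size=(N, N))                   # arbitrary N x N matrix D_N
IT_D = np.kron(np.eye(T), D)                  # I_T kron D_N
mu = rng.normal(size=N)                       # individual effects
v = rng.normal(size=N * T)                    # idiosyncratic errors
eps = eT_IN @ mu + v                          # error components

checks = {
    "tr(Q0) = N(T-1)": np.isclose(np.trace(Q0), N * (T - 1)),
    "tr(Q1) = N": np.isclose(np.trace(Q1), N),
    "Q0 (e_T x I_N) = 0": np.allclose(Q0 @ eT_IN, 0.0),
    "Q1 (e_T x I_N) = e_T x I_N": np.allclose(Q1 @ eT_IN, eT_IN),
    "Q0 eps = Q0 v": np.allclose(Q0 @ eps, Q0 @ v),
    "Q1 eps = (e_T x I_N) mu + Q1 v": np.allclose(Q1 @ eps, eT_IN @ mu + Q1 @ v),
    "Q0 commutes with I_T x D": np.allclose(Q0 @ IT_D, IT_D @ Q0),
    "Q1 commutes with I_T x D": np.allclose(Q1 @ IT_D, IT_D @ Q1),
    "tr[(I_T x D) Q0] = (T-1) tr(D)": np.isclose(np.trace(IT_D @ Q0), (T - 1) * np.trace(D)),
    "tr[(I_T x D) Q1] = tr(D)": np.isclose(np.trace(IT_D @ Q1), np.trace(D)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All of these follow from the Kronecker identity $(\mathbf A\otimes\mathbf B)(\mathbf C\otimes\mathbf D)=\mathbf A\mathbf C\otimes\mathbf B\mathbf D$ together with $\mathrm{tr}(\mathbf I_T-\mathbf J_T/T)=T-1$ and $\mathrm{tr}(\mathbf J_T/T)=1$.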
APPENDIX B
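Lemma B.1 26
Let $\mathbf A_N$ be some nonstochastic $NT\times NT$ matrix (with $T$ fixed), whose row and column sums
are bounded uniformly in absolute value. Let $\mathbf u_N$ be defined by (2c) and $\tilde{\mathbf u}_N$ be a predictor
for $\mathbf u_N$. Suppose that Assumptions 1 to 4 hold. Then
(a) $N^{-1}E\,\mathbf u_N'\mathbf A_N\mathbf u_N=O(1)$, $\mathrm{Var}(N^{-1}\mathbf u_N'\mathbf A_N\mathbf u_N)=o(1)$, and $N^{-1}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N-N^{-1}E(\mathbf u_N'\mathbf A_N\mathbf u_N)=o_p(1)$.
(b) $N^{-1}E|\mathbf d_{.j,N}'\mathbf A_N\mathbf u_N|=O(1)$, $j=1,\dots,P$, where $\mathbf d_{.j,N}$ is the $j$-th column of the $NT\times P$ matrix
$\mathbf D_N$, and $N^{-1}\mathbf D_N'\mathbf A_N\tilde{\mathbf u}_N-N^{-1}E(\mathbf D_N'\mathbf A_N\mathbf u_N)=o_p(1)$.
(c) If furthermore Assumption 6 holds, then
$N^{-1/2}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N=N^{-1/2}\mathbf u_N'\mathbf A_N\mathbf u_N+\boldsymbol\alpha_N'N^{1/2}\boldsymbol\Delta_N+o_p(1)$ with $\boldsymbol\alpha_N=N^{-1}E[\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N]$.
In light of (b), we have $\boldsymbol\alpha_N=O(1)$ and $N^{-1}\mathbf D_N'(\mathbf A_N+\mathbf A_N')\tilde{\mathbf u}_N-\boldsymbol\alpha_N=o_p(1)$.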
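Proof of part (a)
Let
$$\phi_N=N^{-1}\mathbf u_N'\mathbf A_N\mathbf u_N\quad\text{and}\quad\tilde\phi_N=N^{-1}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N. \tag{B.1}$$
Given (4a), we have $\phi_N=N^{-1}\boldsymbol\varepsilon_N'\mathbf S_N\boldsymbol\varepsilon_N$, with the symmetric $NT\times NT$ matrix $\mathbf S_N$ defined as
$$\mathbf S_N=\tfrac12\Big[\mathbf I_T\otimes\Big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\Big)^{-1}\Big]'(\mathbf A_N+\mathbf A_N')\Big[\mathbf I_T\otimes\Big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\Big)^{-1}\Big]. \tag{B.2}$$
By Assumptions 1-3 and Remark A.1 in Appendix A, the row and column sums of the
matrices $\mathbf S_N$ are bounded uniformly in absolute value. Let $\boldsymbol\Omega_{\varepsilon,N}=\sigma^2_\mu(\mathbf J_T\otimes\mathbf I_N)+\boldsymbol\Sigma_N$; then,
given Assumption 2, the row and column sums of the matrices $\mathbf S_N\boldsymbol\Omega_{\varepsilon,N}\mathbf S_N\boldsymbol\Omega_{\varepsilon,N}$ are bounded
uniformly in absolute value.
In the following let $K$ be a common bound for the row and column sums of the absolute
value of the elements of $\mathbf S_N$, $\boldsymbol\Omega_{\varepsilon,N}$, and $\mathbf S_N\boldsymbol\Omega_{\varepsilon,N}\mathbf S_N\boldsymbol\Omega_{\varepsilon,N}$ and of the absolute value of their
respective elements. Then
$$E|\phi_N|\le N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}|s_{nj,N}|\,E|\varepsilon_{n,N}\varepsilon_{j,N}| \tag{B.3}$$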
26
Compare Lemma C.1 in Kelejian and Prucha (2009) for the case of a cross-sectional
SARAR(1,1) model and Lemma C.1 in Badinger and Egger (2008b) for the case of a cross-
sectional SARAR(R,S) model.
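$$\le N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}|s_{nj,N}|\,[E(\varepsilon^2_{n,N})]^{1/2}[E(\varepsilon^2_{j,N})]^{1/2}\le N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}|s_{nj,N}|\,K\le TK^2,$$
where we used Hölder's inequality to bound $E|\varepsilon_{n,N}\varepsilon_{j,N}|$. This proves that $E(\phi_N)$ is $O(1)$.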
Now consider $\mathrm{Var}(\phi_N)$, rewriting $\phi_N$ as a quadratic form in $\boldsymbol\xi_N=(\mathbf v_N',\boldsymbol\mu_N')'$ and invoking
Lemma A.1 in Kelejian and Prucha (2009):
)( NVar ),( 11
NNNNNN NNCov εεεε SS (B.4)
),(2
NNNNNNCovN ξξξξ SS
NT
n
NnnnNN ENTrNNN
1
4
,
2
,*
22 ]3)([)(2 sξξ ΩΩ SS ,
]}3)([)({)(2 4
,,...,1
2
*,,...,1
22
NnNTnNnnNTnNN EdiagdiagTrNTrNNN
sξξ ΩΩ SS ,
where NS is a )1()1( TNTN matrix, whose elements and row and column sums are
bounded uniformly in absolute value by some constant *K . Next, Nnn*,s is the n-th diagonal
element of NNNNnnN SS SS )( ,*,
* s , with NNN ξΩSS , where
NξΩ is the variance-
covariance matrix of Nξ , which is diagonal with elements 2
,nv for NTn ,...,1 and elements
2
for )1(,...,1 TNNTn . Finally, the vector NNN ξSη1 . In light of Assumption 1, the
row and column sums (and the elements) of NS are bounded uniformly in absolute value by
some finite constant, say **K . Moreover, the row and column sums (and the elements) of
1
NS
are also bounded uniformly in absolute value by some constant ***K .
Finally, in light of
Remark A.1 and Assumption 1 it follows that the elements of NN ξSη1 have finite fourth
moments. Denote their bound by ****K . Without loss of generality we assume that the bound
K used above is chosen such that KK *, KK **
, KK ***, and KK ****
. Hence, we
have
)( NVar ).1()2)(1()]([)(2 312
)1(,...,1
2
)1(
2 oKKTNKKdiagTrNKTrN TNnTN
I
The claim in part (a) of Lemma B.1 that $N^{-1}\mathbf u_N'\mathbf A_N\mathbf u_N-N^{-1}E(\mathbf u_N'\mathbf A_N\mathbf u_N)=o_p(1)$ now
follows from Chebychev's inequality (see, for example, White, 2001, p. 35).
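We now prove the second part of (a), i.e., $N^{-1}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N-N^{-1}E(\mathbf u_N'\mathbf A_N\mathbf u_N)=o_p(1)$. Since
$\phi_N-E(\phi_N)=o_p(1)$, it suffices to show that $\tilde\phi_N-\phi_N=o_p(1)$. By Assumption 4, we have
$\tilde{\mathbf u}_N-\mathbf u_N=\mathbf D_N\boldsymbol\Delta_N$, where $\mathbf D_N=(\mathbf d_{1.,N}',\dots,\mathbf d_{NT.,N}')'$. Substituting $\tilde{\mathbf u}_N=\mathbf u_N+\mathbf D_N\boldsymbol\Delta_N$ into the
expression for $\tilde\phi_N$ in (B.1), we obtain
$$
\begin{aligned}
\tilde\phi_N-\phi_N&=N^{-1}(\mathbf u_N+\mathbf D_N\boldsymbol\Delta_N)'\mathbf A_N(\mathbf u_N+\mathbf D_N\boldsymbol\Delta_N)-N^{-1}\mathbf u_N'\mathbf A_N\mathbf u_N\\
&=N^{-1}\boldsymbol\Delta_N'[\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N+\mathbf D_N'\mathbf A_N\mathbf D_N\boldsymbol\Delta_N]=\vartheta_N+\varpi_N,
\end{aligned}
\tag{B.5}
$$
where
$$
\begin{aligned}
\vartheta_N&=N^{-1}\boldsymbol\Delta_N'\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N\\
&=N^{-1}\boldsymbol\Delta_N'\mathbf D_N'(\mathbf A_N+\mathbf A_N')\Big[\mathbf I_T\otimes\Big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\Big)^{-1}\Big]\boldsymbol\varepsilon_N=N^{-1}\boldsymbol\Delta_N'\mathbf D_N'\mathbf C_N\boldsymbol\varepsilon_N,
\end{aligned}
\tag{B.6}
$$
with $\mathbf C_N=(\mathbf A_N+\mathbf A_N')\big[\mathbf I_T\otimes\big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\big)^{-1}\big]=(\mathbf c_{1.,N}',\dots,\mathbf c_{NT.,N}')'$, and
$$\varpi_N=N^{-1}\boldsymbol\Delta_N'\mathbf D_N'\mathbf A_N\mathbf D_N\boldsymbol\Delta_N. \tag{B.7}$$
By Assumption 3 and Remark A.1, the row and column sums of $\mathbf C_N$ are bounded uniformly
in absolute value. We next prove that $\vartheta_N=o_p(1)$ and $\varpi_N=o_p(1)$.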
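Proof that $\vartheta_N=o_p(1)$:
$$
\begin{aligned}
|\vartheta_N|&=|N^{-1}\boldsymbol\Delta_N'\mathbf D_N'\mathbf C_N\boldsymbol\varepsilon_N|=\Big|N^{-1}\boldsymbol\Delta_N'\sum_{n=1}^{NT}\mathbf d_{n.,N}'\mathbf c_{n.,N}\boldsymbol\varepsilon_N\Big|\le N^{-1}\|\boldsymbol\Delta_N\|\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|\,|\mathbf c_{n.,N}\boldsymbol\varepsilon_N|\\
&\le N^{-1}\|\boldsymbol\Delta_N\|\sum_{n=1}^{NT}\sum_{j=1}^{NT}\|\mathbf d_{n.,N}\|\,|c_{nj,N}|\,|\varepsilon_{j,N}|=N^{-1}\|\boldsymbol\Delta_N\|\sum_{j=1}^{NT}|\varepsilon_{j,N}|\sum_{n=1}^{NT}|c_{nj,N}|\,\|\mathbf d_{n.,N}\|\\
&\le N^{-1}\|\boldsymbol\Delta_N\|\sum_{j=1}^{NT}|\varepsilon_{j,N}|\Big[\sum_{n=1}^{NT}|c_{nj,N}|^q\Big]^{1/q}\Big[\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|^p\Big]^{1/p}\\
&=N^{1/p-1/2}\,\|N^{1/2}\boldsymbol\Delta_N\|\,N^{-1}\sum_{j=1}^{NT}|\varepsilon_{j,N}|\Big[\sum_{n=1}^{NT}|c_{nj,N}|^q\Big]^{1/q}\Big[N^{-1}\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|^p\Big]^{1/p}.
\end{aligned}
\tag{B.8}
$$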
Note that $\sum_{n=1}^{NT}|c_{nj,N}|\le K$ by assumption. In the following we denote by $K$ the uniform
bound for the row and column sums of the absolute value of the elements of $\mathbf A_N$ and $\mathbf C_N$.
From Remark A.1 in Appendix A, it follows that $\sum_{n=1}^{NT}|c_{nj,N}|^q\le K^q$, and thus
$[\sum_{n=1}^{NT}|c_{nj,N}|^q]^{1/q}\le K$. Factoring $K$ out of the sum yields
$$|\vartheta_N|\le K\,N^{1/p-1/2}\,\|N^{1/2}\boldsymbol\Delta_N\|\,N^{-1}\sum_{j=1}^{NT}|\varepsilon_{j,N}|\Big[N^{-1}\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|^p\Big]^{1/p}.$$
This holds for $p=2+\delta$ for some $\delta>0$ as in Assumption 4 and $1/p+1/q=1$. By
Assumption 4, $N^{1/2}\|\boldsymbol\Delta_N\|=O_p(1)$. Assumption 4 also implies that
$[N^{-1}\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|^p]^{1/p}=O_p(1)$ for $p=2+\delta$ and some $\delta>0$.
Moreover, $E|\varepsilon_{j,N}|\le K$, which implies that $N^{-1}\sum_{j=1}^{NT}|\varepsilon_{j,N}|=O_p(1)$. Since $N^{1/p-1/2}\to0$ as
$N\to\infty$, it follows that $\vartheta_N=o_p(1)$. For later reference, note that $|\vartheta_N|\le K\,o_p(1)$, where
we can choose $K=2c_Ac_P$, where $c_A$ and $c_P$ are the bounds for the row and column sums of
the absolute values of the elements of $\mathbf A_N$ and $\big[\mathbf I_T\otimes\big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\big)^{-1}\big]$, respectively
(compare (B.6) and Remark A.1).
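Next consider
$$
\begin{aligned}
|\varpi_N|&=\Big|N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}\boldsymbol\Delta_N'\mathbf d_{n.,N}'a_{nj,N}\mathbf d_{j.,N}\boldsymbol\Delta_N\Big|\le N^{-1}\|\boldsymbol\Delta_N\|^2\sum_{n=1}^{NT}\sum_{j=1}^{NT}|a_{nj,N}|\,\|\mathbf d_{n.,N}\|\,\|\mathbf d_{j.,N}\|\\
&\le N^{-1}\|\boldsymbol\Delta_N\|^2\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|\Big[\sum_{j=1}^{NT}|a_{nj,N}|^q\Big]^{1/q}\Big[\sum_{j=1}^{NT}\|\mathbf d_{j.,N}\|^p\Big]^{1/p}\le K\,N^{-1}\|\boldsymbol\Delta_N\|^2\Big[\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|\Big]\Big[\sum_{j=1}^{NT}\|\mathbf d_{j.,N}\|^p\Big]^{1/p}\\
&\le K\,T^{1-1/p}N^{1/p-1}\,\|N^{1/2}\boldsymbol\Delta_N\|^2\Big[N^{-1}\sum_{j=1}^{NT}\|\mathbf d_{j.,N}\|^p\Big]^{2/p}=o_p(1),
\end{aligned}
\tag{B.9}
$$
where the last inequality uses $\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|\le(NT)^{1-1/p}[\sum_{n=1}^{NT}\|\mathbf d_{n.,N}\|^p]^{1/p}$ (Hölder's inequality).
From the last inequality we can also see that $N^{1/2}\varpi_N=o_p(1)$, since $N^{1/2}N^{1/p-1}=N^{1/p-1/2}\to0$. As for $\vartheta_N$,
the bound is of the form $K\,o_p(1)$, with $K$ chosen as above. Summing up, we have proved that $\tilde\phi_N-\phi_N=o_p(1)$.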
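Proof of part (b)
Denote by $\varsigma^*_{s,N}$ the $s$-th element of $N^{-1}\mathbf D_N'\mathbf A_N\mathbf u_N$. By Assumptions 3 and 4 and Remark A.1
in Appendix A there exists a constant $K$ such that $E(u^2_{i,N})\le K$ and $E|d_{ij,N}|^p\le K$ with
$p=2+\delta$ for some $\delta>0$. Without loss of generality we assume that the row and column
sums of the matrices $\mathbf A_N$ are bounded uniformly by $K$. Notice first that
$$E|u_{n,N}d_{js,N}|\le[E(u^2_{n,N})]^{1/2}[E(d^2_{js,N})]^{1/2}\le[E(u^2_{n,N})]^{1/2}[E|d_{js,N}|^p]^{1/p}\le K^{1/2}K^{1/p},$$
with $p$ as before. It follows that
$$E|\varsigma^*_{s,N}|\le N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}|a_{jn,N}|\,E|u_{n,N}d_{js,N}|\le K^{1/2}K^{1/p}N^{-1}\sum_{n=1}^{NT}\sum_{j=1}^{NT}|a_{jn,N}|\le TK^{3/2+1/p}, \tag{B.10}$$
which shows that $N^{-1}E|\mathbf d_{.s,N}'\mathbf A_N\mathbf u_N|=O(1)$, and also that $\boldsymbol\alpha_N=N^{-1}E[\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N]=O(1)$.
It is readily verified that $\mathrm{Var}(\varsigma^*_{s,N})=o(1)$, such that we have $\varsigma^*_{s,N}-E(\varsigma^*_{s,N})=o_p(1)$. Next observe
that
$$N^{-1}\mathbf D_N'\mathbf A_N\tilde{\mathbf u}_N=N^{-1}\mathbf D_N'\mathbf A_N\mathbf u_N+\boldsymbol\varsigma^{**}_N, \tag{B.11}$$
where $\boldsymbol\varsigma^{**}_N=N^{-1}\mathbf D_N'\mathbf A_N\mathbf D_N\boldsymbol\Delta_N$. By arguments analogous to the proof that
$\vartheta_N=N^{-1}\boldsymbol\Delta_N'\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N=o_p(1)$, it follows that $\boldsymbol\varsigma^{**}_N=o_p(1)$. Hence, denoting by $\tilde\varsigma^*_{s,N}$ the $s$-th element of $N^{-1}\mathbf D_N'\mathbf A_N\tilde{\mathbf u}_N$, we have $\tilde\varsigma^*_{s,N}-\varsigma^*_{s,N}=o_p(1)$, and thus
$\tilde\varsigma^*_{s,N}-E(\varsigma^*_{s,N})=o_p(1)$, which also shows that $N^{-1}\mathbf D_N'(\mathbf A_N+\mathbf A_N')\tilde{\mathbf u}_N-\boldsymbol\alpha_N=o_p(1)$.
Proof of part (c)
In light of the proof of part (a)
$$N^{-1/2}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N=N^{-1/2}\mathbf u_N'\mathbf A_N\mathbf u_N+N^{-1}[\mathbf D_N'(\mathbf A_N+\mathbf A_N')\mathbf u_N]'N^{1/2}\boldsymbol\Delta_N+N^{1/2}\varpi_N, \tag{B.12}$$
where $N^{1/2}\varpi_N=o_p(1)$ as shown above, and in light of (b) and since $N^{1/2}\boldsymbol\Delta_N=O_p(1)$ by
Assumption 4, we have
$$N^{-1/2}\tilde{\mathbf u}_N'\mathbf A_N\tilde{\mathbf u}_N=N^{-1/2}\mathbf u_N'\mathbf A_N\mathbf u_N+\boldsymbol\alpha_N'N^{1/2}\boldsymbol\Delta_N+o_p(1). \tag{B.13}$$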
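Proof of Theorem 1. Consistency of the Weighted GM Estimator
We first show that Assumption 5 also implies that the smallest eigenvalue of $\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N$ is
bounded away from zero, i.e., that $\lambda_{\min}(\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N)\ge\lambda_0>0$ for some $\lambda_0$. By Assumption 5
and in light of Rao (1973, p. 62),
$$\lambda_{\min}(\boldsymbol\Gamma_N'\boldsymbol\Gamma_N)=\inf_{\mathbf x}\frac{\mathbf x'\boldsymbol\Gamma_N'\boldsymbol\Gamma_N\mathbf x}{\mathbf x'\mathbf x}\ge\lambda_*>0. \tag{B.14}$$
Using Mittelhammer (1996, p. 254) we have
$$\lambda_{\min}(\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N)=\inf_{\mathbf x}\frac{\mathbf x'\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N\mathbf x}{\mathbf x'\mathbf x}\ge\lambda_{\min}(\boldsymbol\Theta_N)\inf_{\mathbf x}\frac{\mathbf x'\boldsymbol\Gamma_N'\boldsymbol\Gamma_N\mathbf x}{\mathbf x'\mathbf x}=\lambda_{\min}(\boldsymbol\Theta_N)\lambda_{\min}(\boldsymbol\Gamma_N'\boldsymbol\Gamma_N)\ge\lambda_0, \tag{B.15}$$
with $\lambda_0=\lambda_*\lambda_{**}>0$, since $\lambda_{\min}(\boldsymbol\Theta_N)\ge\lambda_{**}>0$ by Assumption 5.
The objective function of the weighted GM estimator and its nonstochastic counterpart are
given by
$$R_N(\boldsymbol\theta)=(\tilde{\boldsymbol\gamma}_N-\tilde{\boldsymbol\Gamma}_N\mathbf b)'\tilde{\boldsymbol\Theta}_N(\tilde{\boldsymbol\gamma}_N-\tilde{\boldsymbol\Gamma}_N\mathbf b)\quad\text{and} \tag{B.16a}$$
$$\bar R_N(\boldsymbol\theta)=(\boldsymbol\gamma_N-\boldsymbol\Gamma_N\mathbf b)'\boldsymbol\Theta_N(\boldsymbol\gamma_N-\boldsymbol\Gamma_N\mathbf b). \tag{B.16b}$$
Since $\boldsymbol\gamma_N-\boldsymbol\Gamma_N\mathbf b_N=\mathbf 0$, we have $\bar R_N(\boldsymbol\theta_N)=0$, i.e., $\bar R_N(\boldsymbol\theta)\ge0$ attains its minimum at the true parameter vector
$\boldsymbol\theta_N$. Hence,
$$\bar R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta_N)=(\mathbf b-\mathbf b_N)'\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N(\mathbf b-\mathbf b_N). \tag{B.17a}$$
In light of Rao (1973, p. 62) and Assumption 5, it follows that:
$$\bar R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta_N)\ge\lambda_{\min}(\boldsymbol\Gamma_N'\boldsymbol\Theta_N\boldsymbol\Gamma_N)(\mathbf b-\mathbf b_N)'(\mathbf b-\mathbf b_N)\quad\text{and} \tag{B.17b}$$
$$\bar R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta_N)\ge\lambda_0(\mathbf b-\mathbf b_N)'(\mathbf b-\mathbf b_N).$$
By the properties of the norm $\|\mathbf A\|=[\mathrm{tr}(\mathbf A'\mathbf A)]^{1/2}$, we have $(\mathbf b-\mathbf b_N)'(\mathbf b-\mathbf b_N)\ge\|\boldsymbol\theta-\boldsymbol\theta_N\|^2$ such
that $\bar R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta_N)\ge\lambda_0\|\boldsymbol\theta-\boldsymbol\theta_N\|^2$. Hence, for every $\epsilon>0$,
$$\liminf_{N\to\infty}\ \inf_{\{\boldsymbol\theta:\|\boldsymbol\theta-\boldsymbol\theta_N\|\ge\epsilon\}}[\bar R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta_N)]\ge\inf_{\{\boldsymbol\theta:\|\boldsymbol\theta-\boldsymbol\theta_N\|\ge\epsilon\}}\lambda_0\|\boldsymbol\theta-\boldsymbol\theta_N\|^2=\lambda_0\epsilon^2>0, \tag{B.18}$$
which proves that the true parameter vector $\boldsymbol\theta_N$ is identifiably unique
(compare Lemma 4.1 in Pötscher and Prucha, 1997).
Moreover, let $\mathbf F_N=(\tilde{\boldsymbol\gamma}_N,\tilde{\boldsymbol\Gamma}_N)$ and $\boldsymbol\Phi_N=(\boldsymbol\gamma_N,\boldsymbol\Gamma_N)$. Then, the difference between the
objective function and its nonstochastic counterpart can be written in terms of
$$R_N(\boldsymbol\theta,\mathbf b)=(1,-\mathbf b')\mathbf F_N'\tilde{\boldsymbol\Theta}_N\mathbf F_N(1,-\mathbf b')'\quad\text{and} \tag{B.19a}$$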
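$$\bar R_N(\boldsymbol\theta,\mathbf b)=(1,-\mathbf b')\boldsymbol\Phi_N'\boldsymbol\Theta_N\boldsymbol\Phi_N(1,-\mathbf b')', \tag{B.19b}$$
such that
$$|R_N(\boldsymbol\theta,\mathbf b)-\bar R_N(\boldsymbol\theta,\mathbf b)|=|(1,-\mathbf b')(\mathbf F_N'\tilde{\boldsymbol\Theta}_N\mathbf F_N-\boldsymbol\Phi_N'\boldsymbol\Theta_N\boldsymbol\Phi_N)(1,-\mathbf b')'|\le\|\mathbf F_N'\tilde{\boldsymbol\Theta}_N\mathbf F_N-\boldsymbol\Phi_N'\boldsymbol\Theta_N\boldsymbol\Phi_N\|\,\|(1,-\mathbf b')\|^2.$$
As evident from (17), the elements of the matrices $\boldsymbol\Gamma_N$ and $\boldsymbol\gamma_N$ are all of the form $N^{-1}E(\mathbf u_N'\mathbf A_N\mathbf u_N)$,
where $\mathbf A_N$ are nonstochastic $NT\times NT$ matrices, whose row and column sums are bounded
uniformly in absolute value. In light of Lemma B.1, the elements of $\boldsymbol\Phi_N$ are $O(1)$ and it
follows that $\mathbf F_N-\boldsymbol\Phi_N\xrightarrow{p}\mathbf 0$ and $\mathbf F_N'\tilde{\boldsymbol\Theta}_N\mathbf F_N-\boldsymbol\Phi_N'\boldsymbol\Theta_N\boldsymbol\Phi_N\xrightarrow{p}\mathbf 0$ as $N\to\infty$. As a consequence,
we have (for finite $S$)
$$\sup_{\boldsymbol\theta}|R_N(\boldsymbol\theta)-\bar R_N(\boldsymbol\theta)|\le\|\mathbf F_N'\tilde{\boldsymbol\Theta}_N\mathbf F_N-\boldsymbol\Phi_N'\boldsymbol\Theta_N\boldsymbol\Phi_N\|\,\sup_{\boldsymbol\theta}\|(1,-\mathbf b')\|^2\xrightarrow{p}0\quad\text{as }N\to\infty, \tag{B.20}$$
where the supremum is taken over the bounded parameter space, so that $\sup_{\boldsymbol\theta}\|(1,-\mathbf b')\|^2$ is finite.
Together with identifiable uniqueness, the consistency of $\tilde{\boldsymbol\theta}_N$ now
follows directly from Lemma 3.1 in Pötscher and Prucha (1997).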
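The identification argument in the proof of Theorem 1 can be mirrored in a few lines. The sketch below is a hypothetical illustration, not the paper's estimator: it assumes $S=1$, a parameter vector $(\rho,\sigma^2)$ with $\mathbf b(\boldsymbol\theta)=(\rho,\rho^2,\sigma^2)'$, and randomly drawn $\boldsymbol\Gamma$ and $\boldsymbol\Theta$ with $\boldsymbol\gamma=\boldsymbol\Gamma\mathbf b(\boldsymbol\theta_0)$ holding exactly. It minimizes a criterion of the form (B.16b) by a grid search over $\rho$, profiling out $\sigma^2$ (which enters $\mathbf b$ linearly), and checks the eigenvalue bound (B.15) and the lower bound (B.17b).

```python
import numpy as np

def b_of(rho, sig2):
    """Hypothetical b(theta) for S = 1: (rho, rho^2, sig2)'."""
    return np.array([rho, rho ** 2, sig2])

def gm_objective(rho, sig2, gamma, Gamma, Theta):
    """(gamma - Gamma b)' Theta (gamma - Gamma b), the form of (B.16b)."""
    r = gamma - Gamma @ b_of(rho, sig2)
    return float(r @ Theta @ r)

def gm_estimate(gamma, Gamma, Theta, grid=None):
    """Grid search over rho; for each rho, sig2 is the weighted LS solution."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 19801)          # step 1e-4
    g = Gamma[:, 2]                                     # column multiplying sig2
    r = gamma[None, :] - np.outer(grid, Gamma[:, 0]) - np.outer(grid ** 2, Gamma[:, 1])
    sig2 = (r @ Theta @ g) / (g @ Theta @ g)            # profiled sig2 per rho
    resid = r - sig2[:, None] * g[None, :]
    crit = np.einsum("ki,ij,kj->k", resid, Theta, resid)
    k = int(np.argmin(crit))
    return grid[k], sig2[k]

rng = np.random.default_rng(42)
m = 4                                   # number of moment conditions (hypothetical)
Gamma = rng.normal(size=(m, 3))         # full column rank with probability one
L = rng.normal(size=(m, m))
Theta = L @ L.T + np.eye(m)             # positive definite weighting matrix
rho0, sig2_0 = 0.4, 1.5
gamma = Gamma @ b_of(rho0, sig2_0)      # moment conditions hold exactly at theta_0

rho_hat, sig2_hat = gm_estimate(gamma, Gamma, Theta)

# (B.15): lambda_min(Gamma' Theta Gamma) >= lambda_min(Theta) * lambda_min(Gamma' Gamma)
lam_GTG = np.linalg.eigvalsh(Gamma.T @ Theta @ Gamma).min()
lam_bound = np.linalg.eigvalsh(Theta).min() * np.linalg.eigvalsh(Gamma.T @ Gamma).min()

# (B.17b): R(theta) - R(theta_0) >= lambda_min(Gamma' Theta Gamma) * ||b - b_0||^2
rho1, sig2_1 = -0.3, 0.7
gap = gm_objective(rho1, sig2_1, gamma, Gamma, Theta)
lower = lam_GTG * np.sum((b_of(rho1, sig2_1) - b_of(rho0, sig2_0)) ** 2)

print(rho_hat, sig2_hat, lam_GTG >= lam_bound, gap >= lower)
```

Profiling out the linearly entering parameter keeps the search one-dimensional; with sample moments $\tilde{\boldsymbol\gamma}_N$, $\tilde{\boldsymbol\Gamma}_N$ in place of their population counterparts, the minimizer would only approximate $\boldsymbol\theta_0$, which is what the uniform convergence in (B.20) controls.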
Proof of Theorem 2. Asymptotic Normality of $\tilde{\boldsymbol\theta}_N$
To derive the asymptotic distribution of the vector $\mathbf q_N$, defined in (30), we invoke the central
limit theorem for vectors of linear quadratic forms given by Kelejian and Prucha (2009,
Theorem A.1). The vector of quadratic forms in the present context, to which the Theorem is
applied is ; its variance-covariance matrix is given by and
.
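Note that in light of Assumptions 1, 2 and 7 (and Lemma B.1), the stacked innovations $\boldsymbol\xi_N$,
the matrices $\mathbf A^{ss'}_{v,c,N}$, $\mathbf A^{ss'}_{\mu,c,N}$, and the vectors $\mathbf a^{ss'}_{v,c,N}$ and $\mathbf a^{ss'}_{\mu,c,N}$, $c=1,\dots,4$, $s,s'=1,\dots,S$, satisfy
the assumptions of the central limit theorem by Kelejian and Prucha (2009, Theorem A.1).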
It follows that
, (B.21)
since by assumption as required in Theorem A.1.
),1(),1()( bb NNNNR ΦΘΦθ
),1)(~
)(,1()(),( bFFb NNNNNNNN RR ΦΘΦΘθθ
2
),1( ~
bFF NNNNNN ΦΘΦΘ
])(2
)1(2)([1
~ 242
ba
SSSaSNNNNNN
ΦΘΦΘ FF
Nγ NΓ NNN uu A
NA NTNT
NΦ )1(O
0p
NN ΦF 0~ p
NNNNNN ΦΘΦΘ FF N
. as 0])(2
)1()([1 ][)(),(sup 242
],0[,,..,1,2
NbaSS
aSRR p
NNNNNNbSsaa s
ΦΦθθ FF
)~,~,...,~(~ 2
,,,1 NNSNN θ
Nθ~
Nq
NN N qq 2/1* NN NΨΨ *
2/12/12/1*)( NN N ΨΨ
Nξ
ss
Nc
,
,,vAss
Nc
,
,,μAss
Nc
,
,,vass
Nc
,
,,μa 4,...,1c Sss ,...,1,
),()( 24
2/1*2/12/1*2/1*
S
d
NNNNNN N I0ΨΨΨ qqq
0)()( min
*
min
1
NNN ΨΨ
Since the row and column sums of the matrices , the elements of the vectors ,
, and , and the moments of and are bounded
uniformly in absolute value, it follows in light of (28) that the elements of and also those
of are bounded uniformly in absolute value.
We next turn to the derivation of the limiting distribution of the GM estimator $\tilde{\boldsymbol\theta}_N$. In
Theorem 1 we showed that the GM estimator $\tilde{\boldsymbol\theta}_N$ defined by (18) is consistent. It follows that,
apart from a set of the sample space whose probability tends to zero, the estimator satisfies
the following first-order condition:
, (B.22)
which is an $(S+1)\times1$ vector, the rows corresponding to the partial derivatives of the criterion
function with respect to the $S+1$ elements of $\boldsymbol\theta$.
Substituting the mean value theorem expression
, (B.23)
where $\bar{\boldsymbol\theta}_N$ is some in-between value, into the first-order condition yields
. (B.24)
Observe that and consider the two matrices
, (B.25)
, (B.26)
where and correspond to as defined above with and substituted for
. Notice that is positive definite, since and are positive definite by
assumption and the matrix has full column rank.
In the proof of Theorem 1 (and Lemma B.1) we have demonstrated that and
that the elements of and are and , respectively. By Assumption 5,
, and . Since and (and thus also and
ss
Nc
,
,Ass
Nc
,
,a
4,...,1c Sss ,...,1, th )4( Nv Nμ
NΨ
2/1
NΨ
Nθ~
Nθ~
0ΔθqΘθ
ΔθqΔθqΘΔθq
θ
),
~(
~),~
(),
~(
~),
~( NNNN
NNNNNNNNNN
1)1( S
Ns, Ss ,...,1 2
)~
(),(
),(),~
( NNNNN
NNNNNN θθθ
ΔθqΔθqΔθq
Nθ
),(~),
~(
)~
(),(~),
~( 2/12/1
NNNNNNN
NNNNN
NNNN NN ΔθqΘ
θ
Δθqθθ
θ
ΔθqΘ
θ
Δθq
NNNN BΓ
θ
Δθq ~),(
)1()1( SS
NNNNNNNN
NNNN
N BB ΓΘΓθ
ΔθqΘ
θ
ΔθqΞ
~~~~),(~),~
(~
NNNNNN BB ΓΘΓΞ
NB~
NB NBNθ
~Nθ
Nθ NΞ NΓ NΘ
)1(]12/)1(2[ SSSS NB
0ΓΓp
NN ~
NΓ NΓ~
)1(O )1(pO
)1(~
pNN oΘΘ )1(ON Θ )1(~
pN OΘNρ
~Nρ NB
~
) are consistent and bounded uniformly in probability, it follows that ,
, and . Moreover, is positive definite and thus invertible, and its
inverse is also .
Denote as the generalized inverse of . It then follows as a special case of Lemma F1
in Pötscher and Prucha (1997) that is non-singular with probability approaching 1 as
, that is , and that .
Pre-multiplying (B.24) with we obtain, after rearranging terms,
.(B.27)
In light of the discussion above, the first term on the right-hand side is zero on sets of
probability approaching 1 (compare Pötscher and Prucha, 1997, p. 228). This yields
. (B.28)
Next observe that
, (B.29)
since and .
As we showed in section III, the elements of can be expressed as
. (B.30)
where is defined in (24), and that
. (B.31)
It now follows from (B.28), (B.29), and (B.30) that
. (B.32)
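Since all nonstochastic terms on the right-hand side of (B.32) are $O(1)$, it follows that
$N^{1/2}(\tilde{\boldsymbol\theta}_N-\boldsymbol\theta_N)$ is $O_p(1)$. To derive the asymptotic distribution of $N^{1/2}(\tilde{\boldsymbol\theta}_N-\boldsymbol\theta_N)$, we invoke
Corollary F4 in Pötscher and Prucha (1997). In the present context, we have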
NB )1(~
pNN oΞΞ
)1(~
pN OΞ )1(ON Ξ NΞ
1
NΞ )1(O
NΞ~
NΞ~
NΞ~
N
NΞ~
)1(pO )1(~ 1
pNN o ΞΞ
NΞ~
),(~),
~(~
)~
()~~
()~
( 2/12/1
1
2/1
NNNNNNN
NNNNNSNN NNN ΔθqΘθ
ΔθqΞθθΞΞIθθ
)1(),(~),
~(~
)~
( 2/12/1
pNNNNNNN
NNN oNN
ΔθqΘθ
ΔθqΞθθ
)1(~),
~(~ 1
pNNNNNNNN
N o
ΘΓΞΘ
θ
ΔθqΞ B
)1(~ 1
pNN o ΞΞ )1(
),~
(pNN
NNN o
Γ
θ
ΔθqB
),(2/1
NNNN Δθq
),(2/1
NNNN Δθq )1()1(*2/1
pNpN ooN qq
*
Nq
),()( 24
2/1*2/12/1*2/1*
S
d
NNNNNN N I0ΨΨΨ qqq
)1()()~
( 2/12/112/1
pNNNNNNNN oN qΨΨΘJΞθθ
)1(O
)~
(2/1
NNN θθ )1(pO )~
(2/1
NNN θθ
,
, with
.
Furthermore, and its variance-covariance matrix is
,
where is positive definite.
As a final point it has to be shown that as required in Corollary
F4 in Pötscher and Prucha (1997). Observe that
(B.33)
,
since the matrices involved are all positive definite.
Proof of Theorem 3. Joint Distribution of $\tilde{\boldsymbol\rho}_N$ and Other Model Parameters
The first line in Theorem 3 holds in light of Assumption 7 (for ), bearing in mind that
, and Theorem 2 (for ).
We next prove that by verifying that the
assumptions of the central limit theorem A.1 by Kelejian and Prucha (2009) are fulfilled. Note
that by assumption. In Theorem 2, we verified that the stacked
innovations , the matrices , , and the vectors and , ,
, satisfy the assumptions of central limit theorem by Kelejian and Prucha (2009,
Theorem A.1).
For the estimators considered in the present paper, the elements of the matrix
are bounded uniformly in absolute value, provided that the elements of the
matrix are bounded uniformly in absolute value (see Lemmata 1 and 2). Hence, the linear
form fulfils the assumptions of Theorem A.1; as a consequence,
.
),(~ )1(4
2/1
SSSd
NNN N I0ζΨζ q
)1()~
(2/1
pNNNN oN ζθθ X
2/11
NNNNN ΨΘJΞ X
)1()~
(2/1
pNN ON θθ
11~ )()()( NNNNNNNNNNNN
N
JΘJJΘΨΘJJΘJΘΩθ
NθΩ~
0)(inflim min NNN XX
)(min NNXX )( 11
min
NNNNNNN ΞJΘΨΘJΞ
0)()()()()( minmin
11
minminmin
NNNNNNNNN BB ΓΓΞΞΘΘΨ
Nρ~
NN Δ2/1
NNN PFT )~
(2/1
NNN θθ
),(],)[()1(4
2/12/1
,, *
SSSP
dNNNNNo NNT I0FξΨξ w q
0)( *
,min wΨwΨ cN
Nξss
Nc
,
,,vAss
Nc
,
,,μAss
Nc
,
,,vass
Nc
,
,,μa 4,...,1c
Sss ,...,1,
),( ,, NNN μv FFF
NH
NNNNNN μFvFξF μv ,,
),()1(4, *
SSSP
dNo N I0ξ
Proof of Lemma 1.
Consider the case of random effects estimation first. In light of equations (4a) and (4b),
Assumptions 3 and 8, as well as , it follows that all columns of
are of the form , where the elements of the vector and
the row and column sums of the matrix are bounded uniformly in absolute value. It
follows that the fourth moments of the elements of the matrix are bounded
uniformly by some finite constant and that Assumption 6 holds (see Remark A.1 in Appendix
A).
Next, note that
,
where is defined in the Lemma, and
, and
.
In light of Assumption 8, and , with as defined in the Lemma.
By Assumptions 2, 3 and 8, the elements of and are bounded uniformly in absolute
value. By Assumption 1, , , and the diagonal variance-covariance
matrices of and have uniformly bounded elements. Thus, and
the elements of the variance-covariance matrix of , i.e., , are
bounded uniformly in absolute value. Moreover, , and the elements of
the variance-covariance matrix of , i.e., , are bounded
uniformly in absolute value (see Remark A.1 in Appendix A). It follows from Chebychev’s
inequality that , , and consequently
and that
. This completes the proof, recalling that
. Obviously, the same proof applies under fixed effects
estimation, using the within-transformed matrices , , , , , and ,
provided that Assumption 8 is maintained accordingly for and .
[Mathematical expressions for the proof of Lemma 1 not recoverable from the source extraction.]
Proof of Lemma 2.
The random effects spatial generalized TSLS estimator is given by
, where with
.
Substituting , we obtain
, with
.
Next note that
,
where is a matrix whose row and column sums are bounded uniformly in absolute
value, satisfying
.
Substituting for , we obtain
, where
,
,
Note that the feasible generalized TSLS estimator uses generated (transformed) instruments
, based on the estimate . Using
we obtain such that .
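$$\hat{\boldsymbol\delta}_N=(\hat{\mathbf Z}_N^{*\prime}\mathbf Z_N^{*})^{-1}\hat{\mathbf Z}_N^{*\prime}\mathbf y_N^{*},\qquad \mathbf y_N^{*}=\mathbf Z_N^{*}\boldsymbol\delta_N+\mathbf u_N^{*},$$
$$\mathbf u_N^{*}=\Big[\mathbf I_T\otimes\Big(\mathbf I_N-\sum_{m=1}^{S}\rho_{m,N}\mathbf M_{m,N}\Big)\Big]\mathbf u_N,$$
$$\hat{\mathbf Z}_N^{*}=\mathbf P_{\mathbf H_N^{*}}\mathbf Z_N^{*}=\mathbf H_N^{*}(\mathbf H_N^{*\prime}\mathbf H_N^{*})^{-1}\mathbf H_N^{*\prime}\mathbf Z_N^{*}.$$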
$$\tilde{\mathbf P}_N^{*}=[(NT)^{-1}\mathbf H_N^{*\prime}\mathbf H_N^{*}]^{-1}[(NT)^{-1}\mathbf H_N^{*\prime}\mathbf Z_N^{*}]\{[(NT)^{-1}\mathbf Z_N^{*\prime}\mathbf H_N^{*}][(NT)^{-1}\mathbf H_N^{*\prime}\mathbf H_N^{*}]^{-1}[(NT)^{-1}\mathbf H_N^{*\prime}\mathbf Z_N^{*}]\}^{-1}$$
[Remaining mathematical expressions for this step not recoverable from the source extraction.]
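The spatial generalized TSLS estimator in Lemma 2 works with the transformed model, in which the disturbance vector is premultiplied by I_T ⊗ (I_N − Σ_m ρ_{m,N} M_{m,N}). The following Python sketch illustrates only that transformation. It assumes that u_N is stacked period by period (all N units of period 1 first), so the Kronecker product acts block by block; the weight matrices M1, M2 and the values of ρ are hypothetical. The blockwise computation is cross-checked against the explicit block-diagonal matrix.

```python
# Minimal sketch of u* = [I_T (x) (I_N - sum_m rho_m M_m)] u.
# Assumption: u is stacked period by period (all N units of period 1 first).


def spatial_filter(rhos, weight_matrices):
    """Return B = I_N - sum_m rho_m * M_m as a dense list of lists."""
    n = len(weight_matrices[0])
    b = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for rho, m in zip(rhos, weight_matrices):
        for i in range(n):
            for j in range(n):
                b[i][j] -= rho * m[i][j]
    return b


def mat_vec(a, x):
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def transform_blockwise(u, rhos, weight_matrices, periods):
    """Apply B to each period's block of N entries, i.e. compute (I_T kron B) u."""
    b = spatial_filter(rhos, weight_matrices)
    n = len(b)
    out = []
    for t in range(periods):
        out.extend(mat_vec(b, u[t * n:(t + 1) * n]))
    return out


def kron_identity(periods, b):
    """Explicit block-diagonal matrix I_T kron B, used only as a cross-check."""
    n = len(b)
    size = periods * n
    k = [[0.0] * size for _ in range(size)]
    for t in range(periods):
        for i in range(n):
            for j in range(n):
                k[t * n + i][t * n + j] = b[i][j]
    return k


# Hypothetical example: N = 3, T = 2, S = 2 weight matrices.
M1 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # row-normalized contiguity
M2 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # one neighbour on a ring
RHOS = [0.3, 0.2]
U = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

u_star = transform_blockwise(U, RHOS, [M1, M2], periods=2)
u_check = mat_vec(kron_identity(2, spatial_filter(RHOS, [M1, M2])), U)
```

The blockwise form avoids building the (NT × NT) matrix, which matters once N and T are realistic; the explicit Kronecker product is kept here only to confirm the two agree.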
Considering , we have
,
with , and .
.
Regarding we have
,
Next note that, in light of Assumption 8 and since is -consistent, it follows that
.
By Assumption 8 we also have and thus
. It follows as a special case of Pötscher and Prucha (1997,
Lemma F1) that
.
It follows further that and with defined in the Lemma.
Next observe that . Note further that all terms except for are of
the form , where are matrices involving products of
, , and . By the maintained assumptions regarding
these matrices it follows that the elements of are bounded uniformly in absolute value.
[Mathematical expressions for the proof of Lemma 2 not recoverable from the source extraction.]
As a consequence, and the elements of the variance-covariance matrix
of , i.e., , are bounded uniformly in absolute value (see
Remark A.1 in Appendix A). It follows from Chebychev’s inequality that
. As a consequence, all terms except for are , and
. Finally, observe that , with
and , recalling that .
APPENDIX C
Lemma C.1
Define the vectors with elements and the
vector of fixed effects residuals . Suppose that Assumptions 1-4 hold and that
the elements of have bounded fourth moments. Then and
, with , , and where ,
, and for some . As a direct consequence,
and .
Proof.
Note first that
, where (C.1)
.
This can also be written as
, (C.2)
where with
,
, and
[Mathematical expressions for Lemma C.1 and equations (C.1) and (C.2) not recoverable from the source extraction.]
In light of Assumption 3 and since the elements of have bounded fourth
moments, each column of the matrix is of the form , where the elements of
the vector are bounded uniformly in absolute value by some finite constant, the
row and column sums of the matrix are bounded uniformly in absolute value
by some finite constant, and the fourth moments of the elements of are also bounded by
some finite constant. It follows that the fourth moments of the elements of are also
bounded by some finite constant (see Remark A.1 in Appendix A).
As a consequence, , or for the n-th element of the vector ,
, (C.3)
where , denotes the n-th row of , and with
. Without loss of generality we can select such that for
. By Assumption 1 there is also some such that for . In the
following we use to denote the larger bound, i.e., . Also note that
. Replacing index with index , we have, from (C.1) and (C.3), that
.
By the same reasoning we have
(C.4)
,
with , where . Obviously, the elements of the columns of
and their fourth moments remain bounded uniformly after pre-multiplication with , such
that we have with defined as above and . Finally, we have . Without loss of generality, we choose the bound
in the lemma such that and .
[Mathematical expressions for (C.3) and (C.4) not recoverable from the source extraction.]
Proof of Theorem 4a. Consistency of
In the following, we provide two Lemmata that establish the consistency of .[27] As is
evident from the proof, this also covers the simpler case of .
Lemma C.2
Suppose Assumptions 1-4 hold and let
, and
,
with and , and where the vector can be any
estimator that satisfies . Let and be vectors, whose elements
are bounded uniformly in absolute value by some constant c, and let
. Define with
. Then
(a) and .
(b) There exist random variables that do not depend on and such that
, with and where is a
constant that depends monotonically on (as well as on some other bounds maintained in the
assumptions).
Proof.
A complication in the estimation of arises from the fact that
is based on the idiosyncratic error components in levels ( ), whereas
the estimator has to be based on the (demeaned) fixed effects residuals . The problem at
hand is similar in its structure to that in Stock and Watson (2008), who consider the
estimation of a heteroskedasticity-robust variance-covariance matrix in fixed effects panel
data models (without spatial correlation). They suggest an asymptotic bias correction that is
based on an expression in which the error components are clustered over cross-section
units (averaged over time), and which can be estimated consistently with the fixed effects
residuals ( ). In the following, we adopt the approach of Stock and Watson (2008) to
derive bias-corrected estimators in the present framework.
[Mathematical expressions for Lemma C.2 not recoverable from the source extraction.]
[27] Related results for the cross-sectional case are obtained by Kelejian and Prucha (2009).
Define
, (C.5a)
with , and (C.5b)
with . (C.5c)
The bias is derived using the expectation of the infeasible estimate , which assumes that
the true parameters and are known and omits the degrees of freedom correction for
the P regressors. For simplicity of notation, define ; without loss of generality,
the bound in the Lemma is chosen such that .
Recognizing that we have, for each i,
(C.6)
,
using and .
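The bias underlying (C.6) and (C.7) can be illustrated for a single unit. As a simplification, assume that the true parameters are known, that the idiosyncratic errors v_it are independent over t with mean zero and variances σ²_it (a shorthand introduced here, with N subscripts dropped), and that ṽ_it = v_it − v̄_i is the within-demeaned error with v̄_i = T⁻¹ Σ_r v_ir. A short calculation then gives

$$
\begin{aligned}
E\tilde v_{it}^{2} &= \Big(1-\frac{2}{T}\Big)\sigma_{it}^{2}+\frac{1}{T^{2}}\sum_{r=1}^{T}\sigma_{ir}^{2},\\
\sum_{r=1}^{T}E\tilde v_{ir}^{2} &= \frac{T-1}{T}\sum_{r=1}^{T}\sigma_{ir}^{2},\\
\sigma_{it}^{2} &= \frac{T}{T-2}\,E\tilde v_{it}^{2}-\frac{1}{(T-1)(T-2)}\sum_{r=1}^{T}E\tilde v_{ir}^{2}.
\end{aligned}
$$

Replacing the expectations by squared demeaned residuals yields a per-observation bias-corrected square in the spirit of Stock and Watson (2008); the correction requires T ≥ 3.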
Rearranging terms and averaging over N yields the following bias-corrected estimator for :
, (C.7)
[Equations (C.5a) to (C.7) not recoverable from the source extraction.]
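where and .[28]
Finally, note that (C.7) can also be written as $\frac{1}{NT}\mathbf a_N'\tilde{\boldsymbol\Sigma}_N^{HR}\mathbf b_N$, where $\tilde{\boldsymbol\Sigma}_N^{HR}$ is a diagonal matrix with elements
$$(\tilde v_{N,it}^{HR})^{2}=\frac{T}{T-2}\,\tilde v_{N,it}^{2}-\frac{1}{(T-1)(T-2)}\sum_{r=1}^{T}\tilde v_{N,ir}^{2}.$$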
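The diagonal elements of the bias-corrected matrix, (T/(T−2)) ṽ²_it − Σ_r ṽ²_ir / ((T−1)(T−2)), can be checked by simulation. The following Python sketch is a hypothetical illustration, not the authors' code: it uses true (within-demeaned) errors instead of estimated residuals, a made-up variance pattern that differs across units and over time, and added fixed effects that the demeaning removes. It compares cross-sectional averages of the corrected squares with the average true variances, period by period.

```python
import random


def demean(row):
    """Within (time-demeaning) transformation for one cross-section unit."""
    mean = sum(row) / len(row)
    return [x - mean for x in row]


def hr_squared(v_tilde_row):
    """T/(T-2) * v_it^2 - sum_r v_ir^2 / ((T-1)(T-2)) for each period t."""
    t_len = len(v_tilde_row)
    if t_len < 3:
        raise ValueError("the correction requires T >= 3")
    total = sum(x * x for x in v_tilde_row)
    return [t_len / (t_len - 2) * x * x - total / ((t_len - 1) * (t_len - 2))
            for x in v_tilde_row]


def simulate(n_units=40000, t_len=4, seed=123):
    """Period-wise cross-sectional averages of sigma^2_it, naive and corrected squares."""
    rng = random.Random(seed)
    true_avg = [0.0] * t_len
    naive_avg = [0.0] * t_len
    hr_avg = [0.0] * t_len
    for i in range(n_units):
        scale = 1.0 + 0.5 * (i % 2)  # heteroskedasticity across units ...
        sds = [scale * (0.6 + 0.2 * t) for t in range(t_len)]  # ... and over time
        effect = rng.gauss(0.0, 2.0)  # fixed effect, removed by demeaning
        v_tilde = demean([effect + rng.gauss(0.0, sd) for sd in sds])
        corrected = hr_squared(v_tilde)
        for t in range(t_len):
            true_avg[t] += sds[t] ** 2 / n_units
            naive_avg[t] += v_tilde[t] ** 2 / n_units
            hr_avg[t] += corrected[t] / n_units
    return true_avg, naive_avg, hr_avg


true_avg, naive_avg, hr_avg = simulate()
```

With time-varying variances the naive squared residuals understate the variance of the high-variance periods, while the corrected squares track the true variances in every period.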
We next prove that , considering
(C.8)
and showing that both and are for fixed T as .
Consider first . It follows from the triangle inequality that
. (C.9)
By the weak law of large numbers for i.d. variables (e.g., White, 2001, p. 35), we have
,
(C.10)
observing that the fourth moments of (and ) are bounded uniformly by Assumption
1. We thus also have .
Moreover, repeatedly using the triangle inequality, it follows that
(C.11)
,
where ; the constant is chosen such
that
and . Note that
by the weak law of large numbers.
Next rewrite
, and
[28] Note that , where .
[Mathematical expressions for (C.8) to (C.11) not recoverable from the source extraction.]
where . Hence,
(C.12a)
,
where , and
(C.12b)
.
By the properties of the matrices
, , and , and in light of Remark
A.1, the expressions in (C.12b) are all quadratic forms in matrices whose row and column
sums are bounded uniformly in absolute value by some constants that depend monotonically
on c as well as on other bounds maintained in the assumptions.
Repeatedly using the triangle inequality, Lemma B.1 in Appendix B, and factoring out the
terms , it follows that
, (C.13)
where and does not depend on and and the constant depends
monotonically on c and other bounds maintained in the assumptions. Obviously, it follows
that . Moreover, we have
. (C.14)
It follows from (C.9), (C.11), and (C.13) that
(C.15)
where .
[Mathematical expressions for (C.12a) to (C.15) not recoverable from the source extraction.]
Next consider . By the triangle inequality, and by the
weak law of large numbers
,
(C.16)
observing that the fourth moments of are bounded uniformly by Assumption 2. We thus
also have . Next, rewrite as
(C.17a)
where and . Moreover, repeatedly using the triangle
inequality, it follows that
(C.17b)
,
where the last step uses and is defined
as above. Note that
by the weak law of large numbers.
From Lemma C.1 in Appendix C, it follows that ,
where and . Using the triangle and Hölder inequality, we
have
, where . (C.18)
Obviously, , which, together with (C.16), implies that . It
also holds that .
From (C.17) with (C.18) it follows that
, (C.19)
where . Combining our results that and ,
result (a) in Lemma C.2 follows in light of (C.8), noting that
[Mathematical expressions for (C.16) to (C.19) not recoverable from the source extraction.]
Result (b) in Lemma C.2 follows from (C.15) and (C.19),
which yields
(C.20)
,
where and .
Lemma C.3
Suppose Assumptions 1-4 hold. Furthermore, assume that , and that the row
and column sums of , are uniformly bounded in absolute value by 1 and
some finite constant respectively. Let , and
let with and
, and where the vector can be any estimator that satisfies
.
Let and , where
is an matrix whose elements are uniformly bounded in absolute value by some
constant , and let and be defined as in Lemma C.2. Then,
and .
Proof.
The subsequent proof focuses on the case where and
; this corresponds to random effects estimation of the
untransformed model (see Lemma 1). It is readily observed from the proof that this also covers
the case where (fixed effects estimation of the untransformed model),
(random effects estimation of the transformed model), as well as (fixed effects
estimation of the transformed model).
[Mathematical expressions for Lemma C.3 not recoverable from the source extraction.]
Under the maintained assumptions there exists a with . By the
properties of the matrices the row and column sums of , are
uniformly bounded in absolute value by 1 and some finite constant respectively. For later
reference, also note that the elements of the vector are also uniformly bounded in
absolute value by c.
In the following, we ignore the division by (the fixed constant) T without consequences for
the proof. Denote the (r,s)-th element of the difference as . It
is given by
, , (C.21)
which can be written as , where
(C.22)
.
Next note that and thus
(C.23)
We next demonstrate that by showing that each summand ,
, invoking the following theorem (see, e.g., Resnik, 1999, p. 171): Let
) be real-valued random variables. Then, if and only if each subsequence
contains a further subsequence that converges almost surely to .
As we show below we will be confronted with terms of the form:
[Equations (C.21) to (C.24) not recoverable from the source extraction.]
where is a matrix whose row and column sums are uniformly bounded in absolute value
by some constant . It follows that the elements of the vector
(and also those of ) are uniformly bounded in absolute value
by some finite constant (and ). (See Remark A.1 in Appendix A.)
Without loss of generality, is chosen such that and hold.
Hence, Lemma C.2 applies and it follows that and that there exist random
variables such that .
Now, let the index denote some subsequence. In light of the aforementioned equivalence,
there exists a subsequence of this subsequence ( ) such that for events , with
, it holds that
, , , (C.25)
and that for some , and thus
, (C.26)
and finally
, where . (C.27)
In the following, assume that . Since , it follows from Horn and
Johnson (1985, p. 301) that is invertible and that
(C.28)
.
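The expansion used in (C.28) is a Neumann series: if a matrix norm of A = Σ_m ρ_m M_m is below one, then (I − A)⁻¹ = Σ_{l≥0} A^l. With row sums of each M_m bounded by one and Σ_m |ρ_m| < 1, the maximum-absolute-row-sum norm of A is below one. The Python sketch below is a hypothetical numerical illustration with made-up weight matrices; it checks that (I − A) times a truncated partial sum is close to the identity.

```python
def identity(n):
    """n x n identity matrix as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Dense matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def combined_weights(rhos, weight_matrices):
    """A = sum_m rho_m * M_m."""
    n = len(weight_matrices[0])
    a = [[0.0] * n for _ in range(n)]
    for rho, m in zip(rhos, weight_matrices):
        for i in range(n):
            for j in range(n):
                a[i][j] += rho * m[i][j]
    return a


def max_abs_row_sum(a):
    """Operator norm induced by the maximum-absolute-row-sum norm."""
    return max(sum(abs(x) for x in row) for row in a)


def neumann_partial_sum(a, terms):
    """I + A + A^2 + ... + A^terms, which approximates (I - A)^{-1} when ||A|| < 1."""
    total = identity(len(a))
    power = identity(len(a))
    for _ in range(terms):
        power = mat_mul(power, a)
        total = [[s + p for s, p in zip(srow, prow)] for srow, prow in zip(total, power)]
    return total


# Hypothetical row-normalized weights; |rho_1| + |rho_2| = 0.5 < 1.
M1 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
M2 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
A = combined_weights([0.3, 0.2], [M1, M2])
approx_inverse = neumann_partial_sum(A, terms=60)
i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(3)] for i in range(3)]
check = mat_mul(i_minus_a, approx_inverse)  # equals I - A^61, i.e. close to the identity
```

Since (I − A) Σ_{l=0}^{L} A^l = I − A^{L+1}, the truncation error shrinks geometrically at the rate of the norm of A, which is what makes term-by-term bounds on the series usable in the proof.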
Substituting into the expression for given by (C.22) yields
(C.29)
[Equation (C.29) and related expressions not recoverable from the source extraction.]
A single element with index (k,l) of this infinite double sum over k and l is given by
.
(C.30)
Next note that for any value of and any there exist matrices and ,
whose row and column sums are uniformly bounded in absolute value, such that:
and . (C.31)
and can thus be factored out of the sum, yielding
. (C.32)
By the same reasoning, for any values of and , there exists a matrix
, whose row and column sums are uniformly bounded in absolute value, such that:
. (C.33)
Substituting into the expression for , we obtain
(C.127)
.
Hence, we can write
[Equation (C.34) and the preceding displays not recoverable from the source extraction.]
where with
(C.35)
and
(C.36)
Note that as in light of the aforementioned results and thus
since for large enough. Moreover,
. (C.37)
Hence,
.
For , , such that
. (C.38)
Hence, there exists a dominating function for all values of k,l. Moreover, since
by construction, the dominating function is integrable (summable), i.e.,
. (C.39)
Hence the assumptions for application of Lebesgue’s Dominated Convergence Theorem are
fulfilled (see, e.g., Van der Vaart and Yen, 1968), such that
. (C.40)
The same holds for , . It follows that as and in light of
Resnik (1999) it follows that .
[Equations (C.35) to (C.40) and related expressions not recoverable from the source extraction.]
Thus, . That follows from the
properties maintained for the row and column sums of and the elements
of and .
Remark C.1
Regarding , note that and (obviously suppressing the
indexation of ), and accordingly for . By assumption , and
thus , where the dimension of is . Moreover, ,
and thus , where the dimension of is . By Lemma C.3, we
have . It follows that
.
Lemma C.4
Let and be defined as in (A.3a) and (A.3b). Suppose that Assumptions 1-4 hold
and that the elements of have bounded fourth moments. It follows that
, , and thus , and that ,
and thus .
Proof.
The subsequent proof builds on Gilbert (2002), who considers the estimation of third and
fourth moments in homoskedastic error component models without spatial lags of the
dependent variable (or other endogenous variables) and without spatial autoregressive
disturbances.
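The logic of such moment estimators can be checked directly for the composite error ε_it = μ_i + v_it, assuming μ_i and v_it are mutually independent with mean zero, v_it is independent over t, and the moments of μ_i do not depend on i (N subscripts dropped). For s ≠ t, products across different periods isolate the moments of μ_i:

$$
\begin{aligned}
E(\varepsilon_{is}\varepsilon_{it}^{2}) &= E\mu_i^{3},\\
E(\varepsilon_{is}\varepsilon_{it}^{3}) &= E\mu_i^{4}+3E\mu_i^{2}\,Ev_{it}^{2},\qquad E(\varepsilon_{is}\varepsilon_{it})=E\mu_i^{2},\qquad E(\varepsilon_{it}^{2})=E\mu_i^{2}+Ev_{it}^{2},\\
E\mu_i^{4} &= E(\varepsilon_{is}\varepsilon_{it}^{3})-3\big[E(\varepsilon_{is}\varepsilon_{it})\,E(\varepsilon_{it}^{2})-\{E(\varepsilon_{is}\varepsilon_{it})\}^{2}\big].
\end{aligned}
$$

Because only cross-period products enter, skewness or kurtosis of v_it does not contaminate these moments of μ_i. Whether the population moments in (C.41a) and (C.44a) take exactly this form is an assumption of this sketch.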
Consider the third moment of and its estimate:
for any given i and , and (C.41a)
. (C.41b)
By Assumption 1, is invariant to the choice of i, s and t. Using (C.1), we have
(C.42)
[Mathematical expressions for (C.41a) to (C.42) not recoverable from the source extraction.]
Consider
(C.43a)
.
By the weak law of large numbers converges in probability to . Notice further that,
by the properties of and (see Assumption 1), , and are all
. As a consequence, converges in probability to .
Next observe that
(C.43b)
,
(C.43c)
,
(C.43d)
,
[Equations (C.43a) to (C.43f) not recoverable from the source extraction.]
because is and the bracketed terms are all , since
and for and all N. It follows that
, by Assumption 1, and that . Obviously, we then
also have that .
Consider next the fourth moment of and its estimate:
for any given i and , (C.44a)
(C.44b)
.
Observe that
(C.45)
[Mathematical expressions for (C.44a) to (C.45) not recoverable from the source extraction.]
The first term can also be written as
(C.46)
.
By the properties of and (see Assumption 1), the difference between and
converges in probability to zero by the weak law of large numbers
for i.d. random variables (White, 2001, p. 37, Corollary 3.9).
Moreover, it follows from the properties of and (see Assumption 1), that the terms
are all . It follows that the difference between
and converges in probability to zero.
Next consider
(C.47)
,
which converges to by the weak law of large numbers, since
for by the properties of and and the sum
over the remainder terms appearing in are by arguments analogous to those for
and (see (C.43e) and (C.43f)). Finally, the difference between
and converges in probability to zero. As a
consequence, , by Assumption 1, and .
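The third-moment estimator averages ε_is ε²_it over units and over all ordered pairs of distinct periods, i.e. it divides by N T (T − 1). The Python sketch below illustrates why this works, under the simplifying assumption that the true composite errors ε_it = μ_i + v_it are observed (instead of residuals), with μ_i and v_it independent and mean zero and v_it independent over t, so that E(ε_is ε²_it) = E μ_i³ for s ≠ t. The distributions are hypothetical: μ_i equals 2 with probability 1/3 and −1 otherwise (so E μ_i³ = 2), and v_it is a skewed, heteroskedastic shifted exponential. A naive average of ε³_it is shown for contrast, since it also picks up E v³_it.

```python
import random


def third_moment_of_effect(eps):
    """(1/(N T (T-1))) * sum_i sum_s sum_{t != s} eps_is * eps_it**2.

    Uses sum_{s != t} eps_is = (sum_s eps_is) - eps_it to avoid a double loop.
    """
    n = len(eps)
    t_len = len(eps[0])
    total = 0.0
    for row in eps:
        row_sum = sum(row)
        total += sum(e * e * (row_sum - e) for e in row)
    return total / (n * t_len * (t_len - 1))


def naive_third_moment(eps):
    """Plain average of eps_it**3, which also picks up E v_it^3."""
    values = [e ** 3 for row in eps for e in row]
    return sum(values) / len(values)


def simulate(n_units=50000, t_len=4, seed=7):
    rng = random.Random(seed)
    eps = []
    for _ in range(n_units):
        # Skewed, mean-zero effect: 2 w.p. 1/3 and -1 w.p. 2/3, so E mu^3 = 2.
        mu = 2.0 if rng.random() < 1.0 / 3.0 else -1.0
        # Skewed, heteroskedastic idiosyncratic errors: sd_t * (Exp(1) - 1).
        eps.append([mu + (0.3 + 0.2 * t) * (rng.expovariate(1.0) - 1.0)
                    for t in range(t_len)])
    return eps


eps = simulate()
estimate = third_moment_of_effect(eps)
naive = naive_third_moment(eps)
```

Here the cross-period estimator stays close to 2, whereas the naive average is pulled upward by the skewness of v_it (its expectation is 2 + 2·avg(sd³) ≈ 2.6 in this design).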
[Mathematical expressions for (C.46) and (C.47) not recoverable from the source extraction.]
Lemma C.5
Suppose Assumptions 1-4 hold. Let with ,
with real, nonstochastic, and symmetric matrices, whose elements
are time-invariant ( ), whose diagonal elements are zero ( for
), and whose row and column sums are bounded uniformly in absolute value. Let
with and
, and where the vector can be any estimator that satisfies
. Finally, define
.
Then, we have , , and .
Proof.
Note that
, where (C.48a)
,
since for and . The corresponding expression based on the fixed effects
residuals is given by
where . (C.48b)
Since and are independent for all ,
(C.49)
,
which suggests the following bias-corrected estimator:
, where (C.50)
[Mathematical expressions for Lemma C.5 and equations (C.48a) to (C.50) not recoverable from the source extraction.]
with .
To show that , we next demonstrate that and
.
Consider
(C.51)
,
using and . Note that and that
for . Next, define the vector . By Assumption 1,
and the row and column sums of the variance-covariance matrix
are bounded uniformly in absolute value. Next rewrite
, and note that
(C.52)
,
such that we have by Chebychev’s inequality.
Next note that and consider
(C.53)
,
with
,
,
.
[Mathematical expressions for (C.51) to (C.53) not recoverable from the source extraction.]
By Lemma C.1 we have . In light of the maintained
assumptions regarding the properties of and , it follows that , ,
, and thus .
Summing up . Finally, , such that
, which completes the proof.
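The bias in Lemma C.5 comes from demeaning: for each unit, Σ_t E ṽ²_it = ((T−1)/T) Σ_t σ²_it, so for i ≠ j (independent units, time-invariant weights a_ij, zero diagonal) the expected value of Σ_t ṽ²_it Σ_s ṽ²_js is ((T−1)/T)² Σ_t σ²_it Σ_s σ²_js. Rescaling by T²/(T−1)² removes this bias, which is consistent with the correction in (C.50) as far as it can be read from the source. The Python sketch below checks this by simulation; the weights a_ij = 1/(N−1), the 1/(NT) normalization, and the variance pattern are hypothetical choices, and true errors are used in place of residuals.

```python
import random


def demeaned_square_sums(rows):
    """S_i = sum_t (v_it - vbar_i)^2 for each cross-section unit i."""
    sums = []
    for row in rows:
        mean = sum(row) / len(row)
        sums.append(sum((x - mean) ** 2 for x in row))
    return sums


def pairwise_statistic(unit_values, t_len):
    """(1/(N T)) * sum_{i != j} a_ij * s_i * s_j with hypothetical a_ij = 1/(N-1)."""
    n = len(unit_values)
    total = sum(unit_values)
    cross = total * total - sum(x * x for x in unit_values)  # removes the i == j terms
    return cross / ((n - 1) * n * t_len)


def simulate(n_units=40000, t_len=5, seed=2014):
    rng = random.Random(seed)
    rows, variance_sums = [], []
    for i in range(n_units):
        sds = [(1.0 + 0.5 * (i % 3)) * (0.5 + 0.2 * t) for t in range(t_len)]
        effect = rng.gauss(0.0, 1.0)  # fixed effect, removed by demeaning
        rows.append([effect + rng.gauss(0.0, sd) for sd in sds])
        variance_sums.append(sum(sd * sd for sd in sds))
    target = pairwise_statistic(variance_sums, t_len)
    naive = pairwise_statistic(demeaned_square_sums(rows), t_len)
    corrected = t_len ** 2 / (t_len - 1) ** 2 * naive  # factor T^2 / (T-1)^2
    return target, naive, corrected


target, naive, corrected = simulate()
```

With T = 5 the uncorrected statistic is close to 0.64 times its target, and the rescaled version is close to the target itself.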
Lemma C.6a
Suppose Assumptions 1-4 hold; in addition, assume that for all and
. Let , where the nonstochastic, time-invariant
scalars are bounded uniformly in absolute value. Let
with and
, and where the vector can be any estimator that satisfies
. Finally, define
, where
,
,
, ,
, .
Then, we have , , and .
Proof.
Consider
with . (C.54a)
The corresponding expression based on the fixed effects residuals is given by
[Mathematical expressions for Lemma C.6a and equation (C.54a) not recoverable from the source extraction.]
with . (C.54b)
Substituting for , simplifying (exploiting the independence of and for or
), and collecting terms, we obtain, for each , that
, where (C.55)
,
and .
Since the correction term is also based on original rather than demeaned residuals,
another bias correction for is required. Analogous derivations yield the result that
, with (C.56)
,
and .
Substituting (C.56) into (C.55), averaging over , and solving for yields the
following bias-corrected estimator for :
, where (C.57)
and .
We next show that , considering each summand in (C.57). By the weak law
of large numbers,
, (C.58a)
given that , since the eighth moments of (and thus also those of ) are
finite. Using the triangle inequality and the results in Lemma C.1, we have
(C.58b)
,
[Mathematical expressions for (C.55) to (C.58b) not recoverable from the source extraction.]
with , , , and
. It is readily verified that for under
the maintained assumptions. As an example, consider the case . Using for
some , the triangle inequality, and Hölder’s inequality, we have
(C.58c)
,
since , , , and .
It follows that and thus .
Next consider . Again, under the maintained assumptions,
, (C.59a)
and thus by the weak law of large numbers.
Using the triangle inequality and the results in Lemma C.1, we have
(C.59b)
,
with , , ,
, , ,
, .
Consider . Substituting for , using the triangle inequality and the
generalized Hölder inequality, we obtain – for each of the terms with
(C.59c)
since with , , ,
, , and .
[Displayed formulas for (C.58c) through (C.59c) are not recoverable from the source extraction.]
By analogous arguments, the other terms involving to can be shown to be
under the maintained assumptions. It follows that , and thus
.
This completes the proof, recognizing that under the maintained assumptions.
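Lemma C.6a concludes that the bias-corrected estimator built from fixed effects residuals is consistent. A small Monte Carlo experiment can illustrate this in a simplified setting in which, for each i, the errors are i.i.d. over t (an assumption made only for this sketch; the lemma allows heteroskedasticity). Individual effects are added to the errors and removed again by within-demeaning, and the corrected estimate of μ4 = 6 is compared with the naive average of demeaned fourth powers. The effect and error distributions, the sample sizes, and the names coefficients and simulate are illustrative choices, and the weights are derived for this illustration rather than taken from the paper.

```python
import random


def coefficients(T):
    """Weights derived for this illustration, assuming i.i.d. errors over t:
    E[A] = a0*mu4 + a1*q and E[B] = b0*mu4 + b1*q, with q = sigma^4."""
    a0 = (T - 1) * (T**2 - 3 * T + 3) / T**3
    a1 = 3 * (T - 1) * (2 * T - 3) / T**3
    b0 = (2 * T - 3) / T**3
    b1 = (T**3 - 2 * T**2 - 3 * T + 9) / T**3
    return a0, a1, b0, b1


def simulate(N, T, seed=0):
    """Return (naive, corrected) estimates of mu4 from within-demeaned panels.

    Hypothetical design: y_it = effect_i + v_it, with v_it equal to -2 with
    probability 1/3 and +1 with probability 2/3 (mean 0, sigma^2 = 2, mu4 = 6).
    """
    rng = random.Random(seed)
    a_sum = 0.0
    b_sum = 0.0
    for _ in range(N):
        effect = rng.gauss(0.0, 5.0)  # individual effect, removed by demeaning
        y = [effect + (-2.0 if rng.random() < 1 / 3 else 1.0) for _ in range(T)]
        ybar = sum(y) / T
        d = [val - ybar for val in y]  # within (fixed effects) residuals
        s2 = sum(x * x for x in d)
        s4 = sum(x**4 for x in d)
        a_sum += s4 / T  # average of d_t^4 over t
        b_sum += (s2 * s2 - s4) / (T * (T - 1))  # average of d_t^2 d_r^2, t != r
    a_hat, b_hat = a_sum / N, b_sum / N
    a0, a1, b0, b1 = coefficients(T)
    corrected = (b1 * a_hat - a1 * b_hat) / (a0 * b1 - a1 * b0)
    return a_hat, corrected


for N in (500, 5000, 50000):
    naive, corrected = simulate(N, T=5)
    print(N, round(naive, 3), round(corrected, 3))
```

With T = 5 the naive average settles near a0*6 + a1*4 = 5.184 rather than 6, whereas the corrected estimate approaches 6 as N grows.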
Lemma C.6b
Suppose Assumptions 1-4 hold; assume further that , i.e., there is cross-
sectional heteroskedasticity only in (but no heteroskedasticity over time). Let
and define
,
where as well as and are as in Lemma C.6a.
Then, we have , , and .
Proof.
Notice first that
. (C.60a)
Under the maintained assumptions, this can be written equivalently as the following
(estimable) expression:
, (C.60b)
where .
Next, observe that is equal to as defined in the proof of Lemma C.6a. Substituting
(C.55) into (C.56), solving for , and averaging over , the bias corrected estimator
given in Lemma C.6b is obtained. That and were
already shown in the proof of Lemma C.6a.
[Displayed formulas for the end of the proof of Lemma C.6a and for Lemma C.6b are not recoverable from the source extraction.]
Remark C.2
If is in fact heteroskedastic over both cross-sections and time, the error made by the
approximation in Lemma C.6b is given by
.
Hence, can be assumed to be small for small T and when heteroskedasticity is mainly of
the cross-section type (or random over time).
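Remark C.2 argues that the approximation error is small when heteroskedasticity varies little over time. An elementary identity conveys the intuition; it is offered as an illustration, not as the exact error term stated in the remark. For one cross-sectional unit, write x_t = σ_it^2. The average of x_t^2 over t minus the average of the cross-products x_t x_s over s ≠ t equals the sample variance of x_t with divisor T - 1, so the gap vanishes when σ_it^2 is constant over t. The variance paths and function names below are made up.

```python
from fractions import Fraction


def heteroskedasticity_gap(sig2):
    """Average of sig2_t**2 minus the average cross-product sig2_t * sig2_s, s != t."""
    T = len(sig2)
    own = sum(x * x for x in sig2) / T
    total = sum(sig2)
    # sum_{t != s} x_t x_s = (sum_t x_t)^2 - sum_t x_t^2
    cross = (total * total - sum(x * x for x in sig2)) / (T * (T - 1))
    return own - cross


def sample_variance(xs):
    """Sample variance with divisor T - 1."""
    T = len(xs)
    mean = sum(xs) / T
    return sum((x - mean) ** 2 for x in xs) / (T - 1)


# Hypothetical variance paths sigma_it^2 over t for one cross-sectional unit.
constant = [Fraction(2)] * 5
trending = [Fraction(k, 2) for k in range(1, 6)]
for path in (constant, trending):
    print(heteroskedasticity_gap(path), sample_variance(path))
```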
Lemma C.7
Suppose the assumptions of Lemma C.6a hold. Let , where the
nonstochastic scalars are bounded uniformly in absolute value. Define
, where
,
,
and .
Then, we have , , and .
Proof.
Consider
with . (C.61a)
The corresponding expression based on the fixed effects residuals is given by
with . (C.61b)
Substituting for , simplifying (exploiting the independence of and for or
), and rearranging terms, we obtain that – for each
, where (C.62)
and .
[Displayed formulas for Remark C.2 and for Lemma C.7 through (C.62) are not recoverable from the source extraction.]
Since the correction term is also based on the original rather than the demeaned residuals,
another bias correction is required as well. Analogous derivations yield the result that
, (C.63)
such that
, where
,
.
Averaging over , we obtain the following bias corrected estimator for :
, where (C.64)
, and
.
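Lemma C.7 corrects third moments in the same way that Lemma C.6a corrects fourth moments. In a simplified setting where v_t is i.i.d. over t with mean zero and third moment μ3 (an assumption made only for this sketch), demeaning scales any weighted sum of cubed errors by a single factor: E[sum_t c_t (v_t - vbar)^3] = ((T - 1)(T - 2) / T^2) μ3 sum_t c_t. The factor is zero for T = 2, where the two demeaned errors are mirror images. The weights and function names are hypothetical, and the factor is derived for this illustration rather than taken from (C.62) to (C.64).

```python
from fractions import Fraction
from itertools import product


def third_moment_factor(T):
    """Scaling of E[(v_t - vbar)^3] relative to mu3 under i.i.d. errors over t."""
    return Fraction((T - 1) * (T - 2), T * T)


def exact_weighted_cubes(c, dist):
    """Exact E[sum_t c_t (v_t - vbar)^3] by enumerating all outcomes of
    len(c) i.i.d. draws from dist = {value: prob}."""
    T = len(c)
    total = Fraction(0)
    for draw in product(dist, repeat=T):
        prob = Fraction(1)
        for x in draw:
            prob *= dist[x]
        mean = sum(draw) / T
        total += prob * sum(ct * (x - mean) ** 3 for ct, x in zip(c, draw))
    return total


dist = {Fraction(-2): Fraction(1, 3), Fraction(1): Fraction(2, 3)}  # mean 0
mu3 = sum(p * x**3 for x, p in dist.items())
for T in (2, 3, 4, 5):
    c = [Fraction(t + 1) for t in range(T)]  # hypothetical weights c_t
    lhs = exact_weighted_cubes(c, dist)
    rhs = third_moment_factor(T) * mu3 * sum(c)
    print(T, lhs, rhs)
```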
The proof that is very similar to that in Lemma C.6a and is thus omitted for
the sake of brevity. Finally, suppose that can be written as a quadratic form
with and ; then
, and with
.
Remark C.3
Note that and . Accounting for the
definition of , can be written as the sum of the two expressions and
, where ( ) is an vector made up of the main diagonal elements
of the matrix ( ). Next, observe that and
(obviously suppressing the indexation of ). By assumption ,
and thus , where the dimension of is . Moreover, ,
and thus , where the dimension of is . By arguments
analogous to those in Lemma C.3, we have . It
follows that . By the same
reasoning, , from which it follows that
.
[Displayed formulas for (C.63), (C.64), and Remark C.3 are not recoverable from the source extraction.]