
Department of Economics

Working Paper No. 173

Fixed Effects and Random Effects Estimation of Higher-Order Spatial

Autoregressive Models with Spatial Autoregressive and Heteroskedastic

Disturbances

Harald Badinger Peter Egger

April 2014

Harald Badinger

Department of Economics, Vienna University of Economics and Business;

Austrian Institute of Economic Research (WIFO)

Peter Egger

Department of Management, Technology, and Economics at ETH Zürich; CEPR

April 2014

Abstract: This paper develops a unified framework for fixed and random effects estimation

of higher-order spatial autoregressive panel data models with spatial autoregressive

disturbances and heteroskedasticity of unknown form in the idiosyncratic error component.

We derive the moment conditions and optimal weighting matrix without distributional

assumptions for a generalized moments (GM) estimation procedure of the spatial

autoregressive parameters of the disturbance process and define both a random effects and a

fixed effects spatial generalized two-stage least squares estimator for the regression

parameters of the model. We prove consistency of the proposed estimators and derive their

joint asymptotic distribution, which is robust to heteroskedasticity of unknown form in the

idiosyncratic error component. Finally, we derive a robust Hausman-test of the spatial random

against the spatial fixed effects model.

JEL-code: C13, C21, C23

Keywords: Higher-order spatial dependence; Generalized moments estimation;

Heteroskedasticity; Two-stage least squares; Asymptotic statistics

I. Introduction

This paper considers the estimation of panel data models with higher-order spatially autocorrelated error components and spatially autocorrelated dependent variables. Spatial interactions in data may originate from various sources, such as strategic interaction between jurisdictions (to attract firms or other mobile agents) and between firms (in their price, quantity, or quality setting), or general equilibrium effects which disseminate with spatial decay due to their transmission through trade flows, migration, or input-output relationships.1 Data sets used in empirical studies often share three features: first, they are available in the form of panel data, with a large cross-sectional and a small time series dimension; second, spatial interactions of various kinds co-exist (such as geography-related, trade-related, and migration-related interactions), or the decay function of a single spatial interaction is unknown; third, it is unclear whether spatial interactions are local, affecting only immediate neighbors, or global, affecting second-order, third-order, and more distant neighbors with repercussions. The estimator proposed here addresses these three features in a unified framework. It allows for panel data with a fixed but arbitrary number of channels or decay segments of spatial interaction in both the error components and the dependent variable, referred to as SARAR(R,S).

Estimation and testing of both random and fixed effects spatial regressive panel data models

with homoskedastic error terms have been considered in the recent literature using a maximum

likelihood framework (Baltagi, Song, and Koh, 2003; Lee and Yu, 2010) or a generalized

moments approach (Kapoor, Kelejian and Prucha, 2007; Mutl and Pfaffermayr, 2011). The

present paper builds on Kapoor et al. (2007). They propose a generalized moments (GM)

estimator for the parameters of the spatial regressive error process in a homoskedastic random

effects panel data model without endogenous explanatory variables (such as spatial lags of the

dependent variable), derive a simplified weighting matrix for the moment conditions under

the assumption of normally and identically distributed error components, and prove

consistency of the GM estimates. They also establish the asymptotic distribution of the

feasible generalized least squares (FGLS) estimates of the parameters of the exogenous

regressors.

The present paper extends and generalizes the analysis in Kapoor et al. (2007) in several respects. First, we allow the explanatory variables to be related to the time-invariant error component, i.e., we provide an estimation framework that nests both the fixed and random effects setup. Second, we allow for higher-order rather than only first-order spatial regressive processes in both the dependent variable and the error process, enabling a more flexible design and specification tests of the ‘spatial’ interdependence decay function.2 Third, we allow for endogenous variables, including spatial lags of the dependent variable in the main equation, which is shown to affect the optimal weighting matrix for the moment conditions as well as the distribution of the GM estimates. Fourth, we not only prove consistency of the estimates of the model parameters but also derive their joint asymptotic distribution (which is affected by the presence of endogenous variables in a nontrivial way). Fifth, we dispense with the assumption of normally distributed error components, used by Kapoor et al. (2007) to derive a simplified weighting matrix of the moments. In particular, we relax the restrictive assumption that the idiosyncratic errors are identically distributed and allow for heteroskedasticity of arbitrary form over cross-sectional units and time in the idiosyncratic error terms. Under these assumptions, we derive a robust variance-covariance matrix, drawing on recent results by Stock and Watson (2008). We emphasize that, in the framework of the present paper, the advantage of the GM approach over maximum likelihood (ML) estimation goes beyond that of imposing less restrictive distributional assumptions and computational simplicity, since ML yields inconsistent parameter estimates in the SARAR(R,S) framework with heteroskedasticity of unknown form (see Lee and Yu, 2010). Sixth, we derive a Hausman test that allows testing the spatial fixed effects against the random effects model in the presence of heteroskedasticity. Seventh and finally, we provide some limited Monte Carlo evidence on the small sample performance of the proposed estimation procedures. In sum, this provides a fairly flexible framework for applied work, allowing specification tests, estimation, and inference in random and fixed effects panel data models with potentially higher-order cross-sectional interdependence and heteroskedasticity.

1 See Cliff and Ord (1973, 1981), Anselin (1988), and Cressie (1993) for classic references about spatial econometric models in general. Recent theoretical contributions of spatial panel data models include Baltagi, Song, and Koh (2003), Baltagi, Song, Jung, and Koh (2007), Kapoor et al. (2007), Baltagi, Egger, and Pfaffermayr (2008), and Lee and Yu (2008). Recent applications of spatial panel data models include Arbia, Basile, and Piras (2005), Egger, Pfaffermayr, and Winner (2005), Baltagi, Egger, and Pfaffermayr (2007), and Badinger and Egger (2009).

The remainder of the paper is organized as follows. Section II introduces the basic model

specification, discusses the fixed versus the random effects model, and provides an overview

of the key assumptions of the proposed estimation procedure. Section III proposes GM

estimators for the parameters of spatial dependence in the error components. Section IV

derives a two-stage least squares (TSLS) and spatial generalized TSLS procedure for

estimation of the regression parameters of the model and derives a joint heteroskedasticity-

robust asymptotic variance-covariance matrix of the GM and TSLS estimates of the model

parameters. Section V derives a consistent estimator of the variance-covariance matrix.

Section VI proposes a Hausman-type test of the random versus the fixed effects model.

Section VII presents results of a Monte Carlo simulation exercise. Section VIII summarizes

our main findings and concludes. The detailed proofs are relegated to a technical appendix.

2 In a cross-sectional framework, estimation of higher order spatial regressive models is

considered by Lee and Liu (2010) under homoskedasticity and by Badinger and Egger (2008)

under heteroskedasticity.

II. The Basic Model

1. Specification and Key Assumptions

We consider an R-th order spatial regressive panel data model with S-th order spatial regressive error components, referred to as the SARAR(R,S) panel data error components model. The basic model comprises $i = 1, \dots, N$ cross-sectional units and $t = 1, \dots, T$ time periods. Throughout, subscript N indicates that the variables or parameters are allowed to depend on sample size. For time period t, the model reads

$$\mathbf{y}_{t,N} = \mathbf{X}_{t,N}\boldsymbol{\beta}_N + \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\mathbf{y}_{t,N} + \mathbf{u}_{t,N}, \quad \text{or} \tag{1a}$$

$$\mathbf{y}_{t,N} = \mathbf{Z}_{t,N}\boldsymbol{\delta}_N + \mathbf{u}_{t,N}, \tag{1b}$$

where $\mathbf{y}_{t,N}$ is an $N \times 1$ vector with cross-sectional observations of the dependent variable in year t, $\mathbf{X}_{t,N}$ is an $N \times K$ matrix of observations on K non-stochastic explanatory variables, i.e., $\mathbf{X}_{t,N} = (\mathbf{x}_{1t,N}', \dots, \mathbf{x}_{Nt,N}')'$, where each of the N vectors $\mathbf{x}_{it,N} = (x_{1,it,N}, \dots, x_{K,it,N})$ is of dimension $1 \times K$, containing the observations on the K explanatory variables for cross-section i and period t. For later reference, define the $T \times K$ matrix $\mathbf{X}_{i,N} = (\mathbf{x}_{i1,N}', \dots, \mathbf{x}_{iT,N}')'$ as observations on the K explanatory variables for cross-section i and all periods $t = 1, \dots, T$.

The structure of spatial dependence in $\mathbf{y}_{t,N}$ is determined by the time-invariant $N \times N$ matrices $\mathbf{W}_{r,N}$, $r = 1, \dots, R$, whose elements $w_{ij,r,N}$ are assumed to be known and will often (but need not) be specified as a decreasing function of geographical distance between the cross-sectional units i and j. The expression $\bar{\mathbf{y}}_{t,r,N} = \mathbf{W}_{r,N}\mathbf{y}_{t,N}$ is referred to as the r-th spatial lag of $\mathbf{y}_N$. The specification of a higher-order process allows the strength of spatial interdependence in the dependent variable (reflected in the spatial autoregressive parameters $\lambda_{r,N}$, $r = 1, \dots, R$) to vary across a fixed number of R subsets of relations between cross-sectional units.

In equation (1b), the $N \times (K+R)$ design matrix is given by $\mathbf{Z}_{t,N} = (\mathbf{X}_{t,N}, \bar{\mathbf{Y}}_{t,N})$, with $\bar{\mathbf{Y}}_{t,N} = [\bar{\mathbf{y}}_{t,1,N}, \dots, \bar{\mathbf{y}}_{t,R,N}]$, and $\boldsymbol{\delta}_N = (\boldsymbol{\beta}_N', \boldsymbol{\lambda}_N')'$, where the $K \times 1$ parameter vector of the exogenous variables is given by $\boldsymbol{\beta}_N = (\beta_{1,N}, \dots, \beta_{K,N})'$ and the $R \times 1$ vector of spatial autoregressive parameters of $\mathbf{y}_N$ is defined as $\boldsymbol{\lambda}_N = (\lambda_{1,N}, \dots, \lambda_{R,N})'$.

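The $N \times 1$ vector of error terms $\mathbf{u}_{t,N} = (u_{1t,N}, \dots, u_{Nt,N})'$ is assumed to follow a spatial autoregressive process given by

$$\mathbf{u}_{t,N} = \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\mathbf{u}_{t,N} + \boldsymbol{\varepsilon}_{t,N}, \tag{1c}$$

$$\boldsymbol{\varepsilon}_{t,N} = \boldsymbol{\mu}_N + \mathbf{v}_{t,N}, \tag{1d}$$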

where $\rho_{s,N}$ and $\mathbf{M}_{s,N}$ denote the time-invariant, unknown parameters and the known $N \times N$ matrix of spatial interdependence, respectively. The structure of spatial correlation in the disturbances is determined by the S different, time-invariant $N \times N$ matrices $\mathbf{M}_{s,N}$. As in equation (1a), the specification of a higher-order process allows the strength of spatial interdependence in the disturbances (reflected in the parameters $\rho_{s,N}$, $s = 1, \dots, S$) to vary across a fixed number of S subsets of relations between cross-sectional units. This enables a more flexible parameterization of the decay of spatial dependence than with a first-order process along two lines: by capturing more than just one channel of interdependence and by allowing for estimation of several parameters $\rho_{s,N}$ for S segments of the decay function (e.g., rings of neighbors or segments of distance). The expression $\bar{\mathbf{u}}_{t,s,N} = \mathbf{M}_{s,N}\mathbf{u}_{t,N}$ is referred to as the s-th spatial lag of $\mathbf{u}_N$. The $S \times 1$ vector of the spatial autoregressive parameters of $\mathbf{u}_{t,N}$ is defined as $\boldsymbol{\rho}_N = (\rho_{1,N}, \dots, \rho_{S,N})'$.

Finally, the $N \times 1$ vector of error terms $\boldsymbol{\varepsilon}_{t,N}$ consists of two error components, a cross-section specific, time-invariant error component $\boldsymbol{\mu}_N$ and an idiosyncratic error component $\mathbf{v}_{t,N}$, which is specific to both the cross-sectional unit and the time period. The typical elements of $\boldsymbol{\varepsilon}_{t,N}$ and $\mathbf{v}_{t,N}$ are the scalars $\varepsilon_{it,N}$ and $v_{it,N}$, respectively, and the $N \times 1$ vector of unit-specific error components is given by $\boldsymbol{\mu}_N = (\mu_{1,N}, \dots, \mu_{N,N})'$.

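Stacking observations for all time periods such that t is the slow index and i is the fast index with all vectors and matrices, the model reads

$$\mathbf{y}_N = \mathbf{X}_N\boldsymbol{\beta}_N + \bar{\mathbf{Y}}_N\boldsymbol{\lambda}_N + \mathbf{u}_N, \quad \text{or} \tag{2a}$$

$$\mathbf{y}_N = \mathbf{Z}_N\boldsymbol{\delta}_N + \mathbf{u}_N, \tag{2b}$$

with the $NT \times K$ regressor matrix $\mathbf{X}_N = (\mathbf{X}_{1,N}', \dots, \mathbf{X}_{T,N}')'$, and $\bar{\mathbf{Y}}_N = (\bar{\mathbf{y}}_{1,N}, \dots, \bar{\mathbf{y}}_{R,N})$, where $\bar{\mathbf{y}}_{r,N} = (\bar{\mathbf{y}}_{1,r,N}', \dots, \bar{\mathbf{y}}_{T,r,N}')'$ is the $NT \times 1$ vector of observations on the r-th spatial lag of the dependent variable. The $NT \times 1$ vector of disturbances $\mathbf{u}_N = (\mathbf{u}_{1,N}', \dots, \mathbf{u}_{T,N}')'$ for the spatial autoregressive process of order S is given by

$$\mathbf{u}_N = \sum_{s=1}^{S}\rho_{s,N}(\mathbf{I}_T \otimes \mathbf{M}_{s,N})\mathbf{u}_N + \boldsymbol{\varepsilon}_N, \tag{2c}$$

where $\mathbf{I}_T$ is an identity matrix of dimension $T \times T$. The $NT \times 1$ vector $\boldsymbol{\varepsilon}_N = (\boldsymbol{\varepsilon}_{1,N}', \dots, \boldsymbol{\varepsilon}_{T,N}')'$ is specified as

$$\boldsymbol{\varepsilon}_N = (\mathbf{e}_T \otimes \mathbf{I}_N)\boldsymbol{\mu}_N + \mathbf{v}_N, \tag{3a}$$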

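where $\mathbf{e}_T$ is a $T \times 1$ vector of ones and $\mathbf{I}_N$ is an identity matrix of dimension $N \times N$. In light of (2c), the error term can also be written as

$$\boldsymbol{\varepsilon}_N = \mathbf{u}_N - \sum_{s=1}^{S}\rho_{s,N}(\mathbf{I}_T \otimes \mathbf{M}_{s,N})\mathbf{u}_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\Big)\Big]\mathbf{u}_N. \tag{3b}$$

It follows that

$$\mathbf{u}_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\Big)\Big]^{-1}\boldsymbol{\varepsilon}_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\Big)\Big]^{-1}\big[(\mathbf{e}_T \otimes \mathbf{I}_N)\boldsymbol{\mu}_N + \mathbf{v}_N\big], \quad \text{and} \tag{4a}$$

$$\mathbf{y}_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\Big)\Big]^{-1}\mathbf{X}_N\boldsymbol{\beta}_N + \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\Big)\Big]^{-1}\mathbf{u}_N. \tag{4b}$$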
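The reduced forms (4a) and (4b) can be illustrated with a short simulation. The following is a minimal sketch, not the authors' code: the ring-shaped weights, the normal draws, and all function names are assumptions made purely for illustration, and data are stacked with t as the slow index and i as the fast index.

```python
import numpy as np

def ring_weights(N, k):
    """Hypothetical weights: unit i is linked to the units k steps ahead and behind on a circle."""
    W = np.zeros((N, N))
    idx = np.arange(N)
    W[idx, (idx + k) % N] = 1.0
    W[idx, (idx - k) % N] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # row-normalize; diagonal stays zero for 0 < k < N

def simulate_sarar(X, beta, lams, Ws, rhos, Ms, sigma_mu, sigma_v, rng):
    """Draw (y, u, eps) from the reduced forms (4a)-(4b)."""
    N = Ws[0].shape[0]
    T = X.shape[0] // N
    mu = sigma_mu * rng.standard_normal(N)       # unit-specific component
    v = sigma_v * rng.standard_normal(N * T)     # sigma_v may be a length-NT vector (heteroskedastic)
    eps = np.kron(np.ones(T), mu) + v            # (3a): (e_T kron I_N) mu + v
    A_rho = np.eye(N) - sum(r * M for r, M in zip(rhos, Ms))
    A_lam = np.eye(N) - sum(l * W for l, W in zip(lams, Ws))
    u = np.linalg.solve(np.kron(np.eye(T), A_rho), eps)           # (4a)
    y = np.linalg.solve(np.kron(np.eye(T), A_lam), X @ beta + u)  # (4b)
    return y, u, eps
```

Solving the linear systems rather than forming explicit inverses is a design choice for numerical stability; for large N one would exploit the block-diagonal structure instead of building the full Kronecker product.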

A more general specification of (4a) would allow the spatial regressive parameters (and possibly the weighting matrices) associated with the two error components $\boldsymbol{\mu}_N$ and $\mathbf{v}_N$ to differ as in Baltagi, Egger, and Pfaffermayr (2009). With a higher order process as considered in the present paper, such a specification would be both difficult to identify and computationally involved. Hence we assume the pattern of the spatial regressive disturbance process to be the same for $\boldsymbol{\mu}_N$ and $\mathbf{v}_N$ as in Kapoor et al. (2007).

2. Key Assumptions

As is standard in the spatial econometric panel data literature, we assume that the explanatory variables collected in $\mathbf{X}_N$ are nonstochastic with elements that are bounded uniformly in absolute value.3 Without loss of generality we further assume that each explanatory variable changes over time, at least for some cross-section i. (Under random effects estimation, this assumption could be relaxed in a straightforward way without invalidating the asymptotic results.) Beyond those, the following assumptions are maintained throughout this paper.

Assumption 1.

Let T be a fixed positive integer. (a) For all $1 \le t \le T$ and $1 \le i \le N$, $N \ge 1$, the error components $v_{it,N}$ are (mutually) independently distributed with $E(v_{it,N}) = 0$, $E(v_{it,N}^2) = \sigma_{v,it}^2$, where $0 < \sigma_{v,it}^2 < \infty$, and $E|v_{it,N}|^{4+\eta} < \infty$ for some $\eta > 0$. Hence, the idiosyncratic disturbances exhibit heteroskedasticity of unknown form.

(b) For all $1 \le i \le N$, $N \ge 1$, the unit-specific error components $\mu_{i,N}$ are identically and (mutually) independently distributed with $E(\mu_{i,N}^2) = \sigma_\mu^2$, where $0 < \sigma_\mu^2 < b < \infty$, and $E|\mu_{i,N}|^{4+\eta} < \infty$ for some $\eta > 0$. Following Mundlak (1978), it is assumed that $\mu_{i,N} = \mathbf{x}_{it,N}\boldsymbol{\pi} + w_{it,N}$. Averaging over time periods $t = 1, \dots, T$ we obtain $\mu_{i,N} = \bar{\mathbf{x}}_{i,N}\boldsymbol{\pi} + \bar{w}_{i,N}$ for $i = 1, \dots, N$ “between-transformed” observations, where $\bar{\mathbf{x}}_{i,N} = T^{-1}\sum_{t=1}^{T}\mathbf{x}_{it,N}$ and $\bar{w}_{i,N} = T^{-1}\sum_{t=1}^{T}w_{it,N}$ are the time averages of $\mathbf{x}_{it,N}$ and $w_{it,N}$, and $\bar{w}_{i,N} \sim (0, \sigma_{w,N}^2)$. In the random effects model, we have $\boldsymbol{\pi} = \mathbf{0}$, which implies that the time-invariant error component is uncorrelated with the explanatory variables $\mathbf{x}_{it,N}$ in any time period t. In the fixed effects model, we have $\boldsymbol{\pi} \neq \mathbf{0}$, i.e., the explanatory variables are correlated with the time-invariant error component. More precisely, in the random effects specification we have $E(\mu_{i,N} \mid \mathbf{X}_N) = 0$, whereas in the fixed effects model it holds that $E(\mu_{i,N} \mid \mathbf{X}_N) = f(\mathbf{X}_N) \neq 0$.4

(c) The processes $\{v_{it,N}\}$ and $\{\mu_{i,N}\}$ are independent of each other.5

3 See Kapoor, Kelejian, and Prucha (2007, p. 100), Lee and Yu (2008, p. 3), Lee and Yu (2010, Assumption 6, p. 6), or Mutl and Pfaffermayr (2011, p. 51).

We emphasize that the estimation framework considered here assumes that the spatial

regressive structure of the empirical model given by (1a) to (1d) is identical under the fixed

effects and the random effects setup, i.e., the time-invariant error component displays the

same spatial regressive structure through equations (1b) and (1c), irrespective of the

properties of the covariates. This differs from the specification of the spatial regressive fixed

effects models in Lee and Yu (2010) as well as Mutl and Pfaffermayr (2011), who exclude the

time-invariant error component from the spatial regressive error process.6

4 Strictly speaking, with non-stochastic regressors, the two expectations could also be stated unconditionally (see Greene, 2008, p. 18).

5 Assumption 1 is maintained throughout the paper. For some results, in particular for consistent estimation of the variance-covariance matrix of the GM estimates without distributional assumptions, Assumption 1 will have to be strengthened, assuming that $E|v_{it,N}|^{8+\eta} < \infty$ for all $1 \le t \le T$ and $1 \le i \le N$.

6 Lee and Yu (2008) consider maximum likelihood estimation of a homoskedastic spatial regressive fixed effects panel data model; Mutl and Pfaffermayr (2008) consider a Hausman test for the random versus fixed effects first-order SARAR(1,1) model with homoskedastic error components. Both partial out the time-invariant error component $\boldsymbol{\mu}_N$ from the spatial regressive disturbance process under fixed effects estimation. This choice implies that the time-invariant error component displays the spatial regressive structure of the dependent variable. The difference of Mutl and Pfaffermayr (2008) to our approach is apparent from the

As we will see below, our specification implies that the spatial generalized least squares

(GLS) transformed model nests the standard fixed and random effects panel data models.

Hence, we regard the nature of the spatial regressive process and the properties of the

explanatory variables under random versus fixed effects as two separate sets of assumptions.

Our approach allows for cross-sectional interdependence (with a known spatial structure) not only in unobserved variables captured by $\mathbf{v}_N$ but also in unobserved time-invariant variables subsumed in $\boldsymbol{\mu}_N$ in SARAR(0,S) models (i.e., without a spatially lagged dependent variable). More importantly, this approach allows us to use the same set of four moment conditions both under random effects and fixed effects estimation. Finally, when considering a Hausman test of the random effects versus the fixed effects model in section VI, we wish to consider two model specifications, i.e., the random effects and the fixed effects model, whose assumptions regarding the spatial regressive structure of the error components and the nature of heteroskedasticity are identical and which only differ with regard to whether or not $\boldsymbol{\pi} = \mathbf{0}$ in Assumption 1(b).

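Assumption 1 implies that

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = \sigma_\mu^2 + \sigma_{v,it}^2 \quad \text{for } i = j \text{ and } t = s, \tag{5a}$$

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = \sigma_\mu^2 \quad \text{for } i = j \text{ and } t \neq s, \tag{5b}$$

$$E(\varepsilon_{it,N}\varepsilon_{js,N}) = 0 \quad \text{otherwise}. \tag{5c}$$

As a consequence, the variance-covariance matrix of the stacked error term $\boldsymbol{\varepsilon}_N$ reads

$$\boldsymbol{\Omega}_{\varepsilon,N} = E(\boldsymbol{\varepsilon}_N\boldsymbol{\varepsilon}_N') = \sigma_\mu^2(\mathbf{J}_T \otimes \mathbf{I}_N) + \boldsymbol{\Sigma}_N, \tag{6a}$$

where $\mathbf{J}_T = \mathbf{e}_T\mathbf{e}_T'$ is a $T \times T$ matrix with unitary elements, $\mathbf{I}_{NT}$ is an identity matrix of dimension $NT \times NT$, and

$$\boldsymbol{\Sigma}_N = E(\mathbf{v}_N\mathbf{v}_N') = \mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big) = \mathrm{diag}_{n=1}^{NT}\big(E\varepsilon_{n,N}^2\big) - \sigma_\mu^2\mathbf{I}_{NT}. \tag{6b}$$

Note that we use single indexation $n = 1, \dots, NT$ in equation (6b) to denote elements of the stacked vectors or matrices. We will adopt this convention at several points in the paper in order to simplify notation, when there is no possibility for confusion.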
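The structure in (5a) to (6b) can be checked numerically. The sketch below builds the covariance matrix of the stacked error term for given variances; the function name and the example values are hypothetical, and the stacking follows the text (t slow, i fast), so that element n corresponds to n = tN + i with zero-based indices.

```python
import numpy as np

def omega_eps(sigma2_mu, sigma2_v, N, T):
    """Omega_eps = sigma2_mu * (J_T kron I_N) + diag(sigma2_v), cf. (6a)-(6b).

    sigma2_v is a length-NT vector of idiosyncratic variances (t slow, i fast).
    """
    J_T = np.ones((T, T))
    return sigma2_mu * np.kron(J_T, np.eye(N)) + np.diag(sigma2_v)

# Example with N = 3 units and T = 2 periods (illustrative values only)
Om = omega_eps(2.0, np.arange(1.0, 7.0), N=3, T=2)
```

In this layout, the diagonal carries the sum of both variances as in (5a), entries linking the same unit in different periods equal the variance of the time-invariant component as in (5b), and all entries linking different units are zero as in (5c).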

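specification of the ‘Mundlak assumption’ in (9). In matrix form and using notation in an obvious way, we assume that $\boldsymbol{\mu}_N = \bar{\mathbf{X}}_N\boldsymbol{\pi} + \bar{\mathbf{w}}_N$, whereas Mutl and Pfaffermayr (2008) assume that $\boldsymbol{\mu}_N = (\mathbf{I}_N - \rho_N\mathbf{W}_N)\bar{\mathbf{X}}_N\boldsymbol{\pi} + \bar{\mathbf{w}}_N$ (which differs from the specification of their random effects model).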

Next, we define two matrices $\mathbf{Q}_{0,N}$ and $\mathbf{Q}_{1,N}$, which are central to the estimation of error component models and the moment conditions of the GM estimator:

$$\mathbf{Q}_{0,N} = \Big(\mathbf{I}_T - \frac{\mathbf{J}_T}{T}\Big) \otimes \mathbf{I}_N, \tag{7a}$$

$$\mathbf{Q}_{1,N} = \frac{\mathbf{J}_T}{T} \otimes \mathbf{I}_N. \tag{7b}$$

Pre-multiplying an $NT \times 1$ vector with $\mathbf{Q}_{0,N}$ transforms its elements into deviations from cross-section specific sample means taken over time (“within-transformation”). We will refer to “within-transformed” vectors or matrices with an underbar, e.g., $\underline{\mathbf{Z}}_N = \mathbf{Q}_{0,N}\mathbf{Z}_N$. Pre-multiplying a vector by $\mathbf{Q}_{1,N}$ transforms its elements into cross-section specific sample means (“between-transformation”). Notice that $\mathbf{Q}_{0,N}$ and $\mathbf{Q}_{1,N}$ are both of order $NT \times NT$, symmetric, idempotent, orthogonal to each other, and sum up to $\mathbf{I}_{NT}$.7
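The within and between transformations in (7a) and (7b) are easy to verify numerically. The following is a small sketch with a hypothetical helper name; it assumes the stacking used in the text (t slow, i fast).

```python
import numpy as np

def within_between(N, T):
    """Return (Q0, Q1) as in (7a)-(7b) for data stacked with t slow and i fast."""
    J_bar = np.ones((T, T)) / T                  # J_T / T
    Q0 = np.kron(np.eye(T) - J_bar, np.eye(N))   # within transformation
    Q1 = np.kron(J_bar, np.eye(N))               # between transformation
    return Q0, Q1

# Illustration: unit-specific time means of a stacked vector
N, T = 4, 3
Q0, Q1 = within_between(N, T)
z = np.arange(N * T, dtype=float)
unit_means = z.reshape(T, N).mean(axis=0)        # one mean per cross-sectional unit
```

Applying `Q1` to `z` repeats `unit_means` once per period, and applying `Q0` returns the deviations of `z` from those means. In applied work one would normally compute these by reshaping and demeaning rather than forming the NT by NT matrices explicitly.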

Assumption 2.

(a) All diagonal elements of the matrices $\mathbf{W}_{r,N}$, $r = 1, \dots, R$, and $\mathbf{M}_{s,N}$, $s = 1, \dots, S$, are zero.

(b) The admissible parameter space for the spatial lag of the dependent variable is given by

),( ,,,rr

NUNLNr aa , with r

NLa

,0 , aaa r

NUr ,

, , Rr ,...,1 , and

AR

r

Nr

1

, ,

where we define a such that )(max ,

,...,1

r

Rraa

holds.

Analogous assumptions are made for the parameters of the spatial autoregressive error

process: ),( ,,,ss

NUNLNs aa , with ,0 ,

s

NLa

aaa s

NUs ,

, , Ss ,...,1 , and

AS

m

Nm

1

, , where we define a such that )(max ,

,...,1

s

Ssaa

holds.

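(c) The matrices $\big(\mathbf{I}_N - \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\big)$ and $\big(\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}\big)$ are nonsingular for all values of $\lambda_{r,N}$ and $\rho_{s,N}$, respectively, in the admissible parameter spaces defined in part (b).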

Part (a) of Assumption 2 is standard. Assumption (2b) requires the spatial regressive parameters to be finite. The admissible values of the two scalars $A$ (for the dependent variable and for the disturbance process, respectively) will generally depend on the properties of the weights matrices $\mathbf{W}_{r,N}$ and $\mathbf{M}_{s,N}$. For example, with row-normalized matrices $\mathbf{W}_{r,N}$, $r = 1, \dots, R$, choosing $A$ such that $\sum_{r=1}^{R}|\lambda_{r,N}| < 1$ ensures that $\big(\mathbf{I}_N - \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\big)$ is invertible, as required in Assumption (2c).8 Finally, Assumption (2c) ensures that $\mathbf{u}_N$ and $\mathbf{y}_N$ are uniquely identified through equations (4a) and (4b).

7 See Remark A.2 in Appendix A for further properties of $\mathbf{Q}_{0,N}$ and $\mathbf{Q}_{1,N}$.
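The invertibility argument for row-normalized weights can be illustrated numerically. The sketch below uses hypothetical random weights and helper names. It relies on the standard fact that, if the absolute parameter values sum to less than one and each weights matrix has non-negative entries with unit row sums, the maximum absolute row sum of the weighted combination is below one, so the Neumann series converges to the inverse.

```python
import numpy as np

def random_row_normalized(N, rng):
    """Hypothetical non-negative weights with zero diagonal and unit row sums."""
    W = rng.random((N, N))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def spatial_filter(params, weights):
    """I_N - sum_r params[r] * weights[r]."""
    N = weights[0].shape[0]
    return np.eye(N) - sum(p * W for p, W in zip(params, weights))

def neumann_inverse(params, weights, terms=400):
    """Approximate the inverse of the spatial filter by a truncated Neumann series."""
    N = weights[0].shape[0]
    B = sum(p * W for p, W in zip(params, weights))
    out, power = np.eye(N), np.eye(N)
    for _ in range(terms):
        power = power @ B
        out = out + power
    return out
```

With parameters such as 0.5 and -0.3 (absolute sum 0.8), the series result agrees with a direct inverse up to numerical precision; when the absolute sum reaches one or more, convergence is no longer guaranteed by this argument.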

Assumption 3.

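The row and column sums of the matrices $\mathbf{W}_{r,N}$, $r = 1, \dots, R$, $\mathbf{M}_{s,N}$, $s = 1, \dots, S$, $\big(\mathbf{I}_N - \sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\big)^{-1}$, and $\big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\big)^{-1}$ are bounded uniformly in absolute value.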

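In light of Assumptions 1-3 and Remark A.1 in the Appendix, it follows that $E(\mathbf{u}_N) = \mathbf{0}$ and the variance-covariance matrix of $\mathbf{u}_N$ is given by

$$\boldsymbol{\Omega}_{u,N} = E(\mathbf{u}_N\mathbf{u}_N') = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}\Big)\Big]^{-1}\boldsymbol{\Omega}_{\varepsilon,N}\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{s=1}^{S}\rho_{s,N}\mathbf{M}_{s,N}'\Big)\Big]^{-1}. \tag{8}$$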

For the sake of generality, all explanatory variables and parameters (except for the variances of the error components $\boldsymbol{\mu}_N$ and $\mathbf{v}_N$) are allowed to depend on sample size N. (Of course, all results hold up in the case where parameters do not depend on N.) In spatial econometric models this degree of generality is important, given that spatial lags (and disturbance processes) depend on normalized weights matrices. Depending on the weighting scheme, both the spatial weights and the corresponding parameters will change with the size of the cross-section dimension, N, since a growing N (e.g., a growing number of countries or regions) requires renormalizing the weights matrices. Such a specification is consistent, for example, with models where the weights matrices are row-normalized and the number of neighbours of a given cross-sectional unit depends on sample size (see Kapoor et al., 2007, p. 102) or where the strength of interdependence (in terms of the spatial autoregressive parameters) changes with the number of neighbours.

As a result, the model specification in equations (1a)-(1c) is fairly general, allowing for higher-order spatial dependence in the dependent variable, the explanatory variables, and the disturbances, and enabling specification tests to determine the proper structure of cross-sectional interdependence in applied work.

3. Overview of Estimation Procedure

In the following, we outline the estimation procedure proposed in the present paper. Details

and proofs of the claims made here are given in the subsequent sections.

In a first step, the regression parameters in model (1a), i.e., $\boldsymbol{\delta}_N$, are consistently estimated by fixed effects two-stage least squares (TSLS), ignoring the spatial regressive structure of $\mathbf{u}_N$ (see Amemiya, 1971; Baltagi, 2005). Under the maintained assumptions, this yields consistent estimates of the disturbances, $\hat{\mathbf{u}}_N = \mathbf{y}_N - \mathbf{Z}_N\hat{\boldsymbol{\delta}}_N$. Under stronger assumptions, consistent estimates can also be obtained by pooled two-stage least squares or two-stage least squares with random effects.

Based on the estimates of the disturbances $\mathbf{u}_N$, a generalized moments (GM) estimator can be used to obtain consistent estimates of the parameters of the spatial regressive disturbance process ($\boldsymbol{\rho}_N$) and the variance of the time-invariant error component ($\sigma_\mu^2$), denoted as $\tilde{\boldsymbol{\rho}}_N$ and $\tilde{\sigma}_{\mu,N}^2$.

The joint variance-covariance matrix for the estimates of the regression parameters $\boldsymbol{\delta}_N$ and the spatial regressive parameters $\boldsymbol{\rho}_N$ derived in the present paper, which is robust to both the spatial dependence in $\mathbf{u}_N$ as well as arbitrary heteroskedasticity in the idiosyncratic error term $\mathbf{v}_N$, can be used for specification tests to determine the proper form of the interdependence decay function.9

To improve efficiency, (the estimates of) the parameters $\boldsymbol{\rho}_N$ can be used to obtain a (feasible) spatial generalized least squares (GLS) transformed variant of model (1a), which corresponds to a “standard” (fixed or random effects) panel data model without spatial dependence in the disturbances but with heteroskedasticity of unknown form in the idiosyncratic error term $\mathbf{v}_N$. Using this transformed model, feasible spatial generalized two-stage least squares (TSLS) estimates of the regression parameters, $\hat{\tilde{\boldsymbol{\delta}}}{}_N^{*}$, can be obtained. (The asterisk indicates that the estimates are based on a transformed model; the tilde indicates that the model transformation is based on $\tilde{\boldsymbol{\rho}}_N$, i.e., the GM estimates of $\boldsymbol{\rho}_N$.) Again, a heteroskedasticity-robust joint variance-covariance matrix of $\hat{\tilde{\boldsymbol{\delta}}}{}_N^{*}$ and $\tilde{\boldsymbol{\rho}}_N$ is derived, allowing for joint inference regarding the regression parameters and the spatial regressive parameters of the model.

The estimation procedure can also be implemented in an iterative way, i.e., the feasible spatial generalized TSLS estimates $\hat{\tilde{\boldsymbol{\delta}}}{}_N^{*}$ can be used to obtain iterated estimates of the disturbances $\mathbf{u}_N$, which can in turn be used to obtain a new set of estimates for $\boldsymbol{\rho}_N$, etc.

The obtained (feasible or iterated) heteroskedasticity-robust fixed and random effects models can then be tested against each other by a Hausman test which is derived in this paper.

8 If the matrices $\mathbf{W}_{r,N}$ are not row-normalized, Assumption (2c) is implied by choosing $A$ such that $\sum_{r=1}^{R}|\lambda_{r,N}| < 1/\max_{r=1,\dots,R}\|\mathbf{W}_{r,N}\|$ for some matrix norm $\|\cdot\|$ (see Lee and Liu, 2010; Horn and Johnson, 1985, p. 301).

9 The possibility that joint hypotheses about $\boldsymbol{\delta}_N$ and $\boldsymbol{\rho}_N$ may be formulated and tested is an advantage of the proposed two-step approach over the use of (spatial-dependence and heteroskedasticity) robust standard errors. In particular, it allows for specification tests a la Anselin et al. (1996) in a higher order setting and under less restrictive distributional assumptions.
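The spatial GLS transformation mentioned in the overview can be sketched as follows. This is an illustration only, assuming that the transformation pre-multiplies the stacked data by the filter implied by (3b); the function name is hypothetical, and in practice the unknown parameters would be replaced by their GM estimates.

```python
import numpy as np

def spatial_gls_transform(z, rhos, Ms, T):
    """Return [I_T kron (I_N - sum_s rho_s M_s)] z for a stacked NT-vector or NT x k matrix z."""
    N = Ms[0].shape[0]
    A = np.eye(N) - sum(r * M for r, M in zip(rhos, Ms))
    return np.kron(np.eye(T), A) @ z
```

Applied to the disturbances, the transformation returns the error components without spatial dependence, in line with (3b); applied to the dependent variable and the regressors, it would yield the transformed model to which standard fixed or random effects methods can be applied.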

To keep the analysis general, we first consider only the GM estimation of the disturbance process (1c), without assuming a particular form of model (1a) or how consistent estimates of the residuals $\tilde{\mathbf{u}}_N$ of model (1a) are obtained. The advantage of this approach is that the results are potentially applicable to the disturbances of a wider class of regression models, e.g., nonlinear specifications of equation (1a). Then, we consider the estimation of the main equation (1a), using a modular approach with general notation that covers the four estimators considered in the present paper: random effects and fixed effects estimation of both the original and the spatial GLS transformed model.

III. GM Estimation of a SAR(S) Process

In the following, we consider GM estimators for the spatial regressive parameters $\boldsymbol{\rho}_N$ of the disturbance process in equation (1c) and the variance of the time-invariant error component $\sigma_\mu^2$ and establish their asymptotic joint distribution. In this subsection, we only consider the process in equation (1c) for the disturbances $\mathbf{u}_N$, but not necessarily the one in equation (1a) for $\mathbf{y}_N$. These disturbances $\mathbf{u}_N$ are unknown and thus have to be obtained in a first step, using consistent estimates of $\boldsymbol{\delta}_N$ in the main equation (1a) (or from some other model), ignoring the spatial regressive error structure in $\mathbf{u}_N$. The assumptions sufficient to establish the asymptotic properties of the GM estimates (consistency and normality) are stated in general terms in Assumptions 4 to 7 in this section and will be made more specific in section IV, where we consider TSLS and spatial generalized TSLS estimation of model (1a). It will also become apparent in this section that the asymptotic distribution of the (second-step) GM estimates of $\boldsymbol{\rho}_N$, which are based on estimated disturbances $\mathbf{u}_N$, is affected in a non-trivial way by the properties of the first-step estimation (fixed versus random effects) and by the presence of endogenous right hand side variables.

1. Moment Conditions

A set of three moment conditions for GM estimation of first-order spatial regressive error

processes was introduced in the seminal paper by Kelejian and Prucha (1999) for the case of a

single cross-section under homoskedasticity. The extension of this estimator to a random

effects panel data error component model by Kapoor et al. (2007) (under homoskedasticity)

yields a set of six moment conditions. Heteroskedasticity has so far only been considered in

the cross-sectional SARAR(1,1) framework by Kelejian and Prucha (2010), who use two of

the three moment conditions in Kelejian and Prucha (1999), and in the SARAR(0,1)

framework by Lin and Lee (2010).

An analogous approach to Kelejian and Prucha (2010) is pursued here in the derivation of the moment conditions under heteroskedasticity, but for (both fixed and random effects) panel data models. For this, we use four of the six moment conditions akin to the ones in Kapoor et al. (2007). Moreover, with an S-th-order rather than a first-order process (SAR(S), with $S > 1$), additional moment conditions are available, associated with each weights matrix $\mathbf{M}_{s,N}$, $s = 1, \dots, S$, and each pair of weights matrices $\mathbf{M}_{s,N}$, $\mathbf{M}_{s',N}$, $s = 1, \dots, S$; $s' = s, \dots, S$. Define

$$\bar{\boldsymbol{\varepsilon}}_{s,N} = (\mathbf{I}_T \otimes \mathbf{M}_{s,N})\boldsymbol{\varepsilon}_N = (\mathbf{I}_T \otimes \mathbf{M}_{s,N})\Big[\mathbf{u}_N - \sum_{m=1}^{S}\rho_{m,N}(\mathbf{I}_T \otimes \mathbf{M}_{m,N})\mathbf{u}_N\Big]. \tag{10}$$

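Under Assumptions 1 to 3, we then have the following set of moment conditions for $T \ge 2$, and $s = 1, \dots, S$; $s' = s, \dots, S$:

$$M_1^{s,s'}:\quad E\Big[\frac{1}{N(T-1)}\bar{\boldsymbol{\varepsilon}}_{s,N}'\mathbf{Q}_{0,N}\bar{\boldsymbol{\varepsilon}}_{s',N}\Big] = \frac{1}{N(T-1)}\mathrm{Tr}\Big[\mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big)\mathbf{Q}_{0,N}(\mathbf{I}_T \otimes \mathbf{M}_{s,N}'\mathbf{M}_{s',N})\Big], \tag{11a}$$

$$M_2^{s}:\quad E\Big[\frac{1}{N(T-1)}\bar{\boldsymbol{\varepsilon}}_{s,N}'\mathbf{Q}_{0,N}\boldsymbol{\varepsilon}_N\Big] = 0, \tag{11b}$$

$$M_3^{s,s'}:\quad E\Big[\frac{1}{N}\bar{\boldsymbol{\varepsilon}}_{s,N}'\mathbf{Q}_{1,N}\bar{\boldsymbol{\varepsilon}}_{s',N}\Big] = \frac{T}{N}\sigma_\mu^2\,\mathrm{tr}(\mathbf{M}_{s,N}'\mathbf{M}_{s',N}) + \frac{1}{N}\mathrm{tr}\Big[\mathrm{diag}_{n=1}^{NT}\big(E v_{n,N}^2\big)\mathbf{Q}_{1,N}(\mathbf{I}_T \otimes \mathbf{M}_{s,N}'\mathbf{M}_{s',N})\Big], \tag{11c}$$

$$M_4^{s}:\quad E\Big[\frac{1}{N}\bar{\boldsymbol{\varepsilon}}_{s,N}'\mathbf{Q}_{1,N}\boldsymbol{\varepsilon}_N\Big] = 0. \tag{11d}$$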
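The moment conditions (11a) to (11d) can be checked numerically without simulation, because the expectation of a quadratic form in the stacked error term equals the trace of the form's matrix times the covariance matrix in (6a). The sketch below computes both sides for a pair of weights matrices; the function name and inputs are hypothetical, and the check is purely illustrative.

```python
import numpy as np

def moment_sides(M_s, M_sp, sigma2_mu, sigma2_v, T):
    """Left- and right-hand sides of (11a)-(11d) for the pair (M_s, M_s').

    Uses E[eps' A eps] = tr(A Omega_eps) with Omega_eps from (6a);
    sigma2_v is a length-NT vector of idiosyncratic variances.
    """
    N = M_s.shape[0]
    I_T, J_T = np.eye(T), np.ones((T, T))
    Q0 = np.kron(I_T - J_T / T, np.eye(N))
    Q1 = np.kron(J_T / T, np.eye(N))
    Sigma = np.diag(sigma2_v)
    Omega = sigma2_mu * np.kron(J_T, np.eye(N)) + Sigma
    L_s, L_sp = np.kron(I_T, M_s), np.kron(I_T, M_sp)   # maps eps to its spatial lags
    MM = np.kron(I_T, M_s.T @ M_sp)
    c0, c1 = 1.0 / (N * (T - 1)), 1.0 / N
    lhs = np.array([
        c0 * np.trace(L_s.T @ Q0 @ L_sp @ Omega),
        c0 * np.trace(L_s.T @ Q0 @ Omega),
        c1 * np.trace(L_s.T @ Q1 @ L_sp @ Omega),
        c1 * np.trace(L_s.T @ Q1 @ Omega),
    ])
    rhs = np.array([
        c0 * np.trace(Sigma @ Q0 @ MM),
        0.0,
        c1 * T * sigma2_mu * np.trace(M_s.T @ M_sp) + c1 * np.trace(Sigma @ Q1 @ MM),
        0.0,
    ])
    return lhs, rhs
```

For weights matrices with zero diagonals, as required by Assumption 2(a), the two arrays should coincide up to floating-point error for any positive variances.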

Unless part of the weights matrices are orthogonal, there are $4S + S(S-1)$ moment conditions.10 For the case of a first-order spatial regressive process, i.e., $S = 1$, they nest the moment conditions of the aforementioned GM estimators as special cases. Under homoskedasticity, i.e., $\mathrm{diag}_{n=1}^{NT}(E v_{it,N}^2) = \sigma_v^2\mathbf{I}_{NT}$, the corresponding four moment conditions in Kapoor et al. (2007) are then obtained. In the cross-sectional case, i.e., for $T = 1$ (and $\mathbf{Q}_{0,N} = \mathbf{0}$), the moment conditions $M_1$ and $M_2$ become uninformative and $M_3$ and $M_4$ reduce to the corresponding two moment conditions in Kelejian and Prucha (2010) under heteroskedasticity with the $N \times N$ matrix $\mathrm{diag}_{i=1}^{N}(E v_i^2)$, or the two moment conditions in Kelejian and Prucha (1999) under homoskedasticity with the $N \times N$ matrix $\mathrm{diag}_{i=1}^{N}(E v_{i,N}^2) = \sigma_{v,N}^2\mathbf{I}_N$.

10 If some pairs of matrices are orthogonal, $\mathbf{M}_{s,N}'\mathbf{M}_{s',N} = \mathbf{0}$ for some $s \neq s'$, the corresponding moment condition is trivially satisfied for any set of (finite) parameter values. Hence, if all weights matrices were pairwise orthogonal, there would be $4S$ moment conditions.

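Note that the moment conditions can also be written as quadratic forms in the vector $\boldsymbol{\varepsilon}_N$:

$$M_1^{s,s'}:\quad E\Big[\frac{1}{N(T-1)}\boldsymbol{\varepsilon}_N'\mathbf{A}_{1,N}^{s,s'}\boldsymbol{\varepsilon}_N\Big] = 0, \quad \text{with} \tag{12a}$$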

NnnNsNsTN

NT

nNsNsTNNss diag ,0,,,01,,,0

1

,, ]})([){( QMMIQMMIQA .

:M2

s 0])1(

1[ ,2

N

s

NNTN

E εAε , with )( ,,0,2 NsTN

s

N MIQA . (12b)

:M ,

3

ss 0)(]

1[ ,,

2,

,3

NsNsN

ss

NN trN

T

NE MMεAε , with (12c)

}])(([)({ ,0,,,11,0,,,1

,

,3 NnnNsNsTN

NT

nNNsNsTN

ss

N diag QMMIQQMMIQA

.

:M4

s 0]1

[ ,4 N

s

NNN

E εAε , with )( ,,1,4 NsTN

s

N MIQA . (12d)

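Substituting equations (3a), (3b), (6b), and (10) into the $4S + S(S-1)$ moment conditions (11) yields a $[4S + S(S-1)]$ equation system in $(\rho_{1,N},\dots,\rho_{S,N},\sigma_\mu^2)$, which can be written as

$$\boldsymbol{\gamma}_N - \boldsymbol{\Gamma}_N\mathbf{b}_N = \mathbf{0}, \tag{13a}$$

where $\mathbf{b}_N$ is a $[2S + S(S-1)/2 + 1] \times 1$ vector given by

$$\mathbf{b}_N = (\rho_{1,N},\dots,\rho_{S,N},\ \rho_{1,N}^2,\dots,\rho_{S,N}^2,\ \rho_{1,N}\rho_{2,N},\dots,\rho_{1,N}\rho_{S,N},\dots,\rho_{S-1,N}\rho_{S,N},\ \sigma_\mu^2)',$$

i.e., $\mathbf{b}_N$ contains $S$ linear terms $\rho_{m,N}$, $m = 1,\dots,S$, $S$ quadratic terms $\rho_{m,N}^2$, $m = 1,\dots,S$, $S(S-1)/2$ cross products $\rho_{m,N}\rho_{l,N}$, $m = 1,\dots,(S-1)$, $l = (m+1),\dots,S$, as well as $\sigma_\mu^2$. For later reference, we define the $(S+1) \times 1$ vector of all parameters as $\boldsymbol{\theta}_N = (\boldsymbol{\rho}_N', \sigma_\mu^2)' = (\rho_{1,N},\dots,\rho_{S,N},\sigma_\mu^2)'$.

$\boldsymbol{\gamma}_N$ is a $[4S + S(S-1)] \times 1$ vector with elements $(\gamma_{i,N})$, $i = 1,\dots,4S + S(S-1)$, and $\boldsymbol{\Gamma}_N$ is a $[4S + S(S-1)] \times [2S + S(S-1)/2 + 1]$ matrix with elements $(\gamma_{i,j,N})$, $i = 1,\dots,4S + S(S-1)$, $j = 1,\dots,2S + S(S-1)/2 + 1$. The elements $\gamma_{i,N}$ and $\gamma_{i,j,N}$ will be defined below.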
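The bookkeeping of the parameter vector entering the equation system can be made concrete with a small sketch. The function names are hypothetical; the ordering (linear terms, squares, cross products with the first index running slowest, then the variance of the unit effects) follows the verbal description of the vector of parameter transformations given above.

```python
def b_vector(rho, sigma_mu2):
    """Stack linear terms, squares, cross products rho_m*rho_l (m < l), and sigma_mu^2."""
    S = len(rho)
    linear = list(rho)
    squares = [r * r for r in rho]
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return linear + squares + cross + [sigma_mu2]

def n_moments(S):
    # Rows of the equation system: 4S own-matrix conditions plus S(S-1) cross conditions.
    return 4 * S + S * (S - 1)

def n_columns(S):
    # Length of b: 2S + S(S-1)/2 + 1.
    return 2 * S + S * (S - 1) // 2 + 1
```

For example, with $S = 3$ the system has 18 rows and the stacked parameter vector has 10 elements, while only $S + 1 = 4$ free parameters are estimated.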

Throughout the paper, we adopt the following convention with respect to the ordering of the rows in equation system (13). The first four rows are associated with the 4 moment conditions $M_1^{s,s'}$, $M_2^{s}$, $M_3^{s,s'}$, and $M_4^{s}$ with $s = s' = 1$. The next four rows are associated with $M_1^{s,s'}$, $M_2^{s}$, $M_3^{s,s'}$, and $M_4^{s}$ with $s = s' = 2$, and so forth up to $s = s' = S$. This yields $4S$ rows of the equation system. These moment conditions are always available under Assumptions 1 and 2.

Unless part of the weights matrices are orthogonal, there are $S(S-1)$ further moment conditions available, resulting from $M_1^{s,s'}$ and $M_3^{s,s'}$ with $s = 1,\dots,(S-1)$, $s' = (s+1),\dots,S$. These are added to the equation system, starting from row $4S+1$, as follows. The next row ($4S+1$) is associated with $M_1^{s,s'}$ and $s = 1$ and $s' = 2$; the next row with $M_1^{s,s'}$ with $s = 1$ and $s' = 3$, and so forth through all pairs $s < s'$ up to $s' = S$; this yields $S(S-1)/2$ rows. We then proceed with $M_3^{s,s'}$ in the same way, yielding another $S(S-1)/2$ rows.
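The row-ordering convention described above can be written as a small index function. This is a sketch under the assumption that cross conditions are enumerated with $s$ running slowest and $s'$ fastest; the function name and argument layout are hypothetical.

```python
def row_index(c, s, sp, S):
    """1-based row of moment condition M_c^{s,s'} in the stacked equation system.

    c in {1, 2, 3, 4}; s == sp for the 4S own-matrix conditions;
    s < sp is only admissible for c in {1, 3} (cross conditions).
    """
    if not (1 <= s <= sp <= S) or c not in (1, 2, 3, 4):
        raise ValueError("invalid indices")
    if s == sp:
        return 4 * (s - 1) + c
    if c not in (1, 3):
        raise ValueError("cross conditions exist only for M1 and M3")
    # Cross conditions for M1 start after row 4S, those for M3 after a further S(S-1)/2 rows.
    offset = 4 * S if c == 1 else 4 * S + S * (S - 1) // 2
    # Position of the pair (s, sp) among all pairs with s < sp.
    return offset + (s - 1) * S - s * (s - 1) // 2 + (sp - s)
```

Enumerating all admissible index combinations for a given $S$ should hit each row $1,\dots,4S + S(S-1)$ exactly once, which is a convenient check when assembling the system in code.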

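The sample analogue to equation system (13a) is given by

$$\tilde{\boldsymbol{\gamma}}_N - \tilde{\boldsymbol{\Gamma}}_N\mathbf{b}_N = \boldsymbol{\vartheta}_N(\boldsymbol{\theta}_N), \tag{13b}$$

where the elements of $\tilde{\boldsymbol{\gamma}}_N$ and $\tilde{\boldsymbol{\Gamma}}_N$ are equal to those of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$ with the expectations operator suppressed and the disturbances $\mathbf{u}_N$ replaced by (consistent) estimates $\tilde{\mathbf{u}}_N$.

GM estimates of the parameters $\rho_{1,N},\dots,\rho_{S,N}$, $\sigma_\mu^2$ are then obtained as the solution to

$$\arg\min_{\rho_1,\dots,\rho_S,\sigma_\mu^2}\big[(\tilde{\boldsymbol{\gamma}}_N - \tilde{\boldsymbol{\Gamma}}_N\mathbf{b}_N)'\tilde{\boldsymbol{\Theta}}_N(\tilde{\boldsymbol{\gamma}}_N - \tilde{\boldsymbol{\Gamma}}_N\mathbf{b}_N)\big] = \arg\min_{\rho_1,\dots,\rho_S,\sigma_\mu^2}\big[\boldsymbol{\vartheta}_N(\boldsymbol{\theta}_N)'\tilde{\boldsymbol{\Theta}}_N\boldsymbol{\vartheta}_N(\boldsymbol{\theta}_N)\big], \tag{14}$$

i.e., the parameter estimates can be obtained from a (weighted) non-linear least squares regression of $\tilde{\boldsymbol{\gamma}}_N$ on the columns of $\tilde{\boldsymbol{\Gamma}}_N$; $\boldsymbol{\vartheta}_N(\boldsymbol{\theta}_N)$ can then be viewed as a vector of regression residuals. The optimal choice of the $[4S + S(S-1)] \times [4S + S(S-1)]$ weighting matrix $\boldsymbol{\Theta}_N$ and its estimation will be discussed below.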
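The weighted non-linear least squares criterion can be sketched as follows. The inputs `gamma_hat`, `Gamma_hat`, and `Theta` stand for the sample moment vector, the sample moment matrix, and a weighting matrix; how they are computed from residuals is not shown here, and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def b_vector(theta):
    """theta = (rho_1, ..., rho_S, sigma_mu^2); returns the stacked vector b."""
    theta = np.asarray(theta, dtype=float)
    rho, sigma_mu2 = theta[:-1], theta[-1]
    S = len(rho)
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return np.concatenate([rho, rho ** 2, cross, [sigma_mu2]])

def gm_objective(theta, gamma_hat, Gamma_hat, Theta):
    """Quadratic form in the residual vector gamma_hat - Gamma_hat @ b(theta)."""
    resid = gamma_hat - Gamma_hat @ b_vector(theta)
    return float(resid @ Theta @ resid)
```

In practice this criterion would be passed to a numerical optimizer over the admissible parameter space; with an identity weighting matrix it gives the initial unweighted GM estimator.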

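In the following, we define the elements of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$, grouped by the corresponding moment conditions. Thereby, we use the following notation:

$$\bar{\mathbf{u}}_{s,N} = (\mathbf{I}_T \otimes \mathbf{M}_{s,N})\mathbf{u}_N, \quad s = 1,\dots,S, \text{ and} \tag{15a}$$

$$\bar{\bar{\mathbf{u}}}_{sm,N} = (\mathbf{I}_T \otimes \mathbf{M}_{s,N})(\mathbf{I}_T \otimes \mathbf{M}_{m,N})\mathbf{u}_N = (\mathbf{I}_T \otimes \mathbf{M}_{s,N}\mathbf{M}_{m,N})\mathbf{u}_N, \quad s = 1,\dots,S,\ m = 1,\dots,S. \tag{15b}$$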

In the derivation of the elements of Nγ and NΓ , we also make use of the fact that

)(2)()( ,,,1

1

2

,1

2

,1 NnNnm

NT

n

S

m

mNn

NT

nNn

NT

n uudiagudiagvdiag

NT

S

m

NnlNnm

NT

nl

S

l

m uudiag I2

1

,,,,1

1

)(

, (16)

where $\bar{u}_{m,n,N}$ denotes the $n$-th element of the vector $\bar{\mathbf{u}}_{m,N}$.

Moment Condition $M_1^{s,s'}$

Due to the adopted convention regarding the ordering of the rows in equation system (13), the row index for moment condition $M_1^{s,s'}$, denoted as row($M_1^{s,s'}$), is given by $4(s-1)+1$ for $s = s'$ and $4S + (s-1)S - s(s-1)/2 + s' - s$ for $s \neq s'$. Hence, moment condition $M_1^{s,s'}$ delivers $S$ rows of the equation system (rows $1, 5, \dots, 4S-3$) for $s = s'$ and $S(S-1)/2$ rows of the equation system (rows $4S+1, \dots, 4S + S(S-1)/2$) for $s \neq s'$. The corresponding elements of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$ are defined as follows:11

)]})(([{)1(

1,,

2

,1,0,,0,)row(M ,1

NsNsTNn

NT

nNNsNNs udiagTrETN

ss

MMIQuQu (17a)

)()1(

1 ,

,1 N

ss

NNETN

uu

A ,

where })](([)({ ,,,01,,,0

,

,1 nnNsNsTN

NT

nNsNsTN

ss

N diag MMIQMMIQ

A .

)]})(([{)1(

2,,,,,1,0,,0,,)row(M ,

1NsNsTNnNnm

NT

nNNmsNNsmuudiagTrE

TNss

MMIQuQu

]([)1(

2 ,

,1, N

ss

NNmTNETN

uMIu

)A , associated with m , Sm ,...,1 .

)]})(([{)1(

1,,

2

,,1,0,,0,,)row(M ,1

NsNsTNnm

NT

nNNmsNNsmmSudiagTrE

TNss

MMIQuQu

])()([)1(

1,

,

,1, NNmT

ss

NNmTNETN

uMIMIu

A , associated with 2

m , Sm ,...,1 .

)]})(([{)1(

2,,,,,,1,0,,0,2/)1()1(,)row(M ,

1NsNsTNnlNnm

NT

nNNlsNNsmmlmmmSuudiagTrE

TNss

MMIQuQu

])(([)1(

2,

,

,1, NNmT

ss

NNlTNETN

uMIMIu

)A ,

associated with lm , )1(,...,1 Sm ; Sml ),...,1( .

)(1

,,12/)1(2,)row(M ,1

NsNsSSStr

Nss

MM .

Moment Condition $M_2^{s}$

11 For simplicity, subscript $N$ is dropped in the definition of the elements of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$.

Due to the adopted convention with respect to the ordering of the rows in equation system (13a), the row index for moment condition $M_2^{s}$ is given by $4(s-1)+2$. (For $M_2^{s}$ we always have $s = s'$, such that we use only a single index.) Hence, moment condition $M_2^{s}$ delivers $S$ rows of the equation system (rows $2, 6, \dots, 4S-2$). The corresponding elements of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$ are defined as follows:

)()1(

1,0,2)1(4 NNNss E

TNuQu

(17b)

)()1(

1,2 N

s

NNETN

uu A

, where s

N

s

N ,2,2 AA .

)()1(

1,,0,,0,,2)1(4 NmNNsNNNsmms E

TNuQuuQu

]))(([)1(

1,2,2, N

s

N

s

NNmTNETN

uMIu AA

,

)()1(

1,,0,,2)1(4 NmNNsmmSs E

TNuQu

])(([)1(

1,,2, NNmT

s

NNmTNETN

uMIMIu

)A ,

][)1(

1,,0,,,0,2/)1()1(,2)1(4 NlNNsmNmNNslmlmmmSs E

TNuQuuQu

]))()(([)1(

1,,2,2, NNmT

s

N

s

NNlTNETN

uMIMIu

AA ,

012/)1(2,2)1(4 SSSs .


Moment Condition $M_3^{s,s'}$

Due to the adopted convention regarding the ordering of the rows in equation system (13), the row index for moment condition $M_3^{s,s'}$, denoted as row($M_3^{s,s'}$), is given by $4(s-1)+3$ for $s = s'$ and $4S + S(S-1)/2 + (s-1)S - s(s-1)/2 + s' - s$ for $s \neq s'$. Hence, moment condition $M_3^{s,s'}$ delivers $S$ rows of the equation system (rows $3, 7, \dots, 4S-1$) for $s = s'$ and $S(S-1)/2$ rows of the equation system (rows $4S + S(S-1)/2 + 1, \dots, 4S + S(S-1)$) for $s \neq s'$.

)]})(([{1

,,

2

,1,1,,1,)row(M ,3

NsNsTNn

NT

nNNsNNs udiagTrEN

ss MMIQuQu (17c)

)(1 ,

,3 N

ss

NNEN

uu A ,

where ])]([)([ ,,,11,,,1

,

,3 nnNsNsTN

NT

nNsNsTN

ss

N diag MMIQMMIQ

A .

)]})(([{2

,,,,,1,1,,1,,)row(M ,3

NsNsTNnNnm

NT

nNNmsNNsmuudiagTrE

Nss

MMIQuQu

]([2 ,

3, N

ss

,NNmTNEN

uMIu )A ,

)]})(([{1

,,

2

,,1,1,,1,,)row(M ,3

NsNsTNnm

NT

nNNmsNNsmmSudiagTrE

Nss

MMIQuQu

])(([1

,

,

,3, NNmT

ss

NNmTNEN

uMIMIu

)A ,

)]})(([{2

,,,,,,1,1,,1,2/)1()1(,)row(M ,3

NsNsTNnlNnm

NT

nNNlsNNsmmlmmmSuudiagTrE

Nss

MMIQuQu

])(([2

,

,

,3, NNmT

ss

NNlTNEN

uMIMIu

)A ,

)()1(

,,12/)1(2,)row(M ,3

NsNsSSStr

N

Tss

MM , associated with 2

,N .


Moment Condition $M_4^{s}$

The row index for moment condition $M_4^{s}$ is given by $4(s-1)+4$, i.e., moment condition $M_4^{s}$ delivers $S$ rows of the equation system (rows $4, 8, \dots, 4S$). The corresponding elements of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$ are defined as follows:

)(1

,1,4)1(4 NNNss EN

uQu (17d)

)(1

,4 N

s

NNEN

uu A , where s

N

s

N ,4,4 AA .

)(1

,,1,,1,,4)1(4 NmNNsNNNsmms EN

uQuuQu

]))(([1

,4,4, N

s

N

s

NNmTNEN

uMIu AA ,

)(1

,,1,,4)1(4 NmNNsmmSs EN

uQu

])(([1

,,4, NNmT

s

NNmTNEN

uMIMIu )A ,

][1

,,1,,,1,2/)1()1(,4)1(4 NlNNsmNmNNslmlmmmSs EN

uQuuQu

]))()(([1

,,4,4, NNmT

s

N

s

NNlTNEN

uMIMIu AA ,

012/)1(2,4)1(4 SSSs .

This completes the specification of the elements of the vector $\boldsymbol{\gamma}_N$ and the matrix $\boldsymbol{\Gamma}_N$. The similarity between the structure of the expressions resulting from moment conditions $M_1^{s,s'}$ and $M_2^{s}$ on the one hand and $M_3^{s,s'}$ and $M_4^{s}$ on the other hand is apparent. Apart from a slight discrepancy in the definition of the element corresponding to $\sigma_\mu^2$ between $M_1^{s,s'}$ and $M_3^{s,s'}$, the other elements differ only by the normalization factor and the corresponding matrix of quadratic forms, $\mathbf{Q}_{0,N}$ and $\mathbf{Q}_{1,N}$, respectively.

2. Definition of GM Estimator

It is a well-known result from the literature on generalized method of moments estimation that, for the weighting matrix $\boldsymbol{\Theta}_N$ in (14), it is optimal to use the inverse of the (properly normalized) variance-covariance matrix of the sample moments, evaluated at the true parameter values. Denote the optimal weighting matrix, which will be derived in Subsection 3.2, by $\boldsymbol{\Psi}_N^{-1}$ and its estimate by $\tilde{\boldsymbol{\Psi}}_N^{-1}$. The optimally weighted GM estimator uses $\tilde{\boldsymbol{\Theta}}_N = \tilde{\boldsymbol{\Psi}}_N^{-1}$ and is defined as

$$(\tilde{\rho}_{1,N},\dots,\tilde{\rho}_{S,N},\tilde{\sigma}_{\mu,N}^2) = \arg\min\big\{\boldsymbol{\vartheta}_N(\boldsymbol{\theta})'\tilde{\boldsymbol{\Theta}}_N\boldsymbol{\vartheta}_N(\boldsymbol{\theta}),\ \rho_s \in [-a, a],\ s = 1,\dots,S,\ \sigma_\mu^2 \in [0, b]\big\},$$

$$\text{with } \boldsymbol{\vartheta}_N(\boldsymbol{\theta}) = \boldsymbol{\vartheta}_N(\boldsymbol{\rho}, \sigma_\mu^2) = (\tilde{\boldsymbol{\gamma}}_N - \tilde{\boldsymbol{\Gamma}}_N\mathbf{b}). \tag{18}$$

In a first step, we will assume that $\mu_{i,N}$ and $v_{it,N}$ are normally distributed in the derivation of the optimal weighting matrix $\boldsymbol{\Psi}_N^{-1}$, as in Kapoor et al. (2007). In the Appendix, the optimal weighting matrix $\boldsymbol{\Psi}_N^{-1}$ will be derived without distributional assumptions (apart from the ones in Assumption 1). It is worth emphasizing that the use of estimated disturbances together with the presence of endogenous variables in (1a) introduces a difference between the optimal weighting matrix $\boldsymbol{\Psi}_N^{-1}$ and the inverse of the variance-covariance matrix of the sample moments. Under fixed effects, this is true even if there are no endogenous variables in the main equation (1a). This will become apparent in Subsection 3.2, where the optimal weighting matrix $\boldsymbol{\Psi}_N^{-1}$ and an estimate $\tilde{\boldsymbol{\Psi}}_N^{-1}$ are derived.

3. Asymptotic Properties of the GM Estimator for $\boldsymbol{\theta}_N$

3.1 Consistency

In order to prove consistency of the estimator $\tilde{\boldsymbol{\theta}}_N$, the following additional assumptions are introduced:

Assumption 4.

Assume that $\tilde{\mathbf{u}}_N - \mathbf{u}_N = \mathbf{D}_N\boldsymbol{\Delta}_N$, i.e., $\tilde{u}_{n,N} - u_{n,N} = \mathbf{d}_{n.,N}\boldsymbol{\Delta}_N$, for $n = 1,\dots,NT$, where $\mathbf{D}_N$ is an $NT \times P$ matrix, the $1 \times P$ vector $\mathbf{d}_{n.,N}$ denotes the $n$-th row of $\mathbf{D}_N$, and $\boldsymbol{\Delta}_N$ is a $P \times 1$ vector. Let $d_{nj,N}$ be the $j$-th element of $\mathbf{d}_{n.,N}$. We assume that $E(|d_{nj,N}|^{2+\delta}) \leq c_d < \infty$ for some $\delta > 0$, where $c_d$ does not depend on $N$, and that $N^{1/2}\|\boldsymbol{\Delta}_N\| = O_p(1)$.

Assumption 4 will be fulfilled in many settings, e.g., if model (1a) contains endogenous variables (such as spatial lags of $\mathbf{y}_N$) and is estimated by fixed or random effects two-stage least squares. In that case, $\boldsymbol{\Delta}_N$ denotes the difference between the parameter estimates and the true parameter values and $\mathbf{d}_{n.,N}$ is the (negative of the) $n$-th row of the design matrix $\mathbf{Z}_N$ under random effects or of the within-transformed design matrix $\underline{\mathbf{Z}}_N = \mathbf{Q}_{0,N}\mathbf{Z}_N$ under fixed effects (see subsection 2 of Section IV). Under certain conditions, Assumption 4 will also be satisfied if model (1a) involves a non-linear specification (see Kelejian and Prucha, 2010). Finally, Assumption 4 implies that $(NT)^{-1}\sum_{n=1}^{NT}\|\mathbf{d}_{n.,N}\|^{2+\delta}$ is $O_p(1)$.

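Assumption 5.

(a) The smallest eigenvalues of $\boldsymbol{\Gamma}_N'\boldsymbol{\Gamma}_N$ are bounded uniformly away from zero, i.e., $\lambda_{\min}(\boldsymbol{\Gamma}_N'\boldsymbol{\Gamma}_N) \geq \lambda_* > 0$. (b) $\tilde{\boldsymbol{\Theta}}_N - \boldsymbol{\Theta}_N = o_p(1)$, where $\boldsymbol{\Theta}_N$ are $[4S + S(S-1)] \times [4S + S(S-1)]$ non-stochastic, symmetric, positive definite matrices. (c) The largest eigenvalues of $\boldsymbol{\Theta}_N$ are bounded uniformly from above, i.e., $\lambda_{\max}(\boldsymbol{\Theta}_N) \leq \lambda^{**} < \infty$, and the smallest eigenvalues of $\boldsymbol{\Theta}_N$ are bounded uniformly away from zero, i.e., $\lambda_{\min}(\boldsymbol{\Theta}_N) \geq \lambda_* > 0$.

Assumption 5 implies that the smallest eigenvalues of $\boldsymbol{\Gamma}_N'\boldsymbol{\Theta}_N\boldsymbol{\Gamma}_N$ are bounded uniformly away from zero, ensuring that the true parameter vector $\boldsymbol{\theta}_N$ is identifiably unique. Moreover, by the equivalence of matrix norms, it follows from Assumption 5 that $\boldsymbol{\Theta}_N$ and $\boldsymbol{\Theta}_N^{-1}$ are $O(1)$.

Assumptions 1-5 ensure consistency of the GM estimators for $\boldsymbol{\theta}_N = (\boldsymbol{\rho}_N', \sigma_\mu^2)'$. We summarize these results in the following theorem, which is proven in Appendix B.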

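Theorem 1. Consistency of Weighted GM Estimator $\tilde{\boldsymbol{\theta}}_N$

Suppose Assumptions 1-5 hold. Then, provided the optimization space contains the parameter space, the weighted GM estimators $\tilde{\boldsymbol{\theta}}_N(\tilde{\boldsymbol{\Theta}}_N) = [\tilde{\rho}_{1,N}(\tilde{\boldsymbol{\Theta}}_N),\dots,\tilde{\rho}_{S,N}(\tilde{\boldsymbol{\Theta}}_N),\tilde{\sigma}_{\mu,N}^2(\tilde{\boldsymbol{\Theta}}_N)]$ defined by (18) are consistent for $\rho_{1,N},\dots,\rho_{S,N}$ and $\sigma_\mu^2$, i.e.,

$$\tilde{\rho}_{s,N}(\tilde{\boldsymbol{\Theta}}_N) - \rho_{s,N} \xrightarrow{p} 0,\ s = 1,\dots,S, \quad\text{and}\quad \tilde{\sigma}_{\mu,N}^2(\tilde{\boldsymbol{\Theta}}_N) - \sigma_\mu^2 \xrightarrow{p} 0 \quad\text{as } N \to \infty.$$

This result holds for an arbitrary weighting matrix (that satisfies Assumption 5). Hence, it applies to both the optimally weighted GM estimator defined by (18) with $\tilde{\boldsymbol{\Theta}}_N = \tilde{\boldsymbol{\Psi}}_N^{-1}$ and the initial unweighted GM estimator with $\tilde{\boldsymbol{\Theta}}_N = \mathbf{I}$.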

3.2 Asymptotic Distribution of GM Estimator for $\boldsymbol{\theta}_N$

In the following we consider the asymptotic distribution of the optimally weighted GM estimator $\tilde{\boldsymbol{\theta}}_N$. To establish asymptotic normality of $\tilde{\boldsymbol{\theta}}_N = (\tilde{\boldsymbol{\rho}}_N', \tilde{\sigma}_{\mu,N}^2)'$, we need some additional assumptions.

Assumption 6.

Let $\mathbf{D}_N$ be defined as in Assumption 4, such that $\tilde{\mathbf{u}}_N - \mathbf{u}_N = \mathbf{D}_N\boldsymbol{\Delta}_N$. For any real $NT \times NT$ matrix $\mathbf{A}_N$ whose row and column sums are bounded uniformly in absolute value, it holds that $N^{-1}\mathbf{D}_N'\mathbf{A}_N\mathbf{u}_N - N^{-1}E(\mathbf{D}_N'\mathbf{A}_N\mathbf{u}_N) = o_p(1)$.

A sufficient condition for Assumption 6 is, e.g., that the columns of $\mathbf{D}_N$ are of the form $\boldsymbol{\pi}_N + \boldsymbol{\Pi}_N\boldsymbol{\varepsilon}_N$, where the elements of $\boldsymbol{\pi}_N$ are bounded uniformly in absolute value and the row and column sums of $\boldsymbol{\Pi}_N$ are bounded uniformly in absolute value (see Remark A.1 in the Appendix). This will be the case in many applications, e.g., for model (1a), when $\mathbf{D}_N$ equals (the negative of) the design matrix $\mathbf{Z}_N$ or the within-transformed design matrix $\underline{\mathbf{Z}}_N$ (compare subsection 2 of Section IV).

Assumption 7.

Let $\boldsymbol{\Delta}_N$ be defined as in Assumption 4. Then,

$$(NT)^{1/2}\boldsymbol{\Delta}_N = (NT)^{-1/2}\mathbf{T}_N'\boldsymbol{\xi}_N + o_p(1), \quad\text{with } \mathbf{T}_N = (\mathbf{T}_{v,N}', \mathbf{T}_{\mu,N}')',\ \boldsymbol{\xi}_N = (\mathbf{v}_N', \boldsymbol{\mu}_N')',$$

i.e.,

$$(NT)^{1/2}\boldsymbol{\Delta}_N = (NT)^{-1/2}\mathbf{T}_{v,N}'\mathbf{v}_N + (NT)^{-1/2}\mathbf{T}_{\mu,N}'\boldsymbol{\mu}_N + o_p(1),$$

where $\mathbf{T}_N$ is an $(NT + N) \times P$-dimensional real non-stochastic matrix whose elements are bounded uniformly in absolute value; its submatrices $\mathbf{T}_{v,N}$ and $\mathbf{T}_{\mu,N}$ are of dimension $(NT \times P)$ and $(N \times P)$, respectively. As remarked above, $\boldsymbol{\Delta}_N$ typically denotes the difference between the parameter estimates and the true parameter values. Assumption 7 is kept general and will be satisfied by many estimators, which differ in the definition of $\mathbf{T}_N$. In Section IV, we verify that it holds if the model in equation (1a) is estimated by (random or fixed effects) two-stage least squares (TSLS) or feasible spatial generalized TSLS.

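In Appendix B, the limiting distribution of the GM estimator of $\boldsymbol{\theta}_N$ is shown to depend on (the inverse of) the matrix $\mathbf{J}_N'\boldsymbol{\Theta}_N\mathbf{J}_N$ and the variance-covariance matrix of a vector of quadratic forms in $\mathbf{v}_N$ and $\boldsymbol{\mu}_N$, denoted as $\mathbf{q}_N$. We consider each of these expressions in the following.

The $[4S + S(S-1)] \times (S+1)$ matrix $\mathbf{J}_N$ of derivatives of the $[4S + S(S-1)] \times 1$ vector of moment conditions in (11) is given by

$$\mathbf{J}_N(\boldsymbol{\theta}_N) = \frac{\partial(\boldsymbol{\gamma}_N - \boldsymbol{\Gamma}_N\mathbf{b}_N)}{\partial\boldsymbol{\theta}'} = (j_{i,1,N},\dots,j_{i,S,N},j_{i,S+1,N}), \text{ with} \tag{19a}$$

$$j_{i,s,N} = \frac{\partial(\boldsymbol{\gamma}_{i.,N} - \boldsymbol{\Gamma}_{i.,N}\mathbf{b}_N)}{\partial\rho_s},\quad i = 1,\dots,4S + S(S-1),\ s = 1,\dots,S,$$

$$j_{i,S+1,N} = \frac{\partial(\boldsymbol{\gamma}_{i.,N} - \boldsymbol{\Gamma}_{i.,N}\mathbf{b}_N)}{\partial\sigma_\mu^2},\quad i = 1,\dots,4S + S(S-1),$$

where $\boldsymbol{\gamma}_{i.,N}$ and $\boldsymbol{\Gamma}_{i.,N}$ denote the $i$-th row of $\boldsymbol{\gamma}_N$ and $\boldsymbol{\Gamma}_N$, respectively.

Using $\partial\boldsymbol{\gamma}_N/\partial\boldsymbol{\theta}' = \mathbf{0}$ and ignoring the negative sign, we have

$$\mathbf{J}_N(\boldsymbol{\theta}_N) = \boldsymbol{\Gamma}_N\frac{\partial\mathbf{b}_N}{\partial\boldsymbol{\theta}'} = \boldsymbol{\Gamma}_N\mathbf{B}_N, \tag{19b}$$

where $\boldsymbol{\Gamma}_N$ is defined above and of dimension $[4S + S(S-1)] \times [2S + S(S-1)/2 + 1]$, and $\mathbf{B}_N$ is a $[2S + S(S-1)/2 + 1] \times (S+1)$ matrix of the form

$$\mathbf{B}_N = (\mathbf{B}_1', \mathbf{B}_{2,N}', \mathbf{B}_{3,N}', \mathbf{B}_{4,N}')', \tag{20a}$$

with $\mathbf{B}_1 = (\mathbf{I}_S, \mathbf{0}_{S\times 1})$ and $\mathbf{B}_{2,N} = [2\operatorname{diag}_{s=1}^{S}(\rho_{s,N}), \mathbf{0}_{S\times 1}]$. The $S(S-1)/2 \times (S+1)$ matrix $\mathbf{B}_{3,N} = [(\mathbf{B}_{3,1,N}',\dots,\mathbf{B}_{3,S-1,N}')', \mathbf{0}_{S(S-1)/2\times 1}]$ consists of $(S-1)$ vertically arranged blocks $\mathbf{B}_{3,m,N}$, $m = 1,\dots,(S-1)$, which have the following structure:

$$\mathbf{B}_{3,m,N} = (\mathbf{C}_{m,N}, \mathbf{d}_{m,N}, \mathbf{E}_{m,N}), \tag{20b}$$

where $\mathbf{C}_{m,N}$ is an $(S-m) \times (m-1)$ matrix of zeros,12 $\mathbf{d}_{m,N}$ is an $(S-m) \times 1$ vector, defined as $\mathbf{d}_{m,N} = (\rho_{m+1,N},\dots,\rho_{S,N})'$, and $\mathbf{E}_{m,N} = \rho_{m,N}\mathbf{I}_{S-m}$. Finally, $\mathbf{B}_{4,N}$ is a $1 \times (S+1)$ vector, defined as

$$\mathbf{B}_{4,N} = (\mathbf{0}_{1\times S}, 1). \tag{20c}$$

12 I.e., there is no block $\mathbf{C}_{1,N}$ in $\mathbf{B}_{3,1,N}$.

For later reference, note that $\mathbf{B}_N$ has full column rank $(S+1)$; as a consequence, the $(S+1) \times (S+1)$ matrix $\mathbf{B}_N'\mathbf{B}_N$ is positive definite (see, e.g., Greene, 2003, p. 835).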
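The block structure of the derivative matrix of the stacked parameter vector with respect to the free parameters can be checked against a numerical Jacobian. The sketch below assembles the four blocks as described in the text and compares them with central finite differences; the function names and the test point are illustrative assumptions.

```python
import numpy as np

def b_vector(theta):
    """theta = (rho_1, ..., rho_S, sigma_mu^2); linear, squared, cross terms, variance."""
    theta = np.asarray(theta, dtype=float)
    rho, sigma_mu2 = theta[:-1], theta[-1]
    S = len(rho)
    cross = [rho[m] * rho[l] for m in range(S - 1) for l in range(m + 1, S)]
    return np.concatenate([rho, rho ** 2, cross, [sigma_mu2]])

def B_matrix(theta):
    """Analytic derivative of b with respect to theta, built block by block."""
    rho = np.asarray(theta, dtype=float)[:-1]
    S = len(rho)
    B1 = np.hstack([np.eye(S), np.zeros((S, 1))])
    B2 = np.hstack([2.0 * np.diag(rho), np.zeros((S, 1))])
    blocks = []
    for m in range(1, S):                      # m = 1, ..., S-1 (1-based as in the text)
        C = np.zeros((S - m, m - 1))           # zeros for columns 1, ..., m-1
        d = rho[m:].reshape(-1, 1)             # (rho_{m+1}, ..., rho_S)'
        E = rho[m - 1] * np.eye(S - m)         # rho_m * I_{S-m}
        blocks.append(np.hstack([C, d, E, np.zeros((S - m, 1))]))
    B3 = np.vstack(blocks) if blocks else np.zeros((0, S + 1))
    B4 = np.hstack([np.zeros((1, S)), np.ones((1, 1))])
    return np.vstack([B1, B2, B3, B4])

def numeric_jacobian(theta, h=1e-6):
    """Central finite-difference Jacobian of b_vector at theta."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for k in range(len(theta)):
        step = np.zeros_like(theta)
        step[k] = h
        cols.append((b_vector(theta + step) - b_vector(theta - step)) / (2.0 * h))
    return np.column_stack(cols)
```

Because the first block is an identity matrix bordered by a zero column and the last row picks out the variance parameter, the assembled matrix has full column rank for any parameter value.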

We next consider the vector $\mathbf{q}_N$ and its limiting distribution. First, define $\mathbf{q}_N(\boldsymbol{\theta}_N, \boldsymbol{\Delta}_N)$ as the $[4S + S(S-1)] \times 1$ vector of sample moments with the expectation operator suppressed, evaluated at the true parameter values, and ignoring the deterministic constants. It is made up of the following quadratic forms in $\tilde{\mathbf{u}}_N$:

$$\mathbf{q}_N(\boldsymbol{\theta}_N, \boldsymbol{\Delta}_N) = \big[N^{-1}(\tilde{\mathbf{u}}_N'\mathbf{C}_{c,N}^{s,s'}\tilde{\mathbf{u}}_N)\big] \quad\text{for } c = 1,\dots,4 \text{ and } s, s' = 1,\dots,S. \tag{21}$$

Hence, each element of this vector corresponds to a particular moment condition, indexed by $c$, each of which is associated with a particular weights matrix $\mathbf{M}_{s,N}$ through (12b) and (12d) for moment conditions $M_2^{s}$ and $M_4^{s}$, or through (12a) and (12c) with a pair of weights matrices $\mathbf{M}_{s,N}$ and $\mathbf{M}_{s',N}$ for moment conditions $M_1^{s,s'}$ and $M_3^{s,s'}$. The arrangement of the elements is the same as in equation system (13).

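In light of (12), the matrices $\mathbf{C}_{c,N}^{s,s'}$, $c = 1,\dots,4$, and $s, s' = 1,\dots,S$, are defined as follows:

$$\mathbf{C}_{1,N}^{s,s'} = \frac{1}{2(T-1)}\mathbf{R}_N'\big[\mathbf{A}_{1,N}^{s,s'} + (\mathbf{A}_{1,N}^{s,s'})'\big]\mathbf{R}_N, \tag{22}$$

$$\mathbf{C}_{2,N}^{s} = \frac{1}{2(T-1)}\mathbf{R}_N'\big[\mathbf{A}_{2,N}^{s} + (\mathbf{A}_{2,N}^{s})'\big]\mathbf{R}_N,$$

$$\mathbf{C}_{3,N}^{s,s'} = \frac{1}{2}\mathbf{R}_N'\big[\mathbf{A}_{3,N}^{s,s'} + (\mathbf{A}_{3,N}^{s,s'})'\big]\mathbf{R}_N,$$

$$\mathbf{C}_{4,N}^{s} = \frac{1}{2}\mathbf{R}_N'\big[\mathbf{A}_{4,N}^{s} + (\mathbf{A}_{4,N}^{s})'\big]\mathbf{R}_N,$$

where we have used the definition $\mathbf{R}_N = \big[\mathbf{I}_{NT} - \sum_{m=1}^{S}\rho_{m,N}(\mathbf{I}_T \otimes \mathbf{M}_{m,N})\big]$.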

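By Assumption 3 and Remark A.1 in Appendix A, the row and column sums of the symmetric $NT \times NT$ matrices $\mathbf{C}_{c,N}^{s,s'}$, $c = 1,\dots,4$, and $s, s' = 1,\dots,S$, are bounded uniformly in absolute value. Using equation (21) and invoking Lemma B.1 (see Appendix B), the elements of $N^{1/2}\mathbf{q}_N(\boldsymbol{\theta}_N, \boldsymbol{\Delta}_N)$ can be expressed as

$$N^{-1/2}(\tilde{\mathbf{u}}_N'\mathbf{C}_{c,N}^{s,s'}\tilde{\mathbf{u}}_N) = N^{-1/2}(\mathbf{u}_N'\mathbf{C}_{c,N}^{s,s'}\mathbf{u}_N) + (\boldsymbol{\alpha}_{c,N}^{s,s'})'N^{1/2}\boldsymbol{\Delta}_N + o_p(1), \tag{23}$$

with $\boldsymbol{\alpha}_{c,N}^{s,s'} = N^{-1}E\{\mathbf{D}_N'[\mathbf{C}_{c,N}^{s,s'} + (\mathbf{C}_{c,N}^{s,s'})']\mathbf{u}_N\} = 2N^{-1}E(\mathbf{D}_N'\mathbf{C}_{c,N}^{s,s'}\mathbf{u}_N)$, since $\mathbf{C}_{c,N}^{s,s'}$ is symmetric. By Lemma B.1, the elements of the $P \times 1$ vectors $\boldsymbol{\alpha}_{c,N}^{s,s'}$, $c = 1,\dots,4$, and $s, s' = 1,\dots,S$, are bounded uniformly in absolute value. As evident from (23), $\boldsymbol{\alpha}_{c,N}^{s,s'} = \mathbf{0}$ when $E(\mathbf{D}_N'\mathbf{u}_N) = \mathbf{0}$, which is the case under random effects estimation if there are no endogenous variables.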

Note that NNNN vQεQ ,0,0 and that for symmetric NN matrices NA , we have

NNNTNN εQAIQε ,1,1 )( NNNTNNNNNT vQAIQvμAμ ,1,1 )( + NNTN μAev )(2 . Using

(22), (23), and Assumption 7 we can rewrite the vector of sample moments as

)1()1(),( *2/12/1

pNpNNNN ooNN qqΔθq , (24)

where each element of the 1)]1(4[ SSS vector )( ,,

*

,

* ssNcNc

qq can be written as linear

quadratic form of the 1)( NNT vector ),( NNN μvξ :

)1(])([ ,

,

,

,

,

,

*

pN

ss

NcN

ss

NcN

ss

Nco

ξaξAξq

)1(])()([ ,

,,

,

,,

,

, pN

ss

NcN

ss

NcN

ss

NcN o

μavaξAξ μv , (25)

where

ss

Nc

ss

Nc

ss

Nc

ss

Ncss

Nc ,

,,

,

,,,

,

,,,

,

,,,

,)( μμv

μvv

AA

AAA ,

ss

NcN

ss

Nc T

,

,

1,

, αTa , 4,...,1c , Sss ,...,1, , or

])(,)[(])(,)[( ,

,,

,

,,

1,

,,

,

,,

,

,

ss

NcN

ss

NcN

ss

Nc

ss

Nc

ss

Nc T αTαTaaa μvμv , for 4,...,1c , and Sss ,...,1, .

Observe that the elements of ss

Nc

,

,a , 4,...,1c , and Sss ,...,1, , are bounded uniformly in

absolute value by Assumption 7 and Lemma B.1. The symmetric matrices ss

Nc

,

,A , ss

Nc

,

,,vA ,

ss

Nc

,

,,, μvA , and ss

Nc

,

,,μA are of dimension )()( NNTNNT , NTNT , NNT , and NN ,

respectively, and defined as follows.

For moment condition ss ,

1M , we have

])([)1(2

1 ,

,1

,

,1

,

,,1

ss

N

ss

N

ss

NT

AAA v , NNT

ss

N

0A μv

,

,,,1 , and NN

ss

N

0A μ

,

,,1 . (26a)

For moment condition s

2M we have

])([)1(2

1,2,2,,2

s

N

s

N

s

NT

AAA v , NNT

s

N 0A μv ,,,2 , and NN

s

N 0A μ,,2 . (26b)

For moment condition ss ,

3M we have

])([2

1,3,3

,

,,3

s

N

s

N

ss

N AAA v, )]([

2

1,,,,

,

,,,3 NsNsNsNsT

ss

N

MMMMeA μv , and

)(2

,,,,

,

,,3 NsNsNsNs

ss

N

T

MMMMA μ . (26c)

For moment condition s

4M , we have

])([2

1,4,4,,4 s

N

s

N

s

N AAA v , )]([2

1,,,,,4 NsNsT

s

N MMeA μv , and

)(2

,,

,

,,4 NsNs

ss

N

TMMA μ

. (26d)

Note that the row and column sums of the symmetric matrices ss

Nc

,

,A , ss

Nc

,

,,vA , ss

Nc

,

,,, μvA , and

ss

Nc

,

,,μA are bounded uniformly in absolute value by Assumption 3 and Remark A.1 in the

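Appendix. Moreover, the elements of $\boldsymbol{\xi}_N = (\mathbf{v}_N', \boldsymbol{\mu}_N')'$ are independently distributed by Assumption 1, and the variance-covariance matrix of $\boldsymbol{\xi}_N$ is

$$\boldsymbol{\Omega}_{\xi,N} = \begin{pmatrix}\boldsymbol{\Sigma}_N & \mathbf{0}_{NT\times N}\\ \mathbf{0}_{N\times NT} & \sigma_\mu^2\mathbf{I}_N\end{pmatrix}. \tag{27}$$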

In order to calculate the variance-covariance matrix of Nq , given by the

)]1(4[)]1(4[ SSSSSS matrix )( **1

NN

-

N EN qq Ψ , we invoke Lemma A.1 in

Kelejian and Prucha (2010). For the moment, assume that the error components Nμ and Nv

are normally distributed.13

The distribution of the GM estimates without distributional

assumptions (apart from Assumption 1) is considered in the Appendix. Under normality, the

covariance between two elements of the vector Nq is given by:

),(,

,

*,

,

*1,;,

,,

tt

Nc

ss

Nc

ttss

Ncc CovN

qqE (28a)

])(,)([ ,

,

,

,

,

,

,

,

1

N

tt

NcN

tt

NcNN

ss

NcN

ss

NcNCovN ξaξAξξaξAξ

])()(,)()([ ,

,,

,

,,

,

,

,

,,

,

,,

,

,

1

N

tt

NcN

tt

NcN

tt

NcNN

ss

NcN

ss

NcN

ss

NcNCovN μavaξAξμavaξAξ μvμv

13

In that case, in Assumption 1, the requirement of finite -th moments of the error

components can be relaxed to the requirement of finite variances.

4

))(2(2 ,

,,

,

,,

4,

,,,

,

,,,

2,

,,

,

,,

1 tt

Nc

ss

Nc

tt

NcN

ss

NcN

tt

NcN

ss

NcTrN

μμμvμvvv AAAΣAΣAΣA .

])()[( ,

,,

,

,,

2,

,,

,

,,

1 tt

Nc

ss

Nc

tt

NcN

ss

NcN

μμvv aaaΣa ,

with 4,...,1, cc , Sts ,...,1, for ss and tt , and 1,...,1 Ss , ss . Note that the

each combination of indices c , s , s (and also c , t , t ) is associated with a particular row

of Nq . Hence, ttss

Ncc

,;,

,,E is the covariance between the element of Nq associated with moment

condition ss

c

,M and the element of Nq associated with moment condition tt

c

,M . (For the

second and fourth moment condition we always have ss and tt ).

In equation (28), ss

Nnnca,

,,,v and ss

Niica,

,,,μ denote the n-th and i-th main diagonal element of the

matrices ss

Nc

,

,,vA and ss

Nc

,

,,μA , respectively, and ss

Nnca,

,,,v and ss

Nica,

,,,μ denote the n-th and i-th

element of the vectors ss

Nc

,

,,va and ss

Nc

,

,,μa respectively.

The arrangement of the elements )( ,, NjiN Ψ , )]1(4[,...,1 SSSi ,

)]1(4[,...,1 SSSj is straightforward and follows naturally from the ordering of the

elements in the vector Nq , though it is notationally burdensome to state in the general case.

The expression in (28) holds generally. Part of the elements of NΨ can be stated in simpler

terms: in particular, the submatrices ss

Nc

,

,,μA , are zero for 1c and 2c such that *

,NμE drops

out for the respective elements. If both sub-matrices associated with Nit , are zero ( 1c or

2c and 1c or 2c ), **

,NμE drops out as well. Under fixed effects estimation, the terms

**

,NμE (the expressions involving ss

Nc

,

,,μa ) are equal to zero. Finally, since the main diagonal

elements of the matrices s

N,2A and s

N,4A are zero, the term *,

,NvE does not show up for

elements where 2c or 4c (or where 2c or 4c ).

To derive the asymptotic distribution of Nq and Nθ~

we invoke the central limit theorem for

vectors of linear quadratic forms given by Kelejian and Prucha (2010, Theorem A.1) and

Corollary F4 in Pötscher and Prucha (1997). We summarize the results regarding the

asymptotic distribution of Nθ~

in the following Theorem, which is proven in Appendix B.

Theorem 2. Asymptotic Normality of Nθ~

Let Nθ~

be the GM estimator defined by (18). Suppose Assumptions 1-7 hold and,

furthermore, that 0)( *

min ΨΨ cN . Then, provided the optimization space contains the

parameter space, we have

)1()()~

( 2/112/1

pNNNNNNNNN oN ξΨΘJJΘJθθ , with

NNNNN Bb ΓΓθ

J

, and

),0( )1(4

2/1

SSSd

NNN N IΨξ q ,

where )( NNN E qq Ψ and ))(( 2/12/1 NNN ΨΨΨ .

Furthermore )1()~

(2/1

pNN ON θθ and

11~ )()()( NNNNNNNNNNNN

N

JΘJJΘΨΘJJΘJΘΩθ

,

where Nθ

Ω~ is positive definite.

Theorem 2 implies that the difference between the cumulative distribution function of

)~

(2/1

NNN θθ and that of ),0( ~N

Nθ

Ω converges pointwise to zero, which justifies the use of

the latter as an approximation of the former.14

Theorem 2 holds both under normality and non-

normality of the error components, the difference being only the definition of the elements of

NΨ (and the requirement regarding the finiteness of the moments of the error components in

Assumption 1).

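Note that $\boldsymbol{\Omega}_{\tilde{\theta}_N}(\boldsymbol{\Psi}_N^{-1}) = (\mathbf{J}_N'\boldsymbol{\Psi}_N^{-1}\mathbf{J}_N)^{-1}$ and that $\boldsymbol{\Omega}_{\tilde{\theta}_N}(\boldsymbol{\Theta}_N) - \boldsymbol{\Omega}_{\tilde{\theta}_N}(\boldsymbol{\Psi}_N^{-1})$ is positive semidefinite. Thus, using a consistent estimator of $\boldsymbol{\Psi}_N^{-1}$ (which will be derived below) as weighting matrix $\boldsymbol{\Theta}_N$ leads to the efficient GM estimator. We add that $\boldsymbol{\Psi}_N$ is not exactly equal to the variance-covariance matrix of the moments if there is an endogenous right-hand side variable in equation (1), since the GM estimates are based on estimated rather than true disturbances. (See also the discussion surrounding equation (23).)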

IV. Estimation of Regression Parameters Nδ and Joint Asymptotic Distribution

In the following, we consider estimators for the regression parameters Nδ in model (1a) and

establish their joint asymptotic distribution with the GM estimates Nθ~

derived in section III.

We keep the analysis general first, allowing us to state our results in a succinct way that nests

both random and fixed effects estimation of the original model as well as the spatial GLS

transformed model. We will then be more specific about the properties and the respective

expressions for the TSLS and spatial generalized TSLS estimation of model (1a).

14

Compare Corollary F4 in Pötscher and Prucha (1997).

1. General Statement of Estimator and Joint Asymptotic Distribution

Key to establishing the asymptotic properties of the GM estimates $\tilde{\boldsymbol{\theta}}_N$, which are based on the estimated disturbances of model (1a), is Assumption 7, which holds that the (properly normalized) difference between the true parameters and the estimates ($\boldsymbol{\Delta}_N$) is linear in the stacked vector of error terms, i.e., $(NT)^{1/2}\boldsymbol{\Delta}_N = (NT)^{-1/2}\mathbf{T}_N'\boldsymbol{\xi}_N + o_p(1)$.

For all estimators of $\boldsymbol{\delta}_N$ in model (1a) considered in the present paper, the matrix $\mathbf{T}_N$ has the following structure:

$$\mathbf{T}_N = \mathbf{F}_N\mathbf{P}_N \quad\text{with } \mathbf{F}_N = (\mathbf{F}_{v,N}', \mathbf{F}_{\mu,N}')', \tag{29a}$$

which can also be written as

$$\mathbf{T}_N = (\mathbf{T}_{v,N}', \mathbf{T}_{\mu,N}')' \quad\text{with } \mathbf{T}_{v,N} = \mathbf{F}_{v,N}\mathbf{P}_N,\ \mathbf{T}_{\mu,N} = \mathbf{F}_{\mu,N}\mathbf{P}_N, \tag{29b}$$

where $\mathbf{F}_{v,N}$ is a real non-stochastic $NT \times P^*$ matrix, $\mathbf{F}_{\mu,N}$ is a real non-stochastic $N \times P^*$ matrix, and $\mathbf{P}_N$ is a real non-stochastic $P^* \times P$ matrix, with $P$ as in Assumption 7. The definition of $\mathbf{P}_N$, $\mathbf{F}_{v,N}$, $\mathbf{F}_{\mu,N}$ will be seen to depend on the estimated model (original versus spatial GLS transformed model) and the estimation approach (random versus fixed effects). In general, $\mathbf{P}_N$ is a function of the original or within-transformed design matrix $\mathbf{Z}_N$ and a real non-stochastic $NT \times P^*$ matrix of instruments $\mathbf{H}_N$ (or spatial GLS transformed variants thereof); $\mathbf{F}_{v,N}$ and $\mathbf{F}_{\mu,N}$ depend on the original or within-transformed instruments $\mathbf{H}_N$ (or spatial GLS transformed variants thereof), and, in the untransformed model, on the matrix $\big[\mathbf{I}_{NT} - \sum_{m=1}^{S}\rho_{m,N}(\mathbf{I}_T \otimes \mathbf{M}_{m,N})\big]^{-1}$.

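Since both $N^{1/2}(\tilde{\boldsymbol{\theta}}_N - \boldsymbol{\theta}_N)$ and $(NT)^{1/2}\boldsymbol{\Delta}_N$, and thus also $N^{1/2}\boldsymbol{\Delta}_N$, are asymptotically linear in $\boldsymbol{\xi}_N$, the joint distribution of the vector $[N^{1/2}\boldsymbol{\Delta}_N', N^{1/2}(\tilde{\boldsymbol{\theta}}_N - \boldsymbol{\theta}_N)']'$ can be derived invoking the central limit theorem for vectors of quadratic forms by Kelejian and Prucha (2010).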

Consider the 1)]1(4[( * SSSP vector of linear and linear quadratic forms in Nξ :

N

NN

N

NT

q

ξFw

2/1)(. (30)

Its variance-covariance matrix is of dimension )]1(4[()]1(4[ ** SSSPSSSP and

given by:

NNNNN

NNNNNNN

NNNT

NTNTEVar

qqq

q

Fξ

ξFFξξFΨw w 2/1

2/11

,)(

)()()(

NN

NN

ΨΨ

ΨΨ

Δθ

ΔθΔΔ

,

,, , (31)

where the )]1(4[()]1(4[ SSSSSS matrix NΨ is defined above in (28).

The ** PP matrix N,ΔΔΨ is defined as

N,ΔΔΨ μ

ΔΔ

v

ΔΔ ΨΨ NN ,, , with (32a)

)()( ,,

1

, NNNN NT vv

v

ΔΔ FΣFΨ and NNN NT ,,

21

, )( μμ

μ

ΔΔ FFΨ

.

The )]1(4[* SSSP matrix N,ΔθΨ is given by

])[( 2/1

, NNNN NTE q ξFΨΔθ , (32b)

which is made up by )]1(4[ SSS columns of dimension 1* P , each of them associated

with a set of indices c , s , and s and thus a particular moment condition. Under normality of

Nμ and Nv , the columns are defined as

Nssc ),,,(,., Δθψ μ

Δθ

v

Δθ ψψ NsscNssc ),,,(,.,),,,(,., , 4,...,1c , Sss ,...,1, , with (32c)

v

Δθψ Nssc ),,,(,., )(11 ,

,,,2/1

ss

NcNNTN

vv aΣF , and

μ

Δθψ Nssc ),,,(,., )(11 ,

,,,

2

2/1

ss

NcNTN

μμ aF .

In Appendix 1.2, N,ΔθΨ is defined for the general case without distributional assumptions

(apart from Assumption 1).

Regarding the joint limiting distribution of )~

(2/1

NNN θθ and NNT Δ2/1)( , we now have the

following result, which is proven in Appendix B.

Theorem 3. Joint Distribution of Nθ~

and Regression Parameters

Suppose that Assumptions 1-7 hold. Moreover, assume also that )1(ON H (see Assumption

9 below) and that )1(ON F ; the latter assumption will be verified, once we have defined the

matrix NF for the particular estimators used. Moreover, assume that 0)( *

,min wΨwΨ cN .

Then,

)1()()

~(

,

2/1

,1

2/1

2/1

2/1

pNoN

NNNNN

N

NN

N oT

N

N

ξΨΘJJΘJ0

0P

θθ

Δw

, with

),(],)[()1(4

2/12/1

,

2/1

,, *

SSSP

dNNNNNNNo NNT I0FξΨwΨξ ww q , and

1

2/1

,1

2/1

,)()( NNNNN

N

N

NNNNN

N

N

TT

JΘJJΘ0

0PΨ

ΘJJΘJ0

0PΩ ww .

Theorem 3 implies that the difference between the joint cumulative distribution function of

])~

(,[ 2/12/1 NNN NN θθΔ and that of ),( ,NN wΩ0 converges pointwise to zero, which justifies

the use of the latter distribution as an approximation of the former.

Remark 2.

Theorem 3 holds under both normality and non-normality of the error components, the

difference being the definition of the elements of N,wΨ , in particular those of NΨ and N,ΔθΨ .

Obviously, Theorem 3 can also be used to obtain the joint distribution of )~

(2/1

NNN θθ and

some other estimator

NN Δ2/1

, where )1()()( 2/12/1

pNNN oNTNT ξTΔ ,

NNN PFT ,

assuming that analogous assumptions are maintained for this estimator. In particular, the

results remain valid, but with NF and NP replaced by

NF and

NP in the definitions of N,ΔΔΨ

as well as N,ΔθΨ .

2. Two-Stage Least Squares (TSLS) and Spatial Generalized TSLS Estimation of $\boldsymbol{\delta}_N$

Obviously $E(\mathbf{Y}_N'\mathbf{u}_N) \neq \mathbf{0}$ in model (1a). In the following we consider four TSLS estimators for $\boldsymbol{\delta}_N$. First, depending on whether $\boldsymbol{\pi} = \mathbf{0}$ or not in equation (9), we consider random effects or fixed effects estimation. Second, we consider (both fixed and random effects) estimation of the original model (1a) as well as of the spatial generalized LS transformed model, which is obtained by premultiplying model (1a) with the transformation matrix $\mathbf{R}_N = \big[\mathbf{I}_{NT} - \sum_{m=1}^{S}\rho_{m,N}(\mathbf{I}_T \otimes \mathbf{M}_{m,N})\big]$. Regarding notation, we use an underbar to refer to within-transformed variables, e.g., $\underline{\mathbf{Z}}_N = \mathbf{Q}_{0,N}\mathbf{Z}_N$. Spatial generalized LS transformed variables are indicated by an asterisk, e.g., $\mathbf{Z}_N^* = \mathbf{R}_N\mathbf{Z}_N$. Matrices and vectors that are both within- and spatial GLS transformed are indicated accordingly, e.g., $\underline{\mathbf{Z}}_N^* = \mathbf{Q}_{0,N}\mathbf{Z}_N^* = \mathbf{Q}_{0,N}\mathbf{R}_N\mathbf{Z}_N$. By the properties of $\mathbf{Q}_{0,N}$, an equivalent way of writing this is $\underline{\mathbf{Z}}_N^* = \mathbf{R}_N\underline{\mathbf{Z}}_N = \mathbf{R}_N\mathbf{Q}_{0,N}\mathbf{Z}_N$, i.e., the order in which the transformations are performed is immaterial.
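That the within transformation and the spatial GLS transformation can be applied in either order follows from the Kronecker structure of both matrices. A minimal numerical sketch, with hypothetical dimensions, weights matrices, and parameter values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 4, 3                      # hypothetical panel dimensions
rho = [0.3, 0.2]                 # hypothetical spatial autoregressive parameters

def random_weights(n, rng):
    # Hypothetical weights matrix with zero diagonal and rows summing to one.
    M = rng.random((n, n))
    np.fill_diagonal(M, 0.0)
    return M / M.sum(axis=1, keepdims=True)

M = [random_weights(N, rng) for _ in rho]

# Within transformation Q0 = (I_T - J_T/T) kron I_N.
Q0 = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))
# Spatial GLS transformation R = I_NT - sum_m rho_m (I_T kron M_m).
R = np.eye(N * T) - sum(r * np.kron(np.eye(T), Mm) for r, Mm in zip(rho, M))

Z = rng.normal(size=(N * T, 3))  # hypothetical design matrix
Z_within_then_gls = R @ (Q0 @ Z)
Z_gls_then_within = Q0 @ (R @ Z)
```

Both matrices act on different factors of the Kronecker product (time for the within transformation, cross-section for the spatial filter), which is why they commute.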

2.1 Assumptions

Some properties of the regressor matrix NX have already been discussed in subsection 3 of

section II. The following further assumptions are maintained.

Assumption 8.

The non-stochastic instrument matrix $\mathbf{H}_N$ has full column rank $P^* \geq K + R$ (for $N$ large enough). Furthermore, the elements of $\mathbf{H}_N$ are bounded uniformly in absolute value. Under fixed effects estimation, we also assume that each instrument changes over time (at least for some cross-section $i$). Moreover, it holds that $\mathbf{Q}_{HH} = \lim_{N\to\infty}[(NT)^{-1}\mathbf{H}_N'\mathbf{H}_N]$ and $\mathbf{Q}_{HZ} = \operatorname{plim}_{N\to\infty}[(NT)^{-1}\mathbf{H}_N'\mathbf{Z}_N]$ are finite and non-singular.

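Regarding the choice of instruments, note that

$$E\Big[\sum_{r=1}^{R}(\mathbf{I}_T\otimes\mathbf{W}_{r,N})\mathbf{y}_N\Big] = \sum_{r=1}^{R}(\mathbf{I}_T\otimes\mathbf{W}_{r,N})E(\mathbf{y}_N) = \sum_{r=1}^{R}(\mathbf{I}_T\otimes\mathbf{W}_{r,N})\Big\{\Big[\mathbf{I}_T\otimes\Big(\mathbf{I}_N-\sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\Big)\Big]^{-1}\mathbf{X}_N\boldsymbol{\beta}_N\Big\}$$

$$= \sum_{r=1}^{R}(\mathbf{I}_T\otimes\mathbf{W}_{r,N})\Big\{\mathbf{I}_T\otimes\Big[\mathbf{I}_N+\sum_{i=1}^{\infty}\Big(\sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\Big)^{i}\Big]\Big\}\mathbf{X}_N\boldsymbol{\beta}_N,$$

provided that $\|\sum_{r=1}^{R}\lambda_{r,N}\mathbf{W}_{r,N}\| < 1$ for some matrix norm (compare Horn and Johnson, 1985, p. 301). The instrument matrix $\mathbf{H}_N$ is used to instrument $\mathbf{Z}_N = (\mathbf{X}_N, \mathbf{Y}_N)$ in a least squares regression of $\mathbf{Z}_N$ on $\mathbf{H}_N$, obtaining $\hat{\mathbf{Z}}_N = \mathbf{P}_{\mathbf{H}_N}\mathbf{Z}_N$, where $\mathbf{P}_{\mathbf{H}_N} = \mathbf{H}_N(\mathbf{H}_N'\mathbf{H}_N)^{-1}\mathbf{H}_N'$. It is thus reasonable to select $\mathbf{H}_N$ to include $\mathbf{X}_N$ and a subset of the linearly independent columns of the terms of the sum $\sum_{i=1}^{Q}[\sum_{r=1}^{R}(\mathbf{I}_T\otimes\mathbf{W}_{r,N})]^{i}\mathbf{X}_N$, where $Q$ is some predefined constant.¹⁵ Note that such a choice of $\mathbf{H}_N$ implies that the second part of Assumption 9 will be fulfilled (by Assumptions 3 and 8) and that $\mathbf{X}_N$ is projected on itself.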

Analogous assumptions are maintained for the within-transformed regressor and instrument matrices $\underline{\mathbf{X}}_N$ and $\underline{\mathbf{H}}_N$. Assumption 8 then also holds for the spatial GLS transformed variables $\mathbf{X}^*_N$ and $\mathbf{H}^*_N$ (under random effects estimation) or $\underline{\mathbf{X}}^*_N$ and $\underline{\mathbf{H}}^*_N$ (under fixed effects estimation).

15 Kelejian, Prucha, and Yuzefovich (2004) consider the results using alternative sets of instruments in the estimation of a cross-section SARAR(1,1) model. Their Monte Carlo simulation results suggest that choosing $Q = 2$ will be sufficient in many applications.

2.2 Definition of TSLS Estimator and Asymptotic Results

2.2.1 Random Effects Estimation

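The random effects TSLS estimator of model (1a) is defined as

$$\tilde{\boldsymbol{\delta}}_N = (\hat{\mathbf{Z}}_N'\mathbf{Z}_N)^{-1}\hat{\mathbf{Z}}_N'\mathbf{y}_N, \qquad (33)$$

where $\hat{\mathbf{Z}}_N = \mathbf{P}_{\mathbf{H}_N}\mathbf{Z}_N = (\mathbf{X}_N, \hat{\mathbf{Y}}_N)$, and $\hat{\mathbf{Y}}_N = \mathbf{P}_{\mathbf{H}_N}\mathbf{Y}_N$ with $\mathbf{P}_{\mathbf{H}_N} = \mathbf{H}_N(\mathbf{H}_N'\mathbf{H}_N)^{-1}\mathbf{H}_N'$.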
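As a minimal sketch of the TSLS formula $(\hat{\mathbf{Z}}'\mathbf{Z})^{-1}\hat{\mathbf{Z}}'\mathbf{y}$ with $\hat{\mathbf{Z}} = \mathbf{P}_{\mathbf{H}}\mathbf{Z}$, the following code may help. The function name `tsls` and the simulated data are illustrative assumptions, not part of the paper; the projection is computed by least squares rather than by forming $\mathbf{P}_{\mathbf{H}}$ explicitly.

```python
import numpy as np


def tsls(y, Z, H):
    """Two-stage least squares: delta = (Zhat' Z)^{-1} Zhat' y with Zhat = P_H Z."""
    # First stage: regress every column of Z on the instruments H.
    first_stage_coef, *_ = np.linalg.lstsq(H, Z, rcond=None)
    Z_hat = H @ first_stage_coef
    # Second stage: solve the normal equations built from the fitted regressors.
    delta = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)
    return delta, Z_hat


# Illustrative check on noiseless data, where TSLS must return the true coefficients.
rng = np.random.default_rng(1)
n = 200
H = rng.normal(size=(n, 5))
Z = H[:, :3] + 0.5 * rng.normal(size=(n, 3))
delta_true = np.array([1.0, -0.5, 0.25])
y = Z @ delta_true
delta_hat, _ = tsls(y, Z, H)
```

In an application, `Z` would hold the regressors and the spatial lags of the dependent variable, and `H` the instrument matrix described above.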

As already mentioned, under random effects estimation, the Z-matrix typically includes a constant. The following lemma shows that the various assumptions maintained in Section III are automatically satisfied by the random effects TSLS estimator $\tilde{\boldsymbol{\delta}}_N$ and the corresponding residuals $\tilde{\mathbf{u}}_N = \mathbf{y}_N - \mathbf{Z}_N\tilde{\boldsymbol{\delta}}_N$, which are used in the GM estimation of the parameters $\rho_{s,N}$, $s = 1,\dots,S$, and $\sigma_\mu^2$. A proof of Lemma 1 is given in Appendix B.

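Lemma 1

Suppose that Assumptions 1-3 and 8 hold, and that $\sup_N \|\boldsymbol{\beta}_N\| \leq b < \infty$. Let $\mathbf{D}_N = -\mathbf{Z}_N$; then the fourth moments of the elements of $\mathbf{D}_N$ are bounded uniformly in absolute value, Assumption 6 holds, and

(a) $(NT)^{1/2}(\tilde{\boldsymbol{\delta}}_N - \boldsymbol{\delta}_N) = (NT)^{-1/2}\mathbf{T}_{\mathbf{v},N}'\mathbf{v}_N + (NT)^{-1/2}\mathbf{T}_{\boldsymbol{\mu},N}'\boldsymbol{\mu}_N + o_p(1)$, where

$\mathbf{T}_{\mathbf{v},N} = \mathbf{F}_{\mathbf{v},N}\mathbf{P}_N$, $\mathbf{T}_{\boldsymbol{\mu},N} = \mathbf{F}_{\boldsymbol{\mu},N}\mathbf{P}_N$,

$\mathbf{P}_N = \mathbf{Q}_{\mathbf{HH}}^{-1}\mathbf{Q}_{\mathbf{HZ}}(\mathbf{Q}_{\mathbf{HZ}}'\mathbf{Q}_{\mathbf{HH}}^{-1}\mathbf{Q}_{\mathbf{HZ}})^{-1}$,

$\mathbf{F}_{\mathbf{v},N} = [\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}')]^{-1}\mathbf{H}_N$, and

$\mathbf{F}_{\boldsymbol{\mu},N} = (\mathbf{e}_T' \otimes \mathbf{I}_N)[\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}')]^{-1}\mathbf{H}_N$.

(b) $(NT)^{-1/2}\mathbf{T}_{\mathbf{v},N}'\mathbf{v}_N + (NT)^{-1/2}\mathbf{T}_{\boldsymbol{\mu},N}'\boldsymbol{\mu}_N = O_p(1)$;

(c) $\mathbf{P}_N = O(1)$ and $\tilde{\mathbf{P}}_N - \mathbf{P}_N = o_p(1)$, with

$$\tilde{\mathbf{P}}_N = [(NT)^{-1}\mathbf{H}_N'\mathbf{H}_N]^{-1}[(NT)^{-1}\mathbf{H}_N'\mathbf{Z}_N]\{[(NT)^{-1}\mathbf{Z}_N'\mathbf{H}_N][(NT)^{-1}\mathbf{H}_N'\mathbf{H}_N]^{-1}[(NT)^{-1}\mathbf{H}_N'\mathbf{Z}_N]\}^{-1}.$$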
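The matrix $\tilde{\mathbf{P}}_N$ in part (c) is what turns the TSLS estimation error into a linear form in the disturbances: algebraically, $\tilde{\boldsymbol{\delta}}_N - \boldsymbol{\delta}_N = \tilde{\mathbf{P}}_N'[(NT)^{-1}\mathbf{H}_N'\mathbf{u}_N]$. The sketch below checks this identity on simulated data; the function name `p_tilde`, the sample size `n` (standing in for $NT$), and the data-generating choices are assumptions made only for illustration.

```python
import numpy as np


def p_tilde(Z, H):
    """P~ = (H'H/n)^{-1} (H'Z/n) [ (Z'H/n) (H'H/n)^{-1} (H'Z/n) ]^{-1}."""
    n = Z.shape[0]
    HH = H.T @ H / n
    HZ = H.T @ Z / n
    HH_inv_HZ = np.linalg.solve(HH, HZ)
    return HH_inv_HZ @ np.linalg.inv(HZ.T @ HH_inv_HZ)


rng = np.random.default_rng(2)
n = 300
H = rng.normal(size=(n, 4))
Z = H[:, :2] + 0.5 * rng.normal(size=(n, 2))
delta = np.array([0.5, 1.0])
u = rng.normal(size=n)
y = Z @ delta + u

# TSLS computed directly from its definition.
Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
delta_tsls = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)

# Estimation error versus the linear form P~' (H'u / n).
lhs = delta_tsls - delta
rhs = p_tilde(Z, H).T @ (H.T @ u / n)
```

The lemma goes one step further and rewrites $\mathbf{H}_N'\mathbf{u}_N$ in terms of $\mathbf{v}_N$ and $\boldsymbol{\mu}_N$ through the matrices $\mathbf{F}_{\mathbf{v},N}$ and $\mathbf{F}_{\boldsymbol{\mu},N}$; that step is not reproduced here.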

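Note that (a) and (b) together imply that $\tilde{\boldsymbol{\delta}}_N$ is an $N^{1/2}$-consistent estimator of $\boldsymbol{\delta}_N$. Regarding Assumption 4, we now have $\tilde{\mathbf{u}}_N = \mathbf{u}_N + \mathbf{D}_N\boldsymbol{\Delta}_N$, where $\mathbf{D}_N = -\mathbf{Z}_N$ and $\boldsymbol{\Delta}_N = \tilde{\boldsymbol{\delta}}_N - \boldsymbol{\delta}_N$. Lemma 1 shows that under Assumptions 1-3 and 8 the TSLS residuals automatically satisfy the conditions postulated in Assumptions 4, 6, and 7 with respect to $\mathbf{D}_N$, $\boldsymbol{\Delta}_N$, and $\mathbf{T}_N$. Hence, Theorems 1 and 2 apply to the GM estimator $\tilde{\boldsymbol{\theta}}_N$, which is based on the TSLS residuals. The lemma also establishes that the elements of $\mathbf{D}_N = -\mathbf{Z}_N$ are bounded uniformly in absolute value, gives explicit expressions for $\mathbf{P}_N$ and $\tilde{\mathbf{P}}_N$, and verifies that the conditions concerning these matrices made in Theorem 3 are fulfilled. Hence, Theorem 3 covers the GM estimator $\tilde{\boldsymbol{\theta}}_N$ and the TSLS estimator $\tilde{\boldsymbol{\delta}}_N$, and gives the joint limiting distribution of $N^{1/2}(\tilde{\boldsymbol{\theta}}_N - \boldsymbol{\theta}_N)$ and $N^{1/2}(\tilde{\boldsymbol{\delta}}_N - \boldsymbol{\delta}_N)$, where the matrices $\mathbf{P}_N$, $\tilde{\mathbf{P}}_N$, $\mathbf{F}_{\mathbf{v},N}$, $\mathbf{F}_{\boldsymbol{\mu},N}$ are as in Lemma 1.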

2.2.2 Fixed Effects Estimation

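The fixed effects TSLS estimator of model (1a) is defined as

$$\underline{\tilde{\boldsymbol{\delta}}}_N = (\hat{\underline{\mathbf{Z}}}_N'\underline{\mathbf{Z}}_N)^{-1}\hat{\underline{\mathbf{Z}}}_N'\underline{\mathbf{y}}_N, \qquad (34)$$

where $\hat{\underline{\mathbf{Z}}}_N = \mathbf{P}_{\underline{\mathbf{H}}_N}\underline{\mathbf{Z}}_N = \mathbf{P}_{\underline{\mathbf{H}}_N}\mathbf{Z}_N$ with $\mathbf{P}_{\underline{\mathbf{H}}_N} = \underline{\mathbf{H}}_N(\underline{\mathbf{H}}_N'\underline{\mathbf{H}}_N)^{-1}\underline{\mathbf{H}}_N'$.

The fixed effects estimates $\underline{\tilde{\boldsymbol{\delta}}}_N$ can then be used to obtain consistent estimates of the disturbances, given by $\tilde{\mathbf{u}}_N = \mathbf{y}_N - \mathbf{Z}_N\underline{\tilde{\boldsymbol{\delta}}}_N$, which are then used for the GM estimation of the parameters $\rho_{s,N}$, $s = 1,\dots,S$, and $\sigma_\mu^2$. These should not be confused with the fixed effects residuals $\underline{\tilde{\mathbf{u}}}_N = \underline{\mathbf{y}}_N - \underline{\mathbf{Z}}_N\underline{\tilde{\boldsymbol{\delta}}}_N$, which are an estimate of $\mathbf{Q}_{0,N}\mathbf{u}_N$.

The results for the fixed effects estimation are exactly as in Lemma 1, with $\mathbf{T}_N$, $\mathbf{P}_N$, $\mathbf{H}_N$ replaced with their within-transformed counterparts $\underline{\mathbf{T}}_N$, $\underline{\mathbf{P}}_N$, $\underline{\mathbf{H}}_N$, and with $\mathbf{T}_{\boldsymbol{\mu},N} = \mathbf{0}$, $\mathbf{F}_{\boldsymbol{\mu},N} = \mathbf{0}$, and

$$\underline{\mathbf{F}}_{\mathbf{v},N} = \mathbf{Q}_{0,N}\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}'\Big)\Big]^{-1}\mathbf{H}_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}'\Big)\Big]^{-1}\underline{\mathbf{H}}_N.^{16}$$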

3. Definition of Spatial Generalized Two-Stage Least Squares (GTSLS) Estimator and

Asymptotic Results

16 By the idempotency of the within-transformation matrix $\mathbf{Q}_{0,N}$, one could equivalently use the fixed effects residuals $\underline{\mathbf{v}}_N = \mathbf{Q}_{0,N}\mathbf{v}_N$ in the expression $(NT)^{1/2}\underline{\boldsymbol{\Delta}}_N = (NT)^{-1/2}\underline{\mathbf{T}}_{\mathbf{v},N}'\mathbf{v}_N + o_p(1)$. However, since the derivation of the heteroskedasticity-robust variance-covariance matrix relies on the use of the original residuals, we also define the fixed effects estimator as a linear form in the original residuals $\mathbf{v}_N$.

3.1. Random Effects Estimation

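The spatial GLS transformed version of model (1b) is given by

$$\mathbf{y}^*_N = \mathbf{Z}^*_N\boldsymbol{\delta}_N + \mathbf{u}^*_N, \qquad (34)$$

where $\mathbf{y}^*_N = \mathbf{R}_N\mathbf{y}_N$, $\mathbf{Z}^*_N = \mathbf{R}_N\mathbf{Z}_N$, and $\mathbf{u}^*_N = \mathbf{R}_N\mathbf{u}_N = \boldsymbol{\varepsilon}_N$, and the transformation matrix $\mathbf{R}_N$ is given by $\mathbf{R}_N = [\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N})]$.

The random effects spatial GTSLS estimator, denoted as $\hat{\boldsymbol{\delta}}^*_N$, is then obtained as a TSLS estimator applied to the transformed model (37), using the transformed instruments $\mathbf{H}^*_N = \mathbf{R}_N\mathbf{H}_N$, i.e.,

$$\hat{\boldsymbol{\delta}}^*_N = (\hat{\mathbf{Z}}^{*\prime}_N\mathbf{Z}^*_N)^{-1}\hat{\mathbf{Z}}^{*\prime}_N\mathbf{y}^*_N, \qquad (35a)$$

with $\hat{\mathbf{Z}}^*_N = \mathbf{P}_{\mathbf{H}^*_N}\mathbf{Z}^*_N$ and $\mathbf{P}_{\mathbf{H}^*_N} = \mathbf{H}^*_N(\mathbf{H}^{*\prime}_N\mathbf{H}^*_N)^{-1}\mathbf{H}^{*\prime}_N$.

The feasible random effects spatial GTSLS estimator, denoted as $\hat{\tilde{\boldsymbol{\delta}}}^*_N$, is defined analogously, replacing the transformation matrix $\mathbf{R}_N$ by its estimate $\tilde{\mathbf{R}}_N = [\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N})]$, i.e.,

$$\hat{\tilde{\boldsymbol{\delta}}}^*_N = (\hat{\tilde{\mathbf{Z}}}^{*\prime}_N\tilde{\mathbf{Z}}^*_N)^{-1}\hat{\tilde{\mathbf{Z}}}^{*\prime}_N\tilde{\mathbf{y}}^*_N, \qquad (35b)$$

where the tilde indicates that the transformation is based on the estimate of $\mathbf{R}_N$.

The following lemma shows that the various assumptions maintained in Section III are automatically satisfied by the (feasible) random effects spatial GTSLS estimator $\hat{\tilde{\boldsymbol{\delta}}}^*_N$ and the corresponding residuals $\mathbf{u}_N(\hat{\tilde{\boldsymbol{\delta}}}^*_N) = \mathbf{y}_N - \mathbf{Z}_N\hat{\tilde{\boldsymbol{\delta}}}^*_N$. The proof is given in Appendix B.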

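Lemma 2.

Suppose the Assumptions of Lemma 1 hold, and let $\hat{\boldsymbol{\delta}}^*_N$ be defined as in (39), where $\tilde{\boldsymbol{\theta}}_N$ is any $N^{1/2}$-consistent estimator of $\boldsymbol{\theta}_N$ (such as the GM estimator $\tilde{\boldsymbol{\theta}}_N$ based on the TSLS residuals). Then

(a) $(NT)^{1/2}\boldsymbol{\Delta}^*_N = (NT)^{-1/2}\mathbf{T}^{*\prime}_{\mathbf{v},N}\mathbf{v}_N + (NT)^{-1/2}\mathbf{T}^{*\prime}_{\boldsymbol{\mu},N}\boldsymbol{\mu}_N + o_p(1)$, where

$\mathbf{T}^*_{\mathbf{v},N} = \mathbf{F}^*_{\mathbf{v},N}\mathbf{P}^*_N$, $\mathbf{T}^*_{\boldsymbol{\mu},N} = \mathbf{F}^*_{\boldsymbol{\mu},N}\mathbf{P}^*_N$,

$\mathbf{P}^*_N = \mathbf{Q}_{\mathbf{H}^*\mathbf{H}^*}^{-1}\mathbf{Q}_{\mathbf{H}^*\mathbf{Z}^*}(\mathbf{Q}_{\mathbf{H}^*\mathbf{Z}^*}'\mathbf{Q}_{\mathbf{H}^*\mathbf{H}^*}^{-1}\mathbf{Q}_{\mathbf{H}^*\mathbf{Z}^*})^{-1}$,

$\mathbf{F}^*_{\mathbf{v},N} = \mathbf{H}^*_N$, and $\mathbf{F}^*_{\boldsymbol{\mu},N} = (\mathbf{e}_T' \otimes \mathbf{I}_N)\mathbf{H}^*_N$.

(b) $(NT)^{-1/2}\mathbf{T}^{*\prime}_{\mathbf{v},N}\mathbf{v}_N + (NT)^{-1/2}\mathbf{T}^{*\prime}_{\boldsymbol{\mu},N}\boldsymbol{\mu}_N = O_p(1)$.

(c) $\mathbf{P}^*_N = O(1)$ and $\tilde{\mathbf{P}}^*_N - \mathbf{P}^*_N = o_p(1)$ for

$$\tilde{\mathbf{P}}^*_N = [(NT)^{-1}\mathbf{H}^{*\prime}_N\mathbf{H}^*_N]^{-1}[(NT)^{-1}\mathbf{H}^{*\prime}_N\mathbf{Z}^*_N]\{[(NT)^{-1}\mathbf{Z}^{*\prime}_N\mathbf{H}^*_N][(NT)^{-1}\mathbf{H}^{*\prime}_N\mathbf{H}^*_N]^{-1}[(NT)^{-1}\mathbf{H}^{*\prime}_N\mathbf{Z}^*_N]\}^{-1}.$$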

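In light of Lemmata 1 and 2 the joint limiting distribution of the (feasible) spatial GTSLS estimator $\hat{\boldsymbol{\delta}}^*_N$ and the GM estimator $\tilde{\boldsymbol{\theta}}_N$ follows from Theorem 3 and the discussion thereafter, with $\boldsymbol{\Delta}^*_N = \hat{\boldsymbol{\delta}}^*_N - \boldsymbol{\delta}_N$.

Note that in light of Lemma 2 the residuals $\mathbf{u}_N(\hat{\boldsymbol{\delta}}^*_N) = \mathbf{y}_N - \mathbf{Z}_N\hat{\boldsymbol{\delta}}^*_N = \mathbf{u}_N + \mathbf{D}_N\boldsymbol{\Delta}^*_N$ can be used to estimate $\boldsymbol{\theta}_N$ by the GM estimator defined by (18), where the discussion surrounding Lemma 1 applies analogously here. Taking this argument one step further, $\boldsymbol{\theta}_N$ and $\boldsymbol{\delta}_N$ can also be estimated by an iterative procedure.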

3.2. Fixed Effects Estimation

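The fixed effects spatial GTSLS estimator, denoted as $\hat{\underline{\boldsymbol{\delta}}}^*_N$, is defined as

$$\hat{\underline{\boldsymbol{\delta}}}^*_N = (\hat{\underline{\mathbf{Z}}}^{*\prime}_N\underline{\mathbf{Z}}^*_N)^{-1}\hat{\underline{\mathbf{Z}}}^{*\prime}_N\underline{\mathbf{y}}^*_N, \qquad (36a)$$

with $\hat{\underline{\mathbf{Z}}}^*_N = \mathbf{P}_{\underline{\mathbf{H}}^*_N}\underline{\mathbf{Z}}^*_N$ and $\mathbf{P}_{\underline{\mathbf{H}}^*_N} = \underline{\mathbf{H}}^*_N(\underline{\mathbf{H}}^{*\prime}_N\underline{\mathbf{H}}^*_N)^{-1}\underline{\mathbf{H}}^{*\prime}_N$.

The feasible fixed effects spatial GTSLS estimator, denoted as $\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N$, is defined analogously, using the estimate of the transformation matrix $\tilde{\mathbf{R}}_N = [\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N})]$, i.e.,

$$\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N = (\hat{\tilde{\underline{\mathbf{Z}}}}^{*\prime}_N\tilde{\underline{\mathbf{Z}}}^*_N)^{-1}\hat{\tilde{\underline{\mathbf{Z}}}}^{*\prime}_N\tilde{\underline{\mathbf{y}}}^*_N. \qquad (36b)$$

The results for the fixed effects estimation are exactly as in Lemma 2, with $\mathbf{T}^*_N$, $\mathbf{P}^*_N$, $\mathbf{H}^*_N$ replaced with their within-transformed counterparts $\underline{\mathbf{T}}^*_N$, $\underline{\mathbf{P}}^*_N$, $\underline{\mathbf{H}}^*_N$, and with $\mathbf{T}^*_{\boldsymbol{\mu},N} = \mathbf{0}$ and $\mathbf{F}^*_{\boldsymbol{\mu},N} = \mathbf{0}$, and $\underline{\mathbf{F}}^*_{\mathbf{v},N} = \underline{\mathbf{H}}^*_N$.

Again notice that it is not the fixed effects residuals but the estimated disturbances $\tilde{\mathbf{u}}_N(\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N) = \mathbf{y}_N - \mathbf{Z}_N\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N$ that can be used in the GM estimation of $\boldsymbol{\theta}_N$.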

V. Variance-Covariance Matrix Estimation

As evident from Theorem 3, the matrix $\boldsymbol{\Omega}_{\mathbf{w},N}$ is of sandwich form. Both under random and fixed effects estimation, the "sandwiched" middle term, i.e., $\boldsymbol{\Psi}_{\mathbf{w},N}$, is seen to depend (among others) on the idiosyncratic error terms $\mathbf{v}_N$. A complication in deriving a consistent estimator for $\boldsymbol{\Psi}_{\mathbf{w},N}$ arises from the well-known fact that one can only obtain consistent estimates of the vector of fixed effects residuals $\underline{\mathbf{v}}_N = (\underline{v}_{it,N})$, i.e., the within-transformed residuals, but not of the original idiosyncratic errors $\mathbf{v}_N$, a manifestation of the so-called incidental parameter problem (Lancaster, 2000).

This point was prominently made in a recent paper by Stock and Watson (2008), who suggest a heteroskedasticity-robust bias-corrected variance-covariance matrix estimator for nonspatial fixed effects panel data models. A closely related issue arises in the estimation of the variance-covariance matrix of the GM estimates $\tilde{\boldsymbol{\theta}}_N$ given by (28). In the following, we will derive bias-corrected estimators for the joint asymptotic variance-covariance matrix of all model parameters under both fixed and random effects estimation, pursuing an approach analogous to that in Stock and Watson (2008).

1. Estimation of $\boldsymbol{\Psi}_{\mathbf{w},N}$

In the following, we derive estimators for each block of $\boldsymbol{\Psi}_{\mathbf{w},N}$. We start by defining an estimator for $\boldsymbol{\Psi}_{\boldsymbol{\Delta\Delta},N}$, required for inference with respect to the parameters $\boldsymbol{\delta}_N$ of the main equation (1a). In a next step we turn to the estimation of the (inverse of the) optimal weighting matrix for the GM estimation, $\boldsymbol{\Psi}_N$, which is also a key element in the estimation of the variance-covariance matrix of the GM estimates of $\boldsymbol{\theta}_N$. Finally, we turn to the estimation of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N}$, required for joint tests regarding $\boldsymbol{\theta}_N$ and $\boldsymbol{\delta}_N$.

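1.1 Estimation of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\Delta},N}$

Consider $\boldsymbol{\Psi}_{\boldsymbol{\Delta\Delta},N} = \boldsymbol{\Psi}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N} + \boldsymbol{\Psi}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N}$, where

$$\boldsymbol{\Psi}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N} = (NT)^{-1}\mathbf{F}_{\mathbf{v},N}'\boldsymbol{\Sigma}_N\mathbf{F}_{\mathbf{v},N} \quad\text{and}\quad \boldsymbol{\Psi}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N} = \sigma_\mu^2(NT)^{-1}\mathbf{F}_{\boldsymbol{\mu},N}'\mathbf{F}_{\boldsymbol{\mu},N}.$$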

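Under random effects estimation, the estimators for $\mathbf{F}_N$ (original model) and $\mathbf{F}^*_N$ (spatial GLS transformed model) are defined as

$$\tilde{\mathbf{F}}_{\mathbf{v},N} = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N}'\Big)\Big]^{-1}\mathbf{H}_N, \qquad (37a)$$

$$\tilde{\mathbf{F}}_{\boldsymbol{\mu},N} = (\mathbf{e}_T' \otimes \mathbf{I}_N)\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N}'\Big)\Big]^{-1}\mathbf{H}_N,$$

and

$$\tilde{\mathbf{F}}^*_{\mathbf{v},N} = \tilde{\mathbf{H}}^*_N = \Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N}\Big)\Big]\mathbf{H}_N, \qquad (37b)$$

$$\tilde{\mathbf{F}}^*_{\boldsymbol{\mu},N} = (\mathbf{e}_T' \otimes \mathbf{I}_N)\tilde{\mathbf{H}}^*_N = (\mathbf{e}_T' \otimes \mathbf{I}_N)\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N}\Big)\Big]\mathbf{H}_N.$$

Under fixed effects estimation, the estimators for $\underline{\mathbf{F}}_N$ and $\underline{\mathbf{F}}^*_N$ are defined as

$$\underline{\tilde{\mathbf{F}}}_{\mathbf{v},N} = \mathbf{Q}_{0,N}\tilde{\mathbf{F}}_{\mathbf{v},N}, \quad \tilde{\mathbf{F}}_{\boldsymbol{\mu},N} = \mathbf{0}, \quad\text{and} \qquad (37c)$$

$$\underline{\tilde{\mathbf{F}}}^*_{\mathbf{v},N} = \mathbf{Q}_{0,N}\tilde{\mathbf{F}}^*_{\mathbf{v},N}, \quad \tilde{\mathbf{F}}^*_{\boldsymbol{\mu},N} = \mathbf{0}. \qquad (37d)$$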

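Hence, under random effects estimation of the untransformed model, the estimator for $\boldsymbol{\Psi}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N}$ is given by

$$\tilde{\boldsymbol{\Psi}}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N} = \tilde{\sigma}_\mu^2\frac{1}{NT}\tilde{\mathbf{F}}_{\boldsymbol{\mu},N}'\tilde{\mathbf{F}}_{\boldsymbol{\mu},N}, \qquad (38)$$

where $\tilde{\sigma}_\mu^2$ is the GM estimate of $\sigma_\mu^2$ (based on the residuals generated using the random effects estimator, $\tilde{\mathbf{u}}_N(\tilde{\boldsymbol{\delta}}_N) = \mathbf{y}_N - \mathbf{Z}_N\tilde{\boldsymbol{\delta}}_N$). For the other estimators considered, $\tilde{\boldsymbol{\Psi}}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N}$ is defined in the same way, properly replacing the F-matrices and the estimates of the disturbances $\tilde{\mathbf{u}}_N$.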

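As already mentioned above, due to the heteroskedasticity of $\mathbf{v}_N$ and the fact that the variance-covariance matrix depends on the idiosyncratic error terms in levels $\mathbf{v}_N$ rather than the fixed effects residuals $\underline{\mathbf{v}}_N$, a bias correction is required. As shown in Lemma C.2 of the Appendix, adopting an approach analogous to that in Stock and Watson (2008) in the present framework yields the following bias-corrected estimator for $\boldsymbol{\Psi}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N}$:¹⁷

$$\tilde{\boldsymbol{\Psi}}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N} = \frac{1}{N(T-2)}(\tilde{\mathbf{F}}_{\mathbf{v},N}'\tilde{\boldsymbol{\Sigma}}^{HR}_N\tilde{\mathbf{F}}_{\mathbf{v},N}), \qquad (39)$$

where $\tilde{\boldsymbol{\Sigma}}^{HR}_N = \operatorname{diag}_{n=1}^{NT}[(\tilde{v}^{HR}_{n,N})^2]$ with

$$(\tilde{v}^{HR}_{it,N})^2 = \underline{\tilde{v}}_{it,N}^2 - \frac{1}{T(T-1)}\sum_{r=1}^{T}\underline{\tilde{v}}_{ir,N}^2.$$

The estimates of the fixed effects residuals are given by $\underline{\tilde{\mathbf{v}}}_N = (\underline{\tilde{v}}_{it,N}) = \mathbf{Q}_{0,N}\tilde{\boldsymbol{\varepsilon}}_N = \mathbf{Q}_{0,N}[\mathbf{I}_T \otimes (\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N})]\tilde{\mathbf{u}}_N$.

Again, the modifications of (39) for other estimators are straightforward, replacing $\tilde{\mathbf{F}}_{\mathbf{v},N}$ properly.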
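The logic of the bias correction can be illustrated for a single cross-sectional unit with known variances. With independent errors and within-transformed residuals, $E(\underline{v}_{it}^2) = (1 - 2/T)\sigma_{it}^2 + T^{-2}\sum_{s}\sigma_{is}^2$, so subtracting $[T(T-1)]^{-1}\sum_r \underline{v}_{ir}^2$ and rescaling by $T/(T-2)$ recovers $\sigma_{it}^2$ in expectation. The sketch below verifies this with exact expectations rather than simulated draws; the function name `hr_squared` and the variance values are illustrative assumptions.

```python
import numpy as np


def hr_squared(v_within):
    """Adjusted squared residuals: v_it^2 - sum_r v_ir^2 / (T (T - 1)).

    v_within is an (N, T) array of within-transformed residuals.
    """
    T = v_within.shape[1]
    sq = v_within ** 2
    return sq - sq.sum(axis=1, keepdims=True) / (T * (T - 1))


T = 5
sigma2 = np.array([[0.5, 1.0, 2.0, 0.3, 1.5]])  # assumed true variances of one unit

# Exact expectation of the squared within residuals: diag(Q diag(sigma2) Q).
Q = np.eye(T) - np.ones((T, T)) / T
expected_sq = np.diag(Q @ np.diag(sigma2[0]) @ Q)[None, :]

# The correction is linear in the squares, so it can be applied to their expectation
# (passing the square roots so that the function squares them back).
recovered = T / (T - 2) * hr_squared(np.sqrt(expected_sq))
```

The factor $T/(T-2)$ used in the last line corresponds to replacing $(NT)^{-1}$ by $[N(T-2)]^{-1}$ in the estimator of $\boldsymbol{\Psi}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N}$; the correction requires $T > 2$.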

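We summarize the consistency result of the estimators given by (38) and (39) with the following theorem.

Theorem 4a. Consistency of $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\Delta},N}$

Let $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\Delta},N} = \tilde{\boldsymbol{\Psi}}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N} + \tilde{\boldsymbol{\Psi}}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N}$ with $\tilde{\boldsymbol{\Psi}}^{\boldsymbol{\mu}}_{\boldsymbol{\Delta\Delta},N}$ and $\tilde{\boldsymbol{\Psi}}^{\mathbf{v}}_{\boldsymbol{\Delta\Delta},N}$ defined in (38) and (39). Suppose that the Assumptions of Theorem 3, apart from Assumptions 5 and 7, hold and that additionally all of the fourth moments of the elements of $\mathbf{D}_N$ are bounded uniformly. Suppose furthermore (a) $\sup_N\sum_{s=1}^{S}|\rho_{s,N}| < 1$ and that the row and column sums of $\mathbf{M}_N$ are bounded uniformly in absolute value by one and some finite constant respectively, and (b) $\tilde{\mathbf{P}}_N - \mathbf{P}_N = o_p(1)$ with $\mathbf{P}_N = O(1)$. Then, $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\Delta},N} - \boldsymbol{\Psi}_{\boldsymbol{\Delta\Delta},N} = o_p(1)$ and $\tilde{\boldsymbol{\Psi}}^{-1}_{\boldsymbol{\Delta\Delta},N} - \boldsymbol{\Psi}^{-1}_{\boldsymbol{\Delta\Delta},N} = o_p(1)$.

Proof. Theorem 4a follows from Lemmata C.2 and C.3 in Appendix C.¹⁸

Remark 3: Under estimation of the spatial GLS transformed model (where the inverse of $\mathbf{R}_N$ cancels out), condition (a) can be dropped. Under TSLS (or spatial GTSLS) estimation, condition (b) in Theorem 4a is automatically fulfilled (see Lemmata 1 and 2).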

1.2 Estimation of $\boldsymbol{\Psi}_N$

17 The result in Stock and Watson (2008) is obtained as a special case for $\boldsymbol{\rho}_N = \mathbf{0}$ and if there are no endogenous right-hand side variables, i.e., $\mathbf{F}_N = \mathbf{X}_N$.

18 Note that Lemma C.2 uses a slightly different definition of $\tilde{\boldsymbol{\Sigma}}^{HR}_N$, factoring out $T/(T-2)$, for notational convenience of the proof.

Consider the elements of NΨ as defined in (28). For estimation, it will turn out convenient to

rewrite the part of the elements of NΨ as given by (28a) with the main diagonal elements of

the matrices ss

Nc

,

,,vA set to zero in the first expression of the trace in the first line. Furthermore,

to simplify the exposition we drop the indices sc, , and s in the following derivation and to

adopt the following notational convention. We refer to the matrix ss

Nc

,

,,vA , associated with the

set of indices ssc ,, as )()( ,,,,,, NjsitNnnN aa vvvA , and to the matrix ss

Nc

,

,,vA with its main

diagonal elements set to zero as )( ,,1,, Nnn

NT

nNN adiag vvv AA . Analogously, the matrix

tt

Nc

,

,,vA , associated with the set of indices ttc ,, is denoted as )()( ,,,,,, NjsitNnnN bb vvvB , and

)( ,,1,, Nnn

NT

nNN bdiag vvv BB . We adopt the same convention for the matrices ss

Nc

,

,,μA ,

henceforth denoted as )()( ,,,,,, NjsitNnnN aa μμμA , as well as tt

Nc

,

,,μA , henceforth denoted as

N,μB , and also for the vectors ss

Nc

,

,,va and tt

Nc

,

,,va , henceforth denoted )()( ,,,,, NitNnN aa vvva

and )()( ,,,,, NitNnN bb vvvb , respectively. Finally, we refer to products of equally indexed

elements of N,vA and N,vB as NjsitNjsitNit bac ,,,,,,,, vvv (or NnnNnnNn bac ,,,,, vv ) , and we define

)( ,,,,,,,,,, NnNnnNnnNnNn babad vvvvv and )( ,,,,,,,,,, NnNnnNnnNnNn babad μμμμμ .

In that case, equation (28a) can be written, for given a given pair of index sets ssc ,, and

ttc ,, , as

*

,

*

,

*

,

*,

, NNNNN vμμvv EEEEE **

,

**

, NN μv EE , (40)

where

)(2 ,,

1*,

, NNNNN TrN ΣBΣA vvv

E

N

i

T

t

N

j

T

itjss

NjsNitNjsit vvcEN1 1 1 1

2

,

2

,,,,

12 v,

*,

,NvE =

NT

n

NnvNn

NT

n

NnvNn cNcN1

)4(

,,,,

1

1

4

,,,,

1

3

22 vv ,

)(4 ,,,,

12*

, NNNN TrN μvμvvμ BΣA

E ,

)(2 ,,

14*

, NNN TrN μμμ BA E ,

NNNN N ,,

1**

, vvv bΣa E ,

NNN N ,,

21**

, μμμ ba

E ,

Notice that the terms 4

,,,, NnvNnc v , NTn ,...,1 , associated with the main diagonal elements of

N,vA and N,vB , in the expression NNNN ΣBΣA vv ,, , are not included in *,

,NvE . To rewrite *,

,NvE ,

we have used the fact that 3/)4(

,,

4

,, NnvNnv under normality, where )4(

,, Nnv is the fourth

moment of Nv .

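We next define the estimates for $\mathbf{a}_{\mathbf{v},N}$ and $\mathbf{a}_{\boldsymbol{\mu},N}$:

$$\tilde{\mathbf{a}}_{\mathbf{v},N} = (\tilde{a}_{\mathbf{v},n,N}) = \tilde{\mathbf{T}}_{\mathbf{v},N}\tilde{\boldsymbol{\alpha}}_N, \quad \tilde{\mathbf{a}}_{\boldsymbol{\mu},N} = (\tilde{a}_{\boldsymbol{\mu},n,N}) = \tilde{\mathbf{T}}_{\boldsymbol{\mu},N}\tilde{\boldsymbol{\alpha}}_N, \quad\text{with} \qquad (41)$$

$\tilde{\mathbf{T}}_{\mathbf{v},N} = \tilde{\mathbf{F}}_{\mathbf{v},N}\tilde{\mathbf{P}}_N$, $\tilde{\mathbf{T}}_{\boldsymbol{\mu},N} = \tilde{\mathbf{F}}_{\boldsymbol{\mu},N}\tilde{\mathbf{P}}_N$, and $\tilde{\boldsymbol{\alpha}}_N = 2N^{-1}(\mathbf{D}_N'\tilde{\mathbf{C}}_N\tilde{\mathbf{u}}_N)$. The (properly indexed) matrices $\tilde{\mathbf{C}}_N$, i.e., $\tilde{\mathbf{C}}^{s,s'}_{c,N}$, are given by (22) with $\boldsymbol{\rho}_N$ replaced by $\tilde{\boldsymbol{\rho}}_N$, and the estimates of the disturbances are given by $\tilde{\mathbf{u}}_N(\tilde{\boldsymbol{\delta}}_N) = \mathbf{y}_N - \mathbf{Z}_N\tilde{\boldsymbol{\delta}}_N$. Expression (41) as written holds for random effects estimation of the original model; the modifications for the other estimators are obvious, appropriately replacing $\mathbf{D}_N = -\mathbf{Z}_N$, $\tilde{\mathbf{T}}_N$, and $\tilde{\mathbf{u}}_N$. Of course, analogous definitions apply to $\mathbf{b}_{\mathbf{v},N}$ and $\mathbf{b}_{\boldsymbol{\mu},N}$.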

We next define estimators for the terms in (40), starting with the "homoskedastic" terms, involving only the time-invariant error component $\mu_{i,N}$.

1.2.1. Estimation of "homoskedastic" terms

Consistent estimators of the expressions in (40), associated (only) with the homoskedastic, time-invariant error component $\mu_{i,N}$, are given by

$$\tilde{E}^{*}_{\boldsymbol{\mu},N} = 2\tilde{\sigma}_\mu^4N^{-1}\operatorname{Tr}(\mathbf{A}_{\boldsymbol{\mu},N}\mathbf{B}_{\boldsymbol{\mu},N}), \qquad (42a)$$

$$\tilde{E}^{**}_{\boldsymbol{\mu},N} = \tilde{\sigma}_\mu^2N^{-1}\tilde{\mathbf{a}}_{\boldsymbol{\mu},N}'\tilde{\mathbf{b}}_{\boldsymbol{\mu},N}. \qquad (42b)$$

The consistency proofs for the estimators defined in (42) are easily seen to be special cases of those for the heteroskedastic terms considered in the next section and are thus omitted for the sake of brevity.

1.2.2. Estimation of “heteroskedastic” terms

Consider first *,

,NvE as defined in (40). Its estimation is simplified by the fact that the matrices

N,vA and

N,vB and thus the elements Njsitc ,, are time-invariant, i.e., NjiNjsit cc ,,,, . As shown

in Lemma C.5 in Appendix C, a consistent estimate of *,

,NvE is given by

*,

,

~NvE

N

i

N

j

T

t

T

s

NjsNitNji vvcNT

T

1 1 1 1

2

,

2

,,,2

2

~~1

)1(. (43a)

Next, consider

NT

n

NnvNnN cN1

4

,,,,

1*,

, 2 vvE . Under normality, and noting that elements Nitc , are

time-invariant, this can also be written as weighted sum of fourth moments as

N

i

T

t

NitvNiN cN1 1

)4(

,,,,

1*,

,3

2vvE , which can be estimated consistently using

]~

)1(

~

)1([

3

2~

11

01

11

0*,

, NNNmk

mk

mk

kaE

v , where (43b)

N

i

T

t

NitNiN vcN 1 1

4

,,~1~ ,

N

i

T

t

T

trr

NirNitNiN vT

vcN 1 1 1

2

,

2

,,~

1

1~1~a ,

932 23

3

0

TTT

Tm ,

932

)32(231

TTT

Tm ,

364

)1(23

2

0

TTT

TTk ,

364

)96)(1(231

TTT

TTk .

The derivation, using a bias correction in the spirit of Stock and Watson (2008), and the proof of consistency are given in Lemma C.6a in Appendix C.

In light of the previous results, estimation of *

,NvμE is straightforward; exploiting the fact that

the weights matrices are time-invariant, a consistent estimate is given by

)~

(1

1~4~

,,,,

2*

, NNNN TrT

T

Nμvμvvμ BΣA

E , (44a)

where ])~[(~ 2

,1 Nn

NT

nN vdiag Σ .

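Finally, an estimate of $E^{**}_{\mathbf{v},N}$ is given by

$$\tilde{E}^{**}_{\mathbf{v},N} = \frac{T}{N(T-2)}\tilde{\mathbf{a}}_{\mathbf{v},N}'\tilde{\boldsymbol{\Sigma}}^{HR}_N\tilde{\mathbf{b}}_{\mathbf{v},N}. \qquad (44b)$$

That $\tilde{E}^{**}_{\mathbf{v},N} - E^{**}_{\mathbf{v},N} = o_p(1)$ follows from Lemmata C.2 and C.3 and Remark C.1 thereafter in Appendix C.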

We summarize the results of section 1.2 with the following theorem.

Theorem 4b. Consistency of $\tilde{\boldsymbol{\Psi}}_N$

Suppose all of the assumptions of Theorem 4a and Assumption 7 hold and that $\mathbf{v}_N$ and $\boldsymbol{\mu}_N$ are normally distributed. Let the elements of $\tilde{\boldsymbol{\Psi}}_N$ be defined as above (from (39) to (44)). Then, $\tilde{\boldsymbol{\Psi}}_N - \boldsymbol{\Psi}_N = o_p(1)$ and $\tilde{\boldsymbol{\Psi}}^{-1}_N - \boldsymbol{\Psi}^{-1}_N = o_p(1)$.

Remark 4: Under non-normality, Theorem 4b holds under additional assumptions regarding the moments of $\mathbf{v}_N$ and $\boldsymbol{\mu}_N$ and with augmented definitions of the elements of $\boldsymbol{\Psi}_N$ and $\tilde{\boldsymbol{\Psi}}_N$; details are given in the Appendix.

1.3 Estimation of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N}$

It remains to provide an estimate of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N}$, which is required for tests of joint hypotheses concerning the regression parameters $\boldsymbol{\delta}_N$ and the parameters associated with the spatial autoregressive disturbance process $\boldsymbol{\theta}_N$.

As evident from the results in section 1.2, the assumptions maintained in Theorem 4b are sufficient to prove that the following expressions consistently estimate the columns of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N}$ as defined in light of (32c), provided that $\mathbf{v}_N$ and $\boldsymbol{\mu}_N$ are normally distributed:

Nssc ),,,(,.,~

Δθψ μ

Δθ

v

Δθ ψψ NsscNssc ),,,(,.,),,,(,.,~~

, 4,...,1c , Sss ,...,1, , with (45)

v

Δθψ Nssc ),,,(,.,~

)~~(

)2(

,

,,,

2/1ss

Nc

HR

NNTN

T

vv aΣF , and

μ

Δθψ Nssc ),,,(,.,~

)~(~11 ,

,,,

2

2/1

ss

NcNTN

μμ aF .

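Theorem 4c. Consistency of $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\theta},N}$

Suppose the assumptions of Theorem 4b hold and let (the columns of) $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\theta},N}$ be defined by (45). Then, we have $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N} = O(1)$, $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\theta},N} - \boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N} = o_p(1)$, and $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\theta},N} = O_p(1)$.

Remark 5: Under non-normality, Theorem 4c holds under additional assumptions and with augmented definitions of the columns of $\boldsymbol{\Psi}_{\boldsymbol{\Delta\theta},N}$ and $\tilde{\boldsymbol{\Psi}}_{\boldsymbol{\Delta\theta},N}$; details are given in the appendix.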

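2. Estimation of $\boldsymbol{\Omega}_{\mathbf{w},N}$

The estimate of $\mathbf{J}_N$ is given by

$$\tilde{\mathbf{J}}_N = \tilde{\boldsymbol{\Gamma}}_N\tilde{\mathbf{B}}_N. \qquad (46)$$

The elements of $\tilde{\boldsymbol{\Gamma}}_N$ are defined in (17) with the expectations operator suppressed and the disturbances $\mathbf{u}_N$ replaced by their estimated counterparts. For simplicity of notation, the estimated disturbances are denoted as $\tilde{\mathbf{u}}_N$ throughout, though it should be clear that they are generated by the respective estimators $\tilde{\boldsymbol{\delta}}_N$, $\underline{\tilde{\boldsymbol{\delta}}}_N$, $\hat{\tilde{\boldsymbol{\delta}}}^*_N$, or $\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N$ defined above. For example, under fixed effects (feasible) spatial generalized LS estimation, we have $\tilde{\mathbf{u}}_N = \mathbf{y}_N - \mathbf{Z}_N\hat{\tilde{\underline{\boldsymbol{\delta}}}}^*_N$. The matrix $\tilde{\mathbf{B}}_N$ is given by (20) with $\rho_{s,N}$ replaced by the GM estimates $\tilde{\rho}_{s,N}$, $s = 1,\dots,S$.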

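Theorem 5. Consistency of $\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N}$

Suppose that Assumptions 1-7 hold. Let $\tilde{\boldsymbol{\Psi}}_{\mathbf{w},N}$ be defined as above (from (39) to (45)). Define

$$\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N} = \begin{pmatrix} T^{-1/2}\tilde{\mathbf{P}}_N' & \mathbf{0} \\ \mathbf{0} & (\tilde{\mathbf{J}}_N'\tilde{\boldsymbol{\Theta}}_N\tilde{\mathbf{J}}_N)^{-1}\tilde{\mathbf{J}}_N'\tilde{\boldsymbol{\Theta}}_N \end{pmatrix} \tilde{\boldsymbol{\Psi}}_{\mathbf{w},N} \begin{pmatrix} T^{-1/2}\tilde{\mathbf{P}}_N & \mathbf{0} \\ \mathbf{0} & \tilde{\boldsymbol{\Theta}}_N\tilde{\mathbf{J}}_N(\tilde{\mathbf{J}}_N'\tilde{\boldsymbol{\Theta}}_N\tilde{\mathbf{J}}_N)^{-1} \end{pmatrix}.$$

It follows that $\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N} - \boldsymbol{\Omega}_{\mathbf{w},N} = o_p(1)$, $\boldsymbol{\Omega}_{\mathbf{w},N} = O(1)$, and $\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N} = O_p(1)$.

Proof.

Above we showed that $\tilde{\boldsymbol{\Psi}}_{\mathbf{w},N} - \boldsymbol{\Psi}_{\mathbf{w},N} = o_p(1)$. By assumption, $\tilde{\mathbf{P}}_N - \mathbf{P}_N = o_p(1)$, $\mathbf{P}_N = O(1)$, and $\tilde{\mathbf{P}}_N = O_p(1)$ as well as $\tilde{\boldsymbol{\Theta}}_N - \boldsymbol{\Theta}_N = o_p(1)$, $\boldsymbol{\Theta}_N = O(1)$ and $\tilde{\boldsymbol{\Theta}}_N = O_p(1)$. In the proof of Theorem 2 it was shown that $\tilde{\mathbf{J}}_N - \mathbf{J}_N = o_p(1)$, $\mathbf{J}_N = O(1)$, and $\tilde{\mathbf{J}}_N = O_p(1)$, and that $(\tilde{\mathbf{J}}_N'\tilde{\boldsymbol{\Theta}}_N\tilde{\mathbf{J}}_N)^{-1} - (\mathbf{J}_N'\boldsymbol{\Theta}_N\mathbf{J}_N)^{-1} = o_p(1)$, $(\mathbf{J}_N'\boldsymbol{\Theta}_N\mathbf{J}_N)^{-1} = O(1)$, and $(\tilde{\mathbf{J}}_N'\tilde{\boldsymbol{\Theta}}_N\tilde{\mathbf{J}}_N)^{-1} = O_p(1)$. It now follows that $\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N} - \boldsymbol{\Omega}_{\mathbf{w},N} = o_p(1)$ and $\boldsymbol{\Omega}_{\mathbf{w},N} = O(1)$ and thus $\tilde{\boldsymbol{\Omega}}_{\mathbf{w},N} = O_p(1)$.

Remark 5: Under non-normality, Theorem 5 holds under additional assumptions and with augmented definitions of $\boldsymbol{\Psi}_{\mathbf{w},N}$ and $\tilde{\boldsymbol{\Psi}}_{\mathbf{w},N}$; details are given in the appendix.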

VI. Random vs. Fixed Effects. A Heteroskedasticity-Robust Hausman Test

In the following we derive a Hausman-type test of the spatial random effects versus the spatial fixed effects model under heteroskedasticity of unknown form. Both estimators considered are based on the spatial GLS transformed model (which removes the cross-sectional interdependence) and use a heteroskedasticity-robust variance-covariance matrix for inference. In general, neither of these two estimators will be efficient, so we use a generalized Hausman test for inference (see Weesie, 1999; Creel, 2004).

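Consider the stacked vector of random and fixed effects estimates of the regression parameters, which is given by

$$\mathbf{d}_N = \begin{pmatrix} N^{1/2}\boldsymbol{\Delta}^*_N \\ N^{1/2}\underline{\boldsymbol{\Delta}}^*_N \end{pmatrix} = \begin{pmatrix} N^{1/2}(\hat{\boldsymbol{\delta}}^*_N - \boldsymbol{\delta}_N) \\ N^{1/2}(\hat{\underline{\boldsymbol{\delta}}}^*_N - \underline{\boldsymbol{\delta}}_N) \end{pmatrix}. \qquad (47)$$

By Theorem A.1 in Kelejian and Prucha (2010)

$$\mathbf{d}_N \overset{d}{\sim} N(\mathbf{0}, \boldsymbol{\Omega}_N), \quad \boldsymbol{\Omega}_N = \begin{pmatrix} \boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N} & \boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N\hat{\underline{\boldsymbol{\delta}}}^*_N} \\ \boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N\hat{\underline{\boldsymbol{\delta}}}^*_N}' & \boldsymbol{\Omega}_{\hat{\underline{\boldsymbol{\delta}}}^*_N} \end{pmatrix}. \qquad (48)$$

As evident from (48), $\boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N} = T^{-1}\mathbf{P}^{*\prime}_N\boldsymbol{\Psi}^*_{\boldsymbol{\Delta\Delta},N}\mathbf{P}^*_N$ and $\boldsymbol{\Omega}_{\hat{\underline{\boldsymbol{\delta}}}^*_N} = T^{-1}\underline{\mathbf{P}}^{*\prime}_N\underline{\boldsymbol{\Psi}}^*_{\boldsymbol{\Delta\Delta},N}\underline{\mathbf{P}}^*_N$. The off-diagonal block of $\boldsymbol{\Omega}_N$ is given by

$$\boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N\hat{\underline{\boldsymbol{\delta}}}^*_N} = \frac{1}{T}\mathbf{P}^{*\prime}_N\Big[\frac{1}{NT}\mathbf{F}^{*\prime}_{\mathbf{v},N}\boldsymbol{\Sigma}_N\underline{\mathbf{F}}^*_{\mathbf{v},N}\Big]\underline{\mathbf{P}}^*_N, \qquad (49a)$$

which can be estimated consistently, by the same logic as $\boldsymbol{\Omega}_{\hat{\boldsymbol{\delta}}^*_N}$ and $\boldsymbol{\Omega}_{\hat{\underline{\boldsymbol{\delta}}}^*_N}$, using

$$\tilde{\boldsymbol{\Omega}}_{\hat{\boldsymbol{\delta}}^*_N\hat{\underline{\boldsymbol{\delta}}}^*_N} = \frac{1}{T}\tilde{\mathbf{P}}^{*\prime}_N\Big[\frac{1}{N(T-2)}\tilde{\mathbf{F}}^{*\prime}_{\mathbf{v},N}\tilde{\boldsymbol{\Sigma}}^{HR}_N\underline{\tilde{\mathbf{F}}}^*_{\mathbf{v},N}\Big]\underline{\tilde{\mathbf{P}}}^*_N. \qquad (49b)$$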

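The Hausman test, which is derived under the null hypothesis that the random effects model as specified in section II is the true model, takes the form of a Wald-type test of the restriction that $\hat{\boldsymbol{\delta}}^*_N = \hat{\underline{\boldsymbol{\delta}}}^*_N$. Define the discrepancy vector $\hat{\mathbf{m}}_N = \mathbf{R}\hat{\mathbf{q}}_N$, where $\hat{\mathbf{q}}_N = (\hat{\boldsymbol{\delta}}^{*\prime}_N, \hat{\underline{\boldsymbol{\delta}}}^{*\prime}_N)'$. Note that typically, the dimension of the parameter vector under random effects exceeds that of the parameter vector under fixed effects by 1 due to the inclusion of a constant. Hence, for comparison of the two estimators, we focus on a joint test regarding the slope parameters, i.e., we test

$$H_0: \mathbf{R}\mathbf{q}_N = \mathbf{0} \quad\text{against}\quad H_1: \mathbf{R}\mathbf{q}_N \neq \mathbf{0}, \qquad (50)$$

where $\mathbf{R} = (\mathbf{0}_{P\times 1}, \mathbf{I}_P, -\mathbf{I}_P)$, assuming that the constant appears in the first row of the random effects estimator $\hat{\boldsymbol{\delta}}^*_N$. We use a generalized Wald-type test (e.g., Greene, 2003, pp. 95, 487), which takes the form¹⁹

$$\hat{\mathbf{m}}_N'(\mathbf{R}\tilde{\mathbf{Q}}_N\mathbf{R}')^{-1}\hat{\mathbf{m}}_N \sim \chi^2(P), \qquad (51)$$

where $\tilde{\mathbf{Q}}_N = N^{-1}\tilde{\boldsymbol{\Omega}}_N$, with $\tilde{\boldsymbol{\Omega}}_N$ the estimate of the joint variance-covariance matrix in (48), and $P$ is the number of restrictions, which is equal to the number of slope parameters in the present case.²⁰

19 If one of the estimators is efficient, the off-diagonal blocks are equal to zero and equation (51) reduces to the standard Hausman test.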
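The Wald-type statistic in (51) can be sketched as follows. This is only an illustration under stated assumptions: the function name `generalized_hausman` is hypothetical, its covariance arguments are taken to be already on the scale of the estimators themselves (i.e., they play the role of the blocks of $\tilde{\mathbf{Q}}_N$), and the numbers in the example are made up so that the result can be verified by hand.

```python
import numpy as np


def generalized_hausman(delta_re, delta_fe, V_re, V_fe, C_re_fe):
    """Wald statistic m' (R V R')^{-1} m for H0: RE slopes equal FE slopes.

    delta_re : (P + 1,) random effects estimates, constant in first position
    delta_fe : (P,)     fixed effects estimates
    V_re, V_fe : covariance matrices of the two estimators
    C_re_fe : (P + 1, P) covariance between the two estimators
    """
    P = delta_fe.shape[0]
    # Selection matrix R = (0, I_P, -I_P): drops the constant, differences the slopes.
    R = np.hstack([np.zeros((P, 1)), np.eye(P), -np.eye(P)])
    q = np.concatenate([delta_re, delta_fe])
    V = np.block([[V_re, C_re_fe], [C_re_fe.T, V_fe]])
    m = R @ q
    stat = float(m @ np.linalg.solve(R @ V @ R.T, m))
    return stat, P


# Hand-checkable example: Var(m) = 0.02 + 0.05 - 2 * 0.01 = 0.05 on the diagonal.
delta_re = np.array([1.0, 0.5, 0.3])
delta_fe = np.array([0.4, 0.5])
V_re = 0.02 * np.eye(3)
V_fe = 0.05 * np.eye(2)
C_re_fe = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01]])
stat, dof = generalized_hausman(delta_re, delta_fe, V_re, V_fe, C_re_fe)
```

The statistic would be compared with a critical value of the chi-square distribution with `dof` degrees of freedom. Keeping the off-diagonal block `C_re_fe` is what distinguishes the generalized test from the standard Hausman form.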

VII. Some Monte Carlo Evidence

In the following we provide some limited Monte Carlo evidence on the performance of the estimation procedure suggested in the present paper. A comprehensive assessment, using a broad range of parameter constellations, alternative distributional assumptions, and alternative specifications of the weights matrices is beyond the scope of the present paper and left for future research. We consider a SARAR(2,2) specification with two explanatory variables, assuming that $\mathbf{W}_N = \mathbf{M}_N$:²¹

$$\mathbf{y} = \mathbf{x}_1\beta_1 + \mathbf{x}_2\beta_2 + \sum_{r=1}^{2}\lambda_r(\mathbf{I}_T \otimes \mathbf{W}_r)\mathbf{y} + \mathbf{u}, \qquad (52a)$$

$$\mathbf{u} = \sum_{s=1}^{2}\rho_s(\mathbf{I}_T \otimes \mathbf{W}_s)\mathbf{u} + \boldsymbol{\varepsilon}. \qquad (52b)$$

We consider three sample sizes: $N = 50$, $N = 100$, and $N = 250$ and assume $T = 5$ throughout. For each Monte Carlo experiment, we consider 1000 draws. The explanatory variables $\mathbf{x}_1$ and $\mathbf{x}_2$ are generated as random draws from a standard normal distribution, scaled with a factor of five, and treated as fixed in repeated samples. The parameters are as follows: $\beta_1 = \beta_2 = 1$, $\lambda_1 = 0.5$, $\lambda_2 = 0.25$, $\rho_1 = 0.4$, and $\rho_2 = 0.2$.

The unnormalized $N \times N$ matrix $\mathbf{W}^0$ consists of two $N \times N$ matrices $\mathbf{W}^0_1$ and $\mathbf{W}^0_2$, where $\mathbf{W}^0 = \mathbf{W}^0_1 + \mathbf{W}^0_2$. The matrices $\mathbf{W}^0_1$ and $\mathbf{W}^0_2$ are specified such that they contain the elements of $\mathbf{W}^0$ for a different band of neighbours each. Otherwise, they have zero elements. In line with Kelejian and Prucha (2010), we choose a design where $\mathbf{W}^0_1$ corresponds to an 'up to 3 ahead and up to 3 behind' specification and $\mathbf{W}^0_2$ corresponds to a '4 to 6 ahead and 4 to 6 behind' specification. The final weights matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ are obtained by individually row-normalizing $\mathbf{W}^0_1$ and $\mathbf{W}^0_2$. As already mentioned, we have $\mathbf{M}_1 = \mathbf{W}_1$ and $\mathbf{M}_2 = \mathbf{W}_2$.
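A minimal sketch of such 'ahead and behind' weights matrices is given below. It assumes that the units are arranged on a circle, so that the neighbours of the first and last units wrap around; the wrap-around and the function name `band_weights` are assumptions of this illustration, not details confirmed by the text.

```python
import numpy as np


def band_weights(N, lo, hi):
    """Row-normalized 'lo to hi ahead and lo to hi behind' weights on a circle."""
    W0 = np.zeros((N, N))
    for i in range(N):
        for k in range(lo, hi + 1):
            W0[i, (i + k) % N] = 1.0  # k positions ahead
            W0[i, (i - k) % N] = 1.0  # k positions behind
    return W0 / W0.sum(axis=1, keepdims=True)


N = 50
W1 = band_weights(N, 1, 3)  # up to 3 ahead and up to 3 behind
W2 = band_weights(N, 4, 6)  # 4 to 6 ahead and 4 to 6 behind
```

With this construction each unit has six neighbours in each band, the two bands do not overlap, and every row of `W1` and `W2` sums to one.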

20 The theory underlying Hausman tests with not fully efficient estimators is derived in White (1982, 1994). In a non-spatial context, such a generalized Hausman test is considered, e.g., in Weesie (1999) or Creel (2004). Sufficient assumptions to ensure well-behaved asymptotic properties in generalized Wald tests are derived and discussed in Andrews (1987) and Vuong (1987).

21 For simplicity of notation, the subscript $N$ is suppressed in the following.

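Regarding the choice of instruments, we include linearly independent terms of up to second order spatial lags of the exogenous variables. In particular, the matrix of untransformed instruments $\mathbf{H}$ contains 12 columns and is given by

$$\mathbf{H} = [\mathbf{X}, (\mathbf{I}_T \otimes \mathbf{W}_1)\mathbf{X}, (\mathbf{I}_T \otimes \mathbf{W}_2)\mathbf{X}, (\mathbf{I}_T \otimes \mathbf{W}_1^2)\mathbf{X}, (\mathbf{I}_T \otimes \mathbf{W}_2^2)\mathbf{X}, (\mathbf{I}_T \otimes \mathbf{W}_1\mathbf{W}_2)\mathbf{X}]. \qquad (53)$$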
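The 12 columns arise from six blocks (the regressors and five spatial lags) of two columns each. A short sketch of the construction follows; the function name `instrument_matrix` and the randomly generated weights are illustrative assumptions, while the scaling of the regressors by five follows the design described above.

```python
import numpy as np


def instrument_matrix(X, W1, W2, T):
    """H = [X, (I_T kron W1)X, (I_T kron W2)X, (I_T kron W1^2)X, (I_T kron W2^2)X, (I_T kron W1 W2)X]."""
    I_T = np.eye(T)
    lag_matrices = [W1, W2, W1 @ W1, W2 @ W2, W1 @ W2]
    return np.hstack([X] + [np.kron(I_T, W) @ X for W in lag_matrices])


rng = np.random.default_rng(3)
N, T = 20, 5


def random_row_normalized(N, rng):
    """Arbitrary row-normalized weights with zero diagonal (for illustration only)."""
    W = rng.random((N, N))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)


W1 = random_row_normalized(N, rng)
W2 = random_row_normalized(N, rng)
X = 5.0 * rng.normal(size=(N * T, 2))  # two regressors, standard normal scaled by five
H = instrument_matrix(X, W1, W2, T)
```

For larger $N$ it would be cheaper to apply each weights matrix period by period instead of forming the Kronecker product explicitly; the explicit form is kept here because it mirrors the notation in (53).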

The elements of the error term ε are specified as itiit vμε , where the idiosyncratic error is

given by itititititit xxv )1.01.05.0(5.0 2

,2

2

,1 . Thereby it and it are draws from a

standard normal distribution and it is a draw from a uniform distribution with support

]5.1 ,5.0[ , which is treated as fixed in repeated samples. Hence, itv exhibits both conditional

and unconditional heteroskedasticity.

The individual effect is specified as iiii wxπxπμ ,22,,11 , where Niw , is a draw from

normal distribution with variance 0.5. We consider two specifications: in the random effects

model we have 021 ππ (and, hence, )()( ,, NiNi wVarμVar ); in the fixed effects model we

have 25.021 ππ (and, hence, )()()( 2

,, iiNi wVarVarμVar πx ).

Results for the estimates of $\rho_1$ and $\rho_2$ are obtained by the GM estimator defined in equation (18), using the optimal weighting matrix under normality $(\tilde{\boldsymbol{\Psi}}_N)^{-1}$. The estimates reported for the regression parameters are FGTSLS estimates as defined in (35) and (36) using the transformed sets of instruments ($\tilde{\mathbf{H}}^*$ and $\underline{\tilde{\mathbf{H}}}^*$, respectively). For each single coefficient, we report the average bias and root mean squared error for each parameter constellation and the rejection rates for the test that the coefficient is equal to the true parameter value. For the random effects models, we also show the results for the Hausman test.
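The three summary measures reported for each coefficient can be computed from the Monte Carlo draws as in the sketch below. The function name `mc_summary`, the two-sided normal critical value for a 5 percent test, and the toy numbers are assumptions made for illustration only.

```python
import numpy as np


def mc_summary(estimates, std_errors, true_value, crit=1.959963984540054):
    """Average bias, RMSE, and rejection rate of H0: coefficient = true value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - true_value
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    t_stats = (estimates - true_value) / std_errors
    rejection_rate = np.mean(np.abs(t_stats) > crit)
    return bias, rmse, rejection_rate


# Toy example with four "draws" of an estimate whose true value is 0.5.
bias, rmse, rej = mc_summary([0.4, 0.6, 0.5, 0.9], [0.1, 0.1, 0.1, 0.1], 0.5)
```

In the toy example only the last draw lies more than 1.96 standard errors from the true value, so one of four tests rejects.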

< Table 1 >

Table 1 reports the results of the Monte Carlo analysis for the three different sample sizes considered, both under the random and fixed effects specification. Given that the natural habitat of GM estimation is large samples, the performance in the smallest sample with $N = 50$ is acceptable. In the random effects (fixed effects) specification, the average bias and RMSE amount to 0.0008385 and 0.0246475 (0.001719 and 0.027935) for the estimates of $\boldsymbol{\lambda} = (\lambda_1, \lambda_2)$ and -0.0096335 and 0.2563835 (-0.0106385 and 1.050696) for the estimates of $\boldsymbol{\rho} = (\rho_1, \rho_2)$. With an average rejection rate of 0.0685 and 0.139 (0.0650 and 0.1225), the performance of the single-hypothesis tests referring to $\boldsymbol{\lambda}$ and $\boldsymbol{\rho}$ is also acceptable. The Hausman test is oversized with a rejection rate of 0.1060.

For moderately sized samples with $N = 250$, the bias has virtually disappeared: in relative terms it amounts to 0.01560 (0.0102) percent for the estimates of $\boldsymbol{\lambda} = (\lambda_1, \lambda_2)$ and to -0.1647 (-0.280) percent for the estimates of $\boldsymbol{\rho} = (\rho_1, \rho_2)$ under random effects (fixed effects). The average RMSE of the estimates of $\boldsymbol{\lambda}$ shrinks to 0.011376 (0.011466), that of the estimates of $\boldsymbol{\rho}$ shrinks to 0.213485 (0.800393) under random effects (fixed effects). The size of the tests improves, but it approaches the nominal size of 5 percent relatively slowly. One reason is that the data for $\mathbf{x}_1$ and $\mathbf{x}_2$ are generated as random draws. A second reason relates to the specific 'ahead-behind' design of the spatial weights matrices, which, together with the properties of the explanatory variables, results in a fairly high correlation between spatial lags of different orders. With explanatory variables as in many empirical applications and less artificial spatial weights matrices, there will be less correlation between the spatial lags of the explanatory variables and spatial lags of different orders, and the size of tests can be expected to approach the nominal size faster than in the chosen design. Regarding the GM estimates of $\boldsymbol{\rho}$, the average size amounts to 0.139 (0.123), that for the FGTSLS estimates of $\boldsymbol{\lambda}$ to 0.0555 (0.0465). The Hausman test performs well: with a rejection rate of 0.056, it has already approached its nominal size.

The final column in Table 1 considers the case with $N = 250$ and where the sum of the parameters of the spatial lag of the dependent variable is closer to 1, i.e., with $\lambda_1 = 0.6$ and $\lambda_2 = 0.35$. As can be seen from the results, the performance in terms of bias and size is comparable with the parameter constellation where the sum of $\lambda_1$ and $\lambda_2$ is smaller in magnitude.

Overall, the Monte Carlo experiments illustrate that the proposed estimators work reasonably well in terms of bias and RMSE, even in very small samples. Regarding the estimates of the variance-covariance matrix of the parameter estimates, in particular those relating to the disturbance process, some care is warranted in the interpretation of the results in small samples, though the tests appear to be conservative in the sense that they under-reject the null and the p-values converge from above for reasons mentioned in the previous paragraph. It should also be emphasized that the results here are based on a correctly specified model with a high signal-to-noise ratio. Hence, apart from a comprehensive Monte Carlo study using alternative distributional assumptions and 'real world' explanatory variables and weights matrices, an interesting extension for future research would be to explore small sample corrections or re-sampling methods for the GM estimators considered in the present paper in order to improve the performance in small samples or in empirical models with poor fit.

VIII. Conclusions

This paper derived a two-step estimation procedure for spatial regressive panel data models

with spatial regressive disturbances of the SARAR(R,S) type under both random and fixed

effects assumptions and allowing for heteroskedasticity of arbitrary form in the idiosyncratic

error terms. The regression model is estimated by two-stage least squares (TSLS) to obtain

consistent estimates of the disturbances, which are then used in the second step to obtain

generalized moments (GM) estimates of the parameters of the spatial regressive disturbance

process.

We provide a detailed study of the asymptotic properties of the proposed two-step TSLS and

GM estimators of the model parameters, prove their consistency and establish asymptotic

normality. Both for the original model and the spatial generalized least squares (GLS)

transformed model, we derive the joint and asymptotic variance-covariance matrix, which is

robust to (cross-sectional interdependence and) heteroskedasticity of unknown form. This

enables robust tests of the general SARAR(R,S) model against restricted alternatives such as

SARAR(0,S) and SARAR(R,0) or SARAR(1,1) with random and fixed effects panel data

models under heteroskedasticity. We also propose a generalized Hausman-type test of the

spatial random versus the spatial fixed effects model.

The framework suggested in the present paper provides a flexible tool for applied

econometric researchers for empirical models with cross-sectional interdependence and

allows one to study the strength and pattern of spatial interdependence more flexibly and under

less restrictive assumptions than existing SARAR(1,1) models assuming homoskedasticity.

Allowing for alternative modes of interdependence and determining the proper pattern of the

interdependence decay function is not only of interest in itself but also a prerequisite for a

correct model specification and valid inference.

Table 1. Monte Carlo Results, 1000 draws

            N = 50                  N = 100                 N = 250                 N = 250
            RE          FE          RE          FE          RE          FE          RE          FE

λ1 = 0.5                                                                            λ1 = 0.6
Bias        0.001082    0.002029    -0.000113   0.000359    0.000237    -0.0000468  0.000161    0.000587
RMSE        0.02395     0.026601    0.018157    0.016493    0.010799    0.011269    0.009209    0.009012
Rej. Rate   0.074       0.068       0.045       0.049       0.046       0.042       0.056       0.048

λ2 = 0.25                                                                           λ2 = 0.35
Bias        0.000595    0.001409    0.000259    0.000791    -0.000120   -0.0000301  -0.000073   -0.000654
RMSE        0.025345    0.029269    0.019095    0.017378    0.011953    0.011663    0.010053    0.009981
Rej. Rate   0.063       0.062       0.05        0.054       0.065       0.051       0.047       0.052

β1 = 1                                                                              β1 = 1
Bias        -0.000313   0.000429    -0.000564   -0.000313   0.000187    -0.000447   0.000286    0.00000197
RMSE        0.017017    0.01953     0.013611    0.012649    0.008170    0.008024    0.008164    0.007508
Rej. Rate   0.049       0.057       0.064       0.064       0.049       0.054       0.061       0.042

β2 = 1                                                                              β2 = 1
Bias        -0.000125   -0.000613   0.000103    -0.000262   0.000005    -0.0000729  -0.000561   0.000248
RMSE        0.018706    0.019158    0.012461    0.011945    0.008016    0.007815    0.00777     0.008334
Rej. Rate   0.057       0.087       0.054       0.047       0.047       0.053       0.048000    0.052000

ρ1 = 0.4                                                                            ρ1 = 0.4
Bias        -0.002348   0.014094    0.000859    -0.00071    0.000162    0.003757    -0.000614   0.004045
RMSE        0.184249    0.954723    0.146618    0.670901    0.121458    0.701775    0.126152    0.747754
Rej. Rate   0.147       0.122       0.126       0.115       0.146       0.131       0.125000    0.133000

ρ2 = 0.2                                                                            ρ2 = 0.2
Bias        -0.016919   -0.035371   -0.004592   -0.010192   -0.000656   -0.005854   -0.005464   -0.011615
RMSE        0.328518    1.146669    0.314663    0.864678    0.305513    0.899012    0.312235    0.945605
Rej. Rate   0.131       0.123       0.131       0.116       0.132       0.115       0.118000    0.128000

Hausman-test
Rej. Rate   0.106                   0.058                   0.056                   0.054
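The three summary statistics reported in Table 1 can be computed from simulation output along the following lines. This is a minimal sketch rather than the authors' code: the function name `mc_summary`, the synthetic draws, and the use of a two-sided test at the 5 percent level (critical value 1.96) for the rejection rate are assumptions made for illustration.

```python
import numpy as np

def mc_summary(estimates, std_errors, true_value, crit=1.96):
    """Bias, RMSE and rejection rate of a two-sided test of H0: parameter = true_value.

    `crit` is an assumed normal critical value (5 percent level).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    errors = estimates - true_value
    bias = errors.mean()
    rmse = np.sqrt(np.mean(errors ** 2))
    rej_rate = np.mean(np.abs(errors / std_errors) > crit)
    return bias, rmse, rej_rate

# Synthetic illustration (hypothetical draws, not the paper's simulation design).
rng = np.random.default_rng(0)
draws = 0.5 + 0.02 * rng.standard_normal(1000)
ses = np.full(1000, 0.02)
bias, rmse, rej = mc_summary(draws, ses, true_value=0.5)
```

With correctly sized standard errors the rejection rate should be close to the nominal level, which is the benchmark against which the "Rej. Rate" rows of the table are read.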

References

Amemiya, T. (1971). The estimation of the variances in a variance-components model.

International Economic Review, 12, 1-13.

Andrews, D. (1987). Asymptotic results for generalized Wald tests. Econometric Theory, 3,

348-358.

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer, Academic

Publishers.

Anselin, L., Bera, A.K., Florax, R. and Yoon, M.J. (1996). Simple diagnostic tests for spatial

dependence. Regional Science and Urban Economics, 26, 77-104.

Arbia, G., Basile, R., and Piras, G. (2005), Using spatial panel data in modeling regional

growth and convergence. ISAE Working Paper no. 55, Rome.

Arraiz, I., Drukker, D.M., Kelejian, H., and Prucha, I. (2010). A spatial Cliff-Ord-type model

with heteroskedastic innovations: Small and large sample results. Journal of Regional

Science, 50(2), 592-614.

Audretsch, D.B. and Feldman, M.P. (1996). R&D spillovers and the geography of

innovation and production. American Economic Review, 86, 630-640.

Badinger, H. and Egger, P. (2008). GM estimation of higher-order spatial autoregressive

processes in cross-section models with heteroskedastic disturbances. CESifo Working

Paper no. 2356, Munich.

Badinger, H. and Egger, P. (2009). Horizontal versus vertical interdependence in

multinational activity. CESifo Working Paper no. 2327, Munich.

Baltagi, B.H. (2005). Econometric Analysis of Panel Data, third edition. Chichester: Wiley.

Baltagi, B.H. (2006). Random effects and spatial autocorrelation with equal weights.

Econometric Theory, 22(5), 973-84.

Baltagi, B.H., Egger, P., and Pfaffermayr, M. (2007). Estimating models of complex FDI: Are

there third-country effects? Journal of Econometrics, 140(1), 260-281.

Baltagi, B.H., Egger, P., and Pfaffermayr, M. (2009). A generalized spatial panel data model

with random effects. Working Paper No. 113, Center for Policy Research, University of

Syracuse.

Baltagi, B.H. and Li, D. (2001). LM test for functional form and spatial error correlation.

International Regional Science Review, 24, 194-225.

Baltagi, B.H., Song, S.H., and Koh, W. (2003). Testing panel data regression models with

spatial error correlation. Journal of Econometrics, 117, 123-150.

Baltagi, B.H., Song, S.H., Jung, B.C., and Koh, W. (2007). Testing for serial correlation,

spatial autocorrelation and random effects using panel data. Journal of Econometrics,

140, 5-51.

Bell, K.P. and Bockstael, N.E. (2000). Applying the generalized-moments estimation

approach to spatial problems involving microlevel data. The Review of Economics and

Statistics, 82(1), 72–82.

Besley, T. and Case, A. (1995). Incumbent behavior: Vote-seeking, tax-setting, and yardstick

competition. American Economic Review, 85, 25-45.

Case, A., Hines Jr., J. and Rosen, H. (1993). Budget spillovers and fiscal policy

independence: Evidence from the States. Journal of Public Economics, 52, 285-307.

Cliff, A. and Ord, J. (1973). Spatial Autocorrelation. London: Pion, 1973.

Cliff, A. and Ord, J. (1981). Spatial Processes, Models and Applications. London: Pion, 1981.

Cohen, J.P. and Morrison Paul, C. (2007). The impacts of transportation infrastructure on

property values: A higher-order spatial econometrics approach. Journal of Regional

Science, 47(3), 457-478.

Cohen, J.P. and Morrison Paul, C.J. (2004). Public infrastructure investment, interstate spatial

spillovers, and manufacturing costs. The Review of Economics and Statistics, 86(2),

551-560.

Conley, T. (1999). GMM estimation with cross sectional dependence. Journal of

Econometrics, 92, 1-45.

Creel, M. (2004). Modified Hausman tests for inefficient estimators. Applied Economics,

36(21), 2373-2376.

Egger, P., Pfaffermayr, M. and Winner, H. (2005). An unbalanced spatial panel data approach

to US state tax competition. Economics Letters, 88, 329-335.

Gilbert, S. (2002). Testing the distribution of error components in panel data models.

Economics Letters, 77, 47-53.

Greene, W.H. (2003). Econometric Analysis, fifth edition. Pearson, Upper Saddle River, New

Jersey.

Holtz-Eakin, D. (1994). Public sector capital and the productivity puzzle. Review of

Economics and Statistics, 76, 12-21.

Horn, R.A. and Johnson, C.R. (1985). Matrix Analysis. Cambridge: Cambridge University

Press, 1985.

Kapoor, M., Kelejian, H.H., and Prucha, I.R. (2007). Panel data models with spatially

correlated error components. Journal of Econometrics, 140, 97-130.

Kelejian, H. and Robinson, D. (1992). Spatial autocorrelation: A new computationally simple

test with an application to per capita county police expenditures. Regional Science and

Urban Economics, 22, 317–331.

Kelejian, H.H. and Prucha, I.R. (1998). A generalized spatial two-stage least squares

procedure for estimating a spatial autoregressive model with autoregressive

disturbances. Journal of Real Estate Finance and Economics, 17, 99-121.

Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the

autoregressive parameter in a spatial model. International Economic Review, 40, 509-533.

Kelejian, H.H. and Prucha, I.R. (2004). Estimation of simultaneous systems of spatially

interrelated cross sectional equations. Journal of Econometrics, 118, 27-50.

Kelejian, H.H. and Prucha, I.R. (2007). HAC Estimation in a Spatial Framework. Journal of

Econometrics, 140(1), 131-154.

Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive

models with autoregressive and heteroskedastic disturbances. Journal of Econometrics,

157(1), 53-67.

Kelejian, H.H., Prucha, I.R. and Yuzefovich, E. (2004). Instrumental variable estimation of a

spatial autoregressive model with autoregressive disturbances: Large and small sample

results. In: LeSage, J. and Pace, K. (eds.), Advances in Econometrics: Spatial and

Spatiotemporal Econometrics. Elsevier, New York, 163-198.

Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics,

95(2), 391–413.

Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for

spatial autoregressive models. Econometrica, 72(6), 1899-1925.

Lee, L.F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed

effects. Journal of Econometrics, 154(2), 165-185.

Lee, L.F. and Liu, X. (2010). Efficient GMM estimation of high order spatial autoregressive

models. Econometric Theory, 26, 187-230.

Lin, X. and L.F. Lee (2010). GMM estimation of spatial autoregressive models with unknown

heteroskedasticity. Journal of Econometrics, 157(1), 34-52.

Mittelhammer, R.C. (1996). Mathematical Statistics for Economics and Business. New York:

Springer.

Mundlak, Y. (1978). On Pooling Time Series and Cross Section Data, Econometrica, 46(1),

69-85.

Mutl, J. and Pfaffermayr, M. (2011). The Hausman Test in a Cliff and Ord Panel Model.

Econometrics Journal. 14, 48-76.

Pinkse, J. and Slade, M.E. (1998). Contracting in space: An application of spatial statistics to

discrete-choice models. Journal of Econometrics, 85, 125-154.

Pinkse, J., Slade, M.E., and Brett, C. (2002). Spatial price competition: A semiparametric

approach. Econometrica, 70, 1111-1153.

Pötscher, B.M. and Prucha, I.R. (1997). Dynamic Nonlinear Econometric Models, Asymptotic

Theory. New York: Springer.

Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd edition. New York:

Wiley.

Resnick, S. (1999). A Probability Path. Boston: Birkhäuser.

Shroder, M. (1995). Games the States don’t play: Welfare benefits and the theory of fiscal

federalism. Review of Economics and Statistics, 77, 183-191.

Stock, J.H. and Watson, M.W. (2008). Heteroskedasticity-robust standard errors for fixed effects panel data

regression. Econometrica, 76(1), 155-174.

Topa, G. (2001). Social interactions, local spillovers and unemployment. Review of Economic

Studies, 68, 261-295.

Van der Vaart, H.R. and Yen, H.E. (1968). Weak sufficient conditions for Fatou’s Lemma

and Lebesgue’s Dominated Convergence Theorem. Mathematics Magazine, 41(3), 109-117.

Vuong, Q.H. (1987). Generalized inverses and asymptotic properties of Wald test. Economics

Letters, 24, 343-347.

Weesie, J. (1999) Seemingly unrelated estimation and the cluster-adjusted sandwich

estimator, Stata Technical Bulletin STB-52.

APPENDIX. Variance-Covariance Matrix Under Non-Normality of Error Components

As already mentioned in the main text, Theorems 4b and 4c as well as Theorem 5 also hold
under non-normality with different definitions of $\boldsymbol{\Psi}_N$ and $\boldsymbol{\Psi}_{\Delta\theta,N}$, respectively. In the
following, we provide the definitions of the respective elements under non-normality and
define consistent estimates for them.

1.1 Distribution of GM Estimates under Non-Normality (Definition of $\boldsymbol{\Psi}_N$)

If we drop the assumption that $\boldsymbol{\mu}_N$ and $\mathbf{v}_N$ are normally distributed, equation (28b) becomes

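$$
\begin{aligned}
\mathrm{E}_{c,c',N}^{s,s';t,t'}
&= N^{-1}\,\mathrm{Cov}\big(q_{c,N}^{*\,s,s'},\; q_{c',N}^{*\,t,t'}\big) \\
&= N^{-1}\,\mathrm{Cov}\big[\boldsymbol{\xi}_N'\mathbf{A}_{c,N}^{s,s'}\boldsymbol{\xi}_N + (\mathbf{a}_{c,N}^{s,s'})'\boldsymbol{\xi}_N,\;\;
\boldsymbol{\xi}_N'\mathbf{A}_{c',N}^{t,t'}\boldsymbol{\xi}_N + (\mathbf{a}_{c',N}^{t,t'})'\boldsymbol{\xi}_N\big] \\
&= N^{-1}\,\mathrm{Cov}\big[\boldsymbol{\xi}_N'\mathbf{A}_{c,N}^{s,s'}\boldsymbol{\xi}_N + (\mathbf{a}_{\mathbf{v},c,N}^{s,s'})'\mathbf{v}_N + (\mathbf{a}_{\boldsymbol{\mu},c,N}^{s,s'})'\boldsymbol{\mu}_N,\;\;
\boldsymbol{\xi}_N'\mathbf{A}_{c',N}^{t,t'}\boldsymbol{\xi}_N + (\mathbf{a}_{\mathbf{v},c',N}^{t,t'})'\mathbf{v}_N + (\mathbf{a}_{\boldsymbol{\mu},c',N}^{t,t'})'\boldsymbol{\mu}_N\big] \\
&= 2N^{-1}\,\mathrm{Tr}\big(\mathbf{A}_{\mathbf{v},c,N}^{s,s'}\boldsymbol{\Sigma}_N\mathbf{A}_{\mathbf{v},c',N}^{t,t'}\boldsymbol{\Sigma}_N
 + 2\sigma_\mu^2\,(\mathbf{A}_{\mathbf{v}\boldsymbol{\mu},c,N}^{s,s'})'\boldsymbol{\Sigma}_N\mathbf{A}_{\mathbf{v}\boldsymbol{\mu},c',N}^{t,t'}
 + \sigma_\mu^4\,\mathbf{A}_{\boldsymbol{\mu},c,N}^{s,s'}\mathbf{A}_{\boldsymbol{\mu},c',N}^{t,t'}\big) \\
&\quad + N^{-1}\big[(\mathbf{a}_{\mathbf{v},c,N}^{s,s'})'\boldsymbol{\Sigma}_N\mathbf{a}_{\mathbf{v},c',N}^{t,t'}
 + \sigma_\mu^2\,(\mathbf{a}_{\boldsymbol{\mu},c,N}^{s,s'})'\mathbf{a}_{\boldsymbol{\mu},c',N}^{t,t'}\big] \\
&\quad + N^{-1}\sum_{n=1}^{NT} a_{\mathbf{v},c,nn,N}^{s,s'}\,a_{\mathbf{v},c',nn,N}^{t,t'}\big(\sigma_{v,n,N}^{(4)} - 3\sigma_{v,n,N}^{4}\big)
 + N^{-1}\sum_{i=1}^{N} a_{\boldsymbol{\mu},c,ii,N}^{s,s'}\,a_{\boldsymbol{\mu},c',ii,N}^{t,t'}\big(\sigma_{\mu}^{(4)} - 3\sigma_{\mu}^{4}\big) \\
&\quad + N^{-1}\sum_{n=1}^{NT}\big(a_{\mathbf{v},c,n,N}^{s,s'}\,a_{\mathbf{v},c',nn,N}^{t,t'} + a_{\mathbf{v},c',n,N}^{t,t'}\,a_{\mathbf{v},c,nn,N}^{s,s'}\big)\sigma_{v,n,N}^{(3)}
 + N^{-1}\sum_{i=1}^{N}\big(a_{\boldsymbol{\mu},c,i,N}^{s,s'}\,a_{\boldsymbol{\mu},c',ii,N}^{t,t'} + a_{\boldsymbol{\mu},c',i,N}^{t,t'}\,a_{\boldsymbol{\mu},c,ii,N}^{s,s'}\big)\sigma_{\mu}^{(3)} .
\end{aligned}
\tag{A.1a}
$$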
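The structure of (A.1a) is that of the covariance between two linear-quadratic forms in independent, zero-mean, possibly non-normal elements: a trace term, a term in the linear coefficients, and corrections involving third and fourth moments. The sketch below checks this generic formula against an exact enumeration for a small example. It is an illustration under stated assumptions, not the paper's estimator: the function names, the two-point distribution, and the treatment of the stacked vector as a single block (rather than the partition into the idiosyncratic and individual components) are choices made here for brevity.

```python
import itertools
import numpy as np

def lq_cov_formula(A, a, B, b, var, m3, m4):
    """Cov(xi'A xi + a'xi, xi'B xi + b'xi) for independent zero-mean xi, symmetric A and B."""
    omega = np.diag(var)
    d_a, d_b = np.diag(A), np.diag(B)
    return (2.0 * np.trace(A @ omega @ B @ omega)      # normal-theory trace term
            + a @ omega @ b                            # linear-linear term
            + np.sum(d_a * d_b * (m4 - 3.0 * var ** 2))  # fourth-moment correction
            + np.sum((a * d_b + b * d_a) * m3))        # third-moment correction

def lq_cov_exact(A, a, B, b, scales):
    """Exact covariance by enumerating a scaled two-point distribution per element."""
    support = np.array([-1.0, 2.0])          # zero mean with probabilities 2/3 and 1/3
    probs = np.array([2.0 / 3.0, 1.0 / 3.0])
    ex = ey = exy = 0.0
    for idx in itertools.product(range(2), repeat=len(scales)):
        idx = list(idx)
        xi = scales * support[idx]
        p = np.prod(probs[idx])
        x = xi @ A @ xi + a @ xi
        y = xi @ B @ xi + b @ xi
        ex, ey, exy = ex + p * x, ey + p * y, exy + p * x * y
    return exy - ex * ey

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric, as the formula requires
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
a, b = rng.standard_normal(n), rng.standard_normal(n)
scales = np.array([0.5, 1.0, 1.5, 2.0])              # heteroskedastic elements
# Moments of the scaled two-point variable: variance 2s^2, third 2s^3, fourth 6s^4.
var, m3, m4 = 2 * scales ** 2, 2 * scales ** 3, 6 * scales ** 4
formula = lq_cov_formula(A, a, B, b, var, m3, m4)
exact = lq_cov_exact(A, a, B, b, scales)
```

Under normality the third moments vanish and the fourth moment equals three times the squared variance, so the last two terms drop out and only the trace and linear terms remain.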

Adopting the notational convention introduced in section V, subsection 1.2, (A.1a) can be

written as

*

,

*

,

*,

, NNNN vμμv EEEE **

,

**

, NN μv EE (A.1b)

***

,

***

, NN μv EE ****

,

****

, NN μv EE ,

i.e., 0*

, NvE , and the additional terms, appearing in the second row of (A.1b), are defined as:

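$$\mathrm{E}^{***}_{\mathbf{v},N} = N^{-1}\sum_{n=1}^{NT} c_{\mathbf{v},n,N}\,\big(\sigma^{(4)}_{v,n,N} - 3\sigma^{4}_{v,n,N}\big), \tag{A.2a}$$

$$\mathrm{E}^{***}_{\boldsymbol{\mu},N} = N^{-1}\sum_{i=1}^{N} c_{\boldsymbol{\mu},i,N}\,\big(\sigma^{(4)}_{\mu} - 3\sigma^{4}_{\mu}\big), \tag{A.2b}$$

$$\mathrm{E}^{****}_{\mathbf{v},N} = N^{-1}\sum_{n=1}^{NT} d_{\mathbf{v},n,N}\,\sigma^{(3)}_{v,n,N}, \tag{A.2c}$$

$$\mathrm{E}^{****}_{\boldsymbol{\mu},N} = N^{-1}\sum_{i=1}^{N} d_{\boldsymbol{\mu},i,N}\,\sigma^{(3)}_{\mu}, \tag{A.2d}$$

where $\sigma^{(3)}_{\mu}$ and $\sigma^{(4)}_{\mu}$ ($\sigma^{(3)}_{v}$ and $\sigma^{(4)}_{v}$) denote the third and fourth moments of $\boldsymbol{\mu}_N$ ($\mathbf{v}_N$),
respectively.22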

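As shown in Lemma C.4 of Appendix C, the third and fourth moments of $\mu_{i,N}$, denoted as
$\sigma^{(3)}_{\mu}$ and $\sigma^{(4)}_{\mu}$, can be estimated consistently using

$$\tilde{\sigma}^{(3)}_{\mu,N} = \frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{s=1}^{T}\sum_{\substack{t=1\\ t\neq s}}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}^{\,2}_{it,N}, \quad\text{and} \tag{A.3a}$$

$$
\begin{aligned}
\tilde{\sigma}^{(4)}_{\mu,N} &= \frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{s=1}^{T}\sum_{\substack{t=1\\ t\neq s}}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}^{\,3}_{it,N} \\
&\quad - \frac{3}{NT(T-1)}\sum_{i=1}^{N}\sum_{s=1}^{T}\sum_{\substack{t=1\\ t\neq s}}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N}
\left(\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\tilde{\varepsilon}^{\,2}_{it,N} - \frac{1}{NT(T-1)}\sum_{i=1}^{N}\sum_{s=1}^{T}\sum_{\substack{t=1\\ t\neq s}}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N}\right),
\end{aligned}
\tag{A.3b}
$$

where $\tilde{\boldsymbol{\varepsilon}}_N = \big[\mathbf{I}_T \otimes \big(\mathbf{I}_N - \sum_{m=1}^{S}\tilde{\rho}_{m,N}\mathbf{M}_{m,N}\big)\big]\tilde{\mathbf{u}}_N$.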
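The moment estimators in (A.3a) and (A.3b) only use cross-period products of residuals within each unit, so they can be computed from row sums of powers. The following is a minimal sketch under assumptions: the function name `mu_moment_estimates` and the simulated residuals are hypothetical, the residuals are taken as given (that is, already spatially filtered), and the individual effect is drawn from a two-point distribution with known third moment 2 and fourth moment 6 so that the output can be compared with the truth.

```python
import numpy as np

def mu_moment_estimates(eps):
    """Third and fourth moment estimates of the individual effect from an N x T residual array."""
    eps = np.asarray(eps, dtype=float)
    n, t = eps.shape
    denom = n * t * (t - 1)
    s1 = eps.sum(axis=1)
    s2 = (eps ** 2).sum(axis=1)
    s3 = (eps ** 3).sum(axis=1)
    s4 = (eps ** 4).sum(axis=1)
    # Sums over s != t are obtained as (full double sum) minus (diagonal terms).
    cross11 = np.sum(s1 * s1 - s2) / denom   # average of eps_is * eps_it
    cross12 = np.sum(s1 * s2 - s3) / denom   # average of eps_is * eps_it^2
    cross13 = np.sum(s1 * s3 - s4) / denom   # average of eps_is * eps_it^3
    total_var = s2.sum() / (n * t)           # average of eps_it^2
    mu3 = cross12
    mu4 = cross13 - 3.0 * cross11 * (total_var - cross11)
    return mu3, mu4

# Hypothetical data: skewed two-point individual effect plus normal idiosyncratic noise.
rng = np.random.default_rng(1)
n_units, n_periods = 100_000, 3
mu = np.where(rng.random(n_units) < 1.0 / 3.0, 2.0, -1.0)   # mean 0, third moment 2, fourth 6
v = 0.5 * rng.standard_normal((n_units, n_periods))
mu3_hat, mu4_hat = mu_moment_estimates(mu[:, None] + v)
```

The correction term in the fourth-moment estimate removes the contribution of the product of the two variance components, which is why the average squared residual and the average cross-period product both appear in it.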

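Hence, consistent estimators of the expressions in (A.1), associated (only) with the
homoskedastic, time-invariant error component $\mu_{i,N}$, are given by

$$\tilde{\mathrm{E}}^{***}_{\boldsymbol{\mu},N} = N^{-1}\sum_{i=1}^{N} c_{\boldsymbol{\mu},i,N}\,\big(\tilde{\sigma}^{(4)}_{\mu} - 3\tilde{\sigma}^{4}_{\mu}\big), \tag{A.4a}$$

$$\tilde{\mathrm{E}}^{****}_{\boldsymbol{\mu},N} = N^{-1}\sum_{i=1}^{N} \tilde{d}_{\boldsymbol{\mu},i,N}\,\tilde{\sigma}^{(3)}_{\mu}
\quad\text{with}\quad \tilde{d}_{\boldsymbol{\mu},n,N} = \big(\tilde{a}_{\boldsymbol{\mu},n,N}\,b_{\boldsymbol{\mu},nn,N} + a_{\boldsymbol{\mu},nn,N}\,\tilde{b}_{\boldsymbol{\mu},n,N}\big). \tag{A.4b}$$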

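Next turn to $\mathrm{E}^{***}_{\mathbf{v},N}$, which we rewrite as
$\mathrm{E}^{***}_{\mathbf{v},N} = \mathrm{E}^{***}_{\mathbf{v},1,N} + \mathrm{E}^{***}_{\mathbf{v},2,N}$ with

$$\mathrm{E}^{***}_{\mathbf{v},1,N} = N^{-1}\sum_{n=1}^{NT} c_{\mathbf{v},n,N}\,\sigma^{(4)}_{v,n,N}
\quad\text{and}\quad
\mathrm{E}^{***}_{\mathbf{v},2,N} = -3N^{-1}\sum_{n=1}^{NT} c_{\mathbf{v},n,N}\,\sigma^{4}_{v,n,N}. \tag{A.5a}$$

We first consider $\mathrm{E}^{***}_{\mathbf{v},1,N}$ and note that the elements $c_{\mathbf{v},n,N}$ are time-invariant. By Lemma C.6a,
a consistent estimator of $\mathrm{E}^{***}_{\mathbf{v},1,N}$ is given by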

22 For the elements where both $c = 2$ (or $c = 4$) and $c' = 2$ (or $c' = 4$), the terms involving
the third and fourth moments drop out.

N

i

T

t

T

trr

NirNitNi

N

i

T

t

NitNiN vT

vT

cNmk

mkvc

NTmk

k

1 1 1

2

,

2

,,,

11

01

1 1

4

,,,

11

0***

,1~

1

1~11

1

~1

1

~vvvE , (A.5a)

where

932 23

3

0

TTT

Tm ,

932

32231

TTT

Tm ,

)364(

)1(23

2

0

TTT

TTk ,

)364(

)96)(1(231

TTT

TTk .

Next consider $\mathrm{E}^{***}_{\mathbf{v},2,N}$, which involves a weighted sum of the squared variances. Without
distributional assumptions and with heteroskedasticity of unknown form over both cross-sections and time,
it is not possible to obtain an estimate of (a weighted sum of) the squared variances. (Using
the fourth power of the residuals estimates a weighted sum of the fourth moments.) Hence, an
approximation is required, assuming that the idiosyncratic error components are
heteroskedastic only over cross-sections, but not over time, i.e., $v_{it,N} \sim \text{i.d.}(0, \sigma_i^2)$. Under that
assumption, the following expression consistently estimates $\mathrm{E}^{***}_{\mathbf{v},2,N}$, as shown in Lemma C.6b
in Appendix C:

***

,2

~NvE

N

i

T

t

NitNi

N

i

T

t

T

trr

NirNitNi vcNTkm

kmv

Tv

Tc

Nkm

m

1 1

4

,,,

11

01

1 1 1

2

,

2

,,,

11

0 ~1

1

~

1

1~11

1vv

, (A.5b)

where $m_0$, $m_1$, $k_0$, $k_1$ are defined as above.

Finally, a consistent estimate of ****

,NvE is given by

NT

n

HR

NnNnN vdTN

T

1

3

,,,

****

, )~(~

)1(

~vvE with )

~~(~

,,,,,,,,,, NnNnnNnnNnNn babad vvvvv , (A.6)

where ]~

)1(

1~[33

)1()~(

1

3

,3

3

,2

3

,

T

r

NirNit

HR

Nit vT

vTT

TTv . The consistency of

****

,

~NvE follows from

Lemma C.7 and Remark C.3 thereafter in Appendix C.

1.2 Joint Distribution of Regression Parameters and GM Estimates under Non-normality (Definition of $\boldsymbol{\Psi}_{\Delta\theta,N}$)

Under non-normality, equation (32c) becomes augmented by terms involving the third

moments of the error components as follows:

Nssc ),,,(,., Δθψ μ

Δθ

v

Δθ ψψ NsscNssc ),,,(,.,),,,(,., , 4,...,1c , Sss ,...,1, , with (A.7a)

v

Δθψ Nssc ),,,(,., )]([11 ,

,,

)3(

,2/1 ,,,

ss

NcNNN ssNcTN

vAv aΣκΣF

v

, and

μ

Δθψ Nssc ),,,(,., )([11 ,

,,

2)3(

,2/1 ,,,

ss

NcN ssNcTN

μAμ aκF

μ ,

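where $\boldsymbol{\Sigma}^{(3)}_N = \mathrm{diag}(\sigma^{(3)}_{v,n})$ is an $NT \times NT$ diagonal matrix with third moments $\sigma^{(3)}_{v,n}$,
$n = 1,\dots,NT$; $\boldsymbol{\kappa}_{\mathbf{A}^{s,s'}_{\mathbf{v},c,N}}$ is an $NT \times 1$ vector with the main diagonal elements of $\mathbf{A}^{s,s'}_{\mathbf{v},c,N}$; and
$\boldsymbol{\kappa}_{\mathbf{A}^{s,s'}_{\boldsymbol{\mu},c,N}}$ is an $N \times 1$ vector with the main diagonal elements of $\mathbf{A}^{s,s'}_{\boldsymbol{\mu},c,N}$.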

In light of (32c) and the results of section 1.2, the assumptions maintained in Theorem 4b are
sufficient to prove that the following expressions consistently estimate the elements of $\boldsymbol{\Psi}_{\Delta\theta,N}$:

Nssc ),,,(,.,~

Δθψ μ

Δθ

v

Δθ ψψ NsscNssc ),,,(,.,),,,(,.,~~

, 4,...,1c , Sss ,...,1, , with (A.7b)

v

Δθψ Nssc ),,,(,., )]~~~([

)1(

1 ,

,,

)3(,

,

2/1

,,,

ss

Nc

HR

N

HR

NN ssNcT

T

N

vAv aΣκΣF

v

, and

μ

Δθψ Nssc ),,,(,.,~

)~~~([11 ,

,,

2)3(

,2/1 ,,,

ss

NcN ssNcTN

μAμ aκF

μ .

TECHNICAL APPENDIX

APPENDIX A

Notation

We adopt the standard convention of denoting matrices and vectors by boldface symbols.
Let $\mathbf{A}_N$ denote some matrix. Its elements are referred to as $a_{ij,N}$; $\mathbf{a}_{i.,N}$ and $\mathbf{a}_{.i,N}$ denote the
i-th row and the i-th column of $\mathbf{A}_N$, respectively. If $\mathbf{A}_N$ is a square matrix, $\mathbf{A}_N^{-1}$ denotes its
inverse; if $\mathbf{A}_N$ is singular, $\mathbf{A}_N^{+}$ denotes its generalized inverse. The (submultiplicative)
matrix norm $\|\cdot\|$ is defined as $\|\mathbf{A}_N\| = [\mathrm{Tr}(\mathbf{A}_N'\mathbf{A}_N)]^{1/2}$. In several places, we use single
indexation, e.g., $n = 1,\dots,NT$, to denote elements of vectors or matrices that are stacked
over time periods.23
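The single-index convention stacks the N cross-sectional units period by period, so that position n runs through all units of period 1 first, then period 2, and so on. A small sketch of the mapping and its inverse (function names are hypothetical, indices are 1-based to match the text):

```python
def stacked_index(i, t, n_units):
    """1-based position n of unit i in period t when periods are stacked one after another."""
    return (t - 1) * n_units + i

def unstack_index(n, n_units):
    """Inverse map: recover (i, t) from the stacked position n."""
    t_minus_1, i_minus_1 = divmod(n - 1, n_units)
    return i_minus_1 + 1, t_minus_1 + 1
```

For N = 5, for example, positions 1 to 5 belong to period 1 and position 6 is unit 1 in period 2.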

Remark A.1

i) Definition of row and column sum boundedness (Kapoor, Kelejian, and Prucha, 2007, p.
99): Let $\mathbf{A}_N$, $N \ge 1$, be some sequence of $NT \times NT$ matrices with $T$ some fixed positive
integer. We will then say that the row and column sums of the (sequence of) matrices $\mathbf{A}_N$ are
bounded uniformly in absolute value, if there exists a constant $c < \infty$, which does not depend
on $N$, such that

$$\max_{1 \le n \le NT} \sum_{j=1}^{NT} |a_{nj,N}| \le c \quad\text{and}\quad \max_{1 \le j \le NT} \sum_{n=1}^{NT} |a_{nj,N}| \le c \quad\text{for all } N \ge 1.$$

ii) Let $\mathbf{A}_N$ be a (sequence of) $N \times N$ matrices whose row and column sums are bounded
uniformly in absolute value, and let $\mathbf{S}$ be some $T \times T$ matrix (with $T \ge 1$ fixed). Then the
row and column sums of the matrix $\mathbf{S} \otimes \mathbf{A}_N$ are bounded uniformly in absolute value
(compare Kapoor, Kelejian, and Prucha, 2007, p. 118).

iii) If $\mathbf{A}_N$ and $\mathbf{B}_N$ are (sequences of) $NT \times NT$ matrices (with $T \ge 1$ fixed), whose row and
column sums are bounded uniformly in absolute value (by $c_A$ and $c_B$), then so are the row
and column sums of $\mathbf{A}_N\mathbf{B}_N$ and $\mathbf{A}_N + \mathbf{B}_N$ (by $c_A c_B$ and $c_A + c_B$). If $\mathbf{Z}_N$ is a (sequence of)
$NT \times P$ matrices whose elements are bounded uniformly in absolute value, then so are the
elements of $\mathbf{A}_N\mathbf{Z}_N$ and $(NT)^{-1}\mathbf{Z}_N'\mathbf{A}_N\mathbf{Z}_N$. Of course, this also covers the case $(NT)^{-1}\mathbf{Z}_N'\mathbf{Z}_N$
for $\mathbf{A}_N = \mathbf{I}_{NT}$ (compare Kapoor, Kelejian, and Prucha, 2007, p. 119).

23 Take the vector $\mathbf{v}_N = (\mathbf{v}_{1,N}',\dots,\mathbf{v}_{T,N}')'$, for example. Using indexation $n = 1,\dots,NT$, the
elements $v_{n,N}$, $n = 1,\dots,N$, refer to period $t = 1$, elements $v_{n,N}$, $n = N+1,\dots,2N$, refer to $t = 2$,
etc., and elements $v_{n,N}$, $n = (T-1)N+1,\dots,NT$, refer to period $t = T$.

iv) Suppose that the row and column sums of the $NT \times NT$ matrices $\mathbf{A}_N = (a_{ij,N})$ are
bounded uniformly in absolute value by some finite constant $c_A$; then $\sum_{n=1}^{NT} |a_{nj,N}|^q \le c_A^q$ for
$q > 1$ (see Kelejian and Prucha, 2009, Remark C.1).

v) Let $\boldsymbol{\xi}_N$ and $\boldsymbol{\eta}_N$ be $NT \times 1$ random vectors (with $T \ge 1$ fixed), where, for each $N$, the
elements are independently distributed with zero mean and finite variances. Then the elements
of $(NT)^{-1/2}\mathbf{Z}_N'\boldsymbol{\xi}_N$ are $O_p(1)$ and $(NT)^{-1}\boldsymbol{\xi}_N'\mathbf{A}_N\boldsymbol{\eta}_N$ is $O_p(1)$.24

vi) Let $\boldsymbol{\zeta}_N$ be an $NT \times 1$ random vector (with $T \ge 1$ fixed), where, for each $N$, the elements are
distributed with zero mean and finite fourth moments. Let $\boldsymbol{\pi}_N$ be some nonstochastic $NT \times 1$
vector, whose elements are bounded uniformly in absolute value, and let $\boldsymbol{\Pi}_N$ be an $NT \times NT$
nonstochastic matrix whose row and column sums are bounded uniformly in absolute value.
Define the column vector $\mathbf{d}_N = \boldsymbol{\pi}_N + \boldsymbol{\Pi}_N\boldsymbol{\zeta}_N$. It follows that the elements of $\mathbf{d}_N$ have finite
fourth moments.25
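The boundedness properties in items i) to iii) can be checked numerically for a concrete weights matrix. The sketch below is illustrative only: the ring-shaped, row-normalized weights matrix and the function names are assumptions, not objects from the paper. It computes the largest absolute row or column sum and verifies that Kronecker products, matrix products and sums respect the bounds stated above.

```python
import numpy as np

def max_abs_row_col_sum(a):
    """Largest absolute row sum or column sum of a matrix."""
    a = np.abs(np.asarray(a, dtype=float))
    return max(a.sum(axis=1).max(), a.sum(axis=0).max())

def ring_weights(n):
    """Row-normalized 'one neighbour ahead, one behind' matrix (hypothetical example, n >= 3)."""
    w = np.zeros((n, n))
    idx = np.arange(n)
    w[idx, (idx + 1) % n] = 0.5
    w[idx, (idx - 1) % n] = 0.5
    return w

# The bound does not grow with the cross-sectional dimension.
bounds = [max_abs_row_col_sum(ring_weights(n)) for n in (10, 50, 200)]

w = ring_weights(50)
s = np.ones((3, 3))                                   # some fixed T x T matrix, T = 3
c_w, c_s = max_abs_row_col_sum(w), max_abs_row_col_sum(s)
c_kron = max_abs_row_col_sum(np.kron(s, w))           # item ii)
c_prod = max_abs_row_col_sum(w @ w)                   # item iii), product
c_sum = max_abs_row_col_sum(w + w.T)                  # item iii), sum
```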

Remark A.2

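The matrices $\mathbf{Q}_{0,N}$ and $\mathbf{Q}_{1,N}$ have the following properties (see Kapoor, Kelejian, and Prucha,
2007, p. 101):

$\mathrm{tr}(\mathbf{Q}_{0,N}) = N(T-1)$, $\mathrm{tr}(\mathbf{Q}_{1,N}) = N$, $\mathbf{Q}_{0,N}(\mathbf{e}_T \otimes \mathbf{I}_N) = \mathbf{0}$, $\mathbf{Q}_{1,N}(\mathbf{e}_T \otimes \mathbf{I}_N) = (\mathbf{e}_T \otimes \mathbf{I}_N)$,
$\mathbf{Q}_{0,N}\boldsymbol{\varepsilon}_N = \mathbf{Q}_{0,N}\mathbf{v}_N$, $\mathbf{Q}_{1,N}\boldsymbol{\varepsilon}_N = (\mathbf{e}_T \otimes \mathbf{I}_N)\boldsymbol{\mu}_N + \mathbf{Q}_{1,N}\mathbf{v}_N$, $(\mathbf{I}_T \otimes \mathbf{D}_N)\mathbf{Q}_{0,N} = \mathbf{Q}_{0,N}(\mathbf{I}_T \otimes \mathbf{D}_N)$,
$(\mathbf{I}_T \otimes \mathbf{D}_N)\mathbf{Q}_{1,N} = \mathbf{Q}_{1,N}(\mathbf{I}_T \otimes \mathbf{D}_N)$, $\mathrm{tr}[(\mathbf{I}_T \otimes \mathbf{D}_N)\mathbf{Q}_{0,N}] = (T-1)\,\mathrm{tr}(\mathbf{D}_N)$,
$\mathrm{tr}[(\mathbf{I}_T \otimes \mathbf{D}_N)\mathbf{Q}_{1,N}] = \mathrm{tr}(\mathbf{D}_N)$,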

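24 Kelejian and Prucha (2004) consider the case $T = 1$ and identically distributed elements of
$\boldsymbol{\xi}_N$ and $\boldsymbol{\eta}_N$. Results hold up for $T > 1$ (fixed) and under heteroskedasticity, as long as the
variances of the elements of $\boldsymbol{\xi}_N$ and $\boldsymbol{\eta}_N$ are bounded uniformly in absolute value.

25 Kelejian and Prucha (2009, Lemma C.2) give a proof for $T = 1$ and independent elements
of $\boldsymbol{\zeta}_N$. The extension to $T > 1$ (fixed) is obvious. Independence of the elements of $\boldsymbol{\zeta}_N$ is not
required for the result to hold. The fourth moments of the elements of $\mathbf{d}_N = \boldsymbol{\pi}_N + \boldsymbol{\Pi}_N\boldsymbol{\zeta}_N$ are
given by

$$
\begin{aligned}
E\Big(\pi_{i,N} + \sum_{j=1}^{NT}\pi_{ij,N}\zeta_{j,N}\Big)^{4}
&\le 2^{4}\Big[\pi_{i,N}^{4} + E\Big(\sum_{j=1}^{NT}\pi_{ij,N}\zeta_{j,N}\Big)^{4}\Big] \\
&\le 2^{4}\Big[\pi_{i,N}^{4} + \sum_{j=1}^{NT}\sum_{k=1}^{NT}\sum_{l=1}^{NT}\sum_{m=1}^{NT}
|\pi_{ij,N}|\,|\pi_{ik,N}|\,|\pi_{il,N}|\,|\pi_{im,N}|\;E\big|\zeta_{j,N}\zeta_{k,N}\zeta_{l,N}\zeta_{m,N}\big|\Big] \le K < \infty,
\end{aligned}
$$

by Hölder’s inequality as long as the fourth moments of the elements of $\boldsymbol{\zeta}_N$ are bounded uniformly.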

where $\mathbf{D}_N$ is an arbitrary $N \times N$ matrix. Obviously, the row and column sums of $\mathbf{Q}_{0,N}$ and
$\mathbf{Q}_{1,N}$ are bounded uniformly in absolute value.
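The properties listed in Remark A.2 are easy to verify numerically. The sketch below assumes the standard within and between transformation matrices of Kapoor, Kelejian, and Prucha (2007), namely the Kronecker products of the time-demeaning matrix and the time-averaging matrix with the N-dimensional identity; this definition is not restated in the present section, and the function name and the random test matrix are hypothetical.

```python
import numpy as np

def within_between(n_units, n_periods):
    """Within (Q0) and between (Q1) transformation matrices for period-by-period stacking."""
    j_bar = np.full((n_periods, n_periods), 1.0 / n_periods)   # time-averaging matrix
    q0 = np.kron(np.eye(n_periods) - j_bar, np.eye(n_units))
    q1 = np.kron(j_bar, np.eye(n_units))
    return q0, q1

n_units, n_periods = 6, 4
q0, q1 = within_between(n_units, n_periods)
e_kron = np.kron(np.ones((n_periods, 1)), np.eye(n_units))     # (e_T kron I_N), NT x N
rng = np.random.default_rng(0)
d = rng.standard_normal((n_units, n_units))                    # arbitrary N x N matrix
i_kron_d = np.kron(np.eye(n_periods), d)

checks = {
    "trace Q0": np.isclose(np.trace(q0), n_units * (n_periods - 1)),
    "trace Q1": np.isclose(np.trace(q1), n_units),
    "Q0 annihilates e": np.allclose(q0 @ e_kron, 0.0),
    "Q1 preserves e": np.allclose(q1 @ e_kron, e_kron),
    "Q0 commutes": np.allclose(i_kron_d @ q0, q0 @ i_kron_d),
    "Q1 commutes": np.allclose(i_kron_d @ q1, q1 @ i_kron_d),
    "trace with Q0": np.isclose(np.trace(i_kron_d @ q0), (n_periods - 1) * np.trace(d)),
    "trace with Q1": np.isclose(np.trace(i_kron_d @ q1), np.trace(d)),
}
```

The annihilation property is what removes the time-invariant individual effects in the fixed effects case, while the commutation property allows spatial filters of the Kronecker form to be applied before or after the transformation.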

APPENDIX B

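Lemma B.1 (see footnote 26)

Let $\mathbf{A}_N$ be some nonstochastic $NT \times NT$ matrix (with $T$ fixed), whose row and column sums
are bounded uniformly in absolute value. Let $\mathbf{u}_N$ be defined by (2c) and $\tilde{\mathbf{u}}_N$ be a predictor
for $\mathbf{u}_N$. Suppose that Assumptions 1 to 4 hold. Then

(a) $N^{-1}E|\mathbf{u}_N'\mathbf{A}_N\mathbf{u}_N| = O(1)$, $\mathrm{Var}(N^{-1}\mathbf{u}_N'\mathbf{A}_N\mathbf{u}_N) = o(1)$, and
$N^{-1}\tilde{\mathbf{u}}_N'\mathbf{A}_N\tilde{\mathbf{u}}_N - N^{-1}E(\mathbf{u}_N'\mathbf{A}_N\mathbf{u}_N) = o_p(1)$.

(b) $N^{-1}E|\mathbf{d}_{.j,N}'\mathbf{A}_N\mathbf{u}_N| = O(1)$, $j = 1,\dots,P$, where $\mathbf{d}_{.j,N}$ is the j-th column of the $NT \times P$ matrix
$\mathbf{D}_N$, and $N^{-1}\mathbf{D}_N'\mathbf{A}_N\tilde{\mathbf{u}}_N - N^{-1}E(\mathbf{D}_N'\mathbf{A}_N\mathbf{u}_N) = o_p(1)$.

(c) If furthermore Assumption 6 holds, then

$N^{-1/2}\tilde{\mathbf{u}}_N'\mathbf{A}_N\tilde{\mathbf{u}}_N = N^{-1/2}\mathbf{u}_N'\mathbf{A}_N\mathbf{u}_N + \boldsymbol{\alpha}_N' N^{1/2}\boldsymbol{\Delta}_N + o_p(1)$ with $\boldsymbol{\alpha}_N = N^{-1}E[\mathbf{D}_N'(\mathbf{A}_N + \mathbf{A}_N')\mathbf{u}_N]$.

In light of (b), we have $\boldsymbol{\alpha}_N = O(1)$ and $N^{-1}\mathbf{D}_N'(\mathbf{A}_N + \mathbf{A}_N')\tilde{\mathbf{u}}_N - \boldsymbol{\alpha}_N = o_p(1)$.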

Proof of part (a)

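Let

$$\vartheta_N = N^{-1}\mathbf{u}_N'\mathbf{A}_N\mathbf{u}_N \quad\text{and}\quad \tilde{\vartheta}_N = N^{-1}\tilde{\mathbf{u}}_N'\mathbf{A}_N\tilde{\mathbf{u}}_N. \tag{B.1}$$

Given (4a), we have $\vartheta_N = N^{-1}\boldsymbol{\varepsilon}_N'\mathbf{S}_N\boldsymbol{\varepsilon}_N$, with the symmetric $NT \times NT$ matrix $\mathbf{S}_N$ defined as

$$\mathbf{S}_N = (1/2)\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}'\Big)^{-1}\Big](\mathbf{A}_N + \mathbf{A}_N')\Big[\mathbf{I}_T \otimes \Big(\mathbf{I}_N - \sum_{m=1}^{S}\rho_{m,N}\mathbf{M}_{m,N}\Big)^{-1}\Big]. \tag{B.2}$$

By Assumptions 1-3 and Remark A.1 in Appendix A, the row and column sums of the
matrices $\mathbf{S}_N$ are bounded uniformly in absolute value. Let $\boldsymbol{\Omega}_{\varepsilon,N} = \sigma_\mu^2(\mathbf{J}_T \otimes \mathbf{I}_N) + \boldsymbol{\Sigma}_N$; then,
given Assumption 2, the row and column sums of the matrices $\mathbf{S}_N\boldsymbol{\Omega}_{\varepsilon,N}\mathbf{S}_N\boldsymbol{\Omega}_{\varepsilon,N}$ are bounded
uniformly in absolute value.

In the following let $K$ be a common bound for the row and column sums of the absolute
value of the elements of $\mathbf{S}_N$, $\boldsymbol{\Omega}_{\varepsilon,N}$, and $\mathbf{S}_N\boldsymbol{\Omega}_{\varepsilon,N}\mathbf{S}_N\boldsymbol{\Omega}_{\varepsilon,N}$ and of the absolute value of their
respective elements. Then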

NT

n

NT

j

NjNnNnjN bNEE1 1

,,,

1 (B.3)

26

Compare Lemma C.1 in Kelejian and Prucha (2009) for the case of a cross-sectional

SARAR(1,1) model and Lemma C.1 in Badinger and Egger (2008b) for the case of a cross-

sectional SARAR(R,S) model.

NT

n

NT

j

NjNnNnj EbN1 1

,,,

1

NT

n

NT

j

jnNnjbN1 1

,,,

1

3TK ,

where we used Hölder’s inequality in the last step. This proves that NE is O(1).

Now consider )( NVar , rewriting N as quadratic form in ),( NNN μvξ and invoking

Lemma A.1 in Kelejian and Prucha (2009):

)( NVar ),( 11

NNNNNN NNCov εεεε SS (B.4)

),(2

NNNNNNCovN ξξξξ SS

NT

n

NnnnNN ENTrNNN

1

4

,

2

,*

22 ]3)([)(2 sξξ ΩΩ SS ,

]}3)([)({)(2 4

,,...,1

2

*,,...,1

22

NnNTnNnnNTnNN EdiagdiagTrNTrNNN

sξξ ΩΩ SS ,

where NS is a )1()1( TNTN matrix, whose elements and row and column sums are

bounded uniformly in absolute value by some constant *K . Next, Nnn*,s is the n-th diagonal

element of NNNNnnN SS SS )( ,*,

* s , with NNN ξΩSS , where

NξΩ is the variance-

covariance matrix of Nξ , which is diagonal with elements 2

,nv for NTn ,...,1 and elements

2

for )1(,...,1 TNNTn . Finally, the vector NNN ξSη1 . In light of Assumption 1, the

row and column sums (and the elements) of NS are bounded uniformly in absolute value by

some finite constant, say **K . Moreover, the row and column sums (and the elements) of

1

NS

are also bounded uniformly in absolute value by some constant ***K .

Finally, in light of

Remark A.1 and Assumption 1 it follows that the elements of NN ξSη1 have finite fourth

moments. Denote their bound by ****K . Without loss of generality we assume that the bound

K used above is chosen such that KK *, KK **

, KK ***, and KK ****

. Hence, we

have

)( NVar ).1()2)(1()]([)(2 312

)1(,...,1

2

)1(

2 oKKTNKKdiagTrNKTrN TNnTN

I

The claim in part (a) of Lemma B.1 that )1()()( 11

pNNNNNN oENN uAuuAu now

follows from Chebychev’s inequality (see, for example, White, 2001, p. 35).
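The key step in this part of the proof is that the variance of a quadratic form scaled by 1/N vanishes as N grows whenever the row and column sums of the matrix involved are uniformly bounded; Chebychev's inequality then delivers convergence in probability. The sketch below illustrates the order of magnitude in the simplest possible case. It is not the general argument: it assumes normal, homoskedastic innovations (so that the fourth-moment term drops out) and a hypothetical banded matrix, and the function names are made up for the illustration.

```python
import numpy as np

def quad_form_variance(s, omega_diag, n_units):
    """Var(N^{-1} xi' S xi) = 2 N^{-2} tr(S Omega S Omega) for normal xi and symmetric S."""
    s_omega = s * omega_diag            # same as S @ diag(omega): column j scaled by omega_j
    return 2.0 * np.trace(s_omega @ s_omega) / n_units ** 2

def banded(n):
    """Symmetric matrix with ones next to the diagonal; row and column sums are at most 2."""
    return np.eye(n, k=1) + np.eye(n, k=-1)

# With unit variances the trace equals 2(n - 1), so the variance is 4(n - 1) / n^2 = O(1/n).
variances = {n: quad_form_variance(banded(n), np.ones(n), n) for n in (50, 500)}
```

A tenfold increase in the cross-sectional dimension reduces the variance roughly tenfold, which is the rate the bound in the text relies on.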

We now prove the second part of (a), i.e., )1()()~~( 11

pNNNNNN oENN uAuuAu . Since

)1()( pNN oE , it suffices to show that )1(~

pNN o . By Assumption 4, we have

NNNN ΔDuu ~ , where ),...,( .,.,1 NNTNN ddD . Substituting NNNN ΔDuu ~ into the

expression for N

~ in (B.1), we obtain

NNNNNNNNNNNN NN uAuΔDuADΔu 11 )()(~

(B.5)

])([1

NNNNNNNNNNN ΔDADΔuAADΔ

NN ,

where

N ])([1

NNNNNN uAADΔ , (B.6)

}])()[({1

1

,,

1

S

m

NNmNmNTNNNNN εMIIAADΔ

)(1

NNNNN εCDΔ ,

with ),..,(])()[( .,.,1

1

1

,,

NNTN

S

m

NmNmNTNNN ccMIIAAC , and

N NNNNNN ΔDADΔ 1. (B.7)

By Assumption 3 and Remark A.1, the row and column sums of NC are bounded uniformly

in absolute value. We next prove that )1(pN o and )1(pN o .

Proof that )1(pN o :

N NNNNN εCDΔ 1 (B.8)

N

NT

n

NnNnNN εcdΔ

1

.,.,

1

NT

n

NNnNnNN1

.,.,

1 εcdΔ

NT

n

NT

j

NjNnjNnN cN1 1

,,.,

1 dΔ

NT

n

NT

j

NjNnjNnN cN1 1

,,.,

1 dΔ

NT

n

NT

j

NjNnjNnN cN1 1

,,.,

1 dΔ

NT

n

NnjNn

NT

j

NjN cN1

,.,

1

,

1 dΔ

qNT

n

q

Nnj

pNT

n

p

Nn

NT

j

NjN cN

/1

1

,

/1

1

.,

1

,

1

dΔ

q

NT

n

q

Nnj

pNT

n

p

Nn

NT

j

NjN

p cNNNN

/1

1

,

/1

1

.,

1

1

,

12/12/1/1

dΔ .

Note that

KcNT

n

Nnj

1

, by Assumption. In the following we denote by K the uniform

bound for the row and column sums of the absolute value of the elements of NA and NC .

From Remark A.1 in Appendix A, it follows that qNT

n

q

Nnj Kc

1

, and thus

qNT

n

q

Nnjc

/1

1

,

K . Factoring K out of the sum yields

N p

NT

n

p

Nn

NT

j

NjN

p NTTNNNK

/1

1

.,

1

1

,

12/12/1/1 )(

dΔ .

This holds for 2p for some 0 as in Assumption 4 and 1/1/1 qp . By

Assumption 4, )1(2/1

pN ON Δ . Assumption 4 also implies that

)1( )(

/1

1

.,

1

p

pNT

n

p

Nn ONT

d for 2p and some 0 .

Moreover, KE Nj , , which implies that )1(1

,

1

p

NT

n

Nn ON

. Since 02/1/1 pN as

N it follows that N )1(po . For later reference, note that N )1()1( pp ooK , where

we can choose PAccK 2 , where Ac and Pc are the bounds for the row and column sums of

the absolute values of the elements of NA and

S

m

NmNmNT

1

1

,, ])([ MII , respectively.

(Compare (B.6) and Remark A.1).

Next consider

N NNNNNN ΔDADΔ 1 =

NT

n

NT

j

NNjNnjNnN aN1 1

.,,.,

1ΔddΔ (B.9)

Nnj

NT

n

NT

j

NjNnN aN ,

1 1

.,.,

21

ddΔ

qNT

j

q

Nnj

NT

n

pNT

j

p

NjNnN aN

/1

1

,

1

/1

1

.,.,

21

ddΔ

pNT

j

p

Nj

NT

n

NnN

p NNKN

/1

1

.,

1

1

.,

12/1

ddΔ

)1(

/2

1

.,

122/12/12/1/1

p

pNT

j

p

NjN

p oNNKNN

dΔ .

From the last inequality we can also see that )1(2/1

pN oN . Note that N )1()1( pp ooK ,

where we can choose PAccK 2 . Summing up, we have proved that )1(pN o .

Proof of part (b)

Denote by *

,Ns the s-th element of NNNN uAD1 . By Assumptions 3 and 4 and Remark A.1

in Appendix A there exists a constant K such that KuE Ni )( 2

, and KdEp

Nij , with

2p for some 0 . Without loss of generality we assume that the row and column

sums of the matrices NA are bounded uniformly by K . Notice first that

2/12

,

2/12

,,, NjsNnNjsNn EdEuduE

with as before.

It follows that

(B.10)

,

which shows that , and also that .

It is readily verified that , such that we have . Next observe

that

, (B.11)

where . By arguments analogous to the proof that

, it follows that . Hence , and thus

, which also shows that .

Proof of part (c)

In light of the proof of part (a)

, (B.12)

where as shown above, and in light of (b) and since by

Assumption 4, we have

. (B.13)

p

p

NjsNn dEEu/1

,

2/12

,

pp KKK /12/1/12/1 p

NT

n

NjsNn

NT

j

NnjNs duEaNE1

,,

1

,

1*

,

ppNT

n

NT

j

Nnj

p KTKNTNKaNK /12/31/12/1

1 1

,

1/12/1

)1(,.

1 ONE NNNs uAd )1(])([1 OEN NNNNN

uAADα

)1()( * oVar s )1()( **

pss oE

*11 ~NNNNNNN NN

uADuAD

NNNNN N ΔDAD 1* N

)1(])([1

pNNNNN oN uAADΔ )1(*

pN o )1(~ **

pss o

)1()(~ **

pss oE )1(~)(1

pNNNNN oN αuAAD

NNNN uAu ~~2/1

NNNNNNNNN NNNN 2/12/112/1 ])([ ΔDAAuuAu

)1(2/1

pN oN )1(2/1

pN ON Δ

)1(~~ 2/12/12/1

pNNNNNNNN oNNN ΔαuAuuAu

Proof of Theorem 1. Consistency of the Weighted GM Estimator

We first show that Assumption 5 also implies that the smallest eigenvalue of is

bounded away from zero, i.e., that for some By Assumption 5

and in light of Rao (1973, p. 62),

. (B.14)

Using Mittelhammer (1996, p. 254) we have

, (B.15)

with since by Assumption 5.

The objective function of the weighted GM estimator and its nonstochastic counterpart are

given by

and (B.16a)

(B.16b)

Since , we have , i.e., at the true parameter vector

. Hence,

. (B.17a)

In light of Rao (1973, p. 62) and Assumption 5, it follows that:

and (B.17b)

.

By the properties of the norm , we have such

that . Hence, for every

, (B.18)

which proves that the true parameter vector $\boldsymbol{\theta}_N$ is identifiably unique

(compare Lemma 4.1 in Pötscher and Prucha, 1997).

Moreover, let and . Then, the difference between the

objective function and its nonstochastic counterpart can then be written as

and (B.19a)

NNN ΓΘΓ

0min )( NNN ΓΘΓ .00

0inf)( *min

xx

xΓΓxΓΓ NN

xNN

xx

ΓΘΓxΓΘΓ

xNNN

xNNN inf)(min

xx

xΓΓx

NN

xNΞ inf)( 1

min

0)()( 0minmin NNN ΓΓΘ

**0 0)( *min NΘ

)~~()

~~()( bb NNNNNNR ΓγΘΓγθ

)()()( bb NNNNNNR ΓγΘΓγθ

0 NNN bΓγ 0)( NNR θ 0)( θNR

),,...,( ,1 NSNNθ

)()()()( NNNNNNNN RR bbbb ΓΘΓθθ

)())(()()( min NNNNNNNN RR bbbb ΓΘΓθθ

)()()()( 0 NNNNN RR bbbb θθ

2/1)]([ AAA tr 2

θθ )()( NN bbbb

2

0)()( NNNN RR θθθθ 0

0inf)]()([inflim 2

0

2

0}:{}:{

NNNNN

NN

RR θθθθθθθθθθ

),,...,( 2

,,1 NSNNθ

)~

,~( NNN Γγ F ),( NNN ΓγΦ

),1(~

),1(),( bFFb NNNNR Θθ

, (B.19b)

such that

.

As evident from (17), the elements of the matrices and are all of the form ,

where are nonstochastic matrices, whose row and column sums are bounded

uniformly in absolute value. In light of Lemma B.1, the elements of are and it

follows that and as . As a consequence,

we have (for finite S)

(B.20)

Together with identifiable uniqueness, the consistency of now

follows directly from Lemma 3.1 in Pötscher and Prucha (1997).

Proof of Theorem 2. Asymptotic Normality of

To derive the asymptotic distribution of the vector , defined in (30) we invoke the central

limit theorem for vectors of linear quadratic forms given by Kelejian and Prucha (2009,

Theorem A.1). The vector of quadratic forms in the present context, to which the Theorem is

applied is ; its variance-covariance matrix is given by and

.

Note that in light of Assumptions 1, 2 and 7 (and Lemma B.1), the stacked innovations ,

the matrices , , and the vectors and , , , satisfy

the assumptions of the central limit theorem of Kelejian and Prucha (2009, Theorem A.1).

It follows that

, (B.21)

since by assumption as required in Theorem A.1.

),1(),1()( bb NNNNR ΦΘΦθ

),1)(~

)(,1()(),( bFFb NNNNNNNN RR ΦΘΦΘθθ

2

),1( ~

bFF NNNNNN ΦΘΦΘ

])(2

)1(2)([1

~ 242

ba

SSSaSNNNNNN

ΦΘΦΘ FF

Nγ NΓ NNN uu A

NA NTNT

NΦ )1(O

0p

NN ΦF 0~ p

NNNNNN ΦΘΦΘ FF N

. as 0])(2

)1()([1 ][)(),(sup 242

],0[,,..,1,2

NbaSS

aSRR p

NNNNNNbSsaa s

ΦΦθθ FF

)~,~,...,~(~ 2

,,,1 NNSNN θ

Nθ~

Nq

NN N qq 2/1* NN NΨΨ *

2/12/12/1*)( NN N ΨΨ

Nξ

ss

Nc

,

,,vAss

Nc

,

,,μAss

Nc

,

,,vass

Nc

,

,,μa 4,...,1c Sss ,...,1,

),()( 24

2/1*2/12/1*2/1*

S

d

NNNNNN N I0ΨΨΨ qqq

0)()( min

*

min

1

NNN ΨΨ

Since the row and column sums of the matrices , the elements of the vectors ,

, and , and the moments of and are bounded

uniformly in absolute value, it follows in light of (28) that the elements of and also those

of are bounded uniformly in absolute value.

We next turn to the derivation of the limiting distribution of the GM estimator . In

Theorem 1 we showed that the GM estimator defined by (18) is consistent. It follows that

– apart from a set of the sample space whose probability tends to zero – the estimator satisfies

the following first order condition:

, (B.22)

which is an $(S+1) \times 1$ vector whose rows correspond to the partial derivatives of the criterion

function with respect to $\rho_{s,N}$, $s = 1, \ldots, S$, and $\sigma_\mu^2$.

Substituting the mean value theorem expression

, (B.23)

where is some in-between value, into the first-order condition yields

. (B.24)

Observe that and consider the two matrices

, (B.25)

, (B.26)

where and correspond to as defined above with and substituted for

. Notice that is positive definite, since and are positive definite by

assumption and the matrix has full column rank.

In the proof of Theorem 1 (and Lemma B.1) we have demonstrated that and

that the elements of and are and , respectively. By Assumption 5,

, and . Since and (and thus also and

ss

Nc

,

,Ass

Nc

,

,a

4,...,1c Sss ,...,1, th )4( Nv Nμ

NΨ

2/1

NΨ

Nθ~

Nθ~

0ΔθqΘθ

ΔθqΔθqΘΔθq

θ

),

~(

~),~

(),

~(

~),

~( NNNN

NNNNNNNNNN

1)1( S

Ns, Ss ,...,1 2

)~

(),(

),(),~

( NNNNN

NNNNNN θθθ

ΔθqΔθqΔθq

Nθ

),(~),

~(

)~

(),(~),

~( 2/12/1

NNNNNNN

NNNNN

NNNN NN ΔθqΘ

θ

Δθqθθ

θ

ΔθqΘ

θ

Δθq

NNNN BΓ

θ

Δθq ~),(

)1()1( SS

NNNNNNNN

NNNN

N BB ΓΘΓθ

ΔθqΘ

θ

ΔθqΞ

~~~~),(~),~

(~

NNNNNN BB ΓΘΓΞ

NB~

NB NBNθ

~Nθ

Nθ NΞ NΓ NΘ

)1(]12/)1(2[ SSSS NB

0ΓΓp

NN ~

NΓ NΓ~

)1(O )1(pO

)1(~

pNN oΘΘ )1(ON Θ )1(~

pN OΘNρ

~Nρ NB

~

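) are consistent and bounded uniformly in probability, it follows that $\tilde{\Xi}_N - \Xi_N = o_p(1)$,

$\tilde{\Xi}_N = O_p(1)$, and $\Xi_N = O(1)$. Moreover, $\Xi_N$ is positive definite and thus invertible, and its

inverse $\Xi_N^{-1}$ is also $O(1)$.

Denote $\tilde{\Xi}_N^{+}$ as the generalized inverse of $\tilde{\Xi}_N$. It then follows as a special case of Lemma F1

in Pötscher and Prucha (1997) that $\tilde{\Xi}_N$ is non-singular with probability approaching 1 as

$N \to \infty$, that $\tilde{\Xi}_N^{+}$ is $O_p(1)$, and that $\tilde{\Xi}_N^{+} - \Xi_N^{-1} = o_p(1)$.

Pre-multiplying (B.24) with $\tilde{\Xi}_N^{+}$ we obtain, after rearranging terms,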

.(B.27)

In light of the discussion above, the first term on the right-hand side is zero on ω-sets of

probability approaching 1 (compare Pötscher and Prucha, 1997, p. 228). This yields

. (B.28)

Next observe that

, (B.29)

since and .

As we showed in section III, the elements of can be expressed as

. (B.30)

where is defined in (24), and that

. (B.31)

It now follows from (B.28), (B.29), and (B.30) that

. (B.32)

Since all nonstochastic terms on the right-hand side of (B.32) are $O(1)$, it follows that

$N^{1/2}(\tilde{\theta}_N - \theta_N)$ is $O_p(1)$. To derive the asymptotic distribution of $N^{1/2}(\tilde{\theta}_N - \theta_N)$, we invoke

Corollary F4 in Pötscher and Prucha (1997). In the present context, we have

NB )1(~

pNN oΞΞ

)1(~

pN OΞ )1(ON Ξ NΞ

1

NΞ )1(O

NΞ~

NΞ~

NΞ~

N

NΞ~

)1(pO )1(~ 1

pNN o ΞΞ

NΞ~

),(~),

~(~

)~

()~~

()~

( 2/12/1

1

2/1

NNNNNNN

NNNNNSNN NNN ΔθqΘθ

ΔθqΞθθΞΞIθθ

)1(),(~),

~(~

)~

( 2/12/1

pNNNNNNN

NNN oNN

ΔθqΘθ

ΔθqΞθθ

)1(~),

~(~ 1

pNNNNNNNN

N o

ΘΓΞΘ

θ

ΔθqΞ B

)1(~ 1

pNN o ΞΞ )1(

),~

(pNN

NNN o

Γ

θ

ΔθqB

),(2/1

NNNN Δθq

),(2/1

NNNN Δθq )1()1(*2/1

pNpN ooN qq

*

Nq

),()( 24

2/1*2/12/1*2/1*

S

d

NNNNNN N I0ΨΨΨ qqq

)1()()~

( 2/12/112/1

pNNNNNNNN oN qΨΨΘJΞθθ

)1(O

)~

(2/1

NNN θθ )1(pO )~

(2/1

NNN θθ

,

, with

.

Furthermore, and its variance-covariance matrix is

,

where is positive definite.
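The variance-covariance matrix referred to here has the usual generalized-moments sandwich form, $(J_N' \Theta_N J_N)^{-1} J_N' \Theta_N \Psi_N \Theta_N J_N (J_N' \Theta_N J_N)^{-1}$. The following is a minimal numerical sketch of that algebra only. The function name `gm_sandwich`, the dimensions, and the randomly generated stand-ins for $J_N$ and $\Psi_N$ are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def gm_sandwich(J, Theta, Psi):
    """Return (J'ΘJ)^{-1} J'ΘΨΘJ (J'ΘJ)^{-1} for conformable inputs."""
    bread = np.linalg.inv(J.T @ Theta @ J)
    meat = J.T @ Theta @ Psi @ Theta @ J
    return bread @ meat @ bread

rng = np.random.default_rng(0)
k, p = 6, 3                      # hypothetical: 6 moment conditions, 3 parameters
J = rng.normal(size=(k, p))      # stand-in for J_N (full column rank almost surely)
A = rng.normal(size=(k, k))
Psi = A @ A.T + k * np.eye(k)    # symmetric positive definite stand-in for Psi_N

omega_identity = gm_sandwich(J, np.eye(k), Psi)           # weighting matrix Theta_N = I
omega_optimal = gm_sandwich(J, np.linalg.inv(Psi), Psi)   # weighting matrix Theta_N = Psi_N^{-1}
```

With the weighting matrix set to the inverse of $\Psi_N$, the sandwich collapses to $(J_N' \Psi_N^{-1} J_N)^{-1}$, and the difference between `omega_identity` and `omega_optimal` is positive semidefinite, which is the sense in which that choice of weighting matrix is optimal.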

As a final point it has to be shown that as required in Corollary

F4 in Pötscher and Prucha (1997). Observe that

(B.33)

,

since the matrices involved are all positive definite.

Proof of Theorem 3. Joint Distribution of $\tilde{\rho}_N$ and Other Model Parameters

The first line in Theorem 3 holds in light of Assumption 7 (for ), bearing in mind that

, and Theorem 2 (for ).

We next prove that by verifying that the

assumptions of the central limit theorem A.1 by Kelejian and Prucha (2009) are fulfilled. Note

that by assumption. In Theorem 2, we verified that the stacked

innovations , the matrices , , and the vectors and , ,

, satisfy the assumptions of the central limit theorem by Kelejian and Prucha (2009,

Theorem A.1).

For the estimators considered in the present paper, the elements of the matrix

are bounded uniformly in absolute value, provided that the elements of the

matrix are bounded uniformly in absolute value (see Lemmata 1 and 2). Hence, the linear

form fulfils the assumptions of Theorem A.1; as a consequence,

.

),(~ )1(4

2/1

SSSd

NNN N I0ζΨζ q

)1()~

(2/1

pNNNN oN ζθθ X

2/11

NNNNN ΨΘJΞ X

)1()~

(2/1

pNN ON θθ

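$$\Omega_{\tilde{\theta}_N}(\Theta_N) = (J_N' \Theta_N J_N)^{-1} J_N' \Theta_N \Psi_N \Theta_N J_N (J_N' \Theta_N J_N)^{-1}$$

$\Omega_{\tilde{\theta}_N}$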

0)(inflim min NNN XX

)(min NNXX )( 11

min

NNNNNNN ΞJΘΨΘJΞ

0)()()()()( minmin

11

minminmin

NNNNNNNNN BB ΓΓΞΞΘΘΨ

Nρ~

NN Δ2/1

NNN PFT )~

(2/1

NNN θθ

),(],)[()1(4

2/12/1

,, *

SSSP

dNNNNNo NNT I0FξΨξ w q

0)( *

,min wΨwΨ cN

Nξss

Nc

,

,,vAss

Nc

,

,,μAss

Nc

,

,,vass

Nc

,

,,μa 4,...,1c

Sss ,...,1,

),( ,, NNN μv FFF

NH

NNNNNN μFvFξF μv ,,

),()1(4, *

SSSP

dNo N I0ξ

Proof of Lemma 1.

Consider the case of random effects estimation first. In light of equations (4a) and (4b),

Assumptions 3 and 8, as well as , it follows that all columns of

are of the form , where the elements of the vector and

the row and column sums of the matrix are bounded uniformly in absolute value. It

follows that the fourth moments of the elements of the matrix are bounded

uniformly by some finite constant and that Assumption 6 holds (see Remark A.1 in Appendix

A).

Next, note that

,

where is defined in the Lemma, and

, and

.

In light of Assumption 8, and , with as defined in the Lemma.

By Assumptions 2, 3 and 8, the elements of and are bounded uniformly in absolute

value. By Assumption 1, , , and the diagonal variance-covariance

matrices of and have uniformly bounded elements. Thus, and

the elements of the variance-covariance matrix of , i.e., , are

bounded uniformly in absolute value. Moreover, , and the elements of

the variance-covariance matrix of , i.e., , are bounded

uniformly in absolute value (see Remark A.1 in Appendix A). It follows from Chebychev’s

inequality that , , and consequently

and that

. This completes the proof, recalling that

. Obviously, the same proof applies under fixed effects

estimation, using the within-transformed matrices , , , , , and ,

provided that Assumption 8 is maintained accordingly for and .
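The Chebychev step used in this proof can be spelled out for a single element. The following is a sketch under the assumption (notation introduced here) that $x_N$ denotes one element of a vector such as $(NT)^{-1/2} F_{v,N}' v_N$, with $E(x_N) = 0$ and $\mathrm{Var}(x_N) \le K < \infty$ for all $N$:

```latex
\[
P(|x_N| \ge M) \;\le\; \frac{\mathrm{Var}(x_N)}{M^2} \;\le\; \frac{K}{M^2}
\qquad \text{for all } N \text{ and all } M > 0 .
\]
\[
\text{Given } \epsilon > 0, \text{ choose } M_\epsilon = (K/\epsilon)^{1/2}: \qquad
\sup_N P(|x_N| \ge M_\epsilon) \;\le\; \epsilon ,
\qquad \text{i.e. } x_N = O_p(1) .
\]
```

Because the bound $K$ does not depend on $N$, the same $M_\epsilon$ works for every $N$, which is exactly what boundedness in probability requires.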

bNN βsup

),( NNN YXZ NNNN εΠπ Nπ

NΠ

NN ZD

NNNNNNNN NTNTNT μFPvFPδδ μv ,

2/1

,

2/12/1 )(~

)(~

)~

()(

NP~

N

S

m

NmNmNTN HMIIFv

1

1

,,, ])([

N

S

m

NmNmNTNTN HMIIIeFμ

1

1

,,, ])()[(

)1(~

pNN oPP )1(ON P NP

N,vF N,μF

0v )( NE 0μ )( NE

Nv Nμ 0vFv ])[( ,

2/1

NNNTE

NNvN vF ,

2/1

NNNNT ,,

1)( vv FΣF

0μFμ ])[( ,

2/1

NNNTE

NNN μFμ,

2/1

NNNT ,,

21)( μμ FF

)1()( ,

2/1

pNN ONT vFv )1()( ,

2/1

pNN ONT μFμ

)1()()()~

()( ,

2/1

,

2/12/1

pNNNNNNNN oNTNTNT μFPvFPδδ μv

)1()()( ,

2/1

,

2/1

pNNNNNN ONTNT μFPvFP μv

),(),( ,,,, NNNNNNN μvμv FPFPTTT

N,vT N,μT NP NHN,vF 0Fμ N,

NX NH

Proof of Lemma 2.

The random effects spatial generalized TSLS estimator is given by

, where with

.

Substituting , we obtain

, with

.

Next note that

,

where is a matrix whose row and column sums are bounded uniformly in absolute

value, satisfying

.

Substituting for , we obtain

, where

,

,

Note that the feasible generalized TSLS estimator uses generated (transformed) instruments

, based on the estimate . Using

we obtain such that .

**1** ˆ)

ˆ(

ˆNNNNN yZZZδ ***

NNNN uδZy

N

S

m

NmNmNTN uMIIu

1

,,

* )]([

**1****** )(ˆ

* NNNNNNNN

ZHHHHZPZH

***2/1*2/12/1 ~)()()

ˆ()( NNNNNN NTNTNT uHPΔδδ

1**1**11**11**1**1* ])][()[(]})[(])][(){[(~ NNNNNNNNNNN NTNTNTNTNT HHHZZHHHHZP

NNT

S

m

NmNmNN uMIuu )()(1

,,

**

NM

S

m

NmNmNm

1

,,, )( M

S

m

NNmNm

1

,, )( M

*

Nu

)ˆ

()( 2/1

NNNT δδ

*2/1)( NNT Δ N,N ,21 dd

NNNN NT εHP **2/1

,1

~)(

d

N,2d NNT

S

m

NmNmNNNT uMIHP )()(~

)(1

,,

**2/1

N

S

m

NmNmNTNT

S

m

NmNmNNNT εMIIMIHP

1

1

,,

1

,,

**2/1 ])()[()(~

)(

*

NH

Nθ

NNT

S

m

NmNmNNN HMIθHH )()()(1

,,

**

2

1

,,

j

NijNi dd

2

1

2

1

,

*2/1)(i j

NijNNT dΔ

Considering , we have

,

with , and .

.

Regarding we have

,

Next note that, in light of Assumption 8 and since $\tilde{\theta}_N$ is $N^{1/2}$-consistent, it follows that

.

By Assumption 8 we also have and thus

. It follows as a special case of Pötscher and Prucha (1997,

Lemma F1) that

.

It follows further that and with defined in the Lemma.

Next observe that . Note further that all terms except for are of

the form , where are matrices involving products of

, , and . By the maintained assumptions regarding

these matrices it follows that the elements of are bounded uniformly in absolute value.

N,1d

NNNN NT εHP **2/1

,11 )(

d

])([~

)( **2/1

NNTNNNNT μIevHP

NNNNNN NTNT μFPvFP μv *

,

*2/1*

,

*2/1 ~)(

~)(

**

, NN HFv **

, )( NNTN HIeFμ

N,12d NNTN

S

m

NmNmNNT εMIHP )()(~

)(1

,,

*2/1

N,2d

N

S

m

NmNmNTNT

S

m

NmNmNNN NT εMIIMIHP

1

1

,,

1

,,

**2/1

,21 ])()[()(~

)(

d

N

S

m

NmNmNTNT

S

m

NmNm

NTN

S

m

NmNmNN NT

εMIIMI

MIHP

1

1

,,

1

,,

1

,,

*2/1

,22

])()[()(

)()(~

)(

d

N

S

m

NmNmNTNNTN

S

m

NmNmNNT εMIIMMIHP

1

1

,,

1

2

,,

*2/1 ])()[()]([~

)(

Nθ

2/1N

)1(ˆ

)( **

1

****

**1

pNN oNT

ZHHHZH QQQZZ

)1(**

1

**** O

ZHHHZH QQQ

)1()( 1

**

1

**** O

ZHHHZH QQQ

)1()(]ˆ

)([ 1

**

1

****

1**1

pNN oNT

ZHHHZH QQQZZ

)1(~ **

pNN oPP )1(* ON P*

NP

)1()( pNN oρρ

Nij ,d N,11d

NN

/-

Np NTo εP D 21* )(~

)1(ND *PNT

)( NT MI

S

m

NmNmNT

1

1

,, ])([ MII NH

ND

As a consequence, and the elements of the variance-covariance matrix

of , i.e., , are bounded uniformly in absolute value (see

Remark A.1 in Appendix A). It follows from Chebychev’s inequality that

. As a consequence, all terms except for are , and

. Finally, observe that , with

and , recalling that .

APPENDIX C

Lemma C.1

Define the vectors with elements and the

vector of fixed effects residuals . Suppose that Assumptions 1-4 hold and that

the elements of have bounded fourth moments. Then and

, with , , and where ,

, and for some . As a direct consequence,

and .

Proof.

Note first that

, where (C.1)

.

This can also be written as

, (C.2)

where with

,

, and

0ε ])[( 2/1

NNNTE D

NNNT εD 2/1)( N,NNNT DD εΩ1)(

)1()( 2/1

pNN ONT εD Nij ,d N,11d )1(po

)1(,11 pN Od NNNNNvNN NTNT μFPvFP *

,

*2/1*

,

*2/1

,11

~)(

~)( d

*

,NvF*

NH *

,NF )(*

NTN IeH ***

NNN PFT

1NT N

S

m

NmNmNTN uMIIε ~])~([~

1

,,

Nitε ,~

NNN εQv ~~,0

),...,( .,.,1 NNNN ddD NNN ηεε ~

NNN ηvv ~NitNNit ,,

NitNNit ,, )1(2/1

pN ON KE Nit 4

,

KENit

4

, K

2

,

2

,,

2

,

2

, 2~NitNNitNitNNitNit

2

,

2

,,

2

,

2

, 2~NitNNitNitNNitNit vvv

Nε~

NN ηε

Nη N

S

m

NmNmNT

S

m

NmTNmNm εMIIMI ])([)])(~([ 1

1

,,

1

,,,

NN

S

m

S

m

NmTNmNmNNNmNmNT ΔDMIΔDMII

1 1

,,,,, )])(~([)]([

NNN gRη

),,( ,3,2,1 NNNN RRRR

N,1R

S

m

NNmNmNT

1

,, ,)( DMII

}])()[(,...,])()[{( 1

1

,,,

1

1

,,,1,2 N

S

m

NmNmNTNSTN

S

m

NmNmNTNTN εMIIMIεMIIMI

R

])(,...,)[( ,,1,3 NNSTNNTN DMIDMI R

.

In light of Assumption 3 and since the elements of have bounded fourth

moments, each column of the matrix is of the form , where the elements of

the vector are bounded uniformly in absolute value by some finite constant, the

row and column sums of the matrix are bounded uniformly in absolute value

by some finite constant, and the fourth moments of the elements of are also bounded by

some finite constant. It follows that the fourth moments of the elements of are also

bounded by some finite constant (see Remark A.1 in Appendix A).

As a consequence, , or for the n-th element of the vector ,

, (C.3)

where , denotes the n-th row of , and with

. Without loss of generality we can select such that for

. By Assumption 1 there is also some such that for . In the

following we use to denote the larger bound, i.e., . Also note that

. Replacing index with index , we have, from (C.1) and (C.3), that

.

By the same reasoning we have

(C.4)

,

with , where . Obviously, the elements of the columns of

and their fourth moments remain bounded uniformly after pre-multiplication with , such

that we have with defined as above and . Finally, we

])~(,)~(,[ NNNNNNN ΔρρρρΔg

),...,( .,.,1 NNNN ddD

NR NNN ζΠπ

1NT Nπ

NTNT NΠ

Nζ

NR

NNN gR η 1NT Nη

NnNNnNNn ,,, rg

NN g Nn.,rNR NnNn .,, r

KE Nn

4

, K

KE Nn )( ,

4K

KE Nn,4

K ),max( KKK

)1(2/1

pN ON n it

2

,

2

,,,

2

,

2

,

2

,,

2

,

2

, )2()(~NitNitNitNitNitNitNitNitNitNit

2

,,,2 NitNitNit

2

,,,2 NitNitNit

2

,

2

,,2 NitNNitNiN

NNN εQv ~~,0 NNNN ηQεQ ,0,0

NNN ηvQ ,0

NN ηv

NNNgRη NNN RR ,0Q NR

N,0Q

NitNNit ,, N KE

Nit

4

,

have . Without loss of generality, we choose the bound

in the lemma such that and .

Proof of Theorem 4a. Consistency of

In the following we provide two Lemmata that establish the consistency of .27 As is

evident from the proof, this also covers the simpler case of .

Lemma C.2

Suppose Assumptions 1-4 hold and let

, and

,

with and , and where the vector can be any

estimator that satisfies . Let and be vectors, whose elements

are bounded uniformly in absolute value by some constant c, and let

. Define with

. Then

(a) and .

(b) There exist random variables that do not depend on and such that

, with and where is a

constant that depends monotonically on (as well as on some other bounds maintained in the

assumptions).

Proof.

A complication in the estimation of $(NT)^{-1} a_N' \Sigma_N b_N$ arises from the fact that

$\Sigma_N = \mathrm{diag}_{n=1}^{NT}(E v_{n,N}^2)$ is based on the idiosyncratic error components in levels ($v_{it,N}$), whereas

the estimator has to be based on the (demeaned) fixed effects residuals $\tilde{v}_{it,N}$. The problem at

hand is similar in its structure to that in Stock and Watson (2008), who consider the

estimation of a heteroskedasticity-robust variance-covariance matrix in fixed effects panel

data models (without spatial correlation). They suggest an asymptotic bias correction that is

27

Related results for the cross-sectional case are obtained by Kelejian and Prucha (2009).

2

,

2

,,

2

,

2


KKK KK

N,

~ΔΔΨ

v

ΔΔΨ N,

~

μ

ΔΔΨ N,

~

N

S

m

NmNmNTNNNNNN uMIIQvQεQv )]([1

,,,0,0,0

N

S

m

NmNmNTNNNN uMIIQεQv ~)~([~~

1

,,,0,0

NNNN ΔDuu ~ ),...,( .,.,1 NNTNN ddD 1S Nρ

~

)1()~( pNN oρρNa Nb 1NT

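$$\Sigma_N = \mathrm{diag}_{n=1}^{NT}(E v_{n,N}^2) = \mathrm{diag}_{n=1}^{NT}(\sigma_{v,n,N}^2)$$

$$\tilde{\Sigma}_N^{HR} = \mathrm{diag}_{n=1}^{NT}[(\tilde{v}_{n,N}^{HR})^2] = \mathrm{diag}_{it=1}^{NT}[(\tilde{v}_{it,N}^{HR})^2]$$

$$(\tilde{v}_{it,N}^{HR})^2 = \frac{T}{T-2}\,\tilde{v}_{it,N}^2 - \frac{1}{(T-1)(T-2)} \sum_{r=1}^{T} \tilde{v}_{ir,N}^2$$

$$(NT)^{-1} a_N' \tilde{\Sigma}_N^{HR} b_N - (NT)^{-1} a_N' \Sigma_N b_N = o_p(1), \qquad (NT)^{-1} a_N' \Sigma_N b_N = O(1)$$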

N Na Nb

)1)((1~1

NNNNN

HR

NN cKNTNT

bΣabΣa )1(pN o )(cK

c

NNNNT bΣa1)(

)( 2

,1 Nn

NT

nN Evdiag Σ Nitv ,

Nitv ,

based on an expression, where the error components are clustered over cross-section

units (averaged over time), and which can be estimated consistently with the fixed effects

residuals ( ). In the following, we adopt the approach by Stock and Watson (2008) to

derive bias-corrected estimators in the present framework.

Define

, (C.5a)

with , and (C.5b)

with . (C.5c)

The bias is derived using the expectation of the infeasible estimate , which assumes that

the true parameters $\rho_N$ and $\delta_N$ are known and omits the degrees of freedom correction for

the P regressors. For simplicity of notation, define $c_{it,N} = a_{it,N} b_{it,N}$; without loss of generality,

the bound $c$ in the Lemma is chosen such that $|c_{it,N}| \le c$.

Recognizing that we have, for each i,

(C.6)

,

using and .

Rearranging terms and averaging over N yields the following bias corrected estimator for :

, (C.7)

Nitv ,

2

,~

Nitv

)(1

NNNN ENT

bΣa

NNNNTN

bΣa

)1(

1 )(

2

,1 Nn

NT

nN vdiag Σ

NNNNTN

bΣa~

)1(

1~

)~(~ 2

,1 Nn

NT

nN vdiag Σ

NE

Nρ Nδ

NitNitNit bac ,,,

c cc Nit ,

N

i

NiNNNN EN

ETN

E1

,

1)(

)1(

1 bΣa

T

t

NitNitNi vcET

E1

2

,,,1

1

T

t

NiNiNitNitNit vvvvcET 1

2

,,,

2

,, )2(1

1

T

t

T

r

T

s

NisNirNit

T

t

T

s

NisNitNit

T

t

NitNit vvcT

ET

vT

vcET

vcET 1 1 1

,,,21 1

,,,

1

2

,,

1

1

112

1

1

1

1

NiNiTT

T,,

)1(

1

1

2

T

t

NitNitNi vcET 1

2

,,,

1

T

t

T

r

NirNitNi vT

cET 1 1

2

,,,

11

N

]~

1

1~[2

1~NN

HR

NTT

T

where

and .28

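Finally, note

that (C.7) can also be written as $(NT)^{-1} a_N' \tilde{\Sigma}_N^{HR} b_N$, where $\tilde{\Sigma}_N^{HR}$ is a diagonal matrix with

elements $(\tilde{v}_{it,N}^{HR})^2 = \frac{T}{T-2}\,\tilde{v}_{it,N}^2 - \frac{1}{(T-1)(T-2)} \sum_{r=1}^{T} \tilde{v}_{ir,N}^2$.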
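The bias correction adopted from Stock and Watson (2008) replaces each squared within residual by $\frac{T}{T-2}\tilde{v}_{it}^2 - \frac{1}{(T-1)(T-2)}\sum_{r=1}^{T}\tilde{v}_{ir}^2$. The sketch below checks the underlying algebra in the simplest setting, as an illustration rather than the paper's estimator: it assumes mean-zero disturbances that are independent over $t$ with known variances and ignores parameter estimation error. The function names and the simulated variances are hypothetical.

```python
import numpy as np

def hr_corrected_squares(sq_resid):
    """Apply the bias correction to squared within residuals.

    sq_resid: array of shape (N, T) holding squared demeaned residuals.
    Returns T/(T-2) * sq - rowsum(sq) / ((T-1)(T-2)), elementwise.
    """
    N, T = sq_resid.shape
    if T < 3:
        raise ValueError("the correction requires T >= 3")
    row_sums = sq_resid.sum(axis=1, keepdims=True)
    return T / (T - 2) * sq_resid - row_sums / ((T - 1) * (T - 2))

def expected_within_squares(sigma2):
    """E[(v_it - mean_t v_it)^2] for independent v_it with variances sigma2[i, t]."""
    N, T = sigma2.shape
    return (T - 2) / T * sigma2 + sigma2.sum(axis=1, keepdims=True) / T**2

rng = np.random.default_rng(1)
sigma2 = rng.uniform(0.5, 3.0, size=(4, 5))   # hypothetical heteroskedastic variances
# Feeding the exact expectation of the squared within residuals through the
# correction should return the true variances.
recovered = hr_corrected_squares(expected_within_squares(sigma2))
```

Since the correction is linear in the squared residuals, applying it to their expectations shows directly that it removes the $O(1/T)$ bias that demeaning induces for fixed $T$.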

We next prove that , considering

(C.8)

and showing that both and are for fixed T as .

Consider first . It follows from the triangle inequality that

. (C.9)

By the weak law of large numbers for i.d. variables (e.g., White, 2001, p. 35), we have

,

(C.10)

observing that the fourth moments of (and ) are bounded uniformly by Assumption

1. We thus also have .

Moreover, repeatedly using the triangle inequality, it follows that

(C.11)

,

where ; the constant is chosen such

that

and . Note that

by the weak law of large numbers.

Next rewrite

, and

28

Note that , where .

N

i

T

t

NitNitN vcTN 1 1

2

,,~

)1(

1~

N

i

T

t

T

r

NirNitN vT

cNT 1 1 1

2

,,~

1

11~

N

HR

NN

HR

NNT

bΣa~1~

HR

NΣ~

T

r

NirNit

HR

Nit vTT

vT

Tv

1

2

,

2

,

2

,~

)2)(1(

1~

2)~(

)1(~pN

HR

N o

N

HR

N ~ )~(2

1)~(

2

1NNNN

TE

T

T

)~( NN E )~( N )1(po N

NN E ~

NN E ~NNNN E ~

NN E )1()()1(

1

1 1

2

,

2

,, p

N

i

T

t

NitNitNit ovEvcTN

Nitv , Nitv ,

)1(pNN oE

N

i

T

t

NitNitNitNN vEvcTN

E1 1

2

,

2

,, )()1(

1

Nc ,12

*2 cc

N

i

T

t

NitNitNitNitN vEvEvEvTN 1 1

2

,

2

,

2

,

2

,,1 )1(

1*c

*

2

, cEv Nit *

2

, cvE Nit )1(,1 pN o

N

i

T

t

NNNNitNitNNNNTN

vcTNTN 1 1

2

,,~~

)1(

1~

)1(

1~

)1(

1~ vCvbΣa

NiNi E ,,

T

t

T

r

NirNitNi vT

cET 1 1

2

,,,1

11

,

where . Hence,

(C.12a)

,

where , and

(C.12b)

.

By the properties of the matrices

, , and , and in light of Remark

A.1, the expressions in (C.12b) are all quadratic forms in matrices whose row and column

sums are bounded uniformly in absolute value by some constants that depend monotonically

on c as well as on other bounds maintained in the assumptions.

Repeatedly using the triangle inequality, Lemma B.1 in Appendix B, and factoring out the

terms it follows that

, (C.13)

where and does not depend on and and the constant depends

monotonically on c and other bounds maintained in the assumptions. Obviously, it follows

that . Moreover, we have

. (C.14)

It follows from (C.9), (C.11), and (C.13) that

(C.15)

where .

NNNNTN

vCv

)1(

1

)()( ,1,1 Nit

NT

itNn

NT

nN cdiagcdiag C

NNNNTN

vCv ~~

)1(

1~

N

S

m

NNmNmTNNNNTNTN

uCMIuuCu ~)]~([~

)1(

2~~

)1(

1

1

,,

S

m

N

S

m

NmNmTNNmNmTNTN 1 1

,,,,~)]~[)]~([~

)1(

1uMICMIu

NNNN ,0,0 QCQC

NNNNTN

vCv

)1(

1

N

S

m

NNmNmTNNNN NTTN

uCMIuuCu

1

,,

1 )]([)(2)1(

1

S

m

N

S

m

NmNmTNNmNmTNTN 1 1

,,,, )][)]([)1(

1uMICMIu

N,0QNC Nm,M ,,...,1 Sm

)1(po

NNN ck ,2)(~

)1(,2 pN o Na Nb )(ck

)1(~pNN oE

*

1 1

2

,,

1ccvEc

NTE

N

i

T

t

NitNitN

NN ~ ),)]((2[ *

Nckccc

)1( NN

Next consider . By the triangle inequality, and by the

weak law of large numbers

,

(C.16)

observing that the fourth moments of are bounded uniformly by Assumption 2. We thus

also have . Next, rewrite as

(C.17a)

where and . Moreover, repeatedly using the triangle

inequality, it follows that

(C.17b)

,

where the last step uses and is defined

as above. Note that

by the weak law of large numbers.

From Lemma C.1 in Appendix C, it follows that ,

where and . Using the triangle and Hölder inequality, we

have

, where . (C.18)

Obviously, , which – together with (C.16) implies that . It

also holds that .

From (C.17) with (C.18) it follows that

, (C.19)

where . Combining our results that and ,

result (a) in Lemma C.2 follows in light of (C.8), noting that

)~( N NNNN ~~

)1(pNN o

Nitv ,

)1(pNN o NN

NN )(1

1 1

2

,

2

,,

N

i

T

t

NiNiNit vEvcNT

T

r

NirNi vT

v1

2

,

2

,1

1

T

r

NirNi vT

v1

2

,

2

,

1

N

i

T

t

NiNiNitNN vEvcNT 1 1

2

,

2

,, )(1

Nc ,1 cc*2

N

i

T

t

NiNiNiNiN vEvEvEvNT1 1

2

,

2

,

2

,

2

,

1

,1 )(*c

)1(,1 pN o

2

,

2

,,

2

,

2


KENit

4

,)1(2/1

pN ON

NN ~

N

i

N

i

T

t

T

r

NirNit

T

t

T

r

NirNit vT

cvT

cNT 1 1 1 1

2

,,

1 1

2

,,1

1~

1

11

Nck ,2)( )1(,2 pN o

)1(~pNN o )1(~

pN o

N

NN ~

Nckccc )](2[ *

)1( NN )1(~pNN oE )1(~

pN o

. Result (b) in Lemma C.2 follows from (C.15) and (C.19),

which yields

(C.20)

,

where and .

Lemma C.3

Suppose Assumptions 1-4 hold. Furthermore, assume that , and that the row

and column sums of , are uniformly bounded in absolute value by 1 and

some finite constant respectively. Let , and

let with and

, and where the vector can be any estimator that satisfies

.

Let and , where

is an matrix whose elements are uniformly bounded in absolute value by some

constant , and let and be defined as in Lemma C.2. Then,

and .

Proof.

The subsequent proof will focus on the case, where and

; this corresponds to the random effects estimation of the

untransformed model (see Lemma 1); it is readily observed from the proof that this covers

also the case where (fixed effects estimation of untransformed model),

(random effects estimation of transformed model) as well as (fixed effects

estimation of transformed model).

*

1 1

2

,,,

1ccEc

NT

N

i

T

t

NitvNitN

N

HR

N ~ })](2[)](2{[2 **

NN ckcccckccc

Nckccc )](2[2 *

)](),(max[)( ckckck NNN )2( NN

1sup1

,

S

m

NmN

Nm,M Sm ,...,1

N

S

m

NmNmNTNNNN uMIIQεQv )]([1

,,,0,0

N

S

m


1

,,,0,0

NNNN ΔDuu ~

),...,( .,.,1 NNTNN ddD 1S Nρ

~

)1()~( pNN oρρ

N

S

m

NmNmNTN HMIIF ])([ 1

1

,,

N

S

m

NmNmNTN HMIIF ])~([~

1

,,

NH *PN

cNΣ

HR

NΣ~

)1(1~~~1

pNNNN

HR

NN oNTNT

FΣFFΣF )1(1

ONT

NNN FΣF

N

S

m

NmNmNTN HMIIF ])([ 1

1

,,

N

S

m

NmNmNTN HMIIF ])~([~

1

,,

NN FF *

NN FF

*

NN FF

Under the maintained assumptions there exists a with . By the

properties of the matrices the row and column sums of , are

uniformly bounded in absolute value by 1 and some finite constant respectively. For later

reference, also note that the elements of the vector are also uniformly bounded in

absolute value by c.

In the following, we ignore the division by (the fixed constant) T without consequences for

the proof. Denote the (r,s)-th element of the difference as . It

is given by

, , (C.21)

which can be written as , where

(C.22)

.

Next note that and thus

(C.23)

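We next demonstrate that the (r,s)-th element of this difference is $o_p(1)$ by showing that each of the seven summands in (C.22) is $o_p(1)$,

invoking the following theorem (see, e.g., Resnick, 1999, p. 171): Let

$(X, X_N, N \ge 1)$ be real-valued random variables. Then $X_N \xrightarrow{p} X$ if and only if each subsequence $X_{N_a}$

contains a further subsequence that converges almost surely to $X$.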

As we show below we will be confronted with terms of the form:

* 1sup *

1

,

S

m

Nm

Nm,M Nm,*M Sm ,...,1

Ns

k

N

k

,.* hM

NNNN

HR

NNNTNT

FΣFFΣF1~~~1

N

)~~~

( ,.,.,.,.

1

NsNNrNs

HR

NNrN N fΣffΣf *,...,1, Psr

7

1

,

i

NiN

)~

)(~

()~

( ,.,.,.,.

1

,1 NsNsN

HR

NNrNrN N ffΣΣff

NsN

HR

NNrNrN N ,.,.,.

1

,2 )~

()~

( fΣΣff

)~

)(~

( ,.,.,.

1

,3 NsNsN

HR

NNrN N ffΣΣf

NsN

HR

NNrN N ,.,.

1

,4 )~

( fΣΣf

)~

()~

( ,.,.,.,.

1

,5 NsNsNNrNrN N ffΣff

NsNNrNrN N ,.,.,.

1

,6 )~

( fΣff

)~

( ,.,.,.

1

,7 NsNsNNrN N ffΣf

Ns

S

m

NmNmNTNs ,.

1

1

,,,. ])([ hMIIf

Ns

S

m

NmNmN

S

m

NmNmNTNsNs ,.

1

1

,,

1

,,,.,. ]})()~[({~

hMIMIIff

)1(~

,.,. pNsNs o ff )1(, pNi o

7,...,1i 1,,( NXX N

XX pN aNX

aNX X

.(C.24)

where is a matrix, whose row and column sums are uniformly bounded in absolute value

by some constant It follows that the absolute values of the elements of the vector

(and also that of ) are uniformly bounded in absolute value

by some finite constant (and ). (See Remark A.1 in Appendix A.)

Without loss of generality is chosen such that and holds.

Hence, Lemma C.2 applies and it follows that and that there exist random

variables such that .

Now, let the index denote some subsequence. In light of the aforementioned equivalence,

there exists a subsequence of this subsequence ( ) such that for events , with

, it holds that

, , , (C.25)

and that for some , and thus

, (C.26)

and finally

, where . (C.27)

In the following, assume that . Since , it follows from Horn and

Johnson (1985, p. 301) that is invertible and that

(C.28)

.
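The invertibility result cited from Horn and Johnson (1985) is the Neumann series: if a matrix $A$ has a matrix norm strictly below one, then $I - A$ is invertible with $(I - A)^{-1} = \sum_{l=0}^{\infty} A^l$. The following is a small numerical sketch, assuming hypothetical row-normalised weights matrices and illustrative coefficients whose absolute values sum to less than one; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, S = 8, 2
# Hypothetical row-normalised weights matrices with zero diagonal.
Ms = []
for _ in range(S):
    W = rng.uniform(size=(n, n))
    np.fill_diagonal(W, 0.0)
    Ms.append(W / W.sum(axis=1, keepdims=True))
rho = np.array([0.4, 0.3])            # sum of |rho_m| is 0.7 < 1

A = sum(r * M for r, M in zip(rho, Ms))
# The maximum absolute row sum norm of A is bounded by sum |rho_m| < 1.
row_sum_norm = np.abs(A).sum(axis=1).max()

# Partial sums of the Neumann series sum_{l >= 0} A^l.
series = np.zeros((n, n))
term = np.eye(n)
for _ in range(200):
    series += term
    term = term @ A

direct = np.linalg.inv(np.eye(n) - A)
```

The truncated series and the direct inverse agree to numerical precision because the omitted tail is bounded by a geometric remainder in the row sum norm.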

Substituting into the expression for given by (C.22) yields

(C.29)

Ns

k

NTN

l

NTNr

kl

Ns

k

NT

HR

N

l

NTNr

kllk

N pNpN ,.,.*

1

,.,.*

1),( )(()(()(~

)( hMIΣMIhhMIΣMIh

NM

Mc

Ns

k

NT ,.)( hMI Ns

k

NT

k

,.* )( hMI

ccc * **** cccc k

*c*

2

, cEv Nit *

2

, cvE Nit

)1(),(

p

lk

N o

)1(pN o )1)(( *

),(

N

lk

N cK

aN

aN A

0)( CAP

0)(),( lk

Na0)(

aN 0)(~,, aa NmNm ρρ Sm ,...,1

NNa 1)( aN

)(2))(1)(()( **

),( cKcKaa N

lk

N

S

m

Nm pρa

1

**, )(~ 12

sup *

1

,

**

p

p

S

m

NmN

NNa 1)(~

1

,

S

m

Nm a

))(~(1

,,

S

m

NmNmN aaMI

aaaaaaa Ns

S

m

NmNmNT

S

m

NmNmNTNsNs

,.

1

1

,,

1

1

,,,.,. ]})())(~{[()(~

hMIIMIIff

aaaaa Ns

l

lS

m

NmNm

lS

m

NmNmT

,.

1 1

,,

1

,, ])())(~[( hMMI

aN ,1

))(~

)(~

())(~

( ,.,.,.,.

1

,1 aaaaaaa NsNsN

HR

NNrNraN N

ffΣΣff

.

A single element with index (k,l) of this infinite double sum over k and l is given by

.

(C.30)

Next note that for any value of and any there exist matrices and ,

whose row and column sums are uniformly bounded in absolute value, such that:

and . (C.31)

and can thus be factored out of the sum, yielding

. (C.32)

By the same reasoning, for any values of and , there exists a matrix

, whose row and column sums are uniformly bounded in absolute value, such that:

. (C.33)

Substituting into the expression for , we obtain

(C.127)

.

Hence, we can then write

aaaaaaa

aaaaa

Ns

k

kS

m

NmNm

kS

m

NmNmTNN

l

lS

m

NmNm

lS

m

NmNmTNraN

,.

1 1

,,

1

,,

1 1

,,

1

,,,.

1

]})()~[(){~

(

} ])()~[({

hMMIΣΣ

MMIh

1

,.

1

,,

1

,,

1

,,

1

,,

1

,.

1]})()~[({

~]})()~[({

k

Ns

kS

m

NmNm

kS

m

NmNmT

HR

N

lS

m

NmNm

lS

m

NmNmT

l

Nra aaaaaaaaaaaN hMMIΣMMIh

1

,.

1

,,

1

,,

1 1

,,

1

,,,.

1]})()~[({}])()~[({

k

Ns

kS

m

NmNm

kS

m

NmNmTN

l

lS

m

NmNm

lS

m

NmNmTNra aaaaaaaaaaaN hMMIΣMMIh

aaaaaaaaaaa Ns

kS

m

NmNm

kS

m

NmNmT

HR

N

lS

m

NmNm

lS

m

NmNmTNraN

,.

1

,,

1

,,

1

,,

1

,,,.

1]})()~[({

~]})()~[({ hMMIΣMMIh

aaaaaaaaaaa Ns

kS

m

NmNm

kS

m

NmNmTN

lS

m

NmNm

lS

m

NmNmTNraN

,.

1

,,

1

,,

1

,,

1

,,,.

1]})()~[({]})()~[({ hMMIΣMMIh

aN ρ )(~ aN ρ

aN MaN M

S

m

NNm

S

m

NmNm aaaa

1

,

1

,, )( MM

S

m

NNm

S

m

NmNm aaaa

1

,

1

,, )~(~ MM

aN MaN M

l

N

lS

m

Nm

l

N

lS

m

Nm aaaa

MM )())(~(1

,

1

,

))(~(1

,

S

m

Nm a)(

1

,

S

m

Nm a

aN M

aaaaaaa N

lS

m

Nm

lS

m

NmN

lS

m

NmN

lS

m

Nm

MMM ])())(~[()())(~(1

,

1

,

1

,

1

,

aN MaN ,1

aaaaaaaaaa Ns

k

N

kS

m

Nm

kS

m

NmT

HR

N

l

N

lS

m

Nm

lS

m

NmTNraN N

,.

1

,

1

,

1

,

1

,,.

1

,1 }])())(~[({~

}])())(~[({ hMIΣMIh

aaaaaaaaa Ns

k

N

kS

m

Nm

kS

m

NmTN

l

N

lS

m

Nm

lS

m

NmTNraN

,.

1

,

1

,

1

,

1

,,.

1}])())(~[({}])())(~[({ hMIΣMIh

, (C.34)

where with

(C.35)

and

(C.36)

Note that as in light of the aforementioned results and thus

since for large enough. Moreover,

. (C.37)

Hence,

.

For , , such that

. (C.38)

Hence, there exists a dominating function for all values of k,l. Moreover, since

by construction, the dominating function is integrable (summable), i.e.,

. (C.39)

Hence the assumptions for application of Lebesgue’s Dominated Convergence Theorem are

fulfilled (see, e.g., Van der Vaart and Yen, 1968), such that

. (C.40)

The same holds for , . It follows that as and in light of

Resnick (1999) it follows that .

)()(1 1

),(

,1

k l

lk

NN aaX

)()()( ),(),(),( lk

N

lk

N

lk

N aaaaX

k

kS

m

Nm

lS

m

Nm

l

lS

m

Nm

lS

m

Nmlk

N

aaaa

aa

*

1

,

1

,

*

1

,

1

,),(

])()~[(])())(~[(

)(

aaaaaaaaaa Ns

k

NTN

l

NTNr

kl

Ns

k

NT

R

N

l

NTNr

kllk

N pNpN

,.,.*

1

,.,.*

1),( )()())((~

)()( hMIΣMIhhMIΣMIh

0),(

lk

Naa aN

0)(),( lk

NaX )(2)( *

),( cKlk

Na

aN

k

kS

m

Nm

lS

m

Nm

l

lS

m

Nm

lS

m

Nmlk

N

aaaa

aa

*

1

,

1

,

*

1

,

1

,),(

])()~[(])())(~[(

)(

klkl

Naa

*

**

*

**

*

** 422

NN )(2)( *

),( cKlk

Na

klkl

kllk

N cKcKBXa

*

**

*

**),(),( )(84)(2)(

),( klB

1/ ***

1 1 1

),(

1

),(

k k l

kl

l

kl BB

)(lim ,1 aa NN 0

aNi , 7,...,2i 0, aNi aN

)1(pN o

Thus, . That follows from the

properties maintained for the row and column sums of and the elements

of and .

Remark C.1

Regarding , note that and (obviously suppressing the

indexation of ), and accordingly for . By assumption , and

thus , where the dimension of is . Moreover, ,

and thus , where the dimension of is . By Lemma C.3, we

have . It follows that

.

Lemma C.4

Let and be defined as in (A.3a) and (A.3b). Suppose that Assumptions 1-4 hold

and that the elements of have bounded fourth moments. It follows that

, , and thus , and that ,

and thus .

Proof.

The subsequent proof builds on Gilbert (2002), who considers the estimation of third and

fourth moments in homoskedastic error component models without spatial lags of the

dependent variable (or other endogenous variables) and without spatial autoregressive

disturbances.

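Consider the third moment of $\mu_{i,N}$ and its estimate:

$\sigma_\mu^{(3)} = E(\varepsilon_{is,N}\,\varepsilon_{it,N}^2)$ for any given i and $s \neq t$, and (C.41a)

$\tilde{\sigma}_{\mu,N}^{(3)} = \frac{1}{NT(T-1)} \sum_{i=1}^{N} \sum_{s=1}^{T} \sum_{t=1, t \neq s}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N}^2$. (C.41b)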
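The idea behind this estimator is that, for $s \neq t$, the cross-period product $\varepsilon_{is}\varepsilon_{it}^2$ isolates the third moment of the individual effect, because the idiosyncratic components are mean zero and independent over time. The following is a simulation sketch of the infeasible version based on the true disturbances rather than residuals. The distributions, sample sizes, and the function name `third_moment_mu` are illustrative assumptions.

```python
import numpy as np

def third_moment_mu(eps):
    """(NT(T-1))^{-1} * sum_i sum_s sum_{t != s} eps_is * eps_it^2, eps of shape (N, T)."""
    N, T = eps.shape
    # Sum over all (s, t) pairs within each unit, then remove the s == t terms.
    all_pairs = eps.sum(axis=1) * (eps**2).sum(axis=1)
    same_index = (eps**3).sum(axis=1)
    return (all_pairs - same_index).sum() / (N * T * (T - 1))

rng = np.random.default_rng(3)
N, T = 200_000, 4
mu = rng.exponential(1.0, size=N) - 1.0        # mean 0, third moment 2 (assumed design)
scale = rng.uniform(0.5, 1.5, size=(N, T))     # heteroskedastic standard deviations
v = scale * rng.normal(size=(N, T))            # idiosyncratic component, independent over t
eps = mu[:, None] + v                          # error component structure eps_it = mu_i + v_it
estimate = third_moment_mu(eps)
```

In this design the estimate should be close to 2, the third moment of the centred exponential individual effect, despite the heteroskedasticity in the idiosyncratic component.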

By Assumption 1, is invariant to the choice of i, s and t. Using (C.1), we have

(C.42)

)1(~~~ 11

pNNNN

HR

NN oNN FΣFFΣF )1(1 ON NNN

FΣF

S

m

NmmN

1

1

, )( MI

NΣ NH

**

,

~NvE NNNN αPFa vv ,, NNNN αPFa vv

~~~~,,

Nα~

N,vb )1(~

pNN oPP )1(ON P

)1(~

pN OPNP PP *

)1(~pNN oαα

)1(ON α )1(~pN Oα

Nα 1P

)1(~~~ 11

pN

HR

NNN

HR

NN oNN FΣFFΣF

)1(~~~~~~~~ 11**

,

**

, pNNNNNNNNNN

HR

NNNNNN oNN αPFΣFPααPFΣFPαvv EE

)3(

,~

N)4(

,~

N

),...,( .,.,1 NNNN ddD

)1(~ )3()3(

, pN o )1()3( O )1(~ )3(

, pN O )1(~ )4()4(

, pN o

)1()4( O )1(~ )4(

, pN O

Ni,

)( 2

,,

)3(

NitNisE ts

N

i

Nit

T

s

T

stt

NisNTNT 1

2

,

1 1

,

)3(

,~~

)1(

1~

)3(

N

i

NitNit

T

s

T

stt

NisNisNTNT 1

2

,,

1 1

,,

)3(

, )()()1(

1~

.

Consider

(C.43a)

.

By the weak law of large numbers converges in probability to . Notice further that,

by the properties of and (see Assumption 1), , and are all

. As a consequence, converges in probability to .

Next observe that

(C.43b)

,

(C.43c)

,

(C.43d)

,

N

i

T

s

T

stt

NitNisNitNisNitNisNisNitNisNitNitNisTNT 1 1 1

2

,,,,

2

,,,

2

,,,

2

,, )22()1(

1

NNNNNN ,6,5,4,3,2,1

N

i

T

s

T

stt

NitNitNisNis

N

i

T

s

T

st

NitNisN vv1 1 1

2

,,,,

1

2

,,,1 ))((

N

i

T

s

T

stt

NisNitNisNitNiNiNisNitNiNitNiNi vvvvvvv1 1 1

,

2

,,,,

2

,,

2

,,,

2

,

3

, )22(

NNNNNN ,16,15,14,13,12,11

N,11 )3(

Nitv , Nit , NNNN ,15,14,13,12 ,,, N,16

)1(po N,1)3(

N

i

T

s

T

stt

NisNitNTNT 1 1 1

,,,2 2)1(

1

N

i

T

s

T

stt

NitNisNTNT 1 1 1

,,)1(

2

)1()]1([)]1([2

2/1

1 1 1

2

,

1

2/1

1 1 1

2

,

12/12/1

p

N

i

T

s

T

stt

Nit

N

i

T

s

T

stt

NisN oTNTTNTNN

N

i

T

s

T

stt

NisNitNTNT 1 1 1

,

2

,,3)1(

1

)1()]1([)]1([)(

2/1

1 1 1

4

,

1

2/1

1 1 1

2

,

1122/1

p

N

i

T

s

T

stt

Nit

N

i

T

s

T

stt

NisN oTNTTNTNN

N

i

T

s

T

stt

NitNisNTNT 1 1 1

2

,,,4)1(

1

)1()]1([)]1([

2/1

1

2

,

1

2/1

1

4

,

12/12/1

p

N

i

T

s

T

st

Nit

N

i

T

s

T

st

NisN oTNTTNTNN

(C.43e)

,

(C.43f)

,

because is $O_p(1)$ and the bracketed expressions are all $O_p(1)$, since

and for and all N. It follows that

, by Assumption 1, and that . Obviously, we then

also have that .

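Consider next the fourth moment of $\mu_{i,N}$ and its estimate:

$\sigma_\mu^{(4)} = E(\varepsilon_{is}\varepsilon_{it}^3) - 3E(\varepsilon_{is}\varepsilon_{it})[E(\varepsilon_{it}^2) - E(\varepsilon_{is}\varepsilon_{it})]$ for any given i and $s \neq t$, (C.44a)

$\tilde{\sigma}_{\mu,N}^{(4)} = \frac{1}{NT(T-1)} \sum_{i=1}^{N} \sum_{s=1}^{T} \sum_{t=1, t \neq s}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N}^3 - \frac{3}{NT(T-1)} \sum_{i=1}^{N} \sum_{s=1}^{T} \sum_{t=1, t \neq s}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N} \left( \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{\varepsilon}_{it,N}^2 - \frac{1}{NT(T-1)} \sum_{i=1}^{N} \sum_{s=1}^{T} \sum_{t=1, t \neq s}^{T} \tilde{\varepsilon}_{is,N}\,\tilde{\varepsilon}_{it,N} \right)$. (C.44b)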
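The fourth-moment condition labelled (C.44a) can be checked by direct expansion. The following sketch assumes the error component structure $\varepsilon_{it} = \mu_i + v_{it}$ with $\mu_i$ and $v_{it}$ mutually independent, mean zero, and $v_{it}$ independent over $t$. The shorthand $\sigma_\mu^2 = E\mu_i^2$ and $\sigma_{v,it}^2 = E v_{it}^2$ is introduced here for illustration. For $s \neq t$:

```latex
\begin{align*}
E(\varepsilon_{is}\varepsilon_{it}^3)
  &= E[(\mu_i + v_{is})(\mu_i + v_{it})^3]
   = E\mu_i^4 + 3\,\sigma_\mu^2\,\sigma_{v,it}^2 , \\
E(\varepsilon_{is}\varepsilon_{it}) &= \sigma_\mu^2 , \qquad
E(\varepsilon_{it}^2) - E(\varepsilon_{is}\varepsilon_{it}) = \sigma_{v,it}^2 , \\
E(\varepsilon_{is}\varepsilon_{it}^3)
  - 3\,E(\varepsilon_{is}\varepsilon_{it})
    \,[E(\varepsilon_{it}^2) - E(\varepsilon_{is}\varepsilon_{it})]
  &= E\mu_i^4 .
\end{align*}
```

Every term in the expansion that contains a first power of $\mu_i$, $v_{is}$, or $v_{it}$ alone has zero expectation, so only $E\mu_i^4$ and $3\sigma_\mu^2\sigma_{v,it}^2$ survive in the first line. The heteroskedastic variance then cancels, which is why the expression does not depend on $i$, $s$, or $t$.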

Observe that

(C.45)

N

i

T

s

T

st

NitNisNTNT 1

,,,5 2)1(

1

N

i

T

s

T

stt

NitNisN NNTNT 1 1 1

,,

122/1 )()1(

2

)1()]1([)]1([)(2

2/1

1 1 1

2

,

1

2/1

1 1 1

2

,

1122/1

p

N

i

T

s

T

stt

Nit

N

i

T

s

T

stt

NisN oTNTTNTNN

N

i

T

s

T

stt

NitNisNTNT 1 1 1

2

,,,6)1(

1

)1()]1([)]1([)(2

2/1

1 1 1

4

,

1

2/1

1 1 1

2

,

12/332/1

p

N

i

T

s

T

stt

Nit

N

i

T

s

T

stt

NisN oTNTTNTNN

NN 2/1 )1(pO )1(pO

KE Nis

, KE Nit

,4

)1(~ )3()3(

, pN o )1()3( O )1(~ )3(

, pN O

)1(~)~~( )3()3(

,

)3()3(

,

)3(

, pvNvvNN o

Ni,

)]()()[(3)( 234

itisititisitis EEEE ts

N

i

T

s

T

stt

NitNisNTNT 1 1 1

3

,,

)4(

,~~

)1(

1~

)~~

)1(

1~1(~~

)1(

3

1 1 1

,,

1 1

2

,

1 1 1

,,

N

i

T

s

T

stt

NitNis

N

i

T

t

Nit

N

i

T

s

T

stt

NitNisTNTNTTNT

)~( ,2

2

,,2,1 NNNN

N

i

T

s

T

stt

NitNitNisNisNTNT 1 1 1

3

,,,,,1 ))(()1(

1

N

i

T

s

T

stt

NitNisNitNisNitNitNitNisNitNisTNT 1 1 1

3

,,,,

2

,,

2

,,

3

,, 33()1(

1

)33 ,

3

,,,

2

,,,

2

,

3

,, NisNitNitNisNitNisNitNitNitNis

.

The first term can also be written as

(C.46)

.

By the properties of and (see Assumption 1), the difference between and

converges in probability to zero by the weak law of large numbers

for i.d. random variables (White, 2001, p. 37, Corollary 3.9).

Moreover, it follows from the properties of and (see Assumption 1), that the terms

are all . It follows that the difference between

and converges in probability to zero.

Next consider

(C.47)

,

which converges to by the weak law of large numbers, since

for by the properties of and and the sum

over the remainder terms appearing in are by arguments analogous to those for

and (see (C.43e) and (C.43f)). Finally, the difference between

and converges in probability to zero. As a

consequence, , by Assumption 1, and .

NNNNNNNN ,18,17,16,15,14,13,12,11

N,11

N,11 3

,,,,

3

,, ))(( NitNitNisNisNitNis vv

)33)(( 3

,

2

,,,

2

,

3

,,, NitNitNitNitNitNitNisNis vvvv

3

,,

2

,,,,

2

,,

3

,, 33( NitNisNitNitNisNitNitNisNitNis vvv

)33 3

,,

2

,,,,,

2

,

3

,, NitNisNitNisNitNitNisNitNitNis vvvvvvv

)3333( 3

,,

2

,,,,,

2

,

3

,,

3

,,

2

,

2

,,

3

,

4

, NitNisNitNisNitNitNisNitNitNisNitNisNitNiNitNiNi vvvvvvvvvv

Nitv , Nit , N,11

N

i

T

t

NitvTN

E1 1

2

,

2)4( 13

Nv Nμ

NNNNNNN ,18,17,16,15,14,13,12 ,,,,,, )1(po N,1

N

i

T

t

NitvTN 1 1

2

,,

2)4( 13

N

i

T

s

T

st

NitNitNitNisNTNT 1 1

,,,,,2 ))(()1(

3

N

i

T

s

T

st

NitNitNisNisTNT 1 1

,,,, ))(()1(

3

N

i

T

s

T

st

NitNisNitNisNitNisNitNisTNT 1 1

,,,,,,,, )()1(

3

23

2

1

,, ])1(

11[

T

s

T

st

NitNisTT

E ts Nitv , Nit ,

N,2 )1(po

N,2 N,5 2

,~

N

N

i

T

t

NitNT 1 1

2

,~1

N

i

T

t

NitvTN

E1 1

2

,,

2 11

)1(~ )4()4(

, pN o )1()4( O )1(~ )4(

, pN O

Lemma C.5

Suppose Assumptions 1-4 hold. Let with ,

with real, nonstochastic, and symmetric matrices, whose elements

are time-invariant ( ), whose diagonal elements are zero ( for

), and whose row and column sums are bounded uniformly in absolute value. Let

with and


. Finally, define

.

Then, we have , , and .

Proof.

Note that

, where (C.48a)

,

since for and . The corresponding expression based on the fixed effects

residuals is given by

where . (C.48b)

Since $v_{it,N}$ and $v_{js,N}$ are independent for all $t, s$,

(C.49)

,

which suggests the following bias-corrected estimator:

, where (C.50)

2

,

2

, )(1

NNNNNT

vv σAσ ),...,( 2

,,

2

,1,

2 NNTvNvN σ

)( ,, NjsitN aA NTNT

NjiNjsit aa ,,,, 0,, Njia 0,, Njia

N

S

m


1

,,,0,0

NNNN ΔDuu ~

),...,( .,.,1 NNTNN ddD 1S Nρ

~

)1()~( pNN oρρ

N

i

N

j

T

t

T

s

NjsNitNji

HR

N vvT

aNT

T

1 1 1 1

2

,

2

,,,2

2

~~11

)1(

~

)1(~

pN

HR

N o )1(ON )1(~

p

HR

N O

N

i

N

j

NijN EN 1 1

,

1

T

t

T

s

jsvitvNji

T

t

T

s

NjsNitNjiNijT

avvT

Ea1 1

2

,

2

,,,

1 1

2

,

2

,,,,

11

0,, Njia ji st

N

i

N

j

NijNN 1 1

,

1

T

t

T

s

NjsNitNjiNij vvT

a1 1

2

,

2

,,,,

1

Nitv , Njsv , st,

NijE ,

T

t

T

s

NjsNitNji vEvET

a1 1

2

,

2

,,,

1

T

t

T

s

NjsNitNji vEvET

T

Ta

1 1

2

,

2

,2

2

,,

)1(1

NijT

T,2

2)1(

N

HR

NT

T

~

)1(

~2

2

with .
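The scaling behind this correction can be checked numerically. The sketch below is not part of the original derivation: it simulates independent, heteroskedastic disturbances, applies the within transformation, and compares the uncorrected moment (built from squared demeaned disturbances) with the same moment rescaled by $T^2/(T-1)^2$. The ring-shaped weights, the normal errors, the variance range, and all function and variable names are assumptions made for illustration only.

```python
import random


def simulate(n_units=5000, n_periods=3, seed=0):
    """Compare the uncorrected and the rescaled moment with the true value.

    Assumed setup (illustrative only): units sit on a ring and each unit is
    linked to its two neighbours with weight 1/2, so the weights are
    symmetric, time-invariant, zero on the diagonal, and have bounded row
    and column sums.
    """
    rng = random.Random(seed)
    N, T = n_units, n_periods

    # Variances differ across units and over time.
    sigma2 = [[rng.uniform(0.5, 1.5) for _ in range(T)] for _ in range(N)]

    # Per unit: sum over time of squared within-transformed disturbances.
    s_tilde = []
    for i in range(N):
        v = [rng.gauss(0.0, sigma2[i][t] ** 0.5) for t in range(T)]
        mean_v = sum(v) / T
        s_tilde.append(sum((x - mean_v) ** 2 for x in v))

    # Per unit: sum over time of the true variances.
    s_true = [sum(row) for row in sigma2]

    def moment(s):
        # (NT)^{-1} sum_i sum_j a_ij s_i s_j with a_ij = 1/2 for ring neighbours.
        total = 0.0
        for i in range(N):
            total += 0.5 * s[i] * (s[i - 1] + s[(i + 1) % N])
        return total / (N * T)

    true_value = moment(s_true)
    naive = moment(s_tilde)
    corrected = naive * T ** 2 / (T - 1) ** 2
    return true_value, naive, corrected


true_value, naive, corrected = simulate()
print(f"true {true_value:.3f}  naive {naive:.3f}  corrected {corrected:.3f}")
```

For independent units, the expected sum over time of squared demeaned disturbances is $(T-1)/T$ times the sum of the variances. With $T = 3$ the uncorrected moment therefore targets only $(2/3)^2 = 4/9$ of the true value, and the rescaling undoes exactly this factor.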

To show that , we next demonstrate that and

.

Consider

(C.51)

,

using and . Note that and that

for . Next, define the vector . By Assumption 1,

and the row and column sums of the variance-covariance matrix

are bounded uniformly in absolute value. Next rewrite

, and note that

(C.52)

,

such that we have by Chebychev’s inequality.

Next note that and consider

(C.53)

,

with

,

,

.

N

i

N

j

NijNN 1 1

,

~1~

T

t

T

s

NjsNitNjiNij vvT

a1 1

2

,

2

,,,,~~1~

)1(~

pN

HR

N o )1()1( 2

2

pNN oT

T

)1()1(

~2

2

pNHR

N oT

T

N

N

i

N

j

T

t

T

s

NjsNitNji vvaNT 1 1 1 1

2

,

2

,,,

1

22)(

1NNN

NTvAv

)()(2

,

2

,

2

NitNnN vv v )()( ,,,, NjsitNnnN aa A NjiNjsit aa ,,,,

0,, Njsita jsit 1NT )( 2

,,

2

, NitvNitN v ζ

KvE NitNitv )(2

,

2

,,

)( NNN E ζζΞ

NNNNNNNNT

ENT

ζAσζAζ v )(2

)(1 2

5.02

22]})[(

)(

4])[(

)(

1{)( NNNNNN Var

NTVar

NTVar ζAσζAζ v

)1(])()()(

4)(

)(

2[ 5.022

22o

NTTr

NTNNNN vNvNN σAΞAσΞAΞA

)1()1( 2

2

pNN oET

T

NNNHR

NT

T

T

T

~

)1()1(

~2

2

2

2

N

i

N

j

T

t

T

s

NjsNitNjsNitNjiNN vvvvaNT 1 1 1 1

2

,

2

,

2

,

2

,,, )~~(1~

321

N

i

N

j

T

t

T

s

NjsNitNitNji vvvaTN 1 1 1 1

2

,

2

,

2

,,,1 )~(11

N

i

N

j

T

t

T

s

NjsNjsNji vvaTN 1 1 1 1

2

,

2

,,,2 )~(11

N

i

N

j

T

t

T

s

NjsNjsNitNitNji vvvvaTN 1 1 1 1

2

,

2

,

2

,

2

,,,3 )~)(~(11

By Lemma C.1 we have . In light of the maintained

assumptions regarding the properties of and , it follows that , ,

, and thus .

Summing up . Finally, , such that

, which completes the proof.

Lemma C.6a

Suppose Assumptions 1-4 hold; in addition, assume that for all and

. Let , where the nonstochastic, time-invariant

scalars are bounded uniformly in absolute value. Let

with and


. Finally, define

, where

,

,

, ,

, .


Proof.

Consider

with . (C.54a)

The corresponding expression based on the fixed effects residuals is given by

2

,

2

,,

2

,

2


2

,Nitv NA )1(1 po )1(2 po

)1(3 po )1(~

pNN o

)1(~

pN

HR

N o )1(1

1 1 1 1

2

,

2

,,, OaNT

N

i

N

j

T

t

T

s

jsvitvNjsitN

)1(~

p

HR

N O

8

,NitEv Tt 1

1,1 NNi

N

i

T

t

NitNiN vcENT 1 1

4

,,

1

NiNit cc ,,

N

S

m


1

,,,0,0

NNNN ΔDuu ~

),...,( .,.,1 NNTNN ddD 1S Nρ

~

)1()~( pNN oρρ

NN

HR

Nmk

mk

mk

ka~

1

~

1

~

11

01

11

0

N

i

T

t

NitNiN vcNT 1 1

4

,,~1~

N

i

T

t

T

trr

NirNitNiN vT

vT

cN 1 1 1

2

,

2

,,~

1

1~11~a

932 23

3

0

TTT

Tm

932

)32(231

TTT

Tm

364

)1(23

2

0

TTT

TTk

364

)96)(1(231

TTT

TTk

)1(~pN

HR

N o )1(ON )1(~p

HR

N O

N

i

NiN EN 1

,

1

T

t

NitNiNi vcT 1

4

,,,

1

with . (C.54b)

Substituting for $\tilde{v}_{it,N}^4$, simplifying (exploiting the independence of $v_{it,N}$ and $v_{js,N}$ for $i \neq j$ or

$t \neq s$), and collecting terms, we obtain – for each $i$ – that

, where (C.55)

,

and .

Since the correction term is also based on original rather than demeaned residuals,

another bias correction for is required. Analogous derivations yield the result that

, with (C.56)

,

and .

Substituting (C.56) into (C.55), averaging over $i = 1,\dots,N$, and solving for yields the

following bias-corrected estimator for :

, where (C.57)

and .
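The algebra behind the two bias corrections and their combination can be verified exactly for a single cross-sectional unit. The sketch below is an illustration under stated assumptions, not the authors' code: `phi` stands for the average fourth moment, `a` for the average cross-product of variances, the demeaned fourth-moment average is normalised by $1/(T-1)$, the constants are taken as positive and entered with minus signs, and the two-point distribution and its scale factors are hypothetical. Expectations are computed by full enumeration with rational arithmetic, so the identities hold without simulation error.

```python
from fractions import Fraction as F
from itertools import product

T = 4  # in this sketch 1 - k1*m1 equals zero for T = 2 and T = 3

# Hypothetical zero-mean, skewed two-point law: -1 w.p. 2/3 and 2 w.p. 1/3,
# rescaled period by period so that the variance changes over time.
BASE = [(F(-1), F(2, 3)), (F(2), F(1, 3))]
SCALES = [F(1), F(2), F(1, 2), F(3)]


def expect(g):
    """Exact E[g(v_1, ..., v_T)] by enumerating all outcomes."""
    total = F(0)
    for outcome in product(BASE, repeat=T):
        prob = F(1)
        v = []
        for (value, p), scale in zip(outcome, SCALES):
            prob *= p
            v.append(scale * value)
        total += prob * g(v)
    return total


def demean(v):
    mean_v = sum(v) / T
    return [x - mean_v for x in v]


def cross(v):
    """(1/T) sum_t v_t^2 * (1/(T-1)) sum_{r != t} v_r^2."""
    sq = [x * x for x in v]
    tot = sum(sq)
    return sum(s * (tot - s) for s in sq) / (T * (T - 1))


# Population quantities based on the original disturbances.
phi = expect(lambda v: sum(x ** 4 for x in v) / T)
a = expect(cross)

# Expectations of the same statistics based on demeaned disturbances.
e_phi_tilde = expect(lambda v: sum(x ** 4 for x in demean(v)) / (T - 1))
e_a_tilde = expect(lambda v: cross(demean(v)))

d_k = T ** 3 - 4 * T ** 2 + 6 * T - 3
d_m = T ** 3 - 2 * T ** 2 - 3 * T + 9
k0, k1 = F(T ** 2 * (T - 1), d_k), F((T - 1) * (6 * T - 9), d_k)
m0, m1 = F(T ** 3, d_m), F(2 * T - 3, d_m)
g0 = k0 / (1 - k1 * m1)
g1 = k1 * m0 / (1 - k1 * m1)

print(phi == k0 * e_phi_tilde - k1 * a)        # first correction
print(a == m0 * e_a_tilde - m1 * phi)          # second correction
print(phi == g0 * e_phi_tilde - g1 * e_a_tilde)  # combined correction
```

Under these assumptions the combined coefficients are $g_0 = k_0/(1 - k_1 m_1)$ and $g_1 = k_1 m_0/(1 - k_1 m_1)$. For $T = 2$ and $T = 3$ the two demeaned statistics are proportional to each other, so in this sketch the system cannot be solved and $T = 4$ is used.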

We next show that , considering each summand in (C.57). By the weak law

of large numbers,

, (C.58a)

given that , since the 8-th moments of (and thus also those of ) are

finite. Using the triangle inequality and the results in Lemma C.1, we have

(C.58b)

,

N

i

NiN EN 1

,

1

T

t

NitNiNi vcT 1

4

,,,1

1

4

,NitvNitv , Njsv ,

ji

ts i

NiNiNi kEk ,1,0, a

T

t

T

trr

NirNitNiNi vT

vT

Ec1 1

2

,

2

,,,1

11a

0k364

)1(23

2

TTT

TT1k

364

)96)(1(23

TTT

TT

Ni,a

NiNiNi mEm ,1,0, aa

T

t

T

trr

NirNitNiNi vT

vT

Ec1 1

2

,

2

,,,1

11a

932 23

3

0

TTT

Tm

932

)32(231

TTT

Tm

Ni ,...,1N

N

NN

HR

N gg a~~~10

11

00

1 mk

kg

11

011

1 mk

mkg

)1(~pN

HR

N o

)1(pNN oE

KE N

2 Nitv , Nitv ,

N

i

NiNiNNN 1

,,,~1~

N

i

T

tNitNitNitNitNitNitNitNi vvvcE

NT 1 1

4

,

2

,,

2

,

2

,,

3

,, 4641

N

i

T

tNitNNitNitNNitNitNNitNitNNi vvvc

NT 1 1

4

,

43

,,

32

,

2

,

2

,

3

,, )464(1

N

i k

itkNicNT 1

4

1

,,

1

with , , , and

. It is readily verified that for under

the maintained assumptions. As an example, consider the case . Using for

some , the triangle inequality, and Hölder’s inequality, we have

(C.58c)

,

since , , , and .

It follows that and thus .

Next consider . Again, under the maintained assumptions,

, (C.59a)

and thus by the weak law of large numbers.

Using the triangle inequality and the results in Lemma C.1, we have

(C.59b)

,

with , , ,

, , ,

, .

Consider . Substituting for , using the triangle inequality and the

generalized Hölder inequality, we obtain – for each of the terms with

(C.59c)

T

tNitNitNit v

1,

3

,,1 4

T

tNitNitNit v

1

2

,

2

,

2

,2 6

T

tNitNitNit v

1

3

,,

3

,3 4

T

rNitNit

1

4

,

4

,4

N

i

T

t

pitkNi ocNT 1 1

,, )1(1

4,...,1k

1k Kc Ni,

K

N

i

T

t

itNicNT 1 1

,1,2

1

N

i

T

tNitNNitv

NT

K

1 1,

3

,

4

)1(11

)(4

4/1

1 1

4

,

4/3

1 1

4

,

2/12/1

p

N

j

T

sNjs

N

i

T

t

NitN oNT

vNT

NKN

)1(2/1

pN ON )1(1

1 1

4

, p

N

i

T

t

Nit OvNT

)1(1

1 1

4

, p

N

j

T

sNjs

ONT

)1(2/1 oN

)1(~pNN o )1(~

pNN oE

NN Eaa ~

)1(pNN oE aa

)1(pNN oE aa

N

i

NiNiNNN 1

,,~1~ aaaa

N

i

T

t k

itkNicTNT 1 1

8

1

,,)1(

1

T

trr

NirNitNitit vv1

2

,,,,1 2

T

trr

NirNitit v1

2

,

2

,,2

T

trr

NirNirNitit vvT 1

,,

2

,,3 21

1

T

trr

NirNirNitNitit vvT 1

,,,,,41

2

T

trr

NirNirNitit vT 1

,,

2

,,5 21

1

T

trr

NirNitit vT 1

2

,

2

,,61

1

T

trr

NirNitNitit vT 1

2

,,,,71

2

T

trr

NirNititT 1

2

,

2

,,81

1

it,1 NitNNit ,,

)1( T tr

N

i

T

t

itNicTNT 1 1

,1,)1(

1

N

i

T

t

NirNitNitN vvTTN

K

1 1

2

,,,)1(

112

since with , , ,

, , and .

By analogous arguments, the other terms involving to can be shown to be

under the maintained assumptions. It follows that , and thus

.

This completes the proof, recognizing that under the maintained assumptions.

Lemma C.6b

Suppose Assumptions 1-4 hold; assume further that $v_{it,N} \sim \text{i.d.}(0, \sigma_{v,i}^2)$, i.e., there is

cross-sectional heteroskedasticity only in $v_{it,N}$ (but no heteroskedasticity over time). Let

and define

,

where as well as and are as in Lemma C.6a.


Proof.

Notice first that

. (C.60a)

Under the maintained assumptions, this can be written equivalently in the following

(estimable) expression:

, (C.60b)

where .

Next, observe that is equal to as defined in the proof of Lemma C.6a. Substituting

(C.55) into (C.56), solving for , and averaging over $i = 1,\dots,N$, the bias-corrected estimator

),1(111

)()1(

24/1

1

4

,

4/1

1 1

4

,

2/1

1 1

2

,

2/12/1

p

N

i

Nir

N

i

T

sNis

N

i

T

t

NitN ovNNT

vNT

NNT

K

Kc Ni , )1(OK )1(2/1

pN ON )1(1

2/1

1 1

2

, p

N

i

T

t

Nit OvNT

)1(1

4/1

1 1

4

, p

N

j

T

sNjs

ONT

)1(1

4/1

1 1

4

, p

N

i

T

t

Nit OvNT

)1(2/1 oN

it,2 it,8 )1(po

)1(~,1,1 pNN

oaa

)1(~pNN oE aa

)1(ON

),0.(.~ 2

,, ivNit div

Nitv ,

N

i

T

t

iNiN cNT 1 1

4

,

1

NN

HR

Nkm

km

km

m ~

1

~

1

~

11

01

11

0

a

1010 ,,, kkmm Na~

N~

)1(~pN

HR

N o )1(ON )1(~p

HR

N O

N

i

T

t

NitNiN EvcNT 1 1

22

,, )(1

N

i

iNicN 1

4

,

1

N

i

NiNN 1

,

1

))1(

(1 1

2

,

2

,,

T

t

T

tss

NisNiti

Ni vvTT

cE

N Na

Na Ni ,...,1

given in Lemma C.6b is obtained. That and was

already shown in the proof of Lemma C.6a.

)1(~pNN oE )1(~

pNN oE aa

Remark C.2

If $v_{it,N}$ is in fact heteroskedastic over both cross-sections and time, the error made by the

approximation in Lemma C.6b is given by

.

Hence, can be assumed to be small for small T and when heteroskedasticity is mainly of

the cross-section type (or random over time).

Lemma C.7

Suppose the assumptions of Lemma C.6a hold. Let , where the

nonstochastic scalars are bounded uniformly in absolute value. Define

, where

,

,

and .


Proof.

Consider

with . (C.61a)

The corresponding expression based on the fixed effects residuals is given by

with . (C.61b)

Substituting for $\tilde{v}_{it,N}^3$, simplifying (exploiting the independence of $v_{it,N}$ and $v_{js,N}$ for $i \neq j$ or

$t \neq s$), and rearranging terms, we obtain that – for each $i$ –

, where (C.62)

and .
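The source of the leading coefficient can be seen from the third moment of a demeaned disturbance. The derivation below is a sketch under assumed notation that is not taken from the text: $\mu_{3,it} = E v_{it,N}^3$, the demeaned disturbance is $\tilde{v}_{it,N} = v_{it,N} - T^{-1}\sum_{r} v_{ir,N}$, and the demeaned third-moment average is normalised by $1/(T-1)$. It uses only that the disturbances have mean zero and are independent over time.

```latex
\begin{align*}
\tilde{v}_{it,N}
  &= \frac{T-1}{T}\, v_{it,N} - \frac{1}{T} \sum_{r \neq t} v_{ir,N}, \\
E\,\tilde{v}_{it,N}^{3}
  &= \frac{(T-1)^{3}}{T^{3}}\, \mu_{3,it}
     - \frac{1}{T^{3}} \sum_{r \neq t} \mu_{3,ir}
   = \frac{T^{2}-3T+3}{T^{2}}\, \mu_{3,it}
     - \frac{1}{T^{3}} \sum_{r=1}^{T} \mu_{3,ir}, \\
\frac{1}{T-1} \sum_{t=1}^{T} c_{it,N}\, E\,\tilde{v}_{it,N}^{3}
  &= \frac{T^{2}-3T+3}{T(T-1)} \cdot \frac{1}{T} \sum_{t=1}^{T} c_{it,N}\, \mu_{3,it}
     - \frac{1}{T^{3}(T-1)} \sum_{t=1}^{T} c_{it,N} \sum_{r=1}^{T} \mu_{3,ir}.
\end{align*}
```

The cross terms vanish because $E v_{it,N} = 0$, and the second equality uses $(T-1)^3 + 1 = T(T^2 - 3T + 3)$. Inverting the leading coefficient gives $T(T-1)/(T^2-3T+3)$, which is consistent with the constant $f_0$; the remaining sum over all periods is the part that the additional correction term has to remove.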

Nitv ,

N

i

T

t

T

tss

NitNitNi

N

i

T

t

T

tss

NisNitNiTT

cTT

cN 1 1 1

2

,

2

,,

1 1 1

2

,

2

,,)1(

1

)1(

11

N

i

Nit

T

t

T

tss

NisNitNiTT

cN 1

2

,

1 1

2

,

2

,, )()1(

11

N

i

T

t

NitNitN vcENT 1 1

3

,,

1

Nitc ,

NN

HR

N ff b~~~

10

N

i

T

t

NitNitN vcETN 1 1

3

,,~

)1(

1~

N

i

T

t

T

r

NirNitN vT

cTN 1 1 1

3

,,~

1

111~b

33

)1(20

TT

TTf

2

2

21)1()33(

1

T

T

TTf

)1(~

pN

HR

N o )1(ON )1(~

p

HR

N O

N

i

NiN EN 1

,

1

T

t

NitNitNi vcT 1

3

,,,

1

N

i

NiN EN 1

,

1

T

t

NitNitNi vcT 1

3

,,,1

1

3

,NitvNitv , Njsv ,

ji

ts i

NiNiNi fEf ,1,0, b

0f33

)1(2

TT

TT1f

33

12

TT

Since the correction term is also based on original rather than demeaned residuals,

another bias correction is required as well. Analogous derivations yield the result that

, (C.63)

such that

, where

,

.

Averaging over $i = 1,\dots,N$, we obtain the following bias-corrected estimator for :

, where (C.64)

, and

.

The proof that is very similar to that in Lemma C.6a and is thus omitted for

the sake of brevity. Finally, suppose that can be written as a quadratic form

with and ; then

, and with

.

Remark C.3

Note that and . Accounting for the

definition of , can be written as the sum of the two expressions and

, where ( ) is an vector made up of the main diagonal elements

of the matrix ( ). Next, observe that and

(obviously suppressing the indexation of ). By assumption ,

and thus , where the dimension of is . Moreover, ,

and thus , where the dimension of is . By arguments

analogous to those in Lemma C.3, we have . It

Ni ,b

NiNi ET

T,2

2

,)1(

bb

NiNiNi EfEf ,1,0, b

33

)1(12

0

0

TT

TT

ff

2

2

22

2

11)1()33(

1

)1(

T

T

TTT

Tff

3032 )1()1(33

)1(

T

Tf

T

T

TT

TT

Ni ,...,1 N

NN

HR

N ff b~~~

10

N

i

Ni

N

i

T

t

NitNitN EN

vcETN 1

,

1 1

3

,,

~1~

)1(

1~

N

i

T

t

T

r

NirNitN vT

cTN 1 1 1

3

,,~

1

111~b

)1(~pN

HR

N o

N

NNNNNT

bΣa)3(1

)()( 3

,1

3

,1

)3(

Nit

NT

itNn

NT

nN vEdiagvEdiag Σ NitNitNit bac ,,,

N

HR

NN

HR

NTN

bΣa),3(~

)1(

1~

])~[(

~ 3

,1

),3( HR

Nn

NT

n

HR

N vdiag Σ

]~

)1(

1~[)~(1

3

,3

3

,0

3

,

T

r

NirNit

HR

Nit vT

vfv

NT

n

NnvNnN dN1

)3(

,,,,

1****

, vvE

NT

n

NnvNnN dN1

)3(

,,,,

1****

,

~~vvE

Nnd ,,v

****

,NvEN

NN,

)3(

,vBv κΣa

NNN

,

)3(

,vA

bΣκv

N,vB

κN,vA

κ 1NT

N,vB N,vA NNNN αPFa vv ,, NNNN αPFa vv

~~~~,,

Nα~ )1(

~pNN oPP )1(ON P

)1(~

pN OPNP PP *

)1(~pNN oαα

)1(ON α )1(~pN Oα

Nα 1P

)1(~~

,,

),3(1),3(1

p

HR

NN

HR

NN oNNNN

vv BBκΣFκΣF

follows that . By the same

reasoning, , from which it follows that

.

)1(~~~~

,,

)3(

,,

1),3(1

pNNNNrp

HR

NNNN oNNNN

vv BBκΣFPακΣFPα

)1(~~

,

)3(

,

)3(

,,pNNNN o

NN

vAvA

bΣκbΣκvv

)1(~ ****

,

****

, pNN o vv EE
