Spatial Correlation Robust Inference

Post on 04-Dec-2021

Ulrich K. Müller and Mark W. Watson

Princeton University

Presentation at University of Virginia

Motivation

• More and more empirical work using spatial data in development, trade, macro, etc.

• How to appropriately correct standard errors?

— Conley (1999): Spatial analog of HAC standard errors

— Don’t work well under moderate or high spatial dependence

— Many applications characterized by strong spatial correlations (Kelly (2019, 2020))

Canonical Problem

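Observe

$$y_l = \mu + u_l, \quad l = 1, \ldots, n$$

• $y_l$ (and $u_l$) associated with observed location $s_l \in \mathcal{S} \subset \mathbb{R}^d$

• $s_l$ drawn i.i.d. with density $g$

• With $\mathbf{s} = (s_1, \ldots, s_n)$: $E[u_l \mid \mathbf{s}] = 0$, $E[u_l u_\ell \mid \mathbf{s}] = \sigma_u(s_l - s_\ell)$, so covariance stationary

• How to test $H_0: \mu = \mu_0$ and construct confidence interval for $\mu$?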

• Extensions to regression, GMM etc. follow from standard arguments (see paper)

Three One-Dimensional Designs

Two Geographic Designs

Light data from Henderson, Squires, Storeygard and Weil (2018)

Spatial Inference

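• Usual t-statistic

$$\tau = \frac{\sqrt{n}\,(\bar y - \mu_0)}{\hat\sigma}$$

with associated critical value cv, and confidence interval with endpoints $\bar y \pm \mathrm{cv}\,\hat\sigma/\sqrt{n}$

• Conley (1999): kernel-type “consistent” estimator $\hat\sigma^2 = n^{-1}\sum_{l,\ell} k\big(\|s_l - s_\ell\|/b\big)\,\hat u_l \hat u_\ell$, where $\hat u_l = y_l - \bar y$, with $b \to \infty$ and standard normal cv

• Bester, Conley, Hansen, and Vogelsang (2016): ‘fixed-b’ version $\hat\sigma^2 = n^{-1}\sum_{l,\ell} k\big(\|s_l - s_\ell\|/b\big)\,\hat u_l \hat u_\ell$ and nonstandard cv obtained by simulating from $u_l \sim$ i.i.d. $\mathcal{N}(0, 1)$

• Sun and Kim (2012): $\hat\sigma^2 = q^{-1}\sum_{j=1}^{q}\Big(n^{-1/2}\sum_{l=1}^{n} w_j(s_l)\,\hat u_l\Big)^2$ with Fourier weights $w_j$ and student-t cv (analogous to Müller (2004, 2007), Phillips (2005), etc.)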
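The kernel-type estimators above can be sketched in a few lines. This is only an illustration: the slides do not name a kernel, so the Bartlett-type kernel $k(x) = \max(1 - x, 0)$, the bandwidth `b = 0.1`, the simulated locations, and the function names are all assumptions made here.

```python
import numpy as np

def kernel_variance(y, dist, b):
    """n^{-1} * sum_{l,l'} k(dist/b) * u_hat_l * u_hat_l' with an assumed Bartlett-type kernel."""
    u_hat = y - y.mean()                       # residuals from the sample mean
    k = np.clip(1.0 - dist / b, 0.0, None)     # kernel weights, zero beyond distance b
    return u_hat @ k @ u_hat / len(y)

def t_interval(y, sigma2, cv):
    """Confidence interval with endpoints ybar +/- cv * sigma_hat / sqrt(n)."""
    half = cv * np.sqrt(sigma2 / len(y))
    return y.mean() - half, y.mean() + half

rng = np.random.default_rng(0)
s = rng.uniform(size=(100, 2))                 # hypothetical locations in the unit square
dist = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
y = rng.standard_normal(100)                   # hypothetical data

sigma2 = kernel_variance(y, dist, b=0.1)
lo, hi = t_interval(y, sigma2, cv=1.96)        # standard normal cv, as in the "consistent" approach
print(sigma2, (lo, hi))
```

Two limiting cases are instructive. With a bandwidth smaller than every pairwise distance the estimator reduces to the average squared residual, and with a bandwidth far larger than every distance it collapses towards zero because the residuals sum to zero.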

This Paper

• Measure of strength of spatial dependence: average pairwise correlation

$$\bar\rho = \frac{1}{n(n-1)}\sum_{l=1}^{n}\sum_{\ell \neq l}\mathrm{Cor}(u_l, u_\ell \mid \mathbf{s})$$

• Objective: method to construct $\hat\sigma^2$ and cv that “work” for

— some moderate and all small values of $\bar\rho$

— also for non-uniform spatial density $g$
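As a small illustration of the average pairwise correlation, the sketch below computes it from a correlation matrix. The exponential correlation function, the decay rates, and the simulated locations are assumptions chosen only for the example.

```python
import numpy as np

def pairwise_distances(s):
    """Euclidean distance matrix for an (n, d) array of locations."""
    diff = s[:, None, :] - s[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def average_pairwise_correlation(corr):
    """Mean of the off-diagonal entries of an n x n correlation matrix."""
    n = corr.shape[0]
    return (corr.sum() - np.trace(corr)) / (n * (n - 1))

rng = np.random.default_rng(0)
s = rng.uniform(size=(200, 2))            # hypothetical locations in the unit square
dist = pairwise_distances(s)
for c in (1.0, 10.0, 100.0):              # hypothetical decay rates
    rho_bar = average_pairwise_correlation(np.exp(-c * dist))
    print(f"c = {c:6.1f}  rho_bar = {rho_bar:.4f}")
```

Faster decay (larger `c`) gives a smaller average pairwise correlation, so a single scalar summarizes how strong the dependence is for a given set of locations.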

Outline of Talk

1. Definition of new “SCPC” method

2. Size control under generic weak correlation

3. Size control under some forms of strong correlation

4. Small sample performance

Motivation for SCPC Method

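• Consider OLS variance estimator $\hat\sigma^2_{OLS} = (n-1)^{-1}\sum_{l=1}^{n}(y_l - \bar y)^2$. Using familiar notation

$$\hat\sigma^2_{OLS} = (n-1)^{-1}\mathbf{y}'\mathbf{M}\mathbf{y} = (n-1)^{-1}\mathbf{y}'\Big(n^{-1}\sum_{j=1}^{n-1}\mathbf{w}_j\mathbf{w}_j'\Big)\mathbf{y} = (n-1)^{-1}\,n^{-1}\sum_{j=1}^{n-1}(\mathbf{w}_j'\mathbf{u})^2$$

for $\mathbf{w}_j$ such that $n^{-1}\mathbf{w}_i'\mathbf{w}_j = 1[i = j]$ and $\mathbf{l}'\mathbf{w}_j = 0$ for $\mathbf{l} = (1, \ldots, 1)'$. So $|\tau| > \mathrm{cv}$ iff

$$(\mathbf{l}'\mathbf{u})^2 > \mathrm{cv}^2\,(n-1)^{-1}\sum_{j=1}^{n-1}(\mathbf{w}_j'\mathbf{u})^2$$

• Challenge with (positive) spatial correlation: Most weighted averages $\mathbf{w}_j'\mathbf{u}$ are less variable than $\mathbf{l}'\mathbf{u}$, leading to downward bias of $\hat\sigma^2$

• Solution: Select (few) weighted averages $\mathbf{w}_j'\mathbf{u}$ that are as variable as possible under plausible spatial covariance matrix, and use $\hat\sigma^2 = q^{-1}\sum_{j=1}^{q}\big(n^{-1/2}\mathbf{w}_j'\mathbf{u}\big)^2$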
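The downward bias can be made concrete without simulation, since both the variance of $\sqrt{n}\,\bar y$ and the expectation of the OLS variance estimator are available in closed form for a given covariance matrix. The sketch below assumes an exponential covariance function with decay rate 5 and uniformly drawn locations; both are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
s = rng.uniform(size=(n, 2))                       # hypothetical locations
dist = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
Sigma = np.exp(-5.0 * dist)                        # assumed covariance matrix of u

l = np.ones(n)
M = np.eye(n) - np.outer(l, l) / n                 # demeaning projection matrix

target = l @ Sigma @ l / n                         # Var(sqrt(n) * ybar | s) = l' Sigma l / n
expected_ols = np.trace(M @ Sigma) / (n - 1)       # E[sigma2_OLS | s] = tr(M Sigma) / (n - 1)
print(f"variance to be estimated: {target:.2f}")
print(f"expected OLS estimate:    {expected_ols:.2f}")
```

With positive correlations the first number exceeds one while the second falls below one, so the usual t-statistic is too large on average.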

SCPC Method

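• Benchmark model $\sigma_u(s_l - s_\ell) = \exp(-c\,\|s_l - s_\ell\|)$ with associated covariance matrix $\Sigma(c)$ of $\mathbf{u}$

— worst-case correlation has $c = c_0$, weaker correlation for $c > c_0$

• Use eigenvectors $\mathbf{r}_1, \ldots, \mathbf{r}_q$ of $\mathbf{M}\Sigma(c_0)\mathbf{M}$ corresponding to largest eigenvalues (normalized so that $\mathbf{r}_j'\mathbf{r}_j = 1$) as weights and

$$\hat\sigma^2_{SCPC}(q) = q^{-1}\sum_{j=1}^{q}(\mathbf{r}_j'\mathbf{u})^2$$

“Spatial Correlation Principal Components” (SCPC) of $\hat{\mathbf{u}} \sim \mathcal{N}(0, \mathbf{M}\Sigma(c_0)\mathbf{M})$

• For given $q$, compute critical value $\mathrm{cv}_{SCPC}(q)$ such that $\sup_{c \ge c_0} P^0_{\Sigma(c)}\big(|\tau_{SCPC}(q)| > \mathrm{cv}_{SCPC}(q) \mid \mathbf{s}\big) = \alpha$

• $q_{SCPC}$ minimizes expected CI length, which is proportional to $E[\hat\sigma_{SCPC}(q)\,\mathrm{cv}_{SCPC}(q) \mid \mathbf{s}]$, in i.i.d. model $\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}(0, \sigma^2\mathbf{I})$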
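A minimal sketch of these steps for a fixed number of principal components follows. It is not the authors' code: the function names, the simulated locations, the values `c0 = 3` and `q = 5`, the finite grid standing in for the supremum over the decay rate, and the use of simulation for the critical value are all assumptions. The choice of the number of components by expected interval length is omitted.

```python
import numpy as np

def scpc_weights(dist, c0, q):
    """Unit-length eigenvectors of M Sigma(c0) M for the q largest eigenvalues."""
    n = dist.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    A = M @ np.exp(-c0 * dist) @ M
    evals, evecs = np.linalg.eigh((A + A.T) / 2)      # symmetrize against rounding error
    return evecs[:, np.argsort(evals)[::-1][:q]]

def scpc_tstat(y, R, mu0=0.0):
    """sqrt(n) * (ybar - mu0) / sigma_hat with sigma_hat^2 = q^{-1} sum_j (r_j' y)^2."""
    sigma2 = np.mean((R.T @ y) ** 2)
    return np.sqrt(len(y)) * (y.mean() - mu0) / np.sqrt(sigma2)

def scpc_critical_value(dist, R, c_grid, alpha=0.05, n_sim=20000, seed=0):
    """Largest (1 - alpha) quantile of |tau| over a grid of c, by simulation under the null."""
    rng = np.random.default_rng(seed)
    n, q = R.shape
    W0 = np.column_stack([np.ones(n) / np.sqrt(n), R])
    Z = rng.standard_normal((n_sim, q + 1))
    cv = 0.0
    for c in c_grid:
        Omega = W0.T @ np.exp(-c * dist) @ W0         # covariance of (l'u / sqrt(n), R'u)
        X = Z @ np.linalg.cholesky(Omega).T
        tau = np.abs(X[:, 0]) / np.sqrt(np.mean(X[:, 1:] ** 2, axis=1))
        cv = max(cv, np.quantile(tau, 1 - alpha))
    return cv

rng = np.random.default_rng(0)
s = rng.uniform(size=(150, 2))                        # hypothetical locations
dist = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))
c0, q = 3.0, 5                                        # hypothetical benchmark decay rate and q
R = scpc_weights(dist, c0, q)
cv = scpc_critical_value(dist, R, np.geomspace(c0, 200 * c0, 12))
print("critical value:", round(cv, 2))
```

Because the eigenvectors with positive eigenvalues are orthogonal to the vector of ones, `R.T @ y` equals `R.T @ u`, so the variance estimator does not depend on the unknown mean.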

Eigenvectors in One-Dimensional Designs

Eigenvectors in Geographic Designs

Properties of SCPC Inference I

• Select $c_0$ as function of implied average correlation $\bar\rho = \bar\rho_0$

⇒ SCPC inference invariant to scale of locations, as well as any distance preserving transformations, such as rotations

• For fixed $\bar\rho_0$ (such as $\bar\rho_0 = 0.02$), $q_{SCPC} = O(1)$

— $\hat\sigma^2_{SCPC}$ not consistent, $\mathrm{cv}_{SCPC}$ does not converge to normal critical value

— (appropriately) reflects difficulty of correctly estimating variance of $\mathbf{l}'\mathbf{u}$ under ‘strong’ correlation

• SCPC easy to apply, even for (very) large $n$, using computational short-cuts for determination of eigenvectors if $n$ is large (Stata and MATLAB code available)

• By construction, SCPC controls size in benchmark model $\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}(0, \Sigma(c))$, $c \ge c_0$
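One way to choose the benchmark decay rate from a target average pairwise correlation is a one-dimensional root search, since the average correlation under the exponential benchmark falls monotonically as the decay rate grows. The sketch below uses bisection on a log scale; the function names, bracket, and simulated locations are assumptions.

```python
import numpy as np

def rho_bar(dist, c):
    """Average pairwise correlation implied by exp(-c * distance)."""
    n = dist.shape[0]
    corr = np.exp(-c * dist)
    return (corr.sum() - n) / (n * (n - 1))          # subtract the n diagonal ones

def calibrate_c0(dist, rho0, lo=1e-6, hi=1e6, tol=1e-10):
    """Bisection on log(c): rho_bar(dist, c) is decreasing in c."""
    for _ in range(200):
        mid = np.sqrt(lo * hi)                        # geometric midpoint
        if rho_bar(dist, mid) > rho0:
            lo = mid                                  # correlation too strong: increase c
        else:
            hi = mid
        if hi / lo < 1 + tol:
            break
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
s = rng.uniform(size=(200, 2))                        # hypothetical locations
dist = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))

c0 = calibrate_c0(dist, 0.02)
c0_rescaled = calibrate_c0(10 * dist, 0.02)           # same locations measured in different units
print(c0, c0_rescaled, c0 / c0_rescaled)
```

Rescaling all distances by a factor of ten divides the calibrated decay rate by ten and leaves the implied correlation matrix unchanged, which is the scale invariance noted above.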

Properties of SCPC Inference II

• Theorem: Under regularity conditions, SCPC controls asymptotic size under generic ‘weak correlation’ with $\bar\rho \to 0$, also in non-Gaussian models (whereas suggestions by Sun and Kim (2012) and Bester, Conley, Hansen, and Vogelsang (2016) only do so for uniform $g$)

• Theorem: In Gaussian model with parametric covariance matrices $\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}(0, \Sigma(\theta))$, $\theta \in \Theta$, easy-to-compute conditions such that SCPC controls size in nonparametric class of models

$$\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}\Big(0, \int \Sigma(\theta)\,dF(\theta)\Big) \quad \text{for all cdfs } F$$

⇒ Numerical result: in many spatial designs, SCPC controls size for mixtures of Matérn covariance functions with implied $\bar\rho \le \bar\rho_0$

• Numerical result: SCPC close to minimizing expected confidence interval length in particular class of spatial correlations among all functions of $\mathbf{y}$

Weak Correlation: Set-up of Lahiri (2003)

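• Locations $s_l$ are sampled i.i.d. on $\mathcal{S} \subset \mathbb{R}^d$

• Conditional on locations $\mathbf{s} = (s_1, \ldots, s_n)$, for some sequence $c_n > 0$,

$$u_l = B(c_n s_l)$$

with $B$ a mean-zero stationary random field on $\mathbb{R}^d$ with $E[B(s)B(r)] = \sigma_B(s - r)$

• Calculation: $\bar\rho = O(1/c_n^d)$, so weak dependence when $c_n \to \infty$

⇒ nature of weak dependence characterized by $a_n = c_n^d/n \to a \in [0, \infty)$

⇒ $a \to \infty$ corresponds to asymptotically negligible spatial correlation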
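The rate at which the average pairwise correlation vanishes can be checked numerically. The sketch below assumes a uniform design on the unit square (so the dimension is 2) and an exponential correlation function for the random field; both are illustrative choices, and boundary effects mean the scaling holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 2
s = rng.uniform(size=(n, d))                       # hypothetical uniform design on [0, 1]^2
dist = np.sqrt(((s[:, None, :] - s[None, :, :]) ** 2).sum(-1))

def rho_bar(c):
    """Average pairwise correlation of u_l = B(c * s_l), assuming Cor(B(s), B(r)) = exp(-||s - r||)."""
    corr = np.exp(-c * dist)
    return (corr.sum() - n) / (n * (n - 1))

for c in (10.0, 20.0, 40.0):
    print(f"c = {c:5.1f}  rho_bar = {rho_bar(c):.5f}  c^d * rho_bar = {c ** d * rho_bar(c):.2f}")
```

Doubling the scaling factor cuts the average correlation by roughly a factor of four in this two-dimensional design, in line with the stated order.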

Weak Correlation: Weighted Averages

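• Lemma: Let the $n \times (q+1)$ matrix $\mathbf{W}^0$ have $l$th row $\mathbf{w}^0(s_l)'$ where $\mathbf{w}^0(s) = (1, \mathbf{w}(s)')'$, $\mathbf{w}(s) = (w_1(s), \ldots, w_q(s))'$ and $w_j: \mathcal{S} \mapsto \mathbb{R}$ are continuous weight functions. Under mixing and moment assumptions on $B$ of Lahiri (2003)

$$a_n^{1/2}\,n^{-1/2}\,\mathbf{W}^{0\prime}\mathbf{u} \mid \mathbf{s} \Rightarrow \mathcal{N}(0, \Omega) \quad \text{with} \quad \Omega = a\,\sigma_B(0)\,\mathbf{V}_1 + \Big(\int \sigma_B(s)\,ds\Big)\mathbf{V}_2$$

where

$$\mathbf{V}_1 = \int_{\mathcal{S}} \mathbf{w}^0(s)\mathbf{w}^0(s)'\,g(s)\,ds \quad \text{and} \quad \mathbf{V}_2 = \int_{\mathcal{S}} \mathbf{w}^0(s)\mathbf{w}^0(s)'\,g(s)^2\,ds$$

⇒ $\mathbf{V}_1$ is what we would expect from i.i.d. data, large $a$ corresponds to very weak correlation

⇒ $\mathbf{V}_2$ proportional to $\mathbf{V}_1$ only under constant $g$

• Convergence holds conditional on $\mathbf{s}$: Randomness in locations doesn’t drive variability of weighted averages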
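The role of the density in the two limiting matrices can be seen in a one-dimensional example. The sketch below approximates both integrals on the unit interval with a midpoint rule, using a single cosine weight function and two densities: the uniform density and the tilted density 0.5 + s. The weight function, the densities, and the function names are assumptions made for illustration.

```python
import numpy as np

def v_matrices(g, weight_fns, n_grid=20000):
    """Midpoint-rule approximations of V1 = int w0 w0' g and V2 = int w0 w0' g^2 on S = [0, 1]."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    W0 = np.column_stack([np.ones(n_grid)] + [w(s) for w in weight_fns])
    gs = g(s)
    V1 = (W0 * gs[:, None]).T @ W0 / n_grid
    V2 = (W0 * (gs ** 2)[:, None]).T @ W0 / n_grid
    return V1, V2

cosine = [lambda s: np.sqrt(2) * np.cos(np.pi * s)]      # hypothetical weight function
uniform = lambda s: np.ones_like(s)                      # constant density
tilted = lambda s: 0.5 + s                               # hypothetical non-constant density

for name, g in [("uniform", uniform), ("tilted", tilted)]:
    V1, V2 = v_matrices(g, cosine)
    # If V2 were proportional to V1, this matrix would be a multiple of the identity
    print(name, np.round(np.linalg.solve(V1, V2), 3))
```

Under the uniform density the printed matrix is the identity, while under the tilted density it has non-zero off-diagonal entries, so the two matrices are not proportional.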

Weak Correlation: t-statistic

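• Recall: $a_n^{1/2}\,n^{-1/2}\,\mathbf{W}^{0\prime}\mathbf{u} \mid \mathbf{s} \Rightarrow \mathcal{N}(0, \Omega)$ with $\Omega = a\,\sigma_B(0)\,\mathbf{V}_1 + \big(\int \sigma_B(s)\,ds\big)\mathbf{V}_2$

• Theorem: Let $\tau$ be t-statistic with $\hat\sigma^2 = q^{-1}\sum_{j=1}^{q}\big(n^{-1/2}\mathbf{w}_j'\mathbf{y}\big)^2$, where $\mathbf{w}_j = (w_j(s_1), \ldots, w_j(s_n))'$. With $\mathbf{X} = (X_0, \mathbf{X}_{1:q}')' \sim \mathcal{N}(0, \Omega)$, under assumptions of Lemma,

$$P(|\tau| > \mathrm{cv} \mid \mathbf{s}) \to P\left(\frac{|X_0|}{\sqrt{q^{-1}\mathbf{X}_{1:q}'\mathbf{X}_{1:q}}} > \mathrm{cv}\right)$$

• If $\Omega \propto \mathbf{V}_1 = \int_{\mathcal{S}} \mathbf{w}^0(s)\mathbf{w}^0(s)'\,g(s)\,ds = \mathbf{I}_{q+1}$, then critical value from student-t

⇒ but $\Omega \propto \mathbf{V}_1$ only if $g$ uniform, so Sun and Kim (2012) approach not generically valid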

Weak Correlation: Size Control

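• Since t-statistic is scale invariant, no loss of generality to normalize

$$\Omega = \kappa\,\mathbf{V}_1 + (1 - \kappa)\,\mathbf{V}_2, \quad \kappa \in [0, 1)$$

where

$$\mathbf{V}_1 = \int_{\mathcal{S}} \mathbf{w}^0(s)\mathbf{w}^0(s)'\,g(s)\,ds \quad \text{and} \quad \mathbf{V}_2 = \int_{\mathcal{S}} \mathbf{w}^0(s)\mathbf{w}^0(s)'\,g(s)^2\,ds$$

⇒ scalar parameter $\kappa \in [0, 1)$ fully characterizes all possible limits under weak correlation

• SCPC benchmark model $\sigma_u(s_l - s_\ell) = \exp(-c\,\|s_l - s_\ell\|)$ replicates all such limits with appropriate choices of $c = c_n \to \infty$

Theorem: SCPC critical value construction $\sup_{c \ge c_0} P^0_{\Sigma(c)}\big(|\tau_{SCPC}(q)| > \mathrm{cv}_{SCPC}(q) \mid \mathbf{s}\big) = \alpha$ implies size control also under arbitrary sequences $c_n \ge c_0$, and, therefore, under generic weak correlation.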
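The one-parameter family of weak-correlation limits can be explored by simulation. The sketch below is an illustration under stated assumptions: a tilted density 0.5 + s on the unit interval, two cosine weights composed with the cdf of that density so that the first limiting matrix is the identity, a mixing weight called `kappa` in the code, and the 97.5% student-t quantile with 2 degrees of freedom as critical value.

```python
import numpy as np

T2_CV = 4.302653          # 97.5% quantile of the student-t distribution with 2 degrees of freedom

def limit_rejection(Omega, cv, n_sim=200_000, seed=0):
    """Simulated P(|X_0| / sqrt(q^{-1} X_{1:q}' X_{1:q}) > cv) for X ~ N(0, Omega)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_sim, Omega.shape[0])) @ np.linalg.cholesky(Omega).T
    tau = np.abs(X[:, 0]) / np.sqrt(np.mean(X[:, 1:] ** 2, axis=1))
    return np.mean(tau > cv)

# Midpoint grid on S = [0, 1]; assumed density g(s) = 0.5 + s with cdf G(s) = (s + s^2) / 2
N = 20000
s = (np.arange(N) + 0.5) / N
g = 0.5 + s
G = 0.5 * (s + s ** 2)
W0 = np.column_stack([np.ones(N),
                      np.sqrt(2) * np.cos(np.pi * G),
                      np.sqrt(2) * np.cos(2 * np.pi * G)])
V1 = (W0 * g[:, None]).T @ W0 / N            # approximately the 3 x 3 identity
V2 = (W0 * (g ** 2)[:, None]).T @ W0 / N     # not proportional to the identity

for kappa in (1.0, 0.5, 0.0):                # kappa = 1 is the i.i.d. limit
    Omega = kappa * V1 + (1 - kappa) * V2
    print(f"kappa = {kappa:.1f}  rejection probability = {limit_rejection(Omega, T2_CV):.4f}")
```

At `kappa = 1` the rejection probability matches the nominal 5% up to simulation error. For smaller values the limiting covariance matrix is no longer proportional to the identity, so the student-t critical value has no such guarantee.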

Weak Correlation: Additional Results

• Convergence of first $q$ eigenvectors of $\mathbf{M}\Sigma(c_0)\mathbf{M}$ to limiting eigenfunctions $\varphi_j: \mathcal{S} \mapsto \mathbb{R}$ in appropriate sense

• Analogous results to those above for t-statistics with ‘fixed-b’ kernel variance estimators

$$\hat\sigma^2 = n^{-1}\sum_{l,\ell} k\big(\|s_l - s_\ell\|/b\big)\,\hat u_l \hat u_\ell$$

with cv obtained from distribution in i.i.d. model, as in Bester, Conley, Hansen, and Vogelsang (2016)

⇒ generic size control under weak correlation again only with uniform $g$

Size Control Under Mixtures I

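• Consider Gaussian model and parametric covariance matrices $\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}(0, \Sigma(\theta))$, $\theta \in \Theta$. Seek easy-to-compute conditions such that t-statistic is valid in nonparametric class of models

$$\mathbf{u} \mid \mathbf{s} \sim \mathcal{N}\Big(0, \int \Sigma(\theta)\,dF(\theta)\Big) \quad \text{for all cdfs } F$$

• Suppose $\hat\sigma^2 = \mathbf{u}'\mathbf{W}\mathbf{W}'\mathbf{u}$, cv and $\Sigma_0$ are such that

$$P(|\tau| > \mathrm{cv}) \le \alpha \quad \text{under } \mathbf{y} \sim \mathcal{N}(0, \Sigma_0)$$

Can one find inequalities relating $\Sigma(\theta)$, $\theta \in \Theta$, to $\Sigma_0$ such that also

$$P(|\tau| > \mathrm{cv}) \le \alpha \quad \text{under } \mathbf{y} \sim \mathcal{N}\Big(0, \int \Sigma(\theta)\,dF(\theta)\Big)$$

for any cdf $F$ on $\Theta$?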

Size Control Under Mixtures II

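Theorem: With $\mathbf{W}^0 = (\mathbf{l}, \mathbf{W})$, let $\Omega_0 = \mathbf{W}^{0\prime}\Sigma_0\mathbf{W}^0$ and $\Omega(\theta) = \mathbf{W}^{0\prime}\Sigma(\theta)\mathbf{W}^0$. Suppose $\mathbf{A}_0 = \mathbf{D}(\mathrm{cv})\Omega_0$ with $\mathbf{D}(\mathrm{cv}) = \mathrm{diag}(1, -\mathrm{cv}^2\,\mathbf{I}_q)$ is diagonalizable, and let $\mathbf{P}$ be its eigenvectors. Define $\mathbf{A}(\theta) = \mathbf{P}^{-1}\mathbf{D}(\mathrm{cv})\Omega(\theta)\mathbf{P}$ and $\bar{\mathbf{A}}(\theta) = \tfrac{1}{2}(\mathbf{A}(\theta) + \mathbf{A}(\theta)')$, and suppose $\mathbf{A}_0$ and $\mathbf{A}(\theta)$, $\theta \in \Theta$, are scale normalized such that $\lambda_1(\mathbf{A}_0) = \lambda_1(\mathbf{A}(\theta)) = 1$, where $\lambda_j(\cdot)$ is the $j$th largest eigenvalue. Let

$$\nu_1(\theta) = \lambda_q(-\bar{\mathbf{A}}(\theta)) - \lambda_1(\bar{\mathbf{A}}(\theta))\,\lambda_q(-\mathbf{A}_0) - \big(\lambda_1(\bar{\mathbf{A}}(\theta)) - 1\big)$$

$$\nu_i(\theta) = \lambda_{q+1-i}(-\bar{\mathbf{A}}(\theta)) - \lambda_1(\bar{\mathbf{A}}(\theta))\,\lambda_{q+1-i}(-\mathbf{A}_0) \quad \text{for } i = 2, \ldots, q$$

If $\inf_{\theta \in \Theta}\sum_{i=1}^{j}\nu_i(\theta) \ge 0$ for all $1 \le j \le q$, then for any cdf $F$ on $\Theta$, $P(|\tau| > \mathrm{cv}) \le \alpha$ under $\mathbf{y} \sim \mathcal{N}(0, \Sigma_0)$ implies

$$P(|\tau| > \mathrm{cv}) \le \alpha \quad \text{under } \mathbf{y} \sim \mathcal{N}\Big(0, \int \Sigma(\theta)\,dF(\theta)\Big)$$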

Implications of Mixture Theorem

1. Spatial context

• Apply SCPC in U.S. states “light” and uniform designs

• Use theorem to numerically establish size control in arbitrary mixtures of Matérn-class covariance matrices with smoothness $\nu \in \{1/2, 3/2, 5/2, \infty\}$ and $c > 0$ such that implied $\bar\rho \le \bar\rho_0$

2. Regular time series

• Consider equal-weighted cosine (EWC) t-statistic of Müller (2004, 2007), Lazarus et al. (2018) with critical value computed from benchmark AR(1) model with coefficient $1 - c_0/T$

• Class of spectral densities $f$ where $f(\lambda)/f_0(\lambda)$ is (weakly) monotonically increasing in $|\lambda|$

⇒ Can be written as mixture $f(\lambda \mid F) \propto \int 1[|\lambda| \ge \theta]\,f_0(\lambda)\,dF(\theta)$

⇒ Theorem implies small sample and asymptotic size control (cf. results in Dou (2019))

Intuition for Theorem

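• With $\Omega(\theta) = \mathbf{W}^{0\prime}\Sigma(\theta)\mathbf{W}^0$ and $\mathbf{X} = (X_0, \mathbf{X}_{1:q}')' \sim \Omega(\theta)^{1/2}\mathbf{Z} \sim \mathcal{N}(0, \Omega(\theta))$

$$\begin{aligned}
P(\tau^2 > \mathrm{cv}^2) &\to P\left(\frac{X_0^2}{\mathbf{X}_{1:q}'\mathbf{X}_{1:q}} > \mathrm{cv}^2\right) = P\big(X_0^2 - \mathrm{cv}^2\,\mathbf{X}_{1:q}'\mathbf{X}_{1:q} > 0\big) \\
&= P\big(\mathbf{X}'\mathbf{D}(\mathrm{cv})\mathbf{X} > 0\big) = P\big(\mathbf{Z}'\Omega(\theta)^{1/2}\mathbf{D}(\mathrm{cv})\Omega(\theta)^{1/2}\mathbf{Z} > 0\big) \\
&= P\Big(\sum_{i=0}^{q}\lambda_i(\theta)Z_i^2 > 0\Big) = P\Big(Z_0^2 > -\sum_{i=1}^{q}\frac{\lambda_i(\theta)}{\lambda_0(\theta)}Z_i^2\Big)
\end{aligned}$$

where $\mathbf{D}(\mathrm{cv}) = \mathrm{diag}(1, -\mathrm{cv}^2\,\mathbf{I}_q)$ and $\lambda_i(\theta)$ are the eigenvalues of $\Omega(\theta)^{1/2}\mathbf{D}(\mathrm{cv})\Omega(\theta)^{1/2}$, or, equivalently, of $\mathbf{D}(\mathrm{cv})\Omega(\theta)$

• Can show: $P\big(Z_0^2 > \sum_{i=1}^{q}\omega_i Z_i^2\big)$ is Schur convex in $\{\omega_i\}_{i=1}^{q}$

• Use majorization theory on eigenvalues of linear combinations of matrices (plus additional calculations) to obtain result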
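The Schur convexity property can be illustrated with two weight vectors of equal sum, one spread evenly and one concentrated on a single coordinate. The concentrated vector majorizes the even one, so the probability should be at least as large. The sketch below checks this by simulation; the weight vectors and the function name are illustrative choices, and the quoted exact values follow from the student-t distribution with 2 degrees of freedom and the Cauchy distribution.

```python
import numpy as np

def prob(omega, n_sim=400_000, seed=0):
    """Simulated P(Z_0^2 > sum_i omega_i * Z_i^2) for independent standard normals."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_sim, len(omega) + 1))
    return np.mean(Z[:, 0] ** 2 > (Z[:, 1:] ** 2) @ np.asarray(omega))

p_even = prob([1.0, 1.0])           # exact value: 1 - 1/sqrt(2), about 0.2929
p_concentrated = prob([2.0, 0.0])   # exact value: 1 - (2/pi) * arctan(sqrt(2)), about 0.3918
print(p_even, p_concentrated)
```

The concentrated weights give the larger probability, consistent with Schur convexity.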

Monte Carlo Comparison with Existing Methods

For each of the 48 contiguous U.S. states, draw 5 independent samples of 500 locations from density $g \in \{\text{uniform}, \text{light}\}$

• Gaussian benchmark model with $\sigma_u(s_l - s_\ell) = \exp(-c_0\,\|s_l - s_\ell\|)$, $c_0$ calibrated from $\bar\rho_0$

• Alternative methods

1. Conley (1999) with choice of $b$ that minimizes size distortion

2. Fixed-b kernel variance estimator with $b = \max_{l,\ell}\|s_l - s_\ell\|$ (analog to Kiefer, Vogelsang and Bunzel (2000))

3. Sun and Kim (2012) with $q$ chosen according to their rule using true covariance function

4. Ibragimov and Müller (2010) cluster inference with $q = 4$ and $q = 9$ clusters

Monte Carlo Results

Conclusions

• New method to conduct spatial correlation robust inference that accounts for

— non-uniform distribution of locations

— (some) strong spatial correlations

• Potentially applicable also in network econometrics or more generally under given plausible form of

‘worst-case’ correlation structure
