Statistics for Spatio-Temporal Data (Tutorial)

Christopher K. Wikle

Department of Statistics, University of Missouri

([email protected])

Many of these slides were excerpted from a copyrighted short course developed by Chris Wikle and Noel Cressie (University of Wollongong), based on their book Statistics for Spatio-Temporal Data.

Spatio-Temporal Processes and Data

Data from spatio-temporal processes are common in the real world, representing a variety of interactions across processes and scales of variability.

Spatio-Temporal Processes and Data (cont.)

Although it may be informative to see snapshots of spatial events in time (see the Missouri River scene below), to understand the process, we must know something about the behavior from one time-period to the next.


Similarly, high-frequency temporal information from the gage level at Hermann, MO (on the Missouri River) does not give a sense of the spatial extent of the flood event.

[Figure: time series of gage level and flood stage at Hermann, MO, 1988-1998. Horizontal axis: Year; vertical axis: Height (ft), 0 to 40.]

Outline of this Tutorial

• Overview of Spatio-Temporal Modeling

• Descriptive vs Dynamical Approach

• Hierarchical Spatio-Temporal Models

• Parameterization of Linear Dynamical Spatio-Temporal Models

• Nonlinear Spatio-Temporal Dynamical Models

• Invasive Species Example

• Ocean Biogeochemical Example

• Conclusion

Most references given in this tutorial can be found in Cressie and Wikle (2011) [henceforth, C&W (2011)].

Spatio-Temporal Processes and Data

There is no history without geography (and vice versa)! We consider space and time together.

The dynamical evolution (time dimension) of spatial processes means that we are able to reach more forcefully for the “Why” question. (The problems are clearest when there is no aggregation; henceforth, consider processes at point-level support for this tutorial, unless stated otherwise.)

Notation: Let

{Y(s; t) : s ∈ D_s, t ∈ D_t}

denote a spatio-temporal random process. We sometimes write this process as Y(s; t) or, more correctly, as Y(·; ·). For discrete time, we write Y_t(s).

Spatio-Temporal Statistical Modeling

Spatio-temporal models exist in many scientific and mathematical disciplines. From a statistician’s perspective, what makes a model “statistical”?

• Uncertainty in data, model, and the associated parameters

• Estimation of parameters and prediction of processes

We also often make a distinction between “stochastic” and “statistical”:

• The former concerns random structures in models

• The latter concerns estimation and prediction given data


Why spatio-temporal modeling? Characterize processes in the presence of uncertain and (often) incomplete observations and system knowledge, for the purposes of:

• Prediction in space (smoothing, interpolation)

• Prediction in time (forecasting)

• Assimilation of observations with deterministic models

• Inference on parameters that explain the etiology of the spatio-temporal process

Traditionally, there are two approaches to modeling such processes: descriptive and dynamical.

Spatio-Temporal Modeling

Descriptive (marginal) approach: Characterize the second-moment (covariance) behavior of the process

• Several different physical processes could imply the same marginal structure

• Most useful when knowledge of the etiology of the process is limited

Spatio-Temporal Modeling

Dynamical (conditional) approach: Current values of the process at a location evolve from past values of the process at various locations

• Conditional models are closer to the etiology of the phenomenon under study

• Most useful if there is some a priori knowledge available concerning the behavior of the process

Note that the descriptive approach and the dynamical approach can be related through their respective covariance functions.

A Simple Example

Consider the deterministic 1-D space × time, reaction-diffusion equation:

$$\frac{\partial Y(s;t)}{\partial t} = \beta\,\frac{\partial^2 Y(s;t)}{\partial s^2} - \alpha\,Y(s;t) ,$$

for {s ∈ R, t ≥ 0}, where β is the diffusion coefficient and α is the “reaction” coefficient.

Meaning of the Equation: The rate of change in Y is equal to the “spread” of Y in space (i.e., diffusion) offset by the “loss” of a certain multiple of Y (i.e., reaction).

Behavior of the Equation: From a given initial condition Y(s; 0), the process Y(s; t) dampens as time t increases.

A Simple Example (cont.)

Y (s; 0) = I (15 ≤ s ≤ 24)

(a) α = 1, β = 20; (b) α = 0.05, β = 0.05; (c) α = 1, β = 50

A Simple Example: Stochastic Version

Consider the stochastic version of this PDE:

$$\frac{\partial Y}{\partial t} - \beta\,\frac{\partial^2 Y}{\partial s^2} + \alpha\,Y = \eta ,$$

where {η(s; t) : s ∈ R, t ≥ 0} is a mean-zero, white-noise process:

$$E(\eta(s;t)) \equiv 0$$

$$\mathrm{cov}(\eta(s;t), \eta(u;r)) = \sigma^2\, I(s = u,\; t = r)$$

In this case, a statistical balance is reached between the “disturbance” caused by η(·; ·) and the smoothing effect of the diffusion and loss components. That is, from a given initial condition, the stochastic PDE results in a process that eventually achieves both spatial and temporal stationarity. (The more general case of stochastic PDEs in R^d is given, e.g., by Brown et al., 2000.)

Stochastic Reaction-Diffusion Simulation Plots

Y (s; 0) = I (15 ≤ s ≤ 24)

α = 1 , β = 20

(a) σ = 0.01; (b) σ = 0.1; (c) σ = 1


Spatio-Temporal Covariance Function

The stochastic reaction-diffusion equation implies a covariance function (stationary in space and time; definition to follow):

$$C_Y(h;\tau) \equiv \mathrm{cov}(Y(s;t),\, Y(s+h;\, t+\tau))$$

and correlation function:

$$\rho_Y(h;\tau) \equiv C_Y(h;\tau)/C_Y(0;0)$$

Heine (1955; Biometrika) gives a closed-form solution for ρ_Y(·; ·) for spatial lag h ∈ R and temporal lag τ ∈ R:

$$\rho_Y(h;\tau) = \frac{1}{2}\left\{ e^{-h(\alpha/\beta)^{1/2}}\,\mathrm{Erfc}\!\left(\frac{2\tau(\alpha/\beta)^{1/2} - h/\beta}{2(\tau/\beta)^{1/2}}\right) + e^{h(\alpha/\beta)^{1/2}}\,\mathrm{Erfc}\!\left(\frac{2\tau(\alpha/\beta)^{1/2} + h/\beta}{2(\tau/\beta)^{1/2}}\right)\right\},$$

where Erfc(z) is the “complementary error function”:

$$\mathrm{Erfc}(z) \equiv \frac{2}{\pi^{1/2}}\int_z^{\infty} e^{-v^2}\,dv, \quad z \ge 0 ;$$

and Erfc(z) = 2 − Erfc(−z), z < 0.

Contour Plot of Spatio-Temporal Correlation Function

The plot shows ρ_Y(h; τ) for the stochastic reaction-diffusion equation when α = 1 and β = 20.

Plots of Marginal Spatial and Temporal Correlation Functions

Special cases include the marginal spatial correlation function at a given time: (a) ρ_Y(h; 0) = exp{−h(α/β)^{1/2}}, h > 0; and the temporal correlation function at a given spatial location: (b) ρ_Y(0; τ) = Erfc(τ^{1/2} α^{1/2}), τ > 0.
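
As a check on these two special cases, the following short derivation substitutes h = 0 and then lets τ decrease to 0 in Heine's closed form. It only uses the formula and the definition of Erfc given earlier in this tutorial.

$$h = 0:\quad \frac{2\tau(\alpha/\beta)^{1/2} \pm 0}{2(\tau/\beta)^{1/2}} = (\alpha\tau)^{1/2} \;\;\Longrightarrow\;\; \rho_Y(0;\tau) = \tfrac{1}{2}\left\{\mathrm{Erfc}\big((\alpha\tau)^{1/2}\big) + \mathrm{Erfc}\big((\alpha\tau)^{1/2}\big)\right\} = \mathrm{Erfc}(\tau^{1/2}\alpha^{1/2}).$$

$$\tau \downarrow 0,\; h > 0:\quad \frac{2\tau(\alpha/\beta)^{1/2} - h/\beta}{2(\tau/\beta)^{1/2}} \to -\infty,\qquad \frac{2\tau(\alpha/\beta)^{1/2} + h/\beta}{2(\tau/\beta)^{1/2}} \to +\infty ,$$

$$\text{so } \mathrm{Erfc}(-\infty) = 2,\;\; \mathrm{Erfc}(+\infty) = 0 \;\;\Longrightarrow\;\; \rho_Y(h;0) = \tfrac{1}{2}\left\{2\,e^{-h(\alpha/\beta)^{1/2}} + 0\right\} = e^{-h(\alpha/\beta)^{1/2}} .$$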


Spatio-Temporal Stationarity

Definition:

We say that f is a stationary spatio-temporal covariance function on R^d × R, if it is nonnegative-definite and can be written as:

$$f((\mathbf{s};t), (\mathbf{x};r)) = C(\mathbf{s}-\mathbf{x};\; t-r), \quad \mathbf{s}, \mathbf{x} \in \mathbb{R}^d,\; t, r \in \mathbb{R}.$$

If a random process Y(·; ·) has a constant expectation and a stationary covariance function C_Y(h; τ), then it is said to be second-order (or weakly) stationary. (Strong stationarity implies the equivalence of the two probability measures defining the random process Y(·; ·) and Y(· + h; · + τ), respectively, for all h ∈ R^d and all τ ∈ R.)

Separability of Spatio-Temporal Covariance Functions

Stochastic PDEs are built from dynamical physical considerations, and they imply covariance functions. Covariance functions have to be positive-definite (p-d). So, specifying classes of spatio-temporal covariance functions to describe the dependence in spatio-temporal data is not all that easy.

Suppose the spatial C^(1)(h) is p-d and the temporal C^(2)(τ) is p-d. Then the separable class:

$$C(h;\tau) \equiv C^{(1)}(h) \cdot C^{(2)}(\tau)$$

is guaranteed to be p-d.

Separability is unusual in dynamical models; it says that temporal evolution proceeds independently at each spatial location. That is, separability comes from a lack of spatio-temporal interaction in Y(·; ·).

Stochastic Reaction-Diffusion and Separability

If C(h; τ) = C^(1)(h) · C^(2)(τ), then

$$C(h;0) = C^{(1)}(h)\,C^{(2)}(0)$$

$$C(0;\tau) = C^{(1)}(0)\,C^{(2)}(\tau) ,$$

and hence

$$\rho(h;\tau) = \frac{C^{(1)}(h) \cdot C^{(2)}(\tau)}{C(0;0)} = \frac{C(h;0) \cdot C(0;\tau)}{C(0;0) \cdot C(0;0)} = \rho(h;0) \cdot \rho(0;\tau)$$

What about the stochastic reaction-diffusion equation for Y(·; ·)? Plot:

ρ_Y(h; 0) · ρ_Y(0; τ) versus (h, τ)

ρ_Y(h; τ) versus (h, τ)

Contour Plots of Spatio-Temporal Correlation Functions

(a) ρ_Y(h; 0) · ρ_Y(0; τ); (b) ρ_Y(h; τ)

The difference in correlation functions is striking. Hence ρ_Y(·; ·), for the stochastic reaction-diffusion equation, is non-separable. Note, however, that it is often difficult to see the difference between separability and non-separability in realizations from a process.
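
The comparison in the two contour plots can also be made numerically at a single lag. The sketch below is a minimal illustration, not the code used to produce the slides: the function name `rho_heine` and the chosen lag (h = 5, τ = 0.5) are assumptions, while the formula is Heine's closed form with α = 1 and β = 20. It relies on the standard-library `math.erfc`, which is defined for all real arguments and satisfies Erfc(z) = 2 − Erfc(−z).

```python
import math


def rho_heine(h, tau, alpha, beta):
    """Heine (1955) correlation of the stochastic reaction-diffusion
    equation, for spatial lag h >= 0 and temporal lag tau >= 0."""
    k = math.sqrt(alpha / beta)
    if tau == 0.0:
        # Limit tau -> 0+: marginal spatial correlation exp{-h (alpha/beta)^(1/2)}
        return math.exp(-h * k)
    denom = 2.0 * math.sqrt(tau / beta)
    term_minus = math.exp(-h * k) * math.erfc((2.0 * tau * k - h / beta) / denom)
    term_plus = math.exp(h * k) * math.erfc((2.0 * tau * k + h / beta) / denom)
    return 0.5 * (term_minus + term_plus)


alpha, beta = 1.0, 20.0   # parameter values used in the plots
h, tau = 5.0, 0.5         # illustrative lag (an assumption)

joint = rho_heine(h, tau, alpha, beta)
separable = rho_heine(h, 0.0, alpha, beta) * rho_heine(0.0, tau, alpha, beta)
print(f"rho(h; tau) = {joint:.3f}   rho(h; 0) * rho(0; tau) = {separable:.3f}")
```

If the process were separable, the two printed numbers would agree; for this model they do not.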


Inference on a Hidden Spatio-Temporal Process

We could ignore the dynamics and treat time as another “spatial” dimension (i.e., descriptive approach). Write the data as:

$$\mathbf{Z} = (Z(\mathbf{s}_1;t_1), \ldots, Z(\mathbf{s}_m;t_m))' ,$$

which are observations taken at known space-time “locations.”

Note that the data are usually noisy and not observed at all locations of interest.

Assume a hidden (“true”) process, {Y(s; t) : s ∈ D_s ⊂ R^d, t ≥ 0}, which is not observable due to measurement error and “missingness.” Write

$$\mathbf{Z} = \mathbf{Y} + \boldsymbol{\varepsilon} ,$$

where E(ε) = 0, cov(ε) = σ_ε² I.

We wish to predict Y(s0; t0) from data Z.

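Spatio-Temporal (Simple) Kriging

Predict Y(s0; t0) with the linear predictor, λ′Z + k.

For simplicity, assume E(Y(s; t)) ≡ 0. Then k = 0, and we minimize, with respect to λ, the mean squared prediction error,

$$E\left(Y(\mathbf{s}_0;t_0) - \boldsymbol{\lambda}'\mathbf{Z}\right)^2 .$$

This results in the simple kriging predictor:

$$\widehat{Y}(\mathbf{s}_0;t_0) = \mathbf{c}(\mathbf{s}_0;t_0)'\,\boldsymbol{\Sigma}_Z^{-1}\,\mathbf{Z} ,$$

where Σ_Z ≡ cov(Z), and c(s0; t0)′ = cov(Y(s0; t0), Z) = cov(Y(s0; t0), Y).

The simple kriging standard error (s.e.) is:

$$\sigma_k(\mathbf{s}_0;t_0) = \left\{\mathrm{var}(Y(\mathbf{s}_0;t_0)) - \mathbf{c}(\mathbf{s}_0;t_0)'\,\boldsymbol{\Sigma}_Z^{-1}\,\mathbf{c}(\mathbf{s}_0;t_0)\right\}^{1/2}$$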
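
The predictor and standard error above translate directly into a few lines of linear algebra. The following is a minimal sketch for a mean-zero process in one spatial dimension. The separable exponential covariance `cov_st`, its range parameters, and the four toy observations are assumptions chosen only so that the example runs; they are not the covariance used in this tutorial's figures.

```python
import numpy as np


def cov_st(h, tau, range_s=5.0, range_t=2.0, sill=1.0):
    """Illustrative space-time covariance (an assumption, not from the slides)."""
    return sill * np.exp(-np.abs(h) / range_s) * np.exp(-np.abs(tau) / range_t)


def simple_krige(s_obs, t_obs, z, s0, t0, sigma2_eps=0.0):
    """Simple kriging predictor and s.e. of Y(s0; t0) for a mean-zero process."""
    s_obs = np.asarray(s_obs, dtype=float)
    t_obs = np.asarray(t_obs, dtype=float)
    z = np.asarray(z, dtype=float)
    # Sigma_Z = cov(Y) + sigma_eps^2 I
    cov_y = cov_st(s_obs[:, None] - s_obs[None, :], t_obs[:, None] - t_obs[None, :])
    sigma_z = cov_y + sigma2_eps * np.eye(z.size)
    # c(s0; t0) = cov(Y(s0; t0), Y)
    c0 = cov_st(s_obs - s0, t_obs - t0)
    weights = np.linalg.solve(sigma_z, c0)        # Sigma_Z^{-1} c(s0; t0)
    pred = float(weights @ z)                     # c' Sigma_Z^{-1} Z
    var = float(cov_st(0.0, 0.0) - c0 @ weights)  # var(Y) - c' Sigma_Z^{-1} c
    return pred, max(var, 0.0) ** 0.5


# Toy space-time "locations" and noiseless data (illustrative values)
s_obs = [0.0, 3.0, 7.0, 10.0]
t_obs = [0.0, 1.0, 0.0, 2.0]
z = [1.0, -0.5, 0.3, 0.8]

pred_at_datum, se_at_datum = simple_krige(s_obs, t_obs, z, s0=3.0, t0=1.0)
pred_far, se_far = simple_krige(s_obs, t_obs, z, s0=100.0, t0=50.0)
```

With noiseless data the predictor reproduces an observation at its own space-time location with zero standard error, while far from all data it reverts to the assumed mean of zero with standard error equal to the process standard deviation.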


Kriging for Stochastic Reaction-Diffusion Equation

(a) For simplicity, assume no noise in the data Z (i.e., ε = 0)

(b) Crosses show {(s_i; t_i) : i = 1, . . . , 48} (“data” locations) superimposed on the kriging predictor map, {Ŷ(s0; t0)}

(c) Kriging s.e. map, {σ_k(s0; t0)}

Kriging for Stochastic Reaction-Diffusion Equation (cont.)

(a) Same noiseless dataset (i.e., ε = 0)

(b) Crosses show different {(s_i; t_i) : i = 1, . . . , 48} superimposed on the kriging predictor map, {Ŷ(s0; t0)}

(c) Kriging s.e. map, {σ_k(s0; t0)}

Spatio-Temporal Covariance Functions

In practice, one does not typically know the underlying stochastic PDE that governs the system of interest. Even with such knowledge, it may not be easy to find the analytical covariance function.

We saw that the assumption of separability is not very realistic and that covariance functions must satisfy the positive-definiteness property. This suggests the need for realistic classes of spatio-temporal covariance functions.

In recent years, there has been good progress in developing new classes of spatio-temporal covariance functions through the use of the spectral-domain representation and Bochner’s Theorem (e.g., see C&W 2011, Sec. 6.1.6; examples include the work of Cressie and Huang, 1999; Gneiting, 2002; Stein, 2005; and many others).

Spatio-Temporal Covariance Functions (cont.)

To date, available classes of (descriptive) S-T covariance functions are not realistic for many complicated phenomena, and there can be serious computational issues with their implementation in traditional kriging formulas due to the dimensionality of the prediction problems of interest.

As an alternative, we can make use of dynamical (conditional) formulations. These simplify the joint-dependence structure. In addition, because conditional models are closer to the etiology of the process, it may be easier to incorporate process knowledge directly (e.g., using dynamical models).

Consider again the stochastic reaction-diffusion equation, now from the dynamical perspective.

Emphasize the Dynamics

Approximate the differentials in the reaction-diffusion equation:

$$\frac{\partial Y}{\partial t} = \beta\,\frac{\partial^2 Y}{\partial s^2} - \alpha\,Y$$

with differences over the grid from 0 to L at intervals Δ_s:

$$\frac{Y(s;t+\Delta_t) - Y(s;t)}{\Delta_t} = \beta\left\{\frac{Y(s+\Delta_s;t) - 2Y(s;t) + Y(s-\Delta_s;t)}{\Delta_s^2}\right\} - \alpha\,Y(s;t)$$

Define Y_t ≡ (Y(Δ_s; t), . . . , Y(L − Δ_s; t))′ and Y^B_t ≡ (Y(0; t), Y(L; t))′.

Then the stochastic version of the difference equation above is:

$$\mathbf{Y}_{t+\Delta_t} = \mathbf{M}\,\mathbf{Y}_t + \mathbf{M}^B\,\mathbf{Y}^B_t + \boldsymbol{\eta}_{t+\Delta_t} ,$$

where M^B Y^B_t represents given boundary effects. The difference equation is a good approximation to the differential equation, provided αΔ_t < 1 and 2βΔ_t/Δ_s² < 1.

Emphasize the Dynamics (cont.)

Importantly, the matrix M is given by

$$\mathbf{M} = \begin{pmatrix}
\theta_1 & \theta_2 & 0 & \cdots & 0 \\
\theta_2 & \theta_1 & \theta_2 & \ddots & \vdots \\
0 & \theta_2 & \theta_1 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & \theta_2 \\
0 & \cdots & 0 & \theta_2 & \theta_1
\end{pmatrix},$$

where θ1 = (1 − αΔ_t − 2βΔ_t/Δ_s²), θ2 = βΔ_t/Δ_s².

This can be viewed as the propagator (transition) matrix of a VAR(1) process. The matrix is defined by the dynamics. In other words, in a dynamic model of spatio-temporal dependence, M has structure (which is typically sparse).
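
To make the structure of M concrete, the sketch below builds the tridiagonal propagator from θ1 and θ2 and applies it repeatedly to the indicator initial condition used earlier, Y(s; 0) = I(15 ≤ s ≤ 24). The helper name `propagator`, the grid length L = 40, zero boundary values, and the omission of the noise term are assumptions made for illustration; the parameter values α = 1, β = 20, Δ_s = 1, Δ_t = 0.01 are those used in this tutorial's comparison.

```python
import numpy as np


def propagator(n, alpha, beta, ds, dt):
    """Tridiagonal M with theta1 on the diagonal and theta2 next to it."""
    if not (alpha * dt < 1.0 and 2.0 * beta * dt / ds**2 < 1.0):
        raise ValueError("difference approximation conditions are violated")
    theta1 = 1.0 - alpha * dt - 2.0 * beta * dt / ds**2
    theta2 = beta * dt / ds**2
    return theta1 * np.eye(n) + theta2 * (np.eye(n, k=1) + np.eye(n, k=-1))


alpha, beta, ds, dt, L = 1.0, 20.0, 1.0, 0.01, 40.0
s = np.arange(ds, L, ds)                 # interior grid points ds, ..., L - ds
M = propagator(s.size, alpha, beta, ds, dt)

# Deterministic evolution from the indicator initial condition,
# with boundary values held at zero and no noise (eta = 0).
y = ((s >= 15) & (s <= 24)).astype(float)
for _ in range(100):                     # 100 steps of size dt, i.e. up to t = 1
    y = M @ y

spectral_radius = np.max(np.abs(np.linalg.eigvalsh(M)))
```

Because every row of M sums to at most θ1 + 2θ2 = 1 − αΔ_t, the maximum of the process shrinks at each step, which is the damping behavior described for the deterministic equation.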


Emphasize the Dynamics (cont.)

Conditional on the boundary effects, we see that the lagged (in time) spatial covariances are given by,

$$\mathbf{C}_Y^{(m)} = \mathbf{M}^m\,\mathbf{C}_Y^{(0)} ,$$

where C_Y^(m) ≡ cov(Y_t, Y_{t+mΔ_t}); m = 0, 1, 2, . . . , and it can be shown that the lag-0 marginal spatial covariance for Y can be written in terms of the propagator matrix M and the spatial covariance matrix for the η-process, C_η^(0):

$$\mathrm{vec}(\mathbf{C}_Y^{(0)}) = (\mathbf{I} - \mathbf{M} \otimes \mathbf{M})^{-1}\,\mathrm{vec}(\mathbf{C}_\eta^{(0)}) .$$

This suggests that we can compare the spatio-temporal covariance structure for this reaction-diffusion difference equation with the PDE’s theoretical form derived by Heine (1955).
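
The vec formula can be evaluated directly for a small grid. In the sketch below, the grid size n = 10 and the choice C_η^(0) = σ²I (spatially uncorrelated noise with σ² = 1) are assumptions for illustration, and `lagged_cov` is a hypothetical helper name. Since vec(·) stacks columns, NumPy's column-major ordering (`order="F"`) is used when reshaping.

```python
import numpy as np

n, alpha, beta, ds, dt, sigma2 = 10, 1.0, 20.0, 1.0, 0.01, 1.0
theta1 = 1.0 - alpha * dt - 2.0 * beta * dt / ds**2
theta2 = beta * dt / ds**2
M = theta1 * np.eye(n) + theta2 * (np.eye(n, k=1) + np.eye(n, k=-1))
C_eta = sigma2 * np.eye(n)               # assumed noise covariance

# vec(C_Y^(0)) = (I - M kron M)^{-1} vec(C_eta^(0)), with column-stacking vec
vec_C0 = np.linalg.solve(np.eye(n * n) - np.kron(M, M),
                         C_eta.reshape(-1, order="F"))
C0 = vec_C0.reshape((n, n), order="F")


def lagged_cov(m):
    """Lag-m covariance M^m C_Y^(0)."""
    return np.linalg.matrix_power(M, m) @ C0


# Lag-1 temporal correlation at the middle grid point
mid = n // 2
corr_lag1 = lagged_cov(1)[mid, mid] / C0[mid, mid]
```

The resulting C0 satisfies the stationarity equation C0 = M C0 M′ + C_η, which is what the vec identity encodes; correlations computed this way are what one would compare against Heine's closed form.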


Comparison of Differential and Difference Equations

Spatio-temporal correlations; α = 1, β = 20, Δ_s = 1, and Δ_t = 0.01

Solid blue line: from differential equation

Red dots: from difference equation

The Dynamics in the Difference Equation

Think of a spatial process at time t rather than a spatio-temporal process. Call it the vector Y_t. Then describe its dynamics by a discrete-time Markov process; e.g., VAR(1):

$$\mathbf{Y}_t = \mathbf{M}\,\mathbf{Y}_{t-1} + \boldsymbol{\eta}_t$$

As implied above, the choice of M is crucial. In particular, we note that the elements of M ≡ (m_ij) represent “spatial weights” of the process values from the past, e.g.,

$$Y_t(\mathbf{s}_i) = \sum_{j=1}^{n} m_{ij}\,Y_{t-1}(\mathbf{s}_j) + \eta_t(\mathbf{s}_i) .$$

Usually, many of these coefficients have small or zero weight. Typically, the m_ij corresponding to nearby locations s_i and s_j are non-zero, and they are zero when locations are far apart.

Structure of M

These directed graphs show the case of one-dimensional space:

[Figure: two directed graphs linking locations i − 2, i − 1, i, i + 1, i + 2 at time t − 1 to the same locations at time t. Left panel: General M. Right panel: M defined “spatially”.]

Structure of M (cont.)

The importance of the structure of M suggests ways in which this matrix can be parameterized.

What is it about the structure of this matrix and the values of these “nearest neighbor” parameters that affect the dynamics? Can we use this sort of scientific process knowledge (in various forms) to help with this parameterization?

In fact, this type of information can help us, but we need an efficient framework in which to build it into the model.

The hierarchical modeling framework is quite helpful in this regard.

Towards Hierarchical Spatio-Temporal Statistical Models

• We can motivate dynamical models through mechanistic relationships.

• These models can still be over-parameterized, or too simple for real-world processes.

• We must account for this complexity and our uncertainty in the process and parameters.

• There is also uncertainty in data, and the size of the dataset can be a problem.

• Hierarchical statistical models (specifically, Bayesian Hierarchical Models, BHMs) can provide a framework to account for these issues.

Before getting back to the dynamical specifications, consider the following motivating example to illustrate the BHM approach for spatio-temporal modeling.

Motivating Problem: Spread of Invasive Species

Eurasian Collared-Doves (ECDs)

• The Eurasian Collared-Dove (Streptopelia decaocto) originated in Asia and, starting in the 1930s, expanded its range into Europe (Hudson, 1965).

• They were first observed in the United States in the mid-1980s after being introduced into the Bahamas in 1974 from a population that escaped captivity (Smith, 1987).

• Since the species’ introduction in Florida, its range has been expanding dramatically across North America.

Breeding Bird Survey (BBS) Counts of ECD, 1986-2003


BBS ECD Counts: 2003 and Yearly Totals

Invasion Impacts

• ECD biological threats (Romagosa and Labisky, 2000): competition for resources with native avifauna; transmission of disease

• “ECD will probably colonize all of North America within a few decades” (Romagosa and Labisky, 2000)

Just how probable is this colonization? The example presented later will answer this question.

The following provides the spatio-temporal hierarchical motivation for such a model.

Typical Invasions

Invasive species phases:

• Introduction

• Establishment

• Range Expansion

• Saturation

Ecological models for invasions involve dispersal and growth


Uncertainty in Spread of Invasives

• Uncertainty in data (e.g., BBS counts)
  - Differences in experience and expertise of the BBS volunteer observers lead to differences in probability of detection.
  - The Eurasian Collared-Dove is similar in appearance to the Ringed Turtle-Dove. Although there are fundamental differences, observers routinely mistake these species, especially early in the invasion.

• Uncertainty and complexity in the underlying spatio-temporal process dynamics
  - “diffusion” (spread) and growth
  - species interactions
  - important exogenous variables

• Uncertainty in parameters
  - diffusion, growth, and carrying capacity vary spatially

Bayesian Hierarchical Spatio-Temporal Models

Basic rule of probability: [Z ,Y , θ] = [Z |Y , θ][Y |θ][θ]

Rather than seek to model the complicated joint distribution, we factor this joint distribution as a product of a sequence of conditional distributions, to which we might be able to apply scientific insight.

Thus, for complicated spatio-temporal processes, we consider the following three-stage factorization of [data, process, parameters] (Berliner, 1996; Wikle et al., 1998):

Stage 1. Data Model: [data | process, data parameters]

Stage 2. Process Model: [process | process parameters]

Stage 3. Parameter Model: [data params and process params].

Data Models

Let Z_a be data observed for some process Y, and let θ_a be parameters.

The data model is written:

[Z_a | Y, θ_a]

This distribution is much simpler than the unconditional distribution of [Z_a], because most of the complicated structure (spatial and temporal) comes from the process Y.

Data Models (cont.)

Combining data sets: given observations Z_a, Z_b for the same process, Y, often we can write:

[Z_a, Z_b | Y, θ_a, θ_b] = [Z_a | Y, θ_a][Z_b | Y, θ_b] .

That is, conditional on the true process, the data are often assumed independent. (Note that they are almost certainly not unconditionally independent!) This hierarchical framework presents a natural way to accommodate data at differing spatial and temporal resolutions and alignments (e.g., Wikle and Berliner, 2005).

Similarly, for multivariate process (Y_a, Y_b), often we can write:

[Z_a, Z_b | Y_a, Y_b, θ_a, θ_b] = [Z_a | Y_a, θ_a][Z_b | Y_b, θ_b] .

Again, conditional on the true processes, the data are often assumed independent.

Process Models

Process models are also often factored into a series of conditional models:

[Y_a, Y_b | θ_Y] = [Y_a | Y_b, θ_Y][Y_b | θ_Y]

We make such an assumption when using the Markov model for dynamical processes. (For example, in the first-order case, the “a” and “b” subscripts refer to time t and t − 1, respectively.)

Such factorizations are also important for simplifying multivariate processes; Royle and Berliner (1999) consider such a conditional framework for modeling multivariate spatial processes. For example, consider ozone concentration conditioned on temperature; or consider CO2 conditioned on potential temperature.

Parameter Models

Parameter models can also be factored into subcomponents. For example, we might assume,

[θ_a, θ_b, θ_Y] = [θ_a][θ_b][θ_Y].

That is, we often assume that parameter distributions are independent, although subject-matter knowledge may lead to more complex parameter models.

Scientific insight and previous studies can facilitate the specification of these models. For example, measurement-error parameters can often be obtained from previous studies that focused on such issues (this is typically the case for environmental variables and some ecological data such as from the BBS).

Parameter Models (cont.)

Process parameters often carry scientific insight (e.g., spatially dependent diffusion parameters, Wikle, 2003; turbulence parameters, Wikle et al., 2001).

In some cases, we do not know much about the parameters and use vague or non-informative distributions for parameters. Alternatively, we might use data-based estimates for such parameters.

Specification of parameter distributions is often criticized for its “subjectiveness.” Such criticism is misguided! This is what brings the power to hierarchical models.

Bayesian Hierarchical Model (BHM): Schematic Example

• [data | process, parameters]: uncertainty in observations. For example,

[bird-count observations | true bird counts, data parameters]

• [process | parameters]: science (diffusion PDEs); partitioned into subcomponents (e.g., Markov process); uncertainty (additive noise, random effects). For example,

[true bird counts | diffusion and growth processes, process params]

• [parameters]: prior scientific understanding. For example,

[diffusion parameters | habitat covariates]

Empirical Hierarchical Model (EHM)

• [data | process, parameters]: For example,

[bird-count observations | true bird counts, data parameters]

• [process | parameters]: For example,

[true bird counts | diffusion and growth processes, process params]

• Data parameters and process parameters are assumed fixed but unknown. They are typically estimated based on the marginal distribution,

[data | parameters]

This framework is common in traditional state-space models where one might use an E-M algorithm for parameter estimation.

Inference for Hierarchical Statistical Models

BHM: Use Bayes’ Theorem to derive the posterior distribution,

[process, parameters | data] ∝ Data Model × Process Model × Parameter Model

The normalizing constant is [data].

EHM: Use Bayes’ Theorem to derive the predictive distribution,

[process | data, parameters] ∝ Data Model × Process Model

The normalizing constant is [data | parameters]. The unknown parameters are replaced with estimates.
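
Written out in the bracket notation of the earlier slides, with Z for data, Y for process, and θ for parameters, the two proportionality statements correspond to the following identities. The symbol θ̂ for the plugged-in parameter estimates is notation introduced here for illustration.

$$\text{BHM:}\quad [Y, \theta \mid Z] = \frac{[Z \mid Y, \theta]\,[Y \mid \theta]\,[\theta]}{[Z]}, \qquad [Z] = \int\!\!\int [Z \mid Y, \theta]\,[Y \mid \theta]\,[\theta]\; dY\, d\theta .$$

$$\text{EHM:}\quad [Y \mid Z, \hat{\theta}] = \frac{[Z \mid Y, \hat{\theta}]\,[Y \mid \hat{\theta}]}{[Z \mid \hat{\theta}]}, \qquad [Z \mid \theta] = \int [Z \mid Y, \theta]\,[Y \mid \theta]\; dY .$$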


General Dynamic Spatio-Temporal Model (DSTM)


General DSTM (Data Models)

The general DSTM data model typically makes the same assumption as in generalized linear mixed models (GLMMs): conditioned on the mean response, the observations are independent. This makes a dramatic simplification in the case of non-Gaussian likelihoods.

In the context of DSTMs, conditioned on the spatio-temporal process, the observations are assumed to be independent. The focus is then on modeling this latent spatio-temporal process.

General DSTM (Data Models)

In most cases, a transformation of the underlying latent spatio-temporal process is assumed to be conditionally Gaussian; this is then where we put our modeling effort.

E.g., one could imagine this corresponding to the underlying intensity of a spatio-temporal (log-Gaussian Cox) point process or the logit of the probability of presence in an occupancy model.

We consider this conditional Gaussian latent process approach in this tutorial.

NOTE: although this GLMM perspective is quite general and effective, there are some alternative approaches (e.g., spatio-temporal auto-logistic models (Zheng and Zhu, 2008); spatio-temporal stochastic agent-based models (Hooten and Wikle, 2010), etc.).

Statistical DSTMs: Process Modeling

Spatio-temporal dynamics are due to the interaction of the process across space and time and/or across scales of variability
  - Some types of interaction make sense for some processes, and some don’t (e.g., process knowledge should not be ignored if available)
  - Statisticians have often ignored such knowledge!

Dimensionality can prevent the (efficient) estimation of model parameters, e.g., M(·) or M
  - Requires sensible science-based parameterizations and/or dimension reduction; sparse structures
  - Hierarchical representations can help here as well

First-Order Linear DSTM Process Revisited

First-Order Linear DSTM Process

Linear spatio-temporal processes often exhibit advective and diffusive behavior:

“width” (decay rate) of the transition operator neighborhood controls the rate of spread (diffusion)

degree of “asymmetry” in the transition operator controls the speed and direction of propagation (advection)

“long range dependence” can be accommodated by “multimodal” operators and/or heavy tails

This suggests ways that we might parameterize the transition operator and/or induce sparse structure.

Basic Hierarchical Linear DSTM

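Data:

$$\mathbf{Z}_t = \mathbf{H}_t(\boldsymbol{\theta}_{h,1})\,\mathbf{Y}_t + \boldsymbol{\varepsilon}_t , \quad \boldsymbol{\varepsilon}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{R}(\boldsymbol{\theta}_{h,2}))$$

Process:

$$\mathbf{Y}_t = \mathbf{M}(\boldsymbol{\theta}_{m,1})\,\mathbf{Y}_{t-1} + \boldsymbol{\eta}_t , \quad \boldsymbol{\eta}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{Q}(\boldsymbol{\theta}_{m,2}))$$

Parameters:

$$\boldsymbol{\theta}_{h,1},\; \boldsymbol{\theta}_{h,2},\; \boldsymbol{\theta}_{m,1},\; \boldsymbol{\theta}_{m,2}$$

These parameters may be estimated empirically, or they can be given prior distributions, such as Gaussian random process priors (that may depend on other variables), and they can easily be allowed to vary with time and/or space so as to borrow strength.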
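
When the parameters are treated as known (the empirical hierarchical setting), the distribution of Y_t given the data up to time t is Gaussian for this linear model and can be computed with the standard Kalman filter recursion. The sketch below is a generic illustration, not an implementation from this tutorial: it assumes time-invariant H, M, Q, and R, and the function and argument names are hypothetical.

```python
import numpy as np


def kalman_filter(Z, H, M, Q, R, mean0, cov0):
    """Filtering means and covariances of Y_t given Z_1, ..., Z_t for
    Z_t = H Y_t + eps_t, eps_t ~ Gau(0, R), and
    Y_t = M Y_{t-1} + eta_t, eta_t ~ Gau(0, Q)."""
    mean, cov = mean0, cov0
    means, covs = [], []
    for z in Z:
        # Forecast step: push the previous distribution through the dynamics
        mean_f = M @ mean
        cov_f = M @ cov @ M.T + Q
        # Update step: combine the forecast with the new observation
        S = H @ cov_f @ H.T + R                  # innovation covariance
        K = np.linalg.solve(S, H @ cov_f).T      # gain, cov_f H' S^{-1}
        mean = mean_f + K @ (z - H @ mean_f)
        cov = cov_f - K @ H @ cov_f
        means.append(mean)
        covs.append(cov)
    return np.array(means), np.array(covs)


# One-dimensional toy check: a constant state observed three times with
# unit noise variance and a very diffuse initial distribution.
I1 = np.eye(1)
Z = np.array([[1.0], [2.0], [3.0]])
means, covs = kalman_filter(Z, H=I1, M=I1, Q=0.0 * I1, R=I1,
                            mean0=np.zeros(1), cov0=1e6 * I1)
```

In this toy case the filtered mean is close to the running average of the observations and the filtered variance is close to 1/t, which is a convenient sanity check on the recursion.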


Radar Nowcasting Motivation! (September 25, 2010)

Can we predict in near real-time when this storm will arrive?


Example: Radar Nowcasting (Sydney, pre 2000 Olympics)

[Figure: radar nowcasting results. Panel labels: Data; Implied Propagation by Post. Params.; Mean; Post. Real.; Std. Dev.; Forec.; Posterior Mean: Kernels.]

Statistical model motivated by an IDE (linear advection-diffusion) process with spatially varying parameters.

Xu, Wikle, and Fox (2005)

Low Rank (Spectral) Reduction on Dynamics

It can be useful to parameterize the science-based dynamical spatio-temporal process in terms of a reduced-rank basis function (spectral) expansion:

$$\mathbf{Y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}\,\boldsymbol{\alpha}_t + \boldsymbol{\Psi}\,\boldsymbol{\beta}_t$$

$$\boldsymbol{\alpha}_t = \mathbf{M}_\alpha\,\boldsymbol{\alpha}_{t-1} + \boldsymbol{\eta}_{\alpha,t} ,$$

where Φ is a basis function matrix and α_t are associated expansion coefficients.

In this case, either the dimension of the dynamical process α_t is much lower than n, reducing the number of parameters in M_α and Q_α ≡ cov(η_{α,t}), and/or the reduction acts as a decorrelator, which reduces the complexity of these matrices.
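
As a rough illustration of the reduced-rank idea, the sketch below simulates a process that is exactly of rank p = 3, estimates an orthonormal basis from the data by a singular value decomposition (empirical orthogonal functions), projects onto it, and fits the low-dimensional propagator by least squares. Everything here is an assumption made for illustration: the cosine basis used to simulate, the diagonal true propagator, the use of EOFs for Φ, the time average for μ, the least-squares fit, and the omission of the Ψβ_t term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, p = 50, 200, 3

# Simulate Y_t = Phi_true alpha_t with VAR(1) coefficients alpha_t
s = np.linspace(0.0, 1.0, n)
Phi_true = np.column_stack([np.cos(np.pi * k * s) for k in range(1, p + 1)])
M_alpha_true = np.diag([0.9, 0.7, 0.5])
alpha = np.zeros((T, p))
for t in range(1, T):
    alpha[t] = M_alpha_true @ alpha[t - 1] + rng.normal(size=p)
Y = alpha @ Phi_true.T                       # T x n data matrix

# EOFs: leading right singular vectors of the centered data matrix
mu = Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
Phi = Vt[:p].T                               # n x p, orthonormal columns

# Expansion coefficients and least-squares estimate of the p x p propagator
a = (Y - mu) @ Phi                           # T x p
X, *_ = np.linalg.lstsq(a[:-1], a[1:], rcond=None)
M_alpha_hat = X.T                            # a_t is approximated by M_alpha_hat a_{t-1}

Y_hat = mu + a @ Phi.T                       # rank-p reconstruction
```

The dynamics are now described by a 3 × 3 matrix rather than a 50 × 50 one. Note that the estimated coefficients are only defined up to a rotation of the basis, which is one instance of the confounding between estimated bases and dynamics.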

One can still get science-based parameterizations when working in “spectral” space. In particular, many PDEs and IDEs are amenable to spectral and Galerkin-based representations (e.g., see C&W 2011, pp. 396-402).

A Word About Spatial Basis Functions

In general:

$$w_t(\mathbf{s}) \approx \sum_{k=1}^{p} \phi_k(\mathbf{s})\,\alpha_t(k) ,$$

where φ_k(s) and α_t(k) are the spatial basis functions and associated expansion coefficients, respectively.

Currently, it is very fashionable to consider such expansions in spatial statistics for “big data”.

Many choices: e.g., orthogonal polynomials, wavelets, splines, Wendland, Galerkin, empirical orthogonal functions (EOFs), discrete kernel convolutions, “factor” loadings, “predictive processes”, Moran’s I bases, etc.

Basis Function Decisions:
  - fixed or “estimated” (parameterized);
  - reduced rank (p ≪ n), complete (p = n), or overcomplete (p > n);
  - expansion coefficients in physical space or “spectral” space;
  - discrete or continuous space

A Word About Basis Functions for Spatial Processes (cont.)

There is very little guidance on which bases to select!
  - people have their favorites
  - in most spatial cases, it probably doesn’t matter much!

For linear dynamical processes, it can matter: Why?
  - if the bases are estimated, there is potential confounding between the dynamics on the coefficients and the bases
  - dimensionality of the coefficients may impact the ability to estimate parameters in the dynamic model without additional information

For nonlinear spatio-temporal dynamics, the choice of basis function is even more critical.
  - One must account for scale interaction

Nonlinear Spatio-Temporal Processes

Few environmental/ecological processes are linear (e.g., density-dependent growth, nonlinear advection, repulsion, shock waves, infection, predation, etc.)

Nonlinear dynamical behavior arises from the complicated interactions across spatio-temporal scales of variability and interactions across multiple processes!

Examples abound in mechanistic and process models across many disciplines

Nonlinear Spatio-Temporal Processes: Examples


Nonlinear Spatio-Temporal Processes: Examples (cont.)


Nonlinear Spatio-Temporal Processes: Examples (cont.)


Commonality?

What do all of these processes have in common?

Quadratic nonlinearity!

This suggests a class of useful statistical models for nonlinear DSTM processes:

General Quadratic Nonlinearity (GQN)


General Quadratic Nonlinearity (GQN) (Wikle and Hooten, 2010)

GQN (cont.)

Even with Gaussian conditional noise, η_t(s), the joint distribution of {Y_t(s_i) : i = 1, . . . , n} is not, in general, Gaussian.

Major Problem: There are too many parameters to estimate in typical spatio-temporal applications without extra information!

As with the linear DSTM, we can consider:
  - mechanistically-motivated parameterizations,
  - reduced-rank spectral representations,
  - shrinkage priors

Consider the following illustrative example.

Mechanistically-Motivated Example: ECD Invasive Species Revisited

Invasive Species: phases of a successful invasion

1. Introduction 2. Establishment 3. Range Expansion 4. Saturation

Population Growth Model

Spatial Movement Model

Example: Eurasian Collared Dove (Streptopelia decaocto)

• Invaded Europe in the 1930s

• Introduced to S. Florida in mid-1980s

• Data collected through N. American Breeding Bird Survey (BBS)

• We considered gridded average counts

North American Breeding Bird Survey (BBS): ECD


Data Model

$$Z_t(\mathbf{s}_i) \mid Y_t(\mathbf{s}_i), \theta \;\sim\; \text{ind. } \mathrm{Bin}(Y_t(\mathbf{s}_i), \theta), \quad i = 1, \ldots, m, \;\; t = 1, \ldots, T .$$

• Z_t(s_i) is the observed ECD count for route i and year t; Y_t(s_i) is the true but unknown abundance of ECDs at location s_i in year t.

• Since we only observed a subset of the number of doves present at a given site/time, θ represents the probability of detecting an ECD when it is there.

• In general, Y_t(s_i) and θ are not both identifiable without additional information (e.g., multiple surveys, capture-recapture, etc.).

• We were able to use strong prior information about θ from a detailed study on a related species (Mourning Dove) that was thought to share similar detectability characteristics.

Process Model

True ECD population (i.e., abundance): Y_t(s_i)

Defining Y_t ≡ (Y_t(s_1), . . . , Y_t(s_m))′ (at m observation locations), we assume that

$$\mathbf{Y}_t \mid \boldsymbol{\lambda}_t \;\sim\; \text{ind. } \mathrm{Poi}(\mathbf{H}\boldsymbol{\lambda}_t), \quad t = 1, 2, \ldots ,$$

where the n-dimensional vector,

$$\boldsymbol{\lambda}_t \equiv (\lambda_t(\mathbf{s}_1), \ldots, \lambda_t(\mathbf{s}_m), \lambda_t(\mathbf{s}_{m+1}), \ldots, \lambda_t(\mathbf{s}_n))'$$

corresponds to the “true intensity” at observation locations and the additional prediction locations {s_{m+1}, . . . , s_n}.

The matrix H is an incidence matrix that relates the true process Y_t(·) at observation locations, to the intensity process λ_t at all locations of interest.
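
Combining this Poisson process model with the binomial data model shows why abundance and detection probability are not separately identifiable from counts alone: a binomially thinned Poisson count is again Poisson, with mean equal to the product of the detection probability and the intensity. The simulation below illustrates this at a single location; the values λ = 20 and θ = 0.3, the sample size, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, theta, n_sim = 20.0, 0.3, 200_000

# True abundance Y ~ Poi(lambda), observed count Z | Y ~ Bin(Y, theta)
Y = rng.poisson(lam, size=n_sim)
Z = rng.binomial(Y, theta)

# A different (intensity, detection) pair with the same product theta * lambda
Y_alt = rng.poisson(lam / 2.0, size=n_sim)
Z_alt = rng.binomial(Y_alt, 2.0 * theta)

print(Z.mean(), Z.var(), Z_alt.mean(), Z_alt.var())
```

Both sets of observed counts have mean and variance close to θλ, so the data alone cannot distinguish the two scenarios. This is why informative prior knowledge about θ matters in this example.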


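Process Model (cont.)

This is motivated by a “matrix model” framework from population dynamics with random parameters:

$$\boldsymbol{\lambda}_t = \mathbf{M}\,\boldsymbol{\lambda}_{t-1} = \mathbf{B}(\boldsymbol{\tau})\,\mathbf{G}(\boldsymbol{\lambda}_{t-1};\boldsymbol{\theta}_G)\,\boldsymbol{\lambda}_{t-1} , \quad t = 2, 3, \ldots ,$$

where the propagator matrix M is composed of two distinct n × n matrices.

• G(λ_{t−1}; θ_G) is a diagonal matrix that accommodates growth over time and is dependent on the previous state λ_{t−1} and (Ricker) growth parameters θ_G:

$$G_{ii}(\lambda_{t-1}(\mathbf{s}_i); \theta_{G_1}, \theta_{G_2}) \equiv \exp\left\{\theta_{G_1}\left(1 - \frac{\lambda_{t-1}(\mathbf{s}_i)}{\theta_{G_2}}\right)\right\}, \quad i = 1, \ldots, n ,$$

where θ_{G_1} and θ_{G_2} are the growth and carrying-capacity parameters, respectively.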

Process Model (cont.)

Recall,

$$\boldsymbol{\lambda}_t = \mathbf{B}(\boldsymbol{\tau})\,\mathbf{G}(\boldsymbol{\lambda}_{t-1};\boldsymbol{\theta}_G)\,\boldsymbol{\lambda}_{t-1}, \quad t = 2, 3, \ldots .$$

• B(τ) accommodates dispersal of the population and is dependent on dispersal parameters τ in a Gaussian dispersal kernel. The (i, j)-th element of this matrix is given by:

$$B_{ij}(\boldsymbol{\tau}) \propto \exp\left\{-\frac{d_{ij}^2}{\tau(\mathbf{s}_i)}\right\},$$

where d_ij is the distance between locations s_i and s_j, and τ ≡ (τ(s_1), . . . , τ(s_n))′ are spatially varying dispersal coefficients.

Process Model (cont.)

Why is this GQN?

For i = 1, . . . , n:

λt(si) =n∑

j=1

Bij(τ )λt(sj) exp

{θG1

(1− λt−1(sj)

θG2

)}.

But notice that most of the interaction parameters are pre-specified to be zero (thus, there are only $O(n^2)$ parameters), and these non-zero parameters are highly parameterized in terms of the $\tau$ spatial process and the growth parameters. That is, there are just a few controlling parameters, so the effective number of parameters is much less than $O(n^2)$.

76
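To make the link between the matrix form and the element-wise sum concrete, here is a small sketch that computes one time step both ways and checks that they agree. The random B, the state values, and the function names are illustrative assumptions.

```python
import numpy as np

def step_matrix(B, lam_prev, theta_g1, theta_g2):
    """One step of lambda_t = B G(lambda_{t-1}) lambda_{t-1} in matrix form."""
    g = np.exp(theta_g1 * (1.0 - lam_prev / theta_g2))
    return B @ (g * lam_prev)    # G is diagonal, so G @ lam_prev equals g * lam_prev

def step_elementwise(B, lam_prev, theta_g1, theta_g2):
    """The same step written as the explicit sum over j for each location i."""
    n = len(lam_prev)
    lam_new = np.zeros(n)
    for i in range(n):
        for j in range(n):
            growth = np.exp(theta_g1 * (1.0 - lam_prev[j] / theta_g2))
            lam_new[i] += B[i, j] * lam_prev[j] * growth
    return lam_new

rng = np.random.default_rng(1)
n = 5
B = rng.random((n, n))
B /= B.sum(axis=1, keepdims=True)            # illustrative dispersal weights
lam_prev = rng.uniform(1.0, 20.0, size=n)    # illustrative previous state

lam_a = step_matrix(B, lam_prev, theta_g1=0.5, theta_g2=100.0)
lam_b = step_elementwise(B, lam_prev, theta_g1=0.5, theta_g2=100.0)
```

Each term multiplies the previous state at sj by a nonlinear function of that same state, which is the interaction structure the slide refers to as GQN.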

Parameter Models

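$$\begin{aligned}
\theta &\sim \mathrm{Beta}(a_\theta, b_\theta) \\
\log(\tau) &\sim \mathrm{Gau}(0, \Sigma_\tau) \quad \text{[spatial random field]} \\
\log(\lambda_1) &\sim \mathrm{Gau}(0, \Sigma_\lambda) \quad \text{[spatial random field]} \\
\theta_{G1} &\sim \mathrm{Gau}(\mu_1, \sigma_1^2) \\
\theta_{G2} &\sim \mathrm{IG}(a_2, b_2)
\end{aligned}$$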

As detailed in Hooten et al. (2007), considerable care went into the choice of the hyperparameters for these distributions, including information from previous studies and expert knowledge.

Note that in this model, the randomness in the dynamics comes from the initial condition (λ1) and the parameters in the evolution matrices, τ and θG.
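The following sketch draws one realization from each parameter model. All hyperparameter values are placeholders, the exponential spatial covariance is an assumed form (the slides do not specify Στ or Σλ), and IG(a2, b2) is taken here to mean an inverse gamma with shape a2 and scale b2; none of these choices come from Hooten et al. (2007).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 4
coords = rng.uniform(0.0, 10.0, size=(n, 2))    # illustrative locations

# Assumed exponential covariance for both spatial random fields.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = 0.5 * np.exp(-dist / 3.0)

theta = rng.beta(2.0, 2.0)                                   # detection probability
tau = np.exp(rng.multivariate_normal(np.zeros(n), Sigma))    # log(tau) is Gaussian
lam1 = np.exp(rng.multivariate_normal(np.zeros(n), Sigma))   # log(lambda_1) is Gaussian
theta_g1 = rng.normal(0.5, 0.1)                              # growth parameter
# Inverse gamma draw: the reciprocal of a Gamma(shape=a2, rate=b2) variate.
a2, b2 = 3.0, 200.0
theta_g2 = 1.0 / rng.gamma(shape=a2, scale=1.0 / b2)         # carrying capacity
```

Modeling τ and λ1 on the log scale keeps both positive while still allowing spatial dependence through the Gaussian covariance.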

77

Implementation

• Estimation and forecasting were implemented on a grid of points across the eastern two-thirds of the USA

• The Markov chain Monte Carlo (MCMC) algorithm was a combination of Gibbs and Metropolis-Hastings steps

• MCMC: 200,000 samples with 20,000 burn-in

• Data were available for 1986-2003 and were used for estimation.

• “Out-of-sample” forecasts were made for 2004-2020, based on the 1986-2003 data.
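To illustrate how Gibbs and Metropolis-Hastings steps can be combined, here is a toy sampler; it is not the algorithm used in the ECD analysis. It assumes a binomial detection model Z | Y, θ ∼ Bin(Y, θ) with Y ∼ Poi(λ) for a single common intensity λ, a Beta prior on θ, and a Gaussian prior on log λ. All names and numerical settings are hypothetical.

```python
import numpy as np

def log_target(log_lam, y, mu0, s0):
    """Log full conditional of log(lambda), up to a constant:
    Poisson likelihood for y plus a Gaussian prior on log(lambda)."""
    lam = np.exp(log_lam)
    return y.sum() * log_lam - len(y) * lam - 0.5 * ((log_lam - mu0) / s0) ** 2

def run_sampler(z, a, b, mu0, s0, n_iter, step, rng):
    """Metropolis-within-Gibbs for the toy model; returns draws of (theta, lambda)."""
    theta = a / (a + b)
    log_lam = np.log(z.mean() + 1.0)
    draws = np.empty((n_iter, 2))
    for it in range(n_iter):
        # Gibbs step: undetected counts y - z are Poisson(lambda * (1 - theta)).
        y = z + rng.poisson(np.exp(log_lam) * (1.0 - theta), size=len(z))
        # Gibbs step: conjugate Beta update for the detection probability.
        theta = rng.beta(a + z.sum(), b + (y - z).sum())
        # Metropolis-Hastings step: symmetric random walk on log(lambda).
        proposal = log_lam + step * rng.normal()
        log_ratio = log_target(proposal, y, mu0, s0) - log_target(log_lam, y, mu0, s0)
        if np.log(rng.uniform()) < log_ratio:
            log_lam = proposal
        draws[it] = theta, np.exp(log_lam)
    return draws

# Simulated data: true lambda = 20, true theta = 0.3, 50 sites (all assumed).
rng = np.random.default_rng(0)
y_true = rng.poisson(20.0, size=50)
z = rng.binomial(y_true, 0.3)
draws = run_sampler(z, a=30.0, b=70.0, mu0=np.log(20.0), s0=1.0,
                    n_iter=2000, step=0.1, rng=rng)
kept = draws[500:]   # discard burn-in
```

In this toy model only the product λθ is informed by the counts, so the prior on θ does the work of separating the two, which mirrors the identifiability point made for the ECD data model.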

78

Posterior Means: In-Sample

79

Posterior Means: Out-of-Sample

80

Posterior Credible Intervals (95%)

Two Locations: S. Florida, N. Utah

81

Posterior Inference on Dispersion Parameters

Dispersion parameters (τ): Posterior mean (top), Posterior standard deviation (bottom)

82

Epilogue to ECD Analysis

• This analysis was conducted in the mid-2000s, based on data from 1986 through 2003.

• As of 2013, surveys show that ECD sightings are now relatively frequent throughout the continental US, with the exception of the northeast.

• The forecast made in our analysis was reasonable, if somewhat conservative in the speed of the invasion.

• Presumably, the forecast could have been improved if ecologically relevant covariates were used in the model for dispersal parameters.

83

Reduced Rank QN, Shrinkage, and Informative Priors

The important dynamics of many ecological and environmental systems exist on a lower-dimensional manifold. This suggests that reduced rank dynamics may be appropriate (i.e., the state dimension is p << n). Typically, there are still too many transition parameters to estimate without further restriction.

• Mechanistically motivated choices can be made to reduce the parameter space (e.g., certain nonlinear scale interactions are less likely for some processes; Gladish and Wikle, 2014)

• Use shrinkage priors on transition parameters (e.g., stochastic search variable selection; Wikle and Holan, 2011)

• Use priors derived from mechanistic model output (e.g., Leeds et al. 2013)

Consider the following example illustrating the last of these.

84

Ocean Biogeophysical Coupling

NASA AMSR: Sea Surface Temperature (scientific visualization studio)

NASA SeaWiFS: Ocean Color (scientific visualization studio)

Complicated multivariate process with variability across many spatio-temporal scales. Interactions:
• Between processes
• Within processes

Proxy for ocean primary production (i.e., chlorophyll/phytoplankton)

85

Ocean Color Observations

Coastal Gulf of Alaska SeaWiFS Ocean Color Satellite Observations (8-day averages)

“Gappy” and substantial measurement uncertainty! We seek to predict at missing locations and filter observation error.

(ocean color: surrogate for phytoplankton)

86

Complicated Model Components

• Lower Trophic Ecosystem
– Essentially a complicated multicomponent predator-prey system influenced by the environment (highly nonlinear)

• Physical Ocean
– Navier-Stokes fluid dynamic process across multiple state variables (highly nonlinear)

Coupled! (nonlinear)

The process of interest is multivariate, nonlinear, and spatio-temporal.

Sea Surface Height and Currents

87

Physical-Biological Interface

Sample output from a coupled ocean-ecology model in the coastal Gulf of Alaska for May 1, 2001 (Fiechter et al. 2008)

Sea surface height (SSH) and currents

Chlorophyll concentration and bathymetry

Note: we can learn about the biology by knowing something about the physics!

(A Deterministic Model)

Strong association!

88

EXAMPLE: Spatio-temporal prediction of primary production (chlorophyll) in the Coastal Gulf of Alaska (GOGA)

• Data Assimilation: Combine primary production data with a mechanistic computer model of the coupled ocean and ecosystem (ROMS-NPZDFe; Fiechter et al. 2009)

• Surrogate: quadratic nonlinear emulator for the coupled model, based on Phytoplankton, SSH (sea surface height), and SST (sea surface temperature) model output

• Predict/Assimilate: Primary production given high-dimensional ocean color (SeaWiFS) satellite data and ocean model physical output

Gulf of Alaska

• Train based on 4 years (1998-2001), 8-day averages

• Predict/Assimilate for 2002

89

Coupled Dynamics: Example from Coupled Ocean Model

Panels: Phytoplankton, SSH, SST

Example training data. Time: consecutive 8-day periods

90

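Hierarchical Reduced-Rank GQN Emulator-Assisted DSTM

$$\begin{aligned}
Z_t &= H_t\Phi\alpha_t + H_t\Theta\beta_t + \epsilon_t, \quad \epsilon_t \sim \mathrm{Gau}(0, R_t) \\
\alpha_t &= m(\alpha_{t-1};\theta) + \eta_t, \quad \eta_t \sim \mathrm{Gau}(0, Q) \\
\beta_t &\sim \mathrm{Gau}(0, \mathrm{diag}(\tau)) \\
\theta &\sim (\hat{\theta}, \Sigma_\theta) \\
&[R_t, Q, \tau]
\end{aligned}$$

Note: $Y_t = \Phi\alpha_t + \Theta\beta_t$

* $m(\cdot)$: quadratic nonlinear model

KEY POINT: the prior means $\hat{\theta}$ are from a GQN parametric statistical emulator, estimated “off-line”; alternatively, these can inform the prior probability of inclusion in an SSVS prior.

(Leeds et al. 2013)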
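A minimal sketch of a quadratic nonlinear transition function and an “off-line” fit of its coefficients follows. It assumes m(·) is linear in the reduced state plus all pairwise products, and it uses ordinary least squares on synthetic training pairs as a stand-in for the emulator fit; the actual emulator of Leeds et al. (2013) is trained on coupled model output and may differ. The names `quadratic_features`, `m`, and `fit_emulator` are hypothetical.

```python
import numpy as np

def quadratic_features(alpha):
    """Stack the linear terms and all products alpha_k * alpha_l with k <= l."""
    alpha = np.asarray(alpha, dtype=float)
    k, l = np.triu_indices(alpha.size)
    return np.concatenate([alpha, alpha[k] * alpha[l]])

def m(alpha, theta):
    """Quadratic nonlinear transition; theta has one row of coefficients per state."""
    return theta @ quadratic_features(alpha)

def fit_emulator(alpha_prev, alpha_next):
    """Least-squares estimate of theta from training pairs (rows are time points)."""
    X = np.array([quadratic_features(a) for a in alpha_prev])
    coef, *_ = np.linalg.lstsq(X, alpha_next, rcond=None)
    return coef.T

# Synthetic training pairs for a p = 2 state (illustrative coefficients).
rng = np.random.default_rng(3)
p = 2
theta_true = np.array([[0.6, 0.2, -0.10, 0.00, 0.00],
                       [-0.2, 0.6, 0.00, 0.05, -0.10]])
alpha_prev = rng.normal(size=(500, p))
noise = 0.01 * rng.normal(size=(500, p))
alpha_next = np.array([m(a, theta_true) for a in alpha_prev]) + noise
theta_hat = fit_emulator(alpha_prev, alpha_next)
```

The fitted coefficients could then serve as prior means for θ in the hierarchical model, which is the role the KEY POINT assigns to the emulator.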

91

Ocean Ecosystem Example

• In this case:

• State Rank Reduction: $O(10^5)$ to $O(10)$ (EOFs from the coupled ROMS-NPZDFe output for 1998-2001; 7 EOFs; 97.5% of the variation)

• Nonlinear surrogate: quadratic nonlinear model
– the non-dynamic small-scale components were based on the next 10 singular vectors (over 99% of variation in model output)

Data vector:

$$Z_t = \begin{pmatrix} Z_{1,t} \\ Z_{2,t} \\ Z_{3,t} \end{pmatrix}, \quad m_{i,t}\text{-dimensional } (i = 1, 2, 3) \text{ data vectors for Chlorophyll, SSH, SST}$$

Process vector:

$$Y_t = \begin{pmatrix} Y_{1,t} \\ Y_{2,t} \\ Y_{3,t} \end{pmatrix}, \quad p_i\text{-dimensional } (i = 1, 2, 3) \text{ reduced rank process vectors for Chlorophyll, SSH, SST}$$

(Work in log space for Chlorophyll)
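The EOF-based rank reduction can be sketched with a singular value decomposition. The snippet below uses synthetic data with three dominant spatial patterns, not ROMS-NPZDFe output, and the function name `eof_reduce` and the 97.5% threshold argument are illustrative assumptions.

```python
import numpy as np

def eof_reduce(X, var_target):
    """EOF reduction of a T x n (time by space) matrix X.

    Returns the n x p basis Phi, the T x p coefficients alpha, and the
    cumulative fraction of variation explained by the first p EOFs."""
    Xc = X - X.mean(axis=0)                      # remove the temporal mean at each location
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest p whose cumulative fraction reaches the target.
    p = min(int(np.searchsorted(frac, var_target)) + 1, s.size)
    Phi = Vt[:p].T                               # EOFs are the right singular vectors
    alpha = Xc @ Phi                             # reduced-rank coefficients
    return Phi, alpha, frac[:p]

# Synthetic field: three spatial patterns plus small noise (illustrative).
rng = np.random.default_rng(7)
T, n = 60, 200
patterns = rng.normal(size=(3, n))
coeffs = rng.normal(size=(T, 3)) * np.array([5.0, 3.0, 2.0])
X = coeffs @ patterns + 0.1 * rng.normal(size=(T, n))

Phi, alpha, frac = eof_reduce(X, var_target=0.975)
```

The columns of Phi play the role of the basis for the dynamic coefficients, and the singular vectors beyond the first p could supply a basis for the non-dynamic small-scale component.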

92

Results: log(CHL)

Data (SeaWiFS)

Posterior Mean

Posterior STD

93

Results (cont.)

94

Conclusion

There is much work to be done in the development of spatio-temporal statistical models, from a theory, computation, and application perspective. Some important things I didn’t talk about here (or only mentioned briefly); most of these are very active areas of research:

• Sampling models
• Computation
• Multivariate models
• Areal data
• Spatio-temporal point processes
• Change-of-support
• Model evaluation and “selection”
• Agent (Individual)-based models
• Gravity models, flow models, functional models
• Sampling network design
• Linkage across models
• Spatio-temporal confounding

95

THANK YOU!

The material in this tutorial was based loosely on the 2011 John Wiley & Sons book by Noel Cressie and Chris Wikle. Most references given on the slides can be found in the book’s bibliography; or, send me an email at ([email protected]) and I will gladly send you the reference.

96
