Statistics for Spatio-Temporal Data (Tutorial)
Christopher K. Wikle
Department of Statistics, University of Missouri
Many of these slides were excerpted from a copyrighted short course developed by Chris Wikle and Noel Cressie (University of Wollongong)
based on their book Statistics for Spatio-Temporal Data
Spatio-Temporal Processes and Data
Data from spatio-temporal processes are common in the real world, representing a variety of interactions across processes and scales of variability.
Spatio-Temporal Processes and Data (cont.)
Although it may be informative to see snapshots of spatial events in time (see the Missouri River scene below), to understand the process, we must know something about the behavior from one time-period to the next.
Spatio-Temporal Processes and Data (cont.)
Similarly, high-frequency temporal information from the gage level at Hermann, MO (on the Missouri River) does not give a sense of the spatial extent of the flood event.
[Figure: gage level and flood stage at Hermann, MO; height (ft, 0 to 40) versus year (1988 to 1998).]
Outline of this Tutorial
• Overview of Spatio-Temporal Modeling
• Descriptive vs Dynamical Approach
• Hierarchical Spatio-Temporal Models
• Parameterization of Linear Dynamical Spatio-Temporal Models
• Nonlinear Spatio-Temporal Dynamical Models
• Invasive Species Example
• Ocean Biogeochemical Example
• Conclusion
Most references given in this tutorial can be found in Cressie and Wikle (2011) [henceforth, C&W (2011)]
Spatio-Temporal Processes and Data
There is no history without geography (and vice versa)! We consider space and time together.
The dynamical evolution (time dimension) of spatial processes means that we are able to reach more forcefully for the “Why” question. (The problems are clearest when there is no aggregation; henceforth, consider processes at point-level support for this tutorial, unless stated otherwise.)
Notation: Let {Y(s; t) : s ∈ D_s, t ∈ D_t}
denote a spatio-temporal random process. We sometimes write this process as Y(s; t) or, more correctly, as Y(·; ·). For discrete time, we write Y_t(s).
Spatio-Temporal Statistical Modeling
Spatio-temporal models exist in many scientific and mathematical disciplines. From a statistician’s perspective, what makes a model “statistical”?
• Uncertainty in data, model, and the associated parameters
• Estimation of parameters and prediction of processes
We also often make a distinction between “stochastic” and “statistical”
• The former concerns random structures in models
• The latter concerns estimation and prediction given data
Spatio-Temporal Processes and Data (cont.)
Why spatio-temporal modeling? Characterize processes in the presence of uncertain and (often) incomplete observations and system knowledge, for the purposes of:
• Prediction in space (smoothing, interpolation)
• Prediction in time (forecasting)
• Assimilation of observations with deterministic models
• Inference on parameters that explain the etiology of the spatio-temporal process
Traditionally, there are two approaches to modeling such processes: descriptive and dynamical.
Spatio-Temporal Modeling
Descriptive (marginal) approach: Characterize the second-moment (covariance) behavior of the process
• Several different physical processes could imply the same marginal structure
• Most useful when knowledge of the etiology of the process is limited
Spatio-Temporal Modeling
Dynamical (conditional) approach: Current values of the process at a location evolve from past values of the process at various locations
• Conditional models are closer to the etiology of the phenomenon under study
• Most useful if there is some a priori knowledge available concerning the process’s behavior
Note that the descriptive approach and the dynamical approach can be related through their respective covariance functions.
A Simple Example
Consider the deterministic 1-D space × time, reaction-diffusion equation:
$$\frac{\partial Y(s;t)}{\partial t} = \beta\,\frac{\partial^2 Y(s;t)}{\partial s^2} - \alpha\,Y(s;t),$$
for {s ∈ R, t ≥ 0}, where β is the diffusion coefficient and α is the “reaction” coefficient.
Meaning of the Equation: The rate of change in Y is equal to the “spread” of Y in space (i.e., diffusion) offset by the “loss” of a certain multiple of Y (i.e., reaction).
Behavior of the Equation: From a given initial condition Y(s; 0), the process Y(s; t) dampens as time t increases.
A Simple Example (cont.)
Y (s; 0) = I (15 ≤ s ≤ 24)
(a) α = 1, β = 20; (b) α = 0.05, β = 0.05; (c) α = 1, β = 50
A Simple Example: Stochastic Version
Consider the stochastic version of this PDE:
$$\frac{\partial Y}{\partial t} - \beta\,\frac{\partial^2 Y}{\partial s^2} + \alpha\,Y = \eta,$$
where {η(s; t) : s ∈ R, t ≥ 0} is a mean-zero, white-noise process:
E(η(s; t)) ≡ 0
cov(η(s; t), η(u; r)) = σ² I(s = u, t = r)
In this case, a statistical balance is reached between the “disturbance” caused by η(·; ·) and the smoothing effect of the diffusion and loss components. That is, from a given initial condition, the stochastic PDE results in a process that eventually achieves both spatial and temporal stationarity. (The more general case of stochastic PDEs in R^d is given, e.g., by Brown et al., 2000.)
Stochastic Reaction-Diffusion Simulation Plots
Y (s; 0) = I (15 ≤ s ≤ 24)
α = 1 , β = 20
(a) σ = 0.01; (b) σ = 0.1; (c) σ = 1
Spatio-Temporal Covariance Function
The stochastic reaction-diffusion equation implies a (stationary in space and time – definition to follow) covariance function:
C_Y(h; τ) ≡ cov(Y(s; t), Y(s + h; t + τ))
and correlation function:
ρ_Y(h; τ) ≡ C_Y(h; τ)/C_Y(0; 0)
Heine (1955; Biometrika) gives a closed-form solution for ρ_Y(·; ·) for spatial lag h ∈ R and temporal lag τ ∈ R:
$$\rho_Y(h;\tau) = \frac{1}{2}\left\{ e^{-h(\alpha/\beta)^{1/2}}\,\mathrm{Erfc}\!\left(\frac{2\tau(\alpha/\beta)^{1/2} - h/\beta}{2(\tau/\beta)^{1/2}}\right) + e^{h(\alpha/\beta)^{1/2}}\,\mathrm{Erfc}\!\left(\frac{2\tau(\alpha/\beta)^{1/2} + h/\beta}{2(\tau/\beta)^{1/2}}\right)\right\},$$
where Erfc(z) is the “complementary error function”:
$$\mathrm{Erfc}(z) \equiv \frac{2}{\pi^{1/2}}\int_z^{\infty} e^{-v^2}\,dv, \quad z \ge 0;$$
and Erfc(z) = 2 − Erfc(−z), z < 0.
Contour Plot of Spatio-Temporal Correlation Function
The plot shows ρ_Y(h; τ) for the stochastic reaction-diffusion equation when α = 1 and β = 20
Plots of Marginal Spatial and Temporal Correlation Functions
Special cases include the marginal spatial correlation function at a given time: (a) ρ_Y(h; 0) = exp{−h(α/β)^{1/2}}, h > 0; and the temporal correlation function at a given spatial location: (b) ρ_Y(0; τ) = Erfc(τ^{1/2} α^{1/2}), τ > 0.
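As a numerical check, the closed form attributed to Heine (1955) can be evaluated directly and compared with the two marginal special cases. This is a minimal sketch: the function name `rho_heine` and the test lags are illustrative choices, and the spatial marginal is approached with a very small τ rather than τ = 0, because the closed form divides by τ^{1/2}.

```python
import math

def rho_heine(h, tau, alpha, beta):
    """Closed-form correlation for the stochastic reaction-diffusion
    equation, for lags h >= 0 and tau > 0 (hypothetical helper name)."""
    k = math.sqrt(alpha / beta)
    denom = 2.0 * math.sqrt(tau / beta)
    a = (2.0 * tau * k - h / beta) / denom
    b = (2.0 * tau * k + h / beta) / denom
    # math.erfc accepts negative arguments, with erfc(-z) = 2 - erfc(z).
    return 0.5 * (math.exp(-h * k) * math.erfc(a)
                  + math.exp(h * k) * math.erfc(b))

alpha, beta = 1.0, 20.0          # values used in the contour plot

# Temporal marginal: rho(0; tau) should equal Erfc(sqrt(tau * alpha)).
r_time = rho_heine(0.0, 0.5, alpha, beta)

# Spatial marginal: rho(h; tau) should approach exp(-h * sqrt(alpha / beta))
# as tau shrinks towards zero.
r_space = rho_heine(3.0, 1e-10, alpha, beta)

print(r_time, math.erfc(math.sqrt(0.5 * alpha)))
print(r_space, math.exp(-3.0 * math.sqrt(alpha / beta)))
```

Each printed pair should agree to within floating-point error.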
Spatio-Temporal Stationarity
Definition:
We say that f is a stationary spatio-temporal covariance function on R^d × R, if it is nonnegative-definite and can be written as:
f((s; t), (x; r)) = C(s − x; t − r), s, x ∈ R^d, t, r ∈ R.
If a random process Y(·; ·) has a constant expectation and a stationary covariance function C_Y(h; τ), then it is said to be second-order (or weakly) stationary. (Strong stationarity implies the equivalence of the two probability measures defining the random process Y(·; ·) and Y(· + h; · + τ), respectively, for all h ∈ R^d and all τ ∈ R.)
Separability of Spatio-Temporal Covariance Functions
Stochastic PDEs are built from dynamical physical considerations, and they imply covariance functions. Covariance functions have to be positive-definite (p-d). So, specifying classes of spatio-temporal covariance functions to describe the dependence in spatio-temporal data is not all that easy.
Suppose the spatial C^{(1)}(h) is p-d and the temporal C^{(2)}(τ) is p-d. Then the separable class:
C(h; τ) ≡ C^{(1)}(h) · C^{(2)}(τ)
is guaranteed to be p-d.
Separability is unusual in dynamical models; it says that temporal evolution proceeds independently at each spatial location. That is, separability comes from a lack of spatio-temporal interaction in Y(·; ·).
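On a space-time grid, a separable covariance corresponds to a Kronecker product of a spatial and a temporal covariance matrix, which is one way to see why positive-definiteness carries over. The sketch below assumes exponential covariances for both factors and a small hypothetical grid; neither choice comes from the slides.

```python
import numpy as np

# Hypothetical grid, chosen only for illustration.
s = np.linspace(0.0, 10.0, 6)       # spatial locations
t = np.arange(4, dtype=float)        # time points

def exp_cov(x, scale):
    """Exponential covariance exp(-|lag| / scale) on the points x."""
    lags = np.abs(x[:, None] - x[None, :])
    return np.exp(-lags / scale)

C_space = exp_cov(s, scale=3.0)      # plays the role of C^(1)(h)
C_time = exp_cov(t, scale=2.0)       # plays the role of C^(2)(tau)

# With the space-time vector stacked time-major, the separable joint
# covariance is the Kronecker product of the two factors.
C_joint = np.kron(C_time, C_space)

# Eigenvalues of a Kronecker product are products of the factors'
# eigenvalues, so the joint matrix is p-d when both factors are.
min_eig = np.linalg.eigvalsh(C_joint).min()
print(C_joint.shape, min_eig > 0)
```

The Kronecker form is also what makes separable models computationally convenient, since the two factors can be inverted separately.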
Stochastic Reaction-Diffusion and Separability
If C(h; τ) = C^{(1)}(h) · C^{(2)}(τ), then
C(h; 0) = C^{(1)}(h) C^{(2)}(0)
C(0; τ) = C^{(1)}(0) C^{(2)}(τ),
and hence
$$\rho(h;\tau) = \frac{C^{(1)}(h)\cdot C^{(2)}(\tau)}{C(0;0)} = \frac{C(h;0)\cdot C(0;\tau)}{C(0;0)\cdot C(0;0)} = \rho(h;0)\cdot\rho(0;\tau)$$
What about the stochastic reaction-diffusion equation for Y(·; ·)? Plot:
ρ_Y(h; 0) · ρ_Y(0; τ) versus (h, τ)
ρ_Y(h; τ) versus (h, τ)
Contour Plots of Spatio-Temporal Correlation Functions
(a) ρY (h; 0) · ρY (0; τ); (b) ρY (h; τ)
The difference in correlation functions is striking. Hence ρ_Y(·; ·), for the stochastic reaction-diffusion equation, is non-separable. Note, however, that it is often difficult to see the difference between separability and non-separability in realizations from a process.
Inference on a Hidden Spatio-Temporal Process
We could ignore the dynamics and treat time as another “spatial” dimension (i.e., descriptive approach). Write the data as:
Z = (Z(s_1; t_1), ..., Z(s_m; t_m))′,
which are observations taken at known space-time “locations.”
Note that the data are usually noisy and not observed at all locations of interest.
Assume a hidden (“true”) process, {Y(s; t) : s ∈ D_s ⊂ R^d, t ≥ 0}, which is not observable due to measurement error and “missingness.” Write
Z = Y + ε,
where E(ε) = 0, cov(ε) = σ_ε² I.
We wish to predict Y(s_0; t_0) from data Z
Spatio-Temporal (Simple) Kriging
Predict Y(s_0; t_0) with the linear predictor, λ′Z + k:
For simplicity, assume E(Y(s; t)) ≡ 0. Then k = 0, and we minimize w.r.t. λ, the mean squared prediction error,
E(Y(s_0; t_0) − λ′Z)².
This results in the simple kriging predictor:
$$\hat{Y}(s_0;t_0) = c(s_0;t_0)'\,\Sigma_Z^{-1}\,Z,$$
where Σ_Z ≡ cov(Z), and
c(s_0; t_0)′ = cov(Y(s_0; t_0), Z) = cov(Y(s_0; t_0), Y)
The simple kriging standard error (s.e.) is:
$$\sigma_k(s_0;t_0) = \left\{\mathrm{var}(Y(s_0;t_0)) - c(s_0;t_0)'\,\Sigma_Z^{-1}\,c(s_0;t_0)\right\}^{1/2}$$
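The predictor and standard error above translate almost line for line into linear algebra. The following is a minimal sketch under stated assumptions: the function `simple_kriging`, the separable exponential covariance `cov_fn`, and the random "data" are all hypothetical stand-ins, not the covariance or data used in the slides.

```python
import numpy as np

def simple_kriging(coords, z, target, cov_fn, sigma2_eps=0.0):
    """Zero-mean simple kriging predictor and standard error.

    coords : (m, 2) array of (s, t) data locations
    z      : (m,) data vector Z
    target : (2,) prediction location (s0, t0)
    cov_fn : covariance of Y as a function of (spatial lag, temporal lag)
    """
    dh = coords[:, None, 0] - coords[None, :, 0]
    dt = coords[:, None, 1] - coords[None, :, 1]
    Sigma_Z = cov_fn(dh, dt) + sigma2_eps * np.eye(len(z))
    c0 = cov_fn(coords[:, 0] - target[0], coords[:, 1] - target[1])
    w = np.linalg.solve(Sigma_Z, c0)            # Sigma_Z^{-1} c
    pred = w @ z
    # Clamp at zero to guard against tiny negative round-off.
    se = np.sqrt(max(cov_fn(0.0, 0.0) - c0 @ w, 0.0))
    return pred, se

# Assumed covariance, for illustration only.
def cov_fn(h, tau):
    return np.exp(-np.abs(h) / 4.0) * np.exp(-np.abs(tau) / 2.0)

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
z = rng.standard_normal(30)                     # placeholder data

pred_at_datum, se_at_datum = simple_kriging(coords, z, coords[0], cov_fn)
pred_far, se_far = simple_kriging(coords, z, np.array([100.0, 100.0]), cov_fn)
print(pred_at_datum, z[0], se_at_datum)
print(pred_far, se_far)
```

With noiseless data the predictor reproduces the datum at a data location, and far from all data it reverts to the zero mean with standard error close to the marginal standard deviation.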
Kriging for Stochastic Reaction-Diffusion Equation
(a) For simplicity, assume no noise in the data Z (i.e., ε = 0)
(b) Crosses show {(s_i; t_i) : i = 1, ..., 48} (“data” locations) superimposed on the kriging predictor map, {Ŷ(s_0; t_0)}
(c) Kriging s.e. map, {σ_k(s_0; t_0)}
Kriging for Stochastic Reaction-Diffusion Equation (cont.)
(a) Same noiseless dataset (i.e., ε = 0)
(b) Crosses show different {(s_i; t_i) : i = 1, ..., 48} superimposed on the kriging predictor map, {Ŷ(s_0; t_0)}
(c) Kriging s.e. map, {σ_k(s_0; t_0)}
Spatio-Temporal Covariance Functions
In practice, one does not typically know the underlying stochastic PDE that governs the system of interest. Even with such knowledge, it may not be easy to find the analytical covariance function.
We saw that the assumption of separability is not very realistic and that covariance functions must satisfy the positive-definiteness property. This suggests the need for realistic classes of spatio-temporal covariance functions.
In recent years, there has been good progress in developing new classes of spatio-temporal covariance functions through the use of the spectral-domain representation and Bochner’s Theorem (e.g., see C&W 2011, Sec. 6.1.6). Examples include the work of Cressie and Huang, 1999; Gneiting, 2002; Stein, 2005; and many others.
Spatio-Temporal Covariance Functions (cont.)
To date, available classes of (descriptive) S-T covariance functions are not realistic for many complicated phenomena, and there can be serious computational issues with their implementation in traditional kriging formulas due to the dimensionality of the prediction problems of interest.
As an alternative, we can make use of dynamical (conditional) formulations. These simplify the joint-dependence structure. In addition, because conditional models are closer to the process’s etiology, it may be easier to incorporate process knowledge directly (e.g., using dynamical models).
Consider again the stochastic reaction-diffusion equation, now from the dynamical perspective.
Emphasize the Dynamics
Approximate the differentials in the reaction-diffusion equation:
$$\frac{\partial Y}{\partial t} = \beta\,\frac{\partial^2 Y}{\partial s^2} - \alpha\,Y$$
with differences over the grid from 0 to L at intervals Δ_s:
$$\frac{Y(s;t+\Delta_t) - Y(s;t)}{\Delta_t} = \beta\left\{\frac{Y(s+\Delta_s;t) - 2Y(s;t) + Y(s-\Delta_s;t)}{\Delta_s^2}\right\} - \alpha\,Y(s;t)$$
Define Y_t ≡ (Y(Δ_s; t), ..., Y(L − Δ_s; t))′; Y_t^B ≡ (Y(0; t), Y(L; t))′.
Then the stochastic version of the difference equation above is:
$$Y_{t+\Delta_t} = M\,Y_t + M^B\,Y_t^B + \eta_{t+\Delta_t},$$
where M^B Y_t^B represents given boundary effects. The difference equation is a good approximation to the differential equation, provided αΔ_t < 1 and 2βΔ_t/Δ_s² < 1.
Emphasize the Dynamics (cont.)
Importantly, the matrix M is given by
$$M = \begin{pmatrix} \theta_1 & \theta_2 & 0 & \cdots & 0 \\ \theta_2 & \theta_1 & \theta_2 & \ddots & \vdots \\ 0 & \theta_2 & \theta_1 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \theta_2 \\ 0 & \cdots & 0 & \theta_2 & \theta_1 \end{pmatrix},$$
where θ_1 = (1 − αΔ_t − 2βΔ_t/Δ_s²), θ_2 = βΔ_t/Δ_s².
This can be viewed as the propagator (transition) matrix of a VAR(1) process. The matrix is defined by the dynamics. In other words, in a dynamic model of spatio-temporal dependence, M has structure (which is typically sparse).
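To make the structure of M concrete, the sketch below builds the tridiagonal propagator from θ_1 and θ_2 and runs the stochastic difference equation forward. The grid length, the zero boundary values (so the boundary term drops out), and the innovation standard deviation `sigma` are assumptions made for illustration; α, β, Δ_s, and Δ_t match values quoted elsewhere in these slides.

```python
import numpy as np

def propagator(alpha, beta, dt, ds, n):
    """Tridiagonal propagator M for n interior grid points."""
    theta1 = 1.0 - alpha * dt - 2.0 * beta * dt / ds**2
    theta2 = beta * dt / ds**2
    return theta1 * np.eye(n) + theta2 * (np.eye(n, k=1) + np.eye(n, k=-1))

alpha, beta, dt, ds = 1.0, 20.0, 0.01, 1.0
n = 39                                  # interior points of an assumed grid on [0, 40]
M = propagator(alpha, beta, dt, ds, n)

# Conditions quoted for the difference approximation to be adequate.
assert alpha * dt < 1 and 2 * beta * dt / ds**2 < 1

# M is symmetric, so eigvalsh applies; a spectral radius below one
# means the VAR(1) process is stable.
spectral_radius = np.abs(np.linalg.eigvalsh(M)).max()

# Simulate from Y(s; 0) = I(15 <= s <= 24), assuming zero boundary values.
rng = np.random.default_rng(1)
s = ds * np.arange(1, n + 1)
Y = ((s >= 15) & (s <= 24)).astype(float)
sigma = 0.1                             # assumed innovation std. dev.
for _ in range(200):
    Y = M @ Y + sigma * rng.standard_normal(n)

print(M[0, :3], spectral_radius)
```

Only 3n − 2 of the n² entries of M are non-zero, and all of them are determined by the two numbers θ_1 and θ_2.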
Emphasize the Dynamics (cont.)
Conditional on the boundary effects, we see that the lagged (in time) spatial covariances are given by,
$$C_Y^{(m)} = M^m\,C_Y^{(0)},$$
where C_Y^{(m)} ≡ cov(Y_t, Y_{t+mΔ_t}); m = 0, 1, 2, ..., and it can be shown that the lag-0 marginal spatial covariance for Y can be written in terms of the propagator matrix M and the spatial covariance matrix for the η-process, C_η^{(0)}:
$$\mathrm{vec}(C_Y^{(0)}) = (I - M\otimes M)^{-1}\,\mathrm{vec}(C_\eta^{(0)}).$$
This suggests that we can compare the spatio-temporal covariance structure for this reaction-diffusion difference equation with the PDE’s theoretical form derived by Heine (1955).
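The vec identity can be checked numerically on a small grid. In this sketch the grid size and the innovation covariance `C_eta` are assumptions; θ_1 = 0.59 and θ_2 = 0.2 correspond to α = 1, β = 20, Δ_s = 1, Δ_t = 0.01.

```python
import numpy as np

n = 8
theta1, theta2 = 0.59, 0.2
M = theta1 * np.eye(n) + theta2 * (np.eye(n, k=1) + np.eye(n, k=-1))
C_eta = 0.1**2 * np.eye(n)              # assumed innovation covariance

# vec(C_Y^(0)) = (I - M kron M)^{-1} vec(C_eta).
# NumPy's row-major reshape is consistent with M kron M here because
# vec_row(M C M') = (M kron M) vec_row(C).
vec_C0 = np.linalg.solve(np.eye(n * n) - np.kron(M, M), C_eta.reshape(-1))
C0 = vec_C0.reshape(n, n)

# The solution should satisfy the stationarity equation
# C0 = M C0 M' + C_eta.
residual = np.abs(C0 - (M @ C0 @ M.T + C_eta)).max()

# Lagged covariances, following C_Y^(m) = M^m C_Y^(0).
C1 = M @ C0
C2 = np.linalg.matrix_power(M, 2) @ C0
print(residual)
```

For large n the n² × n² Kronecker system becomes expensive; iterating C ← M C M′ + C_η to convergence is a common alternative.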
Comparison of Differential and Difference Equations
Spatio-temporal correlations; α = 1, β = 20, Δ_s = 1, and Δ_t = 0.01
Solid blue line: from differential equation
Red dots: from difference equation
The Dynamics in the Difference Equation
Think of a spatial process at time t rather than a spatio-temporal process. Call it the vector Y_t. Then describe its dynamics by a discrete-time Markov process; e.g., VAR(1):
Y_t = M Y_{t−1} + η_t
As implied above, the choice of M is crucial. In particular, we note that the elements of M ≡ (m_ij) represent “spatial weights” of the process values from the past, e.g.,
$$Y_t(s_i) = \sum_{j=1}^{n} m_{ij}\,Y_{t-1}(s_j) + \eta_t(s_i).$$
Usually, many of these coefficients have small or zero weight. Typically, the m_ij corresponding to nearby locations s_i and s_j are non-zero, and they are zero when locations are far apart.
Structure of M
These directed graphs show the case of one-dimensional space:
[Figure: two directed graphs linking locations i − 2, ..., i + 2 at time t − 1 to locations at time t. Panels: “General M” and “M defined ‘spatially’”.]
Structure of M (cont.)
The importance of the structure of M suggests ways in which this matrix can be parameterized.
What is it about the structure of this matrix and the values of these “nearest neighbor” parameters that affect the dynamics? Can we use this sort of scientific process knowledge (in various forms) to help with this parameterization?
In fact, this type of information can help us but we need an efficient framework in which to build it into the model.
The hierarchical modeling framework is quite helpful in this regard.
Towards Hierarchical Spatio-Temporal Statistical Models
• We can motivate dynamical models through mechanistic relationships.
• These models can still be over-parameterized, or too simple for real-world processes.
• We must account for this complexity and our uncertainty in the process and parameters.
• There is also uncertainty in data, and the size of the dataset can be a problem.
• Hierarchical statistical models (specifically, Bayesian Hierarchical Models, BHMs) can provide a framework to account for these issues.
Before getting back to the dynamical specifications, consider the following motivating example to illustrate the BHM approach for spatio-temporal modeling.
Motivating Problem: Spread of Invasive Species
Eurasian Collared-Doves (ECDs)
• The Eurasian Collared-Dove (Streptopelia decaocto) originated in Asia and, starting in the 1930s, expanded its range into Europe (Hudson, 1965).
• They were first observed in the United States in the mid-1980s after being introduced into the Bahamas in 1974 from a population that escaped captivity (Smith, 1987).
• Since the species’ introduction in Florida, its range has been expanding dramatically across North America.
Breeding Bird Survey (BBS) Counts of ECD, 1986-2003
BBS ECD Counts: 2003 and Yearly Totals
Invasion Impacts
• ECD biological threats (Romagosa and Labisky, 2000): competition for resources with native avifauna; transmission of disease
• “ECD will probably colonize all of North America within a few decades” (Romagosa and Labisky, 2000)
Just how probable is this colonization? The example presented later will answer this question.
The following provides the spatio-temporal hierarchical motivation for such a model.
Typical Invasions
Invasive species phases:
• Introduction
• Establishment
• Range Expansion
• Saturation
Ecological models for invasions involve dispersal and growth
Uncertainty in Spread of Invasives
• Uncertainty in data (e.g., BBS counts)
  - Differences in experience and expertise of the BBS volunteer observers lead to differences in probability of detection
  - The Eurasian Collared-Dove is similar in appearance to the Ringed Turtle-Dove. Although there are fundamental differences, observers routinely mistake these species, especially early in invasion.
• Uncertainty and complexity in the underlying spatio-temporal process dynamics
  - “diffusion” (spread) and growth
  - species interactions
  - important exogenous variables
• Uncertainty in parameters
  - diffusion, growth, and carrying capacity vary spatially
Bayesian Hierarchical Spatio-Temporal Models
Basic rule of probability: [Z ,Y , θ] = [Z |Y , θ][Y |θ][θ]
Rather than seek to model the complicated joint distribution, we factor this joint distribution as a product of a sequence of conditional distributions, to which we might be able to apply scientific insight.
Thus, for complicated spatio-temporal processes, we consider the following three-stage factorization of [data, process, parameters] (Berliner, 1996; Wikle et al., 1998):
Stage 1. Data Model: [data|process, data parameters]
Stage 2. Process Model: [process|process parameters]
Stage 3. Parameter Model: [data params and process params].
Data Models
Let Z_a be data observed for some process Y, and let θ_a be parameters.
The data model is written:
[Za|Y , θa]
This distribution is much simpler than the unconditional distribution of [Z_a], because most of the complicated structure (spatial and temporal) comes from the process Y.
Data Models (cont.)
Combining data sets: given observations Z_a, Z_b for the same process, Y, often we can write:
[Z_a, Z_b | Y, θ_a, θ_b] = [Z_a | Y, θ_a][Z_b | Y, θ_b].
That is, conditional on the true process, the data are often assumed independent. (Note that they are almost certainly not unconditionally independent!) This hierarchical framework presents a natural way to accommodate data at differing spatial and temporal resolutions and alignments (e.g., Wikle and Berliner, 2005).
Similarly, for multivariate process (Y_a, Y_b), often we can write:
[Z_a, Z_b | Y_a, Y_b, θ_a, θ_b] = [Z_a | Y_a, θ_a][Z_b | Y_b, θ_b].
Again, conditional on the true processes, the data are often assumed independent.
Process Models
Process models are also often factored into a series of conditional models:
[Y_a, Y_b | θ_Y] = [Y_a | Y_b, θ_Y][Y_b | θ_Y]
We make such an assumption when using the Markov model for dynamical processes. (For example, in the first-order case, the “a” and “b” subscripts refer to time t and t − 1, respectively.)
Such factorizations are also important for simplifying multivariate processes; Royle and Berliner (1999) consider such a conditional framework for modeling multivariate spatial processes. For example, consider ozone concentration conditioned on temperature; or consider CO2 conditioned on potential temperature.
Parameter Models
Parameter models can also be factored into subcomponents. For example, we might assume,
[θ_a, θ_b, θ_Y] = [θ_a][θ_b][θ_Y].
That is, we often assume that parameter distributions are independent, although subject-matter knowledge may lead to more complex parameter models.
Scientific insight and previous studies can facilitate the specification of these models. For example, measurement-error parameters can often be obtained from previous studies that focused on such issues (this is typically the case for environmental variables and some ecological data such as from the BBS).
Parameter Models (cont.)
Process parameters often carry scientific insight (e.g., spatially dependent diffusion parameters, Wikle, 2003; turbulence parameters, Wikle et al., 2001).
In some cases, we do not know much about the parameters and use vague or non-informative distributions for parameters. Alternatively, we might use data-based estimates for such parameters.
Specification of parameter distributions is often criticized for its “subjectiveness.” Such criticism is misguided! This is what brings the power to hierarchical models.
Bayesian Hierarchical Model (BHM): Schematic Example
• [data | process, parameters]: uncertainty in observations. For example,
[bird-count observations | true bird counts, data parameters]
• [process | parameters]: science (diffusion PDEs); partitioned into subcomponents (e.g., Markov process); uncertainty (additive noise, random effects). For example,
[true bird counts | diffusion and growth processes, process params]
• [parameters]: prior scientific understanding. For example,
[diffusion parameters | habitat covariates]
Empirical Hierarchical Model (EHM)
• [data | process, parameters]: For example,
[bird-count observations | true bird counts, data parameters]
• [process | parameters]: For example,
[true bird counts | diffusion and growth processes, process params]
• data parameters and process parameters are assumed fixed but unknown. They are typically estimated based on the marginal distribution,
[data | parameters]
This framework is common in traditional state-space models where one might use an E-M algorithm for parameter estimation.
Inference for Hierarchical Statistical Models
BHM: Use Bayes’ Theorem to derive the posterior distribution,
[process, parameters | data] ∝ Data Model × Process Model × Parameter Model
The normalizing constant is [data].
EHM: Use Bayes’ Theorem to derive the predictive distribution,
[process | data, parameters] ∝ Data Model × Process Model
The normalizing constant is [data | parameters]. The unknown parameters are replaced with estimates.
General Dynamic Spatio-Temporal Model (DSTM)
General DSTM (Data Models)
The general DSTM data model typically makes the same assumption as in generalized linear mixed models (GLMMs): conditioned on the mean response, the observations are independent. This makes a dramatic simplification in the case of non-Gaussian likelihoods.
In the context of DSTMs, conditioned on the spatio-temporal process, the observations are assumed to be independent. The focus is then on modeling this latent spatio-temporal process.
General DSTM (Data Models)
In most cases, a transformation of the underlying latent spatio-temporal process is assumed to be conditionally Gaussian – this is then where we put our modeling effort.
E.g., one could imagine this corresponding to the underlying intensity of a spatio-temporal (log-Gaussian Cox) point process or the logit of the probability of presence in an occupancy model.
We consider this conditional Gaussian latent process approach in this tutorial.
NOTE: although this GLMM perspective is quite general and effective, there are some alternative approaches (e.g., spatio-temporal auto-logistic models (Zheng and Zhu, 2008); spatio-temporal stochastic agent-based models (Hooten and Wikle, 2010), etc.).
Statistical DSTMs: Process Modeling
Spatio-temporal dynamics are due to the interaction of the process across space and time and/or across scales of variability
  - Some types of interaction make sense for some processes, and some don’t (e.g., process knowledge should not be ignored if available)
  - Statisticians have often ignored such knowledge!
Dimensionality can prevent the (efficient) estimation of model parameters, e.g., M(·) or M
  - Requires sensible science-based parameterizations and/or dimension reduction; sparse structures
  - Hierarchical representations can help here as well
First-Order Linear DSTM Process Revisited
First-Order Linear DSTM Process
Linear spatio-temporal processes often exhibit advective and diffusive behavior:
• “width” (decay rate) of the transition operator neighborhood controls the rate of spread (diffusion)
• degree of “asymmetry” in the transition operator controls the speed and direction of propagation (advection)
• “long range dependence” can be accommodated by “multimodal” operators and/or heavy tails
This suggests ways that we might parameterize the transition operator and/or induce sparse structure.
Basic Hierarchical Linear DSTM
Data:
Z_t = H_t(θ_{h,1}) Y_t + ε_t, ε_t ∼ Gau(0, R(θ_{h,2}))
Process:
Y_t = M(θ_{m,1}) Y_{t−1} + η_t, η_t ∼ Gau(0, Q(θ_{m,2}))
Parameters:
θ_{h,1}, θ_{h,2}, θ_{m,1}, θ_{m,2}
These parameters may be estimated empirically, or they can be given prior distributions, such as Gaussian random process priors (that may depend on other variables), and they can easily be allowed to vary with time and/or space so as to borrow strength.
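When the parameters are treated as known (the empirical hierarchical setting), the filtering distribution of Y_t given data up to time t in this linear Gaussian model follows from the standard Kalman filter recursions. The sketch below is illustrative only: the matrices M, H, Q, and R, the simulated data, and the function name `kalman_filter` are assumptions, with H held fixed over time for simplicity.

```python
import numpy as np

def kalman_filter(Z, H, M, R, Q, mu0, P0):
    """Filtering means and covariances for
         Z_t = H Y_t + eps_t,      eps_t ~ Gau(0, R)
         Y_t = M Y_{t-1} + eta_t,  eta_t ~ Gau(0, Q)."""
    mu, P = mu0, P0
    means, covs = [], []
    for z in Z:
        # Forecast step: propagate the previous filtering distribution.
        mu_f = M @ mu
        P_f = M @ P @ M.T + Q
        # Update step: condition on the new observation vector z.
        S = H @ P_f @ H.T + R
        K = np.linalg.solve(S, H @ P_f).T      # gain P_f H' S^{-1}
        mu = mu_f + K @ (z - H @ mu_f)
        P = (np.eye(len(mu)) - K @ H) @ P_f
        means.append(mu)
        covs.append(P)
    return np.array(means), np.array(covs)

# Simulated example with assumed parameter values.
rng = np.random.default_rng(2)
n, T = 10, 50
M = 0.6 * np.eye(n) + 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))
H = np.eye(n)[::2]                  # observe every other location
Q = 0.1 * np.eye(n)
R = 0.05 * np.eye(H.shape[0])

Y = np.zeros(n)
Z = []
for _ in range(T):
    Y = M @ Y + rng.multivariate_normal(np.zeros(n), Q)
    Z.append(H @ Y + rng.multivariate_normal(np.zeros(H.shape[0]), R))

means, covs = kalman_filter(np.array(Z), H, M, R, Q, np.zeros(n), np.eye(n))
print(covs[-1][0, 0], covs[-1][1, 1])   # observed vs unobserved location
```

The filtering variance is smaller at observed locations than at unobserved ones, where information arrives only through the dynamics in M. In a fully Bayesian treatment, recursions of this kind typically appear inside an MCMC algorithm rather than on their own.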
Radar Nowcasting Motivation! (September 25, 2010)
Can we predict in near real-time when this storm will arrive?
Example: Radar Nowcasting (Sydney, pre 2000 Olympics)
Statistical model motivated by an IDE (linear advection-diffusion) process with spatially varying parameters.
Xu, Wikle, and Fox (2005)
[Figure panels: Data; Posterior Mean: Kernels; Implied Propagation by Post. Params.; Forecast: Mean, Post. Real., Std. Dev.]
Low Rank (Spectral) Reduction on Dynamics
It can be useful to parameterize the science-based dynamical spatio-temporal process in terms of a reduced-rank basis function (spectral) expansion:
Y_t = µ + Φα_t + Ψβ_t
α_t = M_α α_{t−1} + η_{α,t},
where Φ is a basis function matrix and α_t are associated expansion coefficients.
In this case, either the dimension of the dynamical process α_t is much lower than n, reducing the number of parameters in M_α and Q_α ≡ cov(η_{α,t}), and/or the reduction acts as a decorrelator, which reduces the complexity of these matrices.
One can still get science-based parameterizations when working in “spectral” space. In particular, many PDEs and IDEs are amenable to spectral and Galerkin-based representations (e.g., see C&W 2011, pp. 396-402).
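One common choice of Φ is a set of empirical orthogonal functions (EOFs), obtained from a singular value decomposition of the centred data matrix. The sketch below uses synthetic data and drops the Ψβ_t term for simplicity; the pattern shapes, the AR(1) coefficient of 0.8, and the least-squares estimate of M_α are all assumptions made for illustration, not the estimation approach of the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, p = 50, 200, 3

# Synthetic space-time data: p smooth spatial patterns whose
# coefficients follow independent AR(1) processes, plus small noise.
s = np.linspace(0.0, 1.0, n)
patterns = np.column_stack([np.sin((k + 1) * np.pi * s) for k in range(p)])
scales = np.array([3.0, 2.0, 1.0])
coeffs = np.zeros((T, p))
for t in range(1, T):
    coeffs[t] = 0.8 * coeffs[t - 1] + scales * rng.standard_normal(p)
Ymat = coeffs @ patterns.T + 0.05 * rng.standard_normal((T, n))

# EOFs: leading right singular vectors of the centred T x n matrix.
mu = Ymat.mean(axis=0)
U, d, Vt = np.linalg.svd(Ymat - mu, full_matrices=False)
Phi = Vt[:p].T                      # n x p basis matrix
alpha = (Ymat - mu) @ Phi           # T x p expansion coefficients

# Least-squares fit of alpha_t = M_alpha alpha_{t-1} + noise.
X = np.linalg.lstsq(alpha[:-1], alpha[1:], rcond=None)[0]
M_alpha = X.T                       # p x p instead of n x n

explained = (d[:p] ** 2).sum() / (d ** 2).sum()
print(Phi.shape, M_alpha.shape, explained)
```

The dynamics are then described by a p × p propagator rather than an n × n one, which is the reduction in parameters described above.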
A Word About Spatial Basis Functions
In general: w_t(s) ≈ Σ_{k=1}^{p} φ_k(s) α_t(k), where φ_k(s), α_t(k) are the spatial basis functions and associated expansion coefficients, respectively.
Currently, it is very fashionable to consider such expansions in spatial statistics for “big data”
Many choices: e.g., orthogonal polynomials, wavelets, splines, Wendland, Galerkin, empirical orthogonal functions (EOFs), discrete kernel convolutions, “factor” loadings, “predictive processes”, Moran’s I bases, etc.
Basis Function Decisions:
  - fixed or “estimated” (parameterized);
  - reduced rank (p << n), complete (p = n), or overcomplete (p > n);
  - expansion coefficients in physical space or “spectral” space;
  - discrete or continuous space
A Word About Basis Functions for Spatial Processes (cont.)
There is very little guidance on which bases to select!
  - people have their favorites
  - in most spatial cases, it probably doesn’t matter much!
For linear dynamical processes, it can matter: Why?
  - if the bases are estimated, there is potential confounding between the dynamics on the coefficients and the bases
  - dimensionality of the coefficients may impact the ability to estimate parameters in the dynamic model without additional information
For nonlinear spatio-temporal dynamics, the choice of basis function is even more critical.
  - One must account for scale interaction
Nonlinear Spatio-Temporal Processes
Few environmental/ecological processes are linear (e.g., density-dependent growth, nonlinear advection, repulsion, shock waves, infection, predation, etc.)
Nonlinear dynamical behavior arises from the complicated interactions across spatio-temporal scales of variability and interactions across multiple processes!
Examples abound in mechanistic and process models across many disciplines
Nonlinear Spatio-Temporal Processes: Examples
Nonlinear Spatio-Temporal Processes: Examples (cont.)
Nonlinear Spatio-Temporal Processes: Examples (cont.)
Commonality?
What do all of these processes have in common?
Quadratic nonlinearity!
This suggests a class of useful statistical models for nonlinear DSTM processes:
General Quadratic Nonlinearity (GQN)
General Quadratic Nonlinearity (GQN) (Wikle and Hooten, 2010)
GQN (cont.)
Even with Gaussian conditional noise, η_t(s), the joint distribution of {Y_t(s_i) : i = 1, ..., n} is not, in general, Gaussian.
Major Problem: There are too many parameters to estimate in typical spatio-temporal applications without extra information!
As with the linear DSTM, we can consider:
  - mechanistically-motivated parameterizations,
  - reduced-rank spectral representations,
  - shrinkage priors
Consider the following illustrative example.
Mechanistically-Motivated Example: ECD Invasive Species Revisited
Invasive Species: phases of a successful invasion
1. Introduction 2. Establishment 3. Range Expansion 4. Saturation
Population Growth Model
Spatial Movement Model
Example: Eurasian Collared Dove (Streptopelia decaocto)
• Invaded Europe in the 1930s
• Introduced to S. Florida in mid-1980s
• Data collected through N. American Breeding Bird Survey (BBS)
• We considered gridded average counts
North American Breeding Bird Survey (BBS): ECD
Data Model
Z_t(s_i) | Y_t(s_i), θ ∼ ind. Bin(Y_t(s_i), θ), i = 1, ..., m, t = 1, ..., T.
• Z_t(s_i) is the observed ECD count for route i and year t; Y_t(s_i) is the true but unknown abundance of ECDs at location s_i in year t.
• Since we only observed a subset of the number of doves present at a given site/time, θ represents the probability of detecting an ECD when it is there.
• In general, Y_t(s_i) and θ are not both identifiable without additional information (e.g., multiple surveys, capture-recapture, etc.).
• We were able to use strong prior information about θ from a detailed study on a related species (Mourning Dove) that was thought to share similar detectability characteristics.
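The identifiability point can be seen in a short simulation. If the true abundance is Poisson with mean λ and each bird is detected independently with probability θ, the observed count is Poisson with mean θλ (Poisson thinning), so counts alone inform only the product. The values of λ and θ below are hypothetical and chosen to share the same product.

```python
import numpy as np

rng = np.random.default_rng(4)
n_draws = 200_000

def mean_count(lam, theta):
    """Mean observed count when Y ~ Poi(lam) and Z | Y ~ Bin(Y, theta)."""
    Y = rng.poisson(lam, size=n_draws)
    Z = rng.binomial(Y, theta)
    return Z.mean()

# Two (abundance, detection) pairs with the same product theta * lam = 6.
m1 = mean_count(lam=20.0, theta=0.3)
m2 = mean_count(lam=10.0, theta=0.6)
print(m1, m2)
```

Both settings give essentially the same distribution of observed counts, which is why external information about θ is needed.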
Process Model
True ECD population (i.e., abundance): Y_t(s_i)
Defining Y_t ≡ (Y_t(s_1), ..., Y_t(s_m))′ (at m observation locations), we assume that
Y_t | λ_t ∼ ind. Poi(Hλ_t), t = 1, 2, ...,
where the n-dimensional vector,
λ_t ≡ (λ_t(s_1), ..., λ_t(s_m), λ_t(s_{m+1}), ..., λ_t(s_n))′
corresponds to the “true intensity” at observation locations and the additional prediction locations {s_{m+1}, ..., s_n}.
The matrix H is an incidence matrix that relates the true process Y_t(·) at observation locations, to the intensity process λ_t at all locations of interest.
Process Model (cont.)
This is motivated by a “matrix model” framework from population dynamics with random parameters:
$$\lambda_t = M\lambda_{t-1} = B(\boldsymbol{\tau})\,G(\lambda_{t-1};\theta_G)\,\lambda_{t-1}, \quad t = 2, 3, \ldots,$$
where the propagator matrix M is the product of two distinct n × n matrices.
• G(λ_{t−1}; θ_G) is a diagonal matrix that accommodates growth over time and is dependent on the previous state λ_{t−1} and (Ricker) growth parameters θ_G:
$$G_{ii}(\lambda_{t-1}(s_i);\theta_{G_1},\theta_{G_2}) \equiv \exp\left\{\theta_{G_1}\left(1 - \frac{\lambda_{t-1}(s_i)}{\theta_{G_2}}\right)\right\}, \quad i = 1, \ldots, n,$$
where θ_{G_1} and θ_{G_2} are the growth and carrying-capacity parameters, respectively.
Process Model (cont.)
Recall,
$$\lambda_t = B(\boldsymbol{\tau})\,G(\lambda_{t-1};\theta_G)\,\lambda_{t-1}, \quad t = 2, 3, \ldots.$$
• B(τ) accommodates dispersal of the population and is dependent on dispersal parameters τ in a Gaussian dispersal kernel. The (i, j)-th element of this matrix is given by:
$$B_{ij}(\boldsymbol{\tau}) \propto \exp\left\{-\frac{d_{ij}^2}{\tau(s_i)}\right\},$$
where d_ij is the distance between locations s_i and s_j, and τ ≡ (τ(s_1), ..., τ(s_n))′ are spatially varying dispersal coefficients.
Process Model (cont.)
Why is this GQN?
For i = 1, . . . , n:
$$\lambda_t(s_i) = \sum_{j=1}^{n} B_{ij}(\boldsymbol{\tau})\,\lambda_{t-1}(s_j)\exp\left\{\theta_{G_1}\left(1 - \frac{\lambda_{t-1}(s_j)}{\theta_{G_2}}\right)\right\}.$$
But, notice that most of the interaction parameters are pre-specified to be zero (thus, there are only O(n²) parameters) and these non-zero parameters are highly parameterized in terms of the τ spatial process and the growth parameters. That is, there are just a few controlling parameters, so the effective number of parameters is much less than O(n²).
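One step of this process model is a Ricker growth update at each location followed by dispersal through B(τ). The sketch below runs it on a hypothetical one-dimensional grid. The row normalisation of B is an assumption of the sketch (the kernel is specified only up to proportionality here), as are the grid, the constant τ, and the growth parameter values; none of these are the settings used in the ECD analysis.

```python
import numpy as np

def dispersal_matrix(coords, tau):
    """Gaussian dispersal kernel with B_ij proportional to
    exp(-d_ij^2 / tau(s_i)); rows normalised to sum to one (assumed)."""
    d2 = (coords[:, None] - coords[None, :]) ** 2
    B = np.exp(-d2 / tau[:, None])
    return B / B.sum(axis=1, keepdims=True)

def step(lam_prev, B, theta_g1, theta_g2):
    """lambda_t = B(tau) G(lambda_{t-1}; theta_G) lambda_{t-1}."""
    growth = np.exp(theta_g1 * (1.0 - lam_prev / theta_g2))  # diag of G
    return B @ (growth * lam_prev)

coords = np.arange(30, dtype=float)     # hypothetical 1-D grid
tau = np.full(30, 2.0)                  # hypothetical dispersal coefficients
B = dispersal_matrix(coords, tau)

lam = np.zeros(30)
lam[0] = 1.0                            # introduction at one end of the grid
history = [lam]
for _ in range(60):
    lam = step(lam, B, theta_g1=0.5, theta_g2=50.0)
    history.append(lam)
history = np.array(history)

# Number of grid cells with intensity above 1 early on and at the end.
print((history[5] > 1.0).sum(), (history[-1] > 1.0).sum())
```

The simulated intensity passes through the phases of an invasion listed earlier: local establishment, range expansion across the grid, and saturation near the carrying capacity θ_{G_2}.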
Parameter Models
θ ∼ Beta(aθ, bθ)
log(τ ) ∼ Gau(0,Στ ) [ spatial random field ]
log(λ1) ∼ Gau(0,Σλ) [ spatial random field ]
θG1 ∼ Gau(µ1, σ21)
θG2 ∼ IG (a2, b2)
As detailed in Hooten et al. (2007), considerable care went into the choice of the hyperparameters for these distributions, including information from previous studies and expert knowledge.
Note that in this model, the randomness in the dynamics comes from the initial condition (λ1) and the parameters in the evolution matrices, τ and θG.
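Drawing once from parameter models of this form might look like the sketch below. Every hyperparameter value is hypothetical, and the exponential covariance used for Στ and Σλ is an assumption made only so the example runs; the slides do not state the covariance form.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 6
sites = np.arange(n, dtype=float)            # hypothetical 1-D site locations

def exp_cov(sites, sigma2, range_):
    """ASSUMED exponential covariance; stands in for Sigma_tau and Sigma_lambda."""
    d = np.abs(sites[:, None] - sites[None, :])
    return sigma2 * np.exp(-d / range_)

Sigma_tau = exp_cov(sites, sigma2=0.5, range_=2.0)
Sigma_lam = exp_cov(sites, sigma2=1.0, range_=2.0)

theta = rng.beta(2.0, 2.0)                                     # theta ~ Beta(a, b)
tau = np.exp(rng.multivariate_normal(np.zeros(n), Sigma_tau))  # log(tau) ~ Gau(0, Sigma_tau)
lam1 = np.exp(rng.multivariate_normal(np.zeros(n), Sigma_lam)) # log(lambda_1) ~ Gau(0, Sigma_lambda)
theta_g1 = rng.normal(0.5, 0.1)                                # theta_G1 ~ Gau(mu_1, sigma_1^2)
# Inverse gamma IG(a, b): reciprocal of a Gamma(shape=a, rate=b) draw
theta_g2 = 1.0 / rng.gamma(shape=3.0, scale=1.0 / 20.0)
```

Modeling τ and λ1 on the log scale keeps both fields positive, and the inverse gamma prior does the same for the carrying capacity.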
77
Implementation
• Estimation and forecasting were implemented on a grid of points across the eastern two-thirds of the USA
• The Markov chain Monte Carlo (MCMC) algorithm was a combination of Gibbs and Metropolis-Hastings steps
• MCMC: 200,000 samples with 20,000 burn-in
• Data were available for 1986-2003 and were used for estimation.
• “Out-of-sample” forecasts were made for 2004-2020, based on the 1986-2003 data.
78
Posterior Means: In-Sample
79
Posterior Means: Out-of-Sample
80
Posterior Credible Intervals (95%)
Two Locations: S. Florida, N. Utah
81
Posterior Inference on Dispersion Parameters
Dispersion parameters (τ): Posterior mean (top), Posterior standard deviation (bottom)
82
Epilogue to ECD Analysis
• This analysis was conducted in the mid-2000s, based on data from 1986 through 2003.
• As of 2013, surveys show that ECD sightings are now relatively frequent throughout the continental US, with the exception of the northeast.
• The forecast made in our analysis was reasonable, if somewhat conservative in the speed of the invasion.
• Presumably, the forecast could have been improved if ecologically relevant covariates were used in the model for dispersal parameters.
83
Reduced Rank QN, Shrinkage, and Informative Priors
The important dynamics of many ecological and environmental systems exist on a lower-dimensional manifold. This suggests that reduced rank dynamics may be appropriate (i.e., the state dimension is p << n). Typically, there are still too many transition parameters to estimate without further restriction.
• Mechanistically motivated choices can be made to reduce the parameter space (e.g., certain nonlinear scale interactions are less likely for some processes; see Gladish and Wikle, 2014)
• Use shrinkage priors on transition parameters (e.g., stochastic search variable selection; see Wikle and Holan, 2011)
• Use priors derived from mechanistic model output (e.g., Leeds et al., 2013)
Consider the following example illustrating the latter.
84
Ocean Biogeophysical Coupling
NASA AMSR: Sea Surface Temperature (scientific visualization studio)
NASA SeaWiFS: Ocean Color (scientific visualization studio)
Complicated multivariate process with variability across many spatio-temporal scales.
Interactions:
• Between processes
• Within processes
Proxy for ocean primary production (i.e., chlorophyll/phytoplankton)
85
Ocean Color Observations
Coastal Gulf of Alaska SeaWiFS Ocean Color Satellite Observations (8-day averages)
“Gappy” and substantial measurement uncertainty! We seek to predict at missing locations and filter observation error.
(ocean color: surrogate for phytoplankton)
86
Complicated Model Components
• Lower Trophic Ecosystem
– Essentially a complicated multicomponent predator-prey system influenced by the environment (highly nonlinear)
• Physical Ocean
– Navier-Stokes fluid dynamic process across multiple state variables (highly nonlinear)
Coupled! (nonlinear)
The process of interest is multivariate, nonlinear, and spatio-temporal.
Sea Surface Height and Currents
87
Physical-Biological Interface
Sample output from a coupled ocean-ecology model (a deterministic model) in the coastal Gulf of Alaska for May 1, 2001 (Fiechter et al. 2008)
Sea surface height (SSH) and currents
Chlorophyll concentration and bathymetry
Strong association!
Note: we can learn about the biology by knowing something about the physics!
88
EXAMPLE: Spatio-temporal prediction of primary production (chlorophyll) in the Coastal Gulf of Alaska (GOGA)
• Data Assimilation: Combine primary production data with a mechanistic computer model of the coupled ocean and ecosystem (ROMS-NPZDFe; Fiechter et al. 2009)
• Surrogate: quadratic nonlinear emulator for the coupled model: Phytoplankton, SSH (sea surface height), and SST (sea surface temperature) model output
• Predict/Assimilate: Primary production given high-dimensional ocean color (SeaWiFS) satellite data and ocean model physical output
Gulf of Alaska
• Train based on 4 years (1998-2001), 8-day averages
• Predict/Assimilate for 2002
89
Coupled Dynamics: Example from Coupled Ocean Model
Phytoplankton, SSH, SST
Example training data. Time: consecutive 8-day periods
90
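Hierarchical Reduced-Rank GQN Emulator-Assisted DSTM
$$\mathbf{Z}_t = \mathbf{H}_t\boldsymbol{\Phi}\boldsymbol{\alpha}_t + \mathbf{H}_t\boldsymbol{\Theta}\boldsymbol{\beta}_t + \boldsymbol{\epsilon}_t, \quad \boldsymbol{\epsilon}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{R}_t)$$
$$\mathbf{Y}_t = \boldsymbol{\Phi}\boldsymbol{\alpha}_t + \boldsymbol{\Theta}\boldsymbol{\beta}_t$$
$$\boldsymbol{\alpha}_t = m(\boldsymbol{\alpha}_{t-1}; \boldsymbol{\theta}) + \boldsymbol{\eta}_t, \quad \boldsymbol{\eta}_t \sim \mathrm{Gau}(\mathbf{0}, \mathbf{Q})$$
$$\boldsymbol{\beta}_t \sim \mathrm{Gau}(\mathbf{0}, \mathrm{diag}(\boldsymbol{\tau}))$$
$$\boldsymbol{\theta} \sim (\boldsymbol{\theta}, \boldsymbol{\Sigma}_{\theta})$$
$$[\mathbf{R}_t, \mathbf{Q}, \boldsymbol{\tau}]$$
Note: m() is a quadratic nonlinear model (Leeds et al. 2013)
KEY POINT: means are from a GQN parametric statistical emulator, estimated “off-line”; alternatively, these can inform priors on an SSVS prior probability of inclusion.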
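To make the quadratic nonlinear evolution αt = m(αt−1; θ) + ηt concrete, here is a minimal sketch. The parameterization (a linear matrix `A` plus an array `Bq` of pairwise interaction coefficients) is one common way to write a general quadratic nonlinearity and is an assumption here; it is not necessarily the exact form used by Leeds et al. (2013). All values are made up.

```python
import numpy as np

def quadratic_step(alpha_prev, A, Bq):
    """m(alpha; theta) with theta = (A, Bq): linear part plus all pairwise products.

    Component i is sum_j A[i, j] * alpha_j + sum_{k, l} Bq[i, k, l] * alpha_k * alpha_l.
    """
    linear = A @ alpha_prev
    quad = np.einsum("ikl,k,l->i", Bq, alpha_prev, alpha_prev)
    return linear + quad

rng = np.random.default_rng(0)
p = 3                                        # reduced-rank state dimension (toy value)
A = 0.5 * np.eye(p)                          # assumed linear transition coefficients
Bq = 0.01 * rng.standard_normal((p, p, p))   # assumed quadratic interaction coefficients
Q_chol = 0.1 * np.eye(p)                     # Cholesky factor of an assumed Q

alpha = np.ones(p)
alphas = [alpha]
for _ in range(5):
    eta = Q_chol @ rng.standard_normal(p)    # eta_t ~ Gau(0, Q)
    alpha = quadratic_step(alpha, A, Bq) + eta
    alphas.append(alpha)
alphas = np.array(alphas)

n_params = p * p + p ** 3                    # coefficient count before any restriction
```

Even with p = 3 there are 36 transition coefficients before exploiting symmetry, and the count grows as p³, which is why shrinkage or emulator-informed priors on θ are attractive.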
91
Ocean Ecosystem Example
• In this case:
– State Rank Reduction: O(10^5) to O(10) (EOFs from the coupled ROMS-NPZDFe output for 1998-2001; 7 EOFs; 97.5% of the variation)
– Nonlinear surrogate: quadratic nonlinear model; the non-dynamic small-scale components were based on the next 10 singular vectors (over 99% of variation in model output)
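Data vector:
$$\mathbf{Z}_t = \begin{pmatrix} \mathbf{Z}_{1,t} \\ \mathbf{Z}_{2,t} \\ \mathbf{Z}_{3,t} \end{pmatrix}, \quad m_{i,t}\ (i = 1, 2, 3)\text{-dimensional data vectors for Chlorophyll, SSH, SST}$$
Process vector:
$$\mathbf{Y}_t = \begin{pmatrix} \mathbf{Y}_{1,t} \\ \mathbf{Y}_{2,t} \\ \mathbf{Y}_{3,t} \end{pmatrix}, \quad p_i\ (i = 1, 2, 3)\text{-dimensional reduced rank process vectors for Chlorophyll, SSH, SST}$$
(Work in log space for Chlorophyll)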
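The EOF-based rank reduction described on this slide can be sketched with a singular value decomposition. The data below are synthetic stand-ins for model output (time by space), and the 97.5% threshold is used only to mirror the figure quoted above; the dimensions, seed, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, k = 40, 200, 3                         # time points, locations, true patterns
patterns = rng.standard_normal((k, n))
amplitudes = rng.standard_normal((T, k)) * np.array([10.0, 5.0, 2.0])
X = amplitudes @ patterns + 0.1 * rng.standard_normal((T, n))

Xc = X - X.mean(axis=0)                      # remove the time mean at each location
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_frac = s ** 2 / np.sum(s ** 2)           # fraction of variation per EOF
cum_frac = np.cumsum(var_frac)

# Smallest number of EOFs whose cumulative fraction reaches 97.5%
p = int(np.searchsorted(cum_frac, 0.975) + 1)
Phi = Vt[:p].T                               # n x p matrix of leading EOFs
alpha = Xc @ Phi                             # T x p reduced-rank coefficients
recon = alpha @ Phi.T                        # rank-p reconstruction of Xc
```

The columns of `Phi` are orthonormal, so projecting onto them and reconstructing loses at most the variation left out of the first p EOFs.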
92
Results: log(CHL)
Data (SeaWiFS)
Posterior Mean
Posterior STD
93
Results (cont.)
94
Conclusion
There is much work to be done in the development of spatio-temporal statistical models, from theory, computation, and application perspectives. Some important things I didn’t talk about here (or only mentioned briefly); most of these are very active areas of research:
• Sampling models
• Computation
• Multivariate models
• Areal data
• Spatio-temporal point processes
• Change-of-support
• Model evaluation and “selection”
• Agent (Individual)-based models
• Gravity models, flow models, functional models
• Sampling network design
• Linkage across models
• Spatio-temporal confounding
95
THANK YOU!
The material in this tutorial was based loosely on the 2011 John Wiley & Sons book by Noel Cressie and Chris Wikle. Most references given on the slides can be found in the book’s bibliography; or, send me an email at ([email protected]) and I will gladly send you the reference.
96