Chapter 4 - Fundamentals of spatial processes
Lecture notes
Geir Storvik
January 21, 2013
STK4150 - Intro 1
Spatial processes
Typically correlation between nearby sites
Mostly positive correlation
Negative correlation when competition
Typically part of a space-time process
Temporal snapshot
Temporal aggregation
Statistical analysis
Incorporate spatial dependence into spatial statistical models
Relatively new in statistics, active research field!
Hierarchical (statistical) models
Data model
Process model
In time series setting - state space models
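Example
\[ Y_t = \alpha Y_{t-1} + W_t, \quad t = 2, 3, \ldots, \quad W_t \overset{ind}{\sim} (0, \sigma_W^2) \quad \text{(process model)} \]
\[ Z_t = \beta Y_t + \eta_t, \quad t = 1, 2, 3, \ldots, \quad \eta_t \overset{ind}{\sim} (0, \sigma_\eta^2) \quad \text{(data model)} \]
We will now do a similar type of modelling, separating the process model and the data model:
\[ Z(s_i) = Y(s_i) + \varepsilon(s_i), \quad \varepsilon(s_i) \sim \text{iid} \]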
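The time series example can be simulated in a few lines. The following is a minimal sketch, assuming Gaussian noise (the slides only specify mean zero and a variance); the function name, the argument names `sd_w` and `sd_eta`, and the parameter values are illustrative choices, not taken from the course material.

```python
import random

def simulate_state_space(n, alpha, beta, sd_w, sd_eta, y1=0.0, seed=1):
    """Simulate the process model Y_t = alpha*Y_{t-1} + W_t and
    the data model Z_t = beta*Y_t + eta_t (Gaussian noise assumed)."""
    rng = random.Random(seed)
    y = [y1]                                   # hidden process, started at y1
    for _ in range(1, n):
        y.append(alpha * y[-1] + rng.gauss(0.0, sd_w))
    # each observation depends only on the hidden state at the same time point
    z = [beta * yt + rng.gauss(0.0, sd_eta) for yt in y]
    return y, z

y, z = simulate_state_space(n=200, alpha=0.8, beta=1.0, sd_w=1.0, sd_eta=0.5)
```

Only `z` would be available in practice; `y` is the latent process that the hierarchical model tries to recover.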
Prediction
Typical aim:
Observed Z = (Z (s1), ...,Z (sm))
Want to predict Y (s0)
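Squared error loss function: the optimal predictor is $E(Y(s_0) \mid Z)$
Empirical methods: $E(Y(s_0) \mid Z) \approx E(Y(s_0) \mid Z, \hat\theta)$
Bayesian methods: $E(Y(s_0) \mid Z) = \int_\theta E(Y(s_0) \mid Z, \theta)\, p(\theta \mid Z)\, d\theta$
Require $E(Y(s_0) \mid Z, \theta)$
The components of Z are independent given Y, but dependent marginally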
Require model for {Y (s)}
Geostatistical models
Assume {Y (s)} is a Gaussian process
(Y (s1), ...,Y (sm)) is multivariate Gaussian for all m and s1, ..., sm
Need to specify
µ(s) = E (Y (s))
CY (s, s′) = cov(Y (s),Y (s′))
Usually assume second-order stationarity:
\[ \mu(s) = \mu, \quad \forall s \]
\[ C_Y(s, s') = C_Y(|s - s'|), \quad \forall s, s' \]
Possible extension:
µ(s) = x(s)Tβ
Often:
\[ Z(s_i) \mid Y(s_i), \sigma_\varepsilon^2 \sim \text{ind. Gau}(Y(s_i), \sigma_\varepsilon^2) \]
Prediction
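$(Z(s_1), \ldots, Z(s_m), Y(s_0))$ is multivariate Gaussian
Can use rules about conditional distributions:
\[ \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right) \]
\[ E(X_1 \mid X_2) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2) \]
\[ \text{var}(X_1 \mid X_2) = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \]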
Need
Expectations: As for ordinary linear regression
Covariances: New!
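As a small numerical illustration of the conditioning rule, the sketch below treats the bivariate case, where every block is a scalar. It computes the conditional mean and variance of X1 given X2; the function name and the numbers are illustrative assumptions.

```python
def conditional_gaussian(mu1, mu2, s11, s12, s22, x2):
    """Mean and variance of X1 | X2 = x2 for a bivariate Gaussian
    with means (mu1, mu2), variances (s11, s22) and covariance s12."""
    mean = mu1 + s12 / s22 * (x2 - mu2)   # mu1 + S12 S22^{-1} (x2 - mu2)
    var = s11 - s12 / s22 * s12           # S11 - S12 S22^{-1} S21
    return mean, var

# unit variances, correlation 0.8, observed x2 = 2
mean, var = conditional_gaussian(0.0, 0.0, 1.0, 0.8, 1.0, 2.0)
```

The conditional variance does not depend on the observed value, only on the covariance structure; this is why the covariances are the new ingredient compared with ordinary regression.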
Variogram and covariance function
Dependence can be specified through covariance functions
A covariance function does not always exist, but prediction can still be carried out
More general concept: Variogram
\[ 2\gamma_Y(h) \equiv \text{var}[Y(s + h) - Y(s)] \]
\[ = \text{var}[Y(s + h)] + \text{var}[Y(s)] - 2\,\text{cov}[Y(s + h), Y(s)] \]
\[ = 2 C_Y(0) - 2 C_Y(h) \]
Note:
The variogram can exist even if $\text{var}[Y(s)] = \infty$!
Existence of the variogram is often assumed in spatial statistics, hence the use of variograms
When $2\gamma_Y(h) = 2\gamma_Y(\|h\|)$, we get an isotropic variogram
Isotropic covariance functions/variograms
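Matérn covariance function
\[ C_Y(h; \theta) = \sigma_1^2 \{2^{\theta_2 - 1} \Gamma(\theta_2)\}^{-1} \{\|h\|/\theta_1\}^{\theta_2} K_{\theta_2}(\|h\|/\theta_1) \]
Powered-exponential
\[ C_Y(h; \theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^{\theta_2}\} \]
Exponential
\[ C_Y(h; \theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)\} \]
Gaussian
\[ C_Y(h; \theta) = \sigma_1^2 \exp\{-(\|h\|/\theta_1)^2\} \]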
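The powered-exponential family contains the exponential (power 1) and Gaussian (power 2) covariance functions as special cases, which the following sketch makes explicit. It uses only the Python standard library, so the Matérn function (which needs the modified Bessel function K) is left out; function names are illustrative.

```python
import math

def powered_exponential(dist, sigma2, theta1, theta2):
    """C_Y(h) = sigma2 * exp(-(||h||/theta1)^theta2), with dist = ||h||."""
    return sigma2 * math.exp(-((dist / theta1) ** theta2))

def exponential_cov(dist, sigma2, theta1):
    return powered_exponential(dist, sigma2, theta1, 1.0)

def gaussian_cov(dist, sigma2, theta1):
    return powered_exponential(dist, sigma2, theta1, 2.0)

def covariance_matrix(sites, cov, **params):
    """Covariance matrix {C_Y(||s_i - s_j||)} for sites given as coordinate tuples."""
    return [[cov(math.dist(a, b), **params) for b in sites] for a in sites]

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
C = covariance_matrix(sites, exponential_cov, sigma2=1.0, theta1=2.0)
```

The diagonal of `C` equals the variance `sigma2`, and the off-diagonal entries decay with the distance between sites.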
Bochner’s theorem
A covariance function needs to be positive definite
Theorem (Bochner, 1955)
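If $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |C_Y(h)|\, dh < \infty$, then a valid real-valued covariance function can be written as
\[ C_Y(h) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \cos(\omega^T h)\, f_Y(\omega)\, d\omega \]
where $f_Y(\omega) \ge 0$ is symmetric about $\omega = 0$.
$f_Y(\omega)$: Spectral density of $C_Y(h)$.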
Nugget effect and Sill
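We have
\[ C_Z(h) = \text{cov}[Z(s), Z(s + h)] \]
\[ = \text{cov}[Y(s) + \varepsilon(s), Y(s + h) + \varepsilon(s + h)] \]
\[ = \text{cov}[Y(s), Y(s + h)] + \text{cov}[\varepsilon(s), \varepsilon(s + h)] \]
\[ = C_Y(h) + \sigma_\varepsilon^2 I(h = 0) \]
Assume
\[ C_Y(0) = \sigma_Y^2 \]
\[ \lim_{h \to 0} [C_Y(0) - C_Y(h)] = 0 \]
Then
\[ C_Z(0) = \sigma_Y^2 + \sigma_\varepsilon^2 \quad \text{(sill)} \]
\[ \lim_{h \to 0} [C_Z(0) - C_Z(h)] = \sigma_\varepsilon^2 = c_0 \quad \text{(nugget effect)} \]
Possible to include nugget effect also in Y-process.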
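As a worked example of nugget and sill, assume (purely for illustration) that the process has an exponential covariance function with range parameter $\theta_1$. The semivariogram of Z then follows directly from $\gamma_Z(h) = C_Z(0) - C_Z(h)$:

```latex
\[ C_Y(h) = \sigma_Y^2 \exp\{-\|h\|/\theta_1\} \]
\[ \gamma_Z(h) = C_Z(0) - C_Z(h)
   = \sigma_\varepsilon^2 + \sigma_Y^2 \left(1 - \exp\{-\|h\|/\theta_1\}\right),
   \quad h \neq 0 \]
\[ \lim_{h \to 0} \gamma_Z(h) = \sigma_\varepsilon^2 \ \text{(nugget)},
   \qquad
   \lim_{\|h\| \to \infty} \gamma_Z(h) = \sigma_Y^2 + \sigma_\varepsilon^2 \ \text{(sill)} \]
```

The semivariogram therefore jumps from 0 at $h = 0$ to the nugget just away from the origin and levels off at the sill for large distances.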
Nugget/sill
Estimation of variogram/covariance function
\[ 2\gamma_Z(h) \equiv \text{var}[Z(s + h) - Z(s)] \]
\[ = E[Z(s + h) - Z(s)]^2 \quad \text{(constant expectation)} \]
Can estimate from "all" pairs of sites separated by h.
Problem: Few or no pairs for any given h
Simplifications
\[ \gamma_Z(h) = \gamma_Z^0(\|h\|) \]
Use $2\hat\gamma_Z^0(h) = \text{ave}\{(Z(s_i) - Z(s_j))^2 ;\ \|s_i - s_j\| \in T(h)\}$
If covariates, use residuals
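The averaging over pairs can be sketched as follows. This is a minimal illustration, assuming that the tolerance regions T(h) are taken to be consecutive distance bins of equal width; the function name, the binning rule and the toy data are assumptions for illustration only.

```python
import math

def empirical_semivariogram(sites, z, bin_width, n_bins):
    """Isotropic empirical semivariogram: for each distance bin
    [k*bin_width, (k+1)*bin_width), average (z_i - z_j)^2 / 2 over pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            d = math.dist(sites[i], sites[j])
            k = int(d // bin_width)            # index of the bin containing d
            if k < n_bins:
                sums[k] += (z[i] - z[j]) ** 2
                counts[k] += 1
    # 2*gamma = average squared difference, so gamma = sum / (2 * count)
    gamma = [s / (2 * c) if c > 0 else None for s, c in zip(sums, counts)]
    return gamma, counts

# five sites on a line with a linear trend in the observations
sites = [(float(i), 0.0) for i in range(5)]
z = [0.0, 1.0, 2.0, 3.0, 4.0]
gamma, counts = empirical_semivariogram(sites, z, bin_width=1.0, n_bins=3)
```

Bins without any pairs return `None`, which mirrors the "few or no pairs" problem. The steadily increasing values in this toy data come from the trend, which is why residuals should be used when covariates are present.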
Boreality data
Empirical variogram: Boreality data
Testing for independence
If independence: $\gamma_Z^0(h) = \sigma_Z^2$
Test statistic: $F = \hat\gamma_Z(h_1)/\hat\sigma_Z^2$, $h_1$ smallest observed distance
Reject H0 for $|F - 1|$ large
Permutation test: Recalculate F for all permutations of Z
If observed F is above 97.5% percentile, reject H0
Boreality example:
P-value = 0.0001
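The permutation test can be sketched for data on a regular one-dimensional grid, where the smallest observed distance corresponds to neighbouring grid points. This is only an illustration under stated assumptions: it uses a random sample of permutations rather than all of them, measures extremeness by $|F - 1|$, and runs on made-up data, not the Boreality data.

```python
import random

def f_statistic(z):
    """F = gamma_hat(h1) / sigma_hat^2, with h1 the grid spacing."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)
    gamma1 = sum((z[i + 1] - z[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))
    return gamma1 / var

def permutation_test(z, n_perm=999, seed=0):
    """Monte Carlo permutation p-value for H0: independence."""
    rng = random.Random(seed)
    f_obs = f_statistic(z)
    perm = list(z)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(perm)                      # destroys any spatial structure
        if abs(f_statistic(perm) - 1.0) >= abs(f_obs - 1.0):
            extreme += 1
    return f_obs, (extreme + 1) / (n_perm + 1)

z = [0.1 * i for i in range(30)]               # strongly dependent toy series
f_obs, p_value = permutation_test(z)
```

For smooth data like this, neighbouring values are much more alike than the overall variance suggests, so F is far below 1 and the p-value is small.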
Kriging = Prediction
Model
Y =Xβ + δ
Z =Y + ε
Prediction of $Y(s_0)$
Linear predictors $\{L^T Z + k\}$
The optimal predictor minimizes
\[ \text{MSPE}(L, k) \equiv E[Y(s_0) - L^T Z - k]^2 \]
\[ = \text{var}[Y(s_0) - L^T Z - k] + \{E[Y(s_0) - L^T Z - k]\}^2 \]
Note: No distributional assumptions are made
Kriging
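\[ \text{MSPE}(L, k) = \text{var}[Y(s_0) - L^T Z - k] + \{E[Y(s_0) - L^T Z - k]\}^2 \]
\[ = \text{var}[Y(s_0) - L^T Z - k] + \{\mu_Y(s_0) - L^T \mu_Z - k\}^2 \]
Second term is zero if $k = \mu_Y(s_0) - L^T \mu_Z$.
First term ($c(s_0) = \text{cov}[Z, Y(s_0)]$):
\[ \text{var}[Y(s_0) - L^T Z - k] = C_Y(s_0, s_0) - 2 L^T c(s_0) + L^T C_Z L \]
Setting the derivative with respect to L to zero:
\[ -2 c(s_0) + 2 C_Z L = 0 \]
\[ L^* = C_Z^{-1} c(s_0) \]
giving
\[ Y^*(s_0) = \mu_Y(s_0) + c(s_0)^T C_Z^{-1} [Z - \mu_Z] \]
\[ \text{MSPE}(L^*, k^*) = C_Y(s_0, s_0) - c(s_0)^T C_Z^{-1} c(s_0) \]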
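The predictor and its MSPE can be computed with a single linear solve. The sketch below is a minimal illustration assuming one-dimensional sites, a constant known mean, an exponential covariance function for Y and independent measurement noise with variance `sigma2_eps`; all names and numbers are illustrative. A small Gaussian-elimination routine is included to keep the example self-contained.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def simple_kriging(sites, z, s0, mu, sigma2_y, theta1, sigma2_eps):
    """Return the simple kriging predictor Y*(s0) and its MSPE."""
    cov = lambda a, b: sigma2_y * math.exp(-abs(a - b) / theta1)
    m = len(sites)
    # C_Z = C_Y + sigma2_eps * I
    C_Z = [[cov(sites[i], sites[j]) + (sigma2_eps if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    c0 = [cov(s, s0) for s in sites]            # c(s0) = cov[Z, Y(s0)]
    L = solve(C_Z, c0)                          # L* = C_Z^{-1} c(s0)
    pred = mu + sum(li * (zi - mu) for li, zi in zip(L, z))
    mspe = sigma2_y - sum(li * ci for li, ci in zip(L, c0))
    return pred, mspe

pred, mspe = simple_kriging([0.0, 1.0, 2.0], [1.0, 3.0, 2.0], s0=1.5,
                            mu=2.0, sigma2_y=1.0, theta1=1.0, sigma2_eps=0.1)
```

Without measurement noise the predictor interpolates the data at observed sites, and far from all observations it falls back to the mean with MSPE equal to the process variance.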
Kriging
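\[ c(s_0) = \text{cov}[Z, Y(s_0)] = \text{cov}[Y, Y(s_0)] = (C_Y(s_0, s_1), \ldots, C_Y(s_0, s_m))^T = c_Y(s_0) \]
\[ C_Z = \{C_Z(s_i, s_j)\} \]
\[ C_Z(s_i, s_j) = \begin{cases} C_Y(s_i, s_i) + \sigma_\varepsilon^2, & s_i = s_j \\ C_Y(s_i, s_j), & s_i \neq s_j \end{cases} \]
\[ Y^*(s_0) = \mu_Y(s_0) + c_Y(s_0)^T C_Z^{-1} (Z - \mu_Z) \]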
Gaussian assumptions
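Assume now Y, Z are MVN:
\[ \begin{pmatrix} Z \\ Y(s_0) \end{pmatrix} \sim \text{MVN}\left( \begin{pmatrix} \mu_Z \\ \mu_Y(s_0) \end{pmatrix}, \begin{pmatrix} C_Z & c_Y(s_0) \\ c_Y(s_0)^T & C_Y(s_0, s_0) \end{pmatrix} \right) \]
This gives
\[ Y(s_0) \mid Z \sim N\left(\mu_Y(s_0) + c(s_0)^T C_Z^{-1} [Z - \mu_Z],\ C_Y(s_0, s_0) - c(s_0)^T C_Z^{-1} c(s_0)\right) \]
Same as kriging!
The book derives this directly, without using the formula for the conditional distribution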
Kriging (cont)
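\[ Y^*(s_0) = \mu_Y(s_0) + c_Y(s_0)^T C_Z^{-1} (Z - \mu_Z) \]
Assuming
\[ E[Y(s)] = x(s)^T \beta \]
\[ Z(s_i) \mid Y(s_i), \sigma_\varepsilon^2 \sim \text{ind. Gau}(Y(s_i), \sigma_\varepsilon^2) \]
Then
\[ \mu_Z = \mu_Y = X\beta \]
\[ C_Z = \Sigma_Y + \sigma_\varepsilon^2 I \]
\[ c_Y(s_0) = (C_Y(s_0, s_1), \ldots, C_Y(s_0, s_m))^T \]
and
\[ Y^*(s_0) = x(s_0)^T \beta + c_Y(s_0)^T [\Sigma_Y + \sigma_\varepsilon^2 I]^{-1} (Z - X\beta) \]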
Summary previous time/plan
Simple kriging
Linear predictor
Assume parameters known
Equal to conditional expectation
Unknown parameters
Ordinary kriging
Plug-in estimates
Bayesian approach
Non-Gaussian models
Unknown parameters
So far assumed parameters known, what if unknown?
Plug-in estimate/Empirical Bayes
Bayesian approach
Direct approach - Ordinary kriging
Estimation
Likelihood
\[ L(\theta) = p(z; \theta) = \int_y p(z \mid y; \theta)\, p(y; \theta)\, dy \]
Gaussian processes: Can be derived analytically
Optimization can still be problematic
Many (too many?) routines in R available
> library(nlme)
> gls(Bor ~ Wet, correlation = corGaus(form = ~ x + y, nugget = TRUE), data = Boreality)
Generalized least squares fit by REML
Model: Bor ~ Wet
Data: Boreality
Log-restricted-likelihood: -1363.146
Coefficients:
(Intercept) Wet
15.06705 77.66940
Correlation Structure: Gaussian spatial correlation
Formula: ~x + y
Parameter estimate(s):
range nugget
460.3941829 0.6112509
Degrees of freedom: 533 total; 531 residual
Residual standard error: 3.636527
Bayesian approach
Hierarchical model
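Variable             Densities    Notation in book
Data model: Z        p(Z|Y,θ)     [Z|Y,θ]
Process model: Y     p(Y|θ)       [Y|θ]
Parameter: θ
Simultaneous model: $p(y, z \mid \theta) = p(z \mid y, \theta)\, p(y \mid \theta)$
Marginal model: $L(\theta) = p(z \mid \theta) = \int_y p(z, y \mid \theta)\, dy$
Inference: $\hat\theta = \arg\max_\theta L(\theta)$
Bayesian approach: Include model on θ
Variable             Densities    Notation in book
Data model: Z        p(Z|Y,θ)     [Z|Y,θ]
Process model: Y     p(Y|θ)       [Y|θ]
Parameter model: θ   p(θ)         [θ]
Simultaneous model: $p(y, z, \theta) = p(z \mid y, \theta)\, p(y \mid \theta)\, p(\theta)$
Marginal model: $p(z) = \int_\theta \int_y p(z, y \mid \theta)\, p(\theta)\, dy\, d\theta$
Inference: $\hat\theta = \int_\theta \theta\, p(\theta \mid z)\, d\theta$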
Bayesian approach - example
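Assume $z_1, \ldots, z_m$ are iid, $z_i = \mu + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$
$\sigma^2$ known, interest in estimating $\mu$.
ML: $\hat\mu_{ML} = \bar z$, $\text{Var}[\bar z] = \sigma^2/m$
Bayesian approach: Assume $\mu \sim N(\mu_0, k\sigma^2)$
Can show that
\[ E[\mu \mid z] = \frac{k^{-1}\mu_0 + m\bar z}{k^{-1} + m} \xrightarrow{k \to \infty} \bar z \]
\[ \text{Var}[\mu \mid z] = \frac{\sigma^2}{k^{-1} + m} \xrightarrow{k \to \infty} \frac{\sigma^2}{m} \]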
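The posterior formulas are easy to check numerically. The sketch below evaluates the posterior mean and variance of the mean parameter for a small made-up data set, and shows that a very large k (a vague prior) reproduces the ML results; the function name and the data are illustrative.

```python
def posterior_mu(z, sigma2, mu0, k):
    """Posterior mean and variance of mu when z_i ~ N(mu, sigma2) iid
    and the prior is mu ~ N(mu0, k * sigma2)."""
    m = len(z)
    zbar = sum(z) / m
    mean = (mu0 / k + m * zbar) / (1.0 / k + m)
    var = sigma2 / (1.0 / k + m)
    return mean, var

z = [1.0, 2.0, 3.0]
mean_informative, var_informative = posterior_mu(z, sigma2=1.0, mu0=0.0, k=1.0)
mean_vague, var_vague = posterior_mu(z, sigma2=1.0, mu0=0.0, k=1e12)
```

With k = 1 the prior counts as one extra observation at the prior mean, pulling the estimate from 2 towards 0; with a very large k the posterior mean and variance approach the sample mean and the ML variance.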
Prediction - example
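Predict new point $z_0$.
Plug-in rule: $\hat z_0 = \bar z$, $\text{var}[z_0] = \sigma^2$
Bayesian approach:
\[ E[z_0 \mid z] = E[E[z_0 \mid \mu, z]] = E[\mu \mid z] = \frac{k^{-1}\mu_0 + m\bar z}{k^{-1} + m} \xrightarrow{k \to \infty} \bar z \]
\[ \text{var}[z_0 \mid z] = E[\text{var}[z_0 \mid \mu, z]] + \text{var}[E[z_0 \mid \mu, z]] \]
\[ = \sigma^2 + \text{var}[\mu \mid z] \]
\[ = \sigma^2 + \frac{\sigma^2}{k^{-1} + m} \xrightarrow{k \to \infty} \sigma^2 + \frac{\sigma^2}{m} \]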
Taking uncertainty in µ into account.
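The difference between the plug-in rule and the Bayesian predictive variance can be made concrete with a few numbers. This is a small sketch with made-up data; the function name is illustrative.

```python
def bayes_prediction(z, sigma2, mu0, k):
    """Predictive mean and variance of a new observation z0 given z,
    for z_i ~ N(mu, sigma2) iid and prior mu ~ N(mu0, k * sigma2)."""
    m = len(z)
    zbar = sum(z) / m
    post_mean = (mu0 / k + m * zbar) / (1.0 / k + m)
    post_var = sigma2 / (1.0 / k + m)
    # law of total variance: E[var(z0 | mu)] + var(E[z0 | mu])
    return post_mean, sigma2 + post_var

z = [1.0, 2.0, 3.0]
sigma2 = 1.0
plug_in_var = sigma2                      # ignores that mu is estimated
pred_mean, pred_var = bayes_prediction(z, sigma2, mu0=0.0, k=1e12)
```

With three observations and a vague prior, the predictive variance is about 4/3 of the plug-in variance; the gap shrinks as the number of observations grows.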
INLA
INLA = Integrated nested Laplace approximation (software)
C code, R interface
Can be installed by
source("http://www.math.ntnu.no/inla/givemeINLA.R")
Both empirical Bayes and Bayes
Example - simulated data
Assume
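\[ Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta \delta(s), \quad \{\delta(s)\} \text{ Matérn correlation} \]
\[ Z(s_i) = Y(s_i) + \sigma_\varepsilon \varepsilon(s_i), \quad \{\varepsilon(s_i)\} \text{ independent} \]
Simulations:
Simulation on 20 × 30 grid
{x(s)} sine curve in horizontal direction
Parameter values $(\beta_0, \beta_1) = (2, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$
Show images
Show code/results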
Matern model - parametrization
Source   Model                                          Range par   Smoothness par
Book     ∝ {||h||/θ1}^θ2 K_θ2(||h||/θ1)                 θ1          θ2
geoR     ∝ {||h||/φ}^κ K_κ(||h||/φ)                     φ           κ
INLA     ∝ {||h|| ∗ κ}^ν K_ν(||h|| ∗ κ)                 1/κ         ν
Note: INLA reports an estimate of another "range", defined as
\[ r = \frac{\sqrt{8}}{\kappa} \]
corresponding to a distance where the covariance function is approximately zero. An estimate of the range parameter can then be obtained by
\[ \frac{1}{\kappa} = \frac{r}{\sqrt{8}} \]
Note: INLA works with precisions $\tau_y = \sigma_y^{-2}$, $\tau_z = \sigma_z^{-2}$.
Output INLA
Empirical Bayes
Fixed effects:
                 mean          sd 0.025quant  0.5quant 0.975quant kld
(Intercept) 2.5757115 0.006218343  2.5635180 2.5757114  2.5879077   0
x           0.3714519 0.040367838  0.2922947 0.3714515  0.4506262   0
Random effects:
Name   Model             Max KLD
delta  Matern2D model
Model hyperparameters:
                                           mean     sd 0.025quant 0.5quant
Precision for the Gaussian observations 28.8979 1.5841    25.9174  28.8546
Precision for delta                      2.2549 0.1507     1.9738   2.2498
Range for delta                          5.9772 0.2936     5.4222   5.9700
                                        0.975quant
Precision for the Gaussian observations    32.1247
Precision for delta                         2.5644
Range for delta                             6.5730
Marginal Likelihood: -455.08
Warning: Interpret the marginal likelihood with care if the prior model is improper.
Posterior marginals for linear predictor and fitted values computed
Result - INLA - empirical Bayes
Parameter   True value   Estimate
β0          2.0          2.423
β1          0.3          0.357
σz          0.3          1/√13.2739 = 0.274
σy          1.0          1/√0.5802 = 1.313
φ1          2.0          5.977/√8 = 2.113
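The conversions in the estimate column can be reproduced directly: precisions are turned into standard deviations, and the range reported by INLA is divided by the square root of 8. The helper names below are illustrative; the input numbers are the ones shown in the table.

```python
import math

def precision_to_sd(tau):
    """Standard deviation from a precision: sigma = tau^(-1/2)."""
    return 1.0 / math.sqrt(tau)

def inla_range_to_range_par(r):
    """Range parameter 1/kappa from the INLA range r = sqrt(8)/kappa."""
    return r / math.sqrt(8.0)

sigma_z = precision_to_sd(13.2739)          # measurement noise sd
sigma_y = precision_to_sd(0.5802)           # process sd
range_par = inla_range_to_range_par(5.977)  # range in the book's parametrization
```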
Ordinary kriging
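$E[Y(s)] = \mu$, $\mu$ unknown, observations $Z = (Z_1, \ldots, Z_m)$
\[ \text{MSPE}(\lambda) = E[Y(s_0) - \lambda^T Z]^2 \]
Constraint: $E[\lambda^T Z] = E[Y(s_0)] = \mu$
\[ E[\lambda^T Z] = \lambda^T E[Z] = \lambda^T 1 \mu \Rightarrow \lambda^T 1 = 1 \]
Lagrange multiplier: Minimize $\text{MSPE}(\lambda) - 2\kappa(\lambda^T 1 - 1)$
\[ \text{MSPE}(\lambda) = C_Y(s_0, s_0) - 2\lambda^T c_Y(s_0) + \lambda^T C_Z \lambda \]
Differentiation with respect to $\lambda$:
\[ -2 c_Y(s_0) + 2 C_Z \lambda = 2\kappa 1 \]
\[ \lambda^T 1 = 1 \]
\[ \lambda^* = C_Z^{-1} [c_Y(s_0) + \kappa^* 1] \]
\[ \kappa^* = \frac{1 - c_Y(s_0)^T C_Z^{-1} 1}{1^T C_Z^{-1} 1} \]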
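The ordinary kriging weights can be computed with two linear solves, one for $C_Z^{-1} c_Y(s_0)$ and one for $C_Z^{-1} 1$. The sketch below is a minimal illustration assuming one-dimensional sites, an exponential covariance function and independent measurement noise; all names and numbers are illustrative choices.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ordinary_kriging(sites, z, s0, sigma2_y, theta1, sigma2_eps):
    """Return the ordinary kriging predictor and the weights lambda*."""
    cov = lambda a, b: sigma2_y * math.exp(-abs(a - b) / theta1)
    m = len(sites)
    C_Z = [[cov(sites[i], sites[j]) + (sigma2_eps if i == j else 0.0)
            for j in range(m)] for i in range(m)]
    c0 = [cov(s, s0) for s in sites]
    a = solve(C_Z, c0)                    # C_Z^{-1} c_Y(s0)
    b = solve(C_Z, [1.0] * m)             # C_Z^{-1} 1
    kappa = (1.0 - sum(a)) / sum(b)       # Lagrange multiplier kappa*
    lam = [ai + kappa * bi for ai, bi in zip(a, b)]
    pred = sum(li * zi for li, zi in zip(lam, z))
    return pred, lam

pred, lam = ordinary_kriging([0.0, 1.0, 3.0], [1.0, 3.0, 2.0], s0=2.0,
                             sigma2_y=1.0, theta1=1.0, sigma2_eps=0.1)
```

The weights sum to one by construction, so the predictor is unbiased whatever the unknown mean is; in particular, constant data are reproduced exactly.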
Kriging
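Simple kriging, $E\,Y(s) = x(s)^T \beta$, known
\[ Y^*(s_0) = x(s_0)^T \beta + c_Y(s_0)^T C_Z^{-1} (Z - X\beta) \]
Ordinary kriging, $E\,Y(s) = \mu$, unknown
\[ \hat Y(s_0) = \left\{ c_Y(s_0) + 1\, \frac{1 - 1^T C_Z^{-1} c_Y(s_0)}{1^T C_Z^{-1} 1} \right\}^T C_Z^{-1} Z \]
\[ = \hat\mu_{gls} + c_Y(s_0)^T C_Z^{-1} (Z - 1 \hat\mu_{gls}) \]
\[ \hat\mu_{gls} = [1^T C_Z^{-1} 1]^{-1} 1^T C_Z^{-1} Z \]
Universal kriging, $E\,Y(s) = x(s)^T \beta$, unknown
\[ \hat Y(s_0) = x(s_0)^T \hat\beta_{gls} + c_Y(s_0)^T C_Z^{-1} (Z - X \hat\beta_{gls}) \]
\[ \hat\beta_{gls} = [X^T C_Z^{-1} X]^{-1} X^T C_Z^{-1} Z \]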
Non-Gaussian data
Counts, binary data: Gaussian assumption inappropriate
Can still have Gaussian assumption on latent process, but non-Gaussian data distribution
\[ Y(s) = x(s)^T \beta + \varepsilon(s), \quad \{\varepsilon(s)\} \text{ Gaussian process} \]
\[ Z(s_i) \mid Y(s_i), \theta_1 \sim \text{ind. } f(Y(s_i), \theta_1) \]
Best linear predictor still possible, but not reasonable (?)
Conditional expectation $E[Y(s_0) \mid Z]$ still optimal under squared error loss
Not easy to compute anymore
Exponential-family model (EFM)
\[ f(z) = \exp\{(z\eta - b(\eta))/a(\theta_1) + c(z, \theta_1)\}, \quad \eta = x^T \beta \]
Includes Binomial, Poisson, Gaussian, Gamma
Estimation
Likelihood
\[ L(\theta) = p(z; \theta) = \int_y p(z \mid y; \theta)\, p(y; \theta)\, dy \]
Gaussian processes: Can be derived analytically
Non-Gaussian: Can be problematic
Monte Carlo
Laplace approximations
Computation of conditional expectation
E [Y (s0)|Z] = E [E [Y (s0)|Y,Z]]
Monte Carlo:
\[ E[E[Y(s_0) \mid Y, Z]] \approx \frac{1}{M} \sum_{m=1}^{M} E[Y(s_0) \mid Y_m, Z] \]
where $Y_m \sim [Y \mid Z]$ (MCMC)
Laplace approximation: Approximate $[Y \mid Z]$ by a Gaussian
Computer software available for both approaches.
Also gives uncertainty measures $\text{var}[Y(s_0) \mid Z]$.
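The Monte Carlo average can be illustrated on a deliberately small toy model. The sketch assumes a single latent value Y with a N(0, 1) prior, one Poisson count Z with mean exp(Y), and a target Y(s0) that is standard normal with correlation `rho` to Y, so that the inner expectation equals `rho * Y`. Samples from [Y|Z] are drawn with a random-walk Metropolis sampler. The model, the names and the tuning values are all assumptions made for this illustration.

```python
import math
import random

def log_post(y, z):
    """log [Y | Z = z] up to a constant: Poisson(exp(y)) likelihood, N(0, 1) prior."""
    return z * y - math.exp(y) - 0.5 * y * y

def sample_y_given_z(z, n_samples, step=1.0, burn_in=1000, seed=0):
    """Random-walk Metropolis samples from [Y | Z = z]."""
    rng = random.Random(seed)
    y = 0.0
    lp = log_post(y, z)
    samples = []
    for it in range(n_samples + burn_in):
        prop = y + rng.gauss(0.0, step)
        lp_prop = log_post(prop, z)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            y, lp = prop, lp_prop
        if it >= burn_in:
            samples.append(y)
    return samples

def mc_conditional_expectation(z, rho, n_samples=20000):
    """(1/M) * sum_m E[Y(s0) | Y_m, Z], with E[Y(s0) | Y, Z] = rho * Y here."""
    samples = sample_y_given_z(z, n_samples)
    return sum(rho * y for y in samples) / len(samples)

estimate = mc_conditional_expectation(z=3, rho=0.5)
```

The same samples could also be used to estimate the conditional variance, which is how the uncertainty measures mentioned above are obtained in practice.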
Laplace approximation
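\[ L(\theta) = p(z; \theta) = \int_y p(z \mid y; \theta)\, p(y; \theta)\, dy = \int_y \exp\{f(y; \theta)\}\, dy \]
where
\[ f(y; \theta) = \log p(z \mid y; \theta) + \log p(y; \theta) \]
Taylor approximation, $y_0$ max-point of $f(y; \theta)$ (depending on $\theta$)
\[ f(y) \approx f(y_0) - \frac{1}{2} (y - y_0)^T H (y - y_0), \quad H = -\left. \frac{\partial^2}{\partial y\, \partial y^T} f(y) \right|_{y = y_0} \]
\[ L(\theta) \approx \int_y \exp\{f(y_0) - \frac{1}{2} (y - y_0)^T H (y - y_0)\}\, dy \]
\[ = \exp\{f(y_0)\} (2\pi)^{m/2} |H|^{-1/2} \]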
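A one-dimensional case (m = 1) shows how well the approximation can work. The sketch assumes a single Poisson count z with mean exp(y) and a Gaussian prior y ~ N(mu, sigma2); the mode is found by Newton's method and the result is compared with a trapezoid-rule integral. The model, the names and the numerical settings are illustrative assumptions.

```python
import math

def f(y, z, mu, sigma2):
    """f(y) = log p(z|y) + log p(y) for z ~ Poisson(exp(y)), y ~ N(mu, sigma2)."""
    log_lik = z * y - math.exp(y) - math.lgamma(z + 1)
    log_prior = -0.5 * math.log(2 * math.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)
    return log_lik + log_prior

def laplace_likelihood(z, mu, sigma2, n_iter=50):
    """exp{f(y0)} (2 pi)^{1/2} H^{-1/2}, with y0 the maximizer of f."""
    y0 = math.log(z + 1.0)                     # starting value near the data
    for _ in range(n_iter):                    # Newton iterations on f'(y) = 0
        grad = z - math.exp(y0) - (y0 - mu) / sigma2
        hess = -math.exp(y0) - 1.0 / sigma2
        y0 -= grad / hess
    H = math.exp(y0) + 1.0 / sigma2            # minus the second derivative at y0
    return math.exp(f(y0, z, mu, sigma2)) * math.sqrt(2 * math.pi / H)

def numeric_likelihood(z, mu, sigma2, lo=-10.0, hi=10.0, n=20000):
    """Trapezoid-rule approximation of the integral of exp{f(y)} over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.5 * (math.exp(f(lo, z, mu, sigma2)) + math.exp(f(hi, z, mu, sigma2)))
    for i in range(1, n):
        total += math.exp(f(lo + i * step, z, mu, sigma2))
    return total * step

approx = laplace_likelihood(z=3, mu=1.0, sigma2=1.0)
reference = numeric_likelihood(z=3, mu=1.0, sigma2=1.0)
```

In higher dimensions the numerical integral is no longer feasible, while the Laplace approximation only needs the mode and the determinant of H.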
Example - simulated data
Assume
\[ Y(s) = \beta_0 + \beta_1 x(s) + \sigma_\delta \delta(s), \quad \{\delta(s)\} \text{ Matérn correlation} \]
\[ Z(s_i) \sim \text{Poisson}(\exp(Y(s_i))) \]
Simulations:
Simulation on 20 × 30 grid
{x(s)} sine curve in horizontal direction
Parameter values $(\beta_0, \beta_1) = (2, 0.3)$, $\sigma_\delta = 1$, $\sigma_\varepsilon = 0.3$ and $(\theta_1, \theta_2) = (2, 3)$
Show images
Show code/results