Flexible covariance modeling of multivariate
spatio-temporal processes
Rafael Santos Erbisti
Universidade Federal do Rio de Janeiro
Instituto de Matemática
2019
Flexible covariance modeling of multivariate spatio-temporal
processes
Rafael Santos Erbisti
Thesis submitted to the Graduate Program in Statistics of the Institute of Mathematics, Federal
University of Rio de Janeiro, in partial fulfillment of the requirements for the degree of Doctor in Statistics.
Approved by:
Thaís C. O. da Fonseca (Advisor)
DME/IM/UFRJ.
Dani Gamerman
DME/IM/UFRJ.
Marina Silva Paez
DME/IM/UFRJ.
Flávio Bambirra Gonçalves
DEST/ICEx/UFMG.
Gustavo da Silva Ferreira
ENCE/IBGE.
Rio de Janeiro, RJ - Brazil
2019
CIP - Cataloging in Publication
Generated by UFRJ's automatic cataloging system from data provided by the author, under the responsibility of Miguel Romeu Amorim Neto - CRB-7/6283.
E65f Erbisti, Rafael Santos. Flexible covariance modeling of multivariate spatio-temporal processes / Rafael Santos Erbisti. - Rio de Janeiro, 2019. 71 f.
Advisor: Thaís Cristina Oliveira da Fonseca. Co-advisor: Mariane Branco Alves. Thesis (PhD) - Universidade Federal do Rio de Janeiro, Instituto de Matemática, Graduate Program in Statistics, 2019.
1. geostatistics. 2. multivariate spatio-temporal models. 3. nonseparable cross-covariance function. 4. latent dimensions. 5. Bayesian inference. I. Fonseca, Thaís Cristina Oliveira da, advisor. II. Alves, Mariane Branco, co-advisor. III. Title.
“... life is a bullet train, partner,
and we are just passengers about to depart”
(Excerpt from the song Trem-Bala, by Ana Vilela)
Acknowledgments
I thank everyone who contributed in some way to this work, especially:
My advisors, Thaís C. O. Fonseca and Mariane B. Alves, for their teaching, availability, patience and kindness. I am grateful for the opportunity to work with such excellent researchers. The knowledge and experience you shared were essential to my growth and development. I know that not every moment was easy, but we know that difficulties will always appear and we must learn to work around them. I learned a lot from you and am very grateful for everything you did. Thank you very much!
To my parents, Belinha and Renzo, who always believed in me and supported me at every moment. Without you, none of this would have been possible. To my sister, Juliana, for her constant friendship and support. And, of course, to the newest member of the Erbisti family, Pedrinho! Even in the rush to finish this thesis, I managed to be present at his birth!
To my love, Paloma, for always being by my side. Your love, partnership, affection and patience were fundamental for me to get this far. Thank you for everything!
I also thank the friends who accompanied me along this journey, always encouraging me and making me believe it was possible!
Resumo
Modeling the dependence of a vector of spatially correlated variables is a challenging task, since the specification of valid cross-covariance functions that represent real processes is non-trivial. In order to satisfy the validity requirement, several proposals consider simplified formulations, such as separable covariance functions, which imply the same spatial range for all components of the response vector. This is rarely a realistic assumption in spatial applications. In this context, it is of great interest to account for the dependencies in multivariate spatial data analysis while still allowing flexible model specification for each component. In this work we propose full Bayesian inference for a flexible, nonseparable class of cross-covariance functions, initially for multivariate spatial data. The nonseparable covariance function considered is based on the convolution of separable covariance functions and allows different ranges in space. A Bayesian test for the separability of the covariance functions is proposed, which is more interpretable than trying to make sense of a set of separability-related parameters, typically defined on a scale that hinders practical interpretation. An application to meteorological variables in the state of Ceará, Brazil, indicates that our proposal is preferred to the coregionalization approach. To address the computational limitation, we approximate the full covariance matrices using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations are applied only to the likelihood function, to obtain fast parameter estimation while keeping the interpretation and flexibility of the nonseparable multivariate model. The effects of using the approximation are evaluated on simulated datasets in terms of prediction error. We also introduce a spatio-temporal setting for the proposed cross-covariance functions, which allows time-varying covariance coefficients that accommodate atypical observations and temporal heterogeneity.
Keywords: geostatistics, multivariate spatio-temporal models, cross-covariance function, nonseparable covariance function, latent dimensions, Bayesian inference.
Abstract
Modeling the dependence of a vector of spatially correlated variables is a challenging task, since
the specification of valid cross-covariance functions is non-trivial. In order to satisfy the validity
assumption, several proposals consider simplified formulations, such as separable covariance functions,
which imply the same spatial range for all the components of the response vector. This is rarely
a realistic assumption for spatial applications. In this context, accounting for the dependencies in
multivariate spatial data analysis while still allowing for flexible specification of models for each
component is of great interest. In this work we propose full Bayesian inference for a flexible nonsep-
arable class of cross-covariance functions for multivariate spatial data. The nonseparable covariance
function considered is based on the convolution of separable covariance functions and allows for
different ranges in space. A Bayesian test is proposed for the separability of covariance functions, which is
more interpretable than trying to make sense of a set of parameters related to separability, typically
defined on a scale that is difficult to interpret. An application to meteorological variables in the
state of Ceará, Brazil, indicates that our proposal is preferred to the well-known coregionalization
approach. To address the computational limitation, we approximate the full covariance matrices using
a decomposition based on the Kronecker product of two separable matrices of smaller dimensions.
These approximations are applied to the likelihood function in order to obtain fast estimation
of parameters but we still keep the interpretation and flexibility of the multivariate nonsepara-
ble model. The effects of using the approximation are evaluated for simulated datasets in terms
of prediction error. We also introduce a spatio-temporal setting of the proposed cross-covariance
functions, which allow time-varying covariance coefficients accommodating atypical observations
and temporal heterogeneity.
Keywords: geostatistics, multivariate spatio-temporal models, cross-covariance function, nonseparable covariance function, latent dimensions, Bayesian inference.
Contents
1 Introduction 1
2 Covariance modeling of multivariate spatial random fields 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Multivariate process modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Separable cross-covariance functions . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Nonseparable cross-covariance functions . . . . . . . . . . . . . . . . . . . . . 10
2.3 Multivariate spatial modeling based on mixtures . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Flexible classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Bayesian inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Bayesian hypotheses testing for separability . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 Simulated examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.1 Example 1: computing posterior probabilities of separability . . . . . . . . . 18
2.6.2 Example 2: model discrimination . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Ceará weather dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Likelihood computation for large data 30
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Separable approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Sensitivity study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Multivariate spatio-temporal modeling 38
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Multivariate dynamic spatial models . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2.1 DLM completion and prior specification . . . . . . . . . . . . . . . . . . . . . 41
4.3 Simulated examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.1 Simulation study one: constant variance over time . . . . . . . . . . . . . . . 44
4.3.2 Simulation study two: time-varying variance . . . . . . . . . . . . . . . . . . 47
4.3.3 Simulation study three: change of regime . . . . . . . . . . . . . . . . . . . . 49
4.4 California dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Conclusions 61
A Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.1 Covariance functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.2 Prior distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.3 Model comparison predictive measures . . . . . . . . . . . . . . . . . . . . . 66
List of Tables
2.1 Interpretation table for the Bayesian separability test. BF: Bayes Factor; p0: pos-
terior probability of separability; w0: loss associated to the decision of rejecting H0
when H0 is true; w1: loss associated to the decision of not rejecting H0 when H1 is
true. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Posterior probabilities of separability p0 for each dataset and scenario of α0. . . . . . 19
2.3 Predictive measures for model comparison and posterior probability of separability
for the simulation examples. IS=Interval Score, WCI=Width of Credibility Interval
and LPS=Log Predictive Score. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Comparison of models in predictive terms for the Ceará weather dataset. IS=Interval
Score, WCI=Width of Credibility Interval and LPS=Log Predictive Score. . . . . . . 29
3.1 Necessary time (in seconds) to calculate the likelihood function based on a full co-
variance matrix and an approximate structure. (Intel(R) Core(TM) i7-3630QM,
2.40GHz, 6GB RAM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Predictive model comparison. SEP: separable model. NS app: nonseparable approx-
imate model. NS: nonseparable model. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
List of Figures
2.1 Posterior median and 95% CI of the cross-correlations among variables for each
dataset considering an independent multivariate normal distribution for each spatial
location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Posterior median of covariance function (gray full line) and 95% credible interval
(gray dashed lines) of Mix-NSEP model. Black full line: true covariance function
(SEP model). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Collection of monitoring stations in Ceará state, Brazil. Numbers 1 and 2 are spatial
locations considered for predictive comparison. . . . . . . . . . . . . . . . . . . . . . 26
2.4 Posterior median and 95% CI of the cross-correlations among components considering
an independent multivariate normal distribution for each spatial location. . . . . . . 26
2.5 Posterior median (full line) and 95% credible interval (dashed lines) of spatial cor-
relation of the univariate spatial models for each component. . . . . . . . . . . . . . 27
2.6 Posterior densities of the spatial ranges of SEP and Mix-NSEP models. . . . . . . . 28
3.1 Separability approximation error index as a function of α0. Full line: p = 2; dashed
line: p = 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Likelihood contour plots. Black line: full structure. Red line: approximate structure.
Dashed black line: true value of parameters. . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Computational time reduction (in percent) in calculation of the likelihood function
using approximate structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1 Spatial locations simulated in the [0, 1]× [0, 1] square. Red points: locations used
in estimation. Black points: locations used in prediction. . . . . . . . . . . . . . . . .
4.2 Cross-correlation function for components in space: simulations. . . . . . . . . . . . 43
4.3 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation one . . . . . . . . . . . 45
4.4 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation one. . . . . . . . . . . . . 46
4.5 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation one. . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6 Empirical variance over time for each component: simulation two. . . . . . . . . . . 47
4.7 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation two. . . . . . . . . . . 48
4.8 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation two. . . . . . . . . . . . . 49
4.9 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation two. . . . . . . . . . . . . . . . . . . . . . . . . 49
4.10 Empirical variance over time for each component: simulation three. . . . . . . . . . . 50
4.11 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation three. . . . . . . . . . 51
4.12 Posterior median and 95% credible interval for component one versus the observed
values at times t = 7 and t = 23 for all out-of-sample stations: simulation three. . . .
4.13 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation three. . . . . . . . . . . . 53
4.14 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation three. . . . . . . . . . . . . . . . . . . . . . . . 53
4.15 Collection of monitoring stations in California state, USA. (a) Collection of 21 mon-
itoring stations. (b) Locations that have gone through an imputation process. (c)
Collection of 21 monitoring stations where numbers 1 and 2 are spatial locations
considered for predictive comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.16 Empirical variance for NO2 and PM2.5 across space: California dataset. . . . . . . . 55
4.17 Empirical variance for NO2 and PM2.5 across time: California dataset. . . . . . . . . 55
4.18 Posterior median and 95% CI of the cross-correlation among components considering
an independent multivariate normal distribution for each time: California dataset. . 56
4.19 Posterior median (full line) and 95% credible interval (dashed lines) of spatial cor-
relation of the univariate dynamic spatial model for each component: California
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.20 Posterior densities of the cross-correlation parameters for each model: California
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.21 Posterior median and 95% credibility interval of σNO2 and σPM2.5 for M1 and M2
models at each time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . . . . . . .
4.22 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . 59
4.23 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 1
Introduction
The analysis of multivariate data observed over space and time is of great interest in sev-
eral application areas such as environmental science, climate science and agriculture. Spatial and
spatiotemporal data often arise as multivariate measurements at each location. In particular, in
geostatistical applications, the data may be considered as a partial realization of a random vector
Y(s), Y(s) ∈ ℝ^p, s ∈ D ⊆ ℝ^d.
In the context of multivariate spatial modelling, two kinds of dependence must be accounted for:
among measurements at a specific location and among measurements across locations. The ultimate
goal in the analysis would be to predict the response vector based on partial observations of the
process under study. The development of multivariate spatial models is based on the assumption
that data which are closer in space are more correlated than data farther apart, and that vector
components are usually better predicted by taking into account the interdependence among them.
These general ideas are directly related to the cross-covariance function of spatial multivariate
data, that is, Cij(s, s′) = Cov(Yi(s), Yj(s′)), s, s′ ∈ D ⊆ ℝ^d, which models the spatial dependence
of Yi(·) and Yj(·), i, j = 1, . . . , p, with p denoting the number of variables in the random vector.
The cross-covariance functions considered must be valid; thus, the construction of new cross-covariance
functions usually relies on mathematical simplifications which are not necessarily accompanied
by good model-data fit. A usual simplifying assumption is that the cross-covariance functions are
separable (see Mardia and Goodall, 1993). Separability implies that when the spatial location varies,
the covariance pattern across components remains unchanged, that is, Cij(s, s′) = aij ρ(s, s′),
with A = (aij) a positive definite p × p matrix and ρ(·, ·) a valid spatial correlation function.
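As an illustration of this factorization, under separability the covariance matrix of the vectorized data is a Kronecker product. The following sketch (in Python; the exponential correlation, the matrix A and the coordinates are illustrative choices, not taken from this thesis) makes the structure explicit:

```python
import numpy as np

def exp_corr(coords, phi):
    """Exponential spatial correlation: rho(s, s') = exp(-||s - s'|| / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

rng = np.random.default_rng(0)
coords = rng.uniform(size=(4, 2))         # n = 4 locations in the unit square
R = exp_corr(coords, phi=0.5)             # n x n spatial correlation
A = np.array([[1.0, 0.6], [0.6, 2.0]])    # p x p positive definite matrix A

# Separable cross-covariance C_ij(s, s') = a_ij * rho(s, s') for the
# vectorized data (component index varying fastest within each location):
Sigma = np.kron(R, A)

# Every p x p block equals rho(s_k, s_l) * A: one spatial decay for all pairs
assert np.allclose(Sigma[0:2, 2:4], R[0, 1] * A)
```

The single correlation ρ shared by all blocks is exactly why every component inherits the same spatial range under separability.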
Cressie and Huang (1999) discuss shortcomings of separable models in the context of spatiotem-
poral processes and point out that separable models are often chosen for convenience rather than
for fitting the data well. Stein (2005) presents results about the limited kind of behaviours which
these classes represent in practice. A consequence of the separability assumption is that the p
different process components have the same spatial range. Another consequence of separability is
full symmetry of the resulting covariance matrices. However, environmental, atmospheric and geo-
physical processes are often influenced by wind direction or ocean currents, which are incompatible
with the assumption of full symmetry. In the spatiotemporal context, asymmetric behavior is often
observed when the effect of one variable on another is delayed in time (Stein, 2005; Gneiting et al.,
2007). For instance, processes which are influenced by air flows might have asymmetric covariance
functions (see Fonseca and Steel, 2011).
Several authors have proposed alternative formulations to relax the separability assumption of
cross-covariance functions. A pioneering work is the linear model of coregionalization (Goulard and
Voltz, 1992; Wackernagel, 2003), which decomposes each spatial process Yj(s), j = 1, . . . , p, into
sets Wu^(j)(s), u = 0, . . . , K, of spatially uncorrelated components¹, i.e., Yj(s) = ∑_{u=0}^{K} Wu^(j)(s). In
this approach, the resulting cross-covariance function is Cij(s, s′) = ∑_{u=0}^{K} b^u_ij ρu(s, s′; θu), where b^u_ij,
i, j = 1, . . . , p, are real coefficients and ρu(·, ·; θu), u = 1, . . . , K, are valid correlation functions that
may be different for each component. This proposal allows for different range parameters for each
component of Y(s). Schmidt and Gelfand (2003) and Gelfand et al. (2004) present full Bayesian
inference for this class of multivariate models.
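The coregionalization construction can be sketched numerically as a sum of Kronecker products, one per latent process. The ranges and coefficient matrices below are illustrative (p = 2, K = 1), not those used later in the thesis:

```python
import numpy as np

def exp_corr(coords, phi):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

rng = np.random.default_rng(1)
coords = rng.uniform(size=(5, 2))            # n = 5 locations

phis = [0.2, 0.8]                            # one spatial range per latent process
B = [np.array([[1.0, 0.3], [0.3, 0.5]]),     # coregionalization matrices (b^u_ij)
     np.array([[0.4, 0.2], [0.2, 1.0]])]

# C_ij(s, s') = sum_u b^u_ij * rho_u(s, s'): a sum of separable pieces,
# so each component mixes correlations with different spatial ranges
Sigma = sum(np.kron(exp_corr(coords, phi), Bu) for phi, Bu in zip(phis, B))
```

Because each latent process contributes its own range, the resulting covariance is nonseparable even though every summand is separable.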
A different proposal considers multidimensional scaling ideas (Cox and Cox, 2000) which repre-
sent similarity or dissimilarity between objects as distances between points in a multidimensional
space. Following this approach, Apanasovich and Genton (2010) proposed a multivariate spatial
model based on existing stationary covariance models for univariate processes. In particular, the
authors extend the spatiotemporal models of Gneiting (2002) to include a third argument that
represents the component vector and compute distances using latent dimensions. The structures
presented are based on processes which consider, for each component, the same isotropic covariance
function with respect to space and time. To overcome this limitation the authors consider the linear
model of coregionalization.
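The latent-dimension idea can be sketched as follows: each component i receives a latent coordinate ξ_i, and a valid univariate correlation is evaluated at the distance in the augmented space. The exponential correlation and the latent coordinates here are illustrative assumptions, not values from the cited papers:

```python
import numpy as np

def cross_cov(h, xi_i, xi_j, phi=1.0):
    """Cross-covariance via latent dimensions: a valid univariate
    correlation (here exponential, an illustrative choice) evaluated
    at the distance in the augmented space formed by the spatial
    separation h and the latent separation xi_i - xi_j."""
    d = np.sqrt(np.sum(np.asarray(h) ** 2) + (xi_i - xi_j) ** 2)
    return np.exp(-d / phi)

xi = {1: 0.0, 2: 0.7}        # hypothetical latent coordinates of components 1, 2
h = np.array([0.3, 0.4])     # spatial lag with ||h|| = 0.5

c11 = cross_cov(h, xi[1], xi[1])   # same component: decays with ||h|| only
c12 = cross_cov(h, xi[1], xi[2])   # across components: latent distance adds up
assert c12 < c11                   # dissimilar components are less correlated
```

Estimating the ξ_i from the data is what turns between-component similarity into an ordinary distance, which is the multidimensional scaling idea at work.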
In a recent paper, Cressie and Zammit-Mangion (2016) proposed the conditional approach to
derive multivariate models. The construction is based on partitioning the vector of spatial processes
so that the joint distribution is specified through univariate spatial conditional distributions. This
is convenient as the modeler just needs to specify univariate covariance functions and an integrable
¹ Cov(Wu^(i)(s), Wu^(j)(s′)) = C^u_ij(s, s′); Cov(Wu^(i)(s), Wv^(j)(s′)) = 0, u ≠ v.
function of p arguments. Obviously, the results depend on the chosen conditioning and this is not
always an easy modeling decision.
Other proposals for multivariate spatial modelling are the convolution model (Gaspari and
Cohn, 1999; Majumdar and Gelfand, 2007), the semi-parametric model via separable functions
(Reich and Fuentes, 2007) and the multivariate Matérn model (Gneiting et al., 2010).
In the context of space-time covariance models Fonseca and Steel (2011) developed nonseparable
covariance structures based on the convolution of purely spatial and purely temporal valid covari-
ance functions. One advantage of this approach is that it allows for different modelling decisions
regarding space and time. In this work, we propose to extend this class of nonseparable covariance
functions to the modeling of component and spatial dependence. The model is based on mixtures
and we consider multidimensional scaling ideas to define latent distances between components.
The proposed class allows for different ranges and degrees of smoothness across space for dif-
ferent components of the multivariate random vector. Similarly to the conditional approach of
Cressie and Zammit-Mangion (2016), the proposed covariance depends on the definition of univari-
ate spatial covariance functions and a bivariate joint density function. It is advantageous compared
to the conditional approach as it does not require the definition of conditioning relations between
components of the vector. In the proposed model, and in many models in the literature, a set of
parameters is responsible for separability of the resulting process. In this context, a Bayesian test
for separability is presented which is easier to interpret than the posterior distributions of model
parameters, since the scales on which these parameters are defined are neither bounded nor easily
interpretable.
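The decision part of such a test reduces to standard Bayesian decision theory: given the posterior probability p0 of separability and the losses w0 (rejecting H0 when it is true) and w1 (keeping H0 when H1 is true), rejecting is optimal when its expected loss is the smaller one. A sketch with illustrative values, not the thesis's:

```python
def separability_decision(p0, w0=1.0, w1=1.0):
    """Decide on H0 (separability) given its posterior probability p0.

    Expected loss of rejecting H0 is w0 * p0; of keeping H0 it is
    w1 * (1 - p0). Reject when the former is smaller, i.e. when
    p0 < w1 / (w0 + w1)."""
    return "reject H0" if w0 * p0 < w1 * (1 - p0) else "do not reject H0"

# With symmetric losses the threshold is p0 = 1/2:
print(separability_decision(0.10))   # reject H0
print(separability_decision(0.85))   # do not reject H0
```

Asymmetric losses shift the threshold: making a wrong rejection three times as costly (w0 = 3) raises the evidence needed against separability.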
It is important to note that the main articles cited in this work, for instance, Apanasovich
and Genton (2010) and Cressie and Zammit-Mangion (2016), make use of the frequentist inference
approach, which does not fully account for all uncertainties when interest lies in both estimation
and prediction at ungauged locations and/or future time points. Like Schmidt and Gelfand (2003)
and Gelfand et al. (2004) we use the Bayesian paradigm for estimation of the multivariate spatial
covariance functions and other model components as well as for prediction purposes.
The computational treatment of high complexity spatial models is a challenge. In the context of
geostatistics, analyzing multivariate data requires the specification of the cross-covariance function
and the computational cost to make inference and predictions can be prohibitive. As a result, the
use of complex models might be infeasible. Consider a p-dimensional vector measured at n locations
in the domain D. If a Gaussian process is assumed, the modeling of the p variables through nonseparable
multivariate spatial models results in a full matrix Σ ∈ ℝ^{np×np}. Therefore, the computation of the
likelihood requires the calculation of the inverse and determinant of this matrix, which might be
infeasible depending on the size of the problem.
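When Σ does factor as R ⊗ A, the cost can be avoided entirely, since (R ⊗ A)⁻¹ = R⁻¹ ⊗ A⁻¹ and det(R ⊗ A) = det(R)^p det(A)^n, so the Gaussian log-likelihood needs only the n × n and p × p factors. A sketch with illustrative matrices, not the thesis's model:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
coords = rng.uniform(size=(n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
R = np.exp(-dist / 0.5)                    # n x n spatial correlation
A = np.array([[1.0, 0.4], [0.4, 1.5]])     # p x p component covariance
y = rng.standard_normal(n * p)             # data stacked location by location

# Naive evaluation: factorizations of the full np x np matrix, O((np)^3)
S = np.kron(R, A)
naive = -0.5 * (n * p * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
                + y @ np.linalg.solve(S, y))

# Kronecker evaluation: only the n x n and p x p factors are touched
Y = y.reshape(n, p)                        # row k holds the p-vector at s_k
logdet = p * np.linalg.slogdet(R)[1] + n * np.linalg.slogdet(A)[1]
quad = np.trace(np.linalg.solve(R, Y) @ np.linalg.solve(A, Y.T))
loglik = -0.5 * (n * p * np.log(2 * np.pi) + logdet + quad)
assert np.isclose(naive, loglik)
```

The quadratic form uses the identity y′(R⁻¹ ⊗ A⁻¹)y = tr(R⁻¹ Y A⁻¹ Y′), so no np × np object is ever formed; this is precisely the saving a separable approximation buys.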
In this work we present a way to approximate the full covariance matrix by the Kronecker product
of two separable matrices of smaller dimensions, based on Van Loan and Pitsianis (1993). The idea is to obtain
matrices R ∈ ℝ^{n×n} and A ∈ ℝ^{p×p} such that, for a given full covariance matrix Σ, the Frobenius
norm ‖Σ − R ⊗ A‖_F is minimized. Genton (2007) investigated the use of this approximate
structure in the spatio-temporal context. Here, this method is used for the multivariate spatial
case and is applied only in the likelihood computation, keeping the interpretation of the original
model.
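The Van Loan and Pitsianis construction reduces to a rank-1 SVD of a block rearrangement of Σ. A sketch (the test covariance below, a sum of two separable pieces, is an illustrative stand-in for a nonseparable model):

```python
import numpy as np

def nearest_kronecker(Sigma, n, p):
    """Best Frobenius-norm approximation Sigma ~ kron(R, A) with
    R (n x n) and A (p x p), following Van Loan and Pitsianis (1993):
    a rank-1 SVD of the block-rearranged matrix."""
    # row (k*n + l) of 'blocks' is the flattened p x p block (k, l) of Sigma
    blocks = Sigma.reshape(n, p, n, p).transpose(0, 2, 1, 3).reshape(n * n, p * p)
    U, s, Vt = np.linalg.svd(blocks, full_matrices=False)
    R = np.sqrt(s[0]) * U[:, 0].reshape(n, n)
    A = np.sqrt(s[0]) * Vt[0].reshape(p, p)
    if R[0, 0] < 0:                      # resolve the SVD sign ambiguity
        R, A = -R, -A
    return R, A

# Illustrative nonseparable covariance: a sum of two separable pieces
rng = np.random.default_rng(3)
n, p = 6, 2
coords = rng.uniform(size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
Sigma = (np.kron(np.exp(-d / 0.2), np.array([[1.0, 0.3], [0.3, 0.6]]))
         + np.kron(np.exp(-d / 0.9), np.array([[0.5, 0.1], [0.1, 0.9]])))

R, A = nearest_kronecker(Sigma, n, p)
err = np.linalg.norm(Sigma - np.kron(R, A)) / np.linalg.norm(Sigma)
# err is the relative Frobenius error of the best separable approximation
```

When Σ is exactly separable the rearranged matrix has rank one and the procedure recovers R ⊗ A = Σ; otherwise err quantifies how far the process is from separability.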
Datasets are generally indexed in both space and time, and the assumption of Gaussianity in the
modeling of multivariate spatio-temporal data with non-Gaussian characteristics can be very
restrictive, with possibly unsatisfactory predictive performance. When aberrant observations are
present in the dataset, we need some way to accommodate them, since the identification of
outliers is essential to improve model fit and predictive power. In this work we extend a valid
nonseparable cross-covariance function, based on mixtures and multidimensional scaling ideas,
to a spatio-temporal setting, considering dynamic models to describe the temporal evolution, as in
West and Harrison (1997, Chapter 10). These models
have time-varying covariance coefficients which accommodate outliers and heterogeneity over time.
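In the univariate case, the discount-based variance learning of West and Harrison can be sketched as follows; the recursion and the simulated series are an illustrative simplification, not the multivariate construction developed later in the thesis:

```python
import numpy as np

def variance_discount(errors, q, delta=0.95, n0=1.0, S0=1.0):
    """Variance learning with a discount factor, in the spirit of
    West & Harrison (1997, Ch. 10). Old information decays at rate
    delta, so the variance estimate S_t = d_t / n_t can track changes
    in variability and absorb aberrant observations.
    errors: one-step forecast errors; q: forecast scale factors."""
    n, d = n0, n0 * S0
    path = []
    for e_t, q_t in zip(errors, q):
        S_prev = d / n
        n = delta * n + 1.0                      # discounted degrees of freedom
        d = delta * d + S_prev * e_t ** 2 / q_t  # discounted sum of squares
        path.append(d / n)
    return np.array(path)

# A series whose standard deviation jumps from 1 to 3 halfway through:
rng = np.random.default_rng(4)
e = np.concatenate([rng.normal(0, 1, 100), rng.normal(0, 3, 100)])
S = variance_discount(e, np.ones(200))
# S stays roughly near 1 in the first half and climbs after the change
```

With delta = 1 no information is discounted and the estimate settles to a constant; delta < 1 is what allows the recursion to track heterogeneity over time.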
This thesis contributes to the modelling of multivariate spatio-temporal processes by combining the
following four aspects: i) the proposal of a class of valid nonseparable spatial multivariate covariance
functions based on the convolution of separable functions, which allows the specification of different
structures for space and components, as well as different spatial ranges for each component; ii)
the use of multidimensional scaling concepts based on latent spaces to deal with association (or
similarity) among components; iii) the adoption of a Bayesian inferential approach for the proposed
models, which includes the construction of a Bayesian hypothesis test, based on mixture priors,
exploring the fact that separability is straightforwardly obtained as a particular case of the
general proposed covariance structure if a specific subset of the parameter vector is null; iv) the
use of dynamic modeling concepts for time-varying covariance coefficients, allowing the model to accommodate
outlying observations and changes in variability over time.
This thesis is organized as follows: in Chapter 2 we present a flexible class of multivariate
spatial covariance models; inference on these models, as previously mentioned, is conducted
from a Bayesian perspective. A Bayesian test to measure the degree of separability
between space and components is developed, and finally we also present simulated examples and
an illustration of the proposed approach based on weather data. A fast algorithm is used to
compute the likelihood function to allow for scalable modeling of large multivariate spatial data in
Chapter 3. We investigate the performance of an approximation for the full nonseparable covariance
and a sensitivity study is performed showing that the approximate approach provides important
gains in computational efficiency. Chapter 4 presents multivariate spatio-temporal models. The
proposal is based on time-varying covariance coefficients that accommodate outlying observations
and changes in variability over time. Simulated examples and an illustration of the proposed
approach are presented. Finally, Chapter 5 presents conclusions and future developments.
Chapter 2
Covariance modeling of multivariate
spatial random fields
2.1 Introduction
Following Cressie (1993), geostatistical data are defined as a partial realization of a stochastic
process {Y(s) : s ∈ D}, where D is a subset of ℝ^d with positive d-dimensional volume; that is, the
spatial index s varies continuously throughout the region D. Usually d = 2 (for example, latitude
and longitude) or d = 3 (for example, latitude, longitude and altitude).
A stochastic process (or random field) defined on the domain D is usually described through the
finite-dimensional probability distributions of the random variables at collections of points s1, . . . , sn ∈ D. In
particular, a Gaussian random field is a stochastic process whose finite-dimensional distributions
are multivariate normal for every n and all locations s1, . . . , sn, and it is completely determined by
its mean and covariance functions.
Consider that the mean and covariance functions of a Gaussian spatial process {Y(s) : s ∈ D}
are defined by
µ(s) = E(Y (s)) (2.1)
and
Cov(Y (s), Y (s′)) = E((Y (s)− µ(s))(Y (s′)− µ(s′))), (2.2)
respectively, with s, s′ ∈ D.
Often it is assumed that the Gaussian random field {Y(s) : s ∈ D} is stationary, that is,
µ(s) = E(Y (s)) = µ, (2.3)
where µ is a constant and
Cov(Y (s), Y (s′)) = E((Y (s)− µ(s))(Y (s′)− µ(s′))) = C(s− s′) (2.4)
is independent of the locations s and s′, i.e., the covariance is just a function C(·) of the separation vector
h = s − s′. In addition, if the covariance is a function of the Euclidean distance ‖h‖ = ‖s − s′‖, that is,
Cov(Y(s), Y(s′)) = C(‖s − s′‖) = C(‖h‖), the process is said to be isotropic. These assumptions
are restrictive and often unrealistic.
When a process is stationary and isotropic, its variance is constant and the elements of the
covariance matrix depend only on the variance parameter and a valid correlation function, which
depends on the Euclidean distance. In what follows we present some valid stationary, isotropic correlation
functions.
1. Power exponential class
ρ(‖h‖; Θ) = exp{−(‖h‖/θ1)^θ2}, (2.5)
with θ1 > 0 and θ2 ∈ (0, 2]. This family includes two correlation functions as particular cases: the Gaussian correlation function when θ2 = 2 and the exponential correlation function when θ2 = 1.
2. Cauchy class
ρ(‖h‖; Θ) = (1 + (‖h‖/θ1)^θ2)^(−θ3), (2.6)
with θ1 > 0, θ2 ∈ (0, 2] and θ3 > 0. This function allows for long-range dependence and easy computation of the Hurst effect (Gneiting and Schlather, 2004).
3. Matern class

ρ(‖h‖; Θ) = [1/(2^{θ2−1} Γ(θ2))] (‖h‖/θ1)^{θ2} K_{θ2}(‖h‖/θ1), (2.7)

where Γ is the gamma function, K_{θ2}(·) is the modified Bessel function of the third kind of order θ2, θ1 > 0 and θ2 > 0.
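As an illustration, the three correlation families above can be evaluated numerically. The sketch below (Python with NumPy/SciPy; function and parameter names are ours, not part of the thesis) also checks the known reduction of the Matern family with θ2 = 1/2 to the exponential correlation:

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv

def powered_exponential(h, theta1, theta2):
    # Power exponential correlation (2.5): exp{-(h/theta1)^theta2}, theta2 in (0, 2]
    return np.exp(-(h / theta1) ** theta2)

def cauchy(h, theta1, theta2, theta3):
    # Cauchy correlation (2.6): (1 + (h/theta1)^theta2)^(-theta3)
    return (1.0 + (h / theta1) ** theta2) ** (-theta3)

def matern(h, theta1, theta2):
    # Matern correlation (2.7), with K the modified Bessel function of order theta2
    h = np.asarray(h, dtype=float)
    u = h / theta1
    out = np.ones_like(u)              # correlation equals 1 at distance zero
    nz = u > 0
    out[nz] = (u[nz] ** theta2) * kv(theta2, u[nz]) / (2 ** (theta2 - 1) * gamma_fn(theta2))
    return out

h = np.array([0.0, 0.5, 1.0])
# Matern with theta2 = 1/2 reduces to the exponential correlation exp(-h/theta1)
print(np.allclose(matern(h, 1.0, 0.5), np.exp(-h)))  # True
```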
In this chapter we work with multivariate spatial processes and propose a new class of nonsep-
arable covariance functions, which are flexible and intuitive depending only on the specification of
univariate covariance functions.
2.2 Multivariate process modeling
In the context of multivariate spatial processes, the main goal is usually to model the dependence
among several variables measured across a spatial domain of interest, in order to obtain realistic
predictions. Denote by Y(s) the p-dimensional vector of variables at location s ∈ D. The direct covariance functions measure the spatial dependence of each component individually, while the cross-covariance functions between two component processes measure the dependence between components at the same location and at two different locations.
Assuming that Y(s) is a spatially stationary process, that is
E[Yi(s)] = mi, Cov[Yi(s), Yj(s + h)] = Cij(h), ∀s, s + h ∈ D; i, j = 1, 2, . . . , p, (2.8)
the cross-covariance function of Y(s) is defined as
E[(Yi(s)−mi)(Yj(s + h)−mj)] = Cij(h), s, s + h ∈ D; i, j = 1, 2, . . . , p. (2.9)
The requirement of positive definiteness of Cij(·) is a limitation in the definition of realistic
covariance functions for multivariate spatial processes. As a result, several simplifications are
called for in practice such as stationarity and separability.
2.2.1 Separable cross-covariance functions
Consider a p-dimensional multivariate random field {Y(s) : s ∈ D ⊂ ℝ^d; Y ∈ ℝ^p}. For example,
Y(s) = (temperature, humidity)(s). The cross-covariance function for two components i and j of
the vector Y, between two locations s and s′, can be described by
Cij(s, s′) = aij ρ(s, s′), (2.10)

with A = (aij) a positive definite p × p matrix and ρ(·, ·) a valid correlation function. Let Y
be a vectorized version of Yik = Yi(sk), k = 1, · · · , n; i = 1, · · · , p (see Mardia and Goodall,
1993). Then the covariance matrix is Σ = R ⊗ A, with Rkl = ρ(sk, sl), k, l = 1, · · · , n. The
condition of positive definiteness is respected if R and A are positive definite. This specification
is computationally advantageous as inverses and determinants are obtained from smaller matrices,
that is, Σ^{−1} = R^{−1} ⊗ A^{−1} and |Σ| = |R|^p |A|^n. However, this model has theoretical limitations
(Banerjee et al., 2004). Firstly, it is an intrinsic model implying that the covariance between two
components Yi(sk) and Yj(sl) is aij , that is, it does not depend on the locations sk and sl. Secondly,
note that as the covariance is defined by one spatial correlation function ρ(·, ·), the spatial range will
be the same for all components. This last feature can be perceived through the following argument:
consider the univariate spatial processes {Y (s) : s ∈ D} and {X(s) : s ∈ D}, D ⊂ ℝ^2. For locations s1, . . . , sn we have Y = [Y (s1), Y (s2), . . . , Y (sn)]^T and X = [X(s1), X(s2), . . . , X(sn)]^T.
Consider the stacked 2n × 1 vector (X^T, Y^T)^T, following a multivariate normal distribution and a separable covariance structure as in (2.10), that is,

(X^T, Y^T)^T ∼ N2n(µ, Σ), Σ = A ⊗ R,
implying that X ∼ Nn(µx, a11R) and Y ∼ Nn(µy, a22R). It follows directly that Y | X ∼ Nn(µ∗, Σ∗), with

µ∗ = µy + (a12R)(a11R)^{−1}(X − µx) = µy − (a12/a11)µx + (a12/a11)X

and

Σ∗ = a22R − (a12R)(a11R)^{−1}(a12R) = (a22 − a12²/a11)R,

which is equivalent to Y | X ∼ Nn(β0 + β1X, σ²R), with

β0 = µy − (a12/a11)µx, β1 = a12/a11 and σ² = a22 − a12²/a11.
Now assume, conversely, that X ∼ Nn(µx, a11R) and Y | X ∼ Nn(β0 + β1X, σ²S), with S any spatial correlation matrix. Then we obtain by marginalization the covariance structure for Y:

Cov[Yi, Yj ] = σ²Sij + β1²a11Rij = (a22 − a12²/a11)Sij + (a12²/a11²)a11Rij = a22Sij − (a12²/a11)Sij + (a12²/a11)Rij . (2.11)

Recall that under separability Cov[Yi, Yj ] = a22Rij . Expression (2.11) reduces to the separable specification if and only if S = R, that is, if and only if Y | X has the same spatial correlation structure as X.
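The computational advantage of the separable specification can be checked numerically. The Python sketch below (ours, with arbitrary illustrative matrices) verifies the Kronecker identities for Σ = A ⊗ R, the ordering used in the stacked representation of this subsection:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 2

# Spatial correlation matrix R (n x n) from the exponential model, A (p x p) positive definite
s = rng.uniform(size=(n, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
R = np.exp(-D / 0.3)
A = np.array([[1.0, 0.5], [0.5, 2.0]])

# Separable covariance for the stacked vector (components vary slowest)
Sigma = np.kron(A, R)

# Inverse and determinant follow from the smaller matrices
assert np.allclose(np.linalg.inv(Sigma), np.kron(np.linalg.inv(A), np.linalg.inv(R)))
_, logdet = np.linalg.slogdet(Sigma)
_, ldA = np.linalg.slogdet(A)
_, ldR = np.linalg.slogdet(R)
assert np.isclose(logdet, n * ldA + p * ldR)   # |Sigma| = |A|^n |R|^p
print("Kronecker identities verified")
```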
2.2.2 Nonseparable cross-covariance functions
Several authors have proposed cross-covariance functions capable of relaxing separability as-
sumptions. More flexible structures are obtained via the coregionalization approach (Goulard and
Voltz, 1992; Wackernagel, 2003), which in its simplest form is Y(s) = Aw(s), with A a p×p ma-
trix and the components of w(s), wj(s), j = 1, 2, . . . , p, independent and identically distributed
spatial processes. If the processes wj(s) are stationary with zero mean and unit variances and
Cov(wj(s), wj(s′)) = ρ(s − s′), then E(Y(s)) = 0 and the cross-covariance function of Y(s) is Σ_{Y(s),Y(s′)} ≡ C(s − s′) = ρ(s − s′)AA^T, which is separable. A more general form of the coregionalization model considers independent, but not identically distributed, processes wj(s). The covariance matrix is then given by

C(s − s′) = Σ_{j=1}^p ρj(s − s′) Tj , (2.12)

with Tj = aj aj^T, aj the j-th column of A. The resulting covariance is nonseparable and stationary.
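A minimal numeric sketch of the coregionalization covariance (2.12) follows (Python; the matrix A, the ranges and the exponential choice for the ρj's are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 15
A = np.array([[1.0, 0.0], [0.7, 1.2]])   # lower-triangular coregionalization matrix
phi = [0.2, 0.5]                          # a different range for each latent process w_j

s = rng.uniform(size=(n, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

# C(s - s') = sum_j rho_j(s - s') T_j, with T_j = a_j a_j^T  (equation (2.12));
# the stacked np x np covariance has block (i, l) = Cov[Y_i(s_k), Y_l(s_m)]
Sigma = np.zeros((n * p, n * p))
for j in range(p):
    Rj = np.exp(-D / phi[j])              # rho_j: exponential correlation, range phi[j]
    aj = A[:, j]
    Sigma += np.kron(np.outer(aj, aj), Rj)

# The result is a valid (nonnegative definite) covariance matrix; it collapses
# to the separable model AA^T kron R only when all rho_j coincide
assert np.min(np.linalg.eigvalsh(Sigma)) > -1e-10
print(Sigma.shape)  # (30, 30)
```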
Apanasovich and Genton (2010) propose a methodology based on latent dimensions and existing covariance models for univariate random fields. The components of the vector are represented as coordinates in a k-dimensional space, for an integer 1 ≤ k ≤ p, that is, the i-th component is represented as ξi = (ξi1, . . . , ξik)^T. This approach can be used for any valid covariance function Cij = C((s, ξi), (s′, ξj)). For any s, s′ there is Cs,s′(·) such that Cij(s, s′) = Cs,s′(ξi, ξj) for some ξi, ξj ∈ ℝ^k.
The latent coordinates may be treated as parameters and estimated from the data. Moreover, it is possible to consider the reparametrization δij = ‖ξi − ξj‖. This approach is similar to multidimensional scaling (Cox and Cox, 2000) with latent distances δij , where, for fixed locations s and s′, small δij 's translate into strong cross-correlation and large δij 's into weak cross-correlation. Indeed, the δij 's can be interpreted as distances in a latent space: if the distance between two variables is small, these variables are “near” each other; put another way, if they are very different, the distance in the latent space is large. It is important to note that the ξi's and δij 's are not calculated from multidimensional scaling methods. They are parameters of the cross-covariance function, estimated from the data, and are merely interpreted in the same way as the existing measures of dissimilarity in multidimensional scaling.
A simple cross-covariance function presented in Apanasovich and Genton (2010) is given by

Cij(‖h‖) = C(‖h‖, δ12) =
  a11² exp(−α1‖h‖), i = j = 1,
  a21² exp(−α1‖h‖) + a22² exp(−α2‖h‖), i = j = 2,
  [a11 a21/(δ12 + 1)] exp{−α1‖h‖/(δ12 + 1)^{β/2}}, i ≠ j,

with ‖h‖ = ‖s − s′‖ the Euclidean distance between locations, where the linear model of coregionalization is the special case δ12 = β = 0. From simulation results, the authors show that the coregionalization model is not flexible enough to provide unbiased estimates of the spatial ranges.
In what follows we consider an intuitive proposal for the construction of nonseparable covariance structures, based on mixtures of separable functions as in Fonseca and Steel (2011) and on the latent distances discussed in Cox and Cox (2000) and used in Apanasovich and Genton (2010).
A review of the main approaches to build a valid multivariate cross-covariance function is
presented in Genton and Kleiber (2015).
2.3 Multivariate spatial modeling based on mixtures
Separable functions result in limited structures in the modeling of space-time or space-
component interaction. There are several ways to construct nonseparable covariance functions
that provide more flexible and realistic covariance structures. One way to build such structures is
based on mixtures.
In the space-time context, Ma (2002, 2003) introduced models based on mixtures of purely
spatial and purely temporal covariance functions. In particular, the author presents scale mixtures and positive power mixtures of separable covariance functions. Ma argued that one benefit
of the mixture method is that it generates a sufficient variety of nonseparable spatiotemporal
covariance models, with appropriate choices of the mixing function and the purely spatial and
temporal covariances. As a result, the proposal provides an easy and effective way to construct new
spatiotemporal covariance models, which depend only on the characterization of Laplace transforms.
Porcu et al. (2007) investigated the properties of the stationary spatiotemporal scale mixture
based random field. The model described in Porcu et al. (2007, Proposition 1) is a special case
of a wider class of covariance functions introduced by Ma (2003), where variograms in Laplace
transform are defined by Bernstein functions. Some important classes of cross-covariance functions
are described in Porcu and Zastavnyi (2011). They provide some results for mixture based models
allowing the construction of cross-covariance models (Theorem 1, p. 1297). Recent works also use
covariance models via mixture representation to characterize classes of covariance models (Daley
et al., 2014; Bourotte et al., 2016; Porcu et al., 2016).
Fonseca and Steel (2011) developed a class of space-time covariance functions based on the
scale mixture defined in Ma (2002, 2003) which is obtained analytically by appropriate choices of
the mixing distribution. The proposed models are obtained by mixing purely spatial and purely
temporal valid covariance functions. More specifically, let (U, V ) be a bivariate random vector with joint distribution g(u, v). This proposal considers two uncorrelated processes: {Z1(s, u) : s ∈ D}, a purely spatial process for every u ∈ ℝ+ with stationary covariance C1(·; u), and {Z2(t, v) : t ∈ ℝ+}, a purely temporal process for every v ∈ ℝ+ with stationary covariance C2(·; v), both independent of (U, V ). The mixture representation of the covariance structure of Z(s, t) = Z1(s, U)Z2(t, V ) is a convex combination of separable covariance functions. If stationarity is assumed in space and time, then Cov[Z(s, t), Z(s + h, t + l)], given by

C(h, l) = ∫∫ C1(h; u) C2(l; v) g(u, v) du dv, (2.13)

is a valid nonseparable function for h ∈ D ⊆ ℝ^d and l ∈ ℝ+.
This work proposes to modify (2.13) to deal with the multivariate spatial specification. In what follows we present a class of multivariate spatial covariances which is flexible and intuitive, depending only on the specification of univariate correlation functions in space and of mixing functions in ℝ^2_+. The cross-dependence between components of the spatial vector is based on latent dimensions as in Apanasovich and Genton (2010). The idea is to define the vector of components as coordinates in a k-dimensional latent space, for an integer 1 ≤ k ≤ p, that is, the i-th component is represented as ξi = (ξi1, . . . , ξik)^T. If stationarity is also assumed in the latent dimensions and hij = ξi − ξj , then the covariance of Y(s) is a convex combination of separable covariance functions given by

Cij(h, hij) = ∫∫ C1(h; u) C2(hij ; v) gij(u, v) du dv (2.14)

with hij ∈ ℝ^k representing a latent separation vector between the components i, j = 1, . . . , p and h ∈ D ⊆ ℝ^d the separation vector in space.
It is possible to solve (2.14) analytically, still assuring positive definiteness of the covariance structure, by defining C1(h; u) = exp{−‖h‖u} and C2(hij ; v) = exp{−‖hij‖v}, with ‖·‖ the Euclidean norm, resulting in a valid cross-correlation function

ρij(h; hij) = Mij(−‖h‖, −‖hij‖), (2.15)
with Mij(·, ·) a bivariate moment generating function. Following the specification presented in
Fonseca and Steel (2011), for each i, j define Uij = X0,ij + X1,ij and Vij = X0,ij + X2,ij with
Xl,ij nonnegative random variables with moment generating functions Ml,ij , l = 0, 1, 2. Then, the
correlation function implied by (2.14) is
ρij(h; hij) = M0,ij(−‖h‖ − ‖hij‖)M1,ij(−‖h‖)M2,ij(−‖hij‖). (2.16)
This general class allows for different parametric representations for each component, as we may vary the specifications for M0,ij , M1,ij and M2,ij . Observe that if Uij and Vij are uncorrelated, that is, Uij = X1,ij and Vij = X2,ij , and the correlation in space is the same for all components, M1,ij(·) = M1(·), ∀ i, j, then ρij(h; hij) = M1(−‖h‖)M2,ij(−‖hij‖), which is separable.
A valid covariance structure can be built by considering co-located covariance coefficients σij , i, j = 1, . . . , p, and Cij(h, hij) = σij ρij(h; hij), i, j = 1, . . . , p, since if A = {Aij}_{i,j=1}^p and B = {Bij}_{i,j=1}^p are nonnegative definite matrices, then so is their element-wise (Schur) product A ◦ B = {Aij Bij}_{i,j=1}^p. Thus, the general valid covariance function is given by

Cij(h; hij) = σij ρij(h; hij) = σij M0,ij(−‖h‖ − ‖hij‖) M1,ij(−‖h‖) M2,ij(−‖hij‖), (2.17)

with h ∈ ℝ^d, hij ∈ ℝ^k and co-located covariance coefficients σij , i, j = 1, . . . , p, where σii ∈ ℝ+ and σij ∈ ℝ, i ≠ j.
The fundamental step in the definition of this class of functions lies on the representation of
the dependence between U and V for each (i, j). In particular, to generate flexible classes of cross-
covariance functions we need only to specify the moment generating functions M0,ij , M1,ij and
M2,ij in (2.17).
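The role of the moment generating functions can be checked by simulation. The sketch below (ours) verifies, by Monte Carlo, that gamma mixing reproduces the Cauchy-type factor (1 + t/φ)^{−α} used in the next subsection, with t playing the role of ‖h‖ (or ‖hij‖):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, phi = 1.5, 2.0   # shape and rate of the gamma mixing distribution
t = 0.8                 # plays the role of ||h|| (or ||h_ij||)

# Monte Carlo estimate of the moment generating function evaluated at -t
x = rng.gamma(shape=alpha, scale=1.0 / phi, size=1_000_000)
mc = np.mean(np.exp(-t * x))

# Closed form: M(-t) = (1 + t/phi)^(-alpha), the factor appearing in (2.18)
closed = (1.0 + t / phi) ** (-alpha)
print(mc, closed)       # the two values agree to roughly three decimal places
assert abs(mc - closed) < 2e-3
```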
2.3.1 Flexible classes
Let M0,ij , M1,ij and M2,ij be moment generating functions of gamma distributions. The resulting covariance function is in the Cauchy family, which allows for long range dependence and easy computation of the Hurst effect (Gneiting and Schlather, 2004). Let Xl,ij ∼ Gamma(αl, φij), ∀ i, j = 1, . . . , p, l = 0, 1, 2, h ∈ ℝ^d and hij ∈ ℝ^k. Then, from (2.17), the proposed cross-covariance function is

Cij(h; hij) = σij (1 + (‖h‖ + ‖hij‖)/φij)^{−α0} (1 + ‖h‖/φij)^{−α1} (1 + ‖hij‖/φij)^{−α2}, (2.18)

with σii ∈ ℝ+, i = 1, . . . , p, σij ∈ ℝ, i ≠ j, i, j = 1, . . . , p, αl > 0, l = 0, 1, 2, and φij > 0, i, j = 1, . . . , p. In order to avoid redundancy, as in Cressie and Huang (1999) and Fonseca and Steel (2011), we fix αi = 1 for i = 1, 2.
Thus the proposed flexible model is given by

Cij(‖h‖, δij) = σij (1 + δij + ‖h‖/φij)^{−α0} (1 + ‖h‖/φij)^{−1} (1 + δij)^{−1}, i, j = 1, . . . , p, (2.19)

where we adopt the reparametrization

φij = φ11 + dij , i = 1, . . . , p, j = 2, . . . , p,

and δij = ‖hij‖/φij represents a latent distance between components i and j, σij is the co-located covariance between components, α0 ≥ 0 is a smoothness parameter and the φij 's are spatial range parameters. Observe that φij > 0 requires dij > −φ11.
Notice that the model is conveniently specified in such a way that if dij = 0, for i = 1, . . . , p, j = 2, . . . , p, and α0 = 0, the separable model of Mardia and Goodall (1993) is obtained.
The general class is flexible enough to generate a nonseparable covariance structure and allows for different spatial ranges associated with each component, which may be obtained through the specification of p ranges φii, i = 1, . . . , p, and p(p − 1)/2 cross ranges φij = φji, i ≠ j. In the conditional approach presented in Cressie and Zammit-Mangion (2016), it is necessary to specify a univariate covariance function for one component and p − 1 conditional covariance functions; the cross-covariance structure involving the remaining p − 1 components is constructed from an interaction function.
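The reduction to the separable model can be verified numerically. The Python sketch below (with illustrative parameter values of ours, not estimates from the thesis) builds the stacked covariance implied by (2.19) and checks that, for dij = 0 and α0 = 0, it equals the Kronecker structure of Mardia and Goodall (1993):

```python
import numpy as np

def cross_cov(h, delta_ij, sigma_ij, alpha0, phi_ij):
    # Proposed cross-covariance (2.19)
    return (sigma_ij * (1 + delta_ij + h / phi_ij) ** (-alpha0)
            * (1 + h / phi_ij) ** (-1) / (1 + delta_ij))

def build_sigma(H, sigma, delta, alpha0, phi):
    n, p = H.shape[0], sigma.shape[0]
    S = np.zeros((n * p, n * p))
    for i in range(p):
        for j in range(p):
            S[i * n:(i + 1) * n, j * n:(j + 1) * n] = cross_cov(
                H, delta[i, j], sigma[i, j], alpha0, phi[i, j])
    return S

rng = np.random.default_rng(7)
n = 12
s = rng.uniform(size=(n, 2))
H = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

# Illustrative parameter values (ours)
sigma = np.array([[1.0, 0.6], [0.6, 1.5]])   # co-located covariances sigma_ij
delta = np.array([[0.0, 1.5], [1.5, 0.0]])   # latent distances, delta_ii = 0
phi11 = 0.1

# Separable special case: d_ij = 0 and alpha0 = 0
phi = np.full((2, 2), phi11)
S = build_sigma(H, sigma, delta, alpha0=0.0, phi=phi)
B = sigma / (1 + delta)                      # co-located covariance shrunk by latent distance
R = 1.0 / (1 + H / phi11)                    # Cauchy-type spatial correlation
assert np.allclose(S, np.kron(B, R))
print(np.min(np.linalg.eigvalsh(S)) > 0)     # True: positive definite
```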
Other choices of moment-generating functions may be considered, as suggested in Fonseca and Steel (2011). For instance, if X1,ij follows an inverse gamma distribution ∀ i, j, then the resulting factor M1,ij would yield the Matern covariance function. For example, let Xl,ij ∼ Gamma(αl, φij), ∀ i, j = 1, . . . , p, l = 0, 2, X1,ij ∼ InvGamma(ν, 1/φij), ∀ i, j = 1, . . . , p, h ∈ ℝ^d and hij ∈ ℝ^k; then the cross-covariance function is

Cij(h; hij) = σij (1 + (‖h‖ + ‖hij‖)/φij)^{−α0} [2/Γ(ν)] (‖h‖/φij)^{ν/2} Kν(2(‖h‖/φij)^{1/2}) (1 + ‖hij‖/φij)^{−α2}, (2.20)

with Kν(·) the modified Bessel function of the third kind of order ν, σii ∈ ℝ+, i = 1, . . . , p, σij ∈ ℝ, i ≠ j, i, j = 1, . . . , p, ν > 0, α0, α2 > 0, and φij > 0, i, j = 1, . . . , p. If ν = 1/2, the spatial component has an exponential covariance function.
In what follows, we restrict our attention to (2.19).
2.4 Bayesian inference
Let y = (yt(s1), . . . , yt(sn)) be a matrix of multivariate data observed at spatial locations s1, . . . , sn ∈ D with replicates t ∈ Z+, where yt(si) = (y1t(si), . . . , ypt(si))′, t = 1, . . . , T , is a p-dimensional vector. In what follows, replicates are considered in order to ensure identifiability of model parameters and latent distances. Under the Gaussian assumption, the likelihood function with T independent replicates for the unknown parameters, based on n spatial locations, is given by

l(y; Θ) = (2π)^{−npT/2} |Σ|^{−T/2} exp{−(1/2) Σ_{t=1}^T (yt − µ)′ Σ^{−1} (yt − µ)}, (2.21)
with yt the vectorized version of (yt(s1), . . . ,yt(sn)) with np observations; µ = Xβ the mean
vector with block diagonal design matrix X, where each block represents the predictor for each
component, and a parameter vector β; Σ the covariance matrix with dimension np × np. The
covariance matrix is described by a parametric vector Ψ defined by equation (2.19). In particular, for our model specification, Ψ = (σ, δ, α0, φ11, d) and the parametric vector is Θ = (Ψ, β),
with σ the vector of co-located covariance σij , i, j = 1, . . . , p, δ the vector of latent distances
δij , i 6= j, i, j = 1, . . . , p, α0 the smoothness parameter, φ11 the range parameter of the
first component, d = (d12, d13, . . . , d1p, . . . , dpp) the vector of dissimilarities among ranges and
β = (β11, . . . , β1q1 , β21, . . . , β2q2 , . . . , βp1, . . . , βpqp), with qi, i = 1, . . . , p, the number of elements of
the predictor of each component. From now on, we consider qi = q, i = 1, . . . , p.
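For numerical stability, the logarithm of (2.21) is typically evaluated through a Cholesky factorization of Σ rather than an explicit inverse. A sketch (ours, independent of the particular covariance model) with a sanity check against direct evaluation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(Y, mu, Sigma):
    """Gaussian log-likelihood (2.21) for T independent replicates.

    Y: (T, np) array of vectorized replicates; mu: (np,) mean; Sigma: (np, np).
    """
    T, npdim = Y.shape
    c, low = cho_factor(Sigma, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))          # log |Sigma| from Cholesky
    resid = Y - mu
    quad = np.sum(resid * cho_solve((c, low), resid.T).T)  # sum_t r_t' Sigma^{-1} r_t
    return -0.5 * (npdim * T * np.log(2 * np.pi) + T * logdet + quad)

# Sanity check on simulated data
rng = np.random.default_rng(3)
npdim, T = 6, 4
A = rng.normal(size=(npdim, npdim))
Sigma = A @ A.T + npdim * np.eye(npdim)
mu = rng.normal(size=npdim)
Y = rng.multivariate_normal(mu, Sigma, size=T)

direct = sum(-0.5 * (npdim * np.log(2 * np.pi)
                     + np.linalg.slogdet(Sigma)[1]
                     + (y - mu) @ np.linalg.inv(Sigma) @ (y - mu)) for y in Y)
assert np.isclose(log_likelihood(Y, mu, Sigma), direct)
```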
Gneiting et al. (2010) introduced a class of Matern cross-covariance functions defined by Cij(h) = σij M(h | νij , aij) = ρij σi σj M(h | νij , aij), i, j = 1, . . . , p, with co-located correlation coefficient ρij , co-located variance parameters σi and σj , smoothness parameter νij and scale parameter aij . The authors discuss the values of νij , aij and ρij that result in a valid bivariate structure. Apanasovich et al. (2012) use the same parameterization as Gneiting et al. (2010) to give a general characterization of the parameters that yield a valid multivariate Matern model for any number of components. This parameterization can also be seen in Daley et al. (2014). To simplify estimation, we set σij = σi σj , i, j = 1, . . . , p, with σi ∈ ℝ, i = 1, . . . , p. Note that the sign of ρij is absorbed into the components σi or σj , and we can still reconstruct the co-located variances and covariances. Thus we estimate the vector σ = (σ1, . . . , σp).
To complete the Bayesian model specification, prior independence is assumed for the parameters in the proposed model (2.19), with σi ∼ Normal(ci, c∗i ), i = 1, . . . , p, δij ∼ Gamma(fij , f∗ij), i < j, i = 1, . . . , p, j = 2, . . . , p, φ11 ∼ Gamma(u × med(δs), u), with med(δs) denoting the median of the distances among observed locations, and β ∼ MVN(b, B), pq-dimensional (MVN: multivariate normal distribution). A continuous prior specification would assign zero probability to α0 = 0 and d = (d12, d13, . . . , d1p, . . . , dpp) = 0. As an alternative, we consider the following joint mixture representation for ∆ = (α0, d):

π(∆ | φ11) = p0 D0 + (1 − p0) g(∆ | φ11), (2.22)

with D0 the Dirac measure at α0 = 0 and dij = 0, i = 1, . . . , p, j = 2, . . . , p, and g(∆ | φ11) a continuous joint distribution for α0 > 0 and dij > −φ11, i = 1, . . . , p, j = 2, . . . , p. Thus, p0 is the prior probability of a separable covariance function and p0 = 0 implies a completely continuous prior π(∆ | φ11) = g(∆ | φ11). In this sense, assuming independence, that is, g(∆ | φ11) = g1(α0) × ∏_{i=1}^p ∏_{j=2}^p g2(dij | φ11), it is necessary to specify the prior distributions of α0 and dij . We then consider g1(α0) ≡ Gamma(r, r∗), for α0 > 0, and g2(dij | φ11) ≡ TruncatedNormal(v, v∗), for dij ∈ (−φ11, ∞), i = 1, . . . , p, j = 2, . . . , p.
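Sampling from the mixture prior (2.22) is direct. The sketch below (hyperparameter values are ours, and the truncation is handled by simple rejection) draws the point mass with probability p0 and from g otherwise:

```python
import numpy as np

def sample_mixture_prior(rng, p0, phi11, r, r_star, v, v_star, n_d):
    """Draw (alpha0, d) from the mixture prior (2.22).

    With probability p0, return the point mass alpha0 = 0, d = 0 (separable model);
    otherwise alpha0 ~ Gamma(r, r_star) and each d_ij from a Normal(v, v_star)
    truncated to (-phi11, inf). Hyperparameter names are ours.
    """
    if rng.uniform() < p0:
        return 0.0, np.zeros(n_d)
    alpha0 = rng.gamma(shape=r, scale=1.0 / r_star)
    d = np.empty(n_d)
    for k in range(n_d):
        draw = rng.normal(v, np.sqrt(v_star))
        while draw <= -phi11:                 # rejection step for the truncation
            draw = rng.normal(v, np.sqrt(v_star))
        d[k] = draw
    return alpha0, d

rng = np.random.default_rng(10)
draws = [sample_mixture_prior(rng, p0=0.5, phi11=0.1, r=1.0, r_star=1.0,
                              v=0.0, v_star=0.04, n_d=3) for _ in range(2000)]
sep = np.mean([a == 0.0 for a, _ in draws])
print(round(sep, 2))   # close to the prior probability p0 = 0.5
```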
Inference is based on stochastic simulations from the complete conditional distributions for
sets of parameters. In particular Metropolis-Hastings steps are considered in the Gibbs sampler
algorithm as detailed in Gamerman and Lopes (2006).
2.4.1 Prediction
One of the main goals in spatial data analysis is to obtain prediction for unobserved locations
or for missing data within the observed data. Let yu be the observation vector at unmeasured
locations su ∈ D ⊆ ℝ^d. The prediction of yu is based on the predictive distribution p(yu | yo), with yo denoting the vector of observed data. Thus,

p(yu | yo) = ∫ p(yu | yo, Θ) p(Θ | yo) dΘ. (2.23)

From the Gaussian assumption, the distribution p(yu | yo, Θ) is also Gaussian, with parameters µ∗ = µu + ΣuoΣoo^{−1}(yo − µo) and Σ∗ = Σuu − ΣuoΣoo^{−1}Σou. Assume that Θ(1), . . . , Θ(M) is a sample from the posterior distribution p(Θ | yo) obtained by MCMC sampling. Then the predictive distribution in (2.23) may be approximated by

p(yu | yo) ≈ (1/M) Σ_{i=1}^M p(yu | yo, Θ(i)). (2.24)
Composition sampling is considered to obtain samples from this predictive distribution within the Metropolis-Hastings algorithm (Banerjee et al., 2015, p. 126).
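Composition sampling proceeds by drawing yu from the conditional Gaussian for each posterior draw. The toy sketch below (ours) uses stand-in "posterior draws" of a single range parameter of an exponential covariance rather than actual MCMC output:

```python
import numpy as np

def conditional_gaussian(mu, Sigma, obs_idx, unobs_idx, y_obs):
    # Parameters mu* and Sigma* of p(y_u | y_o, Theta), as in Section 2.4.1
    Soo = Sigma[np.ix_(obs_idx, obs_idx)]
    Suo = Sigma[np.ix_(unobs_idx, obs_idx)]
    Suu = Sigma[np.ix_(unobs_idx, unobs_idx)]
    w = np.linalg.solve(Soo, y_obs - mu[obs_idx])
    mu_star = mu[unobs_idx] + Suo @ w
    Sigma_star = Suu - Suo @ np.linalg.solve(Soo, Suo.T)
    return mu_star, Sigma_star

rng = np.random.default_rng(5)
s = rng.uniform(size=(8, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
obs, unobs = np.arange(6), np.array([6, 7])
y_obs = rng.normal(size=6)
mu = np.zeros(8)

samples = []
for phi in rng.gamma(2.0, 0.1, size=200):    # stand-in posterior draws of a range phi
    Sigma = np.exp(-D / phi)                 # exponential covariance given the draw
    m, S = conditional_gaussian(mu, Sigma, obs, unobs, y_obs)
    samples.append(rng.multivariate_normal(m, S))
samples = np.asarray(samples)
print(samples.shape)  # (200, 2)
```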
2.5 Bayesian hypotheses testing for separability
The separable model is a special case of (2.19) when the spatial ranges are equal, φij = φji = φ, ∀ i, j = 1, . . . , p, and α0 = 0. From a frequentist point of view, several authors have presented formal methods to test separability in spatiotemporal and multivariate space-time models (Mitchell et al., 2005, 2006; Fuentes, 2006; Li et al., 2007, 2008). The test proposed in this work aims to measure the degree of separability between space and components, and we follow the Bayesian paradigm for hypothesis testing.
Consider the prior distribution for ∆ = (α0, d) as in (2.22). The resulting posterior distribution in this specification is also a mixture,

π(∆ | y, φ11) = p0 D0 + (1 − p0) g(∆ | y, φ11), (2.25)

with p0 now denoting the posterior probability of a separable covariance function given the data. The posterior probability p0 might be used to select a model or to predict new observations based on model averaging across both models (Hoeting et al., 1999). In what follows, the main interest is to
develop a Bayesian test of separability.
Consider the null hypothesis H0: ∆ ∈ Ω0 and the alternative hypothesis H1: ∆ ∈ Ω1, where Ω0 ∪ Ω1 = Ω. In particular, consider Ω0 = {0} and losses w0 if H0 is rejected when the process is separable and w1 if H0 is not rejected when the process is nonseparable. Correct decisions are associated with null losses. The general idea is to choose the action which leads to the minimum posterior expected loss (DeGroot and Schervish, 2011). Such a test procedure rejects H0 when p0 = P(H0 | y) ≤ w1/(w0 + w1).
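The decision rule is straightforward to implement; in the sketch below (ours), the posterior probability p0 = 0.251 is borrowed from Table 2.2 purely for illustration:

```python
def separability_decision(p0, w0, w1):
    """Bayes decision for H0 (separable) vs H1 (nonseparable).

    Rejects H0 when the posterior probability p0 = P(H0 | y) does not exceed
    the loss ratio w1 / (w0 + w1); w0 penalizes falsely rejecting H0, and w1
    penalizes failing to reject it when the process is nonseparable.
    """
    return "reject H0" if p0 <= w1 / (w0 + w1) else "do not reject H0"

# Under equal losses the cutoff is 0.5
print(separability_decision(p0=0.251, w0=1.0, w1=1.0))   # reject H0
# With w0 = 19, w1 = 1 (strong protection of H0) the cutoff drops to 0.05
print(separability_decision(p0=0.251, w0=19.0, w1=1.0))  # do not reject H0
```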
It is common to use Bayes factors (BF) to compare a point hypothesis with a continuous alternative. The Bayes factor is the ratio of the posterior odds against H0 to the prior odds against H0, that is,

BF = [(1 − P(H0 | y))/P(H0 | y)] / [(1 − P(H0))/P(H0)], (2.26)

with P(H0) the prior probability of separability specified in (2.22) and P(H0 | y) = p0 its posterior counterpart. Observe that (2.26) is simply the posterior odds against H0 when P(H0) = 0.5. Considering this situation, we can reconstruct the interpretation table of the BF given in Kass and Raftery (1995) based on the posterior probability of separability p0 and the losses w0 and w1. Table 2.1 presents the interpretation of the Bayesian test for separability proposed in this subsection. Detailed information about the BF is given in Kass and Raftery (1995); more details about Bayesian hypothesis testing can be found in Robert (1994) and Schervish (1995).
BF         p0             (w0; w1)        Evidence against H0 (against separability)
1 to 3     0.50 to 0.25   (1 to 3; 1)     Not worth more than a bare mention
3 to 20    0.25 to 0.05   (3 to 20; 1)    Substantial nonseparability
20 to 150  0.05 to 0.01   (20 to 150; 1)  Strong nonseparability
> 150      < 0.01         (> 150; 1)      Very strong nonseparability

Table 2.1: Interpretation table for the Bayesian separability test. BF: Bayes factor; p0: posterior probability of separability; w0: loss associated with the decision of rejecting H0 when H0 is true; w1: loss associated with the decision of not rejecting H0 when H1 is true.
2.6 Simulated examples
2.6.1 Example 1: computing posterior probabilities of separability
In practice, repeated observations are often available at each location, and in this chapter we assume independent replications. A single replication of the data is insufficient to make inferences about the parameters of the covariance function, especially the latent parameters. In this context, we simulate scenarios ranging from separable to very nonseparable structures, considering the covariance function in (2.19).
We generate three datasets with p = 2 components, n = 80 spatial locations and T = 10
independent replicates, specifying different combinations of parameters responsible for separability.
We consider a Gaussian process, so yt ∼ MVN(Xβ,Σ) np-dimensional, t = 1, . . . , T , where Σ
is a np × np covariance matrix and X is a p-block diagonal design matrix, with identical blocks
containing an intercept term and independent variables (latitude, longitude and altitude, randomly
generated in the cube [0,1] × [0,1] × [0,1]). For each dataset, we consider three scenarios for
α0, which are α0 = 0, α0 = 0.1 and α0 = 0.2. The specification of the other parameters is
the same for all datasets and scenarios and is given by Θ∗ = (β, δ12, σ1, σ2) with β = (β1,β2),
β1 = (1,−0.2,−0.8, 0.5), β2 = (1.5, 0.6,−0.5,−0.8), δ12 = 1.5, σ1 = 1 and σ2 = 1.5. The structure
for the spatial ranges is constructed as follows:

Dataset 1 (equal ranges): considers φ11 = φ22 = φ12 = 0.1 (d12 = d22 = 0);

Dataset 2 (similar ranges): assumes the spatial range of the second variable is somewhat different, i.e., φ11 = φ12 = 0.1 and φ22 = 0.13 (d12 = 0 and d22 = 0.03);

Dataset 3 (different ranges): considers φ11 = φ12 = 0.1 and φ22 = 0.2 (d12 = 0 and d22 = 0.1).
The prior distributions for all scenarios follow the discussion in Section 2.4 and the hyperpa-
rameters are described in Appendix A.2.
Table 2.2 presents the posterior probabilities of separability p0 for each scenario. Note that
the model that considers equal ranges and α0 = 0 results in estimated p0 close to 1. In the
same dataset, assuming α0 = 0.2, the posterior probability of separability indicates substantial
nonseparability, following Table 2.1. In dataset 2, even assuming α0 = 0 there are indications that
the structure is nonseparable, that is, even if the ranges are only slightly different from one another,
the nonseparable specification is preferred. By increasing the value of α0, the nonseparability
hypothesis becomes evident. In dataset 3, where we assume that one of the ranges is completely
different from the other two, there is always evidence of substantial or strong nonseparability,
independently of α0.
Dataset                       α0 = 0   α0 = 0.10   α0 = 0.20
Dataset 1 (equal ranges)      0.987    0.251       0.146
Dataset 2 (similar ranges)    0.259    0.115       0.059
Dataset 3 (different ranges)  0.058    0.000       0.000
Table 2.2: Posterior probabilities of separability p0 for each dataset and scenario of α0.
The posterior probability of separability is an easily interpretable measure for inference on separability. Apanasovich and Genton (2010) propose models that use the covariance functions presented in Gneiting (2002). They also estimate a separability parameter defined on the interval [0, 1], where 0 indicates separability and 1 should indicate strong nonseparability. However, Fonseca and Steel (2017) verified that this parameter is not able to correctly capture the degree of separability in the data structure, showing that the parameter being at its upper limit does not imply strong nonseparability. Also, from simulated exercises with model (2.19), we find that other parameters, such as the spatial ranges, influence separability. Thus, the computation of the probability of separability is an attractive alternative which does not depend on the scale of the ranges or on smoothness.
2.6.2 Example 2: model discrimination
This section presents a simulated example for three different scenarios, in order to compare
different models analyzing their predictive performance and posterior probability of separability.
We also aim at verifying the ability of the nonseparable model, presented in equation (2.19), to
recover simpler structures.
Three datasets were generated with p = 3 components, T = 20 independent replications and n = 55 spatial locations in the [0, 1] × [0, 1] square. Three spatial locations were held out for prediction. The datasets are generated using the following specifications:
dataset 1 (separable): Simulation based on the separable model with Cauchy correlation function, specified in Appendix A.1. The parameters aij were defined so as to yield high correlation between components, which may be expressed through a∗ij = aij/√(aii ajj). The parameter specification is Θ = (β, a11, a22, a33, a∗12, a∗13, a∗23, φ), with β = (β1, β2, β3), β1 = (1, −0.2, −0.8, 0.5), β2 = (1.5, 0.6, −0.5, −0.8), β3 = (1.8, −0.4, −0.3, 0.6), a11 = 0.8, a22 = 0.7, a33 = 0.5, a∗12 = −0.8, a∗13 = 0.9, a∗23 = −0.7 and φ = 0.1;
dataset 2 (nonseparable with equal ranges): Simulation based on the nonseparable covariance
function presented in (2.19) with φij = φji = φ, i, j = 1, . . . , p, that is, dij = 0, i = 1, . . . , p,
j = 2, . . . , p. The parametric vector is Θ = (β, σ1, σ2, σ3, δ12, δ13, δ23, α0, φ). We define α0 = 0.25.
The δ parameters were chosen such that the variables present weak/moderate correlation. Thus,
we consider the following parameter specification: σ1 = 2, σ2 = 1, σ3 = 2.5, δ12 = 2, δ13 = 2.2,
δ23 = 1.9 and φ = 0.1. The β = (β1,β2,β3) parameters are the same defined in dataset 1.
dataset 3 (nonseparable): Simulation based on the nonseparable covariance function presented in (2.19) with dij ≠ 0, i = 1, . . . , p, j = 2, . . . , p. The parametric vector is Θ = (β, σ1, σ2, σ3, δ12, δ13, δ23, α0, φ11, d12, d13, d22, d23, d33), with β = (β1, β2, β3), β1 = (11.2, −0.07, 0.25, −0.01), β2 = (−6.8, 1.3, −0.25, 0.02), β3 = (5.6, 0.1, 0.13, −0.01), σ1 = −2.4, σ2 = 8.7, σ3 = −1.4, δ12 = 0.6, δ13 = 1.7, δ23 = 1.2, α0 = 0, φ11 = 0.07, d12 = 0.15, d13 = 0.13, d22 = 0.18, d23 = 0.23 and d33 = −0.01.
In order to illustrate the correlations between the variables in each dataset, we estimate the parameters of the model y = µ + ε for each spatial location i, i = 1, . . . , 55, that is,

y = (y1, y2, y3)′ ∼ MVN(µ, C), (2.27)

with C an arbitrary full cross-covariance matrix.
Figure 2.1 presents the posterior median and 95% credibility interval of the correlations between
variables for each dataset.
Figure 2.1: Posterior median and 95% CI of the cross-correlations among variables for each dataset
considering an independent multivariate normal distribution for each spatial location.
We estimate four multivariate models for each dataset and their performances are compared in
predictive terms. The models considered are described as follows:
1. SEP: the separable model as presented in Mardia and Goodall (1993) with covariance function
in the Cauchy family, as in Appendix A.1;
2. NSEP-φ: the nonseparable model with equal ranges and covariance function as defined in
(2.19), that is, dij = 0, i = 1, . . . , p, j = 2, . . . , p, and a continuous prior for α0;
3. NSEP: the nonseparable model with different ranges as defined in (2.19), with prior for α0
and dij , i = 1, . . . , p, j = 2, . . . , p, as in (2.22) with p0 = 0;
4. Mix-NSEP: the nonseparable model with different ranges as defined in (2.19), with a mixture prior for α0 and dij , i = 1, . . . , p, j = 2, . . . , p, as in (2.22).
Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and convergence was monitored with the algorithms in the coda package for R (Plummer et al., 2006).
Table 2.3 presents predictive measures for model comparison for each dataset. The IS (Interval Score), WCI (Width of Credibility Interval) and LPS (Log Predictive Score) are predictive scoring rules that summarize the accuracy of probabilistic predictions. The smaller the IS, WCI or LPS, the better the model in predictive terms. For more details about scoring rules see Gneiting and Raftery (2007) and Appendix A.3.
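For concreteness, the interval score for a central (1 − α) × 100% predictive interval [l, u] and realized value y is IS(l, u; y) = (u − l) + (2/α)(l − y)1{y < l} + (2/α)(y − u)1{y > u}. A minimal sketch of this standard form (the function name is ours, and Python is used here purely for illustration, outside the R-based analysis of the thesis):

```python
import numpy as np

def interval_score(lower, upper, y, alpha=0.05):
    """Interval score for a central (1 - alpha) predictive interval
    [lower, upper] and realization y (Gneiting and Raftery, 2007).
    Narrow intervals are rewarded; misses are penalized by 2/alpha."""
    width = upper - lower
    penalty_below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    penalty_above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + penalty_below + penalty_above
```

Averaging `interval_score` over the hold-out observations gives an "average IS" of the kind reported in Table 2.3, while averaging `width` alone gives the "average WCI".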
In the first scenario (dataset 1), the data were simulated assuming a separable structure, and indeed the hypothesis test for separability obtained from the most complex model (Mix-NSEP), with p0 = 0.814, indicates separability for these data. Figure 2.2 shows that the most complex model is able to recover the simplest separable structure. The predictive measures are similar for all models, so the simpler model is preferable (Occam's razor).
In the second scenario (dataset 2), the data are nonseparable but the ranges are all equal. The simulated data represent an example of moderate nonseparability, and separable models are expected to mimic this covariance structure reasonably well. The hypothesis of separability may be tested under the Mix-NSEP model, and the test indicates a posterior probability of separability of p0 = 0.311; that is, nonseparability could be assumed but is not substantially advantageous over separability for these data. The nonseparable model with equal ranges (NSEP-φ) is selected as the best model according to the predictive discrepancy measures.
In the third scenario (dataset 3), the data are nonseparable and the spatial ranges are all different. This represents an example of strong nonseparability, which could not be reproduced by separable covariance structures. The hypothesis test under the model with different ranges (Mix-NSEP) correctly indicates strong nonseparability for these data, with p0 = 0. The models with equal ranges (SEP and NSEP-φ) have poor predictive performance when compared to the models with unequal spatial ranges.
[Figure 2.2 here: posterior covariance functions against distance in space for panels (a) Cov11, (b) Cov22, (c) Cov33, (d) Cov12, (e) Cov13 and (f) Cov23.]
Figure 2.2: Posterior median of covariance function (gray full line) and 95% credible interval (gray
dashed lines) of Mix-NSEP model. Black full line: true covariance function (SEP model).
data                 model     average WCI  average IS  LPS     p0
Dataset 1            SEP       6.70         7.53        70.19   –
(separable)          NSEP-φ    6.58         7.54        70.29   –
                     NSEP      6.61         7.58        70.09   –
                     Mix-NSEP  6.67         7.48        70.12   0.814
Dataset 2            SEP       17.16        24.12       309.96  –
(nonseparable:       NSEP-φ    17.10        24.45       309.25  –
equal ranges)        NSEP      17.07        24.51       309.38  –
                     Mix-NSEP  17.21        24.24       309.92  0.311
Dataset 3            SEP       32.04        33.80       373.81  –
(nonseparable:       NSEP-φ    31.93        33.99       373.72  –
different ranges)    NSEP      29.75        31.69       369.65  –
                     Mix-NSEP  29.73        31.41       369.67  0.000
Table 2.3: Predictive measures for model comparison and posterior probability of separability for
the simulation examples. IS=Interval Score, WCI=Width of Credibility Interval and LPS=Log
Predictive Score.
2.7 Ceara weather dataset
In this section we apply the model defined in (2.19) to an illustrative dataset obtained from a collection of monitoring stations in Ceara state, Brazil. The weather dataset was obtained from Instituto Nacional de Pesquisas Espaciais (INPE) and consists of three variables, temperature (°C), humidity (%) and solar radiation (MJ/m2), measured daily at 12 o'clock and recorded at 24 stations from December 20, 2010 to February 28, 2011. Locations with less than 10% missing values went through an imputation process1. In addition, we work with the seasonally adjusted series to obtain T = 71 independent replicates. For predictive comparison and validation, we hold out two spatial locations. Figure 2.3 shows the locations of these 24 monitoring sites and the two hold-out sites on a latitude-longitude scale.
1The imputation was performed using the mice package in R.
[Figure 2.3 here: map of the monitoring stations on a latitude-longitude scale, with the two hold-out sites marked 1 and 2.]
Figure 2.3: Collection of monitoring stations in Ceara state, Brazil. Numbers 1 and 2 are spatial
locations considered for predictive comparison.
In order to evaluate the correlations between each pair of variables, we estimate the parameters of a model as in (2.27). From Figure 2.4, note that there is strong correlation among the three variables, so we expect the component distances between them to be small.
[Figure 2.4 here: posterior correlations against spatial location for (a) temperature vs. humidity, (b) temperature vs. solar radiation and (c) humidity vs. solar radiation.]
Figure 2.4: Posterior median and 95% CI of the cross-correlations among components considering
an independent multivariate normal distribution for each spatial location.
We compared the predictive performance of the following models: the separable model (SEP) with covariance function in the Cauchy family (see Appendix A.1); the nonseparable model with equal spatial ranges (NSEP-φ); the nonseparable model with different spatial ranges (NSEP); the nonseparable model with different spatial ranges and mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3, defined in (2.22) (Mix-NSEP); and the linear model of coregionalization (LMC) with covariance function for each component in the Cauchy family (see Appendix A.1). Parameter estimation was performed considering the likelihood described in (2.21).
The prior distributions for the parameters in the proposed model follow the discussion in Section 2.4, and their hyperparameters are presented in Appendix A.2. MCMC methods were used to generate posterior and predictive samples, and convergence was monitored with the algorithms in the coda package for R.
Separable models are limited in that they force the covariance structures of all variables to be proportional in space. To assess this, we evaluated the behavior of the posterior spatial correlation of each variable (temperature, humidity and solar radiation) under a univariate spatial model, whose covariance function is defined by C(‖h‖) = σ²ρ(‖h‖), with σ² the constant variance, ρ(·) a valid spatial correlation function and ‖h‖ the Euclidean distance. We consider a spatial correlation function in the Cauchy family. These models were used only for this evaluation and are presented in Appendix A.1 (for more details see Banerjee et al., 2015); the prior distributions are presented in Appendix A.2. Figure 2.5 shows the posterior spatial correlation for each component. Note that the humidity variable behaves differently from the other components.
[Figure 2.5 here: posterior spatial correlation against distance in space for temperature, humidity and solar radiation.]
Figure 2.5: Posterior median (full line) and 95% credible interval (dashed lines) of spatial correlation
of the univariate spatial models for each component.
Therefore, models that allow different structures for different components are expected to have better predictive performance than models that assume proportionality in space between the variables' covariances. The Mix-NSEP and NSEP models estimate different spatial ranges for each component, as well as the ranges of the cross-dependence structures. Note, in Figure 2.6, the difference among the posterior densities of the ranges in the Mix-NSEP model compared to the single range estimated in the SEP model. This flexibility gives the Mix-NSEP model the best predictive results among all models, including the LMC (see Table 2.4).
[Figure 2.6 here: posterior densities of the spatial ranges φ for temperature, humidity, solar radiation and their pairwise cross terms, under the SEP and Mix-NSEP models.]
Figure 2.6: Posterior densities of the spatial ranges of SEP and Mix-NSEP models.
Model average WCI average IS LPS p0
SEP 46.65 52.97 898.88 –
NSEP-φ 46.77 53.27 898.89 –
NSEP 42.95 48.61 891.23 –
Mix-NSEP 43.04 48.53 889.77 0.000
LMC 44.75 50.32 894.75 –
Table 2.4: Comparison of models in predictive terms for the Ceara weather dataset. IS=Interval
Score, WCI=Width of Credibility Interval and LPS=Log Predictive Score.
2.8 Discussion
This chapter extends the class of nonseparable covariance functions proposed in Fonseca and
Steel (2011) to the modeling of component and spatial dependence, considering latent distance
between components as in Apanasovich and Genton (2010). We have proposed a Bayesian test to
measure the degree of separability between space and components. The posterior probability p0
has shown to be an easily interpretative measure in terms of separability of the component-spatial
covariance structures.
From the simulated and meteorological data examples, it is clear that flexible structures are
needed, which are able to accommodate the assumption of different spatial ranges in space for a
vector of spatial processes. The presented model was able to recover simple structures and presented
better predictive performance when applied to Ceara weather dataset than models widely used in
the literature, such as the separable model and the linear model of coregionalization.
Chapter 3
Likelihood computation for large data
3.1 Introduction
With the increase of high-resolution geocoded data, the big data problem has become crucial in the spatial and spatiotemporal setting. For instance, if Gaussianity is assumed, large covariance matrices need to be inverted in the inference procedure, and the computational effort is of cubic order in the number of locations. This limitation becomes even more important in the case of spatiotemporal or multivariate data: even low-dimensional vectors observed over space may lead to huge covariance matrices, making inference for the unknown parameters infeasible. Thus, a compromise between complexity and parsimony is called for in this context.
In this chapter, we work with the multivariate spatial covariance functions proposed in Chapter 2. In order to deal with the high computational effort, we approximate the full covariance matrix using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations are applied to the likelihood function in order to obtain fast parameter estimation while keeping the interpretation and flexibility of the multivariate nonseparable model.
3.2 Separable approximations
We have presented a nonseparable covariance model which results in a full matrix Σ which might
have high dimension and the computation of likelihoods in a Gaussian model require the inversion
and determinant computation of this matrix. We investigate the use of separable approximations
for the matrix Σ which will lead to fast computation of the likelihood function. The approximation
will be based on the work of Van Loan and Pitsianis (1993).
Genton (2007) investigates the use of singular value decompositions of a full matrix in the context of nonseparable spatiotemporal covariance matrices. That work considers a decomposition based on separable matrices, which allows for fast inversions and determinant computations: instead of np × np matrices, the approximation uses only n × n and p × p matrices. We consider the same separable approximation in order to compute likelihoods for the nonseparable multivariate spatial models presented in Section 2.3.1. The aim is to obtain matrices R ∈ ℝn×n and A ∈ ℝp×p such that the Frobenius norm1 ‖Σ − R ⊗ A‖F is minimized, for a given full covariance matrix Σ ∈ ℝnp×np. The author shows that the solution to this problem is given by the singular value decomposition of a permuted version of Σ.
The idea is to rearrange Σ into another matrix ℛ(Σ) ∈ ℝn²×p², such that the sum of squares in ‖Σ − R ⊗ A‖F equals the sum of squares in ‖ℛ(Σ) − vec(R)vec(A)′‖F . It is shown in Golub and Van Loan (1996) that ‖Σ − R ⊗ A‖F = ‖ℛ(Σ) − vec(R)vec(A)′‖F and ‖Σ‖F = ‖ℛ(Σ)‖F .
The problem then reduces to finding the nearest rank-one matrix to the rectangular matrix ℛ(Σ) ∈ ℝn²×p². The solution is based on the singular value decomposition of ℛ(Σ), where U′ℛ(Σ)V = diag(w1, . . . , wr), U ∈ ℝn²×n² and V ∈ ℝp²×p² are orthogonal matrices, w1 ≥ w2 ≥ . . . ≥ wr ≥ 0 and r = rank(ℛ(Σ)) = min{n², p²}. The solution can be found in Golub and Van Loan (1996) and is given by:

vec(R) = √w1 u1 ,    vec(A) = √w1 v1 , (3.1)

with u1 denoting the first column of U and v1 the first column of V.
In order to measure the quality of the approximation, Genton (2007) defines an approximation error, denoted by κΣ(R, A), as follows:

κΣ(R, A) = ‖Σ − R ⊗ A‖F / ‖Σ‖F . (3.2)

κΣ(R, A) varies between zero (if Σ is separable) and √(1 − 1/r), and is minimized by the R and A given above. A standardized error index, varying between zero and one, is given by:

κ∗Σ(R, A) = κΣ(R, A) / √(1 − 1/r) . (3.3)
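In code, the rearrangement-and-SVD solution (3.1) and the error index (3.3) can be sketched as follows (a minimal numpy version under the row-major vec convention; the function names are ours, not from Genton, 2007):

```python
import numpy as np

def nearest_kronecker(Sigma, n, p):
    """Best Frobenius-norm approximation Sigma ~ R (n x n) kron A (p x p),
    via the SVD of the Van Loan-Pitsianis rearrangement of Sigma."""
    # Row (i, j) of the n^2 x p^2 rearrangement is the vec of the (i, j) block
    rearranged = (Sigma.reshape(n, p, n, p)
                       .transpose(0, 2, 1, 3)
                       .reshape(n * n, p * p))
    U, w, Vt = np.linalg.svd(rearranged, full_matrices=False)
    R = np.sqrt(w[0]) * U[:, 0].reshape(n, n)
    A = np.sqrt(w[0]) * Vt[0, :].reshape(p, p)
    return R, A

def error_index(Sigma, R, A):
    """Standardized separability error index in [0, 1], cf. (3.3)."""
    r = min(R.size, A.size)  # min{n^2, p^2}
    kappa = np.linalg.norm(Sigma - np.kron(R, A)) / np.linalg.norm(Sigma)
    return kappa / np.sqrt(1.0 - 1.0 / r)
```

For a separable Σ = R0 ⊗ A0 the index is zero up to rounding; for a nonseparable Σ it quantifies how much the separable approximation loses.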
From the covariance structure proposed in equation (2.19), we investigate the sensitivity of the
separability approximation error index as a function of α0, for p = 2 and p = 3. Note that we use
1The Frobenius norm of an n × n matrix B is given by ‖B‖F = (Σni=1 Σnj=1 b²ij)^(1/2).
the idea previously applied to nonseparable spatiotemporal covariance matrices, now in the context of nonseparable multivariate spatial covariance matrices. In Figure 3.1 we can see that the separability approximation error index is not larger than 5% for a covariance structure in which all the components have the same spatial range. From Figure 3.1(a), note that there is no error when α0 is zero, which reduces to the separable case. If different spatial ranges are considered, Figure 3.1(b) shows that the error index does not start at zero, because when α0 = 0 the separable case is not obtained.
[Figure 3.1 here: separability approximation error index against α0 for (a) equal spatial ranges and (b) different spatial ranges.]
Figure 3.1: Separability approximation error index as a function of α0. Full line: p = 2; dashed
line: p = 3.
3.3 Sensitivity study
We present a sensitivity study of the approximation structure investigated in the spatiotemporal context by Genton (2007), used here for the multivariate spatial case. We study different scenarios and measure the errors obtained in the likelihood approximation. Moreover, we compare the inferential and predictive results obtained when the full nonseparable model is applied with and without the separable approximation for the covariance matrix in the likelihood computation, as well as when a separable model is considered.
Consider a bivariate dataset of 200 spatial locations in the [0, 1] × [0, 1] square. The observations were generated from the model y = µ + ε. We consider a Gaussian process, so y ∼ Nnp(µ, Σ), where the elements of Σ are obtained from the covariance function defined in (2.19) with dij = 0, for i = 1, . . . , p, j = 2, . . . , p. We consider the following parameter specification: Θ = (µ1, µ2, δ12, φ, α0, σ1, σ2) with µ1 = µ2 = 0, δ12 = 2, φ = 0.1, α0 = 0.5 and σ1 = σ2 = 1. In this example, we generate only one dataset in the region of interest. We plot the likelihood contours under both structures, using the separable approximation for the covariance matrix and its full original structure. From Figure 3.2, it can be seen that the approximate structure is very similar to the full structure; in some cases the approximate likelihood and the exact one almost coincide. The approximations seem satisfactory.
[Figure 3.2 here: pairwise likelihood contour plots for the parameters φ, α0, δ12, σ1 and σ2.]
Figure 3.2: Likelihood contour plots. Black line: full structure. Red line: approximate structure.
Dashed black line: true value of parameters.
We analyzed the time needed to calculate the likelihood function based on a full covariance matrix and on a covariance matrix with the approximate structure. We generated p = 2, 3, 5 and 8 variables in datasets with n = 100, 200, 500, 700 and 1000 spatial locations in the [0, 1] × [0, 1] square. In this example, 200 replicates were generated in the region of interest.
Table 3.1 shows that the separable approach provides important gains in computational efficiency. Note that the time to calculate the likelihood function is substantially lower when we use the approximate structure.
We also analyzed the time reduction achieved by the separable approximations. Figure 3.3 shows that the time reduction (in %) in the likelihood computation increases with the size of the covariance matrix: the larger the number of variables or spatial locations (or both), the greater the computational gain of the approximate structure.
        p = 2           p = 3            p = 5            p = 8
n       full   approx.  full    approx.  full    approx.  full     approx.
100     2.3    0.8      3.6     0.6      8.9     0.9      28.0     2.1
200     12.3   3.1      13.9    1.6      44.8    4.2      187.1    11.1
500     74.0   13.4     143.1   12.1     618.6   30.5     2409.5   86.9
700     148.2  20.9     388.4   29.4     1649.7  65.8     6520.7   182.2
1000    374.6  52.6     1020.7  66.2     4673.9  133.5    19180.3  446.2
Table 3.1: Necessary time (in seconds) to calculate the likelihood function based on a full covariance
matrix and an approximate structure. (Intel(R) Core(TM) i7-3630QM, 2.40GHz, 6GB RAM)
Finally, we compare the predictive results obtained by the separable model, by the separable approximation for the covariance matrix in the likelihood of the nonseparable model, and by the original nonseparable covariance structure without any approximation of the likelihood. For that purpose, we generated five replicated datasets considering the nonseparable covariance function defined in equation (2.19) with dij = 0, for i = 1, . . . , p, j = 2, . . . , p. For each dataset, we generated p = 2 variables at n = 110 spatial locations in the [0, 1] × [0, 1] square, considering the following parameter specification: Θ = (β1, β2, δ12, φ, α0, σ1, σ2) with β1 = (1, −0.2, −0.8, 0.5), β2 = (1.5, 0.6, −0.5, −0.8), δ12 = 2, φ = 0.2, α0 = 1, σ1 = 1.5 and σ2 = 1. The covariance functions used for the separable and nonseparable models are shown in equations (3.4) and (3.5), respectively:
Cij(‖h‖) =
    a11 (1 + ‖h‖/φ)^(−1),   i = j = 1,
    a22 (1 + ‖h‖/φ)^(−1),   i = j = 2,      (3.4)
    a12 (1 + ‖h‖/φ)^(−1),   i ≠ j,
[Figure 3.3 here: computational time reduction (%) against number of spatial locations, for p = 2, 3, 5 and 8.]
Figure 3.3: Computational time reduction (in percent) in calculation of the likelihood function
using approximate structure.
C(‖h‖, δij) =
    σ1² (1 + ‖h‖/φ)^(−(α0+1)),   i = j = 1,
    σ2² (1 + ‖h‖/φ)^(−(α0+1)),   i = j = 2,      (3.5)
    σ1σ2 (1 + δ12 + ‖h‖/φ)^(−α0) (1 + ‖h‖/φ)^(−1) (1 + δ12)^(−1),   i ≠ j,
with ‖h‖ the Euclidean distance. We adopted T = 30 independent replicates. The observations were generated from the model y = Xβ + ε. We consider a Gaussian process, so yt ∼ Nnp(Xβ, Σ), t = 1, . . . , T , where Σ is the np × np covariance matrix and X contains the independent variables (latitude, longitude and altitude, randomly generated in the cube [0, 1] × [0, 1] × [0, 1]). For the nonseparable models we use the following priors: σi ∼ N(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5), φ ∼ Gamma(0.1 × med(ds), 0.1), with med(ds) = 0.502, β ∼ N8(0, 1000I8) and α0 ∼ Gamma(1, 0.25). For the separable model we use the following priors: A ∼ InverseWishart(I2, 3), φ ∼ Gamma(0.75 × med(ds), 0.75), with med(ds) = 0.502, and β ∼ N8(0, 1000I8). Posterior samples were obtained via MCMC, and convergence was monitored with the algorithms in the coda package in R (see Plummer et al., 2006).
Furthermore, the data from five spatial locations were removed from the training data and used for prediction validation; therefore, we estimate the models using information on n = 105 spatial locations. After estimating the models for each dataset, we computed measures of predictive performance for each model. Table 3.2 presents the comparison of the models in predictive terms; our main goal is to compare the predictive performance of the full nonseparable model and the approximate nonseparable model. We can see that, in predictive terms, the approximation leads to results very similar to those of the full-likelihood nonseparable model. The results of the separable model are in agreement with the discussion presented in Section 2.6.2; that is, we expect its predictive performance to be similar to that of the nonseparable model with equal ranges.
         average WCI              average IS               LPS
Data     SEP    NS app.  NS      SEP    NS app.  NS       SEP     NS app.  NS
1        3.26   3.23     3.21    4.21   4.26     4.25     375.62  375.65   375.19
2        3.25   3.24     3.22    4.25   4.26     4.35     381.58  382.18   382.64
3        3.29   3.25     3.25    3.97   3.93     3.94     355.49  355.06   354.98
4        3.32   3.28     3.29    3.92   3.91     3.96     372.51  372.69   372.50
5        3.25   3.23     3.22    3.88   3.91     3.85     341.26  341.24   341.24
Mean     3.27   3.25     3.24    4.05   4.05     4.07     365.29  365.36   365.31
Table 3.2: Predictive model comparison. SEP: separable model. NS app: nonseparable approximate
model. NS: nonseparable model.
3.4 Discussion
In this chapter we have investigated the performance of an approximation for the full nonsepa-
rable covariance model using the decomposition based on the Kronecker product of two separable
matrices of minor dimensions. A sensitivity study was performed showing that the approximate
approach provides important gains in computational efficiency while keeping the predictive power.
Although taking advantage of approximations to compute the likelihood, our idea keeps interpre-
tation and flexibility.
We conclude that it is better to consider a separable approximation of the nonseparable de-
scribed model than to consider the nonseparable structures, when we assume equal ranges. The non-
separable approximation reduces considerably the computational cost and keeps predictive power,
36
which is usually the main focus of multivariate spatiotemporal data analysis.
Chapter 4
Multivariate spatio-temporal
modeling
4.1 Introduction
In Chapter 2 we proposed a class of multivariate spatial covariance functions based on mixtures. The parametric function defined in (2.19) generates a nonseparable covariance structure flexible enough to allow different spatial ranges for each component, and we have seen that imposing equal spatial ranges, as implied by separable specifications, limits the flexibility of the spatial structure and does not favor its use.
In the simulated and real datasets presented in the previous chapters, we have modeled only the cross-spatial dependence. In general, however, datasets are observed over time, and modeling the temporal dependence becomes essential. It is therefore necessary to incorporate a structure capable of modeling the dependence of the variables in space and time.
Several papers have extended spatial cross-covariance models to the space-time setting. Rouhani and Wackernagel (1990), Choi et al. (2009), Berrocal et al. (2010) and Iaco et al. (2013), for example, developed versions of space-time models based on the linear model of coregionalization. Gelfand et al. (2005) considered a dynamic approach to model multivariate space-time data. Their idea is to view the data as a time series of spatial processes, adapting the structure of dynamic models to space-time models with space-varying coefficients (Reis et al., 2013).
Following the idea of latent dimensions presented in Section 2.2, Apanasovich and Genton (2010) proposed multivariate spatio-temporal covariance functions extending the univariate space-time class of covariance functions described in Gneiting (2002). More recently, Ip and Li (2016) extended results of the bivariate Matérn class presented in Gneiting et al. (2010) to the space-time setting. Gaussian models can be very restrictive, since fit and predictions may not be satisfactory if the dataset presents non-Gaussian characteristics. In fact, if there are aberrant observations in the dataset, it is reasonable to accommodate them in some way, since the identification of outliers is essential to improve model fit and predictive power.
In this chapter, we introduce a spatio-temporal version of the cross-covariance function presented in (2.19). Like Gelfand et al. (2005), we also consider dynamic models to describe the temporal evolution, but with time-varying covariance coefficients, allowing the model to accommodate outlying observations and changes in variability over time.
4.2 Multivariate dynamic spatial models
There are several ways of describing phenomena that vary over time. Generally, the purpose of time series analysis is forecasting for future, unobserved times. Harrison and Stevens (1976) described a class of Bayesian forecasting models called dynamic models.
Dynamic models (or state space models) assume that at each time t ∈ ℕ the observation yt (or the vector of observations yt) of the time series is characterized probabilistically by a vector of parameters θt, called the state vector, whose components may vary over time, accommodating local structures of the temporal process. These models are very flexible because they are able to treat nonstationary time series with structural changes or irregular patterns. Following Berliner (1996), a convenient way to think about the structure of a dynamic model is:
Observational model: [data | state space process, parameters]
System evolution: [state space process | parameters]
Initial information: [parameters].
An important special case of general state space models is the dynamic linear model (DLM), which assumes Gaussian responses. A DLM is generally characterized by two equations: the observation equation, which describes the relationship between the covariates and the response variable, and the evolution equation, which describes how the states of the model evolve over time. An extensive review of these models can be found in West and Harrison (1997) and Petris et al. (2009).
In this subsection, our aim is to introduce the temporal component in the multivariate spatial model y = Xβ + ε with covariance matrix Σ. Following the idea in Stroud et al. (2001), define βt = (β1t, . . . , βqt)′, with βkt = (β1k,t, . . . , βpk,t), as the full parameter vector at time t, for t = 1, . . . , T . Next, specify a probability model p(βt|βt−1) that links the parameters βt over time. Assuming a linear evolution for p(βt|βt−1), the multivariate space-time model is
yt = Xtβt + εt, εt ∼ MVN(0,Σt) (4.1)
βt = Gtβt−1 + ωt, ωt ∼ MVN(0,Wt), (4.2)
where yt = (yt1, . . . , ytp)′ is the np × 1 vector of the variables at the spatial locations, t = 1, . . . , T , with yti = (yti(s1), . . . , yti(sn))′, s1, . . . , sn ∈ D, i = 1, . . . , p; Xt is the np × pq time-dependent design matrix at time t; Gt is a known matrix; and εt and ωt are independent white noise sequences with mean zero and covariance matrices Σt and Wt, respectively.
Observe that equation (4.1) is an extension of the multivariate spatial model, while equation (4.2) defines the evolution over time of the regression parameters βt. Details and characteristics of the DLM can be seen in West and Harrison (1997).
In this section, we present two different models that incorporate the temporal structure. Recall that the covariance matrix Σt in (4.1) defines the multivariate spatial dependence structure, with its elements given by a parametric cross-covariance function based on equation (2.19). The two proposed models are described below.
Model 1 (M1): assume the model defined in (4.1) and (4.2) but do not consider temporal evolution
in the cross-covariance structure, that is, Σt = Σ. The temporal structure is only present in the
mean and the cross-covariance function is given by equation (2.19), that is,
Cij(‖h‖, δij) = σij (1 + δij + ‖h‖/φij)^(−α0) (1 + ‖h‖/φij)^(−1) (1 + δij)^(−1) , (4.3)
i, j = 1, . . . , p, where ‖h‖ is the Euclidean distance between locations, δij represents a latent distance between components i and j, σij is the co-located covariance between components, α0 ≥ 0 is a smoothness parameter and the φij are spatial range parameters. Note that the cross-covariance function in (4.3) is the same as in (2.19), but now we are not concerned with separability between components and space, so φij is not parameterized as a function of dij . The other parameters of the cross-covariance function keep the interpretations described in Section 2.3. However, since Σt = Σ, the spatial structure is repeated at all times t, t = 1, . . . , T , and thus separability is assumed for the temporal component.
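Assembling Σ from the cross-covariance function above is direct. A sketch under the stacking yt = (yt1, . . . , ytp)′, so that component blocks are contiguous (the function name and interface are ours; parameter validity, hence positive definiteness, is assumed rather than checked):

```python
import numpy as np

def build_sigma(coords, sigma, delta, phi, alpha0):
    """np x np covariance matrix whose (i, j) block of size n x n applies
    C_ij(h) of (4.3) to the pairwise Euclidean distances of `coords`.
    sigma, delta, phi are symmetric p x p arrays of sigma_ij, delta_ij,
    phi_ij, with delta_ii = 0 on the diagonal."""
    n, p = coords.shape[0], sigma.shape[0]
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    S = np.zeros((n * p, n * p))
    for i in range(p):
        for j in range(p):
            block = (sigma[i, j]
                     * (1.0 + delta[i, j] + D / phi[i, j]) ** (-alpha0)
                     * (1.0 + D / phi[i, j]) ** (-1.0)
                     / (1.0 + delta[i, j]))
            S[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    return S
```

With δii = 0, the diagonal blocks reduce to σii(1 + ‖h‖/φii)^(−(α0+1)), matching the marginal covariances of each component.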
Model 2 (M2): assume the model defined in (4.1) and (4.2) and consider the temporal cross-
covariance Σt having its elements given by
Cij,t(‖h‖, δij) = σij,t (1 + δij + ‖h‖/φij)^(−α0) (1 + ‖h‖/φij)^(−1) (1 + δij)^(−1) , (4.4)
i, j = 1, . . . , p, where ‖h‖ is the Euclidean distance between locations, δij represents a latent distance between components i and j, σij,t is the co-located covariance between components at time t, t = 1, . . . , T , α0 ≥ 0 is a smoothness parameter and the φij are spatial range parameters.
The models presented in equations (4.3) and (4.4) differ in the characterization of the covariance function. Model M2 considers a temporal evolution in the covariance parameters between components, while model M1 keeps the covariance structure static over time. It is reasonable to assume that the uncertainty of a component i, i = 1, . . . , p, varies over time, with atypical values across space. In fact, model M2 allows us to identify an increase in uncertainty related to atypical observations at time t, t = 1, . . . , T , whereas the static structure in (4.3) does not accommodate outliers and heterogeneity over time.
4.2.1 DLM completion and prior specification
Refer to the general DLM representation (4.1, 4.2) and let Dt = {yt, Dt−1} be the information set at time t, with D0 the initial information set. In the sequential Bayesian learning of dynamic models, the posterior distribution for the parameters at time t − 1, p(βt−1|Dt−1), must be updated via (4.2) in order to become the prior distribution at time t, p(βt|Dt−1). Combining this prior with the likelihood p(yt|βt) from (4.1), the predictive p(yt|Dt−1) and the posterior p(βt|Dt) are produced. The initial prior is β0|D0 ∼ MVN(m0, C0) with specified initial hyperparameters. The model for the sequence Wt is based on standard variance discounting using a discount factor δW , where values close to 1 are typically chosen to represent relative stability over time in the evolution of Wt. In the following simulations and application we assume δW = 0.9.
The forward filtering analysis sequences through t = 1, . . . , T, with the one-step forecast and posterior distributions following from the theory of general multivariate DLMs (West and Harrison, 1997). Once the forward filtering computations over times t = 1, . . . , T are complete, backward sampling generates a posterior draw of the full sequence of states from p(β_{1:T}|D_T). This operates in reverse time, using the general DLM theory that yields a computationally efficient algorithm for what can be a very high-dimensional set of states when T is large (Liu and West, 2009). This is the well-known forward filtering, backward sampling (FFBS) algorithm (see Frühwirth-Schnatter, 1994 and Carter and Kohn, 1994).
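The FFBS recursions can be sketched for a generic DLM with known observational variance V and evolution variance W. This is our minimal transcription of the standard Kalman-filter and backward-sampling formulas (West and Harrison, 1997), not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffbs(y, F, G, V, W, m0, C0):
    """Forward filtering, backward sampling for the DLM
    y_t = F beta_t + v_t, v_t ~ N(0, V);  beta_t = G beta_{t-1} + w_t, w_t ~ N(0, W).
    Returns one posterior draw of beta_{1:T} from p(beta_{1:T} | D_T)."""
    T, q = len(y), len(m0)
    m = np.zeros((T, q)); C = np.zeros((T, q, q))
    a = np.zeros((T, q)); R = np.zeros((T, q, q))
    m_prev, C_prev = np.asarray(m0, float), np.asarray(C0, float)
    for t in range(T):                       # forward filtering
        a[t] = G @ m_prev                    # prior mean
        R[t] = G @ C_prev @ G.T + W          # prior variance
        f = F @ a[t]                         # one-step forecast mean
        Q = F @ R[t] @ F.T + V               # one-step forecast variance
        A = R[t] @ F.T @ np.linalg.inv(Q)    # adaptive (Kalman) gain
        m[t] = a[t] + A @ (y[t] - f)         # posterior mean
        C[t] = R[t] - A @ Q @ A.T            # posterior variance
        m_prev, C_prev = m[t], C[t]
    beta = np.zeros((T, q))                  # backward sampling, in reverse time
    beta[T - 1] = rng.multivariate_normal(m[T - 1], C[T - 1])
    for t in range(T - 2, -1, -1):
        B = C[t] @ G.T @ np.linalg.inv(R[t + 1])
        h = m[t] + B @ (beta[t + 1] - a[t + 1])
        H = C[t] - B @ R[t + 1] @ B.T
        beta[t] = rng.multivariate_normal(h, H)
    return beta

# Toy run: univariate local-level model
y = np.array([[0.1], [0.3], [0.2], [0.4]])
draw = ffbs(y, F=np.eye(1), G=np.eye(1), V=0.5 * np.eye(1),
            W=0.1 * np.eye(1), m0=np.zeros(1), C0=np.eye(1))
```

The backward pass reuses the filtered moments m, C and the prior moments a, R, which is what makes a single joint draw of all T states cheap.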
The update of σij,t does not follow the sequential inference used for βt, since no conditional conjugacy is available. We therefore adopt a Metropolis-Hastings step. Following Section 2.4, set σij = ρij σi σj, i, j = 1, . . . , p, with σi ∈ ℝ⁺ and ρij the co-located correlation between components. In this chapter we assume the sign of ρij is known beforehand, so we only need to estimate σi ∈ ℝ⁺. In the temporal evolution context, set σij,t = σi,t σj,t, i, j = 1, . . . , p, with σi,t ∈ ℝ⁺, i = 1, . . . , p. Based on this constraint, the dynamic evolution for σi,t and σj,t is

log(σt) = log(σt−1) + ψt, ψt ∼ MVN(0, Ψt), (4.5)

with σt = [σ1,t, . . . , σp,t]′ and Ψt a p × p matrix.
Here we assume that Ψt = Ψ is a known matrix and that the initial prior is log(σ0)|D0 ∼ MVN(m*0, Ψ0). We choose a diagonal matrix with large values to represent our uncertainty about Ψ. The state log(σt) is updated via (4.5): a Metropolis-Hastings step is performed at every time t, t = 1, . . . , T, which can be prohibitive when T is large. The Bayesian inference for the other parameters of the cross-covariance function is described in Section 2.4. When σt = σ, as in model M1, the inference is made for log(σi), i = 1, . . . , p, with a Normal(ci, di) prior; the original scale is recovered by exponentiating.
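One such update can be sketched as a random-walk proposal on the log scale. The evaluation of the full conditional (the likelihood at time t plus the evolution densities from (4.5)) is abstracted into a user-supplied callable, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_update_log_sigma(log_sigma_t, log_post, step=0.1):
    """One random-walk Metropolis-Hastings update of log(sigma_t).

    log_sigma_t : current value, array of length p
    log_post    : callable returning the log of the full conditional of
                  log(sigma_t), supplied by the sampler (an assumption here)
    step        : random-walk standard deviation
    """
    proposal = log_sigma_t + step * rng.standard_normal(log_sigma_t.shape)
    log_ratio = log_post(proposal) - log_post(log_sigma_t)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True       # accept
    return log_sigma_t, False       # reject

# Toy check: with a standard normal target the chain stays near zero
lp = lambda x: -0.5 * np.sum(x ** 2)
x = np.zeros(2)
for _ in range(200):
    x, _ = mh_update_log_sigma(x, lp, step=0.5)
```

Because the proposal is symmetric, the acceptance ratio needs only the target densities; the cost driver in practice is re-evaluating the spatial likelihood at each time t.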
4.3 Simulated examples
In this section we present three artificial simulations to investigate the characteristics of the models described in equations (4.3) and (4.4). We analyze their predictive performance and evaluate their ability to identify atypical observations in space.
The datasets were generated from the model defined by equations (4.1) and (4.2), with p = 2
components, T = 30 time replications and n = 50 spatial locations in the [0, 1]× [0, 1] square. The
spatial locations used for fit and prediction are presented in Figure 4.1.
Figure 4.1: Spatial locations simulated in the [0, 1] × [0, 1] square. Red points: spatial locations used in estimation. Black points: spatial locations used in prediction.
The parameters of the cross-correlation function ρij(‖h‖, δij), Θ = (δ12, α0, φ11, φ12, φ22), were chosen such that the cross-correlation between components was moderate. The values of these parameters were the same for all simulations: δ12 = 1.5, α0 = 0, φ11 = 0.1, φ12 = 0.1 and φ22 = 0.3. For all analyses we consider ρij = +1. With this configuration, ρ12(0, δ12) = 0.4 and the spatial dependence in component two is greater than in component one. Figure 4.2 presents the cross-correlation in space.
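The stated value ρ12(0, δ12) = 0.4 can be checked numerically from the correlation form of the model. This sketch is ours, with the co-located covariance σij,t factored out:

```python
import numpy as np

def cross_corr(h, delta_ij, phi_ij, alpha0):
    """Cross-correlation rho_ij(||h||, delta_ij) implied by (4.4) once the
    co-located covariance sigma_ij,t is factored out."""
    h = np.asarray(h, dtype=float)
    return ((1.0 + delta_ij + h / phi_ij) ** (-alpha0)
            / (1.0 + h / phi_ij)
            / (1.0 + delta_ij))

# Simulation settings: delta_12 = 1.5, alpha_0 = 0, phi_12 = 0.1
print(cross_corr(0.0, delta_ij=1.5, phi_ij=0.1, alpha0=0.0))  # 0.4

# phi_22 = 0.3 > phi_11 = 0.1: component two is more strongly correlated in space
print(cross_corr(0.5, 0.0, 0.3, 0.0) > cross_corr(0.5, 0.0, 0.1, 0.0))  # True
```

With α0 = 0 the co-located cross-correlation reduces to (1 + δ12)⁻¹ = 1/2.5 = 0.4, which is the value quoted above.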
Figure 4.2: Cross-correlation function for components in space: simulations.
4.3.1 Simulation study one: constant variance over time
We generated a dataset from the model described in equations (4.1) and (4.2), with p = 2
components, T = 30 time replications and n = 50 spatial locations in the [0, 1]× [0, 1] square (see
Figure 4.1). The covariance function used is defined in (4.3), meaning the covariance structure is
static over time.
The dataset was generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, σ1 = 0.6 and σ2 = 0.8, ∀ t = 1, . . . , T.
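Under these specifications, generating the dataset amounts to building the pn × pn cross-covariance matrix and evolving the dynamic level. The sketch below is our reconstruction under those settings; helper names and the eigenvalue-clipping stabilization are ours:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulation study one settings
n, p, T = 50, 2, 30
sigma = np.array([0.6, 0.8])                  # sigma_1, sigma_2
phi = np.array([[0.1, 0.1], [0.1, 0.3]])      # phi_11, phi_12, phi_22
delta = np.array([[0.0, 1.5], [1.5, 0.0]])    # latent distances, delta_ii = 0
alpha0 = 0.0

S = rng.uniform(0.0, 1.0, size=(n, 2))        # locations in the unit square
D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)

# Full pn x pn cross-covariance matrix with static co-located covariance
Sigma = np.zeros((p * n, p * n))
for i in range(p):
    for j in range(p):
        Sigma[i * n:(i + 1) * n, j * n:(j + 1) * n] = (
            sigma[i] * sigma[j]
            * (1.0 + delta[i, j] + D / phi[i, j]) ** (-alpha0)
            / (1.0 + D / phi[i, j]) / (1.0 + delta[i, j]))

# Matrix square root of Sigma, clipping tiny negative eigenvalues for stability
evals, evecs = np.linalg.eigh(Sigma)
L = evecs * np.sqrt(np.clip(evals, 0.0, None))

# Dynamic level: beta_t = beta_{t-1} + w_t, w_t ~ N(0, 0.008 I_2); X_t = blockdiag(1, 1)
beta = np.zeros(p)
Y = np.zeros((T, p * n))
for t in range(T):
    beta = beta + np.sqrt(0.008) * rng.standard_normal(p)
    Y[t] = np.repeat(beta, n) + L @ rng.standard_normal(p * n)
```

Each row of Y stacks the n observations of component one followed by those of component two at one time point, matching the block structure of Sigma.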
In this simulation we evaluate the dataset generated with the specifications above and estimate models M1 and M2. Our aim is to verify whether M2 recovers the covariance structure at each time t, by analyzing the estimated σt and its predictive performance. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
Figure 4.3 presents the posterior median and 95% credibility interval of σi,t for M1 and M2
models. Note that model M2 is able to recover the static structure of the parameters σ1 and σ2.
Figure 4.3: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation one.
Figures 4.4 and 4.5 present predictive measures for model comparison at each time t, t = 1, . . . , T. The IS (Interval Score) and LPS (Log Predictive Score) are predictive scoring rules that summarize the accuracy of probabilistic predictions: the smaller the IS or LPS, the better the model in predictive terms. For more details about scoring rules see Gneiting and Raftery (2007) and Appendix A.3. Since the covariance structure is recovered by M2, no difference in predictive performance between models M1 and M2 is expected. Indeed, Figures 4.4 and 4.5 show that the predictive comparison measures are almost identical under both models.
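For reference, the two scores can be computed as follows. This is our sketch of the standard definitions in Gneiting and Raftery (2007); the exact per-time aggregation used in the thesis may differ:

```python
import numpy as np

def interval_score(lower, upper, y, alpha=0.05):
    """Interval Score for central (1 - alpha) prediction intervals
    (Gneiting and Raftery, 2007); smaller is better.  An observation
    outside [lower, upper] is penalized by (2/alpha) times the miss."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    penalty_low = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    penalty_high = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return np.mean((upper - lower) + penalty_low + penalty_high)

def log_predictive_score(logpdf_vals):
    """LPS: negative mean log predictive density of held-out data,
    where logpdf_vals are log p(y_new | D); smaller is better."""
    return -np.mean(np.asarray(logpdf_vals))

# A covered observation costs only the interval width:
print(interval_score(-1.96, 1.96, 0.0, alpha=0.05))  # 3.92
```

The width term rewards sharp intervals, while the penalty terms reward calibration, so the score trades the two off in a single number.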
Figure 4.4: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation one.
Figure 4.5: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation one.
Using the same generated dataset, we selected four random locations at four different times (t = 11, 15, 20, 25) to analyze the performance of the model in identifying outliers. Observations of the first component were contaminated at those times and locations by adding random increments σ1 u, with u ∼ Uniform(5, 6).
Model M2 is capable of increasing the uncertainty of component one at the times of contamination, that is, σ1,t is inflated at t = 11, 15, 20, 25, while M1, which estimates a single σi, i = 1, 2, for all times t, t = 1, . . . , 30, overestimates σ1. Model M2 seems more appropriate, since it is able to identify the times at which aberrant values exist. In terms of spatial predictive performance, however, model M2 does not differ from model M1, because the σi,t parameters do not vary in space. To identify spatial outliers we would need to work with σi,t(s). The output of this simulation has been omitted.
4.3.2 Simulation study two: time-varying variance
In this simulation we follow the same specifications described in simulation one, but now we consider the covariance function defined in (4.4), which allows a dynamic evolution.
The dataset is generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, and Ψ_t = 0.25\, I_2, ∀ t = 1, . . . , T.
Our aim is to compare the predictive performance of the M1 and M2 models when the dataset has time-varying variance. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
We expected model M2 to give good results since the empirical variances seem to change in
time in both components as shown in Figure 4.6.
Figure 4.6: Empirical variance over time for each component: simulation two.
Figure 4.7 presents the posterior median and 95% credibility interval of σi,t under M1 and M2
models. Model M2 recovers the temporal structure of σt in both components, while model M1 estimates a single value that appears to be an average over time. For the second component, the σ2 estimated by the M1 model is very different from the true value at the earliest and latest times (Figure 4.7(a)). The model without dynamic evolution is not flexible enough to identify the heterogeneity of the observations over time.
Figure 4.7: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation two.
Figures 4.8 and 4.9 present predictive measures (IS and LPS) for model comparison at each time t, t = 1, . . . , T. As expected, model M2 attains better predictive performance than model M1 whenever there is a large difference between the true value of σt and the value estimated under M1.
Figure 4.8: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation two.
Figure 4.9: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation two.
4.3.3 Simulation study three: change of regime
This simulation considers the cross-covariance function defined in (4.4). The dataset is generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, Ψ_t = 0.1\, I_2 for t = 1, . . . , 14, 16, . . . , T, and Ψ_{15} = \mathrm{diag}(10, 0.1).
Unlike simulation study two, here we consider a dynamic evolution for σi,t, i = 1, . . . , p, t =
1, . . . , T , but we increase the uncertainty at t = 15 by changing the evolution level of σ1,t. This
example considers a regime change in uncertainty at component one at time t = 15. Figure 4.10
presents the empirical variance at each component. Note that the empirical variance in component
one is at a different level at times t ≥ 15.
In this simulation, we expect model M2 to recover the structure of component one and, consequently, to present better predictive performance. In fact, we expect the M1 model to be unable to adequately represent the uncertainty associated with component one. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
Figure 4.10: Empirical variance over time for each component: simulation three.
Figure 4.11 presents the posterior median and 95% credibility interval of σi,t for M1 and M2
models.
Model M2 is able to recover the simulated variance structure (see Figure 4.11(b)). As expected, model M1 does not recover the variance structure, especially that of component one. Note that the estimated value of σ1 under M1 (Figure 4.11(a)) is far from the true value when t < 15. Thus model M1 overestimates the uncertainty when t < 15 and underestimates it (though not by the same magnitude) when t ≥ 15.
Figure 4.11: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation three.
Figures 4.13 and 4.14 present predictive measures (IS and LPS, described in Appendix A.3) for model comparison at each time t, t = 1, . . . , T. Model M2 provides very good predictions compared to model M1; especially in component one, the uncertainties are predicted much more adequately. This can be seen in Figure 4.12, which presents the posterior median and 95% credible interval for component one versus the observed values at times t = 7 and t = 23 for all out-of-sample stations. When the true uncertainty is lower (t = 7), the M1 model presents very large intervals, since it overestimates the uncertainty. At time t = 23, the M1 model has smaller intervals than the M2 model, because it is unable to represent the uncertainty properly. This suggests that uncertainty is well modeled under M2, providing better predictive performance, as seen in Figures 4.13 and 4.14.
Figure 4.12: Posterior median and 95% credible interval for component one versus the observed values at times t = 7 (panel a) and t = 23 (panel b) for all out-of-sample stations: simulation three.
Figure 4.13: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation three.
Figure 4.14: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation three.
4.4 California dataset
Many air pollutants, emitted from a variety of sources, affect the health of people around the world every day. In general, pollutants can cause serious effects throughout life. According to the World Health Organization (WHO), one third of deaths from stroke, lung cancer and heart disease are due to air pollution.
Statistical methods for space-time estimation and forecasting of pollutants are widely applied. In these processes, atypical observations are usually observed, and the usual Gaussian models are not able to represent these phenomena satisfactorily.
Air pollution is a problem in California. According to the California Air Resources Board (CARB), over 90 percent of Californians breathe unhealthy levels of one or more air pollutants during some part of the year. In this context, the spatial dependence behavior and the time trajectories of the pollutant levels of PM2.5 and NO2 are studied in this section. These pollutants cause various effects on the health of individuals, such as premature death, cardiovascular disease, respiratory disease, asthma, lung irritation and headache.
The pollution dataset was obtained from the United States Environmental Protection Agency (EPA). We evaluated the daily mean PM2.5 concentration (µg/m³) and the daily maximum 1-hour NO2 concentration (ppb), measured at 21 stations (Figure 4.15(a)) from January 1, 2017 to February 9, 2017, totaling 40 moments in time.
Locations with less than 10% missing values went through an imputation process (Figure 4.15(b)). For predictive comparison and validation, we hold out two spatial locations. Figure 4.15(c) shows the locations of the 21 monitoring sites and the two hold-out sites on a latitude-longitude scale.
Figure 4.15: Collection of monitoring stations in California state, USA. (a) Collection of 21 monitoring stations. (b) Locations that have gone through an imputation process. (c) Collection of 21 monitoring stations, where numbers 1 and 2 are the spatial locations considered for predictive comparison.
Figures 4.16 and 4.17 show the empirical variance across space and time for each component, respectively. Notice in Figure 4.17 how volatile the PM2.5 series is: the PM2.5 empirical variance changes a lot over time, sometimes by 5 units in just a couple of days. (The imputation was performed using the mice package in R.)
Figure 4.16: Empirical variance for NO2 and PM2.5 across space: California dataset.
Figure 4.17: Empirical variance for NO2 and PM2.5 across time: California dataset.
In order to illustrate the cross-correlation between the variables across time, we estimate the
parameters in the model y = µ+ ε for each time t, t = 1, . . . , 40, that is,
y = (y1,y2)′ ∼ MVN(µ,G), (4.6)
with G an arbitrary full cross-covariance matrix.
From Figure 4.18, note that there is a positive cross-correlation between NO2 and PM2.5 at all times except t = 17. We therefore restrict the analysis by considering positive dependence between the components at all times: the models used assume that the variables are positively dependent for all t, t = 1, . . . , 40.
Figure 4.18: Posterior median and 95% CI of the cross-correlation among components considering an independent multivariate normal distribution for each time: California dataset.
We evaluated the behavior of the posterior spatial correlation for each variable (NO2 and PM2.5) using a univariate dynamic spatial model, whose covariance function is defined by C(‖h‖) = σ²ρ(‖h‖), with σ² the constant variance, ρ(·) a valid spatial correlation function and ‖h‖ the Euclidean distance. We consider a spatial correlation function in the Cauchy family. These models were used only for this evaluation and are presented in Appendix A.1 (for more details see Banerjee et al., 2015). The prior distributions for the covariance parameters are presented in Appendix A.2, and for the mean parameters we use FFBS as described in Section 4.2.1. Figure 4.19 shows the posterior spatial correlation for each component. Note that the variables behave differently across space.
Figure 4.19: Posterior median (full line) and 95% credible interval (dashed lines) of the spatial correlation of the univariate dynamic spatial model for each component: California dataset.
We compared the predictive performance of the M1 and M2 models, described in (4.3) and (4.4), respectively. Parameter estimation was performed considering the likelihood described in (2.21). We consider dynamic effects on the level and covariates (latitude and longitude) and assume a discount factor δ_W = 0.9.
The prior distributions for the parameters in the proposed model follow the discussion in Sec-
tions 4.2.1 and 2.4 and their hyperparameters are presented in Appendix A.2. We assume that
ρij = +1. MCMC methods were used to generate posterior and predictive samples. For MCMC
convergence monitoring we used the algorithms in Coda package for R.
The cross-correlation functions are similar in both models: the posterior densities of the ρij(h, δij) parameters are not significantly different (see Figure 4.20). The posterior mean and 95% credibility interval for the uncertainty parameters, σi for M1 and σi,t for M2, i = 1, 2, t = 1, . . . , 40, are shown in Figure 4.21. Note that M2 seems to recover the behavior of the variance parameters of both components. The difference between the models is explicit when PM2.5 is evaluated: there are moments in time when the variability is large and the M1 model is not able to estimate the uncertainty satisfactorily. The difference between the models is confirmed by Figures 4.22 and 4.23, which present the predictive comparison measures. Note that at the moments in time when M1 does not recover the uncertainty associated with PM2.5, its predictive performance is worse than that of the M2 model.
Figure 4.20: Posterior densities of the cross-correlation parameters (φ11, φ12, φ22, δ12, α0) for each model: California dataset.
Figure 4.21: Posterior median and 95% credibility interval of σ_NO2 and σ_PM2.5 for the M1 and M2 models at each time t, t = 1, . . . , 40: California dataset.
Figure 4.22: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 40: California dataset.
Figure 4.23: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 40: California dataset.
4.5 Discussion
In this chapter we have introduced the space-time extension of the cross-covariance function presented in (2.19), using dynamic models to represent temporal evolution. We allowed time-varying co-located covariance, accommodating atypical observations in a multivariate space-time nonseparable structure.
In simulation study one, the model with temporal evolution in covariance coefficients recovered the structure with constant variance. In simulation studies two and three, the model that
assumes constant variance did not recover the uncertainty structure and presented worse predictive
performance than the model that assumes time-varying variance. The difference between the tem-
poral evolution model and constant variance model is evident in simulation three, which considers
a regime change in the uncertainty associated with component one.
From the application to California pollutant data, we noted that the model allowing temporal evolution in the covariance coefficients is able to recover the uncertainty associated with the components and presents better predictive performance than the model that assumes constant variance over time.
So far the estimation procedure is based on the assumption that the sign of the co-located correlation ρij is known a priori. In both the static and the dynamic model, ρij is fixed at all times t, t = 1, . . . , T, and this cross-correlation does not appear to change over time (see, for example, the application to California pollutants, Figure 4.18). Not fixing the sign of the co-located correlation would imply a bimodal posterior distribution for σi,t.
The parameters σij,t could be directly estimated without the restriction σij,t = σi,tσj,t imposed
so far. However, this implies that correlation will be allowed to change over time and identifiability
issues will probably occur in this scenario. With our restriction we ensure that correlation between
components is fixed over time and only variances are allowed to vary. Our simulated examples and
real data analysis indicate that this model represents well scenarios of outliers, heterogeneity and
change of regimes. Investigation of more general solutions to this problem is still under development.
Chapter 5
Conclusions
In this thesis we have extended the class of nonseparable covariance functions proposed in
Fonseca and Steel (2011) to the modeling of component and spatial dependence, considering latent
distance between components as in Apanasovich and Genton (2010).
The general class is flexible enough to generate a nonseparable covariance structure and allows the specification of different structures for space and components, as well as different spatial ranges associated with each component. The separable model (Mardia and Goodall, 1993) is a special case. We have proposed a Bayesian test to measure the degree of separability between space and components. The posterior probability p0 has shown to be an easily interpretable measure of the separability of the component-spatial covariance structure, since it does not depend on the scale of the ranges or on the smoothness.
From the simulations and the meteorological data example, it is clear that flexible structures are needed: structures able to accommodate different spatial ranges for a vector of spatial processes. The presented model was able to recover simple structures and provided better predictive performance than models widely used in the literature, such as the separable model and the linear model of coregionalization.
To address the computational limitation, we investigated the use of separable approximations of the full covariance matrix, which lead to fast computation of the likelihood function. Following Van Loan and Pitsianis (1993), we approximated the full covariance matrices using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations were applied to the likelihood function to obtain fast estimation of the parameters while keeping the interpretation and flexibility of the multivariate nonseparable model. From simulations, we observed that the approximation leads to predictive performance very similar to that of the original full nonseparable model.
We have introduced the spatio-temporal setting of the multivariate cross-covariance proposed in Chapter 2, allowing time-varying covariance coefficients to accommodate atypical observations. The spatio-temporal multivariate models were constructed based on the dynamic approach. From simulations and the application to California pollutant data, we observed that the model with dynamic evolution in the covariance coefficients presented better predictive performance and was able to recover the variance structure of the components. The limitation of the models used in Chapter 4 is the constraint used in the estimation of the co-located correlations: with our restriction we ensured that the correlation between components was fixed over time and only the variances were allowed to vary. This was necessary due to the bimodality of the posterior distributions of these parameters. Investigation of more general solutions to this identification issue is a topic of future research.
A Appendix A
A.1 Covariance functions
The univariate covariance function used in Section 2.7 and Section 4.4 is given by

C(\|h\|) = \sigma^2 \left(1 + \frac{\|h\|}{\phi}\right)^{-1},

with ‖h‖ the Euclidean distance, σ² the variance of the variable and φ the spatial range.
The separable multivariate covariance function used in Sections 2.6.2 and 2.7 is given by

C_{ij}(\|h\|) = a_{ij} \left(1 + \frac{\|h\|}{\phi}\right)^{-1}, \quad i, j = 1, \dots, p,

with ‖h‖ the Euclidean distance, a_{ij} the covariances of the components and φ the spatial range.
The linear model of coregionalization used in Section 2.7 has the following covariance function:

C_{ij}(\|h\|) = \sum_{k=1}^{p} b_{ik} b_{jk} \left(1 + \frac{\|h\|}{\phi_k}\right)^{-1},

with ‖h‖ the Euclidean distance, B = (b_{ij}) a p × p full-rank matrix and φ_k the spatial range for the k-th component.
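These three covariance functions can be transcribed directly; this is a sketch of ours, with illustrative function names:

```python
import numpy as np

def cauchy_cov(h, sigma2, phi):
    """Univariate covariance C(||h||) = sigma^2 (1 + ||h||/phi)^(-1)."""
    return sigma2 / (1.0 + np.asarray(h, dtype=float) / phi)

def separable_cov(h, a_ij, phi):
    """Separable multivariate covariance C_ij(||h||) = a_ij (1 + ||h||/phi)^(-1):
    one common spatial range phi for all components."""
    return a_ij / (1.0 + np.asarray(h, dtype=float) / phi)

def lmc_cov(h, i, j, B, phi):
    """Linear model of coregionalization:
    C_ij(||h||) = sum_k b_ik b_jk (1 + ||h||/phi_k)^(-1),
    with B full rank and phi_k the range of the k-th latent process."""
    h = np.asarray(h, dtype=float)
    return sum(B[i, k] * B[j, k] / (1.0 + h / phi[k]) for k in range(B.shape[1]))

# At h = 0 the LMC co-located covariance is (B B')_{ij}:
print(lmc_cov(0.0, 0, 0, np.eye(2), [0.1, 0.2]))  # 1.0
```

The contrast between the three is visible in the arguments: the separable model shares one range φ across components, while the LMC mixes p latent ranges through the coefficient matrix B.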
A.2 Prior distributions
The hyperparameters of the prior distributions described in Section 2.4 for the simulation exercises of Sections 2.5 and 2.6.2, the application to the weather dataset in Section 2.7, the simulation studies of Section 4.3 and the application to California pollutants in Section 4.4 are presented below.
1. Prior distributions considered for all scenarios in simulation of Section 2.5 follows: β ∼
MVN(0, 1000I8), δ12 ∼ Gamma(1, 0.5) and σi ∼ Normal(0, 100), i = 1, 2. We adopted the
mixture prior for α0 and dij , i = 1, . . . , p, j = 2, . . . , p, defined in (2.22). We assume a point
mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2) for dij ∈ (−φ11,∞),
i = 1, . . . , p, j = 2, . . . , p. For φ11 we consider a gamma distribution, as in Section 2.4, with
shape = 0.75×med(δs) and scale = 0.75, with med(δs) = 0.5175.
2. Prior distributions considered for each dataset in simulation of Section 2.6.2 follows:
Dataset 1:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 12) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
Dataset 2:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
Dataset 3:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
3. The prior distributions for the models fitted to the weather dataset in Section 2.7 are given by:
• SEP: β ∼ MVN(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.25 ×
med(δs), 0.25), with med(δs) = 1.958;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3; then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 1.958;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model considers p0 = 0; then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.25×med(δs), 0.25), med(δs) = 1.958;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij, i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 1.958;
• LMC: β ∼ MVN(0, 1000I12), bii ∼ InverseGamma(0.001, 0.001), i = 1, . . . , 3, bij ∼
Normal(0, 100), i ≠ j, i, j = 1, . . . , 3, φi ∼ Gamma(0.75 × med(δs), 0.75), i = 1, . . . , 3,
with med(δs) = 1.958;
• For each univariate model: β ∼ MVN(0, 1000I4), θ ∼ Gamma(1, 0.25), with θ = 1/σ2,
and φ ∼ Gamma(0.1×med(δs), 0.1), with med(δs) = 1.958.
4. The prior distributions considered for all simulation studies of Section 4.3 are as follows:
• M1: β0|D0 ∼ MVN(0, 1000I6), log(σi) ∼ Normal(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5).
This model sets a mixture prior for α0 given by a point mass at zero, a Gamma(1, 3) for
α0 > 0, and φij ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 0.501;
• M2: β0|D0 ∼ MVN(0, 1000I6), log(σ0)|D0 ∼ MVN(0, 100I2), Ψt = Ψ = 100I2 for
all t, δ12 ∼ Gamma(1, 0.5). This model sets a mixture prior for α0 given by a point
mass at zero, a Gamma(1, 3) for α0 > 0, and φij ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 0.501;
5. The prior distributions for the models fitted to the California pollutants dataset in Section
4.4 are given by:
• M1: β0|D0 ∼ MVN(0, 1000I6), log(σi) ∼ Normal(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5).
This model sets a mixture prior for α0 given by a point mass at zero, a Gamma(1, 3) for
α0 > 0, and φij ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 3.092;
• M2: β0|D0 ∼ MVN(0, 1000I6), log(σ0)|D0 ∼ MVN(0, 100I2), Ψt = Ψ = 100I2 for
all t, δ12 ∼ Gamma(1, 0.5). This model sets a mixture prior for α0 given by a point
mass at zero, a Gamma(1, 3) for α0 > 0, and φij ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 3.092;
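Several of the specifications above place a spike-and-slab prior on α0: a point mass at zero mixed with a Gamma(1, 3) on α0 > 0. The sketch below, which is illustrative and not from the thesis, draws from such a prior; it assumes the rate parameterization of the Gamma (so the slab has mean 1/3) and treats the point-mass weight `p0` as a fixed input, since the text specifies only the two mixture components.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_alpha0(p0, size, rng=rng):
    """Draw from a mixture prior for alpha_0: a point mass at zero with
    probability p0, and a Gamma(1, 3) on alpha_0 > 0 otherwise.
    Gamma(shape, rate) parameterization is assumed, so scale = 1/rate."""
    at_zero = rng.random(size) < p0          # indicator of the spike component
    draws = rng.gamma(shape=1.0, scale=1.0 / 3.0, size=size)  # slab draws
    draws[at_zero] = 0.0                     # overwrite with the point mass
    return draws

samples = sample_alpha0(p0=0.5, size=10_000)
print((samples == 0).mean())        # close to the assumed prior mass p0 = 0.5
print(samples[samples > 0].mean())  # close to the Gamma(1, 3) mean of 1/3
```

The fraction of exact zeros in the posterior draws of α0 is then directly interpretable as evidence for the separable (α0 = 0) submodel.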
A.3 Model comparison predictive measures
In what follows we present the measures considered for model comparison in the illustrations of
our proposal. Further details can be found in Gneiting and Raftery (2007).
1. The Interval Score (IS) is given by
ISα(l, u; x) = (u − l) + (2/α)(l − x) I[x < l] + (2/α)(x − u) I[x > u], (1)
where l and u denote the forecaster's quoted α/2 and 1 − α/2 quantiles. According to Gneiting
and Raftery (2007), the forecaster is rewarded for narrow prediction intervals and incurs a
penalty, whose size depends on α, if the observation misses the interval.
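The interval score can be computed directly from its definition; the vectorized helper below is a minimal sketch, not code from the thesis.

```python
import numpy as np

def interval_score(l, u, x, alpha):
    """Interval score of Gneiting and Raftery (2007): the interval width
    plus a penalty of 2/alpha times the distance by which the observation
    x falls outside the (l, u) prediction interval."""
    l, u, x = np.asarray(l), np.asarray(u), np.asarray(x)
    return (u - l) + (2.0 / alpha) * (l - x) * (x < l) \
                   + (2.0 / alpha) * (x - u) * (x > u)

# A 95% interval (alpha = 0.05): a covered observation scores just the width,
# while a miss adds 2/alpha = 40 times the overshoot.
print(round(float(interval_score(l=-1.96, u=1.96, x=0.0, alpha=0.05)), 2))  # 3.92
print(round(float(interval_score(l=-1.96, u=1.96, x=3.0, alpha=0.05)), 2))  # 45.52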
2. The Width of the Credibility Interval (WCI) is defined as the width term (u − l) of the IS function in (1).
3. The Log Predictive Score (LPS) is based on the predictive distribution q and on the observed
value x:
LPS(q; x) = −log q(x). (2)
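As a sketch of (2), the LPS is simply the negative log predictive density evaluated at the observation; the standard-normal predictive density below is illustrative only, not a quantity from the thesis.

```python
import math

def log_predictive_score(logpdf, x):
    """LPS(q; x) = -log q(x): the negative log predictive density at the
    observed value x. Lower scores indicate better predictive fit."""
    return -logpdf(x)

# Illustrative q: a standard normal predictive distribution.
def std_normal_logpdf(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

# An observation in the center of q scores lower (better) than one in the tail.
print(round(log_predictive_score(std_normal_logpdf, 0.0), 4))  # 0.9189
print(round(log_predictive_score(std_normal_logpdf, 2.0), 4))  # 2.9189
```

Averaging this score over held-out observations gives the aggregate measure used to rank the competing covariance specifications.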
Bibliography
Apanasovich, T. V. and Genton, M. G. (2010). “Cross-covariance functions for multivariate random
fields based on latent dimensions.” Biometrika, 97, 1, 15–30.
Apanasovich, T. V., Genton, M. G., and Sun, Y. (2012). “A Valid Matérn Class of Cross-Covariance
Functions for Multivariate Random Fields With Any Number of Components.” Journal of the
American Statistical Association, 107, 497, 180–193.
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for
Spatial Data. Monographs on Statistics and Applied Probability, 1st ed. Chapman & Hall/CRC.
— (2015). Hierarchical Modeling and Analysis for Spatial Data. Monographs on Statistics and
Applied Probability, 2nd ed. Chapman & Hall/CRC.
Berliner, L. M. (1996). Hierarchical Bayesian Time Series Models, vol. 79, 15–22. Springer.
Berrocal, V. J., Gelfand, A. E., and Holland, D. M. (2010). “A bivariate space-time downscaler
under space and time misalignment.” The Annals of Applied Statistics, 4, 4, 1942–1975.
Bourotte, M., Allard, D., and Porcu, E. (2016). “A flexible class of non-separable cross-covariance
functions for multivariate space-time data.” Spatial Statistics, 18, 125–146.
Carter, C. K. and Kohn, R. (1994). “On Gibbs sampling for state space models.” Biometrika, 81,
541–553.
Choi, J., Fuentes, M., Reich, B. J., and Davis, J. M. (2009). “Multivariate spatial-temporal mod-
eling and prediction of speciated fine particles.” J. Stat. Theory Pract., 2, 3, 407–418.
Cox, T. F. and Cox, M. A. A. (2000). Multidimensional Scaling. Chapman & Hall/CRC.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Cressie, N. and Huang, H.-C. (1999). “Classes of Nonseparable, Spatio-Temporal Stationary Co-
variance Functions.” Journal of the American Statistical Association, 94, 448, 1330–1340.
Cressie, N. and Zammit-Mangion, A. (2016). “Multivariate spatial covariance models: a conditional
approach.” Biometrika, 103, 4, 915–935.
Daley, D. J., Porcu, E., and Bevilacqua, M. (2014). “Classes of compactly supported covariance
functions for multivariate random fields.” Stochastic Environmental Research and Risk Assessment, 29, 4, 1249–1263.
DeGroot, M. H. and Schervish, M. J. (2011). Probability and Statistics. 4th ed. Pearson.
Fonseca, T. C. O. and Steel, M. F. J. (2011). “A General Class of Nonseparable Space-time
Covariance Models.” Environmetrics, 22, 2, 224–242.
— (2017). “Measuring Separability in Spatio-temporal Covariance Functions.” Tech. Rep. 290,
Department of Statistics, Federal University of Rio de Janeiro.
Frühwirth-Schnatter, S. (1994). “Data augmentation and dynamic linear models.” Journal of Time
Series Analysis, 15, 2, 183–202.
Fuentes, M. (2006). “Testing for Separability of Spatial-Temporal Covariance Functions.” Journal
of Statistical Planning and Inference, 136, 2, 447–466.
Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for
Bayesian Inference. 2nd ed. CRC Press.
Gaspari, G. and Cohn, S. E. (1999). “Construction of correlation functions in two and three
dimensions.” Q.J.R. Meteorol. Soc., 125, 723–757.
Gelfand, A. E., Banerjee, S., and Gamerman, D. (2005). “Spatial Process Modelling for Univariate
and Multivariate Dynamic Spatial Data.” Environmetrics, 16, 5, 465–479.
Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). “Nonstationary multi-
variate process modeling through spatially varying coregionalization.” Test, 13, 2, 263–312.
Genton, M. (2007). “Separable Approximations Of Space-time Covariance Matrices.” Environ-
metrics, 18, 681–695.
Genton, M. G. and Kleiber, W. (2015). “Cross-Covariance Functions for Multivariate Geostatis-
tics.” Statistical Science, 30, 3, 147–163.
Gneiting, T. (2002). “Nonseparable, Stationary Covariance Functions for Space-Time Data.” Jour-
nal of the American Statistical Association, 97, 458, 590–600.
Gneiting, T., Genton, M. G., and Guttorp, P. (2007). “Geostatistical space-time models, sta-
tionarity, separability and full symmetry.” In Statistical Methods for Spatio-Temporal Systems,
151–175. Chapman and Hall.
Gneiting, T., Kleiber, W., and Schlather, M. (2010). “Matérn Cross-Covariance Functions for
Multivariate Random Fields.” Journal of the American Statistical Association, 105, 491, 1167–
1177.
Gneiting, T. and Raftery, A. E. (2007). “Strictly proper scoring rules, prediction and estimation.”
Journal of the American Statistical Association, 102, 477, 360–378.
Gneiting, T. and Schlather, M. (2004). “Stochastic Models That Separate Fractal Dimension and
the Hurst Effect.” SIAM Review, 46, 2, 269–282.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. The Johns Hopkins University
Press.
Goulard, M. and Voltz, M. (1992). “Linear coregionalization model: tools for estimation and choice
of cross-variogram matrix.” Mathematical Geology, 24, 3, 269–286.
Harrison, P. and Stevens, C. F. (1976). “Bayesian Forecasting.” Journal of the Royal Statistical
Society Series B, 38, 3, 205–247.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). “Bayesian Model Averaging:
A Tutorial.” Statistical Science, 14, 4, 382–417.
Iaco, S. D., Myers, D. E., Palma, M., and Posa, D. (2013). “Using Simultaneous Diagonalization
to Identify a Space–Time Linear Coregionalization Model.” Math. Geosci., 45, 69–86.
Ip, H. L. and Li, W. K. (2016). “Matérn cross-covariance functions for bivariate spatio-temporal
random fields.” Spatial Statistics, 17, 22–37.
Kass, R. and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association,
90, 430, 773–795.
Li, B., Genton, M. G., and Sherman, M. (2007). “A Nonparametric Assessment of Properties of
Space-Time Covariance Functions.” Journal of the American Statistical Association, 102, 478,
736–744.
— (2008). “Testing the covariance structure of multivariate random fields.” Biometrika, 95, 4, 813–829.
Liu, F. and West, M. (2009). “A Dynamic Modelling Strategy for Bayesian Computer Model
Emulation.” Bayesian Analysis, 4, 2, 393–412.
Ma, C. (2002). “Spatio-Temporal Covariance Functions Generated by Mixtures.” Mathematical
Geology, 34, 8, 965–975.
— (2003). “Spatio-Temporal Stationary Covariance Models.” Journal of Multivariate Analysis, 86,
1, 97–107.
Majumdar, A. and Gelfand, A. E. (2007). “Multivariate Spatial Modeling for Geostatistical Data
Using Convolved Covariance Functions.” Mathematical Geology, 39, 7, 225–245.
Mardia, K. V. and Goodall, C. R. (1993). Spatial-temporal analysis of multivariate environmental
monitoring data, 347–386. Elsevier Sci., New York.
Mitchell, M. W., Genton, M. G., and Gumpertz, M. L. (2005). “Testing for separability of space-
time covariances.” Environmetrics, 16, 819–831.
— (2006). “A likelihood ratio test for separability of covariances.” Journal of Multivariate Analysis,
97, 1025–1043.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R. Use R, 1st ed.
Springer-Verlag New York.
Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). “CODA: Convergence Diagnosis and
Output Analysis for MCMC.” R News, 6, 7–11.
Porcu, E., Bevilacqua, M., and Genton, M. G. (2016). “Spatio-Temporal Covariance and Cross-
Covariance Functions of the Great Circle Distance on a Sphere.” Journal of the American
Statistical Association, 111, 514, 888–898.
Porcu, E., Mateu, J., and Bevilacqua, M. (2007). “Covariance functions that are stationary or
nonstationary in space and stationary in time.” Statistica Neerlandica, 61, 3, 358–382.
Porcu, E. and Zastavnyi, V. (2011). “Characterization theorems for some classes of covariance
functions associated to vector valued random fields.” Journal of Multivariate Analysis, 102,
1293–1301.
Reich, B. and Fuentes, M. (2007). “A multivariate semiparametric Bayesian spatial modelling
framework for hurricane surface wind fields.” Annals of Applied Statistics, 1, 249–264.
Reis, E. A., Gamerman, D., Paez, M. S., and Martins, T. G. (2013). “Bayesian dynamic models
for space–time point processes.” Computational Statistics and Data Analysis, 60, 146–156.
Robert, C. P. (1994). The Bayesian choice: a decision-theoretic motivation. 1st ed. Springer.
Rouhani, S. and Wackernagel, H. (1990). “Multivariate Geostatistical Approach to Space-Time
Data Analysis.” Water Resources Research, 26, 4, 585–591.
Schervish, M. J. (1995). Theory of Statistics. 1st ed. Springer.
Schmidt, A. M. and Gelfand, A. E. (2003). “A Bayesian Coregionalization Approach for Multivari-
ate Pollutant Data.” Journal of Geophysical Research-Atmospheres, 108, D24.
Stein, M. L. (2005). “Space-Time Covariance Functions.” Journal of the American Statistical
Association, 100, 469, 310–321.
Stroud, J., Muller, P., and Sanso, B. (2001). “Dynamic models for spatio-temporal data.” Journal
of the Royal Statistical Society Series B, 63, 673–689.
Van Loan, C. F. and Pitsianis, N. (1993). Approximation with Kronecker Products, vol. 232, 293–
314. Springer.
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications. 3rd ed.
Springer-Verlag Berlin Heidelberg.
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer Series in
Statistics, 2nd ed. Springer.