Flexible covariance modeling of multivariate
spatio-temporal processes
Rafael Santos Erbisti
Universidade Federal do Rio de Janeiro
Instituto de Matemática
2019
Flexible covariance modeling of multivariate spatio-temporal
processes
Rafael Santos Erbisti
Thesis submitted to the Graduate Program in Statistics of the Institute of Mathematics, Federal
University of Rio de Janeiro, in partial fulfillment of the requirements for the degree of Doctor in Statistics.
Approved by:
Thaís C. O. da Fonseca (Advisor)
DME/IM/UFRJ.
Dani Gamerman
DME/IM/UFRJ.
Marina Silva Paez
DME/IM/UFRJ.
Flávio Bambirra Gonçalves
DEST/ICEx/UFMG.
Gustavo da Silva Ferreira
ENCE/IBGE.
Rio de Janeiro, RJ - Brazil
2019
CIP - Cataloging in Publication
Generated by UFRJ's automatic cataloging system from data provided by the author, under the responsibility of Miguel Romeu Amorim Neto - CRB-7/6283.
E65f Erbisti, Rafael Santos. Flexible covariance modeling of multivariate spatio-temporal processes / Rafael Santos Erbisti. - Rio de Janeiro, 2019. 71 f.
Advisor: Thaís Cristina Oliveira da Fonseca. Co-advisor: Mariane Branco Alves. Thesis (PhD) - Universidade Federal do Rio de Janeiro, Instituto de Matemática, Graduate Program in Statistics, 2019.
1. geostatistics. 2. multivariate spatio-temporal models. 3. nonseparable cross-covariance function. 4. latent dimensions. 5. Bayesian inference. I. Fonseca, Thaís Cristina Oliveira da, advisor. II. Alves, Mariane Branco, co-advisor. III. Title.
“... life is a bullet train, partner,
and we are just passengers about to depart”
(Excerpt from the song Trem-Bala, by Ana Vilela)
Acknowledgments
I thank everyone who contributed in some way to this work, especially:
My advisors, Thaís C. O. Fonseca and Mariane B. Alves, for their teaching, availability, patience and kindness. I am grateful for the opportunity to work with such excellent researchers. The knowledge and experience you shared were essential to my growth and development. I know that not every moment was easy, but we know that difficulties will always appear and we must learn to work around them. I learned a lot from you and am very grateful for everything you did. Thank you very much!
To my parents, Belinha and Renzo, who always believed in me and supported me at every moment. Without you, none of this would have been possible. To my sister, Juliana, for her constant friendship and support. And, of course, to the newest member of the Erbisti family, Pedrinho! Even in the rush to finish this thesis, I managed to be present at his birth!
To my love, Paloma, for always being by my side. Your love, partnership, affection and patience were fundamental for me to get this far. Thank you for everything!
I also thank the friends who accompanied me along this journey, always encouraging me and making me believe it was possible!
Resumo
Modeling the dependence of a vector of spatially correlated variables is a challenging task, since the specification of valid cross-covariance functions that represent real processes is non-trivial. In order to satisfy the validity requirement, several proposals consider simplified formulations, such as separable covariance functions, which imply the same spatial range for all components of the response vector. This is rarely a realistic assumption in spatial applications. In this context, it is of great interest to account for the dependencies in multivariate spatial data analysis while still allowing flexible model specification for each component. In this work we propose full Bayesian inference for a flexible, nonseparable class of cross-covariance functions, initially for multivariate spatial data. The nonseparable covariance function considered is based on the convolution of separable covariance functions and allows different ranges in space. A Bayesian test for the separability of the covariance functions is proposed, which is more interpretable than trying to make sense of a set of separability-related parameters, typically defined on a scale that hinders practical interpretation. An application to meteorological variables in the state of Ceará, Brazil, indicates that our proposal is preferred to the coregionalization approach. To address the computational limitation, we approximate the full covariance matrices using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations are applied only to the likelihood function, to obtain fast parameter estimation while keeping the interpretation and flexibility of the nonseparable multivariate model. The effects of using the approximation are evaluated on simulated datasets in terms of prediction error. We also introduce a spatio-temporal setting for the proposed cross-covariance functions, which allows time-varying covariance coefficients that accommodate atypical observations and temporal heterogeneity.
Keywords: geostatistics, multivariate spatio-temporal models, cross-covariance function, nonseparable covariance function, latent dimensions, Bayesian inference.
Abstract
Modeling the dependence of a vector of spatially correlated variables is a challenging task, since
the specification of valid cross-covariance functions is non-trivial. In order to satisfy the validity
assumption, several proposals consider simplified formulations, such as separable covariance functions,
which imply the same spatial range for all the components of the response vector. This is rarely
a realistic assumption for spatial applications. In this context, accounting for the dependencies in
multivariate spatial data analysis while still allowing for flexible specification of models for each
component is of great interest. In this work we propose full Bayesian inference for a flexible nonsep-
arable class of cross-covariance functions for multivariate spatial data. The nonseparable covariance
function considered is based on the convolution of separable covariance functions and allows for
different ranges in space. A Bayesian test is proposed for the separability of covariance functions, which is
more interpretable than trying to make sense of a set of parameters related to separability, typically
defined on a scale that is difficult to interpret. An application to meteorological variables in the
state of Ceará, Brazil, indicates that our proposal is preferred to the well-known coregionalization
approach. To address the computational limitation, we approximate the full covariance matrices using
a decomposition based on the Kronecker product of two separable matrices of smaller dimensions.
These approximations are applied to the likelihood function in order to obtain fast estimation
of parameters but we still keep the interpretation and flexibility of the multivariate nonsepara-
ble model. The effects of using the approximation are evaluated for simulated datasets in terms
of prediction error. We also introduce a spatio-temporal setting of the proposed cross-covariance
functions, which allow time-varying covariance coefficients accommodating atypical observations
and temporal heterogeneity.
Keywords: geostatistics, multivariate spatio-temporal models, cross-covariance function, nonseparable covariance function, latent dimensions, Bayesian inference.
Contents
1 Introduction 1
2 Covariance modeling of multivariate spatial random fields 6
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Multivariate process modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Separable cross-covariance functions . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Nonseparable cross-covariance functions . . . . . . . . . . . . . . . . . . . . . 10
2.3 Multivariate spatial modeling based on mixtures . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Flexible classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Bayesian inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Bayesian hypotheses testing for separability . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 Simulated examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6.1 Example 1: computing posterior probabilities of separability . . . . . . . . . 18
2.6.2 Example 2: model discrimination . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Ceará weather dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Likelihood computation for large data 30
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Separable approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Sensitivity study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Multivariate spatio-temporal modeling 38
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Multivariate dynamic spatial models . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2.1 DLM completion and prior specification . . . . . . . . . . . . . . . . . . . . . 41
4.3 Simulated examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3.1 Simulation study one: constant variance over time . . . . . . . . . . . . . . . 44
4.3.2 Simulation study two: time-varying variance . . . . . . . . . . . . . . . . . . 47
4.3.3 Simulation study three: change of regime . . . . . . . . . . . . . . . . . . . . 49
4.4 California dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Conclusions 61
A Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.1 Covariance functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.2 Prior distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
A.3 Model comparison predictive measures . . . . . . . . . . . . . . . . . . . . . 66
List of Tables
2.1 Interpretation table for the Bayesian separability test. BF: Bayes Factor; p0: pos-
terior probability of separability; w0: loss associated to the decision of rejecting H0
when H0 is true; w1: loss associated to the decision of not rejecting H0 when H1 is
true. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Posterior probabilities of separability p0 for each dataset and scenario of α0. . . . . . 19
2.3 Predictive measures for model comparison and posterior probability of separability
for the simulation examples. IS=Interval Score, WCI=Width of Credibility Interval
and LPS=Log Predictive Score. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Comparison of models in predictive terms for the Ceará weather dataset. IS=Interval
Score, WCI=Width of Credibility Interval and LPS=Log Predictive Score. . . . . . . 29
3.1 Necessary time (in seconds) to calculate the likelihood function based on a full co-
variance matrix and an approximate structure. (Intel(R) Core(TM) i7-3630QM,
2.40GHz, 6GB RAM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Predictive model comparison. SEP: separable model. NS app: nonseparable approx-
imate model. NS: nonseparable model. . . . . . . . . . . . . . . . . . . . . . . . . . . 36
List of Figures
2.1 Posterior median and 95% CI of the cross-correlations among variables for each
dataset considering an independent multivariate normal distribution for each spatial
location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2 Posterior median of covariance function (gray full line) and 95% credible interval
(gray dashed lines) of Mix-NSEP model. Black full line: true covariance function
(SEP model). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3 Collection of monitoring stations in Ceará state, Brazil. Numbers 1 and 2 are spatial
locations considered for predictive comparison. . . . . . . . . . . . . . . . . . . . . . 26
2.4 Posterior median and 95% CI of the cross-correlations among components considering
an independent multivariate normal distribution for each spatial location. . . . . . . 26
2.5 Posterior median (full line) and 95% credible interval (dashed lines) of spatial cor-
relation of the univariate spatial models for each component. . . . . . . . . . . . . . 27
2.6 Posterior densities of the spatial ranges of SEP and Mix-NSEP models. . . . . . . . 28
3.1 Separability approximation error index as a function of α0. Full line: p = 2; dashed
line: p = 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Likelihood contour plots. Black line: full structure. Red line: approximate structure.
Dashed black line: true value of parameters. . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Computational time reduction (in percent) in calculation of the likelihood function
using approximate structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1 Spatial locations simulated in the [0, 1]× [0, 1] square. Red points: locations used
in estimation. Black points: locations used in prediction. . . . . . . . . . . . . . . . .
4.2 Cross-correlation function for components in space: simulations. . . . . . . . . . . . 43
4.3 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation one . . . . . . . . . . . 45
4.4 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation one. . . . . . . . . . . . . 46
4.5 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation one. . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6 Empirical variance over time for each component: simulation two. . . . . . . . . . . 47
4.7 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation two. . . . . . . . . . . 48
4.8 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation two. . . . . . . . . . . . . 49
4.9 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation two. . . . . . . . . . . . . . . . . . . . . . . . . 49
4.10 Empirical variance over time for each component: simulation three. . . . . . . . . . . 50
4.11 Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for M1
and M2 models. Green line: true value of the σi,t: simulation three. . . . . . . . . . 51
4.12 Posterior median and 95% credible interval for component one versus the observed
values at times t = 7 and t = 23 for all out-of-sample stations: simulation three. . . .
4.13 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation three. . . . . . . . . . . . 53
4.14 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 30: simulation three. . . . . . . . . . . . . . . . . . . . . . . . 53
4.15 Collection of monitoring stations in California state, USA. (a) Collection of 21 mon-
itoring stations. (b) Locations that have gone through an imputation process. (c)
Collection of 21 monitoring stations where numbers 1 and 2 are spatial locations
considered for predictive comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.16 Empirical variance for NO2 and PM2.5 across space: California dataset. . . . . . . . 55
4.17 Empirical variance for NO2 and PM2.5 across time: California dataset. . . . . . . . . 55
4.18 Posterior median and 95% CI of the cross-correlation among components considering
an independent multivariate normal distribution for each time: California dataset. . 56
4.19 Posterior median (full line) and 95% credible interval (dashed lines) of spatial cor-
relation of the univariate dynamic spatial model for each component: California
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.20 Posterior densities of the cross-correlation parameters for each model: California
dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.21 Posterior median and 95% credibility interval of σNO2 and σPM2.5 for M1 and M2
models at each time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . . . . . . .
4.22 Interval Score as predictive measure for M1 and M2 models comparison at each
component i, i = 1, 2, and time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . 59
4.23 Log Predictive Score as predictive measure for M1 and M2 models comparison at
each time t, t = 1, . . . , 40: California dataset. . . . . . . . . . . . . . . . . . . . . . . 59
Chapter 1
Introduction
The analysis of multivariate data observed over space and time is of great interest in sev-
eral application areas such as environmental science, climate science and agriculture. Spatial and
spatiotemporal data often arise as multivariate measurements at each location. In particular, in
geostatistical applications, the data may be considered as a partial realization of a random vector
Y(s), Y(s) ∈ ℝ^p, s ∈ D ⊆ ℝ^d.
In the context of multivariate spatial modelling, two kinds of dependence must be accounted for:
among measurements at a specific location and among measurements across locations. The ultimate
goal in the analysis would be to predict the response vector based on partial observations of the
process under study. The development of multivariate spatial models is based on the assumption
that data which are closer in space are more correlated than data farther apart, and that vector
components are usually better predicted by taking into account the interdependence among them.
These general ideas are directly related to the cross-covariance function of spatial multivariate
data, that is, Cij(s, s′) = Cov(Yi(s), Yj(s′)), s, s′ ∈ D ⊆ ℝ^d, which models the spatial dependence
of Yi(·) and Yj(·), i, j = 1, . . . , p, with p denoting the number of variables in the random vector.
The cross-covariance functions considered must be valid; thus, the construction of new cross-covariance
functions usually relies on mathematical simplifications which are not necessarily accompanied
by good model-data fit. A usual simplifying assumption is that the cross-covariance functions are
separable (see Mardia and Goodall, 1993). Separability implies that when the spatial location varies,
the covariance pattern across components remains unchanged, that is, Cij(s, s′) = aij ρ(s, s′),
with A = (aij) a positive definite p × p matrix and ρ(·, ·) a valid spatial correlation function.
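As an illustration of this factorization, under separability the covariance matrix of the vectorized data is a Kronecker product. The following sketch (in Python; the exponential correlation, the matrix A and the coordinates are illustrative choices, not taken from this thesis) makes the structure explicit:

```python
import numpy as np

def exp_corr(coords, phi):
    """Exponential spatial correlation: rho(s, s') = exp(-||s - s'|| / phi)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

rng = np.random.default_rng(0)
coords = rng.uniform(size=(4, 2))         # n = 4 locations in the unit square
R = exp_corr(coords, phi=0.5)             # n x n spatial correlation
A = np.array([[1.0, 0.6], [0.6, 2.0]])    # p x p positive definite matrix A

# Separable cross-covariance C_ij(s, s') = a_ij * rho(s, s') for the
# vectorized data (component index varying fastest within each location):
Sigma = np.kron(R, A)

# Every p x p block equals rho(s_k, s_l) * A: one spatial decay for all pairs
assert np.allclose(Sigma[0:2, 2:4], R[0, 1] * A)
```

The single correlation ρ shared by all blocks is exactly why every component inherits the same spatial range under separability.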
Cressie and Huang (1999) discuss shortcomings of separable models in the context of spatiotem-
poral processes and point out that separable models are often chosen for convenience rather than
for fitting the data well. Stein (2005) presents results about the limited kind of behaviours which
these classes represent in practice. A consequence of the separability assumption is that the p
different process components have the same spatial range. Another consequence of separability is
full symmetry of the resulting covariance matrices. However, environmental, atmospheric and geo-
physical processes are often influenced by wind direction or ocean currents, which are incompatible
with the assumption of full symmetry. In the spatiotemporal context, asymmetric behavior is often
observed when the effect of one variable on another is delayed in time (Stein, 2005; Gneiting et al.,
2007). For instance, processes which are influenced by air flows might have asymmetric covariance
functions (see Fonseca and Steel, 2011).
Several authors have proposed alternative formulations to relax the separability assumption of
cross-covariance functions. A pioneering work is the linear model of coregionalization (Goulard and
Voltz, 1992; Wackernagel, 2003), which decomposes each spatial process Yj(s), j = 1, . . . , p, into
sets Wu^(j)(s), u = 0, . . . , K, of spatially uncorrelated components¹, i.e., Yj(s) = ∑_{u=0}^{K} Wu^(j)(s). In
this approach, the resulting cross-covariance function is Cij(s, s′) = ∑_{u=0}^{K} b^u_ij ρu(s, s′; θu), where b^u_ij,
i, j = 1, . . . , p, are real coefficients and ρu(·, ·; θu), u = 1, . . . , K, are valid correlation functions that
may be different for each component. This proposal allows for different range parameters for each
component of Y(s). Schmidt and Gelfand (2003) and Gelfand et al. (2004) present full Bayesian
inference for this class of multivariate models.
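The coregionalization construction can be sketched numerically as a sum of Kronecker products, one per latent process. The ranges and coefficient matrices below are illustrative (p = 2, K = 1), not those used later in the thesis:

```python
import numpy as np

def exp_corr(coords, phi):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

rng = np.random.default_rng(1)
coords = rng.uniform(size=(5, 2))            # n = 5 locations

phis = [0.2, 0.8]                            # one spatial range per latent process
B = [np.array([[1.0, 0.3], [0.3, 0.5]]),     # coregionalization matrices (b^u_ij)
     np.array([[0.4, 0.2], [0.2, 1.0]])]

# C_ij(s, s') = sum_u b^u_ij * rho_u(s, s'): a sum of separable pieces,
# so each component mixes correlations with different spatial ranges
Sigma = sum(np.kron(exp_corr(coords, phi), Bu) for phi, Bu in zip(phis, B))
```

Because each latent process contributes its own range, the resulting covariance is nonseparable even though every summand is separable.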
A different proposal considers multidimensional scaling ideas (Cox and Cox, 2000) which repre-
sent similarity or dissimilarity between objects as distances between points in a multidimensional
space. Following this approach, Apanasovich and Genton (2010) proposed a multivariate spatial
model based on existing stationary covariance models for univariate processes. In particular, the
authors extend the spatiotemporal models of Gneiting (2002) to include a third argument that
represents the component vector and compute distances using latent dimensions. The structures
presented are based on processes which consider, for each component, the same isotropic covariance
function with respect to space and time. To overcome this limitation the authors consider the linear
model of coregionalization.
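The latent-dimension idea can be sketched as follows: each component i receives a latent coordinate ξ_i, and a valid univariate correlation is evaluated at the distance in the augmented space. The exponential correlation and the latent coordinates here are illustrative assumptions, not values from the cited papers:

```python
import numpy as np

def cross_cov(h, xi_i, xi_j, phi=1.0):
    """Cross-covariance via latent dimensions: a valid univariate
    correlation (here exponential, an illustrative choice) evaluated
    at the distance in the augmented space formed by the spatial
    separation h and the latent separation xi_i - xi_j."""
    d = np.sqrt(np.sum(np.asarray(h) ** 2) + (xi_i - xi_j) ** 2)
    return np.exp(-d / phi)

xi = {1: 0.0, 2: 0.7}        # hypothetical latent coordinates of components 1, 2
h = np.array([0.3, 0.4])     # spatial lag with ||h|| = 0.5

c11 = cross_cov(h, xi[1], xi[1])   # same component: decays with ||h|| only
c12 = cross_cov(h, xi[1], xi[2])   # across components: latent distance adds up
assert c12 < c11                   # dissimilar components are less correlated
```

Estimating the ξ_i from the data is what turns between-component similarity into an ordinary distance, which is the multidimensional scaling idea at work.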
In a recent paper, Cressie and Zammit-Mangion (2016) proposed the conditional approach to
derive multivariate models. The construction is based on partitioning the vector of spatial processes
so that the joint distribution is specified through univariate spatial conditional distributions. This
is convenient as the modeler just needs to specify univariate covariance functions and an integrable
¹ Cov(Wu^(i)(s), Wu^(j)(s′)) = C^u_ij(s, s′); Cov(Wu^(i)(s), Wv^(j)(s′)) = 0, u ≠ v.
function of p arguments. Obviously, the results depend on the chosen conditioning and this is not
always an easy modeling decision.
Other proposals for multivariate spatial modelling are the convolution model (Gaspari and
Cohn, 1999; Majumdar and Gelfand, 2007), the semi-parametric model via separable functions
(Reich and Fuentes, 2007) and the multivariate Matérn model (Gneiting et al., 2010).
In the context of space-time covariance models Fonseca and Steel (2011) developed nonseparable
covariance structures based on the convolution of purely spatial and purely temporal valid covari-
ance functions. One advantage of this approach is that it allows for different modelling decisions
regarding space and time. In this work, we propose to extend this class of nonseparable covariance
functions to the modeling of component and spatial dependence. The model is based on mixtures
and we consider multidimensional scaling ideas to define latent distances between components.
The proposed class allows for different ranges and degrees of smoothness across space for dif-
ferent components of the multivariate random vector. Similarly to the conditional approach of
Cressie and Zammit-Mangion (2016), the proposed covariance depends on the definition of univari-
ate spatial covariance functions and a bivariate joint density function. It is advantageous compared
to the conditional approach as it does not require the definition of conditioning relations between
components of the vector. In the proposed model, and in many models in the literature, a set of
parameters is responsible for separability of the resulting process. In this context, a Bayesian test
for separability is presented which is easier to interpret than the posterior distributions of model
parameters, since the scales on which these parameters are defined are neither bounded nor easily
interpretable.
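The decision part of such a test reduces to standard Bayesian decision theory: given the posterior probability p0 of separability and the losses w0 (rejecting H0 when it is true) and w1 (keeping H0 when H1 is true), rejecting is optimal when its expected loss is the smaller one. A sketch with illustrative values, not the thesis's:

```python
def separability_decision(p0, w0=1.0, w1=1.0):
    """Decide on H0 (separability) given its posterior probability p0.

    Expected loss of rejecting H0 is w0 * p0; of keeping H0 it is
    w1 * (1 - p0). Reject when the former is smaller, i.e. when
    p0 < w1 / (w0 + w1)."""
    return "reject H0" if w0 * p0 < w1 * (1 - p0) else "do not reject H0"

# With symmetric losses the threshold is p0 = 1/2:
print(separability_decision(0.10))   # reject H0
print(separability_decision(0.85))   # do not reject H0
```

Asymmetric losses shift the threshold: making a wrong rejection three times as costly (w0 = 3) raises the evidence needed against separability.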
It is important to note that the main articles cited in this work, for instance, Apanasovich
and Genton (2010) and Cressie and Zammit-Mangion (2016), make use of the frequentist inference
approach, which does not fully account for all uncertainties when interest lies in both estimation
and prediction at ungauged locations and/or future time points. Like Schmidt and Gelfand (2003)
and Gelfand et al. (2004) we use the Bayesian paradigm for estimation of the multivariate spatial
covariance functions and other model components as well as for prediction purposes.
The computational treatment of high complexity spatial models is a challenge. In the context of
geostatistics, analyzing multivariate data requires the specification of the cross-covariance function
and the computational cost to make inference and predictions can be prohibitive. As a result, the
use of complex models might be infeasible. Consider a p-dimensional vector measured at n locations
in the domain D. If a Gaussian process is assumed, the modeling of the p variables through nonseparable
multivariate spatial models results in a full matrix Σ ∈ ℝ^{np×np}. Therefore, the computation of the
likelihood requires the calculation of the inverse and determinant of this matrix, which might be
infeasible depending on the size of the problem.
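When Σ does factor as R ⊗ A, the cost can be avoided entirely, since (R ⊗ A)⁻¹ = R⁻¹ ⊗ A⁻¹ and det(R ⊗ A) = det(R)^p det(A)^n, so the Gaussian log-likelihood needs only the n × n and p × p factors. A sketch with illustrative matrices, not the thesis's model:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
coords = rng.uniform(size=(n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
R = np.exp(-dist / 0.5)                    # n x n spatial correlation
A = np.array([[1.0, 0.4], [0.4, 1.5]])     # p x p component covariance
y = rng.standard_normal(n * p)             # data stacked location by location

# Naive evaluation: factorizations of the full np x np matrix, O((np)^3)
S = np.kron(R, A)
naive = -0.5 * (n * p * np.log(2 * np.pi) + np.linalg.slogdet(S)[1]
                + y @ np.linalg.solve(S, y))

# Kronecker evaluation: only the n x n and p x p factors are touched
Y = y.reshape(n, p)                        # row k holds the p-vector at s_k
logdet = p * np.linalg.slogdet(R)[1] + n * np.linalg.slogdet(A)[1]
quad = np.trace(np.linalg.solve(R, Y) @ np.linalg.solve(A, Y.T))
loglik = -0.5 * (n * p * np.log(2 * np.pi) + logdet + quad)
assert np.isclose(naive, loglik)
```

The quadratic form uses the identity y′(R⁻¹ ⊗ A⁻¹)y = tr(R⁻¹ Y A⁻¹ Y′), so no np × np object is ever formed; this is precisely the saving a separable approximation buys.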
In this work we present a way to approximate the full covariance matrix by the Kronecker product
of two separable matrices of smaller dimensions, based on Van Loan and Pitsianis (1993). The idea is to obtain
matrices R ∈ ℝ^{n×n} and A ∈ ℝ^{p×p} such that, for a given full covariance matrix Σ, the Frobenius
norm ‖Σ − R ⊗ A‖_F is minimized. Genton (2007) investigated the use of this approximate
structure in the spatio-temporal context. Here, this method is used for the multivariate spatial
case and is applied only in the likelihood computation, keeping the interpretation of the original
model.
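The Van Loan and Pitsianis construction reduces to a rank-1 SVD of a block rearrangement of Σ. A sketch (the test covariance below, a sum of two separable pieces, is an illustrative stand-in for a nonseparable model):

```python
import numpy as np

def nearest_kronecker(Sigma, n, p):
    """Best Frobenius-norm approximation Sigma ~ kron(R, A) with
    R (n x n) and A (p x p), following Van Loan and Pitsianis (1993):
    a rank-1 SVD of the block-rearranged matrix."""
    # row (k*n + l) of 'blocks' is the flattened p x p block (k, l) of Sigma
    blocks = Sigma.reshape(n, p, n, p).transpose(0, 2, 1, 3).reshape(n * n, p * p)
    U, s, Vt = np.linalg.svd(blocks, full_matrices=False)
    R = np.sqrt(s[0]) * U[:, 0].reshape(n, n)
    A = np.sqrt(s[0]) * Vt[0].reshape(p, p)
    if R[0, 0] < 0:                      # resolve the SVD sign ambiguity
        R, A = -R, -A
    return R, A

# Illustrative nonseparable covariance: a sum of two separable pieces
rng = np.random.default_rng(3)
n, p = 6, 2
coords = rng.uniform(size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
Sigma = (np.kron(np.exp(-d / 0.2), np.array([[1.0, 0.3], [0.3, 0.6]]))
         + np.kron(np.exp(-d / 0.9), np.array([[0.5, 0.1], [0.1, 0.9]])))

R, A = nearest_kronecker(Sigma, n, p)
err = np.linalg.norm(Sigma - np.kron(R, A)) / np.linalg.norm(Sigma)
# err is the relative Frobenius error of the best separable approximation
```

When Σ is exactly separable the rearranged matrix has rank one and the procedure recovers R ⊗ A = Σ; otherwise err quantifies how far the process is from separability.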
Datasets are generally indexed in both space and time, and the assumption of Gaussianity in the
modeling of multivariate spatio-temporal data with non-Gaussian characteristics can be very
restrictive, with possibly unsatisfactory predictive performance. When aberrant observations are
present in the dataset, we need some way to accommodate them, since the identification of
outliers is essential to improve model fit and predictive power. In this work we extend a valid
nonseparable cross-covariance function, based on mixtures and multidimensional scaling ideas,
to a spatio-temporal setting, considering dynamic models to describe the temporal evolution, as in
West and Harrison (1997, Chapter 10). These models
have time-varying covariance coefficients which accommodate outliers and heterogeneity over time.
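In the univariate case, the discount-based variance learning of West and Harrison can be sketched as follows; the recursion and the simulated series are an illustrative simplification, not the multivariate construction developed later in the thesis:

```python
import numpy as np

def variance_discount(errors, q, delta=0.95, n0=1.0, S0=1.0):
    """Variance learning with a discount factor, in the spirit of
    West & Harrison (1997, Ch. 10). Old information decays at rate
    delta, so the variance estimate S_t = d_t / n_t can track changes
    in variability and absorb aberrant observations.
    errors: one-step forecast errors; q: forecast scale factors."""
    n, d = n0, n0 * S0
    path = []
    for e_t, q_t in zip(errors, q):
        S_prev = d / n
        n = delta * n + 1.0                      # discounted degrees of freedom
        d = delta * d + S_prev * e_t ** 2 / q_t  # discounted sum of squares
        path.append(d / n)
    return np.array(path)

# A series whose standard deviation jumps from 1 to 3 halfway through:
rng = np.random.default_rng(4)
e = np.concatenate([rng.normal(0, 1, 100), rng.normal(0, 3, 100)])
S = variance_discount(e, np.ones(200))
# S stays roughly near 1 in the first half and climbs after the change
```

With delta = 1 no information is discounted and the estimate settles to a constant; delta < 1 is what allows the recursion to track heterogeneity over time.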
This thesis contributes to the modelling of multivariate spatio-temporal processes by combining the
following four aspects: i) the proposal of a class of valid nonseparable spatial multivariate covariance
functions based on the convolution of separable functions, which allows the specification of different
structures for space and components, as well as different spatial ranges for each component; ii)
the use of multidimensional scaling concepts based on latent spaces to deal with association (or
similarity) among components; iii) the adoption of a Bayesian inferential approach for the proposed
models, which includes the construction of a Bayesian hypothesis test, based on mixture priors,
exploring the fact that separability is straightforwardly obtained as a particular case of the
general proposed covariance structure if a specific subset of the parameter vector is null; iv) the
use of dynamic modeling concepts for time-varying covariance coefficients, allowing the model to accommodate
outlying observations and changes in variability over time.
This thesis is organized as follows: in Chapter 2 we present a flexible class of multivariate
spatial covariance models; inference on these models, as previously mentioned, is conducted
from a Bayesian perspective. A Bayesian test to measure the degree of separability
between space and components is developed, and finally we also present simulated examples and
an illustration of the proposed approach based on weather data. A fast algorithm is used to
compute the likelihood function to allow for scalable modeling of large multivariate spatial data in
Chapter 3. We investigate the performance of an approximation for the full nonseparable covariance
and a sensitivity study is performed showing that the approximate approach provides important
gains in computational efficiency. Chapter 4 presents multivariate spatio-temporal models. The
proposal is based on time-varying covariance coefficients that accommodate outlying observations
and changes in variability over time. Simulated examples and an illustration of the proposed
approach are presented. Finally, Chapter 5 presents conclusions and future developments.
Chapter 2
Covariance modeling of multivariate
spatial random fields
2.1 Introduction
Following Cressie (1993), geostatistical data are defined as a partial realization of a stochastic
process {Y(s) : s ∈ D}, where D is a subset of ℝ^d with positive d-dimensional volume; that is, the
spatial index s varies continuously throughout the region D. Usually d = 2 (for example, latitude
and longitude) or d = 3 (for example, latitude, longitude and altitude).
A stochastic process (or random field) defined on the domain D is usually described through the
finite-dimensional probability distributions of the random variables at collections of points s1, . . . , sn ∈ D. In
particular, a Gaussian random field is a stochastic process whose finite-dimensional distributions
are multivariate normal for every n and all locations s1, . . . , sn, and it is completely determined by
its mean and covariance functions.
Consider that the mean and covariance functions of a Gaussian spatial process {Y(s) : s ∈ D}
are defined by
µ(s) = E(Y (s)) (2.1)
and
Cov(Y (s), Y (s′)) = E((Y (s)− µ(s))(Y (s′)− µ(s′))), (2.2)
respectively, with s, s′ ∈ D.
Often it is assumed that the Gaussian random field {Y(s) : s ∈ D} is stationary, that is,
µ(s) = E(Y (s)) = µ, (2.3)
where µ is a constant and
Cov(Y (s), Y (s′)) = E((Y (s)− µ(s))(Y (s′)− µ(s′))) = C(s− s′) (2.4)
is independent of the locations s and s′, i.e., the covariance is just a function C(·) of the separation vector
h = s − s′. In addition, if the covariance is a function of the Euclidean distance ‖h‖ = ‖s − s′‖, that is,
Cov(Y(s), Y(s′)) = C(‖s − s′‖) = C(‖h‖), the process is said to be isotropic. These assumptions
are restrictive and often unrealistic.
When a process is stationary and isotropic, its variance is constant and the elements of the
covariance matrix depend only on the variance parameter and a valid correlation function, which
depends on the Euclidean distance. In what follows we present some valid stationary, isotropic correlation
functions.
1. Power exponential class
ρ(‖h‖; Θ) = exp{−(‖h‖/θ1)^θ2}, (2.5)
with θ1 > 0 and θ2 ∈ (0, 2]. This family includes two correlation functions as particular cases: the Gaussian correlation function when θ2 = 2 and the exponential correlation function when θ2 = 1.
2. Cauchy class
ρ(‖h‖; Θ) = (1 + (‖h‖/θ1)^θ2)^(−θ3), (2.6)
with θ1 > 0, θ2 ∈ (0, 2] and θ3 > 0. This function allows for long-range dependence and easy computation of the Hurst effect (Gneiting and Schlather, 2004).
3. Matern class

ρ(‖h‖; Θ) = [1/(2^{θ2−1} Γ(θ2))] (‖h‖/θ1)^{θ2} K_{θ2}(‖h‖/θ1), (2.7)

where Γ is the gamma function, K_{θ2}(·) is the modified Bessel function of the third kind of order θ2, θ1 > 0 and θ2 > 0.
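As an illustration, the three correlation families above can be evaluated numerically. The sketch below (Python with NumPy/SciPy; function and parameter names are ours, not part of the thesis) also checks the known reduction of the Matern family with θ2 = 1/2 to the exponential correlation:

```python
import numpy as np
from scipy.special import gamma as gamma_fn, kv

def powered_exponential(h, theta1, theta2):
    # Power exponential correlation (2.5): exp{-(h/theta1)^theta2}, theta2 in (0, 2]
    return np.exp(-(h / theta1) ** theta2)

def cauchy(h, theta1, theta2, theta3):
    # Cauchy correlation (2.6): (1 + (h/theta1)^theta2)^(-theta3)
    return (1.0 + (h / theta1) ** theta2) ** (-theta3)

def matern(h, theta1, theta2):
    # Matern correlation (2.7), with K the modified Bessel function of order theta2
    h = np.asarray(h, dtype=float)
    u = h / theta1
    out = np.ones_like(u)              # correlation equals 1 at distance zero
    nz = u > 0
    out[nz] = (u[nz] ** theta2) * kv(theta2, u[nz]) / (2 ** (theta2 - 1) * gamma_fn(theta2))
    return out

h = np.array([0.0, 0.5, 1.0])
# Matern with theta2 = 1/2 reduces to the exponential correlation exp(-h/theta1)
print(np.allclose(matern(h, 1.0, 0.5), np.exp(-h)))  # True
```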
In this chapter we work with multivariate spatial processes and propose a new class of nonsep-
arable covariance functions, which are flexible and intuitive depending only on the specification of
univariate covariance functions.
2.2 Multivariate process modeling
In the context of multivariate spatial processes, the main goal is usually to model the dependence
among several variables measured across a spatial domain of interest, in order to obtain realistic
predictions. Denote by Y(s) the p-dimensional vector of variables at location s ∈ D. The direct covariance functions measure the spatial dependence of each component individually, while the cross-covariance functions between two component processes measure the dependence between components at the same location and at two different locations.
Assuming that Y(s) is a spatially stationary process, that is
E[Yi(s)] = mi, Cov[Yi(s), Yj(s + h)] = Cij(h), ∀s, s + h ∈ D; i, j = 1, 2, . . . , p, (2.8)
the cross-covariance function of Y(s) is defined as
E[(Yi(s)−mi)(Yj(s + h)−mj)] = Cij(h), s, s + h ∈ D; i, j = 1, 2, . . . , p. (2.9)
The requirement of positive definiteness of Cij(·) is a limitation in the definition of realistic
covariance functions for multivariate spatial processes. As a result, several simplifications are
called for in practice such as stationarity and separability.
2.2.1 Separable cross-covariance functions
Consider a p-dimensional multivariate random field {Y(s) : s ∈ D ⊂ ℝ^d; Y ∈ ℝ^p}. For example,
Y(s) = (temperature, humidity)(s). The cross-covariance function for two components i and j of
the vector Y, between two locations s and s′, can be described by
Cij(s, s′) = aij ρ(s, s′), (2.10)

with A = (aij) a positive definite p × p matrix and ρ(·, ·) a valid correlation function. Let Y
be a vectorized version of Yik = Yi(sk), k = 1, · · · , n; i = 1, · · · , p (see Mardia and Goodall,
1993). Then the covariance matrix is Σ = R ⊗ A, with Rkl = ρ(sk, sl), k, l = 1, · · · , n. The
condition of positive definiteness is respected if R and A are positive definite. This specification
is computationally advantageous as inverses and determinants are obtained from smaller matrices,
that is, Σ^{−1} = R^{−1} ⊗ A^{−1} and |Σ| = |R|^p |A|^n. However, this model has theoretical limitations
(Banerjee et al., 2004). Firstly, it is an intrinsic model implying that the covariance between two
components Yi(sk) and Yj(sl) is aij , that is, it does not depend on the locations sk and sl. Secondly,
note that as the covariance is defined by one spatial correlation function ρ(·, ·), the spatial range will
be the same for all components. This last feature can be perceived through the following argument:
consider the univariate spatial processes {Y (s) : s ∈ D} and {X(s) : s ∈ D}, D ⊂ ℝ^2. For locations s1, . . . , sn we have Y = [Y (s1), Y (s2), . . . , Y (sn)]^T and X = [X(s1), X(s2), . . . , X(sn)]^T.
Consider the stacked 2n × 1 vector (X^T, Y^T)^T, following a multivariate normal distribution and a separable covariance structure as in (2.10), that is,

(X^T, Y^T)^T ∼ N2n(µ, Σ), Σ = A ⊗ R,
implying that X ∼ Nn(µx, a11R) and Y ∼ Nn(µy, a22R). It follows directly that Y | X ∼ Nn(µ∗, Σ∗), with

µ∗ = µy + (a12R)(a11R)^{−1}(X − µx) = µy − (a12/a11)µx + (a12/a11)X

and

Σ∗ = a22R − (a12R)(a11R)^{−1}(a12R) = (a22 − a12²/a11)R,

which is equivalent to Y | X ∼ Nn(β0 + β1X, σ²R), with

β0 = µy − (a12/a11)µx, β1 = a12/a11 and σ² = a22 − a12²/a11.
Now assume, conversely, that X ∼ Nn(µx, a11R) and Y | X ∼ Nn(β0 + β1X, σ²S), with S any spatial correlation matrix. Then we obtain by marginalization the covariance structure for Y:

Cov[Yi, Yj ] = σ²Sij + β1²a11Rij = (a22 − a12²/a11)Sij + (a12²/a11²)a11Rij = a22Sij − (a12²/a11)Sij + (a12²/a11)Rij . (2.11)

Recall that under separability Cov[Yi, Yj ] = a22Rij . Expression (2.11) reduces to the separable specification if and only if S = R, that is, if and only if Y | X has the same spatial correlation structure as X.
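The computational advantage of the separable specification can be checked numerically. The Python sketch below (ours, with arbitrary illustrative matrices) verifies the Kronecker identities for Σ = A ⊗ R, the ordering used in the stacked representation of this subsection:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 2

# Spatial correlation matrix R (n x n) from the exponential model, A (p x p) positive definite
s = rng.uniform(size=(n, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
R = np.exp(-D / 0.3)
A = np.array([[1.0, 0.5], [0.5, 2.0]])

# Separable covariance for the stacked vector (components vary slowest)
Sigma = np.kron(A, R)

# Inverse and determinant follow from the smaller matrices
assert np.allclose(np.linalg.inv(Sigma), np.kron(np.linalg.inv(A), np.linalg.inv(R)))
_, logdet = np.linalg.slogdet(Sigma)
_, ldA = np.linalg.slogdet(A)
_, ldR = np.linalg.slogdet(R)
assert np.isclose(logdet, n * ldA + p * ldR)   # |Sigma| = |A|^n |R|^p
print("Kronecker identities verified")
```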
2.2.2 Nonseparable cross-covariance functions
Several authors have proposed cross-covariance functions capable of relaxing separability as-
sumptions. More flexible structures are obtained via the coregionalization approach (Goulard and
Voltz, 1992; Wackernagel, 2003), which in its simplest form is Y(s) = Aw(s), with A a p×p ma-
trix and the components of w(s), wj(s), j = 1, 2, . . . , p, independent and identically distributed
spatial processes. If the processes wj(s) are stationary with zero mean and unit variances and
Cov(wj(s), wj(s′)) = ρ(s − s′), then E(Y(s)) = 0 and the cross-covariance function of Y(s) is Σ_{Y(s),Y(s′)} ≡ C(s − s′) = ρ(s − s′)AA^T, which is separable. A more general form of the coregionalization model considers independent, but not identically distributed, processes wj(s). The covariance matrix is then given by

C(s − s′) = Σ_{j=1}^p ρj(s − s′) Tj , (2.12)

with Tj = aj aj^T, aj the j-th column of A. The resulting covariance is nonseparable and stationary.
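A minimal numeric sketch of the coregionalization covariance (2.12) follows (Python; the matrix A, the ranges and the exponential choice for the ρj's are illustrative assumptions of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 15
A = np.array([[1.0, 0.0], [0.7, 1.2]])   # lower-triangular coregionalization matrix
phi = [0.2, 0.5]                          # a different range for each latent process w_j

s = rng.uniform(size=(n, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

# C(s - s') = sum_j rho_j(s - s') T_j, with T_j = a_j a_j^T  (equation (2.12));
# the stacked np x np covariance has block (i, l) = Cov[Y_i(s_k), Y_l(s_m)]
Sigma = np.zeros((n * p, n * p))
for j in range(p):
    Rj = np.exp(-D / phi[j])              # rho_j: exponential correlation, range phi[j]
    aj = A[:, j]
    Sigma += np.kron(np.outer(aj, aj), Rj)

# The result is a valid (nonnegative definite) covariance matrix; it collapses
# to the separable model AA^T kron R only when all rho_j coincide
assert np.min(np.linalg.eigvalsh(Sigma)) > -1e-10
print(Sigma.shape)  # (30, 30)
```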
Apanasovich and Genton (2010) propose a methodology based on latent dimensions and existing covariance models for univariate random fields. The components of the vector are represented as coordinates in a k-dimensional space, for an integer 1 ≤ k ≤ p, that is, the i-th component is represented as ξi = (ξi1, . . . , ξik)^T. This approach can be used for any valid covariance function Cij = C((s, ξi), (s′, ξj)). For any s, s′ there is Cs,s′(·) such that Cij(s, s′) = Cs,s′(ξi, ξj) for some ξi, ξj ∈ ℝ^k.
The latent coordinates may be treated as parameters and estimated from the data. Moreover, it is possible to consider the reparametrization δij = ‖ξi − ξj‖. This approach is similar to multidimensional scaling (Cox and Cox, 2000) with latent distances δij , where, for fixed locations s and s′, small δij 's translate into strong cross-correlation and large δij 's into weak cross-correlation. Indeed, the δij 's can be interpreted as distances in a latent space: if the distance between two variables is small, these variables are “near” each other; put another way, if they are very different, the distance in the latent space is large. It is important to note that the ξi's and δij 's are not calculated from multidimensional scaling methods. They are parameters of the cross-covariance function, estimated from the data, and are merely interpreted in the same way as the existing measures of dissimilarity in multidimensional scaling.
A simple cross-covariance function presented in Apanasovich and Genton (2010) is given by

Cij(‖h‖) = C(‖h‖, δ12) =
  a11² exp(−α1‖h‖), i = j = 1,
  a21² exp(−α1‖h‖) + a22² exp(−α2‖h‖), i = j = 2,
  [a11 a21/(δ12 + 1)] exp{−α1‖h‖/(δ12 + 1)^{β/2}}, i ≠ j,

with ‖h‖ = ‖s − s′‖ the Euclidean distance between locations, where the linear model of coregionalization is the special case δ12 = β = 0. From simulation results, the authors show that the coregionalization model is not flexible enough to provide unbiased estimates of the spatial ranges.
In what follows we consider an intuitive proposal for the construction of nonseparable covariance structures, based on mixtures of separable functions as in Fonseca and Steel (2011) and on the latent distances discussed in Cox and Cox (2000) and used in Apanasovich and Genton (2010).
A review of the main approaches to build a valid multivariate cross-covariance function is
presented in Genton and Kleiber (2015).
2.3 Multivariate spatial modeling based on mixtures
Separable functions result in limited structures in the modeling of space-time or space-
component interaction. There are several ways to construct nonseparable covariance functions
that provide more flexible and realistic covariance structures. One way to build such structures is
based on mixtures.
In the space-time context, Ma (2002, 2003) introduced models based on mixtures of purely
spatial and purely temporal covariance functions. In particular, the author presents scale mixtures and positive power mixtures of separable covariance functions. Ma argued that one benefit
of the mixture method is that it generates a sufficient variety of nonseparable spatiotemporal
covariance models, with appropriate choices of the mixing function and the purely spatial and
temporal covariances. As a result, the proposal provides an easy and effective way to construct new
spatiotemporal covariance models, which depend only on the characterization of Laplace transforms.
Porcu et al. (2007) investigated the properties of the stationary spatiotemporal scale mixture
based random field. The model described in Porcu et al. (2007, Proposition 1) is a special case
of a wider class of covariance functions introduced by Ma (2003), where variograms in Laplace
transform are defined by Bernstein functions. Some important classes of cross-covariance functions
are described in Porcu and Zastavnyi (2011). They provide some results for mixture based models
allowing the construction of cross-covariance models (Theorem 1, p. 1297). Recent works also use
covariance models via mixture representation to characterize classes of covariance models (Daley
et al., 2014; Bourotte et al., 2016; Porcu et al., 2016).
Fonseca and Steel (2011) developed a class of space-time covariance functions based on the
scale mixture defined in Ma (2002, 2003) which is obtained analytically by appropriate choices of
the mixing distribution. The proposed models are obtained by mixing purely spatial and purely
temporal valid covariance functions. More specifically, let (U, V ) be a bivariate random vector with joint distribution g(u, v). This proposal considers two uncorrelated processes: {Z1(s, u) : s ∈ D}, a purely spatial process for every u ∈ ℝ+ with stationary covariance C1(·; u), and {Z2(t, v) : t ∈ ℝ+}, a purely temporal process for every v ∈ ℝ+ with stationary covariance C2(·; v), both independent of (U, V ). The mixture representation of the covariance structure of Z(s, t) = Z1(s, U)Z2(t, V ) is a convex combination of separable covariance functions. If stationarity is assumed in space and time, then Cov[Z(s, t), Z(s + h, t + l)], given by

C(h, l) = ∫∫ C1(h; u) C2(l; v) g(u, v) du dv, (2.13)

is a valid nonseparable function for h ∈ D ⊆ ℝ^d and l ∈ ℝ+.
This work proposes to modify (2.13) to deal with the multivariate spatial specification. In what follows we present a class of multivariate spatial covariances which is flexible and intuitive, depending only on the specification of univariate correlation functions in space and of mixing functions in ℝ^2_+. The cross-dependence between components of the spatial vector is based on latent dimensions as in Apanasovich and Genton (2010). The idea is to define the vector of components as coordinates in a k-dimensional latent space, for an integer 1 ≤ k ≤ p, that is, the i-th component is represented as ξi = (ξi1, . . . , ξik)^T. If stationarity is also assumed in the latent dimensions and hij = ξi − ξj , then the covariance of Y(s) is a convex combination of separable covariance functions given by

Cij(h, hij) = ∫∫ C1(h; u) C2(hij ; v) gij(u, v) du dv (2.14)

with hij ∈ ℝ^k representing a latent separation vector between the components i, j = 1, . . . , p and h ∈ D ⊆ ℝ^d the separation vector in space.
It is possible to solve (2.14) analytically, still assuring positive definiteness of the covariance structure, by defining C1(h; u) = exp{−‖h‖u} and C2(hij ; v) = exp{−‖hij‖v}, with ‖·‖ the Euclidean norm, resulting in a valid cross-correlation function

ρij(h; hij) = Mij(−‖h‖, −‖hij‖), (2.15)
with Mij(·, ·) a bivariate moment generating function. Following the specification presented in
Fonseca and Steel (2011), for each i, j define Uij = X0,ij + X1,ij and Vij = X0,ij + X2,ij with
Xl,ij nonnegative random variables with moment generating functions Ml,ij , l = 0, 1, 2. Then, the
correlation function implied by (2.14) is
ρij(h; hij) = M0,ij(−‖h‖ − ‖hij‖)M1,ij(−‖h‖)M2,ij(−‖hij‖). (2.16)
This general class allows for different parametric representations for each component, as we may vary the specifications for M0,ij , M1,ij and M2,ij . Observe that if Uij and Vij are uncorrelated, that is, Uij = X1,ij and Vij = X2,ij , and the correlation in space is the same for all components, M1,ij(·) = M1(·), ∀ i, j, then ρij(h; hij) = M1(−‖h‖)M2,ij(−‖hij‖), which is separable.
A valid covariance structure can be built by considering co-located covariance coefficients σij , i, j = 1, . . . , p, and Cij(h, hij) = σij ρij(h; hij), i, j = 1, . . . , p, since if A = {Aij}_{i,j=1}^p and B = {Bij}_{i,j=1}^p are nonnegative definite matrices, then so is their element-wise (Schur) product A ◦ B = {Aij Bij}_{i,j=1}^p. Thus, the general valid covariance function is given by

Cij(h; hij) = σij ρij(h; hij) = σij M0,ij(−‖h‖ − ‖hij‖) M1,ij(−‖h‖) M2,ij(−‖hij‖), (2.17)

with h ∈ ℝ^d, hij ∈ ℝ^k and co-located covariance coefficients σij , i, j = 1, . . . , p, where σii ∈ ℝ+ and σij ∈ ℝ, i ≠ j.
The fundamental step in the definition of this class of functions lies on the representation of
the dependence between U and V for each (i, j). In particular, to generate flexible classes of cross-
covariance functions we need only to specify the moment generating functions M0,ij , M1,ij and
M2,ij in (2.17).
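The role of the moment generating functions can be checked by simulation. The sketch below (ours) verifies, by Monte Carlo, that gamma mixing reproduces the Cauchy-type factor (1 + t/φ)^{−α} used in the next subsection, with t playing the role of ‖h‖ (or ‖hij‖):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, phi = 1.5, 2.0   # shape and rate of the gamma mixing distribution
t = 0.8                 # plays the role of ||h|| (or ||h_ij||)

# Monte Carlo estimate of the moment generating function evaluated at -t
x = rng.gamma(shape=alpha, scale=1.0 / phi, size=1_000_000)
mc = np.mean(np.exp(-t * x))

# Closed form: M(-t) = (1 + t/phi)^(-alpha), the factor appearing in (2.18)
closed = (1.0 + t / phi) ** (-alpha)
print(mc, closed)       # the two values agree to roughly three decimal places
assert abs(mc - closed) < 2e-3
```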
2.3.1 Flexible classes
Let M0,ij , M1,ij and M2,ij be moment generating functions of gamma distributions. The resulting covariance function is in the Cauchy family, which allows for long range dependence and easy computation of the Hurst effect (Gneiting and Schlather, 2004). Let Xl,ij ∼ Gamma(αl, φij), ∀ i, j = 1, . . . , p, l = 0, 1, 2, h ∈ ℝ^d and hij ∈ ℝ^k. Then, from (2.17), the proposed cross-covariance function is

Cij(h; hij) = σij (1 + (‖h‖ + ‖hij‖)/φij)^{−α0} (1 + ‖h‖/φij)^{−α1} (1 + ‖hij‖/φij)^{−α2}, (2.18)

with σii ∈ ℝ+, i = 1, . . . , p, σij ∈ ℝ, i ≠ j, i, j = 1, . . . , p, αl > 0, l = 0, 1, 2, and φij > 0, i, j = 1, . . . , p. In order to avoid redundancy, as in Cressie and Huang (1999) and Fonseca and Steel (2011), we fix αi = 1 for i = 1, 2.
Thus the proposed flexible model is given by

Cij(‖h‖, δij) = σij (1 + δij + ‖h‖/φij)^{−α0} (1 + ‖h‖/φij)^{−1} (1 + δij)^{−1}, i, j = 1, . . . , p, (2.19)

where we adopt the reparametrization

φij = φ11 + dij , i = 1, . . . , p, j = 2, . . . , p,

and δij = ‖hij‖/φij represents a latent distance between components i and j, σij is the co-located covariance between components, α0 ≥ 0 is a smoothness parameter and the φij 's are spatial range parameters. Observe that φij > 0 requires dij > −φ11.
Notice that the model is conveniently specified in such a way that if dij = 0, for i = 1, . . . , p, j = 2, . . . , p, and α0 = 0, the separable model of Mardia and Goodall (1993) is obtained.
The general class is flexible enough to generate a nonseparable covariance structure and allows for different spatial ranges associated with each component, which may be obtained through the specification of p ranges φii, i = 1, . . . , p, and p(p − 1)/2 cross ranges φij = φji, i ≠ j. In the conditional approach presented in Cressie and Zammit-Mangion (2016), it is necessary to specify a univariate covariance function for one component and p − 1 conditional covariance functions; the cross-covariance structure involving the remaining p − 1 components is constructed from an interaction function.
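The reduction to the separable model can be verified numerically. The Python sketch below (with illustrative parameter values of ours, not estimates from the thesis) builds the stacked covariance implied by (2.19) and checks that, for dij = 0 and α0 = 0, it equals the Kronecker structure of Mardia and Goodall (1993):

```python
import numpy as np

def cross_cov(h, delta_ij, sigma_ij, alpha0, phi_ij):
    # Proposed cross-covariance (2.19)
    return (sigma_ij * (1 + delta_ij + h / phi_ij) ** (-alpha0)
            * (1 + h / phi_ij) ** (-1) / (1 + delta_ij))

def build_sigma(H, sigma, delta, alpha0, phi):
    n, p = H.shape[0], sigma.shape[0]
    S = np.zeros((n * p, n * p))
    for i in range(p):
        for j in range(p):
            S[i * n:(i + 1) * n, j * n:(j + 1) * n] = cross_cov(
                H, delta[i, j], sigma[i, j], alpha0, phi[i, j])
    return S

rng = np.random.default_rng(7)
n = 12
s = rng.uniform(size=(n, 2))
H = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)

# Illustrative parameter values (ours)
sigma = np.array([[1.0, 0.6], [0.6, 1.5]])   # co-located covariances sigma_ij
delta = np.array([[0.0, 1.5], [1.5, 0.0]])   # latent distances, delta_ii = 0
phi11 = 0.1

# Separable special case: d_ij = 0 and alpha0 = 0
phi = np.full((2, 2), phi11)
S = build_sigma(H, sigma, delta, alpha0=0.0, phi=phi)
B = sigma / (1 + delta)                      # co-located covariance shrunk by latent distance
R = 1.0 / (1 + H / phi11)                    # Cauchy-type spatial correlation
assert np.allclose(S, np.kron(B, R))
print(np.min(np.linalg.eigvalsh(S)) > 0)     # True: positive definite
```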
Other choices of moment-generating functions may be considered, as suggested in Fonseca and Steel (2011). For instance, if X1,ij follows an inverse gamma distribution ∀ i, j, then the resulting factor M1,ij would yield the Matern covariance function. For example, let Xl,ij ∼ Gamma(αl, φij), ∀ i, j = 1, . . . , p, l = 0, 2, X1,ij ∼ InvGamma(ν, 1/φij), ∀ i, j = 1, . . . , p, h ∈ ℝ^d and hij ∈ ℝ^k; then the cross-covariance function is

Cij(h; hij) = σij (1 + (‖h‖ + ‖hij‖)/φij)^{−α0} [2/Γ(ν)] (‖h‖/φij)^{ν/2} Kν(2(‖h‖/φij)^{1/2}) (1 + ‖hij‖/φij)^{−α2}, (2.20)

with Kν(·) the modified Bessel function of the third kind of order ν, σii ∈ ℝ+, i = 1, . . . , p, σij ∈ ℝ, i ≠ j, i, j = 1, . . . , p, ν > 0, α0, α2 > 0, and φij > 0, i, j = 1, . . . , p. If ν = 1/2, the spatial component has an exponential covariance function.
In what follows, we restrict our attention to (2.19).
2.4 Bayesian inference
Let y = (yt(s1), . . . , yt(sn)) be a matrix of multivariate data observed at spatial locations s1, . . . , sn ∈ D with replicates t ∈ Z+, where yt(si) = (y1t(si), . . . , ypt(si))′, t = 1, . . . , T , is a p-dimensional vector. In what follows, replicates are considered in order to ensure identifiability of model parameters and latent distances. Under the Gaussian assumption, the likelihood function with T independent replicates for the unknown parameters, based on n spatial locations, is given by

l(y; Θ) = (2π)^{−npT/2} |Σ|^{−T/2} exp{−(1/2) Σ_{t=1}^T (yt − µ)′ Σ^{−1} (yt − µ)}, (2.21)
with yt the vectorized version of (yt(s1), . . . ,yt(sn)) with np observations; µ = Xβ the mean
vector with block diagonal design matrix X, where each block represents the predictor for each
component, and a parameter vector β; Σ the covariance matrix with dimension np × np. The
covariance matrix is described by a parametric vector Ψ defined by equation (2.19). In particular, for our model specification, Ψ = (σ, δ, α0, φ11, d) and the parametric vector is Θ = (Ψ, β),
with σ the vector of co-located covariance σij , i, j = 1, . . . , p, δ the vector of latent distances
δij , i 6= j, i, j = 1, . . . , p, α0 the smoothness parameter, φ11 the range parameter of the
first component, d = (d12, d13, . . . , d1p, . . . , dpp) the vector of dissimilarities among ranges and
β = (β11, . . . , β1q1 , β21, . . . , β2q2 , . . . , βp1, . . . , βpqp), with qi, i = 1, . . . , p, the number of elements of
the predictor of each component. From now on, we consider qi = q, i = 1, . . . , p.
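For numerical stability, the logarithm of (2.21) is typically evaluated through a Cholesky factorization of Σ rather than an explicit inverse. A sketch (ours, independent of the particular covariance model) with a sanity check against direct evaluation:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_likelihood(Y, mu, Sigma):
    """Gaussian log-likelihood (2.21) for T independent replicates.

    Y: (T, np) array of vectorized replicates; mu: (np,) mean; Sigma: (np, np).
    """
    T, npdim = Y.shape
    c, low = cho_factor(Sigma, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))          # log |Sigma| from Cholesky
    resid = Y - mu
    quad = np.sum(resid * cho_solve((c, low), resid.T).T)  # sum_t r_t' Sigma^{-1} r_t
    return -0.5 * (npdim * T * np.log(2 * np.pi) + T * logdet + quad)

# Sanity check on simulated data
rng = np.random.default_rng(3)
npdim, T = 6, 4
A = rng.normal(size=(npdim, npdim))
Sigma = A @ A.T + npdim * np.eye(npdim)
mu = rng.normal(size=npdim)
Y = rng.multivariate_normal(mu, Sigma, size=T)

direct = sum(-0.5 * (npdim * np.log(2 * np.pi)
                     + np.linalg.slogdet(Sigma)[1]
                     + (y - mu) @ np.linalg.inv(Sigma) @ (y - mu)) for y in Y)
assert np.isclose(log_likelihood(Y, mu, Sigma), direct)
```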
Gneiting et al. (2010) introduced a class of Matern cross-covariance functions defined by Cij(h) = σij M(h | νij , aij) = ρij σi σj M(h | νij , aij), i, j = 1, . . . , p, with co-located correlation coefficient ρij , co-located variance parameters σi and σj , smoothness parameter νij and scale parameter aij . The authors discuss the values of νij , aij and ρij that result in a valid bivariate structure. Apanasovich et al. (2012) use the same parameterization as Gneiting et al. (2010) to give a general characterization of the parameters that yield a valid multivariate Matern model for any number of components. This parameterization can also be seen in Daley et al. (2014). To simplify estimation, we set σij = σi σj , i, j = 1, . . . , p, with σi ∈ ℝ, i = 1, . . . , p. Note that the sign of ρij is absorbed into the components σi or σj , and we can still reconstruct the co-located variances and covariances. Thus we estimate the vector σ = (σ1, . . . , σp).
To complete the Bayesian model specification, prior independence is assumed for the parameters in the proposed model (2.19), with σi ∼ Normal(ci, c∗i ), i = 1, . . . , p, δij ∼ Gamma(fij , f∗ij), i < j, i = 1, . . . , p, j = 2, . . . , p, φ11 ∼ Gamma(u × med(δs), u), with med(δs) denoting the median of the distances among observed locations, and β ∼ MVN(b, B), pq-dimensional (MVN: multivariate normal distribution). A continuous prior specification would assign zero probability to α0 = 0 and d = (d12, d13, . . . , d1p, . . . , dpp) = 0. As an alternative, we consider the following joint mixture representation for ∆ = (α0, d):

π(∆ | φ11) = p0 D0 + (1 − p0) g(∆ | φ11), (2.22)

with D0 the Dirac measure at α0 = 0 and dij = 0, i = 1, . . . , p, j = 2, . . . , p, and g(∆ | φ11) a continuous joint distribution for α0 > 0 and dij > −φ11, i = 1, . . . , p, j = 2, . . . , p. Thus, p0 is the prior probability of a separable covariance function and p0 = 0 implies a completely continuous prior π(∆ | φ11) = g(∆ | φ11). In this sense, assuming independence, that is, g(∆ | φ11) = g1(α0) × ∏_{i=1}^p ∏_{j=2}^p g2(dij | φ11), it is necessary to specify the prior distributions of α0 and dij . We then consider g1(α0) ≡ Gamma(r, r∗), for α0 > 0, and g2(dij | φ11) ≡ TruncatedNormal(v, v∗), for dij ∈ (−φ11, ∞), i = 1, . . . , p, j = 2, . . . , p.
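Sampling from the mixture prior (2.22) is direct. The sketch below (hyperparameter values are ours, and the truncation is handled by simple rejection) draws the point mass with probability p0 and from g otherwise:

```python
import numpy as np

def sample_mixture_prior(rng, p0, phi11, r, r_star, v, v_star, n_d):
    """Draw (alpha0, d) from the mixture prior (2.22).

    With probability p0, return the point mass alpha0 = 0, d = 0 (separable model);
    otherwise alpha0 ~ Gamma(r, r_star) and each d_ij from a Normal(v, v_star)
    truncated to (-phi11, inf). Hyperparameter names are ours.
    """
    if rng.uniform() < p0:
        return 0.0, np.zeros(n_d)
    alpha0 = rng.gamma(shape=r, scale=1.0 / r_star)
    d = np.empty(n_d)
    for k in range(n_d):
        draw = rng.normal(v, np.sqrt(v_star))
        while draw <= -phi11:                 # rejection step for the truncation
            draw = rng.normal(v, np.sqrt(v_star))
        d[k] = draw
    return alpha0, d

rng = np.random.default_rng(10)
draws = [sample_mixture_prior(rng, p0=0.5, phi11=0.1, r=1.0, r_star=1.0,
                              v=0.0, v_star=0.04, n_d=3) for _ in range(2000)]
sep = np.mean([a == 0.0 for a, _ in draws])
print(round(sep, 2))   # close to the prior probability p0 = 0.5
```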
Inference is based on stochastic simulations from the complete conditional distributions for
sets of parameters. In particular Metropolis-Hastings steps are considered in the Gibbs sampler
algorithm as detailed in Gamerman and Lopes (2006).
2.4.1 Prediction
One of the main goals in spatial data analysis is to obtain prediction for unobserved locations
or for missing data within the observed data. Let yu be the observation vector at unmeasured
locations su ∈ D ⊆ ℝ^d. The prediction of yu is based on the predictive distribution p(yu | yo), with yo denoting the vector of observed data. Thus,

p(yu | yo) = ∫ p(yu | yo, Θ) p(Θ | yo) dΘ. (2.23)

From the Gaussian assumption, the distribution p(yu | yo, Θ) is also Gaussian, with parameters µ∗ = µu + ΣuoΣoo^{−1}(yo − µo) and Σ∗ = Σuu − ΣuoΣoo^{−1}Σou. Assume that Θ(1), . . . , Θ(M) is a sample from the posterior distribution p(Θ | yo) obtained by MCMC sampling. Then the predictive distribution in (2.23) may be approximated by

p(yu | yo) ≈ (1/M) Σ_{i=1}^M p(yu | yo, Θ(i)). (2.24)
Composition sampling is considered to obtain samples from this predictive distribution within the Metropolis-Hastings algorithm (Banerjee et al., 2015, p. 126).
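Composition sampling proceeds by drawing yu from the conditional Gaussian for each posterior draw. The toy sketch below (ours) uses stand-in "posterior draws" of a single range parameter of an exponential covariance rather than actual MCMC output:

```python
import numpy as np

def conditional_gaussian(mu, Sigma, obs_idx, unobs_idx, y_obs):
    # Parameters mu* and Sigma* of p(y_u | y_o, Theta), as in Section 2.4.1
    Soo = Sigma[np.ix_(obs_idx, obs_idx)]
    Suo = Sigma[np.ix_(unobs_idx, obs_idx)]
    Suu = Sigma[np.ix_(unobs_idx, unobs_idx)]
    w = np.linalg.solve(Soo, y_obs - mu[obs_idx])
    mu_star = mu[unobs_idx] + Suo @ w
    Sigma_star = Suu - Suo @ np.linalg.solve(Soo, Suo.T)
    return mu_star, Sigma_star

rng = np.random.default_rng(5)
s = rng.uniform(size=(8, 2))
D = np.linalg.norm(s[:, None, :] - s[None, :, :], axis=-1)
obs, unobs = np.arange(6), np.array([6, 7])
y_obs = rng.normal(size=6)
mu = np.zeros(8)

samples = []
for phi in rng.gamma(2.0, 0.1, size=200):    # stand-in posterior draws of a range phi
    Sigma = np.exp(-D / phi)                 # exponential covariance given the draw
    m, S = conditional_gaussian(mu, Sigma, obs, unobs, y_obs)
    samples.append(rng.multivariate_normal(m, S))
samples = np.asarray(samples)
print(samples.shape)  # (200, 2)
```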
2.5 Bayesian hypotheses testing for separability
The separable model is a special case of (2.19) when the spatial ranges are equal, φij = φji = φ, ∀ i, j = 1, . . . , p, and α0 = 0. From a frequentist point of view, several authors have presented formal methods to test separability in spatiotemporal and multivariate space-time models (Mitchell et al., 2005, 2006; Fuentes, 2006; Li et al., 2007, 2008). The test proposed in this work aims to measure the degree of separability between space and components, and we follow the Bayesian paradigm for hypothesis testing.
Consider the prior distribution for ∆ = (α0, d) as in (2.22). The resulting posterior distribution in this specification is also a mixture,

π(∆ | y, φ11) = p0 D0 + (1 − p0) g(∆ | y, φ11), (2.25)

with p0 now denoting the posterior probability of a separable covariance function given the data. The posterior probability p0 might be used to select a model or to predict new observations based on model averaging across both models (Hoeting et al., 1999). In what follows, the main interest is to
develop a Bayesian test of separability.
Consider the null hypothesis H0: ∆ ∈ Ω0 and the alternative hypothesis H1: ∆ ∈ Ω1, where Ω0 ∪ Ω1 = Ω. In particular, consider Ω0 = {0} and losses w0 if H0 is rejected when the process is separable and w1 if H0 is not rejected when the process is nonseparable. Correct decisions are associated with null losses. The general idea is to choose the action which leads to the minimum posterior expected loss (DeGroot and Schervish, 2011). Such a test procedure rejects H0 when p0 = P(H0 | y) ≤ w1/(w0 + w1).
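The decision rule is straightforward to implement; in the sketch below (ours), the posterior probability p0 = 0.251 is borrowed from Table 2.2 purely for illustration:

```python
def separability_decision(p0, w0, w1):
    """Bayes decision for H0 (separable) vs H1 (nonseparable).

    Rejects H0 when the posterior probability p0 = P(H0 | y) does not exceed
    the loss ratio w1 / (w0 + w1); w0 penalizes falsely rejecting H0, and w1
    penalizes failing to reject it when the process is nonseparable.
    """
    return "reject H0" if p0 <= w1 / (w0 + w1) else "do not reject H0"

# Under equal losses the cutoff is 0.5
print(separability_decision(p0=0.251, w0=1.0, w1=1.0))   # reject H0
# With w0 = 19, w1 = 1 (strong protection of H0) the cutoff drops to 0.05
print(separability_decision(p0=0.251, w0=19.0, w1=1.0))  # do not reject H0
```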
It is common to use Bayes factors (BF) to compare a point hypothesis with a continuous alternative. The Bayes factor is the ratio of the posterior odds against H0 to the prior odds against H0, that is,

BF = [(1 − P(H0 | y))/P(H0 | y)] / [(1 − P(H0))/P(H0)], (2.26)

with P(H0) the prior probability of separability specified in (2.22) and P(H0 | y) = p0 its posterior counterpart. Observe that (2.26) is simply the posterior odds against H0 when P(H0) = 0.5. Considering this situation, we can reconstruct the interpretation table of the BF given in Kass and Raftery (1995) based on the posterior probability of separability p0 and the losses w0 and w1. Table 2.1 presents the interpretation of the Bayesian test for separability proposed in this subsection. Detailed information about the BF is given in Kass and Raftery (1995); more details about Bayesian hypothesis testing can be found in Robert (1994) and Schervish (1995).
BF         p0             (w0; w1)        Evidence against H0 (against separability)
1 to 3     0.50 to 0.25   (1 to 3; 1)     Not worth more than a bare mention
3 to 20    0.25 to 0.05   (3 to 20; 1)    Substantial nonseparability
20 to 150  0.05 to 0.01   (20 to 150; 1)  Strong nonseparability
> 150      < 0.01         (> 150; 1)      Very strong nonseparability

Table 2.1: Interpretation table for the Bayesian separability test. BF: Bayes factor; p0: posterior probability of separability; w0: loss associated with the decision of rejecting H0 when H0 is true; w1: loss associated with the decision of not rejecting H0 when H1 is true.
2.6 Simulated examples
2.6.1 Example 1: computing posterior probabilities of separability
In practice, repeated observations are often available at each location, and in this chapter we assume independent replications. A single replication of the data is insufficient to make inferences about the parameters of the covariance function, especially the latent parameters. In this context, we simulate scenarios ranging from separable to very nonseparable structures, considering the covariance function in (2.19).
We generate three datasets with p = 2 components, n = 80 spatial locations and T = 10
independent replicates, specifying different combinations of parameters responsible for separability.
We consider a Gaussian process, so yt ∼ MVN(Xβ,Σ) np-dimensional, t = 1, . . . , T , where Σ
is a np × np covariance matrix and X is a p-block diagonal design matrix, with identical blocks
containing an intercept term and independent variables (latitude, longitude and altitude, randomly
generated in the cube [0,1] × [0,1] × [0,1]). For each dataset, we consider three scenarios for
α0, which are α0 = 0, α0 = 0.1 and α0 = 0.2. The specification of the other parameters is
the same for all datasets and scenarios and is given by Θ∗ = (β, δ12, σ1, σ2) with β = (β1,β2),
β1 = (1,−0.2,−0.8, 0.5), β2 = (1.5, 0.6,−0.5,−0.8), δ12 = 1.5, σ1 = 1 and σ2 = 1.5. The structure
for the spatial ranges is constructed as follows:

Dataset 1 (equal ranges): considers φ11 = φ22 = φ12 = 0.1 (d12 = d22 = 0);

Dataset 2 (similar ranges): assumes the spatial range of the second variable is somewhat different, i.e., φ11 = φ12 = 0.1 and φ22 = 0.13 (d12 = 0 and d22 = 0.03);

Dataset 3 (different ranges): considers φ11 = φ12 = 0.1 and φ22 = 0.2 (d12 = 0 and d22 = 0.1).
The prior distributions for all scenarios follow the discussion in Section 2.4 and the hyperpa-
rameters are described in Appendix A.2.
Table 2.2 presents the posterior probabilities of separability p0 for each scenario. Note that
the model that considers equal ranges and α0 = 0 results in estimated p0 close to 1. In the
same dataset, assuming α0 = 0.2, the posterior probability of separability indicates substantial
nonseparability, following Table 2.1. In dataset 2, even assuming α0 = 0 there are indications that
the structure is nonseparable, that is, even if the ranges are only slightly different from one another,
the nonseparable specification is preferred. By increasing the value of α0, the nonseparability
hypothesis becomes evident. In dataset 3, where we assume that one of the ranges is completely
different from the other two, there is always evidence of substantial or strong nonseparability,
independently of α0.
Dataset                       α0 = 0   α0 = 0.10   α0 = 0.20
Dataset 1 (equal ranges)      0.987    0.251       0.146
Dataset 2 (similar ranges)    0.259    0.115       0.059
Dataset 3 (different ranges)  0.058    0.000       0.000
Table 2.2: Posterior probabilities of separability p0 for each dataset and scenario of α0.
The posterior probability of separability is an easily interpretable measure for inference on separability. Apanasovich and Genton (2010) propose models that use the covariance functions presented in Gneiting (2002). They also estimate a separability parameter defined on the interval [0, 1], where 0 indicates separability and 1 should indicate strong nonseparability. However, Fonseca and Steel (2017) verified that this parameter is not able to correctly capture the degree of separability in the data structure, showing that the parameter being at its upper limit does not imply strong nonseparability. Also, from simulated exercises with model (2.19), we find that other parameters, such as the spatial ranges, influence separability. Thus, the computation of the probability of separability is an attractive alternative which does not depend on the scale of the ranges or on smoothness.
2.6.2 Example 2: model discrimination
This section presents a simulated example for three different scenarios, in order to compare
different models analyzing their predictive performance and posterior probability of separability.
We also aim at verifying the ability of the nonseparable model, presented in equation (2.19), to
recover simpler structures.
Three datasets were generated with p = 3 components, T = 20 independent replications and n = 55 spatial locations in the [0, 1] × [0, 1] square. Three spatial locations were held out for prediction. The datasets are generated using the following specifications:
dataset 1 (separable): Simulation based on the separable model with Cauchy correlation function, specified in Appendix A.1. The parameters aij were defined so as to yield high correlation between components, which may be expressed through a∗ij = aij/√(aii ajj). The parameter specification is Θ = (β, a11, a22, a33, a∗12, a∗13, a∗23, φ), with β = (β1, β2, β3), β1 = (1, −0.2, −0.8, 0.5), β2 = (1.5, 0.6, −0.5, −0.8), β3 = (1.8, −0.4, −0.3, 0.6), a11 = 0.8, a22 = 0.7, a33 = 0.5, a∗12 = −0.8, a∗13 = 0.9, a∗23 = −0.7 and φ = 0.1;
dataset 2 (nonseparable with equal ranges): Simulation based on the nonseparable covariance
function presented in (2.19) with φij = φji = φ, i, j = 1, . . . , p, that is, dij = 0, i = 1, . . . , p,
j = 2, . . . , p. The parametric vector is Θ = (β, σ1, σ2, σ3, δ12, δ13, δ23, α0, φ). We define α0 = 0.25.
The δ parameters were chosen such that the variables present weak/moderate correlation. Thus,
we consider the following parameter specification: σ1 = 2, σ2 = 1, σ3 = 2.5, δ12 = 2, δ13 = 2.2,
δ23 = 1.9 and φ = 0.1. The β = (β1,β2,β3) parameters are the same defined in dataset 1.
dataset 3 (nonseparable): Simulation based on the nonseparable covariance function presented in (2.19) with dij ≠ 0, i = 1, . . . , p, j = 2, . . . , p. The parametric vector is Θ = (β, σ1, σ2, σ3, δ12, δ13, δ23, α0, φ11, d12, d13, d22, d23, d33), with β = (β1, β2, β3), β1 = (11.2, −0.07, 0.25, −0.01), β2 = (−6.8, 1.3, −0.25, 0.02), β3 = (5.6, 0.1, 0.13, −0.01), σ1 = −2.4, σ2 = 8.7, σ3 = −1.4, δ12 = 0.6, δ13 = 1.7, δ23 = 1.2, α0 = 0, φ11 = 0.07, d12 = 0.15, d13 = 0.13, d22 = 0.18, d23 = 0.23 and d33 = −0.01.
In order to illustrate the correlations between the variables in each dataset, we estimate the parameters of the model y = µ + ε for each spatial location i, i = 1, . . . , 55, that is,

y = (y1, y2, y3)′ ∼ MVN(µ, C), (2.27)

with C an arbitrary full cross-covariance matrix.
Figure 2.1 presents the posterior median and 95% credibility interval of the correlations between
variables for each dataset.
Figure 2.1: Posterior median and 95% CI of the cross-correlations among variables for each dataset
considering an independent multivariate normal distribution for each spatial location.
We estimate four multivariate models for each dataset and their performances are compared in
predictive terms. The models considered are described as follows:
1. SEP: the separable model as presented in Mardia and Goodall (1993) with covariance function
in the Cauchy family, as in Appendix A.1;
2. NSEP-φ: the nonseparable model with equal ranges and covariance function as defined in
(2.19), that is, dij = 0, i = 1, . . . , p, j = 2, . . . , p, and a continuous prior for α0;
3. NSEP: the nonseparable model with different ranges as defined in (2.19), with prior for α0
and dij , i = 1, . . . , p, j = 2, . . . , p, as in (2.22) with p0 = 0;
4. Mix-NSEP: the nonseparable model with different ranges as defined in (2.19), with a mixture prior for α0 and dij , i = 1, . . . , p, j = 2, . . . , p, as in (2.22).
Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and convergence was monitored with the algorithms in the coda package for R (Plummer et al., 2006).
Table 2.3 presents predictive measures for model comparison for each dataset. The IS (Interval Score), WCI (Width of Credibility Interval) and LPS (Log Predictive Score) are predictive scoring rules that summarize the accuracy of probabilistic predictions. The smaller the IS, WCI or LPS, the better the model in predictive terms. For more details about scoring rules see Gneiting and Raftery (2007) and Appendix A.3.
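For concreteness, the interval score for a central (1 − α) × 100% predictive interval [l, u] and realized value y is IS(l, u; y) = (u − l) + (2/α)(l − y)1{y < l} + (2/α)(y − u)1{y > u}. A minimal sketch of this standard form (the function name is ours, and Python is used here purely for illustration, outside the R-based analysis of the thesis):

```python
import numpy as np

def interval_score(lower, upper, y, alpha=0.05):
    """Interval score for a central (1 - alpha) predictive interval
    [lower, upper] and realization y (Gneiting and Raftery, 2007).
    Narrow intervals are rewarded; misses are penalized by 2/alpha."""
    width = upper - lower
    penalty_below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    penalty_above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + penalty_below + penalty_above
```

Averaging `interval_score` over the hold-out observations gives an "average IS" of the kind reported in Table 2.3, while averaging `width` alone gives the "average WCI".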
In the first scenario (dataset 1), the data were simulated assuming a separable structure, and indeed the hypothesis test for separability obtained from the most complex model (Mix-NSEP), with p0 = 0.814, indicates separability for these data. Figure 2.2 shows that the most complex model is able to recover the simplest separable structure. The predictive measures are similar for all models, so the simpler model is preferable (Occam's razor).
In the second scenario (dataset 2), the data are nonseparable but the ranges are all equal. The simulated data represent an example of moderate nonseparability, and separable models are expected to mimic this covariance structure reasonably well. The hypothesis of separability may be tested under the Mix-NSEP model, and the test indicates a posterior probability of separability of p0 = 0.311; that is, nonseparability could be assumed but is not substantially advantageous over separability for these data. The nonseparable model with equal ranges (NSEP-φ) is selected as the best model according to the predictive discrepancy measures.
In the third scenario (dataset 3), the data are nonseparable and the spatial ranges are all different. This represents an example of strong nonseparability, which could not be reproduced by separable covariance structures. The hypothesis test under the model with different ranges (Mix-NSEP) correctly indicates strong nonseparability for these data, with p0 = 0. The models with equal ranges (SEP and NSEP-φ) have poor predictive performance when compared to the models with unequal spatial ranges.
[Figure 2.2 here: posterior covariance functions against distance in space for panels (a) Cov11, (b) Cov22, (c) Cov33, (d) Cov12, (e) Cov13 and (f) Cov23.]
Figure 2.2: Posterior median of covariance function (gray full line) and 95% credible interval (gray
dashed lines) of Mix-NSEP model. Black full line: true covariance function (SEP model).
data                 model     average WCI  average IS  LPS     p0
Dataset 1            SEP       6.70         7.53        70.19   –
(separable)          NSEP-φ    6.58         7.54        70.29   –
                     NSEP      6.61         7.58        70.09   –
                     Mix-NSEP  6.67         7.48        70.12   0.814
Dataset 2            SEP       17.16        24.12       309.96  –
(nonseparable:       NSEP-φ    17.10        24.45       309.25  –
equal ranges)        NSEP      17.07        24.51       309.38  –
                     Mix-NSEP  17.21        24.24       309.92  0.311
Dataset 3            SEP       32.04        33.80       373.81  –
(nonseparable:       NSEP-φ    31.93        33.99       373.72  –
different ranges)    NSEP      29.75        31.69       369.65  –
                     Mix-NSEP  29.73        31.41       369.67  0.000
Table 2.3: Predictive measures for model comparison and posterior probability of separability for
the simulation examples. IS=Interval Score, WCI=Width of Credibility Interval and LPS=Log
Predictive Score.
2.7 Ceara weather dataset
In this section we apply the model defined in (2.19) to an illustrative dataset obtained from a collection of monitoring stations in Ceara state, Brazil. The weather dataset was obtained from Instituto Nacional de Pesquisas Espaciais (INPE) and consists of three variables, temperature (°C), humidity (%) and solar radiation (MJ/m2), measured daily at 12 o'clock and recorded at 24 stations from December 20, 2010 to February 28, 2011. Locations with less than 10% missing values went through an imputation process1. In addition, we work with the seasonally adjusted series to obtain T = 71 independent replicates. For predictive comparison and validation, we hold out two spatial locations. Figure 2.3 shows the locations of these 24 monitoring sites and the two hold-out sites on a latitude-longitude scale.
1The imputation was performed using the mice package in R.
[Figure 2.3 here: map of the monitoring stations on a latitude-longitude scale, with the two hold-out sites marked 1 and 2.]
Figure 2.3: Collection of monitoring stations in Ceara state, Brazil. Numbers 1 and 2 are spatial
locations considered for predictive comparison.
In order to evaluate the correlations between each pair of variables, we estimate the parameters of a model as in (2.27). From Figure 2.4, note that there is strong correlation among the three variables, so we expect the component distances between them to be small.
[Figure 2.4 here: posterior correlations against spatial location for (a) temperature vs. humidity, (b) temperature vs. solar radiation and (c) humidity vs. solar radiation.]
Figure 2.4: Posterior median and 95% CI of the cross-correlations among components considering
an independent multivariate normal distribution for each spatial location.
We compared the predictive performance of the following models: the separable model (SEP) with covariance function in the Cauchy family (see Appendix A.1); the nonseparable model with equal spatial ranges (NSEP-φ); the nonseparable model with different spatial ranges (NSEP); the nonseparable model with different spatial ranges and mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3, defined in (2.22) (Mix-NSEP); and the linear model of coregionalization (LMC) with covariance function for each component in the Cauchy family (see Appendix A.1). Parameter estimation was performed considering the likelihood described in (2.21).
The prior distributions for the parameters in the proposed model follow the discussion in Section 2.4, and their hyperparameters are presented in Appendix A.2. MCMC methods were used to generate posterior and predictive samples, and convergence was monitored with the algorithms in the coda package for R.
Separable models are limited in that they force the covariance structures of all variables to be proportional in space. To assess this, we evaluated the behavior of the posterior spatial correlation of each variable (temperature, humidity and solar radiation) under a univariate spatial model, whose covariance function is defined by C(‖h‖) = σ²ρ(‖h‖), with σ² the constant variance, ρ(·) a valid spatial correlation function and ‖h‖ the Euclidean distance. We consider a spatial correlation function in the Cauchy family. These models were used only for this evaluation and are presented in Appendix A.1 (for more details see Banerjee et al., 2015); the prior distributions are presented in Appendix A.2. Figure 2.5 shows the posterior spatial correlation for each component. Note that the humidity variable behaves differently from the other components.
[Figure 2.5 here: posterior spatial correlation against distance in space for temperature, humidity and solar radiation.]
Figure 2.5: Posterior median (full line) and 95% credible interval (dashed lines) of spatial correlation
of the univariate spatial models for each component.
Therefore, models that allow different structures for different components are expected to have better predictive performance than models that assume proportionality in space between the variables' covariances. The Mix-NSEP and NSEP models estimate different spatial ranges for each component, as well as the ranges of the cross-dependence structures. Note, in Figure 2.6, the difference among the posterior densities of the ranges in the Mix-NSEP model compared to the single range estimated in the SEP model. This flexibility gives the Mix-NSEP model the best predictive results among all models, including the LMC (see Table 2.4).
[Figure 2.6 here: posterior densities of the spatial ranges φ for temperature, humidity, solar radiation and their pairwise cross terms, under the SEP and Mix-NSEP models.]
Figure 2.6: Posterior densities of the spatial ranges of SEP and Mix-NSEP models.
Model average WCI average IS LPS p0
SEP 46.65 52.97 898.88 –
NSEP-φ 46.77 53.27 898.89 –
NSEP 42.95 48.61 891.23 –
Mix-NSEP 43.04 48.53 889.77 0.000
LMC 44.75 50.32 894.75 –
Table 2.4: Comparison of models in predictive terms for the Ceara weather dataset. IS=Interval
Score, WCI=Width of Credibility Interval and LPS=Log Predictive Score.
2.8 Discussion
This chapter extends the class of nonseparable covariance functions proposed in Fonseca and
Steel (2011) to the modeling of component and spatial dependence, considering latent distance
between components as in Apanasovich and Genton (2010). We have proposed a Bayesian test to
measure the degree of separability between space and components. The posterior probability p0
has shown to be an easily interpretative measure in terms of separability of the component-spatial
covariance structures.
From the simulated and meteorological data examples, it is clear that flexible structures are
needed, which are able to accommodate the assumption of different spatial ranges in space for a
vector of spatial processes. The presented model was able to recover simple structures and presented
better predictive performance when applied to Ceara weather dataset than models widely used in
the literature, such as the separable model and the linear model of coregionalization.
Chapter 3
Likelihood computation for large data
3.1 Introduction
With the increase of high-resolution geocoded data, the big data problem has become crucial in the spatial and spatiotemporal setting. For instance, if Gaussianity is assumed, large covariance matrices need to be inverted in the inference procedure, and the computational effort is of cubic order in the number of locations. This limitation becomes even more important in the case of spatiotemporal or multivariate data: even low-dimensional vectors observed over space may lead to huge covariance matrices, making inference for the unknown parameters infeasible. Thus, a compromise between complexity and parsimony is called for in this context.
In this chapter, we work with the multivariate spatial covariance functions proposed in Chapter 2. In order to deal with the high computational effort, we approximate the full covariance matrix using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations are applied to the likelihood function in order to obtain fast parameter estimation while keeping the interpretation and flexibility of the multivariate nonseparable model.
3.2 Separable approximations
We have presented a nonseparable covariance model which results in a full matrix Σ which might
have high dimension and the computation of likelihoods in a Gaussian model require the inversion
and determinant computation of this matrix. We investigate the use of separable approximations
for the matrix Σ which will lead to fast computation of the likelihood function. The approximation
will be based on the work of Van Loan and Pitsianis (1993).
Genton (2007) investigates the use of singular value decompositions of a full matrix in the context of nonseparable spatiotemporal covariance matrices. That work considers a decomposition based on separable matrices, which allows for fast inversions and determinant computations: instead of np × np matrices, the approximation uses only n × n and p × p matrices. We consider the same separable approximation in order to compute likelihoods for the nonseparable multivariate spatial models presented in Section 2.3.1. The aim is to obtain matrices R ∈ ℝn×n and A ∈ ℝp×p such that the Frobenius norm1 ‖Σ − R ⊗ A‖F is minimized, for a given full covariance matrix Σ ∈ ℝnp×np. The author shows that the solution to this problem is given by the singular value decomposition of a permuted version of Σ.
The idea is to rearrange Σ into another matrix ℛ(Σ) ∈ ℝn²×p², such that the sum of squares in ‖Σ − R ⊗ A‖F equals the sum of squares in ‖ℛ(Σ) − vec(R)vec(A)′‖F . It is shown in Golub and Van Loan (1996) that ‖Σ − R ⊗ A‖F = ‖ℛ(Σ) − vec(R)vec(A)′‖F and ‖Σ‖F = ‖ℛ(Σ)‖F .
The problem then reduces to finding the nearest rank-one matrix to the rectangular matrix ℛ(Σ) ∈ ℝn²×p². The solution is based on the singular value decomposition of ℛ(Σ), where U′ℛ(Σ)V = diag(w1, . . . , wr), U ∈ ℝn²×n² and V ∈ ℝp²×p² are orthogonal matrices, w1 ≥ w2 ≥ . . . ≥ wr ≥ 0 and r = rank(ℛ(Σ)) = min{n², p²}. The solution can be found in Golub and Van Loan (1996) and is given by:

vec(R) = √w1 u1 ,    vec(A) = √w1 v1 , (3.1)

with u1 denoting the first column of U and v1 the first column of V.
In order to measure the quality of the approximation, Genton (2007) defines an approximation error, denoted by κΣ(R, A), as follows:

κΣ(R, A) = ‖Σ − R ⊗ A‖F / ‖Σ‖F . (3.2)

κΣ(R, A) varies between zero (if Σ is separable) and √(1 − 1/r), and is minimized by the R and A given above. A standardized error index, varying between zero and one, is given by:

κ∗Σ(R, A) = κΣ(R, A) / √(1 − 1/r) . (3.3)
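In code, the rearrangement-and-SVD solution (3.1) and the error index (3.3) can be sketched as follows (a minimal numpy version under the row-major vec convention; the function names are ours, not from Genton, 2007):

```python
import numpy as np

def nearest_kronecker(Sigma, n, p):
    """Best Frobenius-norm approximation Sigma ~ R (n x n) kron A (p x p),
    via the SVD of the Van Loan-Pitsianis rearrangement of Sigma."""
    # Row (i, j) of the n^2 x p^2 rearrangement is the vec of the (i, j) block
    rearranged = (Sigma.reshape(n, p, n, p)
                       .transpose(0, 2, 1, 3)
                       .reshape(n * n, p * p))
    U, w, Vt = np.linalg.svd(rearranged, full_matrices=False)
    R = np.sqrt(w[0]) * U[:, 0].reshape(n, n)
    A = np.sqrt(w[0]) * Vt[0, :].reshape(p, p)
    return R, A

def error_index(Sigma, R, A):
    """Standardized separability error index in [0, 1], cf. (3.3)."""
    r = min(R.size, A.size)  # min{n^2, p^2}
    kappa = np.linalg.norm(Sigma - np.kron(R, A)) / np.linalg.norm(Sigma)
    return kappa / np.sqrt(1.0 - 1.0 / r)
```

For a separable Σ = R0 ⊗ A0 the index is zero up to rounding; for a nonseparable Σ it quantifies how much the separable approximation loses.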
From the covariance structure proposed in equation (2.19), we investigate the sensitivity of the
separability approximation error index as a function of α0, for p = 2 and p = 3. Note that we use
1The Frobenius norm of an n × n matrix B is given by ‖B‖F = (Σni=1 Σnj=1 b²ij)^(1/2).
the idea previously applied to nonseparable spatiotemporal covariance matrices, now in the context of nonseparable multivariate spatial covariance matrices. In Figure 3.1 we can see that the separability approximation error index is not larger than 5% for a covariance structure in which all the components have the same spatial range. From Figure 3.1(a), note that there is no error when α0 is zero, which reduces to the separable case. If different spatial ranges are considered, Figure 3.1(b) shows that the error index does not start at zero, because when α0 = 0 the separable case is not obtained.
[Figure 3.1 here: separability approximation error index against α0 for (a) equal spatial ranges and (b) different spatial ranges.]
Figure 3.1: Separability approximation error index as a function of α0. Full line: p = 2; dashed
line: p = 3.
3.3 Sensitivity study
We present a sensitivity study of the approximation structure investigated in the spatiotemporal context by Genton (2007), used here for the multivariate spatial case. We study different scenarios and measure the errors obtained in the likelihood approximation. Moreover, we compare the inferential and predictive results obtained when the full nonseparable model is applied with and without the separable approximation for the covariance matrix in the likelihood computation, as well as when a separable model is considered.
Consider a bivariate dataset of 200 spatial locations in the [0, 1] × [0, 1] square. The observations were generated from the model y = µ + ε. We consider a Gaussian process, so y ∼ Nnp(µ, Σ), where the elements of Σ are obtained from the covariance function defined in (2.19) with dij = 0, for i = 1, . . . , p, j = 2, . . . , p. We consider the following parameter specification: Θ = (µ1, µ2, δ12, φ, α0, σ1, σ2) with µ1 = µ2 = 0, δ12 = 2, φ = 0.1, α0 = 0.5 and σ1 = σ2 = 1. In this example, we generate only one dataset in the region of interest. We plot the likelihood contours under both structures, using the separable approximation for the covariance matrix and its full original structure. From Figure 3.2, it can be seen that the approximate structure is very similar to the full structure; in some cases the approximate likelihood and the exact one almost coincide. The approximations seem satisfactory.
[Figure 3.2 here: pairwise likelihood contour plots for the parameters φ, α0, δ12, σ1 and σ2.]
Figure 3.2: Likelihood contour plots. Black line: full structure. Red line: approximate structure.
Dashed black line: true value of parameters.
We analyzed the time needed to calculate the likelihood function based on a full covariance matrix and on a covariance matrix with the approximate structure. We generated p = 2, 3, 5 and 8 variables in datasets with n = 100, 200, 500, 700 and 1000 spatial locations in the [0, 1] × [0, 1] square. In this example, 200 replicates were generated in the region of interest.
Table 3.1 shows that the separable approach provides important gains in computational efficiency. Note that the time to calculate the likelihood function is substantially lower when we use the approximate structure.
We also analyzed the time reduction achieved by the separable approximations. Figure 3.3 shows that the time reduction (in %) in the likelihood computation increases with the size of the covariance matrix: the larger the number of variables or spatial locations (or both), the greater the computational gain of the approximate structure.
        p = 2           p = 3            p = 5            p = 8
n       full   approx.  full    approx.  full    approx.  full     approx.
100     2.3    0.8      3.6     0.6      8.9     0.9      28.0     2.1
200     12.3   3.1      13.9    1.6      44.8    4.2      187.1    11.1
500     74.0   13.4     143.1   12.1     618.6   30.5     2409.5   86.9
700     148.2  20.9     388.4   29.4     1649.7  65.8     6520.7   182.2
1000    374.6  52.6     1020.7  66.2     4673.9  133.5    19180.3  446.2
Table 3.1: Necessary time (in seconds) to calculate the likelihood function based on a full covariance
matrix and an approximate structure. (Intel(R) Core(TM) i7-3630QM, 2.40GHz, 6GB RAM)
Finally, we compare the predictive results obtained by the separable model, by the separable approximation for the covariance matrix in the likelihood of the nonseparable model, and by the original nonseparable covariance structure without any approximation of the likelihood. For that purpose, we generated five replicated datasets considering the nonseparable covariance function defined in equation (2.19) with dij = 0, for i = 1, . . . , p, j = 2, . . . , p. For each dataset, we generated p = 2 variables at n = 110 spatial locations in the [0, 1] × [0, 1] square, considering the following parameter specification: Θ = (β1, β2, δ12, φ, α0, σ1, σ2) with β1 = (1, −0.2, −0.8, 0.5), β2 = (1.5, 0.6, −0.5, −0.8), δ12 = 2, φ = 0.2, α0 = 1, σ1 = 1.5 and σ2 = 1. The covariance functions used for the separable and nonseparable models are shown in equations (3.4) and (3.5), respectively:
Cij(‖h‖) =
    a11 (1 + ‖h‖/φ)^(−1),   i = j = 1,
    a22 (1 + ‖h‖/φ)^(−1),   i = j = 2,      (3.4)
    a12 (1 + ‖h‖/φ)^(−1),   i ≠ j,
[Figure 3.3 here: computational time reduction (%) against number of spatial locations, for p = 2, 3, 5 and 8.]
Figure 3.3: Computational time reduction (in percent) in calculation of the likelihood function
using approximate structure.
C(‖h‖, δij) =
    σ1² (1 + ‖h‖/φ)^(−(α0+1)),   i = j = 1,
    σ2² (1 + ‖h‖/φ)^(−(α0+1)),   i = j = 2,      (3.5)
    σ1σ2 (1 + δ12 + ‖h‖/φ)^(−α0) (1 + ‖h‖/φ)^(−1) (1 + δ12)^(−1),   i ≠ j,
with ‖h‖ the Euclidean distance. We adopted T = 30 independent replicates. The observations were generated from the model y = Xβ + ε. We consider a Gaussian process, so yt ∼ Nnp(Xβ, Σ), t = 1, . . . , T , where Σ is the np × np covariance matrix and X contains the independent variables (latitude, longitude and altitude, randomly generated in the cube [0, 1] × [0, 1] × [0, 1]). For the nonseparable models we use the following priors: σi ∼ N(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5), φ ∼ Gamma(0.1 × med(ds), 0.1), with med(ds) = 0.502, β ∼ N8(0, 1000I8) and α0 ∼ Gamma(1, 0.25). For the separable model we use the following priors: A ∼ InverseWishart(I2, 3), φ ∼ Gamma(0.75 × med(ds), 0.75), with med(ds) = 0.502, and β ∼ N8(0, 1000I8). Posterior samples were obtained via MCMC, and convergence was monitored with the algorithms in the coda package in R (see Plummer et al., 2006).
Furthermore, the data from five spatial locations were removed from the training data and used for prediction validation; therefore, we estimate the models using information on n = 105 spatial locations. After estimating the models for each dataset, we computed measures of predictive performance for each model. Table 3.2 presents the comparison of the models in predictive terms; our main goal is to compare the predictive performance of the full nonseparable model and the approximate nonseparable model. We can see that, in predictive terms, the approximation leads to results very similar to those of the full-likelihood nonseparable model. The results of the separable model are in agreement with the discussion presented in Section 2.6.2; that is, we expect its predictive performance to be similar to that of the nonseparable model with equal ranges.
         average WCI              average IS               LPS
Data     SEP    NS app.  NS      SEP    NS app.  NS       SEP     NS app.  NS
1        3.26   3.23     3.21    4.21   4.26     4.25     375.62  375.65   375.19
2        3.25   3.24     3.22    4.25   4.26     4.35     381.58  382.18   382.64
3        3.29   3.25     3.25    3.97   3.93     3.94     355.49  355.06   354.98
4        3.32   3.28     3.29    3.92   3.91     3.96     372.51  372.69   372.50
5        3.25   3.23     3.22    3.88   3.91     3.85     341.26  341.24   341.24
Mean     3.27   3.25     3.24    4.05   4.05     4.07     365.29  365.36   365.31
Table 3.2: Predictive model comparison. SEP: separable model. NS app: nonseparable approximate
model. NS: nonseparable model.
3.4 Discussion
In this chapter we have investigated the performance of an approximation for the full nonsepa-
rable covariance model using the decomposition based on the Kronecker product of two separable
matrices of minor dimensions. A sensitivity study was performed showing that the approximate
approach provides important gains in computational efficiency while keeping the predictive power.
Although taking advantage of approximations to compute the likelihood, our idea keeps interpre-
tation and flexibility.
We conclude that it is better to consider a separable approximation of the nonseparable de-
scribed model than to consider the nonseparable structures, when we assume equal ranges. The non-
separable approximation reduces considerably the computational cost and keeps predictive power,
36
which is usually the main focus of multivariate spatiotemporal data analysis.
Chapter 4
Multivariate spatio-temporal
modeling
4.1 Introduction
In Chapter 2 we proposed a class of multivariate spatial covariance functions based on mixtures. The parametric function defined in (2.19) generates a nonseparable covariance structure flexible enough to allow different spatial ranges for each component, and we have seen that imposing equal spatial ranges, as implied by separable specifications, limits the flexibility of the spatial structure and does not favor its use.
In the simulated and real datasets presented in the previous chapters, we have modeled only the cross-spatial dependence. In general, however, datasets are observed over time, and modeling the temporal dependence becomes essential. It is therefore necessary to incorporate a structure capable of modeling the dependence of the variables in space and time.
Several papers have extended spatial cross-covariance models to the space-time setting. Rouhani and Wackernagel (1990), Choi et al. (2009), Berrocal et al. (2010) and Iaco et al. (2013), for example, developed versions of space-time models based on the linear model of coregionalization. Gelfand et al. (2005) considered a dynamic approach to model multivariate space-time data. Their idea is to view the data as a time series of spatial processes, adapting the structure of dynamic models to space-time models with space-varying coefficients (Reis et al., 2013).
Following the idea of latent dimensions presented in Section 2.2, Apanasovich and Genton (2010) proposed multivariate spatio-temporal covariance functions extending the univariate space-time class of covariance functions described in Gneiting (2002). More recently, Ip and Li (2016) extended results of the bivariate Matérn class presented in Gneiting et al. (2010) to the space-time setting. Gaussian models can be very restrictive, since fit and predictions may not be satisfactory if the dataset presents non-Gaussian characteristics. In fact, if there are aberrant observations in the dataset, it is reasonable to accommodate them in some way, since the identification of outliers is essential to improve model fit and predictive power.
In this chapter, we introduce a spatio-temporal version of the cross-covariance function presented in (2.19). Like Gelfand et al. (2005), we also consider dynamic models to describe the temporal evolution, but with time-varying covariance coefficients, allowing the model to accommodate outlying observations and changes in variability over time.
4.2 Multivariate dynamic spatial models
There are several ways of describing phenomena that vary over time. Generally, the purpose of time series analysis is forecasting for future, unobserved times. Harrison and Stevens (1976) described a class of Bayesian forecasting models called dynamic models.
Dynamic models (or state space models) assume that at each time t ∈ ℕ the observation yt (or the vector of observations yt) of the time series is characterized probabilistically by a vector of parameters θt, called the state vector, whose components may vary over time, accommodating local structures of the temporal process. These models are very flexible because they are able to treat nonstationary time series with structural changes or irregular patterns. Following Berliner (1996), a convenient way to think about the structure of a dynamic model is:
Observational model: [data | state space process, parameters]
System evolution: [state space process | parameters]
Initial information: [parameters].
An important special case of general state space models is the dynamic linear model (DLM), which assumes Gaussian responses. A DLM is generally characterized by two equations: the observation equation, which describes the relationship between the covariates and the response variable, and the evolution equation, which describes how the states of the model evolve over time. An extensive review of these models can be found in West and Harrison (1997) and Petris et al. (2009).
In this subsection, our aim is to introduce the temporal component in the multivariate spatial model y = Xβ + ε with covariance matrix Σ. Following the idea in Stroud et al. (2001), define βt = (β1t, . . . , βqt)′, with βkt = (β1k,t, . . . , βpk,t), as the full parameter vector at time t, for t = 1, . . . , T . Next, specify a probability model p(βt|βt−1) that links the parameters βt over time. Assuming a linear evolution for p(βt|βt−1), the multivariate space-time model is
yt = Xtβt + εt, εt ∼ MVN(0,Σt) (4.1)
βt = Gtβt−1 + ωt, ωt ∼ MVN(0,Wt), (4.2)
where yt = (yt1, . . . , ytp)′ is the np × 1 vector of the variables at the spatial locations, t = 1, . . . , T , with yti = (yti(s1), . . . , yti(sn))′, s1, . . . , sn ∈ D, i = 1, . . . , p; Xt is the np × pq time-dependent design matrix at time t; Gt is a known matrix; and εt and ωt are independent white noise sequences with mean zero and covariance matrices Σt and Wt, respectively.
Observe that equation (4.1) is an extension of the multivariate spatial model, while equation (4.2) defines the evolution over time of the regression parameters βt. Details and characteristics of the DLM can be seen in West and Harrison (1997).
In this section, we present two different models that incorporate the temporal structure. Recall that the covariance matrix Σt in (4.1) defines the multivariate spatial dependence structure, with its elements given by a parametric cross-covariance function based on equation (2.19). The two proposed models are described below.
Model 1 (M1): assume the model defined in (4.1) and (4.2) but do not consider temporal evolution
in the cross-covariance structure, that is, Σt = Σ. The temporal structure is only present in the
mean and the cross-covariance function is given by equation (2.19), that is,
Cij(‖h‖, δij) = σij (1 + δij + ‖h‖/φij)^(−α0) (1 + ‖h‖/φij)^(−1) (1 + δij)^(−1) , (4.3)
i, j = 1, . . . , p, where ‖h‖ is the Euclidean distance between locations, δij represents a latent distance between components i and j, σij is the co-located covariance between components, α0 ≥ 0 is a smoothness parameter and the φij are spatial range parameters. Note that the cross-covariance function in (4.3) is the same as in (2.19), but now we are not concerned with separability between components and space, so φij is not parameterized as a function of dij . The other parameters of the cross-covariance function keep the interpretations described in Section 2.3. However, since Σt = Σ, the spatial structure is repeated at all times t, t = 1, . . . , T , and thus separability is assumed for the temporal component.
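Assembling Σ from the cross-covariance function above is direct. A sketch under the stacking yt = (yt1, . . . , ytp)′, so that component blocks are contiguous (the function name and interface are ours; parameter validity, hence positive definiteness, is assumed rather than checked):

```python
import numpy as np

def build_sigma(coords, sigma, delta, phi, alpha0):
    """np x np covariance matrix whose (i, j) block of size n x n applies
    C_ij(h) of (4.3) to the pairwise Euclidean distances of `coords`.
    sigma, delta, phi are symmetric p x p arrays of sigma_ij, delta_ij,
    phi_ij, with delta_ii = 0 on the diagonal."""
    n, p = coords.shape[0], sigma.shape[0]
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    S = np.zeros((n * p, n * p))
    for i in range(p):
        for j in range(p):
            block = (sigma[i, j]
                     * (1.0 + delta[i, j] + D / phi[i, j]) ** (-alpha0)
                     * (1.0 + D / phi[i, j]) ** (-1.0)
                     / (1.0 + delta[i, j]))
            S[i * n:(i + 1) * n, j * n:(j + 1) * n] = block
    return S
```

With δii = 0, the diagonal blocks reduce to σii(1 + ‖h‖/φii)^(−(α0+1)), matching the marginal covariances of each component.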
Model 2 (M2): assume the model defined in (4.1) and (4.2) and consider the temporal cross-
covariance Σt having its elements given by
Cij,t(‖h‖, δij) = σij,t (1 + δij + ‖h‖/φij)^(−α0) (1 + ‖h‖/φij)^(−1) (1 + δij)^(−1) , (4.4)
i, j = 1, . . . , p, where ‖h‖ is the Euclidean distance between locations, δij represents a latent distance between components i and j, σij,t is the co-located covariance between components at time t, t = 1, . . . , T , α0 ≥ 0 is a smoothness parameter and the φij are spatial range parameters.
The models presented in equations (4.3) and (4.4) differ in the characterization of the covariance function. Model M2 considers a temporal evolution in the covariance parameters between components, while model M1 keeps the covariance structure static over time. It is reasonable to assume that the uncertainty of a component i, i = 1, . . . , p, varies over time, with atypical values across space. In fact, model M2 allows us to identify an increase in uncertainty related to atypical observations at time t, t = 1, . . . , T , whereas the static structure in (4.3) does not accommodate outliers and heterogeneity over time.
4.2.1 DLM completion and prior specification
Refer to the general DLM representation (4.1, 4.2) and let Dt = {yt, Dt−1} be the information set at time t, with D0 the initial information set. In the sequential Bayesian learning of dynamic models, the posterior distribution for the parameters at time t − 1, p(βt−1|Dt−1), must be updated via (4.2) in order to become the prior distribution at time t, p(βt|Dt−1). Combining this prior with the likelihood p(yt|βt) from (4.1), the predictive p(yt|Dt−1) and the posterior p(βt|Dt) are produced. The initial prior is β0|D0 ∼ MVN(m0, C0) with specified initial hyperparameters. The model for the sequence Wt is based on standard variance discounting using a discount factor δW , where values close to 1 are typically chosen to represent relative stability over time in the evolution of Wt. In the following simulations and application we assume δW = 0.9.
The forward filtering analysis sequences through t = 1, . . . , T, with the one-step forecast and posterior distributions following from the theory of general multivariate DLMs (West and Harrison, 1997). Once the forward filtering computations over times t = 1, . . . , T are complete, backward sampling generates a posterior draw of the full sequence of states from p(β_{1:T}|D_T). This operates in reverse time, using the general DLM theory that yields a computationally efficient algorithm for what can be a very high-dimensional set of states when T is large (Liu and West, 2009). This is the well-known forward filtering, backward sampling (FFBS) algorithm (see Frühwirth-Schnatter, 1994 and Carter and Kohn, 1994).
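The FFBS recursions can be sketched for a generic DLM with known observational variance V and evolution variance W. This is our minimal transcription of the standard Kalman-filter and backward-sampling formulas (West and Harrison, 1997), not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffbs(y, F, G, V, W, m0, C0):
    """Forward filtering, backward sampling for the DLM
    y_t = F beta_t + v_t, v_t ~ N(0, V);  beta_t = G beta_{t-1} + w_t, w_t ~ N(0, W).
    Returns one posterior draw of beta_{1:T} from p(beta_{1:T} | D_T)."""
    T, q = len(y), len(m0)
    m = np.zeros((T, q)); C = np.zeros((T, q, q))
    a = np.zeros((T, q)); R = np.zeros((T, q, q))
    m_prev, C_prev = np.asarray(m0, float), np.asarray(C0, float)
    for t in range(T):                       # forward filtering
        a[t] = G @ m_prev                    # prior mean
        R[t] = G @ C_prev @ G.T + W          # prior variance
        f = F @ a[t]                         # one-step forecast mean
        Q = F @ R[t] @ F.T + V               # one-step forecast variance
        A = R[t] @ F.T @ np.linalg.inv(Q)    # adaptive (Kalman) gain
        m[t] = a[t] + A @ (y[t] - f)         # posterior mean
        C[t] = R[t] - A @ Q @ A.T            # posterior variance
        m_prev, C_prev = m[t], C[t]
    beta = np.zeros((T, q))                  # backward sampling, in reverse time
    beta[T - 1] = rng.multivariate_normal(m[T - 1], C[T - 1])
    for t in range(T - 2, -1, -1):
        B = C[t] @ G.T @ np.linalg.inv(R[t + 1])
        h = m[t] + B @ (beta[t + 1] - a[t + 1])
        H = C[t] - B @ R[t + 1] @ B.T
        beta[t] = rng.multivariate_normal(h, H)
    return beta

# Toy run: univariate local-level model
y = np.array([[0.1], [0.3], [0.2], [0.4]])
draw = ffbs(y, F=np.eye(1), G=np.eye(1), V=0.5 * np.eye(1),
            W=0.1 * np.eye(1), m0=np.zeros(1), C0=np.eye(1))
```

The backward pass reuses the filtered moments m, C and the prior moments a, R, which is what makes a single joint draw of all T states cheap.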
The update of σij,t does not follow the sequential inference used for βt, since no conditional conjugacy is available. We therefore adopt a Metropolis-Hastings step. Following Section 2.4, set σij = ρij σi σj, i, j = 1, . . . , p, with σi ∈ ℝ⁺ and ρij the co-located correlation between components. In this chapter we assume the sign of ρij is known beforehand, so we only need to estimate σi ∈ ℝ⁺. In the temporal evolution context, set σij,t = σi,t σj,t, i, j = 1, . . . , p, with σi,t ∈ ℝ⁺, i = 1, . . . , p. Based on this constraint, the dynamic evolution for σi,t and σj,t is

log(σt) = log(σt−1) + ψt, ψt ∼ MVN(0, Ψt), (4.5)

with σt = [σ1,t, . . . , σp,t]′ and Ψt a p × p matrix.
Here we assume that Ψt = Ψ is a known matrix and that the initial prior is log(σ0)|D0 ∼ MVN(m*0, Ψ0). We choose a diagonal matrix with large values to represent our uncertainty about Ψ. The state log(σt) is updated via (4.5): a Metropolis-Hastings step is performed at every time t, t = 1, . . . , T, which can be prohibitive when T is large. The Bayesian inference for the other parameters of the cross-covariance function is described in Section 2.4. When σt = σ, as in model M1, the inference is made for log(σi), i = 1, . . . , p, with a Normal(ci, di) prior; the original scale is recovered by exponentiating.
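One such update can be sketched as a random-walk proposal on the log scale. The evaluation of the full conditional (the likelihood at time t plus the evolution densities from (4.5)) is abstracted into a user-supplied callable, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_update_log_sigma(log_sigma_t, log_post, step=0.1):
    """One random-walk Metropolis-Hastings update of log(sigma_t).

    log_sigma_t : current value, array of length p
    log_post    : callable returning the log of the full conditional of
                  log(sigma_t), supplied by the sampler (an assumption here)
    step        : random-walk standard deviation
    """
    proposal = log_sigma_t + step * rng.standard_normal(log_sigma_t.shape)
    log_ratio = log_post(proposal) - log_post(log_sigma_t)
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True       # accept
    return log_sigma_t, False       # reject

# Toy check: with a standard normal target the chain stays near zero
lp = lambda x: -0.5 * np.sum(x ** 2)
x = np.zeros(2)
for _ in range(200):
    x, _ = mh_update_log_sigma(x, lp, step=0.5)
```

Because the proposal is symmetric, the acceptance ratio needs only the target densities; the cost driver in practice is re-evaluating the spatial likelihood at each time t.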
4.3 Simulated examples
In this section we present three artificial simulations to investigate the characteristics of the models described in equations (4.3) and (4.4). We analyze their predictive performance and evaluate their ability to identify atypical observations in space.
The datasets were generated from the model defined by equations (4.1) and (4.2), with p = 2
components, T = 30 time replications and n = 50 spatial locations in the [0, 1]× [0, 1] square. The
spatial locations used for fit and prediction are presented in Figure 4.1.
Figure 4.1: Spatial locations simulated in the [0, 1] × [0, 1] square. Red points: spatial locations used in estimation. Black points: spatial locations used in prediction.
The parameters of the cross-correlation function ρij(‖h‖, δij), Θ = (δ12, α0, φ11, φ12, φ22), were chosen such that the cross-correlation between components was moderate. The values of these parameters were the same for all simulations: δ12 = 1.5, α0 = 0, φ11 = 0.1, φ12 = 0.1 and φ22 = 0.3. For all analyses we consider ρij = +1. With this configuration, ρ12(0, δ12) = 0.4 and the spatial dependence in component two is greater than in component one. Figure 4.2 presents the cross-correlation in space.
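The stated value ρ12(0, δ12) = 0.4 can be checked numerically from the correlation form of the model. This sketch is ours, with the co-located covariance σij,t factored out:

```python
import numpy as np

def cross_corr(h, delta_ij, phi_ij, alpha0):
    """Cross-correlation rho_ij(||h||, delta_ij) implied by (4.4) once the
    co-located covariance sigma_ij,t is factored out."""
    h = np.asarray(h, dtype=float)
    return ((1.0 + delta_ij + h / phi_ij) ** (-alpha0)
            / (1.0 + h / phi_ij)
            / (1.0 + delta_ij))

# Simulation settings: delta_12 = 1.5, alpha_0 = 0, phi_12 = 0.1
print(cross_corr(0.0, delta_ij=1.5, phi_ij=0.1, alpha0=0.0))  # 0.4

# phi_22 = 0.3 > phi_11 = 0.1: component two is more strongly correlated in space
print(cross_corr(0.5, 0.0, 0.3, 0.0) > cross_corr(0.5, 0.0, 0.1, 0.0))  # True
```

With α0 = 0 the co-located cross-correlation reduces to (1 + δ12)⁻¹ = 1/2.5 = 0.4, which is the value quoted above.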
Figure 4.2: Cross-correlation function for components in space: simulations.
4.3.1 Simulation study one: constant variance over time
We generated a dataset from the model described in equations (4.1) and (4.2), with p = 2
components, T = 30 time replications and n = 50 spatial locations in the [0, 1]× [0, 1] square (see
Figure 4.1). The covariance function used is defined in (4.3), meaning the covariance structure is
static over time.
The dataset was generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, σ1 = 0.6 and σ2 = 0.8, ∀ t = 1, . . . , T.
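Under these specifications, generating the dataset amounts to building the pn × pn cross-covariance matrix and evolving the dynamic level. The sketch below is our reconstruction under those settings; helper names and the eigenvalue-clipping stabilization are ours:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulation study one settings
n, p, T = 50, 2, 30
sigma = np.array([0.6, 0.8])                  # sigma_1, sigma_2
phi = np.array([[0.1, 0.1], [0.1, 0.3]])      # phi_11, phi_12, phi_22
delta = np.array([[0.0, 1.5], [1.5, 0.0]])    # latent distances, delta_ii = 0
alpha0 = 0.0

S = rng.uniform(0.0, 1.0, size=(n, 2))        # locations in the unit square
D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)

# Full pn x pn cross-covariance matrix with static co-located covariance
Sigma = np.zeros((p * n, p * n))
for i in range(p):
    for j in range(p):
        Sigma[i * n:(i + 1) * n, j * n:(j + 1) * n] = (
            sigma[i] * sigma[j]
            * (1.0 + delta[i, j] + D / phi[i, j]) ** (-alpha0)
            / (1.0 + D / phi[i, j]) / (1.0 + delta[i, j]))

# Matrix square root of Sigma, clipping tiny negative eigenvalues for stability
evals, evecs = np.linalg.eigh(Sigma)
L = evecs * np.sqrt(np.clip(evals, 0.0, None))

# Dynamic level: beta_t = beta_{t-1} + w_t, w_t ~ N(0, 0.008 I_2); X_t = blockdiag(1, 1)
beta = np.zeros(p)
Y = np.zeros((T, p * n))
for t in range(T):
    beta = beta + np.sqrt(0.008) * rng.standard_normal(p)
    Y[t] = np.repeat(beta, n) + L @ rng.standard_normal(p * n)
```

Each row of Y stacks the n observations of component one followed by those of component two at one time point, matching the block structure of Sigma.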
In this simulation we evaluate the dataset generated with the specifications above and estimate models M1 and M2. Our aim is to verify whether M2 recovers the covariance structure at each time t, by analyzing the estimated σt and its predictive performance. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
Figure 4.3 presents the posterior median and 95% credibility interval of σi,t for M1 and M2
models. Note that model M2 is able to recover the static structure of the parameters σ1 and σ2.
Figure 4.3: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation one.
Figures 4.4 and 4.5 present predictive measures for model comparison at each time t, t = 1, . . . , T. The IS (Interval Score) and LPS (Log Predictive Score) are predictive scoring rules that summarize the accuracy of probabilistic predictions: the smaller the IS or LPS, the better the model in predictive terms. For more details about scoring rules see Gneiting and Raftery (2007) and Appendix A.3. Since the covariance structure is recovered by M2, no difference in predictive performance between models M1 and M2 is expected. Indeed, Figures 4.4 and 4.5 show that the predictive comparison measures are almost identical under both models.
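For reference, the two scores can be computed as follows. This is our sketch of the standard definitions in Gneiting and Raftery (2007); the exact per-time aggregation used in the thesis may differ:

```python
import numpy as np

def interval_score(lower, upper, y, alpha=0.05):
    """Interval Score for central (1 - alpha) prediction intervals
    (Gneiting and Raftery, 2007); smaller is better.  An observation
    outside [lower, upper] is penalized by (2/alpha) times the miss."""
    lower, upper, y = map(np.asarray, (lower, upper, y))
    penalty_low = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    penalty_high = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return np.mean((upper - lower) + penalty_low + penalty_high)

def log_predictive_score(logpdf_vals):
    """LPS: negative mean log predictive density of held-out data,
    where logpdf_vals are log p(y_new | D); smaller is better."""
    return -np.mean(np.asarray(logpdf_vals))

# A covered observation costs only the interval width:
print(interval_score(-1.96, 1.96, 0.0, alpha=0.05))  # 3.92
```

The width term rewards sharp intervals, while the penalty terms reward calibration, so the score trades the two off in a single number.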
Figure 4.4: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation one.
Figure 4.5: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation one.
Using the same generated dataset, we selected four random locations at four different times (t = 11, 15, 20, 25) to analyze the performance of the model in identifying outliers. Observations of the first component were contaminated at those times and locations by adding random increments σ1 u, with u ∼ Uniform(5, 6).
Model M2 is capable of increasing the uncertainty of component one at the times of contamination, that is, σ1,t is inflated at t = 11, 15, 20, 25, while M1, which estimates a single σi, i = 1, 2, for all times t, t = 1, . . . , 30, overestimates σ1. Model M2 seems more appropriate, since it is able to identify the times at which aberrant values exist. In terms of spatial predictive performance, however, model M2 does not differ from model M1, because the σi,t parameters do not vary in space. To identify spatial outliers we would need to work with σi,t(s). The output of this simulation has been omitted.
4.3.2 Simulation study two: time-varying variance
In this simulation we follow the same specifications described in simulation one, but now we consider the covariance function defined in (4.4), which allows a dynamic evolution.
The dataset is generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, and Ψ_t = 0.25\, I_2, ∀ t = 1, . . . , T.
Our aim is to compare the predictive performance of the M1 and M2 models when the dataset has time-varying variance. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
We expected model M2 to give good results since the empirical variances seem to change in
time in both components as shown in Figure 4.6.
Figure 4.6: Empirical variance over time for each component: simulation two.
Figure 4.7 presents the posterior median and 95% credibility interval of σi,t under M1 and M2
models. Model M2 recovers the temporal structure of σt in both components, while model M1 estimates a single value that appears to be an average over time. For the second component, the σ2 estimated by the M1 model is very different from the true value at the earliest and latest times (Figure 4.7(a)). The model without dynamic evolution is not flexible enough to identify the heterogeneity of the observations over time.
Figure 4.7: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation two.
Figures 4.8 and 4.9 present predictive measures (IS and LPS) for model comparison at each time t, t = 1, . . . , T. As expected, model M2 attains better predictive performance than model M1 whenever there is a large difference between the true value of σt and the value estimated under M1.
Figure 4.8: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation two.
Figure 4.9: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation two.
4.3.3 Simulation study three: change of regime
This simulation considers the cross-covariance function defined in (4.4). The dataset is generated using the following specifications: X_t = \mathrm{blockdiag}(\mathbf{1}, \mathbf{1}), with \mathbf{1} = [1, \dots, 1]', G_t = I_2, W_t = 0.008\, I_2, Ψ_t = 0.1\, I_2 for t = 1, . . . , 14, 16, . . . , T, and Ψ_{15} = \mathrm{diag}(10, 0.1).
Unlike simulation study two, here we consider a dynamic evolution for σi,t, i = 1, . . . , p, t =
1, . . . , T , but we increase the uncertainty at t = 15 by changing the evolution level of σ1,t. This
example considers a regime change in uncertainty at component one at time t = 15. Figure 4.10
presents the empirical variance at each component. Note that the empirical variance in component
one is at a different level at times t ≥ 15.
In this simulation, we expect model M2 to recover the structure of component one and, consequently, to present better predictive performance. In fact, we expect the M1 model to be unable to adequately represent the uncertainty associated with component one. Details about prior distributions for model parameters are deferred to Appendix A.2. Inference was carried out under an MCMC scheme, and for convergence monitoring we used the algorithms in the coda package for R (Plummer et al., 2006).
Figure 4.10: Empirical variance over time for each component: simulation three.
Figure 4.11 presents the posterior median and 95% credibility interval of σi,t for M1 and M2
models.
Model M2 is able to recover the simulated variance structure (see Figure 4.11(b)). As expected, model M1 does not recover the variance structure, especially that of component one. Note that the estimated value of σ1 under M1 (Figure 4.11(a)) is far from the true value when t < 15. Thus model M1 overestimates the uncertainty when t < 15 and underestimates it (though not by the same magnitude) when t ≥ 15.
Figure 4.11: Posterior median and 95% credibility interval of σi,t, i = 1, 2, t = 1, . . . , 30, for the M1 and M2 models (panels (a) M1 and (b) M2). Green line: true value of σi,t: simulation three.
Figures 4.13 and 4.14 present predictive measures (IS and LPS, described in Appendix A.3) for model comparison at each time t, t = 1, . . . , T. Model M2 provides very good predictions compared to model M1; especially in component one, the uncertainties are predicted much more adequately. This can be seen in Figure 4.12, which presents the posterior median and 95% credible interval for component one versus the observed values at times t = 7 and t = 23 for all out-of-sample stations. When the true uncertainty is lower (t = 7), the M1 model presents very large intervals, since it overestimates the uncertainty. At time t = 23, the M1 model has smaller intervals than the M2 model, because it is unable to represent the uncertainty properly. This suggests that uncertainty is well modeled under M2, providing better predictive performance, as seen in Figures 4.13 and 4.14.
Figure 4.12: Posterior median and 95% credible interval for component one versus the observed values at times t = 7 (panel a) and t = 23 (panel b) for all out-of-sample stations: simulation three.
Figure 4.13: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 30: simulation three.
Figure 4.14: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 30: simulation three.
4.4 California dataset
Many air pollutants, emitted from a variety of sources, affect the health of people around the world every day. In general, pollutants can cause serious effects throughout life. According to the World Health Organization (WHO), one third of deaths from stroke, lung cancer and heart disease are due to air pollution.
Statistical methods for space-time estimation and forecasting of pollutants are widely applied. In these processes, atypical observations are usually observed, and the usual Gaussian models are not able to represent these phenomena satisfactorily.
Air pollution is a problem in California. According to the California Air Resources Board (CARB), over 90 percent of Californians breathe unhealthy levels of one or more air pollutants during some part of the year. In this context, the spatial dependence behavior and the time trajectories of the pollutant levels of PM2.5 and NO2 are studied in this section. These pollutants cause various effects on the health of individuals, such as premature death, cardiovascular disease, respiratory disease, asthma, lung irritation and headache.
The pollution dataset was obtained from the United States Environmental Protection Agency (EPA). We evaluated the daily mean PM2.5 concentration (µg/m³) and the daily maximum 1-hour NO2 concentration (ppb), measured at 21 stations (Figure 4.15(a)) from January 1, 2017 to February 9, 2017, totaling 40 moments in time.
Locations with less than 10% missing values went through an imputation process (Figure 4.15(b)). For predictive comparison and validation, we hold out two spatial locations. Figure 4.15(c) shows the locations of the 21 monitoring sites and the two hold-out sites on a latitude-longitude scale.
Figure 4.15: Collection of monitoring stations in California state, USA. (a) Collection of 21 monitoring stations. (b) Locations that have gone through an imputation process. (c) Collection of 21 monitoring stations, where numbers 1 and 2 are the spatial locations considered for predictive comparison.
Figures 4.16 and 4.17 show the empirical variance across space and time for each component, respectively. Notice in Figure 4.17 how volatile the PM2.5 series is: the PM2.5 empirical variance changes a lot over time, sometimes by 5 units in just a couple of days. (The imputation was performed using the mice package in R.)
Figure 4.16: Empirical variance for NO2 and PM2.5 across space: California dataset.
Figure 4.17: Empirical variance for NO2 and PM2.5 across time: California dataset.
In order to illustrate the cross-correlation between the variables across time, we estimate the
parameters in the model y = µ+ ε for each time t, t = 1, . . . , 40, that is,
y = (y1,y2)′ ∼ MVN(µ,G), (4.6)
with G an arbitrary full cross-covariance matrix.
From Figure 4.18, note that there is a positive cross-correlation between NO2 and PM2.5 at all times except t = 17. We therefore restrict the analysis by considering positive dependence between the components at all times: the models used assume that the variables are positively dependent for all t, t = 1, . . . , 40.
Figure 4.18: Posterior median and 95% CI of the cross-correlation among components considering an independent multivariate normal distribution for each time: California dataset.
We evaluated the behavior of the posterior spatial correlation for each variable (NO2 and PM2.5) using a univariate dynamic spatial model, whose covariance function is defined by C(‖h‖) = σ²ρ(‖h‖), with σ² the constant variance, ρ(·) a valid spatial correlation function and ‖h‖ the Euclidean distance. We consider a spatial correlation function in the Cauchy family. These models were used only for this evaluation and are presented in Appendix A.1 (for more details see Banerjee et al., 2015). The prior distributions for the covariance parameters are presented in Appendix A.2, and for the mean parameters we use FFBS as described in Section 4.2.1. Figure 4.19 shows the posterior spatial correlation for each component. Note that the variables behave differently across space.
Figure 4.19: Posterior median (full line) and 95% credible interval (dashed lines) of the spatial correlation of the univariate dynamic spatial model for each component: California dataset.
We compared the predictive performance of the M1 and M2 models, described in (4.3) and (4.4), respectively. Parameter estimation was performed considering the likelihood described in (2.21). We consider dynamic effects on the level and covariates (latitude and longitude) and assume a discount factor δ_W = 0.9.
The prior distributions for the parameters in the proposed model follow the discussion in Sec-
tions 4.2.1 and 2.4 and their hyperparameters are presented in Appendix A.2. We assume that
ρij = +1. MCMC methods were used to generate posterior and predictive samples. For MCMC
convergence monitoring we used the algorithms in Coda package for R.
The cross-correlation functions are similar in both models: the posterior densities of the ρij(h, δij) parameters are not significantly different (see Figure 4.20). The posterior mean and 95% credibility interval for the uncertainty parameters, σi for M1 and σi,t for M2, i = 1, 2, t = 1, . . . , 40, are shown in Figure 4.21. Note that M2 seems to recover the behavior of the variance parameters of both components. The difference between the models is explicit when PM2.5 is evaluated: there are moments in time when the variability is large and the M1 model is not able to estimate the uncertainty satisfactorily. The difference between the models is confirmed by Figures 4.22 and 4.23, which present the predictive comparison measures. Note that at the moments in time when M1 does not recover the uncertainty associated with PM2.5, its predictive performance is worse than that of the M2 model.
Figure 4.20: Posterior densities of the cross-correlation parameters (φ11, φ12, φ22, δ12, α0) for each model: California dataset.
Figure 4.21: Posterior median and 95% credibility interval of σ_NO2 and σ_PM2.5 for the M1 and M2 models at each time t, t = 1, . . . , 40: California dataset.
Figure 4.22: Interval Score as predictive measure for comparison of the M1 and M2 models at each component i, i = 1, 2, and time t, t = 1, . . . , 40: California dataset.
Figure 4.23: Log Predictive Score as predictive measure for comparison of the M1 and M2 models at each time t, t = 1, . . . , 40: California dataset.
4.5 Discussion
In this chapter we have introduced the space-time extension of the cross-covariance function presented in (2.19), using dynamic models to represent temporal evolution. We allowed time-varying co-located covariance, accommodating atypical observations in a multivariate space-time nonseparable structure.
In simulation study one, the model with temporal evolution in covariance coefficients recovered the structure with constant variance. In simulation studies two and three, the model that
assumes constant variance did not recover the uncertainty structure and presented worse predictive
performance than the model that assumes time-varying variance. The difference between the tem-
poral evolution model and constant variance model is evident in simulation three, which considers
a regime change in the uncertainty associated with component one.
From the application to California pollutant data, we noted that the model allowing temporal evolution in the covariance coefficients is able to recover the uncertainty associated with the components and presents better predictive performance than the model that assumes constant variance over time.
So far the estimation procedure is based on the assumption that the sign of the co-located correlation ρij is known a priori. In both the static and the dynamic model, ρij is fixed at all times t, t = 1, . . . , T, and this cross-correlation does not appear to change over time (see, for example, the application to California pollutants, Figure 4.18). Not fixing the sign of the co-located correlation would imply a bimodal posterior distribution for σi,t.
The parameters σij,t could be directly estimated without the restriction σij,t = σi,tσj,t imposed
so far. However, this implies that correlation will be allowed to change over time and identifiability
issues will probably occur in this scenario. With our restriction we ensure that correlation between
components is fixed over time and only variances are allowed to vary. Our simulated examples and
real data analysis indicate that this model represents well scenarios of outliers, heterogeneity and
change of regimes. Investigation of more general solutions to this problem is still under development.
Chapter 5
Conclusions
In this thesis we have extended the class of nonseparable covariance functions proposed in
Fonseca and Steel (2011) to the modeling of component and spatial dependence, considering latent
distance between components as in Apanasovich and Genton (2010).
The general class is flexible enough to generate a nonseparable covariance structure and allows the specification of different structures for space and components, as well as different spatial ranges associated with each component. The separable model (Mardia and Goodall, 1993) is a special case. We have proposed a Bayesian test to measure the degree of separability between space and components. The posterior probability p0 has shown to be an easily interpretable measure of the separability of the component-spatial covariance structure, since it does not depend on the scale of the ranges or on the smoothness.
From the simulations and the meteorological data example, it is clear that flexible structures are needed: structures able to accommodate different spatial ranges for a vector of spatial processes. The presented model was able to recover simple structures and provided better predictive performance than models widely used in the literature, such as the separable model and the linear model of coregionalization.
To address the computational limitation, we investigated the use of separable approximations of the full covariance matrix, which lead to fast computation of the likelihood function. Following Van Loan and Pitsianis (1993), we approximated the full covariance matrices using a decomposition based on the Kronecker product of two separable matrices of smaller dimensions. These approximations were applied to the likelihood function to obtain fast estimation of the parameters while keeping the interpretation and flexibility of the multivariate nonseparable model. From simulations, we observed that the approximation leads to predictive performance very similar to that of the original full nonseparable model.
We have introduced the spatio-temporal setting of the multivariate cross-covariance proposed in Chapter 2, allowing time-varying covariance coefficients to accommodate atypical observations. The spatio-temporal multivariate models were constructed based on the dynamic approach. From simulations and the application to California pollutant data, we observed that the model with dynamic evolution in the covariance coefficients presented better predictive performance and was able to recover the variance structure of the components. The limitation of the models used in Chapter 4 is the constraint used in the estimation of the co-located correlations: with our restriction we ensured that the correlation between components was fixed over time and only the variances were allowed to vary. This was necessary due to the bimodality of the posterior distributions of these parameters. Investigation of more general solutions to this identification issue is a topic of future research.
A Appendix A
A.1 Covariance functions
The univariate covariance function used in Section 2.7 and Section 4.4 is given by

C(\|h\|) = \sigma^2 \left(1 + \frac{\|h\|}{\phi}\right)^{-1},

with ‖h‖ the Euclidean distance, σ² the variance of the variable and φ the spatial range.
The separable multivariate covariance function used in Sections 2.6.2 and 2.7 is given by

C_{ij}(\|h\|) = a_{ij} \left(1 + \frac{\|h\|}{\phi}\right)^{-1}, \quad i, j = 1, \dots, p,

with ‖h‖ the Euclidean distance, a_{ij} the covariances of the components and φ the spatial range.
The linear model of coregionalization used in Section 2.7 has the following covariance function:

C_{ij}(\|h\|) = \sum_{k=1}^{p} b_{ik} b_{jk} \left(1 + \frac{\|h\|}{\phi_k}\right)^{-1},

with ‖h‖ the Euclidean distance, B = (b_{ij}) a p × p full-rank matrix and φ_k the spatial range for the k-th component.
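These three covariance functions can be transcribed directly; this is a sketch of ours, with illustrative function names:

```python
import numpy as np

def cauchy_cov(h, sigma2, phi):
    """Univariate covariance C(||h||) = sigma^2 (1 + ||h||/phi)^(-1)."""
    return sigma2 / (1.0 + np.asarray(h, dtype=float) / phi)

def separable_cov(h, a_ij, phi):
    """Separable multivariate covariance C_ij(||h||) = a_ij (1 + ||h||/phi)^(-1):
    one common spatial range phi for all components."""
    return a_ij / (1.0 + np.asarray(h, dtype=float) / phi)

def lmc_cov(h, i, j, B, phi):
    """Linear model of coregionalization:
    C_ij(||h||) = sum_k b_ik b_jk (1 + ||h||/phi_k)^(-1),
    with B full rank and phi_k the range of the k-th latent process."""
    h = np.asarray(h, dtype=float)
    return sum(B[i, k] * B[j, k] / (1.0 + h / phi[k]) for k in range(B.shape[1]))

# At h = 0 the LMC co-located covariance is (B B')_{ij}:
print(lmc_cov(0.0, 0, 0, np.eye(2), [0.1, 0.2]))  # 1.0
```

The contrast between the three is visible in the arguments: the separable model shares one range φ across components, while the LMC mixes p latent ranges through the coefficient matrix B.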
A.2 Prior distributions
The hyperparameters of the prior distributions described in Section 2.4 for the simulation exercises of Sections 2.5 and 2.6.2, the application to the weather dataset in Section 2.7, the simulation studies of Section 4.3 and the application to California pollutants in Section 4.4 are presented below.
1. Prior distributions considered for all scenarios in simulation of Section 2.5 follows: β ∼
MVN(0, 1000I8), δ12 ∼ Gamma(1, 0.5) and σi ∼ Normal(0, 100), i = 1, 2. We adopted the
mixture prior for α0 and dij , i = 1, . . . , p, j = 2, . . . , p, defined in (2.22). We assume a point
mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2) for dij ∈ (−φ11,∞),
i = 1, . . . , p, j = 2, . . . , p. For φ11 we consider a gamma distribution, as in Section 2.4, with
shape = 0.75×med(δs) and scale = 0.75, with med(δs) = 0.5175.
2. Prior distributions considered for each dataset in simulation of Section 2.6.2 follows:
Dataset 1:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 12) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
Dataset 2:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
Dataset 3:
• SEP: β ∼ Normal(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.75 ×
med(δs), 0.75), with med(δs) = 0.5145;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3, then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model considers p0 = 0, then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.75×med(δs), 0.75), med(δs) = 0.5145;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i 6= j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij , i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.75 × med(δs), 0.75),
med(δs) = 0.5145;
3. The prior distributions for the models fitted to the weather dataset in Section 2.7 are given by:
• SEP: β ∼ MVN(0, 1000I12), A ∼ InverseWishart(I3, 4), and φ ∼ Gamma(0.25 ×
med(δs), 0.25), with med(δs) = 1.958;
• NSEP-φ: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model considers p0 = 0 and dij = 0, i = 1, 2, 3, j = 2, 3; then
α0 ∼ Gamma(1, 1) and φ11 ≡ φ ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 1.958;
• NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model considers p0 = 0; then α0 ∼ Gamma(1, 1),
dij ∼ TruncatedNormal(0, 2) for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, and φ11 ∼
Gamma(0.25×med(δs), 0.25), med(δs) = 1.958;
• Mix-NSEP: β ∼ MVN(0, 1000I12), σi ∼ Normal(0, 100), i = 1, . . . , 3, δij ∼ Gamma(1, 0.5),
i ≠ j, i, j = 1, . . . , 3. This model sets a mixture prior for α0 and dij, i = 1, 2, 3, j = 2, 3,
given by a point mass at zero, a Gamma(1, 3) for α0 > 0 and a TruncatedNormal(0, 2)
for dij ∈ (−φ11,∞), i = 1, 2, 3, j = 2, 3, with φ11 ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 1.958;
• LMC: β ∼ MVN(0, 1000I12), bii ∼ InverseGamma(0.001, 0.001), i = 1, . . . , 3, bij ∼
Normal(0, 100), i ≠ j, i, j = 1, . . . , 3, φi ∼ Gamma(0.75 × med(δs), 0.75), i = 1, . . . , 3,
with med(δs) = 1.958;
• For each univariate model: β ∼ MVN(0, 1000I4), θ ∼ Gamma(1, 0.25), with θ = 1/σ2,
and φ ∼ Gamma(0.1×med(δs), 0.1), with med(δs) = 1.958.
4. The prior distributions considered for all simulation studies of Section 4.3 are as follows:
• M1: β0|D0 ∼ MVN(0, 1000I6), log(σi) ∼ Normal(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5).
This model sets a mixture prior for α0 given by a point mass at zero, a Gamma(1, 3) for
α0 > 0, and φij ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 0.501;
• M2: β0|D0 ∼ MVN(0, 1000I6), log(σ0)|D0 ∼ MVN(0, 100I2), Ψt = Ψ = 100I2 for
all t, δ12 ∼ Gamma(1, 0.5). This model sets a mixture prior for α0 given by a point
mass at zero, a Gamma(1, 3) for α0 > 0, and φij ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 0.501;
5. The prior distributions for the models fitted to the California pollutants dataset in Section
4.4 are given by:
• M1: β0|D0 ∼ MVN(0, 1000I6), log(σi) ∼ Normal(0, 100), i = 1, 2, δ12 ∼ Gamma(1, 0.5).
This model sets a mixture prior for α0 given by a point mass at zero, a Gamma(1, 3) for
α0 > 0, and φij ∼ Gamma(0.25×med(δs), 0.25), med(δs) = 3.092;
• M2: β0|D0 ∼ MVN(0, 1000I6), log(σ0)|D0 ∼ MVN(0, 100I2), Ψt = Ψ = 100I2 for
all t, δ12 ∼ Gamma(1, 0.5). This model sets a mixture prior for α0 given by a point
mass at zero, a Gamma(1, 3) for α0 > 0, and φij ∼ Gamma(0.25 × med(δs), 0.25),
med(δs) = 3.092;
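Several of the specifications above place a spike-and-slab prior on α0: a point mass at zero mixed with a Gamma(1, 3) on α0 > 0. The sketch below, which is illustrative and not from the thesis, draws from such a prior; it assumes the rate parameterization of the Gamma (so the slab has mean 1/3) and treats the point-mass weight `p0` as a fixed input, since the text specifies only the two mixture components.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_alpha0(p0, size, rng=rng):
    """Draw from a mixture prior for alpha_0: a point mass at zero with
    probability p0, and a Gamma(1, 3) on alpha_0 > 0 otherwise.
    Gamma(shape, rate) parameterization is assumed, so scale = 1/rate."""
    at_zero = rng.random(size) < p0          # indicator of the spike component
    draws = rng.gamma(shape=1.0, scale=1.0 / 3.0, size=size)  # slab draws
    draws[at_zero] = 0.0                     # overwrite with the point mass
    return draws

samples = sample_alpha0(p0=0.5, size=10_000)
print((samples == 0).mean())        # close to the assumed prior mass p0 = 0.5
print(samples[samples > 0].mean())  # close to the Gamma(1, 3) mean of 1/3
```

The fraction of exact zeros in the posterior draws of α0 is then directly interpretable as evidence for the separable (α0 = 0) submodel.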
A.3 Model comparison predictive measures
In what follows we present the measures considered for model comparison in the illustrations of
our proposal. Further details can be found in Gneiting and Raftery (2007).
1. The Interval Score (IS) is given by
ISα(l, u; x) = (u − l) + (2/α)(l − x) I[x < l] + (2/α)(x − u) I[x > u], (1)
where l and u denote the forecaster's quoted α/2 and 1 − α/2 quantiles. According to Gneiting
and Raftery (2007), the forecaster is rewarded for narrow prediction intervals and incurs a
penalty, whose size depends on α, if the observation misses the interval.
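The interval score can be computed directly from its definition; the vectorized helper below is a minimal sketch, not code from the thesis.

```python
import numpy as np

def interval_score(l, u, x, alpha):
    """Interval score of Gneiting and Raftery (2007): the interval width
    plus a penalty of 2/alpha times the distance by which the observation
    x falls outside the (l, u) prediction interval."""
    l, u, x = np.asarray(l), np.asarray(u), np.asarray(x)
    return (u - l) + (2.0 / alpha) * (l - x) * (x < l) \
                   + (2.0 / alpha) * (x - u) * (x > u)

# A 95% interval (alpha = 0.05): a covered observation scores just the width,
# while a miss adds 2/alpha = 40 times the overshoot.
print(round(float(interval_score(l=-1.96, u=1.96, x=0.0, alpha=0.05)), 2))  # 3.92
print(round(float(interval_score(l=-1.96, u=1.96, x=3.0, alpha=0.05)), 2))  # 45.52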
2. The Width of the Credibility Interval (WCI) is defined as the width term (u − l) of the IS function in (1).
3. The Log Predictive Score (LPS) is based on the predictive distribution q and on the observed
value x:
LPS(q; x) = −log q(x). (2)
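As a sketch of (2), the LPS is simply the negative log predictive density evaluated at the observation; the standard-normal predictive density below is illustrative only, not a quantity from the thesis.

```python
import math

def log_predictive_score(logpdf, x):
    """LPS(q; x) = -log q(x): the negative log predictive density at the
    observed value x. Lower scores indicate better predictive fit."""
    return -logpdf(x)

# Illustrative q: a standard normal predictive distribution.
def std_normal_logpdf(x):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

# An observation in the center of q scores lower (better) than one in the tail.
print(round(log_predictive_score(std_normal_logpdf, 0.0), 4))  # 0.9189
print(round(log_predictive_score(std_normal_logpdf, 2.0), 4))  # 2.9189
```

Averaging this score over held-out observations gives the aggregate measure used to rank the competing covariance specifications.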
Bibliography
Apanasovich, T. V. and Genton, M. G. (2010). “Cross-covariance functions for multivariate random
fields based on latent dimensions.” Biometrika, 97, 1, 15–30.
Apanasovich, T. V., Genton, M. G., and Sun, Y. (2012). “A Valid Matérn Class of Cross-Covariance
Functions for Multivariate Random Fields With Any Number of Components.” Journal of the
American Statistical Association, 107, 497, 180–193.
Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004). Hierarchical Modeling and Analysis for
Spatial Data. Monographs on Statistics and Applied Probability, 1st ed. Chapman & Hall/CRC.
— (2015). Hierarchical Modeling and Analysis for Spatial Data. Monographs on Statistics and
Applied Probability, 2nd ed. Chapman & Hall/CRC.
Berliner, L. M. (1996). Hierarchical Bayesian Time Series Models, vol. 79, 15–22. Springer.
Berrocal, V. J., Gelfand, A. E., and Holland, D. M. (2010). “A bivariate space-time downscaler
under space and time misalignment.” The Annals of Applied Statistics, 4, 4, 1942–1975.
Bourotte, M., Allard, D., and Porcu, E. (2016). “A flexible class of non-separable cross-covariance
functions for multivariate space-time data.” Spatial Statistics, 18, 125–146.
Carter, C. K. and Kohn, R. (1994). “On Gibbs sampling for state space models.” Biometrika, 81,
541–553.
Choi, J., Fuentes, M., Reich, B. J., and Davis, J. M. (2009). “Multivariate spatial-temporal mod-
eling and prediction of speciated fine particles.” J. Stat. Theory Pract., 2, 3, 407–418.
Cox, T. F. and Cox, M. A. A. (2000). Multidimensional Scaling. Chapman & Hall/CRC.
Cressie, N. (1993). Statistics for Spatial Data. New York: Wiley.
Cressie, N. and Huang, H.-C. (1999). “Classes of Nonseparable, Spatio-Temporal Stationary Co-
variance Functions.” Journal of the American Statistical Association, 94, 448, 1330–1340.
Cressie, N. and Zammit-Mangion, A. (2016). “Multivariate spatial covariance models: a conditional
approach.” Biometrika, 103, 4, 915–935.
Daley, D. J., Porcu, E., and Bevilacqua, M. (2014). “Classes of compactly supported covariance
functions for multivariate random fields.” Stochastic Environmental Research and Risk Assessment, 29, 4, 1249–1263.
DeGroot, M. H. and Schervish, M. J. (2011). Probability and Statistics. 4th ed. Pearson.
Fonseca, T. C. O. and Steel, M. F. J. (2011). “A General Class of Nonseparable Space-time
Covariance Models.” Environmetrics, 22, 2, 224–242.
— (2017). “Measuring Separability in Spatio-temporal Covariance Functions.” Tech. Rep. 290,
Department of Statistics, Federal University of Rio de Janeiro.
Frühwirth-Schnatter, S. (1994). “Data augmentation and dynamic linear models.” Journal of Time
Series Analysis, 15, 2, 183–202.
Fuentes, M. (2006). “Testing for Separability of Spatial-Temporal Covariance Functions.” Journal
of Statistical Planning and Inference, 136, 2, 447–466.
Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for
Bayesian Inference. 2nd ed. CRC Press.
Gaspari, G. and Cohn, S. E. (1999). “Construction of correlation functions in two and three
dimensions.” Q.J.R. Meteorol. Soc., 125, 723–757.
Gelfand, A. E., Banerjee, S., and Gamerman, D. (2005). “Spatial Process Modelling for Univariate
and Multivariate Dynamic Spatial Data.” Environmetrics, 16, 5, 465–479.
Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). “Nonstationary multi-
variate process modeling through spatially varying coregionalization.” Test, 13, 2, 263–312.
Genton, M. (2007). “Separable Approximations Of Space-time Covariance Matrices.” Environ-
metrics, 18, 681–695.
Genton, M. G. and Kleiber, W. (2015). “Cross-Covariance Functions for Multivariate Geostatis-
tics.” Statistical Science, 30, 3, 147–163.
Gneiting, T. (2002). “Nonseparable, Stationary Covariance Functions for Space-Time Data.” Jour-
nal of the American Statistical Association, 97, 458, 590–600.
Gneiting, T., Genton, M. G., and Guttorp, P. (2007). “Geostatistical space-time models, sta-
tionarity, separability and full symmetry.” In Statistical Methods for Spatio-Temporal Systems,
151–175. Chapman and Hall.
Gneiting, T., Kleiber, W., and Schlather, M. (2010). “Matérn Cross-Covariance Functions for
Multivariate Random Fields.” Journal of the American Statistical Association, 105, 491, 1167–
1177.
Gneiting, T. and Raftery, A. E. (2007). “Strictly proper scoring rules, prediction and estimation.”
Journal of the American Statistical Association, 102, 477, 360–378.
Gneiting, T. and Schlather, M. (2004). “Stochastic Models That Separate Fractal Dimension and
the Hurst Effect.” SIAM Review, 46, 2, 269–282.
Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations. The Johns Hopkins University
Press.
Goulard, M. and Voltz, M. (1992). “Linear coregionalization model: tools for estimation and choice
of cross-variogram matrix.” Mathematical Geology, 24, 3, 269–286.
Harrison, P. and Stevens, C. F. (1976). “Bayesian Forecasting.” Journal of the Royal Statistical
Society Series B, 38, 3, 205–247.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). “Bayesian Model Averaging:
A Tutorial.” Statistical Science, 14, 4, 382–417.
Iaco, S. D., Myers, D. E., Palma, M., and Posa, D. (2013). “Using Simultaneous Diagonalization
to Identify a Space–Time Linear Coregionalization Model.” Math. Geosci., 45, 69–86.
Ip, H. L. and Li, W. K. (2016). “Matérn cross-covariance functions for bivariate spatio-temporal
random fields.” Spatial Statistics, 17, 22–37.
Kass, R. and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association,
90, 430, 773–795.
Li, B., Genton, M. G., and Sherman, M. (2007). “A Nonparametric Assessment of Properties of
Space-Time Covariance Functions.” Journal of the American Statistical Association, 102, 478,
736–744.
— (2008). “Testing the covariance structure of multivariate random fields.” Biometrika, 95, 4, 813–829.
Liu, F. and West, M. (2009). “A Dynamic Modelling Strategy for Bayesian Computer Model
Emulation.” Bayesian Analysis, 4, 2, 393–412.
Ma, C. (2002). “Spatio-Temporal Covariance Functions Generated by Mixtures.” Mathematical
Geology, 34, 8, 965–975.
— (2003). “Spatio-Temporal Stationary Covariance Models.” Journal of Multivariate Analysis, 86,
1, 97–107.
Majumdar, A. and Gelfand, A. E. (2007). “Multivariate Spatial Modeling for Geostatistical Data
Using Convolved Covariance Functions.” Mathematical Geology, 39, 7, 225–245.
Mardia, K. V. and Goodall, C. R. (1993). Spatial-temporal analysis of multivariate environmental
monitoring data, 347–386. Elsevier Sci., New York.
Mitchell, M. W., Genton, M. G., and Gumpertz, M. L. (2005). “Testing for separability of space-
time covariances.” Environmetrics, 16, 819–831.
— (2006). “A likelihood ratio test for separability of covariances.” Journal of Multivariate Analysis,
97, 1025–1043.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R. Use R, 1st ed.
Springer-Verlag New York.
Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). “CODA: Convergence Diagnosis and
Output Analysis for MCMC.” R News, 6, 7–11.
Porcu, E., Bevilacqua, M., and Genton, M. G. (2016). “Spatio-Temporal Covariance and Cross-
Covariance Functions of the Great Circle Distance on a Sphere.” Journal of the American
Statistical Association, 111, 514, 888–898.
Porcu, E., Mateu, J., and Bevilacqua, M. (2007). “Covariance functions that are stationary or
nonstationary in space and stationary in time.” Statistica Neerlandica, 61, 3, 358–382.
Porcu, E. and Zastavnyi, V. (2011). “Characterization theorems for some classes of covariance
functions associated to vector valued random fields.” Journal of Multivariate Analysis, 102,
1293–1301.
Reich, B. and Fuentes, M. (2007). “A multivariate semiparametric Bayesian spatial modelling
framework for hurricane surface wind fields.” Annals of Applied Statistics, 1, 249–264.
Reis, E. A., Gamerman, D., Paez, M. S., and Martins, T. G. (2013). “Bayesian dynamic models
for space–time point processes.” Computational Statistics and Data Analysis, 60, 146–156.
Robert, C. P. (1994). The Bayesian choice: a decision-theoretic motivation. 1st ed. Springer.
Rouhani, S. and Wackernagel, H. (1990). “Multivariate Geostatistical Approach to Space-Time
Data Analysis.” Water Resources Research, 26, 4, 585–591.
Schervish, M. J. (1995). Theory of Statistics. 1st ed. Springer.
Schmidt, A. M. and Gelfand, A. E. (2003). “A Bayesian Coregionalization Approach for Multivari-
ate Pollutant Data.” Journal of Geophysical Research-Atmospheres, 108, D24.
Stein, M. L. (2005). “Space-Time Covariance Functions.” Journal of the American Statistical
Association, 100, 469, 310–321.
Stroud, J., Muller, P., and Sanso, B. (2001). “Dynamic models for spatio-temporal data.” Journal
of the Royal Statistical Society Series B, 63, 673–689.
Van Loan, C. F. and Pitsianis, N. (1993). Approximation with Kronecker Products, vol. 232, 293–
314. Springer.
Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications. 3rd ed.
Springer-Verlag Berlin Heidelberg.
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. Springer Series in
Statistics, 2nd ed. Springer.