
Computational Statistics and Data Analysis 67 (2013) 68–83


Bayesian computing with INLA: New features

Thiago G. Martins∗, Daniel Simpson, Finn Lindgren, Håvard Rue

Department of Mathematical Sciences, Norwegian University of Science and Technology, N-7491 Trondheim, Norway

Article history: Received 25 September 2012; received in revised form 23 April 2013; accepted 23 April 2013; available online 2 May 2013.

Keywords: Approximate Bayesian inference; INLA; Latent Gaussian models

Abstract

The INLA approach for approximate Bayesian inference for latent Gaussian models has been shown to give fast and accurate estimates of posterior marginals and also to be a valuable tool in practice via the R-package R-INLA. New developments in the R-INLA package are formalized and it is shown how these features greatly extend the scope of models that can be analyzed by this interface. The current default method in R-INLA to approximate the posterior marginals of the hyperparameters using only a modest number of evaluations of the joint posterior distribution of the hyperparameters, without any need for numerical integration, is discussed.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The Integrated Nested Laplace Approximation (INLA) is an approach proposed by Rue et al. (2009) to perform approximate fully Bayesian inference on the class of latent Gaussian models (LGMs). INLA makes use of deterministic nested Laplace approximations and, as an algorithm tailored to the class of LGMs, it provides a faster and more accurate alternative to simulation-based MCMC schemes. This is demonstrated in a series of examples ranging from simple to complex models in Rue et al. (2009). Although the theory behind INLA has been well established in Rue et al. (2009), the INLA method continues to be a research area in active development. Designing a tool that allows the user the flexibility to define their own model with a relatively easy to use interface is an important factor for the success of any approximate inference method. The R package INLA, hereafter referred to as R-INLA, provides this interface and allows users to specify and perform inference on complex LGMs.

The breadth of classical Bayesian problems covered under the LGM framework, and therefore handled by INLA, is – when coupled with the user-friendly R-INLA interface – a key element in the success of the INLA methodology. For example, INLA has been shown to work well with generalized linear mixed models (GLMM) (Fong et al., 2010), spatial GLMM (Eidsvik et al., 2009), Bayesian quantile additive mixed models (Yue and Rue, 2011), survival analysis (Martino, Akerkar et al., 2011), stochastic volatility models (Martino, Aas et al., 2011), generalized dynamic linear models (Ruiz-Cárdenas et al., 2011), change point models where data dependency is allowed within segments (Wyse et al., 2011), spatio-temporal disease mapping models (Schrödle and Held, 2011), models for complex spatial point pattern data that account for both local and global spatial behavior (Illian et al., 2011), and so on.

There has also been a considerable increase in the number of users who have found in INLA the possibility to fit models that they were otherwise unable to fit. More interestingly, those users come from areas that are sometimes completely unrelated to each other, such as econometrics, ecology, climate research, etc.

∗ Corresponding author. Tel.: +47 46937429.
E-mail addresses: [email protected], [email protected] (T.G. Martins), [email protected] (D. Simpson), [email protected] (F. Lindgren), [email protected] (H. Rue).
URLs: http://www.math.ntnu.no/~guerrera (T.G. Martins), http://www.math.ntnu.no/~daniesi/ (D. Simpson), http://www.math.ntnu.no/~finnkrl/ (F. Lindgren), http://www.math.ntnu.no/~hrue/ (H. Rue).
0167-9473 © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.csda.2013.04.014


Some examples are bi-variate meta-analysis of diagnostic studies (Paul et al., 2010), detection of under-reporting of cases in an evaluation of veterinary surveillance data (Schrödle et al., 2011), investigation of geographic determinants of reported human Campylobacter infections in Scotland (Bessell et al., 2010), the analysis of the impact of different social factors on the risk of acquiring infectious diseases in an urban setting (Wilking et al., 2012), analysis of animal space use metrics (Johnson et al., 2011), animal models used in evolutionary biology and animal breeding to identify the genetic part of traits (Holand et al., 2011), analysis of the relation between biodiversity loss and disease transmission across a broad, heterogeneous ecoregion (Haas et al., 2011), identification of areas in Toronto where spatially varying social or environmental factors could be causing higher incidence of lupus than would be expected given the population (Li et al., 2011), and spatio-temporal modeling of particulate matter concentration in the North-Italian region Piemonte (Cameletti et al., 2012). The relative black-box format of INLA allows it to be embedded in external tools for a more integrated data analysis. For example, Beale et al. (2010) mention that INLA has been used by tools embedded in a Geographical Information System (GIS) to evaluate the spatial relationships between health and environment data. The model selection measures available in INLA are also much appreciated in the applied work mentioned so far. Such quantities include the marginal likelihood, the deviance information criterion (DIC) (Spiegelhalter et al., 2002), and other predictive measures.

Some extensions to the work of Rue et al. (2009) have also been presented in the literature: Hosseini et al. (2011) extend the INLA approach to fit spatial GLMM with skew normal priors for the latent variables instead of the more standard normal priors, Sørbye and Rue (2010) extend the use of INLA to joint inference and present an algorithm to derive analytical simultaneous credible bands for subsets of the latent field based on approximating the joint distribution of the subsets by multivariate Gaussian mixtures, Martins and Rue (2012) extend INLA to fit models where independent components of the latent field can have non-Gaussian distributions, and Cseke and Heskes (2011) discuss variations of the classic Laplace-approximation idea based on alternative Gaussian approximations (see also Rue et al. (2009, pp. 386–387) for a discussion on this issue).

Many advances have been made in the area of spatial and spatial–temporal models; Eidsvik et al. (2011) address the issue of approximate Bayesian inference for large spatial datasets by combining the use of predictive process models as a reduced-rank spatial process to diminish the dimensionality of the model and the use of INLA to fit these reduced-rank models. INLA blends well with the work of Lindgren et al. (2011), where an explicit link between Gaussian Fields (GFs) and Gaussian Markov Random Fields (GMRFs) allows the modeling of spatial and spatio-temporal data to be done with continuously indexed GFs while the computations are carried out with GMRFs, using INLA as the inferential algorithm.

The INLA methodology requires some expertise in numerical methods and computer programming to be implemented, since all procedures required to perform INLA need to be carefully implemented to achieve a good speed. This can, at first, be considered a disadvantage when compared with other approximate methods such as (naive) MCMC schemes that are much easier to implement, at least on a case-by-case basis. To overcome this, the R-INLA package was developed to provide an easy-to-use interface to the stand-alone C coded inla program.1 To download the package one only needs one line of R code, which can be found in the download section of the INLA website (http://www.r-inla.org/). In addition, the website contains several worked-out examples, papers and even the complete source code of the project.

In Rue et al. (2009) most of the attention was focused on the computation of the posterior marginals of the elements of the latent field, since those are usually the biggest challenge when dealing with LGMs given the high dimension of the latent field usually found in the models of interest. On the other hand, it was mentioned that the posterior marginals of the unknown parameters not in the latent field, hereafter referred to as hyperparameters, are obtained via numerical integration of an interpolant constructed from evaluations of the Laplace approximation of the joint posterior of the hyperparameters already computed in the computation of the posterior marginals of the latent field. However, details of such an interpolant were not given. The first part of the paper will show how to construct this interpolant in a cost-effective way. Besides that, we will describe the algorithm currently in use in the R-INLA package that completely bypasses the need for numerical integration, providing accuracy and scalability.

Unfortunately, when an interface is designed, a compromise must be made between simplicity and generality, meaning that in order to build a simple to use interface, some models that could be handled by the INLA method might not be available through that interface, hence not available to the general user. The second part of the paper will formalize some new developments already implemented in the R-INLA package and show how these new features greatly extend the scope of models available through that interface. It is important to keep in mind the difference between the models that can be analyzed by the INLA method and the models that can be analyzed through the R-INLA package. The latter is contained within the first, which means that not every model that can be handled by the INLA method is available through the R-INLA interface. Therefore, this part of the paper will formalize tools that extend the scope of models within R-INLA that were already available within the theoretical framework of the INLA method.

Section 2 will present an overview of the latent Gaussian models and of the INLA methodology. Section 3 will address the issue of computing the posterior marginal of the hyperparameters using a novel approach. A number of new features

1 The dependency on the stand-alone C program is the reason why R-INLA is not available on CRAN.


already implemented in the R-INLA package will be formalized in Section 4 together with examples highlighting their usefulness.

2. Integrated nested Laplace approximation

In Section 2.1 we define latent Gaussian models using a hierarchical structure, highlighting the assumptions required for a model to be used within the INLA framework, and point out which components of the model formulation will be made more flexible with the features presented in Section 4. Section 2.2 gives a brief description of the INLA approach and presents the task of approximating the posterior marginals of the hyperparameters that will be formalized in Section 3. A basic description of the R-INLA package is given in Section 2.3, mainly to situate the reader when going through the extensions in Section 4.

2.1. Latent Gaussian models

The INLA framework was designed to deal with latent Gaussian models, where the observation (or response) variable $y_i$ is assumed to belong to a distribution family (not necessarily part of the exponential family) where some parameter of the family $\phi_i$ is linked to a structured additive predictor $\eta_i$ through a link function $g(\cdot)$, so that $g(\phi_i) = \eta_i$. The structured additive predictor $\eta_i$ accounts for the effects of various covariates in an additive way:

$$\eta_i = \alpha + \sum_{j=1}^{n_f} f^{(j)}(u_{ji}) + \sum_{k=1}^{n_\beta} \beta_k z_{ki} + \epsilon_i, \qquad (1)$$

where the $\{f^{(j)}(\cdot)\}$'s are unknown functions of the covariates $u$, used for example to relax the linear relationship of covariates and to model temporal and/or spatial dependence, the $\{\beta_k\}$'s represent the linear effect of covariates $z$ and the $\{\epsilon_i\}$'s are unstructured terms. Then a Gaussian prior is assigned to $\alpha$, $\{f^{(j)}(\cdot)\}$, $\{\beta_k\}$ and $\{\epsilon_i\}$.

We can also write the model described above using a hierarchical structure, where the first stage is formed by the likelihood function with conditional independence properties given the latent field $x = (\eta, \alpha, f, \beta)$ and possible hyperparameters $\theta_1$, where each data point $\{y_i, i = 1, \dots, n_d\}$ is connected to one element in the latent field $x_i$. Assuming that the elements of the latent field connected to the data points are positioned on the first $n_d$ elements of $x$, we have the first stage.

Stage 1. $y \mid x, \theta_1 \sim \pi(y \mid x, \theta_1) = \prod_{i=1}^{n_d} \pi(y_i \mid x_i, \theta_1)$.

Two new features relaxing the assumptions of Stage 1 within the R-INLA package will be presented in Section 4. Section 4.1 will show how to fit models where different subsets of data come from different sources (i.e. different likelihoods) and Section 4.5 will show how to relax the assumption that each observation can only depend on one element of the latent field and allow it to depend on a linear combination of the elements in the latent field.

The conditional distribution of the latent field $x$ given some possible hyperparameters $\theta_2$ forms the second stage of the model and has a joint Gaussian distribution.

Stage 2. $x \mid \theta_2 \sim \pi(x \mid \theta_2) = N(x; \mu(\theta_2), Q^{-1}(\theta_2))$,

where $N(\cdot; \mu, Q^{-1})$ denotes a multivariate Gaussian distribution with a mean vector $\mu$ and a precision matrix $Q$. In most applications, the latent Gaussian fields have conditional independence properties, which translates into a sparse precision matrix $Q(\theta_2)$, which is of extreme importance for the numerical algorithms that will follow. A multivariate Gaussian distribution with a sparse precision matrix is known as a Gaussian Markov Random Field (GMRF) (Rue and Held, 2005). The latent field $x$ may have additional linear constraints of the form $Ax = e$ for a $k \times n$ matrix $A$ of rank $k$, where $k$ is the number of constraints and $n$ is the size of the latent field. Stage 2 is very general and can accommodate an enormous number of latent field structures. Sections 4.2, 4.3 and 4.6 will formalize new features of the R-INLA package that give the user greater flexibility to define these latent field structures, i.e. enable them to define complex latent fields from simpler GMRF building blocks.
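To make the sparsity remark concrete, the following minimal sketch (our illustration, not code from the paper) builds the tridiagonal precision matrix of a stationary AR(1) GMRF with the Matrix package; ar1_precision is a hypothetical helper name:

library(Matrix)

# Sparse precision matrix of a stationary AR(1) GMRF with innovation
# precision kappa: tridiagonal, so only O(n) entries are non-zero.
ar1_precision = function(n, phi, kappa = 1) {
  d = c(1, rep(1 + phi^2, n - 2), 1)   # diagonal entries
  Q = bandSparse(n, k = c(-1, 0, 1),
                 diagonals = list(rep(-phi, n - 1), d, rep(-phi, n - 1)))
  kappa * Q
}

Q = ar1_precision(100, phi = 0.5)  # 100 x 100, but only ~300 stored values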

The hierarchical model is then completed with an appropriate prior distribution for the hyperparameters of the model $\theta = (\theta_1, \theta_2)$.

Stage 3. $\theta \sim \pi(\theta)$.

2.2. INLA methodology

For the hierarchical model described in Section 2.1, the joint posterior distribution of the unknowns then reads

$$\pi(x, \theta \mid y) \propto \pi(\theta)\,\pi(x \mid \theta) \prod_{i=1}^{n_d} \pi(y_i \mid x_i, \theta) \propto \pi(\theta)\,|Q(\theta)|^{1/2} \exp\Big(-\frac{1}{2} x^T Q(\theta) x + \sum_{i=1}^{n_d} \log\{\pi(y_i \mid x_i, \theta)\}\Big)$$

Page 4: Bayesian computing with INLA: New features

T.G. Martins et al. / Computational Statistics and Data Analysis 67 (2013) 68–83 71

and the marginals of interest can be defined as

$$\pi(x_i \mid y) = \int \pi(x_i \mid \theta, y)\,\pi(\theta \mid y)\,d\theta, \quad i = 1, \dots, n$$
$$\pi(\theta_j \mid y) = \int \pi(\theta \mid y)\,d\theta_{-j}, \quad j = 1, \dots, m,$$

while the approximated posterior marginals of interest $\tilde{\pi}(x_i \mid y)$, $i = 1, \dots, n$, and $\tilde{\pi}(\theta_j \mid y)$, $j = 1, \dots, m$, returned by INLA have the following form:

$$\tilde{\pi}(x_i \mid y) = \sum_k \tilde{\pi}(x_i \mid \theta^{(k)}, y)\,\tilde{\pi}(\theta^{(k)} \mid y)\,\Delta\theta^{(k)} \qquad (2)$$
$$\tilde{\pi}(\theta_j \mid y) = \int \tilde{\pi}(\theta \mid y)\,d\theta_{-j} \qquad (3)$$

where $\{\tilde{\pi}(\theta^{(k)} \mid y)\}$ are the density values computed during a grid exploration on $\tilde{\pi}(\theta \mid y)$.

Looking at (2)–(3), we can see that the method can be divided into three main tasks: firstly, propose an approximation $\tilde{\pi}(\theta \mid y)$ to the joint posterior of the hyperparameters $\pi(\theta \mid y)$; secondly, propose an approximation $\tilde{\pi}(x_i \mid \theta, y)$ to the marginals of the conditional distribution of the latent field given the data and the hyperparameters $\pi(x_i \mid \theta, y)$; and finally, explore $\tilde{\pi}(\theta \mid y)$ on a grid and use it to integrate out $\theta$ in Eq. (2) and $\theta_{-j}$ in Eq. (4).

Since we do not have $\tilde{\pi}(\theta \mid y)$ evaluated at all points required to compute the integral in Eq. (3), we construct an interpolation $I(\theta \mid y)$ using the density values $\{\tilde{\pi}(\theta^{(k)} \mid y)\}$ computed during the grid exploration on $\tilde{\pi}(\theta \mid y)$ and approximate (3) by

$$\tilde{\pi}(\theta_j \mid y) = \int I(\theta \mid y)\,d\theta_{-j}. \qquad (4)$$

Details on how to construct such an interpolant were not given in Rue et al. (2009). Besides describing the interpolation algorithm used to compute Eq. (4), Section 3 will present a novel approach to compute $\tilde{\pi}(\theta_j \mid y)$ that bypasses numerical integration.

The approximation used for the joint posterior of the hyperparameters $\pi(\theta \mid y)$ is

$$\tilde{\pi}(\theta \mid y) \propto \left.\frac{\pi(x, \theta, y)}{\pi_G(x \mid \theta, y)}\right|_{x = x^*(\theta)} \qquad (5)$$

where $\pi_G(x \mid \theta, y)$ is a Gaussian approximation to the full conditional of $x$ obtained by matching the modal configuration and the curvature at the mode, and $x^*(\theta)$ is the mode of the full conditional for $x$, for a given $\theta$. Expression (5) is equivalent to the Laplace approximation of a marginal posterior distribution (Tierney and Kadane, 1986), and it is exact if $\pi(x \mid y, \theta)$ is a Gaussian.
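To fix ideas, here is a minimal sketch of Eq. (5) for a scalar latent variable (our illustration under simplifying assumptions, not the inla implementation); log_joint is a hypothetical user-supplied function returning $\log \pi(x, \theta, y)$:

log_post_theta = function(theta, y, log_joint) {
  # mode and curvature of x -> log pi(x, theta, y) for this theta
  opt = optim(0, function(x) -log_joint(x, theta, y),
              method = "BFGS", hessian = TRUE)
  x.star = opt$par
  H = drop(opt$hessian)  # precision of the Gaussian approximation
  # log pi(x*, theta, y) - log pi_G(x* | theta, y); a Gaussian evaluated
  # at its own mode has log-density 0.5 * (log(H) - log(2 * pi))
  log_joint(x.star, theta, y) - 0.5 * (log(H) - log(2 * pi))
}

Evaluating this unnormalized log posterior over a grid of theta values mimics the exploration of $\tilde{\pi}(\theta \mid y)$ discussed below.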

For $\pi(x_i \mid \theta, y)$, three options are available, and they vary in terms of speed and accuracy. The fastest option, $\pi_G(x_i \mid \theta, y)$, is to use the marginals of the Gaussian approximation $\pi_G(x \mid \theta, y)$ already computed when evaluating expression (5). The only extra cost to obtain $\pi_G(x_i \mid \theta, y)$ is to compute the marginal variances from the sparse precision matrix of $\pi_G(x \mid \theta, y)$; see Rue et al. (2009) for details. The Gaussian approximation often gives reasonable results, but there can be errors in the location and/or errors due to the lack of skewness (Rue and Martino, 2007). The more accurate approach would be to perform again a Laplace approximation, denoted by $\pi_{LA}(x_i \mid \theta, y)$, with a form similar to expression (5):

$$\pi_{LA}(x_i \mid \theta, y) \propto \left.\frac{\pi(x, \theta, y)}{\pi_{GG}(x_{-i} \mid x_i, \theta, y)}\right|_{x_{-i} = x^*_{-i}(x_i, \theta)}, \qquad (6)$$

where $x_{-i}$ represents the vector $x$ with its $i$-th element excluded, $\pi_{GG}(x_{-i} \mid x_i, \theta, y)$ is the Gaussian approximation to $x_{-i} \mid x_i, \theta, y$ and $x^*_{-i}(x_i, \theta)$ is the modal configuration. A third option $\pi_{SLA}(x_i \mid \theta, y)$, called the simplified Laplace approximation, is obtained by doing a Taylor expansion on the numerator and denominator of expression (6) up to third order, thus correcting the Gaussian approximation for location and skewness with a much lower cost when compared to $\pi_{LA}(x_i \mid \theta, y)$. We refer to Rue et al. (2009) for a detailed description of the Gaussian, Laplace and simplified Laplace approximations to $\pi(x_i \mid \theta, y)$.

2.3. R-INLA interface

In this section we present the general structure of the R-INLA package, since the reader will benefit from this when reading the extensions proposed in Section 4. The syntax for the R-INLA package is based on the built-in glm function in R, and a basic call starts with

formula = y ~ a + b + a:b + c*d + f(idx1, model1, ...) + f(idx2, model2, ...)


where formula describes the structured additive linear predictor described in Eq. (1). Here, y is the response variable, and the term a + b + a:b + c*d holds similar meaning as in the built-in glm function in R and is then responsible for the fixed effects' specification. The f() terms specify the general Gaussian random-effect components of the model and represent the smooth functions $\{f^{(j)}(\cdot)\}$ in Eq. (1). In this case we say that both idx1 and idx2 are latent building blocks that are combined together to form a joint latent Gaussian model of interest. Once the linear predictor is specified, a basic call to fit the model with R-INLA takes the following form:

result = inla(formula, data = data.frame(y, a, b, c, d, idx1, idx2),
              family = "gaussian")

After the computations the variable result will hold an S3 object of class "inla", from which summaries, plots, and posterior marginals can be obtained. We refer to the package website http://www.r-inla.org for more information about model components available to use inside the f() functions as well as more advanced arguments to be used within the inla() function.
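As a brief illustration of the kind of post-processing such an object supports, the following hedged sketch shows common accessor patterns (component names follow the usual R-INLA conventions; the covariate name "a" is taken from the toy formula above):

summary(result)                      # overview: fixed effects, hyperparameters
result$summary.fixed                 # posterior summaries of the fixed effects
m = result$marginals.fixed[["a"]]    # marginal density of the coefficient of a
inla.emarginal(function(x) x, m)     # posterior mean computed from the marginal
inla.qmarginal(c(0.025, 0.975), m)   # posterior quantiles from the marginal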

3. On the posterior marginals for the hyperparameters

This section starts by describing the grid exploration required to integrate out the uncertainty with respect to $\theta$ when computing the posterior marginals of the latent field. It also presents two algorithms that can be used to compute the posterior marginals of the hyperparameters with little additional cost by using the points of the joint density of the hyperparameters already evaluated during the grid exploration.

3.1. Grid exploration

The main focus in Rue et al. (2009) lies on approximating posterior marginals for the latent field. In this context, $\pi(\theta \mid y)$ is used to integrate out uncertainty with respect to $\theta$ when approximating $\pi(x_i \mid y)$. For this task we do not need a detailed exploration of $\pi(\theta \mid y)$ as long as we are able to select good evaluation points for the numerical solution of Eq. (2). Rue et al. (2009) propose two different exploration schemes to perform the integration.

Both schemes require a reparametrization of $\theta$-space in order to make the density more regular; we denote such a parametrization as the z-parametrization throughout the paper. Assuming $\theta = (\theta_1, \dots, \theta_m) \in \mathbb{R}^m$, which can always be obtained by ad-hoc transformations of each element of $\theta$, we proceed as follows:

1. Find the mode $\theta^*$ of $\tilde{\pi}(\theta \mid y)$ and compute the negative Hessian $H$ at the modal configuration.
2. Compute the eigen-decomposition $\Sigma = V \Lambda V^T$, where $\Sigma = H^{-1}$.
3. Define a new z-variable such that

$$\theta(z) = \theta^* + V \Lambda^{1/2} z.$$

The variable $z = (z_1, \dots, z_m)$ is standardized and its components are mutually orthogonal.

At this point, if the dimension of $\theta$ is small, say $m \le 5$, Rue et al. (2009) propose to use the z-parametrization to build a grid covering the area where the density of $\tilde{\pi}(\theta \mid y)$ is higher. Such a procedure has a computational cost which grows exponentially with $m$. It turns out that, when the goal is $\pi(x_i \mid y)$, a rather rough grid is enough to give accurate results.
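A minimal numerical sketch of this reparametrization (our illustration, not the paper's code), assuming the mode theta.star and the negative Hessian H have already been found, e.g. with optim():

z_param = function(theta.star, H) {
  # theta(z) = theta* + V Lambda^{1/2} z, from Sigma = H^{-1} = V Lambda V^T
  Sigma = solve(H)
  e = eigen(Sigma, symmetric = TRUE)
  V = e$vectors
  L.half = diag(sqrt(e$values), nrow = length(e$values))
  function(z) drop(theta.star + V %*% L.half %*% z)
}

theta.of.z = z_param(c(0, 0), matrix(c(2, 0.5, 0.5, 1), 2, 2))  # toy 2-d case
theta.of.z(c(1, -1))  # one standardized step along each transformed axis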

If the dimension of $\theta$ is higher, Rue et al. (2009) propose a different approach, named CCD integration. Here the integration problem is considered as a design problem and, using the mode $\theta^*$ and the negative Hessian $H$ as a guide, we locate some ''points'' in the $m$-dimensional space which allow us to approximate the unknown function with a second order surface (see Section 6.5 of Rue et al. (2009)). The CCD strategy requires much less computational power compared to the grid strategy but, when the goal is $\pi(x_i \mid y)$, it still allows us to capture variability in the hyperparameter space when this is too wide to be explored via the grid strategy.

Fig. 1 shows the location of the integration points in a two-dimensional θ-space using the grid and the CCD strategy.

3.2. Algorithms for computing π̃(θj|y)

If the dimension of $\theta$ is not too high, it is possible to evaluate $\tilde{\pi}(\theta \mid y)$ on a regular grid and use the resulting values to numerically compute the integral in Eq. (3) by summing out the variables $\theta_{-j}$. Of course this is a naive solution in which the cost to obtain $m$ such marginals would increase exponentially in $m$. A more elaborate solution would be to use a Laplace approximation

$$\pi(\theta_j \mid y) \approx \left.\frac{\pi(\theta \mid y)}{\pi_G(\theta_{-j} \mid \theta_j, y)}\right|_{\theta_{-j} = \theta^*_{-j}} \qquad (7)$$


Fig. 1. Location of the integration points in a two-dimensional $\theta$-space using (a) the grid and (b) the CCD strategy.

where $\theta^*_{-j}$ is the modal configuration of $\pi(\theta_{-j} \mid \theta_j, y)$ and $\pi_G(\theta_{-j} \mid \theta_j, y)$ is a Gaussian approximation to $\pi(\theta_{-j} \mid \theta_j, y)$ built by matching the mode and the curvature at the mode. This would certainly give us accurate results, but it requires finding the maximum of the $(m-1)$-dimensional function $\pi(\theta_{-j} \mid \theta_j, y)$ for each value of $\theta_j$, which again does not scale well with the dimension $m$ of the problem. Besides that, the Hessian computed at the numerically computed ''mode'' of $\pi(\theta_{-j} \mid \theta_j, y)$ was not always positive definite, which became a major issue. It is worth pointing out that in latent Gaussian models of interest, the dimension of the latent field is usually quite big, which makes the evaluation of $\tilde{\pi}(\theta \mid y)$ given by Eq. (5) expensive. With that in mind, it is useful to build and use algorithms that use the density points already evaluated in the grid exploration of $\tilde{\pi}(\theta \mid y)$ as described in Section 3.1. Remember that those grid points already had to be computed in order to integrate out the uncertainty about $\theta$ using Eq. (2), so that algorithms that use those points to compute the posterior marginals for $\theta$ would be doing so with little extra cost.

3.2.1. Asymmetric Gaussian interpolation

Some information about the marginals $\pi(\theta_j \mid y)$ can be obtained by approximating the joint distribution $\pi(\theta \mid y)$ with a multivariate Normal distribution by matching the mode and the curvature at the mode of $\tilde{\pi}(\theta \mid y)$. Such a Gaussian approximation for $\pi(\theta_j \mid y)$ comes with no extra computational effort, since the mode $\theta^*$ and the negative Hessian $H$ of $\tilde{\pi}(\theta \mid y)$ have already been computed in the numerical strategy used to approximate Eq. (2), as described in Section 3.1.

Unfortunately, $\pi(\theta_j \mid y)$ can be rather skewed, so that a Gaussian approximation is inaccurate. It is possible to correct the Gaussian approximation for the lack of asymmetry, with minimal additional costs, as described in the following.

Let $z(\theta) = (z_1(\theta), \dots, z_m(\theta))$ be the point in the z-parametrization corresponding to $\theta$. We define the function $f(\theta)$ as

$$f(\theta) = \prod_{j=1}^{m} f_j(z_j(\theta)) \qquad (8)$$

where

$$f_j(z) \propto \begin{cases} \exp\left(-\dfrac{z^2}{2\sigma_{j+}^2}\right) & \text{if } z \ge 0 \\[1ex] \exp\left(-\dfrac{z^2}{2\sigma_{j-}^2}\right) & \text{if } z < 0. \end{cases} \qquad (9)$$

In order to capture some of the asymmetry of $\pi(\theta \mid y)$ we allow the scaling parameters $(\sigma_{j+}, \sigma_{j-})$, $j = 1, \dots, m$, to vary not only according to the $m$ different axes but also according to the direction, positive and negative, of each axis. To compute these, we first note that in a Gaussian density, the drop in log-density when we move from the mode to $\pm 2$ times the standard deviation is $-2$. We compute our scaling parameters in such a way that this is approximately true for all directions. We do this while exploring $\pi(\theta \mid y)$ to solve Eq. (2), meaning that no extra cost is required. An illustration of this process is given in Fig. 2.

Approximations for $\pi(\theta_j \mid y)$ are then computed via numerical integration of Eq. (8), which is easy to do once the scaling parameters are known. Fig. 3 illustrates the flexibility of $f_j(z)$ in Eq. (9) for different values of $\sigma_-$ and $\sigma_+$.
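To make Eq. (9) and the ''$-2$ drop'' rule concrete, here is a small illustrative sketch (ours, not the package's internals); lp is a hypothetical unnormalized log-posterior along one standardized axis, with its mode at zero:

# Split-Gaussian f_j(z) of Eq. (9).
f_split = function(z, s.plus, s.minus) {
  s = ifelse(z >= 0, s.plus, s.minus)
  exp(-z^2 / (2 * s^2))
}

# For one direction (sign = +1 or -1), find sigma such that the log-density
# drops by 2 when moving 2 * sigma away from the mode, as in a Gaussian.
find_sigma = function(lp, sign = 1) {
  drop2 = function(s) (lp(0) - lp(sign * 2 * s)) - 2
  uniroot(drop2, interval = c(1e-3, 1e3))$root
}

lp = function(z) -0.5 * z^2 - 0.1 * z^3 * (z > 0)  # toy skewed log-posterior
s.plus = find_sigma(lp, +1)
s.minus = find_sigma(lp, -1)
curve(f_split(x, s.plus, s.minus), -4, 4)  # compare with Fig. 3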

This algorithm was successfully used in the R-INLA package for a long time, and our experience is that it gives accurate results with low computational time. However, we came to realize that the multi-dimensional numerical integration algorithms available to integrate out $\theta_{-j}$ in Eq. (8) get increasingly unstable as we start to fit models with a higher number of hyperparameters, resulting in approximated posterior marginal densities with undesirable spikes instead of smooth ones. This has led us to look for an algorithm that gives us accurate and fast approximations without the need to use those multi-dimensional integration algorithms, and we now describe our proposed solution.


Fig. 2. Schematic picture of the process to compute the scaling parameters that determine the form of the asymmetric Gaussian function given by Eq. (9). The solid line is the log-density of the distribution we want to approximate, and the scaling parameters $\sigma_1$ and $\sigma_2$ are obtained according to a $-2$ drop in the target log-density.

Fig. 3. Standard normal distribution (solid line) and densities given by Eq. (9) for different values of the scaling parameters (dashed lines).

3.2.2. Numerical integration free algorithm

The approximated posterior marginals $\tilde{\pi}(\theta_j \mid y)$ returned by the new numerical integration free algorithm will assume the following structure:

$$\tilde{\pi}(\theta_j \mid y) = \begin{cases} N(0, \sigma_{j+}^2), & \theta_j > 0 \\ N(0, \sigma_{j-}^2), & \theta_j \le 0 \end{cases} \qquad (10)$$

and the question now becomes how to compute $\sigma_{j+}^2, \sigma_{j-}^2$, $j = 1, \dots, m$, without using numerical integration as in Section 3.2.1. The following lemma will be useful for that (Rue et al., 2009).

Lemma 1. Let $x = (x_1, \dots, x_n)^T \sim N(0, \Sigma)$; then for all $x_1$

$$-\frac{1}{2}\big(x_1, E(x_{-1} \mid x_1)^T\big)\,\Sigma^{-1} \begin{pmatrix} x_1 \\ E(x_{-1} \mid x_1) \end{pmatrix} = -\frac{1}{2}\frac{x_1^2}{\Sigma_{11}}.$$

The lemma above can be used in our favor, since it states that the joint distribution of $\theta$ as a function of $\theta_i$ and $\theta_{-i}$ evaluated at the conditional mean $E(\theta_{-i} \mid \theta_i)$ behaves as the marginal of $\theta_i$. In our case this will be an approximation, since $\theta$ is not Gaussian.

For each axis $j = 1, \dots, m$ our algorithm will compute the conditional mean $E(\theta_{-j} \mid \theta_j)$ assuming $\theta$ to be Gaussian, which is linear in $\theta_j$ and depends only on the mode $\theta^*$ and covariance $\Sigma$ already computed in the grid exploration of Section 3.1, and then use Lemma 1 to explore the approximated posterior marginal of $\theta_j$ in each direction of the axis. For each direction of the axis we only need to evaluate three points of this approximated marginal given by Lemma 1, which is enough to compute the second derivative and with that get the standard deviations $\sigma_{j-}$ and $\sigma_{j+}$ required to represent Eq. (10).


Fig. 4. Posterior distribution for the hyperparameters in the replicate example with a vertical line to indicate the true values used in the simulation. Solid line computed by the numerical integration free algorithm and dashed line computed via the more expensive grid exploration. (a) Gaussian observational precision, (b) precision of the AR(1) latent model, (c) persistence parameter for the AR(1) process.

Example 1. To illustrate the difference in accuracy between the numerical integration free algorithm and the posterior marginals obtained via a more computationally intensive grid exploration, we show in Fig. 4 the posterior marginals of the hyperparameters of Example 3 computed by the former (solid line) and by the latter (dashed line). We can see that we lose accuracy when using the numerical integration free algorithm, but it still gives us sensible results with almost no extra computation time, while we need to perform a second finer grid exploration to obtain a more accurate result via the grid method, an operation that can take a long time in examples with a high dimension of the latent field and/or hyperparameters. The numerical integration free algorithm is the default method to compute the posterior marginals for the hyperparameters. In order to get more accurate results via the grid method the user needs to pass the output of the inla function to the inla.hyperpar function. For example, to generate the marginals computed by the grid method in Fig. 4 we have used the following.

result.hyperpar = inla.hyperpar(result)

The asymmetric Gaussian interpolation can still be used through the control.inla argument.

inla(..., control.inla = list(interpolator = "ccdintegrate"), ...)

4. Extending the scope of INLA

This section formalizes several features available within the R-INLA package that greatly extend the scope of models available through that interface. The features are illustrated with small examples that help us to understand their usefulness and to apply them through the R code available along the paper.

4.1. Multiple likelihoods

In many applications, different subsets of data may have been generated by different sources, leading us to be interested in models where each subset of data may be described by a different likelihood function. Here different likelihood functions might mean either a different family of distribution, as for example when a subset of the data follows a Gaussian distribution and the other follows a Poisson distribution, or the same family of distribution but with different hyperparameters, as for example when one subset of the data comes from a Gaussian distribution with unknown precision $\tau_1$ and the other from a Gaussian with unknown precision $\tau_2$. Concrete examples of the usefulness of this feature can be found in Guo and Carlin (2004), where longitudinal and event time data are jointly modeled, or in the preferential sampling framework of Diggle et al. (2010), where geostatistical models with stochastic dependence between the continuous measurements and the locations


at which the measurements were made are presented. The R code for the examples presented in those papers can be found in the case studies section at the INLA website.

Although this is a very useful feature, models with multiple likelihoods are not straightforward, if at all possible, to implement through many of the popular packages available in R. From a theoretical point of view there is nothing that keeps us from fitting a model with multiple likelihoods with the INLA approach. The only requirements, as described in Section 2.1, are that the likelihood function must have conditional independence properties given the latent field $x$ and hyperparameters $\theta_1$, and that each data point $y_i$ must be connected to one element in the latent field $x_i$, so that

$$\pi(y \mid x, \theta_1) = \prod_{i=1}^{n_d} \pi(y_i \mid x_i, \theta_1).$$

Even this last restriction will be made more flexible in Section 4.5, where each data point $y_i$ may be connected with a linear combination of the elements in the latent field.

Models with multiple likelihoods can be fitted through the R-INLA package by rewriting the response variable as a matrix (or list) where the number of columns (or elements in the list) is equal to the number of different likelihood functions. The following small example will help us to illustrate the process.

Example 2. Suppose we have a dataset y with $2n$ elements, where the first $n$ data points come from a binomial experiment and the last $n$ data points come from a Poisson distribution. In this case the response variable to be used as input to the inla() function must be written as a matrix with two columns and $2n$ rows, where the first $n$ elements of the first column hold the binomial data while the last $n$ elements of the second column hold the Poisson data, and all other elements of the matrix should be filled with NA. Following is the R code to simulate data following the description above together with the R-INLA code to fit the appropriate model to the simulated data.

n = 100
x1 = runif(n)
eta1 = 1 + x1
y1 = rbinom(n, size = 1, prob = exp(eta1)/(1 + exp(eta1))) # binomial data
x2 = runif(n)
eta2 = 1 + x2
y2 = rpois(n, exp(eta2)) # Poisson data
Y = matrix(NA, 2*n, 2) # need the response variable as matrix
Y[1:n, 1] = y1 # binomial data
Y[1:n + n, 2] = y2 # Poisson data
Ntrials = c(rep(1, n), rep(NA, n)) # required only for binomial data
xx = c(x1, x2)
formula = Y ~ 1 + xx
result = inla(formula, data = list(Y = Y, xx = xx),
              family = c("binomial", "Poisson"), Ntrials = Ntrials)
summary(result)
plot(result)

4.2. Replicate feature

The replicate feature in R-INLA allows us to define models where the latent field x contains conditionally independent replications of the same latent model given some hyperparameters. Assume for example that $z_1$ and $z_2$ are independent replications from $z \mid \theta$, such that $x = (z_1, z_2)$ and

$$\pi(x \mid \theta) = \pi(z_1 \mid \theta)\,\pi(z_2 \mid \theta). \qquad (11)$$

It is important to note here that although the processes $z_1$ and $z_2$ are conditionally independent given $\theta$, they both convey information about $\theta$. A latent model such as (11) can be defined in the R-INLA package using the replicate argument inside the f() function used to specify the random-effect components, as described in Section 2.3.

Example 3. Let us define the following AR(1) process

$$x_1 \sim N(0, (\kappa(1 - \phi^2))^{-1})$$
$$x_i = \phi x_{i-1} + \epsilon_i; \quad \epsilon_i \sim N(0, \kappa^{-1}), \quad i = 2, \dots, n$$

with $\phi$ and $\kappa$ being unknown hyperparameters satisfying $|\phi| < 1$ and $\kappa > 0$. Denote by $\tau$ the marginal precision of the process, $\tau = \kappa(1 - \phi^2)$. Now assume two conditionally independent realizations $z_1$ and $z_2$ of the AR(1) process defined above given the hyperparameters $\theta = (\phi, \tau)$. We are then given a dataset $y$ with $2n$ elements, where the first $n$ elements come from a Poisson with intensity parameters given by $\exp(z_1)$ and the last $n$ elements of the dataset come from a Gaussian with mean $z_2$.


Fig. 5. Replicate example: (a) simulated $z_1$ process (solid line) together with posterior means and (0.025, 0.975) quantiles returned by INLA (dashed line), (b) simulated $z_2$ process (solid line) together with posterior means and (0.025, 0.975) quantiles returned by INLA (dashed line).

The latent model $x = (z_1, z_2)$ described here can be specified with a two-dimensional index $(i, r)$, where $i$ is the position index within each process and $r$ is the index labeling the process. Following is the INLA code to fit the model we just described to simulated data with $\phi = 0.5$, $\kappa = \sqrt{2}$ and Gaussian observational precision $\tau_{obs} = 3$. Priors for the hyperparameters were chosen following the guidelines described in Fong et al. (2010). Fig. 5 shows the simulated $z = (z_1, z_2)$ (solid line) together with posterior means and (0.025, 0.975) quantiles (dashed line) returned by INLA. Fig. 6 shows the posterior distributions for the hyperparameters returned by INLA with a vertical line to indicate true values used in the simulation.

n = 100
z1 = arima.sim(n, model = list(ar = 0.5), sd = 0.5) # independent replications
z2 = arima.sim(n, model = list(ar = 0.5), sd = 0.5) # from the AR(1) process
y1 = rpois(n, exp(z1))
y2 = rnorm(n, mean = z2, sd = 1/sqrt(3))
y = matrix(NA, 2*n, 2) # setting up the matrix due to multiple likelihoods
y[1:n, 1] = y1
y[n + 1:n, 2] = y2
hyper.gaussian = list(prec = list(prior = "loggamma", # prior for the Gaussian
                                  param = c(1, 0.2161))) # likelihood precision
hyper.ar1 = list(prec = list(prior = "loggamma", # priors for the
                             param = c(1, 0.2161)), # 'ar1' model
                 rho = list(prior = "normal",
                            param = c(0, 0.3)))
i = rep(1:n, 2) # position index within each process
r = rep(1:2, each = n) # index to label the process
formula = y ~ f(i, model = "ar1", replicate = r, hyper = hyper.ar1) - 1
result = inla(formula, family = c("Poisson", "gaussian"),
              data = list(y = y, i = i, r = r),
              control.family = list(list(), list(hyper = hyper.gaussian)))
summary(result)
plot(result)

4.3. Copy feature

The formula syntax as illustrated in Section 2.3 allows only one element from each latent model to contribute to the linear predictor specification, so that a model formulation such as

formula = y ~ f(idx1, model1, ...) + f(idx2, model2, ...)

indicates that each data point $y_i$ is connected to one linear predictor $\eta_i$ through a given link function $g$ and that each $\eta_i$ is connected to one element of the random effect idx1 and to one element of the random effect idx2. Unfortunately this is not always enough, as illustrated in the example below.

Example 4. Suppose our data come from a Gaussian distribution $y_i \sim N(\eta_i, \tau^{-1})$, $i = 1, \dots, n$, where the linear predictor $\eta_i$ assumes the following form:

$$\eta_i = a_i + b_i z_i, \qquad (a_i, b_i) \overset{iid}{\sim} N_2(0, Q^{-1}),$$


Fig. 6. Posterior distribution for the hyperparameters in the replicate example with a vertical line to indicate the true values used in the simulation. (a) Gaussian observational precision, (b) precision of the AR(1) latent model, (c) lag-one correlation for the AR(1) process.

where $z$ here represents known covariates. The bi-variate Gaussian model $N_2(0, Q^{-1})$ is defined in R-INLA by f(i, model = "iid2d"). However, a definition like

formula = y ~ f(i, model = "iid2d", ...) - 1

does not allow us to define the model of interest, where each linear predictor $\eta_i$ is connected to two elements of the bi-variate Gaussian model, which are $a_i$ and $b_i$ in this case. To address this inconvenience the copy feature was created, and our model formulation could be defined by

formula = y ~ f(i, model = "iid2d", n = 2*n) +f(i.plus, z, copy = "i")

with appropriate definitions for the indexes i and i.plus. The copy feature is not limited to the bi-variate case as in the above example; we could easily have defined a model where each linear predictor is connected to three or more elements of a given latent model. For example, if we had a tri-variate Gaussian

$$\eta_i = a_i + b_i z_{1,i} + c_i z_{2,i}, \qquad (a_i, b_i, c_i) \overset{iid}{\sim} N_3(0, Q^{-1}),$$

we would use

formula = y ~ f(i, model = "iid3d", n = 3*n) +f(i.plus1, z1, copy = "i") +f(i.plus2, z2, copy = "i")

with appropriate definitions for the indexes i, i.plus1 and i.plus2.

Below is the R code to simulate data and to fit the bi-variate model described above with INLA. The data is simulated with observational precision $\tau = 1$ and a bi-variate Gaussian distribution for the random effects $(a_i, b_i)$, $i = 1, \dots, 1000$, with marginal precisions $\tau_a = \tau_b = 1$ for $a_i$ and $b_i$ respectively, and correlation $\rho_{ab}$ between $a_i$ and $b_i$ equal to 0.8. Fig. 7 shows the posterior marginals for the hyperparameters returned by INLA.

n = 1000
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)
z = rnorm(n)
ab = rmvnorm(n, sigma = Sigma) # requires the 'mvtnorm' package
a = ab[, 1]
b = ab[, 2]
eta = a + b * z
y = eta + rnorm(n, sd = 1)


Fig. 7. Posterior distribution for the hyperparameters in the copy example with a vertical line to indicate the true values used in the simulation. (a) Gaussian observational precision $\tau$, (b) marginal precision for $a_i$, $\tau_a$, (c) marginal precision for $b_i$, $\tau_b$, (d) correlation between $a_i$ and $b_i$, $\rho_{ab}$.

hyper.gaussian = list(prec = list(prior = "loggamma",
                                  param = c(1, 0.2161)))
i = 1:n     # use only the first n elements (a_1, ..., a_n)
j = 1:n + n # use only the last n elements (b_1, ..., b_n)
formula = y ~ f(i, model = "iid2d", n = 2*n) +
              f(j, z, copy = "i") - 1
result = inla(formula, data = list(y = y, z = z, i = i, j = j),
              family = "gaussian",
              control.family = list(hyper = hyper.gaussian))
summary(result)
plot(result)

Formally, the copy feature is used when a latent field is needed more than once in the model formulation. When using the feature we then create an (almost) identical copy of $x_S$, denoted here by $x^*_S$, that can then be used in the model formulation as shown in Example 4. In this case, we have extended our latent field from $x_S$ to $x = (x_S, x^*_S)$, where $\pi(x) = \pi(x_S)\,\pi(x^*_S \mid x_S)$ and

$$\pi(x^*_S \mid x_S, \tau) \propto \exp\left(-\frac{\tau}{2}(x^*_S - x_S)^T(x^*_S - x_S)\right) \qquad (12)$$

so that the degree of closeness between $x_S$ and $x^*_S$ is controlled by the fixed high precision $\tau$ in Eq. (12), which has a default value of $\tau = \exp(15)$. It is also possible for the copied model to have an unknown scale parameter $\psi$, in which case

$$\pi(x^*_S \mid x_S, \tau, \psi) \propto \exp\left(-\frac{\tau}{2}(x^*_S - \psi x_S)^T(x^*_S - \psi x_S)\right). \qquad (13)$$

4.4. Linear combinations of the latent field

Depending on the context, interest might lie not only in the posterior marginals of the elements in the latent field but also in linear combinations of those elements. Assuming $v$ holds the linear combinations of interest, it can then be written as

v = Bx,


where $x$ is the latent field and $B$ is a $k \times n$ matrix, where $k$ is the number of linear combinations and $n$ is the size of the latent field. The functions inla.make.lincomb and inla.make.lincombs in R-INLA are used to define a single linear combination and many linear combinations at once, respectively.

R-INLA provides two approaches for dealing with $v$. The first approach creates an enlarged latent field $\tilde{x} = (x, v)$ and then uses the INLA method as usual to fit the enlarged model. After completion we then have posterior marginals for each element of $\tilde{x}$, which includes the linear combinations $v$. Using this approach the marginals can be computed using the Gaussian, Laplace or simplified Laplace approximations discussed in Section 2.2. The drawback is that the addition of many linear combinations will lead to denser precision matrices, which will consequently slow down the computations. This approach can be used by defining the linear combinations of interest using the functions mentioned in the previous paragraph and using control.inla = list(lincomb.derived.only = FALSE) as an argument to the inla function.

The second approach does not include $v$ in the latent field but performs a post-processing of the resulting output given by INLA and approximates $v \mid \theta, y$ by a Gaussian where

$$E_{v \mid \theta, y}(v) = B\mu^* \quad \text{and} \quad \mathrm{Var}_{v \mid \theta, y}(v) = B Q^{*-1} B^T,$$

in which $\mu^*$ is the mean of the best marginal approximation used for $\pi(x_i \mid \theta, y)$ (i.e. Gaussian, simplified Laplace or Laplace approximation) and $Q^*$ is the precision matrix of the Gaussian approximation $\pi_G(x \mid \theta, y)$ used in Eq. (5). Then approximations for the posterior marginals of $v$ are obtained by integrating $\theta$ out in a process similar to Eq. (2). The advantage here is that the computation of the posterior marginals for $v$ does not affect the graph of the latent field, leading to a much faster approximation. That is why this is the default method in R-INLA, but more accurate approximations can be obtained by switching to the first approach, if necessary.
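As a toy illustration of this post-processing step (our sketch under the stated Gaussian assumption; mu.star and Q.star stand in for quantities INLA has already computed):

library(Matrix)

# Gaussian approximation to v | theta, y from the latent-field approximation;
# B is a sparse k x n matrix of linear-combination weights.
lincomb_gauss = function(B, mu.star, Q.star) {
  mean.v = as.vector(B %*% mu.star)
  var.v = as.matrix(B %*% solve(Q.star, t(B)))  # B Q*^{-1} B^T
  list(mean = mean.v, var = var.v)
}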

Example 5. Following is the R code to compute the posterior marginals of linear combinations between elements of the AR(1) process of Example 3. More specifically, we are interested in

$$v_1 = 3z_{1,2} - 5z_{1,4}$$
$$v_2 = z_{1,3} + 2z_{1,5},$$

where $z_{i,j}$ denotes the $j$-th element of the latent model $z_i$ as defined in Example 3.

# define the linear combinations:
# v_1 = 3*z_{1,2} - 5*z_{1,4}
# v_2 = z_{1,3} + 2*z_{1,5}
lc1 = inla.make.lincomb(i = c(NA, 3, NA, -5))
names(lc1) = "lc1"
lc2 = inla.make.lincomb(i = c(NA, NA, 1, NA, 2))
names(lc2) = "lc2"
# compute v_1 and v_2 using the default method
result = inla(formula,
              family = c("Poisson", "gaussian"),
              data = list(y = y, i = i, r = r),
              control.family = list(list(), list(hyper = hyper.gaussian)),
              lincomb = c(lc1, lc2))
# compute v_1 and v_2 with the more accurate (and slower) approach
result2 = inla(formula,
               family = c("Poisson", "gaussian"),
               data = list(y = y, i = i, r = r),
               control.family = list(list(), list(hyper = hyper.gaussian)),
               lincomb = c(lc1, lc2),
               control.inla = list(lincomb.derived.only = FALSE))

The code illustrates how to use both approaches described in this section, and Fig. 8 shows the posterior distributions for $v_1$ and $v_2$ computed with the more accurate approach (solid line) and with the faster one (dashed line). We can see little difference between the two approaches in this example. We refer to the FAQ section at the R-INLA website for more information about defining multiple linear combinations at once; a sketch of that pattern follows. □
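A hedged sketch of the multiple-combinations pattern (our illustration; see the FAQ for the authoritative usage), assuming inla.make.lincombs accepts a weight matrix whose rows define the individual combinations:

# Define v_1 and v_2 in one call; row k of the weight matrix gives the
# weights of the k-th linear combination over effect "i" (NA = not used).
W = matrix(NA, nrow = 2, ncol = 5)
W[1, c(2, 4)] = c(3, -5)  # v_1 = 3*z_{1,2} - 5*z_{1,4}
W[2, c(3, 5)] = c(1, 2)   # v_2 = z_{1,3} + 2*z_{1,5}
lcs = inla.make.lincombs(i = W)
# then: inla(formula, ..., lincomb = lcs)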

When using the faster approach, there is also an option to compute the posterior correlation matrix between all the linear combinations by using the following.

control.inla = list(lincomb.derived.correlation.matrix = TRUE)

This correlation matrix could be used, for example, to build a Gaussian copula to approximate the joint density of some components of the latent field, as discussed in Section 6.1 of Rue et al. (2009).


Fig. 8. Posterior distribution for the linear combinations computed with both methods described in Section 4.4; the solid line represents the more accurate approach while the dashed line represents the faster one. (a) Posterior distribution for $v_1$, (b) posterior distribution for $v_2$.

4.5. More flexible dependence on the latent field

As mentioned in Section 2.1, the INLA method in its original implementation allowed each data point to be connected to only one element in the latent field. While this is often the case, this assumption is violated, for example, when the observed data consist of area or time averages of the latent field. In this case,

$$y_i \mid x, \theta_1 \sim \pi\Big(y_i \,\Big|\, \sum_j a_{ij} x_j, \theta_1\Big). \qquad (14)$$

Assume $A$ to be the matrix formed by the $\{a_{ij}\}$ elements in Eq. (14). We further assume that the dependence of the data on the latent field is ''local'', in the sense that most elements of $A$ are zero. With this assumption everything stays Markovian and fast inference is still possible. This is defined in R-INLA by modifying the control.predictor argument of the inla function as follows.

inla(..., control.predictor = list(A = A))

Internally, R-INLA adds another layer to the hierarchical model,

$$\eta^* = A\eta,$$

where $\eta^*$ is formed by linear combinations of the linear predictor $\eta$, but now the likelihood function is connected to the latent field through $\eta^*$ instead of $\eta$:

$$y \mid x, \theta_1 \sim \prod_{i=1}^{n_d} \pi(y_i \mid \eta^*_i, \theta_1).$$

This is a very powerful feature that allows us to fit models with a likelihood representation given by Eq. (14), and besides it can even mimic to some extent the copy feature of Section 4.3, with the exception that the copy feature allows us to copy model components using an unknown scale parameter, as illustrated in Eq. (13). This feature is implemented by also adding $\eta^*$ to the latent model, where the conditional distribution for $\eta^*$ has the mean $A\eta$ and the precision matrix $\kappa_A I$, where the constant $\kappa_A$ is set to a high value, like $\kappa_A = \exp(15)$, a priori. In terms of output from inla, $(\eta^*, \eta)$ is then the linear predictor.

To illustrate the relation between the A matrix and the copy feature, we fit the model of Example 4 again but now using the feature described in this section. Following is the R code.

## This is an alternative implementation of the model in Example 4
i = 1:(2*n)
zz = c(rep(1, n), z)
formula = y ~ f(i, zz, model = "iid2d", n = 2*n) - 1 # define eta
I = Diagonal(n)
A = cBind(I, I) # define the A matrix used to construct eta* = A eta
result = inla(formula,
              data = list(y = y, zz = zz, i = i),
              family = "gaussian",
              control.predictor = list(A = A))
summary(result)
plot(result)


Although this A-matrix feature can replicate the copy feature to some extent (remember that copy allows us to copy components with unknown scale parameters), for some models it is much simpler to use the copy feature. Which one is easier to use varies on a case-by-case basis, and it is left to the user to decide which one they are most comfortable with.

In some cases it has been shown useful to define the model entirely through the A matrix, by defining $\eta$ as a long vector of all the different components that the full model consists of, and then putting it all together using the A matrix. The following simplistic linear regression example demonstrates the idea. Note that $\eta_1$ is the intercept and $\eta_2$ is the effect of the covariate x.

n = 100
x = rnorm(n)
y = 1 + x + rnorm(n, sd = 0.1)
intercept = c(1, NA)
b = c(NA, 1)
A = cbind(1, x)
r = inla(y ~ -1 + intercept + b, family = "gaussian",
         data = list(A = A, y = y, intercept = intercept, b = b),
         control.predictor = list(A = A))

4.6. Kronecker feature

In a number of applications, the precision matrix in the latent model can be written as a Kronecker product of two precision matrices. A simple example of this is the separable space–time model constructed by using spatially correlated innovations in an AR(1) model:

$$x_{t+1} = \phi x_t + \epsilon_t,$$

where $\phi$ is a scalar and $\epsilon \sim N(0, Q_\epsilon^{-1})$. In this case the precision matrix is $Q = Q_{AR(1)} \otimes Q_\epsilon$, where $\otimes$ is the Kronecker product. The general Kronecker product mechanism is currently in progress, but a number of special cases are already available in the code through the group feature (a small sketch of the separable structure itself is given at the end of this section). For example, a separable spatio-temporal model can be constructed using the command

result = y ~ f(loc, model = "besag",
               group = time, control.group = list(model = "ar1"))

in which every observation is assigned a location loc and a time time. At each time point the observations are spatially correlated, while across time periods they evolve according to an AR(1) process; see for example Cameletti et al. (2012). Besides the AR(1) model, a uniform correlation matrix defining exchangeable latent models, as well as random walks of order one (RW1) and order two (RW2), are implemented in R-INLA through the group feature.
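To make the separable structure concrete, the following sketch (our addition; the dimensions, the value of φ and the choice of Q_ϵ are illustrative assumptions, not taken from the text) builds Q = Q_{AR(1)} ⊗ Q_ϵ directly with the Matrix package.

library(Matrix)

nT = 5      # number of time points (illustrative)
nS = 4      # number of spatial sites (illustrative)
phi = 0.7   # AR(1) coefficient (illustrative)

## Tridiagonal precision of a stationary AR(1) with unit innovation precision
Q.ar1 = matrix(0, nT, nT)
diag(Q.ar1) = c(1, rep(1 + phi^2, nT - 2), 1)
Q.ar1[cbind(1:(nT - 1), 2:nT)] = -phi
Q.ar1[cbind(2:nT, 1:(nT - 1))] = -phi

## Stand-in spatial precision: first-order random walk on a line of sites
## (any valid spatial precision, e.g. from a Besag model, could be used here)
Q.eps = bandSparse(nS, k = -1:1,
                   diagonals = list(rep(-1, nS - 1),
                                    c(1, rep(2, nS - 2), 1),
                                    rep(-1, nS - 1)))

## Separable space-time precision: Kronecker product of the two factors
Q = kronecker(Q.ar1, Q.eps)   # dimension (nT * nS) x (nT * nS)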

5. Conclusion

The INLA framework has become a daily tool for many applied researchers from different areas of application. With this increase in usage came an increased demand for the possibility to fit more complex models from within R, and many of the latest developments have arisen from needs expressed by users. Several new features implemented in the R package R-INLA that greatly extend the scope of models available within R have been described and illustrated. This is an active project that continues to evolve in order to fulfill, as well as anticipate, the demands of the statistical and applied community. Several case studies that use the features formalized here can be found at the INLA website. Those case studies treat a variety of situations, for example dynamic models, shared random-effect models, spatio-temporal models and preferential sampling, and serve to illustrate the generic nature of the features presented in Section 4.

Most of the attention in Rue et al. (2009) was focused on the algorithms to compute the posterior marginals of the latent field, since this is usually the most challenging task given the typically large size of the latent field. However, the computation of the posterior marginals of the hyperparameters is not straightforward, given the high cost of evaluating the approximation to the joint density of the hyperparameters. We have described here two algorithms that have been used successfully to obtain the posterior marginals of the hyperparameters using only the few evaluation points already computed when integrating out the uncertainty with respect to the hyperparameters in the computation of the posterior marginals of the elements of the latent field.

References

Beale, L., Hodgson, S., Abellan, J., LeFevre, S., Jarup, L., 2010. Evaluation of spatial relationships between health and the environment: the rapid inquiry facility. Environmental Health Perspectives 118 (9), 1306.
Bessell, P., Matthews, L., Smith-Palmer, A., Rotariu, O., Strachan, N., Forbes, K., Cowden, J., Reid, S., Innocent, G., 2010. Geographic determinants of reported human Campylobacter infections in Scotland. BMC Public Health 10 (1), 423.


Cameletti, M., Lindgren, F., Simpson, D., Rue, H., 2012. Spatio-temporal modeling of particulate matter concentration through the SPDE approach. Advances in Statistical Analysis 97 (2), 109–131.
Cseke, B., Heskes, T., 2011. Approximate marginals in latent Gaussian models. Journal of Machine Learning Research 12, 417–454.
Diggle, P., Menezes, R., Su, T., 2010. Geostatistical inference under preferential sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics) 59 (2), 191–232.
Eidsvik, J., Finley, A., Banerjee, S., Rue, H., 2011. Approximate Bayesian inference for large spatial datasets using predictive process models. Computational Statistics & Data Analysis.
Eidsvik, J., Martino, S., Rue, H., 2009. Approximate Bayesian inference in spatial generalized linear mixed models. Scandinavian Journal of Statistics 36 (1), 1–22.
Fong, Y., Rue, H., Wakefield, J., 2010. Bayesian inference for generalized linear mixed models. Biostatistics 11 (3), 397–412.
Guo, X., Carlin, B., 2004. Separate and joint modeling of longitudinal and event time data using standard computer packages. The American Statistician 58 (1), 16–24.
Haas, S., Hooten, M., Rizzo, D., Meentemeyer, R., 2011. Forest species diversity reduces disease risk in a generalist plant pathogen invasion. Ecology Letters.
Holand, A., Steinsland, I., Martino, S., Jensen, H., 2011. Animal models and integrated nested Laplace approximations. Preprint, Statistics (4).
Hosseini, F., Eidsvik, J., Mohammadzadeh, M., 2011. Approximate Bayesian inference in spatial GLMM with skew normal latent variables. Computational Statistics & Data Analysis 55 (4), 1791–1806.
Illian, J., Sørbye, S., Rue, H., 2011. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation (INLA). Annals of Applied Statistics.
Johnson, D., London, J., Kuhn, C., 2011. Bayesian inference for animal space use and other movement metrics. Journal of Agricultural, Biological, and Environmental Statistics 16 (3), 357–370.
Li, Y., Brown, P., Rue, H., al Maini, M., Fortin, P., 2011. Spatial modelling of lupus incidence over 40 years with changes in census areas. Journal of the Royal Statistical Society: Series C (Applied Statistics).
Lindgren, F., Rue, H., Lindström, J., 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (4), 423–498.
Martino, S., Aas, K., Lindqvist, O., Neef, L., Rue, H., 2011. Estimating stochastic volatility models using integrated nested Laplace approximations. The European Journal of Finance 17 (7), 487–503.
Martino, S., Akerkar, R., Rue, H., 2011. Approximate Bayesian inference for survival models. Scandinavian Journal of Statistics 38 (3), 514–528.
Martins, T., Rue, H., 2012. Extending INLA to a Class of Near-Gaussian Latent Models. Department of Mathematical Sciences, NTNU, Norway.
Paul, M., Riebler, A., Bachmann, L., Rue, H., Held, L., 2010. Bayesian bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximations. Statistics in Medicine 29 (12), 1325–1339.
Rue, H., Held, L., 2005. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall.
Rue, H., Martino, S., 2007. Approximate Bayesian inference for hierarchical Gaussian Markov random field models. Journal of Statistical Planning and Inference 137 (10), 3177–3192.
Rue, H., Martino, S., Chopin, N., 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2), 319–392.
Ruiz-Cárdenas, R., Krainski, E., Rue, H., 2011. Direct fitting of dynamic models using integrated nested Laplace approximations (INLA). Computational Statistics & Data Analysis.
Schrödle, B., Held, L., 2011. Spatio-temporal disease mapping using INLA. Environmetrics 22 (6), 725–734.
Schrödle, B., Held, L., Riebler, A., Danuser, J., 2011. Using integrated nested Laplace approximations for the evaluation of veterinary surveillance data from Switzerland: a case-study. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60 (2), 261–279.
Spiegelhalter, D., Best, N., Carlin, B., Van Der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4), 583–639.
Sørbye, S., Rue, H., 2010. Simultaneous credible bands for latent Gaussian models. Scandinavian Journal of Statistics.
Tierney, L., Kadane, J., 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81 (393), 82–86.
Wilking, H., Höhle, M., Velasco, E., Suckau, M., Tim, E., Salinas-Perez, J., Garcia-Alonso, C., Molina-Parrilla, C., Jorda-Sampietro, E., Salvador-Carulla, L., et al., 2012. Ecological analysis of social risk factors for rotavirus infections in Berlin, Germany, 2007–2009. International Journal of Health Geographics 11 (1), 37.
Wyse, J., Friel, N., Rue, H., 2011. Approximate simulation-free Bayesian inference for multiple changepoint models with dependence within segments. Bayesian Analysis 6 (4), 501–528.
Yue, Y., Rue, H., 2011. Bayesian inference for additive mixed quantile regression models. Computational Statistics & Data Analysis 55 (1), 84–96.

