INLA and inlabru with spatial patterns · 2021. 6. 9.

Post on 21-Jun-2021


www.INBO.be

INLA and inlabru with spatial patterns

Thierry Onkelinx

Overview

1 Checking spatial autocorrelation: Pearson residuals, variogram

2 Prepare the model: creating a mesh, creating an SPDE model

3 Fitting the model: only the data, predictions


Checking spatial autocorrelation

Checking spatial autocorrelation: Pearson residuals

Definition

▶ components?
▶ observed value ($y_i$), fitted value ($\hat{y}_i$), mean squared error (MSE)
▶ formula
▶ $pr_i = \frac{y_i - \hat{y}_i}{\sqrt{MSE}}$
▶ MSE (variance) depends on distribution! Check it using inla.doc("name_of_your_distribution").


Example data: rainfall in Parana state, Brazil

[Figure: map of the rainfall stations; x-axis: Longitude, y-axis: Latitude; colour and size: Rain (30, 60, 90)]

Calculate Pearson residuals

model_iid <- inla(
  Rain ~ Xc + Yc, family = "gamma", data = dataset,
  control.compute = list(waic = TRUE)
)

dataset %>%
  mutate(
    mu = model_iid$summary.fitted.values$mean,
    sigma2 = mu ^ 2 / model_iid$summary.hyperpar[1, "mean"],
    Pearson_iid = (Rain - mu) / sqrt(sigma2)
  ) -> dataset
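The line `sigma2 = mu ^ 2 / precision` relies on the gamma variance being the squared mean divided by the precision parameter. The base R sketch below checks that relation on simulated data. The fitted means `mu` and the precision `phi` are hypothetical values chosen for illustration, not output from the rainfall model.

```r
set.seed(1)
n <- 10000
mu <- exp(rnorm(n, mean = 3, sd = 0.3)) # hypothetical fitted means
phi <- 2                                # hypothetical precision parameter

# gamma observations with mean mu and variance mu^2 / phi
y <- rgamma(n, shape = phi, rate = phi / mu)

# Pearson residuals: (observed - fitted) / sqrt(variance)
sigma2 <- mu ^ 2 / phi
pearson <- (y - mu) / sqrt(sigma2)

# with a correct variance function, mean is near 0 and variance near 1
c(mean = mean(pearson), variance = var(pearson))
```

If the variance of the Pearson residuals is far from 1, the assumed variance function probably does not match the distribution.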


Challenge 1

▶ What is the mean for your model?
▶ What is the variance for your model? Hint: inla.doc("your distribution")
▶ Calculate the Pearson residuals for your model

Checking spatial autocorrelation: Variogram

Definition

vg_default <- variogram(
  Pearson_iid ~ 1, locations = ~X + Y,
  data = as.data.frame(dataset), cressie = TRUE
)

[Figure: empirical variogram of the Pearson residuals; x-axis: distance (km), y-axis: variance]
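To make clear what `variogram()` computes, here is a minimal base R sketch of an empirical semivariogram: half the mean squared difference between all point pairs, binned by distance. It uses the classical estimator, whereas the slides request the robust Cressie estimator through `cressie = TRUE`. The function name `empirical_variogram` and the simulated data are hypothetical.

```r
set.seed(42)
n <- 500
xy <- cbind(x = runif(n), y = runif(n)) # hypothetical locations in a unit square
z <- rnorm(n)                           # spatially independent values, variance 1

empirical_variogram <- function(xy, z, width, cutoff) {
  d <- as.matrix(dist(xy))
  # use every pair once and drop pairs beyond the cutoff
  keep <- upper.tri(d) & d > 0 & d <= cutoff
  h <- d[keep]
  sq_diff <- outer(z, z, "-")[keep] ^ 2
  bin <- cut(h, breaks = seq(0, cutoff, by = width))
  data.frame(
    dist = as.numeric(tapply(h, bin, mean)),           # mean distance per bin
    gamma = as.numeric(tapply(sq_diff, bin, mean)) / 2, # semivariance per bin
    np = as.numeric(table(bin))                         # number of point pairs
  )
}

vg <- empirical_variogram(xy, z, width = 0.1, cutoff = 0.5)
vg
```

Because the simulated values have no spatial structure, the semivariance stays close to the variance (1) in every bin. Spatially correlated residuals would instead start low at short distances and level off near the range.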


Important characteristics

[Figure: the variogram with four characteristics marked 1 to 4; x-axis: distance (km), y-axis: variance]

Important characteristics

[Figure: the variogram annotated with range, nugget, sill and partial sill; x-axis: distance (km), y-axis: variance]

Projected example data

[Figure: map of the rainfall stations in projected coordinates (metres); colour and size: Rain (30, 60, 90)]

Increased cutoff

vg_large <- variogram(
  Pearson_iid ~ 1, locations = ~X + Y, cressie = TRUE,
  data = as.data.frame(dataset), cutoff = 600e3
)

[Figure: variogram with cutoff = 600e3; x-axis: distance (km), y-axis: variance]

Number of point pairs is important

[Figure: variogram with the number of point pairs printed next to each bin; x-axis: distance (km), y-axis: variance]

Too small width leads to unstable variograms

vg_small <- variogram(
  Pearson_iid ~ 1, locations = ~X + Y, cressie = TRUE,
  data = as.data.frame(dataset), width = 1e3
)

[Figure: variogram with width = 1e3; x-axis: distance (km), y-axis: variance; colour: number of point pairs, binned as (0,10], (10,20], (20,50], (50,100], (100,200], (200,500]]

Sensible small width yields the most informative variogram

vg_final <- variogram(
  Pearson_iid ~ 1, locations = ~X + Y, cressie = TRUE,
  data = as.data.frame(dataset), width = 10e3
)

[Figure: variogram with width = 10e3 and the number of point pairs printed next to each bin; x-axis: distance (km), y-axis: variance]

Challenge 2

▶ What is the minimum bin width for your data?
▶ Calculate the variogram for your model
▶ What is the approximate range of the variogram?
▶ What are the nugget, sill and partial sill?

Prepare the model

Prepare the model: Creating a mesh

Size of a mesh I


Size of a mesh II


Size of a mesh III


Size of a mesh IV


Size of a mesh V


Guidelines

▶ equilateral triangles work best
▶ edge length should be around a third to a tenth of the range
▶ avoid narrow triangles
▶ avoid small edges
▶ add extra, larger triangles around the border
▶ simplify the border
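The edge length guideline translates into simple arithmetic. The sketch below assumes a variogram range of 100 km, a hypothetical value that should be replaced by the range read from your own variogram.

```r
range_estimate <- 100e3 # assumed variogram range in metres (hypothetical)

# guideline: edge length between a tenth and a third of the range
edge_lower <- range_estimate / 10
edge_upper <- range_estimate / 3
c(lower = edge_lower, upper = edge_upper)
```

Under this assumption, an inner `max.edge` of about 30e3 gives a coarse mesh near the upper bound and 10e3 gives a fine mesh at the lower bound.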


Mesh only within the border

mesh <- inla.mesh.2d(boundary = border, max.edge = 0.15)
ggplot() + gg(mesh) + coord_fixed() + theme_map() +
  ggtitle(paste("Vertices: ", mesh$n))

[Figure: mesh restricted to the border; title: Vertices: 261]

Mesh going outside the border

mesh <- inla.mesh.2d(boundary = border, max.edge = c(0.15, 0.3))
ggplot() + gg(mesh) + coord_fixed() + theme_map() +
  ggtitle(paste("Vertices: ", mesh$n))

[Figure: mesh extending outside the border with larger triangles; title: Vertices: 417]

Mesh for rainfall data

mesh <- inla.mesh.2d(boundary = boundary, max.edge = c(30e3, 100e3))
ggplot(dataset) + gg(mesh) + geom_sf() +
  ggtitle(paste("Vertices: ", mesh$n)) +
  coord_sf(datum = st_crs(5880))

[Figure: mesh for the rainfall data with the stations overlaid, in projected coordinates; axes: x, y; title: Vertices: 8531]

Use cutoff to simplify mesh

mesh1 <- inla.mesh.2d(
  boundary = boundary, max.edge = c(30e3, 100e3), cutoff = 10e3
)

ggplot(dataset) + gg(mesh1) + geom_sf() +
  ggtitle(paste("Vertices: ", mesh1$n)) +
  coord_sf(datum = st_crs(5880))

[Figure: simplified mesh with the stations overlaid, in projected coordinates; axes: x, y; title: Vertices: 844]

Finer mesh for final model run

mesh2 <- inla.mesh.2d(
  boundary = boundary, max.edge = c(10e3, 30e3), cutoff = 5e3
)

ggplot(dataset) + gg(mesh2) + geom_sf() +
  ggtitle(paste("Vertices: ", mesh2$n)) +
  coord_sf(datum = st_crs(5880))

[Figure: finer mesh with the stations overlaid, in projected coordinates; axes: x, y; title: Vertices: 5920]

Challenge 3

▶ What are the relevant max.edge and cutoff for a coarse mesh?
▶ What are the relevant max.edge and cutoff for a fine mesh?
▶ Create a coarse and a fine mesh for your data

Prepare the model: Creating an SPDE model

SPDE using penalised complexity priors

Stochastic Partial Differential Equations
▶ prior.range = c(r, alpha_r): $P(\rho < r) = \alpha_r$
▶ prior.sigma = c(s, alpha_s): $P(\sigma > s) = \alpha_s$

spde1 <- inla.spde2.pcmatern(
  mesh1, prior.range = c(100e3, 0.5), prior.sigma = c(0.9, 0.05)
)

spde2 <- inla.spde2.pcmatern(
  mesh2, prior.range = c(100e3, 0.5), prior.sigma = c(0.9, 0.05)
)
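To get a feel for what these prior settings imply, the base R sketch below evaluates the prior probabilities for other values of the range and sigma. It assumes the usual penalised complexity prior for a two-dimensional Matérn field, in which sigma has an exponential prior and the inverse range (for d = 2) has an exponential prior. These formulas and the helper names `pc_range_cdf` and `pc_sigma_tail` are assumptions for illustration and not part of INLA.

```r
# P(range < r) under the assumed PC prior, calibrated so that
# P(range < r0) = alpha_r
pc_range_cdf <- function(r, r0, alpha_r, d = 2) {
  lambda <- -log(alpha_r) * r0 ^ (d / 2)
  exp(-lambda * r ^ (-d / 2))
}

# P(sigma > s) under the assumed PC prior, calibrated so that
# P(sigma > s0) = alpha_s
pc_sigma_tail <- function(s, s0, alpha_s) {
  lambda <- -log(alpha_s) / s0
  exp(-lambda * s)
}

# settings from the slides: prior.range = c(100e3, 0.5),
# prior.sigma = c(0.9, 0.05)
p_range <- pc_range_cdf(c(50e3, 100e3, 200e3), r0 = 100e3, alpha_r = 0.5)
p_sigma <- pc_sigma_tail(c(0.45, 0.9, 1.8), s0 = 0.9, alpha_s = 0.05)
p_range
p_sigma
```

Under these assumptions, setting alpha_r = 0.5 makes 100 km the prior median of the range, while alpha_s = 0.05 makes a sigma above 0.9 unlikely a priori.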


Challenge 4

▶ What are relevant priors for the range and sigma for your data?
▶ Hint: see challenge 2

▶ Make the SPDE models for your data

Fitting the model

Fitting the model: Only the data

The stack for the observed data

A1 <- inla.spde.make.A(mesh = mesh1, loc = st_coordinates(dataset))
stack1 <- inla.stack(
  tag = "estimation", ## tag
  data = list(Rain = dataset$Rain), ## response
  A = list(A1, 1), ## projector matrices (SPDE and fixed effects)
  effects = list(
    list(site = seq_len(spde1$n.spde)), ## random field index
    dataset %>%
      as.data.frame() %>%
      transmute(Intercept = 1, Xc, Yc) ## fixed effect covariates
  )
)

Model fit

INLA

model_spde1 <- inla(
  Rain ~ 0 + Intercept + Xc + Yc + f(site, model = spde1),
  family = "gamma", data = inla.stack.data(stack1),
  control.predictor = list(A = inla.stack.A(stack1)),
  control.compute = list(waic = TRUE)
)

inlabru

## with an sf object
bru_spde1 <- bru(
  Rain ~ Xc + Yc + site(map = st_coordinates, model = spde1),
  family = "gamma", data = dataset
)

## with an sp object
bru_spde1 <- bru(
  Rain ~ Xc + Yc + site(map = coordinates, model = spde1),
  family = "gamma", data = as_Spatial(dataset)
)

Comparison of fixed effect parameters

[Figure: posterior means of the fixed effects, one panel each for Intercept, Xc and Yc, comparing INLA and inlabru; x-axis: model, y-axis: mean]

Comparing hyperparameters

[Figure: posterior means of the hyperparameters, one panel each for "Precision parameter for the Gamma observations", "Range for site" and "Stdev for site", comparing INLA and inlabru; x-axis: model, y-axis: mean]

Correlation structure

spde.posterior(bru_spde1, "site", what = "matern.covariance") -> covplot
spde.posterior(bru_spde1, "site", what = "matern.correlation") -> corplot
multiplot(plot(covplot), plot(corplot))

[Figure: two panels showing the posterior Matérn covariance and Matérn correlation as a function of distance; x-axis: x (0 to 150000), y-axis: median]

Calculate Pearson residuals

dataset %>%
  mutate(
    mu = model_spde1$summary.fitted.values$mean,
    sigma2 = mu ^ 2 / model_spde1$summary.hyperpar[1, "mean"],
    Pearson_iid = (Rain - mu) / sqrt(sigma2)
  ) -> dataset

## Error: Problem with `mutate()` input `mu`.
## x Input `mu` can't be recycled to size 528.
## i Input `mu` is `model_spde1$summary.fitted.values$mean`.
## i Input `mu` must be size 528 or 1, not 1664.

Using the stack index

si <- inla.stack.index(stack1, "estimation")$data
dataset %>%
  mutate(
    mu = model_spde1$summary.fitted.values$mean[si],
    sigma2 = mu ^ 2 / model_spde1$summary.hyperpar[1, "mean"],
    Pearson_spde = (Rain - mu) / sqrt(sigma2)
  ) -> dataset

Using inlabru

fit <- predict(
  bru_spde1, as_Spatial(dataset), ~exp(Intercept + Xc + Yc + site)
)
dataset %>%
  mutate(
    mu = fit$mean,
    sigma2 = mu ^ 2 / model_spde1$summary.hyperpar[1, "mean"],
    Pearson_spde = (Rain - mu) / sqrt(sigma2)
  ) -> dataset

Variogram

vg_fit <- variogram(
  Pearson_spde ~ 1, cressie = TRUE,
  data = as_Spatial(dataset), width = 10e3
)

[Figure: variogram of the Pearson residuals of the SPDE model; x-axis: distance (km), y-axis: variance]

Interpolate GMRF

A1.grid <- inla.mesh.projector(mesh1, dims = c(41, 41))
inla.mesh.project(A1.grid, model_spde1$summary.random$site) %>%
  as.matrix() %>%
  as.data.frame() %>%
  bind_cols(
    expand.grid(x = A1.grid$x, y = A1.grid$y)
  ) %>%
  filter(!is.na(ID)) -> eta_spde

Plot GMRF

ggplot(dataset) +
  geom_tile(data = eta_spde, aes(x = x, y = y, fill = mean)) +
  geom_sf() + scale_fill_gradient2()

[Figure: map of the interpolated GMRF with the stations overlaid; axes: x (56°W to 48°W), y (27°S to 22°S); fill: mean (-1.0 to 0.5)]

Fitting the model: Predictions

Prediction stack for SPDE grid + fixed effects

expand.grid(X = A1.grid$x, Y = A1.grid$y) %>%
  mutate(Intercept = 1, Xc = X / 1e5 - 53, Yc = Y / 1e5 - 71) -> grid_data

stack1_grid <- inla.stack(
  tag = "grid", ## tag
  data = list(Rain = NA), ## response
  A = list(A1.grid$proj$A, 1), ## projector matrices (SPDE and fixed effects)
  effects = list(
    list(site = seq_len(spde1$n.spde)), ## random field index
    grid_data ## covariates at grid locations
  )
)

Refit the model with the combined stack

stack_all <- inla.stack(stack1, stack1_grid)
model_grid <- inla(
  Rain ~ 0 + Intercept + Xc + Yc + f(site, model = spde1),
  family = "gamma", data = inla.stack.data(stack_all),
  control.predictor = list(A = inla.stack.A(stack_all), link = 1),
  control.compute = list(waic = TRUE),
  control.mode = list(theta = model_spde1$mode$theta, restart = FALSE),
  control.results = list(
    return.marginals.random = FALSE,
    return.marginals.predictor = FALSE
  )
)

Plot grid I

si <- inla.stack.index(stack_all, "grid")$data
grid_data %>%
  bind_cols(model_grid$summary.fitted.values[si, ]) %>%
  `coordinates<-`(~X + Y) %>%
  `proj4string<-`(CRS(SRS_string = "EPSG:5880")) -> gd

gd[!is.na(over(gd, boundary)), ] %>%
  as.data.frame() %>%
  ggplot() + geom_tile(aes(x = X, y = Y, fill = mean)) + coord_fixed()

Plot grid II

[Figure: map of the predicted mean rainfall on the grid within the boundary, in projected coordinates; axes: X, Y; fill: mean (20 to 80)]

Using inlabru

pred_mesh <- predict(
  bru_spde1, pixels(mesh1), ~exp(Intercept + Xc + Yc + site)
)
ggplot() + gg(pred_mesh) + gg(boundary)

[Figure: map of the inlabru predictions over the mesh with the boundary overlaid, in projected coordinates; axes: x, y; fill: mean (10 to 50)]

Challenge 5

▶ Fit the model using the SPDE
▶ Plot a map of the GMRF
▶ Plot a map of the predictions and their credible interval