Date post: 27-Aug-2018
TOBIT MODELS FOR MULTIVARIATE, SPATIO-TEMPORAL AND COMPOSITIONAL DATA Chris Glasbey, Dave Allcroft and Adam Butler Biomathematics & Statistics Scotland

TOBIT MODELS FOR MULTIVARIATE, SPATIO-TEMPORAL AND COMPOSITIONAL DATA

Chris Glasbey, Dave Allcroft and Adam Butler

Biomathematics & Statistics Scotland


QUESTION 1: How to analyse data with lots of zeros, such as:

Crop lodging (%)

                               Trial
Variety      1      2      3      4      5      6      7
      1      0      0      0    0.3    7.7      0    0.4
      2      0      0      0      0    1.7      0      0
      3   66.7    1.3    0.7    1.0    6.7      0      0
      4      0      0      0      0      0      0      0
      5      0      0      0      0    2.7      0      0
      6      0      0    0.7    0.3   10.0      0      0
      7      0      0      0      0    5.0      0      0
      8    3.3      0      0    1.7   28.3    0.3      0
      9      0      0      0      0   37.7      0      0
     10      0      0      0      0    1.0      0      0
    ...    ...    ...    ...    ...    ...    ...    ...
     30    3.3    3.0      0    2.0   11.0      0    0.2
     31      0    0.3    0.3      0    9.3    0.3      0
     32   30.0    1.3      0    0.3    8.3      0      0

[Photograph: winter wheat showing lodging]


QUESTION 2: How to summarise high-dimensional food intake data?

[Figure: 2-dimensional marginal plot of weekly intakes of 2200 adults; axes are white bread (g) and brown bread (g), each from 0 to 3000]


QUESTION 3: What can be done if rainfall is needed at a finer spatial scale than recorded?

[Figure: rainfall maps with a colour scale, showing disaggregation from (40 km)² squares to (8 km)² squares]


QUESTION 4: Do compositions of beef and pork differ?

[Figure: four ternary plots of food composition (protein, fat, carbs; axes from 0.2 to 0.8), one each for Beef, Pork, Fish and Beverages]


Gaussian models are the motorway network of statistics!


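Binary data (Z) can be modelled by Gaussians, using the Probit model:

\[
Z = \begin{cases} 0 & \text{if } Y \le 0 \\ 1 & \text{otherwise} \end{cases}
\qquad \text{where } Y \sim N(\alpha + \beta x, \sigma^2)
\]

[Figure: two panels for x from 0 to 10: binary Z (0 or 1) against x, and latent Y (−2 to 4) against x]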


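So can non-negative data (Z), using the Tobit (or Latent Gaussian) model:

\[
Z = \begin{cases} 0 & \text{if } Y \le 0 \\ f(Y) & \text{otherwise} \end{cases}
\qquad \text{where } Y \sim N(\alpha + \beta x, \sigma^2)
\]

[Figure: two panels for x from 0 to 10: non-negative Z (0 to 4) against x, and latent Y (−2 to 4) against x]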
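The relationship between the latent Gaussian Y and the observed Z can be illustrated with a short simulation. This is only a sketch: the parameter values, the seed and the choice f(Y) = Y² are assumptions made for illustration, not values from the slides.

```python
import random

def simulate_latent(xs, alpha, beta, sigma, seed=0):
    """Draw latent values Y ~ N(alpha + beta * x, sigma^2), one per x."""
    rng = random.Random(seed)
    return [rng.gauss(alpha + beta * x, sigma) for x in xs]

def probit_response(y):
    """Binary response: 0 if Y <= 0, 1 otherwise."""
    return 0 if y <= 0 else 1

def tobit_response(y, f=lambda value: value):
    """Non-negative response: 0 if Y <= 0, f(Y) otherwise."""
    return 0.0 if y <= 0 else f(y)

# Hypothetical settings: x runs from 0 to 10 as in the plots.
xs = [0.5 * k for k in range(21)]
ys = simulate_latent(xs, alpha=-2.0, beta=0.5, sigma=1.0)
z_binary = [probit_response(y) for y in ys]
z_tobit = [tobit_response(y, f=lambda value: value ** 2) for y in ys]
```

Both responses are censored at the same points: wherever Y ≤ 0, the probit response is 0 and the Tobit response is exactly 0.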


James Tobin (Econometrica, 1958)


PLAN

0. Introduction

1. Univariate data – crop lodging

2. Multivariate data – food intake

3. Spatio-temporal data – rainfall

4. Compositional data – food composition

5. Summary


1. UNIVARIATE DATA – CROP LODGING

Crop lodging (Z)

                               Trial
Variety      1      2      3      4      5      6      7
      1      0      0      0    0.3    7.7      0    0.4
      2      0      0      0      0    1.7      0      0
      3   66.7    1.3    0.7    1.0    6.7      0      0
      4      0      0      0      0      0      0      0
      5      0      0      0      0    2.7      0      0
      6      0      0    0.7    0.3   10.0      0      0
      7      0      0      0      0    5.0      0      0
      8    3.3      0      0    1.7   28.3    0.3      0
      9      0      0      0      0   37.7      0      0
     10      0      0      0      0    1.0      0      0
    ...    ...    ...    ...    ...    ...    ...    ...
     30    3.3    3.0      0    2.0   11.0      0    0.2
     31      0    0.3    0.3      0    9.3    0.3      0
     32   30.0    1.3      0    0.3    8.3      0      0


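A square-root transformation normalises the non-zero data, so assume:

\[
Z_{ij} = \begin{cases} 0 & \text{if } Y_{ij} \le 0 \\ Y_{ij}^2 & \text{otherwise} \end{cases}
\qquad \text{where } Y_{ij} \sim N(v_i + t_j, \sigma^2)
\]

with v_i the effect of variety i and t_j the effect of trial j.

Estimate v, t and σ² by numerically maximising the likelihood

\[
\prod_{Z_{ij}=0} \Phi(e_{ij}) \prod_{Z_{ij}>0} \phi(e_{ij})
\qquad \text{where } e_{ij} = \frac{\sqrt{Z_{ij}} - v_i - t_j}{\sigma}
\]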
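The censored likelihood above can be written as a log-likelihood function that a numerical optimiser could maximise. The following is a minimal sketch using only the Python standard library; the function name is hypothetical. It keeps the factor 1/σ in the density of each uncensored value, which matters once σ is estimated, and drops the Jacobian of the square-root transform because it does not involve the parameters. The data rows are varieties 1, 3 and 8 of the lodging table, and the effects are the fitted values reported on the slides.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # cdf is Phi, pdf is phi

def tobit_loglik(z, v, t, sigma):
    """Log-likelihood of the additive square-root Tobit model.

    z[i][j] is lodging for variety i in trial j; v and t are the
    variety and trial effects on the square-root scale.
    """
    total = 0.0
    for i, row in enumerate(z):
        for j, z_ij in enumerate(row):
            e = (math.sqrt(z_ij) - v[i] - t[j]) / sigma
            if z_ij == 0:
                # Censored observation: P(Y_ij <= 0) = Phi(e_ij)
                total += math.log(STD_NORMAL.cdf(e))
            else:
                # Uncensored: normal density of sqrt(Z_ij), including 1/sigma
                total += math.log(STD_NORMAL.pdf(e) / sigma)
    return total

z = [
    [0, 0, 0, 0.3, 7.7, 0, 0.4],       # variety 1
    [66.7, 1.3, 0.7, 1.0, 6.7, 0, 0],  # variety 3
    [3.3, 0, 0, 1.7, 28.3, 0.3, 0],    # variety 8
]
v = [-0.5, 1.7, 0.6]
t = [0.9, -1.5, -1.4, 0.4, 3.5, -0.7, -0.9]
fitted = tobit_loglik(z, v, t, sigma=1.6)
shifted = tobit_loglik(z, [v_i + 5 for v_i in v], t, sigma=1.6)
```

Moving every variety effect up by 5 lowers the log-likelihood, because the many observed zeros become very unlikely under large latent means.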


Crop lodging, square-root scale (√Z), with estimated variety effects v and trial effects t

                                      Trial
Variety      v      1      2      3      4      5      6      7
    t =           0.9   −1.5   −1.4    0.4    3.5   −0.7   −0.9
      1   −0.5      0      0      0    0.5    2.8      0    0.6
      2   −2.8      0      0      0      0    1.3      0      0
      3    1.7    8.2    1.1    0.8    1.0    2.6      0      0
      4     −∞      0      0      0      0      0      0      0
      5   −2.6      0      0      0      0    1.6      0      0
      6   −0.3      0      0    0.8    0.5    3.1      0      0
      7   −2.3      0      0      0      0    2.2      0      0
      8    0.6    1.8      0      0    1.3    5.3    0.5      0
      9   −0.8      0      0      0      0    6.1      0      0
     10   −3.0      0      0      0      0    1.0      0      0
    ...    ...    ...    ...    ...    ...    ...    ...    ...
     30    0.9    1.8    1.7      0    1.4    3.3      0    0.4
     31    0.2      0    0.5    0.5      0    3.0    0.5      0
     32    0.9    5.5    1.1      0    0.5    2.9      0      0

σ = 1.6


Diagnostic plots using standardised residuals:

\[
e_{ij} = \frac{\sqrt{Z_{ij}} - v_i - t_j}{\sigma}
\]

[Figure: two panels: a censored scatter plot, and the Kaplan-Meier estimator compared with Φ]


2. MULTIVARIATE DATA – FOOD INTAKE

[Figure: scatter plot of brown bread (g) against white bread (g), both axes from 0 to 3000]

UK Data Archive (Essex University): weekly intakes of 51 food types by 2200 adults.


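Model intake of food j by adult i by:

\[
Z_{ij} = \begin{cases} 0 & \text{if } Y_{ij} \le 0 \\ f_j(Y_{ij}) & \text{otherwise} \end{cases}
\qquad \text{where } Y_{ij} \sim N(\mu_j, 1)
\]

and $f_j^{-1}$ is a quadratic power transformation through the origin

\[
Y = f_j^{-1}(Z) = \alpha_1 Z^{\gamma} + \alpha_2 Z^{2\gamma}
\]

Model fitting step 1:

Estimate $\mu_j$, $\alpha$ and $\gamma$ by regressing non-zero Z's on normal scores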
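The quadratic power transformation and its inverse can be sketched as below. The function names and the parameter values in the usage note are hypothetical, and the inverse assumes α1 > 0 and α2 ≥ 0 so that the transformation is increasing; the slides do not state these constraints.

```python
import math

def to_latent(z, alpha1, alpha2, gamma):
    """f^{-1}: map a positive intake z to the latent Gaussian scale."""
    u = z ** gamma
    return alpha1 * u + alpha2 * u * u

def from_latent(y, alpha1, alpha2, gamma):
    """f: map a latent value back to an intake (0 when y <= 0).

    Solves alpha2 * u^2 + alpha1 * u - y = 0 for u = z^gamma,
    taking the positive root (assumes alpha1 > 0, alpha2 >= 0).
    """
    if y <= 0:
        return 0.0
    if alpha2 == 0:
        u = y / alpha1
    else:
        u = (-alpha1 + math.sqrt(alpha1 ** 2 + 4 * alpha2 * y)) / (2 * alpha2)
    return u ** (1 / gamma)
```

For example, with the illustrative values alpha1 = 0.3, alpha2 = 0.02 and gamma = 0.4, `from_latent(to_latent(250.0, 0.3, 0.02, 0.4), 0.3, 0.02, 0.4)` returns 250 up to rounding error, and any non-positive latent value maps to an intake of zero.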


For example, for intake of white bread:

[Figure: two panels, labelled "untransformed (Y)" and "transformed (Z = f(Y))"]


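Further assume $Y_{i\cdot} \sim MVN(\mu, V)$ ($V_{jj} \equiv 1$, so V is also the correlation matrix)

Model fitting step 2:

Estimate $V_{jk}$ by maximising the pairwise likelihood:

\[
\prod_i p(Z_{ij}, Z_{ik})
\]

where

\[
p(Z_{ij}, Z_{ik}) =
\begin{cases}
\Phi_2(-\mu_j, -\mu_k; V_{jk}) & \text{if } Z_{ij} = 0,\ Z_{ik} = 0 \\[4pt]
\phi(Y_{ij} - \mu_j)\, \Phi\!\left( \dfrac{-\mu_k - V_{jk}(Y_{ij} - \mu_j)}{\sqrt{1 - V_{jk}^2}} \right) & \text{if } Z_{ij} > 0,\ Z_{ik} = 0 \\[4pt]
\phi(Y_{ik} - \mu_k)\, \Phi\!\left( \dfrac{-\mu_j - V_{jk}(Y_{ik} - \mu_k)}{\sqrt{1 - V_{jk}^2}} \right) & \text{if } Z_{ij} = 0,\ Z_{ik} > 0 \\[4pt]
\phi_2(Y_{ij} - \mu_j, Y_{ik} - \mu_k; V_{jk}) & \text{otherwise}
\end{cases}
\]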
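The four cases of the pairwise likelihood can be sketched in code. The version below is illustrative only: a censored value (Z = 0) is represented by `None`, the bivariate normal distribution function Φ2 is approximated with Simpson's rule (an assumption made here to stay within the standard library, not the authors' method), and all function names are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()

def bvn_cdf(a, b, rho, steps=2000, lower=-8.0):
    """Phi_2(a, b; rho): integrate phi(x) * Phi((b - rho*x)/sqrt(1-rho^2)) over x <= a."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / steps  # steps must be even for Simpson's rule

    def g(x):
        return N01.pdf(x) * N01.cdf((b - rho * x) / s)

    total = g(lower) + g(a)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * g(lower + m * h)
    return total * h / 3.0

def bvn_pdf(a, b, rho):
    """phi_2(a, b; rho): standard bivariate normal density."""
    q = (a * a - 2 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(1.0 - rho * rho))

def pair_likelihood(y_j, y_k, mu_j, mu_k, rho):
    """p(Z_ij, Z_ik); y_j or y_k is None when the corresponding Z is zero."""
    s = math.sqrt(1.0 - rho * rho)
    if y_j is None and y_k is None:
        return bvn_cdf(-mu_j, -mu_k, rho)
    if y_k is None:
        return N01.pdf(y_j - mu_j) * N01.cdf((-mu_k - rho * (y_j - mu_j)) / s)
    if y_j is None:
        return N01.pdf(y_k - mu_k) * N01.cdf((-mu_j - rho * (y_k - mu_k)) / s)
    return bvn_pdf(y_j - mu_j, y_k - mu_k, rho)
```

A useful check is the known identity Φ2(0, 0; ρ) = 1/4 + arcsin(ρ)/(2π), and the fact that every case factorises into univariate terms when ρ = 0.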


[Figure: estimated correlation matrix V, with foods re-ordered]


We prefer to have fewer than N(N − 1)/2 = 1275 parameters in V

In Factor Analysis

\[
V = \sum_{l=1}^{L} \beta_l \beta_l^T + \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2) = BB^T + \Sigma
\]

Equivalently

\[
Y_{ij} = \mu_j + \sum_{l=1}^{L} B_{jl} f_{il} + e_{ij}
\]

where $f_{i1}, f_{i2}, \ldots, f_{iL} \sim N(0, 1)$ are latent variables

and $e_{ij} \sim N(0, \sigma_j^2)$
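To make the factor structure concrete, the sketch below builds V = BBᵀ + Σ from a loadings matrix, choosing each σ_j² so that the diagonal of V equals 1, and compares parameter counts for N = 51 foods. The loadings are made-up numbers, the function name is hypothetical, and the count of N × L loadings ignores the rotational indeterminacy of factor models.

```python
def factor_correlation(B):
    """Return (V, sigma2) with V = B B^T + diag(sigma2) and unit diagonal."""
    n = len(B)
    V = [[sum(b_j * b_k for b_j, b_k in zip(B[j], B[k])) for k in range(n)]
         for j in range(n)]
    sigma2 = [1.0 - V[j][j] for j in range(n)]
    for j in range(n):
        if sigma2[j] <= 0:
            raise ValueError("loadings for variable %d leave no residual variance" % j)
        V[j][j] += sigma2[j]
    return V, sigma2

# Illustrative loadings for 3 variables and L = 2 factors.
B = [[0.8, 0.1], [0.6, -0.3], [0.2, 0.7]]
V, sigma2 = factor_correlation(B)

N_FOODS = 51
unstructured = N_FOODS * (N_FOODS - 1) // 2  # free correlations in V
two_factor = N_FOODS * 2                     # loadings when L = 2
```

With two factors the 1275 free correlations are replaced by 102 loadings, which is the saving the slide is after.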


Model fitting step 3:

Estimate B and Σ using the maximum likelihood algorithm due to Jöreskog (1967), modified by using V in place of the sample covariance matrix

To maximise:

\[
\mathcal{L} = -\log |BB^T + \Sigma| - \mathrm{trace}\left[ (BB^T + \Sigma)^{-1} V \right]
\]

1. Obtain initial estimate of Σ: $\sigma_j^2 = 1 - \max_{k \ne j} |V_{jk}|$

2. $B = \Sigma^{1/2} \Omega (\Theta - I)^{1/2}$

where Θ is the L × L diagonal matrix of the largest eigenvalues of $\Sigma^{-1/2} V \Sigma^{-1/2}$

and Ω is the N × L matrix of corresponding eigenvectors

3. Numerically maximise $\mathcal{L}$ with respect to Σ

4. Repeat steps 2 and 3 until convergence
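Steps 1 and 2 of the algorithm above can be sketched for the single-factor case (L = 1), where the leading eigenpair can be found by power iteration without any linear-algebra library. This is an illustrative simplification, not the authors' implementation: the function names are hypothetical, the numerical maximisation over Σ (steps 3 and 4) is not shown, and power iteration assumes the scaled matrix is positive definite.

```python
import math

def initial_uniquenesses(V):
    """Step 1: sigma_j^2 = 1 - max over k != j of |V_jk|."""
    n = len(V)
    return [1.0 - max(abs(V[j][k]) for k in range(n) if k != j) for j in range(n)]

def top_eigenpair(A, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(A)
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Aw = [sum(A[r][c] * w[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in Aw))
        w = [x / norm for x in Aw]
    theta = sum(w[r] * sum(A[r][c] * w[c] for c in range(n)) for r in range(n))
    return theta, w

def loadings_given_sigma(V, sigma2):
    """Step 2 with L = 1: B = Sigma^{1/2} Omega (Theta - I)^{1/2}."""
    n = len(V)
    sd = [math.sqrt(s) for s in sigma2]
    scaled = [[V[r][c] / (sd[r] * sd[c]) for c in range(n)] for r in range(n)]
    theta, omega = top_eigenpair(scaled)
    return [sd[j] * omega[j] * math.sqrt(max(theta - 1.0, 0.0)) for j in range(n)]

# Check on an exact one-factor correlation matrix with made-up loadings.
b_true = [0.8, 0.6, 0.5]
sigma2_true = [1.0 - b * b for b in b_true]
V = [[1.0 if j == k else b_true[j] * b_true[k] for k in range(3)] for j in range(3)]
b_hat = loadings_given_sigma(V, sigma2_true)
sigma2_start = initial_uniquenesses(V)
```

When Σ is known exactly, step 2 recovers the true loadings; in practice Σ starts from the step 1 values and is refined by alternating steps 2 and 3.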


[Figure: four panels showing V for L = 1, L = 2, L = 3 and L = 4]


Factor loadings B (L = 2)


3. SPATIO-TEMPORAL DATA – RAINFALL

We have 12 hourly arrays (1200 km × 600 km) of a storm in Arkansas, USA

Here are hours 3-5:

[Figure: rainfall arrays for hours 3, 4 and 5]

We will build a model using fine-resolution data

Then use it to disaggregate data at a coarser scale and see how well we recover the fine scale


Similar to the multivariate model:

Step 1: We transform rainfall to a censored Gaussian variable (Y) via a quadratic power transformation

Step 2: We estimate autocorrelations (V) at a range of spatial and temporal lags by maximising pairwise likelihoods


Estimated autocorrelations V (rows and columns are spatial lags 0 to 4)

Time lag 0

  4                              .49
  3                        .57   .52
  2                  .68   .62   .56
  1            .83   .73   .65   .58
  0      1.    .89   .75   .66   .59
         0     1     2     3     4

Time lag 1 hour

  4                              .44
  3                        .50   .47
  2                  .57   .53   .49
  1            .63   .59   .55   .51
  0      .68   .65   .60   .55   .51
         0     1     2     3     4


To model V we use a spatio-temporal Gaussian Markov Random Field (GMRF), because rainfall disaggregation requires simulation from conditional distributions

Therefore

\[
p(Y) \propto \frac{1}{|V|^{1/2}} \exp\left[ -\frac{1}{2} (Y - \mu)^T V^{-1} (Y - \mu) \right]
\]

where $V^{-1}$ is the precision matrix, with non-zero entries specifying the conditional dependencies between elements in Y

For example, a 3 × 3 × 3 neighbourhood:

[Figure: 3 × 3 spatial neighbourhoods at times t−1, t and t+1]

requires 5 parameters, if we allow for symmetries


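Extending Rue and Tjelmeland (2002), we approximate both space and time by a torus. Therefore, all matrices are Toeplitz block circulant (TBC), and

• the first row summarises a matrix

• we can compute V from $V^{-1}$ via two 3-D Fourier transforms:

\[
V^*_{ijt} = \sum_{k=0}^{N_i-1} \sum_{l=0}^{N_j-1} \sum_{s=0}^{N_t-1} V^{-1}_{000,kls}
\exp\left[ -2\pi\iota \left( \frac{ik}{N_i} + \frac{jl}{N_j} + \frac{ts}{N_t} \right) \right]
\]

then

\[
V_{000,kls} = \frac{1}{N_i N_j N_t} \sum_{i=0}^{N_i-1} \sum_{j=0}^{N_j-1} \sum_{t=0}^{N_t-1} \frac{1}{V^*_{ijt}}
\exp\left[ 2\pi\iota \left( \frac{ik}{N_i} + \frac{jl}{N_j} + \frac{ts}{N_t} \right) \right]
\]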
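The two Fourier transforms above can be written out directly for a small torus. The sketch below evaluates the sums naively (no FFT) so that it mirrors the formulas term by term. The grid size and the precision values are made up for illustration, and the function names are hypothetical.

```python
import cmath
import itertools

def dft3(a, dims, sign):
    """Naive 3-D discrete Fourier transform of a dict keyed by (k, l, s)."""
    ni, nj, nt = dims
    index = list(itertools.product(range(ni), range(nj), range(nt)))
    out = {}
    for (i, j, t) in index:
        total = 0j
        for (k, l, s) in index:
            phase = i * k / ni + j * l / nj + t * s / nt
            total += a[(k, l, s)] * cmath.exp(sign * 2j * cmath.pi * phase)
        out[(i, j, t)] = total
    return out

def covariance_first_row(precision_row, dims):
    """First row of V from the first row of V^{-1} on a torus."""
    spectrum = dft3(precision_row, dims, sign=-1)            # V*_ijt
    inverse_spectrum = {key: 1.0 / value for key, value in spectrum.items()}
    back = dft3(inverse_spectrum, dims, sign=+1)
    n = dims[0] * dims[1] * dims[2]
    return {key: (value / n).real for key, value in back.items()}

# Hypothetical 4 x 4 x 3 torus with nearest-neighbour dependence only.
dims = (4, 4, 3)
index = list(itertools.product(*(range(n) for n in dims)))
precision_row = {key: 0.0 for key in index}
precision_row[(0, 0, 0)] = 1.0
for key in [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0)]:
    precision_row[key] = -0.1      # spatial neighbours
for key in [(0, 0, 1), (0, 0, 2)]:
    precision_row[key] = -0.15     # temporal neighbours
covariance_row = covariance_first_row(precision_row, dims)
```

Because both matrices are block circulant, their product is a circular convolution of the two first rows, which should equal 1 at lag zero and 0 elsewhere. A practical implementation would replace `dft3` with a fast Fourier transform.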


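Model fitting step 3:

We estimate GMRF parameters by minimising, over non-zero lags,

\[
\sum_i \sum_j \sum_t \frac{1}{i^2 + j^2 + t^2} \left( \hat{V}_{ijt} - V_{ijt} \right)^2
\]

where $\hat{V}_{ijt}$ is the autocorrelation estimated in step 2 and $V_{ijt}$ is the autocorrelation implied by the GMRF

For neighbourhood size 5 × 5 × 3:

[Figure: two panels, for time lag 0 and time lag 1 hour]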
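The fitting criterion for the GMRF parameters is a weighted sum of squared differences between two sets of autocorrelations, with weights that decrease with squared lag. A minimal sketch follows, under the assumption that the criterion compares the pairwise-likelihood estimates with the values implied by the GMRF and that the zero lag is excluded because its weight is undefined. The function name and the numbers in the example are hypothetical.

```python
def weighted_discrepancy(v_estimated, v_model):
    """Sum over non-zero lags of (estimate - model)^2 / (i^2 + j^2 + t^2).

    Both arguments are dicts keyed by lag (i, j, t).
    """
    total = 0.0
    for (i, j, t), estimate in v_estimated.items():
        if (i, j, t) == (0, 0, 0):
            continue  # weight undefined at zero lag
        weight = 1.0 / (i * i + j * j + t * t)
        total += weight * (estimate - v_model[(i, j, t)]) ** 2
    return total

# Made-up autocorrelations at three lags.
v_estimated = {(0, 0, 0): 1.0, (1, 0, 0): 0.9, (0, 2, 0): 0.7}
v_model = {(0, 0, 0): 1.0, (1, 0, 0): 0.8, (0, 2, 0): 0.5}
criterion = weighted_discrepancy(v_estimated, v_model)
```

In this example the lag-1 term contributes 0.01 and the lag-2 term contributes 0.04 / 4 = 0.01, so short lags dominate the fit for a given size of error.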


Model diagnostics:

Bivariate histogram of pairs of wet locations at a spatial separation of 8 km

[Figure: observed and expected bivariate histograms, axes from 0 to 50 mm]


Model diagnostics:

Histogram of rainfall for locations for which the adjacent location was dry

[Figure: histogram with observed values as a solid line and expected values as a dashed line]


Disaggregation

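Gibbs sampling to update blocks of 5 × 5 pixels ($Y_A$)

Conditional distribution is multivariate normal, obtained from

\[
\begin{pmatrix} Y_A \\ Y_B \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} V_{AA} & V_{AB} \\ V_{BA} & V_{BB} \end{pmatrix}
\right)
\]

where dimension of neighbourhood $Y_B$ is $(3 \times 9^2 - 5^2) = 218$

Use rejection sampling to constrain $Y_A$ such that $\sum Z_A$ matches observed rainfall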
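The two ingredients of the disaggregation step, conditioning a Gaussian block on its neighbourhood and rejecting draws that do not reproduce the observed total, can be sketched in a heavily simplified form. Everything below is an illustrative assumption rather than the authors' scheme: the blocks are scalars, the values within a block are drawn independently, the transformation is taken as f(Y) = Y², and the total is matched only to within a tolerance.

```python
import random

def conditional_normal(mu_a, mu_b, v_aa, v_ab, v_bb, y_b):
    """Mean and variance of Y_A given Y_B = y_b, for scalar blocks."""
    mean = mu_a + v_ab / v_bb * (y_b - mu_b)
    variance = v_aa - v_ab * v_ab / v_bb
    return mean, variance

def sample_block(means, sd, target, tol, rng, max_tries=100000):
    """Rejection sampling: redraw latent values until sum(Z) is within tol of target."""
    for _ in range(max_tries):
        y = [rng.gauss(m, sd) for m in means]
        z = [max(0.0, value) ** 2 for value in y]  # censor, then hypothetical f(Y) = Y^2
        if abs(sum(z) - target) <= tol:
            return y, z
    raise RuntimeError("no draw accepted; widen tol or raise max_tries")

mean, variance = conditional_normal(0.0, 0.0, 1.0, 0.5, 1.0, y_b=2.0)
y_block, z_block = sample_block([1.0] * 4, 1.0, target=6.0, tol=1.0,
                                rng=random.Random(1))
```

In the full model the conditional mean and covariance come from the matrix form of the same formula, with $V_{AB} V_{BB}^{-1}$ in place of the scalar ratio.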


Which are the 2 simulated disaggregations?

[Figure: three rainfall maps with a colour scale; the following slide labels them Simulation 1, Observed and Simulation 2]


4. COMPOSITIONAL DATA – FOOD COMPOSITION

[Figure: four ternary plots of food composition (protein, fat, carbs; axes from 0.2 to 0.8), one each for Beef, Pork, Fish and Beverages]

USDA Nutrient Database: composition of 7270 foods in 25 food groups


We model food compositions by:

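\[
Z = \arg\min_X \left\{ \|X - Y\| : X \in \mathcal{S} \right\}
\qquad \text{where } Y \sim MVN(\mu, V)
\]

and $\mathcal{S} = \{X : X \ge 0,\ X^T 1 = 1\}$ is the unit simplex

(Where we ensure $Y^T 1 = 1$, by constraining $\mu^T 1 = 1$ and $VJ = 0$)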


In D dimensions, if $Y_1 \le Y_2 \le \cdots \le Y_D$

\[
Z_l = \begin{cases}
0 & \text{if } l \le L \\[4pt]
Y_l + \dfrac{1}{D - L} \displaystyle\sum_{i=1}^{L} Y_i & \text{otherwise}
\end{cases}
\]

where L is the smallest integer such that Z ≥ 0

Model fitting

For D ≤ 3, we compute likelihoods analytically

For D > 3, we use MCMC: a Gibbs sampler alternately simulating:

• (Y |Z) by rejection sampling

• µ and V
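The rule for turning a latent vector Y into a composition Z can be sketched as follows. The function name is hypothetical, and the code assumes the entries of Y already sum to 1, as the model requires. It sorts Y, tries L = 0, 1, 2, … in turn, and stops at the first L for which every component of Z is non-negative.

```python
def project_to_simplex(y):
    """Map y (entries summing to 1) to a composition z >= 0 summing to 1."""
    d = len(y)
    order = sorted(range(d), key=lambda idx: y[idx])  # ascending order of Y
    y_sorted = [y[idx] for idx in order]
    for n_zero in range(d):
        # Spread the mass of the n_zero smallest entries over the rest.
        shift = sum(y_sorted[:n_zero]) / (d - n_zero)
        z_sorted = [0.0] * n_zero + [value + shift for value in y_sorted[n_zero:]]
        if all(value >= 0 for value in z_sorted):
            break
    z = [0.0] * d
    for position, idx in enumerate(order):
        z[idx] = z_sorted[position]  # restore the original ordering
    return z
```

For example, `project_to_simplex([0.5, -0.2, 0.7])` sets the negative component to zero and shares its deficit equally between the other two, giving approximately (0.4, 0, 0.6). A vector that already lies in the simplex is returned unchanged.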


Maximum likelihood estimates:

[Figure: four ternary plots (protein, fat, carbs; axes from 0.2 to 0.8) for Beef, Pork, Fish and Beverages under the fitted models]

Likelihood ratio test shows beef and pork to be different


5. SUMMARY

We have developed Tobit models for data that are:

1. Univariate – crop lodging – additive model

2. Multivariate – food intake – Latent Factors model

3. Spatio-temporal – rainfall – GMRF model

4. Compositional – food composition – bivariate normal model

Issues remaining:

• Efficient estimation

• Model diagnostics

• Generalisations when model does not fit

Further details are in papers on

http://www.bioss.sari.ac.uk/staff/chris.html
