Statistical modelling and inference for spatio-temporal disease processes
Martin Hazelton
Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University
24 October 2012
Presenter: [email protected]
IDReC Symposium October 2012, Palmerston North 1 / 31
Disease, Prediction and Chance
Spread of infectious disease is a complex process typically involving a plethora of factors.
Some are explicable/predictable.
Others are essentially random.
Plenty of work for statisticians deciding which is which.
Spatial Patterns ... of Disease?
[Figure: three spatial point patterns, labelled A, B and C]
Complete spatial randomness
Locations of gastroenteritis cases
Mystery spatial point pattern
[Figure: the three point patterns again, titled "Gastroenteritis cases", "Complete spatial randomness" and "Sea anemone locations"]
Complete spatial randomness (B)
Locations of gastroenteritis cases (A)
Mystery spatial point pattern (C)
Measuring Spatial Risk
To understand spatial patterns of disease we must adjust for underlying population density.
To this end, Bithell (1990) defined the relative risk function:
\[ r(z) = \frac{f(z)}{g(z)}, \qquad z \in R \]
where f is the density of cases and g the density of controls.
Usual to work on log-scale:
\[ \rho(z) = \log\{r(z)\} = \log\{f(z)\} - \log\{g(z)\} \]
Bithell J.F. (1990). An application of density estimation to geographical epidemiology. Statistics in Medicine 9:691–701.
Kernel Estimation
A straightforward approach to estimating r is to replace the unknown densities by kernel estimates:
\[ \hat{r}(z) = \frac{\hat{f}(z)}{\hat{g}(z)}, \qquad z \in R \]
Kernel density estimate constructed from bivariate data \(x_1, \dots, x_n\):
\[ \hat{f}(z) = n^{-1} \sum_{i=1}^{n} K_h(z - x_i) \]
- \(K_h(z) = h^{-2} K(z/h)\), with kernel K being spherically symmetric;
- h is the bandwidth, controlling degree of smoothing.
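The estimator above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a Gaussian kernel, a common bandwidth for cases and controls, and synthetic data on the unit square; the function names (`kde`, `log_relative_risk`) are hypothetical and are not taken from the talk or from any package.

```python
import math
import random

def gaussian_kernel(u, v):
    """Standard bivariate Gaussian kernel K (spherically symmetric)."""
    return math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi)

def kde(z, points, h):
    """Fixed-bandwidth estimate n^-1 * sum_i K_h(z - x_i), with K_h(z) = h^-2 K(z/h)."""
    total = 0.0
    for (x, y) in points:
        total += gaussian_kernel((z[0] - x) / h, (z[1] - y) / h) / (h * h)
    return total / len(points)

def log_relative_risk(z, cases, controls, h):
    """Estimated log-relative risk: log f-hat(z) - log g-hat(z)."""
    return math.log(kde(z, cases, h)) - math.log(kde(z, controls, h))

# Synthetic data (an assumption for illustration, not the talk's data):
# controls spread uniformly over the unit square, cases clustered near (0.3, 0.3).
rng = random.Random(1)
controls = [(rng.random(), rng.random()) for _ in range(400)]
cases = [(rng.gauss(0.3, 0.1), rng.gauss(0.3, 0.1)) for _ in range(60)]

rho_near = log_relative_risk((0.3, 0.3), cases, controls, h=0.15)
rho_far = log_relative_risk((0.8, 0.8), cases, controls, h=0.15)
print(rho_near, rho_far)
```

In this synthetic setting the estimated log-relative risk is positive near the case cluster and negative far from it, which is the pattern the ratio of densities is designed to pick up.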
Constructing Kernel Density Estimates
An example using the gastroenteritis data
[Figure sequence (images not recovered): case locations added a few at a time, ending with the full data set]
Smoothing of Kernel Density Estimates
Small bandwidth (undersmoothing)
[Figure sequence not recovered]
Large bandwidth (oversmoothing)
[Figure sequence not recovered]
Example: Cancer of the Larynx in South Lancashire
Data are geographical coordinates for 58 cases of cancer of the larynx, and 978 controls (cases of lung cancer).
Data collected between 1974 and 1983 by Chorley and South Ribble Health Authority in Lancashire, England.
[Figure: point map of the data (image not recovered)]
The Effect of Bandwidth Selection
For the log-relative risk function
[Figure: estimated log-relative risk surfaces for h=1 and h=10; scale ticks from −15 to 0]
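The sensitivity to h can be seen numerically on a toy example. The sketch below is an illustration under assumptions: synthetic uniform data on the unit square, a Gaussian kernel, and two arbitrarily chosen bandwidths (0.02 and 1.0) that are unrelated to the h=1 and h=10 of the larynx example. It compares how much the density estimate varies across a set of evaluation points.

```python
import math
import random

def kde(z, points, h):
    """Fixed-bandwidth bivariate Gaussian kernel density estimate at z."""
    total = 0.0
    for (x, y) in points:
        u, v = (z[0] - x) / h, (z[1] - y) / h
        total += math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi * h * h)
    return total / len(points)

rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(50)]  # synthetic data

# Evaluate on a coarse grid plus the data points themselves.
grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
eval_points = grid + points

def value_range(h):
    """Spread (max minus min) of the estimate over the evaluation points."""
    values = [kde(z, points, h) for z in eval_points]
    return max(values) - min(values)

range_small_h = value_range(0.02)  # undersmoothed: sharp spikes at data points
range_large_h = value_range(1.0)   # oversmoothed: nearly flat surface
print(range_small_h, range_large_h)
```

The small bandwidth gives a spiky estimate with a large spread, while the large bandwidth flattens almost all structure.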
Research on Bandwidth Selection
Bandwidth selection is a challenging problem for both theoretical and practical reasons.
- Typical inhomogeneity of populations.
- Want stable estimates of risk even where data are sparse.
- Asymptotic theory very delicate.
Recent progress on spatially adaptive smoothing regimens: Davies & Hazelton (2010).
Davies, T.M. and Hazelton, M.L. (2010). Adaptive kernel estimation of spatial relative risk. Statistics in Medicine 29, 2423–2437.
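One widely used adaptive scheme is Abramson's square-root rule, in which each observation gets its own bandwidth, inversely proportional to the square root of a pilot density estimate at that point. The slides do not spell out the scheme used, so the sketch below is an assumption for illustration: the names `abramson_bandwidths`, `h0` and `pilot_h` are hypothetical, and the data are synthetic.

```python
import math
import random

def gauss2(u, v):
    """Standard bivariate Gaussian kernel."""
    return math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi)

def fixed_kde(z, points, h):
    """Fixed-bandwidth pilot estimate at z."""
    s = sum(gauss2((z[0] - x) / h, (z[1] - y) / h) for (x, y) in points)
    return s / (len(points) * h * h)

def abramson_bandwidths(points, h0, pilot_h):
    """h_i = h0 * pilot(x_i)^(-1/2) / gamma, gamma = geometric mean of pilot^(-1/2)."""
    inv_sqrt = [fixed_kde(p, points, pilot_h) ** -0.5 for p in points]
    gamma = math.exp(sum(math.log(v) for v in inv_sqrt) / len(inv_sqrt))
    return [h0 * v / gamma for v in inv_sqrt]

def adaptive_kde(z, points, bandwidths):
    """Adaptive estimate: each point contributes a kernel with its own bandwidth."""
    total = 0.0
    for (x, y), h in zip(points, bandwidths):
        total += gauss2((z[0] - x) / h, (z[1] - y) / h) / (h * h)
    return total / len(points)

# Synthetic data: a tight cluster of 80 points plus 20 scattered points.
rng = random.Random(3)
dense = [(rng.gauss(0.25, 0.03), rng.gauss(0.25, 0.03)) for _ in range(80)]
sparse = [(rng.uniform(0.5, 1.0), rng.uniform(0.5, 1.0)) for _ in range(20)]
points = dense + sparse

bw = abramson_bandwidths(points, h0=0.1, pilot_h=0.1)
mean_bw_dense = sum(bw[:80]) / 80
mean_bw_sparse = sum(bw[80:]) / 20
print(mean_bw_dense, mean_bw_sparse)
```

Points in the dense cluster receive small bandwidths (preserving detail) and isolated points receive large ones (stabilising the estimate where data are sparse). The geometric-mean scaling keeps the bandwidths centred on the global value `h0`.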
Estimation with Spatially Adaptive Bandwidths
Estimation of case density
[Figure sequence not recovered]
Estimation of log-relative risk surface
[Figure: estimated log-relative risk surface; scale ticks from −15 to 0]
Boundary Bias
Disease data collected over finite region.
For points near edge, some of kernel weight spills over boundary and is lost.
Has significant impact on kernel estimates.
Theoretical analysis is intricate.
[Figure not recovered]
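The lost kernel weight can be computed exactly in a simple case. The sketch below assumes a square study region [0, 1]^2 and a Gaussian kernel, and applies a standard edge correction that divides the estimate at z by the share of kernel mass falling inside the region. This is a textbook correction shown for illustration; it is not the adaptive boundary kernel analysed in the talk, and the function names are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kde(z, points, h):
    """Uncorrected fixed-bandwidth Gaussian kernel estimate at z."""
    total = 0.0
    for (x, y) in points:
        u, v = (z[0] - x) / h, (z[1] - y) / h
        total += math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi * h * h)
    return total / len(points)

def inside_mass(z, h, lo=0.0, hi=1.0):
    """Share of a Gaussian kernel centred at z that lies inside [lo, hi]^2."""
    mass_x = norm_cdf((hi - z[0]) / h) - norm_cdf((lo - z[0]) / h)
    mass_y = norm_cdf((hi - z[1]) / h) - norm_cdf((lo - z[1]) / h)
    return mass_x * mass_y

def kde_edge_corrected(z, points, h):
    """Rescale the estimate by the kernel mass retained inside the region."""
    return kde(z, points, h) / inside_mass(z, h)

rng = random.Random(11)
points = [(rng.random(), rng.random()) for _ in range(500)]  # synthetic uniform data
h = 0.05

q_centre = inside_mass((0.5, 0.5), h)   # essentially no weight lost
q_corner = inside_mass((0.0, 0.0), h)   # about three quarters of the weight lost
raw_corner = kde((0.0, 0.0), points, h)
fixed_corner = kde_edge_corrected((0.0, 0.0), points, h)
print(q_centre, q_corner, raw_corner, fixed_corner)
```

At the corner only a quarter of the kernel lies inside the region, so the uncorrected estimate there is biased downwards by roughly a factor of four.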
Foundations of Asymptotic Analysis of Boundary Bias
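Asymptotics with Adaptive Boundary Kernel
Or ... Why Postdocs are Great
\[
\begin{aligned}
\frac{\mathrm{bias}(\hat{f}(z))}{h_0^2}
={}& \frac{[D^{10}f(z)]^2}{4f(z)} \Bigl[
  \frac{b_0}{q}\bigl(a^{(20)}_{40} + 2a^{(11)}_{31} + a^{(02)}_{22} + 7a^{(10)}_{30} + 7a^{(01)}_{21} + 8a_{20}\bigr) \\
&\qquad + \frac{b_1}{q}\bigl(a^{(20)}_{50} + 2a^{(11)}_{41} + a^{(02)}_{32} + 9a^{(10)}_{40} + 9a^{(01)}_{31} + 15a_{30}\bigr) \\
&\qquad + \frac{b_2}{q}\bigl(a^{(20)}_{41} + 2a^{(12)}_{32} + a^{(02)}_{23} + 9a^{(10)}_{31} + 9a^{(01)}_{22} + 15a_{21}\bigr) \Bigr] \\
&+ \frac{D^{10}f(z)\,D^{01}f(z)}{4f(z)} \Bigl[
  \frac{b_0}{q}\bigl(2a^{(20)}_{31} + 4a^{(11)}_{22} + 2a^{(02)}_{13} + 14a^{(10)}_{21} + 14a^{(01)}_{12} + 16a_{11}\bigr) \\
&\qquad + \frac{b_1}{q}\bigl(2a^{(20)}_{41} + 4a^{(11)}_{22} + 2a^{(02)}_{23} + 18a^{(10)}_{31} + 18a^{(01)}_{22} + 30a_{21}\bigr) \\
&\qquad + \frac{b_2}{q}\bigl(2a^{(20)}_{32} + 4a^{(11)}_{23} + 2a^{(02)}_{14} + 18a^{(10)}_{22} + 18a^{(01)}_{13} + 30a_{12}\bigr) \Bigr] \\
&+ \frac{[D^{01}f(z)]^2}{4f(z)} \Bigl[
  \frac{b_0}{q}\bigl(a^{(20)}_{22} + 2a^{(11)}_{13} + a^{(02)}_{04} + 7a^{(10)}_{12} + 7a^{(01)}_{03} + 8a_{02}\bigr) \\
&\qquad + \frac{b_1}{q}\bigl(a^{(20)}_{32} + 2a^{(11)}_{23} + a^{(02)}_{14} + 9a^{(10)}_{22} + 9a^{(01)}_{13} + 15a_{12}\bigr) \\
&\qquad + \frac{b_2}{q}\bigl(a^{(20)}_{23} + 2a^{(11)}_{14} + a^{(02)}_{05} + 9a^{(10)}_{13} + 9a^{(01)}_{04} + 15a_{03}\bigr) \Bigr] \\
&+ \frac{D^{20}f(z)}{4} \Bigl[
  \frac{b_0}{q}\bigl(2a^{(10)}_{30} + 2a^{(01)}_{21} + 8a_{20}\bigr)
  + \frac{b_1}{q}\bigl(2a^{(10)}_{40} + 2a^{(01)}_{31} + 10a_{30}\bigr)
  + \frac{b_2}{q}\bigl(2a^{(10)}_{31} + 2a^{(01)}_{22} + 10a_{21}\bigr) \Bigr] \\
&+ \frac{D^{11}f(z)}{4} \Bigl[
  \frac{b_0}{q}\bigl(4a^{(10)}_{21} + 4a^{(01)}_{12} + 16a_{11}\bigr)
  + \frac{b_1}{q}\bigl(4a^{(10)}_{31} + 4a^{(01)}_{22} + 20a_{21}\bigr)
  + \frac{b_2}{q}\bigl(4a^{(10)}_{22} + 4a^{(01)}_{13} + 20a_{12}\bigr) \Bigr] \\
&+ \frac{D^{02}f(z)}{4} \Bigl[
  \frac{b_0}{q}\bigl(2a^{(10)}_{12} + 2a^{(01)}_{03} + 8a_{02}\bigr)
  + \frac{b_1}{q}\bigl(2a^{(10)}_{22} + 2a^{(01)}_{13} + 10a_{12}\bigr)
  + \frac{b_2}{q}\bigl(2a^{(10)}_{13} + 2a^{(01)}_{04} + 10a_{03}\bigr) \Bigr],
\end{aligned}
\]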
Computer Implementation
Important practical problems in spatial epidemiology can quickly lead to fascinating theoretical work.
Need to make this stuff available to users.
Released R package sparr. See Davies, Hazelton & Marshall (2011).
Davies, T.M., Hazelton, M.L. and Marshall, J.C. (2011). sparr: Analyzing spatial relative risk using fixed and adaptive kernel density estimation in R. Journal of Statistical Software 39, 1–14.
Statistics and Bioinformatics Group Research
Methodological research with applications to infectious disease
- Spatial statistics
- Markov chain Monte Carlo methods
- Simulation based inference
- Smoothing methods
- Semiparametric modelling
- Statistical software development
Variety of applications
Some Members of the Research Group
Postgraduate students
Sarojinie Fernando, PhD (recently submitted)
Brigid Betz-Stablein, PhD (year 3)
Kate Richards, PhD (year 2)
Khair Jones, PhD (year 1)
Lyndal Henden, Honours
Staff with interest in infectious disease applications
Prof Martin Hazelton
Dr Chris Jewell
A/Prof Geoff Jones
Dr Jonathan Marshall
Improved Spatio-Temporal Risk Estimation
[Figure: point pattern (image not recovered)]
Local linear versus density ratio estimation of relative risk.
Application to myrtle wilt in Tasmanian myrtle beech (Nothofagus cunninghamii).
Modelling Progression of Eye Disease
[Figure: two panels, (a) SPROG method and (b) PLR method; scale ticks from −4 to 0]
Using ideas from geographical disease mapping to examine spatial patterns of damage over retina.
Interpretation of visual field data must account for physiology and optics.
Modelling Foot and Mouth Disease in Vietnam
Data quality issues for FMD in Vietnam.
Appropriate level of data aggregation for modelling?
Developing methods for identifying anomalies
- High risk provinces.
- Under-reporting.
Bayesian Constrained Smoothing Problems
[Figure: "Toxoplasmosis positive" (0.0 to 1.0) against "Rainfall (m)" (1.6 to 2.3), with fits labelled Monotonic and Unconstrained]
In some problems want flexible smooth fit under shape constraints.
Application to toxoplasmosis data from 34 cities in El Salvador.
Methods of Simulation Based Inference
Modern complex stochastic models often have intractable distribution theory.
Standard methods of inference infeasible.
We are working on improving simulation based approaches (e.g. ABC).
Application is estimation of effective population sizes in Bali using coalescent model.
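The basic idea behind ABC (approximate Bayesian computation) can be shown with a rejection sampler on a toy problem. The sketch below is purely illustrative and is not the coalescent application: it assumes a normal model with unknown mean, a uniform prior on (-5, 5), the sample mean as summary statistic, and a hypothetical tolerance `eps`. A proposal is accepted when data simulated from it land close to the observed summary, so the likelihood is never evaluated.

```python
import random

rng = random.Random(2012)

def simulate_mean(mu, n):
    """Simulate n draws from N(mu, 1) and return their mean (the summary statistic)."""
    return sum(rng.gauss(mu, 1.0) for _ in range(n)) / n

n = 30
observed_mean = simulate_mean(2.0, n)  # stand-in for real data; true mean is 2

def abc_rejection(observed, n, n_proposals, eps):
    """Keep prior draws whose simulated summary is within eps of the observed one."""
    accepted = []
    for _ in range(n_proposals):
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the prior
        if abs(simulate_mean(mu, n) - observed) < eps:
            accepted.append(mu)
    return accepted

accepted = abc_rejection(observed_mean, n, n_proposals=5000, eps=0.2)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), posterior_mean, observed_mean)
```

The accepted values approximate a sample from the posterior. Shrinking `eps` improves the approximation but lowers the acceptance rate, which is one reason more efficient simulation based schemes are of interest.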