surveillance package Aberration Detection Discussion References
Danish Mortality Monitoring using the R package surveillance
Michael Hohle¹
¹ Department for Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
Statistical Methods for Outbreak Detection, Open University, Milton Keynes, UK
19 May 2010
R package surveillance Michael Hohle 1/ 23
Outline
1 The surveillance package
2 Aberration Detection
   Negative Binomial CUSUM – Theory
   Negative Binomial CUSUM – Mortality Monitoring
   Run-length properties
3 Discussion
What is surveillance? (1)
An open source package for the visualization, modeling and monitoring of count data and categorical time series in public health surveillance
Prospective outbreak detection methods for univariate count data time series:
farrington – Farrington et al. (1996)
cusum – Rossi et al. (1999) and extensions
rogerson – Rogerson and Yamada (2004)
glrnb – Hohle and Paul (2008)
Retrospective count data time series models:
hhh – Held et al. (2005); Paul et al. (2008)
twins – Held et al. (2006)
Spatio-Temporal cluster detection
stcd – Assuncao and Correa (2009).
What is surveillance? (2)
Motivation: Provide data structure and implementational framework for methodological developments
Spin-off: Tool for epidemiologists and others working in applied disease monitoring
Availability: CRAN, current development version from
http://surveillance.r-forge.r-project.org/
Package is available under the GNU General Public License (GPL) v. 2.0.
The EuroMOMO project
European monitoring of excess mortality for public health action (EuroMOMO)
Aim: develop and strengthen real-time monitoring of mortality across Europe in order to enhance the management of serious public health risks such as pandemic influenza, heat waves and cold snaps
Main outcome of mortality monitoring: excess mortality
EuroMOMO in this talk
Danish mortality data provided by Statens Serum Institut, Denmark, are used to illustrate the surveillance package.
Data structure: The sts class
Surveillance time series $\{y_{it};\ t = 1, \ldots, n,\ i = 1, \ldots, m\}$ are represented using objects of class sts
> data("momo")
> momo
-- An object of class sts --
freq: 52 with strptime format string %V
start: 1994-01-03
dim(observed): 782 8
Head of observed:
[0,1) [1,5) [5,15) [15,45) [45,65) [65,75) [75,85) [85,Inf)
[1,] 11 4 2 53 212 279 528 408
...
Dates can be stored using the R Date class which handles the ISO 8601 date standard
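As a rough illustration of the Date-based indexing mentioned above, the following base R sketch rebuilds a weekly grid with the start date and length shown in the printout (1994-01-03, 782 rows) and derives ISO 8601 fields from it. The variable names are made up for this example, it does not touch the momo object itself, and the %G, %V and %u output formats are assumed to be supported on the platform in use.

```r
# Weekly grid with the start date and number of rows from the momo printout.
start  <- as.Date("1994-01-03")
epochs <- seq(start, by = "week", length.out = 782)

# %u = ISO weekday (1 = Monday), %G = ISO week-based year, %V = ISO week number.
iso <- data.frame(
  date     = epochs,
  weekday  = format(epochs, "%u"),
  iso_year = format(epochs, "%G"),
  iso_week = format(epochs, "%V")
)

head(iso, 3)  # first three weeks of the grid
tail(iso, 3)  # last three weeks of the grid
```

Because the grid starts on a Monday and advances in steps of seven days, every entry is the first day of its ISO week.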
Visualizing sts objects (1)
The plot function provides an interface to several visual representations controlled by the type argument.
> plot(momo[year(momo) >= 2000, ], type = observed ~ time |
+ unit)
[Figure: weekly counts from 2000 onwards, one panel per age group: [0,1), [1,5), [5,15), [15,45), [45,65), [65,75), [75,85), [85,Inf); x-axis: time, y-axis: No. infected]
Visualizing sts objects (2)
> plot(momo[, "[0,1)"], ylab = "No. of deaths")
[Figure: weekly time series for age group [0,1), axis ticks from 1994 to 2007; x-axis: time, y-axis: No. of deaths]
Summarizing: The series contain small and large counts, trends and seasonality → take this into account within a statistical model.
Statistical Framework for Aberration Detection
Univariate time series $\{y_t,\ t = 1, 2, \ldots\}$ to monitor
At the unknown time $\tau$, an important change in the process occurs. For each time $t$ we differentiate between two states:
$$x_t = \begin{cases} 0 & \text{if } t < \tau \text{ (in-control)}, \\ 1 & \text{otherwise (out-of-control)}. \end{cases}$$
At time $s \ge 1$, the available information is $\boldsymbol{y}_s = \{y_t;\ t \le s\}$. Detection is based on a statistic $r(\cdot)$ with resulting alarm time
$$T_A = \min\{s \ge 1 : r(\boldsymbol{y}_s) > g\},$$
where $g$ is a known threshold.
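The alarm time definition can be sketched in a few lines of base R. The function name alarm_time and the demonstration values are assumptions made for this illustration; the input is simply the sequence of statistics r(y_1), r(y_2), ... evaluated so far.

```r
# First index s at which the monitoring statistic exceeds the threshold g;
# NA if no alarm has been raised within the observed sequence.
alarm_time <- function(r, g) {
  hits <- which(r > g)
  if (length(hits) > 0) hits[1] else NA_integer_
}

# Made-up sequence of statistic values, for illustration only.
r_demo <- c(0.0, 1.2, 3.9, 5.1, 2.0, 6.3)

alarm_time(r_demo, g = 4.75)  # index of the first exceedance
alarm_time(r_demo, g = 10)    # threshold never exceeded
```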
Theory: Negative Binomial CUSUM (1)
Likelihood ratio between the out-of-control and in-control models at time $s$ given that $\tau = t$:
$$L(s, t) = \frac{f(\boldsymbol{y}_s \mid \tau = t)}{f(\boldsymbol{y}_s \mid \tau > s)} = \prod_{i=t}^{s} \frac{f(y_i; \theta_1)}{f(y_i; \theta_0)},$$
where $f(\cdot; \theta)$ is the negative binomial PMF with parameter vector $\theta$.
Cumulative Sum (CUSUM) procedure advantageous for detecting sustained shifts:
$$r(\boldsymbol{y}_s) = \max_{1 \le t \le s} \log L(s, t).$$
Theory: Negative Binomial CUSUM (2)
The computation of $r(\boldsymbol{y}_s)$ in recursive form:
$$r_0 = 0, \qquad r_s = \max\left(0,\ r_{s-1} + \log\left\{\frac{f(y_s; \theta_1)}{f(y_s; \theta_0)}\right\}\right), \quad s \ge 1.$$
When there is evidence against in-control, the log-likelihood ratio (LLR) contributions are added up.
No credit in the direction of the in-control is given because $r_s$ cannot get below zero.
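To make the recursion concrete, here is a minimal base R sketch that applies it to a vector of per-observation LLR contributions and checks it against a direct evaluation of the maximum over candidate change points, floored at zero. Function names and the LLR values are invented for this example and are not part of the package.

```r
# Recursive CUSUM: r_s = max(0, r_{s-1} + llr_s), starting from r_0 = 0.
cusum_recursive <- function(llr) {
  r <- numeric(length(llr))
  r_prev <- 0
  for (s in seq_along(llr)) {
    r[s] <- max(0, r_prev + llr[s])
    r_prev <- r[s]
  }
  r
}

# Direct form: maximum over t = 1..s of the summed LLR from t to s, floored at 0.
cusum_direct <- function(llr) {
  sapply(seq_along(llr), function(s) {
    partial <- sapply(seq_len(s), function(t) sum(llr[t:s]))
    max(0, partial)
  })
}

# Made-up LLR contributions: negative values speak for in-control.
llr_demo <- c(-0.4, 0.7, 0.9, -2.5, 0.3, 1.1)

cusum_recursive(llr_demo)
all.equal(cusum_recursive(llr_demo), cusum_direct(llr_demo))
```

The recursive form needs only the previous value of the statistic, which is what makes it convenient for prospective week-by-week monitoring.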
Theory: Negative Binomial CUSUM (3)
Negative-binomial response with fixed dispersion parameter $\alpha$ and in-control mean modeled using a GLM with log-link
$$y_t \sim \operatorname{NegBin}(\mu_{0,t}, \alpha),$$
$$\log(\mu_{0,t}) = \log(\text{pop}_t) + \beta_0 + \beta_1 \cdot t + c_t,$$
where $c_t$ is a cyclic function with period 52 or 53 depending on the number of ISO weeks in the year of $t$ and $\text{pop}_t$ denotes the population size in the respective age group at time $t$.
As a consequence, $E(y_t) = \mu_{0,t}$ and $\operatorname{Var}(y_t) = \mu_{0,t} + \alpha \cdot \mu_{0,t}^2$.
Out-of-control model for given $\kappa > 1$:
$$\mu_{1,t} = \kappa \cdot \mu_{0,t}.$$
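Under this parameterization the LLR contribution of a single observation has a closed form, y log(κ) + (y + 1/α) log{(1 + α µ0) / (1 + α κ µ0)}. The base R sketch below, with a hypothetical helper llr_negbin and made-up numbers, checks that expression against dnbinom(), whose size argument corresponds to 1/α so that the variance is µ + α µ².

```r
# Closed-form log-likelihood ratio for NegBin(kappa * mu0, alpha) vs NegBin(mu0, alpha).
llr_negbin <- function(y, mu0, kappa, alpha) {
  shrink <- log((1 + alpha * mu0) / (1 + alpha * kappa * mu0))
  y * log(kappa) + (y + 1 / alpha) * shrink
}

# Illustrative values only (not taken from the Danish data).
y     <- c(380, 450, 520)
mu0   <- c(440, 450, 460)
kappa <- 1.2
alpha <- 0.01

closed  <- llr_negbin(y, mu0, kappa, alpha)
via_pmf <- dnbinom(y, mu = kappa * mu0, size = 1 / alpha, log = TRUE) -
           dnbinom(y, mu = mu0,         size = 1 / alpha, log = TRUE)

all.equal(closed, via_pmf)
```

The contribution is linear in y with a positive slope for κ > 1, so counts well above the in-control mean push the CUSUM up and counts below it pull the statistic back towards zero.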
Application: Negative Binomial CUSUM (1)
Monitoring example: Age group 75-84 starting from week 40 in 2007 (i.e. 1st October 2007) using past 5 years as reference:
> library("MASS")  # provides glm.nb
> m <- glm.nb(`observed.[75,85)` ~ 1 + epoch + sin(2*pi*epochInPeriod) +
+     cos(2*pi*epochInPeriod) + offset(log(`population.[75,85)`)),
+     data = momo.df[phase1, ])
> mu0 <- predict(m, newdata = momo.df[phase2, ], type = "response")
Aim: to optimally detect a 20% increase in the mean, i.e. κ = 1.2. Use g = 4.75 – consequences?
> kappa <- 1.2
> s.nb <- glrnb(momo[, "[75,85)"], control = list(range = phase2,
+ alpha = 1/m$theta, mu0 = mu0, c.ARL = 4.75, theta = log(kappa),
+ ret = "cases"))
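Since momo.df, phase1 and phase2 are not defined on the slide, the in-control model step can be tried on simulated data. The sketch below is only an assumption-laden stand-in: the data frame, its column names, the population size and all coefficients are invented, and only the model formula mirrors the call above. It relies on the MASS package for glm.nb.

```r
library(MASS)  # glm.nb

set.seed(1)
n  <- 52 * 6                                   # six years of weekly data
df <- data.frame(epoch = seq_len(n))
df$epochInPeriod <- ((df$epoch - 1) %% 52) / 52  # position within the year
df$population    <- 300000                     # hypothetical constant population

# Simulate negative binomial counts with trend, seasonality and dispersion alpha.
alpha_true <- 0.01
mu_true <- df$population *
  exp(-6.5 + 0.0005 * df$epoch + 0.1 * cos(2 * pi * df$epochInPeriod))
df$observed <- rnbinom(n, mu = mu_true, size = 1 / alpha_true)

phase1 <- 1:(52 * 5)        # five reference years
phase2 <- (52 * 5 + 1):n    # one year to monitor

fit <- glm.nb(observed ~ 1 + epoch + sin(2 * pi * epochInPeriod) +
                cos(2 * pi * epochInPeriod) + offset(log(population)),
              data = df[phase1, ])

mu0_hat   <- predict(fit, newdata = df[phase2, ], type = "response")
alpha_hat <- 1 / fit$theta  # glm.nb reports theta = 1 / alpha
```

The predicted means mu0_hat and the dispersion alpha_hat play the roles of mu0 and alpha in the glrnb control list shown above.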
Application: Negative Binomial CUSUM (2)
For week 02 in 2008 an alarm is generated:
[Figure: weekly number of deaths, axis ticks from 2007 IV to 2008 IV; x-axis: time (weeks), y-axis: No. of deaths; legend: µ0, µ1, NNBA]
Also shown is the number needed before alarm (NNBA), i.e. given $r(\boldsymbol{y}_{s-1})$ find the minimum $y_s$ such that $r(\boldsymbol{y}_s) > g$.
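For the negative binomial CUSUM the NNBA can be obtained in closed form, because the LLR contribution is linear and increasing in the count when κ > 1. The following base R sketch shows one way to compute it; the function nnba and the example values are assumptions for illustration and not the package implementation.

```r
# Smallest count y such that max(0, r_prev + llr(y)) exceeds g (g > 0, kappa > 1).
nnba <- function(r_prev, mu0, kappa, alpha, g) {
  shrink    <- log((1 + alpha * mu0) / (1 + alpha * kappa * mu0))
  slope     <- log(kappa) + shrink  # LLR gain per additional case, > 0 for kappa > 1
  intercept <- shrink / alpha       # LLR contribution at y = 0
  y_star    <- (g - r_prev - intercept) / slope
  max(0, floor(y_star) + 1)         # smallest integer strictly above y_star
}

# Made-up state of the chart: previous statistic 2, in-control mean 450.
y_needed <- nnba(r_prev = 2, mu0 = 450, kappa = 1.2, alpha = 0.01, g = 4.75)
y_needed
```

Plotting this value next to the observed count shows how close each week came to triggering an alarm.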
[Figure: the same plot with GAM-based mean curves; x-axis: time (weeks), y-axis: No. of deaths; legend: µ0 (GAM), µ1 (GAM), NNBA]
Run-length of NegBin CUSUM (1)
Interest is in the PMF of $T_A$. Compute this either by Monte Carlo simulation or by using a Markov chain approximation.
Generalization of Bissell (1984) to time varying count data CUSUMs: dynamics of $r_t$ described by a Markov chain:
State 0: $r_t = 0$
State $i$: $r_t \in \left((i-1) \cdot \frac{g}{M},\ i \cdot \frac{g}{M}\right]$, $i = 1, 2, \ldots, M$
State $M+1$: $r_t > g$
Calculation of the $(M+2) \times (M+2)$ transition matrix $P_t$ with elements
$$p_{t,i,j} = P(r_t \in \text{State } j \mid r_{t-1} \in \text{State } i), \quad i, j = 0, 1, \ldots, M+1,$$
by approximations suggested in Hawkins and Olwell (1998)
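The discretization of the CUSUM statistic into M + 2 states can be written down directly. The helper below is a hypothetical base R sketch of the state definition only; it does not compute the transition probabilities.

```r
# Map CUSUM values to states: 0 if r = 0, i if r in ((i-1)g/M, i g/M], M+1 if r > g.
cusum_state <- function(r, g, M) {
  state <- ifelse(r <= 0, 0,
                  ifelse(r > g, M + 1, ceiling(r * M / g)))
  as.integer(state)
}

# Example with g = 4 split into M = 4 intervals of width 1.
cusum_state(c(0, 0.5, 1, 1.5, 4, 4.1), g = 4, M = 4)
```

A larger M gives a finer grid and a closer approximation, at the cost of larger transition matrices.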
Run-length of NegBin CUSUM (2)
State M + 1 is absorbing.
The cumulative probability of an alarm at any step up to time $n$, $n \ge 1$, is:
$$P(T_A \le n) = \left[\prod_{t=1}^{n} P_t\right]_{0, M+1}$$
The PMF of $T_A$ can thus be determined by subtraction
Now: Choose $g$ such that $P(T_A \le 65 \mid \tau = \infty)$ is below some acceptable value, e.g. 10%.
> pMarkovChain <- sapply(g.grid, function(g) {
+ TA <- LRCUSUM.runlength(mu = t(mu0), mu0 = t(mu0), mu1 = kappa *
+ t(mu0), h = g, dfun = dY, n = rep(600, length(mu0)),
+ alpha = 1/m$theta)
+ return(tail(TA$cdf, n = 1))
+ })
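The matrix product formula for P(T_A ≤ n) can be illustrated with a toy chain in base R. The three-state transition matrix below (M = 1) is invented and time-constant purely for demonstration, and alarm_cdf is a hypothetical helper, not the code behind LRCUSUM.runlength.

```r
# Cumulative alarm probability: element (state 0, state M+1) of P_1 %*% ... %*% P_n.
alarm_cdf <- function(P_list) {
  n_states    <- nrow(P_list[[1]])
  prod_so_far <- diag(n_states)
  cdf <- numeric(length(P_list))
  for (t in seq_along(P_list)) {
    prod_so_far <- prod_so_far %*% P_list[[t]]
    cdf[t] <- prod_so_far[1, n_states]  # row 1 = state 0, last column = absorbing state
  }
  cdf
}

# Toy transition matrix; the last state is absorbing (alarm).
P <- rbind(c(0.80, 0.15, 0.05),
           c(0.30, 0.50, 0.20),
           c(0.00, 0.00, 1.00))

cdf <- alarm_cdf(rep(list(P), 5))  # P(T_A <= n) for n = 1..5
pmf <- diff(c(0, cdf))             # P(T_A = n) by subtraction
cdf
pmf
```

With time-varying in-control means, each week would contribute its own matrix to the list instead of the repeated P used here.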
Run-length of NegBin CUSUM (3)
$P(T_A \le 65 \mid \tau = \infty)$ as a function of $g$, computed by both Monte Carlo simulation and the Markov chain approximation.
[Figure: $P(T_A \le 65 \mid \tau = \infty)$ against $g$ for $g$ from 1 to 8, with 0.1 marked on the vertical axis; legend: Monte Carlo, Markov chain]
The Markov chain approximation is 5.0 times faster than Monte Carlo based on 1000 samples.
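The Monte Carlo side of this comparison can be sketched in base R as follows. The constant in-control mean of 450, the dispersion 0.01 and the grid of thresholds are assumptions chosen for illustration; the fitted time-varying means from the Danish data are not reproduced here.

```r
set.seed(42)
n_weeks <- 65
n_sim   <- 1000
mu0     <- rep(450, n_weeks)  # hypothetical constant in-control mean
alpha   <- 0.01               # hypothetical dispersion
kappa   <- 1.2

shrink <- log((1 + alpha * mu0) / (1 + alpha * kappa * mu0))

# For each simulated in-control path, record the maximum of the CUSUM statistic.
max_cusum <- replicate(n_sim, {
  y   <- rnbinom(n_weeks, mu = mu0, size = 1 / alpha)
  llr <- y * log(kappa) + (y + 1 / alpha) * shrink
  r <- 0
  r_max <- 0
  for (s in seq_len(n_weeks)) {
    r     <- max(0, r + llr[s])
    r_max <- max(r_max, r)
  }
  r_max
})

# An alarm occurs within 65 weeks exactly when the path maximum exceeds g.
g_grid  <- 1:8
p_alarm <- sapply(g_grid, function(g) mean(max_cusum > g))
p_alarm
```

Storing the path maxima once lets the false alarm probability be evaluated for every g on the grid without re-simulating.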
Comparison with the Farrington algorithm
Fitted negative binomial model with mean $\mu_{0,t}$ and dispersion $\alpha_t$, matching the quasi-Poisson model, as true model.
Based on 1000 realizations of $I(T_A \le 65 \mid \tau = \infty)$ for the Farrington et al. (1996) algorithm with $\frac{2}{3}$-power transform, $b = 5$, $w = 4$ and $\alpha = 0.001$, we obtain
$$P(T_A \le 65 \mid \tau = \infty) \approx 0.19.$$
A rough estimate of this number would have been
$$1 - \left(1 - \frac{\alpha}{2}\right)^{65} = 0.03.$$
Note: Using farrington without reweighting and always including a trend, we obtain the Monte Carlo estimate 0.04.
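The rough estimate is plain arithmetic and can be reproduced in R. The reading behind it, stated here as an assumption, is that each of the 65 weeks independently raises a false alarm with probability α/2.

```r
alpha   <- 0.001
n_weeks <- 65

# Probability of at least one false alarm in 65 independent weekly tests.
p_rough <- 1 - (1 - alpha / 2)^n_weeks
round(p_rough, 2)
```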
Discussion (1)
surveillance offers visualization, modeling and monitoring of count data and categorical time series.
Combined with Sweave/odfWeave the package can be used for automatic report generation using LaTeX/OpenOffice.
A starting point to learn more about the package is Hohle and Mazick (2010) or the short course slides available from the R-Forge page.
The current package version is 1.1-6.
Discussion (2)
Current methodological work:
CUSUM changepoint detection in the binomial setting $y_t \sim \operatorname{Bin}(n_t, \pi_t)$ and the multinomial setting $\boldsymbol{y}_t \sim M_k(n_t, \boldsymbol{\pi}_t)$. The proportions are modeled by logistic, proportional odds and multinomial logistic regression models (Hohle, 2010).
Model based space-time cluster detection based on the additive-multiplicative intensity model in Hohle (2009).
EuroMOMO work:
Extend to more demographics oriented two-dimensional count table modelling indexed by time and age (Eilers et al., 2008):
$$\log(\mu_{ta}) = \log(\text{pop}_{ta}) + v_{ta} + f_{ta} \cos(\omega t) + g_{ta} \sin(\omega t).$$
Acknowledgements
Persons:
Michaela Paul, Andrea Riebler and Leonhard Held, Institute of Social and Preventive Medicine, University of Zurich, Switzerland
Valentin Wimmer, Ludwig-Maximilians-Universitat Munchen, Germany, and Mathias Hofmann, Technical University of Munich, Germany
Thais Correa, Department of Statistics, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Financial Support:
German Science Foundation (DFG, 2003-2006)
Munich Center of Health Sciences (2007-2010)
Literature I
Assuncao, R., Correa, T., 2009. Surveillance to detect emerging space-time clusters. Computational Statistics & Data Analysis 53 (8), 2817–2830.
Bissell, A. F., 1984. The Performance of Control Charts and Cusums Under Linear Trend. Applied Statistics 33 (2), 145–151.
Eilers, P. H. C., Gampe, J., Marx, B. D., Rau, R., 2008. Modulation models for seasonal time series and incidence tables. Statistics in Medicine 27, 3430–3441.
Farrington, C., Andrews, N., Beale, A., Catchpole, M., 1996. A statistical algorithm for the early detection of outbreaks of infectious disease. Journal of the Royal Statistical Society, Series A 159, 547–563.
Hawkins, D. M., Olwell, D. H., 1998. Cumulative Sum Charts and Charting for Quality Improvement. Statistics for Engineering and Physical Science. Springer.
Held, L., Hofmann, M., Hohle, M., Schmid, V., 2006. A two component model for counts of infectious diseases. Biostatistics 7, 422–437.
Held, L., Hohle, M., Hofmann, M., 2005. A statistical framework for the analysis of multivariate infectious disease surveillance data. Statistical Modelling 5, 187–199.
Hohle, M., 2009. Additive-multiplicative regression models for spatio-temporal epidemics. Biometrical Journal 51 (6), 961–978.
Literature II
Hohle, M., 2010. Changepoint detection in categorical time series. In: Kneib, T., Tutz, G. (Eds.), Statistical Modelling and Regression Structures – Festschrift in Honour of Ludwig Fahrmeir. Springer, pp. 377–397.
Hohle, M., Mazick, A., 2010. Aberration detection in R illustrated by Danish mortality monitoring. In: Kass-Hout, T., Zhang, X. (Eds.), Biosurveillance: A Health Protection Priority. CRC Press, to appear.
Hohle, M., Paul, M., 2008. Count data regression charts for the monitoring of surveillance time series. Computational Statistics & Data Analysis 52 (9), 4357–4368.
Paul, M., Held, L., Toschke, A. M., 2008. Multivariate modelling of infectious disease surveillance data. Statistics in Medicine 27, 6250–6267.
Rogerson, P., Yamada, I., 2004. Approaches to syndromic surveillance when data consist of small regional counts. Morbidity and Mortality Weekly Report 53, 79–85.
Rossi, G., Lampugnani, L., Marchi, M., 1999. An approximate CUSUM procedure for surveillance of health events. Statistics in Medicine 18, 2111–2122.