Characterizing Infectious Disease Outbreaks: Traditional and Novel Approaches
Laura F White, 15 October 2013
2009 Influenza A H1N1 Pandemic
• H1N1 pandemic first noticed in February in Mexico.
• Large outbreak early on in La Gloria, a small village outside of Mexico City.
• Studied extensively in the first report on H1N1 (Fraser, Donnelly et al. “Pandemic potential of a strain of influenza A (H1N1): early findings”, Science Express, 11 May 2009).
Example: H1N1 Outbreak
Edgar Hernandez (four years old): first confirmed case
Cases reported in La Gloria
Quantitative Issues
• How do we determine how fast the disease is spreading?
  – Reproductive number, serial interval
• How do we determine how severe the disease is?
  – Attack rate, case fatality ratio
  – A topic for another talk!
• How do we determine what interventions will be most effective?
  – Mathematical modeling, network models, etc.
  – Estimates of severity and transmission by age group
Importance of parameter estimates
• Good information leads to good policy.
• School closure is expensive.
  – Important to determine whether it will really help.
• Some estimate that influenza can be controlled if R0 < 2.
• Information on R0 and the serial interval can give a good picture of how a disease might spread.
Source: Fraser et al (2004)
Impact of the serial interval
Some of the challenges in infectious diseases
• Dependency in the data.
  – Chain of infection.
• Undetected cases.
  – Asymptomatic, but still infectious.
  – Unable to detect with existing surveillance.
• Need to act fast with little information.
Approaches to estimation
• Classical: mathematical models
• Network models
• Statistical approaches
Simple approach
• Assume exponential growth for the first part of an epidemic.
• Let $t_d$ be the doubling time of the epidemic and $D$ the average serial interval. Then solve $t_d = D \ln 2 / (R_0 - 1)$ for $R_0$, which gives $R_0 = 1 + D \ln 2 / t_d$.
• Overly simplistic and sensitive to the estimated doubling time and serial interval.
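A minimal sketch of the simple approach, assuming the early counts grow exponentially: the growth rate r is estimated by least squares on log counts, the doubling time is t_d = ln 2 / r, and R0 = 1 + D ln 2 / t_d. The function names and the toy counts are hypothetical.

```python
import math

def growth_rate(counts):
    """Least-squares slope of log(count) against day; assumes every count is > 0."""
    days = range(len(counts))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den

def r0_from_doubling_time(t_d, D):
    """Solve t_d = D * ln 2 / (R0 - 1) for R0."""
    return 1.0 + D * math.log(2) / t_d

counts = [4, 8, 16, 32, 64]            # toy counts that double every day
t_d = math.log(2) / growth_rate(counts)
print("doubling time:", t_d, "R0:", r0_from_doubling_time(t_d, D=3.0))
```

Because R0 - 1 is inversely proportional to t_d, a small error in the estimated doubling time carries straight through to R0.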
Mathematical models
Susceptible → Infected → Recovered
R0 = (transmission probability per contact) × (contact rate) × (duration of infectiousness)
SIR Model
Susceptible → Infected at rate (contact rate) × (transmission probability) × (proportion infected)
Infected → Recovered at rate 1/(duration of infectiousness)
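A minimal deterministic SIR sketch, assuming a closed population tracked as proportions, β = (contact rate) × (transmission probability) and γ = 1/(duration of infectiousness), so that R0 = β/γ. It uses a simple forward-Euler step; the parameter values are illustrative only.

```python
def simulate_sir(beta, gamma, i0=0.001, days=200, dt=0.1):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I,
    dR/dt = gamma*I, with S, I, R as proportions of the population."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(round(days / dt)):
        new_inf = beta * s * i * dt   # S -> I during this step
        new_rec = gamma * i * dt      # I -> R during this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

contact_rate, transmission_prob, duration = 10.0, 0.03, 5.0  # illustrative values
beta = contact_rate * transmission_prob   # infections per infective per day
gamma = 1.0 / duration                    # recoveries per infective per day
print("R0 =", beta / gamma)
s, i, r = simulate_sir(beta, gamma)
print("proportion ever infected:", round(r + i, 3))
```

Scaling any one of the three factors in the R0 product scales β/γ by the same amount.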
Mathematical Models: Uses
• Modeling vaccination programs
• Determining optimal intervention strategies to halt or control an epidemic
• HIV transmission routes
• Estimating parameters of disease
Mathematical Models: Limitations
• Make a lot of assumptions.
  – Must plug in a lot of values in order to get estimates.
• Deterministic models do not allow for randomness in the processes: they always give a single number as the answer, with no error bounds.
  – Stochastic epidemic models address this.
• Can oversimplify the problem.
  – It is a challenge to balance making the model too simple against making it too complex.
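To illustrate the stochastic alternative, here is a minimal Reed-Frost chain-binomial sketch. This is a standard discrete-time stochastic epidemic model chosen for brevity, not necessarily the model the talk has in mind; the population size and per-contact probability are hypothetical. Each susceptible escapes infection from each current infective independently, so runs with identical parameters give different outcomes.

```python
import random

def reed_frost(n_susceptible, n_infected, p, rng):
    """Return the final number ever infected in one Reed-Frost chain-binomial run."""
    s, i, total = n_susceptible, n_infected, n_infected
    while i > 0:
        prob_infection = 1.0 - (1.0 - p) ** i   # fails to escape all i infectives
        new_i = sum(1 for _ in range(s) if rng.random() < prob_infection)
        s, i, total = s - new_i, new_i, total + new_i
    return total

rng = random.Random(2013)
sizes = [reed_frost(n_susceptible=99, n_infected=1, p=0.015, rng=rng) for _ in range(1000)]
minor = sum(1 for x in sizes if x < 10) / len(sizes)
print("proportion of runs with fewer than 10 cases:", minor)
print("largest outbreak:", max(sizes))
```

The output is a distribution of final sizes, including many small outbreaks that die out, from which uncertainty intervals can be reported.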
References
• Hethcote HW. The Mathematics of Infectious Diseases. SIAM Review, 42(4):599-653, December 2000.
• Anderson RM, May RM. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, 1992.
Wallinga & Teunis
• Network-based method to estimate the reproductive number for each day of an epidemic.
• Requires knowledge of the serial interval.
• Requires that all cases have been observed and the epidemic is over.
• Originally developed to analyze SARS.
American Journal of Epidemiology, 2004
[Figure: cases plotted by day of symptom onset (days 1 to 6); each case i is linked to all of its possible infectors j on earlier days, and each link is weighted p1, p2 or p3 according to how many days earlier the infector appeared.]
p_t = probability of being infected by a case that appeared t days prior.
Wallinga & Teunis
• If g(t) is the distribution of the serial interval, then the relative probability that case i has been infected by case j is:
$$p_{ij} = \frac{g(t_i - t_j)}{\sum_{k \neq i} g(t_i - t_k)}$$
• The effective reproductive number for case j (with symptom onset on day t_j) is then:
$$R_j = \sum_i p_{ij}$$
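A minimal sketch of these two formulas, p_ij = g(t_i - t_j) / Σ_{k≠i} g(t_i - t_k) and R_j = Σ_i p_ij, assuming integer onset days and a discrete serial-interval distribution g given as a dict. The function names and toy data are hypothetical. As the method requires, every case must already be in the list.

```python
from collections import defaultdict

def wallinga_teunis(onsets, g):
    """Case reproduction numbers R_j = sum_i p_ij.

    onsets: symptom-onset day of each case (integers).
    g: dict mapping serial interval in days -> probability.
    """
    n = len(onsets)
    R = [0.0] * n
    for i in range(n):
        # Relative likelihood that case k infected case i; a case cannot infect itself.
        w = [0.0 if k == i else g.get(onsets[i] - onsets[k], 0.0) for k in range(n)]
        total = sum(w)
        if total == 0.0:
            continue  # no plausible infector (e.g. the index case)
        for j in range(n):
            R[j] += w[j] / total  # add p_ij to infector j
    return R

def daily_average(onsets, R):
    """Average R_j over cases that share an onset day."""
    by_day = defaultdict(list)
    for t, r in zip(onsets, R):
        by_day[t].append(r)
    return {t: sum(v) / len(v) for t, v in sorted(by_day.items())}

onsets = [0, 1, 1, 2]
g = {1: 0.6, 2: 0.4}  # hypothetical serial-interval distribution
R = wallinga_teunis(onsets, g)
print(R, daily_average(onsets, R))
```

Each case with at least one plausible infector spreads a total probability of 1 over its candidate infectors, so the R_j sum to the number of such cases.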
Wallinga & Teunis: SARS
White & Pagano
• Statistical method, using probability models to estimate the serial interval and reproductive number.
• Assume that we observe daily counts of new cases: $N_0, N_1, \dots, N_T$.
• Let $X_{ij}$ be the number of cases with symptoms on day j that were infected by a case with symptoms on day i.
Statistics in Medicine, 2008
White & Pagano: Method
• Using this scheme, we make some probabilistic assumptions and get a likelihood equation:
$$L(R_0, \mathbf{p} \mid \mathbf{N}) = \prod_{t=1}^{T} \frac{e^{-\mu_t} \mu_t^{N_t}}{N_t!}$$
• where
$$\mu_t = R_0 \sum_{j=1}^{\min(k,\, t)} p_j N_{t-j}$$
• p_j describes the serial interval (i.e. the probability of having symptoms j days after the infector), and k is the maximum serial interval.
• Use numerical methods to get MLEs of R0 and p.
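A minimal sketch of this Poisson likelihood, L = ∏_t e^{-μ_t} μ_t^{N_t} / N_t! with μ_t = R0 Σ_{j=1}^{min(k,t)} p_j N_{t-j}, assuming p is a list (p[0] = p_1, ..., p[k-1] = p_k) and counts N[0], ..., N[T]; the names are hypothetical. For a fixed p, setting the derivative of the log-likelihood in R0 to zero gives the closed form R0 = Σ_t N_t / Σ_t Σ_j p_j N_{t-j}. Maximizing jointly over p still calls for numerical methods.

```python
import math

def expected_counts(R0, p, N):
    """mu_t = R0 * sum_{j=1}^{min(k,t)} p_j * N_{t-j} for t = 1..T."""
    k = len(p)
    return [R0 * sum(p[j - 1] * N[t - j] for j in range(1, min(k, t) + 1))
            for t in range(1, len(N))]

def log_likelihood(R0, p, N):
    """Poisson log-likelihood log L(R0, p | N)."""
    ll = 0.0
    for n_t, mu_t in zip(N[1:], expected_counts(R0, p, N)):
        if mu_t == 0.0:
            if n_t > 0:
                return -math.inf  # cases with no possible infector
            continue
        ll += n_t * math.log(mu_t) - mu_t - math.lgamma(n_t + 1)
    return ll

def r0_mle_given_p(p, N):
    """Closed-form MLE of R0 when the serial-interval probabilities p are held fixed."""
    return sum(N[1:]) / sum(expected_counts(1.0, p, N))

N = [10, 15, 22]   # toy daily counts
p = [1.0]          # hypothetical: serial interval always exactly 1 day
R0_hat = r0_mle_given_p(p, N)
print(R0_hat, log_likelihood(R0_hat, p, N))
```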
H1N1 Example
• In April the public became aware of a novel strain of Influenza that was affecting Mexico.
• Fraser, Donnelly et al published the initial report in Science on 11 May 2009.
• They estimated the reproductive number to be between 1.4 and 1.6.
• They estimated the average serial interval to be 1.91 days.
H1N1 Example
• We obtained data from the CDC with information on each confirmed and suspected case (1368 cases) as of May 8.
• Of these, 750 had a date of symptom onset.
Influenza A/H1N1: Serial Interval
• Work from Spain estimated the average serial interval to be 3.5 days (range 1-6 days).
  – Used contact tracing data.
• Seasonal influenza (Cowling et al, 2009): 3.6 days, SD = 1.6.
  – From a household contact study.
Influenza A/H1N1: R0 estimates
• Mexico: 1.3-1.4 (Cruz-Pacheco et al)
• Mexico: less than 2.2-3.1 (Boelle et al)
• Japan: 2.3 (Nishiura et al)
• Netherlands: less than 1 (Hahne et al)
• US: 1.7-1.8 (White et al)
Influenza A/H1N1: USA
• Missing dates of symptom onset
  – All cases have a report date, but many lack a date of symptom onset.
  – Calculate the distribution of time between reported date and symptom onset for those with both.
  – Impute a date of symptom onset for those with missing information from the observed distribution.
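A minimal sketch of this imputation step, assuming each record carries a report date and an optional onset date; missing onsets are filled by drawing a delay at random from the empirical report-minus-onset distribution. The record layout, function name and toy records are hypothetical, not the CDC data.

```python
import random
from datetime import date, timedelta

def impute_onsets(records, rng):
    """Fill missing onset dates by sampling from the empirical report-to-onset delays.

    records: list of dicts with keys 'report' (date) and 'onset' (date or None).
    """
    delays = [(r["report"] - r["onset"]).days for r in records if r["onset"] is not None]
    if not delays:
        raise ValueError("need at least one record with both dates")
    out = []
    for r in records:
        missing = r["onset"] is None
        onset = r["report"] - timedelta(days=rng.choice(delays)) if missing else r["onset"]
        out.append({"report": r["report"], "onset": onset, "imputed": missing})
    return out

records = [  # toy records
    {"report": date(2009, 4, 28), "onset": date(2009, 4, 25)},
    {"report": date(2009, 4, 29), "onset": date(2009, 4, 27)},
    {"report": date(2009, 4, 30), "onset": None},
    {"report": date(2009, 5, 1), "onset": None},
]
for rec in impute_onsets(records, random.Random(1)):
    print(rec)
```

A single random draw understates uncertainty; one might repeat the imputation several times and combine the resulting estimates.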
Reporting delay distribution
Other issues in the data
• Imported cases
  – Make an adjustment in the estimation method to account for those who were known to have traveled to Mexico.
• Reporting delay
  – The decline in cases as it gets closer to May 8 is likely due to reporting delays, rather than a true drop-off in case numbers.
  – Augment the data at the end, using the reporting delay distribution.
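One simple way to augment the final days is sketched here, assuming the reporting-delay CDF F(d) (the probability that a case is reported within d days of onset) is known. Each recent count is divided by the probability that a case with that onset day would already have been reported by the cutoff. This is a common nowcasting adjustment and not necessarily the exact procedure used in the talk. The function name is hypothetical.

```python
def augment_counts(counts, delay_cdf):
    """Inflate recent onset counts for cases not yet reported by the cutoff.

    counts[t]: cases with onset on day t reported by the cutoff day T = len(counts) - 1.
    delay_cdf[d]: probability a case is reported within d days of onset
                  (assumed to be 1 beyond the end of the list).
    """
    T = len(counts) - 1
    augmented = []
    for t, n in enumerate(counts):
        d = T - t  # days available for reporting before the cutoff
        frac = delay_cdf[d] if d < len(delay_cdf) else 1.0
        if frac <= 0:
            raise ValueError(f"no cases with onset on day {t} can be reported yet")
        augmented.append(n / frac)
    return augmented

print(augment_counts([10, 10, 5], delay_cdf=[0.5, 0.8, 1.0]))
```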
Augmented data
Estimates in the USA
• Using the White & Pagano method with the modifications above, we obtain estimates of R0 and the serial interval for the initial outbreak in the US.
Serial interval estimate
Using data up to and including April 27, 2009.
Using data up to and including April 25, 2009.
HETEROGENEITY
Heterogeneity
• Transmission varies between adults and children, geographically, etc.
• Understanding this can lead to better policy decisions:
  – Who gets vaccinated first?
  – Which social distancing measures might be most effective?
Overview
• Social mixing matrices
• Glass method
• Modification of Wallinga and Teunis
• Modification of White and Pagano
Social mixing
• To understand who is most responsible for transmission, we typically need to understand how people interact.
• There are many approaches to this; we use the one currently most popular: social mixing matrices.
PolyMod study
• Large European study
  – Belgium, Finland, Great Britain, Germany, Italy, Luxembourg, the Netherlands, and Poland
• 97,904 contacts among 7,290 participants
• Participants recorded the number and nature of their contacts in a diary
• Contact matrices were created to describe all close contacts and, separately, close contacts that involve physical touch
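A minimal, unweighted sketch of turning diary data into a contact matrix: entry (i, j) is the mean number of contacts that participants in age group i reported with people in age group j, optionally restricted to contacts involving physical touch. The published matrices also apply sampling weights and smoothing. The data layout, age bands and function name here are hypothetical.

```python
from collections import Counter

def contact_matrix(participants, contacts, groups, physical_only=False):
    """Mean contacts per participant by participant group (rows) and contact group (columns).

    participants: dict participant_id -> age group.
    contacts: list of (participant_id, contact_age_group, involved_touch) tuples.
    """
    n_in_group = Counter(participants.values())
    totals = Counter()
    for pid, contact_group, touch in contacts:
        if physical_only and not touch:
            continue
        totals[(participants[pid], contact_group)] += 1
    return [[totals[(gi, gj)] / n_in_group[gi] if n_in_group[gi] else 0.0
             for gj in groups] for gi in groups]

groups = ["0-14", "15+"]
participants = {1: "0-14", 2: "0-14", 3: "15+"}
contacts = [(1, "0-14", True), (1, "15+", False), (2, "0-14", True),
            (2, "0-14", False), (3, "15+", True), (3, "0-14", True)]
print(contact_matrix(participants, contacts, groups))
print(contact_matrix(participants, contacts, groups, physical_only=True))
```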
Table 1.
Mossong J, Hens N, Jit M, Beutels P, et al. (2008) Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PLoS Med 5(3): e74. doi:10.1371/journal.pmed.0050074. http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050074
Figure 1. The Mean Proportion of Contacts That Involved Physical Contact, by Duration, Frequency, and Location of Contact in All Countries
Figure 2. The Distribution by Location and by Country of (A) All Reported Contacts and (B) Physical Contacts Only
Figure 3. Smoothed Contact Matrices for Each Country Based on (A) All Reported Contacts and (B) Physical Contacts Weighted by Sampling Weights
Other studies
• Similar studies have been conducted in South Africa and Vietnam.
• The first study of this nature was conducted in the Netherlands (Wallinga et al, 2006).
• Johnstone-Robertson et al (2011) carried out a very similar study in a South African township.
Approaches
• Glass et al, 2011
  – Estimate R for adults and children
  – Do not require transmission data
• Modification of the Wallinga and Teunis method
  – Estimate Rt (and R0) across age groups
  – Require contact information
• Moser and White, 2013 (in preparation)
  – Bayesian approach to the problem
  – Modify the White & Pagano method to incorporate age contact information
  – Incorporate contact information as a prior distribution
Approach 1: Glass et al
• Modify the Wallinga & Teunis and White & Pagano methods to estimate R for children and adults.
• Assume a form for a reproduction matrix:
$$M = \begin{pmatrix} m_{CC} & m_{CA} \\ m_{AC} & m_{AA} \end{pmatrix}$$
• m_ij describes the number of cases of type i infected by a case of type j.
• Some pre-specified structure must be imposed on M in order to estimate the m_ij.
Source: Glass et al, 2011
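For a 2×2 matrix of this form, a short sketch (our own illustration, not code from Glass et al): with rows indexing the infected type and columns the infecting type, the column sums give the type-specific reproduction numbers R_C and R_A, and the dominant eigenvalue of M gives the population-level R. The example matrix is hypothetical, chosen so that R_C = 2.5 and R_A = 1.

```python
import math

def type_reproduction_numbers(M):
    """Column sums of M = [[m_CC, m_CA], [m_AC, m_AA]]: cases caused by one child / one adult."""
    (m_cc, m_ca), (m_ac, m_aa) = M
    return m_cc + m_ac, m_ca + m_aa

def dominant_eigenvalue_2x2(M):
    """Largest eigenvalue of a non-negative 2x2 matrix (discriminant is always >= 0)."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    return (trace + math.sqrt(trace ** 2 - 4 * det)) / 2

M = [[1.5, 0.5],   # children infected by a child, by an adult
     [1.0, 0.5]]   # adults infected by a child, by an adult
R_C, R_A = type_reproduction_numbers(M)
print(R_C, R_A, dominant_eigenvalue_2x2(M))
```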
Matrix constraints
Modification of White & Pagano
• Let and
where A_i and C_i are the incidence counts for adults and children, respectively, and X_i and Y_i are the total numbers of cases infected by children and adults, respectively, from day i.
Modification of White & Pagano
are the expected number of adults and children on day t.
Modification of White & Pagano
The likelihood used is:
Maximize this over the mij to obtain estimates.
Applying constraints to M creates relationships between the m_ij, so that they become identifiable.
Modification of Wallinga & Teunis
• Modify the pij to incorporate the probability children infect each other and adults infect each other.
• q_{a_i} is the probability that an individual of type a_i was infected by someone of the same type (a_i = C or A).
Modification of Wallinga & Teunis
• To estimate q_A and q_C, define f to be the total number of cases that are children.
  – Then write the matrix elements in terms of f and R (the population reproductive number).
– For separable matrix, and where .
Approach 1: simulation study
True RC=2.5 and true RA=1. L, M and U are 3rd, median and 98th percentiles over 100 simulations.
Approach 1: Japanese influenza data, Wallinga & Teunis approach
Approach 1: Japanese influenza data, White & Pagano method
APPROACH 2: MODIFICATION OF WALLINGA AND TEUNIS
Heterogeneity
Source: White, Archer and Pagano (submitted, 2013)
Approach 2: modification of Wallinga & Teunis
• Similar to Glass et al, allow the probability of infection to depend on more than just how far apart in time the two cases are
where is the probability of a serial interval of length j-i and is a similarity measure (similar to the matrices used by Glass et al).
Approach 2: modification of Wallinga & Teunis
• Similar to Glass et al, but we do not assume any structure on a similarity matrix, D=(dij).
• We use available data to define this matrix and are able to obtain estimates of Rj for a large number of age groups (or spatial locations, etc.)
Similarity measures
• Individuals who are “close” together, and hence more likely to infect each other, have larger similarity measures.
• Can be used to address probability of infection between different geographical regions, age groups, etc.
Similarity measures
Use a matrix to define the similarity measure, where x_ij describes the amount of contact individuals in group i have with those in group j.
Similarity matrix:
              Age group 1   Age group 2   Age group 3
Age group 1   x11           x12           x13
Age group 2   x21           x22           x23
Age group 3   x31           x32           x33
Basic similarity measures
• Matrix of all 1’s: original estimator
  – Implies that transmission is equally likely among all individuals
• Diagonal matrix: transmission only occurs within homogeneous groups (no mixing)
  – Comparable to applying the original method to each homogeneous group separately
• Can also use a matrix that describes contact patterns
Source: Archer et al (2009)
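A minimal sketch of a similarity-weighted version, under our own assumption (the slide's exact formula is not reproduced here) that the relative likelihood that case j infected case i is the serial-interval probability g(t_i - t_j) times the similarity D between their groups, normalized over all candidate infectors. With a matrix of all 1's it reduces to the original estimator; with a diagonal matrix only same-group infectors get weight. The names and toy data are hypothetical.

```python
def weighted_wt(onsets, groups, g, D):
    """R_j under p_ij proportional to g(t_i - t_j) * D[group_i][group_j].

    onsets: onset day per case; groups: group index per case;
    g: dict serial interval (days) -> probability; D: similarity matrix (list of lists).
    """
    n = len(onsets)
    R = [0.0] * n
    for i in range(n):
        w = [0.0 if k == i else
             g.get(onsets[i] - onsets[k], 0.0) * D[groups[i]][groups[k]]
             for k in range(n)]
        total = sum(w)
        if total == 0.0:
            continue  # no plausible infector under this similarity matrix
        for j in range(n):
            R[j] += w[j] / total
    return R

onsets = [0, 0, 1, 1, 1]
groups = [0, 1, 0, 0, 1]       # e.g. 0 = children, 1 = adults
g = {1: 1.0}                   # hypothetical: every serial interval is 1 day
ones = [[1, 1], [1, 1]]        # homogeneous mixing: the original estimator
diagonal = [[1, 0], [0, 1]]    # transmission only within groups
print(weighted_wt(onsets, groups, g, ones))
print(weighted_wt(onsets, groups, g, diagonal))
```

In the toy data the two day-0 cases are equally credited under homogeneous mixing, but the diagonal matrix attributes both day-1 children to the day-0 child.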
Example: Pandemic Influenza In South Africa
• Between 6/15/2009 and 11/23/2009 there were 12,630 confirmed cases
JSM 2012
Age Analysis
• We restrict our attention to Gauteng Province (the most populous) to limit geographic effects.
• Use two sources of information on contact patterns between age groups:
  – PolyMod study (Mossong et al, 2008)
  – Study in a South African township (Johnstone-Robertson et al, 2011)
PolyMod contact matrix: Great Britain, all contacts
Age    0-4   5-9   10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
0-4    1.92  0.65  0.41  0.24  0.46  0.73  0.67  0.83  0.24  0.22  0.36  0.2   0.2   0.26  0.13
5-9    0.95  6.64  1.09  0.73  0.61  0.75  0.95  1.39  0.9   0.16  0.3   0.22  0.5   0.48  0.2
10-14  0.48  1.31  6.85  1.52  0.27  0.31  0.48  0.76  1     0.69  0.32  0.44  0.27  0.41  0.33
15-19  0.33  0.34  1.03  6.71  1.58  0.73  0.42  0.56  0.85  1.16  0.7   0.3   0.2   0.48  0.63
20-24  0.45  0.3   0.22  0.93  2.59  1.49  0.75  0.63  0.77  0.87  0.88  0.61  0.53  0.37  0.33
25-29  0.79  0.66  0.44  0.74  1.29  1.83  0.97  0.71  0.74  0.85  0.88  0.87  0.67  0.74  0.33
30-34  0.97  1.07  0.62  0.5   0.88  1.19  1.67  0.89  1.02  0.91  0.92  0.61  0.76  0.63  0.27
35-39  1.02  0.98  1.26  1.09  0.76  0.95  1.53  1.5   1.32  1.09  0.83  0.69  1.02  0.96  0.2
40-44  0.55  1     1.14  0.94  0.73  0.88  0.82  1.23  1.35  1.27  0.89  0.67  0.94  0.81  0.8
45-49  0.29  0.54  0.57  0.77  0.97  0.93  0.57  0.8   1.32  1.87  0.61  0.8   0.61  0.59  0.57
50-54  0.33  0.38  0.4   0.41  0.44  0.85  0.6   0.61  0.71  0.95  0.74  1.06  0.59  0.56  0.57
55-59  0.31  0.21  0.25  0.33  0.39  0.53  0.68  0.53  0.55  0.51  0.82  1.17  0.85  0.85  0.33
60-64  0.26  0.25  0.19  0.24  0.19  0.34  0.4   0.39  0.47  0.55  0.41  0.78  0.65  0.85  0.57
65-69  0.09  0.11  0.12  0.2   0.19  0.22  0.13  0.3   0.23  0.13  0.21  0.28  0.36  0.7   0.6
70+    0.14  0.15  0.21  0.1   0.24  0.17  0.15  0.41  0.5   0.71  0.53  0.76  0.47  0.74  1.47
Source: Johnstone-Robertson et al, AJE, 2011
South African township contact matrix
Estimate of Rt
Epidemic curve
Estimate of Rt
(a) All contacts involving physical touch; (b) all close contacts
Estimates of R0 by age group
[Figure: number of cases (axis 0 to 1200) and estimated R0 (axis 0.0 to 1.5) by age group, 0-4 through 40-44 in five-year bands, then 45+; panels (a) and (b).]
(a) All contacts involving physical touch; (b) all close contacts
Estimates of R0
Age group N (%) R0, close contacts R0, all physical contacts
0-4 484 (8.73) 0.94 (0.91-0.97) 0.74 (0.72-0.76)
5-9 927 (16.72) 1.20 (1.17-1.24) 1.29 (1.25-1.33)
10-14 1150 (20.75) 1.53 (1.49-1.58) 1.47 (1.44-1.51)
15-19 1026 (18.52) 1.36 (1.32-1.40) 1.47 (1.42-1.50)
20-24 556 (10.03) 1.06 (1.03-1.09) 1.03 (1.01-1.06)
25-29 389 (7.02) 0.98 (0.94-1.01) 0.97 (0.94-1.01)
30-34 229 (4.13) 0.92 (0.88-0.94) 0.86 (0.82-0.88)
35-39 246 (4.44) 0.85 (0.82-0.88) 0.75 (0.82-0.78)
40-44 171 (3.09) 0.86 (0.83-0.90) 0.83 (0.80-0.87)
45+ 363 (6.55) 0.79 (0.75-0.85) 0.75 (0.71-0.81)
R0 by age group, by country used
(a) results for all close contacts and (b) for contacts involving physical touch
Estimates of R0 depending on contact matrix used
Overall R0
Method Overall
Homogeneous mixing 1.28 (1.26-1.31)
Contact matrix used All close contacts Contacts with physical touch
South Africa 1.27 (1.25-1.31) 1.27 (1.25-1.31)
Belgium 1.26 (1.24-1.31) 1.27 (1.24-1.31)
Finland 1.27 (1.25-1.32) 1.27 (1.25-1.32)
Great Britain 1.27 (1.25-1.32) 1.27 (1.25-1.31)
Germany 1.27 (1.25-1.32) 1.27 (1.25-1.32)
Italy 1.27 (1.25-1.31) 1.27 (1.24-1.31)
Luxembourg 1.27 (1.25-1.32) 1.27 (1.25-1.32)
Netherlands 1.27 (1.25-1.32) 1.27 (1.25-1.32)
Poland 1.27 (1.25-1.31) 1.27 (1.24-1.31)
APPROACH 3: MODIFICATION OF WHITE AND PAGANO
Heterogeneity
Moser and White
• Modification of the White and Pagano method to estimate R0 and incorporate heterogeneity in the population
• Revise the likelihood to incorporate heterogeneity in the reproductive numbers
• Consider the scenario where we look at adults and kids only (two-group scenario)
  – R_A and R_C are the reproductive numbers for adults and children, respectively
Moser and White
• Reparameterize the problem to allow for inclusion of contact matrix information
  – q_hg is the probability that an individual of type h has contact with an individual of type g
  – Example: R_CA = q_CA × R_C
  – R_C = R_CA + R_CC
Day 0: $N_0^C$, $N_0^A$
Day 1:
$N_1^C = X^C_{0C1} + X^A_{0C1}$
$N_1^A = X^C_{0A1} + X^A_{0A1}$
Day 2:
$N_2^C = X^C_{0C2} + X^A_{0C2} + X^C_{1C2} + X^A_{1C2}$
$N_2^A = X^C_{0A2} + X^A_{0A2} + X^C_{1A2} + X^A_{1A2}$
Day 3:
$N_3^C = X^C_{0C3} + X^A_{0C3} + X^C_{1C3} + X^A_{1C3} + X^C_{2C3} + X^A_{2C3}$
$N_3^A = X^C_{0A3} + X^A_{0A3} + X^C_{1A3} + X^A_{1A3} + X^C_{2A3} + X^A_{2A3}$
Day 4:
$N_4^C = X^C_{1C4} + X^A_{1C4} + X^C_{2C4} + X^A_{2C4} + X^C_{3C4} + X^A_{3C4}$
$N_4^A = X^C_{1A4} + X^A_{1A4} + X^C_{2A4} + X^A_{2A4} + X^C_{3A4} + X^A_{3A4}$
…
Day T: $N_T^C$, $N_T^A$
Derivation of Likelihood Function: Two Group Example
Assuming a maximum serial interval of 3 days. $X^C_{0A2}$ = adults infected on day 2 by a child from day 0.
Terms with a child infector ($X^C$) are governed by $R_C$, and terms with an adult infector ($X^A$) by $R_A$.
Is Mixing Assortative?
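Day 0: $N_0^C$, $N_0^A$
Terms contributed by children with symptoms on day 0:
Day 1: $X^C_{0C1}$ (in $N_1^C$), $X^C_{0A1}$ (in $N_1^A$)
Day 2: $X^C_{0C2}$ (in $N_2^C$), $X^C_{0A2}$ (in $N_2^A$)
Day 3: $X^C_{0C3}$ (in $N_3^C$), $X^C_{0A3}$ (in $N_3^A$)
Day 4 to Day T: none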
The child-to-child terms are governed by $R_{CC}$ and the child-to-adult terms by $R_{CA}$.
Updated Likelihood
• The likelihood can be written as:
where Ntg is the number of cases on day t from group g.
• How do we maximize this likelihood?
Estimation
• We could try a frequentist approach, but there are issues with identifiability
  – We have four parameters to estimate and, similar to Glass et al, would need to impose constraints on the q’s in order to get estimates.
• Alternative approach: MCMC with prior information
  – Use contact frequency matrices from survey data to inform the priors of the q’s
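A minimal sketch of such a sampler. The assumptions are ours rather than the authors': a two-group version of the White & Pagano Poisson likelihood in which a child's R_C splits into R_CC = q_CC R_C and R_CA = q_CA R_C (and similarly for adults), Beta priors on q_CA and q_AC standing in for contact-survey information, flat priors on R_C, R_A > 0, and a random-walk Metropolis sampler. All names, prior values and data are hypothetical.

```python
import math
import random

def expected_two_group(N_C, N_A, p, R_C, R_A, q_CA, q_AC):
    """Expected child and adult counts for days t = 1..T under the assumed model.

    Children infect children at R_CC = (1 - q_CA) * R_C and adults at R_CA = q_CA * R_C;
    adults infect children at R_AC = q_AC * R_A and adults at R_AA = (1 - q_AC) * R_A.
    p[j-1] is the serial-interval probability for a lag of j days.
    """
    k = len(p)
    mu_C, mu_A = [], []
    for t in range(1, len(N_C)):
        lags = range(1, min(k, t) + 1)
        from_C = sum(p[j - 1] * N_C[t - j] for j in lags)
        from_A = sum(p[j - 1] * N_A[t - j] for j in lags)
        mu_C.append((1 - q_CA) * R_C * from_C + q_AC * R_A * from_A)
        mu_A.append(q_CA * R_C * from_C + (1 - q_AC) * R_A * from_A)
    return mu_C, mu_A

def poisson_loglik(counts, means):
    ll = 0.0
    for n, mu in zip(counts, means):
        if mu <= 0.0:
            if n > 0:
                return -math.inf
            continue
        ll += n * math.log(mu) - mu - math.lgamma(n + 1)
    return ll

def log_beta_pdf(x, a, b):
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def log_posterior(theta, N_C, N_A, p, priors):
    R_C, R_A, q_CA, q_AC = theta
    if R_C <= 0 or R_A <= 0 or not (0 < q_CA < 1) or not (0 < q_AC < 1):
        return -math.inf
    mu_C, mu_A = expected_two_group(N_C, N_A, p, R_C, R_A, q_CA, q_AC)
    # Flat prior on R_C, R_A > 0; Beta priors on the mixing probabilities.
    return (poisson_loglik(N_C[1:], mu_C) + poisson_loglik(N_A[1:], mu_A)
            + log_beta_pdf(q_CA, *priors["q_CA"]) + log_beta_pdf(q_AC, *priors["q_AC"]))

def metropolis(N_C, N_A, p, priors, theta0, n_iter=5000, step=0.05, seed=1):
    """Random-walk Metropolis over theta = (R_C, R_A, q_CA, q_AC)."""
    rng = random.Random(seed)
    theta = tuple(theta0)
    current = log_posterior(theta, N_C, N_A, p, priors)
    samples = []
    for _ in range(n_iter):
        proposal = tuple(x + rng.gauss(0.0, step) for x in theta)
        cand = log_posterior(proposal, N_C, N_A, p, priors)
        # Accept with probability min(1, exp(cand - current)); never accept impossible values.
        if cand > -math.inf and rng.random() < math.exp(min(0.0, cand - current)):
            theta, current = proposal, cand
        samples.append(theta)
    return samples

N_C = [5, 8, 12, 18, 25]   # toy daily counts, children
N_A = [3, 4, 6, 8, 11]     # toy daily counts, adults
p = [0.5, 0.3, 0.2]        # hypothetical serial-interval distribution
priors = {"q_CA": (2.0, 8.0), "q_AC": (3.0, 7.0)}  # hypothetical Beta(a, b) priors
theta0 = (1.5, 1.0, 0.2, 0.3)
samples = metropolis(N_C, N_A, p, priors, theta0)
kept = samples[1000:]  # discard burn-in
means = [sum(s[i] for s in kept) / len(kept) for i in range(4)]
print("posterior means (R_C, R_A, q_CA, q_AC):", [round(m, 2) for m in means])
```

The Beta priors are what make the four parameters estimable here: they play the role that hard constraints on the q’s would play in a frequentist fit.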
Epidemic curves by age in South Africa
Results from South Africa pandemic
R_C and R_A for three age cutoffs separating children from adults (20, 18 and 15 years).
Contact matrix, prior          20: children      20: adults        18: children      18: adults        15: children      15: adults
N (%)                          1589 (66)         801 (34)          1340 (56)         1050 (44)         901 (38)          1489 (62)
Mossong, prior 1               1.34 (1.11-1.58)  1.51 (1.12-1.91)  1.32 (1.12-1.53)  1.45 (1.21-1.71)  1.47 (1.07-1.90)  1.37 (1.11-1.64)
Mossong, prior 2               1.37 (1.16-1.60)  1.45 (1.07-1.85)  1.36 (1.15-1.57)  1.41 (1.17-1.67)  1.51 (1.10-1.93)  1.34 (1.08-1.61)
Mossong, prior 3               1.67 (1.47-1.88)  0.94 (0.67-1.23)  1.55 (1.35-1.76)  1.20 (0.98-1.43)  1.61 (1.22-2.00)  1.28 (1.04-1.54)
Johnstone-Robertson, prior 1   1.33 (1.11-1.57)  1.52 (1.13-1.92)  1.31 (1.11-1.53)  1.46 (1.21-1.72)  1.44 (1.04-1.88)  1.38 (1.11-1.65)
Johnstone-Robertson, prior 2   1.37 (1.16-1.60)  1.45 (1.08-1.83)  1.35 (1.14-1.56)  1.43 (1.18-1.69)  1.48 (1.09-1.90)  1.36 (1.10-1.62)
Johnstone-Robertson, prior 3   1.65 (1.47-1.85)  0.97 (0.71-1.25)  1.54 (1.35-1.74)  1.20 (1.00-1.41)  1.58 (1.23-1.93)  1.30 (1.09-1.53)
Glass, separable               1.78              0.90              1.65              1.29              1.06              1.75
Glass, HiC2C                   1.73              1.00              1.63              1.31              0.87              1.86
Glass, contact freq.           1.90              0.64              1.91              0.95              1.63              1.41
Glass, proportional            1.89              0.67              1.97              0.87              1.92              1.24
Issues
• Reporting differences across age groups
  – How might this impact our results?
  – Example: kids are much more likely to show up at the clinic and have their cases reported; adults are more likely to stay home.
• Non-uniformity of contact patterns globally?
• Other issues?
Final thoughts
• Quantitative methods are essential to informing policy decisions in a disease outbreak
• Issues we want to address:
  – Severity
  – Transmissibility
  – Heterogeneity
  – Uncertainty
• Challenges with dependency in the data, unobserved events, etc.
Thanks!
• Funding source: National Institute of General Medical Sciences of the National Institutes of Health under Award Number U54GM088558.