Network Dynamics and Simulation Science Laboratory
A Data-driven Epidemiological Model
Stephen Eubank, Christopher Barrett, Madhav V. Marathe
GIACS Conference on "Data in Complex Systems"
Palermo, Italy, April 7-9 2008
Data-driven epidemiological models
I. Complex system
II. Data-driven, individual-based simulation
III. Privacy and accuracy issues
What’s so complex about epidemiology?
Consider an “outbreak” among 4 people
(Figure legend: susceptible, infectious, removed.)
Outbreaks can be represented as Markov processes
A given configuration of the system probabilistically transitions into any of several other configurations.
Even a small system has many possible configurations.
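To make "many possible configurations" concrete: with three states per person (S, I, R), a system of N people has 3^N configurations. The sketch below is an illustration of that count, not code from the talk; the names `configurations` and `count_configurations` are hypothetical.

```python
from itertools import product

STATES = ("S", "I", "R")  # susceptible, infectious, removed

def configurations(n_people):
    """Return every joint state of n_people as a list of tuples."""
    return list(product(STATES, repeat=n_people))

def count_configurations(n_people):
    """Closed form: each person independently takes one of the 3 states."""
    return len(STATES) ** n_people

print(len(configurations(4)))     # 81
print(count_configurations(20))   # 3486784401
```

Even the 4-person outbreak has 81 × 81 = 6,561 possible configuration-to-configuration transitions; at 20 people the number of configurations alone exceeds three billion.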
Very little data is available to estimate this process
Historically, we (partially) observe 1 or 2 Markov chains
We want to estimate transition probabilities on every edge
Aggregation simplifies the model …
… at the cost of reduced information content.
$p(C'_{t+1} \mid C'_t)$ is less informative than $p(C_{t+1} \mid C_t)$ when $C'$ is an aggregation (coarse-graining) of $C$.
(Figure: aggregated state space, with #S from 0 to 4 on the horizontal axis and #I from 0 to 4 on the vertical axis.)
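The information lost by aggregation can be seen by counting how many distinguishable configurations collapse onto each aggregated state (#S, #I). A minimal sketch, assuming the same three-state (S, I, R) model for 4 people; `aggregate` and `preimage_sizes` are hypothetical names.

```python
from collections import Counter
from itertools import product

STATES = ("S", "I", "R")

def aggregate(config):
    """Map a configuration to the aggregated state (#S, #I).

    #R is implied by the population size, so it is not stored."""
    counts = Counter(config)
    return (counts["S"], counts["I"])

def preimage_sizes(n_people):
    """Number of distinct configurations that collapse onto each (#S, #I)."""
    return Counter(aggregate(c) for c in product(STATES, repeat=n_people))

sizes = preimage_sizes(4)
print(len(sizes))       # 15 aggregated states
print(sizes[(2, 1)])    # 12 configurations with 2 S, 1 I, 1 R
```

The 81 configurations shrink to 15 aggregated states, but once in state (2, 1) the model can no longer say *which* 12 arrangements of people produced it.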
Other assumptions further simplify the model …
… but are unwarranted in social systems, where components are
1. Heterogeneous (distinguishable)
2. Intentional (behavior not determined by physical laws)
Aggregation naturally makes contact with observations
Observations of outbreaks often ignore heterogeneity and intention, and provide only point estimates.
“An approximate answer to the right problem is worth a good deal more than
an exact answer to an approximate problem”
- J. Tukey
“All models are wrong, but some are useful”
- G.E.P. Box
A system is complex “if its behavior crucially depends on the details of its parts.”
- G. Parisi
Interaction approach simplifies process itself
Interactions among system components completely determine transition probabilities among configurations
(Figure: the transition graph among configurations is replaced with interactions among components.)
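One way to sketch the interaction approach: each person's next state depends only on pairwise interactions, and $p(C_{t+1} \mid C_t)$ becomes a product of per-person factors. The rules below (independent pairwise transmission with probability `p[i][j]`, recovery with probability `gamma`, absorbing R) and all names are hypothetical assumptions chosen for illustration.

```python
from itertools import product

def infection_prob(i, config, p):
    """Probability that susceptible person i is infected in one step,
    assuming each infectious j transmits independently with probability p[i][j]."""
    escape = 1.0
    for j, state in enumerate(config):
        if state == "I" and j != i:
            escape *= 1.0 - p[i][j]
    return 1.0 - escape

def transition_prob(config, next_config, p, gamma):
    """p(C_{t+1} | C_t) as a product of per-person factors.

    Hypothetical rules: S -> I driven by pairwise interactions, I -> R with
    fixed probability gamma, and R is absorbing."""
    prob = 1.0
    for i, (x, y) in enumerate(zip(config, next_config)):
        if x == "S":
            q = infection_prob(i, config, p)
            prob *= {"I": q, "S": 1.0 - q}.get(y, 0.0)
        elif x == "I":
            prob *= {"R": gamma, "I": 1.0 - gamma}.get(y, 0.0)
        else:  # x == "R"
            prob *= 1.0 if y == "R" else 0.0
    return prob

# Hypothetical pairwise transmission probabilities for 4 people.
p = [[0.0, 0.3, 0.1, 0.0],
     [0.3, 0.0, 0.2, 0.0],
     [0.1, 0.2, 0.0, 0.4],
     [0.0, 0.0, 0.4, 0.0]]
start = ("S", "I", "S", "R")
total = sum(transition_prob(start, nxt, p, gamma=0.5)
            for nxt in product("SIR", repeat=4))
print(round(total, 12))  # 1.0: outgoing probabilities are normalized
```

The design point is the parameter count: the 4 × 4 matrix `p` plus `gamma` determines every entry of the 81 × 81 configuration transition matrix.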
Calibrating with unexpectedly rich data
• For aerosol-borne pathogens, the probability of transmission is related to physical proximity, duration, etc.
• The interaction approach reduces to estimating a social network.
• There is much more data available for this than for outbreaks.
• But it is not directly observable.
How can we estimate a social network?
A possible approach we didn’t use
• Consider a subset of random networks subject to certain constraints
• Constraints should be relevant to the global dynamics, i.e. epidemics
• But what are those? A “chicken or the egg” problem:
It would seem offhand that a taxonomy of “nets” … would arise naturally from the consideration of the statistical parameters... But the statistical parameters themselves are singled out on the basis of taxonomic considerations, which have yet to be clarified.
- Anatol Rapoport and William Horvath, Behav Sci. 1961, 6, 279–291
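For concreteness, one standard member of the "random networks subject to constraints" family is the configuration model, which samples graphs with a prescribed degree sequence by randomly pairing edge stubs. This is a generic sketch of that technique, not something the talk used (it explicitly took a different route); discarding self-loops and duplicate edges is a common simplification that slightly lowers some degrees.

```python
import random

def configuration_model(degrees, seed=None):
    """Sample an undirected graph whose degrees approximately match `degrees`.

    Node i contributes degrees[i] stubs; stubs are shuffled and paired.
    Self-loops and duplicate edges are discarded."""
    if sum(degrees) % 2:
        raise ValueError("sum of degrees must be even")
    rng = random.Random(seed)
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

print(sorted(configuration_model([3, 2, 2, 2, 1], seed=1)))
```

The Rapoport and Horvath quote applies directly: the degree sequence is only one possible constraint, and nothing in the sampler says whether it is the constraint that matters for epidemics.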
Questions to drive model development
1. What is the optimal targeted allocation of antivirals used prophylactically or therapeutically to mitigate influenza pandemic?
2. What combination of targeted antivirals and feasible, community-based, non-pharmaceutical interventions (e.g. closing schools, allowing liberal leave from work) can best delay an outbreak from becoming epidemic for several months?
Both 1 & 2: models must compare changes in the social network with changes in transmissibility
This is an example of policy informatics for complex systems
Interventions specified naturally by effect on network
No single “knob” reduces overall transmission by 50%
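To illustrate why interventions are specified by their effect on the network, the sketch below tags each contact with a location type and models closing schools as deleting school contacts, and liberal leave as shortening work contacts. The `Contact` record, its location labels, and both intervention functions are hypothetical; they only show that each intervention changes *who meets whom*, not a uniform scaling of transmission.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Contact:
    person_a: int
    person_b: int
    location_type: str   # hypothetical labels: "home", "school", "work"
    hours: float

def close_schools(contacts):
    """Network intervention: remove every contact that occurs at a school."""
    return [c for c in contacts if c.location_type != "school"]

def liberal_leave(contacts, fraction_kept):
    """Network intervention: shorten work contacts (crude stand-in for staying home)."""
    return [replace(c, hours=c.hours * fraction_kept) if c.location_type == "work" else c
            for c in contacts]

def total_contact_hours(contacts):
    return sum(c.hours for c in contacts)

contacts = [Contact(1, 2, "home", 10.0), Contact(1, 3, "school", 6.0),
            Contact(2, 4, "work", 8.0)]
print(total_contact_hours(close_schools(contacts)))       # 18.0
print(total_contact_hours(liberal_leave(contacts, 0.5)))  # 20.0
```

Both interventions remove a similar amount of contact time, but from different parts of the network, so their epidemiological effects can differ.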
Step 1. Create a synthetic population
• Census data
  – Individual demographics
    • Age and gender
  – Household characteristics
    • Size and income
Successive refinement of synthetic data
• Start from a proto-population, e.g. a list of ids.
• Add observed data
• Capture correlations in data using statistical models (iterative proportional fitting from Public Use Microdata)
ID          Gender   Household
1           M        1
⋮           ⋮        ⋮
3 × 10^8    F        1.2 × 10^8
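The slide names iterative proportional fitting (IPF) as the tool for capturing correlations. A minimal two-dimensional sketch: a seed table of counts (e.g. cross-tabulated from a microdata sample) is alternately rescaled so its row and column sums match target marginals (e.g. census totals). The seed values, targets, and tolerance are hypothetical; production population synthesizers work with many more dimensions.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting for a 2-D table.

    Alternately scales rows, then columns, until the row sums match their
    targets within `tol` (column sums match exactly after each column pass).
    Row and column targets must have the same total."""
    table = [row[:] for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        if max(abs(sum(r) - t) for r, t in zip(table, row_targets)) < tol:
            break
    return table

seed = [[40, 30], [10, 20]]   # hypothetical sample: age group x household size
fitted = ipf(seed, row_targets=[600, 400], col_targets=[500, 500])
print(fitted)
```

IPF matches the marginals while keeping the seed's interaction structure: here the odds ratio (40·20)/(30·10) = 8/3 is preserved in the fitted table.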
Step 2. Assign activities, locations & times
• Locations: Dunn and Bradstreet data
• Activity surveys
  – Matched to households by demographics
  – Matched to locations by activity type & travel time
Successive refinement of synthetic data
• Surveys are very different kinds of data sources than census
• This step depends on data fusion capability
• Some values may be outcomes of very large games, not statistical models
ID          Gender   Household    Activities     Activity locations   Activity times
1           M        1            School, shop   2743                 8:00, 3:00
⋮           ⋮        ⋮            ⋮              ⋮                    ⋮
3 × 10^8    F        1.2 × 10^8   Work, social   98734723947          9:00, 7:30
So far: a typical family’s day
(Figure: timeline of a family's day. Household members move among Home, Work, Lunch, Shopping, Daycare, and School, traveling by carpool, bus, and car.)
Overlapping families’ days create a social network
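The idea that overlapping days create a network can be sketched directly: two people are in contact when they are at the same location at overlapping times, and the contact is weighted by the overlap. The `(person, location, start, end)` record format and the function name are hypothetical. The all-pairs loop per location is quadratic, so a population of 3 × 10^8 would need a smarter method (e.g. a sweep over sorted times).

```python
from collections import defaultdict
from itertools import combinations

def contacts_from_activities(activities):
    """Derive a weighted contact network from activity records.

    `activities` holds (person, location, start, end) tuples with times in
    hours. The weight of a contact is the total co-location time."""
    by_location = defaultdict(list)
    for person, location, start, end in activities:
        by_location[location].append((person, start, end))
    weights = defaultdict(float)
    for visits in by_location.values():
        for (p, s1, e1), (q, s2, e2) in combinations(visits, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if p != q and overlap > 0:
                weights[(min(p, q), max(p, q))] += overlap
    return dict(weights)

day = [
    ("parent", "home", 0.0, 8.0), ("child", "home", 0.0, 7.5),
    ("parent", "work", 9.0, 17.0), ("coworker", "work", 8.0, 12.0),
    ("child", "school", 8.0, 15.0), ("classmate", "school", 8.0, 15.0),
]
print(contacts_from_activities(day))
```

The parent and the classmate never meet, yet they are two steps apart through the child: this is how individual schedules knit into a population-wide network.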
Successive refinement of synthetic data
• Gives us a generative model for contacts
• More powerful than traditional encapsulated agents
• Note: each byte of data per person adds ~300 MB to the database
ID          Gender   Household    Activities     Activity locations   Activity times   Contacts          Contact duration
1           M        1            School, shop   2743                 8:00, 3:00       2,3,4836, 289     5:20, 0:45
⋮           ⋮        ⋮            ⋮              ⋮                    ⋮                ⋮                 ⋮
3 × 10^8    F        1.2 × 10^8   Work, social   98734723947          9:00, 7:30
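The "~300 MB per byte" note follows from the population size in the table: one byte for each of about 3 × 10^8 people. A quick check; the 8-byte example attribute is hypothetical.

```python
POPULATION = 3 * 10**8  # synthetic population size (last ID in the table)

def added_storage_mb(bytes_per_person, population=POPULATION):
    """Storage added by a per-person attribute, in megabytes (10**6 bytes)."""
    return bytes_per_person * population / 10**6

print(added_storage_mb(1))   # 300.0
print(added_storage_mb(8))   # 2400.0, e.g. one 8-byte timestamp per person
```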
Using data for purposes other than intended
Possibly the only epidemiological model that has been calibrated using automobile traffic counts!
(Because the same activity model generates both transportation demand and contact networks)
Activities adapt to situation & generate network changes
Derive disease interaction from social network
Interactions only need to get a few things right:
• Susceptibility
• Infectivity as a function of time since exposure
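A common way to combine these ingredients is to let the per-contact transmission probability depend on contact duration, the target's susceptibility, and the source's infectivity as a function of days since exposure. The functional form 1 − (1 − r)^(duration × susceptibility × infectivity), the rate `r`, and the infectivity profile are assumptions for illustration, not the model's published parameters.

```python
# Hypothetical infectivity profile indexed by days since exposure:
# zero during a 2-day incubation period, then infectious for 4 days.
INFECTIVITY = [0.0, 0.0, 1.0, 1.0, 0.6, 0.3]

def infectivity(days_since_exposure):
    """Relative infectivity of the source; zero outside the profile."""
    if 0 <= days_since_exposure < len(INFECTIVITY):
        return INFECTIVITY[days_since_exposure]
    return 0.0

def transmission_prob(duration_hours, susceptibility, days_since_exposure, r=0.01):
    """Probability that one contact transmits infection.

    r is a hypothetical per-hour transmission probability for a fully
    susceptible target and a fully infectious source."""
    exposure = duration_hours * susceptibility * infectivity(days_since_exposure)
    return 1.0 - (1.0 - r) ** exposure

print(transmission_prob(8.0, 1.0, 2))   # 8-hour contact, peak infectivity
print(transmission_prob(8.0, 1.0, 0))   # 0.0: source still incubating
```

Keeping duration on the network side and infectivity on the disease side is what lets the same social network be reused for different "bugs".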
Modeling pandemic influenza
• Nobody knows what pandemic flu will look like
• Assume something like seasonal flu, but with less immunity
• Create several “flu” bugs in silico
  – Moderate (10% attack rate)
  – Strong (20–25% attack rate)
  – Catastrophic (> 50% attack rate)
• For each, fix other characteristics:
  – Incubation period: 2–3 days
  – Infectious period: 2–5 days
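To make "several bugs with fixed natural history" concrete, the sketch below runs a simple daily SEIR process on a network, drawing incubation (2 to 3 days) and infectious periods (2 to 5 days) from the ranges above, and varies only the per-contact daily transmission probability `tau`. The random graph stands in for the synthetic social network, and everything here is a hypothetical toy, not the laboratory's simulator.

```python
import random

def random_network(n, mean_degree, rng):
    """Erdős–Rényi graph as adjacency sets; a stand-in for a synthetic contact network."""
    p = mean_degree / (n - 1)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def simulate(adj, tau, n_seeds=5, seed=0):
    """Discrete-time (daily) SEIR process on a network; returns the attack rate."""
    rng = random.Random(seed)
    n = len(adj)
    state = ["S"] * n
    timer = [0] * n
    for v in rng.sample(range(n), n_seeds):
        state[v], timer[v] = "E", rng.randint(2, 3)
    while any(s in ("E", "I") for s in state):
        # 1. Infectious people expose susceptible neighbors.
        newly_exposed = set()
        for v in range(n):
            if state[v] == "I":
                for u in adj[v]:
                    if state[u] == "S" and rng.random() < tau:
                        newly_exposed.add(u)
        # 2. Advance disease clocks: E -> I -> R.
        for v in range(n):
            if state[v] in ("E", "I"):
                timer[v] -= 1
                if timer[v] == 0:
                    if state[v] == "E":
                        state[v], timer[v] = "I", rng.randint(2, 5)
                    else:
                        state[v] = "R"
        # 3. Newly exposed people start their incubation period.
        for u in newly_exposed:
            state[u], timer[u] = "E", rng.randint(2, 3)
    return sum(s == "R" for s in state) / n

net = random_network(1000, mean_degree=8, rng=random.Random(42))
for tau in (0.02, 0.05, 0.1):
    print(tau, simulate(net, tau))
```

Tuning `tau` until the attack rate lands near 10%, 20 to 25%, or above 50% would yield moderate, strong, and catastrophic variants on the same network.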
Resolution, fidelity, and accuracy are different
• Resolution describes level of aggregation,
e.g. individuals vs populations
• Fidelity describes the completeness of the representation’s features,
e.g. age vs (age, gender, income, household size, education)
• Accuracy describes the correctness of features and correlations
e.g. is mixing by age derived from social network correct?
“Validity” (always for a particular question) depends on all 3.
Effect of changes in social networks (above) on disease dynamics (below)
Characterizing the resulting network
(Figure panels: degree distribution, location–location; degree distribution, people–people; sensitivity to parameters, two panels.)
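Both degree distributions can be derived from one bipartite graph of (person, location) visits by projecting it onto each side. This sketch ignores visit times (a static projection), whereas a time-aware people–people network would also require overlapping visits; the function names and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def project(visits, side):
    """Project a bipartite graph of (person, location) visits onto one side.

    side="people": two people are linked if they visit a common location.
    side="locations": two locations are linked if a common person visits both."""
    groups = defaultdict(set)
    for person, location in visits:
        if side == "people":
            groups[location].add(person)
        else:
            groups[person].add(location)
    adj = defaultdict(set)
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree_distribution(adj):
    """Map degree -> number of nodes with that degree (isolated nodes omitted)."""
    return dict(sorted(Counter(len(nbrs) for nbrs in adj.values()).items()))

visits = [("ann", "home1"), ("bob", "home1"), ("ann", "office"), ("cat", "office"),
          ("dan", "office"), ("bob", "school"), ("eve", "school")]
print(degree_distribution(project(visits, "people")))     # {1: 1, 2: 3, 3: 1}
print(degree_distribution(project(visits, "locations")))  # {1: 2, 2: 1}
```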
Assortative Mixing
• Static people–people projection is assortative
  – by degree (~0.25)
  – but not as strongly by age, income, household size, …
This is
• Like other social networks
• Unlike
  – technological networks
  – Erdős–Rényi random graphs
  – Barabási–Albert networks
Removing high-degree people: useless
Removing high-degree locations: better
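The degree assortativity figure (~0.25) is a correlation coefficient. A minimal sketch of Newman's degree assortativity, computed as the Pearson correlation of the degrees at the two ends of each edge (counting each edge in both directions); the function name is ours, and NetworkX offers an equivalent `degree_assortativity_coefficient`. It is undefined (division by zero) when every node has the same degree.

```python
from statistics import mean

def degree_assortativity(edges):
    """Newman's degree assortativity coefficient for an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    xs, ys = [], []
    for a, b in edges:          # each edge contributes both orientations
        xs += [degree[a], degree[b]]
        ys += [degree[b], degree[a]]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)   # xs and ys have equal variance
    return cov / var

star = [(0, i) for i in range(1, 5)]
print(degree_assortativity(star))  # -1.0: the hub links only to leaves
```

A positive value means high-degree people tend to meet other high-degree people, which is part of why targeting high-degree people is less effective than one might expect.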
Summary
• Complex systems models are hungry for detail (= data)
• Privacy & extrapolation require “synthetic” data, combining observations (declarative), statistical models, and simulation results (procedural)
• Validity of synthetic data depends on resolution, fidelity, accuracy, and the question it is intended to answer
When is this model simpler?
Notation: x and y are states of a component at time t and t+1
1. Components’ states are updated independently:
# parameters
2. Interactions are pairwise independent:
# parameters
3. Most components do not interact directly:
# parameters
4. Only one state transition, S → I, is affected by interactions:
# parameters
Architecture
Computational Resources
• Demonstration experiment
  – 8 experiments (exp ids: 1083 to 1090)
  – 24 cells with 200 days and 25 reps
• Computations performed
  – 291 million contacts × 200 days × 25 reps × 24 cells = 34.92 trillion transmission evaluations
• Time requirements
  – Single processor: 2 years 340 days
  – Small cluster (10 nodes, 4 cores each): 26 days 18 hours
  – Current IDAC cluster: > 3 hours
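The figures above can be checked directly. The product is 3.492 × 10^13, i.e. about 34.92 trillion evaluations, and the small-cluster time equals the single-processor time divided by 40 cores, which assumes perfect linear speedup. Ignoring leap days in "2 years 340 days" is our simplification.

```python
contacts = 291_000_000
days, reps, cells = 200, 25, 24

evaluations = contacts * days * reps * cells
print(f"{evaluations:,}")  # 34,920,000,000,000

single_proc_days = 2 * 365 + 340          # "2 years 340 days", ignoring leap days
cores = 10 * 4                            # 10 nodes x 4 cores
cluster_days = single_proc_days / cores   # assumes perfect linear speedup
print(cluster_days)  # 26.75, i.e. 26 days 18 hours
```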
Example Located Synthetic Population
Example Route Plans
(Figure: route plans for the first and second persons in a household, with stops labeled HOME, WORK, LUNCH, DOCTOR, and SHOP.)
Time Slice of a Typical Family’s Day
How much does detail matter?
• Interaction picture:
  – Dynamics of outbreak depend on topology
  – How and how much?
  – What differences in network topology are relevant to prevention/mitigation?
• What statistics capture the difference?
• Answer staring us in the face (see above):
  – Overall attack rate is a function of the topology of the network
• Other measures for other questions
  – Attack rate by transmissibility as function of edges retained
  – Vulnerability of a subset as function of edges retained
  – Distribution of vulnerabilities as function of edges retained
Edge deletion in a graph
• RTI synthesized poultry farm network
• In collaboration with UPenn, studying outbreaks
• National network, essentially complete graph
  – Distribution of weights
• Attack rate as function of edges retained
• Attack rate by transmissibility as function of edges retained
• Vulnerability of a subset as function of edges retained
• Distribution of vulnerabilities as function of edges retained
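"Attack rate as function of edges retained" can be sketched with bond percolation: keep each edge with probability `retained_fraction`, let each kept edge transmit with probability `transmissibility`, and measure the size of the outbreak reachable from a random seed. Random deletion of unweighted edges is our assumption; the poultry study's network is weighted and its deletion rule is not given here.

```python
import random
from collections import deque

def attack_rate(adj, retained_fraction, transmissibility, trials=100, seed=0):
    """Mean outbreak size (fraction of nodes) after random edge deletion.

    Each edge is open with probability retained_fraction * transmissibility
    and is tested at most once (lazy bond percolation). Each trial starts
    from one random seed node."""
    rng = random.Random(seed)
    n = len(adj)
    p_open = retained_fraction * transmissibility
    total = 0
    for _ in range(trials):
        start = rng.randrange(n)
        infected = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in infected and rng.random() < p_open:
                    infected.add(u)
                    queue.append(u)
        total += len(infected)
    return total / (trials * n)

n = 100
complete = [[u for u in range(n) if u != v] for v in range(n)]  # "essentially complete"
for q in (0.05, 0.1, 0.2, 0.5):
    print(q, attack_rate(complete, q, transmissibility=0.2))
```

On a dense graph the attack rate stays near the single seed until the retained fraction pushes the expected number of onward transmissions past one, then rises sharply.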
Model comparison
• Compare outcomes of same scenarios
  – Compare distributions of outcomes of similar scenarios
  – Compare distributions of summary statistics of outcomes of similar scenarios
  – Compare distributions of answers to questions about similar scenarios
• Compare
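One way to compare distributions of a summary statistic across two models is the two-sample Kolmogorov–Smirnov statistic, the largest gap between the two empirical CDFs. The choice of KS and the replicate attack rates below are hypothetical; `scipy.stats.ks_2samp` computes the same statistic plus a p-value.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov–Smirnov statistic: the largest gap between
    the empirical CDFs, evaluated at every observed value."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical attack rates from 5 replicates of the same scenario in two models.
model_a = [0.21, 0.23, 0.22, 0.25, 0.24]
model_b = [0.20, 0.22, 0.26, 0.27, 0.28]
print(ks_statistic(model_a, model_b))
```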
Adds up to serious informatics challenge
• Managing the refinement process
• Integrating various data sources & simulations
• Curating the database
• Providing HPC services
• Providing analysis support