Date post: 18-Dec-2015

Project Mimic: Simulation for Syndromic Surveillance

Thomas Lotze
Applied Mathematics and Scientific Computation
University of Maryland

Galit Shmueli and Inbal Yahav
RH Smith School of Business
University of Maryland

with Howard Burkom and Sean Murphy
JHU Applied Physics Lab

This work was partially supported by NIH grant RFA-PH-05-126.

Outline

The Biosurveillance Problem
Motivation: Reasons for simulation
Simulation Methodology
  Options/Generation
  Mimicking a dataset
Analysis
  Is this a good mimic?

Results

The Biosurveillance Problem

The Biosurveillance Problem, cont.

Given time series (usually pre-diagnostic daily data)
Detect disease outbreaks
  With few false alerts
  Early

Difficulties with Biosurveillance Data

Teams work on different authentic datasets
  Each team has their own private data
  Cannot compare results
  Researchers with no data cannot join the effort
Data are unlabeled
  We don’t know exactly when there are outbreaks
  Challenges evaluation of algorithm performance
  Hinders comparison of different algorithms

Project Mimic

Q: What if there was a way to generate pseudo-authentic data similar in statistical structure to real data AND insert simulated outbreak signatures into it?

A: we’d have new, labeled pseudo-real data!

Project Mimic: Dataset Mimicker

“Mimics” statistical structure of background data
  Levels of counts of different series
  Day-of-week patterns
  Seasonal patterns
  Holidays
  Within-series autocorrelation
  Cross-series cross-correlation
Extracts features from the authentic dataset
Output: dataset that “looks” like real dataset

[Figure: set of 6 series from one city, original vs. mimicked]

[Figure: 3 series from one city, zoomed in]

Mimic Methodology

Our method(s): Create random autocorrelated multivariate data
  Normal or Poisson
  Uses mean, standard deviation, reduced cross-correlation, 1-day ACF from original
  Holiday factor
  Seasonal factor
  Day-of-week factor
  Details at www.projectmimic.com
Mimicking implicitly uses a generative model
  What is the right model?
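The generation step above can be sketched as follows. This is a minimal illustration in Python, not the project's R code: it assumes a Normal model rounded to non-negative counts, a single lag-1 autocorrelation shared by all series, and a multiplicative day-of-week factor. Holiday and seasonal factors are left out. The function names and parameters (`mimic_background`, `dow_factors`, and so on) are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def mimic_background(means, sds, cross_corr, lag1_acf, dow_factors, n_days, seed=0):
    """Generate n_days rows of autocorrelated, cross-correlated daily counts.

    Assumption: one lag-1 autocorrelation for every series, so the
    standardized process keeps cross_corr as its stationary correlation.
    """
    rng = random.Random(seed)
    n = len(means)
    lower = cholesky(cross_corr)
    innovation_scale = math.sqrt(1.0 - lag1_acf ** 2)
    state = None
    rows = []
    for day in range(n_days):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlate the independent draws across series.
        shock = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if state is None:
            state = shock  # first day: draw from the stationary distribution
        else:
            state = [lag1_acf * state[i] + innovation_scale * shock[i]
                     for i in range(n)]
        factor = dow_factors[day % 7]  # assumes day 0 is the first weekday slot
        rows.append([max(0, round(factor * means[i] + sds[i] * state[i]))
                     for i in range(n)])
    return rows
```

For example, `mimic_background([50.0, 20.0], [5.0, 3.0], [[1.0, 0.4], [0.4, 1.0]], 0.3, [1.0] * 7, 28)` would return four weeks of counts for two series. In practice the means, standard deviations, correlations and factors would be estimated from the authentic dataset.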

Evaluating Mimics

Test: could the original data have been generated from the mimicker?

  Compare different generative models
  If the model were simple, could use AIC
  Instead, Chi-squared

Chi-squared Goodness-of-fit Tests
  By series
  By day of week
  Separate values into bins
  Chi-squared Test on counts
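A rough sketch of the binned chi-squared comparison, for one series and one day of week at a time. The slides do not say how bins are chosen; this sketch assumes bin edges at evenly spaced quantiles of the mimicked values and takes expected counts from the mimicked sample. The helper names are hypothetical. Turning the statistic into a p-value would need the chi-squared distribution with the returned degrees of freedom, which is not computed here.

```python
import bisect


def by_day_of_week(series, day):
    """Values of one series that fall on a given day-of-week slot (0..6)."""
    return series[day::7]


def chi_squared_statistic(original, mimicked, n_bins=5):
    """Return (statistic, degrees_of_freedom) comparing original to mimicked."""
    ordered = sorted(mimicked)
    # Interior bin edges at evenly spaced quantiles of the mimicked values.
    edges = [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

    def bin_counts(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return counts

    observed = bin_counts(original)
    mimic_counts = bin_counts(mimicked)
    scale = len(original) / len(mimicked)
    statistic = 0.0
    used_bins = 0
    for obs, mim in zip(observed, mimic_counts):
        expected = mim * scale
        # Bins the mimic never hits are skipped in this sketch; a fuller
        # version would merge them with a neighbouring bin instead.
        if expected > 0:
            statistic += (obs - expected) ** 2 / expected
            used_bins += 1
    return statistic, used_bins - 1
```

Running this for every series and every day of week gives a table of statistics. Large values flag the places where the original data look least like something the mimicker would generate.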

Example of Disparity

Project Mimic: Outbreak signature simulator

Generates multivariate outbreak-signatures
Options:
  Number of outbreak-signatures in series?
  Magnitude of outbreak?
  How many (and which) series will include outbreak-signatures?
  Stochastic/fixed?
  Include effects such as DOW, holidays, etc.? (like background data)
Output: matrix of outbreak-signatures to be inserted in the background data
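The options above could map onto a simulator along these lines. This is an illustrative Python sketch, not the project's R implementation. The triangular outbreak shape, the 7-day default duration and all names are assumptions. Day-of-week and holiday effects on the signature are not modelled.

```python
import math
import random


def poisson_draw(rng, rate):
    """Poisson sample by Knuth's method; adequate for small rates."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def outbreak_signatures(n_days, n_series, n_outbreaks, magnitude,
                        affected_series, duration=7, stochastic=False, seed=0):
    """Return (signatures, labels).

    signatures: n_days x n_series matrix of extra cases.
    labels: 0/1 per day, 1 while any outbreak is in progress.
    """
    rng = random.Random(seed)
    signatures = [[0] * n_series for _ in range(n_days)]
    labels = [0] * n_days
    peak = (duration - 1) / 2
    for _ in range(n_outbreaks):
        start = rng.randrange(n_days - duration + 1)
        for offset in range(duration):
            # Assumed shape: triangular, peaking mid-outbreak at `magnitude`.
            weight = 1.0 - abs(offset - peak) / (peak + 1)
            expected = magnitude * weight
            day = start + offset
            labels[day] = 1
            for s in affected_series:
                if stochastic:
                    extra = poisson_draw(rng, expected)
                else:
                    extra = round(expected)
                signatures[day][s] += extra
    return signatures, labels
```

With `stochastic=False` and `magnitude=8`, each affected series receives the fixed signature 2, 4, 6, 8, 6, 4, 2 over seven days. With `stochastic=True` the daily extras are Poisson draws around that shape.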

Outbreak labels

Project Mimic

Combining the background matrix + outbreak-signature matrix yields labeled data
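As a small illustration of the combination step, assume both matrices are lists of per-day rows with the same shape and that outbreak labels are one flag per day. The names here are hypothetical.

```python
def combine(background, signatures, labels):
    """Add outbreak signatures to background counts, keeping the day labels."""
    labeled = []
    for bg_row, sig_row, label in zip(background, signatures, labels):
        counts = [b + s for b, s in zip(bg_row, sig_row)]
        labeled.append((counts, label))
    return labeled
```

Each element of the result pairs a day's counts with a flag saying whether an outbreak-signature was inserted on that day. That flag is what authentic data lack.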

Two final products

Mimicker: Data and outbreak-signature simulators (in freeware R)
  Can be used by data owners to disseminate pseudo-data
  Can be used by research teams to evaluate robustness of methods
Mimics: Datasets that mimic DARPA BioALIRT data
  Benchmark datasets for comparison across groups
  Can be used to perform optimization methods for improved detection
  Available at www.projectmimic.com
  Example: BioALIRT data on 3 series (Resp from civilian/military/prescriptions)

Mimicked data + outbreak-signature

Conclusions

Mimic opens the door to:
  new techniques
  new researchers
First data sets of their kind
  Open methodology
  Publicly available
  Realistic

www.projectmimic.com

