8/7/2019 Bio Surveillance 2.0 Kass-Hout and Di Tada
Biosurveillance 2.0: Collaboration for Early Disease Warning and Effective Response
Taha Kass-Hout
Nicolás di Tada
Invited by Dr. Barbara Massoudi, PhD, MPH
Lecture at Emory University Rollins School of Public Health
Public Health Informatics, INFO 503
Atlanta, GA, USA
Background
Late Detection and Response
[Figure: cases by day, marking the opportunity for control]
Early Detection and Response
[Figure: cases by day, marking the opportunity for control]
PUBLIC HEALTH MEASURES
Representativeness
Completeness
Predictive Value
Timeliness
PUBLIC HEALTH MEASURES
1000 malaria infections (100%)
50 malaria notifications (5%)
Get as close to the bottom of the pyramid as possible
Urge frequent reporting: weekly, then daily, then immediately
Specificity / Reliability
Sensitivity / Timeliness
Main attributes
o Representativeness
o Completeness
o Predictive value positive
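The pyramid figures can be checked with a line of arithmetic. The sketch below computes completeness of notification from the slide's numbers, and predictive value positive from a hypothetical verification count (the figure of 40 confirmed notifications is an assumption for illustration, not from the lecture).

```python
def completeness(notified: int, true_cases: int) -> float:
    """Share of true infections that reach the surveillance system."""
    return notified / true_cases


def predictive_value_positive(confirmed: int, notified: int) -> float:
    """Share of notified cases that turn out to be true cases."""
    return confirmed / notified


# Numbers from the slide: 1000 malaria infections, 50 notifications.
print(f"Completeness: {completeness(50, 1000):.0%}")

# Hypothetical: suppose 40 of the 50 notifications are later confirmed.
print(f"Predictive value positive: {predictive_value_positive(40, 50):.0%}")
```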
PUBLIC HEALTH MEASURES
Analyze and interpret
Signal as early as possible
Automated analysis/thresholds
Health care hotline
Main attributes
o Timeliness
[Figure: timeline (axis: Time)]
PUBLIC HEALTH: TWO PERSPECTIVES
Case management
o Individual cases of notifiable diseases
o Relationship networks (contact tracing)
Population surveillance
o Larger risk patterns
CASE MANAGEMENT
Questions/problems:
Is a case due to recent transmission?
If so, does the case share any feature
with other, recent cases?
Ways it's being done:
o Investigations/interviews
o Meeting with other investigators
POPULATION SURVEILLANCE
Questions/problems:
Are more cases happening than expected?
Does an excess suggest ongoing
transmission in a specific region?
Way it's being done:
Semi-automated routine temporal and
space-time statistical analysis
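As a rough illustration of the temporal side of that question ("are more cases happening than expected?"), the sketch below flags days whose count exceeds a moving baseline of mean plus k standard deviations. The window length, the multiplier k, and the daily counts are all assumptions chosen for the example; the lecture does not specify a particular statistic.

```python
from statistics import mean, stdev


def excess_days(counts, baseline_days=7, k=2.0):
    """Return indices of days whose count exceeds baseline mean + k * SD.

    The baseline is the preceding `baseline_days` days (a moving window).
    """
    flagged = []
    for day in range(baseline_days, len(counts)):
        window = counts[day - baseline_days:day]
        threshold = mean(window) + k * stdev(window)
        if counts[day] > threshold:
            flagged.append(day)
    return flagged


# Hypothetical daily case counts for one region; the spike is on day 9.
daily_cases = [4, 5, 3, 6, 4, 5, 4, 5, 4, 15, 6]
print(excess_days(daily_cases))
```

A flagged day is only a signal for an analyst to review, not a confirmed outbreak.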
WHY LOCATION MATTERS: CASE MANAGEMENT
If you are studying a case of a certain disease that was just declared,
it is harder to picture the situation by looking at something like this..
Than by looking at this..
WHY LOCATION MATTERS: POPULATION SURVEILLANCE
If you are studying the spatial distribution of a set of disease clusters,
this would seem more difficult..
Than this..
The Problem Space
Current systems design, analysis and evaluation have been geared towards specific data sources and detection algorithms, not humans
We have systems in place for those threats we have been faced with before
The Problem
TRADITIONAL DISEASE SURVEILLANCE
In the past two decades the focus was on automatically detecting anomalous patterns in data (often a single stream)
Modern methods
o rely on human input and judgment
o incorporate temporal, spatial, and multivariate information
TRADITIONAL DISEASE SURVEILLANCE
9/20, 15213, cough/cold,
9/21, 15207, antifever,
9/22, 15213, CC = cough, ...
1,000,000 more records
[Figure: Huge mass of data -> Detection algorithm -> Too many alerts -> "What are we supposed to do with this?"]
Our Approach
Human-based
Collaborative and cross-disciplinary
Web 2.0/3.0 platform
Information Sources
Event-based - ad-hoc unstructured reports issued by formal or informal sources
Indicator-based - (number of cases, rates, proportion of strains)
Timeliness, Representativeness, Completeness, Predictive Value, Quality
MODERN DISEASE SURVEILLANCE
9/20, 15213, cough/cold,
9/21, 15207, antifever,
9/22, 15213, CC = cough, ...
1,000,000 more records
[Figure: Huge mass of data, with a feedback loop]
Main Components
Feature extraction, reference and baseline information
Tags
Multiple Data Streams
User-Generated and Machine Learning Metadata
Comments
Spatio-temporal
Flags/Alerts/Bookmarks
Event Classification, Characterization and Detection
Previous Event Training Data
Previous Event Control Data
Metadata extraction
Machine learning
Social network
Professional feedback
Anomaly detection
Collaborative Spaces
Hypotheses generation/testing
Our Solution
[Figure: diagram linking Item, Hypothesis, Field Actions and Verifications, and Feedback / Confirmation]
ADVANTAGES OF MACHINE LEARNING
P(malaria) = 22%
P(influenza) = 13%
P(other ILI) = 33%
MACHINE LEARNING TECHNIQUES
Classifiers
Clustering
Bayesian Statistics
Neural Networks
Genetic Algorithms
HOW TO REPRESENT A DOCUMENT?
[Figure: document in a feature space with axes "cold" and "fever"]
CLASSIFIERS: PROBLEM DEFINITION
Map items to vectors (feature extraction)
Normalize those vectors
Train the classifier
Measure the results with new information
Feed the results back to the classifier
Separate classes in feature space
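The steps listed for classifiers can be sketched end to end on toy data. The example below is a minimal illustration only: it uses word counts as features and a nearest-centroid rule as a simple stand-in for a real classifier (it is not an SVM), and the vocabulary, reports, and labels are all hypothetical.

```python
import math
from collections import Counter

VOCAB = ["cough", "fever", "cold", "rash", "itch"]  # hypothetical vocabulary


def to_vector(text):
    """Feature extraction: count vocabulary terms, then L2-normalize."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train(examples):
    """Training: average the vectors of each class into a centroid."""
    sums, sizes = {}, Counter()
    for text, label in examples:
        vec = to_vector(text)
        acc = sums.setdefault(label, [0.0] * len(VOCAB))
        for i, v in enumerate(vec):
            acc[i] += v
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, text):
    """Assign the class whose centroid has the largest dot product."""
    vec = to_vector(text)
    return max(centroids, key=lambda c: sum(a * b for a, b in zip(vec, centroids[c])))


# Hypothetical labeled reports.
training = [
    ("cough fever cold", "respiratory"),
    ("fever cough", "respiratory"),
    ("rash itch", "skin"),
    ("itch rash rash", "skin"),
]
model = train(training)
print(classify(model, "cold cough"))

# Feedback: a reviewer supplies a corrected label, and the model is retrained.
training.append(("fever rash", "skin"))
model = train(training)
```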
CLASSIFIERS - SVM
SVM: NON-LINEAR?
$\Phi : x \rightarrow \varphi(x)$
Map to higher-dimension space
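To make the idea of mapping to a higher-dimension space concrete, here is a small sketch under stated assumptions: the feature map `phi(x) = (x, x^2)`, the data, and the hand-picked weights are all hypothetical, and no SVM training is performed. It only shows that points which no single threshold on x can separate become linearly separable after the mapping.

```python
def phi(x):
    """Hypothetical feature map: lift a 1-D point to (x, x^2)."""
    return (x, x * x)


def linear_rule(features, w=(0.0, 1.0), b=-4.0):
    """A linear decision rule in the lifted space: sign(w . features + b)."""
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if score > 0 else -1


# Hypothetical 1-D data: class +1 lies on both sides of class -1,
# so no single threshold on x separates them.
points = [-3.0, -2.5, -1.0, 0.0, 1.0, 2.5, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

predictions = [linear_rule(phi(x)) for x in points]
print(predictions == labels)
```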
SVM: FILTERING OR CLASSIFYING
[Figure: training documents feed a classifier, which sorts Document1, Document2, Document3 into Positives and Negatives]
CLUSTERING: PROBLEM DEFINITION
Map items to vectors (feature extraction)
Normalization
Agglomerative and Partitional
CLUSTERING - AGGLOMERATIVE
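Agglomerative clustering starts with every item in its own cluster and repeatedly merges the two closest clusters. The sketch below uses single linkage (closest pair of members) as one possible merge criterion; that choice, the stopping rule, and the coordinates are assumptions for illustration.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[p] for p in points]  # start with one cluster per item

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical case locations (x, y), e.g. projected coordinates in km.
cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = single_linkage(cases, n_clusters=2)
print(groups)
```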
CLUSTERING - PARTITIONAL
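Partitional clustering splits the items into a fixed number of groups directly. A common example is k-means; the sketch below is a minimal version of it, with the initial centroids, iteration count, and coordinates chosen as assumptions for illustration.

```python
import math


def k_means(points, centroids, iterations=10):
    """Partition points around the given initial centroids (Lloyd's algorithm)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups


# Hypothetical case locations (x, y).
cases = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = k_means(cases, centroids=[(0, 0), (10, 10)])
print(centers)
```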
BAYESIAN STATISTICS
$P(A \mid B) = \dfrac{P(B \mid A)\, P(A)}{P(B)}$
P(A|B): probability of disease A (flu) once symptoms B (fever) are observed
P(B|A): probability of fever once flu is confirmed
P(A): probability of flu (prior or marginal)
P(B): probability of fever (prior or marginal)
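A short worked example of the rule, using the flu and fever roles from the slide. All three input probabilities are hypothetical values picked for illustration; P(B) is derived from them with the law of total probability.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, for illustration only.
p_flu = 0.05                 # P(A): prior probability of flu
p_fever_given_flu = 0.90     # P(B|A)
p_fever_given_no_flu = 0.10  # P(B|not A)

# P(B) by the law of total probability.
p_fever = p_fever_given_flu * p_flu + p_fever_given_no_flu * (1 - p_flu)

p_flu_given_fever = posterior(p_fever_given_flu, p_flu, p_fever)
print(f"P(flu | fever) = {p_flu_given_fever:.3f}")
```

With these inputs, fever raises the probability of flu well above its 5% prior, yet most fevers are still not flu because flu is rare to begin with.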
NEURAL NETWORKS
Given a set of stimuli, train a system to produce a given output
NEURAL NETWORKS - STRUCTURE
Input Layer: {I0, I1, ..., In}
Hidden Layer
Output Layer: {O0, O1, ..., On}
Weight
$H_n = \sum_{i=0}^{I} \left( I_i \cdot w_{in} \right)$
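The weighted sum for a hidden node can be sketched as a small forward pass. The slide shows only the weighted sum; the sigmoid activation applied on top of it here, the layer sizes, and the weight values are assumptions added for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, activation=sigmoid):
    """Compute one layer: node n receives sum_i inputs[i] * weights[i][n]."""
    n_nodes = len(weights[0])
    return [
        activation(sum(inputs[i] * weights[i][n] for i in range(len(inputs))))
        for n in range(n_nodes)
    ]


# Hypothetical weights: 3 inputs -> 2 hidden nodes -> 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
w_output = [[1.0], [-1.0]]

inputs = [1.0, 0.0, 1.0]
hidden = layer(inputs, w_hidden)
output = layer(hidden, w_output)
print(hidden, output)
```

Training would adjust `w_hidden` and `w_output` so that known inputs produce the desired outputs; that step is not shown.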
NEURAL NETWORK - APPLICATION
Event?
GENETIC ALGORITHM - BASICS
Define the model that you want to optimize
Create the fitness function
Evolve the gene pool, testing against the fitness function
Select the best individual
GENETIC ALGORITHM - MODEL FITNESS
Fitness = 1/Area
GENETIC ALGORITHM - PROCESS
1. Create an initial population of candidates
2. Use operators to generate new candidates (mating and mutation)
3. Discard worst individuals or select best individuals in generation
4. Repeat from 2 until you find a candidate that satisfies the solution sought
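The four steps can be sketched as a short loop. This is an illustrative toy, not the lecture's model: candidates are bit lists, the fitness function simply counts ones, and the population size, single-point crossover, one-bit mutation, and fixed seed are all assumptions.

```python
import random


def evolve(fitness, gene_length, pop_size=20, generations=200, target=None, seed=0):
    rng = random.Random(seed)
    # 1. Create an initial population of candidates.
    population = [[rng.randint(0, 1) for _ in range(gene_length)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # 4. Stop once a candidate satisfies the target.
        if target is not None and fitness(population[0]) >= target:
            break
        # 3. Discard the worst half, keep the best half as parents.
        parents = population[: pop_size // 2]
        # 2. Generate new candidates by mating (crossover) and mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, gene_length)
            child = mother[:cut] + father[cut:]
            flip = rng.randrange(gene_length)
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Toy fitness: the number of ones in the candidate.
best = evolve(sum, gene_length=12, target=12)
print(best, sum(best))
```

Because the best half always survives, the best fitness in the population never decreases from one generation to the next.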
GENETIC ALGORITHM - PROCESS
[Figure: example candidates across generations]
(4,5,6,3,5) (4,3,6,2,5)
(5,3,4,6,2) (2,4,6,3,5) (4,3,6,5,2)
(2,3,4,6,5) (3,4,5,2,6)
(3,5,4,6,2) (4,5,3,6,2) (5,4,2,3,6)
(4,6,3,2,5) (3,4,2,6,5) (3,6,5,1,4)
(5,3,2,6,5)
(3,4,4,6,2)
(5,3,2,6,5)
(3,4,4,6,2)
RESULTS: IMPROVED SURVEILLANCE
InSTEDD Evolve
Acknowledgement