PROGRAMS IN HOMELAND SECURITY AT DIMACS

Fred S. Roberts, DIMACS Director

Date post: 22-Dec-2015


THE FOUNDING OF DIMACS

THE NSF SCIENCE AND TECHNOLOGY CENTERS PROGRAM

The STC program was launched by the White House and the National Academy of Sciences in 1988 in order to increase the economic competitiveness of the U.S.

NSF ran a nationwide competition. The rules:

*cutting edge research

*education and knowledge transfer

*university-industry partnerships


THE FOUNDING OF DIMACS

Because of the increasing importance of discrete mathematics and theoretical computer science, especially in telecommunications and computing, four institutions (Rutgers University, Princeton University, AT&T Bell Labs, and Bell Communications Research (Bellcore)) each developed strong research groups in these fields.

Under the leadership of Rutgers, they came together to found DIMACS and entered the STC competition.

There were more than 800 preproposals and more than 300 proposals, in all fields of science; there were 11 winners.


The DIMACS Partners Today

• Rutgers University
• Princeton University
• AT&T Labs
• Bell Labs (Lucent Technologies)
• NEC Laboratories America
• Telcordia Technologies

Affiliates:
• Avaya Labs
• HP Labs
• IBM Research
• Microsoft Research
• Stevens Institute of Technology


WHO IS DIMACS?

•There are about 250 scientists affiliated with DIMACS and called permanent members.

•Most are from the partner and affiliated organizations.

•They include many of the world’s leaders in discrete mathematics and theoretical computer science and their applications.

•They also include statisticians, biologists, psychologists, chemists, epidemiologists, and engineers.

•None are paid by DIMACS, but they join in DIMACS projects.


Outline: A Selection of DIMACS Projects

•Bioterrorism Sensor Location
•Port of Entry Inspection Algorithms
•Monitoring Message Streams
•Author Identification
•Computational and Mathematical Epidemiology
•Adverse Event/Disease Reporting/Surveillance/Analysis
•Bioterrorism Working Group
•Modeling Social Responses to Bioterrorism
•Predicting Disease Outbreaks from Remote Sensing and Media Data
•Communication Security and Information Privacy


The Bioterrorism Sensor Location Problem


• Early warning is critical in defense against terrorism

• This is a crucial factor underlying the government’s plans to place networks of sensors/detectors to warn of a bioterrorist attack

The BASIS System – Salt Lake City


Locating Sensors is not Easy

• Sensors are expensive

• How do we select them and where do we place them to maximize “coverage,” expedite an alarm, and keep the cost down?

• Approaches that improve upon existing, ad hoc location methods could save countless lives in the case of an attack and also money in capital and operational costs.


Two Fundamental Problems

• Sensor Location Problem
– Choose an appropriate mix of sensors
– Decide where to locate them for best protection and early warning


Two Fundamental Problems

• Pattern Interpretation Problem: When sensors set off an alarm, help public health decision makers decide
– Has an attack taken place?
– What additional monitoring is needed?
– What was its extent and location?
– What is an appropriate response?


The SLP: What is a Measure of Success of a Solution?

• A modeling problem.

• Needs to be made precise.

• Many possible formulations.


The SLP: What is a Measure of Success of a Solution?

• Identify and ameliorate false alarms.
• Defending against a “worst case” attack or an “average case” attack.
• Minimize time to first alarm? (Worst case? Average case?)
• Maximize “coverage” of the area.
– Minimize geographical area not covered
– Minimize size of population not covered
– Minimize probability of missing an attack


The SLP: What is a Measure of Success of a Solution?

•Cost: Given a mix of available sensors and a fixed budget, what mix will best accomplish our other goals?


The SLP: What is a Measure of Success of a Solution?

•It’s hard to separate the goals.
•Even a small number of sensors might detect an attack if there is no constraint on time to alarm.
•Without budgetary restrictions, a lot more can be accomplished.


The Sensor Location Problem

•Approach is to develop new algorithmic methods.

•We are building on approaches to other modeling problems, seeing if they can be modified in the sensor location context.

•This is a multi-criteria modeling problem and it seems hopeless to try to find “optimal solutions”

•We will be happy with “efficient” algorithms that find “good” solutions


Algorithmic Approaches I : Greedy Algorithms


Greedy Algorithms

• Find the most important location first and locate a sensor there.

• Find second-most important location.

• Etc.

• Builds on earlier mathematical work at Institute for Defense Analyses (Grotte, Platt)

• “Steepest ascent approach.”

• No guarantee of “optimal” or best solution.

• In practice, gets pretty close to optimal solution.
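The greedy idea above can be sketched in a few lines. This is an illustration only, not the IDA method the slide cites: it assumes "importance" of a location means the number of not-yet-covered city blocks a sensor there would cover, and the site names and coverage sets are made up.

```python
def greedy_placement(candidates, k):
    """Pick up to k sites, each time adding the site that covers the most
    points not yet covered (one steepest-ascent step on coverage)."""
    covered, chosen = set(), []
    for _ in range(k):
        # Marginal gain of a site = number of new points it would cover.
        best = max(
            (s for s in candidates if s not in chosen),
            key=lambda s: len(candidates[s] - covered),
            default=None,
        )
        if best is None or not (candidates[best] - covered):
            break  # no remaining site adds coverage
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Hypothetical instance: site -> set of city blocks a sensor there would cover.
SITES = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6, 7},
    "D": {1, 7},
}
chosen, covered = greedy_placement(SITES, 2)
print(chosen, sorted(covered))
```

As the slide notes, a procedure like this carries no guarantee of finding the best placement; it only takes the locally best step each time.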


Algorithmic Approaches II : Variants of Classic Location and Clustering Methods


Algorithmic Approaches II : Variants of Classic Location and Clustering Methods

• Location theory: locate facilities (sensors) to be used by users located in a region.

• Cluster analysis: Given points in a metric space, partition them into groups or clusters so points within clusters are relatively close.

• Clusters correspond to points covered by a facility (sensor).


Variants of Classic Location and Clustering Methods

• k-median clustering: Given k sensors, place them so each point in the city is within x feet of a sensor.

• Complications: More dimensions: location affects sensitivity, wind strength enters, sensors have different characteristics, etc.

• This higher-dimensional k-median clustering problem is hard! Best-known algorithms are due to Rafail Ostrovsky.
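A small brute-force sketch may help separate two objectives that are easy to blur. In the standard formulation, k-median minimizes the total distance from points to their nearest sensor, while "every point within x feet of a sensor" is a worst-case covering criterion (as in k-center). The one-dimensional points below are invented, candidate sites are assumed to be the points themselves, and exhaustive search is only workable for tiny instances.

```python
from itertools import combinations


def total_distance(points, sensors):
    """k-median objective: sum of distances to the nearest sensor."""
    return sum(min(abs(p - s) for s in sensors) for p in points)


def max_distance(points, sensors):
    """Covering (k-center) objective: worst distance to the nearest sensor."""
    return max(min(abs(p - s) for s in sensors) for p in points)


def best_placement(points, sites, k, objective):
    """Try every k-subset of candidate sites (exhaustive search)."""
    return min(combinations(sites, k), key=lambda c: objective(points, c))


# Hypothetical positions along a street, in feet.
POINTS = [0, 1, 2, 3, 20, 100]
median_sites = best_placement(POINTS, POINTS, 2, total_distance)
center_sites = best_placement(POINTS, POINTS, 2, max_distance)
print(median_sites, total_distance(POINTS, median_sites))
print(center_sites, max_distance(POINTS, center_sites))
```

On this instance the two objectives choose different sites, which is one reason the measure of success has to be fixed before an algorithm is chosen.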


Variants of Classic Location and Clustering Methods

• Further complications make this even more challenging:
– Different costs of different sensors
– Restrictions on where we can place different sensors
– Is it better to have every point within x feet of some sensor or every point within y feet of at least three sensors (y > x)?

• Approximation methods due to Chuzhoy, Ostrovsky, and Rabani and to Guha, Tardos, and Shmoys are relevant.


Algorithmic Approaches III : Variants of Highway Sensor Network Algorithms


Variants of Highway Sensor Network Algorithms

• Sensors located along highways and nearby pathways measure atmospheric and road conditions.

• Muthukrishnan, et al. have developed very efficient algorithms for sensor location.

• Based on “bichromatic clustering” and “bichromatic facility location” (color nodes corresponding to sensors red, nodes corresponding to sensor messages blue)


Variants of Highway Sensor Network Algorithms

• These algorithms apply to situations with many more sensors than the bioterrorism sensor location problem.

• As BT sensor technology changes, we can envision a myriad of miniature sensors distributed around a city, making this work all the more relevant.


Algorithmic Approaches IV : Building on Equipment Placing Algorithms


Building on Equipment Placing Algorithms

• The “Node Placement Problem” is the problem of determining locations, or nodes, at which to install certain types of networking equipment.

• “Coverage” and cost are major considerations.

• Researchers at Telcordia Technologies have studied variations of this problem arising from broadband access technologies.


The Broadband Access Node Placement Problem

• There are inherent range limitations that drive placement.

• E.g.: customer for DSL service must be within xx feet of an assigned multiplexer.

• Multiplexer = sensor.

• Problem solved using dynamic programming algorithms.

(Tamra Carpenter, Martin Eiger, David Shallcross, Paul Seymour)


The Broadband Access Node Placement Problem: Complications

• Restrictions on types of equipment that can be placed at a given node.

• Constraints on how far a signal from a given piece of equipment can travel.

• Cost and profit maximization considerations.

• Relevance of work on general integer programming, the knapsack cover problem, and local access network expansion problems.


The Pattern Interpretation Problem


The Pattern Interpretation Problem

• It will be up to the Decision Maker to decide how to respond to an alarm from the sensor network.


The Pattern Interpretation Problem

• Little has been done to develop analytical models for rapid evaluation of a positive alarm or pattern of alarms from a sensor network.

• How can this pattern be used to minimize false alarms?

• Given an alarm, what other surveillance measures can be used to confirm an attack, locate areas of major threat, and guide public health interventions?


The Pattern Interpretation Problem (PIP)

• Close connection to the SLP.

• How we interpret a pattern of alarms will affect how we place the sensors.

• The same simulation models used to place the sensors can help us in tracing back from an alarm to a triggering attack.


Approaching the PIP: Minimizing False Alarms


Approaching the PIP: Minimizing False Alarms

• One approach: Redundancy. Require two or more sensors to make a detection before an alarm is considered confirmed.


Approaching the PIP: Minimizing False Alarms

• Portal Shield: requires two positives for the same agent during a specific time period.

• Redundancy II: Place two or more sensors at or near the same location. Require two proximate sensors to give off an alarm before we consider it confirmed.

• Redundancy drawbacks: cost, delay in confirming an alarm.
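A redundancy rule of this kind can be sketched as a small filter over detection events. The event format, the sensor and agent names, and the requirement that the two positives come from different sensors are all assumptions made for illustration; the slides do not specify how Portal Shield implements its rule.

```python
def confirmed_alarms(detections, window):
    """Return the agents confirmed by two positives from different sensors
    no more than `window` time units apart.

    detections: iterable of (time, sensor_id, agent) tuples.
    """
    confirmed = set()
    recent_by_agent = {}
    for t, sensor, agent in sorted(detections):
        recent = recent_by_agent.setdefault(agent, [])
        # Drop earlier positives that have fallen outside the time window.
        recent[:] = [(t0, s0) for t0, s0 in recent if t - t0 <= window]
        if any(s0 != sensor for _, s0 in recent):
            confirmed.add(agent)
        recent.append((t, sensor))
    return confirmed


# Hypothetical event log: (minutes since start, sensor, agent).
EVENTS = [
    (0, "s1", "agent-1"), (5, "s2", "agent-1"),   # two sensors, close in time
    (0, "s1", "agent-2"), (50, "s2", "agent-2"),  # two sensors, too far apart
    (3, "s3", "agent-3"), (4, "s3", "agent-3"),   # same sensor twice
]
result = confirmed_alarms(EVENTS, window=10)
print(result)
```

The delay drawback noted above is visible here: the alarm for "agent-1" is only confirmed at minute 5, when the second sensor reports.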


Approaching the PIP: Using Decision Rules

• Existing sensors come with a sensitivity level specified and sound an alarm when the number of particles collected is sufficiently high – above threshold.


Approaching the PIP: Using Decision Rules

• Alternative decision rule: alarm if two sensors reach 90% of threshold, three reach 75% of threshold, etc.

• One approach: use clustering algorithms for sounding an alarm based on a given distribution of clusters of sensors reaching a percentage of threshold.
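The alternative decision rule can be written as a short table of (number of sensors, fraction of threshold) pairs. The sketch below takes the 90% and 75% figures from the slide and adds the ordinary single-sensor rule (one sensor at 100% of threshold); the readings are invented, and a single shared threshold for all sensors is a simplifying assumption.

```python
# (minimum number of sensors, fraction of threshold each must reach)
RULES = [(1, 1.00), (2, 0.90), (3, 0.75)]


def network_alarm(readings, threshold, rules=RULES):
    """Sound a network-level alarm if any rule in the table is satisfied."""
    fractions = [r / threshold for r in readings]
    return any(
        sum(f >= frac for f in fractions) >= count
        for count, frac in rules
    )


# Hypothetical particle counts with a per-sensor threshold of 100.
print(network_alarm([95, 92, 10], 100))   # two sensors above 90%
print(network_alarm([80, 78, 76], 100))   # three sensors above 75%
print(network_alarm([95, 70, 60], 100))   # no rule satisfied
```

A rule table like this ignores where the sensors are; the clustering approach mentioned above would additionally ask whether the near-threshold sensors are close to one another.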


Approaching the PIP: Using Decision Rules

• When sensors are to be used jointly, the rules for “tuning” each sensor should be optimized to take advantage of the fact that each is part of a network.

• The optimal tuning depends on the decision rule applied to reach an overall decision given the sensor inputs.


Approaching the PIP: Using Decision Rules

• Prior work along these lines in missile detection (Cherikh and Kantor)


Approaching the PIP: Using Decision Rules

• Most work has concentrated on the case of stochastic independence of information available at two sensors – clearly violated in BT sensor location problems.

• Even with stochastic independence, finding “optimal” decision rules is nontrivial.

• Recent promising approaches of Paul Kantor: study fusion of multiple methods for monitoring message streams.


Approaching the PIP: Spatio-Temporal Mining of Sensor Data


Approaching the PIP: Spatio-Temporal Mining of Sensor Data

• Sensors provide observations of the state of the world localized in space and time.

• Finding trends in data from individual sensors: time series data mining.

• PIP: detecting general correlations in multiple time series of observations.

• This has been studied in statistics, database theory, knowledge discovery, data mining.

• Complications: proximity relationships based on geography; complex chronological effects.


Approaching the PIP: Spatio-Temporal Mining of Sensor Data

• Sensor technology is evolving rapidly.

• It makes sense to consider idealized settings where data are collected continuously and communicated instantly.

• Then, modern methods of spatio-temporal data mining due to Muthukrishnan and others are relevant.


Approaching the PIP: Triggering Other Methods of Surveillance

• One type of BT surveillance cannot be considered in isolation.
• Question: How can the pattern of sensor warnings guide other biosurveillance methods?
• Increased syndromic surveillance?
• Change threshold for alarm in syndromic surveillance?
• Increased attention to E.R. visits in a certain region?


Approaching the PIP: Triggering Other Methods of Surveillance

• Decreased threshold for alarm from subway worker absenteeism levels?


Approaching the PIP: Triggering Other Methods of Surveillance

• If there is an initial alarm, each sensor may be read more often.

• How do we pick the sensors to read more frequently?

• This is “adaptive biosensor engagement.”
• Methods of bichromatic combinatorial optimization may be relevant.
• As for the SLP, sensors get one color, sensor messages another.
• Relevance of work of Muthukrishnan.


Port of Entry Inspection Algorithms

In collaboration with Los Alamos National Laboratory


Port of Entry Inspection Algorithms

•Goal: Find ways to intercept illicit nuclear materials and weapons destined for the U.S. via the maritime transportation system
•Aim: Develop decision support algorithms that will help us to “optimally” intercept illicit materials and weapons
•Find inspection schemes that minimize total “cost” including “cost” of false positives and false negatives


Sequential Decision Making Problem

•Stream of entities arrives at a port
•Decision Maker needs to decide which to inspect, which to subject to increasingly stringent inspection based on outcomes of previous inspections
•Our approach: “decision logics” and combinatorial optimization methods
•Builds on approach of Stroud and Saeger and large literature in sequential decision making.


Sequential Decision Making Problem

•Entities arriving to be classified into categories.
•Simple case: 0 = “ok”, 1 = “suspicious”
•Observations are made.
•Inspection scheme: specifies which observations are to be made based on previous observations
•Entities have attributes a0, a1, …, an, each in a number of states
•Sample attributes:
– Does ship’s manifest set off an “alarm”?
– Does container give off neutron or gamma emission above threshold?
– Does a radiograph image come up positive?
– Does an induced fission test come up positive?


Sequential Decision Making Problem

•Simplest Case: Attributes are in state 0 or 1
•Then: Entity is a binary string like 011001
•Then: Classification is a decision function F that assigns each binary string to a category.
•If there are two categories, 0 and 1, F is a boolean function.

F(000) = F(111) = 1, F(abc) = 0 otherwise

This classifies an entity as positive iff it has none of the attributes or all of them.


Sequential Decision Making Problem

•Different problems depending on whether or not F is known. Assume first that F is known.
•Given an entity, test its attributes until we know enough to calculate the value of F.
•An inspection scheme tells us in which order to test the attributes to minimize cost.
•Even this simplified problem is hard computationally.
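The "test until we know enough" step can be made concrete with a small sketch. It uses the example function from the earlier slide, F(000) = F(111) = 1 and 0 otherwise, and a fixed testing order; the helper names are hypothetical, and checking every completion of the untested attributes is exponential in their number, so this is an illustration rather than a practical scheme.

```python
from itertools import product


def F(bits):
    """Example decision function: positive iff no attribute or all attributes."""
    return 1 if bits in ((0, 0, 0), (1, 1, 1)) else 0


def decided_value(known, n, f):
    """Return f's value if every completion of `known` agrees, else None."""
    free = [i for i in range(n) if i not in known]
    values = set()
    for fill in product((0, 1), repeat=len(free)):
        bits = [0] * n
        for i, v in known.items():
            bits[i] = v
        for i, v in zip(free, fill):
            bits[i] = v
        values.add(f(tuple(bits)))
    return values.pop() if len(values) == 1 else None


def inspect(entity, order, f):
    """Test attributes in `order`; stop as soon as f is determined.
    Returns (category, number of tests performed)."""
    known = {}
    for i in order:
        value = decided_value(known, len(entity), f)
        if value is not None:
            return value, len(known)
        known[i] = entity[i]  # perform the test for attribute i
    return decided_value(known, len(entity), f), len(known)


print(inspect((0, 1, 1), (0, 1, 2), F))  # decided after two tests
print(inspect((1, 1, 1), (0, 1, 2), F))  # needs all three tests
```

Different testing orders give different numbers of tests for the same entity, which is where the cost-minimization question on the slide comes from.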


Binary Decision Tree Approach

•We assume we have sensors to measure presence or absence of attributes.
•Build a tree:
– Nodes are sensors or categories (0 or 1)
– Label nodes with the attribute the sensor measures or the number of the category
– Category nodes are “leaves” of the tree – nodes with only one neighbor
– Two arcs exit from each sensor node, labeled left and right.
– Take the right arc when sensor says the attribute is present, left arc otherwise


Binary Decision Tree Approach

•We reach category 1 from the root only through the path a0 to a1 to 1.
•Thus, an entity is classified in category 1 iff it has both attributes.
•The binary decision tree corresponds to the boolean function F(11) = 1, F(10) = F(01) = F(00) = 0.

Figure 1


Binary Decision Tree Approach

•We reach category 1 from the root by:
a0 L to a1 R to a2 R to 1, or
a0 R to a2 R to 1
•An entity is classified in category 1 iff it has a1 and a2 and not a0, or a0 and a2 and possibly a1.
•Corresponding boolean function: F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

Figure 2


Binary Decision Tree Approach

•This binary decision tree corresponds to the same boolean function

F(111) = F(101) = F(011) = 1, F(abc) = 0 otherwise.

However, it has one less observation node. So, it is more efficient if all observations are equally costly and equally likely.

Figure 3
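Since the figures are not reproduced here, the comparison can be checked with a short sketch. The four-sensor tree below is built from the paths described for Figure 2, with all unmentioned branches assumed to lead to category 0. The three-sensor tree is a guess at a tree of the kind Figure 3 shows (test a2 first, then a0, then a1); it is an assumption, not a transcription of the figure.

```python
from itertools import product

# A tree is either a category (0 or 1) or (attribute_index, left, right).
# Right branch = attribute present, left branch = attribute absent.
FOUR_SENSOR_TREE = (0, (1, 0, (2, 0, 1)), (2, 0, 1))   # paths described for Figure 2
THREE_SENSOR_TREE = (2, 0, (0, (1, 0, 1), 1))          # assumed shape, one sensor node fewer


def classify(tree, entity):
    """Walk from the root to a leaf, following the entity's attribute values."""
    while not isinstance(tree, int):
        attr, left, right = tree
        tree = right if entity[attr] else left
    return tree


def sensor_nodes(tree):
    """Count the observation (non-leaf) nodes."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + sensor_nodes(left) + sensor_nodes(right)


def truth_table(tree, n=3):
    return {bits: classify(tree, bits) for bits in product((0, 1), repeat=n)}


positives = {b for b, v in truth_table(FOUR_SENSOR_TREE).items() if v == 1}
print(sorted(positives))
print(truth_table(FOUR_SENSOR_TREE) == truth_table(THREE_SENSOR_TREE))
print(sensor_nodes(FOUR_SENSOR_TREE), sensor_nodes(THREE_SENSOR_TREE))
```

Enumerating all 2^n inputs is how the equivalence is verified here; the same brute-force idea is what becomes impractical when searching over all trees.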


Binary Decision Tree Approach

•Even if the boolean function F is fixed, the problem of finding the “optimal” binary decision tree for it is NP-complete.
•For small n, can try to solve it by brute force enumeration.
•But even for n = 4, not practical. (n = 4 at Port of Long Beach-Los Angeles)
•Seeking heuristic algorithms, approximations to optimal.
•Making special assumptions about the boolean function F.
•Example: For so-called “monotone” boolean functions, integer programming formulations give promising heuristics.


Cost Functions

•Above analysis: Only uses number of sensors
•Using a sensor has a cost:
– Unit cost of inspecting one item with it
– Fixed cost of purchasing and deploying it
– Delay cost from queuing up at the sensor station

•How many nodes of the decision tree are actually visited during average inspection? Depends on “distribution” of entities.
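One way to make the dependence on the "distribution" concrete is to compute the expected unit cost of an inspection. The sketch below assumes each attribute is present independently with a known probability (a strong simplification) and counts only unit inspection costs, ignoring fixed and delay costs. The two example trees both compute the function with F(111) = F(101) = F(011) = 1; their shapes and all numbers are illustrative assumptions.

```python
def expected_cost(tree, p_present, unit_cost):
    """Expected unit cost of classifying one entity.

    tree: a category (0 or 1) or (attribute_index, left, right),
          where right is taken when the attribute is present.
    p_present[i]: assumed probability that attribute i is present.
    unit_cost[i]: assumed cost of using the sensor for attribute i once.
    """
    if isinstance(tree, int):
        return 0.0
    attr, left, right = tree
    p = p_present[attr]
    return (
        unit_cost[attr]
        + (1 - p) * expected_cost(left, p_present, unit_cost)
        + p * expected_cost(right, p_present, unit_cost)
    )


# Two hypothetical trees for the same boolean function.
FOUR_SENSOR_TREE = (0, (1, 0, (2, 0, 1)), (2, 0, 1))
THREE_SENSOR_TREE = (2, 0, (0, (1, 0, 1), 1))

equal_p = [0.5, 0.5, 0.5]
equal_cost = [1.0, 1.0, 1.0]
cost_four = expected_cost(FOUR_SENSOR_TREE, equal_p, equal_cost)
cost_three = expected_cost(THREE_SENSOR_TREE, equal_p, equal_cost)
print(cost_four, cost_three)
```

With unequal probabilities or unequal sensor costs the ranking of two trees can change, so node count alone is not a reliable cost measure.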


Cost Functions

•Cost of false positive: Cost of additional tests.
•If it means opening the container, it’s very expensive.

•Cost of false negative: Complex issue.


Complications

•Sensor errors – probabilistic approach

•More than two values of an attribute (present, absent, present with 75% probability, …)

•Partially defined boolean functions (inferring the boolean function from observations)

•In this case, machine learning approaches are promising:

– Bayesian binary regression
– Splitting strategies
– Pruning learned decision trees


Monitoring Message Streams: Algorithmic Methods for Automatic Processing of Messages


Motivation: monitoring email traffic, news, communiques, faxes, voice intercepts (with speech recognition)

OBJECTIVE:

Monitor huge communication streams, in particular, streams of textualized communication to automatically detect pattern changes and "significant" events


TECHNICAL APPROACHES:

• Given stream of text in any language.

• Decide whether "new events" are present in the flow of messages.

• Event: new topic or topic with unusual level of activity.

• Initial Problem: Retrospective or “Supervised” Event Identification: Classification into pre-existing classes. Given example messages on events/topics of interest, algorithm detects instances in the stream.


TECHNICAL APPROACHES: SUPERVISED FILTERING

• Batch filtering: Given examples of relevant documents up front.

• Adaptive filtering: Examples accumulated; need to decide whether to bother the analyst for guidance; “pay” for information about relevance as process moves along.


MORE COMPLEX PROBLEM: PROSPECTIVE DETECTION OR “UNSUPERVISED” FILTERING

• Classes change - new classes or change meaning
• A difficult problem in statistics
• Recent new C.S. approaches

“Semi-supervised Learning”:
• Algorithm suggests a possible new event/topic
• Human analyst labels it; determines its significance


COMPONENTS OF AUTOMATIC MESSAGE PROCESSING

(1). Compression of Text – increase speed, reduce memory/disk use

(2). Representation of Text – convert text to form amenable to computation and statistical analysis;

(3). Matching Scheme – compute similarity between texts;

(4). Learning Method – create profiles of events/topics from known examples.

(5). Fusion Scheme -- combine multiple filtering techniques to increase accuracy.


COMPONENTS OF AUTOMATIC MESSAGE PROCESSING - II

•These distinctions are somewhat arbitrary.

•Many approaches to message processing overlap several of these components of automatic message processing; our techniques usually address more than one component.

Project Premise: Existing methods don’t exploit the full power of the 5 components, synergies among them, and/or an understanding of how to apply them to text data.


COMPONENTS OF AUTOMATIC MESSAGE PROCESSING - III

•Our approach is to develop/explore methods for each component and then to combine them.

•In the first phase of the project, we did over 5000 complete experiments with different combinations of methods.


Nearest Neighbor (kNN) Classifiers

• Route message by
– Finding k most similar training messages (neighbors)
– Assigning to classes that are most common among neighbors (using weighting by distance)
• kNN classifiers studied since 1958, for text since early 90’s
– Moderately effective for text; has been considered inefficient; finding neighbors is slow
• But, finding neighbors only needs to be done once
– No matter how many classes (even if huge)
– So: for large number of topics, maybe more efficient than one-classifier-per-topic approaches


Speeding up kNN

• Can finding neighbors be made fast enough to make kNN practical?
• Worked on fast implementation
• Store text and classes sparsely (Representation)
– Store class labels sparsely
– Arrange computations to do work proportional only to number of class labels in neighbors, not total number of classes
• Search engine heuristics use the in-memory inverted file (Matching)
– Use inverted file (group by word, not by document)
– Retain only high impact terms within each document, or within each inverted list
– Compute similarities using only inverted lists for the few words occurring in test document
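The inverted-file idea can be sketched in plain Python. This is a simplified illustration, not the project's implementation: training documents are length-normalized term-count vectors, the test message is treated as a set of words, no pruning of low-impact terms is done, and the tiny training set is invented.

```python
import math
from collections import defaultdict


def build_inverted_index(train_docs):
    """Map each word to a list of (doc_id, normalized weight): grouped by word,
    not by document."""
    index = defaultdict(list)
    for doc_id, (words, _labels) in enumerate(train_docs):
        counts = defaultdict(int)
        for w in words:
            counts[w] += 1
        norm = math.sqrt(sum(c * c for c in counts.values()))
        for w, c in counts.items():
            index[w].append((doc_id, c / norm))
    return index


def knn_votes(index, train_docs, test_words, k):
    """Score training docs using only the inverted lists of the test words,
    then let the k nearest neighbors vote, weighted by similarity."""
    scores = defaultdict(float)
    for w in set(test_words):
        for doc_id, weight in index.get(w, ()):
            scores[doc_id] += weight
    neighbors = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
    votes = defaultdict(float)
    for doc_id, sim in neighbors:
        # Work is proportional to the labels on the neighbors only.
        for label in train_docs[doc_id][1]:
            votes[label] += sim
    return dict(votes)


# Hypothetical training messages: (words, class labels).
TRAIN = [
    (["port", "container", "inspection"], ["cargo"]),
    (["sensor", "alarm", "threshold"], ["sensors"]),
    (["sensor", "network", "alarm", "alarm"], ["sensors", "networks"]),
]
index = build_inverted_index(TRAIN)
votes = knn_votes(index, TRAIN, ["alarm", "sensor", "placement"], k=2)
print(max(votes, key=votes.get), votes)
```

Training documents that share no word with the test message are never touched, which is the source of the speedup; dropping low-weight entries from each list would be the further "high impact terms" step.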


kNN: Results

• Large reduction in size of inverted index and large gain in speed of classification
• Slight additional cost in effectiveness
• Effectiveness slightly below our best methods (Bayesian probit and logistic classifiers)
• Compressed index 90% smaller than original index w/ only 7-12% loss in effectiveness (macro-F1)
• Approximate matching is 10 to 100 times faster w/ only 2-10% loss in effectiveness (macro-F1)
• Ours are first large scale experiments on search engine heuristic for neighbor lookup in kNN
• Partnership between theoreticians and practitioners.


Bayesian Methods

•Bayesian statistical methods place “prior” probability distributions on all unknowns, and then compute “posterior” distribution for the unknowns conditional on the knowns.

Thomas Bayes


Bayesian Methods

•Zhang and Oles (2001): developed an efficient optimization algorithm for logistic regression (10,000 dimensions) and achieved excellent predictive performance.

•The Bayesian approach explicitly incorporates prior knowledge about model complexity (“regularization”)

•We extended the Bayesian approach to incorporate a prior requirement for sparsity.

•Logistic regression has one parameter per dimension; our sparse model sets many of these to zero; handles hundreds of thousands of parameters efficiently.

•Resulting sparse models produce outstanding accuracy and ultra-fast predictions with no ad-hoc feature selection
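The slides do not say which sparsity prior was used. One common choice is a Laplace (double-exponential) prior on each coefficient, whose posterior mode is L1-penalized logistic regression. The sketch below assumes that choice and fits it by proximal gradient descent on an invented six-point data set with no intercept term; it is a toy illustration, not the released software.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; values within t of zero become exactly zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def fit_sparse_logistic(X, y, lam, step=0.1, iters=2000):
    """Posterior mode under an assumed Laplace prior (L1 penalty `lam`),
    computed by proximal gradient descent on the mean logistic loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the loss, then shrinkage from the prior.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: feature 0 determines the class, feature 1 is only weakly related.
X = [[1, 1], [1, 1], [1, 0], [-1, 1], [-1, 0], [-1, 0]]
y = [1, 1, 1, 0, 0, 0]
w = fit_sparse_logistic(X, y, lam=0.1)
print(w)
```

The weakly related feature ends with a coefficient of exactly zero, so it can be dropped at prediction time without a separate feature-selection pass.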


Bayesian Methods: Sample Results

•We have implemented several efficient variants, e.g., probit, informative priors.
•Publicly released software; over 1000 downloads
•Compared to Zhang & Oles, our implementation:
–Eliminates ad hoc feature selection
–Often uses less than 1% of the features at prediction time
–Is publicly available
•Accuracy: as good as the best results ever published.
•In sum, we have a sparseness-inducing Bayesian approach that produces dramatically simpler models with no loss in accuracy


Streaming Data Analysis

•Motivated by need to make decisions about data during an initial scan as data “stream by”

•Recent development of theoretical CS algorithms

•Algorithms motivated by intrusion detection, transaction applications, time series transactions


Streaming Text Data: “Historic” Data Analysis

• The accumulation of text messages is massive over time

• A lot of streaming research is focused on on-going or current analyses

• It is a great challenge to use only summarized historic data and see if a currently emerging phenomenon had precursors occurring in the past

• We are working on a novel architecture for historic and posterior analyses via small summaries - “sketches”


Streaming Analysis Tool: CM Sketch

• Theoretical: We have developed the CM Sketch that uses (1/ε) log(1/δ) space to approximate data distribution with error at most ε, and probability of success at least 1-δ.

– All other previously known sample or sketch methods use space at least (1/ε²).

– CM Sketch is an order of magnitude better.

• Practical: A few tens of KB give an accurate summary of large data. Create summaries of data that allow historic queries to find:

– Heavy Hitters (Most Frequent Items)

– Quantiles of a Distribution (Median, Percentiles etc.)

– Finding items with large changes
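A minimal sketch of a Count-Min style sketch, to make the space bound above concrete. It assumes the usual sizing of about e/ε counters per row and ln(1/δ) rows, and it uses salted BLAKE2 hashes as a stand-in for the pairwise-independent hash functions the analysis calls for. The class and method names are illustrative, not the project's code.

```python
import hashlib
import math

class CountMinSketch:
    """Approximate frequency counts in (1/epsilon) * log(1/delta) counters."""

    def __init__(self, epsilon, delta):
        self.width = math.ceil(math.e / epsilon)       # counters per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of rows (hash functions)
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _columns(self, item):
        # One salted hash per row; the salt is an assumption standing in for
        # independently chosen hash functions.
        data = repr(item).encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(data, digest_size=8,
                                     salt=row.to_bytes(8, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        self.total += count
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # never underestimates the true count.
        return min(self.table[row][col] for row, col in self._columns(item))

sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for word in ["alert"] * 50 + ["routine"] * 20 + ["rare"] * 5:
    sketch.add(word)
print(sketch.width, sketch.depth, sketch.estimate("alert"))
```

A heavy-hitter query can then be answered by checking which candidate items have an estimate above a chosen fraction of `sketch.total`.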

Page 81: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

81

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 82: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

82

Large-scale Automated Author Identification

Page 83: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

83

Statistical Analysis of Text

•Statistical text analysis has a long history in literary analysis and in solving disputed authorship problems

•First (?) is Thomas C. Mendenhall in 1887

Page 84: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

84

•Hamilton versus Madison: the Federalist Papers

•Mosteller and Wallace (1963) used Naïve Bayes with a Poisson and Negative Binomial model

•Good predictive performance
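A rough sketch of the Naïve Bayes idea with a Poisson word-count model: each author gets a rate for each marker word, and a disputed text is assigned to the author whose rates make its counts most likely. The marker words, smoothing constant, authors "A" and "B", and the toy sentences are all invented for illustration; this is not a reconstruction of the Mosteller and Wallace analysis, and the Negative Binomial variant is omitted.

```python
import math
from collections import Counter

def word_rates(texts, vocabulary, smoothing=0.5):
    """Estimate per-token Poisson rates for each marker word from an author's texts."""
    counts = Counter()
    total_tokens = 0
    for text in texts:
        tokens = text.lower().split()
        total_tokens += len(tokens)
        counts.update(t for t in tokens if t in vocabulary)
    # Smoothing (an assumption) keeps every rate strictly positive.
    return {w: (counts[w] + smoothing) / (total_tokens + smoothing) for w in vocabulary}

def poisson_log_likelihood(text, rates):
    """Sum of independent Poisson log-probabilities of the marker-word counts."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = 0.0
    for word, rate in rates.items():
        mean = rate * len(tokens)
        k = counts[word]
        total += k * math.log(mean) - mean - math.lgamma(k + 1)
    return total

def attribute(text, rates_by_author):
    """Return the author whose word rates best explain the text."""
    return max(rates_by_author,
               key=lambda author: poisson_log_likelihood(text, rates_by_author[author]))

vocabulary = {"upon", "whilst"}
rates_by_author = {
    "A": word_rates(["upon the whole it depends upon the states",
                     "the power rests upon the people"], vocabulary),
    "B": word_rates(["whilst the states remain the union holds",
                     "whilst it is so the law stands"], vocabulary),
}
print(attribute("he relied upon the law and upon the people", rates_by_author))
```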

Page 85: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

85

Some Background

• Identification technologies important for homeland security and in the legal system

• Author attribution for textual artifacts using “topic independent” stylometric features has a long history

• Historical focus on small numbers of authors and low-dimensional representations via function words

Page 86: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

86

Author ID Project Objectives

• Application of state-of-the-art statistical and computing technologies to authorship attribution

• Work with very high-dimensional document representations

• Focus on providing working solutions to particular problems

Page 87: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

87

Author ID Project Focus

Goal: Identification of Authors From Large Collection of Objects

•traditional disputed authorship (choose among k known authors)

•clustering of “putative” authors (e.g., internet handles: termin8r, heyr, KaMaKaZie)

•document pair analysis: Were two documents written by the same author?

•odd-man-out: Were these documents written by one of this set of authors or by someone else?

Page 88: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

88

Representation

•Long tradition in stylometry that seeks a small number of textual characteristics that distinguish the texts of authors from one another (Burrows, Holmes, Binongo, Hoover, Mosteller & Wallace, McMenamin, Tweedie, etc.)

•Typically use “function words” (a, with, as, were, all, would, etc.) followed by PCA & cluster analysis

•Function words aim to be “topic-independent”

•Hoover (2003) shows that using all high-frequency words does a better job than function words alone

Page 89: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

89

Idiosyncratic Usage

•Idiosyncratic usage less formalized in the literature (misspellings, repeated neologisms, etc.) but apparently useful. For example, Foster’s unmasking of Klein as the author of “Primary Colors”:

“Klein and Anonymous loved unusual adjectives ending in -y and –inous: cartoony, chunky, crackly, dorky, snarly,…, slimetudinous, vertiginous, …”

“Both Klein and Anonymous added letters to their interjections: ahh, aww, naww.”

“Both Klein and Anonymous loved to coin words beginning in hyper-, mega-, post-, quasi-, and semi-, more than all others put together”

“Klein and Anonymous use “riffle” to mean rifle or rustle, a usage for which the OED provides no instance in the past thousand years”

Page 90: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

90

Odd-Man Out

Were these documents written by one of this set of authors or by someone else?

•Training data contains documents by given set of authors

•Test data contains documents by some set of authors including some not in original set

•Bayesian hierarchical model incorporates prior knowledge that model parameters for different authors differ from each other

•Initial success on small-scale simulated examples

•Generalizations for more than one new author

Page 91: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

91

Some Results

• Created largest-ever (?) feature set including function words, suffixes, POS tags, lengths, spelling errors, common English errors, grammatical errors, phrases, idiosyncratic usage, ngrams, etc.

• Extensive experiments for 1-of-K and “odd-man-out”

• New 1.2 million message Listserv corpus, 82,000 authors

Page 92: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

92

Some Results - II

• Developed general purpose feature extraction software for author attribution

• Bayesian Multinomial Regression Software extends our highly scalable, sparse, BBR software (MMS Project) to the multi-class case

Page 93: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

93

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 94: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

94

“Special Focus” on Computational and Mathematical Epidemiology

smallpox

Page 95: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

95

Components of a Special Focus

•Working Groups

•Tutorials

•Workshops

•Visitor Programs

•Graduate Student Programs

•Postdoc Programs

•Dissemination

Page 96: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

96

A Sampling of Working Groups

WG’s on Large Data Sets:

•Adverse Event/Disease Reporting, Surveillance & Analysis

•Data Mining and Epidemiology

WG’s on Analogies between Computers and Humans:

•Analogies between Computer Viruses/Immune Systems and Human Viruses/Immune Systems

•Distributed Computing, Social Networks, and Disease Spread Processes

Page 97: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

97

WG’s on Methods/Tools of Theoretical CS

•Phylogenetic Trees and Rapidly Evolving Diseases

•Order-Theoretic Aspects of Epidemiology

WG’s on Computational Methods for Analyzing Large Models for Spread/Control of Disease

•Spatio-temporal and Network Modeling of Diseases

•Methodologies for Comparing Vaccination Strategies

Page 98: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

98

WG’s on Mathematical Sciences Methodologies

•Mathematical Models and Defense Against Bioterrorism

•Predictive Methodologies for Infectious Diseases

•Statistical, Mathematical, and Modeling Issues in the Analysis of Marine Diseases

Page 99: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

99

A Sampling of Workshops

Workshops on Modeling of Infectious Diseases

•The Pathogenesis of Infectious Diseases

•Models/Methodological Problems of Botanical Epidemiology

Workshops on Modeling of Non-Infectious Diseases

•Disease Clusters

Page 100: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

100

Workshops on Evolution and Epidemiology

•Genetics and Evolution of Pathogens

•The Epidemiology and Evolution of Influenza

•The Evolution and Control of Drug Resistance

•Models of Co-Evolution of Hosts and Pathogens

Page 101: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

101

Workshops on Methodological Issues

•Capture-recapture Models in Epidemiology

•Spatial Epidemiology and Geographic Information Systems

• Ecologic Inference

•Combinatorial Group Testing

Page 102: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

102

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 103: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

103

The DIMACS Working Group on Adverse Event/Disease Reporting, Surveillance, and Analysis

Page 104: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

104

Working Group on Adverse Event/Disease Reporting, Surveillance, and Analysis

•Health surveillance a core activity in public health

•Concerns about bioterrorism have attracted attention to new surveillance methods:

–OTC drug sales

–Subway worker absenteeism

–Ambulance dispatches

•Spawns need for novel statistical methods for surveillance of multiple data streams.

•WG coordinated closely with National Syndromic Surveillance Conferences

Page 105: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

105

New Data Types for Public Health Surveillance

• Managed care patient encounter data

• Pre-diagnostic/chief complaint (text data)

• Over-the-counter sales transactions

– Drug store

– Grocery store

• 911-emergency calls

• Ambulance dispatch data

• Absenteeism data

• ED discharge summaries

• Prescription/pharmaceuticals

• Adverse event reports

Page 106: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

106

Farzad Mostashari

Page 107: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

107

New Analytic Methods and Approaches

• Spatial-temporal scan statistics

• Statistical process control (SPC)

• Bayesian applications

• Market-basket association analysis

• Text mining

• Rule-based surveillance

• Change-point techniques
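As one concrete instance of the statistical process control and change-point ideas listed above, here is a minimal one-sided CUSUM sketch over a daily count stream. CUSUM is chosen here as an assumption; the slides do not say which charts the working group used. The baseline, slack, and threshold values and the sample counts are invented.

```python
def cusum_alarms(counts, baseline, slack, threshold):
    """Return the indices at which a one-sided CUSUM statistic crosses the threshold.

    counts: observed values per period (e.g., daily ambulance dispatches).
    baseline: expected value when nothing unusual is happening.
    slack: allowance subtracted each period so small fluctuations do not accumulate.
    threshold: alarm level for the accumulated excess.
    """
    alarms = []
    excess = 0.0
    for index, count in enumerate(counts):
        # Accumulate evidence of an upward shift; never let it go below zero.
        excess = max(0.0, excess + (count - baseline - slack))
        if excess > threshold:
            alarms.append(index)
            excess = 0.0  # restart monitoring after signalling
    return alarms

daily_counts = [10, 11, 9, 10, 12, 15, 16, 17, 10]  # made-up data
print(cusum_alarms(daily_counts, baseline=10, slack=1, threshold=8))  # [6]
```

With several data streams, one such statistic per stream is the simplest starting point; combining them is where the novel statistical methods mentioned earlier come in.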

Page 108: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

108

SubGroup on Privacy & Confidentiality of Health Data

•Privacy concerns are a major stumbling block to public health surveillance, in particular bioterrorism surveillance.

•Challenge: produce anonymous data specific enough for research.

•Exploring ways to remove identifiers (Social Security number, telephone number, ZIP code) from data sets.

•Exploring ways to aggregate or remove information from data sets.

•Partnerships with cryptographers

•Exploring methods of combinatorial optimization
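A small sketch of the two operations mentioned above: removing direct identifiers and aggregating a remaining field, followed by a check of how small the resulting groups are. The field names, the choice to coarsen ZIP codes to three digits, and the sample records are all hypothetical and are not the subgroup's method.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"ssn", "phone"}  # hypothetical field names

def deidentify(record, zip_digits=3):
    """Drop direct identifiers and coarsen the ZIP code to its leading digits."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in cleaned:
        zip_code = cleaned["zip"]
        cleaned["zip"] = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return cleaned

def smallest_group(records, quasi_identifiers):
    """Size of the smallest set of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Fabricated example records, for illustration only.
records = [
    {"ssn": "000-00-0001", "phone": "555-0101", "zip": "08854", "age_band": "30-39", "complaint": "fever"},
    {"ssn": "000-00-0002", "phone": "555-0102", "zip": "08855", "age_band": "30-39", "complaint": "rash"},
]
released = [deidentify(r) for r in records]
print(released[0])
print(smallest_group(released, ["zip", "age_band"]))
```

A larger smallest group means each released record is harder to single out, at the cost of less specific data; choosing what to aggregate so that the data stay useful is the optimization problem.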

Page 109: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

109

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 110: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

110

Bioterrorism Working Group

anthrax

Page 111: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

111

Bioterrorism Working Group

•Biosurveillance

•Evolution

•Modeling Bioterror Response Logistics

•Computer Science Challenges

•Agroterrorism

Page 112: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

112

Modeling Bioterror Response Logistics

Exploring Discrete Optimization/Queueing

•size of stockpiles of vaccines

•allocation of medications

•analysis of bottlenecks in treatment facilities

•transportation schedules

[Photo: smallpox vaccination queue, New York City, 1947]
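To illustrate the kind of queueing calculation that bottleneck analysis involves, the sketch below evaluates the standard Erlang C formula for a multi-server queue with Poisson arrivals and exponential service times. Treating a treatment facility as such a queue is an assumption made for illustration, and the rates used are invented.

```python
import math

def erlang_c_wait(arrival_rate, service_rate, servers):
    """Return (probability an arrival must wait, mean wait in queue) for an M/M/c queue.

    arrival_rate: people arriving per hour.
    service_rate: people one station can serve per hour.
    servers: number of parallel stations.
    """
    offered = arrival_rate / service_rate  # offered load in units of busy servers
    if offered >= servers:
        raise ValueError("queue is unstable: add servers or reduce arrivals")
    top = offered ** servers / math.factorial(servers) * servers / (servers - offered)
    bottom = sum(offered ** k / math.factorial(k) for k in range(servers)) + top
    p_wait = top / bottom
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return p_wait, mean_wait

# Invented numbers: 100 arrivals per hour, each station serves 12 per hour.
for stations in (9, 10, 12):
    p_wait, mean_wait = erlang_c_wait(100.0, 12.0, stations)
    print(stations, round(p_wait, 3), round(mean_wait * 60, 2), "minutes")
```

Comparing the output across station counts shows how sharply waiting time grows as a facility approaches full utilization.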

Page 113: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

113

Agroterrorism

•Subgroup just starting

•Interest in plant diseases

•Partnership with the National Plant Diagnostic Network

•Emphasis on Data Mining and Epidemiology

Page 114: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

114

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 115: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

115

Working Group on Modeling Social Responses to Bioterrorism

•Models of the spread of infectious disease commonly assume passive bystanders and rational actors who will comply with health authorities.

•It is not clear how well this assumption applies to situations like a bioterrorist attack using smallpox or plague.

Page 116: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

116

Working Group on Modeling Social Responses to Bioterrorism

An interdisciplinary group is discussing how to incorporate social behavior into models and how to build models of public health decision-making and risk communication.

Some Issues

•Movement

•Compliance

•Rumor

•Subcultural differences

•Indirect economic effects

•Social stigmata

•Panic

How do you measure the indirect cost of an attack?

Page 117: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

117

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 118: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

118

Predicting Disease Outbreaks from Remote Sensing and Media Data

Outbreaks of disease in other parts of the world have the capacity to affect the security of the US

Joint project with Imaging Science and Information Systems Center at Georgetown University Medical School (ISIS Center)

Page 119: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

119

Predicting Disease Outbreaks from Remote Sensing and Media Data

•Recent work has shown that it’s possible to predict disease outbreaks in distant parts of the world using remotely sensed satellite data.

•SARS and heightened avian flu in the Pacific Rim appeared following temperature anomalies in China.

•Could we have anticipated this given enviro-climatic information?

Page 120: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

120

Predicting Disease Outbreaks from Remote Sensing and Media Data

•Rift Valley Fever epidemic in 1997/8 in East Africa occurred following heavy flooding related to El Niño

•Flooding in Venezuela in 1995 resulted in a multi-pathogen outbreak.

Page 121: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

121

Predicting Disease Outbreaks from Remote Sensing and Media Data

•Indications and warnings can alert US responders to bioevents in faraway places.

•Disease that can result in social disruptions can be detected in open source media reports even if there is no official reporting of this.

Page 122: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

122

Predicting Disease Outbreaks from Remote Sensing and Media Data

•A model developed at the ISIS Center at Georgetown predicts social disruptions due to disease based on keyword “hit counts” from text-based sources (media reports).

•DIMACS Project goal: Use media model to develop ways to predict social disruptions from disease from remote sensing enviro-climatic data.

•We will be using remote sensing data indicating increased Normalized Difference Vegetation Index (NDVI).
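For reference, NDVI is computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), giving values between -1 and 1, with dense green vegetation toward the high end. The sketch below computes it and flags cells whose NDVI has risen above a baseline; the increase threshold and the sample reflectance values are invented, and the flagging rule is only an assumption about what "increased NDVI" might mean in practice.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # no signal in either band; treat as neutral
    return (nir - red) / total

def increased_cells(current, baseline, min_increase=0.1):
    """Indices of cells whose NDVI exceeds the baseline by at least min_increase."""
    return [i for i, (now, usual) in enumerate(zip(current, baseline))
            if now - usual >= min_increase]

# Made-up reflectance pairs (NIR, red) for three grid cells.
cells = [(0.50, 0.10), (0.30, 0.25), (0.45, 0.15)]
current = [ndvi(nir, red) for nir, red in cells]
baseline = [0.40, 0.10, 0.50]
print([round(v, 3) for v in current])
print(increased_cells(current, baseline))
```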

Page 123: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

123

Predicting Disease Outbreaks from Remote Sensing and Media Data

•Project Premise: We can use enviro-climatic indices such as NDVI coupled with disease-related social disruption predictors from media data delayed by several months to validate the enviro-climatic indicators as predictors.

•Approach: Machine Learning

•Project waiting to get started

Page 124: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

124

Predicting Disease Outbreaks from Remote Sensing and Media Data

•The approach is similar to ones used by members of the DIMACS team to estimate probability of a match between remotely sensed signals and a signature that has been observed before. This work has been applied to face recognition and explosive detection.

Page 125: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

125

Outline

•Bioterrorism Sensor Location

•Port of Entry Inspection Algorithms

•Monitoring Message Streams

•Author Identification

•Computational and Mathematical Epidemiology

•Adverse Event/Disease Reporting/Surveillance/Analysis

•Bioterrorism Working Group

•Modeling Social Responses to Bioterrorism

•Predicting Disease Outbreaks from Remote Sensing and Media Data

•Communication Security and Information Privacy

Page 126: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

126

“Special Focus” on Communication Security and Information Privacy

Page 127: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

127

“Special Focus” on Communication Security and Information Privacy

Working Groups

•Privacy-Preserving Data Mining

•Usable Privacy and Security Software

•Data De-Identification, Combinatorial Optimization, Graph Theory, and the Stat-OR Interface

•Intrusion Detection and Network Security Management Systems

Page 128: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

128

“Special Focus” on Communication Security and Information Privacy

A Selection of Workshops

•Software Security

•Applied Cryptography and Network Security

•Large-scale Internet Attacks

•Mobile and Wireless Security

•Security of Web Services and E-Commerce

•Database Security: Query Authorization and Information Inference

Page 129: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

129

Working Group on Analogies between Computer Viruses and Biological Viruses

•Can ideas for defending against biological viruses lead to ideas for defending against computer viruses?

•Concern about large gap between initial time of attack and implementation of defensive strategies

•“Public health” approach: Once a virus has infected a machine, it tries to connect it to as many computers as possible, as fast as possible. A “throttle” limits rate at which a computer can connect to new computers.
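A minimal sketch of the throttle idea: connections to recently contacted hosts go through immediately, while connections to new hosts are queued and released at a fixed rate. The class name, the size of the recent-host list, and the release rate are assumptions for illustration, not a description of any deployed system.

```python
from collections import deque

class ConnectionThrottle:
    """Limit how quickly a machine may connect to hosts it has not contacted recently."""

    def __init__(self, rate_per_tick=1, recent_size=4):
        self.rate_per_tick = rate_per_tick       # new hosts allowed per time step
        self.recent = deque(maxlen=recent_size)  # short memory of allowed hosts
        self.pending = deque()                   # new hosts waiting for release

    def request(self, host):
        """Return True if the connection may proceed now, False if it is delayed."""
        if host in self.recent:
            return True
        if host not in self.pending:
            self.pending.append(host)
        return False

    def tick(self):
        """Advance one time step and release up to rate_per_tick queued hosts."""
        released = []
        for _ in range(min(self.rate_per_tick, len(self.pending))):
            host = self.pending.popleft()
            self.recent.append(host)
            released.append(host)
        return released

throttle = ConnectionThrottle()
# An infected machine tries many new hosts at once; all are queued.
print([throttle.request(f"host{i}") for i in range(5)])
print(throttle.tick())             # only one new host is released per tick
print(throttle.request("host0"))   # a recently released host now passes
```

Normal traffic, which mostly revisits a small set of hosts, is barely affected, while a fast-spreading virus builds up a long pending queue that also serves as a warning sign.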

[Figure: number of infections over time, with phases labeled pre-attack, initial occurrence, and clean up]

Page 130: 1 PROGRAMS IN HOMELAND SECURITY AT DIMACS Fred S. Roberts DIMACS Director.

130

