9/15/2008 CTBTO Data Mining/Data Fusion Workshop

Page 1: 9/15/2008 CTBTO Data Mining/Data Fusion Workshop

Spatiotemporal Stream Mining Applied to Seismic+ Data

Margaret H. Dunham
CSE Department
Southern Methodist University
Dallas, Texas 75275 USA
[email protected]

Page 2

Outline

• CTBTO Data
• CTBTO Modeling Requirements
• EMM

Page 3

CTBTO Data

As a Data Miner I must first understand your DATA

• Diverse – Seismic, Hydroacoustic, Infrasound, Radionuclide
• Spatial (source and sensor)
• Temporal
• STREAM Data

Page 4

From Sensors to Streams

Stream Data - Data captured and sent by a set of sensors

Real-time sequence of encoded signals which contain desired information.

Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items

Stream data is infinite - the data keeps coming.

11/26/07 – IRADSN’07

Page 5

CTBTO & Data Mining

Data Mining techniques must be defined based on your data and applications

Can’t use predefined fixed models and prediction/classification techniques.

Must not redo massive amounts of algorithms already created.

Page 6

CTBTO + DM Requirements

• Model:
  • Handle different data types (seismic, hydroacoustic, etc.)
  • Spatial + Temporal (Spatiotemporal)
  • Hierarchical
  • Scalable
  • Online
  • Dynamic
• Anomaly Detection:
  • Not just specific wave type or data values
  • Relationships between arrival of waves/data
  • Combined values of data from all sensors

Page 7

EMM (Extensible Markov Model)

• Time Varying Discrete First Order Markov Model
• Nodes are clusters of real world states.
• Overlap of learning and validation phases
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid or medoid of cluster)
  • Nodes are added and removed as data arrives
• Applications: prediction, anomaly detection
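The first-order Markov part of the model can be illustrated with a small transition table. This is a minimal sketch under assumptions, not the EMM implementation from the talk: the class name `TransitionTable`, its methods, and the sample state sequence are all hypothetical, and node removal is not shown. The sketch estimates each transition probability as the count of i-to-j transitions divided by the number of times node i was left.

```python
class TransitionTable:
    """Counts node-to-node transitions of a first-order Markov chain (illustrative only)."""

    def __init__(self):
        # counts[i][j] = number of times node i was immediately followed by node j
        self.counts = {}

    def observe(self, current, nxt):
        """Record one observed transition current -> nxt."""
        row = self.counts.setdefault(current, {})
        row[nxt] = row.get(nxt, 0) + 1

    def probability(self, current, nxt):
        """Estimated P(next = nxt | current); 0.0 if `current` was never left."""
        row = self.counts.get(current, {})
        total = sum(row.values())
        return row.get(nxt, 0) / total if total else 0.0

    def predict(self, current):
        """Most frequently observed successor of `current`, or None if unknown."""
        row = self.counts.get(current)
        return max(row, key=row.get) if row else None


# Hypothetical sequence of node labels, one per time step.
states = ["noise", "noise", "wave_a", "noise", "noise", "wave_b", "noise"]
table = TransitionTable()
for current, nxt in zip(states, states[1:]):
    table.observe(current, nxt)

print(table.probability("noise", "noise"))
print(table.predict("noise"))
```

Because the counts are updated one observation at a time, the same table can be used for prediction while it is still being learned, which is the overlap of learning and validation phases named on the slide.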

Page 8

Research Objectives

• Apply proven spatiotemporal modeling technique to seismic data
• Construct EMM to model sensor data
  • Local EMM at location or area
  • Hierarchical EMM to summarize lower level models
  • Represent all data in one vector of values
  • EMM learns normal behavior
• Develop new similarity metrics to include all sensor data types (Fusion)
• Apply anomaly detection algorithms

Page 9

EMM Creation/Learning

[Figure: EMM creation example; input vectors shown on the slide]

<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>

Page 10

Input Data Representation

Vector of sensor values (numeric) at precise time points or aggregated over time intervals.

Need not come from same sensor types.

Similarity/distance between vectors used to determine creation of new nodes in EMM.
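The role of the distance in creating nodes can be sketched as a simple threshold rule: a vector joins its nearest node if it is close enough, otherwise it becomes a new node. This is an illustration under assumptions, not the procedure from the talk. The function name `learn_nodes`, the Euclidean distance, the running-mean centroid update, and the threshold of 2.0 are all chosen here for the example. The input vectors are the six shown on the EMM Creation/Learning slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def learn_nodes(vectors, threshold, distance=euclidean):
    """Assign each vector to its nearest node, creating a new node when none is within threshold."""
    centroids = []  # node labels: running mean of the vectors absorbed by each node
    sizes = []      # number of vectors absorbed by each node
    path = []       # index of the node visited at each time step
    for v in vectors:
        best, best_d = None, float("inf")
        for i, c in enumerate(centroids):
            d = distance(v, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # No existing node is similar enough: create a new one.
            centroids.append(list(v))
            sizes.append(1)
            best = len(centroids) - 1
        else:
            # Absorb the vector and move the centroid toward it (incremental mean).
            sizes[best] += 1
            n = sizes[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], v)]
        path.append(best)
    return centroids, path


vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
centroids, path = learn_nodes(vectors, threshold=2.0)  # threshold is a hypothetical value
print(path)
```

The returned `path` is the sequence of node indices, which is exactly the input a transition table needs. A smaller threshold yields more nodes and a finer model; the slides do not state which threshold or distance was used for these vectors.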

Page 11

Anomaly Detection with EMM

Objective: Detect rare (unusual, surprising) events

Advantages:
• Dynamically learns what is normal
• Based on this learning, can predict what is not normal
• Do not have to a priori indicate normal behavior

Applications:
• Network Intrusion
• Data: IP traffic data, Automobile traffic data

Seismic:
• Unusual Seismic Events
• Automatically Filter out normal events

[Figure: Minnesota DOT Traffic Data, Weekdays and Weekend panels. Detected unusual weekend traffic pattern]
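One way to read "dynamically learns what is normal" is to score each transition against the counts seen so far and flag the ones that have been rare. The sketch below is a simplified illustration, not the scoring used in the author's EMM work. The function name `flag_rare_transitions`, the parameters `min_probability` and `min_support`, and the traffic labels are hypothetical.

```python
def flag_rare_transitions(path, min_probability=0.1, min_support=3):
    """Return one flag per transition in `path`; True marks a transition that was rare so far.

    A transition is scored only after its source node has been left at least
    `min_support` times, so the first few observations are not all flagged.
    """
    counts = {}  # counts[i][j] = times node i was followed by node j so far
    flags = []
    for current, nxt in zip(path, path[1:]):
        row = counts.setdefault(current, {})
        total = sum(row.values())
        seen = row.get(nxt, 0)
        # Score against past observations only, then learn from this one.
        rare = total >= max(min_support, 1) and seen / total < min_probability
        flags.append(rare)
        row[nxt] = seen + 1
    return flags


# Hypothetical node sequence: a regular low/high pattern followed by an unseen state.
path = ["low", "high"] * 5 + ["low", "jam"]
flags = flag_rare_transitions(path)
print(flags)
```

No normal behavior is specified in advance: the regular alternation becomes "normal" simply because it is observed repeatedly, and the final transition stands out against it.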

Page 12

EMM with Seismic Data

• Input – Wave arrivals (all or one per sensor)
• Identify states and changes of states in seismic data
• Wave form would first have to be converted into a series of vectors representing the activity at various points in time.
• Initial Testing with RDG data
• Use amplitude, period, and wave type

Page 13

New Distance Measure

• Data = <amplitude, period, wave type>
• Different wave type = 100% difference
• For events of same wave type:
  • 50% weight given to the difference in amplitude.
  • 50% weight given to the difference in period.
• If the distance is greater than the threshold, a state change is required.

$$\Delta_{\text{amplitude}} = \frac{|\,\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}\,|}{\text{amplitude}_{\text{average}}}$$

$$\Delta_{\text{period}} = \frac{|\,\text{period}_{\text{new}} - \text{period}_{\text{average}}\,|}{\text{period}_{\text{average}}}$$
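The distance measure on this slide can be written as a short function. This is a sketch under assumptions: the slide gives the two relative differences and their 50% weights, and the code assumes the weighted terms are simply added. The names `seismic_distance` and `state_change_required`, the sample event, and the threshold value are hypothetical, and the node averages are assumed to be nonzero.

```python
def seismic_distance(event, node):
    """Distance between a new event and a node's averages.

    Both arguments are (amplitude, period, wave_type) tuples; `node` holds the
    node's average amplitude and average period, assumed nonzero.
    """
    amp_new, per_new, type_new = event
    amp_avg, per_avg, type_avg = node
    if type_new != type_avg:
        return 1.0  # different wave type = 100% difference
    d_amp = abs(amp_new - amp_avg) / amp_avg  # relative difference in amplitude
    d_per = abs(per_new - per_avg) / per_avg  # relative difference in period
    return 0.5 * d_amp + 0.5 * d_per          # 50% weight each (summing is an assumption)


def state_change_required(event, node, threshold):
    """True when the event is too far from the node to stay in the same state."""
    return seismic_distance(event, node) > threshold


node_2 = (8.353, 0.803, "P")  # averages of node 2 from the EMM Nodes slide
event = (9.0, 0.9, "P")       # hypothetical new arrival
print(seismic_distance(event, node_2))
print(state_change_required(event, node_2, threshold=0.5))  # hypothetical threshold
```

Note that the relative differences are not capped, so two events of the same wave type can end up farther apart than 1.0, the value assigned to a wave-type mismatch. Whether the original measure caps them is not stated on the slide.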

Page 14

EMM with Seismic Data

States 1, 2, and 3 correspond to Noise, Wave A, and Wave B respectively.

Page 15

Preliminary Testing

• RDG data February 1, 1981 – 6 earthquakes
• Find transition times close to known earthquakes
• 9 total nodes
• 652 total transitions
• Found all quakes
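The check "find transition times close to known earthquakes" can be sketched as a tolerance match between two lists of times. This is only an illustration of the idea. The function name `quakes_matched`, the tolerance, and all times below are hypothetical and are not the RDG data or the procedure used in the test.

```python
def quakes_matched(transition_times, quake_times, tolerance):
    """Map each known quake time to True if some transition lies within `tolerance` of it."""
    return {
        q: any(abs(t - q) <= tolerance for t in transition_times)
        for q in quake_times
    }


# Hypothetical times in seconds since the start of the recording.
transitions = [120.0, 3605.0, 7190.0, 40010.0]
quakes = [3600.0, 40000.0, 80000.0]
result = quakes_matched(transitions, quakes, tolerance=30.0)
print(result)
```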

Page 16

EMM Nodes

Node #   Average amplitude   Average period   Phase code
1        1.649m              0.119 sec        P (primary wave)
2        8.353m              0.803 sec        P (primary wave)
3        23.237m             0.898 sec        P (primary wave)
4        87.324m             0.997 sec        P (primary wave)
5        253.333m            1.282 sec        P (primary wave)
6        270.524m            0.96 sec         P (primary wave)
7        7.719m              20.4 sec         P (primary wave)
8        723.088m            1.962 sec        P (primary wave)
9        1938.772m           1.2 sec          P (primary wave)

Page 17

Hierarchical EMM

Page 18

Now What?

DATA NEEDED

NOISE MAY NOT BE BAD

KDD CUP

Interest DM COMMUNITY


