2006/06/29 The University of Hong Kong, Dept. of Computer Science

DB group seminar

Neighborhood based detection of anomalies in high dimensional spatio-temporal Sensor Datasets

(SAC’04)

Nabil R. Adam

Vandana Pursnani Janeja

Vijayalakshmi Atluri

Presented by Leonidas Mak

Agenda

- Spatial data mining
- Problem & proposed solution
- Approach overview
- Implementation details
- Discussion of results

Spatial data mining

Deals with knowledge discovery from spatial data sets, whose attributes are:
- Spatial (point, location, etc.)
- Non-spatial (population, speed, etc.)

Two properties of spatial objects set spatial data mining apart from other data mining:
- Spatial dependency
- Spatial heterogeneity

Spatial data mining

When considering a spatial object:
- Spatial & non-spatial attributes
- Implicit and explicit spatial relationships
- Region of influence

Region of influence
- Underlying spatial process
- Influences the behavior of the object and its neighboring objects

Spatial data mining

Consider the spatial process near the objects when performing spatial analysis

To identify outliers & trends in the region of influence
- Spatial features in the vicinity of the objects
- Underlying spatial process

Identify similarly behaving objects

Problem & proposed solution

Spatial outlier detection
- Objects that behave very differently from their neighborhood

Graph-based neighborhood [3] [11]
- Does not capture the semantic relationship between the objects and the area of influence

Some clustering techniques also:
- Delaunay triangulation [5]
- Voronoi diagram

Problem & proposed solution

Refine the concept of “a neighborhood of an object”

To characterize similarly behaving objects
- Spatial relationships
- Semantic relationships

Identification of spatio-temporal outliers in high dimensions

Proposed approach to solution

Take into account both spatial and semantic relationships

Features of these objects can differ
- Despite their close proximity

Each object has an immediate neighborhood
- Micro Neighborhood (Mi)

Mi can be extended or merged with others
- Macro Neighborhood (MaN)

Some definitions

Outlier (in terms of distance) [7]

An object o in a dataset T is a DB(p,D) outlier if at least a fraction p of the objects in T lie at a distance greater than D from o
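
The DB(p,D) definition can be sketched in a few lines of Python. This is a brute-force illustration, not the algorithm of [7]; the function name `db_outliers`, the Euclidean metric, and the choice to exclude o itself from the fraction are assumptions made for the example.

```python
import math

def db_outliers(points, p, D):
    """Flag each point that has at least a fraction p of the other points farther than D away."""
    flags = []
    for i, o in enumerate(points):
        # Assumption: the object itself is not counted when computing the fraction.
        others = [x for j, x in enumerate(points) if j != i]
        far = sum(1 for x in others if math.dist(o, x) > D)
        flags.append(bool(others) and far / len(others) >= p)
    return flags

# Four points close together and one far away (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
flags = db_outliers(points, p=0.9, D=1.0)
print(flags)
```

Only the last point has (almost) all other points farther than D, so it is the only one flagged.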

Voronoi diagrams (of a set of objects O) [10]

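The subdivision of the plane into n polygons, with a point q in the polygon corresponding to object $o_i$ iff $\operatorname{dist}(q, o_i) \le \operatorname{dist}(q, o_j)$ for each $o_j \in O$:

$$V(o_i) = \{\, q \mid \lVert q - o_i \rVert \le \lVert q - o_j \rVert \ \text{for } i \ne j \,\}$$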

Some definitions

Jaccard Coefficient (JC) [10]

Measure the similarity of asymmetric binary variables

To quantify the similarity match (1-1 match), indicating the similarity of features

$$J(i, j) = \frac{a}{a + b + c}$$

Contingency table of the binary features of objects i and j:

                 Object j
                 1     0
Object i   1     a     b
           0     c     d
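
As a small illustration of the coefficient, the sketch below counts a (both 1), b (only object i is 1) and c (only object j is 1) over two binary feature vectors and returns a / (a + b + c); the 0-0 matches (d) are ignored because the variables are asymmetric. The function name `jaccard` and the convention of returning 0.0 when neither vector has any feature are assumptions.

```python
def jaccard(u, v):
    """Jaccard coefficient of two equal-length 0/1 feature vectors."""
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)  # 1-1 matches
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)  # present only in u
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)  # present only in v
    if a + b + c == 0:
        return 0.0  # assumption: no features present in either vector
    return a / (a + b + c)

fi = [1, 1, 0, 1, 0]
fj = [1, 0, 0, 1, 1]
print(jaccard(fi, fj))  # a = 2, b = 1, c = 1
```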

Some definitions

Silhouette Coefficient (SC) [10]

Measures the quality of a clustering result in terms of its structure and its overlap with other clusters [6]
- 0.7 < SC <= 1.0: strong structure
- 0.5 < SC <= 0.7: medium structure
- SC < 0.25: no structure

Used to indicate the similarity of two spatial micro neighborhoods

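Silhouette of data point i:

$$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}}$$

Silhouette Coefficient of cluster X (the average over the n data points of X):

$$SC_X = \frac{1}{n} \sum_{i \in X} s_i$$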
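
A minimal sketch of the silhouette computation follows, using the standard definitions from [6]: a_i is the mean distance from point i to the other points of its own cluster, and b_i is the smallest mean distance from i to the points of any other cluster. The function names, the Euclidean metric, and the convention s_i = 0 for a single-point cluster are assumptions for illustration.

```python
import math

def mean_dist(p, pts):
    """Mean Euclidean distance from point p to the points in pts."""
    return sum(math.dist(p, q) for q in pts) / len(pts)

def silhouette_of_cluster(X, others):
    """Average silhouette s_i over the points of cluster X, given the other clusters."""
    scores = []
    for i, p in enumerate(X):
        rest = X[:i] + X[i + 1:]
        if not rest:
            scores.append(0.0)  # assumption: a singleton cluster gets s_i = 0
            continue
        a = mean_dist(p, rest)
        b = min(mean_dist(p, Y) for Y in others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated sets of one-dimensional readings (made-up data).
sc_apart = silhouette_of_cluster([(0.0,), (1.0,)], [[(10.0,), (11.0,)]])
# Two interleaved sets of readings (made-up data).
sc_mixed = silhouette_of_cluster([(0.0,), (2.0,)], [[(1.0,), (3.0,)]])
print(sc_apart, sc_mixed)
```

The separated readings give a value in the "strong structure" band, while the interleaved readings give a low value, which is the situation in which two micro neighborhoods overlap.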

Overview of the approach

1. Generation of Micro Neighborhood
   - To generate Voronoi polygons
   - Input: set of objects with spatial locations
   - Output: Voronoi diagram

2. Identification of Spatial Relationships
   - Input: Voronoi diagram, edge list
   - Output: adjacency matrix indicating whether one Mi is a neighbor of any other Mi

Overview of the approach

3. Identification of Semantic Relationships
   - Calculating JC and SC
   - Input of JC part: a set of micro neighborhoods
     - Characterized by a feature vector
     - Representing the spatial processes
   - Input of SC part: a set of micro neighborhoods
     - Characterized by a set of points
     - Readings over a period of time

Overview of the approach

4. Generation of Macro Neighborhood
   - Input: neighborhood (adjacency) matrix, JC, SC
   - Output: Macro Neighborhood

5. Detecting outliers
   - Based on the distance values of various points
   - Uses distance-based outlier detection [7]

Overview of the approach

[Figure: flowchart of the approach. Stages: Generation of Micro Neighborhood, Identification of Spatial Relationships, Identification of Semantic Relationships, Generation of Macro Neighborhood, Outliers Detection. Data passed between stages: object set, Voronoi diagram, edge list, feature vector, set of points, neighborhood matrix, JC, SC, Macro neighborhood]

Generation of Micro Neighborhood

The definition of neighborhood is based on the concept of Voronoi diagrams

Generate the Voronoi polygon around each spatial object

A feature q that lies in a Voronoi polygon is associated with the object of that polygon

Region of influence is defined as the Voronoi polygon

Generation of Micro Neighborhood

Micro Neighborhood (Mi) is defined as:
- Region of influence; dominance of one object over the other

Spatial features have their own spatial process

$$\mathrm{Dom}(s_1, s_2) = \{\, x \in \text{feature set} \mid d(x, s_1) \le d(x, s_2) \,\}$$

[Figure: a sensor (object), a river (spatial feature), and the Micro Neighborhood]
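
The dominance rule can be illustrated without building the Voronoi polygons explicitly: a feature falls in the Micro Neighborhood of the sensor that dominates all others for it, which is its nearest sensor. The sketch below assumes features can be reduced to representative points (a river is really a line), uses Euclidean distance, and uses made-up names and coordinates; ties go to the sensor listed first.

```python
import math

def micro_neighborhoods(sensors, features):
    """Map each sensor id to the names of the features whose nearest sensor it is."""
    result = {sid: [] for sid in sensors}
    for name, loc in features.items():
        # The dominating sensor is the one at minimum distance from the feature.
        nearest = min(sensors, key=lambda sid: math.dist(loc, sensors[sid]))
        result[nearest].append(name)
    return result

# Hypothetical sensors and spatial features.
sensors = {"s1": (0.0, 0.0), "s2": (10.0, 0.0)}
features = {"river": (2.0, 1.0), "bridge": (9.0, -1.0), "plant": (4.0, 3.0)}
mi = micro_neighborhoods(sensors, features)
print(mi)
```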

Identification of Spatial Relationships

Spatial relationships are binary relations between pairs of objects
- Object: point, line, polygon, etc.
- Relationship: topological, distance, etc.

Topological relationship of adjacency
- Determined by the shared edge of two Voronoi polygons
- Edge list is generated by Triangle, a 2D mesh generator [12], for the Delaunay triangulation

Identification of Spatial Relationships

Edge list format
- Edge# <from spatial obj> <to spatial obj>

Two micro neighborhoods are adjacent
- If there is an edge between the two spatial objects
- The adjacency information is stored in the neighborhood adjacency matrix
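
Turning an edge list of this shape into the neighborhood adjacency matrix can be sketched as follows. The sketch assumes each line holds exactly three whitespace-separated integers (edge number, from object, to object) and that objects are numbered from 0; real Triangle output may carry extra header or marker fields, which this example simply skips. The function name is hypothetical.

```python
def adjacency_from_edges(lines, n):
    """Build a symmetric n x n 0/1 adjacency matrix from 'edge# from to' lines."""
    adj = [[0] * n for _ in range(n)]
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a plain three-field edge line
        _, u, v = (int(t) for t in parts)
        adj[u][v] = adj[v][u] = 1  # adjacency is undirected
    return adj

# Made-up edge list for four spatial objects.
edge_lines = ["0 0 1", "1 1 2", "2 0 2", "3 2 3"]
adj = adjacency_from_edges(edge_lines, 4)
for row in adj:
    print(row)
```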

Identification of Semantic Relationships

Micro Neighborhood can be characterized by
- Presence/absence of spatial features
- Other spatial processes

Results in a feature vector of 0’s and 1’s [14]

The object itself may also have an associated set of readings (points in the neighborhood)

Make use of the features and also the data points in the neighborhood
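
A 0/1 feature vector of this kind can be sketched as a presence test against a fixed catalog of features. The catalog entries and the function name below are hypothetical and only illustrate the encoding.

```python
# Hypothetical catalog of spatial features; a fixed order gives each feature a fixed position.
CATALOG = ["river", "bridge", "industrial_plant", "wetland"]

def feature_vector(present, catalog=CATALOG):
    """Return 1 for each catalog feature present in the micro neighborhood, else 0."""
    return [1 if f in present else 0 for f in catalog]

vec = feature_vector({"river", "wetland"})
print(vec)
```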

Identification of Semantic Relationships

JC is used to compare binary-valued attributes in the feature vector

SC is used for non-binary-valued attributes, such as sensor readings
- To measure the overlap of the micro neighborhoods
- Based on the readings over a period of time

Two micro neighborhoods are considered semantically similar for
- Higher JC
- Lower SC

Generation of Macro Neighborhood

Each Mi can be considered an implicit sub-cluster or grouping

Macro Neighborhood can be defined in terms of
- Spatial relationship between the Mi
- Semantic relationship
  - Spatial, non-spatial attributes

Macro Neighborhood is defined as a graph:
- With outer edges E’ from the Mi
- A link l = (mi, mi+1) holds iff mi and mi+1 are spatial & semantic neighbors

Generation of Macro Neighborhood

Spatial neighbor (mi, mi+1) refers to the spatial relation between polygons

Semantic neighbor refers to the semantic relation based on JC & SC such that

$$JC \ge \delta_1, \qquad SC \le \delta_2$$

Merge Mi & Mj to form MaN
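
The merge rule can be sketched with a union-find structure: two micro neighborhoods are linked when they are adjacent and their pairwise JC and SC satisfy the thresholds δ1 and δ2, and linked neighborhoods end up in the same Macro Neighborhood. The pairwise `jc` and `sc` matrices, the helper `sym`, the sample values, and the choice to merge transitively are assumptions for illustration, not details given on the slides.

```python
def macro_neighborhoods(adj, jc, sc, delta1, delta2):
    """Group micro neighborhoods linked by adjacency, JC >= delta1 and SC <= delta2."""
    n = len(adj)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and jc[i][j] >= delta1 and sc[i][j] <= delta2:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def sym(n, entries, default):
    """Helper: build a symmetric n x n matrix from {(i, j): value} entries."""
    m = [[default] * n for _ in range(n)]
    for (i, j), v in entries.items():
        m[i][j] = m[j][i] = v
    return m

# Made-up values for four micro neighborhoods arranged in a chain 0-1-2-3.
adj = sym(4, {(0, 1): 1, (1, 2): 1, (2, 3): 1}, 0)
jc = sym(4, {(0, 1): 0.8, (1, 2): 0.3, (2, 3): 0.6}, 0.0)
sc = sym(4, {(0, 1): 0.2, (1, 2): 0.1, (2, 3): 0.9}, 1.0)
man = macro_neighborhoods(adj, jc, sc, delta1=0.5, delta2=0.5)
print(man)
```

With these values only the pair (0, 1) passes both thresholds; relaxing the thresholds to δ1 = 0.3 and δ2 = 0.9 links the whole chain into one Macro Neighborhood.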

Outlier detection

Graph-based spatial outlier detection [11]

It is important to identify the outliers as well as the neighborhood
- Since a given point can be the outlier of several clusters

Spatio-Temporal Outlier is defined as:
- A point xi is a spatio-temporal outlier iff it differs sufficiently from other points in the Macro neighborhood

Outlier detection

First identify the Macro Neighborhood
- Utilize the distance-based outlier detection technique [7]

Consider proximity, in terms of a distance threshold, as one of the determining factors

Investigate whether the object is an outlier (spatial outlier)
- If more than a certain number of points are outliers for that object
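
One possible reading of this step is sketched below: the readings of all objects in one Macro Neighborhood are pooled, each reading is tested as a DB(p,D) outlier against the pool, and an object is flagged when more than a chosen number of its readings are outliers. The function name, the parameter `min_outlier_points`, the Euclidean metric, and the data are assumptions for illustration.

```python
import math

def flag_objects(readings, p, D, min_outlier_points):
    """readings: object id -> list of reading vectors, all from one Macro Neighborhood."""
    pooled = [(oid, r) for oid, rs in readings.items() for r in rs]
    counts = {oid: 0 for oid in readings}
    for k, (oid, r) in enumerate(pooled):
        others = [q for m, (_, q) in enumerate(pooled) if m != k]
        far = sum(1 for q in others if math.dist(r, q) > D)
        if others and far / len(others) >= p:
            counts[oid] += 1  # this reading is a DB(p, D) outlier in the pool
    # An object is flagged when more than min_outlier_points of its readings are outliers.
    return {oid: c > min_outlier_points for oid, c in counts.items()}

# Made-up one-dimensional readings for three sensors.
readings = {
    "A": [(1.0,), (1.1,), (0.9,)],
    "B": [(1.0,), (1.2,), (1.1,)],
    "C": [(9.0,), (9.5,), (1.0,)],
}
flags = flag_objects(readings, p=0.7, D=2.0, min_outlier_points=1)
print(flags)
```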

Dataset

Data sets
- Highway traffic monitoring [11]
- Water monitoring [14]

Highway traffic monitoring
- Traffic readings from 60 stations in time slots of 5 minutes
- Non-spatial attributes: volume, occupancy
- Spatial attributes: latitude, longitude
- Feature matrix: traffic flow direction, clustering

Dataset

Water monitoring
- 7 stations monitoring water quality of rivers
- Feature matrix consists of 21 features
  - Used to show the characteristics in the Mi
- Spatial attributes: latitude, longitude
- Temporal attributes: date, time of sampling
- Data points consist of >100 attributes

Results (Spatial)

Spatial relationships are identified by applying the program TRIANGLE [12]

Generates edges for nodes that are judged adjacent to each other

Adjacency is expressed as a matrix

High connectivity collapses the micro neighborhoods into one big neighborhood

Results (Spatial + JC)

Incremental building of Macro Neighborhood
- JC = 0.5
  - MaN consists of polygons 2, 4, 6, 7
- JC = 0.2
  - MaN consists of polygons 2, 3, 4, 6, 7

Incremental merging on the basis of a less restrictive JC threshold

Results (Spatial + JC)

Refinement in outliers detected
- Number of outliers detected varies as JC changes

[Figure: "WaterMonitoring Data: Num Outliers vs. JC"; x-axis: JC threshold, y-axis: number of outliers]

Results (Spatial + JC)

Systematic elimination of outliers

Consistency in outlier detection
- If a neighborhood has no outliers at a low JC threshold, it consistently has none at higher threshold values

$$O_1 \subset O_2$$

- O1: outliers detected at a high JC threshold
- O2: outliers detected at a low JC threshold

JC threshold    Outliers (part of)
0.8             2, 4
0.5             2, 3, 4, 8
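
The consistency property can be checked mechanically: sorting the thresholds in ascending order, every outlier set found at a higher JC threshold should be contained in the set found at the next lower one. The sketch below uses the partial outlier lists reported for JC = 0.8 and JC = 0.5; the function name is hypothetical, and the check uses non-strict containment so that identical sets also count as consistent.

```python
def is_consistent(outliers_by_jc):
    """True if each outlier set at a higher JC threshold is a subset of the set at the next lower one."""
    thresholds = sorted(outliers_by_jc)
    return all(
        outliers_by_jc[high] <= outliers_by_jc[low]
        for low, high in zip(thresholds, thresholds[1:])
    )

# Partial outlier lists as reported for the two JC thresholds.
observed = {0.8: {2, 4}, 0.5: {2, 3, 4, 8}}
print(is_consistent(observed))
```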

Results (Spatial + SC)

Similar conclusion for adding SC
- As SC decreases, the neighborhood is more refined

[Figure: "WaterMonitoring Data: Num Outliers vs. SC"; axes labeled JC THRESHOLD and NUM. OUTLIERS]

Results (Spatial + JC + SC)

Low JC & high SC: big neighborhood
- More outliers

High JC & low SC: refined neighborhood
- Reduced outliers

References:

[1] F. Aurenhammer. Voronoi Diagrams: A Survey of a Fundamental Geometric Data Structure. ACM Computing Surveys, Vol 23(3), 345-405, 1991.

[2] M. Ester, A. Frommelt, H.-P. Kriegel, and J. Sander. Algorithms for characterization and trend detection in spatial databases. In Proceedings of 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD), 1998.

[3] M. Ester, H. P. Kriegel, and J. Sander. Spatial Data Mining: A Database Approach. In Proceedings of the International Symposium on Large Spatial Databases, Berlin, Germany, July 1997, pp. 47-66.

[4] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proceedings of 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD), 1996.

[5] I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35-39, 1997.

[6] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.

[7] E. M. Knorr and R. T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proceedings of 24th Int. Conf. Very Large Data Bases, VLDB, 1998

[8] H. J. Miller and J. Han, Geographic Data Mining & Knowledge Discovery, Publisher: Taylor & Francis; 1st edition

[9] Minnesota Highway traffic dataset: http://www.cs.umn.edu/research/shashi-group/TrafficData/

[10] A. Okabe, B. Boots, K. Sugihara, S. Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. pp. 291-410. John Wiley, 2000.

[11] S. Shekhar, C. Lu, and P. Zhang. Detecting Graph-Based Spatial Outlier: Algorithms and Applications(A Summary of Results). In Computer Science & Engineering Department, UMN, Technical Report 01-014, 2001.

[12] J. R. Shewchuk, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. First Workshop on Applied Computational Geometry (Philadelphia, Pennsylvania), pages 124-133, ACM, May 1996

[13] D. Unwin, Introductory Spatial analysis, Publisher: Routledge Kegan & Paul. January 1982

[14] USGS, National Stream Water Quality Network (NASQAN), Published Data: http://water.usgs.gov/nasqan/progdocs/index.html

[15] Water Monitoring, the Meadowlands Environmental Research Institute, and the New Jersey Meadowlands Commission: http://cimic.rutgers.edu/hmdc_public/monitoring/

