2006/06/29 The University of Hong Kong, Dept. of Computer Science
DB group seminar
Neighborhood based detection of anomalies in high dimensional spatio-temporal Sensor Datasets
(SAC’04)
Nabil R. Adam
Vandana Pursnani Janeja
Vijayalakshmi Atluri
Presented by Leonidas Mak
Agenda
Spatial data mining
Problem & proposed solution
Approach overview
Implementation detail
Discussion of result
Spatial data mining
Deals with knowledge discovery from spatial data sets with two kinds of attributes:
  Spatial (point, location, etc.)
  Non-spatial (population, speed, etc.)
Two properties of spatial objects make spatial data mining different from other data mining:
  Spatial dependency
  Spatial heterogeneity
Spatial data mining
When considering a spatial object:
  Spatial & non-spatial attributes
  Implicit and explicit spatial relationships
  Region of influence
Region of influence
  Underlying spatial process
  Influences the behavior of the object and its neighboring objects
Spatial data mining
Consider the spatial process near the objects when performing spatial analysis
To identify outliers & trends in the region of influence
  Spatial features in the vicinity of the objects
  Underlying spatial process
Identify similarly behaving objects
Problem & proposed solution
Spatial outlier detection
  Objects behave very differently from their neighborhood
Graph based neighborhood [3] [11]
  Does not capture the semantic relationship between the objects and the area of influence
Some clustering techniques also
  Delaunay triangulation [5]
  Voronoi diagram
Problem & proposed solution
Refine the concept of “a neighborhood of an object”
To characterize similarly behaving objects
  Spatial relationships
  Semantic relationships
Identification of spatio-temporal outliers in high dimensions
Proposed approach to solution
Take into account both spatial and semantic relationships
  Features of these objects can differ despite their close proximity
Each object has an immediate neighborhood
  Micro Neighborhood (Mi)
Mi can be extended or merged with others
  Macro Neighborhood (MaN)
Some definitions
Outlier (in terms of distance) [7]
An object o in a dataset T is a DB(p,D) outlier if at least a fraction p of the objects in T lie at a distance greater than D from o
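The DB(p,D) definition can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of [7]: it uses Euclidean distance and a brute-force scan, and it assumes the fraction is taken over the objects other than o. The function name `is_db_outlier` and the sample points are hypothetical.

```python
import math

def is_db_outlier(index, dataset, p, D):
    """True if at least a fraction p of the other objects lie farther than D
    from dataset[index] (assumption: o itself is not counted)."""
    o = dataset[index]
    others = dataset[:index] + dataset[index + 1:]
    if not others:
        return False
    far = sum(1 for x in others if math.dist(o, x) > D)
    return far / len(others) >= p

# Hypothetical sample: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = [points[i] for i in range(len(points))
            if is_db_outlier(i, points, p=0.9, D=1.0)]
print(outliers)
```

Only the distant point satisfies the condition here, since every other object lies farther than D from it.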
Voronoi diagrams (of a set of objects O) [10]
The subdivision of the plane into n polygons, with a point q in the polygon corresponding to object $o_i$ iff $dist(q, o_i) \le dist(q, o_j)$ for each $o_j \in O$
$$Vo_i = \{\, q \mid \|q - o_i\| \le \|q - o_j\| \ \text{for } i \ne j \,\}$$
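The membership condition above says that q belongs to the polygon of its nearest object. A minimal sketch of that test follows; it does not construct the polygons themselves, and the function name `voronoi_cell` and the coordinates are hypothetical.

```python
import math

def voronoi_cell(q, objects):
    """Index i of the object whose Voronoi polygon contains q, i.e. the
    nearest object. Ties (q on a shared edge) go to the lowest index."""
    return min(range(len(objects)), key=lambda i: math.dist(q, objects[i]))

# Hypothetical object locations and query points.
objects = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
queries = [(1.0, 1.0), (3.5, 0.5), (0.5, 3.0)]
cells = [voronoi_cell(q, objects) for q in queries]
print(cells)
```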
Some definitions
Jaccard Coefficient (JC) [10]
Measure the similarity of asymmetric binary variables
To quantify the similarity match (1-1 match), indicating the similarity of features:
$$J_{i,j} = \frac{a}{a + b + c}$$
Contingency table for objects i and j:
                Object j = 1   Object j = 0
Object i = 1         a              b
Object i = 0         c              d
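As a small worked example of the coefficient, the sketch below counts a, b and c from two 0/1 feature vectors. The function name `jaccard` and the two vectors are hypothetical, and returning 0.0 when neither vector has any feature present is an assumption, since the slides do not cover that case.

```python
def jaccard(u, v):
    """Jaccard coefficient a / (a + b + c) of two equal-length 0/1 vectors."""
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)  # 1-1 matches
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)  # present only in i
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)  # present only in j
    if a + b + c == 0:
        return 0.0  # assumption: no features present in either vector
    return a / (a + b + c)

# Hypothetical feature vectors of two micro neighborhoods.
fi = [1, 1, 0, 1, 0]
fj = [1, 0, 0, 1, 1]
jc = jaccard(fi, fj)
print(jc)
```

Here a = 2, b = 1 and c = 1, so the coefficient is 2 / 4 = 0.5. The 0-0 matches (d) are ignored, which is what makes the measure suitable for asymmetric binary variables.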
Some definitions
Silhouette Coefficient (SC) [10]
To identify the quality of clustering result in terms of structure and its overlapping on other clusters [6]
  0.7 < SC <= 1.0: strong structure
  0.5 < SC <= 0.7: medium structure
  SC < 0.25: no structure
To indicate the similarity of two spatial micro neighborhoods
Silhouette of data point i:
$$s_i = \frac{b_i - a_i}{\max\{a_i, b_i\}}$$
Silhouette Coefficient of cluster X (the average silhouette over the data points of X):
$$SC_X = \frac{1}{|X|} \sum_{i \in X} s_i$$
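A minimal sketch of the silhouette computation follows. It assumes the usual definitions from [6], which the slides do not spell out: a_i is the average distance from point i to the other points of its own cluster, and b_i is the smallest average distance from i to the points of any other cluster. Euclidean distance, the function names, and the two sample clusters are assumptions for illustration.

```python
import math

def mean_dist(p, pts):
    """Average Euclidean distance from p to the points in pts."""
    return sum(math.dist(p, q) for q in pts) / len(pts)

def silhouette(i, cluster, others):
    """s_i = (b_i - a_i) / max(a_i, b_i) for cluster[i]."""
    p = cluster[i]
    rest = cluster[:i] + cluster[i + 1:]
    a = mean_dist(p, rest) if rest else 0.0
    b = min(mean_dist(p, c) for c in others)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0

def silhouette_coefficient(cluster, others):
    """Average silhouette over all points of one cluster."""
    return sum(silhouette(i, cluster, others)
               for i in range(len(cluster))) / len(cluster)

# Hypothetical readings of two well-separated micro neighborhoods.
X = [(0.0, 0.0), (0.0, 1.0)]
Y = [(10.0, 0.0), (10.0, 1.0)]
sc_x = silhouette_coefficient(X, [Y])
print(round(sc_x, 3))
```

With clusters this far apart the coefficient falls in the strong-structure range; overlapping sets of readings would pull it down.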
Overview of the approach
1. Generation of Micro Neighborhood
  To generate Voronoi polygons
  Input: set of objects with spatial locations
  Output: Voronoi diagram
2. Identification of Spatial Relationships
  Input: Voronoi diagram, edge list
  Output: adjacency matrix indicating if one Mi is a neighbor of any other Mi
Overview of the approach
3. Identification of semantic relationships
  Calculating JC and SC
  Input of JC part: a set of micro neighborhoods
    Characterized by feature vector
    Representing the spatial processes
  Input of SC part: a set of micro neighborhoods
    Characterized by a set of points
    Readings over a period of time
Overview of the approach
4. Generation of Macro Neighborhood
  Input: neighborhood (adjacency) matrix, JC, SC
  Output: Macro Neighborhood
5. Detecting outliers
  Based on the distance values of various points
  Use distance based outlier detection [7]
Overview of the approach
[Figure: flow of the approach. Obj. set → Generation of Micro Neighborhood → Voronoi diagram, edge list → Identification of Spatial Relationships → neighborhood matrix. Feature vector, set of points → Identification of Semantic Relationships → JC, SC. Neighborhood matrix, JC, SC → Generation of Macro Neighborhood → Macro neighborhood → Outliers Detection]
Generation of Micro Neighborhood
The definition of neighborhood is based on the concept of Voronoi diagrams
Generate the Voronoi polygon around each spatial object
A feature q that lies in a Voronoi polygon is associated with that polygon's object
Region of influence is defined as the Voronoi polygon
Generation of Micro Neighborhood
Micro Neighborhood (Mi) is defined as:
  Region of influence; dominance of one object over the other
  $$Dom(s_1, s_2) = \{\, x \in \text{features set} \mid d(x, s_1) \le d(x, s_2) \,\}$$
Spatial features have their own spatial process
[Figure: a sensor (object), a river (spatial feature), and the resulting Micro Neighborhood]
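The dominance relation can be sketched directly from its definition. In this illustration each spatial feature is reduced to a single representative point, which is a simplifying assumption (a river is really a line), and the micro neighborhood of a sensor is taken as the set of features it dominates over every other sensor. The function names, sensor positions and feature names are hypothetical.

```python
import math

def dominance(s1, s2, features):
    """Dom(s1, s2): names of the features at least as close to s1 as to s2."""
    return {name for name, x in features.items()
            if math.dist(x, s1) <= math.dist(x, s2)}

def micro_neighborhood(s, sensors, features):
    """Features that sensor s dominates over every other sensor."""
    result = set(features)
    for other in sensors:
        if other != s:
            result &= dominance(s, other, features)
    return result

# Hypothetical sensors and point-like spatial features.
sensors = [(0.0, 0.0), (10.0, 0.0)]
features = {"river": (2.0, 1.0), "bridge": (9.0, 0.0), "marsh": (4.0, 3.0)}
m0 = micro_neighborhood(sensors[0], sensors, features)
m1 = micro_neighborhood(sensors[1], sensors, features)
print(sorted(m0), sorted(m1))
```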
Identification of Spatial Relationships
Spatial relationships are binary relations between pairs of objects
  Object: point, line, polygon, etc.
  Relationship: topological, distance, etc.
Topological relationship of adjacency
  Determined by the shared edge of two Voronoi polygons
  Edge list is generated by Triangle, a 2D mesh generator [12], for the Delaunay triangulation
Identification of Spatial Relationships
Edge list format
  Edge# <from spatial obj> <to spatial obj>
Two micro neighborhoods are adjacent
  If there is an edge between the two spatial objects
  The adjacency information is stored in the neighborhood adjacency matrix
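Turning such an edge list into the adjacency matrix is straightforward. The sketch below assumes each line has exactly the layout shown on the slide (edge number, then the two spatial object ids) and that ids are 1-based; any header line or extra columns that a real edge file may contain are not handled. The function name and the sample edges are hypothetical.

```python
def adjacency_matrix(edge_lines, n):
    """Symmetric n x n 0/1 matrix built from 'edge# from to' lines."""
    adj = [[0] * n for _ in range(n)]
    for line in edge_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        i, j = int(parts[1]) - 1, int(parts[2]) - 1  # assumption: 1-based ids
        adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical edge list for four spatial objects.
edges = ["1 1 2", "2 2 3", "3 1 3", "4 3 4"]
adj = adjacency_matrix(edges, 4)
for row in adj:
    print(row)
```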
Identification of Semantic Relationships
Micro Neighborhood can be characterized by
  Presence/absence of spatial features
  Other spatial processes
Results in a feature vector of 0’s and 1’s [14]
Object itself may also have an associated set of readings (points in neighborhood)
Make use of the features and also the data points in the neighborhood
Identification of Semantic Relationships
JC is used for binary valued attributes in the feature vector
SC is used for non-binary valued attributes, such as readings of sensors
  To measure the overlap of the micro neighborhoods
  Based on the readings over a period of time
Two micro neighborhoods are considered semantically similar for
  Higher JC
  Lower SC
Generation of Macro Neighborhood
Each Mi can be considered an implicit sub-cluster or grouping
Macro Neighborhood can be defined in terms of
  Spatial relationship between Mi
  Semantic relationship
  Spatial, non-spatial attributes
Macro Neighborhood is defined as a graph:
  With outer edges E’ from Mi
  Links: l = (mi, mi+1) holds iff the two are spatial & semantic neighbors
Generation of Macro Neighborhood
Spatial neighbor (mi, mi+1) refers to the spatial relation between polygons
Semantic neighbor refers to the semantic relation based on JC & SC such that
  $JC \ge \delta_1$ and $SC \le \delta_2$
Merge Mi & Mj to form MaN
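The merge rule can be sketched as follows: two micro neighborhoods are linked when they are adjacent and their JC and SC values pass the thresholds, and linked neighborhoods are grouped together. Treating the merge as transitive (via a small union-find) is an assumption of this sketch, as are the function name and the sample matrices.

```python
def macro_neighborhoods(adj, jc, sc, delta1, delta2):
    """Group micro neighborhoods i, j that are adjacent with
    jc[i][j] >= delta1 and sc[i][j] <= delta2 (merging is transitive)."""
    n = len(adj)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if adj[i][j] and jc[i][j] >= delta1 and sc[i][j] <= delta2:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical inputs for three micro neighborhoods (0-1 and 1-2 adjacent).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
jc = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
sc = [[0.0, 0.2, 0.2], [0.2, 0.0, 0.2], [0.2, 0.2, 0.0]]
strict = macro_neighborhoods(adj, jc, sc, delta1=0.5, delta2=0.5)
loose = macro_neighborhoods(adj, jc, sc, delta1=0.2, delta2=0.5)
print(strict, loose)
```

Lowering the JC threshold lets more adjacent pairs qualify, so the macro neighborhood grows.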
Outlier detection
Graph based spatial outlier detection [11]
It is important to identify the outliers as well as the neighborhood
  Since a given point can be the outlier of several clusters
Spatio-Temporal Outlier is defined as:
  A point xi is a spatio-temporal outlier iff it differs sufficiently from other points in the Macro Neighborhood
Outlier detection
First identify the Macro Neighborhood
  Utilize distance based outlier detection technique [7]
Consider proximity in terms of a distance threshold as one of the determining factors
Investigate whether the object is an outlier (spatial outlier)
  If more than a certain number of points are outliers for that object
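One way to read these steps is sketched below: pool the readings of all objects in one macro neighborhood, flag individual readings with the distance-based rule of [7], then flag an object when more than `min_count` of its readings are outliers. The pooling, the Euclidean distance, the parameter names and the sample readings are all assumptions made for illustration.

```python
import math

def point_outliers(points, p, D):
    """Flag each point that has at least a fraction p of the others farther than D."""
    flags = []
    for i, x in enumerate(points):
        others = points[:i] + points[i + 1:]
        far = sum(1 for y in others if math.dist(x, y) > D)
        flags.append(bool(others) and far / len(others) >= p)
    return flags

def outlier_objects(readings, p, D, min_count):
    """readings: object id -> list of reading vectors from one macro neighborhood.
    Returns the objects with more than min_count outlying readings."""
    owners = [obj for obj, pts in readings.items() for _ in pts]
    points = [pt for pts in readings.values() for pt in pts]
    counts = {obj: 0 for obj in readings}
    for obj, flag in zip(owners, point_outliers(points, p, D)):
        counts[obj] += flag
    return {obj for obj, c in counts.items() if c > min_count}

# Hypothetical one-dimensional readings for three sensors.
readings = {
    "s1": [(1.0,), (1.1,), (0.9,)],
    "s2": [(1.0,), (1.2,), (1.1,)],
    "s3": [(9.0,), (9.5,), (1.0,)],
}
flagged = outlier_objects(readings, p=0.7, D=2.0, min_count=1)
print(flagged)
```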
Dataset
Data sets
  Highway traffic monitoring [11]
  Water monitoring [14]
Highway traffic monitoring
  Traffic readings from 60 stations in time slots of 5 minutes
  Non-spatial attributes: volume, occupancy
  Spatial attributes: latitude, longitude
  Feature matrix: traffic flow direction, clustering
Dataset
Water monitoring
  7 stations monitoring water quality of rivers
  Feature matrix consists of 21 features
    Used to show the characteristics in the Mi
  Spatial attributes: latitude, longitude
  Temporal attributes: date, time of sampling
  Data points consist of >100 attributes
Results (Spatial)
Spatial relationships are identified by applying the program Triangle [12]
Generate edges for nodes that are judged adjacent to each other
Adjacency is expressed as a matrix
High connectivity makes the micro neighborhoods collapse into one big neighborhood
Results (Spatial + JC)
Incremental building of Macro Neighborhood
  JC = 0.5: MaN consists of polygons 2, 4, 6, 7
  JC = 0.2: MaN consists of polygons 2, 3, 4, 6, 7
Incremental merging on the basis of a less restrictive JC threshold
Results (Spatial + JC)
Refinement in outliers detected
  Number of outliers detected varies as JC changes
[Figure: Water Monitoring Data: Num. Outliers vs. JC. x-axis: JC threshold; y-axis: num. outliers]
Results (Spatial + JC)
Systematic elimination of outliers
Consistency in outlier detection
  If one neighborhood has no outliers at a low JC threshold, it consistently has none at higher threshold values
  $O_1 \subset O_2$
  O1: Outliers detected at high threshold of JC
  O2: Outliers detected at low threshold of JC
JC threshold   Outliers (part of)
JC = 0.8       2, 4
JC = 0.5       2, 3, 4, 8
Results (Spatial + SC)
Similar conclusion for adding SC
  As SC decreases, the neighborhood is more refined
[Figure: Water Monitoring Data: Num. Outliers vs. SC. x-axis labeled "JC THRESHOLD" on the slide; y-axis: num. outliers]
Results (Spatial + JC + SC)
Low JC & high SC: big neighborhood
  More outliers
High JC & low SC: refined neighborhood
  Reduced outliers
References:
[1] F. Aurenhammer. Voronoi Diagrams: A Survey of a Fundamental Geometric Data Structure. ACM Computing Surveys, Vol 23(3), 345-405, 1991.
[2] M. Ester, A. Frommelt, H.-P. Kriegel, and J. Sander. Algorithms for characterization and trend detection in spatial databases. In Proceedings of 4th Int. Conf. on Knowledge Discovery and Data Mining (KDD), 1998.
[3] M. Ester, H. P. Kriegel, and J. Sander. Spatial Data Mining: A Database Approach. In Proceedings of the International Symposium on Large Spatial Databases, Berlin, Germany, July 1997, pp. 47-66.
[4] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD), 1996.
[5] I. Kang, T. Kim, and K. Li. A Spatial Data Mining Method by Delaunay Triangulation. In Proceedings of the 5th International Workshop on Advances in Geographic Information Systems (GIS-97), pages 35-39, 1997.
[6] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.
[7] E. M. Knorr and R. T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proceedings of 24th Int. Conf. Very Large Data Bases, VLDB, 1998
[8] H. J. Miller and J. Han, Geographic Data Mining & Knowledge Discovery, Publisher: Taylor & Francis; 1st edition
[9] Minnesota Highway traffic dataset: http://www.cs.umn.edu/research/shashi-group/TrafficData/
[10] A. Okabe, B. Boots, K. Sugihara, S. Chiu. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. pp. 291-410. John Wiley, 2000.
[11] S. Shekhar, C. Lu, and P. Zhang. Detecting Graph-Based Spatial Outlier: Algorithms and Applications(A Summary of Results). In Computer Science & Engineering Department, UMN, Technical Report 01-014, 2001.
[12] J. R. Shewchuk, Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. First Workshop on Applied Computational Geometry (Philadelphia, Pennsylvania), pages 124-133, ACM, May 1996
[13] D. Unwin, Introductory Spatial analysis, Publisher: Routledge Kegan & Paul. January 1982
[14] USGS, National Stream Water Quality Network (NASQAN), Published Data: http://water.usgs.gov/nasqan/progdocs/index.html
[15] Water Monitoring, the Meadowlands Environmental Research Institute, and the New Jersey Meadowlands Commission: http://cimic.rutgers.edu/hmdc_public/monitoring/