3D Shape Histograms for Similarity Search and Classification in Spatial Databases
Mihael Ankerst, Gabi Kastenmüller, Hans-Peter Kriegel, Thomas Seidl
Univ of Munich, Germany
Outline
Introduction
3D Shape Similarity Model
Quadratic Form Distance Functions
Extensibility of Histogram Models
Query Processing
Experimental Results and Conclusion
Introduction
Classification: the problem of assigning an appropriate class to a query object
Applications: molecular biology, medical imaging, mechanical engineering, astronomy
Objects of the same class have some characteristic properties in common, e.g. geometric or thematic properties.
Classification in Molecular Databases
Classification schemata are already available
We need a fast filter classification algorithm
Dali system: a sophisticated classification algorithm for proteins
CATH: hierarchical classification of protein domain structures
Four levels: class, architecture, topology and homologous superfamily
Nearest Neighbor Classification
In general, classification is performed after a training phase
An object is assigned to a class if it matches the description of that class
Nearest neighbor classifiers: find the nearest neighbor and return its class
k-nearest neighbor classifiers: parameters are the number k of neighbors and the weights of the neighbors
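A minimal Python sketch of the (weighted) k-nearest neighbor classifier described above. The function names `knn_classify` and `euclidean`, the `(vector, label)` training format and the optional `weight` function are illustrative assumptions, not part of the original method; a database system would obtain the neighbors from an index rather than by sorting all training objects.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_classify(query, training, k, distance, weight=None):
    """Return the class that wins the vote among the k nearest training objects.

    training: list of (feature_vector, class_label) pairs (assumed format)
    distance: function(vector, vector) -> float
    weight:   optional function(distance) -> vote weight; None means one vote each
    """
    # Brute-force ranking for illustration only; a real system would use an index
    ranked = sorted(training, key=lambda item: distance(query, item[0]))
    votes = defaultdict(float)
    for vector, label in ranked[:k]:
        votes[label] += weight(distance(query, vector)) if weight else 1.0
    # On ties, the label encountered first (i.e. the closer one) wins
    return max(votes, key=votes.get)
```

With k = 1 this reduces to the plain nearest neighbor classifier; passing e.g. `weight=lambda d: 1 / (d + 1e-9)` lets closer neighbors count more.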
Geometry Based Similarity Search
Spatial objects are transformed into a high-dimensional vector space
In 2D, shapes can be represented as ordered sets of surface points, approximate rectangular coverings, etc.
Section coding technique: each polygon's circumcircle is decomposed into a number of sectors, and for each sector the fraction of the polygon area inside it is determined, which yields a normalized feature vector
Similarity is defined as the Euclidean distance between the resulting feature vectors
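A rough 2D sketch of section coding in Python. Assumptions not taken from the slides: the circle is centered at the vertex centroid and passes through the farthest vertex (an approximation of the circumcircle), and the area fraction per sector is estimated by counting grid sample points that fall inside the polygon. The names `section_code`, `point_in_polygon` and `euclidean` are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def section_code(polygon, num_sectors, samples_per_axis=200):
    """Estimated fraction of the polygon area falling into each sector of its circle."""
    # Assumption: circle centered at the vertex centroid, radius to the farthest vertex
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    radius = max(math.hypot(x - cx, y - cy) for x, y in polygon)
    counts = [0] * num_sectors
    step = 2 * radius / samples_per_axis
    for i in range(samples_per_axis):
        for j in range(samples_per_axis):
            x = cx - radius + (i + 0.5) * step
            y = cy - radius + (j + 0.5) * step
            if point_in_polygon(x, y, polygon):
                angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
                counts[int(angle / (2 * math.pi) * num_sectors) % num_sectors] += 1
    total = sum(counts)
    return [c / total for c in counts]  # normalized: entries sum to 1


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For a square centered at the origin and four sectors, each entry comes out at about 0.25, and two shapes are compared by `euclidean(section_code(s1, n), section_code(s2, n))`.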
Invariance Properties
Similarity models need to be invariant with respect to translation, rotation, scaling, etc.
Most methods include a preprocessing step, such as rotating objects to a normalized orientation or translating their center of mass to the origin
Robustness against errors is not considered in most of these models
3D Shape Similarity Model
We extend the section coding technique to 3D.
Shape histograms as feature vectors
Quadratic form distance function
Shape Histograms
A feature transform maps a complex object onto a feature vector in a multidimensional space.
3D shape histograms are also feature vectors.
They are based on a complete and disjoint partitioning of the space into cells, which correspond to the bins of the histogram.
Any space can be partitioned this way (geometric, thematic, etc.).
Shell Model
The 3D space is decomposed into concentric shells around the center point
The histogram is independent of rotations around the center
The radii of the shells are determined from the extent of the objects
Shells of uniform thickness
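A sketch of the shell model in Python, assuming the center point is given, the shells have uniform thickness `max_radius / num_shells`, and points beyond `max_radius` are counted in the outermost shell (an assumption). `shell_histogram` is a hypothetical name. Because only distances to the center are used, rotating the points around the center leaves the histogram unchanged.

```python
import math


def shell_histogram(points, center, max_radius, num_shells):
    """Count 3D surface points per concentric shell of uniform thickness."""
    thickness = max_radius / num_shells
    hist = [0] * num_shells
    for p in points:
        r = math.dist(p, center)  # distance to the center point
        # Points outside max_radius go into the last shell (assumed convention)
        hist[min(int(r / thickness), num_shells - 1)] += 1
    return hist
```

For example, rotating every point by 90 degrees about the z axis, `(x, y, z) -> (-y, x, z)`, yields the same histogram.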
Sector Model
The 3D space is decomposed into sectors that emerge from the center point of the model
Points are distributed uniformly on the surface of a sphere
The Voronoi diagram of these points gives an appropriate decomposition of the space into sectors
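A sketch of the sector model in Python. The Fibonacci lattice used here to place the sector centers is an assumed choice of "uniform" distribution, not necessarily the one used by the authors. Assigning each point to the sector center with the largest dot product is the same as locating its direction in the spherical Voronoi cell of that center. All names are hypothetical.

```python
import math


def fibonacci_sphere(n):
    """n roughly uniformly distributed unit vectors (Fibonacci lattice, an assumed choice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    dirs = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(1 - z * z)
        theta = golden_angle * i
        dirs.append((r * math.cos(theta), r * math.sin(theta), z))
    return dirs


def sector_histogram(points, center, sector_centers):
    """Count points per sector; sector_centers are unit vectors."""
    hist = [0] * len(sector_centers)
    for p in points:
        v = [pi - ci for pi, ci in zip(p, center)]
        # Nearest sector center by angle = largest dot product (spherical Voronoi cell).
        # A point exactly at the center has all dot products 0 and lands in sector 0.
        best = max(range(len(sector_centers)),
                   key=lambda k: sum(a * b for a, b in zip(v, sector_centers[k])))
        hist[best] += 1
    return hist
```

With the six axis directions as sector centers, each point simply falls into the sector of its dominant coordinate axis.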
Combined Model
Combination of shell and sector models
Results in a higher dimensionality
For the same dimensionality, different combinations of numbers of shells and sectors are possible
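A sketch of the combined model in Python, assuming the same shell and sector assignment rules as above (uniform shell thickness, nearest sector center by dot product). Each (shell, sector) pair gets its own bin, so the dimensionality is `num_shells * len(sector_centers)`. The function name and the bin numbering `shell * num_sectors + sector` are illustrative choices.

```python
import math


def combined_histogram(points, center, max_radius, num_shells, sector_centers):
    """Histogram over all (shell, sector) cells; bin index = shell * num_sectors + sector."""
    num_sectors = len(sector_centers)
    thickness = max_radius / num_shells
    hist = [0] * (num_shells * num_sectors)
    for p in points:
        v = [a - b for a, b in zip(p, center)]
        shell = min(int(math.hypot(*v) / thickness), num_shells - 1)
        sector = max(range(num_sectors),
                     key=lambda k: sum(a * b for a, b in zip(v, sector_centers[k])))
        hist[shell * num_sectors + sector] += 1
    return hist
```

For instance, 2 shells with 6 sectors and 6 shells with 2 sectors both give 12 bins, which is the kind of trade-off the slide refers to.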
Euclidean Distance
The Euclidean distance between two N-dimensional vectors p and q is given by
$d_{\mathrm{euclid}}(p,q) = \sqrt{\sum_{i=1}^{N} (p_i - q_i)^2}$
The individual components of the feature vectors are assumed to be independent
Relationships between the components, such as substitutability and compensability, cannot be taken into account
Euclidean Distance
Consider three objects a, b and c and their shape histograms
Clearly, a and b are more closely related than a and c or b and c
However, due to rotation, the peaks of the histograms of a and b are mapped into different bins, so the Euclidean distance does not reflect their similarity in this case
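A small numeric illustration with hypothetical 6-bin histograms (not the paper's data): the peak of b lies one bin away from the peak of a, while the peak of c is far away, yet all three Euclidean distances are identical.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Hypothetical 6-bin histograms, one unit of mass each
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 1, 0, 0, 0]  # peak of a shifted to the neighboring bin, e.g. by a rotation
c = [0, 0, 0, 0, 1, 0]  # peak far away from that of a

# All three distances equal sqrt(2): the Euclidean distance ignores bin neighborhood
print(euclidean(a, b), euclidean(a, c), euclidean(b, c))
```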
Quadratic Form Distance Function
The quadratic form distance function is defined in terms of a similarity matrix A:
$d_A(p,q) = \sqrt{(p - q) \cdot A \cdot (p - q)^T}$
The components $a_{ij}$ of A represent the similarity of the components i and j in the underlying space
The Euclidean distance is the special case of the quadratic form distance with A = I, the identity matrix
Quadratic Form Distance Functions
The Euclidean distance of two vectors is completely fixed; it has no parameters to adapt
The weighted Euclidean distance is more flexible, since its weights control the effect of each individual vector component on the overall distance
In addition, a general quadratic form distance function also specifies cross-dependencies between the dimensions
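A sketch of the three cases in Python, following $d_A(p,q) = \sqrt{(p - q) A (p - q)^T}$ and assuming A is symmetric positive (semi)definite. The identity matrix gives the Euclidean distance, a diagonal matrix gives a weighted Euclidean distance, and nonzero off-diagonal entries add cross-dependencies between dimensions. The helper names are hypothetical.

```python
import math


def quadratic_form_distance(p, q, A):
    """sqrt((p - q) A (p - q)^T); costs O(n^2) operations for n dimensions."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(value)


def identity(n):
    """A = I: the plain Euclidean distance."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def diagonal(weights):
    """Diagonal A: weighted Euclidean distance, no cross-dependencies."""
    n = len(weights)
    return [[weights[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
```

For p = (1, 2, 3) and q = (0, 0, 0), the identity gives sqrt(14), the weights (1, 0, 1) give sqrt(10), and the tridiagonal matrix with 1 on the diagonal and 0.5 next to it gives sqrt(22).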
Quadratic Form Distance Functions
The neighborhood relations of the bins can be represented by the similarity weights
Let d(i,j) denote the distance between the cells that correspond to bins i and j
For shells, the bin distance is the difference between the corresponding radii
For sectors, the bin distance is the angle between the sector centers
Quadratic Form Distance Functions
Given an appropriate bin distance function d, the similarity matrix can be computed as
$a_{ij} = e^{-\sigma \cdot d(i,j)}$
where the parameter σ controls the global shape of the similarity matrix.
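A sketch of building the similarity matrix from bin distances in Python. Assumptions: each shell is represented by a single radius (e.g. its mid radius), sector centers are unit vectors, and the bin distances are used without further normalization; σ = 1 in the demo is an arbitrary choice. The demo uses three hypothetical 6-bin shell histograms where the peak of b is one bin from that of a and the peak of c three bins away: unlike the Euclidean distance, the quadratic form distance now ranks b closer to a than c.

```python
import math


def shell_bin_distances(radii):
    """d(i, j) = difference of the representative shell radii."""
    return [[abs(ri - rj) for rj in radii] for ri in radii]


def sector_bin_distances(centers):
    """d(i, j) = angle between the unit vectors of the sector centers."""
    def angle(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding errors
    return [[angle(u, v) for v in centers] for u in centers]


def similarity_matrix(dist, sigma):
    """a_ij = exp(-sigma * d(i, j))."""
    return [[math.exp(-sigma * d) for d in row] for row in dist]


def quadratic_form_distance(p, q, A):
    diff = [pi - qi for pi, qi in zip(p, q)]
    n = len(diff)
    return math.sqrt(sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n)))


# Demo: 6 shells with mid radii 0.5, 1.5, ..., 5.5 (hypothetical), sigma = 1
A = similarity_matrix(shell_bin_distances([0.5 + i for i in range(6)]), sigma=1.0)
a = [0, 1, 0, 0, 0, 0]
b = [0, 0, 1, 0, 0, 0]
c = [0, 0, 0, 0, 1, 0]
print(quadratic_form_distance(a, b, A), quadratic_form_distance(a, c, A))  # roughly 1.12 and 1.38
```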
Invariance Properties
During normalization, all objects are translated and rotated
Translation: the center of mass of each object is mapped onto the origin
Rotation: a principal axes transform (PAT) is performed
This generally leads to a unique orientation of the object
Principal Axes Transform
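Compute the covariance matrix of a given 3D point set $p_i = (x_i, y_i, z_i)^T$, i = 1, ..., n, with center of mass $\bar{p}$:
$C = \frac{1}{n} \sum_{i=1}^{n} (p_i - \bar{p})(p_i - \bar{p})^T$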
Principal Axes Transform
The eigenvectors of this matrix represent the principal axes of the original 3D point set
The eigenvalues indicate the variance of the points in the respective directions
As a result of the PAT, all covariances of the transformed points vanish
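A pure-Python sketch of this normalization: translate the centroid (the center of mass, assuming equal point weights) to the origin, compute the covariance matrix, diagonalize it with cyclic Jacobi rotations, and express the points in the eigenvector basis. The Jacobi method is a standard eigenvalue algorithm swapped in here; a library routine such as NumPy's `numpy.linalg.eigh` would normally be used instead. Eigenvector signs and equal eigenvalues are not disambiguated, which is one reason the orientation is only "generally" unique. All names are hypothetical.

```python
import math


def covariance_matrix(points):
    """3x3 covariance of points that are already centered at the origin (1/n normalization)."""
    n = len(points)
    return [[sum(p[i] * p[j] for p in points) / n for j in range(3)] for i in range(3)]


def jacobi_eigen(C, sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric 3x3 matrix."""
    A = [row[:] for row in C]
    V = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-22:
            break  # off-diagonal part has vanished
        for p in range(3):
            for q in range(p + 1, 3):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new A[p][q] becomes zero
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(3):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(3)], V


def principal_axes_transform(points):
    """Center the points, then rotate them onto their principal axes (largest variance first)."""
    n = len(points)
    center = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - center[i] for i in range(3)] for p in points]
    eigenvalues, V = jacobi_eigen(covariance_matrix(centered))
    order = sorted(range(3), key=lambda k: -eigenvalues[k])
    # New coordinate k is the projection onto the eigenvector with the k-th largest eigenvalue
    transformed = [[sum(p[i] * V[i][k] for i in range(3)) for k in order] for p in centered]
    return transformed, [eigenvalues[k] for k in order]
```

After the transform, the covariance matrix of the transformed points is (numerically) diagonal, with the variances along the principal axes on its diagonal.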
Extensibility of Histogram Models
Besides spatial properties, thematic properties can also be considered
A general approach to managing both thematic and spatial properties is to use combined histograms
The bins of a combined histogram are the Cartesian product of the bins of the individual histograms
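A sketch of a combined histogram in Python, assuming each surface point has already been assigned a spatial bin and a bin of some hypothetical thematic attribute. Each cell of the Cartesian product gets its own bin; summing over the thematic bins recovers the purely spatial histogram. The function name and bin numbering are illustrative.

```python
def combined_histogram(samples, num_spatial, num_thematic):
    """samples: (spatial_bin, thematic_bin) pairs, one per surface point.

    Every (spatial, thematic) pair of the Cartesian product has its own bin,
    numbered spatial_bin * num_thematic + thematic_bin.
    """
    hist = [0] * (num_spatial * num_thematic)
    for s, t in samples:
        hist[s * num_thematic + t] += 1
    return hist
```

The dimensionality is the product of the individual histogram sizes, which is why the dimensionality reduction discussed under query processing matters.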
Query Processing
For quadratic form distance functions, the evaluation time for a single database object increases quadratically with the dimension
Optimal Multistep k-Nearest Neighbor Search
To achieve good performance, the paradigm of multistep query processing is used
An index-based filter step produces a set of candidates
A refinement step performs the expensive exact evaluation of the candidates
The filter step is responsible for completeness, the refinement step for correctness
Optimal Multistep k-Nearest Neighbor Search
Based on a multidimensional index structure, the filter step performs an incremental ranking:
objects are reported in order of increasing filter distance to the query
To guarantee that the filter step causes no false dismissals, the filter distance must lower-bound the object distance: $d_f(p,q) \le d_o(p,q)$,
where $d_f$ is the filter distance and $d_o$ the object distance
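A sketch of the optimal multistep k-nearest neighbor search in Python. Assumptions: the index-based incremental ranking is simulated by sorting all objects by filter distance, and `filter_dist` must lower-bound `exact_dist`. The loop stops as soon as the next filter distance exceeds the current k-th exact distance, because no later candidate can then enter the result. Names are hypothetical.

```python
import heapq
import math


def multistep_knn(query, objects, k, filter_dist, exact_dist):
    """Return (k nearest objects by exact distance, number of exact evaluations)."""
    # Stand-in for the incremental ranking an index (e.g. an X-tree) would deliver
    ranking = sorted(objects, key=lambda o: filter_dist(query, o))
    result = []  # max-heap of the current k best: entries (-exact_distance, rank, object)
    refinements = 0
    for rank, obj in enumerate(ranking):
        if len(result) == k and filter_dist(query, obj) > -result[0][0]:
            break  # lower bound exceeds the k-th exact distance: no false dismissals possible
        d = exact_dist(query, obj)  # expensive refinement step
        refinements += 1
        if len(result) < k:
            heapq.heappush(result, (-d, rank, obj))
        elif d < -result[0][0]:
            heapq.heapreplace(result, (-d, rank, obj))
    nearest = [obj for _, _, obj in sorted(result, key=lambda e: -e[0])]
    return nearest, refinements
```

As a toy example, the Euclidean distance on the first two coordinates lower-bounds the full 4D Euclidean distance and can serve as the filter, e.g. `multistep_knn(q, objs, 5, lambda a, b: math.dist(a[:2], b[:2]), math.dist)`.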
Reduction in Dimensionality of Quadratic Forms
Objects in high-dimensional spaces are commonly managed by reducing their dimensionality
Typical techniques are Principal Component Analysis, the Discrete Fourier Transform, similarity matrix decomposition, feature subselection, etc.
These approaches can also be applied to quadratic form distances
Reduction in Dimensionality of Quadratic Forms
An algorithm to reduce the similarity matrix from a high-dimensional space down to a low-dimensional space was developed in the context of multimedia databases.
The method guarantees three properties:
the reduced distance function is a lower bound of the given high-dimensional distance function;
the reduced distance function is again a quadratic form;
the reduced distance function is the greatest of all lower-bounding distance functions in the reduced space.
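One construction with these three properties, sketched in Python for the special case where the reduction simply keeps the first r coordinates (e.g. after a PCA-style rotation): eliminate the dropped coordinates from A via the Schur complement $A_{11} - A_{12} A_{22}^{-1} A_{21}$, which equals the minimum of $x A x^T$ over the dropped coordinates. Whether this matches the cited algorithm in every detail is an assumption. Eliminating one dimension at a time gives the same result as eliminating all of them at once. A is assumed symmetric positive definite; names are hypothetical.

```python
import math


def drop_last_dimension(A):
    """Schur complement w.r.t. the last coordinate: min of x A x^T over that coordinate."""
    n = len(A)
    pivot = A[n - 1][n - 1]  # positive for a positive definite A
    return [[A[i][j] - A[i][n - 1] * A[n - 1][j] / pivot for j in range(n - 1)]
            for i in range(n - 1)]


def reduce_quadratic_form(A, target_dim):
    """Reduced similarity matrix for the first target_dim coordinates."""
    while len(A) > target_dim:
        A = drop_last_dimension(A)
    return A


def qf_distance(p, q, A):
    """Quadratic form distance using only the first len(A) coordinates of p and q."""
    diff = [a - b for a, b in zip(p, q)]
    n = len(A)
    value = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(0.0, value))  # guard against tiny negative rounding errors
```

The reduced matrix is again a quadratic form, and `qf_distance(p, q, reduce_quadratic_form(A, r))` never exceeds `qf_distance(p, q, A)`, so it can serve as the filter distance in the multistep search.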
Experimental Evaluation
Data are taken from the Brookhaven Protein Data Bank.
Molecules are represented by surface points for the computation of shape histograms.
The reduced feature vectors for the filter step are managed by an X-tree of dimension 10.
Experimental Evaluation
The similarity matrices are computed by an adapted formula, where the similarity weights $a_{ij}$ of bins i and j are defined as
$a_{ij} = e^{-\sigma \cdot d(i,j)}$
with σ = 10.
Basic Similarity Search
Classification by Shape Similarity
Every class contains at least two molecules
After preprocessing, 3422 proteins have been classified into 281 classes
Three models have been considered: the pure shell model, the pure sector model and the combined model
The combined model achieves the best accuracy
Classification by Shape Similarity