Wei-Ta Chu
2010/10/21
Spectral Texture Features
Gabor Texture
The Gabor representation has been shown to be optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency.
These filters can be considered as orientation- and scale-tunable edge and line (bar) detectors.
The statistics of these microfeatures in a given region are often used to characterize the underlying texture information.
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Fourier coefficients depend on the entire image (global), so we lose spatial information.
Objective: local spatial frequency analysis.
Gabor kernels look like a Fourier basis multiplied by a Gaussian.
Gabor filters come in pairs: symmetric and anti-symmetric.
We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies.
(Figure: a symmetric kernel and an anti-symmetric kernel.)
The image I(x,y) is convolved with Gabor filters h_mn (M scales x N orientations in total); each filter has an even (symmetric) and an odd (anti-symmetric) part.
Use the first and second moments (the mean and standard deviation of each filtered output) for each scale and orientation.
Features: e.g., 4 scales and 6 orientations give 4 x 6 x 2 = 48 dimensions.
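Below is a minimal Python sketch of this feature pipeline (NumPy + SciPy). The kernel size, Gaussian width, and base frequency are illustrative choices, not the exact filter design of Manjunath and Ma:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(freq, theta, sigma=3.0, size=15):
    """Complex Gabor kernel: a complex sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid runs along orientation theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_features(image, scales=4, orientations=6):
    """Mean and std of each filter response -> 2 * scales * orientations dims."""
    image = np.asarray(image, dtype=float)
    feats = []
    for m in range(scales):
        freq = 0.05 * (2 ** m)              # octave-spaced center frequencies
        for n in range(orientations):
            theta = n * np.pi / orientations
            k = gabor_kernel(freq, theta)
            # Even (real) and odd (imaginary) responses combined as magnitude.
            resp = np.abs(convolve(image, k.real) + 1j * convolve(image, k.imag))
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)                  # 4 scales x 6 orientations -> 48 dims
```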
Arranging the mean energy in a 2D form (scale along one axis, orientation along the other):
structured: localized pattern
oriented (or directional): column pattern
granular: row pattern
random: random pattern
Homogeneous Texture Descriptor
The frequency plane partition is uniform along the angular direction (30º) and non-uniform along the radial direction (on an octave scale).
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Gabor Function
On top of the feature channels, a 2D Gabor function (a modulated Gaussian) is applied to each individual channel.
This is equivalent to weighting the Fourier transform coefficients of the image with a Gaussian centered at the frequency channel defined above.
Each channel filters a specific type of texture.
Partition the frequency domain into 30 channels (each modeled by a 2D Gabor function).
Compute the energy and energy deviation for each channel.
Compute the mean and standard deviation of the frequency coefficients.
HTD = {f_DC, f_SD, e_1, e_2, …, e_30, d_1, d_2, …, d_30}
f_DC and f_SD are the mean and standard deviation of the image; e_i and d_i are the mean energy and energy deviation of the corresponding i-th channel.
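A rough sketch of the channel-energy computation. It assumes hard channel boundaries instead of the Gaussian (Gabor) weighting the descriptor actually specifies, and the octave edges and log scaling are illustrative:

```python
import numpy as np

def htd_features(image):
    """HTD-style features: energy statistics in 30 frequency channels
    (6 angular x 5 octave radial bands), plus f_DC and f_SD."""
    image = np.asarray(image, dtype=float)
    P = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2   # power spectrum
    h, w = image.shape
    y, x = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    radius = np.hypot(x, y) / (min(h, w) / 2)   # normalized radial frequency
    angle = np.mod(np.arctan2(y, x), np.pi)     # fold orientations to [0, pi)
    feats = [image.mean(), image.std()]         # f_DC and f_SD
    octave_edges = [1 / 32, 1 / 16, 1 / 8, 1 / 4, 1 / 2, 1.0]
    for r in range(5):                          # radial index (octave scale)
        for a in range(6):                      # angular index (30-degree bins)
            mask = ((radius >= octave_edges[r]) & (radius < octave_edges[r + 1])
                    & (angle >= a * np.pi / 6) & (angle < (a + 1) * np.pi / 6))
            channel = P[mask]
            if channel.size == 0:               # tiny images may miss a band
                feats += [0.0, 0.0]
            else:
                feats.append(np.log1p(channel.mean()))   # e_i: mean energy
                feats.append(np.log1p(channel.std()))    # d_i: energy deviation
    return np.array(feats)                      # 2 + 30 + 30 = 62 values
```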
Distance Measure
Matching is done with a normalized L1-type distance, d(i,j) = Σ_k |f_i(k) − f_j(k)| / α(k), where α(k) is the standard deviation of feature k over the database (following Manjunath and Ma).
Resources: http://vision.ece.ucsb.edu/texture/feature.html
On-line demo: http://vision.ece.ucsb.edu/texture/mpeg7/index.html
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Example: Browsing Satellite Images
Find a vegetation patch that looks like this region
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
(Figure: retrieval results; (b) parts of highway; (c) region containing some buildings (center of the image, toward the left); (d) a number marked on the image (lower left corner).)
Wavelet Features
Wavelet transforms refer to the decomposition of a signal with a family of basis functions, via recursive filtering and subsampling.
At each level, the transform decomposes a 2D signal into four subbands, often referred to as LL, LH, HL, HH (L = low, H = high).
(Figure: two-level subband layout; the first-level LL band is split into LL2, HL2, LH2, HH2, alongside the first-level HL1, LH1, HH1 bands.)
Use the mean and standard deviation of the energy distribution in each subband at each level.
PWT (pyramid-structured wavelet transform): recursively decompose the LL band; results in a 30-dimensional feature vector.
TWT (tree-structured wavelet transform): some information appears in the middle frequency channels, so decomposition is not restricted to the LL band; results in a 40 x 2 = 80-dimensional feature vector.
(Figure: original image and its PWT and TWT decompositions.)
T. Chang and C.C.J. Kuo, “Texture analysis and classification with tree-structured wavelet transform,” IEEE Trans. on Image Processing, vol. 2, no. 4, 1993, pp. 429-441.
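A sketch of PWT-style features using the PyWavelets package. With 3 decomposition levels this variant yields 3 x 3 x 2 + 2 = 20 values; the exact dimensionality depends on the decomposition depth and statistics chosen:

```python
import numpy as np
import pywt

def pwt_features(image, levels=3, wavelet='db4'):
    """Pyramid-structured wavelet features: recursively decompose the LL
    band and keep the mean and standard deviation of the absolute
    coefficients in every subband."""
    feats = []
    ll = np.asarray(image, dtype=float)
    for _ in range(levels):
        ll, (lh, hl, hh) = pywt.dwt2(ll, wavelet)   # one decomposition level
        for band in (lh, hl, hh):
            feats.extend([np.abs(band).mean(), np.abs(band).std()])
    feats.extend([np.abs(ll).mean(), np.abs(ll).std()])   # final LL band
    return np.array(feats)   # levels x 3 bands x 2 + 2 values
```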
Edge Histogram Descriptor
Park, et al. “Efficient use of local edge histogram descriptor,” Proc. of ACM International Workshop on Standards, Interoperability and Practices, pp. 51-54, 2000.
Introduction
Spatial distribution of edges: the edge histogram descriptor (EHD).
Divide the image into 4 x 4 subimages, and generate the edge histogram based on the edges in the subimages.
Edges are categorized into five types: vertical, horizontal, 45º diagonal, 135º diagonal, and nondirectional.
A total of 5 x 16 = 80 histogram bins.
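A sketch of the local histogram computation. The five 2x2 masks below follow the MPEG-7 edge detectors; the block size and threshold are illustrative (the standard averages pixel intensities inside larger image blocks before applying the masks):

```python
import numpy as np

# 2x2 edge detector masks for the five edge types
# (vertical, horizontal, 45-degree, 135-degree, nondirectional).
MASKS = np.array([
    [[1, -1], [1, -1]],                        # vertical
    [[1,  1], [-1, -1]],                       # horizontal
    [[np.sqrt(2), 0], [0, -np.sqrt(2)]],       # 45-degree diagonal
    [[0, np.sqrt(2)], [-np.sqrt(2), 0]],       # 135-degree diagonal
    [[2, -2], [-2,  2]],                       # nondirectional
])

def local_edge_histogram(image, threshold=11.0):
    """80-bin local EHD sketch: 4x4 subimages x 5 edge types.  Each block
    votes for the edge type with the strongest mask response, if that
    response exceeds a threshold."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    hist = np.zeros((4, 4, 5))
    bh, bw = 2, 2                       # block size (illustrative choice)
    for by in range(0, h - bh + 1, bh):
        for bx in range(0, w - bw + 1, bw):
            block = image[by:by + bh, bx:bx + bw]
            strengths = np.abs((MASKS * block).sum(axis=(1, 2)))
            if strengths.max() >= threshold:
                sub_y, sub_x = by * 4 // h, bx * 4 // w
                hist[sub_y, sub_x, strengths.argmax()] += 1
    # Normalize each subimage's five bins by its block count.
    totals = hist.sum(axis=2, keepdims=True)
    return (hist / np.maximum(totals, 1)).ravel()   # 4 * 4 * 5 = 80 bins
```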
Local Edge Histogram
Global, Semi-Global, and Local Histograms
Global edge histogram: accumulate the five types of edge distributions over all subimages.
Semiglobal edge histogram: accumulate the five edge types over 13 groups of subimages (4 rows, 4 columns, and 5 2x2 neighborhoods, following Park et al.).
Image Matching
Combine the local, semiglobal, and global histograms together.
Total of 150 bins: 80 bins (local) + 5 bins (global) + 65 bins (13 x 5, semiglobal).
The L1 distance measure D(A,B) can be written (following Park et al., with the five global bins weighted by a factor of 5):
D(A,B) = Σ_{i=1}^{80} |h_A(i) − h_B(i)| + 5 Σ_{i=1}^{5} |h^g_A(i) − h^g_B(i)| + Σ_{i=1}^{65} |h^s_A(i) − h^s_B(i)|
This feature is one of the MPEG-7 texture descriptors.
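A sketch of the combined matching distance, assuming the 13 semiglobal groups described above; the factor of 5 on the global bins follows Park et al.:

```python
import numpy as np

def ehd_distance(local_a, local_b, weight_global=5.0):
    """L1 matching over local + global + semiglobal bins.  Inputs are the
    80-bin local histograms; global and semiglobal bins are derived by
    summing groups of subimages (4 rows, 4 columns, 5 2x2 neighborhoods)."""
    def expand(local):
        local = local.reshape(4, 4, 5)
        glob = local.sum(axis=(0, 1))                       # 5 global bins
        rows = [local[r].sum(axis=0) for r in range(4)]     # 4 row groups
        cols = [local[:, c].sum(axis=0) for c in range(4)]  # 4 column groups
        quads = [local[y:y + 2, x:x + 2].sum(axis=(0, 1))   # 5 2x2 groups
                 for (y, x) in [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]]
        semi = np.concatenate(rows + cols + quads)          # 13 * 5 = 65 bins
        return local.ravel(), glob, semi

    la, ga, sa = expand(np.asarray(local_a, dtype=float))
    lb, gb, sb = expand(np.asarray(local_b, dtype=float))
    return (np.abs(la - lb).sum()
            + weight_global * np.abs(ga - gb).sum()
            + np.abs(sa - sb).sum())
```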
Performance Comparison
Retrieval performance of different texture features for the Corel photo databases.
The L1 distance is used to compute the dissimilarity between images.
For MRSAR, the Mahalanobis distance is used.
(Figure: number of relevant images vs. number of top matches considered, for MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura (traditional).)
Manjunath and Ma, Chapter 12 of Image Database: Search and Retrieval of Digital Imagery, edited by V. Castelli and L.D. Bergman, John Wiley & Sons, 2002.
Performance Comparison
Retrieval performance of different texture features for the Brodatz texture image set.
(Figure: percentage of retrieving all correct patterns vs. number of top matches considered, for MRSAR (M), Gabor, TWT, PWT, MRSAR, Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura (traditional).)
Shape for CBIR
Shape Features
MPEG-7 provides contour-based shape and region-based shape tools.
(Figure: examples of contour-based similarity and region-based similarity.)
Bober, “MPEG-7 visual shape descriptors,” IEEE Trans. on CSVT, vol. 11, no. 6, pp. 716-719, 2001.
Region-Based Shape Descriptor
The region-based shape descriptor expresses the pixel distribution within a 2D object or region.
It can describe complex objects consisting of multiple disconnected regions.
2D Angular Radial Transformation (ART): gives a compact and efficient way of describing multiple disjoint regions, and is robust to segmentation noise.
Angular Radial Transform (ART)
For each image, a set of ART coefficients $F_{nm}$ is extracted: $F_{nm} = \int_0^{2\pi}\int_0^1 V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho\, d\rho\, d\theta$, where $f(\rho,\theta)$ is the image intensity in polar coordinates and $V_{nm}(\rho,\theta) = A_m(\theta) R_n(\rho)$ is the separable ART basis function.
The MPEG-7 Visual Part of the XM 4.0, ISO/IEC MPEG99/W3068, Dec. 1999.
W.-Y. Kim and Y.-S. Kim, “A New Region-Based Shape Descriptor,” ISO/IEC MPEG99/M5472, Maui, Hawaii, Dec. 1999.
Contour-Based Shape Descriptor
The contour-based shape descriptor is based on the Curvature Scale-Space (CSS) representation of the contour.
It can distinguish between shapes that have a similar region-based shape (b).
It supports search for shapes that are semantically similar, even with significant intra-class variability (c).
It is robust to significant nonrigid deformations (d) and to perspective transformations (e).
Curvature Scale-Space (CSS)
When comparing shapes, humans tend to decompose shape contours into concave and convex sections.
Features: how prominent the sections are, their length relative to the contour length, and their position and order on the contour.
The CSS representation decomposes the contour into convex and concave sections by determining the inflection points (points at which the curvature is zero).
The CSS image shows how the inflection points change as filtering is applied to the contour:
The x-axis corresponds to the position on the contour (clockwise, starting from an arbitrary point).
The y-axis corresponds to the value of a shape smoothing parameter (as the y-value increases, the amount of smoothing increases).
Any black point in the CSS image signifies that, at the corresponding position and at the corresponding scale, there is an inflection point.
The smoothing is performed iteratively, and for each level the zero crossings of the curvature function are computed.
The CSS image is obtained by plotting all zero-crossing points on a plane.
Mokhtarian and Mackworth, “A theory of multiscale, curvature-based shape representation for planar curves,” IEEE Trans. on PAMI, vol. 14, no. 8, pp. 789-805, 1992.
Shape Descriptor
Based on CSS images, the descriptor consists of:
Eccentricity and circularity values of the original and filtered contours
Number of peaks
The magnitude (height) of the largest peak
The x and y positions of the remaining peaks
Chapter 15 of Introduction to MPEG-7: Multimedia Content Description Interface, edited by Manjunath, et al., John Wiley & Sons, 2002.
Example: The QBIC System
Color: color histogram
Texture: coarseness, contrast, directionality
Shape: area, circularity, eccentricity, major-axis direction
Fusion of multiple types of features often gives better performance.
References
Tamura, et al., “Textural features corresponding to visual perception,” IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-8, no. 6, pp. 460-473, 1978.
Park, et al., “Efficient use of local edge histogram descriptor,” Proc. of ACM International Workshop on Standards, Interoperability and Practices, pp. 51-54, 2000.
Manjunath and Ma, Chapter 12 of Image Database: Search and Retrieval of Digital Imagery, edited by V. Castelli and L.D. Bergman, John Wiley & Sons, 2002.
Bober, “MPEG-7 visual shape descriptors,” IEEE Trans. on CSVT, vol. 11, no. 6, pp. 716-719, 2001.
Multidimensional Indexing Techniques
Types of Content-Based Query
Range search: find all images where feature 1 is within range r1, feature 2 is within range r2, …, and feature n is within range rn.
K-nearest-neighbor search: find the k most similar images to the template.
Within-distance (α-cut): find all images with a similarity score better than α with respect to a template.
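The three query types, written as brute-force NumPy scans over a feature matrix. This is illustrative only; the indexing structures later in this section exist to avoid exactly this linear scan:

```python
import numpy as np

def range_search(db, lo, hi):
    """All items whose every feature lies within [lo_i, hi_i]."""
    return np.where(((db >= lo) & (db <= hi)).all(axis=1))[0]

def knn_search(db, q, k):
    """Indices of the k nearest neighbors of query q (L2 distance)."""
    d = np.linalg.norm(db - q, axis=1)
    return np.argsort(d)[:k]

def alpha_cut(db, q, alpha):
    """All items within distance alpha of the query (within-distance)."""
    return np.where(np.linalg.norm(db - q, axis=1) <= alpha)[0]

# Example: 1000 random 8-dimensional feature vectors.
db = np.random.rand(1000, 8)
q = np.random.rand(8)
print(knn_search(db, q, k=5), alpha_cut(db, q, alpha=0.6))
```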
V. Castelli, “Multidimensional indexing structures for content-based retrieval,” IBM Research Report, 2001.
Curse of Dimensionality
In two dimensions, a circle is well approximated by its minimum bounding square; the ratio of the square area to the circle area is 4/π.
In three dimensions, the ratio is 6/π. In 100 dimensions, the ratio is 4.2 x 10^39.
Indexing schemes that rely on properties of low-dimensionality spaces do not perform well in high-dimensional spaces.
In a high-dimensional space, most data points appear to be almost the same distance from the query sample, which is problematic for the k-nearest-neighbor and α-cut approaches.
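The bounding-cube-to-hypersphere volume ratio can be computed directly. The values for d = 2 and d = 3 reproduce 4/π and 6/π; the exact constant obtained for d = 100 depends on the normalization used, but the point is that the ratio explodes as d grows:

```python
import math

def cube_to_ball_ratio(d):
    """Ratio of the volume of the minimum bounding hypercube (side 2) to
    the volume of the unit hypersphere in d dimensions."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return 2 ** d / ball

print(cube_to_ball_ratio(2))    # 4/pi  ~ 1.27
print(cube_to_ball_ratio(3))    # 6/pi  ~ 1.91
print(cube_to_ball_ratio(100))  # astronomically large
```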
Suppose the features of each vector are independently distributed as standard Gaussian random variables.
A large Gaussian sample in a 3-dimensional space looks like a tight, well-concentrated cloud, but this is not so in a 100-dimensional space.
(Figure: in 100 dimensions, an α-cut with distance threshold 12.5 returns 5.3% of the database, while a threshold of 13 returns 14%.)
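A quick simulation of this concentration effect; the thresholds 12.5 and 13 are the ones quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((100000, 100))   # 100-dim standard Gaussian data
q = rng.standard_normal(100)              # a query drawn the same way

d = np.linalg.norm(db - q, axis=1)        # distances from query to database
print(d.mean(), d.std())                  # tightly concentrated near ~14.2
print((d < 12.5).mean(), (d < 13).mean()) # ~0.05 and ~0.14: a tiny threshold
                                          # change swings the returned fraction
```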
Dimensionality Reduction
The feature space often has a local structure: query images have close neighbors, and therefore nearest-neighbor and α-cut queries can be meaningful.
The features used to represent the images are usually not independent: the feature vectors in the database can be well approximated by their “projections” onto a lower-dimensionality space.
Example
An artificial data set constructed by taking one of the off-line digits, represented by a 64 x 64 pixel grey-level image, and embedding it in a larger image of size 100 x 100.
Each of the resulting images is represented by a point in the 100 x 100 = 10000-dimensional data space.
However, there are only three degrees of freedom: vertical and horizontal translations and rotations. The intrinsic dimensionality is three.
C.M. Bishop, Chapter 12 of Pattern Recognition and Machine Learning, Springer, 2006.
Variable-Subset Selection
Retain some of the dimensions of the feature space and discard the remaining ones.
Goal: minimize the error induced by approximating the original vectors with their lower-dimensionality projections, obtained by a linear transformation of the feature space.
Methods: the Karhunen-Loeve transform (KLT), singular value decomposition (SVD), and principal component analysis (PCA).
These are data-dependent transformations and are computationally expensive, so they are poorly suited for dynamic databases.
Multidimensional Scaling
Non-linear methods to reduce the dimensionality of the feature space; there is no precise definition.
E.g., remapping the space R^n into R^m (m < n) using m transformations, each of which is a combination of appropriate radial basis functions.
E.g., the metric version of multidimensional scaling.
Generally, multidimensional scaling algorithms can provide better reduction than linear methods, but they are much more expensive and data-dependent (poorly suited for dynamic databases).
Beatty and Manjunath, “Dimensionality reduction using multi-dimensional scaling for content-based image retrieval,” Proc. of ICIP, vol. 2, pp. 835-838, 1997.
Dimension Reduction
1.1 Principal Component Analysis (PCA)
Widely used in dimensionality reduction, lossy data compression, feature extraction, and data visualization.
Also known as the Karhunen-Loeve transform.
Two commonly used definitions:
Orthogonal projection of the data onto a lower-dimensional linear space such that the variance of the projected data is maximized.
Linear projection that minimizes the average projection cost (the mean squared distance between the data points and their projections).
C.M. Bishop, Chapter 12 of Pattern Recognition and Machine Learning, Springer, 2006.
Maximum Variance Formulation
Data set of observations {x_n} with dimensionality D. Goal: project the data onto a space having dimensionality M < D while maximizing the variance of the projected data. Assume the value of M is given.
Begin with M = 1. Data are projected onto a line in the D-dimensional space. The direction of the line is denoted by a D-dimensional unit vector u_1.
Each data point x_n is then projected onto a scalar value $u_1^T x_n$.
LA Recap: Orthogonal Projection
For a vector $\mathbf{a}$ and a direction $\mathbf{u}$:
$\mathrm{proj}_{\mathbf{u}}\,\mathbf{a} = \frac{\mathbf{a}\cdot\mathbf{u}}{\|\mathbf{u}\|^2}\,\mathbf{u}$ (the vector component of $\mathbf{a}$ along $\mathbf{u}$)
$\mathbf{a} - \mathrm{proj}_{\mathbf{u}}\,\mathbf{a}$ (the vector component of $\mathbf{a}$ orthogonal to $\mathbf{u}$)
$\mathbf{a}\cdot\mathbf{u} = \|\mathbf{a}\|\,\|\mathbf{u}\|\cos\theta$
When $\mathbf{u}$ is a unit vector, the scalar projection of $\mathbf{a}$ onto $\mathbf{u}$ is simply $\mathbf{u}^T\mathbf{a}$.
The mean of the projected data is $u_1^T \bar{x}$, where $\bar{x} = \frac{1}{N}\sum_n x_n$.
The variance of the projected data is given by $\frac{1}{N}\sum_n (u_1^T x_n - u_1^T \bar{x})^2 = u_1^T S u_1$,
where S is the covariance matrix defined by $S = \frac{1}{N}\sum_n (x_n - \bar{x})(x_n - \bar{x})^T$.
Maximize the projected variance $u_1^T S u_1$ with respect to u_1, subject to the normalization constraint $u_1^T u_1 = 1$.
Introduce a Lagrange multiplier denoted by λ_1 and maximize $u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1)$.
By setting the derivative with respect to u_1 equal to zero, we see that this quantity has a stationary point when $S u_1 = \lambda_1 u_1$:
u_1 must be an eigenvector of S, and the variance $u_1^T S u_1 = \lambda_1$ is maximized when u_1 is the eigenvector having the largest eigenvalue λ_1.
The optimal linear projection for which the variance of the projected data is maximized is defined by the M eigenvectors u_1, …, u_M of the data covariance matrix S corresponding to the M largest eigenvalues λ_1, …, λ_M.
Principal component analysis thus involves evaluating the mean and the covariance matrix of the data set, and then finding the M eigenvectors of S corresponding to the M largest eigenvalues.
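A compact NumPy implementation of this recipe (eigendecomposition of S, keeping the M leading eigenvectors):

```python
import numpy as np

def pca(X, M):
    """PCA by eigendecomposition of the data covariance matrix S,
    as in the maximum-variance formulation above."""
    mean = X.mean(axis=0)
    Xc = X - mean
    S = Xc.T @ Xc / len(X)                 # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:M]  # pick the M largest eigenvalues
    U = eigvecs[:, order]                  # principal directions u_1..u_M
    return Xc @ U, U, eigvals[order], mean

# Project correlated 5-dimensional data onto its two principal components.
X = np.random.randn(500, 5) @ np.random.randn(5, 5)
Z, U, lam, mu = pca(X, M=2)
```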
Covariance
(Figure: two scatter plots. High variance with low covariance implies no inter-dimension dependency; high variance with high covariance implies inter-dimension dependency.)
Minimum Error Formulation
Each data point can be represented by a linear combination of the basis vectors: $x_n = \sum_{i=1}^{D} \alpha_{ni} u_i$, with $\alpha_{ni} = x_n^T u_i$.
Our goal is to approximate this data point using a representation involving a restricted number M < D of variables, corresponding to a projection onto a lower-dimensional (M-dimensional) subspace.
Minimize the approximation error J over the choice of basis.
The minimum value of J is obtained by assigning to the discarded directions the eigenvectors having the D − M smallest eigenvalues, so $J = \sum_{i=M+1}^{D} \lambda_i$; hence the eigenvectors defining the principal subspace are those corresponding to the M largest eigenvalues.
L.I. Smith, “A tutorial on Principal Component Analysis,” http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
J. Shlens, “A tutorial on Principal Component Analysis,” http://www.cs.cmu.edu/~elaw/papers/pca.pdf
Applications of PCA
(Figures: the mean vector and the first four PCA eigenvectors for the off-line digits data set; the eigenvalue spectrum and the sum of the discarded eigenvalues; an original example together with its PCA reconstructions obtained by retaining M principal components.)
Eigenfaces
Eigenfaces for face recognition is a famous application of PCA.
Eigenfaces capture the majority of the variance in face data.
Projecting a face onto the eigenfaces gives its face-feature representation.
M. Turk and A.P. Pentland, “Face recognition using eigenfaces,” Proc. of CVPR, pp. 586-591, 1991.
1.2 Singular Value Decomposition (SVD)
SVD works directly on the data; PCA works on the covariance matrix of the data.
The SVD technique examines the entire set of data and rotates the axes to maximize variance along the first few dimensions.
Problems: #1: find concepts in text; #2: reduce dimensionality.
http://www.cs.cmu.edu/~guestrin/Class/10701-S06/Handouts/recitations/recitation-pca_svd.ppt
SVD - Definition
A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
A: n x m matrix (e.g., n documents, m terms)
U: n x r matrix (n documents, r concepts)
Λ: r x r diagonal matrix (strength of each “concept”; r is the rank of the matrix)
V: m x r matrix (m terms, r concepts)
SVD - Properties
“Spectral decomposition” of the matrix: $A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots$, a weighted sum of rank-1 matrices. For the example matrix

A =
1 1 1 0 0
2 2 2 0 0
1 1 1 0 0
5 5 5 0 0
0 0 0 2 2
0 0 0 3 3
0 0 0 1 1

there are two such terms (u_1, λ_1, v_1 and u_2, λ_2, v_2).
SVD - Interpretation
“Documents,” “terms,” and “concepts”:
U: document-to-concept similarity matrix
V: term-to-concept similarity matrix
Λ: its diagonal elements give the “strength” of each concept
Projection: the best axis to project on (“best” = minimum sum of squares of projection errors)
SVD - Example
A = U Λ V^T. Rows of A are documents (the first four are CS documents, the last three MD documents); columns are the terms data, inf., retrieval, brain, lung:

A =
1 1 1 0 0
2 2 2 0 0
1 1 1 0 0
5 5 5 0 0
0 0 0 2 2
0 0 0 3 3
0 0 0 1 1

U (document-to-concept similarity matrix; first column = CS concept, second = MD concept):
0.18 0
0.36 0
0.18 0
0.90 0
0    0.53
0    0.80
0    0.27

Λ (diagonal entries are the “strength” of each concept; 9.64 is the strength of the CS concept):
9.64 0
0    5.29

V^T (term-to-concept similarity matrix):
0.58 0.58 0.58 0    0
0    0    0    0.71 0.71
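The example can be checked directly with NumPy (up to sign flips in the singular vectors, which are arbitrary):

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 2))          # [9.64 5.29 0. 0. 0.] -> rank 2, two concepts
print(np.round(U[:, :2], 2))   # doc-to-concept columns (signs may flip)
print(np.round(Vt[:2], 2))     # term-to-concept rows
```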
SVD - Dimensionality Reduction
Q: How exactly is the dimensionality reduction done?
A: Set the smallest singular values to zero. In the example above, the weaker MD concept (λ_2 = 5.29) is discarded, keeping only the CS concept (λ_1 = 9.64).
Keeping only the largest singular value gives a rank-1 approximation:

A ≈
0.18
0.36
0.18
0.90
0
0
0
x 9.64 x [0.58 0.58 0.58 0 0]
=
1 1 1 0 0
2 2 2 0 0
1 1 1 0 0
5 5 5 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
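Continuing the NumPy example above, zeroing all but the largest singular value reproduces this rank-1 approximation:

```python
import numpy as np

# A, U, s, Vt as computed in the previous snippet.
k = 1                               # number of singular values to keep
A1 = U[:, :k] * s[:k] @ Vt[:k]      # rank-1 reconstruction
print(np.round(A1, 2))              # CS block survives; MD block is zeroed
```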
2.1 Multidimensional Scaling (MDS)
Goal: represent data points in some lower-dimensional space such that the distances between points in that space correspond to the distances between points in the original space.
http://www.analytictech.com/networks/mds.htm
What MDS does is find a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to some function of the input matrix, according to a criterion function called stress.
Stress measures the degree of correspondence between the distances among points implied by the MDS map and the input matrix; a common form is $\mathrm{stress} = \sqrt{\sum_{ij} (d_{ij} - z_{ij})^2 / \sum_{ij} d_{ij}^2}$,
where d_ij refers to the distance between points i and j in the original space, and z_ij refers to the distance between points i and j on the map.
The true dimensionality of the data will be revealed by the rate of decline of stress as the dimensionality of the map increases.
Algorithm:
1. Assign points to arbitrary coordinates in p-dimensional space.
2. Compute Euclidean distances among all pairs of points to form a matrix.
3. Compare this matrix with the input matrix by evaluating the stress function; the smaller the value, the greater the correspondence between the two.
4. Adjust the coordinates of each point in the direction that best minimizes stress.
5. Repeat steps 2 through 4 until stress won't get any lower.
T.F. Cox and M.A.A. Cox, Multidimensional Scaling, 2nd edition, Chapman & Hall/CRC, 2000.
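A minimal sketch using scikit-learn's metric MDS on a precomputed distance matrix; the stress_ attribute reports the residual mismatch between original and map distances:

```python
import numpy as np
from sklearn.manifold import MDS

# Pairwise distances in the original (here 10-dimensional) space.
X = np.random.rand(200, 10)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

# Metric MDS maps the points to 2D while trying to preserve D.
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
Y = mds.fit_transform(D)
print(Y.shape, mds.stress_)
```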
2.2 Isometric Feature Mapping (Isomap)
Examples
J.B. Tenenbaum, V. de Silva, and J.C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, pp. 2319-2323, 2000.
Estimate the geodesic distance between faraway points, given only input-space distances, by adding up a sequence of “short hops” between neighboring points.
Algorithm:
Step 1: construct the neighborhood graph. Determine which points are neighbors on the manifold; connect each point to all points within some fixed radius ε, or to its K nearest neighbors.
Step 2: compute shortest paths. Estimate the geodesic distance between all pairs of points on the manifold by computing their shortest paths in the graph.
Step 3: construct the d-dimensional embedding. Apply MDS to the matrix of graph distances, constructing an embedding of the data.
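A sketch using scikit-learn's Isomap, which bundles the three steps; here it is run on synthetic swiss-roll-like data, with n_neighbors playing the role of K in step 1:

```python
import numpy as np
from sklearn.manifold import Isomap

# A rolled-up 2D manifold embedded in 3D.
t = 3 * np.pi * (1 + 2 * np.random.rand(800))
X = np.column_stack([t * np.cos(t), 30 * np.random.rand(800), t * np.sin(t)])

# Steps 2-3 (graph shortest paths + MDS embedding) run inside fit_transform.
emb = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(emb.shape)   # (800, 2): the unrolled coordinates
```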
2.3 Locally Linear Embedding (LLE)
Eliminates the need to estimate pairwise distances between widely separated data points: LLE recovers global nonlinear structure from locally linear fits.
S.T. Roweis and L.K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, pp. 2323-2326, 2000. http://www.cs.toronto.edu/~roweis/lle/publications.html
1. Characterize the local geometry by linear coefficients W_ij that reconstruct each data point from its neighbors.
2. Minimize the reconstruction errors $\varepsilon(W) = \sum_i \big| x_i - \sum_j W_{ij} x_j \big|^2$.
3. Choose d-dimensional coordinates Y_i to minimize the embedding cost function $\Phi(Y) = \sum_i \big| y_i - \sum_j W_{ij} y_j \big|^2$.
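A corresponding sketch with scikit-learn; the data here is a random stand-in for a high-dimensional feature set:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.rand(500, 10)        # stand-in for high-dimensional data
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)           # d-dimensional coordinates Y_i
print(lle.reconstruction_error_)   # residual of the locally linear fits
```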
Example
(Figure: the bottom images correspond to points along the top-right path, illustrating one particular mode of variability in pose and expression.)
Indexing Structures
Indexing
After feature selection and dimensionality reduction, the third step is the selection of an appropriate indexing structure.
Vector-space index methods: index the feature vectors directly.
Metric-space index methods: index pairwise distances between objects.
1. Vector Space Methods
Non-hierarchical methods: brute force; mapping a d-dimensional space onto the real line; partitioning the space into non-overlapping cells; …
Recursive partitioning methods: quadtree, k-d tree, R-tree, …
Projection-based methods: supporting fixed-radius nearest-neighbor searches; supporting (1+ε) nearest-neighbor searches.
1.1 Quadtree
Quadtrees are trees of degree 2^d, where d is the dimension of the sample space.
Each step of the decomposition consists of identifying d splitting points (one along each dimension) and partitioning the space by means of (d-1)-dimensional hyperplanes passing through the splitting points and orthogonal to the corresponding coordinate axes.
Splitting a node of a d-dimensional quadtree thus divides each dimension into two parts, defining 2^d hyperrectangles.
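A toy point-region quadtree for d = 2 (so each split creates 2^2 = 4 children); the node capacity and unit-square coordinate range are illustrative, and duplicate-point handling is ignored:

```python
import random

class QuadtreeNode:
    """Point-region quadtree over the unit square."""
    CAPACITY = 4                     # max points per leaf before splitting

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half   # cell center and extent
        self.points, self.children = [], None

    def insert(self, x, y):
        if self.children is None:
            self.points.append((x, y))
            if len(self.points) > self.CAPACITY:
                self._split()
        else:
            self._child(x, y).insert(x, y)

    def _split(self):
        # Create 2**2 = 4 child cells and push the points down.
        h = self.half / 2
        self.children = [QuadtreeNode(self.cx + dx * h, self.cy + dy * h, h)
                         for dx in (-1, 1) for dy in (-1, 1)]
        pts, self.points = self.points, []
        for (x, y) in pts:
            self._child(x, y).insert(x, y)

    def _child(self, x, y):
        return self.children[2 * (x >= self.cx) + (y >= self.cy)]

root = QuadtreeNode(0.5, 0.5, 0.5)
for _ in range(100):
    root.insert(random.random(), random.random())
```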
Variations:
Region quadtree: decomposes the space into squares.
Point quadtree: adaptive decomposition where the splitting points depend on the data distribution.
Extremely popular in geographic information system applications.
Drawbacks:
Each split node always has 2^d children, so the quadtree is in general very sparse; that is, most of its nodes are empty.
Quadtrees are inefficient for exact α-cut and nearest-neighbor queries, since hyperspheres are not well approximated by hyperrectangles.
Poor performance in high-dimensional spaces.
Example
“Quadtree” is used to describe a class of hierarchical data structures whose common property is that they are based on recursive decomposition of space.
Samet, “The quadtree and related hierarchical data structures,” ACM Computing Surveys, vol. 16, no. 2, pp. 187-260, 1984.
1.2 K-D (K-Dimensional) Tree
The k-d tree is a binary search tree that represents a recursive subdivision of the universe into subspaces by means of (d-1)-dimensional hyperplanes.
E.g., for d = 3, the splitting hyperplanes are alternately perpendicular to the x-, y-, and z-axes.
(Figure: a vertical split crossing c3, then horizontal splits crossing p10 and c7.)
V. Gaede and O. Gunther, “Multidimensional access methods,” ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.
Disadvantages: the structure is sensitive to the order in which the points are inserted, and data points are scattered all over the tree.
Adaptive k-d tree: choose each split such that about the same number of points falls on both sides; split points are not part of the input data, and all data points are stored in the leaves.
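In practice one rarely hand-rolls a k-d tree; SciPy's cKDTree supports both k-nearest-neighbor and fixed-radius queries:

```python
import numpy as np
from scipy.spatial import cKDTree

data = np.random.rand(10000, 8)       # 10000 points in 8 dimensions
tree = cKDTree(data)                  # recursive axis-aligned subdivision

q = np.random.rand(8)
dist, idx = tree.query(q, k=5)        # 5 nearest neighbors of the query
within = tree.query_ball_point(q, r=0.5)   # fixed-radius (alpha-cut) search
```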
1.3 R-Tree
An R-tree corresponds to a hierarchy of nested d-dimensional intervals (boxes).
Each node v of the R-tree corresponds to an interval, the bounding box of its children.
R-trees represent spatial objects by intervals in several dimensions.
Guttman, “R-trees: a dynamic index structure for spatial searching,” Proc. of SIGMOD, 1984.
1.4 Fixed-Radius Nearest-Neighbor Searches
Project the data points onto the individual coordinate axes, producing d sorted lists, one per dimension.
In response to a query, the algorithm retrieves from each list the points whose coordinate lies within r of the corresponding coordinate of the query point.
The candidate data points are then exhaustively searched.
Friedman, et al., “An algorithm for finding nearest neighbors,” IEEE Trans. on Computers, pp. 1000-1006, 1975.
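A sketch of this projection-based search, assuming candidates are intersected across all d lists before the final exhaustive check:

```python
import numpy as np

def fixed_radius_search(db, q, r):
    """Projection search: per-dimension range candidates from d sorted
    lists, intersected, then verified exhaustively."""
    n, d = db.shape
    order = np.argsort(db, axis=0)           # d sorted lists (one per column)
    candidates = None
    for j in range(d):
        col = db[order[:, j], j]             # column j in sorted order
        lo = np.searchsorted(col, q[j] - r, side='left')
        hi = np.searchsorted(col, q[j] + r, side='right')
        ids = set(order[lo:hi, j])           # points within r on dimension j
        candidates = ids if candidates is None else candidates & ids
    # Exhaustive distance check on the surviving candidates only.
    cand = np.fromiter(candidates, dtype=int)
    keep = np.linalg.norm(db[cand] - q, axis=1) <= r
    return cand[keep]
```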
2. Metric Space Methods
Indexing the metric structure of a space: Voronoi regions.
Vantage-point methods: the vp-tree.
http://groups.csail.mit.edu/graphics/classes/6.838/S98/meetings/m25/m25.html
References
V. Castelli, “Multidimensional indexing structures for content-based retrieval,” IBM Research Report, 2001.
V. Gaede and O. Gunther, “Multidimensional access methods,” ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.
L.I. Smith, “A tutorial on Principal Component Analysis,” http://csnet.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
J. Shlens, “A tutorial on Principal Component Analysis,” http://www.cs.cmu.edu/~elaw/papers/pca.pdf
Next Week
No class on Oct. 28.