Automatic Matching of Multi-View Images
Ed Bremer
University of Rochester
References
[1] Mikolajczyk, K., Schmid, C., 2004. A performance evaluation of local descriptors, Submitted to PAMI, October 2004, http://lear.inrialpes.fr/pubs/2004/MS04a
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004. A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
[3] Lowe, D., 2004. Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), pp. 91-110.
[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, Proc. British Machine Vision Conference (BMVC 2002), pp. 384-393.
[5] Schaffalitzky, F., Zisserman, A., 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?", Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, pp. 414-431.
[6] Baumberg, A., 2000. Reliable Feature Matching Across Widely Separated Views, In Proc. CVPR, pp. 774-781.
[7] Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points, In Proc. 8th ICCV, pp. 525-531.
Outline
Motivation
Applications
Process Components
Region Detectors
Descriptors
Matching Criteria
Performance Evaluation
Conclusion & Next Steps
Motivation
Multi-view/Multi-image Matching: multiple images of a scene taken by a single camera or by multiple cameras, with different rotation, scale, viewpoint and illumination
3D scene
Motivation
Applications
Detecting matching regions is used in all of the following:
Image registration
Super-resolution
Stereo vision
Object detection and recognition
Object and motion tracking
Indexing and retrieval of objects
3D scene reconstruction
Scene recognition
Examples of Multi-view Images [2]
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
Process Components
Covariant region detection
  Detect image regions covariant to the class of transformation between the reference image and the transformed image
Invariant descriptor
  Compute invariant descriptors from the covariant regions
Descriptor matching
  Compute the distance between descriptors in the reference image and the transformed image
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
Region Detectors
Support regions for computation of descriptors
  Determined independently in each image
  Scale invariant or affine invariant
  Can be points (feature points) or regions (covariant)
  Provide dense (local) coverage, robust to occlusion
  Need to be stable and repeatable
Five region detectors -
  Harris points -> invariant to rotation
  Harris-Laplace -> invariant to rotation and scale
  Hessian-Laplace -> invariant to rotation and scale
  Harris-Affine -> invariant to affine image transformations
  Hessian-Affine -> invariant to affine image transformations
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
Region Detectors
Harris points -
  Maxima of the Harris function are used to locate interest points
  Support region is fixed in size: a 41x41 neighborhood centered at the interest point
Harris-Laplace regions -
  Scale-adapted Harris function locates the interest point
  Scale is selected at a local minimum or maximum of the Laplacian-of-Gaussian across scale-space
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
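The Harris function mentioned above is commonly written as R = det(M) - k * trace(M)^2, where M is the second moment matrix of the image gradients. The following is a minimal sketch of that response in plain Python. The box window, the central-difference gradients, the value k = 0.04 and the function name `harris_response` are assumptions made for illustration, not details taken from [1].

```python
def harris_response(image, k=0.04, radius=1):
    """Return the Harris response R = det(M) - k * trace(M)^2 per pixel.

    `image` is a list of rows of floats. M is the second moment matrix
    summed over a (2*radius+1)^2 box window (an assumption; a Gaussian
    window is the more usual choice).
    """
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


# A bright square on a dark background (synthetic test image).
img = [[1.0 if 4 <= y < 12 and 4 <= x < 12 else 0.0 for x in range(16)]
       for y in range(16)]
R = harris_response(img)
corner, edge, flat = R[4][4], R[4][8], R[8][8]
```

On this synthetic square the response is positive at the corner, negative along the edge and zero in the flat interior, which is why maxima of R are used as interest points.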
Region Detectors
Harris-Laplace Performance -
  Approximately 10% better than the Laplacian, Lowe (DoG) or gradient methods
  The standard Harris detector performs very poorly under scale changes
[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, Pages 525-531.
Region Detectors
Hessian-Laplace regions -
  Interest point is located at a local maximum of the Hessian determinant
  Scale is selected at a maximum of the Laplacian-of-Gaussian across scale-space (Difference-of-Gaussians can also be used)
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
[3] Lowe, D., 2004. Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
Region Detectors
Harris-Affine regions -
  Find regions using the Harris-Laplace detector
  Affine-adapted region based on the 2nd moment matrix
Hessian-Affine regions -
  Find regions using the Hessian-Laplace detector
  Affine-adapted region based on the 2nd moment matrix
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
Region Detectors
Regions produced by Harris-Affine and Hessian-Affine detectors
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
Region Detectors
Affine normalization using 2nd moment matrix for region L and R
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
Region Detectors
Region normalization
  Detectors produce circular or elliptical regions
  Size depends on the detection scale
  Map regions to a circular region with constant radius
  Rotate regions in the direction of the dominant gradient orientation
Illumination normalization
  Model the illumination change as an affine transformation of intensity -> aI(x) + b
  Normalize using the mean and standard deviation of the pixel intensities
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
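Illumination normalization can be illustrated with a short sketch: subtracting the mean and dividing by the standard deviation cancels an affine intensity change aI(x) + b with a > 0. The function name `normalize_illumination`, the flat list representation of a patch and the values a = 2.5, b = 7 are assumptions chosen only for this example.

```python
from statistics import mean, pstdev


def normalize_illumination(patch):
    """Shift and scale intensities to zero mean and unit standard deviation."""
    mu = mean(patch)
    sigma = pstdev(patch)
    if sigma == 0:
        # Constant patch: there is no contrast to normalize.
        return [0.0 for _ in patch]
    return [(v - mu) / sigma for v in patch]


patch = [10.0, 20.0, 30.0, 60.0]
# The same patch under an affine illumination change aI + b (a = 2.5, b = 7).
brighter = [2.5 * v + 7.0 for v in patch]

n1 = normalize_illumination(patch)
n2 = normalize_illumination(brighter)
same = all(abs(p - q) < 1e-9 for p, q in zip(n1, n2))
```

After normalization the two patches agree up to floating point error, so a descriptor computed from them sees the same input.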
Descriptors
Descriptors -> Feature vector
  Invariant to changes in scale, rotation, affine transformation and affine illumination
  Need to be distinct, stable and repeatable
  Distribution (histogram) type or covariance type
Ten descriptor types
  Scale-Invariant Feature Transform (SIFT)
  Gradient Location and Orientation Histogram (GLOH)
  Shape Context
  Principal Component Analysis (PCA)-SIFT
  Steerable Filters
  Differential Invariants
  Complex Filters
  Moment Invariants
  Cross-Correlation
  Spin Image
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
Descriptors
SIFT and GLOH 3D Descriptors
  SIFT -> 4 x 4 x 8 = 128 dimension descriptor
  GLOH -> Log-polar [(2 x 8) + 1] x 16 = 272 dimension descriptor
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
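The two dimension counts follow directly from the number of location bins times the number of orientation bins. A small sketch of that arithmetic is given here; the function and parameter names are hypothetical, and the reading of "(2 x 8) + 1" as two outer rings of eight angular sectors plus one undivided central bin is an interpretation of the slide.

```python
def sift_dimension(grid=4, orientations=8):
    """A grid x grid array of location bins, each with an orientation histogram."""
    return grid * grid * orientations


def gloh_dimension(outer_rings=2, angular_sectors=8, orientations=16):
    """Log-polar location bins: the central bin is not split by angle (+1)."""
    location_bins = outer_rings * angular_sectors + 1
    return location_bins * orientations


sift_dim = sift_dimension()
gloh_dim = gloh_dimension()
```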
Matching Criteria
Distance measure
  Find putative matches between images
  Mahalanobis distance: used for covariance-type descriptors
  Euclidean distance: used for distribution (histogram) descriptors
  Direct distance comparison is not suitable for indexing or database searching
Simple threshold
  Descriptors match if the distance between them is below a threshold t
  A descriptor in the reference image can have many matches to descriptors in the transformed image
Nearest Neighbor (NN)
  Find the closest match between descriptors in the reference and transformed images
  A descriptor in the reference image can have only one match to a descriptor in the transformed image
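The difference between the two matching criteria can be sketched with Euclidean distance on toy descriptors. The function names, the two-dimensional descriptors and the threshold value are illustrative assumptions; applying the threshold t to the nearest neighbor as well is also an assumption of this sketch.

```python
import math


def threshold_matches(ref, other, t):
    """All (i, j) pairs whose Euclidean distance is below t."""
    return [(i, j)
            for i, a in enumerate(ref)
            for j, b in enumerate(other)
            if math.dist(a, b) < t]


def nearest_neighbor_matches(ref, other, t):
    """For each reference descriptor keep only its closest descriptor,
    and only if that distance is below t."""
    matches = []
    for i, a in enumerate(ref):
        j = min(range(len(other)), key=lambda n: math.dist(a, other[n]))
        if math.dist(a, other[j]) < t:
            matches.append((i, j))
    return matches


# Toy 2-D descriptors (real descriptors have e.g. 128 dimensions).
ref = [(0.0, 0.0), (5.0, 5.0)]
other = [(0.1, 0.0), (0.0, 0.3), (5.0, 5.2), (9.0, 9.0)]

by_threshold = threshold_matches(ref, other, t=0.5)
by_nn = nearest_neighbor_matches(ref, other, t=0.5)
```

With these values the threshold criterion gives the first reference descriptor two matches, while the nearest neighbor criterion keeps only the closer one.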
Performance Evaluation
Criterion basis
  Recall rate = #correct matches / #correspondences
  1-precision = #false matches / (#correct matches + #false matches)
  Ideal descriptor -> recall rate = 1 for any precision, given no overlap error
[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
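The two evaluation quantities translate directly into code. This is a minimal sketch; the function names and the example counts (80 correct matches, 20 false matches, 100 correspondences) are made up for illustration and are not results from [1].

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were matched correctly."""
    if correspondences == 0:
        raise ValueError("no correspondences between the two images")
    return correct_matches / correspondences


def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    total = correct_matches + false_matches
    if total == 0:
        raise ValueError("no matches were returned")
    return false_matches / total


r = recall(correct_matches=80, correspondences=100)
omp = one_minus_precision(correct_matches=80, false_matches=20)
```

Sweeping the matching threshold and plotting recall against 1-precision gives one curve per descriptor.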
SIFT - Scale Invariant Feature Transform
Scale Invariant Feature Transform (SIFT), Lowe [3]
Features -
  Invariant to image scale and rotation
  Invariant to small changes in illumination and 3D camera viewpoint
  Extracts a large number of highly distinctive features
  Enables detection of small objects
  Improved performance in cluttered scenes
Algorithms are efficient: complex operations are applied to local regions or features rather than to the whole image
Procedure
  Scale-space extrema detection
  Keypoint localization
  Orientation assignment
  Keypoint vector (descriptor)
SIFT - Scale Invariant Feature Transform [3]
Scale-Space Blob Detector -
  Search for stable features over all scales and image locations
  Scale-space kernel -> Gaussian function
  Difference of Gaussian
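For reference, the scale-space image L and the Difference of Gaussian D used on the SIFT slides are defined in Lowe [3] as follows, where I is the input image, * denotes convolution in x and y, and k is the constant multiplicative factor between adjacent scales:

```latex
\begin{align}
G(x, y, \sigma) &= \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2} + y^{2}) / 2\sigma^{2}} \\
L(x, y, \sigma) &= G(x, y, \sigma) * I(x, y) \\
D(x, y, \sigma) &= \bigl(G(x, y, k\sigma) - G(x, y, \sigma)\bigr) * I(x, y)
                 = L(x, y, k\sigma) - L(x, y, \sigma)
\end{align}
```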
SIFT - Scale Invariant Feature Transform [3]
Difference of Gaussian (DoG): simple subtraction of adjacent blurred images L
Approximation to the scale-normalized Laplacian of Gaussian
Maxima and minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
SIFT - Scale Invariant Feature Transform [3]
Scale-Space Image Set -
  Divide each octave into s intervals
  Compute s + 3 filtered (increasingly blurry) images, k = 2^(1/s)
  s = 3, k = 1.26 ->
    6th -> 3.17σ
    5th -> 2.52σ
    4th -> 2.00σ
    3rd -> 1.59σ
    2nd -> 1.26σ
    1st -> 1.00σ
  Subtract adjacent images to produce DoG images
  Repeat for the next octave, starting from the image with twice the initial σ (two images from the top of the stack) and decimating it by 2
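The blur levels within one octave follow from k = 2^(1/s). The sketch below computes them for s = 3; the function name `octave_sigmas` and the base value sigma0 = 1.0 (so results read as multiples of σ) are assumptions for illustration. Computed without rounding k first, the sixth level is 2^(5/3), which rounds to 3.17.

```python
def octave_sigmas(s, sigma0=1.0):
    """Return k and the s + 3 blur levels of one octave."""
    k = 2.0 ** (1.0 / s)
    return k, [sigma0 * k ** i for i in range(s + 3)]


k, sigmas = octave_sigmas(3)
rounded = [round(v, 2) for v in sigmas]
# Subtracting adjacent blurred images gives one fewer DoG image.
num_dog_images = len(sigmas) - 1
```

The fourth level is exactly twice the first, which is why that image can seed the next octave after decimation.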
SIFT - Scale Invariant Feature Transform [3]
Scale-Space Pyramid (figure from Lowe [3])
SIFT - Scale Invariant Feature Transform [3]
Locating Scale-Space Extrema -
  Detect local maxima or minima of D(x, y, σ)
  Compare each sample point to its 8 neighbors in the same scale image and to the 9 neighbors in each of the scale images above and below
  Mark the sample if it is greater than or less than all of these neighbors
  Extrema are searched in s of the DoG images per octave; the top and bottom DoG images only serve as neighbors
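The 26-neighbor comparison can be sketched as below. The DoG stack is represented as nested lists indexed `dog[scale][y][x]`; that layout, the function name and the tiny 3x3 test layers are assumptions for illustration only.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater than, or strictly less than,
    all 26 neighbors (8 in the same layer, 9 in each adjacent layer)."""
    center = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
valley = [[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]

found_peak = is_scale_space_extremum([flat, peak, flat], 1, 1, 1)
found_valley = is_scale_space_extremum([flat, valley, flat], 1, 1, 1)
found_flat = is_scale_space_extremum([flat, flat, flat], 1, 1, 1)
```

The caller is expected to skip border pixels and the first and last DoG layers, since those samples do not have a full set of neighbors.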
SIFT - Scale Invariant Feature Transform [3]
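Improving Localization -
  Fit a quadratic (Taylor expansion) of D(x, y, σ) around each sample point to obtain the offset x̂ of the extremum [3]
  Reject points that have low contrast: discard the keypoint if |D(x̂)| < threshold
  The Hessian and derivative of D(x, y, σ) are approximated using differences of neighboring sample points
  x = (x, y, σ)^T is the offset from the sample point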
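For reference, the localization step in Lowe [3] can be written as follows. D and its derivatives are evaluated at the sample point, and x is the offset from it; the value 0.03 is the contrast threshold reported by Lowe for pixel values in the range [0, 1] and is quoted here as an example rather than a required setting.

```latex
\begin{align}
D(\mathbf{x}) &= D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x}
  + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\, \mathbf{x} \\
\hat{\mathbf{x}} &= -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}
  \frac{\partial D}{\partial \mathbf{x}} \\
D(\hat{\mathbf{x}}) &= D + \frac{1}{2}\, \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}} \\
\text{reject if}\quad |D(\hat{\mathbf{x}})| &< 0.03
\end{align}
```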
SIFT - Scale Invariant Feature Transform [3]
Edge Rejection -
  Eliminate poorly defined peaks (edges) using the 2x2 Hessian matrix of D
  Verify that the ratio of principal curvatures is less than a threshold r (r = 10)
  Efficient to compute -> less than 20 floating point operations
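The curvature ratio test does not need the eigenvalues themselves: in Lowe [3] it is expressed through the trace and determinant of the 2x2 Hessian as Tr(H)^2 / Det(H) < (r + 1)^2 / r. The sketch below applies that inequality; the function name and the example second-derivative values are made up for illustration.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal curvature ratio is below r.

    dxx, dyy, dxy are second differences of the DoG image at the keypoint.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or zero): not a well-defined peak.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


blob_ok = passes_edge_test(dxx=-2.0, dyy=-2.0, dxy=0.0)    # equal curvatures
edge_ok = passes_edge_test(dxx=-20.0, dyy=-1.0, dxy=0.0)   # ratio of 20
saddle_ok = passes_edge_test(dxx=1.0, dyy=-1.0, dxy=0.0)   # negative determinant
```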
SIFT - Scale Invariant Feature Transform [3]
Results from Lowe [3]: 832 initial keypoints reduced to 536 after low-contrast and edge rejection (233x189 image)
SIFT - Scale Invariant Feature Transform
Results from Lowe [3] – performance measures
SIFT - Scale Invariant Feature Transform
Results from Lowe [3] – performance measures
SIFT - Scale Invariant Feature Transform [3]
Orientation (rotational invariance) -
  Use the scale of the keypoint to select the smoothed image L(x, y, σ)
  Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences
  Build an orientation histogram from the sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint, and the bins cover the 360° range
  Peaks in the histogram correspond to dominant directions of the local gradients
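The orientation step can be sketched in a few lines. The 36 bins follow Lowe [3]; the window radius, the Gaussian sigma, the function name and the synthetic ramp images are assumptions for illustration, and peak interpolation and the handling of secondary peaks are left out.

```python
import math


def orientation_histogram(L, cy, cx, radius, sigma, bins=36):
    """Gradient orientation histogram around (cy, cx) in smoothed image L.

    Each sample votes with its gradient magnitude times a Gaussian weight
    centered on the keypoint. Angles are measured in image coordinates
    (y increases downward).
    """
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            dx = L[y][x + 1] - L[y][x - 1]
            dy = L[y + 1][x] - L[y - 1][x]
            m = math.hypot(dx, dy)
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                              / (2.0 * sigma ** 2))
            hist[int(theta // (360.0 / bins)) % bins] += m * weight
    return hist


# Intensity ramp along x: every gradient points at 0 degrees.
ramp_x = [[float(x) for x in range(9)] for _ in range(9)]
# Diagonal ramp: every gradient points at 45 degrees.
ramp_xy = [[float(x + y) for x in range(9)] for y in range(9)]

hist_x = orientation_histogram(ramp_x, 4, 4, radius=2, sigma=1.5)
hist_xy = orientation_histogram(ramp_xy, 4, 4, radius=2, sigma=1.5)
dominant_x = hist_x.index(max(hist_x))
dominant_xy = hist_xy.index(max(hist_xy))
```

With 10° bins, the first ramp puts all of its weight in bin 0 and the diagonal ramp in bin 4 (40° to 50°).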
SIFT - Scale Invariant Feature Transform [3]
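Descriptor (the feature vector) -
  Gradient samples are accumulated into sub-region orientation histograms, which tolerates shifts in gradient positions
  128 element feature vector -> 4x4 array of histograms with 8 orientation bins each (the figure from Lowe shows a 2x2x8 array computed from an 8x8 sample set)
  Feature vectors are matched by nearest neighbor (Euclidean distance)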
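How a 4x4 array of 8-bin histograms becomes a 128 element vector can be shown with a deliberately simplified sketch. It assumes a 16x16 sample grid of precomputed gradient magnitudes and orientations (already rotated relative to the keypoint orientation), and it omits the Gaussian weighting, trilinear interpolation and value clipping that Lowe [3] describes. The function name `sift_like_descriptor` is hypothetical.

```python
import math


def sift_like_descriptor(mag, ori, grid=4, cell=4, bins=8):
    """Accumulate gradient magnitudes into grid x grid orientation histograms.

    mag and ori are (grid*cell) x (grid*cell) nested lists; ori is in degrees.
    Returns a unit-length vector of grid * grid * bins values.
    """
    desc = [0.0] * (grid * grid * bins)
    for y in range(grid * cell):
        for x in range(grid * cell):
            gy, gx = y // cell, x // cell            # which sub-region
            b = int((ori[y][x] % 360.0) // (360.0 / bins)) % bins
            desc[(gy * grid + gx) * bins + b] += mag[y][x]
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


n = 16
mag = [[1.0] * n for _ in range(n)]   # uniform gradient magnitude
ori = [[0.0] * n for _ in range(n)]   # all gradients at 0 degrees
d = sift_like_descriptor(mag, ori)
```

Normalizing the vector to unit length reduces the effect of contrast changes before descriptors are compared by Euclidean distance.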
SIFT - Scale Invariant Feature Transform [3]
Results from Lowe [3] -
  Two training objects recognized in a cluttered image
  Small squares show point matches
  Large rectangles show the border of each training image after affine transformation
Conclusions
  Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
  Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
  Affine adaptation provides invariance to affine transformations
  GLOH and SIFT descriptors provide the best performance
  Dense, localized descriptors perform well under occlusions
Next steps
  Coding and testing of region detectors, descriptors and matching…