Lecture Name Course Name
School of Computer Science & Statistics, Trinity College Dublin, Dublin 2, Ireland, www.scss.tcd.ie
Improvements to SIFT - SURF & CenSure
Applications of strong features - Photosynth & CBIR
Objectives
  Look at extensions to the SIFT model of strong features
    Detector + Descriptor + Matching
  Examine the use of strong features within two different web-based applications of computer vision
    Building 3D models from web photo collections
      Photosynth, www.arc3d.be
    Content-based or reverse image retrieval on the web
      Google Images / Google Goggles / www.tineye.com
SURF: Speeded Up Robust Features - ECCV 2006
Herbert Bay, Tinne Tuytelaars and Luc Van Gool
  Three stages
    Detection
      Find the key points
    Description
      Build a descriptor of the key points
    Matching / recognition
      Finding key points in a database
  SIFT especially slow in the last step
    Dimensionality of the key point descriptor
  SURF tries to speed up each step
SURF - fast key-point detector
  Hessian-based key point detector
    Where Laa(x) is the second-order partial derivative of the Gaussian-smoothed image in direction a and Lab(x) is the mixed second-order partial derivative
  Approximate the partial derivatives with box filters
[Figure: Gaussian second-order derivatives Lyy and Lxy, and their box-filter approximations Dyy and Dxy]
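The Hessian matrix referred to on this slide, and the box-filter approximation of its determinant, can be written as below. This is a sketch in the notation of the SURF paper (Bay et al.): sigma is the scale, Dxx, Dyy and Dxy are the box-filter responses, and the weight 0.9 is the value reported in that paper.

```latex
\mathcal{H}(\mathbf{x},\sigma) =
\begin{pmatrix}
L_{xx}(\mathbf{x},\sigma) & L_{xy}(\mathbf{x},\sigma) \\
L_{xy}(\mathbf{x},\sigma) & L_{yy}(\mathbf{x},\sigma)
\end{pmatrix},
\qquad
\det(\mathcal{H}_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9\, D_{xy})^{2}
```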
SURF - Scale the detector, not the image
  SURF - one image, scale the detectors
  SIFT - multiple images convolved with Gaussian, then calculate DoG
Orientation Assignment
  SIFT = HOG, Gaussian weighting, multiple orientations
  SURF
    Distribution of Haar wavelet responses in X and Y
    Summed in a [60 deg] window
    Max response is the principal direction
    Other candidates ignored
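The SURF orientation step can be sketched as a sliding 60 degree window over the Haar responses. This is a minimal illustration, not the reference implementation: the function name, the number of window positions (`steps`) and the assumption that the responses are already Gaussian-weighted are all choices made here.

```python
import math

def dominant_orientation(responses, window=math.pi / 3, steps=72):
    """Return the principal direction (radians) of a set of Haar responses.

    responses: list of (dx, dy) pairs, assumed already Gaussian-weighted.
    A window of the given angular width is slid around the circle; the
    responses falling inside it are summed, and the longest summed vector
    gives the orientation. All other candidates are ignored.
    """
    two_pi = 2 * math.pi
    best_len2, best_angle = -1.0, 0.0
    for k in range(steps):
        start = two_pi * k / steps
        sx = sy = 0.0
        for dx, dy in responses:
            angle = math.atan2(dy, dx) % two_pi
            if (angle - start) % two_pi < window:
                sx += dx
                sy += dy
        len2 = sx * sx + sy * sy
        if len2 > best_len2:
            best_len2, best_angle = len2, math.atan2(sy, sx)
    return best_angle
```

Unlike SIFT, only the single strongest window is kept, so each key point gets exactly one orientation.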
SURF - Key point description
  Split the interest region up into 4 x 4 square sub-regions with 5 x 5 regularly spaced sample points inside
  Calculate Haar wavelet responses dx and dy
  Weight with a Gaussian kernel centered at the key-point
  Sum the responses over each sub-region for dx and dy separately: feature vector of length 32
  In order to bring in information about the polarity of the intensity changes, extract the sum of absolute values of the responses: feature vector of length 64
  SURF-128 - The sums of dx and |dx| are computed separately for dy < 0 and dy >= 0
    Similarly for the sums of dy and |dy|
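The 64-element descriptor described above can be sketched as follows. The function name is hypothetical, the inputs are assumed to be 20 x 20 grids of Haar responses that have already been rotated to the key-point orientation and Gaussian-weighted, and the final normalisation used in practice is left out.

```python
def surf_descriptor(dx, dy):
    """Build a 64-element SURF-style descriptor.

    dx, dy: 20 x 20 nested lists of Haar wavelet responses (assumed
    pre-weighted). The grid is split into 4 x 4 sub-regions of 5 x 5
    samples; each contributes (sum dx, sum dy, sum |dx|, sum |dy|).
    """
    descriptor = []
    for block_y in range(4):
        for block_x in range(4):
            s_dx = s_dy = s_abs_dx = s_abs_dy = 0.0
            for y in range(block_y * 5, block_y * 5 + 5):
                for x in range(block_x * 5, block_x * 5 + 5):
                    s_dx += dx[y][x]
                    s_dy += dy[y][x]
                    s_abs_dx += abs(dx[y][x])
                    s_abs_dy += abs(dy[y][x])
            descriptor.extend([s_dx, s_dy, s_abs_dx, s_abs_dy])
    return descriptor
```

The absolute sums distinguish a flat sub-region from one with alternating gradients, where the signed sums largely cancel.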
SURF - Matching
  Key points tend to be found on blob-like structures
  Sign of the Laplacian separates bright from dark blobs
  This information is free (it is already computed during detection) and accelerates matching
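A minimal sketch of how the sign of the Laplacian can speed up matching: candidates with the opposite sign are skipped before any descriptor distance is computed. The data layout (a `(sign, descriptor)` pair per key point) and the function name are assumptions made for illustration.

```python
def match_keypoint(query, database):
    """Return the index of the nearest database entry with the same sign.

    query: (sign, descriptor); database: list of (sign, descriptor).
    Returns None when no entry shares the query's Laplacian sign.
    """
    q_sign, q_desc = query
    best_index, best_dist = None, float("inf")
    for i, (sign, desc) in enumerate(database):
        if sign != q_sign:
            continue  # bright vs dark blob: reject without comparing descriptors
        dist = sum((a - b) ** 2 for a, b in zip(q_desc, desc))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index
```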
SURF
  Exploits integral images
    Approximation of partial derivatives
    Haar wavelet responses to calculate orientation
      Size of sliding window
      Resolution of the Haar wavelets in the window
  U-SURF
    Avoids the orientation calculation
    Applications where the camera orientation will not change significantly
      Driving, wearable cameras...
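Integral images are what make box filters and Haar wavelets cheap: once the table is built, the sum over any axis-aligned rectangle costs four lookups regardless of its size. A minimal sketch, with hypothetical function names and a plain nested-list image:

```python
def integral_image(img):
    """Return the summed-area table of img with an extra zero row and column."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table

def box_sum(table, top, left, bottom, right):
    """Sum of img rows top..bottom-1 and columns left..right-1 in four lookups."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])
```

Because the cost of `box_sum` does not depend on the box size, the filters can be scaled up instead of repeatedly smoothing and subsampling the image.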
CenSurE
  CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching
    Agrawal, Konolige, Blas, ECCV 2008
  Reported as more stable and more accurate than SIFT or SURF
  Test domain: Visual Odometry
CenSurE
  SIFT - DoG
  SURF - approximation to Hessian
  CenSurE - approximation to Laplacian (LoG)
  Centre Surround Function
  Approximate Laplacian - bi-level filter with values 1 or -1
CenSurE - Integral Image
  Uses integral images
  But... regions are not rectangular: e.g. octagon...?
  Answer: build “slanted” integral images to calculate polygons
CenSurE
  Similar to SURF - scale invariance by upscaling the filters
    n = 1, 2, 3, 4, 5, 6, 7
  Non-maximal suppression in scale
    Maxima in a 3x3x3 neighborhood
    Maxima and minima must pass a threshold
  Octagonal filters: approximately rotation invariant
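The non-maximal suppression step can be sketched as a search for extrema over a 3x3x3 neighbourhood in (scale, y, x). The response layout `responses[scale][y][x]`, the function name and the use of a single magnitude threshold are assumptions made for this illustration.

```python
def scale_space_extrema(responses, threshold):
    """Return (scale, y, x) positions that are strict local extrema.

    responses: nested list indexed as responses[scale][y][x].
    A point is kept if its magnitude passes the threshold and it is
    strictly larger (or strictly smaller) than all 26 neighbours.
    """
    offsets = [(ds, dy, dx)
               for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (ds, dy, dx) != (0, 0, 0)]
    n_scales, height, width = len(responses), len(responses[0]), len(responses[0][0])
    found = []
    for s in range(1, n_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = responses[s][y][x]
                if abs(value) < threshold:
                    continue
                neighbours = [responses[s + ds][y + dy][x + dx] for ds, dy, dx in offsets]
                if value > max(neighbours) or value < min(neighbours):
                    found.append((s, y, x))
    return found
```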
CenSurE - evaluation
  Visual Odometry
    Extract distinctive features, find potential matches - search 1/5th of the image
    RANSAC-based pose estimation
    If the motion estimate is small and there is a large number of inliers - discard the frame (key frames)
    Further refinement using bundle adjustment (beyond scope of course)
Summary
  Strong feature detectors allow a range of 2D and 3D applications
    Realtime responses on conventional hardware
  Choose feature detector carefully
    SURF vs U-SURF, CenSurE vs SURF?
    Need to measure performance within an application domain empirically
  Very active area of research!
    http://cvlab.epfl.ch/publications/publications/2009/CalonderLKBMF09.pdf
Applications of strong feature detectors
  Photosynth
    Build a 3D image using my photos & videos
    Location-based links between images
  Content Based Image Retrieval
    Text-based image look up
      “find me images of dogs”
    Reverse image look up
      “find me images like this”
Photosynth
  Multi-View Stereo for Community Photo Collections - M Goesele, N Snavely, B Curless, H Hoppe, SM Seitz - research.microsoft.com/~hoppe/mvscpc.pdf
  Multi-View Stereo
    Dense depth maps
    Different
      View points
      Illumination conditions
      Camera parameters
      Clutter
Algorithm overview
  Calibrate cameras geometrically and radiometrically
    PTLens
  Global View Selection
    Good set of images for stereo matching
  Local View Selection
    Subsets of these images that will yield good matches
  Stereo match at each pixel
    Match optimised if higher confidence found
Strong Reliance on SIFT
  Strong feature extraction method
    Scale invariant
    Rotation invariant
    Illumination invariant
SIFT in Photosynth
  Common SIFT features used to identify potential Global Views, but...
  Don’t want views to be too close: stereo is ill-conditioned - c.f. Pollefeys
    Function used to encourage view angle difference of > 10 degrees
  Views need to be rescaled (SIFT is scale invariant)
    Resample views to lowest common denominator
Local View Selection
  Select a subset A of the N good match candidates (A typically contains 4 views)
    A is updated through the process based on the current estimate of depth
  Photometric consistency
    NCC matching
  Sufficiently wide baseline
    Match along epipolar lines
  Reject outliers
  Depth and normal are optimised
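NCC (normalised cross-correlation) is the photometric consistency score mentioned above. A minimal sketch for two patches flattened into equal-length lists; the function name and the choice to return 0.0 for a constant patch are assumptions made here.

```python
import math

def ncc(patch_a, patch_b):
    """Normalised cross-correlation of two equal-length intensity lists.

    Returns a value in [-1, 1]; 1 means identical up to gain and offset.
    Returns 0.0 if either patch has no variation (assumed convention).
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    dev_a = [v - mean_a for v in patch_a]
    dev_b = [v - mean_b for v in patch_b]
    denom = math.sqrt(sum(v * v for v in dev_a) * sum(v * v for v in dev_b))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dev_a, dev_b)) / denom
```

Subtracting the mean and dividing by the deviations is what makes the score tolerant to the illumination differences found across community photo collections.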
http://photosynth.net 2010
Content Based Image Retrieval
Image Retrieval: Ideas, Influences, and Trends of the New Age - Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang, ACM Computing Surveys, Vol. 40, No. 2, Article 5, April 2008
Forsyth and Ponce, Computer Vision: A Modern Approach, Ch. 25
Content Based Image Retrieval
  Organise images using their visual content
    Google Images vs iPhoto “faces” feature
  Internet becoming more visual
    Flickr, YouTube, Google Images...
    Text-based tagging limited, leading to rise in CBIR
  See http://www.gwap.com/gwap/ for “human computing” behind captchas
    Image-based captcha
Why understanding images is hard
  Sensory Gap
    Real-world object vs information in the description of the scene - a picture is worth 1000 words...
  Semantic Gap
    Differences between people's interpretations of the same image
www.gwap.com
Colour Histograms
  2D or 3D histograms of colour
    Colour space: RGB, RG chromaticity, XYZ, LAB...
    Different tolerance to lighting changes, perceptual weighting of colours
  Tolerant to scale changes, rotations & warping
  Colour manipulation of an image can put CBIR off
  Does not capture spatial organisation
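A minimal sketch of a 3D colour histogram in RGB, to make the properties above concrete. The function name, the default of 4 bins per channel and the 0..255 pixel range are assumptions made for illustration.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised, flattened 3D RGB histogram.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    The result has bins_per_channel ** 3 entries that sum to 1.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    count = 0
    for r, g, b in pixels:
        r_bin, g_bin, b_bin = (r * n) // 256, (g * n) // 256, (b * n) // 256
        hist[(r_bin * n + g_bin) * n + b_bin] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist
```

The pixel positions never enter the computation, which is why the histogram tolerates rotation and warping and also why it cannot capture spatial organisation.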
Colour Correlogram
  Similar in concept to GLCM
  Autocorrelogram compares only identical colours
  Captures small-scale spatial distribution of colours
    Scale of the correlogram
    E.g. 64x64 quantised colours
  Can be used for image sub-regions
    Compare with texture segmentation
Texture
  GLCM, Autocorrelation...
  Texture Histograms
    Define bins that correspond to texture types in sub-regions in the image
    Histogram counts the occurrence of the texture types
    Compact representation of texture info
  Texture of Textures
    Recursive approach - texture-based region segmentation - texture analysis of regions
Global vs Local
  Global features
    Looking for a specific image
    No distinction between foreground and background
    Simple & fast
  Local features
    Image sub-regions
    Perceptually driven features
Image Sub-regions
  Tile based
    Similar to global
    Object “fracture”
    Simple...
  Region based
    Sensible ROIs
    Objects (sometimes)
    Complex...
Shape Detection
  Dense feature approach to shape description
    Histograms of Oriented Gradients (HOG)
      Dalal and Triggs
  Sparse approach involves “segmentation”
    Edge-based approaches
    Region-based approaches
    Shape descriptors
  Matching based on user sketch / example image
High Level Semantics
  Extract regions that belong to a large, broadly defined semantic group
    Face detection
    Car, Boat, Airplane, Dog...
  Image tagging in large databases
    Very computationally intensive
    Limitations to approach
Strong Features
  SIFT, SURF et al.
  Image represented by an unordered collection of features
    “Bag of words”
    Features are “words”
  Create “Codebook”
    E.g. use k-means clustering
    Use training set to create clusters, each representing an object
    N clusters = N objects that can be recognised
  Image mapped onto a histogram of codewords
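The last step, mapping an image onto a histogram of codewords, can be sketched as follows. The codebook is assumed to have been built beforehand by clustering training descriptors; the function names and the squared Euclidean distance are choices made for this illustration.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    def squared_distance(k):
        return sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k]))
    return min(range(len(codebook)), key=squared_distance)

def bag_of_words(descriptors, codebook):
    """Count how many of an image's descriptors fall on each codeword."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1
    return histogram
```

The order of the descriptors does not affect the result, which is what makes the representation an unordered "bag".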
Matching Image Histograms
  Many representations of images are histogram based
  Comparison based on the similarity of the histograms
  Simple measures
    Euclidean / SSD
    Overlap
    Chi-Square Statistic...
  More robust
    Earth Mover's Distance (EMD)
Chi Square (χ2)
  Compares two distributions
  Simple approaches are sensitive to the way in which the data is binned
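A minimal sketch of a chi-square histogram distance. Several variants exist; the symmetric form used here, 0.5 * sum((p_i - q_i)^2 / (p_i + q_i)), is one common choice and is an assumption rather than necessarily the form shown on the slide.

```python
def chi_square(p, q):
    """Symmetric chi-square distance between two histograms.

    Bins where both histograms are empty are skipped to avoid
    dividing by zero.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i + q_i > 0:
            total += (p_i - q_i) ** 2 / (p_i + q_i)
    return 0.5 * total
```

Only bin i of one histogram is ever compared with bin i of the other, so mass that shifts into a neighbouring bin is penalised as heavily as mass that moves far away. That is the binning sensitivity noted above.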
Earth Mover's Distance (EMD) (Wasserstein metric)
  Each bin viewed as a pile of dirt piled on M
  The metric measures the cost of turning one histogram into the other...
    i.e. the amount of dirt multiplied by the distance it needs to be moved
  We want to find a flow F = [fij], where fij is the flow between pi and qj, that minimises the work
  Subject to 4 constraints
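The optimisation problem can be written out as below. This is a sketch following the formulation of Rubner, Tomasi and Guibas; the symbols are assumed notation: f_ij is the flow from bin p_i to bin q_j, d_ij is the ground distance between them, and w_{p_i}, w_{q_j} are the bin weights.

```latex
\min_{F=[f_{ij}]} \; \mathrm{WORK}(P,Q,F) = \sum_{i}\sum_{j} d_{ij}\, f_{ij}
\quad \text{subject to} \quad
\begin{aligned}
& f_{ij} \ge 0, \\
& \textstyle\sum_{j} f_{ij} \le w_{p_i}, \\
& \textstyle\sum_{i} f_{ij} \le w_{q_j}, \\
& \textstyle\sum_{i}\sum_{j} f_{ij} = \min\!\Big(\sum_{i} w_{p_i},\ \sum_{j} w_{q_j}\Big).
\end{aligned}
```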
EMD
  Once the optimisation problem has been solved, the EMD is the total work normalised by the total flow
  Details in The Earth Mover's Distance as a Metric for Image Retrieval - 1998, Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas
  Code for EMD: http://www.cs.duke.edu/~tomasi/software/emd.htm
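For intuition, the general problem has a simple closed form in one special case: two 1D histograms of equal total mass with ground distance |i - j| between bins. The sketch below covers only that case (it is not the general solver linked above), and the function name is hypothetical.

```python
def emd_1d(p, q):
    """EMD between two 1D histograms of equal total mass.

    Ground distance between bins i and j is |i - j|. Sweeping left to
    right, the surplus carried past each bin boundary is the dirt that
    must cross it; summing those amounts gives the minimal work, which
    is then divided by the total flow (the total mass).
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("this special case needs equal total mass")
    work = 0.0
    carried = 0.0
    for p_i, q_i in zip(p, q):
        carried += p_i - q_i
        work += abs(carried)
    total = sum(p)
    return work / total if total else 0.0
```

Unlike bin-by-bin measures, moving mass to an adjacent bin costs less than moving it far away, which is why EMD is less sensitive to how the data is binned.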
Relevance Feedback
  Image retrieval is iterative
  User target class is unique
    For some images colour may be a good feature
    For others texture, shape, etc.
  User provides feedback on
    Good match
    Bad match
    Neutral match
  Imbalance between new training set and global set - SVM not suitable
CBIR
  Large and rapidly growing field
  Applications to video as well as images
    Violence detection - TCD & Google
    Pornography detection - TCD and PixAlert
  Knowledge of image descriptors' strengths and weaknesses
  Knowledge of image matching methods' strengths and weaknesses
  Role of relevance feedback