CS 1674: Intro to Computer Vision
Midterm Review
Prof. Adriana Kovashka
University of Pittsburgh
October 10, 2016
Reminders
• The midterm exam is in class on this coming Wednesday
• There will be no make-up exams unless you or a close relative is seriously ill!
Review requests I received
• Textures and texture representations, image responses to size and orientation of Gaussian filter banks, comparisons – 4
• Corner detection alg, Harris – 4
• Invariance vs covariance, affine intensity change, and applications to know – 3
• Scale-invariant detection, blob detection, Harris automatic scale selection – 3
• SIFT and feature description – 3
• Keypoint matching alg, feature matching – 2
• Examples of how to compute and apply homography, epipolar geometry – 2
• Why it makes sense to use the ratio: distance to best match / distance to second best match when matching features across images
• Summary of equations students need to know
• Pyramids
• Convolution practical use
• Filters for transforming the image
Transformations, Homographies, Epipolar Geometry
2D Linear Transformations
Only linear 2D transformations can be represented with
a 2x2 matrix.
Linear transformations are combinations of …
• Scale,
• Rotation,
• Shear, and
• Mirror
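$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$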
Alyosha Efros
2D Affine Transformations
Affine transformations are combinations of …
• Linear transformations, and
• Translations
Maps lines to lines, parallel lines remain parallel
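$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$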
Adapted from Alyosha Efros
Projective Transformations
Projective transformations:
• Affine transformations, and
• Projective warps
Parallel lines do not necessarily remain parallel
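$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$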
Kristen Grauman
How to stitch together a panorama (a.k.a. mosaic)?
• Basic Procedure
– Take a sequence of images from the same position
  • Rotate the camera about its optical center
– Compute the homography (transformation) between second image and first
– Transform the second image to overlap with the first
– Blend the two together to create a mosaic
– (If there are more images, repeat)
Modified from Steve Seitz
To compute the homography given pairs of corresponding points in the images, we need to set up an equation where the parameters of H are the unknowns…
$(x_1, y_1) \leftrightarrow (x'_1, y'_1)$
$(x_2, y_2) \leftrightarrow (x'_2, y'_2)$
…
$(x_n, y_n) \leftrightarrow (x'_n, y'_n)$
Kristen Grauman
Computing the homography
Can set scale factor i=1. So, there are 8 unknowns.
Set up a system of linear equations:
Ah = b
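where vector of unknowns $h = [a, b, c, d, e, f, g, h]^T$
Need at least 8 eqs, but the more the better…
Solve for h. If overconstrained, solve using least-squares: $\min \|Ah - b\|^2$
$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
p’ = Hp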
Kristen Grauman
Computing the homography
• Assume we have four matched points: How do we
compute homography H?
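$$\begin{bmatrix} -x & -y & -1 & 0 & 0 & 0 & xx' & yx' & x' \\ 0 & 0 & 0 & -x & -y & -1 & xy' & yy' & y' \end{bmatrix} \mathbf{h} = \mathbf{0}$$
where
$$\mathbf{p}' = \begin{bmatrix} w'x' \\ w'y' \\ w' \end{bmatrix}, \qquad H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \qquad \mathbf{h} = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 & h_5 & h_6 & h_7 & h_8 & h_9 \end{bmatrix}^T$$
Each matched point contributes these two rows; stacking the rows for all matches gives A.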
Derek Hoiem
p’=Hp
• Apply SVD: $U D V^T = A$ (in Matlab: [U, S, V] = svd(A);)
• $h = V_{\text{smallest}}$ (column of V corresponding to the smallest singular value)
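The steps above (two rows of A per match, then SVD) can be sketched in Python with NumPy. This is a minimal illustration rather than the course's reference code: the function name `estimate_homography` and the sample points are hypothetical, and the final division assumes the last entry of H is nonzero.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 H with dst ~ H @ src from at least 4 point pairs."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        # Two rows of A per correspondence (x, y) <-> (x', y')
        rows.append([-x, -y, -1, 0, 0, 0, x * xp, y * xp, xp])
        rows.append([0, 0, 0, -x, -y, -1, x * yp, y * yp, yp])
    A = np.asarray(rows, dtype=float)
    # NumPy returns V transposed, so the last row of Vt is the column of V
    # that pairs with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # remove the arbitrary scale (assumes H[2, 2] != 0)

# Hypothetical matches: the unit square scaled by 2 and shifted by (2, 3)
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (4, 3), (4, 5), (2, 5)]
H = estimate_homography(src, dst)
```

With more than four matches the same code returns the least-squares solution of Ah = 0 subject to unit norm of h, which is why adding matches helps with noisy points.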
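$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad \mathbf{p}' = H\mathbf{p}$$
$$(x, y) \;\rightarrow\; \left( \frac{wx'}{w}, \frac{wy'}{w} \right) = (x', y')$$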
To apply a given homography H
• Compute p’ = Hp (regular matrix multiply)
• Convert p’ from homogeneous to image
coordinates
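As a small sketch of these two steps, the function below multiplies by H and then divides by the third homogeneous coordinate. The name `apply_homography` and the matrix `H_example` are made up for illustration.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 homography given as nested lists."""
    # p' = H p, with p = (x, y, 1) in homogeneous coordinates
    wx = H[0][0] * x + H[0][1] * y + H[0][2]
    wy = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Convert from homogeneous back to image coordinates
    return wx / w, wy / w

# Hypothetical homography whose last row is not (0, 0, 1),
# so the division by w actually matters.
H_example = [[2.0, 0.0, 2.0],
             [0.0, 2.0, 3.0],
             [0.0, 0.0, 2.0]]
mapped = apply_homography(H_example, 1.0, 1.0)
```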
Modified from Kristen Grauman
Transforming the second image
[Figure: Image 1, Image 2, canvas. Test point: f(x,y), g(x’,y’)]
Transforming the second image
Forward warping:
Send each pixel f(x,y) to its corresponding location
(x’,y’) = H(x,y) in the right image
[Figure: pixel (x, y) is sent through H(x,y) to (x’, y’); Image 1, Image 2, canvas]
Modified from Alyosha Efros
Depth from disparity
[Figure: image I(x,y), disparity map D(x,y), image I´(x´,y´)]
So if we could find the corresponding points in two images,
we could estimate relative depth…
Kristen Grauman
We have two images taken from cameras with different intrinsic and extrinsic parameters.
• How do we match a point in the first image to a point in the second?
Epipolar geometry: notation
[Figure: 3D point X projecting to x in one image and x’ in the other]
• Baseline – line connecting the two camera centers
• Epipoles
= intersections of baseline with image planes
= projections of the other camera center
• Epipolar Plane – plane containing baseline
• Epipolar Lines – intersections of epipolar plane with image planes (always come in corresponding pairs)
• Note: All epipolar lines intersect at the epipole.
Derek Hoiem
Epipolar constraint
The epipolar constraint is useful because
it reduces the correspondence problem
to a 1D search along an epipolar line.
Kristen Grauman, image from Andrew Zisserman
Essential matrix
$$X \cdot (T \times R X') = 0$$
$$X \cdot ([T_\times] R X') = 0$$
Let $E = [T_\times] R$
$$X^T E X' = 0$$
E is called the essential matrix, and it relates corresponding image points between both cameras, given the rotation and translation.
Before we said: If we observe a point in one image, its position in the other image is constrained to lie on the line defined by the above.
• Turns out Ex’ is the epipolar line through x in the first image, corresponding to x’.
Note: these points are in camera coordinate systems.
Kristen Grauman
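A quick numeric check of the essential-matrix constraint, as a sketch only. It assumes the convention X = R X’ + T (so that Ex’ is a line in the first image, as stated above); the rotation, translation, and 3D point are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
def matvec(M, v):
    """3x3 matrix times 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skew(t):
    """[T_x]: the matrix such that skew(t) @ v equals the cross product t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]

# Assumed setup: second-camera coordinates X' map to first-camera
# coordinates by X = R X' + T.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90 degrees about z
T = [1.0, 0.0, 0.0]
X_prime = [0.5, 2.0, 4.0]
X = [r + t for r, t in zip(matvec(R, X_prime), T)]

E = matmul(skew(T), R)                   # E = [T_x] R
residual = dot(X, matvec(E, X_prime))    # X^T E X', zero for a true match
```

Because the constraint is homogeneous, scaling X or X’ (for example, dividing by depth to get image points) leaves the residual at zero.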
Basic stereo matching algorithm
• For each pixel in the first image
– Find corresponding epipolar scanline in the right image
– Search along epipolar line and pick the best match x’
– Compute disparity x-x’ and set depth(x) = f*T/(x-x’)
Derek Hoiem
Matching cost
[Figure: left and right images with the scanline marked; matching cost plotted against disparity]
• Slide a window along the right scanline and compare contents
of that window with the reference window in the left image
• Matching cost: e.g. Euclidean distance
Derek Hoiem
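The window search along a scanline can be sketched on two 1D intensity rows. This is an illustrative toy, not the lecture's code: it uses the sum of squared differences (the squared Euclidean distance, which picks the same best match), and the focal length `f` and baseline `T` are made-up numbers.

```python
def best_disparity(left, right, x, half, max_disp):
    """Disparity d >= 0 that minimizes SSD between the window centered at x in
    the left scanline and the window centered at x - d in the right scanline."""
    ref = left[x - half : x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:          # window would fall off the scanline
            break
        cand = right[xr - half : xr + half + 1]
        cost = sum((a - b) ** 2 for a, b in zip(ref, cand))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Toy scanlines: the pattern 9, 5, 1 appears 3 pixels further left in the right view
left = [0, 0, 0, 0, 0, 9, 5, 1, 0, 0]
right = [0, 0, 9, 5, 1, 0, 0, 0, 0, 0]
d = best_disparity(left, right, x=6, half=1, max_disp=5)

f, T = 700.0, 0.1        # hypothetical focal length (pixels) and baseline (meters)
depth = f * T / d        # depth(x) = f*T/(x - x')
```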
Correspondence search
• Assume parallel optical axes, known camera parameters
(i.e., calibrated cameras). What is expression for Z?
Similar triangles (pl, P, pr) and
(Ol, P, Or):
Geometry for a simple stereo system
$$\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$$
$$Z = f\,\frac{T}{x_r - x_l}$$
where $x_r - x_l$ is the disparity and Z is the depth.
Kristen Grauman
Results with window search
[Figure: data (left image, right image); window-based matching result next to ground truth]
Derek Hoiem
How can we improve?
• Uniqueness
– For any point in one image, there should be at most one matching point in the other image
• Ordering
– Corresponding points should be in the same order in both views
• Smoothness
– We expect disparity values to change slowly (for the most part)
Derek Hoiem
Many of these constraints can be encoded in an energy function and solved using graph cuts
[Figure: before, graph cuts, ground truth]
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001
For the latest and greatest: http://vision.middlebury.edu/stereo/
Derek Hoiem
Projective structure from motion
• Given: m images of n fixed 3D points
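$x_{ij} = P_i X_j$, i = 1, …, m, j = 1, …, n
• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn corresponding 2D points $x_{ij}$
[Figure: 3D point $X_j$ projected by cameras $P_1$, $P_2$, $P_3$ to image points $x_{1j}$, $x_{2j}$, $x_{3j}$]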
Svetlana Lazebnik
Photosynth
Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring
photo collections in 3D," SIGGRAPH 2006
http://photosynth.net/
3D from multiple images
Building Rome in a Day: Agarwal et al. 2009
Recap: Epipoles
[Figure: the two camera centers C]
• Point x in left image corresponds to epipolar line l’ in right image
• Epipolar line passes through the epipole (the intersection of the cameras’ baseline with the image plane)
Derek Hoiem
Recap: Essential, Fundamental Matrices
• Fundamental matrix maps from a point in one image to a line in the other
• If x and x’ correspond to the same 3d point X:
• Essential matrix is like fundamental matrix but more constrained
Adapted from Derek Hoiem
Recap: stereo with calibrated cameras
• Given image pair, R, T
• Detect some features
• Compute essential matrix E
• Match features using the epipolar and other constraints
• Triangulate for 3d structure and get depth
Kristen Grauman
Texture representations
Correlation filtering
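Say the averaging window size is 2k+1 x 2k+1:
$$G[i,j] = \frac{1}{(2k+1)^2} \sum_{u=-k}^{k} \sum_{v=-k}^{k} F[i+u, j+v]$$
(Attribute uniform weight to each pixel; loop over all pixels in neighborhood around image pixel F[i,j])
Now generalize to allow different weights depending on neighboring pixel’s relative position:
$$G[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} H[u,v]\, F[i+u, j+v]$$
(Non-uniform weights)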
Kristen Grauman
Convolution vs. correlation
Convolution
Cross-correlation
F =
5 2 5 4 4
5 200 3 200 4
1 5 5 4 4
5 5 1 1 2
200 1 3 5 200
1 200 200 200 1
H =
.06 .12 .06
.12 .25 .12
.06 .12 .06
[Figure labels: u = -1, v = -1; (0, 0); (i, j)]
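To make the difference between the two operations concrete, here is a minimal pure-Python sketch. It assumes a square, odd-sized filter and treats pixels outside the image as zero; both choices, and the function names, are illustrative assumptions. Filtering an impulse shows the effect: correlation returns the filter flipped, convolution returns the filter itself.

```python
def correlate(F, H):
    """Cross-correlation: G[i][j] = sum over u, v of H[u][v] * F[i+u][j+v].
    H is (2k+1) x (2k+1); pixels outside F count as zero."""
    k = len(H) // 2
    rows, cols = len(F), len(F[0])
    G = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for u in range(-k, k + 1):
                for v in range(-k, k + 1):
                    if 0 <= i + u < rows and 0 <= j + v < cols:
                        G[i][j] += H[u + k][v + k] * F[i + u][j + v]
    return G

def convolve(F, H):
    """Convolution is correlation with the filter flipped in both dimensions."""
    flipped = [row[::-1] for row in H[::-1]]
    return correlate(F, flipped)

impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
H = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
corr = correlate(impulse, H)   # the filter, flipped in both dimensions
conv = convolve(impulse, H)    # the filter itself
```

For a symmetric filter such as the Gaussian-like H shown above, the two operations give the same result.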
Filters for computing gradients
[Figure: image * filter = gradient response, with the filter]
-1 0 1
-2 0 2
-1 0 1
Slide credit: Derek Hoiem
Texture representation: example
original image → derivative filter responses, squared → statistics to summarize patterns in small windows

Window     mean d/dx value    mean d/dy value
Win. #1    4                  10
Win. #2    18                 7
…
Win. #9    20                 20
Kristen Grauman
Filter banks
• What filters to put in the bank?
– Typically we want a combination of scales and orientations, different types of patterns.
Matlab code available for these examples: http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html
[Figure: filter bank with “edges”, “bars”, and “spots” at several scales and orientations]
Kristen Grauman
Matching with filters
• Goal: find [eye patch] in image
• Method 0: filter the image with eye patch
[Figure: input, filtered image]
$$g[m,n] = \sum_{k,l} h[k,l]\, f[m+k, n+l]$$
What went wrong?
f = image
h = filter
g = filtered image
Derek Hoiem
Matching with filters
• Goal: find [eye patch] in image
• Method 1: filter the image with zero-mean eye
[Figure: input, filtered image (scaled), thresholded image with true detections and false detections]
$$g[m,n] = \sum_{k,l} \left( h[k,l] - \operatorname{mean}(h) \right) f[m+k, n+l]$$
Likes bright pixels where filters are above average, dark pixels where filters are below average.
Derek Hoiem
Showing magnitude of responses
Kristen Grauman
Representing texture by mean abs response
[Figure: filters and their mean abs responses]
Derek Hoiem
Computing distances using texture
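[Figure: points a and b plotted in a 2D feature space with axes Dimension 1 and Dimension 2]
$$D(a,b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$$
$$D(a,b) = \sqrt{\sum_{i=1}^{\#dim} (a_i - b_i)^2}$$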
Kristen Grauman
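The distance above is a one-liner in Python. The sketch below applies it to the three window vectors (mean d/dx value, mean d/dy value) from the texture example earlier in this section; the function name is made up.

```python
import math

def texture_distance(a, b):
    """Euclidean distance between two texture feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# (mean d/dx value, mean d/dy value) for windows #1, #2, and #9
win1 = [4, 10]
win2 = [18, 7]
win9 = [20, 20]

d12 = texture_distance(win1, win2)   # sqrt(14^2 + 3^2)
d19 = texture_distance(win1, win9)   # sqrt(16^2 + 10^2)
```

Under this measure window #1 is closer in texture to window #2 than to window #9.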
Feature detection: Harris
Corners as distinctive interest points
• We should easily recognize the keypoint by looking through a small window
• Shifting a window in any direction should give a large change in intensity
“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions
A. Efros, D. Frolova, D. Simakov
Harris Detector: Mathematics
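Window-averaged squared change of intensity induced by shifting the image data by [u,v]:
$$E(u,v) = \sum_{x,y} w(x,y)\, \left[ I(x+u, y+v) - I(x,y) \right]^2$$
Here w(x,y) is the window function, I(x+u, y+v) is the shifted intensity, and I(x,y) is the intensity.
Window function w(x,y) = 1 in window, 0 outside, or Gaussian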
D. Frolova, D. Simakov
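Harris Detector: Mathematics
Expanding I(x,y) in a Taylor series expansion, we have, for small shifts [u,v], a quadratic approximation to the error surface between a patch and itself, shifted by [u,v]:
$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where M is a 2×2 matrix computed from image derivatives:
$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x I_x & I_x I_y \\ I_x I_y & I_y I_y \end{bmatrix}$$
Notation: $I_x \Leftrightarrow \frac{\partial I}{\partial x}$, $I_y \Leftrightarrow \frac{\partial I}{\partial y}$, $I_x I_y \Leftrightarrow \frac{\partial I}{\partial x} \frac{\partial I}{\partial y}$
D. Frolova, D. Simakov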
K. Grauman
Harris Detector: Mathematics
What does the matrix M reveal?
Since M is symmetric, we have $M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^T$
$$M x_i = \lambda_i x_i$$
The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.
K. Grauman
Corner response function
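“flat” region: $\lambda_1$ and $\lambda_2$ are small
“edge”: $\lambda_1 \gg \lambda_2$ or $\lambda_2 \gg \lambda_1$
“corner”: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$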
Adapted from A. Efros, D. Frolova, D. Simakov, K. Grauman
Harris Detector: Algorithm
• Compute image gradients Ix and Iy for all pixels
• For each pixel
– Compute $M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x I_x & I_x I_y \\ I_x I_y & I_y I_y \end{bmatrix}$ by looping over neighbors x, y
– Compute $R = \det(M) - k\,(\operatorname{trace}(M))^2$ (k: empirical constant, k = 0.04-0.06)
• Find points with large corner response function R (R > threshold)
• Take the points of locally maximum R as the detected feature points (i.e., pixels where R is bigger than for all the 4 or 8 neighbors)
D. Frolova, D. Simakov
K. Grauman
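A minimal sketch of the per-window part of the algorithm, using the standard response R = det(M) − k·trace(M)², which is consistent with the quoted range for k. It assumes a uniform window (w = 1) and takes the gradients inside one window as flat lists; the function name and the synthetic gradient values are hypothetical.

```python
def harris_response(Ix, Iy, k=0.05):
    """Corner response R = det(M) - k * trace(M)^2 for one window.
    Ix, Iy: gradients at the pixels inside the window (uniform weights)."""
    sxx = sum(gx * gx for gx in Ix)                 # sum of Ix * Ix
    syy = sum(gy * gy for gy in Iy)                 # sum of Iy * Iy
    sxy = sum(gx * gy for gx, gy in zip(Ix, Iy))    # sum of Ix * Iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Synthetic windows of 8 pixels each
flat = harris_response([0.0] * 8, [0.0] * 8)              # no gradient anywhere
edge = harris_response([1.0] * 8, [0.0] * 8)              # gradient in x only
corner = harris_response([1.0] * 4 + [0.0] * 4,
                         [0.0] * 4 + [1.0] * 4)           # gradients in x and in y
```

The signs match the eigenvalue picture: R is near zero for a flat region, negative for an edge, and positive for a corner.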
Example of Harris application
Feature detection: Scale-invariance
Invariance vs covariance
“A function is invariant under a certain family of
transformations if its value does not change when a
transformation from this family is applied to its argument.
A function is covariant when it commutes with the
transformation, i.e., applying the transformation to the
argument of the function has the same effect as applying
the transformation to the output of the function. […]
[For example,] the area of a 2D surface is invariant under
2D rotations, since rotating a 2D surface does not make
it any smaller or bigger.
But the orientation of the major axis of inertia of the
surface is covariant under the same family of
transformations, since rotating a 2D surface will affect
the orientation of its major axis in exactly the same way.”
“Local Invariant Feature Detectors: A Survey” by Tinne Tuytelaars and Krystian Mikolajczyk,
in Foundations and Trends in Computer Graphics and Vision Vol. 3, No. 3 (2007) 177–280
Chapter 1, 3.2, 7 http://homes.esat.kuleuven.be/%7Etuytelaa/FT_survey_interestpoints08.pdf
What happens if: Affine intensity change
• Only derivatives are used => invariance to intensity shift I → I + b
• Intensity scaling: I → a I
[Figure: R plotted against x (image coordinate) with the threshold, before and after intensity scaling]
Partially invariant to affine intensity change I → a I + b
S. Lazebnik
What happens if: Image translation
• Derivatives and window function are shift-invariant
Corner location is covariant w.r.t. translation
S. Lazebnik
What happens if: Image rotation
Second moment ellipse rotates but its shape
(i.e. eigenvalues) remains the same
Corner location is covariant w.r.t. rotation
S. Lazebnik
What happens if: Scaling
[Figure: a corner at one scale; after scaling up, all points will be classified as edges]
Corner location is not covariant to scaling!
S. Lazebnik
Scale Invariant Detection
• Problem:
– How do we choose corresponding circles independently in each image?
– Do objects in the image have a characteristic scale that we can identify?
D. Frolova, D. Simakov
Scale Invariant Detection
• Solution:
– Design a function on the region which is “scale invariant” (has the same shape even if the image is resized)
– Take a local maximum of this function
[Figure: f plotted against region size for Image 1 and for Image 2 (scale = 1/2), with the maxima at region sizes s1 and s2]
Adapted from A. Torralba
Automatic Scale Selection
• Function responses for increasing scale (scale signature)
[Figure: scale signatures $f(I_{i_1 \ldots i_m}(x, \sigma))$ at corresponding points in two images]
K. Grauman, B. Leibe
What Is A Useful Signature Function?
• Laplacian of Gaussian = “blob” detector
K. Grauman, B. Leibe
Difference of Gaussian ≈ Laplacian
• We can approximate the Laplacian with a difference of Gaussians; more efficient to implement.
$$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right) \quad \text{(Laplacian)}$$
$$DoG = G(x, y, k\sigma) - G(x, y, \sigma) \quad \text{(Difference of Gaussians)}$$
Difference of Gaussian: Efficient computation
• Computation in Gaussian scale pyramid
K. Grauman, B. Leibe
[Figure: Gaussian scale pyramid built from the original image with $\sigma = 2^{1/4}$; sampling with step $\sigma^4 = 2$]
Find local maxima in position-scale space of Difference-of-Gaussian
[Figure: position-scale space across scale levels 2, 3, 4, 5]
Find places where X is greater than all of its neighbors (in green)
⇒ List of (x, y, s)
Adapted from K. Grauman, B. Leibe
Laplacian pyramid example
• Allows detection of increasingly coarse detail
Results: Difference-of-Gaussian
K. Grauman, B. Leibe
Feature description
Gradients
m(x, y) = sqrt(1 + 0) = 1
Θ(x, y) = atan(0/1) = 0
Scale Invariant Feature Transform
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Quantize the gradient orientations, i.e. snap each gradient to one of 8 angles
• Each gradient contributes not just 1, but magnitude(gradient) to the histogram, i.e. stronger gradients contribute more
• 16 cells * 8 orientations = 128 dimensional descriptor for each detected feature
Adapted from L. Zitnick, D. Lowe
• Normalize + clip (threshold the normalized values at 0.2) + normalize the descriptor again
• After normalizing, we have: [equation missing from source] such that: [condition missing from source, involving the threshold 0.2]
Adapted from L. Zitnick, D. Lowe
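The normalize, clip, normalize step can be sketched as follows. This is an illustrative reading of that bullet (unit length, clip at 0.2, unit length again), not a full SIFT implementation; the function name and the 128-entry toy histogram are made up.

```python
import math

def normalize_clip_normalize(hist, clip=0.2):
    """Normalize to unit length, clip entries at `clip`, then normalize again."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)
    v = unit(hist)
    v = [min(x, clip) for x in v]   # limit the influence of very strong gradients
    return unit(v)

# Toy 128-dimensional histogram dominated by a single bin
raw = [10.0, 1.0, 1.0, 1.0] + [0.0] * 124
desc = normalize_clip_normalize(raw)
```

Before clipping, the dominant bin holds about 0.99 of the unit-length vector; after the full step it holds roughly 0.76, so one very strong gradient no longer swamps the rest of the descriptor.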
Making descriptor rotation invariant
• Rotate patch according to its dominant gradient orientation
• This puts the patches into a canonical orientation
Image from Matthew Brown (CSE 576: Computer Vision)
K. Grauman
Keypoint matching
Matching local features
• To generate candidate matches, find patches that have the most similar appearance (e.g., lowest feature Euclidean distance)
• Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)
[Figure: candidate matches between patches in Image 1 and Image 2]
K. Grauman
Robust matching
• At what Euclidean distance value do we have a good match?
• To add robustness to matching, can consider ratio: distance to best match / distance to second best match
• If low, first match looks good.
• If high, could be ambiguous match.
[Figure: candidate matches between Image 1 and Image 2]
K. Grauman
Ratio: example
• Let q be the query from the first image,
d1 be the closest match in the second image,
and d2 be the second closest match
• Let dist(q, d1) and dist(q, d2) be the distances
• Let r = dist(q, d1) / dist(q, d2)
• What is the largest that r can be?
• What is the lowest that r can be?
• If r is 1, what do we know about the two
distances?
• What about when r is 0.1?
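A small sketch for experimenting with these questions. The function name, the 2D toy descriptors, and the acceptance threshold of 0.8 are assumptions for illustration (real descriptors would be 128-dimensional, and the threshold is a tunable choice).

```python
import math

def ratio_test(query, candidates, max_ratio=0.8):
    """Return (r, accepted) where r = dist(q, d1) / dist(q, d2)."""
    dists = sorted(
        math.sqrt(sum((q - c) ** 2 for q, c in zip(query, cand)))
        for cand in candidates
    )
    d1, d2 = dists[0], dists[1]
    r = d1 / d2 if d2 > 0 else 1.0   # identical zero distances count as ambiguous
    return r, r < max_ratio

q = (0.0, 0.0)
# Distinctive case: best match is 10x closer than the runner-up
r_good, ok_good = ratio_test(q, [(1.0, 0.0), (10.0, 0.0), (0.0, 12.0)])
# Ambiguous case: two candidates are equally close
r_bad, ok_bad = ratio_test(q, [(3.0, 4.0), (5.0, 0.0), (0.0, 9.0)])
```

Since d1 is by definition no farther than d2, r always lies between 0 and 1; r near 1 means the two best candidates are about equally close.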
Indexing local features: Setup
• When we see close points in feature space, we have similar descriptors, which indicates similar local content.
[Figure: descriptors from the database images and the query image plotted in the descriptor’s feature space]
K. Grauman
Image matching
• Summarize entire image
based on its distribution
(histogram) of word
occurrences.
• Analogous to bag of words
representation commonly
used for documents.
Describing images w/ visual words
[Figure: feature patches mapped to visual words; three histograms over visual words, vertical axis: times appearing]
K. Grauman
Bag of visual words: Two uses
1. Represent the image
2. Using that representation, look for similar images
3. Can also use BOW to compute an inverted index, to simplify application #2
Visual words: main idea
• Extract some local features from a number of images …
e.g., SIFT descriptor space: each
point is 128-dimensional
D. Nister, CVPR 2006
Visual words: main idea
• “Quantize” the space by grouping (clustering) the features.
Note: For now, we’ll treat clustering as a black box.
D. Nister, CVPR 2006
Inverted file index and bags of words similarity
1. (offline) Extract features in database images, cluster them to find words, make index
2. Extract words in query (extract features and map each to closest cluster center)
3. Use inverted file index to find frames relevant to query
4. For each relevant frame, rank them by comparing word counts (BOW) of query and frame
Adapted from K. Grauman
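The four steps can be sketched end to end on toy data. Everything concrete here is assumed for illustration: the 2D "descriptors", the three cluster centers (clustering itself is treated as already done), the image names, and the use of histogram intersection to compare word counts.

```python
from collections import Counter, defaultdict

def nearest_word(feature, centers):
    """Index of the closest cluster center (visual word)."""
    return min(range(len(centers)),
               key=lambda w: sum((f - c) ** 2 for f, c in zip(feature, centers[w])))

def bag_of_words(features, centers):
    """Histogram of visual word occurrences for one image."""
    return Counter(nearest_word(f, centers) for f in features)

centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]    # hypothetical visual words
database = {
    "img_a": [(0.5, 0.2), (9.0, 1.0), (9.5, -0.5)],
    "img_b": [(0.1, 9.0), (1.0, 11.0)],
}

# Step 1 (offline): word counts per image and the inverted file index
bows = {name: bag_of_words(feats, centers) for name, feats in database.items()}
inverted = defaultdict(set)
for name, bow in bows.items():
    for word in bow:
        inverted[word].add(name)

# Step 2: words in the query
query_bow = bag_of_words([(9.8, 0.3), (0.2, 0.1)], centers)
# Step 3: only images sharing at least one word with the query are candidates
candidates = set().union(*(inverted[w] for w in query_bow))
# Step 4: rank candidates by histogram intersection of word counts
ranked = sorted(candidates,
                key=lambda n: -sum((query_bow & bows[n]).values()))
```

The index pays off in step 3: images that share no word with the query (here `img_b`) are never compared against it.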
Scoring retrieval quality
[Figure: precision (vertical axis, 0 to 1) plotted against recall (horizontal axis, 0 to 1)]
Query
Database size: 10 images
Relevant (total): 5 images (e.g. images of Golden Gate)
Results (ordered):
precision = # returned relevant / # returned
recall = # returned relevant / # total relevant
Ondrej Chum
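The two definitions can be evaluated after each returned result to trace out a precision-recall curve. In this sketch the image identifiers and the particular result ordering are invented; only the count of 5 relevant images follows the example above.

```python
def precision_recall(results, relevant):
    """(precision, recall) after each position of an ordered result list."""
    hits, curve = 0, []
    for rank, item in enumerate(results, start=1):
        if item in relevant:
            hits += 1
        # precision = # returned relevant / # returned
        # recall    = # returned relevant / # total relevant
        curve.append((hits / rank, hits / len(relevant)))
    return curve

relevant = {"g1", "g2", "g3", "g4", "g5"}      # 5 relevant images in total
results = ["g1", "x1", "g2", "g3", "x2"]       # hypothetical ordered results
curve = precision_recall(results, relevant)
```

Recall can only stay the same or rise as more results are returned, while precision drops whenever an irrelevant image appears.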