CS 1674: Intro to Computer Vision
CS 1674: Intro to Computer Vision Midterm Review Prof. Adriana Kovashka University of Pittsburgh October 10, 2016
Transcript
Page 1: CS 1674: Intro to Computer Vision

CS 1674: Intro to Computer Vision

Midterm Review

Prof. Adriana Kovashka, University of Pittsburgh

October 10, 2016

Page 2: CS 1674: Intro to Computer Vision

Reminders

• The midterm exam is in class on this coming Wednesday

• There will be no make-up exams unless you or a close relative is seriously ill!

Page 3: CS 1674: Intro to Computer Vision

Review requests I received

• Textures and texture representations, image responses to size and orientation of Gaussian filter banks, comparisons – 4
• Corner detection algorithm, Harris – 4
• Invariance vs covariance, affine intensity change, and applications to know – 3
• Scale-invariant detection, blob detection, Harris automatic scale selection – 3
• SIFT and feature description – 3
• Keypoint matching algorithm, feature matching – 2
• Examples of how to compute and apply homography, epipolar geometry – 2
• Why it makes sense to use the ratio: distance to best match / distance to second best match when matching features across images
• Summary of equations students need to know
• Pyramids
• Convolution practical use
• Filters for transforming the image

Page 4: CS 1674: Intro to Computer Vision

Transformations, Homographies, Epipolar Geometry

Page 5: CS 1674: Intro to Computer Vision

2D Linear Transformations

Only linear 2D transformations can be represented with

a 2x2 matrix.

Linear transformations are combinations of …

• Scale,

• Rotation,

• Shear, and

• Mirror

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

Alyosha Efros

Page 6: CS 1674: Intro to Computer Vision

2D Affine Transformations

Affine transformations are combinations of …

• Linear transformations, and

• Translations

Maps lines to lines, parallel lines remain parallel

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

Adapted from Alyosha Efros

Page 7: CS 1674: Intro to Computer Vision

Projective Transformations

Projective transformations:

• Affine transformations, and

• Projective warps

Parallel lines do not necessarily remain parallel

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$$

Kristen Grauman

Page 8: CS 1674: Intro to Computer Vision

How to stitch together a panorama (a.k.a. mosaic)?

• Basic Procedure

– Take a sequence of images from the same position (rotate the camera about its optical center)

– Compute the homography (transformation) between second image and first

– Transform the second image to overlap with the first

– Blend the two together to create a mosaic

– (If there are more images, repeat)

Modified from Steve Seitz

Page 9: CS 1674: Intro to Computer Vision

To compute the homography given pairs of corresponding points in the images, $(x_1, y_1) \leftrightarrow (x_1', y_1')$, $(x_2, y_2) \leftrightarrow (x_2', y_2')$, ..., $(x_n, y_n) \leftrightarrow (x_n', y_n')$, we need to set up an equation where the parameters of H are the unknowns…

Kristen Grauman

Computing the homography

Page 10: CS 1674: Intro to Computer Vision

Computing the homography

Can set scale factor i=1. So, there are 8 unknowns.

Set up a system of linear equations:

Ah = b

where vector of unknowns h = [a,b,c,d,e,f,g,h]T

Need at least 8 eqs, but the more the better…

Solve for h. If overconstrained, solve using least-squares: $\min \|A\mathbf{h} - \mathbf{b}\|^2$

$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad \mathbf{p}' = H\mathbf{p}$$

Kristen Grauman

Page 11: CS 1674: Intro to Computer Vision

Computing the homography

• Assume we have four matched points: How do we compute the homography H?

Set up the system $A\mathbf{h} = \mathbf{0}$: each correspondence $\mathbf{p}' = H\mathbf{p}$ contributes two rows to A,

$$\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y & -x' \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y & -y' \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix} = \mathbf{0}$$

where

$$H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \qquad \mathbf{p}' = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix}$$

Derek Hoiem

• Apply SVD: $A = UDV^T$ (in MATLAB: [U, S, V] = svd(A);)

• h = Vsmallest (column of V corr. to smallest singular value)

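To make the recipe concrete, here is a minimal NumPy sketch of this DLT-plus-SVD estimate; the function name and input format are illustrative, not from the slides:

```python
import numpy as np

def compute_homography(pts, pts_prime):
    """Estimate H such that p' ~ H p from n >= 4 correspondences (DLT + SVD)."""
    A = []
    for (x, y), (xp, yp) in zip(pts, pts_prime):
        # Each correspondence contributes two rows of A h = 0
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    U, S, Vt = np.linalg.svd(np.array(A))
    h = Vt[-1]          # column of V for the smallest singular value
    return h.reshape(3, 3)
```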

Page 12: CS 1674: Intro to Computer Vision

To apply a given homography H

• Compute p’ = Hp (regular matrix multiply)

• Convert p’ from homogeneous to image coordinates:

$$\begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} * & * & * \\ * & * & * \\ * & * & * \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad (x', y') = \left( \frac{wx'}{w}, \frac{wy'}{w} \right)$$

[Figure: a test point (x, y) in Image 2 transformed onto the Image 1 canvas]

Modified from Kristen Grauman
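A sketch of those two bullets in NumPy (the helper name is mine):

```python
import numpy as np

def apply_homography(H, x, y):
    """Map (x, y) through H, then divide by w to leave homogeneous coordinates."""
    wxp, wyp, w = H @ np.array([x, y, 1.0])   # p' = Hp (regular matrix multiply)
    return wxp / w, wyp / w                   # image coordinates (x', y')
```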

Page 13: CS 1674: Intro to Computer Vision


Transforming the second image

Forward warping:

Send each pixel f(x,y) to its corresponding location (x’, y’) = H(x, y) in the right image

[Figure: pixel (x, y) in f sent to (x’, y’) = H(x, y) in g; Image 2 warped onto the Image 1 canvas]

Modified from Alyosha Efros

Page 14: CS 1674: Intro to Computer Vision

Depth from disparity

[Figure: image I(x, y), image I′(x′, y′), and the disparity map D(x, y)]

So if we could find the corresponding points in two images, we could estimate relative depth…

Kristen Grauman

We have two images taken from cameras with different intrinsic and extrinsic parameters.

• How do we match a point in the first image to a point in the second?

Page 15: CS 1674: Intro to Computer Vision

Epipolar geometry: notation

• Baseline – line connecting the two camera centers

• Epipolar Plane – plane containing baseline

• Epipoles = intersections of baseline with image planes = projections of the other camera center

• Epipolar Lines – intersections of epipolar plane with image planes (always come in corresponding pairs)

• Note: All epipolar lines intersect at the epipole.

Derek Hoiem

Page 16: CS 1674: Intro to Computer Vision

Epipolar constraint

The epipolar constraint is useful because it reduces the correspondence problem to a 1D search along an epipolar line.

Kristen Grauman, image from Andrew Zisserman

Page 17: CS 1674: Intro to Computer Vision

Essential matrix

$$X' \cdot (T \times RX) = 0 \qquad\Rightarrow\qquad X'^{T} [T_{\times}]\, R\, X = 0$$

E is called the essential matrix, and it relates corresponding image points between both cameras, given the rotation and translation.

• Before we said: if we observe a point in one image, its position in the other image is constrained to lie on the line defined by the constraint above.

• It turns out Ex’ is the epipolar line through x in the first image, corresponding to x’.

Note: these points are in camera coordinate systems.

Let $E = [T_{\times}] R$. Then:

$$X'^{T} E X = 0$$

Kristen Grauman

Page 18: CS 1674: Intro to Computer Vision

Basic stereo matching algorithm

• For each pixel in the first image
– Find corresponding epipolar scanline in the right image

– Search along epipolar line and pick the best match x’

– Compute disparity x-x’ and set depth(x) = f*T/(x-x’)

Derek Hoiem
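A minimal sketch of this algorithm for rectified images, using the squared (Euclidean) window distance described on the next slide; the window size, disparity range, and variable names are assumptions:

```python
import numpy as np

def stereo_depth(left, right, f, T, window=5, max_disp=64):
    """Window-based stereo for rectified images, following the steps above.
    `window`, `max_disp`, and the SSD matching cost are assumed choices."""
    left, right = left.astype(float), right.astype(float)
    r = window // 2
    h, w = left.shape
    depth = np.zeros((h, w))
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            # Search along the corresponding epipolar scanline in the right image
            costs = [np.sum((ref - right[y - r:y + r + 1, x - d - r:x - d + r + 1]) ** 2)
                     for d in range(1, max_disp + 1)]
            disparity = 1 + int(np.argmin(costs))   # pick the best match: x - x'
            depth[y, x] = f * T / disparity          # depth(x) = f * T / (x - x')
    return depth
```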

Page 19: CS 1674: Intro to Computer Vision

Correspondence search

[Figure: left and right scanlines; matching cost plotted as a function of disparity]

• Slide a window along the right scanline and compare contents of that window with the reference window in the left image

• Matching cost: e.g. Euclidean distance

Derek Hoiem


Page 20: CS 1674: Intro to Computer Vision

• Assume parallel optical axes, known camera parameters

(i.e., calibrated cameras). What is expression for Z?

Geometry for a simple stereo system

Similar triangles (p_l, P, p_r) and (O_l, P, O_r):

$$\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$$

Solving for depth:

$$Z = f \, \frac{T}{x_r - x_l} \qquad \text{(disparity: } x_r - x_l\text{)}$$

Kristen Grauman
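With made-up numbers: a focal length f = 500 pixels, baseline T = 0.1 m, and a measured disparity of 5 pixels give Z = 500 × 0.1 / 5 = 10 m; doubling the disparity to 10 pixels halves the depth to 5 m.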

Page 21: CS 1674: Intro to Computer Vision

Results with window search

[Figure: left image and right image; window-based matching result next to ground truth]

Derek Hoiem

Page 22: CS 1674: Intro to Computer Vision

How can we improve?

• Uniqueness – For any point in one image, there should be at most one matching point in the other image

• Ordering – Corresponding points should be in the same order in both views

• Smoothness – We expect disparity values to change slowly (for the most part)

Derek Hoiem

Page 23: CS 1674: Intro to Computer Vision

Many of these constraints can be encoded in an energy function and solved using graph cuts

[Figure: graph cuts result next to ground truth]

For the latest and greatest: http://vision.middlebury.edu/stereo/

Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001


Derek Hoiem

Page 24: CS 1674: Intro to Computer Vision

Projective structure from motion

• Given: m images of n fixed 3D points

$$x_{ij} = P_i X_j, \quad i = 1, \ldots, m, \quad j = 1, \ldots, n$$

• Problem: estimate m projection matrices Pi and n 3D points Xj from the mn corresponding 2D points xij

[Figure: cameras P_1, P_2, P_3 observing 3D point X_j at image points x_{1j}, x_{2j}, x_{3j}]

Svetlana Lazebnik

Page 25: CS 1674: Intro to Computer Vision

Photosynth

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," SIGGRAPH 2006

http://photosynth.net/

Page 26: CS 1674: Intro to Computer Vision

3D from multiple images

Building Rome in a Day: Agarwal et al. 2009

Page 27: CS 1674: Intro to Computer Vision

Recap: Epipoles

• Point x in the left image corresponds to epipolar line l’ in the right image

• The epipolar line passes through the epipole (the intersection of the cameras’ baseline with the image plane)

Derek Hoiem

Page 28: CS 1674: Intro to Computer Vision

Recap: Essential, Fundamental Matrices

• Fundamental matrix maps from a point in one image to a line in the other

• If x and x’ correspond to the same 3D point X: $x'^{T} F x = 0$

• Essential matrix is like fundamental matrix but more constrained

Adapted from Derek Hoiem

Page 29: CS 1674: Intro to Computer Vision

Recap: stereo with calibrated cameras

• Given image pair, R, T

• Detect some features

• Compute essential matrix E

• Match features using the epipolar and other constraints

• Triangulate for 3d structure and get depth

Kristen Grauman

Page 30: CS 1674: Intro to Computer Vision

Texture representations

Page 31: CS 1674: Intro to Computer Vision

Correlation filtering

Say the averaging window size is 2k+1 x 2k+1. Loop over all pixels in the neighborhood around image pixel F[i,j], attributing a uniform weight to each one:

$$G[i,j] = \frac{1}{(2k+1)^2} \sum_{u=-k}^{k} \sum_{v=-k}^{k} F[i+u,\, j+v]$$

Now generalize to allow different weights depending on each neighboring pixel’s relative position (non-uniform weights):

$$G[i,j] = \sum_{u=-k}^{k} \sum_{v=-k}^{k} H[u,v]\, F[i+u,\, j+v]$$

Kristen Grauman
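The second sum transcribes directly into code; a naive NumPy sketch (a real implementation would use a library routine and handle borders):

```python
import numpy as np

def correlate(F, H):
    """Cross-correlation with a (2k+1)x(2k+1) kernel H; borders are skipped."""
    k = H.shape[0] // 2
    G = np.zeros_like(F, dtype=float)
    for i in range(k, F.shape[0] - k):
        for j in range(k, F.shape[1] - k):
            # Weighted sum of the neighborhood around F[i, j]
            G[i, j] = np.sum(H * F[i - k:i + k + 1, j - k:j + k + 1])
    return G
```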

Page 32: CS 1674: Intro to Computer Vision

Convolution vs. correlation

Image F:

5    2    5    4    4
5  200    3  200    4
1    5    5    4    4
5    5    1    1    2
200  1    3    5  200
1  200  200  200    1

Filter H (Gaussian):

.06 .12 .06
.12 .25 .12
.06 .12 .06

Cross-correlation slides H over F as-is, starting from the (u, v) = (-1, -1) offset relative to the filter origin (0, 0) at each image location (i, j); convolution first flips the filter in both dimensions.
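To check the relationship numerically, SciPy (assumed here for brevity) provides both operations:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

F = np.array([[5, 2, 5, 4, 4],
              [5, 200, 3, 200, 4],
              [1, 5, 5, 4, 4],
              [5, 5, 1, 1, 2],
              [200, 1, 3, 5, 200],
              [1, 200, 200, 200, 1]], float)
H = np.array([[.06, .12, .06],
              [.12, .25, .12],
              [.06, .12, .06]])

# Convolution equals correlation with a kernel flipped in both axes;
# for a symmetric kernel like this Gaussian, the two outputs match.
conv = convolve2d(F, H, mode='valid')
corr = correlate2d(F, np.flip(H), mode='valid')
assert np.allclose(conv, corr)
```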

Page 33: CS 1674: Intro to Computer Vision

Sobel filter:

-1 0 1
-2 0 2
-1 0 1

[Figure: image * Sobel filter = horizontal gradient map]

Slide credit: Derek Hoiem

Filters for computing gradients

Page 34: CS 1674: Intro to Computer Vision

Texture representation: example

original image

derivative filter responses, squared

statistics to summarize patterns in small

windows

          mean d/dx value   mean d/dy value
Win. #1          4                10
Win. #2         18                 7
…
Win. #9         20                20

Kristen Grauman
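A sketch of those per-window statistics; the window size and the use of np.gradient as the derivative filter are my choices, not the slides':

```python
import numpy as np

def texture_stats(img, win=8):
    """Mean d/dx and mean d/dy filter response per (win x win) window,
    giving one 2-D texture descriptor per window as in the table above."""
    dy, dx = np.gradient(img.astype(float))   # derivative filter responses
    feats = []
    for i in range(0, img.shape[0] - win + 1, win):
        for j in range(0, img.shape[1] - win + 1, win):
            feats.append([np.abs(dx[i:i+win, j:j+win]).mean(),
                          np.abs(dy[i:i+win, j:j+win]).mean()])
    return np.array(feats)   # rows: (mean d/dx, mean d/dy) per window
```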

Page 35: CS 1674: Intro to Computer Vision

Filter banks

• What filters to put in the bank?

– Typically we want a combination of scales and orientations, different types of patterns.

Matlab code available for these examples: http://www.robots.ox.ac.uk/~vgg/research/texclass/filters.html

[Figure: filter bank spanning several scales and orientations – “Edges”, “Bars”, “Spots”]

Kristen Grauman

Page 36: CS 1674: Intro to Computer Vision

Matching with filters

• Goal: find in image

• Method 0: filter the image with eye patch

Input Filtered Image

$$g[m,n] = \sum_{k,l} h[k,l]\, f[m+k,\, n+l]$$

What went wrong?

f = image

h = filter (the eye patch); g = response

Derek Hoiem

Page 37: CS 1674: Intro to Computer Vision

Matching with filters

• Goal: find in image

• Method 1: filter the image with zero-mean eye

Input Filtered Image (scaled) Thresholded Image

$$g[m,n] = \sum_{k,l} \left( h[k,l] - \bar{h} \right) f[m+k,\, n+l], \qquad \bar{h} = \text{mean}(h)$$

[Figure: thresholded response map with true detections and false detections marked]

Likes bright pixels where the filter is above average, dark pixels where the filter is below average.

Derek Hoiem
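Method 1 in a few lines, with SciPy correlation assumed and the threshold left as a tuning value:

```python
import numpy as np
from scipy.signal import correlate2d

def match_template_zero_mean(image, patch, thresh):
    """Correlate with a zero-mean template so uniformly bright regions
    no longer dominate the response. `thresh` is a tuning value."""
    g = correlate2d(image.astype(float), patch - patch.mean(), mode='same')
    return g > thresh   # candidate detections
```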

Page 38: CS 1674: Intro to Computer Vision

Showing magnitude of responses

Kristen Grauman

Page 39: CS 1674: Intro to Computer Vision

Kristen Grauman

Page 40: CS 1674: Intro to Computer Vision

Kristen Grauman

Page 41: CS 1674: Intro to Computer Vision

Representing texture by mean abs response

[Figure: filter bank and the mean abs response of the image to each filter]

Derek Hoiem

Page 42: CS 1674: Intro to Computer Vision

Computing distances using texture

[Figure: texture descriptors a and b plotted in a 2-D feature space (Dimension 1, Dimension 2)]

$$D(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$$

$$D(a, b) = \sqrt{\sum_i (a_i - b_i)^2}$$

Kristen Grauman

Page 43: CS 1674: Intro to Computer Vision

Feature detection: Harris

Page 44: CS 1674: Intro to Computer Vision

Corners as distinctive interest points

• We should easily recognize the keypoint by looking through a small window

• Shifting a window in any direction should give a large change in intensity

“flat” region: no change in all directions

“edge”: no change along the edge direction

“corner”: significant change in all directions

A. Efros, D. Frolova, D. Simakov

Page 45: CS 1674: Intro to Computer Vision

Harris Detector: Mathematics

Window-averaged squared change of intensity induced by shifting the image data by [u,v]:

$$E(u,v) = \sum_{x,y} w(x,y) \left[ I(x+u,\, y+v) - I(x,y) \right]^2$$

where w(x, y) is the window function: either 1 inside the window and 0 outside, or a Gaussian.

D. Frolova, D. Simakov

Page 46: CS 1674: Intro to Computer Vision

Harris Detector: Mathematics

Expanding I(x,y) in a Taylor series expansion, we have, for small shifts [u,v], a quadratic approximation to the error surface between a patch and itself, shifted by [u,v]:

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

where M is a 2×2 matrix computed from image derivatives:

D. Frolova, D. Simakov

Page 47: CS 1674: Intro to Computer Vision

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x I_x & I_x I_y \\ I_x I_y & I_y I_y \end{bmatrix}$$

Notation:

$$I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y}, \qquad I_x I_y = \frac{\partial I}{\partial x} \frac{\partial I}{\partial y}$$

K. Grauman

Harris Detector: Mathematics

Page 48: CS 1674: Intro to Computer Vision

What does the matrix M reveal?

Since M is symmetric, we have

$$M = X \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} X^{T}, \qquad M x_i = \lambda_i x_i$$

The eigenvalues of M reveal the amount of intensity change in the two principal orthogonal gradient directions in the window.

K. Grauman

Page 49: CS 1674: Intro to Computer Vision

Corner response function

“flat” region: λ₁ and λ₂ are small

“edge”: λ₁ ≫ λ₂ or λ₂ ≫ λ₁

“corner”: λ₁ and λ₂ are large, λ₁ ~ λ₂

Adapted from A. Efros, D. Frolova, D. Simakov, K. Grauman

Page 50: CS 1674: Intro to Computer Vision

Harris Detector: Algorithm

• Compute image gradients Ix and Iy for all pixels

• For each pixel
– Compute the second moment matrix M by looping over neighbors x, y
– Compute the corner response function R = det(M) − k (trace M)²

• Find points with large corner response function R (R > threshold)

• Take the points of locally maximum R as the detected feature points (i.e., pixels where R is bigger than for all the 4 or 8 neighbors)

D. Frolova, D. Simakov

(k :empirical constant, k = 0.04-0.06)
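A compact sketch of the algorithm above; the Gaussian window, k = 0.05, and the threshold are assumed tuning values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_corners(img, sigma=1.0, k=0.05, thresh=1e-4):
    """Harris detector following the algorithm above (minimal sketch)."""
    Iy, Ix = np.gradient(img.astype(float))       # image gradients
    # Entries of M, summed over each neighborhood with a Gaussian window w(x, y)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    # Corner response function R = det(M) - k * trace(M)^2
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    # Keep locally maximal responses above the threshold (8-neighborhood)
    peaks = (R > thresh) & (R == maximum_filter(R, size=3))
    return np.argwhere(peaks)
```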

Page 51: CS 1674: Intro to Computer Vision

K. Grauman

Example of Harris application

Page 52: CS 1674: Intro to Computer Vision

Feature detection: Scale-invariance

Page 53: CS 1674: Intro to Computer Vision

Invariance vs covariance

“A function is invariant under a certain family of transformations if its value does not change when a transformation from this family is applied to its argument. A function is covariant when it commutes with the transformation, i.e., applying the transformation to the argument of the function has the same effect as applying the transformation to the output of the function. […]

[For example,] the area of a 2D surface is invariant under 2D rotations, since rotating a 2D surface does not make it any smaller or bigger. But the orientation of the major axis of inertia of the surface is covariant under the same family of transformations, since rotating a 2D surface will affect the orientation of its major axis in exactly the same way.”

“Local Invariant Feature Detectors: A Survey” by Tinne Tuytelaars and Krystian Mikolajczyk,

in Foundations and Trends in Computer Graphics and Vision Vol. 3, No. 3 (2007) 177–280

Chapter 1, 3.2, 7 http://homes.esat.kuleuven.be/%7Etuytelaa/FT_survey_interestpoints08.pdf

Page 54: CS 1674: Intro to Computer Vision

What happens if: Affine intensity change

• Only derivatives are used ⇒ invariance to intensity shift I → I + b

• Intensity scaling: I → a I

[Figure: corner response R vs. x (image coordinate); a fixed threshold keeps or loses maxima as intensities are scaled]

Partially invariant to affine intensity change I → a I + b

L. Lazebnik

Page 55: CS 1674: Intro to Computer Vision

What happens if: Image translation

• Derivatives and window function are shift-invariant

Corner location is covariant w.r.t. translation

L. Lazebnik

Page 56: CS 1674: Intro to Computer Vision

What happens if: Image rotation

Second moment ellipse rotates but its shape

(i.e. eigenvalues) remains the same

Corner location is covariant w.r.t. rotation

L. Lazebnik

Page 57: CS 1674: Intro to Computer Vision

What happens if: Scaling

[Figure: a corner detected at one scale looks like edges when the image is scaled up and the window stays fixed]

All points will be classified as edges. Corner location is not covariant to scaling!

L. Lazebnik

Page 58: CS 1674: Intro to Computer Vision

• Problem:

– How do we choose corresponding circles independently in each image?

– Do objects in the image have a characteristic scale that we can identify?

D. Frolova, D. Simakov

Scale Invariant Detection

Page 59: CS 1674: Intro to Computer Vision

Scale Invariant Detection

• Solution:

– Design a function on the region which is “scale invariant” (has the same shape even if the image is resized)

– Take a local maximum of this function

[Figure: the same scene at scale = 1/2; f plotted against region size has the same shape for Image 1 and Image 2, with maxima at corresponding region sizes s1 and s2]

Adapted from A. Torralba

Page 60: CS 1674: Intro to Computer Vision

Automatic Scale Selection

• Function responses for increasing scale (scale signature)

K. Grauman, B. Leibe

$$f(I_{i_1 \ldots i_m}(x, \sigma)) \qquad f(I_{i_1 \ldots i_m}(x', \sigma'))$$


Page 63: CS 1674: Intro to Computer Vision

What Is A Useful Signature Function?

• Laplacian of Gaussian = “blob” detector

K. Grauman, B. Leibe

Page 64: CS 1674: Intro to Computer Vision

Difference of Gaussian ≈ Laplacian

• We can approximate the Laplacian with a difference of Gaussians; more efficient to implement.

$$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right) \quad \text{(Laplacian)}$$

$$DoG = G(x, y, k\sigma) - G(x, y, \sigma) \quad \text{(Difference of Gaussians)}$$
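In code, the approximation is just two blurs and a subtraction; k = 1.6 is a common (assumed) choice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(img, sigma, k=1.6):
    """DoG = G(x, y, k*sigma) - G(x, y, sigma): a blob detector that
    approximates the Laplacian of Gaussian, but is cheaper to compute."""
    img = np.asarray(img, dtype=float)
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
```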

Page 65: CS 1674: Intro to Computer Vision

Difference of Gaussian: Efficient computation

• Computation in Gaussian scale pyramid

K. Grauman, B. Leibe

[Figure: Gaussian scale pyramid built from the original image by repeated blurring and subsampling]

Page 66: CS 1674: Intro to Computer Vision

Find local maxima in position-scale space of Difference-of-Gaussian

Adapted from K. Grauman, B. Leibe

[Figure: position-scale space across scales σ, 2σ, 3σ, 4σ, 5σ]

• Find places where a point X is greater than all of its neighbors, in position and in scale

• Output: a list of (x, y, s)

Page 67: CS 1674: Intro to Computer Vision

Laplacian pyramid example

• Allows detection of increasingly coarse detail

Page 68: CS 1674: Intro to Computer Vision

Results: Difference-of-Gaussian

K. Grauman, B. Leibe

Page 69: CS 1674: Intro to Computer Vision

Feature description

Page 70: CS 1674: Intro to Computer Vision

Gradients

For a gradient with components (I_x, I_y) = (1, 0):

m(x, y) = sqrt(1 + 0) = 1

Θ(x, y) = atan(0/1) = 0

Page 71: CS 1674: Intro to Computer Vision

Full version

• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)

• Quantize the gradient orientations i.e. snap each gradient to one of 8 angles

• Each gradient contributes not just 1, but magnitude(gradient) to the histogram, i.e. stronger gradients contribute more

• 16 cells * 8 orientations = 128 dimensional descriptor for each detected feature

Scale Invariant Feature Transform

Adapted from L. Zitnick, D. Lowe

Page 72: CS 1674: Intro to Computer Vision

Full version

• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)

• Quantize the gradient orientations i.e. snap each gradient to one of 8 angles

• Each gradient contributes not just 1, but magnitude(gradient) to the histogram, i.e. stronger gradients contribute more

• 16 cells * 8 orientations = 128 dimensional descriptor for each detected feature

• Normalize + clip (threshold entries at 0.2) + renormalize the descriptor

• After normalizing, we have a unit-length descriptor in which no single entry exceeds the 0.2 threshold

Scale Invariant Feature Transform

Adapted from L. Zitnick, D. Lowe
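A sketch of the full recipe, assuming precomputed 16x16 arrays of gradient magnitudes and orientations for a keypoint's window; names and binning details are mine:

```python
import numpy as np

def sift_descriptor(mag, ori):
    """128-D descriptor from 16x16 gradient magnitudes `mag` and
    orientations `ori` (radians): 4x4 cells x 8 orientation bins."""
    hist = np.zeros((4, 4, 8))
    # Quantize orientations: snap each gradient to one of 8 angles
    bins = (((ori % (2 * np.pi)) / (2 * np.pi)) * 8).astype(int) % 8
    for i in range(16):
        for j in range(16):
            # Each gradient contributes magnitude(gradient), not just 1
            hist[i // 4, j // 4, bins[i, j]] += mag[i, j]
    d = hist.ravel()
    d = d / max(np.linalg.norm(d), 1e-12)      # normalize
    d = np.minimum(d, 0.2)                     # clip at 0.2
    return d / max(np.linalg.norm(d), 1e-12)   # renormalize
```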

Page 73: CS 1674: Intro to Computer Vision


Image from Matthew Brown

• Rotate patch according to its dominant gradient orientation

• This puts the patches into a canonical orientation

K. Grauman

Making descriptor rotation invariant

Page 74: CS 1674: Intro to Computer Vision

Keypoint matching

Page 75: CS 1674: Intro to Computer Vision

Matching local features


• To generate candidate matches, find patches that have the most similar appearance (e.g., lowest feature Euclidean distance)

• Simplest approach: compare them all, take the closest (or closest k, or within a thresholded distance)

[Figure: a feature in Image 1 and its candidate matches in Image 2]

K. Grauman

Page 76: CS 1674: Intro to Computer Vision

Robust matching

• At what Euclidean distance value do we have a good match?

• To add robustness to matching, can consider the ratio: distance to best match / distance to second best match

• If low, first match looks good.

• If high, could be ambiguous match.


K. Grauman

Page 77: CS 1674: Intro to Computer Vision

Ratio: example

• Let q be the query from the first image, d1 be the closest match in the second image, and d2 be the second closest match

• Let dist(q, d1) and dist(q, d2) be the distances

• Let r = dist(q, d1) / dist(q, d2)

• What is the largest that r can be?

• What is the lowest that r can be?

• If r is 1, what do we know about the two distances?

• What about when r is 0.1?
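A sketch of matching with this ratio test; the 0.8 cutoff is an assumed tuning value, not from the slides:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """Keep a match only when r = dist(best) / dist(second best) is low.
    desc1, desc2: arrays of descriptors (desc2 needs >= 2 rows)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]              # best, second best
        if dists[j1] / dists[j2] < max_ratio:       # low r: unambiguous match
            matches.append((i, j1))
    return matches
```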

Page 78: CS 1674: Intro to Computer Vision

Indexing local features: Setup

• When we see close points in feature space, we have similar descriptors, which indicates similar local content.

[Figure: query image and database images mapped into the descriptor’s feature space]

K. Grauman

Page 79: CS 1674: Intro to Computer Vision

Image matching

Page 80: CS 1674: Intro to Computer Vision

• Summarize entire image based on its distribution (histogram) of word occurrences.

• Analogous to bag of words representation commonly used for documents.

Describing images w/ visual words

[Figure: example feature patches mapped to visual words, with a histogram of “times appearing” for each word]

K. Grauman

Page 81: CS 1674: Intro to Computer Vision

Bag of visual words: Two uses

1. Represent the image

2. Using that representation, look for similar images

3. Can also use BOW to compute an inverted index, to simplify application #2

Page 82: CS 1674: Intro to Computer Vision

Visual words: main idea

• Extract some local features from a number of images …

e.g., SIFT descriptor space: each point is 128-dimensional

D. Nister, CVPR 2006

Page 83: CS 1674: Intro to Computer Vision

Visual words: main idea

D. Nister, CVPR 2006

Page 84: CS 1674: Intro to Computer Vision

D. Nister, CVPR 2006

“Quantize” the space by grouping (clustering) the features.

Note: For now, we’ll treat clustering as a black box.

Page 85: CS 1674: Intro to Computer Vision

Inverted file index and bags of words similarity

1. (offline) Extract features in database images, cluster them to find words, make index
2. Extract words in query (extract features and map each to closest cluster center)
3. Use inverted file index to find frames relevant to query
4. For each relevant frame, rank them by comparing word counts (BOW) of query and frame

Adapted from K. Grauman
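The four steps in miniature; the data layout and the cosine scoring in step 4 are my assumptions (the slides leave the similarity measure open):

```python
import numpy as np
from collections import defaultdict

def bow_histogram(descriptors, centers):
    """Step 2: map each local descriptor to its closest cluster center
    (visual word) and count word occurrences."""
    d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(centers))

def build_inverted_index(db_bows):
    """Step 1 (offline): word id -> list of database images containing it."""
    index = defaultdict(list)
    for img_id, bow in enumerate(db_bows):
        for w in np.nonzero(bow)[0]:
            index[w].append(img_id)
    return index

def query(index, db_bows, q_bow):
    """Steps 3-4: shortlist frames via the index, then rank by BOW similarity."""
    candidates = {i for w in np.nonzero(q_bow)[0] for i in index[w]}
    score = lambda i: np.dot(q_bow, db_bows[i]) / (
        np.linalg.norm(q_bow) * np.linalg.norm(db_bows[i]) + 1e-12)
    return sorted(candidates, key=score, reverse=True)
```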

Page 86: CS 1674: Intro to Computer Vision

Scoring retrieval quality

[Figure: precision-recall curve, with precision and recall each ranging from 0 to 1]

Query

Database size: 10 images

Relevant (total): 5 images (e.g. images of Golden Gate)

Results (ordered):

precision = # returned relevant / # returned

recall = # returned relevant / # total relevant

Ondrej Chum
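For instance, if the first 4 results returned for this query contain 3 of the 5 relevant images, precision = 3/4 and recall = 3/5; returning the whole database drives recall to 1 while precision falls to 5/10.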

