Computer Vision: sparse and dense matching between two images
Martin de La Gorce
February 2015
An important problem
The problem of matching subparts of two different images is a core computer vision problem and has many applications including:
stereo reconstruction
motion estimation
tracking
medical image registration
An important problem
Panorama created by stitching images:
An important problem
Depth reconstructed from disparity of matching points:
An important problem
Object recognition by matching images:
Local vs Global
We can classify image matching algorithms into:
local methods that look for matchings only in a limited region of the image
global methods that look for potential matchings through the entire image
Sparse vs Dense
We can classify image matching algorithms into:
sparse matching, which consists in finding reliable matches only for a subset of key feature points of the image corresponding to salient points
dense matching, which consists in matching all the points in the overlapping region of the two images, on a dense grid in the input
Sparse vs Dense
For efficiency, most methods are either
global but sparse
dense but local
We can combine the two approaches to get a dense and global matching method by
starting with a global and sparse method and then
densifying the matching by propagation to neighboring pixels
The aperture problem
It is difficult to estimate the motion of a line along the line direction if we do not see its extremities:
Sparse matching addresses the problem by considering only reliable points (corners)
Dense matching addresses the problem by propagating the information from reliable points to the surrounding regions, by imposing some regularity or rigidity of the transformation across neighboring pixels
A naive brute force patch matching
We measure the difference between two patches of size N by N from image A and image B, centered at locations (i, j) and (k, l) respectively, using the sum of squared differences (SSD) between corresponding pixels:

D(i, j, k, l) = Σ_{m=−r}^{r} Σ_{n=−r}^{r} (A(i + m, j + n) − B(k + m, l + n))²

with r = (N − 1)/2 the "radius" of the patch.

Denoting P_A(i, j) and P_B(k, l) the two N by N matrices of pixel intensities corresponding to the patches extracted respectively from A around (i, j) and from B around (k, l), we get

D(i, j, k, l) = ‖P_A(i, j) − P_B(k, l)‖²
A naive brute force patch matching
For each patch from image A we look for the most similar patch in image B and obtain a nearest-neighbor field M : ℕ² → ℕ²

M(i, j) = argmin_{(k,l)} D(i, j, k, l)

Denoting (H, W) the size of the images, this method's complexity is of order H²W²N², which is generally prohibitive.

Accelerations are possible using approximations and kd-trees¹

¹Computing Nearest-Neighbor Fields via Propagation-Assisted KD-Trees. Kaiming He, Jian Sun. CVPR 2012
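The naive search above can be written directly in code. A minimal NumPy sketch follows (the function name `brute_force_match` and the boundary handling are illustrative choices); it mainly serves to make the O(H²W²N²) cost concrete:

```python
import numpy as np

def brute_force_match(A, B, r):
    """For each patch center in A, find the center of the most similar
    patch in B under the SSD distance. Cost is O(H^2 W^2 N^2)."""
    H, W = A.shape
    M = np.zeros((H, W, 2), dtype=int)  # nearest-neighbor field
    for i in range(r, H - r):
        for j in range(r, W - r):
            PA = A[i - r:i + r + 1, j - r:j + r + 1]
            best, best_kl = np.inf, (i, j)
            for k in range(r, H - r):
                for l in range(r, W - r):
                    PB = B[k - r:k + r + 1, l - r:l + r + 1]
                    D = np.sum((PA - PB) ** 2)  # SSD between the patches
                    if D < best:
                        best, best_kl = D, (k, l)
            M[i, j] = best_kl
    return M
```

On a pair of images related by a pure translation, the recovered field is the translation itself wherever the patches do not touch the border.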
A naive brute force patch matching
Limitations of brute-force patch matching:
computationally expensive
does not cope with changes in scale and local rotation. We could search across various scales and rotation angles, but that would increase the computational cost even more
many wrong matches for edges due to the aperture problem
many wrong matches in uniform regions
sensitive to viewing direction and pose
sensitive to illumination conditions
A naive brute force patch matching
Solutions
We can get robustness to illumination changes using better patch similarity measures
We can partially address the computational cost problem by extracting all patches in image B and putting them in a structure that allows fast nearest-neighbor search (kd-tree, FLANN)
We can address the aperture problem by using only a subset of the patches that are likely to have reliable matches, selected with a corner detector. We sacrifice density but gain robustness and speed
normalized correlation
In order to be robust to changes in contrast we can use the normalized cross-correlation between the two patches. Let X and Y denote the two patches of size N by N:

NCC(X, Y) = Σ_i Σ_j (X[i, j] − μ_X)(Y[i, j] − μ_Y) / (‖X − μ_X‖ × ‖Y − μ_Y‖)   (1)

with μ_X and μ_Y the mean intensities in X and Y:

μ_X = (1/N²) Σ_{ij} X[i, j],   μ_Y = (1/N²) Σ_{ij} Y[i, j]

and

‖X − μ_X‖ = √( Σ_i Σ_j (X[i, j] − μ_X)² )   (2)

‖Y − μ_Y‖ = √( Σ_i Σ_j (Y[i, j] − μ_Y)² )   (3)
normalized correlation
We have a list of patches X₁, . . . , X_m that we want to compare to patches Y₁, . . . , Y_n. In order to reduce computation we first compute the centered patches X̄₁, . . . , X̄_m and Ȳ₁, . . . , Ȳ_n:

X̄_k[i, j] = X_k[i, j] − μ_{X_k}
Ȳ_l[i, j] = Y_l[i, j] − μ_{Y_l}

then the centered and normalized patches:

X̂_k[i, j] = X̄_k[i, j] / ‖X̄_k‖
Ŷ_l[i, j] = Ȳ_l[i, j] / ‖Ȳ_l‖

We then get

NCC(X_k, Y_l) = Σ_{ij} X̂_k[i, j] Ŷ_l[i, j]
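This precomputation trick reduces every NCC evaluation to a dot product, so all-pairs scores become one matrix product. A minimal sketch, assuming the patches are given as a NumPy array and are not constant (the function names are illustrative):

```python
import numpy as np

def normalize_patches(patches):
    """Center each patch and scale it to unit norm, so that the NCC
    between two patches reduces to a dot product. Assumes no patch
    is constant (otherwise the norm is zero)."""
    P = patches.reshape(len(patches), -1).astype(float)
    P -= P.mean(axis=1, keepdims=True)             # subtract mean intensity
    P /= np.linalg.norm(P, axis=1, keepdims=True)  # scale to unit norm
    return P

def ncc_table(Xs, Ys):
    """All-pairs NCC scores as a single matrix product."""
    return normalize_patches(Xs) @ normalize_patches(Ys).T
```

By construction the score is invariant to an affine change of intensity a·X + b with a > 0, which is the contrast robustness claimed above.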
Corner detection
A patch is unlikely to be reliably matched if it looks like itssurrounding patches.
flat region: no change
edge: change in one direction
corner: change in both directions
Corner detection
A patch is unlikely to be reliably matched if it looks like itssurrounding patches.
The approach proposed by Moravec tests each pixel in the image to see if a corner is present, by testing whether a patch centered on the pixel is similar to nearby, largely overlapping patches.
The similarity is measured by taking the sum of squared differences (SSD) between the two patches. A lower number indicates more similarity.
Corner detection
Given an image I and a pixel location (i, j) we compute a surface that measures the self-similarity for small displacements (u, v):

E_ij(u, v) = Σ_x Σ_y w(x − i, y − j) (I(x + u, y + v) − I(x, y))²

with w a binary window function w(x, y) = [max(|x|, |y|) < c]

[figure: input image and the surface E_ij(u, v)]
Moravec’s Corner detection
We measure the dissimilarity of a patch with its 8 neighbors using the score

S_ij = min_{(u,v)∈T} E_ij(u, v)

with T the four tested shifts T = {(1, 0), (1, 1), (0, 1), (−1, 1)}

All patches with a low score S in either image A or image B are discarded from the matching, as they would probably match several patches in the other image
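The score at a single pixel can be sketched as below, assuming a binary square window of half-size c (the function name, the default c, and the lack of border handling are illustrative simplifications):

```python
import numpy as np

def moravec_score(I, i, j, c=2):
    """Self-dissimilarity of the patch around (i, j): the minimum SSD
    over the four tested shifts, with a binary window of half-size c.
    Assumes (i, j) is far enough from the image border."""
    shifts = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    patch = I[i - c:i + c + 1, j - c:j + c + 1]
    return min(
        np.sum((I[i + u - c:i + u + c + 1, j + v - c:j + v + c + 1] - patch) ** 2)
        for (u, v) in shifts
    )
```

A flat region scores 0 (it matches its neighbors perfectly), while an isolated bright pixel scores high for every shift.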
Moravec’s Corner detection
This approach from Moravec has several limitations:
noisy response due to the binary window function
only a discrete set of shifts, at every 45 degrees, is considered
Harris Corners detector
To address these limitations Harris proposed to:

use a Gaussian window function to get less noisy responses

w(x, y) = exp( −(x² + y²) / (2σ²) )

consider all small shifts using a Taylor expansion

E_ij(u, v) = Σ_{x,y} w(x − i, y − j) (I(x + u, y + v) − I(x, y))²
           ≈ Σ_{x,y} w(x − i, y − j) (I_x u + I_y v)²

with I_x the derivative of the image in the x direction and I_y the derivative in the y direction
Harris Corners detector
E_ij(u, v) ≈ Σ_{x,y} w(x − i, y − j) (I_x u + I_y v)²   (4)
           ≈ Σ_{x,y} w(x − i, y − j) (I_x² u² + I_y² v² + 2 I_x I_y u v)   (5)

We can rewrite this in a matrix form

E_ij(u, v) ≈ [u, v] M_ij [u, v]ᵀ

With

M_ij = Σ_{x,y} w(x − i, y − j) [ I_x²     I_x I_y ]
                               [ I_x I_y  I_y²    ]
Harris Corners detector
E_ij(u, v) ≈ [u, v] M_ij [u, v]ᵀ

E_ij(u, v) is a positive quadratic form and the level set {(u, v) | E_ij(u, v) = 1} is an ellipse with
main axis directions aligned with the eigenvectors of M_ij
main axis lengths inversely proportional to the square roots of the two eigenvalues of M_ij
Harris
We want E_ij(u, v) to be large over the entire unit circle C = {(u, v) | u² + v² = 1}

Using the Taylor expansion approximation we have

min_{(u,v)∈C} E_ij(u, v) ≈ min_{(u,v)∈C} [u, v] M_ij [u, v]ᵀ = λ_min

with λ_min the smallest eigenvalue of the matrix M_ij.

We can compute the eigenvalues of M_ij and use a threshold κ, keeping points with λ_min > κ, to get the Kanade-Tomasi corner detector.
Harris
Computing the eigenvalues requires a square root. Instead, Harris proposed to use a corner score R defined by

R_ij = det(M_ij) − k (trace(M_ij))²

with k ≈ 0.06. Denoting λ₁ and λ₂ the two eigenvalues we get

R = λ₁ λ₂ − k (λ₁ + λ₂)²
[figure: contour plot of the corner score R, contour levels 0.000 to 0.700]
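The whole score map can be computed with a few filtered images. A minimal NumPy sketch, using np.gradient for the image derivatives and a separable Gaussian blur as the window w (the function names, σ and k values are illustrative choices, not the reference implementation):

```python
import numpy as np

def _gaussian_blur(a, sigma):
    """Separable Gaussian smoothing implemented with np.convolve."""
    r = int(3 * sigma)
    g = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    a = np.apply_along_axis(np.convolve, 0, a, g, mode="same")
    a = np.apply_along_axis(np.convolve, 1, a, g, mode="same")
    return a

def harris_response(I, sigma=1.5, k=0.06):
    """Corner score R = det(M) - k * trace(M)^2 at every pixel, with M
    the Gaussian-weighted matrix of products of image derivatives."""
    Iy, Ix = np.gradient(I.astype(float))
    Sxx = _gaussian_blur(Ix * Ix, sigma)   # windowed I_x^2
    Syy = _gaussian_blur(Iy * Iy, sigma)   # windowed I_y^2
    Sxy = _gaussian_blur(Ix * Iy, sigma)   # windowed I_x I_y
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

On a synthetic bright square, R is positive at the corners, negative along the edges, and zero in flat regions, matching the classification of the next slide.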
Harris
Classification of pixels given the two eigenvalues of M_ij:
Harris
input images:
Harris
Harris score image with value R_ij at location (i, j)
Harris
thresholded Harris scores R_ij > τ at location (i, j)
Harris
local maxima of the Harris scores
Harris
detected corners:
Harris
[figure: matches without filtering vs with filtering]
Harris
Once we have found a set of corners in both images A and B, we compare each pair of points using the sum of squared differences or the normalized cross-correlation

Given a point in A we keep a matching point in B only if the matching score is much better than the score of the second-best matching patch in B. We keep the matching (i, j) → (k, l) only if

∀(m, n) ≠ (k, l) : D(i, j, k, l) < 0.8 × D(i, j, m, n)
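This second-best ratio test is easy to express on a precomputed distance table. A sketch, assuming D[a, b] holds the distance between corner a of A and corner b of B (the function name and table layout are illustrative):

```python
import numpy as np

def ratio_test_matches(D, ratio=0.8):
    """Keep a match a -> b only if its distance beats `ratio` times
    the distance of the second-best candidate in B."""
    matches = []
    for a in range(D.shape[0]):
        order = np.argsort(D[a])         # candidates sorted by distance
        best, second = order[0], order[1]
        if D[a, best] < ratio * D[a, second]:
            matches.append((a, int(best)))
    return matches
```

An ambiguous corner, whose two best candidates have nearly equal distances, is simply dropped.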
Harris: filtering matches
Multiscale
In many situations, detecting features at the finest scale possible may not be appropriate. For example, when matching images with little high-frequency detail (e.g., clouds), fine-scale features may not exist. One solution to the problem is to extract features at a variety of scales, e.g., by performing the same operations at multiple resolutions in a pyramid and then matching features at the same level. This kind of approach is suitable when the images being matched do not undergo large scale changes.
Descriptors
If we relax the assumption that we have only a translation, by adding changes in scale and rotation, the search space for patch matching becomes large
Instead of using the raw patch pixel intensities we create local descriptors that are scale invariant and rotation invariant
Rotation invariance
We compute the histogram of the gradient directions
keep all directions within 80% of the dominant one
when comparing two patches we test only the set of rotations that align these main directions
Scale invariance
Instead of the Harris corner detector we need a salient point detector that works at various scales and provides a characteristic scale for each detected point
dense matching
Suppose we have two images A and B; we look for the displacement field u : ℝ² → ℝ², u(x, y) = (u_x(x, y), u_y(x, y)), such that:

Forward formulation:
A(x, y) = B(x + u_x(x, y), y + u_y(x, y))

Backward formulation:
B(x, y) = A(x − u_x(x, y), y − u_y(x, y))

[figure: image and the displacement fields u_x, u_y]
dense matching as a minimization
Because of the noise and the discretization there is no exact solution; instead we minimize the cost function

E(u) = ∫_Ω ‖B(x, y) − A(x − u_x(x, y), y − u_y(x, y))‖² dx dy   (6)

The problem is ill-posed: because of the aperture problem there are many equally good solutions

We need either to parameterize u (for example considering only rigid transformations) or to impose some smoothness on u
dense matching
Adding a smoothness term, we minimize

E(u) = ∫_Ω ‖B(x, y) − A(x − u_x(x, y), y − u_y(x, y))‖² dx dy + ∫_Ω (‖∇u_x‖² + ‖∇u_y‖²) dx dy   (7)

The smoothness term will diffuse the motion information from the corners to the edges and uniform regions

We will minimize this cost function with respect to u using a Gauss-Newton approach
1D case
For simplicity we do the derivation of the method in 1D in a discrete setting (A and B continuous and u discretized into a vector)

E(u) = E_data(u) + E_smooth(u)   (8)

with

E_data(u) = Σ_x ‖B(x) − A(x − u(x))‖²   (9)

E_smooth(u) = Σ_x (u(x + 1) − u(x))²   (10)
1D case
E_data(u) = Σ_x ‖B(x) − A(x − u(x))‖²

Following the Gauss-Newton method, we obtain a linear least-squares approximation of E_data(u) around a current estimate u_t by linearizing A(x − u(x)) around u_t(x):

A(x − u(x)) ≈ A(x − u_t(x)) − (∂A/∂x)(x − u_t(x)) × (u(x) − u_t(x))   (11)

E_data(u) ≈ Σ_x ‖B(x) − A(x − u_t(x)) + (∂A/∂x)(x − u_t(x)) × (u(x) − u_t(x))‖²

E_data(u) ≈ ‖r_t − M_t(u − u_t)‖²

with M_t the diagonal matrix M_t(x, x) = −(∂A/∂x)(x − u_t(x)) and r_t the vector of residuals with r_t(x) = B(x) − A(x − u_t(x))
1D case
E_smooth(u) rewrites as E_smooth(u) = ‖Du‖² with D the following Toeplitz matrix of size (n − 1) × n:

D = [ −1   1   0   ⋯   0 ]
    [  0  −1   1   ⋯   0 ]
    [  ⋮        ⋱   ⋱  ⋮ ]
    [  0   ⋯   0  −1   1 ]

We get

E(u) ≈ ‖M_t(u − u_t) − r_t‖² + ‖Du‖²

At each iteration the new displacement field u_{t+1} is estimated as the minimum of this least-squares problem:

u_{t+1} = (DᵀD + M_tᵀ M_t)⁻¹ M_tᵀ (M_t u_t + r_t)

We iterate until convergence
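The 1D iteration above fits in a few lines of NumPy. A sketch under the stated assumptions; evaluating A(x − u(x)) and ∂A/∂x by linear interpolation (np.interp) is an implementation choice, and the function name is illustrative:

```python
import numpy as np

def gauss_newton_1d(A, B, n_iters=100):
    """Minimize sum_x (B(x) - A(x - u(x)))^2 + sum_x (u(x+1) - u(x))^2
    with the Gauss-Newton iteration derived above."""
    n = len(A)
    x = np.arange(n, dtype=float)
    dA = np.gradient(A)                        # derivative of A on the grid
    D = np.diff(np.eye(n), axis=0)             # (n-1) x n difference matrix
    u = np.zeros(n)                            # initial estimate u_0 = 0
    for _ in range(n_iters):
        r = B - np.interp(x - u, x, A)         # residuals r_t
        M = np.diag(-np.interp(x - u, x, dA))  # M_t(x, x) = -dA/dx(x - u_t(x))
        # u_{t+1} = (D^T D + M^T M)^{-1} M^T (M u_t + r_t)
        u = np.linalg.solve(D.T @ D + M.T @ M, M.T @ (M @ u + r))
    return u
```

On a smooth bump shifted by a small amount, the recovered field is close to the true constant shift: the constant lies in the null space of D, so the smoothness term does not penalize it.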
2D case
In the 2D case, denoting u_xv, u_yv, u_xvt, u_yvt the vectors obtained from u_x, u_y, u_xt and u_yt by taking elements in row-major order, we get

E_data(u) ≈ ‖r_t − M_xt(u_xv − u_xvt) − M_yt(u_yv − u_yvt)‖²

with

M_xt the diagonal matrix whose diagonal corresponds to the derivatives −(∂A/∂x)(x − u_xt(x, y), y − u_yt(x, y)) in row-major order

M_yt the diagonal matrix whose diagonal corresponds to the derivatives −(∂A/∂y)(x − u_xt(x, y), y − u_yt(x, y)) in row-major order

r_t the vector of residuals containing the residuals B(x, y) − A(x − u_xt(x, y), y − u_yt(x, y)) in row-major order
2D case
In the 2D case we get

E_smooth(u) = Σ_{x,y} (u_x(x + 1, y) − u_x(x, y))²
            + Σ_{x,y} (u_y(x + 1, y) − u_y(x, y))²
            + Σ_{x,y} (u_x(x, y + 1) − u_x(x, y))²
            + Σ_{x,y} (u_y(x, y + 1) − u_y(x, y))²

Denoting u_xv and u_yv the two vectors obtained from u_x and u_y by taking elements in row-major order, we define sparse matrices D_x and D_y to get

E_smooth(u) = ‖D_x u_xv‖² + ‖D_y u_xv‖² + ‖D_x u_yv‖² + ‖D_y u_yv‖²
2D case
We need to take

D_x = I_H ⊗ D_W
D_y = D_H ⊗ I_W

with (H, W) the size of the image, ⊗ the Kronecker product of two matrices, I_k the identity matrix of size k × k and D_k the bidiagonal matrix of size (k − 1) × k with D_{i,i} = −1 and D_{i,i+1} = 1
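These Kronecker formulas can be checked numerically. In the sketch below, `diff_matrix` and `make_Dx_Dy` are illustrative helper names, and dense matrices are used only for the sake of the check (in practice D_x and D_y would be stored as sparse matrices):

```python
import numpy as np

def diff_matrix(k):
    """Bidiagonal (k-1) x k matrix D_k with -1 on the diagonal and 1
    on the superdiagonal."""
    return np.diff(np.eye(k), axis=0)

def make_Dx_Dy(H, W):
    """D_x = I_H kron D_W and D_y = D_H kron I_W for a row-major
    flattened H x W field (x indexes columns, y indexes rows)."""
    Dx = np.kron(np.eye(H), diff_matrix(W))  # differences along x
    Dy = np.kron(diff_matrix(H), np.eye(W))  # differences along y
    return Dx, Dy
```

Applying D_x to a flattened field reproduces the horizontal differences u(x + 1, y) − u(x, y), and D_y the vertical ones.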
2D case
E(u) = E_data(u) + E_smooth(u)

E_data(u) ≈ ‖r_t − M_xt(u_xv − u_xvt) − M_yt(u_yv − u_yvt)‖²

E_smooth(u) = ‖D_x u_xv‖² + ‖D_y u_xv‖² + ‖D_x u_yv‖² + ‖D_y u_yv‖²

We can minimize alternately with respect to u_xv and u_yv:

for t odd:
u_xvt+1 = (D_xᵀ D_x + D_yᵀ D_y + M_xtᵀ M_xt)⁻¹ M_xtᵀ (M_xt u_xvt + r_t)

for t even:
u_yvt+1 = (D_xᵀ D_x + D_yᵀ D_y + M_ytᵀ M_yt)⁻¹ M_ytᵀ (M_yt u_yvt + r_t)
2D case
We can use a single iteration to update both u_x and u_y using

E_data(u) ≈ ‖r_t − M_t(u_v − u_vt)‖²

E_smooth(u) = ‖D_x2 u_v‖² + ‖D_y2 u_v‖²

with u_v = [u_xv ; u_yv], M_t = [M_xt , M_yt], D_x2 = I_2 ⊗ D_x, D_y2 = I_2 ⊗ D_y

We can minimize with respect to u_v using

u_vt+1 = (D_x2ᵀ D_x2 + D_y2ᵀ D_y2 + M_tᵀ M_t)⁻¹ M_tᵀ (M_t u_vt + r_t)
Optical Flow
A well-known method to estimate a displacement field for small inter-frame displacements in a video sequence is called optical flow

A particular form of optical flow computation introduced by Horn and Schunck corresponds to a single iteration of the Gauss-Newton method we just presented, where the linear system is solved using the Jacobi method

Instead of deriving the optical flow the classic way, I chose to present a Gauss-Newton optimization interpretation of this problem that I believe allows a more general understanding of the method and its possible extensions (TV regularization to allow discontinuities, iterative refinement, etc.)
stereo
A special case of displacement field estimation is the stereo reconstruction problem

From a sparse set of correspondences between the two images it is possible to find a pair of homographic transformations to deform each of the two images such that pairs of corresponding points are on the same horizontal line of pixels; this is called rectification
stereo
After rectification the problem then consists in estimating a purely horizontal displacement field (u_x in our previous formulation) referred to as the disparity map

Efficient dense and global methods based on graph cuts or other discrete optimization methods exist to estimate the disparity map
Links and references
Darko Zikic, Ali Kamen, and Nassir Navab. Revisiting Horn and Schunck: Interpretation as Gauss-Newton Optimisation. BMVC 2010. http://ar.in.tum.de/pub/zikic2010revisiting/zikic2010revisiting.pdf

Horn-Schunck Optical Flow with a Multi-Scale Strategy (with demo): http://www.ipol.im/pub/art/2013/20/