Machine vision, Lecture Summary #11
STEREO VISION
The goal of stereo vision is to use two cameras to capture 3D scenes. There are two important problems in stereo vision:
• Correspondence problem: finding matching pairs (conjugate pairs) of points in the two images that represent the same point in the 3D scene.
• Reconstruction problem: obtaining the 3D structure from the images.
For a single pinhole camera we wrote:
$$u = \frac{\lambda x}{z} \tag{1}$$
$$v = \frac{\lambda y}{z} \tag{2}$$
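A simple camera geometry for stereo vision is shown in figure 1, from which we have:
$$u_r = \frac{\lambda (x - b)}{z} \tag{3}$$
$$u_\ell = \frac{\lambda x}{z} \tag{4}$$
$$v_r = \frac{\lambda y}{z} \tag{5}$$
$$v_\ell = \frac{\lambda y}{z} \tag{6}$$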
where
• $\lambda$ is the focal length, i.e., the distance from the image plane to the center of projection.
• $b$ is the baseline, i.e., the distance between the centers of the two cameras.
• We assume that the optical axes are aligned.
By subtraction, we get
$$u_\ell - u_r = \frac{\lambda b}{z} \tag{7}$$
and therefore
$$z = \frac{\lambda b}{u_\ell - u_r} \tag{8}$$
It is common to attach the origin to the left camera as shown in figure 1. We assume that both cameras are calibrated and that they are identical. We also assume that the two cameras have the same orientation. It is also possible to attach the origin to the midpoint between the two camera reference frames, in which case the equations are slightly different. Equation (8) gives the distance from the camera to the 3D point. Note that
• The difference $u_\ell - u_r$ is called the horizontal disparity, retinal disparity, or binocular disparity. To get a feel for the disparity, put one finger in front of you, close one eye, then open it and close the other eye.
• Distance z is inversely proportional to disparity.
• Disparity is proportional to the baseline.
• Accuracy of depth determination increases with increasing baseline.
• Images become less similar when the baseline increases.
• For a given baseline, the accuracy is better for closer objects than for farther objects.
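The statements about accuracy can be made quantitative by differentiating equation (8) with respect to the disparity. The sketch below introduces the symbols $d = u_\ell - u_r$ for the disparity and $\delta d$ for a small disparity error, neither of which is named in this summary, and it is only a first-order approximation:

$$z = \frac{\lambda b}{d}, \qquad \frac{\partial z}{\partial d} = -\frac{\lambda b}{d^2} = -\frac{z^2}{\lambda b}, \qquad |\delta z| \approx \frac{z^2}{\lambda b}\,|\delta d|$$

For a fixed disparity error, the depth error therefore shrinks as the baseline b grows and grows quadratically with the distance z.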
Example
Using equation (8), we can determine the x and y coordinates of point P as follows:
$$x = \frac{b\,u_\ell}{u_\ell - u_r} \tag{9}$$
$$y = \frac{b\,v_r}{u_\ell - u_r} \tag{10}$$
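Equations (8) to (10) can be tried out numerically. The following is a minimal sketch for two aligned, identical cameras; the focal length, baseline, and image coordinates are hypothetical values chosen for illustration, and the variable names are not taken from the lecture.

```matlab
% Triangulation for two aligned, identical cameras (equations (8)-(10)).
% All numerical values below are hypothetical and chosen only for illustration.
lambda = 800;          % focal length, in pixels (assumed)
b      = 120;          % baseline, in mm (assumed)
u_l = 260;  v_l = 90;  % left image coordinates, relative to the principal point
u_r = 20;   v_r = 90;  % right image coordinates (v_r = v_l for aligned cameras)

d = u_l - u_r;         % horizontal disparity
z = lambda * b / d;    % equation (8)
x = b * u_l / d;       % equation (9)
y = b * v_r / d;       % equation (10)
fprintf('x = %.1f mm, y = %.1f mm, z = %.1f mm\n', x, y, z);

% Consistency check: reproject the point with equations (4) and (3).
u_l_check = lambda * x / z;
u_r_check = lambda * (x - b) / z;
```

The reprojected values `u_l_check` and `u_r_check` should reproduce the input coordinates, which is a quick way to catch a sign error in the baseline term.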
Machine vision, spring 2019 FB
Fig. 1. Stereo vision geometry, Cl is the reference point.
Example
Consider images 2 and 3 obtained from a stereo vision system (in this problem we use subscripts 1 and 2 for the right and left images, respectively). The image size is 3456 by 4608 pixels. The pixel coordinates of the dot are r1 = 749, r2 = 4271, c1 = 420. The origin of the pixel coordinate system is the bottom left of the image. (u0, v0) is located in the middle of the image.
1) Deduce a formula to find the distance z to point P.
2) Calculate z when the intrinsic parameters are
$$\frac{\lambda}{s_x} = 3700 \tag{11}$$
$$\frac{\lambda}{s_y} = 3450 \tag{12}$$
$$u_0 = 2304 \tag{13}$$
$$v_0 = 1728 \tag{14}$$
The distance between the two cameras is 30 cm.
3) Find the (x, y) coordinates of point P.
The solution is shown as code below.
    % Camera parameters
    u0 = 2304
    v0 = 1728
    alphav = 3450
    alphau = 3700
    % Right image
    r1 = 749
    c1 = 420
    % Left image
    r2 = 4271
    c2 = 420
    b = 300                               % baseline in mm
    zpoint = (alphau*b)/(r2-r1)           % depth from the disparity r2-r1
    ypoint = zpoint*(c2-v0)/alphav
    ypoint = -ypoint                      % Transforming the origin
    xpoint = zpoint*(r2-u0)/alphau

Fig. 2. Left image, with point P marked
Fig. 3. Right image, with point P marked
The results are
$$z = 315.16\ \text{mm} \tag{15}$$
$$x = 167.54\ \text{mm} \tag{16}$$
$$y = 119.48\ \text{mm} \tag{17}$$
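One way to sanity-check these numbers is to reproject the reconstructed point into both images. The sketch below assumes the pixel model r = alpha_u x / z + u0, which is what the solution code appears to use implicitly; the variable names are illustrative and the check covers only the horizontal coordinate.

```matlab
alpha_u = 3700;  u0 = 2304;    % intrinsic parameters from (11) and (13)
b  = 300;                      % baseline in mm
r1 = 749;  r2 = 4271;          % column of the dot in the right and left images

z = alpha_u * b / (r2 - r1);   % depth from the disparity
x = z * (r2 - u0) / alpha_u;   % x in the left camera frame

% Reproject into both images; the results should return r2 and r1.
r_left  = alpha_u * x / z + u0;
r_right = alpha_u * (x - b) / z + u0;
fprintf('z = %.2f mm, x = %.2f mm\n', z, x);
fprintf('reprojected columns: left %.1f, right %.1f\n', r_left, r_right);
```

Recovering r1 from the right-camera projection confirms that the depth and the x coordinate are consistent with equation (3).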
Fig. 4. Relative geometry between two cameras
RELATIVE GEOMETRY BETWEEN TWO CAMERAS
The assumption of perfectly aligned cameras is violated in practice. Also, two identical cameras do not exist. In general, the first step in stereo vision is to determine the relationship between the two cameras. By relationship we mean the relative orientation and position (the cameras are no longer aligned). Consider the geometric representation of figure 4. Let $S_\ell = (x_\ell, y_\ell, z_\ell)^T$ be the position of point P in the left camera coordinate system and $S_r = (x_r, y_r, z_r)^T$ be the position of point P in the right camera coordinate system. The coordinates are related by the following equation
$$S_r = R S_\ell + T \tag{18}$$
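where R is a rotation matrix satisfying $R^T R = I$. System (18) can be written as
$$\begin{aligned}
r_{11} x_\ell + r_{12} y_\ell + r_{13} z_\ell + T_x &= x_r \\
r_{21} x_\ell + r_{22} y_\ell + r_{23} z_\ell + T_y &= y_r \\
r_{31} x_\ell + r_{32} y_\ell + r_{33} z_\ell + T_z &= z_r
\end{aligned} \tag{19}$$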
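As an illustration of equation (18), the sketch below maps a point from the left camera frame to the right camera frame. The rotation (5 degrees about the y axis), the translation, and the point itself are hypothetical values, not data from the lecture.

```matlab
% Hypothetical relative pose: right camera rotated by 5 degrees about the
% y axis and shifted by 300 mm along x (values chosen only for illustration).
theta = 5 * pi / 180;
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
T = [-300; 0; 0];

S_l = [150; 80; 2000];     % point P in the left camera frame, in mm (assumed)
S_r = R * S_l + T;         % equation (18)

orth_err = norm(R' * R - eye(3));   % should be close to zero for a rotation
S_l_back = R' * (S_r - T);          % inverting (18) recovers S_l
fprintf('S_r = (%.1f, %.1f, %.1f) mm\n', S_r);
```

Because R is orthogonal, the inverse transformation only needs the transpose of R, which the last lines use to map the point back to the left frame.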
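We do not know R or T, but we know the left and right image projections $u_r, v_r, u_\ell, v_\ell$. Knowing the focal length, it is possible to write
$$u_\ell = \frac{\lambda x_\ell}{z_\ell} \tag{20}$$
$$v_\ell = \frac{\lambda y_\ell}{z_\ell} \tag{21}$$
$$u_r = \frac{\lambda x_r}{z_r} \tag{22}$$
$$v_r = \frac{\lambda y_r}{z_r} \tag{23}$$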
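Now $z_\ell$ and $z_r$ are regarded as additional unknowns. After substituting $x_\ell, y_\ell, x_r, y_r$ by their expressions in terms of the focal length and the depth, we get
$$r_{11}\frac{u_\ell z_\ell}{\lambda} + r_{12}\frac{v_\ell z_\ell}{\lambda} + r_{13} z_\ell + T_x = \frac{u_r z_r}{\lambda} \tag{24}$$
$$r_{21}\frac{u_\ell z_\ell}{\lambda} + r_{22}\frac{v_\ell z_\ell}{\lambda} + r_{23} z_\ell + T_y = \frac{v_r z_r}{\lambda} \tag{25}$$
$$r_{31}\frac{u_\ell z_\ell}{\lambda} + r_{32}\frac{v_\ell z_\ell}{\lambda} + r_{33} z_\ell + T_z = z_r \tag{26}$$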
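and, after multiplying each equation by $\lambda / z_\ell$,
$$r_{11} u_\ell + r_{12} v_\ell + r_{13}\lambda + \frac{T_x \lambda}{z_\ell} = u_r \frac{z_r}{z_\ell} \tag{27}$$
$$r_{21} u_\ell + r_{22} v_\ell + r_{23}\lambda + \frac{T_y \lambda}{z_\ell} = v_r \frac{z_r}{z_\ell} \tag{28}$$
$$r_{31} u_\ell + r_{32} v_\ell + r_{33}\lambda + \frac{T_z \lambda}{z_\ell} = \lambda \frac{z_r}{z_\ell} \tag{29}$$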
There are three equations and fourteen unknowns ($r_{ij}$, $T_x$, $T_y$, $T_z$, $z_\ell$, $z_r$). Each additional point provides three more equations, but at the same time introduces two unknown variables: $z_\ell$, $z_r$. For one point, the count is
3 equations × 1 point against 12 unknowns + 2 unknowns × 1 point (30)
For N points, the system can be solved once the number of equations reaches the number of unknowns:
3 equations × N points = 12 unknowns + 2 unknowns × N points (31)
that is, 3N = 12 + 2N, which gives N = 12. Therefore, we need at least 12 points to solve.
COMPUTING THE DEPTH
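If we know the translation and the rotation matrix as well as the image coordinates $u_\ell, u_r, v_\ell, v_r$, we can calculate the depths $z_\ell$ and $z_r$:
$$\left[ r_{11}\frac{u_\ell}{\lambda} + r_{12}\frac{v_\ell}{\lambda} + r_{13} \right] z_\ell + T_x = u_r \frac{z_r}{\lambda} \tag{32}$$
$$\left[ r_{21}\frac{u_\ell}{\lambda} + r_{22}\frac{v_\ell}{\lambda} + r_{23} \right] z_\ell + T_y = v_r \frac{z_r}{\lambda} \tag{33}$$
$$\left[ r_{31}\frac{u_\ell}{\lambda} + r_{32}\frac{v_\ell}{\lambda} + r_{33} \right] z_\ell + T_z = z_r \tag{34}$$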
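Since we have two unknowns and three equations, we can use any two equations to solve for $z_\ell$ and $z_r$. In the particular case when the cameras have the same orientation, we have:
$$\left[ \frac{u_\ell}{\lambda} \right] z_\ell + T_x = z_r \left[ \frac{u_r}{\lambda} \right] \tag{35}$$
$$\left[ \frac{v_\ell}{\lambda} \right] z_\ell + T_y = z_r \left[ \frac{v_r}{\lambda} \right] \tag{36}$$
$$z_\ell + T_z = z_r \tag{37}$$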
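Equations (32) to (34) are linear in $z_\ell$ and $z_r$, so they can be stacked into a small linear system. The sketch below solves all three equations in the least-squares sense rather than picking two of them, which is a swapped-in choice and not the method stated above. The focal length, pose, and test point are hypothetical; the image measurements are synthesized from a known point so that the result can be checked.

```matlab
% Recover z_l and z_r from equations (32)-(34) by linear least squares.
% The pose, focal length, and test point are hypothetical values.
lambda = 800;                       % focal length in pixels (assumed)
theta  = 5 * pi / 180;
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
T = [-300; 0; 0];

% Synthesize image measurements from a known point.
S_l = [150; 80; 2000];
S_r = R * S_l + T;
u_l = lambda * S_l(1) / S_l(3);  v_l = lambda * S_l(2) / S_l(3);
u_r = lambda * S_r(1) / S_r(3);  v_r = lambda * S_r(2) / S_r(3);

% Write (32)-(34) as M * [z_l; z_r] = -T.
a = R * [u_l / lambda; v_l / lambda; 1];     % bracketed coefficients of z_l
M = [a, -[u_r / lambda; v_r / lambda; 1]];   % second column multiplies z_r
z = M \ (-T);                                % least-squares solution
z_l = z(1);  z_r = z(2);
fprintf('z_l = %.2f mm, z_r = %.2f mm\n', z_l, z_r);
```

With noise-free measurements the three equations are consistent, so the least-squares solution coincides with the one obtained from any two of them; with noisy measurements it averages the disagreement.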
EPIPOLAR GEOMETRY AND FUNDAMENTAL MATRIX
Consider the stereo vision geometry of figures 5 and 6. We want to solve the correspondence problem. Point P in 3D space is imaged in the left camera at $q_\ell$ and in the right camera at $q_r$. Rays $C_\ell q_\ell$ and $C_r q_r$ intersect at point P, so they both lie in the same plane. As a result, the image points $q_\ell$, $q_r$, the space point P, and the camera centers are coplanar. The plane defined by the three points ($C_\ell$, $C_r$, P) is called the epipolar plane and is denoted by Π.
The correspondence problem can be formulated as follows: knowing $q_\ell$, what are the constraints on the location of $q_r$? Point $q_r$ lies in the right image plane and, at the same time, in the plane Π. The intersection of the epipolar plane Π and the right image plane forms a line $\ell_r$, so the search for $q_r$ is reduced to line $\ell_r$. Line $\ell_r$ is called the epipolar line. Points $e_\ell$, $e_r$, where the line joining the two camera centers meets the image planes, are called the epipoles. Figure 6 shows the stereo vision geometry. Points $P_1$, $P_2$, $P_3$ have the same projection in the left image plane, but different projections in the right image. The epipolar constraint reduces the problem to a 1D search.
Example
The images in figures 7 and 8 are taken using a stereo vision system with b = 300 mm. The coordinates of the point of interest in the pixel coordinate system are (916, 686) in the right image and (97, 701) in the left image. The blue line is the right epipolar line and the red line is the left epipolar line.
THE ESSENTIAL MATRIX
Assume we have canonical cameras
$$A_\ell = A_r = I$$
Fig. 5. Stereo vision geometry
Fig. 6. Stereo vision geometry
Fig. 7. Examples of the epipolar lines (data cursor at X: 916, Y: 685.6)
Fig. 8. Examples of the epipolar lines (data cursor at X: 97, Y: 701.4)
where $A_\ell$ and $A_r$ are the intrinsic matrices of the left and right camera, respectively. We define the projection matrices as follows
$$M_\ell = \begin{pmatrix} I & 0 \end{pmatrix} \tag{38}$$
$$M_r = \begin{pmatrix} R & T \end{pmatrix} \tag{39}$$
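The essential matrix is defined as
$$E = T_\times R$$
where $T_\times$ is the translation vector written in matrix (skew-symmetric) form
$$T_\times = \begin{pmatrix} 0 & -T_z & T_y \\ T_z & 0 & -T_x \\ -T_y & T_x & 0 \end{pmatrix} \tag{40}$$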
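The matrix form (40) can be checked numerically. The sketch below builds it for a hypothetical pose, confirms that multiplying by it reproduces the cross product with T, and evaluates the constraint S_r' E S_l = 0 for one point. That constraint is a standard property of the essential matrix which follows from equation (18) but is not stated in this summary; all numerical values are assumptions.

```matlab
% Build the matrix form of T, equation (40), and the essential matrix E.
% R, T, and the test point are hypothetical values.
theta = 5 * pi / 180;
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
T = [-300; 20; 10];

Tx = [  0    -T(3)  T(2);
       T(3)   0    -T(1);
      -T(2)  T(1)   0   ];
E = Tx * R;

% Tx * v reproduces the cross product of T with any vector v.
v = [1; 2; 3];
cross_err = norm(Tx * v - cross(T, v));

% A point seen by both cameras satisfies S_r' * E * S_l = 0.
S_l = [150; 80; 2000];
S_r = R * S_l + T;
residual = (S_r' * E * S_l) / (norm(S_r) * norm(S_l) * norm(T));
fprintf('cross-product error %.2e, normalized residual %.2e\n', cross_err, residual);
```

The residual is divided by the vector norms so that its size can be compared with machine precision regardless of the units used for the point and the translation.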
THE FUNDAMENTAL MATRIX
The fundamental matrix is an algebraic representation of the epipolar geometry. It represents a mapping between the right and left image. In general, $A_\ell \neq I$ and $A_r \neq I$. The fundamental matrix is given by
$$F = [A_r^{-1}]^T\,[T_\times]\,[R]\,[A_\ell^{-1}] \tag{41}$$
The most important property of the fundamental matrix is summarized in the following theorem.
Theorem: The fundamental matrix satisfies the following condition: for any pair of corresponding image points $q_\ell$, $q_r$,
$$q_r^T F q_\ell = 0 \tag{42}$$
• $q_r$ lies on the epipolar line
$$\ell_r = F q_\ell \tag{43}$$
• $q_\ell$ lies on the epipolar line
$$\ell_\ell = F^T q_r \tag{44}$$
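The theorem can be checked on synthetic data. The sketch below assembles F from equation (41) and measures how far a pair of corresponding points lies from the epipolar lines of (43) and (44). The intrinsic matrices are assumed to have the common upper-triangular form with focal terms on the diagonal and the principal point in the last column, which this summary does not spell out; the pose, the second camera's intrinsics, and the 3D point are hypothetical.

```matlab
% Assemble F from equation (41) and test equations (42)-(44) on one point.
% Intrinsics, pose, and the 3D point are hypothetical values.
A_l = [3700 0 2304; 0 3450 1728; 0 0 1];
A_r = [3650 0 2290; 0 3400 1740; 0 0 1];
theta = 5 * pi / 180;
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
T = [-300; 20; 10];
Tx = [0 -T(3) T(2); T(3) 0 -T(1); -T(2) T(1) 0];

F = inv(A_r)' * Tx * R * inv(A_l);
F = F / norm(F);                  % the overall scale of F is arbitrary

S_l = [150; 80; 2000];
S_r = R * S_l + T;
q_l = A_l * S_l / S_l(3);         % homogeneous pixel coordinates, left image
q_r = A_r * S_r / S_r(3);         % homogeneous pixel coordinates, right image

l_r = F * q_l;                    % epipolar line in the right image, (43)
l_l = F' * q_r;                   % epipolar line in the left image, (44)

% Distance, in pixels, from each point to its epipolar line.
dist_r = abs(l_r' * q_r) / norm(l_r(1:2));
dist_l = abs(l_l' * q_l) / norm(l_l(1:2));
fprintf('point-to-line distances: %.2e and %.2e pixels\n', dist_r, dist_l);
```

Expressing the constraint as a point-to-line distance in pixels is often easier to interpret than the raw value of q_r' F q_l, which depends on the arbitrary scale of F.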
Equations (43) and (44) show that the fundamental matrix represents a mapping between a point and a line. The correspondence problem is formulated in terms of matrix F. Solving the correspondence problem means solving for matrix F, which is a 3 × 3 matrix of rank 2, unique up to a scale factor.
COMPUTING THE FUNDAMENTAL MATRIX: THE EIGHT-POINT ALGORITHM
Equation (41) gives the fundamental matrix in terms of the intrinsic and extrinsic parameters. As mentioned previously, each pair of corresponding points gives a scalar constraint as follows
$$[q_r]_i^T\, F\, [q_\ell]_i = 0 \tag{45}$$
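The eight-point algorithm proposes to use at least eight points to calculate matrix F. Equation (42) can be written as
$$\begin{pmatrix} u_r & v_r & 1 \end{pmatrix}
\begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix}
\begin{pmatrix} u_\ell \\ v_\ell \\ 1 \end{pmatrix} = 0 \tag{46}$$
Expanding the product gives a scalar equation that is linear in the entries of F:
$$\begin{pmatrix} u_r u_\ell & u_r v_\ell & u_r & v_r u_\ell & v_r v_\ell & v_r & u_\ell & v_\ell & 1 \end{pmatrix}
\begin{pmatrix} f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \end{pmatrix}^T = 0$$
At least eight points are needed to solve, because F has nine entries but is only determined up to scale. If we take N points, we obtain N constraints that can be put in matrix form as follows
$$W f = 0$$
where W is given by
$$W = \begin{pmatrix}
u_{r1}u_{\ell 1} & u_{r1}v_{\ell 1} & u_{r1} & v_{r1}u_{\ell 1} & v_{r1}v_{\ell 1} & v_{r1} & u_{\ell 1} & v_{\ell 1} & 1 \\
u_{r2}u_{\ell 2} & u_{r2}v_{\ell 2} & u_{r2} & v_{r2}u_{\ell 2} & v_{r2}v_{\ell 2} & v_{r2} & u_{\ell 2} & v_{\ell 2} & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_{rN}u_{\ell N} & u_{rN}v_{\ell N} & u_{rN} & v_{rN}u_{\ell N} & v_{rN}v_{\ell N} & v_{rN} & u_{\ell N} & v_{\ell N} & 1
\end{pmatrix} \tag{47}$$
and
$$f = \begin{pmatrix} f_{11} & f_{12} & f_{13} & f_{21} & f_{22} & f_{23} & f_{31} & f_{32} & f_{33} \end{pmatrix}^T \tag{48}$$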
One possible way to solve is by using the singular value decomposition (SVD). The solution generally consists of two steps:
• Step 1: Linear solution. Use the SVD to obtain a first estimate of matrix F by solving Wf = 0. This estimate may not satisfy the rank requirement for the fundamental matrix. The following commands can be used:

    [U, S, V] = svd(W);        % W is the N-by-9 matrix of (47)
    f = V(:, end);             % right singular vector of the smallest singular value
    F = reshape(f, [3 3])';    % fill F row by row, following the order of f in (48)

• Step 2: Constraint enforcement. Find the closest approximation to F that has rank 2. Again the SVD is used, as follows:

    [U, S, V] = svd(F);
    S(3, 3) = 0;               % zero the smallest singular value
    F = U*S*V';
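The two steps can be exercised end to end on synthetic, noise-free correspondences. The sketch below is only an illustration under stated assumptions: it uses canonical cameras with unit focal length so that image coordinates are of order one, a hypothetical pose, and twelve hypothetical 3D points. The columns of W are ordered as obtained by expanding equation (46) with f listed row by row.

```matlab
% Eight-point algorithm on synthetic, noise-free data.
% Canonical cameras (A_l = A_r = I, lambda = 1) are assumed; the pose and
% the 3D points are hypothetical values.
theta = 5 * pi / 180;
R = [ cos(theta) 0 sin(theta);
      0          1 0;
     -sin(theta) 0 cos(theta)];
T = [-0.3; 0.02; 0.01];

% Twelve 3D points in the left camera frame (one per column).
P_l = [-2.0  1.5  0.2  1.8 -0.6  0.8 -1.3  0.4 -0.9  2.1  0.7 -1.5;
        0.3 -1.8  1.9 -0.2  1.6  0.1 -0.7  1.5 -1.4  0.8 -0.9  1.2;
        4.0  5.0  6.0  7.0  4.5  5.5  6.5  8.0  9.0  5.2  7.3  6.1];
N   = size(P_l, 2);
P_r = R * P_l + repmat(T, 1, N);

% Perspective projection with lambda = 1.
ul = (P_l(1, :) ./ P_l(3, :))';  vl = (P_l(2, :) ./ P_l(3, :))';
ur = (P_r(1, :) ./ P_r(3, :))';  vr = (P_r(2, :) ./ P_r(3, :))';

% One row of W per correspondence, for f = (f11, f12, ..., f33)'.
W = [ur.*ul, ur.*vl, ur, vr.*ul, vr.*vl, vr, ul, vl, ones(N, 1)];

% Step 1: linear solution.
[~, ~, V] = svd(W);
f = V(:, end);
F = reshape(f, [3 3])';

% Step 2: enforce rank 2 by zeroing the smallest singular value of F.
[U, S, V] = svd(F);
S(3, 3) = 0;
F = U * S * V';

% Residuals of the epipolar constraint (42) for every correspondence.
q_l = [ul, vl, ones(N, 1)]';
q_r = [ur, vr, ones(N, 1)]';
residuals = sum(q_r .* (F * q_l), 1);
fprintf('largest |q_r'' F q_l| = %.2e\n', max(abs(residuals)));
```

With real pixel coordinates the entries of W span several orders of magnitude, so the numerical behavior may be noticeably worse than in this well-scaled example.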
Example
We want to find the epipolar lines for the pair of images given in figures 7 and 8. The desired points are
$$p_1 = (916, 686) \tag{49}$$
$$p_2 = (97, 701) \tag{50}$$
For 8 points, matrix W is given by:
$$W = \begin{pmatrix}
86165 & 635807 & 907 & 64410 & 475278 & 678 & 95 & 701 & 1 \\
135184 & 833000 & 952 & 122262 & 753375 & 861 & 142 & 875 & 1 \\
153576 & 951588 & 972 & 153102 & 948651 & 969 & 158 & 979 & 1 \\
85975 & 941200 & 905 & 97850 & 1071200 & 1030 & 95 & 1040 & 1 \\
288706 & 1076086 & 1193 & 216590 & 807290 & 895 & 242 & 902 & 1 \\
589050 & 1198890 & 1386 & 363800 & 740440 & 856 & 425 & 865 & 1 \\
872448 & 1307136 & 1536 & 478824 & 717393 & 843 & 568 & 851 & 1 \\
262080 & 1228500 & 1170 & 233632 & 1095150 & 1043 & 224 & 1050 & 1
\end{pmatrix} \tag{52}$$
The fundamental matrix is given by
$$F = \begin{pmatrix}
-0.0000 & -0.0000 & 0.0018 \\
0.0000 & 0.0000 & -0.0036 \\
-0.0009 & 0.0022 & 1.0000
\end{pmatrix} \tag{53}$$
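The epipolar lines are given by
$$\ell_r = \begin{pmatrix} 0.0005 \\ -0.0026 \\ 1.7367 \end{pmatrix}, \qquad
\ell_\ell = \begin{pmatrix} -0.0003 \\ 0.0023 \\ -1.3524 \end{pmatrix} \tag{54}$$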
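A line given by three coefficients (a, b, c) contains the image points (u, v) with a u + b v + c = 0, so it can be drawn by sampling u and solving for v. The sketch below does this for the rounded coefficients of the first line in (54); because only two significant digits survive in some coefficients, the sampled coordinates are approximate. The sampling range is an assumption based on the horizontal axis visible in figures 7 and 8.

```matlab
% Sample points on an epipolar line l = (a, b, c)', i.e. a*u + b*v + c = 0.
% The coefficients are the rounded values printed in (54), so the
% resulting coordinates are only approximate.
l_r = [0.0005; -0.0026; 1.7367];

u = linspace(0, 2500, 6);               % assumed horizontal range, in pixels
v = -(l_r(1) * u + l_r(3)) / l_r(2);    % valid when the line is not vertical

% Every sampled point satisfies the line equation up to rounding.
line_residual = max(abs(l_r' * [u; v; ones(size(u))]));
disp([u; v]);
% To overlay the line on a displayed image: hold on; plot(u, v, 'b-');
```

For a nearly vertical line the second coefficient is close to zero, and it is safer to sample v and solve for u instead.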