

International Journal of Computer Vision 22(3), 261–289 (1997). © 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.

Self-Calibration of a Moving Camera from Point Correspondences and Fundamental Matrices*

Q.-T. LUONG
SRI International, 333 Ravenswood Av, Menlo Park, CA 94025, USA

[email protected]

O.D. FAUGERAS
I.N.R.I.A., 2004 route des Lucioles, B.P. 93, 06902 Sophia-Antipolis, France

[email protected]

Received (Bolles Office) 1993; (Kanade Office) December 8, 1995; Accepted October 24, 1995

Abstract. We address the problem of estimating three-dimensional motion, and structure from motion, with an uncalibrated moving camera. We show that point correspondences between three images, and the fundamental matrices computed from these point correspondences, are sufficient to recover the internal orientation of the camera (its calibration), the motion parameters, and to compute coherent perspective projection matrices which enable us to reconstruct 3-D structure up to a similarity. In contrast with other methods, no calibration object with a known 3-D shape is needed, and no limitations are put upon the unknown motions to be performed or the parameters to be recovered, as long as they define a projective camera.

The theory of the method, which is based on the constraint that the observed points are part of a static scene, thus allowing us to link the intrinsic parameters and the fundamental matrix via the absolute conic, is first detailed. Several algorithms are then presented, and their performances compared by means of extensive simulations and illustrated by several experiments with real images.

Keywords: camera calibration, projective geometry, Euclidean geometry, Kruppa equations

1. Introduction and Motivations

The problem of estimating the three-dimensional motion of a camera from a number of token matches has received a lot of attention in the last fifteen years. Having detected and matched such tokens as points or lines in two or more images, researchers have developed methods for estimating the three-dimensional camera displacement, assuming a moving camera and a static object. This problem is equivalent to the problem of estimating the three-dimensional motion of an object observed by a static camera.

*This work was partially supported by the EC under Esprit grant 5390, Real Time Gaze Control.

The camera is modeled as a pinhole and its internal parameters are supposed to be known (the pinhole model and the internal parameters are defined later). This is the full perspective case. Other researchers have assumed less general image formation models, such as the orthographic model, for example Ullman (1979). In this article we will assume the most general case of the full perspective image formation model.

When matching points, two views are sufficient, and the computation of the motion is usually based upon the estimation of a matrix called the Essential, or E-matrix, after Longuet-Higgins (1981), who first published a linear algorithm (called the eight-point algorithm because it requires eight point correspondences over two frames) for estimating this matrix and recovering the camera displacement from it, given a number of point matches.


The properties of the E-matrix are now well understood after the work of Faugeras and Maybank (1990), Huang and Faugeras (1989), and Maybank (1990). This matrix must satisfy a number of algebraic constraints which are not taken into account by the eight-point algorithm. Taking these constraints into account forces the use of nonlinear methods such as the five-point algorithm of Faugeras and Maybank (1990).

The internal parameters of the cameras are traditionally determined by observing a known calibration object (see Tsai (1989) for a review), prior to the execution of the vision task. However, there are several applications for which a calibration object is not available, or its use is too cumbersome. The thrust of this paper is to extend the previous results to the case where the internal parameters of the camera are unknown, still assuming the full perspective model. Our guiding light will be projective geometry, which we found to be extremely useful both from the theoretical point of view, in that it has allowed us to express the geometry of the problem in a much simpler way, and from the practical point of view, in that this formal simplicity can be transported to algorithmic simplicity.

We will show that if we take three snapshots of the environment, each time establishing sufficiently many point correspondences between the three pairs of images, we can a) recover the epipolar geometry of each pair of images, b) recover the intrinsic parameters of the camera (which we assume not to be changing during the motion), and c) recover the motion of the camera (using already published algorithms). The focus of the paper is on point b), point a) being described elsewhere.

Section 2 will be dedicated to the geometric and algebraic modeling of the problem and to a description of the relations of the present approach to previous ones. In particular, we will tie the intrinsic parameters to the image of the absolute conic, define the fundamental matrix, which is the analog of the essential matrix in the uncalibrated case, and relate it to the intrinsic parameters. We will also define the Kruppa equations, from which we will be able to estimate the intrinsic parameters, and relate them to the work on the essential matrix. Section 3 will build upon the theoretical results of Section 2 and describe a method for recovering the intrinsic parameters of the camera and therefore its motion, as detailed in Section 4. Examples with real images are given in Section 5. Finally, in Section 6, we conclude and compare our work to that of others.

2. Background and Theory

In this section we lay the groundwork for the solution of the problem of estimating the motion of a camera with unknown intrinsic parameters. First we consider the case of a single camera and introduce the camera model and the intrinsic parameters. We make heavy use of simple projective geometry. We show that even for a single camera, projective geometry offers a rich description of the geometry of the problem through the introduction of the absolute conic, which is fundamental in motion analysis. We then consider the case of two cameras and describe their geometric relations. We show that these relations can be summarized very simply by the epipolar correspondence (geometric viewpoint) or the fundamental matrix (algebraic viewpoint). We then describe the relationship between the fundamental matrix and the intrinsic parameters of the camera through various complementary approaches.

2.1. The Pinhole Model, Intrinsic and Extrinsic Parameters, the Absolute Conic

The camera model which we consider is the pinhole model. In this model, the camera performs a perspective projection of an object point M onto a pixel m in the retinal plane through the optical center C (see Fig. 1). The optical axis is the line going through C and perpendicular to the retinal plane. It pierces that plane at the principal point c. If we consider an orthonormal system of coordinates in the retinal plane, centered at c, say (c, x_c, y_c), we can define a three-dimensional orthonormal system of coordinates centered at the optical center C, with two axes of coordinates parallel to the retinal ones and the third one parallel to the optical axis (C, X_C, Y_C, Z_C). In these two systems of coordinates, the relationship between the coordinates of m, image of M, is particularly simple:

\[
x_c = -f\,\frac{X_C}{Z_C}, \qquad y_c = -f\,\frac{Y_C}{Z_C}
\]

This relationship is nonlinear, but if we write it using the homogeneous coordinates of m and M, it becomes linear:

\[
\begin{bmatrix} T_C Z_C x_c \\ T_C Z_C y_c \\ T_C Z_C \end{bmatrix}
=
\begin{bmatrix} -f & 0 & 0 & 0 \\ 0 & -f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} T_C X_C \\ T_C Y_C \\ T_C Z_C \\ T_C \end{bmatrix}
\tag{1}
\]


Figure 1. The general projective camera model.

In this equation, Z_C x_c, Z_C y_c and Z_C should be considered as the projective coordinates X_c, Y_c, Z_c of the pixel m, and T_C X_C, T_C Y_C, T_C Z_C, T_C as the projective coordinates of the point M. We verify on this equation that the projective coordinates are defined up to a scale factor, since multiplying them by an arbitrary nonzero factor does not change the Euclidean coordinates of either m or M. The main property of this camera model is thus that the relationship between the world coordinates and the pixel coordinates is linear projective.
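As a quick numerical check of Eq. (1), the following sketch (Python with NumPy; the focal length and point below are our own illustrative values) projects a point given in the camera frame with the 3 × 4 matrix and verifies that the result matches the nonlinear formulas:

```python
import numpy as np

f = 0.012  # focal length in meters (illustrative value)

# Perspective projection written as the 3x4 matrix of Eq. (1)
P1 = np.array([[-f, 0.0, 0.0, 0.0],
               [0.0, -f, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])

M = np.array([0.3, -0.1, 2.0, 1.0])  # homogeneous point (X_C, Y_C, Z_C, T_C)

m = P1 @ M                     # projective retinal coordinates
x_c, y_c = m[0] / m[2], m[1] / m[2]

# Nonlinear form: x_c = -f X_C / Z_C, y_c = -f Y_C / Z_C
assert np.isclose(x_c, -f * M[0] / M[2])
assert np.isclose(y_c, -f * M[1] / M[2])
```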

This property is independent of the choice of the coordinate systems in the retinal plane or in the three-dimensional space. In particular, we have indicated in Fig. 1 another world coordinate system (O, X, Y, Z) and another retinal coordinate system (o, u, v). The coordinate system (O, X, Y, Z) is related to the coordinate system (C, X_C, Y_C, Z_C) by a rigid displacement described by the rotation matrix R and the translation vector t. If we think of (O, X, Y, Z) as the laboratory coordinate system, the displacement describes the pose of the camera in the laboratory. The parameters describing the displacement are called the extrinsic camera parameters.

The coordinate system (o, u, v) is related to the coordinate system (c, x_c, y_c) by a change of scale of magnitude k_u and k_v along the u- and v-axes, respectively, followed by a translation [u_0, v_0]^T; in addition, the second coordinate axis is rotated by θ around o. The parameters relating the two retinal coordinate systems do not depend on the pose of the camera and are called the camera intrinsic parameters. The coordinate system (o, u, v) is the coordinate system that we use when we address the pixels in an image. It is usually centered at the upper left hand corner of the image, which is usually not the point c, and the pixels are usually not square: they have aspect ratios depending on the actual size of the photosensitive cells of the camera as well as on the idiosyncrasies of the acquisition system. For most of the imaging situations which are commonly encountered, the retinal axes are orthogonal, and therefore the angle θ is π/2. However, in order to be totally general, this angle has to be considered, for several reasons. First, the projection matrices, being 3 × 4 matrices defined up to a scale factor, depend on 11 free parameters. Since a displacement is described by 6 parameters (3 for the rotation, 3 for the translation), there are 5 intrinsic parameters, and therefore, in addition to the scale factors and the coordinates of the principal point, one additional intrinsic parameter is needed to ensure the existence of the decomposition of any projection matrix into extrinsic and intrinsic parameters. Second, from a practical standpoint, there are a couple of special imaging situations which cannot be properly described without considering θ. These situations include not only the case of pixel grids which have a non-orthogonal arrangement, but also images taken with a bellows camera, or enlarged with a tilted easel (in both of these cases the optical axis may not be orthogonal to the image plane), or the case when pictures of pictures are considered. Even if the value of θ is known, including its recovery in the computation scheme provides an easy quality check on the results.

This camera model is essentially linear and ignores nonlinear effects such as those caused by lens distortion. If such a correction is needed, it can be performed in a way compatible with the projective linear framework which is presented in this paper, as demonstrated by the work of Brand et al. (1993). Moreover, we believe that for several computer vision applications, such a correction is not even needed. The self-consistency of the results that we have obtained, as well as the quantitative reconstruction results, show that the projective linear model is indeed appropriate.


The cameras used in the experiments were normally priced, off-the-shelf cameras, described in more detail later. Most of the distortion problems can be avoided either with a reasonable choice of optics (care has to be taken with some extreme wide angles and some zoom lenses), or by using mostly points which lie in the center of the field of view. There are also a number of high-quality lenses, some of them quite cheap, for which the distortion is negligible. In our own experiments with off-the-shelf cameras, as well as in those conducted by Brand et al. (1993), it was found that the correction to be applied in order to obtain a projective linear model was subpixel. Unless feature detectors reach the same precision, which is not always guaranteed even with state-of-the-art algorithms, the additional correction is not very useful.

The fact that no nonlinear camera distortion is considered allows us to use the powerful tools of projective geometry. Projective geometry is emerging as an attractive framework for computer vision (Mundy and Zisserman, 1992). In this paper, we assume that the reader is familiar with some elementary projective geometry. Such material can be found in classical mathematics textbooks such as (Coxeter, 1987; Garner, 1981; Semple and Kneebone, 1979), but also in the computer vision literature, where it is presented in chapters of recent books (Faugeras, 1993; Kanatani, 1992; Mundy and Zisserman, 1992) and in articles (Kanatani, 1991; Maybank and Faugeras, 1992).

Using Eq. (1) and the basic properties of changes of coordinate systems, we can express the relation between the image coordinates in the (o, u, v) coordinate system and the three-dimensional coordinates in the (O, X, Y, Z) coordinate system by the following equation:

\[
\begin{bmatrix} U \\ V \\ W \end{bmatrix}
= \mathbf{A}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\mathbf{D}
\begin{bmatrix} \mathcal{X} \\ \mathcal{Y} \\ \mathcal{Z} \\ \mathcal{T} \end{bmatrix}
= \mathbf{P}
\begin{bmatrix} \mathcal{X} \\ \mathcal{Y} \\ \mathcal{Z} \\ \mathcal{T} \end{bmatrix}
\tag{2}
\]

where U, V, and W are retinal projective coordinates; X, Y, Z, and T are projective world coordinates; A is a 3 × 3 matrix describing the change of retinal coordinate system; and D is a 4 × 4 matrix describing the change of world coordinate system. The 3 × 4 matrix P is the perspective projection matrix, which relates 3-D world projective coordinates and 2-D retinal projective coordinates. Except for the points at infinity in the retina, for which W = 0, the usual retinal coordinates u, v are related to the retinal projective coordinates by

\[
u = \frac{U}{W}, \qquad v = \frac{V}{W}
\]

The points at infinity in the retinal plane can be considered as the images of the 3-D points in the focal plane of the camera, i.e., the plane going through C and parallel to the retinal plane.

Similarly, except for the points at infinity in 3-D space, for which T = 0, the usual space coordinates X, Y, and Z are related to the projective world coordinates by

\[
X = \frac{\mathcal{X}}{\mathcal{T}}, \qquad Y = \frac{\mathcal{Y}}{\mathcal{T}}, \qquad Z = \frac{\mathcal{Z}}{\mathcal{T}}.
\]

The matrix A can be expressed as the following function of the intrinsic parameters and the focal length f:

\[
\mathbf{A} =
\begin{bmatrix}
-f k_u & f k_u \cot\theta & u_0 \\
0 & -\dfrac{f k_v}{\sin\theta} & v_0 \\
0 & 0 & 1
\end{bmatrix}
\tag{3}
\]

Note that the focal length and the minus sign appearing in (1) have been "transferred" to the matrix A, which depends on the products f k_u and f k_v; this says that we cannot discriminate between a change of focal length and a change of units on the pixel axes. For this reason, we introduce the parameters α_u = −f k_u and α_v = −f k_v. If θ = π/2, Eq. (3) takes the simpler form:

\[
\mathbf{A} =
\begin{bmatrix}
\alpha_u & 0 & u_0 \\
0 & \alpha_v & v_0 \\
0 & 0 & 1
\end{bmatrix}
\tag{4}
\]

The matrix D depends on the 6 extrinsic parameters, three defining the rotation and three defining the translation, and has the form:

\[
\mathbf{D} =
\begin{bmatrix}
\mathbf{R} & \mathbf{t} \\
\mathbf{0}_3^T & 1
\end{bmatrix}
\tag{5}
\]
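To make Eqs. (2)-(5) concrete, here is a minimal sketch that assembles A from assumed intrinsic values, D from a rotation and a translation, and composes them into the projection matrix P; all numerical values are placeholders, not the paper's.

```python
import numpy as np

def intrinsic_matrix(alpha_u, alpha_v, u0, v0, theta=np.pi / 2):
    """Matrix A of Eq. (3), written with alpha_u = -f k_u, alpha_v = -f k_v.
    For theta = pi/2 it reduces (numerically) to the simpler Eq. (4)."""
    return np.array([[alpha_u, -alpha_u / np.tan(theta), u0],
                     [0.0,      alpha_v / np.sin(theta), v0],
                     [0.0,      0.0,                     1.0]])

def displacement_matrix(R, t):
    """Matrix D of Eq. (5): a rigid displacement in homogeneous form."""
    D = np.eye(4)
    D[:3, :3] = R
    D[:3, 3] = t
    return D

# Illustrative values only
A = intrinsic_matrix(640.0, 943.0, 246.0, 255.0)
D = displacement_matrix(np.eye(3), np.array([0.1, 0.0, 0.5]))

# P = A [I | 0] D, the perspective projection matrix of Eq. (2)
P = A @ np.hstack([np.eye(3), np.zeros((3, 1))]) @ D
```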

There is an interesting and important relationship between the camera intrinsic parameters and the absolute conic; it is central to the subject of this paper, and we study it now. The absolute conic was used in (Faugeras and Maybank, 1990) to compute the number of solutions to the problem of estimating the motion of a camera from five point correspondences in two views, and in (Maybank and Faugeras, 1992) to study the problem of camera calibration.


The absolute conic Ω lies in the plane at infinity, of equation T = 0, and its own equation is

\[
\mathcal{X}^2 + \mathcal{Y}^2 + \mathcal{Z}^2 = 0
\tag{6}
\]

All points on that conic have complex coordinates. In fact, if we define x = X/Z and y = Y/Z, the equation can be rewritten x² + y² = −1, which shows that it represents a circle of radius i = √−1. Even though this seems a little bit far-fetched, this conic is closely related to the problem of camera calibration and motion estimation, because it has the fundamental property of being invariant under rigid displacements, a fact already known to Cayley. The proof of this can be found in (Faugeras, 1993; Faugeras and Maybank, 1990). Let us examine the consequences of this invariance. Since the absolute conic is invariant under rigid displacements, its image by the camera, which is also a conic with only complex points, does not depend on the pose of the camera. Indeed, when we move the camera, the absolute conic does not change, since it is invariant under rigid displacements, and hence its image by the camera remains the same. Therefore, its equation in the retinal coordinate system (o, u, v) does not depend on the extrinsic parameters and depends only on the intrinsic parameters. By taking the identity displacement as extrinsic parameters, it is easily seen (Faugeras, 1993; Luong, 1992) that the matrix defining the equation of the image of the absolute conic in the retinal coordinate system (o, u, v) is:

\[
\mathbf{B} = \mathbf{A}^{-T}\mathbf{A}^{-1}
\tag{7}
\]

One of the important ideas which has emerged from our previous work (Faugeras et al., 1992; Faugeras and Maybank, 1990; Maybank and Faugeras, 1992), and which will also become apparent in this paper, is that the absolute conic can be used as a calibration pattern for the camera. This calibration pattern has the nice properties of always being present and of being free.

2.2. The Epipolar Correspondence, the Fundamental Matrix and the Essential Matrix

In the previous section, we discussed the geometry of one camera. We are now going to introduce a second camera and study the new geometric properties of a set of two cameras. The main new geometric property is known in computer vision as the epipolar constraint, and can readily be understood by looking at Fig. 2.

Figure 2. The epipolar geometry.

Let C (resp. C′) be the optical center of the first camera (resp. the second). The line ⟨C, C′⟩ projects to a point e (resp. e′) in the first retinal plane R (resp. in the second retinal plane R′). The points e, e′ are the epipoles. The lines through e in the first image and the lines through e′ in the second image are the epipolar lines. The epipolar constraint is well-known in stereovision: for each point m in the first retina, its corresponding point m′ lies on its epipolar line l′_m.

A 2D point, as well as a 2D line, is represented in projective geometry by a vector of three coordinates. Two proportional vectors represent the same point (or line). The point m = [m₁, m₂, m₃]ᵀ belongs to the line l = [l₁, l₂, l₃]ᵀ if, and only if, lᵀm = 0. The key observation is that the relationship between the retinal coordinates of a point m and its corresponding epipolar line l′_m is projective linear. The fundamental matrix describes this correspondence:

\[
\mathbf{l}'_m = \mathbf{F}\mathbf{m}
\]

The epipolar constraint then has a very simple expression: since the point m′ corresponding to m belongs to the line l′_m by definition, it follows that

\[
\mathbf{m}'^{T}\mathbf{F}\mathbf{m} = 0
\tag{8}
\]

The epipoles e and e′ are special points which verify the following relations:

\[
\mathbf{F}\mathbf{e} = \mathbf{F}^{T}\mathbf{e}' = 0
\tag{9}
\]


They imply that the rank of F is less than or equal to 2, and in general it is equal to 2. Since the matrix is defined up to a scale factor, it depends upon seven independent parameters.
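Equations (8) and (9) translate directly into code: the epipolar residual of a correspondence is m′ᵀFm, and the epipoles are the right and left null vectors of F, which can be read off its SVD. A sketch (the helper names are ours):

```python
import numpy as np

def epipolar_residual(F, m, m_prime):
    """Algebraic residual m'^T F m of Eq. (8); zero for a perfect match."""
    return float(m_prime @ F @ m)

def epipoles(F):
    """Eq. (9): F e = 0 and F^T e' = 0, so e and e' are the right and
    left singular vectors of F associated with its zero singular value."""
    U, s, Vt = np.linalg.svd(F)
    e = Vt[-1]          # right null vector: F e = 0
    e_prime = U[:, -1]  # left null vector: F^T e' = 0
    return e, e_prime
```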

Equation (8) is the analog in the uncalibrated case of the so-called Longuet-Higgins equation (Longuet-Higgins, 1981). Indeed, in the case of calibrated cameras, the 2D projective coordinates of a point m give the 3-D direction of the optical ray Cm (see Fig. 2), which is of course not the case with retinal (uncalibrated) coordinates. If the motion between the two positions of the camera is given by the rotation matrix R and the translation vector t, and if m and m′ are corresponding points, then the coplanarity constraint relating Cm′, t, and Cm is written as:

\[
\mathbf{m}' \cdot (\mathbf{t} \times \mathbf{R}\mathbf{m}) \equiv \mathbf{m}'^{T}\mathbf{E}\mathbf{m} = 0
\tag{10}
\]

The matrix E, which is the product of an orthogonal matrix and an antisymmetric matrix, is called an essential matrix. Because of the depth/speed ambiguity, E depends on five parameters only, i.e., the translation vector is defined up to a scale factor.

It can be seen that the two Eqs. (8) and (10) are equivalent, and that we have the relation:

\[
\mathbf{F} = \mathbf{A}^{-T}\mathbf{E}\mathbf{A}^{-1}
\]

Unlike the essential matrix, which is characterized by the two constraints found by Huang and Faugeras (1989), namely the nullity of the determinant and the equality of the two non-zero singular values, the only constraint on the fundamental matrix is that it is of rank two.

2.3. The Rigidity Constraint, Kruppa Equations and the Intrinsic Parameters

We now provide a direct link between the fundamental matrix (a projective quantity) and the intrinsic parameters (Euclidean quantities). Several formulations are presented and proved to be equivalent, in spite of the intriguing discrepancy in the number of constraints that each of them generates.

Algebraic Formulations of the Rigidity Constraints Using the Essential Matrix. In the case of two different cameras, the transformation between the two retinal coordinate systems is a general linear projective transformation of P³, depending on 15 parameters.

This transformation can be decomposed into two (possibly similar) changes of retinal coordinates and one rigid displacement. This decomposition is very far from being unique. However, not all choices of the intrinsic parameters are possible. The constraints on the intrinsic parameters are obtained by expressing the rigidity of this underlying displacement, i.e., the fact that for any fundamental matrix F, one can find intrinsic parameter matrices A and A′ such that A′ᵀFA is an essential matrix. We have seen that only the seven parameters of the fundamental matrix are available to describe the geometric relationship between two views. The five parameters of the essential matrix are needed to describe the rigid underlying displacement between the associated normalized coordinate systems; thus we can see that at most two independent constraints are available for the determination of intrinsic parameters from the fundamental matrix.

A first set of approaches to expressing the rigidity constraint involves the essential matrix:

\[
\mathbf{E} = \mathbf{A}'^{T}\mathbf{F}\mathbf{A}
\tag{11}
\]

The rigidity of the motion yielding the fundamental matrix F with intrinsic parameters A and A′ is equivalent to the Huang and Faugeras conditions expressing the fact that E, defined by (11), is an essential matrix:

\[
\det(\mathbf{E}) = 0, \qquad
f(\mathbf{E}) = \frac{1}{2}\,\mathrm{trace}^2\!\left(\mathbf{E}\mathbf{E}^T\right) - \mathrm{trace}\!\left(\left(\mathbf{E}\mathbf{E}^T\right)^2\right) = 0
\tag{12}
\]

As we have det(F) = 0, the first condition is automatically satisfied and does not yield any valuable constraint in our framework; thus we are left with only one polynomial constraint, the second condition.
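In floating point, the Huang and Faugeras characterization (12) is most conveniently checked through the singular values of E: together, the two conditions say the singular values are (σ, σ, 0). A sketch (the tolerance value is our assumption):

```python
import numpy as np

def is_essential(E, tol=1e-8):
    """Check the Huang-Faugeras characterization of Eq. (12): E is an
    essential matrix iff its singular values are (s, s, 0)."""
    s = np.linalg.svd(E, compute_uv=False)
    scale = max(s[0], 1.0)
    return s[2] < tol * scale and abs(s[0] - s[1]) < tol * scale

def huang_faugeras_residual(E):
    """The polynomial f(E) of Eq. (12); zero for an essential matrix."""
    S = E @ E.T
    return 0.5 * np.trace(S) ** 2 - np.trace(S @ S)
```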

A second expression of the rigidity constraints has been presented by Trivedi (1988). If E is an essential matrix, the symmetric matrix S = EEᵀ, which a priori has six independent entries, depends only on the three components of t:

\[
\mathbf{E}\mathbf{E}^T = -[\mathbf{t}]_\times^2 =
\begin{bmatrix}
t_2^2 + t_3^2 & -t_1 t_2 & -t_1 t_3 \\
-t_2 t_1 & t_3^2 + t_1^2 & -t_2 t_3 \\
-t_3 t_1 & -t_3 t_2 & t_1^2 + t_2^2
\end{bmatrix}
\tag{13}
\]

The matrix S = EEᵀ thus has a special structure, in which the three diagonal and the three off-diagonal entries are related by the three relations designated by (T_ij), 1 ≤ i < j ≤ 3:

\[
4 S_{ij}^2 - \left(\mathrm{trace}(\mathbf{S}) - 2 S_{ii}\right)\left(\mathrm{trace}(\mathbf{S}) - 2 S_{jj}\right) = 0 \qquad (T_{ij})
\]
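The structure of Eq. (13) and the relations (T_ij) can be checked numerically for any essential matrix built as E = [t]×R; the rotation below is a random one obtained from a QR factorization, purely for illustration:

```python
import numpy as np

def cross_matrix(t):
    """Antisymmetric matrix [t]_x such that [t]_x y = t x y."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.linalg.det(Q)          # a proper rotation
t = rng.standard_normal(3)
E = cross_matrix(t) @ R           # an essential matrix, Eq. (10)
S = E @ E.T                       # Eq. (13): S = -[t]_x^2

for i in range(3):
    for j in range(i + 1, 3):
        lhs = 4.0 * S[i, j] ** 2
        rhs = (np.trace(S) - 2 * S[i, i]) * (np.trace(S) - 2 * S[j, j])
        assert np.isclose(lhs, rhs)   # relation (T_ij)
```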


Trivedi has shown that, in the case he considered, where the only intrinsic parameters were the coordinates of the principal point, his three polynomial constraints reduce in fact to a tautology and two independent polynomial constraints, provided that det(E) = 0. An examination of his proof shows that this fact remains true for a general model of the intrinsic parameters. Thus we are left with two polynomial constraints, in addition to the nullity of the determinant.

We show in Appendix A that, in spite of the apparent discrepancy in the number of equations, these approaches to expressing the rigidity are equivalent. However, the two independent Trivedi equations which are equivalent to the second Huang and Faugeras condition are not simpler than it, contrary to what one might expect. They all yield algebraic constraints which are polynomials of degree 8 in the coefficients of A and A′ (the intrinsic parameters), and thus are not suitable for practical computation, or even theoretical study. This is why we are going to consider a geometrical interpretation of the rigidity constraint, which yields low-order polynomial constraints.

The Kruppa Equations: A Geometric Interpretation of the Rigidity Constraint. The Kruppa equations (Kruppa, 1913) are obtained from a geometric interpretation of the rigidity constraints. They were first introduced in the field of computer vision by Faugeras and Maybank for the study of motion (Faugeras and Maybank, 1990), and then to develop a theory of self-calibration (Maybank and Faugeras, 1992). In this exposition, we will return to the original formulation, which does not assume that the two cameras are identical; this is useful to prove the equivalence of the Kruppa equations and the Huang and Faugeras constraint.

Let us consider an epipolar plane Π which is tangent to Ω. Then the epipolar line l is tangent to ω, the projection of Ω into the first image, and the epipolar line l′ is tangent to the projection ω′ of Ω into the second image. It follows that the two tangents to ω from the epipole e correspond under the epipolar transformation to the two tangents to ω′ from the epipole e′, as illustrated by Fig. 3.

If B is the matrix of ω, the image of the absolute conic in the first camera, then the matrix of the dual conic of ω (formed by the tangents to ω) is the dual matrix (matrix of cofactors) of B: K = B*. Since B* is proportional to the inverse of B, and since we are dealing with matrices defined up to a scale factor, we can take, using (7):

\[
\mathbf{K} = \mathbf{A}\mathbf{A}^{T}
\tag{14}
\]

Figure 3. The absolute conic and epipolar transformation.

This makes explicit the fact that the matrix A can be obtained uniquely from the Cholesky decomposition (Golub and Van Loan, 1989) of the symmetric matrix K when this matrix is positive definite, which is always the case since ω has only complex points. A remark which will be useful later is that when the pixel grid is orthogonal (θ = π/2), then:
\[
K_{13} K_{23} - K_{12} K_{33} = 0
\tag{15}
\]
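The Cholesky step deserves a remark: NumPy returns a lower-triangular factor, whereas the A of Eq. (3) is upper triangular, so one can factor the row/column-reversed matrix and reverse back. A sketch, under the assumption that K is positive definite and normalized so that K₃₃ = 1:

```python
import numpy as np

def intrinsics_from_kruppa_matrix(K):
    """Recover the upper-triangular A of Eq. (3) from K = A A^T (Eq. (14)).
    np.linalg.cholesky produces a lower-triangular factor, so we factor the
    row/column-reversed matrix and reverse back."""
    J = np.eye(3)[::-1]                # exchange (anti-diagonal) matrix
    L = np.linalg.cholesky(J @ K @ J)  # lower triangular, L L^T = J K J
    A = J @ L @ J                      # upper triangular, A A^T = K
    return A / A[2, 2]                 # fix the projective scale

# Round trip on an assumed calibration matrix
A_true = np.array([[640.0, 0.0, 246.0],
                   [0.0, 943.0, 255.0],
                   [0.0, 0.0, 1.0]])
K = A_true @ A_true.T
assert np.allclose(intrinsics_from_kruppa_matrix(K), A_true)
```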

The epipolar line l = ⟨e, y⟩ is tangent to ω iff:

\[
(\mathbf{e} \times \mathbf{y})^{T}\mathbf{K}(\mathbf{e} \times \mathbf{y}) = 0
\tag{16}
\]

The epipolar line corresponding to the point y is Fy, and it is tangent to ω′ iff:

\[
\mathbf{y}^{T}\mathbf{F}^{T}\mathbf{K}'\mathbf{F}\mathbf{y} = 0
\tag{17}
\]

Writing that (16) and (17) are equivalent yields the so-called Kruppa equations.

One possible way to do so is to take y = (1, τ, 0)ᵀ, in the case where the epipoles are at finite distance. The relations (16) (resp. (17)) then take the form P₁(τ) = k₀ + k₁τ + k₂τ² = 0 (resp. P₂(τ) = k′₀ + k′₁τ + k′₂τ² = 0), and the Kruppa equations can be written as three proportionality conditions between these two polynomials, of which only two are independent:

\[
k_2 k_1' - k_2' k_1 = 0, \qquad
k_0 k_1' - k_0' k_1 = 0, \qquad
k_0 k_2' - k_0' k_2 = 0
\tag{18}
\]
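For concreteness, here is a sketch of how the coefficients k_i and k′_i and two of the Eqs. (18) could be computed from F, the first epipole e, and candidate matrices K and K′: with y = (1, τ, 0)ᵀ, the line e × y in (16) and the line Fy in (17) are both affine in τ, so expanding the quadratic forms gives the quadratics P₁ and P₂. The helper names are ours.

```python
import numpy as np

def kruppa_coefficients(F, e, K, K_prime):
    """Coefficients of P1 and P2 (quadratics in tau) from Eqs. (16)-(17)
    with y = (1, tau, 0)^T. Returns (k, k_prime), each (k0, k1, k2)."""
    y0, y1 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    # (16): the line e x y = a + tau * b must be tangent to omega
    a, b = np.cross(e, y0), np.cross(e, y1)
    k = np.array([a @ K @ a, a @ K @ b + b @ K @ a, b @ K @ b])
    # (17): the line F y = c + tau * d must be tangent to omega'
    c, d = F @ y0, F @ y1
    kp = np.array([c @ K_prime @ c,
                   c @ K_prime @ d + d @ K_prime @ c,
                   d @ K_prime @ d])
    return k, kp

def kruppa_equations(F, e, K, K_prime):
    """Two independent proportionality conditions of Eq. (18)."""
    k, kp = kruppa_coefficients(F, e, K, K_prime)
    return np.array([k[2] * kp[1] - kp[2] * k[1],
                     k[0] * kp[1] - kp[0] * k[1]])
```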

We have shown that the Kruppa equations are equivalent to the Huang and Faugeras constraint expressed using the fundamental matrix and the intrinsic parameters. As seen previously, the null-determinant constraint is readily satisfied; thus we only have to show that the set of Kruppa Eqs. (18) is equivalent to the second constraint, which is done in Appendix B. This is a priori not intuitive, since there are two Kruppa equations.

The nice thing about the Kruppa equations is that, since the coefficients k_i (resp. k′_i) depend linearly on the entries of K (resp. K′), they are only of degree two in these entries, thus providing a much simpler expression of the rigidity constraint than the one obtained by the purely algebraic methods described at the beginning of this section.

3. Using the Kruppa Equations to Compute the Intrinsic Parameters

In this section, we examine how to use in practice the two Kruppa equations relating fundamental matrices and intrinsic parameters which were derived in the previous section. We first examine how many movements or images are necessary, as well as the cases of degeneracy. Two types of numerical methods for solving the equations are then considered, and simulations are presented to assess the strengths and weaknesses of each method.

3.1. Using Three Displacements of a Moving Camera

A Moving Camera. In earlier work, Trivedi (1988) considered the problem of computing only the coordinates of the principal point of each camera, that is, of solving the self-calibration problem for the restricted model of intrinsic parameters:

\[
\mathbf{A} = \begin{bmatrix} 1 & 0 & u_0 \\ 0 & 1 & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\mathbf{A}' = \begin{bmatrix} 1 & 0 & u_0' \\ 0 & 1 & v_0' \\ 0 & 0 & 1 \end{bmatrix}
\]

using the three equations (T_ij) mentioned previously. The initial idea was that if there were three such independent equations, then it would be possible to find a solution as soon as the number of cameras was greater than or equal to three. But Trivedi pointed out that the three equations reduce to two independent equations and a tautology, and thus that there are not enough constraints for the problem to be solved.

More recently, Hartley (1992) brought a partial solution using a simplified camera model in which the only unknown is the focal length, thus taking as a model for the intrinsic parameters:

\[
\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k \end{bmatrix}, \qquad
\mathbf{A}' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & k' \end{bmatrix}
\]

He exhibits an algorithm to factor the fundamental matrix F as $\mathbf{A}'^{-T}\mathbf{E}\mathbf{A}^{-1}$, which under his assumption also depends on seven parameters: the two focal lengths and the five motion parameters.

If we do not make an additional assumption, it is not possible to use a more general model for the intrinsic parameters, since by adding views we add a number of unknowns at least equal to the number of additional equations. The idea behind our method is to use constraints which arise from the observation of a static scene by a single moving camera. In this case the intrinsic parameters remain constant, A = A′ and K = K′; thus we can accumulate constraints over different displacements and obtain a sufficient number of equations for the resolution.

How Many Displacements or Images are Necessary? Each displacement yields two independent algebraic equations. In the case of a moving camera, we have only five coefficients of K to estimate, since K is a symmetric matrix defined up to a scale factor. In the general case, three displacements are therefore necessary. In the case of the simplified model with four intrinsic parameters, two displacements are sufficient, since we have the additional constraint (15). Note that in the most general case, the displacements need not form a sequence, which means that we can consider, for example, the displacements 1-2, 3-4, 5-6 between 6 different images. However, the displacements are not always independent when the images form a sequence; the algorithm based on the Kruppa equations ignores the additional constraints which then arise.

Between three images, there are three displacements: 1-2, 2-3, and 1-3.


One could worry that, since the third displacement in this case, D₃ = D₂D₁, is the composition of the first two displacements D₁ and D₂, the 1-3 equations would be dependent on the 1-2 and 2-3 equations, thus resulting in an under-constrained system. One way to see that this is not the case is to count unknowns. Two fundamental matrices depend only upon 14 parameters. On the other hand, the Euclidean information which is recovered by self-calibration consists of the three displacements 1-2, 2-3, and 1-3 up to a common scale factor, and of the 5 intrinsic parameters. The displacements depend on 11 parameters: 3 for each of the rotations R₁₂ and R₂₃, 2 for each of the directions of translation t₁₂ and t₂₃, and one for the ratio of the norms of the translations. The total is 16 parameters; thus the information is not entirely contained in the first two fundamental matrices. The two missing parameters are actually recovered thanks to the two additional Kruppa equations provided by the third fundamental matrix. We also give in Appendix C a simple numerical example showing that in the general case the equations are independent.

Degenerate Cases. Not all combinations of displacements will work. For instance, if two of the displacements are identical, they will obviously yield only two independent constraints.

Also, in the case of a displacement for which the translation vector is null, t = (0, 0, 0)ᵀ, that is, if the displacement is a pure rotation whose axis contains the optical center of the camera, the two optical centers are identical; there is then no epipolar constraint, and thus the rigidity constraint cannot be expressed by means of the Kruppa equations. However, self-calibration is even easier in this case, since instead of two quadratic constraints on the entries of K, one obtains five linear constraints on these entries, as shown in (Hartley, 1994b; Luong and Viéville, 1993).

In the case where the displacement is a pure translation (the rotation is the identity, R = I₃), it can be seen from (10) and (11) that the fundamental matrix is antisymmetric. From (9), we conclude that

\[
\forall \mathbf{y}, \quad \mathbf{F}\mathbf{y} = \mathbf{e} \times \mathbf{y}
\]

therefore Eqs. (16) and (17) are equivalent, and the Kruppa equations reduce to tautologies. A geometric interpretation is that since the two tangents to ω in the first image are the tangents to ω in the second image, no further constraint is put on ω by considering the second image.

3.2. A Semi-Analytic Method

Principle. Three displacements yield six equations in the entries of the matrix K. The equations are homogeneous, so the solution is determined only up to a scale factor; in effect there are five unknowns. Trying to solve the over-determined problem with numerical methods usually fails, so five equations are picked from the six and solved first. As the equations are each of degree two, the number of solutions in the general case is 32 = 2⁵. The remaining equation could just be used to discard the spurious solutions, but we have preferred to exploit the redundancy of information to obtain a more robust algorithm, as well as a gross estimate of the variance of the solutions.

Solving the Polynomial System by Continuation. A problem is that solving a polynomial system by providing an initial guess and using an iterative numerical method will not generally find all the solutions: many of the starting points will yield trajectories that do not converge, and many other trajectories will converge to the same solution. However, it is not acceptable to miss solutions, since there is only one correct one amongst the 32. Recently developed methods in numerical continuation can reliably compute all solutions to polynomial systems. These methods have been improved over a decade to provide reliable solutions to kinematics problems. The details of these improvements are omitted; the interested reader is referred for instance to (Wampler et al., 1988) for a detailed tutorial presentation. The solution of a system of nonlinear equations by numerical continuation is suggested by the idea that small changes in the parameters of the system usually produce small changes in the solutions. Suppose the solutions to problem A (the start system) are known and solutions to problem B (the target system) are required. Solutions to the problem are tracked as the parameters of the system are slowly changed from those of A to those of B. Although for a general nonlinear system numerous difficulties can arise, such as divergence or bifurcation of a solution path, for a polynomial system all such difficulties can be avoided. Using an implementation provided by Jean Ponce and colleagues, fairly precise solutions can be obtained. The major drawback of this method is that it is expensive in terms of CPU time. The method is a naturally parallel algorithm, because each continuation path can be tracked on a separate processor. Running it on a network of 7 Sun-4 workstations takes approximately half a minute to solve one system of equations.


Continuation-based computation of the intrinsic parameters with three displacements:

• Generate six independent Kruppa equations from the three fundamental matrices.
• For each of the six systems of five equations E_i:
  — solve E_i by the continuation method to obtain the Kruppa matrices K_i^m, 1 ≤ m ≤ 32,
  — keep only the matrices K_i^m which have real entries and are positive.
• For each list i of solutions (i = 1, …, 6), and for each list j of solutions (j = i + 1, …, 6):
  — find the solution K_i^m in list i and the solution K_j^n in list j which minimize the distance
    \[
    d(\mathbf{u}, \mathbf{v}) = \sum_{l=1}^{5} \frac{|u_l - v_l|}{\max(|u_l|, |v_l|)}
    \]
    where u and v are the two 5-dimensional vectors representing K_i^m and K_j^n, and u_l and v_l are their components,
  — increment the counters corresponding to K_i^m and K_j^n.
• Choose in each list the solution which has obtained the highest counter score.
• Compute the intrinsic parameters.
• Compute the final solution and an estimate of the covariance by an averaging operator.
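The voting step of this algorithm is simple to sketch: the candidate matrices are compared through 5-vectors of their independent entries (we normalize by K₃₃, since K is defined up to a scale factor), and each pairwise best match earns both candidates a vote. The data layout below is our own.

```python
import numpy as np
from itertools import combinations

def k_vector(K):
    """The 5 independent entries of the symmetric K, with K[2, 2] = 1."""
    K = K / K[2, 2]
    return np.array([K[0, 0], K[0, 1], K[0, 2], K[1, 1], K[1, 2]])

def distance(u, v):
    """The relative distance used in the voting step."""
    return np.sum(np.abs(u - v) / np.maximum(np.abs(u), np.abs(v)))

def vote(solution_lists):
    """solution_lists[i] holds the admissible K matrices from system E_i.
    Returns one counter per candidate, as in the algorithm above."""
    counters = [np.zeros(len(lst)) for lst in solution_lists]
    for i, j in combinations(range(len(solution_lists)), 2):
        pairs = [(distance(k_vector(Ki), k_vector(Kj)), m, n)
                 for m, Ki in enumerate(solution_lists[i])
                 for n, Kj in enumerate(solution_lists[j])]
        _, m, n = min(pairs)   # closest pair across the two lists
        counters[i][m] += 1
        counters[j][n] += 1
    return counters
```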

Two Examples. We present in Tables 1 and 2 two typical examples. They differ by the choice of the three movements. It can be seen that the results can differ significantly from one configuration to another. The second configuration yields results which are less satisfactory because one of the three displacements has a rather small rotational component.

The experimental procedure consisted in choosing three displacements, generating point correspondences by projecting into the two retinas a set of random 3D points and adding Gaussian image noise, computing the fundamental matrix with a non-linear method (Luong et al., 1993) from these point correspondences, and then using the continuation algorithm to solve the Kruppa equations obtained from the fundamental matrices. To match real conditions, the size of the images was taken to be 512 × 512 pixels, the field of view of the camera was 43°, and the 3D points were scattered in a cube of size 10 meters centered around the initial position of the camera.
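A sketch of this synthetic protocol, with the intrinsic matrix of Tables 1 and 2 and one arbitrary stand-in displacement (the fundamental-matrix estimator itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[640.125, 0.0, 246.096],    # intrinsics used in Tables 1-2
              [0.0, 943.695, 255.648],
              [0.0, 0.0, 1.0]])

def project(A, R, t, X):
    """Project 3-D points X (N x 3) with pose (R, t); returns N x 2 pixels."""
    Xc = X @ R.T + t
    m = Xc @ A.T
    return m[:, :2] / m[:, 2:3]

# Points in a 10 m cube (shifted forward here so all points are visible)
X = rng.uniform(-5.0, 5.0, (200, 3)) + np.array([0.0, 0.0, 7.0])

R, t = np.eye(3), np.array([0.5, 0.0, 0.1])   # one assumed displacement
sigma = 0.5                                    # image noise in pixels
m1 = project(A, np.eye(3), np.zeros(3), X) + sigma * rng.standard_normal((200, 2))
m2 = project(A, R, t, X) + sigma * rng.standard_normal((200, 2))
# m1, m2 then feed a fundamental-matrix estimator (e.g., Luong et al., 1993)
```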

We have verified the impact of the orthogonality constraint (15) in two different cases. In the first one, where we have only two sets of correspondences, we just solve the system of equations formed by the four Kruppa equations and the constraint (15), which is also quadratic. In the second one, where we have three sets of correspondences, we could use a very strong redundancy of equations, since there are now $\binom{6}{4} = 15$ systems which could be built by picking four Kruppa equations plus the constraint (15). However, for the sake of comparison, we have used only six of these equations.

We have tested:

• 2 displacements with the orthogonality constraint (the results are displayed for the three possible combinations of displacements; a dash indicates that no solution compatible with the positivity constraints on K was found),
• 3 displacements with the orthogonality constraint,
• 3 displacements without the orthogonality constraint.

Numbers in brackets are estimates of the uncertainty of the results.

The big advantage of the method is that no initialization is needed. If the points are measured with good precision, the results can be sufficiently precise. Another advantage is that it is easy to assess the success or failure of the algorithm. However, there are several drawbacks:


• the method is suitable only for the case of the minimum number of displacements, as it is difficult to use all the constraints provided by a long sequence without increasing the amount of computation considerably,
• it is difficult to take uncertainty into account, for the input (fundamental matrices) as well as for the output (camera parameters),
• the computational cost of solving the polynomial system is relatively high,
• it is not possible to express the positivity constraints on K at the resolution level, since continuation methods work in the complex plane; thus, with noisy data, it can happen that no acceptable solution is found,
• it is not very easy to use a priori knowledge that one might have about the intrinsic parameters.

All these drawbacks come from the use of the continuation method and can be overcome using an optimization formulation.

Table 1. Results obtained with the continuation method, configuration 1.

Noise (pixels) | Orth. | Displ. | αu | αv | u0 | v0 | θ − π/2
0 (exact) |   |       | 640.125 | 943.695 | 246.096 | 255.648 | 0
0.1 | Y | 1,2   | 642.32 | 947.37 | 245.82 | 253.94 |
0.1 | Y | 2,3   | 639.44 | 944.36 | 246.04 | 258.59 |
0.1 | Y | 1,3   | 641.62 | 945.73 | 248.97 | 255.56 |
0.1 | Y | 1,2,3 | 641.69 [2.0] | 947.49 [3.7] | 247.03 [1.2] | 256.55 [1.7] |
0.1 | N | 1,2,3 | 644.40 [2.3] | 952.29 [3.8] | 237.45 [4.2] | 254.61 [1.9] |
0.5 | Y | 1,2   | 651.39 | 962.47 | 244.73 | 246.56 |
0.5 | Y | 2,3   | 636.54 | 946.72 | 245.82 | 270.46 |
0.5 | Y | 1,3   | 647.41 | 953.67 | 260.91 | 255.11 |
0.5 | Y | 1,2,3 | 648.39 [11.1] | 963.69 [20.3] | 250.84 [6.7] | 260.26 [9.1] |
0.5 | N | 1,2,3 | 664.19 [11.2] | 996.03 [20.6] | 190.91 [23.8] | 248.71 [9.1] | 4·10⁻² [2·10⁻²]
1.0 | Y | 1,2   | — | — | — | — |
1.0 | Y | 2,3   | 632.49 | 948.90 | 245.50 | 285.45 |
1.0 | Y | 1,3   | 74.85 | 455.95 | 733.93 | 434.07 |
1.0 | Y | 1,2,3 | 658.00 [24.8] | 986.63 [45.7] | 255.61 [14.3] | 265.09 [19.7] |
1.0 | N | 1,2,3 | 681.66 [25.7] | 1109.05 [75.6] | 31.10 [139.9] | 231.99 [20.5] | 0.13 [0.08]
1.5 | Y | 1,2   | 676.05 | 1002.37 | 241.89 | 223.28 |
1.5 | Y | 2,3   | 627.92 | 950.11 | 245.16 | 300.56 |
1.5 | Y | 1,3   | 659.79 | 971.19 | 293.82 | 252.73 |
1.5 | Y | 1,2,3 | 669.62 [42.6] | 1013.85 [79.2] | 260.16 [23.3] | 270.23 [32.3] |
1.5 | N | 1,2,3 | 633.02 [73.0] | 1223.62 [104.5] | 190.46 [231.1] | 205.49 [43.9] | 0.27 [0.2]

3.3. Optimization Formulation Taking into Account Multiple Views

In this approach, we no longer make use of the simple polynomial structure of the Kruppa equations, but rather consider them as measurement equations relating the fundamental matrices directly to the intrinsic parameters, obtained by substituting into (18) the values of the entries of K given by (14). We can then solve them either by a batch non-linear least-squares minimization technique, or by an extended Kalman filtering approach. In both cases, the uncertainty of the measurements (F) can be taken into account.

Global Minimization. The choice of the error function to be minimized is very important. We have noticed two things. First, using the three Kruppa Eqs. (18), even though they are not independent, provides additional constraints and improves the results. Second, minimizing directly the value of the residuals of the expressions (18) does not work well.


Table 2. Results obtained with the continuation method, configuration 2.

Noise (pixels) | Orth. | Displ. | αu | αv | u0 | v0 | θ − π/2
0 (exact) |   |       | 640.125 | 943.695 | 246.096 | 255.648 | 0
0.1 | Y | 1,2   | 647.56 | 955.50 | 245.32 | 250.58 |
0.1 | Y | 2,3   | 124.317 | 947.934 | 230.705 | 252.053 |
0.1 | Y | 1,3   | 639.303 | 943.591 | 246.083 | 257.594 |
0.1 | Y | 1,2,3 | 640.83 [2.7] | 947.59 [2.7] | 237.90 [10.2] | 252.88 [3.1] |
0.1 | N | 1,2,3 | 636.32 [15.8] | 942.45 [5.0] | 241.87 [5.9] | 251.61 [2.8] | 0.018 [0.02]
0.5 | Y | 1,2   | — | — | — | — |
0.5 | Y | 2,3   | — | — | — | — |
0.5 | Y | 1,3   | 635.76 | 942.88 | 246.03 | 265.70 |
0.5 | Y | 1,2,3 | 654.01 [24.1] | 976.83 [22.4] | 214.28 [47.0] | 232.66 [20.7] |
0.5 | N | 1,2,3 | 623.63 [78.8] | 934.15 [31.4] | 240.84 [2.1] | 237.95 [14.7] | 0.089 [0.09]
1.0 | Y | 1,2   | 744.34 | 1110.38 | 235.28 | 187.89 |
1.0 | Y | 2,3   | — | — | — | — |
1.0 | Y | 1,3   | 630.86 | 941.71 | 245.96 | 277.04 |
1.0 | Y | 1,2,3 | 505.94 [248.7] | 779.03 [389.9] | 179.30 [94.4] | 407.68 [317.3] |
1.0 | N | 1,2,3 | 628.20 [130.4] | 936.94 [68.6] | 208.05 | 217.74 [27.9] | 0.15 [0.1]
1.5 | Y | 1,2   | 2462.05 | 3943.05 | 27.53 | −558.13 |
1.5 | Y | 2,3   | 342.86 | 875.35 | 219.38 | 246.91 |
1.5 | Y | 1,3   | 604.46 | 885.15 | 249.27 | 260.23 |
1.5 | Y | 1,2,3 | 688.43 [163.9] | 1048.77 [254.3] | 161.48 [75.0] | 207.08 [38.7] |
1.5 | N | 1,2,3 | 1190.91 [1164.0] | 1803.80 [1867.3] | 109.39 [149.1] | −109.65 [661.5] | 0.13 [0.1]

The reason is the well-known fact that minimizing an error function
\[
\sum_i \left( \frac{a_i}{b_i} - \frac{a_i'}{b_i'} \right)^2
\]
is quite different from minimizing
\[
\sum_i \left( a_i b_i' - a_i' b_i \right)^2,
\]
because the latter is weighted by the variable quantity b_i b′_i. In our case, since we are interested in expressing the proportionality of the polynomials P₁ and P₂ (defined just before (18)), we thus minimize the following error function:
\[
\min_{\alpha_u, \alpha_v, u_0, v_0, \theta} \;
\sum_{\text{displacements}}
\left( \frac{k_0}{k_0'} - \frac{k_1}{k_1'} \right)^2
+ \left( \frac{k_1}{k_1'} - \frac{k_2}{k_2'} \right)^2
+ \left( \frac{k_0}{k_0'} - \frac{k_2}{k_2'} \right)^2
\tag{19}
\]

where the coefficients k_i and k′_i are defined just before (18).
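A sketch of the corresponding least-squares problem, reusing the kruppa_coefficients helper from our Section 2.3 sketch and assuming SciPy is available; for a single moving camera, K′ = K:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, displacements):
    """Residuals of Eq. (19). `displacements` is a list of (F, e) pairs;
    a single camera is assumed, so K' = K = A A^T."""
    alpha_u, alpha_v, u0, v0, theta = params
    A = np.array([[alpha_u, -alpha_u / np.tan(theta), u0],
                  [0.0, alpha_v / np.sin(theta), v0],
                  [0.0, 0.0, 1.0]])
    K = A @ A.T
    res = []
    for F, e in displacements:
        k, kp = kruppa_coefficients(F, e, K, K)  # see Section 2.3 sketch
        res += [k[0] / kp[0] - k[1] / kp[1],
                k[1] / kp[1] - k[2] / kp[2],
                k[0] / kp[0] - k[2] / kp[2]]
    return np.array(res)

# The arbitrary starting point used in the text:
# x0 = (800.0, 800.0, 255.0, 255.0, np.pi / 2)
# fit = least_squares(residuals, x0, args=(displacements,))
```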

We have compared this method with the continuation method, using a statistical approach involving 100 triples of displacements. To obtain an idea of the precision and convergence properties, we have started the minimization with different initial values: (1) the exact values, (2) the values given by the continuation method, (3) the arbitrary values αu = 800, αv = 800, u0 = 255, v0 = 255, θ = π/2, corresponding to the relatively standard situation of an orthogonal pixel grid, no principal point shift, and reasonable values for the scale factors (with no knowledge of the aspect ratio). Table 3 shows the mean relative error for each parameter, obtained at two different noise levels: 0.2 pixels is approximately the subpixel precision of model-based feature detectors, whereas 1.0 pixel is the typical precision of some operator-based feature detectors.

We can conclude from these results that:

• the precision on the scale factors αu and αv is better than that on the principal point coordinates u0 and v0,
• the results are quite sensitive to the choice of the initialization point,
• the precision of the iterative method is roughly comparable with the precision of the continuation method.


Table 3. Statistical comparison of the continuation method and of optimization methods with 3 displacements. The minimization is initialized with: (1) exact values, (2) the continuation method, (3) an arbitrary value. The table shows the percent average error.

Noise (pixels) | Method | Failure | αu | αv | u0 | v0 | θ − π/2
0.2 | Continuation | 2 | 5 | 6 | 3 | 3 | 3
0.2 | Mini (1) |   | 2 | 3 | 7 | 9 | 3
0.2 | Mini (2) |   | 5 | 5 | 10 | 12 | 5
0.2 | Mini (3) |   | 6 | 6 | 13 | 15 | 5
1.0 | Continuation | 7 | 12 | 14 | 26 | 32 | 8
1.0 | Mini (1) |   | 11 | 14 | 26 | 32 | 9
1.0 | Mini (2) |   | 13 | 16 | 27 | 37 | 13
1.0 | Mini (3) |   | 17 | 18 | 28 | 36 | 13

Since the number of equations and parameters is relatively small, the method is computationally efficient. Its main disadvantage is the need for a good starting point, but this can be obtained by the continuation method.

Recursive Filtering. If we have a long sequence, it may be interesting to use the Iterated Extended Kalman Filter, with the following data:

vector of state parameters: a = (αu, αv, u0, v0)ᵀ
vector of measurements: x = (F11, F12, F13, F21, F22, F23, F31, F32, F33)ᵀ
measurement equations: f(x, a) = 0, where f₁ and f₂ are two of the Kruppa Eqs. (18)

Table 4. Statistical comparison of minimization and Kalman filtering. Initialization is done with minimization on 3 displacements. The table shows the percent average error.

Noise (pixels) | nb (displ.) | αu Mini | αu Kalman | αv Mini | αv Kalman | u0 Mini | u0 Kalman | v0 Mini | v0 Kalman
0.2 | 3  | 14 | 13 | 14 | 13 | 30 | 27 | 30 | 28
0.2 | 5  | 9  | 13 | 8  | 14 | 22 | 26 | 25 | 27
0.2 | 10 | 6  | 13 | 7  | 13 | 19 | 23 | 24 | 24
0.2 | 15 | 4  | 13 | 7  | 12 | 20 | 22 | 20 | 21
1.0 | 3  | 38 | 35 | 39 | 36 | 52 | 50 | 57 | 55
1.0 | 5  | 33 | 33 | 31 | 36 | 47 | 49 | 49 | 52
1.0 | 10 | 28 | 35 | 29 | 37 | 44 | 46 | 51 | 48
1.0 | 15 | 30 | 36 | 27 | 37 | 43 | 43 | 49 | 47

The orthogonality correction factor has been dropped to reduce non-linearities in the model, and we have only used two Kruppa equations, to ensure that the measurement equations are independent. The uncertainty on the fundamental matrices is needed; it is obtained using the method described in (Luong and Faugeras, 1994, 1995).
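For implicit measurement equations f(x, a) = 0, a standard extended Kalman update linearizes f in both the state a and the measurement x, and folds the measurement noise through the Jacobian in x. A minimal sketch of one such update (our notation; the iterated filter used in the paper additionally repeats the linearization within each update):

```python
import numpy as np

def ekf_update_implicit(a, P, Lx, f, Jf_a, Jf_x):
    """One EKF update with an implicit measurement f(x, a) = 0.
    a, P : state estimate (intrinsic parameters) and its covariance
    Lx   : covariance of the measurement x (entries of F)
    f    : value f(x, a), here the two Kruppa equations of Eq. (18)
    Jf_a : Jacobian df/da at (x, a); Jf_x : Jacobian df/dx at (x, a).
    Linearizing, the noise term Jf_x (x* - x) has covariance
    W = Jf_x Lx Jf_x^T, which plays the role of the observation noise."""
    W = Jf_x @ Lx @ Jf_x.T
    S = Jf_a @ P @ Jf_a.T + W          # innovation covariance
    G = P @ Jf_a.T @ np.linalg.inv(S)  # Kalman gain
    a_new = a - G @ f                  # drive f toward zero
    P_new = (np.eye(len(a)) - G @ Jf_a) @ P
    return a_new, P_new
```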

Statistical experiments have been conducted to assess the effect of increasing the number of displacements and to compare the Kalman method with the batch minimization approach. In Table 4, the Kalman filtering has been initialized with the parameters estimated by the minimization technique using the first three displacements. The fact that the average error remains approximately the same for the parameters αu and αv is due to convergence to false local minima induced by inexact starting points, and to the fact that in the Kalman filter approach the full information provided by all the displacements is not available, owing to the recursive nature of the approach.

Thus, statistically, the global minimization gives better results, a finding consistent with those of (Kumar et al., 1989) and (Weng et al., 1989). However, if the starting point is precise, as in Table 5, where it is found by the minimization method using a larger number of displacements, it can be seen that the Kalman filter results are slightly better, which may be due to the fact that uncertainty is taken into account.


Table 5. Statistical comparison of minimization and Kalman filtering. Initialization is done with minimization on 15 displacements. The table shows the percent average error, and the percentage of trials for which the final error is above 5%.

Noise (pixels) |            | αu Mini | αu Kalman | αv Mini | αv Kalman | u0 Mini | u0 Kalman | v0 Mini | v0 Kalman
0.2 | err        | 4  | 4  | 7  | 6  | 18 | 14 | 19 | 13
0.2 | % err > 5% | 22 | 13 | 28 | 15 | 48 | 29 | 58 | 40
1.0 | err        | 23 | 24 | 22 | 27 | 41 | 35 | 48 | 41
1.0 | % err > 5% | 68 | 49 | 64 | 58 | 78 | 59 | 78 | 72

In this table, we have reported not only the average relative error, but also the percentage of cases for which the final error was above 5%; this shows that when the Kalman filter does not fall into a false minimum, it improves the results significantly.

3.4. An Evaluation of the Methods Based on the Kruppa Equations

From the numerous simulations that we performed (some of which were described in this section), it appears that all the methods give comparable results, in the sense that none of them gives clearly superior results in all situations. In any case, the main limitation of the method comes from the necessity of obtaining a precise localization of the points in order to compute precise fundamental matrices. A subpixel accuracy of about 0.2 to 0.5 pixels is necessary in order to get acceptable results. This means that the most precise feature detectors need to be used. Some types of displacements will not work well, specifically those leading to the nearly degenerate cases for the Kruppa equations mentioned in this section, and those leading to unstable computation of the fundamental matrix, which are studied in (Luong, 1992; Luong and Faugeras, 1995).

Another limitation might be that the method does not give an accurate estimation of the position of the principal point and of the angle of the retinal axes. The latter is of no importance, since in practice it is very well controlled and very close to π/2. Thus this information can be used either to restrict the model or to discard false solutions. We will see in the next section that the former is also of little importance, in the sense that it does not much affect the subsequent stage of the calibration, the estimation of 3D motion. In fact, we will see that even with imprecise values of the camera parameters, fairly acceptable motion parameters can be recovered, and furthermore, during this process of recovering the motion parameters, the estimation of the intrinsic parameters can be refined.

To summarize the findings of this section, we recommend using the continuation method when only a minimal number of displacements is available, or when no initial guess for the intrinsic parameters is available. When more displacements are available, and provided that there is a reasonable initial guess, we recommend the optimization approaches. The choice between the batch non-linear minimization algorithm and the Kalman filter algorithm depends on several factors. The advantages of the Kalman filter algorithm are that it is incremental, very fast, and gives the most precise results if the initialization point is very precise. On the other hand, it is considerably more sensitive to the choice of the initialization point than the batch minimization algorithm.

4. Taking into Account the Motion of the Camera

Our first goal in this section is to compute the three-dimensional motion from pairs of images, supposing that we have obtained the intrinsic parameters A. We show experimentally that this computation can be done quite robustly even with imprecise camera parameters, provided that the appropriate algorithm is chosen. The interesting remark is that the algorithm which is the most robust to image noise is not the one which is the most robust to imprecision in the intrinsic parameters. The comparison of the motion determination algorithms is the basis for a new self-calibration algorithm: by combining the computation of motion with the computation of the intrinsic parameters, we obtain another iterative approach to self-calibration, which we compare to the Kruppa approach.

4.1. Computing the Motion after Calibrating

The motion determination problem from point correspondences is very classical; see (Faugeras et al., 1987; Horn, 1990; Spetsakis and Aloimonos, 1988; Weng et al., 1989) for solutions similar to ours. There are two different solutions, both based on the computation of the fundamental matrix.

A Direct Factorization. We have seen that during the course of intrinsic parameter estimation, we had to compute the fundamental matrix F, from which the essential matrix is immediately obtained:

$$E = A^T F A \qquad (20)$$

The problem of finding the rotation R and the translation t from E is classical (Faugeras et al., 1987; Hartley, 1992; Longuet-Higgins, 1981; Tsai and Huang, 1984). We denote this algorithm by FACTOR.
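To make the FACTOR step concrete, here is a minimal sketch (ours, not the authors' code) of the classical SVD-based factorization of an essential matrix into its four (R, t) candidates; the standard disambiguation, not shown, keeps the candidate that reconstructs the points in front of both cameras:

```python
import numpy as np

def factor_essential(E):
    """Classical SVD factorization of an essential matrix into the
    four (R, t) candidates; t is recovered only up to scale and sign."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

# E itself comes from Eq. (20): E = A.T @ F @ A.
```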

An Iterative Solution. An alternative method is to use directly the error function that has been used to determine the fundamental matrix. It differs from the previous one by the fact that it uses again the measured points. In (Luong et al., 1993; Zhang et al., 1995) different parameterizations for this matrix have been proposed to take into account constraints on its structure, and linear and non-linear criteria for its estimation were also considered. We have used the two error functions:

$$\sum_i \bigl(m_i'^T F m_i\bigr)^2 \quad \text{subject to} \quad \mathrm{Tr}(F^T F) = 1 \qquad (21)$$

and

$$\sum_i \Bigl\{ d\bigl(m'_i, F m_i\bigr)^2 + d\bigl(m_i, F^T m'_i\bigr)^2 \Bigr\} \qquad (22)$$

where d is the Euclidean distance in the image plane between a point and a line.
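To make the criterion (22) concrete, a minimal sketch of the symmetric point-to-epipolar-line error for one correspondence (the function names are ours):

```python
import numpy as np

def point_line_distance(m, l):
    """Euclidean image-plane distance between the point with homogeneous
    coordinates m and the line l, i.e. l[0]*x + l[1]*y + l[2] = 0."""
    m = m / m[2]
    return abs(l @ m) / np.hypot(l[0], l[1])

def symmetric_epipolar_error(F, m, m_prime):
    """One term of the sum in (22): d(m', F m)^2 + d(m, F^T m')^2."""
    return (point_line_distance(m_prime, F @ m) ** 2
            + point_line_distance(m, F.T @ m_prime) ** 2)
```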

We denote by MIN-LIN⁷ the minimization of the error function (21) and by MIN-DIST the minimization of the error function (22). The knowledge of the intrinsic parameters allows us to minimize these error functions with respect to five motion parameters: we parameterize T by t1/t3, t2/t3 and R by the three-dimensional vector r whose direction is that of the axis of rotation and whose norm is equal to the rotation angle. Hence, we minimize with respect to r and T the error functions:

$$\sum_i \bigl(m_i'^T A^{-T} E A^{-1} m_i\bigr)^2 \quad \text{and} \quad \sum_i \Bigl\{ d\bigl(m'_i, A^{-T} E A^{-1} m_i\bigr)^2 + d\bigl(m_i, (A^{-T} E A^{-1})^T m'_i\bigr)^2 \Bigr\}$$

where E = [T]× R.
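A sketch of how the MIN-DIST minimization over the five motion parameters might be implemented with an off-the-shelf least-squares solver (the parameterization follows the text; the solver choice and names are ours, and t3 ≠ 0 is assumed):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def motion_residuals(x, A, pts, pts_p):
    """Residuals of (22) over x = (r1, r2, r3, t1/t3, t2/t3)."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()   # r: axis times angle
    t = np.array([x[3], x[4], 1.0])
    tx = np.array([[0, -t[2], t[1]],
                   [t[2], 0, -t[0]],
                   [-t[1], t[0], 0]])
    Ai = np.linalg.inv(A)
    F = Ai.T @ tx @ R @ Ai                        # F = A^{-T} [t]x R A^{-1}
    res = []
    for m, mp in zip(pts, pts_p):
        res.append(point_line_distance(mp, F @ m))   # helper from the sketch above
        res.append(point_line_distance(m, F.T @ mp))
    return res

# x0, e.g., obtained from FACTOR:
# sol = least_squares(motion_residuals, x0, args=(A, pts, pts_p))
```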

Figure 4. Relative error on the rotation, initialization with the exact displacement.

4.2. An Experimental Comparison

The Case of Exact Intrinsic Parameters. In the first comparative study, we suppose that the exact intrinsic parameters are known. The graphs have been obtained using 200 different displacements, and show the average relative error on the rotational and translational components, measured as ‖Δr‖/‖r‖ and ‖Δt‖/‖t‖, where r is a vector whose norm is the angle of the rotation and whose direction gives the rotation axis. Since the non-linear methods require a starting point whose choice is important, we have considered three possibilities:

1. the exact motion, to test the precision of the minimum (Figs. 4 and 5, Label 1).
2. the motion obtained by FACTOR, which is the realistic initialization (Figs. 6 and 7, Label 2).
3. an arbitrary motion, r = (1/2, 1/2, 1/2)ᵀ, t = (0, 0, 1)ᵀ, to test the convergence properties (Figs. 6 and 7, Label 3).

These indices (1, 2, 3) are used as labels in the graphs plotted in Figs. 4–7. For example, MIN-LIN-1 means that the corresponding curve shows the performance of the MIN-LIN error function when initialized with the exact motion.

The conclusions of the simulations are:

• The computation is more stable than the fundamental matrix computation: motion computation is a less difficult problem.
• The rotational part is determined more precisely than the translational part.


Figure 5. Relative error on the translation, initialization with the exact displacement.

Figure 6. Relative error on the rotation, initialization with the results of FACTOR (2) and an arbitrary motion (3).

• The iterative method based on MIN-DIST is the most precise, but it is the most sensitive to the choice of the starting point.
• The results obtained by MIN-DIST and by FACTOR in the realistic case, where MIN-DIST is initialized with the results of FACTOR, are very close.

Note that even using MIN-LIN, the results are much more precise than those usually found by using purely linear methods such as the eight-point algorithm (Fang and Huang, 1984; Tsai and Huang, 1984).

Figure 7. Relative error on the translation, initialization with the results of FACTOR (2) and an arbitrary motion (3).

Sensitivity to Errors on the Intrinsic Parameters. Very few results are available concerning the sensitivity of motion and structure computations to errors on the intrinsic parameters (Kumar and Hanson, 1990). It is nevertheless an important issue, as it determines the precision of calibration that is necessary to achieve a given precision on the three-dimensional reconstruction, which is the final objective. We present here some experimental results which give an idea of the numerical values. All experiments were run with pixel noise levels varying from 0.2 to 1.8 pixels. Figure 8 represents the effects of the error on the location of the principal point. The exact principal point is at the center (255, 255) of the image, and we have used for the computation of the motion principal points that were shifted from 20 to 200 pixels following a Gaussian law. Each point on the figure represents 100 trials.

Figure 9 represents the effects of the error on the scale factors, which was similarly varied from 2.5% to 25%. Among the numerous conclusions that can be drawn from the graphs, we would like to emphasize the following:

• The effects of the imprecision on intrinsic parameters are significant; however, until relatively large errors are reached (10% on the scale factors, several tens of pixels for the principal point), these effects are less significant than those due to noise (for example, if the image noise increases from 0.6 to 1.0 pixels).
• The sensitivity to errors on the principal point is less than the sensitivity to errors on the scale factors: in terms of relative errors, a 120 pixel shift of the principal point corresponds to a 50% relative error and has the same effects as a 25% relative error on the scale factors.
• The iterative criterion MIN-DIST is more sensitive to the error on the intrinsic parameters than the solution of FACTOR. This can be explained by the fact that the fundamental matrix, which is directly used by FACTOR, partially retains the information on the exact intrinsic parameters, whereas the iterative method compensates entirely for the error on the intrinsic parameters by an error on the computed motion.

Figure 8. Sensitivity of motion computation to errors on the principal point. Top: FACTOR, bottom: MIN-DIST, left: rotation, right: translation.

The conclusion of these two subsections is that, in practice, we recommend using FACTOR when the intrinsic parameters are not accurately known, and MIN-DIST when they are, since the latter is less sensitive than FACTOR to pixel noise.

4.3. A Global Approach to Compute Simultaneously Calibration and Motion

Using a Single Displacement. A natural extension of the previous techniques is to minimize the error function (22) simultaneously with respect to the five motion parameters previously introduced and to the intrinsic parameters. Since we have seen that the most significant parameters are αu and αv, we choose to estimate them while keeping u0 and v0 constant and equal to some "reasonable" values. The relative errors obtained on the motion parameters are shown in Fig. 10. They are to be compared to Fig. 9; to facilitate this comparison, we have also plotted on this figure the two curves obtained in Fig. 9 for the two extreme noise levels. This superposition makes it clear that the new method is much less sensitive to initial errors on the scale factors, but more sensitive to noise.

The final errors on the motion are compensated by errors on the camera parameters, as seen in Fig. 11, which shows that the final error on the camera parameters depends mainly on the noise (the curves are almost horizontal lines), and not too much on the initial error on the parameters. The curves are not exactly horizontal because, when the initial error on the camera parameters is above 25%, the algorithm sometimes has difficulty retrieving the correct camera parameters, probably due to convergence problems. The graphs also show that it is worth re-estimating the camera parameters only if the error on these parameters is sufficiently high, and this level again depends on the image noise.

Figure 9. Sensitivity of motion computation to errors on the scale factors. Top: FACTOR, bottom: MIN-DIST, left: rotation, right: translation.

Figure 10. Sensitivity of motion computation to errors on the scale factors, in the case where the scale factors are allowed to vary. Left: rotation, right: translation.

Figure 11. Final scale factors obtained when computing the motion.

Global Minimization Using Multiple Motions. If we have several camera displacements, then the previous approach can be used to estimate all the camera parameters and the displacements. Since the minimization is highly non-linear, and involves a large number of unknowns, to obtain convergence we need a good starting point, which can fortunately be obtained from the previous method. Let us summarize the new algorithm, which can accommodate N independent displacements (N ≥ 2), and, for each displacement i, a minimum of eight correspondences (m_ij, m′_ij)_j:

Global computation of intrinsic parameters and motion

1. Compute the N fundamental matrices F_i.
2. Compute an initial estimate of the intrinsic parameters (αu, αv, u0, v0), using one of the Kruppa methods described in Section 3.
3. Compute the N initial motions (r_i, t_i) using FACTOR, from the F_i and the intrinsic parameters.
4. Minimize, with respect to the 5N + 4 variables αu, αv, u0, v0 and (r_i, t_ix/t_iz, t_iy/t_iz) for i = 1···N (or 5N + 2 variables if u0, v0 are taken as the image center), the error function:

$$\min_{\substack{\alpha_u,\,\alpha_v,\,u_0,\,v_0 \\ r_i,\; t_{ix}/t_{iz},\; t_{iy}/t_{iz},\; i=1\cdots N}} \; \sum_{i=1}^{N} \sum_j d^2\bigl(m'_{ij},\, A^{-T}[t_i]_\times R_i A^{-1} m_{ij}\bigr) + d^2\bigl(m_{ij},\, A^{-T} R_i^T [t_i]_\times A^{-1} m'_{ij}\bigr) \qquad (23)$$

where d is the Euclidean point-line distance, R_i = e^{[r_i]_\times}, and

$$A = \begin{pmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

5. Perform stage 3 again with the new intrinsic parameters (optional).
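A sketch of step 4 for the 5N + 2 variant (fixed principal point), again with an off-the-shelf least-squares solver; the data layout and names are our assumptions:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def global_residuals(x, matches, u0, v0):
    """Residuals of (23): x = (alpha_u, alpha_v, then 5 parameters
    (r, tx/tz, ty/tz) per displacement); matches[i] = (pts, pts_p),
    two arrays of homogeneous image points for displacement i."""
    A = np.array([[x[0], 0, u0], [0, x[1], v0], [0, 0, 1.0]])
    Ai = np.linalg.inv(A)
    res = []
    for i, (pts, pts_p) in enumerate(matches):
        p = x[2 + 5 * i: 7 + 5 * i]
        R = Rotation.from_rotvec(p[:3]).as_matrix()
        t = np.array([p[3], p[4], 1.0])
        tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
        F = Ai.T @ tx @ R @ Ai
        for m, mp in zip(pts, pts_p):
            res.append(point_line_distance(mp, F @ m))   # helper as above
            res.append(point_line_distance(m, F.T @ mp))
    return res

# Steps 2-3 provide the starting point x0; then:
# sol = least_squares(global_residuals, x0, args=(matches, 255.0, 255.0))
```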

A Comparison. We now present some statistical simulation results to show that the new global method can significantly improve upon the results obtained by the methods based on the Kruppa equations (denoted by KRUPPA). The name MOUV designates the global method, initialized with the starting point already used in the previous section (800, 800, 255, 255). The name MOUV-KRUPPA designates the global method, initialized with the values obtained by KRUPPA. All names are followed by the number of displacements used, e.g., KRUPPA2. The image noise has the same meaning as previously, that is, Gaussian noise added to the pixel coordinates of the point correspondences. In Fig. 12, each point represents 100 trials, obtained by varying the intrinsic parameters and the camera motions. We have represented the average error on the scale factors. We have given the results both with u0 and v0 fixed and with them varying.

Let us try to characterize the two methods KRUPPA and MOUV. The first stage of each method is identical: it is concerned with the determination of the fundamental matrices. Then, in the second stage of determining the intrinsic parameters, KRUPPA uses only these matrices, the rigidity constraint being used to eliminate the unknown motion parameters. Thus the method involves only the unknowns we try to compute, and allows for a semi-analytical solution, as well as for efficient iterative solutions. Contrary to this, in MOUV it is the form of the parameterization which ensures that all the constraints are satisfied. Then we have to compute explicitly all the unknowns in the problem, and thus need a good starting point and more intensive computations. However, first, the criterion takes into account more constraints, since it ensures the exact decomposability of each fundamental matrix F_i under the form A^{-T} E_i A^{-1}, with a unique intrinsic parameter matrix. Thus it achieves a minimal parameterization of the unknowns. In the KRUPPA approach, the fundamental matrices obtained verify further constraints, which are precisely the existence of solutions for the Kruppa equations, and these constraints cannot be enforced at the first stage of the computation. Second, the criterion directly uses more information. This explains why we obtain more precise results.

Figure 12. Comparison of the Kruppa-based self-calibration method with the motion-based global self-calibration method. Left: 4 parameters estimated (variable principal point). Right: 2 parameters estimated (fixed principal point).

4.4. An Evaluation of the Methods Based on the Motion

In this section, we have presented an alternative method for self-calibration, which computes simultaneously the intrinsic parameters and the camera motion.

As a preliminary, we first studied the computation of the motion parameters in the context of self-calibration. One finding is that although in a classical context, where the camera parameters are known accurately, the non-linear minimization techniques provide the most accurate results for the motion parameters, the best method in our context is the decomposition of the essential matrix. This method is very fast and not very sensitive to errors on the camera parameters.

Once an estimate of the motion has been obtained this way, we can simultaneously refine the camera and motion parameters. So far this method has proved to be the most reliable, and gives better results than the methods based on the Kruppa equations. Although its principle is very simple, it nevertheless depends on the availability of a starting point, and the methods presented in the previous section are perfectly adequate for this purpose, since some of them do not even need an initialization.

5. Experimental Results with Real Data

5.1. Self-Calibration of a Camera

We use three images taken by a camera from different positions. The camera is a Pulnix CCD camera. The CCD has a size of 6.4 mm × 4.8 mm, and the lens has a focal length of 8 mm, resulting in a horizontal field of view of 43°. In order to make comparisons possible with the standard calibration method, we have performed the displacements in such a way that the calibration grid always remains visible. We use between 20 and 30 corners, which are extracted with sub-pixel accuracy, semi-automatically, by the program of Deriche and Blaszka (1993). Correspondence is performed manually. It should be noted that the corresponding points between pairs of images are different, that is, points need not be seen in all three views. Figure 13 shows the detected points of interest matched between image 1 and image 2.

Table 6. Results of the fundamental matrix estimation.

              From the grid                          Estimated                       RMS
        ex       ey       e'x      e'y        ex       ey       e'x      e'y    Points  Grid
1-2   −222.4    181.0   −466.9    167.5     −200.0    185.8   −447.5    170.1    0.36   0.76
2-3   2226.9  −1065.1  −2817.9   1646.6     2708.5  −1380.1  −2099.6   1315.5    0.31   0.31
1-3    654.4   −288.8   1114.7   −715.6      680.2   −321.7   1230.9   −842.2    0.26   0.54

Figure 13. A pair of images with the detected corners superimposed.

Note that only a few of the points are on the calibration grid. The standard calibration is performed on each image, using the algorithm of Robert (1993), which is a much improved version of the linear method of Faugeras and Toscani (1986). From the projection matrices obtained by this algorithm, the three fundamental matrices F12, F23, F13 are computed and used as a reference for the comparisons with our algorithm, which computes the fundamental matrices from the point matches. The resulting epipoles are shown in Table 6.
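For reference, a fundamental matrix can be obtained from a pair of projection matrices by the standard construction sketched below (the paper does not detail the exact procedure used to derive the grid-based reference matrices):

```python
import numpy as np

def fundamental_from_projections(P1, P2):
    """F such that m2^T F m1 = 0, from 3x4 projection matrices P1, P2,
    using F = [e2]_x P2 P1^+ with e2 the epipole in the second image."""
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]                      # optical center of camera 1: P1 @ C1 = 0
    e2 = P2 @ C1
    e2x = np.array([[0, -e2[2], e2[1]],
                    [e2[2], 0, -e2[0]],
                    [-e2[1], e2[0], 0]])
    return e2x @ P2 @ np.linalg.pinv(P1)
```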


Table 7. Results of the intrinsic parameters estimation.

Method                       αu         αv         u0         v0         θ − π/2
Grid, image 1                657        1003       244        256        −2.05e-06
Grid, image 2                664        1015       232        257        −7.47e-07
Grid, image 3                639        980        252        249        −2.60e-06
Average [s.d.]               653 [10]   999 [14]   242 [8]    254 [3]
Kruppa polynomial            639        982        258        341        −6.11e-03
Kruppa iterative             640        936        206        284        −0.07
Kruppa iterative (center)    681        985        255        255
Average [s.d.]               653 [19]   967 [22]   239 [23]   293 [35]

Figure 14. A triplet of images with some estimated epipolar lines superimposed.

It can be seen that the estimation is quite precise. We have given two values of the RMS error, which represents the average distance of corresponding points to epipolar lines. The first one (labeled Points) is computed over the detected points which were used to estimate the fundamental matrices. The low value (one third of a pixel) confirms the validity of our linear distortion-free model (see the remark in Section 2.1), as well as the accuracy of the corner detection process. The second value of the RMS (Grid) is computed over the 128 corners of the little white squares on the calibration grid, which were used for model-based calibration. Since these points were not used at all to estimate the fundamental matrices, they provide appropriate control values. As expected, the RMS with the control points is sometimes higher than the RMS with the data points, but the values remain below one pixel. Some epipolar lines obtained with points that are seen in the three images are shown in Fig. 14, to illustrate the quality of the estimated epipolar geometry. The camera intrinsic parameters are then computed from the fundamental matrices. We show in Table 7 the intrinsic parameters obtained by the standard calibration method using each of the three images, and the results of our method, with the polynomial method (Section 3.2) and the iterative method (Section 3.3) used to compute all the parameters, or just the scale factors, starting from the previous values.

The scale factors are determined with good accuracy; however, this is not the case for the coordinates of the principal point. Thus the best choice is to assume that it is at the center of the image. Note that the intrinsic parameters computed from the standard calibration method show a fair amount of variability among views. We have then compared in Table 8 the camera motion obtained directly from the projection matrices given by the classic calibration procedure, and the estimation obtained by decomposing the fundamental matrices already computed, using the camera parameters obtained by the self-calibration method. The table shows the relative error on the rotation angle α, the angular error θr on the rotation axis and θt on the direction of translation. It can be seen that the estimation is accurate.


Table 8. Results of the camera motion estimation. Rotations are represented by the rotation vector, translations by their unit direction.

Motion           rx         ry         rz          tx         ty         tz        Δα/α     θr      θt
1-2  grid       0.01175   −0.2117    −0.01785    −0.7290    −0.06831    0.6809
     Estimated  0.01843   −0.2110    −0.01961    −0.7239    −0.06102    0.6871    0.0005   1.8     0.62
2-3  grid       0.1900     0.4526     0.1211     −0.9395     0.2779     0.1999
     Estimated  0.1915     0.4682     0.1279     −0.9209     0.2896     0.2608    0.032    0.61    3.7
1-3  grid       0.2007     0.2533     0.07876     0.6976    −0.5041     0.5090
     Estimated  0.01306   −0.2145    −0.01405    −0.7371    −0.05872    0.6731    0.10     0.98    3.0
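The error measures of Table 8 can be computed as follows; this is a sketch of one plausible definition, since the formulas are not spelled out in the text (the translation direction is compared up to sign):

```python
import numpy as np

def motion_errors(r_ref, t_ref, r_est, t_est):
    """Relative rotation-angle error, angle (deg) between rotation axes,
    and angle (deg) between translation directions."""
    a_ref, a_est = np.linalg.norm(r_ref), np.linalg.norm(r_est)
    rel_angle = abs(a_est - a_ref) / a_ref
    cos_axes = np.clip((r_ref / a_ref) @ (r_est / a_est), -1, 1)
    theta_r = np.degrees(np.arccos(cos_axes))
    u, v = t_ref / np.linalg.norm(t_ref), t_est / np.linalg.norm(t_est)
    theta_t = np.degrees(np.arccos(np.clip(abs(u @ v), 0, 1)))
    return rel_angle, theta_r, theta_t
```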

Table 9. Parameters obtained with a zoom camera.

Focal   Method      αu         αv          u0        v0        θ − π/2     αu/αv
9       GRID        481.31     711.54      248.57    260.97    10⁻⁷        0.6764
        SELFCALIB   503.49     760.71      250.24    282.67                0.6618
12      GRID        642.45     950.37      248.30    263.31    −5·10⁻⁷     0.6759
        SELFCALIB   636.12     921.36      201.52    338.89                0.6904
20      GRID        1036.38    1539.6      252.43    272.53    7·10⁻⁸      0.6731
        SELFCALIB   1208.83    1838.48     251.93    200.58                0.6575
30      GRID        1573.20    2330.953    207.98    210.35    4·10⁻⁷      0.6749
        SELFCALIB   2047.61    3063.94     249.678   198.463               0.6682

5.2. Varying the Focal Length

We have applied the method to a camera with a variable focal length. The camera was fitted with a Canon zoom lens J8x6B4, which covers focal lengths from 6 mm to 48 mm. The size of the CCD is 4.8 mm × 3.6 mm. In order to avoid camera distortion, we have not used focal lengths shorter than 9 mm. The results are shown in Table 9. They show that the best results are obtained for short focal lengths, which yield large fields of view. Although the focal length is overestimated by the method for large values, we can notice that the computed aspect ratio is quite consistent over the whole focal range.

5.3. Reconstructions from a Triplet of Uncalibrated Images Taken by a Camera

We now show an example of reconstruction using structure from motion with three uncalibrated views. The approach is to use the global minimization approach presented in Section 4.3, with a variant to account for trinocular constraints.

We have tested the precision of reconstruction of our algorithm using triplets of images of a standard photogrammetric calibration pattern, which were communicated to us for testing by the commercial photogrammetry company CHROMA, of Marseille, France. Coordinates of 3D reference points are available, which allows us to assess quantitatively the error in reconstruction from the uncalibrated images. The triplet used in this experiment is shown in Fig. 15.

The points of interest are the light dots and have been located and matched manually⁸. Note that the scale factors found, αu = 1859.47 and αv = 2520.79, correspond to a rather long focal length, which is not very favorable, and that among the three motions between pairs of images, motion 2-3, whose translation vector was found to be t23 = (−1.186, 0.6623, −0.0857)ᵀ, is nearly parallel to the image plane, an unfavorable configuration, as shown in (Luong, 1992; Luong and Faugeras, 1995). However, the epipolar geometry found from the three projection matrices obtained by self-calibration is fairly coherent, as illustrated in Fig. 16, which shows a zoom with the epipolar lines of one of the points of interest.


Figure 15. The triplet of images of the photogrammetric object.

Figure 16. Zoom on the photogrammetric triplet, showing corresponding epipolar lines.

We have then performed a 3D trinocular reconstruction from the matched points, using our computed projection matrices as input for the classical reconstruction algorithm of Deriche et al. (1992). The 3D points are obtained in the coordinate system associated with one of the cameras, since we can reconstruct only up to a similarity with the self-calibration technique. Thus, in order to compare the reconstruction with the reference data, we have computed the best similarity which relates the two sets of 3D points, using an algorithm of Zhang. After applying this similarity to the initial reconstruction, the final average error in 3D space with this sequence is 2 millimeters⁹. A sample of the coordinates of the reconstructed points is shown in Table 10, units being millimeters.
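The best similarity between the two 3D point sets can be computed in closed form; the sketch below uses a standard Umeyama-style solution, since the text does not detail the algorithm of Zhang that was actually used:

```python
import numpy as np

def best_similarity(X, Y):
    """Closed-form (s, R, t) minimizing sum ||Y_i - (s R X_i + t)||^2
    for corresponding (N, 3) point arrays X and Y."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    U, S, Vt = np.linalg.svd(Yc.T @ Xc)
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:    # guard against a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (Xc ** 2).sum()
    t = my - s * R @ mx
    return s, R, t
```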

6. Conclusion

Table 10. Comparison of the 3D reconstruction from self-calibration with reference points.

        Reference points             Reconstructed points
      X        Y        Z          X         Y         Z
   −56.3     0.38     90.1      −55.5     −2.28      89.1
   −69.7     0.33    110.1      −69.6     −3.02     108.3
   −41.8    30.0      40.1      −40.9     29.7       40.6
   −28.2    49.8      90.0      −26.5     49.3       89.2
   −70.0    30.0       3.5      −69.8     30.4        3.4
  −112.0    70.2      90.1     −113.8     70.5       88.1
   −69.5    89.7      90.0      −69.6     90.8       88.9

We have presented a general framework to perform the self-calibration of a camera. The basic idea is that the only information needed to perform calibration is point correspondences between images taken from various viewpoints. This is in contrast with all standard calibration methods. As a side effect of the calibration procedure, we can also estimate the relative displacements between the cameras and the structure of the scene. The algorithms which arise from this study are the most general possible, in the sense that they do not require:

• any model of the observed objects, or any 3D coordinates,
• any knowledge of the camera motion, which can be entirely general, with the exception of a few degenerate cases, and can be computed as a byproduct of the method,
• any initial guess about the values of the camera parameters, or any restrictive model of these parameters, which describe the most general projective camera model.

Thus, of the four pieces of information used in 3D vision (calibration, motion, structure, correspondences), our method needs only one input and produces three outputs, whereas the other algorithms need at least two inputs or produce at most two outputs, as shown in the table below:

Paradigm                   Camera parameters   Correspondences   Rigid displacement   3D structure
Structure from motion      Input               Input             Output               Output
Stereovision               Input               Output            Input                Output
Model-based calibration    Output              Input             Not used             Input
Calibration from motion    Output              Input             Input                Not used

The problem of on-line calibration is now becoming very important in the framework of active vision, where optical parameters such as focus, aperture, zoom, and vergence are constantly changing, making the use of classic calibration techniques impossible. Another critical framework is digital image libraries, which are usually built from images for which no calibration data is available. Thus a number of researchers have recently investigated self-calibration techniques. However, with one exception, all of them have put more limitations on their methods than we did, by adding supplementary constraints, such as an initial knowledge of camera parameters which are then only updated (Crowley et al., 1993), or restrictions on the camera motions (Basu, 1993; Dron, 1993; Du and Brady, 1993; Hartley, 1994b; Vieville, 1994). When the camera motion is exactly known in some reference frame, these methods should rather be called "calibration from motion" than self-calibration, where motion and calibration are both estimated. However, one of the most reasonable restrictions seems to be a partial control of the motion, which may be performed by a robotic head. In this context, one of the most general works is that of Viéville (1994), where the only additional assumption is that the motion is a fixed-axis rotation, something well-suited to robotic heads. Another approach, which is complementary to the one described in this paper, is to use a stationary camera (Hartley, 1994b; Luong and Vieville, 1993). More precise and robust results can be obtained, since the problem is more constrained.

The only equivalent approach has recently¹⁰ been presented by Hartley (1994a). There are a number of similarities in the steps of the algorithms, although each step is done quite differently. In both methods, the middle steps are non-iterative computations mixed with small-scale optimizations based on motion parameters, in order to find an initialization for the camera parameters. The first step and the last step consist in exploiting the point correspondences directly. The main difference is that Hartley's method is based on a bundle adjustment technique, whereas ours is based on the epipolar geometry. Therefore, Hartley's algorithm enforces more geometric constraints, but requires correspondences across multiple views, and a large-scale non-linear minimization which is not successful with the minimum number of views. Experimental comparisons remain to be done in order to quantify the eventual gain in precision.

Although we have shown, using experiments with real images, that our self-calibration method can be accurate enough to provide useful 3D metric descriptions, and that the results are often of a quality similar to those obtained by a traditional method, it must be admitted that the method presently has its own constraints: not all types of displacements yield stable results, and, as in all calibration procedures, precise image point localization and reliable correspondences are necessary. The precision of our method might compare very unfavorably with that obtained using photogrammetric techniques, but one has to remember that our method can work with only the theoretical minimal amount of data, and although the reconstructions are not very precise, they are usable for most robotics tasks, and are certainly better than the total absence of metric information. Only three images are required, and there is no need to have correspondences across the three views. By using more images, which has not been done in the examples of this paper, the results would improve.

Natural extensions of this work are to investigate the geometry of a system of three cameras, since our formulation does not take into account trinocular constraints at the projective level, but only at the Euclidean level. Using a third view should also enable the use of lines, which are usually more stable primitives than points. It can be expected that the resulting algorithm will have nicer robustness properties. Another idea, which is important in the framework of active vision, is to study the case of parameters which are allowed to change over time. The framework that has been laid out in this paper could prove to be a useful starting point for these studies, which would hopefully result in more truly autonomous vision systems.

Appendix A: Equivalence of the Trivedi Equations and the Huang-Faugeras Constraints

We now show that the three Trivedi equations are equivalent to the Huang and Faugeras conditions. Let us first suppose that we have (13). It follows immediately that det(EEᵀ) = 0, and thus the first condition det(E) = 0 is satisfied. Adding T12, T13 and T23 yields:

$$4\bigl(S_{12}^2 + S_{13}^2 + S_{23}^2\bigr) + S_{11}^2 + S_{22}^2 + S_{33}^2 - 2\bigl(S_{11}S_{22} + S_{22}S_{33} + S_{33}S_{11}\bigr) = 0$$

Since the matrix S is symmetric, the first term can be replaced by 4(S12S21 + S13S31 + S23S32), and a simple calculation shows that the result is identical to the second Huang-Faugeras condition:

$$\mathrm{trace}^2(S) - 2\,\mathrm{trace}(S^2) = 0$$

Let us now suppose that the Huang-Faugeras conditions are satisfied. They are equivalent to the fact that the matrix E has a zero singular value and two equal non-zero singular values σ. By definition, there exists an orthogonal matrix Θ such that:

$$S = EE^T = \Theta \begin{pmatrix} 0 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix} \Theta^T$$

This matrix equality can be expanded as:

$$S = \sigma^2 \bigl( \Theta_{i2}\Theta_{j2} + \Theta_{i3}\Theta_{j3} \bigr)_{1 \le i,j \le 3}$$

Since Θ is orthogonal:

$$\Theta_{i2}\Theta_{j2} + \Theta_{i3}\Theta_{j3} = \begin{cases} -\Theta_{i1}\Theta_{j1} & \text{if } i \ne j \\ 1 - \Theta_{i1}^2 & \text{if } i = j \end{cases}$$

The diagonal element 1 − Θ11² (resp. 1 − Θ21², 1 − Θ31²) can be rewritten Θ31² + Θ21² (resp. Θ11² + Θ31², Θ21² + Θ11²), which shows that S has exactly the form (13).
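As a numerical companion to this proof (an addition of ours, not part of the original text), the two Huang-Faugeras conditions can be checked on any essential matrix E = [t]×R:

```python
import numpy as np

def huang_faugeras_residuals(E):
    """det(E) and trace^2(E E^T) - 2 trace((E E^T)^2); both vanish
    exactly when E is a valid essential matrix."""
    S = E @ E.T
    return np.linalg.det(E), np.trace(S) ** 2 - 2 * np.trace(S @ S)

# Any E built from a rotation R and a translation t should pass:
R = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])   # a rotation matrix
t = np.array([1.0, 2.0, 1.0])
tx = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
print(huang_faugeras_residuals(tx @ R))            # both near zero
```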

Appendix B: Equivalence of the Huang-Faugeras Constraints and the Kruppa Equations

Let us make a change of retinal coordinate system in each of the two retinal planes, so that the new fundamental matrix is diagonalized. One way to see that this can always be done is to use the singular value decomposition: there exist two orthogonal matrices Θ and Δ such that F = ΔΛΘᵀ. If we use the matrix Θ to change retinal coordinates in the first retina and the matrix Δ to change retinal coordinates in the second retina, the new intrinsic parameter matrices are A = A₀Θ and A′ = A₀Δ in the first and second retina, respectively. If the epipolar constraint in normalized coordinates m and m′ was:

$$m'^T A_0^{-T} F A_0^{-1} m = 0$$

with the new coordinate systems we have:

$$p'^T A'^{-T} \Lambda A^{-1} p = 0$$

Thus it is possible, provided we allow the two cameras to be different, to consider that F is in diagonal form:

$$F = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \mu & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad (B1)$$

where λ ≠ 0 and µ ≠ 0, since we know that a fundamental matrix must be of rank two. Let us use the Kruppa notation:

$$K = \begin{pmatrix} -\delta_{23} & \delta_3 & \delta_2 \\ \delta_3 & -\delta_{13} & \delta_1 \\ \delta_2 & \delta_1 & -\delta_{12} \end{pmatrix}$$

Using (B1) we easily obtain the epipoles e = e′ = (0, 0, 1)ᵀ and then, after some algebra, the Kruppa equations:

$$\lambda\,\delta_3\,\delta'_{23} + \mu\,\delta_{13}\,\delta'_3 = 0 \qquad (E1)$$

$$\lambda\,\delta_{23}\,\delta'_3 + \mu\,\delta_3\,\delta'_{13} = 0 \qquad (E2)$$

$$\lambda^2\,\delta_{23}\,\delta'_{23} - \mu^2\,\delta_{13}\,\delta'_{13} = 0 \qquad (E3)$$


with l1ᵀ, l2ᵀ, l3ᵀ being the row vectors of A (similar primed notations are used for the second retina):

$$\delta_3 = \langle l_1, l_2 \rangle, \qquad \delta_{13} = -\|l_2\|^2, \qquad \delta_{23} = -\|l_1\|^2 \qquad (B2)$$

Note that although we use for convenience the three Kruppa equations, only two of them are independent, since we have for instance the relation:

$$\lambda\,\delta_{23}\,E_1 - \mu\,\delta_{13}\,E_2 = \delta_3\,E_3 \qquad (B3)$$

Let us now express the condition f(E) = 0. Since E = A′ᵀFA, some algebra (done partially using the symbolic computation program MAPLE) leads to:

$$f(E) = -\frac{1}{2}\Bigl(\bigl(\lambda^2\delta_{23}\delta'_{23} - \mu^2\delta_{13}\delta'_{13}\bigr)^2 + 2\lambda\mu\bigl(\lambda\delta_3\delta'_{23} + \mu\delta_{13}\delta'_3\bigr)\bigl(\lambda\delta_{23}\delta'_3 + \mu\delta_3\delta'_{13}\bigr)\Bigr)$$

or

$$f(E) = -\frac{1}{2}\bigl(E_3^2 + 2\lambda\mu\,E_1 E_2\bigr)$$

It is then clear that if the Kruppa equations are satisfied, then f(E) = 0. Let us now prove the converse implication.

In the case where δ3 ≠ 0, the previous equation can be rewritten, using (B3), as:

$$\bigl(\lambda\delta_{23}E_1 - \mu\delta_{13}E_2\bigr)^2 + 2\lambda\mu\,E_1 E_2\,\delta_3^2 = 0 \qquad (B4)$$

Thus:

$$\lambda^2\delta_{23}^2 E_1^2 + \mu^2\delta_{13}^2 E_2^2 = 2\lambda\mu\,E_1 E_2\bigl(\delta_{13}\delta_{23} - \delta_3^2\bigr) \qquad (B5)$$

According to the definitions (B2) of δ3, δ13, δ23, the Schwarz inequality implies that δ13δ23 − δ3² is greater than or equal to zero. If it is zero, one can obtain from (B5) that δ23E1 = δ13E2 = 0. Since δ13δ23 = δ3² ≠ 0, it follows that E1 = E2 = 0. If it is strictly positive, then 2λµE1E2 ≥ 0. Equation (B4) is then the sum of two non-negative terms, which therefore have to be simultaneously zero; thus E1E2 = 0 and E3 = 0.

The only special case which remains is δ3 = 0. The Kruppa equations are then in the simple form:

$$\mu\,\delta_{13}\,\delta'_3 = \lambda\,\delta_{23}\,\delta'_3 = \lambda^2\delta_{23}\delta'_{23} - \mu^2\delta_{13}\delta'_{13} = 0$$

which is equivalent to:

$$\begin{cases} \delta'_3 = 0 \\ \lambda^2\delta_{23}\delta'_{23} - \mu^2\delta_{13}\delta'_{13} = 0 \end{cases} \qquad \text{or} \qquad \begin{cases} \delta'_3 \ne 0 \\ \delta_{13} = \delta_{23} = 0 \end{cases}$$

and to the condition f(E) = 0, which in this case reads (up to the factor −1/2):

$$2\lambda^2\mu^2\,\delta_{13}\,\delta_{23}\,\delta_3'^2 + \bigl(\lambda^2\delta_{23}\delta'_{23} - \mu^2\delta_{13}\delta'_{13}\bigr)^2 = 0$$

Appendix C: Independence of the Kruppa Equations Obtained from Three Images

The first two displacements are:

$$R_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad t_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$$

$$R_2 = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad t_2 = \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix}$$

The displacement obtained by composition of D1 and D2, in the coordinate system of the first camera, is:

$$R_3 = R_1 R_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & -1 \\ -1 & 0 & 0 \end{pmatrix}, \qquad t_3 = R_1 t_2 + t_1 = \begin{pmatrix} 3 \\ 3 \\ 1 \end{pmatrix}$$

If we take the identity matrix as the intrinsic parameter matrix A, the fundamental matrices are identical to the essential matrices. By choosing the normalization δ12 = 1, the six Kruppa equations obtained are shown in Table 11. A solution of the system of equations E1, E′1, E2, E′2 obtained from the displacements D1 and D2 is:

δ1 = 0,  δ2 = −1/2,  δ3 = 1,  δ13 = −4,  δ23 = 0

Substituting these values into the equations obtained from D3 yields E3 = −27 and E′3 = 19; thus we have verified that these equations are independent from the previous ones.
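A quick numerical check of the composed displacement (again our addition, not in the original text):

```python
import numpy as np

R1 = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
t1 = np.array([1, 2, 1])
R2 = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]])
t2 = np.array([2, 0, -1])

print(R1 @ R2)        # [[0 1 0] [0 0 -1] [-1 0 0]]  ==  R3
print(R1 @ t2 + t1)   # [3 3 1]                      ==  t3
```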


Table 11. The six Kruppa equations.

E1 = 3δ1 − 2 + 6δ3δ13 + 9δ3 + 4δ1δ3 − 7δ2δ13 + 2δ2 + 12δ1δ2 + 3δ1δ13 + 2δ13²

E′1 = 3δ23 − 3δ13δ23 + 8δ1δ23 + 1 + δ1 − δ2δ13 − 4δ2 − 4δ1δ2 − 4δ3δ13 − δ3 + 4δ1δ3 − δ13² + δ1δ13

E2 = 2δ3δ23 + 16δ3 − 8δ2δ3 + 4δ2δ23 + 16δ2 − 16δ2² + 4δ1δ13 + 16δ1 − 16δ1² + 2δ3δ13 − 8δ1δ3

E′2 = δ23² + 4δ23 − 4δ2δ23 − δ13² − 4δ13 + 4δ1δ13

E3 = 6δ23 + 6δ3 + 18δ23δ3 + 12δ3δ13 + 36δ3² + 18δ23δ2 + 36δ2δ13 + 36δ2δ3 − 6δ23δ1 − 12δ1δ13 + 18δ2 − 36δ1δ2 − 6δ13² − 18δ1 + 36δ1²

E′3 = 9δ23² + 9δ23δ13 + 18δ23δ3 + δ23 − 9δ13 + 2δ3 + 6δ23δ2 + 6δ2δ13 + 12δ2δ3 − δ13² − 4δ1δ13 − 9 + 12δ1 + 12δ1²

Acknowledgments

The authors would like to thank R. Deriche, S. Maybank, T. Papadopoulo, T. Viéville, and Z. Zhang for useful discussions and partial contributions to this work, T. Blaszka and B. Bascle for providing us with point-of-interest detectors, L. Robert for helping us with his calibration and stereo software, and H. Mathieu for making image acquisition possible.

Notes

1. Since a camera is characterized by its intrinsic parameters, this means that we assume that the intrinsic parameters remain constant during the displacements. In the opposite case, the problem we would have to deal with would be the same as with multiple different cameras.

2. This relative distance had to be chosen because the orders of magnitude of each component are very different.

3. In our implementation, we chose to compute the mean value, and to discard iteratively the solutions whose distance to the mean value is above a certain threshold. The final solution is obtained as the mean value of the retained solutions, and an estimate of covariance is obtained by computing their standard deviation.

4. The results improve if one considers more displacements. See the next paragraph.

5. As it is a classical tool in computer vision, we do not give details on the filter itself, and rather invite the interested reader to consult the classical references (Jazwinsky, 1970; Maybeck, 1979), or the more practical presentations which can be found in (Ayache, 1990; Faugeras, 1993; Zhang and Faugeras, 1992).

6. The seemingly inferior results come from the fact that there was no requirement in these experiments on the minimum number of point matches generated, and thus often very few points have been used, in contrast with the previous experiments, where we started with at least 30 points.

7. Although it is not a linear method, but a non-linear method based on the same error measure as the linear criterion for the computation of the fundamental matrix.

8. A snake-based ellipse localization program, due to B. Bascle, has also been tried.

9. This is typical; more precise results have sometimes been achieved.

10. The methods described in this paper were first described in (Luong, 1992). Parts of the results were presented in (Faugeras et al., 1992; Luong and Faugeras, 1992, 1994).

References

Ayache, N. 1990. Stereovision and Sensor Fusion. MIT Press.

Basu, A. 1993. Active calibration: Alternative strategy and analysis. In Proc. of the Conf. on Computer Vision and Pattern Recognition, New-York, pp. 495–500.

Brand, P., Mohr, R., and Bobet, P. 1993. Distorsions optiques: Correction dans un modèle projectif. Technical Report RR-1933, INRIA.

Coxeter, H.S.M. 1987. Projective Geometry. Springer Verlag, second edition.

Crowley, J., Bobet, P., and Schmid, C. 1993. Maintaining stereo calibration by tracking image points. In Proc. of the Conf. on Computer Vision and Pattern Recognition, New-York, pp. 483–488.

Deriche, R. and Blaszka, T. 1993. Recovering and characterizing image features using an efficient model based approach. In Proc. International Conference on Computer Vision and Pattern Recognition.

Deriche, R., Vaillant, R., and Faugeras, O. 1992. From Noisy Edge Points to 3D Reconstruction of a Scene: A Robust Approach and Its Uncertainty Analysis, Vol. 2, pp. 71–79. World Scientific. Series in Machine Perception and Artificial Intelligence.

Dron, L. 1993. Dynamic camera self-calibration from controlled motion sequences. In Proc. of the Conf. on Computer Vision and Pattern Recognition, New-York, pp. 501–506.

Du, F. and Brady, M. 1993. Self-calibration of the intrinsic parameters of cameras for active vision systems. In Proc. of the Conf. on Computer Vision and Pattern Recognition, New-York, pp. 477–482.

Fang, J.Q. and Huang, T.S. 1984. Some experiments on estimating the 3D motion parameters of a rigid body from two consecutive image frames. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:545–554.

Faugeras, O.D. 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press.

Faugeras, O.D. and Toscani, G. 1986. The calibration problem for stereo. In Proceedings of CVPR'86, pp. 15–20.

Faugeras, O.D., Lustman, F., and Toscani, G. 1987. Motion and structure from point and line matches. In Proc. International Conference on Computer Vision, pp. 25–34.

Faugeras, O.D. and Maybank, S.J. 1990. Motion from point matches: Multiplicity of solutions. The International Journal of Computer Vision, 4(3):225–246; also INRIA Tech. Report 1157.

Faugeras, O.D., Luong, Q.-T., and Maybank, S.J. 1992. Camera self-calibration: Theory and experiments. In Proc. European Conference on Computer Vision, Santa Margherita, Italy, pp. 321–334.

Garner, L.E. 1981. An Outline of Projective Geometry. Elsevier: North Holland.

Golub, G.H. and Van Loan, C.F. 1989. Matrix Computations. The Johns Hopkins University Press.

Hartley, R.I. 1992. Estimation of relative camera positions for uncalibrated cameras. In Proc. European Conference on Computer Vision, pp. 579–587.

Hartley, R.I. 1994a. An algorithm for self calibration from several views. In Proc. Conference on Computer Vision and Pattern Recognition, Seattle, WA, pp. 908–912.

Hartley, R.I. 1994b. Self-calibration from multiple views with a rotating camera. In Proc. European Conference on Computer Vision, Stockholm, Sweden, pp. 471–478.

Horn, B.K.P. 1990. Relative orientation. The International Journal of Computer Vision, 4(1):59–78.

Huang, T.S. and Faugeras, O.D. 1989. Some properties of the E-matrix in two-view motion estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:1310–1312.

Jazwinsky, A.M. 1970. Stochastic Processes and Filtering Theory. Academic Press: London.

Kanatani, K. 1991. Computational projective geometry. Computer Vision, Graphics, and Image Processing: Image Understanding, 54(3).

Kanatani, K. 1992. Geometric Computation for Machine Vision. Oxford University Press.

Kruppa, E. 1913. Zur Ermittlung eines Objektes aus zwei Perspektiven mit innerer Orientierung. Sitz.-Ber. Akad. Wiss., Wien, math. naturw. Kl., Abt. IIa., 122:1939–1948.

Kumar, R. and Hanson, A. 1990. Sensibility of the pose refinement problem to accurate estimation of camera parameters. In Proceedings of the International Conference on Computer Vision, Osaka, Japan, pp. 365–369.

Kumar, R.V.R., Tirumalai, A., and Jain, R.C. 1989. A non-linear optimization algorithm for the estimation of structure and motion parameters. In Proc. International Conference on Computer Vision and Pattern Recognition, pp. 136–143.

Longuet-Higgins, H.C. 1981. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133–135.

Luong, Q.-T. 1992. Matrice fondamentale et auto-calibration en vision par ordinateur. Ph.D. thesis, Université de Paris-Sud, Orsay.

Luong, Q.-T. and Faugeras, O.D. 1992. Self-calibration of a camera using multiple images. In Proc. International Conference on Pattern Recognition, Den Haag, The Netherlands, pp. 9–12.

Luong, Q.-T. and Viéville, T. 1996. Canonic representations for the geometries of multiple projective views. Computer Vision and Image Understanding, 64(2):193–229.

Luong, Q.-T. and Faugeras, O.D. 1994. An optimization framework for efficient self-calibration and motion determination. In Proc. International Conference on Pattern Recognition, Jerusalem, Israel, pp. A-248–A-252.

Luong, Q.-T. and Faugeras, O.D. 1994. A stability analysis of the fundamental matrix. In Proc. European Conference on Computer Vision, Stockholm, Sweden, pp. 577–588.

Luong, Q.-T. and Faugeras, O.D. 1996. The fundamental matrix: Theory, algorithms, and stability analysis. International Journal of Computer Vision, 17(1):43–76.

Luong, Q.-T., Deriche, R., Faugeras, O.D., and Papadopoulo, T. 1993. On determining the fundamental matrix: Analysis of different methods and experimental results. Technical Report RR-1894, INRIA.

Maybank, S.J. 1990. The projective geometry of ambiguous surfaces. Proc. of the Royal Society London A, 332:1–47.

Maybank, S.J. and Faugeras, O.D. 1992. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–151.

Maybeck, P.S. 1979. Stochastic Models, Estimation and Control. Academic Press: London.

Mundy, J.L. and Zisserman, A. (Eds.) 1992. Geometric Invariance in Computer Vision. MIT Press.

Robert, L. 1993. Reconstruction de courbes et de surfaces par vision stéréoscopique. Applications à la robotique mobile. Ph.D. thesis, École Polytechnique.

Semple, J.G. and Kneebone, G.T. 1979. Algebraic Projective Geometry. Clarendon Press: Oxford, 1952 (reprinted).

Spetsakis, M.E. and Aloimonos, J. 1988. Optimal computing of structure from motion using point correspondences in two frames. In Proc. International Conference on Computer Vision, pp. 449–453.

Trivedi, H.P. 1988. Can multiple views make up for lack of camera registration? Image and Vision Computing, 6(1):29–32.

Tsai, R.Y. 1989. Synopsis of recent progress on camera calibration for 3D machine vision. In Oussama Khatib, John J. Craig, and Tomás Lozano-Pérez (Eds.), The Robotics Review. MIT Press, pp. 147–159.

Tsai, R.Y. and Huang, T.S. 1984. Uniqueness and estimation of three-dimensional motion parameters of rigid objects with curved surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:13–27.

Ullman, S. 1979. The Interpretation of Visual Motion. MIT Press.

Viéville, T. 1994. Auto-calibration of visual sensor parameters on a robotic head. Image and Vision Computing, 12.

Wampler, C.W., Morgan, A.P., and Sommese, A.J. 1988. Numerical continuation methods for solving polynomial systems arising in kinematics. Technical Report GMR-6372, General Motors Research Labs.

Weng, J., Ahuja, N., and Huang, T.S. 1989. Optimal motion and structure estimation. In Proc. International Conference on Computer Vision and Pattern Recognition, pp. 144–152.

Zhang, Z. and Faugeras, O.D. 1992. 3D Dynamic Scene Analysis. Springer-Verlag.

Zhang, Z., Deriche, R., Faugeras, O., and Luong, Q.-T. 1995. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence Journal, 78:87–119.

