
A Genetic Algorithm for Recovering Camera Parameters and Motion From Silhouettes

Amine Mouafi, Rachid Benslimane
Laboratoire de Transmission et Traitement d'Information

Ecole Supérieure de Technologie
Université Sidi Mohamed Ben Abdellah

Route d'Imouzzer B.P. 2427, Fez, Morocco
Email: amine.mouafi@usmba.ac.ma; [email protected]

Aziza El ouaazizi
LIMAO

Faculté Polydisciplinaire de Taza
Université Sidi Mohamed Ben Abdellah

Route d'Oujda, B.P. 1223, Taza, Morocco
Email: az [email protected]

Abstract—We present a novel approach for recovering the camera parameters and motion from the coherence of a set of silhouettes of an object taken under circular motion. These parameters can be obtained by maximizing the total coherence between all silhouettes. For this purpose, we propose in this paper to use a genetic algorithm instead of the conventional optimization procedures. The genetic algorithm can correctly find the optimal solution without needing initial values of the parameters. The performance of the proposed method is shown in terms of convergence and accuracy in recovering the camera parameters. The recovered parameters may be used for 3D image-based modeling to obtain high quality 3D reconstructions.

Index Terms—Genetic Algorithm, silhouette coherence, parameter estimation, circular motion.

I. INTRODUCTION

The recovery of camera parameters and motion from image sequences without using any calibration patterns can be classified into two approaches: the feature-based and the silhouette-based approach. In the feature-based approach, a structure-from-motion algorithm [14] determines the camera information and the 3D structure of the object simultaneously from feature correspondences. In the silhouette-based approach, the cameras are first calibrated from the object silhouettes, and then volumetric description and intersection techniques [16, 17] are applied for object modeling.

The silhouette-based approach was studied in [1, 2, 3, 4], and recently there has been renewed interest in the problem [5, 6, 7]. Furukawa et al. [6] directly searched for frontier points on the silhouettes in each viewpoint. To recover the epipolar geometry, they require an orthographic camera model. Their method also requires high quality silhouettes and works better with small camera baselines. Boyer [5] proposed a quantitative and qualitative criterion for measuring silhouette consistency from visual cones, but the estimated results depend heavily on the initial parameters. Hernández et al. [7] make use of the silhouettes in order to recover the camera parameters under circular motion. Assuming that the silhouettes are well extracted, the authors propose the silhouette coherence as a global quality measure and calibrate the set of camera parameters by maximizing the total silhouette coherence.

In previous works, using silhouettes only to recover camera parameters and motion requires a good initial guess of the camera parameters to provide enough constraints. To resolve this problem, we use a genetic algorithm to maximize the total silhouette coherence defined by Hernández et al. [7]. The proposed algorithm can correctly find the optimal solution without needing initial values of the parameters.

This paper is organized as follows: in Section 2 we present the problem statement. In Section 3 we define the silhouette coherence measure and its application limits. In Section 4 we present the estimation of the camera parameters and motion using a genetic algorithm. In Section 5 we present the obtained results.

II. PROBLEM STATEMENT

We consider a pinhole camera model. Let M = (x, y, z) be a 3D point in an object frame and m = (u, v) the corresponding image point in the image frame. The relation between the point M and its 2D projection m is fully represented by the 3×4 camera projection matrix P (see [14]):

$$m \simeq P M \simeq K\,[R \mid t]\,M \qquad (1)$$

where the 3×3 rotation matrix R and the vector t represent the orientation and translation. The calibration matrix K contains the intrinsic parameters of the camera. The aspect ratio and the skew factor are assumed to be known or ideal, and the principal point $(u_0, v_0)$ is taken to be the center of the image. The only intrinsic parameter that we consider is the focal length f (in pixels). The calibration matrix K is given as:

$$K = \begin{pmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

Fig. 1. Example of circular motion parametrization.

For n views, we parameterize the circular motion with $n + 3$ parameters: the spherical coordinates of the rotation axis $(\theta_a, \phi_a)$, the translation direction angle $\alpha_t$, the $n - 1$ camera angle steps $\Delta\omega_i$, and the focal length $f$. The $i$th camera projection matrix $P_i$ has the following decomposition:

$$P_i = K\,[R_i \mid t_i] = K\,[R_a(\omega_i) \mid t] \qquad \forall i \qquad (2)$$

where the rotation matrix $R_a(\omega_i)$ is written as a function of $\omega_i$ and $a$ as follows:

$$R_a(\omega_i) = R_1(\omega_i) + R_2(\omega_i)$$

$$R_1(\omega_i) = (1 - \cos\omega_i) \begin{pmatrix} a_x^2 & a_x a_y & a_x a_z \\ a_x a_y & a_y^2 & a_y a_z \\ a_x a_z & a_y a_z & a_z^2 \end{pmatrix}$$

$$R_2(\omega_i) = \begin{pmatrix} \cos\omega_i & -a_z \sin\omega_i & a_y \sin\omega_i \\ a_z \sin\omega_i & \cos\omega_i & -a_x \sin\omega_i \\ -a_y \sin\omega_i & a_x \sin\omega_i & \cos\omega_i \end{pmatrix}$$

and the vectors $a$ and $t$ are given as:

$$a = (\sin\theta_a \cos\phi_a,\; \sin\theta_a \sin\phi_a,\; \cos\theta_a)$$

$$t = (\sin\alpha_t,\; 0,\; \cos\alpha_t)$$

Our goal is to recover the projection matrices $P_i$ of a set of silhouettes $S_i$ of an object taken under circular motion, i.e., the set of $n + 3$ parameters $(\theta_a, \phi_a, \alpha_t, \Delta\omega_i, f)$ (see Fig. 1).
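For illustration only (this snippet is not part of the original paper), the following NumPy sketch shows how the projection matrices of Eq. (2) could be assembled from the $n + 3$ parameters. The function names, the explicit $(u_0, v_0)$ arguments and the choice of the first view as the reference ($\omega_1 = 0$, the $n - 1$ steps being accumulated) are assumptions of this sketch.

```python
# Illustrative sketch, not from the paper: assembling the projection matrices
# P_i = K [R_a(w_i) | t] of Eq. (2) from the n + 3 circular-motion parameters.
import numpy as np

def calibration_matrix(f, u0, v0):
    """K with unit aspect ratio, zero skew and principal point (u0, v0)."""
    return np.array([[f, 0.0, u0],
                     [0.0, f, v0],
                     [0.0, 0.0, 1.0]])

def axis_rotation(a, w):
    """Rodrigues form R_a(w) = (1 - cos w) a a^T + cos w I + sin w [a]_x."""
    ax, ay, az = a
    skew = np.array([[0.0, -az, ay],
                     [az, 0.0, -ax],
                     [-ay, ax, 0.0]])
    return (1.0 - np.cos(w)) * np.outer(a, a) + np.cos(w) * np.eye(3) + np.sin(w) * skew

def projection_matrices(theta_a, phi_a, alpha_t, steps, f, u0, v0):
    """One 3x4 matrix P_i per view; the camera angles accumulate the n-1 steps,
    the first view being taken as the reference (w_1 = 0, an assumption here)."""
    a = np.array([np.sin(theta_a) * np.cos(phi_a),
                  np.sin(theta_a) * np.sin(phi_a),
                  np.cos(theta_a)])
    t = np.array([[np.sin(alpha_t)], [0.0], [np.cos(alpha_t)]])
    K = calibration_matrix(f, u0, v0)
    angles = np.concatenate(([0.0], np.cumsum(steps)))
    return [K @ np.hstack([axis_rotation(a, w), t]) for w in angles]
```

Given a candidate parameter vector, these matrices are what the coherence criterion of the next section is evaluated on.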

III. MEASURE OF SILHOUETTE COHERENCE

The coherence measure is computed as follows:
1) compute the reconstructed visual hull¹ defined by the silhouettes,
2) project the reconstructed visual hull back into the cameras,
3) compare the reconstructed visual hull silhouettes to the original ones.

In the situation of ideal data, i.e., perfect projection matrices, the reconstructed visual hull silhouettes and the original silhouettes will be exactly the same (see Fig. 2.a). With real data, the silhouettes will not be perfect. As a consequence, the original silhouettes and the reconstructed visual hull silhouettes will not be the same, the reconstructed visual hull silhouettes being always contained in the original ones. This can be explained mathematically [7].

Fig. 2. 2D examples of different silhouette coherences. The reconstructed visual hulls are the black polygons. a) Perfectly coherent silhouette set. b) Set of 3 silhouettes with low silhouette coherence. Incoherent optic ray pencils are indicated by the arrows.

Let $\partial_i$ denote the contour of the original silhouette $S_i$, and $\partial_i^V$ the contour of the reconstructed visual hull silhouette $S_i^V$. A measure $C$ of coherence between these two silhouettes can be defined as:

$$C(S_i, S_i^V) = \frac{\int (\partial_i^V \cap \partial_i)}{\int \partial_i} \in [0, 1]. \qquad (3)$$

This measure evaluates the coherence between the silhouette $S_i$ and all the other silhouettes $S_j,\, j \neq i$, that contributed to the reconstructed visual hull. To compute the total coherence between all the silhouettes, we simply compute the mean coherence of each silhouette with the $n - 1$ other silhouettes:

$$C(S_1, \ldots, S_n) = \frac{1}{n} \sum_{i=1}^{n} C(S_i, S_i^V) \qquad (4)$$

To exploit this coherence measure efficiently, we first need to know its application limits. As stated above, if we have perfect camera matrices then $C(S_1, \ldots, S_n) = 1$. This never happens in practice. We can maximize the silhouette coherence by adjusting the camera parameters in order to reduce mismatches between silhouettes. But maximizing the coherence between silhouettes does not necessarily mean finding the right camera parameters [5]. This depends on the object shape, the number of silhouettes, and the interdependence between the camera parameters.
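As a rough illustration (not the authors' implementation), the sketch below approximates Eqs. (3)-(4) at the pixel level. It assumes that binary masks of the original silhouettes and of the reprojected visual hull silhouettes are already available; the tolerance parameter and the helper names are assumptions of this example, and computing the visual hull itself is outside its scope.

```python
# Illustrative pixel-level approximation of the silhouette coherence, Eqs. (3)-(4).
import numpy as np
from scipy import ndimage

def contour(mask):
    """Boundary pixels of a binary silhouette mask."""
    mask = np.asarray(mask, dtype=bool)
    return mask & ~ndimage.binary_erosion(mask)

def silhouette_coherence(original, reprojected, tol=1):
    """C(S_i, S_i^V): fraction of the original contour that coincides with the
    reprojected visual-hull contour, up to a tolerance of 'tol' pixels."""
    c_orig = contour(original)
    c_hull = ndimage.binary_dilation(contour(reprojected), iterations=tol)
    return float((c_orig & c_hull).sum()) / max(int(c_orig.sum()), 1)

def total_coherence(originals, reprojections, tol=1):
    """Mean coherence over the n views, Eq. (4); the quantity maximized in Section IV."""
    return float(np.mean([silhouette_coherence(o, r, tol)
                          for o, r in zip(originals, reprojections)]))
```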

IV. THE PROPOSED GA FOR CAMERA PARAMETERS AND MOTION ESTIMATION

Genetic Algorithms (GAs) [9, 10] are pseudo-stochastic search methods that derive their fundamental ideas and terminology from the Darwinian "natural selection" theory, according to which individuals that are better fit to a given environment are more likely to survive.

¹ The visual hull is an approximate 3D geometric representation of an object resulting from projecting and intersecting all silhouettes from multiple views.

While solving an optimization problem using GAs, each solution is usually coded as an alphabet string of finite length called a chromosome. Each string or chromosome is considered as an individual. A collection of N individuals is called a population. GAs start with a randomly generated population of size N; in each iteration of the algorithm, a new population of the same size is generated from the current population by applying operators termed selection, crossover and mutation [11], which mimic the corresponding processes of natural selection. Following nature's example, the probability pm of applying the mutation operator is very low compared to the probability pc of applying the crossover operator.

In order to exploit silhouette coherence for camera motion and focal length estimation under circular motion, the key is to use the silhouette coherence as the fitness function in an optimization procedure. In this paper we use a GA to estimate the camera and motion parameters that give the maximal total silhouette coherence described in equation (4). We encode the vector $v = (\theta_a, \phi_a, \alpha_t, \Delta\omega_i, f)$ in a way that allows manipulation by the genetic operators. Therefore, we represent each individual chromosome as a binary string of finite length. The phenotype of the kth individual is defined by:

$\theta_a \mid \phi_a \mid \alpha_t \mid \Delta\omega_1 \mid \cdots \mid \Delta\omega_{n-1} \mid f$
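As a concrete encoding example (an assumption for illustration, not necessarily the paper's exact scheme), each parameter can be quantized within its search bounds on a fixed number of bits and the resulting genes concatenated into one chromosome:

```python
# Assumed fixed-point binary encoding of the phenotype
# (theta_a, phi_a, alpha_t, dw_1, ..., dw_{n-1}, f).

def encode(values, bounds, bits=16):
    """Real-valued parameter vector -> binary chromosome."""
    genes = []
    for v, (lo, hi) in zip(values, bounds):
        level = int(round((v - lo) / (hi - lo) * (2 ** bits - 1)))
        genes.append(format(level, "0{}b".format(bits)))
    return "".join(genes)

def decode(chromosome, bounds, bits=16):
    """Binary chromosome -> real-valued parameter vector."""
    values = []
    for k, (lo, hi) in enumerate(bounds):
        level = int(chromosome[k * bits:(k + 1) * bits], 2)
        values.append(lo + level / (2 ** bits - 1) * (hi - lo))
    return values

# Hypothetical bounds, in the phenotype order (the focal range matches Table II):
# bounds = [(-3.1416, 3.1416), (0.0, 3.1416), (-5.0, 5.0)] + [(0.0, 0.35)] * (n - 1) + [(3100.0, 3300.0)]
```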

There exists no criterion in the literature [9] that ensures the convergence of GAs to an optimal solution. Usually, two stopping criteria are used in Genetic Algorithms. In the first, the process is executed for a fixed number of iterations and the best individual obtained is taken to be the optimal one. In the second, the algorithm is terminated if no improvement in the fitness value of the best individual is observed for a fixed number of iterations, and the best chromosome is taken to be the optimal one. In this paper, we have adopted the first stopping criterion with an elitist strategy [12].

The aim of the elitist strategy is to carry the best chromosome of the previous generation into the next. We have implemented this strategy in the following way (a code sketch is given after the steps):

Step 1: Copy the best individual ind0 of the initial population pop0 to a separate location.
Step 2: Perform selection, crossover and mutation operations to obtain a new population pop1.
Step 3: Compare the worst individual ind1 in pop1 with ind0 in terms of their fitness values. If ind1 is found to be worse than ind0, then replace ind1 by ind0.
Step 4: Find the best individual ind2 in pop1 and replace ind0 by ind2.

Note that an individual ind1 is said to be better than another individual ind2 if the fitness value of ind2 is less than that of ind1, since the problem under consideration is a maximization problem.
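The loop below is a minimal, self-contained sketch of Steps 1-4 with roulette-wheel selection, one-point crossover and bit-flip mutation. The placeholder fitness, the helper names and the default rates (high pc, low pm, following the convention recalled above) are assumptions of this sketch; in the actual method the fitness would be the total silhouette coherence of Eq. (4) evaluated on the decoded parameters.

```python
# Illustrative elitist GA loop (Steps 1-4); the fitness is a placeholder.
import random

def fitness(chromosome):
    return chromosome.count("1") / len(chromosome)  # placeholder to keep the sketch runnable

def select(population, scores):
    """Roulette-wheel selection proportional to fitness."""
    pick, acc = random.uniform(0.0, sum(scores)), 0.0
    for ind, s in zip(population, scores):
        acc += s
        if acc >= pick:
            return ind
    return population[-1]

def crossover(p1, p2, pc):
    """One-point crossover applied with probability pc."""
    if random.random() < pc:
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1, p2

def mutate(ind, pm):
    """Bit-flip mutation applied independently to each bit with probability pm."""
    return "".join(b if random.random() >= pm else ("1" if b == "0" else "0") for b in ind)

def run_ga(length=64, N=10, pc=0.75, pm=0.035, generations=200):
    pop = ["".join(random.choice("01") for _ in range(length)) for _ in range(N)]
    best = max(pop, key=fitness)                     # Step 1: keep the elite aside
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        children = []
        while len(children) < N:                     # Step 2: selection, crossover, mutation
            c1, c2 = crossover(select(pop, scores), select(pop, scores), pc)
            children += [mutate(c1, pm), mutate(c2, pm)]
        pop = children[:N]
        worst = min(range(N), key=lambda k: fitness(pop[k]))
        if fitness(pop[worst]) < fitness(best):      # Step 3: reinsert the elite if needed
            pop[worst] = best
        best = max(pop, key=fitness)                 # Step 4: update the elite
    return best
```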

Fig. 3. Some of the images of the Hannover dinosaur sequence [19] (36 images in total). From top to bottom: color images, binarized silhouettes and extracted smooth contour silhouettes.

V. EXPERIMENTAL RESULTS

In our experiments, we have used the Hannover dinosaur sequence shown in Fig. 3. The dinosaur sequence (36 images of 720x576 pixels) has been binarized by a segmentation algorithm. We also have a set of calibrated camera parameters at our disposal, which serves as ground truth for assessing how accurately the intrinsic parameters and the circular motion are recovered. In order to extract smooth contours from the silhouettes, we have used a GVF 2D snake [13].

A. Local maxima challenge

One of the critical points of the optimization procedure is the existence of local maxima: the silhouette coherence method suffers from non-convex energy maps. In order to visualize the energy shape of the silhouette coherence, we have computed some 2D slices of the energy shape for the camera motion problem in the case of real silhouettes (Dinosaur sequence). Fig. 4 plots the energy corresponding to the rotation axis $(\theta_a, \phi_a)$ for the silhouette coherence with the other camera parameters fixed. This figure shows several local maxima. This presents a serious problem for conventional maximization procedures, and the choice of starting point determines which maximum the procedure converges to.

We have tested derivative-based methods for the optimization of the silhouette coherence criterion. Although we cannot compute the analytic derivatives of the silhouette coherence criterion, we can estimate the derivatives numerically [20]. We have tested the conjugate gradient method from [21]. Based on the results reported in Table I, the derivative-based methods reach good convergence rates when the starting points are close enough to the optimum. However, if the starting points are far from the optimum, the derivative-based methods can converge erroneously.
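For reference, here is a minimal sketch (ours, not the paper's code) of such a derivative-based baseline. It assumes a hypothetical callable `total_coherence_of` mapping the parameter vector $(\theta_a, \phi_a, \alpha_t, \Delta\omega_1, \ldots, \Delta\omega_{n-1}, f)$ to the total coherence of Eq. (4), and uses SciPy's conjugate gradient in place of the routine of [21], with the gradient estimated by finite differences:

```python
# Illustrative derivative-based baseline: maximize the coherence by minimizing
# its negative with conjugate gradient and a numerically estimated gradient.
import numpy as np
from scipy.optimize import minimize

def refine_with_cg(total_coherence_of, x0):
    """total_coherence_of: hypothetical callable, parameter vector -> Eq. (4)."""
    cost = lambda x: -total_coherence_of(x)          # maximizing coherence = minimizing its negative
    result = minimize(cost, np.asarray(x0, dtype=float), method="CG",
                      options={"eps": 1e-3, "maxiter": 200})  # finite-difference step, iteration cap
    return result.x, -result.fun
```

As the energy map of Fig. 4 suggests, the quality of the returned maximum depends entirely on the starting point x0.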

B. Results and Performance of the proposed GA

In this test we have used the proposed GA to maximize the silhouette coherence of real images (Dinosaur sequence).

Fig. 4. Energy shape of the silhouette coherence for the rotation axis estimation of the Dinosaur sequence. The energy domain corresponds to $\theta_a \in [0, \pi]$, $\phi_a \in [0, 2\pi]$.

Fig. 5. Detail of the energy shapes and corresponding isocontours for the pair of parameters $(\theta_a, \phi_a)$.

TABLE I
CAMERA PARAMETERS ESTIMATED BY CONJUGATE GRADIENT

Parameters        θa (rotation)   φa (rotation)   αt (translation)   f (focal length)
Ground Truth           92.66           2.26            2.73               3217
Starting points        90.00           2.00            2.00               3100
Recovered by CG        92.47           2.12            2.89               3186
Starting points        60.00          10.00            2.00               3100
Recovered by CG        63.38          -4.56            2.34               3146
Starting points       120.00          20.00            2.00               3100
Recovered by CG       118.66           1.78            1.93               3165

Table II gives the parameter bounds we set for this test and the corresponding camera parameters estimated by the proposed GA (N = 10, pm = 0.75, pc = 0.035). Fig. 6 plots the convergence performance of the camera parameters as the GA's evolution proceeds. It shows that after approximately 80 generations, all estimated parameters converge simultaneously, in parallel, to a stable solution. Unlike conventional optimization approaches, the proposed GA can converge to a solution close to the optimal one without the need for good starting points.

We now examine how the focal length bounds affect the performance of our approach. The main difficulty is that very large variations of the focal length can produce only small variations of the silhouette coherence criterion. We keep the rotation axis angles from −π to π and the translation direction from −5 to 5, and then examine the change in performance when varying the bounds on the focal length.

Fig. 6. Convergence performance of the camera parameters; the y-axis represents scaled camera parameters.

TABLE II
CAMERA PARAMETERS ESTIMATED BY GA

Parameters         θa (rotation)   φa (rotation)   αt (translation)   f (focal length)
Ground Truth            92.66           2.26            2.73               3217
Parameter Bounds       [−π, π]         [0, π]          [−5, 5]            [3100, 3300]
Recovered by GA         92.40           2.27            2.77               3224

TABLE III
CAMERA PARAMETERS BY THE PROPOSED GA UNDER DIFFERENT FOCAL LENGTH BOUNDS

Focal length bounds   θa (rotation)   φa (rotation)   αt (translation)   f (focal length)
[3100, 3300]               92.40           2.27            2.77               3224
[3000, 3400]               89.99           2.53            2.86               3125
[2800, 3500]               93.75           2.04            2.56               3397
[2000, 4000]               93.83           2.18            2.50               3381

The focal length bounds and the corresponding camera parameters estimated by the proposed GA (N = 10, pm = 0.75, pc = 0.035) after 200 generations are summarized in Table III.

Table III indicates that expanding the focal length bounds affects the accuracy of the parameters: if we restrict the focal length to a reasonable range, the camera parameters are more accurate, as demonstrated in the first row.

VI. CONCLUSION

We presented a novel approach to the problem of silhouette-based camera calibration based on Genetic Algorithms. This method has been successfully tested for the case of circular motion. The calibration is performed by maximizing a cost function based on the coherence between silhouettes. Our experiments with real images demonstrate good performance. The proposed method can correctly find a near-optimal solution without the need for initial values.

ACKNOWLEDGMENT

The dinosaur images used here were provided by Wolfgang Niem at the University of Hannover. The ground truth data was provided by Dr. A. W. Fitzgibbon and Prof. A. Zisserman from the University of Oxford Robotics Research Group.

REFERENCES

[1] Tanuja Joshi, Narendra Ahuja, and Jean Ponce. Structure and motion estimation from dynamic silhouettes under perspective projection. In ICCV, pages 290-295, 1995.
[2] Paulo R. S. Mendonca, Kwan-Yee K. Wong, and Roberto Cipolla. Epipolar geometry from profiles under circular motion. IEEE Trans. Pattern Anal. Mach. Intell., 23(6):604-616, 2001.
[3] B. Vijayakumar, D. J. Kriegman, and J. Ponce. Structure and motion of curved 3D objects from monocular silhouettes. In CVPR '96: Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition, page 327, Washington, DC, USA, 1996. IEEE Computer Society.
[4] K. Y. K. Wong and R. Cipolla. Structure and motion from silhouettes. In ICCV, pages II: 217-222, 2001.
[5] Edmond Boyer. On using silhouettes for camera calibration. In ACCV (1), pages 1-10, 2006.
[6] Yasutaka Furukawa, Amit Sethi, Jean Ponce, and David Kriegman. Robust structure and motion from outlines of smooth curved surfaces. PAMI, 28(2):302-315, February 2006.
[7] Carlos Hernández, Francis Schmitt, and Roberto Cipolla. Silhouette coherence for camera calibration under circular motion. PAMI, 29(2):343-349, February 2007.
[8] K.-Y. K. Wong and R. Cipolla. Structure and motion from silhouettes. In 8th IEEE International Conference on Computer Vision, volume II, pages 217-222, Vancouver, Canada, 2001.
[9] A. El ouaazizi, M. Zaim and R. Benslimane. A Genetic Algorithm for Motion Estimation. International Journal of Computer Science and Network Security, Vol. 11, No. 4, April 2011.
[10] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, 1989.
[11] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer, Berlin, 1992.
[12] A. El ouaazizi, R. Ouremchi and R. Benslimane. Reconstruction of gray-level image by genetic algorithm. Proceedings of the 4th International Conference on Quality Control by Artificial Vision, Japan, 1998.
[13] C. Xu and J. L. Prince. Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing, pages 359-369, 1998.
[14] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, UK, 2000.
[15] A. W. Fitzgibbon, G. Cross, and A. Zisserman. Automatic 3D model construction for turn-table sequences. In Koch, R. and Van Gool, L., editors, 3D Structure from Multiple Images of Large-Scale Environments, LNCS 1506, pages 155-170. Springer-Verlag, 1998.
[16] W. N. Martin and J. K. Aggarwal. Volumetric descriptions of objects from multiple views. IEEE Trans. Pattern Analysis and Machine Intelligence, 5(2):150-158, 1983.
[17] R. Szeliski. Rapid octree construction from image sequences. Computer Vision, Graphics, and Image Processing: Image Understanding, 58(1):23-32, 1993.
[18] M. Powell. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Computer Journal, vol. 17, pp. 155-162, 1964.
[19] http://www.robots.ox.ac.uk/~vgg/data/data-mview.html
[20] A. Griewank. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics, 2000.
[21] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C (The Art of Scientific Computing). Cambridge University Press, 1992.

