
Underwater Systems Department A.G. Allais V. Brandou 07/02/2007 – DOP/CM/SM/PRAO/07.044

Project Exocet/D

Deliverable N° 2D4 Report on 3D reconstruction Diffusion : P.M. Sarradin DOP/CB/EEP/LEP J. Sarrazin DOP/CB/EEP/LEP M. Perrier DOP/CM/SM/PRAO

Confidential

Restricted

Public


THIS DOCUMENT, PROPERTY OF IFREMER, MAY NOT BE REPRODUCED OR COMMUNICATED WITHOUT ITS AUTHORIZATION

Date : 07/02/2007
Reference : DOP/CM/SM/PRAO/07.044
Analytic N° : E010403A1
Contract N° :
Number of pages : 24
Number of figures :
Number of annexes :
File name : 2D4.doc
Writers : A.G. Allais, V. Brandou

Subject/Title : Project Exocet/D

Deliverable N° 2D4 Report on 3D reconstruction

Abstract :

Key-words :

Revisions

Grade Object Date Written by Checked by Approved by

A Creation 07/02/07 A.G. Allais V. Brandou


TABLE OF CONTENTS

1. INTRODUCTION
2. IMAGE ACQUISITION STRATEGY
   2.1. Choice for the use of visual servoing
   2.2. Trajectories induced by the geometry of the stereo rig
   2.3. Visual servoing approach
   2.4. Simulation
   2.5. Experimentations
   2.6. Pre-programmed trajectories
3. 3D RECONSTRUCTION
   3.1. Sea trials and image acquisition
      3.1.1. Calibration
      3.1.2. Image acquisition
   3.2. 3D reconstruction
      3.2.1. Relating images
         3.2.1.1. Feature extraction and matching
         3.2.1.2. Removing outliers
      3.2.2. Camera calibration
      3.2.3. Structure and motion recovery
         3.2.3.1. Triangulation
         3.2.3.2. Bundle adjustment
      3.2.4. Dense surface estimation
         3.2.4.1. Rectification
         3.2.4.2. Dense stereo correspondence
         3.2.4.3. Dense triangulation
      3.2.5. Visual scene representation
         3.2.5.1. Triangular mesh
         3.2.5.2. Texture mapping
         3.2.5.3. VRML format
4. CONCLUSION
5. REFERENCES


1. INTRODUCTION

This document presents the methodology developed to generate the 3D reconstruction of small-scale underwater natural scenes. The first part of this report focuses on an innovative methodology to acquire the images, based on stereovision techniques and visual servoing. The second part describes the 3D reconstruction method that we used and presents the results obtained using the images of natural scenes collected during the MoMARETO cruise with the stereovision system IRIS [ALL05] developed in the framework of the EXOCET/D project.

2. IMAGE ACQUISITION STRATEGY

2.1. Choice for the use of visual servoing

Our goal is to develop a methodology based on a vision system to obtain quantitative measurements through a 3D reconstruction of underwater structures. Consequently, taking advantage of the possibilities offered by underwater vehicles such as the victor6000 ROV, we carried out research related to the use of visual servoing techniques to improve the 3D reconstruction. The images used for the reconstruction are collected when the vehicle is deployed on the seafloor at a fixed and stable attitude. The camera is hung from the 6 DOF manipulator arm mounted on the vehicle (Figure 1).

Figure 1: victor6000 on the seabed scanning a 3D object

The images are subject to several constraints linked to the underwater environment. First of all, the observed scenes are unknown, and the objects to be reconstructed in these scenes have random textures and shapes. We just know that the objects are rigid and have a vertical overall shape. Moreover, refraction, the presence of particles, absorption and lighting problems in an underwater environment considerably alter the image quality. Given noisy images and an unknown model, it is very difficult to obtain an accurate 3D reconstruction. The idea is thus to reduce the number of unknown variables in the reconstruction computation. Considering fewer unknown variables in the optimisation process produces a faster algorithm which is more suitable for time-dependent applications as in robotics [BEN04]. In order to get more information, we acquire images at regular spatial intervals, following a predefined trajectory. As a result, visual servoing is performed with a pair of stereo cameras mounted on the manipulator arm effector. We show hereafter that the stereo rig geometry induces the trajectory and the number of images collected around the object to be reconstructed. In our case, the stereovision system IRIS is equipped with two


different underwater cameras: the first one is fixed while the other one is mounted on a pan & tilt unit. So, the intrinsic parameters are not the same for the two cameras. Moreover, they are influenced by the characteristics of the underwater environment, such as the optical index, which varies as a function of temperature, salinity, pressure and wavelength [HIL62], [PES04].

The whole calibration process (image acquisition and processing) is required for the 3D metric reconstruction, but it is time-consuming. In order to save time on the seabed, we chose to acquire during the dive only the images required for the calibration, which is performed in an off-line processing stage. Thus the major part of the time at the bottom is dedicated to acquiring the images of the scene to be reconstructed. Consequently, we implemented a visual servoing method which does not rely on the result of the system calibration, i.e. which is invariant to the camera intrinsic parameters [MAL02]. The visual servoing scheme proposed in this report has been completely validated by experiments performed on two different robots.

2.2. Trajectories induced by the geometry of the stereo rig

Visual control is carried out with a stereovision system hung from the tip of an instrumented robot arm (eye-in-hand robotic system). It consists in capturing a reference image with the right camera at a given position, and then converging towards this position with the left camera (see Figure 2). We have shown that setting the geometry between both cameras amounts to moving the coordinate system associated to each camera on the surface of a cylinder. The demonstration is based on the integration of infinitesimal displacements to obtain a discrete displacement.

Let T be the transformation matrix between the positions of the two cameras (see Figure 3):

T = [ R  t ; 0  1 ],  T ∈ SE(3)  (1)

where R is the (3×3) rotation matrix and t is the (3×1) translation vector. The rotation matrix can be written as a function of α:

R(α) = e^[r(α)]×  (2)

where r is the rotation vector corresponding to a rotation by an angle α ∈ ℝ about a fixed axis specified by the vector δ:

r(α) = α δ/‖δ‖  (3)

with:

δ = (δ_tilt, δ_pan, 0)ᵀ  (4)

Similarly, the translation vector depends on α and l:


t(α, l) = e^[r(α)]× (−l, 0, 0)ᵀ  (5)

T can always be written as:

T = e^A,  A ∈ se(3)  (6)

with:

A = [ [ω]×  ν ; 0  0 ]  (7)

where ω is the (3×1) rotation speed vector, such that ‖ω‖ < π, ν is the (3×1) translation speed vector, and [ω]× is the (3×3) skew-symmetric matrix associated to the vector ω.

An infinitesimal displacement of the frame origin O, which coincides with the centre of projection C, along the trajectory according to time t, normalized between 0 and 1, is defined by:

Ċ(t) = [ω]× C(t) + ν  (8)

It can be shown that the integral of this equation can be written in the general form of the parametric equations of a cylinder:

C(t) = r₀ + u cos θ + v sin θ + w t  (9)

with:

θ = ‖ω‖ t,  r₀ = γ/‖ω‖²,  u = −γ/‖ω‖²,  v = −ζ/‖ω‖³,  w = ν + ζ/‖ω‖²

where:

γ = ω × ν = (ω_y ν_z − ω_z ν_y, ω_z ν_x − ω_x ν_z, ω_x ν_y − ω_y ν_x)ᵀ
ζ = ω × (ω × ν) = (ωᵀν) ω − ‖ω‖² ν

The three vectors u, v and w are mutually orthogonal and form a direct trihedron. Together with r₀, they characterize the cylinder which defines the trajectory. Since u, v, w and r₀ depend on the various components of ω and ν, which themselves depend on the transformation matrix T between the camera positions (see Equations 1, 6, 7), we can say that the stereo rig geometry induces the trajectory.

Besides, point r₀ is located on the cylinder axis. Therefore, if we set the geometry of the stereo rig, we can determine the distance to be applied between the cameras and the object under study.
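As an illustrative check (not part of the report's software), the cylinder parameters of Equation (9) can be computed from an arbitrary twist (ω, ν) and verified numerically against the differential equation (8); the numeric values below are made up:

```python
import numpy as np

def skew(w):
    # 3x3 skew-symmetric matrix [w]_x such that skew(w) @ a = w x a
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def cylinder_params(w, v):
    """Cylinder parameters of Eq. (9) from the twist (w, v)."""
    n2 = w @ w                    # ||w||^2
    gamma = np.cross(w, v)        # gamma = w x v
    zeta = np.cross(w, gamma)     # zeta = w x (w x v)
    r0 = gamma / n2               # point on the cylinder axis
    u = -gamma / n2               # cosine direction
    vv = -zeta / n2 ** 1.5        # sine direction
    wt = v + zeta / n2            # drift along the axis (projection of v on w)
    return r0, u, vv, wt

def C(t, w, v):
    # Parametric trajectory C(t) = r0 + u cos(theta) + v sin(theta) + w t
    r0, u, vv, wt = cylinder_params(w, v)
    th = np.linalg.norm(w) * t    # theta = ||w|| t
    return r0 + u * np.cos(th) + vv * np.sin(th) + wt * t

w = np.array([0.1, 0.3, 0.5])     # made-up rotation speed vector
v = np.array([0.4, -0.2, 0.1])    # made-up translation speed vector

# Check that C(t) solves Eq. (8): C'(t) = [w]_x C(t) + v
h = 1e-6
for t in (0.0, 0.25, 0.7):
    dC = (C(t + h, w, v) - C(t - h, w, v)) / (2 * h)
    assert np.allclose(dC, skew(w) @ C(t, w, v) + v, atol=1e-5)
print("Eq. (9) satisfies Eq. (8)")
```

The finite-difference check confirms that the parametric curve is an integral of the infinitesimal displacement, and that C(0) coincides with the frame origin since r₀ + u = 0.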


Hence, it is possible to choose the geometry of the stereo rig with respect to the shape, the volume, and the orientation of the underwater object to be reconstructed. For example, with a fixed geometry of the stereo rig, the trajectory corresponds to a straight line in the case of parallel cameras (no angle), a circle if we apply a pan angle, or a helix in the case of a pan and tilt angle. For instance (see Figure 2), a pan angle α and a distance l between the cameras force the stereo rig to describe a circular trajectory. We can also compose more complex trajectories if the geometry of the stereo rig is changed during the visual servoing.

Figure 2: Trajectory induced by the stereo rig geometry

– – – Trajectory of the cameras, –––– Object
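The circular case can be made concrete by chaining the rig transformation T in a few lines of numpy; the 0.3 m baseline and 18-degree pan below match the simulation values of section 2.4, but the code itself is only an illustrative sketch:

```python
import numpy as np

def rot_y(a):
    # rotation of angle a about the y (pan) axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

alpha = np.deg2rad(18.0)   # pan angle of the stereo rig
l = 0.3                    # baseline between the cameras (m)

# Fixed rigid transformation between the two camera frames (Eq. 1), with the
# translation induced by the pan angle and the baseline (Eq. 5)
T = np.eye(4)
T[:3, :3] = rot_y(alpha)
T[:3, 3] = rot_y(alpha) @ np.array([-l, 0.0, 0.0])

# Each servoing step moves the controlled camera to the reference camera
# position, i.e. chains one more T: 20 steps of 18 degrees close a full circle.
P = np.eye(4)
centers = []
for _ in range(20):
    centers.append(P[:3, 3].copy())
    P = P @ T
centers = np.array(centers)

# All camera centres are equidistant from one point: a circular trajectory
c0 = centers.mean(axis=0)               # exact centre for a closed circle
radii = np.linalg.norm(centers - c0, axis=1)
assert np.allclose(radii, radii[0])
assert np.allclose(centers[:, 1], 0.0)  # the circle lies in the pan plane
print(f"circle of radius {radii[0]:.3f} m")
```

The resulting radius, l / (2 sin(α/2)), is the distance from the cameras to the cylinder axis, i.e. to the object under study.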

2.3. Visual servoing approach

Figure 3: Perspective projections of a 3D point X on the two stereo rig cameras


The aim of visual servoing is to control the movement of the robot's end-effector using the information provided by vision sensors. A typical task consists in repositioning an “eye-in-hand” system with respect to an observed object [MAL01]. Most visual servoing techniques are based on a “teaching-by-showing” approach [HAS93], [HUT96]. In our case, given a reference image I* taken by the right camera at the reference position F*, the goal is to reach the same position with the left camera.

Let us consider the pinhole camera model to perform the perspective projection of a 3D point X = (X, Y, Z, 1) ∈ P³ to a virtual point m = (x, y, 1) ∈ P² in the frame F from the centre of projection C (see Figure 3). The relationship is defined by:

ζ m = [R t] X  (10)

where ζ is the positive depth, and R and t are respectively the rotation and the translation between frames F* and F which set the stereo rig geometry. Point m gives the corresponding point p = (u, v, 1) measured in pixels in image I:

p = K m  (11)

where K is the intrinsic parameters matrix of the left camera:

⎟⎟⎟

⎜⎜⎜

⎛=

1000 0

0

vfrusf

K (12)

where and are the coordinates of the principal point (in pixels), f is the focal length (in meters), s is the skew and r is the aspect ratio. In the same way, points p* are obtained in frame F*, but with different intrinsic parameters if the cameras are not the same.

0u 0v
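As a quick numerical illustration of Equations (10) and (11) (all camera values below are made up, not those of IRIS):

```python
import numpy as np

# Pinhole projection (Eqs. 10-11): zeta * m = [R t] X, then p = K m
X = np.array([0.2, -0.1, 1.5, 1.0])           # 3D point, homogeneous
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])  # example stereo rig geometry
K = np.array([[700.0, 0.0, 320.0],            # example intrinsic parameters
              [0.0, 710.0, 240.0],
              [0.0, 0.0, 1.0]])

zm = np.hstack([R, t[:, None]]) @ X   # zeta * m
m = zm / zm[2]                        # normalized point (x, y, 1), zeta > 0
p = K @ m                             # pixel point (u, v, 1)
print(p[:2])
```

The same projection with the intrinsic matrix of the other camera yields different pixel coordinates for the same pose, which is precisely the problem addressed by the invariant-space servoing described next.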

Generally, using “camera-dependent” visual servoing methods [MAL04] implies that the same camera is used to learn the reference image and to perform the visual servoing. If the cameras are different, the intrinsic parameters of these cameras must be precisely identified. Otherwise, the current image I converges towards the reference image I*, but with different camera positions. Note that the intrinsic parameters may vary significantly during the life of a vision system, and they can be changed intentionally when using zooming cameras [MAL01]. We saw in section 2.1 that it is difficult to calibrate cameras under underwater experimental conditions. So, we have used a visual servoing method that allows us to compute an error function invariant to the camera intrinsic parameters K [MAL02].

Consider n non-collinear 3D points of the observed object. These points are projected respectively in frames F* and F to give points pᵢ* = (uᵢ*, vᵢ*, 1) and pᵢ = (uᵢ, vᵢ, 1). The latter are then projected in spaces Q* and Q invariant to intrinsic parameters. To achieve these projections, we need to compute, for all i ∈ {1, 2, …, n}, the following (3×3) symmetric matrix:

S_p = (1/n) Σᵢ₌₁ⁿ pᵢ pᵢᵀ  (13)

Deliverable N° 2D4 Report on 3D reconstruction

DOP/CM/SM/PRAO/07.044

Grade : A 07/02/2007

Page 9: Deliverable N° 2D4 Report on 3D reconstruction · 2007. 10. 22. · Deliverable N° 2D4 Report on 3D reconstruction DOP/CM/SM/PRAO/07.044 Grade : A 07/02/2007 . Project Exocet/D

Project Exocet/D page 9/24

If the observed points are not collinear and n > 3, matrix S_p is symmetric positive definite and can be written, using a Cholesky decomposition:

S_p = T_p T_pᵀ  (14)

where T_p is a (3×3) non-singular upper triangular matrix, which allows us to compute the points qᵢ ∈ Q:

qᵢ = T_p⁻¹ pᵢ = (aᵢ, bᵢ, 1)  (15)
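The invariance property behind Equations (13)-(15) can be checked numerically: because K is upper triangular with positive diagonal, the upper-triangular Cholesky factor of S_p absorbs it, and the points q do not depend on the intrinsic parameters. A minimal sketch with synthetic points and made-up intrinsics (numpy's `cholesky` is lower-triangular, so a flip converts it to the upper-triangular factor used here):

```python
import numpy as np

def upper_cholesky(S):
    # Factor S = T T^T with T upper triangular (Eq. 14).
    # numpy's cholesky is lower (S = L L^T); flipping rows/columns converts it.
    J = np.eye(S.shape[0])[::-1]
    L = np.linalg.cholesky(J @ S @ J)
    return J @ L @ J

def invariant_points(p):
    # p: (n, 3) homogeneous pixel points. Returns the q_i of Eq. (15).
    Sp = (p.T @ p) / len(p)              # Eq. (13)
    Tp = upper_cholesky(Sp)
    return np.linalg.solve(Tp, p.T).T    # q_i = Tp^-1 p_i

rng = np.random.default_rng(0)
# n > 3 non-collinear normalized image points m_i = (x, y, 1)
m = np.column_stack([rng.uniform(-0.5, 0.5, size=(8, 2)), np.ones(8)])

# Two cameras with different intrinsic parameter matrices K (Eq. 12)
K1 = np.array([[800.0, 2.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[500.0, 0.0, 256.0], [0.0, 510.0, 200.0], [0.0, 0.0, 1.0]])

q1 = invariant_points(m @ K1.T)   # pixel points p = K1 m
q2 = invariant_points(m @ K2.T)   # pixel points p = K2 m

# Same q whatever the intrinsics: the error s - s* compares camera positions
assert np.allclose(q1, q2)
print("projection in Q is invariant to K")
```

Since S_p = K S_m Kᵀ and the product of upper-triangular matrices is upper triangular, the unique Cholesky factor is T_p = K T_m, so T_p⁻¹ p = T_m⁻¹ m regardless of K.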

In the same way, points qᵢ* are computed from points pᵢ*. These new points are then stacked into two vectors: s* = (q₁*, q₂*, …, q_n*) and s = (q₁, q₂, …, q_n). The camera has converged to the reference position when s = s*. The derivative of the vector s is written:

ṡ = L v  (16)

where L is the (3n×6) interaction matrix, and the (6×1) vector v represents the Cartesian speed of the camera.

The task function is:

e = L̂⁺ (s − s*)  (17)

where L̂⁺ is an approximation of the pseudo-inverse of L. In order to control the movement of the camera with an exponential convergence of the task function, the control law is:

v = −λ e, with λ > 0.  (18)

This method has been tested with a stereovision system, both by simulation and by experimentation, in order to generate large trajectories by repeating visual servoing.
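Under the simplifying assumption of a constant, perfectly known interaction matrix (in practice L depends on the camera pose and is only approximated), the exponential convergence of the control law (16)-(18) can be sketched as follows; all numeric values are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                # number of feature points
L = rng.normal(size=(3 * n, 6))      # interaction matrix (assumed constant)
L_pinv = np.linalg.pinv(L)           # approximation of L^+ used in Eq. (17)

s_star = rng.normal(size=3 * n)      # reference feature vector s*
s = s_star + L @ rng.normal(size=6)  # start from a reachable offset
lam, dt = 1.0, 0.05                  # gain lambda and integration step

errs = []
for _ in range(200):
    e = L_pinv @ (s - s_star)        # task function, Eq. (17)
    v = -lam * e                     # control law, Eq. (18)
    s = s + L @ v * dt               # feature motion model, Eq. (16)
    errs.append(np.linalg.norm(e))

# Exponential convergence: ||e|| decays by a factor (1 - lam*dt) per step
assert errs[-1] < 1e-3 * errs[0]
print("task function converged")
```

With this idealized model the error contracts by exactly (1 − λ·dt) at each iteration, which mirrors the exponential decay of the task function observed in Figures 8 and 9.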

2.4. Simulation

The simulations allowed us to test the control law and the complete visual servoing chain. We took into account the real dimensions and parameters of the cameras (sensor dimension, pixel size, and focal length) and of the robotic arm (axis dimensions, joint limits, maximum joint velocities). The simulation of the arm makes it possible to determine which trajectories can be performed within the robot workspace.


(a) Side view

(b) Top view

Figure 4: Simulation of trajectories

Figure 4 represents a simulated trajectory of the stereo rig hung from the robotic arm of the underwater vehicle. In Figure 4(a) and Figure 4(b), camera Cam1 is mounted on pan and tilt and is used to acquire reference images, whereas camera Cam2 is fixed and is controlled during visual servoing. Both cameras have different intrinsic parameters. In these figures, the blue scatter plot stands for the 3D object, whose shape is similar to that of an underwater hydrothermal vent.

The camera trajectories, hence the stereo rig geometry, are chosen with respect to the size and the shape of the object under study. In this example, the trajectory is induced by a geometry of 0.3 meter between the cameras and a pan of 18 degrees. Figure 4(a) and Figure 4(b) show respectively the side view and the top view of the trajectory used for the


simulation. The trajectory is defined by the successive positions of the reference camera, shown as green points. It is composed of three quarter-circles in order to cover the whole height of the object. To ensure the continuity of the trajectory, the stereo rig makes a U-turn at the end of each quarter-circle. We can see that the arm cannot be manoeuvred all around the object if the vehicle is at a fixed attitude. Therefore, to make the 3D reconstruction of a whole object, the underwater vehicle has to be landed at several positions.

Figure 5: Projection of the scatter plot seen by the two cameras (red) in the invariant space (green)

The upper part of Figure 5 represents the scatter plot (in red) seen by both cameras at the end of visual servoing, once the controlled camera has converged towards the reference position. The picture on the right is image I* taken by the reference camera at the beginning of visual servoing, whereas the picture on the left is image I taken by the controlled camera at the end of visual servoing. We can see that, because the intrinsic parameters are not the same for the two cameras, points p do not match points p*.

On the contrary, in the lower part of Figure 5, the green scatter plots represent points q and q*, which are the projections of points p ∈ I and p* ∈ I* in the spaces Q and Q* invariant to intrinsic parameters. So, these intrinsics-free spaces are perfectly adapted to quantify the convergence of the camera positions. These points are then used to compute the control law v applied to the controlled camera.


Different trajectories adapted to various object shapes (cylinder, sphere, several planes, …) have been simulated, adding noise to the point coordinates. The final aim of the simulation is to improve the robustness of the control law and to observe the robot behaviour in order to prepare the experiments, and thus to prevent the real arm from damage.

2.5. Experimentations

Figure 6: Experimental conditions

(a) (b)

Figure 7: Visual servoing (a) End of 1st visual servoing (b) End of 2nd visual servoing


(a) (b)

Figure 8: Task function of the 1st servoing (a) Task function eν (b) Task function eω

(a) (b)

Figure 9: Task function of the 2nd servoing (a) Task function eν (b) Task function eω

The experiments have been carried out on two robots, each equipped with a 6 DOF arm. The first one (see Figure 6(a)) is used in the laboratory to test the visual servoing loop under optimum conditions, while the second one makes it possible to validate the chain under real underwater conditions. Figure 6(b) represents an experiment carried out in the laboratory with a baseline of 20 cm between the cameras, and a right camera angle of 15 degrees. With this geometry, the trajectory described by the stereo rig around the object is a circle. The reference camera is the right one, and the left one is controlled during visual servoing. Both cameras have different intrinsic parameters. The 3D target, located at approximately 1.3 m from the cameras, is composed of three planes with different normal vectors (see Figure 6(b)).

In order to compute the robot control law, features have to be extracted from images. We have chosen a method based on robust point extraction, the SIFT algorithm [LOW04]. The keypoints extracted from the reference and the current images are matched. A RANSAC algorithm [FIS81] is applied to remove false matches. We then obtain points p* (see 2.3) in the reference image, and points p matched in the current image of the controlled camera.

Points p are tracked in the current image until the left camera reaches the initial right camera position. The ESM tracking algorithm [BEN04] has been chosen for these first experiments. But this method only allows the tracking of planar targets; therefore, in the first experiments, the 3D target is made up of several planes. As soon as a new image is acquired, points are tracked and the command is computed and applied to the robot. To generate the command, the


coordinates of points p* and p are projected in the invariant spaces Q* and Q, to obtain respectively points q* and q, invariant to intrinsic parameters. At each iteration, points q are computed with the new coordinates of points p.

When points q* and q are superposed, the task function e has converged exponentially towards zero (Figure 8 and Figure 9). This is also illustrated in Figure 7, which represents the end of two successive displacements of the stereo rig, when the left camera position has reached the reference camera position. The starting position is represented in transparency, in order to show the superposition of the left camera on the reference position. Finally, we can see in Figure 8 and Figure 9 that the control law is stable in spite of noise and calibration errors.

So, we can see that the first laboratory experiments have confirmed and validated the results obtained by simulation.

2.6. Pre-programmed trajectories

In order to generate a trajectory and make the MAESTRO arm follow it, an alternative to visual servoing consists in computing by simulation each position of the arm corresponding to a desired image acquisition position. This pre-programmed trajectory is then loaded and executed.

The advantage of this method is that it can be performed automatically, with no operator. A major drawback, however, is that all the positions have to be recalculated if the trajectory is to be changed. With visual servoing, on the other hand, the same result is obtained by simply changing the geometric configuration of the stereovision head. Besides, the accuracy of the trajectory depends on the precision and the adjustment of the manipulator arm. Therefore, this method is suited to specific cases, in particular for comparing 3D reconstruction results.

3. 3D RECONSTRUCTION

The 3D metric reconstruction is performed using the images of natural scenes collected during the MoMARETO cruise [SAR06], which was held from August 6 to September 6, 2006 on the French RV Pourquoi Pas? with the victor6000 ROV in the MOMAR area located on the Azores Triple Junction.

3.1. Sea trials and image acquisition

Our stereovision system IRIS (Figure 10) was tested during the first leg of the MoMARETO cruise (2006 August 6-17); two dives were partly dedicated to IRIS validation.


(a) (b)

Figure 10: Stereovision system IRIS (a) and IRIS hung from the tip of the robotic arm MAESTRO of victor6000 (b)

The first trials were conducted on the 850 m deep Menez Gwen area during dive 287-4, whereas the second experiments were carried out during dive 289-6 on the Lucky Strike hydrothermal vent field, at a depth of 1750 m (Figure 11). IRIS thus benefited from two 3 to 4 hour testing periods.

Figure 11: Location of IRIS deployment on Lucky Strike area

3.1.1. Calibration

To make a 3D metric reconstruction, the stereovision system needs to be calibrated; the extrinsic (baseline length and angle between both cameras) and intrinsic (focal length, principal point coordinates, skew coefficient) parameters must be known accurately. Since some of the parameters depend on the medium in which they are estimated, an in situ calibration must be performed. So, once an area had been found where victor6000 could land at a fixed and stable attitude and deploy its arm in front of an object, the calibration stage could begin. It consists in deploying a calibration pattern on the seafloor, then hanging IRIS from the tip of the robotic arm and acquiring a series of image pairs from different viewpoints (Figure 12). The collected images allow us to determine the parameters required for the 3D reconstruction in an off-line processing stage.


Figure 12: In situ calibration of IRIS using a calibration pattern

3.1.2. Image acquisition

As we said before, the image acquisition strategy is of great importance for the 3D reconstruction. Our approach was to generate a trajectory around the object that has to be reconstructed so that the images are acquired at regular intervals (Figure 13). In that way, the different positions of the cameras are known and this knowledge can be used to obtain a more accurate 3D reconstruction.

Figure 13: Representation of the different camera positions that define the trajectory around the object

During the MoMARETO cruise, we generated the trajectories in two different ways. The first one corresponds to the visual servoing method detailed in section 2.3, while the other one consists in programming the trajectory on the robotic arm (see §2.6). The second method is more cumbersome to put into operation, but it is required in order to compare both methods in terms of 3D reconstruction accuracy. This comparison has not been performed yet, and the results of 3D reconstruction presented in the following section have been obtained using the images collected with the pre-programmed trajectories.


3.2. 3D reconstruction

In this section, we present the first 3D reconstruction results using the natural underwater images collected during the MoMARETO cruise. The 3D reconstruction is performed off-line, since the whole process is very time-consuming. It is composed of several steps, which are summarized in Figure 14: features are extracted and matched in the image sequence; the triangulation of the matched points is then worked out using the calibration results, in order to get the corresponding 3D points and obtain a 3D metric reconstruction of the object.

Figure 14: System overview

All the steps of the method are detailed hereafter. Though the method set out in this section is applicable to a set of images, the results presented hereafter are obtained using just one stereo image pair (Figure 15).

Figure 15: Stereo image pair


3.2.1. Relating images

3.2.1.1.Feature extraction and matching

As previously explained, for each position of the vehicle on the seabed, the images are acquired according to a regular spatial distribution around the object (21 images maximum during the sea trials). Although the geometry linking these views is roughly known, since it corresponds to the geometry of the stereovision rig, a preliminary step consists in extracting and matching features in two views in order to recover the accurate geometry linking them.

Point features are extracted in the two images using the SIFT method [LOW04]. These features are invariant to image scaling and rotation, and partially invariant to changes in illumination and 3D camera viewpoint. These properties are particularly appropriate in our case. For image matching and recognition, SIFT features are first extracted from the images (Figure 16) and their descriptor vectors are stored in a database. Points are then matched by comparing their descriptor vectors in this database.

Figure 16: Feature extraction
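The matching step above can be sketched as follows. This is a minimal numpy illustration of nearest-neighbour descriptor matching with Lowe's ratio test [LOW04], run on synthetic descriptors; the function name, the ratio value and the toy data are ours and not taken from the actual processing chain.

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test [LOW04].

    desc1: (N, D) descriptors, desc2: (M, D) descriptors.
    Returns a list of (i, j) index pairs of accepted matches."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep a match only if it is clearly better than the runner-up.
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy data: desc2 is a shuffled, slightly noisy copy of desc1, so every
# descriptor has exactly one correct counterpart.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(10, 128))
perm = rng.permutation(10)
desc2 = desc1[perm] + 0.01 * rng.normal(size=(10, 128))
matches = match_ratio_test(desc1, desc2)
```

The ratio test rejects ambiguous matches whose nearest and second-nearest neighbours are at similar distances, which removes many false matches before the epipolar-geometry filtering described next.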

3.2.1.2.Removing outliers

A few false matches remain after feature extraction and matching. To cope with this problem, the epipolar geometry linking the two images (Figure 17) introduces additional constraints that can be used to remove these outliers.

Figure 17: The epipolar geometry (camera centres c1 and c2, epipoles e1 and e2, epipolar lines Ep1 and Ep2, 3D point X projected to x and x')


The epipolar geometry corresponds to the intrinsic projective geometry between two views and is represented by the fundamental matrix. Given only a set of point matches, the latter can be estimated using the normalized 8-point algorithm [HAR00]. This estimation is combined with the RANSAC algorithm, a very robust estimator capable of coping with a large proportion of outliers [FIS81]. The estimation of the fundamental matrix thus makes it possible to remove the outliers and, at the same time, to compute the epipolar geometry, which will be used later to perform dense matching after image rectification.
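The normalized 8-point algorithm can be sketched as follows. This is a minimal numpy illustration checked on synthetic correspondences; the RANSAC loop that wraps it in practice, and all camera values below, are our own toy assumptions, not the report's actual implementation.

```python
import numpy as np

def normalize(pts):
    """Translate points to their centroid and scale so the mean distance
    from the origin is sqrt(2), as recommended in [HAR00]."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1.0]])
    h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ h.T).T, T

def eight_point(x1, x2):
    """Normalized 8-point estimate of F from N >= 8 matches (N x 2 arrays)."""
    n1, T1 = normalize(x1)
    n2, T2 = normalize(x2)
    # Each match contributes one row of the linear system A f = 0.
    A = np.column_stack([
        n2[:, 0] * n1[:, 0], n2[:, 0] * n1[:, 1], n2[:, 0],
        n2[:, 1] * n1[:, 0], n2[:, 1] * n1[:, 1], n2[:, 1],
        n1[:, 0], n1[:, 1], np.ones(len(n1))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1                      # undo the normalization
    return F / np.linalg.norm(F)

# Synthetic check: two views of random (non-coplanar) 3D points.
rng = np.random.default_rng(1)
X = np.hstack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, (20, 1))])
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[0.3], [0.0], [0.0]])])
Xh = np.hstack([X, np.ones((20, 1))])
x1 = (P1 @ Xh.T).T
x1 = x1[:, :2] / x1[:, 2:]
x2 = (P2 @ Xh.T).T
x2 = x2[:, :2] / x2[:, 2:]
F = eight_point(x1, x2)
h1 = np.hstack([x1, np.ones((20, 1))])
h2 = np.hstack([x2, np.ones((20, 1))])
resid = np.abs(np.sum(h2 * (F @ h1.T).T, axis=1))   # x2^T F x1 per match
```

In the outlier-rejection stage, a match whose epipolar residual exceeds a threshold under the RANSAC-estimated F would be discarded.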

3.2.2. Camera calibration

Although the visual servoing method described before does not require knowledge of the camera intrinsic parameters, the latter have to be estimated, as well as the extrinsic parameters corresponding to the geometry of the stereo rig, in order to work out the 3D metric reconstruction. These parameters are computed off-line from the image pairs of the calibration pattern using the camera calibration toolbox developed by Jean-Yves Bouguet (Figure 18).

Figure 18: Results of IRIS calibration (29.97 cm, 17.41°)

Once the extrinsic parameters are estimated, the theoretical spatial distribution of the images around the object is known. Indeed, visual servoing allows us to acquire images at regular intervals corresponding to the geometry of the stereo rig; when visual servoing is not used, the arm trajectory is programmed so that images are acquired at the same regular intervals.

3.2.3. Structure and motion recovery

3.2.3.1.Triangulation

Given the geometry of the stereo rig, the internal camera parameters and the set of point matches in the left and right images, a basic triangulation is computed to obtain the 3D locations of the points. Although this triangulation is quite rough, it provides a first estimate which is used to initialise the 3D structure for the bundle adjustment algorithm.
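The basic triangulation step can be sketched with the standard linear (DLT) method; the camera matrices and point below are toy values of ours, chosen only to make the sketch self-checking.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single match.

    P1, P2: 3x4 camera matrices; x1, x2: matched (u, v) pixel positions.
    Returns the 3D point minimizing the algebraic error."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]          # null vector of A
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy stereo rig: left camera at the origin, right camera offset along x.
K = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.3], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With exact matches the DLT recovers the point exactly; with noisy matches it only minimizes an algebraic error, which is why the result is later refined by bundle adjustment.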

3.2.3.2.Bundle adjustment

A basic triangulation of the projected point matches is not sufficient, since the extrinsic parameters of the stereo rig provide only a rough estimate of the spatial distribution of the shots. In the case of visual servoing, noise in the input images induces uncertainties that shift the camera positions away from their theoretical positions. In the case of a programmed arm trajectory, the accuracy depends on the precision and adjustment of the manipulator arm. It is thus necessary to take these uncertainties into account and to correct them in order to improve the quality of the final reconstruction result.

Therefore, to obtain precise 3D measurements, we use a minimization algorithm, namely sparse bundle adjustment [TRIG00], initialised with the theoretical camera positions, the intrinsic camera parameters, and the 3D structure computed by triangulation.

Bundle adjustment computes the best possible fit, correcting the relative camera poses of all views and the corresponding 3D features [KOC05]. Most approaches first compute a projective reconstruction and later refine it into a metric one, but in our case we solve directly for the 3D metric structure and the camera parameters.
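The principle of the minimization can be illustrated with a deliberately reduced sketch: a Gauss-Newton refinement of a single 3D point with the cameras held fixed. A real sparse bundle adjustment [TRIG00] refines all points and all camera poses jointly; everything below (names, camera values, the perturbation) is our own toy assumption.

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a 3D point X by a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def refine_point(X0, cams, obs, iters=10):
    """Gauss-Newton refinement of one 3D point, cameras held fixed.

    cams: list of 3x4 camera matrices; obs: observed (u, v) pixels."""
    X = np.asarray(X0, dtype=float).copy()
    eps = 1e-6
    for _ in range(iters):
        r = np.concatenate([project(P, X) - o for P, o in zip(cams, obs)])
        # Forward-difference Jacobian of the reprojection residual w.r.t. X.
        J = np.zeros((r.size, 3))
        for k in range(3):
            Xp = X.copy()
            Xp[k] += eps
            rp = np.concatenate([project(P, Xp) - o
                                 for P, o in zip(cams, obs)])
            J[:, k] = (rp - r) / eps
        X -= np.linalg.lstsq(J, r, rcond=None)[0]   # Gauss-Newton step
    return X

# Toy stereo rig and exact observations of one 3D point.
K = np.array([[700, 0, 320], [0, 700, 240], [0, 0, 1.0]])
cams = [K @ np.hstack([np.eye(3), np.array([[b], [0.0], [0.0]])])
        for b in (0.0, -0.3)]
X_true = np.array([0.1, 0.05, 2.0])
obs = [project(P, X_true) for P in cams]
# Start from a perturbed estimate, as after a rough triangulation.
X_ref = refine_point(X_true + np.array([0.05, -0.05, 0.2]), cams, obs)
```

The full problem minimizes the same reprojection error summed over all points and views, exploiting the sparsity of the Jacobian for efficiency.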

Figure 19 shows the results of the bundle adjustment on a pair of images. This sparse set of points gives the outline of the object shape, even if there is not sufficient surface detail for a good visual reconstruction.

Figure 19: Structure recovery

3.2.4. Dense surface estimation

In order to obtain a highly realistic 3D reconstruction, the 3D structure obtained in the previous section must be improved by a dense depth estimation [KOC99]. This step is composed of two main parts, explained hereafter, which have not yet been finalized in our global processing chain; the final 3D reconstruction results presented in this report therefore do not include this processing stage.

3.2.4.1.Rectification

The great advantage of the rectification step is that it simplifies the dense matching procedure. It consists in transforming both images (see Figure 17) into a standard geometry in which both image planes are coplanar and the epipoles are mapped to infinity, so that all epipolar lines become parallel and horizontal. The rectified image planes are thus warped such that the epipolar lines coincide with the image scan lines: given a point in one image, its corresponding point in the second image is searched for along the horizontal epipolar line [KOC98].


3.2.4.2.Dense stereo correspondence

A large number of stereo matching algorithms exist; they can be classified into two main categories, local and global methods, according to the principle they are based upon. Other methods, called cooperative algorithms, combine local and global approaches.

The difficulty is to choose the right algorithm for a dense 3D reconstruction, taking into account the rendering, the metrological quality, the computing speed and the complexity of the scene. We are therefore currently evaluating the performance of five algorithms presented in [SCH02]:

1. Shiftable window SSD (Sum of Squared Differences),

2. Dynamic programming,

3. Scanline optimisation,

4. Graph-cut optimisation,

5. Bayesian diffusion.

The taxonomy of [SCH02] provides information about the overall performance of these algorithms (textureless regions, depth discontinuity regions, occluded regions). Shiftable-window SSD seems to be the best-performing representative among the local methods. Algorithms 3, 4 and 5 are global optimisation algorithms. The graph-cut method (4) gives excellent results (Figure 20), performing better in textureless areas and near discontinuities and outperforming the other optimisation methods. Its major downsides are a relatively high computation time and the need for precisely tuned parameters, whose values are often image-dependent. It remains, however, a very interesting choice for our application, since the quality of the rendering process has a higher priority than the execution time.

Figure 20: Illustration of a dense matching method on the Tsukuba images, taken from [SCH02] (left to right: reference image, true disparities, graph-cut result)
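For concreteness, the local family can be sketched with a minimal fixed-window SSD matcher (a simplified, non-shiftable variant of algorithm 1), run on a synthetic rectified pair; the window size, disparity range and toy images are our own assumptions.

```python
import numpy as np

def ssd_disparity(left, right, max_disp, half=2):
    """Fixed-window SSD stereo matching on a rectified image pair.

    For every left-image pixel, the best disparity is searched along the
    same scan line of the right image (a purely local method)."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.sum((patch -
                             right[y - half:y + half + 1,
                                   x - d - half:x - d + half + 1]) ** 2)
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the scene is a fronto-parallel textured plane,
# so the right image is the left image shifted by a constant disparity.
rng = np.random.default_rng(2)
left = rng.uniform(size=(20, 40))
right = np.roll(left, -3, axis=1)       # true disparity = 3 pixels
disp = ssd_disparity(left, right, max_disp=5)
```

Rectification is what makes the inner loop one-dimensional: only positions on the same scan line are compared. Global methods replace the per-pixel argmin with an optimisation over the whole disparity field.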

3.2.4.3.Dense triangulation

Once a dense correspondence map and the metric camera parameters have been estimated, dense surface depth maps are computed by triangulation.


3.2.5. Visual scene representation

3.2.5.1.Triangular mesh

Using a triangular mesh makes it possible to reduce the geometric complexity of the 3D surface representation. A 2D triangular mesh is generated from the set of points of one image with the Delaunay triangulation algorithm; the corresponding 3D mesh is then obtained by assigning each 2D vertex its depth value. The resulting surface model, obtained with few feature points, is presented in Figure 21.

Figure 21: Triangular mesh
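The lifting of a 2D mesh to 3D can be sketched as follows. For brevity this sketch triangulates a regular pixel grid instead of running Delaunay on sparse feature points, and the intrinsic matrix and depth map are toy values of ours.

```python
import numpy as np

def grid_mesh(depth, K):
    """Lift a regular grid of pixels with known depth to a 3D triangle mesh.

    depth: (h, w) depth map; K: 3x3 intrinsic matrix.
    Returns (vertices, faces): (h*w, 3) 3D points and (n, 3) index triples."""
    h, w = depth.shape
    Kinv = np.linalg.inv(K)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])
    # Back-project each pixel along its viewing ray, scaled by its depth.
    verts = (Kinv @ pix * depth.ravel()).T
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            # Split each grid cell into two triangles.
            faces.append((i, i + 1, i + w))
            faces.append((i + 1, i + w + 1, i + w))
    return verts, np.array(faces)

# Toy example: a 4x4 depth map of a fronto-parallel plane at 2 m.
K = np.array([[500, 0, 2], [0, 500, 2], [0, 0, 1.0]])
depth = np.full((4, 4), 2.0)
verts, faces = grid_mesh(depth, K)
```

With a Delaunay triangulation, the face list would come from the 2D triangulation of the feature points instead of the regular grid, but the lifting of each 2D vertex by its depth is the same.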

3.2.5.2.Texture mapping

The 3D representation is then visualized with a more realistic appearance by applying texture mapping to the wire-frame model. First, a reference image is chosen as the texture map. Each basic triangle primitive is then easily mapped with texture: since the exact positions of the reference image and of the 3D model are known, the texture map can be projected onto the 3D model. Figure 22 shows the final reconstruction with texture mapping. Note that even though the set of 3D points used in this reconstruction was not dense, the visual impression is good. Introducing dense matching makes it possible to obtain a much more accurate 3D model, which can be used for quantitative 3D imaging.


Figure 22: 3D reconstruction with texture: left view (a), right view (b), front view (c)

3.2.5.3.VRML format

The resulting surface model (Figure 22) is stored in VRML format for easy visualization and exchange of information.
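A minimal VRML 2.0 (VRML97) exporter for such a triangle mesh might look as follows; this is a hedged sketch of the file format, not the report's actual export code, and texture and appearance nodes are omitted.

```python
def write_vrml(path, vertices, faces):
    """Write a triangle mesh as a minimal VRML 2.0 (VRML97) file.

    vertices: iterable of (x, y, z); faces: iterable of vertex-index
    triples referring to the vertex list."""
    lines = ["#VRML V2.0 utf8",
             "Shape {",
             "  geometry IndexedFaceSet {",
             "    coord Coordinate { point ["]
    for v in vertices:
        lines.append("      %.6f %.6f %.6f," % tuple(v))
    lines.append("    ] }")
    lines.append("    coordIndex [")
    for f in faces:
        # VRML terminates every face index list with -1.
        lines.append("      %d, %d, %d, -1," % tuple(f))
    lines.append("    ]")
    lines.append("  }")
    lines.append("}")
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")

# Toy mesh: a single triangle.
write_vrml("mesh.wrl", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

The resulting `.wrl` file can be opened in any VRML viewer, which is what makes the format convenient for exchanging reconstruction results.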

4. CONCLUSION

In this report, we have presented a novel method based on visual servoing, developed to acquire stereo images at regular spatial intervals that depend on the geometry of the stereovision head. This constraint can be used as a priori knowledge to optimise and improve the final 3D reconstruction, which is carried out in a post-processing stage. The whole system and its associated methodology have been validated in laboratory experiments, both in air and in a test tank, and during sea trials. The first results obtained on natural images collected during the MoMARETO cruise are very promising, even if the initial results presented in this report are not optimised and do not include all the steps leading to an accurate 3D reconstruction. The natural extension of the current work is to perform the 3D reconstruction from a large set of images instead of just two, and to introduce dense matching in order to obtain a dense depth map.

5. REFERENCES

[ALL05] Allais, A-G, Brandou, V, Hoge, U, Bergmann, M, Lévêque, J-P, Léon, P, Cadiou, J-F, Sarrazin, J, and Sarradin, P-M. “Design of optical instrumentation for 3D and temporal deep-sea observation”, Proc. of the 1st International Conference on Optical Complex Systems, OCS, Marseille, France, 2005.

[BEN04] Benhimane, S, Malis, E. “Real-time image-based tracking of planes using efficient second-order minimization”, IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS'04, Vol. 1, pp 943–948, Sendai, Japan, October 2004.

[FIS81] Fischler, M. A, Bolles, R. C. “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Comm. of the ACM, Vol. 24, pp 381–395, 1981.


[HAR00] Hartley, R, and Zisserman, A. “Multiple view geometry in computer vision”, Cambridge University Press, pp 265, 2000.

[HAS93] Hashimoto, K. “Visual servoing: real time control of robot manipulators based on visual sensory feedback”, Vol. 7 of World Scientific Series in Robotics and Automated Systems, World Scientific Press, Singapore, 1993.

[HIL62] Hill, M.N. “The Sea”, Vol. 1, In Physical Oceanography, Interscience Publishers, 1962.

[HUT96] Hutchinson, S, Hager, G. D, Corke, P. I. “A tutorial on visual servo control," IEEE Trans. on Robotics and Automation, Vol. 12(5), pp 651–670, October 1996.

[KOC98] Koch, R, Pollefeys, M, and Van Gool, L-J. “Multi viewpoint stereo from uncalibrated video sequences”, ECCV, pp 55-71, London, UK, 1998.

[KOC99] Koch, R, Pollefeys, M, and Van Gool, L-J. “Realistic 3-D scene modeling from uncalibrated image sequences”, ICIP (2), pp 500-504, 1999.

[KOC05] Koch, R, Evers-Senne, J-F, Frahm, J-M, and Koeser, K. “3D reconstruction and rendering from image sequences”, WIAMIS, Montreux, Switzerland, 2005.

[LOW04] Lowe, D.G. “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, Vol. 60(2), pp 91–110, 2004.

[MAL01] Malis, E. “Vision-based control using different cameras for learning the reference image and for servoing”, IEEE/RSJ International Conference on Intelligent Robots Systems, Vol. 3, pp 1428–1433, Hawaii, USA, November 2001.

[MAL02] Malis, E. “A unified approach to model-based and model-free visual servoing”, European Conference on Computer Vision, Vol. 4, pp 433–447, Copenhagen, Denmark, May 2002.

[MAL04] Malis, E. “Visual servoing invariant to changes in camera intrinsic parameters”, IEEE Transaction on Robotics and Automation, Vol. 20(1), pp 72–81, February 2004.

[PES04] Pessel, N. “Camera self-calibration in underwater environment”, Proceedings of The Fourteenth International Offshore and Polar Engineering Conference, ISOPE-2004, Vol. 1, pp 738–745, Toulon, France, May 2004.

[SAR06] Sarrazin, J, Sarradin, P-M, and the Momareto cruise participants. “Momareto: a cruise dedicated to the spatio-temporal dynamics and the adaptations of hydrothermal vent fauna on the Mid-Atlantic Ridge”, InterRidge News, Vol 15, pp 24-33, 2006.

[SCH02] Scharstein, D and Szeliski, R. “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms”, IJCV, Vol. 47(1/2/3), pp 7-42, April-June 2002.

[TRIG00] Triggs, B, McLauchlan, P, Hartley, R, and Fitzgibbon, A. “Bundle adjustment: a modern synthesis”, In B. Triggs, A. Zisserman, R. Szeliski (Eds.), Vision Algorithms: Theory and Practice, LNCS, Vol. 1883, pp 298-372, Springer-Verlag, 2000.
