Post on 26-Jun-2020
Mobile Robot Navigation Using
Visual Servoing
T. TEPE DC 2010.018
DYNAMICS & CONTROL TECHNOLOGY GROUP
MOBILE ROBOT NAVIGATION USING VISUAL SERVOING
M.Sc. INTERNSHIP
Supervisor : Prof. Dr. Henk NIJMEIJER
Coach : Dr. Dragan KOSTIĆ
Student : Tufan TEPE
Student ID: 0666323
ABSTRACT
Equipping robots with vision systems increases their versatility but also the complexity of their control. Despite this increased complexity, vision remains an attractive sensory modality for mobile robot navigation because it provides rich information about the robot's environment.
In this work, the problem of visual servoing with a fixed monocular camera mounted on a mobile robot is investigated. A homography-based control method is used for autonomous navigation of a mobile robot with nonholonomic motion constraints. The visual control task follows the idea of homing: an image is first taken at the desired position, and the robot is then driven from an initial position towards the desired position using the information extracted from this target image and from the images taken while the robot moves.
Table of Contents
1 INTRODUCTION
2 DESIGN ISSUES
2.1 Camera configuration
2.2 Servoing architectures
3 AN INSIGHT INTO VISUAL SERVOING METHODS
3.1 The geometry of image formation
3.2 Analysis of visual servoing methods
4 PROJECT DESCRIPTION
5 HOMOGRAPHY BASED VISUAL SERVOING of a NONHOLONOMIC MOBILE ROBOT
5.1 Homography and its estimation
5.1.1 Geometric transformations
5.1.2 Situations in which solving a homography arises
5.1.3 How to find the homography?
5.2 Motion model of the mobile robot
5.3 Input-output linearization and control law
5.3.1 Input-output linearization
5.3.2 Control law
5.3.3 Desired trajectories of the homography elements
5.4 Stability analysis
6 SIMULATIONS
7 EXPERIMENTAL ARRANGEMENTS
8 CONCLUSIONS
APPENDIX A
APPENDIX B
1-INTRODUCTION
Robots are electro-mechanical machines designed to interact with their environment. To realize that interaction in a desired manner, they must be equipped with appropriate sensory modalities. To date, most robotic applications take place in known environments or in environments arranged to suit robots. Until recently, robots have rarely been used in work environments that cannot be fully controlled or about which little information is available. The main reason for this limitation lies in the insufficient sensory capabilities of robots. To compensate for the lack of information obtained from the surroundings, integrating different sensors into robots has become one of the crucial steps in robot design, and vision is recognized as very important for increasing the versatility of robots. In the last couple of decades, a great deal of research has been carried out successfully in the area of robotic vision [1], [2], [3]. Increased computing power and improved pixel-processing hardware enable images to be analyzed at a rate sufficient to guide robotic manipulators without physical contact with the objects [1]. With the use of vision devices and the information they provide in robotic applications, the terms "visual servoing" and "visual servo control" came into use. "Visual servo control" refers to closed-loop control of the pose of a robot using information extracted from vision sensors, and it draws on techniques from many elemental areas such as image processing, computer vision, kinematics, dynamics and control theory.
2-DESIGN ISSUES
When designing a vision-based control system, one faces many questions: which type of camera and lens to use, how many cameras to use and where to place them, which image features to utilize, and whether to derive a three-dimensional description of the scene, to use two-dimensional image data, or to combine both. Since vision has a broad application area and new techniques and solutions are developed continually, this list of questions can easily be extended. To stay within the bounds of this project, only two crucial design issues are explained here; numerous academic sources can be consulted for detailed information on the other aspects.
2.1. Camera Configuration
One main issue when constructing a vision-based control system is deciding where to position the camera. There are two main options: the camera can be placed at a fixed location, where it does not move, or it can be mounted on the robot. These configurations are called the "fixed camera" and "eye-in-hand" configurations, respectively.
If a fixed camera configuration is used, the camera is placed at a location from which it can observe both the task space and the robot/manipulator. Since the camera does not move, the geometric relationship between the task space and the camera does not change. However, the camera's view of the task space can be obstructed by the moving manipulator, and such occlusions can severely degrade performance or even cause instability.
With an eye-in-hand system, the camera is mounted on the robot/manipulator. This configuration lets the camera observe the task space without being occluded by the robot itself while the robot travels around the work space. As opposed to the fixed camera configuration, the geometric relationship between the task space and the camera changes when the robot moves. On the other hand, the scene that the camera sees can change drastically when the camera attachment point undergoes large and fast movements. This drawback is encountered especially with multi-link robotic manipulators and can degrade performance.
2.2. Servoing Architectures
Different classifications of servoing architectures are offered in the literature, but the most widely used one is based on the question: "Is the error signal (task function) defined in three-dimensional workspace coordinates or directly in terms of image features?" The answer leads to a taxonomy in which the error signal is defined in 3D workspace coordinates, directly in terms of image features, or in a combination of both.
2.2.1. Image Based Visual Servoing
This approach uses image data directly to control the robot motion: the task function is defined in the image, so there is no need to estimate the pose error in Cartesian space explicitly. The image measurements used to build the task/error function are the pixel coordinates of a set of image features, such as interest points, and the task function is isomorphic to the camera pose. A control law is constructed to map the image error directly to robot motion. Such a system can use either a fixed camera or an eye-in-hand configuration; in either case, the motion of the robot changes the image provided by the vision system. Hence, defining an image-based visual servoing task requires an appropriate error e such that the error becomes zero when the task is accomplished.
2.2.2. Position Based Visual Servoing
In this approach, the vision data are used to build a 3D representation of the scene, that is, the task/error function is expressed in Cartesian space. Features extracted from the image and/or a 3D model of the object are used to determine the position and orientation of the target with respect to the camera. Using this information, an error between the current and the desired pose of the robot is defined in the work space, and suitable coordinates can be provided as set points to the controller.
2.2.3. 2 1/2 D Visual Servoing (Hybrid Visual Servoing)
The task function is expressed partly in Cartesian space and partly in the image: the rotation error is estimated explicitly in Cartesian space, while the translational error is expressed in the image. 2 1/2 D visual servoing is based on estimating the partial camera displacement from the current to the desired camera pose at each iteration of the control law. Contrary to position-based approaches, it does not need a 3D model of the object, and contrary to image-based methods, it avoids some of their stability problems over the whole task space [4].
3-AN INSIGHT INTO VISUAL SERVOING METHODS
To give the reader insight into visual servoing methods, this section analyzes them and points to related references. Before this analysis, the geometry of image formation is explained as a preliminary subject.
3.1. The Geometry of Image Formation
A digital image is a data structure representing a (generally rectangular) grid of pixels. The word pixel is a contraction of pix ("pictures") and el ("element"). Pixels are normally arranged in a two-dimensional grid and are often represented as dots or squares. The image is formed by directing light onto a two-dimensional array of sensing elements, and each pixel has a value corresponding to the intensity of the light focused on a particular sensing element [5]. The light is focused onto the sensing elements by a lens, and the sensing elements are typically charge-coupled device (CCD) sensors. A CCD is a device for moving electrical charge, usually from within the device to an area where the charge can be manipulated, for example converted into a digital value [6].
3.1.1. The Camera Coordinate Frame
The image plane is the plane that contains the sensing elements, and the camera coordinate frame is assigned as follows:
i) the z axis is chosen perpendicular to the image plane, along the optical axis of the lens,
ii) the origin of the camera coordinate frame is located a distance λ (the focal length of the camera) behind the image plane,
iii) the x and y axes are assigned according to the right-hand rule and are taken parallel to the horizontal and vertical axes of the image plane, respectively.
The origin of the camera coordinate frame is called the center of projection, and the point where the optical axis crosses the image plane is the principal point. An illustration of the coordinate frame is given in Figure 3.1. Any point on the image plane can be represented by the coordinates (u, v, λ) with respect to the camera coordinate frame.
Figure 3.1. The Camera Coordinate Frame
A point P whose coordinates with respect to the camera coordinate frame are (x, y, z) is projected onto the image plane at coordinates (u, v, λ). These coordinates are related through an unknown positive constant k:
$$k\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ \lambda \end{bmatrix}$$
From this equality, the following equations are obtained directly:
$$k = \frac{\lambda}{z}, \qquad u = \lambda\frac{x}{z}, \qquad v = \lambda\frac{y}{z} \qquad (3.1)$$
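As a quick numerical check of equation (3.1), the minimal Python/NumPy sketch below projects a point given in camera coordinates onto the image plane. The function name `project_point` and the numbers (including an assumed 8 mm focal length expressed in metres) are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

def project_point(P_c, lam):
    """Perspective projection of eq. (3.1).

    P_c : (x, y, z) coordinates of a point in the camera frame, z > 0.
    lam : focal length (lambda).
    Returns the image-plane coordinates (u, v).
    """
    x, y, z = P_c
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    k = lam / z                 # the positive constant k = lambda / z
    return k * x, k * y         # u = lambda x / z, v = lambda y / z

P = np.array([0.4, -0.2, 2.0])
lam = 0.008                     # assumed focal length in metres
u, v = project_point(P, lam)
k = lam / P[2]
# k * (x, y, z) reproduces (u, v, lambda)
print(np.allclose(k * P, [u, v, lam]))   # True
```

Doubling the depth z halves both u and v, which is the scaling that makes depth unobservable from a single projected point.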
This relation describes perspective projection, a widely used camera projection model. Other camera projection models, such as scaled orthographic projection and affine projection, are discussed by Hutchinson et al. [2]. The analysis of visual servo control methods in this report is based on perspective projection.
3.1.2. The Image Plane and the Sensor Array
The row and column indices of a pixel are denoted by the pixel coordinates (r, c). To relate the coordinates of image points to their corresponding 3D world coordinates, the image plane coordinates (u, v) and the pixel coordinates (r, c) must be related.
Let the pixel coordinates of the principal point be denoted by $(o_r, o_c)$, and let the origin of the pixel array be attached to a corner of the image. The horizontal and vertical dimensions of a pixel are $s_x$ and $s_y$, respectively; these are the scale factors relating pixels to distance. Also, the horizontal and vertical axes of the pixel coordinate system usually point in the opposite directions from the horizontal and vertical axes of the camera frame [5]. Combining all of this gives equation (3.2), which relates the image plane coordinates and the pixel coordinates:
$$-\frac{u}{s_x} = r - o_r, \qquad -\frac{v}{s_y} = c - o_c \qquad (3.2)$$
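Equation (3.2) and its inverse can be sketched in a few lines of plain Python. The helper names and the sensor values (10 micrometre square pixels, principal point at (240, 320)) are hypothetical and only serve to illustrate the sign convention.

```python
def image_to_pixel(u, v, o_r, o_c, s_x, s_y):
    """Eq. (3.2): -u/s_x = r - o_r and -v/s_y = c - o_c.

    The minus signs account for the pixel axes pointing opposite to the
    camera-frame axes. Returns (possibly fractional) pixel coordinates (r, c).
    """
    r = o_r - u / s_x
    c = o_c - v / s_y
    return r, c

def pixel_to_image(r, c, o_r, o_c, s_x, s_y):
    """Inverse of image_to_pixel: recover (u, v) from (r, c)."""
    return -(r - o_r) * s_x, -(c - o_c) * s_y

# Hypothetical sensor: 10 micrometre square pixels, principal point (240, 320)
r, c = image_to_pixel(0.0016, -0.0008, o_r=240, o_c=320, s_x=1e-5, s_y=1e-5)
print(round(r, 6), round(c, 6))  # 80.0 400.0
```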
3.2. Analysis of Visual Servoing Methods
As stated before, there are mainly three classes of visual servoing methods, and the details of each method are generally specific to the application. Many works keep adding to the body of visual servoing knowledge, and each would deserve its own tutorial, so it is not possible to cover all available methods here. Thus, only the classical image-based visual servoing method is considered in order to gain some basic insight, and appropriate references are pointed out for the other methods.
The aim of all vision-based control schemes is to minimize an error, usually defined by
$$\mathbf{e}(t) = \mathbf{s}(t) - \mathbf{s}^*.$$
Here $\mathbf{s}(t)$ denotes a vector of image feature values that are tracked during motion and $\mathbf{s}^*$ contains the desired values of those features. If a single point is used as an image feature, then $\mathbf{s}(t)$ can be defined in terms of the image plane coordinates of that point:
$$\mathbf{s}(t) = \begin{bmatrix} u(t) \\ v(t) \end{bmatrix}.$$
The time derivative $\dot{\mathbf{s}}(t)$ is called the image feature velocity, and it is linearly related to the camera velocity. If the camera velocity is represented by $\boldsymbol{\xi} = [\mathbf{v}^T \;\; \boldsymbol{\omega}^T]^T$, in which $\mathbf{v}$ is the linear velocity of the origin of the camera frame and $\boldsymbol{\omega}$ is the angular velocity of the camera, both expressed in the camera coordinate frame, then the relationship between the image feature velocity and the camera velocity becomes
$$\dot{\mathbf{s}} = \mathbf{L}(\mathbf{s}, \mathbf{q})\,\boldsymbol{\xi}. \qquad (3.3)$$
The matrix L is called the image Jacobian or interaction matrix; it is a function of the image features s and the robot configuration q. To derive the interaction matrix, which relates the camera velocity $\boldsymbol{\xi}$ to the time derivatives $\dot{\mathbf{s}}$ of the image coordinates of the projection of a fixed 3D point P, an expression is needed for the velocity of P with respect to the moving camera. Using homogeneous transformations, the coordinates of P with respect to the world frame and with respect to the moving camera are related by $\mathbf{P}_o = \mathbf{R}(t)\mathbf{P}_c(t) + \mathbf{o}(t)$. In this equation, $\mathbf{P}_o$ denotes the coordinates of P with respect to the world coordinate frame and $\mathbf{P}_c$ the coordinates of P relative to the moving camera frame, while $\mathbf{R}(t)$ and $\mathbf{o}(t)$ are the rotation matrix and the translation vector between the world frame and the camera coordinate frame. Since $\mathbf{R}^T(t) = \mathbf{R}^{-1}(t)$, the coordinates of P relative to the camera frame are
$$\mathbf{P}_c(t) = \mathbf{R}^T(t)\left(\mathbf{P}_o - \mathbf{o}(t)\right). \qquad (3.4)$$
Taking the time derivative of equation (3.4), and noting that $\mathbf{P}_o$ is constant in time, gives
$$\dot{\mathbf{P}}_c(t) = \dot{\mathbf{R}}^T(t)\left(\mathbf{P}_o - \mathbf{o}(t)\right) - \mathbf{R}^T(t)\,\dot{\mathbf{o}}(t). \qquad (3.5)$$
Substituting $\dot{\mathbf{R}} = \mathbf{S}(\boldsymbol{\omega})\mathbf{R}$ and $\dot{\mathbf{R}}^T = \mathbf{R}^T\mathbf{S}(\boldsymbol{\omega})^T = \mathbf{R}^T\mathbf{S}(-\boldsymbol{\omega})$ into equation (3.5), and using $\mathbf{R}^T\mathbf{S}(\boldsymbol{\omega}) = \mathbf{S}(\mathbf{R}^T\boldsymbol{\omega})\mathbf{R}^T$, yields [5]
$$\dot{\mathbf{P}}_c(t) = -\boldsymbol{\omega}_c(t) \times \mathbf{P}_c(t) - \dot{\mathbf{o}}_c(t). \qquad (3.6)$$
Here, $\boldsymbol{\omega}_c = \mathbf{R}^T\boldsymbol{\omega}$ and $\dot{\mathbf{o}}_c = \mathbf{R}^T\dot{\mathbf{o}}$ are the angular and linear velocities of the camera, respectively, expressed in the camera coordinate frame. Writing the vectors in equation (3.6) explicitly and carrying out the cross product and the subtraction yields a system of three scalar equations.
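$$\mathbf{P}_c(t) = \begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}, \quad \dot{\mathbf{P}}_c(t) = \begin{bmatrix} \dot{x}(t) \\ \dot{y}(t) \\ \dot{z}(t) \end{bmatrix}, \quad \boldsymbol{\omega}_c(t) = \begin{bmatrix} \omega_x(t) \\ \omega_y(t) \\ \omega_z(t) \end{bmatrix}, \quad \dot{\mathbf{o}}_c(t) = \begin{bmatrix} v_x(t) \\ v_y(t) \\ v_z(t) \end{bmatrix}$$
The coordinates of point P relative to the moving camera, as well as the angular and linear velocities of the camera with respect to the camera coordinate frame, are time dependent. However, the explicit time dependence will not be shown in the following equations for the sake of simplicity of the notation.
$$\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix} = -\begin{bmatrix} \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \times \begin{bmatrix} x \\ y \\ z \end{bmatrix} - \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix} \qquad (3.7)$$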
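Equating the left-hand and right-hand sides of equation (3.7) results in the system of three equations (3.8)-(3.10):
$$\dot{x} = y\omega_z - z\omega_y - v_x \qquad (3.8)$$
$$\dot{y} = z\omega_x - x\omega_z - v_y \qquad (3.9)$$
$$\dot{z} = x\omega_y - y\omega_x - v_z \qquad (3.10)$$
Substituting $x = uz/\lambda$ and $y = vz/\lambda$ from equation (3.1) gives equations (3.11)-(3.13):
$$\dot{x} = \frac{vz}{\lambda}\omega_z - z\omega_y - v_x \qquad (3.11)$$
$$\dot{y} = z\omega_x - \frac{uz}{\lambda}\omega_z - v_y \qquad (3.12)$$
$$\dot{z} = \frac{uz}{\lambda}\omega_y - \frac{vz}{\lambda}\omega_x - v_z \qquad (3.13)$$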
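It is also necessary to find the time derivatives of the image plane coordinates. While taking these derivatives, equations (3.11)-(3.13) are used wherever necessary.
$$\dot{u} = \lambda\frac{z\dot{x} - x\dot{z}}{z^2} = -\frac{\lambda}{z}v_x + \frac{u}{z}v_z + \frac{uv}{\lambda}\omega_x - \frac{\lambda^2 + u^2}{\lambda}\omega_y + v\omega_z \qquad (3.14)$$
$$\dot{v} = \lambda\frac{z\dot{y} - y\dot{z}}{z^2} = -\frac{\lambda}{z}v_y + \frac{v}{z}v_z - \frac{uv}{\lambda}\omega_y + \frac{\lambda^2 + v^2}{\lambda}\omega_x - u\omega_z \qquad (3.15)$$
Equations (3.14) and (3.15) can be represented in matrix form [5]:
$$\begin{bmatrix} \dot{u} \\ \dot{v} \end{bmatrix} = \begin{bmatrix} -\frac{\lambda}{z} & 0 & \frac{u}{z} & \frac{uv}{\lambda} & -\frac{\lambda^2+u^2}{\lambda} & v \\ 0 & -\frac{\lambda}{z} & \frac{v}{z} & \frac{\lambda^2+v^2}{\lambda} & -\frac{uv}{\lambda} & -u \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix} \qquad (3.16)$$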
The first three columns depend on the image plane coordinates (u, v) and on the depth z of the 3D point relative to the camera frame. Therefore, any control scheme using this form must estimate or approximate the value of z. This depth information can come from stereo cameras, multiple cameras, a single camera with multiple views, or dedicated range sensors/finders; z can be estimated, for instance, by triangulation from at least two views of the scene. Note that the depth appears only in the translational part of the interaction matrix (the first three columns), whereas the rotational part depends only on the image plane coordinates.
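The derivation can be cross-checked numerically. The sketch below, assuming NumPy, builds the 2×6 matrix of equation (3.16) and compares $\mathbf{L}\boldsymbol{\xi}$ with a finite-difference image velocity obtained by moving the point with equation (3.6) and re-projecting it with equation (3.1). The helper names, the normalised focal length λ = 1 and the chosen twist are hypothetical.

```python
import numpy as np

def interaction_matrix(u, v, z, lam):
    """2x6 interaction matrix of eq. (3.16) for one image point.

    Columns are ordered (v_x, v_y, v_z, w_x, w_y, w_z); the depth z
    appears only in the first three (translational) columns.
    """
    return np.array([
        [-lam / z, 0.0, u / z, u * v / lam, -(lam**2 + u**2) / lam, v],
        [0.0, -lam / z, v / z, (lam**2 + v**2) / lam, -u * v / lam, -u],
    ])

def point_velocity(P_c, xi):
    """Eq. (3.6): velocity of a fixed world point seen from the moving camera."""
    lin, ang = xi[:3], xi[3:]
    return -np.cross(ang, P_c) - lin

def project(P_c, lam):
    """Eq. (3.1): image-plane coordinates (u, v) of a camera-frame point."""
    return lam * P_c[:2] / P_c[2]

lam = 1.0
P_c = np.array([0.3, -0.2, 2.0])
xi = np.array([0.1, -0.05, 0.2, 0.02, -0.03, 0.04])  # arbitrary camera twist
u, v = project(P_c, lam)
L = interaction_matrix(u, v, P_c[2], lam)

# Forward-difference image velocity over a tiny time step
dt = 1e-7
P_next = P_c + dt * point_velocity(P_c, xi)
s_dot_fd = (project(P_next, lam) - project(P_c, lam)) / dt
print(np.allclose(L @ xi, s_dot_fd, atol=1e-5))  # True
```

Changing only z alters the first three columns, in line with the remark that the rotational part is depth independent.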
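When more than one point is tracked in the image, the interaction matrices of the individual points can be stacked into one general interaction matrix in order to find the camera movement:
$$\begin{bmatrix} \dot{u}_1 \\ \dot{v}_1 \\ \vdots \\ \dot{u}_n \\ \dot{v}_n \end{bmatrix} = \begin{bmatrix} -\frac{\lambda}{z_1} & 0 & \frac{u_1}{z_1} & \frac{u_1v_1}{\lambda} & -\frac{\lambda^2+u_1^2}{\lambda} & v_1 \\ 0 & -\frac{\lambda}{z_1} & \frac{v_1}{z_1} & \frac{\lambda^2+v_1^2}{\lambda} & -\frac{u_1v_1}{\lambda} & -u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -\frac{\lambda}{z_n} & 0 & \frac{u_n}{z_n} & \frac{u_nv_n}{\lambda} & -\frac{\lambda^2+u_n^2}{\lambda} & v_n \\ 0 & -\frac{\lambda}{z_n} & \frac{v_n}{z_n} & \frac{\lambda^2+v_n^2}{\lambda} & -\frac{u_nv_n}{\lambda} & -u_n \end{bmatrix} \begin{bmatrix} v_x \\ v_y \\ v_z \\ \omega_x \\ \omega_y \\ \omega_z \end{bmatrix}$$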
Thus $\mathbf{L} \in \mathbb{R}^{2n \times 6}$, and three points are therefore sufficient in general to solve for $\boldsymbol{\xi}$ given the image measurements $\dot{\mathbf{s}}$; the resulting camera velocity $\boldsymbol{\xi}$ can then be used as the control input. To find $\boldsymbol{\xi}$, the interaction matrix is inverted directly when possible; otherwise, its pseudoinverse (Moore-Penrose inverse) must be used. If k feature coordinates are tracked in the image, the camera velocity has m components, and rank(L) = min(k, m), i.e., L is full rank, then there are three possibilities for inverting the interaction matrix:
i) If k = m, $\boldsymbol{\xi} = \mathbf{L}^{-1}\dot{\mathbf{s}}$: exactly enough features are observed.
ii) If k < m, $\boldsymbol{\xi} = \mathbf{L}^{+}\dot{\mathbf{s}}$ with $\mathbf{L}^{+} = \mathbf{L}^T(\mathbf{L}\mathbf{L}^T)^{-1}$ (minimum-norm solution): not enough features are observed.
iii) If k > m, $\boldsymbol{\xi} = \mathbf{L}^{+}\dot{\mathbf{s}}$ with $\mathbf{L}^{+} = (\mathbf{L}^T\mathbf{L})^{-1}\mathbf{L}^T$ (least-squares solution): more features than necessary are observed.
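The three inversion cases above can be sketched with NumPy as follows. The function names and the (u, v, z) feature values are hypothetical; the sketch assumes L has full rank, as in the text, and uses linear solves rather than explicit inverses for numerical robustness.

```python
import numpy as np

def interaction_matrix(u, v, z, lam):
    """2x6 block of eq. (3.16); columns (v_x, v_y, v_z, w_x, w_y, w_z)."""
    return np.array([
        [-lam / z, 0.0, u / z, u * v / lam, -(lam**2 + u**2) / lam, v],
        [0.0, -lam / z, v / z, (lam**2 + v**2) / lam, -u * v / lam, -u],
    ])

def stacked_interaction_matrix(features, lam):
    """Stack one 2x6 block per tracked point (u, v, z) into a 2n x 6 matrix."""
    return np.vstack([interaction_matrix(u, v, z, lam) for u, v, z in features])

def camera_velocity(L, s_dot):
    """Recover xi from s_dot = L xi for the three cases (L assumed full rank)."""
    k, m = L.shape
    if k == m:                                    # case i): exact inverse
        return np.linalg.solve(L, s_dot)
    if k < m:                                     # case ii): L+ = L^T (L L^T)^-1
        return L.T @ np.linalg.solve(L @ L.T, s_dot)
    return np.linalg.solve(L.T @ L, L.T @ s_dot)  # case iii): L+ = (L^T L)^-1 L^T

lam = 1.0
xi_true = np.array([0.1, -0.05, 0.2, 0.02, -0.03, 0.04])
features = [(0.1, 0.1, 2.0), (-0.2, 0.15, 2.5), (0.05, -0.2, 1.8), (0.2, -0.1, 3.0)]

# Three points (k = m = 6) and four points (k = 8 > m) both recover xi_true
for n in (3, 4):
    L = stacked_interaction_matrix(features[:n], lam)
    xi = camera_velocity(L, L @ xi_true)
    print(n, np.allclose(xi, xi_true), np.allclose(xi, np.linalg.pinv(L) @ (L @ xi_true)))
```

With only two points (k = 4 < m = 6), case ii) returns a velocity that reproduces $\dot{\mathbf{s}}$ exactly but is generally not the true twist, since the remaining two directions of $\boldsymbol{\xi}$ are unobservable.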
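Stability can be proven with the help of a suitable Lyapunov function for the error system:
$$V(t) = \frac{1}{2}\|\mathbf{e}(t)\|^2$$
The Lyapunov candidate must be positive definite everywhere except at the origin of the error system, and its time derivative $\dot{V}(t) = \mathbf{e}^T\dot{\mathbf{e}}$ must be negative definite, excluding the origin. This holds if $\dot{\mathbf{e}}$ is chosen as $\dot{\mathbf{e}} = -\kappa\mathbf{e}$, with $\kappa$ a positive constant. For the vision system, the error and its time derivative are
$$\mathbf{e}(t) = \mathbf{s}(t) - \mathbf{s}^*, \qquad \dot{\mathbf{e}}(t) = \dot{\mathbf{s}}(t) = \mathbf{L}\boldsymbol{\xi},$$
since $\mathbf{s}^*$ is constant. Since $\dot{\mathbf{e}} = -\kappa\mathbf{e}$ must be satisfied, $\mathbf{L}\boldsymbol{\xi} = -\kappa\mathbf{e}$ must also be satisfied.
If k = m and rank(L) = min(k, m), the exact inverse of the interaction matrix exists, so $\boldsymbol{\xi} = -\kappa\mathbf{L}^{-1}\mathbf{e}(t)$ can be used as the control signal. The time derivative of the Lyapunov function then becomes
$$\dot{V} = \mathbf{e}^T\dot{\mathbf{e}} = \mathbf{e}^T\mathbf{L}\boldsymbol{\xi} = -\kappa\,\mathbf{e}^T\mathbf{L}\mathbf{L}^{-1}\mathbf{e} = -\kappa\,\mathbf{e}^T\mathbf{e} < 0 \quad (\mathbf{e} \neq \mathbf{0}),$$
which proves asymptotic stability.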
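If k ≠ m and L is full rank, the exact inverse of the interaction matrix does not exist, so its pseudoinverse is used and $\boldsymbol{\xi} = -\kappa\mathbf{L}^{+}\mathbf{e}(t)$ serves as the control signal (the definition of $\mathbf{L}^{+}$ differs for k > m and k < m). The time derivative of the Lyapunov function becomes
$$\dot{V} = \mathbf{e}^T\dot{\mathbf{e}} = \mathbf{e}^T\mathbf{L}\boldsymbol{\xi} = -\kappa\,\mathbf{e}^T\mathbf{L}\mathbf{L}^{+}\mathbf{e} \le 0,$$
since $\mathbf{L}\mathbf{L}^{+}$ is positive semidefinite. For k < m, $\mathbf{L}\mathbf{L}^{+} = \mathbf{L}\mathbf{L}^T(\mathbf{L}\mathbf{L}^T)^{-1} = \mathbf{I}$, so $\dot{V} = -\kappa\,\mathbf{e}^T\mathbf{e}$ and the error again converges asymptotically. For k > m, however, $\mathbf{L}\mathbf{L}^{+}$ is a projection of rank m < k, so $\dot{V}$ vanishes for any nonzero $\mathbf{e}$ in the null space of $\mathbf{L}^T$. Stability can therefore be proven, but this argument does not establish (global) asymptotic stability: the error may get stuck in a local minimum.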
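The following sketch, assuming NumPy, simulates the classical image-based law $\boldsymbol{\xi} = -\kappa\mathbf{L}^{+}\mathbf{e}$ for the over-determined case k = 8 > m = 6 (four coplanar points) and records $V = \frac{1}{2}\|\mathbf{e}\|^2$ at every step. It assumes the true depths are available to build L, uses λ = 1, a small initial displacement, and a simple Euler integration of equation (3.6); the helper names and all numbers are hypothetical. It illustrates the local behaviour only and says nothing about large displacements, where local minima can occur.

```python
import numpy as np

def interaction_matrix(u, v, z, lam):
    """2x6 block of eq. (3.16); columns (v_x, v_y, v_z, w_x, w_y, w_z)."""
    return np.array([
        [-lam / z, 0.0, u / z, u * v / lam, -(lam**2 + u**2) / lam, v],
        [0.0, -lam / z, v / z, (lam**2 + v**2) / lam, -u * v / lam, -u],
    ])

def features(P, lam):
    """Image coordinates (u1, v1, ..., un, vn) of the rows of P (camera frame)."""
    return (lam * P[:, :2] / P[:, 2:3]).ravel()

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def simulate_ibvs(P0, P_star, lam=1.0, kappa=1.0, dt=0.01, steps=1000):
    """Euler simulation of xi = -kappa L^+ e; returns the history of V."""
    s_star = features(P_star, lam)
    P = P0.copy()
    V_hist = []
    for _ in range(steps):
        s = features(P, lam)
        e = s - s_star
        V_hist.append(0.5 * e @ e)                # Lyapunov function V = ||e||^2 / 2
        L = np.vstack([interaction_matrix(u, v, z, lam)
                       for (u, v), z in zip(s.reshape(-1, 2), P[:, 2])])
        xi = -kappa * np.linalg.pinv(L) @ e       # camera twist (v, w)
        lin, ang = xi[:3], xi[3:]
        P = P + dt * (-np.cross(ang, P) - lin)    # eq. (3.6) applied to every point
    return np.array(V_hist)

# Desired pose: a square of four points at depth 2 in front of the camera
P_star = np.array([[-0.2, -0.2, 2.0], [0.2, -0.2, 2.0],
                   [0.2, 0.2, 2.0], [-0.2, 0.2, 2.0]])
# Initial pose: small rotation about the optical axis plus a small translation
P0 = P_star @ rot_z(0.1).T + np.array([0.05, -0.03, 0.1])
V = simulate_ibvs(P0, P_star)
print(V[0], V[-1])
```

In this small-displacement run, V should decrease at every step and end many orders of magnitude below its initial value, consistent with $\dot{V} \le 0$.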
The analysis of position-based and hybrid visual servoing approaches varies from one application to another; important basic works include [4], [7], [8]. For these methods, the aim is again to minimize the error $\mathbf{e}(t) = \mathbf{s}(t) - \mathbf{s}^*$, as in the classical image-based method, but this time the components of $\mathbf{s}$ depend on the available information, the set-up and the aim of the application.
4-PROJECT DESCRIPTION
With the introduction to visual servoing, the classical methods and the construction of the control law in place, the project description and the rest of the work can build on these basics. In this assignment, the problem of visual servoing with a fixed monocular camera mounted on a mobile robot is investigated. The objective is to design a control law for autonomous navigation of a robot with nonholonomic motion constraints. The visual control task follows the idea of homing: an image is first taken at the desired position, and the control law then drives the mobile robot from an initial pose towards the desired pose by processing the information extracted from this target image and from the current images taken during the movement of the robot. Unlike the classical methods, a homography-based visual servoing method is adopted, so that the task can be achieved without depth estimation or any measurements of the scene. With this approach, the controller is obtained by an exact input-output linearization of the geometric model, in which homography elements are chosen as the outputs of the system [9].
5-HOMOGRAPHY BASED VISUAL SERVOING of a NONHOLONOMIC MOBILE ROBOT
In this chapter, a detailed analysis of homography-based visual servoing is carried out. Section 5.1 introduces the homography, and Section 5.2 derives the motion model of the mobile robot. In Section 5.3, the system is input-output linearized through the homography and a control law is constructed on the basis of that linearization. Finally, Section 5.4 presents the stability analysis of the system.
5.1. Homography and Its Estimation
A two-dimensional point $\mathbf{X}_{2D} = (x, y)$ lying on a plane can also be represented by a three-dimensional vector $\mathbf{X}_{3D} = (x_1, x_2, x_3)$. Here, $\mathbf{X}_{2D}$ is obtained from $\mathbf{X}_{3D}$ by scaling with its third element: $x = x_1/x_3$ and $y = x_2/x_3$. When points on a projective plane are represented with respect to a coordinate frame whose x and y axes lie in that plane, all points have the same "z" coordinate, which therefore carries no information; scaling all points by the third element makes "z" equal to 1 for every point. This representation ($\mathbf{X}_{3D}$) is called the homogeneous representation of a point on the projective plane $P^2$, and it is used in homography analysis. A homography can then be defined as an invertible mapping of these points from one projective plane to another. Synonyms of homography are projectivity, planar projective transformation and collineation. According to [10], a homography is an invertible mapping from $P^2$ to itself such that three points lie on the same line if and only if their mapped points are also collinear. Its algebraic definition is: a mapping $P^2 \rightarrow P^2$ is a projectivity if and only if there exists a nonsingular 3×3 matrix H such that, for any point in $P^2$ represented by a vector x, its mapped point equals Hx.
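The homogeneous representation and the collinearity property can be illustrated with a short NumPy sketch. The matrix H below is an arbitrary nonsingular example and the helper names are hypothetical. Collinearity is tested with the standard fact that three homogeneous points are collinear exactly when the determinant of the matrix formed by them vanishes; the tolerance is a scale-dependent assumption.

```python
import numpy as np

def to_homogeneous(p):
    """(x, y) -> (x, y, 1)."""
    return np.array([p[0], p[1], 1.0])

def from_homogeneous(X):
    """(x1, x2, x3) -> (x1/x3, x2/x3); requires x3 != 0."""
    return X[:2] / X[2]

def collinear(a, b, c, tol=1e-9):
    """Three homogeneous points are collinear iff det[a b c] = 0."""
    return abs(np.linalg.det(np.column_stack([a, b, c]))) < tol

H = np.array([[1.2, 0.1, 3.0],
              [-0.2, 0.9, 1.0],
              [0.001, 0.002, 1.0]])    # arbitrary nonsingular example

pts = [to_homogeneous(p) for p in [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]]  # on y = 2x
mapped = [H @ X for X in pts]          # the homography acts as x' = H x
print(collinear(*pts), collinear(*mapped))  # True True

# A homogeneous vector and any nonzero multiple represent the same point
print(np.allclose(from_homogeneous(3.0 * mapped[1]), from_homogeneous(mapped[1])))
```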
5.1.1. Geometric Transformations
There are several types of geometric transformations, each with its own characteristic properties, and homographies are one of them. Homographies are easier to understand when explained in the context of the other types of geometric transformations. A detailed description of all these transformations can be found in [10].
i) Isometries
Isometries (iso = same, metric = measure) are transformations of the plane $P^2$ that preserve Euclidean distance. An isometry can be described by equation (5.1):
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} \epsilon\cos\theta & -\sin\theta & t_x \\ \epsilon\sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.1)$$
where $\epsilon = \pm 1$. If $\epsilon = 1$, the isometry preserves orientation and is a Euclidean transformation; if $\epsilon = -1$, it reverses orientation. Euclidean transformations represent rigid body motion and consist of planar rotations and translations. If the rotation matrix is the identity, the points are only translated in 2D; if the translation vector is zero, the points undergo a pure 2D rotation. A planar Euclidean transformation has three degrees of freedom: one for the rotation ($\theta$) and two for the translation ($t_x$ and $t_y$). An isometry preserves the distance between two points, as well as the angle between two lines and areas.
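A minimal NumPy sketch of equation (5.1) can make these invariants concrete. The helper names, the triangle and the transformation parameters are hypothetical. The signed area of a triangle keeps its magnitude under any isometry, and its sign flips for $\epsilon = -1$, which is what "reversing orientation" means.

```python
import numpy as np

def isometry(theta, tx, ty, eps=1):
    """3x3 homogeneous matrix of eq. (5.1); eps=+1 Euclidean, eps=-1 orientation-reversing."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[eps * c, -s, tx],
                     [eps * s,  c, ty],
                     [0.0,    0.0, 1.0]])

def apply(T, pts):
    """Map an (n, 2) array of points with a 3x3 homogeneous transformation T."""
    X = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
    return X[:, :2] / X[:, 2:3]

def signed_area(tri):
    """Signed area of a triangle; its sign encodes the orientation."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

tri = np.array([[0.0, 0.0], [2.0, 0.0], [0.5, 1.5]])
for eps in (1, -1):
    mapped = apply(isometry(np.deg2rad(30.0), 1.0, -2.0, eps), tri)
    same_dist = np.isclose(np.linalg.norm(mapped[1] - mapped[0]),
                           np.linalg.norm(tri[1] - tri[0]))
    # distance preserved in both cases; area sign flips only for eps = -1
    print(eps, same_dist, signed_area(tri), round(signed_area(mapped), 6))
```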
ii) Similarity Transformations
A similarity transformation (or a similarity) is an isometry composed with an isotropic scaling, and it is represented by equation (5.2):
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.2)$$
where the isotropic scaling is the same in every direction. The scale factor s adds one degree of freedom to the Euclidean case, so a similarity has four degrees of freedom. A similarity no longer preserves the distance between points when $s \neq \pm 1$. However, it keeps the ratios of distances and the angles between lines invariant, so it preserves shape. An example is shown in Figure 5.1 [11].
Figure 5.1 Similarity Transformation
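The shape-preserving behaviour of equation (5.2) can be checked with a short NumPy sketch. The triangle and the parameters (s = 2.5, θ = 40°) are hypothetical. All distances are multiplied by s, while their ratios and the angles between segments stay unchanged.

```python
import numpy as np

def similarity(s, theta, tx, ty):
    """3x3 homogeneous matrix of eq. (5.2): isotropic scale s, rotation theta, translation (tx, ty)."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn, s * c, ty],
                     [0.0, 0.0, 1.0]])

def apply(T, pts):
    """Map an (n, 2) array of points with a 3x3 homogeneous transformation T."""
    X = np.hstack([pts, np.ones((len(pts), 1))]) @ T.T
    return X[:, :2] / X[:, 2:3]

def dist(P, i, j):
    """Euclidean distance between points i and j of the array P."""
    return np.linalg.norm(P[i] - P[j])

def angle_at(p, q, r):
    """Angle (radians) at vertex q between the segments q->p and q->r."""
    a, b = p - q, r - q
    return np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

pts = np.array([[0.0, 0.0], [3.0, 0.0], [1.0, 2.0]])
m = apply(similarity(2.5, np.deg2rad(40.0), -1.0, 4.0), pts)

print(dist(m, 0, 1) / dist(pts, 0, 1))   # distances are scaled by s = 2.5
print(np.isclose(dist(m, 0, 1) / dist(m, 0, 2), dist(pts, 0, 1) / dist(pts, 0, 2)))
print(np.isclose(angle_at(*m), angle_at(*pts)))
```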
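iii) Affine Transformations
An affine transformation (or an affinity) is a nonsingular linear transformation followed by a translation [10]. It generalizes a similarity: instead of one rotation and an isotropic scaling, it combines two rotations and two non-isotropic scalings. It is represented by
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (5.3)$$
It has six degrees of freedom, corresponding to $a_{11}, a_{12}, a_{21}, a_{22}, t_x, t_y$. The affine matrix A can be decomposed as
$$\mathbf{A} = \mathbf{R}(\theta)\,\mathbf{R}(-\phi)\,\mathbf{D}\,\mathbf{R}(\phi), \qquad \mathbf{D} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}.$$
Therefore, the affine matrix A performs a rotation by $\phi$, a scaling by $\lambda_1$ in the x direction and by $\lambda_2$ in the y direction, a rotation by $-\phi$, and finally a rotation by $\theta$. An affinity has two more degrees of freedom than a similarity, corresponding to the angle $\phi$ that specifies the scaling direction and to the ratio $\lambda_1/\lambda_2$ of the scaling parameters. Figure 5.2 shows the interpretation of the action of the affine matrix A.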
Figure 5.2 Effect of Affine Transformation
If the affine matrix is viewed as the product of two parts, $\mathbf{A} = \mathbf{R}(\theta)\,[\mathbf{R}(-\phi)\mathbf{D}\mathbf{R}(\phi)]$, then $\mathbf{R}(\theta)$ is a rotation that preserves shape, while $\mathbf{R}(-\phi)\mathbf{D}\mathbf{R}(\phi)$ deforms the shape along the axis defined by $\phi$ and along the axis perpendicular to it; the amount of distortion depends on the scaling factors $\lambda_1$ and $\lambda_2$. Figure 5.3 [12] shows some examples of affine transformations.
Figure 5.3 Visual examples of affinity transformations
The distances between points and the angles between lines are not preserved by affine transformations. However, some properties are invariant: parallel lines in one image remain parallel in the mapped image, and the ratios of lengths of parallel line segments and the ratios of areas are unchanged.
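One common way to obtain the decomposition $\mathbf{A} = \mathbf{R}(\theta)\mathbf{R}(-\phi)\mathbf{D}\mathbf{R}(\phi)$ in practice is the singular value decomposition $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T = (\mathbf{U}\mathbf{V}^T)(\mathbf{V}\mathbf{D}\mathbf{V}^T)$, which identifies $\mathbf{R}(\theta) = \mathbf{U}\mathbf{V}^T$ and $\mathbf{R}(-\phi) = \mathbf{V}$. The NumPy sketch below follows that route under this assumption; the sign handling and the example matrix are illustrative choices, and the last lines check the parallelism and length-ratio invariants.

```python
import numpy as np

def rot(a):
    """2x2 rotation matrix R(a)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

def decompose_affine(A):
    """Return (theta, phi, lambdas) with A = R(theta) R(-phi) D R(phi).

    Uses the SVD A = U D V^T = (U V^T)(V D V^T). Signs are adjusted so that
    U and V are proper rotations; for a reflection (det A < 0) one lambda
    comes out negative.
    """
    U, d, Vt = np.linalg.svd(A)
    V = Vt.T.copy()
    d = d.copy()
    if np.linalg.det(V) < 0:          # flip a column pair: U D V^T is unchanged
        V[:, 1] *= -1
        U[:, 1] *= -1
    if np.linalg.det(U) < 0:          # move the remaining sign into lambda_2
        U[:, 1] *= -1
        d[1] *= -1
    R_theta = U @ V.T
    theta = np.arctan2(R_theta[1, 0], R_theta[0, 0])
    phi = np.arctan2(-V[1, 0], V[0, 0])   # V = R(-phi)
    return theta, phi, d

A = np.array([[2.0, 1.0], [0.5, 1.5]])
theta, phi, lam = decompose_affine(A)
A_rebuilt = rot(theta) @ rot(-phi) @ np.diag(lam) @ rot(phi)
print(np.allclose(A, A_rebuilt))  # True

# Parallel segments stay parallel and keep their length ratio (3 here)
d1, d2 = np.array([1.0, 2.0]), np.array([3.0, 6.0])   # d2 = 3 * d1
m1, m2 = A @ d1, A @ d2
print(np.isclose(m1[0] * m2[1] - m1[1] * m2[0], 0.0))          # True
print(np.isclose(np.linalg.norm(m2) / np.linalg.norm(m1), 3.0))  # True
```

The translation part of equation (5.3) cancels when mapping direction vectors, which is why only A is needed for the invariance check.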
iv) Perspective Projection
Perspective projection maps three-dimensional points in Cartesian space to two-dimensional points. It is a widely used projection method, and it describes how points in space are mapped onto the image plane when a camera takes an image. A perspective projection can be described by
$$\mathbf{x} = \mathbf{P}\mathbf{X}$$
where P is a 3×4 projection matrix, x is an image point represented by a homogeneous 3-vector, and X is a point in space represented by a homogeneous 4-vector [13]. The projection matrix has 12 elements, but it is defined only up to scale, i.e., only the ratios of its elements are significant, so it has 11 degrees of freedom. These 11 degrees of freedom come from the internal and external camera matrices: the internal (intrinsic) camera matrix, or camera calibration matrix, provides 5 degrees of freedom, and the external (extrinsic) camera matrix provides 6. The action of a perspective projection can be split into two phases. First, the coordinates of the 3D point with respect to the camera frame are found using a homogeneous transformation. Then, these camera-frame coordinates are projected onto the image plane using the intrinsic camera matrix.
Extrinsic camera matrix can be defined as [R|t] which accounts for the rotation matrix and
the translation vector between camera and world frames, so six external parameters relate the
camera orientation to the world coordinate system. Those six parameters are 3 rotations
expressed by 3x3 rotation matrix "R" and three translations denoted by 3x1 vector "t".
Intrinsic camera matrix "K" can be defined as
𝑲 =
𝛼𝑥 𝑠 𝑥𝑜
0 𝛼𝑦 𝑦𝑜
0 0 1 .
In this matrix, 𝛼𝑥 and 𝛼𝑦 are the focal lengths of the camera in terms of pixel dimensions in
the x and y directions respectively. 𝑥𝑜 and 𝑦𝑜 are the coordinates of the principal point in
pixels in the image and 𝑠 is the skew parameter which shows the deviation of pixels from
orthogonality (or perpendicularity of the sides of the pixels). 𝑠 = cot(𝜍) where 𝜍 is the angle
between sides of the a pixel. Generally, pixels are rectangular so 𝜍 = 90° and then
𝑠 = cot 𝜍 = 0.Hence, intrinsic camera matrix 𝑲 explains 5 internal parameters
(𝛼𝑥 , 𝛼𝑦 , 𝑥𝑜 , 𝑦𝑜 ,𝑠).
Thus, the projection matrix is the combination of the intrinsic and extrinsic camera matrices:
$$\mathbf{P} = \mathbf{K}[\mathbf{R}|\mathbf{t}].$$
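The two phases of perspective projection can be checked numerically. The following is a minimal sketch assuming NumPy; the helper names and camera parameters are hypothetical, not values from this project. It builds $\mathbf{P} = \mathbf{K}[\mathbf{R}|\mathbf{t}]$, projects a world point in one step, repeats the projection as "world frame to camera frame, then $\mathbf{K}$", and divides by the third homogeneous component to obtain pixel coordinates.

```python
import numpy as np

def intrinsic_matrix(ax, ay, x0, y0, s=0.0):
    """Calibration matrix K built from the five intrinsic parameters."""
    return np.array([[ax, s, x0],
                     [0.0, ay, y0],
                     [0.0, 0.0, 1.0]])

def project(K, R, t, X_world):
    """One step: x = P X with P = K [R | t], then dehomogenize."""
    P = K @ np.hstack([R, t.reshape(3, 1)])   # 3x4 projection matrix
    x_h = P @ np.append(X_world, 1.0)         # homogeneous 3-vector
    return x_h[:2] / x_h[2]

def project_two_phase(K, R, t, X_world):
    """Phase 1: world -> camera frame (extrinsics). Phase 2: apply K."""
    X_cam = R @ X_world + t
    x_h = K @ X_cam
    return x_h[:2] / x_h[2]

# Hypothetical camera: 800-pixel focal lengths, principal point (320, 240), no skew.
K = intrinsic_matrix(800.0, 800.0, 320.0, 240.0)
R = np.eye(3)
t = np.zeros(3)
X = np.array([0.1, -0.05, 2.0])

uv = project(K, R, t, X)              # -> pixel (360, 220)
uv2 = project_two_phase(K, R, t, X)   # same result
```

Scaling $\mathbf{P}$ by any nonzero factor leaves the pixel coordinates unchanged, which is the "defined up to scale" property discussed above.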
An example of perspective projection is given in Figure 5.4 [14].
Figure 5.4 Perspective Projection
All 3D world points are mapped to 2D image points, as illustrated in figure 5.4. Perspective projection gives a realistic impression of depth, although the exact depth cannot be recovered from a single image. It produces a view similar to the way the human eye perceives its environment. Remark: if you close one eye and try to touch something around you, you will be less accurate than with both eyes open; in other words, you cannot touch the object in a relaxed and comfortable manner with one eye closed. This shows that a single perspective view gives a realistic but imperfect impression of depth. With both eyes open, depth is perceived much more accurately; this binocular depth perception is called stereopsis.
v) Projective Transformation
A planar projective transformation, or homography, is a transformation of homogeneous 3-vectors represented by a nonsingular 3×3 matrix $\mathbf{H}$ such that $\mathbf{x}' = \mathbf{H}\mathbf{x}$. Multiplying $\mathbf{H}$ by a nonzero scale factor does not alter the projective transformation; hence $\mathbf{H}$ is called a homogeneous matrix, since only the ratios of its elements matter. There are 8 independent ratios, so a homography has 8 degrees of freedom. Invariants of affine transformations such as parallelism and ratios of lengths of parallel segments are not preserved by homographies. However, as mentioned at the beginning of this chapter, collinearity is preserved: if three points lie on the same line in one image, they also lie on the same line when they are mapped to another image. A projective transformation can be written as
$$\mathbf{H} = \begin{bmatrix} \mathbf{A} & \mathbf{t} \\ \mathbf{V}^T & v \end{bmatrix}$$
where $\mathbf{A}$ is a 2×2 matrix, $\mathbf{t}$ is a 2-vector, $v$ is a scalar and $\mathbf{V} = (V_1, V_2)^T$.
The key difference between projective transformations and affinities is the vector $\mathbf{V}$, which is the source of the nonlinear effects of projective transformations. Unlike for affinities, the scaling applied by $\mathbf{A}$ varies with position in the image, and the orientation of a transformed line depends on both the position and the orientation of the source line.
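The two properties stated above (collinearity is preserved, while the local scaling depends on position once $\mathbf{V} \neq \mathbf{0}$) can be illustrated with a short NumPy sketch. The matrices are arbitrary illustrative values and the helper names are hypothetical.

```python
import numpy as np

def apply_h(H, pts):
    """Map an (N, 2) array of inhomogeneous points through H."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def collinearity(p, q, r):
    """Determinant that is zero (up to rounding) iff p, q, r lie on one line."""
    return np.linalg.det(np.array([[p[0], p[1], 1.0],
                                   [q[0], q[1], 1.0],
                                   [r[0], r[1], 1.0]]))

def local_scale(H, p):
    """Length of the image of a unit step in x starting at point p."""
    a, b = apply_h(H, np.array([p, [p[0] + 1.0, p[1]]]))
    return np.linalg.norm(b - a)

def make_h(A, t, V, v):
    """Assemble H = [[A, t], [V^T, v]]."""
    return np.block([[A, t.reshape(2, 1)], [V.reshape(1, 2), np.array([[v]])]])

A = np.array([[1.0, 0.2], [0.1, 0.9]])
t = np.array([5.0, -3.0])
H_affine = make_h(A, t, np.zeros(2), 1.0)               # V = 0: an affinity
H_proj = make_h(A, t, np.array([0.002, 0.001]), 1.0)    # V != 0: a projectivity

line = np.array([[0.0, 0.0], [50.0, 50.0], [120.0, 120.0]])
print(collinearity(*apply_h(H_proj, line)))             # close to 0
print(local_scale(H_affine, [0.0, 0.0]), local_scale(H_affine, [200.0, 200.0]))
print(local_scale(H_proj, [0.0, 0.0]), local_scale(H_proj, [200.0, 200.0]))
```

The affinity stretches a unit step by the same amount everywhere, whereas the projectivity stretches it differently at the two positions.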
5.1.2. Situations in which solving a homography arises
Homographies arise in many situations. This part discusses some of the applications that use them [13].
i) Camera Calibration
Camera calibration is a key step in many vision applications, as it lets a system determine the relation between what appears in the image and where it is located in the 3D world. To relate image points to 3D directions, and to compensate for undesired lens effects such as radial distortion, the camera calibration (the calibration matrix together with the distortion parameters) must be known. Two important works that find the camera calibration matrix through homography estimation are [15] and [16]. In these works, images of the same planar pattern, such as a checkerboard, are taken from different perspectives, homographies relating the views of the pattern are estimated, and the calibration matrix is recovered from them.
ii) 3D Reconstruction and Visual Metrology
3D reconstruction is the computer vision problem of recovering the scene structure and the camera positions from images of the scene. In medical imaging, for example, multiple images of a body part are taken and a 3D model of that part is built and analyzed. In visual metrology, homographies are used to estimate the distances between objects and the sizes of objects.
iii) Stereo Vision
Two cameras separated by a distance take pictures of the same scene. The images are shifted over each other to find the parts that match, and the amount of shift is called the disparity. A key step is to find the point correspondences between the images; each correspondence is searched for along a line called the epipolar line. Applying rectifying homographies to the images makes the epipolar lines parallel and axis-aligned, which makes the search for corresponding points very efficient [13].
More applications can be added to those above. The homography between two views plays an important role in the geometry of multiple views. Homographies are also used in tracking with multiple cameras, or with one camera and multiple views of the scene, and in building projector-camera systems. In such tracking, the homography between two views transfers points on a scene plane from one view to the other, so even when the target is partially or fully occluded by an unknown object in one view, the tracker can follow it as long as it is visible in another view [13]. No complicated inference scheme is used and no 3D information is recovered explicitly [17]. Homographies are also used in military applications, for example to obtain an altitude map of an unknown environment from photographs taken by airplanes, so that risks to soldiers can be reduced in advance.
5.1.3. How to find the homography?
Finding the homography between two images is necessary to construct the control law in this project. The ways of finding the homography are described in two subsections. The first answers the question "How can the homography be found in a simulation environment?". The second explains in detail how the homography is estimated from two real images in the real experiments.
*5.1.3.1. Theory of Homography and Homography in Simulation Environments
In this project, the aim is to bring the current camera frame $\mathcal{F}$ to the target (reference) camera frame $\mathcal{F}^*$. It is assumed that only the images $\mathfrak{T}^*$ and $\mathfrak{T}$ of the scene, taken at the target and at the current position respectively, are available. This is illustrated in Figure 5.5.
Figure 5.5 Illustration of the configuration and Homography between two images of a plane
Let P be a point in 3D space with coordinates $\boldsymbol{\chi}^* = [X^*, Y^*, Z^*]^T$ in the reference frame $\mathcal{F}^*$. $\boldsymbol{\chi}^*$ is mapped onto a virtual plane that is perpendicular to the optical axis and lies at distance $\lambda$ (the focal length) from the center of projection $\mathcal{O}^*$. Its mapped coordinates with respect to the reference camera frame are $\mathbf{m}^* = [u^*, v^*, \lambda]^T$, so the relationship between $\boldsymbol{\chi}^*$ and $\mathbf{m}^*$ is
$$\mathbf{m}^* = \frac{\lambda}{Z^*}\boldsymbol{\chi}^*.$$
$\mathbf{m}^*$ is then projected onto the reference image plane $\mathfrak{T}^*$ as $\mathbf{p}^* = [r^*, c^*, 1]^T$, with pixel coordinates $r^*$ and $c^*$, by means of the intrinsic camera matrix:
$$\mathbf{p}^* = \frac{1}{\lambda}\mathbf{K}\mathbf{m}^* = \frac{1}{Z^*}\mathbf{K}\boldsymbol{\chi}^*, \qquad \mathbf{K} = \begin{bmatrix} \alpha_x & s & x_o \\ 0 & \alpha_y & y_o \\ 0 & 0 & 1 \end{bmatrix},$$
where the factor $1/\lambda$ makes the third component of $\mathbf{p}^*$ equal to 1 (the focal length is already contained in $\alpha_x$ and $\alpha_y$). In the intrinsic camera matrix, $\alpha_x$ and $\alpha_y$ are the focal lengths of the camera in pixel units in the x and y directions respectively, $x_o$ and $y_o$ are the pixel coordinates of the principal point and $s$ is the skew parameter, as explained in the perspective projection section.
The 3D point P is represented by $\boldsymbol{\chi} = [X, Y, Z]^T$ relative to the current camera frame $\mathcal{F}$. Applying the same procedure to P with respect to the current camera frame gives
$$\mathbf{m} = \frac{\lambda}{Z}\boldsymbol{\chi}, \qquad \mathbf{m} = [u, v, \lambda]^T,$$
and $\mathbf{m}$ is projected onto the current image plane as the point $\mathbf{p} = [r, c, 1]^T$ by
$$\mathbf{p} = \frac{1}{\lambda}\mathbf{K}\mathbf{m}.$$
The rotation matrix and the translation vector between the frames $\mathcal{F}^*$ and $\mathcal{F}$ are $\mathbf{R} \in SO(3)$ and $\mathbf{c} \in \mathbb{R}^3$, respectively. If the point P belongs to a plane $\pi$, $\mathbf{n}^* = [n_x, n_y, n_z]^T$ is the normal to $\pi$ expressed in the reference camera frame and $d^*$ is the distance between $\pi$ and the origin of the reference frame, then $\mathbf{p}^*$ and $\mathbf{p}$ are related by a projective transformation $\mathbf{H}$ such that $\mathbf{p} = \mathbf{H}\mathbf{p}^*$ (up to a scale factor). The homography $\mathbf{H}$ is related to the camera motion by equation (5.4):
$$\mathbf{H} = \mathbf{K}\mathbf{R}\left(\mathbf{I} + \frac{\mathbf{c}\,\mathbf{n}^{*T}}{d^*}\right)\mathbf{K}^{-1} \qquad (5.4)$$
In the simulations, there is no real robot that travels through the workspace and takes images of the scene, so no real images are available. Therefore, the initial and target positions and orientations of the robot must be known in order to emulate the motion of the robot in the simulation environment. From this knowledge, the rotation matrix and the translation vector between the current frame and the target frame can be found, and with the intrinsic camera matrix [18] the homography can be computed by equation (5.4). For $\mathbf{n}^*$ and $d^*$, arbitrary but appropriate values can be chosen by inspection; they affect the performance but not the convergence of the system. Plugging all of these into equation (5.4) gives a 3×3 homography whose elements are used to determine the control signal.
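A minimal sketch of this simulation step, assuming NumPy: given $\mathbf{K}$, $\mathbf{R}$, $\mathbf{c}$, $\mathbf{n}^*$ and $d^*$, equation (5.4) is evaluated directly. The function name `homography_from_motion` and all numeric values are hypothetical; the plane parameters are arbitrary, just as in the text.

```python
import numpy as np

def homography_from_motion(K, R, c, n_star, d_star):
    """Equation (5.4): H = K R (I + c n*^T / d*) K^-1."""
    M = R @ (np.eye(3) + np.outer(c, n_star) / d_star)
    return K @ M @ np.linalg.inv(K)

# Hypothetical camera and plane parameters (not taken from the report).
K = np.array([[500.0, 0.0, 0.0],
              [0.0, 480.0, 0.0],
              [0.0, 0.0, 1.0]])
n_star = np.array([0.0, 0.0, 1.0])
d_star = 3.0

# Zero motion gives the identity homography.
H_zero = homography_from_motion(K, np.eye(3), np.zeros(3), n_star, d_star)

# A pure rotation about the y axis (c = 0) gives K R K^-1, independent of the plane.
phi = 0.3
R = np.array([[np.cos(phi), 0.0, np.sin(phi)],
              [0.0, 1.0, 0.0],
              [-np.sin(phi), 0.0, np.cos(phi)]])
H_rot = homography_from_motion(K, R, np.zeros(3), n_star, d_star)
```

Only the translational part of the motion is scaled by $\mathbf{n}^*/d^*$, which is why the choice of these plane parameters changes the homography elements but not the pure-rotation case.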
*5.1.3.2. Homography estimation from two real images
In the real experiments, the image of the scene taken at the desired position serves as the reference image, and current images are taken during the robot's motion. Apart from these two images (the current one and the reference one), no additional information is available, so the rotation matrix and the translation vector are not known a priori. Therefore, all required information must be extracted from the images in order to compute the control signal. This is done in two steps.
STEP 1: First, features that allow reliable matching between the views of the scene must be extracted from the images. Several methods for finding image features exist in the literature, such as the Harris corner detector, Canny edges, the entropy operator and SIFT. If the detected features are highly distinctive and invariant to image scaling and rotation, the homography can be estimated more robustly. Among these, the Scale Invariant Feature Transform (SIFT), a computer vision algorithm that detects and describes local features in images, is employed in this project. SIFT and most common algorithms use points as image features, while other algorithms may also use lines and conics. The SIFT algorithm determines the set of image features in four main cascaded steps; detailed information can be found in [19] and [20].
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations in order to identify locations and scales that can be found again in different views of the same scene. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation. The image is convolved with Gaussian filters at different scales, and then the difference of successive Gaussian-blurred images is taken. Specifically, the difference-of-Gaussian image is given by
$$D(x, y, \sigma) = L(x, y, k_i\sigma) - L(x, y, k_j\sigma)$$
where $L(x, y, k\sigma)$ is the convolution of the original image $I(x, y)$ with the Gaussian blur $G(x, y, k\sigma)$ at scale $k\sigma$:
$$L(x, y, k\sigma) = G(x, y, k\sigma) * I(x, y).$$
Thus, the difference-of-Gaussian image between scales $k_i\sigma$ and $k_j\sigma$ is simply the difference of the Gaussian-blurred images at these two scales. The image is first convolved with Gaussian blurs at different scales. The convolved images are grouped by octave (an octave corresponds to doubling the value of σ), and k is selected so that a fixed number of convolved images is obtained per octave. Then, the difference-of-Gaussian images are computed from adjacent Gaussian-blurred images in each octave. An illustration of the difference of Gaussians is given in figure 5.6 [19].
Figure 5.6 Illustration of difference of Gaussian
2. Keypoint localization: Once the difference-of-Gaussian images have been obtained, keypoints are taken as the maxima/minima of the difference-of-Gaussian (DoG) images across scales [19], [21]. This is done by comparing each pixel in the DoG images to its eight neighbors at the same scale and to the nine corresponding neighboring pixels in each of the two neighboring scales. Figure 5.7 [19] shows the search region.
Figure 5.7 Search region to find a keypoint candidate
The pixel marked with X is examined to decide whether it is a keypoint candidate. It has 8 neighbors at its own scale, 9 pixels at the next higher scale and 9 pixels at the next lower scale, shown by green circles. If the DoG value at X is larger than all of these 26 neighbors or smaller than all of them, the pixel is added to the list of keypoint candidates. This procedure produces many keypoint candidates, but some of them are not stable: they may lie on an edge or in a low-contrast region, where image noise can make the pixel indistinguishable from its neighbors so that it is no longer detected as a keypoint. Algorithms have been developed for discarding low-contrast candidates and eliminating edge responses [21]; they are not explained here in order to keep the discussion focused. After the inappropriate candidates are eliminated, one task remains: determining the exact keypoint location. For each candidate, nearby data are interpolated to determine its position accurately. Using the interpolated location of the extremum, rather than the location and scale of the candidate pixel, improves matching and stability. The reason is that the true extremum of the DoG function generally lies between sample points: for example, if it lies halfway between the centers of two neighboring pixels, reporting the center of either pixel introduces an error of half a pixel, whereas interpolation recovers the point in between, which is easier to find again in other images of the same scene taken from different perspectives. The interpolation uses the quadratic Taylor expansion of the difference-of-Gaussian scale-space function with the candidate keypoint as the origin. This is also why SIFT implementations generally return non-integer (floating-point) keypoint coordinates rather than the integer indices of a pixel in the image matrix.
3. Orientation assignment: One or more orientations are assigned to each keypoint based on the local image gradient directions. First, the Gaussian-smoothed image $L(x, y, \sigma)$ at the keypoint's scale σ is taken, so that all computations are performed in a scale-invariant manner. For each image sample $L(x, y)$ at this scale, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are computed using pixel differences [21]:
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$
$$\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
where $\tan^{-1}$ is evaluated as a four-quadrant arctangent, so that θ covers the full 360° range.
The gradient magnitude and direction are computed for every pixel in a region around the keypoint in the Gaussian-blurred image $L(x, y, \sigma)$. The result is illustrated in figure 5.8 for an 8×8 array of pixels around the keypoint location.
Figure 5.8 Gradients of pixels around the keypoint location
An orientation histogram with 36 bins covering the 360° range of orientations is formed, each bin covering 10 degrees. Each sample in the neighboring window that is added to a histogram bin is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint [21]. The peaks in this histogram correspond to dominant orientations. Once the histogram is filled, the orientation corresponding to the highest peak, and those of any other local peaks within 80% of the highest peak, are assigned to the keypoint. When multiple orientations are assigned, an additional keypoint with the same location and scale as the original one is created for each additional orientation [19].
4. Keypoint descriptor: The previous steps found keypoint locations at particular scales and assigned orientations to them, which ensures invariance to image location, scale and rotation. In this step, a descriptor vector is computed for each keypoint such that the descriptor is highly distinctive and as invariant as possible to the remaining variations, such as changes in illumination. In the standard configuration, the gradient magnitudes and orientations of the samples in a 16×16 region around the keypoint are calculated. Then, for each 4×4 subregion of this neighborhood, the samples are accumulated into an orientation histogram with 8 bins corresponding to 8 directions, so (16×16)/(4×4) = 16 histograms are created in total. The gradient magnitudes are weighted by a Gaussian function with σ equal to one half the width of the descriptor window. The descriptor is then the vector of all values of these histograms [21]: 16 histograms with 8 bins give 16×8 = 128 entries in the keypoint descriptor vector.
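Steps 1 to 3 can be sketched in NumPy as follows. This is a simplified illustration, not the SIFT implementation used in this project: it builds one octave of difference-of-Gaussian images, keeps pixels that are larger or smaller than all 26 neighbors, and forms the 36-bin orientation histogram with the 80% peak rule. Sub-pixel interpolation, edge-response removal and multiple octaves are omitted, image borders are zero-padded, and the defaults `sigma0=1.6`, `intervals=3` and the contrast threshold `0.03` are assumptions borrowed from common SIFT settings (here the threshold is applied directly to the sampled DoG value, for intensities in [0, 1]). All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalized 1D Gaussian truncated at about 3 sigma."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), separable, zero-padded."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def dog_octave(img, sigma0=1.6, intervals=3):
    """One octave: intervals + 3 blurred images with ratio k = 2**(1/intervals)
    between neighbors, and the intervals + 2 differences of adjacent images."""
    k = 2.0 ** (1.0 / intervals)
    gauss = [blur(img, sigma0 * k**i) for i in range(intervals + 3)]
    dog = np.stack([gauss[i + 1] - gauss[i] for i in range(len(gauss) - 1)])
    return gauss, dog

def scale_space_extrema(dog, threshold=0.03):
    """(scale, y, x) of pixels larger or smaller than all 26 neighbors."""
    found = []
    n_s, h, w = dog.shape
    for s in range(1, n_s - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                val = dog[s, y, x]
                if abs(val) < threshold:          # skip low-contrast samples
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                others = np.delete(cube, 13)      # index 13 = centre of 3x3x3
                if val > others.max() or val < others.min():
                    found.append((s, y, x))
    return found

def dominant_orientations(L, y, x, sigma, radius=8, n_bins=36, peak_ratio=0.8):
    """36-bin orientation histogram around (y, x), weighted by gradient
    magnitude and a Gaussian with 1.5 * sigma; returns bin-centre angles."""
    hist = np.zeros(n_bins)
    h, w = L.shape
    for yy in range(max(1, y - radius), min(h - 1, y + radius + 1)):
        for xx in range(max(1, x - radius), min(w - 1, x + radius + 1)):
            dx = L[yy, xx + 1] - L[yy, xx - 1]
            dy = L[yy + 1, xx] - L[yy - 1, xx]
            theta = np.degrees(np.arctan2(dy, dx)) % 360.0   # four-quadrant
            weight = np.exp(-((yy - y)**2 + (xx - x)**2) / (2.0 * (1.5 * sigma)**2))
            hist[int(theta // (360.0 / n_bins)) % n_bins] += np.hypot(dx, dy) * weight
    peaks = [b for b in range(n_bins)
             if hist[b] >= peak_ratio * hist.max()
             and hist[b] > hist[b - 1] and hist[b] > hist[(b + 1) % n_bins]]
    return [(b + 0.5) * 360.0 / n_bins for b in peaks]

# Hypothetical test image: a bright Gaussian blob (sigma = 3) on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((yy - 32.0)**2 + (xx - 32.0)**2) / (2.0 * 3.0**2))
gauss, dog = dog_octave(img)
keypoints = scale_space_extrema(dog)    # includes the blob centre (y, x) = (32, 32)
```

The blob centre appears as a DoG minimum because the more strongly blurred image always has a lower peak; for an image whose intensity increases uniformly along x, `dominant_orientations` returns the single orientation of bin 0.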
After these four steps have been applied to both images, the keypoint descriptors of both images are available. Next, the keypoints of the two images must be checked for matches. One keypoint is taken from the first image and compared with all keypoints of the second image, one by one; then the second keypoint of the first image is compared with all keypoints of the second image, and so on until all keypoints have been compared with each other. Whether two keypoints are accepted as a matched pair is decided as follows. Each keypoint has its feature (descriptor) vector, and when two keypoints are compared, the angle between their feature vectors is found from the dot product:
$$\mathbf{F}_1 \cdot \mathbf{F}_2 = |\mathbf{F}_1|\,|\mathbf{F}_2|\cos(\alpha)$$
where $\mathbf{F}_1$ and $\mathbf{F}_2$ are the feature vectors and α is the angle between them. The smaller α is, the more similar the feature vectors are; when α is below a certain threshold, the keypoints are accepted as a matched pair. An example of points matched by the SIFT program is illustrated in Figure 5.9.
Figure 5.9 An example of point matches
1021 and 579 keypoints were found in the left and right images respectively, and 19 matched pairs were found between them.
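Step 4 and the angle-based matching described above can be sketched as follows, assuming NumPy. This is a simplified illustration: each of the 16 cells of the 16×16 window gets an 8-bin histogram (4·4·8 = 128 entries), the gradient magnitudes are weighted by a Gaussian with σ equal to half the window width, and the trilinear interpolation and window rotation of full SIFT are omitted (only the gradient angles are measured relative to the keypoint orientation). Each descriptor is paired with its smallest-angle partner and kept only if the angle is below a threshold; the 20° threshold and all function names are hypothetical choices, not values from this project. Computing all dot products at once with a matrix product is equivalent to the one-by-one comparison loop.

```python
import numpy as np

def descriptor_128(L, y, x, orientation_deg=0.0):
    """Simplified 128-D descriptor: 4x4 cells x 8 orientation bins from the
    16x16 window around (y, x); (y, x) must be at least 9 pixels from the border."""
    hist = np.zeros((4, 4, 8))
    sigma_w = 8.0                                   # half of the 16-pixel window
    for i in range(16):
        for j in range(16):
            yy, xx = y - 8 + i, x - 8 + j
            dx = L[yy, xx + 1] - L[yy, xx - 1]
            dy = L[yy + 1, xx] - L[yy - 1, xx]
            theta = (np.degrees(np.arctan2(dy, dx)) - orientation_deg) % 360.0
            weight = np.exp(-((i - 7.5)**2 + (j - 7.5)**2) / (2.0 * sigma_w**2))
            hist[i // 4, j // 4, int(theta // 45.0) % 8] += weight * np.hypot(dx, dy)
    vec = hist.ravel()                              # 16 histograms x 8 bins = 128
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def match_by_angle(desc1, desc2, max_angle_deg=20.0):
    """Pairs (i, j): for each row i of desc1, the row j of desc2 with the
    smallest angle alpha, kept only if alpha < max_angle_deg."""
    d1 = desc1 / np.linalg.norm(desc1, axis=1, keepdims=True)
    d2 = desc2 / np.linalg.norm(desc2, axis=1, keepdims=True)
    cos_alpha = np.clip(d1 @ d2.T, -1.0, 1.0)       # F1.F2 / (|F1| |F2|)
    alpha = np.degrees(np.arccos(cos_alpha))
    best = alpha.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if alpha[i, j] < max_angle_deg]

ramp = np.tile(np.arange(40.0), (40, 1))           # intensity increases along x
d = descriptor_128(ramp, 20, 20)                    # all gradient mass in bin 0

rng = np.random.default_rng(0)
desc1 = rng.random((5, 128))
perm = [3, 0, 4, 1, 2]
desc2 = desc1[perm] + 1e-3 * rng.standard_normal((5, 128))
matches = match_by_angle(desc1, desc2)              # pairs i with perm.index(i)
```

Normalizing the descriptors first makes the dot product equal to $\cos\alpha$ directly, so the threshold can be stated as an angle.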
STEP 2: Once the matched points are found, the homography between the two images can be determined. A widely used method for homography estimation is the Direct Linear Transformation (DLT) algorithm. To find a homography between two images, at least 4 matched point pairs are needed, no three of which are collinear. As stated in the projective transformation section, a homography has 9 elements but only their ratios matter, so it has 8 degrees of freedom. Each matched pair of points constrains 2 degrees of freedom, so 4 matched pairs are necessary to determine the homography fully. The homography relates a point $\mathbf{x}_i$ in one image to a point $\mathbf{x}_i'$ in the other image such that $\mathbf{x}_i' = \mathbf{H}\mathbf{x}_i$. In this representation, the homogeneous 3-vectors $\mathbf{x}_i'$ and $\mathbf{H}\mathbf{x}_i$ need not be equal in magnitude, since H is defined only up to scale, but they have the same direction. To ease the analysis, it is therefore more appropriate to use the cross product form $\mathbf{x}_i' \times \mathbf{H}\mathbf{x}_i = \mathbf{0}$.
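If the $j$-th row of the matrix $\mathbf{H}$ is denoted by $\mathbf{h}_j^T$, then $\mathbf{H}\mathbf{x}_i$ can be written as
$$\mathbf{H}\mathbf{x}_i = \begin{bmatrix} \mathbf{h}_1^T\mathbf{x}_i \\ \mathbf{h}_2^T\mathbf{x}_i \\ \mathbf{h}_3^T\mathbf{x}_i \end{bmatrix}.$$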
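If $\mathbf{x}_i' = (x_i', y_i', w_i')^T$, the cross product becomes
$$\mathbf{x}_i' \times \mathbf{H}\mathbf{x}_i = \begin{bmatrix} y_i'\mathbf{h}_3^T\mathbf{x}_i - w_i'\mathbf{h}_2^T\mathbf{x}_i \\ w_i'\mathbf{h}_1^T\mathbf{x}_i - x_i'\mathbf{h}_3^T\mathbf{x}_i \\ x_i'\mathbf{h}_2^T\mathbf{x}_i - y_i'\mathbf{h}_1^T\mathbf{x}_i \end{bmatrix} = \mathbf{0}.$$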
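Since $\mathbf{h}_j^T\mathbf{x}_i$ is a scalar, it is equal to its transpose, i.e., $\mathbf{h}_j^T\mathbf{x}_i = \mathbf{x}_i^T\mathbf{h}_j$ for $j = 1, 2, 3$. This gives the set of three equations (5.5):
$$\begin{bmatrix} \mathbf{0}^T & -w_i'\mathbf{x}_i^T & y_i'\mathbf{x}_i^T \\ w_i'\mathbf{x}_i^T & \mathbf{0}^T & -x_i'\mathbf{x}_i^T \\ -y_i'\mathbf{x}_i^T & x_i'\mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix} = \mathbf{0} \qquad (5.5)$$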
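The equations now have the form $\mathbf{A}_i\mathbf{h} = \mathbf{0}$, where $\mathbf{A}_i$ is a 3×9 matrix and $\mathbf{h}$ is the 9-vector of the homography elements. Therefore, once $\mathbf{h}$ is found, $\mathbf{H}$ is also determined:
$$\mathbf{h} = \begin{bmatrix} h_1 & h_2 & \cdots & h_9 \end{bmatrix}^T, \qquad \mathbf{H} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \qquad (5.6)$$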
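Although (5.5) contains three equations, only two of them are linearly independent: $x_i'$ times the first row plus $y_i'$ times the second row is a multiple of the third row.
$$x_i' \text{ times row 1}: \quad \begin{bmatrix} \mathbf{0}^T & -x_i'w_i'\mathbf{x}_i^T & x_i'y_i'\mathbf{x}_i^T \end{bmatrix}$$
$$y_i' \text{ times row 2}: \quad \begin{bmatrix} y_i'w_i'\mathbf{x}_i^T & \mathbf{0}^T & -y_i'x_i'\mathbf{x}_i^T \end{bmatrix}$$
$$\text{Sum}: \quad \begin{bmatrix} y_i'w_i'\mathbf{x}_i^T & -x_i'w_i'\mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix}$$
Factoring $-w_i'$ out of the sum gives exactly the third row of equation (5.5), $\begin{bmatrix} -y_i'\mathbf{x}_i^T & x_i'\mathbf{x}_i^T & \mathbf{0}^T \end{bmatrix}$. Therefore, equation (5.5) reduces to equation (5.7).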
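$$\mathbf{A}_i\mathbf{h} = \begin{bmatrix} \mathbf{0}^T & -w_i'\mathbf{x}_i^T & y_i'\mathbf{x}_i^T \\ w_i'\mathbf{x}_i^T & \mathbf{0}^T & -x_i'\mathbf{x}_i^T \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix} = \mathbf{0} \qquad (5.7)$$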
Solving equation (5.7) gives the homography. The Direct Linear Transformation algorithm [10] can be summarized as follows.
i) For each matched pair of points $\mathbf{x}_i \leftrightarrow \mathbf{x}_i'$, compute the 2×9 matrix $\mathbf{A}_i$ of equation (5.7).
ii) Stack the $n$ matrices $\mathbf{A}_i$ of the $n$ correspondences into a 2n×9 matrix $\mathbf{A}$.
iii) Compute the singular value decomposition of $\mathbf{A}$. The unit singular vector corresponding to the smallest singular value is the solution $\mathbf{h}$: if $\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T$ with the singular values in $\mathbf{D}$ in descending order, $\mathbf{h}$ is the last column of $\mathbf{V}$.
iv) Construct $\mathbf{H}$ from $\mathbf{h}$ using equation (5.6).
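Steps i) to iv) translate almost line by line into NumPy. The sketch below is a minimal, unnormalized DLT, assuming the correspondences are given as homogeneous 3-vectors; the function name is hypothetical. For noisy real matches, Hartley and Zisserman recommend normalizing the points (moving their centroid to the origin and scaling them) before building $\mathbf{A}$, which is omitted here for brevity.

```python
import numpy as np

def dlt_homography(x, x_prime):
    """Estimate H (up to scale) with x'_i ~ H x_i from n >= 4 correspondences.

    x, x_prime: (n, 3) arrays of homogeneous points; rows of x_prime are
    (x'_i, y'_i, w'_i).
    """
    rows = []
    zero = np.zeros(3)
    for xi, (xp, yp, wp) in zip(x, x_prime):
        # The two independent rows of A_i from equation (5.7).
        rows.append(np.concatenate([zero, -wp * xi, yp * xi]))
        rows.append(np.concatenate([wp * xi, zero, -xp * xi]))
    A = np.array(rows)                  # 2n x 9
    _, _, Vt = np.linalg.svd(A)         # singular values in descending order
    h = Vt[-1]                          # last column of V = last row of V^T
    return h.reshape(3, 3)              # equation (5.6), row by row

# Synthetic check with a hypothetical ground-truth homography.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [0.01, 0.02, 1.0]])
rng = np.random.default_rng(1)
pts = np.hstack([rng.uniform(0.0, 10.0, (6, 2)), np.ones((6, 1))])
pts_prime = pts @ H_true.T              # x'_i = H x_i (third component != 1)
H_est = dlt_homography(pts, pts_prime)
H_est = H_est / H_est[2, 2]             # remove the arbitrary scale
```

Because $\mathbf{h}$ is only defined up to scale (and sign), the estimate is divided by one of its elements before it is compared with the ground truth.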
This algorithm was implemented in Matlab, and it gives the following 3×3 homography for the images in figure 5.9:
$$\mathbf{H} = \begin{bmatrix} -0.0009 & -0.0021 & 0.6174 \\ 0.0030 & -0.0013 & -0.7867 \\ 0.0000 & -0.0000 & -0.0020 \end{bmatrix}.$$
The correctness of this homography matrix can be verified as follows:
* Pick a specific point in the left image and find its coordinates ($\mathbf{x}_i$).
* Find the same point in the right image and its coordinates ($\mathbf{x}_i'$).
* If the two points are related by the estimated homography, the software is working correctly.
Let us check the homography this way. First, find the coordinates of the upper right corner of the letter "I" in the word "BASMATI" in the left image; then find the same corner in the right image. This is illustrated in figures 5.10 and 5.11.
Figure 5.10 A specific point in the left image
Figure 5.11 Same specific point in the right image
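The coordinates of this point are $\mathbf{x}_i = [336, 213, 1]^T$ in the left image and $\mathbf{x}_i' = [97, 34, 1]^T$ in the right image, and
$$\mathbf{x}_i' = \begin{bmatrix} 97 \\ 34 \\ 1 \end{bmatrix} \simeq \mathbf{H}\mathbf{x}_i = \begin{bmatrix} -0.0009 & -0.0021 & 0.6174 \\ 0.0030 & -0.0013 & -0.7867 \\ 0.0000 & -0.0000 & -0.0020 \end{bmatrix} \begin{bmatrix} 336 \\ 213 \\ 1 \end{bmatrix} \simeq \begin{bmatrix} 97.0847 \\ 34.4144 \\ 1 \end{bmatrix},$$
where $\simeq$ denotes equality up to scale (the result of $\mathbf{H}\mathbf{x}_i$ is divided by its third component). This indicates that the homography has been estimated correctly. Note that the elements of H are rounded here, so a hand calculation of $\mathbf{H}\mathbf{x}_i$ with these values does not reproduce the Matlab result exactly.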
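The check above amounts to mapping $\mathbf{x}_i$ through $\mathbf{H}$, dividing by the third component and measuring the distance to $\mathbf{x}_i'$. A small helper for this, assuming NumPy (function names hypothetical), is sketched below; the example uses a made-up homography with a negative overall scale, similar to the negative $h_{33}$ above, to show that the scale does not matter after normalization.

```python
import numpy as np

def transfer(H, x):
    """Map homogeneous point x through H and rescale so the last entry is 1."""
    y = H @ np.asarray(x, dtype=float)
    return y / y[2]

def transfer_error(H, x, x_prime):
    """Pixel distance between x' and the transferred point H x."""
    x_prime = np.asarray(x_prime, dtype=float)
    return np.linalg.norm(transfer(H, x)[:2] - x_prime[:2] / x_prime[2])

# Made-up homography: scaling by 2 plus a shift of (10, 20), times -0.002.
H = -0.002 * np.array([[2.0, 0.0, 10.0],
                       [0.0, 2.0, 20.0],
                       [0.0, 0.0, 1.0]])
p = transfer(H, [1.0, 1.0, 1.0])                               # -> [12, 22, 1]
err = transfer_error(H, [1.0, 1.0, 1.0], [12.0, 22.0, 1.0])    # -> 0
```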
5.2. Motion Model of a Mobile Robot
The system to be controlled is a mobile robot with nonholonomic motion constraints. The nonholonomic constraints are due to the wheels: the mobile robot cannot move sideways, as shown in figure 5.12.
Figure 5.12 Nonholonomic constraint for a mobile robot
The nonholonomic constraints allow rolling but not slipping. In general, a nonholonomic mechanical system cannot move in arbitrary directions in its configuration space. Holonomic constraints can be written as equations independent of the velocities $\dot{q}$, in the form $f(q, t) = 0$, where $q$ stands for the generalized coordinates. Nonholonomic constraints, however, cannot be written only in terms of the generalized coordinates, as they also depend on the time derivatives of the generalized coordinates and cannot be integrated into the form $f(q, t) = 0$. A nonholonomic mobile robot model can be represented by the following state and output equations:
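$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}) \qquad (5.8)$$
$$\mathbf{y} = \mathbf{h}(\mathbf{x}) \qquad (5.9)$$
where $\mathbf{x}$ denotes the state vector, $\mathbf{u}$ the input vector and $\mathbf{y}$ the output vector. The inputs are the forward velocity $v$ and the angular velocity $w$.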
The coordinate system used is shown in figure 5.13.
Figure 5.13 Coordinate System
Two coordinate frames must be specified to avoid ambiguity: the coordinate frame attached to the mobile robot and the world coordinate frame. When the robot reaches its target pose, the coordinate frame attached to the robot may differ from the world coordinate frame. However, without loss of generality, the world coordinate frame can be chosen to coincide with the robot coordinate frame at the target pose.
Since the robot moves in the x-z plane, the state vector can be defined as $\mathbf{x} = [x\ z\ \phi]^T$. $x$ and $z$ give the position of the mobile robot with respect to the world coordinate frame, and $\phi$ is the orientation of the robot, i.e., the angle between the z axis of the frame attached to the mobile robot and the z axis of the world frame. Because the world coordinate frame coincides with the robot coordinate frame at the target pose, all state variables become zero when the mobile robot reaches the target. The state equations can now be written explicitly as
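$$\begin{bmatrix} \dot{x} \\ \dot{z} \\ \dot{\phi} \end{bmatrix} = \begin{bmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{bmatrix} v + \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} w \qquad (5.10)$$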
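For simulation, equation (5.10) can be integrated numerically. The sketch below uses a simple explicit Euler step (an assumed choice; any ODE integrator would do) with a hypothetical step size, and the function names are illustrative. It also shows the nonholonomic constraint at work: the displacement in one step is always along the heading direction $(-\sin\phi, \cos\phi)$, never sideways.

```python
import numpy as np

def robot_step(state, v, w, dt):
    """One explicit Euler step of (5.10); state = (x, z, phi)."""
    x, z, phi = state
    return np.array([x - np.sin(phi) * v * dt,
                     z + np.cos(phi) * v * dt,
                     phi + w * dt])

def simulate(state, v, w, dt, steps):
    """Apply constant inputs (v, w) for a number of Euler steps."""
    traj = [np.asarray(state, dtype=float)]
    for _ in range(steps):
        traj.append(robot_step(traj[-1], v, w, dt))
    return np.array(traj)

# Heading phi = 0 and v = 0.5: the robot drives along +z towards the target.
traj = simulate([0.0, -2.0, 0.0], v=0.5, w=0.0, dt=0.01, steps=100)

# The displacement has no component perpendicular to the heading.
s0 = np.array([0.3, -1.0, 0.7])
step = robot_step(s0, 0.4, 0.1, 0.05)[:2] - s0[:2]
perpendicular = np.array([np.cos(s0[2]), np.sin(s0[2])])   # orthogonal to (-sin, cos)
```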
To define the output vector, a homography between two images (the current and the target image) must be found, since the outputs of the system are chosen among the homography elements. The homography is related to the camera motion by
$$\mathbf{H} = \mathbf{K}\mathbf{R}\left(\mathbf{I} + \frac{\mathbf{c}\,\mathbf{n}^{*T}}{d^*}\right)\mathbf{K}^{-1}. \qquad (5.11)$$
$\mathbf{R}$ and $\mathbf{c}$ are the rotation matrix and the translation vector between the current and target poses, and $\mathbf{K}$ is the internal camera calibration matrix. In practice, the following assumptions are made: the robot moves on a planar surface without irregularities, the principal point coordinates are (0, 0) and the pixels have no skew, so that $\mathbf{K} = \mathrm{diag}(\alpha_x, \alpha_y, 1)$.
The rotation matrix can be derived by translating the origin of the target frame to the origin of the current frame and examining the relationship between the two coordinate frames.
Figure 5.14 Target frame (x, y, z) and current frame (x′, y′, z′)
The target frame and the current frame are shown in figure 5.14. The y and y′ axes are not shown because, by the right-hand rule, they are perpendicular to the page. For the current frame to coincide with the target frame, it must be rotated by −φ, i.e., by φ in the clockwise direction (in the convention used, counterclockwise rotations are positive, as shown in figure 5.13). Thus, when their origins coincide, the current and target coordinates are related by
$$x = -z'\sin(-\phi) + x'\cos(-\phi) = x'\cos\phi + z'\sin\phi$$
$$y = y'$$
$$z = z'\cos(-\phi) + x'\sin(-\phi) = -x'\sin\phi + z'\cos\phi$$
Writing these equations in matrix form gives the rotation matrix $\mathbf{R}$ in equation (5.12).
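$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos\phi & 0 & \sin\phi \\ 0 & 1 & 0 \\ -\sin\phi & 0 & \cos\phi \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \mathbf{R} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} \qquad (5.12)$$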
The translation vector between the target and current frames is
$$\mathbf{c} = \begin{bmatrix} x \\ 0 \\ z \end{bmatrix}. \qquad (5.13)$$
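The y coordinate is always zero because the robot moves in the x-z plane. Using equation (5.11), the homography between the target and current images is obtained as
$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$
where, with $d_\pi$ denoting the distance $d^*$ of the plane from the origin of the target frame,
$$h_{11} = \cos\phi + (x\cos\phi + z\sin\phi)\frac{n_x}{d_\pi}$$
$$h_{12} = \frac{\alpha_x}{\alpha_y}(x\cos\phi + z\sin\phi)\frac{n_y}{d_\pi}$$
$$h_{13} = \alpha_x\left[\sin\phi + (x\cos\phi + z\sin\phi)\frac{n_z}{d_\pi}\right]$$
$$h_{21} = 0, \qquad h_{22} = 1, \qquad h_{23} = 0$$
$$h_{31} = \frac{1}{\alpha_x}\left[-\sin\phi + (-x\sin\phi + z\cos\phi)\frac{n_x}{d_\pi}\right]$$
$$h_{32} = \frac{1}{\alpha_y}(-x\sin\phi + z\cos\phi)\frac{n_y}{d_\pi}$$
$$h_{33} = \cos\phi + (-x\sin\phi + z\cos\phi)\frac{n_z}{d_\pi}.$$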
$h_{21}$, $h_{22}$ and $h_{23}$ carry no information, because the planar motion constraint makes them constant. $h_{31}$ and $h_{32}$ are discarded because $\alpha_x$ and $\alpha_y$ appear in their denominators, which makes their magnitudes small and makes them more sensitive to noise than the other homography elements. In monocular systems, planes in front of the camera with a dominant $n_z$ component are detected more easily [9], so $h_{13}$ and $h_{33}$, which depend on $n_z$, are chosen among the remaining elements. Therefore, the output vector is defined as
$$\mathbf{y} = \begin{bmatrix} h_{13} \\ h_{33} \end{bmatrix}.$$
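The element list can be checked against a direct evaluation of (5.11). The NumPy sketch below assumes the simplifications stated above ($\mathbf{K} = \mathrm{diag}(\alpha_x, \alpha_y, 1)$, $\mathbf{c} = [x, 0, z]^T$, $\mathbf{R}$ from (5.12)); the function names and the numeric values of $\alpha_x$, $\alpha_y$, $\mathbf{n}$ and $d_\pi$ are hypothetical.

```python
import numpy as np

def rotation_y(phi):
    """Rotation matrix R of equation (5.12)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def homography(x, z, phi, ax, ay, n, d_pi):
    """Equation (5.11) with K = diag(ax, ay, 1) and c = (x, 0, z)."""
    K = np.diag([ax, ay, 1.0])
    c_vec = np.array([x, 0.0, z])
    M = rotation_y(phi) @ (np.eye(3) + np.outer(c_vec, n) / d_pi)
    return K @ M @ np.linalg.inv(K)

def outputs(x, z, phi, ax, nz, d_pi):
    """Closed-form outputs y = (h13, h33)."""
    h13 = ax * (np.sin(phi) + (x * np.cos(phi) + z * np.sin(phi)) * nz / d_pi)
    h33 = np.cos(phi) + (-x * np.sin(phi) + z * np.cos(phi)) * nz / d_pi
    return np.array([h13, h33])

ax, ay = 500.0, 480.0                 # hypothetical focal lengths in pixels
n = np.array([0.1, 0.05, 0.99])       # hypothetical plane normal
d_pi = 3.0                            # hypothetical plane distance
H = homography(0.3, -1.5, 0.2, ax, ay, n, d_pi)
y = outputs(0.3, -1.5, 0.2, ax, n[2], d_pi)
```

At the target pose ($x = z = \phi = 0$) the homography reduces to the identity, so the desired outputs are $h_{13} = 0$ and $h_{33} = 1$.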
5.3. Input-Output Linearization and Control Law
The approach employed here navigates the mobile robot by controlling the elements of the homography. The visual servo control problem is thus converted into a tracking problem: the actual homography elements should follow desired trajectories of the homography elements during the motion. The model relating the inputs and the outputs is nonlinear. It is linearized by differentiating the homography elements until the control inputs appear. Before proceeding with the input-output linearization and the derivation of the control law, we show that the system is controllable.
The state dynamics of the mobile robot allow the system to be written in the control-affine form
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) + \sum_{i=1}^{m}\mathbf{g}_i(\mathbf{x})\,u_i \qquad (5.14)$$
where the $u_i$ are the inputs.
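The state dynamics of the mobile robot are given by equation (5.10). Comparing equations (5.14) and (5.10) gives
$$\mathbf{f}(\mathbf{x}) = \mathbf{0}, \qquad m = 2,$$
$$u_1 = v, \quad \mathbf{g}_1 = \begin{bmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{bmatrix}, \qquad u_2 = w, \quad \mathbf{g}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$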
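Since $m = 2$, the accessibility distribution $\mathbf{C}$ is formed by $\mathbf{g}_1$, $\mathbf{g}_2$ and their Lie bracket:
$$\mathbf{C} = \begin{bmatrix} \mathbf{g}_1 & \mathbf{g}_2 & [\mathbf{g}_1, \mathbf{g}_2] \end{bmatrix}.$$
$[\mathbf{g}_1, \mathbf{g}_2]$ denotes the Lie bracket, defined here as
$$[\mathbf{g}_1, \mathbf{g}_2] \equiv \frac{\partial \mathbf{g}_1}{\partial \mathbf{x}}\mathbf{g}_2 - \frac{\partial \mathbf{g}_2}{\partial \mathbf{x}}\mathbf{g}_1, \qquad \mathbf{x} = \begin{bmatrix} x \\ z \\ \phi \end{bmatrix},$$
where $\mathbf{x}$ is the state vector (some texts use the opposite sign convention, which does not affect the rank of $\mathbf{C}$).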
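The accessibility distribution is obtained as follows:
$$\frac{\partial \mathbf{g}_1}{\partial \mathbf{x}}\mathbf{g}_2 = \begin{bmatrix} 0 & 0 & -\cos(\phi) \\ 0 & 0 & -\sin(\phi) \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\cos(\phi) \\ -\sin(\phi) \\ 0 \end{bmatrix}, \qquad \frac{\partial \mathbf{g}_2}{\partial \mathbf{x}}\mathbf{g}_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -\sin(\phi) \\ \cos(\phi) \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$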
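$$[\mathbf{g}_1, \mathbf{g}_2] = \begin{bmatrix} -\cos(\phi) \\ -\sin(\phi) \\ 0 \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} -\sin(\phi) & 0 & -\cos(\phi) \\ \cos(\phi) & 0 & -\sin(\phi) \\ 0 & 1 & 0 \end{bmatrix}.$$
Since $\det(\mathbf{C}) = -\sin^2(\phi) - \cos^2(\phi) = -1 \neq 0$ for every $\phi$, rank($\mathbf{C}$) is equal to 3, the number of states, so the system is controllable [22].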
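The rank condition can also be checked numerically. The sketch below, assuming NumPy, evaluates the Lie bracket with a central-difference Jacobian (so it does not rely on the hand-computed derivatives) and with the bracket convention used in the text; the function names and the step size are hypothetical choices.

```python
import numpy as np

def g1(q):
    """Input vector field of v in (5.10); q = (x, z, phi)."""
    return np.array([-np.sin(q[2]), np.cos(q[2]), 0.0])

def g2(q):
    """Input vector field of w in (5.10)."""
    return np.array([0.0, 0.0, 1.0])

def jacobian(f, q, eps=1e-6):
    """Central-difference Jacobian of the vector field f at q."""
    q = np.asarray(q, dtype=float)
    cols = []
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        cols.append((f(q + dq) - f(q - dq)) / (2.0 * eps))
    return np.column_stack(cols)

def lie_bracket(f, g, q):
    """[f, g] = (df/dq) g - (dg/dq) f, the convention used in the text."""
    return jacobian(f, q) @ g(q) - jacobian(g, q) @ f(q)

def accessibility_matrix(q):
    """C = [g1, g2, [g1, g2]] evaluated at the state q."""
    return np.column_stack([g1(q), g2(q), lie_bracket(g1, g2, q)])

q = np.array([0.2, -1.0, 0.6])
C = accessibility_matrix(q)
print(np.linalg.matrix_rank(C), np.linalg.det(C))   # rank 3, det close to -1
```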
5.3.1. Input-Output Linearization
Input-output linearization is a common way of designing controllers for nonlinear systems. In this section, the outputs are differentiated until the inputs appear linearly in their derivatives. Note that the normal vector $\mathbf{n}$ of the plane that induces the homography and the distance $d_\pi$ between that plane and the origin of the target frame are constant, so their time derivatives are zero.
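Time derivative of $h_{13}$:
$$h_{13} = \alpha_x\left[\sin\phi + (x\cos\phi + z\sin\phi)\frac{n_z}{d_\pi}\right]$$
$$\dot{h}_{13} = \alpha_x\left[\cos(\phi)\dot{\phi} + \left(\dot{x}\cos\phi + \dot{z}\sin\phi - x\sin(\phi)\dot{\phi} + z\cos(\phi)\dot{\phi}\right)\frac{n_z}{d_\pi}\right]$$
The state equations simplify this expression:
$$\dot{x} = -\sin(\phi)v \;\Rightarrow\; \dot{x}\cos\phi = -\sin\phi\cos\phi\, v$$
$$\dot{z} = \cos(\phi)v \;\Rightarrow\; \dot{z}\sin\phi = \sin\phi\cos\phi\, v$$
$$\dot{x}\cos\phi + \dot{z}\sin\phi = -\sin\phi\cos\phi\, v + \sin\phi\cos\phi\, v = 0$$
Therefore,
$$\dot{h}_{13} = \alpha_x\left[\cos(\phi)\dot{\phi} + (-x\sin\phi + z\cos\phi)\dot{\phi}\frac{n_z}{d_\pi}\right] = \alpha_x\dot{\phi}\left[\cos\phi + (-x\sin\phi + z\cos\phi)\frac{n_z}{d_\pi}\right] = \alpha_x h_{33} w,$$
since $w = \dot{\phi}$.
The first time derivative of $h_{13}$ already depends linearly on the inputs, so the relative degree of this output is 1 and no further differentiation of $h_{13}$ is needed.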
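Time derivative of $h_{33}$:
$$h_{33} = \cos\phi + (-x\sin\phi + z\cos\phi)\frac{n_z}{d_\pi}$$
$$\dot{h}_{33} = -\sin(\phi)\dot{\phi} + \left(-\dot{x}\sin\phi + \dot{z}\cos\phi - x\cos(\phi)\dot{\phi} - z\sin(\phi)\dot{\phi}\right)\frac{n_z}{d_\pi}$$
The state equations again simplify this expression:
$$\dot{x} = -\sin(\phi)v \;\Rightarrow\; -\dot{x}\sin\phi = \sin^2(\phi)\,v$$
$$\dot{z} = \cos(\phi)v \;\Rightarrow\; \dot{z}\cos\phi = \cos^2(\phi)\,v$$
$$-\dot{x}\sin\phi + \dot{z}\cos\phi = \sin^2(\phi)\,v + \cos^2(\phi)\,v = v$$
Therefore,
$$\dot{h}_{33} = \frac{n_z}{d_\pi}v - w\left[\sin\phi + \frac{n_z}{d_\pi}(x\cos\phi + z\sin\phi)\right] = \frac{n_z}{d_\pi}v - \frac{h_{13}}{\alpha_x}w.$$
The first time derivative of $h_{33}$ also depends linearly on the inputs, so the relative degree of this output is 1 as well, and no further differentiation of $h_{33}$ is needed.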
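Both derivative formulas can be verified numerically by differentiating the closed-form outputs along the vector field of (5.10). The NumPy sketch below does this with a central difference in time; the function names, the state, the inputs and the camera/plane values are hypothetical.

```python
import numpy as np

def outputs(state, ax, nz, d_pi):
    """Closed-form (h13, h33) for state = (x, z, phi)."""
    x, z, phi = state
    h13 = ax * (np.sin(phi) + (x * np.cos(phi) + z * np.sin(phi)) * nz / d_pi)
    h33 = np.cos(phi) + (-x * np.sin(phi) + z * np.cos(phi)) * nz / d_pi
    return np.array([h13, h33])

def state_rate(state, v, w):
    """Right-hand side of (5.10)."""
    phi = state[2]
    return np.array([-np.sin(phi) * v, np.cos(phi) * v, w])

def output_rates_numeric(state, v, w, ax, nz, d_pi, eps=1e-6):
    """d/dt (h13, h33) along (5.10), by a central difference."""
    s = np.asarray(state, dtype=float)
    f = state_rate(s, v, w)
    return (outputs(s + eps * f, ax, nz, d_pi)
            - outputs(s - eps * f, ax, nz, d_pi)) / (2.0 * eps)

def output_rates_analytic(state, v, w, ax, nz, d_pi):
    """dh13/dt = ax h33 w and dh33/dt = (nz / d_pi) v - (h13 / ax) w."""
    h13, h33 = outputs(state, ax, nz, d_pi)
    return np.array([ax * h33 * w, nz / d_pi * v - h13 / ax * w])

args = ([0.4, -2.0, 0.3], 0.2, -0.1, 500.0, 0.9, 3.0)   # hypothetical values
numeric = output_rates_numeric(*args)
analytic = output_rates_analytic(*args)
```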
5.3.2. Control Law
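After the first time derivatives of the outputs are taken, a linear relationship between the output derivatives and the inputs is obtained. In matrix form, it defines the decoupling matrix $\mathbf{L}$:
$$\begin{bmatrix} \dot{h}_{13} \\ \dot{h}_{33} \end{bmatrix} = \begin{bmatrix} 0 & \alpha_x h_{33} \\ \frac{n_z}{d_\pi} & -\frac{h_{13}}{\alpha_x} \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} = \mathbf{L}\begin{bmatrix} v \\ w \end{bmatrix}. \qquad (5.15)$$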
The error system should be designed so that both the tracking error and its time derivative converge to zero. For example, the error dynamics $\dot{e} + ke = 0$ have a pole at $-k$ in the left half plane for $k > 0$, so $e(t) = e(0)e^{-kt}$ and its time derivative decay to zero exponentially. To achieve this, the following definitions are made (the superscript $d$ stands for "desired"):
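$$\boldsymbol{e} = \begin{bmatrix}e_1\\ e_2\end{bmatrix} = \begin{bmatrix}h_{13}^d - h_{13}\\ h_{33}^d - h_{33}\end{bmatrix},\quad \dot{\boldsymbol{e}} = \begin{bmatrix}\dot e_1\\ \dot e_2\end{bmatrix} = \begin{bmatrix}\dot h_{13}^d - \dot h_{13}\\ \dot h_{33}^d - \dot h_{33}\end{bmatrix}\quad\text{and}\quad \boldsymbol{k} = \begin{bmatrix}k_{13} & 0\\ 0 & k_{33}\end{bmatrix} \tag{5.16}$$

$$\begin{bmatrix}\dot h_{13}^d - \dot h_{13}\\ \dot h_{33}^d - \dot h_{33}\end{bmatrix} + \begin{bmatrix}k_{13} & 0\\ 0 & k_{33}\end{bmatrix}\begin{bmatrix}h_{13}^d - h_{13}\\ h_{33}^d - h_{33}\end{bmatrix} = \begin{bmatrix}0\\ 0\end{bmatrix} \tag{5.17}$$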
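Rearranging equation (5.17) gives equation (5.18):

$$\begin{bmatrix}\dot h_{13}\\ \dot h_{33}\end{bmatrix} = \begin{bmatrix}\dot h_{13}^d + k_{13}(h_{13}^d - h_{13})\\ \dot h_{33}^d + k_{33}(h_{33}^d - h_{33})\end{bmatrix} \tag{5.18}$$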
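Here, $k_{13}$ and $k_{33}$ are positive control gains. Equating the right-hand sides of equations (5.15) and (5.18) allows the control signal to be solved for:

$$\boldsymbol{L}\begin{bmatrix}v\\ w\end{bmatrix} = \begin{bmatrix}\dot h_{13}^d + k_{13}(h_{13}^d - h_{13})\\ \dot h_{33}^d + k_{33}(h_{33}^d - h_{33})\end{bmatrix} \tag{5.19}$$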
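Multiplying both sides of equation (5.19) from the left by $\boldsymbol{L}^{-1}$ gives the control signal:

$$\begin{bmatrix}v\\ w\end{bmatrix} = \boldsymbol{L}^{-1}\begin{bmatrix}\dot h_{13}^d + k_{13}(h_{13}^d - h_{13})\\ \dot h_{33}^d + k_{33}(h_{33}^d - h_{33})\end{bmatrix} = \begin{bmatrix}\dfrac{h_{13}\,d_\pi}{\alpha_x^2 h_{33}\,n_z} & \dfrac{d_\pi}{n_z}\\ \dfrac{1}{\alpha_x h_{33}} & 0\end{bmatrix}\begin{bmatrix}\dot h_{13}^d + k_{13}(h_{13}^d - h_{13})\\ \dot h_{33}^d + k_{33}(h_{33}^d - h_{33})\end{bmatrix} \tag{5.20}$$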
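A direct transcription of (5.15) and (5.20) is sketched below in Python; the report's own implementation was written in Matlab and is not reproduced here. The function and argument names are hypothetical, `nz_over_d` stands for the (estimated) ratio $n_z/d_\pi$, and the explicit check mirrors the invertibility requirement $\det(\boldsymbol{L}) \neq 0$.

```python
def decoupling_matrix(h13, h33, alpha_x, nz_over_d):
    """Decoupling matrix L of (5.15): [dh13, dh33]^T = L [v, w]^T."""
    return [[0.0, alpha_x * h33],
            [nz_over_d, -h13 / alpha_x]]


def control_signal(h13, h33, h13_d, h33_d, dh13_d, dh33_d,
                   alpha_x, nz_over_d, k13=1.0, k33=1.0):
    """Linear and angular velocity (v, w) from (5.20)."""
    if h33 == 0.0 or nz_over_d == 0.0:
        raise ValueError("decoupling matrix L is singular")
    # Right-hand side of (5.19): desired derivative plus proportional error feedback.
    r1 = dh13_d + k13 * (h13_d - h13)
    r2 = dh33_d + k33 * (h33_d - h33)
    # Closed-form inverse: L^{-1} = [[h13 / (alpha_x^2 h33 nz_over_d), 1 / nz_over_d],
    #                                [1 / (alpha_x h33),               0            ]]
    v = h13 / (alpha_x ** 2 * h33 * nz_over_d) * r1 + r2 / nz_over_d
    w = r1 / (alpha_x * h33)
    return v, w
```

Using the closed-form inverse instead of a generic linear solver keeps the only possible singularities ($h_{33} = 0$ or $n_z/d_\pi = 0$) explicit. Multiplying `decoupling_matrix(...)` by the returned `(v, w)` reproduces the right-hand side of (5.19).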
For the control signal to be well defined, the decoupling matrix must be invertible, i.e. $\det(\boldsymbol{L}) \neq 0$. To investigate the situations that can cause a singularity, the determinant of the decoupling matrix is analyzed:

$$\det(\boldsymbol{L}) = -\alpha_x h_{33}\frac{n_z}{d_\pi} \tag{5.21}$$
Here, $\alpha_x$ denotes the focal length in pixels in the x direction, so it is not zero, and since the plane that generates the homography is at a finite distance from the target position, $d_\pi \neq \infty$. The plane must also be seen by the camera, which makes $n_z \neq 0$. The only remaining possibility that can make the determinant of the decoupling matrix zero is therefore $h_{33} = 0$, where

$$h_{33} = \cos\phi + \left(-x\sin\phi + z\cos\phi\right)\frac{n_z}{d_\pi}.$$

It must be shown that $h_{33}$ never becomes zero, in order to avoid a singularity in the control law. The target is in front of the mobile robot, so $z < 0$ (in the assigned target coordinate frame) until the robot reaches the target pose. When the robot is at the desired pose, $x$, $z$ and $\phi$ become zero and $h_{33}$ becomes one. There are also constraints on the orientation of the robot: for the robot to see the target scene fully or partially, $-\pi/2 < \phi < \pi/2$ must be satisfied. Otherwise, the robot would see a scene unrelated to the target scene, and it would not be possible to construct a meaningful control signal. The constraint $-\pi/2 < \phi < \pi/2$ ensures that $\cos\phi > 0$. Besides, $n_z$ must be negative with respect to the target coordinate frame, since the plane that produces the homography is visible to the camera. Therefore, $z\cos\phi\,\frac{n_z}{d_\pi} > 0$, and it follows that if

$$\cos\phi + z\cos\phi\,\frac{n_z}{d_\pi} > \left|x\sin\phi\,\frac{n_z}{d_\pi}\right|,$$

then $h_{33} > 0$. This inequality requires the lateral distance to be compensated to be smaller than the depth error. Because of the camera field-of-view constraint, the depth error dominates the lateral error in the sense that $|z|\cos\phi > |x\sin\phi|$, which is sufficient for the inequality above to hold. As a result, the determinant of the decoupling matrix is never zero in the workspace, and the control signal can be constructed without facing any singularity.
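The sufficient condition can be probed numerically. The Python sketch below evaluates $h_{33}$ on a grid of poses with $z < 0$, $|\phi| < \pi/2$ and $|z|\cos\phi > |x\sin\phi|$ and returns the smallest value found. The grid limits and the values of $n_z/d_\pi$ are arbitrary illustrative assumptions, and a grid check is evidence, not a proof.

```python
import math


def h33(x, z, phi, nz_over_d):
    """Homography element h33 of the planar robot model."""
    return math.cos(phi) + (-x * math.sin(phi) + z * math.cos(phi)) * nz_over_d


def min_h33_on_grid(nz_over_d, n=41, x_max=20.0, z_max=40.0):
    """Smallest h33 over grid poses with z < 0, |phi| < pi/2 and |z| cos(phi) > |x sin(phi)|."""
    smallest = math.inf
    for i in range(n):
        x = -x_max + 2.0 * x_max * i / (n - 1)
        for j in range(1, n):
            z = -z_max * j / (n - 1)                          # strictly negative depth
            for k in range(1, n - 1):
                phi = -math.pi / 2 + math.pi * k / (n - 1)    # open interval (-pi/2, pi/2)
                if abs(z) * math.cos(phi) > abs(x * math.sin(phi)):
                    smallest = min(smallest, h33(x, z, phi, nz_over_d))
    return smallest
```

With a negative $n_z/d_\pi$ the minimum stays positive, whereas flipping the sign of $n_z/d_\pi$ (a plane the camera could not see) quickly produces negative values of $h_{33}$, which illustrates why the sign of $n_z$ matters in the argument above.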
5.3.3. Desired Trajectories of the Homography Elements
The control law (5.20) requires the desired trajectories of the homography elements, and the motion performed by the robot obviously depends on the choice of these trajectories ($h_{13}^d$, $h_{33}^d$). When the robot reaches the target pose, $[x\ z\ \phi]^T = [0\ 0\ 0]^T$, and the homography therefore becomes the identity matrix. This dictates that the final values of $h_{13}$ and $h_{33}$ must be 0 and 1, respectively. Several choices of desired trajectories for the homography elements have been proposed in the literature; two of the most important are those of [9] and [23]. The desired trajectories suggested by [23] are adopted in this project.
*Desired Trajectories:
The desired trajectory of $h_{13}$ is selected so that it corrects the lateral and orientation errors simultaneously, and the desired trajectory of $h_{33}$ is a sinusoid, a smooth function that converges to 1 and thereby removes the depth error.
Desired trajectory of $h_{13}$: Before choosing the desired trajectory of $h_{13}$, a condition on the initial configuration of the robot must be checked. This condition involves the current and target epipoles at the start of the motion: the sign of the product of the x coordinates of the current epipole ($e_{cx}$) and the target epipole ($e_{tx}$) at the beginning of the motion must be examined. Please refer to Appendix A for information about epipolar geometry and its relationship with mobile robot navigation. The desired trajectory of $h_{13}$ is analyzed in two cases.
Case 1: If $e_{cx}(0)\,e_{tx}(0) \le 0$, the desired trajectory is defined in two steps:

$$h_{13}^d(t) = \begin{cases} h_{13}(0)\,\dfrac{\psi(t)}{\psi(0)}, & 0 \le t \le T_2,\\ 0, & T_2 < t < \infty.\end{cases}$$
Case 2: If $e_{cx}(0)\,e_{tx}(0) > 0$, the desired trajectory is defined in three steps. The first step drives the robot to a suitable orientation, after which a smooth motion towards the target can be realized. The second and third steps are defined like the first and second steps of Case 1:

$$h_{13}^d(t) = \begin{cases} \dfrac{h_{13}(0) + h_{13}^d(T_1)}{2} + \dfrac{h_{13}(0) - h_{13}^d(T_1)}{2}\cos\left(\dfrac{\pi t}{T_1}\right), & 0 \le t \le T_1,\\ h_{13}(T_1)\,\dfrac{\psi(t)}{\psi(T_1)}, & T_1 < t \le T_2,\\ 0, & T_2 < t < \infty,\end{cases}$$

where $h_{13}^d(T_1) = -\frac{2}{3}h_{13}(0)$ and $T_1 < T_2$.
The first step is an intermediate step that should be completed within $T_1$. Here, $\psi$ is the angle of the straight line connecting the current position of the robot with the target position, defined in the target frame as shown in figure 5.13. $h_{13}^d$ is defined in terms of $\psi$ because the lateral and orientation errors are to be corrected together.
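Desired trajectory of $h_{33}$: The desired trajectory of $h_{33}$ is realized in two steps:

$$h_{33}^d(t) = \begin{cases} \dfrac{h_{33}(0) + 1}{2} + \dfrac{h_{33}(0) - 1}{2}\cos\left(\dfrac{\pi t}{T_2}\right), & 0 \le t \le T_2,\\ 1, & T_2 < t < \infty.\end{cases}$$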
The desired homography values are reached at $T_2$. The desired trajectories depend on the initial homography and the initial position. As the robot moves, the control law makes the realized homography elements track the desired trajectories defined above, which guarantees convergence to the target.
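The desired trajectories can be written as plain functions of time. The Python sketch below is one possible transcription, not the author's code. It assumes that $\psi(t)$ is supplied by the caller (for example by one of the methods described in Chapter 6) and, for Case 2, that the realized values $h_{13}(T_1)$ and $\psi(T_1)$ are recorded once $t = T_1$ is reached. The `case` argument (a hypothetical name) selects Case 1 or Case 2 from the sign test on $e_{cx}(0)\,e_{tx}(0)$.

```python
import math


def h33_desired(t, h33_0, T2):
    """Desired h33: half-cosine from h33(0) to 1 on [0, T2], then constant 1."""
    if t <= T2:
        return (h33_0 + 1.0) / 2.0 + (h33_0 - 1.0) / 2.0 * math.cos(math.pi * t / T2)
    return 1.0


def h13_desired(t, psi_t, h13_0, psi_0, T1, T2, case, h13_T1=None, psi_T1=None):
    """Desired h13 for Case 1 (e_cx(0)*e_tx(0) <= 0) or Case 2 (> 0).

    psi_t is psi at time t; psi_0 (Case 1) and psi_T1 (Case 2) must be non-zero.
    h13_T1 and psi_T1 are the realized values recorded at t = T1 (Case 2 only).
    """
    if case not in (1, 2):
        raise ValueError("case must be 1 or 2")
    if t > T2:
        return 0.0
    if case == 1:
        return h13_0 * psi_t / psi_0
    h13_d_T1 = -2.0 / 3.0 * h13_0              # intermediate value h13^d(T1)
    if t <= T1:
        return ((h13_0 + h13_d_T1) / 2.0
                + (h13_0 - h13_d_T1) / 2.0 * math.cos(math.pi * t / T1))
    if h13_T1 is None or psi_T1 is None:
        raise ValueError("Case 2 needs h13(T1) and psi(T1) for T1 < t <= T2")
    return h13_T1 * psi_t / psi_T1
```

The time derivatives required by (5.20) follow by differentiating the same expressions; for instance, $\dot h_{33}^d(t) = -\frac{\pi}{2T_2}\left(h_{33}(0) - 1\right)\sin(\pi t/T_2)$ for $0 \le t \le T_2$, while $\dot h_{13}^d$ in the $\psi$-dependent steps needs $\dot\psi$.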
5.4. Stability Analysis
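A candidate Lyapunov function for the error system is chosen as

$$V(\boldsymbol{x},t) = \frac{1}{2}\lVert\boldsymbol{e}\rVert^2, \quad\text{where}\quad \boldsymbol{e} = \begin{bmatrix}e_1\\ e_2\end{bmatrix} = \begin{bmatrix}h_{13}^d - h_{13}\\ h_{33}^d - h_{33}\end{bmatrix}. \tag{5.22}$$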
The Lyapunov candidate is positive everywhere in the error space except at the origin, where it is zero. It must now be shown that the time derivative of the Lyapunov function is zero at the origin of the error space and negative elsewhere. The following definitions are made for this analysis:
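$$\boldsymbol{D} = \begin{bmatrix}h_{13}^d\\ h_{33}^d\end{bmatrix},\quad \dot{\boldsymbol{D}} = \begin{bmatrix}\dot h_{13}^d\\ \dot h_{33}^d\end{bmatrix}\quad\text{and}\quad \boldsymbol{k} = \begin{bmatrix}k_{13} & 0\\ 0 & k_{33}\end{bmatrix}$$

$$\begin{aligned}\dot V(\boldsymbol{x},t) = \boldsymbol{e}^T\dot{\boldsymbol{e}} &= \boldsymbol{e}^T\begin{bmatrix}\dot h_{13}^d - \dot h_{13}\\ \dot h_{33}^d - \dot h_{33}\end{bmatrix} = \boldsymbol{e}^T\left(\begin{bmatrix}\dot h_{13}^d\\ \dot h_{33}^d\end{bmatrix} - \boldsymbol{L}\begin{bmatrix}v\\ w\end{bmatrix}\right) = \boldsymbol{e}^T\left(\dot{\boldsymbol{D}} - \boldsymbol{L}\boldsymbol{L}^{-1}\left(\dot{\boldsymbol{D}} + \boldsymbol{k}\boldsymbol{e}\right)\right)\\ &= \boldsymbol{e}^T\left(\boldsymbol{I} - \boldsymbol{L}\boldsymbol{L}^{-1}\right)\dot{\boldsymbol{D}} - \boldsymbol{e}^T\boldsymbol{L}\boldsymbol{L}^{-1}\boldsymbol{k}\,\boldsymbol{e}\end{aligned} \tag{5.23}$$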
In theory, $\boldsymbol{L}\boldsymbol{L}^{-1}$ equals the 2x2 identity matrix, so the first term of equation (5.23) vanishes and

$$\dot V(\boldsymbol{x},t) = -\boldsymbol{e}^T\boldsymbol{k}\,\boldsymbol{e} = -k_{13}e_1^2 - k_{33}e_2^2,$$

which, with a positive definite diagonal gain matrix $\boldsymbol{k}$, is negative everywhere in the error space except at the origin. Hence the asymptotic stability conditions are satisfied. In practice, the estimate of $\boldsymbol{L}^{-1}$ may not be exact, so $\boldsymbol{L}\boldsymbol{L}^{-1}$ may not be exactly the identity matrix. However, if the estimate of $\boldsymbol{L}^{-1}$ is not too coarse, asymptotic stability of the system is still achieved [3]. The region of stability is the workspace of the mobile robot, restricted by the camera field-of-view limitations [23].
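The decrease of $V$ can be illustrated with a small closed-loop simulation. The Python sketch below is a simplification, not the report's simulation: it holds the desired values constant at $h_{13}^d = 0$ and $h_{33}^d = 1$ (so $\dot{\boldsymbol{D}} = \boldsymbol{0}$), uses the exact $\boldsymbol{L}^{-1}$, integrates the unicycle equations with a forward Euler step, and records $V = \frac{1}{2}\lVert\boldsymbol{e}\rVert^2$ at every sample. The parameter values are hypothetical. Because $\psi$ plays no role here, the robot is not guaranteed to end at the target pose; only the error $\boldsymbol{e}$ is driven to zero.

```python
import math

ALPHA_X = 1000.0    # hypothetical focal length in pixels
NZ_OVER_D = -0.1    # hypothetical n_z / d_pi in 1/m


def outputs(x, z, phi):
    """(h13, h33) of the planar robot model."""
    h13 = ALPHA_X * (math.sin(phi) + (x * math.cos(phi) + z * math.sin(phi)) * NZ_OVER_D)
    h33 = math.cos(phi) + (-x * math.sin(phi) + z * math.cos(phi)) * NZ_OVER_D
    return h13, h33


def lyapunov(h13, h33, h13_d, h33_d):
    """V = 0.5 * ||e||^2 with e = [h13_d - h13, h33_d - h33]."""
    return 0.5 * ((h13_d - h13) ** 2 + (h33_d - h33) ** 2)


def simulate_regulation(state, h13_d=0.0, h33_d=1.0, k13=1.0, k33=1.0, dt=0.01, t_end=15.0):
    """Forward-Euler closed loop with constant desired outputs; returns V at every sample."""
    x, z, phi = state
    history = []
    for _ in range(int(round(t_end / dt))):
        h13, h33 = outputs(x, z, phi)
        history.append(lyapunov(h13, h33, h13_d, h33_d))
        # Right-hand side of (5.19) with zero desired derivatives.
        r1 = k13 * (h13_d - h13)
        r2 = k33 * (h33_d - h33)
        # Control law (5.20) with the exact inverse of L.
        v = h13 / (ALPHA_X ** 2 * h33 * NZ_OVER_D) * r1 + r2 / NZ_OVER_D
        w = r1 / (ALPHA_X * h33)
        # Unicycle state equations x' = -v sin(phi), z' = v cos(phi), phi' = w.
        x, z, phi = x - v * math.sin(phi) * dt, z + v * math.cos(phi) * dt, phi + w * dt
    history.append(lyapunov(*outputs(x, z, phi), h13_d, h33_d))
    return history
```

Starting from configuration i) of Chapter 6, `simulate_regulation((-5.0, -15.0, math.radians(5.0)))` returns values of $V$ that shrink at every step, roughly by the factor $(1 - k\,\Delta t)^2$ per step when $k_{13} = k_{33} = k$.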
It has now been shown that $h_{13}$ converges to $h_{13}^d$ and $h_{33}$ converges to $h_{33}^d$, since $\boldsymbol{e}$ goes to zero (the system is asymptotically stable). After time $T_2$, $h_{13}^d$ becomes 0 and $h_{33}^d$ becomes 1, as follows from the proposed desired trajectories. Figure 5.13 shows that $\psi = -\arctan(x/z)$, since $x = -\rho\sin\psi$ and $z = \rho\cos\psi$ in all quadrants. For $h_{13}^d$ (and therefore $h_{13}$) to converge to zero, $\psi$ must eventually go to zero, which happens when $x$ becomes zero. Therefore, $x = 0$ is reached at the end of the motion. The final values of the other state variables ($z$ and $\phi$) are then found with the help of the expressions for $h_{13}$ and $h_{33}$:
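$$h_{13} = \alpha_x\left[\sin\phi + \left(x\cos\phi + z\sin\phi\right)\frac{n_z}{d_\pi}\right] \tag{5.24}$$

$$h_{33} = \cos\phi + \left(-x\sin\phi + z\cos\phi\right)\frac{n_z}{d_\pi} \tag{5.25}$$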
The variable $z$ is eliminated from equations (5.24) and (5.25) with the following procedure:
i) Multiply equation (5.24) by $\cos\phi$,
ii) Multiply equation (5.25) by $-\alpha_x\sin\phi$,
iii) Add the results of (i) and (ii).
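$$\cos\phi\,h_{13} = \alpha_x\sin\phi\cos\phi + \alpha_x x\cos^2\phi\,\frac{n_z}{d_\pi} + \alpha_x z\sin\phi\cos\phi\,\frac{n_z}{d_\pi}$$

$$-\alpha_x\sin\phi\,h_{33} = -\alpha_x\sin\phi\cos\phi + \alpha_x x\sin^2\phi\,\frac{n_z}{d_\pi} - \alpha_x z\sin\phi\cos\phi\,\frac{n_z}{d_\pi}$$

Adding these equations eliminates $z$ and gives $\cos\phi\,h_{13} - \alpha_x\sin\phi\,h_{33} = \alpha_x x\,\frac{n_z}{d_\pi}$. Substituting the final values $h_{13} = 0$ and $h_{33} = 1$ yields equation (5.26):

$$\sin\phi = -x\,\frac{n_z}{d_\pi} \tag{5.26}$$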
Since $x$ becomes zero at the end of the motion, equation (5.26) gives $\sin\phi = 0$, and with $-\pi/2 < \phi < \pi/2$ this means that $\phi$ must also become zero. Substituting $x = 0$ and $\phi = 0$ into equation (5.25) with $h_{33} = 1$ gives $1 + z\,n_z/d_\pi = 1$, i.e. $z = 0$. This analysis shows that the only equilibrium state of the system is $[x\ z\ \phi]^T = [0\ 0\ 0]^T$. When this equilibrium is reached, the homography becomes the 3x3 identity matrix ($\boldsymbol{H} = \boldsymbol{I}$), which indicates that the camera sees the target scene and the goal is accomplished.
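The elimination of $z$ rests on the identity $\cos\phi\,h_{13} - \alpha_x\sin\phi\,h_{33} = \alpha_x x\,n_z/d_\pi$, which holds for every pose. The short Python check below evaluates both sides for a few arbitrary poses with hypothetical values of $\alpha_x$ and $n_z/d_\pi$; it is a numerical sanity check of the algebra, not a proof.

```python
import math


def h13(x, z, phi, alpha_x, nz_over_d):
    """Homography element h13, equation (5.24)."""
    return alpha_x * (math.sin(phi) + (x * math.cos(phi) + z * math.sin(phi)) * nz_over_d)


def h33(x, z, phi, nz_over_d):
    """Homography element h33, equation (5.25)."""
    return math.cos(phi) + (-x * math.sin(phi) + z * math.cos(phi)) * nz_over_d


def z_free_combination(x, z, phi, alpha_x, nz_over_d):
    """cos(phi) * h13 - alpha_x * sin(phi) * h33; the terms in z cancel."""
    return (math.cos(phi) * h13(x, z, phi, alpha_x, nz_over_d)
            - alpha_x * math.sin(phi) * h33(x, z, phi, nz_over_d))


if __name__ == "__main__":
    for pose in [(-5.0, -15.0, 0.1), (10.0, -35.0, -0.4), (2.0, -3.0, 1.2)]:
        lhs = z_free_combination(*pose, alpha_x=1000.0, nz_over_d=-0.1)
        print(pose, lhs, 1000.0 * pose[0] * -0.1)   # both columns should match
```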
6-SIMULATIONS
Simulations are carried out to show the validity of the proposed approach. The performance of the system is investigated with and without noise and calibration errors. In the simulations, only the initial and target configurations need to be known a priori. The control algorithm drives the robot from the initial configuration to the target configuration. The control loop is illustrated in figure 6.1.
Figure 6.1 Diagram of the control loop
Since the current and target positions and orientations are given as inputs, the rotation matrix and the translation vector between the current and target configurations can be computed. With the intrinsic camera matrix known, the theoretical homography formula (equation (5.11)) gives the 3x3 homography matrix between the current and target virtual scenes. The intrinsic camera matrix is formed using the information in [10] and [18]. The virtual image is assumed to have a resolution of 640x480 pixels. The focal length used is $f = 6$ mm, and its real value is varied to see the effect on the final errors. The effect of the principal point coordinates on the final errors is also analyzed. The control gains are $k_{13} = 1$ and $k_{33} = 1$, with $T_1 = 40$ s and $T_2 = 80$ s. The total simulation time is $T_{total} = 100$ s.
The simulations are carried out for several initial configurations and the target configuration $(x = 0, z = 0, \phi = 0°)$. Since the mobile robot moves on a horizontal plane (the x-z plane), the y coordinate with respect to both the robot-attached frame and the world reference frame is zero. Furthermore, the roll and pitch angles do not change in time; only the yaw angle ($\phi$) varies. The results of the simulations are shown in figures 6.2-6.22.
i) Results for initial configuration of (x = −5, z = −15, ϕ = 5°):
Figure 6.2 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.3 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.4 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
ii) Results for initial configuration of (x = −8, z = −20, ϕ = −45°):
Figure 6.5 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.6 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.7 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
iii) Results for initial configuration of (x = 10, z = −35, ϕ = −25°):
Figure 6.8 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.9 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.10 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
iv) Results for initial configuration of (x = 10, z = −25, ϕ = −35°):
Figure 6.11 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.12 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.13 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
v) Results for initial configuration of (x = −3, z = −20, ϕ = 30°):
Figure 6.14 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.15 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.16 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
vi) Results for initial configuration of (x = −0.25, z = −1.2, ϕ = −20°):
Figure 6.17 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.18 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.19 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
vii) Results for initial configuration of (x = 12, z = −40, ϕ = 45°), this time with the target configuration (x = −8, z = −5, ϕ = −20°):
Figure 6.20 Evolution of position and orientation parameters and control signals
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m]; control outputs: linear velocity v [m/s] and angular velocity w [deg/s] vs. time [s].]
Figure 6.21 Evolution of homography elements
[Plot panels: H11 to H33 vs. time [s]; the H13 and H33 panels show the realized and desired trajectories.]
Figure 6.22 Evolution of error in position and orientation parameters
[Plot panels: error in lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s].]
Since homography decomposition is not needed and is not performed in this control approach, the normal vector $\boldsymbol{n}$ of the plane that generates the homography and the distance $d_\pi$ between that plane and the origin of the target frame are unknown. The term $n_z/d_\pi$ used in the control and homography calculations is therefore not known exactly either and must be estimated. The effect of uncertainty in $n_z$ and $d_\pi$ on the performance is checked by using fixed values in the computation of the homography and varying those values in the control law. Figures 6.23 and 6.24 show the effect of this uncertainty on the final pose error.
Figure 6.23 Final pose error for different $d_\pi$ values
[Plot: final lateral (x) error [m], depth (z) error [m] and orientation (φ) error [deg] vs. $d_\pi$ [m].]
Figure 6.24 Final pose error for different $n_z$ values
[Plot: final lateral (x) error [m], depth (z) error [m] and orientation (φ) error [deg] vs. $n_z$.]
As the results in figures 6.23 and 6.24 show, the convergence of the approach is not affected by this uncertainty, and small final pose errors are obtained. Another important issue in most visual servoing systems is camera calibration. Since the elements of the intrinsic camera matrix appear in the control law and in the computation of the homography, their impact on the performance must be investigated. The simulation results presented so far were obtained with a focal length of 6 mm, as mentioned before, and with the principal point at the centre of the image ($x_0 = 0$, $y_0 = 0$). The final pose errors of the robot are shown in figures 6.25-6.27 for a range of focal lengths and principal point coordinates.
Figure 6.25 Final pose error varying the focal length
[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. focal length f [mm].]
Figure 6.26 Final pose error varying the x coordinate of the principal point
[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. $x_0$ [pixels].]
Figure 6.27 Final pose error varying the y coordinate of the principal point
[Plot: final lateral (x), depth (z) and orientation (φ) errors vs. $y_0$ [pixels].]
The results indicate that the method is able to compensate for calibration errors; in other words, a rough calibration is sufficient to ensure the convergence of the system.
The performance of the system is also analyzed when noise is applied directly to the homography elements. The results of driving the robot from $[x\ z\ \phi]^T = [-5\ \ -15\ \ 5°]^T$ to $[x\ z\ \phi]^T = [0\ 0\ 0]^T$ with white noise of standard deviation $\sigma = 0.3$ are shown in figures 6.28 and 6.29.
Figure 6.28 Evolution of pose parameters with noise
[Plot panels: lateral position X [m], depth Z [m] and orientation φ [deg] vs. time [s]; followed path X vs. Z [m].]
Figure 6.29 Evolution of homography elements with noise
[Plot panels: H13 and H33 vs. time [s].]
In addition, the final pose error under white noise with increasing standard deviation ($\sigma$) is given in figure 6.30.
Figure 6.30 Final pose error varying noise on homography
[Plot: final lateral (x) error [m], depth (z) error [m] and orientation (φ) error [deg] for increasing noise level.]
The plots above suggest that the system converges despite the noise. Unsurprisingly, the higher the standard deviation of the noise, the larger the deviation from the target configuration. When noise with a high standard deviation affects the system, the lateral and depth errors are compensated better than the orientation error.
As a final remark in this chapter, the ways of finding $\psi$ are discussed. The desired trajectories must be defined to carry out the simulations, and since the desired trajectory of $h_{13}$ contains $\psi$, its value must be known during the simulations. Two methods can be used to find $\psi$ in simulations.
1-) The initial pose of the mobile robot is provided as an input to the simulation algorithm, so the initial values of $x$, $z$ and $\phi$ are known at the start. If the target pose is $\boldsymbol{x}_t = [x_t\ z_t\ \phi_t]^T$, then $\psi$ can be computed from the relation $\psi = -\arctan\left(\frac{x - x_t}{z - z_t}\right) - \phi_t$, which can be inferred from figure 5.13. If the target pose is $\boldsymbol{x}_t = [0\ 0\ 0]^T$, this reduces to $\psi = -\arctan(x/z)$. Once $\psi$ is computed, the construction of the control law can be completed, and the control signal drives the robot to its next position ($x$, $z$) and orientation ($\phi$), which are also known. Applying $\psi = -\arctan(x/z)$ again, the next control signal is calculated and the robot is driven again. This loop continues until the robot reaches the target pose.
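Method 1 amounts to a one-line formula. A minimal Python sketch, assuming the pose convention of figure 5.13 and that the robot is not at the same depth as the target ($z \neq z_t$), is:

```python
import math


def psi_from_pose(x, z, target=(0.0, 0.0, 0.0)):
    """psi = -arctan((x - x_t) / (z - z_t)) - phi_t for a target pose (x_t, z_t, phi_t)."""
    x_t, z_t, phi_t = target
    if z == z_t:
        raise ValueError("psi is undefined for z == z_t with this formula")
    return -math.atan((x - x_t) / (z - z_t)) - phi_t


if __name__ == "__main__":
    # Initial configuration i) of the simulations: x = -5, z = -15.
    print(psi_from_pose(-5.0, -15.0))
```

In the simulation loop this function would simply be called again after every control step with the updated $(x, z)$.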
2-) The second way of finding $\psi$ uses the target epipole. Please refer to Appendix A for information about epipolar geometry. The relation between the target epipole and $\psi$ is explained with the help of figures 6.31 and 6.32.
Figure 6.31 Epipoles in the current and target poses
Zooming in on the target epipole gives figure 6.32.
Figure 6.32 Target epipole
The triangle in figure 6.32 yields equation (6.1), which gives the value of $\psi$:

$$\tan(180° - \psi) = -\tan\psi = \frac{e_{tx}}{\alpha_x} \;\Longrightarrow\; \psi = -\arctan\left(\frac{e_{tx}}{\alpha_x}\right) \tag{6.1}$$

In equation (6.1), $\alpha_x$ is the focal length of the camera in pixels, so $e_{tx}$ must also be expressed in pixels to make the argument of the arctangent dimensionless. To find $\psi$ from equation (6.1), the x coordinate of the target epipole in pixels must be known. It is obtained by projecting the focal center $C_c$ of the camera at the current pose onto the image plane of the camera at the target pose: as figure 6.31 shows, the ray from $C_c$ towards $C_t$ creates the target epipole. The relationship between a 3D homogeneous point $\mathbf{X} = [X\ Y\ Z\ 1]^T$ expressed in the fixed world frame and its projection $\mathbf{x} = [x\ y\ 1]^T$ in the image plane of the camera is

$$\mathbf{x} = \mathbf{P}\mathbf{X} = \mathbf{K}\,[\mathbf{R}\,|\,\mathbf{t}]\,\mathbf{X},$$

where $(\mathbf{R}, \mathbf{t})$ are the extrinsic parameters (the rotation and translation between the fixed world frame and the camera frame) and $\mathbf{K}$ is the intrinsic camera matrix, as explained in the perspective projection section. There are therefore two steps to calculate the target epipole in pixels:
i) Compute the 3x4 projection matrix $\mathbf{P}$ for the target pose.
ii) Project the focal center of the camera at the current pose onto the image plane of the camera at the target pose with $\mathbf{e}_t = \mathbf{P}\mathbf{X}_{C_c}$. Here, $\mathbf{e}_t$ is the 3x1 homogeneous vector of the target epipole, $\mathbf{P}$ is the 3x4 projection matrix of the target pose found in step (i), and $\mathbf{X}_{C_c}$ is the 4x1 vector of homogeneous coordinates of the focal center of the camera at the current pose with respect to the world frame.
After $\mathbf{e}_t$ is calculated and normalized so that its third component is 1, its x coordinate $e_{tx}$ is readily found and used in equation (6.1) to determine $\psi$. The construction of the control law can then be completed.
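The two-step epipole computation can be sketched in Python as follows. This is an illustration under stated assumptions, not the report's implementation: the target camera frame is taken as the world frame (so $\mathbf{R} = \mathbf{I}$ and $\mathbf{t} = \mathbf{0}$ for the target pose), the camera centre at the current pose is $[x\ 0\ z]^T$, and $e_{tx}$ in (6.1) is assumed to be measured from the principal point, which is why $x_0$ (the entry `K[0][2]`) is subtracted after dehomogenization. Under these assumptions the result coincides with $\psi = -\arctan(x/z)$ from method 1.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def projection_matrix(K, R, t):
    """Step (i): P = K [R | t] as a 3x4 nested list."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return mat_mul(K, Rt)


def psi_from_epipole(K, R, t, camera_centre):
    """Step (ii): e_t = P X_Cc, then psi = -arctan(e_tx / alpha_x) as in (6.1)."""
    P = projection_matrix(K, R, t)
    X = [[c] for c in camera_centre] + [[1.0]]     # homogeneous 4x1 column vector
    e_t = [row[0] for row in mat_mul(P, X)]
    e_tx = e_t[0] / e_t[2] - K[0][2]               # pixels, relative to the principal point
    return -math.atan(e_tx / K[0][0])
```

For example, with `K = [[1000.0, 0.0, 0.0], [0.0, 1000.0, 0.0], [0.0, 0.0, 1.0]]`, an identity rotation, `t = [0.0, 0.0, 0.0]` and the current camera centre `(-5.0, 0.0, -15.0)`, the function returns the same value as $-\arctan(-5/-15)$.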
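Note also that the time derivatives of the desired trajectories are needed to compute the control signal. Since $\psi$ is available only as numerical values, its time derivative is approximated by numerical differentiation with a small finite step $\Delta t$: $\dot\psi(t) = \lim_{\Delta t\to 0}\frac{\psi(t+\Delta t) - \psi(t)}{\Delta t} \approx \frac{\psi(t+\Delta t) - \psi(t)}{\Delta t}$.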
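In a sampled control loop only past values of $\psi$ are available, so a backward difference over the sampling interval is a natural choice. The short Python sketch below is one such implementation; the scheme is an assumption, since the report does not state which finite difference was used. The first entry reuses the first difference because no earlier sample exists.

```python
def psi_rate(samples, dt):
    """Finite-difference estimate of psi_dot for psi samples spaced dt apart.

    Entry k (k >= 1) is the backward difference (psi_k - psi_{k-1}) / dt;
    entry 0 reuses the first difference because no earlier sample exists.
    """
    if len(samples) < 2 or dt <= 0.0:
        raise ValueError("need at least two samples and a positive dt")
    rates = [(samples[k] - samples[k - 1]) / dt for k in range(1, len(samples))]
    return [rates[0]] + rates
```

A smaller sampling interval reduces the truncation error but amplifies any noise in $\psi$, which matters when $\psi$ is extracted from real images.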
7-EXPERIMENTAL ARRANGEMENTS
In an experiment, the only inputs are real images from the camera. The control algorithm needs two images: the image taken at the desired pose and the current image. It drives the robot from the initial configuration towards the target pose by comparing the image taken at the desired pose with the current images captured during the motion. The control loop for an experiment is shown in figure 7.1.
Figure 7.1 Diagram of the control loop for an experiment
Feature extraction from the images and matching of image points are carried out with SIFT. SIFT (Scale Invariant Feature Transform) is an interest point detector and descriptor that is invariant to scale and rotation, as explained in section 5.1.3.2. The information obtained from SIFT is used to estimate the homography and to extract $\psi$. The homography is estimated with the direct linear transformation method described in 5.1.3.2, and $\psi$ is extracted with the relation $\psi = -\arctan(e_{tx}/\alpha_x)$, so the x coordinate of the target epipole must be found from real images. An algorithm proposed by [10] is used to find the fundamental matrix and then the epipoles. Please refer to Appendix B for the derivation of the fundamental matrix and the epipoles. After $\psi$ is extracted and the 3x3 homography matrix is computed, the construction of the control law is complete, and the control signal, consisting of the linear and angular velocities, can be applied to the robot. All algorithms required to carry out an experiment are thus explained in this report. Although all Matlab scripts needed for an experiment were prepared in addition to the Matlab code for the simulations, there was not enough time in this three-month internship project to perform an experiment.
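As an illustration of the homography estimation step, the Python sketch below solves the minimal four-point version of the direct linear transformation, with $h_{33}$ fixed to 1 and the resulting 8x8 linear system solved by Gaussian elimination. It is a simplified stand-in, not the report's implementation: with more than four SIFT matches, the usual DLT stacks all correspondences and solves the system in a least-squares sense, typically after normalizing the point coordinates. The function names are hypothetical.

```python
def apply_homography(H, point):
    """Map an image point (x, y) through a 3x3 homography H (nested lists)."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def solve_linear(A, b):
    """Solve the square system A h = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return h


def homography_4pt(src, dst):
    """Homography with h33 = 1 mapping four src points onto four dst points."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

Mapping four points with a known homography through `apply_homography` and feeding the pairs to `homography_4pt` recovers that homography, provided no three of the points are collinear and its $h_{33}$ entry is 1.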
Computing the control signal from two real images takes about 1.25 seconds (0.8 Hz), and the duration of one cycle of the control loop also depends on the communication speed between the computer and the robot. Experiments in [23] have verified that the system remains stable even when the control loop runs at 0.75 Hz. Thus, if the communication between the robot and the computer is sufficiently fast, the proposed algorithm is expected to perform well and remain stable.
8-CONCLUSIONS
In this project, mobile robot navigation using visual servo control methods is investigated. A homography-based visual servoing method is chosen and applied to a nonholonomic mobile robot. A control law is constructed based on input-output linearization of the system. The outputs of the system are chosen among the homography elements, and a set of desired trajectories is defined for those outputs, so the visual servo control problem is transformed into a tracking problem. The method needs neither homography decomposition, nor depth estimation, nor any 3D measurement of the scene. Simulations show that the control algorithm is robust: the system converges in the presence of noise, calibration errors and uncertainty in the control parameters.
Since the problem is a tracking problem, the performance of the system naturally depends on the desired trajectories of the homography elements. Several sets of desired trajectories for the homography elements have been proposed in the literature, and one of them is used in this project. The chosen set makes the robot converge towards the target smoothly, avoiding discrete motions. However, the mobile robot cannot always reach the target with zero pose error within a specified duration. This is mainly because the desired homography trajectories require the robot to follow a path that cannot be achieved with its present capabilities. Therefore, the mapping from homography trajectories to Cartesian paths should be investigated further as future work, taking the capabilities of the robot into account. This could lead to more appropriate and realizable desired homography trajectories.
There is also a drawback shared by all homography-based control methods used in applications and proposed in the literature: they may fail or give insufficient results if no plane is detected in the scene or if the detected plane has $n_z = 0$, i.e., the plane is horizontal. To overcome this drawback, switching control methods have been proposed: when no appropriate plane is detected for homography-based visual control, another control method takes over the control of the system, and if that method encounters a singularity, the homography-based method takes over again. As future work, adding another control method, such as an epipole-based one, to the present work and switching between the two would increase the versatility and the robustness of the robot.
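The switching idea described above can be sketched as a small supervisor that picks the active controller at each cycle. This is a hypothetical illustration, not part of the present work: the `Mode` names, the `next_mode` function, the unit plane normal `plane_normal` (or `None` when no plane is detected), the `epipole_singular` flag and the threshold `min_nz` used to treat $n_z$ as zero are all assumptions.

```python
from enum import Enum


class Mode(Enum):
    HOMOGRAPHY = "homography-based control"
    EPIPOLE = "epipole-based control"


def plane_is_usable(plane_normal, min_nz=0.1):
    """Hypothetical check: a plane must be detected and its unit normal must
    not have n_z close to zero (the problematic case for homography control)."""
    return plane_normal is not None and abs(plane_normal[2]) >= min_nz


def next_mode(mode, plane_normal, epipole_singular, min_nz=0.1):
    """Return the controller to use in the next control cycle."""
    if mode is Mode.HOMOGRAPHY and not plane_is_usable(plane_normal, min_nz):
        return Mode.EPIPOLE        # no suitable plane: hand over control
    if mode is Mode.EPIPOLE and epipole_singular:
        return Mode.HOMOGRAPHY     # epipole control is singular: switch back
    return mode                    # otherwise keep the current controller
```

A practical supervisor would probably also need some hysteresis so that it does not switch back and forth at every cycle; that is left out of this sketch.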
APPENDIX A
When two cameras view a 3D scene from two distinct positions, or when a single camera takes pictures of the same 3D scene from different positions, there are a number of geometric relations between the 3D points and their projections onto the 2D images that lead to constraints between the image points. Figure A.1 shows two cameras looking at a point of interest X. OL and OR are the centers of projection (focal points) of the cameras, and XL and XR are the projections of the 3D point X onto the image planes. Each camera captures a 2D image of the 3D world, and the transformation from 3D to 2D is carried out by perspective projection.
Figure A.1 Epipolar Geometry
Since the centers of projection of the cameras are distinct, each center of projection projects onto a distinct point in the other camera's image plane (projection manifold) [24]. These two points are denoted by eL and eR and are called epipoles. The two centers of projection and the two epipoles lie on the same 3D line. The line OL − X is seen by the left camera as a point, because it is the projection ray through the left camera's center of projection. The right camera, on the other hand, sees the same line as a line, and the projection of that line onto the right image plane is called an epipolar line (eR − XR). In the same manner, the line OR − X, which the right camera sees as a point, is seen by the left camera as the epipolar line eL − XL. The plane formed by OL, OR and X is called the epipolar plane. It intersects each camera's image plane in a line, which is the epipolar line. Every epipolar plane contains the line through OL and OR, and hence both epipoles, so every epipolar line passes through the epipole of its image regardless of the location of X. Finally, the vector w originating from OL and pointing towards OR is called the positive epipolar ray, while the vector −w originating from OL and pointing in the opposite direction is called the negative epipolar ray.
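The construction above, in which each epipole is the image of the other camera's center of projection, can be checked numerically with the sketch below. It assumes pinhole cameras $P = K[R \mid t]$ and uses NumPy; the intrinsic matrix, the poses and the helper names (`camera_center`, `epipoles`, `project`) are hypothetical choices for illustration.

```python
import numpy as np


def camera_center(P):
    """Homogeneous center C of a finite 3x4 camera P, i.e. the solution of P C = 0."""
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    return C / C[3]


def epipoles(P_left, P_right):
    """eL is the image of OR in the left camera, eR the image of OL in the right one."""
    e_left = P_left @ camera_center(P_right)
    e_right = P_right @ camera_center(P_left)
    return e_left / e_left[2], e_right / e_right[2]


def project(P, X):
    """Perspective projection of a 3D point X (3-vector) to homogeneous pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x / x[2]


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [-2.0]])])

e_left, e_right = epipoles(P_left, P_right)
# Two points on the projection ray OL - X: the left camera sees them as one point,
# while the right camera sees them on one epipolar line passing through eR.
X1, X2 = np.array([0.5, 0.2, 5.0]), np.array([1.0, 0.4, 10.0])
same_left_pixel = np.allclose(project(P_left, X1), project(P_left, X2))
rows = np.vstack([e_right, project(P_right, X1), project(P_right, X2)])
collinear = abs(np.linalg.det(rows)) < 1e-6
```

With these values both epipoles lie at pixel (570, 240); that coincidence is a consequence of the purely translational relative pose chosen for the example.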
The signs of the epipoles at the beginning of the motion must be known in order to determine the desired trajectory of $h_{13}$. If the robot does not start in a suitable orientation, an extra step must be taken to drive it into a proper orientation for a smooth motion towards the target. Whether this extra step is needed depends on the signs of the x coordinates of the epipoles with respect to the robot-attached coordinate frame. In this project the mobile robot moves in a plane, so only the x coordinates of the epipoles change in time; they are therefore the decisive factors. This is explained with the help of figure A.2.
Figure A.2 Geometric relations of the epipoles in the current image and target image, panels (a) and (b)
The x coordinate of the target epipole is always positive when the initial position of the robot is in the third quadrant (x < 0 and z < 0) of the target frame, as in the cases illustrated in figure A.2. The reason is that the ray emanating from Cc crosses the projection manifold of the target scene in the first quadrant of the target frame, so the epipole lies in the first quadrant and has a positive x coordinate. By the same reasoning, if the robot is initially in the fourth quadrant, the target epipole will always be in the second quadrant and will have a negative x coordinate. Analyzing the current epipoles shows that, with respect to the robot-attached coordinate frames, the x coordinate of the current epipole is positive in "case a" and negative in "case b". Therefore, the desired trajectory of $h_{13}$ is defined in three phases for "case a", but in two phases for "case b", skipping the extra step.
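The sign reasoning above can be sketched for the planar case. The sketch is a hypothetical model, not the project's code: the target camera sits at the origin of the target frame looking along +z, the current camera is at $(x, z)$ with heading `phi` (a rotation about the vertical y axis, with `phi = 0` meaning the same orientation as the target camera), and both epipole x coordinates are measured relative to the principal point. The rule in `h13_phases` (three phases when both x coordinates have the same sign, two otherwise) is an assumption chosen to reproduce "case a" and "case b"; the text does not state a general rule.

```python
import numpy as np


def planar_epipoles_x(x, z, phi, alpha_x=1.0):
    """Return (e_cx, e_tx) for a current camera at (x, z) with heading phi.

    Hypothetical planar model: target camera at the origin looking along +z;
    the current camera axes are the target axes rotated by phi about y.
    """
    # Target epipole: the current camera center (x, 0, z) seen by the target camera.
    e_tx = alpha_x * x / z
    # Target camera center (the origin) expressed in the current camera frame.
    px = -x * np.cos(phi) + z * np.sin(phi)
    pz = -x * np.sin(phi) - z * np.cos(phi)
    e_cx = alpha_x * px / pz
    return e_cx, e_tx


def h13_phases(e_cx, e_tx):
    """Assumed rule: the extra step is needed when both signs agree."""
    return 3 if e_cx * e_tx > 0 else 2
```

For example, a robot starting at $(x, z) = (-1, -3)$ gives a positive target epipole, in line with the text; with `phi = 0` the current epipole is also positive (three phases, like "case a"), while with `phi = 0.6` rad it becomes negative (two phases, like "case b").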
APPENDIX B
The epipolar geometry explained in Appendix A is the intrinsic projective geometry between two views; it is independent of the scene structure and depends only on the cameras' internal parameters and relative pose. The fundamental matrix F encapsulates this intrinsic geometry [10]; in other words, it is the algebraic representation of epipolar geometry.
A point X in three-dimensional space projects to 𝒙 in the first image and to 𝒙′ in the second image, and the fundamental matrix expresses the relation between these two image points. The image points 𝒙 and 𝒙′, the space point X and the two camera centers are coplanar, as shown in figure B.1; this plane is called the epipolar plane and is denoted by 𝜋.
Figure B.1 3D Point X and its image points x and x′
The image point 𝒙 back-projects to a ray in 3D space defined by the camera center C and 𝒙. This ray is seen as the line 𝒍′ in the second image.
Figure B.2 The ray emanating from C and passing through x is seen as the line l′ (epipolar line for x) in the second image
As figure B.2 shows, for each point 𝐱 in one image there is a corresponding epipolar line 𝒍′ in the other image, and the point 𝐱′ matched to 𝐱 must lie on 𝒍′. The fundamental matrix defines the mapping from a point in one image to its epipolar line in the other image (𝐱 → 𝒍′), and it satisfies, for any pair of corresponding points 𝒙 ↔ 𝒙′ in the two images,
$$\mathbf{x}'^{T} F \mathbf{x} = 0 \tag{B.1}$$
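If $\mathbf{x}$ and $\mathbf{x}'$ are matching points, then $\mathbf{x}'$ must lie on the epipolar line $\mathbf{l}' = F\mathbf{x}$. Since $\mathbf{x}'$ lies on $\mathbf{l}'$, it satisfies $\mathbf{x}'^{T}\mathbf{l}' = 0$, and substituting $\mathbf{l}' = F\mathbf{x}$ gives $\mathbf{x}'^{T}F\mathbf{x} = 0$. Writing the fundamental matrix

$$F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}$$

as the vector

$$\mathbf{f} = [f_{11}\;\, f_{12}\;\, f_{13}\;\, f_{21}\;\, f_{22}\;\, f_{23}\;\, f_{31}\;\, f_{32}\;\, f_{33}]^{T}$$

and taking $\mathbf{x} = [x\;\, y\;\, 1]^{T}$ and $\mathbf{x}' = [x'\;\, y'\;\, 1]^{T}$, each point match gives one linear equation in the unknown entries of the fundamental matrix:

$$x'x f_{11} + x'y f_{12} + x' f_{13} + y'x f_{21} + y'y f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0,$$

or equivalently

$$[x'x \;\; x'y \;\; x' \;\; y'x \;\; y'y \;\; y' \;\; x \;\; y \;\; 1]\,\mathbf{f} = 0 \tag{B.2}$$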
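For a set of n point matches, a set of linear equations is obtained:

$$A\mathbf{f} = \begin{bmatrix} x_1'x_1 & x_1'y_1 & x_1' & y_1'x_1 & y_1'y_1 & y_1' & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n'x_n & x_n'y_n & x_n' & y_n'x_n & y_n'y_n & y_n' & x_n & y_n & 1 \end{bmatrix} \mathbf{f} = \mathbf{0} \tag{B.3}$$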
Equation (B.3) is a homogeneous set of equations, so $\mathbf{f}$ can be determined only up to scale [10]. For a non-zero solution to exist, the matrix $A$ must have rank at most 8; if its rank is exactly 8, the solution is unique up to scale, and at least 8 point matches are needed to reach that rank. If the data are noisy, however, the rank of $A$ may be greater than 8, and a least-squares solution is used instead: $\mathbf{f}$ is chosen to minimize $\|A\mathbf{f}\|$ subject to $\|\mathbf{f}\| = 1$. This solution is the singular vector corresponding to the smallest singular value of $A$, that is, the last column of $V$ in $\mathrm{SVD}(A) = UDV^{T}$.
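The linear solution of (B.3) can be sketched in Python with NumPy as follows. It builds one row of $A$ per match exactly as in (B.2) and takes the right singular vector for the smallest singular value. It is a sketch of the linear step only: it assumes at least eight matches in general position and, for brevity, omits the coordinate normalization that [10] recommends for noisy pixel data. The function names are hypothetical.

```python
import numpy as np


def build_A(x1, x2):
    """Stack the rows of (B.2) for n matches.

    x1: (n, 2) array of points (x, y) in the first image.
    x2: (n, 2) array of the matched points (x', y') in the second image.
    """
    x, y = x1[:, 0], x1[:, 1]
    xp, yp = x2[:, 0], x2[:, 1]
    return np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, np.ones_like(x)])


def fundamental_linear(x1, x2):
    """Least-squares solution of A f = 0 with ||f|| = 1, reshaped to 3x3."""
    A = build_A(np.asarray(x1, dtype=float), np.asarray(x2, dtype=float))
    if A.shape[0] < 8:
        raise ValueError("at least 8 point matches are required")
    _, _, Vt = np.linalg.svd(A)
    f = Vt[-1]                 # singular vector of the smallest singular value
    return f.reshape(3, 3)     # rows (f11 f12 f13), (f21 f22 f23), (f31 f32 f33)
```

With noisy data the matrix returned here is generally not of rank 2, so the singularity constraint discussed in the remainder of this appendix still has to be enforced.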
An important property of the fundamental matrix is that it is not of full rank, that is, it is not an invertible mapping. A point $\mathbf{x}$ in the first image defines a line $\mathbf{l}'$ in the second image, its epipolar line. In the same manner, the point $\mathbf{x}'$ in the second image defines a line $\mathbf{l}$ in the first image, the epipolar line of $\mathbf{x}'$. Every point $\mathbf{x}$ on $\mathbf{l}$ is mapped to the same line $\mathbf{l}'$, so there is no inverse mapping: a point inverse-mapped from $\mathbf{l}'$ could lie anywhere on the epipolar line $\mathbf{l}$. This makes the fundamental matrix rank deficient; it has rank 2. Another consequence of the singularity of the fundamental matrix is that the epipole is the same for all points, i.e., all epipolar lines pass through a common epipole. The physical interpretation of this singularity is illustrated in figure B.3 [10].
Figure B.3 (a) Full-rank fundamental matrix (b) Rank-deficient fundamental matrix
The lines in figure B.3 are epipolar lines computed using $\mathbf{l}' = F\mathbf{x}$ for different points $\mathbf{x}$. In (a) there is no common epipole, whereas in (b) all epipolar lines intersect at the same point, which is the epipole.
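The algebra behind panel (b) takes one line. Let $\mathbf{e}'$ be a non-zero vector in the left null space of $F$, i.e. $F^{T}\mathbf{e}' = \mathbf{0}$, which exists exactly when $F$ is singular. Then every epipolar line $\mathbf{l}' = F\mathbf{x}$ contains $\mathbf{e}'$:

$$\mathbf{e}'^{T}\mathbf{l}' = \mathbf{e}'^{T}F\mathbf{x} = \left(F^{T}\mathbf{e}'\right)^{T}\mathbf{x} = \mathbf{0}^{T}\mathbf{x} = 0 \qquad \text{for every } \mathbf{x}.$$

If $F$ had full rank, the only solution of $F^{T}\mathbf{e}' = \mathbf{0}$ would be $\mathbf{e}' = \mathbf{0}$, which is not a valid homogeneous point, so no point would be common to all epipolar lines, as in panel (a).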
Because the data are contaminated by noise, the fundamental matrix obtained by solving the linear equations (B.3) may not be of rank 2. In that case, an additional step is applied to force the fundamental matrix to be singular, using singular value decomposition. Applying SVD to the $F$ found from equation (B.3) gives

$$\mathrm{SVD}(F) = UDV^{T}, \qquad D = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}, \qquad a \ge b \ge c.$$

The fundamental matrix is then reconstructed with the smallest singular value set to zero:

$$\hat{F} = U\,\mathrm{diag}(a, b, 0)\,V^{T},$$

which has rank 2 and is the rank-2 matrix closest to $F$ in Frobenius norm. The epipoles in the two images are the right and left null spaces of $\hat{F}$: the epipole $\mathbf{e}$ in the first image satisfies $\hat{F}\mathbf{e} = \mathbf{0}$ and is the last column of $V$, while the epipole $\mathbf{e}'$ in the second image satisfies $\hat{F}^{T}\mathbf{e}' = \mathbf{0}$ and is the last column of $U$.
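The rank-2 correction and the extraction of the epipoles can be sketched in Python with NumPy as below. It follows the steps above literally; the function names are hypothetical, and dividing by the third component assumes that neither epipole lies at infinity.

```python
import numpy as np


def enforce_rank2(F):
    """Return U diag(a, b, 0) V^T, the closest rank-2 matrix to F (Frobenius norm)."""
    U, d, Vt = np.linalg.svd(F)   # d = [a, b, c] with a >= b >= c
    d[2] = 0.0                    # set the smallest singular value to zero
    return U @ np.diag(d) @ Vt


def epipoles_from_F(F):
    """Return (e, e_prime) with F e = 0 and F^T e_prime = 0, scaled so that the
    third component is 1 (both epipoles assumed finite)."""
    U, _, Vt = np.linalg.svd(F)
    e = Vt[-1]          # last column of V: epipole in the first image
    e_prime = U[:, -1]  # last column of U: epipole in the second image
    return e / e[2], e_prime / e_prime[2]
```

In the pipeline of this report, the target epipole needed for $\psi$ would be one of these two vectors, depending on which image is taken as the first one in (B.1).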
REFERENCES
[1] G. N. DeSouza and A. C. Kak, “Vision for mobile robot navigation: A survey,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, pp. 237–267,
2002.
[2] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE Trans. on Robotics and Automation, vol. 12, no. 5, pp. 651–670, 1996.
[3] F. Chaumette and S. Hutchinson, “Visual Servo Control Part 1: Basic Approaches” and “Part 2: Advanced Approaches,” IEEE Robotics & Automation Magazine, December 2006.
[4] E. Malis, F. Chaumette, and S. Boudet, “2 ½ D visual servoing,” IEEE Transactions on Robotics and Automation, 1999.
[5] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, John Wiley & Sons, Inc., USA.
[6] http://en.wikipedia.org, Charge Coupled Devices. Obtained on 10th of December, 2009.
[7] B. Thuilot, P. Martinet, L. Cordesses, and J. Gallice, “Position based visual servoing: Keeping the object in the field of vision,” in Proc. IEEE Int. Conf. Robot. Automat., pp. 1624–1629, May 2002.
[8] W. Wilson, C. Hulls, and G. Bell, “Relative end effector control using Cartesian position based visual servoing,” IEEE Trans. Robot. Automat., vol. 12, pp. 684–696, Oct. 1996.
[9] C. Sagues, G. Lopez-Nicolas, and J. J. Guerrero, “Homography based visual control of nonholonomic vehicles,” IEEE Int. Conference on Robotics and Automation, pp. 1703–1708, Rome, Italy, April 2007.
[10] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge
University Press: Cambridge, UK.
[11] http://mathworld.wolfram.com, Dilation. Obtained on 10th of December, 2009.
[12] https://www.e-education.psu.edu/natureofgeoinfo/c2_p18.html, Nature of Geographic Information, Plane Coordinate Transformations. Obtained on 15th of October, 2009.
[13] E. Dubrofsky, “Homography Estimation,” Master's essay submitted in partial fulfillment of the requirements for the degree of Master of Science in the Faculty of Graduate Studies, University of British Columbia, March 2009.
[14] http://www.svgopen.org/2008/papers/86-Achieving_3D_Effects_with_SVG, Achieving 3D Effects with SVG, for the SVG Open 2008 conference. Obtained on 5th of December.
[15] Z. Chuan, T. D. Long, Z. Feng, and D. Z. Li, “A planar homography estimation method for camera calibration,” Proc. IEEE International Symposium on Computational Intelligence in Robotics and Automation, vol. 1, pp. 424–429, 2003.
[16] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
[17] A. Agarwal, C. V. Jawahar, and P. J. Narayanan, “A Survey of Planar Homography Estimation Techniques,” Tech. Rep. IIIT/TR/2005/12, 2005.
[18] http://en.wikipedia.org, Camera resectioning. Obtained on 10th of October, 2009.
[19] D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, January 2004.
[20] D. G. Lowe, “Object Recognition from Local Scale-Invariant Features,” Proc. of International Conference on Computer Vision, Corfu, September 1999.
[21] http://en.wikipedia.org, Scale invariant feature transform. Obtained on 1st of December, 2009.
[22] J.-J. E. Slotine and W. Li, Applied Nonlinear Control, Prentice-Hall.
[23] C. Sagues, G. Lopez-Nicolas, and J. J. Guerrero, “Visual Control of Vehicles Using Two View Geometry,” submitted to the journal Mechatronics, 2009.
[24] http://en.wikipedia.org, Epipolar Geometry. Obtained on 15th of November, 2009.