VISION-BASED UAV POSE ESTIMATION
A Thesis Presented
by
Zahra Mohajerani
to
The Department of Electrical and Computer Engineering
In partial fulfillment of the requirements
for the degree of
Master of Science
in
Electrical Engineering
Northeastern University
Boston, Massachusetts
August 2008
Acknowledgements
I would like to express my gratitude to all who supported me and helped me throughout
this study. I thank Ehsan Nourbakhsh for his invaluable assistance and for all
his help with Linux and programming in C. I am also grateful to Fei Xiong,
who helped me run the experiments over and over again. I thank Ali Nouri for his
guidance, his encouragement, and all his help, which showed me how to think like
a computer scientist, not an engineer. And finally, I thank my parents for always
having faith in me.
Contents
Acknowledgements v
1 Abstract 1
2 Introduction 2
2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 Problem Description and Approach . . . . . . . . . . . . . . . . . . . 2
2.2.1 Unmanned Aircraft Vehicles (UAV) . . . . . . . . . . . . . . . 3
2.2.2 Pose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Pose Estimation - Related Works . . . . . . . . . . . . . . . . . . . 4
2.3.1 Single Camera and Moire Pattern . . . . . . . . . . . . . . . 5
2.3.2 Two Ground Cameras . . . . . . . . . . . . . . . . . . . . . . 7
2.3.3 Two Cameras: One Ground And One Onboard . . . . . . . . 8
2.3.4 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Essential Preliminaries 12
3.1 Markov Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Conditional Mean and Variance in Gaussian Distribution . . . . . . . 13
3.3 EKF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Monte Carlo Simulation . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5.1 DBN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Shi and Tomasi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.7 Spatial Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.7.1 AT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.7.2 Coordinate Frame Relationships . . . . . . . . . . . . . . . . . 20
3.7.3 Stochastic Map . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7.4 Updating The Map . . . . . . . . . . . . . . . . . . . . . . . . 26
4 SLAM 29
4.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Observation Model . . . . . . . . . . . . . . . . . . . . . . 31
4.2.3 Motion Model . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Probabilistic SLAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1 Step One: Prediction (Time Update) . . . . . . . . . . . . 32
4.3.2 Step Two: Correction (Measurement Update) . . . . . . . 32
4.4 EKF-SLAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4.1 Time Update (Predict) . . . . . . . . . . . . . . . . . . . . 34
4.4.2 Observation Update (Correct) . . . . . . . . . . . . . . . . 35
4.4.3 Advantages and Disadvantages . . . . . . . . . . . . . . . 35
5 Experiments 40
5.1 MonoSLAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.1 Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Vicon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 Conclusions 53
7 Discussion and Future Works 54
7.1 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.1.1 Which Feature to Add . . . . . . . . . . . . . . . . . . . . 54
7.1.2 Moving Objects . . . . . . . . . . . . . . . . . . . . . . . . 54
7.2 Future Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Bibliography 56
List of Tables
4.1 SLAM Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 EKF-SLAM Notations . . . . . . . . . . . . . . . . . . . . . . . . . . 34
List of Figures
2.1 6 Degree Of Freedom (6DOF) of an aircraft . . . . . . . . . . . . . . 4
2.2 A moire pattern generated by superposition of two layers consisting of
parallel lines; the lines of the revealing layer are parallel to the lines of
the base layer, courtesy of [15] . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Moire Target used by Tournier for UAV pose estimation . . . . . . . 6
2.4 Calculation of Altitude and Yaw angle . . . . . . . . . . . . . . . . . 7
2.5 Using the color markers under the quadrotor and 2 cameras onboard
and on the ground, pose of the UAV can be computed . . . . . . . . . 9
2.6 Using accelerometers in the Wii and iPhone for motion analysis . . . 10
2.7 GUI view of MonoSLAM software. . . . . . . . . . . . . . . . . . . . 11
3.1 DBN for the SLAM problem. Ui: control input applied at time (i − 1) to
drive the system to state X(i); Z(i, j): observation of Lj made at time i. 17
3.2 A sequence of approximate transformations. line: AT, ellipse: AT’s uncer-
tainty, solid:gained by movement, dashed:gained by observation. . . . . . 20
3.3 Compounding, Merging and Reversal operations. line: AT, ellipse:
AT’s uncertainty, solid:gained by movement, dashed:gained by obser-
vation, Xij: AT from frame i to frame j . . . . . . . . . . . . . . . . 24
3.4 map of 3D world, each feature represented as a 6DOF vector (3 DOF posi-
tion and 3 DOF orientation) . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5 updating the map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 1:start the prediction step,2: estimate new location, 3:add process noise,
4:start the measurement step, measure features, 5:update positions and un-
certainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 The convergence in landmark uncertainty. The plot shows a time history of
standard deviations of a set of landmark locations. A landmark is initially
observed with uncertainty inherited from the robot location and observation.
Over time, the standard deviations reduce monotonically to a lower bound.
New landmarks are acquired during motion . . . . . . . . . . . . . . . . 37
4.3 Before and after closing the loop, this figure is from [19] . . . . . . . 38
5.1 The evolution of depth probability density from uniform distribution
to Gaussian, points are uniformly distributed over the image patch’s
normal line from 0.5 to 5m. Courtesy of [12]. . . . . . . . . . . . . . . 43
5.2 The camera poses in 2 different viewpoints looking at the same image
patch with normal n. Courtesy of [12]. . . . . . . . . . . . . . . . . . 43
5.3 Camera Motion Model . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Vicon camera. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.5 Vicon markers, from [3]. . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.6 The output of MonoSLAM for a sample sequence, with 4 startup fea-
tures. The map is built gradually by updating the visible features in
each frame. Blue marks correspond to failed matches, red marks cor-
respond to successful matches and yellow marks correspond to unused
features in current frame. . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.7 MonoSLAM Error in X, Y and Z Dimension. . . . . . . . . . . . . . . 51
5.8 UAV orientation, MonoSLAM and Vicon outputs . . . . . . . . . . . 51
5.9 Comparing the output trajectories in 2D space, from Vicon and MonoSLAM. 52
7.1 a: existing map of the room; 10-point stars are the features, where
darker colors correspond to high-quality features and lighter colors to
low-quality ones, and 4-point pink stars are the selected landmarks.
b: updated map, selecting 2 other landmarks by taking into account
only the quality of the features. c: updated map, selecting features
by quality and distance from the existing features in the map. . . . . 55
Chapter 1
Abstract
The popularity of Unmanned Aerial Vehicles (UAVs) has increased dramatically in
the past few decades. The main challenge in UAV operation is to determine position
and orientation accurately and continuously during flight, using onboard sensors
(a camera, in our case). Many different approaches to UAV pose estimation now
exist in the literature. In this research we studied several of these approaches and
the problems they face in real-life applications. We chose the real-time single-camera
SLAM method MonoSLAM [12], because our robot was equipped with an onboard
camera and MonoSLAM had proved its reliability in previous localization
experiments [12], [10], [6], [11]. In our work, we obtained the ground truth with Vicon
cameras. The experimental results suggest that this method can be reliably used in
the control of indoor UAV navigation.
Chapter 2
Introduction
In our work, we consider the problem of pose estimation for a helicopter when the
sensory information comes only from a single mounted camera. We show that a
previously studied algorithm called MonoSLAM [12] can successfully solve this task,
verifying its output against ground truth from Vicon cameras.
2.1 Motivation
The popularity of Unmanned Aerial Vehicles (UAVs) has increased dramatically in
the past few decades. The main challenge in UAV operation is to determine position
and orientation accurately and continuously during flight, using onboard sensors.
2.2 Problem Description and Approach
In order to control the UAV during any mission, we need to know its position and
orientation so that the proper decision can be made at each moment. In cases where
a complete map of the environment is known ahead of time, this extra information
can be leveraged in solving the pose estimation problem. However, in most real-life
applications the map is not available a priori, which imposes additional complexity
on the localization problem. Here we can redefine the localization problem as the
2.2. PROBLEM DESCRIPTION AND APPROACH 3
ego-motion estimation of a robot moving through the world while simultaneously
building a map of that world and updating it dynamically. This is known as the
Simultaneous Localization and Mapping (SLAM) problem in the robotics community.
As the robot's measurement sensor we chose the camera mounted on a helicopter,
and we used Andrew Davison's MonoSLAM program to compute the pose of the
UAV in real time. Finally, we checked the results against ground truth from Vicon
cameras. The experiments showed that the pose calculated by this algorithm is
reliable enough to be used in the UAV control step.
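The predict/correct cycle at the heart of SLAM (detailed in Chapter 4) can be sketched with a one-dimensional Kalman filter. This is an illustrative toy of ours, not the thesis's EKF-SLAM implementation; all names and numbers are made up for the example:

```python
def predict(x, P, u, q):
    """Time update: propagate the state by control input u and
    inflate the variance P by the process noise q."""
    return x + u, P + q

def correct(x, P, z, r):
    """Measurement update: fuse the observation z (variance r)
    into the state estimate via the Kalman gain."""
    K = P / (P + r)                # Kalman gain
    return x + K * (z - x), (1.0 - K) * P

# One SLAM-style cycle: predict from odometry, then correct from a measurement.
x, P = predict(0.0, 1.0, 1.0, 0.5)   # move forward 1 unit -> (1.0, 1.5)
x, P = correct(x, P, 2.0, 1.5)       # observe position 2.0 -> x = 1.5, P = 0.75
```

Note how the correction both pulls the estimate toward the observation and shrinks the variance below its predicted value; the EKF of Chapter 4 does the same with a joint robot-and-map state.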
2.2.1 Unmanned Aircraft Vehicles (UAV)
Among all aircraft, helicopters are particularly interesting in that they possess
unique characteristics. Unlike fixed-wing aircraft, helicopters can land and take off
vertically without the need for a runway: their rotor blades revolve through the air,
which lets a helicopter move vertically without having to move forward.
Another interesting characteristic of helicopters is that they can hover for extended
periods of time and handle low-airspeed conditions. Because of these
unique properties, helicopters can accomplish tasks that no other aircraft can.
Today, helicopters are used for transportation, construction, firefighting, search and
rescue, and many other tasks that require the special capabilities mentioned above
[4].
UAVs are aircraft that operate without pilots. They can be used in situations where
a human life would be at risk, such as firefighting, search and rescue in unknown
terrain, and other missions too dangerous for manned aircraft [wiki].
The focus of this research is to provide a way of controlling an unmanned quadrotor,
with the hope of taking advantage of the good characteristics of both UAVs and
helicopters.
4 CHAPTER 2. INTRODUCTION
(a) horizontal (b) vertical (c) sideways
(d) roll (e) pitch (f) yaw
Figure 2.1: 6 Degree Of Freedom (6DOF) of an aircraft
2.2.2 Pose
Degrees of freedom (DOF) are the set of independent displacements of an object
in space that completely define the position and orientation of the object or system
during its movement. 6DOF, or pose, refers to motion in 3-dimensional space. Three
of these degrees are related to angular position (rotation about perpendicular axes,
the attitude (φ, θ, ψ)): pitch, roll, and yaw. The other three are translations along
three perpendicular axes (the position (x, y, z)): moving backward/forward
(longitudinal), up/down (vertical), and left/right (lateral). Figure 2.1 1 illustrates
these 6DOF for an airplane.
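To make the 6DOF pose concrete, the sketch below packs position and attitude into a single tuple and applies the corresponding rigid-body transform. This is our own illustration, not part of the thesis software, and the Z-Y-X (yaw-pitch-roll) Euler convention is an assumption; aerospace texts use several conventions:

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """3x3 rotation matrix from Z-Y-X (yaw-pitch-roll) Euler angles:
    R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

def transform(pose, point):
    """Map a body-frame point to the world frame: p' = R p + t,
    where pose = (x, y, z, roll, pitch, yaw)."""
    x, y, z, roll, pitch, yaw = pose
    R = rotation_matrix(roll, pitch, yaw)
    t = [x, y, z]
    return [sum(R[i][j] * point[j] for j in range(3)) + t[i]
            for i in range(3)]
```

For example, a pure 90° yaw rotates the body's forward axis (1, 0, 0) onto the world's (0, 1, 0).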
2.3 Pose Estimation - Related Works
The first step in successfully using a UAV is to determine its position and attitude
(pose) in 3D space accurately and quickly. This information is very critical for controlling
1from NASA: http://www.aviationsystemsdivision.arc.nasa.gov/.
2.3. POSE ESTIMATION - RELATED WORKS 5
the UAV. Because the main use of UAVs is in unknown environments for search and
rescue, or for surveillance in cluttered environments in which the global positioning
system (GPS) usually does not work, much research has recently been done on finding
the pose using other devices, such as a camera, a laser range finder, or other sensors.
A brief summary of these approaches and their positive and negative properties is
presented in this section.
2.3.1 Single Camera and Moire Pattern
Figure 2.2: A moire pattern generated by superposition of two layers consisting of parallel lines; the lines of the revealing layer are parallel to the lines of the base layer, courtesy of [15]
Glenn Tournier [36] used a single onboard camera as the quadrotor's sensor for pose
estimation. He used the moire effect to estimate the quadrotor pose, building a
target with two different types of moire patterns on two perpendicular sides and 4
colored LEDs, one on each corner, as indicated in Figure 2.3. The moire pattern in
(a) Target Model (b) Actual Target
Figure 2.3: Moire Target used by Tournier for UAV pose estimation
the target is generated by placing two similarly structured gratings on top of each
other, a small distance apart. Tournier computed the "altitude" of the camera by
applying the principle of similar triangles to the LEDs around the target, the "yaw"
by computing how parallel the lines EB and DC are to the vertical, and the remaining
four parameters by using the moire effect that appears on the target due to the
camera movement.
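The two geometric estimates described above can be written in a few lines. This is our own paraphrase of the standard pinhole similar-triangles relation (h = s·f/r) and of a yaw-from-line computation, not Tournier's actual code; the function names and units are ours:

```python
import math

def altitude(s, r, f):
    """Similar triangles under a pinhole camera: a target segment of
    known physical length s (meters) that spans r pixels on the image
    gives altitude h = s * f / r, with f the focal length in pixels."""
    return s * f / r

def yaw(e_uv, b_uv):
    """Yaw as the angle of the image line from feature E to feature B
    relative to the image axes: psi = atan((Bv - Ev) / (Bu - Eu))."""
    return math.atan2(b_uv[1] - e_uv[1], b_uv[0] - e_uv[0])
```

For example, LEDs 0.5 m apart that appear 100 px apart under a 400 px focal length give an altitude of 2 m; averaging over several LED pairs, as Tournier does, reduces the effect of perspective.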
Disadvantages
Although this approach showed good results in [16] and [36], there are some
disadvantages to it if used in outdoor environments or in other real-life applications:
1. Moire effects are sensitive to changes in illumination and perform poorly in
outdoor environments. Because the primary UAV uses in our research (surveillance,
search and rescue, firefighting, etc.) all occur outdoors, the moire pattern
is not a good choice.
2. The pose estimation here is highly dependent on the target: if the target is
not visible to the camera even for a short period of time, the UAV will lose control
due to the lack of sufficient knowledge about its pose. Also, if we want to fly
(a) Altitude (b) Yaw
Figure 2.4: Calculation of Altitude and Yaw angle
at an altitude higher than a room's ceiling in outdoor environments, the target
and its gratings will become too small to track (unless we significantly increase
the target size). So finding the pose by tracking a fixed target is not a good idea
for outdoor use either.
2.3.2 Two Ground Cameras
Andrew Ng used a stereo pair of ground cameras and a machine-learning approach
for helicopter pose estimation and flight control. His projects were carried out with
a real helicopter flying in outdoor environments and showed very good results (the
experiment movies are available on his website)2. Because it uses two ground cameras,
this approach cannot be used in unknown strategic environments for surveillance
tasks.
2http://heli.stanford.edu/
2.3.3 Two Cameras: One Ground And One Onboard
Erdinc Altug and Camillo Taylor [5] also used 2 cameras. They placed a pan/tilt/zoom
camera on the ground and another one on the quadrotor, looking down at the
ground camera. Their research goal was to find the quadrotor's position and attitude
with respect to the ground camera rather than the quadrotor's pose by itself. To
find the camera pose, colored markers of different colors and known radius (2.5 cm
in their experiment) were attached under each rotor, at the center of the quadrotor
camera, and on top of the ground camera, as illustrated in Figure 2.6. By tracking
the markers and computing their areas, the pose relative to the camera could be
computed.
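As an illustration of how a blob's area constrains depth, the sketch below inverts the pinhole projection of a circular marker of known radius. This is our own simplification (it assumes a fronto-parallel marker and a known focal length in pixels), not the authors' algorithm:

```python
import math

def marker_distance(area_px, radius_m, f_px):
    """Depth of a circular marker of known radius (e.g. 2.5 cm) from its
    blob area in pixels, assuming a fronto-parallel pinhole view:
    the apparent radius is r_px = f * R / Z, and r_px = sqrt(area / pi),
    so Z = f * R / sqrt(area / pi)."""
    r_px = math.sqrt(area_px / math.pi)
    return f_px * radius_m / r_px
```

For instance, a 2.5 cm marker whose blob covers about 314 px (apparent radius 10 px) under a 400 px focal length lies roughly 1 m from the camera; combining several such markers constrains orientation as well as position.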
Disadvantages
1. To be usable in real-life situations, the markers must be very large to be
recognizable from a far distance, which makes the approach unusable in critical
situations like surveillance.
2. There is no other way to find the quadrotor pose when the ground camera is
out of sight of the quadrotor camera. This makes the approach very sensitive
to any possible occlusion in outdoor environments.
3. Having a ground camera is not possible for some missions, like firefighting or
search and rescue.
2.3.4 Other Approaches
Beside cameras, it is possible to use other sensors to equip the system with extra useful
information in order to increase the localization precision. Using Gyro to find the
attitude is a great choice when the weight limit is not an issue. Three accelerometers
in three perpendicular axes can also be used to estimate position Wii). Finally, laser
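Position from three-axis accelerometer readings is obtained by double integration. The sketch below is a naive Euler integrator of ours, for illustration only; in practice sensor noise makes such dead reckoning drift rapidly, which is why accelerometers are usually fused with other sensors:

```python
def integrate_position(accels, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Dead-reckon velocity and position from 3-axis accelerometer
    samples (m/s^2) by double (Euler) integration with time step dt."""
    v, p = list(v0), list(p0)
    for a in accels:
        for i in range(3):
            v[i] += a[i] * dt      # integrate acceleration -> velocity
            p[i] += v[i] * dt      # integrate velocity -> position
    return p, v
```

One second of a constant 1 m/s² forward acceleration (1000 samples at 1 kHz) yields a velocity of 1 m/s and roughly 0.5 m of travel; any constant bias in the samples grows quadratically in the position estimate.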
(a) markers
(b) Quadrotor model
Figure 2.5: Using the color markers under the quadrotor and two cameras, one onboard and one on the ground, the pose of the UAV can be computed.
range finders have also been used to improve the performance of pose estimation.
Since the quadrotor we planned to use in this research was equipped with an onboard
camera, and since the weight limit did not let us have extra sensors on board,
we chose the single-camera approach and used MonoSLAM [12]. A brief sketch of
MonoSLAM is presented here, and in Chapter 4 the algorithm is explained in detail.
MonoSLAM
MonoSLAM is a camera-based approach to SLAM, first introduced by Andrew Davison
[10]. Before this method was introduced, other sensory information such as laser
range finders was used in SLAM.
The camera mounted on the robot captures a series of pictures of the environment
as the robot moves around, and detects the high-quality features inside them. These image
patches are then used in the regular SLAM method as discussed in [14] and [8].
Unlike the others, MonoSLAM relies more on localization than on finding a
precise map of the room. This method stores just enough features to find
(a) Nintendo Wii (b) iPhone
Figure 2.6: Using accelerometers in the Wii and iPhone for motion analysis
the robot's position, thus creating a very sparse map (only around 100 features in total);
the output is this sparse map together with the robot's pose during its motion.
To decrease the uncertainty in the pose, Davison chose to initialize the map in the first
frame with four known features (by giving their appearances and true locations), and
he also gave the starting pose of the camera. Hence, both the map and the pose series
start from known data, which gives the system a sense of depth at the starting point.
Disadvantages
1. If we want to use MonoSLAM in real time (at 30 Hz, for example), we cannot have
more than about 100 features in the map, which is enough for pose estimation inside
a small room but not in larger indoor or outdoor environments.
2. The other setback of this approach is its fragility against moving objects. The
key point of the algorithm is repeatable observation, and if features move inside
the scene, reobserving them can result in closing incorrect loops and
unreasonable localization. Moreover, as a result of this problem, it is almost
Figure 2.7: GUI view of the MonoSLAM software, showing successful matches, failed matches, and unused features.
impossible to use the robot for tracking a moving object inside the scene.
We study possible ways to overcome these disadvantages in Chapter 7, and we believe
this approach is currently the best one in the literature for our purpose.
Chapter 3
Essential Preliminaries
In this chapter, we review the fundamental concepts and techniques which are necessary
for understanding the MonoSLAM algorithm: probability estimation techniques,
Markov sequences, Monte Carlo simulation, Bayes' theorem,
Dynamic Bayesian Networks (DBNs), and the fundamentals of spatial
relationships.
3.1 Markov Sequence
Papoulis [27] defined a Markov process as a random process whose upcoming probabilities
depend only on its most recent value. In other words, a stochastic
process x(t) is Markovian if it has the following property for all n and t1 < t2 < ... < tn:

P(x(tn) ≤ xn | x(tn−1), ..., x(t1)) = P(x(tn) ≤ xn | x(tn−1))

This is equivalent to:

P(x(tn) ≤ xn | x(t), ∀t ≤ tn−1) = P(x(tn) ≤ xn | x(tn−1))

([27], p. 535).
These equations signify that the current state contains all the information
we need: the future value of such a variable is independent of its past. A
process with this property is therefore very efficient to work with computationally.
Markov sequences also have the following property:

p(x(t+1)) = ∫ p(x(t+1) | x(t)) p(x(t)) dx(t),

which is used in the SLAM prediction step, as explained in Chapter 4.
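For a discrete-state chain, the integral above becomes a sum, and the prediction step reduces to a vector-matrix product. The following sketch uses a made-up two-state transition matrix purely for illustration:

```python
import numpy as np

# Discrete analogue of p(x(t+1)) = ∫ p(x(t+1)|x(t)) p(x(t)) dx(t):
# the integral becomes a sum, i.e. a vector-matrix product.
P = np.array([[0.9, 0.1],      # P[i, j] = p(next state = j | current state = i)
              [0.2, 0.8]])
belief = np.array([1.0, 0.0])  # p(x(0)): we start certain the state is 0

for _ in range(3):             # three prediction steps
    belief = belief @ P

print(belief)                  # p(x(3))
```

Because only the current belief is carried forward, each prediction step needs no history of earlier states; this is exactly the computational efficiency mentioned above.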
3.2 Conditional Mean and Variance in Gaussian Distribution
For jointly Gaussian random vectors y and z, the conditional distribution of y given
z is also Gaussian, with the following moments [34].
Conditional mean:

E[y|z] = E[y] + Cyz Cz⁻¹ (z − E[z])

Conditional variance:

Cy|z = Cyy − Cyz Cz⁻¹ Czy,

where:

Cyz = E[(y − E[y])(z − E[z])ᵀ]
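These moment formulas can be checked numerically. The sketch below, using arbitrary illustrative numbers, conditions a one-dimensional Gaussian y on an observed z:

```python
import numpy as np

# Conditional mean and variance of jointly Gaussian (y, z):
#   E[y|z]  = E[y] + Cyz Cz^-1 (z - E[z])
#   C(y|z)  = Cyy - Cyz Cz^-1 Czy
mu_y, mu_z = np.array([1.0]), np.array([0.0])
Cyy = np.array([[2.0]])
Cyz = np.array([[0.8]])        # cross-covariance E[(y-E[y])(z-E[z])^T]
Czz = np.array([[1.0]])

z_obs = np.array([0.5])
cond_mean = mu_y + Cyz @ np.linalg.inv(Czz) @ (z_obs - mu_z)
cond_cov = Cyy - Cyz @ np.linalg.inv(Czz) @ Cyz.T

print(cond_mean, cond_cov)
```

Note that the conditional variance never exceeds the prior variance: observing z can only reduce (or leave unchanged) the uncertainty in y, which is the principle behind the measurement update in the filters below.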
3.3 EKF
Unlike the regular Kalman filter, which estimates the state of a controlled
linear process, the EKF (Extended Kalman Filter) addresses the problem of estimating
the state of nonlinear processes, where the state-transition and/or observation models
are nonlinear functions. The EKF linearizes these two nonlinear functions about the
current mean and covariance, using a truncated Taylor series [17].
Assume the process to be estimated is:

xk = f(xk−1, uk, wk−1),

and the measurement is:

zk = h(xk, vk),

where wk and vk represent the zero-mean Gaussian process and measurement noise
with the given covariances, xk and zk represent the state and the observation at time k
respectively, f is the process model, and h is the measurement model.
The EKF algorithm consists of two parts: the time update and the measurement update. The
complete set of equations is given below.
EKF time-update (prediction) equations:
(1) Project the state ahead:

x̂k⁻ = f(x̂k−1, uk, 0)

(2) Project the error covariance ahead:

Pk⁻ = Ak Pk−1 Akᵀ + Wk Qk−1 Wkᵀ

where Ak and Wk are the process Jacobians at step k, and Qk is the process noise
covariance at step k.
EKF measurement-update (correction) equations:
(1) Compute the Kalman gain:

Kk = Pk⁻ Hkᵀ (Hk Pk⁻ Hkᵀ + Vk Rk Vkᵀ)⁻¹

(2) Update the estimate with measurement zk:

x̂k = x̂k⁻ + Kk (zk − h(x̂k⁻, 0))

(3) Update the error covariance Pk:

Pk = (I − Kk Hk) Pk⁻

where Hk and Vk are the measurement Jacobians at step k, and Rk is the measurement
noise covariance at step k.
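The two phases can be illustrated on a deliberately simple scalar system. The models f and h below are made up for illustration (they are not the camera model used later in this thesis):

```python
import numpy as np

# One EKF predict/correct cycle for a toy scalar nonlinear system:
#   x_k = f(x_{k-1}) + w,  f(x) = x + 0.1*sin(x)
#   z_k = h(x_k) + v,      h(x) = x**2
f = lambda x: x + 0.1 * np.sin(x)
h = lambda x: x ** 2
Fj = lambda x: 1 + 0.1 * np.cos(x)   # df/dx, the process Jacobian A_k
Hj = lambda x: 2 * x                 # dh/dx, the measurement Jacobian H_k
Q, R = 0.01, 0.1                     # process / measurement noise covariances

x, P = 1.0, 0.5                      # prior mean and covariance

# Time update (prediction): uncertainty grows
x_pred = f(x)
P_pred = Fj(x) * P * Fj(x) + Q

# Measurement update (correction): uncertainty shrinks
z = 1.3                              # an observed measurement
K = P_pred * Hj(x_pred) / (Hj(x_pred) * P_pred * Hj(x_pred) + R)
x_new = x_pred + K * (z - h(x_pred))
P_new = (1 - K * Hj(x_pred)) * P_pred
print(x_new, P_new)
```

Running one cycle shows the characteristic pattern the chapter relies on: prediction inflates the covariance, and the correction pulls it back down.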
3.4 Monte Carlo Simulation
The Monte Carlo method was invented by S. Ulam and N. Metropolis in 1949 [38]
for solving problems using random numbers and probability distributions. This technique
is usually used to simulate physical and mathematical systems whose
models are nonlinear, have many uncertain parameters, or are too
complicated to solve analytically [2].
Monte Carlo simulation iteratively samples from the probability distributions
defined for the uncertain input variables, feeds these samples into the system,
and evaluates the results.
To do this, we must find probability distributions that match
our input data as closely as possible and that express our knowledge about these
data precisely.
The Monte Carlo simulation algorithm has the following steps:
Step 1: model the system.
Step 2: find the input probability distributions and generate data from them.
Step 3: run the system with this data set and check the result.
Step 4: iterate steps 1 to 3 n times.
Step 5: analyze the results.
Monte Carlo sampling gives increasingly accurate estimates of the output as the input
sampling continues.
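As a toy instance of these steps, the sketch below estimates E[X²] for X ~ Uniform(0, 1), whose analytic value is 1/3:

```python
import random

# Monte Carlo estimate of E[X^2] for X ~ Uniform(0, 1).
# Analytic answer: 1/3. The "system" here is simply g(x) = x^2.
random.seed(0)

n = 200_000
total = 0.0
for _ in range(n):
    x = random.random()      # sample from the input distribution
    total += x * x           # run the system on the sample

estimate = total / n         # analyze the result
print(estimate)
```

The error of such an estimate shrinks roughly like 1/sqrt(n), which is the formal sense in which "the estimates improve as the input sampling continues."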
3.5 Bayes Theorem
In probability theory, Bayes' theorem provides a simple connection between the
conditional and marginal probabilities of two random variables or events [30]:

P(A|B) = P(B|A) P(A) / P(B),

where:
P(A|B): the posterior, the probability of the state of nature being A given that feature
value B has been measured;
P(B|A): the likelihood of B with respect to A;
P(A): the prior, the probability of the state of nature being A before any observation.
The Bayes formula is often used to compute posterior probabilities from observations:

posterior = (likelihood × prior) / normalizing constant.
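A small numeric sketch of this formula, with hypothetical probabilities: let A be the event that an observed object is a known landmark, and B the event that a detector fires.

```python
# Bayes' theorem on a toy detection example (illustrative numbers only):
# A = "object is a landmark", B = "detector fires".
p_A = 0.01                  # prior P(A)
p_B_A = 0.90                # likelihood P(B|A)
p_B_notA = 0.05             # false-positive rate P(B|not A)

# Normalizing constant P(B) by the law of total probability:
p_B = p_B_A * p_A + p_B_notA * (1 - p_A)

# Posterior = (likelihood * prior) / normalizing constant:
p_A_B = p_B_A * p_A / p_B
print(p_A_B)
```

Even with a 90% likelihood, the small prior keeps the posterior modest, which is why repeated observations (as in SLAM's correction step) are needed before a landmark can be trusted.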
3.5.1 DBN
A Dynamic Bayesian Network [24] is a specific type of BN built to handle
stochastic processes; it represents sequences of variables (usually time series)
as a directed graphical model. The HMM (Hidden Markov Model) is a simple example of a DBN [20].
A BN (Bayesian or belief network) is a simple tool for representing and reasoning
about uncertainty, and it belongs to the graphical model (GM) family. It visualizes how
different variables depend on each other, represents what we know about an uncertain
domain, and helps simplify complex problems. A Bayesian network B is an
Figure 3.1: DBN for the SLAM problem. Ui: control input applied at time (i − 1) to drive the system to state X(i); Z(i, j): observation of Lj made at time i.
acyclic graph of the form B = ⟨G, Θ⟩ which corresponds to a joint probability distribution
over a set of random variables, where G is a DAG (Directed Acyclic Graph)
with nodes X1, ..., Xn and Θ represents the network parameters θXi|πi = PB(Xi|πi),
with πi denoting the parent nodes of Xi. B defines the unique joint probability
distribution given below:

PB(X1, X2, ..., Xn) = ∏(i=1..n) PB(Xi|πi) = ∏(i=1..n) θXi|πi
A Bayesian network can be built following these three steps:
Step 1: add the variables we want included in the network as nodes Xi.
Step 2: add the links from each parent (πi) to its children. Each link represents a
relationship between two nodes; note that the graph must remain acyclic.
Step 3: for each node X, define a probability table containing P(X|π) for all
combinations of its parents.
Figure 3.1 illustrates the DBN of the SLAM problem.
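Following these three steps for a toy two-node network A → B (the probability tables are hypothetical), the joint distribution is the product of each node's table entry given its parents:

```python
# Joint probability from a tiny hand-made Bayesian network (chain A -> B):
# P_B(A, B) = P(A) * P(B|A), i.e. the product formula above.
p_A = {True: 0.3, False: 0.7}                   # P(A)
p_B_given_A = {True: {True: 0.9, False: 0.1},   # P(B | A=True)
               False: {True: 0.2, False: 0.8}}  # P(B | A=False)

def joint(a, b):
    """P_B(A=a, B=b): product of each node's CPT entry given its parents."""
    return p_A[a] * p_B_given_A[a][b]

total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(joint(True, True), total)
```

Summing the joint over all assignments gives 1, confirming the tables define a valid distribution; the same product structure, unrolled over time, is what Figure 3.1 depicts for SLAM.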
3.6 Shi and Tomasi
Shi and Tomasi introduced a feature tracking algorithm that obtains the features
of an image at time t + τ based on the features of the image at time t [31]. This method
works by moving every point in the earlier image by an appropriate displacement vector
δ = Dx + d, where d is the feature's translation vector and D is the deformation
matrix:

D = [ dxx  dxy
      dyx  dyy ]

Denote the current image by I(x, t) and the upcoming image by I(Ax + d, t + τ),
where A = I2×2 + D. Shi and Tomasi's idea was to find the affine transformation that
minimizes the dissimilarity ε between these two images:

ε = ∫∫W [I(Ax + d) − I(x)]² dx,

where W is a window around the feature in the current image over which this residual is
computed. In the simplest case, D is the zero matrix and the
only important quantity is the translation vector d. By setting the derivative of ε with
respect to d equal to zero and linearizing the system by a truncated Taylor expansion,
the following linear system is obtained:

Gd = e

G = ∑W [ Ix²    IxIy
         IxIy   Iy²  ],   e = ∑W It (Ix  Iy)ᵀ

The method finds the good features to track by computing the eigenvalues λ1 and λ2 of the
matrix G and checking them against a predetermined threshold λ:
a feature is accepted if min(λ1, λ2) > λ [35].
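The eigenvalue test can be sketched on a synthetic image: build G from the gradients inside a window W and compare min(λ1, λ2) at a corner against a flat region. The image and window size below are made up for illustration:

```python
import numpy as np

# Shi-Tomasi "good feature" test on a synthetic image with one corner.
img = np.zeros((20, 20))
img[8:, 8:] = 1.0                      # a corner at (8, 8)

Iy, Ix = np.gradient(img)              # gradients along rows, then columns

def min_eig(cy, cx, r=3):
    """Smallest eigenvalue of G over the (2r+1)x(2r+1) window at (cy, cx)."""
    wx = Ix[cy - r:cy + r + 1, cx - r:cx + r + 1]
    wy = Iy[cy - r:cy + r + 1, cx - r:cx + r + 1]
    G = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                  [np.sum(wx * wy), np.sum(wy * wy)]])
    return np.linalg.eigvalsh(G)[0]    # eigenvalues in ascending order

print(min_eig(8, 8), min_eig(3, 3))    # corner vs. flat region
```

At the corner both eigenvalues are large (gradients vary in two directions), while in the flat region G is zero; a threshold λ between the two separates trackable features from untrackable ones.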
3.7 Spatial Relationships
3.7.1 AT
Smith [29] defines an AT (uncertain or Approximate Transformation) as an estimate
of the transformation between two different coordinate frames, accompanied by a covariance matrix which
shows how much uncertainty the estimate has.
Each AT represents an observation (or another kind of sensing) or a relative motion.
Smith used AT(A) = [x, y, θ] (where θ is the rotation about the z-axis) to express the
location of landmark A relative to the world reference frame, and AT(AB) to express
the location of landmark A's coordinate frame relative to landmark B's.
In Figure 3.2 (from [29]), Smith and Cheeseman use an arrow (from the reference
frame to the landmark) to represent the AT, and an ellipse (representing the two-dimensional
Gaussian distribution) around the landmark to show its uncertainty.
Figure 3.2: A sequence of approximate transformations. Line: AT; ellipse: AT's uncertainty; solid: gained by movement; dashed: gained by observation.
3.7.2 Coordinate Frame Relationships
Having different ATs in different coordinate frames, we need to estimate the relative
AT between these frames. To do so, we use operations that combine the information
from the intermediate steps and produce a single AT ([29], [28]).
We can write any such relationship as:

y = g(x)
where g() is a general function consisting of a desired combination of the Compounding,
Merging, and Reversal operations. The estimates of the first two moments of this function,
i.e., the mean and the covariance matrix, are given by:

ȳ ≈ g(x̄)
C(y) ≈ Gx C(x) Gxᵀ

where Gx ≡ ∂g(x)/∂x, evaluated at x̄.
Compounding: ⊕
The compounding operation allows us to recursively compress a chain of subsequent ATs
into a single AT. With each combining step (combining two ATs), the
uncertainty of the resulting AT is larger than that of either of the ATs it is made from.
This is similar to moving with closed eyes in an unknown environment: with
each step you take, the uncertainty about where you are with respect to where you
started increases.
The compounding formula for a pair of ATs, each of the form xm = (xm, ym, θm), is:

xik = xij ⊕ xjk = [ xjk cos(θij) − yjk sin(θij) + xij
                    xjk sin(θij) + yjk cos(θij) + yij
                    θij + θjk ]
The compounded mean is:

x̄ik ≈ x̄ij ⊕ x̄jk

and the compounded covariance matrix is:

C(xik) = J⊕ [ C(xij)       C(xij, xjk)
              C(xjk, xij)  C(xjk)      ] J⊕ᵀ,
where the Jacobian of the compounding operation is:

J⊕ = ∂(xij ⊕ xjk) / ∂(xij, xjk) = ∂xik / ∂(xij, xjk)

   = [ 1  0  −(yik − yij)    cos(θij)  −sin(θij)  0
       0  1   (xik − xij)    sin(θij)   cos(θij)  0
       0  0   1              0          0         1 ]

   = [ J1⊕  J2⊕ ]
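The ⊕ operation and its first-order covariance propagation can be sketched as follows; the poses and covariances are illustrative, and independent errors are assumed:

```python
import numpy as np

# Compounding x_ik = x_ij (+) x_jk for 2D poses (x, y, theta), with
# covariance propagation C(x_ik) ~= J1 C_ij J1^T + J2 C_jk J2^T.
def compound(xij, xjk):
    x1, y1, t1 = xij
    x2, y2, t2 = xjk
    return np.array([x2 * np.cos(t1) - y2 * np.sin(t1) + x1,
                     x2 * np.sin(t1) + y2 * np.cos(t1) + y1,
                     t1 + t2])

def compound_cov(xij, xjk, Cij, Cjk):
    xik = compound(xij, xjk)
    t1 = xij[2]
    J1 = np.array([[1, 0, -(xik[1] - xij[1])],   # d(x_ik)/d(x_ij)
                   [0, 1,  (xik[0] - xij[0])],
                   [0, 0, 1]])
    J2 = np.array([[np.cos(t1), -np.sin(t1), 0],  # d(x_ik)/d(x_jk)
                   [np.sin(t1),  np.cos(t1), 0],
                   [0, 0, 1]])
    return J1 @ Cij @ J1.T + J2 @ Cjk @ J2.T      # independent errors assumed

a = np.array([1.0, 0.0, np.pi / 2])   # one metre forward, then turn left 90 degrees
b = np.array([1.0, 0.0, 0.0])         # another metre forward
print(compound(a, b))
```

Compounding two unit-forward motions with a left turn lands at (1, 1) facing along y, and the resulting covariance is larger than either input covariance, matching the "closed eyes" intuition above.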
Merging: ⊗
The merging operation is used to combine parallel ATs between the same pair of coordinate
frames. Such parallel ATs occur when there is more than one source of data:
for example, sensors which collect data and also the UAV's odometry after each movement.
The sensor result and the compounded estimate (computed with the ⊕ operation) give
us two different pieces of information about the same landmark, and combining these
two decreases the uncertainty.
Merging a pair of transformations X1 and X2 is written as:

X3 = X1 ⊗ X2.

To find the merged mean and covariance matrix, the Kalman gain factor is
computed as:

K = C1 [C1 + C2]⁻¹.

Using this gain, we can compute the merged covariance matrix:

C3 = C1 − K C1,

and the merged mean:

X̄3 = X̄1 + K (X̄2 − X̄1),

where C1, C2 and X̄1, X̄2 are the covariance matrices and mean vectors of the two ATs we want
to merge, and C3 and X̄3 are the covariance matrix and mean vector of the
resulting merged AT.
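A numeric sketch of merging, with made-up estimates: two equally uncertain estimates of the same transformation merge into their average, with half the covariance.

```python
import numpy as np

# Merging two parallel estimates of the same transformation with the
# Kalman-gain form K = C1 (C1 + C2)^-1 (toy 2D example).
X1, C1 = np.array([1.0, 2.0]), np.diag([0.4, 0.4])
X2, C2 = np.array([1.2, 1.8]), np.diag([0.4, 0.4])

K = C1 @ np.linalg.inv(C1 + C2)
X3 = X1 + K @ (X2 - X1)
C3 = C1 - K @ C1
print(X3, C3)
```

In contrast to compounding, merging always reduces the covariance: it is the operation that lets reobservation shrink the map's uncertainty.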
Reversal:
For merging and compounding, all ATs must point in the right direction. If an AT
Xij = (xij, yij, θij) is in the wrong direction, the reversal operation corrects its
direction (sense):

Xji = ⊖Xij = [ −xij cos(θij) − yij sin(θij)
                xij sin(θij) − yij cos(θij)
               −θij ]

The reversed mean is:

X̄ji ≈ ⊖X̄ij

and the reversed covariance matrix is:

C(Xji) = J⊖ C(Xij) J⊖ᵀ

where the Jacobian of the reversal operation is:

J⊖ ≡ ∂Xji / ∂Xij = [ −cos(θij)  −sin(θij)   yji
                      sin(θij)  −cos(θij)  −xji
                      0          0         −1 ]
Figure 3.3 shows an example of the above three operations.
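A quick consistency check of the reversal formulas (the pose is illustrative): compounding an AT with its own reversal must return the identity transformation.

```python
import numpy as np

# Reversal of a 2D AT, checked by a round trip: x (+) reverse(x) ~ (0, 0, 0).
def reverse(x):
    xi, yi, t = x
    return np.array([-xi * np.cos(t) - yi * np.sin(t),
                      xi * np.sin(t) - yi * np.cos(t),
                     -t])

def compound(a, b):
    x1, y1, t1 = a
    x2, y2, t2 = b
    return np.array([x2 * np.cos(t1) - y2 * np.sin(t1) + x1,
                     x2 * np.sin(t1) + y2 * np.cos(t1) + y1,
                     t1 + t2])

x = np.array([2.0, 1.0, np.pi / 3])
print(compound(x, reverse(x)))       # should be ~ [0, 0, 0]
```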
3.7.3 Stochastic Map
The world is represented as a 3D surface made up of different patches. Each patch
corresponds to a 3D object with photometric and geometric (shape) properties and a
coordinate frame which defines the patch's pose in the world. Smelyanskiy et al.
showed in [39] that the properties of each patch can be learnt completely by using the
Figure 3.3: Compounding, Merging, and Reversal operations. Line: AT; ellipse: AT's uncertainty; solid: gained by movement; dashed: gained by observation; Xij: AT from frame i to frame j.
Bayesian Super Resolution technique [32]. Learning these properties is especially
useful when examining an observed landmark to find out whether it has been seen before
or is a new one. To know the environment completely, the UAV needs to know each
patch's pose too. The most popular method for this purpose is the EKF, where the
world is represented as a set of features and each feature is the pose of a single patch
as described above. The EKF tracks a mean state vector which consists of the UAV pose
followed by the pose of each feature [33]. Figure 3.4 illustrates a simple
example of such a world map.
Figure 3.4: Map of the 3D world; each feature is represented as a 6-DOF vector (3-DOF position and 3-DOF orientation).
Because we do not know exactly where the landmarks and the UAV are, we must represent
the objects in a stochastic map, which includes the uncertain spatial relationships, their
uncertainties, and a measure of the inter-dependencies between these uncertainties.
An uncertain spatial relationship can be represented as a probability distribution
over its spatial variable; because this distribution is unknown, we model
it by estimating only its first two moments (mean x̄ and covariance matrix C(x)) and
use the resulting normal distribution to approximate the real one.
For the 2D case, assuming no sensor is provided, the map is built with the following
two-step algorithm:
Step 1: add the first relationship to the empty map:
X = X1 = [ x
           y
           φ ],   C(X) = [ σx²  σxy  σxφ
                           σxy  σy²  σyφ
                           σxφ  σyφ  σφ² ]
Step 2: add a new object, one at a time:

X′ = [ X
       Xn ],   C(X′) = [ C(X)      C(X, Xn)
                         C(Xn, X)  C(Xn)    ]
3.7.4 Updating The Map
The map can change if one of the following situations happens:
1. The world itself changes. (Note: the UAV is part of the world, so if it moves the world
changes; and because we assumed the landmarks are stationary in our research, the world
can change only through UAV movement.)
2. Our knowledge about the world changes, for example by observing a landmark
again and more precisely (the loop-closure problem).
We have estimated the world map by its first two moments, so in order to change
the map we must change and update these two moments. Figure 3.5 (from [28])
shows the change in the map due to a change of the world and due to adding new observations
(measurements).

Figure 3.5: Updating the map
Smith et al. assumed in [28] that new observations are applied at discrete time k
and that their effect is immediate. They denote the moments right before the observation
by x̄k(−) and C(xk(−)), and right after it by x̄k(+) and C(xk(+)). By making a new observation,
our knowledge about the world increases; this decreases the uncertainty
of the observed landmark and, correspondingly, of the whole world (a direct result of the
merging operation).
To find x̄k(+) and C(xk(+)) mathematically, we must model our sensor (the camera, in
our case) so as to describe how it maps the spatial variables
(xi: 2D image patches) into the sensor variables (zi: the pose of each landmark). The camera
model is defined as a function h, and because the measurement is noisy, we add to it an
independent zero-mean Gaussian noise v with a given covariance:

z = h(x) + v

The estimated map after this observation, using the formulas in Section 3.2, is:

x̄|z = x̄ + C(x, z) C(z)⁻¹ (z − z̄)
C(x|z) = C(x) − C(x, z) C(z)⁻¹ C(x, z)ᵀ
where x̄k(+) = x̄|z and x̄k(−) = x̄.
If the above covariance matrices C(x) and C(x, z) are replaced by their approximations,
we obtain the Kalman filter equations given below:

x̄k(+) = x̄k(−) + Kk [zk − hk(x̄k(−))],
C(xk(+)) = C(xk(−)) − Kk Hx C(xk(−)),
Kk = C(xk(−)) Hxᵀ [Hx C(xk(−)) Hxᵀ + C(v)k]⁻¹.
Between two sensor update steps (i.e., between times k − 1 and k), the UAV can
move. Assume the UAV is the Rth relationship in the map, so its current estimated
location is xR, its location after the movement is x′R, and the relative
motion it makes to get there is yR; note that this relative motion is uncertain.
Using the compounding formulas given in Section 3.7.2, and assuming xR and yR
have independent errors, the new state vector and covariance matrix can be inferred
as follows:

x′R = xR ⊕ yR
x̄′R ≈ x̄R ⊕ ȳR
C(x′R) ≈ J1⊕ C(xR) J1⊕ᵀ + J2⊕ C(yR) J2⊕ᵀ
C(x′R, xi) ≈ J1⊕ C(xR, xi)
and the new map becomes:

x′ = [ x
       x′R ],   C(x′) = [ C(x)       C(x, x′R)
                          C(x′R, x)  C(x′R)    ]
Chapter 4
SLAM
4.1 Abstract
Simultaneous localization and mapping (SLAM) is the problem of starting at an unknown
location in an unknown environment and simultaneously learning the environment
(mapping) and determining the location accurately and repeatedly (localization)
using this map. So, in order to solve the SLAM problem, we must answer the following
questions:
1. Where am I?
Localization is the act of providing the map to the UAV, which is then required to find
its location with respect to the objects in the given map.
2. What does the world around me look like?
Mapping, on the other hand, assumes the UAV location is known and asks the UAV to find
the locations of all the objects in the environment surrounding it (create the sparse map of
the room) with respect to its own location.
The main problem here is that to map a feature in the world map the UAV needs to know
its own location, and to find its own location the UAV needs an accurate map of the
Figure 4.1: 1: start the prediction step; 2: estimate the new location; 3: add process noise; 4: start the measurement step, measure features; 5: update positions and uncertainties.
world! And this is a bootstrapping problem.
So, SLAM is the act of the UAV building a map of the environment around it and
using this map to find its location at the same time.
4.2 Implementation
SLAM is now considered a solved problem ([14] and [8]): it can estimate the
trajectory of the sensor (which is located on the robot, and hence has the same
Table 4.1: SLAM Notations
xk: state vector of the UAV, including its location and orientation (pose) at time k.
X0:k = {X0:k−1, xk}: history of UAV poses, up to and including time k.
uk: the control input (applied at time k−1) which drives the UAV to the state xk (at time k).
U0:k = {U0:k−1, uk}: history of control inputs, up to and including time k.
zik: observation (sensor input) of the ith landmark at time k.
zk = {z1k, ...}: observations of all the landmarks at time k.
Z0:k = {Z0:k−1, zk}: history of observed landmarks, up to and including time k.
mi: estimated location of the ith landmark; we assume all landmarks are stationary (their true locations do not change with time).
m = {m1, m2, ..., mn}: set of all landmarks (note that the map parameters do not have a time subscript, as they are modelled as stationary).
trajectory as the robot) and the locations of all the landmarks in the environment without
any prior information.
4.2.1 Notations
To solve the SLAM problem, Bailey and Durrant-Whyte [14] introduced the
notations shown in Table 4.1.
4.2.2 Observation Model
P (zk|xk,m)
The observation model describes the effect of an observation. It computes the probability
of making the kth observation of the environment (zk), given the location of
the UAV (xk) and the map of the room (m), and describes how the uncertainty can be
reduced. If we precisely know the true map, new observations will not change it; in
that case, making observations is independent of the map of the world.
4.2.3 Motion Model
P (xk|xk−1, uk)
The motion model describes the effect of the control input on the state transition and calculates
how the uncertainty increases. The motion model is a function of the previous state (xk−1) and
the control input that drives the system to the current state (uk). Since the probability of xk
depends only on xk−1, this model is Markovian, and we can use its properties (as mentioned
in Section 3.1) later in the SLAM algorithm.
4.3 Probabilistic SLAM
Durrant-Whyte and Bailey define the SLAM problem in [14] as the computation of the joint posterior
probability P(xk, m | Z0:k, U0:k, x0) (or simply P(xk, m)) for the UAV state xk and the map m,
based on all the observations and control inputs up to time k.
To do so, they introduced the following two-step recursive algorithm, in which the first
step is the time update and the second is the measurement update.
4.3.1 Step One: Prediction (Time Update)
When the UAV moves, we need to estimate its new position and the uncertainty of its location
(knowing that this uncertainty increases monotonically). The motion model is used in this
step as follows:

P(xk, m | Z0:k−1, U0:k, x0) = ∫ P(xk | xk−1, uk) P(xk−1, m | Z0:k−1, U0:k−1, x0) dxk−1
4.3.2 Step Two: Correction (Measurement Update)
The aim of this step is to add new features to the map and to re-measure the previously added
features. When the UAV observes a feature which is already in the map, it updates
the system using Bayes' theorem and the Markov property as follows:

P(xk, m | Z0:k, U0:k, x0) = P(zk | xk, m) P(xk, m | Z0:k−1, U0:k, x0) / P(zk | Z0:k−1, U0:k)

This formula is a function of the vehicle model P(xk | xk−1, uk) and the observation model
P(zk | xk, m).
When the UAV observes a new feature (one which is not yet in the map), its initial position
is estimated using the inverse of the observation equation (P(mi | xk, m)).
4.4 EKF-SLAM
EKF-SLAM is the most popular solution to the SLAM problem. It uses linearization to
approximate the posterior by a Gaussian distribution,
where the mean (µk) contains the state vector (the location of the robot and the map
of the environment), and the covariance matrix (Σk) captures the uncertainties and shows
how the elements of the state vector are correlated with each other.
The a posteriori state estimate represents the mean of the state distribution:

µk = [ x̂k|k
       m̂k  ] = E[ xk
                  m  | Z0:k ],

and the a posteriori estimate error covariance represents the variance of the state
distribution:

Σk = Pk|k = [ Pxx   Pxm
              Pxmᵀ  Pmm ] = E[ ( xk − x̂k
                                 m − m̂k ) ( xk − x̂k
                                            m − m̂k )ᵀ | Z0:k ]
Hence, the joint probability distribution used previously can be written as the following
normal distribution, given that the additive process and measurement noises are
white:

P(xk, m | Z0:k, U0:k, x0) ⇔ N(µk, Σk)

Defining the posterior as this normal distribution, we can apply the EKF to compute the
mean and covariance matrix of the joint posterior distribution P(xk, m | Z0:k, U0:k, x0).
Table 4.2: EKF-SLAM Notations
f: motion model
h: measurement model
wk ~ N(0, Qk): additive, zero-mean, uncorrelated Gaussian motion disturbances with covariance Qk
vk ~ N(0, Rk): additive, zero-mean, uncorrelated Gaussian observation errors with covariance Rk
Motion model
The motion model describes the change of the robot's state with time as:
P (xk|xk−1, uk)⇔ xk = f(xk−1, uk) + wk
Observation model
The observation model predicts the measurement value given the robot's state as:
P (zk|xk,m)⇔ zk = h(xk,m) + vk
where the notations are explained in table 4.2.
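As a concrete illustration, the two models can be written in code. The following is a minimal sketch, assuming a planar robot with state (x, y, θ), a velocity/turn-rate control, and a range-bearing sensor; these specific forms are illustrative choices, not the exact models used later in this thesis:

```python
import numpy as np

def motion_model(x_prev, u, dt=1.0):
    """f: planar robot state [x, y, theta], control u = [v, omega]."""
    x, y, th = x_prev
    v, om = u
    return np.array([x + v * dt * np.cos(th),
                     y + v * dt * np.sin(th),
                     th + om * dt])

def observation_model(x, landmark):
    """h: range and bearing from the robot pose to a 2D landmark."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    return np.array([np.hypot(dx, dy),
                     np.arctan2(dy, dx) - x[2]])

# One simulated step: the deterministic models plus the additive noises w_k, v_k
rng = np.random.default_rng(0)
Q = np.diag([0.01, 0.01, 0.001])   # motion disturbance covariance Q_k
R = np.diag([0.05, 0.01])          # observation error covariance R_k
x1 = motion_model([0.0, 0.0, 0.0], [1.0, 0.1]) + rng.multivariate_normal(np.zeros(3), Q)
z1 = observation_model(x1, [5.0, 2.0]) + rng.multivariate_normal(np.zeros(2), R)
```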
4.4.1 Time Update (Predict)
The purpose of this step is to update the state vector, as the landmarks or the robot itself
may move during the learning process. Consequently, there is no need to compute the time
update if the landmarks and the robot are stationary. In our case all the landmarks are
stationary and only the robot can move; when it moves, this step gives its new position
and the uncertainty of its location, noting that positional uncertainty always increases
during prediction. The time update is performed via the following two steps:
step 1: project the state ahead:

\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k)
step 2: project the error covariance ahead:

P_{xx,k|k-1} = \nabla f \, P_{xx,k-1|k-1} \, \nabla f^T + Q_k
4.4.2 Observation Update (Correct)
This step adds new features to the map and re-measures the existing features by observing
them again:
step 1: update the estimate with observation z_k:

\begin{bmatrix} \hat{x}_{k|k} \\ \hat{m}_k \end{bmatrix} = \begin{bmatrix} \hat{x}_{k|k-1} \\ \hat{m}_{k-1} \end{bmatrix} + W_k \left[ z_k - h(\hat{x}_{k|k-1}, \hat{m}_{k-1}) \right]
step 2: update the error covariance:

P_{k|k} = P_{k|k-1} - W_k S_k W_k^T,

where the innovation covariance S_k and the Kalman gain W_k are:

S_k = \nabla h \, P_{k|k-1} \, \nabla h^T + R_k
W_k = P_{k|k-1} \nabla h^T S_k^{-1},

and \nabla h is the Jacobian of h evaluated at \hat{x}_{k|k-1} and \hat{m}_{k-1}.
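The predict/correct cycle above can be sketched in a few lines. This is a deliberately small toy implementation, assuming a linear model (robot position and one landmark in 2D, observation z = m − r) so that the Jacobians ∇f and ∇h are constant matrices; it illustrates the structure of the equations, not a full EKF-SLAM system:

```python
import numpy as np

def ekf_slam_step(mu, P, u, z, Q, R):
    """One EKF-SLAM cycle on a linear toy model.

    State mu = [rx, ry, mx, my]: robot position and one landmark.
    Motion: the robot translates by u.  Observation: z = m - r + noise."""
    n = len(mu)
    # Time update (predict): x_{k|k-1} = f(x_{k-1|k-1}, u_k), P = F P F^T (+ Q on robot block)
    F = np.eye(n)                       # Jacobian of f; identity for this model
    mu = mu.copy()
    mu[:2] += u
    P = F @ P @ F.T
    P[:2, :2] += Q                      # process noise enters the robot block only
    # Observation update (correct)
    H = np.array([[-1.0, 0.0, 1.0, 0.0],
                  [0.0, -1.0, 0.0, 1.0]])   # Jacobian of h(x, m) = m - r
    S = H @ P @ H.T + R                 # innovation covariance S_k
    W = P @ H.T @ np.linalg.inv(S)      # Kalman gain W_k
    mu = mu + W @ (z - H @ mu)          # state correction
    P = P - W @ S @ W.T                 # covariance correction
    return mu, P
```

Iterating this step with a consistent observation drives the landmark estimate toward z relative to the robot, while the trace of P shrinks.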
4.4.3 Advantages and Disadvantages
The EKF-SLAM solution inherits many of the standard EKF's benefits and problems. The most
important advantages of this method are its simplicity and its proven record of working
well in practice.
The problems of EKF-SLAM arise mostly for two reasons. First, EKF-SLAM represents the
estimate of the state by a mean and its uncertainty by a covariance matrix; because these
two moments are obtained through linearization, they are approximations rather than the
true moments of the distribution. Second, the map distribution is assumed to be a uni-modal
Gaussian, so that it can be represented by its first two moments, but in fact this
distribution is not Gaussian [9].
Consistency
Bailey et al. claimed in [9] that the most important source of inconsistency in EKF-SLAM is
the heading uncertainty (variance). They examined this hypothesis in different scenarios
(moving and stationary vehicle) and with different heading uncertainties, and concluded that
with a relatively small heading uncertainty (less than 1.7 degrees), inconsistency is time
dependent and its effect can be weakened by adding some stabilising noise. For a large
variance, however, failure occurs after a few updates, and it cannot be prevented by adding
noise or by any other method.
Julier and Uhlmann, on the other hand, showed in [37] that the map built by the full-covariance
SLAM algorithm is always inconsistent, even in the case of a stationary vehicle with no
process (motion) noise. The reason earlier studies did not mention this is that the
inconsistency appears only after thousands of updates, while past work had simulated only
the first few hundred steps, in which the divergence is not yet noticeable.
So, inconsistency becomes evident when either the ground truth is available or the loop
becomes large (e.g. greater than 100 m), and it can have an even greater effect on divergence
than the computational complexity problem [26].
Convergence
Huang and Dissanayake showed that the error in the robot heading plays an important role in
the limit (lower bound) of the uncertainty of the estimated landmark locations, and
therefore has a significant effect on the convergence analysis of EKF-SLAM. They stated and
proved the following theorems in [13] for the convergence of 2D EKF-SLAM, the limits of the
covariance matrices, and the limit of the inconsistency:
Theorem 1: The determinant of any submatrix of the map covariance matrix decreases
monotonically as successive observations are made.
Theorem 2: If the robot stays stationary and observes a new landmark many times, the robot
uncertainty remains unchanged.
Theorem 3: If the robot stays stationary and observes two new landmarks many times, the
robot uncertainty remains unchanged.
Theorem 4: In EKF-SLAM, if the robot stays stationary at a point A and observes a new
landmark k times, inconsistency may occur because the Jacobians are evaluated at different
estimated values. When k → ∞, this inconsistency may cause the robot orientation error to
be reduced to zero.
Durrant-Whyte and Bailey showed in [14] that the map converges if the map covariance matrix
and all pairwise landmark submatrices converge monotonically to zero.
The individual landmark variances inherit the initial uncertainty of the robot position and
observations, and converge monotonically to a lower bound, as shown in Figure 4.2 from [14].
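Theorem 1 and this monotone convergence are easy to reproduce numerically. The sketch below, under the same linear toy-model assumption used for illustration throughout (stationary robot, one 2D landmark, observation z = m − r), repeats the EKF covariance update and records the determinant of the landmark submatrix; note the covariance recursion does not depend on the actual measurement values:

```python
import numpy as np

# Stationary robot repeatedly observing one landmark (linear toy model, z = m - r).
P = np.diag([0.1, 0.1, 4.0, 4.0])        # [robot x, robot y, landmark x, landmark y]
R = np.diag([0.25, 0.25])                # observation noise covariance
H = np.array([[-1.0, 0.0, 1.0, 0.0],
              [0.0, -1.0, 0.0, 1.0]])    # Jacobian of h(x, m) = m - r
dets = []
for _ in range(50):
    S = H @ P @ H.T + R                  # innovation covariance
    W = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    P = P - W @ S @ W.T                  # covariance correction
    dets.append(np.linalg.det(P[2:, 2:]))  # determinant of the map (landmark) submatrix
```

Each update produces a new covariance no larger (in the positive-semidefinite sense) than the previous one, so the recorded determinants shrink monotonically toward a positive lower bound, mirroring Figure 4.2.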
Figure 4.2: Convergence in landmark uncertainty. The plot shows a time history of the
standard deviations in x (in meters) of a set of landmark locations. A landmark is
initially observed with uncertainty inherited from the robot location and observation.
Over time, the standard deviations reduce monotonically to a lower bound. New landmarks
are acquired during motion.
In short, they claimed that convergence and consistency can only be guaranteed in the
linear case.
Loop Closure
As the UAV moves inside the environment and explores the map around it, the localization
error grows; the uncertainty in the UAV position therefore increases because of this
accumulated error. As a direct result, the map gradually diverges. To overcome the
divergence, the UAV must be able to recognize positions it has visited before and use the
new observations there to correct the map estimate and reduce the uncertainty of the map
and of its pose. This process in SLAM is known as the loop closure step. Gutmann and
Konolige [18] illustrated the loop closing result in Figure [].
Since the map grows as new landmarks are added at each iteration, and since online
processing requires each step to be computed quickly for near-real-time execution, all
computations must be performed cheaply. Hence closing the loop must be done with little
computational cost, and in the ideal case it does not need a separate algorithm.
EKF-based SLAM, however, needs a separate algorithm to identify loops, and closing the loop
requires a very expensive computation of O(N^2), as shown in [23]. So, loop closure remains
one of the main unsolved issues with EKF-SLAM.
Figure 4.1: Before and after closing a loop. Courtesy of [34].
Figure 4.3: Before and after closing the loop. This figure is from [19].
Data Association
Data association is the process of relating the features observed through the sensor (a
camera in our case) to the features in the existing map. If we associate these observations
incorrectly, the filter update will be based on incorrect data; and if this incorrect
association happens for more and more features, the EKF will diverge and cannot be
corrected [25].
Incorrect data association can occur if we use an inaccurate or imprecise sensor model, if
the measurements (observations) are too noisy, or simply when the features are not
stationary in the environment, as we previously assumed, and can move. Kalman-filter-based
SLAM cannot cope with incorrect data association and will diverge.
Computational Complexity
Most of the computational cost of EKF-SLAM comes from the observation update step: although
the inverted innovation matrix is small, the Kalman gain update touches the full covariance
matrix, so its cost is of the order of O(N^2). Hence the EKF update procedure needs a
quadratic number of operations, and its computational complexity is O(N^2), where N is the
number of features in the map. Consequently, EKF-SLAM cannot be used for large maps (i.e.
large N).
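The quadratic growth is easy to quantify. A small sketch, assuming for illustration a 3-parameter robot pose and 2-parameter landmarks (both assumed dimensions, not fixed by the text):

```python
# The covariance matrix holds (robot_dim + feat_dim * N)^2 entries, and the EKF
# observation update rewrites essentially all of them, hence the O(N^2) cost.
def covariance_entries(n_features, robot_dim=3, feat_dim=2):
    d = robot_dim + feat_dim * n_features
    return d * d

for n in (10, 100, 1000):
    print(n, covariance_entries(n))
```

Multiplying the number of features by 10 multiplies the storage (and per-update work) by roughly 100.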
Uni-modal Gaussian
Because the map is assumed to have a uni-modal Gaussian probability distribution in all
cases, missions like the one in Figure ?? cannot be completed. To solve this problem, the
EKF approach must be replaced with other methods.
Nonlinearity
In the EKF, nonlinear functions are replaced by their linear approximations, so the filter
can diverge if the nonlinearity is large.
Chapter 5
Experiments
The main focus of this research is to find the correct pose of an autonomously flying
quadrotor in a room, and to use this information to better control the device.
The quadrotor we used was equipped with a single onboard camera, and MonoSLAM was used as
the pose finder, with the onboard camera as the sensor. MonoSLAM is described in
Section 5.1. Davison's C++ library was used to implement MonoSLAM.¹
The ground truth of the quadrotor pose was checked with the Vicon cameras installed in the
Robust Control Lab; a brief overview of Vicon is presented in Section 5.2.
In order to simulate the flight of the UAV, we used a handheld camera. The results of the
experiments are presented in Section 5.3.
5.1 MonoSLAM
This section summarizes Davison's work on the MonoSLAM problem [12]. In MonoSLAM, the main
aim is finding the robot position (localization) rather than building a complete map of the
room; the robot adds only as many features to the map as are necessary to find its position
in the unknown world around it. The output map is therefore a sparse set of high-quality
landmarks, around 100 features for a limited space in the room. Davison assumed the scene
is rigid, and hence that the features inside the scene are stationary.
¹ The latest version of this software is available at his website: http://www.doc.ic.ac.uk/~ajd/software.html
The camera state vector x_v is a 13-parameter vector consisting of a 3-element vector r^W
representing the 3D metric position, a 4-element vector q^{WR} representing the orientation
quaternion, and 3-element vectors v^W and ω^R representing the velocity and angular velocity
respectively. The camera is modelled as a rigid body whose pose is completely described by
its translation and rotation:

x_v = \begin{bmatrix} r^W \\ q^{WR} \\ v^W \\ \omega^R \end{bmatrix},
where W indicates the "world frame" and R indicates the "robot frame".
The map of the world, x, consists of the camera state vector x_v and the feature positions
y_i. The uncertainty of this map is represented by the covariance matrix P:

x = \begin{bmatrix} x_v \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}, \qquad P = \begin{bmatrix} P_{xx} & P_{x y_1} & P_{x y_2} & \cdots \\ P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \cdots \\ P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}
The map is made of image patches with their corresponding feature positions (means) and an
elliptical region around each feature showing its uncertainty bound (standard deviation) in
3D space.
If the map has N features, its size is O(N^2) because of the size of the matrix P, and the
MonoSLAM complexity is likewise O(N^2). For real-time processing at 30 Hz, the feasible
number of features in the map is around 100, and Davison showed in [11] that 100 well-chosen
features are sufficient to span the limited volume of the room in which the experiment
takes place.
As mentioned in the last chapter, close features (those that can be observed at the same
time) are, to a good approximation, related to each other, while the position of the group
as a whole has a large uncertainty. This means the elliptical bounds mentioned above are
correlated, which legitimizes the non-zero off-diagonal elements of the covariance matrix.
MonoSLAM is based on repeated observation of features from different viewpoints. Various
subsets of features at different depths become covisible as the camera moves freely inside
the limited 3D space, and loop closures of different sizes occur; so knowing the detailed
correlation between the map's subsets is very important for solving the loop closure
problem. Davison et al. showed that, among all the known methods, the "standard single,
full covariance EKF approach" is the most computationally feasible method that can achieve
the goal of "long term, repeatable localization within restricted volumes".
Davison et al. used large image patches (11 × 11 pixels) as long-term landmarks, and used
the Shi and Tomasi feature detector for finding features. Instead of using normal template
matching for recognizing the features after re-observation, they assumed each landmark lies
on a locally planar surface and approximated this surface's normal with the vector oriented
from the feature to the camera. Because the real orientation is unknown at initialization,
each feature is stored as an oriented planar texture once fully initialized. To detect an
initialized feature from a different viewpoint when the camera pose changes, they project
the image patch to the predicted place where they expect to find the feature (using the
new camera 3D position and orientation), make a new template, and compare it with the
image in each candidate location until a peak is found (Figure 5.1).
To update the normal of each feature's plane in the 3D map as the camera moves and a new
estimate of its pose becomes available, the warped version of the initial image patch is
computed using the following formula:

H = C R [n^T x_p I - t n^T] C^{-1},

where
C = camera calibration matrix (perspective projection),
R = camera rotation matrix,
t = camera translation vector,
n = surface normal,
x_p = projection of the patch's center,
Figure 5.1: The evolution of the depth probability density from a uniform distribution to a
Gaussian; points are uniformly distributed along the image patch's normal line from 0.5 to
5 m. Courtesy of [12].
Figure 5.2: The camera poses at two different viewpoints looking at the same image patch
with normal n. Courtesy of [12].
I = 3 × 3 identity matrix.
The warped patch corresponds to the feature's predicted appearance from the current
viewpoint. The difference between this prediction and the actual measurement is used to
update the feature's surface normal direction. Davison assumed in [12] that the feature's
plane normal direction is only weakly correlated with the camera and feature positions,
and therefore maintained a separate 2-parameter EKF to estimate the normal direction of
each feature.
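The warp formula above can be checked numerically. A minimal sketch, reusing the calibration parameters reported in Section 5.3 purely as example numbers; the sanity check is that with no relative camera motion the warp reduces to the identity:

```python
import numpy as np

def patch_homography(C, R, t, n, xp):
    """H = C R [ (n^T xp) I - t n^T ] C^{-1}: warp of a locally planar patch."""
    inner = float(n @ xp) * np.eye(3) - np.outer(t, n)
    return C @ R @ inner @ np.linalg.inv(C)

# Example calibration matrix built from the parameters reported in Section 5.3
C = np.array([[941.5, 0.0, 426.0],
              [0.0, 941.5, 239.5],
              [0.0, 0.0, 1.0]])
# No rotation, no translation: the patch should not be warped at all
H = patch_homography(C, np.eye(3), np.zeros(3),
                     np.array([0.0, 0.0, 1.0]),   # surface normal n
                     np.array([0.0, 0.0, 1.0]))   # patch center projection x_p (homogeneous)
```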
To help the system assign a precise scale to the estimated map and to estimate the camera
motion well in the starting frames, the map is initialized with four features with known
positions and known appearance (the corners of a letter-sized paper, for example). The
initial position of the camera in the first frame (the viewpoint from which the initialized
features are stored) is also known.
Gathering several measurements from different viewpoints is required for estimating a
feature's depth. To find the depth, a semi-infinite 3D line is defined after initializing
the feature, starting from the camera position, passing through the feature point, and
extending to infinity. The true location of the feature lies on this line, and the Gaussian
uncertainty of this ray is shown by an elliptical region around it. The line is represented
by the vector y_i^p as follows, where r_i^W is the position of the camera and \hat{h}_i^W is
a vector describing the direction of the ray:

y_i^p = \begin{bmatrix} r_i^W \\ \hat{h}_i^W \end{bmatrix}
To decide whether a recognized feature is a new one (so it can be initialized) or an
existing one (so the knowledge about its depth can be updated), a set of uniformly
distributed depth hypotheses is defined along this semi-infinite line. The image patch
(stored in the initialization step) is projected at each discrete location to make new
image templates, and the observed feature is compared with them using 2D template matching.
The hypothesis for feature y_i at depth λ has the location y_i^λ = r_i^W + λ\hat{h}_i^W,
and its Gaussian uncertainty is shown as an ellipse around this point. Therefore, at the
beginning, each feature has a set of ellipses uniformly distributed along a line, with
their centers lying on that line. After each re-observation of the feature, the likelihood
of each hypothesis along the line is evaluated and the probability of each hypothesis is
recalculated, until the correct depth is found.
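The hypothesis reweighting can be sketched as a simple one-dimensional Bayesian filter over depth. The true depth, noise level, and number of observations below are hypothetical values chosen for illustration; in the real system the likelihood of each hypothesis comes from template matching inside its search ellipse rather than from a direct depth measurement:

```python
import numpy as np

depths = np.linspace(0.5, 5.0, 100)      # uniform depth hypotheses along the ray
weights = np.full(100, 1.0 / 100)        # uniform prior over the hypotheses
true_depth, sigma = 2.3, 0.4             # hypothetical scene and noise parameters

rng = np.random.default_rng(2)
for _ in range(6):                       # a few re-observations of the feature
    z = true_depth + rng.normal(0.0, sigma)            # noisy depth evidence
    like = np.exp(-0.5 * ((depths - z) / sigma) ** 2)  # Gaussian likelihood per hypothesis
    weights = weights * like                           # Bayes reweighting
    weights /= weights.sum()                           # renormalize

estimate = depths @ weights                            # posterior mean depth
spread = np.sqrt(((depths - estimate) ** 2) @ weights) # posterior standard deviation
```

After a handful of observations the initially uniform distribution collapses to a sharp peak near the true depth, which is exactly the behaviour shown in Figure 5.1.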
While flying inside the room and grabbing new frames, the robot must decide when a new
feature should be added to the map and when an existing feature should be deleted from it.
Davison showed that around 12 visible features per frame are sufficient for accurate
localization. A new feature is added when the number of visible features drops below this
threshold, and a feature is deleted from the map after it has failed to be detected in more
than 50% of a predetermined number of frames in which it should have been visible. Since
the image patches are relatively large, a false match between a true feature and a
similar-looking occluding object is very rare.
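The map management criterion just described amounts to two threshold tests. A minimal sketch; the minimum number of matching attempts (`min_attempts`) is a hypothetical stand-in for the "predetermined number of frames":

```python
def manage_map(n_visible, attempts, failures, min_visible=12,
               min_attempts=10, max_failure_ratio=0.5):
    """Return (add_new_feature, delete_feature) decisions for one frame/feature.

    add_new_feature: too few features are currently visible, so detect a new one.
    delete_feature:  the feature failed matching too often when it should be visible.
    """
    add_new = n_visible < min_visible
    delete = attempts >= min_attempts and failures / attempts > max_failure_ratio
    return add_new, delete
```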
5.1.1 Motion
As mentioned in Chapter 3, the SLAM algorithm consists of two steps, prediction and
update. For the MonoSLAM algorithm these two steps are as follows:
1) Prediction step:
The camera motion model Davison used in this project is constant velocity and constant
angular velocity, assuming that the unknown accelerations occur with a Gaussian profile and
cause an impulse of velocity (V^W) and angular velocity (Ω^R) at each step:

n = \begin{bmatrix} V^W \\ \Omega^R \end{bmatrix} = \begin{bmatrix} a^W \Delta t \\ \alpha^R \Delta t \end{bmatrix}

2) Update step:
Using the standard pinhole model for the camera, the image location (u, v) at which we
expect to find a feature is:

h_i = \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} u_0 - f k_u \, h^R_{Lx} / h^R_{Lz} \\ v_0 - f k_v \, h^R_{Ly} / h^R_{Lz} \end{bmatrix},

where (f, k_u, k_v, u_0, v_0) are the camera calibration parameters and h^R_L is the
expected feature position relative to the camera, expressed in the robot frame:

h^R_L = R^{RW} (y_i^W - r^W).
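The pinhole measurement prediction can be sketched directly from these equations. Here f is passed as the combined focal length with k_u = k_v = 1, using the calibration values reported in Section 5.3 only as example numbers; the sanity check is that a feature on the optical axis projects to the principal point:

```python
import numpy as np

def predict_measurement(f, ku, kv, u0, v0, R_RW, yW, rW):
    """Expected image location (u, v) of feature yW seen from a camera at rW."""
    hL = R_RW @ (np.asarray(yW) - np.asarray(rW))   # feature position in the robot frame
    u = u0 - f * ku * hL[0] / hL[2]
    v = v0 - f * kv * hL[1] / hL[2]
    return np.array([u, v])

# Hypothetical check: a feature straight ahead projects to the principal point
uv = predict_measurement(941.5, 1.0, 1.0, 426.0, 239.5,
                         np.eye(3),          # R_RW: no rotation
                         [0.0, 0.0, 2.0],    # feature 2 m ahead on the optical axis
                         [0.0, 0.0, 0.0])    # camera at the origin
```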
Figure 5.3: Camera Motion Model
Using the above motion model, the new state estimate update fv(xv, u) and its uncertainty
Qv can be computed as follows:
f_v = \begin{bmatrix} r^W_{new} \\ q^{WR}_{new} \\ v^W_{new} \\ \omega^R_{new} \end{bmatrix} = \begin{bmatrix} r^W + (v^W + V^W)\Delta t \\ q^{WR} \times q((\omega^R + \Omega^R)\Delta t) \\ v^W + V^W \\ \omega^R + \Omega^R \end{bmatrix},

Q_v = \frac{\partial f_v}{\partial n} P_n \left(\frac{\partial f_v}{\partial n}\right)^T,
where q((ω^R + Ω^R)∆t) is the quaternion defined by the rotation vector (ω^R + Ω^R)∆t, n is
the noise vector, and P_n is its diagonal covariance matrix, representing uncorrelated noise
between all the state's components. The growth rate of the motion uncertainty is determined
by the size of P_n, and the smoothness of the motion is defined by its parameter values: a
small P_n corresponds to smooth motion with small accelerations (and cannot follow rapid
movement), while a large P_n can cope with rapid motion but requires many good measurements
at each step.
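The motion model f_v can be sketched as follows, including the quaternion update q((ω + Ω)∆t); the quaternion helpers are standard constructions, written out here for completeness:

```python
import numpy as np

def quat_from_rotvec(w):
    """Unit quaternion [qw, qx, qy, qz] for the rotation vector w."""
    angle = np.linalg.norm(w)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = w / angle
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def quat_mult(a, b):
    """Hamilton product a * b of two quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def fv(r, q, v, w, V, Om, dt):
    """Constant-velocity model: impulses V, Om perturb the linear/angular velocity."""
    return (r + (v + V) * dt,                       # new position
            quat_mult(q, quat_from_rotvec((w + Om) * dt)),  # new orientation
            v + V,                                  # new velocity
            w + Om)                                 # new angular velocity
```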
5.2 Vicon
The Vicon system is designed for optical motion capture: it tracks infrared-reflective
markers attached to the object of interest, which reflect the infrared light emitted by a
ring of infrared LEDs around the lens of each camera.
The Vicon system consists of four main parts:
1. MX cameras: the system includes eight high-resolution, high-speed cameras installed
around the room. Each camera outputs the raw IR point values reflected by the markers
to the Ultranet system, as shown in Figure 5.4.
Figure 5.4: Vicon camera.
2. Ultranet MX: links the MX cameras and the host computer.
3. Host computer.
4. Infrared-reflective markers: available in four different sizes, as shown in
Figure 5.5. The key point in using the Vicon is placing these markers so that the
complete motion can be predicted from their positions. Because the camera is a rigid
body without any joints, the model we used for it is a simple box, with a marker at the
center of each side, as shown in Figure [].
By fitting the box model to the camera motion, its exact pose is found and used as
ground truth for the output of the MonoSLAM algorithm.
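The box-fitting idea can be illustrated with the standard Kabsch (orthogonal Procrustes) rigid alignment; Vicon's own fitting procedure is proprietary, so this is only a sketch of the principle:

```python
import numpy as np

def fit_rigid(model_pts, observed_pts):
    """Least-squares rotation R and translation t with observed = R @ model + t
    (Kabsch algorithm on corresponding marker positions)."""
    cm = model_pts.mean(axis=0)
    co = observed_pts.mean(axis=0)
    H = (model_pts - cm).T @ (observed_pts - co)   # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ cm
    return R, t
```

Given the six face-center markers of the box model and their tracked positions, this returns the camera's pose (R, t) in the Vicon frame at each capture instant.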
Figure 5.5: Vicon markers, from [3].
5.3 Experimental Results
We used a Panasonic PV-GS500 camcorder to run the experiment and the Vicon cameras to
check the ground truth of the MonoSLAM program. Camera calibration was performed using
the standard software and calibration grid [1], and the following parameters were
obtained:
Focal length: fku = fkv = 941.5 pixels;
Principal point: (u0, v0) = (426.0, 239.5);
Distortion factor: K1 = 0.001;
Image size: 853 × 480.
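As a hedged illustration of how such parameters are applied, a generic one-parameter radial distortion model in normalized image coordinates might look like the following; the exact distortion model used by the calibration toolbox and SceneLib may differ:

```python
def distort_pixel(u, v, fku=941.5, fkv=941.5, u0=426.0, v0=239.5, k1=0.001):
    """Apply one-parameter radial distortion about the principal point.
    Generic polynomial model x_d = x_u * (1 + K1 * r^2) in normalized
    coordinates -- for illustration only, not SceneLib's exact model."""
    xn = (u - u0) / fku                  # normalized image coordinates
    yn = (v - v0) / fkv
    scale = 1.0 + k1 * (xn * xn + yn * yn)
    return u0 + fku * xn * scale, v0 + fkv * yn * scale
```

With K1 = 0.001 in this model, the shift is zero at the principal point and stays well below a pixel even at the image corners, consistent with the small distortion factor reported above.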
The experiments were conducted inside the robust control lab; the data sequences were
collected while the camera looked down at the floor and moved within a restricted
volume inside the room.
Several aspects of our setup made the task harder. Usually there are not enough objects
on the floor, so the features are mostly tile corners, which look very similar to each
other and can lead the system to incorrect loop closures. Also, because we wanted to
keep the camera inside the space covered by the Vicon cameras, it was not possible to
move the camera very high, so using objects located on desks and tables was not
possible either. To help the system, we made a cluttered space full of objects on the
floor.
Camera localization was performed using Davison's open-source MonoSLAM library, and the
ground truth was provided by the Vicon cameras. The output of MonoSLAM at some discrete
times is shown in Figure 5.6. Each sub-figure shows the map of the room and the
detected features for the corresponding frame. The camera trajectory for a sample run
is shown in Figure 5.9, and the errors in the x and y dimensions are shown in
Figure 5.7.
The output of the Vicon cameras also has an error, with average standard deviations of
(0.0194, 0.0154, 0.0261) in the x, y and z dimensions respectively in this experiment;
therefore the error illustrated in Figure 5.7 is not due to the MonoSLAM program alone.
We can see that the difference between the MonoSLAM and Vicon outputs has a large peak
at the beginning and decreases as the program continues.
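If the Vicon and MonoSLAM errors are assumed independent and zero-mean (an assumption, not something verified here), the Vicon noise floor can be removed from the measured error in quadrature. A small sketch, with illustrative array names:

```python
import numpy as np

def error_stats(vicon_xyz, monoslam_xyz):
    """Per-axis mean and standard deviation of (Vicon - MonoSLAM)."""
    err = np.asarray(vicon_xyz) - np.asarray(monoslam_xyz)
    return err.mean(axis=0), err.std(axis=0)

def monoslam_only_std(total_std, vicon_std):
    """Subtract the Vicon noise floor in quadrature, assuming the two
    error sources are independent and zero-mean."""
    return np.sqrt(np.maximum(np.square(total_std) - np.square(vicon_std), 0.0))
```

Applying `monoslam_only_std` to the per-axis standard deviations of the plotted differences would give an estimate of the error attributable to MonoSLAM alone.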
Figure 5.6: The output of MonoSLAM for a sample sequence (panels a–k), with 4 startup features. The map is built gradually by updating the visible features in each frame. Blue marks correspond to failed matches, red marks to successful matches, and yellow marks to features unused in the current frame.
(a) Robot localization error in the x dimension, x(Vicon) − x(MonoSLAM), over iteration. (b) Robot localization error in the y dimension, y(Vicon) − y(MonoSLAM), over iteration. (c) Robot localization error in the z dimension, z(Vicon) − z(MonoSLAM), over iteration.
Figure 5.7: MonoSLAM error in the x, y and z dimensions.
(a) Orientation in the x-y plane. (b) Orientation in the y-z plane. (c) Orientation in the z-x plane.
Figure 5.8: UAV orientation, MonoSLAM and Vicon outputs.
Figure 5.9: Comparing the output trajectories in the 2D x-y plane, from Vicon and MonoSLAM.
Chapter 6
Conclusions
In this research, we studied different approaches for estimating the pose of a
quadrotor flying inside a room. The disadvantages of each approach in real-life
situations were explained, and Davison's MonoSLAM approach was finally chosen for the
localization step of the "Vision-Based UAV Control" research.
This is the starting point for our UAV control approach, and we showed that MonoSLAM
can generate results reliable enough to be used in the control step.
Chapter 7
Discussion and Future Works
7.1 Discussion
7.1.1 Which Feature to Add
Davison relied on finding high-quality landmarks, and added a new one whenever the
number of observable features in a frame fell below a predefined threshold (e.g., 12).
It is possible for the highest-quality features in an environment to lie close to each
other. This leaves most of the landmarks crowded into one part of the map while
providing no information about other parts. Davison's requirement of 12 landmarks per
frame helps considerably with this problem, but it is better to find features scattered
evenly around the room. This would also help MonoSLAM run in larger environments.
A reasonable solution is, when adding a new landmark to the map, not to search for it
within a defined region around the existing ones (Figure 7.1). This prevents the
program from selecting nearby high-quality features, which add little useful
information to the localization problem.
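A minimal sketch of this distance-gated selection; the function name and the scalar quality score are illustrative, not part of Davison's implementation:

```python
import numpy as np

def pick_new_landmark(candidates, quality, landmarks, min_dist):
    """Return the highest-quality candidate feature lying at least min_dist
    from every existing landmark, or None if all candidates are too close."""
    for i in np.argsort(quality)[::-1]:            # best quality first
        c = candidates[i]
        if all(np.linalg.norm(c - lm) >= min_dist for lm in landmarks):
            return c
    return None
```

A strong but nearby candidate is skipped in favor of a weaker candidate that fills an empty region of the map, which is exactly the trade-off argued for above.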
7.1.2 Moving Objects
When we use the SLAM algorithm for localization, then once the complete map has been
built, occlusion, acceleration and other unexpected movements can be handled by the
system and do not affect the pose estimation process.
Figure 7.1: (a) existing map of the room; 10-point stars are the features, with darker colors corresponding to high-quality features and lighter colors to low-quality ones; 4-point pink stars are the selected landmarks. (b) updated map, selecting two more landmarks by taking only feature quality into account. (c) updated map, selecting features by quality and by distance from the existing features in the map.
Davison assumed that the initial positions of 4 features with known appearance, and the
initial pose of the camera (robot), are given at the starting frame. If, unlike
Davison, we assume that a high-resolution picture of the environment (or some part of
it) is available [21] with no additional information, it is possible to use
classification trees [7, 22]. This can solve the problem of moving objects inside the
map (by giving an initial overview of the map at the starting point). Additionally, it
can help the SLAM algorithm with extra localization information [7]. It can also
address the weakness of the template matching algorithm, by providing more templates
for each landmark (similar to what Davison did in [12]).
7.2 Future works
The next step of this research is autonomous control of the quadrotor inside the room;
we will also try to apply the solutions mentioned in the discussion above to improve
the UAV localization results.
Bibliography
[1] Camera calibration toolbox. http://www.vision.caltech.edu/bouguetj/calib_
doc/.
[2] Monte carlo simulation. http://www.vertex42.com/ExcelArticles/mc/
MonteCarloSimulation.html.
[3] Vicon introduction. http://grouplab.cpsc.ucalgary.ca/cookbook/index.php/
Toolkits/ViconIntroduction.
[4] Wikipedia. http://en.wikipedia.org/wiki/Helicopter.
[5] E. Altug and C. J. Taylor. Vision-based pose estimation and control of a model
helicopter. In Proceedings of the IEEE International Conference on Mechatronics
(ICM '04), pages 316–321, June 2004.
[6] A. J. Davison, Y. González Cid, and N. Kita. Real-time 3D SLAM with wide-angle
vision. 2004.
[7] A. Boffy, Y. Tsin, and Y. Genc. Real-time feature matching using adaptive and
spatially distributed classification trees. In BMVC 2006, September 2006.
[8] T. Bailey and H. Durrant-Whyte. Simultaneous localization and mapping (SLAM):
part II. IEEE Robotics and Automation Magazine, 13(3):108–117, September 2006.
[9] T. Bailey, J. Nieto, J. Guivant, M. Stevens, and E. Nebot. Consistency of the
EKF-SLAM algorithm. In Proceedings of the 2006 IEEE/RSJ International Conference on
Intelligent Robots and Systems, pages 3562–3568, October 2006.
[10] A. Davison. Real-time simultaneous localisation and mapping with a single camera.
In Proceedings of the Ninth International Conference on Computer Vision (ICCV '03),
October 2003.
[11] A. Davison, W. Mayol, and D. Murray. Real-time localisation and mapping with
wearable active vision. In Proceedings of the IEEE International Symposium on Mixed
and Augmented Reality, 2003.
[12] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. MonoSLAM: Real-time
single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence,
29(6):1052–1067, June 2007.
[13] S. Huang and G. Dissanayake. Convergence analysis for extended Kalman filter
based SLAM. In Proceedings of the 2006 IEEE International Conference on Robotics and
Automation (ICRA 2006), pages 412–417, May 2006.
[14] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: part I.
IEEE Robotics and Automation Magazine, 13(2):99–110, June 2006.
[15] E. Gabrielyan. The basics of line moiré patterns and optical speedup. 2007.
[16] G. P. Tournier, M. Valenti, and J. P. How. Estimation and control of a quadrotor
vehicle using monocular vision and moiré patterns. In AIAA Guidance, Navigation, and
Control Conference and Exhibit, August 2006.
[17] G. Welch and G. Bishop. An introduction to the Kalman filter. 2001.
[18] J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments.
In Proceedings of the IEEE International Symposium on Computational Intelligence in
Robotics and Automation (CIRA), pages 318–325, November 1999.
[19] J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments.
In Proceedings of the IEEE International Symposium on Computational Intelligence in
Robotics and Automation (CIRA), pages 318–325, November 1999.
[20] D. Heckerman. A tutorial on learning with Bayesian networks. 1995.
[21] M. V. Joshi, S. Chaudhuri, and R. Panuganti. A learning-based method for image
super-resolution from zoomed observations. IEEE Transactions on Systems, Man, and
Cybernetics, Part B, 35:527–537, June 2005.
[22] V. Lepetit, P. Lagger, and P. Fua. Randomized trees for real-time keypoint
recognition. In Proceedings of the 2005 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR 2005), volume 2, pages 775–781, June 2005.
[23] A. Martinelli and R. Siegwart. Exploiting the information at the loop closure in
SLAM. In Proceedings of the 2007 IEEE International Conference on Robotics and
Automation, pages 2055–2060, April 2007.
[24] K. P. Murphy. Dynamic Bayesian networks. November 2002.
[25] J. Neira and J. Tardós. Data association in stochastic mapping using the joint
compatibility test. IEEE Transactions on Robotics and Automation, 17(6):890–897,
December 2001.
[26] J. Neira. Multivehicle mapping of large environments. IROS 2005 Advanced Tutorial
on SLAM, August 2005.
[27] A. Papoulis. Probability, Random Variables and Stochastic Processes.
McGraw-Hill, 3rd edition, 1991.
[28] R. Smith, M. Self, and P. Cheeseman. A stochastic map for uncertain spatial
relationships. 1988.
[29] R. C. Smith and P. Cheeseman. On the representation and estimation of spatial
uncertainty. The International Journal of Robotics Research, 5(4):56–68, 1986.
[30] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification.
Wiley-Interscience, 2nd edition, October 2000.
[31] J. Shi and C. Tomasi. Good features to track. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR '94), Seattle, June 1994.
[32] V. Smelyanskiy, P. Cheeseman, D. Maluf, and R. Morris. Bayesian super-resolved
surface reconstruction from images. Pages 375–382.
[33] D. Tal. 3D model-based SLAM from 2D images. Fall 2003.
[34] C. W. Therrien. Discrete Random Signals and Statistical Signal Processing.
Prentice Hall, 1992.
[35] T. Tommasini, A. Fusiello, V. Roberto, and E. Trucco. Robust feature tracking in
underwater video sequences. In Proceedings of IEEE Oceans, pages 46–50, September
1998.
[36] G. P. Tournier. Six degrees of freedom estimation using monocular vision and
moiré patterns, 2006.
[37] S. J. Julier and J. K. Uhlmann. A counter example to the theory of simultaneous
localization and map building. In Proceedings of the 2001 IEEE International
Conference on Robotics and Automation, pages 4238–4243, 2001.
[38] N. Metropolis and S. Ulam. The Monte Carlo method. Journal of the American
Statistical Association, 1949.
[39] V. N. Smelyanskiy, P. Cheeseman, D. A. Maluf, and R. D. Morris. Bayesian
super-resolved surface reconstruction from images. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, volume 1, pages 375–382, June
2000.