Intel Serv Robotics
DOI 10.1007/s11370-016-0195-4
ORIGINAL RESEARCH PAPER
Online underwater optical mapping for trajectories with gaps
Armagan Elibol1 · Hyunjung Shim1 · Seonghun Hong2 · Jinwhan Kim3 · Nuno Gracias4 · Rafael Garcia4
Received: 10 September 2015 / Accepted: 2 March 2016
© Springer-Verlag Berlin Heidelberg 2016
Abstract This paper proposes a vision-only online mosaicing method for underwater surveys. Our method tackles a common problem in low-cost imaging platforms, where complementary navigation sensors produce imprecise or even missing measurements. Under these circumstances, the success of the optical mapping depends on the continuity of the acquired video stream. However, this continuity cannot always be guaranteed due to motion blur or lack of texture, common in underwater scenarios. Such temporal gaps hinder the extraction of reliable motion estimates
1 School of Integrated Technology, Yonsei Institute of
Convergence Technology, Yonsei University, Incheon,
Republic of Korea
2 Robotics Program, Korea Advanced Institute of Science and
Technology, Daejeon, Republic of Korea
3 Robotics Program, Department of Mechanical Engineering,
Korea Advanced Institute of Science and Technology,
Daejeon, Republic of Korea
4 Computer Vision and Robotics Institute,
University of Girona, Girona, Spain
from visual odometry, and compromise the ability to infer the presence of loops for producing an adequate optical map. Unlike traditional underwater mosaicing methods, our proposal can handle camera trajectories with gaps between time-consecutive images. This is achieved by constructing a minimum spanning tree, which verifies whether the current topology is connected. To do so, we embed a trajectory estimate correction step based on graph theory algorithms. The proposed method was tested on several different underwater image sequences, and results are presented to illustrate its performance.
Keywords Underwater robotics · Optical mapping · Image mosaicing · Environmental monitoring
1 Introduction
Rapid developments in the robotics field have made it possible to design small, low-cost robots equipped with a limited number of sensors for aerial and/or underwater optical mapping. These robots are capable of reaching areas beyond human reach, which allows them to be used for exploring and mapping unknown environments that are difficult for humans to access, such as the Moon, Mars, or the deep ocean. Collecting optical data and processing those data to obtain a map are crucial for different purposes: map-based navigation, planning the path of the vehicle during the execution of the mission, and further processing of the generated map by human experts (e.g., localization of areas of interest, or detection of temporal changes in the morphology or biodiversity of the mapped environment by comparing generated maps). Our target scenarios are surveys of an area of interest with minimally instrumented underwater vehicles, such as low-cost ROVs, towed arrays with standalone underwater cameras, and diver or action cameras [1]. Most existing low-cost remotely operated vehicles (ROVs) [2,3] available on the market use low-cost sensors such as a video camera and a pressure sensor, and in some cases a compass [4,5]. This differs from autonomous underwater vehicles (AUVs), which are generally equipped with a wide range of navigation sensors such as ultra-short baseline (USBL), Doppler velocity log (DVL), inertial navigation system (INS), and ring laser gyroscopes. In the absence of such an extensive sensor suite, image sensing may provide valuable navigation information, although under the constraints of visibility and texture-rich environments.
Image mosaicing is a well-known tool for building 2D maps from images. The mosaicing process can be done online during data collection and/or offline, as a batch process after the data are collected. Offline mosaicing mainly aims at producing a highly accurate map of the surveyed area, while online mosaicing is usually used for navigational purposes, in order to provide visual feedback to the robot. Due to online computational restrictions, such mosaics might not be as accurate as those built by offline methods.
Most online mosaicing approaches are based on image-to-image registration, which relies on time-consecutive image registration when no other navigational sensor information is available. Underwater environmental challenges, such as poor visibility and non-uniform illumination, combined with limited platform maneuverability or an inadequate distance to the seafloor, may cause the image registration process to fail. As an example, non-overlapping time-consecutive images may appear when the camera operates at a low frame rate and the robot navigates at very low altitude, leading to a very small footprint for each image.
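The footprint arithmetic behind this failure mode is simple to sketch. The following is an illustrative calculation; all numbers and the flat-seafloor assumption are ours, not parameters from the paper:

```python
import math

def footprint_width(altitude_m, fov_deg):
    """Ground footprint width of a down-looking camera (flat-seafloor assumption)."""
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)

def consecutive_overlap(altitude_m, fov_deg, speed_mps, frame_rate_hz):
    """Along-track overlap fraction between two consecutive frames."""
    width = footprint_width(altitude_m, fov_deg)
    advance = speed_mps / frame_rate_hz  # ground distance travelled between frames
    return max(0.0, 1.0 - advance / width)

# Assumed survey parameters: 1 m altitude, 60 degree FOV, 0.5 m/s speed
print(consecutive_overlap(1.0, 60.0, 0.5, 1.0))   # ~0.57: frames still overlap
print(consecutive_overlap(1.0, 60.0, 0.5, 0.2))   # 0.0: a temporal gap appears
```

At 0.2 frames per second the inter-frame advance (2.5 m) exceeds the roughly 1.15 m footprint, so consecutive images no longer overlap and sequential registration necessarily fails.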
The underlying goal of this paper is to develop a vision-only online mosaicing method that is capable of dealing with gaps between time-consecutive images, targeting low-cost platforms with very limited sensor suites. For such platforms, images and the mosaics built from them are the only source of information about the area being mapped and the path being followed during the survey. Therefore, efficient gap handling is important to carry out successful surveys. To deal with gaps between time-consecutive images, our approach builds upon Kalman filter-based mosaicing approaches similar to [6]. We use a graph-based representation where an image is a node and a successfully registered image pair is represented as a link (or edge) between two nodes of the graph. Every link has an associated motion observation. In this paper, we consider plane-induced motions, which are represented by a 2D planar transformation in the form of a homography. This motion is computed from image registration [7,8]. When the registration of time-consecutive images fails, we add a virtual link between the current and previous nodes (or frames). The motion attached to these links is an identity mapping¹ with large uncertainty, and they are intended to keep the transects of the trajectory represented with respect to the chosen global frame. A graph is regarded as connected when there is a path between every pair of nodes; in other words, a connected graph has no unreachable nodes. In our problem, if the current topology is connected without taking into account the virtual link(s), then every node can be reached, and this yields a proper trajectory estimate regardless of the gaps between time-consecutive images. Therefore, the assumption of time-consecutive images can be relaxed by using non-consecutive overlapping image pairs and checking the connectivity of the topology graph. Motivated by this, we integrate a minimum spanning tree (MST)-based checking and trajectory correction step into the framework; this step is triggered when there are virtual links in the system. If the topology graph is connected without taking into account the present virtual links, the trajectory estimate is corrected by minimizing the symmetric transfer error.
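The graph bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (class and method names are invented): images are nodes, successful registrations are real edges, and failed consecutive registrations become virtual edges that are ignored when testing connectivity.

```python
from collections import defaultdict

class TopologyGraph:
    """Toy topology graph: real edges are verified registrations,
    virtual edges are identity-mapping placeholders for gaps."""
    def __init__(self):
        self.nodes = set()
        self.real = defaultdict(set)   # adjacency over successfully matched pairs
        self.virtual = []              # (i, j) placeholders bridging gaps

    def add_link(self, i, j, is_virtual=False):
        self.nodes.update((i, j))
        if is_virtual:
            self.virtual.append((i, j))
        else:
            self.real[i].add(j)
            self.real[j].add(i)

    def connected_without_virtual(self):
        """Depth-first search over real edges only; True if all nodes reachable."""
        if not self.nodes:
            return True
        start = next(iter(self.nodes))
        seen, stack = {start}, [start]
        while stack:
            for nb in self.real[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return seen == self.nodes

g = TopologyGraph()
g.add_link(0, 1)
g.add_link(1, 2, is_virtual=True)            # gap: registration 1-2 failed
g.add_link(2, 3)
assert not g.connected_without_virtual()     # trajectory held together by the virtual link only
g.add_link(0, 3)                             # non-consecutive loop closure found later
assert g.connected_without_virtual()         # virtual link is no longer needed
```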
2 Related work
Many online mosaicing algorithms have been presented within the context of visual simultaneous localization and mapping (SLAM). Garcia et al. [9] and Eustice et al. [10] employed an augmented-state Kalman filter (ASKF) to incorporate relative pose measurements obtained from image matching. In [11], a real-time seabed mosaicing system was proposed that combines vision and DVL odometry in order to estimate the vehicle's position and bound the odometry drift; however, the system was limited to covering only translational motion in the vision processing. Mahon et al. [12] presented large-area seafloor mapping results using a stereo vision system, where the stereo rig was also used to provide loop-closure observations. Kim and Eustice [13] showed a photomosaic of a ship hull obtained through autonomous inspection using a hovering-type AUV [14]; in their work, the authors used a pose-graph structure based on mutual information (extended information filter) for inference. These approaches have shown impressive results, but they assume the availability of expensive motion sensors (e.g., DVL, INS), which cannot be guaranteed in most commercial low-cost ROVs [2,3].
Bülow et al. [15] proposed an efficient online mosaicing method for unmanned aerial vehicles (UAVs) using the Fourier–Mellin transform. Even though the proposed methodology is able to create photo maps based on images alone, without any additional measurement of the vehicle's motion, they did not address the problem of matching non-time-consecutive images. As stated, the proposed method fails if there is not enough overlapping area between time-consecutive images.

¹ Identity mapping corresponds to no rotation (0° with respect to the chosen global frame), no translation, and a scale equal to 1.
Caballero et al. [6] formulated a vision-only online mosaicing algorithm based on the extended Kalman filter (EKF). In their model, the state vector was composed of absolute homographies; the state was updated when a loop closure was detected, and images were processed sequentially. Ferreira et al. [16] contributed a real-time mosaicing method in the frame of landmark-based SLAM without the need for costly motion sensors. Neither method explicitly considers the case of matching failure for sequential image pairs, and thus their approaches may fail if there is not enough overlap between time-consecutive images. Kekec et al. [17] utilized a geometric tool from computer graphics, called the separating axis theorem, to detect intersections between new and previous images in the mosaic. Such approaches can reduce the total computational cost by reducing the number of matching trials. However, this approach also requires overlap between time-consecutive images in order to compute the initial position of the last image.
A common feature of the above online approaches is either the reliance on navigation sensors, or the assumption that time-consecutive images can always be adequately registered.
This paper builds upon a body of work by the same group [18]. In [19], the problem of batch mosaicing was addressed using a Kalman filter framework, and different strategies were presented for ranking possible overlapping image pairs. In [20], a global alignment method was presented which substitutes non-linear minimization with two successive linear steps, reducing the computational cost significantly. In [21], the problem of obtaining the topology with fewer image-matching attempts from previously acquired unordered images was addressed using graph theory algorithms. The current paper departs from these in that it addresses the mapping online and with gaps between time-consecutive images.
3 Model definitions and nomenclature

This paper uses the following common mosaicing notation [6,19].
• $^{i}H_{j}$ is the homography relating image points represented in the coordinate frame of image $j$ to the coordinate frame of image $i$.
• All images need to be represented in a single common coordinate frame, called the mosaic frame and denoted $M$.
• $^{M}H_{i}$ is the homography relating image points in image $i$ to the mosaic frame.
• The state vector, $x = [x_1, x_2, x_3, \ldots, x_N]^T$, is composed of the homography values that relate every image with the mosaic frame, where $N$ is the total number of images:

$^{M}H_{i} = \begin{bmatrix} s_i \cos\theta_i & -s_i \sin\theta_i & tx_i \\ s_i \sin\theta_i & s_i \cos\theta_i & ty_i \\ 0 & 0 & 1 \end{bmatrix},$

where $s_i$ is the scale, $\theta_i$ the rotation, and $tx_i$ and $ty_i$ the translation parameters. $vec(\cdot)$ is the function that converts the homography matrix into a vector, thus

$x_i = vec(^{M}H_{i}) = [s_i\ \theta_i\ tx_i\ ty_i]^T.$

• $P$ denotes the covariance matrix of the state vector $x$.
• A new observation (measurement) is obtained when two images, $i$ and $j$, are successfully matched. The observation is represented by the homography between the corresponding images. The relation between the state and the observation at time $k$ can be expressed as follows:

$z(k) = vec(^{i}H_{j}) + v(k) = vec(^{i}H_{M} \cdot {}^{M}H_{j}) + v(k) = vec(mat(x_i)^{-1} \cdot mat(x_j)) + v(k), \quad (1)$

where $mat(\cdot)$ is the function which converts a state vector into a homography matrix and $v(k)$ is a $4 \times 1$ observation noise vector. It is assumed that the observation noise is Gaussian, uncorrelated with the state noise, and has covariance matrix $R(k)$.
• The reference frame is the frame of the first image; therefore $M = 1$ and it is not part of the parameter vector:

$^{M}H_{1} = mat([1\ 0\ 0\ 0]). \quad (2)$
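Under this 4-DOF similarity parametrization, the $vec(\cdot)$/$mat(\cdot)$ conversions and the relative observation of Eq. (1) can be sketched in NumPy as follows (an illustrative sketch following the notation above, not the authors' code):

```python
import numpy as np

def mat(x):
    """Build the similarity homography M_Hi from state x = [s, theta, tx, ty]."""
    s, th, tx, ty = x
    return np.array([[s * np.cos(th), -s * np.sin(th), tx],
                     [s * np.sin(th),  s * np.cos(th), ty],
                     [0.0,             0.0,            1.0]])

def vec(H):
    """Recover [s, theta, tx, ty] from a similarity homography."""
    s = np.hypot(H[0, 0], H[1, 0])
    theta = np.arctan2(H[1, 0], H[0, 0])
    return np.array([s, theta, H[0, 2], H[1, 2]])

# Relative observation of Eq. (1): i_H_j = mat(x_i)^(-1) @ mat(x_j)
xi = np.array([1.0, 0.0, 10.0, 5.0])
xj = np.array([1.2, 0.3, 12.0, 4.0])
z = vec(np.linalg.inv(mat(xi)) @ mat(xj))
# z ~ [1.2, 0.3, 2.0, -1.0]: relative scale, rotation, and translation
```

Similarity transformations form a group, so the product of an inverted similarity with another similarity is again a similarity, and `vec` remains well defined for the relative observation.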
4 Online mosaicing for trajectories with gaps
Our proposal is based on standard image-to-map (ITM) registration, using EKF equations for carrying out the trajectory estimation. For better numerical stability, we employ a square-root formulation of the Kalman filter, as in [22]. As our interest is in creating the optical map of an area from images alone, with no assumptions on the dynamics of the underwater platform, we do not take into account any control input. Our model does not have any state prediction equations; only update equations are used. The outline and detailed workflow of our proposal are given in Fig. 1.
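Because there is no prediction step, each successfully matched pair simply triggers a measurement update. A generic (non-square-root) Kalman update is sketched below as an illustration of this update-only filtering loop; the paper's actual implementation uses a square-root formulation and the nonlinear observation model of Eq. (1):

```python
import numpy as np

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update (no prediction step).
    x: state, P: state covariance, z: observation, H: observation matrix, R: obs noise."""
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy usage: two scalar states, one relative observation of their difference
x = np.zeros(2)
P = np.eye(2) * 10.0
H = np.array([[1.0, -1.0]])
R = np.array([[0.1]])
x, P = kf_update(x, P, np.array([2.0]), H, R)
# The states move apart toward the observed difference and uncertainty shrinks
```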
Image acquisition and matching Each new image acquired at time t is matched against image t − 1.
Fig. 1 Workflow of the proposed method. It is divided into three main steps: image acquisition and matching, identification of possible overlapping pairs, and state update and trajectory correction
If this matching is successful, we set Cons Flag to 1 so that this image pair (It, It−1) will be used to update the state and its covariance accordingly. We compute the percentage of overlapping area between images using a quick method: projecting a subset of points from one image to the other using the motion computed through image matching, and determining the percentage of projected points that fall inside the frame of the second image. If the matching fails, we add a virtual observation (or link) as if the matching had been successful. The main goal of adding such links is that they allow the segments of the trajectory to be represented with respect to the chosen global frame.² They are useful for searching for possible overlapping images between the last acquired image and different segments of the trajectory. Virtual links are treated as identity mappings (e.g., as in Eq. 2) with a suitably large observation noise covariance, so as to have minimal impact on the state vector. The state vector is updated with this identity mapping. We then increment the parameter Flag, which indicates whether there are virtual links in the topology and, if so, how many.

² The first image frame is usually considered as the global frame in the absence of any other relevant information.
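The quick overlap check can be sketched as follows (illustrative code; the grid density and image size are assumed values):

```python
import numpy as np

def overlap_percentage(H, width, height, n=20):
    """Project an n x n grid of points from image 1 through homography H and
    return the fraction that lands inside the frame of image 2."""
    xs, ys = np.meshgrid(np.linspace(0, width - 1, n), np.linspace(0, height - 1, n))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(n * n)])  # homogeneous coords
    proj = H @ pts
    proj = proj[:2] / proj[2]                                  # dehomogenize
    inside = ((proj[0] >= 0) & (proj[0] < width) &
              (proj[1] >= 0) & (proj[1] < height))
    return inside.mean()

# Pure half-width translation between 320 x 240 frames -> roughly 50% overlap
H = np.array([[1.0, 0.0, 160.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(overlap_percentage(H, 320, 240))  # ~0.5
```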
Identification of possible overlapping pairs After finishing the image-matching step, either with a virtual link or with a successfully matched link, the next step is to find the possible overlapping image pairs between image It and all previous images. To do so, we employ two different strategies depending on the existence of virtual links at time t. If there are virtual links, the trajectory estimate is likely to be very inaccurate; therefore, the trajectory estimate and its covariance are not used to predict overlapping image pairs. In such cases, we use only a visual similarity indicator (described next). If there are no virtual links, then we use both the visual similarity and a spatial proximity indicator.
The spatial proximity indicator is computed from the current state vector and its covariance by using the Mahalanobis distance between image centers [7,19]. The visual similarity indicator is based on feature descriptor matching. We scale down the image size to reduce the computational cost and maintain real-time performance, and extract speeded-up robust features (SURF), as they provide good performance underwater [23]. We use the local difference binary (LDB) descriptor [24], since it is a very fast and compact binary feature descriptor computed directly over an image patch, using the average intensity and first-order gradients, which can be obtained very quickly using integral images. The descriptors are matched using the Hamming distance. For each descriptor, the two best matches are found with a nearest-neighbor search, and the distance ratio from the closest neighbor to the second closest is used as a criterion to identify matching descriptors [25]. For a given pair of images, the visual similarity indicator is proportional to the number of descriptors matched under this distance-ratio criterion.
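The Hamming-distance ratio test on binary descriptors can be sketched as follows. Random byte strings stand in for LDB descriptors here, and the 0.8 ratio is an assumed value, not one reported in the paper:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match binary descriptors (uint8 rows) by Hamming distance, keeping a
    match only when the best distance is clearly below the second best."""
    # Hamming distance table via XOR + bit counting
    xor = desc1[:, None, :] ^ desc2[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    matches = []
    for i, row in enumerate(dist):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:   # Lowe-style distance-ratio criterion
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(0)
d2 = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)   # 50 random 256-bit descriptors
d1 = d2[:5].copy()                                          # first five match exactly
matches = ratio_test_matches(d1, d2)
assert all(i == j for i, j in matches) and len(matches) == 5
```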
A list of possibly overlapping image pairs is created by selecting the image pairs that have high visual similarity (when virtual links are present), or a combination of both high visual similarity and high spatial proximity (when no virtual links exist).
This list contains the pairs that should be matched in the following step using random sampling over the homography constraint. However, the list might be too large to process under the real-time constraints. In that case, only a subset of pairs is chosen, using the observation mutual information (OMI) [19,26] as a ranking criterion. The OMI can be regarded as a measure of the amount of information that one observation can provide to the system within a Kalman filter estimator, since it quantifies how much the uncertainty of the state will be reduced once the observation is made. At time t, given an observation z(t), the OMI can be calculated as:
$I(t, z(t)) = \frac{1}{2} \ln\left(|S(t)|\,|R(t)^{-1}|\right), \quad (3)$
where $S(t)$ is the innovation covariance matrix of the Kalman filter formulation [27] and $R(t)$ is the observation noise covariance matrix. The computation of the OMI requires the noise covariance matrix $R(k)$. Since the real values of $R(k)$ cannot be obtained without having matched the images, we use the same covariance matrix for all image pairs. OMI scores computed in this way can be regarded as the predicted information gain of observations. The OMI score is calculated for each image pair in the list of possible overlapping pairs.
After ranking the image pairs, we select a subset of at most five of them; this subset size was found empirically to be an adequate value during our experiments. The image pairs in the subset are then attempted to be matched. For image matching, SURF features and LDB descriptors are used, and M-estimator sample consensus (MSAC) [28] is used for outlier rejection and homography estimation.
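The OMI of Eq. (3) and the resulting ranking can be sketched as follows (the candidate innovation covariances are toy values chosen for illustration):

```python
import numpy as np

def omi_score(S, R):
    """Predicted observation mutual information, Eq. (3): 0.5 * ln(|S| |R^-1|)."""
    return 0.5 * np.log(np.linalg.det(S) * np.linalg.det(np.linalg.inv(R)))

def rank_candidates(innovation_covs, R, n=5):
    """Rank candidate image pairs by predicted information gain and keep the top n.
    innovation_covs: {pair: S(t)}, using the same R for every pair as in the text."""
    scored = sorted(innovation_covs.items(),
                    key=lambda kv: omi_score(kv[1], R), reverse=True)
    return [pair for pair, _ in scored[:n]]

R = np.eye(4) * 0.01
cands = {("a", "b"): np.eye(4) * 1.0,    # large prior uncertainty -> informative
         ("a", "c"): np.eye(4) * 0.02}   # S barely above R -> little to gain
print(rank_candidates(cands, R, n=1))    # [('a', 'b')]
```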
State update and trajectory correction If some image pairs are successfully matched, the homography between them is used to update the state. The noise covariance of the homography is calculated from the matched correspondences using first-order covariance propagation [29], assuming additive Gaussian noise on the positions of the correspondences. This noise covariance matrix is used to update the state covariance matrix through the Kalman filter update equations.
The final step in this framework is to check whether any of the existing virtual links can be removed. To do so, we check whether the successfully matched pairs so far are enough to establish a link from the first image to the current image. This can be done by finding the spanning tree (ST). If the current form of the topology allows for establishing such a connection, the topology is already connected and there is no need for virtual links. This also allows us to correct the trajectory estimate by minimizing the symmetric transfer error given in Eq. 4. Due to the time constraints of online operation, we use the four image corners as correspondences between overlapping image pairs during this step. Since the homography between overlapping image pairs is computed from a large set of inliers, the resulting homography is accurate enough to generate four virtual correspondences, namely the image corners, for trajectory correction [30]. After estimating the trajectory, its covariance is propagated [7].
This MST correction step is triggered if at least one virtual link has been added since the last time the correction was applied. Once the image acquisition is completed, we employ the correction step once more in order to reduce the effects of the remaining virtual links on the trajectory estimate.
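Evaluating the symmetric transfer error of Eq. (4) with the four image corners as virtual correspondences can be sketched as follows. This only evaluates the cost for given absolute homographies; the correction step itself minimizes it:

```python
import numpy as np

def symmetric_transfer_error(H_abs, pairs, corners):
    """Symmetric transfer error of Eq. (4), evaluated with the image corners
    as virtual correspondences. H_abs[k] maps image k to the reference frame."""
    err = 0.0
    for k, m, H_km in pairs:                     # H_km: measured homography m -> k
        Hk, Hm = H_abs[k], H_abs[m]
        for p in corners:
            p_h = np.append(p, 1.0)
            # corner of image m seen in image k (via the measurement)
            q = H_km @ p_h; q = q[:2] / q[2]
            # the same corner predicted from the absolute homographies
            r = np.linalg.inv(Hk) @ Hm @ p_h; r = r[:2] / r[2]
            err += np.sum((q - r) ** 2)
            # symmetric term: corner of image k mapped the other way
            s = np.linalg.inv(H_km) @ p_h; s = s[:2] / s[2]
            t = np.linalg.inv(Hm) @ Hk @ p_h; t = t[:2] / t[2]
            err += np.sum((s - t) ** 2)
    return err

corners = np.array([[0, 0], [320, 0], [320, 240], [0, 240]], float)
H1 = np.eye(3)
H2 = np.array([[1, 0, 100], [0, 1, 50], [0, 0, 1]], float)
pairs = [(1, 2, np.linalg.inv(H1) @ H2)]     # measurement consistent with the trajectory
assert symmetric_transfer_error({1: H1, 2: H2}, pairs, corners) < 1e-9
```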
The overall EKF algorithm is described by the following structured text outline:
1. Initialize data structures and set Flag = 0
2. Wait for new time instant and set Switch = 0 and Empty = 1
3. Do state and covariance matrix augmentation with generic values
4. Acquire new image It
5. Match against last image It−1 and compute overlap
6. if matching is successful
   (a) Add image pair (It, It−1) to the successfully matched image pairs list.
Fig. 2 Simulated trajectory: white dots represent the image centers, while red dashed lines indicate non-overlapping time-consecutive image pairs
   (b) Compute the overlap percentage between images (It, It−1)
7. else
   (a) Do EKF measurement update with virtual measurement
   (b) Set Flag = Flag + 1, overlap = 0, Switch = 1, and Empty = 0
8. end
9. if overlap < overlapThreshold
   (a) if Flag == 0
      i. Find images that are within a certain bounding area of the current position of It and that have a certain percentage of visual similarity.
   (b) else
      i. Find images that have a certain percentage of visual similarity.
   (c) end
   (d) Rank possible overlapping image pairs and select the top n of them.
   (e) Attempt to match selected image pairs
   (f) if matching is successful
      i. Update successfully matched list.
      ii. Set Empty = 1
      iii. if Switch == 1
         A. Set Flag = Flag − 1
      iv. end
   (g) end
10. end
11. if Empty == 1
   (a) Do EKF measurement update with all successfully matched images with It
   (b) if Flag >= 1
      i. Check if the current topology graph is connected
Fig. 3 Final trajectory obtained for the first dataset. Dots represent image centers and dotted (red) lines show the established non-consecutive overlapping image pairs, while dashed lines denote the non-overlapping time-consecutive images
         A. Correct trajectory estimate
         B. Set Flag = 0
      ii. end
12. end
13. Go to 2
5 Experimental results
The framework described in the previous section was tested on a general setup for image surveys using different unmanned underwater vehicles (UUVs) equipped with a down-looking camera. Experiments were carried out using both real and simulated challenging datasets of underwater surveys.
During our experiments, we found that an image resolution of around 256 × 192 pixels provides an acceptable trade-off between visual similarity quality and computational performance. Therefore, the images were downscaled by a factor of 2 or 4, depending on the original resolution. The image overlap threshold was set to 90 % (overlapThreshold = 0.9) and the number of overlapping image pairs attempted to be matched was set to n = 5.
All the tests were performed on a desktop computer with an Intel Xeon E5-1650™ 3.5 GHz processor, a 64-bit operating system, and MATLAB™ running on the CPU. For feature detection and matching, OpenCV functions were employed through a MEX-file interface. Feature detection and descriptor computation took 30 ms on average for a 256 × 192 pixel image, while LDB descriptor matching for an image pair with at most 250 descriptors took approximately 0.5 ms. The other steps of the method were implemented purely in MATLAB.
5.1 Dataset properties
The first dataset is composed of 245 images of 1024 × 1024 pixels, cropped from a high-resolution image using a subset of a real trajectory executed by the Victor-6000 ROV during the MoMAR08 cruise³ at the deep-sea Lucky Strike hydrothermal field (Mid-Atlantic Ridge) [31]. The trajectory superimposed on the high-resolution image is illustrated in Fig. 2. The total number of overlapping image pairs is 3321 and the total number of correspondences is 2,218,692. This dataset has six large jumps between time-consecutive images; in other words, there are six time-consecutive image pairs that have no overlapping area. The distance between the centers of these time-consecutive images can be considered large relative to the area covered by a single
³ http://www.ifremer.fr/biocean/acces_gb/rapports/Appel_2cruisefr.htql?numcruise=203. Accessed on August 25th, 2015.

Table 1 Errors in pixels on obtained trajectories for tested datasets. For each dataset (Dataset 1: 245 images of 1024 × 1024, 2,218,692 correspondences; Dataset 2: 1011 images of 512 × 384, 556,123 correspondences; Dataset 3: 493 images of 1440 × 806, 259,443 correspondences), the table reports, for the proposed method, image-to-image (ITI) registration, the offline mosaicing method in [7], all-against-all (AGA) matching and, for the first dataset, image-to-map (ITM) registration with 4 and 8 DOF: the number of overlapping image pairs (successfully matched and unsuccessful attempts) and the mean, standard deviation, and maximum of the image center distance error, the symmetric transfer distance error, and the error on the mosaic frame. ᵃ For this dataset, trajectory comparison is done against ITM (6-DOF). [Numeric table entries garbled in extraction.]
Fig. 4 Trajectory comparison for the second dataset. Left the trajectory shown with blue lines is obtained by minimizing the symmetric transfer errors using all-overlapping image pairs, while the black one is obtained with the proposed method. Final mosaics (rendered using a last-on-top strategy without any intensity blending) for the second dataset: Center obtained using all-overlapping image pairs; the size of the mosaic is 2348 × 3565. Right obtained with the proposed method; the size of the mosaic is 2305 × 3580
image. For this dataset, the mean distance between the centers of non-overlapping time-consecutive pairs is approximately five times greater than a single image diagonal. The final trajectory obtained with our method is illustrated in Fig. 3.
By registering every single image to the high-resolution image, we obtain trajectories that can serve as ground truth. We registered individual images with two different types of transformations, namely similarity (4-DOF) and projective (8-DOF). These trajectories are included in Table 1 as ITM 4-DOF and 8-DOF.
The second dataset originally consists of 1136 images of 512 × 384 pixels, obtained by a Phantom XTL ROV during a survey of a patch reef located in the Florida Reef Tract (depth 7–10 m) near Key Largo in the US [32]. We removed some of the images in order to create a test scenario with relatively large jumps (at least 1.5 times the diagonal size of a single image) between some of the time-consecutive images. The resulting dataset has 1011 images, of which four time-consecutive image pairs do not overlap. The total number of overlapping image pairs is 3340, while the total number of correspondences is 556,123.
The third dataset is composed of 493 images of 1440 × 806 pixels. This dataset has 15 non-overlapping time-consecutive image pairs. The total number of overlapping image pairs is 3686 and the total number of correspondences is 259,443.
The fourth dataset is extracted from a dataset acquired by the ICTINEU AUV [33] during experiments in the Mediterranean Sea, surveying at a depth of 16 m while keeping a distance of 3 m between the robot and the seafloor. The extracted data consist of 92 images of 384 × 288 pixels and are composed of two unconnected trajectories.
5.2 Trajectory accuracy comparison
For accuracy comparison of trajectories, we computed a trajectory by minimizing the symmetric transfer error given below, using all-overlapping image pairs identified by an exhaustive all-against-all (AGA) image-matching strategy.
$$\min_{{}^{1}H_{2},\,{}^{1}H_{3},\ldots,{}^{1}H_{N}} \sum_{k}\sum_{m}\sum_{j=1}^{c} \left( \left\| {}^{k}p_{j} - {}^{1}H_{k}^{-1}\cdot{}^{1}H_{m}\cdot{}^{m}p_{j} \right\|^{2} + \left\| {}^{m}p_{j} - {}^{1}H_{m}^{-1}\cdot{}^{1}H_{k}\cdot{}^{k}p_{j} \right\|^{2} \right), \quad (4)$$
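To make Eq. (4) concrete (its symbols are defined in the paragraph that follows), the cost for a given set of absolute homographies can be evaluated as in the sketch below. The data layout (one 3 × 3 absolute homography per image, correspondences stored per matched pair) and all function names are our assumptions, not the authors' implementation:

```python
import numpy as np

def to_homog(p):
    # (c, 2) pixel coordinates -> (c, 3) homogeneous coordinates
    return np.hstack([p, np.ones((p.shape[0], 1))])

def from_homog(ph):
    # (c, 3) homogeneous -> (c, 2) Euclidean coordinates
    return ph[:, :2] / ph[:, 2:3]

def symmetric_transfer_cost(H, pairs):
    """Evaluate the symmetric transfer cost of Eq. (4).

    H: dict mapping image index -> 3x3 absolute homography to the
       global frame (frame of image 1).
    pairs: list of (k, m, pts_k, pts_m), with pts_* of shape (c, 2).
    """
    cost = 0.0
    for k, m, pk, pm in pairs:
        # relative homography mapping points of image m into image k
        Hkm = np.linalg.inv(H[k]) @ H[m]
        pm_in_k = from_homog(to_homog(pm) @ Hkm.T)
        pk_in_m = from_homog(to_homog(pk) @ np.linalg.inv(Hkm).T)
        cost += np.sum((pk - pm_in_k) ** 2) + np.sum((pm - pk_in_m) ** 2)
    return cost
```

Minimizing this cost over the absolute homographies, e.g. with a non-linear least-squares solver, yields the reference trajectory.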
where k and m are image indices that were successfully matched, N is the total number of images, and c is the total number of correspondences between the overlapping image pairs. In our experiments, c is selected as the four corners of the images due to the real-time constraints. This trajectory is referred to as AGA in Table 1. We also computed trajectories using the offline mosaicing method in [7], which is capable of creating mosaics from a totally unordered image set, unlike the traditional methods requiring overlap between time-consecutive images.

Fig. 5 First row: trajectory comparison for the third dataset. The trajectory illustrated with blue lines is obtained by minimizing the symmetric transfer errors using all-overlapping image pairs, while the black one is obtained with the proposed method. Right: zoomed regions of the trajectory comparison. While the difference between the trajectories in regions 1 and 2 can be regarded as large, in region 3 the difference is relatively small. This is mainly related to the number of non-consecutive overlapping image pairs used. Images in region 1 have 474 overlapping image pairs; only 132 of them were identified and used by our method. Similarly, region 2 has 267 overlapping image pairs, of which 198 were used. Region 3 has 197 pairs, of which 179 were used. Second row: final mosaics (rendered using the last-on-top strategy) for the third dataset. Left: obtained using all-overlapping image pairs; the mosaic size is 5935 × 7052 pixels. Right: obtained with the proposed method; the mosaic size is 6090 × 7133 pixels.

This method makes use of similarity information between image pairs computed a priori using
descriptor matching. We computed this similarity informa-492
tion with the same parameter set used in our proposed493
method. We also include the results of traditional image-to-494
image online mosaicing, referred to as image-to-image (ITI)495
with generating overlapping image pairs through distance496
between image centers for a given trajectory estimate. For497
non-overlapping time-consecutive images, we introduced498
them with identity mapping. Obtained results are summa-499
rized in Table 1. First image frame of the obtained trajectory500
is aligned with the first image frame of the AGA trajectory501
(except Dataset 1) and then distance on image centers are502
calculated against the image centers obtained with AGA tra-503
jectory. Symmetric transfer errors are also computed using504
all correspondences detected over all identified overlapping505
image pairs.506
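The comparison procedure above (align the first frame, then measure distances between image centers) can be sketched as follows; the list-based trajectory layout and all names here are illustrative assumptions:

```python
import numpy as np

def center_errors(H_est, H_ref, width, height):
    """Distances between image centers of two trajectories after
    aligning the first frame of the estimate to the reference.

    H_est, H_ref: lists of 3x3 homographies (image -> global frame),
    ordered by image index and starting with the first image.
    """
    c = np.array([width / 2.0, height / 2.0, 1.0])   # image center
    A = H_ref[0] @ np.linalg.inv(H_est[0])           # first-frame alignment
    errs = []
    for He, Hr in zip(H_est, H_ref):
        pe = A @ He @ c
        pr = Hr @ c
        errs.append(np.linalg.norm(pe[:2] / pe[2] - pr[:2] / pr[2]))
    return np.array(errs)
```

Because of the first-frame alignment, any global rigid offset between the two trajectories is removed before the per-image distances are computed.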
One of the important properties of the symmetric transfer error is its invariance to the selected global frame. On the other hand, it does not always directly provide information on the final mosaic quality, due to the rendering strategy we used (last-on-top). If there is a big scale difference between images, one of them may not be visible on the final mosaic, and the error between them may not affect the visual quality of the final mosaic. For this reason, we also report the error on the mosaic frame, computed as:
$$\min_{{}^{1}H_{2},\,{}^{1}H_{3},\ldots,{}^{1}H_{N}} \sum_{k}\sum_{m}\sum_{j=1}^{c} \left| {}^{1}H_{k}\cdot{}^{k}x_{j} - {}^{1}H_{m}\cdot{}^{m}x_{j} \right|, \quad (5)$$
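The mosaic-frame error of Eq. (5) (symbols defined just below) admits an equally short sketch; again, the data layout and names are our assumptions:

```python
import numpy as np

def mosaic_frame_error(H, pairs):
    """Evaluate the mosaic-frame error of Eq. (5): distances between
    correspondences after mapping both into the global (mosaic) frame.

    H: dict mapping image index -> 3x3 homography into the mosaic frame.
    pairs: list of (k, m, pts_k, pts_m), with pts_* of shape (c, 2).
    """
    err = 0.0
    for k, m, pk, pm in pairs:
        # map both point sets into the mosaic frame (homogeneous coords)
        xk = np.hstack([pk, np.ones((len(pk), 1))]) @ H[k].T
        xm = np.hstack([pm, np.ones((len(pm), 1))]) @ H[m].T
        xk = xk[:, :2] / xk[:, 2:3]
        xm = xm[:, :2] / xm[:, 2:3]
        err += np.sum(np.linalg.norm(xk - xm, axis=1))
    return err
```

As the text notes, this measure shrinks with the mosaic scale, so it is only meaningful when comparing mosaics of similar size.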
where k and m are images that have an overlapping area, and c is the total number of correspondences between them.

Fig. 6 Initial similarity matrix computed by detecting SURF features and matching LDB descriptors without outlier rejection. The maximum number of matched descriptors is 228.

The main drawback of this error is that it suffers from a scaling effect: the smaller the mosaic gets, the smaller the error becomes. For the second and third datasets, although the errors on the final trajectories may seem high, the final mosaic quality is similar to that of their counterparts, as seen in Figs. 4 and 5. Although the offline method was able to identify almost all overlapping image pairs, it failed to obtain a coherent trajectory for the third dataset, as one of the transects remained unconnected. This is mainly due to the initial similarity matrix given in Fig. 6. This initial information can be regarded as noisy, since it suggests many possible overlapping pairs that do not actually overlap. Our proposal was able to recover the connection between transects because the MST check is applied iteratively for every image, unlike the offline method, which tries to find an MST from the first image to the last one by adaptively changing virtual links in each iteration.
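The MST check described above reduces to testing whether the graph of successfully matched image pairs spans all images, since a spanning tree exists exactly when that graph is connected. A minimal sketch follows (illustrative names, not the authors' implementation):

```python
from collections import defaultdict

def is_connected(num_images, matched_pairs):
    """True if the image-match graph admits a spanning tree over all
    images, i.e. the current topology estimate is connected.

    matched_pairs: iterable of (i, j) 0-based index pairs that were
    successfully matched (including any virtual links).
    """
    adj = defaultdict(set)
    for i, j in matched_pairs:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]          # depth-first traversal from image 0
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == num_images
```

When this check fails after a new image arrives, a virtual link can be added and the trajectory estimate correction step triggered.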
5.3 Performance analysis with different numbers of non-overlapping time-consecutive image pairs
In order to see how our method performs with different numbers of non-overlapping time-consecutive image pairs, we designed an experiment using the first dataset. We randomly removed the correspondences of a certain number of overlapping time-consecutive image pairs and ran our method. Results are provided in Table 2. For each number of non-overlapping time-consecutive image pairs, results are statistically computed over 100 different runs. For the maximum error columns, we report the maximum over the different runs. Our method was able to obtain the final trajectory. This is mainly because the trajectory is fully connected and, moreover, dense (3321 overlapping pairs for 245 images); thus, removing links between time-consecutive overlapping pairs did not break the connectivity.
We also observed that increasing the number of removed links increased the total number of executions of the correction step, as expected. On the other hand, if the distance the vehicle traveled between non-overlapping time-consecutive images is relatively small compared to the diagonal of a single image, the intermediate trajectory estimate with virtual link(s) added usually did not totally drift away from the original trajectory. This provides a good initial trajectory estimate for the correction step, thus reducing the total number of iterations in the non-linear optimization process. Also, in such cases, the execution of the correction step can be limited in order to meet computational requirements. Since the trajectory estimate usually contains virtual links (especially for a higher number of non-overlapping time-consecutive image pairs), the step of generating the list of possible overlapping image pairs mostly reduces to, and relies on, the visual similarity search between images. In such cases, the overall performance is highly dependent on the performance of the visual similarity search. In order to support these findings, we also did a similar test with the third dataset, which
Table 2 Errors in pixels on obtained trajectories for different numbers of non-overlapping time-consecutive image pairs for the first dataset

Method                  Non-overlapping       Image center distance error   Symmetric transfer distance error   Error on mosaic frame
                        consecutive pairs     Mean    Std     Max           Mean    Std    Max                  Mean    Std    Max
Proposed                 25                    4.30    3.49   44.40          5.86    3.13   57.49                2.80    3.09   51.54
All-against-all (AGA)    25                    N.A     N.A    N.A            5.98    3.02   46.41                2.86    3.00   47.46
Proposed                 50                    5.02    4.56   45.39          5.93    3.25   58.45                2.83    3.19   51.54
All-against-all (AGA)    50                    N.A     N.A    N.A            6.01    3.03   46.41                2.88    3.01   47.53
Proposed                100                    8.22    7.54   45.55          6.13    3.72   58.56                2.93    3.59   51.59
All-against-all (AGA)   100                    N.A     N.A    N.A            6.07    3.04   46.41                2.90    3.02   47.53
Proposed                200                   12.59   10.24   46.47          6.49    4.35   59.06                3.10    4.13   51.89
All-against-all (AGA)   200                    N.A     N.A    N.A            6.20    3.07   46.42                2.97    3.06   47.53
Fig. 7 Left: full trajectory. Images 180 to 225 and 270 to 315 are extracted and formed into a new dataset. Right: trajectory obtained with the proposed method. Due to the virtual link established between the last image of the first transect and the first image of the second transect, the absolute positioning of the transects is not the same as in the original trajectory. However, within each transect the images are aligned similarly to the original trajectory shown in the left plot.
Table 3 Errors in pixels on obtained trajectories for the transects of the unconnected trajectory

Dataset               Method                  Overlapping image pairs        Image center distance error   Symmetric transfer distance error   Error on mosaic frame
                                              Matched    Unsuccessful        Mean     Std     Max          Mean    Std    Max                  Mean    Std    Max
Dataset 4 Transect 1  Proposed                152        72                  28.00    20.80   63.94        6.86    2.98   41.75                5.38    4.68   58.30
Dataset 4 Transect 1  All-against-all (AGA)   293        742                 N.A      N.A     N.A          7.84    2.83   27.14                4.30    3.10   28.43
Dataset 4 Transect 2  Proposed                207        15 + 44a            4.13     4.31    16.04        7.39    3.11   37.00                6.00    5.07   58.04
Dataset 4 Transect 2  All-against-all (AGA)   366        669 + 2116a         N.A      N.A     N.A          7.20    3.68   50.69                3.65    3.72   49.80

a This number is the image-matching attempts between image pairs among the two transects
is sparser compared to the first dataset. This dataset already has 15 non-overlapping time-consecutive image pairs. We randomly removed ten more links between overlapping time-consecutive image pairs and ran 100 trials. Taking the topology into account, removing ten links was more than enough to break the connectivity of the topology, which rules out the initial scenario of having a connected trajectory. In 17 of these trials, the trajectory became unconnected due to the removed links. In 36 of the remaining 83 trials, our method was able to recover a trajectory similar to the original one. In order to see how the result changes with a different parameter set for the visual similarity search, we performed a final experiment. We increased the number of feature points detected per image by changing the image resizing scale factor from 0.25 to 0.5 for this dataset and reducing the threshold of the feature detector. These changes increased the total number of features detected and naturally caused an increase in the computational cost. Feature detection and description took approximately 80 ms per image, and descriptor matching took 7 ms on average per image pair, both running on the CPU. Again, we randomly removed ten more links between overlapping time-consecutive image pairs and ran 100 trials. In 15 trials the trajectory became unconnected, and our method was able to recover the trajectory in 76 of the remaining 85 trials.

Fig. 8 Individual trajectory comparison for the transects of Dataset 4. Left: first transect. Right: second transect.
Since the proposed method is capable of dealing with different numbers of non-overlapping time-consecutive image pairs, depending on the visual similarity search performance, the image acquisition order is not critical. This is mainly due to the fact that overlapping image pairs are searched using their visual similarity scores. It should also be noted that the total number of executions of the correction step varies depending on the image order. For the first dataset, we ordered the images randomly and ran our method, repeating this 100 times. Our method was successful in all 100 trials. This result mainly depends on the parameters used for the visual similarity search and partially on the density of the dataset. This flexibility in image order makes our method suitable for multi-robot surveying scenarios.
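Since candidate overlapping pairs are selected from visual similarity scores, the search for one image reduces to ranking its row of the similarity matrix (cf. Fig. 6). A minimal sketch under the assumption that the score is the raw number of matched descriptors; all names are illustrative:

```python
import numpy as np

def top_overlap_candidates(similarity, image_idx, k=5):
    """Return up to k candidate overlapping partners for one image,
    ranked by visual similarity score (e.g. matched-descriptor count).

    similarity: (N, N) matrix of pairwise similarity scores.
    """
    scores = similarity[image_idx].astype(float)
    scores[image_idx] = -1.0           # exclude self-match
    order = np.argsort(scores)[::-1]   # highest score first
    return [int(j) for j in order[:k] if scores[j] > 0]
```

Only the top-ranked candidates are then passed to the (more expensive) geometric image-matching stage.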
5.4 Test with unconnected trajectories
Finally, we tested our proposal on a trajectory composed of two unconnected transects, although our initial scenario assumes a connected trajectory. The trajectories are illustrated in Fig. 7. The obtained results are given in Table 3 as Dataset 4, and individual trajectory comparisons are depicted in Fig. 8. Our proposal was able to obtain the topologies of the independent parts with acceptable accuracy. The absolute positions of the transects were not obtained correctly due to the virtual link. Since the trajectory is unconnected, removing the virtual link is not possible.
6 Conclusions
Owing to inexpensive robotic platforms and optical sensors, underwater exploration and mapping have become available to an increasing number of end-users, who can deploy these systems with little expertise. In this paper, we propose an online mosaicing method that, unlike traditional methods, is capable of handling gaps between time-consecutive images; this capability makes it adequate for surveys with such low-cost platforms. This is achieved by the use of an MST check step, which triggers the trajectory estimate correction step. We also show that the visual similarity search becomes crucial when no additional positioning information is available. The proposed method was tested with several different underwater image sequences, and results were presented to illustrate its performance.
References
1. Gleason A, Gracias N, Lirman D, Gintert B, Smith T, Dick M, Reid R (2010) Landscape video mosaic from a mesophotic coral reef. Coral Reefs 29(2):253
2. GNOM Baby (2015). http://www.gnomrov.com/products/gnom-baby/
3. VideoRay Scout (2015). http://shop.videoray.com/shop-front#!/VideoRay-Scout-Remotely-Operated-Vehicle-ROV-System/p/39381588/category=0
4. SeaBotix LBV150-4 MiniROV (2013). http://www.seabotix.com/products/lbv150-4.htm
5. Proteus 500 ROV (2014). http://www.hydroacousticsinc.com/products/rov-remote-operated-vehicles/rov-product-specs.html
6. Caballero F, Merino L, Ferruz J, Ollero A (2007) Homography based Kalman filter for mosaic building. Applications to UAV position estimation. In: IEEE international conference on robotics and automation, pp 2004–2009
7. Elibol A, Gracias N, Garcia R, Gleason A, Gintert B, Lirman D, Reid PR (2011) Efficient autonomous image mosaicing with applications to coral reef monitoring. In: IROS 2011 workshop on robotics for environmental monitoring
8. Garcia-Fidalgo E, Ortiz A, Bonnin-Pascual F, Company JP (2015) A mosaicing approach for vessel visual inspection using a micro-aerial vehicle. In: 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 104–110
9. Garcia R, Puig J, Ridao P, Cufí X (2002) Augmented state Kalman filtering for AUV navigation. In: IEEE international conference on robotics and automation, Washington, D.C., vol 3, pp 4010–4015
10. Eustice R, Pizarro O, Singh H (2004) Visually augmented navigation in an unstructured environment using a delayed state history. In: 2004 IEEE international conference on robotics and automation (ICRA'04), vol 1. IEEE, pp 25–32
11. Richmond K, Rock SM (2006) An operational real-time large-scale visual mosaicking and navigation system. In: OCEANS 2006. IEEE, pp 1–6
12. Mahon I, Williams SB, Pizarro O, Johnson-Roberson M (2008) Efficient view-based SLAM using visual loop closures. IEEE Trans Robot 24(5):1002–1014
13. Kim A, Eustice R (2009) Pose-graph visual SLAM with geometric model selection for autonomous underwater ship hull inspection. In: IEEE/RSJ international conference on intelligent robots and systems (IROS'09). IEEE, pp 1559–1565
14. Vaganay J, Elkins M, Willcox S, Hover F, Damus R, Desset S, Morash J, Polidoro V (2005) Ship hull inspection by hull-relative navigation and control. In: OCEANS 2005, Proceedings of MTS/IEEE. IEEE, pp 761–766
15. Bülow H, Birk A (2009) Fast and robust photomapping with an unmanned aerial vehicle (UAV). In: IEEE/RSJ international conference on intelligent robots and systems (IROS'09). IEEE, pp 3368–3373
16. Ferreira F, Veruggio G, Caccia M, Bruzzone G (2012) Real-time optical SLAM-based mosaicking for unmanned underwater vehicles. Intell Serv Robot 5(1):55–71
17. Kekec T, Yildirim A, Unel M (2014) A new approach to real-time mosaicing of aerial images. Robot Auton Syst 62(12):1755–1767
18. Elibol A, Gracias N, Garcia R (2012) Efficient topology estimation for large scale optical mapping. In: Springer tracts in advanced robotics, vol 82. Springer, New York
19. Elibol A, Gracias N, Garcia R (2010) Augmented state-extended Kalman filter combined framework for topology estimation in large-area underwater mapping. J Field Robot 27(5):656–674
20. Elibol A, Garcia R, Gracias N (2011) A new global alignment approach for underwater optical mapping. Ocean Eng 38(10):1207–1219
21. Elibol A, Gracias N, Garcia R (2013) Fast topology estimation for image mosaicing using adaptive information thresholding. Robot Auton Syst 61(2):125–136
22. Moon H, Tully S, Kantor G, Choset H (2007) Square root iterated Kalman filter for bearing-only SLAM. In: The 4th international conference on ubiquitous robots and ambient intelligence (URAI'07), Pohang
23. Garcia R, Gracias N (2011) Detection of interest points in turbid underwater images. In: IEEE OCEANS, pp 1–9
24. Yang X, Cheng KT (2014) Local difference binary for ultrafast and distinctive feature description. IEEE Trans Pattern Anal Mach Intell 36(1):188–194
25. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
26. Ila V, Porta JM, Andrade-Cetto J (2010) Information-based compact pose SLAM. IEEE Trans Robot 26(1):78–93
27. Anderson BDO, Moore JB (1979) Optimal filtering. Prentice-Hall, USA
28. Torr P, Zisserman A (1998) Robust computation and parametrization of multiple view relations. In: Sixth international conference on computer vision. IEEE, pp 727–732
29. Haralick RM (1998) Propagating covariance in computer vision. In: Theoretical foundations of computer vision, pp 95–114
30. Gracias N, Negahdaripour S (2005) Underwater mosaic creation using video sequences from different altitudes. In: MTS/IEEE OCEANS conference, Washington, D.C., pp 1234–1239
31. Escartin J, Garcia R, Delaunoy O, Ferrer J, Gracias N, Elibol A, Cufí X, Neumann L, Fornari DJ, Humphris SE, Renard J (2008) Globally aligned photomosaic of the Lucky Strike hydrothermal vent field (Mid-Atlantic Ridge, 37°18.5′N): release of georeferenced data, mosaic construction, and viewing software. Geochem Geophys Geosyst 9(12):Q12009
32. Lirman D, Gracias N, Gintert B, Gleason A, Reid RP, Negahdaripour S, Kramer P (2007) Development and application of a video-mosaic survey technology to document the status of coral reef communities. Environ Monit Assess 159:59–73
33. Ribas D, Palomeras N, Ridao P, Carreras M, Hernandez E (2007) Ictineu AUV wins the first SAUC-E competition. In: IEEE international conference on robotics and automation, Rome