Share this document with a friend

7

Transcript

Unit Quaternion-based Parameterization forPoint Features in Visual Navigation

James Maley and Guoquan Huang

Abstract— In this paper, we propose to use unit quaternionsto represent point features in visual navigation. Contrary tothe Cartesian 3D representation, the unit quaternion can wellrepresent features at both large and small distances fromthe camera without suffering from convergence problems.Contrary to inverse-depth, homogeneous points, or anchoredhomogeneous points, the unit quaternion has error state ofminimum dimension of three. In contrast to prior representa-tions, the proposed method does not need to approximate aninitial infinite depth uncertainty. In fact, the unit-quaternionerror covariance can be initialized from the initial featureobservations without prior information, and the initial error-states are not only bounded, but the bound is identical for allscene geometries. To the best of our knowledge, this is the firsttime bearing-only recursive estimation (in covariance form) ofpoint features has been possible without using measurements toinitialize error covariance. The proposed unit quaternion-basedrepresentation is validated on numerical examples.

I. INTRODUCTION

Visual navigation with a monocular camera typically re-quires estimating point features along with the states ofthe sensor platform. Many algorithms are available forsolving this problem, for example, extended Kalman filter(EKF)-based SLAM [1]–[3], pose graph optimization [4]–[6], and visual-inertial odometry (VIO) [7]–[10]. Estimat-ing feature positions is challenging because a monocularcamera measurement provides no depth information withoutmotion parallax. Prior to being observed there is typicallyno information about the feature location. For this reason, acovariance-form EKF is difficult to initialize without makingan approximation to the prior feature covariance or perform-ing a delayed initialization procedure [11], [12].

To address these challenges, in this paper we proposeto use a unit quaternion to represent 3D feature positions.Quaternions are widely used in navigation [13], attitudeestimation [14], and computer graphics [15], but only forrepresenting rotations. It is possible to represent a R3 vectoras a so-called “pure quaternion” for the purpose of rotating itinto a different frame of reference with quaternions, but thatis not what is advocated here. Instead, the unit quaternion istreated as a special case of homogeneous coordinates, subjectto spherical normalization. The motivation for using quater-nions for representing point feature positions is analogousto that for using quaternions to represent attitude instead of

This work was partially supported by the U.S. Army Research Laboratory,the University of Delaware College of Engineering, the NSF (IIS-1566129)and the DTRA (HDTRA 1-16-1-0039).

J. Maley is with the U.S. Army Research Lab, Aberdeen Proving Ground,MD 21005, USA. Email: [email protected]

G. Huang is with the Department of Mechanical Engineering, Universityof Delaware, Newark, DE 19716, USA. Email: [email protected]

Euler angles. Euler angles are intuitive, provide a minimalerror representation, and careful modeling and engineeringwork-arounds can avoid degenerate rotation spaces (e.g.gimbal lock). For attitude, quaternions provide a minimalerror representation and work to represent any sequence ofrotations without design modifications. We show in this paperthat when using quaternions to represent feature positionsthey not only can represent features at any depth withoutsuffering from convergence issues, but do so with minimalerror representation while enabling simple undelayed initial-ization for robocentric geometries.

Note first that throughout this paper, we have employedthe following notations:

u, φ Unit vector, and quaternion anglev Vectors are in bold lowercaseM Matrices are in bold uppercasef Quaternions (and briefly homogeneous co-

ordinates) denoted by¯

a, v, M, ˆf Estimated quantities denoted byˆ

a, v, M, ˜f Error states denoted by˜CBR A rotation matrix from frame B to frame CR(α) A rotation matrix formed from Euler angles

in vector αCvA/B Coordinates of A with respect to B viewed

in frame CCvA No / implies coordinates of A with respect

to C viewed in frame Cbv×c Skew-symmetric matrix formed from v, i.e.

bv×cw = v×w where × denotes a crossproduct

⊗ Quaternion multiplication operatorJy (x) If evaluation point not specified, Jacobian

of vector valued function y with respect tox, evaluated at x = 0

{G}, {B}, {I}, {C} Global, IMU (body)-fixed, initialsighting, and camera frames

p Position vectorpf Feature positionT Homogeneous transformationCGq Rotation quaterniont> =

[θ> p>

]Transformation error vector

f> =[v> b

]Feature quaternion in user-chosen coordinate system

g> =[w> c

]Feature quaternion after sometransformation

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)Madrid, Spain, October 1-5, 2018

978-1-5386-8094-0/18/$31.00 ©2018 IEEE 6880

II. REPRESENTATIONS OF POINT FEATURES

A feature’s mathematical representation ideally should beable to represent points that are infinitely far away fromthe camera, as well as points that are close to the camera.It is known that a standard R3 Cartesian vector is notsuitable for representing points at infinity. When used inbundle adjustment (BA), Cartesian representations have poorconvergence because extremely large position corrections areneeded to be observable in pixel reprojection errors [6].Cartesian representations also yield practical difficulty forcovariance-form EKF-based solutions because it is hard tofully initialize a feature’s error covariance to represent thecomplete absence of depth information. While this issue canbe alleviated with initialization procedures such as [16], theconvergence problem remains.

The inverse-depth parameterization was introduced to ad-dress these issues and provides a parameterization in whichthe transformation from feature errors to pixel residuals ismore linear than the Cartesian representation [17]. Sincethen, inverse depth has been the standard in feature-basedSLAM systems in particular when distant points are present,and Cartesian points are used when they are close to thecamera relative to the motion. In visual SLAM, the inversedepth parameterization works by considering not only thecurrent camera frame {C} position and rotation relative tothe global frame {G}, i.e., (GpC ,CGR), but also the positionof the camera GpI when the feature was first observed.Prior to the perspective projection operation, the feature pointposition in the camera frame is given by:

Cpf = CGR

[GI R

Ipf + GpI − GpC]

(1)

In [17], the feature position relative to the initial sightingis a function of the inverse depth ρ, and the azimuth andelevation angles ψ, and θ, i.e., GI R

Ipf = 1ρm (ψ, θ). The

result is that the feature points are parameterized by a R6

vector af , i.e., af =[Gp>I ψ θ ρ

]>. The disadvantage

is the additional computational complexity of including GpIin the state-vector, which motivates in many implementationsa mechanism for switching between inverse depth and Carte-sian coordinates and a metric to determine when that switchis appropriate [18], [19].

Different state-space modeling can be used to avoidincluding GpI as part of the state vector. For example,in a robocentric formulation, the feature position can bedefined relative to either the current camera frame [20] or arecent keyframe [21]. Another inverse depth parameterizationincludes the MSCKF-based VIO [7], in which the trigono-metric terms are replaced with a projective terms α, β, i.e.,Ipf = 1

ρ

[α β 1

]>. This is another instance in which it

is not required to maintain GpI as part of the state vectorbecause by construction the features are defined relative tocamera frame in a sliding window.1

1Technically, the MSCKF and its variants do not store the feature pointsas states at all, while they are marginalized immediately after a window ofobservations are used.

While the inverse depth and Cartesian coordinates are thetwo most widely used representations, other parameteriza-tions such as homogeneous coordinates and anchored homo-geneous coordinates (AHC) also exist. In particular, AHCwere shown to be more consistent than the inverse depthin EKF-based SLAM [22], although they are 7-dimensional,which is over parameterized.

In summary, it is clearly possible to develop a functionalvisual SLAM or visual odometry system using an inverse-depth or other existing parameterization of point features.What we hope to improve upon is the generality and initial-ization by finding an alternative parameterization.

III. UNIT QUATERNION PARAMETERIZATIONFOR POINT FEATURES

A. Homogeneous Coordinates

The quaternion representation presented here can be con-sidered a special case of homogeneous coordinates. Torepresent features at any depth, it is strongly suggested in[6] using homogeneous coordinates of the form:

f =

[vb

]︸ ︷︷ ︸homogeneous

=⇒ pf =v

b︸ ︷︷ ︸Cartesian

(2)

with spherical normalization, i.e., f>f = 1. The featureposition in the camera frame is given by:

Cpf = CGR

Gpf − CGR

GpC (3)

⇒ Cpfb =[CGR

... −CGRGpC

]f (4)

Conveniently, the scalar term b cancels out in the perspectiveprojection that generates normalized pixel coordinates in theimage plane, given by:

y =Cpfx,y

bCpfzb

(5)

where Cpfx,ydenotes the x and y coordinates of Cpf and

Cpfz is the z coordinate. However, the problem is thatwith an additive error model, the normalization constraintis violated. In addition, the feature position becomes lessobservable because an additional degree of freedom is added.

B. Unit Quaternion Representation

Suppose the homogeneous coordinates are further con-strained to form a unit quaternion of the following form:

f =

[u sin(φ)cos(φ)

](6)

In this case, u is a unit vector that points from the origin ofthe reference frame to the feature point. It is not hard to seethat this satisfies the normalization constraint. From (2), thescale of the feature is given by tan(φ), thus:

pf = u tan(φ) (7)

Because tan(φ) has a range of [−∞,∞], this representationis capable of representing any point in R3.

6881

0 20 40

tan(φ)

0

1

2

3

4

tan(φ

+φ)/

tan(φ)

φ = -1o

φ = 0o

φ = 1o

0 50

x

0

0.5

1

1.5

2

(x+x)/x

x = -1

x = 0

x = 1

Fig. 1: Ratio of feature position scale before and after quaternionupdate with small φ values compared to that of feature positionscale before and after Cartesian update with small x values (left).Position update in the second special case (right).

Quaternions have been used extensively for representingorientation in most navigation applications. Usually whena quaternion is being estimated, the initial estimate ˆf iscorrected by multiplying it by an error quaternion ˜f [23].One reason for this is unit quaternions are closed underquaternion multiplication, but not under addition (see [24]).For reference, this operation is given by:

f = ˜f ⊗ ˆf =

[v

b

]⊗[v

b

](8)[

vb

]=

[Ib+ bv×c v

−v> b

] [v

b

](9)

To understand the physical interpretation of this operationwhen applied to feature positions, we examine the equivalentupdated Cartesian position by substituting (6) into (8):

v = cos(φ) sin(φ)u + cos(φ) sin(φ)u+

sin(φ) sin(φ)bu×cub = cos(φ) cos(φ)− sin(φ) sin(φ)u>f u

⇒ pf =tan(φ)u + tan(φ)u− tan(φ) tan(φ)bu×cu

1− tan(φ) tan(φ)u>u(10)

1) Special Cases: Two special cases provide intuitionabout the updated Cartesian vector (10). First, if u is in theexactly same direction as u, i.e., u>u = 1 and bu×cu = 0,then (10) reduces to:

pf =tan(φ) + tan(φ)

1− tan(φ) tan(φ)u = tan(φ+ φ)u (11)

This becomes convenient, because the change in the magni-tude of pf imparted by a small φ scales with the magnitudeof pf . Fig. 1(left) illustrates this effect. With the quaternionparameterization, small φ corrections produce larger andlarger position scale changes as the original estimate in-creases. This is in contrast to the Cartesian parameterization,in which corrections to any 3D coordinate (x in this example)have diminishing returns as the prior estimate scale increases.This is why Cartesian representations have convergenceproblems for distant feature points. But what about if φis close to π/2? In this case, it is indeed possible for acorrection with a large enough φ to push the magnitudeof the resulting Cartesian position scale past ∞ to a largenegative number. Although this results in a reconstructed

Cartesian coordinate that is behind the camera, this doesnot effect the normalized pixel coordinates because the scalefactor cancels in the perspective projection (5). In addition,an indirect (EKF) SLAM estimator is not jointly estimatingCartesian coordinates of the features with the robot statesbut estimating the feature quaternion error states, and theseestimates are what must be consistent. This is similar topushing ρ below 0 in an inverse depth parameterization.

In the second special case, if u>u = 0, we have:

pf = tan(φ)u + tan(φ)u− tan(φ) tan(φ)bu×cu

= pf +1

cos(φ)

(cos(φ)pf − sin(φ)bu×cpf

)(12)

The first term (12) is a Cartesian addition to the originalposition estimate. The second term is a special case ofa vector rotated by an angle φ about a vector −u usingRodrigues formula, and then scaled by a factor of 1/ cos(φ).The whole transformation is depicted in Fig. 1(right).

C. Linearization of Quaternions

It is clear from the special cases discussed above thata small φ may cause a large change in the Cartesiancoordinates of the feature position. As in the perturbationanalysis of quaternions used to represent rotation [24], φ isassumed to be small (i.e., cos(φ) ≈ 1), and thus, only thevector part of the error quaternion v needs to be estimated.The dimension of the quaternion error state has now gonefrom 4 to 3, same as that of Cartesian error states. The errorquaternion is hence approximated by ˜f ≈

[v> 1

]>, and

the quaternion update simply becomes [see (9)]:

f ≈[Ib+ bv×c−v>

]v +

[v

b

]=: Fv + ˆf (13)

D. Transformation of Quaternions

Suppose f is the quaternion representation of a featureposition in frame {J}, Jpf , and g is the quaternion rep-resentation of a feature position in frame {K}, Kpf .2 Thetransformation between f and g is given by [see (4)]:

Kpf = KJ RJpf − K

J RJpK (14)

⇒[wc

]︸︷︷︸g

b =

[KJ R −KJ RJpK0 1

]︸ ︷︷ ︸

T

[vb

]︸︷︷︸f

c (15)

The estimate of g is easily obtained from the estimates of fand T by multiplying them and normalizing the result. Thegoal now is to determine the linear mapping from errors in fand T to that in g, which can be determined by perturbation.Specifically, the quaternions are linearized as [see (13)]:

F =

[Ib+ bv×c−v>

]G =

[Ic+ bw×c−w>

](16)

f ≈ Fv + ˆf g ≈ Gw + ˆg (17)

b ≈ b− v>v c ≈ c− w>w (18)

2For example, J and K could be reference and camera frame, cameraframe at t = k and t = k + 1, etc. depending on filter mechanization.

6882

Substitution of these identities into (15) yields:(Gw + ˆg

) (b− v>v

)≈(T + T

)(Fv + ˆf

) (c− w>w

)where for simplicity (with a bit abuse of notation) wedecomposed the true homogeneous transformation into itsestimate and error, T ≈ T + T. By neglecting high-ordererrors and using ˆgb = T ˆf c, this can be simplified to:(

ˆgw>bc−1 + Gb)w ≈

(ˆgv> + TFc

)v + T ˆf c (19)

It is not difficult to validate that G>G = I3×3 and G> ˆg =03×1. By multiplying both sides by G>b−1, we obtain thedesired linear relation of the quaternion error states:

w ≈ c

bG>TFv +

c

bG>T ˆf (20)

Different parameterizations of T are possible in terms ofthe overarching navigation error states. For example, veryoften we consider the transformation error vector as t =[θ> p>

]>, where K

J R ≈(I− bθ×c

)KJ R, and JpK =

J pK+p. Then it becomes useful to re-derive (20) as follows:

w ≈ Jw(v)v + Jw(t)t (21)

Jw(v) =c

bG>TF (22)

Jw(t) =c

bG>

[bKJ R

(v − bJ pK

)×c −bKJ R

](23)

We point out that care should be taken to account for caseswhen b = 0 (which can happen often as will become clearin the initialization discussed later). For instance, if b = 0,then c = 0 by definition and the ratio can be set to 1.

E. Perspective Projection

Suppose the transformed quaternion g =[w> c

]>rep-

resents the feature position in the camera frame, i.e., Cpf =w/c. A monocular camera ideally measures the perspectiveprojection of this position into the camera frame (5), and thusthis measurement is: y = f

[wx/wz wy/wz

]>+ c + η,

where f is the focal-length in pixels, c is the cameracenter offset, w =

[wx wy wz

]>, and the noise η ∼

N (0, Iσ2pix). With the error state, w = w+ (Ic+ bw×c) w,

we compute the measurement Jacobian as:

Jy(w)=f

(cwz+wxwy)w2

z− w

2x

w2z− 1 − (cwx−wywz)

w2z

w2y

w2z

+ 1(cwz−wxwy)

w2z

− (cwy+wxwz)w2

z

(24)

F. Unit Vector Measurements

The most general parameterization of a (pinhole or om-nidirectional) camera measurement is a unit vector pointedto the feature in the camera frame Cu. To preserve the unit-norm constraint, the error in this vector can parameterizedby a small rotation angle α, so that Cu = R(α)C u ≈(I− bα×c)C u, where C u is the measured unit vector.The corresponding quaternion formed by this unit vectorbecomes: g =

[Cu> sin(φ) cos(φ)

]>, where the angle

component φ is unknown. As the cross product of the vector

components is invariant to φ, we use this as the measurement:y = u × w = 0. Based on w = w + (Ic+ bw×c) w, wehave the following linearized measurement model:

y ≈ y + Jy (w) w + Jy (α) α (25)

Jy (w) = bC u×c (Ic+ bw×c) (26)

Jy (α) = −bC u×cbw×c (27)

The cross product measurements however only have 2 de-grees of freedom. Performing the QR decomposition yieldsJy (w) = MN, where M>M = I and N is upper righttriangular with the bottom row equal to 0. Using M>1:2 todenote the first two rows of M>, it is possible to yield thetransformed measurement y′:

y′ ≈M>1:2y︸ ︷︷ ︸y′

+M>1:2Jy (w)︸ ︷︷ ︸Jy(w)′

w + M>1:2Jy (α)︸ ︷︷ ︸Jy(α)′

α (28)

G. Feature Initialization

To initialize the EKF-based visual SLAM system thatincludes feature quaternion f =

[v> b

]>in the state

vector, we need to compute both the feature state estimateˆf and its error covariance. To this end, when the feature ismeasured for the first time, we set its quaternion as:

ˆf0 =[u>0 0

]>(29)

where u0 is the first observed unit vector pointing from thecamera to the feature, rotated into the desired coordinatesystem. In this way, it is indeed possible to obtain the initialmeasurement provided that the camera measurements do nothave infinite variance. On the other hand, in order to computean initial covariance, we first note that:

˜f = f ⊗ ˆf−1 =

[bv − bv + bv×cv

bb+ v>v

](30)

By evaluating it at the initial quaternion ˆf0, we have:

˜f0 =

[−bu0 + bv×cu0

v>u0

]=

[− cos(φ)u0 + sin(φ)bu×cu0

sin(φ)u>u0

](31)

From above, the norm of the initial error quaternion can becomputed as follows:

‖v0‖2 =(− cos(φ)u>0 − sin(φ)u>0 bu×c

)(32)

(− cos(φ)u0 + sin(φ)bu×cu0)

= cos2(φ)− sin2(φ)u>0 bu×c2u0 (33)

= cos2(φ)− sin2(φ)u>0(uu> − I

)u0 (34)

= 1− sin2(φ)(uT0 u

) (uT u0

)(35)

= 1− sin2(φ) cos2(θ) (36)

where θ is the angle between u and u0.This is an important result, because it means when a

feature quaternion is initialized with b0 = 0 as suggestedhere, the error states are bounded by a sphere with radius 1.One condition where this bound is reached is when sin(φ) =0, which means that the feature is located at the origin of

6883

the chosen coordinate system. The other condition is whencos(φ) = 0, which means that u and u0 are perpendicular.Note that this bound applies for any u and u0. As such,a finite covariance can be assigned to the quaternion erroreven when no information is available from observations.However, having a consistent distribution for v0 does notmean that the residual error distribution will be consistent ifan arbitrary u0 is chosen because the measurement equationis nonlinear. The initialization in (29) ensures that the resid-uals will be small at first which negates the nonlinearitiesin certain situations. In practice, the proposed initializationprocedure works well in situations where the distance fromthe camera to the origin of the global coordinate frame issmall relative to the distance from the landmark to the camera(see Fig. 2), but in the opposite case the non-linearities seemto prevent consistent recursive estimation.

Fig. 2: Bad (left) and good (right)geometries for initialization.

It is important to notethat in the Cartesian rep-resentation, the feature po-sition errors have infinitevariance prior to the ini-tialization procedure. Withinverse depth representa-tions, the initial depth alsohas infinite variance, buta relatively small varianceassigned to ρ0 accounts for most depths that will ever bepractically encountered. With the quaternion representationhowever, not only is the initial error state covariance finite,it is independent from the geometry of the problem. This isvalidated in the numerical experiment in Section IV, wherean actual value for the covariance is calculated numerically,while analytically determining the covariance will be forfuture work. It should be also noted that while the distri-bution of quaternion error states has finite variance, it isnot necessarily Gaussian. So, while the error states can beinitialized with the correct variance in the EKF, there is stillan approximation of the distribution type.

IV. NUMERICAL EXPERIMENTS

Three numerical experiments were performed to validatethe results derived in this paper: (1) The first test is anextremely simple bundle adjustment problem in which asingle feature point is estimated from known camera poses.While this is not a SLAM problem because the pose isnot jointly estimated, it demonstrates the differences andsimilarities between two popular feature representations andthe proposed quaternion parameterization when it comes toconvergence and consistency. (2) The second test is designedto demonstrate that the error-states of an initial featurequaternion with no prior information is bounded as derived inthe preceding section. (3) In the third test, simulated inertialmeasurement unit (IMU) and feature points are used toevaluate three robocentric visual-inertial monocular SLAMfilters that use different point feature representations. Aninverse depth parameterization is included as benchmark and

two quaternion representations are presented to demonstratethe modeling advocated in this work.

A. Test 1: Batch Estimation

2 cameras centered on origin

Single landmark placed at [x,y,z] = [0,0,d]

−𝑏

2

𝑥

𝑦

𝑧

𝑏

2

𝑑

Fig. 3: Triangulation layout.

It is insightfulto consider asimple exampleof triangulatinga single featurefrom two cameraposes that areseparated by abaseline distanceb as illustrated in Fig. 3. The cameras are aligned with theglobal coordinate system, i.e., CGR = I. In each trial of theexperiment, each camera produces noisy 2D measurementsof the feature projection [see (5)]:

yC =(pfx,y

− pCx,y

)/ (pfz − pCz

) + η (37)

where η ∼ N (0, Iσ2). The observations from both camerasare stacked to form the vector y. For each trial of theexperiment, the measurements from the two cameras areused to perform maximum likelihood estimation (MLE) ofthe feature point position using the Cartesian, inverse depth,and quaternion representations. To show convergence ofthe estimators, they are each initialized with a depth ofd0 = 10d, and the MLE is performed using a Levenberg-Marquardt (LM) algorithm. In each iteration of the algorithm,a correction to the feature state x is computed from:

x =(Jx (y)

>Jx (y) + Iλ

)−1

Jx (y)>

(y − y (x)) (38)

where the feature for each parameterization is given by:Cartesian Inverse Depth Quaternion

x:[x y z

]> [α β ρ

]> [vx vy vz c

]>x:

[x y z

]> [α β ρ

]> [vx vy vz

]>The relaxation factor λ is set to 1e-5, and each experiment

runs the LM algorithm for 100 iterations. The experiment isrun 500 times for depths of d =

{1e1, 1e2, 1e3

}. The error

states are calculated from the true and estimated Cartesiancoordinate, inverse depth parameters, and quaternion. Thecovariance of the estimated error states is computed fromthe information matrix evaluated at the converged estimates:

P =(Jx (y)

>Jx (y)

)−1

σ2 (39)

The final error values in the form native to each parame-terization as well as the predicted 3σ error bounds computedfrom the covariance are shown in Fig. 4.3 As expected, allparameterizations converge to a consistent estimate whenthe depth-to-baseline ratio d/b is only 10. As this ratio isincreased to 100 and 1,000, the Cartesian representationfails to converge in the given number of iterations. Both theinverse-depth and quaternion parameterizations converge toconsistent estimates.

3Note that although this was a 3D experiment, the results are shown as2D cross sections to improve visibility.

6884

-4 -2 0 2 4

y×10-3

-0.05

0

0.05

0.1

z

d/b = 1e1

-4 -2 0 2 4

β×10-4

-10

-5

0

5

ρ

×10-4

-5 0 5

vy×10-4

-1

0

1

2

vz

×10-3

-0.4 -0.2 0 0.2 0.4

y

-1000

-500

0

500

z

d/b = 1e2

-4 -2 0 2 4

β×10-4

-5

0

5

ρ

×10-4

-5 0 5 10

vy×10-4

-1

-0.5

0

0.5

1

vz

×10-3

-4 -2 0 2 4

y

-5

0

5

z

×104 d/b = 1e3

-4 -2 0 2 4

β×10-4

-5

0

5

ρ

×10-4

-5 0 5

vy×10-4

-1

-0.5

0

0.5

1

vz

×10-3

error samples

predicted 3σ bound

Cartesian

Inverse Depth

Quaternion

Fig. 4: Native parameter errors b = 1, σ =1e-4.

The inverse-depth and quaternion estimates were alsotransformed back into Cartesian coordinates, along with theirassociated covariances based on a linearized transformationfrom native-to-Cartesian coordinates. It was observed inour tests that if the linearization errors are not excessive,both inverse-depth and quaternion estimates can be used toconsistently produce a Cartesian location if this is desired.When the noise is increased beyond a certain level relativeto d/b, the linearization errors become too large and thistransformation begins to break, while fully understanding thisphenomenon is left as a subject for future work.

B. Test 2: Initialization Error Bound

A simple test is conducted to verify the error boundof the initial feature quaternion with no prior informationas described in Section III-G. To do this, we perform thefollowing procedure:Repeat N times:

1) Generate random feature quaternionf =

[v> b

]>=[u> sin(φ) cos(φ)

]>.

2) Initialize feature quaternionestimate with v = u0 and b = 0.

3) Compute vector component of errorquaternion v.

The random quaternions are generated by creating u sam-ples with a uniformly distributed direction, and φ samplesuniformly distributed between −2π and 2π, which are thenused to compute (6). The u0 samples are also uniformlydistributed in direction. The results of this test with N = 100are shown in Fig. 5. As predicted, all of the error stateslie within a unit sphere. The experiment is repeated withN = 1e6, and the resulting sample covariance of the errorstates is E

[vv>

]u 0.277I.

C. Test 3: EKF-based VIO

To illustrate how the quaternion representation will beused in practice, a monocular EKF-based VIO system isimplemented. Note that our goal of this experiment is tovalidate that (i) a full EKF can be built with quaternionfeatures, (ii) the un-delayed initialization can work, and (iii)

-0.5 0 0.5 1

vx

-1

-0.5

0

0.5

1

vy Unit Circle

Error State Evaluations

-0.5 0 0.5 1

vx

-1

-0.5

0

0.5

1

vz

-0.5 0 0.5 1

vy

-1

-0.5

0

0.5

1

vz

Fig. 5: The x, y and z components of the quaternion error statesafter being initialized without prior information.

the same results can be obtained regardless of the coordinatesystem used for the quaternion features. The IMU (body)state vector for the EKF consists of the rotation quaternion,the gyroscope bias, the velocity, the accelerometer bias, andthe position: xB =

[BGq> b>ω

Gv>B b>aGp>B

]>. The

IMU state dynamics is well known and can be found in [7]and many others. The system state vector is augmented withfeatures which are defined relative to the body-frame (whichis assumed co-located with the camera frame) using threedifferent parameterizations:• Inverse-Depth VIO: The feature positions in the camera

frame are parameterized by: Cpf =[α β 1

]>ρ−1,

and the state vector is augmented with inverse depthparameters for each feature.

• Quaternions in Camera Frame: The features are param-eterized by f =

[v> c

]>so that Cpf = vc−1. Here,

the feature points are initialized as f0 =[Cu>0 0

]>,

and assigned an initial covariance of E[vv>

]u 0.277I.

• Quaternions in Camera/Global Frame: The features areparameterized by f =

[v> c

]>so that Gpf/B =

vc−1 (to clarify, Gpf/B = Gpf/G − GpB/G).Here, the feature points are initialized as f0 =[(CGR

>0Cu0)> 0

]>, and assigned an initial covariance

of E[vv>

]u 0.277I.

1) Simulations: To generate synthetic data with whichto evaluate the above algorithms, a low fidelity quadcoptormodel is created and equipped with a simulated ADIS16448IMU and a monocular camera providing feature point cor-respondences. The ADIS16448 uses the noise and biasrepeatability specifications listed in [25], but the higher ordererrors are turned off since they are not modeled in the EKF.The camera model produces feature points on the groundso that there would be approximately 20 of them in viewif the camera is pointed straight down. The noise on thefeature measurements is small (approximately 0.04 deg 1σ).The 40◦ field-of-view camera is pointed slightly down fromthe horizon for the majority of the trajectories when thequadcoptor flying at altitudes between 50 to 140 m. Severaltrajectories are shown in Fig. 6.

2) Results: Ten trajectories were run, and then the datafrom each was passed through each VIO filters. The po-

6885

Fig. 6: VIO results of trajectories and position RMSE and NEES.

sition RMSE (√Gp>B

GpB) and NEES (√Gp>BP

−1GpB)are shown in Fig. 6. From this result, while there is nostatistically significant difference in accuracy or consistencybetween the three feature parameterizations, it does verifythe efficacy of the quaternion feature parameterization andthe initialization procedure.

V. CONCLUSIONS AND FUTURE WORK

Unit quaternions are ideally suited for representing thehomogeneous coordinates of feature points at any scale. Inthis paper, we have shown that updating a feature quater-nion by quaternion multiplication of an error quaternioncan produce changes in the 3D feature position that scalewith the magnitude of the 3D position. Therefore, the errorquaternion only has minimum 3 degrees of freedom andcan be linearized as in rotation estimation problems. Asimple expression for initializing the feature quaternion waspresented, and it was shown that when this initialization isused the covariance of the quaternion error states is boundedand the bound is the same regardless of the scene geometry.This means that unlike other feature representations, thefeature quaternion error covariance can be set to a knownand finite value for use in a covariance-form Kalman filter.Numerical experiments were performed that validate theanalytical results, showing that quaternion representations aregood alternatives of inverse depth, e.g., in VIO systems.

There is a large amount of future work that can be pursuedfrom the results presented here. The unit-quaternion featureparameterization can be applied in all non-linear SLAM,bundle adjustment, and visual odometry approaches thatmake use of feature points observed with bearing-only mea-surements. In addition to the applications, theoretical resultsremain to be obtained. These include analysis of the linearity,derivation of an analytical expression for the initial featurequaternion error covariance, and observability analysis ofsystems using the feature quaternion parameterization.

REFERENCES

[1] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM:Real-time single camera SLAM,” IEEE Transactions on Pattern Anal-ysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, Jun.2007.

[2] J. Civera, A. Davison, and J. Montiel, “Inverse Depth to DepthConversion for Monocular SLAM,” in Proc. of the IEEE InternationalConference on Robotics and Automation, Roma, Italy, April 2007.

[3] J. Fuentes-Pacheco, J. Ruiz-Ascencio, and J. Rendn-Mancha, “Visualsimultaneous localization and mapping: a survey,” Artificial Intelli-gence Review, pp. 1–27, 2012.

[4] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “ORB-SLAM: aversatile and accurate monocular SLAM system,” IEEE Transactionson Robotics, vol. 15, no. 2, pp. 1147–1163, 2015.

[5] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira,I. Reid, and J. J. Leonard, “Past, present, and future of simultaneouslocalization and mapping: Toward the robust-perception age,” IEEETransactions on Robotics, vol. 32, no. 6, pp. 1309–1332, 2016.

[6] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon,“Bundle adjustment – a modern synthesis,” Lecture Notes in ComputerScience, vol. 1883, pp. 298–375, Jan. 2000.

[7] A. I. Mourikis and S. I. Roumeliotis, “A multi-state constraint Kalmanfilter for vision-aided inertial navigation,” in Proceedings of the IEEEInternational Conference on Robotics and Automation, Rome, Italy,Apr. 10–14, 2007, pp. 3565–3572.

[8] M. Li and A. I. Mourikis, “Improving the accuracy of EKF-basedvisual-inertial odometry,” in Proc. of the IEEE International Confer-ence on Robotics and Automation, St. Paul, MN, May 14–18, 2012,pp. 828–835.

[9] J. Hesch, D. Kottas, S. Bowman, and S. Roumeliotis, “Consistencyanalysis and improvement of vision-aided inertial navigation,” IEEETransactions on Robotics, vol. PP, no. 99, pp. 1–19, 2013.

[10] Z. Huai and G. Huang, “Robocentric visual-inertial odometry,” in Proc.IEEE/RSJ International Conference on Intelligent Robots and Systems,Madrid, Spain, Oct. 1-5, 2018, (to appear).

[11] T. Lemaire, S. Lacroix, and J. Sola, “A practical 3D bearing-onlySLAM algorithm,” in IEEE/RSJ International Conference on Intelli-gent Robots and Systems, Edmonton, Alberta, Canada, Aug. 2–6, 2005,pp. 2449–2454.

[12] T. Vidal-Calleja, M. Bryson, S. Sukkarieh, A. Sanfeliu, andJ. Andrade-Cetto, “On the observability of bearing only SLAM,”in Proc. of the IEEE International Conference on Robotics andAutomation, Roma, Italy, Apr. 10–14, 2007, pp. 4114–4118.

[13] S. I. Roumeliotis, G. S. Sukhatme, and G. A. Bekey, “Circumventingdynamic modeling: Evaluation of the error-state Kalman filter appliedto mobile robot localization,” in Proceedings of the IEEE InternationalConference on Robotics and Automation, vol. 2, Detroit, MI, May 10-15 1999, pp. 1656–1663.

[14] J. L. Crassidis, F. L. Markley, and Y. Cheng, “Survey of nonlinear atti-tude estimation methods,” Journal of guidance control and dynamics,vol. 30, no. 1, p. 12, 2007.

[15] J. B. Kuipers et al., Quaternions and rotation sequences. Princetonuniversity press Princeton, 1999, vol. 66.

[16] T. Lemaire, C. Berger, I. Jung, and S. Lacroix, “Vision-Based SLAM:Stereo and Monocular Approaches,” International Journal of Com-puter Vision, vol. 74, no. 3, pp. 343–364, 2007.

[17] J. Civera, A. Davison, and J. Montiel, “Inverse depth parametrizationfor monocular SLAM,” IEEE Transactions on Robotics, vol. 24, no. 5,pp. 932–945, Oct. 2008.

[18] R. Mur-Artal and J. D. Tardos, “Orb-slam2: an open-source slamsystem for monocular, stereo and rgb-d cameras,” arXiv preprintarXiv:1610.06475, 2016.

[19] J. Montiel, J. Civera, and A. Davison, “Unified inverse depthparametrization for monocular SLAM,” in Proc. of Robotics: Scienceand Systems, Philadelphia, PA, Aug. 16-19 2006, pp. 81–88.

[20] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, “Robust visualinertial odometry using a direct EKF-based approach,” in Proc. of theIEEE/RSJ International Conference on Intelligent Robots and Systems,Hamburg, Germany, Sep. 28 - Oct. 2, 2015, pp. 298–304.

[21] D. P. Koch, D. O. Wheeler, R. Beard, T. McLain, and K. M.Brink, “Relative multiplicative extended Kalman filter for observablegps-denied navigation,” Brigham Young University, Tech. Rep., 2017.[Online]. Available: http://hdl.lib.byu.edu/1877/3102

[22] J. Sola, “Consistency of the monocular EKF-SLAM algorithm forthree different landmark parametrizations,” in Proc. of the IEEEInternational Conference on Robotics and Automation, Anchorage,AK, May 3-7, 2010, pp. 3513–3518.

[23] M. D. Shuster, “A survey of attitude representations,” Journal of theAstronautical Sciences, vol. 41, no. 4, pp. 439–517, Oct.-Dec. 1993.

[24] N. Trawny and S. I. Roumeliotis, “Indirect Kalman filter for 3Dattitude estimation,” University of Minnesota, Dept. of Comp. Sci.& Eng., Tech. Rep., Mar. 2005.

[25] Data Sheet ADIS16448, Analog Devices, Inc., 2017. [Online].Available: www.analog.com

6886

Recommended