
Pronto: a Multi-Sensor State Estimator for Legged Robots in Real World Scenarios
Marco Camurri 1,∗, Milad Ramezani 1, Simona Nobili 1,2 and Maurice Fallon 1

1 Dynamic Robot Systems, Oxford Robotics Institute, Department of Engineering Science, University of Oxford, Oxford, UK
2 University of Edinburgh, Edinburgh, UK
Correspondence*: M. Camurri [email protected]

ABSTRACT

In this paper we present a modular and flexible state estimation framework for legged robots operating in real world scenarios, where environmental conditions such as occlusions, low light, rough terrain and dynamic obstacles can severely impair estimation performance. At the core of the proposed estimation system, called Pronto, is an Extended Kalman Filter (EKF) which fuses IMU and Leg Odometry sensing for pose and velocity estimation. We also show how Pronto can integrate pose corrections from visual and LIDAR odometry to correct pose drift in a loosely coupled manner. This allows it to have a real-time proprioceptive estimation thread running at high frequency (250–1000 Hz) for use in the control loop while taking advantage of occasional (and often delayed) low frequency (1–15 Hz) updates from exteroceptive sources, such as cameras and LIDARs. To demonstrate the robustness and versatility of the approach, we have tested it on a variety of legged platforms, including two humanoid robots (the Boston Dynamics Atlas and NASA Valkyrie) and two dynamic quadruped robots (IIT HyQ and ANYbotics ANYmal) for more than 2 h of total runtime and 1.37 km of distance traveled. The tests were conducted in a number of different field scenarios which include the conditions described above. The algorithms presented in this paper are made available to the research community as open-source ROS packages.

1 INTRODUCTION

Legged robotics is rapidly transitioning from research laboratories into the real world, as demonstrated by the recent introduction of several commercial quadruped platforms.

To be truly useful, legged robots must be able to reliably and rapidly navigate across rough terrain and be stable in the presence of disturbances such as slips or pushes. They must also be able to perceive and manipulate the environment whilst avoiding collisions with obstacles and people.

Figure 1. The Atlas (A) and Valkyrie (B) humanoid robots; the HyQ (C) and ANYmal (D) dynamic quadruped robots. Sources: MIT, University of Edinburgh, Istituto Italiano di Tecnologia (IIT), University of Oxford.

None of these tasks can be accomplished without the ability to accurately and robustly estimate the pose and velocity of the robot (i.e., its state) in real-time using only onboard sensors and computers. The robot's state is used to plan and track body trajectories, to balance and recover from external disturbances, as well as to map the environment and navigate through it.

To achieve a satisfactory level of accuracy, proprioceptive and exteroceptive sensor fusion is necessary, exposing the problem of synchronization and latency between the different signals coming from each sensor.


Migrating from the controlled environment of a laboratory to the real operating conditions of industrial applications (e.g., oil rig platform inspection or mine exploration) makes the task even more challenging, as it requires extra effort to robustify the estimation algorithm against unknown situations and long periods of continuous operation without human intervention.

In this paper, we demonstrate how inertial, kinematic, stereo vision and LIDAR sensing can be combined to produce a low-latency and high-frequency state estimate which can be directly used to control state-of-the-art humanoids and dynamic quadrupeds. In turn, this estimate can be used to build accurate maps of the robot's immediate environment and to enable navigational autonomy and manipulation.

Compared with prior research, this contribution is the first to provide an open source implementation of a fully integrated state estimation system performing sensor fusion of IMU, kinematics, stereo vision and LIDAR on four different legged platforms: the NASA Valkyrie and Boston Dynamics Atlas humanoids; and the IIT HyQ and ANYbotics ANYmal quadrupeds (Fig. 1). Another key achievement in comparison with the state-of-the-art is the demonstration of the algorithm in closed loop with the controller.

1.1 Contribution

This paper combines previous works focused on the individual platforms, including state estimation on Atlas in Fallon et al. (2014); an extension to the HyQ quadruped and incorporation of vision in Nobili et al. (2017a); and further evaluation of LIDAR localization in Nobili et al. (2017b).

The paper provides a complete and coherent overview of the method with a) a comprehensive and updated survey on state estimation methods for legged robots; b) additional experimental results on the ANYmal quadruped platform; c) a more detailed description of the overall estimation method and architecture.

Furthermore, we release the Pronto state estimator, the FOVIS Visual Odometry, and the AICP LIDAR odometry modules as open-source ROS packages for the research community.

2 RELATED WORK

The literature on state estimation for legged robots can be classified according to several criteria: the type of sensors used (proprioceptive, exteroceptive, or both); output frequency (at control rate, e.g. 400 Hz, or camera/LIDAR rate, e.g. 10 Hz); state definition (pose, velocity, joint states, contact points, etc.); presence of loop closures (odometry vs. SLAM); and the degree of marginalization of past states (from filtering to full batch optimization). Finally, if there is fusion of proprioceptive and exteroceptive signals, this can be performed in a loosely or tightly coupled manner.

In this section we divide the related work into three main categories: proprioceptive state estimation, which includes filtering methods that fuse only the high frequency signals such as IMU and kinematics; multi-sensor filtering, which covers filtering methods with proprioceptive and exteroceptive sensor fusion; and multi-sensor smoothing, which typically involves fusion of Visual Odometry (VO), IMU and kinematics in a tightly coupled manner within probabilistic graphical model frameworks such as factor graphs.

2.1 Proprioceptive State Estimation

Nearly all modern legged robots are equipped with IMUs, encoders and force/torque sensors. Since these devices provide low dimensional signals at high frequencies (250–1000 Hz), they are the first to be fused for a smooth (although drifting) state estimate, useful for control purposes. Since real-time safety is paramount for controllers, most methods are based on algorithms such as the Kalman filter (Section 2.1.1) or lightweight optimization methods (Section 2.1.2).

2.1.1 Kalman Filtering

Bloesch et al. (2012) were the first to propose an EKF-based state estimator method that did not depend on a specific type of gait or number of legs. The filter used IMU signals (linear acceleration and angular velocity) as inputs to be integrated for the process model. The state included the pose and velocity of the robot, as well as IMU biases and feet contact locations. In this way, they could define leg odometry measurements from forward kinematics of the feet in stable contact with the ground. Their work was implemented on the StarlETH robot and tested on short indoor experiments. Shortly after, Rotella et al. (2014) adapted the same method to humanoid platforms by including the ankle joint and the feet orientations in the state vector.

An important aspect of humanoid robot state estimation is the distance between the Center of Mass (CoM) and the feet, which is larger than in quadruped platforms. In humanoids, the flexibility of the links is therefore not negligible and can lead to falls when the CoM is incorrectly estimated to be inside the relatively small support polygon given by the robot's footprints. Xinjilefu et al. (2015) explicitly estimated the CoM offset using an inverted pendulum model to infer modeling error and/or unexpected external forces. In contrast, the approach of Koolen et al. (2016) modeled the elasticity of their robot's leg joints to better distribute error.

The above methods integrated the kinematics as position constraints. An alternative approach is to use differential kinematics in addition to forward kinematics to create linear velocity measurements, which are then integrated within the filter to get consistent position estimates. Bloesch et al. (2013) applied this approach again on the StarlETH quadruped robot. Since the angular velocity from the IMU appeared in both the inertial process model and the measurement update, the authors proposed the Unscented Kalman Filter (UKF) instead of an EKF to better handle the correlation between the joint and gyroscope noises.

Fallon et al. (2014) used the same elasticity model of Koolen et al. (2016) and integrated leg odometry as velocity measurements on the Atlas robot. Since the EKF models the measurements as Gaussian, nonlinearities such as slippages or impacts are not captured by the filter noise model. Therefore, special care was taken to ignore invalid contacts by classifying the output from the contact sensor in the feet.

When feet sensors are unavailable, the contact feet are detected by thresholding the Ground Reaction Forces (GRF), which are estimated from the joint torques. Camurri et al. (2017) proposed a method that evaluates GRF discontinuities to discard invalid leg odometry velocity measurements on the HyQ quadruped robot. To better detect the feet in contact, they also proposed a logistic regression method to learn the optimal GRF threshold on different gaits. A different approach, based on a Hidden Markov Model (HMM), was adopted by Jenelten et al. (2019) for slip recovery on the ANYmal robot. In this case, the probability of contact for each leg was determined from dynamics and differential kinematics.

2.1.2 Optimization

Kalman-based filtering has been preferred over more sophisticated methods because of its simplicity and low computational expense. However, recent technological progress has made optimization-based methods feasible to use. These methods can overcome some limitations, such as the need to define a process model even when it is unfit for the application. Indeed, the widely adopted EKF inertial process model approximates the robot to a ballistic missile, while optimization methods could incorporate the floating base dynamics equations of motion instead.

Xinjilefu et al. (2014) formulated the state estimation of the Atlas robot as a Quadratic Programming (QP) problem. The cost function was defined as the weighted sum of two quadratic terms: the modeling error and the measurement error, where the former is derived from the floating base dynamics equation of motion while the latter is derived from encoders, force/torque sensors and IMU measurements. The optimization variable was composed of: the generalized (i.e., joint and base link) velocities, the generalized forces and the modeling error itself. Note that the base link pose was not part of the state and was estimated separately with an EKF. Tests on the Atlas robot have shown significant improvements in the behavior of the feedback controller with this estimation method.

A more unified optimization-based solution was proposed by Bloesch et al. (2018). In their approach, they eliminated the process model and made each measurement dependent on both the current and the previous state of the system. Intuitively, this is similar to an incremental smoothing method with a window of size two. The approach was able to integrate the dynamic equations of motion to estimate the linear and angular acceleration of the robot body in addition to what is sensed by the IMU, providing extra redundancy. If a process model is available, it could still be incorporated as a pseudo measurement, allowing the form of an EKF to be retained if required.

2.2 Multisensor Filtering

Chilian et al. (2011) were among the first to discuss stereo, inertial and kinematic fusion on a legged robot. They used a six-legged crawling robot measuring just 35 cm across – yet combining all the required sensing on board.

Similarly, Ahn et al. (2012) addressed the 3D pose estimation of the humanoid robot Roboray using an EKF-based SLAM technique. Their motion estimation pipeline contains a visual-inertial-kinematic odometry module and a visual SLAM module. The kinematic and visual odometry are used to update the IMU measurements within an EKF filter. These constitute the input of the visual SLAM algorithm, which performs loop closures and decreases the drift.

Hornung et al. (2010) applied Monte Carlo localization (MCL), a Bayes filtering approach which recursively estimates the posterior, to estimate the 6 DoF pose of the Nao humanoid robot.


By fusing the measurements of a 2D LIDAR with a motion model, they estimated the pose of the robot's torso, including while climbing a miniature staircase.

Ma et al. (2016) proposed an error-state Kalman filter fusing a tactical grade inertial measurement unit with stereo visual odometry to produce a pose estimate for navigation tasks, such as path planning. The robot's kinematic sensing was only used when visual odometry failed. Their approach was focused on pose estimation and was not used within the robot's closed loop controller. Their extensive evaluation (over thousands of meters) achieved 1 % error per distance travelled.

In contrast to the above mentioned methods, we aim to estimate both the pose and velocity of the robot with multisensor fusion, and use this estimate online inside the control loop. This is motivated by the fact that, for highly dynamic motions, the drift rate of proprioceptive estimators is unacceptable and requires integration of other exteroceptive signals.

The estimator used in this work is based on a loosely-coupled EKF, an approach that has been previously applied to Micro Aerial Vehicles (MAVs), e.g., Lynen et al. (2013); Shen et al. (2014).

2.3 Multisensor Smoothing

Smoothing methods are well established in the MAV community for tightly coupled visual-inertial navigation, partly due to the relatively low complexity of these machines (e.g., fewer degrees of freedom). The main advantage of smoothing is the ability to jointly use all (or part) of the past history of measurements to reduce the uncertainty around the full robot's trajectory.

In recent years, promising works have been released applying these techniques to legged machines. Hartley et al. (2018b) proposed the first attempt to fuse leg odometry and IMU in a factor graph on the Cassie bipedal robot. They extended the state with the feet contact locations and defined two new factors to incorporate forward kinematics and impose a zero velocity constraint on the contact points of a foot. These were then combined with the preintegrated IMU factor from Forster et al. (2017a). Hartley et al. (2018a) extended this work to include additional pose measurements from the SVO Visual Odometry system (Forster et al. (2017b)). Both works were demonstrated on Cassie in controlled environments for a short period of time (<5 min).

Wisth et al. (2019) proposed a tightly coupled visual-inertial-legged system based on the iSAM2 solver running on the ANYmal robot. The method extracts Kanade-Lucas features from the stereo images of a RealSense D435 camera and optimizes them as landmarks in the graph. Leg odometry was integrated as relative pose factors obtained from the internal state estimator running on the robot (Bloesch et al. (2018)). The method was demonstrated on extensive outdoor experiments in urban and industrial scenarios where dynamic occludants and textureless areas were present in the scene.

All the above works were based on the assumption of a stationary point of contact. This assumption is violated every time there are slippages or deformations of the leg and/or the ground. Contact detection methods can help to reject sporadic slippage or deformation events. However, when these occur regularly, they need to be modelled.

Wisth et al. (2019) proposed a factor graph method that models contact nonlinearities as a bias term on the linear velocity measurements from leg odometry. This can reduce the inconsistency between leg and visual odometry and provide a more robust pose and velocity estimate.

3 PROBLEM STATEMENT

We wish to track the pose and velocity of an articulated floating base robot with two or more legs, equipped with an onboard IMU, joint sensing of position and torque, cameras, and LIDARs. In this paper, we will focus on the Atlas and Valkyrie 28-DoF humanoids and on the HyQ and ANYmal 12-DoF quadrupeds. Robots of the same type share the same kinematic tree, with differences only in the link lengths and sensor locations.

3.1 Frames and Definitions

3.1.1 Coordinate Frames

In Figure 2 we illustrate the reference frames relevant to our estimation problem. The inertial frame W and the base frame B are rigidly attached to the ground and the robot's floating base, respectively. The frames located at the sensor origins are also rigidly attached to the floating base, namely: the camera optical frame C, the IMU frame I, and the LIDAR frame L. The relative locations of these frames are known by design or can be retrieved with calibration procedures such as the ones described in Furgale et al. (2013); Reinke et al. (2019). One or more temporary contact frames K are created when a foot comes into contact with the ground.


Figure 2. Reference frame conventions for typical quadruped and humanoid robots (with a simplified upper torso structure). The world frame W is fixed to earth, while the base frame B, the camera's optical frame C, and the IMU frame I are rigidly attached to the robot's chassis or head. When a foot touches the ground (e.g., the Right Front, RF), a contact frame K (perpendicular to the ground) is defined.

3.1.2 Notation

In the remainder of the paper, we adopt the following conventions: the robot position $\mathbf{p} = \mathbf{p}^{W}_{WB} \in \mathbb{R}^3$ and orientation $\mathbf{R} = \mathbf{R}_{WB} \in SO(3)$ are from world to base and expressed in world coordinates; the robot velocities $\mathbf{v} = \mathbf{v}^{B}_{WB},\ \boldsymbol{\omega} = \boldsymbol{\omega}^{B}_{WB} \in \mathbb{R}^3$ are from world to base, expressed in base coordinates; the IMU biases $\mathbf{b}^{a}_{I},\ \mathbf{b}^{\omega}_{I} \in \mathbb{R}^3$ are expressed in IMU coordinates. A time dependent vector quantity $\mathbf{a}$ computed at time $t_k$ is shortened as $\mathbf{a}_k = \mathbf{a}(t_k)$.

3.2 State Definition

The robot state is defined as the vector combining position, orientation, linear velocity, and IMU biases. The angular velocity does not appear as it is assumed to be directly measured by the IMU once properly bias compensated. The state at time $t_k$ is:

$$\mathbf{x}_k = \begin{bmatrix}\mathbf{p}_k & \mathbf{R}_k & \mathbf{v}_k & \mathbf{b}^{a}_{k} & \mathbf{b}^{\omega}_{k}\end{bmatrix}^T \quad (1)$$

The orientation uncertainty is tracked by the exponential coordinates of the perturbation rotation vector, as described in Bry et al. (2012).
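As an illustration only (the actual Pronto implementation is in C++ and uses its own data structures), the state of Equation 1 could be held in a structure such as the following minimal NumPy sketch:

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class FilterState:
    """State of Equation 1: position, orientation, velocity, IMU biases."""
    p: np.ndarray = field(default_factory=lambda: np.zeros(3))    # base position in world frame
    R: np.ndarray = field(default_factory=lambda: np.eye(3))      # base orientation R_WB
    v: np.ndarray = field(default_factory=lambda: np.zeros(3))    # linear velocity in base frame
    b_a: np.ndarray = field(default_factory=lambda: np.zeros(3))  # accelerometer bias
    b_w: np.ndarray = field(default_factory=lambda: np.zeros(3))  # gyroscope bias

# The error state has 15 dimensions: the orientation uncertainty is tracked
# in exponential coordinates (a 3-vector), as described in Bry et al. (2012).
ERROR_STATE_DIM = 15
```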

3.3 Requirements

To effectively track base and feet trajectories, the state estimate should have negligible drift at least over the course of one planning cycle. Modern footstep planners typically replan every 1–5 s. Low latency velocity estimates (including transduction and data transmission) are also fundamental for the feedback loop of a controller.

Low drift or drift-free state estimates are also required for navigation tasks (such as mapping and global path planning) as basic building blocks for many autonomous systems.

With these considerations in mind, we define the following requirements for a state estimator designed to run on legged robots in field operations:

• low pose drift on short range (e.g., 10 m);
• reliability in real semi-structured environments (i.e., does not diverge);
• signal smoothness for safe use in a control loop.

4 METHOD DESCRIPTION

Our approach adapts the core EKF filter described in Bry et al. (2012), with velocity corrections added by Fallon et al. (2014) for humanoid kinematics and then extended to quadruped kinematics in Camurri et al. (2017) (see Section 4.2). Additional pose corrections from visual odometry and LIDAR registration are described in Sections 4.4 and 4.5.

The goal of the EKF is to estimate the mean $\boldsymbol{\mu}$ and covariance $\Sigma$ of the Gaussian distribution over the state $\mathbf{x}_k$, given: the previous state $\mathbf{x}_{k-1}$, the current control input $\mathbf{u}_k$, and the current measurement $\mathbf{z}_k$. The state is first predicted using the nonlinear discrete transition function $f(\cdot)$ and then corrected by the observation function $h(\cdot)$. Both functions are corrupted by zero-mean Gaussian process noise $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, Q_k)$ and measurement noise $\boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, P_k)$:

$$\mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_k, \mathbf{w}_k) \quad (2)$$
$$\mathbf{z}_k = h(\mathbf{x}_k, \boldsymbol{\eta}_k) \quad (3)$$

The mean and covariance are propagated in the standard manner:

$$\boldsymbol{\mu}^{-}_{k} = f(\boldsymbol{\mu}_{k-1}, \mathbf{u}_k, \mathbf{0}) \quad (4)$$
$$\Sigma^{-}_{k} = A_{k-1}\Sigma_{k-1}A^{T}_{k-1} + W_{k-1}\,Q\,W^{T}_{k-1} \quad (5)$$

where the minus superscript indicates that the quantity is evaluated before the measurement update takes place. For details on the derivation of the partial derivatives of the transition function, $A$ and $W$, please refer to Bry et al. (2012).

The measurements are also integrated in a standard EKF manner. For instance, a velocity measurement $\mathbf{v}_k \in \mathbb{R}^3$ with covariance matrix $P^{v}_{k} \in \mathbb{R}^{3\times 3}$ would be integrated as follows:

$$\mathbf{z}_k = \mathbf{v}_k \quad (6)$$
$$K_k = \Sigma^{-}_{k} H^{T}\left(H\Sigma^{-}_{k}H^{T} + P^{v}_{k}\right)^{-1} \quad (7)$$
$$\boldsymbol{\mu}_k = \boldsymbol{\mu}^{-}_{k} + K_k\left(\mathbf{z}_k - H\boldsymbol{\mu}^{-}_{k}\right) \quad (8)$$
$$\Sigma_k = \left(I - K_k H\right)\Sigma^{-}_{k} \quad (9)$$

where $K_k \in \mathbb{R}^{15\times 3}$ is the Kalman gain and $H \in \mathbb{R}^{3\times 15}$ is the Jacobian of the observation function, which in the specific case above acts as a selector matrix for the linear velocity substate.
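For illustration, the update of Equations 6–9 amounts to a plain EKF correction with a selector Jacobian; the following is a minimal NumPy sketch (the position of the velocity substate in the 15-dimensional error state is an assumption for this example, not a statement about the actual implementation):

```python
import numpy as np

def velocity_update(mu_prior, Sigma_prior, v_meas, P_v, vel_slice=slice(6, 9)):
    """EKF correction with a 3D velocity measurement (Equations 6-9).

    mu_prior    : (15,) predicted error-state mean
    Sigma_prior : (15, 15) predicted covariance
    v_meas      : (3,) velocity measurement z_k
    P_v         : (3, 3) measurement covariance
    vel_slice   : assumed location of the velocity substate
    """
    H = np.zeros((3, 15))
    H[:, vel_slice] = np.eye(3)               # selector matrix for the velocity substate

    S = H @ Sigma_prior @ H.T + P_v           # innovation covariance
    K = Sigma_prior @ H.T @ np.linalg.inv(S)  # Kalman gain (Eq. 7)
    mu_post = mu_prior + K @ (v_meas - H @ mu_prior)   # Eq. 8
    Sigma_post = (np.eye(15) - K @ H) @ Sigma_prior    # Eq. 9
    return mu_post, Sigma_post
```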

4.1 Inertial Process Model

The acceleration (in the presence of gravity) and angular velocity are sensed by the IMU at high frequencies in the range 0.4–1 kHz. These are affected by bias and zero-mean Gaussian noise:

$$\tilde{\boldsymbol{\omega}}^{I}_{WI} = \boldsymbol{\omega}^{I}_{WI} + \mathbf{b}^{\omega} + \boldsymbol{\eta}^{\omega} \quad (10)$$
$$\tilde{\mathbf{a}}^{I}_{WI} = \mathbf{a}^{I}_{WI} + \mathbf{b}^{a} + \boldsymbol{\eta}^{a} \quad (11)$$

These quantities are transformed into the base frame and used as inputs to the process model:

$$\mathbf{u} = \begin{bmatrix}\boldsymbol{\omega} \\ \mathbf{a}\end{bmatrix} = \begin{bmatrix} R_{IB}\left(\tilde{\boldsymbol{\omega}}^{I}_{WI} - \mathbf{b}^{\omega} - \boldsymbol{\eta}^{\omega}\right) \\ R_{IB}\left(\tilde{\mathbf{a}}^{I}_{WI} - \mathbf{b}^{a} - \boldsymbol{\eta}^{a}\right)\end{bmatrix} \quad (12)$$

where $R_{IB}$ is the rotational part of the rigid transform between the IMU and base frames. Note that we ignore the effects of angular acceleration and centripetal force (see Diebel (2006)) and assume the IMU is close enough to the robot's base to make them negligible.

The process equations are:

$$\dot{\mathbf{p}} = R\mathbf{v} \quad (13)$$
$$\dot{R} = R\boldsymbol{\omega}^{\wedge} \quad (14)$$
$$\dot{\mathbf{v}} = -\boldsymbol{\omega}^{\wedge}\mathbf{v} + R^{T}\mathbf{g} + \mathbf{a} \quad (15)$$
$$\dot{\mathbf{b}}^{a} = \boldsymbol{\eta}^{ab} \quad (16)$$
$$\dot{\mathbf{b}}^{\omega} = \boldsymbol{\eta}^{\omega b} \quad (17)$$

where $\boldsymbol{\eta}^{\omega b}, \boldsymbol{\eta}^{ab}$ are bias random walk noises.

Given Equations 13–17, we can predict the mean of the state $\mathbf{x}_k$ by simple integration over the period $\Delta t = t_k - t_{k-1}$:

$$\mathbf{u}_k = \begin{bmatrix}\boldsymbol{\omega}_k \\ \mathbf{a}_k\end{bmatrix} = \begin{bmatrix} R_{IB}\left(\tilde{\boldsymbol{\omega}}_k - \mathbf{b}^{\omega}_{k-1}\right) \\ R_{IB}\left(\tilde{\mathbf{a}}_k - \mathbf{b}^{a}_{k-1}\right)\end{bmatrix} \quad (18)$$

$$\boldsymbol{\mu}^{-}_{k} = f(\boldsymbol{\mu}_{k-1}, \mathbf{u}_k, \mathbf{0}) = \begin{bmatrix}\mathbf{p}_{k-1} \\ \mathbf{v}_{k-1} \\ \mathbf{0} \\ \mathbf{b}^{a}_{k-1} \\ \mathbf{b}^{\omega}_{k-1}\end{bmatrix} + \begin{bmatrix}\mathbf{v}_{k-1}\Delta t \\ \left(-\boldsymbol{\omega}^{\wedge}_{k}\mathbf{v}_{k-1} + R^{T}_{k-1}\mathbf{g} + \mathbf{a}_k\right)\Delta t \\ R_{k-1}\exp\left(\boldsymbol{\omega}^{\wedge}_{k}\Delta t\right) \\ \mathbf{0} \\ \mathbf{0}\end{bmatrix} \quad (19)$$

Note that the attitude is integrated separately using the exponential map between the Lie group of rotations and its Lie algebra at the identity (see Forster et al. (2017a)).

The prior covariance $\Sigma^{-}_{k}$ is also computed by Euler integration of the partial derivatives of the process equation, as detailed in Bry et al. (2015).

Having propagated the filter, measurements from other sensors can be used to correct the state vector. In the following sections, we derive measurements and their covariance matrices from leg, visual and LIDAR odometry.
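A minimal NumPy sketch of the mean prediction (Equations 18–19) is given below. It is an illustration under stated assumptions: the position is integrated through Equation 13 ($\dot{\mathbf{p}} = R\mathbf{v}$, with $\mathbf{v}$ in base coordinates), the biases are held constant, and the covariance propagation of Equation 5 is omitted.

```python
import numpy as np

def skew(w):
    """Map a 3-vector to its skew-symmetric matrix w^ (as in Eq. 14)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(phi):
    """Exponential map from so(3) to SO(3) via Rodrigues' formula."""
    angle = np.linalg.norm(phi)
    if angle < 1e-9:
        return np.eye(3) + skew(phi)
    K = skew(phi / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def predict_mean(p, R, v, b_a, b_w, a_imu, w_imu, R_IB, dt,
                 g=np.array([0.0, 0.0, -9.81])):
    """Propagate the state mean over dt (Eqs. 18-19); biases stay constant."""
    w = R_IB @ (w_imu - b_w)       # bias-compensated angular velocity in the base frame
    a = R_IB @ (a_imu - b_a)       # bias-compensated acceleration in the base frame
    p_new = p + R @ v * dt         # Euler integration of Eq. 13
    v_new = v + (-skew(w) @ v + R.T @ g + a) * dt   # Eq. 15
    R_new = R @ so3_exp(w * dt)    # attitude integrated separately via the exponential map
    return p_new, R_new, v_new, b_a, b_w
```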

4.2 Leg Odometry

Leg odometry estimates the incremental motion of the floating base of a legged robot from the forward kinematics of the legs in stable contact with the ground. This measurement can be formulated as either a relative pose or a velocity measurement. In our system, we formulate linear velocity measurements.

In the following sections, we derive this measurement specifically for humanoids and quadrupeds.

4.2.1 Humanoids

We adopt the contact classification and velocity measurement strategies from Fallon et al. (2014).

4.2.1.1 Contact Classification

Humanoid robots are typically equipped with force/torque sensors at the feet, from which the contact state can be inferred by thresholding the measured normal force. Torsional friction is assumed to be high enough for there to be no foot rotation.


We use a Schmitt trigger to classify the contact forces sensed by the robot's 3-axis foot force-torque sensors and to detect how likely a foot is to be in contact. For simplicity, only one foot is considered to be in contact during a double support phase, and a simple state machine decides which foot is more reliable.

We also classify other events in the gait cycle, such as striking contact (a 20–30 N positive and increasing discontinuity lasting more than 5 ms) and breaking contact (a negative force discontinuity below a threshold). Because these events create unrealistic measurements, the EKF integrates them with a higher measurement covariance.

Finally, we found that in some cases it is necessary to use the state of the controller to decide which contact points were in stable contact. For example, when climbing stairs the toe of the trailing foot pushes the robot upward but is not in stationary contact (a "toe off" event). In that case, we use information from the controller to assign the leading foot to be the primary fixed foot.

4.2.1.2 Measurements

Once the primary fixed foot is established, a velocity measurement is created by differentiation of the base position across the interval $\Delta t = t_k - t_{k-1}$. The foot contact locations at times $t_{k-1}$ and $t_k$ are defined as the composition of the base position in world coordinates and the foot position in base coordinates:

$$\mathbf{p}^{W}_{WK}(t_{k-1}) = \mathbf{p}_{k-1} + R_{k-1}\,\mathrm{fk}(\mathbf{q}_{k-1}) \quad (20)$$
$$\mathbf{p}^{W}_{WK}(t_k) = \mathbf{p}_{k} + R_{k}\,\mathrm{fk}(\mathbf{q}_{k}) \quad (21)$$
$$\mathbf{p}^{W}_{WK}(t_k) = \mathbf{p}^{W}_{WK}(t_{k-1}) \quad (22)$$

where $\mathrm{fk}(\cdot)$ is the forward kinematic function that returns the foot location in base coordinates and $\mathbf{q}$ are the measured joint positions.

Since the contact location in world coordinates does not change over the interval (see Equation 22), the difference in position $\mathbf{p}_k - \mathbf{p}_{k-1}$ can be inferred from the forward kinematics only, by subtracting and rearranging Equations 20–21. Finally, the discrete differentiation is then simply computed by dividing $\mathbf{p}_k - \mathbf{p}_{k-1}$ by the time interval (Equation 23):

$$\mathbf{v}_k = \frac{\mathbf{p}_k - \mathbf{p}_{k-1}}{\Delta t} + \boldsymbol{\eta}^{v} = \frac{R_{k-1}\,\mathrm{fk}(\mathbf{q}_{k-1}) - R_{k}\,\mathrm{fk}(\mathbf{q}_{k})}{\Delta t} + \boldsymbol{\eta}^{v} \quad (23)$$
$$\mathbf{z}_k = \mathbf{v}_k \quad (24)$$

where the covariance matrix $P^{v}_{k} = P^{v}$ is defined from fixed values (empirically found) which are increased when special events (striking contact, breaking contact) occur.
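A short sketch of the measurement of Equation 23, assuming a helper fk(q) that returns the primary fixed foot position in base coordinates (a hypothetical function standing in for the robot's kinematic library):

```python
import numpy as np

def leg_odometry_velocity(R_prev, q_prev, R_curr, q_curr, dt, fk):
    """Base velocity from the primary fixed foot (Eq. 23).

    R_prev, R_curr : base orientations at t_{k-1} and t_k
    q_prev, q_curr : measured joint positions at t_{k-1} and t_k
    fk             : assumed helper returning the foot position in base coordinates
    """
    # With the contact point fixed in the world (Eq. 22), the base displacement
    # p_k - p_{k-1} follows from kinematics alone (rearranging Eqs. 20-21).
    return (R_prev @ fk(q_prev) - R_curr @ fk(q_curr)) / dt
```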

4.2.2 Quadrupeds

Quadruped robots are typically equipped with high precision joint encoders from which low noise joint velocity measurements can be derived. However, a major challenge is achieving accurate contact estimation, since field-ready quadrupeds are not typically equipped with direct contact sensors as these easily break during operation.

4.2.2.1 Contact Classification

Quadruped robot feet are usually approximated as point-like and then assumed to exert only pure forces onto the ground. These forces can be estimated for each individual leg $\ell \in \{\mathrm{LF}, \mathrm{RF}, \mathrm{LH}, \mathrm{RH}\}$ using the base acceleration $\boldsymbol{\omega}, \mathbf{v}$ and the torques $\boldsymbol{\tau}$:

$$\mathbf{f}^{\ell} = -\left(J(\mathbf{q})^{T}\right)^{\dagger}\left(\boldsymbol{\tau} - \mathbf{h}_{q} - F^{T}\begin{bmatrix}\boldsymbol{\omega} \\ \mathbf{v}\end{bmatrix}\right) \quad (25)$$

where $J(\cdot)$ is the foot Jacobian, $\mathbf{h}_{q}$ are the Coriolis effects, and $F$ is the matrix of spatial forces required at the floating base to support unit accelerations about each joint variable (see Featherstone (2008)).

Let $f^{\ell}_{k} \in \mathbb{R}$ be the vertical component of $\mathbf{f}^{\ell} \in \mathbb{R}^3$ at time $t_k$. Thus, we model the probability for a particular foot to be in firm, static and stable contact with the following Sigmoid function:

$$P^{\ell}_{k}(s^{\ell}_{k} = 1 \mid f^{\ell}_{k}) = \frac{1}{1 + \exp\left(-\beta_1 f^{\ell}_{k} - \beta_0\right)} \quad (26)$$

where $s^{\ell}_{k} \in \mathbb{B}$ is a binary value that indicates contact/no-contact for foot $\ell$ at time $t_k$. We learn the Sigmoid parameters $\beta_0$ and $\beta_1$ using a logistic classifier, as described in Camurri et al. (2017).

For each leg, we determine a (binary) contact state $s^{\ell}_{k} = 1$ if $P^{\ell}_{k} > 0.5$ and $s^{\ell}_{k} = 0$ otherwise.
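The classification of Equation 26 can be sketched as follows; the parameter values shown are placeholders, not the learned values used on any of the robots:

```python
import numpy as np

def contact_probability(f_z, beta0, beta1):
    """Probability of firm contact from the vertical GRF component (Eq. 26)."""
    return 1.0 / (1.0 + np.exp(-beta1 * f_z - beta0))

def contact_states(grf_z, beta0=-4.0, beta1=0.05):
    """Binary contact state per leg: in contact if P > 0.5 (placeholder parameters)."""
    probs = {leg: contact_probability(f, beta0, beta1) for leg, f in grf_z.items()}
    states = {leg: p > 0.5 for leg, p in probs.items()}
    return probs, states

# Example: vertical GRFs (N) estimated from joint torques for each leg
probs, states = contact_states({"LF": 180.0, "RF": 5.0, "LH": 160.0, "RH": 170.0})
```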

4.2.2.2 Measurements

Having determined the set of legs in contact, for a given leg $\ell$ the robot's linear velocity can be computed as follows:

$$\mathbf{v}^{B}_{WB} = -\mathbf{v}^{B}_{BK} - \boldsymbol{\omega}^{B}_{WB} \times \mathbf{p}^{B}_{BK} \quad (27)$$

From the sensed joint positions and velocities $\tilde{\mathbf{q}}, \dot{\tilde{\mathbf{q}}}$ and their additive noises $\boldsymbol{\eta}^{q}, \boldsymbol{\eta}^{\dot{q}}$, we can rewrite Eq. (27) as a linear velocity measurement of the robot's base, computed using the leg $\ell$:

$$\mathbf{v}^{\ell} = -J(\tilde{\mathbf{q}} - \boldsymbol{\eta}^{q})\cdot\left(\dot{\tilde{\mathbf{q}}} - \boldsymbol{\eta}^{\dot{q}}\right) - \boldsymbol{\omega} \times \mathrm{fk}(\tilde{\mathbf{q}} - \boldsymbol{\eta}^{q}) \quad (28)$$

where $\mathrm{fk}(\cdot)$ and $J(\cdot)$ are the forward kinematics function and its Jacobian, respectively.

As in Bloesch et al. (2013), we collect all the effects of noise into one additive term $\boldsymbol{\eta}^{v}$:

$$\mathbf{v}^{\ell}_{k} = -J(\tilde{\mathbf{q}}_{k})\,\dot{\tilde{\mathbf{q}}}_{k} - \boldsymbol{\omega} \times \mathrm{fk}(\tilde{\mathbf{q}}_{k}) + \boldsymbol{\eta}^{v} \quad (29)$$

Since multiple legs can be in contact simultaneously, we define the velocity measurement as a weighted average over the set of legs in contact, where the weights are the contact probabilities $P^{\ell}_{k}$ from Equation 26:

$$\mathbf{v}_k = \frac{\sum_{\ell} P^{\ell}_{k}\,\mathbf{v}^{\ell}_{k}}{\sum_{\ell} P^{\ell}_{k}} + \boldsymbol{\eta}^{v} \quad \forall \ell \mid s^{\ell} \neq 0 \quad (30)$$
$$\mathbf{z}_k = \mathbf{v}_k \quad (31)$$
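Combining Equations 29–30, the fused velocity measurement can be sketched as below; fk and J are assumed helpers returning the foot position and Jacobian of a given leg in base coordinates, and omega is the bias-compensated angular velocity from the IMU:

```python
import numpy as np

def leg_velocity(leg, q, qd, omega, fk, J):
    """Base velocity measured through one contact leg (Eq. 29)."""
    return -J(leg, q) @ qd - np.cross(omega, fk(leg, q))

def fused_velocity(joints, probs, states, omega, fk, J):
    """Probability-weighted average over the legs in contact (Eq. 30).

    joints : dict leg -> (q, qd) measured joint positions and velocities
    probs  : dict leg -> contact probability P_k (Eq. 26)
    states : dict leg -> binary contact state s_k
    fk, J  : assumed per-leg forward kinematics and Jacobian helpers
    """
    num, den, v_legs = np.zeros(3), 0.0, {}
    for leg, (q, qd) in joints.items():
        if not states[leg]:
            continue                      # only legs in firm contact contribute
        v_l = leg_velocity(leg, q, qd, omega, fk, J)
        v_legs[leg] = v_l
        num += probs[leg] * v_l
        den += probs[leg]
    v_k = num / den if den > 0.0 else np.zeros(3)
    return v_k, v_legs
```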

The adaptive covariance $P^{v}_{k}$ associated with the velocity measurement accounts for harsh impact forces (up to 600 N for a 90 kg robot trotting). These forces can severely undermine the estimation performance, because compression of the legs or the ground causes wrong kinematic measurements which translate into velocity and position errors.

The covariance is computed as the combination of a fixed term (from the encoder noise datasheet), the inter-leg velocity covariance $D_k$, and a term that is proportional to force discontinuities (which are caused by impacts). For convenience, let $c_k$ be the total number of detected contact legs at time $t_k$. The inter-leg covariance is defined as the covariance matrix of the velocity contributions from the contact legs:

$$D_k = \frac{1}{c_k}\sum_{\ell}\left(\mathbf{v}_k - \mathbf{v}^{\ell}_{k}\right)\left(\mathbf{v}_k - \mathbf{v}^{\ell}_{k}\right)^{T} \simeq \begin{bmatrix}\sigma^{2}_{x} & 0 & 0 \\ 0 & \sigma^{2}_{y} & 0 \\ 0 & 0 & \sigma^{2}_{z}\end{bmatrix} = \Lambda(\sigma^{2}_{x}, \sigma^{2}_{y}, \sigma^{2}_{z}) \quad (32)$$

The force discontinuity is defined as the average absolute difference in normal force per leg:

$$\Delta f = \frac{1}{c_k}\sum_{\forall \ell}\left|f^{\ell}_{k} - f^{\ell}_{k-1}\right| \quad (33)$$

From Equations 32–33, the final covariance for the velocity measurement is:

$$P^{v}_{k} = P^{v}_{0} + \left[\frac{1}{2}\left(\Lambda(\sigma_{x}, \sigma_{y}, \sigma_{z}) + I_{3}\,\frac{\Delta f}{\alpha}\right)\right]^{2} \quad (34)$$

Figure 3. Visual odometry performance during a trotting sequence on HyQ: the robot first trots forward at 0.3 m/s and then turns in place sharply over a 5 s period. During the initial trotting phase, VO performance is satisfactory. However, image blur causes the number of inliers to fall and the mean re-projection error to spike. During this part of the experiment, no VO measurements are incorporated into the main motion estimate.

where $\alpha$ is a constant normalization factor, empirically determined.
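A sketch of the adaptive covariance of Equations 32–34 under those definitions; P_v0 and alpha are placeholders standing for the fixed datasheet term and the empirically tuned normalization constant:

```python
import numpy as np

def adaptive_velocity_covariance(v_k, v_legs, f_curr, f_prev, P_v0, alpha):
    """Adaptive covariance of the leg odometry velocity (Eqs. 32-34).

    v_k    : fused velocity measurement (3,)
    v_legs : dict leg -> per-leg velocity (only the legs in contact)
    f_curr : dict leg -> current vertical GRF
    f_prev : dict leg -> previous vertical GRF
    P_v0   : fixed covariance term from the encoder noise datasheet (3x3)
    alpha  : empirically determined normalization constant
    """
    c_k = len(v_legs)
    if c_k == 0:
        return P_v0
    # Inter-leg velocity covariance, kept as a diagonal matrix (Eq. 32)
    D = sum(np.outer(v_k - v_l, v_k - v_l) for v_l in v_legs.values()) / c_k
    Lam = np.diag(np.sqrt(np.diag(D)))        # Lambda(sigma_x, sigma_y, sigma_z)
    # Average absolute normal force discontinuity (Eq. 33)
    delta_f = sum(abs(f_curr[l] - f_prev[l]) for l in v_legs) / c_k
    # Final covariance (Eq. 34)
    M = 0.5 * (Lam + np.eye(3) * delta_f / alpha)
    return P_v0 + M @ M
```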

4.3 Zero Velocity Bias Estimation

The yaw drift due to bias evolution can be significant over long periods of time. Yaw error is also the dominant source of drift in any state estimator or SLAM system. For this reason, in Ma et al. (2016) zero velocity updates were used to measure rotation rate bias estimates.

In our system, we continually check for periods where the robot is stationary using the joint velocities and GRF.

When the robot is stationary for at least 400 ms, the gyro bias is updated to the average angular velocity recorded during the stationary period:

$$\mathbf{b}^{\omega}_{k} = \frac{1}{t_k - t_i}\sum_{j=i}^{k}\boldsymbol{\omega}_{j} \quad (35)$$
$$\mathbf{z}_k = \mathbf{b}^{\omega}_{k} \quad (36)$$

where $t_k - t_i > 400\ \mathrm{ms}$. Since the bias is generally a very small quantity (i.e., $\boldsymbol{\omega} \gg \mathbf{b}^{\omega}$), the covariance associated with the measurement can typically be set to very small values without affecting the control system of the robot.
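The zero-velocity bias update of Equations 35–36 can be sketched as below; here the bias measurement is taken as the mean of the gyro samples accumulated over the stationary window, and the stationarity flag is assumed to come from the joint velocity and GRF checks described above:

```python
import numpy as np

class GyroBiasEstimator:
    """Accumulate gyro samples while the robot is stationary (Eqs. 35-36)."""

    def __init__(self, min_duration=0.4):
        self.min_duration = min_duration   # seconds (400 ms in the paper)
        self.samples = []
        self.t_start = None

    def update(self, t, omega, is_stationary):
        """Feed one IMU sample; return a bias measurement when available."""
        if not is_stationary:
            self.samples, self.t_start = [], None
            return None
        if self.t_start is None:
            self.t_start = t
        self.samples.append(omega)
        if t - self.t_start >= self.min_duration:
            return np.mean(self.samples, axis=0)   # z_k = b_omega (Eq. 36)
        return None
```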


4.4 Visual Odometry

Visual Odometry estimates the pose of the robot by tracking features on camera images. The VO estimate frequency is typically in the range of 10–30 Hz, which corresponds to the camera frame rate.

When used in combination with LIDAR odometry, the benefit of VO is twofold. First, it makes the overall estimated trajectory smoother compared with an inertial-kinematic-LIDAR only system, as it reduces the drift rate in between two LIDAR updates. Second, the reduced drift rate also helps the LIDAR registration itself, as the sparsity of the LIDAR scans requires the accumulation of scans over time before performing the registration. Therefore, a smaller drift rate during the accumulation produces higher quality point clouds to be registered.

Our visual odometry pipeline is based on the FOVIS algorithm by Huang et al. (2011). The measurements are loosely integrated into the filter as relative pose measurements between frames. This would allow the use of other VO algorithms, such as ORB-SLAM (Mur-Artal et al. (2015)), SVO (Forster et al. (2017b)) or VINS-Mono (Qin et al. (2018)), to name a few. FOVIS was chosen because of its computational efficiency.

The only input to FOVIS is a sequence of stereo image pairs. It tracks FAST features in a key-frame approach to estimate incremental camera motion. Given two keyframes at times $t_i, t_j$, we denote the estimated relative motion of the camera between these two times as $T^{C_i}_{C_iC_j} = T^{C}_{ij}$. Using the known camera-to-base frame transformation, $T^{B}_{BC}$, this can be converted into the corresponding estimate of the motion of the base frame as:

$$T^{B}_{ij} = T^{B}_{BC}\, T^{C}_{ij}\, \left(T^{B}_{BC}\right)^{-1} \quad (37)$$

We integrate the VO estimate for a time window $t_j - t_i$ which is typically 2–3 s. When used in combination with the LIDAR module, we then form a position measurement in the world frame, as follows:

$$T^{W}_{WB}(t_j) = T^{W}_{WB}(t_i)\, T^{B}_{ij} \quad (38)$$
$$\mathbf{p}_j = \mathrm{trans}\left(T^{W}_{WB}(t_j)\right) \quad (39)$$
$$\mathbf{z}_j = \mathbf{p}_j \quad (40)$$

where the pose of the robot at time $t_i$ is taken from the filter's history of states. Note that we choose to use only the translational part of Equation 38 for the EKF filter update, as yaw corrections from the LIDAR are more accurate and sufficiently frequent.
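The chain of Equations 37–39 can be sketched with 4x4 homogeneous transforms; T_BC denotes the fixed camera-to-base extrinsic and T_WB_i the base pose looked up from the filter history at time t_i (names are illustrative):

```python
import numpy as np

def vo_position_measurement(T_WB_i, T_BC, T_C_ij):
    """World-frame position measurement from a VO relative pose (Eqs. 37-39).

    T_WB_i : base pose in the world at time t_i (from the filter history)
    T_BC   : known camera-to-base transformation (extrinsic calibration)
    T_C_ij : camera motion between keyframes t_i and t_j, estimated by VO
    All inputs are 4x4 homogeneous transformation matrices.
    """
    T_B_ij = T_BC @ T_C_ij @ np.linalg.inv(T_BC)   # Eq. 37: motion expressed in the base frame
    T_WB_j = T_WB_i @ T_B_ij                       # Eq. 38: compose with the stored pose
    p_j = T_WB_j[:3, 3]                            # Eq. 39: translational part only
    return p_j
```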

Without the LIDAR module, the VO update can also include rotational components (typically only yaw, since roll and pitch are observable from the IMU).

Note that the update could be delayed in time (i.e., $t_j < t_k$); therefore the filter will re-apply the chain of measurements from time $t_j$ to $t_k$, as explained in Section 5.1.

The covariance matrix for the measurement was manually set to fixed values. However, when the FOVIS algorithm reports failure, the measurement is discarded. The algorithm reports failure in three cases: 1) when the number of inlier features being tracked is below a threshold (10 in our case); 2) when the solution of the optimization is degenerate; 3) when the reprojection error is higher than a threshold (1.5 pixels in our case).

An example of failure is provided in Figure 3, where at time 12 s the number of inliers drops below the threshold (top plot) and the reprojection error increases significantly (middle plot) due to an abrupt robot rotation which caused motion blur (bottom plot).

4.5 LIDAR Odometry

Our LIDAR odometry is based on the Auto-tuned Iterative Closest Point (AICP) algorithm by Nobili et al. (2017b), which improves the ICP implementation from Pomerleau et al. (2013) by making it more robust against significant changes in overlap between the clouds to be registered.

The rotating Hokuyo LIDAR sensor inside the Multisense SL (mounted on Atlas, Valkyrie and HyQ), as well as the Velodyne VLP-16 (mounted on ANYmal), produce very sparse point clouds which cannot be used directly for scan-to-scan registration.

Therefore, we accumulate consecutive measurements from the sensor as a reference point cloud. The filter's state is used as the source of robot poses during the accumulation. We assume that the pose drift during the accumulation is small enough not to create significantly distorted reference point clouds. In this context, the VO module is important, as it keeps the drift bounded during the accumulation period.

Once a sufficiently dense reference cloud is obtained, a sequence of reading point clouds are accumulated and registered against the reference for motion estimation. The result of the registration constitutes an additional relative pose measurement for the EKF.


4.5.1 Reference Update

Using the first accumulated point cloud as the reference and registering the forthcoming clouds as readings is effective only in confined scenes. When the robot travels far away from its initial location, this method is intractable due to the decreasing overlap between the source and the reading clouds, eventually resulting in ICP failure.

To guarantee sufficient overlap between the reference and the reading clouds, we update the reference cloud whenever the overlap drops below a safety threshold. In long range missions, such as the one described in Section 7.5, we conveniently forced a reference update after the robot had traveled 5 m from its initial location. This way, the drift is effectively bounded while maintaining sufficient overlap for data association.

4.5.2 Pre-filtering

According to Segal et al. (2009), point-to-plane registration has proven to have superior performance to point-to-point. Therefore, we extract planar macro-features (e.g., walls, doors, ceilings) and implicitly discard all other entities (including dynamic obstacles). We also apply a voxel filter with a leaf size of 8 cm to uniformly downsample the clouds. This step is necessary to equalize the contribution from all the points during the optimization process, as point clouds are denser in the proximity of the sensor.

For planar surface extraction, we adopt a region growing strategy: a patch which is larger than a specific area (e.g., 30 cm × 30 cm) is accepted for further processing. An example of pre-filtering output is shown in Figures 4A and 4B.

4.5.3 Auto-tuned ICP

Most ICP solutions assume a constant overlap between the reference and reading clouds. However, when partial occlusion occurs (e.g., during passage through a narrow door), this assumption is violated and the massive concentration of points on the occludant (e.g., the walls beside the door) can cause wrong correspondences.

In contrast, we continuously estimate the amount of overlap between the point clouds and automatically tune the ICP inlier ratio for robust registration. The overlap parameter is proportional to the true positive correspondences (i.e., the higher $\Omega$ is, the larger the number of true positive matches, and vice versa). In the following subsection, we briefly describe how $\Omega$ is computed. More details can be found in Nobili et al. (2017b).

Figure 4. Pre-filtering and outlier filtering. (A) Raw point cloud from Valkyrie's dataset; people are outlined in red. (B) After pre-filtering: people and small irrelevant features have been filtered out. (C) Region of overlap as the result of the analysis of the two volumes of the point clouds.

4.5.4 Overlap Filter

The overlap parameter $\Omega$ is computed in a point-wise fashion (Figure 4C). Let $A^{W}, B^{W}$ be the reference and reading point clouds acquired at times $t_i, t_j$, whose points have been expressed in world coordinates using a prior from the EKF. Each cloud is confined into the volumes $V_i, V_j$ by the sensor Field-of-View (FoV). The intersection of the two volumes defines an overlap region (red in the figure). If $S_i$ and $S_j$ are the points of $A$ and $B$ belonging to the overlap region, we can define the overlap parameter $\Omega$ as:

$$\Omega = \frac{|S_i|}{|A|}\cdot\frac{|S_j|}{|B|} \quad (41)$$

where $|\cdot|$ indicates the number of points in the cloud.


We use the overlap parameter from Eq. (41) to dynamically set the inlier ratio of the ICP algorithm. If $0.2 < \Omega < 0.7$ we set the inlier ratio to $\Omega$. If $\Omega$ is below 0.2, the inlier ratio is set to 0.2, as this is the minimum required for ICP registration. Finally, if $\Omega$ exceeds 0.7, the inlier ratio is bounded to 0.7 to avoid overestimation.
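A sketch of Equation 41 and the clamping rule above; the point-in-volume test depends on the sensor FoV model and is left as an assumed helper:

```python
import numpy as np

def overlap_parameter(ref_cloud, read_cloud, in_overlap_region):
    """Overlap Omega between reference and reading clouds (Eq. 41).

    in_overlap_region : assumed helper returning a boolean mask of points
                        falling inside the intersection of the two FoV volumes
    """
    S_i = np.count_nonzero(in_overlap_region(ref_cloud))
    S_j = np.count_nonzero(in_overlap_region(read_cloud))
    return (S_i / len(ref_cloud)) * (S_j / len(read_cloud))

def inlier_ratio_from_overlap(omega):
    """Auto-tuned ICP inlier ratio, clamped to the interval [0.2, 0.7]."""
    return float(np.clip(omega, 0.2, 0.7))
```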

We follow three heuristics to determine if an alignment is successful. First, the mean residual point-wise error should be smaller than the threshold $\alpha$:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} r_i < \alpha \quad (42)$$

where $r_1, \ldots, r_n$ are the residual distances between the accepted matching points in the input clouds. Second, the median of the residual distribution, $Q(50)$, should be smaller than the threshold $\alpha$:

$$Q(50) < \alpha \quad (43)$$

Third, the quantile corresponding to the overlap measure should also be smaller than $\alpha$:

$$Q(\Omega) < \alpha \quad (44)$$

The first two conditions are commonly used metrics of robustness, while the third automatically adapts to the degree of point cloud overlap. The parameter $\alpha$ was set to 0.01 m during our experiments.
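The three checks of Equations 42–44 can be summarized as in the sketch below; here the quantile of Equation 44 is interpreted as the (100·Ω)-th percentile of the residual distribution, which is an assumption of this illustration:

```python
import numpy as np

def alignment_accepted(residuals, omega, alpha=0.01):
    """Accept an ICP alignment if all three heuristics hold (Eqs. 42-44).

    residuals : distances between accepted matching points after registration
    omega     : overlap parameter from Eq. 41 (in [0, 1])
    alpha     : residual threshold in meters (0.01 m in the experiments)
    """
    r = np.asarray(residuals)
    mse_ok = r.mean() < alpha                              # Eq. 42: mean residual
    median_ok = np.percentile(r, 50) < alpha               # Eq. 43: median residual
    overlap_q_ok = np.percentile(r, 100 * omega) < alpha   # Eq. 44: overlap quantile
    return mse_ok and median_ok and overlap_q_ok
```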

4.5.5 Measurements

Once the two clouds $A, B$ have been successfully registered, the estimate $T^{B}_{ij}$ of the robot's base relative pose between times $t_i$ and $t_j$ is available, similarly to Equation 37. Thus, the measurement is incorporated in the same way, but including rotation:

$$T^{W}_{WB}(t_j) = T^{W}_{WB}(t_i)\, T^{B}_{ij} \quad (45)$$
$$\mathbf{p}_j = \mathrm{trans}\left(T^{W}_{WB}(t_j)\right) \quad (46)$$
$$R_j = \angle\left(T^{W}_{WB}(t_j)\right) \quad (47)$$
$$\mathbf{z}_j = \begin{bmatrix}\mathbf{p}_j & R_j\end{bmatrix} \quad (48)$$

where again the absolute pose of the robot at time $t_i$ is taken from the filter and the covariance matrix is set to fixed values. Note that the time index is $j$, as the measurement is typically delayed (i.e., $t_j < t_k$).

5 IMPLEMENTATION

A block diagram of our system is presented in Figure 5. Even though the Pronto modules can all be run on a single machine, it is common practice in legged robot design to distribute the computation across two separate computers: a Control PC connected to actuators and proprioceptive sensors, running a real-time operating system; and a Vision PC for exteroceptive sensor processing. This design architecture has been adopted for all the robots evaluated in this paper. Its main advantage is that the more critical operations are unaffected by potential delays, failures or overloads caused by the resource intensive data processing from camera and LIDAR.

Figure 5. Block diagram of our system: the IMU process model and leg odometry run in a real-time process (in pink) on the control computer, while the other modules run on separate computers in the user space (light blue). These modules output filter measurements as ROS messages which are exchanged with the real-time domain through native shared memory mechanisms.

Therefore, the IMU prediction and leg odometry updates are performed within the same UNIX process running on the Control PC. After every IMU process step, the estimator immediately shares the filter state with the control system via a real-time interface based on shared memory. The same estimate is also available on the network for other modules to use (e.g., as a prior for ICP).

The FOVIS and AICP processes are run on the Vision PC. Both modules are decoupled from the core estimator, which receives the updates as timestamped messages via a TCP or UDP channel (e.g., ROS messages). This allows Pronto to perform the core IMU/leg odometry, which is more critical, and to incorporate measurements if and when they are available.

On all platforms, the computation is carried out on consumer grade processors (e.g., equivalent to a laptop-class Intel i7), with no need for GPU processing.


Figure 6. Example illustrating how VO and LIDAR measurements can be incorporated into the filter despite having much higher latency than the IMU process model. In black is the best estimate of the trajectory at that instance, in red are updates introduced by incorporated measurements, and dashed gray lines are parts of the trajectory which are recomputed. For clarity, the magnitude of the corrections is exaggerated. Elapsed time is indicated in the upward direction.

5.1 Measurement History

The implementation of the filter maintains a history of measurements (with their covariances), filter prior/posterior states, and filter covariances covering a time window of typically 10 s. This allows incorporation of asynchronous corrections from VO and LIDAR which have significant latency.

In Figure 6, we explain the concept with a toy example. In black is the best estimate of the current state and history at that moment in time. In red are discontinuities caused by EKF updates (exaggerated for clarity). In dashed gray are portions of the filter history which are overwritten due to a received measurement.

Event 1: At time $t_a$, the head of the filter points to $T_a$. This state is the result of predictions and measurement integrations available up to time $t_a$. At this same time, the filter receives a delayed LIDAR measurement with timestamp $t_c$ (with $t_c < t_a$). In particular, the measurement involves the relative pose between $T_b$ and $T_c$ (with $t_b < t_c$). The history consists of a window of measurements, filter states, and filter covariances, ordered by timestamp. Since the window is longer than the time interval $t_a - t_b$, the filter head can be moved back to $T_c$, which corresponds to the state recorded at time $t_c$.

Event 2: The LIDAR measurement is incorporated as an EKF correction resulting in the posterior estimate $T_c$. At this point, all the measurements in the history with timestamps after $t_c$ are re-applied to the filter, as if they had been received after the LIDAR measurement. As a result, the filter head at time $t_a$ becomes $T_a$. The past trajectory (dashed gray line) is therefore overwritten. The new current state $T_a$ is the same as it would have been if the LIDAR measurement had been received at time $t_c$ instead of $t_a$.

Event 3: Over the next period of time, the filter continues to propagate the head of the estimator using the IMU process model and leg odometry. At time $t_d$, a new visual odometry measurement is created which measures the relative transformation of the body frame between time $t_e$ and time $t_f$. This measurement is typically received with 150–300 ms of delay.

Event 4: We wish to use this information to correct the pose of the robot towards $T_f$, as described in Section 4.4. The key step is that this correction to the filter is carried out using the re-filtered trajectory (mentioned in Event 2). After the correction is applied, the head of the filter becomes $T_d$ and the estimator continues as normal.

The final sub-figure (on the right) shows the state of the head of the filter over the course of the example. This is the running estimate that would have been available to the controller online.
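The history mechanism can be summarized with the following simplified sketch: when a delayed measurement arrives, the filter rewinds to the stored state nearest the measurement timestamp, applies the correction, and replays the newer entries in order. This is an illustration only; the real implementation also stores covariances and the process inputs needed to re-run the prediction steps.

```python
from bisect import bisect_left

class FilterHistory:
    """Keep a time-ordered window of (timestamp, state, measurement) entries."""

    def __init__(self, window=10.0):
        self.window = window      # seconds of history to retain
        self.entries = []         # list of (t, state, measurement), sorted by t

    def append(self, t, state, measurement):
        self.entries.append((t, state, measurement))
        while self.entries and t - self.entries[0][0] > self.window:
            self.entries.pop(0)

    def apply_delayed(self, t_meas, measurement, correct, repropagate):
        """Rewind to t_meas, apply the delayed correction, replay newer entries.

        correct(state, measurement) -> state : EKF measurement update
        repropagate(state, entry)   -> state : prediction plus stored measurement update
        """
        idx = bisect_left([t for t, _, _ in self.entries], t_meas)
        if idx == 0:
            return None                           # measurement older than the history window
        state = correct(self.entries[idx - 1][1], measurement)
        for i in range(idx, len(self.entries)):
            state = repropagate(state, self.entries[i])
            t_i, _, m_i = self.entries[i]
            self.entries[i] = (t_i, state, m_i)   # overwrite the past trajectory
        return state                              # new filter head
```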

Note that the proposed framework qualifies as an odometry system, as no loop closures are performed. Therefore, typical corrections from the exteroceptive modules are in the order of a few centimeters (cf. Figure 9C top right). These discontinuities are small enough to be dealt with by the position controller acting on the robot base with appropriate gains. Bigger discontinuities, such as the ones from a SLAM system, are typically addressed by using two different reference frames for control and for global path planning (e.g., Wim Meeussen (2010)).

5.2 Software Structure

The framework presented in this paper is available to the research community as three open-source repositories:

• pronto¹: library implementations of the EKF inertial process model and the Leg Odometry modules described in Sections 4.1 and 4.2, respectively;
• fovis_ros²: ROS wrapper for the FOVIS algorithm (previously open-source but not ROS compatible);
• aicp_mapping³: implementation of the AICP algorithm described in Section 4.5.

The first repository is independent of the others and contains all the code necessary to implement a proprioceptive state estimator on a legged robot. To deploy the algorithm on a legged robot of choice, either with or without ROS, the implementation of the forward kinematics API and the creation of a dedicated executable are required. A complete example of a deployment on the ANYmal robot is also provided.

6 EXPERIMENTAL PLATFORMS

In the following sections, we describe the relevant characteristics of the experimental platforms used: the Atlas and Valkyrie humanoid robots and the HyQ and ANYmal dynamic quadruped robots. A summary of the main sensors mounted on the robots is provided in Table 1.

6.1 Atlas

Atlas (version 5, Figure 1A) is a 195 cm tall, 95 kg, 28-DoF hydraulic robot manufactured by Boston Dynamics for the DARPA Robotics Challenge. Each leg has six joints (three hip, one knee, and two ankle joints), the positions of which are estimated from the measured travel of their hydraulic actuators using a Linear Variable Differential Transformer (LVDT). Since the accuracy of these devices is limited, the joint velocities are very noisy and therefore not used directly for leg odometry (cf. Section 4.2.1). Other measurement nonlinearities such as backlash have been addressed in the same way as in Koolen et al. (2016).

¹ github.com/ori-drs/pronto
² github.com/ori-drs/fovis_ros
³ github.com/ori-drs/aicp_mapping

Located at the pelvis is a tactical KVH 1750 IMU equipped with a Fiber Optic Gyro (FOG) for accurate angular velocity measurements.

The main source of exteroceptive signals is the Carnegie Robotics Multisense SL, a tri-modal ruggedized sensor that includes: a rotating Hokuyo UTM-30LX-EW; a high quality rolling shutter RGB stereo camera with a 7 cm baseline; and an FPGA implementation of the stereo Semi-Global Matching algorithm by Hirschmuller (2008), to provide dense 3D point clouds at the nominal camera frequency. All these signals are synchronized in hardware through the FPGA. The laser produces 40 line scans per second with 30 m maximum range – while spinning about the forward-facing axis. Every few seconds, it spins half a revolution and a full 3D point cloud is accumulated with a Field-of-View (FoV) of 220° × 180°.

6.2 Valkyrie

Valkyrie (Figure 1B) is a 1.87 m tall, 129 kg, 44-DoF (28-DoF without hands) electrically actuated robot developed by NASA for the DARPA Robotics Challenge and space operations (Radford et al. (2015)). As for Atlas, each leg has 6 DoF, with 3-DoF hips, a 1-DoF knee and 2-DoF ankles. The hip and knee motors are rotary actuators whose rotation is measured by magnetic encoders and torque by measuring the spring deflection. The ankle joints are linear, with encoders located at the rotation for position and load cells on the shaft for torque measurement, respectively.

Even though the robot is equipped with several cameras for visual servoing, the main exteroceptive sensor considered in this paper is again the Multisense SL. The FoV of the LIDAR is reduced to 180° × 120° by a plastic cover over the head.

6.3 HyQ

HyQ (Figure 1C) is a torque-controlled Hydraulic Quadruped robot developed by Semini et al. (2011) at the Istituto Italiano di Tecnologia (IIT). The system is 1 m long and weighs approximately 85 kg. Its 12 revolute joints have a rotational range of 120° and a maximum torque of 160 N m. The 1 kHz sensors are read by a control computer (using a real-time operating system). All other sensors are connected to a perception computer and are passively synchronized with the real-time sensors as described in Olson (2010).

As for Atlas and Valkyrie, the robot's main exteroceptive sensor is the Carnegie Robotics Multisense SL.


| Sensor | Model | Hz | Specs |

ATLAS
| IMU | KVH 1750 | 333 | Init Bias: 0.5°/h, 0.5 mg; Bias Stab: 0.05°/h, 0.05 mg |
| Stereo Camera | Multisense SL | 10 | Res: 1024 × 1024 px; FoV: 80° × 80°; Imager: CMV4000 4MP |
| LIDAR | Hokuyo UTM-30LX-EW | 40 | FoV (full rot.): 220° × 180° |
| Encoder | N/A | 333 | Res: <0.0045° |
| Torque | N/A | 333 | Res: N/A |

VALKYRIE
| IMU | 3DM-GX4-25 | 500 | Init Bias: 0.05°/s, 2 mg; Bias Stab: 10°/h, 0.04 mg |
| Stereo Camera | Multisense SL | 10 | FoV (full rot.): 180° × 120° |
| Encoder | N/A | 500 | Res: 0.0043° |
| F/T | ATI Omega85 | 500 | Res: 0.07–0.1 N, 0.02–0.03 N m |

HYQ
| IMU | KVH 1775 | 1000 | Init Bias: 0.5°/h, 0.5 mg; Bias Stab: 0.05°/h, 0.05 mg |
| Stereo Camera | Multisense SL | 10 | See above |
| Encoder | AEDA3300-BE1 | 1000 | Res: <0.0045° |
| Force | Burster 8417 | 1000 | Res: <25 N |
| Torque | N/A | 1000 | Res: N/A |

ANYMAL
| IMU | Xsens MTi-100 | 400 | Init Bias: 0.2°/s, 5 mg; Bias Stab: 10°/h, 15 mg |
| Stereo Camera | RealSense D435 | 30 | Res: 848 × 480 px; FoV: 91.2° × 65.5°; Imager: IR global shutter |
| Encoder | ANYdrive | 400 | Res: <0.025° |
| Torque | ANYdrive | 400 | Res: <0.1 N m |

Table 1. Sensor specifications divided by robot.

The stereo camera was configured to capture 1024 × 1024 images at 10 Hz. Figure 10 shows an example of a left camera image and a depth image taken during an experiment, indicating the challenging scenarios we target.

6.4 ANYmal

ANYmal (version B, Figure 1D) is a 12-DoF electrically actuated quadruped robot initially designed by Hutter et al. (2016) at ETH Zurich and now manufactured by ANYbotics. It is 80 cm long and weighs 33 kg. Its series elastic actuators can deliver up to 40 N m of torque and provide accurate measurements of the joint position, velocity (internally computed by differentiation) and torque (by spring deflection measurement).

The robot is equipped with an Xsens MTi-100 industrial-grade IMU, a RealSense D435 camera at the front (for visual stereo odometry and local mapping) and a Velodyne VLP-16 LIDAR on the top (for localization and global mapping).

7 EXPERIMENTAL RESULTS

We carried out a series of experiments with the Atlas, Valkyrie, HyQ and ANYmal robots over the course of four years. We present summary results with a combined duration of 2 h 13 min and 1.37 km of distance traveled. A summary of the experimental results divided by dataset and robot is available in Table 2.

7.1 Evaluation Protocol

We aim to evaluate the estimation performance both quantitatively and qualitatively, with a focus on real-world scenarios and online/real-time execution.

7.1.1 Ground Truth

The experimental results presented in this section have been collected over the span of several years in a variety of different conditions and platforms. For this reason, it was not always possible to generate the ground truth poses from the same source (e.g., motion capture). The last column of Table 2 indicates the experiments where ground truth was available.

For all indoor experiments on HyQ and Valkyrie (lines 2–4, Table 2), we used a Vicon motion capture system to obtain millimeter-accurate ground truth poses at 100 Hz.

For the HyQ outdoor experiments (line 6, Table 2), we exploited situations where the robot was completely stationary to accumulate six full sweeps of LIDAR scans from different locations and reconstruct the scene in post-processing via ICP registration. Since the LIDAR was perfectly stationary, the accumulation was performed for at least two full turns (65k points per scan), and the overlap was more than 70 %, we ascribe the accuracy of the reconstruction to that of the sensor, which is 3 cm for the experimental area evaluated. We then generated a ground truth trajectory by aligning the point cloud data from the onboard LIDAR with this prior map in post-processing. Note that this trajectory is different from the one obtained during online estimation, as no prior map was involved in the online process.
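As an illustration of this post-processing step, the sketch below registers one accumulated onboard cloud against the prior map using Open3D's point-to-plane ICP. The library, file names and parameters are stand-ins; they are not the AICP-based pipeline actually used to produce the ground truth.

```python
import open3d as o3d

def align_to_prior_map(scan_file, map_file, init_pose, max_dist=0.5):
    """Register one onboard LIDAR cloud against the prior map with ICP.

    init_pose: 4x4 initial guess, e.g. the online pose estimate.
    Returns the refined 4x4 sensor pose in the map frame.
    """
    scan = o3d.io.read_point_cloud(scan_file)
    prior = o3d.io.read_point_cloud(map_file)
    # Downsample and estimate normals for point-to-plane ICP.
    scan = scan.voxel_down_sample(voxel_size=0.05)
    prior = prior.voxel_down_sample(voxel_size=0.05)
    prior.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        scan, prior, max_dist, init_pose,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation

# Ground-truth trajectory: one refined pose per accumulated onboard cloud.
# poses = [align_to_prior_map(f, "prior_map.ply", T0) for f, T0 in scans]
```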

The experiment with ANYmal (line 7, Table 2) was paired with ground truth from a Leica TS16 laser tracking system, which tracked the robot's position with millimeter accuracy using a reflective prism on the robot. The data from the laser tracker were then spatio-temporally aligned with the IMU to obtain ground truth poses via an offline batch optimization, as described in Burri et al. (2016).


Exp. | Robot | RPE [m] | VO | AICP | OL | CL | T [s] | DT [m] | A [m²] | GT
1 | Atlas | ≤0.03* | - | X | X | X | 1236 | 16 | 154 | -
2 | Valkyrie | 0.016 | - | X | X | X | 341 | 12 | 78 | X
3 | Valkyrie | 0.016 | - | X | X | X | 50 | 2.5 | 78 | X
4 | HyQ | 0.027 | X | X | X | X | 1740 | 400 | 7.5 | X
5 | HyQ | ≤0.03** | X | X | X | X | 1740 | 400 | 9 | -
6 | HyQ | 0.033 | X | X | X | X | 2640 | 300 | 100 | X
7 | ANYmal | 0.34 | X | X | - | - | 1996 | 240 | 1381 | X
  |        | 0.83 | - | - | - | - |      |     |      |

Table 2. Summary of the experiments. Legend: Exp. = Experiment number, RPE = Relative Pose Error (translational part, evaluated over a 10 m distance), VO = Visual Odometry, OL = Online, CL = Control Loop, T = Time, DT = Distance Traveled, A = Area, GT = Ground Truth. * By evaluation against the ground truth point cloud. ** By evaluation of the accuracy in returning to the initial position.

Finally, when ground truth was not available, we designed the experiments such that the estimation performance could be measured by analyzing the robot's accuracy in returning to its initial position after several forward/backward motions.
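This return-to-start metric amounts to comparing the estimated positions at the two ends of a run that physically returns to its starting point. A minimal sketch, where the array layout and the averaging window are assumptions:

```python
import numpy as np

def return_error(trajectory, window=50):
    """Drift proxy when no ground truth is available.

    trajectory: (N, 3) array of estimated base positions over the run.
    Averages the first and last `window` samples to reduce the effect of noise.
    """
    start = trajectory[:window].mean(axis=0)
    end = trajectory[-window:].mean(axis=0)
    return float(np.linalg.norm(end - start))
```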

7.1.2 Pose Estimation Performance

Since our proposed algorithm is an odometry system (i.e., no loop closures are performed), we base our quantitative analysis on the mean translational component of the Relative Pose Error (RPE), defined by Sturm et al. (2012), over a distance of 10 m. The performance for each experiment is indicated in Table 2.
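A minimal sketch of how this metric can be computed from time-aligned ground truth and estimated poses; the pose representation and the handling of the final, incomplete window are assumptions.

```python
import numpy as np

def rpe_translation(gt_poses, est_poses, distance=10.0):
    """Mean translational RPE over a fixed traveled distance.

    gt_poses, est_poses: lists of time-aligned 4x4 homogeneous poses.
    For each start index i, the end index j is the first pose with at least
    `distance` metres of ground-truth path length from i.
    """
    gt_pos = np.array([T[:3, 3] for T in gt_poses])
    step = np.linalg.norm(np.diff(gt_pos, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(step)])

    errors = []
    for i in range(len(gt_poses)):
        j = int(np.searchsorted(cum, cum[i] + distance))
        if j >= len(gt_poses):
            break  # not enough remaining path length
        gt_rel = np.linalg.inv(gt_poses[i]) @ gt_poses[j]
        est_rel = np.linalg.inv(est_poses[i]) @ est_poses[j]
        err = np.linalg.inv(gt_rel) @ est_rel
        errors.append(np.linalg.norm(err[:3, 3]))
    return float(np.mean(errors)) if errors else float("nan")
```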

7.1.3 Control Loop Performance

We have evaluated the stability of the algorithm in real conditions by running the estimator in real time on Valkyrie (to feed the footstep planner). On Atlas and HyQ, we also closed the control loop with the estimator. The control loop test implicitly evaluates the quality of the velocity estimates, which are directly used by the locomotion controllers. For ANYmal, the execution was tested offline, but at nominal speed and on a consumer-grade laptop with performance comparable to the hardware mounted on the robot. In this case, the suitability for closing the control loop was assessed by examining the smoothness of the estimated signals.

7.2 Atlas Experiments

The Atlas dataset (Table 2, Experiment 1) was collected during a run by the MIT team at the DARPA Robotics Challenge Finals (Pomona, CA, 2015).

Figure 7. AICP performance on the DRC Finals dataset with Atlas. Top: a top view of the alignment of 206 point clouds during the run; left: raw clouds with people, right: filtered clouds. Bottom left: state estimation without applying corrections, with the valve perceived in different locations by successive clouds. Bottom right: with successful localization, a consistent estimate of the affordance.

It consists of 20 min 36 s of continuous operation in a semi-structured environment of 14 m × 11 m, with walls on the right side of the robot and an open space populated by a crowd on the left. The robot walks through the test scenario along a 16 m path while passing over uneven terrain and manipulating objects (Figure 1A). Accurate maps of the environment were obtained in post-processing.

During the whole competition, Pronto (without AICP) was used to close the control loop.


Its low-drift estimation performance (evaluated to be approximately 1.67 % of the distance traveled in preliminary indoor tests) allowed the robot to successfully traverse uneven and rough terrain, although pauses for re-localization were necessary.

Later on, further offline tests were carried out after the integration of the AICP module. This time, the system performance was qualitatively evaluated from careful observation of the map after the run (Figure 7), where the estimated trajectory is close to error free (approximately 3 cm error for the full run). People have been filtered out and do not contribute to the alignment (top right). The algorithm is stable and robust enough to compute successful alignments during the entire run (with more than 14 m displacement and overlap decreasing to just 10 %), satisfying requirement 2. Under the same conditions, standard ICP algorithms fail after 400 s as they do not account for the dynamically changing point cloud overlap across the run.

7.3 Valkyrie Experiments

The state estimation framework was tested online on two different tasks: repeated walking on flat ground (Table 2, Experiment 2) and stair climbing (Table 2, Experiment 3).

7.3.1 Repeated Walk to a Target

Valkyrie walked repeatedly forward towards a fixed target identified at the beginning of the run before reversing direction. Over the course of the experiment, the error in translation never exceeds 7.5 cm and is 1.6 cm on average, whereas the estimator without LIDAR has an unbounded drift (Figure 8), mostly dominated by yaw bias (see bottom plot). This satisfies the requirements on the expected localization accuracy. Thanks to this localization performance, the robot could reach the goal target and maintain a precise pose estimate during the entire run. In contrast, using the proprioceptive state estimator the robot failed to reach the target due to odometry drift.

7.3.2 Stair Ascent

Valkyrie was placed 1 m from a staircase. The task was to walk towards it and climb up the steps. Planning was performed only once, at the start. Over the course of this 50 s experiment, the median errors in translation and rotation were comparable to Experiment 2. This level of accuracy allowed the robot to safely perform the task without needing to re-plan. In contrast, during the DRC, robots typically took a few steps at a time to climb stairs or traverse uneven terrain, pausing periodically to manually re-localize and re-plan. In this context,

Figure 8. Translational and rotational error for Experiment 2 (Valkyrie). The blue line shows the typical drift of the kinematic-inertial estimate, while the red line shows the estimate with AICP corrections.

our system was demonstrated to enable greater autonomy in task execution.

7.4 HyQ Experiments

On HyQ we performed experiments in two different scenarios. First, for Experiment 4, a repetitive trotting motion was carried out in a laboratory environment with a Vicon motion capture system for ground truth. Second, for Experiments 5 and 6, extensive testing was carried out in a poorly lit industrial area with a featureless concrete floor, as well as test ramps and rock beds (Figure 9B). The environment, the different locomotion gaits (trotting and crawling) and the uneven terrain presented a large number of challenges to our algorithms and demonstrated the importance of using redundant and heterogeneous sensing. The robot's peak velocity when trotting was about 0.5 m/s, which is approximately half of a typical human walking speed.

7.4.1 Indoor Repeated Trot to a Target

The robot was commanded to continuously trot forward and backward to reach a fixed target (a particular line in Figure 9A). Robot position and velocity estimates were used by the controller to stabilize the robot motion while tracking the desired position, as described in Barasuol et al. (2013).

Periodically, the operator updated the target so as to command the robot to trot a further 10 cm forward. The experiment continued for a total duration of 29 min. At the end of the run, the robot had covered a total distance of about 400 m and trotted forward and backward 174 times.

In Table 2, we show that the drift is below 3 cm when combining IMU, leg, visual and LIDAR odometry.


By comparison, without any exteroceptive signals the drift was more than three times higher. When independently testing the addition of VO or AICP, we noticed that incorporating VO reduces the drift rate relative to the baseline system, while adding AICP achieves drift-free localization, since AICP re-localizes against the same fixed map (the room).

To test the performance with uneven terrain, and where the reference point cloud has to be updated due to longer paths, a second series of experiments was carried out in a larger environment.

7.4.2 Outdoor Repeated Trot to a Target

An equivalent experiment was performed within a section of a 20 m × 5 m industrial area surrounded by pallets, walls and air treatment machines. The robot repeated a forward-backward motion covering a 6 m × 1.5 m area towards a target placed at a 5 m distance from its starting position (Figure 1C). The robot traveled about 400 m with a 0.5 m/s trotting gait, reaching the target 40 times without any user input at run time.

The results presented in this section show that the fully integrated state estimation system, leveraging IMU, leg odometry, VO and AICP data, produces a very low drift estimate of the robot state. However, no LIDAR reference cloud updates were triggered as the robot did not travel far from its initial location.

In the case of larger explorations, every reference update generates an accumulated error. The magnitude of this error depends on the residual error from the alignment of the new reference to the previous one. In the case of HyQ, a reference update happens once every 10–13 m of distance covered, depending on occlusions (see the sketch below). In the following section, we present statistics from experiments where multiple LIDAR reference cloud updates were made.
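For illustration, a reference update policy of this kind can be reduced to a simple trigger. The sketch below combines an overlap threshold with a distance bound; both thresholds and the overlap estimate are assumed values, not the exact AICP criterion.

```python
import numpy as np

class ReferenceUpdatePolicy:
    """Sketch of a reference point cloud update trigger: switch reference
    when the estimated overlap with the current reference drops below a
    threshold, or after a maximum distance has been traveled.
    """

    def __init__(self, min_overlap=0.5, max_distance=13.0):
        self.min_overlap = min_overlap
        self.max_distance = max_distance
        self.reference_position = None

    def should_update(self, current_position, estimated_overlap):
        pos = np.asarray(current_position, dtype=float)
        if self.reference_position is None:
            self.reference_position = pos
            return True  # first cloud becomes the reference
        dist = np.linalg.norm(pos - self.reference_position)
        if estimated_overlap < self.min_overlap or dist > self.max_distance:
            self.reference_position = pos
            return True
        return False
```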

7.4.3 Outdoor Industrial Area Exploration

The robot explored the same industrial area described in the previous section. To test the system in different conditions, in some experiments we added rough terrain and ramps (Figure 9B), with both crawling and trotting gaits at up to 0.5 m/s. Turning in place (as seen in Figure 3) represented an extra challenge for the state estimation system. Lighting conditions varied dramatically during data recording, from bright light to strong shadows and from day to night-time. In some experiments, on-board lighting was used. The dataset is summarized in Table 3 and consists of five runs, for a total duration of 44 min and 300 m traveled.

N | Gait | Duration | Area [m], motion | Laser | Ramp
6a | crawl | 869 s | 20×5, F/B | 5 RPM | X
6b | crawl | 675 s | 20×5, F | 5 RPM | X
6c | trot | 313 s | 20×5, F/B | 15 RPM | X
6d | trot | 330 s | 20×5, F/B | 10 RPM | X
6e | trot | 469 s | 7×5, F/B | 10 RPM | X

Table 3. Detailed summary of the dataset used for Experiment 6, including log duration, size of arena, type of motion (F/B = forward/backward trajectory), laser spin rate, and terrain features.

No motion capture system was available in this space: to quantitatively evaluate the state estimation performance on the dataset, we built a prior map made up of a collection of 4 carefully aligned point clouds and we estimated drift relative to it.

7.4.3.1 Crawling Gait

In the previous section, we have shown (while trotting) that integrating VO reduces the pose drift rate between the lower frequency AICP corrections. Here, we focus on the importance of using VO in addition to AICP.

Figure 11 shows the estimated error over the course of Experiment 6a, recorded in the arena of Figure 9. The robot started from pose A, reached B and returned to A. The robot crawled for 40 m and paused to make 3 sharp turns. The experiment was conducted at night using the on-board LED lights.

During this run, the reference point cloud was updated 4 times. After 860 s, the state estimation performance had not significantly degraded, despite no specific global loop closure being computed.

7.4.3.2 Trotting Gait

As mentioned previously, trotting is a more dynamic gait with a higher proprioceptive drift rate, which means that VO can contribute more when combined with AICP. Empirically, this can be seen in the inset plot of Figure 9. In this case, the algorithm with VO produces a smoother trajectory (in green) than without (in yellow). This is important because the robot's base controller uses these estimates to close position and velocity control loops. Discontinuities in the velocity estimate could lead to undesired destabilizing foot forces and controller reactions.

In brief, for Experiments 6c–6e the integration of AICP allowed state estimation with an average 3D median translation error of approximately 4.9 cm. The integration of VO further reduced the median translation error to 3.2 cm (Figure 11).


Figure 9. (A) Indoor repeatability tests. (B) Outdoor exploration tests in challenging scenarios. (C) Comparison between estimated trajectories of HyQ from Experiment 6d: IMU and Leg Odometry (cyan); IMU, Leg Odometry, VO (magenta); IMU, Leg Odometry (LO), AICP (yellow); IMU, LO, VO, AICP (green). Note the IMU-LO-VO-AICP trajectory is smoother than the combination without VO (inset).

The RPE over 10 m is in line with the indoor experiments.

Figure 10. Example of a left camera image and depth image produced by HyQ's stereo camera. This reflects the difficult lighting conditions and challenging structure of the test arena. The scene is illuminated with the sensor's on-board lights.

Figure 11. Estimated error of the state estimator used in Experiment 6a. The experiment involved the robot crawling for a total of 40 m.

7.5 ANYmal Experiments

The ANYmal dataset was collected at the Fire Service College, a 32.5 m × 42.5 m industrial oil rig facility used for firefighter training (Figure 12). The ground truth was collected with a laser tracking system, a Leica TS16, which tracked the robot's position with mm accuracy using a reflective prism on the robot.

The robot started from an open area and was commanded to trot at 0.3 m/s inside the facility, between metal containers and stairs, performing three loops before returning to the initial position, for a total of 240 m covered in 33 min. The dataset includes several extra challenges in addition to the ones of the previous section: 1) the area covered is much wider, therefore we had to trigger forced AICP reference updates on a regular basis (i.e., once every 0.5 m traveled); 2) the scene includes open areas where the robot looks at the horizon, where a very limited number of stereo features are available; 3) the scene contains reflections due to water puddles which confuse the visual feature tracking.

The different level of performance compared to previous tests is due to several factors related to the scenario used. In contrast with the previous experiments, the open space and the size of the area covered force the triggering of frequent reference updates (more than 40 updates vs. 4 updates in Experiment 6 on HyQ). As no loop closures are performed, in this situation the LIDAR cannot completely eliminate the drift accumulated when new reference updates are triggered. In addition to that, the Velodyne scans are much sparser due to the wider scenario (only a few LIDAR rings are projected onto the ground), making it hard to constrain the robot position along the z-axis.


Figure 12. (A) Experiment 7 with the ANYmal robot at the Fire Service College. (B) Onboard camera feed during the experiment. Note the challenging conditions for visual odometry due to reflections in the water. (C) Estimated trajectory from Pronto with FOVIS and AICP active (blue) against ground truth (red). The ground truth was not available in the bottom area. Start and end locations for both algorithms are highlighted with a circle and a cross, respectively.

We have partially compensated for this problem by augmenting the LIDAR data with the filtered output of a downward-facing RealSense D435.

Despite these challenges, the system is able to effectively fuse all the sensor modalities, achieving an RPE of 34 cm over 10 m, which corresponds to a 3.4 % error. The contribution of the LIDAR localization is particularly evident on the z-axis, where it significantly reduces the characteristic vertical drift caused by leg/ground compression while trotting. This allowed the RPE to be reduced by 60 % compared to the baseline algorithm with IMU and leg odometry only. After 240 m traveled, the pose estimate is less than 30 cm away from the ground truth (cf. the estimate on the xy-plane in Figure 12C).

8 DISCUSSION

In the previous section we demonstrated the ability of our system to overcome a variety of perception challenges, including low-light conditions, motion blur, reflections, dynamic motions and rough terrain. We also showed its versatility by supporting a variety of sensor modalities and four different legged robots.

The simple but effective integration of delayed signals into the time history described in Section 5.1 allowed us to integrate two different odometry sources (visual and LIDAR) despite their significant delay and different frequencies.
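This mechanism can be sketched as a short state history that is rewound and replayed whenever a delayed correction arrives. The code below is a simplified illustration of the idea, not the exact implementation of Section 5.1; the callback signatures and the buffering convention are assumptions.

```python
from collections import deque

class DelayedUpdateBuffer:
    """Fold delayed, low-rate corrections into a high-rate filter by keeping
    a short history of filter states and the inputs that produced them.
    Each entry stores (t, state, u), where u is the input (e.g. an IMU
    sample) applied over the interval ending at t.
    """

    def __init__(self, horizon_s=1.0):
        self.history = deque()
        self.horizon = horizon_s

    def push(self, t, state, u):
        self.history.append((t, state, u))
        while self.history and t - self.history[0][0] > self.horizon:
            self.history.popleft()

    def apply_delayed(self, t_meas, measurement, update_fn, propagate_fn):
        # Find the first stored state at or after the measurement time.
        idx = next((i for i, (t, _, _) in enumerate(self.history)
                    if t >= t_meas), None)
        if idx is None:
            return None  # older than the stored horizon: drop it
        t, state, u = self.history[idx]
        state = update_fn(state, measurement)
        self.history[idx] = (t, state, u)
        # Re-propagate with the buffered inputs up to the newest state.
        for k in range(idx + 1, len(self.history)):
            t_next, _, u_next = self.history[k]
            state = propagate_fn(state, u_next, t_next - t)
            self.history[k] = (t_next, state, u_next)
            t = t_next
        return state
```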

A limitation of the current approach is the lack of measurement update triaging in case of disagreement between different exteroceptive sources. Currently, when an exteroceptive module does not report failure, its confidence in the measurement is encoded only by a fixed covariance matrix. A possible approach is to implement a mechanism that maps the error metrics specific to a module (e.g., VO reprojection error, ICP registration error) into a dynamically changing covariance matrix.
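As an illustration of such a mechanism, the sketch below inflates a nominal measurement covariance as a module-specific residual grows. The quadratic mapping, thresholds and cap are assumptions; no mapping of this kind is currently implemented in Pronto.

```python
import numpy as np

def dynamic_covariance(base_cov, residual, residual_nominal=0.02,
                       max_scale=100.0):
    """Hypothetical mapping from a module-specific error metric (e.g. the
    mean ICP point-to-plane residual, in metres) to a measurement covariance:
    inflate the nominal covariance quadratically as the residual grows beyond
    its nominal value, and cap the inflation factor.
    """
    scale = min((residual / residual_nominal) ** 2, max_scale)
    return base_cov * max(scale, 1.0)

# Example: 3x3 position covariance inflated for a poor alignment.
nominal = np.diag([0.01, 0.01, 0.02]) ** 2
cov = dynamic_covariance(nominal, residual=0.08)
```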

Alternatively, transitioning from loosely to tightly coupled approaches would allow joint optimization over all the measurements, making the estimation more robust against outlier updates. This is ongoing work.

9 CONCLUSION

We have presented a state estimation framework to perform sensor fusion of inertial, kinematic, visual and LIDAR sensing on legged robots, built upon a modular Extended Kalman Filter.

In particular, we indicated how our approach supports dynamic maneuvers and operation in sensor-impoverished situations. The reliability of our approach was demonstrated with dynamic gaits and speeds up to 0.5 m/s. A particular technical achievement has been reliably closing the loop with this state estimator during dynamic gaits.

During experiments lasting over two hours, our system was demonstrated to be robust and continuously accurate.


The RPE was less than 35 cm over 10 m traveled for the most challenging scenario and 2–3 cm in smaller areas.

Our current filter marginalizes out previous state variables. In future work we will explore using windowed smoothing to incorporate measurements relative to previous filter states. We are also interested in extending the state with dynamic quantities such as CoM and linear/angular momenta, similarly to Xinjilefu et al. (2014).

CONFLICT OF INTEREST STATEMENT

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTHOR CONTRIBUTIONS

MC wrote the main article, implemented the quadruped leg odometry and the overall filter architecture, and ran the experiments on HyQ. MR performed LIDAR localization experiments on the ANYmal robot. SN implemented the LIDAR localization algorithm and performed the localization experiments on Atlas, Valkyrie, and HyQ. MF implemented the bipedal leg odometry, and revised and approved the manuscript.

FUNDING

This research has been conducted as part of the ANYmal research community. It was part funded by the EU H2020 projects THING and MEMMO, the Innovate UK-funded ORCA Robotics Hub (EP/R026173/1) and a Royal Society University Research Fellowship (Fallon).

ACKNOWLEDGMENTS

The authors would like to thank the Dynamic Robot Systems Group (University of Oxford), the MIT DARPA Robotics Challenge team, the University of Edinburgh, and the Dynamic Legged Systems Lab (IIT) for support with experimental trials. The authors also thank the organizers of the RSS and ICRA conferences (2017 editions) for the opportunity to present the authors' prior work.

REFERENCES
Ahn, S., Yoon, S., Hyung, S., Kwak, N., and Roh, K. S. (2012). On-board odometry estimation for 3D vision-based SLAM of humanoid robot. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE), 4006–4012
Barasuol, V., Buchli, J., Semini, C., Frigerio, M., De Pieri, E. R., and Caldwell, D. G. (2013). A Reactive Controller Framework for Quadrupedal Locomotion on Challenging Terrain. In IEEE Intl. Conf. on Robotics and Automation (ICRA) (Karlsruhe, Germany)
Bloesch, M., Burri, M., Sommer, H., Siegwart, R., and Hutter, M. (2018). The two-state implicit filter recursive estimation for mobile robots. IEEE Robotics and Automation Letters 3, 573–580. doi:10.1109/LRA.2017.2776340
Bloesch, M., Gehring, C., Fankhauser, P., Hutter, M., Hoepflinger, M. A., and Siegwart, R. (2013). State estimation for legged robots on unstable and slippery terrain. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 6058–6064. doi:10.1109/IROS.2013.6697236
Bloesch, M., Hutter, M., Hoepflinger, M., Leutenegger, S., Gehring, C., Remy, C. D., et al. (2012). State estimation for legged robots - consistent fusion of leg kinematics and IMU. In Proc. of Robotics: Science and Systems (RSS)
Bry, A., Bachrach, A., and Roy, N. (2012). State estimation for aggressive flight in GPS-denied environments using onboard sensing. In IEEE Intl. Conf. on Robotics and Automation (ICRA). 1–8
Bry, A., Richter, C., Bachrach, A., and Roy, N. (2015). Aggressive flight of fixed-wing and quadrotor aircraft in dense indoor environments. The International Journal of Robotics Research 34, 969–1002. doi:10.1177/0278364914558129
Burri, M., Nikolic, J., Gohl, P., Schneider, T., Rehder, J., Omari, S., et al. (2016). The EuRoC micro aerial vehicle datasets. The International Journal of Robotics Research. doi:10.1177/0278364915620033
Camurri, M., Fallon, M., Bazeille, S., Radulescu, A., Barasuol, V., Caldwell, D. G., et al. (2017). Probabilistic Contact Estimation and Impact Detection for State Estimation of Quadruped Robots. IEEE Robotics and Automation Letters 2, 1023–1030
Chilian, A., Hirschmuller, H., and Gorner, M. (2011). Multi-sensor data fusion for robust pose estimation of a six-legged walking robot. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS) (San Francisco, California)
Diebel, J. (2006). Representing attitude: Euler angles, unit quaternions, and rotation vectors (Stanford University, California)
Fallon, M. F., Antone, M., Roy, N., and Teller, S. (2014). Drift-free humanoid state estimation fusing kinematic, inertial and LIDAR sensing. In IEEE/RSJ Int. Conf. on Humanoid Robots (Madrid, Spain)
Featherstone, R. (2008). Rigid Body Dynamics Algorithms (Springer)
Forster, C., Carlone, L., Dellaert, F., and Scaramuzza, D. (2017a). On-Manifold Preintegration for Real-Time Visual–Inertial Odometry. IEEE Transactions on Robotics 33, 1–21. doi:10.1109/TRO.2016.2597321
Forster, C., Zhang, Z., Gassner, M., Werlberger, M., and Scaramuzza, D. (2017b). SVO: Semidirect visual odometry for monocular and multicamera systems. IEEE Transactions on Robotics 33, 249–265. doi:10.1109/TRO.2016.2623335
Furgale, P., Rehder, J., and Siegwart, R. (2013). Unified temporal and spatial calibration for multi-sensor systems. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. 1280–1286. doi:10.1109/IROS.2013.6696514
Hartley, R., Jadidi, M. G., Gan, L., Huang, J., Grizzle, J. W., and Eustice, R. M. (2018a). Hybrid contact preintegration for visual-inertial-contact state estimation using factor graphs. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 3783–3790
Hartley, R., Mangelson, J., Gan, L., Ghaffari Jadidi, M., Walls, J. M., Eustice, R. M., et al. (2018b). Legged robot state-estimation through combined forward kinematic and preintegrated contact factors. In 2018 IEEE International Conference on Robotics and Automation (ICRA). 4422–4429. doi:10.1109/ICRA.2018.8460748
Hirschmuller, H. (2008). Stereo processing by semi-global matching and mutual information. IEEE Trans. Pattern Anal. Machine Intell. 30, 328–341
Hornung, A., Wurm, K. M., and Bennewitz, M. (2010). Humanoid robot localization in complex indoor environments. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IEEE), 1690–1695
Huang, A. S., Bachrach, A., Henry, P., Krainin, M., Maturana, D., Fox, D., et al. (2011). Visual odometry and mapping for autonomous flight using an RGB-D camera. Proceedings of the International Symposium on Robotics Research (ISRR)
Hutter, M., Gehring, C., Jud, D., Lauber, A., Bellicoso, C. D., Tsounis, V., et al. (2016). ANYmal - a highly mobile and dynamic quadrupedal robot. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 38–44. doi:10.1109/IROS.2016.7758092
Jenelten, F., Hwangbo, J., Tresoldi, F., Bellicoso, C. D., and Hutter, M. (2019). Dynamic locomotion on slippery ground. IEEE Robotics and Automation Letters 4, 4170–4176. doi:10.1109/LRA.2019.2931284
Koolen, T., Bertrand, S., Thomas, G., de Boer, T., Wu, T., Smith, J., et al. (2016). Design of a momentum-based control framework and application to the humanoid robot Atlas. International Journal of Humanoid Robotics 13
Lynen, S., Achtelik, M., Weiss, S., Chli, M., and Siegwart, R. (2013). A robust and modular multi-sensor fusion approach applied to MAV navigation. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS) (Tokyo, Japan)
Ma, J., Bajracharya, M., Susca, S., Matthies, L., and Malchano, M. (2016). Real-time pose estimation of a dynamic quadruped in GPS-denied environments for 24-hour operation. Intl. J. of Robotics Research 35, 631–653
Mur-Artal, R., Montiel, J. M. M., and Tardos, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robotics 31, 1147–1163
Nobili, S., Camurri, M., Barasuol, V., Focchi, M., Caldwell, D., Semini, C., et al. (2017a). Heterogeneous sensor fusion for accurate state estimation of dynamic legged robots. In Robotics: Science and Systems (RSS)
Nobili, S., Scona, R., Caravagna, M., and Fallon, M. (2017b). Overlap-based ICP tuning for robust localization of a humanoid robot. In IEEE Intl. Conf. on Robotics and Automation (ICRA). 4721–4728
Olson, E. (2010). A passive solution to the sensor synchronization problem. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Taipei, Taiwan)
Pomerleau, F., Colas, F., Siegwart, R., and Magnenat, S. (2013). Comparing ICP variants on real-world data sets. Autonomous Robots 34, 133–148
Qin, T., Li, P., and Shen, S. (2018). VINS-Mono: A robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics 34, 1004–1020. doi:10.1109/TRO.2018.2853729
Radford, N. A., Strawser, P., Hambuchen, K., Mehling, J. S., Verdeyen, W. K., Donnan, A. S., et al. (2015). Valkyrie: NASA's first bipedal humanoid robot. Journal of Field Robotics 32, 397–419. doi:10.1002/rob.21560
Reinke, A., Camurri, M., and Semini, C. (2019). A factor graph approach to multi-camera extrinsic calibration on legged robots. In 2019 Third IEEE International Conference on Robotic Computing (IRC). 391–394. doi:10.1109/IRC.2019.00071
Rotella, N., Bloesch, M., Righetti, L., and Schaal, S. (2014). State estimation for a humanoid robot. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 952–958
Segal, A., Haehnel, D., and Thrun, S. (2009). Generalized-ICP. In Robotics: Science and Systems (Seattle, WA), vol. 2, 435
Semini, C., Tsagarakis, N. G., Guglielmino, E., Focchi, M., Cannella, F., and Caldwell, D. G. (2011). Design of HyQ – A hydraulically and electrically actuated quadruped robot. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 225, 831–849
Shen, S., Mulgaonkar, Y., Michael, N., and Kumar, V. (2014). Multi-sensor fusion for robust autonomous flight in indoor and outdoor environments with a rotorcraft MAV. In IEEE Intl. Conf. on Robotics and Automation (ICRA) (Hong Kong)
Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A Benchmark for the Evaluation of RGB-D SLAM systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. 573–580. doi:10.1109/IROS.2012.6385773
Meeussen, W. (2010). Coordinate Frames for Mobile Platforms. Tech. Rep. REP-105. https://www.ros.org/reps/rep-0105.html
Wisth, D., Camurri, M., and Fallon, M. (2019). Preintegrated velocity bias estimation to overcome contact nonlinearities in legged robot odometry. arXiv, 1910.09875
Wisth, D., Camurri, M., and Fallon, M. (2019). Robust legged robot state estimation using factor graph optimization. IEEE Robotics and Automation Letters 4, 4507–4514. doi:10.1109/LRA.2019.2933768
Xinjilefu, X., Feng, S., and Atkeson, C. G. (2014). Dynamic state estimation using quadratic programming. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 989–994. doi:10.1109/IROS.2014.6942679
Xinjilefu, X., Feng, S., and Atkeson, C. G. (2015). Center of mass estimator for humanoids and its application in modelling error compensation, fall detection and prevention. In Humanoids, IEEE-RAS International Conference on Humanoid Robots

