
Angles-Only Orbit Determination Using Hamiltonian Monte Carlo

Lauren G. Schlenker∗, Richard Linares†

University of Minnesota, Minnesota, 55455, USA

Andrew J. Sinclair‡

Air Force Research Laboratory, Kirtland AFB, New Mexico, 87117, USA

The lack of observability inherent in the linearized dynamics model for angles-only relative navigation between two satellites in close proximity has been well established by numerous studies, showing that an infinite set of possible relative orbits satisfy the observations. This work seeks a probabilistic method of angles-only orbit determination and to study this problem using the full nonlinear formulation. The lack of range observability in the problem makes Gaussian approximations a poor representation of the solution probability density, and motivates higher fidelity approaches than typical Markov Chain Monte Carlo approaches for probability distribution sampling. In order to achieve this, Hamiltonian Monte Carlo sampling of a theoretical probability distribution of possible solutions is explored, which is known to be more successful for high dimensional problems. The technique is performed on several angles-only measurement cases with increasingly difficult observability, including close proximity and coplanar cases. It is observed that when tuned correctly, Hamiltonian Monte Carlo sampling can successfully resolve the probability distributions of the possible deputy states, showing increasingly non-Gaussian behavior as observability is limited. Additionally, Hamiltonian Monte Carlo achieves this much more efficiently than traditional Markov Chain Monte Carlo techniques.

I. Introduction

In an ideal relative orbit determination situation, a primary “chief” satellite (whose orbit is generally taken to be precisely known) may utilize a combination of range and range-rate sensors and line-of-sight and line-of-sight rate sensors in order to estimate the relative position and velocity of a secondary “deputy” satellite. However, in many applications it is desirable to perform the orbit determination using only line-of-sight measurements from cameras due to their low size, weight, and power requirements. Unfortunately, it has been well-established that line-of-sight angles-only measurements from cameras fail to produce a unique observable system when linearized dynamics such as the Clohessy-Wiltshire equations are used, as demonstrated by Woffinden and Geller.1 In particular, use of a linearized model for the relative orbital dynamics produces an infinite set of possible solutions for the deputy orbit, each satisfying the line-of-sight vectors produced by camera observations.

In theory, use of a nonlinear model may produce better observability due to the presence of additional nonlinear terms that reveal subtleties of the underlying dynamics when incorporated into acceleration parameters.1 In reality, the close-proximity cases that are generally relevant to chief-deputy scenarios result in the nonlinear and linear models looking very similar (i.e., any function looks like a line at small enough distances), thus the nonlinear terms are difficult to observe.2

Previous work by Patel et al. has reformulated the problem of observability into one of estimating a new basis vector for the relative trajectory of the deputy, allowing for insight into various characteristics of the deputy orbit.2 Though this method does provide some information about details of the orbit such as proportions between the amplitude of the periodic components of the motion and the drift rate, the range to the deputy is still an undetermined parameter.

∗Graduate Student, Aerospace Engineering and Mechanics. AIAA Student Member. email: [email protected].
†Assistant Professor, Aerospace Engineering and Mechanics. Senior AIAA Member. email: [email protected].
‡Senior Aerospace Engineer, Space Vehicles Directorate. AIAA Associate Fellow. email: [email protected].

American Institute of Aeronautics and Astronautics

One of the core criteria established by Woffinden and Geller is that unobservability occurs when no thrusting maneuvers are used to alter the trajectory of the chief; conversely, if certain maneuver profiles are utilized in a predictable and strategic way, the range to the deputy may be estimated based on the change in angle measurements before and after the maneuver, which must be functions of range.1 This method of producing observability through maneuvering strategies has been previously explored. Work by Hebert et al. demonstrated that use of a strategic maneuver can produce observability; however, the accuracy of this method is heavily impacted by errors in measurement and the accuracy of the linear model utilized.3 This can be a particular problem if the orbit of the chief is not assumed to be circular (as initially required by the Clohessy-Wiltshire equations) as in the virtual-chief elliptic model described by Sherrill et al., which was demonstrated to produce additional model error.4

In addition to the issue of error-driven accuracy, certain maneuvers do not produce full observability due to resulting singular measurement equations. These singular maneuvers can be predicted and avoided through the use of successive maneuver schemes, as demonstrated by Hebert et al.5 Though this method may be reasonably successful in potentially producing observability, maneuvering is costly, especially multiple maneuvers, offsetting any potential cost saved by omitting range sensors. Therefore, it is beneficial to develop a maneuverless method of producing observability.

Rather than attempting to directly estimate the range from the chief to the deputy, it may be logical to first obtain a probability distribution of possible states for the deputy, then use this distribution to guide further measurements in order to obtain state estimates. Approaching orbit determination from a statistical standpoint in this way, though uncommon, has been successfully demonstrated, first by Muinonen and Bowell in 1993 for asteroid orbit determination and more recently by Schneider for orbital debris tracking.6, 7

The lack of observability inherent in the close proximity angles-only problem poses challenges in characterizing the probability distribution of orbit-determination solutions. Typical Gaussian representations may provide poor approximation of the probability distribution. However, accurate characterization of the probability distribution is important for high-fidelity solution estimation and sensor tasking for follow up observations. This paper focuses on enabling this sort of technique by applying Hamiltonian Monte Carlo sampling, a potentially more efficient method of obtaining probability distributions for the 6-dimensional state of a deputy satellite.

One of the most general and widely used approaches for Bayesian inversion is the Markov Chain Monte Carlo (MCMC) algorithm, which originated in 1953 and has been applied to many problems ever since.8 MCMC algorithms produce samples from the posterior probability distribution function using relatively simple proposal and rejection rules. The first and most popular MCMC algorithm is the Metropolis-Hastings (MH) algorithm, which has seen widespread use, particularly in solving complicated high-dimensional statistical problems.8, 9 The MH method has recently been applied to initial orbit determination in a handful of situations, including cataloging objects in geostationary orbit10 as well as tracking asteroids from Gaia data.11 MH was also recently applied to angles-only initial orbit determination using an unscented transform to account for the uncertainty in the Gauss angles-only method.12 Though proven to be useful, the primary drawback of MH is that a number of parameters require tuning in order to obtain useful results. Tuning these parameters is not trivial, and though a number of adaptive approaches have been developed to estimate them and improve the overall accuracy and efficiency of the MH approach by automatically learning improved parameters based on previous chain values, the tuning process is still tedious.13

II. Bayesian Orbit Determination

The orbit determination problem involves determining a conditional estimate of the orbital state parameters, $x_k$, given observations, $y_k$ for $k = 1, \dots, m$. The dynamics of the state and observation equations can be written in general as,

$$x_{k+1} = f(x_k, t_k) + w_k \tag{1a}$$

$$y_k = h(x_k, t_k) + v_k \tag{1b}$$

where the dynamics are continuous but can be expressed in discrete time form, and the observations are discrete at times $t_k$. The vector $x_k$ contains the state parameters of the satellite, $f$ is the nonlinear dynamics function, and $w_k$ is the process noise term. The process noise represents the error in the dynamic model and is usually assumed to be a zero-mean white-noise random variable with $E\{w_k\} = 0$, $E\{w_k w_j^T\} = 0$ for $k \neq j$, and $E\{w_k w_k^T\} = Q_k$. The discrete time model in Eq. (1a) provides the transition probability distribution function for the state, which is given by $p(x_{k+1}|x_k) = \mathcal{N}(x_{k+1}; f(x_k, t_k), Q)$, and Eq. (1b) provides a likelihood probability distribution function $p(y_k|x_k) = \mathcal{N}(y_k; h(x_k, t_k), R)$. Between measurements the state probability distribution can be propagated using the Chapman-Kolmogorov (CK) equation:

$$p(x_{k+1}|Y_k) = \int p(x_{k+1}|x_k)\, p(x_k|Y_k)\, dx_k \tag{2}$$

where $Y_k$ includes all measurements up to time step $k$ and is given by $Y_k = [y_0, y_1, \dots, y_k]$. Bayesian inference is then based on the posterior distribution, $p(x_0|y)$. Bayes’ Rule14 relates the prior and likelihood distributions to the posterior distribution:

$$p(x_k|Y_k) = \frac{p(y_k|x_k)\, p(x_k|Y_{k-1})}{\int p(y_k|x_k)\, p(x_k|Y_{k-1})\, dx_k} \tag{3}$$

where the denominator is a normalization constant to ensure that $p(x_k|Y_k)$ is a proper probability distribution function. Equations (2) and (3) provide a recursion for solving for the posterior probability distribution function of the orbital state given observations. This recursion is a complete solution for a probabilistic description of the orbital state. However, in most cases Eq. (3) is difficult to use since the normalization constant is an $n$-dimensional integral that is not tractable in general. Additionally, Eq. (2) describes the propagation of the state probability distribution function between measurements, and the integral in this equation may be difficult to solve. This process can be simplified by assuming a deterministic model in Eq. (1a), which removes the need for using the CK equation to determine the posterior probability distribution function. Additionally, the dynamics in Eq. (1a) can be expressed in terms of the initial condition with the solution flow function as

$$x_k = \phi(q, t_k) \tag{4}$$

where $q = x_0$ is the initial condition and $\phi(q, t_k)$ is the solution flow that propagates the initial condition to time $t_k$. Note that here we use the initial state, but the state at any time of interest can be used since the dynamical system satisfies the Markov property. The orbit determination problem can then be expressed in terms of estimating the state parameters $q$. The measurement equation can be written in terms of the state parameters as

$$y_k = h(\phi(q, t_k), t_k) + v_k \tag{5}$$

Then the posterior probability distribution function for the state vector is given by Bayes’ rule:

$$p(q|y) = \frac{p(y|q)\, p(q)}{\int p(y|q)\, p(q)\, dq} \tag{6}$$

where $y = [y_0, y_1, \dots, y_m]$ includes all observations. The probability distribution function $p(q)$ is the a priori or prior distribution and represents prior knowledge on the initial condition vector. For the orbit determination problem there may not be any prior; however, the prior can be used as a regularization function. Furthermore, under the assumption that the observations are independent and identically distributed, with $y_k \sim p(y_k|q)$, the likelihood function can be expressed as

$$p(y|q) = \prod_{k=0}^{m} p(y_k|q) \tag{7}$$

Then the posterior of the initial condition can be solved for using Eqs. (6) and (7). However, under this formulation the normalization constant is still required, which makes the computation of the posterior difficult for multidimensional problems.

The MCMC approach provides a simple method for sampling values from a distribution while only using information from the unnormalized posterior. The unnormalized posterior probability distribution function used in the MCMC approach is given by

$$\pi(q|y) \propto p(y|q)\, p(q) \tag{8}$$



where $\pi(q|y)$ represents the unnormalized posterior density. Under the assumption that a functional form for the prior and the likelihood are given, $\pi(q|y)$ can be easily computed. Typically, the likelihood is assumed to be a Gaussian probability distribution function, and Eq. (5) makes this assumption. The goal of MCMC methods is to sample from $p(q|y)$ using $\pi(q|y)$, producing a Markov chain $\{q_1, q_2, \dots, q_N\}$ whose elements are samples from $p(q|y)$. Although MCMC methods do not produce an analytical expression of the posterior distribution, the chain can be used to estimate moments and all other relevant statistics of the conditional estimate for $q$.
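As a minimal sketch of how Eqs. (5), (7), and (8) combine, the snippet below evaluates the logarithm of the unnormalized posterior under Gaussian measurement noise and an optional prior. The names `log_unnormalized_posterior` and `predict`, and the toy linear measurement model standing in for $h(\phi(q, t_k), t_k)$, are illustrative assumptions and not part of the formulation above.

```python
import numpy as np

def log_unnormalized_posterior(q, times, y_obs, predict, R, log_prior=None):
    """Sum of Gaussian log-likelihood terms plus log prior, up to an additive constant."""
    R_inv = np.linalg.inv(R)
    total = 0.0
    for t_k, y_k in zip(times, y_obs):
        resid = y_k - predict(q, t_k)          # y_k - h(phi(q, t_k), t_k)
        total += -0.5 * resid @ R_inv @ resid  # log N(y_k; prediction, R) without its constant
    if log_prior is not None:
        total += log_prior(q)                  # a flat prior contributes nothing
    return total

# Toy stand-in for the measurement model (assumption, not the angles-only model)
def predict(q, t):
    return np.array([q[0] + q[1] * t, q[1]])

times = np.array([0.0, 1.0, 2.0])
q_true = np.array([1.0, 0.5])
y_obs = np.array([predict(q_true, t) for t in times])  # noise-free for illustration
R = np.diag([0.01, 0.01]) ** 2

lp_true = log_unnormalized_posterior(q_true, times, y_obs, predict, R)
lp_off = log_unnormalized_posterior(q_true + 0.1, times, y_obs, predict, R)
```

Working in log space avoids numerical underflow when many small likelihood terms are multiplied, which is why the product in Eq. (7) appears here as a sum.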

The MH algorithm is one of the most general algorithms for MCMC simulations. The MH approach sequentially proposes new additions to the chain, $q^*$, using a proposal density, $g(q, q^*)$, given a current value in the chain $q$. If the new state, $q^*$, does not meet the acceptance criterion used in the MH approach, it is not included in the chain and the chain remains at the current state. The acceptance probability, $\alpha$, for a new state in the MH approach is defined as

$$\alpha(q, q^*) = \min\left[1, \frac{\pi(q^*|y)\, g(q^*, q)}{\pi(q|y)\, g(q, q^*)}\right] \tag{9}$$

Note that the MH approach accepts or rejects moves based on the ratio $\pi(q^*|y)/\pi(q|y)$; since the normalization constant does not depend on $q$ or $q^*$, it cancels in this ratio. The power of the MH approach is that it does not require the normalization constant to be calculated in order to produce samples from $p(q|y)$. The acceptance probability in Eq. (9) ensures that the chain converges to the target distribution, $p(q|y)$, as $N \to \infty$.
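The proposal and rejection rule described above can be sketched as follows. This is a minimal random-walk variant that assumes a symmetric Gaussian proposal, so that $g$ cancels in the acceptance ratio; the function name `metropolis_hastings`, the step size, and the two-dimensional Gaussian target are hypothetical choices for illustration only.

```python
import numpy as np

def metropolis_hastings(log_target, q0, n_samples, step, rng):
    """Random-walk MH with a symmetric Gaussian proposal (g cancels in the ratio)."""
    q = np.asarray(q0, dtype=float)
    log_p = log_target(q)
    chain = np.empty((n_samples, q.size))
    n_accept = 0
    for i in range(n_samples):
        q_star = q + step * rng.standard_normal(q.size)
        log_p_star = log_target(q_star)
        # Accept with probability min(1, pi(q*)/pi(q)), evaluated in log space
        if np.log(rng.uniform()) < log_p_star - log_p:
            q, log_p = q_star, log_p_star
            n_accept += 1
        chain[i] = q  # a rejected proposal repeats the current state
    return chain, n_accept / n_samples

# Hypothetical target: unnormalized standard normal in two dimensions
log_target = lambda q: -0.5 * q @ q
rng = np.random.default_rng(0)
chain, acc = metropolis_hastings(log_target, np.zeros(2), 20000, 1.0, rng)
```

Only differences of the log target enter the test, so any normalization constant drops out, as noted in the text.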

III. Hamiltonian Monte Carlo

The goal of using Hamiltonian Monte Carlo sampling for this problem is to characterize the probability distribution of possible deputy states given angles-only measurements. Assuming no prior information is available regarding the object’s orbit, the a priori distribution in state space is assumed to be uniform. A more efficient solution to the tuning problem is possible with Hamiltonian Monte Carlo (HMC), which uses Hamiltonian dynamics as a method of proposing states based on the target density and an auxiliary momentum parameter in order to improve mixing and overall algorithm efficiency.15, 16 HMC also offers a significant advantage over random-walk MCMC methods such as MH in that its proposals explore the regions of interest in far fewer samples. For example, MCMC applied to an orbital debris tracking problem required on the order of 500,000 samples to resolve the target distributions.7 We wish to show that a similar problem, when approached with HMC, can resolve the target distribution in far fewer samples.

The most useful property that allows HMC to be successful is that the samples generated by the Hamiltonian dynamics preserve volume and energy and are time reversible, as long as the correct integrator is used to propagate the dynamics. The rigorous mathematical proofs of these properties, as well as other interesting properties of HMC, are outside the scope of this paper, which focuses on applying the useful properties of HMC to orbit determination. Detailed proofs can be found in papers by Neal and Betancourt.15, 16 HMC begins by defining the potential energy as the negative logarithm of a joint probability density:

$$V(q) = -\log \pi(q|y) = -\log p(y|q) - \log p(q) \tag{10}$$

where $q$ for our purpose is the relevant state vector. HMC also introduces an auxiliary momentum vector, $p$, that is separate from the underlying problem (i.e., the momentum $p$ is completely artificial and not related to any physical velocity parameters that may be present in $q$). The Hamiltonian is then simply defined as the sum of the negative logarithm of the posterior distribution of $q$ given $y$ and the kinetic energy function $K(p, q)$. Using Eq. (10) as potential energy, the Hamiltonian becomes:

$$H(q, p) = K(p, q) + V(q) \tag{11}$$

The kinetic energy of this formulation is seen to depend on $p$ and $q$, while the potential energy is dependent only on the target state distribution and not the auxiliary momentum variable. The kinetic energy can even be defined in the usual way in terms of momentum and mass $M$, which then corresponds to the negative of the log probability density of a zero-mean Gaussian distribution with covariance $M$:

$$K = \frac{1}{2} p^T M^{-1} p \tag{12}$$



By taking partial derivatives of the Hamiltonian, Hamilton’s equations may be obtained in the usual way to determine how $p$ and $q$ vary with time:

$$\dot{q} = \frac{\partial H}{\partial p} \tag{13a}$$

$$\dot{p} = -\frac{\partial H}{\partial q} \tag{13b}$$

In general, $K$ is allowed to be a function of both $p$ and $q$, as seen in Eq. (11); in this case, this would mean that $M$ is allowed to be a function of $q$. However, if the mass $M$ is taken to be a function of $q$, then Hamilton’s equations are not separable, which complicates their use. Therefore, we assume that $M$ is not a function of $q$ during integration, i.e., $M$ is held constant with respect to $q$, even though the initial selection of $M$ might be dependent on $q$. Further, in order to obtain a useful algorithm, Equations (13a) and (13b) must be discretized. The most common method of integrating these dynamics is with a leapfrog integrator, which in combination with Eq. (12) takes the following steps:

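$$p(t + \epsilon/2) = p(t) - \frac{\epsilon}{2} \frac{\partial V}{\partial q}\big(q(t)\big) \tag{14a}$$

$$q(t + \epsilon) = q(t) + \epsilon\, M^{-1} p(t + \epsilon/2) \tag{14b}$$

$$p(t + \epsilon) = p(t + \epsilon/2) - \frac{\epsilon}{2} \frac{\partial V}{\partial q}\big(q(t + \epsilon)\big) \tag{14c}$$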

where $\epsilon$ is an arbitrary step size. The initial momentum value $p(t)$ is taken to be dependent on mass in the usual way, i.e., $p = Mv$ where $v$ is a randomly selected “velocity.” Equations (14a)-(14c) then represent a single “leapfrog” step which can be repeated as many times $L$ as desired, such that $p(t + \epsilon L) = p^*$ and $q(t + \epsilon L) = q^*$ are the resulting proposal momentum and state respectively. It can be shown that this method of numerical integration preserves volume and is reversible for $K$ quadratic in $p$.15 After obtaining the proposal state $q^*$, the proposal is either accepted or rejected with probability given by:

$$P(\text{acceptance}) = \min\left[1, \exp\big(-H(q^*, p^*) + H(q, p)\big)\right] = \min\left[1, \exp\big(-V(q^*) - K(p^*) + V(q) + K(p)\big)\right] \tag{15}$$

If the proposal is rejected, the current state $q$ is used for the next state $q(t)$, and a new momentum $p$ is drawn at random. This acceptance criterion can roughly be understood as a way of ensuring that proposals approximately conserve the Hamiltonian, with a built-in probabilistic allowance for small deviations due to integration error that would otherwise bias the joint density.17 For our purposes, the potential energy $V$ is defined as follows:

$$V(q) = -\log\left[p(y|q)\, p(q)\right] \tag{16}$$

where $p(q)$ is a prior density on $q$, and $p(y|q)$ is the likelihood function associated with $q$ given measurements $y$, consistent with Eq. (10). This choice of potential drives the Hamiltonian towards regions of high probability, as in gradient descent.17 The HMC approach has been applied to many statistical applications due to a number of useful properties that are advantageous compared to traditional random-walk methods. HMC reduces the random walk behavior of MH and proposes samples that may be distant from the current state, more fully exploring state space. This reduction of the random walk behavior makes HMC very appealing for high-dimensional applications.
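To make the leapfrog update and the acceptance rule concrete, the following is a minimal sketch of Eqs. (14a)-(14c) and Eq. (15) for a constant mass matrix. The quadratic potential, unit mass, and the helper names `leapfrog` and `acceptance_probability` are assumptions chosen for illustration; they are not the angles-only potential used in this work.

```python
import numpy as np

def leapfrog(q, p, grad_V, M_inv, eps, L):
    """L leapfrog steps of size eps with a constant mass matrix."""
    q, p = q.copy(), p.copy()
    for _ in range(L):
        p = p - 0.5 * eps * grad_V(q)   # half step in momentum
        q = q + eps * (M_inv @ p)       # full step in position
        p = p - 0.5 * eps * grad_V(q)   # second half step in momentum
    return q, p

def acceptance_probability(q, p, q_star, p_star, V, M_inv):
    """min[1, exp(H(q, p) - H(q*, p*))] with K = 0.5 p^T M^-1 p."""
    H0 = V(q) + 0.5 * p @ M_inv @ p
    H1 = V(q_star) + 0.5 * p_star @ M_inv @ p_star
    return min(1.0, np.exp(H0 - H1))

# Hypothetical quadratic potential (standard normal target) with unit mass
V = lambda q: 0.5 * q @ q
grad_V = lambda q: q
M_inv = np.eye(2)
q0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

q1, p1 = leapfrog(q0, p0, grad_V, M_inv, eps=0.1, L=20)
# Reversibility check: flipping the momentum and integrating again returns to the start
q_back, p_back = leapfrog(q1, -p1, grad_V, M_inv, eps=0.1, L=20)
alpha = acceptance_probability(q0, p0, q1, p1, V, M_inv)
```

Because the integrator nearly conserves the Hamiltonian for a small step size, the acceptance probability stays close to one even though the proposal is far from the starting state.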

However, the HMC approach is still dependent on parameter tuning, in particular the mass matrix or the covariance of the momentum states. To overcome this, Girolami and Calderhead propose a modification called Riemannian Manifold HMC (RMHMC) that exploits the Riemannian geometry of the parameter space to improve the efficiency of standard HMC.18 Along with several details that are not yet being exploited for this work, RMHMC uses the local Fisher Information Matrix to define the mass matrix and the scale for the momentum variable; this part of the method is utilized for this work in order to reduce the amount of tuning necessary to generate uncorrelated samples. Using this method, the mass matrix is then defined as:

$$M = F(q) = E\left\{\left[\frac{\partial}{\partial q} \log p(y|q)\right] \left[\frac{\partial}{\partial q} \log p(y|q)\right]^T\right\} \tag{17}$$



Figure 1. The chief and deputy orbital setup around the Earth.

This choice of mass, and thereby momentum, is a key step in driving the efficiency of HMC, as the evolution of proposed states can be scaled in an intelligent way so that certain directions are explored on a scale that is relative to their respective length scales. Note once more that this definition of $M$ is in fact a function of $q$, requiring that $M$ be held constant across the leapfrog steps defined in Eq. (14), which may introduce error into the integration if $M$ would otherwise change significantly across the integration. For our purposes, this aspect is neglected as $M$ is not seen to vary significantly. Techniques for integrating with non-constant $M$ do exist; however, they typically require gradients of $F(q)$, which are difficult to compute.19

These methods will be pursued in application to the angles-only initial orbit determination problem presented, with the goal of inferring a likely subspace of solutions which can be used in further estimation.

IV. Orbital Dynamics Model

The problem setup used to define the chief and deputy orbits is similar to the orbit determination problem described in Chapter 4 of Crassidis and Junkins.20 However, in this case the chief is in orbit around the Earth rather than a ground-based observer, and all measurements are angles-only. This setup is depicted in Figure 1. In the inertial $\{i_1, i_2, i_3\}$ basis, the vectors $R$ and $r$ to the chief and deputy respectively are defined as:

$$R = \begin{bmatrix} X & Y & Z \end{bmatrix}^T \tag{18}$$

$$r = \begin{bmatrix} x & y & z \end{bmatrix}^T \tag{19}$$

Both are governed by Keplerian dynamics:

$$\ddot{r} = -\frac{\mu}{\|r\|^3}\, r \tag{20}$$

It is assumed that the position and velocity of the chief are perfectly known. For this reason, the structure of the approach used here remains the same regardless of whether the chief is located on orbit or on the surface of the Earth. For orbit determination we are primarily concerned with the position and velocity of the deputy, thus we define our state as:

$$x = \begin{bmatrix} r^T & \dot{r}^T \end{bmatrix}^T \tag{21}$$
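For readers who want to experiment with the dynamics in Eq. (20), a minimal numerical sketch of the solution flow $\phi(q, t)$ is given below. It uses a fixed-step fourth-order Runge-Kutta integrator as a stand-in, which is an assumption made here for brevity and is not the propagation method of this work; the function names, the kilometer-second units, and the 7000 km circular test orbit are likewise illustrative.

```python
import numpy as np

MU_EARTH = 398600.4418  # km^3/s^2, Earth's gravitational parameter

def two_body_rhs(x, mu=MU_EARTH):
    """Time derivative of x = [r, r_dot] under Keplerian dynamics."""
    r, v = x[:3], x[3:]
    return np.concatenate([v, -mu * r / np.linalg.norm(r) ** 3])

def propagate(x0, dt, n_steps, mu=MU_EARTH):
    """Fixed-step RK4 stand-in for the solution flow phi(q, t)."""
    x = np.asarray(x0, dtype=float)
    h = dt / n_steps
    for _ in range(n_steps):
        k1 = two_body_rhs(x, mu)
        k2 = two_body_rhs(x + 0.5 * h * k1, mu)
        k3 = two_body_rhs(x + 0.5 * h * k2, mu)
        k4 = two_body_rhs(x + h * k3, mu)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Hypothetical circular orbit of radius 7000 km, propagated for one period
r0 = 7000.0
v0 = np.sqrt(MU_EARTH / r0)
x0 = np.array([r0, 0.0, 0.0, 0.0, v0, 0.0])
period = 2 * np.pi * np.sqrt(r0 ** 3 / MU_EARTH)
x_end = propagate(x0, period, 2000)
```

After one full period the state should return to its initial value, which gives a quick sanity check on the integrator and step size.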

The state transition matrix for the deputy relative to the center of the Earth as shown in Figure 1 has been derived analytically and is defined as follows:21

$$\Phi(t, t_0) = \begin{bmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{21} & \Phi_{22} \end{bmatrix} \tag{22}$$



The terms of the state transition matrix are found to be:

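$$\Phi_{11} = \frac{\|r\|}{\mu} (\dot{r} - \dot{r}_0)(\dot{r} - \dot{r}_0)^T + \|r_0\|^{-3} \left[ \|r_0\| (1 - F)\, r r_0^T \right] + F I_{3\times3} \tag{23a}$$

$$\Phi_{12} = \frac{\|r_0\|}{\mu} (1 - F) \left[ (r - r_0)\, \dot{r}_0^T - (\dot{r} - \dot{r}_0)\, r_0^T \right] + \frac{c}{\mu}\, \dot{r} \dot{r}_0^T + G I_{3\times3} \tag{23b}$$

$$\Phi_{21} = -\|r_0\|^{-2} (\dot{r} - \dot{r}_0)\, r_0^T - \|r\|^{-2}\, r (\dot{r} - \dot{r}_0)^T - \frac{\mu c}{\|r\|^3 \|r_0\|^3}\, r r_0^T + \dot{F} \left[ I_{3\times3} - \|r_0\|^{-2}\, r r^T + \frac{1}{\mu \|r\|} (r \dot{r}^T - \dot{r} r^T)\, r (\dot{r} - \dot{r}_0)^T \right] \tag{23c}$$

$$\Phi_{22} = \frac{\|r_0\|}{\mu} (\dot{r} - \dot{r}_0)(\dot{r} - \dot{r}_0)^T + \|r_0\|^{-3} \left[ \|r_0\| (1 - F)\, r r_0^T - c\, r \dot{r}_0^T \right] + \dot{G} I_{3\times3} \tag{23d}$$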

where $F$, $G$, $\dot{F}$, and $\dot{G}$ are coefficients that satisfy the following system of equations:

$$r(t) = r_0 F + \dot{r}_0 G \tag{24}$$

$$\dot{r}(t) = r_0 \dot{F} + \dot{r}_0 \dot{G} \tag{25}$$

For ease of writing, c is defined as

$$c = \frac{3 u_5 - \chi u_4 - \sqrt{\mu}\, (t - t_0)\, u_2}{\sqrt{\mu}} \tag{26}$$

which is made up of the following substitutions:

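$$\chi = \frac{\sqrt{\mu}\, (t - t_0)}{a} + \frac{r^T \dot{r}}{\sqrt{\mu}} - \frac{r_0^T \dot{r}_0}{\sqrt{\mu}} \tag{27}$$

$$u_2 = a \left( 1 - \cos\left( \frac{\chi}{\sqrt{a}} \right) \right) \tag{28}$$

$$u_3 = a^{3/2} \left( \frac{\chi}{\sqrt{a}} - \sin\left( \frac{\chi}{\sqrt{a}} \right) \right) \tag{29}$$

$$u_4 = \frac{a \chi^2}{2} - a u_2 \tag{30}$$

$$u_5 = \frac{a \chi^3}{6} - a u_3 \tag{31}$$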

where a is the semimajor axis of the deputy.

V. Measurement Model

In order to perform HMC, measurements need to be simulated as a function of the state of the deputy relative to the chief. We define a relative instantaneous position vector from the chief to the deputy:

$$\rho = \begin{bmatrix} x - X \\ y - Y \\ z - Z \end{bmatrix} \tag{32}$$

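We choose to rotate $\rho$ to be expressed in the chief’s local Up-East-North frame $\{u, e, n\}$ in order to eventually obtain azimuthal and elevation angles as our measurements:

$$\begin{bmatrix} \rho_u \\ \rho_e \\ \rho_n \end{bmatrix} = \begin{bmatrix} \cos\lambda & 0 & \sin\lambda \\ 0 & 1 & 0 \\ -\sin\lambda & 0 & \cos\lambda \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \rho \tag{33}$$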



The angles λ and θ to the chief in spherical coordinates can be calculated geometrically as:

$$\lambda = \arcsin\left( \frac{Z}{\sqrt{X^2 + Y^2 + Z^2}} \right) \tag{34}$$

$$\theta = \arccos\left( \frac{X}{\sqrt{X^2 + Y^2 + Z^2}} \frac{1}{\cos\lambda} \right) \tag{35}$$

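with care taken to ensure that $\theta$ is in the correct quadrant. In this way we can generate angles-only measurements taken by the chief as the azimuthal and elevation angles in the Up-East-North frame:

$$y = \begin{bmatrix} az \\ el \end{bmatrix} = \begin{bmatrix} \tan^{-1}\left( \dfrac{\rho_e}{\rho_n} \right) \\ \sin^{-1}\left( \dfrac{\rho_u}{\|\rho\|} \right) \end{bmatrix} \tag{36}$$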
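The measurement geometry of Eqs. (32)-(36) can be sketched in a few lines. In this illustration the two-argument arctangent is used both for $\theta$ and for the azimuth as a quadrant-safe substitute for Eq. (35) and the inverse tangent in Eq. (36); that substitution, the clipping of the arcsine argument, and the names `uen_rotation` and `az_el` are assumptions of this sketch.

```python
import numpy as np

def uen_rotation(R_chief):
    """Rotation from the inertial frame to the chief's Up-East-North frame."""
    X, Y, Z = R_chief
    lam = np.arcsin(Z / np.linalg.norm(R_chief))
    theta = np.arctan2(Y, X)  # quadrant-safe equivalent of the arccos form
    rot_lam = np.array([[np.cos(lam), 0.0, np.sin(lam)],
                        [0.0, 1.0, 0.0],
                        [-np.sin(lam), 0.0, np.cos(lam)]])
    rot_theta = np.array([[np.cos(theta), np.sin(theta), 0.0],
                          [-np.sin(theta), np.cos(theta), 0.0],
                          [0.0, 0.0, 1.0]])
    return rot_lam @ rot_theta

def az_el(r_deputy, R_chief):
    """Azimuth and elevation of the deputy as seen from the chief."""
    rho_u, rho_e, rho_n = uen_rotation(R_chief) @ (r_deputy - R_chief)
    rho_norm = np.sqrt(rho_u ** 2 + rho_e ** 2 + rho_n ** 2)
    az = np.arctan2(rho_e, rho_n)
    el = np.arcsin(np.clip(rho_u / rho_norm, -1.0, 1.0))  # clip guards against round-off
    return np.array([az, el])

# Hypothetical geometry: chief on the inertial x-axis, deputy 1 km to the east
R_chief = np.array([7000.0, 0.0, 0.0])
y_east = az_el(R_chief + np.array([0.0, 1.0, 0.0]), R_chief)
```

Note that only the direction of $\rho$ enters the result, so scaling the relative position leaves the two angles unchanged; this is the range ambiguity discussed in the Introduction.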

For our simulated measurements, zero-mean Gaussian measurement noise is added to $y$ to produce realistically random measurements. Unless stated otherwise, all measurement error added has a standard deviation of 0.01. Therefore, our measurements become:

$$\tilde{y}_k = y_k + v_k \tag{37}$$

where $v_k \sim \mathcal{N}(0, R)$ with $R = \mathrm{diag}([0.01, 0.01])^2$. By introducing probabilistic error to our measurements, we obtain a logarithm of a likelihood function for use in Eq. (16):

$$\log L(y|q) = \log\left( \frac{1}{\sqrt{\det(2\pi R)}} \right) - \frac{1}{2} \sum_{k=1}^{m} (\tilde{y}_k - \hat{y}_k)^T R^{-1} (\tilde{y}_k - \hat{y}_k) \tag{38}$$

where $\tilde{y}_k$ is the noisy measurement and $\hat{y}_k = h(q)$ is the predicted observation. In order to efficiently generate HMC proposals, the partials of the observations are needed to form the Fisher Information Matrix shown in Eq. (17). For the dynamics and measurements described, the corresponding measurement partial derivatives also have analytical forms:

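$$\frac{\partial az}{\partial x} = \frac{\rho_e \sin\lambda \cos\theta - \rho_n \sin\theta}{\rho_n^2 + \rho_e^2} \tag{39}$$

$$\frac{\partial az}{\partial y} = \frac{\rho_e \sin\lambda \sin\theta + \rho_n \cos\theta}{\rho_n^2 + \rho_e^2} \tag{40}$$

$$\frac{\partial az}{\partial z} = \frac{-\rho_e \cos\lambda}{\rho_n^2 + \rho_e^2} \tag{41}$$

$$\frac{\partial el}{\partial x} = \frac{\|\rho\| \cos\lambda \cos\theta - \rho_u \frac{\partial \|\rho\|}{\partial x}}{\|\rho\| \sqrt{\|\rho\|^2 - \rho_u^2}} \tag{42}$$

$$\frac{\partial el}{\partial y} = \frac{\|\rho\| \cos\lambda \sin\theta - \rho_u \frac{\partial \|\rho\|}{\partial y}}{\|\rho\| \sqrt{\|\rho\|^2 - \rho_u^2}} \tag{43}$$

$$\frac{\partial el}{\partial z} = \frac{\|\rho\| \sin\lambda - \rho_u \frac{\partial \|\rho\|}{\partial z}}{\|\rho\| \sqrt{\|\rho\|^2 - \rho_u^2}} \tag{44}$$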

Therefore, the full measurement Jacobian with respect to the states is:

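$$\nabla_x y = \begin{bmatrix} \dfrac{\partial az}{\partial x} & \dfrac{\partial az}{\partial y} & \dfrac{\partial az}{\partial z} & 0 & 0 & 0 \\ \dfrac{\partial el}{\partial x} & \dfrac{\partial el}{\partial y} & \dfrac{\partial el}{\partial z} & 0 & 0 & 0 \end{bmatrix} \tag{45}$$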
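One way to gain confidence in the analytical partials of Eqs. (39)-(44) is to compare them against central finite differences. The sketch below does this for a single, arbitrarily chosen geometry. It assumes that $\partial\|\rho\|/\partial x = (x - X)/\|\rho\|$ (and similarly for $y$ and $z$), which follows from differentiating the norm but is not written out above; the function names and numerical values are hypothetical.

```python
import numpy as np

def chief_trig(R_chief):
    lam = np.arcsin(R_chief[2] / np.linalg.norm(R_chief))
    theta = np.arctan2(R_chief[1], R_chief[0])
    return np.cos(lam), np.sin(lam), np.cos(theta), np.sin(theta)

def rotate(d, R_chief):
    cl, sl, ct, st = chief_trig(R_chief)
    rot = np.array([[cl * ct, cl * st, sl],      # product of the two rotation
                    [-st, ct, 0.0],              # matrices of the Up-East-North
                    [-sl * ct, -sl * st, cl]])   # transformation
    return rot @ d

def measure(r_dep, R_chief):
    rho_u, rho_e, rho_n = rotate(r_dep - R_chief, R_chief)
    return np.array([np.arctan2(rho_e, rho_n),
                     np.arcsin(rho_u / np.linalg.norm(r_dep - R_chief))])

def jacobian(r_dep, R_chief):
    """2x6 Jacobian of [az, el] with respect to [r, r_dot]."""
    cl, sl, ct, st = chief_trig(R_chief)
    d = r_dep - R_chief
    rho_u, rho_e, rho_n = rotate(d, R_chief)
    nrm = np.linalg.norm(d)
    dnrm = d / nrm  # assumed partials of ||rho|| with respect to x, y, z
    d_az = np.array([rho_e * sl * ct - rho_n * st,
                     rho_e * sl * st + rho_n * ct,
                     -rho_e * cl]) / (rho_n ** 2 + rho_e ** 2)
    d_rho_u = np.array([cl * ct, cl * st, sl])
    d_el = (nrm * d_rho_u - rho_u * dnrm) / (nrm * np.sqrt(nrm ** 2 - rho_u ** 2))
    J = np.zeros((2, 6))  # velocity columns stay zero
    J[0, :3], J[1, :3] = d_az, d_el
    return J

# Hypothetical chief and deputy positions in km
R_chief = np.array([5000.0, 4000.0, 3000.0])
r_dep = R_chief + np.array([3.0, -4.0, 5.0])
J = jacobian(r_dep, R_chief)

# Central finite differences on the position components
h = 1e-4
J_fd = np.zeros((2, 3))
for i in range(3):
    e = np.zeros(3)
    e[i] = h
    J_fd[:, i] = (measure(r_dep + e, R_chief) - measure(r_dep - e, R_chief)) / (2 * h)
```

Agreement between `J[:, :3]` and `J_fd` indicates that the analytical expressions and the rotation convention are mutually consistent for this geometry.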

In order to express the Jacobian with respect to the initial state, we use Eq. (22) and the chain rule to obtain:

$$\nabla_{x_0} y(t_k) = \nabla_x y(t_k)\, \Phi(t_k, t_0) \tag{46}$$

The measurement Jacobian can then be used in Eq. (17) to obtain the mass matrix:

$$M = \nabla_{x_0} y^T R^{-1}\, \nabla_{x_0} y \tag{47}$$

In this way, the dynamics of the problem are fully defined. Note that though the magnitude of the range vector is used to generate measurements, the range itself is not used as a measurement, making this an angles-only problem.
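As a rough sketch of how the mass matrix can be assembled and used to scale the momentum, the snippet below sums the per-measurement contributions $J_k^T R^{-1} J_k$ and draws $p \sim \mathcal{N}(0, M)$. Summing over measurement times is an assumption made here, consistent with independent measurements; the random Jacobians are placeholders for the products $\nabla_x y(t_k)\, \Phi(t_k, t_0)$, and the function names are hypothetical.

```python
import numpy as np

def fisher_mass_matrix(jacobians, R):
    """Sum of J_k^T R^-1 J_k over all measurement times (assumed stacking rule)."""
    R_inv = np.linalg.inv(R)
    return sum(J.T @ R_inv @ J for J in jacobians)

def sample_momentum(M, rng):
    """Draw p ~ N(0, M) using the Cholesky factor of M."""
    return np.linalg.cholesky(M) @ rng.standard_normal(M.shape[0])

rng = np.random.default_rng(2)
R = np.diag([0.01, 0.01]) ** 2
# Placeholder 2x6 Jacobians for 10 measurement times (not real orbit partials)
jacobians = [rng.standard_normal((2, 6)) for _ in range(10)]

M = fisher_mass_matrix(jacobians, R)
p = sample_momentum(M, rng)
```

A single 2x6 Jacobian yields a matrix of rank at most two, so several measurements with sufficiently different geometry are needed before the 6x6 matrix is invertible and the Cholesky factorization succeeds.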

VI. HMC Algorithm Outline

Though the equations needed to create simulated measurements are somewhat lengthy, the process of using them with HMC is straightforward. For all cases, 10 consecutive measurements are simulated, equally spaced across 1000 seconds. After obtaining the dynamics of the chief and simulated measurements of the angles to the deputy, the corresponding solution flow is as follows:

1. Choose an initial sampling point $q$

2. Calculate the local Fisher Information Matrix from the measurement Jacobian

3. Randomly choose momentum $p$, and scale with the Fisher Information Matrix

4. Simulate leapfrog dynamics for $L$ steps of size $\epsilon$ to obtain $q^*$ and $p^*$

5. Calculate the Hamiltonian associated with $q^*$ and $p^*$

6. Use Eq. (15) to determine if the proposed sample $q^*$ adequately conserves the Hamiltonian

(a) If yes, return to step 2 with the proposed state $q^*$ as the new initial point

(b) If no, return to step 3 with the same initial point until a new proposal is accepted

This loop is repeated for as many samples $N$ as desired.
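The steps above can be sketched end to end on a toy problem. The example below assumes a linear-Gaussian measurement model with a flat prior, for which the Fisher Information Matrix is constant and the posterior is Gaussian, so the output can be checked; it is not the angles-only problem of this work. On rejection the current state is recorded again, following the standard HMC convention. All names and numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear measurement model y = A q + v standing in for h(phi(q, t_k))
A = rng.standard_normal((20, 3))
R_inv = np.eye(20) / 0.1 ** 2
q_true = np.array([1.0, -2.0, 0.5])
y = A @ q_true + 0.1 * rng.standard_normal(20)

def V(q):
    r = y - A @ q
    return 0.5 * r @ R_inv @ r  # flat prior: only the likelihood contributes

def grad_V(q):
    return -A.T @ R_inv @ (y - A @ q)

def fisher(q):
    return A.T @ R_inv @ A  # constant here; state dependent in the orbit problem

def hmc(q0, n_samples, eps, L):
    q = np.array(q0, dtype=float)                                # step 1
    chain, n_acc = [], 0
    for _ in range(n_samples):
        M = fisher(q)                                            # step 2
        M_inv = np.linalg.inv(M)
        p = np.linalg.cholesky(M) @ rng.standard_normal(q.size)  # step 3
        H0 = V(q) + 0.5 * p @ M_inv @ p
        q_s, p_s = q.copy(), p.copy()
        for _ in range(L):                                       # step 4
            p_s = p_s - 0.5 * eps * grad_V(q_s)
            q_s = q_s + eps * (M_inv @ p_s)
            p_s = p_s - 0.5 * eps * grad_V(q_s)
        H1 = V(q_s) + 0.5 * p_s @ M_inv @ p_s                    # step 5
        if np.log(rng.uniform()) < H0 - H1:                      # step 6
            q, n_acc = q_s, n_acc + 1
        chain.append(q.copy())
    return np.array(chain), n_acc / n_samples

chain, acc = hmc(q_true, 4000, eps=0.3, L=5)
```

With the Fisher Information Matrix as the mass matrix, every direction of this toy posterior evolves on the same time scale, which is the scaling benefit described in Section III.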

A. Tuning of Monte Carlo Algorithms

Despite the significant improvement in sampling efficiency that HMC offers, the main challenge with HMC is the parameter tuning which is required to efficiently produce meaningful samples of the target distribution. As described above, the mass matrix $M$ has an analytical solution, which is used for this work. The other tuning parameters, however, namely the leapfrog integration parameters $\epsilon$ and $L$, must be chosen carefully to produce useful results.

As described by Eq. (15), only a certain percentage of HMC samples will be accepted. In theory, a 100% acceptance ratio would be ideal for exploring the target distribution; however, this would correspond to very conservative proposals and would thus explore the region very slowly, requiring many samples to fully explore distant regions of the distribution. Conversely, a low acceptance ratio means that the proposed states are taking large steps and quickly exploring the distribution, but are often too “wild” to be accepted and therefore require many more samples to produce a detailed picture of the target distribution. In other words, a high acceptance rate is inefficient because it requires more samples to reach distant regions of the distribution, while a low acceptance rate is inefficient because it requires more samples to fully capture the details of the distribution.

Betancourt has shown that a balance may be achieved by seeking an acceptance ratio that balances computational load against the amount of the distribution that has been explored; this ratio is found to be 65%.16 However, in cases where it is desirable to explore fine details of the target distribution, higher acceptance ratios (i.e., a finer proposal distribution) will allow for this. For this reason, an acceptance ratio of 80-85% was the target ratio for this work. In the future, where computational efficiency must be considered, the algorithms employed could easily be modified to accommodate a desired acceptance ratio of 65%.



Figure 2. Case A: Long Range Example. (a) Probability Distribution Matrix. (b) Autocorrelation of the states x, y, z, vx, vy, vz.

The product of $\epsilon$ and $L$ is a parameter that directly affects the ratio of accepted HMC proposal states; different combinations of the two parameters with the same product will produce the same ratio of accepted proposals, but result in vastly different sampling behavior. For this work, the leapfrog step size $\epsilon$ was chosen in each case by finding the product $\epsilon L$ that produced the desired acceptance ratio. Then, $L$ was varied to achieve desirable exploration behavior, and $\epsilon$ was calculated with respect to the chosen $L$ to maintain the desired product $\epsilon L$. The acceptance ratio, which is governed directly by the integration error in the Hamiltonian, is in general not affected by the number of integration steps $L$ as long as the corresponding $\epsilon$ is small enough, which justifies this approach to tuning.15

The number of leapfrog steps $L$ was chosen by eye, and it is difficult to quantify how the “best” $L$ was found. In general, it is intuitive that smaller integration step sizes are more likely to produce acceptable proposals; however, decreasing $\epsilon$ requires that the number of leapfrog steps $L$ increase to maintain a desired acceptance ratio. The number of orbit integrations necessary to generate a proposal state scales linearly with $L$, as seen from the leapfrog integration, so it is not computationally efficient to make $L$ arbitrarily large. In general, $L$ was selected by gradually increasing $L$ until no noticeable increase in sampling quality was observed.

For the interested reader, a thorough discussion of the intricacies of tuning HMC for a particular problem is given by Neal.15 Ultimately, tuning HMC is an art that is learned through trial and error, guided by these observations.

VII. Simulation Results

Several test cases were chosen to demonstrate key findings of how HMC performs across a range of angles-only orbit determination cases, with a focus on close proximity cases. In order to show that HMC can be used to produce a probability distribution of solutions for angles-only measurements, a non-close proximity case was first tested as a proof of concept. This case was tested 3 times: once with a measurement error to be used for all close-proximity cases (Case A), once with a higher measurement error in order to explore whether non-Gaussian behavior might still be seen for long range orbits (Case A2), and once with a short measurement span and small number of measurements in an attempt to decrease observability and observe non-Gaussian behavior (Case A3).

The “nominal” test case was chosen to be a relatively close proximity case that is potentially more stressing in terms of observability than the long range case. In order to judge the efficiency of HMC as compared to MCMC, the nominal case was tested three times: once with 10,000 HMC samples (Case B), once with 100,000 HMC samples (Case C), and once with 100,000 samples obtained by MCMC. The goal of this approach was to show that HMC requires far fewer samples than MCMC to achieve similar results.

Two additional test cases were chosen to push the observability of the system even lower; one case is “medium,” and one case is “hard.” The medium case is similar to the nominal case, but the inclination of the deputy is lowered so that the chief and deputy are nearly coplanar (Case E). The hard case is coplanar, as in Case E, but is also at much closer range than the nominal case, in order to limit observability as much as possible (Case F).

Figure 3. Case A2: Long Range High Error Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

Figure 4. Case A3: Long Range Short-Arc Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

Finally, Case G was chosen to demonstrate the “burn-in” behavior of HMC; all other cases were initialized at the true location of the deputy. Case G illustrates the tuning properties of HMC and how they govern the way burn-in should be treated.

The first figure associated with each case is a 6x6 matrix of the sampled distribution along each coordinate direction of the state vector, in order to show how HMC performs in each direction. The second figure is a sample autocorrelation plot for each state variable. Though in general no single metric exists to determine whether Monte Carlo samples are of high quality, the autocorrelation of the samples is commonly used to qualitatively judge whether they sufficiently explore the distribution.22
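As a minimal sketch of how a sample autocorrelation curve of this kind can be computed, the following compares independent draws with a strongly correlated chain. The function name `sample_autocorrelation` and the synthetic AR(1) chain are assumptions for illustration; they are not the chains or the plotting code behind the figures.

```python
import numpy as np


def sample_autocorrelation(chain, max_lag):
    """Normalized autocorrelation of a 1-D chain for lags 0..max_lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    var = float(x @ x) / n
    acf = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # Lag-k covariance estimate, normalized by the lag-0 variance.
        acf[k] = float(x[: n - k] @ x[k:]) / (n * var)
    return acf


rng = np.random.default_rng(0)
n = 5000

# Independent draws: autocorrelation should fall to roughly zero at lag 1.
white = rng.standard_normal(n)

# Synthetic AR(1) chain with coefficient phi: slowly decaying autocorrelation.
phi = 0.9
ar1 = np.empty(n)
ar1[0] = white[0]
for i in range(1, n):
    ar1[i] = phi * ar1[i - 1] + np.sqrt(1.0 - phi**2) * white[i]

acf_white = sample_autocorrelation(white, 50)
acf_ar1 = sample_autocorrelation(ar1, 50)
print("lag 1 and lag 10, independent draws:", acf_white[[1, 10]])
print("lag 1 and lag 10, correlated chain: ", acf_ar1[[1, 10]])
```

For a six-state chain stored as an array with one column per state, the same function would be applied column by column; a curve that decays quickly toward zero indicates that successive samples are close to independent.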

For all test cases, the orbital elements governing the dynamics of the chief are held constant at the values shown in Table 1, where a through M0 represent the traditional orbital elements of the satellite. The initial condition of the chief is then taken to be the state at time t = 0.

         a (km)   e   i (deg)   Ω (deg)   ω (deg)   M0 (deg)
Chief    7000     0   25        180       270       0

Table 1. Chief orbital elements across all test cases.

In order to create the desired test cases, the deputy satellite’s parameters were varied with respect to the chief parameters according to the test cases shown in Table 2. If a parameter is not listed as being varied in Table 2, then it is assumed to be the same as the chief parameter.

Property                 Case      Range (km)   ∆a (km)   ∆e     ∆i (deg)   σθ (deg)   Measurements
Long Range               A         4,345        3,000     0.1    5          0.01       10 in 1000 s
Long Range, High Error   A2        4,345        3,000     0.1    5          0.1        10 in 1000 s
Long Range, Short Arc    A3        4,345        3,000     0.1    5          0.01       3 in 200 s
Nominal                  B, C, D   617          0         0.01   5          0.01       10 in 1000 s
Coplanar                 E         700          0         0.1    0.05       0.01       10 in 1000 s
Coplanar, Short Range    F         70           0         0.01   0.05       0.01       10 in 1000 s
Burn In                  G         617          0         0.01   5          0.01       10 in 1000 s

Table 2. Not shown: for Case A, ω and M0 were chosen to be 5 degrees in order to produce a larger range.

Based on the discussion in Section A, the Monte Carlo sampling parameters were chosen to be the following for each test case.

Property                    Case   L     ε         N         Acceptance Ratio
Long Range                  A      20    0.08828   10,000    80.37%
Long Range, High Error      A2     10    0.1594    10,000    80.14%
Long Range, Short Measure   A3     10    0.33      5,000     76.67%
Nominal                     B      20    0.045     10,000    84.78%
Nominal                     C      20    0.045     100,000   84.78%
Nominal (MCMC)              D∗     n/a   n/a       100,000   65.3%
Coplanar                    E      10    0.009     10,000    89.69%
Coplanar + Short Range      F      15    0.00750   10,000    85.36%
Burn In                     G      7     0.02      1,000     33.7%

Table 3. HMC parameters used to obtain the results.
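The tabulated tuning values can be turned into two derived quantities that help when comparing cases: the integration length per proposal, εL, and the total number of leapfrog steps over a run, L·N. The short sketch below only does this arithmetic on the numbers copied from Table 3; the dictionary layout and the names `hmc_cases` and `summary` are illustrative assumptions, and it assumes every proposal uses exactly L steps.

```python
# (L, eps, N) for the HMC cases, copied from Table 3.
hmc_cases = {
    "A": (20, 0.08828, 10_000),
    "A2": (10, 0.1594, 10_000),
    "A3": (10, 0.33, 5_000),
    "B": (20, 0.045, 10_000),
    "C": (20, 0.045, 100_000),
    "E": (10, 0.009, 10_000),
    "F": (15, 0.00750, 10_000),
    "G": (7, 0.02, 1_000),
}

summary = {}
for case, (L, eps, N) in hmc_cases.items():
    summary[case] = {
        "trajectory_length": L * eps,  # integration length per proposal
        "leapfrog_steps": L * N,       # total leapfrog steps over the run
    }
    print(f"Case {case:2s}: eps*L = {L * eps:.4f}, L*N = {L * N:,}")
```

Since each leapfrog step requires orbit integrations, L·N gives a rough sense of relative computational effort between the HMC cases, under the assumptions stated above.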

A. Long Range Case: Case A, A2, A3

The state probability distribution results for the long range case (Case A) are shown in Figure 2(a), with the associated sample autocorrelation in Figure 2(b). The lower triangle of Figure 2(a) shows the 6-dimensional probability distribution projected along each individual state. The dots are the locations of the samples that HMC visited, and the star at 0 is the location of the true state. Note that rather than plotting the state variables directly, samples have been plotted with respect to the true state, i.e. x∗ = x − xtrue. This was done in order to more clearly depict the width of the distribution in each direction.

These distributions appear to be well behaved, with very little evidence of clumpiness. Additionally, the distributions appear to be extremely flat in one direction; this is to be expected because the primary uncertainty should lie along the direction of the vector pointing from the chief to the deputy. Indeed, this direction is seen to have a much larger variance than the other directions. It is interesting to note that most of the states appear to have been sampled almost completely uncorrelated, except for the y and z components of the velocity. This could also explain why the true state appears to lie along the edge of the distribution in the y and z velocity distributions. From this we may infer that these two directions are significantly more difficult to sample. Despite this, the other directions appear to be quite observable, as expected for a long range case.
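One way to check numerically that the largest spread lies along the chief-to-deputy direction is to compare the dominant eigenvector of the sample covariance with the line-of-sight unit vector. The sketch below does this on synthetic position samples only; the geometry, the noise levels, and all variable names are assumptions for illustration and do not reproduce the Case A results.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical geometry: unit line-of-sight vector from chief to deputy.
los = np.array([1.0, 2.0, 2.0]) / 3.0
x_true = 500.0 * los  # synthetic "true" relative position, km

# Synthetic samples: large spread along the line of sight (range),
# small isotropic spread added in all directions.
n = 5000
range_err = 10.0 * rng.standard_normal(n)
cross_err = 0.1 * rng.standard_normal((n, 3))
samples = x_true + range_err[:, None] * los + cross_err

# Residuals with respect to the truth, x* = x - x_true.
resid = samples - x_true

# Eigen-decomposition of the sample covariance; eigh returns eigenvalues
# in ascending order, so the last column is the dominant direction.
cov = np.cov(resid, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
dominant = eigvecs[:, -1]

alignment = abs(float(dominant @ los))
print("variances along principal axes:", eigvals)
print("|cos(angle to line of sight)|:", alignment)
```

An alignment value close to 1 means the dominant uncertainty direction coincides with the line of sight, which is the behavior described above for the range direction.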

Figure 5. Case B: Nominal Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

It is interesting to note that increasing the measurement error, as in Case A2 shown in Figures 3(a) and 3(b), did not seem to affect the shape of the distributions much, i.e. the two cases have similar levels of non-Gaussian behavior. However, the order of magnitude of the variance of the distribution is larger for Case A2, which makes sense as this case has a much higher measurement error. Additionally, comparison between Cases A and A3 shows that observability can be reduced by lowering the number of measurements (as well as the measurement time span). Case A3, shown in Figures 4(a) and 4(b), exhibits much more non-Gaussian behavior, as is expected for a case that has low observability. From these results we may conclude that HMC can indeed provide information about the probability distribution of states for orbit determination given angles-only measurements.


Figure 6. Case C: Nominal 100,000 Samples Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

B. Nominal Case: Case B, C, and D

Figures 5(a) and 5(b) show the results of the “nominal” case chosen to demonstrate that HMC can characterize close proximity probability distributions. From these results, we see that though HMC appears to be working, the autocorrelation does not drop off as quickly as in Case A, indicating that this is indeed a more stressing case than the long range cases. Figures 6(a) and 6(b) show the results of Case C, which is the same nominal case but with 10 times more samples than Case B. The results of Case B and Case C are not drastically different, indicating that 10,000 samples are enough to sufficiently explore the distribution.


Figure 7. Case D: Nominal 100,000 MCMC Samples Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

Case D then offers a comparison between HMC and traditional MCMC. Figures 7(a) and 7(b) show the results of the same nominal case as in Cases B and C, but sampled with MCMC. From this we see that MCMC does indeed have more trouble exploring the distribution, even with ten times as many samples as the 10,000 that are sufficient for HMC. This indicates that HMC is an improved method for stressful cases like close proximity angles-only orbit determination.

C. Stressing Cases: Case E and F

Figures 8 and 9 depict the results of the medium and hard stressing cases, respectively. Case E, which has a similar range to the nominal case but is much closer to coplanar, shows in particular very non-Gaussian distributions, as expected for a case with low observability. Case F is in theory even less observable than Case E, as the range is an order of magnitude smaller (60 km vs. 700 km). However, though still clearly non-Gaussian, these distributions are slightly more regular looking than those of Case E, which is unexpected. Comparison of the autocorrelation plots between Cases E and F indicates that this may be because Case E could be tuned even further to achieve higher quality samples. This is further supported by the slight clumpiness of Case E.

D. Burn-In Case: Case G

For our purposes, the test cases were all generated with an initial starting point at the true location of the deputy, which is obviously not ideal for a real world estimation problem. In order to demonstrate the burn-in behavior that results from a different starting point, the nominal case (Case B) was simulated with an initial starting point that is 100 km away from the true state, i.e. x0 = xtrue + [100, 0, 0, 0, 0, 0]T. The resulting samples are shown in the subplots of the sample coordinates in Figure 10.

The dashed line at the origin of each subplot is the true location of the deputy in that coordinate, and the initial state is the zeroth sample. From this we see some interesting burn-in behavior of the chain of samples. At the risk of anthropomorphizing the simulations, it appears that initially the samples head away from the location of the true state, but eventually turn around and begin heading towards the truth in some directions, and away from the truth in others. However, the density of the samples shows that after turning around, the samples approach the x coordinate of the truth more and more slowly. Indeed, changing the number of samples does not change this behavior. Rather, as the samples get closer to the truth they begin to get rejected, which is why they appear to slow down. At the same time, though all the states other than x were initialized at the truth, they cannot begin to explore the target region until the x samples are in the region of the target distribution.

Figure 8. Case E: Coplanar Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

Figure 9. Case F: Close Proximity Example. (b) Autocorrelation for x, y, z, vx, vy, vz.

This behavior can be explained as resulting from the sensitivity of HMC to tuning: the region closer to the truth is significantly different from the region near the initial guess, so different leapfrog parameters are needed to propagate the trajectories. In other words, when far away from the truth, larger leapfrog steps are necessary to explore the area, but as the samples get closer, these same step sizes are too large and the proposals begin to get rejected. Therefore, if HMC were to be used in this way, with arbitrary initial guesses, the step size would have to be adaptively changed depending on where the samples are in relation to the heart of the probability distribution. However, adaptively changing the step size is a challenging problem to solve, and is outside the scope of this research. Thus, all cases presented have been initialized at the truth so that the step size used to propagate the dynamics is optimal for the region of the probability distribution. This does not necessarily take away from our results, as the point of HMC is to sample the probability distribution, and other methods such as gradient descent may be better suited to finding the optimal location of the probability distribution in the first place.

Other optimization methods such as gradient descent could be used in conjunction with HMC to produce a realistically useful way of finding probability distributions with truly unknown range outside of simulation. The algorithm as it stands is not meant for that purpose, though it could easily be used after an initial period of searching, once the proposals reach the region of the probability distribution. This composite approach would most likely produce a successful implementation of HMC for the purpose of exploring a truly unknown probability distribution.
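The composite idea can be sketched on a toy problem: a plain gradient descent first moves a distant initial guess to the high-probability region, and HMC is then started from that point with a step size suited to the region. Everything in this sketch is an assumption for illustration, including the two-dimensional Gaussian target, the identity mass matrix, the function names `gradient_descent` and `hmc`, and the tuning values; it is not the algorithm used for the results presented here.

```python
import numpy as np

mu = np.array([3.0, -2.0])                      # mode of the toy target (assumed)
U = lambda q: 0.5 * float((q - mu) @ (q - mu))  # negative log density
grad_U = lambda q: q - mu


def gradient_descent(q0, grad_U, step, n_iter):
    """Fixed-step gradient descent on the negative log density."""
    q = np.array(q0, dtype=float)
    for _ in range(n_iter):
        q = q - step * grad_U(q)
    return q


def hmc(q0, U, grad_U, eps, L, n_samples, rng):
    """Basic HMC with an identity mass matrix and leapfrog integration."""
    q = np.array(q0, dtype=float)
    samples = np.empty((n_samples, q.size))
    accepted = 0
    for i in range(n_samples):
        p = rng.standard_normal(q.size)
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * eps * grad_U(q_new)
        for step in range(L):
            q_new += eps * p_new
            scale = 1.0 if step < L - 1 else 0.5
            p_new -= scale * eps * grad_U(q_new)
        # Metropolis test on the change in total energy.
        dH = (U(q_new) + 0.5 * p_new @ p_new) - (U(q) + 0.5 * p @ p)
        if np.log(rng.uniform()) < -dH:
            q = q_new
            accepted += 1
        samples[i] = q
    return samples, accepted / n_samples


rng = np.random.default_rng(2)
q_far = mu + np.array([100.0, 0.0])             # distant initial guess
q_start = gradient_descent(q_far, grad_U, step=0.5, n_iter=60)
samples, acc_rate = hmc(q_start, U, grad_U, eps=0.2, L=10, n_samples=2000, rng=rng)

print("start after descent:", q_start)
print("sample mean:", samples.mean(axis=0), "acceptance:", acc_rate)
```

In this toy setting the search phase removes the burn-in transient entirely, so a single step size tuned for the region near the mode is adequate for the sampling phase.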

Figure 10. Case G Burn In Convergence

VIII. Conclusions

We have shown that Hamiltonian Monte Carlo sampling can successfully be used to resolve the probability distribution of possible solutions for angles-only orbit determination, particularly for close proximity cases. Moreover, we have demonstrated that HMC can do so more efficiently than traditional Markov Chain Monte Carlo techniques, by taking advantage of the geometry of the state space when proposing samples. These results could possibly be used to efficiently guide follow-up observations towards regions of high probability. The algorithm used to achieve this is relatively general, requiring only a measurement model and measurement Jacobian to create the mass matrix that influences the proposal samples, which in this case is analytically convenient. A main weakness of Monte Carlo methods is partially overcome by using the local Fisher Information Matrix to select the mass matrix in the HMC approach, which was shown to improve sampling efficiency.

Acknowledgments

This work was made possible by support from the Air Force Research Laboratory at Kirtland Air Force Base, through the Universities Space Research Association.

Appendix A: MH

The MH-MCMC approach produces a sequence of states, [q0, q1, · · · , qN]. Once a proposal distribution, g(q, q∗), and the desired length of the chain are selected, the MH-MCMC algorithm is given by

1. Choose an initial value q0;

2. At each step, where the current value is qi−1, propose a candidate for the new parameter q∗ from the distribution g(qi−1, ·);

3. If the proposed value q∗ satisfies

π(q∗)g(q∗, q) > π(q)g(q, q∗),

where q = qi−1 is the current value (for a symmetric proposal, this means that q∗ has a higher target density value than qi−1), the proposal is accepted unconditionally;

4. If this is not the case, then q∗ is accepted as the new value with a probability α given by Eq. (9);

5. If q∗ is not accepted, then the chain remains at the current value and qi = qi−1;

6. Repeat the simulation from step 2 until the desired length of the chain is reached.
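The steps above can be sketched as a short random-walk sampler. The sketch makes several assumptions that are not taken from the text: the proposal g is a symmetric Gaussian random walk (so g cancels in the acceptance test), the acceptance probability is taken to be the standard Metropolis-Hastings ratio min(1, π(q∗)/π(q)), the target is a toy standard Gaussian, and the names `metropolis_hastings`, `log_target`, and `step` are illustrative.

```python
import numpy as np


def metropolis_hastings(log_target, q0, step, n_samples, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    q = np.atleast_1d(np.asarray(q0, dtype=float))    # step 1: initial value
    log_pi = log_target(q)
    chain = np.empty((n_samples, q.size))
    n_accept = 0
    for i in range(n_samples):
        # Step 2: propose a candidate from g(q_{i-1}, .).
        q_star = q + step * rng.standard_normal(q.size)
        log_pi_star = log_target(q_star)
        # Steps 3-4: since log(u) <= 0, a candidate with higher target
        # density is always accepted; otherwise it is accepted with
        # probability pi(q*) / pi(q).
        log_alpha = log_pi_star - log_pi
        if np.log(rng.uniform()) < log_alpha:
            q, log_pi = q_star, log_pi_star
            n_accept += 1
        # Step 5: on rejection the chain repeats the current value.
        chain[i] = q
    # Step 6: the loop ends when the desired chain length is reached.
    return chain, n_accept / n_samples


# Toy target (assumption): standard Gaussian in two dimensions.
log_target = lambda q: -0.5 * float(q @ q)

rng = np.random.default_rng(3)
chain, acc_rate = metropolis_hastings(log_target, [0.0, 0.0], 1.0, 20_000, rng)
print("mean:", chain.mean(axis=0))
print("std: ", chain.std(axis=0))
print("acceptance ratio:", acc_rate)
```

Working with log densities avoids numerical underflow when the target density is very small, which is a common design choice and not something specified in the steps above.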

References

1 Woffinden, D. C. and Geller, D. K., “Observability Criteria for Angles-Only Navigation,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 45, No. 3, July 2009.

2 Patel, H., Lovell, T. A., Allgeier, S., Russell, R., and Sinclair, A., “Relative Navigation For Satellites in Close Proximity Using Angles-Only Observations,” American Astronautical Society, No. 12-202, 2012, pp. 1485–1495.

3 Hebert, L. M., Sinclair, A. J., and Lovell, T. A., “Angles-Only Relative-Orbit Determination Via Maneuver,” American Astronautical Society, No. 15-352, 2015.

4 Sherrill, R. E., Sinclair, A. J., and Lovell, T. A., “Virtual-Chief Generalization of Hill-Clohessy-Wiltshire to Elliptic Orbits,” Journal of Guidance, Control, and Dynamics, Vol. 38, No. 3, March 2015, pp. 523–527.

5 Hebert, L. M., Sinclair, A. J., and Lovell, T. A., “Angles-Only Initial Relative-Orbit Determination Via Successive Maneuvers,” American Astronautical Society, No. 16-512, 2016.

6 Muinonen, K. and Bowell, E., “Asteroid Orbit Determination Using Bayesian Probabilities,” Icarus, No. 104, 1993, pp. 255–279.

7 Schneider, M. D., “Bayesian Linking of Geosynchronous Orbital Debris Tracks As Seen By the Large Synoptic Survey Telescope,” arXiv:1111.2556, 2011.

8 Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E., “Equation of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, Vol. 21, 1953, pp. 1087–1092.

9 Hastings, W. K., “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, Vol. 57, 1970, pp. 97–109.

10 Moretti, N., Rutten, M., Bessell, T., and Morreale, B., “Autonomous Space Object Catalogue Construction and Upkeep Using Sensor Control Theory,” Proc. the Advanced Maui Optical and Space Surveillance Technologies Conf. (AMOS), 2017.

11 Muinonen, K., Fedorets, G., Pentikainen, H., and Pieniluoma, T., “Asteroid Orbits With Gaia Using Random-Walk Statistical Ranging,” Planetary and Space Science, Vol. 123, 2016, pp. 95–100.

12 Binz, C. R. and Healy, L. M., “Direct Uncertainty Estimates for Angles-Only Initial Orbit Determination,” Journal of Guidance, Control, and Dynamics (Article in Advance), 2017.

13 Roberts, G. O. and Rosenthal, J. S., “Examples of Adaptive MCMC,” Journal of Computational and Graphical Statistics, Vol. 18, No. 2, 2009, pp. 349–367.

14 Crassidis, J. L. and Junkins, J. L., Optimal Estimation of Dynamic Systems, CRC Press, Boca Raton, FL, 2nd ed., 2012, pp. 91–92, 103–108.

15 Neal, R. M., “MCMC Using Hamiltonian Dynamics,” Handbook of Markov Chain Monte Carlo, edited by S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Chapman & Hall/CRC Press, 2011.

16 Betancourt, M., “A Conceptual Introduction to Hamiltonian Monte Carlo,” arXiv:1701.02434, 2017.

17 Linares, R. and Crassidis, J. L., “Space Object Shape Inversion via Adaptive Hamiltonian Markov Chain Monte Carlo,” Journal of Guidance, Control, and Dynamics, 2017, pp. 1–12.

18 Girolami, M. and Calderhead, B., “Riemann Manifold Langevin and Hamiltonian Monte Carlo methods,” Journal of the Royal Statistical Society, Vol. 73, No. 2, 2011, pp. 123–214.

19 Betancourt, M., “A General Metric for Riemannian Manifold Hamiltonian Monte Carlo,” arXiv:1212.4693, 2013.

20 Crassidis, J. L. and Junkins, J. L., Optimal Estimation of Dynamic Systems, Applied Mathematics and Nonlinear Science Series, Chapman & Hall/CRC Press, 1st ed., 2004.

21 Battin, R. H., An Introduction to the Mathematics and Methods of Astrodynamics, AIAA Education Series, American Institute of Aeronautics and Astronautics, 1999.

22 Kruschke, J., Doing Bayesian Data Analysis, Elsevier Inc., 2nd ed., 2015.
