
Stochastic Optimal Guidance of Missiles with Nonlinear Dynamics

Gyorgy Hexner∗

RAFAEL, Haifa 31021, Israel

Tal Shima†

Technion - Israel Institute of Technology, Haifa, 32000, Israel

A stochastic optimal control guidance law is derived for a system with nonlinear dynamics having a bounded acceleration command. In the investigated problem of missile guidance, the kinematics are linear and the measurements Gaussian; however, due to the nonlinear dynamics and the bounded controller, the certainty equivalence principle is not valid. Consequently, the optimal guidance law is obtained by numerically solving the Hamilton-Jacobi equation associated with the stochastic optimization problem. The optimal guidance law depends on the conditional probability density function of the estimated states. Moreover, the guidance law is nonlinear in the estimated zero effort miss distance and in the missile internal states. It is shown that if rate saturation is present or if the acceleration bound is non-symmetric then a non-zero acceleration command is issued even if the zero effort miss is zero. The aim of this unusual feature of the guidance law is to position the missile as far as possible from saturation limits, thus placing it in as advantageous a situation as possible to deal with expected future target maneuvers.

I. Introduction

Guidance of autonomous vehicles to intercept a moving target is a well studied problem. Problems of specific interest are missile interception engagements and robot pursuit. In such engagements intercept may be possible if perfect information is available and there is no limit on the interceptor’s control.

Under imperfect information a state estimator may be needed. In order to implement an estimator both measurement noise and target maneuver models are required. Commonly measurement noise is assumed to be additive, white, and Gaussian. The simplest model for representing the target maneuvers is a zero mean white noise. The Singer model (Ref. 1), representing the target acceleration as being piecewise constant with jumps occurring with Poisson statistics, is more realistic and commonly used. A widely employed approximation in deriving a state estimator is to use a linear system, driven by white noise, whose output has the same first and second order statistics as the statistical model used to describe the target acceleration. The linear system is usually called a shaping filter (Ref. 2). This practice is valid provided the missile system and the guidance law are both linear, and the cost quadratic, since in such a case the second order statistics are all that is needed to characterize the system.

Commonly used optimal guidance laws such as proportional navigation (PN, Ref. 3), augmented PN (APN, Ref. 4), and optimal guidance law (OGL, Ref. 5) have been developed without explicitly taking into account the inherent bound on the interceptor’s maneuver command. Implicitly it has been taken into account by penalizing acceleration commands in the running cost. A suboptimal guidance law for missiles with a bounded acceleration command was proposed in Ref. 6. The guidance law was derived for minimum and non-minimum phase missiles taking into account their high order dynamics. Using simulations, the superiority of the guidance law over the classical PN has been shown in a scenario against a target performing a barrel roll maneuver.

It is common practice to implement such deterministic optimal guidance laws in a stochastic setting. This practice is based on the assumed well-known certainty equivalence property (Ref. 7) stating that the optimal control law for a stochastic control problem is the optimal control law for the associated deterministic (certainty equivalent) problem. The certainty equivalence property was initially proven for LQG problems but is not valid for guidance problems characterized by bounded control. Implementing the deterministic optimal guidance law in such problems may result in inferior homing performance, especially against a highly maneuvering target (see Ref. 8). For this class of problems, the estimator may be designed independently of the controller, but the optimal controller depends on the conditional probability density function resulting from the estimation process (Ref. 9). Thus, the target state estimator is designed independently of the guidance law, but the guidance law depends on the noise structure of the problem and on the estimation error statistics.

∗Research Fellow, Department 39; E-mail: [email protected]

†Senior Lecturer, Department of Aerospace Engineering; This work was supported in part by a Horev Fellowship through the Taub Foundation; Senior Member AIAA; E-mail: [email protected]

American Institute of Aeronautics and Astronautics

AIAA Guidance, Navigation and Control Conference and Exhibit, 20 - 23 August 2007, Hilton Head, South Carolina

AIAA 2007-6536

Copyright © 2007 by the authors. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

In Ref. 10 a linear guidance law was derived for a missile with linear dynamics having a bounded acceleration command and utilizing Gaussian measurements. The random input describing function was used for representing the nonlinear saturation. Compared to OGL, the obtained guidance law uses a higher equivalent navigation constant earlier in the course of the pursuit. Hence, the merit of the approach is in distributing the commands along the entire duration of the end-game. Saturation is not avoided but is reached earlier in the scenario, exploiting the interceptor missile capabilities. In another paper (Ref. 11) by the authors the stochastic nonlinear optimal guidance law was derived for a similar problem. The approach was based on the numerical solution of the Hamilton-Jacobi equation using the Markov chain approximation method. Simulation results of both approaches showed that such guidance laws yield better homing performance than the classical OGL. In recent related works (Refs. 12, 13) a different approach for estimation guided guidance has been proposed. Particle filtering has been used to approximate the entire state conditional probability density function without constraining the analysis to the standard Gaussian noise assumptions.

In this paper an integrated estimation-guidance approach is proposed for a missile having nonlinear dynamics. The target maneuver uncertainty is defined by a first order Gauss-Markov model, which is the shaping filter for the Singer model; however, if the actual target acceleration includes jumps then no claim of optimality is made. The remainder of this paper is organized as follows: In the next section, the engagement kinematics, missile dynamics equations, and measurement model are provided and the problem is posed. Then, the state estimator is given. Next, the optimal nonlinear controller is synthesized. An analysis and comparison between the new nonlinear controller and the linear OGL is then presented and concluding remarks are offered in the last section. The numerical solution of the Hamilton-Jacobi equation is discussed in the Appendix.

II. The Problem

A. Kinematics and Dynamics

1. Engagement Kinematics

The end-game phase of an endo-atmospheric interception scenario is analyzed. We assume that during this stage the vehicles fly close to a collision course allowing trajectory linearization. We analyze the interception problem in 2-D and further assume that throughout the short duration of the end-game the speeds of the missile and target are constant. Let r0 be the initial distance between the missile and its target. Then the closing speed Vc is constant and the interception time, computed by tf = r0/Vc, can be assumed fixed. Using tf, the time-to-go of the interception problem can be computed as

tgo = tf − t (1)

The linearized end-game kinematics are depicted in Fig. 1, where the X-axis is aligned with the initial line of sight (LOS). p, given by p = pT − pM, is the relative displacement between the target and the missile normal to the initial LOS direction; and v is its time derivative. The target and missile accelerations normal to the LOS are denoted by aT and aM, respectively. The state vector of the linearized kinematics is defined by

$$x_k = \begin{bmatrix} p & v \end{bmatrix}^T \tag{2}$$

and the corresponding equations of motion are

$$\dot{x}_k = A_k x_k + B_k a_M + B_T a_T ; \quad x_k(0) = \begin{bmatrix} 0 & v_0 \end{bmatrix}^T \tag{3}$$



Figure 1. Linearized engagement kinematics

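where

$$A_k = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B_k = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad B_T = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{4}$$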

2. Maneuver Dynamics

In endo-atmospheric interception usually the majority of the interceptor’s lift is obtained by generating an angle of attack, a process that has dynamics. The interceptor has steering devices such as canard or tail that can generate an angle of attack, and, neglecting servo dynamics, possibly instantaneous maneuvers. We model the closed loop system as follows

$$a_M = g(x_M, a_c) \tag{5}$$

and

$$\dot{x}_M = f(x_M, a_c) \tag{6}$$

where ac is the control input; and xM is the state vector of the interceptor’s internal state variables with dim(xM) = n. We denote the part of the acceleration without dynamics, if it exists, as the direct lift, while the part with dynamics as the specific force.

The maneuvers of the evading target are modeled by a first order Gauss-Markov process

$$\dot{a}_T = (\omega_T - a_T)/\tau_T \tag{7}$$

where ωT is a zero mean white process noise with spectral density Q and τT is the maneuver’s decorrelation time.
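As a rough illustration of Eq. (7), the sketch below propagates the target acceleration in discrete time. It is not taken from the paper: it assumes the exact discretization of a first order Gauss-Markov process with a fixed step, and it parameterizes the noise through the stationary standard deviation σT (so that Q = 2τTσT² under this model). The function names and the sampling scheme are hypothetical.

```python
import math
import random


def gauss_markov_step(a_t, dt, tau_t, sigma_t, rng):
    """Advance the target acceleration by one step of length dt.

    Assumes a stationary first order Gauss-Markov process with
    decorrelation time tau_t and standard deviation sigma_t.
    """
    phi = math.exp(-dt / tau_t)                      # one-step correlation
    noise_std = sigma_t * math.sqrt(1.0 - phi * phi)  # keeps variance stationary
    return phi * a_t + noise_std * rng.gauss(0.0, 1.0)


def simulate_target(n_steps, dt, tau_t=3.0, sigma_t=50.0, seed=0):
    """Return a sample path of target acceleration with n_steps + 1 points."""
    rng = random.Random(seed)
    a_t = sigma_t * rng.gauss(0.0, 1.0)  # draw the start from the stationary law
    history = [a_t]
    for _ in range(n_steps):
        a_t = gauss_markov_step(a_t, dt, tau_t, sigma_t, rng)
        history.append(a_t)
    return history
```

With the noise switched off (sigma_t = 0) a step of length τT simply scales the acceleration by exp(−1), which is the deterministic decay implied by Eq. (7).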

B. Measurements

It is assumed that the interceptor measures the LOS angle to the target which, under the linearization assumption, can be approximated by

$$y = C x_k + \phi \tag{8}$$

where

$$C = \begin{bmatrix} 1/r & 0 \end{bmatrix} \tag{9}$$

and r is the range to the target, assumed to be accurately known; and ϕ is the white measurement noise with spectral density R. In any real-life application, the angular measurement noise generally increases for small ranges. This is modeled by assuming that the spectral density of the additive measurement noise is of the form

$$R = \begin{cases} \sigma_o^2, & r > r_c \\ \sigma_o^2 \left( r_c^2 / r^2 \right), & r \le r_c \end{cases} \tag{10}$$

We also assume that the missile’s n internal states are known with negligible measurement error.
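The range-dependent noise model of Eq. (10) can be sketched as a small function. This is only an illustration: the default values are borrowed from the example parameters given later in the paper (σo = 0.1 mRad, rc = 100 m, Vc = 1000 m/sec), the treatment of σo² directly as the spectral density level is an assumption about units, and the function names are hypothetical. The helper for the range uses r = Vc·tgo, which follows from the constant closing speed assumption.

```python
def measurement_noise_density(r, sigma_o=0.1e-3, r_c=100.0):
    """Spectral density R of the LOS angle noise as a function of range r [m]."""
    if r <= 0.0:
        raise ValueError("range must be positive")
    if r > r_c:
        return sigma_o ** 2
    # Inside the glint distance the noise grows as (r_c / r)^2.
    return sigma_o ** 2 * (r_c / r) ** 2


def range_from_time_to_go(t_go, v_c=1000.0):
    """Range to the target for a constant closing speed v_c [m/sec]."""
    return v_c * t_go
```

Halving the range inside rc quadruples R, while the two branches agree at r = rc, so the model is continuous in r.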



C. Optimization function

The state vector of the interception problem is

$$x = \begin{bmatrix} x_k^T & a_T & x_M^T \end{bmatrix}^T \tag{11}$$

The objective is to calculate a control signal to minimize the value of

$$J = E\left\{ x^T(t_f)\, S\, x(t_f) + \int_0^{t_f} \frac{a_c^2(\xi)}{2}\, d\xi \right\} \tag{12}$$

where S is a matrix with a single non-zero entry, s > 0, in the upper left corner.

III. Independent Estimator Design

As discussed earlier, the design of the estimator is independent of that of the controller. In this section the target state estimator is presented. Since we assumed that the missile’s state vector is available to the guidance loop with negligible measurement error, only the remaining three states need to be estimated. These are the relative position and velocity between the target and the missile; and, in addition, the target’s acceleration. All are perpendicular to the initial LOS direction. The state vector composed of these states is denoted as xE,

$$x_E = \begin{bmatrix} p & v & a_T \end{bmatrix}^T \tag{13}$$

The vector xE satisfies the stochastic differential equation,

$$dx_E = (F x_E + G_M a_M)\,dt + G_T\,dw \tag{14}$$

with

$$F = \begin{bmatrix} A_k & B_T \\ 0 & -1/\tau_T \end{bmatrix} \tag{15}$$

$$G_T = \begin{bmatrix} 0 & 1/\tau_T \end{bmatrix}^T \tag{16}$$

$$G_M = \begin{bmatrix} B_k^T & 0 \end{bmatrix}^T \tag{17}$$

where the notation 0 denotes a matrix of zeros with appropriate dimensions. The continuous time Kalman filter for xE is

$$d\hat{x}_E = (F\hat{x}_E + G_M a_M)\,dt + K\,d\nu, \tag{18}$$

where K is the Kalman filter gain, and ν is the innovations process, which is,

$$d\nu = dy - H\hat{x}_E\,dt, \tag{19}$$

$$H = \begin{bmatrix} C & 0 \end{bmatrix} \tag{20}$$

The Kalman gain K satisfies

$$K = P H^T R^{-1}, \tag{21}$$

where P is the covariance matrix of the estimation error for the vector xE with

$$\dot{P} = FP + PF^T + G_T Q G_T^T - P H^T R^{-1} H P. \tag{22}$$

IV. Optimal Controller

Numerically solving the high order problem posed in section II can be computationally intractable. However, the problem can be simply reduced by two states by using the well-known concept of zero effort miss (ZEM) distance as shown next. For such a reduced order problem the optimal controller is then derived, followed by a discussion of some special cases.


A. Order Reduction

The notion of ZEM is commonly used in advanced optimal control and differential game formulations of guidance problems. It is the miss distance that results if, from the current time onwards, the interceptor’s control is set to zero and the target performs the expected maneuver. In a linear setting, this scalar variable is obtained from the homogeneous solution of the associated engagement equations of motion. The usefulness of the ZEM is twofold. First, if the system response is maintained such that the ZEM equals zero then the miss distance is nulled. Second, its use can reduce the order of a linear guidance problem to a scalar one, with a dynamic equation that in the nominal case depends only on the system inputs.

In our problem the equations of motion are nonlinear. Nonetheless, we utilize the ZEM concept to reduce the order of the problem from n + 3 to n + 1. Remember that n is the order of the interceptor’s internal nonlinear states while the problem has three additional states (p, v, and aT). Let

$$z = \begin{bmatrix} Z & x_M^T \end{bmatrix}^T \tag{23}$$

define the reduced order state of the problem where the ZEM is

$$Z = p + v\,t_{go} + a_T \tau_T^2\, \psi(t_{go}/\tau_T) - U(x_M, t_{go}) \tag{24}$$

with

$$\psi(\xi) = \exp(-\xi) + \xi - 1 \tag{25}$$

and U(xM, tgo) is the solution of the variable p(tgo) of the following initial value problem,

$$\begin{aligned} \dot{p} &= v ; & p(0) &= 0 \\ \dot{v} &= g(x_M, a_c) ; & v(0) &= 0 \\ \dot{x}_M &= f(x_M, a_c) ; & x_M(0) &= x_{M0} \end{aligned} \tag{26}$$

with ac = 0.

Differentiating Eq. (24) we obtain the ZEM dynamics

$$\dot{Z} = \omega_T \tau_T\, \psi(t_{go}/\tau_T) - g(x_M, a_c)\, t_{go} - U_{x_M}(x_M, t_{go})\, f(x_M, a_c) + U_{t_{go}}(x_M, t_{go}) \tag{27}$$

where

$$U_{x_M}(x_M, t_{go}) = \partial U(x_M, t_{go})/\partial x_M \tag{28}$$

and

$$U_{t_{go}}(x_M, t_{go}) = \partial U(x_M, t_{go})/\partial t_{go} \tag{29}$$

In general the obtained ZEM dynamics are dependent on the interceptor’s and target’s states and controls. As expected, all the states having linear dynamics have vanished.

As a special case, if the missile has first order linear dynamics with a time constant τM, then $U(x_M, t_{go}) = a_M \tau_M^2\, \psi(t_{go}/\tau_M)$ and consequently $\dot{Z}$ of Eq. (27) degenerates to

$$\dot{Z} = \omega_T \tau_T\, \psi(t_{go}/\tau_T) - a_c \tau_M\, \psi(t_{go}/\tau_M) \tag{30}$$

which is dependent only on the controls in the problem, and in particular independent of the interceptor’s internal state. Hence, Eq. (30) is the only differential equation required to describe the problem. If the interceptor’s internal dynamics are non-linear, then Eq. (27) retains its dependence on the interceptor’s internal states. In this case, in addition to (27), Eqs. (5) and (6) are required.

Although the dynamics could be nonlinear, Z as defined in Eq. (24) implies that Z = 0 at a given time would result in zero miss distance, if zero control were applied up to the terminal time, and the target behaved as in the model used to generate the ZEM. Note that Z is linear in the estimated variables but non-linear in the missile internal states. Hence, if expectations are taken conditioned on the information available, then all the statements remain valid.
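The interpretation of Z as the miss distance under zero control can be checked numerically for the first order linear missile. The sketch below is illustrative only: it evaluates Eq. (24) with U = aM τM² ψ(tgo/τM), then integrates the linearized kinematics with ac = 0 and no process noise (so both accelerations decay exponentially) and compares the terminal displacement with Z. The function names, the integration scheme, and the numerical values in the usage example are assumptions, not taken from the paper.

```python
import math


def psi(xi):
    """Eq. (25): psi(xi) = exp(-xi) + xi - 1."""
    return math.exp(-xi) + xi - 1.0


def zero_effort_miss(p, v, a_t, a_m, t_go, tau_t, tau_m):
    """Eq. (24) for a first order linear missile (U = a_M tau_M^2 psi)."""
    return (p + v * t_go
            + a_t * tau_t ** 2 * psi(t_go / tau_t)
            - a_m * tau_m ** 2 * psi(t_go / tau_m))


def coast_to_intercept(p, v, a_t, a_m, t_go, tau_t, tau_m, n_steps=20000):
    """Integrate p' = v, v' = a_T - a_M with a_c = 0 and return p at t_f."""
    dt = t_go / n_steps
    decay_t = math.exp(-dt / tau_t)  # expected target maneuver decays
    decay_m = math.exp(-dt / tau_m)  # missile acceleration decays with a_c = 0
    for _ in range(n_steps):
        acc = a_t - a_m
        p += v * dt + 0.5 * acc * dt * dt
        v += acc * dt
        a_t *= decay_t
        a_m *= decay_m
    return p
```

For example, with p = 10 m, v = −20 m/sec, aT = 50 m/sec², aM = 100 m/sec², tgo = 0.5 sec, τT = 3 sec and τM = 0.1 sec, both routines give a value close to 1.91 m.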



B. Stochastic Dynamic Programming

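The cost (12) can also be expressed as

$$J = E\left\{ Z^2(t_f)\, s + \int_0^{t_f} \frac{a_c^2(\xi)}{2}\, d\xi \right\} \tag{31}$$

On the other hand

$$Z(t) = E\{Z(t) \mid y(\lambda), \lambda < t\} + \tilde{Z}(t) \mid y(\lambda), \lambda < t \tag{32}$$

where $\tilde{Z}(t) \mid y(\lambda), \lambda < t$ is the estimation error of Z(t) given the available information to time t. Substituting (32) into (31) we obtain

$$J = E\left( \left[ E\{Z(t_f) \mid y(\lambda), \lambda < t\} \right]^2 s + \int_0^{t_f} \frac{a_c^2(\xi)}{2}\, d\xi \right) + E\left\{ \left[ \tilde{Z}(t) \mid y(\lambda), \lambda < t \right]^2 \right\}. \tag{33}$$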

Since the last term is the mean square estimation error of the zero effort miss Z(t), in our model this term is not influenced by the acceleration commands ac(t). It then follows that for the purpose of calculating the optimal ac(t), minimizing the first two terms of (33) yields the same ac as minimizing (33). The conditional expectation E{Z(t)|y(λ), λ < t} can be obtained from (18),

$$dE\{Z(t) \mid y(\lambda), \lambda < t\} = b_1\, dt + \sigma\, d\nu, \tag{34}$$

where

$$E\nu^2 = R\,t \tag{35}$$

$$\sigma = K_1 + K_2 t_{go} + K_3 \tau_T^2\, \psi(t_{go}/\tau_T), \tag{36}$$

$$b_1 = -g(x_M, a_c)\, t_{go} - U_{x_M}(x_M, t_{go})\, f(x_M, a_c) + U_{t_{go}}(x_M, t_{go}) \tag{37}$$

and Ki is the ith component of the Kalman gain. If differential equation (27) is independent of xM (the linear dynamics case) then Eqs. (34) – (37) are all that is required. If the xM terms do not cancel out, then the differential equation for xM must be appended. The set of equations may be written more compactly as,

$$dz = b(z, a_c)\, dt + \Sigma\, d\nu \tag{38}$$

where

$$b = \begin{bmatrix} b_1 \\ f(x_M, a_c) \end{bmatrix} \tag{39}$$

and

$$\Sigma = \begin{bmatrix} \sigma \\ 0 \end{bmatrix} \tag{40}$$

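Let V(z, t) be the optimal cost to go of E{J|y(λ), λ < t} from the initial state vector z at time t. V(z, t) satisfies the Hamilton-Jacobi partial differential equation,

$$\frac{\partial V}{\partial t} + \min_{a_c} \left\{ b^T \nabla V + \frac{1}{2} \mathrm{Trace}\left[ M \frac{\partial^2 V}{\partial z_i \partial z_j} \right] + \frac{a_c^2}{2} \right\} = 0 \tag{41}$$

where

$$M = \Sigma(t_{go})\, \Sigma^T(t_{go}) = \begin{bmatrix} \sigma^2 R & 0 \\ 0 & 0 \end{bmatrix} \tag{42}$$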



C. Some special cases

When the missile dynamics are linear, the function U(xM, tgo) can be explicitly evaluated. Let the stable missile dynamics be,

$$\begin{aligned} \dot{x}_M &= A_M x_M + B_M a_c \\ a_M &= C_M x_M \end{aligned} \tag{43}$$

In this case,

$$U(x_M, t_{go}) = C_M A_M^{-2} \left( e^{A_M t_{go}} - I - A_M t_{go} \right) x_M. \tag{44}$$

If in addition the missile is approximated by a first order transfer function, aM/ac = 1/(1 + sτM), U(xM, tgo) reduces to the familiar form

$$U(x_M, t_{go}) = a_M \tau_M^2\, \psi(t_{go}/\tau_M). \tag{45}$$

On evaluating Eq. (34) for a general linear missile we obtain

$$dE\{Z(t) \mid y(\lambda), \lambda < t\} = -C_M A_M^{-2} \left( e^{A_M t_{go}} - I - A_M t_{go} \right) B_M a_c\, dt + \sigma\, d\nu, \tag{46}$$

that is, the differential equation depends on the commanded acceleration, ac, and is independent of the missile state. Hence, in this case the optimal guidance command depends on the ZEM only, and is independent of the missile internal state.

We obtain another noteworthy case when the missile non-linearity is at the input, that is, when the dynamics of the missile are defined by the differential equation

$$\begin{aligned} \dot{x}_M &= A_M x_M + B_M(a_c) \\ a_M &= C_M x_M \end{aligned} \tag{47}$$

Namely, the only non-linear element is BM(ac). In this case the differential equation satisfied by the zero effort miss is

$$dE\{Z(t) \mid y(\lambda), \lambda < t\} = -C_M A_M^{-2} \left( e^{A_M t_{go}} - I - A_M t_{go} \right) B_M(a_c)\, dt + \sigma\, d\nu, \tag{48}$$

implying that in this case also the optimal guidance law depends solely on the zero effort miss. When BM(ac) is the saturation function, we obtain a missile with bounded acceleration, a problem which was studied in Ref. 11.
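As a consistency check on the linear case, the drift term obtained by substituting Eqs. (43) and (44) into Eq. (37) can be specialized to the first order missile. The realization below (scalar state xM = aM) is an assumption made for illustration; the text only specifies the transfer function aM/ac = 1/(1 + sτM). Under that assumption the drift reduces to the control term of Eq. (30):

```latex
% Assumed scalar realization of a_M / a_c = 1 / (1 + s \tau_M)
A_M = -\frac{1}{\tau_M}, \qquad B_M = \frac{1}{\tau_M}, \qquad C_M = 1

% Drift of the conditional mean of Z for the linear missile
-C_M A_M^{-2}\left(e^{A_M t_{go}} - I - A_M t_{go}\right) B_M a_c
  = -\tau_M^{2}\left(e^{-t_{go}/\tau_M} - 1 + \frac{t_{go}}{\tau_M}\right)\frac{a_c}{\tau_M}
  = -a_c\,\tau_M\,\psi\!\left(t_{go}/\tau_M\right)
```

The missile state does not appear in the result, in line with the statement that the guidance command then depends on the ZEM only.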

V. Example

A. First Order Nonlinear Missile Dynamics

A block diagram of the missile dynamics chosen for our example is depicted in Fig. 2. Note the limits on the acceleration command and also on the rate of change of the missile’s internal state. This may be a simplified model of a missile with characteristic response slower for large inputs than for small ones.

[Block diagram: command ac, limiters $a_M^{\max}$ and L, gain 1/τM, integrator 1/s, direct lift paths d and 1 − d, internal state xM, output aM]

Figure 2. Missile Dynamics


The dynamics of such a system are,

$$\dot{x}_M = f(x_M, a_c) = \mathrm{sat}\left( \left[ \mathrm{sat}(a_c, a_M^{\max}) - x_M \right] / \tau_M,\; L \right) \tag{49}$$

where $a_M^{\max}$ is the maximum allowed acceleration command, L is the rate limit, and sat(·) is the standard saturation function.

The missile acceleration is

$$a_M = g(x_M, a_c) = d \cdot \mathrm{sat}(a_c, a_M^{\max}) + (1 - d) \cdot x_M \tag{50}$$

where d is the direct lift part of the control (feed-forward). The parameters chosen for the evaluated example are given in Table 1.

| Parameter | Value | Units | Definition |
|---|---|---|---|
| τM | 0.1 | sec | missile time constant |
| L | 5000 | m/sec³ | missile state maximum rate |
| $a_M^{\max}$ | 500 | m/sec² | missile maximum acceleration |
| d | 0 | | direct lift coefficient |
| τT | 3 | sec | target acceleration decorrelation time |
| σT | 50 | m/sec² | target acceleration standard deviation |
| σo | 0.1 | mRad | observation noise standard deviation |
| rc | 100 | m | glint noise distance |
| Vc | 1000 | m/sec | closing speed |
| r0 | 500 | m | initial range |

Table 1. Simulation Parameters
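The missile model of Eqs. (49) and (50) with the values of Table 1 can be written down directly. The sketch below is a minimal illustration; the function names are hypothetical and the symmetric saturation sat(u, limit) is assumed to clip u to the interval [−limit, limit].

```python
def sat(u, limit):
    """Symmetric saturation: clip u to the interval [-limit, limit]."""
    return max(-limit, min(limit, u))


def missile_state_rate(x_m, a_c, tau_m=0.1, rate_limit=5000.0, a_max=500.0):
    """Eq. (49): rate of change of the missile internal state x_M."""
    return sat((sat(a_c, a_max) - x_m) / tau_m, rate_limit)


def missile_acceleration(x_m, a_c, d=0.0, a_max=500.0):
    """Eq. (50): direct lift part plus the part with dynamics."""
    return d * sat(a_c, a_max) + (1.0 - d) * x_m
```

For xM = 250 m/sec², a command of ac = −250 m/sec² already drives the state rate to the limit −L, and a more negative command such as −400 m/sec² gives the same rate. With d = 0 the missile acceleration equals xM regardless of the command.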

B. Acceleration Command

In Figs. 3–5 the acceleration command is plotted as a function of the ZEM at time instances tgo = 0.05, 0.4, 0.5 sec, respectively. In each figure the results are plotted for 5 different values of xM. Clearly, in our example with rate and magnitude saturations this dependence is nonlinear. In the case of OGL this dependence is linear, as the command is proportional to N′/tgo². There are several outstanding features in these figures:

• Negative acceleration commands saturate at values smaller than the missile’s maximum maneuver capability;

• For Z = 0 and xM ≠ 0 the acceleration command is different from zero;

• For xM ≠ 0 the ZEM value for which the acceleration command changes sign depends on tgo.

The first feature is easily explained. The negative acceleration command that saturates the rate of change of the missile’s internal state is ac = xM − τM L = xM − 500. Applying more negative values of ac does not influence the missile’s acceleration or its internal state, as there is no direct lift (d = 0) in this example.

The fact that the acceleration command depends on the missile internal state xM, in addition to the usual dependence on ZEM, was to be expected as the xM dependence does not cancel out in (38).

At large values of tgo the acceleration command becomes independent of xM. As the missile approaches the target it seeks to place itself in as advantageous a position as possible to counter the target’s assumed future maneuver potential. The missile accomplishes this by biasing ac positively when xM > 0, e.g. ac > 0 when Z = 0. Such a command causes a decrease in the value of ZEM. Subsequently this causes the use of a more negative value of ac, which drives xM closer to zero, out of the rate saturation, leading to a faster response.

Figure 3. Acceleration command dependence on ZEM; tgo = 0.05 seconds (ac [m/sec²] versus Z [m], for xM = 0, 125, 250, 375, 500)

Figure 4. Acceleration command dependence on ZEM; tgo = 0.4 seconds (ac [m/sec²] versus Z [m], for xM = 0, 125, 250, 375, 500)

Figure 5. Acceleration command dependence on ZEM; tgo = 0.5 seconds (ac [m/sec²] versus Z [m], for xM = 0, 125, 250, 375, 500)


To better understand this last feature of the optimal solution, the rate saturation was removed from the problem. Instead, Eq. (49) was replaced by

$$\dot{x}_M = f(x_M, a_c) = \left[ \mathrm{satu}(a_c, a_{min}, a_{max}) - x_M \right] / \tau_M, \tag{51}$$

where

$$\mathrm{satu}(a_c, a_{min}, a_{max}) = \begin{cases} a_c, & a_{min} \le a_c \le a_{max} \\ a_{max}, & a_c > a_{max} \\ a_{min}, & a_c < a_{min} \end{cases} \tag{52}$$

is a nonsymmetric saturation function.

The values chosen for amin and amax were −500 m/s² and 350 m/s², respectively. The optimal acceleration commands are shown in Figs. 6–8. Again, a similar feature of a bias in the acceleration command is observed. In the present case the idea is to position the missile so that the expected target maneuvers may be countered by a negative acceleration command, avoiding the acceleration limits. Apparently, both in the rate limited case (Figs. 3–5) and in the case without the rate limit (Figs. 6–8) the optimization of the missile guidance law looks ahead to position the missile as far as possible from saturation limits and thus place the missile in as advantageous a situation as possible to deal with future target maneuvers.

Closer to intercept, as in Fig. 3, the sign of the acceleration command corresponding to Z = 0 is reversed. Now the optimal guidance command is a result of a compromise between driving xM towards zero and approaching the target. The acceleration command reversal also fits in well with the earlier policy of driving ZEM in the positive direction for xM > 0. Note that the reversal is absent from the problem without the rate saturation.

Figure 6. Acceleration command dependence on ZEM; tgo = 0.05 seconds, no rate limiter (ac [m/sec²] versus Z [m])

[Figures 7 and 8: ac [m/sec²] versus Z [m], no rate limiter]

In Fig. 9 an approximation of N′ is plotted for two different values of the target’s maneuver standard deviation along with the navigation constant of OGL. The approximation of N′ was obtained by first finding the smallest value of ZEM that causes the guidance law to saturate (ZEMsat) and then calculating N′ = ac tgo²/ZEMsat. Clearly, the gains are higher than those of OGL in order to reach saturation early on and exploit the missile’s maneuver capabilities. Also, as expected, against a more maneuverable target the gain is higher. Note that the values of N′ may seem outlandish, but these should be considered in light of:

• Relatively small observation noise assumed in the example.

• The calculation of N′ was based on knowledge of the missile saturation. Hence, the magnitude of N′ should be taken as an indication of the size of the ZEM required to saturate the missile acceleration command.

It was shown in Ref. 10 that similar large values of N′ do indeed lead to smaller miss distances than the use of N′ derived from OGL (which ignores the missile’s acceleration saturation). Also, note that N′ derived from OGL does reach similar outlandish values, but only very close to the terminal time. Here N′ reaches its maximum some time before the terminal time, leaving time for the missile’s bounded acceleration to affect the miss distance.
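The approximation N′ = ac tgo²/ZEMsat described above can be sketched as a search over a grid of ZEM values. The code is illustrative only: `guidance_law` stands for any tabulated or analytic command ac(Z, tgo), the grid search and the function names are assumptions, and the saturated linear law in the usage example is a stand-in, not the optimal law computed in the paper.

```python
def effective_navigation_gain(guidance_law, t_go, a_max, z_grid):
    """Approximate N' from the smallest ZEM that saturates the command.

    guidance_law(z, t_go) returns the commanded acceleration a_c;
    z_grid must contain positive ZEM values in ascending order.
    Returns None if the command never saturates on the grid.
    """
    for z in z_grid:
        a_c = guidance_law(z, t_go)
        if abs(a_c) >= a_max:
            return abs(a_c) * t_go ** 2 / z
    return None


def saturated_linear_law(z, t_go, gain=3.0, a_max=500.0):
    """Hypothetical stand-in: linear law N z / t_go^2 clipped at a_max."""
    return max(-a_max, min(a_max, gain * z / t_go ** 2))
```

Applied to the stand-in law with gain 3 at tgo = 0.5 sec on a grid with 0.01 m spacing, the procedure recovers a value close to 3, limited by the grid resolution.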

Figure 9. Effective navigation gain comparison; xM = 0 (N′ versus tgo; curves labeled 50, 10, and OGL)

VI. Conclusions

A stochastic optimal guidance law for an acceleration constrained missile with nonlinear dynamics has been presented. The guidance law has been obtained by solving the nonlinear Hamilton-Jacobi equation. Numerically solving this equation was possible by reducing the order of the problem using the well-known zero effort miss concept.

Unlike classical deterministic optimal guidance laws, the obtained stochastic guidance law was found to have a nonlinear dependence on the zero effort miss distance and the missile internal states. Moreover, the new guidance law is dependent on the probability density function of the estimated states.

If rate saturation or a nonsymmetric acceleration bound exists, then a nonzero acceleration command is issued even if the zero effort miss is zero. This reflects the look-ahead capability of the guidance law, which positions the missile as far as possible from saturation limits, thus placing it in as advantageous a situation as possible to deal with future target maneuvers.

References

1. Singer, R. A., “Estimating Optimal Tracking Filter Performance for Manned Maneuvering Targets,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 6, No. 4, 1970, pp. 473–483.

2. Fitzgerald, R. J., “Shaping Filters for Disturbances with Random Starting Times,” AIAA Journal of Guidance and Control, Vol. 2, No. 2, 1979, pp. 152–154.

3. Yuan, L. C., “Homing and Navigational Courses of Automatic Target Seeking Devices,” Journal of Applied Physics, Vol. 19, 1948, pp. 1122–1128.

4. Garber, V., “Optimum Intercept Laws for Accelerating Targets,” AIAA Journal, Vol. 6, No. 11, 1969, pp. 2196–2198.

5. Cottrell, R. G., “Optimal Intercept Guidance for Short-Range Tactical Missiles,” AIAA Journal, Vol. 9, No. 7, 1971, pp. 1414–1415.

6. Rusnak, I., “Advanced Guidance Laws for Acceleration-Constrained Missile, Randomly Maneuvering Target and Noisy Measurements,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 32, No. 1, 1996, pp. 456–464.

7. Maybeck, P. S., Stochastic Models, Estimation, and Control, Vol. 3, Vol. 141 of Mathematics in Science and Engineering, Academic Press, 1982, p. 17.

8. Shinar, J. and Shima, T., “Non-Orthodox Guidance Law Development Approach for Intercepting Maneuvering Targets,” AIAA Journal of Guidance, Control, and Dynamics, Vol. 25, No. 4, 2002, pp. 658–666.

9. Witsenhausen, H. S., “Separation of Estimation and Control for Discrete Time Systems,” Proceedings of the IEEE, Vol. 59, No. 11, 1971, pp. 1557–1566.

10. Hexner, G., Shima, T., and Weiss, H., “An LQG Guidance Law with Bounded Acceleration Command,” Proceedings of the IEEE Conference on Decision and Control, December 2003.

11. Hexner, G. and Shima, T., “Stochastic Optimal Control Guidance Law with Bounded Acceleration,” Proceedings of the IEEE Conference on Decision and Control, December 2004.

12. Shaviv, I. G. and Oshman, Y., “Guidance Without Assuming Separation,” Proceedings of the AIAA Guidance, Navigation, and Control Conference, CP-6154, AIAA, Washington, DC, 2005.

13. Shaviv, I. G. and Oshman, Y., “Estimation-Guided Guidance,” Proceedings of the AIAA Guidance, Navigation, and Control Conference, CP-6217, AIAA, Washington, DC, 2006.

14. Kushner, H. J. and Dupuis, P., Numerical Methods for Stochastic Control Problems in Continuous Time, Springer-Verlag, 2001.

Appendix: Numerical Solution of the Hamilton-Jacobi equation

The Hamilton-Jacobi equation was solved numerically using the Markov chain approximation method (Ref. 14). This class of methods has the advantage that we are ensured of the convergence of the numerical results to the solution of the continuous problem as the time and spatial discretizations shrink towards zero.

The method is based on deriving a Markov chain to approximate the continuous time problem, and then proving that both the optimal control and the cost derived through the Markov chain converge to the optimal control and cost for the continuous time problem.

The Markov chain approximation is obtained by finite differencing the Hamilton-Jacobi partial differential equation. However, the finite differencing is used solely to derive the approximating Markov chain. The proof of convergence relies only on the properties of the approximating Markov chain.

In order to discretize the Hamilton-Jacobi equation, the state of the stochastic equation (38) must be modified to constrain z to a bounded set. The missile internal states, xM, are inherently bounded because ac is bounded, and the missile was assumed stable. When the missile dynamics are defined as in section V, in Eq. (49), xM is constrained to $|x_M| \le a_M^{\max}$. To constrain Z to a bounded set, Eq. (34) is modified to

$$dE\{Z(t) \mid y(\lambda), \lambda < t\} = b'_1\, dt + \sigma'\, d\nu, \tag{53}$$

where

$$b'_1 = \begin{cases} -C, & Z > M \\ \left(1 - \dfrac{Z - m}{M - m}\right) b_1 - \left(\dfrac{Z - m}{M - m}\right) C, & m < Z \le M \\ b_1, & -m < Z \le m \\ \left(1 - \dfrac{Z + m}{-M + m}\right) b_1 + \dfrac{Z + m}{-M + m}\, C, & -M < Z \le -m \\ C, & Z \le -M \end{cases} \tag{54}$$

and

$$\sigma' = \begin{cases} 0, & Z > M \\ \left(1 - \dfrac{Z - m}{M - m}\right) \sigma, & m < Z \le M \\ \sigma, & -m < Z \le m \\ \left(1 - \dfrac{Z + m}{-M + m}\right) \sigma, & -M < Z \le -m \\ 0, & Z \le -M \end{cases} \tag{55}$$

Note that both b′1 and σ′ are continuous and ensure that |Z| ≤ M for any realization of the innovations process, and any commanded acceleration $|a_c| \le a_M^{\max}$. In the examples presented in the paper M was taken as 15 m, and M − m was set to 10 units of discretization in the Z direction.
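The boundary modification of Eqs. (54) and (55) amounts to blending the drift linearly towards an inward-pointing constant and the noise gain towards zero between |Z| = m and |Z| = M. The sketch below is an illustration under assumptions: the function and argument names are hypothetical, and the value of the constant C (here `c`) is not given in the paper, so any number used for it is a placeholder.

```python
def bounded_drift_and_noise(z, b1, sigma, big_m, small_m, c):
    """Return (b1', sigma') of Eqs. (54)-(55) at zero effort miss z.

    b1 and sigma are the unmodified drift and noise gain of Eq. (34);
    c is the positive constant that pushes Z back inside |Z| <= big_m.
    """
    if z > big_m:
        return -c, 0.0
    if z > small_m:
        w = (z - small_m) / (big_m - small_m)    # 0 at m, 1 at M
        return (1.0 - w) * b1 - w * c, (1.0 - w) * sigma
    if z > -small_m:
        return b1, sigma                         # interior: unchanged
    if z > -big_m:
        w = (z + small_m) / (-big_m + small_m)   # 0 at -m, 1 at -M
        return (1.0 - w) * b1 + w * c, (1.0 - w) * sigma
    return c, 0.0
```

With M = 15 and m = 10 the returned pair equals (b1, σ) at Z = ±10 and (∓C, 0) at Z = ±15, which is the continuity property noted above.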

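The Hamilton-Jacobi equation (41) was discretized using the finite difference approximations,

$$\frac{\partial V}{\partial t} \approx \frac{V(z, t + \delta) - V(z, t)}{\delta} \tag{56}$$

$$\frac{\partial V}{\partial z_i} \approx \frac{V(z + h_i e_i, t + \delta) - V(z, t + \delta)}{h_i} \quad \text{if } b'_i \ge 0 \tag{57}$$

$$\frac{\partial V}{\partial z_i} \approx \frac{V(z, t + \delta) - V(z - e_i h_i, t + \delta)}{h_i} \quad \text{if } b'_i < 0 \tag{58}$$

$$\frac{\partial^2 V(z)}{\partial z_i^2} \approx \frac{V(z + e_i h_i, t + \delta) + V(z - e_i h_i, t + \delta) - 2V(z, t + \delta)}{h_i^2}, \tag{59}$$

where δ is the time discretization, ei is the unit vector and hi is the discretization length in the ith direction. An approximation analogous to (59) may also be derived if M is not diagonal, but is not needed here.

Let

$$b_i^+ = \max(b'_i, 0), \tag{60}$$

and

$$b_i^- = \max(-b'_i, 0), \tag{61}$$

so that,

$$b_i^+ + b_i^- = |b'_i| \tag{62}$$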

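Using approximations (56)–(59), the Hamilton-Jacobi partial differential equation (41) becomes,

$$\begin{aligned} &\frac{V(z, t + \delta) - V(z, t)}{\delta} + \min_{a_c} \Bigg\{ \sum_i \left( b_i^+ \left[ \frac{V(z + h_i e_i, t + \delta) - V(z, t + \delta)}{h_i} \right] - b_i^- \left[ \frac{V(z, t + \delta) - V(z - h_i e_i, t + \delta)}{h_i} \right] \right) \\ &\qquad + \frac{\sigma'^2 R}{2}\, \frac{V(z + h_1 e_1, t + \delta) + V(z - h_1 e_1, t + \delta) - 2V(z, t + \delta)}{h_1^2} + \frac{a_c^2}{2} \Bigg\} = 0 \end{aligned} \tag{63}$$

Solving for V(z, t) and simplifying we obtain,

$$\begin{aligned} V(z, t) = \min_{a_c} \Bigg\{ & V(z, t + \delta) \left[ 1 - \frac{\delta \sigma'^2 R}{h_1^2} - \frac{\delta}{h_1} |b'_1| - \frac{\delta}{h_2} |b'_2| \right] \\ & + V(z + h_1 e_1, t + \delta) \left[ b_1^+ \frac{\delta}{h_1} + \frac{\sigma'^2 R \delta}{2 h_1^2} \right] + V(z - h_1 e_1, t + \delta) \left[ b_1^- \frac{\delta}{h_1} + \frac{\sigma'^2 R \delta}{2 h_1^2} \right] \\ & + V(z + h_2 e_2, t + \delta) \left[ b_2^+ \frac{\delta}{h_2} \right] + V(z - h_2 e_2, t + \delta) \left[ b_2^- \frac{\delta}{h_2} \right] + \delta \frac{a_c^2}{2} \Bigg\}, \end{aligned} \tag{64}$$

and in addition the terminal condition,

$$V(z, t_f) = s Z^2 \tag{65}$$

where from Eq. (23) recall that z1 = Z. Eq. (64) can be solved by backwards recursion starting from the terminal time. It is also the recursion equation for the cost to go function of a Markov process on the discrete grid with the spatial discretization of sizes h1 and h2, with transition probabilities

$$\Pr(z \mid z) = \left[ 1 - \frac{\delta \sigma'^2 R}{h_1^2} - \frac{\delta}{h_1} |b'_1| - \frac{\delta}{h_2} |b'_2| \right] \tag{66}$$

$$\Pr(z + h_1 e_1 \mid z) = \left[ b_1^+ \frac{\delta}{h_1} + \frac{\sigma'^2 R \delta}{2 h_1^2} \right] \tag{67}$$

$$\Pr(z - h_1 e_1 \mid z) = \left[ b_1^- \frac{\delta}{h_1} + \frac{\sigma'^2 R \delta}{2 h_1^2} \right] \tag{68}$$

$$\Pr(z + h_2 e_2 \mid z) = \left[ b_2^+ \frac{\delta}{h_2} \right] \tag{69}$$

$$\Pr(z - h_2 e_2 \mid z) = \left[ b_2^- \frac{\delta}{h_2} \right] \tag{70}$$

and cost per step of δ ac²/2. For this interpretation to be valid the probabilities defined in Eqs. (66)–(70) should be valid transition probabilities. The probabilities do sum to 1, but we must also ensure that they are all non-negative. For this it is sufficient to require that,

$$\Pr(z \mid z) = \left[ 1 - \frac{\delta \sigma^2}{h_1^2} - \frac{\delta}{h_1} |b_1| - \frac{\delta}{h_2} |b_2| \right] \ge 0 \tag{71}$$

Eq. (71) constrains the maximum time step δ that we may use in iterating the recursion (64). More explicitly,

$$\delta \le \frac{1}{\dfrac{\sigma^2}{h_1^2} + \dfrac{N_1}{h_1} + \dfrac{N_2}{h_2}}, \tag{72}$$

where N1 and N2 are such that

$$|b_1| \le N_1 \tag{73}$$

and

$$|b_2| \le N_2. \tag{74}$$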
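A minimal sketch of the transition probabilities read off from the coefficients of recursion (64), together with a time step bound in the spirit of Eq. (72), is given below. The function names and the dictionary keys are hypothetical. The argument `sigma_sq_r` stands for the noise intensity multiplying the second difference in Eq. (63) (σ′²R there); using the same quantity in the step bound is an assumption of this sketch.

```python
def transition_probabilities(b1, b2, sigma_sq_r, delta, h1, h2):
    """One-step transition probabilities of the approximating Markov chain."""
    b1_plus, b1_minus = max(b1, 0.0), max(-b1, 0.0)
    b2_plus, b2_minus = max(b2, 0.0), max(-b2, 0.0)
    diffusion = sigma_sq_r * delta / (2.0 * h1 ** 2)
    return {
        "stay": 1.0 - 2.0 * diffusion - delta * (abs(b1) / h1 + abs(b2) / h2),
        "z1_up": b1_plus * delta / h1 + diffusion,     # move to z + h1 e1
        "z1_down": b1_minus * delta / h1 + diffusion,  # move to z - h1 e1
        "z2_up": b2_plus * delta / h2,                 # move to z + h2 e2
        "z2_down": b2_minus * delta / h2,              # move to z - h2 e2
    }


def max_time_step(sigma_sq_r, n1, n2, h1, h2):
    """Largest delta keeping the 'stay' probability non-negative,
    given bounds |b1| <= n1 and |b2| <= n2."""
    return 1.0 / (sigma_sq_r / h1 ** 2 + n1 / h1 + n2 / h2)
```

For any step below the bound the five probabilities are non-negative and sum to one, and the mean one-step displacement in each direction equals the drift times δ, which is the first local consistency property.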

Let ζk be the discrete time and discrete space Markov process corresponding to the transition probabilities of Eqs. (66)–(70). Then the proof of convergence of ζk to the continuous time process (38) and the convergence of the cost to the continuous time cost of Eq. (33) as δ → 0 and h → 0 simultaneously requires the local consistency condition (Ref. 14),

$$E\{\zeta_{k+1} - \zeta_k \mid \zeta_k\} = b\,\delta + o(\delta) \tag{75}$$

and

$$\mathrm{Cov}\left\{ (\zeta_{k+1} - \zeta_k)^2 \mid \zeta_k \right\} = M\,\delta + o(\delta) \tag{76}$$

The two local consistency conditions are satisfied by the Markov transition probabilities (66)–(70) by virtue of their definitions. The proof of convergence of the Markov chain to the continuous time process and the proof of convergence of the cost to the continuous time cost may be found in Ref. 14.
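To make the backward recursion concrete, the sketch below applies it to the scalar special case in which only Z matters (first order linear missile with a bounded command, drift −ac τM ψ(tgo/τM) as in Eq. (30)). It is a simplified illustration and not the solver used in the paper: the noise intensity is held constant, the grid edges are simply clamped instead of using Eqs. (54) and (55), the control is restricted to a small finite set, and the values of `noise` and `s` are placeholders. All names are hypothetical.

```python
import math


def psi(xi):
    """Eq. (25)."""
    return math.exp(-xi) + xi - 1.0


def solve_scalar_case(z_max=10.0, n_z=41, t_f=0.5, a_max=500.0, tau_m=0.1,
                      noise=20.0, s=1.0e4, n_controls=11):
    """Backward recursion on a Z grid; returns (z_grid, value, policy) at t = 0."""
    h = 2.0 * z_max / (n_z - 1)
    z_grid = [-z_max + i * h for i in range(n_z)]
    controls = [-a_max + 2.0 * a_max * j / (n_controls - 1)
                for j in range(n_controls)]
    # Time step chosen below the bound that keeps all probabilities >= 0.
    drift_bound = a_max * tau_m * psi(t_f / tau_m)
    delta_max = 1.0 / (noise / h ** 2 + drift_bound / h)
    n_t = int(math.ceil(t_f / (0.9 * delta_max)))
    delta = t_f / n_t
    diffusion = noise * delta / (2.0 * h ** 2)

    value = [s * z * z for z in z_grid]  # terminal condition, Eq. (65)
    policy = [0.0] * n_z
    for k in range(1, n_t + 1):          # k * delta is the time-to-go
        gain = tau_m * psi(k * delta / tau_m)
        new_value = []
        for i in range(n_z):
            v_up = value[min(i + 1, n_z - 1)]   # clamped at the upper edge
            v_down = value[max(i - 1, 0)]       # clamped at the lower edge
            best_cost, best_a = None, 0.0
            for a_c in controls:
                b = -a_c * gain
                p_up = max(b, 0.0) * delta / h + diffusion
                p_down = max(-b, 0.0) * delta / h + diffusion
                cost = ((1.0 - p_up - p_down) * value[i] + p_up * v_up
                        + p_down * v_down + delta * a_c ** 2 / 2.0)
                if best_cost is None or cost < best_cost:
                    best_cost, best_a = cost, a_c
            new_value.append(best_cost)
            policy[i] = best_a
        value = new_value
    return z_grid, value, policy
```

Because the problem data are symmetric in Z, the computed cost to go is symmetric about Z = 0, and it never exceeds the largest terminal cost on the grid since the zero command is always admissible.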

It was found that keeping the discretization constant throughout the solution interval was simply not practical. Typically in the application of these methods to guidance problems, the noise intensity σ′²R in (42) decreases as the missile approaches the target. A smaller value of σ′²R implies that larger time steps may be taken, implying that if the spatial discretization were constant the permissible time step would decrease as the solution point retreated from the terminal time. This caused the permissible time steps to decrease to unreasonably small values far from the terminal time. Further, it is expected that near the terminal time the guidance gain increases and all the variability in the guidance law is confined to a small region around Z = 0, which in turn would require a fine spatial discretization to observe the details of the solution. In the xM direction the discretization was kept constant. In the Z direction the discretization was finest near the terminal time, and the discretization was gradually coarsened the further the solution was from the terminal time, making it possible to keep the length of the time step approximately constant throughout the solution interval. In this way the size of the spatial discretization was in accord with the size of the features expected in the solution.
