An Overview of Risk-Sensitive Stochastic Optimal Control

Matt James

Australian National University

Workshop:

Stochastic control, communications and numerics perspective

UniSA, Adelaide, Sept. 2004.


Outline

• A brief history

• Approaches to controller design

• Relationships among approaches

• Some limit results (state feedback)

• Robust interpretation

• Partially observed case

• Quantum systems

• References


A brief history

The risk-sensitive criterion was introduced by Jacobson (1973).

[Linear-Exponential-Quadratic-Gaussian (LEQG)]

Linear system with Gaussian noise:

dx = (Ax + Bu)dt + dw

Minimize average of exponential of quadratic cost:

$$J^\mu = E_{x,t}\left[ \exp \mu \left( \int_t^T \tfrac{1}{2}\left[ x'Qx + |u|^2 \right] ds + \tfrac{1}{2} x_T' R x_T \right) \right]$$

µ > 0 - risk-sensitive

µ < 0 - risk-seeking


Solution

Optimal state feedback control is

$$u^*(x, t) = -B' P^\mu_t x$$

where, for $0 \le t \le T$,

$$\dot{P}^\mu_t = -P^\mu_t A - A' P^\mu_t + P^\mu_t (BB' - \mu I) P^\mu_t - Q, \qquad P^\mu_T = R$$

[control type Riccati equation]
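The Riccati equation above can be explored numerically. The following is a minimal sketch, assuming a scalar system (n = m = 1) so that the equation reads dP/dt = -2aP + (b^2 - mu)P^2 - q with P(T) = r. The constants a, b, q, r, the horizon T, and the function names are illustrative assumptions, not taken from the slides.

```python
def riccati_rhs(p, a, b, q, mu):
    """Right-hand side of the scalar Riccati ODE dP/dt = -2aP + (b^2 - mu) P^2 - q."""
    return -2.0 * a * p + (b * b - mu) * p * p - q


def solve_riccati(a, b, q, r, mu, T=1.0, steps=2000):
    """Integrate backward from P(T) = r to t = 0 with classical RK4; returns P(0)."""
    h = -T / steps  # negative step: march from t = T down to t = 0
    p = r
    for _ in range(steps):
        k1 = riccati_rhs(p, a, b, q, mu)
        k2 = riccati_rhs(p + 0.5 * h * k1, a, b, q, mu)
        k3 = riccati_rhs(p + 0.5 * h * k2, a, b, q, mu)
        k4 = riccati_rhs(p + h * k3, a, b, q, mu)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def feedback_gain(a, b, q, r, mu, T=1.0):
    """Gain k in u*(x, 0) = -k x, i.e. k = b * P(0)."""
    return b * solve_riccati(a, b, q, r, mu, T)


if __name__ == "__main__":
    # Illustrative (assumed) data: a = 0.5, b = 1, q = 1, r = 1.
    for mu in (0.0, 0.1, 0.5):
        print(f"mu = {mu:4.2f}  gain at t = 0: {feedback_gain(0.5, 1.0, 1.0, 1.0, mu):.6f}")
```

With mu = 0 the routine integrates the standard LQG Riccati equation, so comparing gains across mu shows how the risk-sensitive controller departs from the risk-neutral one. For larger mu the quadratic coefficient b^2 - mu changes sign and the solution may blow up before t = 0, which this sketch does not guard against.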


Limit µ → 0: recover standard LQG control (risk-neutral).

[Linear-Quadratic-Gaussian (LQG)]



Minimize average of quadratic cost:

$$E_{x,t}\left[ \int_t^T \tfrac{1}{2}\left[ x'Qx + |u|^2 \right] ds + \tfrac{1}{2} x_T' R x_T \right]$$

Solution: Optimal state feedback control is

$$u^*(x, t) = -B' P_t x$$

where

$$\dot{P}_t = -P_t A - A' P_t + P_t BB' P_t - Q, \qquad P_T = R$$


Partially observed case



dy = Cxdt + dv

Minimize average of exponential of quadratic cost

$$J^\mu = E_{x,t}\left[ \exp \mu \left( \int_t^T \tfrac{1}{2}\left[ x'Qx + |u|^2 \right] ds + \tfrac{1}{2} x_T' R x_T \right) \right]$$

over causal maps $y(\cdot) \mapsto u(\cdot)$.


Solution: Whittle 1981, Bensoussan-van Schuppen 1985.

Optimal partially observed control is

$$u^*_t = -B' P^\mu_t x^\mu_t$$

where

$$dx^\mu = \left( [A + \mu Y^\mu Q] x^\mu + Bu \right) dt + Y^\mu C' (dy - C x^\mu \, dt)$$

and

$$\dot{Y}^\mu = A Y^\mu + Y^\mu A' + \mu Y^\mu Q Y^\mu + I - Y^\mu C' C Y^\mu$$

Significantly, this is not the Kalman filter.

When µ → 0, LEQG → LQG, whose solution is expressed in terms of

the Kalman filter.


Some subsequent developments:

• LEQG. Connection with dynamic games. (Jacobson, 1973)

• LEQG. Connection with robust control (H∞). (Glover-Doyle, 1988)

• Risk-sensitive control for nonlinear systems, state feedback.

Connection with dynamic games. (Fleming-McEneaney 1992, James,

1992, Nagai 1996)

• Solution of risk-sensitive control for nonlinear systems, output

feedback. Solution and connection with dynamic games and nonlinear

robust control. (James-Baras-Elliott 1994, Charalambous-Hibey 1996,

Dai Pra-Meneghini-Runggaldier 1996)

• Robustness properties of risk-sensitive criterion. Also stochastic IQC.

(James, Fleming, Boel, Dupuis, Petersen, 1995, 1998)

• Risk-sensitive control for finance. (Nagai 2000)

• Risk-sensitive control for quantum systems. (James 2004)


Approaches to controller design

The basic aim of robust control is:

Design a controller to achieve “satisfactory performance” in

the presence of disturbances and uncertainty.

Stochastic control has similar aims.


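H∞ robust control

Dynamics

$$\dot{x}_s = f(x_s) + g(x_s) d_s, \quad 0 < s < T, \qquad z_s = h(x_s)$$

$$f(0) = 0, \quad h(0) = 0, \quad x_0 = 0$$

Disturbance-to-output map

$$T_{d,z} : d(\cdot) \mapsto z(\cdot)$$

H∞ norm

$$\| T_{d,z} \|_\infty \triangleq \sup_{d \in L_2[0,T]} \frac{\| z \|_2}{\| d \|_2}$$

Note:

$$\| T_{d,z} \|_\infty \le \gamma \quad \text{iff} \quad \int_0^T |z(t)|^2 \, dt \le \int_0^T \gamma^2 |d(t)|^2 \, dt \quad \forall d \in L_2[0,T]$$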


D-H2 deterministic optimal control

disturbance: none

controller: gives optimal performance without

special consideration of disturbances.

Optimal cost

$$W(x, t) = \inf_u \left[ \int_t^T L(x_s, u_s) \, ds + \Phi(x_T) \right]$$

Dynamics

$$\dot{x}_s = b(x_s, u_s), \quad t < s < T, \qquad x_t = x$$


S-H2 (RN) stochastic optimal control

disturbance: noise

controller: gives optimal average

performance

Optimal cost

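$$W^\varepsilon(x, t) = \inf_u E_{x,t}\left[ \int_t^T L(x^\varepsilon_s, u_s) \, ds + \Phi(x^\varepsilon_T) \right]$$

Dynamics

$$dx^\varepsilon_s = b(x^\varepsilon_s, u_s) \, ds + \sqrt{\varepsilon} \, dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$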

(ε > 0 - noise intensity)


RS stochastic risk-sensitive optimal control

disturbance: noise

controller: gives optimal average performance

using exponential cost

(heavily penalizes large values)

Optimal cost

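$$S^{\mu,\varepsilon}(x, t) = \inf_u E_{x,t}\left[ \exp \frac{\mu}{\varepsilon} \left( \int_t^T L(x^\varepsilon_s, u_s) \, ds + \Phi(x^\varepsilon_T) \right) \right]$$

Dynamics

$$dx^\varepsilon_s = b(x^\varepsilon_s, u_s) \, ds + \sqrt{\varepsilon} \, dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$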

(µ > 0 - risk sensitivity)
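To see how the exponential cost penalizes large values, one can compare the certainty equivalent (1/mu) log E[exp(mu C)] of a random cost C with its mean. For small mu the standard cumulant expansion gives E[C] + (mu/2) Var[C] + O(mu^2). The sketch below checks this on a small discrete cost distribution; the cost values, probabilities, and function names are made up for illustration and do not come from the slides.

```python
import math


def certainty_equivalent(costs, probs, mu):
    """(1/mu) * log E[exp(mu * C)] for a discrete cost distribution."""
    shift = max(mu * c for c in costs)  # log-sum-exp shift for numerical stability
    total = sum(p * math.exp(mu * c - shift) for c, p in zip(costs, probs))
    return (shift + math.log(total)) / mu


def mean_and_variance(costs, probs):
    """Mean and variance of the same discrete distribution."""
    mean = sum(p * c for c, p in zip(costs, probs))
    var = sum(p * (c - mean) ** 2 for c, p in zip(costs, probs))
    return mean, var


if __name__ == "__main__":
    costs = [1.0, 2.0, 10.0]   # assumed outcomes; the last one is a rare large cost
    probs = [0.6, 0.3, 0.1]
    mean, var = mean_and_variance(costs, probs)
    for mu in (1.0, 0.1, 0.01):
        ce = certainty_equivalent(costs, probs, mu)
        approx = mean + 0.5 * mu * var
        print(f"mu = {mu:5.2f}  certainty equivalent = {ce:.4f}  mean + (mu/2) var = {approx:.4f}")
```

As mu decreases the two columns agree more closely, which is the sense in which the risk-neutral criterion is recovered; for larger mu the certainty equivalent moves toward the worst outcome.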


H∞ H∞ robust control

disturbance: L2 signal

controller: achieves specified H∞

norm constraint

(worst-case design)

Find u(·) such that

$$\| T_{d,z} \|_\infty \le \gamma$$

Dynamics

$$\dot{x}_s = b(x_s, u_s) + d_s, \quad t < s < T, \qquad x_t = x, \qquad z_s = h(x_s)$$

($\gamma = 1/\sqrt{\mu}$ - H∞ constraint)


D-DG deterministic differential game

disturbance: L2 signal

controller: gives optimal performance subject

to opposing efforts of disturbance

(worst-case design)

Optimal cost

$$W^\mu(x, t) = \sup_d \inf_u \left[ \int_t^T \left( L(x_s, u_s) - \frac{1}{2\mu} |d_s|^2 \right) ds + \Phi(x_T) \right]$$

Dynamics

$$\dot{x}_s = b(x_s, u_s) + d_s, \quad t < s < T, \qquad x_t = x$$


S-DG stochastic differential game

disturbances: noise and L2 signal

controller: gives optimal average performance

subject to opposing efforts of

disturbance (worst-case design)

Optimal cost

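$$W^{\mu,\varepsilon}(x, t) = \sup_d \inf_u E_{x,t}\left[ \int_t^T \left( L(x^\varepsilon_s, u_s) - \frac{1}{2\mu} |d_s|^2 \right) ds + \Phi(x^\varepsilon_T) \right]$$

Dynamics

$$dx^\varepsilon_s = \left( b(x^\varepsilon_s, u_s) + d_s \right) ds + \sqrt{\varepsilon} \, dB_s, \quad t < s < T, \qquad x^\varepsilon_t = x$$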


Relationships among approaches

[Diagram: RS → S-H2 as µ → 0; RS → D-DG as ε → 0; S-H2 → D-H2 as ε → 0; D-DG → D-H2 as µ → 0.]

RS, S-DG, D-DG, and S-H2 are “perturbations” of D-H2.


(James, Fleming-McEneaney, Whittle)

RS ≡ S-DG ≡ D-H2 + (noise variance term) + (disturbance energy term)

Equivalence via

• logarithmic transformation,

• dynamic programming PDEs, and

• representation theorem

Expansion valid for

• small noise intensity and

• small risk-sensitivity


S-H2 ≡ D-H2 + (noise variance term)

H∞ ≡ D-H2 + (disturbance energy term)

RS ≡ H2 + H∞ ???
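A worked scalar linear-quadratic example may make these decompositions concrete. It is only a formal sketch: the data $b(x,u) = ax + bu$, $L(x,u) = \tfrac12(qx^2 + u^2)$, $\Phi(x) = \tfrac12 r x^2$ with constants $a, b, q, r$ are assumptions introduced here, and they do not satisfy the boundedness and compactness assumptions used for the limit results. Here $W^{\mu,\varepsilon} = \tfrac{\varepsilon}{\mu}\log S^{\mu,\varepsilon}$.

```latex
% Scalar LQ data (illustrative assumptions):
%   b(x,u) = a x + b u,  L(x,u) = \tfrac12 (q x^2 + u^2),  \Phi(x) = \tfrac12 r x^2.
\begin{align*}
\text{D-H2:}\quad & W(x,t) = \tfrac12 P_t x^2, \\
\text{S-H2:}\quad & W^{\varepsilon}(x,t) = \tfrac12 P_t x^2
      + \frac{\varepsilon}{2}\int_t^T P_s\,ds, \\
\text{RS:}\quad & W^{\mu,\varepsilon}(x,t) = \tfrac12 P^{\mu}_t x^2
      + \frac{\varepsilon}{2}\int_t^T P^{\mu}_s\,ds, \\
& \dot P^{\mu}_t = -2aP^{\mu}_t + (b^2 - \mu)\,(P^{\mu}_t)^2 - q,
      \qquad P^{\mu}_T = r, \qquad P_t = P^{0}_t .
\end{align*}
```

In this example the noise intensity $\varepsilon$ enters only through the additive integral term, while the risk parameter $\mu$ enters through the Riccati equation, in the same way as a disturbance with energy penalty $\tfrac{1}{2\mu}|d|^2$ would.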


Some limit results (state feedback)

Assumptions

• $b(x, u) = f(x) + g(x)u$, where $f : \mathbb{R}^n \to \mathbb{R}^n$, $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ and their first order derivatives $Df$, $Dg$ are bounded and uniformly Lipschitz continuous.

• Control values: U ⊂ Rm is compact and convex.

• Disturbance values: D = Rn.

• Φ : Rn → R is non–negative; Φ and DΦ are bounded and

uniformly Lipschitz continuous.


• $L : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ is $C^2$ and non-negative; $L(\cdot, u)$ and $D_x L(\cdot, u)$ are bounded and uniformly Lipschitz continuous, uniformly in $u \in U$; and $D_u^2 L(x, \cdot) > 0$, uniformly in $x \in \mathbb{R}^n$.

• There exists a locally Lipschitz $U^*(x, p)$ achieving the minimum in

$$\min_{u \in U} \left[ p \cdot b(x, u) + L(x, u) \right]$$

• Use Elliott–Kalton formulation for two–player zero–sum

differential games (strategies, upper and lower values, etc)

• For later use, assume also

– $L(x, u) = \tfrac{1}{2}|u|^2 + V(x)$, and the functions $f$, $g$, $V$ and $\Phi$ are of class $C^\infty$ and have compact support.

– U∗(x, p) is of class C2.


Viscosity Solutions

Viscosity solutions were introduced by Crandall-Lions (1983).

The definition for a fully nonlinear parabolic PDE

$$\frac{\partial v}{\partial t} + F(x, Dv, D^2 v) = 0 \ \text{ in } \mathbb{R}^n \times (0, T), \qquad v(x, T) = \Phi(x) \ \text{ in } \mathbb{R}^n$$

is as follows.


Definition

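An upper semicontinuous (u.s.c.) function $\overline{v}$ (resp. l.s.c. function $\underline{v}$) is called a viscosity subsolution (resp. supersolution) if

$$\overline{v} \le \Phi \ \ (\text{resp. } \underline{v} \ge \Phi) \ \text{ on } \mathbb{R}^n \times \{T\}$$

and

$$\frac{\partial}{\partial t}\varphi(x, t) + F(x, D\varphi(x, t), D^2\varphi(x, t)) \ge 0 \ \ (\text{resp. } \le 0)$$

for every smooth function $\varphi$ and any local maximum (resp. minimum) $(x, t) \in \mathbb{R}^n \times [0, T)$ of $\overline{v} - \varphi$ (resp. $\underline{v} - \varphi$).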

A continuous function is called a viscosity solution if it is both a

subsolution and a supersolution.


Comparison Theorem

If $\overline{v}$ is a subsolution and if $\underline{v}$ is a supersolution, then

$$\overline{v} \le \underline{v} \ \text{ in } \mathbb{R}^n \times [0, T].$$

Thus any continuous viscosity solution is unique.


Dynamic Programming PDEs

RS (ε > 0, µ > 0)

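$$\frac{\partial S^{\mu,\varepsilon}}{\partial t} + \frac{\varepsilon}{2} \Delta S^{\mu,\varepsilon} + \min_{u \in U} \left[ DS^{\mu,\varepsilon} \cdot b(x, u) + \frac{\mu}{\varepsilon} L(x, u) S^{\mu,\varepsilon} \right] = 0 \ \text{ in } \mathbb{R}^n \times (0, T)$$

$$S^{\mu,\varepsilon}(x, T) = \exp\left( \frac{\mu}{\varepsilon} \Phi(x) \right) \ \text{ in } \mathbb{R}^n$$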

Logarithmic transformation: (Fleming, 1970’s)

$$W^{\mu,\varepsilon}(x, t) = \frac{\varepsilon}{\mu} \log S^{\mu,\varepsilon}(x, t)$$
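The effect of the logarithmic transformation on the RS equation can be checked by direct substitution. The following short derivation is a formal sketch that assumes $S^{\mu,\varepsilon}$ is smooth and strictly positive; superscripts are dropped for readability.

```latex
% Substitute S = \exp(\tfrac{\mu}{\varepsilon} W):
\begin{align*}
\partial_t S &= \tfrac{\mu}{\varepsilon}\, S\,\partial_t W, \qquad
DS = \tfrac{\mu}{\varepsilon}\, S\, DW, \qquad
\Delta S = \tfrac{\mu}{\varepsilon}\, S\,\Delta W
         + \tfrac{\mu^2}{\varepsilon^2}\, S\,|DW|^2 .
\end{align*}
% Insert into the RS equation and divide by \tfrac{\mu}{\varepsilon} S > 0:
\begin{align*}
\partial_t W + \tfrac{\varepsilon}{2}\,\Delta W + \tfrac{\mu}{2}\,|DW|^2
  + \min_{u \in U}\bigl[\, DW \cdot b(x,u) + L(x,u) \,\bigr] &= 0
  \quad \text{in } \mathbb{R}^n \times (0,T), \\
W(x,T) &= \Phi(x) \quad \text{in } \mathbb{R}^n .
\end{align*}
```

The quadratic term $\tfrac{\mu}{2}|DW|^2$ produced by the Laplacian is what a maximizing disturbance with penalty $\tfrac{1}{2\mu}|d|^2$ contributes, which is the link to the stochastic differential game.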


S-DG (ε > 0, µ > 0)

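$$\frac{\partial W^{\mu,\varepsilon}}{\partial t} + \frac{\varepsilon}{2} \Delta W^{\mu,\varepsilon} + \min_{u \in U} \max_{d \in D} \left[ DW^{\mu,\varepsilon} \cdot (b(x, u) + d) + L(x, u) - \frac{1}{2\mu} |d|^2 \right] = 0 \ \text{ in } \mathbb{R}^n \times (0, T)$$

$$W^{\mu,\varepsilon}(x, T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$

Note:

$$\min_{u \in U} \max_{d \in D} \left[ p \cdot (b(x, u) + d) + L(x, u) - \frac{1}{2\mu} |d|^2 \right] = \min_{u \in U} \left[ p \cdot b(x, u) + L(x, u) \right] + \frac{\mu}{2} |p|^2$$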
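The inner maximization over the disturbance in the note above can be done by completing the square: the maximizer of p·d − |d|²/(2µ) is d* = µp, with value (µ/2)|p|². A small numerical check, assuming a scalar disturbance and a brute-force grid search (the grid bounds and function names are illustrative choices):

```python
def inner_max_numeric(p, mu, d_lo=-50.0, d_hi=50.0, n=20001):
    """Grid-search the maximum over scalar d of p*d - d^2 / (2*mu)."""
    step = (d_hi - d_lo) / (n - 1)
    best = -float("inf")
    for i in range(n):
        d = d_lo + i * step
        value = p * d - d * d / (2.0 * mu)
        if value > best:
            best = value
    return best


def inner_max_closed_form(p, mu):
    """Completing the square: maximizer d* = mu * p, maximum value mu * p^2 / 2."""
    return 0.5 * mu * p * p


if __name__ == "__main__":
    for p, mu in ((3.0, 0.5), (-2.0, 2.0), (0.7, 1.0)):
        numeric = inner_max_numeric(p, mu)
        exact = inner_max_closed_form(p, mu)
        print(f"p = {p:5.2f}, mu = {mu:4.2f}: grid max = {numeric:.6f}, closed form = {exact:.6f}")
```

The grid search is only meant as a sanity check; it is accurate as long as the maximizer µp lies inside the search interval.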


Representation theorem

Viscosity formulation of stochastic differential games developed by

• Fleming–Souganidis (1989).

Theorem (James, 1992; Fleming-McEneaney 1992) The function

Wµ,ε defined by the logarithmic transformation is the optimal cost

function of the stochastic differential game defined above.

i.e. RS ≡ S-DG

Proof : • Methods of Fleming (1971) imply

|DWµ,ε(x, t)| ≤ C∗ for all (x, t) ∈ Rn × [0, T ].

• In view of this bound, the set of disturbance values D can be

taken as bounded.

• The result now follows from Fleming–Souganidis (1989). □


Limit PDEs

Obtained as ε ↓ 0 and/or µ ↓ 0.

D-DG (ε ↓ 0, µ > 0)

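$$\frac{\partial W^\mu}{\partial t} + \min_{u \in U} \max_{d \in D} \left[ DW^\mu \cdot (b(x, u) + d) + L(x, u) - \frac{1}{2\mu} |d|^2 \right] = 0 \ \text{ in } \mathbb{R}^n \times (0, T)$$

$$W^\mu(x, T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$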


S-H2 (ε > 0, µ ↓ 0)

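$$\frac{\partial W^\varepsilon}{\partial t} + \frac{\varepsilon}{2} \Delta W^\varepsilon + \min_{u \in U} \left[ DW^\varepsilon \cdot b(x, u) + L(x, u) \right] = 0 \ \text{ in } \mathbb{R}^n \times (0, T)$$

$$W^\varepsilon(x, T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$

D-H2 (ε ↓ 0, µ ↓ 0)

$$\frac{\partial W}{\partial t} + \min_{u \in U} \left[ DW \cdot b(x, u) + L(x, u) \right] = 0 \ \text{ in } \mathbb{R}^n \times (0, T)$$

$$W(x, T) = \Phi(x) \ \text{ in } \mathbb{R}^n.$$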


Limit theorems

Limit theorems depend on general viscosity limit techniques

developed by

• Evans–Ishii (1984)

• Barles–Perthame (1987)

Theorem (James, 1992) We have

$$\lim_{\varepsilon \downarrow 0} W^{\mu,\varepsilon}(x, t) = W^\mu(x, t)$$

uniformly on compact subsets, where $W^\mu \in C(\mathbb{R}^n \times [0, T])$ is the unique bounded continuous viscosity solution of the corresponding PDE and is the optimal cost function of the deterministic differential game defined above.

i.e. S-DG −→ D-DG as ε ↓ 0.


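Proof: • Assumptions imply

$$0 \le W^{\mu,\varepsilon}(x, t) \le C \quad \text{for all } (x, t) \in \mathbb{R}^n \times [0, T], \ \mu, \varepsilon > 0$$

• The function

$$\overline{v}(x, t) = \limsup_{\varepsilon \downarrow 0, \ y \to x, \ s \to t} W^{\mu,\varepsilon}(y, s)$$

is u.s.c. and a viscosity subsolution.

• The function

$$\underline{v}(x, t) = \liminf_{\varepsilon \downarrow 0, \ y \to x, \ s \to t} W^{\mu,\varepsilon}(y, s)$$

is l.s.c. and a viscosity supersolution.

• By construction

$$\underline{v} \le \overline{v}$$

• By the comparison theorem

$$\overline{v} \le \underline{v}$$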


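• Therefore

$$\lim_{\varepsilon \downarrow 0} W^{\mu,\varepsilon}(x, t) = \overline{v} = \underline{v} \triangleq v$$

• From Evans–Souganidis (1984) the value $W^\mu$ of the D-DG is the unique viscosity solution, and hence

$$W^\mu = v. \qquad \Box$$

Expansion of optimal cost for S-DG

$$W^{a\varepsilon, b\varepsilon}(x, t) = W(x, t) + \varepsilon \, (W_g(x, t), W_n(x, t)) \cdot \begin{pmatrix} a \\ b \end{pmatrix} + \dots$$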

[methods of Fleming, 1970’s]


Robust interpretation

(Boel, Dupuis, James, Petersen 1998, 2000)

We wish to interpret the finiteness of the risk-sensitive criterion for a

nominal system G0 in terms of a performance bound for a perturbed

system G.

[Diagram: nominal system G0 with input w and output Z.]


Nominal system dynamics:

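$$G_0 : \quad dX_t = b(X_t) \, dt + \varepsilon^{1/2} \sigma(X_t) \, dw_t, \qquad Z_t = h(X_t)$$

Risk-sensitive criterion:

$$S(x, T) \doteq E_x \left[ \exp\left( \frac{1}{2\gamma^2 \varepsilon} \int_0^T |Z_t|^2 \, dt \right) \right], \qquad \gamma^2 = 1/\mu$$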


Key tool is the representation [Dupuis-Ellis 1997]

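$$\gamma^2 \varepsilon \log S(x, T) = \sup_{v \in \mathcal{V}_T} E_x \left[ \frac{1}{2} \int_0^T \left( |Z_t|^2 - \gamma^2 |v_t|^2 \right) dt \right]$$

where $X_t$ is given by perturbed system dynamics:

$$G : \quad dX_t = b(X_t) \, dt + \sigma(X_t) v_t \, dt + \varepsilon^{1/2} \sigma(X_t) \, dw_t, \qquad Z_t = h(X_t)$$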


[Diagram: perturbed system G with input w and output Z; a perturbation block takes X and returns v to the system.]

robustness inequality

$$E_x \left[ \frac{1}{2} \int_0^T |Z_t|^2 \, dt \right] \le \gamma^2 E_x \left[ \frac{1}{2} \int_0^T |v_t|^2 \, dt \right] + \gamma^2 \varepsilon \log S(x, T)$$

(analogous to H∞ inequality)
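The robustness inequality follows from the Dupuis-Ellis representation in one step, since a supremum dominates each of its arguments. A short sketch of that step, assuming the perturbation $v$ is admissible (i.e. belongs to $\mathcal{V}_T$) and the expectations are finite:

```latex
% For any admissible perturbation v \in \mathcal{V}_T:
\begin{align*}
E_x\Bigl[\tfrac12 \int_0^T \bigl( |Z_t|^2 - \gamma^2 |v_t|^2 \bigr)\,dt \Bigr]
  &\le \sup_{\tilde v \in \mathcal{V}_T}
     E_x\Bigl[\tfrac12 \int_0^T \bigl( |Z_t|^2 - \gamma^2 |\tilde v_t|^2 \bigr)\,dt \Bigr]
   = \gamma^2 \varepsilon \log S(x,T).
\end{align*}
% Moving the v-term to the right-hand side gives
\begin{align*}
E_x\Bigl[\tfrac12 \int_0^T |Z_t|^2\,dt \Bigr]
  &\le \gamma^2\, E_x\Bigl[\tfrac12 \int_0^T |v_t|^2\,dt \Bigr]
   + \gamma^2 \varepsilon \log S(x,T).
\end{align*}
```

So finiteness of the risk-sensitive criterion for the nominal system bounds the output energy of every perturbed system in terms of the perturbation energy plus a constant.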


Partially observed (output feedback) case

(James-Baras-Elliott, 1994)

Dynamics

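$$dx_t = b(x_t, u_t) \, dt + \sqrt{\varepsilon} \, dw_t$$

$$dy_t = h(x_t) \, dt + \sqrt{\varepsilon} \, dv_t$$

Objective ($\gamma^2 = 1/\mu$)

$$J(u) = E \left[ \exp \frac{\mu}{\varepsilon} \left( \int_0^T L(x_t, u_t) \, dt + \Phi(x_T) \right) \right]$$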


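Solution of RS

Information state $\sigma^{\mu,\varepsilon}_t(x)$ (not a conditional probability):

$$\langle \sigma^{\mu,\varepsilon}_t, \eta \rangle = E^0 \left[ \eta(x^\varepsilon_t) \, Z^\varepsilon_t \exp\left( \frac{\mu}{\varepsilon} \int_0^t L(x^\varepsilon_s, u_s) \, ds \right) \Big| \ \mathcal{Y}_t \right] \quad \text{for all } \eta$$

Dynamics (stochastic PDE):

$$d\sigma^{\mu,\varepsilon}_t = \left( A^{u_t *} + \frac{\mu}{\varepsilon} L^{u_t} \right) \sigma^{\mu,\varepsilon}_t \, dt + \frac{1}{\varepsilon} h \, \sigma^{\mu,\varepsilon}_t \, dy^\varepsilon_t, \qquad \sigma^{\mu,\varepsilon}_0 = \rho \ \text{ in } L^1(\mathbb{R}^n)$$

Representation:

$$J(u) = E^0 \left[ \left\langle \sigma^{\mu,\varepsilon}_T, \exp\left( \frac{\mu}{\varepsilon} \Phi \right) \right\rangle \right]$$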

Linear case: Bensoussan-van Schuppen


Dynamic programming

Value function

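$$S^{\mu,\varepsilon}(\sigma, t) = \inf_{u \in \mathcal{U}_{t,T}} E^0_{\sigma,t} \left[ \left\langle \sigma^{\mu,\varepsilon}_T, e^{\frac{\mu}{\varepsilon} \Phi} \right\rangle \right].$$

Dynamic programming equation (infinite dimensional, nonlinear, second order)

$$\frac{\partial}{\partial t} S^{\mu,\varepsilon} + \frac{1}{2\varepsilon} D^2 S^{\mu,\varepsilon}(h\sigma, h\sigma) + \inf_{u \in U} \left\{ DS^{\mu,\varepsilon} \cdot \left( A^{u *} \sigma + \frac{\mu}{\varepsilon} L^u \sigma \right) \right\} = 0$$

$$S^{\mu,\varepsilon}(\sigma, T) = \left\langle \sigma, e^{\frac{\mu}{\varepsilon} \Phi} \right\rangle$$

Optimal controller

$u^*_t(\sigma)$ achieves the minimum in the dynamic programming equation (information state feedback).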


Small Noise limit

Duality

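$$\langle \sigma, \nu \rangle \triangleq \int_{\mathbb{R}^n} \sigma(x) \nu(x) \, dx$$

max-plus

$$(p, q) \triangleq \sup_{x \in \mathbb{R}^n} \{ p(x) + q(x) \}$$

large deviations

$$\lim_{\varepsilon \to 0} \frac{\varepsilon}{\mu} \log \left\langle e^{\frac{\mu}{\varepsilon} p}, e^{\frac{\mu}{\varepsilon} q} \right\rangle = (p, q)$$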
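The large deviations limit above is a Laplace (Varadhan) type limit: the scaled logarithm of the integral pairing converges to the max-plus pairing. A minimal one-dimensional numerical sketch follows, using a Riemann sum on a truncated grid. The test functions p_example and q_example, the grid bounds, and the function names are illustrative assumptions, not taken from the slides.

```python
import math


def _grid(lo, hi, n):
    """Uniform grid on [lo, hi] with n points; returns the points and the spacing."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step


def scaled_log_pairing(p, q, eps, mu=1.0, lo=-5.0, hi=5.0, n=20001):
    """(eps/mu) * log of the integral of exp((mu/eps) * (p + q)), via a Riemann sum."""
    k = mu / eps
    xs, step = _grid(lo, hi, n)
    exponents = [k * (p(x) + q(x)) for x in xs]
    shift = max(exponents)  # log-sum-exp shift avoids overflow for small eps
    total = sum(math.exp(e - shift) for e in exponents) * step
    return (shift + math.log(total)) / k


def max_plus_pairing(p, q, lo=-5.0, hi=5.0, n=20001):
    """Grid approximation of sup over x of p(x) + q(x)."""
    xs, _ = _grid(lo, hi, n)
    return max(p(x) + q(x) for x in xs)


def p_example(x):
    return -(x - 1.0) ** 2


def q_example(x):
    return -0.5 * x * x


if __name__ == "__main__":
    target = max_plus_pairing(p_example, q_example)
    print(f"max-plus pairing (grid): {target:.6f}")
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        value = scaled_log_pairing(p_example, q_example, eps)
        print(f"eps = {eps:7.0e}  scaled log pairing = {value:.6f}")
```

For these quadratic test functions the supremum is attained at x = 2/3 with value -1/3, and the scaled log pairing approaches it as eps decreases.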


Information state limit

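$$\lim_{\varepsilon \to 0} \frac{\varepsilon}{\mu} \log \sigma^{\mu,\varepsilon}_t(x) = p^\mu_t(x),$$

Value function limit

$$\lim_{\varepsilon \to 0} \frac{\varepsilon}{\mu} \log S^{\mu,\varepsilon}\left( e^{\frac{\mu}{\varepsilon} p}, t \right) = W^\mu(p, t)$$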

Some consequences

• Can solve output feedback deterministic dynamic games (D-DG)

using analogous, though deterministic, information state methods

(cf. max-plus).

• Output feedback nonlinear H∞ control problem can be solved

using the D-DG methods.


Risk-sensitive control of quantum systems

(James, 2004 - current)

[Diagram: system (atom, Hilbert space H, operator X) coupled to a bath (EM field, Fock space Γ) through the field operators dB†, dB.]

System: e.g. atom, can be controlled, Hilbert space H

Bath: e.g. electromagnetic field, continuously monitored, Hilbert space Γ (Fock space)


Dynamics on H ⊗ Γ: [Hudson-Parthasarathy quantum stochastic DE, 1980’s]

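$$dU(t) = \left\{ -K(u(t)) \, dt + L \, dB^\dagger(t) - L^\dagger \, dB(t) \right\} U(t)$$

with initial condition $U(0) = I$, where

$$K(u) = \frac{i}{\hbar} H(u) + \frac{1}{2} L^\dagger L$$

System operators evolve according to

$$X(t) = j_t(u, X) = U^\dagger(t) (X \otimes I) U(t)$$

i.e.

$$dX(t) + \left( X(t)K(t) + K^\dagger(t)X(t) - L^\dagger(t)X(t)L(t) \right) dt = [X(t), L(t)] \, dB^\dagger(t) - [X(t), L^\dagger(t)] \, dB(t)$$

where

$$L(t) = j_t(u, L), \qquad K(t) = j_t(u, K(u(t)))$$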


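Measurements of real quadrature of field: $Q_t = B_t + B^\dagger_t$

$$dY(t) = (L(t) + L^\dagger(t)) \, dt + dQ(t)$$

(Y: output field; Q: input field)

Risk-sensitive cost: [James, 2004]

$$J^\mu = \left\langle \pi_0 \otimes vv^\dagger, \ R^\dagger(T) \, e^{\mu C_2(T)} R(T) \right\rangle$$

where [$\langle A, B \rangle = \mathrm{tr}(A^\dagger B)$]

$$\frac{dR(t)}{dt} = \frac{\mu}{2} C_1(t) R(t), \qquad C_1(t) = j_t(u, C_1(u(t))), \qquad C_2(t) = j_t(u, C_2)$$

$\pi_0$ = initial system state

$vv^\dagger$ = field vacuum state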


Stochastic representation:

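$$J^\mu = E^0 \left[ \langle \sigma^\mu_T, e^{\mu C_2} \rangle \right]$$

where the expectation $E^0$ is taken with respect to standard Wiener measure $P^0$, and the information state (an unnormalized density operator) satisfies

$$d\sigma^\mu_t + \left[ K\sigma^\mu_t + \sigma^\mu_t K^\dagger - L\sigma^\mu_t L^\dagger \right] dt = \frac{\mu}{2} \left[ C_1 \sigma^\mu_t + \sigma^\mu_t C_1 \right] dt + \left[ L\sigma^\mu_t + \sigma^\mu_t L^\dagger \right] dy(t)$$

[modified stochastic master equation]

$$\pi^\mu_t = \frac{\sigma^\mu_t}{\langle \sigma^\mu_t, 1 \rangle}$$

[normalized state, new to physics]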

When µ → 0 recover the Belavkin quantum filtering equation

(stochastic master equation) (1980’s) [unnormalized state σt,

normalized state πt = σt/〈σt, 1〉].


The dynamic programming equation is, formally,

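$$\frac{\partial}{\partial t} S^\mu(\sigma, t) + \inf_{u \in U} \mathcal{L}^{\mu;u} S^\mu(\sigma, t) = 0, \quad 0 \le t < T$$

$$S^\mu(\sigma, T) = \langle \sigma, e^{\mu C_2} \rangle$$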

Some related current work

• Discrete time (James 2004)

• Robustness interpretation, discrete time (James-Petersen 2004)

• Robustness, linear/quadratic case (Doherty, d’Helon, James,

Petersen, Wilson)


References

[1] T. Basar and P. Bernhard. H∞ Optimal Control and Related Minimax Design Problems. Birkhauser,

Boston, 1995.

[2] A. Bensoussan and J. H. van Schuppen. Optimal control of partially observable stochastic

systems with an exponential-of-integral performance index. SIAM Journal on Control and

Optimization, 23:599–613, 1985.

[3] P. Dupuis and R.S. Ellis. A Weak Convergence Approach to the Theory of Large Deviations. Wiley, New

York, 1997.

[4] W.H. Fleming and W.M. McEneaney. Risk-sensitive control on an infinite time horizon. SIAM J.

Control and Optimization, 33:1881–1915, 1995.

[5] K. Glover and J. Doyle. State space formulae for all stabilizing controllers that satisfy an H∞ norm bound and relations to risk-sensitivity. Systems and Control Letters, 11:167–172, 1988.

[6] J.W. Helton and M.R. James. Extending H∞ Control to Nonlinear Systems: Control of Nonlinear

Systems to Achieve Performance Objectives, volume 1 of Advances in Design and Control. SIAM,

Philadelphia, 1999.

[7] D.H. Jacobson. Optimal stochastic linear systems with exponential performance criteria and

their relation to deterministic differential games. IEEE Trans. Aut. Control, 18(2):124–131, 1973.

[8] M. R. James, J.S. Baras, and R.J. Elliott. Risk-sensitive control and dynamic games for

partially observed discrete-time nonlinear systems. IEEE Trans. Automatic Control, 39(4):780–792,

1994.

[9] M.R. James. Asymptotic analysis of nonlinear stochastic risk-sensitive control and differential

games. Mathematics of Control, Signals and Systems, 5(4):401–417, 1992.

[10] M.R. James. Risk-sensitive optimal control of quantum systems. Phys. Rev. A, 69:032108, 2004.


[11] P. Dai Pra, L. Meneghini, and W.J. Runggaldier. Connections between stochastic control and

dynamic games. Math. Control, Signals and Systems, 9(4):303–326, 1996.

[12] A.J. van der Schaft. L2-Gain and Passivity Techniques in Nonlinear Control. Springer Verlag, New

York, 1996.

[13] P. Whittle. Risk-sensitive linear quadratic Gaussian control. Advances in Applied Probability,

13:764–777, 1981.

[14] G. Zames. On the input-output stability of time-varying nonlinear feedback systems. part i:

Conditions derived using concepts of loop gain, conicity, and positivity. part ii: Conditions

involving circles in the frequency plane and sector nonlinearities. IEEE Trans. Aut. Control,

11:228–238, 465–476, 1966.

[15] G. Zames. Feedback and optimal sensitivity: Model reference transformation, multiplicative

seminorms and approximate inverses. IEEE Trans. Aut. Control, 26:301–320, 1981.

[16] C.D. Charalambous and J.L. Hibey, “Minimum principle for partially observable nonlinear

risk-sensitive control problems using measure-valued decompositions,” Stochastics and

Stochastics Reports, vol. 57, no. 3+4, pp. 247-288, August 1996.

[17] W.H. Fleming and M.R. James, The Risk-Sensitive Index and the H2 and H∞ Norms for Nonlinear Systems,

Mathematics of Control, Signals, and Systems, 8(3), 1995, 199-221.

[18] H. Nagai, Bellman Equation of Risk-Sensitive Control, SIAM J. Control Optim., 34(1), 1996, 74-101.

[19] M.G. Crandall, L.C. Evans and P.L. Lions, Some Properties of Viscosity Solutions of Hamilton–Jacobi

Equations, Trans. AMS, 282 (2) (1984) 487–502.

[20] M.G. Crandall, H. Ishii and P.L. Lions, User’s Guide to Viscosity Solutions of Second Order Partial Differential

Equations, CEREMADE Report No. 9039, (1990).

[21] P. Dupuis, M.R. James, and I.R. Petersen, Robust Properties of Risk-Sensitive Control, Math. Control,

Signals and Systems, 13, 318-332, 2000.

[22] M. Bardi and I. Capuzzo-Dolcetta, “Optimal Control and Viscosity Solutions of

Hamilton-Jacobi-Bellman Equations”, Birkhauser, Boston, 1997.


[23] W.H. Fleming and H.M. Soner, “Controlled Markov Processes and Viscosity Solutions”,

Springer, New York, 1993.
