Posted on 02-Jan-2022
transcript
Slide 1
Cheng Tao & Frank L. Lewis
Advanced Controls & Sensors Group
Automation & Robotics Research Institute (ARRI)
The University of Texas at Arlington
Nonlinear Network Structures for Optimal Control
Neural Network Solution for Fixed-Final-Time Optimal Control of Nonlinear Systems
ECC 07 Kos
Slide 3
[Figure: Neural Network Robot Controller. Block diagram with a feedforward loop, a PD tracking loop (gain Kv, filter [Λ I]), a nonlinear inner loop with the NN estimate f̂(x), a robust control term v(t), and the robot system; signals qd, q, e, r, τ]
Universal Approximation Property
Problem: Nonlinear in the NN weights, so that standard proof techniques do not work
Feedback linearization
• Easy to implement with a few more lines of code
• Learning feature allows for on-line updates to NN memory as dynamics change
• Handles unmodelled dynamics, disturbances, and actuator problems such as friction
• NN universal basis property means no regression matrix is needed
• Nonlinear controller allows faster & more precise motion
Slide 4
4 US Patents
Sponsored by NSF (Paul Werbos) and ARO (Randy Zachery)
Slide 5
Problem: Nonlinear in the NN weights, so that standard proof techniques do not work
New book by Jay Farrell and Marios Polycarpou: Adaptive Approximation Based Control
Slide 6
Cell Homeostasis: The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.
Cellular Metabolism
Permeability control of the cell membrane
http://www.accessexcellence.org/RC/VL/GG/index.html
Optimality in Biological Systems
Slide 7
ARRI Research Roadmap in Neural Networks
1. Neural Networks for Feedback Control – 1995-2002
Based on FB control approach; unknown system dynamics; on-line tuning; NN-FB linearization, singular perturbation, backstepping, force control, dynamic inversion, etc.
Extended adaptive control to NLIP systems. No regression matrix.
2. Neural Network Solution of Optimal Design Equations – 2002-2006
Nearly optimal control based on HJ optimal design equations; known system dynamics; preliminary off-line tuning.
Nearly optimal solution of controls design equations. No canonical form needed.
3. Approximate Dynamic Programming – 2006-
Nearly optimal control based on recursive equation for the optimal value; usually known system dynamics (except Q learning).
The goal: unknown dynamics, on-line tuning, optimal adaptive control.
Extend adaptive control to yield OPTIMAL controllers. No canonical form needed.
Slide 8
Objective and Significance
Provide a tool to solve finite-horizon continuous-time optimal control problems for nonlinear systems.
Continuous-time finite-horizon optimal control problems appear in applications where model predictive control (receding-horizon control) is used.
Slide 9
Outline:
1. Fixed-Final Time Optimal Control of Nonlinear Systems Using Neural Network HJB Approach
2. Neural Network Solution for Finite-Final Time H-Infinity State Feedback Control
3. Neural Network Solution for Fixed-Final time Constrained Optimal Control
• This research was supported by NSF grant ECS-0140490 and ARO grant DAAD 19-02-1-0366.
Slide 10
Review of Related work and Motivation
Approximate HJB solutions
Munos et al. [65] (Gradient descent approaches)
Kim, Lewis and Dawson [47] (NNs)
Huang and Lin [44] (Taylor series expansion)
NN applications to optimal control
Miller [63] (NNs for control)
Parisini and Zoppoli [70] (Infinite horizon)
Constrained-input optimization
Sussmann, Sontag and Yang [84]
Bernstein [15]
Dolphus [33]
Abu-Khalaf, M. [1] (Infinite horizon)
Unconstrained policy iteration with finite-time horizon
Beard [11]
Slide 11
Background on Fixed-Final-Time HJB Optimal Control
Nonlinear dynamical system
$$\dot{x} = f(x) + g(x)u(t) \qquad (1)$$
where $x \in \mathbb{R}^n$, $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m}$, and the input $u(t) \in \mathbb{R}^m$.
It is desired to find the control $u$ that minimizes a generalized nonquadratic functional
$$V(x(t_0),t_0) = \phi(x(t_f),t_f) + \int_{t_0}^{t_f} \left[ Q(x) + W(u) \right] dt \qquad (2)$$
with $Q(x)$, $W(u)$ positive definite on $\Omega$.
Slide 12
Background on Fixed-Final-Time HJB Optimal Control
An infinitesimal equivalent to (2) is
$$-\frac{\partial V(x,t)}{\partial t} = L + \left(\frac{\partial V(x,t)}{\partial x}\right)^T \left( f(x) + g(x)u(t) \right) \qquad (3)$$
where $L = Q(x) + W(u)$. This is a time-varying partial differential equation for the cost function $V(x,t)$, for any given $u(t)$, and is solved backward in time from $t = t_f$.
By setting $t_0 = t_f$ in (2), its boundary condition is
$$V(x(t_f),t_f) = \phi(x(t_f),t_f) \qquad (4)$$
Slide 13
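Background on Fixed-Final-Time HJB Optimal Control
According to Bellman's optimality principle, the optimal cost is given by
$$-\frac{\partial V^*(x,t)}{\partial t} = \min_{u(t)} \left( L + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T \left( f(x) + g(x)u(t) \right) \right) \qquad (5)$$
which, for the quadratic control penalty $W(u) = u^T R u$, yields the optimal control
$$u^*(x) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^*(x,t)}{\partial x} \qquad (6)$$
where $V^*(x,t)$ is the optimal value function.
Substituting (6) into (5) yields the well-known time-varying Hamilton-Jacobi-Bellman (HJB) equation
$$\frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T f(x) + Q(x) - \frac{1}{4}\left(\frac{\partial V^*(x,t)}{\partial x}\right)^T g(x) R^{-1} g^T(x) \frac{\partial V^*(x,t)}{\partial x} = 0 \qquad (7)$$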
Slide 14
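Background on Fixed-Final-Time HJB Optimal Control
Then (5) becomes
$$HJB(V^*(x,t)) = \frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T f(x) + Q(x) - \frac{1}{4}\left(\frac{\partial V^*(x,t)}{\partial x}\right)^T g(x) R^{-1} g^T(x) \frac{\partial V^*(x,t)}{\partial x} = 0 \qquad (8)$$
If this HJB equation can be solved for the value function $V(x,t)$, then the optimal control is
$$u^*(x) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^*(x,t)}{\partial x}$$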
Slide 15
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
NN Approximation of the Cost Function $V(x,t)$
The following equation is used to approximate $V(x,t)$ for $t \in [t_0, t_f]$ on a compact set $\Omega \subset \mathbb{R}^n$:
$$V_L(x,t) = \sum_{j=1}^{L} w_j(t)\sigma_j(x) = w_L^T(t)\sigma_L(x) \qquad (9)$$
The NN weights are $w_j(t)$ and $L$ is the number of hidden-layer neurons.
$\sigma_L(x) \equiv [\sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x)]^T$ is the vector of activation functions.
$w_L(t) \equiv [w_1(t)\ w_2(t)\ \cdots\ w_L(t)]^T$ is the vector of NN weights.
In Sandberg [78], it is shown that NNs with time-varying weights can be used to uniformly approximate continuous time-varying functions.
Slide 16
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Note:
The set $\{\sigma_j(x)\}_1^\infty$ is selected to be independent. Then, without loss of generality, the $\sigma_j(x)$ can be assumed to be orthonormal, i.e., equivalent basis functions to the $\sigma_j(x)$ that are also orthonormal can be selected. The orthonormality of the set $\{\sigma_j(x)\}_1^\infty$ on $\Omega$ implies that if a function $\psi(x,t) \in L_2(\Omega)$, then
$$\psi(x,t) = \sum_{j=1}^{\infty} \langle \psi(x,t), \sigma_j(x) \rangle_\Omega \, \sigma_j(x)$$
where $\langle f, g \rangle_\Omega = \int_\Omega g f \, dx$ is the inner product.
Slide 17
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Note that
$$\frac{\partial V_L(x,t)}{\partial x} = \frac{\partial \sigma_L^T(x)}{\partial x} w_L(t) \equiv \nabla\sigma_L^T(x)\, w_L(t) \qquad (10)$$
where $\nabla\sigma_L(x)$ is the Jacobian $\partial\sigma_L(x)/\partial x$, and that
$$\frac{\partial V_L(x,t)}{\partial t} = \dot{w}_L^T(t)\,\sigma_L(x) \qquad (11)$$
Therefore, approximating $V(x,t)$ by $V_L(x,t)$ uniformly in $t$ in the HJB equation (8) results in
$$\dot{w}_L^T(t)\sigma_L(x) + w_L^T(t)\nabla\sigma_L(x) f(x) - \frac{1}{4} w_L^T(t)\nabla\sigma_L(x) g(x) R^{-1} g^T(x) \nabla\sigma_L^T(x) w_L(t) + Q(x) = e_L(x,t) \qquad (12)$$
Slide 18
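Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
or
$$HJB\left(V_L(x,t) = \sum_{j=1}^{L} w_j(t)\sigma_j(x)\right) = e_L(x,t) \qquad (13)$$
where $e_L(x,t)$ is a residual equation error. The corresponding optimal control input is
$$u(x,t) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V_L(x,t)}{\partial x} = -\frac{1}{2} R^{-1} g^T(x) \nabla\sigma_L^T(x)\, w_L(t) \qquad (14)$$
To find the least-squares solution for $w_L(t)$, the method of weighted residuals is used:
$$\left\langle \frac{\partial e_L(x,t)}{\partial \dot{w}_L(t)},\ e_L(x,t) \right\rangle_\Omega = 0$$
Since $\partial e_L(x,t)/\partial \dot{w}_L(t) = \sigma_L(x)$ by (12), this sets the projection of the residual onto each basis function to zero.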
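The sketch below shows, in Python, how (9), (10), and (14) are evaluated for the three-term quadratic basis later used in (19). It is a minimal illustration, not the authors' code: the function names are hypothetical, and the weights, g(x) = diag(1, 2), and R = I in the usage lines are arbitrary assumptions chosen only to exercise the formulas.

```python
# Quadratic basis sigma_L(x) = [x1^2, x1*x2, x2^2] for a two-state system.

def sigma(x):
    """Activation vector sigma_L(x)."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x2]


def grad_sigma(x):
    """Jacobian of sigma_L(x): row j holds d(sigma_j)/dx."""
    x1, x2 = x
    return [[2.0 * x1, 0.0], [x2, x1], [0.0, 2.0 * x2]]


def value(w, x):
    """V_L(x, t) = w_L^T(t) sigma_L(x), eq. (9), for the weights w at one time t."""
    return sum(wj * sj for wj, sj in zip(w, sigma(x)))


def value_gradient(w, x):
    """dV_L/dx = grad_sigma_L^T(x) w_L(t), eq. (10)."""
    jac = grad_sigma(x)
    return [sum(jac[j][i] * w[j] for j in range(len(w))) for i in range(len(x))]


def control(w, x, g, r_inv):
    """u = -1/2 R^{-1} g^T(x) grad_sigma_L^T(x) w_L(t), eq. (14).

    g is the n x m input matrix g(x) evaluated at x; r_inv is the m x m matrix R^{-1}.
    """
    dv = value_gradient(w, x)
    m = len(g[0])
    gt_dv = [sum(g[i][k] * dv[i] for i in range(len(dv))) for k in range(m)]
    return [-0.5 * sum(r_inv[k][l] * gt_dv[l] for l in range(m)) for k in range(m)]


if __name__ == "__main__":
    w = [1.0, 0.5, 2.0]                # illustrative weights at one instant
    x = [1.0, -1.0]
    g = [[1.0, 0.0], [0.0, 2.0]]       # hypothetical g(x) = diag(1, 2)
    r_inv = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical R = I
    print(value(w, x), value_gradient(w, x), control(w, x, g, r_inv))
```

Because the weights enter linearly, the same three functions serve at every time step once w_L(t) is known.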
Slide 19
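Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
$$\dot{w}_L(t) = -\langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle\nabla\sigma_L f,\ \sigma_L\rangle_\Omega\, w_L(t) + \frac{1}{4}\langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\left\langle w_L^T(t)\nabla\sigma_L g R^{-1} g^T \nabla\sigma_L^T w_L(t),\ \sigma_L\right\rangle_\Omega - \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle Q(x),\ \sigma_L\rangle_\Omega \qquad (15)$$
Boundary condition: $V(x(t_f),t_f) = \phi(x(t_f),t_f) = w_L^T(t_f)\,\sigma_L(x(t_f))$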
Slide 20
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Lemmas Proved:
Lemma 1: Convergence of Approximate HJB Equation
Lemma 2: Convergence of NN Weights
Lemma 3: Convergence of Approximate Value Function
Lemma 4: Convergence of Value Function Gradient
Lemma 5: Convergence of Control Inputs
Lemma 6: Convergence of State Trajectory
Slide 21
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Lemma 1: Convergence of Approximate HJB Equation.
Let $V_L(x,t) = \sum_{j=1}^{L} w_j(t)\sigma_j(x)$ satisfy $\langle HJB(V_L(x,t)), \sigma_L(x)\rangle_\Omega = 0$ and $\langle V_L(x,t_f) - \phi(x,t_f), \sigma_L(x)\rangle_\Omega = 0$, and let $V(x,t) = \sum_{j=1}^{\infty} c_j(t)\sigma_j(x)$, with $c_L(t) \equiv [c_1(t)\ c_2(t)\ \cdots\ c_L(t)]^T$, satisfy $HJB(V(x,t)) = 0$ and $V(x,t_f) = \phi(x(t_f),t_f)$. Then $HJB(V_L(x,t)) \to 0$ on $\Omega$ as $L$ increases.
Slide 22
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Outline of proof of Lemma 1
1. Calculate $\langle HJB(V_L(x,t)), \sigma_j(x)\rangle_\Omega$ using eq. (13).
2. Apply $\langle\sigma_k, \sigma_j\rangle_\Omega = 0$ if $k \neq j$ to simplify $\langle HJB(V_L(x,t)), \sigma_j(x)\rangle_\Omega$.
3. Prove $HJB(V_L(x,t)) \to 0$ as $L$ increases. Here $HJB(V_L(x,t)) = \sum_{j=1}^{\infty}\langle HJB(V_L(x,t)), \sigma_j(x)\rangle_\Omega\, \sigma_j(x)$.
Slide 23
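Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Optimal Algorithm Based on NN Approximation
A mesh of $p$ points $x_1, \ldots, x_p$ over the integration region can be introduced on $\Omega_0$, with
$$A = \left[\sigma_L(x_1) \mid \cdots \mid \sigma_L(x_p)\right]^T$$
$$B = \left[\nabla\sigma_L(x_1) f(x_1) \mid \cdots \mid \nabla\sigma_L(x_p) f(x_p)\right]^T$$
$$C = \left[\tfrac{1}{4}\nabla\sigma_L(x_1) g(x_1) R^{-1} g^T(x_1)\nabla\sigma_L^T(x_1) \mid \cdots \mid \tfrac{1}{4}\nabla\sigma_L(x_p) g(x_p) R^{-1} g^T(x_p)\nabla\sigma_L^T(x_p)\right]^T$$
$$D = \left[Q(x_1) \mid \cdots \mid Q(x_p)\right]^T$$
where $p$ in $x_p$ represents the number of points of the mesh. Since $C$ stacks one $L \times L$ matrix per mesh point, $w_L^T(t)\, C\, w_L(t)$ denotes the $p$-vector whose $i$-th entry is $\tfrac{1}{4} w_L^T(t)\nabla\sigma_L(x_i) g(x_i) R^{-1} g^T(x_i)\nabla\sigma_L^T(x_i) w_L(t)$.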
Slide 24
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
$$A^TA\,\dot{w}_L(t) + A^TB\,w_L(t) - A^T\left(w_L^T(t)\,C\,w_L(t)\right) + A^TD = 0 \qquad (16)$$
then
$$\dot{w}_L(t) = -(A^TA)^{-1}A^TB\,w_L(t) + (A^TA)^{-1}A^T\left(w_L^T(t)\,C\,w_L(t)\right) - (A^TA)^{-1}A^TD \qquad (17)$$
This is a nonlinear ODE that can easily be integrated backward in time using the final condition $w_L(t_f)$ to find the least-squares optimal NN weights.
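A minimal sketch of the mesh algorithm (16)-(17), assuming the linear plant (18) as written below, Q(x) = x1^2 + x2^2, R = I, a zero terminal cost (so w_L(t_f) = 0), t_f = 1, and a 4 x 4 mesh on [-1, 1]^2; none of these choices comes from the slides, and the function names are hypothetical. Because the basis (19) is exactly quadratic and the plant is linear, the least-squares ODE should coincide with the matrix Riccati differential equation for P = [[w1, w2/2], [w2/2, w3]], which the sketch integrates alongside as a check. The normal equations are accumulated point by point rather than building A, B, C, D explicitly; the result is the same (A^T A)^{-1} A^T(...) expression.

```python
# Mesh least-squares weight ODE (17), integrated backward with RK4.
# Assumed: dx/dt = F x + G u, Q(x) = x1^2 + x2^2, R = I, w_L(t_f) = 0, t_f = 1.
F = [[2.0, 3.0], [5.0, 6.0]]
G = [[1.0, 0.0], [0.0, 2.0]]
T_FINAL, N_STEPS = 1.0, 500
MESH = [(a, b) for a in (-1.0, -0.5, 0.5, 1.0) for b in (-1.0, -0.5, 0.5, 1.0)]


def sigma(x):
    return [x[0] * x[0], x[0] * x[1], x[1] * x[1]]


def grad_sigma(x):
    return [[2.0 * x[0], 0.0], [x[1], x[0]], [0.0, 2.0 * x[1]]]


def solve(m, y):
    """Solve m z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    a = [row[:] + [y[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (a[r][n] - sum(a[r][c] * z[c] for c in range(r + 1, n))) / a[r][r]
    return z


def weight_derivative(w):
    """(17): dw/dt = (A^T A)^{-1} A^T (-B w + w^T C w - D)."""
    ata = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x in MESH:
        s, jac = sigma(x), grad_sigma(x)
        fx = [F[i][0] * x[0] + F[i][1] * x[1] for i in range(2)]
        b_row = [jac[j][0] * fx[0] + jac[j][1] * fx[1] for j in range(3)]  # row of B
        dv = [sum(jac[j][i] * w[j] for j in range(3)) for i in range(2)]   # eq. (10)
        gt_dv = [G[0][k] * dv[0] + G[1][k] * dv[1] for k in range(2)]
        c_i = 0.25 * (gt_dv[0] ** 2 + gt_dv[1] ** 2)  # entry of w^T C w (R = I)
        d_i = x[0] ** 2 + x[1] ** 2                   # entry of D = Q(x_i)
        target = -sum(bj * wj for bj, wj in zip(b_row, w)) + c_i - d_i
        for j in range(3):
            rhs[j] += s[j] * target
            for k in range(3):
                ata[j][k] += s[j] * s[k]
    return solve(ata, rhs)


def integrate_backward(deriv, w_final, t_final=T_FINAL, n_steps=N_STEPS):
    """Classical RK4 with a negative step, from t_f down to t_0 = 0."""
    h = -t_final / n_steps
    w = list(w_final)
    for _ in range(n_steps):
        k1 = deriv(w)
        k2 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k1)])
        k3 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k2)])
        k4 = deriv([wi + h * ki for wi, ki in zip(w, k3)])
        w = [wi + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for wi, p, q, r, s in zip(w, k1, k2, k3, k4)]
    return w


def riccati_derivative(w):
    """Reference: -dP/dt = F^T P + P F - P G G^T P + I in weight coordinates."""
    p = [[w[0], w[1] / 2.0], [w[1] / 2.0, w[2]]]
    ftp = [[sum(F[k][i] * p[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    pg = [[sum(p[i][k] * G[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    dp = [[-(ftp[i][j] + ftp[j][i] - sum(pg[i][k] * pg[j][k] for k in range(2))
             + (1.0 if i == j else 0.0)) for j in range(2)] for i in range(2)]
    return [dp[0][0], 2.0 * dp[0][1], dp[1][1]]


if __name__ == "__main__":
    print("w_L(0) =", integrate_backward(weight_derivative, [0.0, 0.0, 0.0]))
```

For a nonlinear plant or a non-polynomial Q, only `sigma`, `grad_sigma`, the plant, and the mesh would change; the Riccati check applies only to this linear-quadratic case.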
Slide 25
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Numerical Examples
a) Linear System
$$\dot{x}_1 = 2x_1 + 3x_2 + u_1,\qquad \dot{x}_2 = 5x_1 + 6x_2 + 2u_2 \qquad (18)$$
To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system:
$$V(x_1,x_2,t) = w_1x_1^2 + w_2x_1x_2 + w_3x_2^2 \qquad (19)$$
Slide 26
[Figures: Fig. 1 Linear System Weights (w1, w2, w3 vs. time); Fig. 2 State Trajectory of Linear System (x1, x2 vs. time); Fig. 3 Optimal NN Control Law (u1, u2 vs. time)]
Slide 27
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
b) Nonlinear Chained System
$$\dot{x}_1 = u_1,\qquad \dot{x}_2 = u_2,\qquad \dot{x}_3 = x_2u_1 \qquad (20)$$
Smooth approximating function
$$V(x_1,x_2,x_3,t) = w_1x_1^2 + w_2x_2^2 + w_3x_3^2 + w_4x_1x_2 + w_5x_1x_3 + w_6x_2x_3 + w_7x_1^4 + w_8x_2^4 + w_9x_3^4 + w_{10}x_1^2x_2^2 + w_{11}x_1^2x_3^2 + w_{12}x_2^2x_3^2 + w_{13}x_1^2x_2x_3 + w_{14}x_1x_2^2x_3 + w_{15}x_1x_2x_3^2 + w_{16}x_1^3x_2 + w_{17}x_1^3x_3 + w_{18}x_1x_2^3 + w_{19}x_1x_3^3 + w_{20}x_2^3x_3 + w_{21}x_2x_3^3 \qquad (21)$$
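The 21 terms of (21) are exactly the monomials of total degree 2 and 4 in x1, x2, x3 (6 + 15 terms), so the basis and its Jacobian can be generated rather than typed by hand. A hedged Python sketch: the helper names are hypothetical, the ordering produced by `itertools` need not match w1, ..., w21 above, and R = I in the control helper is an assumption.

```python
from itertools import combinations_with_replacement


def monomial_exponents(n_vars=3, degrees=(2, 4)):
    """Exponent tuples of all monomials of the given total degrees (21 for 3 vars)."""
    exps = []
    for d in degrees:
        for combo in combinations_with_replacement(range(n_vars), d):
            e = [0] * n_vars
            for v in combo:
                e[v] += 1
            exps.append(tuple(e))
    return exps


def sigma(x, exps):
    """Activation vector: one monomial per exponent tuple."""
    out = []
    for e in exps:
        term = 1.0
        for xi, ei in zip(x, e):
            term *= xi ** ei
        out.append(term)
    return out


def grad_sigma(x, exps):
    """Jacobian of sigma_L: row j holds d(sigma_j)/dx_i."""
    rows = []
    for e in exps:
        row = []
        for i in range(len(x)):
            if e[i] == 0:
                row.append(0.0)
                continue
            term = e[i] * x[i] ** (e[i] - 1)
            for k in range(len(x)):
                if k != i:
                    term *= x[k] ** e[k]
            row.append(term)
        rows.append(row)
    return rows


def g_chained(x):
    """Input matrix of (20): dx1/dt = u1, dx2/dt = u2, dx3/dt = x2*u1."""
    return [[1.0, 0.0], [0.0, 1.0], [x[1], 0.0]]


def control(w, x, exps):
    """u = -1/2 g^T(x) grad_sigma^T(x) w, i.e. (14) with R = I (assumption)."""
    jac = grad_sigma(x, exps)
    dv = [sum(jac[j][i] * w[j] for j in range(len(w))) for i in range(len(x))]
    g = g_chained(x)
    return [-0.5 * sum(g[i][k] * dv[i] for i in range(len(x))) for k in range(2)]
```

Since every monomial has even degree, V_L(x,t) is even in x, which a quick check with sigma(-x) confirms.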
Slide 28
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
[Figures: Fig. 4 Nonlinear System Weights (NN weights vs. time); Fig. 5 State Trajectory of Nonlinear System (x1, x2, x3 vs. time); Fig. 6 Optimal NN Control Law (u1, u2 vs. time)]
Slide 29
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Fixed-Final-Time HJI Optimal Control
1. Based on solving a related Hamilton-Jacobi-Isaacs equation of the corresponding finite-horizon zero-sum game.
2. The game value function is approximated by a neural network with time-varying weights.
3. It is shown that the neural network approximation converges uniformly to the game value function and the resulting nearly optimal constrained feedback controller provides closed-loop stability and bounded L2 gain.
Slide 30
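H-Infinity Control Using Neural Networks
L2 Gain Problem (zero-sum differential Nash game)
System
$$\dot{x} = f(x) + g(x)u + k(x)d,\qquad y = x,\qquad z = \psi(x,u),\qquad u = l(y)$$
where d is the disturbance, u the control, z the performance output, and y the measured output, with
$$\|z\|^2 = h^Th + \|u\|^2$$
Find control u(t) so that
$$\frac{\int_0^\infty \|z(t)\|^2\,dt}{\int_0^\infty \|d(t)\|^2\,dt} = \frac{\int_0^\infty \left(h^Th + \|u\|^2\right)dt}{\int_0^\infty \|d(t)\|^2\,dt} \le \gamma^2$$
for all L2 disturbances and a prescribed gain $\gamma^2$.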
Slide 31
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Hamiltonian function
$$H(x,p,u,d) = \left(\frac{\partial V(x,t)}{\partial x}\right)^T \left( f(x) + g(x)u(t) + k(x)d(t) \right) + h^T(x)h(x) + \|u(t)\|^2 - \gamma^2\|d(t)\|^2$$
Minimizing the Hamiltonian with respect to $u$ and maximizing it with respect to $d$ gives
$$g^T(x)\frac{\partial V^*(x,t)}{\partial x} + 2u^*(t) = 0,\qquad k^T(x)\frac{\partial V^*(x,t)}{\partial x} - 2\gamma^2 d^*(t) = 0$$
$$u^*(t) = -\frac{1}{2}g^T(x)\frac{\partial V^*(x,t)}{\partial x},\qquad d^*(t) = \frac{1}{2\gamma^2}k^T(x)\frac{\partial V^*(x,t)}{\partial x}$$
Slide 32
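Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Hamilton-Jacobi-Isaacs equation
$$\frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T \left( f(x) + g(x)u^*(t) + k(x)d^*(t) \right) + h^T(x)h(x) + \|u^*(t)\|^2 - \gamma^2\|d^*(t)\|^2 = 0$$
So
$$HJI(V^*(x,t)) = \frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T f(x) - \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T \hat{g}(x)\hat{g}^T(x)\frac{\partial V^*(x,t)}{\partial x} + h^T(x)h(x) = 0$$
Here $\hat{g}(x)\hat{g}^T(x) = \frac{1}{4}g(x)g^T(x) - \frac{1}{4\gamma^2}k(x)k^T(x)$.
Boundary condition: $V(x(t_f),t_f) = \phi(x(t_f),t_f)$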
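As a numerical sanity check of the step from the Hamiltonian to the HJI equation, the sketch below evaluates u*, d*, and ĝĝ^T at a single point and confirms that ∂V/∂t + H(x, p, u*, d*) equals HJI(V), that perturbing u raises H, and that perturbing d lowers it. All numeric values (dimensions, ∂V/∂x, f, h, γ) are arbitrary test inputs, and the helper names are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(m, v):
    """m v for a list-of-rows matrix m."""
    return [dot(row, v) for row in m]


def transpose_vec(m, v):
    """m^T v for an n x q matrix m and an n-vector v."""
    return [sum(m[i][k] * v[i] for i in range(len(v))) for k in range(len(m[0]))]


def ghat_ghat_t(g, k, gamma):
    """hat g hat g^T = 1/4 g g^T - 1/(4 gamma^2) k k^T (n x n)."""
    n = len(g)
    return [[0.25 * dot(g[i], g[j]) - dot(k[i], k[j]) / (4.0 * gamma ** 2)
             for j in range(n)] for i in range(n)]


def saddle_point(dv, g, k, gamma):
    """u* = -1/2 g^T dV/dx and d* = 1/(2 gamma^2) k^T dV/dx."""
    u = [-0.5 * c for c in transpose_vec(g, dv)]
    d = [c / (2.0 * gamma ** 2) for c in transpose_vec(k, dv)]
    return u, d


def hamiltonian(dv, f, g, k, h, u, d, gamma):
    """dV/dx^T (f + g u + k d) + h^T h + |u|^2 - gamma^2 |d|^2."""
    xdot = [fi + gu + kd for fi, gu, kd in zip(f, mat_vec(g, u), mat_vec(k, d))]
    return dot(dv, xdot) + dot(h, h) + dot(u, u) - gamma ** 2 * dot(d, d)


def hji_residual(dv_dt, dv, f, g, k, h, gamma):
    """HJI(V) = dV/dt + dV/dx^T f - dV/dx^T hat g hat g^T dV/dx + h^T h."""
    m = ghat_ghat_t(g, k, gamma)
    return dv_dt + dot(dv, f) - dot(dv, mat_vec(m, dv)) + dot(h, h)


if __name__ == "__main__":
    dv, f, h, gamma = [0.4, -1.2], [0.3, 0.1], [0.5], 2.0
    g, k = [[0.0], [1.0]], [[1.0], [0.0]]
    u, d = saddle_point(dv, g, k, gamma)
    print(0.7 + hamiltonian(dv, f, g, k, h, u, d, gamma),
          hji_residual(0.7, dv, f, g, k, h, gamma))
```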
Slide 33
CT Policy Iteration for H-Infinity Control
Cannot solve HJI!!
Successive Solution, Algorithm 1 (Murad Abu-Khalaf, Infinite Horizon Case): Let γ be prescribed and fixed, and let $u_0$ be a stabilizing control with region of asymptotic stability (RAS) $\Omega_0$.
1. Outer loop, update control. Initial disturbance $d_0 = 0$.
2. Inner loop, update disturbance. Solve the value equation (consistency equation for the value function)
$$\left(\frac{\partial V_i^j}{\partial x}\right)^T \left( f + g u_j + k d_i \right) + h^Th + 2\int_0^{u_j}\phi^{-T}(\nu)\,d\nu - \gamma^2 d_i^T d_i = 0$$
Inner loop update disturbance:
$$d_{i+1} = \frac{1}{2\gamma^2}k^T(x)\frac{\partial V_i^j}{\partial x}$$
Go to 2. Iterate i until convergence to $d_\infty$, $V_\infty^j$, with RAS $\Omega_\infty^j$.
Outer loop update control action:
$$u_{j+1} = -\phi\left(\frac{1}{2}g^T(x)\frac{\partial V_\infty^j}{\partial x}\right)$$
Go to 1. Iterate j until convergence to $u_\infty$, $V_\infty^\infty$, with RAS $\Omega_\infty^\infty$.
Slide 34
Results for this Algorithm
For every iteration on the disturbance $d_i$ one has $V_i^j \le V_{i+1}^j$ (the value function increases) and $\Omega_i^j \supseteq \Omega_{i+1}^j$ (the RAS decreases).
For every iteration on the control $u_j$ one has $V_\infty^j \ge V_\infty^{j+1}$ (the value function decreases) and $\Omega_\infty^j \subseteq \Omega_\infty^{j+1}$ (the RAS does not decrease).
Converges to available storage.
The algorithm converges to $V^*(\Omega_0)$, $u^*(\Omega_0)$, $d^*(\Omega_0)$, $\Omega^*(\Omega_0)$, the optimal solution on the RAS $\Omega_0$.
Sometimes the algorithm converges to the optimal HJI solution $V^*$, $\Omega^*$, $u^*$, $d^*$. For this to occur it is required that $\Omega^* \subseteq \Omega_0$.
Murad Abu-Khalaf
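To make the two-loop structure of Algorithm 1 concrete, here is a hedged Python sketch of the simplest case it covers: a scalar linear plant dx/dt = a x + b u + k d with h^T h = q x^2 and an unconstrained input (φ taken as the identity, R = 1), so that V = p x^2 and the consistency equation is solvable in closed form. This is an illustrative special case, not the constrained algorithm of the slides; the numeric values and names are assumptions. The inner loop should increase p and the outer loop should decrease it, converging to the stabilizing root of 2ap + q - p^2(b^2 - k^2/γ^2) = 0.

```python
def evaluate_policy(a, b, k, q, gamma, gain_u, gain_d):
    """Solve the consistency equation for V = p x^2 under u = -gain_u x, d = gain_d x:
    2 p (a - b gain_u + k gain_d) + q + gain_u^2 - gamma^2 gain_d^2 = 0.
    """
    closed_loop = a - b * gain_u + k * gain_d
    if closed_loop >= 0.0:
        raise ValueError("policy pair is not stabilizing")
    return (q + gain_u ** 2 - gamma ** 2 * gain_d ** 2) / (-2.0 * closed_loop)


def policy_iteration(a, b, k, q, gamma, gain_u0, tol=1e-12, max_iter=100):
    """Outer loop on the control, inner loop on the disturbance (phi = identity)."""
    gain_u = gain_u0
    history = []                        # coefficient of V_infinity^j per outer step
    for _ in range(max_iter):           # outer loop over j
        gain_d, p = 0.0, None           # step 1: d_0 = 0
        for _ in range(max_iter):       # inner loop over i
            p_new = evaluate_policy(a, b, k, q, gamma, gain_u, gain_d)
            gain_d = k * p_new / gamma ** 2   # d_{i+1} = k dV/dx / (2 gamma^2), dV/dx = 2 p x
            converged = p is not None and abs(p_new - p) < tol
            p = p_new
            if converged:
                break
        history.append(p)
        new_gain_u = b * p              # u_{j+1} = -1/2 b dV/dx = -b p x
        if abs(new_gain_u - gain_u) < tol:
            return p, new_gain_u, history
        gain_u = new_gain_u
    raise RuntimeError("policy iteration did not converge")


if __name__ == "__main__":
    # a = b = k = q = 1, gamma = 2, initial stabilizing gain u_0 = -3 x
    print(policy_iteration(1.0, 1.0, 1.0, 1.0, 2.0, gain_u0=3.0))
```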
Slide 35
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Hamilton-Jacobi-Isaacs equation
$$HJI(V^*(x,t)) = \frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T f(x) - \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T \hat{g}(x)\hat{g}^T(x)\frac{\partial V^*(x,t)}{\partial x} + h^T(x)h(x) = 0$$
Here $\hat{g}(x)\hat{g}^T(x) = \frac{1}{4}g(x)g^T(x) - \frac{1}{4\gamma^2}k(x)k^T(x)$.
Boundary condition: $V(x(t_f),t_f) = \phi(x(t_f),t_f)$
Slide 36
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
NN Approximation of the Cost Function $V(x,t)$
$$V_L(x,t) = \sum_{j=1}^{L} w_j(t)\sigma_j(x) = w_L^T(t)\sigma_L(x)$$
The NN weights are $w_j(t)$ and $L$ is the number of hidden-layer neurons.
$\sigma_L(x) \equiv [\sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x)]^T$ is the vector of activation functions.
$w_L(t) \equiv [w_1(t)\ w_2(t)\ \cdots\ w_L(t)]^T$ is the vector of NN weights.
In Sandberg [78], it is shown that NNs with time-varying weights can be used to uniformly approximate continuous time-varying functions.
Slide 37
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Note that
$$\frac{\partial V_L(x,t)}{\partial x} = \nabla\sigma_L^T(x)\, w_L(t),\qquad \nabla\sigma_L(x) = \frac{\partial \sigma_L(x)}{\partial x},\qquad \frac{\partial V_L(x,t)}{\partial t} = \dot{w}_L^T(t)\,\sigma_L(x)$$
HJI becomes
$$\dot{w}_L^T(t)\sigma_L(x) + w_L^T(t)\nabla\sigma_L(x) f(x) - w_L^T(t)\nabla\sigma_L(x)\hat{g}(x)\hat{g}^T(x)\nabla\sigma_L^T(x) w_L(t) + h^T(x)h(x) = e_L(x,t)$$
This turns a PDE into an ODE for the NN weights.
Slide 38
Nonlinear Fixed-Final-Time HJI Solution by NN Least-Squares Approximation
Lemmas Proved (as the number of NN hidden-layer neurons L → ∞):
Lemma 1: Convergence of Approximate HJI Equation
Lemma 2: Convergence of NN Weights
Lemma 3: Convergence of Approximate Value Function
Lemma 4: Convergence of Value Function Gradient
Lemma 5: Convergence of Control Inputs
Lemma 6: Convergence of State Trajectory
Convergence in Sobolev space; stability for enough hidden-layer neurons.
Slide 39
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
$$HJI\left(V_L(x,t) = \sum_{j=1}^{L} w_j(t)\sigma_j(x)\right) = e_L(x,t)$$
The corresponding optimal control input is
$$u(x,t) = -\frac{1}{2} g^T(x) \frac{\partial V_L(x,t)}{\partial x} = -\frac{1}{2} g^T(x)\nabla\sigma_L^T(x)\, w_L(t)$$
To find the least-squares solution for $w_L(t)$, the method of weighted residuals is used:
$$\left\langle \frac{\partial e_L(x,t)}{\partial \dot{w}_L(t)},\ e_L(x,t) \right\rangle_\Omega = 0$$
Slide 40
Neural-network-based nearly optimal FEEDBACK control law.
This uses preliminary off-line tuning to solve the HJI equation using an NN. Dynamics must be known.
Slide 41
NN Least-Squares Approximate HJI Solution
$$\dot{w}_L(t) = -\langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle\nabla\sigma_L f,\ \sigma_L\rangle_\Omega\, w_L(t) + \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\left\langle w_L^T(t)\nabla\sigma_L\hat{g}\hat{g}^T\nabla\sigma_L^T w_L(t),\ \sigma_L\right\rangle_\Omega - \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle h^Th,\ \sigma_L\rangle_\Omega$$
Boundary condition: $V(x(t_f),t_f) = \phi(x(t_f),t_f)$.
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
The NN approximation has converted a PDE into an ODE for the NN weights (cf. the assumed-mode-shapes method for flexible systems in mechanical engineering).
Slide 42
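Optimal Algorithm Based on NN Approximation
A mesh of $p$ points $x_1, \ldots, x_p$ over the integration region can be introduced on $\Omega_0$, with
$$A = \left[\sigma_L(x_1) \mid \cdots \mid \sigma_L(x_p)\right]^T,\qquad B = \left[\nabla\sigma_L(x_1) f(x_1) \mid \cdots \mid \nabla\sigma_L(x_p) f(x_p)\right]^T$$
$$C = \left[\nabla\sigma_L(x_1)\hat{g}(x_1)\hat{g}^T(x_1)\nabla\sigma_L^T(x_1) \mid \cdots \mid \nabla\sigma_L(x_p)\hat{g}(x_p)\hat{g}^T(x_p)\nabla\sigma_L^T(x_p)\right]^T$$
$$D = \left[h^Th\,\big|_{x_1} \mid \cdots \mid h^Th\,\big|_{x_p}\right]^T$$
where $p$ represents the number of points of the mesh.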
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Slide 43
$$\dot{w}_L(t) = -(A^TA)^{-1}A^TB\,w_L(t) + (A^TA)^{-1}A^T\left(w_L^T(t)\,C\,w_L(t)\right) - (A^TA)^{-1}A^TD$$
This is a nonlinear ODE that can easily be integrated backward in time using the final condition $w_L(t_f)$ to find the least-squares optimal NN weights.
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
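A minimal sketch of the backward least-squares integration for the HJI weights, assuming a hypothetical scalar plant dx/dt = a x + b u + k d with h^T h = q x^2, basis σ_L = [x^2, x^4], an 8-point mesh on [-1, 1], zero terminal cost, and t_f = 10; none of these values comes from the slides. For this linear-quadratic case the exact game value is quadratic, so w2 should stay near zero and w1(0) should approach the stabilizing root of 2ap + q - p^2(b^2 - k^2/γ^2) = 0 for a long horizon.

```python
# Hypothetical scalar game: dx/dt = a x + b u + k d, h^T h = q x^2, w_L(t_f) = 0.
A_SYS, B_SYS, K_SYS, Q_SYS, GAMMA = 1.0, 1.0, 1.0, 1.0, 2.0
GG_HAT = B_SYS ** 2 / 4.0 - K_SYS ** 2 / (4.0 * GAMMA ** 2)  # hat g hat g^T (scalar)
MESH = [s * m for m in (0.25, 0.5, 0.75, 1.0) for s in (-1.0, 1.0)]


def weight_derivative(w):
    """dw/dt = (A^T A)^{-1} A^T (-B w + w^T C w - D) with sigma_L = [x^2, x^4]."""
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for x in MESH:
        s = [x ** 2, x ** 4]                        # row of A
        dv = 2.0 * w[0] * x + 4.0 * w[1] * x ** 3   # dV_L/dx
        # -(B w) + (w^T C w) - D evaluated at this mesh point
        target = -dv * A_SYS * x + GG_HAT * dv * dv - Q_SYS * x * x
        for i in range(2):
            atb[i] += s[i] * target
            for j in range(2):
                ata[i][j] += s[i] * s[j]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    return [(atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det,
            (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det]


def integrate_backward(deriv, w_final, t_final, n_steps):
    """Classical RK4 with a negative step, from t_f down to t_0 = 0."""
    h = -t_final / n_steps
    w = list(w_final)
    for _ in range(n_steps):
        k1 = deriv(w)
        k2 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k1)])
        k3 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k2)])
        k4 = deriv([wi + h * ki for wi, ki in zip(w, k3)])
        w = [wi + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for wi, p, q, r, s in zip(w, k1, k2, k3, k4)]
    return w


if __name__ == "__main__":
    print("w_L(0) =", integrate_backward(weight_derivative, [0.0, 0.0], 10.0, 2000))
```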
Slide 44
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
Numerical Examples
a) Linear System
$$\dot{x}_1 = 2x_1 + 3x_2 + u_1,\qquad \dot{x}_2 = 5x_1 + 6x_2 + 2u_2 \qquad (18)$$
To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system:
$$V(x_1,x_2,t) = w_1x_1^2 + w_2x_1x_2 + w_3x_2^2 \qquad (19)$$
Slide 45
[Figures: Fig. 1 Linear System Weights (w1, w2, w3 vs. time); Fig. 2 State Trajectory of Linear System (x1, x2 vs. time); Fig. 3 Optimal NN Control Law (u1, u2 vs. time)]
Slide 46
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
b) Nonlinear Chained System
$$\dot{x}_1 = u_1,\qquad \dot{x}_2 = u_2,\qquad \dot{x}_3 = x_2u_1 \qquad (20)$$
Smooth approximating function
$$V(x_1,x_2,x_3,t) = w_1x_1^2 + w_2x_2^2 + w_3x_3^2 + w_4x_1x_2 + w_5x_1x_3 + w_6x_2x_3 + w_7x_1^4 + w_8x_2^4 + w_9x_3^4 + w_{10}x_1^2x_2^2 + w_{11}x_1^2x_3^2 + w_{12}x_2^2x_3^2 + w_{13}x_1^2x_2x_3 + w_{14}x_1x_2^2x_3 + w_{15}x_1x_2x_3^2 + w_{16}x_1^3x_2 + w_{17}x_1^3x_3 + w_{18}x_1x_2^3 + w_{19}x_1x_3^3 + w_{20}x_2^3x_3 + w_{21}x_2x_3^3 \qquad (21)$$
Slide 47
Nonlinear Fixed-Final-Time HJB Solution by NN Least-Squares Approximation
[Figures: Fig. 4 Nonlinear System Weights (NN weights vs. time); Fig. 5 State Trajectory of Nonlinear System (x1, x2, x3 vs. time); Fig. 6 Optimal NN Control Law (u1, u2 vs. time)]
Fix using time-varying transformations (Zhihua Qu)
Slide 48
Simulation-Benchmark Problem
Fig. 8 Rotational actuator to control a translational oscillator.
Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
Slide 49
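Neural Network Solution for Finite-Horizon H-Infinity State Feedback Control
$$\dot{x} = f(x) + g(x)u + k(x)d(t),\qquad |u| \le 2$$
$$z^Tz = x_1^2 + 0.1x_2^2 + 0.1x_3^2 + 0.1x_4^2 + \|u\|_q^2,\qquad \gamma = 10$$
$$\varepsilon = \frac{me}{\sqrt{(I + me^2)(M + m)}} = 0.2$$
$$f(x) = \left[\, x_2,\ \ \frac{-x_1 + \varepsilon x_4^2\sin x_3}{1 - \varepsilon^2\cos^2 x_3},\ \ x_4,\ \ \frac{\varepsilon\cos x_3\,(x_1 - \varepsilon x_4^2\sin x_3)}{1 - \varepsilon^2\cos^2 x_3} \,\right]^T$$
$$g(x) = \left[\, 0,\ \ \frac{-\varepsilon\cos x_3}{1 - \varepsilon^2\cos^2 x_3},\ \ 0,\ \ \frac{1}{1 - \varepsilon^2\cos^2 x_3} \,\right]^T$$
$$k(x) = \left[\, 0,\ \ \frac{1}{1 - \varepsilon^2\cos^2 x_3},\ \ 0,\ \ \frac{-\varepsilon\cos x_3}{1 - \varepsilon^2\cos^2 x_3} \,\right]^T$$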
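The benchmark model transcribes directly into code, for example to simulate closed-loop trajectories like those in Figs. 9-12. A hedged Python sketch: the function names are hypothetical, ε = 0.2 is the value stated above, and the Euler step in the usage block is only a demonstration, not the integration scheme used for the figures.

```python
import math


def rtac_dynamics(x, eps=0.2):
    """Return f(x), g(x), k(x) of the rotational-actuator benchmark (4 states)."""
    x1, x2, x3, x4 = x
    c, s = math.cos(x3), math.sin(x3)
    den = 1.0 - eps ** 2 * c ** 2
    f = [x2, (-x1 + eps * x4 ** 2 * s) / den, x4, eps * c * (x1 - eps * x4 ** 2 * s) / den]
    g = [0.0, -eps * c / den, 0.0, 1.0 / den]
    k = [0.0, 1.0 / den, 0.0, -eps * c / den]
    return f, g, k


def rtac_xdot(x, u, d, eps=0.2):
    """dx/dt = f(x) + g(x) u + k(x) d for scalar control u and disturbance d."""
    f, g, k = rtac_dynamics(x, eps)
    return [fi + gi * u + ki * d for fi, gi, ki in zip(f, g, k)]


if __name__ == "__main__":
    # One explicit-Euler step from an arbitrary initial state with u = d = 0.
    x, dt = [1.0, 0.0, 0.0, 0.0], 0.01
    print([xi + dt * vi for xi, vi in zip(x, rtac_xdot(x, 0.0, 0.0))])
```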
Slide 50
[Figures: Fig. 9 State Trajectories r, θ (x1, x3 vs. time in seconds); Fig. 10 State Trajectories ṙ, θ̇ (x2, x4 vs. time in seconds); Fig. 11 Control Input u(t); Fig. 12 Disturbance Attenuation (all panels: Nearly Optimal Controller)]
Slide 51
Neural Network Solution for Fixed-Final time Constrained Optimal Control
When the control input is constrained by a saturating function $\phi(\cdot)$, a generalized nonquadratic functional was introduced in [1][46] to guarantee bounded controls:
$$W(u) = 2\int_0^u \phi^{-T}(v)\,R\,dv \qquad (22)$$
$$\phi(v) = \left[\phi(v_1)\ \cdots\ \phi(v_m)\right]^T,\qquad \phi^{-1}(u) = \left[\phi^{-1}(u_1)\ \cdots\ \phi^{-1}(u_m)\right]$$
where $v \in \mathbb{R}^m$, $\phi \in \mathbb{R}^m$, and $\phi(\cdot)$ is a bounded one-to-one function that belongs to $C^p$ $(p \ge 1)$ and $L_2(\Omega)$. Moreover, it is a monotonic odd function with its first derivative bounded by a constant $M$.
Slide 52
[Figures: Fig. 13 Hyperbolic tangent function, y = tanh(x); Fig. 14 Nonquadratic Cost, W vs. u]
$$W(u) = 2\int_0^u A\tanh^{-T}(v/A)\,R\,dv$$
Neural Network Solution for Fixed-Final time Constrained Optimal Control
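For a scalar input with φ(v) = A tanh(v/A), the integral in W(u) has a closed form, W(u) = 2RA u tanh^{-1}(u/A) + RA^2 ln(1 - u^2/A^2), obtained from the antiderivative v tanh^{-1}(v/A) + (A/2) ln(A^2 - v^2). The sketch below checks that expression against Simpson's rule; the names are hypothetical, and it is valid only for |u| < A.

```python
import math


def w_closed_form(u, bound, r):
    """W(u) = 2 int_0^u A atanh(v/A) R dv for scalar u, with A = bound and |u| < A."""
    if abs(u) >= bound:
        raise ValueError("|u| must be strictly less than the saturation bound A")
    y = u / bound
    return 2.0 * r * bound * u * math.atanh(y) + r * bound ** 2 * math.log1p(-y * y)


def w_simpson(u, bound, r, n=1000):
    """Composite Simpson's rule for the same integral (n must be even)."""
    h = u / n
    total = 0.0
    for i in range(n + 1):
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * bound * math.atanh(i * h / bound)
    return 2.0 * r * total * h / 3.0


if __name__ == "__main__":
    for u in (0.1, 0.5, 0.9, 0.99):
        print(u, w_closed_form(u, 1.0, 1.0), u * u)   # W(u) versus the quadratic R u^2
```

Near u = 0 the cost behaves like R u^2, and it grows faster than any quadratic as |u| approaches A, which is what keeps the optimal control inside the bound.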
Slide 53
When (22) is used, (5) becomes
$$-\frac{\partial V^*(x,t)}{\partial t} = \min_{u(t)}\left( Q(x) + 2\int_0^u \phi^{-T}(v)\,R\,dv + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T \left( f(x) + g(x)u \right) \right)$$
Minimizing the Hamiltonian of the optimal control problem with respect to $u$ gives
$$g^T(x)\frac{\partial V^*(x,t)}{\partial x} + 2R\,\phi^{-1}(u^*) = 0$$
so
$$u^*(x) = -\phi\left(\frac{1}{2}R^{-1}g^T(x)\frac{\partial V^*(x,t)}{\partial x}\right),\qquad u \in U \subset \mathbb{R}^m \qquad (23)$$
Neural Network Solution for Fixed-Final time Constrained Optimal Control
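A hedged Python sketch of (23) with φ(v) = A tanh(v/A), assuming a diagonal R so that R^{-1} acts componentwise; the function name and test values are hypothetical. Large gradients saturate at ±A, while small gradients reproduce the unconstrained law -(1/2) R^{-1} g^T ∂V/∂x.

```python
import math


def constrained_control(dv, g, r_diag, bound):
    """u* = -phi(1/2 R^{-1} g^T dV/dx) with phi(v) = A tanh(v/A), eq. (23).

    dv is dV/dx at (x, t); g is the n x m input matrix at x; r_diag holds the
    diagonal of R (assumed diagonal); bound is the saturation level A.
    """
    m = len(g[0])
    gt_dv = [sum(g[i][k] * dv[i] for i in range(len(dv))) for k in range(m)]
    return [-bound * math.tanh(0.5 * gt_dv[k] / (r_diag[k] * bound)) for k in range(m)]


if __name__ == "__main__":
    g = [[0.0], [1.0]]
    print(constrained_control([0.0, 1e6], g, [1.0], 2.0))    # saturates near -2
    print(constrained_control([0.0, 1e-4], g, [1.0], 2.0))   # about -5e-5
```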
Slide 54
HJB equation
$$HJB(V^*(x,t)) = \frac{\partial V^*(x,t)}{\partial t} + \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T f(x) + 2\int_0^{u^*}\phi^{-T}(v)\,R\,dv - \left(\frac{\partial V^*(x,t)}{\partial x}\right)^T g(x)\,\phi\left(\frac{1}{2}R^{-1}g^T(x)\frac{\partial V^*(x,t)}{\partial x}\right) + Q(x) = 0 \qquad (24)$$
If this HJB equation can be solved for the value function $V(x,t)$, then (23) gives the optimal constrained control.
Neural Network Solution for Fixed-Final time Constrained Optimal Control
Slide 55
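So that
$$\dot{w}_L(t) = -\langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle\nabla\sigma_L f,\ \sigma_L\rangle_\Omega\, w_L(t) - \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\left\langle 2\int_0^{u}\phi^{-T}(v)\,R\,dv,\ \sigma_L\right\rangle_\Omega + \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\left\langle w_L^T(t)\nabla\sigma_L g\,\phi\left(\tfrac{1}{2}R^{-1}g^T\nabla\sigma_L^T w_L(t)\right),\ \sigma_L\right\rangle_\Omega - \langle\sigma_L,\sigma_L\rangle_\Omega^{-1}\langle Q(x),\ \sigma_L\rangle_\Omega \qquad (25)$$
Neural Network Solution for Fixed-Final time Constrained Optimal Control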
Slide 56
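Neural Network Solution for Fixed-Final time Constrained Optimal Control
Optimal Algorithm Based on NN Approximation
Evaluated on the mesh of $p$ points, (25) can be converted to
$$A^TA\,\dot{w}_L(t) + A^TB\,w_L(t) + A^TC - A^TD + A^TE = 0 \qquad (26)$$
where $A$ and $B$ are the mesh matrices of the unconstrained case, and $C$, $D$, and $E$ are the $p$-vectors whose $i$-th entries are $2\int_0^{u(x_i,t)}\phi^{-T}(v)\,R\,dv$, $w_L^T(t)\nabla\sigma_L(x_i)g(x_i)\,\phi\left(\tfrac{1}{2}R^{-1}g^T(x_i)\nabla\sigma_L^T(x_i)w_L(t)\right)$, and $Q(x_i)$, respectively; then
$$\dot{w}_L(t) = -(A^TA)^{-1}A^TB\,w_L(t) - (A^TA)^{-1}A^TC + (A^TA)^{-1}A^TD - (A^TA)^{-1}A^TE \qquad (27)$$
This is a nonlinear ODE that can easily be integrated backward in time using the final condition $w_L(t_f)$ to find the least-squares optimal NN weights.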
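A minimal sketch of the constrained weight ODE (27) for a hypothetical scalar plant dx/dt = a x + b u with Q(x) = q x^2, R = r, φ(v) = A tanh(v/A), basis σ_L = [x^2, x^4], an 8-point mesh on [-1, 1], zero terminal cost, and t_f = 10; these settings are illustrative, not from the slides. The entry of C, W(u), is evaluated through the pre-saturation argument so that tanh^{-1} is never called at ±1. With a loose bound (A = 100), w1(0) should be close to the unconstrained LQ value r(a + sqrt(a^2 + b^2 q/r))/b^2.

```python
import math

# Hypothetical scalar example: dx/dt = a x + b u, Q = q x^2, R = r, w_L(t_f) = 0.
A_SYS, B_SYS, Q_SYS, R_SYS = 1.0, 1.0, 1.0, 1.0
MESH = [s * m for m in (0.25, 0.5, 0.75, 1.0) for s in (-1.0, 1.0)]


def log_cosh(y):
    """Overflow-safe log(cosh(y))."""
    y = abs(y)
    return y + math.log1p(math.exp(-2.0 * y)) - math.log(2.0)


def weight_derivative(w, bound):
    """(27): dw/dt = (A^T A)^{-1} A^T (-B w - C + D - E), accumulated over the mesh."""
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for x in MESH:
        s = [x ** 2, x ** 4]                          # row of A
        dv = 2.0 * w[0] * x + 4.0 * w[1] * x ** 3     # dV_L/dx
        arg = 0.5 * B_SYS * dv / R_SYS                # argument of phi in (23)
        u = -bound * math.tanh(arg / bound)           # constrained control (23)
        # C entry: W(u) = 2 int_0^u A atanh(v/A) R dv, rewritten in terms of arg.
        c_i = (2.0 * R_SYS * bound * arg * math.tanh(arg / bound)
               - 2.0 * R_SYS * bound ** 2 * log_cosh(arg / bound))
        d_i = -dv * B_SYS * u                         # D entry: dV/dx g phi(arg)
        e_i = Q_SYS * x * x                           # E entry: Q(x_i)
        target = -dv * A_SYS * x - c_i + d_i - e_i
        for i in range(2):
            atb[i] += s[i] * target
            for j in range(2):
                ata[i][j] += s[i] * s[j]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    return [(atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det,
            (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det]


def integrate_backward(deriv, w_final, t_final, n_steps):
    """Classical RK4 with a negative step, from t_f down to t_0 = 0."""
    h = -t_final / n_steps
    w = list(w_final)
    for _ in range(n_steps):
        k1 = deriv(w)
        k2 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k1)])
        k3 = deriv([wi + 0.5 * h * ki for wi, ki in zip(w, k2)])
        k4 = deriv([wi + h * ki for wi, ki in zip(w, k3)])
        w = [wi + h / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for wi, p, q, r, s in zip(w, k1, k2, k3, k4)]
    return w


if __name__ == "__main__":
    for a_bound in (100.0, 3.0):
        w0 = integrate_backward(lambda w: weight_derivative(w, a_bound), [0.0, 0.0], 10.0, 2000)
        print(a_bound, w0)
```

The log-cosh form matters once the bound is active: in floating point tanh can round to exactly ±1, where a direct tanh^{-1}(u/A) would fail.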
Slide 57
Neural Network Solution for Fixed-Final time Constrained Optimal Control
Numerical Examples
a) Linear System
$$\dot{x}_1 = 2x_1 + x_2 + x_3,\qquad \dot{x}_2 = x_1 - x_2 + u_2,\qquad \dot{x}_3 = x_3 + u_1 \qquad (28)$$
with control bounds $|u_1| \le 5$ and $|u_2| \le 20$.
To find a nearly optimal time-varying controller, the following smooth function is used to approximate the value function of the system:
$$V(x_1,x_2,x_3,t) = w_1x_1^2 + w_2x_2^2 + w_3x_3^2 + w_4x_1x_2 + w_5x_1x_3 + w_6x_2x_3 \qquad (29)$$
Slide 58
[Figures: Fig. 15 Constrained Linear System Weights (w1 to w6 vs. time); Fig. 16 State Trajectory of Linear System with Bounds (x1, x2, x3 vs. time); Fig. 17 Optimal NN Control Law with Bounds (u1, u2 vs. time)]
Slide 59
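Neural Network Solution for Fixed-Final time Constrained Optimal Control
b) Nonlinear Chained System
$$\dot{x}_1 = u_1,\qquad \dot{x}_2 = u_2,\qquad \dot{x}_3 = x_2u_1 \qquad (30)$$
with control bounds $|u_1| \le 1$ and $|u_2| \le 2$.
Selecting the smooth approximating function
$$V(x_1,x_2,x_3,t) = w_1x_1^2 + w_2x_2^2 + w_3x_3^2 + w_4x_1x_2 + w_5x_1x_3 + w_6x_2x_3 + w_7x_1^4 + w_8x_2^4 + w_9x_3^4 + w_{10}x_1^2x_2^2 + w_{11}x_1^2x_3^2 + w_{12}x_2^2x_3^2 + w_{13}x_1^2x_2x_3 + w_{14}x_1x_2^2x_3 + w_{15}x_1x_2x_3^2 + w_{16}x_1^3x_2 + w_{17}x_1^3x_3 + w_{18}x_1x_2^3 + w_{19}x_1x_3^3 + w_{20}x_2^3x_3 + w_{21}x_2x_3^3 \qquad (31)$$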
Slide 60
Neural Network Solution for Fixed-Final time Constrained Optimal Control
[Figures: Fig. 18 Nonlinear System Weights (NN weights vs. time); Fig. 19 State Trajectory of Nonlinear System (x1, x2, x3 vs. time); Fig. 20 Optimal NN Constrained Control Law (panel "Optimal Control with Constraints", u1, u2 vs. time)]
Slide 61
c) Simulation-Benchmark Problem
[Figures: Fig. 21 State Trajectories r, θ (x1, x3 vs. time in seconds); Fig. 22 State Trajectories ṙ, θ̇ (x2, x4 vs. time in seconds); Fig. 23 Control Input u(t); Fig. 24 Disturbance Attenuation (all panels: Nearly Optimal Controller with Constraints)]
Slide 62
Overview of the Method
Neural networks are used to approximately solve the finite-horizon optimal state feedback control problem.
The method is based on solving a related Hamilton-Jacobi equation of the corresponding finite-horizon problem.
The problem is transformed into solving an ODE backward in time.
The neural network approximation converges uniformly to the value function, and the resulting controller provides closed-loop stability.
The result is a nearly exact feedback controller with time-varying coefficients.
No policy iteration is needed.
Slide 63