Automation & Robotics Research Institute (ARRI), The University of Texas at Arlington
F.L. Lewis, Moncrief-O'Donnell Endowed Chair
Head, Controls & Sensors Group
Talk available online at http://ARRI.uta.edu/acs
Notes on Optimal Control
Supported by: NSF (Paul Werbos), ARO (Randy Zachery)
Draguna Vrabie
Mike Athans
Books: Optimal Control (McGraw-Hill, 1966); Systems, Networks and Computation: Basic Concepts (McGraw-Hill, 1972); Multivariable Methods (McGraw-Hill, 1974)
Third President of the IEEE Control Systems Society, 1972 to 1974
PhD from Berkeley in 1961
More than 40 PhD students (how many?)
Awards
- American Automatic Control Council's Donald P. Eckman Award "for outstanding contributions to the field of automatic control" (first recipient of the award, 1964)
- American Society for Engineering Education's Frederick E. Terman Award as "the outstanding young electrical engineering educator" (first recipient, 1969)
- American Control Council's Education Award for "outstanding contributions and distinguished leadership in automatic control education" (second recipient, 1980)
- Fellow of the IEEE (1973)
- Fellow of the AAAS (1977)
- Distinguished Member of the IEEE Control Systems Society (1983)
- IEEE Control Systems Society's 1993 H.W. Bode Prize, including the delivery of the Bode Plenary Lecture at the 1993 IEEE Conference on Decision and Control
- American Automatic Control Council's Richard E. Bellman Control Heritage Award (1995), "In Recognition of a Distinguished Career in Automatic Control; As a Leader and Champion of Innovative Research; As a Contributor to Fundamental Knowledge in Optimal, Adaptive, Robust, Decentralized and Distributed Control; and as a Mentor to his Students"
Athans & Falb, Optimal Control (McGraw-Hill, 1966): the first book on optimal control?
1. Use Neural Networks as Function Approximators in Adaptive Control
2. Optimal Adaptive Control gives new adaptive control algorithms
Two-layer feedforward static neural network (NN)
[Figure: two-layer NN with inputs x1, ..., xn, a hidden layer of L activation units σ(·), first-layer weights V^T, second-layer weights W^T, and outputs y1, ..., ym.]

Matrix equation: $y = W^T \sigma(V^T x)$

Summation equation: $y_i = \sum_{k=1}^{L} w_{ik}\,\sigma\Big(\sum_{j=1}^{n} v_{kj} x_j + v_{k0}\Big) + w_{i0}, \quad i = 1, \ldots, m$
Two-layer NNs have the universal approximation property and overcome Barron's fundamental accuracy limitation of one-layer NNs.
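As a minimal sketch of the matrix equation above (NumPy, with a sigmoid chosen as the activation and the bias terms folded in by augmenting with a 1; these are assumptions, not the talk's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_output(x, V, W):
    """Two-layer NN: y = W^T sigma(V^T x).
    x: (n,) input; V: (n+1, L) first-layer weights, last row = biases v_k0;
    W: (L+1, m) second-layer weights, last row = biases w_i0."""
    x_aug = np.append(x, 1.0)    # augment input with 1 for the first-layer bias
    z = sigmoid(V.T @ x_aug)     # hidden-layer outputs sigma(V^T x), shape (L,)
    z_aug = np.append(z, 1.0)    # augment with 1 for the output bias
    return W.T @ z_aug           # network outputs, shape (m,)
```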
[Figure: Neural Network Robot Controller. A PD tracking loop (gain term $K_v r$) around the robot system, an NN nonlinear inner loop estimating $\hat f(x)$, a feedforward loop driven by the desired trajectory $q_d, \dot q_d, \ddot q_d$, and a robust control term $v(t)$.]
Universal approximation property.
Problem: the network is nonlinear in the NN weights, so standard proof techniques do not work.
Feedback linearization.
The NN universal basis property means no regression matrix is needed: an extension of adaptive control to nonlinear-in-the-parameters systems.
Theorem 1 (NN Weight Tuning for Stability)
Let the desired trajectory $q_d(t)$ and its derivatives be bounded. Let the initial tracking error be within a certain allowable set $U$. Let $Z_M$ be a known upper bound on the Frobenius norm of the unknown ideal weights $Z$. Take the control input as
$$\tau = \hat W^T \sigma(\hat V^T x) + K_v r - v, \qquad v(t) = -K_Z\big(\|\hat Z\|_F + Z_M\big)\, r .$$
Let weight tuning be provided by
$$\dot{\hat W} = F \hat\sigma r^T - F \hat\sigma' \hat V^T x\, r^T - \kappa F \|r\| \hat W, \qquad \dot{\hat V} = G x \big(\hat\sigma'^T \hat W r\big)^T - \kappa G \|r\| \hat V$$
with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize the weight estimates as $\hat W = 0$, $\hat V = \text{random}$.
Then the filtered tracking error $r(t)$ and NN weight estimates $\hat W, \hat V$ are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains $K_v$.
The first terms are backprop terms (Werbos); the $\kappa$ terms are robustifying terms (e-mod, sigma-mod, etc.).
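A minimal sketch of one Euler step of these tuning laws (the sigmoid activation, the step size, the shapes, and the omission of bias rows are all assumptions; this is not the authors' code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_weight_tuning_step(W, V, x, r, F, G, kappa, dt):
    """One Euler step of the Theorem 1 tuning laws (sigmoid assumed):
       W_dot = F sigma r^T - F sigma' (V^T x) r^T - kappa F ||r|| W
       V_dot = G x (sigma'^T W r)^T - kappa G ||r|| V
    Shapes: x (n,), r (m,), V (n, L), W (L, m), F (L, L), G (n, n)."""
    z = V.T @ x                             # hidden-layer inputs, (L,)
    s = sigmoid(z)                          # sigma, (L,)
    s_prime = np.diag(s * (1.0 - s))        # Jacobian sigma', (L, L)
    norm_r = np.linalg.norm(r)
    W_dot = F @ np.outer(s - s_prime @ z, r) - kappa * norm_r * (F @ W)
    V_dot = G @ np.outer(x, s_prime @ (W @ r)) - kappa * norm_r * (G @ V)
    return W + dt * W_dot, V + dt * V_dot
```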
Extension of adaptive control to nonlinear-in-the-parameters (NLIP) systems. No regression matrix is needed. Extra Jacobian terms arise from the NLIP network.
[Figure: Neural network backstepping controller for a flexible-joint robot arm. A tracking loop (gain $K_r$) with filtered error $r$, NN#1 estimating $\hat F_1(x)$ in a nonlinear feedback-linearization loop, NN#2 estimating $\hat F_2(x)$ in a backstepping loop producing the desired motor current $i_d$, a robust control term $v_i(t)$, and the robot system driven through the motor dynamics $1/(K_B)$ with current $i$; inputs are the desired trajectory $q_d, \dot q_d, \ddot q_d$.]
Backstepping. Advantages over traditional backstepping: no regression functions are needed. Add an extra feedback loop; two NNs are needed; use passivity to show stability.
Flexible & Vibratory Systems
[Figure: mechanical system with tracking loop gain $K_v$, an estimate of the nonlinear function $f(x)$, and an NN deadzone precompensator inverting the deadzone $D(u)$.]
NN in the feedforward loop: deadzone compensation. The compensator acts like a 2-layer NN with enhanced backprop tuning, including a little observer.
Actuator Nonlinearities - Deadzone, saturation, backlash
Critic and actor NN weight-tuning laws, with tuning matrices $T$, $S$, robustifying gains $k_1$, $k_2$, and Jacobian terms $\sigma'(U^T u)$ from the deadzone precompensator.
Cell Homeostasis. The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.
Cellular Metabolism
Permeability control of the cell membrane
http://www.accessexcellence.org/RC/VL/GG/index.html
Optimality in Biological Systems
Optimality in Control Systems Design (R. Kalman, 1960)
Rocket Orbit Injection
http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf
Dynamics:
$$\dot r = w, \qquad \dot w = \frac{v^2}{r} - \frac{\mu}{r^2} + \frac{F}{m}\sin\phi, \qquad \dot v = -\frac{v w}{r} + \frac{F}{m}\cos\phi ,$$
with the mass $m(t)$ decreasing as fuel is burned.
Objectives: get to orbit in minimum time; use minimum fuel.
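A sketch of these dynamics as an ODE right-hand side (the gravitational parameter mu, the exhaust speed c, and the mass-flow model m_dot = -F/c are assumptions taken from the standard orbit-injection problem, not stated explicitly on the slide):

```python
import numpy as np

def orbit_injection_rhs(t, state, F, c, mu, phi):
    """Rocket orbit injection dynamics.
    State: r = radius, w = radial speed, v = tangential speed, m = mass.
    Thrust F applied at angle phi from the local horizontal."""
    r, w, v, m = state
    r_dot = w
    w_dot = v**2 / r - mu / r**2 + (F / m) * np.sin(phi)
    v_dot = -v * w / r + (F / m) * np.cos(phi)
    m_dot = -F / c                       # assumed constant-thrust fuel burn
    return [r_dot, w_dot, v_dot, m_dot]

# e.g. scipy.integrate.solve_ivp(orbit_injection_rhs, (0, tf), x0,
#                                args=(F, c, mu, phi)) for a fixed phi
```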
Adaptive control is generally not optimal.
Optimal control is off-line, and needs to know the system dynamics to solve the design equations.
We want ONLINE ADAPTIVE OPTIMAL control.
Continuous-Time Optimal Control

System: $\dot x = f(x, u)$

Cost: $V(x(t)) = \int_t^\infty r(x, u)\, d\tau = \int_t^\infty \big(Q(x) + u^T R u\big)\, d\tau$

Hamiltonian (Bellman): $0 = r(x, u) + \frac{\partial V}{\partial x}^T f(x, u) \equiv H\big(x, \frac{\partial V}{\partial x}, u\big), \qquad V(0) = 0$

Optimal cost: $0 = \min_{u(t)} \Big[ r(x, u) + \frac{\partial V^*}{\partial x}^T f(x, u) \Big]$

Optimal control: $h(x(t)) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}$

HJB equation: $0 = Q(x) + \frac{\partial V^*}{\partial x}^T f(x) - \tfrac{1}{4} \frac{\partial V^*}{\partial x}^T g R^{-1} g^T \frac{\partial V^*}{\partial x}, \qquad V^*(0) = 0$
For a given control, the cost satisfies $0 = r(x, u) + \frac{\partial V}{\partial x}^T f(x, u)$; in LQR, this is a Lyapunov equation. Minimizing over the control, $0 = \min_{u(t)} \big[ r(x, u) + \frac{\partial V^*}{\partial x}^T f(x, u) \big]$ gives the HJB equation; in LQR, this is a Riccati equation.
Linear system, quadratic cost:

System: $\dot x = A x + B u$
Utility: $r(x, u) = x^T Q x + u^T R u, \quad Q \ge 0,\; R > 0$
The cost is quadratic: $V(x(t)) = \int_t^\infty r(x, u)\, d\tau = x^T(t) P x(t)$
Optimal control (state feedback): $u(t) = -R^{-1} B^T P x(t) = -L x(t)$
HJB equation is the algebraic Riccati equation (ARE): $0 = A^T P + P A + Q - P B R^{-1} B^T P$
Full system dynamics must be known. Off-line solution.
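To make the "off-line, dynamics-known" point concrete, a minimal SciPy sketch (the plant matrices here are assumed example values):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # example plant (assumed values)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

# Solves A^T P + P A + Q - P B R^{-1} B^T P = 0 -- requires knowing A and B.
P = solve_continuous_are(A, B, Q, R)
L = np.linalg.solve(R, B.T @ P)            # optimal gain; control u = -L x
```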
CT Policy Iteration

Utility: $r(x, u) = Q(x) + u^T R u$

Pick a stabilizing initial control. Then iterate:

1. Find the cost (policy evaluation). For the current policy $h_k$, solve the nonlinear Lyapunov equation
$$0 = r(x, h_k(x)) + \frac{\partial V_k}{\partial x}^T f(x, h_k(x)), \qquad V_k(0) = 0 .$$
2. Update the control (policy improvement):
$$h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_k}{\partial x} .$$

Convergence was proved by Saridis (1979) if the Lyapunov equation is solved exactly. Beard & Saridis used complicated Galerkin integrals to solve the Lyapunov equation. Abu-Khalaf & Lewis used NNs to approximate $V$ for nonlinear systems and proved convergence.

Full system dynamics must be known. Off-line solution.
To avoid solving the HJB equation directly, use policy iteration. For LQR, policy iteration is the Kleinman algorithm.
LQR policy iteration (Kleinman 1968):

1. For a given control policy $u_k = -L_k x$, solve for the cost (a Lyapunov equation):
$$0 = A_k^T P_k + P_k A_k + C^T C + L_k^T R L_k, \qquad A_k = A - B L_k$$
2. Improve the policy:
$$L_{k+1} = R^{-1} B^T P_k$$

If started with a stabilizing control policy $L_0$, the matrix $P_k$ monotonically converges to the unique positive definite solution of the Riccati equation. Every iteration step returns a stabilizing controller. The system has to be known.
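A minimal sketch of Kleinman's iteration with SciPy's Lyapunov solver (providing a stabilizing L0 is the caller's responsibility):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman(A, B, Q, R, L0, iters=20):
    """Kleinman (1968) policy iteration for the CT ARE.
    L0 must be stabilizing; P_k then converges monotonically."""
    L = L0
    for _ in range(iters):
        Ak = A - B @ L
        # Policy evaluation: Ak^T P + P Ak + Q + L^T R L = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + L.T @ R @ L))
        # Policy improvement
        L = np.linalg.solve(R, B.T @ P)
    return P, L
```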
Policy Iteration Solution
( ) T TRic P A P PA Q PBB P + +
1 1( ) ( ) 0T T T T
i i i i i iA BB P P P A BB P PBB P Q+ + + + + =
( ) 11 ( ), 0,1,ii i P iP P Ric Ric P i+ = =
Policy iteration
This is in fact a Newtons Method
Then, Policy Iteration is
Frechet Derivative
)()()(' iTT
iT
P PBBAPPPBBAPRic i +
Policy Iterations without Lyapunov Equations

Dynamic programming, built on Bellman's optimality principle, gives an alternative form for CT systems [Athans & Falb 1966, Lewis & Syrmos 1995]:
$$V^*(x(t)) = \min_{u(\tau),\; t \le \tau < t+\Delta t} \Big[ \int_t^{t+\Delta t} r(x(\tau), u(\tau))\, d\tau + V^*(x(t + \Delta t)) \Big]$$
$$r(x(\tau), u(\tau)) = x^T(\tau) Q x(\tau) + u^T(\tau) R u(\tau)$$
Draguna Vrabie
f(x) and g(x) do not appear
Solving for the cost: our approach. For a given control $u = -L x$, the cost satisfies
$$V(x(t)) = \int_t^{t+T} \big(x^T Q x + u^T R u\big)\, d\tau + V(x(t+T)) .$$
(Draguna Vrabie; f(x) and g(x) do not appear.)

LQR case: the cost is $V(x) = x^T P x$, so
$$x^T(t) P x(t) = \int_t^{t+T} \big(x^T Q x + u^T R u\big)\, d\tau + x^T(t+T) P x(t+T) ,$$
and the optimal gain is $L = R^{-1} B^T P$.
1. Policy iteration

Cost update:
$$V_{k+1}(x(t)) = \int_t^{t+T} \big(x^T Q x + u_k^T R u_k\big)\, d\tau + V_{k+1}(x(t+T))$$
Control gain update:
$$L_{k+1} = R^{-1} B^T P_{k+1}$$
An initial stabilizing control is needed. (Draguna Vrabie)

For the LQR case, with control policy $u_k(t) = -L_k x(t)$, the cost update reads
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_{k+1} x(t+T) .$$
A and B do not appear in the cost update; B is needed for the control update.
Theorem (D. Vrabie). This algorithm converges and is equivalent to Kleinman's algorithm, i.e. to the Newton iteration $P_{i+1} = P_i - (\mathrm{Ric}'_{P_i})^{-1} \mathrm{Ric}(P_i)$, $i = 0, 1, \ldots$

Lemma 1. The cost update
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)$$
is equivalent to the underlying Lyapunov equation
$$(A - B R^{-1} B^T P_k)^T P_{k+1} + P_{k+1} (A - B R^{-1} B^T P_k) + P_k B R^{-1} B^T P_k + Q = 0 .$$
Proof: along the closed-loop trajectories,
$$\frac{d}{d\tau}\big(x^T P_i x\big) = x^T \big(A_i^T P_i + P_i A_i\big) x = -x^T \big(Q + K_i^T R K_i\big) x ,$$
so
$$\int_t^{t+T} x^T \big(Q + K_i^T R K_i\big) x\, d\tau = -\int_t^{t+T} d\big(x^T P_i x\big) = x^T(t) P_i x(t) - x^T(t+T) P_i x(t+T) .$$
With $L_k = R^{-1} B^T P_k$, this solves the Lyapunov equation without knowing A or B. Only B is needed.
Critic update:
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)$$
Use the Kronecker product identity $\mathrm{vec}(ABC) = (C^T \otimes A)\, \mathrm{vec}(B)$ to set this up as
$$p_{k+1}^T \big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau \equiv d(t, t+T) ,$$
where $\bar x = x \otimes x$ is the quadratic basis set and $d(t, t+T)$ is the reinforcement on the time interval $[t, t+T]$. Now use RLS along the trajectory to get the new weights $p_{k+1}$, unpack the weights into the matrix $P_{k+1}$, and then find the updated feedback gain $L_{k+1} = R^{-1} B^T P_{k+1}$.
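A sketch of the quadratic basis and the weight unpacking; doubling the off-diagonal products so that $p^T \bar x = x^T P x$ is one common convention, assumed here:

```python
import numpy as np

def quad_basis(x):
    """Quadratic basis: the n(n+1)/2 independent entries of x x^T, with
    off-diagonal products doubled so that p.T @ quad_basis(x) = x.T @ P @ x."""
    n = len(x)
    scaled = np.outer(x, x) * (2.0 - np.eye(n))   # double off-diagonals
    return scaled[np.triu_indices(n)]

def unpack_P(p, n):
    """Rebuild the symmetric kernel matrix P from the critic weights p."""
    P = np.zeros((n, n))
    P[np.triu_indices(n)] = p
    return P + np.triu(P, 1).T                    # mirror strict upper part
```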
Algorithm Implementation

1. Select an initial stabilizing control policy $u_0 = -L_0 x$.
2. Find the associated cost from the quadratic regression vector:
$$p_{k+1}^T \big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau \equiv d(t, t+T)$$
This solves the Lyapunov equation without knowing the dynamics.
3. Improve the control: $L_{k+1} = R^{-1} B^T P_{k+1}$.

Over each interval $[t, t+T]$: apply $u_k(t) = -L_k x(t)$, observe $x(t)$, observe the cost integral, observe $x(t+T)$, and update $p$; do RLS until convergence to $P_{k+1}$, then update the control gain to $L_{k+1}$.

Measure the cost increment (reinforcement) by adding V as a state; then $\dot V = x^T Q x + u^{kT} R u^k$.

A is not needed anywhere; a simulation sketch follows below.
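A self-contained simulation sketch of the loop, under assumed example values and reusing the quad_basis/unpack_P helpers above; note that A appears only inside the plant simulator, never in the learning equations (excitation and reset details are glossed over):

```python
import numpy as np
from scipy.integrate import solve_ivp

def irl_policy_iteration(A, B, Q, R, L0, x0, T=0.05, samples=15, iters=10):
    """Online IRL policy iteration (Vrabie). A is only used to simulate
    the plant; the critic/actor updates use measured data and B alone."""
    n = A.shape[0]
    L, x = L0, x0
    for _ in range(iters):
        Phi, d = [], []
        for _ in range(samples):
            def rhs(t, z):                      # plant + running-cost state
                xz, u = z[:n], -L @ z[:n]
                return np.concatenate(
                    [A @ xz + B @ u, [xz @ Q @ xz + u @ R @ u]])
            z = solve_ivp(rhs, (0, T), np.concatenate([x, [0.0]])).y[:, -1]
            x_next, cost = z[:n], z[n]
            Phi.append(quad_basis(x) - quad_basis(x_next))  # regression row
            d.append(cost)                                  # reinforcement
            x = x_next
        p, *_ = np.linalg.lstsq(np.array(Phi), np.array(d), rcond=None)
        P = unpack_P(p, n)
        L = np.linalg.solve(R, B.T @ P)         # only B is needed here
    return P, L
```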
Algorithm Implementation

The critic update
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_{k+1} x(t+T)$$
can be set up as
$$p_{k+1}^T \big[\bar x(t) - \bar x(t+T)\big] = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau \equiv d(\bar x(t), L_k) ,$$
where $\bar x = x \otimes x$ is the quadratic basis set. Evaluating this at $N \ge n(n+1)/2$ trajectory points, one can set up a least-squares problem to solve for the weights:
$$p_{k+1} = (X X^T)^{-1} X Y, \qquad X = [\bar x_\Delta(t_1)\; \bar x_\Delta(t_2)\; \cdots\; \bar x_\Delta(t_N)], \qquad Y = [d(\bar x(t_1), L_k)\; \cdots\; d(\bar x(t_N), L_k)]^T ,$$
with $\bar x_\Delta(t_i) = \bar x(t_i) - \bar x(t_i + T)$. Or use a batch least-squares solution along the trajectory.
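A sketch of that batch solve, following the normal equations $p = (X X^T)^{-1} X Y$ (the helper name and the data layout are assumptions):

```python
import numpy as np

def batch_critic_update(basis_diffs, reinforcements):
    """Batch LS critic update: solve p^T (xbar(t_i) - xbar(t_i + T)) = d_i
    over N >= n(n+1)/2 intervals via p = (X X^T)^{-1} X Y."""
    X = np.column_stack(basis_diffs)      # (n(n+1)/2, N) regression matrix
    Y = np.asarray(reinforcements)        # (N,) measured reinforcements
    return np.linalg.solve(X @ X.T, X @ Y)
```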
Direct Optimal Adaptive Controller
A hybrid continuous/discrete dynamic controller whose internal state is the observed cost over the interval
Draguna Vrabie
[Diagram: the system $\dot x = A x + B u$, $x_0$, with a cost integrator $\dot V = x^T Q x + u^T R u$ feeding the critic; the critic updates the actor (feedback gain $L_k$) through a zero-order hold with sample period T.]
$$u_k(t) = -L_k x(t)$$
Continuous-time control with discrete gain updates: the gain $L_k$ is piecewise constant over the sample periods $k = 0, 1, 2, \ldots$ The sample periods need not be the same; they can be selected on-line in real time.
2. CT ADP greedy iteration

Control policy: $u_k(t) = -L_k x(t)$

Cost update:
$$V_{k+1}(x(t)) = \int_t^{t+T} \big(x^T Q x + u_k^T R u_k\big)\, d\tau + V_k(x(t+T))$$
Control gain update:
$$L_{k+1} = R^{-1} B^T P_{k+1}$$
No initial stabilizing control is needed. (Draguna Vrabie)

For the LQR case:
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_k x(t+T)$$
A and B do not appear in the cost update; B is needed for the control update.
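A sketch of one greedy (value-iteration) critic regression pair, reusing the quad_basis helper above; unlike policy iteration, the known previous weights $p_k$ move to the right-hand side:

```python
import numpy as np

def greedy_critic_pair(p_k, x_t, x_tT, reinforcement):
    """One (regressor, target) pair for CT ADP greedy iteration:
    p_{k+1}^T xbar(t) = d(t, t+T) + p_k^T xbar(t+T)."""
    return quad_basis(x_t), reinforcement + p_k @ quad_basis(x_tT)
```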
Direct Optimal Adaptive Control for Partially Unknown CT Systems

The critic update
$$x^T(t) P_{k+1} x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + x^T(t+T) P_k x(t+T)$$
is set up, using the Kronecker product identity $\mathrm{vec}(ABC) = (C^T \otimes A)\, \mathrm{vec}(B)$, as
$$p_{k+1}^T \bar x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + p_k^T \bar x(t+T) ,$$
where $\bar x = x \otimes x$ is the quadratic basis set. Now use RLS along the trajectory to get the new weights $p_{k+1}$, unpack the weights into the matrix $P_{k+1}$, and then find the updated feedback gain $L_{k+1} = R^{-1} B^T P_{k+1}$.
Algorithm Implementation

1. Select any control policy $u_k(t) = -L_k x(t)$ (not necessarily stabilizing).
2. Find the associated cost from the previous weights $p_k$ and the regression vector $\bar x(t)$:
$$p_{k+1}^T \bar x(t) = \int_t^{t+T} x^T \big(Q + L_k^T R L_k\big) x\, d\tau + p_k^T \bar x(t+T)$$
This solves for the cost update without knowing the dynamics.
3. Improve the control: $L_{k+1} = R^{-1} B^T P_{k+1}$.

Over each interval $[t, t+T]$: apply $u_k$, observe $x(t)$, observe the cost integral, observe $x(t+T)$, and update $p$; do RLS until convergence to $P_{k+1}$, then update the control gain to $L_{k+1}$.

Measure the cost increment by adding V as a state; then $\dot V = x^T Q x + u^{kT} R u^k$.

No initial stabilizing control is needed. A is not needed anywhere.
Direct Optimal Adaptive Controller
A hybrid continuous/discrete dynamic controller whose internal state is the observed value over the interval
Draguna Vrabie
[Diagram: the same actor/critic structure as above, with the system $\dot x = A x + B u$, cost integrator $\dot V = x^T Q x + u^T R u$, critic, actor, ZOH, and feedback gain $L_k$.]
It has a different critic cost update; no initial stabilizing gain is needed.
Analysis of the Algorithm (Draguna Vrabie)

For a given control policy $u_k(t) = -L_k x(t)$, $L_k = R^{-1} B^T P_k$, the greedy update
$$V_{k+1}(x(t)) = \int_t^{t+T} \big(x^T Q x + u_k^T R u_k\big)\, d\tau + V_k(x(t+T)), \qquad V_0 = 0$$
is equivalent to
$$P_{k+1} = \int_0^{T} e^{A_k^T t} \big(Q + L_k^T R L_k\big) e^{A_k t}\, dt + e^{A_k^T T} P_k\, e^{A_k T}, \qquad A_k = A - B R^{-1} B^T P_k :$$
a strange pseudo-discretized Riccati equation (c.f. the DT Riccati difference equation $P_{k+1} = A^T P_k A + Q - A^T P_k B (B^T P_k B + R)^{-1} B^T P_k A$).

Lemma 2. CT HDP is equivalent to
$$P_{k+1} = P_k + \int_0^{T} e^{A_k^T t} \big(A_k^T P_k + P_k A_k + Q + L_k^T R L_k\big) e^{A_k t}\, dt, \qquad A_k = A - B R^{-1} B^T P_k .$$
This extra term means the initial control action need not be stabilizing. ADP solves the CT ARE without knowledge of the system dynamics A. When ADP converges, the resulting P satisfies the continuous-time ARE!
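A sketch of an offline check of that claim (the check itself uses A, which the online algorithm never needed):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def verify_against_are(P_learned, A, B, Q, R):
    """Return the CT ARE residual of the learned P and its distance
    from the exact ARE solution."""
    residual = (A.T @ P_learned + P_learned @ A + Q
                - P_learned @ B @ np.linalg.solve(R, B.T @ P_learned))
    P_are = solve_continuous_are(A, B, Q, R)
    return np.linalg.norm(residual), np.linalg.norm(P_learned - P_are)
```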
Direct OPTIMAL ADAPTIVE CONTROL
Solve the Riccati Equation WITHOUT knowing the plant dynamics
Model-free ADP
Works for nonlinear systems: a neural network is used to approximate the cost.
Open questions: robustness? comparison with adaptive control methods?
Policy Evaluation (Critic Update). Let K be any state-feedback gain for the system (1). One can measure the associated cost over the infinite time horizon as
$$V(t, x(t)) = \int_t^{t+T} x^T \big(Q + K^T R K\big) x\, d\tau + W(t+T, x(t+T)) ,$$
where $W(t+T, x(t+T))$ is an initial infinite-horizon cost-to-go. This is what to do about the tail, an issue in receding-horizon control.
[Plots: control signal, controller parameters, and system states versus time, 0 to 2 s; critic parameters P(1,1), P(1,2), P(2,2) versus time, 0 to 6 s, each converging to its optimal value.]
Simulations on: F-16 autopilot; load-frequency control for a power system. The A matrix is not needed. The critic parameters converge to the steady-state Riccati equation solution.
The Nonlinear Case Works Too

System: $\dot x = f(x) + g(x) u$
Cost: $V(x(t)) = \int_t^\infty r(x, u)\, dt = \int_t^\infty \big(Q(x) + u^T R u\big)\, dt$

Policy iteration:
$$V_{k+1}(x(t)) = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, dt + V_{k+1}(x(t+T)) \qquad \text{(f(x) not needed)}$$
$$u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x}$$

Prove this is equivalent to
$$0 = \frac{\partial V_{k+1}}{\partial x}^T \big(f(x) + g(x) h_k(x)\big) + r(x, h_k(x)), \qquad h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x} \qquad \text{(f(x) needed)} ,$$
which is known to converge.
Approximate the value by a neural network, $V = W^T \phi(x)$. The cost update becomes
$$W_{k+1}^T \phi(x(t)) = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, dt + W_{k+1}^T \phi(x(t+T)) ,$$
i.e.
$$W_{k+1}^T \big[\phi(x(t)) - \phi(x(t+T))\big] = \int_t^{t+T} \big(Q(x) + u_k^T R u_k\big)\, dt ,$$
with regression vector $\phi(x(t)) - \phi(x(t+T))$ and the reinforcement on the time interval $[t, t+T]$ on the right. Now use RLS along the trajectory to get the new weights $W_{k+1}$. Then find the updated feedback control:
$$u_{k+1} = h_{k+1}(x) = -\tfrac{1}{2} R^{-1} g^T(x) \frac{\partial V_{k+1}}{\partial x} = -\tfrac{1}{2} R^{-1} g^T(x) \nabla\phi^T(x) W_{k+1}$$

Algorithm Implementation: a sketch follows below.
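A sketch of the two NN-critic pieces (phi, grad_phi, and g are user-supplied callables; the names are hypothetical):

```python
import numpy as np

def critic_regression_pair(phi, x_t, x_tT, reinforcement):
    """One (regressor, target) pair for W_{k+1}^T [phi(x(t)) - phi(x(t+T))]
    = integral of Q(x) + u^T R u over [t, t+T]; solve by RLS or batch LS."""
    return phi(x_t) - phi(x_tT), reinforcement

def improved_policy(W, grad_phi, g, R_inv, x):
    """Actor update u_{k+1}(x) = -1/2 R^{-1} g(x)^T grad_phi(x)^T W,
    where grad_phi(x) is the (N_basis, n) Jacobian of the basis."""
    return -0.5 * R_inv @ g(x).T @ (grad_phi(x).T @ W)
```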
(Recap: the continuous-time optimal control equations above: system, cost, Hamiltonian, optimal cost, optimal control, HJB.)
c.f. the DT value recursion $V_h(x_k) = r(x_k, h(x_k)) + V_h(x_{k+1})$, where f(·) and g(·) do not appear.