
Stochastic Optimal Control
Robert Stengel

Optimal Control and Estimation, MAE 546
Princeton University, 2012

Copyright 2012 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

- Nonlinear systems with random inputs and perfect measurements
- Nonlinear systems with random inputs and imperfect measurements
- Certainty equivalence and separation
- Stochastic neighboring-optimal control
- Linear-quadratic-Gaussian (LQG) control

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

$$\dot{x}(t) = f[x(t), u(t), w(t), t], \qquad z(t) = x(t)$$

$$E[x(0)] = \bar{x}(0); \qquad E\left\{[x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T\right\} = 0$$

$$E[w(t)] = 0; \qquad E\left[w(t)w^T(\tau)\right] = W(t)\,\delta(t - \tau)$$

Assume that random disturbance effects are small and additive:

$$\dot{x}(t) = f[x(t), u(t), t] + L(t)w(t)$$

Cost Must Be an Expected Value

A deterministic cost function cannot be minimized because:
- the disturbance effect on the state cannot be predicted
- the state and control have become random variables

$$\min_{u(t)} J = \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)]\,dt$$

However, the expected value of a deterministic cost function can be minimized:

$$\min_{u(t)} J = E\left\{\phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)]\,dt\right\}$$

Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

$$1)\quad E[\lambda(t_f)] = E\left\{\left[\frac{\partial \phi[x(t_f)]}{\partial x}\right]^T\right\}$$

$$2)\quad E[\dot{\lambda}(t)] = -E\left\{\left[\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x}\right]^T\right\}$$

$$3)\quad E\left\{\frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u}\right\} = 0$$


Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality.

Optimal expected value function at $t_1$:

$$V^*(t_1) = E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\} = \min_u E\left\{\phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)]\,d\tau\right\}$$

Rate of Change of the Value Function

Total time derivative of $V^*$:

$$\left.\frac{dV^*}{dt}\right|_{t=t_1} = -E\left\{L[x^*(t_1), u^*(t_1)]\right\}$$

$x(t)$ and $u(t)$ can be known precisely; therefore,

$$\left.\frac{dV^*}{dt}\right|_{t=t_1} = -L[x^*(t_1), u^*(t_1)]$$

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

$$\frac{dV^*}{dt} = E\left[\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\dot{x}\right]$$

Expand the incremental change in the value function, $\Delta V^*$, to second degree:

$$\Delta V^* = \frac{dV^*}{dt}\Delta t = E\left[\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\dot{x}\,\Delta t + \frac{1}{2}\dot{x}^T\frac{\partial^2 V^*}{\partial x^2}\dot{x}\,\Delta t^2 + \cdots\right]$$

$$= E\left[\frac{\partial V^*}{\partial t}\Delta t + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big)\Delta t + \frac{1}{2}\big(f(\cdot) + Lw(\cdot)\big)^T\frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\Delta t^2 + \cdots\right]$$

Introduction of the Trace

The trace of a matrix product is invariant under cyclic permutation, and a scalar quadratic form equals its own trace (a quick numerical check follows below):

$$\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$$

$$\mathrm{Tr}\left(x^TQx\right) = \mathrm{Tr}\left(xx^TQ\right) = \mathrm{Tr}\left(Qxx^T\right); \qquad \dim[\mathrm{Tr}(\cdot)] = 1 \times 1$$

Divide $\Delta V^*$ by $\Delta t$ (cancel $\Delta t$) and apply the trace:

$$\frac{dV^*}{dt} \approx E\left[\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big) + \frac{1}{2}\mathrm{Tr}\left(\big(f(\cdot) + Lw(\cdot)\big)^T\frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\right)\Delta t\right]$$

$$= E\left[\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big) + \frac{1}{2}\mathrm{Tr}\left(\frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\big(f(\cdot) + Lw(\cdot)\big)^T\right)\Delta t\right]$$
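As a quick numerical sanity check (not from the original slides), the cyclic-permutation identities can be verified with NumPy; the matrix sizes below are arbitrary:

```python
import numpy as np

# Random conformable matrices (sizes chosen arbitrarily for the check)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

# Tr(ABC) = Tr(CAB) = Tr(BCA): trace is invariant under cyclic permutation
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
assert np.allclose([t1, t2], [t3, t3])

# Tr(x^T Q x) = Tr(Q x x^T): lets the scalar quadratic form be written
# in terms of the outer product x x^T, as used in the derivation above
x = rng.standard_normal((4, 1))
Q = rng.standard_normal((4, 4))
assert np.isclose((x.T @ Q @ x).item(), np.trace(Q @ x @ x.T))
```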


Toward the Stochastic HJB Equation

$$\frac{dV^*}{dt} = E\left[\frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}\big(f(\cdot) + Lw(\cdot)\big) + \frac{1}{2}\mathrm{Tr}\left(\frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\big(f(\cdot) + Lw(\cdot)\big)^T\right)\Delta t\right]$$

$$= \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + E\left[\frac{\partial V^*}{\partial x}Lw(\cdot)\right] + \frac{1}{2}E\left[\mathrm{Tr}\left(\frac{\partial^2 V^*}{\partial x^2}\big(f(\cdot) + Lw(\cdot)\big)\big(f(\cdot) + Lw(\cdot)\big)^T\right)\Delta t\right]$$

... because $x(t)$ and $u(t)$ can be measured, their expectations are the measured values themselves.

Toward the Stochastic HJB Equation (continued)

The disturbance is assumed to be zero-mean white noise:

$$E[w(t)] = 0; \qquad E\left[w(t)w^T(\tau)\right] = W(t)\,\delta(t - \tau)$$

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x}f(\cdot) + \frac{1}{2}\lim_{\Delta t \to 0}\mathrm{Tr}\left\{\frac{\partial^2 V^*}{\partial x^2}\left[E\left(f(\cdot)f^T(\cdot)\right)\Delta t + L\,E\left(w(\cdot)w^T(\cdot)\right)L^T\Delta t\right]\right\}$$

In the limit, $E(ff^T)\Delta t \to 0$, while the white-noise term satisfies $E(ww^T)\Delta t \to W(t)$, leaving

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

The uncertain disturbance input can only increase the rate of change of the value function.

Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

Boundary (terminal) condition: $V^*(t_f) = E[\phi(t_f)]$

This follows from the total derivative

$$\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t)\,f(\cdot) + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]$$

- Substitute for the total derivative, $dV^*/dt = -L(x^*, u^*)$
- Solve for the partial derivative, $\partial V^*/\partial t$
- The result is the Stochastic HJB Equation

Observations on the Stochastic Principle of Optimality (Perfect Measurements)

$$\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t)\,f[x^*(t), u(t), t] + \frac{1}{2}\mathrm{Tr}\left[\frac{\partial^2 V^*}{\partial x^2}(t)\,L(t)W(t)L^T(t)\right]\right\}$$

- Control has no effect on the disturbance input
- The criterion for optimality is the same as in the deterministic case
- Disturbance uncertainty increases the magnitude of the total optimal value function, V*(0) (illustrated numerically below)
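To illustrate these observations, here is a minimal discrete-time scalar analogue (all numbers assumed, not from the slides). The backward Riccati recursion that produces the feedback gains never involves the disturbance variance, so the control law is unchanged by the noise, while the averaged stochastic cost exceeds the deterministic one:

```python
import numpy as np

# Scalar discrete-time LQ problem: x[k+1] = a x[k] + b u[k] + w[k]
a, b, q, r, s_f = 1.02, 0.1, 1.0, 0.5, 1.0
N, W = 50, 0.04          # horizon length and disturbance variance (assumed)

# Backward Riccati recursion: the white disturbance never appears,
# so the feedback gains are the same with or without noise
s = s_f
gains = np.zeros(N)
for k in reversed(range(N)):
    c = b * s * a / (r + b * s * b)   # u[k] = -c x[k]
    gains[k] = c
    s = q + a * s * a - a * s * b * c

def rollout(rng, noisy, x0=1.0):
    """Simulate the closed loop and accumulate the quadratic cost."""
    x, J = x0, 0.0
    for k in range(N):
        u = -gains[k] * x
        J += 0.5 * (q * x**2 + r * u**2)
        w = rng.normal(0.0, np.sqrt(W)) if noisy else 0.0
        x = a * x + b * u + w
    return J + 0.5 * s_f * x**2

rng = np.random.default_rng(1)
J_det = rollout(rng, noisy=False)
J_stoch = np.mean([rollout(rng, noisy=True) for _ in range(2000)])
print(f"deterministic cost {J_det:.3f}, mean stochastic cost {J_stoch:.3f}")
# The expected stochastic cost exceeds the deterministic cost, while the
# minimizing control law is unchanged, matching the observations above.
```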


Information Sets and Expected Cost

Sigma algebra (Wikipedia definitions):
- The collection of sets over which a measure is defined
- The collection of events that can be assigned probabilities
- A measurable space

The Information Set, I

Information available at the current time, $t_1$:
- All measurements from the initial time, $t_0$
- All control commands from the initial time

$$I[t_0, t_1] = \left\{z[t_0, t_1],\ u[t_0, t_1]\right\}$$

... plus the available model structure, parameters, and statistics:

$$I[t_0, t_1] = \left\{z[t_0, t_1],\ u[t_0, t_1],\ f(\cdot),\ Q,\ R,\ \ldots\right\}$$

A Derived Information Set, I_D

Measurements may be directly useful, e.g.,
- Displays
- Simple feedback control

... or they may require processing, e.g.,
- Transformation
- Estimation

Example of a derived information set: the history of the mean and covariance from a state estimator:

$$I_D[t_0, t_1] = \left\{\hat{x}[t_0, t_1],\ P[t_0, t_1],\ u[t_0, t_1]\right\}$$

Additional Derived Information Sets

Markov derived information set: the most current mean and covariance from a state estimator:

$$I_{MD}(t_1) = \left\{\hat{x}(t_1),\ P(t_1),\ u(t_1)\right\}$$

Multiple-model derived information set: parallel estimates of the current mean, covariance, and hypothesis probability mass function (sketched as data structures below):

$$I_{MM}(t_1) = \left\{\left[\hat{x}_A(t_1), P_A(t_1), u(t_1), \Pr(H_A)\right],\ \left[\hat{x}_B(t_1), P_B(t_1), u(t_1), \Pr(H_B)\right],\ \ldots\right\}$$
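As an illustration (not from the slides), these derived information sets map naturally onto simple data structures; the class and field names here are invented for the sketch:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MarkovInfoSet:
    """I_MD(t1): most recent conditional mean, covariance, and control."""
    x_hat: np.ndarray   # conditional mean estimate of the state
    P: np.ndarray       # conditional state covariance
    u: np.ndarray       # most recent control command

@dataclass
class MultipleModelInfoSet:
    """I_MM(t1): one (mean, covariance, control) tuple per hypothesis."""
    hypotheses: dict[str, MarkovInfoSet] = field(default_factory=dict)
    prob: dict[str, float] = field(default_factory=dict)  # Pr(H_A), Pr(H_B), ...

# Example: two hypotheses, A and B, with identical controls
mA = MarkovInfoSet(np.zeros(2), np.eye(2), np.zeros(1))
mB = MarkovInfoSet(np.ones(2), 2 * np.eye(2), np.zeros(1))
I_MM = MultipleModelInfoSet({"A": mA, "B": mB}, {"A": 0.7, "B": 0.3})
```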


Required and Available Information Sets for Optimal Control

- Optimal control requires propagation of information back from the final time
- Hence, it requires the entire information set, extending from $t_0$ to $t_f$:

$$I[t_0, t_f]$$

- Separate the information set into knowable and predictable parts:

$$I[t_0, t_f] = I[t_0, t_1] + I[t_1, t_f]$$

- Knowable information has been received
- Predictable information is yet to come

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

$$E[x(t) \mid I_D] = \hat{x}(t)$$

$$E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = P(t)$$

... where the conditional expected values are obtained from a Kalman-Bucy filter (a minimal sketch follows below).
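A minimal Euler-integrated Kalman-Bucy filter sketch. The matrices, the noise intensities, and the position-only measurement model $z = Hx + n$ are assumptions for illustration, not from the slides:

```python
import numpy as np

# Assumed linear system: z = H x + n, measurement-noise spectral density N,
# disturbance spectral density W
F = np.array([[0.0, 1.0], [-1.0, -0.4]])   # system dynamics
G = np.array([[0.0], [1.0]])               # control effect
L = np.array([[0.0], [1.0]])               # disturbance input
H = np.array([[1.0, 0.0]])                 # position-only measurement
W = np.array([[0.1]])                      # disturbance spectral density
N = np.array([[0.01]])                     # measurement-noise density

def kalman_bucy_step(x_hat, P, u, z, dt):
    """One Euler step of the conditional mean and covariance propagation."""
    K = P @ H.T @ np.linalg.inv(N)                           # filter gain
    x_hat = x_hat + dt * (F @ x_hat + G @ u + K @ (z - H @ x_hat))
    P = P + dt * (F @ P + P @ F.T + L @ W @ L.T - K @ N @ K.T)
    return x_hat, P
```

Calling `kalman_bucy_step` repeatedly yields the histories $\hat{x}[t_0, t_1]$ and $P[t_0, t_1]$ that populate the derived information set $I_D$.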

Dependence of the Stochastic Cost Function on the Information Set

Expand the state covariance:

$$J = \frac{1}{2}E\left\{E\left[\mathrm{Tr}\left(S(t_f)x(t_f)x^T(t_f)\right) \mid I_D\right] + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Qx(t)x^T(t)\right]\right\}dt + \int_0^{t_f} E\left\{\mathrm{Tr}\left[Ru(t)u^T(t)\right]\right\}dt\right\}$$

$$P(t) = E\left\{[x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T \mid I_D\right\} = E\left\{x(t)x^T(t) - \hat{x}(t)x^T(t) - x(t)\hat{x}^T(t) + \hat{x}(t)\hat{x}^T(t) \mid I_D\right\}$$

$$E\left\{\hat{x}(t)x^T(t) \mid I_D\right\} = E\left\{x(t)\hat{x}^T(t) \mid I_D\right\} = \hat{x}(t)\hat{x}^T(t)$$

Therefore,

$$P(t) = E\left\{x(t)x^T(t) \mid I_D\right\} - \hat{x}(t)\hat{x}^T(t)$$

or

$$E\left\{x(t)x^T(t) \mid I_D\right\} = P(t) + \hat{x}(t)\hat{x}^T(t)$$

Certainty-Equivalent and Stochastic Incremental Costs

$$J = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\left(P(t_f) + \hat{x}(t_f)\hat{x}^T(t_f)\right)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\left(P(t) + \hat{x}(t)\hat{x}^T(t)\right)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\} \triangleq J_{CE} + J_S$$

The cost function has two parts, a certainty-equivalent cost and a stochastic increment cost (a numerical illustration follows below):

$$J_{CE} = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)\hat{x}(t_f)\hat{x}^T(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[Q\hat{x}(t)\hat{x}^T(t)\right]dt + \int_0^{t_f}\mathrm{Tr}\left[Ru(t)u^T(t)\right]dt\right\}$$

$$J_S = \frac{1}{2}E\left\{\mathrm{Tr}\left[S(t_f)P(t_f)\right] + \int_0^{t_f}\mathrm{Tr}\left[QP(t)\right]dt\right\}$$
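A Monte Carlo check (illustrative, with assumed numbers) that the expected quadratic cost splits into a certainty-equivalent part, $\mathrm{Tr}(Q\hat{x}\hat{x}^T)$, and a stochastic increment, $\mathrm{Tr}(QP)$, exactly as in $J = J_{CE} + J_S$:

```python
import numpy as np

# Assumed conditional mean, covariance, and state weighting
rng = np.random.default_rng(2)
Q = np.diag([2.0, 0.5])
x_hat = np.array([1.0, -0.5])                  # conditional mean
P = np.array([[0.3, 0.1], [0.1, 0.2]])         # conditional covariance

# Sample x ~ N(x_hat, P) and average the quadratic form x^T Q x
samples = rng.multivariate_normal(x_hat, P, size=200_000)
mc = np.mean(np.einsum("ni,ij,nj->n", samples, Q, samples))

j_ce = x_hat @ Q @ x_hat                       # Tr(Q x_hat x_hat^T)
j_s = np.trace(Q @ P)                          # Tr(Q P)
print(f"Monte Carlo {mc:.4f}  vs  J_CE + J_S = {j_ce + j_s:.4f}")
```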


Expected Cost of the Trajectory

Optimized cost function:

$$V^*(t_0) \triangleq J^*(t_f) = E\left\{\phi[x^*(t_f)] + \int_{t_0}^{t_f} L[x^*(\tau), u^*(\tau)]\,d\tau\right\}$$

Law of total expectation:

$$E(\cdot) = E\left(\cdot \mid I[t_0, t_1]\right)\Pr\left\{I[t_0, t_1]\right\} + E\left(\cdot \mid I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\} = E\left[E(\cdot \mid I)\right]$$

Because the past is established at $t_1$, $\Pr\{I[t_0, t_1]\} = 1$:

$$E(J^*) = E\left(J^* \mid I[t_0, t_1]\right)\cdot 1 + E\left(J^* \mid I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\} = E\left(J^* \mid I[t_0, t_1]\right) + E\left(J^* \mid I[t_1, t_f]\right)\Pr\left\{I[t_1, t_f]\right\}$$

- For planning or post-trajectory analysis, one can assume that the entire information set is available
- For real-time control, $t_1 \to t_f$, and future information can only be predicted
- If the separation property applies (TBD), the future conditioning effect can be predicted
- If not, the future conditioning effect can only be approximated

Separation Property and Certainty Equivalence

Separation Property:
- The Optimal Control Law and the Optimal Estimation Law can be derived separately
- Their derivations are strictly independent

Certainty Equivalence Property (separation property plus ...):
- The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same
- The Optimal Estimation Law can be derived separately
- Linear-quadratic-Gaussian control is certainty-equivalent

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition


Immune Response Example

- Optimal open-loop drug therapy (control). Assumptions:
  - Initial condition known without error
  - No disturbance
- Optimal closed-loop therapy. Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Perfect measurement of the state
- Stochastic optimal closed-loop therapy. Assumptions:
  - Small error in initial condition
  - Small disturbance
  - Imperfect measurement
  - Certainty equivalence applies to perturbation control

Immune Response Example with Optimal Feedback Control

[Figures: open-loop optimal control for a lethal initial condition; open- and closed-loop optimal control for a 150% lethal initial condition]

Immune Response with Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)

- Low-bandwidth estimator (|W| < |N|): initial control is too sluggish to prevent divergence
- High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

Immune Response to Random Disturbance with Stochastic Neighboring-Optimal Control

- Disturbance due to re-infection and sequestered pockets of pathogen
- Noisy measurements
- Closed-loop therapy is robust ... but not robust enough: organ death occurs in one case
- The probability of satisfactory therapy can be maximized by stochastic redesign of the controller


Stochastic Linear-Quadratic Optimal Control

Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

$$V(t_0) = E\left\{\phi[x(t_f)] + \int_{t_0}^{t_f} L[x(\tau), u(\tau)]\,d\tau\right\} = \frac{1}{2}E\left\{x^T(t_f)S(t_f)x(t_f) + \int_{t_0}^{t_f}\begin{bmatrix}x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix}Q(t) & M(t)\\ M^T(t) & R(t)\end{bmatrix}\begin{bmatrix}x(t)\\ u(t)\end{bmatrix}dt\right\}$$

Linear dynamic constraint:

$$\dot{x}(t) = F(t)x(t) + G(t)u(t) + L(t)w(t)$$

Components of the LQ Value Function

The quadratic value function has two parts:

$$V(t) = \frac{1}{2}x^T(t)S(t)x(t) + v(t)$$

Certainty-equivalent value function:

$$V_{CE}(t) \triangleq \frac{1}{2}x^T(t)S(t)x(t)$$

Stochastic value function increment:

$$v(t) = \frac{1}{2}\int_t^{t_f}\mathrm{Tr}\left[S(\tau)L(\tau)W(\tau)L^T(\tau)\right]d\tau$$

Value Function Gradient and Hessian

Gradient of the certainty-equivalent value function with respect to the state:

$$\frac{\partial V}{\partial x}(t) = x^T(t)S(t)$$

Hessian with respect to the state:

$$\frac{\partial^2 V}{\partial x^2}(t) = S(t)$$


Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

$$\frac{\partial V^*}{\partial t} = -\min_u E\left[\frac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

$$= -\min_u\left[\frac{1}{2}\left(x^{*T}Qx^* + 2x^{*T}Mu + u^TRu\right) + x^{*T}S\left(Fx^* + Gu\right) + \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)\right]$$

Terminal condition:

$$V(t_f) = \frac{1}{2}x^T(t_f)S(t_f)x(t_f)$$

LQ Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set the result equal to zero:

$$\frac{\partial\left(\partial V/\partial t\right)}{\partial u} = 0 = \left(x^TM + u^TR\right) + x^TSG$$

Solve for u, obtaining the feedback control law:

$$u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t) \triangleq -C(t)x(t)$$

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation

Substitute the optimal control law, $u(t) = -R^{-1}(t)\left[G^T(t)S(t) + M^T(t)\right]x(t)$, into the HJB equation:

$$\frac{1}{2}x^T\dot{S}x + \dot{v} = \frac{1}{2}x^T\left[-\left(Q - MR^{-1}M^T\right) - \left(F - GR^{-1}M^T\right)^TS - S\left(F - GR^{-1}M^T\right) + SGR^{-1}G^TS\right]x - \frac{1}{2}\mathrm{Tr}\left(SLWL^T\right)$$

Matching the quadratic terms, the matrix Riccati equation provides S(t):

$$\dot{S}(t) = -\left[Q(t) - M(t)R^{-1}(t)M^T(t)\right] - \left[F(t) - G(t)R^{-1}(t)M^T(t)\right]^TS(t) - S(t)\left[F(t) - G(t)R^{-1}(t)M^T(t)\right] + S(t)G(t)R^{-1}(t)G^T(t)S(t), \qquad S(t_f) = \phi_{xx}(t_f)$$

The stochastic value function increases the cost due to the disturbance; however, its calculation is independent of the Riccati equation (a numerical sketch follows below):

$$\dot{v} = -\frac{1}{2}\mathrm{Tr}\left(SLWL^T\right), \qquad v(t_f) = 0$$


Evaluation of the Total Cost (Imperfect Measurements)

Stochastic quadratic cost function, neglecting cross terms:

$$J = \frac{1}{2}\mathrm{Tr}\left\{E\left[x^T(t_f)S(t_f)x(t_f)\right] + \int_{t_0}^{t_f} E\left\{\begin{bmatrix}x^T(t) & u^T(t)\end{bmatrix}\begin{bmatrix}Q(t) & 0\\ 0 & R(t)\end{bmatrix}\begin{bmatrix}x(t)\\ u(t)\end{bmatrix}\right\}dt\right\}$$

$$= \frac{1}{2}\mathrm{Tr}\left\{S(t_f)E\left[x(t_f)x^T(t_f)\right] + \int_{t_0}^{t_f}\left\{Q(t)E\left[x(t)x^T(t)\right] + R(t)E\left[u(t)u^T(t)\right]\right\}dt\right\}$$

or

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_f)P(t_f) + \int_{t_0}^{t_f}\left[Q(t)P(t) + R(t)U(t)\right]dt\right\}$$

where

$$P(t) \triangleq E\left[x(t)x^T(t)\right], \qquad U(t) \triangleq E\left[u(t)u^T(t)\right]$$

Optimal Control Covariance

Optimal control vector:

$$u(t) = -C(t)\hat{x}(t)$$

Optimal control covariance (shown for M = 0, so that C = R⁻¹GᵀS):

$$U(t) = C(t)P(t)C^T(t) = R^{-1}(t)G^T(t)S(t)P(t)S(t)G(t)R^{-1}(t)$$

Revise Cost to Reflect State and Adjoint Covariance Dynamics

Integration by parts:

$$S(t)P(t)\Big|_{t_0}^{t_f} = \int_{t_0}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

$$S(t_f)P(t_f) = S(t_0)P(t_0) + \int_{t_0}^{t_f}\left[\dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt$$

Rewrite the cost function to incorporate the initial cost:

$$J = \frac{1}{2}\mathrm{Tr}\left\{S(t_0)P(t_0) + \int_{t_0}^{t_f}\left[Q(t)P(t) + R(t)U(t) + \dot{S}(t)P(t) + S(t)\dot{P}(t)\right]dt\right\}$$

Evolution of State and Adjoint Covariance Matrices (No Control)

$$u(t) = 0; \qquad U(t) = 0$$

State covariance response to the random disturbance (propagation sketched below):

$$\dot{P}(t) = F(t)P(t) + P(t)F^T(t) + L(t)W(t)L^T(t), \qquad P(t_0)\ \text{given}$$

Adjoint covariance response to the terminal cost:

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t), \qquad S(t_f)\ \text{given}$$
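A brief sketch (assumed matrices, forward Euler) of propagating the open-loop state covariance under the Lyapunov equation above:

```python
import numpy as np

# Assumed system: lightly damped oscillator driven by white noise
F = np.array([[0.0, 1.0], [-1.0, -0.2]])
L = np.array([[0.0], [1.0]])
W = np.array([[0.5]])                 # disturbance spectral density
P = np.zeros((2, 2))                  # P(t0): initial state known exactly
dt, t_f = 1e-3, 10.0

for _ in range(int(t_f / dt)):        # forward Euler on the Lyapunov equation
    P_dot = F @ P + P @ F.T + L @ W @ L.T
    P = P + dt * P_dot

print("P(t_f) =\n", P)                # covariance grown by the disturbance
```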


Evolution of State and Adjoint Covariance Matrices (Optimal Control)

State covariance response to the random disturbance (dependent on S(t) through the gain C(t)):

$$\dot{P}(t) = \left[F(t) - G(t)C(t)\right]P(t) + P(t)\left[F(t) - G(t)C(t)\right]^T + L(t)W(t)L^T(t)$$

Adjoint covariance response to the terminal cost (independent of P(t)):

$$\dot{S}(t) = -F^T(t)S(t) - S(t)F(t) - Q(t) + S(t)G(t)R^{-1}(t)G^T(t)S(t)$$

Total Cost With and Without Control

With no control:

$$J_{\text{no control}} = \frac{1}{2}\mathrm{Tr}\left[S(t_0)P(t_0) + \int_{t_0}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

With optimal control, the equation for the cost is the same:

$$J_{\text{optimal control}} = \frac{1}{2}\mathrm{Tr}\left[S(t_0)P(t_0) + \int_{t_0}^{t_f} S(t)L(t)W(t)L^T(t)\,dt\right]$$

... but the evolutions of S(t), and hence S(t₀), are different in each case. A numerical comparison is sketched below.
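A sketch comparing the two costs under assumed matrices (M = 0): integrate each adjoint equation backward to obtain S(t) and S(t₀), accumulate the integral of S L W Lᵀ, and evaluate J in each case. Since the optimal law minimizes J and u = 0 is a feasible alternative, the optimal-control cost should come out no larger:

```python
import numpy as np

# Assumed system and weights (M = 0 for simplicity)
F = np.array([[0.0, 1.0], [-1.0, -0.2]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[0.5]]); W = np.array([[0.5]])
S_f = np.eye(2); P0 = 0.1 * np.eye(2)
dt, t_f = 1e-3, 10.0
Rinv = np.linalg.inv(R)

def total_cost(controlled):
    """J = 1/2 Tr[S(t0) P(t0) + integral of S L W L^T dt], backward Euler."""
    S, integral = S_f.copy(), np.zeros((2, 2))
    for _ in range(int(t_f / dt)):
        S_dot = -F.T @ S - S @ F - Q
        if controlled:                   # Riccati term appears only here
            S_dot += S @ G @ Rinv @ G.T @ S
        integral += S @ L @ W @ L.T * dt
        S = S - dt * S_dot               # step backward in time
    return 0.5 * np.trace(S @ P0 + integral)

print("J (no control)      =", total_cost(False))
print("J (optimal control) =", total_cost(True))
```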

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material


Dual Control (Fel'dbaum, 1965)

- Nonlinear system
- Uncertain system parameters to be estimated
- Parameter estimation can be aided by test inputs
- Approach: minimize a value function with three increments: nominal control, cautious control, and probing control

$$\min_u V^* = \min_u\left(V^*_{\text{nominal}} + V^*_{\text{cautious}} + V^*_{\text{probing}}\right)$$

Estimation and control calculations are coupled and necessarily recursive.

Adaptive Critic Controller

The nonlinear control law, c, takes the general form

$$u(t) = c[x(t), a, y^*(t)]$$

where x(t) is the state, a contains the parameters of the operating point, and y*(t) is the command input.

On-line adaptive critic controller:
- Nonlinear control law (action network)
- Criticizes non-optimal performance via a critic network
- Adapts control gains to improve performance
- Adapts the cost model to improve its estimate

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

Initially, c[x, a, y*] is unknown. Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points, with scheduling variable a:

$$u(t) = C_F(a)\,y^* + C_B(a)\,\Delta x + C_I(a)\int\Delta y(t)\,dt \triangleq c[x(t), a, y^*(t)]$$

Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks (a minimal sketch follows below):

$$u(t) = NN_F\left[y^*(t), a(t)\right] + NN_B\left[x(t), a(t)\right] + NN_I\left[\int\Delta y\,dt,\ a(t)\right] = c[x(t), a, y^*(t)]$$


Initial Neural Control Law

- Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at n operating points
- Interpolation and gain scheduling via the neural networks
- One node per operating point in each neural network

On-line Optimization of the Adaptive Critic Neural Network Controller

The critic adapts the neural network weights to improve performance using approximate dynamic programming:
- Heuristic Dynamic Programming adaptive critic
- Dual Heuristic Programming adaptive critic for the receding-horizon optimization problem
- Critic and action (i.e., control) networks adapted concurrently
- LQ-PI cost function applied to the nonlinear problem
- Modified resilient backpropagation for neural network training

The value recursion, optimality condition, and critic output are (a small numerical illustration follows below):

$$V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]$$

$$\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x}\frac{\partial x}{\partial u} = 0$$

$$\frac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C\left[x_a(t), a(t)\right]$$

Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: the aircraft model supplies transition matrices and a state prediction; utility function derivatives and the critic network NN_C form the optimality condition, which generates the NN_A target for training the action network, driven by x_a(t) and a(t).]

  • 7/29/2019 Mae 546 Lecture 23

    14/14

    Critic Network On-line TrainingTrain critic network, at time t, holding the action parameters fixed

    NNC(old)

    Utility Function

    Derivatives

    NNA

    NNC

    Target

    Target Generation

    Aircraft Model Transition Matrices

    State Prediction

    NNC

    Target Cost

    Gradient

    xa(t)

    a(t)

