Optimal Control Theory

Benjamin Dubois-Taine

July 22, 2020

The University of British Columbia

Introduction

Today we will cover:

• Discrete-time control and the Bellman equations
• Continuous-time control and the Hamilton-Jacobi-Bellman equations
• An important special case: Linear-Quadratic-Gaussian (LQG) and Linear-Quadratic Regulator (LQR) problems
• Pontryagin's maximum principle
• (time allowing) Optimal estimation and the Kalman filter

This content is taken from [1, Chapter 12].

Discrete Control and the Bellman Equations

Define:

• x ∈ X, the state of the agent's environment
• u ∈ U(x), the action chosen at state x
• next(x, u) ∈ X, the resulting state from applying action u in state x
• cost(x, u) ≥ 0, the cost of applying u in state x

Example: plane tickets

• X = the set of cities
• U(x) = the flights available from city x
• next(x, u) = the city where the flight lands
• cost(x, u) = the price of the flight

Goal: find the cheapest way to get to your destination.

Discrete Control and the Bellman Equations

Goal: find an action sequence (u_0, ..., u_{n−1}) minimizing the total cost

J(x_·, u_·) = ∑_{k=0}^{n−1} cost(x_k, u_k)

where x_{k+1} = next(x_k, u_k), and x_0 and x_n are given.

• We can think of this as a graph where nodes are states, and actions are arrows connecting the nodes.

Discrete Control and the Bellman Equations

Goal: find an action sequence (u_0, ..., u_{n−1}) minimizing the total cost

J(x_·, u_·) = ∑_{k=0}^{n−1} cost(x_k, u_k)

We need a control law, namely a mapping from states to actions.

Defining the optimal value function as

v(x) = min_{u∈U(x)} { cost(x, u) + v(next(x, u)) }    (1)

the associated optimal control law is

π(x) = argmin_{u∈U(x)} { cost(x, u) + v(next(x, u)) }    (2)

These are the Bellman equations.

Discrete Control and the Bellman Equations

We want to solve

v(x) = min_{u∈U(x)} { cost(x, u) + v(next(x, u)) }
π(x) = argmin_{u∈U(x)} { cost(x, u) + v(next(x, u)) }

Let's go back to the graph analogy. Assume the graph is acyclic.

Suppose we start at x_0 and want to reach x_f (a sketch in code follows the list).

• set v(x_f) = 0
• once every successor of a state x has been visited, apply the formula for v to x
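A minimal sketch (not from the slides) of this backward pass; the graph encoding, where U[x] maps an action name to a (next state, cost) pair, is an assumption made for illustration.

```python
def solve_dag(U, x_f):
    """Bellman equations on an acyclic graph by backward induction."""
    v, pi = {x_f: 0.0}, {}

    def value(x):
        # Memoized recursion scores every successor of x before x itself,
        # which is the "visit successors first" rule on the slide.
        if x in v:
            return v[x]
        best_u, best = None, float("inf")
        for u, (y, c) in U[x].items():
            total = c + value(y)
            if total < best:
                best_u, best = u, total
        v[x], pi[x] = best, best_u
        return best

    return value, v, pi

# Toy instance of the plane-ticket example (made-up cities and prices).
U = {
    "YVR": {"to_SEA": ("SEA", 120.0), "to_YYZ": ("YYZ", 450.0)},
    "SEA": {"to_JFK": ("JFK", 300.0)},
    "YYZ": {"to_JFK": ("JFK", 150.0)},
}
value, v, pi = solve_dag(U, "JFK")
value("YVR")
print(v["YVR"], pi["YVR"])  # 420.0 to_SEA
```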

Discrete Control and the Bellman Equations

• For cyclic graphs, this approach will not work.
• The Bellman equations are still valid.
• We need to design iterative schemes: Value Iteration and Policy Iteration.

Discrete Control and the Bellman Equations

Value Iteration proceeds as follows (see the sketch below):

• start with some guess v^{(0)} of the optimal value function
• construct a sequence of guesses

v^{(i+1)}(x) = min_{u∈U(x)} { cost(x, u) + v^{(i)}(next(x, u)) }

This algorithm can be shown to converge at a linear rate [2].

Each iteration costs O(|X||U|).
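A minimal value-iteration sketch under the same graph encoding (U[x] maps action -> (next state, cost), an illustrative assumption); the tolerance and sweep limit are arbitrary.

```python
def value_iteration(U, x_f, tol=1e-9, max_sweeps=10_000):
    v = {x: 0.0 for x in U}      # initial guess v^(0)
    v[x_f] = 0.0                 # reaching the goal costs nothing more
    for _ in range(max_sweeps):
        diff = 0.0
        for x in U:              # one sweep costs O(|X||U|)
            best = min(c + v[y] for (y, c) in U[x].values())
            diff = max(diff, abs(best - v[x]))
            v[x] = best          # in-place update; writing into a fresh dict
                                 # would match the v^(i) -> v^(i+1) notation
        if diff < tol:
            break
    return v

# v = value_iteration(U, "JFK"), with U as in the previous sketch.
```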

Discrete Control and the Bellman Equations

Policy Iteration proceeds as follows (a sketch follows):

• start with some guess π^{(0)} of the optimal law
• construct a sequence of guesses

v^{π^{(i)}}(x) = cost(x, π^{(i)}(x)) + v^{π^{(i)}}(next(x, π^{(i)}(x)))
π^{(i+1)}(x) = argmin_{u∈U(x)} { cost(x, u) + v^{π^{(i)}}(next(x, u)) }

We need to relax the first line or solve a system of linear equations.

Under certain assumptions, this is faster than value iteration [3]. However, each iteration is more costly.
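A policy-iteration sketch under the same encoding; because the problem is deterministic, policy evaluation here just rolls the policy forward to x_f and sums costs, standing in for the linear-system solve mentioned above. The policy must reach x_f from every state for the rollout to terminate.

```python
def evaluate(U, pi, x_f):
    """v^pi(x): total cost of following pi from x until x_f."""
    v = {x_f: 0.0}

    def rollout(x):
        if x not in v:
            y, c = U[x][pi[x]]
            v[x] = c + rollout(y)
        return v[x]

    for x in U:
        rollout(x)
    return v

def policy_iteration(U, x_f, pi):
    while True:
        v = evaluate(U, pi, x_f)                       # policy evaluation
        pi_new = {x: min(U[x], key=lambda u: U[x][u][1] + v[U[x][u][0]])
                  for x in U}                          # greedy improvement
        if pi_new == pi:
            return v, pi
        pi = pi_new

# v, pi = policy_iteration(U, "JFK",
#                          {"YVR": "to_YYZ", "SEA": "to_JFK", "YYZ": "to_JFK"})
```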

Discrete Control and the Bellman Equations

• It is also of interest to consider the stochastic setting where we have

p(y | x, u) = "probability that next(x, u) = y"

• The Bellman equation for the optimal control law becomes

π(x) = argmin_{u∈U(x)} { cost(x, u) + E[v(next(x, u))] }

• Everything we have seen so far generalizes to this setting.
• This is called a Markov Decision Process (MDP); a sketch of one such Bellman backup follows.
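A one-function sketch of the stochastic backup, writing E[v(next(x, u))] as a sum over p(y | x, u); the tables P (next-state distributions) and cost are assumptions for illustration.

```python
def greedy_action(x, U, P, cost, v):
    """argmin_u { cost(x,u) + E[v(next(x,u))] }, with P[x][u] = {y: p(y|x,u)}."""
    return min(U[x], key=lambda u: cost[x, u]
               + sum(p * v[y] for y, p in P[x][u].items()))
```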

Continuous Control

• State x ∈ R^{n_x} and actions u ∈ U(x) ⊂ R^{n_u} are real-valued vectors.
• Assume that our trajectory is given by

dx = f(x, u) dt + F(x, u) dw

where dw is n_w-dimensional Brownian motion. We can also write the previous as

x(t) = x(0) + ∫_0^t f(x(s), u(s)) ds + ∫_0^t F(x(s), u(s)) dw(s)

• Discretizing this into time steps of size ∆, i.e. t = k∆, gives (see the simulation sketch below)

x_{k+1} = x_k + ∆ f(x_k, u_k) + √∆ F(x_k, u_k) ε_k    (3)

where ε_k ∼ N(0, I_{n_w}) and x_k = x(k∆).
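A minimal Euler-Maruyama simulation of equation (3); the dynamics f, noise matrix F, horizon and step size are illustrative assumptions.

```python
import numpy as np

def simulate(f, F, x0, u_seq, dt, rng):
    """Roll out x_{k+1} = x_k + dt f(x_k,u_k) + sqrt(dt) F(x_k,u_k) eps_k."""
    xs = [np.asarray(x0, dtype=float)]
    for u in u_seq:
        x = xs[-1]
        eps = rng.standard_normal(F(x, u).shape[1])  # eps_k ~ N(0, I_{n_w})
        xs.append(x + dt * f(x, u) + np.sqrt(dt) * F(x, u) @ eps)
    return np.array(xs)

# Example: noisy double integrator, x = (position, velocity), scalar control.
f = lambda x, u: np.array([x[1], u])
F = lambda x, u: np.array([[0.0], [0.1]])
xs = simulate(f, F, x0=[0.0, 0.0], u_seq=[1.0] * 100, dt=0.01,
              rng=np.random.default_rng(0))
```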

Continuous Control

• We also need a cost function.
• For now assume finite-horizon problems, i.e. a final time t_f is specified.
• Separate the total cost into a cost rate ℓ and a final cost h.
• Total cost is then

J(x_·, u_·) = h(x(t_f)) + ∫_0^{t_f} ℓ(x(t), u(t), t) dt

• Discretizing this gives

J(x_·, u_·) = h(x_n) + ∆ ∑_{k=0}^{n−1} ℓ(x_k, u_k, k∆)    (4)

where n = t_f/∆.

Continuous Control

• To summarize, we have

x_{k+1} = x_k + ∆ f(x_k, u_k) + √∆ F(x_k, u_k) ε_k    (5)

with ε_k ∼ N(0, I_{n_w}), and

J(x_·, u_·) = h(x_n) + ∆ ∑_{k=0}^{n−1} ℓ(x_k, u_k, k∆)    (6)

• From (5) we can see that

x_{k+1} = x_k + ∆ f(x_k, u_k) + ξ

where ξ ∼ N(0, ∆ S(x_k, u_k)) and S(x, u) = F(x, u) F(x, u)^T.

• With this we can define the optimal value function similarly:

v(x, k) = min_u { ∆ ℓ(x, u, k∆) + E[ v(x + ∆ f(x, u) + ξ, k+1) ] }    (7)

Continuous Control

• We will simplify E[v(x + ∆f(x, u) + ξ)].
• Setting δ = ∆f(x, u) + ξ, a Taylor expansion gives

v(x + δ) = v(x) + δ^T v_x(x) + (1/2) δ^T v_{xx}(x) δ + o(δ^3)

• Then

E[v(x + δ)] = v(x) + ∆ f(x, u)^T v_x(x) + (1/2) E[ξ^T v_{xx}(x) ξ] + o(∆^2)

• Now (a numerical check of this identity follows),

E[ξ^T v_{xx} ξ] = E[tr(ξ^T v_{xx} ξ)] = E[tr(ξ ξ^T v_{xx})] = tr(Cov[ξ] v_{xx}) = tr(∆ S v_{xx})
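A quick Monte Carlo check (not from the slides) of the last identity, E[ξ^T H ξ] = tr(Cov[ξ] H) with ξ ∼ N(0, ∆S); the dimensions, ∆ and the matrices are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 3, 0.01
A = rng.standard_normal((n, n))
S = A @ A.T                                   # S = F F^T, positive semidefinite
H = rng.standard_normal((n, n)); H = H + H.T  # stand-in for v_xx (symmetric)
xi = rng.multivariate_normal(np.zeros(n), dt * S, size=200_000)
mc = np.einsum("ki,ij,kj->k", xi, H, xi).mean()
print(mc, np.trace(dt * S @ H))               # agree up to Monte Carlo error
```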

Continuous Control

Going back to

v(x, k) = min_u { ∆ ℓ(x, u, k∆) + E[ v(x + ∆ f(x, u) + ξ, k+1) ] }

and with

E[v(x + δ)] = v(x) + ∆ f(x, u)^T v_x(x) + (1/2) tr(∆ S(x, u) v_{xx}(x)) + o(∆^2)

we get

(v(x, k) − v(x, k+1)) / ∆ = min_u { ℓ + f^T v_x + (1/2) tr(S v_{xx}) }

Continuous Control

(v(x, k) − v(x, k+1)) / ∆ = min_u { ℓ + f^T v_x + (1/2) tr(S v_{xx}) }

and recall that k in v(x, k) represents time k∆, so that the LHS is

(v(x, t) − v(x, t + ∆)) / ∆

As ∆ → 0, this is −∂v/∂t, which we denote −v_t. So for v(x, t_f) = h(x) and 0 ≤ t ≤ t_f, we have

−v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)^T v_x(x) + (1/2) tr(S(x, u) v_{xx}(x)) }    (8)

and the associated optimal control law

π(x, t) = argmin_u { ℓ(x, u, t) + f(x, u)^T v_x(x) + (1/2) tr(S(x, u) v_{xx}(x)) }    (9)

These are the Hamilton-Jacobi-Bellman (HJB) equations.

Continuous Control: Solving the HJB Equations

−v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)^T v_x(x) + (1/2) tr(S(x, u) v_{xx}(x)) }

with v(x, t_f) = h(x).

• Non-linear second-order PDE.
• May not have a classical solution.
• Numerical methods relying on "viscosity" solutions exist.
• Suffers from the "curse of dimensionality".
• Several methods for approximate solutions exist and work well in practice.

Continuous Control: Infinite Horizon

Two infinite-horizon costs are used in practice:

• Discounted cost formulation

J(x_·, u_·) = ∫_0^∞ exp(−αt) ℓ(x(t), u(t)) dt

• Average cost per stage formulation

J(x_·, u_·) = lim_{t_f→∞} (1/t_f) ∫_0^{t_f} ℓ(x(t), u(t)) dt

Both formulations yield similar HJB equations, except that they do not depend on time. In that sense they are easier to solve using numerical approximations. However, the finite-horizon problem also has advantages.

Linear-Quadratic-Gaussian Control

• An important class of optimal control problems.
• Unlike many other problems, it is possible to find a closed-form formula.
• We will derive solutions in both the continuous and discrete cases.

LQG: the Continuous Case

We make the following assumptions:

• dynamics: dx = (Ax + Bu) dt + F dw
• cost rate: ℓ(x, u) = (1/2) u^T R u + (1/2) x^T Q x
• final cost: h(x) = (1/2) x^T Q^f x

where R, Q and Q^f are symmetric, R is positive definite, and we set S = FF^T.

Recall the HJB equation

−v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)^T v_x(x) + (1/2) tr(S(x, u) v_{xx}(x)) }

with v(x, t_f) = h(x).

In our case it reads

−v_t(x, t) = min_u { (1/2) u^T R u + (1/2) x^T Q x + (Ax + Bu)^T v_x(x) + (1/2) tr(S v_{xx}(x)) }

with v(x, t_f) = (1/2) x^T Q^f x.

LQG: the Continuous Case

−v_t(x, t) = min_u { (1/2) u^T R u + (1/2) x^T Q x + (Ax + Bu)^T v_x(x) + (1/2) tr(S v_{xx}(x)) }

• We make the following guess: v(x, t) = (1/2) x^T V(t) x + a(t)
• The derivatives in the HJB equation are
  • v_t(x, t) = (1/2) x^T V̇(t) x + ȧ(t)
  • v_x(x) = V(t) x
  • v_{xx}(x) = V(t)

LQG: the Continuous Case

Plugging back into the HJB equation gives

−v_t(x, t) = min_u { (1/2) u^T R u + (1/2) x^T Q x + (Ax + Bu)^T V(t) x + (1/2) tr(S V(t)) }

This is simply a quadratic in u, whose minimizer is

u* = −R^{−1} B^T V(t) x

and thus

−v_t(x, t) = (1/2) x^T (Q + A^T V(t) + V(t) A − V(t) B R^{−1} B^T V(t)) x + (1/2) tr(S V(t))

Because v_t(x, t) = (1/2) x^T V̇(t) x + ȧ(t), this gives

−V̇(t) = Q + A^T V(t) + V(t) A − V(t) B R^{−1} B^T V(t)    (10)
−ȧ(t) = (1/2) tr(S V(t))

This is a continuous-time Riccati equation.

LQG: the Continuous Case

−V̇(t) = Q + A^T V(t) + V(t) A − V(t) B R^{−1} B^T V(t)
−ȧ(t) = (1/2) tr(S V(t))

The boundary condition v(x, t_f) = (1/2) x^T Q^f x implies that V(t_f) = Q^f and a(t_f) = 0.

⇒ This is a simple ODE, which is easy to solve (a sketch follows).

The optimal control law is given by

u* = −R^{−1} B^T V(t) x

• It does not depend on the noise.
• It remains the same in the deterministic case, called the linear-quadratic regulator.
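A sketch of integrating the Riccati ODE (10) backward from V(t_f) = Q^f with a plain Euler scheme; the system matrices, horizon and step count are illustrative assumptions.

```python
import numpy as np

def riccati_backward(A, B, Q, R, Qf, tf, n_steps):
    dt = tf / n_steps
    Rinv = np.linalg.inv(R)
    V, Vs = Qf.copy(), [Qf.copy()]
    for _ in range(n_steps):
        Vdot = -(Q + A.T @ V + V @ A - V @ B @ Rinv @ B.T @ V)
        V = V - dt * Vdot       # one Euler step backward in time
        Vs.append(V)
    return Vs[::-1]             # Vs[k] approximates V(k * dt)

# Double integrator; the control law is u*(t) = -R^{-1} B^T V(t) x.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Vs = riccati_backward(A, B, Q=np.eye(2), R=np.eye(1), Qf=10 * np.eye(2),
                      tf=5.0, n_steps=500)
K0 = B.T @ Vs[0]                # with R = I, the gain at t = 0 is B^T V(0)
```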

LQR: the Discrete Case

We make the following assumptions:

• dynamics: x_{k+1} = A x_k + B u_k
• cost rate: ℓ(x_k, u_k) = (1/2) u_k^T R u_k + (1/2) x_k^T Q x_k
• final cost: h(x_n) = (1/2) x_n^T Q^f x_n

where R, Q and Q^f are symmetric and R is positive definite.

Recall the Bellman equation

v(x, k) = min_u { ℓ(x, u) + v(next(x, u), k+1) }

with v(x, n) = h(x).

Again we make the assumption that

v(x, k) = (1/2) x^T V_k x

LQR: the Discrete Case

The boundary constraint gives V_n = Q^f.

Plugging everything in gives

(1/2) x^T V_k x = min_u { (1/2) u^T R u + (1/2) x^T Q x + (1/2) (Ax + Bu)^T V_{k+1} (Ax + Bu) }

This is simply a quadratic in u, and we get

V_k = Q + A^T V_{k+1} A − A^T V_{k+1} B (R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A

which is a discrete-time Riccati equation, with the associated optimal control law

u_k = −L_k x_k, where L_k = (R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A

LQR: the Discrete Case

V_k = Q + A^T V_{k+1} A − A^T V_{k+1} B (R + B^T V_{k+1} B)^{−1} B^T V_{k+1} A

• Start with V_n = Q^f and iterate backwards (see the sketch below).
• Can be computed offline.
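A sketch of the backward recursion; the matrices model a discretized double integrator and are illustrative. It returns the gains L_k so that u_k = −L_k x_k.

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, n):
    V, gains = Qf, []
    for _ in range(n):
        M = R + B.T @ V @ B
        L = np.linalg.solve(M, B.T @ V @ A)    # L_k = M^{-1} B^T V_{k+1} A
        V = Q + A.T @ V @ A - A.T @ V @ B @ L  # discrete-time Riccati step
        gains.append(L)
    return gains[::-1]                         # gains[k] is L_k

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
gains = lqr_backward(A, B, Q=np.eye(2), R=np.eye(1), Qf=10 * np.eye(2), n=50)
```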

Deterministic Control: Pontryagin's Maximum Principle

• Another approach to optimal control theory.
• Developed in the Soviet Union by Pontryagin.
• Only applies to deterministic problems.
• Avoids the curse of dimensionality.
• Applies in both continuous and discrete time.

Pontryagin's Maximum Principle: The Continuous Case

Setting:

• dynamics: dx = f(x(t), u(t)) dt
• cost rate: ℓ(x(t), u(t), t)
• final cost: h(x(t_f))

with fixed x_0 and final time t_f.

Recall the HJB equation

−v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)^T v_x(x) + (1/2) tr(S(x, u) v_{xx}(x)) }

Because we are in the deterministic case, we have

−v_t(x, t) = min_u { ℓ(x, u, t) + f(x, u)^T v_x(x) }

Suppose the optimal control law is given by u = π(x, t).

Pontryagin's Maximum Principle: The Continuous Case

−v_t(x, t) = ℓ(x, π(x, t), t) + f(x, π(x, t))^T v_x(x, t)

Taking derivatives w.r.t. x:

0 = v_{tx} + ℓ_x + π_x^T ℓ_u + f_x^T v_x + π_x^T f_u^T v_x + v_{xx} f

Observe that v̇_x = v_{xx} ẋ + v_{tx} = v_{xx} f + v_{tx}, so

0 = v̇_x + ℓ_x + f_x^T v_x + π_x^T (ℓ_u + f_u^T v_x)

Observe that ℓ_u + f_u^T v_x = ℓ_u(x, π(x, t), t) + f_u(x, π(x, t))^T v_x(x, t) = 0, since π(x, t) minimizes the expression inside the HJB equation.

Pontryagin's Maximum Principle: The Continuous Case

We then get

−v̇_x(x, t) = f_x(x, π(x, t))^T v_x(x, t) + ℓ_x(x, π(x, t), t)

Setting p = v_x, this gives

−ṗ(t) = f_x(x, π(x, t))^T p(t) + ℓ_x(x, π(x, t), t)

The maximum principle thus reads

ẋ(t) = f(x(t), u(t))
−ṗ(t) = f_x(x(t), u(t))^T p(t) + ℓ_x(x(t), u(t), t)
u(t) = argmin_u { ℓ(x(t), u, t) + f(x(t), u)^T p(t) }

with boundary conditions p(t_f) = v_x(x(t_f), t_f) = h_x(x(t_f)), and x_0, t_f given.

Pontryagin's Maximum Principle: The Continuous Case

Setting the Hamiltonian H(x, u, p, t) := ℓ(x, u, t) + f(x, u)^T p, the maximum principle reads

ẋ(t) = f(x(t), u(t))
−ṗ(t) = f_x(x(t), u(t))^T p(t) + ℓ_x(x(t), u(t), t)
u(t) = argmin_u H(x(t), u, p(t), t)

with p(t_f) = h_x(x(t_f)).

• A simple ODE; the cost grows linearly with n_x.
• Existing software packages can solve it.
• The only issue is solving for the minimizer of the Hamiltonian.
• For problems where the dynamics are linear and the cost is quadratic w.r.t. the control u, a nice closed-form formula exists.

Page 97: Optimal Control Theory - Home | Computer Science at UBC · 2020. 7. 30. · Optimal Control Theory Benjamin Dubois-Taine July 22, 2020 The University of British Columbia. Introduction

Pontryagin’s Maximum Principle: The Discrete Case

• The derivation in both the continuous and discrete cases is also possible using Lagrange multipliers
• In the discrete case, the control sequence can be optimized directly by gradient descent, as in the sketch below
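
As a minimal sketch of the second point, the code below runs plain gradient descent on a control sequence for an assumed discrete linear-quadratic problem; the gradient is computed with the backward costate recursion, the discrete analogue of the $-\dot{p}$ equation above. All constants are illustrative.

```python
# Minimal sketch: discrete-time optimal control by gradient descent.
# Cost J = sum_k (x_k^T Q x_k + u_k^T R u_k) + x_K^T Q x_K,
# dynamics x_{k+1} = A x_k + B u_k; all values are illustrative.
import numpy as np

A = np.array([[0.5, 0.1], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)
x0, K = np.array([1.0, 0.0]), 20

def rollout(us):
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B @ u)
    return xs

def gradient(us, xs):
    # Backward costate recursion: p_K = 2 Q x_K,
    # p_k = 2 Q x_k + A^T p_{k+1}, and dJ/du_k = 2 R u_k + B^T p_{k+1}.
    grads = [None] * K
    p = 2 * Q @ xs[K]
    for k in reversed(range(K)):
        grads[k] = 2 * R @ us[k] + B.T @ p
        p = 2 * Q @ xs[k] + A.T @ p
    return grads

us = [np.zeros(1) for _ in range(K)]
for _ in range(300):
    gs = gradient(us, rollout(us))
    us = [u - 0.05 * g for u, g in zip(us, gs)]   # fixed, conservative step
```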

Optimal Estimation and the Kalman Filter

• Goal: from a sequence of noisy measurements, estimate the true state of the system
• Intimately tied to the problem of optimal control

dynamics: $x_{k+1} = A x_k + w_k$

observation: $y_k = H x_k + v_k$

where $w_k \sim \mathcal{N}(0, S)$, $v_k \sim \mathcal{N}(0, P)$, $x_0 \sim \mathcal{N}(\bar{x}_0, \Sigma_0)$, and $A, H, S, P, \bar{x}_0, \Sigma_0$ are known.

⇒ Goal: estimate the probability distribution of $x_k$ given $y_0, \ldots, y_{k-1}$:

$p_k = p(x_k \mid y_0, \ldots, y_{k-1})$

$p_0 = \mathcal{N}(\bar{x}_0, \Sigma_0)$
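
To fix ideas, here is a minimal sketch that simulates this linear-Gaussian model. The specific matrices are illustrative assumptions (a near-constant-velocity state with only the first coordinate observed), with $\bar{x}_0 = 0$ and $\Sigma_0 = I$.

```python
# Minimal sketch: simulate x_{k+1} = A x_k + w_k, y_k = H x_k + v_k.
# All matrices below are illustrative choices, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # observe the first coordinate only
S = 0.01 * np.eye(2)                     # process noise covariance
P = 0.1 * np.eye(1)                      # observation noise covariance

x = rng.multivariate_normal(np.zeros(2), np.eye(2))   # x_0 ~ N(0, I)
ys = []
for _ in range(100):
    ys.append(H @ x + rng.multivariate_normal(np.zeros(1), P))
    x = A @ x + rng.multivariate_normal(np.zeros(2), S)
```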

Optimal Estimation and the Kalman Filter

Using properties of multivariate Gaussians, it can be shown that

$p_{k+1} = p(x_{k+1} \mid y_0, \ldots, y_k) = \mathcal{N}(\bar{x}_{k+1}, \Sigma_{k+1})$

where

$\bar{x}_{k+1} = A\bar{x}_k + A\Sigma_k H^T (P + H\Sigma_k H^T)^{-1} (y_k - H\bar{x}_k) \quad (11)$

and

$\Sigma_{k+1} = S + A\Sigma_k A^T - A\Sigma_k H^T (P + H\Sigma_k H^T)^{-1} H\Sigma_k A^T \quad (12)$

This is the Kalman filter.

Recall the Riccati equation for LQR

$V_k = Q + A^T V_{k+1} A - A^T V_{k+1} B (R + B^T V_{k+1} B)^{-1} B^T V_{k+1} A$
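
Here is a minimal sketch of the recursion (11)-(12), reusing the illustrative model matrices and simulated `ys` from the sketch above.

```python
# Minimal sketch of the predictive Kalman filter, eqs. (11)-(12).
import numpy as np

def kalman_filter(ys, A, H, S, P, xbar0, Sigma0):
    """Return the means and covariances of p(x_{k+1} | y_0, ..., y_k)."""
    xbar, Sigma = xbar0, Sigma0
    means, covs = [], []
    for y in ys:
        # Kalman gain: K = A Sigma_k H^T (P + H Sigma_k H^T)^{-1}
        K = A @ Sigma @ H.T @ np.linalg.inv(P + H @ Sigma @ H.T)
        xbar = A @ xbar + K @ (y - H @ xbar)               # eq. (11)
        Sigma = S + A @ Sigma @ A.T - K @ H @ Sigma @ A.T  # eq. (12)
        means.append(xbar)
        covs.append(Sigma)
    return means, covs

# e.g., with the simulated ys and matrices from the previous sketch:
# means, covs = kalman_filter(ys, A, H, S, P, np.zeros(2), np.eye(2))
```

Note the structural match with the Riccati recursion: (12) is the LQR update under the substitutions $A \leftrightarrow A^T$, $B \leftrightarrow H^T$, $Q \leftrightarrow S$, $R \leftrightarrow P$, run forward in time instead of backward. This is one face of the control/estimation duality mentioned in the conclusion.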

Conclusion

What we covered today

• Bellman equations and dynamic programming
• Hamilton-Jacobi-Bellman equations
• Linear-Quadratic Gaussian, Linear-Quadratic Regulator, and the Riccati equations
• Pontryagin’s maximum principle
• Kalman filter

What we didn’t cover

• Solving non-linear optimal control problems using linear relaxations
• The duality between optimal control and optimal estimation

Any questions?

Thank you!

References

[1] Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh P. N. Rao. Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press, 2007.

[2] A. Heydari. Revisiting approximate dynamic programming and its convergence. IEEE Transactions on Cybernetics, 44(12):2733–2743, 2014.

[3] Ali Heydari. Convergence analysis of policy iteration. CoRR, abs/1505.05216, 2015. URL http://arxiv.org/abs/1505.05216.
