Optimal Control with Engineering Applications
Lecture 2: Nonlinear Programming
23. April 2009
S. A. Attia- Tuesdays 10-12am- [email protected] Lecture 2: Nonlinear programming 1(36)
Fundamentals of Unconstrained Optimization Overview of Algorithms Line Search Methods
Outline
1 Fundamentals of Unconstrained Optimization
2 Overview of Algorithms
3 Line Search Methods
Fundamentals of Unconstrained Optimization
The following notation is used
∇x f(x) = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)′ (1)
to denote the gradient of f, and ∇²x f to denote the Hessian matrix
∇²x f =
( ∂²f/∂x1²      ∂²f/∂x1∂x2   . . .   ∂²f/∂x1∂xn )
( ∂²f/∂x2∂x1    ∂²f/∂x2²     . . .   ∂²f/∂x2∂xn )
( . . .         . . .                . . .       )
( ∂²f/∂xn∂x1    ∂²f/∂xn∂x2   . . .   ∂²f/∂xn²   ) (2)
The following notation is used
∇fk = ∇x f (xk) (3)
where xk ∈ Rn is called an iterate in what follows. The transpose is denoted by a prime ′.
An example: Rosenbrock function
f(x) = 100(x2 − x1²)² + (1 − x1)² (4)
∇f(x) = (∂f/∂x1, ∂f/∂x2)′ = (−400(x2 − x1²)x1 − 2(1 − x1), 200(x2 − x1²))′ (5)
∇²f = ( ∂²f/∂x1²     ∂²f/∂x1∂x2 ) = ( −400(x2 − 3x1²) + 2   −400x1 )
      ( ∂²f/∂x2∂x1   ∂²f/∂x2²   )   ( −400x1                200    ) (6)
For xk = (1, 1)′: ∇fk = (0, 0)′ and ∇²fk = ( 802 −400 ; −400 200 ).
Remark
Notice the symmetry of the Hessian.
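These expressions are easy to check numerically. A minimal sketch in Python (NumPy assumed) evaluating (4)-(6) at xk = (1, 1)′:

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function f(x) = 100*(x2 - x1^2)^2 + (1 - x1)^2, equation (4)."""
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_grad(x):
    """Gradient from equation (5)."""
    return np.array([
        -400.0 * (x[1] - x[0]**2) * x[0] - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0]**2),
    ])

def rosenbrock_hess(x):
    """Hessian from equation (6); note that it is symmetric."""
    return np.array([
        [-400.0 * (x[1] - 3.0 * x[0]**2) + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

x = np.array([1.0, 1.0])
print(rosenbrock(x))       # 0.0
print(rosenbrock_grad(x))  # zero vector
print(rosenbrock_hess(x))  # [[802, -400], [-400, 200]]
```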
We want to minimize an objective function that depends on real variables, with no restriction on the values of these variables:
min_x f(x) (7)
where x ∈ Rn is a real vector with n ≥ 1 and f : Rn → R is a smooth function.
Example: Data fitting problem
Suppose we have a set of pairs of data (tj, yj) for j = 1, . . . , m. From knowledge about the application, we deduce that the signal has the following form
φ(t; x) = x1 + x2 e^(−(x3 − t)²/x4) + x5 cos(x6 t) (8)
where the xi are the parameters to be chosen such that the model fits the observed data yj. We can define
rj(x) = yj − φ(tj ; x), j = 1, . . . ,m (9)
and formulate the following problem
min_x f(x) ≜ ∑_{j=1}^{m} rj(x)² (10)
which is an unconstrained optimization problem.
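The residual formulation above translates directly into code. A minimal sketch (the squared exponent in φ is an assumed reading of equation (8), and the data below are synthetic, generated from hypothetical parameters):

```python
import numpy as np

def phi(t, x):
    # Model (8); the squared exponent -(x3 - t)^2 / x4 is an assumed
    # reading of the slide's formula.
    return x[0] + x[1] * np.exp(-(x[2] - t)**2 / x[3]) + x[4] * np.cos(x[5] * t)

def objective(x, t_data, y_data):
    # f(x) = sum_j r_j(x)^2 with r_j(x) = y_j - phi(t_j; x), equations (9)-(10)
    r = y_data - phi(t_data, x)
    return np.sum(r**2)

# Hypothetical data generated from known parameters, for illustration only
t_data = np.linspace(0.0, 5.0, 20)
x_true = np.array([1.0, 2.0, 1.5, 0.5, 0.3, 2.0])
y_data = phi(t_data, x_true)
print(objective(x_true, t_data, y_data))  # 0.0 at the generating parameters
```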
Example: Unconstrained Optimal Control Problem
min_{u(·)} ∫_0^T L(x(τ), u(τ)) dτ
subject to ẋ(t) = f(x(t), u(t)), x(0) = x0, u(t) ∈ Rm (11)
with f and L continuously differentiable in both x and u, and T finite.
A solution
Definition
A point x∗ is a global minimizer if f (x∗) ≤ f (x) for all x.
Definition
A point x∗ is a local minimizer if there is a neighborhood N of x∗ such that f(x∗) ≤ f(x) for all x ∈ N.
Definition
A point x∗ is a strict local minimizer if there is a neighborhood N of x∗ such that f(x∗) < f(x) for all x ∈ N with x ≠ x∗.
Result (Taylor’s Theorem)
Suppose that f : Rn → R is continuously differentiable and that p ∈ Rn. Then we have
f(x + p) = f(x) + ∇x f(x + tp)′p (12)
for some t ∈ (0, 1). Moreover, if f is twice continuously differentiable, we have that
∇x f(x + p) = ∇x f(x) + ∫_0^1 ∇²x f(x + tp) p dt (13)
and that
f(x + p) = f(x) + ∇x f(x)′p + (1/2) p′∇²x f(x + tp) p (14)
for some t ∈ (0, 1).
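Expansion (14) can be sanity-checked numerically. A sketch on the Rosenbrock function, replacing ∇²x f(x + tp) by ∇²x f(x) so that the remainder is third order in ‖p‖:

```python
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([-400.0 * (x[1] - x[0]**2) * x[0] - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    return np.array([[-400.0 * (x[1] - 3.0 * x[0]**2) + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x = np.array([0.5, 0.5])
p = np.array([1e-3, -2e-3])
# second-order Taylor model with the Hessian evaluated at x (not x + tp)
quad = f(x) + grad(x) @ p + 0.5 * p @ hess(x) @ p
print(abs(f(x + p) - quad))  # small: the remainder is third order in ||p||
```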
Result (First order Necessary conditions)
If x∗ is a local minimizer and f is continuously differentiable, then ∇f(x∗) = 0.
Proof.
By contradiction, assume ∇f(x∗) ≠ 0 and choose p = −∇f(x∗). By the continuity of ∇f near x∗, there exists T > 0 such that
p′∇f(x∗ + tp) < 0 for all t ∈ (0, T).
Applying the Taylor result, for any t ∈ (0, T),
f(x∗ + tp) = f(x∗) + t p′∇f(x∗ + sp) for some s ∈ (0, t),
therefore f(x∗ + tp) < f(x∗), which is a contradiction.
Definition
We call x∗ a stationary point if ∇f(x∗) = 0.
Remark
Any local minimizer must be a stationary point.
An example: Rosenbrock function
f(x) = 100(x2 − x1²)² + (1 − x1)² (15)
For the Rosenbrock function, xk = (1, 1)′ is a stationary point since ∇fk = (0, 0)′, BUT that alone does not mean it is a local minimizer.
Definition
A matrix A ∈ Rn×n is positive definite if p′Ap > 0 for all p ∈ Rn, p ≠ 0. If the strict inequality is replaced by a weak one (p′Ap ≥ 0), the matrix A is said to be positive semi-definite.
Example
A = ( 2   γ ; −2 − γ   2 ), γ ∈ R.
We have p′Ap = 2(p1² + p2² − p1p2) > 0 for all p ≠ 0 (note that the quadratic form does not depend on γ), so A is positive definite.
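Since p′Ap depends only on the symmetric part (A + A′)/2, definiteness can be tested via its eigenvalues. A small sketch (the helper name is mine, not from the slides):

```python
import numpy as np

def is_positive_definite(A):
    # p'Ap = p'((A + A')/2)p, so test the eigenvalues of the symmetric part
    sym = 0.5 * (A + A.T)
    return bool(np.all(np.linalg.eigvalsh(sym) > 0))

for gamma in (-3.0, 0.0, 5.0):
    A = np.array([[2.0, gamma], [-2.0 - gamma, 2.0]])
    # symmetric part is [[2, -1], [-1, 2]] regardless of gamma
    print(gamma, is_positive_definite(A))  # True for every gamma
```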
Result (Second order Necessary conditions)
If x∗ is a local minimizer and ∇f and ∇²f exist and are continuous, then ∇f(x∗) = 0 and ∇²f(x∗) is positive semi-definite.
Proof.
We know that ∇f(x∗) = 0. For contradiction, assume that ∇²f(x∗) is not positive semi-definite; then we can choose p such that
p′∇²f(x∗)p < 0.
By using continuity and the Taylor result, for all sufficiently small t > 0 we find
f(x∗ + tp) = f(x∗) + (1/2) t² p′∇²f(x∗ + sp)p < f(x∗), s ∈ (0, t),
which is a contradiction.
Result (Second Order Sufficient conditions)
Suppose that ∇²f exists and is continuous, that ∇f(x∗) = 0, and that ∇²f(x∗) is positive definite. Then x∗ is a strict local minimizer of f.
Remark
The second order sufficient conditions are not necessary: a point x∗ may be a strict local minimizer and yet fail to satisfy the sufficient conditions.
Example
f(x) = x⁴ (16)
x∗ = 0 is the minimizer, but the Hessian vanishes there and is thus not positive definite.
An example: Rosenbrock function
f(x) = 100(x2 − x1²)² + (1 − x1)² (17)
For the Rosenbrock function, xk = (1, 1)′ is a stationary point since ∇fk = (0, 0)′, and ∇²fk is positive definite, so it is a strict local minimizer (indeed a global minimizer by inspection).
Definition (Convex set)
A set S ⊆ Rn is a convex set if the straight line segment connecting any two points in S lies entirely inside S: for x ∈ S, y ∈ S and σ ∈ [0, 1], we have σx + (1 − σ)y ∈ S.
Definition (Convex function)
A function f is convex if its domain S is a convex set and if for any two points x, y in S the following property is satisfied
f(σx + (1 − σ)y) ≤ σf(x) + (1 − σ)f(y), for all σ ∈ [0, 1] (18)
The function f is called strictly convex if the inequality above is strict for x ≠ y and σ ∈ (0, 1).
Remark
A function f is concave if −f is convex.
Examples of convex sets and convex functions
The polyhedron {x ∈ Rn | Ax = b, Cx ≤ d}
The unit ball {x ∈ Rn | ‖x‖2 ≤ 1}
The exponential function e^(ax) (convex for any a ∈ R)
The max function max(x1, x2, . . . , xn)
The log-sum function log(e^(x1) + e^(x2) + · · · + e^(xn))
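The convexity inequality (18) can be spot-checked numerically, for instance for the log-sum function above. A sketch with randomly sampled points:

```python
import numpy as np

def logsumexp(x):
    # log(e^{x1} + ... + e^{xn}); this naive version is fine for small inputs
    return np.log(np.sum(np.exp(x)))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    sigma = rng.uniform()
    lhs = logsumexp(sigma * x + (1.0 - sigma) * y)
    rhs = sigma * logsumexp(x) + (1.0 - sigma) * logsumexp(y)
    assert lhs <= rhs + 1e-12, "convexity inequality (18) violated"
print("inequality (18) held on every sampled point")
```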
Result (Minimum of Convex Functions)
When f is convex, any local minimizer x∗ is a global minimizer of f. If in addition f is differentiable, then any stationary point is a global minimizer of f.
Proof.
Suppose that x∗ is a local minimizer but not a global one; then we can find z ∈ Rn such that
f(z) < f(x∗).
Consider the points x = σz + (1 − σ)x∗ for σ ∈ (0, 1]. By the convexity of f, we have
f(x) ≤ σf(z) + (1 − σ)f(x∗) < f(x∗).
Any neighborhood of x∗ contains a piece of the segment σz + (1 − σ)x∗, which means that x∗ is not a local minimizer, a contradiction.
Overview of Algorithms
All algorithms require an initial estimate x0. The algorithm then generates a sequence of iterates {xk} that terminates either when no more progress can be made or when it seems that a solution point has been approximated with sufficient accuracy.
There are two fundamental strategies for moving from the current point xk to a new iterate xk+1.
Trust Region: The information gathered about f is used to construct a model (an approximation of f), and the direction is then computed such that a sufficient decrease in f is achieved.
Line Search: The algorithm chooses a direction pk and searches along this direction from xk for a new iterate with a lower function value.
The two approaches are dual: they differ in the order in which they choose the direction and the distance to move.
Line Search Methods
The distance to move along pk can be found by solving the following problem
min_α f(xk + αpk) (19)
where α is the step length.
1 Solving the problem above exactly extracts the maximum benefit from the direction, but may be expensive;
2 an approximate solution is usually sufficient: the line search generates only a limited number of trial step lengths.
At each iteration, we have the update equation
xk+1 = xk + αkpk (20)
Possible choices of the direction pk?
1 Steepest descent: pk = −∇fk (from the first-order model f(xk + p) ≈ f(xk) + p′∇fk)
2 Newton method: pk^N = −(∇²fk)⁻¹∇fk (from the quadratic model f(xk + p) ≈ f(xk) + p′∇fk + (1/2) p′∇²fk p)
3 Quasi-Newton: pk^QN = −(Bk)⁻¹∇fk, where Bk is an approximation of the Hessian matrix (e.g. the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula)
A descent direction satisfies p′k∇fk < 0; this condition guarantees a decrease of f along the direction for sufficiently small steps.
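A sketch comparing the steepest descent and Newton directions on the Rosenbrock function, using the gradient and Hessian of equations (5) and (6):

```python
import numpy as np

def grad(x):
    # Gradient of the Rosenbrock function, equation (5)
    return np.array([-400.0 * (x[1] - x[0]**2) * x[0] - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def hess(x):
    # Hessian of the Rosenbrock function, equation (6)
    return np.array([[-400.0 * (x[1] - 3.0 * x[0]**2) + 2.0, -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])

x = np.array([-1.2, 1.0])  # a common starting point for this function
g, H = grad(x), hess(x)
p_sd = -g                          # steepest descent direction
p_newton = -np.linalg.solve(H, g)  # Newton direction (solve, don't invert)
print(p_sd @ g < 0, p_newton @ g < 0)  # True True: both are descent directions
```

The Hessian is positive definite at this point, which is why the Newton direction is also a descent direction here.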
Independently of the chosen direction pk, the following problem needs to be (approximately) solved
min_{α>0} f(xk + αpk) (21)
such that a substantial reduction of f(xk+1) compared to f(xk) is achieved.
One way to provide a sufficient decrease is the Armijo condition
f(xk + αpk) ≤ f(xk) + c1 α ∇fk′pk, with c1 a small positive constant (e.g. c1 = 10⁻⁴) (22)
Another way to provide a sufficient decrease is the Goldstein conditions
f(xk) + (1 − c)αk∇fk′pk ≤ f(xk + αkpk) ≤ f(xk) + c αk∇fk′pk (23)
with 0 < c < 1/2.
Algorithm (Backtracking Line Search (Armijo rule))
Choose ᾱ > 0, ρ ∈ (0, 1), c ∈ (0, 1); set α ← ᾱ
repeat until f(xk + αpk) ≤ f(xk) + cα∇fk′pk
    α ← ρα
end (repeat)
Set αk = α
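The algorithm above translates almost line for line into code. A sketch, with the commonly used value c = 10⁻⁴ assumed as a default, followed by a steepest descent usage example:

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Backtracking line search enforcing the Armijo condition (22).
    Terminates for any descent direction p (i.e. p'grad_f(x) < 0)."""
    alpha = alpha0
    fx = f(x)
    slope = c * (grad_f(x) @ p)
    while f(x + alpha * p) > fx + alpha * slope:
        alpha *= rho  # shrink the step until sufficient decrease holds
    return alpha

# Usage sketch: steepest descent on the Rosenbrock function
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
g = lambda x: np.array([-400.0 * (x[1] - x[0]**2) * x[0] - 2.0 * (1.0 - x[0]),
                        200.0 * (x[1] - x[0]**2)])
x = np.array([-1.2, 1.0])
for _ in range(2000):
    p = -g(x)
    x = x + backtracking(f, g, x, p) * p
print(x)  # creeps toward (1, 1); steepest descent is slow on this function
```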
Exercise
Let f be a function described as
f(x) = 3(1 − x)²/4 − 2(1 − x)   if x > 1
       3(1 + x)²/4 − 2(1 + x)   if x < −1
       x² − 1                   if −1 ≤ x ≤ 1   (24)
Implement the backtracking algorithm with steepest descent (the parameters are α = 1, ρ = 1/2, c = 0.1; also test your own parameters). Implement the steepest descent with the Goldstein condition (c = 1/2; test your own values). Implement the steepest descent with a constant α and test it on the function (x − 1)² (test different values of the constant α). Conclude.
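A sketch of the exercise's piecewise function (24) and its derivative in Python, as a starting point (the line search implementations themselves are left as the exercise; the branch formulas are taken directly from (24)):

```python
def f(x):
    # Piecewise function (24); the pieces join smoothly at x = +/-1
    if x > 1.0:
        return 3.0 * (1.0 - x)**2 / 4.0 - 2.0 * (1.0 - x)
    if x < -1.0:
        return 3.0 * (1.0 + x)**2 / 4.0 - 2.0 * (1.0 + x)
    return x**2 - 1.0

def df(x):
    # Derivative of each branch of (24)
    if x > 1.0:
        return -1.5 * (1.0 - x) + 2.0
    if x < -1.0:
        return 1.5 * (1.0 + x) - 2.0
    return 2.0 * x

# values and slopes of the pieces match at the breakpoints x = +/-1,
# so f is continuously differentiable
print(f(1.0), df(1.0))    # 0.0 2.0
print(f(-1.0), df(-1.0))  # 0.0 -2.0
```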
Nonlinear Programming
Next lecture: An Optimal Control Problem (Room E-N 191, 2-4 pm)