Introduction Optimality Conditions Solution methods
Continuous Optimisation, Chpt 2: Unconstrained Optimisation
Peter J.C. Dickinson
DMMP, University of Twente
http://dickinson.website/Teaching/2017CO.html
version: 28/09/17
Monday 25th September 2017
Peter J.C. Dickinson http://dickinson.website CO17, Chpt 2: Unconstrained Optimisation 1/21
Literature: KRT 2.1 and 4.
Table of Contents
1 Introduction
2 Optimality Conditions
    Geometry of minimisation
    Descent directions
    Necessary/sufficient conditions
    Convex functions
3 Solution methods
Geometry of minimisation
Theorem 2.1 (Geometry of minimisation)
Consider $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$, and a point $y \in F$ with $\nabla f(y) \neq 0$. In a neighbourhood of $y$ the set $D_y = \{x \in F : f(x) = f(y)\}$ is a $C^1$-manifold of dimension $n - 1$, and at $y$ we have $\nabla f(y) \perp D_y$.
Example
http://ggbm.at/e3vayUbW
$f(x_1, x_2) = \frac{1}{100}\,(x_1^2 + x_2^2)\,\big((x_1 - 5)^2 + (x_2 - 1)^2\big)\,\big((x_1 - 2)^2 + (x_2 - 3)^2 + 1\big)$
Three strict local minima, two of which are global minima.
Descent directions
Definition 2.2
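For $f : \mathbb{R}^n \to \mathbb{R}$ and $x \in \mathbb{R}^n$, we call $h \in \mathbb{R}^n$ a strict descent direction of $f$ at $x$ if $\exists\, \bar\varepsilon > 0$ s.t. $f(x + \varepsilon h) < f(x)$ for all $\varepsilon \in (0, \bar\varepsilon]$.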
Lemma 2.3
For $x \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ consider the following statements:
1. $x$ is a global minimiser of $f$;
2. $x$ is a local minimiser of $f$;
3. There are no strict descent directions of $f$ at $x$.
We have (1) ⇒ (2) ⇒ (3). If $f$ is convex then (1) ⇔ (2) ⇔ (3).
Exercises
Ex. 2.1 Prove Lemma 2.3.
Ex. 2.2 Consider $f(x_1, x_2) = (x_1^2 - 2x_2)(x_1^2 - x_2)$. Show that:
(a) the origin $0$ is not a local minimiser of $f$;
(b) all $h \in \mathbb{R}^2 \setminus \{0\}$ are strict ascent directions of $f$ at $0$, i.e. for all $h \in \mathbb{R}^2 \setminus \{0\}$, $\exists\, \bar\varepsilon > 0$ s.t. $f(\varepsilon h) > f(0)$ for all $\varepsilon \in (0, \bar\varepsilon]$.
N.B. Therefore, in this nonconvex example, statement (3) of Lemma 2.3 holds, but statement (2), and hence statement (1), does not. We thus see that for nonconvex problems, even if every direction leads to an increase, we may still not have a local minimum.
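The behaviour described in Ex. 2.2 can be observed numerically. The following is only an illustration, not a proof: the sample directions, the step size `eps` and the parabola $x_2 = 0.75\,x_1^2$ are choices made for this sketch and do not come from the slides.

```python
import numpy as np

def f(x1, x2):
    # Function from Ex. 2.2
    return (x1**2 - 2.0 * x2) * (x1**2 - x2)

# Along straight rays eps * h through the origin, f is positive for small eps > 0.
# The 16 sample directions and the value of eps are arbitrary choices.
eps = 1e-3
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
ray_values = [f(eps * np.cos(a), eps * np.sin(a)) for a in angles]

# Along the parabola x2 = 0.75 * x1**2 (chosen to lie between x2 = x1**2 / 2
# and x2 = x1**2), f equals -0.125 * x1**4, which is negative near the origin.
t = 1e-3
curve_value = f(t, 0.75 * t**2)

print(min(ray_values) > 0.0)
print(curve_value < 0.0)
```

Both checks pass for these samples: the values along the rays are positive, while the value on the parabola is negative.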
Necessary/sufficient conditions
Theorem 2.4
For $f \in C^2$ and $\|h\|$ small:
$f(x + h) = f(x) + \nabla f(x)^T h + \tfrac{1}{2} h^T \nabla^2 f(x) h + o(\|h\|^2)$.
Corollary 2.5 (Necessary condition)
Consider $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$ (resp. $f \in C^2$). If $x \in \mathbb{R}^n$ is a local minimiser then $\nabla f(x) = 0$ (resp. $\nabla^2 f(x) \succeq O$).
N.B. Not sufficient, e.g. $f(x) = x^3$, $-x^4$, $-\exp(-x^{-2})$ at $x = 0$.
Corollary 2.6 (Sufficient condition)
Consider $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$. If $x \in \mathbb{R}^n$ has $\nabla f(x) = 0$ and $\nabla^2 f(x) \succ O$ then $x$ is a strict local minimiser of $f$.
N.B. Not necessary, e.g. $f(x) = x^4$, $\exp(-x^{-2})$.
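As a rough sketch of how Corollaries 2.5 and 2.6 might be applied numerically, the helper below inspects a supplied gradient and Hessian at a point. The function name `classify_stationary_point` and the tolerance `tol` are hypothetical choices for this illustration. In finite precision the tests are only approximate.

```python
import numpy as np

def classify_stationary_point(grad, hess, tol=1e-8):
    """Apply Corollaries 2.5 and 2.6 given the gradient and Hessian at a point."""
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    if np.linalg.norm(grad) > tol:
        return "not a local minimiser (gradient nonzero)"
    eigenvalues = np.linalg.eigvalsh(hess)  # the Hessian is symmetric
    if eigenvalues.min() > tol:
        return "strict local minimiser (sufficient condition holds)"
    if eigenvalues.min() < -tol:
        return "not a local minimiser (Hessian not positive semidefinite)"
    return "inconclusive (necessary condition holds, sufficient does not)"

# f(x) = x1^2 + 2*x2^2 at 0: gradient 0, Hessian diag(2, 4).
print(classify_stationary_point([0.0, 0.0], [[2.0, 0.0], [0.0, 4.0]]))
# f(x) = x^3 and f(x) = x^4 at 0: gradient 0 and Hessian 0 in both cases.
print(classify_stationary_point([0.0], [[0.0]]))
```

The inconclusive case covers both $x^3$ (no minimiser at 0) and $x^4$ (strict minimiser at 0), which is why neither corollary can decide it.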
Convex functions
Corollary 2.7 (from Theorem 1.21 and Corollary 2.5)
For a convex function $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$, and $x_0 \in \mathbb{R}^n$ the following are equivalent:
1. $x_0$ is a global minimum,
2. $x_0$ is a local minimum,
3. $\nabla f(x_0) = 0$.
Lemma 2.8
The set of global minimisers of a convex function is a convex set.
Example: Quadratic functions
For $Q \in \mathcal{S}^n$, $Q \succ O$, $c \in \mathbb{R}^n$ and $\gamma \in \mathbb{R}$ consider $f : \mathbb{R}^n \to \mathbb{R}$ given as
$f(x) = x^T Q x + 2c^T x + \gamma$.
We have $\nabla^2 f(x) = 2Q \succ O$ and thus $f$ is strictly convex.
A vector $x$ is a global minimiser of $f$ if and only if
$0 = \nabla f(x) = 2Qx + 2c$.
Therefore the unique strict global minimiser is $x^* = -Q^{-1}c$, and the optimal value is $f(x^*) = \gamma - c^T Q^{-1} c$.
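The closed-form minimiser and optimal value can be checked numerically. In the sketch below the data `Q`, `c` and `gamma` are made-up illustrative values (any symmetric positive definite `Q` would do), and the linear system is solved directly rather than forming $Q^{-1}$.

```python
import numpy as np

# Illustrative data (assumed, not from the slides): Q symmetric positive definite.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, -1.0])
gamma = 3.0

def f(x):
    return x @ Q @ x + 2.0 * (c @ x) + gamma

# Solve Q x = -c instead of forming the inverse explicitly.
x_star = np.linalg.solve(Q, -c)

# The gradient 2 Q x + 2 c should vanish at x_star.
gradient = 2.0 * Q @ x_star + 2.0 * c

# Optimal value from the closed form gamma - c^T Q^{-1} c.
optimal_value = gamma - c @ np.linalg.solve(Q, c)

print(x_star, f(x_star), optimal_value)
```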
Table of Contents
1 Introduction
2 Optimality Conditions
3 Solution methods
    Basic Idea
    Descent directions
    Choosing d
    Newton’s method
    (Dis)advantages
    Other methods
    Stopping Criteria
Descent directions and derivatives
Lemma 2.9
Consider $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$, and $x, d \in \mathbb{R}^n$. Then
$\frac{df}{dd}(x) = \nabla f(x)^T d$;
If $\frac{df}{dd}(x) < 0$ then $d$ is a strict descent direction of $f$ at $x$;
If $\frac{df}{dd}(x) > 0$ then $d$ is a strict ascent direction of $f$ at $x$ (and thus is not a strict descent direction of $f$ at $x$).
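Lemma 2.9 can be illustrated with a finite-difference check. The test function `f`, the point `x` and the step sizes below are assumptions made for this sketch. The forward difference only approximates the directional derivative.

```python
import numpy as np

def f(x):
    # Illustrative smooth function (an assumption for this sketch)
    return x[0]**2 + 3.0 * x[0] * x[1] + np.sin(x[1])

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + np.cos(x[1])])

x = np.array([1.0, 0.5])
d = -grad_f(x)  # a direction with negative directional derivative

# Directional derivative from the lemma: gradient transposed times d.
slope = grad_f(x) @ d

# Forward-difference approximation of the same quantity.
t = 1e-6
finite_difference = (f(x + t * d) - f(x)) / t

print(slope, finite_difference)
print(f(x + 1e-3 * d) < f(x))  # a small step along d decreases f
```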
Basic idea
Basic idea for minimising a function $f : \mathbb{R}^n \to (\mathbb{R} \cup \{\infty\})$, $f \in C^1$, over $\mathbb{R}^n$:
1. Start at a point $x_0 \in \mathbb{R}^n$. ($k = 0$)
2. Find a search direction $d_k \in \mathbb{R}^n$ such that $\frac{df}{dd_k}(x_k) < 0$.
3. If no such direction exists then STOP.
4. Line search: Find $\lambda_k = \arg\min_\lambda \{f(x_k + \lambda d_k) : \lambda \in \mathbb{R}\}$ (or just $f(x_k + \lambda_k d_k) < f(x_k)$). [See KRT, 4.3]
5. Let $x_{k+1} = x_k + \lambda_k d_k$ and $k \leftarrow k + 1$.
6. If stopping criteria satisfied then STOP, else go to step 2.
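The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the method prescribed by the course: the name `descent_method`, the callback `direction(x, g)`, the step-halving rule (which only enforces $f(x_k + \lambda_k d_k) < f(x_k)$ rather than an exact line search) and the gradient-norm tolerance are all choices made here. Simple decrease alone does not guarantee convergence in general.

```python
import numpy as np

def descent_method(f, grad_f, x0, direction, max_iter=1000, tol=1e-8):
    """Generic descent loop; `direction(x, g)` must return d with g @ d < 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:  # gradient ~ 0: no d with g @ d < 0 exists
            break
        d = direction(x, g)
        # Inexact line search: halve the step until f strictly decreases.
        step = 1.0
        while f(x + step * d) >= f(x) and step > 1e-16:
            step *= 0.5
        if step <= 1e-16:  # no decrease found along d
            break
        x = x + step * d
    return x

# Usage on an illustrative convex quadratic with minimiser (1, -0.5).
def f(x):
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 0.5)**2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])

def negative_gradient(x, g):
    return -g

x_min = descent_method(f, grad_f, [0.0, 0.0], negative_gradient)
print(x_min)
```

Passing a different `direction` callback (for instance one based on the Hessian) changes the method without touching the loop.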
Choosing d: First order
Lemma 2.10
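For $f \in C^1(\mathbb{R}^n, \mathbb{R})$ and $x, d \in \mathbb{R}^n$ we have $\frac{\partial f}{\partial d}(x) = \nabla f(x)^T d$ and
$f(x + \lambda d) = f(x) + \lambda \nabla f(x)^T d + o(\lambda)$.
Lemma 2.11
For $f \in C^1(\mathbb{R}^n, \mathbb{R})$ and $x \in \mathbb{R}^n$ s.t. $\nabla f(x) \neq 0$ we have
$\arg\min_d \{\nabla f(x)^T d : \|d\|_2 = 1\} = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2}$.
$d = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2}$ is the direction of steepest descent.
Ex. 2.3 For $x_{k+1}, d_k$ as given on the previous slide with $\lambda_k = \arg\min_\lambda \{f(x_k + \lambda d_k) : \lambda \in \mathbb{R}\}$, show that $d_k^T \nabla f(x_{k+1}) = 0$.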
Example: Quadratic optimisation
Ex. 2.4 Do exercise 4.15 from KRT.
https://ggbm.at/TYBdQDeB
The convergence to the optimum can be quite slow.
This is a problem in general when minimising a function $f \in C^2$: if at a minimiser $x^*$ we have $\nabla^2 f(x^*) \succ O$, then for $A = \tfrac{1}{2}\nabla^2 f(x^*) \succ O$, $c = -Ax^*$, $\gamma = f(x^*) + x^{*T} A x^*$ we have
$f(x) \approx f(x^*) + (x - x^*)^T A (x - x^*) = x^T A x + 2c^T x + \gamma$ for $x \approx x^*$.
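The slow convergence can be reproduced with steepest descent and exact line search on an ill-conditioned quadratic. The matrix `Q`, the starting point and the tolerance are assumptions for this sketch. For $f(x) = x^T Q x + 2c^T x$ with gradient $g$ and direction $d = -g$, setting the derivative of $\lambda \mapsto f(x + \lambda d)$ to zero gives the exact step $\lambda = g^T g / (2 g^T Q g)$.

```python
import numpy as np

# Illustrative ill-conditioned quadratic f(x) = x^T Q x + 2 c^T x (assumed data).
Q = np.diag([1.0, 25.0])
c = np.zeros(2)

def grad_f(x):
    return 2.0 * Q @ x + 2.0 * c

x = np.array([25.0, 1.0])
iterations = 0
while np.linalg.norm(grad_f(x)) > 1e-8 and iterations < 10_000:
    g = grad_f(x)
    d = -g
    step = (g @ g) / (2.0 * (g @ Q @ g))  # exact minimiser of f(x + step * d)
    x_next = x + step * d
    orthogonality = d @ grad_f(x_next)    # numerically zero, cf. Ex. 2.3
    x = x_next
    iterations += 1

print(iterations, x)
```

With this starting point each iterate is the previous one scaled by $24/26$ (with the sign of the second coordinate flipping), so a few hundred iterations are needed even in two dimensions.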
Newton’s method
Lemma 2.12
For $f \in C^2(\mathbb{R}^n, \mathbb{R})$ and $x, d \in \mathbb{R}^n$ we have
$f(x + d) = f(x) + \nabla f(x)^T d + \tfrac{1}{2} d^T \nabla^2 f(x) d + o(\|d\|^2)$.
Letting $Q = \tfrac{1}{2}\nabla^2 f(x_k)$, $c = \tfrac{1}{2}\nabla f(x_k)$ and $\gamma = f(x_k)$ we have
$f(x_k + d) \approx d^T Q d + 2c^T d + \gamma$.
If $Q \succ O$ then, as a function of $d$, the right-hand side is minimised at $d = -Q^{-1}c = -\big(\nabla^2 f(x_k)\big)^{-1} \nabla f(x_k)$.
This is referred to as Newton’s direction, and works well as a search direction (often with $\lambda_k = 1$, $x_{k+1} = x_k - \big(\nabla^2 f(x_k)\big)^{-1} \nabla f(x_k)$, e.g. https://ggbm.at/qMX5uqcF ).
Finds minimum in one step for quadratic functions.
Ex. 2.5 Show that if $A \succ O$, $\nabla f(x_k) \neq 0$ and $d_k = -A\nabla f(x_k)$ then $d_k^T \nabla f(x_k) < 0$, and thus $d_k$ is a descent direction. Which choices of $A$ give the steepest descent and the Newton’s direction respectively?
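The one-step property for quadratics can be checked with a small sketch. The helper name `newton_step` and the data `Q`, `c`, `x0` are assumptions for this illustration, and the Hessian is assumed positive definite. The direction is obtained by solving a linear system rather than inverting the Hessian.

```python
import numpy as np

def newton_step(grad, hess):
    """Newton direction d = -(hess)^{-1} grad, computed by solving a linear system."""
    return -np.linalg.solve(hess, grad)

# Illustrative quadratic f(x) = x^T Q x + 2 c^T x + gamma (assumed data, Q positive definite).
Q = np.array([[1.0, 0.0], [0.0, 25.0]])
c = np.array([-1.0, 5.0])

def grad_f(x):
    return 2.0 * Q @ x + 2.0 * c

def hess_f(x):
    return 2.0 * Q

x0 = np.array([25.0, 1.0])
x1 = x0 + newton_step(grad_f(x0), hess_f(x0))  # step length 1

# x1 should coincide with the minimiser -Q^{-1} c of the quadratic.
print(x1, np.linalg.solve(Q, -c))
```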
(Dis)advantages
(+) Newton’s method normally converges quicker (in terms of number of steps).
(+) With Newton’s method it is normally sufficient to consider a step length of one, so no line search is necessary.
(–) For the steepest descent method we need only compute the gradient vector $\nabla f(x_k)$, whereas for Newton’s method we also need to compute the Hessian and $\big(\nabla^2 f(x_k)\big)^{-1} \nabla f(x_k)$.
(–) With Newton’s method, we require that the Hessian matrix is positive definite.
Exercise
Ex. 2.6 Consider the (convex) function $f : \mathbb{R}^2 \to \mathbb{R}$ given by
$f(x) = \exp(x_1^2 + 2x_2^2)$.
(N.B. The global minimiser is at $x^* = 0$.)
For the starting point $x_0 = \begin{pmatrix} 0.6 & 0.6 \end{pmatrix}^T$, perform the first 7 iterations (i.e. find $x_1, \ldots, x_7$) for:
1. the steepest descent method;
2. Newton’s method without line search;
3. Newton’s method with line search.
Other methods
There are also plenty of other methods, e.g.:
Conjugate gradient method;
Quasi-Newton method;
Stochastic gradient descent;
Simulated annealing.
Stopping Criteria
Upper bound given by $f(x_k) \in \mathbb{R}$, i.e. $\inf_x \{f(x) : x \in \mathbb{R}^n\} \le f(x_k)$.
Could stop after a certain number of iterations.
If we also have lower bounds $L_k \in \mathbb{R}$ (e.g. duality, see later in course), can pick parameter $\varepsilon > 0$ and stop when
$\frac{f(x_k) - L_k}{1 + |f(x_k)|} \le \varepsilon$,
i.e. relative difference between upper and lower bounds small.
If no (good) lower bounds, can pick parameter $\varepsilon > 0$ and stop when
$\frac{f(x_k) - f(x_{k+1})}{1 + |f(x_k)|} \le \varepsilon$,
i.e. relative improvement small.
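The two tests translate directly into code. The function names `gap_small` and `improvement_small` are hypothetical. The arguments are the function values and lower bound appearing in the formulas above, so how they are obtained is left to the surrounding method.

```python
def gap_small(f_k, lower_bound_k, eps):
    """Relative gap test: (f(x_k) - L_k) / (1 + |f(x_k)|) <= eps."""
    return (f_k - lower_bound_k) / (1.0 + abs(f_k)) <= eps

def improvement_small(f_k, f_k_plus_1, eps):
    """Relative improvement test: (f(x_k) - f(x_{k+1})) / (1 + |f(x_k)|) <= eps."""
    return (f_k - f_k_plus_1) / (1.0 + abs(f_k)) <= eps

# Illustrative values: a gap of 0.001 relative to 1 + |10| is below eps = 1e-3.
print(gap_small(10.0, 9.999, 1e-3))
print(improvement_small(2.0, 1.0, 1e-6))
```

The `1 +` in the denominator keeps both tests well defined when $f(x_k)$ is close to zero.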
Ex. 2.7 Assuming
$x_{k+1} = x_k + d_k$, and
$f(x_{k+1}) \approx f(x_k) + d_k^T \nabla f(x_k) + \tfrac{1}{2} d_k^T \nabla^2 f(x_k) d_k$
when considering Newton’s method, what is
$f(x_k) - f(x_{k+1})$
approximately equal to, as a function of $x_k$?