
Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 1

Final Review

Yinyu Ye

Department of Management Science and Engineering

Stanford University

Stanford, CA 94305, U.S.A.

http://www.stanford.edu/~yyye


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 2

Duality Theory for Convex Optimization

(CLP)  minimize  c • x
       subject to  a_i • x = b_i,  i = 1, 2, ..., m,  x ∈ C.

(CLD)  maximize  b^T y
       subject to  ∑_{i=1}^m y_i a_i + s = c,  s ∈ C^*,

where y ∈ R^m, s is called the dual slack vector/matrix, and C^* is the dual cone of C.

Weak duality: x • s ≥ 0 for any feasible x of (CLP) and s of (CLD), since x ∈ C and s ∈ C^*. Moreover, x • s = (c − ∑_{i=1}^m y_i a_i) • x = c • x − b^T y, so x • s is exactly the duality gap.

Linear Programming (LP): c, a_i, x ∈ R^n and C = R^n_+.

Semidefinite Programming (SDP): c, a_i, x ∈ M^n (symmetric matrices) and C = M^n_+ (positive semidefinite matrices).
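As a quick numerical illustration of weak duality in the LP case (this sketch and its random data are mine, not part of the notes), the following builds a primal-dual feasible pair by construction and checks that x • s = c^T x − b^T y ≥ 0:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6

A = rng.standard_normal((m, n))   # rows are the a_i
x = rng.random(n) + 0.1           # primal feasible: x > 0, so x is in C = R^n_+
b = A @ x                         # and Ax = b by construction
y = rng.standard_normal(m)
s = rng.random(n) + 0.1           # dual slack s > 0, so s is in C^* = R^n_+
c = A.T @ y + s                   # dual feasible: A^T y + s = c

gap = x @ s                       # duality gap x • s
print(gap, c @ x - b @ y)         # the two values coincide and are nonnegative
```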


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 3

Nonlinear and Nonconvex Optimization Problems

The question: How does one recognize an optimal solution to a nonlinearly

constrained optimization problem? Let the problem have the form

(P)  minimize  f(x)
     subject to  c(x) (≤, =, ≥) 0.

The functions c_i(x) are the components of a mapping

c(x) = (c_1(x), ..., c_m(x))^T

from R^n to R^m.

We would like to develop a set of test criteria.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 4

Descent directions

Let U ⊂ R^n and let f : U → R be a differentiable function on U. If x ∈ U and there exists a vector d such that

∇f(x)d < 0,

then d is called a descent direction at x. For a differentiable function f,

{d : ∇f(x)d < 0}

is the set of descent directions at x; denote it by D.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 5

Feasible directions

At a feasible point x, the feasible direction cone is

F := {d ∈ R^n : d ≠ 0, x + λd ∈ S for all λ ∈ (0, γ) for some γ > 0}.

Examples:

S = {x : Ax = b}  ⇒  F = {d : Ad = 0}.

S = {x : Ax ≥ b}  ⇒  F = {d : A_i d ≥ 0, ∀ i ∈ A(x)}, where the active or binding constraint set A(x) := {i : A_i x = b_i}.
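A small sketch of the second example (my own illustration, not from the notes): it determines the active set A(x) for S = {x : Ax ≥ b} and tests whether a direction d satisfies A_i d ≥ 0 for every active index i.

```python
import numpy as np

A = np.array([[ 1.0,  0.0],   # x1 >= 0
              [ 0.0,  1.0],   # x2 >= 0
              [-1.0, -1.0]])  # x1 + x2 <= 2
b = np.array([0.0, 0.0, -2.0])

def is_feasible_direction(x, d, tol=1e-9):
    """Check A_i d >= 0 over the active set A(x) = {i : A_i x = b_i}."""
    active = np.isclose(A @ x, b, atol=tol)
    return bool(np.all(A[active] @ d >= -tol))

x = np.array([0.0, 1.0])                                 # only x1 >= 0 is active
print(is_feasible_direction(x, np.array([ 1.0, 0.0])))   # True: moves into S
print(is_feasible_direction(x, np.array([-1.0, 0.0])))   # False: leaves S
```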


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 6

Feasible curve

At a feasible x, consider the feasible direction set

F := {d ∈ R^n : ∇c_i(x)d (<, =, >) 0, ∀ i ∈ A(x)},

where A(x) is the set of active constraints, including all equality constraints.

Then, for 0 ≤ θ ≤ θ̄, consider a possibly feasible curve γ(θ) ∈ R^n with γ(0) = x and γ′(0) = d ∈ F.

Such a curve always exists if the gradients ∇c_i(x) of all active constraints are linearly independent (Constraint Qualification). Then D ∩ F must be empty if x is a local minimizer.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 7

The nonlinearly constrained KKT condition illustration

[Figure 1: KKT illustration, showing the constraint surface h(x) = 0, a direction d at the point x, the level set of f(x), and the feasible curve γ.]


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 8

Other Constraint Qualifications?

Yes, for example,

minimize  f(x)
subject to  c(x) ≥ 0,

where the c_i(x) are all concave and the feasible region has an interior (Slater's condition).


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 9

[Figure: gradients ∇f(x), ∇c1(x), and ∇c2(x) drawn at a feasible point in the (x1, x2) plane.]


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 10

The KKT Theorem

Theorem 1  Let x be a local minimizer for (P). Assume the functions c_i are differentiable at x for all i, and that a constraint qualification is met: the gradients ∇c_i(x) of all active constraints are linearly independent. Then there exist multipliers y_1, . . . , y_m such that

∇f(x) − ∑_{i=1}^m y_i ∇c_i(x) = 0,

y_i c_i(x) = 0,  ∀ i = 1, . . . , m,

y_i (≤, free, ≥) 0,  ∀ i = 1, . . . , m.
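To make the conditions concrete, here is a small hand-checkable instance of my own (not from the notes): minimize f(x) = x_1² + x_2² subject to the single equality constraint c_1(x) = x_1 + x_2 − 1 = 0. The minimizer is x = (1/2, 1/2) with multiplier y_1 = 1, and the sketch evaluates the KKT residuals numerically.

```python
import numpy as np

def grad_f(x):     # f(x) = x1^2 + x2^2
    return 2 * x

def c(x):          # equality constraint c1(x) = x1 + x2 - 1 = 0
    return np.array([x[0] + x[1] - 1.0])

def grad_c(x):     # rows are the constraint gradients
    return np.array([[1.0, 1.0]])

x = np.array([0.5, 0.5])
y = np.array([1.0])        # multiplier is free in sign: equality constraint

stationarity = grad_f(x) - grad_c(x).T @ y   # should be the zero vector
complementarity = y * c(x)                   # y_i c_i(x) = 0 for all i
print(stationarity, complementarity)         # [0. 0.] [0.]
```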


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 11

Constraint Qualification and KKT Conditions

The KKT conditions serve as the test criteria, but they may not always work.

They always work if some Constraint Qualification is met.

A local minimizer may still satisfy the KKT conditions even if the Constraint Qualification is not met.

Constraint Qualification is a sufficient condition to make the KKT conditions work, and the KKT conditions are then necessary for a solution to be a local minimizer.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 12

Sign rules of the Lagrange system

∇f(x) − ∇c(x)^T y = 0

Max model                          Min model
i-th constraint ≤  ⇔  y_i ≥ 0      i-th constraint ≥  ⇔  y_i ≥ 0
i-th constraint ≥  ⇔  y_i ≤ 0      i-th constraint ≤  ⇔  y_i ≤ 0
i-th constraint =  ⇔  y_i free     i-th constraint =  ⇔  y_i free


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 13

The Lagrange function

The numbers y_1, . . . , y_m (under the sign requirements) are called Lagrange multipliers; the function

L(x, y) = f(x) − ∑_{i=1}^m y_i c_i(x)

is called the Lagrangian function, or simply the Lagrangian.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 14

The KKT Conditions

The first-order necessary conditions of local optimality are called the

Karush-Kuhn-Tucker conditions.

The vector x is called a KKT stationary point, and (x, y) is called a KKT pair.

To say that x is a KKT stationary point means that there exists a vector y such

that (x, y) is a KKT pair, i.e., satisfies the KKT (first-order necessary) conditions

of local optimality.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 15

First-order sufficient conditions for optimality

Theorem 2  If f is a differentiable convex function and the feasible set is convex, then the first-order (KKT) optimality conditions are sufficient for global optimality.

The feasible region is convex if the "≤" constraint functions are convex, the "≥" constraint functions are concave, and the "=" constraint functions are affine.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 16

Second-order necessary conditions

Theorem 3  Let x be a local minimizer of (P) and let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x (the gradients ∇c_i(x) of all active constraints are linearly independent), then

z^T ∇²_x L(x, y) z ≥ 0  for all z ∈ T,

where the tangent space

T := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ A(x)}.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 17

Second-order sufficient conditions

Theorem 4  Let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x and

z^T ∇²_x L(x, y) z > 0  for all z ∈ T′, z ≠ 0,

where

T′ := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ {i : y_i > 0}},

then x is a local minimizer of (P).
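One numerical way to test the second-order conditions (my own sketch; the data and the use of scipy are assumptions, not from the notes) is to build a basis Z for the tangent space from the active-constraint gradients and inspect the eigenvalues of the reduced Hessian Z^T ∇²_x L(x, y) Z:

```python
import numpy as np
from scipy.linalg import null_space

# Example data at a KKT pair (x, y): the Hessian of the Lagrangian and the
# gradients (as rows) of the constraints that are active with y_i > 0.
H = np.array([[2.0,  0.0],
              [0.0, -1.0]])     # indefinite on all of R^2 ...
G = np.array([[0.0,  1.0]])     # ... but T' = {d : d_2 = 0}

Z = null_space(G)               # columns of Z span the subspace T'
reduced_H = Z.T @ H @ Z         # the Hessian restricted to T'
eigvals = np.linalg.eigvalsh(reduced_H)
print(eigvals, bool(np.all(eigvals > 0)))   # all positive: condition holds
```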


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 18

Optimization Algorithms

Optimization algorithms tend to be iterative procedures.

Starting from a given point x_0, they generate a sequence {x_k} of iterates (or trial solutions).

We study algorithms that produce iterates according to well-determined rules: deterministic algorithms.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 19

Search direction and step-size

Typically, a nonlinear programming algorithm generates a sequence of points

through an iterative scheme of the form

x_{k+1} = x_k + α_k p_k,

where p_k is the search direction and α_k is the step size (or step length).

The point is that once x_k is known, p_k is some function of x_k, and the scalar α_k may be chosen in accordance with some line-search rule.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 20

The gradient method (steepest descent method)

Let f be a differentiable function and assume we can compute ∇f. We want to solve the unconstrained minimization problem

min_{x ∈ R^n}  f(x).

In the absence of further information, we seek a stationary point of f, that is, a point x* at which ∇f(x*) = 0.

We choose p_k = −∇f(x_k) as the search direction at x_k; in fact, it is the direction of steepest descent. The step size α_k ≥ 0 is chosen "appropriately," namely to satisfy

α_k ∈ arg min_{α ≥ 0} f(x_k − α∇f(x_k)).

Then the new iterate is defined as x_{k+1} = x_k − α_k ∇f(x_k).

Convergence: if the level set is bounded, every limit point of {x_k} is a KKT (stationary) point.
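A minimal sketch of the method (my own illustration, not code from the notes), using scipy's minimize_scalar to carry out the line search α_k ∈ arg min_α f(x_k − α∇f(x_k)) on a convex quadratic test function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])                 # positive definite test problem
def f(x):      return 0.5 * x @ Q @ x      # minimizer is x* = 0
def grad_f(x): return Q @ x

x = np.array([4.0, -3.0])                  # starting point x_0
for k in range(50):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:           # stationary point reached
        break
    # line search along the steepest-descent direction -g
    alpha = minimize_scalar(lambda a: f(x - a * g),
                            bounds=(0.0, 10.0), method="bounded").x
    x = x - alpha * g                      # x_{k+1} = x_k - alpha_k grad f(x_k)
print(k, x, np.linalg.norm(grad_f(x)))
```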


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 21

Newton’s method

min_{x ∈ R^n}  f(x).

The iteration is given by

x_{k+1} = x_k − α_k (∇²f(x_k))^{−1} ∇f(x_k).

Convergence: quadratic if the starting point is close enough to the solution and the Hessian matrix is non-singular at every iterate and at the limit point.
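A bare-bones sketch of the iteration (my own demo with α_k fixed at 1, the choice under which the local quadratic rate holds), applied to a small smooth test function chosen for illustration:

```python
import numpy as np

def grad_f(v):                  # f(x, y) = x^4/4 + y^2/2 - x*y
    x, y = v
    return np.array([x**3 - y, y - x])

def hess_f(v):
    x, y = v
    return np.array([[3 * x**2, -1.0],
                     [-1.0,      1.0]])

v = np.array([1.5, 1.2])        # close to the minimizer (1, 1)
for k in range(8):
    step = np.linalg.solve(hess_f(v), grad_f(v))  # (grad^2 f)^{-1} grad f
    v = v - step                                  # alpha_k = 1
    print(k, v, np.linalg.norm(grad_f(v)))        # gradient norm drops fast
```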


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 22

Sample 1: Newton’s Method

min_{x ∈ R}  f(x) = ax − log(x),  x > 0.

We have

f′(x) = a − 1/x  and  f′′(x) = 1/x².

The iteration is given by

x_{k+1} = x_k − (x_k)² (a − 1/x_k) = 2x_k − a(x_k)².

Hence

1 − a x_{k+1} = 1 − 2a x_k + a²(x_k)² = (1 − a x_k)²,

so the residual 1 − a x_k is squared at every step: quadratic convergence to x* = 1/a.
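A few lines (my own demo) running the iteration x_{k+1} = 2x_k − a(x_k)² and printing the residual 1 − a x_k, whose magnitude is squared at every step:

```python
a = 4.0
x = 0.1                        # needs 0 < x_0 < 2/a so that |1 - a*x_0| < 1
for k in range(6):
    print(k, x, 1 - a * x)     # residual: 0.6, 0.36, 0.1296, ...
    x = 2 * x - a * x * x      # Newton step for f(x) = a*x - log(x)
print("limit:", x, "vs 1/a =", 1 / a)
```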


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 23

Sample 2: Convex Quadratic Optimization and SDP

Given a positive semidefinite matrix Q, consider a bounded QP problem

(CQP)  minimize  x^T Q x + 2c^T x
       subject to  Ax = b,  x ≥ 0,

and

(COP)  minimize  Q • X + 2c^T x
       subject to  Ax = b,  x ≥ 0,  X ⪰ x x^T.

Show that the two share the same optimal objective value.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 24

Let x̄ be the minimizer of (CQP). Then X = x̄ x̄^T and x = x̄ are feasible for (COP), so the minimum value of (COP) is less than or equal to that of (CQP).

Let (X̄, x̄) be the minimizer pair for (COP). Then x̄ is feasible for (CQP). Moreover, the objective value of (CQP) at x̄ is

x̄^T Q x̄ + 2c^T x̄ = Q • x̄ x̄^T + 2c^T x̄ ≤ Q • X̄ + 2c^T x̄,

where the inequality holds because Q ⪰ 0 and X̄ − x̄ x̄^T ⪰ 0 imply Q • (X̄ − x̄ x̄^T) ≥ 0. That is, the minimum value of (CQP) is less than or equal to that of (COP).


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 25

Note that the matrix inequality of (COP),

X ⪰ x x^T,

can be written as a standard semidefinite matrix inequality (a Schur complement):

[ X     x ]
[ x^T   1 ]  ⪰ 0.
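As an illustration (my own sketch; the notes use no software, and cvxpy here is an assumption), both problems can be stated in a modeling tool, with one PSD variable M = [[X, x], [x^T, 1]] encoding the constraint X ⪰ x x^T; on a randomly generated bounded instance the two optimal values agree up to solver tolerance.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
G = rng.standard_normal((n, n)); Q = G @ G.T    # Q positive semidefinite
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ (rng.random(n) + 0.1)                   # some x > 0 is feasible

# (CQP): a convex QP since Q is PSD
x = cp.Variable(n)
cqp = cp.Problem(cp.Minimize(cp.quad_form(x, Q) + 2 * c @ x),
                 [A @ x == b, x >= 0])
cqp.solve()

# (COP): the PSD block matrix M = [[X, z], [z^T, 1]] encodes X >= z z^T
M = cp.Variable((n + 1, n + 1), PSD=True)
X, z = M[:n, :n], M[:n, n]
cop = cp.Problem(cp.Minimize(cp.trace(Q @ X) + 2 * c @ z),
                 [M[n, n] == 1, A @ z == b, z >= 0])
cop.solve()

print(cqp.value, cop.value)                     # equal up to solver tolerance
```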


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 26

Sample 3: Proof of Lemma 3 on Lecture Note #11

Given an interior feasible solution (x > 0 ∈ R^n, y ∈ R^m, s > 0 ∈ R^n) for the standard-form linear program with Ax = b and s = c − A^T y, let the direction d = (d_x, d_y, d_s) be generated from the equations

S d_x + X d_s = γμe − Xs,
A d_x = 0,
−A^T d_y − d_s = 0,

where γ = n/(n + ρ) for some positive number ρ and μ = x^T s/n, and let

θ = α √min(Xs) / ‖(XS)^{−1/2}( (x^T s/(n + ρ)) e − Xs )‖,   (1)

where α is a positive constant less than 1. Let

x^+ = x + θ d_x,  y^+ = y + θ d_y,  and  s^+ = s + θ d_s.
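A small numpy sketch (my own illustration of the formulas above, not code from the notes) that solves the three Newton equations by eliminating d_s = −A^T d_y and d_x = S^{−1}(γμe − Xs + X A^T d_y), which reduces A d_x = 0 to normal equations in d_y, and then forms θ from (1):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
rho, alpha = np.sqrt(n), 0.5

A = rng.standard_normal((m, n))
x = rng.random(n) + 0.5                 # interior primal point; set b = Ax
b = A @ x
y = rng.standard_normal(m)
s = rng.random(n) + 0.5                 # interior dual slack; set c = A^T y + s
c = A.T @ y + s

mu = x @ s / n
gamma = n / (n + rho)
r = gamma * mu - x * s                  # right-hand side gamma*mu*e - Xs

# Normal equations (A S^{-1} X A^T) d_y = -A S^{-1} r, then back-substitute.
dy = np.linalg.solve(A @ np.diag(x / s) @ A.T, -A @ (r / s))
ds = -A.T @ dy
dx = (r - x * ds) / s                   # from S dx + X ds = r

v = x @ s / (n + rho) - x * s           # (x^T s/(n+rho)) e - Xs
theta = alpha * np.sqrt(np.min(x * s)) / np.linalg.norm(v / np.sqrt(x * s))

xp, yp, sp = x + theta * dx, y + theta * dy, s + theta * ds
print(np.allclose(A @ xp, b), xp.min() > 0, sp.min() > 0)  # interior feasible
print(abs(dx @ ds) < 1e-12)             # d_x and d_s are orthogonal
```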


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 27

Then (x^+, y^+, s^+) remains interior feasible and

ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −α √min(Xs) ‖(XS)^{−1/2}(e − ((n + ρ)/(x^T s)) Xs)‖ + α²/(2(1 − α)),

where

ψ_{n+ρ}(x, s) = (n + ρ) log(s^T x) − ∑_{j=1}^n log(s_j x_j).

Proof. Note that d_x^T d_s = 0 by the facts that A d_x = 0 and d_s = −A^T d_y. It is clear that

ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s)
= (n + ρ) log( 1 + θ(d_s^T x + d_x^T s)/(s^T x) ) − ∑_{j=1}^n ( log(1 + θ d_{s_j}/s_j) + log(1 + θ d_{x_j}/x_j) )


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 28

≤ (n + ρ) θ (d_s^T x + d_x^T s)/(s^T x) − θ e^T(S^{−1} d_s + X^{−1} d_x) + (‖θ S^{−1} d_s‖² + ‖θ X^{−1} d_x‖²)/(2(1 − τ)),

where τ = max{‖θ S^{−1} d_s‖_∞, ‖θ X^{−1} d_x‖_∞}, and the inequality follows from the Logarithmic Lemma. In the following, we first bound the difference of the first two terms and then bound the third term. For simplicity, we let β = (n + ρ)/(s^T x). Then we have

γμe = (n/(n + ρ)) (s^T x/n) e = (1/β) e.

The difference of the first two terms is

βθ(d_s^T x + d_x^T s) − θ e^T(S^{−1} d_s + X^{−1} d_x)
= βθ e^T(X d_s + S d_x) − θ e^T(S^{−1} d_s + X^{−1} d_x)
= θ(βe − (XS)^{−1} e)^T (X d_s + S d_x)
= θ(βe − (XS)^{−1} e)^T ((1/β) e − Xs)   (by Newton's equations)


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 29

= −θ(e − βXs)^T (XS)^{−1} ((1/β) e − Xs)
= −θβ ((1/β) e − Xs)^T (XS)^{−1} ((1/β) e − Xs)
= −θβ ‖(XS)^{−1/2}((1/β) e − Xs)‖²
= −βα √min(Xs) ‖(XS)^{−1/2}((1/β) e − Xs)‖   (by the definition of θ)
= −α √min(Xs) ‖(XS)^{−1/2}(e − βXs)‖
= −α √min(Xs) ‖(XS)^{−1/2}(e − ((n + ρ)/(x^T s)) Xs)‖.   (2)

Now for the third term we have

S^{−1} d_s = (XS)^{−1/2}(XS)^{−1/2} X d_s  and  X^{−1} d_x = (XS)^{−1/2}(XS)^{−1/2} S d_x.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 30

Therefore,

‖θ S^{−1} d_s‖² + ‖θ X^{−1} d_x‖²
≤ θ² ( ‖(XS)^{−1/2} X d_s‖² + ‖(XS)^{−1/2} S d_x‖² ) / min(Xs)
= θ² ‖(XS)^{−1/2} X d_s + (XS)^{−1/2} S d_x‖² / min(Xs)
= θ² ‖(XS)^{−1/2}(X d_s + S d_x)‖² / min(Xs)
= θ² ‖(XS)^{−1/2}((1/β) e − Xs)‖² / min(Xs)
= α²,

where the first equality holds since

d_s^T X (XS)^{−1} S d_x = d_s^T d_x = 0,

and the last equality follows from the definition of θ. This also shows that τ ≤ α.


Yinyu Ye, MS&E, Stanford MS&E311 Final Review Note 31

Therefore, we must have

(‖θ S^{−1} d_s‖² + ‖θ X^{−1} d_x‖²)/(2(1 − τ)) ≤ α²/(2(1 − α)).   (3)

The desired inequality follows from (2) and (3).

To complete the proof of the Lemma, we should show that

(x^+, s^+) > 0,  Ax^+ = b,  and  c − A^T y^+ = s^+.

The last two equalities are easily verified from the definitions of d_x, d_y, d_s. The positivity follows from τ ≤ α < 1, where recall that

τ = max{‖θ S^{−1} d_s‖_∞, ‖θ X^{−1} d_x‖_∞}

and

x^+ = x + θ d_x = X(e + θ X^{−1} d_x)  and  s^+ = s + θ d_s = S(e + θ S^{−1} d_s).

