Final Review
Yinyu Ye
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305, U.S.A.
http://www.stanford.edu/~yyye
Duality Theory for Convex Optimization
(CLP)  minimize    c • x
       subject to  a_i • x = b_i,  i = 1, 2, ..., m,  x ∈ C.

(CLD)  maximize    b^T y
       subject to  ∑_{i=1}^m y_i a_i + s = c,  s ∈ C*,

where y ∈ R^m, s is called the dual slack vector/matrix, and C* is the dual cone of C.
x • s ≥ 0
for any feasible x of (CLP) and s of (CLD).
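To see why (a one-line derivation added for completeness): for any feasible pair,
c • x − b^T y = (∑_{i=1}^m y_i a_i + s) • x − b^T y = ∑_{i=1}^m y_i (a_i • x) + x • s − b^T y = b^T y + x • s − b^T y = x • s,
and x • s ≥ 0 since x ∈ C and s ∈ C*, by the definition of the dual cone. Thus x • s is exactly the duality gap.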
Linear Programming (LP): c, a_i, x ∈ R^n and C = R^n_+.
Semidefinite Programming (SDP): c, a_i, x ∈ M^n and C = M^n_+.
Nonlinear and Nonconvex Optimization Problems
The question: How does one recognize an optimal solution to a nonlinearly
constrained optimization problem? Let the problem have the form
(P)  minimize    f(x)
     subject to  c(x) (≤, =, ≥) 0.
The functions c_i(x) are the components of a mapping
c(x) = (c_1(x), ..., c_m(x))^T
from R^n to R^m.
We would like to develop a set of test criteria.
Descent directions
Let U ⊂ R^n and let f : U → R be a differentiable function on U. A vector d such that
∇f(x)d < 0
is called a descent direction at x ∈ U. For a differentiable function f,
{d : ∇f(x)d < 0}
is the set of descent directions at x; denote this set by D.
Feasible directions
At a feasible point x, the feasible direction cone is
F := {d ∈ R^n : d ≠ 0, x + λd ∈ S for all λ ∈ (0, γ) for some γ > 0}.
Examples:
S = {x : Ax = b} ⇒ F = {d : Ad = 0}.
S = {x : Ax ≥ b} ⇒ F = {d : A_i d ≥ 0, ∀ i ∈ A(x)}, where the active (or binding) constraint set A(x) := {i : A_i x = b_i}.
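As a small illustration in code (a sketch added here, not from the slides; the data and the helper name is_feasible_direction are mine): for S = {x : Ax ≥ b}, a nonzero direction d lies in F exactly when A_i d ≥ 0 for every active row i.

import numpy as np

def is_feasible_direction(A, b, x, d, tol=1e-9):
    # Active (binding) constraints: A_i x = b_i.
    active = np.abs(A @ x - b) <= tol
    # d is a feasible direction iff A_i d >= 0 for all active i.
    return bool(np.all(A[active] @ d >= -tol))

# S = {x : x1 >= 0, x2 >= 0}; at x = (1, 0) only x2 >= 0 is active.
A, b = np.eye(2), np.zeros(2)
print(is_feasible_direction(A, b, np.array([1.0, 0.0]), np.array([-1.0, 1.0])))  # True
print(is_feasible_direction(A, b, np.array([1.0, 0.0]), np.array([0.0, -1.0])))  # False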
Feasible curve
At a feasible x, consider the feasible direction set
F := {d ∈ R^n : ∇c_i(x)d (<, =, >) 0, ∀ i ∈ A(x)},
where A(x) is the set of active constraints, including all equality constraints.
Then, for 0 ≤ θ ≤ θ̄ with some θ̄ > 0, consider a possibly feasible curve γ(θ) ∈ R^n with
γ(0) = x and γ′(0) = d ∈ F.
Such a curve always exists if the gradients ∇c_i(x) of all active constraints are linearly independent (Constraint Qualification). Then D ∩ F must be empty if x is a local minimizer.
The nonlinearly constrained KKT condition illustration
[Figure 1: KKT illustration — the level set of f(x), the constraint surface h(x) = 0, a direction d, and the feasible (gamma) curve at the point x.]
Other Constraint Qualification?
Yes, for example,
minimize   f(x)
subject to c(x) ≥ 0,
where the functions c_i(x) are all concave and the feasible region has an interior.
[Figure: the gradients ∇f(x), ∇c_1(x), and ∇c_2(x) at a feasible point in the (x_1, x_2)-plane.]
The KKT Theorem
Theorem 1. Let x be a local minimizer for (P). Assume the functions c_i are differentiable at x for all i, and that a constraint qualification is met: the gradients ∇c_i(x) of all active constraints are linearly independent. Then there exist multipliers y_1, ..., y_m such that
∇f(x) − ∑_{i=1}^m y_i ∇c_i(x) = 0,
y_i c_i(x) = 0,   ∀ i = 1, ..., m,
y_i (≤, free, ≥) 0,   ∀ i = 1, ..., m.
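As a quick illustration (an example added here, not on the original slide): minimize f(x) = x_1^2 + x_2^2 subject to c(x) = x_1 + x_2 − 1 ≥ 0. The KKT system reads
2x_1 − y = 0,  2x_2 − y = 0,  y(x_1 + x_2 − 1) = 0,  y ≥ 0,
and taking the constraint active gives the KKT pair x = (1/2, 1/2), y = 1.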
Constraint Qualification and KKT Conditions
The KKT conditions serve as the test criteria, but they may not work.
They always work if some Constraint Qualification is met.
A local minimizer may still satisfy the KKT conditions even if the Constraint Qualification is not met.
Constraint Qualification is a sufficient condition to make the KKT conditions work: under a Constraint Qualification, the KKT conditions are necessary for a solution to be a local minimizer.
Sign rules of the Lagrange system
∇f(x) − ∇c(x)^T y = 0

Max model                          Min model
ith constraint ≤  ⟺  y_i ≥ 0       ith constraint ≥  ⟺  y_i ≥ 0
ith constraint ≥  ⟺  y_i ≤ 0       ith constraint ≤  ⟺  y_i ≤ 0
ith constraint =  ⟺  y_i free      ith constraint =  ⟺  y_i free
The Lagrange function
The numbers y1, . . . , ym (under the sign requirements) are called Lagrange
multipliers; the function
L(x, y) = f(x) − ∑_{i=1}^m y_i c_i(x)
is called the Lagrangian function, or simply Lagrangian.
The KKT Conditions
The first-order necessary conditions of local optimality are called the
Karush-Kuhn-Tucker conditions.
The vector x is called a KKT stationary point, and (x, y) is called a KKT pair.
To say that x is a KKT stationary point means that there exists a vector y such
that (x, y) is a KKT pair, i.e., satisfies the KKT (first-order necessary) conditions
of local optimality.
First-order sufficient conditions for optimality
Theorem 2. If f is a differentiable convex function and the feasible set is convex, then the first-order (KKT) optimality conditions are sufficient for global optimality.
The feasible region is convex if the "≤"-constraint functions are convex, the "≥"-constraint functions are concave, and the "="-constraint functions are affine.
Second-order necessary conditions
Theorem 3. Let x be a local minimizer of (P) and let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x (the gradients ∇c_i(x) of all active constraints are linearly independent), then
z^T ∇²_x L(x, y) z ≥ 0  for all z ∈ T,
where the tangent space
T := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ A(x)}.
Second-order sufficient conditions
Theorem 4. Let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x and
z^T ∇²_x L(x, y) z > 0  for all z ∈ T′, z ≠ 0,
where
T′ := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ {i : y_i > 0}},
then x is a local minimizer of (P).
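Continuing the illustrative example above (added): at x = (1/2, 1/2) with y = 1, the single constraint is active with y_1 > 0, so T′ = {d : d_1 + d_2 = 0}, and ∇²_x L(x, y) = 2I gives z^T (2I) z = 2‖z‖² > 0 for every z ≠ 0; hence the second-order sufficient condition confirms that x is a local (indeed, by convexity, global) minimizer.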
Optimization Algorithms
Optimization algorithms tend to be iterative procedures.
Starting from a given point x^0, they generate a sequence {x^k} of iterates (or trial solutions).
We study algorithms that produce iterates according to well-determined rules: deterministic algorithms.
Search direction and step-size
Typically, a nonlinear programming algorithm generates a sequence of points
through an iterative scheme of the form
x^{k+1} = x^k + α_k p^k,
where p^k is the search direction and α_k is the step size or step length.
The point is that once x^k is known, p^k is some function of x^k, and the scalar α_k may be chosen in accordance with some line-search rules.
The gradient method (steepest descent method)
Let f be a differentiable function and assume we can compute ∇f. We want to solve the unconstrained minimization problem
min_{x∈R^n} f(x).
In the absence of further information, we seek a stationary point of f, that is, a point x* at which ∇f(x*) = 0.
We choose p^k = −∇f(x^k) as the search direction at x^k; in fact, it is the direction of steepest descent. The step size α_k ≥ 0 is chosen "appropriately," namely to satisfy
α_k ∈ arg min_{α ≥ 0} f(x^k − α∇f(x^k)).
Then the new iterate is defined as x^{k+1} = x^k − α_k ∇f(x^k).
Convergence: if the level set is bounded, then every limit point of {x^k} is a KKT (stationary) point.
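A minimal sketch of the method in code (added; it substitutes a backtracking Armijo line search for the exact line search above, and the constants are arbitrary choices):

import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10000):
    # Steepest descent with a backtracking (Armijo) line search.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # approximate stationarity
            return x
        alpha = 1.0
        # Halve alpha until the sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x

# Example: minimize f(x) = x1^2 + 10 x2^2; the minimizer is the origin.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(f, grad, [1.0, 1.0]))     # approximately [0, 0]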
Newton’s method
min_{x∈R^n} f(x).
The iteration is given by
x^{k+1} = x^k − α_k (∇²f(x^k))^{-1} ∇f(x^k).
Convergence: quadratic if the starting point is close enough to a solution and the Hessian matrix is non-singular at every iterate and at the limit point.
Sample 1: Newton’s Method
min_{x∈R} f(x) = ax − log(x),  x > 0.
Then
f′(x) = a − 1/x  and  f″(x) = 1/x².
The iteration is given by
x^{k+1} = x^k − (x^k)²(a − 1/x^k) = 2x^k − a(x^k)².
Hence
1 − a x^{k+1} = 1 − 2a x^k + a²(x^k)² = (1 − a x^k)²,
so the residual 1 − a x^k squares at every step: quadratic convergence to x* = 1/a.
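A tiny numerical check of this (added; a = 2 and the starting point are arbitrary, with 0 < x^0 < 2/a needed so that |1 − a x^0| < 1):

a = 2.0                        # minimizer of a*x - log(x) is x* = 1/a = 0.5
x = 0.3
for k in range(6):
    x = 2 * x - a * x * x      # Newton step x <- x - f'(x)/f''(x)
    print(k, x, 1 - a * x)     # the error 1 - a*x squares at every step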
Sample 2: Convex Quadratic Optimization and SDP
Given a positive semidefinite matrix Q, consider a bounded QP problem
(CQP)  minimize    x^T Q x + 2c^T x
       subject to  Ax = b, x ≥ 0,
and
(COP)  minimize    Q • X + 2c^T x
       subject to  Ax = b, x ≥ 0, X ⪰ xx^T.
Show that the two share the same optimal objective value.
Let x̄ be the minimizer of (CQP). Then X = x̄x̄^T and x = x̄ are feasible for (COP), so the minimum value of (COP) is less than or equal to that of (CQP).
Let (X̄, x̄) be the minimizer pair for (COP). Then x̄ is feasible for (CQP). Moreover, the objective value of (CQP) at x̄ satisfies
x̄^T Q x̄ + 2c^T x̄ = Q • x̄x̄^T + 2c^T x̄ ≤ Q • X̄ + 2c^T x̄,
where the inequality holds since X̄ − x̄x̄^T ⪰ 0 and Q ⪰ 0 imply Q • (X̄ − x̄x̄^T) ≥ 0; that is, the minimum value of (CQP) is less than or equal to that of (COP).
Note that the matrix inequality of (COP),
X ⪰ xx^T,
can be written as a standard semidefinite matrix inequality
[ X    x ]
[ x^T  1 ]  ⪰ 0,
by the Schur complement: since the (2, 2) block equals 1 > 0, the block matrix is positive semidefinite exactly when X − xx^T ⪰ 0.
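A sketch of how (COP) could be set up numerically (added; cvxpy, the default solver, and the random data are my choices, not part of the notes):

import cvxpy as cp
import numpy as np

n, m = 3, 2
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
Q = B.T @ B                      # a positive semidefinite Q
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.random(n)            # b built from a nonnegative point, so the problem is feasible

x = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)
# The Schur-complement form of X >= x x^T.
M = cp.bmat([[X, cp.reshape(x, (n, 1))],
             [cp.reshape(x, (1, n)), np.ones((1, 1))]])
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X) + 2 * c @ x),
                  [A @ x == b, x >= 0, M >> 0])
prob.solve()
print(prob.value, x.value)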
Sample 3: Proof of Lemma 3 on Lecture Note #11
Given an interior feasible solution (x > 0 ∈ R^n, y ∈ R^m, s > 0 ∈ R^n) for the standard-form linear program, with Ax = b and s = c − A^T y, let the direction d = (d_x, d_y, d_s) be generated from the equations
S d_x + X d_s = γμe − Xs,
A d_x = 0,
−A^T d_y − d_s = 0,
where γ = n/(n + ρ) for some positive number ρ and μ = x^T s / n, and let
θ = α √min(Xs) / ‖(XS)^{-1/2}( (x^T s/(n + ρ)) e − Xs )‖,   (1)
where α is a positive constant less than 1. Let
x^+ = x + θ d_x,  y^+ = y + θ d_y,  and  s^+ = s + θ d_s.
Then (x^+, y^+, s^+) remains interior feasible and
ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −α √min(Xs) ‖(XS)^{-1/2}( e − ((n + ρ)/(x^T s)) Xs )‖ + α²/(2(1 − α)),
where
ψ_{n+ρ}(x, s) = (n + ρ) log(s^T x) − ∑_{j=1}^n log(s_j x_j).
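In code, one step of this update might look as follows (a dense-algebra sketch added for illustration; eliminating d_s = −A^T d_y and d_x = S^{-1}(r − X d_s) from the Newton system yields the normal equations (A X S^{-1} A^T) d_y = −A S^{-1} r with r = γμe − Xs):

import numpy as np

def potential_reduction_step(A, x, y, s, rho, alpha=0.5):
    # Solve S dx + X ds = r, A dx = 0, -A^T dy - ds = 0, with r = gamma*mu*e - Xs.
    n = x.size
    r = (x @ s) / (n + rho) * np.ones(n) - x * s   # gamma*mu = x^T s/(n+rho)
    M = (A * (x / s)) @ A.T                        # A X S^{-1} A^T
    dy = np.linalg.solve(M, -(A @ (r / s)))        # normal equations
    ds = -A.T @ dy
    dx = (r - x * ds) / s
    # Step size theta from (1).
    theta = alpha * np.sqrt(np.min(x * s)) / np.linalg.norm(r / np.sqrt(x * s))
    return x + theta * dx, y + theta * dy, s + theta * ds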
Proof. Note that d_x^T d_s = 0 by the facts that A d_x = 0 and d_s = −A^T d_y. It is clear that
ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s)
= (n + ρ) log( 1 + θ (d_s^T x + d_x^T s)/(s^T x) ) − ∑_{j=1}^n ( log(1 + θ d_{s_j}/s_j) + log(1 + θ d_{x_j}/x_j) )
≤ (n + ρ) θ (d_s^T x + d_x^T s)/(s^T x) − θ e^T(S^{-1} d_s + X^{-1} d_x) + (‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖²)/(2(1 − τ)),
where τ = max{‖θ S^{-1} d_s‖_∞, ‖θ X^{-1} d_x‖_∞}, and the inequality follows from the Logarithmic Lemma. In the following, we first bound the difference of the first two terms and then bound the third term. For simplicity, we let β = (n + ρ)/(s^T x). Then we have
γμe = (n/(n + ρ)) (s^T x / n) e = (1/β) e.
The difference of the first two terms is
βθ(d_s^T x + d_x^T s) − θ e^T(S^{-1} d_s + X^{-1} d_x)
= βθ e^T(X d_s + S d_x) − θ e^T(S^{-1} d_s + X^{-1} d_x)
= θ (βe − (XS)^{-1} e)^T (X d_s + S d_x)
= θ (βe − (XS)^{-1} e)^T ((1/β) e − Xs)    (by the Newton equations)
= −θ (e − βXs)^T (XS)^{-1} ((1/β) e − Xs)
= −θβ ((1/β) e − Xs)^T (XS)^{-1} ((1/β) e − Xs)
= −θβ ‖(XS)^{-1/2}((1/β) e − Xs)‖²
= −βα √min(Xs) ‖(XS)^{-1/2}((1/β) e − Xs)‖    (by the definition of θ)
= −α √min(Xs) ‖(XS)^{-1/2}(e − βXs)‖
= −α √min(Xs) ‖(XS)^{-1/2}( e − ((n + ρ)/(x^T s)) Xs )‖.    (2)
Now for the third term we have
S^{-1} d_s = (XS)^{-1/2}(XS)^{-1/2} X d_s  and  X^{-1} d_x = (XS)^{-1/2}(XS)^{-1/2} S d_x.
Therefore,
‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖²
≤ θ² ( ‖(XS)^{-1/2} X d_s‖² + ‖(XS)^{-1/2} S d_x‖² ) / min(Xs)
= θ² ‖(XS)^{-1/2}(X d_s + S d_x)‖² / min(Xs)
= θ² ‖(XS)^{-1/2}((1/β) e − Xs)‖² / min(Xs)
= α²,
where the first equality holds since
d_s^T X (XS)^{-1/2}(XS)^{-1/2} S d_x = d_s^T d_x = 0,
and the last equality follows from the definition of θ. This also shows that τ ≤ α.
Therefore, we must have
( ‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖² ) / (2(1 − τ)) ≤ α²/(2(1 − α)).    (3)
The desired inequality follows from (2) and (3).
To complete the proof of the Lemma, we should show that
(x^+, s^+) > 0,  A x^+ = b,  and  c − A^T y^+ = s^+.
The last two equalities are easily verified from the definition of d_x, d_y, d_s. The positivity follows from τ ≤ α < 1, where recall that
τ = max{‖θ S^{-1} d_s‖_∞, ‖θ X^{-1} d_x‖_∞}
and
x^+ = x + θ d_x = X(e + θ X^{-1} d_x)  and  s^+ = s + θ d_s = S(e + θ S^{-1} d_s).