Final Review
Yinyu Ye
Department of Management Science and Engineering
Stanford University
Stanford, CA 94305, U.S.A.
http://www.stanford.edu/~yyye
Duality Theory for Convex Optimization
(CLP)  minimize    c • x
       subject to  a_i • x = b_i,  i = 1, 2, ..., m,  x ∈ C.

(CLD)  maximize    b^T y
       subject to  ∑_{i=1}^m y_i a_i + s = c,  s ∈ C*,

where y ∈ R^m, s is called the dual slack vector/matrix, and C* is the dual cone of C.
x • s ≥ 0
for any feasible x of (CLP) and s of (CLD).
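To see why (a one-line derivation added for completeness): for any feasible pair,
c • x − b^T y = (∑_{i=1}^m y_i a_i + s) • x − b^T y = ∑_{i=1}^m y_i (a_i • x) + x • s − b^T y = b^T y + x • s − b^T y = x • s,
and x • s ≥ 0 since x ∈ C and s ∈ C*, by the definition of the dual cone. Thus x • s is exactly the duality gap.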
Linear Programming (LP): c, a_i, x ∈ R^n and C = R^n_+.
Semidefinite Programming (SDP): c, a_i, x ∈ M^n and C = M^n_+.
Nonlinear and Nonconvex Optimization Problems
The question: How does one recognize an optimal solution to a nonlinearly
constrained optimization problem? Let the problem have the form
(P)  minimize    f(x)
     subject to  c(x) (≤, =, ≥) 0.
The functions c_i(x) are the components of a mapping
c(x) = (c_1(x), ..., c_m(x))^T
from R^n to R^m.
We would like to develop a set of test criteria.
Descent directions
Let U ⊂ R^n and let f : U → R be a differentiable function on U. A vector d such that
∇f(x)d < 0
is called a descent direction at x ∈ U. For a differentiable function f,
{d : ∇f(x)d < 0}
is the set of descent directions at x; denote this set by D.
Feasible directions
At a feasible point x, the feasible direction cone is
F := {d ∈ R^n : d ≠ 0, x + λd ∈ S for all λ ∈ (0, γ) for some γ > 0}.
Examples:
S = {x : Ax = b} ⇒ F = {d : Ad = 0}.
S = {x : Ax ≥ b} ⇒ F = {d : A_i d ≥ 0, ∀ i ∈ A(x)}, where the active (or binding) constraint set A(x) := {i : A_i x = b_i}.
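As a small illustration in code (a sketch added here, not from the slides; the data and the helper name is_feasible_direction are mine): for S = {x : Ax ≥ b}, a nonzero direction d lies in F exactly when A_i d ≥ 0 for every active row i.

import numpy as np

def is_feasible_direction(A, b, x, d, tol=1e-9):
    # Active (binding) constraints: A_i x = b_i.
    active = np.abs(A @ x - b) <= tol
    # d is a feasible direction iff A_i d >= 0 for all active i.
    return bool(np.all(A[active] @ d >= -tol))

# S = {x : x1 >= 0, x2 >= 0}; at x = (1, 0) only x2 >= 0 is active.
A, b = np.eye(2), np.zeros(2)
print(is_feasible_direction(A, b, np.array([1.0, 0.0]), np.array([-1.0, 1.0])))  # True
print(is_feasible_direction(A, b, np.array([1.0, 0.0]), np.array([0.0, -1.0])))  # False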
Feasible curve
At a feasible x, consider the feasible direction set
F := {d ∈ R^n : ∇c_i(x)d (<, =, >) 0, ∀ i ∈ A(x)},
where A(x) is the set of active constraints, including all equality constraints.
Then, for 0 ≤ θ ≤ θ̄ with some θ̄ > 0, consider a possibly feasible curve γ(θ) ∈ R^n with
γ(0) = x and γ′(0) = d ∈ F.
Such a curve always exists if the gradients ∇c_i(x) of all active constraints are linearly independent (Constraint Qualification). Then D ∩ F must be empty if x is a local minimizer.
The nonlinearly constrained KKT condition illustration
[Figure 1: KKT illustration — the level set of f(x), the constraint surface h(x) = 0, a direction d, and the feasible (gamma) curve at the point x.]
Other Constraint Qualification?
Yes, for example,
minimize   f(x)
subject to c(x) ≥ 0,
where the functions c_i(x) are all concave and the feasible region has an interior.
[Figure: the gradients ∇f(x), ∇c_1(x), and ∇c_2(x) at a feasible point in the (x_1, x_2)-plane.]
The KKT Theorem
Theorem 1. Let x be a local minimizer for (P). Assume the functions c_i are differentiable at x for all i, and that a constraint qualification is met: the gradients ∇c_i(x) of all active constraints are linearly independent. Then there exist multipliers y_1, ..., y_m such that
∇f(x) − ∑_{i=1}^m y_i ∇c_i(x) = 0,
y_i c_i(x) = 0,   ∀ i = 1, ..., m,
y_i (≤, free, ≥) 0,   ∀ i = 1, ..., m.
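As a quick illustration (an example added here, not on the original slide): minimize f(x) = x_1^2 + x_2^2 subject to c(x) = x_1 + x_2 − 1 ≥ 0. The KKT system reads
2x_1 − y = 0,  2x_2 − y = 0,  y(x_1 + x_2 − 1) = 0,  y ≥ 0,
and taking the constraint active gives the KKT pair x = (1/2, 1/2), y = 1.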
Constraint Qualification and KKT Conditions
The KKT conditions serve as the test criteria, but they may not work.
They always work if some Constraint Qualification is met.
A local minimizer may still satisfy the KKT conditions even if the Constraint Qualification is not met.
Constraint Qualification is a sufficient condition to make the KKT conditions work: under a Constraint Qualification, the KKT conditions are necessary for a solution to be a local minimizer.
Sign rules of the Lagrange system
∇f(x) − ∇c(x)^T y = 0

Max model                          Min model
ith constraint ≤  ⟺  y_i ≥ 0       ith constraint ≥  ⟺  y_i ≥ 0
ith constraint ≥  ⟺  y_i ≤ 0       ith constraint ≤  ⟺  y_i ≤ 0
ith constraint =  ⟺  y_i free      ith constraint =  ⟺  y_i free
The Lagrange function
The numbers y1, . . . , ym (under the sign requirements) are called Lagrange
multipliers; the function
L(x, y) = f(x) − ∑_{i=1}^m y_i c_i(x)
is called the Lagrangian function, or simply Lagrangian.
The KKT Conditions
The first-order necessary conditions of local optimality are called the
Karush-Kuhn-Tucker conditions.
The vector x is called a KKT stationary point, and (x, y) is called a KKT pair.
To say that x is a KKT stationary point means that there exists a vector y such
that (x, y) is a KKT pair, i.e., satisfies the KKT (first-order necessary) conditions
of local optimality.
First-order sufficient conditions for optimality
Theorem 2. If f is a differentiable convex function and the feasible set is convex, then the first-order (KKT) optimality conditions are sufficient for global optimality.
The feasible region is convex if the "≤"-constraint functions are convex, the "≥"-constraint functions are concave, and the "="-constraint functions are affine.
Second-order necessary conditions
Theorem 3. Let x be a local minimizer of (P) and let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x (the gradients ∇c_i(x) of all active constraints are linearly independent), then
z^T ∇²_x L(x, y) z ≥ 0  for all z ∈ T,
where the tangent space
T := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ A(x)}.
Second-order sufficient conditions
Theorem 4. Let (x, y) satisfy the KKT conditions of (P). If the constraint qualification holds at x and
z^T ∇²_x L(x, y) z > 0  for all z ∈ T′, z ≠ 0,
where
T′ := {d ∈ R^n : ∇c_i(x)d = 0, ∀ i ∈ {i : y_i > 0}},
then x is a local minimizer of (P).
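Continuing the illustrative example above (added): at x = (1/2, 1/2) with y = 1, the single constraint is active with y_1 > 0, so T′ = {d : d_1 + d_2 = 0}, and ∇²_x L(x, y) = 2I gives z^T (2I) z = 2‖z‖² > 0 for every z ≠ 0; hence the second-order sufficient condition confirms that x is a local (indeed, by convexity, global) minimizer.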
Optimization Algorithms
Optimization algorithms tend to be iterative procedures.
Starting from a given point x^0, they generate a sequence {x^k} of iterates (or trial solutions).
We study algorithms that produce iterates according to well-determined rules: deterministic algorithms.
Search direction and step-size
Typically, a nonlinear programming algorithm generates a sequence of points
through an iterative scheme of the form
x^{k+1} = x^k + α_k p^k,
where p^k is the search direction and α_k is the step size or step length.
The point is that once x^k is known, p^k is some function of x^k, and the scalar α_k may be chosen in accordance with some line-search rules.
The gradient method (steepest descent method)
Let f be a differentiable function and assume we can compute ∇f. We want to solve the unconstrained minimization problem
min_{x∈R^n} f(x).
In the absence of further information, we seek a stationary point of f, that is, a point x* at which ∇f(x*) = 0.
We choose p^k = −∇f(x^k) as the search direction at x^k; in fact, it is the direction of steepest descent. The step size α_k ≥ 0 is chosen "appropriately," namely to satisfy
α_k ∈ arg min_{α ≥ 0} f(x^k − α∇f(x^k)).
Then the new iterate is defined as x^{k+1} = x^k − α_k ∇f(x^k).
Convergence: if the level set is bounded, then every limit point of {x^k} is a KKT (stationary) point.
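A minimal sketch of the method in code (added; it substitutes a backtracking Armijo line search for the exact line search above, and the constants are arbitrary choices):

import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10000):
    # Steepest descent with a backtracking (Armijo) line search.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:              # approximate stationarity
            return x
        alpha = 1.0
        # Halve alpha until the sufficient-decrease condition holds.
        while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x

# Example: minimize f(x) = x1^2 + 10 x2^2; the minimizer is the origin.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(f, grad, [1.0, 1.0]))     # approximately [0, 0]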
Newton’s method
min_{x∈R^n} f(x).
The iteration is given by
x^{k+1} = x^k − α_k (∇²f(x^k))^{-1} ∇f(x^k).
Convergence: quadratic if the starting point is close enough to a solution and the Hessian matrix is non-singular at every iterate and at the limit point.
Sample 1: Newton’s Method
min_{x∈R} f(x) = ax − log(x),  x > 0.
Then
f′(x) = a − 1/x  and  f″(x) = 1/x².
The iteration is given by
x^{k+1} = x^k − (x^k)²(a − 1/x^k) = 2x^k − a(x^k)².
Hence
1 − a x^{k+1} = 1 − 2a x^k + a²(x^k)² = (1 − a x^k)²,
so the residual 1 − a x^k squares at every step: quadratic convergence to x* = 1/a.
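A tiny numerical check of this (added; a = 2 and the starting point are arbitrary, with 0 < x^0 < 2/a needed so that |1 − a x^0| < 1):

a = 2.0                        # minimizer of a*x - log(x) is x* = 1/a = 0.5
x = 0.3
for k in range(6):
    x = 2 * x - a * x * x      # Newton step x <- x - f'(x)/f''(x)
    print(k, x, 1 - a * x)     # the error 1 - a*x squares at every step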
Sample 2: Convex Quadratic Optimization and SDP
Given a positive semidefinite matrix Q, consider a bounded QP problem
(CQP)  minimize    x^T Q x + 2c^T x
       subject to  Ax = b, x ≥ 0,
and
(COP)  minimize    Q • X + 2c^T x
       subject to  Ax = b, x ≥ 0, X ⪰ xx^T.
Show that the two share the same optimal objective value.
Let x̄ be the minimizer of (CQP). Then X = x̄x̄^T and x = x̄ are feasible for (COP), so the minimum value of (COP) is less than or equal to that of (CQP).
Let (X̄, x̄) be the minimizer pair for (COP). Then x̄ is feasible for (CQP). Moreover, the objective value of (CQP) at x̄ satisfies
x̄^T Q x̄ + 2c^T x̄ = Q • x̄x̄^T + 2c^T x̄ ≤ Q • X̄ + 2c^T x̄,
where the inequality holds since X̄ − x̄x̄^T ⪰ 0 and Q ⪰ 0 imply Q • (X̄ − x̄x̄^T) ≥ 0; that is, the minimum value of (CQP) is less than or equal to that of (COP).
Note that the matrix inequality of (COP),
X ⪰ xx^T,
can be written as a standard semidefinite matrix inequality
[ X    x ]
[ x^T  1 ]  ⪰ 0,
by the Schur complement: since the (2, 2) block equals 1 > 0, the block matrix is positive semidefinite exactly when X − xx^T ⪰ 0.
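A sketch of how (COP) could be set up numerically (added; cvxpy, the default solver, and the random data are my choices, not part of the notes):

import cvxpy as cp
import numpy as np

n, m = 3, 2
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
Q = B.T @ B                      # a positive semidefinite Q
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.random(n)            # b built from a nonnegative point, so the problem is feasible

x = cp.Variable(n)
X = cp.Variable((n, n), symmetric=True)
# The Schur-complement form of X >= x x^T.
M = cp.bmat([[X, cp.reshape(x, (n, 1))],
             [cp.reshape(x, (1, n)), np.ones((1, 1))]])
prob = cp.Problem(cp.Minimize(cp.trace(Q @ X) + 2 * c @ x),
                  [A @ x == b, x >= 0, M >> 0])
prob.solve()
print(prob.value, x.value)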
Sample 3: Proof of Lemma 3 on Lecture Note #11
Given an interior feasible solution (x > 0 ∈ R^n, y ∈ R^m, s > 0 ∈ R^n) for the standard-form linear program, with Ax = b and s = c − A^T y, let the direction d = (d_x, d_y, d_s) be generated from the equations
S d_x + X d_s = γμe − Xs,
A d_x = 0,
−A^T d_y − d_s = 0,
where γ = n/(n + ρ) for some positive number ρ and μ = x^T s / n, and let
θ = α √min(Xs) / ‖(XS)^{-1/2}( (x^T s/(n + ρ)) e − Xs )‖,   (1)
where α is a positive constant less than 1. Let
x^+ = x + θ d_x,  y^+ = y + θ d_y,  and  s^+ = s + θ d_s.
Then (x^+, y^+, s^+) remains interior feasible and
ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s) ≤ −α √min(Xs) ‖(XS)^{-1/2}( e − ((n + ρ)/(x^T s)) Xs )‖ + α²/(2(1 − α)),
where
ψ_{n+ρ}(x, s) = (n + ρ) log(s^T x) − ∑_{j=1}^n log(s_j x_j).
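In code, one step of this update might look as follows (a dense-algebra sketch added for illustration; eliminating d_s = −A^T d_y and d_x = S^{-1}(r − X d_s) from the Newton system yields the normal equations (A X S^{-1} A^T) d_y = −A S^{-1} r with r = γμe − Xs):

import numpy as np

def potential_reduction_step(A, x, y, s, rho, alpha=0.5):
    # Solve S dx + X ds = r, A dx = 0, -A^T dy - ds = 0, with r = gamma*mu*e - Xs.
    n = x.size
    r = (x @ s) / (n + rho) * np.ones(n) - x * s   # gamma*mu = x^T s/(n+rho)
    M = (A * (x / s)) @ A.T                        # A X S^{-1} A^T
    dy = np.linalg.solve(M, -(A @ (r / s)))        # normal equations
    ds = -A.T @ dy
    dx = (r - x * ds) / s
    # Step size theta from (1).
    theta = alpha * np.sqrt(np.min(x * s)) / np.linalg.norm(r / np.sqrt(x * s))
    return x + theta * dx, y + theta * dy, s + theta * ds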
Proof. Note that d_x^T d_s = 0 by the facts that A d_x = 0 and d_s = −A^T d_y. It is clear that
ψ_{n+ρ}(x^+, s^+) − ψ_{n+ρ}(x, s)
= (n + ρ) log( 1 + θ (d_s^T x + d_x^T s)/(s^T x) ) − ∑_{j=1}^n ( log(1 + θ d_{s_j}/s_j) + log(1 + θ d_{x_j}/x_j) )
≤ (n + ρ) θ (d_s^T x + d_x^T s)/(s^T x) − θ e^T(S^{-1} d_s + X^{-1} d_x) + (‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖²)/(2(1 − τ)),
where τ = max{‖θ S^{-1} d_s‖_∞, ‖θ X^{-1} d_x‖_∞}, and the inequality follows from the Logarithmic Lemma. In the following, we first bound the difference of the first two terms and then bound the third term. For simplicity, we let β = (n + ρ)/(s^T x). Then we have
γμe = (n/(n + ρ)) (s^T x / n) e = (1/β) e.
The difference of the first two terms is
βθ(d_s^T x + d_x^T s) − θ e^T(S^{-1} d_s + X^{-1} d_x)
= βθ e^T(X d_s + S d_x) − θ e^T(S^{-1} d_s + X^{-1} d_x)
= θ (βe − (XS)^{-1} e)^T (X d_s + S d_x)
= θ (βe − (XS)^{-1} e)^T ((1/β) e − Xs)    (by the Newton equations)
= −θ (e − βXs)^T (XS)^{-1} ((1/β) e − Xs)
= −θβ ((1/β) e − Xs)^T (XS)^{-1} ((1/β) e − Xs)
= −θβ ‖(XS)^{-1/2}((1/β) e − Xs)‖²
= −βα √min(Xs) ‖(XS)^{-1/2}((1/β) e − Xs)‖    (by the definition of θ)
= −α √min(Xs) ‖(XS)^{-1/2}(e − βXs)‖
= −α √min(Xs) ‖(XS)^{-1/2}( e − ((n + ρ)/(x^T s)) Xs )‖.    (2)
Now for the third term we have
S^{-1} d_s = (XS)^{-1/2}(XS)^{-1/2} X d_s  and  X^{-1} d_x = (XS)^{-1/2}(XS)^{-1/2} S d_x.
Therefore,
‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖²
≤ θ² ( ‖(XS)^{-1/2} X d_s‖² + ‖(XS)^{-1/2} S d_x‖² ) / min(Xs)
= θ² ‖(XS)^{-1/2}(X d_s + S d_x)‖² / min(Xs)
= θ² ‖(XS)^{-1/2}((1/β) e − Xs)‖² / min(Xs)
= α²,
where the first equality holds since
d_s^T X (XS)^{-1/2}(XS)^{-1/2} S d_x = d_s^T d_x = 0,
and the last equality follows from the definition of θ. This also shows that τ ≤ α.
Therefore, we must have
( ‖θ S^{-1} d_s‖² + ‖θ X^{-1} d_x‖² ) / (2(1 − τ)) ≤ α²/(2(1 − α)).    (3)
The desired inequality follows from (2) and (3).
To complete the proof of the Lemma, we should show that
(x^+, s^+) > 0,  A x^+ = b,  and  c − A^T y^+ = s^+.
The last two equalities are easily verified from the definition of d_x, d_y, d_s. The positivity follows from τ ≤ α < 1, where recall that
τ = max{‖θ S^{-1} d_s‖_∞, ‖θ X^{-1} d_x‖_∞}
and
x^+ = x + θ d_x = X(e + θ X^{-1} d_x)  and  s^+ = s + θ d_s = S(e + θ S^{-1} d_s).