ISyE 6661: Topics Covered
1. Optimization fundamentals: 1.5 lectures
2. LP Geometry (Chpt. 2): 5 lectures
3. The Simplex Method (Chpt. 3): 4 lectures
4. LP Duality (Chpt. 4): 4 lectures
5. Sensitivity Analysis (Chpt. 5): 3 lectures
6. Large-scale LP (Chpt. 6): 1.5 lectures
7. Computational complexity and the Ellipsoid method (Chpt. 8): 2 lectures
8. Interior Point Algorithms (Chpt. 9): 5 lectures
1. Fundamentals of Optimization
• The generic optimization problem:
(P ) : min{f(x) : x ∈ X}.
• Weierstrass’ Theorem: If f is continuous and X is compact, then problem (P ) has an optimal solution.
• If f is a convex function and X is a convex set,
then (P ) is a convex program.
• Theorem: If x∗ is a local optimal solution of the
convex program (P ) then it is also a global optimal
solution.
2. Linear Programming Geometry
• LP in standard form
(P ) : min{cTx : Ax = b, x ≥ 0}.
• LP involves minimizing a linear function over the polyhedral set X = {x : Ax = b, x ≥ 0}.
• Basic building blocks of a polyhedral set: Extreme points and Extreme rays.
• Theorem: (Algebraic characterization of Extreme pts.) A vector x is an extreme point of X iff it is a Basic Feasible Solution, i.e., ∃ a partitioning of A = [B|N] (with B square and nonsingular) such that xB = B−1b and xN = 0.
• Theorem: (Algebraic characterization of Extreme rays.) A vector d ≠ 0 is an extreme ray of X iff it is a Non-negative Basic Direction, i.e., ∃ a partitioning of A = [B|N] (with B square and nonsingular) s.t.

d = α [−B−1Aj ; ej] ≥ 0

for some Aj ∈ N and α > 0.
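The two algebraic characterizations above are easy to play with numerically; a minimal numpy sketch (hypothetical toy data) that builds the basic solution for a chosen basis partition and tests whether it is a bfs:

```python
import numpy as np

# Toy standard-form data (hypothetical): Ax = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])

def basic_solution(A, b, basis):
    """Basic solution for the given basis column indices: xB = B^{-1} b, xN = 0."""
    B = A[:, basis]                      # must be square and nonsingular
    x = np.zeros(A.shape[1])
    x[basis] = np.linalg.solve(B, b)
    return x

x = basic_solution(A, b, basis=[2, 3])   # the slack columns form a basis here
is_extreme_point = bool(np.all(x >= 0))  # feasible basic solution <=> extreme point
```

Trying a basis whose basic solution has a negative component would give a basic, but infeasible, solution — not an extreme point.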
2. Linear Programming Geometry (contd.)
• The Representation Theorem: Let x1, . . . , xk and d1, . . . , dl be the extreme points and extreme rays of X respectively. Then

X = {x : x = ∑_{i=1}^k λixi + ∑_{j=1}^l µjdj, ∑_{i=1}^k λi = 1, λi ≥ 0 ∀i, µj ≥ 0 ∀j}.
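The theorem can be exercised numerically for a bounded X (no extreme rays): given the extreme points, the multipliers λ for any point of the polytope solve a small feasibility LP. A sketch with a hypothetical unit square:

```python
import numpy as np
from scipy.optimize import linprog

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # extreme points
p = np.array([0.3, 0.7])                                        # point to represent
k = len(V)

A_eq = np.vstack([V.T, np.ones(k)])     # rows: sum λ_i v_i = p and sum λ_i = 1
b_eq = np.concatenate([p, [1.0]])
res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * k, method="highs")
lam = res.x                             # convex multipliers reproducing p
```

Any point outside the square would make this LP infeasible, in line with the theorem.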
• To prove the above result, we used:
The Separation Theorem: Let S be a non-empty closed convex set, and x∗ ∉ S. Then ∃ a vector c s.t. cTx∗ < cTx ∀ x ∈ S.
• Theorem: (Cor. of Rep. Thm.)
(a) An LP min{cTx : x ∈ X} has an optimal solution iff cTdj ≥ 0 for all extreme rays dj, j = 1, . . . , l.
(b) Extreme point optimality: If an LP has an optimal solution then there exists an extreme point that is optimal.
3. The Simplex Method
• Basic idea: Move from one extreme point (bfs) to another while improving the objective.
• Given a bfs xk with basis B, move along one of the j-th Basic Directions (j ∈ N)

dj = [−B−1Aj ; ej].
• If xk is non-degenerate then dj is a feasible direction, i.e., allows a positive step move. If cTdj < 0 then dj is an improving direction. Note cTdj = cj − cTBB−1Aj = c̄j (the reduced cost).
• If no improving direction exists, i.e. c̄j ≥ 0 for all j ∈ N, the current solution is optimal; Stop.
• Choose an improving basic direction dj for some j ∈ N, and move to xk+1 ← xk + αdj where α ≥ 0 is such that xk+1 ≥ 0.
• If dj ≥ 0 then α = +∞, implying that the problem is unbounded; Stop.
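The iteration above can be sketched in a few lines of numpy; a toy implementation on hypothetical data (dense inverse, Dantzig pricing, no anti-cycling rule — not production code):

```python
import numpy as np

def simplex(c, A, b, basis):
    """Textbook primal simplex for min c'x s.t. Ax = b, x >= 0,
    starting from a feasible basis (a sketch, not production code)."""
    m, n = A.shape
    basis = list(basis)
    while True:
        B_inv = np.linalg.inv(A[:, basis])
        x = np.zeros(n)
        x[basis] = B_inv @ b                  # current bfs: xB = B^{-1} b
        y = c[basis] @ B_inv                  # simplex multipliers cB' B^{-1}
        rbar = c - y @ A                      # reduced costs c_j - cB' B^{-1} A_j
        if np.all(rbar >= -1e-9):
            return x, float(c @ x)            # no improving direction: optimal
        j = int(np.argmin(rbar))              # entering index (Dantzig rule)
        u = B_inv @ A[:, j]                   # = -dB of the j-th basic direction
        if np.all(u <= 1e-9):
            return None, -np.inf              # dj >= 0: unbounded
        ratios = np.full(m, np.inf)
        pos = u > 1e-9
        ratios[pos] = x[basis][pos] / u[pos]  # min-ratio test picks leaving var
        basis[int(np.argmin(ratios))] = j

# Hypothetical toy instance: min -x1 - x2 s.t. x1 + x3 = 2, x2 + x4 = 3.
c = np.array([-1.0, -1.0, 0.0, 0.0])
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])
x_opt, z_opt = simplex(c, A, b, basis=[2, 3])   # slack basis is feasible here
```

Starting from the slack basis, two pivots reach the optimum x = (2, 3, 0, 0).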
3. The Simplex Method (contd.)
• Theorem: xk+1 is an adjacent bfs to xk with basis
B̂ = B +{Aj}−{Al} where l is some basic variable
that becomes nonbasic.
• Degeneracy, i.e., when a basic variable has a value
of zero, is a problem.
• If xk is degenerate, α could be zero, i.e., the basis changes from B to B̂ but xk+1 = xk, which can cause Stalling or Cycling.
• Can be dealt with by properly choosing j and l
(e.g. Lexicographic rule).
• Theorem: The Simplex method (with proper pivot
rules) solves LP in a finite number of iterations.
3. The Simplex Method (contd.)
• Revised Simplex and Tableau implementations.
• Initializing the Simplex method
– Two-phase Simplex
– Big-M method
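The Phase-I construction can be written out explicitly: minimize the sum of artificial variables a in Ax + Ia = b. A sketch with hypothetical data; handing the phase-one LP to scipy's solver is of course circular (it is itself an LP solver), but it shows the construction:

```python
import numpy as np
from scipy.optimize import linprog

# Phase one: min 1'a  s.t.  Ax + I a = b, x >= 0, a >= 0.
# Optimal value 0  <=>  the original system Ax = b, x >= 0 is feasible.
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([2.0, 0.0])
m, n = A.shape

A1 = np.hstack([A, np.eye(m)])                     # append artificial columns
c1 = np.concatenate([np.zeros(n), np.ones(m)])     # cost 0 on x, 1 on artificials
res = linprog(c1, A_eq=A1, b_eq=b,
              bounds=[(0, None)] * (n + m), method="highs")
feasible = res.fun < 1e-9                          # here x = (1, 1) works
```

The artificial columns give an immediate starting basis; if the phase-one value is 0, the x-part of the solution starts Phase II.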
4. Duality
• Standard form Primal-dual LP pairs:
vP = min cTx            vD = max bTy
s.t. Ax = b             s.t. ATy ≤ c
     x ≥ 0
• Recipe for writing dual problem for general LPs.
• Weak Duality Theorem: vD ≤ vP .
• Proof of WD: By construction of the dual prob-
lem.
• Strong Duality Theorem: If either problem has a
finite optimal value then vD = vP .
• Proof 1 of SD: From the Simplex Method (cTBB−1 are the optimal dual variables).
• Proof 2 of SD: From the theorems of alternatives (Farkas’ Lemma).
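Strong duality is easy to observe numerically. A sketch with hypothetical data, solving both problems with scipy (the dual max bTy, ATy ≤ c is passed as min −bTy with free y):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 4.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])

# Primal: min c'x  s.t.  Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
# Dual:   max b'y  s.t.  A'y <= c, y free
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)], method="highs")

v_p = primal.fun
v_d = -dual.fun      # strong duality: v_p == v_d (= 8 for this instance)
```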
4. Duality (Contd.)
• Farkas’ Lemma: Let A ∈ Rm×n and b ∈ Rm. Then exactly one of the following two systems (a or b) is feasible:

(a) Ax = b, x ≥ 0        (b) ATy ≥ 0, bTy < 0.
• Proof: Use Separating Hyperplane theorem.
• See different forms of Farkas’ Lemma.
• From Duality to Polyhedral theory:
– An immediate proof of Farkas’ Lemma.
– A simple proof of the Representation Thm.
– Converse to Rep. Thm.: Convex hull of a fi-
nite number of points is a polytope.
4. Duality (Contd.)
• LP Optimality Conditions (Cor. of SD): A pair (x∗, y∗) is primal-dual optimal iff

Ax∗ = b, x∗ ≥ 0               (Primal Feasibility)
ATy∗ ≤ c                      (Dual Feasibility)
x∗j(cj − ATj y∗) = 0 ∀ j      (Complementary Slackness).
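These conditions can be verified on a small instance. A sketch with hypothetical data; it assumes scipy's HiGHS interface reports the equality-constraint duals in `res.eqlin.marginals` (as the sensitivity dz/db, which matches the primal-dual convention here):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
x_star = res.x                      # primal optimal
y_star = res.eqlin.marginals        # dual optimal on Ax = b (convention assumed)
reduced = c - A.T @ y_star          # c_j - A_j' y*  (dual slacks, >= 0)
cs = x_star * reduced               # complementary slackness: all products ~ 0
```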
• Relation between non-degeneracy and uniqueness
amongst primal and dual optimal solutions.
• The Dual Simplex Algorithm:
– A basis B is primal feasible (PF) if B−1b ≥ 0
and dual feasible (DF) if cT − cTBB−1A ≥ 0.
– Start with a basis that is DF but not PF.
– Select a variable (< 0) to leave the basis (move
towards PF).
– Select an entering variable to maintain DF.
4. Duality (Contd.)
• Dual Simplex is not analogous to applying Primal
Simplex to the Dual problem.
• When to use Dual Simplex over Primal Simplex?
• Generalized Duality:
The dual of vP = min{cTx : Ax ≥ b, x ∈ X} is
vD = max{L(y) : y ≥ 0} where

L(y) := min{cTx + yT(b − Ax) : x ∈ X}.
5. Sensitivity Analysis
Consider the LP
z = min{cTx : Ax = b, x ≥ 0}.
An instance of the LP is given by the data (n, m, c, A, b).
• If the optimal solution x∗ is non-degenerate then the i-th dual variable represents

y∗i = ∂z/∂bi, i = 1, . . . , m.
• Local Sensitivity Analysis:
(a) How do the optimal solution x∗ and the optimal value z behave under small perturbations of the problem data (n, m, c, A, b)?
(b) How to efficiently recover the new optimal solu-
tion and optimal value after the perturbation?
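The interpretation y∗i = ∂z/∂bi can be checked with a finite difference. A sketch on a hypothetical non-degenerate instance (again assuming scipy's `eqlin.marginals` carries the equality duals as dz/db):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0])
A = np.array([[1.0, 1.0]])

def solve(b1):
    """Return the optimal value z(b1) and the dual on the single constraint."""
    res = linprog(c, A_eq=A, b_eq=[b1], bounds=[(0, None)] * 2, method="highs")
    return res.fun, res.eqlin.marginals[0]

z0, y0 = solve(4.0)
z1, _ = solve(4.0 + 1e-6)
fd = (z1 - z0) / 1e-6    # finite-difference estimate of dz/db; should match y0
```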
5. Sensitivity Analysis (contd.)
• Adding a new variable: Current basis remains PF.
So check DF (reduced cost of the new variable)
and use Primal Simplex to optimize if needed.
• Adding a new constraint: Current basis remains DF.
Check PF, and use Dual Simplex to optimize if
needed.
• Perturbing b ← b + δd: Current basis remains DF
and PF over a computable range of δ. Outside
this range, we have DF but not PF, so use Dual
Simplex to optimize.
• Perturbing c ← c + δd: Current basis remains DF
and PF over a computable range of δ. Outside
this range, we have PF but not DF, so use Primal
Simplex to optimize.
5. Sensitivity Analysis (contd.)
• Perturbing Aj ← Aj + δd where j ∈ N: Current basis remains DF and PF over a computable range of δ. Outside this range, we have PF but not DF, so use Primal Simplex to optimize.
• Perturbing Aj ← Aj + δd where j ∈ B: Current basis remains DF and PF over a computable range of δ. Outside this range, both PF and DF may be affected.
• Global behavior of value functions:
(a) F (b) = min{cTx : Ax = b, x ≥ 0} is a con-
vex function of b, and the dual solution y is a
subgradient of F (b) at b.
(b) G(c) = min{cTx : Ax = b, x ≥ 0} is a concave
function of c, and −x is a subgradient of −G(c)
at c.
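Claim (a) can be spot-checked by sampling: for any b0, b1, convexity requires F((b0 + b1)/2) ≤ (F(b0) + F(b1))/2. A sketch with hypothetical data:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])

def F(b1):
    """Value function F(b) = min{c'x : Ax = b, x >= 0} for a scalar rhs."""
    return linprog(c, A_eq=A, b_eq=[b1], bounds=[(0, None)] * 2,
                   method="highs").fun

b0, b1 = 1.0, 5.0
mid = F(0.5 * (b0 + b1))
avg = 0.5 * (F(b0) + F(b1))   # convexity: mid <= avg
```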
6. Large-Scale LP
• Column Generation:
– The Cutting Stock Problem
– Dantzig-Wolfe decomposition.
• Row Generation:
– Benders decomposition.
7. Computational Complexity of LP
• A problem (class) is “easy” if there exists an al-
gorithm whose computational “effort” required to
solve any instance of the problem is bounded by
some polynomial of the “size” of that instance
(i.e. if there exists a polynomial time algorithm
for the problem).
• Is LP “easy”?
• The Simplex method may require an exponential
number (in the number of variables) of iterations!
– Klee-Minty (1972).
• Yudin and Nemirovskii (1977) developed the Ellipsoid method and showed that general convex programs are “easy,” and Khachiyan (1979) used it to show that LP is indeed “easy.”
7. The Ellipsoid Method for LP
• The Ellipsoid method answers the following question:
Is X = {x ∈ Rn|Ax ≥ b} = ∅?
• Assume: if X ≠ ∅ then 0 < v ≤ vol(X) ≤ V.
• We have a Separation Oracle S(x, X) which returns 0 if x ∈ X; otherwise it returns a vector a ≠ 0 such that aTy > aTx for all y ∈ X.
0. Find an ellipsoid E0(x0) ⊇ X. Set k = 0.
1. If S(xk, X) = 0, Stop: X ≠ ∅. If vol(Ek(xk)) ≤ v, Stop: X = ∅.
2. If S(xk, X) = ak, then X ⊂ Hk := {x : aTk x ≥ aTk xk}. Find

Ek+1(xk+1) ⊇ Ek(xk) ∩ Hk ⊃ X

such that

vol(Ek+1(xk+1)) / vol(Ek(xk)) < e^{−1/(2(n+1))}.

3. Set k ← k + 1 and go to step 1.
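These steps can be coded directly from the standard update formulas for the new ellipsoid's center and shape matrix; a sketch for the feasibility question with hypothetical data, starting from a ball assumed to contain X:

```python
import numpy as np

def ellipsoid_feasible(A, b, R=10.0, max_iter=500):
    """Ellipsoid method sketch for: is X = {x : Ax >= b} nonempty?
    E_k = {z : (z - x)' inv(D) (z - x) <= 1}; start from the ball of
    radius R (assumed to contain X). Returns a feasible point or None."""
    m, n = A.shape
    x = np.zeros(n)
    D = (R ** 2) * np.eye(n)
    for _ in range(max_iter):
        viol = A @ x - b
        if np.all(viol >= 0):
            return x                          # oracle: x is in X
        a = A[int(np.argmin(viol))]           # most violated row: a'y >= a'x on X
        Da = D @ a
        step = Da / np.sqrt(a @ Da)
        x = x + step / (n + 1)                # shift center into kept halfspace
        D = (n * n / (n * n - 1.0)) * (D - (2.0 / (n + 1)) * np.outer(step, step))
    return None

# Hypothetical system: x1 >= 1, x2 >= 1, x1 + x2 <= 3 (a feasible triangle).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, -3.0])
x_feas = ellipsoid_feasible(A, b)
```

The volume argument guarantees a feasible center is found after a few dozen iterations on this instance; a full implementation would stop on the volume bound instead of an iteration cap.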
7. The Ellipsoid Method for LP (contd.)
• The numbers v and V depend on n and U (the
largest number in the data (A, b)).
• Theorem: The Ellipsoid method answers the question “Is X = {x ∈ Rn|Ax ≥ b} = ∅?” in O(n^6 log(nU)) iterations.
7. The Ellipsoid Method for LP (contd.)
• Easily modified for optimization of a linear func-
tion over polyhedra. Polynomial complexity is pre-
served.
• Note the complexity does not depend on the num-
ber of constraints in X.
• Equivalence of Separation and Optimization: The description of X may involve an exponential number of constraints. However, as long as we have a polynomial-time Separation Oracle, the Ellipsoid algorithm guarantees that optimization of a linear function over X is still polynomial time!
8. Interior Point Methods
min{cTx : x ∈ X}
• Basic idea:
– Given xk ∈ int(X),
– find a direction dk and a step size αk s.t. xk +
αkdk =: xk+1 ∈ int(X) and cTxk+1 < cTxk.
– Continue until some termination criterion is met.
• The algorithms differ w.r.t. the choice of dk, αk and the termination criterion.
• May need some preprocessing to guarantee that
an optimal solution exists.
• The algorithms are convergent:

lim_{k→∞} xk = x∗.

A good criterion for finite termination is needed.
8. Interior Point Methods:
The Affine Scaling Method
• Basic idea:
– Given xk ∈ int(X), construct an Ellipsoid Ek(xk) ⊂
int(X).
– Choose xk+1 = argmin{cTx : x ∈ Ek}.
• Based on the fact that the minimizer of a linear
form over an Ellipsoid can be found analytically.
• Not proven to be polynomial time.
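The closed form behind the affine scaling step: over the ellipsoid {x : (x − x0)ᵀD⁻¹(x − x0) ≤ 1} with D symmetric positive definite, the minimizer of cTx is x0 − Dc/√(cTDc). A quick numerical sanity check with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x0 = rng.standard_normal(n)
c = rng.standard_normal(n)
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)        # symmetric positive definite shape matrix

# Analytic minimizer of c'x over {x : (x - x0)' inv(D) (x - x0) <= 1}
x_min = x0 - D @ c / np.sqrt(c @ D @ c)

# It lies on the boundary: (x_min - x0)' inv(D) (x_min - x0) = 1
on_boundary = (x_min - x0) @ np.linalg.solve(D, x_min - x0)
```

Writing D = LLᵀ maps the ellipsoid to the unit ball, where the minimizer of a linear form is obvious; pulling it back gives the formula above.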
8. Interior Point Methods:
The Primal path following (Barrier) method
• We want to solve
P : min{cTx : Ax = b, x ≥ 0}.
• Use a penalty function to prevent iterates from
approaching the boundary of the polyhedron.
• Reduce penalty as the iterates approach an opti-
mal solution (on the boundary).
• Given µ > 0, the barrier problem is
P (µ) : min{fµ(x) := cTx − µ ∑_{j=1}^n log(xj) : Ax = b}.
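A tiny illustration of the barrier problem (hypothetical data): min x1 + 2x2 s.t. x1 + x2 = 2, x ≥ 0 has optimum x∗ = (2, 0); eliminating x2 = 2 − x1 turns each P(µ) into a one-dimensional problem, and the minimizers x1(µ) march toward 2 as µ shrinks:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def x1_of_mu(mu):
    """Minimize f_mu along the feasible segment {(t, 2 - t) : 0 < t < 2}."""
    f = lambda t: t + 2 * (2 - t) - mu * (np.log(t) + np.log(2 - t))
    return minimize_scalar(f, bounds=(1e-9, 2 - 1e-9), method="bounded").x

path = [x1_of_mu(mu) for mu in (1.0, 0.1, 0.01, 0.001)]
# path is increasing toward the LP optimum x1* = 2
```

Setting the derivative of f_mu to zero shows x1(1) = √2 exactly, and x1(µ) ≈ 2 − µ for small µ.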
8. Interior Point Methods:
The Barrier method
• For any µ > 0 the function fµ(x) is strictly convex
⇒ the problem P (µ) has a unique optimal solution
x(µ).
• For any µ > 0, x(µ) ∈ int(X), where X = {x : Ax =
b, x ≥ 0}.
• As µ → +∞, x(µ) tends to the analytic center of X.
• As µ→ 0, x(µ)→ x∗.
• The set of solutions {x(µ) : µ ∈ (0,∞)} is known
as the Central Path.
• How to find x(µ) (at least approximately)?
Aside: NLP Optimality Conditions
NLP : min{f(x) : Ax = b, x ≥ 0}
LP (x∗) : min{∇f(x∗)Tx : Ax = b, x ≥ 0}
• Theorem: If x∗ is an optimal solution of NLP then x∗ is an optimal solution of LP (x∗).
• Theorem: If f is convex, then x∗ is an optimal solution of NLP iff x∗ is an optimal solution of LP (x∗).
• Theorem: If x∗ is an optimal solution of NLP then x∗ solves the KKT system

Ax = b, x ≥ 0
ATy + s = ∇f(x∗), s ≥ 0
xjsj = 0, j = 1, . . . , n.

• Theorem: If f is convex, then x∗ is an optimal solution of NLP iff x∗ solves the same KKT system.
8. Interior Point Methods:
The Barrier method (contd.)
• x(µ) is a solution of the KKT system for the Barrier problem:

Ax = b, x > 0
ATy + s = c, s > 0
xjsj = µ, j = 1, . . . , n.
• The system is nonlinear – difficult to solve.
• We are content with β-approximate solutions (0 < β < 1):

Ax = b, x > 0
ATy + s = c, s > 0
∑_{j=1}^n (xjsj/µ − 1)² ≤ β².
• For fixed β, limµ→0 xβ(µ) = limµ→0 x(µ) = x∗.
8. Interior Point Methods:
The Barrier method (contd.)
• Let β = 1/2. Start with some µk > 0 and a β-
approximation xk of x(µk).
• Linearize the KKT system around xk and solve it
to get the new solution xk+1.
• It can be shown that xk+1 is a β-approximation of x(µk+1) with µk+1 = (1 − 1/(2 + 4√n)) µk.
• Continue until the duality gap (xk)T sk ≤ ε.
8. Interior Point Methods:
The Barrier Method (contd.)
• Theorem: The barrier algorithm reduces the duality gap from ε0 to ε in O(√n log(ε0/ε)) iterations.
Not covered: Network Flow Problems
• A very important class of problems.
• Constraint matrix has a very special structure,
called a Network matrix.
• Specialized Simplex type algorithm is strongly poly-
nomial time.
• E.g. Transportation and Assignment Problems.
What’s next? Optimization Courses in SP’04
• ISyE 6662: Optimization II. Ph.D. level class on Integer Programming and Network Flows. Offered by Prof. Ergun.
• ISyE 8871: Integer Programming. Advanced Ph.D. level class on Integer Programming. Offered by Prof. Nemhauser.
• ISyE 6663: Optimization III. Nonlinear programming theory for Ph.D. students. Offered by Prof. Nemirovskii.
• ISyE 8813: Advanced Ph.D. class on Interior Point Methods. Offered by Prof. Nemirovskii.
• ISyE 6669: Deterministic optimization (MS level).
• ISyE 6673: Financial optimization models (MS level). Offered by Prof. Sokol.
• ISyE 6679: Computational Methods in Optimization. Offered by Prof. Barnes.