ISyE 6661: Topics Covered
1. Optimization fundamentals: 1.5 lectures
2. LP Geometry (Chpt. 2): 5 lectures
3. The Simplex Method (Chpt. 3): 4 lectures
4. LP Duality (Chpt. 4): 4 lectures
5. Sensitivity Analysis (Chpt. 5): 3 lectures
6. Large-scale LP (Chpt. 6): 1.5 lectures
7. Computational complexity and the Ellipsoid method (Chpt. 8): 2 lectures
8. Interior Point Algorithms (Chpt. 9): 5 lectures
1. Fundamentals of Optimization
• The generic optimization problem:
(P ) : min{f(x) : x ∈ X}.
• Weierstrass’ Theorem: If f is continuous and X is compact, then problem (P ) has an optimal solution.
• If f is a convex function and X is a convex set,
then (P ) is a convex program.
• Theorem: If x∗ is a local optimal solution of the
convex program (P ) then it is also a global optimal
solution.
2. Linear Programming Geometry
• LP in standard form
(P ) : min{cTx : Ax = b, x ≥ 0}.
• LP involves minimizing a linear function over the polyhedral set X = {x : Ax = b, x ≥ 0}.
• Basic building blocks of a polyhedral set: Extreme points and Extreme rays.
• Theorem: (Algebraic characterization of Extreme pts.) A vector x is an extreme point of X iff it is a Basic Feasible Solution, i.e., ∃ a partitioning of A = [B|N] (with B square and nonsingular) such that xB = B−1b and xN = 0.
• Theorem: (Algebraic characterization of Extreme rays.) A vector d ≠ 0 is an extreme ray of X iff it is a Non-negative Basic Direction, i.e., ∃ a partitioning of A = [B|N] (with B square and nonsingular) s.t.

d = α [−B−1Aj ; ej] ≥ 0

for some Aj ∈ N and α > 0.
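The two algebraic characterizations above are easy to play with numerically; a minimal numpy sketch (hypothetical toy data) that builds the basic solution for a chosen basis partition and tests whether it is a bfs:

```python
import numpy as np

# Toy standard-form data (hypothetical): Ax = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])

def basic_solution(A, b, basis):
    """Basic solution for the given basis column indices: xB = B^{-1} b, xN = 0."""
    B = A[:, basis]                      # must be square and nonsingular
    x = np.zeros(A.shape[1])
    x[basis] = np.linalg.solve(B, b)
    return x

x = basic_solution(A, b, basis=[2, 3])   # the slack columns form a basis here
is_extreme_point = bool(np.all(x >= 0))  # feasible basic solution <=> extreme point
```

Trying a basis whose basic solution has a negative component would give a basic, but infeasible, solution — not an extreme point.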
2. Linear Programming Geometry (contd.)
• The Representation Theorem: Let x1, . . . , xk and d1, . . . , dl be the extreme points and extreme rays of X respectively. Then

X = {x : x = ∑_{i=1}^k λixi + ∑_{j=1}^l µjdj, ∑_{i=1}^k λi = 1, λi ≥ 0 ∀i, µj ≥ 0 ∀j}.
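The theorem can be exercised numerically for a bounded X (no extreme rays): given the extreme points, the multipliers λ for any point of the polytope solve a small feasibility LP. A sketch with a hypothetical unit square:

```python
import numpy as np
from scipy.optimize import linprog

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # extreme points
p = np.array([0.3, 0.7])                                        # point to represent
k = len(V)

A_eq = np.vstack([V.T, np.ones(k)])     # rows: sum λ_i v_i = p and sum λ_i = 1
b_eq = np.concatenate([p, [1.0]])
res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * k, method="highs")
lam = res.x                             # convex multipliers reproducing p
```

Any point outside the square would make this LP infeasible, in line with the theorem.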
• To prove the above result, we used:
The Separation Theorem: Let S be a non-empty closed convex set, and x∗ ∉ S. Then ∃ a vector c s.t. cTx∗ < cTx ∀ x ∈ S.
• Theorem: (Cor. of Rep. Thm.)
(a) An LP min{cTx : x ∈ X} has an optimal solution iff cTdj ≥ 0 for all extreme rays dj, j = 1, . . . , l.
(b) Extreme point optimality: If an LP has an optimal solution then there exists an extreme point that is optimal.
3. The Simplex Method
• Basic idea: Move from one extreme point (bfs) to another while improving the objective.
• Given a bfs xk with basis B, move along one of the j-th Basic Directions (j ∈ N)

dj = [−B−1Aj ; ej].
• If xk is non-degenerate then dj is a feasible direction, i.e., allows a positive step move. If cTdj < 0 then dj is an improving direction. Note cTdj = cj − cTBB−1Aj = c̄j (the reduced cost).
• If no improving direction exists, i.e. c̄j ≥ 0 for all j ∈ N, the current solution is optimal; Stop.
• Choose an improving basic direction dj for some j ∈ N, and move to xk+1 ← xk + αdj where α ≥ 0 is such that xk+1 ≥ 0.
• If dj ≥ 0 then α = +∞, implying that the problem is unbounded; Stop.
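The iteration above can be sketched in a few lines of numpy; a toy implementation on hypothetical data (dense inverse, Dantzig pricing, no anti-cycling rule — not production code):

```python
import numpy as np

def simplex(c, A, b, basis):
    """Textbook primal simplex for min c'x s.t. Ax = b, x >= 0,
    starting from a feasible basis (a sketch, not production code)."""
    m, n = A.shape
    basis = list(basis)
    while True:
        B_inv = np.linalg.inv(A[:, basis])
        x = np.zeros(n)
        x[basis] = B_inv @ b                  # current bfs: xB = B^{-1} b
        y = c[basis] @ B_inv                  # simplex multipliers cB' B^{-1}
        rbar = c - y @ A                      # reduced costs c_j - cB' B^{-1} A_j
        if np.all(rbar >= -1e-9):
            return x, float(c @ x)            # no improving direction: optimal
        j = int(np.argmin(rbar))              # entering index (Dantzig rule)
        u = B_inv @ A[:, j]                   # = -dB of the j-th basic direction
        if np.all(u <= 1e-9):
            return None, -np.inf              # dj >= 0: unbounded
        ratios = np.full(m, np.inf)
        pos = u > 1e-9
        ratios[pos] = x[basis][pos] / u[pos]  # min-ratio test picks leaving var
        basis[int(np.argmin(ratios))] = j

# Hypothetical toy instance: min -x1 - x2 s.t. x1 + x3 = 2, x2 + x4 = 3.
c = np.array([-1.0, -1.0, 0.0, 0.0])
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])
x_opt, z_opt = simplex(c, A, b, basis=[2, 3])   # slack basis is feasible here
```

Starting from the slack basis, two pivots reach the optimum x = (2, 3, 0, 0).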
3. The Simplex Method (contd.)
• Theorem: xk+1 is an adjacent bfs to xk with basis
B̂ = B +{Aj}−{Al} where l is some basic variable
that becomes nonbasic.
• Degeneracy, i.e., when a basic variable has a value
of zero, is a problem.
• If xk is degenerate, α could be zero, i.e., the basis changes from B to B̂ but xk+1 = xk, which can cause Stalling or Cycling.
• Can be dealt with by properly choosing j and l
(e.g. Lexicographic rule).
• Theorem: The Simplex method (with proper pivot
rules) solves LP in a finite number of iterations.
3. The Simplex Method (contd.)
• Revised Simplex and Tableau implementations.
• Initializing the Simplex method
– Two-phase Simplex
– Big-M method
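The Phase-I construction can be written out explicitly: minimize the sum of artificial variables a in Ax + Ia = b. A sketch with hypothetical data; handing the phase-one LP to scipy's solver is of course circular (it is itself an LP solver), but it shows the construction:

```python
import numpy as np
from scipy.optimize import linprog

# Phase one: min 1'a  s.t.  Ax + I a = b, x >= 0, a >= 0.
# Optimal value 0  <=>  the original system Ax = b, x >= 0 is feasible.
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([2.0, 0.0])
m, n = A.shape

A1 = np.hstack([A, np.eye(m)])                     # append artificial columns
c1 = np.concatenate([np.zeros(n), np.ones(m)])     # cost 0 on x, 1 on artificials
res = linprog(c1, A_eq=A1, b_eq=b,
              bounds=[(0, None)] * (n + m), method="highs")
feasible = res.fun < 1e-9                          # here x = (1, 1) works
```

The artificial columns give an immediate starting basis; if the phase-one value is 0, the x-part of the solution starts Phase II.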
4. Duality
• Standard form Primal-dual LP pairs:
vP = min cTx            vD = max bTy
s.t. Ax = b             s.t. ATy ≤ c
     x ≥ 0
• Recipe for writing dual problem for general LPs.
• Weak Duality Theorem: vD ≤ vP .
• Proof of WD: By construction of the dual prob-
lem.
• Strong Duality Theorem: If either problem has a
finite optimal value then vD = vP .
• Proof 1 of SD: From the Simplex Method (cTBB−1 are the optimal dual variables).
• Proof 2 of SD: From the theorems of alternatives (Farkas’ Lemma).
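Strong duality is easy to observe numerically. A sketch with hypothetical data, solving both problems with scipy (the dual max bTy, ATy ≤ c is passed as min −bTy with free y):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 4.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([4.0])

# Primal: min c'x  s.t.  Ax = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
# Dual:   max b'y  s.t.  A'y <= c, y free
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)], method="highs")

v_p = primal.fun
v_d = -dual.fun      # strong duality: v_p == v_d (= 8 for this instance)
```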
4. Duality (Contd.)
• Farkas’ Lemma: Let A ∈ Rm×n and b ∈ Rm. Then exactly one of the following two systems (a or b) is feasible:

(a) Ax = b, x ≥ 0        (b) ATy ≥ 0, bTy < 0.
• Proof: Use Separating Hyperplane theorem.
• See different forms of Farkas’ Lemma.
• From Duality to Polyhedral theory:
– An immediate proof of Farkas’ Lemma.
– A simple proof of the Representation Thm.
– Converse to Rep. Thm.: Convex hull of a fi-
nite number of points is a polytope.
4. Duality (Contd.)
• LP Optimality Conditions (Cor. of SD): A pair (x∗, y∗) is primal-dual optimal iff

Ax∗ = b, x∗ ≥ 0               (Primal Feasibility)
ATy∗ ≤ c                      (Dual Feasibility)
x∗j(cj − ATj y∗) = 0 ∀ j      (Complementary Slackness).
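These conditions can be verified on a small instance. A sketch with hypothetical data; it assumes scipy's HiGHS interface reports the equality-constraint duals in `res.eqlin.marginals` (as the sensitivity dz/db, which matches the primal-dual convention here):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])

res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
x_star = res.x                      # primal optimal
y_star = res.eqlin.marginals        # dual optimal on Ax = b (convention assumed)
reduced = c - A.T @ y_star          # c_j - A_j' y*  (dual slacks, >= 0)
cs = x_star * reduced               # complementary slackness: all products ~ 0
```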
• Relation between non-degeneracy and uniqueness
amongst primal and dual optimal solutions.
• The Dual Simplex Algorithm:
– A basis B is primal feasible (PF) if B−1b ≥ 0
and dual feasible (DF) if cT − cTBB−1A ≥ 0.
– Start with a basis that is DF but not PF.
– Select a variable (< 0) to leave the basis (move
towards PF).
– Select an entering variable to maintain DF.
4. Duality (Contd.)
• Dual Simplex is not analogous to applying Primal
Simplex to the Dual problem.
• When to use Dual Simplex over Primal Simplex?
• Generalized Duality:
The dual of vP = min{cTx : Ax ≥ b, x ∈ X} is
vD = max{L(y) : y ≥ 0} where

L(y) := min{cTx + yT(b − Ax) : x ∈ X}.
5. Sensitivity Analysis
Consider the LP
z = min{cTx : Ax = b, x ≥ 0}.
An instance of the LP is given by the data (n, m, c, A, b).
• If the optimal solution x∗ is non-degenerate then the i-th dual variable represents

y∗i = ∂z/∂bi, i = 1, . . . , m.
• Local Sensitivity Analysis:
(a) How do the optimal solution x∗ and the optimal value z behave under small perturbations of the problem data (n, m, c, A, b)?
(b) How to efficiently recover the new optimal solu-
tion and optimal value after the perturbation?
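The interpretation y∗i = ∂z/∂bi can be checked with a finite difference. A sketch on a hypothetical non-degenerate instance (again assuming scipy's `eqlin.marginals` carries the equality duals as dz/db):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 3.0])
A = np.array([[1.0, 1.0]])

def solve(b1):
    """Return the optimal value z(b1) and the dual on the single constraint."""
    res = linprog(c, A_eq=A, b_eq=[b1], bounds=[(0, None)] * 2, method="highs")
    return res.fun, res.eqlin.marginals[0]

z0, y0 = solve(4.0)
z1, _ = solve(4.0 + 1e-6)
fd = (z1 - z0) / 1e-6    # finite-difference estimate of dz/db; should match y0
```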
5. Sensitivity Analysis (contd.)
• Adding a new variable: Current basis remains PF.
So check DF (reduced cost of the new variable)
and use Primal Simplex to optimize if needed.
• Adding a new constraint: Current basis remains DF.
Check PF, and use Dual Simplex to optimize if
needed.
• Perturbing b ← b + δd: Current basis remains DF
and PF over a computable range of δ. Outside
this range, we have DF but not PF, so use Dual
Simplex to optimize.
• Perturbing c ← c + δd: Current basis remains DF
and PF over a computable range of δ. Outside
this range, we have PF but not DF, so use Primal
Simplex to optimize.
5. Sensitivity Analysis (contd.)
• Perturbing Aj ← Aj + δd where j ∈ N: Current basis remains DF and PF over a computable range of δ. Outside this range, we have PF but not DF, so use Primal Simplex to optimize.
• Perturbing Aj ← Aj + δd where j ∈ B: Current basis remains DF and PF over a computable range of δ. Outside this range, both PF and DF may be affected.
• Global behavior of value functions:
(a) F (b) = min{cTx : Ax = b, x ≥ 0} is a con-
vex function of b, and the dual solution y is a
subgradient of F (b) at b.
(b) G(c) = min{cTx : Ax = b, x ≥ 0} is a concave
function of c, and −x is a subgradient of −G(c)
at c.
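Claim (a) can be spot-checked by sampling: for any b0, b1, convexity requires F((b0 + b1)/2) ≤ (F(b0) + F(b1))/2. A sketch with hypothetical data:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])

def F(b1):
    """Value function F(b) = min{c'x : Ax = b, x >= 0} for a scalar rhs."""
    return linprog(c, A_eq=A, b_eq=[b1], bounds=[(0, None)] * 2,
                   method="highs").fun

b0, b1 = 1.0, 5.0
mid = F(0.5 * (b0 + b1))
avg = 0.5 * (F(b0) + F(b1))   # convexity: mid <= avg
```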
6. Large-Scale LP
• Column Generation:
– The Cutting Stock Problem
– Dantzig-Wolfe decomposition.
• Row Generation:
– Benders decomposition.
7. Computational Complexity of LP
• A problem (class) is “easy” if there exists an al-
gorithm whose computational “effort” required to
solve any instance of the problem is bounded by
some polynomial of the “size” of that instance
(i.e. if there exists a polynomial time algorithm
for the problem).
• Is LP “easy”?
• The Simplex method may require an exponential
number (in the number of variables) of iterations!
– Klee-Minty (1972).
• Yudin and Nemirovskii (1977) developed the Ellipsoid method and showed that general convex programs are “easy,” and Khachiyan (1979) used it to show that LP is indeed “easy.”
7. The Ellipsoid Method for LP
• The Ellipsoid method answers the following question:
Is X = {x ∈ Rn|Ax ≥ b} = ∅?
• Assume: if X ≠ ∅ then 0 < v ≤ vol(X) ≤ V.
• We have a Separation Oracle S(x, X) which returns 0 if x ∈ X; otherwise it returns a vector a ≠ 0 such that aTy > aTx for all y ∈ X.
0. Find an ellipsoid E0(x0) ⊇ X. Set k = 0.
1. If S(xk, X) = 0, Stop: X ≠ ∅. If vol(Ek(xk)) ≤ v, Stop: X = ∅.
2. If S(xk, X) = ak, then X ⊂ Hk := {x : aTk x ≥ aTk xk}. Find

Ek+1(xk+1) ⊇ Ek(xk) ∩ Hk ⊃ X

such that

vol(Ek+1(xk+1)) / vol(Ek(xk)) < e^{−1/(2(n+1))}.

3. Set k ← k + 1 and go to step 1.
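These steps can be coded directly from the standard update formulas for the new ellipsoid's center and shape matrix; a sketch for the feasibility question with hypothetical data, starting from a ball assumed to contain X:

```python
import numpy as np

def ellipsoid_feasible(A, b, R=10.0, max_iter=500):
    """Ellipsoid method sketch for: is X = {x : Ax >= b} nonempty?
    E_k = {z : (z - x)' inv(D) (z - x) <= 1}; start from the ball of
    radius R (assumed to contain X). Returns a feasible point or None."""
    m, n = A.shape
    x = np.zeros(n)
    D = (R ** 2) * np.eye(n)
    for _ in range(max_iter):
        viol = A @ x - b
        if np.all(viol >= 0):
            return x                          # oracle: x is in X
        a = A[int(np.argmin(viol))]           # most violated row: a'y >= a'x on X
        Da = D @ a
        step = Da / np.sqrt(a @ Da)
        x = x + step / (n + 1)                # shift center into kept halfspace
        D = (n * n / (n * n - 1.0)) * (D - (2.0 / (n + 1)) * np.outer(step, step))
    return None

# Hypothetical system: x1 >= 1, x2 >= 1, x1 + x2 <= 3 (a feasible triangle).
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, -3.0])
x_feas = ellipsoid_feasible(A, b)
```

The volume argument guarantees a feasible center is found after a few dozen iterations on this instance; a full implementation would stop on the volume bound instead of an iteration cap.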
7. The Ellipsoid Method for LP (contd.)
• The numbers v and V depend on n and U (the
largest number in the data (A, b)).
• Theorem: The Ellipsoid method answers the question “Is X = {x ∈ Rn|Ax ≥ b} = ∅?” in O(n^6 log(nU)) iterations.
7. The Ellipsoid Method for LP (contd.)
• Easily modified for optimization of a linear func-
tion over polyhedra. Polynomial complexity is pre-
served.
• Note the complexity does not depend on the num-
ber of constraints in X.
• Equivalence of Separation and Optimization: The description of X may involve an exponential number of constraints. However, as long as we have a polynomial-time Separation Oracle, the Ellipsoid algorithm guarantees that optimization of a linear function over X is still polynomial time!
8. Interior Point Methods
min{cTx : x ∈ X}
• Basic idea:
– Given xk ∈ int(X),
– find a direction dk and a step size αk s.t. xk +
αkdk =: xk+1 ∈ int(X) and cTxk+1 < cTxk.
– Continue until some termination criterion is met.
• The algorithms differ w.r.t. the choice of dk, αk and the termination criterion.
• May need some preprocessing to guarantee that
an optimal solution exists.
• The algorithms are convergent:

lim_{k→∞} xk = x∗.

A good criterion for finite termination is needed.
8. Interior Point Methods:
The Affine Scaling Method
• Basic idea:
– Given xk ∈ int(X), construct an Ellipsoid Ek(xk) ⊂
int(X).
– Choose xk+1 = argmin{cTx : x ∈ Ek}.
• Based on the fact that the minimizer of a linear
form over an Ellipsoid can be found analytically.
• Not proven to be polynomial time.
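The closed form behind the affine scaling step: over the ellipsoid {x : (x − x0)ᵀD⁻¹(x − x0) ≤ 1} with D symmetric positive definite, the minimizer of cTx is x0 − Dc/√(cTDc). A quick numerical sanity check with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x0 = rng.standard_normal(n)
c = rng.standard_normal(n)
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)        # symmetric positive definite shape matrix

# Analytic minimizer of c'x over {x : (x - x0)' inv(D) (x - x0) <= 1}
x_min = x0 - D @ c / np.sqrt(c @ D @ c)

# It lies on the boundary: (x_min - x0)' inv(D) (x_min - x0) = 1
on_boundary = (x_min - x0) @ np.linalg.solve(D, x_min - x0)
```

Writing D = LLᵀ maps the ellipsoid to the unit ball, where the minimizer of a linear form is obvious; pulling it back gives the formula above.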
8. Interior Point Methods:
The Primal path following (Barrier) method
• We want to solve
P : min{cTx : Ax = b, x ≥ 0}.
• Use a penalty function to prevent iterates from
approaching the boundary of the polyhedron.
• Reduce penalty as the iterates approach an opti-
mal solution (on the boundary).
• Given µ > 0, the barrier problem is
P (µ) : min{fµ(x) := cTx − µ ∑_{j=1}^n log(xj) : Ax = b}.
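A tiny illustration of the barrier problem (hypothetical data): min x1 + 2x2 s.t. x1 + x2 = 2, x ≥ 0 has optimum x∗ = (2, 0); eliminating x2 = 2 − x1 turns each P(µ) into a one-dimensional problem, and the minimizers x1(µ) march toward 2 as µ shrinks:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def x1_of_mu(mu):
    """Minimize f_mu along the feasible segment {(t, 2 - t) : 0 < t < 2}."""
    f = lambda t: t + 2 * (2 - t) - mu * (np.log(t) + np.log(2 - t))
    return minimize_scalar(f, bounds=(1e-9, 2 - 1e-9), method="bounded").x

path = [x1_of_mu(mu) for mu in (1.0, 0.1, 0.01, 0.001)]
# path is increasing toward the LP optimum x1* = 2
```

Setting the derivative of f_mu to zero shows x1(1) = √2 exactly, and x1(µ) ≈ 2 − µ for small µ.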
8. Interior Point Methods:
The Barrier method
• For any µ > 0 the function fµ(x) is strictly convex
⇒ the problem P (µ) has a unique optimal solution
x(µ).
• For any µ > 0, x(µ) ∈ int(X), where X = {x : Ax =
b, x ≥ 0}.
• As µ → +∞, x(µ) tends to the analytic center of X.
• As µ→ 0, x(µ)→ x∗.
• The set of solutions {x(µ) : µ ∈ (0,∞)} is known
as the Central Path.
• How to find x(µ) (at least approximately)?
Aside: NLP Optimality Conditions
NLP : min{f(x) : Ax = b, x ≥ 0}
LP (x∗) : min{∇f(x∗)Tx : Ax = b, x ≥ 0}
• Theorem: If x∗ is an optimal solution of NLP then x∗ is an optimal solution of LP (x∗).
• Theorem: If f is convex, then x∗ is an optimal solution of NLP iff x∗ is an optimal solution of LP (x∗).
• Theorem: If x∗ is an optimal solution of NLP then x∗ solves the KKT system

Ax = b, x ≥ 0
ATy + s = ∇f(x∗), s ≥ 0
xjsj = 0, j = 1, . . . , n.

• Theorem: If f is convex, then x∗ is an optimal solution of NLP iff x∗ solves the same KKT system.
8. Interior Point Methods:
The Barrier method (contd.)
• x(µ) is a solution of the KKT system for the Barrier problem:

Ax = b, x > 0
ATy + s = c, s > 0
xjsj = µ, j = 1, . . . , n.
• The system is nonlinear – difficult to solve.
• We are content with β-approximate solutions (0 < β < 1):

Ax = b, x > 0
ATy + s = c, s > 0
∑_{j=1}^n (xjsj/µ − 1)² ≤ β².
• For fixed β, limµ→0 xβ(µ) = limµ→0 x(µ) = x∗.
8. Interior Point Methods:
The Barrier method (contd.)
• Let β = 1/2. Start with some µk > 0 and a β-
approximation xk of x(µk).
• Linearize the KKT system around xk and solve it
to get the new solution xk+1.
• It can be shown that xk+1 is a β-approximation of x(µk+1) with µk+1 = (1 − 1/(2 + 4√n)) µk.
• Continue until the duality gap (xk)T sk ≤ ε.
8. Interior Point Methods:
The Barrier Method (contd.)
• Theorem: The barrier algorithm reduces the duality gap from ε0 to ε in O(√n log(ε0/ε)) iterations.
Not covered: Network Flow Problems
• A very important class of problems.
• Constraint matrix has a very special structure,
called a Network matrix.
• Specialized Simplex type algorithm is strongly poly-
nomial time.
• E.g. Transportation and Assignment Problems.
What’s next? Optimization Courses in SP’04
• ISyE 6662: Optimization II. Ph.D. level class on Integer Programming and Network Flows. Offered by Prof. Ergun.
• ISyE 8871: Integer Programming. Advanced Ph.D. level class on Integer Programming. Offered by Prof. Nemhauser.
• ISyE 6663: Optimization III. Nonlinear programming theory for Ph.D. students. Offered by Prof. Nemirovskii.
• ISyE 8813: Advanced Ph.D. class on Interior Point Methods. Offered by Prof. Nemirovskii.
• ISyE 6669: Deterministic optimization (MS level).
• ISyE 6673: Financial optimization models (MS level). Offered by Prof. Sokol.
• ISyE 6679: Computational Methods in Optimization. Offered by Prof. Barnes.