Post on 07-Apr-2022
Instructor: Levent Tuncel (ltuncel@uwaterloo.ca)
Office: MC5136 (MW 1600 – 1800)
Grading: 50% Homework, 50% Final
Topics:
• Linear programming
• Polyhedra
• Combinatorial Optimisation
• Convex Geometry & Optimisation
• Complexity Theory
By convention, all vectors are column vectors.
Index
1 The Problem of Optimisation
2 Linear Programming
  2.1 Theorems of The Alternative
  2.2 Affine Functions and Linear Programming
    2.2.1 Fourier-Motzkin Elimination
  2.3 Duality
  2.4 Complementary Slackness Conditions and Theorems
  2.5 Convexity and Polyhedra
    2.5.1 Extreme Points
  2.6 Bases and Simplex Algorithm
    2.6.1 Simplex Method
    2.6.2 Cycling and Stalling
    2.6.3 Two-Phase Method
3 Combinatorial Optimisation
  3.1 Graphs
  3.2 Integer Programming
  3.3 Totally Unimodular Matrices
    3.3.1 Application of Integer Programming to Graphs
  3.4 Faces
  3.5 Maximum Weight Perfect Matching
  3.6 Alternatives in Integer Systems
4 Continuous Optimisation
  4.1 Topology on Rn
  4.2 Semi-definite Programming
    4.2.1 Convex Functions
  4.3 Gradient
    4.3.1 Steepest Descent and Newton's Method
  4.4 Separating Hyperplane
  4.5 Lagrangians and Lagrangian Duality
  4.6 Ellipsoid and Interior Point Methods
Caput 1
The Problem of Optimisation
1.0.1 Problem. Given a feasible region (set) C and an objective function f : C → R, solve
max{f(x) : x ∈ C}
Notice that minimising f is equivalent to maximising −f. This problem might not be well posed. For example,
• (1.0.1) may be infeasible (e.g. C = ∅)
• (1.0.1) may be unbounded, i.e. f can be arbitrarily large on C.
• (1.0.1) may not have a solution, e.g. maximising f(x) := x on ]0, 1[.
Instead of finding a minimum/maximum, we can find inf, sup, which always exist (in R ∪ {±∞}). In some special cases with more structure on C, we can obtain more insight about the nature of the solution.
In this document we shall look at several types of optimisation problems:
• Linear Optimisation
• Combinatorial Optimisation
• Continuous Optimisation
• Convex Optimisation
Unconstrained optimisation problems are very general. For example:
Example: Nonlinear Optimisation Problem
Fermat’s Last Theorem: for n ≥ 3, there do not exist x, y, z ∈ Z>0 with xⁿ + yⁿ = zⁿ. Consider the optimisation problem:

inf { (x1^{x4} + x2^{x4} − x3^{x4})² + ∑_{i=1}^{4} (sin πx_i)² : x1 ≥ 1, x2 ≥ 1, x3 ≥ 1, x4 ≥ 3 }
It is not hard to show that the infimum of this problem is 0, but showing that the infimum is unattained is equivalent to proving Fermat’s Last Theorem. This is an example of a very difficult continuous optimisation problem.
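To make the construction concrete, here is a small numerical check (a sketch; the function f below is just the objective above): at integer points the sine penalty vanishes, so the objective is zero exactly at integer solutions of x1^{x4} + x2^{x4} = x3^{x4}.

```python
import math

def f(x1, x2, x3, x4):
    """Objective of the optimisation problem above."""
    power_term = (x1**x4 + x2**x4 - x3**x4) ** 2
    sine_term = sum(math.sin(math.pi * xi) ** 2 for xi in (x1, x2, x3, x4))
    return power_term + sine_term

# 3^2 + 4^2 = 5^2, so the objective vanishes here; note x4 = 2 lies
# outside the feasible region x4 >= 3, illustrating why the infimum
# over the feasible region is 0 but (if FLT holds) never attained.
assert f(3, 4, 5, 2) < 1e-12
# (1, 1, 1, 3) is feasible and gives (1 + 1 - 1)^2 = 1.
assert abs(f(1, 1, 1, 3) - 1) < 1e-12
```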
In general, unconstrained optimisation problems are undecidable. Matiyasevich (1970) proved that Diophantine equations are powerful enough to encode Turing machines and are undecidable in general, and Diophantine equations can be represented as optimisation problems.
Caput 2
Linear Programming
2.1 Theorems of The Alternative
Definition
Given two vectors a, b, we write a ≥ b when ai ≥ bi for all i.
2.1.1 Linear Programming. Given A ∈ Rm×n and b ∈ Rm, determine if there exists x with Ax ≥ b.
The usefulness of the following theorem (the Fundamental Theorem of Linear Algebra) is that to prove Ax = b has no solutions, it suffices to exhibit a single vector y.
2.1.1 Lemma. For any A ∈ Rm×n,
ker Aᵀ ⊕ img A = Rm
It is necessary to acquire intuition about the following theorem. Given A, b, one of two things can happen:
1. b ∈ img A
2. b ∉ img A, in which case b can be projected onto img A, forming a decomposition

b = b‖ + b⊥
In the second case, the component b⊥ serves as the vector y in the theorem below: a certificate of infeasibility.
2.1.2 Fundamental Theorem of Linear Algebra (Theorem 1). Let A ∈ Rm×n, b ∈ Rm. Then exactly one of the following holds:
1. ∃x ∈ Rn such that Ax = b
(equivalently b ∈ img A)
2. ∃y ∈ Rm such that Aᵀy = 0 ∈ Rn and bᵀy ≠ 0
By scaling y we can further arrange bᵀy = c for any non-zero c. Equivalently, b has a non-zero dot product with some y ∈ ker Aᵀ.
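The dichotomy can be illustrated numerically (a sketch with invented data A, b, using NumPy's least-squares routine): when Ax = b is infeasible, the residual of the least-squares solution is exactly the component b⊥, and it certifies infeasibility.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])   # rows 1-2 force x1 + x2 = 2, row 3 demands 3

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
y = b - A @ x_hat               # the component b_perp, lying in ker A^T

assert np.allclose(A.T @ y, 0)  # A^T y = 0
assert abs(b @ y) > 1e-9        # b^T y != 0: certificate that Ax = b is infeasible
```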
Proof. Let b = b‖ + b⊥, where b‖ ∈ img A and b⊥ ∈ ker Aᵀ. Assume (1) and (2) both hold. Then
0 = (Aᵀy)ᵀx = yᵀAx = yᵀb ≠ 0
which is absurd, so at most one of (1), (2) holds.
Suppose (1) does not hold. Then b ∉ img A. From linear algebra we know that

(img A)⊥ = ker Aᵀ
By Orthogonal Decomposition Theorem, there exist (unique!) b‖ ∈ img A and b⊥ ∈ ker Aᵀ such that
b = b‖ + b⊥
Since b ∉ img A, we have b ≠ b‖ and hence b⊥ ≠ 0. Then

Aᵀb⊥ = 0

but

bᵀb⊥ = b‖ · b⊥ + b⊥ · b⊥ = ‖b⊥‖² ≠ 0

Therefore we can take y := b⊥.
The next theorem provides some insight into the nature of ≥ 0 vectors and supplies tools for duality in linear programming. Similarly to the above theorem, Farkas’ Lemma has an intuitive explanation:
1. b is in the closed cone formed by A’s column vectors.
2. b is not in the closed cone, in which case there is a covector yᵀ that separates b from the columns of A.
2.1.3 Farkas’ Lemma (Theorem 2). Let A ∈ Rm×n, b ∈ Rm. Then exactly one of the following holds:
1. ∃x ∈ Rn such that Ax = b and x ≥ 0
2. ∃y ∈ Rm such that Aᵀy ≥ 0 and bᵀy < 0
Proof. Suppose both systems (1), (2) have solutions. Then, since the inner product of two non-negative vectors must be non-negative,
0 ≤ (Aᵀy)ᵀx = yᵀAx = yᵀb = bᵀy < 0
so (1), (2) cannot both hold.
Suppose (1) is infeasible. Define

C := {Ax : x ≥ 0}
Descriptio 2.1: yᵀAe_i > 0 leads to a contradiction to the minimality of ‖b − Aξ‖, so all Ae_j must point away from b
This is a closed convex cone and is non-empty (0 ∈ C), so by the Hilbert Projection Theorem there exists

ξ := argmin_{x ≥ 0} ‖b − Ax‖₂

which means Aξ ∈ C. Define y := b − Aξ. Suppose y is not orthogonal to Aξ, and let Aρ be the projection of y onto the line spanned by Aξ (so ρ is a scalar multiple of ξ). Then for λ non-zero, of appropriate sign and sufficiently small magnitude, we have ξ + λρ ≥ 0 and

‖b − A(ξ + λρ)‖ = ‖(b − Aξ) − λAρ‖ = ‖y − λ proj_{Aξ} y‖ < ‖y‖
which violates the minimality of ξ. Hence y ⊥ Aξ.
Suppose there exists an index i such that yᵀAe_i > 0. We shall derive a contradiction. Notice that the minimality of ξ implies that λ = 0 minimises (over λ ≥ 0) the function

f(λ) := ‖b − A(ξ + λe_i)‖² = ‖y − λAe_i‖² = ‖y‖² − 2λ yᵀAe_i + λ² ‖Ae_i‖²
This is the crux of the proof. The quadratic f has a negative linear term (since yᵀAe_i > 0), so for sufficiently small positive λ we have f(λ) < f(0); moreover ξ + λe_i ≥ 0 since λ > 0 and ξ ≥ 0. This contradicts the minimality of ‖b − Aξ‖, so no index i with yᵀAe_i > 0 can exist.
Therefore we have found y with

Aᵀy ≤ 0, bᵀy = yᵀ(Aξ + y) = ‖y‖² > 0

(y ≠ 0 because b ∉ C). Replacing y by −y gives Aᵀ(−y) ≥ 0 and bᵀ(−y) < 0, so (2) is feasible.
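The proof is constructive and can be mimicked numerically (a sketch with invented data; projected gradient descent stands in for the exact cone projection): compute ξ = argmin_{x≥0} ‖b − Ax‖², and if the residual is non-zero, its negation is a certificate for system (2).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([-1.0, -1.0])      # lies outside the cone {Ax : x >= 0}

# Projected gradient descent for xi = argmin_{x >= 0} ||b - Ax||^2,
# mirroring the projection step in the proof above.
x = np.zeros(2)
step = 0.05
for _ in range(2000):
    grad = 2 * A.T @ (A @ x - b)
    x = np.maximum(x - step * grad, 0.0)

y = -(b - A @ x)                # negate so the certificate matches system (2)
assert np.all(A.T @ y >= -1e-8) # A^T y >= 0
assert b @ y < 0                # b^T y < 0
```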
Example: Two-person zero-sum game
Suppose we have two players ρ, γ playing a game. ρ, γ each choose a strategy from two options {1, 2} without informing the other party.
Then, ρ, γ reveal the strategy to the other player. Depending on the outcome, player γ pays player ρ:
          γ = 1   γ = 2
ρ = 1      +20     −50
ρ = 2      −40     +80
Being self-interested and rational, γ applies a “mixed strategy”: sometimes γ plays (1) and sometimes (2).
Let xi be the probability of γ playing strategy (i). If ρ plays (1), then γ has to pay
c1 = +20x1 − 50x2
If ρ plays (2), then γ has to pay

c2 = −40x1 + 80x2
From ρ’s viewpoint, we want to maximise min{c1, c2}; from γ’s viewpoint, we want to minimise max{c1, c2}. These are linear optimisation problems. For example, maximising min{c1, c2} can be reformulated as maximising

x0 subject to c1 ≥ x0, c2 ≥ x0, x1 + x2 = 1
Generally, ρ has strategies {1, . . . , m} and γ has {1, . . . , n}. Let the payoff matrix be A ∈ Rm×n. ρ’s problem can be formulated as (with probability vector x ∈ Rn)

max x0 subject to Ax ≥ x0·1, 1ᵀx = 1, x ≥ 0
The last two conditions force x to be a probability distribution. Similarly, γ’s problem is

min y0 subject to Aᵀy ≤ y0·1, 1ᵀy = 1, y ≥ 0
The problems of ρ and γ are duals of each other and have the same optimal value. The power of duality theory is that a claim about one problem (a bound, or unsolvability) can be certified by exhibiting a solution to the other problem.
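We can check minimax = maximin for the 2×2 game above by brute-force grid search over mixed strategies (a sketch; no LP solver needed, and the common value works out to −40/19):

```python
import numpy as np

A = np.array([[20.0, -50.0],
              [-40.0, 80.0]])        # payoff from gamma to rho

t = np.linspace(0.0, 1.0, 10001)
X = np.vstack([t, 1.0 - t])          # candidate mixed strategies, one per column

# gamma mixes columns to minimise rho's best response ...
gamma_val = (A @ X).max(axis=0).min()
# ... while rho mixes rows to maximise gamma's best response.
rho_val = (X.T @ A).min(axis=1).max()

assert abs(gamma_val - rho_val) < 1e-2      # LP duality: the two values coincide
assert abs(gamma_val - (-40.0 / 19.0)) < 1e-2
```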
How do we solve a system of linear inequalities such as Ax ≤ b?
2.2 Affine Functions and Linear Programming
Definition
f : Rn → R is an affine function if there exist o ∈ Rn and β ∈ R with
f(x ) = oᵀx + β
f is a linear function if β = 0.
A linear constraint is a condition of one of the forms
f(x ) ≤ g(x ), f(x ) = g(x ), f(x ) ≥ g(x )
where f, g are both affine.
Definition
A linear programming (LP) problem is the problem of optimising (minimising/maximising) an affinefunction of finitely many real variables subject to finitely many linear constraints.
We will consider LPs in standard inequality form (SIF), which is
max{cᵀx : Ax ≤ b,x ≥ 0}
and the standard equality form (SEF)

max{cᵀx : Ax = b, x ≥ 0}
Both of these forms are general, where
A ∈ Rm×n, b ∈ Rm, c ∈ Rn
Any LP conforming to the definition above can be put into either (i.e. both) of the standard in/equality forms.
2.2.1 Proposition. The standard inequality and equality forms are equivalent.
Proof. If we have a problem in standard equality form:
max{cᵀx : Ax = b,x ≥ 0}
we can rewrite this as

max{cᵀx : Ax ≤ b, −Ax ≤ −b, x ≥ 0} = max{cᵀx : [A; −A]x ≤ [b; −b], x ≥ 0}

so SIF is no less general than SEF.
Conversely, given a problem in SIF:
max{cᵀx : Ax ≤ b,x ≥ 0}
We can define s with

Ax + s = b

and force s ≥ 0, transforming the problem into

max{cᵀx : Ax + s = b, x ≥ 0, s ≥ 0}
and so

max{[c; 0]ᵀ[x; s] : [A | I][x; s] = b, [x; s] ≥ 0}

which is in SEF.
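The SIF → SEF direction of the proof is mechanical enough to write down (a sketch; the function name is ours):

```python
import numpy as np

def sif_to_sef(A, b, c):
    """Convert max{c.x : Ax <= b, x >= 0} into standard equality form
    by appending slack variables s: Ax + s = b, s >= 0."""
    m, n = A.shape
    A_eq = np.hstack([A, np.eye(m)])          # [A | I]
    c_eq = np.concatenate([c, np.zeros(m)])   # [c; 0]
    return A_eq, b, c_eq

A = np.array([[1.0, -2.0], [4.0, 1.0]])
b = np.array([2.0, 10.0])
c = np.array([2.0, 1.0])
A_eq, b_eq, c_eq = sif_to_sef(A, b, c)

x = np.array([1.0, 1.0])     # any feasible x for the SIF problem
s = b - A @ x                # its slack is then non-negative
assert np.allclose(A_eq @ np.concatenate([x, s]), b_eq)
assert c_eq @ np.concatenate([x, s]) == c @ x
```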
Definition
In a Linear Programming problem,
• A feasible solution is an x ∈ Rn that satisfies all constraints.
The set of feasible solutions is the feasible region.
• An optimal solution is a feasible solution with the best objective value.
There can be multiple optimal solutions, but at most one optimal value.
• The problem is infeasible if the feasible region is empty.
The constraints of any LP problem can be put into the form Ax ≤ b. We focus on one component of x at a time. We can look at the nth column of A, which corresponds to xn. Define

J0 := {i : A_{i,n} = 0}
J+ := {i : A_{i,n} > 0}
J− := {i : A_{i,n} < 0}

We can extract the nth column:

Ax ≤ b ⇐⇒
  ∑_{j=1}^{n−1} A_{i,j} x_j ≤ b_i,                           ∀i ∈ J0
  x_n ≤ (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j),       ∀i ∈ J+
  x_n ≥ (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j),       ∀i ∈ J−
These three simultaneous conditions provide insight into the feasibility of Ax ≤ b. The first group is independent of xn and can be handled recursively. The only remaining requirement on the other rows is that
max_{i∈J−} (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j)  ≤  min_{i∈J+} (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j)
Note. By convention, the maximum of an empty set is −∞ and the minimum of an empty set is +∞.
The above system has a solution (in xn) iff the following system has a solution:

∑_{j=1}^{n−1} A_{i,j} x_j ≤ b_i,   ∀i ∈ J0
(1/A_{l,n}) (b_l − ∑_{j=1}^{n−1} A_{l,j} x_j) ≤ (1/A_{k,n}) (b_k − ∑_{j=1}^{n−1} A_{k,j} x_j),   ∀l ∈ J−, k ∈ J+
This is another linear system and can be written as A′x′ ≤ b′, where x′ = [x1, . . . , x_{n−1}]ᵀ.
2.2.2 Lemma. The initial system Ax ≤ b is feasible if and only if A′x′ ≤ b′ is feasible. Moreover, for any feasible solution x′ of A′x′ ≤ b′ there exists xn ∈ R such that x := [x′; xn] is a feasible solution of Ax ≤ b.
Proof. Suppose Ax ≤ b is feasible. Every inequality in A′x′ ≤ b′ is either one of the constraints from Ax ≤ b or a non-negative linear combination (i.e. both sides multiplied by non-negative numbers and added) of inequalities from Ax ≤ b. Specifically, the inequalities
(1/A_{l,n}) (b_l − ∑_{j=1}^{n−1} A_{l,j} x_j) ≤ (1/A_{k,n}) (b_k − ∑_{j=1}^{n−1} A_{k,j} x_j),   ∀l ∈ J−, k ∈ J+
rearrange to (notice A_{k,n} > 0 and −A_{l,n} > 0)

(1/A_{k,n}) ∑_{j=1}^{n−1} A_{k,j} x_j − (1/A_{l,n}) ∑_{j=1}^{n−1} A_{l,j} x_j ≤ (1/A_{k,n}) b_k − (1/A_{l,n}) b_l
Descriptio 2.2: Feasibility of a two-dimensional linear system Ax ≤ b. The feasible region of x1 directly affects the feasible region of x2
Therefore A′x′ ≤ b′ is feasible.
Conversely, suppose x′ is a feasible solution of A′x′ ≤ b′. Then we can choose ξ such that

max_{i∈J−} (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j) ≤ ξ ≤ min_{i∈J+} (1/A_{i,n}) (b_i − ∑_{j=1}^{n−1} A_{i,j} x_j)

Then x := [x′; ξ] is a solution of Ax ≤ b.
2.2.1 Fourier-Motzkin Elimination
If Ax ≤ b has a solution, then at each stage of the elimination we have bounds of the form

L(x1, . . . , x_{i−1}) ≤ x_i ≤ U(x1, . . . , x_{i−1})

so the following back-substitution yields a solution x to Ax ≤ b:
1. Pick L ≤ x1 ≤ U
2. Pick L(x1) ≤ x2 ≤ U(x1)
3. . . .
4. Pick L(x1, . . . , xn−1) ≤ xn ≤ U(x1, . . . , xn−1)
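One elimination step can be implemented directly from the three groups J0, J+, J− (a sketch using exact rational arithmetic; rows are pairs (a, b) meaning aᵀx ≤ b, and the names are ours):

```python
from fractions import Fraction

def fm_eliminate(rows, n):
    """Eliminate the last variable x_n (0-based index n) from the system of
    rows (a, b), each meaning a . x <= b, via one Fourier-Motzkin step."""
    zero, pos, neg = [], [], []
    for a, b in rows:
        (zero if a[n] == 0 else pos if a[n] > 0 else neg).append((a, b))
    out = [(a[:n], b) for a, b in zero]          # rows independent of x_n
    for al, bl in neg:                           # pair each lower bound ...
        for ak, bk in pos:                       # ... with each upper bound
            a = [ak[j] / ak[n] - al[j] / al[n] for j in range(n)]
            out.append((a, bk / ak[n] - bl / al[n]))
    return out

F = Fraction
# x0 + x1 <= 2, -x1 <= 0, -x0 <= 0; eliminating x1 should leave 0 <= x0 <= 2.
rows = [([F(1), F(1)], F(2)), ([F(0), F(-1)], F(0)), ([F(-1), F(0)], F(0))]
out = fm_eliminate(rows, 1)
assert ([F(-1)], F(0)) in out and ([F(1)], F(2)) in out
```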
Suppose Ax ≤ b has no solution. Fourier-Motzkin Elimination generates a series of linear systems:
Ax ≤ b
A′x ′ ≤ b ′
. . .
Each system consists of non-negative linear combinations of linear inequalities of the preceding system. Let u, v ∈ Rm_{≥0}. Then
Ax ≤ b =⇒ uᵀAx ≤ uᵀb and vᵀAx ≤ vᵀb

so for any α, β ≥ 0, writing wᵀ := αuᵀ + βvᵀ,

wᵀAx ≤ wᵀb
Thus every new inequality generated by Fourier-Motzkin Elimination can be written as

uᵀAx ≤ uᵀb

for some u ∈ Rm_{≥0}. Eventually we arrive at

0ᵀx ≤ γ

for some real γ. Since Ax ≤ b has no solution, the final system must be infeasible, i.e. γ < 0. Hence there exists u ∈ Rm_{≥0} with

uᵀA = 0ᵀ, uᵀb = γ < 0
2.2.3 (Theorem 5). Let A ∈ Rm×n, b ∈ Rm. Then exactly one of the following has a solution:
1. Ax ≤ b
2. Aᵀu = 0 ,u ≥ 0 , bᵀu < 0
Proof. Suppose both (1), (2) are solvable with solutions x, u. Then

0 > bᵀu ≥ (Ax)ᵀu = xᵀ(Aᵀu) = 0

which is absurd. Suppose (1) has no solution. Then Fourier-Motzkin Elimination produces u with uᵀA = 0ᵀ and uᵀb = γ < 0.
Theorem 5 shows that deciding feasibility and deciding infeasibility are computationally equally hard.
2.2.4 (Theorem 7). Let A ∈ Rm×n, b ∈ Rm. Then exactly one of the following has a solution:
1. Ax ≤ b,x ≥ 0
2. Aᵀu ≥ 0 ,u ≥ 0 , bᵀu < 0
Proof. Suppose (1) and (2) are solvable with solutions x, u. Then

0 > bᵀu ≥ (Ax)ᵀu = xᵀ(Aᵀu) ≥ 0

since the inner product of two non-negative vectors is non-negative; this is absurd.
We can put (1) into the form
A′ := [A; −I],  b′ := [b; 0]

Applying Theorem 5 (2.2.3) to A′, b′, the alternative system (2) there becomes

[Aᵀ | −I] y = 0,  y ≥ 0,  [bᵀ | 0ᵀ] y < 0
Hence, if we decompose y =: [u; v], we have

Aᵀu = v ≥ 0,  u ≥ 0,  bᵀu < 0

which is exactly system (2).
Note. (Remark 8): We can also prove Farkas’ Lemma in a similar manner by applying Theorem 7 to

A := [A; −A],  b := [b; −b]
Example: A Linear Programming Problem
Consider x ∈ R2:
max_x z(x) := 2x1 + x2
subject to
x1 − 2x2 ≤ 2
4x1 + x2 ≤ 10
x1 ≥ 0
x2 ≥ 0
The feasible region can be visualised as the quadrilateral bounded by the two constraint lines and the coordinate axes (figure omitted).
Consider
A := [1 −2; 4 1; −1 0; 0 −1],  b := [2, 10, 0, 0]ᵀ
Choose the linear combination u = [0, 1, 2, 0]ᵀ. Then

Ax ≤ b, u ≥ 0 =⇒ uᵀAx = 2x1 + x2 ≤ uᵀb = 10
Observe that the optimal value 10 is attained at the top-left corner:

x = [0, 10]ᵀ

u provides a proof that the objective value of any feasible solution cannot exceed 10. Since we have exhibited x with z(x) = 10, x is optimal.
Note: u does not provide an easy algorithm to find x!
What the certificate u provides is a set of weights: the vectors normal to the constraints, summed using the weights u, coincide with the gradient of the objective function.
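Verifying the certificate is pure arithmetic (a sketch using the data of this example):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [4.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
b = np.array([2.0, 10.0, 0.0, 0.0])
c = np.array([2.0, 1.0])
u = np.array([0.0, 1.0, 2.0, 0.0])

assert np.allclose(A.T @ u, c)    # weighted constraint normals recover c
assert b @ u == 10.0              # hence c.x = u.Ax <= u.b = 10 when feasible
x_star = np.array([0.0, 10.0])
assert np.all(A @ x_star <= b) and c @ x_star == 10.0   # bound is attained
```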
2.2.5 (Theorem 9) Gordan’s Theorem, 1873. Exactly one of the following systems has a solution:
1. Ax > 0
2. Aᵀy = 0, y ≥ 0, y ≠ 0
2.2.6 (Theorem 10) Stiemke’s Theorem, 1915. Exactly one of the following systems has a solution:

1. Ax = 0, x > 0

2. Aᵀy ≥ 0, Aᵀy ≠ 0
2.3 Duality
Definition
There are 3 classes of LP problems:
1. Unbounded problems: for every real number t there exists a feasible solution with objective value > t (for maximisation) or < t (for minimisation).
2. Feasible problems
3. Infeasible problems
An LP problem with a bounded feasible region cannot be unbounded. The converse is not true: e.g. minimising the L1 norm over the non-negative orthant Rn_{≥0} is a bounded problem with an unbounded feasible region.
Definition
Let A ∈ Rm×n, b ∈ Rm, c ∈ Rn. The dual of the LP problem

P := max cᵀx subject to Ax ≤ b, x ≥ 0

is

P∗ := min bᵀy subject to Aᵀy ≥ c, y ≥ 0
Duality is an involution: P∗∗ = P. The following theorem implies that y can certify the optimality of x.
2.3.1 Weak Duality Relation (Theorem 13). For every feasible solution x of P (the LP problem) and every feasible solution y of P∗, we have
cᵀx ≤ bᵀy
Proof. Since x ≥ 0 and c ≤ Aᵀy ,
cᵀx ≤ (Aᵀy)ᵀx = yᵀ(Ax ) ≤ yᵀb = bᵀy
2.3.2 (Corollary 14). Let x̄, ȳ be feasible solutions of P, P∗, respectively. If cᵀx̄ = bᵀȳ, then x̄ is optimal for P and ȳ is optimal for P∗.

Proof. For any feasible x, via the Weak Duality Relation,

cᵀx ≤ bᵀȳ = cᵀx̄

which proves the optimality of x̄. Similarly, for any feasible y,

bᵀy ≥ cᵀx̄ = bᵀȳ
2.3.3 (Corollary 15). If P is unbounded, then P∗ is infeasible. If P∗ is unbounded, then P is infeasible.

Proof. If P is unbounded and y were a feasible solution of P∗, then by Weak Duality bᵀy ≥ t for every t ∈ R. This is impossible for a fixed y, so P∗ is infeasible. The other direction is symmetric.
2.3.4 Strong Duality Theorem, 1 (Theorem 16). Assume P, P∗ have feasible solutions. Then they have optimal solutions x, y such that

cᵀx = bᵀy

Proof. By assumption there exist feasible x, y with

Ax ≤ b, x ≥ 0,  Aᵀy ≥ c, y ≥ 0
Consider the system:
Ax ≤ b
x ≥ 0
Aᵀy ≥ c
y ≥ 0
cᵀx ≥ bᵀy
The first four conditions say that x is a feasible solution of P and y is a feasible solution of P∗. The last condition, combined with the Weak Duality Relation, implies that cᵀx = bᵀy. By Corollary 14, this implies optimality.
We can rewrite this system in the form of system (1) in Theorem 7:

[A 0; 0 −Aᵀ; −cᵀ bᵀ] [x; y] ≤ [b; −c; 0],  [x; y] ≥ 0
If this system has a solution, the proof is complete by the argument above. Suppose it does not.
By Theorem 7, there exist u ∈ Rm, v ∈ Rn, α ∈ R such that

[Aᵀ 0 −c; 0 −A b] [u; v; α] ≥ 0,  [u; v; α] ≥ 0,  [bᵀ −cᵀ 0] [u; v; α] < 0

i.e. Aᵀu ≥ αc, Av ≤ αb, u, v, α ≥ 0, and bᵀu < cᵀv.
α ≥ 0 so there are 2 cases:
• α > 0: We set

ū := u/α,  v̄ := v/α

Then, using the inequalities above,

Aᵀū ≥ c, ū ≥ 0
Av̄ ≤ b, v̄ ≥ 0

and bᵀū < cᵀv̄. Since ū is feasible in P∗ and v̄ is feasible in P, by Theorem 13 we have

bᵀū ≥ cᵀv̄

which is a contradiction.
• α = 0: The inequalities simplify to

Aᵀu ≥ 0,  Av ≤ 0,  bᵀu < cᵀv

For the feasible x above, b ≥ Ax and x ≥ 0, so

bᵀu ≥ (Ax)ᵀu = xᵀ(Aᵀu) ≥ 0

and for the feasible y, Aᵀy ≥ c and v ≥ 0, so

cᵀv ≤ (Aᵀy)ᵀv = yᵀ(Av) ≤ 0

Therefore

0 ≤ bᵀu < cᵀv ≤ 0

which is a contradiction.
Therefore the alternative system (2) has no solution, so system (1) of Theorem 7 must have a solution, i.e. there exist x, y satisfying

Ax ≤ b, x ≥ 0, Aᵀy ≥ c, y ≥ 0, cᵀx = bᵀy

as required.
2.3.5 (Lemma 17). If P is feasible and P∗ is infeasible, then P is unbounded.
Proof. Suppose P is feasible, so there exists x̄ with x̄ ≥ 0, Ax̄ ≤ b. If P∗ is infeasible, there is no y ∈ Rm with Aᵀy ≥ c, y ≥ 0. Then, via Theorem 7 (with A replaced by −Aᵀ and b by −c), there exists v such that

−Av ≥ 0,  v ≥ 0,  cᵀv > 0

At this point, it should be apparent that moving x̄ in the direction of v improves the objective without bound. For every λ ∈ R≥0, consider

x(λ) := x̄ + λv

Then x(λ) ≥ 0 since x̄, v ≥ 0 and λ ≥ 0, and

Ax(λ) = Ax̄ + λAv ≤ b

since Ax̄ ≤ b and Av ≤ 0, but

cᵀx(λ) = cᵀx̄ + λcᵀv → ∞ as λ → ∞

since cᵀv > 0, so P is unbounded.
2.3.6 Strong Duality Theorem 2 (Theorem 18). If P has an optimal solution, then so does P∗, and their optimal values coincide.

Proof. If P has an optimal solution, P is feasible and bounded, so P∗ must be feasible by Lemma 17. By Strong Duality Theorem 1, P and P∗ have optimal solutions with the same objective value.
2.3.7 Fundamental Theorem of Linear Programming (Theorem 19). Every LP problem P falls into precisely one of the categories:
• Feasible, Unbounded
• Feasible, Bounded, and has optimal solutions
• Infeasible
Proof. Let P be such a problem. Then P is either feasible or infeasible. If P is feasible, P∗ is either feasible or infeasible. If P∗ is feasible, P is bounded (Theorem 13) and has an optimal solution (Theorem 16). If P∗ is infeasible, P is unbounded (Lemma 17).
Note. The reader may be tempted to ask if there exists an LP P such that both P and P∗ are infeasible. Consider
x1 − x2 ≤ 2
−x1 + x2 ≤ −3
x1, x2 ≥ 0
This is infeasible since the sum of the first two constraints gives 0 ≤ −1. The interesting feature of this problem is that the matrix A is symmetric. We can build upon this idea by defining
A := [1 −1 0 0; −1 1 0 0; 0 0 −1 1; 0 0 1 −1],  b := c := [2, −3, −2, 3]ᵀ
Consider A[x1, x2, x3, x4]ᵀ ≤ b. This gives rise to the inequalities
+x1 − x2 ≤ 2
−x1 + x2 ≤ −3
−x3 + x4 ≤ −2
+x3 − x4 ≤ 3
Summing each pair of inequalities gives 0 ≤ −1 (first pair) and 0 ≤ 1 (second pair), so the first pair makes the system infeasible. Likewise the dual problem P∗ is infeasible: replacing all ≤’s by ≥’s still yields an infeasible system, where now the second pair sums to 0 ≥ 1.
P \ P∗          Optimisable   Unbounded   Infeasible
Optimisable     possible      —           —
Unbounded       —             —           possible
Infeasible      —             possible    possible
Note that we have also proved
2.3.8 (Proposition 20). For every A, b, c, the set of primal-dual optimal solution pairs is

{ [x; y] ∈ Rn × Rm : [A 0; 0 −Aᵀ; −cᵀ bᵀ] [x; y] ≤ [b; −c; 0], [x; y] ≥ 0 }
2.4 Complementary Slackness Conditions and Theorems
Definition
(Definition 21) The complementary slackness conditions are the following:
1. ∀j ∈ {1, . . . , n}, either xj = 0 or (Aᵀy − c)j = 0
2. ∀i ∈ {1, . . . ,m}, either yi = 0 or (b −Ax )i = 0
Let x, y be feasible solutions to P, P∗, respectively. Then

cᵀx ≤ (Aᵀy)ᵀx = yᵀ(Ax) ≤ bᵀy

Such a feasible primal-dual pair x, y is optimal in the respective problems iff the above holds with equality throughout, which is equivalent to

xᵀ(Aᵀy − c) = 0
yᵀ(b − Ax) = 0

Since x, y are feasible, each left-hand side is an inner product of two non-negative vectors:

xᵀ(Aᵀy − c) = 0 with x ≥ 0 and Aᵀy − c ≥ 0
yᵀ(b − Ax) = 0 with y ≥ 0 and b − Ax ≥ 0

so each sum vanishes iff every individual term vanishes. Therefore,
Therefore,
2.4.1 Complementary Slackness Conditions (Theorem 22). Let x be feasible in P and y be feasible in P∗. Then x, y are optimal iff
• For all j ∈ {1, . . . , n}, either xj = 0 or (Aᵀy − c)j = 0
• For all i ∈ {1, . . . ,m}, either yi = 0 or (b −Ax )i = 0
2.4.2 (Theorem 23). Let x be feasible in P. Then x is optimal in P iff there exists y, feasible in P∗, such that x, y satisfy the Complementary Slackness Conditions.
The power of Theorem 23 is that, given a candidate optimal solution x, the conditions (Aᵀy − c)j = 0 form a system of linear equations in y which, when solved, produces either a certificate of optimality or a proof of non-optimality.
Example:
max 4x1 + 3x2 subject to

x1 + 2x2 ≤ 2
x1 − 2x2 ≤ 3
2x1 + 3x2 ≤ 5
x1 + x2 ≤ 2
3x1 + x2 ≤ 3
x ≥ 0
Question: Are the following x̄ optimal?

x̄ := [0, 1]ᵀ,  x̄′ := [4/5, 3/5]ᵀ
The dual problem is

min 2y1 + 3y2 + 5y3 + 2y4 + 3y5 subject to

y1 + y2 + 2y3 + y4 + 3y5 ≥ 4
2y1 − 2y2 + 3y3 + y4 + y5 ≥ 3
y ≥ 0
x̄ is feasible (easily checked). The slack constraints at x̄ = [0, 1]ᵀ generate the conditions

x1 − 2x2 = −2 < 3   =⇒ y2 = 0
2x1 + 3x2 = 3 < 5   =⇒ y3 = 0
x1 + x2 = 1 < 2     =⇒ y4 = 0
3x1 + x2 = 1 < 3    =⇒ y5 = 0

Since x2 = 1 > 0, complementary slackness also forces 2y1 = 3, i.e. y1 = 3/2, but then the first dual constraint demands y1 ≥ 4. No y satisfying these conditions is dual feasible, so x̄ is not optimal.
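The procedure of Theorem 23 can be automated for this example (a sketch; the helper name is ours, and the primal data below assumes the five constraints matching the dual above):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, -2.0],
              [2.0, 3.0],
              [1.0, 1.0],
              [3.0, 1.0]])
b = np.array([2.0, 3.0, 5.0, 2.0, 3.0])
c = np.array([4.0, 3.0])

def csc_certificate(x, tol=1e-9):
    """Try to build a dual feasible y satisfying complementary slackness
    with the feasible point x; returns y if x is optimal, else None."""
    tight = np.isclose(A @ x, b)       # slack primal rows force y_i = 0
    pos = x > tol                      # x_j > 0 forces (A^T y)_j = c_j
    M = A[tight][:, pos].T             # equations for the free dual entries
    y_t, *_ = np.linalg.lstsq(M, c[pos], rcond=None)
    y = np.zeros(len(b))
    y[tight] = y_t
    ok = (np.all(y >= -tol)
          and np.all(A.T @ y >= c - tol)        # dual feasibility
          and np.allclose((A.T @ y - c)[pos], 0))
    return y if ok else None

assert csc_certificate(np.array([0.0, 1.0])) is None        # x_bar not optimal
assert csc_certificate(np.array([0.8, 0.6])) is not None    # x_bar' optimal
```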
Complementary Slackness is not always useful. Consider the problem (P)
max cᵀx,  Ax ≤ 0,  x ≥ 0
Is x = 0 optimal for (P)? The dual is
min 0ᵀy,  Aᵀy ≥ 0,  y ≥ 0
Since x = 0, the first complementary slackness condition tells us nothing; since b − Ax = 0 − Ax = 0, neither does the second.
Definition
The LP problems P and P′ are equivalent if

• P has an optimal solution iff P′ does
• P is infeasible iff P′ is
• P is unbounded iff P′ is.

Moreover, certificates of optimality, infeasibility, or unboundedness for one can be converted into the same kind of certificate for the other.
Several transformations exist to produce equivalent LP’s:
• min cᵀx ∼ −max−cᵀx
• aᵀx = b ⇐⇒ { aᵀx ≤ b, −aᵀx ≤ −b }
• xi free ∼ introduce two new non-negative variables ui, vi and set xi := ui − vi.
•aᵀx ≤ b ∼ aᵀx + xn+1 = b, (xn+1 ≥ 0)
Below we have an LP in SEF:
max cᵀx,  Ax = b,  x ≥ 0
This can be transformed to
max cᵀx,  [A; −A] x ≤ [b; −b],  x ≥ 0
The dual of this LP is

min (bᵀu − bᵀv),  [Aᵀ | −Aᵀ] [u; v] ≥ c,  [u; v] ≥ 0
After simplification,

min bᵀ(u − v),  Aᵀ(u − v) ≥ c,  [u; v] ≥ 0
Finally, if we let y := u − v (which can take any sign),

min bᵀy,  Aᵀy ≥ c,  y free
Note. At this point we derive a key observation: if we have a ≤ inequality in P, then the certificate produced in the dual problem P∗ needs a ≥ 0 constraint, so that we take a non-negative linear combination of ≤’s. If we have a ≥ inequality, it has to be flipped before combining, corresponding to a ≤ 0 dual variable.
If we have an equality in P, the coefficient corresponding to this equality has no effect on validity regardless of sign, so the corresponding variable in y is free.
General formula for duals of LPs:

Maximisation Problem (P)       Minimisation Problem (P∗)
ith constraint is ≤            ith variable is ≥ 0
ith constraint is ≥            ith variable is ≤ 0
ith constraint is =            ith variable is free
jth variable is ≥ 0            jth constraint is ≥
jth variable is ≤ 0            jth constraint is ≤
jth variable is free           jth constraint is =
2.4.3 (Theorem 25) Strong Duality Theorem for General Form. Let P, P∗ be a pair of primal-dual LPs.
1. If P, P ∗ both have feasible solutions, then they both have optimal solutions, and their optimal objectivevalues are the same.
2. If one of P, P∗ has an optimal solution, the other one must also have an optimal solution, and the optimal objective values coincide.
2.4.4 (Theorem 26). Let P, P∗ be a pair of primal-dual LPs. Let x be feasible in P and y be feasible in P∗. Then x, y are optimal in their respective problems if and only if the following complementary slackness conditions hold:

• For all j, either xj = 0 or the jth dual constraint is satisfied with equality by y

• For all i, either yi = 0 or the ith primal constraint is satisfied with equality by x
2.5 Convexity and Polyhedra
Given x (1),x (2) ∈ Rn, there is a line segment:
{(1− λ)x (1) + λx (2) : λ ∈ [0, 1]}
This is the convex hull of x (1),x (2).
Definition
(Definition 27) S ⊆ Rn is a convex set if for every pair of points x(1), x(2) ∈ S, the line segment joining x(1) and x(2) is contained in S.
The convex hull of a set S ⊆ Rn is

conv S := ∩ {H : H ⊇ S, H convex}
2.5.1 (Proposition 28). Let A ∈ Rm×n and b ∈ Rm. Let
F := {x ∈ Rn : Ax ≤ b}
Then F is convex.
Proof. Let x(1), x(2) ∈ F and λ ∈ [0, 1]. Then

A((1 − λ)x(1) + λx(2)) = (1 − λ)Ax(1) + λAx(2) ≤ (1 − λ)b + λb = b

since 1 − λ ≥ 0 and λ ≥ 0. Hence (1 − λ)x(1) + λx(2) ∈ F, so F is convex.
2.5.2 (Proposition 29).The intersection of an arbitrary set of convex sets is convex.
Definition
(Definition 30) A closed half-space is a set of the form (a ∈ Rn, b ∈ R)
{x ∈ Rn : aᵀx ≤ b}
A polyhedron is a set of the form {x ∈ Rn : Ax ≤ b}.
Note. The definition of closed half-space includes ∅ and Rn.
(Remark 31) Polyhedra are convex sets. Feasible regions of LP problems are convex sets. Moreover, the set of optimal solutions of an LP problem is convex.
2.5.3 Proposition. The set of optimal solutions of an LP problem is convex.
Proof. Let (P) be an LP problem:

max cᵀx,  Ax ≤ b

If (P) has no optimal solution, the set of optimal solutions is ∅, which is convex. Otherwise, let x̄ be an optimal solution. Then the set of optimal solutions is

{x ∈ Rn : Ax ≤ b, −cᵀx ≤ −cᵀx̄}

which is a polyhedron and hence convex.
Definition
Let 〈x(j)〉_{j=1}^{k} ⊆ Rn. A convex combination of 〈x(j)〉 is

∑_{j=1}^{k} λj x(j),  with ∑_j λj = 1, λj ≥ 0

If we lift the restriction that λj ≥ 0, we get an affine combination.
2.5.4 (Proposition 32). S ⊆ Rn is a convex set iff S contains all convex combinations of its elements.
Proof. If S contains all convex combinations of its elements, then S is convex by definition.
Conversely, suppose S is convex; we prove by induction on k (the number of points in the combination) that every convex combination of elements of S lies in S. For k = 1, 2 this is the definition of a convex set. Suppose the claim holds for k − 1. Let 〈x(j)〉_{j=1}^{k} ⊆ S and λj ≥ 0 with ∑_j λj = 1. If λk = 1 this degenerates to the one-element case, so assume λk < 1. By the inductive hypothesis,
y := ∑_{j=1}^{k−1} (λj / (1 − λk)) x(j) ∈ S
Then, by convexity,

(1 − λk)y + λk x(k) = ∑_{j=1}^{k−1} λj x(j) + λk x(k) = ∑_{j=1}^{k} λj x(j) ∈ S
2.5.5 (Corollary 33). For every S ⊆ Rn, conv S is the set of all convex combinations of elements in S.
2.5.6 (Theorem 34) Carathéodory, 1907. Let S ⊆ Rn. Then every point in the convex hull of S can be expressed as a convex combination of at most n + 1 points in S.
Proof. Let x ∈ conv S with

∑_{j=1}^{k} λj x(j) = x,  ∑_{j=1}^{k} λj = 1,  λj > 0

for some 〈x(j)〉_{j=1}^{k} ⊆ S with k ≥ n + 2 (without loss of generality every λj ≠ 0). The goal is to make some λj = 0.
Observe that for every j, the lifted vector [x(j); 1] lies in Rn+1.
We can write this as a matrix [x (1) · · · x (k)
1 · · · 1
]λ1...λk
=
[x1
]
Since k ≥ n + 2, the lifted vectors [x(j); 1] are linearly dependent in Rn+1, so there exist coefficients 〈μj〉, not all 0, such that

0 = ∑_{j=1}^{k} μj x(j),  0 = ∑_{j=1}^{k} μj
Since ∑_j μj = 0 and the μj are not all zero, some μj < 0 (replacing μ by −μ if necessary). Let

ᾱ := max{α : λ + αμ ≥ 0}

which is finite because some μj < 0, and define

λ̄ := λ + ᾱμ

At least one entry of λ̄ is zero. Notice
∑_{j=1}^{k} λ̄j x(j) = ∑_{j=1}^{k} λj x(j) + ᾱ ∑_{j=1}^{k} μj x(j) = x + ᾱ · 0 = x
λ̄ ≥ 0 by the definition of ᾱ. Finally,

∑_{j=1}^{k} λ̄j = ∑_{j=1}^{k} (λj + ᾱμj) = ∑_{j=1}^{k} λj + ᾱ ∑_{j=1}^{k} μj = 1 + ᾱ · 0 = 1
Therefore we have expressed x as a convex combination of at most k − 1 points from S. Repeating this process yields a convex combination of at most n + 1 elements.
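The proof's reduction step is directly implementable (a sketch; names are ours, and an SVD null-space vector plays the role of μ):

```python
import numpy as np

def caratheodory_reduce(P, lam):
    """One step of the proof: P has shape (k, n) with k >= n + 2 rows (points),
    lam is a strictly positive convex weight vector; returns new convex weights
    representing the same point with at least one weight equal to zero."""
    k, n = P.shape
    L = np.vstack([P.T, np.ones(k)])    # lifted points [x_j; 1] as columns
    mu = np.linalg.svd(L)[2][-1]        # null-space vector: L @ mu = 0
    if mu.min() > -1e-12:               # ensure some mu_j < 0 so alpha is finite
        mu = -mu
    neg = mu < 0
    alpha = np.min(-lam[neg] / mu[neg]) # largest alpha with lam + alpha*mu >= 0
    return lam + alpha * mu

# Four points in R^2 (k = n + 2) whose average is the centre of the unit square.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lam = np.full(4, 0.25)
new = caratheodory_reduce(P, lam)

assert np.isclose(new.sum(), 1) and np.all(new >= -1e-9)
assert np.isclose(new.min(), 0)              # one weight eliminated
assert np.allclose(P.T @ new, P.T @ lam)     # same point represented
```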
Descriptio 2.3: Extreme Points of convex sets
Definition
If the lifted vectors [x(j); 1] are linearly dependent, the points x(j) are affinely dependent.
An affine subspace of Rn is a set of the form

{x : Ax = b}
When b = 0, an affine subspace reduces to a linear subspace.
Let X := {x : Ax = b}. If X is not empty, then there exists o ∈ X and

X = o + {x : Ax = 0} = o + ker A
2.5.1 Extreme Points
Definition
Let S be a convex set. Then x ∈ S is an extreme point of S if there do not exist two points u, v ∈ S \ {x} such that

x = (1/2)u + (1/2)v
The set of extreme points of S is extS.
2.5.7 (Theorem 35). Let S ⊆ Rn be a convex set and x ∈ S. Then x is an extreme point of S if and only if S \ {x} is convex.
For example, every boundary point of a closed ball is an extreme point; removing the whole boundary shell (via repeated application of Theorem 35) leaves the interior.
2.5.8 (Theorem 36). Let A ∈ Rm×n and b ∈ Rm. Define

F := {x ∈ Rn : Ax ≤ b}

Let x ∈ F. Partition the rows of A, b so that

A=x = b= (constraints tight at x),  A<x < b< (constraints slack at x)

Then x is an extreme point of F if and only if

rank A= = n
Descriptio 2.4: Lemma 36A: If x, a point on the boundary, is halfway between u and v, then u, v must lie on the boundary of the half-space defined by a.
Proof. Let x ∈ F. Suppose rank A= ≠ n. Then rank A= ≤ n − 1, so there exists d ∈ Rn \ {0} such that

A=d = 0

Our goal is to perturb x (allowed since the remaining inequalities are strict). For sufficiently small ε > 0, define

u := x − εd,  v := x + εd

Then x = (1/2)(u + v). Moreover,

A=u = A=x − εA=d = b= − 0 = b=
A<u = A<x − εA<d < b<

where the last inequality holds because εA<d is negligible against the positive slack b< − A<x.
It suffices to take

0 < ε < min_i (b<_i − (A<x)_i) / max_i |(A<d)_i|

(or any ε > 0 if A<d = 0), and likewise for v, so u, v ∈ F. Finally, since d ≠ 0, u ≠ x ≠ v.
To prove the other direction, we need a lemma:
2.5.9 (Lemma 36A). Let a ∈ Rn, b ∈ R, x̄ ∈ Rn. Define

H := {x ∈ Rn : aᵀx ≤ b}

Suppose aᵀx̄ = b and u, v ∈ H are such that

x̄ = (1/2)(u + v)

Then aᵀu = aᵀv = b.
Proof. Since u, v ∈ H, aᵀu ≤ b and aᵀv ≤ b, yet

b = aᵀx̄ = aᵀ((1/2)u + (1/2)v) = (1/2)aᵀu + (1/2)aᵀv ≤ b

so neither inequality can be strict: aᵀu = aᵀv = b.

Returning to Theorem 36, suppose rank A= = n but x is not an extreme point of F; we seek a contradiction. Since x is not an extreme point, there exist u, v ∈ F \ {x} such that

x = (1/2)u + (1/2)v
We apply Lemma 36A to each row of [A= | b=], which gives

A=u = b=,  A=v = b=

Hence A=(u − v) = 0, but u − v ≠ 0 (since u = v would force u = x), so rank A= < n. This is a contradiction, so no such u, v exist.
2.5.10 (Corollary 37.i). If rank A < n, then F has no extreme points.
2.5.11 (Corollary 37.ii). The number of extreme points of a polyhedron is finite. An upper bound is (m choose n).

Proof. Each [A=|b=] of rank n generates at most one extreme point. Since there are m rows, at most (m choose n) choices of [A=|b=] are possible.
There are tighter bounds, but all are at least exponential in n.
Definition
(Definition 38) A polyhedron is pointed if it does not contain a line.
Examples:
• {x : Ax = b,x ≥ 0} is always pointed.
• {x : Ax ≤ b,x ≥ 0} is always pointed.
• The empty set is pointed.
2.5.12 (Proposition 39). Let P ⊆ Rn be a non-empty polyhedron. Then P is pointed if and only if P has at least one extreme point.
2.5.13 (Theorem 40). If the feasible region of an LP problem (P) is pointed, and (P) has an optimal solution, then (P) has an optimal solution at an extreme point of its feasible region.
Proof. Assume feasible region F := {x : Ax ≤ b} of (P ) is pointed. Let x be an optimal solution suchthat x satisfies a maximal number of equalities in Ax ≤ b.
Suppose x is not an extreme point of F; we shall derive a contradiction. Then there exist u, v ∈ F \ {x} such that x = (1/2)(u + v). Define the line

ξ(λ) := (1 − λ)u + λv, (λ ∈ R)
u , v , x are on this line. Let A=x ≤ b= denote the constraints held with equality at x . Since u , v ∈ F , theysatisfy all of the inequality constraints. By Lemma 36A,
A=u = b=, A=v = b=
Since ξ(λ) is an affine combination of u and v ,
A=ξ(λ) = b=
Moreover, since cᵀx is optimal, Lemma 36A implies all points on the line segment connecting u, v are optimal. When the line is extended as far as possible inside F (which does not contain a line, by pointedness), there is a particular λ such that ξ(λ) ∈ F satisfies A=ξ(λ) = b= and λ is minimal or maximal. For this ξ(λ), a constraint in A<ξ(λ) ≤ b< becomes an equality. This violates the maximality of x, so our assumption is false, and x is an extreme point of F.
Note. (Remark 41) This theorem provides an algebraic characterisation of a fundamental geometric object. It also indicates that a finite algorithm can solve LPs.
Descriptio 2.5: The convex hull formed by the extreme points of a polytope must be the polytope itself. If x is outside of the polytope, a certificate y must exist which separates x from a facet of the polytope.
Definition
A polytope is a bounded polyhedron.
Note that polytopes are compact.
2.5.14 (Theorem 43). Let F ⊆ Rn be a polytope. Then F is the convex hull of its extreme points.

Proof. Let F be a polytope. By Corollary 37.ii, F has finitely many extreme points. If F = ∅, then F is the convex hull of ∅.

Otherwise, F is non-empty and bounded (hence pointed), so it has extreme points by Proposition 39. Let {v1, . . . , vk} be the extreme points of F. Let C be the convex hull of these points. Since F is convex, C ⊆ F.
Suppose there exists ξ ∈ F \ C. We shall derive a contradiction. Then the system

∑_{j=1}^k λ_j v_j = ξ, ∑_{j=1}^k λ_j = 1, λ ≥ 0

is infeasible. This is the primal system in Farkas' Lemma, so the dual system has a solution y ∈ Rn, α ∈ R:

yᵀξ + α < 0
yᵀv_j + α ≥ 0, (j ∈ {1, . . . , k})
The following problem thus has an optimal solution x:

min yᵀx, x ∈ F

By Theorem 40, the optimum can be attained at an extreme point of F. However, this extreme point cannot equal any v_j, since

yᵀx < −α ≤ yᵀv_j

So some extreme point of F is not in C, contradicting that C contains all extreme points of F. Hence our assumption is false and ξ cannot exist. Therefore F = C.
2.5.15 (Theorem 44). The convex hull of any finite subset of Rn is a polytope.
Definition
Let S1, S2 ⊆ Rn. The Minkowski sum of S1, S2 is
S1 + S2 := {u + v : u ∈ S1, v ∈ S2}
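For finite point sets the Minkowski sum can be computed directly from the definition. A small Python sketch (the function name and the sets are made up for illustration):

```python
def minkowski_sum(S1, S2):
    """Minkowski sum of two finite subsets of R^2, per the definition:
    every pairwise sum u + v with u in S1, v in S2."""
    return {(u[0] + v[0], u[1] + v[1]) for u in S1 for v in S2}

# Two unit segments along the axes sum to the four corners of a unit square.
square_corners = minkowski_sum({(0, 0), (1, 0)}, {(0, 0), (0, 1)})
```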
Definition
A cone is a set C such that x ∈ C, λ ≥ 0 =⇒ λx ∈ C
A polyhedral cone is a polyhedron which is also a cone.
Note. Every non-empty polyhedral cone can be expressed as
{x ∈ Rn : Ax ≤ 0}
This is because every polyhedron can be written as Ax ≤ b, but since a cone contains 0 , b = 0 .
2.5.16 (Theorem 45).
Let v(1), . . . , v(m) ∈ Rn. Let

K := { ∑_{i=1}^m λ_i v(i) : λ ≥ 0 }

Then

K = {x ∈ Rn : Ax ≤ 0}

for some A ∈ Rk×n, for some k ∈ N.
Proof. Define
V := [v (1), . . . , v (m)] ∈ Rn×m
Without loss of generality we may assume V is of full rank (rank V = n), otherwise K can be redefined usingfewer vectors.
Define

F := {y ∈ Rn : Vᵀy ≥ 0}

and

S := {s ∈ Rn : yᵀs ≥ 0, ∀y ∈ F}
We shall show that S = K.
• Let x ∈ K. Then there exists λ ≥ 0 such that x = Vλ. Let y ∈ F be arbitrary. Then

yᵀx = (Vᵀy)ᵀλ ≥ 0

since Vᵀy ≥ 0 and λ ≥ 0.
Thus x ∈ S. This shows S ⊇ K.
• Conversely, suppose there exists s ∈ S \K. By Farkas’ Lemma, there exists u ∈ Rn such that Vᵀu ≥ 0and uᵀs < 0. This implies u ∈ F , which is inconsistent with s ∈ S, so s cannot exist, and S ⊆ K.
Next we shall show that finitely many y's from F suffice to define S and hence K. Let y ∈ F and let Vᵀ= be the submatrix of Vᵀ corresponding to the constraints tight at y. If rank Vᵀ= ≤ n − 2, then there exist u, v ∈ F and α, β > 0 such that the inequality given by y in the definition of S is implied by those of u and v, hence unnecessary.

Therefore, it suffices to consider only those y ∈ F such that ‖y‖ = 1 and rank Vᵀ= = n − 1. There are only finitely many such y's, say y(1), . . . , y(k). Then

K = S = {x ∈ Rn : [y(1), . . . , y(k)]ᵀx ≥ 0}

which is of the required form Ax ≤ 0 with A := −[y(1), . . . , y(k)]ᵀ.
We also proved that a polyhedral cone is finitely generated :
2.5.17 (Corollary 46).

Let K := {x ∈ Rn : Ax ≤ 0} for some A ∈ Rm×n. Then there exist a positive integer k and v(1), . . . , v(k) ∈ K such that

K = { ∑_{i=1}^k λ_i v(i) : λ ≥ 0 }

2.5.18 Proposition. The Minkowski sum of two convex sets is always convex. The Minkowski sum of two polyhedra is always a polyhedron.
Definition
The largest affine subspace contained in a polyhedron is the linearity space of the polyhedron.
2.5.19 (Theorem 47). Let F ⊆ Rn be a non-empty pointed polyhedron. Then there exist
• A pointed polyhedral cone K ⊆ Rn
• A polytope P ⊆ Rn, that is the convex hull of the extreme points of F
such that F = P + K
Proof. Let F = {x ∈ Rn : Ax ≤ b}. Consider the polyhedral cone F̂ ⊆ Rn+1:

F̂ := { [µ; x] ∈ R × Rn : Ax − µb ≤ 0, µ ≥ 0 }

(writing [µ; x] for the stacked vector). Using Corollary 46, F̂ is generated by finitely many vectors of the form [µ; x]. If µ ≠ 0, we normalise so µ = 1; otherwise we normalise so the ∞-norm is 1. Then we have that

F̂ = cone{ [1; v(1)], . . . , [1; v(k)] } + cone{ [0; w(1)], . . . , [0; w(l)] }

Then we can let

P := conv{v(1), . . . , v(k)}
K := cone{w(1), . . . , w(l)}

so

x ∈ F ⇐⇒ [1; x] ∈ F̂ ⇐⇒ x ∈ P + K
The decomposition is not unique when P is allowed to be any polytope. If P = conv ext F, then the decomposition is unique.
If we are faced with an LP problem with a pointed feasible region,
• We test rays of the form o + λr, where o is an extreme point of F (or P) and r is an extreme ray of K.

• If cᵀr > 0 for some extreme ray r, the problem (max cᵀx) is unbounded.

• If cᵀr ≤ 0 for all extreme rays r, the optimal value must be attained at an extreme point of F (or P).
2.5.20 (Theorem 48). Let F ⊆ Rn be a non-empty polyhedron.
• There exists a pointed polyhedral cone K ⊆ Rn
• There exists a polytope P ⊆ Rn
• Let L be the linearity space of F
such that F = P + K + L
Again, the decomposition is not unique unless we force P = conv extF .
2.5.21 (Theorem 49). Let C ⊆ Rn be a compact convex set and S ⊆ C. The following are equivalent:
1. cl convS = C
2. inf{hᵀx : x ∈ S} = min{hᵀx : x ∈ C} for all h ∈ Rn
3. extC ⊆ clS
2.6 Bases and Simplex Algorithm
Definition
A simplex is a subset S ⊆ Rn, such that
S = conv{v (1), . . . , v (n+1)}
where v (j)’s are affinely independent.
Examples of simplices: a line segment in R1, a triangle in R2, a tetrahedron in R3, and a 5-cell in R4. Consider the primal problem in standard equality form
(P) : max z(x) := cᵀx, { Ax = b, x ≥ 0 }, (A ∈ Rm×n, b ∈ Rm)
Assume that rank A = m. If not, we can apply Gaussian Elimination or put [A|b] in row echelon form. Withthis, we either prove Ax = b is infeasible, or Ax = b has solutions with rank A < m, in which case we eliminateall redundant equations.
The columns of A can be partitioned:
A = [A:,1| · · · |A:,n]
Definition
B ⊆ {1, . . . , n} is a basis of A if
|B| = m, det A_B = det[A_{:,j} : j ∈ B] ≠ 0

If B is a basis of A, the columns of A_B form a basis of Rm.
Descriptio 2.6: A selection of columns of A, some of which may be bases.
Descriptio 2.7: Example of A ∈ R2×5. The feasible region of the LP problem in SEF is the set of conic column combinations which produce b. {1, 5} is not a feasible basis, in the sense that no conic combination of these two columns produces b, but {1, 2} is.
Definition
Every basis B of A determines a basic solution x of Ax = b, which is the solution of

{ Ax = b, x_N = 0 }

where N := {1, . . . , n} \ B. If A_B⁻¹b ≥ 0 (i.e. x ≥ 0), x is a basic feasible solution of

{ Ax = b, x ≥ 0 }

Since

Ax = A_B x_B + A_N x_N

we have

{ Ax = b, x_N = 0 } ⇐⇒ { A_B x_B = b, x_N = 0 } ⇐⇒ { x_B = A_B⁻¹b, x_N = 0 }

Each basis of A gives rise to a unique basic solution. The total number of bases is bounded by (n choose m), and this also bounds the number of basic feasible solutions.
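The count above can be illustrated by brute force: enumerate every candidate basis and keep the basic solutions that are feasible. A NumPy sketch on a small made-up instance (not from the notes):

```python
import numpy as np
from itertools import combinations

# Made-up instance: x1 + x2 + x3 = 2, x2 + x4 = 1, x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
m, n = A.shape
bfs = []
for B in combinations(range(n), m):        # at most C(n, m) candidate bases
    AB = A[:, list(B)]
    if abs(np.linalg.det(AB)) < 1e-9:
        continue                           # columns dependent: not a basis
    x = np.zeros(n)
    x[list(B)] = np.linalg.solve(AB, b)    # basic solution: x_B = A_B^{-1} b
    if np.all(x >= -1e-9):
        bfs.append(x)                      # keep only basic *feasible* solutions
```

Here C(4, 2) = 6 index pairs are tried; one pair is not a basis and one basic solution is infeasible, leaving four basic feasible solutions.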
2.6.1 (Theorem 50). Let A ∈ Rm×n, rank A = m, b ∈ Rm. Define
F := {x ∈ Rn : Ax = b,x ≥ 0}
Suppose x ∈ F . The following are equivalent:
1. x is a basic feasible solution of F
2. {A_j : x_j > 0} is linearly independent. (The indices j for which x_j ≠ 0 form the support.)
3. x is an extreme point of F .
Proof. We can express F in a slightly different form. Let

Ā := [A; −A; −I], b̄ := [b; −b; 0]

(rows stacked). Then

F = {x ∈ Rn : Āx ≤ b̄}

This is the standard way to break an equality into two inequalities. Moreover, rank A = m, so m ≤ n (by the Rank–Nullity Theorem, rank A cannot exceed n). The matrix Ā has 2m + n rows.
Suppose (1) holds. Then there exist index sets B, N such that x is a BFS determined by the basis B, and N is the complement of B. By definition of BFS, x_N = 0, so |N| of the inequalities in Āx ≤ b̄, namely those originating from −Ix ≤ 0, are equalities. Moreover, Ax = b, since x ∈ F. Therefore (using Theorem 36's notation), Ā= contains

• The rows of A, −A, since ±Ax = ±b is satisfied.

• The rows of −I corresponding to the index set N, since x_N = 0. Let these rows be −J.

• Possibly some other rows of −I. Let the row indices of these be M, and let these rows be −K. Note that by definition M is disjoint from N.
(Block picture of Ā: the m rows of A, the m rows of −A, then the rows −J of −I with indices in N and the rows −K with indices in M; the n columns are partitioned into B and N.)
Since

rank [A; −A] = rank A = m
we need to prove that the remaining n − m of the rank are contributed by J (and K). Suppose some linear combination of the rows of A and J produces 0ᵀ, with coefficients λ, µ:

0ᵀ = λᵀA + µᵀJ

The crux is that since J is a slice of I, µᵀJ is supported only on the columns indexed by N. Let i ∈ B and recall the standard basis vector e_i. Then

0 = 0ᵀe_i = λᵀA e_i + µᵀJ e_i = λᵀA e_i    (since J e_i = 0 for i ∈ B)

Since this holds for all i ∈ B,

0ᵀ = λᵀA_B

Since A_B has full rank, λ = 0, but this implies

0ᵀ = µᵀJ

J is a slice of the identity matrix, so it has full row rank as well, which forces µ = 0. We have thus proved that the rows of J are linearly independent from those of A. Since J has |N| = n − |B| = n − m rows, the matrix

[A; J]

has full rank n. Since its rows appear (up to sign) among those of Ā=, rank Ā= = n. By Theorem 36, x is an extreme point of F.
Conversely, suppose x is an extreme point of F. Then since x ∈ F, we have Ax = b, and the first 2m rows of Āx ≤ b̄ are satisfied with equality by x.

By Theorem 36, rank Ā= = n. Since rank A = m, the remaining n − m dimensions must come from the identity matrix in Ā. Thus there exists an index set N with |N| = n − m such that x_N = 0. Let J be the rows of I whose indices are in N. Define

B̄ := [A; −J]

By the definition of Ā= we have

B̄x = [A; −J]x = [b; 0]

Let λ ∈ Rn be such that Aλ = 0 and λ_N = 0. If we prove that under these circumstances λ must be 0, then we have established the linear independence of the columns of A_B. Since λ_N = 0, Jλ = 0, so

B̄λ = [Aλ; −Jλ] = [0; 0]

Since B̄ has full rank, λ = 0, as required. Therefore the columns of A_B are linearly independent, and by virtue of |B| = m,

{ Ax = b, Jx = 0 } ⇐⇒ { A_B x_B = b, x_N = 0 } ⇐⇒ { x_B = A_B⁻¹b, x_N = 0 }

Hence x is a BFS.
Definition
(Definition 51) A basis which corresponds to a basic feasible solution is a feasible basis.
2.6.1 Simplex Method
Let B be a feasible basis of A and x be the basic feasible solution of Ax = b, x ≥ 0, determined by B. Consider the LP in SEF:

(P) : max z(x) := cᵀx, { Ax = b, x ≥ 0 }
Theorem 40 implies that the optimal value of (P) can always be attained at an extreme point. Ax = b can also be written as

A_B x_B + A_N x_N = b

which converts the problem to the equivalent form

max z(x) := cᵀx, { x_B + A_B⁻¹A_N x_N = A_B⁻¹b, x ≥ 0 }

This system uniquely determines the basic feasible solution

x = [x_B; x_N] = [A_B⁻¹b; 0]

We may wish to move our current solution to improve the objective value z. We forced x_N = 0; to consider other feasible solutions of (P), we need to set some x_j, j ∈ N, to a non-zero value. Assume that we set x_k ≠ 0. To maintain the feasibility of the new solution, we have
xk := α ≥ 0
Moreover, to maintain Ax = b, we need
x_B = A_B⁻¹b − α d̄ ≥ 0, where d̄ := A_B⁻¹A_k

This constrains how large α can be: we need x_j − α d̄_j ≥ 0 whenever d̄_j > 0, or

α ≤ min{ x_j / d̄_j : j ∈ B, d̄_j > 0 }

Thus the new solution is of the form
x ′ := x + αd
where
d_j := −d̄_j = −e_jᵀA_B⁻¹A_k if j ∈ B;  1 if j = k;  0 if j ∈ N \ {k}
Note. A_B⁻¹ (and hence d̄) is indexed using the same index set B; the indices do not “collapse”.
If d̄ ≤ 0 (equivalently, d ≥ 0), then the feasible region of (P) is unbounded, since it contains the ray

λ ↦ x + λd

When this does not occur, x′ is the new feasible solution.
2.6.2 Proposition. x′ is a basic feasible solution.
Proof. The index which achieved the minimal value of α can be removed from B, i.e.

l := argmin_{j ∈ B, d_j < 0} x_j / (−d_j)

Then

B′ := (B ∪ {k}) \ {l}

and A_{B′} is invertible, since d̄_l ≠ 0.

To ensure that the chosen k ∈ N improves the objective value, we use the decomposition
z = cᵀx = c_Bᵀ x_B + c_Nᵀ x_N
Descriptio 2.8: One iteration, marching from basis {1, 2} to {3, 2}. Since α is maximal, one component (in this case x′_1) is annihilated.
For any feasible x,

x_B = A_B⁻¹b − A_B⁻¹A_N x_N

Hence

z = c_Bᵀ A_B⁻¹ b − c_Bᵀ A_B⁻¹ A_N x_N + c_Nᵀ x_N
  = c_Bᵀ A_B⁻¹ b + (c_Nᵀ − c_Bᵀ A_B⁻¹ A_N) x_N

Let

y := (A_B⁻¹)ᵀ c_B, c̄ᵀ := c_Nᵀ − c_Bᵀ A_B⁻¹ A_N

Then

z = c_Bᵀ A_B⁻¹ b + (c_Nᵀ − yᵀA_N) x_N = z(x) + ∑_{j∈N} c̄_j x_j
The entries c̄_j of c̄ are the reduced costs of the x_j with respect to the current basis B. c̄_j behaves like the partial derivative of the objective function with respect to x_j. To improve our current solution, we need to choose a k ∈ N such that c̄_k > 0. If no such k exists, the current solution x is optimal.
The above series of operations produces another problem
(P′) : max z′ = c̄ᵀx + z(x̄), { Āx = b̄, x ≥ 0 }

where x̄ denotes the current BFS and

[Ā|b̄] = A_B⁻¹[A|b], c̄ := c − Aᵀy, z(x̄) = bᵀy
Although the objective functions of (P), (P′) differ, they coincide on the affine subspace Ax = b.
2.6.3 Lemma. If c̄_N ≤ 0, then the current BFS (basic feasible solution), i.e.

x_N := 0, x_B := A_B⁻¹b ≥ 0

is optimal. Moreover, when c̄_N < 0, x is the unique optimal solution.
Descriptio 2.9: Left: plot of the feasible region of (P) along with some bases of (P). Centre and right: plot of (A_i, c_i) pairs. The objective value of any solution is the intersection of the b line with a conic combination of the (A_i, c_i)'s. Suppose we select the feasible basis {1, 4}; it generates a basic feasible solution, and changing x_2 to a non-zero value (thus forcing x_4 = 0) moves to another one. Notice that {1, 2, 4} forms a simplex.
The converse of this lemma is not necessarily true.

Proof. For every feasible solution x of (P), we have c̄_j x_j ≤ 0 for all j ∈ N, so

z = c_Bᵀ A_B⁻¹ b + ∑_{j∈N} c̄_j x_j ≤ c_Bᵀ A_B⁻¹ b = c_Bᵀ x̄_B = cᵀx̄

where x̄ is the current BFS.

The same technique shows that if c̄_N < 0, then x̄ is the unique optimal solution of (P): every feasible solution x other than x̄ must have x_j > 0 for some j ∈ N, and since c̄_j < 0 for every j ∈ N, the inequality above is then strict, so z < cᵀx̄.
Another proof:

Proof. The dual of (P) is

(P*) : min bᵀy, { Aᵀy ≥ c }

Recall y = (A_B⁻¹)ᵀc_B. By assumption,

c̄_Nᵀ = c_Nᵀ − yᵀA_N ≤ 0ᵀ

and A_Bᵀy = c_B, so y is a feasible solution of (P*). Moreover,

bᵀy = bᵀ(A_B⁻¹)ᵀc_B = c_Bᵀ A_B⁻¹ b = cᵀx̄
By the Weak Duality Theorem, x̄, y are optimal.

Finally, what if d ≥ 0? Under this assumption, the objective value along the ray λ ↦ x + λd is

cᵀ(x + λd) = cᵀx + λc̄_k → +∞
The idea is that the current solution x is at an extreme point. We slide along the boundary of the polyhedronto reach another extreme point that is optimal. With this established, we obtain the Simplex Method.
• Input: (A, b, c, x , B) such that
– A, b, c define an LP problem (P) in Standard Equality Form.
– B is a feasible basis for (P ) determining a BFS x .
1: procedure SimplexMethod(A, b, c, x, B)
2:   N := {1, . . . , n} \ B
3:   y := (A_Bᵀ)⁻¹ c_B
4:   c̄_N := c_N − A_Nᵀ y
5:   if c̄_N ≤ 0 then
6:     return (x, y) are optimal in (P) and (P*)
7:   end if
8:   choose k ∈ {j ∈ N : c̄_j > 0}
9:   d̄ := A_B⁻¹ A_k
10:  d_j := 1 if j = k;  0 if j ∈ N \ {k};  −d̄_j if j ∈ B
11:  if d ≥ 0 then
12:    return (P) is unbounded, with certificate ray x + λd; (P*) is infeasible, since d satisfies d ≥ 0, Ad = 0, cᵀd = c̄_k > 0
13:  end if
14:  l := argmin_{j ∈ B, d_j < 0} x_j / (−d_j)
15:  α := x_l / (−d_l)
16:  B := (B ∪ {k}) \ {l}
17:  x := x + αd
18:  SimplexMethod(A, b, c, x, B)
19: end procedure
The change in objective value is
cᵀ(x + αd) − cᵀx = αcᵀd = αc̄_k ≥ 0

A natural question is how many iterations of the simplex algorithm are needed to find the optimal solution. Since in each iteration (except for the last) α > 0, the objective value improves by a positive amount in each iteration.
2.6.4 (Theorem 52). The simplex method applied to an LP problem in SEF with a basic feasible solution terminates in at most (n choose m) iterations, provided that α > 0 in each iteration (which happens when x_i ≠ 0 for all i ∈ B).

When the algorithm stops, it either proves (P) has an optimal solution with certificates, or proves that (P) is unbounded.

Proof. When α > 0 in each iteration, since αc̄_k > 0, the objective value improves in every iteration, so no basis is repeated.
2.6.2 Cycling and Stalling
Definition
(Definition 53) A basic solution x determined by basis B of A is degenerate if x_i = 0 for some i ∈ B, in which case B is a degenerate basis.
Suppose we are using well-defined deterministic rules for the choice of subscripts k, l. If such an implementa-tion of simplex method does not terminate, then it must cycle, i.e. iterate over the same list of bases indefinitely.Note that all basic feasible solutions in the cycle must be degenerate, since no improvement of objective valuecan happen.
Degeneracy in the dual corresponds to multiple optima in the primal. When degeneracy happens, α = 0 and the algorithm may not make progress. The remedy lies in the choice of k and l: when there is a tie, choose the candidate k or l with the lowest index. This is the smallest index rule (Bland's Rule), and it ensures termination.
Descriptio 2.10: Left: {1, 2} is a non-degenerate basis. Right: {1, 4} is a degenerate basis.
Descriptio 2.11: The Simplex Method can cycle in this case.
Descriptio 2.12: A degenerate b lies on the edge of the conic hull of a feasible basis. In the diagram, the objective values for the feasible bases {1, 2} and {2, 3} are identical. Note that it is possible to escape the degeneracy by moving x_4.
Descriptio 2.13: A minor random perturbation to b removes degeneracy. In industrial-strength simplex implementations, a multivariate normal error is introduced to perturb b; when the optimum is found with the perturbed b, the perturbation is removed to recover the true optimum. The scale is exaggerated in the diagram.
Definition
(Definition 54) LP problem (P ) with constraints Ax = b,x ≥ 0 is non-degenerate if every basis of Ais non-degenerate.
2.6.5 (Theorem 55). The simplex method applied to any LP in SEF with a starting BFS and utilising the smallest index rule terminates in at most (n choose m) iterations, either providing an optimal solution or proving that the LP is unbounded.
Other than Bland's rule, another method exists to curb cycling. Without loss of generality (via row elimination), we may assume that

A = [A′ | I]

where the identity block corresponds to the basis B. Let ε > 0. Define
ε̄ := [ε; ε²; . . . ; ε^m]

and consider the perturbed problem

(P′) : max cᵀx, { Ax = b + ε̄, x ≥ 0 }
The idea is to perturb the right-hand side in a way that we never get

(A_B⁻¹(b + ε̄))_i = 0

for any i and any basis B.

One way to do this is to approximate each entry of A and b by a rational number (albeit with a large denominator); then they can be scaled to be integers. Consider an example: when we compare two integers, e.g. 31571 and 31519, we know the answer after comparing the fourth digit. Using a similar idea, we compare the perturbed quantities symbolically. Applying the simplex method to (P′), in computing

min{ x_i / d̄_i }

we need to compare expressions of the form

(β_0 + β_1 ε_1 + · · · + β_m ε_m) / (γ_0 + γ_1 ε_1 + · · · + γ_m ε_m)
When 1 ≫ ε_1 ≫ · · · ≫ ε_m > 0, the first expression is lexicographically larger than the second if, at the smallest index i such that β_i ≠ γ_i, we have β_i > γ_i. The ε_i are indeterminates that impose a total order on the bases of A.
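The comparison rule reduces to comparing coefficient sequences lexicographically, exactly like the digit comparison of 31571 and 31519 above. A tiny Python sketch (function name my own):

```python
def lex_greater(beta, gamma):
    """True iff beta > gamma lexicographically: at the smallest index
    where the coefficient sequences differ, beta's entry is larger."""
    for bi, gi in zip(beta, gamma):
        if bi != gi:
            return bi > gi
    return False   # equal sequences
```

With digit sequences, lex_greater([3, 1, 5, 7, 1], [3, 1, 5, 1, 9]) decides at the fourth entry (7 > 1), mirroring the integer comparison 31571 > 31519.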
For any basis B of A, the corresponding x_B for (P′) is

x_B = A_B⁻¹b + A_B⁻¹ε̄
Applying Simplex Method to (P ′) is applying Lexicographic Simplex Method to (P ).
2.6.6 (Proposition 56). (P′) is non-degenerate.
2.6.7 (Theorem 57). The Lexicographic Simplex Method applied to (P) with a starting BFS terminates in at most (n choose m) iterations. The resulting basis from the Lexicographic Simplex Method proves the same claim for (P).
In practice, the danger is not cycling but stalling (traversing a long sequence of degenerate solutions without changing x).
2.6.3 Two-Phase Method
The simplex algorithm we have needs to start with a BFS. The Two-Phase method generates a BFS from agiven LP and uses Simplex Algorithm to determine an optimal solution.
Given
(P) : max cᵀx, { Ax = b, x ≥ 0 }
Without loss of generality we can assume
• rank A = m.
• b ≥ 0: we can negate the rows of [A|b] corresponding to negative b_i.
To determine a BFS, we introduce auxiliary variables x_{n+1}, . . . , x_{n+m} and solve an auxiliary LP:
(P′) : max w(x), [A|I] [x_1; . . . ; x_n; x_{n+1}; . . . ; x_{n+m}] = b, x ≥ 0

A basic feasible solution of this system is given by B = {n + 1, . . . , n + m}. The objective function of (P′) is

w(x) := [0ᵀ | −1ᵀ] x
• Every feasible solution of (P) corresponds to an optimal solution of (P′), which can be obtained by appending 0's to x.
• If (P) is feasible, every optimal solution of (P′) yields a feasible solution of (P), because such a solution must have x_{n+j} = 0 for j = 1, . . . , m.
Applying the simplex method to (P ′) gives a solution x .
• If w(x ) = 0, then x is a feasible solution to (P ).
• Otherwise, (P ) is infeasible, certified by the last y in Simplex Algorithm.
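The Phase-I construction can be illustrated with an off-the-shelf LP solver: build [A|I], minimise the sum 1ᵀs of the artificial variables (equivalently, maximise w), and declare (P) feasible iff the optimum is 0. A sketch with a made-up instance, using scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data with b >= 0, as assumed above.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 1.0, 0.0]])
b = np.array([4.0, 3.0])
m, n = A.shape
A_aux = np.hstack([A, np.eye(m)])                  # [A | I]
c_aux = np.concatenate([np.zeros(n), np.ones(m)])  # minimise 1^T s, i.e. maximise w
# B = {n+1, ..., n+m} gives the obvious starting BFS x = 0, s = b.
res = linprog(c_aux, A_eq=A_aux, b_eq=b, bounds=[(0, None)] * (n + m))
feasible = res.fun < 1e-9                          # w = 0  <=>  (P) is feasible
```

Here the artificial variables can be driven to 0 (e.g. x = (1.5, 0, 2.5) satisfies both equations), so the Phase-I optimum is 0 and (P) is feasible.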
The dual of (P′) is

(P′*) : min bᵀy, { Aᵀy ≥ 0, y ≥ −1 }

In the case that the optimal value of (P′) is non-zero, the last y computed by the simplex method, which certifies optimality for (P′*), will certify the infeasibility of (P).
Another way to determine feasibility of (P) is the following. Feasibility is not affected by the objective function, so we can look for a feasible solution of

(P̄) : max 0ᵀx, { Ax = b, x ≥ 0 }

Its dual is

(P̄*) : min bᵀy, { Aᵀy ≥ 0 }

and 0 is a BFS of (P̄*) (wait a second, we have not defined BFS for this type of LP yet, but (P̄*) can be brought to equality form with slack variables).
Below is a refinement of Complementary Slackness:
2.6.8 (Theorem 58) Strict Complementarity Theorem. Let (P) be in standard equality form and have optimal solutions. Then (P), (P*) have optimal solutions x, y such that for all j ∈ {1, . . . , n},
xj(Aᵀy − c)j = 0, xj + (Aᵀy − c)j > 0
Proof. The first condition is implied by the Complementary Slackness Theorem.

When (P), (P*) both have optimal solutions, their optimal objective values coincide; let this value be z̄. For each j ∈ {1, . . . , n}, we shall either

• construct an optimal solution x(j) of (P) with x(j)_j > 0, or

• construct an optimal solution y(j) of (P*) with (Aᵀy(j) − c)_j > 0.
Define the primal-dual pair

(P_j) : max e_jᵀx, { Ax = b, cᵀx ≥ z̄, x ≥ 0 }

(P_j*) : min bᵀy + z̄η, { Aᵀy + ηc ≥ e_j, η ≤ 0 }

(P_j) is formulated in such a way that its feasible region is the set of optimal solutions of (P). If (P_j) has a feasible solution with x_j > 0, then we have our optimal solution x(j) of (P). Otherwise, the optimal objective value of (P_j) is 0, and by the Strong Duality Theorem (P_j*) has an optimal solution y*, η* such that bᵀy* = −z̄η*.
Cases:
• η* = 0. Then Aᵀy* ≥ e_j and bᵀy* = 0. Let y be any optimal solution of (P*), and let

y(j) := y + y*

This y(j) is still dual optimal; it satisfies constraint j with slack ≥ 1 and all others with slack ≥ 0, i.e.

(Aᵀy(j) − c)_j ≥ 1 > 0
• η* < 0. Define

y(j) := y*/(−η*)

Then

Aᵀy(j) ≥ c + (1/(−η*)) e_j, bᵀy(j) = bᵀy*/(−η*) = (−z̄η*)/(−η*) = z̄

from which we also have

(Aᵀy(j) − c)_j ≥ 1/(−η*) > 0
Let B be the set of indices j for which we constructed x(j), and let N be its complement. With this established, the convex combinations (the optimal solution sets are convex)

x̄ := (1/|B|) ∑_{j∈B} x(j), ȳ := (1/|N|) ∑_{j∈N} y(j)

are optimal solutions of (P), (P*) with the required properties. Note that B or N may be empty:

• If B = ∅, then the unique optimal solution of (P) is x = 0.

• If N = ∅, then the unique optimal solution ȳ (using rank A = m) satisfies Aᵀȳ = c.
Caput 3
Combinatorial Optimisation
Example: (Example 59) Assignment Problem
Suppose we have a set of jobs J and a set of workers W. We are given c_{i,j} ∈ R for each i ∈ W, j ∈ J describing the compatibility of worker i with job j. We want to assign workers to jobs bijectively such that the sum of c_{i,j} over the assigned pairs is maximised.
The brute-force solution to this problem is impractical when |J| is large. Let
xi,j :=
{1 if worker i is assigned to j
0 otherwise
We want to maximise

(A) : ∑_{i∈W} ∑_{j∈J} c_{i,j} x_{i,j}, subject to

∑_{j∈J} x_{i,j} = 1, ∀i ∈ W
∑_{i∈W} x_{i,j} = 1, ∀j ∈ J
x_{i,j} ≥ 0
x_{i,j} ∈ Z
(A) is feasible iff |J| = |W|. Let n := |J| = |W|. One of the equations specifying the feasible region of (A) is redundant, since the sum of all x_{i,j} can be derived in two different ways.

To discourage a worker from being assigned a particular job, assign an extremely negative value to c_{i,j}.
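For the assignment problem itself, SciPy ships an exact solver, scipy.optimize.linear_sum_assignment. A sketch with a made-up 3×3 compatibility matrix (the data is illustrative only):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up compatibility matrix c[i, j] for 3 workers and 3 jobs.
c = np.array([[4.0, 1.0, 3.0],
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])
# Solve the assignment problem exactly; maximize=True asks for the
# maximum-weight bijection of workers to jobs.
rows, cols = linear_sum_assignment(c, maximize=True)
total = c[rows, cols].sum()   # value of the optimal assignment
```

For this matrix the best of the 3! = 6 bijections pairs worker 0 with job 0, worker 1 with job 2, and worker 2 with job 1, for a total of 11.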
3.1 Graphs
Definition
A graph is a pair G = (V,E), where
• V is the set of vertices.
• E is the set of edges, which has elements of the form (u, v), with u, v ∈ V. The pair is not ordered, and u ≠ v.

A graph following the above rules is a simple graph.
Definition
A matching in G is M ⊆ E such that every vertex in V is incident to at most one edge of M. A perfect matching in G is a matching M with |M| = |V|/2.
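The matching condition translates directly into code. A small Python sketch (function name my own) checking the at-most-one-edge-per-vertex requirement:

```python
def is_matching(M):
    """True iff the edge list M touches every vertex at most once,
    i.e. M is a matching in the sense of the definition above."""
    seen = set()
    for u, v in M:
        if u in seen or v in seen:
            return False     # some vertex is incident to two edges of M
        seen.update((u, v))
    return True
```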
The graph in the Assignment Problem is bipartite and at least one perfect matching exists. The problem is to find the best perfect matching w.r.t. the weights of the edges.
Descriptio 3.1: A set P and its integer hull
Maximum Weight Matching Problem. Let G be a graph and w : E → R. Find a matching M in G maximising ∑_{e∈M} w(e).

Maximum Weight Perfect Matching Problem. Let G be a graph and w : E → R. Find a perfect matching M in G maximising ∑_{e∈M} w(e).
Note. (Remark 60) Matching Problems can be solved efficiently.
3.2 Integer Programming
Pure Integer Programming Problems can be formulated as
(IP) : max cᵀx, { Ax ≤ b, x ∈ Zn }

The LP relaxation of (IP) is

(LP) : max cᵀx, { Ax ≤ b }

The only difference between (IP) and (LP) is the integrality constraint. If an optimal solution x of (LP) happens to be integral, then x is an optimal solution of (IP). Regardless, cᵀx is always an upper bound on the optimal value of (IP).
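The bounding relationship can be seen on a toy instance (data made up): solving the relaxation gives a value that no integer-feasible point can exceed.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance: max 5x1 + 4x2  s.t.  2x1 + 3x2 <= 5, x >= 0,
# with x additionally integer in (IP).  linprog minimises, so we negate c.
c = np.array([5.0, 4.0])
res = linprog(-c, A_ub=np.array([[2.0, 3.0]]), b_ub=np.array([5.0]),
              bounds=[(0, None), (0, None)])
lp_bound = -res.fun   # relaxation optimum, attained at the vertex (2.5, 0)
# Any integer-feasible point, e.g. (2, 0) with value 10, scores at most lp_bound.
```

Here the relaxation value is 12.5 while the best integer point achieves 10, illustrating that the bound need not be tight.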
Let P := {x ∈ Rn : Ax ≤ b}. The integer hull of P is
Z(P ) := conv(P ∩ Zn)
3.2.1 (Theorem 61). Let A ∈ Rm×n, b ∈ Rm. Define
S := {x ∈ Zn : Ax ≤ b}
If S is bounded, then convS is a polytope.
3.2.2 (Theorem 62) Fundamental Theorem of Integer Programming. Let A ∈ Qm×n, b ∈ Qm, and
S := {x ∈ Zn : Ax ≤ b}
Then convS is a polyhedron.
Descriptio 3.2: The equalities A=x = b= are denoted in red.
The curious part about Theorem 62 is that if you replace Q by R, it is no longer true.

When is P = Z(P)?
3.2.3 Proposition. If P is bounded, then P = Z(P) iff ext P ⊆ Zn.
3.2.4 (Theorem 64). Let A ∈ Qm×n, b ∈ Qm, and
P := {x ∈ Rn : Ax ≤ b}
such that P is non-empty and bounded. Then P = Z(P ) iff for every c ∈ Zn, the LP
max{cᵀx : x ∈ P}
has an integer optimal value.
Proof. Suppose P = Z(P). Then ext P ⊆ Zn. Since P is bounded, it is pointed, so by Theorem 40, for every c ∈ Zn the LP has an optimal solution at an extreme point x̄ of P. Since x̄, c ∈ Zn,

cᵀx̄ ∈ Z
Conversely, suppose for every c ∈ Zn the optimal value of the LP is an integer. Let x̄ ∈ P be an arbitrary extreme point of P. Let A=x ≤ b= be the set of constraints from Ax ≤ b satisfied with equality by x̄. By Theorem 36, rank A= = n. Choose ξ ∈ N such that ξA ∈ Zm×n. Define

y_i := ξ if (Ax̄)_i = b_i;  0 otherwise

This is a very easy way to satisfy complementary slackness. Define

c := Aᵀy

If an entry y_i = 0, it removes the corresponding row of A from the sum; otherwise the factor ξ makes the contribution integral, so c ∈ Zn. By the Complementary Slackness Theorem, x̄ is an optimal solution of

max{cᵀx : x ∈ P}

In fact, x̄ is the unique optimal solution of this LP, because for this y, the corresponding optimal x is unique by virtue of A= having full rank.
The next idea is to perturb c. There exists r > 0 such that

cᵀx̄ − cᵀx > r

for every other extreme point x of P (the set of extreme points is finite, so it cannot accumulate). There also exists R > 0 such that |x̄_j − x_j| < R for all j and all other extreme points x.
Let M ∈ N such that M > R/r. Fix k ∈ {1, . . . , n}. Define

c̃ := Mc + e_k

Then

c̃ᵀx̄ − c̃ᵀx = Mcᵀ(x̄ − x) + (x̄_k − x_k) > Mr − R > R − R = 0

for every other extreme point x of P. Therefore, x̄ is the unique optimal solution of max{c̃ᵀx : x ∈ P}. The crux of the proof is that we now notice

c̃ᵀx̄ = M cᵀx̄ + x̄_k

where c̃ᵀx̄ ∈ Z and cᵀx̄ ∈ Z, proving that x̄_k ∈ Z. Hence x̄ ∈ Zn.
3.2.5 (Theorem 65). Let A ∈ Qm×n, b ∈ Qm. Suppose

P := {x ∈ Rn : Ax ≤ b}

is non-empty and bounded. The following are equivalent:
1. P = Z(P )
2. extP ⊆ Zn
3. ∀c ∈ Rn (not Zn), the LP max{cᵀx : Ax ≤ b} has an optimal solution x ∈ Zn.
4. ∀c ∈ Zn, max{cᵀx : Ax ≤ b} ∈ Z
5. ∀c ∈ Zn,min{bᵀy : Aᵀy = c,y ≥ 0} ∈ Z
3.3 Totally Unimodular Matrices
Definition
(Definition 66) A matrix A ∈ Zm×n is totally unimodular (TU) if for every k ∈ {1, . . . , n}, thedeterminant of every k-by-k submatrix of A is in {−1, 0,+1}.
A submatrix is indexed by a subset of row/column indices. In particular, the submatrix does not haveto be contiguous.
Note that from the definition, the entries of A can only be {−1, 0,+1}.
Example: (Example 67)
This is totally unimodular:

A1 := [1 0 1; 0 1 1; 1 1 1]

This is not totally unimodular, because det A2 = −2:

A2 := [1 0 1; 0 1 1; 1 1 0]
3.3.1 (Theorem 68). Let A ∈ Zm×n. Suppose rank A = m. The following are equivalent.

1. det A_B = ±1 for all bases B of A.

2. For every b ∈ Zm, every extreme point of {x : Ax = b, x ≥ 0} is in Zn.

3. A_B⁻¹ ∈ Zm×m for all bases B of A.
Note that we do not require A to be totally unimodular, but every totally unimodular matrix will satisfy (1).
Proof.
• 1 → 2: Suppose (1) holds, so for every basis B of A, det A_B = ±1.

Let b ∈ Zm be arbitrary, and let x be an arbitrary extreme point of {x ∈ Rn : Ax = b, x ≥ 0}. Then there exists a basis B such that x is the BFS of B. Thus

x_N = 0, x_B = A_B⁻¹b

By Cramer's rule (where det_b A_B is the vector whose jth entry is the determinant of A_B with its jth column replaced by b),

x_B = det_b A_B / det A_B

However, det_b A_B ∈ Zm, since A_B and b have integer entries, and det A_B = ±1. Therefore x_B ∈ Zm.
• 2 → 3: Let B be a basis of A. Fix an index i; we shall show that A_B⁻¹e_i ∈ Zm. Define

b := e_i + α A_B 1

where

α := ⌈ max_{i,j} |(A_B⁻¹)_{i,j}| ⌉ ∈ N

The idea is to define b so that the system in (2) has a feasible basic solution. Consider the basic solution of Ax = b determined by the basis B. We have

x_B = A_B⁻¹b = A_B⁻¹(e_i + α A_B 1) = A_B⁻¹e_i + α1

Since α is at least as large as every entry of A_B⁻¹, x_B ≥ 0, so the basic solution is feasible and is an extreme point of {x : Ax = b, x ≥ 0}. By (2), x ∈ Zn, so

A_B⁻¹e_i = x_B − α1 ∈ Zm

as required.
• 3→ 1: Let B be a basis of A. The multiplicativity of the determinant implies
det A_B⁻¹ · det A_B = det I = 1
Both det A_B⁻¹ and det A_B are integers, so
det A_B = ±1
3.3.2 (Proposition 69). Let A ∈ {−1, 0, 1}m×n. Then the following are equivalent:
1. A is totally unimodular.
2. Aᵀ is totally unimodular.
3. [A|I] is totally unimodular.
4. The stacked matrix
[A]
[I]
is totally unimodular.
5. [A|A] is totally unimodular.
6. Let D := diag(±1, . . . ,±1) (diagonal matrix with ±1 on the diagonal). Then DA is totally unimodular.
Proof.
• 1↔ 2: Determinants are invariant under transpose, so (1) and (2) are equivalent.
• 3→ 1: Every submatrix of A is a submatrix of [A|I].
• 1→ 3: Let B be a square submatrix of [A|I]. If B is a submatrix of A, we are done. Otherwise B contains part of a column of the identity matrix, which has at most one non-zero entry. Laplace expansion along that column shows det B is 0 or ± the determinant of a smaller submatrix; iterating, |det B| = |det B′| for some submatrix B′ of A.
• 3↔ 4: Via 1↔ 2.
• 1↔ 5: Let B be a square submatrix of [A|A]. If B contains duplicate columns, det B = 0. Otherwise B is a submatrix of A up to a permutation of columns, which changes the determinant by at most a sign.
• 1↔ 6: Multiplying a row by −1 only flips the sign of every determinant involving that row.
3.3.3 (Remark 70). A matrix A is totally unimodular iff every matrix obtained from A by pivoting is totally unimodular.
3.3.4 (Theorem 71). Let A ∈ {−1, 0, +1}m×n be totally unimodular and b ∈ Zm. Then
ext{x : Ax ≤ b} ⊆ Zn
3.3.5 (Theorem 72). Let A ∈ {−1, 0, +1}m×n be totally unimodular and b ∈ Zm. Then
ext{x : Ax ≤ b, x ≥ 0} ⊆ Zn
3.3.1 Application of Integer Programming to Graphs
Definition
A graph G = (V,E) is directed if its edges are ordered pairs of vertices. A closure is a set C ⊆ V such that no edges leave C.
Note that cycles of length 2 are allowed.
3.3.6 (Theorem 73). The node-arc incidence matrix of every directed graph is totally unimodular.
Proof. We use induction on the size of square submatrices. When k = 1, every entry of A is in {−1, 0, +1} by definition, so every 1-by-1 submatrix has determinant in {−1, 0, +1}.
Suppose every k-by-k submatrix for k ≤ l has determinant in {−1, 0, 1}. Let C be a submatrix of size (l + 1)-by-(l + 1). There are three cases:
• If there is a zero column in C, then det C = 0.
• If some column of C has exactly one non-zero entry, then by Laplace expansion |det C| = |det C′|, where C′ is an l-by-l submatrix of C. By the inductive hypothesis, det C ∈ {−1, 0, 1}.
Descriptio 3.3: Node-arc incidence matrix of the directed graph G with vertices {1, . . . , 5} and arcs (1, 2), (2, 3), (1, 4), (3, 4), (1, 5), (4, 5), with −1 at the tail and +1 at the head of each arc:

     (1,2) (2,3) (1,4) (3,4) (1,5) (4,5)
  1 [  −1    0    −1    0    −1    0  ]
  2 [  +1   −1     0    0     0    0  ]
  3 [   0   +1     0   −1     0    0  ]
  4 [   0    0    +1   +1     0   −1  ]
  5 [   0    0     0    0    +1   +1  ]
• Every column of C has exactly two non-zero entries. In the incidence matrix, one of them is −1 and the other is +1, so
1ᵀC = 0ᵀ
Hence the rows of C are linearly dependent and det C = 0.
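The structure used in the last case is easy to generate programmatically; the sketch below builds the matrix of Descriptio 3.3, assuming the convention of −1 at the tail and +1 at the head of each arc:

```python
def node_arc_incidence(n_nodes, arcs):
    """Node-arc incidence matrix: one column per arc (i, j), with -1 at tail i
    and +1 at head j. (This sign convention is an assumption; the notes fix it
    only up to the diagonal sign flip of Proposition 69.)"""
    A = [[0] * len(arcs) for _ in range(n_nodes)]
    for col, (i, j) in enumerate(arcs):
        A[i][col] = -1
        A[j][col] = +1
    return A

# The graph from Descriptio 3.3, with nodes relabelled 0..4.
arcs = [(0, 1), (1, 2), (0, 3), (2, 3), (0, 4), (3, 4)]
A = node_arc_incidence(5, arcs)
# Every column has exactly one -1 and one +1, so 1ᵀA = 0ᵀ -- the key fact
# used in the last case of the proof of Theorem 73.
print([sum(A[i][c] for i in range(5)) for c in range(len(arcs))])  # [0, 0, 0, 0, 0, 0]
```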
3.3.7 (Corollary 74). The node-edge incidence matrix of every undirected bipartite graph is totally unimodular.
Proof. Let G = (V,E) be a bipartite graph with bipartition (A,B) of V, so that every edge has one end in A and one end in B. Orient every edge of G to point from A to B; call the resulting directed graph G′.
Let N be the node-arc incidence matrix of G′. By Theorem 73, N is totally unimodular. Order the rows of N by placing the rows of A above those of B. This gives
N =
[N−]
[N+]
where N− (the rows of A) contains the −1 entries and N+ (the rows of B) the +1 entries. The node-edge incidence matrix of G is DN, where D is a diagonal ±1 matrix which flips the signs of the rows of N−. By Proposition 69, DN is totally unimodular.
If G is not bipartite, its node-edge incidence matrix is not necessarily totally unimodular. For example, the node-edge incidence matrix of the triangle has determinant
|1 0 1|
|1 1 0| = 2
|0 1 1|
Finding a minimum cardinality vertex cover in a general graph is NP-hard. However, finding a maximum matching is much easier and can be done in polynomial time.
3.3.8 (Theorem 75) Konig's Theorem. If G is a bipartite graph, a maximum matching of G and a minimum vertex cover of G have the same cardinality.
Proof. Let G = (V,E) be a bipartite graph and let A ∈ {0, 1}V×E be the node-edge incidence matrix of G. Interpret
x_e := 1 if edge e is in the matching, 0 otherwise
Consider the optimisation problem
z := max 1ᵀx,
Ax ≤ 1
x ≥ 0
x ∈ ZE
The condition Ax ≤ 1 encodes the idea that each node may be incident to at most one matching edge. This integer programming problem is an equivalent form of the matching problem.
Consider the linear programming relaxation
z̄ := max 1ᵀx,
Ax ≤ 1
x ≥ 0
This is feasible, since 0 is a solution, and it bounds the IP (z ≤ z̄). Its dual is
w̄ := min 1ᵀy,
Aᵀy ≥ 1
y ≥ 0
The vector y := ½1 is dual feasible, since every column of A contains exactly two 1's, so the dual objective (and hence also z̄) is bounded by
1ᵀ(½1) = |V|/2
This shows the LP for z̄ is neither unbounded nor infeasible, so the optimal value z̄ exists. The dual problem's objective value is bounded above by its integer version:
w := min 1ᵀy,
Aᵀy ≥ 1
y ≥ 0
y ∈ ZV
We shall show that w corresponds to the minimum cardinality vertex cover problem. Hence we have the chain
z ≤ z̄ = w̄ ≤ w
If y is an optimal solution of w, then y ≤ 1. This is because replacing any y_i > 1 by y_i := 1 preserves feasibility without increasing the objective. Thus
w = min 1ᵀy,
Aᵀy ≥ 1
y ∈ {0, 1}V
This shows w encodes the minimum vertex cover problem in G.
Since G is bipartite, A is totally unimodular by Corollary 74, and so is Aᵀ by Proposition 69. Thus by Theorem 72, the extreme points of the LPs for z̄ and w̄ are integral. Hence z = z̄ and w = w̄, and
z = z̄ = w̄ = w
Hence a maximum matching of G has the same size as a minimum vertex cover.
The general idea applied in the proof extends to other integer programming problems:
max{cᵀx : Ax ≤ b, x ≥ 0, x ∈ Zn} ≤ max{cᵀx : Ax ≤ b, x ≥ 0}
= min{bᵀy : Aᵀy ≥ c, y ≥ 0}
≤ min{bᵀy : Aᵀy ≥ c, y ≥ 0, y ∈ Zm}
If the integer problems at the two ends of this chain encode combinatorial problems, this can be used to prove that those problems have the same optimal value.
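Konig's min-max identity can be checked by brute force on a small instance, comparing a maximum matching against a minimum vertex cover. A sketch for illustration only (the graph and the exhaustive search are not from the notes):

```python
from itertools import combinations

def max_matching_size(V, E):
    # Largest set of pairwise node-disjoint edges (brute force; fine for tiny graphs).
    for k in range(len(E), -1, -1):
        for M in combinations(E, k):
            nodes = [v for e in M for v in e]
            if len(nodes) == len(set(nodes)):
                return k
    return 0

def min_cover_size(V, E):
    # Smallest vertex set touching every edge (brute force).
    for k in range(len(V) + 1):
        for C in combinations(V, k):
            if all(u in C or v in C for (u, v) in E):
                return k
    return len(V)

# A small bipartite graph: parts {0, 1, 2} and {3, 4, 5}.
V = [0, 1, 2, 3, 4, 5]
E = [(0, 3), (0, 4), (1, 4), (2, 4), (2, 5)]
print(max_matching_size(V, E), min_cover_size(V, E))  # 3 3, as Konig's Theorem predicts
```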
Maximum Flow Problem. Given a directed graph G = (V,E), two distinguished nodes s, t ∈ V and capacities u_{i,j} ∈ Z+, push as much flow from s to t as possible while respecting the capacity of each edge. s is the source and t is the sink.
Descriptio 3.4: Maximum Flow Problem and an s-t cut S(U) (the highlighted edges leave U)
Definition
Let U ⊆ V. The directed cut induced by U is
S(U) := {(i, j) ∈ E : i ∈ U, j ∉ U}
If U ⊆ V satisfies s ∈ U, t ∉ U, then S(U) is an s-t cut.
(Definition 76) The capacity of S(U) is
capacity U := Σ_{(i,j)∈S(U)} u_{i,j}
A minimum cut is an s-t cut which minimises capacity U.
Let x_{i,j} be the amount of flow from node i to node j. We wish to calculate
max Σ_{(s,j)∈E} x_{s,j},
(Conservation) Σ_{(i,j)∈E} x_{i,j} − Σ_{(j,k)∈E} x_{j,k} = 0, ∀j ≠ s, t
(Capacity) 0 ≤ x_{i,j} ≤ u_{i,j}, ∀(i, j) ∈ E
Define c ∈ {0, 1}E by
c_{u,v} := 1 if u = s and (u, v) ∈ E, 0 otherwise
The maximum flow can be encoded as the problem
max cᵀx :
Mx = 0
x ≤ u
x ≥ 0
where M ∈ {−1, 0, 1}(V\{s,t})×E is the node-arc incidence matrix of G with the rows corresponding to the source (s) and sink (t) removed.
Equivalently, we have the problem
max cᵀx :
[ M ]     [0]
[−M ] x ≤ [0]
[ I  ]     [u]
x ≥ 0
The constraint matrix of this LP is totally unimodular. Because of this, the technique we applied to prove Konig's Theorem applies here as well.
3.3.9 Min-cut Max-flow Theorem (Theorem 77). Let G = (V,E) be a digraph with two distinct nodes s, t, and let u ∈ RE≥0 be given. Then the value of the maximum flow is the capacity of the minimum cut.
If u ∈ ZE, then there exists a maximum flow x in G that is integral.
If we have a candidate flow x, we can subtract x from u. If the flow is optimal, it should be impossible to push any additional flow through the residual capacities u − x. If it is possible to reach t from s along a path of positive residual capacity, this certifies non-optimality.
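The residual-capacity certificate just described is exactly what augmenting-path algorithms search for. Below is a minimal Edmonds-Karp sketch (shortest augmenting paths via BFS); the dict-based graph representation is an assumption, not from the notes:

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Repeatedly push flow along a shortest augmenting path in the residual
    capacities u - x (plus reverse arcs). cap is a dict {(i, j): capacity}."""
    residual = {(i, j): c for (i, j), c in cap.items()}
    for (i, j) in cap:
        residual.setdefault((j, i), 0)
    flow = 0
    while True:
        # BFS for a path of positive residual capacity from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for v in range(n):
                if v not in parent and residual.get((a, v), 0) > 0:
                    parent[v] = a
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path: the current flow is maximum
        # Find the bottleneck along the path and update residual capacities.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[e] for e in path)
        for (a, v) in path:
            residual[(a, v)] -= delta
            residual[(v, a)] += delta
        flow += delta

cap = {(0, 1): 3, (0, 2): 2, (1, 2): 1, (1, 3): 2, (2, 3): 3}
print(max_flow(4, cap, 0, 3))  # 5, matching the capacity of the cut {0}
```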
3.4 Faces
If A is totally unimodular and b, c are integral, then under some mild assumptions we can establish
(Primal IP) ≤ (Primal LP) = (Dual LP) ≤ (Dual IP)
with equality throughout. From this we obtain combinatorial min-max theorems and efficient, robust primal-dual algorithms. However, it is not always possible to find an IP formulation with a totally unimodular coefficient matrix, or an LP relaxation
max cᵀx : x ∈ P
where P = Z(P); in general we only have P ⊇ Z(P).
How can we judge IP formulations based on how closely P approximates Z(P)?
Definition
Let P ⊆ Rn be a polyhedron. The dimension of P is
dim P := (maximum number of affinely independent points in P) − 1
which is also the dimension of the smallest affine subspace containing P. The dimension of ∅ is −1.
Definition
Let P ⊆ Rn be a polyhedron. Given a ∈ Rn, α ∈ R,
aᵀx ≤ α
is a valid inequality for P if
P ⊆ {x : aᵀx ≤ α}
A set F ⊆ P is a face of P if
F = P ∩ {x : aᵀx = α}
for some valid inequality aᵀx ≤ α.
Note that the empty set is a face and P itself is a face (take a := 0 with α := 1 and α := 0 respectively). Every face of P is itself a polyhedron, since a face can be expressed as a finite intersection of half-spaces.
Definition
(Definition 78) A face F of P is a facet if dimF = dimP − 1.
A polyhedron which has no facets is an affine subspace.
F          dim F
∅          −1
(Point)    0
(Edge)     1
⋮          ⋮
(Facet)    dim P − 1
P          dim P
Descriptio 3.5: Projection of a distorted 5-faceted prism is a hexagon
3.4.1 (Theorem 79). Let P ⊆ Rn be a polyhedron with dim P = n. Then every description of P in terms of linear inequalities contains at least one inequality for each facet of P.
Moreover, all such minimal descriptions are unique up to scaling by positive constants. In every representation of the polyhedron by linear inequalities, for every facet of P there is at least one valid inequality aᵀx ≤ β representing that facet. If dim P = n − k, then the representation
P = {x ∈ Rn : A=x = b=, Ax ≤ b}, A= ∈ Rk×n, rank A= = k
is unique up to multiplication by positive scalars.
Note. Not all polyhedra are full dimensional. An (n − k)-dimensional polyhedron can be specified with k equalities. Theorem 79 continues to hold within the affine subspace in which the polyhedron resides, and the uniqueness continues to hold modulo translations orthogonal to that affine subspace.
Note. (Remark 80) The notion of facets of Z(P) helps us describe the strongest valid inequalities for Z(P), and hence gives us a tool for judging IP formulations
max cᵀx : x ∈ P, max cᵀx : x ∈ Z(P)
While the facet description of a full-dimensional polyhedron P ⊆ Rn is minimal, if we are allowed to express P as a projection of another polyhedron P2 ⊆ Rn+m, then P2 may have far fewer facets than P.
3.5 Maximum Weight Perfect Matching
We shall specialise to a class of combinatorial optimisation problems.
Definition
Let G = (V,E) be a bipartite graph with bipartition (A,B). Given S ⊆ V, the neighbour set of S is
N(S) := ⋃_{x∈S} N(x)
3.5.1 Hall’s Theorem (Hall 1939, 81).
Let G = (V,E) be bipartite with bipartition A,B, such that |A| = |B| = |V |2 . G has a perfect matching if and
only if∀S ⊆ A. |N(S)| ≥ |S|
Proof. Suppose there exists S ⊆ A such that |S| > |N(S)|. Then it is impossible to match all vertices in S, so G has no perfect matching.
Conversely, suppose G has no perfect matching. Let M be a maximum matching and C a minimum cardinality vertex cover. By Konig's Theorem, |C| = |M|. Since M is not perfect, |C| = |M| < |A|.
Descriptio 3.6: Left: complete marriage of A possible. Right: complete marriage impossible due to a bottleneck
Below is a disjoint decomposition of C:
C = (C ∩ A) ∪ (C ∩ B)
Define S := A \ C. Since C is a vertex cover, it is incident to every edge. In particular N(S) ⊆ C ∩ B, because every edge incident to a vertex in S must be covered by its endpoint in B. Since there is no perfect matching, |C| < |A|, so
|N(S)| ≤ |C ∩ B| = |C| − |C ∩ A| < |A| − |A ∩ C| = |S|
as required.
Note. Although Hall’s Theorem does not give a efficient algorithm, it gives a succint certificate for the infeasi-bility of perfect matchings.
Definition
(Definition 82) A set S ⊆ A with |N(S)| < |S| is a deficient set.
The IP formulation of the maximum weight perfect matching problem is
(IP): max Σ_{e∈E} w_e x_e,
Ax = 1
x ≥ 0
x ∈ ZE
where w_e ∈ R are weights and A is the node-edge incidence matrix of G, which is totally unimodular. To solve this we consider the problem with integrality relaxed:
(P): max Σ_{e∈E} w_e x_e,
Ax = 1
x ≥ 0
This has the dual problem
(P*): min Σ_{v∈V} y_v,
Aᵀy ≥ w
Expanding,
Aᵀy ≥ w ⇐⇒ y_u + y_v ≥ w_{u,v}, ∀(u, v) ∈ E
This problem has some obvious feasible solutions, such as y := (max_e w_e) · 1.
We start with a dual feasible solution and maintain dual feasibility throughout the algorithm. We will also maintain complementary slackness. Define the equality subgraph G(y) := (V, E(y)), where
E(y) := {(u, v) ∈ E : y_u + y_v = w_{u,v}}
Does G(y) have a perfect matching?
• If yes, its indicator vector x is optimal for (P) and (IP), and y is an optimal solution for (P*).
• If not, by Hall's Theorem there exists a deficient S ⊆ A such that
|N_{G(y)}(S)| < |S|
Define the next y by
y′_v := y_v − ε if v ∈ S; y_v + ε if v ∈ N_{G(y)}(S); y_v otherwise
where
ε := min{y_u + y_v − w_{u,v} : (u, v) ∈ E, u ∈ S, v ∉ N_{G(y)}(S)}
This eliminates the slack on at least one of the edges.
Note that if ε does not exist, then N_G(S) = N_{G(y)}(S), so the set S also certifies the non-existence of a perfect matching in G, showing that the problem has no solution.
The algorithm terminates: with all-integer weights, the minimum slack ε is at least 1, and the dual objective improves each iteration by
1ᵀy′ − 1ᵀy = −ε(|S| − |N_{G(y)}(S)|) ≤ −ε

procedure HungarianAlgorithm(G = (V,E), y, w)
    loop
        E(y) ← {(u, v) ∈ E : y_u + y_v = w_{u,v}}
        G(y) ← (V, E(y))
        if ∃M ⊆ E(y) such that M is a perfect matching
            x ← 1_M
            return (x, y) are optimal
        end if
        S ← deficient-set(G(y))
        ε ← min{y_u + y_v − w_{u,v} : (u, v) ∈ E, u ∈ S, v ∉ N_{G(y)}(S)}
        if ε undefined
            return G has no perfect matching; (P) and (IP) are infeasible, (P*) is unbounded
        end if
        y_v ← y_v − ε for v ∈ S; y_v ← y_v + ε for v ∈ N_{G(y)}(S)
    end loop
end procedure
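The pseudocode can be turned into a runnable sketch. The version below assumes a complete bipartite graph K_{n,n} with weight matrix w (so ε always exists) and realises the deficient-set step with an alternating tree; the function name and data layout are illustrative, not from the notes:

```python
def hungarian(nA, nB, w):
    """Maximum-weight perfect matching on K_{nA,nB} (nA == nB) by the
    primal-dual scheme: keep y dual feasible, grow a matching in the equality
    subgraph E(y), and shift duals when an alternating tree gets stuck."""
    yA = [max(row) for row in w]      # dual feasible: yA[a] + yB[b] >= w[a][b]
    yB = [0] * nB
    match_of_b = [None] * nB          # match_of_b[b] = a matched to b, if any
    for a0 in range(nA):              # match one A-vertex per outer phase
        while True:
            # Alternating search from a0 using only equality edges.
            S, T, parent = {a0}, set(), {}
            augmenting, frontier = None, [a0]
            while frontier and augmenting is None:
                a = frontier.pop()
                for b in range(nB):
                    if b not in T and abs(yA[a] + yB[b] - w[a][b]) < 1e-9:
                        T.add(b); parent[b] = a
                        if match_of_b[b] is None:
                            augmenting = b
                            break
                        S.add(match_of_b[b]); frontier.append(match_of_b[b])
            if augmenting is not None:     # flip the alternating path
                b = augmenting
                while b is not None:
                    a = parent[b]
                    b_next = None          # a's previous match, to continue the path
                    for bb in range(nB):
                        if match_of_b[bb] == a:
                            b_next = bb
                    match_of_b[b] = a
                    b = b_next
                break
            # No augmenting path: S is deficient in E(y); shift the duals.
            eps = min(yA[a] + yB[b] - w[a][b]
                      for a in S for b in range(nB) if b not in T)
            for a in S: yA[a] -= eps
            for b in T: yB[b] += eps
    return match_of_b, yA, yB

w = [[3, 2, 1], [1, 3, 2], [2, 1, 3]]
match_of_b, yA, yB = hungarian(3, 3, w)
print(sum(w[match_of_b[b]][b] for b in range(3)))  # 9
```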
3.6 Alternatives in Integer Systems
Note that a deficient set is the alternative to a perfect matching. Previously we proved this by appealing to the LP relaxation.
3.6.1 Fundamental Theorem of Linear Algebra (Gauss, Theorem 83). Given A ∈ Qm×n and b ∈ Qm, exactly one of the following holds:
1. ∃x ∈ Qn. Ax = b
2. ∃y ∈ Qm. Aᵀy = 0, bᵀy ≠ 0
Unfortunately, this does not generalise directly to integer programming. The following two systems are not alternatives:
1. ∃x ∈ Zn. Ax = b, x ≥ 0
2. ∃y ∈ Qm. Aᵀy ∈ Zn≥0, bᵀy ∉ Z≥0
Example: Let A := [2, 3] and b := 1. The two systems are
1. ∃x1, x2 ∈ Z. 2x1 + 3x2 = 1, x1 ≥ 0, x2 ≥ 0
This system does not have a solution.
2. ∃y ∈ Q. 2y ∈ Z≥0, 3y ∈ Z≥0, y ∉ Z≥0
This system also does not have a solution, since 2 and 3 are coprime: if 2y, 3y ∈ Z then y = 3y − 2y ∈ Z.
To obtain a theorem analogous to Farkas’ Lemma, we need to make some compromises.
3.6.2 (Kronecker, Theorem 81). Let a ∈ Qn, b ∈ Q. Then exactly one of the following has a solution:
1. ∃x ∈ Zn. aᵀx = b
2. ∃y ∈ Q. ya ∈ Zn, yb ∉ Z
Proof. Suppose (1) and (2) both have solutions x and y. Then
Z ∋ (ya)ᵀx = y(aᵀx) = yb ∉ Z
which is a contradiction.
Stub
3.6.3 Kronecker’s Approximation Theorem (Theorem 82).Let A ∈ Rm×n, b ∈ Rm. Then exactly one of the following holds
1. ∀ε > 0.∃x ∈ Zn. ‖Ax − b‖ < ε
2. ∃y ∈ Rm.Aᵀy ∈ Zn, bᵀy 6∈ Z
3.6.4 (Corollary 83). Let A ∈ Qm×n, b ∈ Qm. Then exactly one of the following holds:
1. ∃x ∈ Zn. Ax = b
2. ∃y ∈ Qm. Aᵀy ∈ Zn, bᵀy ∉ Z
Definition
(Definition 84) Let U ∈ Zn×n. U is unimodular if det U = ±1.
Note. Unimodularity has very little to do with total unimodularity.
Any product of the following type of row/column operations is unimodular:
• Ri ← Ri + λRj (i 6= j)
• Ri ← ±Ri
• Ri ↔ Rj
Definition
A matrix H ∈ Zm×n is in Hermite normal form if
1. H_{i,i} > 0 for all i.
2. 0 ≤ −H_{i,j} < H_{i,i} for j < i (entries left of the diagonal are non-positive and smaller in magnitude than the diagonal entry of their row).
3. H is lower-triangular (H_{i,j} = 0 when j > i).
Given A ∈ Zm×n with rank A = m, we can write A = HU, where
• U ∈ Zn×n is unimodular. Note that every such U is an automorphism of Zn, since for any x ∈ Zn, U⁻¹x ∈ Zn can be obtained by Cramer's rule.
• H ∈ Zm×n is in Hermite normal form.
We can algorithmically produce U and H from A using elementary column operations. Hence, with z := Ux,
Ax = b ⇐⇒ HUx = b ⇐⇒ Hz = b
Since U is an automorphism of Zn, we obtain:
3.6.5 Proposition. Ax = b has an integral solution iff Hz = b has an integral solution.
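A sketch of the column-operation computation of H, under the conventions above (lower-triangular, positive diagonal, non-positive entries left of the diagonal). It assumes integer input with full row rank and does not track the unimodular factor U:

```python
def hermite_normal_form(A):
    """Hermite normal form of an integer matrix with full row rank,
    by elementary column operations (Euclid's algorithm on each row)."""
    H = [row[:] for row in A]
    m, n = len(H), len(H[0])
    for i in range(m):
        # Clear row i to the right of the pivot via the Euclidean algorithm.
        while any(H[i][j] != 0 for j in range(i + 1, n)):
            # Swap the column with smallest non-zero |entry| into the pivot slot.
            j0 = min((j for j in range(i, n) if H[i][j] != 0),
                     key=lambda j: abs(H[i][j]))
            if j0 != i:
                for r in range(m): H[r][i], H[r][j0] = H[r][j0], H[r][i]
            for j in range(i + 1, n):
                q = H[i][j] // H[i][i]
                for r in range(m): H[r][j] -= q * H[r][i]
        if H[i][i] < 0:  # make the diagonal positive
            for r in range(m): H[r][i] = -H[r][i]
        for j in range(i):  # make entries left of the diagonal non-positive
            q = (H[i][j] + H[i][i] - 1) // H[i][i]
            for r in range(m): H[r][j] -= q * H[r][i]
    return H

print(hermite_normal_form([[6, 4], [13, 5]]))  # [[2, 0], [-3, 11]]
```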
In general, for IPs, how do we construct theorems of the alternative? If S := {x ∈ Zn : Ax = b, x ≥ 0}, we can express alternatives in terms of conv S instead. We will try to represent conv S in terms of constraints obtained from
Ax = b
x ≥ 0
x ∈ Zn
Caput 4
Continuous Optimisation
Nonlinear optimisation problems are at least as hard as integer programming problems. For example, the constraint xi ∈ {0, 1} can be written as xi(1 − xi) = 0, and xi ∈ Z can be written as sin(πxi) = 0.
4.1 Topology on Rn
Definition
(Definition 88) A ball in Rn with centre x̄ and radius r is
B(x̄; r) := {x ∈ Rn : ‖x − x̄‖2 < r}, (x̄ ∈ Rn, r > 0)
Definition
Let S ⊆ Rn.
(89) The interior of S is
int S := {x ∈ S : ∃r > 0. B(x; r) ⊆ S}
(90, 91) The closure of S is
cl S := {x ∈ Rn : ∃⟨x_i⟩ ⊆ S. x_i → x}
The boundary of S is
∂S := cl S \ int S
Definition
(Definition 92) Let S ⊆ Rn. S is compact if S is closed and bounded.
Note. In more general metric spaces, S is compact iff S is complete and totally bounded.
4.1.1 Bolzano-Weierstrass Theorem (Theorem 93). Let S ⊆ Rn be compact. Then every sequence ⟨x(k)⟩ ⊆ S has a subsequence which converges to some x ∈ S.
Definition
(Definition 94) Let S ⊆ Rn. f : S → Rm is continuous at x̄ ∈ S iff for every sequence ⟨x(k)⟩ ⊆ S with x(k) → x̄, we have f(x(k)) → f(x̄).
Descriptio 4.1: Contours of f(x1, x2) := x1² + ¼x2²
4.1.2 (Theorem 95). Let S ⊆ Rn and f : S → Rm. The following statements are equivalent:
1. f is continuous at every x̄ ∈ S
2. ∀x̄ ∈ S. ∀ε > 0. ∃δ > 0. ∀x ∈ S. ‖x − x̄‖ < δ → ‖f(x) − f(x̄)‖ < ε
3. ∀U ⊆ Rm open, f⁻¹[U] is open w.r.t. S, i.e. there exists V ⊆ Rn open such that f⁻¹[U] = V ∩ S.
4. ∀G ⊆ Rm closed, f⁻¹[G] is closed w.r.t. S, i.e. there exists H ⊆ Rn closed such that f⁻¹[G] = H ∩ S.
Definition
Let f : S → R. The level sets of f are
level(f; α) := {x ∈ S : f(x) ≤ α}, (α ∈ R)
The contours of f are
{x ∈ S : f(x) = α}, (α ∈ R)
Note. In some of the literature, level sets are called sublevel sets and contours are called level sets.
When f is continuous and S is closed, the level sets level(f; α) are closed.
Definition
Let S ⊆ Rn and f : S → R. The infimum of f on S is the largest α such that f(x) ≥ α for all x ∈ S.
4.1.3 Weierstrass's Theorem (Theorem 93). Let S ⊆ Rn be a non-empty compact set and let f : S → R be continuous on S. Then f attains its infimum and supremum on S.
Definition
(Definition 96) Let S ⊆ Rn and f : S → R. Then f is coercive on S if the level sets level(f; α) of f are bounded for all α ∈ R.
4.1.4 Proposition. f is coercive iff for every sequence ⟨x(k)⟩ ⊆ S with ‖x(k)‖ → ∞, we have f(x(k)) → +∞.
4.1.5 (Theorem 100). Let S ⊆ Rn be a non-empty, closed set and f : S → R continuous and coercive on S. Then f attains its infimum on S.
4.2 Semi-definite Programming
4.2.1 Eigenvalue Decomposition. Let A ∈ Rn×n with A = Aᵀ. Then there exist an orthogonal U ∈ Rn×n and a diagonal D ∈ Rn×n such that
A = UDUᵀ
The columns of U are eigenvectors of A and the diagonal entries of D are the eigenvalues of A.
Example: (Example 101)
Consider a quadratic function
f(x) := γ + cᵀx + xᵀAx, (x ∈ Rn)
Note that we may assume A is symmetric, because
xᵀAx = xᵀ((A + Aᵀ)/2)x
Definition
(Definition 102) A symmetric matrix A ∈ Rn×n is
• Positive definite if all eigenvalues of A are positive.
• Positive semi-definite if all eigenvalues of A are non-negative.
• Negative definite if all eigenvalues of A are negative.
• Negative semi-definite if all eigenvalues of A are non-positive.
Equivalently, for all x ≠ 0,
xᵀAx > 0 if A is positive definite
xᵀAx ≥ 0 if A is positive semi-definite
xᵀAx < 0 if A is negative definite
xᵀAx ≤ 0 if A is negative semi-definite
4.2.2 Rayleigh Quotient. Let A be a symmetric matrix and let λ1 ≥ · · · ≥ λn be the eigenvalues of A in descending order. Then
λ1(A) = max_{h≠0} hᵀAh/hᵀh, λn(A) = min_{h≠0} hᵀAh/hᵀh
Descriptio 4.2: A convex function. Notice that the function lies below the red line joining two points on its image
4.2.3 Proposition. If A is positive definite, then f(x) := γ + cᵀx + xᵀAx is coercive.
Proof. By the Rayleigh Quotient,
xᵀAx ≥ λn(A)xᵀx = λn(A)‖x‖²
By the Cauchy-Schwarz Inequality, cᵀx ≥ −‖c‖‖x‖, so
f(x) = γ + cᵀx + xᵀAx ≥ γ − ‖c‖‖x‖ + λn(A)‖x‖²
Since λn(A) > 0, the quadratic term dominates the linear term as ‖x‖ → +∞, so f(x) → +∞.
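The bound derived in the proof can be sanity-checked numerically. The instance below (A = diag(2, 1), c = (1, −1), γ = 0, so λn(A) = 1) is an illustrative assumption, not from the notes:

```python
import math, random

def f(x):
    # f(x) = cᵀx + xᵀAx with c = (1, -1) and A = diag(2, 1).
    return x[0] - x[1] + 2 * x[0] ** 2 + x[1] ** 2

# Check f(x) >= -||c||·||x|| + lambda_min(A)·||x||^2 at random points.
random.seed(0)
norm_c = math.sqrt(2)
for _ in range(1000):
    x = [random.uniform(-100, 100), random.uniform(-100, 100)]
    nx = math.hypot(x[0], x[1])
    assert f(x) >= -norm_c * nx + nx ** 2 - 1e-6
print("coercivity lower bound holds at all sampled points")
```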
4.2.1 Convex Functions
Optimisation algorithms are typically iterative: we start from one point and move closer and closer to the infimum. The difficulty in nonlinear optimisation is to avoid getting stuck in a local minimum. However, for convex functions, a local minimum must be a global minimum.
Definition
Let S ⊆ Rn be a convex set. Then f : S → R is convex on S if
f((1 − λ)u + λv) ≤ (1 − λ)f(u) + λf(v), ∀u, v ∈ S, ∀λ ∈ [0, 1]
f is strictly convex if the inequality is strict for λ ≠ 0, 1 and u ≠ v.
Definition
Let f : Rn → R. The epigraph of f is
epi f := {(μ, x) ∈ R × Rn : f(x) ≤ μ}

4.2.4 Theorem. f is a convex function iff epi f is a convex set.
4.3 Gradient
Definition
(Definition 103) Let S ⊆ Rn, f : S → R, and x̄ ∈ int S. If there exists ∇f(x̄) ∈ Rn such that
lim_{x→x̄} [f(x) − f(x̄) − ∇f(x̄)ᵀ(x − x̄)] / ‖x − x̄‖ = 0
then ∇f(x̄) is the gradient or derivative of f at x̄, and f is differentiable at x̄.
If the partial derivatives ∂f/∂xi are continuous in a neighbourhood of x, then
∇f(x) = (∂f/∂x1(x), . . . , ∂f/∂xn(x))ᵀ
and f is continuously differentiable.
4.3.1 (Theorem 104). Let S ⊆ Rn be a convex set and f : S → R be continuously differentiable on S. Then f is convex iff
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x), (∀x, y ∈ S)
Proof. Suppose f is convex. Let x, y ∈ S and λ ∈ ]0, 1]. Then
(1 − λ)f(x) + λf(y) ≥ f((1 − λ)x + λy)
Hence
f(x) + λ(f(y) − f(x)) ≥ f(x + λ(y − x))
Rearranging,
[f(x) − f(x + λ(y − x))] / λ ≥ f(x) − f(y)
Taking the limit λ ↘ 0,
−∇f(x)ᵀ(y − x) = lim_{λ↘0} [f(x) − f(x + λ(y − x))] / λ ≥ f(x) − f(y)
Rearranging again,
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)
Conversely, suppose f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ S. Let x, y ∈ S be arbitrary and λ ∈ [0, 1]. Define
z := (1 − λ)x + λy ∈ S
Then
f(x) ≥ f(z) + ∇f(z)ᵀ(x − z)
f(y) ≥ f(z) + ∇f(z)ᵀ(y − z)
Adding the two inequalities with weights (1 − λ) and λ:
(1 − λ)f(x) + λf(y) ≥ f(z) + ∇f(z)ᵀ((1 − λ)x + λy − z) = f(z)
since (1 − λ)x + λy − z = 0.
Definition
(Definition 105) Let S ⊆ Rn, f : S → Rm, and x̄ ∈ int S. If there exists a linear transformation Df(x̄) : Rn → Rm such that
lim_{x→x̄} ‖f(x) − f(x̄) − Df(x̄)(x − x̄)‖ / ‖x − x̄‖ = 0
then f is differentiable at x̄ and Df(x̄) is the derivative of f at x̄.
If Df(x) is represented as a matrix (the Jacobian matrix), its (i, j)th entry is ∂fi/∂xj, i.e. the rows are the transposed gradients of the components of f:
Df(x) = [∇ᵀf1(x); . . . ; ∇ᵀfm(x)]
The determinant det Df is the Jacobian determinant.
Definition
If f : Rn → R is twice continuously differentiable, the Hessian matrix of f is
Hf(x) := [∂²f/∂xi∂xj]_{i,j}
Since f is twice continuously differentiable,
∂²f/∂xi∂xj = ∂²f/∂xj∂xi
so Hf(x) is symmetric.
4.3.2 (Theorem 106). Let S ⊆ Rn be a convex set and f : S → R twice continuously differentiable. Then f is convex on S iff Hf(x) is positive semi-definite for every x ∈ S.

4.3.3 (Theorem 107). Let S ⊆ Rn be a convex set and f : S → R twice continuously differentiable. If Hf(x) is positive definite for every x ∈ S, then f is strictly convex on S.

Note. The converse of Theorem 107 is false. Consider f : R → R with f(x) = x⁴. We have f″(x) = 12x², so f″(0) = 0, but f is strictly convex.
4.3.4 (Theorem 108). Let S ⊆ Rn be a convex set and f : S → R continuously differentiable on S. Then f is convex on S if and only if
(∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0, ∀x, y ∈ S
Intuition: in one dimension this says that if x > y then f′(x) ≥ f′(y), and monotonicity of the derivative is equivalent to the second derivative being non-negative.
4.3.1 Steepest Descent and Newton’s Method
In continuous optimisation, we typically pick a direction d ∈ Rn and move the current solution along d to improve some objective function f. d is the search direction and the new point is
x(i+1) := x(i) + αd
where α is the step size. We want to analyse the behaviour of f along the direction d. Define
g(λ) := f(x(i) + λd)
The derivative of g is
g′(λ) = ∇f(x(i) + λd)ᵀd
Hence
g′(0) = ∇f(x(i))ᵀd
Similarly,
g″(0) = dᵀHf(x(i))d
We present two optimisation algorithms for minimising an objective function f : S → R:
• SteepestDescent: d := −∇f(x)
• NewtonMethod: d := −(Hf(x))⁻¹∇f(x)
NewtonMethod converges much faster, but calculating Hf is very expensive (second derivatives), so SteepestDescent is more commonly used. In general, the choice of d and α is a very difficult problem.
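On a quadratic with constant Hessian the two search directions can be compared directly: one Newton step solves the problem exactly, while steepest descent with a fixed step size only converges geometrically. A small sketch (the function f(x) = 2x1² + x2² and the step size are illustrative assumptions):

```python
def grad(x):  # gradient of f(x) = 2*x1^2 + x2^2 (minimiser at the origin)
    return [4 * x[0], 2 * x[1]]

def steepest_descent_step(x, alpha=0.2):
    g = grad(x)
    return [x[0] - alpha * g[0], x[1] - alpha * g[1]]

def newton_step(x):
    # The Hessian is the constant diag(4, 2), so d = -Hf(x)^{-1} grad f(x)
    # solves the quadratic exactly in one step.
    g = grad(x)
    return [x[0] - g[0] / 4, x[1] - g[1] / 2]

x_sd, x_nw = [1.0, 1.0], [1.0, 1.0]
x_nw = newton_step(x_nw)              # one Newton step lands on the minimiser
for _ in range(20):
    x_sd = steepest_descent_step(x_sd)
print(x_nw)  # [0.0, 0.0]
print(x_sd)  # close to the origin, but only after many steps
```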
4.3.5 (Theorem 109). Let S ⊆ Rn be a non-empty convex set and f : S → R a convex function. Suppose f is continuously differentiable at x̄ ∈ S. Then x̄ is a minimiser of f on S iff
∇f(x̄)ᵀ(x − x̄) ≥ 0, (∀x ∈ S)
In particular, the gradient at any interior minimiser is 0.
4.4 Separating Hyperplane
Example: (Example 110)
Consider a non-empty convex set S ⊆ R2. Let
f(x) := ‖x − u‖², (u ∈ R2)
f is convex because Hf(x) = 2I, and f is continuously differentiable. Consider the problem of minimising f on S.
One way to solve this problem is via the Hilbert Projection Theorem: if S is closed, there is a unique minimiser x̄.
By Theorem 109, x̄ ∈ S is a minimiser iff
∇f(x̄)ᵀ(x − x̄) = 2(x̄ − u)ᵀ(x − x̄) ≥ 0, ∀x ∈ S
4.4.1 (Corollary 111) Kolmogorov's Criterion. Let S ⊆ Rn be a non-empty closed convex set. Then for every u there exists a closest point x̄(u) ∈ S, which is unique and satisfies
(u − x̄(u))ᵀ(x − x̄(u)) ≤ 0, (∀x ∈ S)
4.4.2 (Theorem 112) Separating Hyperplane Theorem. Let S ⊆ Rn be a non-empty closed convex set. Then for every u ∉ S, there exist a non-zero vector a ∈ Rn \ {0} and α ∈ R such that
aᵀx ≤ α ∀x ∈ S, and aᵀu > α
Descriptio 4.3: x̄ minimises the distance from S to u. Note that the dot product of (u − x̄) and (x − x̄) is non-positive.
Proof. Let x̄ be the closest point to u in S, which exists by Kolmogorov's Criterion. Since u ∉ S, ‖x̄ − u‖ > 0. By Kolmogorov's Criterion,
(u − x̄)ᵀ(x − x̄) ≤ 0, (∀x ∈ S)
Expanding, with a := u − x̄ and α := x̄ᵀ(u − x̄),
aᵀx = (u − x̄)ᵀx ≤ x̄ᵀ(u − x̄) = α, (∀x ∈ S)
and
aᵀu − α = (u − x̄)ᵀu − x̄ᵀ(u − x̄) = (u − x̄)ᵀ(u − x̄) = ‖u − x̄‖² > 0
Note. The Separating Hyperplane Theorem yields another proof of Farkas' Lemma and of Strong Duality. The hyperplane found in the proof is a supporting hyperplane for S, since it touches S:
S ∩ {x : aᵀx = α} ≠ ∅
For Farkas' Lemma: let A ∈ Rm×n. Then S := {Ax : x ≥ 0} is a cone, and it is non-empty, closed, and convex, so the alternative to b ∈ S is the existence of a separating hyperplane given by some y.
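Kolmogorov's Criterion is easy to verify for projections onto a box, where the closest point is computed coordinate-wise. A sketch (the box and the point u are illustrative assumptions; checking the criterion on the corners suffices because the criterion is linear in x and the box is the convex hull of its corners):

```python
from itertools import product

def project_box(u, lo, hi):
    """Closest point of the box [lo, hi]^n to u; the box is closed and convex,
    so by Kolmogorov's Criterion this projection exists and is unique."""
    return [min(max(ui, lo), hi) for ui in u]

u = [2.0, -0.5, 0.3]
xbar = project_box(u, 0.0, 1.0)
# Kolmogorov's Criterion: (u - xbar)ᵀ(x - xbar) <= 0 for every x in the box.
for corner in product([0.0, 1.0], repeat=3):
    assert sum((u[i] - xbar[i]) * (corner[i] - xbar[i]) for i in range(3)) <= 1e-12
print(xbar)  # [1.0, 0.0, 0.3]
```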
4.5 Lagrangians and Lagrangian Duality
Let S ⊆ Rn and f, g1, . . . , gm : S → R, where the set S should be "simple". Consider the optimisation problem
(P): inf f(x),
g(x) ≤ 0
x ∈ S
Definition
The Lagrangian of (P) is L : Rn × Rm → R:
L(x, λ) := f(x) + λᵀg(x), (x ∈ Rn, λ ∈ Rm)
The entries of λ are Lagrange multipliers.

The variables λ are the dual variables.
Definition
The Lagrangian dual of (P) is
(P*): sup h(λ), λ ≥ 0
where h(λ) := inf_{x∈S} L(x, λ).
4.5.1 (Theorem 113). Let S ⊆ Rn and x̄ ∈ S. x̄ is an optimal solution to (P) if there exists λ̄ ∈ Rm such that
1. g(x̄) ≤ 0 (primal feasibility)
2. λ̄ ≥ 0 (dual feasibility)
3. L(x̄, λ̄) = inf_{x∈S} L(x, λ̄) (Lagrangian optimality)
4. λ̄_i g_i(x̄) = 0 for all i (complementary slackness)
Proof. Let x̄, λ̄ satisfy the assumptions of the theorem, and let x be an arbitrary feasible solution of (P). Since λ̄ ≥ 0 and g(x) ≤ 0, we have λ̄ᵀg(x) ≤ 0, so
f(x) ≥ f(x) + λ̄ᵀg(x) = L(x, λ̄) ≥ L(x̄, λ̄) = f(x̄) + λ̄ᵀg(x̄) = f(x̄)
using Lagrangian optimality for the middle inequality and complementary slackness for the last equality. Hence x̄ is optimal.
Note the connection between this theorem and duality. For every pair of primal-dual solutions,
sup_{λ≥0} inf_{x∈S} L(x, λ) ≤ inf_{x∈S} sup_{λ≥0} L(x, λ)
The theorem above is analogous to the corollary of the Weak Duality Theorem characterising optimal solutions. Note that, unlike in linear programming, the gap between the primal and dual optimal values is often large.
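A one-dimensional illustration of the dual: for inf{x² : 1 − x ≤ 0}, the dual function h(λ) = λ − λ²/4 attains its maximum 1 at λ̄ = 2, which matches the primal optimum f(1) = 1, with λ̄g(x̄) = 0. (The instance is an illustrative assumption, not from the notes.)

```python
# L(x, lam) = x^2 + lam*(1 - x); for fixed lam >= 0 the inner infimum over x
# is attained at x = lam/2, giving h(lam) = lam - lam^2/4.
def h(lam):
    x = lam / 2.0                    # unconstrained minimiser of L(., lam)
    return x ** 2 + lam * (1 - x)

# Maximise h over a grid of multipliers lam >= 0.
best = max(h(l / 100.0) for l in range(0, 500))
print(best)  # 1.0: the dual optimum equals the primal optimum f(1) = 1
```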
Definition
(Definition 111) Let S := Rn and f, g1, . . . , gm : Rn → R be convex functions. x ∈ Rn is a Slater pointfor (P ) if g(x ) < 0 .
4.5.2 (Theorem 112). Suppose S := Rn and f, g1, . . . , gm : Rn → R are convex functions, and (P) has a Slater point. Then a feasible solution x̄ of (P) is optimal iff there exists λ̄ ≥ 0 such that:
1. (Saddle point condition¹)
L(x̄, λ) ≤ L(x̄, λ̄) ≤ L(x, λ̄), (∀x ∈ Rn, ∀λ ≥ 0)
2. (Complementary slackness)
λ̄ᵀg(x̄) = 0
4.5.3 Karush-Kuhn-Tucker (KKT) Theorem (Theorem 113). Consider the same (P) as above and suppose (P) has a Slater point. Assume x̄ ∈ Rn satisfies
1. g(x̄) ≤ 0
2. f and the gi for i ∈ J(x̄) := {i : gi(x̄) = 0} are differentiable at x̄
Then x̄ is optimal in (P) iff
−∇f(x̄) ∈ cone{∇gi(x̄) : i ∈ J(x̄)}
Descriptio 4.4: Visualisation of the KKT Theorem. The constraints g1, g2 are tight at x̄ but g3 is not, and −∇f(x̄) lies in the cone generated by ∇g1(x̄) and ∇g2(x̄).
Can we generalise the KKT Theorem to non-convex f, gi? Consider the optimisation problem
(P): inf f(x),
g(x) ≤ 0
h(x) = 0
x ∈ S
where f, g1, . . . , gm, h1, . . . , hp : Rn → R and S ⊆ Rn is "simple".
4.5.4 Mangasarian-Fromovitz Constraint Qualification (MFCQ). Let (P) be an NLP and let x̄ be a feasible solution of (P). Suppose
(i) x̄ ∈ int S, g(x̄) ≤ 0, h(x̄) = 0
(ii) f, g1, . . . , gm, h1, . . . , hp are continuous on S, and f, the gi for i ∈ J(x̄), and h are differentiable at x̄
(iii) ∇h1(x̄), . . . , ∇hp(x̄) are linearly independent
(iv) there exists d ∈ Rn such that
∇gi(x̄)ᵀd < 0, ∀i ∈ J(x̄)
∇hj(x̄)ᵀd = 0, ∀j ∈ {1, . . . , p}
This condition replaces the Slater point condition in the KKT Theorem.
Then a necessary condition for x̄ to be a local minimiser of (P) is:
∃λ ≥ 0 (λ ∈ RJ(x̄)), μ ∈ Rp. ∇f(x̄) + Σ_{i∈J(x̄)} λi∇gi(x̄) + Σ_{j=1}^{p} μj∇hj(x̄) = 0
The idea is that each non-linear constraint gi, hj can be approximated by its linearisation at x̄:
gi(x) ≈ gi(x̄) + ∇gi(x̄)ᵀ(x − x̄)
The necessary condition above generalises the unconstrained optimality condition ∇f(x̄) = 0. Many continuous optimisation algorithms try to generate points which satisfy it.
¹This is called the saddle point condition because, as a function of x, L(·, λ̄) is convex, while as a function of λ, L(x̄, ·) is affine, hence concave.
Consider the special case where S = Rn, g1, . . . , gm are convex, and h1, . . . , hp are affine. If such a convex optimisation problem has a Slater point x̂, then g(x̂) < 0, so for every i ∈ J(x̄) (where gi(x̄) = 0),
0 > gi(x̂) ≥ gi(x̄) + ∇gi(x̄)ᵀ(x̂ − x̄) = ∇gi(x̄)ᵀ(x̂ − x̄)
Hence d := x̂ − x̄ satisfies condition (iv) of MFCQ. This shows the Slater condition is stronger than MFCQ.
The following condition implies (iii) and (iv) of MFCQ:
{∇hj(x̄) : j ∈ {1, . . . , p}} ∪ {∇gi(x̄) : i ∈ J(x̄)}
is linearly independent.
Definition
Let F ⊆ Rn be the feasible region of an NLP and let x̄ ∈ F.
x̄ is a global minimiser if
f(x̄) ≤ f(x), ∀x ∈ F
x̄ is a local minimiser if there exists ε > 0 such that x̄ is a minimiser of f restricted to B(x̄; ε) ∩ F.
4.5.5 Theorem.For convex NLPs, every local minimiser is a global minimiser.
4.5.6 Theorem. Consider the convex NLP
(P): inf f(x),
g(x) ≤ 0
h(x) = 0
x ∈ S
where f is convex, S is convex, g1, . . . , gm are convex, and h1, . . . , hp are affine. Suppose MFCQ holds at x̄. Then x̄ is a minimiser of (P) if and only if
∃λ ≥ 0 (λ ∈ RJ(x̄)), μ ∈ Rp. ∇f(x̄) + Σ_{i∈J(x̄)} λi∇gi(x̄) + Σ_{j=1}^{p} μj∇hj(x̄) = 0
4.6 Ellipsoid and Interior Point Methods
Suppose we have a separation oracle: given a point x and a convex set C, it either decides x ∈ C, or produces a separating hyperplane which proves that x ∉ C.
Relative to this oracle, all linear programming problems can be solved in polynomial time. Consider the following games:
1. There is a differentiable convex function f : [a, b] → R which we want to minimise, but we can only sample its derivative through an oracle, and the oracle only returns the sign of the derivative.
An easy approach is the bisection algorithm: guess x(1) = (a + b)/2 and bisect the interval according to the sign of the derivative. After ⌈log₂((b − a)/ε)⌉ iterations, the interval known to contain the minimiser has length at most ε. The bisection method is essentially optimal for this game.
2. Closely related: we want to find a small interval J of length ε > 0 inside a given interval [a, b] ⊆ R. We input x, and the oracle outputs one of:
• x ∈ J
• J is on the left side of x
• J is on the right side of x
3. Generalisation to Rn: consider the problem of computing a point in a compact convex set C, given by a separation oracle.
Start with an ellipsoid E0 which contains C, parametrised by its centre x(0). At iteration i, ask the oracle whether x(i) ∈ C. If not, the oracle gives a separating hyperplane {x : aᵀx ≤ α} with C on the ≤ side. We then compute the smallest ellipsoid E_{i+1} which contains the half-ellipsoid
E_i ∩ {x : aᵀx ≤ α}
The volume shrinks geometrically:
log(vol E_{i+1} / vol E_i) < −1/(2n)
This is the ellipsoid method. We could also use other convex solids, such as simplices or cubes.
To find the initial ellipsoid E0, we could compute a bound on the norm of basic feasible solutions.
Other optimisation courses:
• CO450 Combinatorial Optimisation
• CO452 Integer Programming
• CO463 Convex Optimisation and Analysis
• CO466 Continuous Optimisation
• CO471 Semidefinite Optimisation
• CO673 Optimisation in Data Science
Other combinatorics courses:
• CO442 Graph Theory
• CO430 Combinatorial Enumeration
• CO444 Algebraic Graph Theory
• CO434 Coding Theory
• CO481 Quantum Information
• CO485 Public-Key Cryptography
• CO487 Applied Cryptography