Page 1: Lecture 10 - Linearly Constrained Problems: Separation ...

Lecture 10 - Linearly Constrained Problems: Separation → Alternative Theorems → Optimality Conditions

▶ A hyperplane

H = {x ∈ R^n : a^T x = b}   (a ∈ R^n\{0}, b ∈ R)

is said to strictly separate a point y ∉ S from a set S if

a^T y > b and a^T x ≤ b for all x ∈ S.

Theorem (separation of a point from a closed and convex set). Let C ⊆ R^n be a nonempty closed and convex set, and let y ∉ C. Then there exist p ∈ R^n\{0} and α ∈ R such that

p^T y > α and p^T x ≤ α for all x ∈ C.


Page 2: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof of the Separation Theorem

▶ By the second orthogonal projection theorem, the vector x̄ = P_C(y) ∈ C satisfies

(y − x̄)^T (x − x̄) ≤ 0 for all x ∈ C,

which is the same as

(y − x̄)^T x ≤ (y − x̄)^T x̄ for all x ∈ C.

▶ Denote p = y − x̄ ≠ 0 and α = (y − x̄)^T x̄. Then

p^T x ≤ α for all x ∈ C.

▶ On the other hand,

p^T y = (y − x̄)^T y = (y − x̄)^T (y − x̄) + (y − x̄)^T x̄ = ‖y − x̄‖² + α > α.

Note: p^T x = α is the same as p^T (x − x̄) = 0, which is the hyperplane going through x̄ and orthogonal to p. If the boundary of C is a smooth hypersurface, then that hyperplane is the tangent plane of the boundary surface at x̄.
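The proof is constructive: p and α come straight from the projection of y onto C. Below is a minimal numerical sketch (an illustration, not from the slides; it assumes numpy and uses the Euclidean unit ball as a stand-in for C, since its projection has the closed form z / max(‖z‖, 1)).

import numpy as np

def separating_hyperplane(y, proj_C):
    # Construction from the proof: given y outside the closed convex set C
    # and its projection operator proj_C, return (p, alpha) with
    # p^T y > alpha and p^T x <= alpha for all x in C.
    xbar = proj_C(y)      # xbar = P_C(y)
    p = y - xbar          # p = y - xbar, nonzero since y is not in C
    alpha = p @ xbar      # alpha = (y - xbar)^T xbar
    return p, alpha

# Example: C is the Euclidean unit ball.
proj_ball = lambda z: z / max(np.linalg.norm(z), 1.0)
y = np.array([2.0, 1.0])
p, alpha = separating_hyperplane(y, proj_ball)
print(p @ y > alpha)      # True: the hyperplane strictly separates y from C

Any set with a computable projection operator can be plugged in the same way.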
Page 3: Lecture 10 - Linearly Constrained Problems: Separation ...

Farkas Lemma - an Alternative Theorem

Farkas Lemma. Let c ∈ R^n and A ∈ R^{m×n}. Then exactly one of the following systems has a solution:

I. Ax ≤ 0, c^T x > 0.

II. A^T y = c, y ≥ 0.

Another equivalent formulation is the following.

Farkas Lemma - second formulation. Let c ∈ R^n and A ∈ R^{m×n}. Then the following two claims are equivalent:

(A) The implication Ax ≤ 0 ⇒ c^T x ≤ 0 holds true.

(B) There exists y ∈ R^m_+ such that A^T y = c.

What does it mean?

Example. A = [1 5; −1 2], c = [−1; 9].

Note: the set defined by I is the intersection of a convex polyhedral cone ({x : Ax ≤ 0}) and an open half-space ({x : c^T x > 0}) in R^n; the set defined by II is the intersection of an affine set (the solutions of A^T y = c) and the nonnegative orthant in R^m.
Claim: for A and c above, the set defined by I is empty.

Proof: let y^T = [1, 2]. Then y^T A = [−1, 9] = c^T. Since y ≥ 0, Ax ≤ 0 implies y^T Ax ≤ 0, and hence c^T x ≤ 0. Q.E.D.

This argument can easily be extended to show that if II is nonempty, then I must be empty, or, in the second formulation, that (B) ⇒ (A). The converse implication is less obvious, and the proof uses the separation theorem.
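The alternative can also be tested numerically: system II is a linear feasibility problem. Here is a sketch (an illustration assuming scipy is installed; linprog with a zero objective checks whether A^T y = c, y ≥ 0 is feasible).

import numpy as np
from scipy.optimize import linprog

def farkas_II_feasible(A, c):
    # System II: does there exist y >= 0 with A^T y = c?
    # By Farkas' lemma this holds iff I (Ax <= 0, c^T x > 0) is infeasible.
    m = A.shape[0]
    res = linprog(c=np.zeros(m), A_eq=A.T, b_eq=c,
                  bounds=[(0, None)] * m, method="highs")
    return res.success, (res.x if res.success else None)

A = np.array([[1.0, 5.0], [-1.0, 2.0]])
c = np.array([-1.0, 9.0])
print(farkas_II_feasible(A, c))   # (True, array([1., 2.])): so system I is empty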
Page 4: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof of Farkas Lemma

▶ Suppose that system (B) is feasible: ∃ y ∈ R^m_+ such that A^T y = c.

▶ To see that the implication (A) holds, suppose that Ax ≤ 0 for some x ∈ R^n.

▶ Multiplying this inequality from the left by y^T ≥ 0:

y^T Ax ≤ 0.

▶ Hence, c^T x ≤ 0.

▶ Now suppose that the implication (A) is satisfied, and let us show that the system (B) is feasible. Suppose in contradiction that system (B) is infeasible.

▶ Consider the following closed and convex (why?) set

S = {x ∈ R^n : x = A^T y for some y ∈ R^m_+}.

▶ By assumption, c ∉ S.

Page 5: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof Contd.

▶ By the separation theorem, ∃ p ∈ R^n\{0} and α ∈ R such that p^T c > α and

p^T x ≤ α for all x ∈ S. (1)

▶ 0 ∈ S ⇒ α ≥ 0 ⇒ p^T c > 0.

▶ (1) is equivalent to

p^T A^T y ≤ α for all y ≥ 0,

or to

(Ap)^T y ≤ α for all y ≥ 0. (2)

▶ Therefore, Ap ≤ 0: if (Ap)_i > 0 for some i, then y = t e_i with t → ∞ would violate (2).

▶ Together with p^T c > 0, this contradicts the assertion that implication (A) holds.

Page 6: Lecture 10 - Linearly Constrained Problems: Separation ...

Gordan's Alternative Theorem

Theorem. Let A ∈ R^{m×n}. Then exactly one of the following two systems has a solution:

(A) Ax < 0.

(B) p ≠ 0, A^T p = 0, p ≥ 0.

Proof.

▶ Suppose that system (A) has a solution.

▶ Assume in contradiction that (B) is feasible: ∃ p ≠ 0 satisfying A^T p = 0, p ≥ 0.

▶ Multiplying the equality A^T p = 0 from the left by x^T yields (Ax)^T p = 0, which is an impossible equality: Ax < 0 and 0 ≠ p ≥ 0 force (Ax)^T p < 0.

▶ Now suppose that system (A) does not have a solution.

▶ System (A) is equivalent (s being a scalar) to

Ax + se ≤ 0, s > 0,

▶ or to

Ã(x; s) ≤ 0, c^T (x; s) > 0, where Ã = (A  e) and c = e_{n+1}.

▶ The infeasibility of (A) is thus equivalent to the infeasibility of the system

Ãw ≤ 0, c^T w > 0, w ∈ R^{n+1}.

Page 7: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof of Gordan Contd.

▶ By Farkas' lemma, ∃ z ∈ R^m_+ such that

Ã^T z = c, i.e., A^T z = 0, e^T z = 1.

▶ ⇔ ∃ 0 ≠ z ∈ R^m_+ : A^T z = 0.

▶ ⇒ System (B) is feasible.
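The reduction in the proof is itself computable: Gordan's alternative for A is exactly Farkas' alternative for Ã = (A e), c = e_{n+1}. A sketch (an illustration; it reuses the farkas_II_feasible helper from the Farkas example above):

import numpy as np

def gordan_B_feasible(A):
    # Gordan's (B): does there exist 0 != p >= 0 with A^T p = 0?
    # Reduce to Farkas' system II with A_tilde = (A e), c_tilde = e_{n+1}.
    m, n = A.shape
    A_tilde = np.hstack([A, np.ones((m, 1))])
    c_tilde = np.zeros(n + 1)
    c_tilde[-1] = 1.0
    return farkas_II_feasible(A_tilde, c_tilde)

# The two rows are negatives of each other, so Ax < 0 is impossible:
A = np.array([[1.0, -2.0], [-1.0, 2.0]])
print(gordan_B_feasible(A))   # (True, array([0.5, 0.5])): system (B) is feasible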


Page 8: Lecture 10 - Linearly Constrained Problems: Separation ...

KKT Conditions for Linearly Constrained Problems

Theorem (KKT conditions for linearly constrained problems - necessary optimality conditions). Consider the minimization problem

(P)  min f(x)
     s.t. a_i^T x ≤ b_i, i = 1, 2, . . . , m,

where f is continuously differentiable over R^n, a_1, a_2, . . . , a_m ∈ R^n, b_1, b_2, . . . , b_m ∈ R, and let x∗ be a local minimum point of (P). Then there exist λ_1, λ_2, . . . , λ_m ≥ 0 such that

∇f(x∗) + Σ_{i=1}^m λ_i a_i = 0 (3)

and

λ_i (a_i^T x∗ − b_i) = 0, i = 1, 2, . . . , m. (4)

Ex: compare this result with the Lagrange multiplier theorem in calculus (which allows general nonlinear, but only *equality*, constraints)
Note: λ_i > 0 and a_i^T x∗ − b_i < 0 cannot happen together, for each i.
Note: We see (3), but not (4), in the Lagrange multiplier theorem.
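Ex: for min{x² : x ≤ 1} the minimizer is x∗ = 0, the constraint is inactive (0 − 1 < 0), so (4) forces λ = 0 and (3) reduces to ∇f(x∗) = 0; for min{x² : x ≤ −1} the minimizer x∗ = −1 is active, and (3) gives 2x∗ + λ = 0, i.e., λ = 2 > 0.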
Page 9: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof of KKT Theorem

▶ x∗ is a local minimum ⇒ x∗ is a stationary point.

▶ ∇f(x∗)^T (x − x∗) ≥ 0 for every x ∈ R^n satisfying a_i^T x ≤ b_i for all i = 1, 2, . . . , m.

▶ Denote the set of active constraints by

I(x∗) = {i : a_i^T x∗ = b_i}.

▶ Making the change of variables y = x − x∗, we have

∇f(x∗)^T y ≥ 0 for any y ∈ R^n satisfying a_i^T (y + x∗) ≤ b_i, i = 1, 2, . . . , m,

▶ or ∇f(x∗)^T y ≥ 0 for any y satisfying

a_i^T y ≤ 0, i ∈ I(x∗),
a_i^T y ≤ b_i − a_i^T x∗, i ∉ I(x∗).

▶ The second set of inequalities can be removed; that is, we will prove that

a_i^T y ≤ 0 for all i ∈ I(x∗) ⇒ ∇f(x∗)^T y ≥ 0.

(Same as: ∇f(x∗)^T y ≥ 0 for any y satisfying just a_i^T y ≤ 0 for all i ∈ I(x∗).)
Page 10: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof Contd.

▶ Suppose then that y satisfies a_i^T y ≤ 0 for all i ∈ I(x∗).

▶ Since b_i − a_i^T x∗ > 0 for all i ∉ I(x∗), it follows that there exists a small enough α > 0 for which a_i^T (αy) ≤ b_i − a_i^T x∗ for all i ∉ I(x∗).

▶ Thus, since in addition a_i^T (αy) ≤ 0 for any i ∈ I(x∗), it follows by the stationarity condition (applied to the feasible point x∗ + αy) that ∇f(x∗)^T y ≥ 0.

▶ We have shown: a_i^T y ≤ 0 for all i ∈ I(x∗) ⇒ ∇f(x∗)^T y ≥ 0.

▶ By Farkas' lemma, ∃ λ_i ≥ 0, i ∈ I(x∗), such that

−∇f(x∗) = Σ_{i∈I(x∗)} λ_i a_i.

▶ Defining λ_i = 0 for all i ∉ I(x∗), we get that λ_i (a_i^T x∗ − b_i) = 0 for all i ∈ {1, 2, . . . , m} and

∇f(x∗) + Σ_{i=1}^m λ_i a_i = 0.

Page 11: Lecture 10 - Linearly Constrained Problems: Separation ...

The Convex Case

Theorem [KKT conditions for convex linearly constrained problems - necessary and sufficient optimality conditions]. Consider the minimization problem

(P)  min f(x)
     s.t. a_i^T x ≤ b_i, i = 1, 2, . . . , m,

where f is a convex continuously differentiable function over R^n, a_1, a_2, . . . , a_m ∈ R^n, b_1, b_2, . . . , b_m ∈ R, and let x∗ be a feasible solution of (P). Then x∗ is an optimal solution if and only if there exist λ_1, λ_2, . . . , λ_m ≥ 0 such that

∇f(x∗) + Σ_{i=1}^m λ_i a_i = 0 (5)

and

λ_i (a_i^T x∗ − b_i) = 0, i = 1, 2, . . . , m. (6)

Page 12: Lecture 10 - Linearly Constrained Problems: Separation ...

Proof of KKT in Convex Case

▶ Necessity was proven.

▶ Suppose that x∗ is a feasible solution of (P) satisfying (5) and (6). Let x be a feasible solution of (P).

▶ Define the function

h(x) = f(x) + Σ_{i=1}^m λ_i (a_i^T x − b_i).

(h is convex, since f is convex and the added terms are affine, and ∇h(x) = ∇f(x) + Σ_{i=1}^m λ_i a_i.)

▶ ∇h(x∗) = 0 by (5) ⇒ x∗ is a minimizer of the convex function h over R^n.

▶ Therefore

f(x∗) = f(x∗) + Σ_{i=1}^m λ_i (a_i^T x∗ − b_i) ≤ f(x) + Σ_{i=1}^m λ_i (a_i^T x − b_i) ≤ f(x),

where the equality uses (6), the first inequality uses the fact that x∗ minimizes h, and the last inequality holds since x is feasible and λ_i ≥ 0.

Page 13: Lecture 10 - Linearly Constrained Problems: Separation ...

Problems with Equality and Inequality Constraints

Theorem [KKT conditions for linearly constrained problems]. Consider the minimization problem

(Q)  min f(x)
     s.t. a_i^T x ≤ b_i, i = 1, 2, . . . , m,
          c_j^T x = d_j, j = 1, 2, . . . , p,

where f is continuously differentiable, a_i, c_j ∈ R^n, b_i, d_j ∈ R.

(i) (necessity of the KKT conditions) If x∗ is a local minimum of (Q), then there exist λ_1, λ_2, . . . , λ_m ≥ 0 and μ_1, μ_2, . . . , μ_p ∈ R such that

∇f(x∗) + Σ_{i=1}^m λ_i a_i + Σ_{j=1}^p μ_j c_j = 0, (7)

λ_i (a_i^T x∗ − b_i) = 0, i = 1, 2, . . . , m. (8)

(ii) (sufficiency in the convex case) If f is convex over R^n and x∗ is a feasible solution of (Q) for which there exist λ_1, . . . , λ_m ≥ 0 and μ_1, . . . , μ_p ∈ R such that (7) and (8) are satisfied, then x∗ is an optimal solution of (Q).


Note: each equality constraint is equivalent to c_j^T x ≤ d_j and −c_j^T x ≤ −d_j, so (Q) reduces to the inequality-only case. Another easy trick: any real number can be written as the difference of two nonnegative numbers, which is how the sign-free multipliers μ_j arise.
Page 14: Lecture 10 - Linearly Constrained Problems: Separation ...

Representation Via the Lagrangian

Given a problem

(NLP)  min f(x)
       s.t. g_i(x) ≤ 0, i = 1, 2, . . . , m,
            h_j(x) = 0, j = 1, 2, . . . , p,

the associated Lagrangian function is

L(x, λ, μ) = f(x) + Σ_{i=1}^m λ_i g_i(x) + Σ_{j=1}^p μ_j h_j(x).

The KKT conditions can be written as

∇_x L(x∗, λ, μ) = ∇f(x∗) + Σ_{i=1}^m λ_i ∇g_i(x∗) + Σ_{j=1}^p μ_j ∇h_j(x∗) = 0,

λ_i g_i(x∗) = 0, i = 1, 2, . . . , m.

Page 15: Lecture 10 - Linearly Constrained Problems: Separation ...

Examples

▶ min ½(x1² + x2² + x3²)
  s.t. x1 + x2 + x3 = 3.

▶ min x1² + 2x2² + 4x1x2
  s.t. x1 + x2 = 1,
       x1, x2 ≥ 0.

In class
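One possible solution sketch (the slides defer the details to class): for the first problem, stationarity of the Lagrangian L(x, μ) = ½‖x‖² + μ(x1 + x2 + x3 − 3) gives x = −μe, feasibility forces −3μ = 3, so μ = −1 and x∗ = (1, 1, 1); since f is convex, the KKT conditions are sufficient and x∗ is optimal. For the second problem the objective is not convex (its Hessian [2 4; 4 4] is indefinite), so the KKT conditions are only necessary; substituting x2 = 1 − x1 gives the objective 2 − x1² on [0, 1], minimized at x∗ = (1, 0) with value 1, and the KKT system indeed holds there with μ = −2 and multipliers (0, 2) for the sign constraints.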


Page 16: Lecture 10 - Linearly Constrained Problems: Separation ...

Projection onto Affine Spaces

Lemma. Let C be the affine space

C = {x ∈ R^n : Ax = b},

where A ∈ R^{m×n} has full row rank m (so that AA^T is invertible) and b ∈ R^m. Then

P_C(y) = y − A^T (AA^T)^{−1} (Ay − b).

Proof. In class


Note: this is the problem min ‖x − y‖² s.t. Ax = b. Everything is convex (in fact it is a QP), so apply the KKT result. (In this case, it says that the usual Lagrange multiplier condition is both necessary and sufficient for optimality.) (Of course, if you know your linear algebra, you can derive the same result without the KKT theorem, just as you don't need calculus to tell you that ax² + bx has its minimizer at x = −b/(2a).)
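A direct numpy transcription of the formula (a sketch; it solves against AA^T instead of forming the inverse explicitly):

import numpy as np

def project_affine(y, A, b):
    # P_C(y) = y - A^T (A A^T)^{-1} (A y - b) for C = {x : Ax = b},
    # assuming A has full row rank so that A A^T is invertible.
    lam = np.linalg.solve(A @ A.T, A @ y - b)
    return y - A.T @ lam

A = np.array([[1.0, 1.0, 1.0]])   # the plane x1 + x2 + x3 = 3
b = np.array([3.0])
y = np.array([2.0, 0.0, 0.0])
x = project_affine(y, A, b)
print(x, A @ x - b)               # x = [2.33, 0.33, 0.33], residual ~ 0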
Page 17: Lecture 10 - Linearly Constrained Problems: Separation ...

Orthogonal Projection onto Hyperplanes

Consider the hyperplane

H = {x ∈ R^n : a^T x = b}   (0 ≠ a ∈ R^n, b ∈ R).

Then by the previous slide:

P_H(y) = y − a(a^T a)^{−1}(a^T y − b) = y − ((a^T y − b)/‖a‖²) a.

Lemma (distance of a point from a hyperplane). Let H = {x ∈ R^n : a^T x = b}, where 0 ≠ a ∈ R^n and b ∈ R. Then

d(y, H) = |a^T y − b| / ‖a‖.

Proof.

d(y, H) = ‖y − P_H(y)‖ = ‖y − (y − ((a^T y − b)/‖a‖²) a)‖ = |a^T y − b| / ‖a‖.


Page 18: Lecture 10 - Linearly Constrained Problems: Separation ...

Orthogonal Projection onto Half-Spaces

Let H− = {x ∈ R^n : a^T x ≤ b}, where 0 ≠ a ∈ R^n and b ∈ R. Then

P_{H−}(x) = x − ([a^T x − b]_+ / ‖a‖²) a.

Proof. In class

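Both closed-form projections fit in a short numpy sketch (an illustration: the hyperplane projection always moves the point, while the half-space projection moves it only when a^T x > b, via the positive part [·]_+):

import numpy as np

def project_hyperplane(y, a, b):
    # P_H(y) = y - ((a^T y - b)/||a||^2) a for H = {x : a^T x = b}
    return y - (a @ y - b) / (a @ a) * a

def project_halfspace(x, a, b):
    # P_{H-}(x) = x - ([a^T x - b]_+/||a||^2) a for H- = {x : a^T x <= b}
    return x - max(a @ x - b, 0.0) / (a @ a) * a

a, b = np.array([3.0, 4.0]), 10.0
y = np.array([6.0, 8.0])                     # a^T y = 50 > 10: y is outside
print(project_hyperplane(y, a, b))           # [1.2, 1.6], lies on a^T x = 10
print(project_halfspace(y, a, b))            # same point, since y is outside
print(project_halfspace(np.zeros(2), a, b))  # [0, 0]: already inside H-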
Page 19: Lecture 10 - Linearly Constrained Problems: Separation ...

Orthogonal Regression

▶ a_1, . . . , a_m ∈ R^n.

▶ For a given 0 ≠ x ∈ R^n and y ∈ R, we define the hyperplane:

H_{x,y} := {a ∈ R^n : x^T a = y}.

[Figure: five sample points a1, . . . , a5 in the plane.]

▶ In the orthogonal regression problem we seek to find a nonzero vector x ∈ R^n and y ∈ R such that the sum of squared Euclidean distances between the points a_1, . . . , a_m and H_{x,y} is minimal:

min_{x,y} { Σ_{i=1}^m d(a_i, H_{x,y})² : 0 ≠ x ∈ R^n, y ∈ R }.

Q: Is this different from the standard least squares problem (aka linear regression) in statistics? (See Chapter 3.)

A: Yes, quite different!
Page 20: Lecture 10 - Linearly Constrained Problems: Separation ...

Orthogonal Regression

▶ d(a_i, H_{x,y})² = (a_i^T x − y)² / ‖x‖², i = 1, . . . , m.

▶ The orthogonal regression problem is the same as

min { Σ_{i=1}^m (a_i^T x − y)² / ‖x‖² : 0 ≠ x ∈ R^n, y ∈ R }.

▶ Fixing x and minimizing first with respect to y, we obtain that the optimal y is given by y = (1/m) Σ_{i=1}^m a_i^T x = (1/m) e^T Ax.

▶ Using the above expression for y, we obtain that

Σ_{i=1}^m (a_i^T x − y)² = Σ_{i=1}^m (a_i^T x − (1/m) e^T Ax)²
= Σ_{i=1}^m (a_i^T x)² − (2/m) Σ_{i=1}^m (e^T Ax)(a_i^T x) + (1/m)(e^T Ax)²
= Σ_{i=1}^m (a_i^T x)² − (1/m)(e^T Ax)²
= ‖Ax‖² − (1/m)(e^T Ax)²
= x^T A^T (I_m − (1/m) ee^T) Ax.


Note: not a convex problem, and this time we have no intention to convexify it!
Page 21: Lecture 10 - Linearly Constrained Problems: Separation ...

Orthogonal Regression

▶ Therefore, a reformulation of the problem is

min_x { x^T [A^T (I_m − (1/m) ee^T) A] x / ‖x‖² : x ≠ 0 }.

Proposition. An optimal solution of the orthogonal regression problem is (x, y), where x is an eigenvector of A^T (I_m − (1/m) ee^T) A associated with the minimum eigenvalue and y = (1/m) Σ_{i=1}^m a_i^T x. The optimal function value of the problem is λ_min[A^T (I_m − (1/m) ee^T) A].


Note: we turn the (non-convex) optimization problem into an eigenproblem!
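A numpy sketch of the eigen-solution (an illustration; numpy.linalg.eigh returns eigenvalues in ascending order, so column 0 of the eigenvector matrix attains the minimum):

import numpy as np

def orthogonal_regression(A):
    # Rows of A are the points a_1, ..., a_m in R^n. Returns (x, y, val):
    # x is an eigenvector of A^T (I - ee^T/m) A for the smallest eigenvalue,
    # y = (1/m) e^T A x, and val is the optimal objective value.
    m = A.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m   # centering matrix I - (1/m)ee^T
    vals, vecs = np.linalg.eigh(A.T @ J @ A)
    x = vecs[:, 0]
    y = np.mean(A @ x)
    return x, y, vals[0]

# Points close to the line a1 + a2 = 1: the fit should recover it.
A = np.array([[0.0, 1.02], [0.5, 0.49], [1.0, 0.01], [0.25, 0.74]])
x, y, val = orthogonal_regression(A)
print(x / x[0], y / x[0], val)   # direction ~ (1, 1), offset ~ 1, small value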
