8/10/2019 Convex Slides
1/301
Convex Optimization Boyd & Vandenberghe
1. Introduction
mathematical optimization
least-squares and linear programming
convex optimization
example
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization

(mathematical) optimization problem

minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m

• x = (x1, . . . , xn): optimization variables
• f0 : Rⁿ → R: objective function
• fi : Rⁿ → R, i = 1, . . . , m: constraint functions

optimal solution x⋆ has smallest value of f0 among all vectors that satisfy the constraints
Examples

portfolio optimization
• variables: amounts invested in different assets
• constraints: budget, max./min. investment per asset, minimum return
• objective: overall risk or return variance

device sizing in electronic circuits
• variables: device widths and lengths
• constraints: manufacturing limits, timing requirements, maximum area
• objective: power consumption

data fitting
• variables: model parameters
• constraints: prior information, parameter limits
• objective: measure of misfit or prediction error
Solving optimization problems

general optimization problem
• very difficult to solve
• methods involve some compromise, e.g., very long computation time, or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably
• least-squares problems
• linear programming problems
• convex optimization problems
Least-squares

minimize ‖Ax − b‖₂²

solving least-squares problems
• analytical solution: x⋆ = (AᵀA)⁻¹Aᵀb
• reliable and efficient algorithms and software
• computation time proportional to n²k (A ∈ R^{k×n}); less if structured
• a mature technology

using least-squares
• least-squares problems are easy to recognize
• a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
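The analytical formula above can be checked against a library solver; a minimal NumPy sketch with random stand-in data (in practice one would use the library routine rather than form AᵀA explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5                       # A is k x n with k >= n
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# analytical solution x* = (A^T A)^{-1} A^T b
x_analytic = np.linalg.solve(A.T @ A, A.T @ b)

# library least-squares solver (numerically more stable factorization)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_analytic, x_lstsq)
```

Both agree up to round-off; the optimality condition AᵀA x⋆ = Aᵀb also holds at the computed solution.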
Linear programming

minimize cᵀx
subject to aiᵀx ≤ bi, i = 1, . . . , m

solving linear programs
• no analytical formula for solution
• reliable and efficient algorithms and software
• computation time proportional to n²m if m ≥ n; less with structure
• a mature technology

using linear programming
• not as easy to recognize as least-squares problems
• a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ₁- or ℓ∞-norms, piecewise-linear functions)
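One of the standard tricks mentioned above, recasting ℓ∞-norm approximation as an LP, can be sketched as follows (assuming SciPy's `linprog` is available; the data are random stand-ins). Minimizing ‖Ax − b‖∞ is equivalent to minimizing t over (x, t) subject to −t·1 ⪯ Ax − b ⪯ t·1:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
m, n = A.shape

# variables (x, t); minimize t subject to -t*1 <= Ax - b <= t*1
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[A, -np.ones((m, 1))], [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)])

x, t = res.x[:n], res.x[n]
assert res.success and np.isclose(t, np.abs(A @ x - b).max(), atol=1e-6)
```

At the optimum, t equals the ℓ∞ residual of x, and it is no larger than the ℓ∞ residual of the least-squares solution.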
Convex optimization problem

minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m

• objective and constraint functions are convex:

fi(αx + βy) ≤ αfi(x) + βfi(y)

if α + β = 1, α ≥ 0, β ≥ 0
• includes least-squares problems and linear programs as special cases
solving convex optimization problems
• no analytical solution
• reliable and efficient algorithms
• computation time (roughly) proportional to max{n³, n²m, F}, where F is cost of evaluating the fi's and their first and second derivatives
• almost a technology

using convex optimization
• often difficult to recognize
• many tricks for transforming problems into convex form
• surprisingly many problems can be solved via convex optimization
Example

m lamps illuminating n (small, flat) patches

(figure: lamp j with power pj, at distance rkj and angle θkj from patch k, which receives illumination Ik)

intensity Ik at patch k depends linearly on lamp powers pj:

Ik = Σ_{j=1}^m akj pj,  akj = rkj⁻² max{cos θkj, 0}

problem: achieve desired illumination Ides with bounded lamp powers

minimize max_{k=1,...,n} |log Ik − log Ides|
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m
how to solve?

1. use uniform power: pj = p, vary p
2. use least-squares:

minimize Σ_{k=1}^n (Ik − Ides)²

round pj if pj > pmax or pj < 0
3. use weighted least-squares:

minimize Σ_{k=1}^n (Ik − Ides)² + Σ_{j=1}^m wj(pj − pmax/2)²

iteratively adjust weights wj until 0 ≤ pj ≤ pmax
4. use linear programming:

minimize max_{k=1,...,n} |Ik − Ides|
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m

which can be solved via linear programming
5. use convex optimization: problem is equivalent to

minimize f0(p) = max_{k=1,...,n} h(Ik/Ides)
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m

with h(u) = max{u, 1/u}

(plot: h(u) for 0 ≤ u ≤ 4; h equals 1 at u = 1 and grows as u moves away from 1)

f0 is convex because maximum of convex functions is convex

exact solution obtained with effort ≈ modest factor × least-squares effort
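The simpler heuristics for this problem can be sketched in NumPy; the coefficients akj below are made-up stand-ins, not a real lamp geometry:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 50                      # lamps, patches (hypothetical sizes)
A = rng.uniform(0.0, 1.0, (n, m))  # stand-in for a_kj coefficients
I_des, p_max = 1.0, 2.0

def f0(p):
    # objective max_k |log I_k - log I_des|
    I = A @ p
    return np.abs(np.log(I) - np.log(I_des)).max()

# heuristic 2: least-squares fit of I to I_des, then round into [0, p_max]
p_ls, *_ = np.linalg.lstsq(A, np.full(n, I_des), rcond=None)
p_clipped = np.clip(p_ls, 0.0, p_max)

# heuristic 1: best uniform power, found by scanning a grid
grid = np.linspace(1e-3, p_max, 1000)
p_uniform = min((np.full(m, g) for g in grid), key=f0)

print(f0(p_clipped), f0(p_uniform))
```

Both heuristics produce feasible powers; neither is guaranteed to match the exact convex-optimization solution.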
additional constraints: does adding 1 or 2 below complicate the problem?

1. no more than half of total power is in any 10 lamps
2. no more than half of the lamps are on (pj > 0)

answer: with (1), still easy to solve; with (2), extremely difficult

moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems
Course goals and topics

goals
1. recognize/formulate problems (such as the illumination problem) as convex optimization problems
2. develop code for problems of moderate size (1000 lamps, 5000 patches)
3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

topics
1. convex sets, functions, optimization problems
2. examples and applications
3. algorithms
Nonlinear optimization

traditional techniques for general nonconvex problems involve compromises

local optimization methods (nonlinear programming)
• find a point that minimizes f0 among feasible points near it
• fast, can handle large problems
• require initial guess
• provide no information about distance to (global) optimum

global optimization methods
• find the (global) solution
• worst-case complexity grows exponentially with problem size

these algorithms are often based on solving convex subproblems
Brief history of convex optimization

theory (convex analysis): ca. 1900–1970

algorithms
• 1947: simplex algorithm for linear programming (Dantzig)
• 1960s: early interior-point methods (Fiacco & McCormick, Dikin, . . . )
• 1970s: ellipsoid method and other subgradient methods
• 1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
• late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

applications
• before 1990: mostly in operations research; few in engineering
• since 1990: many new applications in engineering (control, signal processing, communications, circuit design, . . . ); new problem classes (semidefinite and second-order cone programming, robust optimization)
Convex Optimization Boyd & Vandenberghe
2. Convex sets
affine and convex sets
some important examples
operations that preserve convexity
generalized inequalities
separating and supporting hyperplanes
dual cones and generalized inequalities
Affine set

line through x1, x2: all points

x = θx1 + (1 − θ)x2  (θ ∈ R)

(figure: points on the line through x1 and x2, labeled with θ = 1.2, 1, 0.6, 0, −0.2)

affine set: contains the line through any two distinct points in the set

example: solution set of linear equations {x | Ax = b}
(conversely, every affine set can be expressed as solution set of system of linear equations)
Convex set

line segment between x1 and x2: all points

x = θx1 + (1 − θ)x2

with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set

x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C

examples (one convex, two nonconvex sets)
Convex combination and convex hull

convex combination of x1, . . . , xk: any point x of the form

x = θ1x1 + θ2x2 + · · · + θkxk

with θ1 + · · · + θk = 1, θi ≥ 0

convex hull conv S: set of all convex combinations of points in S
Convex cone

conic (nonnegative) combination of x1 and x2: any point of the form

x = θ1x1 + θ2x2

with θ1 ≥ 0, θ2 ≥ 0

(figure: the cone generated by x1 and x2, with apex at 0)

convex cone: set that contains all conic combinations of points in the set
Hyperplanes and halfspaces

hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)

halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)

• a is the normal vector
• hyperplanes are affine and convex; halfspaces are convex
Euclidean balls and ellipsoids

(Euclidean) ball with center xc and radius r:

B(xc, r) = {x | ‖x − xc‖₂ ≤ r} = {xc + ru | ‖u‖₂ ≤ 1}

ellipsoid: set of the form

{x | (x − xc)ᵀP⁻¹(x − xc) ≤ 1}

with P ∈ S^n++ (i.e., P symmetric positive definite)

other representation: {xc + Au | ‖u‖₂ ≤ 1} with A square and nonsingular
Norm balls and norm cones

norm: a function ‖·‖ that satisfies
• ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
• ‖tx‖ = |t| ‖x‖ for t ∈ R
• ‖x + y‖ ≤ ‖x‖ + ‖y‖

notation: ‖·‖ is general (unspecified) norm; ‖·‖symb is particular norm

norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}

norm cone: {(x, t) | ‖x‖ ≤ t}
• Euclidean norm cone is called second-order cone

(plot: boundary of the second-order cone in R³, {(x1, x2, t) | ‖(x1, x2)‖₂ ≤ t})

norm balls and cones are convex
Polyhedra

solution set of finitely many linear inequalities and equalities

Ax ⪯ b,  Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

(figure: polyhedron P bounded by halfspaces with normal vectors a1, . . . , a5)

polyhedron is intersection of finite number of halfspaces and hyperplanes
Positive semidefinite cone

notation:
• Sⁿ is set of symmetric n × n matrices
• S^n+ = {X ∈ Sⁿ | X ⪰ 0}: positive semidefinite n × n matrices

X ∈ S^n+ ⟺ zᵀXz ≥ 0 for all z

S^n+ is a convex cone

• S^n++ = {X ∈ Sⁿ | X ≻ 0}: positive definite n × n matrices

example: [x y; y z] ∈ S²+ (plot: boundary of this set over (x, y, z))
Operations that preserve convexity

practical methods for establishing convexity of a set C

1. apply definition

x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C

2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, . . . ) by operations that preserve convexity
• intersection
• affine functions
• perspective function
• linear-fractional functions
Intersection

the intersection of (any number of) convex sets is convex

example:

S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x1 cos t + x2 cos 2t + · · · + xm cos mt

(plots for m = 2: the polynomials p(t) on [0, π], and the set S in the (x1, x2)-plane)
Affine function

suppose f : Rⁿ → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

• the image of a convex set under f is convex

S ⊆ Rⁿ convex ⟹ f(S) = {f(x) | x ∈ S} convex

• the inverse image f⁻¹(C) of a convex set under f is convex

C ⊆ R^m convex ⟹ f⁻¹(C) = {x ∈ Rⁿ | f(x) ∈ C} convex

examples
• scaling, translation, projection
• solution set of linear matrix inequality {x | x1A1 + · · · + xmAm ⪯ B} (with Ai, B ∈ S^p)
• hyperbolic cone {x | xᵀPx ≤ (cᵀx)², cᵀx ≥ 0} (with P ∈ S^n+)
Perspective and linear-fractional function

perspective function P : R^{n+1} → Rⁿ:

P(x, t) = x/t,  dom P = {(x, t) | t > 0}

images and inverse images of convex sets under perspective are convex

linear-fractional function f : Rⁿ → R^m:

f(x) = (Ax + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}

images and inverse images of convex sets under linear-fractional functions are convex
example of a linear-fractional function

f(x) = (1/(x1 + x2 + 1)) x

(plots: a set C in the (x1, x2)-plane and its image f(C))
Generalized inequalities

a convex cone K ⊆ Rⁿ is a proper cone if
• K is closed (contains its boundary)
• K is solid (has nonempty interior)
• K is pointed (contains no line)

examples
• nonnegative orthant K = R^n+ = {x ∈ Rⁿ | xi ≥ 0, i = 1, . . . , n}
• positive semidefinite cone K = S^n+
• nonnegative polynomials on [0, 1]:

K = {x ∈ Rⁿ | x1 + x2t + x3t² + · · · + xnt^{n−1} ≥ 0 for t ∈ [0, 1]}
generalized inequality defined by a proper cone K:

x ⪯_K y ⟺ y − x ∈ K,  x ≺_K y ⟺ y − x ∈ int K

examples
• componentwise inequality (K = R^n+)

x ⪯_{R^n+} y ⟺ xi ≤ yi, i = 1, . . . , n

• matrix inequality (K = S^n+)

X ⪯_{S^n+} Y ⟺ Y − X positive semidefinite

these two types are so common that we drop the subscript in ⪯_K

properties: many properties of ⪯_K are similar to ≤ on R, e.g.,

x ⪯_K y, u ⪯_K v ⟹ x + u ⪯_K y + v
Minimum and minimal elements

⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

x ∈ S is the minimum element of S with respect to ⪯_K if

y ∈ S ⟹ x ⪯_K y

x ∈ S is a minimal element of S with respect to ⪯_K if

y ∈ S, y ⪯_K x ⟹ y = x

example (K = R²+): x1 is the minimum element of S1; x2 is a minimal element of S2 (figure)
Separating hyperplane theorem

if C and D are nonempty disjoint convex sets, there exist a ≠ 0, b s.t.

aᵀx ≤ b for x ∈ C,  aᵀx ≥ b for x ∈ D

the hyperplane {x | aᵀx = b} separates C and D

strict separation requires additional assumptions (e.g., C is closed, D is a singleton)
Supporting hyperplane theorem

supporting hyperplane to set C at boundary point x0:

{x | aᵀx = aᵀx0}

where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C

supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
Dual cones and generalized inequalities

dual cone of a cone K:

K* = {y | yᵀx ≥ 0 for all x ∈ K}

examples
• K = R^n+: K* = R^n+
• K = S^n+: K* = S^n+
• K = {(x, t) | ‖x‖₂ ≤ t}: K* = {(x, t) | ‖x‖₂ ≤ t}
• K = {(x, t) | ‖x‖₁ ≤ t}: K* = {(x, t) | ‖x‖∞ ≤ t}

first three examples are self-dual cones

dual cones of proper cones are proper, hence define generalized inequalities:

y ⪰_{K*} 0 ⟺ yᵀx ≥ 0 for all x ⪰_K 0
Minimum and minimal elements via dual inequalities

minimum element w.r.t. ⪯_K

x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λᵀz over S

minimal element w.r.t. ⪯_K
• if x minimizes λᵀz over S for some λ ≻_{K*} 0, then x is minimal
• if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λᵀz over S
optimal production frontier

• different production methods use different amounts of resources x ∈ Rⁿ
• production set P: resource vectors x for all possible production methods
• efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n+

example (n = 2): x1, x2, x3 are efficient; x4, x5 are not (figure: production set P with axes labor and fuel)
Convex Optimization Boyd & Vandenberghe
3. Convex functions
basic properties and examples
operations that preserve convexity
the conjugate function
quasiconvex functions
log-concave and log-convex functions
convexity with respect to generalized inequalities
Definition

f : Rⁿ → R is convex if dom f is a convex set and

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

(figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph of f)

• f is concave if −f is convex
• f is strictly convex if dom f is convex and

f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, x ≠ y, 0 < θ < 1
Examples on R

convex:
• affine: ax + b on R, for any a, b ∈ R
• exponential: e^{ax}, for any a ∈ R
• powers: x^α on R++, for α ≥ 1 or α ≤ 0
• powers of absolute value: |x|^p on R, for p ≥ 1
• negative entropy: x log x on R++

concave:
• affine: ax + b on R, for any a, b ∈ R
• powers: x^α on R++, for 0 ≤ α ≤ 1
• logarithm: log x on R++
Examples on Rⁿ and R^{m×n}

affine functions are convex and concave; all norms are convex

examples on Rⁿ
• affine function f(x) = aᵀx + b
• norms: ‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|

examples on R^{m×n} (m × n matrices)
• affine function

f(X) = tr(AᵀX) + b = Σ_{i=1}^m Σ_{j=1}^n Aij Xij + b

• spectral (maximum singular value) norm

f(X) = ‖X‖₂ = σmax(X) = (λmax(XᵀX))^{1/2}
Restriction of a convex function to a line

f : Rⁿ → R is convex if and only if the function g : R → R,

g(t) = f(x + tv),  dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ Rⁿ

can check convexity of f by checking convexity of functions of one variable

example. f : Sⁿ → R with f(X) = log det X, dom f = S^n++

g(t) = log det(X + tV) = log det X + log det(I + tX^{−1/2}VX^{−1/2})
     = log det X + Σ_{i=1}^n log(1 + tλi)

where λi are the eigenvalues of X^{−1/2}VX^{−1/2}

g is concave in t (for any choice of X ≻ 0, V); hence f is concave
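The concavity of g(t) = log det(X + tV) can be spot-checked numerically with a midpoint test; a NumPy sketch with a random positive definite X and an arbitrary symmetric direction V:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)        # X in S^n_++
V = rng.standard_normal((n, n))
V = V + V.T                        # arbitrary symmetric direction

def g(t):
    # restriction of f(X) = log det X to the line X + tV
    sign, logdet = np.linalg.slogdet(X + t * V)
    return logdet if sign > 0 else -np.inf

# concavity along the line: g((t1+t2)/2) >= (g(t1)+g(t2))/2
for _ in range(100):
    t1, t2 = rng.uniform(-0.2, 0.2, 2)   # small t keeps X + tV in dom g
    assert g(0.5 * (t1 + t2)) >= 0.5 * (g(t1) + g(t2)) - 1e-9
```

This checks concavity only along one line and on sampled points, which is exactly how the restriction-to-a-line idea is used in practice for quick sanity checks.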
Extended-value extension

extended-value extension f̃ of f is

f̃(x) = f(x), x ∈ dom f,  f̃(x) = ∞, x ∉ dom f

often simplifies notation; for example, the condition

0 ≤ θ ≤ 1 ⟹ f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

(as an inequality in R ∪ {∞}) means the same as the two conditions
• dom f is convex
• for x, y ∈ dom f,

0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
First-order condition

f is differentiable if dom f is open and the gradient

∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn)

exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex iff

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f

(figure: the tangent f(x) + ∇f(x)ᵀ(y − x) at (x, f(x)) lies below f(y))

first-order approximation of f is global underestimator
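The first-order condition can be spot-checked for a convex quadratic f(x) = xᵀPx with P ⪰ 0; a small NumPy sketch over random point pairs:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
P = M @ M.T                        # P positive semidefinite, so f is convex

f = lambda x: x @ P @ x
grad = lambda x: 2 * P @ x

# first-order condition: f(y) >= f(x) + grad f(x)^T (y - x) for all x, y
for _ in range(1000):
    x, y = rng.standard_normal((2, n))
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
```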
Second-order conditions

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ Sⁿ,

∇²f(x)ij = ∂²f(x)/∂xi∂xj, i, j = 1, . . . , n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain
• f is convex if and only if

∇²f(x) ⪰ 0 for all x ∈ dom f

• if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples

quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ Sⁿ)

∇f(x) = Px + q,  ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖₂²

∇f(x) = 2Aᵀ(Ax − b),  ∇²f(x) = 2AᵀA

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

∇²f(x, y) = (2/y³) [y  −x]ᵀ[y  −x] ⪰ 0

convex for y > 0 (plot: the surface f(x, y) = x²/y)
log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex

∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ  (zk = exp xk)

to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:

vᵀ∇²f(x)v = [ (Σk zk vk²)(Σk zk) − (Σk vk zk)² ] / (Σk zk)² ≥ 0

since (Σk vk zk)² ≤ (Σk zk vk²)(Σk zk) (from Cauchy–Schwarz inequality)

geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on R^n++ is concave
(similar proof as for log-sum-exp)
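The Hessian formula for log-sum-exp can be verified numerically at a random point; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(6)
z = np.exp(x)
s = z.sum()

# Hessian from the slide: (1/1^T z) diag(z) - (1/(1^T z)^2) z z^T
H = np.diag(z) / s - np.outer(z, z) / s**2

# positive semidefinite: all eigenvalues nonnegative (up to round-off)
assert np.linalg.eigvalsh(H).min() >= -1e-12

# the all-ones vector is in the nullspace: rows of H sum to zero
assert np.allclose(H @ np.ones_like(x), 0)
```

The zero eigenvalue along the all-ones direction reflects that f(x + t·1) = f(x) + t, which is affine in t.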
Epigraph and sublevel set

α-sublevel set of f : Rⁿ → R:

Cα = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (converse is false)

epigraph of f : Rⁿ → R:

epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

(figure: epi f is the region on and above the graph of f)

f is convex if and only if epi f is a convex set
Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then

f(E z) ≤ E f(z)

for any random variable z

basic inequality is special case with discrete distribution

prob(z = x) = θ,  prob(z = y) = 1 − θ
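A Monte Carlo spot-check of the extension, using the convex function f(x) = eˣ and a normal random variable z (for which E eᶻ = e^{μ + 1/2} when z ~ N(μ, 1) is known exactly):

```python
import numpy as np

rng = np.random.default_rng(6)
f = np.exp                          # a convex function
z = rng.normal(loc=0.3, scale=1.0, size=200_000)

# Jensen: f(E z) <= E f(z)
lhs = f(z.mean())
rhs = f(z).mean()
assert lhs <= rhs

# sanity check against the exact lognormal mean e^{0.3 + 1/2}
assert np.isclose(rhs, np.exp(0.3 + 0.5), rtol=0.02)
```

The gap rhs − lhs is strictly positive here because eˣ is strictly convex and z is nondegenerate.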
Operations that preserve convexity

practical methods for establishing convexity of a function

1. verify definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity
• nonnegative weighted sum
• composition with affine function
• pointwise maximum and supremum
• composition
• minimization
• perspective
Pointwise maximum

if f1, . . . , fm are convex, then f(x) = max{f1(x), . . . , fm(x)} is convex

examples
• piecewise-linear function: f(x) = max_{i=1,...,m}(aiᵀx + bi) is convex
• sum of r largest components of x ∈ Rⁿ:

f(x) = x[1] + x[2] + · · · + x[r]

is convex (x[i] is ith largest component of x)

proof: f(x) = max{xi1 + xi2 + · · · + xir | 1 ≤ i1 < i2 < · · · < ir ≤ n}
Pointwise supremum

if f(x, y) is convex in x for each y ∈ A, then

g(x) = sup_{y∈A} f(x, y)

is convex

examples
• support function of a set C: SC(x) = sup_{y∈C} yᵀx is convex
• distance to farthest point in a set C: f(x) = sup_{y∈C} ‖x − y‖
• maximum eigenvalue of symmetric matrix: for X ∈ Sⁿ, λmax(X) = sup_{‖y‖₂=1} yᵀXy
Composition with scalar functions

composition of g : Rⁿ → R and h : R → R:

f(x) = h(g(x))

f is convex if
• g convex, h convex, h̃ nondecreasing
• g concave, h convex, h̃ nonincreasing

proof (for n = 1, differentiable g, h)

f″(x) = h″(g(x))g′(x)² + h′(g(x))g″(x)

note: monotonicity must hold for extended-value extension h̃

examples
• exp g(x) is convex if g is convex
• 1/g(x) is convex if g is concave and positive
Vector composition

composition of g : Rⁿ → R^k and h : R^k → R:

f(x) = h(g(x)) = h(g1(x), g2(x), . . . , gk(x))

f is convex if
• gi convex, h convex, h̃ nondecreasing in each argument
• gi concave, h convex, h̃ nonincreasing in each argument

proof (for n = 1, differentiable g, h)

f″(x) = g′(x)ᵀ∇²h(g(x))g′(x) + ∇h(g(x))ᵀg″(x)

examples
• Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
• log Σ_{i=1}^m exp gi(x) is convex if gi are convex
Minimization

if f(x, y) is convex in (x, y) and C is a convex set, then

g(x) = inf_{y∈C} f(x, y)

is convex

examples
• f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with

[A B; Bᵀ C] ⪰ 0,  C ≻ 0

minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A − BC⁻¹Bᵀ)x

g is convex, hence Schur complement A − BC⁻¹Bᵀ ⪰ 0
• distance to a set: dist(x, S) = inf_{y∈S} ‖x − y‖ is convex if S is convex
Perspective

the perspective of a function f : Rⁿ → R is the function g : Rⁿ × R → R,

g(x, t) = t f(x/t),  dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

examples
• f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
• negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²++
• if f is convex, then

g(x) = (cᵀx + d) f( (Ax + b)/(cᵀx + d) )

is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
The conjugate function

the conjugate of a function f is

f*(y) = sup_{x∈dom f} (yᵀx − f(x))

(figure: f*(y) is the maximum gap between the linear function yᵀx and f(x))

• f* is convex (even if f is not)
• will be useful in chapter 5
examples

• negative logarithm f(x) = −log x

f*(y) = sup_{x>0} (xy + log x)
      = −1 − log(−y) if y < 0, ∞ otherwise
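The closed form f*(y) = −1 − log(−y) for y < 0 can be checked against a brute-force maximization of xy + log x over a fine grid; a small NumPy sketch:

```python
import numpy as np

# conjugate of f(x) = -log x:  f*(y) = sup_{x>0} (xy + log x)
y = -2.0
xs = np.linspace(1e-6, 10.0, 2_000_000)
numeric = (xs * y + np.log(xs)).max()

# closed form for y < 0 (the supremum is attained at x = -1/y)
closed_form = -1.0 - np.log(-y)

assert np.isclose(numeric, closed_form, atol=1e-6)
```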
Quasiconvex functions

f : Rⁿ → R is quasiconvex if dom f is convex and the sublevel sets

Sα = {x ∈ dom f | f(x) ≤ α}

are convex for all α

(figure: a quasiconvex function on R with sublevel sets [a, b] and (−∞, c])

• f is quasiconcave if −f is quasiconvex
• f is quasilinear if it is quasiconvex and quasiconcave
Examples

• √|x| is quasiconvex on R
• ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
• log x is quasilinear on R++
• f(x1, x2) = x1x2 is quasiconcave on R²++
• linear-fractional function

f(x) = (aᵀx + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}

is quasilinear
• distance ratio

f(x) = ‖x − a‖₂/‖x − b‖₂,  dom f = {x | ‖x − a‖₂ ≤ ‖x − b‖₂}

is quasiconvex
internal rate of return

• cash flow x = (x0, . . . , xn); xi is payment in period i (to us if xi > 0)
• we assume x0 < 0 and x0 + x1 + · · · + xn > 0
• present value of cash flow x, for interest rate r:

PV(x, r) = Σ_{i=0}^n (1 + r)^{−i} xi

• internal rate of return is smallest interest rate for which PV(x, r) = 0:

IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}

IRR is quasiconcave: superlevel set is intersection of open halfspaces

IRR(x) ≥ R ⟺ Σ_{i=0}^n (1 + r)^{−i} xi > 0 for 0 ≤ r < R
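IRR itself can be computed by bisection, since PV(x, r) is decreasing in r for a cash flow satisfying the sign assumptions above; a plain-Python sketch on a made-up cash flow:

```python
# present value of cash flow x at interest rate r
def pv(x, r):
    return sum(xi / (1 + r) ** i for i, xi in enumerate(x))

# bisection: PV > 0 means r is below the IRR, PV <= 0 means at or above it
def irr(x, lo=0.0, hi=10.0, tol=1e-10):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if pv(x, mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# invest 1 now, receive 0.6 in each of the next two periods
x = [-1.0, 0.6, 0.6]
r = irr(x)
assert abs(pv(x, r)) < 1e-8
```

For this cash flow the root of PV is at r ≈ 0.1307, i.e., about a 13% internal rate of return.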
Properties

modified Jensen inequality: for quasiconvex f

0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with cvx domain is quasiconvex iff

f(y) ≤ f(x) ⟹ ∇f(x)ᵀ(y − x) ≤ 0

(figure: ∇f(x) defines a supporting hyperplane to the sublevel set at x)

sums of quasiconvex functions are not necessarily quasiconvex
Log-concave and log-convex functions

a positive function f is log-concave if log f is concave:

f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ} for 0 ≤ θ ≤ 1

f is log-convex if log f is convex

• powers: x^a on R++ is log-convex for a ≤ 0, log-concave for a ≥ 0
• many common probability densities are log-concave, e.g., normal:

f(x) = (1/√((2π)ⁿ det Σ)) e^{−(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄)}

• cumulative Gaussian distribution function Φ is log-concave

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du
Properties of log-concave functions

• twice differentiable f with convex domain is log-concave if and only if

f(x)∇²f(x) ⪯ ∇f(x)∇f(x)ᵀ

for all x ∈ dom f
• product of log-concave functions is log-concave
• sum of log-concave functions is not always log-concave
• integration: if f : Rⁿ × R^m → R is log-concave, then

g(x) = ∫ f(x, y) dy

is log-concave (not easy to show)
consequences of integration property

• convolution f ∗ g of log-concave functions f, g is log-concave

(f ∗ g)(x) = ∫ f(x − y)g(y) dy

• if C ⊆ Rⁿ convex and y is a random variable with log-concave pdf then

f(x) = prob(x + y ∈ C)

is log-concave

proof: write f(x) as integral of product of log-concave functions

f(x) = ∫ g(x + y)p(y) dy,  g(u) = 1 if u ∈ C, 0 if u ∉ C

where p is the pdf of y
example: yield function

Y(x) = prob(x + w ∈ S)

• x ∈ Rⁿ: nominal parameter values for product
• w ∈ Rⁿ: random variations of parameters in manufactured product
• S: set of acceptable values

if S is convex and w has a log-concave pdf, then
• Y is log-concave
• yield regions {x | Y(x) ≥ α} are convex
Convexity with respect to generalized inequalities

f : Rⁿ → R^m is K-convex if dom f is convex and

f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, 0 ≤ θ ≤ 1

example f : S^m → S^m, f(X) = X² is S^m+-convex

proof: for fixed z ∈ R^m, zᵀX²z = ‖Xz‖₂² is convex in X, i.e.,

zᵀ(θX + (1 − θ)Y)²z ≤ θzᵀX²z + (1 − θ)zᵀY²z

for X, Y ∈ S^m, 0 ≤ θ ≤ 1

therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²
Convex Optimization Boyd & Vandenberghe

4. Convex optimization problems

optimization problem in standard form
convex optimization problems
quasiconvex optimization
linear optimization
quadratic optimization
geometric programming
generalized inequality constraints
semidefinite programming
vector optimization
Optimization problem in standard form

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

• x ∈ Rⁿ is the optimization variable
• f0 : Rⁿ → R is the objective or cost function
• fi : Rⁿ → R, i = 1, . . . , m, are the inequality constraint functions
• hi : Rⁿ → R are the equality constraint functions

optimal value:

p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p}

• p⋆ = ∞ if problem is infeasible (no x satisfies the constraints)
• p⋆ = −∞ if problem is unbounded below
Optimal and locally optimal points

x is feasible if x ∈ dom f0 and it satisfies the constraints

a feasible x is optimal if f0(x) = p⋆; Xopt is the set of optimal points

x is locally optimal if there is an R > 0 such that x is optimal for

minimize (over z) f0(z)
subject to fi(z) ≤ 0, i = 1, . . . , m, hi(z) = 0, i = 1, . . . , p
‖z − x‖₂ ≤ R

examples (with n = 1, m = p = 0)
• f0(x) = 1/x, dom f0 = R++: p⋆ = 0, no optimal point
• f0(x) = −log x, dom f0 = R++: p⋆ = −∞
• f0(x) = x log x, dom f0 = R++: p⋆ = −1/e, x = 1/e is optimal
• f0(x) = x³ − 3x: p⋆ = −∞, local optimum at x = 1
Implicit constraints

the standard form optimization problem has an implicit constraint

x ∈ D = ∩_{i=0}^m dom fi ∩ ∩_{i=1}^p dom hi

• we call D the domain of the problem
• the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
• a problem is unconstrained if it has no explicit constraints (m = p = 0)

example:

minimize f0(x) = −Σ_{i=1}^k log(bi − aiᵀx)

is an unconstrained problem with implicit constraints aiᵀx < bi
Feasibility problem

find x
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

can be considered a special case of the general problem with f0(x) = 0:

minimize 0
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

• p⋆ = 0 if constraints are feasible; any feasible x is optimal
• p⋆ = ∞ if constraints are infeasible
Convex optimization problem

standard form convex optimization problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
aiᵀx = bi, i = 1, . . . , p

• f0, f1, . . . , fm are convex; equality constraints are affine
• problem is quasiconvex if f0 is quasiconvex (and f1, . . . , fm convex)

often written as

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

important property: feasible set of a convex optimization problem is convex
example

minimize f0(x) = x1² + x2²
subject to f1(x) = x1/(1 + x2²) ≤ 0
h1(x) = (x1 + x2)² = 0

• f0 is convex; feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex
• not a convex problem (according to our definition): f1 is not convex, h1 is not affine
• equivalent (but not identical) to the convex problem

minimize x1² + x2²
subject to x1 ≤ 0
x1 + x2 = 0
Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and there exists a y with f0(y) < f0(x)

x locally optimal means there is an R > 0 such that

z feasible, ‖z − x‖₂ ≤ R ⟹ f0(z) ≥ f0(x)

consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖₂)

• ‖y − x‖₂ > R, so 0 < θ < 1/2
• z is a convex combination of two feasible points, hence also feasible
• ‖z − x‖₂ = R/2 and

f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x)

which contradicts our assumption that x is locally optimal
Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and

∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y

if nonzero, ∇f0(x) defines a supporting hyperplane to feasible set X at x
• unconstrained problem: x is optimal if and only if

x ∈ dom f0,  ∇f0(x) = 0

• equality constrained problem

minimize f0(x) subject to Ax = b

x is optimal if and only if there exists a ν such that

x ∈ dom f0,  Ax = b,  ∇f0(x) + Aᵀν = 0

• minimization over nonnegative orthant

minimize f0(x) subject to x ⪰ 0

x is optimal if and only if

x ∈ dom f0,  x ⪰ 0,  ∇f0(x)i ≥ 0 if xi = 0,  ∇f0(x)i = 0 if xi > 0
Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

some common transformations that preserve convexity:

• eliminating equality constraints

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

is equivalent to

minimize (over z) f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0, i = 1, . . . , m

where F and x0 are such that

Ax = b ⟺ x = Fz + x0 for some z
• introducing equality constraints

minimize f0(A0x + b0)
subject to fi(Aix + bi) ≤ 0, i = 1, . . . , m

is equivalent to

minimize (over x, yi) f0(y0)
subject to fi(yi) ≤ 0, i = 1, . . . , m
yi = Aix + bi, i = 0, 1, . . . , m

• introducing slack variables for linear inequalities

minimize f0(x)
subject to aiᵀx ≤ bi, i = 1, . . . , m

is equivalent to

minimize (over x, s) f0(x)
subject to aiᵀx + si = bi, i = 1, . . . , m
si ≥ 0, i = 1, . . . , m
• epigraph form: standard form convex problem is equivalent to

minimize (over x, t) t
subject to f0(x) − t ≤ 0
fi(x) ≤ 0, i = 1, . . . , m
Ax = b

• minimizing over some variables

minimize f0(x1, x2)
subject to fi(x1) ≤ 0, i = 1, . . . , m

is equivalent to

minimize f̃0(x1)
subject to fi(x1) ≤ 0, i = 1, . . . , m

where f̃0(x1) = inf_{x2} f0(x1, x2)
Quasiconvex optimization

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

with f0 : Rⁿ → R quasiconvex, f1, . . . , fm convex

can have locally optimal points that are not (globally) optimal

(figure: a quasiconvex f0 with a locally optimal point (x, f0(x)) that is not globally optimal)
quasiconvex optimization via convex feasibility problems

φt(x) ≤ 0, fi(x) ≤ 0, i = 1, . . . , m, Ax = b (1)

where φt is a family of convex functions satisfying f0(x) ≤ t ⟺ φt(x) ≤ 0

• for fixed t, a convex feasibility problem in x
• if feasible, we can conclude that t ≥ p⋆; if infeasible, t ≤ p⋆
Bisection method for quasiconvex optimization

given l ≤ p⋆, u ≥ p⋆, tolerance ε > 0.
repeat
1. t := (l + u)/2.
2. Solve the convex feasibility problem (1).
3. if (1) is feasible, u := t; else l := t.
until u − l ≤ ε.

requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are initial values)
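The algorithm can be sketched in Python; the feasibility subproblem is solved exactly here because the example is a toy 1-D problem (minimize |x − 3| subject to x ≤ 2, so p⋆ = 1), not a general convex program:

```python
import math

def feasible(t):
    # is there an x with |x - 3| <= t and x <= 2?  i.e. 3 - t <= x <= 2
    return 3 - t <= 2

def bisect(lo, hi, eps=1e-6):
    # lo is always infeasible (lo < p*), hi always feasible (hi >= p*)
    iters = 0
    while hi - lo > eps:
        t = (lo + hi) / 2
        lo, hi = (lo, t) if feasible(t) else (t, hi)
        iters += 1
    return hi, iters

p_star, iters = bisect(0.0, 4.0)
assert abs(p_star - 1.0) <= 1e-6
assert iters == math.ceil(math.log2(4.0 / 1e-6))   # the iteration count above
```

The final assertion matches the ⌈log₂((u − l)/ε)⌉ bound from the slide: halving an interval of width 4 down to 10⁻⁶ takes 22 feasibility solves.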
Linear program (LP)

minimize cᵀx + d
subject to Gx ⪯ h
Ax = b

• convex problem with affine objective and constraint functions
• feasible set is a polyhedron

(figure: polyhedron P with objective direction −c; an optimal x⋆ lies on the boundary)
Examples
diet problem: choose quantities x1, . . . , xn of n foods

one unit of food j costs cj, contains amount aij of nutrient i
healthy diet requires nutrient i in quantity at least bi

to find cheapest healthy diet,

minimize c^T x
subject to Ax ⪰ b, x ⪰ 0
piecewise-linear minimization
minimize max_{i=1,...,m} (ai^T x + bi)

equivalent to an LP

minimize t
subject to ai^T x + bi ≤ t, i = 1, . . . , m
Convex optimization problems 418
Chebyshev center of a polyhedron
Chebyshev center of
P = {x | ai^T x ≤ bi, i = 1, . . . , m}

is center of largest inscribed ball

B = {xc + u | ||u||2 ≤ r}

ai^T x ≤ bi for all x ∈ B if and only if

sup{ai^T (xc + u) | ||u||2 ≤ r} = ai^T xc + r||ai||2 ≤ bi

hence, xc, r can be determined by solving the LP

maximize r
subject to ai^T xc + r||ai||2 ≤ bi, i = 1, . . . , m
Convex optimization problems 419
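The Chebyshev-center LP above is easy to set up with an off-the-shelf LP solver. A minimal sketch, assuming SciPy is available; the polyhedron here (the unit square) is made up for illustration, where the center (0.5, 0.5) and radius 0.5 are known by symmetry.

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev center via the LP above: maximize r over (xc, r) subject to
#   a_i^T xc + r ||a_i||_2 <= b_i.
# Example polyhedron (assumption for illustration): the unit square
#   0 <= x1 <= 1, 0 <= x2 <= 1, written as a_i^T x <= b_i.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])

norms = np.linalg.norm(A, axis=1)
# variables are (xc_1, xc_2, r); maximize r  <=>  minimize -r
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([A, norms.reshape(-1, 1)])   # a_i^T xc + r ||a_i||_2 <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
xc, r = res.x[:2], res.x[2]
```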
Linear-fractional program
minimize f0(x)
subject to Gx ⪯ h
           Ax = b

linear-fractional program

f0(x) = (c^T x + d)/(e^T x + f),   dom f0(x) = {x | e^T x + f > 0}

a quasiconvex optimization problem; can be solved by bisection
also equivalent to the LP (variables y, z)

minimize c^T y + dz
subject to Gy ⪯ hz
           Ay = bz
           e^T y + fz = 1
           z ≥ 0
Convex optimization problems 420
generalized linear-fractional program
f0(x) = max_{i=1,...,r} (ci^T x + di)/(ei^T x + fi),   dom f0(x) = {x | ei^T x + fi > 0, i = 1, . . . , r}
a quasiconvex optimization problem; can be solved by bisection
example: Von Neumann model of a growing economy
maximize (over x, x+) min_{i=1,...,n} xi+/xi
subject to x+ ⪰ 0, Bx+ ⪯ Ax

x, x+ ∈ R^n: activity levels of n sectors, in current and next period
(Ax)i, (Bx+)i: produced, resp. consumed, amounts of good i
xi+/xi: growth rate of sector i

allocate activity to maximize growth rate of slowest growing sector
Convex optimization problems 421
Quadratic program (QP)
minimize (1/2) x^T P x + q^T x + r
subject to Gx ⪯ h
           Ax = b

P ∈ S^n_+, so objective is convex quadratic
minimize a convex quadratic function over a polyhedron

(figure: polyhedron P with level curves of f0 and optimal point x*)
Convex optimization problems 422
Examples
least-squares

minimize ||Ax − b||2^2

analytical solution x* = A†b (A† is pseudo-inverse)
can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

minimize c̄^T x + γ x^T Σ x = E c^T x + γ var(c^T x)
subject to Gx ⪯ h, Ax = b

c is random vector with mean c̄ and covariance Σ
hence, c^T x is random variable with mean c̄^T x and variance x^T Σ x
γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)
Convex optimization problems 423
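The least-squares formula above can be checked numerically. A small sketch on random data (dimensions are arbitrary): the pseudo-inverse solution satisfies the normal equations and, for full-rank A, matches (A^T A)^{-1} A^T b.

```python
import numpy as np

# Numeric check of the analytical least-squares solution x* = pinv(A) b
# on a small random instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_star = np.linalg.pinv(A) @ b
# x* satisfies the normal equations A^T A x = A^T b ...
residual_normal_eq = A.T @ A @ x_star - A.T @ b
# ... and matches the full-rank closed form (A^T A)^{-1} A^T b
x_formula = np.linalg.solve(A.T @ A, A.T @ b)
```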
Quadratically constrained quadratic program (QCQP)
minimize (1/2) x^T P0 x + q0^T x + r0
subject to (1/2) x^T Pi x + qi^T x + ri ≤ 0, i = 1, . . . , m
           Ax = b

Pi ∈ S^n_+; objective and constraints are convex quadratic
if P1, . . . , Pm ∈ S^n_++, feasible region is intersection of m ellipsoids and an affine set
Convex optimization problems 424
Second-order cone programming
minimize f^T x
subject to ||Ai x + bi||2 ≤ ci^T x + di, i = 1, . . . , m
           Fx = g

(Ai ∈ R^{ni×n}, F ∈ R^{p×n})

inequalities are called second-order cone (SOC) constraints:

(Ai x + bi, ci^T x + di) ∈ second-order cone in R^{ni+1}

for ni = 0, reduces to an LP; if ci = 0, reduces to a QCQP
more general than QCQP and LP
Convex optimization problems 425
Robust linear programming
the parameters in optimization problems are often uncertain, e.g., in an LP
minimize c^T x
subject to ai^T x ≤ bi, i = 1, . . . , m,

there can be uncertainty in c, ai, bi

two common approaches to handling uncertainty (in ai, for simplicity)

deterministic model: constraints must hold for all ai ∈ Ei

minimize c^T x
subject to ai^T x ≤ bi for all ai ∈ Ei, i = 1, . . . , m,

stochastic model: ai is random variable; constraints must hold with probability η

minimize c^T x
subject to prob(ai^T x ≤ bi) ≥ η, i = 1, . . . , m
Convex optimization problems 426
deterministic approach via SOCP
choose an ellipsoid as Ei:

Ei = {āi + Pi u | ||u||2 ≤ 1}   (āi ∈ R^n, Pi ∈ R^{n×n})

center is āi, semi-axes determined by singular values/vectors of Pi

robust LP

minimize c^T x
subject to ai^T x ≤ bi for all ai ∈ Ei, i = 1, . . . , m

is equivalent to the SOCP

minimize c^T x
subject to āi^T x + ||Pi^T x||2 ≤ bi, i = 1, . . . , m

(follows from sup_{||u||2≤1} (āi + Pi u)^T x = āi^T x + ||Pi^T x||2)
Convex optimization problems 427
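The supremum identity that drives the SOCP reformulation above can be verified directly: the maximizing perturbation is u* = P^T x / ||P^T x||2, and no sampled u on the unit sphere does better. Random data here are purely illustrative.

```python
import numpy as np

# Check of sup_{||u||2<=1} (a + P u)^T x = a^T x + ||P^T x||2
# on a random instance (all data made up for illustration).
rng = np.random.default_rng(1)
n = 4
a, x = rng.standard_normal(n), rng.standard_normal(n)
P = rng.standard_normal((n, n))

closed_form = a @ x + np.linalg.norm(P.T @ x)

# the supremum is attained at u* = P^T x / ||P^T x||2
u_star = P.T @ x / np.linalg.norm(P.T @ x)
attained = (a + P @ u_star) @ x

# sampled u on the unit sphere never exceed the closed form
samples = rng.standard_normal((1000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
sampled_max = (a @ x + samples @ (P.T @ x)).max()
```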
stochastic approach via SOCP
assume ai is Gaussian with mean āi, covariance Σi (ai ∼ N(āi, Σi))

ai^T x is Gaussian r.v. with mean āi^T x, variance x^T Σi x; hence

prob(ai^T x ≤ bi) = Φ((bi − āi^T x) / ||Σi^{1/2} x||2)

where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt is CDF of N(0, 1)

robust LP

minimize c^T x
subject to prob(ai^T x ≤ bi) ≥ η, i = 1, . . . , m,

with η ≥ 1/2, is equivalent to the SOCP

minimize c^T x
subject to āi^T x + Φ^{−1}(η) ||Σi^{1/2} x||2 ≤ bi, i = 1, . . . , m
Convex optimization problems 428
Geometric programming
monomial function
f(x) = c x1^{a1} x2^{a2} · · · xn^{an},   dom f = R^n_++

with c > 0; exponent ai can be any real number

posynomial function: sum of monomials

f(x) = Σ_{k=1}^{K} ck x1^{a1k} x2^{a2k} · · · xn^{ank},   dom f = R^n_++

geometric program (GP)

minimize f0(x)
subject to fi(x) ≤ 1, i = 1, . . . , m
           hi(x) = 1, i = 1, . . . , p
with fi posynomial, hi monomial
Convex optimization problems 429
Geometric program in convex form
change variables to yi= log xi, and take logarithm of cost, constraints
monomial f(x) = c x1^{a1} · · · xn^{an} transforms to

log f(e^{y1}, . . . , e^{yn}) = a^T y + b   (b = log c)

posynomial f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} · · · xn^{ank} transforms to

log f(e^{y1}, . . . , e^{yn}) = log Σ_{k=1}^K e^{ak^T y + bk}   (bk = log ck)

geometric program transforms to convex problem

minimize log Σ_{k=1}^K exp(a0k^T y + b0k)
subject to log Σ_{k=1}^K exp(aik^T y + bik) ≤ 0, i = 1, . . . , m
           Gy + d = 0
Convex optimization problems 430
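The log-log change of variables above can be sanity-checked numerically: evaluating a posynomial at x = e^y and taking the log gives exactly the log-sum-exp of affine functions of y. The two-term posynomial here is made up for illustration.

```python
import numpy as np

# Numeric check of the posynomial log-log transform.
# Hypothetical posynomial: f(x) = 2 x1^0.5 x2^{-1} + 3 x1^2 x2^0.3
c = np.array([2.0, 3.0])
Aexp = np.array([[0.5, -1.0],
                 [2.0, 0.3]])       # rows a_k of exponent vectors

y = np.array([0.7, -0.2])            # y = log x
x = np.exp(y)

f = sum(ck * np.prod(x ** ak) for ck, ak in zip(c, Aexp))
# log f(e^y) = log sum_k exp(a_k^T y + b_k), with b_k = log c_k
lse = np.log(np.sum(np.exp(Aexp @ y + np.log(c))))
```

The log-sum-exp form is convex in y, which is what makes the transformed GP a convex problem.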
Design of cantilever beam
(figure: cantilever beam divided into segments 4, 3, 2, 1, with vertical force F applied at the right end)
N segments with unit lengths, rectangular cross-sections of size wi × hi
given vertical force F applied at the right end

design problem

minimize total weight
subject to upper & lower bounds on wi, hi
           upper & lower bounds on aspect ratios hi/wi
           upper bound on stress in each segment
           upper bound on vertical deflection at the end of the beam
variables: wi, hi for i= 1, . . . , N
Convex optimization problems 431
objective and constraint functions
total weight w1h1 + · · · + wNhN is posynomial
aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials
maximum stress in segment i is given by 6iF/(wi hi^2), a monomial
the vertical deflection yi and slope vi of central axis at the right end of segment i are defined recursively as

vi = 12(i − 1/2) F/(E wi hi^3) + vi+1
yi = 6(i − 1/3) F/(E wi hi^3) + vi+1 + yi+1

for i = N, N−1, . . . , 1, with vN+1 = yN+1 = 0 (E is Young's modulus)

vi and yi are posynomial functions of w, h
Convex optimization problems 432
formulation as a GP
minimize w1h1 + · · · + wNhN
subject to wmax^{−1} wi ≤ 1, wmin wi^{−1} ≤ 1, i = 1, . . . , N
           hmax^{−1} hi ≤ 1, hmin hi^{−1} ≤ 1, i = 1, . . . , N
           Smax^{−1} wi^{−1} hi ≤ 1, Smin wi hi^{−1} ≤ 1, i = 1, . . . , N
           6iF σmax^{−1} wi^{−1} hi^{−2} ≤ 1, i = 1, . . . , N
           ymax^{−1} y1 ≤ 1

note

we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax as

wmin/wi ≤ 1, wi/wmax ≤ 1, hmin/hi ≤ 1, hi/hmax ≤ 1

we write Smin ≤ hi/wi ≤ Smax as

Smin wi/hi ≤ 1, hi/(wi Smax) ≤ 1
Convex optimization problems 433
Minimizing spectral radius of nonnegative matrix
Perron-Frobenius eigenvalue λpf(A)

exists for (elementwise) positive A ∈ R^{n×n}

a real, positive eigenvalue of A, equal to spectral radius maxi |λi(A)|
determines asymptotic growth (decay) rate of A^k: A^k ∼ λpf^k as k → ∞
alternative characterization: λpf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}

minimizing spectral radius of matrix of posynomials

minimize λpf(A(x)), where the elements A(x)ij are posynomials of x
equivalent geometric program:

minimize λ
subject to Σ_{j=1}^n A(x)ij vj/(λvi) ≤ 1, i = 1, . . . , n

variables λ, v, x
Convex optimization problems 434
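The stated properties of the Perron-Frobenius eigenvalue are easy to confirm on a small positive matrix (entries arbitrary): it is real and positive, equals the spectral radius, and has a strictly positive eigenvector.

```python
import numpy as np

# Check the Perron-Frobenius properties for a small positive matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(np.abs(eigvals))        # index of spectral radius
lam_pf = eigvals[k].real
v = eigvecs[:, k].real
v = v if v[0] > 0 else -v             # fix the sign of the eigenvector

# lam_pf is real and positive, equals max |lambda_i(A)|,
# and A v = lam_pf v with v > 0 (elementwise)
```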
Generalized inequality constraints
convex problem with generalized inequality constraints
i i i f ( )
8/10/2019 Convex Slides
104/301
minimize f0(x)subject to fi(x) Ki0, i= 1, . . . , m
Ax=b
f0:Rn R convex; fi:Rn Rki Ki-convex w.r.t. proper cone Ki same properties as standard convex problem (convex feasible set, local
optimum is global, etc.)
conic form problem: special case with affine objective and constraints
minimize cTx
subject to F x + gK0Ax=b
extends linear programming (K=Rm+ ) to nonpolyhedral cones
Convex optimization problems 435
Semidefinite program (SDP)
minimize c^T x
subject to x1F1 + x2F2 + · · · + xnFn + G ⪯ 0
           Ax = b

with Fi, G ∈ S^k

inequality constraint is called linear matrix inequality (LMI)
includes problems with multiple LMI constraints: for example,

x1F̂1 + · · · + xnF̂n + Ĝ ⪯ 0,   x1F̃1 + · · · + xnF̃n + G̃ ⪯ 0

is equivalent to single LMI

x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + · · · + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0
Convex optimization problems 436
LP and SOCP as SDP
LP and equivalent SDP
LP:   minimize c^T x
      subject to Ax ⪯ b

SDP:  minimize c^T x
      subject to diag(Ax − b) ⪯ 0

(note different interpretation of generalized inequality ⪯)

SOCP and equivalent SDP

SOCP: minimize f^T x
      subject to ||Ai x + bi||2 ≤ ci^T x + di, i = 1, . . . , m

SDP:  minimize f^T x
      subject to [ (ci^T x + di)I   Ai x + bi ; (Ai x + bi)^T   ci^T x + di ] ⪰ 0, i = 1, . . . , m

Convex optimization problems 437
Eigenvalue minimization
minimize λmax(A(x))

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ S^k)

equivalent SDP

minimize t
subject to A(x) ⪯ tI

variables x ∈ R^n, t ∈ R
follows from

λmax(A) ≤ t  ⇔  A ⪯ tI
Convex optimization problems 438
Matrix norm minimization
minimize ||A(x)||2 = (λmax(A(x)^T A(x)))^{1/2}

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ R^{p×q})

equivalent SDP

minimize t
subject to [ tI  A(x) ; A(x)^T  tI ] ⪰ 0

variables x ∈ R^n, t ∈ R
constraint follows from

||A||2 ≤ t  ⇔  A^T A ⪯ t²I, t ≥ 0
           ⇔  [ tI  A ; A^T  tI ] ⪰ 0
Convex optimization problems 439
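The block-matrix equivalence above can be checked numerically: for a random rectangular matrix, [tI A; A^T tI] is positive semidefinite exactly when t is at least the spectral norm of A.

```python
import numpy as np

# Check: ||A||2 <= t  <=>  [[tI, A], [A^T, tI]] >= 0, on a random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
norm2 = np.linalg.norm(A, 2)          # spectral norm = sqrt(lambda_max(A^T A))

def block_psd(t):
    p, q = A.shape
    M = np.block([[t * np.eye(p), A],
                  [A.T, t * np.eye(q)]])
    return np.linalg.eigvalsh(M).min() >= -1e-9

feas_above = block_psd(norm2 + 0.1)   # t > ||A||2: LMI holds
feas_below = block_psd(norm2 - 0.1)   # t < ||A||2: LMI fails
```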
Vector optimization
general vector optimization problem
minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

vector objective f0 : R^n → R^q, minimized w.r.t. proper cone K ⊆ R^q

convex vector optimization problem

minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

with f0 K-convex, f1, . . . , fm convex
Convex optimization problems 440
Optimal and Pareto optimal points
set of achievable objective values
O = {f0(x) | x feasible}

feasible x is optimal if f0(x) is the minimum value of O
feasible x is Pareto optimal if f0(x) is a minimal value of O

(figures: left, set O with minimum value f0(x*), x* is optimal; right, set O with minimal value f0(xpo), xpo is Pareto optimal)
Convex optimization problems 441
Regularized least-squares
minimize (w.r.t. R^2_+) (||Ax − b||2^2, ||x||2^2)

(figure: trade-off curve of F1(x) = ||Ax − b||2^2 versus F2(x) = ||x||2^2, with the achievable set O shaded)

example for A ∈ R^{100×10}; heavy line is formed by Pareto optimal points
Convex optimization problems 443
Risk return trade-off in portfolio optimization
minimize (w.r.t. R^2_+) (−p̄^T x, x^T Σ x)
subject to 1^T x = 1, x ⪰ 0

x ∈ R^n is investment portfolio; xi is fraction invested in asset i
p ∈ R^n is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ
p̄^T x = E r is expected return; x^T Σ x = var r is return variance

example

(figures: mean return versus standard deviation of return; allocation x(1), . . . , x(4) versus standard deviation of return)
Convex optimization problems 444
Scalarization
to find Pareto optimal points: choose λ ⪰_{K*} 0 and solve scalar problem

minimize λ^T f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem

(figure: set O with supporting hyperplanes λ1 at f0(x1) and λ2 at f0(x2); f0(x3) is a Pareto optimal value not obtained by scalarization)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ⪰_{K*} 0
Convex optimization problems 445
Scalarization for multicriterion problems
to find Pareto optimal points, minimize positive weighted sum

λ^T f0(x) = λ1F1(x) + · · · + λqFq(x)

examples

regularized least-squares problem of page 443

take λ = (1, γ) with γ > 0

minimize ||Ax − b||2^2 + γ||x||2^2

for fixed γ, a LS problem

(figure: trade-off curve of ||Ax − b||2^2 versus ||x||2^2; point for γ = 1 marked)
Convex optimization problems 446
risk-return trade-off of page 444

minimize −p̄^T x + γ x^T Σ x
subject to 1^T x = 1, x ⪰ 0

for fixed γ > 0, a quadratic program
Convex optimization problems 447
Convex Optimization Boyd & Vandenberghe
5. Duality
Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
perturbation and sensitivity analysis
examples
generalized inequalities
51
Lagrangian
standard form problem (not necessarily convex)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

weighted sum of objective and constraint functions
λi is Lagrange multiplier associated with fi(x) ≤ 0
νi is Lagrange multiplier associated with hi(x) = 0
Duality 52
Lagrange dual function
Lagrange dual function: g : R^m × R^p → R,

g(λ, ν) = inf_{x∈D} L(x, λ, ν)
        = inf_{x∈D} ( f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

f0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Duality 53
Least-norm solution of linear equations
minimize x^T x
subject to Ax = b

dual function

Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
to minimize L over x, set gradient equal to zero:

∇x L(x, ν) = 2x + A^T ν = 0  ⇒  x = −(1/2) A^T ν

plug in in L to obtain g:

g(ν) = L((−1/2) A^T ν, ν) = −(1/4) ν^T AA^T ν − b^T ν

a concave function of ν

lower bound property: p* ≥ −(1/4) ν^T AA^T ν − b^T ν for all ν
Duality 54
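The lower bound above is tight for this problem: maximizing g over ν gives ν* = −2(AA^T)^{-1}b, and g(ν*) equals the primal optimal value p* = b^T(AA^T)^{-1}b. A numeric sketch on random data (dimensions arbitrary, A wide so Ax = b is underdetermined):

```python
import numpy as np

# Numeric illustration of the lower bound and of strong duality for the
# least-norm problem above.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

# primal optimum: x* = A^T (A A^T)^{-1} b, p* = x*^T x*
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

def g(nu):
    # dual function derived above
    return -0.25 * nu @ A @ A.T @ nu - b @ nu

lb_any = g(rng.standard_normal(3))            # any nu gives a lower bound
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)  # dual optimum is tight
```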
Standard form LP
minimize c^T x
subject to Ax = b, x ⪰ 0

dual function

Lagrangian is

L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
           = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

g(λ, ν) = inf_x L(x, λ, ν) = −b^T ν if A^T ν − λ + c = 0, −∞ otherwise

g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0
Duality 55
Equality constrained norm minimization
minimize ||x||
subject to Ax = b

dual function

g(ν) = inf_x (||x|| − ν^T Ax + b^T ν) = b^T ν if ||A^T ν||* ≤ 1, −∞ otherwise

where ||v||* = sup_{||u||≤1} u^T v is dual norm of ||·||

proof: follows from inf_x (||x|| − y^T x) = 0 if ||y||* ≤ 1, −∞ otherwise

if ||y||* ≤ 1, then ||x|| − y^T x ≥ 0 for all x, with equality if x = 0
if ||y||* > 1, choose x = tu where ||u|| ≤ 1, u^T y = ||y||* > 1:

||x|| − y^T x = t(||u|| − ||y||*) → −∞  as t → ∞

lower bound property: p* ≥ b^T ν if ||A^T ν||* ≤ 1
Duality 56
Two-way partitioning
minimize x^T W x
subject to xi^2 = 1, i = 1, . . . , n

a nonconvex problem; feasible set contains 2^n discrete points
interpretation: partition {1, . . . , n} in two sets; Wij is cost of assigning i, j to the same set; −Wij is cost of assigning to different sets

dual function

g(ν) = inf_x (x^T W x + Σi νi(xi^2 − 1)) = inf_x x^T (W + diag(ν)) x − 1^T ν
     = −1^T ν if W + diag(ν) ⪰ 0, −∞ otherwise

lower bound property: p* ≥ −1^T ν if W + diag(ν) ⪰ 0

example: ν = −λmin(W)1 gives bound p* ≥ nλmin(W)
Duality 57
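For small n the bound p* ≥ nλmin(W) can be verified by brute force over all 2^n feasible points. The symmetric matrix W below is arbitrary.

```python
import numpy as np
from itertools import product

# Brute-force check of p* >= n * lambda_min(W) for a small two-way
# partitioning instance (W is an arbitrary symmetric matrix).
W = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 3.0],
              [-2.0, 3.0, 0.0]])
n = W.shape[0]

# enumerate all 2^n points with x_i in {-1, +1}
p_star = min(np.array(x) @ W @ np.array(x)
             for x in product([-1.0, 1.0], repeat=n))
bound = n * np.linalg.eigvalsh(W).min()
```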
Lagrange dual and conjugate function
minimize f0(x)
subject to Ax ⪯ b, Cx = d

dual function

g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (A^T λ + C^T ν)^T x − b^T λ − d^T ν )
        = −f0*(−A^T λ − C^T ν) − b^T λ − d^T ν

recall definition of conjugate f*(y) = sup_{x∈dom f} (y^T x − f(x))

simplifies derivation of dual if conjugate of f0 is known

example: entropy maximization

f0(x) = Σ_{i=1}^n xi log xi,   f0*(y) = Σ_{i=1}^n e^{yi − 1}
Duality 58
The dual problem
Lagrange dual problem
maximize g(λ, ν)
subject to λ ⪰ 0

finds best lower bound on p*, obtained from Lagrange dual function
a convex optimization problem; optimal value denoted d*
λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 55)

minimize c^T x            maximize −b^T ν
subject to Ax = b          subject to A^T ν + c ⪰ 0
           x ⪰ 0
Duality 59
Slater's constraint qualification

strong duality holds for a convex problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

if it is strictly feasible, i.e.,

∃x ∈ int D :  fi(x) < 0, i = 1, . . . , m,  Ax = b

also guarantees that the dual optimum is attained (if p* > −∞)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
there exist many other types of constraint qualifications
Duality 511
Inequality form LP
primal problem

minimize c^T x
subject to Ax ⪯ b

dual function

g(λ) = inf_x ((c + A^T λ)^T x − b^T λ) = −b^T λ if A^T λ + c = 0, −∞ otherwise

dual problem

maximize −b^T λ
subject to A^T λ + c = 0, λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* except when primal and dual are infeasible
Duality 512
Quadratic program
primal problem (assume P ∈ S^n_++)

minimize x^T P x
subject to Ax ⪯ b

dual function

g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

dual problem

maximize −(1/4) λ^T A P^{−1} A^T λ − b^T λ
subject to λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* always
Duality 513
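Weak duality for this QP is easy to observe numerically: for any λ ⪰ 0, g(λ) is below the objective value of every primal-feasible point. The random instance below is purely illustrative.

```python
import numpy as np

# Weak duality check for the QP above: g(lambda) <= x^T P x for every
# primal-feasible x and every lambda >= 0.
rng = np.random.default_rng(4)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)          # P in S^n_++
A = rng.standard_normal((m, n))
x_feas = rng.standard_normal(n)
b = A @ x_feas + 1.0                  # A x_feas < b: strictly feasible

def g(lam):
    # dual function derived above
    return -0.25 * lam @ A @ np.linalg.solve(P, A.T @ lam) - b @ lam

lam = np.abs(rng.standard_normal(m))  # an arbitrary lambda >= 0
primal_val = x_feas @ P @ x_feas
dual_val = g(lam)
```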
A nonconvex problem with strong duality
minimize x^T A x + 2b^T x
subject to x^T x ≤ 1

A ⋡ 0, hence nonconvex

dual function: g(λ) = inf_x ( x^T (A + λI) x + 2b^T x − λ )

unbounded below if A + λI ⋡ 0 or if A + λI ⪰ 0 and b ∉ R(A + λI)
minimized by x = −(A + λI)† b otherwise: g(λ) = −b^T (A + λI)† b − λ

dual problem and equivalent SDP:

maximize −b^T (A + λI)† b − λ          maximize −t − λ
subject to A + λI ⪰ 0                   subject to [ A + λI  b ; b^T  t ] ⪰ 0
           b ∈ R(A + λI)

strong duality although primal problem is not convex (not easy to show)
Duality 514
Geometric interpretation
for simplicity, consider problem with one constraint f1(x) ≤ 0

interpretation of dual function:

g(λ) = inf_{(u,t)∈G} (t + λu),  where  G = {(f1(x), f0(x)) | x ∈ D}

(figures: set G in the (u, t)-plane, with p*, d*, g(λ), and the supporting line λu + t = g(λ))

λu + t = g(λ) is (non-vertical) supporting hyperplane to G
hyperplane intersects t-axis at t = g(λ)
Duality 515
epigraph variation: same interpretation if G is replaced with

A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}

(figure: set A with supporting line λu + t = g(λ) through (0, p*))

strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p*)
for convex problem, A is convex, hence has supp. hyperplane at (0, p*)
Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p*) must be non-vertical
Complementary slackness

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal:

f0(x*) = g(λ*, ν*) = inf_x ( f0(x) + Σ_{i=1}^m λi* fi(x) + Σ_{i=1}^p νi* hi(x) )
       ≤ f0(x*) + Σ_{i=1}^m λi* fi(x*) + Σ_{i=1}^p νi* hi(x*)
       ≤ f0(x*)

hence, the two inequalities hold with equality

x* minimizes L(x, λ*, ν*)
λi* fi(x*) = 0 for i = 1, . . . , m (known as complementary slackness):

λi* > 0 ⇒ fi(x*) = 0,   fi(x*) < 0 ⇒ λi* = 0
Karush-Kuhn-Tucker (KKT) conditions

the four KKT conditions (for a problem with differentiable fi, hi):

1. primal constraints: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λi fi(x) = 0, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:

∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Σ_{i=1}^p νi ∇hi(x) = 0

from page 517: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
Duality 518
KKT conditions for convex problem
if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:

x is optimal if and only if there exist λ, ν that satisfy KKT conditions

recall that Slater implies strong duality, and dual optimum is attained
generalizes optimality condition ∇f0(x) = 0 for unconstrained problem
Duality 519
example: water-filling (assume αi > 0)

minimize −Σ_{i=1}^n log(xi + αi)
subject to x ⪰ 0, 1^T x = 1

x is optimal iff x ⪰ 0, 1^T x = 1, and there exist λ ∈ R^n, ν ∈ R such that

λ ⪰ 0,   λi xi = 0,   1/(xi + αi) + λi = ν

if ν < 1/αi: λi = 0 and xi = 1/ν − αi
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
determine ν from 1^T x = Σ_{i=1}^n max{0, 1/ν − αi} = 1
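The last condition above determines ν by a one-dimensional root-finding problem, since the total allocation is monotone decreasing in ν. A minimal sketch with made-up α values, solving for ν by bisection:

```python
import numpy as np

# Water-filling solution via bisection on nu, following the optimality
# conditions above (the alpha values are made up for illustration).
alpha = np.array([0.5, 1.0, 2.0])

def total(nu):
    # 1^T x(nu) = sum_i max{0, 1/nu - alpha_i}, decreasing in nu
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

lo, hi = 1e-6, 1e6                    # total(lo) > 1 > total(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if total(mid) > 1.0:
        lo = mid                      # nu too small
    else:
        hi = mid
nu = 0.5 * (lo + hi)
x = np.maximum(0.0, 1.0 / nu - alpha)
```

For these α the water level 1/ν settles at 1.25, so x = (0.75, 0.25, 0): the third "patch" sits above the water level and receives nothing.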
perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

min.  f0(x)                              max.  g(λ, ν)
s.t.  fi(x) ≤ 0, i = 1, . . . , m         s.t.  λ ⪰ 0
      hi(x) = 0, i = 1, . . . , p

perturbed problem and its dual

min.  f0(x)                              max.  g(λ, ν) − u^T λ − v^T ν
s.t.  fi(x) ≤ ui, i = 1, . . . , m        s.t.  λ ⪰ 0
      hi(x) = vi, i = 1, . . . , p

x is primal variable; u, v are parameters
p*(u, v) is optimal value as a function of u, v
we are interested in information about p*(u, v) that we can obtain from the solution of the unperturbed problem and its dual
Duality 521
global sensitivity result

assume strong duality holds for unperturbed problem, and that λ*, ν* are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

p*(u, v) ≥ g(λ*, ν*) − u^T λ* − v^T ν*
         = p*(0, 0) − u^T λ* − v^T ν*

sensitivity interpretation

if λi* large: p* increases greatly if we tighten constraint i (ui < 0)
if λi* small: p* does not decrease much if we loosen constraint i (ui > 0)
if νi* large and positive: p* increases greatly if we take vi < 0; if νi* large and negative: p* increases greatly if we take vi > 0
if νi* small and positive: p* does not decrease much if we take vi > 0; if νi* small and negative: p* does not decrease much if we take vi < 0
local sensitivity: if (in addition) p*(u, v) is differentiable at (0, 0), then

λi* = −∂p*(0, 0)/∂ui,   νi* = −∂p*(0, 0)/∂vi

proof (for λi*): from global sensitivity result,

∂p*(0, 0)/∂ui = lim_{t↘0} (p*(tei, 0) − p*(0, 0))/t ≥ −λi*
∂p*(0, 0)/∂ui = lim_{t↗0} (p*(tei, 0) − p*(0, 0))/t ≤ −λi*

hence, equality

(figure: p*(u) for a problem with one (inequality) constraint; the line p*(0) − λ*u is tangent at u = 0)
Duality 523
Duality and problem reformulations
equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions

e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
Duality 524
Introducing new variables and equality constraints
minimize f0(Ax + b)

dual function is constant: g = inf_x L(x) = inf_x f0(Ax + b) = p*
we have strong duality, but dual is quite useless

reformulated problem and its dual

minimize f0(y)                     maximize b^T ν − f0*(ν)
subject to Ax + b − y = 0          subject to A^T ν = 0

dual function follows from

g(ν) = inf_{x,y} ( f0(y) − ν^T y + ν^T Ax + b^T ν )
     = −f0*(ν) + b^T ν if A^T ν = 0, −∞ otherwise
Duality 525
norm approximation problem: minimize ||Ax − b||

minimize ||y||
subject to y = Ax − b

can look up conjugate of ||·||, or derive dual directly

g(ν) = inf_{x,y} ( ||y|| + ν^T y − ν^T Ax + b^T ν )
     = b^T ν + inf_y (||y|| + ν^T y) if A^T ν = 0, −∞ otherwise
     = b^T ν if A^T ν = 0, ||ν||* ≤ 1, −∞ otherwise

(see page 54)

dual of norm approximation problem

maximize b^T ν
subject to A^T ν = 0, ||ν||* ≤ 1
Duality 526
Implicit constraints
LP with box constraints: primal and dual problem

minimize c^T x            maximize −b^T ν − 1^T λ1 − 1^T λ2
subject to Ax = b          subject to c + A^T ν + λ1 − λ2 = 0
           −1 ⪯ x ⪯ 1                 λ1 ⪰ 0, λ2 ⪰ 0

reformulation with box constraints made implicit

minimize f0(x) = c^T x if −1 ⪯ x ⪯ 1, ∞ otherwise
subject to Ax = b

dual function

g(ν) = inf_{−1⪯x⪯1} ( c^T x + ν^T (Ax − b) )
     = −b^T ν − ||A^T ν + c||1

dual problem: maximize −b^T ν − ||A^T ν + c||1
Duality 527
Problems with generalized inequalities
minimize f0(x)
subject to fi(x) ⪯_{Ki} 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

⪯_{Ki} is generalized inequality on R^{ki}

definitions are parallel to scalar case:

Lagrange multiplier for fi(x) ⪯_{Ki} 0 is vector λi ∈ R^{ki}
Lagrangian L : R^n × R^{k1} × · · · × R^{km} × R^p → R, is defined as

L(x, λ1, · · · , λm, ν) = f0(x) + Σ_{i=1}^m λi^T fi(x) + Σ_{i=1}^p νi hi(x)

dual function g : R^{k1} × · · · × R^{km} × R^p → R, is defined as

g(λ1, . . . , λm, ν) = inf_{x∈D} L(x, λ1, · · · , λm, ν)
Duality 528
lower bound property: if λi ⪰_{Ki*} 0, then g(λ1, . . . , λm, ν) ≤ p*

proof: if x̃ is feasible and λi ⪰_{Ki*} 0, then

f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λi^T fi(x̃) + Σ_{i=1}^p νi hi(x̃)
      ≥ inf_{x∈D} L(x, λ1, . . . , λm, ν)
      = g(λ1, . . . , λm, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ1, . . . , λm, ν)

dual problem

maximize g(λ1, . . . , λm, ν)
subject to λi ⪰_{Ki*} 0, i = 1, . . . , m

weak duality: p* ≥ d* always
strong duality: p* = d* for convex problem with constraint qualification (for example, Slater's: primal problem is strictly feasible)
Duality 529
Convex Optimization Boyd & Vandenberghe
6. Approximation and fitting
norm approximation
least-norm problems
regularized approximation
robust approximation
61
Norm approximation
minimize ||Ax − b||

(A ∈ R^{m×n} with m ≥ n, ||·|| is a norm on R^m)

interpretations of solution x* = argmin_x ||Ax − b||:

geometric: Ax* is point in R(A) closest to b
estimation: linear measurement model

y = Ax + v

y are measurements, x is unknown, v is measurement error
given y = b, best guess of x is x*

optimal design: x are design variables (input), Ax is result (output)
x* is design that best approximates desired result b
Approximation and fitting 62
examples

least-squares approximation (||·||2): solution satisfies normal equations

A^T A x = A^T b

(x* = (A^T A)^{−1} A^T b if rank A = n)

Chebyshev approximation (||·||∞): can be solved as an LP

minimize t
subject to −t1 ⪯ Ax − b ⪯ t1

sum of absolute residuals approximation (||·||1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ Ax − b ⪯ y
Approximation and fitting 63
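The Chebyshev-approximation LP above can be solved with a generic LP solver. A sketch assuming SciPy; the random instance is made up, and the result is sanity-checked against the least-squares solution (whose worst residual can only be larger).

```python
import numpy as np
from scipy.optimize import linprog

# l-infinity (Chebyshev) approximation as the LP above:
#   minimize t  subject to  -t1 <= Ax - b <= t1, variables (x, t).
rng = np.random.default_rng(5)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
m, n = A.shape

c = np.r_[np.zeros(n), 1.0]
# Ax - b <= t1  and  b - Ax <= t1, stacked as A_ub [x; t] <= b_ub
A_ub = np.block([[A, -np.ones((m, 1))],
                 [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x, t = res.x[:n], res.x[n]   # t = optimal worst-case residual
```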
example (m = 100, n = 30): histogram of residuals for penalties

φ(u) = |u|,   φ(u) = u²,   φ(u) = max{0, |u| − a},   φ(u) = −log(1 − u²)

(figure: histograms of residual distributions for the p = 1, p = 2, deadzone-linear, and log-barrier penalties)

shape of penalty function has large effect on distribution of residuals
Approximation and fitting 65
Huber penalty function (with parameter M)

φhub(u) = u² if |u| ≤ M,   M(2|u| − M) if |u| > M

linear growth for large u makes approximation less sensitive to outliers

(figures)

left: Huber penalty for M = 1
right: affine function f(t) = α + βt fitted to 42 points ti, yi (circles) using quadratic (dashed) and Huber (solid) penalty
Approximation and fitting 66
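The Huber penalty above is straightforward to implement; the values below are checked against the two branches of the definition (quadratic inside the threshold, linear outside, continuous at |u| = M).

```python
import numpy as np

# The Huber penalty above, vectorized; M is the threshold parameter.
def huber(u, M=1.0):
    u = np.asarray(u, dtype=float)
    quad = u ** 2                      # |u| <= M: quadratic region
    lin = M * (2 * np.abs(u) - M)      # |u| >  M: linear growth
    return np.where(np.abs(u) <= M, quad, lin)
```

With M = 1: huber(0.5) is 0.25 (quadratic branch), huber(2) is 1·(4 − 1) = 3 (linear branch), and both branches agree at u = 1.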
Least-norm problems
minimize ||x||
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ||·|| is a norm on R^n)

interpretations of solution x* = argmin_{Ax=b} ||x||:

geometric: x* is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x* is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x* is smallest (most efficient) design that satisfies requirements
Approximation and fitting 67
examples

least-squares solution of linear equations (||·||2):
can be solved via optimality conditions

2x + A^T ν = 0,   Ax = b

minimum sum of absolute values (||·||1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ x ⪯ y, Ax = b

tends to produce sparse solution x*

extension: least-penalty problem

minimize φ(x1) + · · · + φ(xn)
subject to Ax = b

φ : R → R is convex penalty function
Approximation and fitting 68
Regularized approximation
minimize (w.r.t. R^2_+) (||Ax − b||, ||x||)

A ∈ R^{m×n}, norms on R^m and R^n can be different

interpretation: find good approximation Ax ≈ b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that ||x|| is small
optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x
Approximation and fitting 69
Scalarized problem
minimize ||Ax − b|| + γ||x||

solution for γ > 0 traces out optimal trade-off curve
other common method: minimize ||Ax − b||² + δ||x||² with δ > 0

Tikhonov regularization

minimize ||Ax − b||2² + δ||x||2²

can be solved as a least-squares problem

minimize || [A; √δ I] x − [b; 0] ||2²

solution x* = (A^T A + δI)^{−1} A^T b
Approximation and fitting 610
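The two formulations of Tikhonov regularization above agree numerically: solving the stacked least-squares problem gives the same answer as the closed form. Random data, arbitrary dimensions.

```python
import numpy as np

# Tikhonov regularization: stacked least-squares vs. the closed form
# (A^T A + delta*I)^{-1} A^T b, on random data.
rng = np.random.default_rng(6)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
delta = 0.5

x_closed = np.linalg.solve(A.T @ A + delta * np.eye(4), A.T @ b)

# stacked problem: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stack = np.vstack([A, np.sqrt(delta) * np.eye(4)])
b_stack = np.r_[b, np.zeros(4)]
x_stack = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]
```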
Optimal input design
linear dynamical system with impulse response h:

y(t) = Σ_{τ=0}^t h(τ) u(t − τ),  t = 0, 1, . . . , N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output ydes: Jtrack = Σ_{t=0}^N (y(t) − ydes(t))²
2. input magnitude: Jmag = Σ_{t=0}^N u(t)²
3. input variation: Jder = Σ_{t=0}^{N−1} (u(t + 1) − u(t))²

track desired output using a small and slowly varying input signal

regularized least-squares formulation

minimize Jtrack + δJder + ηJmag

for fixed δ, η, a least-squares problem in u(0), . . . , u(N)
Approximation and fitting 611
example: 3 solutions on optimal trade-off surface

(top) δ = 0, small η; (middle) δ = 0, larger η; (bottom) large δ

(figures: input u(t) and output y(t) versus t for the three solutions)
Approximation and fitting 612
Signal reconstruction
minimize (w.r.t. R^2_+) (||x̂ − xcor||2, φ(x̂))

x ∈ R^n is unknown signal
xcor = x + v is (known) corrupted version of x, with additive noise v
variable x̂ (reconstructed signal) is estimate of x
φ : R^n → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φquad(x̂) = Σ_{i=1}^{n−1} (x̂i+1 − x̂i)²,   φtv(x̂) = Σ_{i=1}^{n−1} |x̂i+1 − x̂i|
Approximation and fitting 613
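The two smoothing objectives above behave very differently on a sharp jump: total variation charges the jump linearly in its height, quadratic smoothing charges the square. A minimal sketch on a made-up step signal:

```python
import numpy as np

# The two smoothing objectives above, evaluated on a step signal.
def phi_quad(x):
    return np.sum(np.diff(x) ** 2)

def phi_tv(x):
    return np.sum(np.abs(np.diff(x)))

x = np.r_[np.zeros(5), 2.0 * np.ones(5)]   # one jump of height 2
# phi_quad charges 2^2 = 4 for the jump; phi_tv charges only 2
```

This is why, as the slides below note, quadratic smoothing smears sharp transitions while total variation preserves them.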
quadratic smoothing example

(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φquad(x̂))
Approximation and fitting 614
total variation reconstruction example

(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φquad(x̂))

quadratic smoothing smooths out noise and sharp transitions in signal
Approximation and fitting 615
(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φtv(x̂))

total variation smoothing preserves sharp transitions in signal
Approximation and fitting 616
Robust approximation
minimize ||Ax − b|| with uncertain A

two approaches:

stochastic: assume A is random, minimize E ||Ax − b||
worst-case: set A of possible values of A, minimize sup_{A∈A} ||Ax − b||

tractable only in special cases (certain norms ||·||, distributions, sets A)

example: A(u) = A0 + uA1

xnom minimizes ||A0x − b||2²
xstoch minimizes E ||A(u)x − b||2² with u uniform on [−1, 1]
xwc minimizes sup_{−1≤u≤1} ||A(u)x − b||2²

(figure: r(u) = ||A(u)x − b||2 versus u for xnom, xstoch, xwc)
Approximation and fitting 617
stochastic robust LS with A = Ā + U, U random, E U = 0, E U^T U = P

minimize E ||(Ā + U)x − b||2²

explicit expression for objective:

E ||Ax − b||2² = E ||Āx − b + Ux||2²
              = ||Āx − b||2² + E x^T U^T U x
              = ||Āx − b||2² + x^T P x

hence, robust LS problem is equivalent to LS problem

minimize ||Āx − b||2² + ||P^{1/2} x||2²

for P = δI, get Tikhonov regularized problem

minimize ||Āx − b||2² + δ||x||2²
Approximation and fitting 618
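The expectation identity above can be verified exactly on a finite ensemble: if the perturbations come in ± pairs (so the mean is exactly zero) and P is taken as the ensemble average of U^T U, the cross term cancels and the identity holds to floating-point precision. All data below are made up.

```python
import numpy as np

# Check of E||(Abar + U)x - b||^2 = ||Abar x - b||^2 + x^T P x over a
# finite symmetric ensemble {+U_k, -U_k}.
rng = np.random.default_rng(7)
m, n, K = 5, 3, 50
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

Us = rng.standard_normal((K, m, n))
Us = np.concatenate([Us, -Us])                 # symmetrize: E U = 0 exactly
P = np.mean([U.T @ U for U in Us], axis=0)     # ensemble E U^T U

lhs = np.mean([np.sum(((Abar + U) @ x - b) ** 2) for U in Us])
rhs = np.sum((Abar @ x - b) ** 2) + x @ P @ x
```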
worst-case robust LS with A = {Ā + u1A1 + · · · + upAp | ||u||2 ≤ 1}

minimize sup_{A∈A} ||Ax − b||2² = sup_{||u||2≤1} ||P(x)u + q(x)||2²

where P(x) = [A1x  A2x  · · ·  Apx], q(x) = Āx − b

from page 514, strong duality holds between the following problems

maximize ||Pu + q||2²              minimize t + λ
subject to ||u||2² ≤ 1             subject to [ I  P  q ; P^T  λI  0 ; q^T  0  t ] ⪰ 0

hence, robust LS problem is equivalent to SDP

minimize t + λ
subject to [ I  P(x)  q(x) ; P(x)^T  λI  0 ; q(x)^T  0  t ] ⪰ 0
Approximation and fitting 619
example: histogram of residuals

r(u) = ||(A0 + u1A1 + u2A2)x − b||2

with u uniformly distributed on unit disk, for three values of x

(figure: histograms of r(u) for xls, xtik, xrls)

xls minimizes ||A0x − b||2
xtik minimizes ||A0x − b||2² + δ||x||2² (Tikhonov solution)
xrls minimizes sup_{A∈A} ||Ax − b||2² + ||x||2²
Approximation and fitting 620
Convex Optimization Boyd & Vandenberghe
7. Statistical estimation
maximum likelihood estimation
optimal detector design
experiment design
71
Parametric distribution estimation
distribution estimation problem: estimate probability density p(y) of arandom variable from observed values parametric distribution estimation: choose from a family of densities
px(y), indexed by a parameter x
maximum likelihood estimation
maximize (over x) log px(y)

y is observed value
l(x) = log px(y) is called log-likelihood function
can add constraints x ∈ C explicitly, or define px(y) = 0 for x ∉ C
a convex optimization problem if log px(y) is concave in x for fixed y
Statistical estimation 7–2
Linear measurements with IID noise
linear measurement model
yi = aiᵀx + vi, i = 1, . . . , m
x ∈ Rⁿ is vector of unknown parameters
vi is IID measurement noise, with density p(z)
yi is measurement: y ∈ Rᵐ has density px(y) = ∏_{i=1}^m p(yi − aiᵀx)
maximum likelihood estimate: any solution x of
maximize l(x) = Σ_{i=1}^m log p(yi − aiᵀx)

(y is observed value)
Statistical estimation 7–3
examples
Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

    l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (aiᵀx − yi)²

ML estimate is LS solution
Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

    l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |aiᵀx − yi|

ML estimate is ℓ1-norm solution
uniform noise on [−a, a]:

    l(x) = −m log(2a) if |aiᵀx − yi| ≤ a, i = 1, . . . , m; −∞ otherwise

ML estimate is any x with |aiᵀx − yi| ≤ a
Statistical estimation 7–4
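The first two cases are easy to compute: Gaussian ML is ordinary least squares, and Laplacian ML is the ℓ1 problem, which can be solved as an LP (minimize 1ᵀt subject to −t ⪯ Ax − y ⪯ t). A sketch on synthetic data (the problem sizes, true x, and noise level are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 40, 3
A = rng.standard_normal((m, n))        # rows are a_i^T
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.1 * rng.standard_normal(m)

# Gaussian noise: ML estimate is the least-squares solution
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)

# Laplacian noise: ML estimate minimizes sum_i |a_i^T x - y_i|;
# LP reformulation over (x, t): minimize 1^T t  s.t.  -t <= A x - y <= t
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
x_l1 = res.x[:n]
```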
Logistic regression
random variable y {0, 1} with distribution
p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))
a, b are parameters; u ∈ Rⁿ are (observable) explanatory variables
estimation problem: estimate a, b from m observations (ui, yi)
log-likelihood function (for y1 = · · · = yk = 1, yk+1 = · · · = ym = 0):
l(a, b) = log ( ∏_{i=1}^k exp(aᵀui + b)/(1 + exp(aᵀui + b)) · ∏_{i=k+1}^m 1/(1 + exp(aᵀui + b)) )

        = Σ_{i=1}^k (aᵀui + b) − Σ_{i=1}^m log(1 + exp(aᵀui + b))

concave in a, b
Statistical estimation 7–5
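Because l(a, b) is concave, the ML estimate can be found with any smooth unconstrained solver applied to −l. A sketch with scipy on synthetic scalar data (the true parameter values and sample size are invented; this is not the dataset of the example on the next slide):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m = 50
u = rng.uniform(0.0, 10.0, size=m)        # scalar explanatory variable
a_true, b_true = 1.0, -5.0
p = 1.0 / (1.0 + np.exp(-(a_true * u + b_true)))
y = (rng.uniform(size=m) < p).astype(float)

def neg_log_lik(theta):
    a, b = theta
    z = a * u + b
    # -l(a, b) = -(sum over y_i = 1 of z_i) + sum_i log(1 + exp(z_i))
    return -(y @ z) + np.sum(np.logaddexp(0.0, z))

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
a_ml, b_ml = res.x
```

np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow, which keeps the objective well behaved for large |z|.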
example (n = 1, m = 50 measurements)

[figure: prob(y = 1) versus u, for u ∈ [0, 10]]

circles show 50 points (ui, yi)
solid curve is ML estimate of p = exp(au + b)/(1 + exp(au + b))
Statistical estimation 7–6
(Binary) hypothesis testing
detection (hypothesis testing) problem
given observation of a random variable X ∈ {1, . . . , n}, choose between:

hypothesis 1: X was generated by distribution p = (p1, . . . , pn)

hypothesis 2: X was generated by distribution q = (q1, . . . , qn)
randomized detector
a nonnegative matrix T ∈ R^{2×n}, with 1ᵀT = 1ᵀ
if we observe X = k, we choose hypothesis 1 with probability t1k, hypothesis 2 with probability t2k
if all elements of T are 0 or 1, it is called a deterministic detector
Statistical estimation 7–7
detection probability matrix:
D = [Tp  Tq] = [ 1 − Pfp    Pfn
                 Pfp        1 − Pfn ]

Pfp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
Pfn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)
multicriterion formulation of detector design
minimize (w.r.t. R²₊) (Pfp, Pfn) = ((Tp)_2, (Tq)_1)
subject to t1k + t2k = 1, k = 1, . . . , n
           tik ≥ 0, i = 1, 2, k = 1, . . . , n

variable T ∈ R^{2×n}
Statistical estimation 7–8
scalarization (with weight λ > 0)

minimize   (Tp)_2 + λ(Tq)_1
subject to t1k + t2k = 1, tik ≥ 0, i = 1, 2, k = 1, . . . , n

an LP with a simple analytical solution
(t1k, t2k) = (1, 0) if pk ≥ λqk
(t1k, t2k) = (0, 1) if pk < λqk

a deterministic detector, given by a likelihood ratio test
if pk = λqk for some k, any value 0 ≤ t1k ≤ 1, t2k = 1 − t1k is optimal
(i.e., Pareto-optimal detectors include non-deterministic detectors)
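The likelihood ratio test is immediate to evaluate; a numpy sketch using the two columns of the matrix P from the example on the next slide as p and q (λ = 1 here is an arbitrary choice):

```python
import numpy as np

p = np.array([0.70, 0.20, 0.05, 0.05])   # hypothesis 1 distribution
q = np.array([0.10, 0.10, 0.70, 0.10])   # hypothesis 2 distribution
lam = 1.0                                 # scalarization weight lambda

# deterministic detector from the likelihood ratio test:
# choose hypothesis 1 exactly where p_k >= lam * q_k
t1 = (p >= lam * q).astype(float)
T = np.vstack([t1, 1.0 - t1])            # columns sum to one

Pfp = T[1] @ p   # false positive probability (Tp)_2
Pfn = T[0] @ q   # false negative probability (Tq)_1
```

Here the detector picks hypothesis 1 for outcomes 1 and 2, giving Pfp = 0.10 and Pfn = 0.20.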
minimax detector
minimize   max{Pfp, Pfn} = max{(Tp)_2, (Tq)_1}
subject to t1k + t2k = 1, tik ≥ 0, i = 1, 2, k = 1, . . . , n

an LP; solution is usually not deterministic
Statistical estimation 7–9
example
P = [ 0.70  0.10
      0.20  0.10
      0.05  0.70
      0.05  0.10 ]
[figure: trade-off curve of Pfn versus Pfp, with detectors 1, 2, 3, 4 marked]
solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector
Statistical estimation 7–10
Experiment design
m linear measurements yi = aiᵀx + wi, i = 1, . . . , m, of unknown x ∈ Rⁿ
measurement errors wi are IID N(0, 1)

ML (least-squares) estimate is
x̂ = ( Σ_{i=1}^m ai aiᵀ )⁻¹ Σ_{i=1}^m yi ai
error e = x̂ − x has zero mean and covariance

    E = E eeᵀ = ( Σ_{i=1}^m ai aiᵀ )⁻¹

confidence ellipsoids are given by {x | (x − x̂)ᵀ E⁻¹ (x − x̂) ≤ β}

experiment design: choose ai ∈ {v1, . . . , vp} (a set of possible test vectors) to make E small
Statistical estimation 7–11
vector optimization formulation
minimize (w.r.t. Sⁿ₊)  E = ( Σ_{k=1}^p mk vk vkᵀ )⁻¹
subject to             mk ≥ 0, m1 + · · · + mp = m, mk ∈ Z

variables are mk (# vectors ai equal to vk)
difficult in general, due to integer constraint

relaxed experiment design
assume m ≫ p, use λk = mk/m as (continuous) real variable

minimize (w.r.t. Sⁿ₊)  E = (1/m) ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
subject to             λ ⪰ 0, 1ᵀλ = 1
common scalarizations: minimize log det E, tr E, λmax(E), . . .
can add other convex constraints, e.g., bound experiment cost cᵀλ ≤ B
Statistical estimation 7–12
D-optimal design
minimize   log det ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
subject to λ ⪰ 0, 1ᵀλ = 1
interpretation: minimizes volume of confidence ellipsoids
dual problem
maximize   log det W + n log n
subject to vkᵀ W vk ≤ 1, k = 1, . . . , p

interpretation: {x | xᵀWx ≤ 1} is minimum volume ellipsoid centered at origin, that includes all test vectors vk

complementary slackness: for λ, W primal and dual optimal,

    λk(1 − vkᵀ W vk) = 0, k = 1, . . . , p

optimal experiment uses vectors vk on boundary of ellipsoid defined by W
Statistical estimation 7–13
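One classical way to solve the relaxed D-optimal problem numerically (not covered in the slides) is the multiplicative update λk ← λk · (vkᵀ M(λ)⁻¹ vk)/n, where M(λ) = Σ_k λk vk vkᵀ; at the optimum vkᵀ M⁻¹ vk ≤ n for all k, which is dual feasibility above with W = M(λ)⁻¹/n. A sketch with random test vectors (the data and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2, 20
V = rng.standard_normal((p, n))          # rows are candidate test vectors v_k

lam = np.full(p, 1.0 / p)                # start from the uniform design
for _ in range(2000):
    M = V.T @ (lam[:, None] * V)         # M(lambda) = sum_k lam_k v_k v_k^T
    Minv = np.linalg.inv(M)
    g = np.einsum("ki,ij,kj->k", V, Minv, V)   # g_k = v_k^T M^{-1} v_k
    lam *= g / n                         # multiplicative update (preserves sum = 1)
    lam /= lam.sum()                     # guard against round-off drift

# optimality check (Kiefer-Wolfowitz): v_k^T M^{-1} v_k <= n for all k
M = V.T @ (lam[:, None] * V)
g = np.einsum("ki,ij,kj->k", V, np.linalg.inv(M), V)
```

The update preserves Σ λk = 1 because Σ_k λk vkᵀM⁻¹vk = tr(M⁻¹M) = n; the explicit renormalization only absorbs floating-point drift.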
example (p = 20)

[figure: candidate vectors, with λ1 = 0.5, λ2 = 0.5 marked, and the ellipse defined by the optimal W]

design uses two vectors, on boundary of ellipse defined by optimal W
Statistical estimation 7–14
derivation of dual of page 7–13
first reformulate primal problem with new variable X:
minimize   log det X⁻¹
subject to X = Σ_{k=1}^p λk vk vkᵀ, λ ⪰ 0, 1ᵀλ = 1

L(X, λ, Z, z, ν) = log det X⁻¹ + tr( Z( X − Σ_{k=1}^p λk vk vkᵀ ) ) − zᵀλ + ν(1ᵀλ − 1)