Convex Optimization Boyd & Vandenberghe
1. Introduction
mathematical optimization
least-squares and linear programming
convex optimization
example
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization
(mathematical) optimization problem
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, ..., m
x = (x1, ..., xn): optimization variables
f0 : R^n → R: objective function
fi : R^n → R, i = 1, ..., m: constraint functions
optimal solution x⋆ has smallest value of f0 among all vectors that satisfy the constraints
Examples
portfolio optimization
variables: amounts invested in different assets
constraints: budget, max./min. investment per asset, minimum return
objective: overall risk or return variance
device sizing in electronic circuits
variables: device widths and lengths
constraints: manufacturing limits, timing requirements, maximum area
objective: power consumption
data fitting
variables: model parameters
constraints: prior information, parameter limits
objective: measure of misfit or prediction error
Solving optimization problems
general optimization problem
very difficult to solve
methods involve some compromise, e.g., very long computation time, or not always finding the solution
exceptions: certain problem classes can be solved efficiently and reliably
least-squares problems
linear programming problems
convex optimization problems
Least-squares
minimize ‖Ax − b‖₂²
solving least-squares problems
analytical solution: x⋆ = (AᵀA)⁻¹Aᵀb
reliable and efficient algorithms and software
computation time proportional to n²k (A ∈ R^{k×n}); less if structured
a mature technology
using least-squares
least-squares problems are easy to recognize
a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
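To make the "mature technology" point concrete, here is a minimal numpy sketch (numpy and the random data are my own choices, not part of the slides) comparing the analytical solution x⋆ = (AᵀA)⁻¹Aᵀb with a library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5
A = rng.standard_normal((k, n))   # A in R^{k x n}, with k >= n
b = rng.standard_normal(k)

# analytical solution x = (A^T A)^{-1} A^T b (fine for small, well-conditioned A)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# in practice, prefer a dedicated least-squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_normal, x_lstsq)
```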
Linear programming
minimize cᵀx
subject to aiᵀx ≤ bi, i = 1, ..., m
solving linear programs
no analytical formula for solution
reliable and efficient algorithms and software
computation time proportional to n²m if m ≥ n; less with structure
a mature technology
using linear programming
not as easy to recognize as least-squares problems
a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ1- or ℓ∞-norms, piecewise-linear functions)
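A minimal sketch of an LP in cvxpy (a modeling library of my choosing; the slides name no software, and the data here is random, with box constraints added only to keep the toy problem bounded):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 4
A = rng.standard_normal((m, n))
b = np.abs(rng.standard_normal(m)) + 1.0   # keeps x = 0 strictly feasible
c = rng.standard_normal(n)

x = cp.Variable(n)
# minimize c^T x subject to a_i^T x <= b_i, plus bounds so the LP is bounded
prob = cp.Problem(cp.Minimize(c @ x), [A @ x <= b, x <= 1, x >= -1])
prob.solve()
print(prob.status, prob.value)
```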
Convex optimization problem
minimize f0(x)
subject to fi(x) ≤ bi, i = 1, ..., m
objective and constraint functions are convex:
fi(αx + βy) ≤ αfi(x) + βfi(y) if α + β = 1, α ≥ 0, β ≥ 0
includes least-squares problems and linear programs as special cases
solving convex optimization problems
no analytical solution
reliable and efficient algorithms
computation time (roughly) proportional to max{n³, n²m, F}, where F is cost of evaluating the fi's and their first and second derivatives
almost a technology
using convex optimization
often difficult to recognize
many tricks for transforming problems into convex form
surprisingly many problems can be solved via convex optimization
Example
m lamps illuminating n (small, flat) patches
(figure: lamp j with power pj illuminating patch k with intensity Ik; rkj and θkj are the distance and angle from lamp j to patch k)
intensity Ik at patch k depends linearly on lamp powers pj:
Ik = Σ_{j=1}^m akj pj,  akj = rkj⁻² max{cos θkj, 0}
problem: achieve desired illumination I_des with bounded lamp powers
minimize max_{k=1,...,n} |log Ik − log I_des|
subject to 0 ≤ pj ≤ pmax, j = 1, ..., m
how to solve?
1. use uniform power: pj = p, vary p
2. use least-squares:
minimize Σ_{k=1}^n (Ik − I_des)²
round pj if pj > pmax or pj < 0
3. use weighted least-squares:
minimize Σ_{k=1}^n (Ik − I_des)² + Σ_{j=1}^m wj (pj − pmax/2)²
iteratively adjust weights wj until 0 ≤ pj ≤ pmax
4. use linear programming:
minimize max_{k=1,...,n} |Ik − I_des|
subject to 0 ≤ pj ≤ pmax, j = 1, ..., m
which can be solved via linear programming
of course these are approximate (suboptimal) solutions
5. use convex optimization: problem is equivalent to
minimize f0(p) = max_{k=1,...,n} h(Ik/I_des)
subject to 0 ≤ pj ≤ pmax, j = 1, ..., m
with h(u) = max{u, 1/u}
(figure: graph of h(u) = max{u, 1/u} for 0 ≤ u ≤ 4)
f0 is convex because maximum of convex functions is convex
exact solution obtained with effort ≈ modest factor × least-squares effort
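A minimal cvxpy sketch of this convex formulation (cvxpy is my choice of tool; the coefficients akj are random stand-ins for rkj⁻² max{cos θkj, 0}, since the slides give no concrete geometry):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 10, 20                       # lamps, patches
p_max, I_des = 1.0, 1.0
A = rng.uniform(0.1, 1.0, (n, m))   # made-up illumination coefficients a_kj

p = cp.Variable(m)
I = A @ p
# f0(p) = max_k max{I_k/I_des, I_des/I_k}, the convex form of the problem
f0 = cp.maximum(cp.max(I / I_des), cp.max(I_des * cp.inv_pos(I)))
prob = cp.Problem(cp.Minimize(f0), [p >= 0, p <= p_max])
prob.solve()
print(prob.value)
```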
additional constraints: does adding 1 or 2 below complicate the problem?
1. no more than half of total power is in any 10 lamps
2. no more than half of the lamps are on ( pj > 0)
answer: with (1), still easy to solve; with (2), extremely difficult
moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems
Course goals and topics
goals
1. recognize/formulate problems (such as the illumination problem) as convex optimization problems
2. develop code for problems of moderate size (1000 lamps, 5000 patches)
3. characterize optimal solution (optimal power distribution), give limits of performance, etc.
topics
1. convex sets, functions, optimization problems
2. examples and applications
3. algorithms
Nonlinear optimization
traditional techniques for general nonconvex problems involve compromises
local optimization methods (nonlinear programming)
find a point that minimizes f0 among feasible points near it
fast, can handle large problems
require initial guess
provide no information about distance to (global) optimum
global optimization methods
find the (global) solution
worst-case complexity grows exponentially with problem size
these algorithms are often based on solving convex subproblems
Brief history of convex optimization
theory (convex analysis): ca. 1900–1970
algorithms
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco & McCormick, Dikin, ...)
1970s: ellipsoid method and other subgradient methods
1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)
applications
before 1990: mostly in operations research; few in engineering
since 1990: many new applications in engineering (control, signal processing, communications, circuit design, ...); new problem classes (semidefinite and second-order cone programming, robust optimization)
Convex Optimization Boyd & Vandenberghe
2. Convex sets
affine and convex sets
some important examples
operations that preserve convexity
generalized inequalities
separating and supporting hyperplanes
dual cones and generalized inequalities
Affine set
line through x1, x2: all points
x = θx1 + (1 − θ)x2  (θ ∈ R)
(figure: line through x1 and x2, with points marked at θ = 1.2, θ = 1, θ = 0.6, θ = 0, θ = −0.2)
affine set : contains the line through any two distinct points in the set
example: solution set of linear equations {x | Ax = b}
(conversely, every affine set can be expressed as solution set of a system of linear equations)
Convex set
line segment between x1 and x2: all points
x = θx1 + (1 − θ)x2
with 0 ≤ θ ≤ 1
convex set: contains line segment between any two points in the set
x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C
examples (one convex, two nonconvex sets)
Convex combination and convex hull
convex combination of x1,. . . , xk : any point x of the form
x = θ1x1 + θ2x2 + ··· + θkxk
with θ1 + ··· + θk = 1, θi ≥ 0
convex hull conv S: set of all convex combinations of points in S
Convex cone
conic (nonnegative) combination of x1 and x2: any point of the form
x = θ1x1 + θ2x2
with θ1 ≥ 0, θ2 ≥ 0
(figure: conic combinations of x1 and x2, a cone with apex at 0)
convex cone : set that contains all conic combinations of points in the set
Hyperplanes and halfspaces
hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)
halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)
(figures: hyperplane aᵀx = b with normal vector a through x0; the halfspaces aᵀx ≥ b and aᵀx ≤ b on either side)
a is the normal vector
hyperplanes are affine and convex; halfspaces are convex
Euclidean balls and ellipsoids
(Euclidean) ball with center xc and radius r :
B (xc, r ) = {x | x xc 2 r}= {xc + ru | u 2 1}ellipsoid: set of the form
{x | (x xc)T P 1(x xc) 1}with P S
n++ (i.e. , P symmetric positive denite)
x c
other representation:
{xc + Au
|u 2
1
}with A square and nonsingular
Norm balls and norm cones
norm: a function ‖·‖ that satisfies
‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
‖tx‖ = |t| ‖x‖ for t ∈ R
‖x + y‖ ≤ ‖x‖ + ‖y‖
notation: ‖·‖ is general (unspecified) norm; ‖·‖_symb is particular norm
norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}
norm cone: {(x, t) | ‖x‖ ≤ t}
Euclidean norm cone is called second-order cone
(figure: boundary of the second-order cone in R³, plotted over (x1, x2, t))
norm balls and cones are convex
Polyhedra
solution set of finitely many linear inequalities and equalities
Ax ⪯ b,  Cx = d
(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)
(figure: polyhedron P bounded by hyperplanes with outward normals a1, ..., a5)
polyhedron is intersection of finite number of halfspaces and hyperplanes
Positive semidefinite cone
notation:
S^n is set of symmetric n×n matrices
S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n×n matrices
X ∈ S^n_+ ⟺ zᵀXz ≥ 0 for all z
S^n_+ is a convex cone
S^n_{++} = {X ∈ S^n | X ≻ 0}: positive definite n×n matrices
example:
[ x  y ]
[ y  z ]  ∈ S²_+
(figure: boundary of S²_+ plotted in (x, y, z))
Operations that preserve convexity
practical methods for establishing convexity of a set C
1. apply definition
x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C
2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity
intersection
affine functions
perspective function
linear-fractional functions
Intersection
the intersection of (any number of) convex sets is convex
example:
S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}
where p(t) = x1 cos t + x2 cos 2t + ··· + xm cos mt
for m = 2:
(figures: p(t) for |t| ≤ π/3, and the set S in the (x1, x2)-plane)
Affine function
suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)
the image of a convex set under f is convex
S ⊆ R^n convex ⟹ f(S) = {f(x) | x ∈ S} convex
the inverse image f⁻¹(C) of a convex set under f is convex
C ⊆ R^m convex ⟹ f⁻¹(C) = {x ∈ R^n | f(x) ∈ C} convex
examples
scaling, translation, projection
solution set of linear matrix inequality {x | x1A1 + ··· + xmAm ⪯ B} (with Ai, B ∈ S^p)
hyperbolic cone {x | xᵀPx ≤ (cᵀx)², cᵀx ≥ 0} (with P ∈ S^n_+)
Perspective and linear-fractional function
perspective function P : R^{n+1} → R^n:
P(x, t) = x/t,  dom P = {(x, t) | t > 0}
images and inverse images of convex sets under perspective are convex
linear-fractional function f : R^n → R^m:
f(x) = (Ax + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}
images and inverse images of convex sets under linear-fractional functions are convex
example of a linear-fractional function
f(x) = x / (x1 + x2 + 1)
(figures: a set C in the (x1, x2)-plane and its image f(C))
Generalized inequalities
a convex cone K ⊆ R^n is a proper cone if
K is closed (contains its boundary)
K is solid (has nonempty interior)
K is pointed (contains no line)
examples
nonnegative orthant K = R^n_+ = {x ∈ R^n | xi ≥ 0, i = 1, ..., n}
positive semidefinite cone K = S^n_+
nonnegative polynomials on [0, 1]:
K = {x ∈ R^n | x1 + x2t + x3t² + ··· + xnt^{n−1} ≥ 0 for t ∈ [0, 1]}
generalized inequality defined by a proper cone K:
x ⪯_K y ⟺ y − x ∈ K,  x ≺_K y ⟺ y − x ∈ int K
examples
componentwise inequality (K = R^n_+)
x ⪯_{R^n_+} y ⟺ xi ≤ yi, i = 1, ..., n
matrix inequality (K = S^n_+)
X ⪯_{S^n_+} Y ⟺ Y − X positive semidefinite
these two types are so common that we drop the subscript in ⪯_K
properties: many properties of ⪯_K are similar to ≤ on R, e.g.,
x ⪯_K y, u ⪯_K v ⟹ x + u ⪯_K y + v
Minimum and minimal elements
⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x
x ∈ S is the minimum element of S with respect to ⪯_K if
y ∈ S ⟹ x ⪯_K y
x ∈ S is a minimal element of S with respect to ⪯_K if
y ∈ S, y ⪯_K x ⟹ y = x
example (K = R²_+)
x1 is the minimum element of S1
x2 is a minimal element of S2
(figure: sets S1 and S2 with points x1 and x2)
Separating hyperplane theorem
if C and D are disjoint convex sets, then there exists a ≠ 0, b such that
aᵀx ≤ b for x ∈ C,  aᵀx ≥ b for x ∈ D
(figure: hyperplane aᵀx = b separating C and D)
the hyperplane {x | aᵀx = b} separates C and D
strict separation requires additional assumptions (e.g., C is closed, D is a singleton)
Supporting hyperplane theorem
supporting hyperplane to set C at boundary point x0:
{x | aᵀx = aᵀx0}
where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C
(figure: supporting hyperplane with normal a at boundary point x0 of C)
supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
Dual cones and generalized inequalities
dual cone of a cone K:
K* = {y | yᵀx ≥ 0 for all x ∈ K}
examples
K = R^n_+: K* = R^n_+
K = S^n_+: K* = S^n_+
K = {(x, t) | ‖x‖₂ ≤ t}: K* = {(x, t) | ‖x‖₂ ≤ t}
K = {(x, t) | ‖x‖₁ ≤ t}: K* = {(x, t) | ‖x‖∞ ≤ t}
first three examples are self-dual cones
dual cones of proper cones are proper, hence define generalized inequalities:
y ⪰_{K*} 0 ⟺ yᵀx ≥ 0 for all x ⪰_K 0
Minimum and minimal elements via dual inequalities
minimum element w.r.t. ⪯_K
x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λᵀz over S
(figure: minimum element x of S, with a supporting hyperplane normal to λ)
minimal element w.r.t. ⪯_K
if x minimizes λᵀz over S for some λ ≻_{K*} 0, then x is minimal
(figure: minimal elements x1, x2 of S with supporting directions λ1, λ2)
if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λᵀz over S
optimal production frontier
different production methods use different amounts of resources x ∈ R^n
production set P: resource vectors x for all possible production methods
efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n_+
example (n = 2): x1, x2, x3 are efficient; x4, x5 are not
(figure: production set P in the (labor, fuel)-plane with points x1, ..., x5)
Convex Optimization Boyd & Vandenberghe
3. Convex functions
basic properties and examples
operations that preserve convexity
the conjugate function
quasiconvex functions
log-concave and log-convex functions
convexity with respect to generalized inequalities
Definition
f : R^n → R is convex if dom f is a convex set and
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
for all x, y ∈ dom f, 0 ≤ θ ≤ 1
(figure: chord from (x, f(x)) to (y, f(y)) lies above the graph of f)
f is concave if −f is convex
f is strictly convex if dom f is convex and
f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)
for x, y ∈ dom f, x ≠ y, 0 < θ < 1
Examples on R
convex:
affine: ax + b on R, for any a, b ∈ R
exponential: e^{ax}, for any a ∈ R
powers: x^α on R_{++}, for α ≥ 1 or α ≤ 0
powers of absolute value: |x|^p on R, for p ≥ 1
negative entropy: x log x on R_{++}
concave:
affine: ax + b on R, for any a, b ∈ R
powers: x^α on R_{++}, for 0 ≤ α ≤ 1
logarithm: log x on R_{++}
Examples on R^n and R^{m×n}
affine functions are convex and concave; all norms are convex
examples on R^n
affine function f(x) = aᵀx + b
norms: ‖x‖_p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|
examples on R^{m×n} (m×n matrices)
affine function
f(X) = tr(AᵀX) + b = Σ_{i=1}^m Σ_{j=1}^n Aij Xij + b
spectral (maximum singular value) norm
f(X) = ‖X‖₂ = σmax(X) = (λmax(XᵀX))^{1/2}
Restriction of a convex function to a line
f : R^n → R is convex if and only if the function g : R → R,
g(t) = f(x + tv),  dom g = {t | x + tv ∈ dom f}
is convex (in t) for any x ∈ dom f, v ∈ R^n
can check convexity of f by checking convexity of functions of one variable
example. f : S^n → R with f(X) = log det X, dom f = S^n_{++}
g(t) = log det(X + tV) = log det X + log det(I + tX^{−1/2}VX^{−1/2})
     = log det X + Σ_{i=1}^n log(1 + tλi)
where λi are the eigenvalues of X^{−1/2}VX^{−1/2}
g is concave in t (for any choice of X ≻ 0, V); hence f is concave
Extended-value extension
extended-value extension f̃ of f is
f̃(x) = f(x), x ∈ dom f,  f̃(x) = ∞, x ∉ dom f
often simplifies notation; for example, the condition
0 ≤ θ ≤ 1 ⟹ f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)
(as an inequality in R ∪ {∞}) means the same as the two conditions
dom f is convex
for x, y ∈ dom f,
0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
First-order condition
f is differentiable if dom f is open and the gradient
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xn )
exists at each x ∈ dom f
1st-order condition: differentiable f with convex domain is convex iff
f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f
(figure: f(y) lies above the tangent f(x) + ∇f(x)ᵀ(y − x) at (x, f(x)))
first-order approximation of f is global underestimator
Second-order conditions
f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,
∇²f(x)ij = ∂²f(x)/∂xi∂xj,  i, j = 1, ..., n,
exists at each x ∈ dom f
2nd-order conditions: for twice differentiable f with convex domain
f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples
quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ S^n)
∇f(x) = Px + q,  ∇²f(x) = P
convex if P ⪰ 0
least-squares objective: f(x) = ‖Ax − b‖₂²
∇f(x) = 2Aᵀ(Ax − b),  ∇²f(x) = 2AᵀA
convex (for any A)
quadratic-over-linear: f(x, y) = x²/y
∇²f(x, y) = (2/y³) [y, −x][y, −x]ᵀ ⪰ 0
convex for y > 0
(figure: graph of f(x, y) = x²/y)
log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex
∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ  (zk = exp xk)
to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:
vᵀ∇²f(x)v = ( (Σ_k zk vk²)(Σ_k zk) − (Σ_k vk zk)² ) / (Σ_k zk)² ≥ 0
since (Σ_k vk zk)² ≤ (Σ_k zk vk²)(Σ_k zk) (from Cauchy-Schwarz inequality)
geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on R^n_{++} is concave
(similar proof as for log-sum-exp)
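A minimal numpy check (my own illustration, not from the slides) of the log-sum-exp Hessian formula above and its positive semidefiniteness at a random point:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
z = np.exp(x)
s = z.sum()
H = np.diag(z) / s - np.outer(z, z) / s**2   # the Hessian derived above
eigs = np.linalg.eigvalsh(H)
assert eigs.min() >= -1e-12                  # PSD up to roundoff
```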
Epigraph and sublevel set
α-sublevel set of f : R^n → R:
Cα = {x ∈ dom f | f(x) ≤ α}
sublevel sets of convex functions are convex (converse is false)
epigraph of f : R^n → R:
epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}
(figure: epigraph of f, the region on and above the graph)
f is convex if and only if epi f is a convex set
Jensen's inequality
basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,
f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
extension: if f is convex, then
f(E z) ≤ E f(z)
for any random variable z
basic inequality is special case with discrete distribution
prob(z = x) = θ,  prob(z = y) = 1 − θ
Operations that preserve convexity
practical methods for establishing convexity of a function
1. verify definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity
nonnegative weighted sum
composition with affine function
pointwise maximum and supremum
composition
minimization
perspective
Positive weighted sum & composition with affine function
nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f1 + f2 convex if f1, f2 convex (extends to infinite sums, integrals)
composition with affine function: f(Ax + b) is convex if f is convex
examples
log barrier for linear inequalities
f(x) = −Σ_{i=1}^m log(bi − aiᵀx),  dom f = {x | aiᵀx < bi, i = 1, ..., m}
(any) norm of affine function: f(x) = ‖Ax + b‖
Pointwise maximum
if f1, ..., fm are convex, then f(x) = max{f1(x), ..., fm(x)} is convex
examples
piecewise-linear function: f(x) = max_{i=1,...,m}(aiᵀx + bi) is convex
sum of r largest components of x ∈ R^n:
f(x) = x[1] + x[2] + ··· + x[r]
is convex (x[i] is ith largest component of x)
proof:
f(x) = max{xi1 + xi2 + ··· + xir | 1 ≤ i1 < i2 < ··· < ir ≤ n}
Pointwise supremum
if f(x, y) is convex in x for each y ∈ A, then
g(x) = sup_{y∈A} f(x, y)
is convex
examples
support function of a set C: S_C(x) = sup_{y∈C} yᵀx is convex
distance to farthest point in a set C:
f(x) = sup_{y∈C} ‖x − y‖
maximum eigenvalue of symmetric matrix: for X ∈ S^n,
λmax(X) = sup_{‖y‖₂=1} yᵀXy
Composition with scalar functions
composition of g : R^n → R and h : R → R:
f(x) = h(g(x))
f is convex if
g convex, h convex, h̃ nondecreasing
g concave, h convex, h̃ nonincreasing
proof (for n = 1, differentiable g, h)
f″(x) = h″(g(x))g′(x)² + h′(g(x))g″(x)
note: monotonicity must hold for extended-value extension h̃
examples
exp g(x) is convex if g is convex
1/g(x) is convex if g is concave and positive
Vector composition
composition of g : R^n → R^k and h : R^k → R:
f(x) = h(g(x)) = h(g1(x), g2(x), ..., gk(x))
f is convex if
gi convex, h convex, h̃ nondecreasing in each argument
gi concave, h convex, h̃ nonincreasing in each argument
proof (for n = 1, differentiable g, h)
f″(x) = g′(x)ᵀ∇²h(g(x))g′(x) + ∇h(g(x))ᵀg″(x)
examples
Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
log Σ_{i=1}^m exp gi(x) is convex if gi are convex
Minimization
if f(x, y) is convex in (x, y) and C is a convex set, then
g(x) = inf_{y∈C} f(x, y)
is convex
examples
f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with
[ A  B ]
[ Bᵀ C ]  ⪰ 0,  C ≻ 0
minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A − BC⁻¹Bᵀ)x
g is convex, hence Schur complement A − BC⁻¹Bᵀ ⪰ 0
distance to a set: dist(x, S) = inf_{y∈S} ‖x − y‖ is convex if S is convex
Perspective
the perspective of a function f : R^n → R is the function g : R^n × R → R,
g(x, t) = t f(x/t),  dom g = {(x, t) | x/t ∈ dom f, t > 0}
g is convex if f is convex
examples
f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²_{++}
if f is convex, then
g(x) = (cᵀx + d) f( (Ax + b)/(cᵀx + d) )
is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
The conjugate function
the conjugate of a function f is
f*(y) = sup_{x∈dom f} (yᵀx − f(x))
(figure: f*(y) is the maximum gap between the linear function xy and f(x); the supporting line has intercept (0, −f*(y)))
f* is convex (even if f is not)
will be useful in chapter 5
examples
negative logarithm f(x) = −log x
f*(y) = sup_{x>0} (xy + log x)
      = −1 − log(−y) if y < 0, ∞ otherwise
strictly convex quadratic f(x) = (1/2)xᵀQx with Q ∈ S^n_{++}
f*(y) = sup_x (yᵀx − (1/2)xᵀQx)
      = (1/2)yᵀQ⁻¹y
Examples
√|x| is quasiconvex on R
ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
log x is quasilinear on R_{++}
f(x1, x2) = x1x2 is quasiconcave on R²_{++}
linear-fractional function
f(x) = (aᵀx + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}
is quasilinear
distance ratio
f(x) = ‖x − a‖₂ / ‖x − b‖₂,  dom f = {x | ‖x − a‖₂ ≤ ‖x − b‖₂}
is quasiconvex
internal rate of return
cash flow x = (x0, ..., xn); xi is payment in period i (to us if xi > 0)
we assume x0 < 0 and x0 + x1 + ··· + xn > 0
present value of cash flow x, for interest rate r:
PV(x, r) = Σ_{i=0}^n (1 + r)^{−i} xi
internal rate of return is smallest interest rate for which PV(x, r) = 0:
IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}
IRR is quasiconcave: superlevel set is intersection of halfspaces
IRR(x) ≥ R ⟺ Σ_{i=0}^n (1 + r)^{−i} xi ≥ 0 for 0 ≤ r ≤ R
Properties
modified Jensen inequality: for quasiconvex f
0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ max{f(x), f(y)}
first-order condition: differentiable f with convex domain is quasiconvex iff
f(y) ≤ f(x) ⟹ ∇f(x)ᵀ(y − x) ≤ 0
(figure: sublevel set with ∇f(x) defining a supporting hyperplane at x)
sums of quasiconvex functions are not necessarily quasiconvex
Log-concave and log-convex functions
a positive function f is log-concave if log f is concave:
f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ} for 0 ≤ θ ≤ 1
f is log-convex if log f is convex
powers: x^a on R_{++} is log-convex for a ≤ 0, log-concave for a ≥ 0
many common probability densities are log-concave, e.g., normal:
f(x) = (1/√((2π)^n det Σ)) e^{−(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄)}
cumulative Gaussian distribution function Φ is log-concave
Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du
Properties of log-concave functions
twice differentiable f with convex domain is log-concave if and only if
f(x)∇²f(x) ⪯ ∇f(x)∇f(x)ᵀ
for all x ∈ dom f
product of log-concave functions is log-concave
sum of log-concave functions is not always log-concave
integration: if f : R^n × R^m → R is log-concave, then
g(x) = ∫ f(x, y) dy
is log-concave (not easy to show)
consequences of integration property
convolution f * g of log-concave functions f, g is log-concave
(f * g)(x) = ∫ f(x − y)g(y) dy
if C ⊆ R^n convex and y is a random variable with log-concave pdf then
f(x) = prob(x + y ∈ C)
is log-concave
proof: write f(x) as integral of product of log-concave functions
f(x) = ∫ g(x + y) p(y) dy,  g(u) = 1 if u ∈ C, 0 if u ∉ C; p is pdf of y
example: yield function
Y(x) = prob(x + w ∈ S)
x ∈ R^n: nominal parameter values for product
w ∈ R^n: random variations of parameters in manufactured product
S: set of acceptable values
if S is convex and w has a log-concave pdf, then
Y is log-concave
yield regions {x | Y(x) ≥ α} are convex
Convexity with respect to generalized inequalities
f : R^n → R^m is K-convex if dom f is convex and
f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)
for x, y ∈ dom f, 0 ≤ θ ≤ 1
example f : S^m → S^m, f(X) = X² is S^m_+-convex
proof: for fixed z ∈ R^m, zᵀX²z = ‖Xz‖₂² is convex in X, i.e.,
zᵀ(θX + (1 − θ)Y)²z ≤ θzᵀX²z + (1 − θ)zᵀY²z
for X, Y ∈ S^m, 0 ≤ θ ≤ 1
therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²
Convex Optimization Boyd & Vandenberghe
4. Convex optimization problems
optimization problem in standard form
convex optimization problems
quasiconvex optimization
linear optimization
quadratic optimization
geometric programming
generalized inequality constraints
semidefinite programming
vector optimization
Optimization problem in standard form
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
x ∈ R^n is the optimization variable
f0 : R^n → R is the objective or cost function
fi : R^n → R, i = 1, ..., m, are the inequality constraint functions
hi : R^n → R are the equality constraint functions
optimal value:
p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p}
p⋆ = ∞ if problem is infeasible (no x satisfies the constraints)
p⋆ = −∞ if problem is unbounded below
Optimal and locally optimal points
x is feasible if x ∈ dom f0 and it satisfies the constraints
a feasible x is optimal if f0(x) = p⋆; Xopt is the set of optimal points
x is locally optimal if there is an R > 0 such that x is optimal for
minimize (over z) f0(z)
subject to fi(z) ≤ 0, i = 1, ..., m,  hi(z) = 0, i = 1, ..., p
‖z − x‖₂ ≤ R
examples (with n = 1, m = p = 0)
f0(x) = 1/x, dom f0 = R_{++}: p⋆ = 0, no optimal point
f0(x) = −log x, dom f0 = R_{++}: p⋆ = −∞
f0(x) = x log x, dom f0 = R_{++}: p⋆ = −1/e, x = 1/e is optimal
f0(x) = x³ − 3x, p⋆ = −∞, local optimum at x = 1
Implicit constraints
the standard form optimization problem has an implicit constraint
x ∈ D = ∩_{i=0}^m dom fi ∩ ∩_{i=1}^p dom hi
we call D the domain of the problem
the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
a problem is unconstrained if it has no explicit constraints (m = p = 0)
example:
minimize f0(x) = −Σ_{i=1}^k log(bi − aiᵀx)
is an unconstrained problem with implicit constraints aiᵀx < bi
Feasibility problem
find x
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
can be considered a special case of the general problem with f0(x) = 0:
minimize 0
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
p⋆ = 0 if constraints are feasible; any feasible x is optimal
p⋆ = ∞ if constraints are infeasible
Convex optimization problem
standard form convex optimization problem
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
aiᵀx = bi, i = 1, ..., p
f0, f1, ..., fm are convex; equality constraints are affine
problem is quasiconvex if f0 is quasiconvex (and f1, ..., fm convex)
often written as
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
Ax = b
important property: feasible set of a convex optimization problem is convex
example
minimize f0(x) = x1² + x2²
subject to f1(x) = x1/(1 + x2²) ≤ 0
h1(x) = (x1 + x2)² = 0
f0 is convex; feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex
not a convex problem (according to our definition): f1 is not convex, h1 is not affine
equivalent (but not identical) to the convex problem
minimize x1² + x2²
subject to x1 ≤ 0
x1 + x2 = 0
Local and global optima
any locally optimal point of a convex problem is (globally) optimal
proof: suppose x is locally optimal and y is optimal with f0(y) < f0(x)
x locally optimal means there is an R > 0 such that
z feasible, ‖z − x‖₂ ≤ R ⟹ f0(z) ≥ f0(x)
consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖₂)
‖y − x‖₂ > R, so 0 < θ < 1/2
z is a convex combination of two feasible points, hence also feasible
‖z − x‖₂ = R/2 and
f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x)
which contradicts our assumption that x is locally optimal
Optimality criterion for differentiable f 0
x is optimal if and only if it is feasible and
∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y
(figure: feasible set X with −∇f0(x) normal to a supporting hyperplane at x)
if nonzero, ∇f0(x) defines a supporting hyperplane to feasible set X at x
unconstrained problem: x is optimal if and only if
x ∈ dom f0,  ∇f0(x) = 0
equality constrained problem
minimize f0(x) subject to Ax = b
x is optimal if and only if there exists a ν such that
x ∈ dom f0,  Ax = b,  ∇f0(x) + Aᵀν = 0
minimization over nonnegative orthant
minimize f0(x) subject to x ⪰ 0
x is optimal if and only if
x ∈ dom f0,  x ⪰ 0,  ∇f0(x)i ≥ 0 if xi = 0,  ∇f0(x)i = 0 if xi > 0
Equivalent convex problems
two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa
some common transformations that preserve convexity:
eliminating equality constraints
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
Ax = b
is equivalent to
minimize (over z) f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0, i = 1, ..., m
where F and x0 are such that
Ax = b ⟺ x = Fz + x0 for some z
introducing equality constraints
minimize f0(A0x + b0)
subject to fi(Aix + bi) ≤ 0, i = 1, ..., m
is equivalent to
minimize (over x, yi) f0(y0)
subject to fi(yi) ≤ 0, i = 1, ..., m
yi = Aix + bi, i = 0, 1, ..., m
introducing slack variables for linear inequalities
minimize f0(x)
subject to aiᵀx ≤ bi, i = 1, ..., m
is equivalent to
minimize (over x, s) f0(x)
subject to aiᵀx + si = bi, i = 1, ..., m
si ≥ 0, i = 1, ..., m
epigraph form: standard form convex problem is equivalent to
minimize (over x, t) t
subject to f0(x) − t ≤ 0
fi(x) ≤ 0, i = 1, ..., m
Ax = b
minimizing over some variables
minimize f0(x1, x2)
subject to fi(x1) ≤ 0, i = 1, ..., m
is equivalent to
minimize f̃0(x1)
subject to fi(x1) ≤ 0, i = 1, ..., m
where f̃0(x1) = inf_{x2} f0(x1, x2)
Quasiconvex optimization
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
Ax = b
with f0 : R^n → R quasiconvex, f1, ..., fm convex
can have locally optimal points that are not (globally) optimal
(figure: quasiconvex f0 with a locally optimal point (x, f0(x)) that is not globally optimal)
convex representation of sublevel sets of f 0
if f0 is quasiconvex, there exists a family of functions φt such that:
φt(x) is convex in x for fixed t
t-sublevel set of f0 is 0-sublevel set of φt, i.e.,
f0(x) ≤ t ⟺ φt(x) ≤ 0
example
f0(x) = p(x)/q(x)
with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f0
can take φt(x) = p(x) − t q(x):
for t ≥ 0, φt convex in x
p(x)/q(x) ≤ t if and only if φt(x) ≤ 0
quasiconvex optimization via convex feasibility problems
φt(x) ≤ 0,  fi(x) ≤ 0, i = 1, ..., m,  Ax = b    (1)
for fixed t, a convex feasibility problem in x
if feasible, we can conclude that t ≥ p⋆; if infeasible, t ≤ p⋆
Bisection method for quasiconvex optimization
given l ≤ p⋆, u ≥ p⋆, tolerance ε > 0.
repeat
1. t := (l + u)/2.
2. Solve the convex feasibility problem (1).
3. if (1) is feasible, u := t; else l := t.
until u − l ≤ ε.
requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are initial values)
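A minimal cvxpy sketch of this bisection loop for a linear-fractional objective f0(x) = (cᵀx + d)/(eᵀx + f); the problem data and the box constraint are my own stand-ins, chosen so eᵀx + f > 0 on the feasible set:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 4
c = rng.standard_normal(n); d = 1.0
e = 0.1 * rng.standard_normal(n); f = 1.0   # e^T x + f > 0 on the box below

x = cp.Variable(n)
l, u, eps = -10.0, 10.0, 1e-4
while u - l > eps:
    t = (l + u) / 2
    # phi_t(x) = (c^T x + d) - t(e^T x + f) <= 0 is affine, hence convex, in x
    feas = cp.Problem(cp.Minimize(0),
                      [cp.norm(x, "inf") <= 1,
                       c @ x + d - t * (e @ x + f) <= 0])
    feas.solve()
    if feas.status == "optimal":
        u = t
    else:
        l = t
print("p* lies in [%.4f, %.4f]" % (l, u))
```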
Linear program (LP)
minimize cᵀx + d
subject to Gx ⪯ h
Ax = b
convex problem with affine objective and constraint functions
feasible set is a polyhedron
(figure: polyhedron P with optimal point x⋆ and objective direction −c)
Examples
diet problem: choose quantities x1, . . . , xn of n foods
one unit of food j costs cj, contains amount aij of nutrient i
healthy diet requires nutrient i in quantity at least bi
to find cheapest healthy diet,
minimize cᵀx
subject to Ax ⪰ b, x ⪰ 0
piecewise-linear minimization
minimize max_{i=1,...,m}(aiᵀx + bi)
equivalent to an LP
minimize t
subject to aiᵀx + bi ≤ t, i = 1, ..., m
Chebyshev center of a polyhedron
Chebyshev center of
P = {x | aiᵀx ≤ bi, i = 1, ..., m}
is center of largest inscribed ball
B = {xc + u | ‖u‖₂ ≤ r}
(figure: polyhedron P with Chebyshev center x_cheb)
aiᵀx ≤ bi for all x ∈ B if and only if
sup{aiᵀ(xc + u) | ‖u‖₂ ≤ r} = aiᵀxc + r‖ai‖₂ ≤ bi
hence, xc, r can be determined by solving the LP
maximize r
subject to aiᵀxc + r‖ai‖₂ ≤ bi, i = 1, ..., m
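A minimal cvxpy sketch of this LP; the polyhedron here (the unit box) is made up for illustration, so the answer should be center (0, 0) and radius 1:

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])   # |x1| <= 1, |x2| <= 1
b = np.ones(4)

xc, r = cp.Variable(2), cp.Variable()
constraints = [A[i] @ xc + r * np.linalg.norm(A[i]) <= b[i] for i in range(4)]
prob = cp.Problem(cp.Maximize(r), constraints)
prob.solve()
print(xc.value, r.value)
```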
(Generalized) linear-fractional program
minimize f0(x)
subject to Gx ⪯ h
Ax = b
linear-fractional program
f0(x) = (cᵀx + d)/(eᵀx + f),  dom f0(x) = {x | eᵀx + f > 0}
a quasiconvex optimization problem; can be solved by bisection
also equivalent to the LP (variables y, z)
minimize cᵀy + dz
subject to Gy ⪯ hz
Ay = bz
eᵀy + fz = 1
z ≥ 0
generalized linear-fractional program
f0(x) = max_{i=1,...,r} (ciᵀx + di)/(eiᵀx + fi),  dom f0(x) = {x | eiᵀx + fi > 0, i = 1, ..., r}
a quasiconvex optimization problem; can be solved by bisection
example: Von Neumann model of a growing economy
maximize (over x, x⁺) min_{i=1,...,n} xi⁺/xi
subject to x⁺ ⪰ 0, Bx⁺ ⪯ Ax
x, x⁺ ∈ R^n: activity levels of n sectors, in current and next period
(Ax)i, (Bx⁺)i: produced, resp. consumed, amounts of good i
xi⁺/xi: growth rate of sector i
allocate activity to maximize growth rate of slowest growing sector
Quadratic program (QP)
minimize (1/2)xᵀPx + qᵀx + r
subject to Gx ⪯ h
Ax = b
P ∈ S^n_+, so objective is convex quadratic
minimize a convex quadratic function over a polyhedron
(figure: polyhedron P with optimal x⋆ and level curves of f0)
Examples
least-squares
minimize ‖Ax − b‖₂²
analytical solution x⋆ = A†b (A† is pseudo-inverse)
can add linear constraints, e.g., l ⪯ x ⪯ u
linear program with random cost
minimize c̄ᵀx + γxᵀΣx = E cᵀx + γ var(cᵀx)
subject to Gx ⪯ h, Ax = b
c is random vector with mean c̄ and covariance Σ
hence, cᵀx is random variable with mean c̄ᵀx and variance xᵀΣx
γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)
Quadratically constrained quadratic program (QCQP)
minimize (1/2)xᵀP0x + q0ᵀx + r0
subject to (1/2)xᵀPix + qiᵀx + ri ≤ 0, i = 1, ..., m
Ax = b
Pi ∈ S^n_+; objective and constraints are convex quadratic
if P1, ..., Pm ∈ S^n_{++}, feasible region is intersection of m ellipsoids and an affine set
Second-order cone programming
minimize fᵀx
subject to ‖Aix + bi‖₂ ≤ ciᵀx + di, i = 1, ..., m
Fx = g
(Ai ∈ R^{ni×n}, F ∈ R^{p×n})
inequalities are called second-order cone (SOC) constraints:
(Aix + bi, ciᵀx + di) ∈ second-order cone in R^{ni+1}
for ni = 0, reduces to an LP; if ci = 0, reduces to a QCQP
more general than QCQP and LP
Robust linear programming
the parameters in optimization problems are often uncertain, e.g. , in an LP
minimize cᵀx
subject to aiᵀx ≤ bi, i = 1, ..., m,
there can be uncertainty in c, ai, bi
two common approaches to handling uncertainty (in ai, for simplicity)
deterministic model: constraints must hold for all ai ∈ Ei
minimize cᵀx
subject to aiᵀx ≤ bi for all ai ∈ Ei, i = 1, ..., m,
stochastic model: ai is random variable; constraints must hold with probability η
minimize cᵀx
subject to prob(aiᵀx ≤ bi) ≥ η, i = 1, ..., m
deterministic approach via SOCP
choose an ellipsoid as Ei:
Ei = {āi + Piu | ‖u‖₂ ≤ 1}  (āi ∈ R^n, Pi ∈ R^{n×n})
center is āi, semi-axes determined by singular values/vectors of Pi
robust LP
minimize cᵀx
subject to aiᵀx ≤ bi for all ai ∈ Ei, i = 1, ..., m
is equivalent to the SOCP
minimize cᵀx
subject to āiᵀx + ‖Piᵀx‖₂ ≤ bi, i = 1, ..., m
(follows from sup_{‖u‖₂≤1} (āi + Piu)ᵀx = āiᵀx + ‖Piᵀx‖₂)
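A minimal cvxpy sketch of this SOCP; the nominal data āi, Pi are random stand-ins, and a box constraint is added only to keep the toy problem bounded:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
a_bar = rng.standard_normal((m, n))
P = [0.1 * rng.standard_normal((n, n)) for _ in range(m)]
b = rng.standard_normal(m) + 3.0
c = rng.standard_normal(n)

x = cp.Variable(n)
# robust constraint: a_bar_i^T x + ||P_i^T x||_2 <= b_i
cons = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), cons + [cp.norm(x, "inf") <= 1])
prob.solve()
print(prob.value)
```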
Geometric programming
monomial function
f(x) = c x1^{a1} x2^{a2} ··· xn^{an},  dom f = R^n_{++}
with c > 0; exponents ai can be any real numbers
posynomial function: sum of monomials
f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} ··· xn^{ank},  dom f = R^n_{++}
geometric program (GP)
minimize f0(x)
subject to fi(x) ≤ 1, i = 1, ..., m
hi(x) = 1, i = 1, ..., p
with fi posynomial, hi monomial
Geometric program in convex form
change variables to yi = log x i , and take logarithm of cost, constraints
monomial f(x) = c x1^{a1} ··· xn^{an} transforms to
log f(e^{y1}, ..., e^{yn}) = aᵀy + b  (b = log c)
posynomial f(x) = Σ_{k=1}^K ck x1^{a1k} ··· xn^{ank} transforms to
log f(e^{y1}, ..., e^{yn}) = log Σ_{k=1}^K e^{akᵀy + bk}  (bk = log ck)
geometric program transforms to convex problem
minimize log Σ_{k=1}^K exp(a0kᵀy + b0k)
subject to log Σ_{k=1}^K exp(aikᵀy + bik) ≤ 0, i = 1, ..., m
Gy + d = 0
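A minimal sketch of a small GP in cvxpy, whose gp=True mode performs the log change of variables above internally (the particular monomials here are made up for illustration):

```python
import cvxpy as cp

x, y = cp.Variable(pos=True), cp.Variable(pos=True)
objective = cp.Minimize(x * y + 2 / (x * y))   # posynomial: xy + 2 x^-1 y^-1
constraints = [x * y >= 1, y / x <= 4]         # monomial constraints
prob = cp.Problem(objective, constraints)
prob.solve(gp=True)
print(x.value, y.value, prob.value)
```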
Design of cantilever beam
(figure: cantilever beam with segments 1, ..., 4 and vertical force F applied at the right end)
N segments with unit lengths, rectangular cross-sections of size wi × hi
given vertical force F applied at the right end
design problem
minimize total weight
subject to upper & lower bounds on wi, hi
upper & lower bounds on aspect ratios hi/wi
upper bound on stress in each segment
upper bound on vertical deflection at the end of the beam
variables: wi, hi for i = 1, ..., N
objective and constraint functions
total weight w1h1 + ··· + wNhN is posynomial
aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials
maximum stress in segment i is given by 6iF/(wi hi²), a monomial
the vertical deflection yi and slope vi of central axis at the right end of segment i are defined recursively as
vi = 12(i − 1/2) F/(E wi hi³) + v_{i+1}
yi = 6(i − 1/3) F/(E wi hi³) + v_{i+1} + y_{i+1}
for i = N, N−1, ..., 1, with v_{N+1} = y_{N+1} = 0 (E is Young's modulus)
vi and yi are posynomial functions of w, h
formulation as a GP
minimize w1h1 + ··· + wNhN
subject to wmax⁻¹ wi ≤ 1, wmin wi⁻¹ ≤ 1, i = 1, ..., N
hmax⁻¹ hi ≤ 1, hmin hi⁻¹ ≤ 1, i = 1, ..., N
Smax⁻¹ wi⁻¹ hi ≤ 1, Smin wi hi⁻¹ ≤ 1, i = 1, ..., N
6iF σmax⁻¹ wi⁻¹ hi⁻² ≤ 1, i = 1, ..., N
ymax⁻¹ y1 ≤ 1
note
we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax as
wmin/wi ≤ 1, wi/wmax ≤ 1, hmin/hi ≤ 1, hi/hmax ≤ 1
we write Smin ≤ hi/wi ≤ Smax as
Smin wi/hi ≤ 1, hi/(wi Smax) ≤ 1
Minimizing spectral radius of nonnegative matrix
Perron-Frobenius eigenvalue λpf(A)
exists for (elementwise) positive A ∈ R^{n×n}
a real, positive eigenvalue of A, equal to spectral radius max_i |λi(A)|
determines asymptotic growth (decay) rate of A^k: A^k ≈ λpf^k as k → ∞
alternative characterization: λpf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}
minimizing spectral radius of matrix of posynomials
minimize λpf(A(x)), where the elements A(x)ij are posynomials of x
equivalent geometric program:
minimize λ
subject to Σ_{j=1}^n A(x)ij vj/(λvi) ≤ 1, i = 1, ..., n
variables λ, v, x
Generalized inequality constraints
convex problem with generalized inequality constraints
minimize f0(x)
subject to fi(x) ⪯_{Ki} 0, i = 1, ..., m
Ax = b
f0 : R^n → R convex; fi : R^n → R^{ki} Ki-convex w.r.t. proper cone Ki
same properties as standard convex problem (convex feasible set, local optimum is global, etc.)
conic form problem: special case with affine objective and constraints
minimize cᵀx
subject to Fx + g ⪯_K 0
Ax = b
extends linear programming (K = R^m_+) to nonpolyhedral cones
Semidefinite program (SDP)
minimize cᵀx
subject to x1F1 + x2F2 + ··· + xnFn + G ⪯ 0
Ax = b
with Fi, G ∈ S^k
inequality constraint is called linear matrix inequality (LMI)
includes problems with multiple LMI constraints: for example,
x1F̂1 + ··· + xnF̂n + Ĝ ⪯ 0,  x1F̃1 + ··· + xnF̃n + G̃ ⪯ 0
is equivalent to the single LMI
x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + ··· + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0
LP and SOCP as SDP
LP and equivalent SDP
LP:
minimize cᵀx
subject to Ax ⪯ b
SDP:
minimize cᵀx
subject to diag(Ax − b) ⪯ 0
(note different interpretation of generalized inequality ⪯)
SOCP and equivalent SDP
SOCP:
minimize fᵀx
subject to ‖Aix + bi‖₂ ≤ ciᵀx + di, i = 1, ..., m
SDP:
minimize fᵀx
subject to [ (ciᵀx + di)I, Aix + bi; (Aix + bi)ᵀ, ciᵀx + di ] ⪰ 0, i = 1, ..., m
Matrix norm minimization
minimize ‖A(x)‖₂ = (λmax(A(x)ᵀA(x)))^{1/2}
where A(x) = A0 + x1A1 + ··· + xnAn (with given Ai ∈ R^{p×q})
equivalent SDP
minimize t
subject to [ tI, A(x); A(x)ᵀ, tI ] ⪰ 0
variables x ∈ R^n, t ∈ R
constraint follows from
‖A‖₂ ≤ t ⟺ AᵀA ⪯ t²I, t ≥ 0 ⟺ [ tI, A; Aᵀ, tI ] ⪰ 0
Vector optimization
general vector optimization problem
minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
vector objective f0 : R^n → R^q, minimized w.r.t. proper cone K ⊆ R^q
convex vector optimization problem
minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
Ax = b
with f0 K-convex, f1, ..., fm convex
Optimal and Pareto optimal points
set of achievable objective values
O = {f0(x) | x feasible}
feasible x is optimal if f0(x) is a minimum value of O
feasible x is Pareto optimal if f0(x) is a minimal value of O
(figures: O with minimum value f0(x⋆), x⋆ optimal; O with minimal value f0(x^po), x^po Pareto optimal)
Multicriterion optimization
vector optimization problem with K = Rq+
f0(x) = (F1(x), ..., Fq(x))
q different objectives Fi; roughly speaking we want all Fi's to be small
feasible x⋆ is optimal if
y feasible ⟹ f0(x⋆) ⪯ f0(y)
if there exists an optimal point, the objectives are noncompeting
feasible x^po is Pareto optimal if
y feasible, f0(y) ⪯ f0(x^po) ⟹ f0(x^po) = f0(y)
if there are multiple Pareto optimal values, there is a trade-off between the objectives
Regularized least-squares
minimize (w.r.t. R²_+) ( ‖Ax − b‖₂², ‖x‖₂² )
(figure: achievable set O in the plane of F1(x) = ‖Ax − b‖₂² and F2(x) = ‖x‖₂²)
example for A ∈ R^{100×10}; heavy line is formed by Pareto optimal points
Risk–return trade-off in portfolio optimization
minimize (w.r.t. R²_+) ( −p̄ᵀx, xᵀΣx )
subject to 1ᵀx = 1, x ⪰ 0
x ∈ R^n is investment portfolio; xi is fraction invested in asset i
p ∈ R^n is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ
p̄ᵀx = E r is expected return; xᵀΣx = var r is return variance
example
(figures: mean return vs. standard deviation of return; allocation x vs. standard deviation of return)
Scalarization
to find Pareto optimal points: choose λ ≻_{K*} 0 and solve scalar problem
minimize λᵀf0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem
(figure: set O with Pareto optimal values f0(x1), f0(x2) obtained for weights λ1, λ2, and a Pareto optimal value f0(x3) not obtained by scalarization)
for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ≻_{K*} 0
Scalarization for multicriterion problems
to find Pareto optimal points, minimize positive weighted sum
λᵀf0(x) = λ1F1(x) + ··· + λqFq(x)
examples
regularized least-squares problem of page 4–43
take λ = (1, γ) with γ > 0
minimize ‖Ax − b‖₂² + γ‖x‖₂²
for fixed γ, a LS problem
(figure: trade-off curve of ‖x‖₂² vs. ‖Ax − b‖₂², with the point for γ = 1 marked)
risk–return trade-off of page 4–44
minimize −p̄ᵀx + γxᵀΣx
subject to 1ᵀx = 1, x ⪰ 0
for fixed γ > 0, a quadratic program
Convex Optimization Boyd & Vandenberghe
5. Duality
Lagrangian
standard form problem (not necessarily convex)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
variable x ∈ R^n, domain D, optimal value p⋆
Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,
L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)
weighted sum of objective and constraint functions
λi is Lagrange multiplier associated with fi(x) ≤ 0
νi is Lagrange multiplier associated with hi(x) = 0
Lagrange dual function
Lagrange dual function: g : R^m × R^p → R,
g(λ, ν) = inf_{x∈D} L(x, λ, ν)
        = inf_{x∈D} ( f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x) )
g is concave, can be −∞ for some λ, ν
lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p⋆
proof: if x̃ is feasible and λ ⪰ 0, then
f0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)
minimizing over all feasible x̃ gives p⋆ ≥ g(λ, ν)
Standard form LP
minimize cᵀx
subject to Ax = b, x ⪰ 0
dual function
Lagrangian is
L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
           = −bᵀν + (c + Aᵀν − λ)ᵀx
L is affine in x, hence
g(λ, ν) = inf_x L(x, λ, ν) = −bᵀν if Aᵀν − λ + c = 0, −∞ otherwise
g is linear on affine domain {(λ, ν) | Aᵀν − λ + c = 0}, hence concave
lower bound property: p⋆ ≥ −bᵀν if Aᵀν + c ⪰ 0
Equality constrained norm minimization
minimize ‖x‖
subject to Ax = b
dual function
g(ν) = inf_x ( ‖x‖ − νᵀAx + bᵀν ) = bᵀν if ‖Aᵀν‖* ≤ 1, −∞ otherwise
where ‖v‖* = sup_{‖u‖≤1} uᵀv is dual norm of ‖·‖
proof: follows from inf_x ( ‖x‖ − yᵀx ) = 0 if ‖y‖* ≤ 1, −∞ otherwise
if ‖y‖* ≤ 1, then ‖x‖ − yᵀx ≥ 0 for all x, with equality if x = 0
if ‖y‖* > 1, choose x = tu where ‖u‖ ≤ 1, uᵀy = ‖y‖* > 1: then
‖x‖ − yᵀx = t(‖u‖ − ‖y‖*) → −∞ as t → ∞
lower bound property: p⋆ ≥ bᵀν if ‖Aᵀν‖* ≤ 1
Two-way partitioning
minimize xᵀWx
subject to xi² = 1, i = 1, ..., n
a nonconvex problem; feasible set contains 2^n discrete points
interpretation: partition {1, ..., n} in two sets; Wij is cost of assigning i, j to the same set; −Wij is cost of assigning to different sets
dual function
g(ν) = inf_x ( xᵀWx + Σ_i νi(xi² − 1) ) = inf_x xᵀ(W + diag(ν))x − 1ᵀν
     = −1ᵀν if W + diag(ν) ⪰ 0, −∞ otherwise
lower bound property: p⋆ ≥ −1ᵀν if W + diag(ν) ⪰ 0
example: ν = −λmin(W)1 gives bound p⋆ ≥ nλmin(W)
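A minimal numpy check (my own illustration) of the bound p⋆ ≥ n λmin(W), comparing against brute-force enumeration for a small random W:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n)); W = (W + W.T) / 2   # random symmetric W

# brute force over the 2^n feasible points x in {-1, 1}^n
p_star = min(np.array(s) @ W @ np.array(s)
             for s in itertools.product([-1.0, 1.0], repeat=n))
bound = n * np.linalg.eigvalsh(W).min()               # dual lower bound
assert bound <= p_star
print(bound, p_star)
```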
Lagrange dual and conjugate function
minimize f0(x)
subject to Ax ⪯ b, Cx = d
dual function
g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
        = −f0*(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν
recall definition of conjugate f*(y) = sup_{x∈dom f} (yᵀx − f(x))
simplifies derivation of dual if conjugate of f0 is known
example: entropy maximization
f0(x) = Σ_{i=1}^n xi log xi,  f0*(y) = Σ_{i=1}^n e^{yi − 1}
The dual problem
Lagrange dual problem
maximize g(λ, ν)
subject to λ ⪰ 0
finds best lower bound on p⋆, obtained from Lagrange dual function
a convex optimization problem; optimal value denoted d⋆
λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
often simplified by making implicit constraint (λ, ν) ∈ dom g explicit
example: standard form LP and its dual (page 5–5)
minimize cᵀx
subject to Ax = b
x ⪰ 0
maximize −bᵀν
subject to Aᵀν + c ⪰ 0
Weak and strong duality
weak duality: d⋆ ≤ p⋆
always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems
for example, solving the SDP
maximize −1ᵀν
subject to W + diag(ν) ⪰ 0
gives a lower bound for the two-way partitioning problem on page 5–7
strong duality: d⋆ = p⋆
does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater's constraint qualification
strong duality holds for a convex problem
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
Ax = b
if it is strictly feasible, i.e.,
∃x ∈ int D: fi(x) < 0, i = 1, ..., m, Ax = b
also guarantees that the dual optimum is attained (if p⋆ > −∞)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, ...
there exist many other types of constraint qualifications
Inequality form LP
primal problem
minimize cᵀx
subject to Ax ⪯ b
dual function
g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = −bᵀλ if Aᵀλ + c = 0, −∞ otherwise
dual problem
maximize −bᵀλ
subject to Aᵀλ + c = 0, λ ⪰ 0
from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
in fact, p⋆ = d⋆ except when primal and dual are infeasible
Quadratic program
primal problem (assume P ∈ S^n_{++})
minimize xᵀPx
subject to Ax ⪯ b
dual function
g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
dual problem
maximize −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
subject to λ ⪰ 0
from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
in fact, p⋆ = d⋆ always
A nonconvex problem with strong duality
minimize xᵀAx + 2bᵀx
subject to xᵀx ≤ 1
A ⋡ 0, hence nonconvex
dual function: g(λ) = inf_x ( xᵀ(A + λI)x + 2bᵀx − λ )
unbounded below if A + λI ⋡ 0 or if A + λI ⪰ 0 and b ∉ R(A + λI)
minimized by x = −(A + λI)†b otherwise: g(λ) = −bᵀ(A + λI)†b − λ
dual problem and equivalent SDP:
maximize −bᵀ(A + λI)†b − λ
subject to A + λI ⪰ 0, b ∈ R(A + λI)
maximize −t − λ
subject to [ A + λI, b; bᵀ, t ] ⪰ 0
strong duality although primal problem is not convex (not easy to show)
Geometric interpretation
for simplicity, consider problem with one constraint f1(x) ≤ 0
interpretation of dual function:
g(λ) = inf_{(u,t)∈G} (t + λu),  where G = {(f1(x), f0(x)) | x ∈ D}
(figures: set G in the (u, t)-plane, with p⋆, g(λ), and the line λu + t = g(λ); second panel also marks d⋆)
λu + t = g(λ) is (non-vertical) supporting hyperplane to G
hyperplane intersects t-axis at t = g(λ)
epigraph variation: same interpretation if G is replaced with
A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}
(figure: set A in the (u, t)-plane with p⋆, g(λ), and the supporting line λu + t = g(λ))
strong duality
holds if there is a non-vertical supporting hyperplane to A at (0, p⋆)
for convex problem, A is convex, hence has supp. hyperplane at (0, p⋆)
Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p⋆) must be non-vertical
Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called KKT conditions (for a problem with differentiable fi, hi):
1. primal constraints: fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λi fi(x) = 0, i = 1, ..., m
4. gradient of Lagrangian with respect to x vanishes:
∇f0(x) + Σ_{i=1}^m λi∇fi(x) + Σ_{i=1}^p νi∇hi(x) = 0
from page 5–17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem
if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:
from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
hence, f0(x̃) = g(λ̃, ν̃)
if Slater's condition is satisfied:
x is optimal if and only if there exist λ, ν that satisfy KKT conditions
recall that Slater implies strong duality, and dual optimum is attained
generalizes optimality condition ∇f0(x) = 0 for unconstrained problem
example: water-filling (assume αi > 0)
minimize −Σ_{i=1}^n log(xi + αi)
subject to x ⪰ 0, 1ᵀx = 1
x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ R^n, ν ∈ R such that
λ ⪰ 0,  λi xi = 0,  1/(xi + αi) + λi = ν
if ν < 1/αi: λi = 0 and xi = 1/ν − αi
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
determine ν from 1ᵀx = Σ_{i=1}^n max{0, 1/ν − αi} = 1
interpretation
n patches; level of patch i is at height αi
flood area with unit amount of water
resulting level is 1/ν⋆
(figure: water-filling picture, patch heights αi, water level 1/ν⋆, fill depths xi)
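A minimal numpy sketch of the water-filling solution derived above; the levels α are made up, and the water level 1/ν is found by bisection on the monotone equation Σ max{0, 1/ν − αi} = 1:

```python
import numpy as np

alpha = np.array([0.7, 1.3, 2.0, 0.4])        # made-up patch heights

def total(level):                             # sum_i max{0, level - alpha_i}
    return np.maximum(0.0, level - alpha).sum()

lo, hi = 0.0, alpha.max() + 1.0               # bracket on the level 1/nu
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if total(mid) < 1.0 else (lo, mid)

x = np.maximum(0.0, lo - alpha)               # optimal x_i = max{0, 1/nu - alpha_i}
assert abs(x.sum() - 1.0) < 1e-9
print(x)
```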
Perturbation and sensitivity analysis
(unperturbed) optimization problem and its dual
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p
maximize g(λ, ν)
subject to λ ⪰ 0
perturbed problem and its dual
min. f0(x)
s.t. fi(x) ≤ ui, i = 1, ..., m
hi(x) = vi, i = 1, ..., p
max. g(λ, ν) − uᵀλ − vᵀν
s.t. λ ⪰ 0
x is primal variable; u, v are parameters
p⋆(u, v) is optimal value as a function of u, v
we are interested in information about p⋆(u, v) that we can obtain from the solution of the unperturbed problem and its dual
global sensitivity result
assume strong duality holds for unperturbed problem, and that λ⋆, ν⋆ are dual optimal for unperturbed problem
apply weak duality to perturbed problem:
p⋆(u, v) ≥ g(λ⋆, ν⋆) − uᵀλ⋆ − vᵀν⋆ = p⋆(0, 0) − uᵀλ⋆ − vᵀν⋆
sensitivity interpretation
if λi⋆ large: p⋆ increases greatly if we tighten constraint i (ui < 0)
if λi⋆ small: p⋆ does not decrease much if we loosen constraint i (ui > 0)
if νi⋆ large and positive: p⋆ increases greatly if we take vi < 0; if νi⋆ large and negative: p⋆ increases greatly if we take vi > 0
if νi⋆ small and positive: p⋆ does not decrease much if we take vi > 0; if νi⋆ small and negative: p⋆ does not decrease much if we take vi < 0
local sensitivity: if (in addition) p⋆(u, v) is differentiable at (0, 0), then
λi⋆ = −∂p⋆(0, 0)/∂ui,  νi⋆ = −∂p⋆(0, 0)/∂vi
proof (for λi⋆): from global sensitivity result,
∂p⋆(0, 0)/∂ui = lim_{t↘0} ( p⋆(tei, 0) − p⋆(0, 0) )/t ≥ −λi⋆
∂p⋆(0, 0)/∂ui = lim_{t↗0} ( p⋆(tei, 0) − p⋆(0, 0) )/t ≤ −λi⋆
hence, equality
(figure: p⋆(u) for a problem with one (inequality) constraint; the line p⋆(0) − λ⋆u is tangent at u = 0)
Duality and problem reformulations
equivalent formulations of a problem can lead to very different duals reformulating the primal problem can be useful when the dual is difficultto derive or uninteresting
7/31/2019 Bv Cvxslides
140/300
to derive, or uninteresting
common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions
e.g., replace f_0(x) by φ(f_0(x)) with φ convex, increasing
Duality 524
norm approximation problem: minimize ‖Ax − b‖

equivalent formulation:

minimize ‖y‖
subject to y = Ax − b

can look up conjugate of ‖ · ‖, or derive dual directly
g(ν) = inf_{x,y} (‖y‖ + ν^T y − ν^T Ax + b^T ν)
     = { b^T ν + inf_y (‖y‖ + ν^T y)   if A^T ν = 0
       { −∞                             otherwise
     = { b^T ν   if A^T ν = 0, ‖ν‖_∗ ≤ 1
       { −∞      otherwise

(see page 5-4)
dual of norm approximation problem:

maximize b^T ν
subject to A^T ν = 0, ‖ν‖_∗ ≤ 1
Duality 526
Implicit constraints
LP with box constraints: primal and dual problem
minimize c^T x
subject to Ax = b
−1 ⪯ x ⪯ 1

maximize −b^T ν − 1^T λ_1 − 1^T λ_2
subject to c + A^T ν + λ_1 − λ_2 = 0
λ_1 ⪰ 0, λ_2 ⪰ 0

reformulation with box constraints made implicit
minimize f_0(x) = { c^T x   if −1 ⪯ x ⪯ 1
                  { ∞       otherwise
subject to Ax = b

dual function
g(ν) = inf_{−1⪯x⪯1} (c^T x + ν^T(Ax − b)) = −b^T ν − ‖A^T ν + c‖_1
dual problem: maximize −b^T ν − ‖A^T ν + c‖_1

Duality 527
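A quick sanity check of this primal/dual pair, sketched in cvxpy with random (made-up) data; the two optimal values agree by strong duality:

    import cvxpy as cp
    import numpy as np

    np.random.seed(2)
    A = np.random.randn(3, 5)
    c = np.random.randn(5)
    b = A @ (0.5 * np.random.rand(5))     # b chosen so the box LP is feasible

    x = cp.Variable(5)
    primal = cp.Problem(cp.Minimize(c @ x), [A @ x == b, cp.abs(x) <= 1])
    primal.solve()

    nu = cp.Variable(3)
    dual = cp.Problem(cp.Maximize(-b @ nu - cp.norm(A.T @ nu + c, 1)))
    dual.solve()
    print(primal.value, dual.value)       # equal up to solver tolerance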
Problems with generalized inequalities
minimize f_0(x)
subject to f_i(x) ⪯_{K_i} 0, i = 1, . . . , m
h_i(x) = 0, i = 1, . . . , p
⪯_{K_i} is generalized inequality on R^{k_i}
definitions are parallel to scalar case:
Lagrange multiplier for f_i(x) ⪯_{K_i} 0 is vector λ_i ∈ R^{k_i}
Lagrangian L : R^n × R^{k_1} × · · · × R^{k_m} × R^p → R is defined as

L(x, λ_1, . . . , λ_m, ν) = f_0(x) + Σ_{i=1}^m λ_i^T f_i(x) + Σ_{i=1}^p ν_i h_i(x)

dual function g : R^{k_1} × · · · × R^{k_m} × R^p → R is defined as

g(λ_1, . . . , λ_m, ν) = inf_{x∈D} L(x, λ_1, . . . , λ_m, ν)
Duality 528
Semidefinite program
primal SDP (F_i, G ∈ S^k)

minimize c^T x
subject to x_1 F_1 + · · · + x_n F_n ⪯ G

Lagrange multiplier is matrix Z ∈ S^k
Lagrangian L(x, Z) = c^T x + tr(Z(x_1 F_1 + · · · + x_n F_n − G))

dual function

g(Z) = inf_x L(x, Z) = { −tr(GZ)   if tr(F_i Z) + c_i = 0, i = 1, . . . , n
                       { −∞        otherwise

dual SDP
maximize −tr(GZ)
subject to Z ⪰ 0, tr(F_i Z) + c_i = 0, i = 1, . . . , n

p* = d* if primal SDP is strictly feasible (∃x with x_1 F_1 + · · · + x_n F_n ≺ G)

Duality 530
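A sketch of this pair in cvxpy with random symmetric data (made up; for generic indefinite F_i the primal is bounded); the dual variable attached to the LMI constraint plays the role of Z:

    import cvxpy as cp
    import numpy as np

    np.random.seed(3)
    k, n = 4, 2
    sym = lambda M: (M + M.T) / 2
    F = [sym(np.random.randn(k, k)) for _ in range(n)]
    G = sym(np.random.randn(k, k)) + 5 * np.eye(k)   # strictly feasible at x = 0
    c = np.random.randn(n)

    x = cp.Variable(n)
    lmi = sum(x[i] * F[i] for i in range(n)) << G
    prob = cp.Problem(cp.Minimize(c @ x), [lmi])
    prob.solve()

    Z = lmi.dual_value                               # dual optimal Z >= 0
    print(prob.value, -np.trace(G @ Z))              # equal: p* = d*
    print([float(np.trace(F[i] @ Z) + c[i]) for i in range(n)])   # ~ 0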
Convex Optimization Boyd & Vandenberghe
6. Approximation and fitting
norm approximation
least-norm problems
regularized approximation
robust approximation
61
examples
least-squares approximation (‖ · ‖_2): solution satisfies normal equations

A^T A x = A^T b

(x* = (A^T A)^{−1} A^T b if rank A = n)
Chebyshev approximation (‖ · ‖_∞): can be solved as an LP

minimize t
subject to −t 1 ⪯ Ax − b ⪯ t 1
sum of absolute residuals approximation (‖ · ‖_1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ Ax − b ⪯ y
Approximation and fitting 63
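All three fits are one-liners in cvxpy; a minimal sketch on made-up data of the same size as the example on the next slide:

    import cvxpy as cp
    import numpy as np

    np.random.seed(4)
    A, b = np.random.randn(100, 30), np.random.randn(100)

    x = cp.Variable(30)
    for p in [2, 1, "inf"]:
        cp.Problem(cp.Minimize(cp.norm(A @ x - b, p))).solve()
        r = A @ x.value - b
        print(p, float(np.abs(r).max()), float(np.abs(r).sum()))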
example (m = 100, n = 30): histogram of residuals for penalties

φ(u) = |u|,   φ(u) = u²,   φ(u) = max{0, |u| − a},   φ(u) = −log(1 − u²)
[figure: histograms of residuals r for the four penalties: p = 1, p = 2, deadzone-linear, log barrier]
shape of penalty function has large effect on distribution of residuals
Approximation and fitting 65
Huber penalty function (with parameter M )
φ_hub(u) = { u²            if |u| ≤ M
           { M(2|u| − M)   if |u| > M

linear growth for large u makes approximation less sensitive to outliers
left: Huber penalty for M = 1
right: affine function f(t) = α + βt fitted to 42 points t_i, y_i (circles) using quadratic (dashed) and Huber (solid) penalty
Approximation and fitting 66
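A sketch of a robust affine fit via cvxpy's huber atom (synthetic data with deliberate outliers; cp.huber matches the definition above):

    import cvxpy as cp
    import numpy as np

    np.random.seed(5)
    t = np.linspace(-10, 10, 42)
    y = 1.0 + 0.5 * t + 0.5 * np.random.randn(42)   # true alpha=1, beta=0.5
    y[::7] += 15 * np.random.randn(6)               # a few gross outliers

    alpha, beta = cp.Variable(), cp.Variable()
    r = y - (alpha + beta * t)
    cp.Problem(cp.Minimize(cp.sum(cp.huber(r, M=1)))).solve()
    print(alpha.value, beta.value)                  # close to (1, 0.5)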
Least-norm problems
minimize ‖x‖
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n; ‖ · ‖ is a norm on R^n)
interpretations of solution x* = argmin_{Ax=b} ‖x‖:

geometric: x* is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x* is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x* is smallest (most efficient) design that satisfies requirements
Approximation and fitting 67
examples
least-squares solution of linear equations (‖ · ‖_2): can be solved via optimality conditions

2x + A^T ν = 0,   Ax = b
minimum sum of absolute values (‖ · ‖_1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ x ⪯ y, Ax = b

tends to produce sparse solution x*
extension: least-penalty problem

minimize φ(x_1) + · · · + φ(x_n)
subject to Ax = b

φ : R → R is convex penalty function

Approximation and fitting 68
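A sketch contrasting the least ℓ2-norm and least ℓ1-norm solutions of one underdetermined system (made-up data); the ℓ1 solution is typically much sparser:

    import cvxpy as cp
    import numpy as np

    np.random.seed(6)
    A, b = np.random.randn(10, 30), np.random.randn(10)

    x2 = np.linalg.pinv(A) @ b                     # least l2-norm solution
    x1 = cp.Variable(30)
    cp.Problem(cp.Minimize(cp.norm(x1, 1)), [A @ x1 == b]).solve()
    nnz = lambda v: int((np.abs(v) > 1e-6).sum())
    print(nnz(x2), nnz(x1.value))                  # e.g. 30 vs. roughly 10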
Regularized approximation
minimize (w.r.t. R²_+) (‖Ax − b‖, ‖x‖)

(A ∈ R^{m×n}; norms on R^m and R^n can be different)
interpretation: find good approximation Ax ≈ b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that x is small
optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x
Approximation and fitting 69
Scalarized problem
minimize ‖Ax − b‖ + γ‖x‖

solution for γ > 0 traces out optimal trade-off curve
other common method: minimize ‖Ax − b‖² + δ‖x‖² with δ > 0
Tikhonov regularization
minimize ‖Ax − b‖₂² + δ‖x‖₂²

can be solved as a least-squares problem

minimize ‖ [A; √δ I] x − [b; 0] ‖₂²

solution x* = (A^T A + δI)^{−1} A^T b
Approximation and fitting 610
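Both routes to the Tikhonov solution, sketched in numpy with made-up data (δ is the weight above):

    import numpy as np

    np.random.seed(7)
    A, b, delta = np.random.randn(20, 8), np.random.randn(20), 0.1

    # normal-equations form
    x_direct = np.linalg.solve(A.T @ A + delta * np.eye(8), A.T @ b)

    # stacked least-squares form
    K = np.vstack([A, np.sqrt(delta) * np.eye(8)])
    r = np.concatenate([b, np.zeros(8)])
    x_stacked = np.linalg.lstsq(K, r, rcond=None)[0]
    print(np.allclose(x_direct, x_stacked))        # True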
Optimal input design
linear dynamical system with impulse response h:
y(t) = Σ_{τ=0}^t h(τ) u(t − τ),   t = 0, 1, . . . , N
input design problem: multicriterion problem with 3 objectives
1. tracking error with desired output y_des: J_track = Σ_{t=0}^N (y(t) − y_des(t))²
2. input magnitude: J_mag = Σ_{t=0}^N u(t)²
3. input variation: J_der = Σ_{t=0}^{N−1} (u(t + 1) − u(t))²
track desired output using a small and slowly varying input signal
regularized least-squares formulation
minimize J_track + δ J_der + η J_mag

for fixed δ, η, a least-squares problem in u(0), . . . , u(N)
Approximation and fitting 611
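A sketch of the regularized least-squares formulation in numpy; the impulse response, target output, and weights below are all made up:

    import numpy as np

    N = 200
    t = np.arange(N + 1)
    h = 0.9 ** t * np.cos(0.5 * t)                 # hypothetical impulse response
    ydes = np.sign(np.sin(2 * np.pi * t / N))      # hypothetical desired output

    # y = H u with H the lower-triangular (Toeplitz) convolution matrix
    H = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        H[i, : i + 1] = h[i::-1]
    D = np.diff(np.eye(N + 1), axis=0)             # first differences, for J_der

    delta, eta = 0.1, 0.01                         # arbitrary trade-off weights
    # stacked LS problem: minimize J_track + delta*J_der + eta*J_mag
    K = np.vstack([H, np.sqrt(delta) * D, np.sqrt(eta) * np.eye(N + 1)])
    r = np.concatenate([ydes, np.zeros(N), np.zeros(N + 1)])
    u = np.linalg.lstsq(K, r, rcond=None)[0]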
example: 3 solutions on optimal trade-off curve

(top) δ = 0, small η; (middle) δ = 0, larger η; (bottom) large δ
[figure: input u(t) and output y(t) for the three solutions, t = 0, . . . , 200]
Approximation and fitting 612
Signal reconstruction
minimize (w.r.t. R²_+) (‖x̂ − x_cor‖₂, φ(x̂))

x ∈ R^n is unknown signal
x_cor = x + v is (known) corrupted version of x, with additive noise v
variable x̂ (reconstructed signal) is estimate of x
φ : R^n → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,    φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
Approximation and fitting 613
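A sketch of both smoothers on a made-up piecewise-constant signal, using cvxpy's built-in diff and tv atoms (the weight 10 is arbitrary):

    import cvxpy as cp
    import numpy as np

    np.random.seed(8)
    x = np.repeat([0.0, 1.5, -1.0, 0.5], 500)      # n = 2000, sharp transitions
    xcor = x + 0.3 * np.random.randn(x.size)

    for name, phi in [("quad", lambda z: cp.sum_squares(cp.diff(z))),
                      ("tv",   lambda z: cp.tv(z))]:
        xhat = cp.Variable(x.size)
        obj = cp.sum_squares(xhat - xcor) + 10 * phi(xhat)
        cp.Problem(cp.Minimize(obj)).solve()
        print(name, float(np.linalg.norm(xhat.value - x)))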
quadratic smoothing example
[figure: original signal x and noisy signal x_cor (left); three reconstructions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂) (right)]
Approximation and fitting 614
total variation reconstruction example
[figure: original signal x and noisy signal x_cor (left); three reconstructions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂) (right)]

quadratic smoothing smooths out noise and sharp transitions in signal
Approximation and fitting 615
[figure: original signal x and noisy signal x_cor (left); three reconstructions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_tv(x̂) (right)]

total variation smoothing preserves sharp transitions in signal
Approximation and fitting 616
Robust approximation
minimize ‖Ax − b‖ with uncertain A

two approaches:
stochastic: assume A is random, minimize E ‖Ax − b‖
worst-case: set A of possible values of A, minimize sup_{A∈A} ‖Ax − b‖
tractable only in special cases (certain norms ‖ · ‖, distributions, sets A)
example: A(u) = A_0 + uA_1

x_nom minimizes ‖A_0 x − b‖₂²
x_stoch minimizes E ‖A(u)x − b‖₂² with u uniform on [−1, 1]
x_wc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²

figure shows r(u) = ‖A(u)x − b‖₂
[figure: r(u) versus u for x_nom, x_stoch, and x_wc]
Approximation and fitting 617
stochastic robust LS with A = Ā + U, U random, E U = 0, E U^T U = P

minimize E ‖(Ā + U)x − b‖₂²

explicit expression for objective:

E ‖Ax − b‖₂² = E ‖Āx − b + Ux‖₂²
= ‖Āx − b‖₂² + E x^T U^T U x = ‖Āx − b‖₂² + x^T P x

hence, robust LS problem is equivalent to LS problem

minimize ‖Āx − b‖₂² + ‖P^{1/2} x‖₂²

for P = δI, get Tikhonov regularized problem

minimize ‖Āx − b‖₂² + δ‖x‖₂²
Approximation and fitting 618
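A Monte Carlo check of this identity on made-up data; here U has IID N(0, σ²) entries, so P = mσ²I:

    import numpy as np

    np.random.seed(9)
    m, n, sigma = 15, 5, 0.3
    Abar = np.random.randn(m, n)
    b, x = np.random.randn(m), np.random.randn(n)
    P = m * sigma**2 * np.eye(n)                   # E U'U for IID entries

    vals = [np.linalg.norm((Abar + sigma * np.random.randn(m, n)) @ x - b) ** 2
            for _ in range(50000)]
    print(np.mean(vals), np.linalg.norm(Abar @ x - b) ** 2 + x @ P @ x)  # ~ equal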
worst-case robust LS with A = {Ā + u_1 A_1 + · · · + u_p A_p | ‖u‖₂ ≤ 1}

minimize sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²

where P(x) = [A_1 x  A_2 x  · · ·  A_p x],  q(x) = Āx − b
from page 5-14, strong duality holds between the following problems
maximize ‖Pu + q‖₂²
subject to ‖u‖₂² ≤ 1

minimize t + λ
subject to [ I     P    q ]
           [ P^T   λI   0 ]  ⪰ 0
           [ q^T   0    t ]

hence, robust LS problem is equivalent to SDP

minimize t + λ
subject to [ I        P(x)   q(x) ]
           [ P(x)^T   λI     0    ]  ⪰ 0
           [ q(x)^T   0      t    ]
Approximation and fitting 619
example: histogram of residuals
r(u) = ‖(A_0 + u_1 A_1 + u_2 A_2)x − b‖₂

with u uniformly distributed on unit disk, for three values of x
[figure: histograms of r(u) for x_ls, x_tik, and x_rls]

x_ls minimizes ‖A_0 x − b‖₂
x_tik minimizes ‖A_0 x − b‖₂² + δ‖x‖₂² (Tikhonov solution)
x_rls minimizes sup_{‖u‖₂≤1} ‖(A_0 + u_1 A_1 + u_2 A_2)x − b‖₂² (worst-case solution)
Approximation and fitting 620
Convex Optimization Boyd & Vandenberghe
7. Statistical estimation
maximum likelihood estimation
optimal detector design
experiment design
71
Parametric distribution estimation
distribution estimation problem: estimate probability density p(y) of a random variable from observed values
parametric distribution estimation: choose from a family of densities p_x(y), indexed by a parameter x
maximum likelihood estimation
maximize (over x) log p_x(y)

y is observed value
l(x) = log p_x(y) is called log-likelihood function
can add constraints x ∈ C explicitly, or define p_x(y) = 0 for x ∉ C
a convex optimization problem if log p_x(y) is concave in x for fixed y
Statistical estimation 72
Linear measurements with IID noise
linear measurement model
y_i = a_i^T x + v_i,  i = 1, . . . , m

x ∈ R^n is vector of unknown parameters
v_i is IID measurement noise, with density p(z)
y_i is measurement: y ∈ R^m has density p_x(y) = ∏_{i=1}^m p(y_i − a_i^T x)

maximum likelihood estimate: any solution x of

maximize l(x) = Σ_{i=1}^m log p(y_i − a_i^T x)

(y is observed value)
Statistical estimation 73
examples
Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (a_i^T x − y_i)²

ML estimate is LS solution

Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |a_i^T x − y_i|

ML estimate is ℓ1-norm solution
uniform noise on [−a, a]:

l(x) = { −m log(2a)   if |a_i^T x − y_i| ≤ a, i = 1, . . . , m
       { −∞           otherwise

ML estimate is any x with |a_i^T x − y_i| ≤ a
Statistical estimation 74
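The three ML estimates as convex problems, sketched in cvxpy on synthetic data (for the uniform model, minimizing the ℓ∞ residual yields one valid ML point whenever the data are consistent):

    import cvxpy as cp
    import numpy as np

    np.random.seed(11)
    A = np.random.randn(100, 5)
    x_true = np.random.randn(5)
    y = A @ x_true + np.random.laplace(scale=0.5, size=100)

    x = cp.Variable(5)
    for name, obj in [("gaussian -> LS ", cp.sum_squares(A @ x - y)),
                      ("laplacian -> l1", cp.norm(A @ x - y, 1)),
                      ("uniform -> linf", cp.norm(A @ x - y, "inf"))]:
        cp.Problem(cp.Minimize(obj)).solve()
        print(name, float(np.linalg.norm(x.value - x_true)))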
Logistic regression
random variable y ∈ {0, 1} with distribution

p = prob(y = 1) = exp(a^T u + b) / (1 + exp(a^T u + b))

a, b are parameters; u ∈ R^n are (observable) explanatory variables
estimation problem: estimate a, b from m observations (u_i, y_i)

log-likelihood function (for y_1 = · · · = y_k = 1, y_{k+1} = · · · = y_m = 0):

l(a, b) = log ( ∏_{i=1}^k exp(a^T u_i + b)/(1 + exp(a^T u_i + b)) · ∏_{i=k+1}^m 1/(1 + exp(a^T u_i + b)) )

        = Σ_{i=1}^k (a^T u_i + b) − Σ_{i=1}^m log(1 + exp(a^T u_i + b))

concave in a, b
Statistical estimation 75
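Since l(a, b) is concave, the ML fit is a convex problem; a sketch with cvxpy on synthetic data (cp.logistic(z) is log(1 + e^z)):

    import cvxpy as cp
    import numpy as np

    np.random.seed(12)
    u = 10 * np.random.rand(50)
    y = (np.random.rand(50) < 1 / (1 + np.exp(-(u - 5)))).astype(float)  # a=1, b=-5

    a, b = cp.Variable(), cp.Variable()
    z = a * u + b
    ll = y @ z - cp.sum(cp.logistic(z))     # the log-likelihood above
    cp.Problem(cp.Maximize(ll)).solve()
    print(a.value, b.value)                 # roughly (1, -5)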
example (n = 1 , m = 50 measurements)
[figure: prob(y = 1) versus u]
circles show 50 points (u_i, y_i)
solid curve is ML estimate of p = exp(au + b)/(1 + exp(au + b))
Statistical estimation 76
(Binary) hypothesis testing
detection (hypothesis testing) problem
given observation of a random variable X ∈ {1, . . . , n}, choose between:

hypothesis 1: X was generated by distribution p = (p_1, . . . , p_n)
hypothesis 2: X was generated by distribution q = (q_1, . . . , q_n)
randomized detector
a nonnegative matrix T ∈ R^{2×n}, with 1^T T = 1^T
if we observe X = k, we choose hypothesis 1 with probability t_{1k}, hypothesis 2 with probability t_{2k}
if all elements of T are 0 or 1, it is called a deterministic detector
Statistical estimation 77
detection probability matrix:
D = [Tp  Tq] = [ 1 − P_fp   P_fn     ]
               [ P_fp       1 − P_fn ]

P_fp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
P_fn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)

multicriterion formulation of detector design

minimize (w.r.t. R²_+) (P_fp, P_fn) = ((Tp)_2, (Tq)_1)
subject to t_{1k} + t_{2k} = 1, k = 1, . . . , n
t_{ik} ≥ 0, i = 1, 2, k = 1, . . . , n

variable T ∈ R^{2×n}
Statistical estimation 78
scalarization (with weight λ > 0)

minimize (Tp)_2 + λ(Tq)_1
subject to t_{1k} + t_{2k} = 1, t_{ik} ≥ 0, i = 1, 2, k = 1, . . . , n

an LP with a simple analytical solution
(t_{1k}, t_{2k}) = { (1, 0)   if p_k ≥ λ q_k
                   { (0, 1)   if p_k < λ q_k

a deterministic detector, given by a likelihood ratio test
if p_k = λ q_k for some k, any value 0 ≤ t_{1k} ≤ 1, t_{2k} = 1 − t_{1k} is optimal
(i.e., Pareto-optimal detectors include non-deterministic detectors)
minimax detector
minimize max{P_fp, P_fn} = max{(Tp)_2, (Tq)_1}
subject to t_{1k} + t_{2k} = 1, t_{ik} ≥ 0, i = 1, 2, k = 1, . . . , n

an LP; solution is usually not deterministic
Statistical estimation 79
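The minimax detector LP, sketched in cvxpy for the 4-outcome example on the next slide (the columns of P below are p and q):

    import cvxpy as cp
    import numpy as np

    p = np.array([0.70, 0.20, 0.05, 0.05])
    q = np.array([0.10, 0.10, 0.70, 0.10])

    T = cp.Variable((2, 4), nonneg=True)
    P_fp, P_fn = (T @ p)[1], (T @ q)[0]
    prob = cp.Problem(cp.Minimize(cp.maximum(P_fp, P_fn)),
                      [cp.sum(T, axis=0) == 1])
    prob.solve()
    print(np.round(T.value, 3), prob.value)   # optimal detector is randomized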
example
P = [p  q] = [ 0.70   0.10 ]
             [ 0.20   0.10 ]
             [ 0.05   0.70 ]
             [ 0.05   0.10 ]
[figure: trade-off curve of P_fn versus P_fp, with solutions 1, 2, 3, 4 marked]
solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector
Statistical estimation 710
Experiment design
m linear measurements y_i = a_i^T x + w_i, i = 1, . . . , m of unknown x ∈ R^n
measurement errors w_i are IID N(0, 1)
ML (least-squares) estimate is
x̂ = (Σ_{i=1}^m a_i a_i^T)^{−1} Σ_{i=1}^m y_i a_i
error e = x̂ − x has zero mean and covariance

E = E ee^T = (Σ_{i=1}^m a_i a_i^T)^{−1}
confidence ellipsoids are given by {x | (x − x̂)^T E^{−1} (x − x̂) ≤ β}

experiment design: choose a_i ∈ {v_1, . . . , v_p} (a set of possible test vectors) to make E small
Statistical estimation 711
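A tiny numeric illustration (made-up test vectors): the covariance E depends strongly on how measurements are allocated among the candidate vectors.

    import numpy as np

    V = np.array([[1.0, 0.0], [0.7, 0.7]])    # hypothetical test vectors v1, v2
    def cov(counts):                           # E for m_j uses of each v_j
        return np.linalg.inv(sum(m * np.outer(v, v) for m, v in zip(counts, V)))
    print(np.trace(cov([9, 1])), np.trace(cov([5, 5])))   # compare two designs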
vector optimization formulation
minimize (w.r.t. Sn+