8/10/2019 Convex Slides
1/301
Convex Optimization Boyd & Vandenberghe
1. Introduction
mathematical optimization
least-squares and linear programming
convex optimization
example
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization

(mathematical) optimization problem

minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m

• x = (x1, . . . , xn): optimization variables
• f0 : Rⁿ → R: objective function
• fi : Rⁿ → R, i = 1, . . . , m: constraint functions

optimal solution x⋆ has smallest value of f0 among all vectors that satisfy the constraints
Examples

portfolio optimization
• variables: amounts invested in different assets
• constraints: budget, max./min. investment per asset, minimum return
• objective: overall risk or return variance

device sizing in electronic circuits
• variables: device widths and lengths
• constraints: manufacturing limits, timing requirements, maximum area
• objective: power consumption

data fitting
• variables: model parameters
• constraints: prior information, parameter limits
• objective: measure of misfit or prediction error
Solving optimization problems

general optimization problem
• very difficult to solve
• methods involve some compromise, e.g., very long computation time, or not always finding the solution

exceptions: certain problem classes can be solved efficiently and reliably
• least-squares problems
• linear programming problems
• convex optimization problems
Least-squares

minimize ‖Ax − b‖₂²

solving least-squares problems
• analytical solution: x⋆ = (AᵀA)⁻¹Aᵀb
• reliable and efficient algorithms and software
• computation time proportional to n²k (A ∈ R^{k×n}); less if structured
• a mature technology

using least-squares
• least-squares problems are easy to recognize
• a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
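The analytical formula above can be checked against a library solver; a minimal NumPy sketch with random stand-in data (in practice one would use the library routine rather than form AᵀA explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 5                       # A is k x n with k >= n
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

# analytical solution x* = (A^T A)^{-1} A^T b
x_analytic = np.linalg.solve(A.T @ A, A.T @ b)

# library least-squares solver (numerically more stable factorization)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_analytic, x_lstsq)
```

Both agree up to round-off; the optimality condition AᵀA x⋆ = Aᵀb also holds at the computed solution.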
Linear programming

minimize cᵀx
subject to aiᵀx ≤ bi, i = 1, . . . , m

solving linear programs
• no analytical formula for solution
• reliable and efficient algorithms and software
• computation time proportional to n²m if m ≥ n; less with structure
• a mature technology

using linear programming
• not as easy to recognize as least-squares problems
• a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ₁- or ℓ∞-norms, piecewise-linear functions)
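One of the standard tricks mentioned above, recasting ℓ∞-norm approximation as an LP, can be sketched as follows (assuming SciPy's `linprog` is available; the data are random stand-ins). Minimizing ‖Ax − b‖∞ is equivalent to minimizing t over (x, t) subject to −t·1 ⪯ Ax − b ⪯ t·1:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
b = rng.standard_normal(30)
m, n = A.shape

# variables (x, t); minimize t subject to -t*1 <= Ax - b <= t*1
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[A, -np.ones((m, 1))], [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)])

x, t = res.x[:n], res.x[n]
assert res.success and np.isclose(t, np.abs(A @ x - b).max(), atol=1e-6)
```

At the optimum, t equals the ℓ∞ residual of x, and it is no larger than the ℓ∞ residual of the least-squares solution.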
Convex optimization problem

minimize f0(x)
subject to fi(x) ≤ bi, i = 1, . . . , m

• objective and constraint functions are convex:

fi(αx + βy) ≤ αfi(x) + βfi(y)

if α + β = 1, α ≥ 0, β ≥ 0
• includes least-squares problems and linear programs as special cases
solving convex optimization problems
• no analytical solution
• reliable and efficient algorithms
• computation time (roughly) proportional to max{n³, n²m, F}, where F is cost of evaluating the fi's and their first and second derivatives
• almost a technology

using convex optimization
• often difficult to recognize
• many tricks for transforming problems into convex form
• surprisingly many problems can be solved via convex optimization
Example

m lamps illuminating n (small, flat) patches

(figure: lamp j with power pj, at distance rkj and angle θkj from patch k, which receives illumination Ik)

intensity Ik at patch k depends linearly on lamp powers pj:

Ik = Σ_{j=1}^m akj pj,  akj = rkj⁻² max{cos θkj, 0}

problem: achieve desired illumination Ides with bounded lamp powers

minimize max_{k=1,...,n} |log Ik − log Ides|
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m
how to solve?

1. use uniform power: pj = p, vary p
2. use least-squares:

minimize Σ_{k=1}^n (Ik − Ides)²

round pj if pj > pmax or pj < 0
3. use weighted least-squares:

minimize Σ_{k=1}^n (Ik − Ides)² + Σ_{j=1}^m wj(pj − pmax/2)²

iteratively adjust weights wj until 0 ≤ pj ≤ pmax
4. use linear programming:

minimize max_{k=1,...,n} |Ik − Ides|
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m

which can be solved via linear programming
5. use convex optimization: problem is equivalent to

minimize f0(p) = max_{k=1,...,n} h(Ik/Ides)
subject to 0 ≤ pj ≤ pmax, j = 1, . . . , m

with h(u) = max{u, 1/u}

(plot: h(u) for 0 ≤ u ≤ 4; h equals 1 at u = 1 and grows as u moves away from 1)

f0 is convex because maximum of convex functions is convex

exact solution obtained with effort ≈ modest factor × least-squares effort
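The simpler heuristics for this problem can be sketched in NumPy; the coefficients akj below are made-up stand-ins, not a real lamp geometry:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 50                      # lamps, patches (hypothetical sizes)
A = rng.uniform(0.0, 1.0, (n, m))  # stand-in for a_kj coefficients
I_des, p_max = 1.0, 2.0

def f0(p):
    # objective max_k |log I_k - log I_des|
    I = A @ p
    return np.abs(np.log(I) - np.log(I_des)).max()

# heuristic 2: least-squares fit of I to I_des, then round into [0, p_max]
p_ls, *_ = np.linalg.lstsq(A, np.full(n, I_des), rcond=None)
p_clipped = np.clip(p_ls, 0.0, p_max)

# heuristic 1: best uniform power, found by scanning a grid
grid = np.linspace(1e-3, p_max, 1000)
p_uniform = min((np.full(m, g) for g in grid), key=f0)

print(f0(p_clipped), f0(p_uniform))
```

Both heuristics produce feasible powers; neither is guaranteed to match the exact convex-optimization solution.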
additional constraints: does adding 1 or 2 below complicate the problem?

1. no more than half of total power is in any 10 lamps
2. no more than half of the lamps are on (pj > 0)

answer: with (1), still easy to solve; with (2), extremely difficult

moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems
Course goals and topics

goals
1. recognize/formulate problems (such as the illumination problem) as convex optimization problems
2. develop code for problems of moderate size (1000 lamps, 5000 patches)
3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

topics
1. convex sets, functions, optimization problems
2. examples and applications
3. algorithms
Nonlinear optimization

traditional techniques for general nonconvex problems involve compromises

local optimization methods (nonlinear programming)
• find a point that minimizes f0 among feasible points near it
• fast, can handle large problems
• require initial guess
• provide no information about distance to (global) optimum

global optimization methods
• find the (global) solution
• worst-case complexity grows exponentially with problem size

these algorithms are often based on solving convex subproblems
Brief history of convex optimization

theory (convex analysis): ca. 1900–1970

algorithms
• 1947: simplex algorithm for linear programming (Dantzig)
• 1960s: early interior-point methods (Fiacco & McCormick, Dikin, . . . )
• 1970s: ellipsoid method and other subgradient methods
• 1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
• late 1980s–now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

applications
• before 1990: mostly in operations research; few in engineering
• since 1990: many new applications in engineering (control, signal processing, communications, circuit design, . . . ); new problem classes (semidefinite and second-order cone programming, robust optimization)
Convex Optimization Boyd & Vandenberghe
2. Convex sets
affine and convex sets
some important examples
operations that preserve convexity
generalized inequalities
separating and supporting hyperplanes
dual cones and generalized inequalities
Affine set

line through x1, x2: all points

x = θx1 + (1 − θ)x2  (θ ∈ R)

(figure: points on the line through x1 and x2, labeled with θ = 1.2, 1, 0.6, 0, −0.2)

affine set: contains the line through any two distinct points in the set

example: solution set of linear equations {x | Ax = b}
(conversely, every affine set can be expressed as solution set of system of linear equations)
Convex set

line segment between x1 and x2: all points

x = θx1 + (1 − θ)x2

with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set

x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C

examples (one convex, two nonconvex sets)
Convex combination and convex hull

convex combination of x1, . . . , xk: any point x of the form

x = θ1x1 + θ2x2 + · · · + θkxk

with θ1 + · · · + θk = 1, θi ≥ 0

convex hull conv S: set of all convex combinations of points in S
Convex cone

conic (nonnegative) combination of x1 and x2: any point of the form

x = θ1x1 + θ2x2

with θ1 ≥ 0, θ2 ≥ 0

(figure: the cone generated by x1 and x2, with apex at 0)

convex cone: set that contains all conic combinations of points in the set
Hyperplanes and halfspaces

hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)

halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)

• a is the normal vector
• hyperplanes are affine and convex; halfspaces are convex
Euclidean balls and ellipsoids

(Euclidean) ball with center xc and radius r:

B(xc, r) = {x | ‖x − xc‖₂ ≤ r} = {xc + ru | ‖u‖₂ ≤ 1}

ellipsoid: set of the form

{x | (x − xc)ᵀP⁻¹(x − xc) ≤ 1}

with P ∈ S^n++ (i.e., P symmetric positive definite)

other representation: {xc + Au | ‖u‖₂ ≤ 1} with A square and nonsingular
Norm balls and norm cones

norm: a function ‖·‖ that satisfies
• ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
• ‖tx‖ = |t| ‖x‖ for t ∈ R
• ‖x + y‖ ≤ ‖x‖ + ‖y‖

notation: ‖·‖ is general (unspecified) norm; ‖·‖symb is particular norm

norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}

norm cone: {(x, t) | ‖x‖ ≤ t}
• Euclidean norm cone is called second-order cone

(plot: boundary of the second-order cone in R³, {(x1, x2, t) | ‖(x1, x2)‖₂ ≤ t})

norm balls and cones are convex
Polyhedra

solution set of finitely many linear inequalities and equalities

Ax ⪯ b,  Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

(figure: polyhedron P bounded by halfspaces with normal vectors a1, . . . , a5)

polyhedron is intersection of finite number of halfspaces and hyperplanes
Positive semidefinite cone

notation:
• Sⁿ is set of symmetric n × n matrices
• S^n+ = {X ∈ Sⁿ | X ⪰ 0}: positive semidefinite n × n matrices

X ∈ S^n+ ⟺ zᵀXz ≥ 0 for all z

S^n+ is a convex cone

• S^n++ = {X ∈ Sⁿ | X ≻ 0}: positive definite n × n matrices

example: [x y; y z] ∈ S²+ (plot: boundary of this set over (x, y, z))
Operations that preserve convexity

practical methods for establishing convexity of a set C

1. apply definition

x1, x2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θx1 + (1 − θ)x2 ∈ C

2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, . . . ) by operations that preserve convexity
• intersection
• affine functions
• perspective function
• linear-fractional functions
Intersection

the intersection of (any number of) convex sets is convex

example:

S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x1 cos t + x2 cos 2t + · · · + xm cos mt

(plots for m = 2: the polynomials p(t) on [0, π], and the set S in the (x1, x2)-plane)
Affine function

suppose f : Rⁿ → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

• the image of a convex set under f is convex

S ⊆ Rⁿ convex ⟹ f(S) = {f(x) | x ∈ S} convex

• the inverse image f⁻¹(C) of a convex set under f is convex

C ⊆ R^m convex ⟹ f⁻¹(C) = {x ∈ Rⁿ | f(x) ∈ C} convex

examples
• scaling, translation, projection
• solution set of linear matrix inequality {x | x1A1 + · · · + xmAm ⪯ B} (with Ai, B ∈ S^p)
• hyperbolic cone {x | xᵀPx ≤ (cᵀx)², cᵀx ≥ 0} (with P ∈ S^n+)
Perspective and linear-fractional function

perspective function P : R^{n+1} → Rⁿ:

P(x, t) = x/t,  dom P = {(x, t) | t > 0}

images and inverse images of convex sets under perspective are convex

linear-fractional function f : Rⁿ → R^m:

f(x) = (Ax + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}

images and inverse images of convex sets under linear-fractional functions are convex
example of a linear-fractional function

f(x) = (1/(x1 + x2 + 1)) x

(plots: a set C in the (x1, x2)-plane and its image f(C))
Generalized inequalities

a convex cone K ⊆ Rⁿ is a proper cone if
• K is closed (contains its boundary)
• K is solid (has nonempty interior)
• K is pointed (contains no line)

examples
• nonnegative orthant K = R^n+ = {x ∈ Rⁿ | xi ≥ 0, i = 1, . . . , n}
• positive semidefinite cone K = S^n+
• nonnegative polynomials on [0, 1]:

K = {x ∈ Rⁿ | x1 + x2t + x3t² + · · · + xnt^{n−1} ≥ 0 for t ∈ [0, 1]}
generalized inequality defined by a proper cone K:

x ⪯_K y ⟺ y − x ∈ K,  x ≺_K y ⟺ y − x ∈ int K

examples
• componentwise inequality (K = R^n+)

x ⪯_{R^n+} y ⟺ xi ≤ yi, i = 1, . . . , n

• matrix inequality (K = S^n+)

X ⪯_{S^n+} Y ⟺ Y − X positive semidefinite

these two types are so common that we drop the subscript in ⪯_K

properties: many properties of ⪯_K are similar to ≤ on R, e.g.,

x ⪯_K y, u ⪯_K v ⟹ x + u ⪯_K y + v
Minimum and minimal elements

⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

x ∈ S is the minimum element of S with respect to ⪯_K if

y ∈ S ⟹ x ⪯_K y

x ∈ S is a minimal element of S with respect to ⪯_K if

y ∈ S, y ⪯_K x ⟹ y = x

example (K = R²+): x1 is the minimum element of S1; x2 is a minimal element of S2 (figure)
Separating hyperplane theorem

if C and D are nonempty disjoint convex sets, there exist a ≠ 0, b s.t.

aᵀx ≤ b for x ∈ C,  aᵀx ≥ b for x ∈ D

the hyperplane {x | aᵀx = b} separates C and D

strict separation requires additional assumptions (e.g., C is closed, D is a singleton)
Supporting hyperplane theorem

supporting hyperplane to set C at boundary point x0:

{x | aᵀx = aᵀx0}

where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C

supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
Dual cones and generalized inequalities

dual cone of a cone K:

K* = {y | yᵀx ≥ 0 for all x ∈ K}

examples
• K = R^n+: K* = R^n+
• K = S^n+: K* = S^n+
• K = {(x, t) | ‖x‖₂ ≤ t}: K* = {(x, t) | ‖x‖₂ ≤ t}
• K = {(x, t) | ‖x‖₁ ≤ t}: K* = {(x, t) | ‖x‖∞ ≤ t}

first three examples are self-dual cones

dual cones of proper cones are proper, hence define generalized inequalities:

y ⪰_{K*} 0 ⟺ yᵀx ≥ 0 for all x ⪰_K 0
Minimum and minimal elements via dual inequalities

minimum element w.r.t. ⪯_K

x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λᵀz over S

minimal element w.r.t. ⪯_K
• if x minimizes λᵀz over S for some λ ≻_{K*} 0, then x is minimal
• if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λᵀz over S
optimal production frontier

• different production methods use different amounts of resources x ∈ Rⁿ
• production set P: resource vectors x for all possible production methods
• efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n+

example (n = 2): x1, x2, x3 are efficient; x4, x5 are not (figure: production set P with axes labor and fuel)
Convex Optimization Boyd & Vandenberghe
3. Convex functions
basic properties and examples
operations that preserve convexity
the conjugate function
quasiconvex functions
log-concave and log-convex functions
convexity with respect to generalized inequalities
Definition

f : Rⁿ → R is convex if dom f is a convex set and

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

(figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph of f)

• f is concave if −f is convex
• f is strictly convex if dom f is convex and

f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, x ≠ y, 0 < θ < 1
Examples on R

convex:
• affine: ax + b on R, for any a, b ∈ R
• exponential: e^{ax}, for any a ∈ R
• powers: x^α on R++, for α ≥ 1 or α ≤ 0
• powers of absolute value: |x|^p on R, for p ≥ 1
• negative entropy: x log x on R++

concave:
• affine: ax + b on R, for any a, b ∈ R
• powers: x^α on R++, for 0 ≤ α ≤ 1
• logarithm: log x on R++
Examples on Rⁿ and R^{m×n}

affine functions are convex and concave; all norms are convex

examples on Rⁿ
• affine function f(x) = aᵀx + b
• norms: ‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|

examples on R^{m×n} (m × n matrices)
• affine function

f(X) = tr(AᵀX) + b = Σ_{i=1}^m Σ_{j=1}^n Aij Xij + b

• spectral (maximum singular value) norm

f(X) = ‖X‖₂ = σmax(X) = (λmax(XᵀX))^{1/2}
Restriction of a convex function to a line

f : Rⁿ → R is convex if and only if the function g : R → R,

g(t) = f(x + tv),  dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ Rⁿ

can check convexity of f by checking convexity of functions of one variable

example. f : Sⁿ → R with f(X) = log det X, dom f = S^n++

g(t) = log det(X + tV) = log det X + log det(I + tX^{−1/2}VX^{−1/2})
     = log det X + Σ_{i=1}^n log(1 + tλi)

where λi are the eigenvalues of X^{−1/2}VX^{−1/2}

g is concave in t (for any choice of X ≻ 0, V); hence f is concave
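The concavity of g(t) = log det(X + tV) can be spot-checked numerically with a midpoint test; a NumPy sketch with a random positive definite X and an arbitrary symmetric direction V:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)        # X in S^n_++
V = rng.standard_normal((n, n))
V = V + V.T                        # arbitrary symmetric direction

def g(t):
    # restriction of f(X) = log det X to the line X + tV
    sign, logdet = np.linalg.slogdet(X + t * V)
    return logdet if sign > 0 else -np.inf

# concavity along the line: g((t1+t2)/2) >= (g(t1)+g(t2))/2
for _ in range(100):
    t1, t2 = rng.uniform(-0.2, 0.2, 2)   # small t keeps X + tV in dom g
    assert g(0.5 * (t1 + t2)) >= 0.5 * (g(t1) + g(t2)) - 1e-9
```

This checks concavity only along one line and on sampled points, which is exactly how the restriction-to-a-line idea is used in practice for quick sanity checks.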
Extended-value extension

extended-value extension f̃ of f is

f̃(x) = f(x), x ∈ dom f,  f̃(x) = ∞, x ∉ dom f

often simplifies notation; for example, the condition

0 ≤ θ ≤ 1 ⟹ f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

(as an inequality in R ∪ {∞}) means the same as the two conditions
• dom f is convex
• for x, y ∈ dom f,

0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
First-order condition

f is differentiable if dom f is open and the gradient

∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn)

exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex iff

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) for all x, y ∈ dom f

(figure: the tangent f(x) + ∇f(x)ᵀ(y − x) at (x, f(x)) lies below f(y))

first-order approximation of f is global underestimator
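The first-order condition can be spot-checked for a convex quadratic f(x) = xᵀPx with P ⪰ 0; a small NumPy sketch over random point pairs:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
P = M @ M.T                        # P positive semidefinite, so f is convex

f = lambda x: x @ P @ x
grad = lambda x: 2 * P @ x

# first-order condition: f(y) >= f(x) + grad f(x)^T (y - x) for all x, y
for _ in range(1000):
    x, y = rng.standard_normal((2, n))
    assert f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
```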
Second-order conditions

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ Sⁿ,

∇²f(x)ij = ∂²f(x)/∂xi∂xj, i, j = 1, . . . , n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain
• f is convex if and only if

∇²f(x) ⪰ 0 for all x ∈ dom f

• if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples

quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ Sⁿ)

∇f(x) = Px + q,  ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖₂²

∇f(x) = 2Aᵀ(Ax − b),  ∇²f(x) = 2AᵀA

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

∇²f(x, y) = (2/y³) [y  −x]ᵀ[y  −x] ⪰ 0

convex for y > 0 (plot: the surface f(x, y) = x²/y)
log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex

∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ  (zk = exp xk)

to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:

vᵀ∇²f(x)v = [ (Σk zk vk²)(Σk zk) − (Σk vk zk)² ] / (Σk zk)² ≥ 0

since (Σk vk zk)² ≤ (Σk zk vk²)(Σk zk) (from Cauchy–Schwarz inequality)

geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on R^n++ is concave
(similar proof as for log-sum-exp)
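The Hessian formula for log-sum-exp can be verified numerically at a random point; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(6)
z = np.exp(x)
s = z.sum()

# Hessian from the slide: (1/1^T z) diag(z) - (1/(1^T z)^2) z z^T
H = np.diag(z) / s - np.outer(z, z) / s**2

# positive semidefinite: all eigenvalues nonnegative (up to round-off)
assert np.linalg.eigvalsh(H).min() >= -1e-12

# the all-ones vector is in the nullspace: rows of H sum to zero
assert np.allclose(H @ np.ones_like(x), 0)
```

The zero eigenvalue along the all-ones direction reflects that f(x + t·1) = f(x) + t, which is affine in t.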
Epigraph and sublevel set

α-sublevel set of f : Rⁿ → R:

Cα = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (converse is false)

epigraph of f : Rⁿ → R:

epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

(figure: epi f is the region on and above the graph of f)

f is convex if and only if epi f is a convex set
Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then

f(E z) ≤ E f(z)

for any random variable z

basic inequality is special case with discrete distribution

prob(z = x) = θ,  prob(z = y) = 1 − θ
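A Monte Carlo spot-check of the extension, using the convex function f(x) = eˣ and a normal random variable z (for which E eᶻ = e^{μ + 1/2} when z ~ N(μ, 1) is known exactly):

```python
import numpy as np

rng = np.random.default_rng(6)
f = np.exp                          # a convex function
z = rng.normal(loc=0.3, scale=1.0, size=200_000)

# Jensen: f(E z) <= E f(z)
lhs = f(z.mean())
rhs = f(z).mean()
assert lhs <= rhs

# sanity check against the exact lognormal mean e^{0.3 + 1/2}
assert np.isclose(rhs, np.exp(0.3 + 0.5), rtol=0.02)
```

The gap rhs − lhs is strictly positive here because eˣ is strictly convex and z is nondegenerate.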
Operations that preserve convexity

practical methods for establishing convexity of a function

1. verify definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity
• nonnegative weighted sum
• composition with affine function
• pointwise maximum and supremum
• composition
• minimization
• perspective
Pointwise maximum

if f1, . . . , fm are convex, then f(x) = max{f1(x), . . . , fm(x)} is convex

examples
• piecewise-linear function: f(x) = max_{i=1,...,m}(aiᵀx + bi) is convex
• sum of r largest components of x ∈ Rⁿ:

f(x) = x[1] + x[2] + · · · + x[r]

is convex (x[i] is ith largest component of x)

proof: f(x) = max{xi1 + xi2 + · · · + xir | 1 ≤ i1 < i2 < · · · < ir ≤ n}
Pointwise supremum

if f(x, y) is convex in x for each y ∈ A, then

g(x) = sup_{y∈A} f(x, y)

is convex

examples
• support function of a set C: SC(x) = sup_{y∈C} yᵀx is convex
• distance to farthest point in a set C: f(x) = sup_{y∈C} ‖x − y‖
• maximum eigenvalue of symmetric matrix: for X ∈ Sⁿ, λmax(X) = sup_{‖y‖₂=1} yᵀXy
Composition with scalar functions

composition of g : Rⁿ → R and h : R → R:

f(x) = h(g(x))

f is convex if
• g convex, h convex, h̃ nondecreasing
• g concave, h convex, h̃ nonincreasing

proof (for n = 1, differentiable g, h)

f″(x) = h″(g(x))g′(x)² + h′(g(x))g″(x)

note: monotonicity must hold for extended-value extension h̃

examples
• exp g(x) is convex if g is convex
• 1/g(x) is convex if g is concave and positive
Vector composition

composition of g : Rⁿ → R^k and h : R^k → R:

f(x) = h(g(x)) = h(g1(x), g2(x), . . . , gk(x))

f is convex if
• gi convex, h convex, h̃ nondecreasing in each argument
• gi concave, h convex, h̃ nonincreasing in each argument

proof (for n = 1, differentiable g, h)

f″(x) = g′(x)ᵀ∇²h(g(x))g′(x) + ∇h(g(x))ᵀg″(x)

examples
• Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
• log Σ_{i=1}^m exp gi(x) is convex if gi are convex
Minimization

if f(x, y) is convex in (x, y) and C is a convex set, then

g(x) = inf_{y∈C} f(x, y)

is convex

examples
• f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with

[A B; Bᵀ C] ⪰ 0,  C ≻ 0

minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A − BC⁻¹Bᵀ)x

g is convex, hence Schur complement A − BC⁻¹Bᵀ ⪰ 0
• distance to a set: dist(x, S) = inf_{y∈S} ‖x − y‖ is convex if S is convex
Perspective

the perspective of a function f : Rⁿ → R is the function g : Rⁿ × R → R,

g(x, t) = t f(x/t),  dom g = {(x, t) | x/t ∈ dom f, t > 0}

g is convex if f is convex

examples
• f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
• negative logarithm f(x) = −log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²++
• if f is convex, then

g(x) = (cᵀx + d) f( (Ax + b)/(cᵀx + d) )

is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
The conjugate function

the conjugate of a function f is

f*(y) = sup_{x∈dom f} (yᵀx − f(x))

(figure: f*(y) is the maximum gap between the linear function yᵀx and f(x))

• f* is convex (even if f is not)
• will be useful in chapter 5
examples

• negative logarithm f(x) = −log x

f*(y) = sup_{x>0} (xy + log x)
      = −1 − log(−y) if y < 0, ∞ otherwise
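The closed form f*(y) = −1 − log(−y) for y < 0 can be checked against a brute-force maximization of xy + log x over a fine grid; a small NumPy sketch:

```python
import numpy as np

# conjugate of f(x) = -log x:  f*(y) = sup_{x>0} (xy + log x)
y = -2.0
xs = np.linspace(1e-6, 10.0, 2_000_000)
numeric = (xs * y + np.log(xs)).max()

# closed form for y < 0 (the supremum is attained at x = -1/y)
closed_form = -1.0 - np.log(-y)

assert np.isclose(numeric, closed_form, atol=1e-6)
```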
Quasiconvex functions

f : Rⁿ → R is quasiconvex if dom f is convex and the sublevel sets

Sα = {x ∈ dom f | f(x) ≤ α}

are convex for all α

(figure: a quasiconvex function on R with sublevel sets [a, b] and (−∞, c])

• f is quasiconcave if −f is quasiconvex
• f is quasilinear if it is quasiconvex and quasiconcave
Examples

• √|x| is quasiconvex on R
• ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
• log x is quasilinear on R++
• f(x1, x2) = x1x2 is quasiconcave on R²++
• linear-fractional function

f(x) = (aᵀx + b)/(cᵀx + d),  dom f = {x | cᵀx + d > 0}

is quasilinear
• distance ratio

f(x) = ‖x − a‖₂/‖x − b‖₂,  dom f = {x | ‖x − a‖₂ ≤ ‖x − b‖₂}

is quasiconvex
internal rate of return

• cash flow x = (x0, . . . , xn); xi is payment in period i (to us if xi > 0)
• we assume x0 < 0 and x0 + x1 + · · · + xn > 0
• present value of cash flow x, for interest rate r:

PV(x, r) = Σ_{i=0}^n (1 + r)^{−i} xi

• internal rate of return is smallest interest rate for which PV(x, r) = 0:

IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}

IRR is quasiconcave: superlevel set is intersection of open halfspaces

IRR(x) ≥ R ⟺ Σ_{i=0}^n (1 + r)^{−i} xi > 0 for 0 ≤ r < R
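IRR itself can be computed by bisection, since PV(x, r) is decreasing in r for a cash flow satisfying the sign assumptions above; a plain-Python sketch on a made-up cash flow:

```python
# present value of cash flow x at interest rate r
def pv(x, r):
    return sum(xi / (1 + r) ** i for i, xi in enumerate(x))

# bisection: PV > 0 means r is below the IRR, PV <= 0 means at or above it
def irr(x, lo=0.0, hi=10.0, tol=1e-10):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if pv(x, mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# invest 1 now, receive 0.6 in each of the next two periods
x = [-1.0, 0.6, 0.6]
r = irr(x)
assert abs(pv(x, r)) < 1e-8
```

For this cash flow the root of PV is at r ≈ 0.1307, i.e., about a 13% internal rate of return.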
Properties

modified Jensen inequality: for quasiconvex f

0 ≤ θ ≤ 1 ⟹ f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with cvx domain is quasiconvex iff

f(y) ≤ f(x) ⟹ ∇f(x)ᵀ(y − x) ≤ 0

(figure: ∇f(x) defines a supporting hyperplane to the sublevel set at x)

sums of quasiconvex functions are not necessarily quasiconvex
Log-concave and log-convex functions

a positive function f is log-concave if log f is concave:

f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ} for 0 ≤ θ ≤ 1

f is log-convex if log f is convex

• powers: x^a on R++ is log-convex for a ≤ 0, log-concave for a ≥ 0
• many common probability densities are log-concave, e.g., normal:

f(x) = (1/√((2π)ⁿ det Σ)) e^{−(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄)}

• cumulative Gaussian distribution function Φ is log-concave

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du
Properties of log-concave functions

• twice differentiable f with convex domain is log-concave if and only if

f(x)∇²f(x) ⪯ ∇f(x)∇f(x)ᵀ

for all x ∈ dom f
• product of log-concave functions is log-concave
• sum of log-concave functions is not always log-concave
• integration: if f : Rⁿ × R^m → R is log-concave, then

g(x) = ∫ f(x, y) dy

is log-concave (not easy to show)
consequences of integration property

• convolution f ∗ g of log-concave functions f, g is log-concave

(f ∗ g)(x) = ∫ f(x − y)g(y) dy

• if C ⊆ Rⁿ convex and y is a random variable with log-concave pdf then

f(x) = prob(x + y ∈ C)

is log-concave

proof: write f(x) as integral of product of log-concave functions

f(x) = ∫ g(x + y)p(y) dy,  g(u) = 1 if u ∈ C, 0 if u ∉ C

where p is the pdf of y
example: yield function

Y(x) = prob(x + w ∈ S)

• x ∈ Rⁿ: nominal parameter values for product
• w ∈ Rⁿ: random variations of parameters in manufactured product
• S: set of acceptable values

if S is convex and w has a log-concave pdf, then
• Y is log-concave
• yield regions {x | Y(x) ≥ α} are convex
Convexity with respect to generalized inequalities

f : Rⁿ → R^m is K-convex if dom f is convex and

f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)

for x, y ∈ dom f, 0 ≤ θ ≤ 1

example f : S^m → S^m, f(X) = X² is S^m+-convex

proof: for fixed z ∈ R^m, zᵀX²z = ‖Xz‖₂² is convex in X, i.e.,

zᵀ(θX + (1 − θ)Y)²z ≤ θzᵀX²z + (1 − θ)zᵀY²z

for X, Y ∈ S^m, 0 ≤ θ ≤ 1

therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²
Convex Optimization Boyd & Vandenberghe

4. Convex optimization problems

optimization problem in standard form
convex optimization problems
quasiconvex optimization
linear optimization
quadratic optimization
geometric programming
generalized inequality constraints
semidefinite programming
vector optimization
Optimization problem in standard form

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

• x ∈ Rⁿ is the optimization variable
• f0 : Rⁿ → R is the objective or cost function
• fi : Rⁿ → R, i = 1, . . . , m, are the inequality constraint functions
• hi : Rⁿ → R are the equality constraint functions

optimal value:

p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p}

• p⋆ = ∞ if problem is infeasible (no x satisfies the constraints)
• p⋆ = −∞ if problem is unbounded below
Optimal and locally optimal points

x is feasible if x ∈ dom f0 and it satisfies the constraints

a feasible x is optimal if f0(x) = p⋆; Xopt is the set of optimal points

x is locally optimal if there is an R > 0 such that x is optimal for

minimize (over z) f0(z)
subject to fi(z) ≤ 0, i = 1, . . . , m, hi(z) = 0, i = 1, . . . , p
‖z − x‖₂ ≤ R

examples (with n = 1, m = p = 0)
• f0(x) = 1/x, dom f0 = R++: p⋆ = 0, no optimal point
• f0(x) = −log x, dom f0 = R++: p⋆ = −∞
• f0(x) = x log x, dom f0 = R++: p⋆ = −1/e, x = 1/e is optimal
• f0(x) = x³ − 3x: p⋆ = −∞, local optimum at x = 1
Implicit constraints

the standard form optimization problem has an implicit constraint

x ∈ D = ∩_{i=0}^m dom fi ∩ ∩_{i=1}^p dom hi

• we call D the domain of the problem
• the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
• a problem is unconstrained if it has no explicit constraints (m = p = 0)

example:

minimize f0(x) = −Σ_{i=1}^k log(bi − aiᵀx)

is an unconstrained problem with implicit constraints aiᵀx < bi
Feasibility problem

find x
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

can be considered a special case of the general problem with f0(x) = 0:

minimize 0
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p

• p⋆ = 0 if constraints are feasible; any feasible x is optimal
• p⋆ = ∞ if constraints are infeasible
Convex optimization problem

standard form convex optimization problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
aiᵀx = bi, i = 1, . . . , p

• f0, f1, . . . , fm are convex; equality constraints are affine
• problem is quasiconvex if f0 is quasiconvex (and f1, . . . , fm convex)

often written as

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

important property: feasible set of a convex optimization problem is convex
example

minimize f0(x) = x1² + x2²
subject to f1(x) = x1/(1 + x2²) ≤ 0
h1(x) = (x1 + x2)² = 0

• f0 is convex; feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex
• not a convex problem (according to our definition): f1 is not convex, h1 is not affine
• equivalent (but not identical) to the convex problem

minimize x1² + x2²
subject to x1 ≤ 0
x1 + x2 = 0
Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and there exists a y with f0(y) < f0(x)

x locally optimal means there is an R > 0 such that

z feasible, ‖z − x‖₂ ≤ R ⟹ f0(z) ≥ f0(x)

consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖₂)

• ‖y − x‖₂ > R, so 0 < θ < 1/2
• z is a convex combination of two feasible points, hence also feasible
• ‖z − x‖₂ = R/2 and

f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x)

which contradicts our assumption that x is locally optimal
Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and

∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y

if nonzero, ∇f0(x) defines a supporting hyperplane to feasible set X at x
• unconstrained problem: x is optimal if and only if

x ∈ dom f0,  ∇f0(x) = 0

• equality constrained problem

minimize f0(x) subject to Ax = b

x is optimal if and only if there exists a ν such that

x ∈ dom f0,  Ax = b,  ∇f0(x) + Aᵀν = 0

• minimization over nonnegative orthant

minimize f0(x) subject to x ⪰ 0

x is optimal if and only if

x ∈ dom f0,  x ⪰ 0,  ∇f0(x)i ≥ 0 if xi = 0,  ∇f0(x)i = 0 if xi > 0
Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

some common transformations that preserve convexity:

• eliminating equality constraints

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

is equivalent to

minimize (over z) f0(Fz + x0)
subject to fi(Fz + x0) ≤ 0, i = 1, . . . , m

where F and x0 are such that

Ax = b ⟺ x = Fz + x0 for some z
• introducing equality constraints

minimize f0(A0x + b0)
subject to fi(Aix + bi) ≤ 0, i = 1, . . . , m

is equivalent to

minimize (over x, yi) f0(y0)
subject to fi(yi) ≤ 0, i = 1, . . . , m
yi = Aix + bi, i = 0, 1, . . . , m

• introducing slack variables for linear inequalities

minimize f0(x)
subject to aiᵀx ≤ bi, i = 1, . . . , m

is equivalent to

minimize (over x, s) f0(x)
subject to aiᵀx + si = bi, i = 1, . . . , m
si ≥ 0, i = 1, . . . , m
• epigraph form: standard form convex problem is equivalent to

minimize (over x, t) t
subject to f0(x) − t ≤ 0
fi(x) ≤ 0, i = 1, . . . , m
Ax = b

• minimizing over some variables

minimize f0(x1, x2)
subject to fi(x1) ≤ 0, i = 1, . . . , m

is equivalent to

minimize f̃0(x1)
subject to fi(x1) ≤ 0, i = 1, . . . , m

where f̃0(x1) = inf_{x2} f0(x1, x2)
Quasiconvex optimization

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b

with f0 : Rⁿ → R quasiconvex, f1, . . . , fm convex

can have locally optimal points that are not (globally) optimal

(figure: a quasiconvex f0 with a locally optimal point (x, f0(x)) that is not globally optimal)
quasiconvex optimization via convex feasibility problems

φt(x) ≤ 0, fi(x) ≤ 0, i = 1, . . . , m, Ax = b (1)

where φt is a family of convex functions satisfying f0(x) ≤ t ⟺ φt(x) ≤ 0

• for fixed t, a convex feasibility problem in x
• if feasible, we can conclude that t ≥ p⋆; if infeasible, t ≤ p⋆
Bisection method for quasiconvex optimization

given l ≤ p⋆, u ≥ p⋆, tolerance ε > 0.
repeat
1. t := (l + u)/2.
2. Solve the convex feasibility problem (1).
3. if (1) is feasible, u := t; else l := t.
until u − l ≤ ε.

requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are initial values)
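The algorithm can be sketched in Python; the feasibility subproblem is solved exactly here because the example is a toy 1-D problem (minimize |x − 3| subject to x ≤ 2, so p⋆ = 1), not a general convex program:

```python
import math

def feasible(t):
    # is there an x with |x - 3| <= t and x <= 2?  i.e. 3 - t <= x <= 2
    return 3 - t <= 2

def bisect(lo, hi, eps=1e-6):
    # lo is always infeasible (lo < p*), hi always feasible (hi >= p*)
    iters = 0
    while hi - lo > eps:
        t = (lo + hi) / 2
        lo, hi = (lo, t) if feasible(t) else (t, hi)
        iters += 1
    return hi, iters

p_star, iters = bisect(0.0, 4.0)
assert abs(p_star - 1.0) <= 1e-6
assert iters == math.ceil(math.log2(4.0 / 1e-6))   # the iteration count above
```

The final assertion matches the ⌈log₂((u − l)/ε)⌉ bound from the slide: halving an interval of width 4 down to 10⁻⁶ takes 22 feasibility solves.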
Linear program (LP)

minimize cᵀx + d
subject to Gx ⪯ h
Ax = b

• convex problem with affine objective and constraint functions
• feasible set is a polyhedron

(figure: polyhedron P with objective direction −c; an optimal x⋆ lies on the boundary)
Examples
diet problem: choose quantities x1, . . . , xn of n foods

one unit of food j costs cj, contains amount aij of nutrient i
healthy diet requires nutrient i in quantity at least bi

to find cheapest healthy diet,

minimize c^T x
subject to Ax ⪰ b, x ⪰ 0
piecewise-linear minimization
minimize max_{i=1,...,m} (ai^T x + bi)

equivalent to an LP

minimize t
subject to ai^T x + bi ≤ t, i = 1, . . . , m
Convex optimization problems 418
Chebyshev center of a polyhedron
Chebyshev center of
P = {x | ai^T x ≤ bi, i = 1, . . . , m}

is center of largest inscribed ball

B = {xc + u | ||u||2 ≤ r}

ai^T x ≤ bi for all x ∈ B if and only if

sup{ai^T (xc + u) | ||u||2 ≤ r} = ai^T xc + r||ai||2 ≤ bi

hence, xc, r can be determined by solving the LP

maximize r
subject to ai^T xc + r||ai||2 ≤ bi, i = 1, . . . , m
Convex optimization problems 419
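The Chebyshev-center LP above is easy to set up with an off-the-shelf LP solver. A minimal sketch, assuming SciPy is available; the polyhedron here (the unit square) is made up for illustration, where the center (0.5, 0.5) and radius 0.5 are known by symmetry.

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev center via the LP above: maximize r over (xc, r) subject to
#   a_i^T xc + r ||a_i||_2 <= b_i.
# Example polyhedron (assumption for illustration): the unit square
#   0 <= x1 <= 1, 0 <= x2 <= 1, written as a_i^T x <= b_i.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 1.0, 1.0])

norms = np.linalg.norm(A, axis=1)
# variables are (xc_1, xc_2, r); maximize r  <=>  minimize -r
c = np.array([0.0, 0.0, -1.0])
A_ub = np.hstack([A, norms.reshape(-1, 1)])   # a_i^T xc + r ||a_i||_2 <= b_i
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
xc, r = res.x[:2], res.x[2]
```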
Linear-fractional program
minimize f0(x)
subject to Gx ⪯ h
           Ax = b

linear-fractional program

f0(x) = (c^T x + d)/(e^T x + f),   dom f0(x) = {x | e^T x + f > 0}

a quasiconvex optimization problem; can be solved by bisection
also equivalent to the LP (variables y, z)

minimize c^T y + dz
subject to Gy ⪯ hz
           Ay = bz
           e^T y + fz = 1
           z ≥ 0
Convex optimization problems 420
generalized linear-fractional program
f0(x) = max_{i=1,...,r} (ci^T x + di)/(ei^T x + fi),   dom f0(x) = {x | ei^T x + fi > 0, i = 1, . . . , r}
a quasiconvex optimization problem; can be solved by bisection
example: Von Neumann model of a growing economy
maximize (over x, x+) min_{i=1,...,n} xi+/xi
subject to x+ ⪰ 0, Bx+ ⪯ Ax

x, x+ ∈ R^n: activity levels of n sectors, in current and next period
(Ax)i, (Bx+)i: produced, resp. consumed, amounts of good i
xi+/xi: growth rate of sector i

allocate activity to maximize growth rate of slowest growing sector
Convex optimization problems 421
Quadratic program (QP)
minimize (1/2) x^T P x + q^T x + r
subject to Gx ⪯ h
           Ax = b

P ∈ S^n_+, so objective is convex quadratic
minimize a convex quadratic function over a polyhedron

(figure: polyhedron P with level curves of f0 and optimal point x*)
Convex optimization problems 422
Examples
least-squares

minimize ||Ax − b||2^2

analytical solution x* = A†b (A† is pseudo-inverse)
can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

minimize c̄^T x + γ x^T Σ x = E c^T x + γ var(c^T x)
subject to Gx ⪯ h, Ax = b

c is random vector with mean c̄ and covariance Σ
hence, c^T x is random variable with mean c̄^T x and variance x^T Σ x
γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)
Convex optimization problems 423
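The least-squares formula above can be checked numerically. A small sketch on random data (dimensions are arbitrary): the pseudo-inverse solution satisfies the normal equations and, for full-rank A, matches (A^T A)^{-1} A^T b.

```python
import numpy as np

# Numeric check of the analytical least-squares solution x* = pinv(A) b
# on a small random instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_star = np.linalg.pinv(A) @ b
# x* satisfies the normal equations A^T A x = A^T b ...
residual_normal_eq = A.T @ A @ x_star - A.T @ b
# ... and matches the full-rank closed form (A^T A)^{-1} A^T b
x_formula = np.linalg.solve(A.T @ A, A.T @ b)
```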
Quadratically constrained quadratic program (QCQP)
minimize (1/2) x^T P0 x + q0^T x + r0
subject to (1/2) x^T Pi x + qi^T x + ri ≤ 0, i = 1, . . . , m
           Ax = b

Pi ∈ S^n_+; objective and constraints are convex quadratic
if P1, . . . , Pm ∈ S^n_++, feasible region is intersection of m ellipsoids and an affine set
Convex optimization problems 424
Second-order cone programming
minimize f^T x
subject to ||Ai x + bi||2 ≤ ci^T x + di, i = 1, . . . , m
           Fx = g

(Ai ∈ R^{ni×n}, F ∈ R^{p×n})

inequalities are called second-order cone (SOC) constraints:

(Ai x + bi, ci^T x + di) ∈ second-order cone in R^{ni+1}

for ni = 0, reduces to an LP; if ci = 0, reduces to a QCQP
more general than QCQP and LP
Convex optimization problems 425
Robust linear programming
the parameters in optimization problems are often uncertain, e.g., in an LP
minimize c^T x
subject to ai^T x ≤ bi, i = 1, . . . , m,

there can be uncertainty in c, ai, bi

two common approaches to handling uncertainty (in ai, for simplicity)

deterministic model: constraints must hold for all ai ∈ Ei

minimize c^T x
subject to ai^T x ≤ bi for all ai ∈ Ei, i = 1, . . . , m,

stochastic model: ai is random variable; constraints must hold with probability η

minimize c^T x
subject to prob(ai^T x ≤ bi) ≥ η, i = 1, . . . , m
Convex optimization problems 426
deterministic approach via SOCP
choose an ellipsoid as Ei:

Ei = {āi + Pi u | ||u||2 ≤ 1}   (āi ∈ R^n, Pi ∈ R^{n×n})

center is āi, semi-axes determined by singular values/vectors of Pi

robust LP

minimize c^T x
subject to ai^T x ≤ bi for all ai ∈ Ei, i = 1, . . . , m

is equivalent to the SOCP

minimize c^T x
subject to āi^T x + ||Pi^T x||2 ≤ bi, i = 1, . . . , m

(follows from sup_{||u||2≤1} (āi + Pi u)^T x = āi^T x + ||Pi^T x||2)
Convex optimization problems 427
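The supremum identity that drives the SOCP reformulation above can be verified directly: the maximizing perturbation is u* = P^T x / ||P^T x||2, and no sampled u on the unit sphere does better. Random data here are purely illustrative.

```python
import numpy as np

# Check of sup_{||u||2<=1} (a + P u)^T x = a^T x + ||P^T x||2
# on a random instance (all data made up for illustration).
rng = np.random.default_rng(1)
n = 4
a, x = rng.standard_normal(n), rng.standard_normal(n)
P = rng.standard_normal((n, n))

closed_form = a @ x + np.linalg.norm(P.T @ x)

# the supremum is attained at u* = P^T x / ||P^T x||2
u_star = P.T @ x / np.linalg.norm(P.T @ x)
attained = (a + P @ u_star) @ x

# sampled u on the unit sphere never exceed the closed form
samples = rng.standard_normal((1000, n))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
sampled_max = (a @ x + samples @ (P.T @ x)).max()
```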
stochastic approach via SOCP
assume ai is Gaussian with mean āi, covariance Σi (ai ∼ N(āi, Σi))

ai^T x is Gaussian r.v. with mean āi^T x, variance x^T Σi x; hence

prob(ai^T x ≤ bi) = Φ((bi − āi^T x) / ||Σi^{1/2} x||2)

where Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt is CDF of N(0, 1)

robust LP

minimize c^T x
subject to prob(ai^T x ≤ bi) ≥ η, i = 1, . . . , m,

with η ≥ 1/2, is equivalent to the SOCP

minimize c^T x
subject to āi^T x + Φ^{−1}(η) ||Σi^{1/2} x||2 ≤ bi, i = 1, . . . , m
Convex optimization problems 428
Geometric programming
monomial function
f(x) = c x1^{a1} x2^{a2} · · · xn^{an},   dom f = R^n_++

with c > 0; exponent ai can be any real number

posynomial function: sum of monomials

f(x) = Σ_{k=1}^{K} ck x1^{a1k} x2^{a2k} · · · xn^{ank},   dom f = R^n_++

geometric program (GP)

minimize f0(x)
subject to fi(x) ≤ 1, i = 1, . . . , m
           hi(x) = 1, i = 1, . . . , p
with fi posynomial, hi monomial
Convex optimization problems 429
Geometric program in convex form
change variables to yi= log xi, and take logarithm of cost, constraints
monomial f(x) = c x1^{a1} · · · xn^{an} transforms to

log f(e^{y1}, . . . , e^{yn}) = a^T y + b   (b = log c)

posynomial f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} · · · xn^{ank} transforms to

log f(e^{y1}, . . . , e^{yn}) = log Σ_{k=1}^K e^{ak^T y + bk}   (bk = log ck)

geometric program transforms to convex problem

minimize log Σ_{k=1}^K exp(a0k^T y + b0k)
subject to log Σ_{k=1}^K exp(aik^T y + bik) ≤ 0, i = 1, . . . , m
           Gy + d = 0
Convex optimization problems 430
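The log-log change of variables above can be sanity-checked numerically: evaluating a posynomial at x = e^y and taking the log gives exactly the log-sum-exp of affine functions of y. The two-term posynomial here is made up for illustration.

```python
import numpy as np

# Numeric check of the posynomial log-log transform.
# Hypothetical posynomial: f(x) = 2 x1^0.5 x2^{-1} + 3 x1^2 x2^0.3
c = np.array([2.0, 3.0])
Aexp = np.array([[0.5, -1.0],
                 [2.0, 0.3]])       # rows a_k of exponent vectors

y = np.array([0.7, -0.2])            # y = log x
x = np.exp(y)

f = sum(ck * np.prod(x ** ak) for ck, ak in zip(c, Aexp))
# log f(e^y) = log sum_k exp(a_k^T y + b_k), with b_k = log c_k
lse = np.log(np.sum(np.exp(Aexp @ y + np.log(c))))
```

The log-sum-exp form is convex in y, which is what makes the transformed GP a convex problem.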
Design of cantilever beam
(figure: cantilever beam divided into segments 4, 3, 2, 1, with vertical force F applied at the right end)
N segments with unit lengths, rectangular cross-sections of size wi × hi
given vertical force F applied at the right end

design problem

minimize total weight
subject to upper & lower bounds on wi, hi
           upper & lower bounds on aspect ratios hi/wi
           upper bound on stress in each segment
           upper bound on vertical deflection at the end of the beam
variables: wi, hi for i= 1, . . . , N
Convex optimization problems 431
objective and constraint functions
total weight w1h1 + · · · + wNhN is posynomial
aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials
maximum stress in segment i is given by 6iF/(wi hi^2), a monomial
the vertical deflection yi and slope vi of central axis at the right end of segment i are defined recursively as

vi = 12(i − 1/2) F/(E wi hi^3) + vi+1
yi = 6(i − 1/3) F/(E wi hi^3) + vi+1 + yi+1

for i = N, N−1, . . . , 1, with vN+1 = yN+1 = 0 (E is Young's modulus)

vi and yi are posynomial functions of w, h
Convex optimization problems 432
formulation as a GP
minimize w1h1 + · · · + wNhN
subject to wmax^{−1} wi ≤ 1, wmin wi^{−1} ≤ 1, i = 1, . . . , N
           hmax^{−1} hi ≤ 1, hmin hi^{−1} ≤ 1, i = 1, . . . , N
           Smax^{−1} wi^{−1} hi ≤ 1, Smin wi hi^{−1} ≤ 1, i = 1, . . . , N
           6iF σmax^{−1} wi^{−1} hi^{−2} ≤ 1, i = 1, . . . , N
           ymax^{−1} y1 ≤ 1

note

we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax as

wmin/wi ≤ 1, wi/wmax ≤ 1, hmin/hi ≤ 1, hi/hmax ≤ 1

we write Smin ≤ hi/wi ≤ Smax as

Smin wi/hi ≤ 1, hi/(wi Smax) ≤ 1
Convex optimization problems 433
Minimizing spectral radius of nonnegative matrix
Perron-Frobenius eigenvalue λpf(A)

exists for (elementwise) positive A ∈ R^{n×n}

a real, positive eigenvalue of A, equal to spectral radius maxi |λi(A)|
determines asymptotic growth (decay) rate of A^k: A^k ∼ λpf^k as k → ∞
alternative characterization: λpf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}

minimizing spectral radius of matrix of posynomials

minimize λpf(A(x)), where the elements A(x)ij are posynomials of x
equivalent geometric program:

minimize λ
subject to Σ_{j=1}^n A(x)ij vj/(λvi) ≤ 1, i = 1, . . . , n

variables λ, v, x
Convex optimization problems 434
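The stated properties of the Perron-Frobenius eigenvalue are easy to confirm on a small positive matrix (entries arbitrary): it is real and positive, equals the spectral radius, and has a strictly positive eigenvector.

```python
import numpy as np

# Check the Perron-Frobenius properties for a small positive matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(np.abs(eigvals))        # index of spectral radius
lam_pf = eigvals[k].real
v = eigvecs[:, k].real
v = v if v[0] > 0 else -v             # fix the sign of the eigenvector

# lam_pf is real and positive, equals max |lambda_i(A)|,
# and A v = lam_pf v with v > 0 (elementwise)
```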
Generalized inequality constraints
convex problem with generalized inequality constraints
i i i f ( )
8/10/2019 Convex Slides
104/301
minimize f0(x)subject to fi(x) Ki0, i= 1, . . . , m
Ax=b
f0:Rn R convex; fi:Rn Rki Ki-convex w.r.t. proper cone Ki same properties as standard convex problem (convex feasible set, local
optimum is global, etc.)
conic form problem: special case with affine objective and constraints
minimize cTx
subject to F x + gK0Ax=b
extends linear programming (K=Rm+ ) to nonpolyhedral cones
Convex optimization problems 435
Semidefinite program (SDP)
minimize c^T x
subject to x1F1 + x2F2 + · · · + xnFn + G ⪯ 0
           Ax = b

with Fi, G ∈ S^k

inequality constraint is called linear matrix inequality (LMI)
includes problems with multiple LMI constraints: for example,

x1F̂1 + · · · + xnF̂n + Ĝ ⪯ 0,   x1F̃1 + · · · + xnF̃n + G̃ ⪯ 0

is equivalent to single LMI

x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + · · · + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0
Convex optimization problems 436
LP and SOCP as SDP
LP and equivalent SDP
LP:   minimize c^T x
      subject to Ax ⪯ b

SDP:  minimize c^T x
      subject to diag(Ax − b) ⪯ 0

(note different interpretation of generalized inequality ⪯)

SOCP and equivalent SDP

SOCP: minimize f^T x
      subject to ||Ai x + bi||2 ≤ ci^T x + di, i = 1, . . . , m

SDP:  minimize f^T x
      subject to [ (ci^T x + di)I   Ai x + bi ; (Ai x + bi)^T   ci^T x + di ] ⪰ 0, i = 1, . . . , m

Convex optimization problems 437
Eigenvalue minimization
minimize λmax(A(x))

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ S^k)

equivalent SDP

minimize t
subject to A(x) ⪯ tI

variables x ∈ R^n, t ∈ R
follows from

λmax(A) ≤ t  ⇔  A ⪯ tI
Convex optimization problems 438
Matrix norm minimization
minimize ||A(x)||2 = (λmax(A(x)^T A(x)))^{1/2}

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ R^{p×q})

equivalent SDP

minimize t
subject to [ tI  A(x) ; A(x)^T  tI ] ⪰ 0

variables x ∈ R^n, t ∈ R
constraint follows from

||A||2 ≤ t  ⇔  A^T A ⪯ t²I, t ≥ 0
           ⇔  [ tI  A ; A^T  tI ] ⪰ 0
Convex optimization problems 439
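The block-matrix equivalence above can be checked numerically: for a random rectangular matrix, [tI A; A^T tI] is positive semidefinite exactly when t is at least the spectral norm of A.

```python
import numpy as np

# Check: ||A||2 <= t  <=>  [[tI, A], [A^T, tI]] >= 0, on a random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
norm2 = np.linalg.norm(A, 2)          # spectral norm = sqrt(lambda_max(A^T A))

def block_psd(t):
    p, q = A.shape
    M = np.block([[t * np.eye(p), A],
                  [A.T, t * np.eye(q)]])
    return np.linalg.eigvalsh(M).min() >= -1e-9

feas_above = block_psd(norm2 + 0.1)   # t > ||A||2: LMI holds
feas_below = block_psd(norm2 - 0.1)   # t < ||A||2: LMI fails
```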
Vector optimization
general vector optimization problem
minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

vector objective f0 : R^n → R^q, minimized w.r.t. proper cone K ⊆ R^q

convex vector optimization problem

minimize (w.r.t. K) f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

with f0 K-convex, f1, . . . , fm convex
Convex optimization problems 440
Optimal and Pareto optimal points
set of achievable objective values
O = {f0(x) | x feasible}

feasible x is optimal if f0(x) is the minimum value of O
feasible x is Pareto optimal if f0(x) is a minimal value of O

(figures: left, set O with minimum value f0(x*), x* is optimal; right, set O with minimal value f0(xpo), xpo is Pareto optimal)
Convex optimization problems 441
Regularized least-squares
minimize (w.r.t. R^2_+) (||Ax − b||2^2, ||x||2^2)

(figure: trade-off curve of F1(x) = ||Ax − b||2^2 versus F2(x) = ||x||2^2, with the achievable set O shaded)

example for A ∈ R^{100×10}; heavy line is formed by Pareto optimal points
Convex optimization problems 443
Risk return trade-off in portfolio optimization
minimize (w.r.t. R^2_+) (−p̄^T x, x^T Σ x)
subject to 1^T x = 1, x ⪰ 0

x ∈ R^n is investment portfolio; xi is fraction invested in asset i
p ∈ R^n is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ
p̄^T x = E r is expected return; x^T Σ x = var r is return variance

example

(figures: mean return versus standard deviation of return; allocation x(1), . . . , x(4) versus standard deviation of return)
Convex optimization problems 444
Scalarization
to find Pareto optimal points: choose λ ⪰_{K*} 0 and solve scalar problem

minimize λ^T f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem

(figure: set O with supporting hyperplanes λ1 at f0(x1) and λ2 at f0(x2); f0(x3) is a Pareto optimal value not obtained by scalarization)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ⪰_{K*} 0
Convex optimization problems 445
Scalarization for multicriterion problems
to find Pareto optimal points, minimize positive weighted sum

λ^T f0(x) = λ1F1(x) + · · · + λqFq(x)

examples

regularized least-squares problem of page 443

take λ = (1, γ) with γ > 0

minimize ||Ax − b||2^2 + γ||x||2^2

for fixed γ, a LS problem

(figure: trade-off curve of ||Ax − b||2^2 versus ||x||2^2; point for γ = 1 marked)
Convex optimization problems 446
risk-return trade-off of page 444

minimize −p̄^T x + γ x^T Σ x
subject to 1^T x = 1, x ⪰ 0

for fixed γ > 0, a quadratic program
Convex optimization problems 447
Convex Optimization Boyd & Vandenberghe
5. Duality
Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
perturbation and sensitivity analysis
examples
generalized inequalities
51
Lagrangian
standard form problem (not necessarily convex)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

variable x ∈ R^n, domain D, optimal value p*

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

weighted sum of objective and constraint functions
λi is Lagrange multiplier associated with fi(x) ≤ 0
νi is Lagrange multiplier associated with hi(x) = 0
Duality 52
Lagrange dual function
Lagrange dual function: g : R^m × R^p → R,

g(λ, ν) = inf_{x∈D} L(x, λ, ν)
        = inf_{x∈D} ( f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p*

proof: if x̃ is feasible and λ ⪰ 0, then

f0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ, ν)
Duality 53
Least-norm solution of linear equations
minimize x^T x
subject to Ax = b

dual function

Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
to minimize L over x, set gradient equal to zero:

∇x L(x, ν) = 2x + A^T ν = 0  ⇒  x = −(1/2) A^T ν

plug in in L to obtain g:

g(ν) = L((−1/2) A^T ν, ν) = −(1/4) ν^T AA^T ν − b^T ν

a concave function of ν

lower bound property: p* ≥ −(1/4) ν^T AA^T ν − b^T ν for all ν
Duality 54
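The lower bound above is tight for this problem: maximizing g over ν gives ν* = −2(AA^T)^{-1}b, and g(ν*) equals the primal optimal value p* = b^T(AA^T)^{-1}b. A numeric sketch on random data (dimensions arbitrary, A wide so Ax = b is underdetermined):

```python
import numpy as np

# Numeric illustration of the lower bound and of strong duality for the
# least-norm problem above.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

# primal optimum: x* = A^T (A A^T)^{-1} b, p* = x*^T x*
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

def g(nu):
    # dual function derived above
    return -0.25 * nu @ A @ A.T @ nu - b @ nu

lb_any = g(rng.standard_normal(3))            # any nu gives a lower bound
nu_star = -2.0 * np.linalg.solve(A @ A.T, b)  # dual optimum is tight
```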
Standard form LP
minimize c^T x
subject to Ax = b, x ⪰ 0

dual function

Lagrangian is

L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
           = −b^T ν + (c + A^T ν − λ)^T x

L is affine in x, hence

g(λ, ν) = inf_x L(x, λ, ν) = −b^T ν if A^T ν − λ + c = 0, −∞ otherwise

g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p* ≥ −b^T ν if A^T ν + c ⪰ 0
Duality 55
Equality constrained norm minimization
minimize ||x||
subject to Ax = b

dual function

g(ν) = inf_x (||x|| − ν^T Ax + b^T ν) = b^T ν if ||A^T ν||* ≤ 1, −∞ otherwise

where ||v||* = sup_{||u||≤1} u^T v is dual norm of ||·||

proof: follows from inf_x (||x|| − y^T x) = 0 if ||y||* ≤ 1, −∞ otherwise

if ||y||* ≤ 1, then ||x|| − y^T x ≥ 0 for all x, with equality if x = 0
if ||y||* > 1, choose x = tu where ||u|| ≤ 1, u^T y = ||y||* > 1:

||x|| − y^T x = t(||u|| − ||y||*) → −∞  as t → ∞

lower bound property: p* ≥ b^T ν if ||A^T ν||* ≤ 1
Duality 56
Two-way partitioning
minimize x^T W x
subject to xi^2 = 1, i = 1, . . . , n

a nonconvex problem; feasible set contains 2^n discrete points
interpretation: partition {1, . . . , n} in two sets; Wij is cost of assigning i, j to the same set; −Wij is cost of assigning to different sets

dual function

g(ν) = inf_x (x^T W x + Σi νi(xi^2 − 1)) = inf_x x^T (W + diag(ν)) x − 1^T ν
     = −1^T ν if W + diag(ν) ⪰ 0, −∞ otherwise

lower bound property: p* ≥ −1^T ν if W + diag(ν) ⪰ 0

example: ν = −λmin(W)1 gives bound p* ≥ nλmin(W)
Duality 57
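For small n the bound p* ≥ nλmin(W) can be verified by brute force over all 2^n feasible points. The symmetric matrix W below is arbitrary.

```python
import numpy as np
from itertools import product

# Brute-force check of p* >= n * lambda_min(W) for a small two-way
# partitioning instance (W is an arbitrary symmetric matrix).
W = np.array([[0.0, 1.0, -2.0],
              [1.0, 0.0, 3.0],
              [-2.0, 3.0, 0.0]])
n = W.shape[0]

# enumerate all 2^n points with x_i in {-1, +1}
p_star = min(np.array(x) @ W @ np.array(x)
             for x in product([-1.0, 1.0], repeat=n))
bound = n * np.linalg.eigvalsh(W).min()
```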
Lagrange dual and conjugate function
minimize f0(x)
subject to Ax ⪯ b, Cx = d

dual function

g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (A^T λ + C^T ν)^T x − b^T λ − d^T ν )
        = −f0*(−A^T λ − C^T ν) − b^T λ − d^T ν

recall definition of conjugate f*(y) = sup_{x∈dom f} (y^T x − f(x))

simplifies derivation of dual if conjugate of f0 is known

example: entropy maximization

f0(x) = Σ_{i=1}^n xi log xi,   f0*(y) = Σ_{i=1}^n e^{yi − 1}
Duality 58
The dual problem
Lagrange dual problem
maximize g(λ, ν)
subject to λ ⪰ 0

finds best lower bound on p*, obtained from Lagrange dual function
a convex optimization problem; optimal value denoted d*
λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 55)

minimize c^T x            maximize −b^T ν
subject to Ax = b          subject to A^T ν + c ⪰ 0
           x ⪰ 0
Duality 59
Slater's constraint qualification

strong duality holds for a convex problem

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

if it is strictly feasible, i.e.,

∃x ∈ int D :  fi(x) < 0, i = 1, . . . , m,  Ax = b

also guarantees that the dual optimum is attained (if p* > −∞)
can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
there exist many other types of constraint qualifications
Duality 511
Inequality form LP
primal problem

minimize c^T x
subject to Ax ⪯ b

dual function

g(λ) = inf_x ((c + A^T λ)^T x − b^T λ) = −b^T λ if A^T λ + c = 0, −∞ otherwise

dual problem

maximize −b^T λ
subject to A^T λ + c = 0, λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* except when primal and dual are infeasible
Duality 512
Quadratic program
primal problem (assume P ∈ S^n_++)

minimize x^T P x
subject to Ax ⪯ b

dual function

g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{−1} A^T λ − b^T λ

dual problem

maximize −(1/4) λ^T A P^{−1} A^T λ − b^T λ
subject to λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* always
Duality 513
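Weak duality for this QP is easy to observe numerically: for any λ ⪰ 0, g(λ) is below the objective value of every primal-feasible point. The random instance below is purely illustrative.

```python
import numpy as np

# Weak duality check for the QP above: g(lambda) <= x^T P x for every
# primal-feasible x and every lambda >= 0.
rng = np.random.default_rng(4)
n, m = 4, 6
M = rng.standard_normal((n, n))
P = M @ M.T + n * np.eye(n)          # P in S^n_++
A = rng.standard_normal((m, n))
x_feas = rng.standard_normal(n)
b = A @ x_feas + 1.0                  # A x_feas < b: strictly feasible

def g(lam):
    # dual function derived above
    return -0.25 * lam @ A @ np.linalg.solve(P, A.T @ lam) - b @ lam

lam = np.abs(rng.standard_normal(m))  # an arbitrary lambda >= 0
primal_val = x_feas @ P @ x_feas
dual_val = g(lam)
```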
A nonconvex problem with strong duality
minimize x^T A x + 2b^T x
subject to x^T x ≤ 1

A ⋡ 0, hence nonconvex

dual function: g(λ) = inf_x ( x^T (A + λI) x + 2b^T x − λ )

unbounded below if A + λI ⋡ 0 or if A + λI ⪰ 0 and b ∉ R(A + λI)
minimized by x = −(A + λI)† b otherwise: g(λ) = −b^T (A + λI)† b − λ

dual problem and equivalent SDP:

maximize −b^T (A + λI)† b − λ          maximize −t − λ
subject to A + λI ⪰ 0                   subject to [ A + λI  b ; b^T  t ] ⪰ 0
           b ∈ R(A + λI)

strong duality although primal problem is not convex (not easy to show)
Duality 514
Geometric interpretation
for simplicity, consider problem with one constraint f1(x) ≤ 0

interpretation of dual function:

g(λ) = inf_{(u,t)∈G} (t + λu),  where  G = {(f1(x), f0(x)) | x ∈ D}

(figures: set G in the (u, t)-plane, with p*, d*, g(λ), and the supporting line λu + t = g(λ))

λu + t = g(λ) is (non-vertical) supporting hyperplane to G
hyperplane intersects t-axis at t = g(λ)
Duality 515
epigraph variation: same interpretation if G is replaced with

A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}

(figure: set A with supporting line λu + t = g(λ) through (0, p*))

strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p*)
for convex problem, A is convex, hence has supp. hyperplane at (0, p*)
Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p*) must be non-vertical
Complementary slackness

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal:

f0(x*) = g(λ*, ν*) = inf_x ( f0(x) + Σ_{i=1}^m λi* fi(x) + Σ_{i=1}^p νi* hi(x) )
       ≤ f0(x*) + Σ_{i=1}^m λi* fi(x*) + Σ_{i=1}^p νi* hi(x*)
       ≤ f0(x*)

hence, the two inequalities hold with equality

x* minimizes L(x, λ*, ν*)
λi* fi(x*) = 0 for i = 1, . . . , m (known as complementary slackness):

λi* > 0 ⇒ fi(x*) = 0,   fi(x*) < 0 ⇒ λi* = 0
Karush-Kuhn-Tucker (KKT) conditions

the four KKT conditions (for a problem with differentiable fi, hi):

1. primal constraints: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λi fi(x) = 0, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:

∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Σ_{i=1}^p νi ∇hi(x) = 0

from page 517: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
Duality 518
KKT conditions for convex problem
if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:

x is optimal if and only if there exist λ, ν that satisfy KKT conditions

recall that Slater implies strong duality, and dual optimum is attained
generalizes optimality condition ∇f0(x) = 0 for unconstrained problem
Duality 519
example: water-filling (assume αi > 0)

minimize −Σ_{i=1}^n log(xi + αi)
subject to x ⪰ 0, 1^T x = 1

x is optimal iff x ⪰ 0, 1^T x = 1, and there exist λ ∈ R^n, ν ∈ R such that

λ ⪰ 0,   λi xi = 0,   1/(xi + αi) + λi = ν

if ν < 1/αi: λi = 0 and xi = 1/ν − αi
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
determine ν from 1^T x = Σ_{i=1}^n max{0, 1/ν − αi} = 1
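The last condition above determines ν by a one-dimensional root-finding problem, since the total allocation is monotone decreasing in ν. A minimal sketch with made-up α values, solving for ν by bisection:

```python
import numpy as np

# Water-filling solution via bisection on nu, following the optimality
# conditions above (the alpha values are made up for illustration).
alpha = np.array([0.5, 1.0, 2.0])

def total(nu):
    # 1^T x(nu) = sum_i max{0, 1/nu - alpha_i}, decreasing in nu
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

lo, hi = 1e-6, 1e6                    # total(lo) > 1 > total(hi)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if total(mid) > 1.0:
        lo = mid                      # nu too small
    else:
        hi = mid
nu = 0.5 * (lo + hi)
x = np.maximum(0.0, 1.0 / nu - alpha)
```

For these α the water level 1/ν settles at 1.25, so x = (0.75, 0.25, 0): the third "patch" sits above the water level and receives nothing.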
perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

min.  f0(x)                              max.  g(λ, ν)
s.t.  fi(x) ≤ 0, i = 1, . . . , m         s.t.  λ ⪰ 0
      hi(x) = 0, i = 1, . . . , p

perturbed problem and its dual

min.  f0(x)                              max.  g(λ, ν) − u^T λ − v^T ν
s.t.  fi(x) ≤ ui, i = 1, . . . , m        s.t.  λ ⪰ 0
      hi(x) = vi, i = 1, . . . , p

x is primal variable; u, v are parameters
p*(u, v) is optimal value as a function of u, v
we are interested in information about p*(u, v) that we can obtain from the solution of the unperturbed problem and its dual
Duality 521
global sensitivity result

assume strong duality holds for unperturbed problem, and that λ*, ν* are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

p*(u, v) ≥ g(λ*, ν*) − u^T λ* − v^T ν*
         = p*(0, 0) − u^T λ* − v^T ν*

sensitivity interpretation

if λi* large: p* increases greatly if we tighten constraint i (ui < 0)
if λi* small: p* does not decrease much if we loosen constraint i (ui > 0)
if νi* large and positive: p* increases greatly if we take vi < 0; if νi* large and negative: p* increases greatly if we take vi > 0
if νi* small and positive: p* does not decrease much if we take vi > 0; if νi* small and negative: p* does not decrease much if we take vi < 0
local sensitivity: if (in addition) p*(u, v) is differentiable at (0, 0), then

λi* = −∂p*(0, 0)/∂ui,   νi* = −∂p*(0, 0)/∂vi

proof (for λi*): from global sensitivity result,

∂p*(0, 0)/∂ui = lim_{t↘0} (p*(tei, 0) − p*(0, 0))/t ≥ −λi*
∂p*(0, 0)/∂ui = lim_{t↗0} (p*(tei, 0) − p*(0, 0))/t ≤ −λi*

hence, equality

(figure: p*(u) for a problem with one (inequality) constraint; the line p*(0) − λ*u is tangent at u = 0)
Duality 523
Duality and problem reformulations
equivalent formulations of a problem can lead to very different duals
reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions

e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
Duality 524
Introducing new variables and equality constraints
minimize f0(Ax + b)

dual function is constant: g = inf_x L(x) = inf_x f0(Ax + b) = p*
we have strong duality, but dual is quite useless

reformulated problem and its dual

minimize f0(y)                     maximize b^T ν − f0*(ν)
subject to Ax + b − y = 0          subject to A^T ν = 0

dual function follows from

g(ν) = inf_{x,y} ( f0(y) − ν^T y + ν^T Ax + b^T ν )
     = −f0*(ν) + b^T ν if A^T ν = 0, −∞ otherwise
Duality 525
norm approximation problem: minimize ||Ax − b||

minimize ||y||
subject to y = Ax − b

can look up conjugate of ||·||, or derive dual directly

g(ν) = inf_{x,y} ( ||y|| + ν^T y − ν^T Ax + b^T ν )
     = b^T ν + inf_y (||y|| + ν^T y) if A^T ν = 0, −∞ otherwise
     = b^T ν if A^T ν = 0, ||ν||* ≤ 1, −∞ otherwise

(see page 54)

dual of norm approximation problem

maximize b^T ν
subject to A^T ν = 0, ||ν||* ≤ 1
Duality 526
Implicit constraints
LP with box constraints: primal and dual problem

minimize c^T x            maximize −b^T ν − 1^T λ1 − 1^T λ2
subject to Ax = b          subject to c + A^T ν + λ1 − λ2 = 0
           −1 ⪯ x ⪯ 1                 λ1 ⪰ 0, λ2 ⪰ 0

reformulation with box constraints made implicit

minimize f0(x) = c^T x if −1 ⪯ x ⪯ 1, ∞ otherwise
subject to Ax = b

dual function

g(ν) = inf_{−1⪯x⪯1} ( c^T x + ν^T (Ax − b) )
     = −b^T ν − ||A^T ν + c||1

dual problem: maximize −b^T ν − ||A^T ν + c||1
Duality 527
Problems with generalized inequalities
minimize f0(x)
subject to fi(x) ⪯_{Ki} 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

⪯_{Ki} is generalized inequality on R^{ki}

definitions are parallel to scalar case:

Lagrange multiplier for fi(x) ⪯_{Ki} 0 is vector λi ∈ R^{ki}
Lagrangian L : R^n × R^{k1} × · · · × R^{km} × R^p → R, is defined as

L(x, λ1, · · · , λm, ν) = f0(x) + Σ_{i=1}^m λi^T fi(x) + Σ_{i=1}^p νi hi(x)

dual function g : R^{k1} × · · · × R^{km} × R^p → R, is defined as

g(λ1, . . . , λm, ν) = inf_{x∈D} L(x, λ1, · · · , λm, ν)
Duality 528
lower bound property: if λi ⪰_{Ki*} 0, then g(λ1, . . . , λm, ν) ≤ p*

proof: if x̃ is feasible and λi ⪰_{Ki*} 0, then

f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λi^T fi(x̃) + Σ_{i=1}^p νi hi(x̃)
      ≥ inf_{x∈D} L(x, λ1, . . . , λm, ν)
      = g(λ1, . . . , λm, ν)

minimizing over all feasible x̃ gives p* ≥ g(λ1, . . . , λm, ν)

dual problem

maximize g(λ1, . . . , λm, ν)
subject to λi ⪰_{Ki*} 0, i = 1, . . . , m

weak duality: p* ≥ d* always
strong duality: p* = d* for convex problem with constraint qualification (for example, Slater's: primal problem is strictly feasible)
Duality 529
Convex Optimization Boyd & Vandenberghe
6. Approximation and fitting
norm approximation
least-norm problems
regularized approximation
robust approximation
61
Norm approximation
minimize ||Ax − b||

(A ∈ R^{m×n} with m ≥ n, ||·|| is a norm on R^m)

interpretations of solution x* = argmin_x ||Ax − b||:

geometric: Ax* is point in R(A) closest to b
estimation: linear measurement model

y = Ax + v

y are measurements, x is unknown, v is measurement error
given y = b, best guess of x is x*

optimal design: x are design variables (input), Ax is result (output)
x* is design that best approximates desired result b
Approximation and fitting 62
examples

least-squares approximation (||·||2): solution satisfies normal equations

A^T A x = A^T b

(x* = (A^T A)^{−1} A^T b if rank A = n)

Chebyshev approximation (||·||∞): can be solved as an LP

minimize t
subject to −t1 ⪯ Ax − b ⪯ t1

sum of absolute residuals approximation (||·||1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ Ax − b ⪯ y
Approximation and fitting 63
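The Chebyshev-approximation LP above can be solved with a generic LP solver. A sketch assuming SciPy; the random instance is made up, and the result is sanity-checked against the least-squares solution (whose worst residual can only be larger).

```python
import numpy as np
from scipy.optimize import linprog

# l-infinity (Chebyshev) approximation as the LP above:
#   minimize t  subject to  -t1 <= Ax - b <= t1, variables (x, t).
rng = np.random.default_rng(5)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
m, n = A.shape

c = np.r_[np.zeros(n), 1.0]
# Ax - b <= t1  and  b - Ax <= t1, stacked as A_ub [x; t] <= b_ub
A_ub = np.block([[A, -np.ones((m, 1))],
                 [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x, t = res.x[:n], res.x[n]   # t = optimal worst-case residual
```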
example (m = 100, n = 30): histogram of residuals for penalties

φ(u) = |u|,   φ(u) = u²,   φ(u) = max{0, |u| − a},   φ(u) = −log(1 − u²)

(figure: histograms of residual distributions for the p = 1, p = 2, deadzone-linear, and log-barrier penalties)

shape of penalty function has large effect on distribution of residuals
Approximation and fitting 65
Huber penalty function (with parameter M)

φhub(u) = u² if |u| ≤ M,   M(2|u| − M) if |u| > M

linear growth for large u makes approximation less sensitive to outliers

(figures)

left: Huber penalty for M = 1
right: affine function f(t) = α + βt fitted to 42 points ti, yi (circles) using quadratic (dashed) and Huber (solid) penalty
Approximation and fitting 66
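The Huber penalty above is straightforward to implement; the values below are checked against the two branches of the definition (quadratic inside the threshold, linear outside, continuous at |u| = M).

```python
import numpy as np

# The Huber penalty above, vectorized; M is the threshold parameter.
def huber(u, M=1.0):
    u = np.asarray(u, dtype=float)
    quad = u ** 2                      # |u| <= M: quadratic region
    lin = M * (2 * np.abs(u) - M)      # |u| >  M: linear growth
    return np.where(np.abs(u) <= M, quad, lin)
```

With M = 1: huber(0.5) is 0.25 (quadratic branch), huber(2) is 1·(4 − 1) = 3 (linear branch), and both branches agree at u = 1.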
Least-norm problems
minimize ||x||
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ||·|| is a norm on R^n)

interpretations of solution x* = argmin_{Ax=b} ||x||:

geometric: x* is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x* is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x* is smallest (most efficient) design that satisfies requirements
Approximation and fitting 67
examples

least-squares solution of linear equations (||·||2):
can be solved via optimality conditions

2x + A^T ν = 0,   Ax = b

minimum sum of absolute values (||·||1): can be solved as an LP

minimize 1^T y
subject to −y ⪯ x ⪯ y, Ax = b

tends to produce sparse solution x*

extension: least-penalty problem

minimize φ(x1) + · · · + φ(xn)
subject to Ax = b

φ : R → R is convex penalty function
Approximation and fitting 68
Regularized approximation
minimize (w.r.t. R^2_+) (||Ax − b||, ||x||)

A ∈ R^{m×n}, norms on R^m and R^n can be different

interpretation: find good approximation Ax ≈ b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that ||x|| is small
optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x
Approximation and fitting 69
Scalarized problem
minimize ||Ax − b|| + γ||x||

solution for γ > 0 traces out optimal trade-off curve
other common method: minimize ||Ax − b||² + δ||x||² with δ > 0

Tikhonov regularization

minimize ||Ax − b||2² + δ||x||2²

can be solved as a least-squares problem

minimize || [A; √δ I] x − [b; 0] ||2²

solution x* = (A^T A + δI)^{−1} A^T b
Approximation and fitting 610
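The two formulations of Tikhonov regularization above agree numerically: solving the stacked least-squares problem gives the same answer as the closed form. Random data, arbitrary dimensions.

```python
import numpy as np

# Tikhonov regularization: stacked least-squares vs. the closed form
# (A^T A + delta*I)^{-1} A^T b, on random data.
rng = np.random.default_rng(6)
A = rng.standard_normal((10, 4))
b = rng.standard_normal(10)
delta = 0.5

x_closed = np.linalg.solve(A.T @ A + delta * np.eye(4), A.T @ b)

# stacked problem: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stack = np.vstack([A, np.sqrt(delta) * np.eye(4)])
b_stack = np.r_[b, np.zeros(4)]
x_stack = np.linalg.lstsq(A_stack, b_stack, rcond=None)[0]
```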
Optimal input design
linear dynamical system with impulse response h:

y(t) = Σ_{τ=0}^t h(τ) u(t − τ),  t = 0, 1, . . . , N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output ydes: Jtrack = Σ_{t=0}^N (y(t) − ydes(t))²
2. input magnitude: Jmag = Σ_{t=0}^N u(t)²
3. input variation: Jder = Σ_{t=0}^{N−1} (u(t + 1) − u(t))²

track desired output using a small and slowly varying input signal

regularized least-squares formulation

minimize Jtrack + δJder + ηJmag

for fixed δ, η, a least-squares problem in u(0), . . . , u(N)
Approximation and fitting 611
example: 3 solutions on optimal trade-off surface

(top) δ = 0, small η; (middle) δ = 0, larger η; (bottom) large δ

(figures: input u(t) and output y(t) versus t for the three solutions)
Approximation and fitting 612
Signal reconstruction
minimize (w.r.t. R^2_+) (||x̂ − xcor||2, φ(x̂))

x ∈ R^n is unknown signal
xcor = x + v is (known) corrupted version of x, with additive noise v
variable x̂ (reconstructed signal) is estimate of x
φ : R^n → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φquad(x̂) = Σ_{i=1}^{n−1} (x̂i+1 − x̂i)²,   φtv(x̂) = Σ_{i=1}^{n−1} |x̂i+1 − x̂i|
Approximation and fitting 613
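The two smoothing objectives above behave very differently on a sharp jump: total variation charges the jump linearly in its height, quadratic smoothing charges the square. A minimal sketch on a made-up step signal:

```python
import numpy as np

# The two smoothing objectives above, evaluated on a step signal.
def phi_quad(x):
    return np.sum(np.diff(x) ** 2)

def phi_tv(x):
    return np.sum(np.abs(np.diff(x)))

x = np.r_[np.zeros(5), 2.0 * np.ones(5)]   # one jump of height 2
# phi_quad charges 2^2 = 4 for the jump; phi_tv charges only 2
```

This is why, as the slides below note, quadratic smoothing smears sharp transitions while total variation preserves them.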
quadratic smoothing example

(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φquad(x̂))
Approximation and fitting 614
total variation reconstruction example

(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φquad(x̂))

quadratic smoothing smooths out noise and sharp transitions in signal
Approximation and fitting 615
(figures: left, original signal x and noisy signal xcor; right, three solutions on trade-off curve ||x̂ − xcor||2 versus φtv(x̂))

total variation smoothing preserves sharp transitions in signal
Approximation and fitting 616
Robust approximation
minimize ||Ax − b|| with uncertain A

two approaches:

stochastic: assume A is random, minimize E ||Ax − b||
worst-case: set A of possible values of A, minimize sup_{A∈A} ||Ax − b||

tractable only in special cases (certain norms ||·||, distributions, sets A)

example: A(u) = A0 + uA1

xnom minimizes ||A0x − b||2²
xstoch minimizes E ||A(u)x − b||2² with u uniform on [−1, 1]
xwc minimizes sup_{−1≤u≤1} ||A(u)x − b||2²

(figure: r(u) = ||A(u)x − b||2 versus u for xnom, xstoch, xwc)
Approximation and fitting 617
stochastic robust LS with A = Ā + U, U random, E U = 0, E U^T U = P

minimize E ||(Ā + U)x − b||2²

explicit expression for objective:

E ||Ax − b||2² = E ||Āx − b + Ux||2²
              = ||Āx − b||2² + E x^T U^T U x
              = ||Āx − b||2² + x^T P x

hence, robust LS problem is equivalent to LS problem

minimize ||Āx − b||2² + ||P^{1/2} x||2²

for P = δI, get Tikhonov regularized problem

minimize ||Āx − b||2² + δ||x||2²
Approximation and fitting 618
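The expectation identity above can be verified exactly on a finite ensemble: if the perturbations come in ± pairs (so the mean is exactly zero) and P is taken as the ensemble average of U^T U, the cross term cancels and the identity holds to floating-point precision. All data below are made up.

```python
import numpy as np

# Check of E||(Abar + U)x - b||^2 = ||Abar x - b||^2 + x^T P x over a
# finite symmetric ensemble {+U_k, -U_k}.
rng = np.random.default_rng(7)
m, n, K = 5, 3, 50
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

Us = rng.standard_normal((K, m, n))
Us = np.concatenate([Us, -Us])                 # symmetrize: E U = 0 exactly
P = np.mean([U.T @ U for U in Us], axis=0)     # ensemble E U^T U

lhs = np.mean([np.sum(((Abar + U) @ x - b) ** 2) for U in Us])
rhs = np.sum((Abar @ x - b) ** 2) + x @ P @ x
```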
worst-case robust LS with A = {Ā + u1A1 + · · · + upAp | ||u||2 ≤ 1}

minimize sup_{A∈A} ||Ax − b||2² = sup_{||u||2≤1} ||P(x)u + q(x)||2²

where P(x) = [A1x  A2x  · · ·  Apx], q(x) = Āx − b

from page 514, strong duality holds between the following problems

maximize ||Pu + q||2²              minimize t + λ
subject to ||u||2² ≤ 1             subject to [ I  P  q ; P^T  λI  0 ; q^T  0  t ] ⪰ 0

hence, robust LS problem is equivalent to SDP

minimize t + λ
subject to [ I  P(x)  q(x) ; P(x)^T  λI  0 ; q(x)^T  0  t ] ⪰ 0
Approximation and fitting 619
example: histogram of residuals

r(u) = ||(A0 + u1A1 + u2A2)x − b||2

with u uniformly distributed on unit disk, for three values of x

(figure: histograms of r(u) for xls, xtik, xrls)

xls minimizes ||A0x − b||2
xtik minimizes ||A0x − b||2² + δ||x||2² (Tikhonov solution)
xrls minimizes sup_{A∈A} ||Ax − b||2² + ||x||2²
Approximation and fitting 620
Convex Optimization Boyd & Vandenberghe
7. Statistical estimation
maximum likelihood estimation
optimal detector design
experiment design
71
Parametric distribution estimation
distribution estimation problem: estimate probability density p(y) of arandom variable from observed values parametric distribution estimation: choose from a family of densities
px(y), indexed by a parameter x
maximum likelihood estimation
maximize (over x) log px(y)

y is observed value
l(x) = log px(y) is called log-likelihood function
can add constraints x ∈ C explicitly, or define px(y) = 0 for x ∉ C
a convex optimization problem if log px(y) is concave in x for fixed y
Statistical estimation 7–2
Linear measurements with IID noise
linear measurement model
yi = aiᵀx + vi, i = 1, . . . , m
x ∈ Rⁿ is vector of unknown parameters
vi is IID measurement noise, with density p(z)
yi is measurement: y ∈ Rᵐ has density px(y) = ∏_{i=1}^m p(yi − aiᵀx)
maximum likelihood estimate: any solution x of
maximize l(x) = Σ_{i=1}^m log p(yi − aiᵀx)

(y is observed value)
Statistical estimation 7–3
examples
Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

    l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (aiᵀx − yi)²

ML estimate is LS solution
Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

    l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |aiᵀx − yi|

ML estimate is ℓ1-norm solution
uniform noise on [−a, a]:

    l(x) = −m log(2a) if |aiᵀx − yi| ≤ a, i = 1, . . . , m; −∞ otherwise

ML estimate is any x with |aiᵀx − yi| ≤ a
Statistical estimation 7–4
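The first two cases are easy to compute: Gaussian ML is ordinary least squares, and Laplacian ML is the ℓ1 problem, which can be solved as an LP (minimize 1ᵀt subject to −t ⪯ Ax − y ⪯ t). A sketch on synthetic data (the problem sizes, true x, and noise level are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 40, 3
A = rng.standard_normal((m, n))        # rows are a_i^T
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.1 * rng.standard_normal(m)

# Gaussian noise: ML estimate is the least-squares solution
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)

# Laplacian noise: ML estimate minimizes sum_i |a_i^T x - y_i|;
# LP reformulation over (x, t): minimize 1^T t  s.t.  -t <= A x - y <= t
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
x_l1 = res.x[:n]
```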
Logistic regression
random variable y {0, 1} with distribution
p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))
a, b are parameters; u ∈ Rⁿ are (observable) explanatory variables
estimation problem: estimate a, b from m observations (ui, yi)
log-likelihood function (for y1 = · · · = yk = 1, yk+1 = · · · = ym = 0):
l(a, b) = log ( ∏_{i=1}^k exp(aᵀui + b)/(1 + exp(aᵀui + b)) · ∏_{i=k+1}^m 1/(1 + exp(aᵀui + b)) )

        = Σ_{i=1}^k (aᵀui + b) − Σ_{i=1}^m log(1 + exp(aᵀui + b))

concave in a, b
Statistical estimation 7–5
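Because l(a, b) is concave, the ML estimate can be found with any smooth unconstrained solver applied to −l. A sketch with scipy on synthetic scalar data (the true parameter values and sample size are invented; this is not the dataset of the example on the next slide):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m = 50
u = rng.uniform(0.0, 10.0, size=m)        # scalar explanatory variable
a_true, b_true = 1.0, -5.0
p = 1.0 / (1.0 + np.exp(-(a_true * u + b_true)))
y = (rng.uniform(size=m) < p).astype(float)

def neg_log_lik(theta):
    a, b = theta
    z = a * u + b
    # -l(a, b) = -(sum over y_i = 1 of z_i) + sum_i log(1 + exp(z_i))
    return -(y @ z) + np.sum(np.logaddexp(0.0, z))

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
a_ml, b_ml = res.x
```

np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow, which keeps the objective well behaved for large |z|.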
example (n = 1, m = 50 measurements)

[figure: prob(y = 1) versus u, for u ∈ [0, 10]]

circles show 50 points (ui, yi)
solid curve is ML estimate of p = exp(au + b)/(1 + exp(au + b))
Statistical estimation 7–6
(Binary) hypothesis testing
detection (hypothesis testing) problem
given observation of a random variable X ∈ {1, . . . , n}, choose between:

hypothesis 1: X was generated by distribution p = (p1, . . . , pn)

hypothesis 2: X was generated by distribution q = (q1, . . . , qn)
randomized detector
a nonnegative matrix T ∈ R^{2×n}, with 1ᵀT = 1ᵀ
if we observe X = k, we choose hypothesis 1 with probability t1k, hypothesis 2 with probability t2k
if all elements of T are 0 or 1, it is called a deterministic detector
Statistical estimation 7–7
detection probability matrix:
D = [Tp  Tq] = [ 1 − Pfp    Pfn
                 Pfp        1 − Pfn ]

Pfp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
Pfn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)
multicriterion formulation of detector design
minimize (w.r.t. R²₊) (Pfp, Pfn) = ((Tp)_2, (Tq)_1)
subject to t1k + t2k = 1, k = 1, . . . , n
           tik ≥ 0, i = 1, 2, k = 1, . . . , n

variable T ∈ R^{2×n}
Statistical estimation 7–8
scalarization (with weight λ > 0)

minimize   (Tp)_2 + λ(Tq)_1
subject to t1k + t2k = 1, tik ≥ 0, i = 1, 2, k = 1, . . . , n

an LP with a simple analytical solution
(t1k, t2k) = (1, 0) if pk ≥ λqk
(t1k, t2k) = (0, 1) if pk < λqk

a deterministic detector, given by a likelihood ratio test
if pk = λqk for some k, any value 0 ≤ t1k ≤ 1, t2k = 1 − t1k is optimal
(i.e., Pareto-optimal detectors include non-deterministic detectors)
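The likelihood ratio test is immediate to evaluate; a numpy sketch using the two columns of the matrix P from the example on the next slide as p and q (λ = 1 here is an arbitrary choice):

```python
import numpy as np

p = np.array([0.70, 0.20, 0.05, 0.05])   # hypothesis 1 distribution
q = np.array([0.10, 0.10, 0.70, 0.10])   # hypothesis 2 distribution
lam = 1.0                                 # scalarization weight lambda

# deterministic detector from the likelihood ratio test:
# choose hypothesis 1 exactly where p_k >= lam * q_k
t1 = (p >= lam * q).astype(float)
T = np.vstack([t1, 1.0 - t1])            # columns sum to one

Pfp = T[1] @ p   # false positive probability (Tp)_2
Pfn = T[0] @ q   # false negative probability (Tq)_1
```

Here the detector picks hypothesis 1 for outcomes 1 and 2, giving Pfp = 0.10 and Pfn = 0.20.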
minimax detector
minimize   max{Pfp, Pfn} = max{(Tp)_2, (Tq)_1}
subject to t1k + t2k = 1, tik ≥ 0, i = 1, 2, k = 1, . . . , n

an LP; solution is usually not deterministic
Statistical estimation 7–9
example
P = [ 0.70  0.10
      0.20  0.10
      0.05  0.70
      0.05  0.10 ]
[figure: trade-off curve of Pfn versus Pfp, with detectors 1, 2, 3, 4 marked]
solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector
Statistical estimation 7–10
Experiment design
m linear measurements yi = aiᵀx + wi, i = 1, . . . , m, of unknown x ∈ Rⁿ
measurement errors wi are IID N(0, 1)

ML (least-squares) estimate is
x̂ = ( Σ_{i=1}^m ai aiᵀ )⁻¹ Σ_{i=1}^m yi ai
error e = x̂ − x has zero mean and covariance

    E = E eeᵀ = ( Σ_{i=1}^m ai aiᵀ )⁻¹

confidence ellipsoids are given by {x | (x − x̂)ᵀ E⁻¹ (x − x̂) ≤ β}

experiment design: choose ai ∈ {v1, . . . , vp} (a set of possible test vectors) to make E small
Statistical estimation 7–11
vector optimization formulation
minimize (w.r.t. Sⁿ₊)  E = ( Σ_{k=1}^p mk vk vkᵀ )⁻¹
subject to             mk ≥ 0, m1 + · · · + mp = m, mk ∈ Z

variables are mk (# vectors ai equal to vk)
difficult in general, due to integer constraint

relaxed experiment design
assume m ≫ p, use λk = mk/m as (continuous) real variable

minimize (w.r.t. Sⁿ₊)  E = (1/m) ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
subject to             λ ⪰ 0, 1ᵀλ = 1
common scalarizations: minimize log det E, tr E, λmax(E), . . .
can add other convex constraints, e.g., bound experiment cost cᵀλ ≤ B
Statistical estimation 7–12
D-optimal design
minimize   log det ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
subject to λ ⪰ 0, 1ᵀλ = 1
interpretation: minimizes volume of confidence ellipsoids
dual problem
maximize   log det W + n log n
subject to vkᵀ W vk ≤ 1, k = 1, . . . , p

interpretation: {x | xᵀWx ≤ 1} is minimum volume ellipsoid centered at origin, that includes all test vectors vk

complementary slackness: for λ, W primal and dual optimal,

    λk(1 − vkᵀ W vk) = 0, k = 1, . . . , p

optimal experiment uses vectors vk on boundary of ellipsoid defined by W
Statistical estimation 7–13
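One classical way to solve the relaxed D-optimal problem numerically (not covered in the slides) is the multiplicative update λk ← λk · (vkᵀ M(λ)⁻¹ vk)/n, where M(λ) = Σ_k λk vk vkᵀ; at the optimum vkᵀ M⁻¹ vk ≤ n for all k, which is dual feasibility above with W = M(λ)⁻¹/n. A sketch with random test vectors (the data and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 2, 20
V = rng.standard_normal((p, n))          # rows are candidate test vectors v_k

lam = np.full(p, 1.0 / p)                # start from the uniform design
for _ in range(2000):
    M = V.T @ (lam[:, None] * V)         # M(lambda) = sum_k lam_k v_k v_k^T
    Minv = np.linalg.inv(M)
    g = np.einsum("ki,ij,kj->k", V, Minv, V)   # g_k = v_k^T M^{-1} v_k
    lam *= g / n                         # multiplicative update (preserves sum = 1)
    lam /= lam.sum()                     # guard against round-off drift

# optimality check (Kiefer-Wolfowitz): v_k^T M^{-1} v_k <= n for all k
M = V.T @ (lam[:, None] * V)
g = np.einsum("ki,ij,kj->k", V, np.linalg.inv(M), V)
```

The update preserves Σ λk = 1 because Σ_k λk vkᵀM⁻¹vk = tr(M⁻¹M) = n; the explicit renormalization only absorbs floating-point drift.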
example (p = 20)

[figure: candidate vectors, with λ1 = 0.5, λ2 = 0.5 marked, and the ellipse defined by the optimal W]

design uses two vectors, on boundary of ellipse defined by optimal W
Statistical estimation 7–14
derivation of dual of page 7–13
first reformulate primal problem with new variable X:
minimize   log det X⁻¹
subject to X = Σ_{k=1}^p λk vk vkᵀ, λ ⪰ 0, 1ᵀλ = 1

L(X, λ, Z, z, ν) = log det X⁻¹ + tr( Z( X − Σ_{k=1}^p λk vk vkᵀ ) ) − zᵀλ + ν(1ᵀλ − 1)