Agenda
1 Cone programming
2 Convex cones
3 Generalized inequalities
4 Linear programming (LP)
5 Second-order cone programming (SOCP)
6 Semidefinite programming (SDP)
7 Examples
Optimization problem in standard form
\[
\begin{array}{ll}
\text{minimize} & f_0(x)\\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m\\
& h_i(x) = 0, \quad i = 1, \dots, p
\end{array}
\]
variable $x \in \mathbb{R}^n$
$f_0 : \mathbb{R}^n \to \mathbb{R}$ (objective or cost function)
$f_i : \mathbb{R}^n \to \mathbb{R}$ (inequality constraint functionals)
$h_i : \mathbb{R}^n \to \mathbb{R}$ (equality constraint functionals)
Terminology
x is feasible if x obeys the constraints
feasible set C: set of all feasible points
optimal value: $p^\star = \inf\{f_0(x) : x \in C\}$
can be $-\infty$; e.g. $\min\, \log(x)$ over $x > 0$
by convention, $p^\star = +\infty$ if $C = \emptyset$ (problem infeasible)
optimal solution: $x^\star$ s.t. $f_0(x^\star) = p^\star$
there may be no optimal solution: e.g. $\min\, \log(x)$ over $x > 0$
optimal set: $\{x \in C : f_0(x) = p^\star\}$
Convex optimization problem
Convex optimization problem in standard form
\[
\begin{array}{ll}
\text{minimize} & f_0(x)\\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \dots, m\\
& a_i^T x = b_i, \quad i = 1, \dots, p
\end{array}
\]
$f_0, f_1, \dots, f_m$ convex
affine equality constraints $Ax = b$, $A \in \mathbb{R}^{p \times n}$
feasible set is convex
Abstract convex optimization problem
\[
\begin{array}{ll}
\text{minimize} & f_0(x)\\
\text{subject to} & x \in C
\end{array}
\]
$f_0$ convex
$C$ convex
Why convexity?
A convex function has no local minimum that is not global
[Figure: a convex function vs. a non-convex function with a non-global local minimum]
A convex set is connected and has feasible directions at any point
[Figure: a convex set, with feasible directions at a boundary point, vs. a set that is not convex]
A convex function is continuous on the interior of its domain and is differentiable almost everywhere there
Convex functions arise prominently in duality
Cone programming I
LP:
\[
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Fx + g \ge 0\\
& Ax = b
\end{array}
\]
Nonlinear programming → nonlinear constraints
Express nonlinearity via generalized inequalities
Orderings of Rn and convex cones
K is a convex cone if
(i) K is convex
(ii) $K$ is a cone (i.e. $x \in K \implies \lambda x \in K$ for all $\lambda \ge 0$)
K is pointed if
(iii) $x \in K$ and $-x \in K \implies x = 0$ ($K$ does not contain a straight line through the origin)
Example: $K = \{x \in \mathbb{R}^n : x \ge 0\}$ is a pointed convex cone
Two additional properties of $\mathbb{R}^n_+$:
(iv) $\mathbb{R}^n_+$ is closed
(v) $\mathbb{R}^n_+$ has a nonempty interior
Implication: ordering
\[
a \succeq_K b \iff a - b \in K
\]
(i)–(iii) ensure that this is a good (partial) ordering:
1. reflexivity: $a \succeq_K a$ follows from $0 \in K$
2. antisymmetry: $a \succeq_K b$, $b \succeq_K a \implies a = b$ (since $K$ is pointed)
3. transitivity: $a \succeq_K b$, $b \succeq_K c \implies a \succeq_K c$ (since $K$ is convex and a cone)
→ compatibility with linear operations:
$a \succeq_K b$ and $\lambda \ge 0 \implies \lambda a \succeq_K \lambda b$
$a \succeq_K b$ and $c \succeq_K d \implies a + c \succeq_K b + d$
The good properties of LPs stem from these ordering properties
4. closedness: $a_i \succeq_K b_i$, $a_i \to a$, $b_i \to b \implies a \succeq_K b$
5. a nonempty interior allows us to define strict inequalities: $a \succ_K b \iff a - b \in \operatorname{int}(K)$
Examples of cones
Nonnegative orthant: $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x \ge 0\}$
Second-order (or Lorentz or ice-cream) cone:
\[
\{x \in \mathbb{R}^{n+1} : \sqrt{x_1^2 + \cdots + x_n^2} \le x_{n+1}\}
\]
Positive semidefinite cone:
\[
\{X \in S^n : X \succeq 0\}
\]
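These memberships are easy to check numerically; here is a minimal sketch (not from the slides; the helper names and tolerances are mine):

```python
import numpy as np

def in_nonneg_orthant(x, tol=1e-9):
    # x >= 0 componentwise
    return bool(np.all(x >= -tol))

def in_second_order_cone(x, tol=1e-9):
    # sqrt(x_1^2 + ... + x_n^2) <= x_{n+1}
    return np.linalg.norm(x[:-1]) <= x[-1] + tol

def in_psd_cone(X, tol=1e-9):
    # symmetric with all eigenvalues >= 0
    return np.allclose(X, X.T) and bool(np.all(np.linalg.eigvalsh(X) >= -tol))

print(in_nonneg_orthant(np.array([0.0, 2.0])))            # True
print(in_second_order_cone(np.array([3.0, 4.0, 5.0])))    # True: ||(3,4)||_2 = 5
print(in_psd_cone(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # True: eigenvalues 1, 3
```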
Cone programming II
\[
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Fx + g \succeq_K 0\\
& Ax = b
\end{array}
\]
$K = \mathbb{R}^n_+ \implies$ linear programming
Minimize linear functional over an affine slice of a cone
Very fruitful point of view
useful theory (duality)
useful algorithms (interior-point methods)
Linear programming (LP)
\[
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Fx + g \ge 0\\
& Ax = b
\end{array}
\]
Linear objective
Linear equality and inequality constraints
Feasible set is a polyhedron
[Figure: polyhedron with cost vector $c$, level sets $c^T x = \text{constant}$, and optimal vertex $x^\star$]
Many problems can be formulated as LPs
Example: Chebyshev approximation
$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$
\[
\text{minimize } \|Ax - b\|_\infty \iff \text{minimize } \max_{i=1,\dots,m} |a_i^T x - b_i|
\]
($a_i^T$ denotes the $i$th row of $A$)
Different from the LS problem: minimize $\|Ax - b\|_2$
LP formulation (epigraph trick):
\[
\iff
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & |a_i^T x - b_i| \le t \quad \forall i
\end{array}
\iff
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & -t \le a_i^T x - b_i \le t \quad \forall i
\end{array}
\]
optimization variables $(x, t) \in \mathbb{R}^{n+1}$
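This LP is straightforward to set up in a modeling layer; a minimal sketch using cvxpy (not part of the slides; the synthetic data are my own):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 10
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

# epigraph form: minimize t subject to -t <= a_i^T x - b_i <= t
x, t = cp.Variable(n), cp.Variable()
prob = cp.Problem(cp.Minimize(t), [A @ x - b <= t, A @ x - b >= -t])
prob.solve()

# the optimal t equals the optimal ell_infinity residual
print(prob.value, np.linalg.norm(A @ x.value - b, np.inf))
```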
Example: basis pursuit
$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$
\[
\begin{array}{ll}
\text{minimize} & \|x\|_1\\
\text{subject to} & Ax = b
\end{array}
\]
LP formulations:
(a)
\[
\begin{array}{ll}
\text{minimize} & \sum_i t_i\\
\text{subject to} & -t_i \le x_i \le t_i\\
& Ax = b
\end{array}
\]
optimization variables $(x, t) \in \mathbb{R}^{2n}$
(b)
\[
\begin{array}{ll}
\text{minimize} & \sum_i x_i^+ + \sum_i x_i^-\\
\text{subject to} & A(x^+ - x^-) = b\\
& x^+, x^- \ge 0
\end{array}
\]
optimization variables $(x^+, x^-) \in \mathbb{R}^{2n}$
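A sketch of formulation (a) in cvxpy (again with made-up data; the sparsity threshold 1e-6 is an arbitrary choice of mine):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 60
A = rng.standard_normal((m, n))
x0 = np.zeros(n); x0[:3] = [1.0, -2.0, 0.5]  # sparse generating signal
b = A @ x0

# formulation (a): minimize sum(t) subject to -t <= x <= t, Ax = b
x, t = cp.Variable(n), cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.sum(t)), [-t <= x, x <= t, A @ x == b])
prob.solve()

print(np.sum(np.abs(x.value) > 1e-6))  # number of (numerically) nonzero entries
```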
Second-order cone programming (SOCP)
\[
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & \|F_i x + g_i\|_2 \le c_i^T x + d_i, \quad i = 1, \dots, m\\
& Ax = b
\end{array}
\]
\[
\|F_i x + g_i\|_2 \le c_i^T x + d_i \iff
\begin{bmatrix} F_i x + g_i\\ c_i^T x + d_i \end{bmatrix} \in L_i = \{(y_i, t) : \|y_i\|_2 \le t\}
\]
(hence the name)
\[
\text{SOCP} \iff
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & \begin{bmatrix} F_i\\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i\\ d_i \end{bmatrix} \in L_i, \quad i = 1, \dots, m\\
& Ax = b
\end{array}
\]
affine mapping: $Fx + g = \left( \begin{bmatrix} F_i\\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i\\ d_i \end{bmatrix} \right)_{i=1,\dots,m}$
cone product: $K = L_1 \times L_2 \times \cdots \times L_m$
\[
\begin{bmatrix} F_i\\ c_i^T \end{bmatrix} x + \begin{bmatrix} g_i\\ d_i \end{bmatrix} \in L_i \ \ \forall i \iff Fx + g \in K
\]
\[
\therefore\ \text{SOCP} \iff
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & Fx + g \in K\\
& Ax = b
\end{array}
\]
this is a cone program
Example: support vector machines
n pairs (xi, yi)
$x_i \in \mathbb{R}^p$: feature/explanatory variables
$y_i \in \{-1, 1\}$: response/class label
Examples
$x_i$: infrared blood absorption spectrum; $y_i$: person is diabetic or not
SVM model: SVM as a penalized fitting procedure
\[
\min_\beta\ \sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda \|\beta\|_2^2, \qquad f(x) = x^T \beta
\]
sometimes $f(x) = x^T \beta + \beta_0$, with the same minimization (now also over $\beta_0$)
[Figure: the hinge loss $[1 - yf(x)]_+$ as a function of $yf(x)$]
SVM: formulation as an SOCP
Variables $(\beta, t) \in \mathbb{R}^p \times \mathbb{R}^n$
\[
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda \|\beta\|_2^2\\
\text{subject to} & [1 - y_i f(x_i)]_+ \le t_i
\end{array}
\iff
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda \|\beta\|_2^2\\
\text{subject to} & y_i f(x_i) \ge 1 - t_i\\
& t_i \ge 0
\end{array}
\]
this is an SOCP, since SOCPs are more general than QPs and QCQPs (see below)
Equivalence:
\[
\begin{array}{ll}
\text{minimize} & \sum_i t_i + \lambda u\\
\text{subject to} & \|\beta\|_2^2 \le u\\
& y_i f(x_i) \ge 1 - t_i\\
& t_i \ge 0
\end{array}
\]
\[
\|\beta\|_2^2 \le u \iff \|\beta\|_2^2 \le \left(\frac{u+1}{2}\right)^2 - \left(\frac{u-1}{2}\right)^2 \iff \left\| \begin{bmatrix} \beta\\ \frac{u-1}{2} \end{bmatrix} \right\|_2 \le \frac{u+1}{2}
\]
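For concreteness, a minimal cvxpy sketch of this penalized fit (the synthetic data and the value of $\lambda$ are my own; a conic solver handles the squared-norm penalty through second-order cone constraints, as in the equivalence above):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 100, 5, 0.1
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = np.sign(X @ beta_true + 0.3 * rng.standard_normal(n))

beta, t = cp.Variable(p), cp.Variable(n)
# epigraph of the hinge loss: y_i x_i^T beta >= 1 - t_i, t_i >= 0
constraints = [cp.multiply(y, X @ beta) >= 1 - t, t >= 0]
prob = cp.Problem(cp.Minimize(cp.sum(t) + lam * cp.sum_squares(beta)), constraints)
prob.solve()

print(np.mean(np.sign(X @ beta.value) == y))  # training accuracy
```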
QP ⊂ SOCP ($\implies$ LP ⊂ SOCP)
QCQP:
\[
\begin{array}{ll}
\text{minimize} & \frac{1}{2} x^T P_0 x + q_0^T x + r_0\\
\text{subject to} & \frac{1}{2} x^T P_i x + q_i^T x + r_i \le 0
\end{array}
\qquad P_0, P_i \succeq 0
\]
QCQP ⊂ SOCP
quadratic convex inequalities are SOCP-representable
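To see why, one can reuse the square-splitting identity from the SVM slide. A sketch, writing $P_i = R_i^T R_i$ (e.g. a Cholesky factorization):
\[
\tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0
\iff \left\| \tfrac{1}{\sqrt{2}} R_i x \right\|_2^2 \le s, \quad s = -(q_i^T x + r_i)
\iff \left\| \begin{bmatrix} \tfrac{1}{\sqrt{2}} R_i x \\ \tfrac{1-s}{2} \end{bmatrix} \right\|_2 \le \tfrac{1+s}{2},
\]
which is a second-order cone constraint since $s$ is affine in $x$.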
Example: total-variation denoising
Observe $b_{ij} = f_{ij} + \sigma z_{ij}$, $0 < i, j < n$
f is original image
b is a noisy version
Problem: recover original image (de-noise)
Min-TV solution:
\[
\begin{array}{ll}
\text{minimize} & \|x\|_{TV}\\
\text{subject to} & \|x - b\|_2 \le \delta
\end{array}
\]
TV norm:
\[
\|x\|_{TV} = \sum_{ij} \|D_{ij} x\|_2, \qquad D_{ij} x = \begin{bmatrix} x_{i+1,j} - x_{i,j}\\ x_{i,j+1} - x_{i,j} \end{bmatrix}
\]
Formulation as an SOCP:
\[
\begin{array}{ll}
\text{minimize} & \sum_{ij} t_{ij}\\
\text{subject to} & \|D_{ij} x\|_2 \le t_{ij}\\
& \|x - b\|_2 \le \delta
\end{array}
\]
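cvxpy's tv atom implements this isotropic TV norm, so the denoising problem can be sketched as follows (the toy image, noise level, and choice of $\delta$ are mine):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 32, 0.1
f = np.zeros((n, n)); f[8:24, 8:24] = 1.0      # piecewise-constant test image
b = f + sigma * rng.standard_normal((n, n))    # noisy observation
delta = sigma * n                              # roughly the expected noise norm

x = cp.Variable((n, n))
# cp.tv(x) = sum_ij ||(x[i+1,j]-x[i,j], x[i,j+1]-x[i,j])||_2
prob = cp.Problem(cp.Minimize(cp.tv(x)), [cp.norm(x - b, "fro") <= delta])
prob.solve()

print(np.abs(x.value - f).max())  # reconstruction error
```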
Semidefinite programming (SDP)
\[
\begin{array}{ll}
\text{minimize} & c^T x\\
\text{subject to} & F(x) = x_1 F_1 + \cdots + x_n F_n - F_0 \succeq 0
\end{array}
\]
$F_i \in S^p$ ($p \times p$ symmetric matrices)
linear matrix inequality (LMI): $F(x) \succeq 0$
multiple LMI’s can be combined into one:
Fi(x) 0 i = 1, . . . ,m ⇐⇒
F1(x). . .
Fm(x)
0
SOCP ⊂ SDP (but the converse is not true!)
\[
\{(x, t) \in \mathbb{R}^{n+1} : \|x\|_2 \le t\}: \qquad \|x\|_2 \le t \iff
\begin{bmatrix} tI_n & x\\ x^T & t \end{bmatrix} \succeq 0
\]
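One way to verify this equivalence, via a standard Schur-complement argument (not spelled out on the slide):
\[
\text{for } t > 0:\quad
\begin{bmatrix} tI_n & x\\ x^T & t \end{bmatrix} \succeq 0
\iff t - x^T (tI_n)^{-1} x \ge 0
\iff \|x\|_2^2 \le t^2
\iff \|x\|_2 \le t,
\]
and for $t = 0$ both sides force $x = 0$. Equivalently, the eigenvalues of the block matrix are $t \pm \|x\|_2$ together with $t$ (with multiplicity $n-1$).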
SOCP constraints are LMIs
Hierarchy: LP ⊂ SOCP ⊂ SDP
Many nonlinear convex problems can be cast as SDPs
Example: minimum-norm problem
\[
\text{minimize } \|A(x)\| \qquad \text{where } A(x) = x_1 A_1 + \cdots + x_n A_n - B,
\]
with $A_i \in \mathbb{R}^{p_1 \times p_2}$ and $\|\cdot\|$ the spectral norm (largest singular value), is equivalent to
\[
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & \|A(x)\| \le t
\end{array}
\]
\[
\|A(x)\| \le t \iff
\begin{bmatrix} tI_{p_1} & A(x)\\ A^T(x) & tI_{p_2} \end{bmatrix} \succeq 0
\]
Why? The eigenvalues of this matrix are $t \pm \sigma_i(A(x))$, plus $t$ with multiplicity $|p_1 - p_2|$
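A minimal cvxpy sketch (synthetic data of my choosing; sigma_max is the spectral norm, which cvxpy makes SDP-representable along the lines of this LMI):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
p1, p2, n = 4, 3, 2
As = [rng.standard_normal((p1, p2)) for _ in range(n)]
B = rng.standard_normal((p1, p2))

x = cp.Variable(n)
Ax = sum(x[i] * As[i] for i in range(n)) - B
# minimize the spectral norm of the affine matrix map A(x)
prob = cp.Problem(cp.Minimize(cp.sigma_max(Ax)))
prob.solve()

print(prob.value, np.linalg.norm(Ax.value, 2))  # the two should agree
```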
Example: nuclear-norm minimization
\[
\begin{array}{ll}
\text{minimize} & \|X\|_* = \sum_i \sigma_i(X)\\
\text{subject to} & X_{ij} = B_{ij}, \quad (i, j) \in \Omega \subset [p_1] \times [p_2]
\end{array}
\]
This is an SDP (proof, later)
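cvxpy exposes the nuclear norm as the normNuc atom, so a matrix-completion sketch looks like this (the low-rank test matrix and observation pattern are mine):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
p1, p2, r = 10, 8, 2
B = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))  # rank-2
mask = (rng.random((p1, p2)) < 0.7).astype(float)                # observed set Omega

X = cp.Variable((p1, p2))
# agree with B on the observed entries, minimize the nuclear norm
constraints = [cp.multiply(mask, X - B) == 0]
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()

print(np.abs(X.value - B).max())  # how well the unobserved entries were filled in
```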
Stability analysis for dynamical systems
Linear system:
\[
\frac{dv}{dt} = \dot v(t) = Q v(t), \qquad Q \in \mathbb{R}^{n \times n}
\]
Main question: is this system stable? i.e. do all trajectories tend to zero as $t \to \infty$?
Simple sufficient condition: existence of a quadratic Lyapunov function
(i) $L(v) = v^T X v$, $X \succ 0$
(ii) $\dot L = \frac{d}{dt} L(v(t)) \le -\alpha L(v(t))$ ($\alpha > 0$) for any trajectory
This condition gives $L(v(t)) = v^T(t) X v(t) \le e^{-\alpha t} L(v(0))$ (Gronwall's inequality), whence
\[
X \succ 0 \implies v(t) \to 0 \text{ as } t \to \infty
\]
Existence of $X \succ 0$ and $\alpha > 0$ provides a certificate of stability
\[
\dot v(t) = Q v(t), \qquad L(v) = v^T X v, \quad X \succ 0
\]
\[
\dot L = \frac{d}{dt}\left[ v^T(t) X v(t) \right] = \dot v^T X v + v^T X \dot v = v^T (Q^T X + XQ) v
\]
i.e.
\[
\dot L \le -\alpha L \iff v^T (Q^T X + XQ + \alpha X) v \le 0 \ \ \forall v \iff Q^T X + XQ + \alpha X \preceq 0
\]
Conclusion: to certify stability, it suffices to find $X$ obeying
\[
X \succ 0, \qquad Q^T X + XQ \prec 0
\]
If the optimal value of the SDP
\[
\begin{array}{ll}
\text{minimize} & t\\
\text{subject to} & \begin{bmatrix} X + tI & 0\\ 0 & -(Q^T X + XQ) + tI \end{bmatrix} \succeq 0
\end{array}
\]
is negative, then the system is stable
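A sketch of this certificate in cvxpy (the test matrix Q and the trace normalization are my additions; without some normalization of X, the objective is unbounded below whenever any certificate exists):

```python
import cvxpy as cp
import numpy as np

Q = np.array([[-1.0, 2.0],
              [0.0, -3.0]])  # eigenvalues -1 and -3: stable
n = Q.shape[0]

X = cp.Variable((n, n), symmetric=True)
t = cp.Variable()
constraints = [X + t * np.eye(n) >> 0,
               -(Q.T @ X + X @ Q) + t * np.eye(n) >> 0,
               cp.trace(X) == 1]  # normalization, not on the slide
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()

# t < 0 certifies X >> 0 and Q^T X + X Q << 0
print("certified stable" if prob.value < 0 else "no certificate found")
```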
Extension
\[
\dot v(t) = Q(t) v(t), \qquad Q(t) \in \operatorname{conv}\{Q_1, \dots, Q_k\} \text{ time-varying}
\]
$L(v) = v^T X v$ ($X \succ 0$) s.t. $\dot L \le -\alpha L \implies$ stability
Similar calculations show that, for all $v$,
\[
v^T (Q^T(t) X + X Q(t) + \alpha X) v \le 0
\iff Q^T(t) X + X Q(t) + \alpha X \preceq 0 \quad \forall Q(t) \in \operatorname{conv}\{Q_1, \dots, Q_k\}
\iff Q_i^T X + X Q_i + \alpha X \preceq 0 \quad \forall i = 1, \dots, k
\]
If we can find X such that
\[
X \succ 0 \quad \text{and} \quad Q_i^T X + X Q_i \prec 0 \quad \forall i = 1, \dots, k
\]
then we have stability
This is an SDP!