
Convex Slides

Date post: 02-Jun-2018
Upload: pawankumar-barnwal

Transcript
  • 8/10/2019 Convex Slides

    1/301

    Convex Optimization    Boyd & Vandenberghe

    1. Introduction

    • mathematical optimization
    • least-squares and linear programming
    • convex optimization
    • example
    • course goals and topics
    • nonlinear optimization
    • brief history of convex optimization

    1-1

    Mathematical optimization

    (mathematical) optimization problem

        minimize    f0(x)
        subject to  fi(x) ≤ bi,  i = 1, ..., m

    • x = (x1, ..., xn): optimization variables
    • f0 : R^n → R: objective function
    • fi : R^n → R, i = 1, ..., m: constraint functions

    optimal solution x* has smallest value of f0 among all vectors that satisfy the constraints

    Introduction 1-2

    Examples

    portfolio optimization
    • variables: amounts invested in different assets
    • constraints: budget, max./min. investment per asset, minimum return
    • objective: overall risk or return variance

    device sizing in electronic circuits
    • variables: device widths and lengths
    • constraints: manufacturing limits, timing requirements, maximum area
    • objective: power consumption

    data fitting
    • variables: model parameters
    • constraints: prior information, parameter limits
    • objective: measure of misfit or prediction error

    Introduction 1-3

    Solving optimization problems

    general optimization problem
    • very difficult to solve
    • methods involve some compromise, e.g., very long computation time, or not always finding the solution

    exceptions: certain problem classes can be solved efficiently and reliably
    • least-squares problems
    • linear programming problems
    • convex optimization problems

    Introduction 1-4

    Least-squares

        minimize  ‖Ax − b‖2^2

    solving least-squares problems
    • analytical solution: x* = (A^T A)^{-1} A^T b
    • reliable and efficient algorithms and software
    • computation time proportional to n^2 k (A ∈ R^{k×n}); less if structured
    • a mature technology

    using least-squares
    • least-squares problems are easy to recognize
    • a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)

    Introduction 1-5
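As an illustrative sketch (not part of the slides), the analytical formula above can be checked numerically with NumPy; the matrix sizes and random data here are arbitrary. The normal-equations solution (A^T A)^{-1} A^T b matches what a factorization-based solver returns.

```python
import numpy as np

# Least-squares: minimize ||Ax - b||_2^2 with A in R^{k x n}
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # k = 20, n = 5 (arbitrary sizes)
b = rng.standard_normal(20)

x_normal_eq = np.linalg.solve(A.T @ A, A.T @ b)   # analytical solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # factorization-based solver

assert np.allclose(x_normal_eq, x_lstsq)
```

In practice the factorization-based solver is preferred over forming A^T A explicitly, for numerical stability.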

    Linear programming

        minimize    c^T x
        subject to  ai^T x ≤ bi,  i = 1, ..., m

    solving linear programs
    • no analytical formula for solution
    • reliable and efficient algorithms and software
    • computation time proportional to n^2 m if m ≥ n; less with structure
    • a mature technology

    using linear programming
    • not as easy to recognize as least-squares problems
    • a few standard tricks used to convert problems into linear programs (e.g., problems involving ℓ1- or ℓ∞-norms, piecewise-linear functions)

    Introduction 1-6
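A minimal sketch of solving an LP in the form above with SciPy's `linprog`; the problem data is made up for illustration (maximize x1 + 2 x2, i.e., minimize its negative).

```python
from scipy.optimize import linprog

# minimize c^T x subject to a_i^T x <= b_i, x >= 0
c = [-1.0, -2.0]                 # maximize x1 + 2 x2
A_ub = [[1.0, 1.0], [1.0, 0.0]]  # x1 + x2 <= 4,  x1 <= 3
b_ub = [4.0, 3.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
assert res.success
```

The optimum sits at a vertex of the feasible polyhedron, here (0, 4) with objective value -8.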

    Convex optimization problem

        minimize    f0(x)
        subject to  fi(x) ≤ bi,  i = 1, ..., m

    • objective and constraint functions are convex:

        fi(αx + βy) ≤ α fi(x) + β fi(y)   if α + β = 1, α ≥ 0, β ≥ 0

    • includes least-squares problems and linear programs as special cases

    Introduction 1-7

    solving convex optimization problems
    • no analytical solution
    • reliable and efficient algorithms
    • computation time (roughly) proportional to max{n^3, n^2 m, F}, where F is cost of evaluating the fi's and their first and second derivatives
    • almost a technology

    using convex optimization
    • often difficult to recognize
    • many tricks for transforming problems into convex form
    • surprisingly many problems can be solved via convex optimization

    Introduction 1-8

    Example

    m lamps illuminating n (small, flat) patches

    [figure: lamp j with power pj illuminates patch k at distance rkj and angle θkj; Ik is the illumination of patch k]

    intensity Ik at patch k depends linearly on lamp powers pj:

        Ik = ∑_{j=1}^m akj pj,   akj = rkj^{-2} max{cos θkj, 0}

    problem: achieve desired illumination Ides with bounded lamp powers

        minimize    max_{k=1,...,n} |log Ik − log Ides|
        subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

    Introduction 1-9

    how to solve?

    1. use uniform power: pj = p, vary p

    2. use least-squares:

        minimize  ∑_{k=1}^n (Ik − Ides)^2

       round pj if pj > pmax or pj < 0

    5. use convex optimization: problem is equivalent to

        minimize    f0(p) = max_{k=1,...,n} h(Ik/Ides)
        subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

    with h(u) = max{u, 1/u}

    [figure: plot of h(u) over 0 ≤ u ≤ 4; h(u) ≥ 1, with minimum h(1) = 1]

    f0 is convex because maximum of convex functions is convex

    exact solution obtained with effort ≈ modest factor × least-squares effort

    Introduction 1-11
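As a quick numerical sanity check (illustrative only: the coefficients `akj` below are made-up positive numbers, not a real lamp geometry), f0 satisfies midpoint convexity on positive power vectors, consistent with the max-of-convex-functions argument above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 20                        # lamps, patches (arbitrary sizes)
akj = rng.uniform(0.1, 1.0, (n, m))  # hypothetical positive coefficients
I_des = 1.0

def f0(p):
    """f0(p) = max_k h(I_k / I_des) with h(u) = max(u, 1/u)."""
    u = (akj @ p) / I_des
    return np.max(np.maximum(u, 1.0 / u))

# midpoint convexity check on random pairs of positive power vectors
for _ in range(100):
    p1 = rng.uniform(0.1, 2.0, m)
    p2 = rng.uniform(0.1, 2.0, m)
    assert f0(0.5 * (p1 + p2)) <= 0.5 * f0(p1) + 0.5 * f0(p2) + 1e-9
```

A failed midpoint check would disprove convexity; passing checks are of course only evidence, not a proof.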

    additional constraints: does adding 1 or 2 below complicate the problem?

    1. no more than half of total power is in any 10 lamps
    2. no more than half of the lamps are on (pj > 0)

    • answer: with (1), still easy to solve; with (2), extremely difficult
    • moral: (untrained) intuition doesn't always work; without the proper background, very easy problems can appear quite similar to very difficult problems

    Introduction 1-12

    Course goals and topics

    goals
    1. recognize/formulate problems (such as the illumination problem) as convex optimization problems
    2. develop code for problems of moderate size (1000 lamps, 5000 patches)
    3. characterize optimal solution (optimal power distribution), give limits of performance, etc.

    topics
    1. convex sets, functions, optimization problems
    2. examples and applications
    3. algorithms

    Introduction 1-13

    Nonlinear optimization

    traditional techniques for general nonconvex problems involve compromises

    local optimization methods (nonlinear programming)
    • find a point that minimizes f0 among feasible points near it
    • fast, can handle large problems
    • require initial guess
    • provide no information about distance to (global) optimum

    global optimization methods
    • find the (global) solution
    • worst-case complexity grows exponentially with problem size

    these algorithms are often based on solving convex subproblems

    Introduction 1-14

    Brief history of convex optimization

    theory (convex analysis): ca. 1900-1970

    algorithms
    • 1947: simplex algorithm for linear programming (Dantzig)
    • 1960s: early interior-point methods (Fiacco & McCormick, Dikin, ...)
    • 1970s: ellipsoid method and other subgradient methods
    • 1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
    • late 1980s-now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov & Nemirovski 1994)

    applications
    • before 1990: mostly in operations research; few in engineering
    • since 1990: many new applications in engineering (control, signal processing, communications, circuit design, ...); new problem classes (semidefinite and second-order cone programming, robust optimization)

    Introduction 1-15

    Convex Optimization    Boyd & Vandenberghe

    2. Convex sets

    • affine and convex sets
    • some important examples
    • operations that preserve convexity
    • generalized inequalities
    • separating and supporting hyperplanes
    • dual cones and generalized inequalities

    2-1

    Affine set

    line through x1, x2: all points

        x = θx1 + (1 − θ)x2   (θ ∈ R)

    [figure: the line through x1 and x2, with points marked at θ = 1.2, 1, 0.6, 0, −0.2]

    affine set: contains the line through any two distinct points in the set

    example: solution set of linear equations {x | Ax = b}
    (conversely, every affine set can be expressed as solution set of system of linear equations)

    Convex sets 2-2

    Convex set

    line segment between x1 and x2: all points

        x = θx1 + (1 − θ)x2

    with 0 ≤ θ ≤ 1

    convex set: contains line segment between any two points in the set

        x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx1 + (1 − θ)x2 ∈ C

    examples (one convex, two nonconvex sets)

    Convex sets 2-3

    Convex combination and convex hull

    convex combination of x1, ..., xk: any point x of the form

        x = θ1 x1 + θ2 x2 + · · · + θk xk

    with θ1 + · · · + θk = 1, θi ≥ 0

    convex hull conv S: set of all convex combinations of points in S

    Convex sets 2-4
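An illustrative sketch of these definitions with SciPy (the point set is made up): the convex hull of the unit-square corners plus an interior point has only the corners as vertices, and a convex combination of the points lands inside the hull.

```python
import numpy as np
from scipy.spatial import ConvexHull

# unit square corners plus one interior point
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(pts)

# interior point (index 4) is not a hull vertex
assert sorted(int(v) for v in hull.vertices) == [0, 1, 2, 3]

# a convex combination (theta_i >= 0, sum = 1) lies in the hull:
# points x inside satisfy A x + b <= 0 for the hull's facet equations
theta = np.array([0.25, 0.25, 0.25, 0.25, 0.0])
x = theta @ pts
A_h, b_h = hull.equations[:, :-1], hull.equations[:, -1]
assert np.all(A_h @ x + b_h <= 1e-12)
```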

    Convex cone

    conic (nonnegative) combination of x1 and x2: any point of the form

        x = θ1 x1 + θ2 x2

    with θ1 ≥ 0, θ2 ≥ 0

    [figure: the cone of conic combinations of x1 and x2, with apex at 0]

    convex cone: set that contains all conic combinations of points in the set

    Convex sets 2-5

    Hyperplanes and halfspaces

    hyperplane: set of the form {x | a^T x = b} (a ≠ 0)

    halfspace: set of the form {x | a^T x ≤ b} (a ≠ 0)

    [figures: a hyperplane a^T x = b through x0 with normal vector a; the two halfspaces a^T x ≥ b and a^T x ≤ b]

    • a is the normal vector
    • hyperplanes are affine and convex; halfspaces are convex

    Convex sets 2-6

    Euclidean balls and ellipsoids

    (Euclidean) ball with center xc and radius r:

        B(xc, r) = {x | ‖x − xc‖2 ≤ r} = {xc + ru | ‖u‖2 ≤ 1}

    ellipsoid: set of the form

        {x | (x − xc)^T P^{-1} (x − xc) ≤ 1}

    with P ∈ S^n_{++} (i.e., P symmetric positive definite)

    other representation: {xc + Au | ‖u‖2 ≤ 1} with A square and nonsingular

    Convex sets 2-7

    Norm balls and norm cones

    norm: a function ‖·‖ that satisfies
    • ‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
    • ‖tx‖ = |t| ‖x‖ for t ∈ R
    • ‖x + y‖ ≤ ‖x‖ + ‖y‖

    notation: ‖·‖ is general (unspecified) norm; ‖·‖_symb is particular norm

    norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}

    norm cone: {(x, t) | ‖x‖ ≤ t}

    Euclidean norm cone is called second-order cone

    [figure: the second-order cone {(x1, x2, t) | ‖(x1, x2)‖2 ≤ t} in R^3]

    norm balls and cones are convex

    Convex sets 2-8

    Polyhedra

    solution set of finitely many linear inequalities and equalities

        Ax ⪯ b,   Cx = d

    (A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

    [figure: a polyhedron P formed by halfspaces with normal vectors a1, ..., a5]

    polyhedron is intersection of finite number of halfspaces and hyperplanes

    Convex sets 2-9

    Positive semidefinite cone

    notation:
    • S^n is set of symmetric n × n matrices
    • S^n_+ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n × n matrices

        X ∈ S^n_+  ⟺  z^T X z ≥ 0 for all z

      S^n_+ is a convex cone
    • S^n_{++} = {X ∈ S^n | X ≻ 0}: positive definite n × n matrices

    example: [ x  y ; y  z ] ∈ S^2_+

    [figure: boundary of the cone S^2_+ plotted in (x, y, z) coordinates]

    Convex sets 2-10
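The membership test X ∈ S^n_+ ⟺ z^T X z ≥ 0 for all z is equivalent to all eigenvalues being nonnegative; a minimal NumPy sketch (the helper name `is_psd` and the example matrices are mine):

```python
import numpy as np

def is_psd(X, tol=1e-10):
    """X in S^n_+  <=>  z^T X z >= 0 for all z  <=>  all eigenvalues >= 0."""
    X = np.asarray(X, dtype=float)
    assert np.allclose(X, X.T)               # must be symmetric
    return bool(np.all(np.linalg.eigvalsh(X) >= -tol))

# the slide's 2x2 pattern [[x, y], [y, z]]
assert is_psd([[1.0, 0.5], [0.5, 1.0]])      # eigenvalues 0.5, 1.5
assert not is_psd([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1, 3
```

`eigvalsh` exploits symmetry and returns real eigenvalues, so the check is exact up to the tolerance.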

    Operations that preserve convexity

    practical methods for establishing convexity of a set C

    1. apply definition

        x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx1 + (1 − θ)x2 ∈ C

    2. show that C is obtained from simple convex sets (hyperplanes, halfspaces, norm balls, ...) by operations that preserve convexity
    • intersection
    • affine functions
    • perspective function
    • linear-fractional functions

    Convex sets 2-11

    Intersection

    the intersection of (any number of) convex sets is convex

    example:

        S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3}

    where p(t) = x1 cos t + x2 cos 2t + · · · + xm cos mt

    [figures for m = 2: the band |p(t)| ≤ 1 over 0 ≤ t ≤ 2π/3, and the resulting convex set S in the (x1, x2)-plane]

    Convex sets 2-12

    Affine function

    suppose f : R^n → R^m is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ R^m)

    • the image of a convex set under f is convex

        S ⊆ R^n convex  ⟹  f(S) = {f(x) | x ∈ S} convex

    • the inverse image f^{-1}(C) of a convex set under f is convex

        C ⊆ R^m convex  ⟹  f^{-1}(C) = {x ∈ R^n | f(x) ∈ C} convex

    examples
    • scaling, translation, projection
    • solution set of linear matrix inequality {x | x1 A1 + · · · + xm Am ⪯ B} (with Ai, B ∈ S^p)
    • hyperbolic cone {x | x^T P x ≤ (c^T x)^2, c^T x ≥ 0} (with P ∈ S^n_+)

    Convex sets 2-13

    Perspective and linear-fractional function

    perspective function P : R^{n+1} → R^n:

        P(x, t) = x/t,   dom P = {(x, t) | t > 0}

    images and inverse images of convex sets under perspective are convex

    linear-fractional function f : R^n → R^m:

        f(x) = (Ax + b) / (c^T x + d),   dom f = {x | c^T x + d > 0}

    images and inverse images of convex sets under linear-fractional functions are convex

    Convex sets 2-14

    example of a linear-fractional function

        f(x) = x / (x1 + x2 + 1)

    [figures: a set C in the (x1, x2)-plane and its image f(C), both shown over −1 ≤ x1, x2 ≤ 1]

    Convex sets 2-15

    Generalized inequalities

    a convex cone K ⊆ R^n is a proper cone if
    • K is closed (contains its boundary)
    • K is solid (has nonempty interior)
    • K is pointed (contains no line)

    examples
    • nonnegative orthant K = R^n_+ = {x ∈ R^n | xi ≥ 0, i = 1, ..., n}
    • positive semidefinite cone K = S^n_+
    • nonnegative polynomials on [0, 1]:

        K = {x ∈ R^n | x1 + x2 t + x3 t^2 + · · · + xn t^{n-1} ≥ 0 for t ∈ [0, 1]}

    Convex sets 2-16

    generalized inequality defined by a proper cone K:

        x ⪯_K y  ⟺  y − x ∈ K,    x ≺_K y  ⟺  y − x ∈ int K

    examples
    • componentwise inequality (K = R^n_+)

        x ⪯_{R^n_+} y  ⟺  xi ≤ yi,  i = 1, ..., n

    • matrix inequality (K = S^n_+)

        X ⪯_{S^n_+} Y  ⟺  Y − X positive semidefinite

    these two types are so common that we drop the subscript in ⪯_K

    properties: many properties of ⪯_K are similar to ≤ on R, e.g.,

        x ⪯_K y, u ⪯_K v  ⟹  x + u ⪯_K y + v

    Convex sets 2-17

    Minimum and minimal elements

    ⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

    x ∈ S is the minimum element of S with respect to ⪯_K if

        y ∈ S  ⟹  x ⪯_K y

    x ∈ S is a minimal element of S with respect to ⪯_K if

        y ∈ S, y ⪯_K x  ⟹  y = x

    example (K = R^2_+): x1 is the minimum element of S1; x2 is a minimal element of S2

    Convex sets 2-18

    Separating hyperplane theorem

    if C and D are nonempty disjoint convex sets, there exist a ≠ 0, b such that

        a^T x ≤ b for x ∈ C,   a^T x ≥ b for x ∈ D

    the hyperplane {x | a^T x = b} separates C and D

    strict separation requires additional assumptions (e.g., C is closed, D is a singleton)

    Convex sets 2-19

    Supporting hyperplane theorem

    supporting hyperplane to set C at boundary point x0:

        {x | a^T x = a^T x0}

    where a ≠ 0 and a^T x ≤ a^T x0 for all x ∈ C

    supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C

    Convex sets 2-20

    Dual cones and generalized inequalities

    dual cone of a cone K:

        K* = {y | y^T x ≥ 0 for all x ∈ K}

    examples
    • K = R^n_+: K* = R^n_+
    • K = S^n_+: K* = S^n_+
    • K = {(x, t) | ‖x‖2 ≤ t}: K* = {(x, t) | ‖x‖2 ≤ t}
    • K = {(x, t) | ‖x‖1 ≤ t}: K* = {(x, t) | ‖x‖∞ ≤ t}

    first three examples are self-dual cones

    dual cones of proper cones are proper, hence define generalized inequalities:

        y ⪰_{K*} 0  ⟺  y^T x ≥ 0 for all x ⪰_K 0

    Convex sets 2-21

    Minimum and minimal elements via dual inequalities

    minimum element w.r.t. ⪯_K

    x is minimum element of S iff for all λ ≻_{K*} 0, x is the unique minimizer of λ^T z over S

    minimal element w.r.t. ⪯_K

    • if x minimizes λ^T z over S for some λ ≻_{K*} 0, then x is minimal
    • if x is a minimal element of a convex set S, then there exists a nonzero λ ⪰_{K*} 0 such that x minimizes λ^T z over S

    Convex sets 2-22

    optimal production frontier

    • different production methods use different amounts of resources x ∈ R^n
    • production set P: resource vectors x for all possible production methods
    • efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. R^n_+

    example (n = 2, resources labor and fuel): x1, x2, x3 are efficient; x4, x5 are not

    Convex sets 2-23

    Convex Optimization    Boyd & Vandenberghe

    3. Convex functions

    • basic properties and examples
    • operations that preserve convexity
    • the conjugate function
    • quasiconvex functions
    • log-concave and log-convex functions
    • convexity with respect to generalized inequalities

    3-1

    Definition

    f : R^n → R is convex if dom f is a convex set and

        f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

    for all x, y ∈ dom f, 0 ≤ θ ≤ 1

    [figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph of f]

    • f is concave if −f is convex
    • f is strictly convex if dom f is convex and

        f(θx + (1 − θ)y) < θf(x) + (1 − θ)f(y)

    for x, y ∈ dom f, x ≠ y, 0 < θ < 1

    Examples on R

    convex:
    • affine: ax + b on R, for any a, b ∈ R
    • exponential: e^{ax}, for any a ∈ R
    • powers: x^α on R_{++}, for α ≥ 1 or α ≤ 0
    • powers of absolute value: |x|^p on R, for p ≥ 1
    • negative entropy: x log x on R_{++}

    concave:
    • affine: ax + b on R, for any a, b ∈ R
    • powers: x^α on R_{++}, for 0 ≤ α ≤ 1
    • logarithm: log x on R_{++}

    Convex functions 3-3

    Examples on R^n and R^{m×n}

    affine functions are convex and concave; all norms are convex

    examples on R^n
    • affine function f(x) = a^T x + b
    • norms: ‖x‖p = (∑_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|

    examples on R^{m×n} (m × n matrices)
    • affine function

        f(X) = tr(A^T X) + b = ∑_{i=1}^m ∑_{j=1}^n Aij Xij + b

    • spectral (maximum singular value) norm

        f(X) = ‖X‖2 = σmax(X) = (λmax(X^T X))^{1/2}

    Convex functions 3-4

    Restriction of a convex function to a line

    f : R^n → R is convex if and only if the function g : R → R,

        g(t) = f(x + tv),   dom g = {t | x + tv ∈ dom f}

    is convex (in t) for any x ∈ dom f, v ∈ R^n

    can check convexity of f by checking convexity of functions of one variable

    example. f : S^n → R with f(X) = log det X, dom f = S^n_{++}

        g(t) = log det(X + tV) = log det X + log det(I + t X^{-1/2} V X^{-1/2})
             = log det X + ∑_{i=1}^n log(1 + t λi)

    where λi are the eigenvalues of X^{-1/2} V X^{-1/2}

    g is concave in t (for any choice of X ≻ 0, V); hence f is concave

    Convex functions 3-5

    Extended-value extension

    extended-value extension f̃ of f is

        f̃(x) = f(x), x ∈ dom f,    f̃(x) = ∞, x ∉ dom f

    often simplifies notation; for example, the condition

        0 ≤ θ ≤ 1  ⟹  f̃(θx + (1 − θ)y) ≤ θf̃(x) + (1 − θ)f̃(y)

    (as an inequality in R ∪ {∞}) means the same as the two conditions
    • dom f is convex
    • for x, y ∈ dom f,

        0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

    Convex functions 3-6

    First-order condition

    f is differentiable if dom f is open and the gradient

        ∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, ..., ∂f(x)/∂xn)

    exists at each x ∈ dom f

    1st-order condition: differentiable f with convex domain is convex iff

        f(y) ≥ f(x) + ∇f(x)^T (y − x)   for all x, y ∈ dom f

    [figure: the graph of f lies above the tangent f(x) + ∇f(x)^T (y − x)]

    first-order approximation of f is global underestimator

    Convex functions 3-7
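The global-underestimator property is easy to see numerically. An illustrative check (my choice of function) for f(x) = e^x, whose tangent at any point stays below the graph:

```python
import numpy as np

# f(y) >= f(x) + f'(x)(y - x) for convex f; here f = exp, so f' = exp
xs = np.linspace(-3.0, 3.0, 61)
for x in xs:
    tangent = np.exp(x) + np.exp(x) * (xs - x)   # first-order approx. at x
    assert np.all(np.exp(xs) >= tangent - 1e-12)
```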

    Second-order conditions

    f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

        ∇²f(x)ij = ∂²f(x) / ∂xi ∂xj,   i, j = 1, ..., n,

    exists at each x ∈ dom f

    2nd-order conditions: for twice differentiable f with convex domain
    • f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
    • if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex

    Convex functions 3-8

    Examples

    quadratic function: f(x) = (1/2) x^T P x + q^T x + r (with P ∈ S^n)

        ∇f(x) = P x + q,   ∇²f(x) = P

    convex if P ⪰ 0

    least-squares objective: f(x) = ‖Ax − b‖2^2

        ∇f(x) = 2 A^T (Ax − b),   ∇²f(x) = 2 A^T A

    convex (for any A)

    quadratic-over-linear: f(x, y) = x²/y

        ∇²f(x, y) = (2/y³) (y, −x)(y, −x)^T ⪰ 0

    convex for y > 0

    Convex functions 3-9

    log-sum-exp: f(x) = log ∑_{k=1}^n exp xk is convex

        ∇²f(x) = (1 / 1^T z) diag(z) − (1 / (1^T z)²) z z^T   (zk = exp xk)

    to show ∇²f(x) ⪰ 0, we must verify that v^T ∇²f(x) v ≥ 0 for all v:

        v^T ∇²f(x) v = [ (∑_k zk vk²)(∑_k zk) − (∑_k vk zk)² ] / (∑_k zk)² ≥ 0

    since (∑_k vk zk)² ≤ (∑_k zk vk²)(∑_k zk) (from Cauchy-Schwarz inequality)

    geometric mean: f(x) = (∏_{k=1}^n xk)^{1/n} on R^n_{++} is concave
    (similar proof as for log-sum-exp)

    Convex functions 3-10
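The Hessian formula above can be sketched directly (the helper name `lse_hessian` is mine; the random test points are arbitrary) and checked to be positive semidefinite, matching the Cauchy-Schwarz argument:

```python
import numpy as np

def lse_hessian(x):
    """Hessian of f(x) = log sum_k exp x_k:
       (1/1^T z) diag(z) - (1/(1^T z)^2) z z^T, with z_k = exp x_k."""
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

rng = np.random.default_rng(2)
x = rng.standard_normal(6)
H = lse_hessian(x)
for _ in range(50):
    v = rng.standard_normal(6)
    assert v @ H @ v >= -1e-12     # v^T (grad^2 f) v >= 0
```

A side observation: H annihilates the all-ones vector, reflecting that f(x + t·1) = f(x) + t is affine along that direction.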

    Epigraph and sublevel set

    α-sublevel set of f : R^n → R:

        Cα = {x ∈ dom f | f(x) ≤ α}

    sublevel sets of convex functions are convex (converse is false)

    epigraph of f : R^n → R:

        epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

    f is convex if and only if epi f is a convex set

    Convex functions 3-11

    Jensen's inequality

    basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,

        f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

    extension: if f is convex, then

        f(E z) ≤ E f(z)

    for any random variable z

    basic inequality is special case with discrete distribution

        prob(z = x) = θ,   prob(z = y) = 1 − θ

    Convex functions 3-12
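An illustrative numerical instance (my choice of f and distribution): for the convex function f(u) = u², Jensen's inequality f(E z) ≤ E f(z) is exactly the statement that variance is nonnegative.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(10000)
# f(E z) <= E f(z) with f(u) = u^2
assert z.mean() ** 2 <= (z ** 2).mean() + 1e-12

# two-point version: prob(z = x) = theta, prob(z = y) = 1 - theta
theta, x, y = 0.3, -1.0, 2.0
f = lambda u: u * u
assert f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y)
```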

    Operations that preserve convexity

    practical methods for establishing convexity of a function

    1. verify definition (often simplified by restricting to a line)
    2. for twice differentiable functions, show ∇²f(x) ⪰ 0
    3. show that f is obtained from simple convex functions by operations that preserve convexity
    • nonnegative weighted sum
    • composition with affine function
    • pointwise maximum and supremum
    • composition
    • minimization
    • perspective

    Convex functions 3-13

    Pointwise maximum

    if f1, ..., fm are convex, then f(x) = max{f1(x), ..., fm(x)} is convex

    examples
    • piecewise-linear function: f(x) = max_{i=1,...,m} (ai^T x + bi) is convex
    • sum of r largest components of x ∈ R^n:

        f(x) = x[1] + x[2] + · · · + x[r]

      is convex (x[i] is ith largest component of x)

      proof: f(x) = max{xi1 + xi2 + · · · + xir | 1 ≤ i1 < i2 < · · · < ir ≤ n}

    Convex functions 3-15

    Pointwise supremum

    if f(x, y) is convex in x for each y ∈ A, then

        g(x) = sup_{y∈A} f(x, y)

    is convex

    examples
    • support function of a set C: SC(x) = sup_{y∈C} y^T x is convex
    • distance to farthest point in a set C: f(x) = sup_{y∈C} ‖x − y‖
    • maximum eigenvalue of symmetric matrix: for X ∈ S^n,

        λmax(X) = sup_{‖y‖2=1} y^T X y

    Convex functions 3-16

    Composition with scalar functions

    composition of g : R^n → R and h : R → R:

        f(x) = h(g(x))

    f is convex if
    • g convex, h convex, h̃ nondecreasing
    • g concave, h convex, h̃ nonincreasing

    proof (for n = 1, differentiable g, h)

        f''(x) = h''(g(x)) g'(x)² + h'(g(x)) g''(x)

    note: monotonicity must hold for extended-value extension h̃

    examples
    • exp g(x) is convex if g is convex
    • 1/g(x) is convex if g is concave and positive

    Convex functions 3-17

    Vector composition

    composition of g : R^n → R^k and h : R^k → R:

        f(x) = h(g(x)) = h(g1(x), g2(x), ..., gk(x))

    f is convex if
    • gi convex, h convex, h̃ nondecreasing in each argument
    • gi concave, h convex, h̃ nonincreasing in each argument

    proof (for n = 1, differentiable g, h)

        f''(x) = g'(x)^T ∇²h(g(x)) g'(x) + ∇h(g(x))^T g''(x)

    examples
    • ∑_{i=1}^m log gi(x) is concave if gi are concave and positive
    • log ∑_{i=1}^m exp gi(x) is convex if gi are convex

    Convex functions 3-18

    Minimization

    if f(x, y) is convex in (x, y) and C is a convex set, then

        g(x) = inf_{y∈C} f(x, y)

    is convex

    examples
    • f(x, y) = x^T A x + 2 x^T B y + y^T C y with

        [ A  B ; B^T  C ] ⪰ 0,   C ≻ 0

      minimizing over y gives g(x) = inf_y f(x, y) = x^T (A − B C^{-1} B^T) x

      g is convex, hence Schur complement A − B C^{-1} B^T ⪰ 0

    • distance to a set: dist(x, S) = inf_{y∈S} ‖x − y‖ is convex if S is convex

    Convex functions 3-19
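The Schur-complement example can be sketched numerically; the block matrices below are made-up data chosen so that the big matrix is PSD with C ≻ 0, not taken from the slides.

```python
import numpy as np

# Partial minimization of f(x, y) = x^T A x + 2 x^T B y + y^T C y over y
# gives g(x) = x^T (A - B C^{-1} B^T) x.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0]])
M = np.block([[A, B], [B.T, C]])          # [A B; B^T C]
S = A - B @ np.linalg.inv(C) @ B.T        # Schur complement of C in M

assert np.all(np.linalg.eigvalsh(M) >= -1e-10)   # [A B; B^T C] >= 0 ...
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)   # ... hence S >= 0
```

Minimizing f over y by hand (y* = -C^{-1} B^T x) reproduces x^T S x, which is the identity the slide uses.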

    Perspective

    the perspective of a function f : R^n → R is the function g : R^n × R → R,

        g(x, t) = t f(x/t),   dom g = {(x, t) | x/t ∈ dom f, t > 0}

    g is convex if f is convex

    examples
    • f(x) = x^T x is convex; hence g(x, t) = x^T x / t is convex for t > 0
    • negative logarithm f(x) = − log x is convex; hence relative entropy g(x, t) = t log t − t log x is convex on R²_{++}
    • if f is convex, then

        g(x) = (c^T x + d) f( (Ax + b) / (c^T x + d) )

      is convex on {x | c^T x + d > 0, (Ax + b)/(c^T x + d) ∈ dom f}

    Convex functions 3-20

    The conjugate function

    the conjugate of a function f is

        f*(y) = sup_{x∈dom f} (y^T x − f(x))

    [figure: f*(y) is the maximum gap between the linear function xy and f(x); the supporting line of slope y meets the vertical axis at (0, −f*(y))]

    • f* is convex (even if f is not)
    • will be useful in chapter 5

    Convex functions 3-21
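The supremum in the definition can be approximated on a grid; an illustrative check (my choice of function) for f(x) = x², whose conjugate is f*(y) = y²/4 since the sup of yx − x² is attained at x = y/2:

```python
import numpy as np

def conj_numeric(y, xs):
    """Grid approximation of f*(y) = sup_x (y x - f(x)) for f(x) = x^2."""
    return np.max(y * xs - xs ** 2)

xs = np.linspace(-50.0, 50.0, 200001)
for y in [-3.0, -1.0, 0.0, 2.0, 5.0]:
    assert abs(conj_numeric(y, xs) - y ** 2 / 4) < 1e-6
```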

    examples

    • negative logarithm f(x) = − log x

        f*(y) = sup_{x>0} (xy + log x)
              = −1 − log(−y)  if y < 0,  ∞ otherwise

    Quasiconvex functions

    f : R^n → R is quasiconvex if dom f is convex and the sublevel sets

        Sα = {x ∈ dom f | f(x) ≤ α}

    are convex for all α

    [figure: a quasiconvex function on R; for the level shown the sublevel set is the interval [a, b], and for a lower level it is (−∞, c]]

    • f is quasiconcave if −f is quasiconvex
    • f is quasilinear if it is quasiconvex and quasiconcave

    Convex functions 3-23

    Examples

    • √|x| is quasiconvex on R
    • ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
    • log x is quasilinear on R_{++}
    • f(x1, x2) = x1 x2 is quasiconcave on R²_{++}
    • linear-fractional function

        f(x) = (a^T x + b) / (c^T x + d),   dom f = {x | c^T x + d > 0}

      is quasilinear
    • distance ratio

        f(x) = ‖x − a‖2 / ‖x − b‖2,   dom f = {x | ‖x − a‖2 ≤ ‖x − b‖2}

      is quasiconvex

    Convex functions 3-24

    internal rate of return

    • cash flow x = (x0, ..., xn); xi is payment in period i (to us if xi > 0)
    • we assume x0 < 0
    • present value of cash flow x, for interest rate r:

        PV(x, r) = ∑_{i=0}^n (1 + r)^{-i} xi

    • internal rate of return is smallest interest rate for which PV(x, r) = 0:

        IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}

    IRR is quasiconcave: superlevel set is intersection of open halfspaces

        IRR(x) ≥ R  ⟺  ∑_{i=0}^n (1 + r)^{-i} xi > 0 for 0 ≤ r < R

    Convex functions 3-25
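A minimal sketch of computing IRR by bisection on PV (function names and the example cash flows are mine; this assumes x0 < 0 and xi ≥ 0 for i ≥ 1, so PV is decreasing in r and the root is unique):

```python
def present_value(x, r):
    """PV(x, r) = sum_i (1 + r)^{-i} x_i."""
    return sum(xi / (1.0 + r) ** i for i, xi in enumerate(x))

def irr_bisect(x, lo=0.0, hi=10.0, iters=100):
    """Root of PV(x, r) = 0 by bisection, assuming PV is decreasing in r
       with PV(x, lo) > 0 > PV(x, hi)."""
    assert present_value(x, lo) > 0 > present_value(x, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if present_value(x, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# pay 1 now, receive 1.21 two periods later: IRR = 10%
assert abs(irr_bisect([-1.0, 0.0, 1.21]) - 0.10) < 1e-9
```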

    Properties

    modified Jensen inequality: for quasiconvex f

        0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

    first-order condition: differentiable f with convex domain is quasiconvex iff

        f(y) ≤ f(x)  ⟹  ∇f(x)^T (y − x) ≤ 0

    sums of quasiconvex functions are not necessarily quasiconvex

    Convex functions 3-26

    Log-concave and log-convex functions

    a positive function f is log-concave if log f is concave:

        f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}   for 0 ≤ θ ≤ 1

    f is log-convex if log f is convex

    • powers: x^a on R_{++} is log-convex for a ≤ 0, log-concave for a ≥ 0
    • many common probability densities are log-concave, e.g., normal:

        f(x) = (1 / √((2π)^n det Σ)) e^{−(1/2)(x − x̄)^T Σ^{-1} (x − x̄)}

    • cumulative Gaussian distribution function Φ is log-concave

        Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−u²/2} du

    Convex functions 3-27

    Properties of log-concave functions

    • twice differentiable f with convex domain is log-concave if and only if

        f(x) ∇²f(x) ⪯ ∇f(x) ∇f(x)^T

      for all x ∈ dom f
    • product of log-concave functions is log-concave
    • sum of log-concave functions is not always log-concave
    • integration: if f : R^n × R^m → R is log-concave, then

        g(x) = ∫ f(x, y) dy

      is log-concave (not easy to show)

    Convex functions 3-28

    consequences of integration property

    • convolution f ∗ g of log-concave functions f, g is log-concave

        (f ∗ g)(x) = ∫ f(x − y) g(y) dy

    • if C ⊆ R^n convex and y is a random variable with log-concave pdf then

        f(x) = prob(x + y ∈ C)

      is log-concave

    proof: write f(x) as integral of product of log-concave functions

        f(x) = ∫ g(x + y) p(y) dy,   g(u) = 1 if u ∈ C, 0 otherwise

    where p is pdf of y

    Convex functions 3-29

    example: yield function

        Y(x) = prob(x + w ∈ S)

    • x ∈ R^n: nominal parameter values for product
    • w ∈ R^n: random variations of parameters in manufactured product
    • S: set of acceptable values

    if S is convex and w has a log-concave pdf, then
    • Y is log-concave
    • yield regions {x | Y(x) ≥ α} are convex

    Convex functions 3-30

    Convexity with respect to generalized inequalities

    f : R^n → R^m is K-convex if dom f is convex and

        f(θx + (1 − θ)y) ⪯_K θf(x) + (1 − θ)f(y)

    for x, y ∈ dom f, 0 ≤ θ ≤ 1

    example f : S^m → S^m, f(X) = X² is S^m_+-convex

    proof: for fixed z ∈ R^m, z^T X² z = ‖Xz‖2² is convex in X, i.e.,

        z^T (θX + (1 − θ)Y)² z ≤ θ z^T X² z + (1 − θ) z^T Y² z

    for X, Y ∈ S^m, 0 ≤ θ ≤ 1

    therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²

    Convex functions 3-31

    Convex Optimization    Boyd & Vandenberghe

    4. Convex optimization problems

    • optimization problem in standard form
    • convex optimization problems
    • quasiconvex optimization
    • linear optimization
    • quadratic optimization
    • geometric programming
    • generalized inequality constraints
    • semidefinite programming
    • vector optimization

    4-1

    Optimization problem in standard form

        minimize    f0(x)
        subject to  fi(x) ≤ 0,  i = 1, ..., m
                    hi(x) = 0,  i = 1, ..., p

    • x ∈ R^n is the optimization variable
    • f0 : R^n → R is the objective or cost function
    • fi : R^n → R, i = 1, ..., m, are the inequality constraint functions
    • hi : R^n → R are the equality constraint functions

    optimal value:

        p* = inf{f0(x) | fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p}

    • p* = ∞ if problem is infeasible (no x satisfies the constraints)
    • p* = −∞ if problem is unbounded below

    Convex optimization problems 4-2

    Optimal and locally optimal points

    x is feasible if x ∈ dom f0 and it satisfies the constraints

    a feasible x is optimal if f0(x) = p*; Xopt is the set of optimal points

    x is locally optimal if there is an R > 0 such that x is optimal for

        minimize (over z)  f0(z)
        subject to  fi(z) ≤ 0, i = 1, ..., m,  hi(z) = 0, i = 1, ..., p
                    ‖z − x‖2 ≤ R

    examples (with n = 1, m = p = 0)
    • f0(x) = 1/x, dom f0 = R_{++}: p* = 0, no optimal point
    • f0(x) = − log x, dom f0 = R_{++}: p* = −∞
    • f0(x) = x log x, dom f0 = R_{++}: p* = −1/e, x = 1/e is optimal
    • f0(x) = x³ − 3x: p* = −∞, local optimum at x = 1

    Convex optimization problems 4-3

    Implicit constraints

    the standard form optimization problem has an implicit constraint

        x ∈ D = ∩_{i=0}^m dom fi ∩ ∩_{i=1}^p dom hi

    • we call D the domain of the problem
    • the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
    • a problem is unconstrained if it has no explicit constraints (m = p = 0)

    example:

        minimize  f0(x) = − ∑_{i=1}^k log(bi − ai^T x)

    is an unconstrained problem with implicit constraints ai^T x < bi

    Convex optimization problems 4-4

Feasibility problem

    find        x
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                hi(x) = 0,  i = 1, . . . , p

can be considered a special case of the general problem with f0(x) = 0:

    minimize    0
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                hi(x) = 0,  i = 1, . . . , p

• p⋆ = 0 if constraints are feasible; any feasible x is optimal
• p⋆ = ∞ if constraints are infeasible

Convex optimization problems 4–5

Convex optimization problem

standard form convex optimization problem

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                aiᵀx = bi,  i = 1, . . . , p

• f0, f1, . . . , fm are convex; equality constraints are affine
• problem is quasiconvex if f0 is quasiconvex (and f1, . . . , fm convex)

often written as

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                Ax = b

important property: feasible set of a convex optimization problem is convex

Convex optimization problems 4–6

example

    minimize    f0(x) = x1² + x2²
    subject to  f1(x) = x1/(1 + x2²) ≤ 0
                h1(x) = (x1 + x2)² = 0

• f0 is convex; feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex
• not a convex problem (according to our definition): f1 is not convex, h1 is not affine

equivalent (but not identical) to the convex problem

    minimize    x1² + x2²
    subject to  x1 ≤ 0
                x1 + x2 = 0

Convex optimization problems 4–7

Local and global optima

any locally optimal point of a convex problem is (globally) optimal

proof: suppose x is locally optimal and there exists a feasible y with f0(y) < f0(x)

x locally optimal means there is an R > 0 such that

    z feasible, ‖z − x‖2 ≤ R  ⟹  f0(z) ≥ f0(x)

consider z = θy + (1 − θ)x with θ = R/(2‖y − x‖2)

• ‖y − x‖2 > R, so 0 < θ < 1/2
• z is a convex combination of two feasible points, hence also feasible
• ‖z − x‖2 = R/2 and

    f0(z) ≤ θf0(y) + (1 − θ)f0(x) < f0(x)

which contradicts our assumption that x is locally optimal

Convex optimization problems 4–8

Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and

    ∇f0(x)ᵀ(y − x) ≥ 0  for all feasible y

(figure: feasible set X, optimal point x, gradient −∇f0(x))

if nonzero, ∇f0(x) defines a supporting hyperplane to feasible set X at x

Convex optimization problems 4–9

• unconstrained problem: x is optimal if and only if

    x ∈ dom f0,  ∇f0(x) = 0

• equality constrained problem

    minimize f0(x)  subject to Ax = b

x is optimal if and only if there exists a ν such that

    x ∈ dom f0,  Ax = b,  ∇f0(x) + Aᵀν = 0

• minimization over nonnegative orthant

    minimize f0(x)  subject to x ⪰ 0

x is optimal if and only if

    x ∈ dom f0,  x ⪰ 0,  ∇f0(x)i ≥ 0 if xi = 0,  ∇f0(x)i = 0 if xi > 0

Convex optimization problems 4–10

Equivalent convex problems

two problems are (informally) equivalent if the solution of one is readily obtained from the solution of the other, and vice-versa

some common transformations that preserve convexity:

• eliminating equality constraints

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                Ax = b

is equivalent to

    minimize (over z)  f0(F z + x0)
    subject to         fi(F z + x0) ≤ 0,  i = 1, . . . , m

where F and x0 are such that

    Ax = b  ⟺  x = F z + x0 for some z

Convex optimization problems 4–11

• introducing equality constraints

    minimize    f0(A0x + b0)
    subject to  fi(Aix + bi) ≤ 0,  i = 1, . . . , m

is equivalent to

    minimize (over x, yi)  f0(y0)
    subject to             fi(yi) ≤ 0,  i = 1, . . . , m
                           yi = Aix + bi,  i = 0, 1, . . . , m

• introducing slack variables for linear inequalities

    minimize    f0(x)
    subject to  aiᵀx ≤ bi,  i = 1, . . . , m

is equivalent to

    minimize (over x, s)  f0(x)
    subject to            aiᵀx + si = bi,  i = 1, . . . , m
                          si ≥ 0,  i = 1, . . . , m

Convex optimization problems 4–12

• epigraph form: standard form convex problem is equivalent to

    minimize (over x, t)  t
    subject to            f0(x) − t ≤ 0
                          fi(x) ≤ 0,  i = 1, . . . , m
                          Ax = b

• minimizing over some variables

    minimize    f0(x1, x2)
    subject to  fi(x1) ≤ 0,  i = 1, . . . , m

is equivalent to

    minimize    f̃0(x1)
    subject to  fi(x1) ≤ 0,  i = 1, . . . , m

where f̃0(x1) = infx2 f0(x1, x2)

Convex optimization problems 4–13

Quasiconvex optimization

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                Ax = b

with f0 : Rⁿ → R quasiconvex, f1, . . . , fm convex

can have locally optimal points that are not (globally) optimal

(figure: quasiconvex function with a locally optimal point (x, f0(x)) that is not globally optimal)

Convex optimization problems 4–14

quasiconvex optimization via convex feasibility problems

    φt(x) ≤ 0,  fi(x) ≤ 0,  i = 1, . . . , m,  Ax = b    (1)

(here φt is a family of convex functions representing the t-sublevel sets of f0: f0(x) ≤ t ⟺ φt(x) ≤ 0)

• for fixed t, a convex feasibility problem in x
• if feasible, we can conclude that t ≥ p⋆; if infeasible, t ≤ p⋆

Convex optimization problems 4–15

Bisection method for quasiconvex optimization

    given l ≤ p⋆, u ≥ p⋆, tolerance ε > 0.
    repeat
        1. t := (l + u)/2.
        2. Solve the convex feasibility problem (1).
        3. if (1) is feasible, u := t; else l := t.
    until u − l ≤ ε.

requires exactly ⌈log2((u − l)/ε)⌉ iterations (where u, l are initial values)

Convex optimization problems 4–16
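The bisection loop above can be sketched on a toy quasiconvex instance (the problem here is made up for illustration): minimize f0(x) = √|x − 2| subject to x ≥ 3. f0 is quasiconvex, its t-sublevel set is the interval [2 − t², 2 + t²], so the feasibility problem (1) for fixed t just asks whether that interval meets [3, ∞); the optimal value is p⋆ = f0(3) = 1.

```python
def feasible(t):
    # convex feasibility problem (1) for fixed t:
    # does [2 - t^2, 2 + t^2] intersect [3, inf)?
    return 2 + t * t >= 3

l, u, eps = 0.0, 4.0, 1e-6      # initial l <= p* <= u
while u - l > eps:
    t = (l + u) / 2
    if feasible(t):
        u = t                    # t >= p*
    else:
        l = t                    # t <= p*

print(u)   # ≈ 1.0
```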

Linear program (LP)

    minimize    cᵀx + d
    subject to  Gx ⪯ h
                Ax = b

• convex problem with affine objective and constraint functions
• feasible set is a polyhedron

(figure: polyhedron P with objective direction −c and optimal vertex x⋆)

Convex optimization problems 4–17

Examples

diet problem: choose quantities x1, . . . , xn of n foods

• one unit of food j costs cj, contains amount aij of nutrient i
• healthy diet requires nutrient i in quantity at least bi

to find cheapest healthy diet,

    minimize    cᵀx
    subject to  Ax ⪰ b,  x ⪰ 0

piecewise-linear minimization

    minimize    maxi=1,...,m (aiᵀx + bi)

equivalent to an LP

    minimize    t
    subject to  aiᵀx + bi ≤ t,  i = 1, . . . , m

Convex optimization problems 4–18
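The piecewise-linear reformulation above can be solved with any LP solver; a sketch with `scipy.optimize.linprog` (the data a_i, b_i are random, and box bounds on x are added purely to guarantee the LP is bounded):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# variables (x, t): minimize t  subject to  a_i^T x - t <= -b_i
c = np.r_[np.zeros(n), 1.0]
G = np.c_[A, -np.ones(m)]
res = linprog(c, A_ub=G, b_ub=-b,
              bounds=[(-10, 10)] * n + [(None, None)])

x, t = res.x[:n], res.x[-1]
print(t, np.max(A @ x + b))   # at the optimum, t equals max_i (a_i^T x + b_i)
```

Note that `linprog` defaults to bounds (0, ∞) on every variable, so the bounds must be passed explicitly here.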

Chebyshev center of a polyhedron

Chebyshev center of

    P = {x | aiᵀx ≤ bi, i = 1, . . . , m}

is center of largest inscribed ball

    B = {xc + u | ‖u‖2 ≤ r}

• aiᵀx ≤ bi for all x ∈ B if and only if

    sup{aiᵀ(xc + u) | ‖u‖2 ≤ r} = aiᵀxc + r‖ai‖2 ≤ bi

• hence, xc, r can be determined by solving the LP

    maximize    r
    subject to  aiᵀxc + r‖ai‖2 ≤ bi,  i = 1, . . . , m

Convex optimization problems 4–19
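The Chebyshev-center LP above, on a toy instance (the square [0, 2]², where the center should be (1, 1) with radius 1), again via `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

# P = {x | a_i^T x <= b_i}: the square 0 <= x1, x2 <= 2
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 2.0, 2.0])
norms = np.linalg.norm(A, axis=1)

# variables (xc, r): maximize r  <=>  minimize -r
c = np.r_[np.zeros(2), -1.0]
res = linprog(c, A_ub=np.c_[A, norms], b_ub=b,
              bounds=[(None, None), (None, None), (0, None)])
xc, r = res.x[:2], res.x[2]
print(xc, r)   # ≈ [1. 1.] 1.0
```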

Linear-fractional program

    minimize    f0(x)
    subject to  Gx ⪯ h
                Ax = b

linear-fractional program

    f0(x) = (cᵀx + d)/(eᵀx + f),   dom f0(x) = {x | eᵀx + f > 0}

• a quasiconvex optimization problem; can be solved by bisection
• also equivalent to the LP (variables y, z)

    minimize    cᵀy + dz
    subject to  Gy ⪯ hz
                Ay = bz
                eᵀy + f z = 1
                z ≥ 0

Convex optimization problems 4–20

generalized linear-fractional program

    f0(x) = maxi=1,...,r (ciᵀx + di)/(eiᵀx + fi),   dom f0(x) = {x | eiᵀx + fi > 0, i = 1, . . . , r}

a quasiconvex optimization problem; can be solved by bisection

example: Von Neumann model of a growing economy

    maximize (over x, x⁺)  mini=1,...,n xi⁺/xi
    subject to             x⁺ ⪰ 0,  Bx⁺ ⪯ Ax

• x, x⁺ ∈ Rⁿ: activity levels of n sectors, in current and next period
• (Ax)i, (Bx⁺)i: produced, resp. consumed, amounts of good i
• xi⁺/xi: growth rate of sector i

allocate activity to maximize growth rate of slowest growing sector

Convex optimization problems 4–21

Quadratic program (QP)

    minimize    (1/2)xᵀP x + qᵀx + r
    subject to  Gx ⪯ h
                Ax = b

• P ∈ Sn+, so objective is convex quadratic
• minimize a convex quadratic function over a polyhedron

(figure: polyhedron P with contour lines of f0 and optimizer x⋆)

Convex optimization problems 4–22

Examples

least-squares

    minimize ‖Ax − b‖2²

• analytical solution x⋆ = A†b (A† is pseudo-inverse)
• can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost

    minimize    c̄ᵀx + γxᵀΣx = E cᵀx + γ var(cᵀx)
    subject to  Gx ⪯ h,  Ax = b

• c is random vector with mean c̄ and covariance Σ
• hence, cᵀx is random variable with mean c̄ᵀx and variance xᵀΣx
• γ > 0 is risk aversion parameter; controls the trade-off between expected cost and variance (risk)

Convex optimization problems 4–23
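The analytical least-squares solution x⋆ = A†b can be checked against a generic least-squares routine (random data standing in for A, b):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)

x_pinv = np.linalg.pinv(A) @ b                      # x* = A^† b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # generic LS solver

print(np.allclose(x_pinv, x_lstsq))                 # True
# optimality: the residual is orthogonal to the columns of A
print(np.max(np.abs(A.T @ (A @ x_pinv - b))))       # ≈ 0
```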

Quadratically constrained quadratic program (QCQP)

    minimize    (1/2)xᵀP0x + q0ᵀx + r0
    subject to  (1/2)xᵀPix + qiᵀx + ri ≤ 0,  i = 1, . . . , m
                Ax = b

• Pi ∈ Sn+; objective and constraints are convex quadratic
• if P1, . . . , Pm ∈ Sn++, feasible region is intersection of m ellipsoids and an affine set

Convex optimization problems 4–24

Second-order cone programming

    minimize    fᵀx
    subject to  ‖Aix + bi‖2 ≤ ciᵀx + di,  i = 1, . . . , m
                F x = g

(Ai ∈ R^{ni×n}, F ∈ R^{p×n})

• inequalities are called second-order cone (SOC) constraints:

    (Aix + bi, ciᵀx + di) ∈ second-order cone in R^{ni+1}

• for ni = 0, reduces to an LP; if ci = 0, reduces to a QCQP
• more general than QCQP and LP

Convex optimization problems 4–25

Robust linear programming

the parameters in optimization problems are often uncertain, e.g., in an LP

    minimize    cᵀx
    subject to  aiᵀx ≤ bi,  i = 1, . . . , m,

there can be uncertainty in c, ai, bi

two common approaches to handling uncertainty (in ai, for simplicity)

• deterministic model: constraints must hold for all ai ∈ Ei

    minimize    cᵀx
    subject to  aiᵀx ≤ bi for all ai ∈ Ei,  i = 1, . . . , m,

• stochastic model: ai is random variable; constraints must hold with probability η

    minimize    cᵀx
    subject to  prob(aiᵀx ≤ bi) ≥ η,  i = 1, . . . , m

Convex optimization problems 4–26

deterministic approach via SOCP

• choose an ellipsoid as Ei:

    Ei = {āi + Piu | ‖u‖2 ≤ 1}   (āi ∈ Rⁿ, Pi ∈ R^{n×n})

center is āi, semi-axes determined by singular values/vectors of Pi

• robust LP

    minimize    cᵀx
    subject to  aiᵀx ≤ bi  ∀ai ∈ Ei,  i = 1, . . . , m

is equivalent to the SOCP

    minimize    cᵀx
    subject to  āiᵀx + ‖Piᵀx‖2 ≤ bi,  i = 1, . . . , m

(follows from sup‖u‖2≤1 (āi + Piu)ᵀx = āiᵀx + ‖Piᵀx‖2)

Convex optimization problems 4–27

stochastic approach via SOCP

• assume ai is Gaussian with mean āi, covariance Σi (ai ∼ N(āi, Σi))
• aiᵀx is Gaussian r.v. with mean āiᵀx, variance xᵀΣix; hence

    prob(aiᵀx ≤ bi) = Φ((bi − āiᵀx)/‖Σi^{1/2}x‖2)

where Φ(x) = (1/√(2π)) ∫−∞..x e^{−t²/2} dt is CDF of N(0, 1)

• robust LP

    minimize    cᵀx
    subject to  prob(aiᵀx ≤ bi) ≥ η,  i = 1, . . . , m,

with η ≥ 1/2, is equivalent to the SOCP

    minimize    cᵀx
    subject to  āiᵀx + Φ⁻¹(η)‖Σi^{1/2}x‖2 ≤ bi,  i = 1, . . . , m

Convex optimization problems 4–28
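The Gaussian identity behind this reformulation, prob(aᵀx ≤ b) = Φ((b − āᵀx)/‖Σ^{1/2}x‖2), can be spot-checked by Monte Carlo (all data here are made up for illustration):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 3
abar = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Sigma = L @ L.T                      # a positive definite covariance
x = rng.standard_normal(n)
b = abar @ x + 1.0

sigma = math.sqrt(x @ Sigma @ x)     # std. dev. of a^T x
# Phi via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
phi = 0.5 * (1 + math.erf((b - abar @ x) / (sigma * math.sqrt(2))))

a = rng.multivariate_normal(abar, Sigma, size=200_000)
mc = np.mean(a @ x <= b)
print(phi, mc)                       # agree to roughly 3 digits
```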

Geometric programming

monomial function

    f(x) = c x1^{a1} x2^{a2} · · · xn^{an},   dom f = R++ⁿ

with c > 0; exponent ai can be any real number

posynomial function: sum of monomials

    f(x) = ∑k=1..K ck x1^{a1k} x2^{a2k} · · · xn^{ank},   dom f = R++ⁿ

geometric program (GP)

    minimize    f0(x)
    subject to  fi(x) ≤ 1,  i = 1, . . . , m
                hi(x) = 1,  i = 1, . . . , p

with fi posynomial, hi monomial

Convex optimization problems 4–29

Geometric program in convex form

change variables to yi = log xi, and take logarithm of cost, constraints

• monomial f(x) = c x1^{a1} · · · xn^{an} transforms to

    log f(e^{y1}, . . . , e^{yn}) = aᵀy + b   (b = log c)

• posynomial f(x) = ∑k=1..K ck x1^{a1k} x2^{a2k} · · · xn^{ank} transforms to

    log f(e^{y1}, . . . , e^{yn}) = log( ∑k=1..K e^{akᵀy + bk} )   (bk = log ck)

• geometric program transforms to convex problem

    minimize    log ∑k=1..K exp(a0kᵀy + b0k)
    subject to  log ∑k=1..K exp(aikᵀy + bik) ≤ 0,  i = 1, . . . , m
                Gy + d = 0

Convex optimization problems 4–30
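The change of variables can be verified numerically (random exponents, coefficients, and evaluation point): a monomial becomes affine in y = log x, a posynomial becomes log-sum-exp of affine functions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 4, 3
a = rng.standard_normal((K, n))      # exponent vectors a_k
c = rng.uniform(0.5, 2.0, K)         # positive coefficients c_k
x = rng.uniform(0.1, 3.0, n)         # a point in R++^n
y = np.log(x)

# monomial: log(c_1 * prod x_i^{a_1i}) = a_1^T y + log c_1
f_mono = c[0] * np.prod(x ** a[0])
print(np.log(f_mono), a[0] @ y + np.log(c[0]))                    # equal

# posynomial: log f = log sum_k exp(a_k^T y + log c_k)
f_posy = np.sum(c * np.prod(x ** a, axis=1))
print(np.log(f_posy), np.log(np.sum(np.exp(a @ y + np.log(c)))))  # equal
```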

Design of cantilever beam

(figure: cantilever beam with segments 4, 3, 2, 1 and vertical force F applied at the right end)

• N segments with unit lengths, rectangular cross-sections of size wi × hi
• given vertical force F applied at the right end

design problem

    minimize    total weight
    subject to  upper & lower bounds on wi, hi
                upper & lower bounds on aspect ratios hi/wi
                upper bound on stress in each segment
                upper bound on vertical deflection at the end of the beam

variables: wi, hi for i = 1, . . . , N

Convex optimization problems 4–31

objective and constraint functions

• total weight w1h1 + · · · + wNhN is posynomial
• aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials
• maximum stress in segment i is given by 6iF/(wihi²), a monomial
• the vertical deflection yi and slope vi of central axis at the right end of segment i are defined recursively as

    vi = 12(i − 1/2) F/(E wi hi³) + vi+1
    yi = 6(i − 1/3) F/(E wi hi³) + vi+1 + yi+1

for i = N, N − 1, . . . , 1, with vN+1 = yN+1 = 0 (E is Young's modulus)

vi and yi are posynomial functions of w, h

Convex optimization problems 4–32

formulation as a GP

    minimize    w1h1 + · · · + wNhN
    subject to  wmax⁻¹ wi ≤ 1,  wmin wi⁻¹ ≤ 1,  i = 1, . . . , N
                hmax⁻¹ hi ≤ 1,  hmin hi⁻¹ ≤ 1,  i = 1, . . . , N
                Smax⁻¹ wi⁻¹ hi ≤ 1,  Smin wi hi⁻¹ ≤ 1,  i = 1, . . . , N
                6iF σmax⁻¹ wi⁻¹ hi⁻² ≤ 1,  i = 1, . . . , N
                ymax⁻¹ y1 ≤ 1

note

• we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax as

    wmin/wi ≤ 1,  wi/wmax ≤ 1,  hmin/hi ≤ 1,  hi/hmax ≤ 1

• we write Smin ≤ hi/wi ≤ Smax as

    Smin wi/hi ≤ 1,  hi/(wiSmax) ≤ 1

Convex optimization problems 4–33

Minimizing spectral radius of nonnegative matrix

Perron-Frobenius eigenvalue λpf(A)

• exists for (elementwise) positive A ∈ R^{n×n}
• a real, positive eigenvalue of A, equal to spectral radius maxi |λi(A)|
• determines asymptotic growth (decay) rate of Aᵏ: Aᵏ ∼ λpfᵏ as k → ∞
• alternative characterization: λpf(A) = inf{λ | Av ⪯ λv for some v ≻ 0}

minimizing spectral radius of matrix of posynomials

• minimize λpf(A(x)), where the elements A(x)ij are posynomials of x
• equivalent geometric program:

    minimize    λ
    subject to  ∑j=1..n A(x)ij vj/(λvi) ≤ 1,  i = 1, . . . , n

variables λ, v, x

Convex optimization problems 4–34
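For a fixed positive matrix, λpf equals the spectral radius; a power-iteration sketch (random positive matrix, compared against numpy's eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, (5, 5))    # elementwise positive

# power iteration converges to the Perron eigenvector of a positive matrix
v = np.ones(5)
for _ in range(200):
    v = A @ v
    v /= np.linalg.norm(v)
lam_pf = v @ A @ v / (v @ v)         # Rayleigh quotient at convergence

rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius
print(lam_pf, rho)                   # the two agree
```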

Generalized inequality constraints

convex problem with generalized inequality constraints

    minimize    f0(x)
    subject to  fi(x) ⪯Ki 0,  i = 1, . . . , m
                Ax = b

• f0 : Rⁿ → R convex; fi : Rⁿ → R^{ki} Ki-convex w.r.t. proper cone Ki
• same properties as standard convex problem (convex feasible set, local optimum is global, etc.)

conic form problem: special case with affine objective and constraints

    minimize    cᵀx
    subject to  F x + g ⪯K 0
                Ax = b

extends linear programming (K = R+ᵐ) to nonpolyhedral cones

Convex optimization problems 4–35

Semidefinite program (SDP)

    minimize    cᵀx
    subject to  x1F1 + x2F2 + · · · + xnFn + G ⪯ 0
                Ax = b

with Fi, G ∈ Sᵏ

• inequality constraint is called linear matrix inequality (LMI)
• includes problems with multiple LMI constraints: for example,

    x1F̂1 + · · · + xnF̂n + Ĝ ⪯ 0,    x1F̃1 + · · · + xnF̃n + G̃ ⪯ 0

is equivalent to single LMI

    x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + · · · + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0

Convex optimization problems 4–36

LP and SOCP as SDP

LP and equivalent SDP

    LP:   minimize    cᵀx         SDP:  minimize    cᵀx
          subject to  Ax ⪯ b            subject to  diag(Ax − b) ⪯ 0

(note different interpretation of generalized inequality ⪯)

SOCP and equivalent SDP

    SOCP: minimize    fᵀx
          subject to  ‖Aix + bi‖2 ≤ ciᵀx + di,  i = 1, . . . , m

    SDP:  minimize    fᵀx
          subject to  [ (ciᵀx + di)I   Aix + bi ;  (Aix + bi)ᵀ   ciᵀx + di ] ⪰ 0,  i = 1, . . . , m

Convex optimization problems 4–37

Eigenvalue minimization

    minimize λmax(A(x))

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ Sᵏ)

equivalent SDP

    minimize    t
    subject to  A(x) ⪯ tI

• variables x ∈ Rⁿ, t ∈ R
• follows from

    λmax(A) ≤ t  ⟺  A ⪯ tI

Convex optimization problems 4–38

Matrix norm minimization

    minimize ‖A(x)‖2 = (λmax(A(x)ᵀA(x)))^{1/2}

where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ R^{p×q})

equivalent SDP

    minimize    t
    subject to  [ tI   A(x) ;  A(x)ᵀ   tI ] ⪰ 0

• variables x ∈ Rⁿ, t ∈ R
• constraint follows from

    ‖A‖2 ≤ t  ⟺  AᵀA ⪯ t²I, t ≥ 0
              ⟺  [ tI  A ;  Aᵀ  tI ] ⪰ 0

Convex optimization problems 4–39
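Both equivalences can be checked numerically for a fixed matrix (random data): the spectral norm equals √λmax(AᵀA), and the block matrix [tI A; Aᵀ tI] is PSD exactly when t ≥ ‖A‖2:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))

nrm = np.linalg.norm(A, 2)                          # spectral norm
lam = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt(lambda_max(A^T A))
print(nrm, lam)                                     # equal

def block(t):
    p, q = A.shape
    return np.block([[t * np.eye(p), A], [A.T, t * np.eye(q)]])

min_eig = lambda M: np.min(np.linalg.eigvalsh(M))
# PSD for t slightly above ||A||_2, not PSD slightly below
print(min_eig(block(nrm + 0.1)) >= 0, min_eig(block(nrm - 0.1)) >= 0)  # True False
```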

Vector optimization

general vector optimization problem

    minimize (w.r.t. K)  f0(x)
    subject to           fi(x) ≤ 0,  i = 1, . . . , m
                         hi(x) = 0,  i = 1, . . . , p

vector objective f0 : Rⁿ → R^q, minimized w.r.t. proper cone K ⊆ R^q

convex vector optimization problem

    minimize (w.r.t. K)  f0(x)
    subject to           fi(x) ≤ 0,  i = 1, . . . , m
                         Ax = b

with f0 K-convex, f1, . . . , fm convex

Convex optimization problems 4–40

Optimal and Pareto optimal points

set of achievable objective values

    O = {f0(x) | x feasible}

• feasible x is optimal if f0(x) is the minimum value of O
• feasible x is Pareto optimal if f0(x) is a minimal value of O

(figures: a set O with minimum value f0(x⋆), so x⋆ is optimal; a set O with minimal value f0(xpo), so xpo is Pareto optimal)

Convex optimization problems 4–41

Regularized least-squares

    minimize (w.r.t. R+²)  (‖Ax − b‖2², ‖x‖2²)

(figure: trade-off curve of F1(x) = ‖Ax − b‖2² versus F2(x) = ‖x‖2², with achievable set O)

example for A ∈ R^{100×10}; heavy line is formed by Pareto optimal points

Convex optimization problems 4–43

Risk–return trade-off in portfolio optimization

    minimize (w.r.t. R+²)  (−p̄ᵀx, xᵀΣx)
    subject to             1ᵀx = 1,  x ⪰ 0

• x ∈ Rⁿ is investment portfolio; xi is fraction invested in asset i
• p ∈ Rⁿ is vector of relative asset price changes; modeled as a random variable with mean p̄, covariance Σ
• p̄ᵀx = E r is expected return; xᵀΣx = var r is return variance

example

(figures: mean return versus standard deviation of return (0%–20%), and allocations x(1), . . . , x(4) versus standard deviation of return)

Convex optimization problems 4–44

Scalarization

to find Pareto optimal points: choose λ ≻K⋆ 0 and solve scalar problem

    minimize    λᵀf0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                hi(x) = 0,  i = 1, . . . , p

if x is optimal for scalar problem, then it is Pareto-optimal for vector optimization problem

(figure: set O with Pareto optimal points f0(x1), f0(x3) picked out by supporting hyperplanes with normals λ1, λ2)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ≻K⋆ 0

Convex optimization problems 4–45

Scalarization for multicriterion problems

to find Pareto optimal points, minimize positive weighted sum

    λᵀf0(x) = λ1F1(x) + · · · + λqFq(x)

examples

• regularized least-squares problem of page 4–43

take λ = (1, γ) with γ > 0

    minimize ‖Ax − b‖2² + γ‖x‖2²

for fixed γ, a LS problem

(figure: trade-off curve of ‖Ax − b‖2² versus ‖x‖2², with the point for γ = 1 marked)

Convex optimization problems 4–46
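The scalarized problem has the closed-form solution x(γ) = (AᵀA + γI)⁻¹Aᵀb, so the trade-off curve can be traced by sweeping γ (random data standing in for the A ∈ R^{100×10} of the slides); along the curve the residual grows while ‖x‖² shrinks:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)

gammas = np.logspace(-3, 3, 25)
F1, F2 = [], []
for g in gammas:
    # normal equations of  minimize ||Ax - b||^2 + g ||x||^2
    x = np.linalg.solve(A.T @ A + g * np.eye(10), A.T @ b)
    F1.append(np.sum((A @ x - b) ** 2))
    F2.append(np.sum(x ** 2))

# each point is Pareto optimal: F1 nondecreasing, F2 nonincreasing in gamma
print(all(np.diff(F1) >= -1e-9), all(np.diff(F2) <= 1e-9))   # True True
```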

• risk–return trade-off of page 4–44

    minimize    −p̄ᵀx + γxᵀΣx
    subject to  1ᵀx = 1,  x ⪰ 0

for fixed γ > 0, a quadratic program

Convex optimization problems 4–47

Convex Optimization — Boyd & Vandenberghe

5. Duality

Lagrange dual problem

weak and strong duality

geometric interpretation

optimality conditions

perturbation and sensitivity analysis

examples

generalized inequalities

5–1

Lagrangian

standard form problem (not necessarily convex)

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                hi(x) = 0,  i = 1, . . . , p

variable x ∈ Rⁿ, domain D, optimal value p⋆

Lagrangian: L : Rⁿ × Rᵐ × Rᵖ → R, with dom L = D × Rᵐ × Rᵖ,

    L(x, λ, ν) = f0(x) + ∑i=1..m λifi(x) + ∑i=1..p νihi(x)

• weighted sum of objective and constraint functions
• λi is Lagrange multiplier associated with fi(x) ≤ 0
• νi is Lagrange multiplier associated with hi(x) = 0

Duality 5–2

Lagrange dual function

Lagrange dual function: g : Rᵐ × Rᵖ → R,

    g(λ, ν) = inf x∈D L(x, λ, ν)
            = inf x∈D ( f0(x) + ∑i=1..m λifi(x) + ∑i=1..p νihi(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p⋆

proof: if x̃ is feasible and λ ⪰ 0, then

    f0(x̃) ≥ L(x̃, λ, ν) ≥ inf x∈D L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ, ν)

Duality 5–3

Least-norm solution of linear equations

    minimize    xᵀx
    subject to  Ax = b

dual function

• Lagrangian is L(x, ν) = xᵀx + νᵀ(Ax − b)
• to minimize L over x, set gradient equal to zero:

    ∇xL(x, ν) = 2x + Aᵀν = 0   ⟹   x = −(1/2)Aᵀν

• plug in in L to obtain g:

    g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4)νᵀAAᵀν − bᵀν

a concave function of ν

lower bound property: p⋆ ≥ −(1/4)νᵀAAᵀν − bᵀν for all ν

Duality 5–4
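The lower bound can be checked numerically (random data): for the least-norm problem p⋆ = ‖A†b‖², g(ν) ≤ p⋆ for every ν, and the bound is tight at the dual optimum ν⋆ = −2(AAᵀ)⁻¹b:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 6))      # fat matrix: Ax = b underdetermined
b = rng.standard_normal(3)

x_star = np.linalg.pinv(A) @ b       # least-norm solution
p_star = x_star @ x_star

g = lambda nu: -0.25 * nu @ (A @ A.T) @ nu - b @ nu

nus = rng.standard_normal((1000, 3))
print(all(g(nu) <= p_star + 1e-9 for nu in nus))   # True (weak duality)

nu_star = -2 * np.linalg.solve(A @ A.T, b)         # maximizer of g
print(g(nu_star), p_star)                          # equal (strong duality)
```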

Standard form LP

    minimize    cᵀx
    subject to  Ax = b,  x ⪰ 0

dual function

• Lagrangian is

    L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
               = −bᵀν + (c + Aᵀν − λ)ᵀx

• L is affine in x, hence

    g(λ, ν) = inf x L(x, λ, ν) = { −bᵀν   if Aᵀν − λ + c = 0
                                   −∞     otherwise

g is linear on affine domain {(λ, ν) | Aᵀν − λ + c = 0}, hence concave

lower bound property: p⋆ ≥ −bᵀν if Aᵀν + c ⪰ 0

Duality 5–5

Equality constrained norm minimization

    minimize    ‖x‖
    subject to  Ax = b

dual function

    g(ν) = inf x ( ‖x‖ − νᵀAx + bᵀν ) = { bᵀν   if ‖Aᵀν‖∗ ≤ 1
                                          −∞    otherwise

where ‖v‖∗ = sup‖u‖≤1 uᵀv is dual norm of ‖ · ‖

proof: follows from inf x (‖x‖ − yᵀx) = 0 if ‖y‖∗ ≤ 1, −∞ otherwise

• if ‖y‖∗ ≤ 1, then ‖x‖ − yᵀx ≥ 0 for all x, with equality if x = 0
• if ‖y‖∗ > 1, choose x = tu where ‖u‖ ≤ 1, uᵀy = ‖y‖∗ > 1:

    ‖x‖ − yᵀx = t(‖u‖ − ‖y‖∗) → −∞   as t → ∞

lower bound property: p⋆ ≥ bᵀν if ‖Aᵀν‖∗ ≤ 1

Duality 5–6

Two-way partitioning

    minimize    xᵀW x
    subject to  xi² = 1,  i = 1, . . . , n

• a nonconvex problem; feasible set contains 2ⁿ discrete points
• interpretation: partition {1, . . . , n} in two sets; Wij is cost of assigning i, j to the same set; −Wij is cost of assigning to different sets

dual function

    g(ν) = inf x ( xᵀW x + ∑i νi(xi² − 1) ) = inf x xᵀ(W + diag(ν))x − 1ᵀν

         = { −1ᵀν   if W + diag(ν) ⪰ 0
             −∞     otherwise

lower bound property: p⋆ ≥ −1ᵀν if W + diag(ν) ⪰ 0

example: ν = −λmin(W)1 gives bound p⋆ ≥ nλmin(W)

Duality 5–7
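The bound p⋆ ≥ nλmin(W) can be verified by brute force on a small random symmetric W (2ⁿ feasible points is cheap for n = 8):

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                    # symmetric cost matrix

# enumerate all 2^n points x in {-1, +1}^n
p_star = min(np.array(x) @ W @ np.array(x)
             for x in itertools.product([-1.0, 1.0], repeat=n))

bound = n * np.min(np.linalg.eigvalsh(W))
print(bound <= p_star + 1e-9)        # True: dual bound holds
```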

Lagrange dual and conjugate function

    minimize    f0(x)
    subject to  Ax ⪯ b,  Cx = d

dual function

    g(λ, ν) = inf x∈dom f0 ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
            = −f0∗(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν

• recall definition of conjugate f∗(y) = sup x∈dom f (yᵀx − f(x))
• simplifies derivation of dual if conjugate of f0 is known

example: entropy maximization

    f0(x) = ∑i=1..n xi log xi,    f0∗(y) = ∑i=1..n e^{yi−1}

Duality 5–8

The dual problem

Lagrange dual problem

    maximize    g(λ, ν)
    subject to  λ ⪰ 0

• finds best lower bound on p⋆, obtained from Lagrange dual function
• a convex optimization problem; optimal value denoted d⋆
• λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
• often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 5–5)

    minimize    cᵀx           maximize    −bᵀν
    subject to  Ax = b        subject to  Aᵀν + c ⪰ 0
                x ⪰ 0

Duality 5–9


Slater's constraint qualification

strong duality holds for a convex problem

    minimize    f0(x)
    subject to  fi(x) ≤ 0,  i = 1, . . . , m
                Ax = b

if it is strictly feasible, i.e.,

    ∃x ∈ int D :  fi(x) < 0,  i = 1, . . . , m,  Ax = b

• also guarantees that the dual optimum is attained (if p⋆ > −∞)
• can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
• there exist many other types of constraint qualifications

Duality 5–11

Inequality form LP

primal problem

    minimize    cᵀx
    subject to  Ax ⪯ b

dual function

    g(λ) = inf x ( (c + Aᵀλ)ᵀx − bᵀλ ) = { −bᵀλ   if Aᵀλ + c = 0
                                           −∞     otherwise

dual problem

    maximize    −bᵀλ
    subject to  Aᵀλ + c = 0,  λ ⪰ 0

• from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
• in fact, p⋆ = d⋆ except when primal and dual are infeasible

Duality 5–12

Quadratic program

primal problem (assume P ∈ Sn++)

    minimize    xᵀP x
    subject to  Ax ⪯ b

dual function

    g(λ) = inf x ( xᵀP x + λᵀ(Ax − b) ) = −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem

    maximize    −(1/4)λᵀAP⁻¹Aᵀλ − bᵀλ
    subject to  λ ⪰ 0

• from Slater's condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
• in fact, p⋆ = d⋆ always

Duality 5–13

A nonconvex problem with strong duality

    minimize    xᵀAx + 2bᵀx
    subject to  xᵀx ≤ 1

A ⋡ 0, hence nonconvex

dual function: g(λ) = inf x ( xᵀ(A + λI)x + 2bᵀx − λ )

• unbounded below if A + λI ⋡ 0, or if A + λI ⪰ 0 and b ∉ R(A + λI)
• minimized by x = −(A + λI)†b otherwise: g(λ) = −bᵀ(A + λI)†b − λ

dual problem and equivalent SDP:

    maximize    −bᵀ(A + λI)†b − λ        maximize    −t − λ
    subject to  A + λI ⪰ 0               subject to  [ A + λI   b ;  bᵀ   t ] ⪰ 0
                b ∈ R(A + λI)

strong duality although primal problem is not convex (not easy to show)

Duality 5–14

Geometric interpretation

for simplicity, consider problem with one constraint f1(x) ≤ 0

interpretation of dual function:

    g(λ) = inf (u,t)∈G (t + λu),   where G = {(f1(x), f0(x)) | x ∈ D}

(figures: set G with p⋆, g(λ), and the line λu + t = g(λ); set G showing p⋆ and d⋆)

• λu + t = g(λ) is (non-vertical) supporting hyperplane to G
• hyperplane intersects t-axis at t = g(λ)

Duality 5–15

epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}

(figure: set A with p⋆, g(λ), and the supporting line λu + t = g(λ))

strong duality

• holds if there is a non-vertical supporting hyperplane to A at (0, p⋆)
• for convex problem, A is convex, hence has supp. hyperplane at (0, p⋆)
• Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p⋆) must be non-vertical

Duality 5–16

Complementary slackness

assume strong duality holds, x⋆ is primal optimal, (λ⋆, ν⋆) is dual optimal

    f0(x⋆) = g(λ⋆, ν⋆) = inf x ( f0(x) + ∑i=1..m λi⋆fi(x) + ∑i=1..p νi⋆hi(x) )
           ≤ f0(x⋆) + ∑i=1..m λi⋆fi(x⋆) + ∑i=1..p νi⋆hi(x⋆)
           ≤ f0(x⋆)

hence, the two inequalities hold with equality

• x⋆ minimizes L(x, λ⋆, ν⋆)
• λi⋆fi(x⋆) = 0 for i = 1, . . . , m (known as complementary slackness):

    λi⋆ > 0 ⟹ fi(x⋆) = 0,    fi(x⋆) < 0 ⟹ λi⋆ = 0

Duality 5–17

Karush–Kuhn–Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable fi, hi):

1. primal constraints: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λifi(x) = 0, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:

    ∇f0(x) + ∑i=1..m λi∇fi(x) + ∑i=1..p νi∇hi(x) = 0

from page 5–17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions

Duality 5–18

KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:

• from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
• from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied:

x is optimal if and only if there exist λ, ν that satisfy KKT conditions

• recall that Slater implies strong duality, and dual optimum is attained
• generalizes optimality condition ∇f0(x) = 0 for unconstrained problem

Duality 5–19

example: water-filling (assume αi > 0)

    minimize    −∑i=1..n log(xi + αi)
    subject to  x ⪰ 0,  1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ Rⁿ, ν ∈ R such that

    λ ⪰ 0,  λixi = 0,  1/(xi + αi) + λi = ν

• if ν < 1/αi: λi = 0 and xi = 1/ν − αi
• if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
• determine ν from 1ᵀx = ∑i=1..n max{0, 1/ν − αi} = 1

interpretation: n patches; level of patch i is at height αi; flood area with unit amount of water, resulting level is 1/ν⋆

Duality 5–20
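The KKT conditions above yield a direct algorithm: bisect on the water level 1/ν until the total amount of water equals one (the αi below are made up for illustration):

```python
import numpy as np

alpha = np.array([0.3, 0.8, 1.5, 2.0])   # patch heights (assumed data)

def total(level):                        # water used at surface height `level`
    return np.sum(np.maximum(0.0, level - alpha))

lo, hi = alpha.min(), alpha.min() + 1.0  # total(lo) = 0 <= 1 <= total(hi)
for _ in range(100):
    mid = (lo + hi) / 2
    if total(mid) < 1.0:
        lo = mid
    else:
        hi = mid

x = np.maximum(0.0, hi - alpha)          # KKT solution
print(x.sum())                           # ≈ 1.0
# stationarity: 1/(x_i + alpha_i) = nu for every active (x_i > 0) patch
print(1.0 / (x + alpha))
```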

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

    min.  f0(x)                           max.  g(λ, ν)
    s.t.  fi(x) ≤ 0,  i = 1, . . . , m    s.t.  λ ⪰ 0
          hi(x) = 0,  i = 1, . . . , p

perturbed problem and its dual

    min.  f0(x)                           max.  g(λ, ν) − uᵀλ − vᵀν
    s.t.  fi(x) ≤ ui,  i = 1, . . . , m   s.t.  λ ⪰ 0
          hi(x) = vi,  i = 1, . . . , p

• x is primal variable; u, v are parameters
• p⋆(u, v) is optimal value as a function of u, v
• we are interested in information about p⋆(u, v) that we can obtain from the solution of the unperturbed problem and its dual

Duality 5–21

global sensitivity result

assume strong duality holds for unperturbed problem, and that λ⋆, ν⋆ are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

    p⋆(u, v) ≥ g(λ⋆, ν⋆) − uᵀλ⋆ − vᵀν⋆
             = p⋆(0, 0) − uᵀλ⋆ − vᵀν⋆

sensitivity interpretation

• if λi⋆ large: p⋆ increases greatly if we tighten constraint i (ui < 0)
• if λi⋆ small: p⋆ does not decrease much if we loosen constraint i (ui > 0)
• if νi⋆ large and positive: p⋆ increases greatly if we take vi < 0; if νi⋆ large and negative: p⋆ increases greatly if we take vi > 0
• if νi⋆ small and positive: p⋆ does not decrease much if we take vi > 0; if νi⋆ small and negative: p⋆ does not decrease much if we take vi < 0

Duality 5–22

local sensitivity: if (in addition) p⋆(u, v) is differentiable at (0, 0), then

    λi⋆ = −∂p⋆(0, 0)/∂ui,    νi⋆ = −∂p⋆(0, 0)/∂vi

proof (for λi⋆): from global sensitivity result,

    ∂p⋆(0, 0)/∂ui = lim t↘0 (p⋆(tei, 0) − p⋆(0, 0))/t ≥ −λi⋆
    ∂p⋆(0, 0)/∂ui = lim t↗0 (p⋆(tei, 0) − p⋆(0, 0))/t ≤ −λi⋆

hence, equality

(figure: p⋆(u) for a problem with one (inequality) constraint, with the tangent line p⋆(0) − λ⋆u at u = 0)

Duality 5–23

Duality and problem reformulations

• equivalent formulations of a problem can lead to very different duals
• reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

• introduce new variables and equality constraints
• make explicit constraints implicit or vice-versa
• transform objective or constraint functions

e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing

Duality 5–24

Introducing new variables and equality constraints

    minimize f0(Ax + b)

• dual function is constant: g = inf x L(x) = inf x f0(Ax + b) = p⋆
• we have strong duality, but dual is quite useless

reformulated problem and its dual

    minimize    f0(y)                 maximize    bᵀν − f0∗(ν)
    subject to  Ax + b − y = 0        subject to  Aᵀν = 0

dual function follows from

    g(ν) = inf x,y ( f0(y) − νᵀy + νᵀAx + bᵀν )
         = { −f0∗(ν) + bᵀν   if Aᵀν = 0
             −∞              otherwise

Duality 5–25

norm approximation problem: minimize ‖Ax − b‖

    minimize    ‖y‖
    subject to  y = Ax − b

can look up conjugate of ‖ · ‖, or derive dual directly

    g(ν) = inf x,y ( ‖y‖ + νᵀy − νᵀAx + bᵀν )
         = { bᵀν + inf y (‖y‖ + νᵀy)   if Aᵀν = 0
             −∞                        otherwise
         = { bᵀν   if Aᵀν = 0, ‖ν‖∗ ≤ 1
             −∞    otherwise

(see page 5–4)

dual of norm approximation problem

    maximize    bᵀν
    subject to  Aᵀν = 0,  ‖ν‖∗ ≤ 1

Duality 5–26

Implicit constraints

LP with box constraints: primal and dual problem

    minimize    cᵀx               maximize    −bᵀν − 1ᵀλ1 − 1ᵀλ2
    subject to  Ax = b            subject to  c + Aᵀν + λ1 − λ2 = 0
                −1 ⪯ x ⪯ 1                    λ1 ⪰ 0,  λ2 ⪰ 0

reformulation with box constraints made implicit

    minimize    f0(x) = { cᵀx   if −1 ⪯ x ⪯ 1
                          ∞     otherwise
    subject to  Ax = b

dual function

    g(ν) = inf −1⪯x⪯1 ( cᵀx + νᵀ(Ax − b) )
         = −bᵀν − ‖Aᵀν + c‖1

dual problem: maximize −bᵀν − ‖Aᵀν + c‖1

Duality 5–27

Problems with generalized inequalities

minimize   f0(x)
subject to fi(x) ⪯_{Ki} 0,  i = 1, . . . , m
           hi(x) = 0,  i = 1, . . . , p

⪯_{Ki} is generalized inequality on R^{ki}

definitions are parallel to scalar case:

Lagrange multiplier for fi(x) ⪯_{Ki} 0 is vector λi ∈ R^{ki}
Lagrangian L : Rⁿ × R^{k1} × ⋯ × R^{km} × R^p → R, is defined as

L(x, λ1, . . . , λm, ν) = f0(x) + Σ_{i=1}^{m} λiᵀ fi(x) + Σ_{i=1}^{p} νi hi(x)

dual function g : R^{k1} × ⋯ × R^{km} × R^p → R, is defined as

g(λ1, . . . , λm, ν) = inf_{x∈D} L(x, λ1, . . . , λm, ν)

Duality 5–28

lower bound property: if λi ⪰_{Ki∗} 0, then g(λ1, . . . , λm, ν) ≤ p⋆

proof: if x̃ is feasible and λi ⪰_{Ki∗} 0, then

f0(x̃) ≥ f0(x̃) + Σ_{i=1}^{m} λiᵀ fi(x̃) + Σ_{i=1}^{p} νi hi(x̃)
      ≥ inf_{x∈D} L(x, λ1, . . . , λm, ν)
      = g(λ1, . . . , λm, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ1, . . . , λm, ν)

dual problem

maximize   g(λ1, . . . , λm, ν)
subject to λi ⪰_{Ki∗} 0,  i = 1, . . . , m

weak duality: p⋆ ≥ d⋆ always
strong duality: p⋆ = d⋆ for convex problem with constraint qualification
(for example, Slater's: primal problem is strictly feasible)

Duality 5–29


Convex Optimization — Boyd & Vandenberghe

6. Approximation and fitting

norm approximation
least-norm problems
regularized approximation
robust approximation

6–1

Norm approximation

minimize ‖Ax − b‖

(A ∈ R^{m×n} with m ≥ n, ‖ · ‖ is a norm on R^m)

interpretations of solution x⋆ = argmin_x ‖Ax − b‖:

geometric: Ax⋆ is point in R(A) closest to b

estimation: linear measurement model y = Ax + v; y are measurements, x is unknown, v is measurement error; given y = b, best guess of x is x⋆

optimal design: x are design variables (input), Ax is result (output); x⋆ is design that best approximates desired result b

Approximation and fitting 6–2

examples

least-squares approximation (‖ · ‖₂): solution x⋆ satisfies normal equations

AᵀAx = Aᵀb

(x⋆ = (AᵀA)⁻¹Aᵀb if rank A = n)

Chebyshev approximation (‖ · ‖∞): can be solved as an LP

minimize   t
subject to −t1 ⪯ Ax − b ⪯ t1

sum of absolute residuals approximation (‖ · ‖₁): can be solved as an LP

minimize   1ᵀy
subject to −y ⪯ Ax − b ⪯ y

Approximation and fitting 6–3
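the Chebyshev LP above can be checked numerically; a minimal sketch using scipy.optimize.linprog, where the random A, b and the stacked variable z = (x, t) are illustrative choices, not part of the slides:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 100, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# variables z = (x, t); minimize t subject to -t*1 <= Ax - b <= t*1
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[A, -np.ones((m, 1))],
                 [-A, -np.ones((m, 1))]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)])
x_cheb, t_opt = res.x[:n], res.x[n]

# at the optimum, t equals the l-infinity norm of the residual
r_inf = np.max(np.abs(A @ x_cheb - b))
```

the ℓ₁ version is the same construction with one slack variable per residual instead of a single t.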

example (m = 100, n = 30): histogram of residuals for penalties

φ(u) = |u|,   φ(u) = u²,   φ(u) = max{0, |u| − a},   φ(u) = −log(1 − u²)

[figure: four histograms of the residual distribution, for the p = 1, p = 2, deadzone-linear, and log-barrier penalties]

shape of penalty function has large effect on distribution of residuals

Approximation and fitting 6–5

Huber penalty function (with parameter M)

φ_hub(u) = { u²,             |u| ≤ M
           { M(2|u| − M),    |u| > M

linear growth for large u makes approximation less sensitive to outliers

[figure — left: Huber penalty φ_hub(u) for M = 1; right: affine function f(t) = α + βt fitted to 42 points (ti, yi) (circles) using quadratic (dashed) and Huber (solid) penalty]

Approximation and fitting 6–6
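the piecewise definition above translates directly into code; a small numpy sketch (the function name huber is mine):

```python
import numpy as np

def huber(u, M=1.0):
    """Huber penalty: quadratic for |u| <= M, linear growth beyond M."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= M, u**2, M * (2.0 * np.abs(u) - M))
```

the two branches agree at |u| = M (both give M²), so the penalty is continuous, and in fact continuously differentiable.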

Least-norm problems

minimize   ‖x‖
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ‖ · ‖ is a norm on Rⁿ)

interpretations of solution x⋆ = argmin_{Ax=b} ‖x‖:

geometric: x⋆ is point in affine set {x | Ax = b} with minimum distance to 0

estimation: b = Ax are (perfect) measurements of x; x⋆ is smallest ('most plausible') estimate consistent with measurements

design: x are design variables (inputs); b are required results (outputs); x⋆ is smallest ('most efficient') design that satisfies requirements

Approximation and fitting 6–7

examples

least-squares solution of linear equations (‖ · ‖₂): can be solved via optimality conditions

2x + Aᵀν = 0,  Ax = b

minimum sum of absolute values (‖ · ‖₁): can be solved as an LP

minimize   1ᵀy
subject to −y ⪯ x ⪯ y,  Ax = b

tends to produce sparse solution x⋆

extension: least-penalty problem

minimize   φ(x1) + ⋯ + φ(xn)
subject to Ax = b

φ : R → R is convex penalty function

Approximation and fitting 6–8
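for the ℓ₂ case, the optimality conditions above can be solved in closed form, giving x⋆ = Aᵀ(AAᵀ)⁻¹b; a minimal numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 30                      # wide A: underdetermined equations
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# optimality conditions 2x + A^T nu = 0, Ax = b give
# nu = -2 (A A^T)^{-1} b  and  x = A^T (A A^T)^{-1} b
nu = -2.0 * np.linalg.solve(A @ A.T, b)
x = -0.5 * A.T @ nu
```

this is exactly the pseudoinverse solution pinv(A) @ b when A has full row rank.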

Regularized approximation

minimize (w.r.t. R²₊)  ( ‖Ax − b‖, ‖x‖ )

A ∈ R^{m×n}, norms on R^m and Rⁿ can be different

interpretation: find good approximation Ax ≈ b with small x

estimation: linear measurement model y = Ax + v, with prior knowledge that ‖x‖ is small

optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x

robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x

Approximation and fitting 6–9

Scalarized problem

minimize ‖Ax − b‖ + γ‖x‖

solution for γ > 0 traces out optimal trade-off curve
other common method: minimize ‖Ax − b‖² + δ‖x‖² with δ > 0

Tikhonov regularization

minimize ‖Ax − b‖₂² + δ‖x‖₂²

can be solved as a least-squares problem

minimize ‖ [A; √δ I] x − [b; 0] ‖₂²

solution x⋆ = (AᵀA + δI)⁻¹Aᵀb

Approximation and fitting 6–10
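the two formulations above are the same problem; a quick numpy sketch checking the closed-form solution against the stacked least-squares version (data and δ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, delta = 50, 20, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# closed form x = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

# stacked least-squares: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
A_stk = np.vstack([A, np.sqrt(delta) * np.eye(n)])
b_stk = np.r_[b, np.zeros(n)]
x_stacked = np.linalg.lstsq(A_stk, b_stk, rcond=None)[0]
```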

Optimal input design

linear dynamical system with impulse response h:

y(t) = Σ_{τ=0}^{t} h(τ) u(t − τ),  t = 0, 1, . . . , N

input design problem: multicriterion problem with 3 objectives

1. tracking error with desired output y_des:  J_track = Σ_{t=0}^{N} (y(t) − y_des(t))²
2. input magnitude:  J_mag = Σ_{t=0}^{N} u(t)²
3. input variation:  J_der = Σ_{t=0}^{N−1} (u(t + 1) − u(t))²

track desired output using a small and slowly varying input signal

regularized least-squares formulation

minimize J_track + δ J_der + η J_mag

for fixed δ, η, a least-squares problem in u(0), . . . , u(N)

Approximation and fitting 6–11

example: 3 solutions on optimal trade-off surface

(top) δ = 0, small η;  (middle) δ = 0, larger η;  (bottom) large δ

[figure: three pairs of plots of the input u(t) and output y(t) versus t, for 0 ≤ t ≤ 200]

Approximation and fitting 6–12

Signal reconstruction

minimize (w.r.t. R²₊)  ( ‖x̂ − x_cor‖₂, φ(x̂) )

x ∈ Rⁿ is unknown signal
x_cor = x + v is (known) corrupted version of x, with additive noise v
variable x̂ (reconstructed signal) is estimate of x
φ : Rⁿ → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,   φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|

Approximation and fitting 6–13
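scalarizing the quadratic-smoothing version gives an unconstrained least-squares problem, minimize ‖x̂ − x_cor‖₂² + δ φ_quad(x̂), solved by the linear system (I + δDᵀD)x̂ = x_cor with D the first-difference matrix; a sketch on a synthetic signal (the signal and δ are made up):

```python
import numpy as np

n, delta = 200, 10.0
rng = np.random.default_rng(3)
x = np.sign(np.sin(np.arange(n) / 20.0))      # synthetic test signal
xcor = x + 0.2 * rng.standard_normal(n)       # corrupted version

def phi_quad(z):
    return np.sum(np.diff(z) ** 2)            # quadratic smoothing objective

# first-difference matrix D: (D z)_i = z_{i+1} - z_i
D = np.diff(np.eye(n), axis=0)
# minimize ||z - xcor||^2 + delta * phi_quad(z)  =>  (I + delta D^T D) z = xcor
xhat = np.linalg.solve(np.eye(n) + delta * D.T @ D, xcor)
```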

quadratic smoothing example

[figure — left: original signal x and noisy signal x_cor; right: three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]

Approximation and fitting 6–14

total variation reconstruction example

[figure — left: original signal x and noisy signal x_cor; right: three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]

quadratic smoothing smooths out noise and sharp transitions in signal

Approximation and fitting 6–15

[figure — left: original signal x and noisy signal x_cor; right: three solutions on trade-off curve ‖x̂ − x_cor‖₂ versus φ_tv(x̂)]

total variation smoothing preserves sharp transitions in signal

Approximation and fitting 6–16

Robust approximation

minimize ‖Ax − b‖ with uncertain A

two approaches:

stochastic: assume A is random, minimize E‖Ax − b‖
worst-case: set A of possible values of A, minimize sup_{A∈A} ‖Ax − b‖

tractable only in special cases (certain norms ‖ · ‖, distributions, sets A)

example: A(u) = A0 + uA1

x_nom minimizes ‖A0x − b‖₂²
x_stoch minimizes E‖A(u)x − b‖₂² with u uniform on [−1, 1]
x_wc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²

[figure: r(u) = ‖A(u)x − b‖₂ versus u, for x_nom, x_stoch, x_wc]

Approximation and fitting 6–17

stochastic robust LS with A = Ā + U, U random, EU = 0, EUᵀU = P

minimize E‖(Ā + U)x − b‖₂²

explicit expression for objective:

E‖Ax − b‖₂² = E‖Āx − b + Ux‖₂²
            = ‖Āx − b‖₂² + E xᵀUᵀUx
            = ‖Āx − b‖₂² + xᵀPx

hence, robust LS problem is equivalent to LS problem

minimize ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²

for P = δI, get Tikhonov regularized problem

minimize ‖Āx − b‖₂² + δ‖x‖₂²

Approximation and fitting 6–18
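the identity E‖Ax − b‖₂² = ‖Āx − b‖₂² + xᵀPx can be verified exactly on a two-point distribution U = ±U0 (each with probability 1/2), which has EU = 0 and EUᵀU = U0ᵀU0 = P; everything below is made-up test data:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 4
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# two-point distribution U = +/- U0: EU = 0, E U^T U = U0^T U0 = P
U0 = rng.standard_normal((m, n))
P = U0.T @ U0

# exact expectation over the two equally likely values of U
lhs = 0.5 * (np.linalg.norm((Abar + U0) @ x - b) ** 2
             + np.linalg.norm((Abar - U0) @ x - b) ** 2)
rhs = np.linalg.norm(Abar @ x - b) ** 2 + x @ P @ x
```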

worst-case robust LS with A = {Ā + u1A1 + ⋯ + upAp | ‖u‖₂ ≤ 1}

minimize sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²

where P(x) = [ A1x  A2x  ⋯  Apx ],  q(x) = Āx − b

from page 5–14, strong duality holds between the following problems

maximize   ‖Pu + q‖₂²          minimize   t + λ
subject to ‖u‖₂² ≤ 1           subject to [ I    P    q ]
                                          [ Pᵀ   λI   0 ]  ⪰ 0
                                          [ qᵀ   0    t ]

hence, robust LS problem is equivalent to SDP

minimize   t + λ
subject to [ I       P(x)   q(x) ]
           [ P(x)ᵀ   λI     0    ]  ⪰ 0
           [ q(x)ᵀ   0      t    ]

Approximation and fitting 6–19

example: histogram of residuals

r(u) = ‖(A0 + u1A1 + u2A2)x − b‖₂

with u uniformly distributed on unit disk, for three values of x

[figure: histograms of r(u) for x_ls, x_tik, and x_rls]

x_ls minimizes ‖A0x − b‖₂
x_tik minimizes ‖A0x − b‖₂² + δ‖x‖₂² (Tikhonov solution)
x_rls minimizes sup_{A∈A} ‖Ax − b‖₂² + ‖x‖₂²

Approximation and fitting 6–20

Convex Optimization — Boyd & Vandenberghe

7. Statistical estimation

maximum likelihood estimation
optimal detector design
experiment design

7–1

Parametric distribution estimation

distribution estimation problem: estimate probability density p(y) of a random variable from observed values

parametric distribution estimation: choose from a family of densities p_x(y), indexed by a parameter x

maximum likelihood estimation

maximize (over x) log p_x(y)

y is observed value
l(x) = log p_x(y) is called log-likelihood function
can add constraints x ∈ C explicitly, or define p_x(y) = 0 for x ∉ C
a convex optimization problem if log p_x(y) is concave in x for fixed y

Statistical estimation 7–2

Linear measurements with IID noise

linear measurement model

yi = aiᵀx + vi,  i = 1, . . . , m

x ∈ Rⁿ is vector of unknown parameters
vi is IID measurement noise, with density p(z)
yi is measurement: y ∈ R^m has density p_x(y) = ∏_{i=1}^{m} p(yi − aiᵀx)

maximum likelihood estimate: any solution x of

maximize l(x) = Σ_{i=1}^{m} log p(yi − aiᵀx)

(y is observed value)

Statistical estimation 7–3

examples

Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^{m} (aiᵀx − yi)²

ML estimate is LS solution

Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

l(x) = −m log(2a) − (1/a) Σ_{i=1}^{m} |aiᵀx − yi|

ML estimate is ℓ₁-norm solution

uniform noise on [−a, a]:

l(x) = { −m log(2a)   if |aiᵀx − yi| ≤ a, i = 1, . . . , m
       { −∞           otherwise

ML estimate is any x with |aiᵀx − yi| ≤ a

Statistical estimation 7–4
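the Gaussian case can be sanity-checked numerically: the least-squares solution should maximize the log-likelihood; a sketch on synthetic data (all data and parameters below are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, sigma = 100, 5, 0.5
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + sigma * rng.standard_normal(m)

def loglik(x):
    # Gaussian log-likelihood: -(m/2) log(2 pi sigma^2) - (1/(2 sigma^2)) sum_i (a_i^T x - y_i)^2
    r = A @ x - y
    return -0.5 * m * np.log(2 * np.pi * sigma**2) - (r @ r) / (2 * sigma**2)

x_ml = np.linalg.lstsq(A, y, rcond=None)[0]   # ML estimate = LS solution
```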

Logistic regression

random variable y ∈ {0, 1} with distribution

p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))

a, b are parameters; u ∈ Rⁿ are (observable) explanatory variables

estimation problem: estimate a, b from m observations (ui, yi)

log-likelihood function (for y1 = ⋯ = yk = 1, yk+1 = ⋯ = ym = 0):

l(a, b) = log ( ∏_{i=1}^{k} exp(aᵀui + b)/(1 + exp(aᵀui + b)) · ∏_{i=k+1}^{m} 1/(1 + exp(aᵀui + b)) )

        = Σ_{i=1}^{k} (aᵀui + b) − Σ_{i=1}^{m} log(1 + exp(aᵀui + b))

concave in a, b

Statistical estimation 7–5
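the last line of the derivation can be checked numerically: the simplified expression equals the sum of per-sample Bernoulli log-probabilities for any labeling (the random data below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 50, 3
U = rng.standard_normal((m, n))   # rows are explanatory variables u_i
y = rng.integers(0, 2, m)         # labels in {0, 1}
a, b = rng.standard_normal(n), 0.3

z = U @ a + b
# simplified form: sum over samples with y_i = 1, minus the log-partition terms
l = z[y == 1].sum() - np.logaddexp(0, z).sum()

# same value from the per-sample Bernoulli log-probabilities
p = 1 / (1 + np.exp(-z))
l_direct = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```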

example (n = 1, m = 50 measurements)

[figure: circles show 50 points (ui, yi); solid curve is ML estimate of p = exp(au + b)/(1 + exp(au + b))]

Statistical estimation 7–6

(Binary) hypothesis testing

detection (hypothesis testing) problem

given observation of a random variable X ∈ {1, . . . , n}, choose between:

hypothesis 1: X was generated by distribution p = (p1, . . . , pn)
hypothesis 2: X was generated by distribution q = (q1, . . . , qn)

randomized detector

a nonnegative matrix T ∈ R^{2×n}, with 1ᵀT = 1ᵀ
if we observe X = k, we choose hypothesis 1 with probability t1k, hypothesis 2 with probability t2k
if all elements of T are 0 or 1, it is called a deterministic detector

Statistical estimation 7–7

detection probability matrix:

D = [ Tp  Tq ] = [ 1 − Pfp   Pfn     ]
                 [ Pfp       1 − Pfn ]

Pfp is probability of selecting hypothesis 2 if X is generated by distribution 1 (false positive)
Pfn is probability of selecting hypothesis 1 if X is generated by distribution 2 (false negative)

multicriterion formulation of detector design

minimize (w.r.t. R²₊)  (Pfp, Pfn) = ((Tp)₂, (Tq)₁)
subject to t1k + t2k = 1,  k = 1, . . . , n
           tik ≥ 0,  i = 1, 2,  k = 1, . . . , n

variable T ∈ R^{2×n}

Statistical estimation 7–8

scalarization (with weight λ > 0)

minimize   (Tp)₂ + λ(Tq)₁
subject to t1k + t2k = 1,  tik ≥ 0,  i = 1, 2,  k = 1, . . . , n

an LP with a simple analytical solution

(t1k, t2k) = (1, 0)   if pk ≥ λqk
(t1k, t2k) = (0, 1)   if pk < λqk

a deterministic detector, given by a likelihood ratio test
if pk = λqk for some k, any value 0 ≤ t1k ≤ 1, t2k = 1 − t1k is optimal
(i.e., Pareto-optimal detectors include non-deterministic detectors)

minimax detector

minimize   max{Pfp, Pfn} = max{(Tp)₂, (Tq)₁}
subject to t1k + t2k = 1,  tik ≥ 0,  i = 1, 2,  k = 1, . . . , n

an LP; solution is usually not deterministic

Statistical estimation 7–9
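the analytical solution above is just a likelihood ratio test; a small numpy sketch (using as p and q the two columns of the example matrix P on the next slide; the function name is mine):

```python
import numpy as np

def lr_detector(p, q, lam):
    """Deterministic detector from the likelihood ratio test:
    choose hypothesis 1 for outcome k when p_k >= lam * q_k."""
    p, q = np.asarray(p), np.asarray(q)
    t1 = (p >= lam * q).astype(float)
    T = np.vstack([t1, 1 - t1])    # each column of T sums to 1
    Pfp = (T @ p)[1]               # prob. of choosing hyp. 2 under p
    Pfn = (T @ q)[0]               # prob. of choosing hyp. 1 under q
    return T, Pfp, Pfn

p = np.array([0.70, 0.20, 0.05, 0.05])
q = np.array([0.10, 0.10, 0.70, 0.10])
T, Pfp, Pfn = lr_detector(p, q, lam=1.0)
```

sweeping lam over (0, ∞) traces out the deterministic part of the Pfp–Pfn trade-off curve.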

example

P = [ 0.70  0.10 ]
    [ 0.20  0.10 ]
    [ 0.05  0.70 ]
    [ 0.05  0.10 ]

[figure: trade-off curve of Pfn versus Pfp]

solutions 1, 2, 3 (and endpoints) are deterministic; 4 is minimax detector

Statistical estimation 7–10

Experiment design

m linear measurements yi = aiᵀx + wi, i = 1, . . . , m of unknown x ∈ Rⁿ

measurement errors wi are IID N(0, 1)
ML (least-squares) estimate is

x̂ = ( Σ_{i=1}^{m} ai aiᵀ )⁻¹ Σ_{i=1}^{m} yi ai

error e = x̂ − x has zero mean and covariance

E = E eeᵀ = ( Σ_{i=1}^{m} ai aiᵀ )⁻¹

confidence ellipsoids are given by {x | (x − x̂)ᵀ E⁻¹ (x − x̂) ≤ β}

experiment design: choose ai ∈ {v1, . . . , vp} (a set of possible test vectors) to make E 'small'

Statistical estimation 7–11
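given a (relaxed) design, the error covariance and the usual scalarizations are direct to compute; a numpy sketch with made-up candidate vectors and weights:

```python
import numpy as np

# candidate test vectors v_k (rows) and relaxed design weights lambda_k
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
lam = np.array([0.5, 0.5, 0.0])   # fractions of the m measurements
m = 100                           # total number of measurements

# error covariance E = (1/m) (sum_k lambda_k v_k v_k^T)^{-1}
X = sum(l * np.outer(v, v) for l, v in zip(lam, V))
E = np.linalg.inv(X) / m

# scalarizations from the slide above
d_opt = np.log(np.linalg.det(E))  # log det E  (D-optimal objective)
a_opt = np.trace(E)               # tr E       (A-optimal objective)
```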

vector optimization formulation

minimize (w.r.t. Sⁿ₊)  E = ( Σ_{k=1}^{p} mk vk vkᵀ )⁻¹
subject to mk ≥ 0,  m1 + ⋯ + mp = m,  mk ∈ Z

variables are mk (# vectors ai equal to vk)
difficult in general, due to integer constraint

relaxed experiment design

assume m ≫ p, use λk = mk/m as (continuous) real variable

minimize (w.r.t. Sⁿ₊)  E = (1/m) ( Σ_{k=1}^{p} λk vk vkᵀ )⁻¹
subject to λ ⪰ 0,  1ᵀλ = 1

common scalarizations: minimize log det E, tr E, λmax(E), . . .
can add other convex constraints, e.g., bound experiment cost cᵀλ ≤ B

Statistical estimation 7–12

D-optimal design

minimize   log det ( Σ_{k=1}^{p} λk vk vkᵀ )⁻¹
subject to λ ⪰ 0,  1ᵀλ = 1

interpretation: minimizes volume of confidence ellipsoids

dual problem

maximize   log det W + n log n
subject to vkᵀ W vk ≤ 1,  k = 1, . . . , p

interpretation: {x | xᵀWx ≤ 1} is minimum volume ellipsoid centered at origin, that includes all test vectors vk

complementary slackness: for λ, W primal and dual optimal

λk (1 − vkᵀ W vk) = 0,  k = 1, . . . , p

optimal experiment uses vectors vk on boundary of ellipsoid defined by W

Statistical estimation 7–13

example (p = 20)

λ1⋆ = 0.5,  λ2⋆ = 0.5

[figure: design uses two vectors, on boundary of ellipse defined by optimal W]

Statistical estimation 7–14

derivation of dual of page 7–13

first reformulate primal problem with new variable X:

minimize   log det X⁻¹
subject to X = Σ_{k=1}^{p} λk vk vkᵀ,  λ ⪰ 0,  1ᵀλ = 1

L(X, λ, Z, z, ν) = log det X⁻¹ + tr ( Z ( X − Σ_{k=1}^{p} λk vk vkᵀ ) ) − zᵀλ + ν(1ᵀλ − 1)
