Copyright ©1991-2009 by K. Pattipati
Lecture 1: Introduction, Necessary and Sufficient
Conditions for Minima & Convex Analysis
Prof. Krishna R. Pattipati
Dept. of Electrical and Computer Engineering
University of Connecticut
Contact: [email protected], (860) 486-2890
Fall 2009
September 1, 2009
ECE 6437: Computational Methods for Optimization
Introduction
Contact Information
• Room number: ITE 350
• Tel/Fax: (860) 486-2890/5585
• E-mail: [email protected]
Office Hours: Tuesday – Thursday: 11:00-12:00 Noon
Mission or goal
• Provide systems analysts with the central concepts of widely
used optimization techniques
• Requires skills from both mathematics and computer science
• Need a strong background in multivariable calculus and
linear algebra
Outline of Lecture 1
Three Recurrent Themes
• Problem, Algorithms, Convergence Analysis
Optimization Applications
What is an Optimization Problem?
Classification of Optimization Problems
Three Basic Questions of Optimization
• Optimality conditions, algorithm, convergence
Optimality Conditions for Single-Variable and Multivariable Functions
Elementary Convexity Theory
Three Recurrent Themes
Need to mathematically understand the
optimization problem to be solved
Design an algorithm to solve the problem,
that is, a step-by-step procedure for
solving the problem
Convergence Analysis
• How fast does the algorithm converge?
• What is the relationship between rate of
convergence and the size of the
problem?
[Figure: the three interlocking themes of ECE 6437 (Computational Methods in Optimization): the optimization problem (application-specific), algorithmic techniques, and convergence analysis (complexity).]
Applications of Optimization
Sample Applications
• Scheduling in manufacturing systems
• Scheduling of police patrol officers in a city
• Reducing fuel costs in the electric power industry (unit commitment)
• Gasoline blending at TEXACO
• Scheduling trucks at North American Van Lines
• Advertising to reach a certain % of each income group
• Investment portfolio selection to maximize expected returns, subject to
constraints on risk
Technical Areas
• Operations Research, Systems theory (Optimal Control),
Statistics (Design of Experiments), Computer Science, Chemical
and Civil Engineering, Economics, Medicine, Physics, Math,….
What is an Optimization Problem?

Three Attributes:
1. A set of independent variables or parameters x = (x_1, x_2, ..., x_n):
   • x ∈ R^n (continuous vector)
   • x ∈ Z^n, Z = {..., −2, −1, 0, 1, 2, ...} (integer)
   • x ∈ {0, 1}^n (binary)
2. Conditions or restrictions on the acceptable values of the variables: the
   constraints of the optimization problem (e.g., x ≥ 0)
3. A single measure of goodness, termed the objective (utility) function
   or cost function or goal, which depends on (x_1, x_2, ..., x_n):
      f(x) = f(x_1, x_2, ..., x_n), with f : R^n → R
      (f : Z^n → R if x ∈ Z^n; f : {0, 1}^n → R if x is binary)
Abstract Formulation

Abstract Formulation: "Minimize f(x) subject to x ∈ Ω"
   Ω: the feasible set, assumed closed and bounded
• Such problems have been investigated at least since 825 A.D., when the Persian
  author Abu Ja'far Muhammad ibn Musa Al-Khwarizmi wrote one of the first
  books on algebra; the word "algorithm" derives from his name.
• Since the 1950s, a hierarchy of optimization problems has emerged under
  the general heading "Mathematical Programming". The solution
  approach is algorithmic in nature, i.e., construct a sequence
     x^0 → x^1 → ... → x*, where x* minimizes f(x).
A Classification of Mathematical Programming Problems

[Figure: taxonomy of mathematical programming problems. Continuous (x ∈ R^n): nonlinear programming problems (NLP, ECE 6437), with convex programs as a special case and LP (ECE 6108) within those, which in turn contains network programming, separable/resource allocation problems, and assignment problems. Discrete (x ∈ Z^n): NP-hard combinatorial problems (research; no course yet).]
Computational Methods in Optimization: ECE 6437

Unconstrained NLP: no constraints on x ∈ R^n
• Steepest descent (gradient) method
• Conjugate gradient method
• Newton, Gauss-Newton methods & variations
• Quasi-Newton (or) variable metric methods

Constrained NLP: defined by
   h_i(x) = 0, i = 1, 2, ..., m ≤ n   (equality constraints)
   g_i(x) ≥ 0, i = 1, 2, ..., p       (inequality constraints)
   x_i^LB ≤ x_i ≤ x_i^UB, i = 1, 2, ..., n   (simple bound constraints)
• Penalty methods
• Multiplier or augmented Lagrangian methods
• Reduced gradient method
• Recursive quadratic programming
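As a preview of the unconstrained methods listed above, here is a minimal sketch of steepest descent with an Armijo backtracking line search (the test function, starting point, and constants are illustrative assumptions, not from the lecture):

```python
def grad_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Steepest descent: step along -grad(f) with Armijo backtracking."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 ** 0.5 < tol:          # stop when the gradient is nearly zero
            break
        t, fx = 1.0, f(x)
        # shrink t until f decreases enough: f(x - t g) <= f(x) - 1e-4 t ||g||^2
        while f([xi - t * gi for xi, gi in zip(x, g)]) > fx - 1e-4 * t * gnorm2:
            t *= 0.5
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x

# Example: f(x) = (x1 - 1)^2 + 10 (x2 + 2)^2, whose minimizer is (1, -2)
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
x_star = grad_descent(f, grad, [0.0, 0.0])
```

Backtracking guarantees descent at every iteration, which is what makes the simple update x ← x − t ∇f(x) reliable even when a full step of t = 1 would overshoot.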
Computational Methods in Optimization: ECE 6437 (Cont'd)

Special Case 1: Convex programming problem (CPP)
• Convex cost function with convex constraints:
   − f(x) is convex (defined later)
   − g_i(x) is concave (or) −g_i(x) is convex
   − h_i(x) linear: Ax = b, i.e., a_i^T x = b_i, i = 1, 2, ..., m; b ∈ R^m
• Local minimum = global minimum
Special Case 1.1: Linear Programming (LP) Problem
• f(x) is linear: f(x) = c_1 x_1 + c_2 x_2 + ... + c_n x_n = c^T x
• g_i(x) linear: a_i^T x ≥ b_i; i = 1, 2, ..., p
• x_i ≥ 0, i = 1, 2, ..., n; Ax = b, where A is an m × n matrix
• A striking feature of this problem is that the number of basic feasible solutions is
  finite: N ≤ C(n + p, m + p), a binomial coefficient
• Efficient algorithms exist for this problem
   − Revised simplex
   − Interior point algorithms (application of specialized NLP to LP)
• One of the most widely used models in production planning.

Special Cases 1.1.x: Network Flows (LP on networks, i.e., graphs with weights)
• Shortest paths
• Maximum flow problem
• Transportation problem
• Assignment problem
(Linear Programming and Network Flows: ECE 6108)
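The finiteness of candidate solutions can be seen by brute force on a toy LP (the two-variable problem below is an illustrative assumption): every vertex of the feasible polygon is the intersection of two constraint boundaries, so enumerating all such intersections and keeping the best feasible one solves the problem.

```python
from itertools import combinations

# Toy LP: minimize c^T x  s.t.  a_i^T x >= b_i  (illustrative data)
# Written as >= constraints: x1 <= 4, x2 <= 6, 3x1 + 2x2 <= 18, x1 >= 0, x2 >= 0
c = [-3.0, -5.0]                       # minimizing -3x1 - 5x2
A = [[-1.0, 0.0], [0.0, -1.0],
     [-3.0, -2.0], [1.0, 0.0], [0.0, 1.0]]
b = [-4.0, -6.0, -18.0, 0.0, 0.0]

def solve_2x2(a1, a2, b1, b2):
    """Intersection of two constraint boundaries (Cramer's rule)."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:
        return None                    # parallel boundaries: no vertex
    return [(b1 * a2[1] - b2 * a1[1]) / det,
            (a1[0] * b2 - a2[0] * b1) / det]

best = None
for i, j in combinations(range(len(A)), 2):
    x = solve_2x2(A[i], A[j], b[i], b[j])
    if x and all(A[k][0] * x[0] + A[k][1] * x[1] >= b[k] - 1e-9
                 for k in range(len(A))):
        val = c[0] * x[0] + c[1] * x[1]
        if best is None or val < best[0]:
            best = (val, x)
```

This enumeration is exponential in the problem size, which is exactly why the revised simplex and interior point methods matter in practice.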
Integer Programming (combinatorial optimization) contains intractable
problems with exponential worst-case computational complexity, e.g.:
• Traveling salesperson problem
• VLSI routing
• Testing
• Multi-processor scheduling to minimize makespan
• Bin packing
• Knapsack problem
• …
In ECE 6437, our focus will be on the following problems:
• Unconstrained NLP
• Constrained NLP
• Convex Programming
Integer Programming
Three Basic Questions of Optimization

1. Static Question: How can one determine whether a given point x*
   is a minimum? → Provides theory, stopping criteria, etc.
2. Dynamic Question: If a given point x is not a minimum, then how
   does one go about finding a solution that is a minimum? → Algorithm:
   construct a sequence x^0 → x^1 → x^2 → ... → x*
3. Convergence Analysis:
   • Does the algorithm in 2 converge?
   • If so, how fast? How does ||x_k − x*|| or |f(x_k) − f(x*)| behave?

Let us consider the third question first.

Rate of Convergence Concepts: Suppose we have an algorithm that generates a
sequence {x_k} with a stationary limit point x*. Define a scalar error function
e : R^n → R, e.g., e_k = ||x_k − x*||.
Rate of Convergence - 1

Here ||x|| is defined as any Hölder p-norm:
   ||x||_p = (Σ_{i=1}^n |x_i|^p)^{1/p}
Typically p = 1, 2, or ∞:
   ||x||_1 = Σ_{i=1}^n |x_i|;  ||x||_2 = (x_1^2 + x_2^2 + ... + x_n^2)^{1/2} = (x^T x)^{1/2};  ||x||_∞ = max_i |x_i|
You may also define e_k = |f(x_k) − f(x*)|.

The behavior of e_k as a function of k is directly related to computational efficiency:
   Time complexity = (cost per step) × (number of iterations)

In order to investigate the behavior of e_k, we compare it to
"standard" sequences. One standard form is to look for
   e_{k+1} ≈ β e_k^r as k → ∞
   r: order of convergence (or) asymptotic rate of convergence
   β: convergence ratio or asymptotic error constant

[Figure: error e_k versus iteration k for four sequences A, B, C, D with different convergence rates.]
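The order r and ratio β can be estimated from an observed error sequence, since log e_{k+1} ≈ log β + r log e_k is a straight line in log-log coordinates (a small sketch; the two test sequences are assumptions):

```python
import math

def estimate_order(errors):
    """Least-squares fit of log e_{k+1} = log(beta) + r*log(e_k); returns (r, beta)."""
    xs = [math.log(e) for e in errors[:-1]]
    ys = [math.log(e) for e in errors[1:]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    beta = math.exp(my - r * mx)
    return r, beta

linear = [0.5 ** k for k in range(1, 12)]         # e_{k+1} = 0.5 e_k: r = 1, beta = 0.5
quadratic = [0.5 ** (2 ** k) for k in range(1, 6)]  # e_{k+1} = e_k^2: r = 2, beta = 1
r_lin, beta_lin = estimate_order(linear)
r_quad, beta_quad = estimate_order(quadratic)
```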
Rate of Convergence - 2

   r = 1: linear convergence (geometric); converges if β < 1
   r = 2: quadratic (fast) convergence
   r = 3: cubic (superfast) convergence

If r = 1, β < 1: linear, since lim_{k→∞} e_{k+1}/e_k = β < 1
   r = 1, β = 1: sublinear, since lim_{k→∞} e_{k+1}/e_k = 1
   r = 1, β = 0: superlinear, since lim_{k→∞} e_{k+1}/e_k = 0
   r > 1: superlinear, since lim_{k→∞} e_{k+1}/e_k = lim_{k→∞} β e_k^{r−1} = 0

Examples:
1) e_k = β^k, β < 1: linear (binary search, golden section search, gradient method, regula falsi)
2) e_k = 1/k: sublinear, since e_{k+1}/e_k = k/(k+1) → 1
3) e_k = 1/k!: superlinear (r = 1, β = 0), since e_{k+1}/e_k = 1/(k+1) → 0 as k → ∞
Rate of Convergence - 3

Examples (cont'd):
4) e_{k+1} = q_1 e_k for k even, e_{k+1} = q_2 e_k for k odd (q_1, q_2 < 1): linear
5) For the sequence in 4), lim_{k→∞} e_{k+1}/e_k does not exist (the ratio alternates
   between q_1 and q_2), and yet (e_{k+2}/e_k)^{1/2} = (q_1 q_2)^{1/2} < 1, so the
   convergence is still linear with average ratio (q_1 q_2)^{1/2}
6) e_{k+1} = a e_k^2: r = 2, quadratic (Newton's method)
7) e_{k+1} = M e_k e_{k−1}: r = (1 + √5)/2 ≈ 1.618, the golden section number
   r > 1 ⇒ superlinear convergence rate
   Examples: secant method (r ≈ 1.618); quadratic fit (r ≈ 1.3)
8) e_{k+1} = a_k e_k with a_k → 1/2: linear (r = 1) with β = 1/2,
   since lim_{k→∞} e_{k+1}/e_k = lim_{k→∞} a_k = 1/2 < 1

Most of the methods that we discuss will have 1 ≤ r ≤ 2.
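These rates are easy to observe numerically. The sketch below (solving x^2 = 2, an illustrative choice) compares bisection, whose error is halved each step (r = 1, β = 1/2), with Newton's method, whose error is roughly squared each step (r = 2):

```python
import math

root = math.sqrt(2.0)       # solve f(x) = x^2 - 2 = 0

# Bisection on [1, 2]: the bracketing interval halves every iteration (linear)
lo, hi = 1.0, 2.0
bis_err = []
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if mid * mid < 2.0:
        lo = mid
    else:
        hi = mid
    bis_err.append(abs(mid - root))

# Newton: x <- x - f(x)/f'(x); the error is roughly squared per iteration
x = 2.0
newt_err = []
for _ in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)
    newt_err.append(abs(x - root))
```

After 40 bisection steps the error is about 2^(-41), while Newton is at machine precision after roughly 5 steps, illustrating why quadratic convergence is called "fast".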
Static Question: Necessary and Sufficient
Conditions for Minimum-1

Example: f(x) = x^4 − 12x^3 + 47x^2 − 60x = x(x − 3)(x − 4)(x − 5)

Minima:
• Local or relative: weak (several equivalent minima) or strong (strict) → strict local minimum
• Global: weak or strong (strict) → strict global minimum

[Figure: plot of f(x) versus x showing weak local minima, a strict local minimum, a local minimum, and the strict global minimum.]

Extends to multivariable functions readily.
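For the quartic example above, the stationary points can be located numerically by applying Newton's method to f'(x) = 4x^3 − 36x^2 + 94x − 60; which minimum is found depends on the starting point (a sketch; the starting points are arbitrary assumptions):

```python
def fp(x):    # f'(x) for f(x) = x^4 - 12x^3 + 47x^2 - 60x
    return 4 * x ** 3 - 36 * x ** 2 + 94 * x - 60

def fpp(x):   # f''(x)
    return 12 * x ** 2 - 72 * x + 94

def newton(x, iters=50):
    """Newton's method on f'(x) = 0, i.e., x <- x - f'(x)/f''(x)."""
    for _ in range(iters):
        x = x - fp(x) / fpp(x)
    return x

f = lambda x: x ** 4 - 12 * x ** 3 + 47 * x ** 2 - 60 * x
a = newton(1.0)   # converges to the left local minimizer (near x ~ 0.94)
b = newton(5.0)   # converges to the right local minimizer (near x ~ 4.60)
```

Both points satisfy the first- and second-order conditions (f' = 0, f'' > 0), but only the left one is the global minimum: local information alone cannot distinguish them.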
Static Question: Necessary and Sufficient
Conditions for Minimum-2

Definition: x* is a local minimum of f(x) over Ω if, for some ε > 0, we have
   f(x*) ≤ f(x) ∀ x ∈ Ω with ||x − x*|| < ε,
i.e., f(x*) ≤ f(x) ∀ x ∈ N(x*, ε), where N(x*, ε) = {x : ||x − x*|| < ε}
is an ε-neighbourhood of x*.

Definition: x* is a strict local minimum if f(x*) < f(x) ∀ x ∈ N(x*, ε) \ {x*}.

Definition: x* is a weak (strict) global minimum of f(x) over Ω if
   f(x*) ≤ f(x) (respectively, f(x*) < f(x) for x ≠ x*) ∀ x ∈ Ω.

Remark:
   strict global minimum ⇒ strict local minimum
   strict local minimum ⇏ strict global minimum, except for convex functions

[Figure: an ε-neighbourhood N(x*, ε) around x* in the (x_1, x_2) plane.]
Optimality Conditions of Univariate Functions:
Necessary Conditions

For univariate functions:
• Tangent is horizontal ⇒ slope f′(x*) = (df/dx)|_{x=x*} = 0   (1st order condition)
• Curvature is up ⇒ second derivative f″(x*) = (d²f/dx²)|_{x=x*} ≥ 0   (2nd order condition)

Proof: Suppose x* is a local minimum. Let y = x* + Δx. Then,
by the mean value theorem,
   f(y) = f(x* + Δx) = f(x*) + f′(x*)Δx + (1/2) f″(x* + θΔx)(Δx)², 0 ≤ θ ≤ 1.
Suppose f′(x*) ≠ 0. Then pick Δx = −ε f′(x*), with ε > 0 sufficiently small:
   f(y) − f(x*) = −ε (f′(x*))² + (1/2) ε² (f′(x*))² f″(·) < 0,
a contradiction ⇒ need f′(x*) = 0.
From the first order condition, we have
   f(x* + Δx) − f(x*) = (1/2) f″(x* + θΔx)(Δx)²; 0 ≤ θ ≤ 1.
If f″(x*) < 0, then f″(x* + θΔx) < 0 for some small Δx by continuity
   ⇒ f(x* + Δx) < f(x*), a contradiction ⇒ f″(x*) ≥ 0.
Optimality Conditions of Univariate Functions:
Remarks

For univariate functions:
1. The proof provides a method of advancing from one x to the next:
   take a step Δx = −ε f′(x) s.t. f(x + Δx) = f(x − ε f′(x)) < f(x)
   ⇒ steepest descent or gradient or Cauchy method.
2. These are only necessary conditions. They are not sufficient.
   Example: f(x) = x³; f′(0) = 0, f″(0) = 0, but x = 0 is not a local minimum.
   Such a point is called a saddle point or point of inflection.
3. Note that the first order condition is satisfied by minima, maxima and
   saddle points. Such points are referred to as stationary points.

[Figure: plot of f(x) = x³ with a stationary point of inflection at x = 0.]
Sufficient Conditions of Optimality for a
Univariate Function

For univariate functions:
(i) f′(x*) = 0
(ii) f″(x*) > 0
Necessity of (i) was proven earlier. To show sufficiency, note that
   f(x* + Δx) − f(x*) = (1/2) f″(x* + θΔx)(Δx)² > 0
whenever f″(x* + θΔx) > 0, which by continuity holds for all sufficiently
small Δx when f″(x*) > 0. Hence x* is a strict local minimum.

The above results extend directly to multivariable functions, i.e.,
functions of several variables. Assume f ∈ C², i.e., ∂f/∂x_i and
∂²f/∂x_i∂x_j exist and are continuous.

Univariate → Multivariate:
• derivative → gradient: vector of first order partial derivatives
• second derivative → Hessian: matrix of second order partial derivatives
Conditions of Optimality for a
Multivariate Function-1

Gradient:
   ∇f(x) = g(x) = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]^T

   ∂f/∂x_i = lim_{Δx_i→0} [f(x_1, ..., x_i + Δx_i, ..., x_n) − f(x_1, x_2, ..., x_n)] / Δx_i
           = lim_{Δ→0} [f(x + Δe_i) − f(x)] / Δ
• Rate of change of f along the direction x_i,
  or slope of the tangent line along x_i,
  or direction of increase in f at x.

Example: f(x) = x_1² + x_1 cos x_2
   ∇f(x) = [2x_1 + cos x_2, −x_1 sin x_2]^T
Conditions of Optimality for a
Multivariate Function-2

Hessian:
   ∇²f(x) = F(x) = [∂²f/∂x_i∂x_j]: an n × n matrix
   F(x) = F^T(x), since f_ij = ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i = f_ji (F is symmetric)
   ⇒ Need only n(n + 1)/2 elements

Example: f(x) = x_1² + x_1 cos x_2
   ∇²f(x) = [ 2          −sin x_2
              −sin x_2   −x_1 cos x_2 ]

A quadratic function:
   f(x) = (1/2) x^T Q x + b^T x + c = (1/2) Σ_{i=1}^n Σ_{j=1}^n q_ij x_i x_j + Σ_{i=1}^n b_i x_i + c
   ∇f(x) = Qx + b;  ∇²f(x) = Q
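The quadratic-function formulas ∇f(x) = Qx + b and ∇²f(x) = Q can be confirmed with central finite differences (a small sketch; the particular Q, b, c, and evaluation point are arbitrary assumptions):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric
b = np.array([-1.0, 2.0])
c = 5.0
f = lambda x: 0.5 * x @ Q @ x + b @ x + c

x0 = np.array([0.7, -1.2])
h = 1e-5
n = len(x0)
grad = np.zeros(n)
hess = np.zeros((n, n))
for i in range(n):
    ei = np.zeros(n); ei[i] = h
    # central difference for the gradient component
    grad[i] = (f(x0 + ei) - f(x0 - ei)) / (2 * h)
    for j in range(n):
        ej = np.zeros(n); ej[j] = h
        # central difference for the Hessian entry
        hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                      - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h * h)
```

For a quadratic, central differences are exact up to floating-point rounding, so the computed grad and hess match Qx0 + b and Q almost to machine precision.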
Summary of Conditions of Optimality for a
Multivariate Function-3

Necessary conditions:
1. ∇f(x*) = 0
2. ∇²f(x*) ≥ 0 (PSD matrix)

Sufficient conditions:
1. ∇f(x*) = 0
2. ∇²f(x*) > 0 (PD matrix)

• PD: Positive Definite
• PSD: Positive Semi-definite

Matrix facts:
1. A symmetric matrix A is PSD iff
   • x^T A x ≥ 0 ∀ x ∈ R^n
   • all principal minors of A have non-negative determinants
   • A can be factored as A = L D L^T with d_i ≥ 0, L unit lower triangular
   A symmetric matrix A is PD iff
   • x^T A x > 0 ∀ x ≠ 0, x ∈ R^n
   • all leading principal minors of A have positive determinants
   • A = L D L^T with d_i > 0 (computation: O(n³/6) operations)
2. For any symmetric matrix A with λ_1 ≥ λ_2 ≥ ... ≥ λ_n, we have the
   Rayleigh inequality:
      λ_n x^T x ≤ x^T A x ≤ λ_1 x^T x
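The factorization test and the Rayleigh inequality can both be checked numerically (a sketch using NumPy; the matrices are arbitrary assumptions). NumPy's Cholesky routine succeeds exactly when a symmetric matrix is PD, which makes it a practical PD test:

```python
import numpy as np

def is_pd(A):
    """PD test via Cholesky: succeeds iff symmetric A is positive definite."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # PD: leading minors 2 and 3 are > 0
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # indefinite: det = -3 < 0

# Rayleigh inequality: lam_min * x'x <= x'Ax <= lam_max * x'x
lam = np.linalg.eigvalsh(A)                # eigenvalues in ascending order: [1, 3]
rng = np.random.default_rng(0)
x = rng.normal(size=2)
q = x @ A @ x
```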
Proof of Optimality Conditions - 1

Proof of necessity:
From the mean value theorem, we have, for any x and y,
   f(y) = f(x) + ∇f(x)^T (y − x) + (1/2)(y − x)^T ∇²f(x + θ(y − x))(y − x); θ ∈ [0, 1]
Take y = x* + λd, where ||d|| = 1 for any norm (usually p = 1, 2, ∞):
   f(x* + λd) = f(x*) + λ ∇f(x*)^T d + (1/2) λ² d^T ∇²f(x* + θλd) d ≜ g(λ)
If x* is a minimum, the scalar function g(λ) has a minimum at λ = 0 ⇒ g′(0) = 0, g″(0) ≥ 0.
   g′(0) = ∇f(x*)^T d = 0 ∀ d ∈ R^n
Taking d = e_i ⇒ ∂f/∂x_i = 0; similarly d = −e_i, and since d is arbitrary,
   ∇f(x*) = 0 (1st order condition); so ||∇f|| will be small near a minimum.
For a local minimum, we also need
   g″(0) = d^T ∇²f(x*) d ≥ 0 ∀ d ∈ R^n ⇒ ∇²f(x*) is PSD.
Proof of Optimality Conditions - 2

Sufficiency: Suppose ∇f(x*) = 0 and ∇²f(x*) > 0. Let ε > 0 be the smallest
eigenvalue of ∇²f(x*). Then
   f(x* + λd) = f(x*) + λ ∇f(x*)^T d + (1/2) λ² d^T ∇²f(x* + θλd) d; θ ∈ [0, 1].
For sufficiently small λ, ∇²f(x* + θλd) > 0 by continuity, and
   f(x* + λd) − f(x*) = (1/2) λ² d^T ∇²f(x* + θλd) d ≥ (ε/2) λ² ||d||² + o(λ²) > 0
   (recall the Rayleigh inequality)
⇒ x* is a strict local minimum.

Note: x* is a strict local maximum if ∇²f(x*) < 0, and a saddle point if ∇²f(x*) is indefinite.

Example: f(x_1, x_2) = x_1² + 2x_2² − 6x_1 − 4x_2 + 5
   ∇f(x) = [2x_1 − 6, 4x_2 − 4]^T = 0 ⇒ x* = (3, 1); ∇²f(x*) = [2 0; 0 4] > 0
   ⇒ strict local minimum (it is also the global minimum. Why?)

Example: f(x_1, x_2) = 2x_1² + 2x_1x_2 − 2x_2² − 14x_1 + 22x_2 + 8
   ∇f(x) = [4x_1 + 2x_2 − 14, 2x_1 − 4x_2 + 22]^T = 0 ⇒ x* = (3/5, 29/5)
   ∇²f(x*) = [4 2; 2 −4]; λ_1,2 = ±√20 ⇒ indefinite ⇒ saddle point
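The second-order test for the first example can be confirmed numerically (a small sketch; it classifies a stationary point from the eigenvalues of the Hessian, here for the quadratic with Hessian diag(2, 4) and stationary point (3, 1)):

```python
import numpy as np

# f(x1, x2) = x1^2 + 2*x2^2 - 6*x1 - 4*x2 + 5
grad = lambda x: np.array([2 * x[0] - 6, 4 * x[1] - 4])
H = np.array([[2.0, 0.0], [0.0, 4.0]])   # constant Hessian of the quadratic

def classify(g, H, tol=1e-10):
    """Second-order classification of a stationary point (g must be ~0)."""
    assert np.linalg.norm(g) < tol, "not a stationary point"
    eigs = np.linalg.eigvalsh(H)
    if eigs[0] > 0:
        return "strict local minimum"
    if eigs[-1] < 0:
        return "strict local maximum"
    return "saddle point or degenerate"

x_star = np.array([3.0, 1.0])
kind = classify(grad(x_star), H)
```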
Convex Sets - 1

Important because local optimum ⇒ global optimum.

Definition: A set Ω ⊂ R^n is convex if, for any two points x^1, x^2 ∈ Ω
and any λ ∈ [0, 1], we have λx^1 + (1 − λ)x^2 ∈ Ω. In words, Ω is convex
if, for every two points x^1 and x^2 in Ω, the line segment joining x^1
and x^2 is also in Ω.

[Figure: examples of convex and nonconvex sets.]
A convex set is one whose boundaries do not bulge inward, (or) do not
have indentations.
Convex Sets - 2

Examples:
1. A hyperplane {x : a^T x = b} is convex.
2. Half spaces H⁺ = {x : a^T x ≥ b} and H⁻ = {x : a^T x ≤ b} are convex.
3. The intersection ∩_i C_i of convex sets is convex. The union need not be.
4. The sum or difference of convex sets is convex (e.g., C + D, C − D).
5. Expansions or contractions of convex sets are convex (e.g., 2C, C/2).
6. The empty set is convex (by definition).

[Figure: the hyperplane x_1 + x_2 − 1 = 0 in the (x_1, x_2) plane, and sketches of C + D, C − D, 2C and C/2 for convex sets C and D.]
Convex Functions - 1

Consider f : Ω ⊂ R^n → R; f(x) is a scalar multivariable function.
f(x) is a convex function on a convex set Ω if, for any two points
x^1, x^2 ∈ Ω,
   f(λx^1 + (1 − λ)x^2) ≤ λf(x^1) + (1 − λ)f(x^2), 0 ≤ λ ≤ 1.

• A convex function bends up.
• A line segment (chord, secant) between any two points never lies below the graph.
• Linear interpolation between any two points x^1 and x^2 overestimates the function.
• f(x) is concave if −f(x) is convex.

[Figure: graph of a convex function; the chord value λf(x^1) + (1 − λ)f(x^2) lies above the curve at λx^1 + (1 − λ)x^2; sketches of convex, concave, and non-convex functions.]
Convex Functions - 2

Examples:
1. A linear function f(x) = c^T x is convex:
      f(λx^1 + (1 − λ)x^2) = λ c^T x^1 + (1 − λ) c^T x^2 = λf(x^1) + (1 − λ)f(x^2)
2. A quadratic function f(x) = x^T Q x is convex iff Q is PSD:
      λf(x^1) + (1 − λ)f(x^2) − f(λx^1 + (1 − λ)x^2)
         = λ (x^1)^T Q x^1 + (1 − λ)(x^2)^T Q x^2 − [λx^1 + (1 − λ)x^2]^T Q [λx^1 + (1 − λ)x^2]
         = λ(1 − λ)(x^1 − x^2)^T Q (x^1 − x^2) ≥ 0, 0 ≤ λ ≤ 1, iff Q is PSD
3. In general, f(Σ_i λ_i x^i) ≤ Σ_i λ_i f(x^i); Σ_i λ_i = 1; λ_i ≥ 0
   (JENSEN'S INEQUALITY): f(E[x]) ≤ E[f(x)]
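Jensen's inequality f(E[x]) ≤ E[f(x)] is easy to verify by simulation for a convex function (a sketch; the choice f(x) = x² and the Gaussian samples are assumptions):

```python
import random

random.seed(0)
f = lambda x: x * x                 # convex
xs = [random.gauss(0.0, 1.0) for _ in range(10000)]

mean_x = sum(xs) / len(xs)
f_of_mean = f(mean_x)                            # f(E[x]): ~0
mean_of_f = sum(f(x) for x in xs) / len(xs)      # E[f(x)]: ~Var(x) = 1
```

Note that the sample version holds exactly (not just in expectation): f applied to the empirical mean never exceeds the empirical mean of f for a convex f.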
Convex Functions - 3

Examples (cont'd):
4. The linear extrapolation at a point underestimates a convex function:
   assume f(x) ∈ C¹; then f(x^2) ≥ f(x^1) + ∇f(x^1)^T (x^2 − x^1)
   (the right-hand side defines the tangent plane at x^1)

Proof:
(only if) f convex ⇒ f(x^1 + λ(x^2 − x^1)) = f(λx^2 + (1 − λ)x^1) ≤ λf(x^2) + (1 − λ)f(x^1)
   ⇒ f(x^2) − f(x^1) ≥ [f(x^1 + λ(x^2 − x^1)) − f(x^1)] / λ
   Taking the limit as λ → 0: f(x^2) − f(x^1) ≥ ∇f(x^1)^T (x^2 − x^1).
(if) Assume the result is true at x^0 = λx^1 + (1 − λ)x^2. Then
   f(x^1) ≥ f(x^0) + ∇f(x^0)^T (x^1 − x^0)
   f(x^2) ≥ f(x^0) + ∇f(x^0)^T (x^2 − x^0)
Multiply the first by λ and the second by (1 − λ) and add
(note λ(x^1 − x^0) + (1 − λ)(x^2 − x^0) = 0):
   λf(x^1) + (1 − λ)f(x^2) ≥ f(x^0) = f(λx^1 + (1 − λ)x^2) ⇒ f is convex.

[Figure: a convex curve lying above its tangent line at x^1.]
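Property 4 can be spot-checked numerically: for a convex function, the tangent line drawn at any point never overestimates the function anywhere (a sketch; f(x) = e^x and the random sample points are assumptions):

```python
import math
import random

random.seed(1)
f = lambda x: math.exp(x)     # convex
df = lambda x: math.exp(x)    # f'(x)

ok = True
for _ in range(1000):
    x1 = random.uniform(-2.0, 2.0)   # point where the tangent is drawn
    x2 = random.uniform(-2.0, 2.0)   # point where both are evaluated
    # first-order underestimate: f(x2) >= f(x1) + f'(x1)(x2 - x1)
    if f(x2) < f(x1) + df(x1) * (x2 - x1) - 1e-12:
        ok = False
```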
Convex Functions - 4

Examples:
5. f(x) ∈ C² is convex iff ∇²f(x) is PSD over a convex Ω.
   (if) By the mean value theorem,
      f(x^2) = f(x^1) + ∇f(x^1)^T (x^2 − x^1)
               + (1/2)(x^2 − x^1)^T ∇²f(x^1 + θ(x^2 − x^1))(x^2 − x^1).
   If ∇²f ≥ 0, the quadratic term is ≥ 0, so
      f(x^2) ≥ f(x^1) + ∇f(x^1)^T (x^2 − x^1) ⇒ f is convex (by Example 4).
   (only if) Suppose d^T ∇²f(x^1) d < 0 for some d. By continuity we can find
   N(x^1, ε) in which, for x^2 = x^1 + λd with λ sufficiently small,
      f(x^2) < f(x^1) + ∇f(x^1)^T (x^2 − x^1),
   a contradiction.
6. The sum of convex functions is convex.
7. The level set Γ = {x : f(x) ≤ c} is convex for all c if f(x) is convex
   (so is the epigraph {(x, y) : y ≥ f(x)}).
   Proof: Let x^1, x^2 ∈ Γ, i.e., f(x^1) ≤ c and f(x^2) ≤ c. Then
      f(λx^1 + (1 − λ)x^2) ≤ λf(x^1) + (1 − λ)f(x^2) ≤ c, 0 ≤ λ ≤ 1,
   so λx^1 + (1 − λ)x^2 ∈ Γ.

[Figure: level contours {x : f(x) = constant} of a convex function in the (x_1, x_2) plane.]
Convex Functions - 5

Examples:
8. Convex programming problem:
      min f(x), f convex
      s.t. Ax = b; g_i(x) ≥ 0, with each g_i concave (−g_i convex)
   Ω_i = {x : g_i(x) ≥ 0} = {x : −g_i(x) ≤ 0} is convex (a level set of a convex
   function); ∩_i Ω_i is convex; {x : Ax = b} is an intersection of hyperplanes
   ⇒ a convex set ⇒ the feasible set is convex.
9. Local optimum ⇒ global optimum.
   (Global ⇒ local is always true!!!)
   To prove local ⇒ global, let x* be a local minimum, but suppose y* ≠ x*
   is a global minimum with f(y*) < f(x*).
   Consider x = λx* + (1 − λ)y* with λ → 1.
   Convexity ⇒ f(λx* + (1 − λ)y*) ≤ λf(x*) + (1 − λ)f(y*) < f(x*),
   so points arbitrarily close to x* have strictly lower cost ⇒ x* cannot be
   a local minimum, a contradiction.
   As a worst case, local minima must be bunched together as shown.

[Figure: a convex f(x) whose minimizers form a single flat segment.]
Convex Functions - 6

Examples:
10. The first order necessary condition is also sufficient: if ∇f(x*) = 0, then
       f(x) ≥ f(x*) + ∇f(x*)^T (x − x*) = f(x*) ∀ x ∈ R^n.
11. f(x) is convex iff the scalar function g(λ) = f(x + λd) is convex ∀ x and d.
12. Since near a minimum x*, ∇²f(x*) ≥ 0, we can apply convex analysis locally.
    In addition, from the Taylor series, for x near x*:
       f(x) ≈ f(x*) + ∇f(x*)^T (x − x*) + (1/2)(x − x*)^T ∇²f(x*)(x − x*)
            = [f(x*) − ∇f(x*)^T x* + (1/2) x*^T ∇²f(x*) x*]   (the constant c)
              + [∇f(x*) − ∇²f(x*) x*]^T x                      (the linear term b^T x)
              + (1/2) x^T ∇²f(x*) x                            (the quadratic term, Q = ∇²f(x*))
            = c + b^T x + (1/2) x^T Q x
    ⇒ a quadratic approximation near x*.
Convex Functions - 7

Example:
   f(x) = −ln(1 − x_1 − x_2) − ln x_1 − ln x_2

   ∇f(x) = [ 1/(1 − x_1 − x_2) − 1/x_1
             1/(1 − x_1 − x_2) − 1/x_2 ] = 0 at x* = (1/3, 1/3)

   ∇²f(x) = [ 1/(1 − x_1 − x_2)² + 1/x_1²     1/(1 − x_1 − x_2)²
              1/(1 − x_1 − x_2)²              1/(1 − x_1 − x_2)² + 1/x_2² ] > 0
   on Ω = {x : x_1 > 0, x_2 > 0, x_1 + x_2 < 1}

   ⇒ Strictly convex.
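A quick numerical check of this example (the interior evaluation points are arbitrary assumptions): the gradient of the barrier function vanishes at (1/3, 1/3), and the Hessian formula yields strictly positive eigenvalues inside Ω.

```python
import numpy as np

# f(x) = -ln(1 - x1 - x2) - ln(x1) - ln(x2) on x1 > 0, x2 > 0, x1 + x2 < 1
def grad(x1, x2):
    s = 1.0 - x1 - x2
    return np.array([1.0 / s - 1.0 / x1, 1.0 / s - 1.0 / x2])

def hess(x1, x2):
    s = 1.0 - x1 - x2
    return np.array([[1.0 / s ** 2 + 1.0 / x1 ** 2, 1.0 / s ** 2],
                     [1.0 / s ** 2, 1.0 / s ** 2 + 1.0 / x2 ** 2]])

g = grad(1.0 / 3.0, 1.0 / 3.0)              # stationary point of the barrier
eigs = np.linalg.eigvalsh(hess(0.2, 0.3))   # any interior point of Omega
```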
Summary
Abstract Definition of an Optimization Problem
Classification of Optimization Problems
Three Basic Questions of Optimization
• Optimality conditions, algorithm, convergence
Optimality Conditions for Single-Variable and Multivariable Functions
Elementary Convexity Theory