Copyright ©1991-2009 by K. Pattipati
Prof. Krishna R. Pattipati
Dept. of Electrical and Computer Engineering
University of Connecticut Contact: [email protected] (860) 486-2890
Fall 2009
November 3 & 10, 2009
ECE 6437 Computational Methods for Optimization
Lecture 11: Successive Quadratic Programming (SQP) Methods
Outline of Lecture 11
Motivation for Successive Quadratic Programming (SQP) Methods
Key SQP Ideas
Newton Version of SQP
Descent Property of Merit Function f+cP
Quasi-Newton Version of SQP
SQP with second order correction
Motivation for SQP - 1
Consider the unconstrained minimization problem: $\min_x f(x)$
• Given the current estimate $x_k$, the next estimate $x_{k+1}$ is obtained via a quadratic approximation of $f(x^*)$ around $x_k$:
$$f(x^*) \approx f(x_k) + \nabla f(x_k)^T (x^* - x_k) + \tfrac{1}{2}(x^* - x_k)^T \nabla^2 f(x_k)(x^* - x_k) \;\Rightarrow\; \text{min at } x_{k+1}$$
$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \qquad \text{"PURE NEWTON ITERATION"}$$
An alternate viewpoint is to consider solving the first order necessary condition: $\nabla f(x^*) = 0$
• Consider scalar iteration first: solving $g(x) = 0$, where $g$ is a scalar nonlinear function
$$0 = g(x^*) \approx g(x_k) + g'(x_k)(x^* - x_k) \;\Rightarrow\; x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}$$
LINEARIZATION: series of straight line approximations
[Figure: $g(x)$ versus $x$, showing the successive straight line approximations]
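The scalar iteration above can be sketched in a few lines of Python. This is only an illustration: the function name `newton_scalar`, the tolerance, and the test function $g(x) = x^3 - 1$ are assumptions made here, not part of the lecture.

```python
def newton_scalar(g, dg, x0, tol=1e-12, max_iter=50):
    """Pure Newton iteration x_{k+1} = x_k - g(x_k)/g'(x_k) for g(x) = 0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)      # assumes g'(x) != 0 along the way
        x = x - step
        if abs(step) < tol:      # stop when the update is negligible
            break
    return x

# Hypothetical example: minimizing f(x) = x**4/4 - x means solving
# g(x) = f'(x) = x**3 - 1 = 0, with g'(x) = 3*x**2.
root = newton_scalar(lambda x: x**3 - 1.0, lambda x: 3.0 * x**2, x0=2.0)
print(root)
```

Started far from the root, the pure iteration can diverge, which is why the step size and trust region safeguards are needed.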
Motivation for SQP - 2
Now consider solving the necessary conditions:
$$\nabla f(x^*) \approx \nabla f(x_k) + \nabla^2 f(x_k)(x^* - x_k) = 0$$
$$\Rightarrow\; x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$$
Quadratic approximation of $f(x)$ around $x_k$ $\Leftrightarrow$ linearization of first order necessary conditions around $x_k$
• Also, we know that Newton's method is only locally convergent, so we need to modify it via step size selection or a trust region approach, and employ strategies for an indefinite Hessian (e.g., modified Cholesky, Levenberg-Marquardt, double dog-leg, trust region)
• Quasi-Newton methods avoid having to compute the Hessian (secant approximation)
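SQP for Constrained Optimization
Can we extend this idea to constrained minimization problems? Yes!
• Consider $\min f(x)$ such that $h(x) = 0$. The Lagrangian function is given as: $L(x,\lambda) = f(x) + \lambda^T h(x)$. First order necessary conditions of optimality:
$$n \text{ equations: } \nabla_x L(x^*,\lambda^*) = \nabla f(x^*) + \nabla h(x^*)\lambda^* = \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0$$
$$m \text{ equations: } \nabla_\lambda L(x^*,\lambda^*) = h(x^*) = 0; \qquad (m+n) \text{ unknowns: } x^*, \lambda^*$$
• Recall Newton's method for solving a system of non-linear equations:
Given $(x_k, \lambda_k)$, the current estimates, want to find new estimates $(x_{k+1}, \lambda_{k+1})$:
$$\nabla_x L(x_k,\lambda_k) + \nabla^2_{xx} L(x_k,\lambda_k)(x_{k+1} - x_k) + \nabla^2_{x\lambda} L(x_k,\lambda_k)(\lambda_{k+1} - \lambda_k) = 0$$
$$\nabla_\lambda L(x_k,\lambda_k) + \nabla^2_{\lambda x} L(x_k,\lambda_k)(x_{k+1} - x_k) = 0$$
Using:
$$\nabla_x L(x_k,\lambda_k) = \nabla f(x_k) + \nabla h(x_k)\lambda_k, \qquad \nabla_\lambda L(x_k,\lambda_k) = h(x_k)$$
$$\nabla^2_{xx} L(x_k,\lambda_k) = \nabla^2 f(x_k) + \sum_{i=1}^{m} \lambda_{ki} \nabla^2 h_i(x_k) = H_k$$
$$\nabla^2_{x\lambda} L(x_k,\lambda_k) = \nabla h(x_k) = [\nabla^2_{\lambda x} L(x_k,\lambda_k)]^T = N_k$$
we obtain
$$\begin{bmatrix} \nabla^2_{xx} L(x_k,\lambda_k) & \nabla h(x_k) \\ \nabla h(x_k)^T & 0 \end{bmatrix} \begin{bmatrix} x_{k+1} - x_k \\ \lambda_{k+1} - \lambda_k \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) + \nabla h(x_k)\lambda_k \\ h(x_k) \end{bmatrix}$$
Add $-\frac{1}{c} I$ to the (2,2) block if $N_k$ is not full rank.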
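Solution of Linearized Equations = QPP
• Let $d_k = x_{k+1} - x_k$ and add $\nabla h(x_k)\lambda_k$ to the first equation:
$$\begin{bmatrix} \nabla^2_{xx} L(x_k,\lambda_k) & \nabla h(x_k) \\ \nabla h(x_k)^T & 0 \end{bmatrix} \begin{bmatrix} d_k \\ \lambda_{k+1} \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) \\ h(x_k) \end{bmatrix}$$
Claim: These are the necessary conditions of optimality for the following quadratic programming problem:
$$\text{(QPP)} \qquad \min_{d_k} \tfrac{1}{2} d_k^T \nabla^2_{xx} L(x_k,\lambda_k) d_k + \nabla f(x_k)^T d_k \quad \text{s.t. } \nabla h(x_k)^T d_k + h(x_k) = 0$$
Define:
$$\tilde L(d_k,\lambda) = \tfrac{1}{2} d_k^T \nabla^2_{xx} L(x_k,\lambda_k) d_k + \nabla f(x_k)^T d_k + \lambda^T [\nabla h(x_k)^T d_k + h(x_k)]$$
Optimality conditions of $\tilde L(d_k,\lambda_{k+1})$ (first order necessary conditions of optimality):
$$\nabla^2_{xx} L(x_k,\lambda_k) d_k + \nabla h(x_k)\lambda_{k+1} = -\nabla f(x_k)$$
$$\nabla h(x_k)^T d_k = -h(x_k)$$
These are the same as the linearized equations above.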
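As a small illustration of the equivalence between the QPP and the linearized equations, the sketch below solves the equality-constrained QPP by forming and solving the KKT system directly with NumPy. The function name `solve_eq_qp` and the numbers in the example are assumptions chosen for illustration; a dense solve like this is only sensible for small problems.

```python
import numpy as np

def solve_eq_qp(H, grad_f, N, h):
    """Solve min 0.5 d^T H d + grad_f^T d  s.t.  N^T d + h = 0.

    H is (n, n), N = grad h is (n, m), h is (m,).
    Returns the step d_k and the new multiplier lambda_{k+1}.
    """
    n, m = N.shape
    K = np.block([[H, N], [N.T, np.zeros((m, m))]])   # KKT matrix
    rhs = -np.concatenate([grad_f, h])
    sol = np.linalg.solve(K, rhs)                     # assumes K is nonsingular
    return sol[:n], sol[n:]

# Hypothetical data: H = I, grad f = (-1, -1), constraint d1 + d2 - 1 = 0.
d, lam = solve_eq_qp(np.eye(2), np.array([-1.0, -1.0]),
                     np.array([[1.0], [1.0]]), np.array([-1.0]))
print(d, lam)
```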
Summary of SQP Ideas - 1
Let us summarize results so far and list unresolved issues:
1) Can obtain $d_k$ and the multiplier vector $\lambda_{k+1}$ from the solution of a quadratic programming problem with linear equality constraints.
2) In essence, we are approximating the nonlinear equality constrained problem by a series of quadratic programming problems, one at each iteration.
3) Again, can get only local convergence. Need strategies for:
a) Indefinite $\nabla^2_{xx} L$ $\Rightarrow$ Modified Cholesky, Quasi-Newton, Augmented Lagrangian
b) Global convergence: Line search. Q: Line search on what?
4) What about inequality constraints?
• One way of ensuring positive definiteness of $\nabla^2_{xx} L$ is to convexify the Lagrangian by adding a quadratic penalty term:
$$L_c(x,\lambda) = f(x) + \lambda^T h(x) + \tfrac{1}{2} c\, h(x)^T h(x)$$
$$\text{Use } \nabla^2_{xx} L_c(x_k,\lambda_k) = \nabla^2_{xx} L_0(x_k,\lambda_k) + c \sum_{i=1}^{m} h_i(x_k) \nabla^2 h_i(x_k) + c\, \nabla h(x_k) \nabla h(x_k)^T$$
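Summary of SQP Ideas - 2
• Alternatively, use only
$$\nabla^2_{xx} L_c(x_k,\lambda_k) \approx \nabla^2_{xx} L_0(x_k,\lambda_k) + c\, \nabla h(x_k) \nabla h(x_k)^T,$$
$$\text{RHS will be modified as: } \nabla f(x_k) \rightarrow [\nabla f(x_k) + c\, \nabla h(x_k) h(x_k)]$$
• Extension to inequality constraints:
$$\min f(x), \quad \text{s.t. } h(x) = 0;\; h \in R^m; \quad g(x) \le 0;\; g \in R^r$$
Lagrangian Function: $L(x,\lambda,\mu) = f(x) + \lambda^T h(x) + \mu^T g(x)$
Necessary Conditions:
$$\nabla f(x^*) + \nabla h(x^*)\lambda^* + \nabla g(x^*)\mu^* = 0$$
$$h(x^*) = 0$$
$$\mu_i^* g_i(x^*) = 0, \; i = 1,2,\ldots,r \quad \text{or} \quad \mu^{*T} g(x^*) = 0$$
$$\mu^* \ge 0$$
Linearization leads to (the third block row holding for the active inequality constraints):
$$\begin{bmatrix} \nabla^2_{xx} L(x_k,\lambda_k,\mu_k) & \nabla h(x_k) & \nabla g(x_k) \\ \nabla h(x_k)^T & 0 & 0 \\ \nabla g(x_k)^T & 0 & 0 \end{bmatrix} \begin{bmatrix} d_k \\ \lambda_{k+1} \\ \mu_{k+1} \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) \\ h(x_k) \\ g(x_k) \end{bmatrix}$$
$$\text{where } \nabla^2_{xx} L(x_k,\lambda_k,\mu_k) = \nabla^2 f(x_k) + \sum_{i=1}^{m} \lambda_{ki} \nabla^2 h_i(x_k) + \sum_{j=1}^{r} \mu_{kj} \nabla^2 g_j(x_k)$$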
Summary of SQP Ideas - 3
This is equivalent to the following QPP with linear equality and inequality constraints:
$$\min_{d_k} \tfrac{1}{2} d_k^T \nabla^2_{xx} L(x_k,\lambda_k,\mu_k) d_k + \nabla f(x_k)^T d_k$$
$$\text{s.t. } \nabla h(x_k)^T d_k + h(x_k) = 0$$
$$\nabla g(x_k)^T d_k + g(x_k) \le 0$$
Questions:
• How do we use this idea in a general SQP algorithm?
• Need to solve a quadratic programming problem at each iteration. How to solve QPP?
• How to ensure global convergence? Line search on what function?
General Algorithm: Newton Version
Step 1: Given an initial estimate $x_0, \lambda_0, \mu_0$, compute $\nabla^2_{xx} L(x_0,\lambda_0,\mu_0)$, $\nabla h(x_0)$, $\nabla g(x_0)$. Set $k = 0$
Step 2: Solve the QPP
$$\min_{d_k} \tfrac{1}{2} d_k^T \nabla^2_{xx} L(x_k,\lambda_k,\mu_k) d_k + \nabla f(x_k)^T d_k$$
$$\text{s.t. } \nabla h(x_k)^T d_k + h(x_k) = 0$$
$$\nabla g(x_k)^T d_k + g(x_k) \le 0$$
RESULT: $d_k, \lambda_{k+1}, \mu_{k+1}$
General SQP Algorithm
General Algorithm: Newton Version (continued)
Step 3: Select a step size $\alpha_k$ along $d_k$ to minimize a penalty (merit) function $f + cP$
Choices for $P$:
1. $P = \max\{0, g_1(x), g_2(x), \ldots, g_r(x), |h_1(x)|, |h_2(x)|, \ldots, |h_m(x)|\}$
2. $P = \sum_{j=1}^{r} \max(0, g_j(x)) + \sum_{i=1}^{m} |h_i(x)|$
3. $f + cP = f(x) + \lambda^T h(x) + \mu^T g(x) + \tfrac{1}{2} c \sum_{j=1}^{r} [\max(0, g_j(x))]^2 + \tfrac{1}{2} c\, h(x)^T h(x)$
4. $f + cP = L(x,\lambda,\mu) + \tfrac{1}{2} c\, h(x)^T h(x) + \tfrac{1}{2} \|\nabla_x L(x,\lambda,\mu)\|^2$
5. $r = 0$: $f + cP = f(x) + \tfrac{1}{2} \nabla_x L(x,\lambda)^T \nabla_x L(x,\lambda) + \tfrac{1}{2} c\, h(x)^T h(x)$
In any case, $\alpha_k = \arg\min_{\alpha} \{ f(x_k + \alpha d_k) + cP(x_k + \alpha d_k) \}$
Note that some of the penalty functions are non-differentiable.
DO NOT USE LINE SEARCH TECHNIQUES THAT USE DERIVATIVE INFO.
USE ONLY THOSE THAT USE FUNCTION EVALUATIONS (e.g., GS+QI)
$x_{k+1} = x_k + \alpha_k d_k$
Step 4: Check for convergence. If not converged, set $k \leftarrow k + 1$ and go back to Step 2
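A minimal sketch of Step 3, assuming choice 2 for $P$ and a plain golden-section search (the "GS" part of GS+QI; the quadratic-interpolation refinement is omitted). The helper names `l1_merit` and `golden_section`, the search interval $[0, 1]$, and the toy problem are assumptions for illustration only.

```python
import math
import numpy as np

def l1_merit(f, g, h, c):
    """Merit f + cP with P = sum_j max(0, g_j(x)) + sum_i |h_i(x)| (choice 2)."""
    def phi(x):
        P = np.sum(np.maximum(0.0, g(x))) + np.sum(np.abs(h(x)))
        return f(x) + c * P
    return phi

def golden_section(phi1d, a=0.0, b=1.0, tol=1e-8):
    """Derivative-free minimization of a unimodal 1-D function on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = phi1d(x1), phi1d(x2)
    while b - a > tol:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = phi1d(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = phi1d(x2)
    return 0.5 * (a + b)

# Hypothetical problem: f = x1^2 + x2^2, h = x1 + x2 - 1, no inequalities.
phi = l1_merit(lambda x: x @ x,
               lambda x: np.array([]),
               lambda x: np.array([x[0] + x[1] - 1.0]),
               c=2.0)
x_k, d_k = np.zeros(2), np.array([0.5, 0.5])
alpha_k = golden_section(lambda a: phi(x_k + a * d_k))
print(alpha_k)
```

Only function values of the merit function are used, so the kink in $|h_i(x)|$ causes no difficulty.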
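Descent Property of f+cP - 1
Consider the inequality constrained case:
$$f + cP = f(x) + c \sum_{j=1}^{r} [g_j(x)]^+, \qquad [g_j(x)]^+ = \max(0, g_j(x))$$
No loss of generality since $h_i(x) = 0 \Leftrightarrow h_i(x) \le 0$ and $-h_i(x) \le 0$
Let $a(x) = f(x) + c \sum_{j=1}^{r} [g_j(x)]^+$ and $J(x) = \{ j : g_j(x) > 0 \}$
Claim: $a(x + \alpha d) < a(x)$ for sufficiently small $\alpha > 0$
Proof:
$$a(x + \alpha d) = f(x + \alpha d) + c \sum_{j=1}^{r} [g_j(x + \alpha d)]^+$$
$$= f(x) + \alpha \nabla f(x)^T d + c \sum_{j=1}^{r} [g_j(x) + \alpha \nabla g_j(x)^T d]^+ + O(\alpha^2)$$
$$= f(x) + \alpha \nabla f(x)^T d + c \sum_{j=1}^{r} [g_j(x)]^+ + \alpha c \sum_{j \in J(x)} \nabla g_j(x)^T d + O(\alpha^2)$$
This is because $\nabla g_j(x)^T d + g_j(x) \le 0 \Rightarrow \nabla g_j(x)^T d \le 0$ if $g_j(x) = 0$
$$\text{So, } a(x + \alpha d) = a(x) + \alpha \Big[ \nabla f(x)^T d + c \sum_{j \in J(x)} \nabla g_j(x)^T d \Big] + O(\alpha^2)$$
$$\text{Since } c \sum_{j \in J(x)} \nabla g_j(x)^T d \le -c \sum_{j \in J(x)} g_j(x) = -c \sum_{j=1}^{r} [g_j(x)]^+$$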
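Descent Property of f+cP - 2
$$\text{So, } a(x + \alpha d) \le a(x) + \alpha \Big[ \nabla f(x)^T d - c \sum_{j=1}^{r} [g_j(x)]^+ \Big] + O(\alpha^2)$$
From the necessary conditions of optimality, we have
$$\nabla f(x)^T d = -d^T \nabla^2_{xx} L\, d - \sum_{j=1}^{r} \mu_j \nabla g_j(x)^T d$$
$$= -d^T \nabla^2_{xx} L\, d + \sum_{j=1}^{r} \mu_j g_j(x)$$
$$\le -d^T \nabla^2_{xx} L\, d + \max_j(\mu_j) \sum_{j=1}^{r} [g_j(x)]^+$$
where the second line uses $\mu_j [\nabla g_j(x)^T d + g_j(x)] = 0 \Rightarrow -\mu_j \nabla g_j(x)^T d = \mu_j g_j(x)$
$$a(x + \alpha d) \le a(x) - \alpha \Big\{ d^T \nabla^2_{xx} L\, d + [c - \max_j(\mu_j)] \sum_{j=1}^{r} [g_j(x)]^+ \Big\} + O(\alpha^2)$$
Since $\nabla^2_{xx} L$ is PD and if $c > \max_j \mu_j$, then $a(x + \alpha d) < a(x)$ for sufficiently small $\alpha > 0$
• What if $\nabla^2_{xx} L$ is not PD? Use Augmented Lagrangian
• If you don't want to compute the Hessian, use a Quasi-Newton Method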
Quasi-Newton Version of SQP - 1
General Algorithm: Quasi-Newton Version
Step 1: Given an initial estimate of $x_0$ and a PD matrix $B_0$ (approximation to the Hessian) or its square root $L_0$ for the square root version
Step 2: Solve the QPP:
$$\min \tfrac{1}{2} d_k^T B_k d_k + \nabla f(x_k)^T d_k$$
$$\text{s.t. } \nabla h(x_k)^T d_k + h(x_k) = 0$$
$$\nabla g(x_k)^T d_k + g(x_k) \le 0$$
Get $d_k, \lambda_{k+1}, \mu_{k+1}$
Step 3: Perform line search to obtain $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k = \arg\min \{ f + cP \}$
Step 4: Check for convergence. If not converged, go to Step 5
Step 5: Update $B_k$ (or $L_k$) via the generalized BFGS update
$$B_{k+1} = B_k - \frac{B_k p_k p_k^T B_k}{p_k^T B_k p_k} + \frac{\hat q_k \hat q_k^T}{p_k^T \hat q_k}; \qquad \hat q_k = \theta_k q_k + (1 - \theta_k) B_k p_k$$
Suggested value of $\theta_k$ (empirical):
$$\theta_k = \begin{cases} 1 & \text{if } p_k^T q_k \ge 0.2\, p_k^T B_k p_k \\ \dfrac{0.8\, p_k^T B_k p_k}{p_k^T B_k p_k - p_k^T q_k} & \text{if } p_k^T q_k < 0.2\, p_k^T B_k p_k \end{cases}$$
$$q_k = \nabla_x L(x_{k+1},\lambda_{k+1},\mu_{k+1}) - \nabla_x L(x_k,\lambda_{k+1},\mu_{k+1})$$
$$p_k = x_{k+1} - x_k = \alpha_k d_k$$
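The update in Step 5 can be sketched as follows. The function name `damped_bfgs_update` and the test vectors are assumptions for illustration; the formula and the 0.2 / 0.8 constants are those stated in Step 5.

```python
import numpy as np

def damped_bfgs_update(B, p, q):
    """Generalized (damped) BFGS update of the Hessian approximation B.

    p = x_{k+1} - x_k, q = difference of Lagrangian gradients.
    Assumes B is symmetric positive definite and p != 0.
    """
    Bp = B @ p
    pBp = p @ Bp
    pq = p @ q
    if pq >= 0.2 * pBp:
        theta = 1.0                        # ordinary BFGS update
    else:
        theta = 0.8 * pBp / (pBp - pq)     # damping keeps p^T q_hat > 0
    q_hat = theta * q + (1.0 - theta) * Bp
    return B - np.outer(Bp, Bp) / pBp + np.outer(q_hat, q_hat) / (p @ q_hat)

# Hypothetical step with negative curvature (p^T q < 0), where plain BFGS
# would lose positive definiteness.
B_new = damped_bfgs_update(np.eye(2), np.array([1.0, 0.0]), np.array([-1.0, 0.5]))
print(B_new, np.linalg.eigvalsh(B_new))
```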
Quasi-Newton Version of SQP - 2
General Algorithm: Quasi-Newton Version (continued)
$\hat q_k$ is used to ensure $p_k^T \hat q_k > 0$, so that $B_{k+1}$ remains PD:
$$\theta_k = 1 \;\Rightarrow\; \text{BFGS update: } B_{k+1} = B_k + \frac{q_k q_k^T}{p_k^T q_k} - \frac{B_k p_k p_k^T B_k}{p_k^T B_k p_k}$$
$$\theta_k \ne 1 \;\Rightarrow\; \hat q_k = \frac{0.8\, p_k^T B_k p_k}{p_k^T B_k p_k - p_k^T q_k}\, q_k + \frac{0.2\, p_k^T B_k p_k - p_k^T q_k}{p_k^T B_k p_k - p_k^T q_k}\, B_k p_k, \qquad p_k^T \hat q_k = 0.2\, p_k^T B_k p_k > 0$$
• Powell (Math Programming, Vol. 15, 1978) shows that if $\alpha_k = 1$, the method has superlinear convergence. However, one can find problems where $\alpha_k \ne 1$ for $x_k$ arbitrarily close to $x^*$, because the full step increases the merit function $f + cP$:
$$f(x_k + d_k) + c \sum_j |g_j(x_k + d_k)| > f(x_k) + c \sum_j |g_j(x_k)|$$
This is known as the Maratos effect.
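Maratos Effect
$$\min f(x) = 2(x_1^2 + x_2^2 - 1) - x_1 \quad \text{subject to } x_1^2 + x_2^2 - 1 = 0$$
Optimal solution: $x^* = (1, 0)$, $\lambda^* = -1.5$, $\nabla^2_{xx} L(x^*,\lambda^*) = I$
At iteration $k$, $x_k = (\cos\theta, \sin\theta)^T$ $\Rightarrow$ Feasible
$$f(x_k) = -\cos\theta; \qquad \nabla f(x_k) = \begin{bmatrix} 4\cos\theta - 1 \\ 4\sin\theta \end{bmatrix}, \qquad \nabla h(x_k) = \begin{bmatrix} 2\cos\theta \\ 2\sin\theta \end{bmatrix}$$
QPP:
$$\min\; -\cos\theta + (4\cos\theta - 1) d_1 + 4\sin\theta\, d_2 + \tfrac{1}{2} d_1^2 + \tfrac{1}{2} d_2^2$$
$$\text{subject to: } d_1 \cos\theta + d_2 \sin\theta = 0$$
$$d_k = \sin\theta \begin{bmatrix} \sin\theta \\ -\cos\theta \end{bmatrix}; \qquad x_k + d_k = \begin{bmatrix} \cos\theta + \sin^2\theta \\ \sin\theta (1 - \cos\theta) \end{bmatrix}$$
$$\text{Can show } \frac{\|e_{k+1}\|}{\|e_k\|^2} = \frac{2\sin^2(\theta/2)}{(2|\sin(\theta/2)|)^2} = \frac{1}{2} \;\Rightarrow\; \text{quadratically converging}$$
$$\text{However, } f(x_k + d_k) = \sin^2\theta - \cos\theta > -\cos\theta = f(x_k)$$
$$h(x_k + d_k) = \sin^2\theta > h(x_k) = 0$$
Solutions:
1. Use Augmented Lagrangian-based Merit Function
2. Second order correction
3. Allow merit function to increase in some iterations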
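The example can be checked numerically. The sketch below is only an illustration: the angle $\theta = 0.3$, the penalty weight $c = 2$, and the variable names are assumptions. It computes the QPP step from the KKT system with $B_k = I$ and confirms that the step halves the error ratio while increasing both $f$ and $|h|$.

```python
import numpy as np

theta = 0.3                                     # assumed angle of the feasible iterate
x = np.array([np.cos(theta), np.sin(theta)])
x_star = np.array([1.0, 0.0])

f = lambda x: 2.0 * (x @ x - 1.0) - x[0]
h = lambda x: x @ x - 1.0
grad_f = lambda x: np.array([4.0 * x[0] - 1.0, 4.0 * x[1]])
grad_h = lambda x: 2.0 * x

# QPP step with B = I: solve [[I, grad_h], [grad_h^T, 0]] [d; lam] = -[grad_f; h]
K = np.block([[np.eye(2), grad_h(x).reshape(2, 1)],
              [grad_h(x).reshape(1, 2), np.zeros((1, 1))]])
sol = np.linalg.solve(K, -np.concatenate([grad_f(x), [h(x)]]))
d = sol[:2]

ratio = np.linalg.norm(x + d - x_star) / np.linalg.norm(x - x_star) ** 2
c = 2.0                                         # assumed penalty weight, c > |lambda*|
merit = lambda x: f(x) + c * abs(h(x))

print("error ratio          :", ratio)
print("f(x), f(x + d)       :", f(x), f(x + d))
print("merit(x), merit(x+d) :", merit(x), merit(x + d))
```

A line search on $f + cP$ therefore rejects the unit step even though it is an excellent step toward $x^*$.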
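SQP Algorithm with 2nd Order Correction - 1
• Solve two quadratic programs to improve convergence rate:
At step $k$, we have
$$(1) \quad \min \nabla f(x_k)^T d_k + \tfrac{1}{2} d_k^T B_k d_k$$
$$\text{s.t. } h(x_k) + \nabla h(x_k)^T d_k = 0$$
$$g(x_k) + \nabla g(x_k)^T d_k \le 0$$
$$(2) \quad \min \tfrac{1}{2} p_k^T p_k$$
$$\text{s.t. } h(x_k + d_k) + \nabla h(x_k)^T p_k = 0$$
$$g(x_k + d_k) + \nabla g(x_k)^T p_k \le 0$$
$$x_{k+1} = x_k + \alpha_k d_k + \alpha_k^2 p_k; \qquad \alpha_k = \arg\min (f + cP)$$
• Solution of QPP:
$$\min_d\; g_k^T d + \tfrac{1}{2} d^T B_k d \quad \text{s.t. } A_1^T d = b_1, \; A_2^T d \le b_2$$
$$g_k = \nabla f(x_k)$$
$$B_k = \nabla^2_{xx} L, \;\text{ or a QN approximation, or } \nabla^2_{xx} L + c\, \nabla h_k \nabla h_k^T$$
$$A_1 = \nabla h, \quad A_2 = \nabla g; \qquad b_1 = -h, \quad b_2 = -g$$
Suppose we have a feasible point $d_l$:
$$A_1^T d_l = b_1, \qquad A_2^T d_l \le b_2$$
To find one, solve a Phase I LP
$$\min \sum_{i=1}^{m} z_i + \sum_{j=1}^{r} y_j \quad \text{s.t. } A_1^T d_l + z = b_1, \; A_2^T d_l - y \le b_2, \; y \ge 0$$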
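For the two-QP correction step on this slide, a minimal sketch follows, applied to the equality-constrained example used to illustrate the Maratos effect. With only equality constraints, QP (2) has the closed-form least-norm solution $p_k = -\nabla h (\nabla h^T \nabla h)^{-1} h(x_k + d_k)$, which is what the code uses. The angle $\theta = 0.3$, the weight $c = 2$, the unit step $\alpha_k = 1$, and all names are assumptions for illustration.

```python
import numpy as np

theta = 0.3                                     # assumed angle of the feasible iterate
x = np.array([np.cos(theta), np.sin(theta)])

f = lambda x: 2.0 * (x @ x - 1.0) - x[0]
h = lambda x: x @ x - 1.0
c = 2.0                                         # assumed penalty weight
merit = lambda x: f(x) + c * abs(h(x))

# Step from QP (1) with B_k = I (closed form for this example)
d = np.sin(theta) * np.array([np.sin(theta), -np.cos(theta)])

# QP (2): min 0.5 p^T p  s.t.  h(x + d) + grad_h(x)^T p = 0  (least-norm solution)
A = (2.0 * x).reshape(2, 1)                     # grad_h(x), n x m with m = 1
p = -A @ np.linalg.solve(A.T @ A, np.atleast_1d(h(x + d)))

print("h(x + d), h(x + d + p):", h(x + d), h(x + d + p))
print("merit at x, x + d, x + d + p:", merit(x), merit(x + d), merit(x + d + p))
```

With the correction the constraint violation drops from order $\sin^2\theta$ to order $\sin^4\theta$, and the unit step now reduces the merit function.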
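SQP Algorithm with 2nd Order Correction - 2
• Solution of QPP (continued):
At $d_l$: Equality constraints are satisfied and some inequality constraints are active
$$\text{Define } \hat A = [A_1 \;\; \hat A_2], \qquad A_1: \text{equality } (m \text{ columns}), \quad \hat A_2: \text{active inequality } (\le r \text{ columns})$$
$$\text{At optimum } d^*: \quad g_k + B_k d^* + A_1 \lambda + \hat A_2 \hat\mu = 0$$
If we know the active constraints at $d^*$, we can actually solve an equality constrained problem:
$$\min\; g_k^T d + \tfrac{1}{2} d^T B_k d \quad \text{s.t. } \hat A^T d = \hat b; \qquad \hat b = \begin{bmatrix} b_1 \\ \hat b_2 \end{bmatrix}$$
Unfortunately we don't know the active set, so our procedure is iterative:
Start with the current working set $S_l$
Go to the next point $d_{l+1} = d_l + \alpha_l p_l$
See if we need to update $S_l \rightarrow S_{l+1}$
Repeat until convergence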
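SQP Algorithm with 2nd Order Correction - 3
How to get the best $p_l$?
$$\hat A_l = [A_1 \;\; \hat A_{2l}]; \quad \text{Suppose } \hat A_l = Q_l \begin{bmatrix} R_l \\ 0 \end{bmatrix} = [\hat Q_l \;\; \tilde Q_l] \begin{bmatrix} R_l \\ 0 \end{bmatrix} \;\Rightarrow\; \hat A_l = \hat Q_l R_l$$
Here $\hat Q_l$ holds the leading columns of $Q_l$ and $\tilde Q_l$ the remaining columns.
$$R(\hat Q_l) = \text{column space of } \hat A_l$$
$$\tilde Q_l \text{ orthogonal to } \hat A_l \;\Rightarrow\; \tilde Q_l^T \hat A_l = 0 \;\Rightarrow\; \hat A_l^T \tilde Q_l = 0$$
Also, $\hat Q_l^T \tilde Q_l = 0$. Since $d_l$ and $d_{l+1}$ are feasible,
$$\hat A_l^T d_{l+1} - \hat A_l^T d_l = 0 \;\Rightarrow\; \hat A_l^T p_l = 0$$
Since the columns of $\hat Q_l$ and $\tilde Q_l$ span $R^n$, we can write
$$p_l = \hat Q_l y_l + \tilde Q_l z_l$$
$$\hat A_l^T p_l = \hat A_l^T \hat Q_l y_l + \hat A_l^T \tilde Q_l z_l = R_l^T y_l = 0 \;\Rightarrow\; y_l = 0$$
$$\Rightarrow\; p_l = \tilde Q_l z_l$$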
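SQP Algorithm with 2nd Order Correction - 4
• The problem of finding $p_l$ can be written as:
$$\min\; g_k^T (d_l + p_l) + \tfrac{1}{2} (d_l + p_l)^T B_k (d_l + p_l) \quad \text{s.t. } \hat A_l^T p_l = 0$$
$$\Leftrightarrow\; \min\; (g_k + B_k d_l)^T p_l + \tfrac{1}{2} p_l^T B_k p_l \quad \text{s.t. } \hat A_l^T p_l = 0$$
$$\Leftrightarrow\; \min\; \bar g_k^T p_l + \tfrac{1}{2} p_l^T B_k p_l \quad \text{s.t. } \hat A_l^T p_l = 0$$
$$\text{where } \bar g_k = g_k + B_k d_l$$
So, the problem of finding $p_l$ is another QPP, but with simpler constraints $\Rightarrow$ can solve very easily!!!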
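SQP Algorithm with 2nd Order Correction - 5
Optimality Conditions of QPP:
$$\begin{bmatrix} B_k & \hat A_l \\ \hat A_l^T & 0 \end{bmatrix} \begin{bmatrix} p_l \\ \lambda_{l+1} \end{bmatrix} = \begin{bmatrix} -(g_k + B_k d_l) \\ 0 \end{bmatrix}$$
or
$$\begin{bmatrix} B_k & \hat A_l \\ \hat A_l^T & 0 \end{bmatrix} \begin{bmatrix} d_{l+1} \\ \lambda_{l+1} \end{bmatrix} = \begin{bmatrix} -g_k \\ \hat b_l \end{bmatrix}$$
Since $d_{l+1} = \hat Q_l c_l + \tilde Q_l a_l$, with $\hat Q_l^T \hat Q_l = I$ and $\hat Q_l^T \tilde Q_l = 0$:
$$\begin{bmatrix} B_k \hat Q_l & B_k \tilde Q_l & \hat Q_l R_l \\ R_l^T & 0 & 0 \end{bmatrix} \begin{bmatrix} c_l \\ a_l \\ \lambda_{l+1} \end{bmatrix} = \begin{bmatrix} -g_k \\ \hat b_l \end{bmatrix}$$
$$R_l^T c_l = \hat b_l \;\Rightarrow\; \text{solve for } c_l \text{ in } O(n^2) \text{ operations}$$
$$\tilde Q_l^T B_k \tilde Q_l\, a_l = -\tilde Q_l^T [g_k + B_k \hat Q_l c_l]$$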
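SQP Algorithm with 2nd Order Correction - 6
Do Cholesky on $\tilde Q_l^T B_k \tilde Q_l = U_l^T U_l$, so
$$[U_l^T U_l]\, a_l = -\tilde Q_l^T [g_k + B_k \hat Q_l c_l]$$
$$d_{l+1} = \hat Q_l c_l - \tilde Q_l [U_l^T U_l]^{-1} \tilde Q_l^T [g_k + B_k \hat Q_l c_l]$$
$$\text{Finally, } \hat Q_l R_l \lambda_{l+1} = -[g_k + B_k \hat Q_l c_l + B_k \tilde Q_l a_l] = -[g_k + B_k d_{l+1}]$$
$$\text{or } R_l \lambda_{l+1} = -\hat Q_l^T [g_k + B_k d_{l+1}] \;\Rightarrow\; \text{Multiplier vector in } O(n^2) \text{ operations}$$
Update of working set:
If $p_l = 0$ $\Rightarrow$ $d_l$ is optimal w.r.t. the current set of constraints $S_l$
If all $\mu_q \ge 0$ for the inequality constraints, stop $\Rightarrow$ Optimal
If $p_l \ne 0$ but $d_l + p_l$ is feasible for all constraints, $d_{l+1} = d_l + p_l$ is the new point
If $d_l + p_l$ is not feasible $\Rightarrow$ some constraint is violated. So, let $d_{l+1} = d_l + \alpha_l p_l$
$$\text{where: } \alpha_l = \min\Big\{1, \min_{i \notin S_l,\; a_i^T p_l > 0} \frac{b_i - a_i^T d_l}{a_i^T p_l}\Big\}$$
$$i_a = \arg\min_{i \notin S_l,\; a_i^T p_l > 0} \frac{b_i - a_i^T d_l}{a_i^T p_l} \;\Rightarrow\; S_{l+1} = S_l \cup \{i_a\}$$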
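The QR plus Cholesky solution of the equality-constrained subproblem can be sketched as below. This is an illustration under assumptions: the function name `nullspace_eq_qp` and the test data are made up here, the working-set matrix is assumed to have full column rank with fewer columns than rows, and `np.linalg.solve` is used for the triangular systems for brevity (a triangular solver would be used to reach the stated operation counts).

```python
import numpy as np

def nullspace_eq_qp(B, g, A_hat, b_hat):
    """min g^T d + 0.5 d^T B d  s.t.  A_hat^T d = b_hat  (A_hat is n x q, q < n)."""
    n, q = A_hat.shape
    Q, R_full = np.linalg.qr(A_hat, mode="complete")
    Q_hat, Q_til, R = Q[:, :q], Q[:, q:], R_full[:q, :]
    c = np.linalg.solve(R.T, b_hat)                    # R^T c = b_hat
    rhs = -Q_til.T @ (g + B @ (Q_hat @ c))
    U = np.linalg.cholesky(Q_til.T @ B @ Q_til).T      # reduced Hessian = U^T U
    a = np.linalg.solve(U, np.linalg.solve(U.T, rhs))  # U^T U a = rhs
    d = Q_hat @ c + Q_til @ a
    lam = np.linalg.solve(R, -Q_hat.T @ (g + B @ d))   # R lam = -Q_hat^T (g + B d)
    return d, lam

# Hypothetical data: three variables, one constraint d1 + d2 + d3 = 1.
B = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 2.0]])
g = np.array([1.0, -2.0, 0.5])
A_hat = np.array([[1.0], [1.0], [1.0]])
b_hat = np.array([1.0])
d, lam = nullspace_eq_qp(B, g, A_hat, b_hat)
print(d, lam)
```

Only the reduced Hessian $\tilde Q_l^T B_k \tilde Q_l$ needs to be positive definite for the Cholesky step to succeed.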
SQP Algorithm with 2nd Order Correction - 7
• How to drop active inequality constraints:
$d_{l+1}$ is feasible for all constraints in $S_l$
If all $\mu_q \ge 0$ $\Rightarrow$ Optimal
Otherwise, find $i_d = \arg\min_q \{\mu_q\}$, $S_{l+1} = S_l - \{i_d\}$
• Algorithm:
Step 1: Start with an initial feasible $d_0$ and the corresponding working set $S_0$. Set $l = 0$.
Step 2: Solve for $p_l$ and $d_{l+1}$
Step 3: Find step length $\alpha_l$. If $\alpha_l < 1$, append the corresponding constraint $i_a$. So
$$\text{New } \hat A = [\hat A_l \;\; a_{i_a}] = Q_l \begin{bmatrix} R_l & \hat Q_l^T a_{i_a} \\ 0 & \tilde Q_l^T a_{i_a} \end{bmatrix}$$
New $\hat Q = [\hat Q_l \;\; q]$ (one more column); New $\tilde Q$ $\Rightarrow$ complete change
Return to Step 2; else go to Step 4
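SQP Algorithm with 2nd Order Correction - 8
• Algorithm (continued)
Step 4: If $\alpha_l = 1$, compute $\lambda_{l+1}$ (last $\hat r$ components are $\mu$)
Find $\mu_{i_d} = \min_j (\mu_j)$
If $\mu_{i_d} \ge 0$ $\Rightarrow$ Stop
else drop the constraint corresponding to $i_d$:
$$\hat A = [a_1 \; a_2 \; \ldots \; a_{i_d - 1} \; a_{i_d + 1} \; \ldots \; a_{m + \hat r}]$$
$$Q^T \hat A = \begin{bmatrix} M \\ 0 \end{bmatrix}$$
$M$ is upper triangular in cols. 1 to $i_d - 1$, and has elements in the subdiagonals for columns $i_d$ to $m + \hat r - 1$
$$\text{New } \hat Q_l = [q_1 \; q_2 \; \ldots \; q_{m + \hat r - 1}], \qquad \text{New } \tilde Q_l = [q, \; \tilde Q_{old}]$$
Go to Step 2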
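The step-length rule used in Step 3 of the algorithm (the ratio test over inequality constraints outside the working set) can be sketched as follows. The function name `step_length`, the zero-based constraint indices, and the small box-constraint example are assumptions for illustration.

```python
import numpy as np

def step_length(A2, b2, d, p, working_set, eps=1e-12):
    """Largest alpha in [0, 1] keeping A2^T (d + alpha p) <= b2.

    Columns of A2 are the inequality normals a_i. Only constraints outside
    the working set with a_i^T p > 0 can block the step.
    Returns (alpha, i_a); i_a is the blocking index, or None if alpha = 1.
    """
    alpha, i_a = 1.0, None
    for i in range(A2.shape[1]):
        if i in working_set:
            continue
        ap = A2[:, i] @ p
        if ap > eps:
            ratio = (b2[i] - A2[:, i] @ d) / ap
            if ratio < alpha:
                alpha, i_a = ratio, i
    return alpha, i_a

# Hypothetical example: constraints d1 <= 1 and d2 <= 1, start at the origin.
A2, b2 = np.eye(2), np.array([1.0, 1.0])
alpha, i_a = step_length(A2, b2, np.zeros(2), np.array([2.0, 0.5]), working_set=set())
print(alpha, i_a)
```

When a blocking index is returned, that constraint is appended to the working set before the next subproblem is solved.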
SQP Algorithm with 2nd Order Correction - 9
• What if QPP is infeasible? Add artificial variables to detect it.
The modified QPP minimizes $\tfrac{1}{2} d^T B_k d + g_k^T d$ plus a penalty term, weighted by $C$, on the artificial variables, subject to the linearized constraints $\nabla h(x_k)^T d + h(x_k) = 0$ and $\nabla g(x_k)^T d + g(x_k) \le 0$ relaxed by the nonnegative artificial variables $\Rightarrow$ Always feasible
Other Methods:
• M.J.D. Powell, "On the QP Algorithm of Goldfarb and Idnani", MP, 1985, pp. 46-61
• Goldfarb and Idnani, "A numerically stable dual method for solving strictly convex quadratic programs", MP, 1983, pp. 1-33
Summary
Motivation for Successive Quadratic Programming (SQP) Methods
Key SQP Ideas
Newton Version of SQP
Descent Property of Merit Function f+cP
Quasi-Newton Version of SQP
SQP with second order correction