Loughborough University Institutional Repository

A theoretical and computational investigation of a generalized Polak-Ribiere algorithm for unconstrained optimization

This item was submitted to Loughborough University's Institutional Repository by the author.

Additional Information:
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.
Metadata Record: https://dspace.lboro.ac.uk/2134/13193
Publisher: © K.M. Khoda
Please cite the published version.
This item was submitted to Loughborough University as a PhD thesis by the author and is made available in the Institutional Repository
(https://dspace.lboro.ac.uk/) under the following Creative Commons Licence conditions.
For the full text of this licence, please go to: http://creativecommons.org/licenses/by-nc-nd/2.5/
A THEORETICAL AND COMPUTATIONAL
INVESTIGATION OF A GENERALIZED POLAK-RIBIERE ALGORITHM
FOR UNCONSTRAINED OPTIMIZATION
by
KHAN MONZOOR-E-KHODA
A Doctoral Thesis
Submitted in partial fulfilment of the requirements
for the award of the degree of
Doctor of Philosophy
of the Loughborough University of Technology
February, 1992
Supervisor: Professor C. Storey
Department of Mathematical Sciences
© by K. M. Khoda, 1992
This Thesis is dedicated to my Father and Mother
as a token of my grateful appreciation
TABLE OF CONTENTS
ACKNOWLEDGEMENTS vii
SUMMARY OF THE THESIS viii
CHAPTER 1 : INTRODUCTION 1
1.1 General Nature of Optimization 1
1.2 Unconstrained Optimization 2
1.3 Scope and Organization of The Thesis 4
CHAPTER 2 : MATHEMATICAL FOUNDATIONS 5
2.1 Notation 5
2.2 Background Material 7
2.3 Gradient Methods of Minimization 7
2.4 Line Search Strategies 12
CHAPTER 3 : GENERALIZED POLAK-RIBIERE ALGORITHM 15
3.1 Derivation of the Algorithm 16
3.2 General Properties of the Algorithm 19
3.3 Global Convergence Properties of the Algorithm 26
3.4 Rate of Convergence 42
3.5 Characteristic Behaviour and Basic Algorithm 53
CHAPTER 4 : SOME MODIFICATIONS OF THE GPR ALGORITHM AND THEIR IMPLEMENTATIONS 58
4.1 GPR Algorithm with Non-negative Beta 59
4.2 GPR Algorithm with Powell Restart 65
4.3 Shanno's Angle-test Restarting GPR Algorithm 67
4.4 Efficiently Restarting GPR Algorithm 71
4.5 Concluding Remarks 76
CHAPTER 5 : MULTI-TERM RESTARTING GPR ALGORITHMS 77
5.1 Beale Three-Term Restarting GPR Algorithm 78
5.2 Nazareth Three-Term Restarting GPR Algorithm 85
5.3 Concluding Remarks 90
CHAPTER 6 : EXTENSION OF THE GPR ALGORITHM 91
6.1 Theoretical Basis 91
6.2 Algorithm Construction 93
6.3 Implementation and Basic Algorithm 97
6.4 Concluding Remarks 100
CHAPTER 7 : COMPUTATIONAL EXPERIMENTS 101
7.1 Line Search Algorithm 101
7.2 Test Problems 105
7.3 Numerical Results 118
7.4 Discussion of the Results 128
7.5 Concluding Remarks 130
CHAPTER 8 : OPTIMIZED SOFTWARE FOR GENERAL PURPOSE USE 131
8.1 Subroutine Structure 131
8.2 User Interface of the GPR Routine 138
8.3 User-specified Optional Parameters 142
8.4 Precision of the Calculation 150
8.5 Error Indicators 150
8.6 Accuracy of the Solution 150
8.7 Efficiency and Reliability 151
8.8 A Numerical Example 151
8.9 Concluding Remarks 155
CHAPTER 9 : OTHER APPLICATIONS OF THE GPR ROUTINE 157
9.1 Problems and Computational Performance 157
9.2 Concluding Remarks 162
CHAPTER 10 : FINAL CONCLUSIONS 164
10.1 Summary and Comments 164
10.2 Suggestions for Further Research 166
APPENDIX A : SUMMARY OF FAILURES 167
APPENDIX B : QUICK GUIDANCE 170
APPENDIX C : COMPLETE RESULTS 174
APPENDIX D : PROGRAM LISTINGS 282
APPENDIX E : OTHER DEPENDENT SUBROUTINES 351
REFERENCES 358
ACKNOWLEDGEMENTS
I am greatly indebted to my supervisor Professor C. Storey of Loughborough
University of Technology for his guidance and help throughout this work. I
would like to acknowledge especially his enormous effort in correcting my written
English. I would also like to thank him for introducing me to the interesting field
of Optimization. I take this opportunity to express my gratitude to him for all
the advice and encouragement I have received from him. I also acknowledge the
productive interactions that I had with Professor Evans of Computer Studies. I
wish to thank Louise and Helen for helping me with the typesetting.
I am very grateful to my director of research Dr. A. C. Pugh for the enormous
support he gave me throughout my candidature. I sincerely acknowledge all the
assistance I obtained from Mr. R. Tallet and Dr. M.A. Rahin.
I express my deep gratitude to my parents for their patience, to my brothers
for their understanding and to my in-laws for rendering valuable support. I
especially record my debt of gratitude to my father Professor A.F.M. Khodadad
Khan, whose constant advice and encouragement has always been a source of
inspiration for me. My gratitude is also due to my wife Ellora for her constant
inspiration and mental support to achieve my goal.
I would also like to express my gratitude to the Department of Mathematical
Sciences of Loughborough University of Technology for supporting me throughout
my candidature. I sincerely thank the Pilkington Library and the Computer
Centre of Loughborough University of Technology for generously letting me use
their facilities. Finally, I would like to gratefully acknowledge the Commonwealth
Scholarship Commission and the British Council for awarding me a scholarship,
during the tenure of which, this research was carried out.
SUMMARY OF THE THESIS
TITLE
A Theoretical and Computational Investigation of a Generalized Polak-
Ribiere Algorithm for Unconstrained Optimization.
ABSTRACT
In this thesis, a new conjugate gradient type method for unconstrained
minimization is proposed and its theoretical and computational properties investi-
gated. This generalized Polak-Ribiere method is based on the study of the effects
of inexact line searches on conjugate gradient methods. It uses search directions
which are parallel to the Newton direction of the restriction of the objective
function on a two dimensional subspace spanned by the current gradient and a
suitably chosen direction in the span of the previous search direction and the
current gradient. It is shown that the GPR method (as it is called) has excellent
convergence properties under very simple conditions. An algorithm for the new
method is formulated and various implementations of this algorithm are tested.
The results show that the GPR algorithm is very efficient in terms of number
of iterations as well as computational labour and has modest computer storage
requirements.
The thesis also explores extensions of the GPR algorithm by considering
multi-term restarting procedures. Further generalization of the GPR method
based on (m + 1)-dimensional Newton methods is also studied.
Optimized software for the implementation of the GPR algorithm is de-
veloped for general purpose use. By considering standard test problems, the
superiority of the proposed software over some readily available library software
and over the straight-forward Polak-Ribiere algorithm is shown. Software and
user interfaces together with a simple numerical example and some more practical
examples are described for the guidance of the user.
CHAPTER 1 INTRODUCTION
This Thesis is an attempt to add to the theory of nonlinear optimization
which, of late, has emerged as a useful branch of applied mathematics. In the introductory chapter, we discuss briefly the nature of optimization with special
emphasis on the solution of unconstrained problems and give an outline of our
work.
1.1 General Nature of Optimization
Optimization is concerned with getting the best from a given situation
by analysing a set of alternative decisions. This is achieved by selecting a
performance index for the situation under assessment, expressing it in terms of
certain decision variables and then obtaining its best possible value by systematic
adjustment of the variables. The choice of the performance index differs from
situation to situation but generally involves some economic considerations, e.g.,
maximum return on investment, minimum cost per unit yield, etc. It may
also involve some technical considerations such as minimum time of production,
maximum efficiency of machines and so on.
Optimization problems arise in a variety of practical situations. The way
in which the performance index is obtained from the variables of a problem
also varies widely from one situation to another. In some cases, it can only be qualitatively described, whereas mathematical models of many other problems can
be formulated in which the performance indices are described by some suitably
defined objective functions. In the latter case, the problem then reduces to a mathematical programming problem for finding the minimum or maximum value
of the objective function.
Mathematical modeling of optimization in many real-life situations leads
to constrained problems in which the variables are restricted in some way -
sometimes by having simple upper and lower bounds and sometimes by complex
functional constraints. In fact, many complex problems such as, for instance, the
production policy of a big company and the management of a large network are
best treated by decomposing them into separate subproblems - each subproblem
having constraints which are imposed to restrict its scope. On the other hand,
many constrained problems can be converted to unconstrained ones in which the
variables are free to assume all possible values, either by broadening the scope
of the problem or by eliminating some variables using the constraints. Moreover,
the unconstrained problems represent a significant class of practical problems.
Optimization problems have attracted the attention of researchers for a long
time. The earlier problems investigated were geometrical in nature. Later on,
with the development of calculus, a formal theory of optimization grew up. This
classical theory, though rich in theoretical content, is not of much practical value
in numerical computation, especially in dealing with large-scale problems.
Since the advent of electronic computers in the nineteen forties, there has
been a rapid development of theory and practice of optimization. There is now a
massive literature on the subject and vigorous research is still in progress creating
new theory and testing various algorithms. Recent advances in the power and
storage capacities of digital computers have made it possible to deal with large-
scale optimization problems efficiently.
1.2 Unconstrained Optimization
A static unconstrained optimization problem is concerned with finding a local
minimum or maximum of a prescribed real-valued function f : Rn → R of n real variables without any constraint on the variables. Without loss of generality, one
may restrict consideration to minimization problems only, because maximization
can be dealt with by minimization of −f(x1, ..., xn).
Numerous methods have been devised for solving general minimization
problems, the choice and suitability of any particular method being dependent
on the nature and size of the problem. These methods are, in general, iterative
in nature and give procedures for obtaining a sequence of approximate solutions
converging to the actual solution. In practice, such methods start at an initial
estimate of the minimizer and then proceed, according to some fixed rule, to
better and better approximations, terminating at the actual minimizer or at an
acceptable (according to pre-set standards) approximation of the minimizer after
a finite number of iterations. For surveys of some of these techniques, we refer to
Dennis and Schnabel[D1], Gill, Murray and Wright[G1], Wolfe[W1], Walsh[W4], Zoutendijk[Z1].
There are some methods in which the generation of the minimizing sequence
is based simply on comparison of values of the objective function and no use of
derivatives is made. These so-called direct search methods were once thought to be
useful in dealing with problems in which the objective function is not differentiable
or its partial derivatives are hard to evaluate. They are, however, very crude and
generally prove to be less efficient than methods making use of derivative values
no matter how these have to be evaluated.
Problems involving smooth objective functions are best dealt with by
gradient methods. In such methods the minimizing sequence is generated by
determining at each step a direction of search and then locating the best possible
estimate of the minimum point in the line of that direction through an appropriate
choice of the steplength. The search direction at each step, constructed using the
gradient values and sometimes the Hessian values also, is required to be such
that function values initially decrease in that direction. The primary differences
between various gradient methods rest with the way in which the successive
search directions are constructed. Once this is done, all such algorithms call for
choosing the minimum point on the corresponding line (exact line search), though,
in practice, one is satisfied if the steplength satisfies some accepted minimizing
criterion (inexact line search).
The development of efficient algorithms for solving unconstrained optimiza-
tion problems is still an important area of research. This importance is derived
not only from the desire to solve unconstrained problems, but also from the use
made of these algorithms in constrained optimization. Indeed, unconstrained
optimization lies at the heart of the whole of nonlinear optimization.
In the next chapter, we shall give a short account of some gradient methods
of unconstrained minimization as an introduction to our work.
1.3 Scope and Organization of The Thesis
In this thesis, we are concerned with the static unconstrained optimization
problem
P : Minimize f(x), x ∈ Rn,

where the objective function f : Rn → R is, in general, a nonlinear function and is at least twice continuously differentiable. Our study begins with a short review of
some basic results and solution techniques in Chapter 2. Then in Chapter 3, we
develop a new conjugate-gradient type algorithm which is a generalization of the
Polak-Ribiere algorithm and discuss its theoretical and algorithmic properties.
This algorithm, referred to as the Generalized Polak-Ribiere (GPR, in short)
Algorithm in the sequel, is extended and further examined in Chapter 4 and
Chapter 5. An (m + I)-dimensional version of the GPR Algorithm is considered in Chapter 6 and various computational results are discussed in Chapter 7. The
efficiency of the Algorithm and optimized software for its implementation (called
the GPR Routine) are investigated in Chapter 8. The GPR Routine is applied to
some practical problems in Chapter 9 and final conclusions are made in Chapter
10.
CHAPTER 2 MATHEMATICAL FOUNDATIONS
In this Chapter, we set out the notation to be used throughout the Thesis,
discuss some basic results and give short accounts of some solution techniques.
2.1 Notation
In this study, the Euclidean n-space will be denoted by Rn with R1 = R, the
real line. The points x in Rn will be considered as column vectors:

x = (x1, x2, ..., xn)T = (xi),   (2.1.1)

the corresponding row vector being

xT = (x1, x2, ..., xn).   (2.1.2)

The subscript i, always ranging from 1 to n (unless otherwise specified), will be
reserved to indicate vector components, whereas the superscript (k) will be used
to distinguish vectors as x(1), x(2), .... We shall write xTz and ||x|| to indicate
the Euclidean inner product and norm respectively:

xTz = x1z1 + x2z2 + ... + xnzn,   (2.1.3)

||x|| = (xTx)^(1/2).   (2.1.4)
B(x, ε) will denote the ε-ball about x in Rn:

B(x, ε) = {z ∈ Rn : ||z − x|| < ε}.   (2.1.5)
The elements of a matrix will be indicated by double subscripts, the first
index indicating the row and the second index the column. For an n × n matrix
A, IIAII will denote the induced Euclidean norm.
Our notation for the objective function will always be f(·) in the general
case and q(·) in the quadratic case. The gradient vector and the Hessian matrix
of the objective function will be denoted by g(·) and G(·) respectively. Thus, in
the general case with f : Rn → R,

g(x) = ∇f(x) = (∂f/∂xi),  G(x) = ∇²f(x) = (∂²f/∂xi∂xj).   (2.1.6)

In an iterative process for finding the minimum of f(x), we shall denote the
starting point by x(1) and the subsequent iterates by x(2), x(3), etc., and write

f(k) ≜ f(x(k)),  g(k) ≜ g(x(k)),  G(k) ≜ G(x(k)).   (2.1.7)

The search direction at the k-th step will be denoted by s(k) and the steplength
in this direction by α(k), so that

x(k+1) = x(k) + α(k)s(k).   (2.1.8)
Cr will denote the class of r-times continuously differentiable functions f : Rn → R.

(v(1), v(2)) will denote the angle between the two vectors v(1) and v(2).

k ∈ n1, n2 will be used to mean that the integer variable k may assume the values n1 through n2.
For x(1), x(2) ∈ Rn with x(1) ≠ x(2), the line-segment from x(1) to x(2) will be
denoted by [x(1), x(2)] when end points are included and by (x(1), x(2)) when end
points are excluded.

≜ (as above) will be used to indicate a definition and ∎ will mean the end
of a proof.
For convenience of reference, we shall number some statements (equations).
This will be done serially in a section, and will be referred to as (a.b.c), where a is
the chapter number, b is the section number and c is the statement number. The
introductory portion of a chapter is numbered section O.
The lemmas, propositions and theorems will be numbered serially in a
chapter as a.b, where a is the chapter number and b = 1,2, etc.
The tables and figures will also be numbered serially in a section as (a.b.c),
where a and b are the chapter and section number respectively and c = 1,2, etc.
2.2 Background Material
We shall freely use various notions and results from analysis, linear algebra
and optimization theory in our work. All the relevant material used can be
found in standard texts in analysis, linear algebra and optimization (for example,
the text by Dennis and Schnabel[D1] has introductory sections dealing with this
background material).
2.3 Gradient Methods of Minimization
As remarked in Section 1.2, a gradient method for minimizing a smooth
nonlinear function f(x) under no constraints calls for generating a search direction
s(k) at each iteration and a steplength α(k) in that direction so as to determine
the next point

x(k+1) = x(k) + α(k)s(k)   (2.3.1)

satisfying the descent criterion

f(x(k+1)) < f(x(k)).   (2.3.2)
The process stops at x(m) if g(m) = 0 or yields a sequence {x(k)} of points
converging to an approximation to a local minimum x which satisfies some
convergence criterion.
One of the oldest methods is the method of steepest descent, first introduced
by Cauchy[C1]. In this method the directions of search are taken as

s(k) = −g(k).   (2.3.3)
This choice is motivated by the fact that, local to the current approximation, the
negative gradient direction is the direction along which the function decreases
most rapidly. The steepest descent algorithm, though simple and stable (that
is, reduces the function value at each step), has the disadvantage of linear
convergence which may, at times, be extremely slow, and so it is not suitable
for practical use.
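The steepest descent iteration described above can be sketched as follows. This is only an illustrative sketch, not the thesis's implementation: the function names, the Armijo-style backtracking rule and the test problem are our own choices.

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=5000):
    """Cauchy's steepest descent with a simple backtracking line search.

    Illustrative sketch only; names and constants are not from the thesis.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = -g                      # search direction (2.3.3): s(k) = -g(k)
        alpha, fx = 1.0, f(x)
        # Halve alpha until a sufficient-decrease (Armijo-type) version of
        # the descent criterion (2.3.2) holds.
        while f(x + alpha * s) > fx + 1e-4 * alpha * (g @ s):
            alpha *= 0.5
        x = x + alpha * s
    return x

# Example: a simple convex quadratic with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] + 2.0)])
x_min = steepest_descent(f, grad, [0.0, 0.0])
```

Even on this mildly ill-conditioned quadratic the method needs many iterations, which illustrates the slow linear convergence remarked on above.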
Another basic minimization technique is the Newton method, based on the
classical Newton method for solving nonlinear equations (Fletcher[F1], Dennis
and Schnabel[D1], Gill, Murray and Wright[G1]). In this method, the directions
of search are calculated from

s(k) = −G(k)−1 g(k),   (2.3.4)

or equivalently from the linear system

G(k)s(k) = −g(k).   (2.3.5)
The idea behind this method is that a function may be locally approximated
by a quadratic whose minimum can be reached in one step by the above choice
of direction. The Newton algorithm for a general function is not necessarily
convergent, but for C2 functions with positive definite Hessian at x, convergence is quadratic under mild restrictions on f (Fletcher[F1], Wolfe[W1]) if x(k) is near enough to x for some k. This rapid convergence property makes the method
extremely efficient in many cases. However, the method has the disadvantages
that it involves a large amount of computation at each step, in the way of
calculating and inverting the Hessian, or solving a system of linear equations
and it requires quite a large amount of storage in its implementation.
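A minimal sketch of the Newton iteration, solving the linear system (2.3.5) at each step rather than inverting the Hessian; the function names, the unit step (no line search) and the Rosenbrock-type test problem are illustrative assumptions of ours.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method: solve G(k) s(k) = -g(k) (equation (2.3.5)) and take
    a unit step.  Sketch only, assuming the Hessian stays positive-definite
    along the iterates; no globalization safeguards are included."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solving the system is cheaper and more stable than forming G^{-1}.
        s = np.linalg.solve(hess(x), -g)
        x = x + s
    return x

# Rosenbrock function f = 100(x2 - x1^2)^2 + (1 - x1)^2, minimizer (1, 1);
# started close enough, the iterates converge rapidly.
grad = lambda x: np.array([
    -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
    200.0 * (x[1] - x[0] ** 2),
])
hess = lambda x: np.array([
    [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
    [-400.0 * x[0], 200.0],
])
x_min = newton_minimize(grad, hess, [1.2, 1.2])
```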
In a bid to eliminate some of the computational disadvantages of the Newton
method, the so-called quasi-Newton (abbreviated as QN) methods have been
developed. These methods, first introduced by Davidon[D4] and later clarified by
Fletcher and Powell[F4], have the general feature that the search directions are
given by

s(k) = −H(k)g(k),   (2.3.6)

where H(k) is an approximation to G(k)−1 (or G(k) itself) with H(1) symmetric,
positive-definite (usually, H(1) = In, the n × n identity matrix) and the so-called
quasi-Newton conditions

H(k+1)y(k) = d(k),   (2.3.7)

with d(k) = x(k+1) − x(k) and y(k) = g(k+1) − g(k), hold. Besides the Davidon-Fletcher-Powell (DFP) updating formula

H(k+1) = H(k) + d(k)d(k)T / (d(k)Ty(k)) − H(k)y(k)y(k)TH(k) / (y(k)TH(k)y(k)),   (2.3.8)
there are now a variety of QN procedures differing in the ways in which the
matrices H(k) are updated (Fletcher[F1], Dennis and Schnabel[D1], Dennis and
More[D5]). A well-known group of updating matrices is Broyden's θ-family
(Broyden[B4]):

H(k+1) = H(k) − u(k)u(k)T / v(k) + d(k)d(k)T / η(k)
         + θ(k) (u(k) − (v(k)/η(k)) d(k)) (u(k) − (v(k)/η(k)) d(k))T,   (2.3.9a)

where

η(k) = d(k)Ty(k),  u(k) = H(k)y(k),  v(k) = u(k)Ty(k),   (2.3.9b)

and θ(k) is a free parameter. The DFP formula is a particular member of this
class (θ(k) = 0). Another particular member (θ(k) = 1/v(k)) is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula (Broyden[B4], Fletcher[F5], Goldfarb[G5], Shanno[S7]):

H(k+1) = H(k) + (1 + v(k)/η(k)) d(k)d(k)T / η(k) − (d(k)u(k)T + u(k)d(k)T) / η(k),   (2.3.10)
which is still considered to be the most effective of the QN methods (Shanno and
Phua[S5]).
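A single step of the standard inverse BFGS update can be sketched as follows, written in terms of the quantities η(k), u(k) and v(k) defined above; the function and variable names are our own illustrative choices.

```python
import numpy as np

def bfgs_update(H, d, y):
    """One BFGS update of the inverse-Hessian approximation H(k).

    Sketch of the standard inverse BFGS formula with d = x(k+1) - x(k) and
    y = g(k+1) - g(k); variable names are ours, not the thesis's."""
    eta = d @ y                         # eta(k) = d(k)^T y(k), assumed > 0
    u = H @ y                           # u(k)   = H(k) y(k)
    v = u @ y                           # v(k)   = y(k)^T H(k) y(k)
    return (H
            + (1.0 + v / eta) * np.outer(d, d) / eta
            - (np.outer(d, u) + np.outer(u, d)) / eta)

# For a quadratic with Hessian A we have y = A d, and the updated matrix
# must satisfy the quasi-Newton condition H(k+1) y(k) = d(k).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
H = np.eye(2)
d = np.array([1.0, 0.5])
y = A @ d
H = bfgs_update(H, d, y)
```

A quick algebraic check (carried out in the test below) confirms that the updated H reproduces the step: H y = d, i.e. the quasi-Newton condition holds after one update.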
The QN methods have a serious disadvantage, in the case of large scale
problems, and that is the need to store matrices in their implementation. At-
tempts to avoid this difficulty have stimulated research in the area of conjugate
gradient (abbreviated as CG) methods which call only for vectors in their
implementation. Originally proposed by Hestenes and Stiefel[Hl] to solve systems
of linear equations, the CG method was first applied to minimization problems by
Fletcher and Reeves[F2]. The underlying idea is that the minimum of a quadratic
function

q(x) = (1/2) xTAx + bTx + c,   (2.3.11)
where A is symmetric and positive-definite, is obtained in at most n steps through
exact line search along each of n mutually A -conjugate directions. In this case,
the CG search directions are chosen as

s(1) = −g(1),
s(k) = −g(k) + β(k)s(k−1) for k > 1,   (2.3.12)

satisfying the descent condition

g(k)Ts(k) < 0   (2.3.13)

at each step, with the β(k) chosen so that the conjugacy conditions

s(i)TAs(j) = 0,  i, j ∈ 1, n,  i ≠ j,   (2.3.14)

are satisfied.
Several formulae for β(k) have been obtained. Of these, the FR formula
(Fletcher and Reeves[F2])

β(k)FR = g(k)Tg(k) / g(k−1)Tg(k−1),   (2.3.15)

the PR formula (Polak and Ribiere[P1])

β(k)PR = g(k)T(g(k) − g(k−1)) / g(k−1)Tg(k−1),   (2.3.16)
and the HS formula (Hestenes and Stiefel[H1], Sorenson[S1])

β(k)HS = g(k)T(g(k) − g(k−1)) / s(k−1)T(g(k) − g(k−1))   (2.3.17)
are often used. These different formulae for β are completely equivalent on quadratics when exact line searches are used. They can also be used on general nonlinear functions f(·), but then their computational behaviour and efficiency differ considerably from one formula to another.
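The equivalence of the three β formulae on a quadratic with exact line searches can be checked directly. The sketch below is ours: the function names, the sign convention q(x) = (1/2)xTAx − bTx (so the gradient is Ax − b) and the test problem are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def beta_fr(g_new, g_old, s_old):
    # FR formula (2.3.15)
    return (g_new @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, s_old):
    # PR formula (2.3.16)
    return g_new @ (g_new - g_old) / (g_old @ g_old)

def beta_hs(g_new, g_old, s_old):
    # HS formula (2.3.17)
    return g_new @ (g_new - g_old) / (s_old @ (g_new - g_old))

def cg_minimize_quadratic(A, b, x0, beta_rule=beta_pr):
    """CG on q(x) = (1/2) x^T A x - b^T x (A symmetric positive-definite)
    with exact line searches; the minimum is reached in at most n steps."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                       # gradient of the quadratic
    s = -g                              # first direction (2.3.12)
    for _ in range(len(b)):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = -(g @ s) / (s @ A @ s)  # exact line search on a quadratic
        x = x + alpha * s
        g_old, g = g, A @ x - b
        s = -g + beta_rule(g, g_old, s) * s
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_min = cg_minimize_quadratic(A, b, [0.0, 0.0])
```

Swapping `beta_rule=beta_fr` or `beta_hs` produces the same iterates here, illustrating the equivalence on quadratics; the formulae only diverge in behaviour on general nonlinear functions with inexact searches.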
Theoretical and computational properties of different CG methods have been
investigated by many authors (Beale[Bl], Crowder and Wolfe[C3], Powell[P3],
Baptist and Stoer[B6], Stoer[S6], Cohen[C2], Fletcher[F3], Shanno[S2], Hu and
Storey[H3], Wolfe[W2,W3]). Though the FR method has nice global convergence
under very mild conditions (Zoutendijk[Z2], Powell[P5], Al-Baali[Al]), no such
satisfactory global convergence results are available for the PR and HS methods
(Gilbert and Nocedal[G3]). It has also been observed (Powell [P4]) that the PR
method is unlikely to have global convergence without some restrictive conditions.
On the other hand, the numerical performance of the PR method has been found
to be superior to that of the FR method in most cases. Recently, some quite
efficient hybrid CG methods have been proposed (Touati-Ahmed and Storey[Tl],
Gilbert and Nocedal[G3]). Attempts to improve upon the performance of the
CG methods have also led to some generalizations. These include Beale's and
Nazareth's three-term recurrence methods (Beale[B1], Dixon, Ducksbury and
Singh[D2], Nazareth[N4,N3], Dixon[D3]) and the generalized CG method of Liu
and Storey[L2]. This latter method (abbreviated as the LS method) is in fact
a two-dimensional Newton method in the sense that it uses as the next search
direction s(k) the Newton direction of the restriction of f on span{g(k), s(k-I)}, with g(k) the current gradient and s(k-I) the previous search direction. Thus the
LS algorithm uses the search direction
(2.3.18a)

with

( g(k)TG(k)g(k)      g(k)TG(k)s(k−1)   )−1 ( g(k)Tg(k)    )
( s(k−1)TG(k)g(k)    s(k−1)TG(k)s(k−1) )    ( g(k)Ts(k−1) )

(2.3.18b)
Both the QN and the CG methods have their advantages and disadvantages.
There have been several attempts to combine the two methods so as to obtain
algorithms with the good convergence properties of the QN methods and low stor-
age requirements of the CG methods. Work along these lines include Perry[P6],
Shanno[S3,S4], Buckley[B5], Shanno and Phua[S8], Nazareth[Nl], Nocedal[N2],
Buckley and LeNir[B2,B3], Liu and Nocedal[L1] and Gill and Murray[G4]. As a
result, some variable storage CG methods or limited memory QN methods have
been developed with a good trade-off between memory and efficiency.
2.4 Line Search Strategies
Any descent method of function minimization involves a one-dimensional
line search at each iteration for locating the next acceptable approximation to
the minimizer. Thus, at the current point x(k), if g(k) ≠ 0, we choose a descent
direction s(k) satisfying (2.3.13) and then determine an admissible steplength
α(k) > 0 such that the descent criterion (2.3.2) is satisfied at the next point x(k+1)
defined by (2.3.1). The descent condition (2.3.13) ensures that for all sufficiently
small α > 0, f(x(k) + αs(k)) < f(x(k)), and hence one can always choose α(k) > 0
such that (2.3.2) holds. In practice any α ∈ (0, ᾱ(k)), where

ᾱ(k) = inf{α > 0 : f(x(k) + αs(k)) = f(x(k))},   (2.4.1)

is accepted as α(k), subject to certain conditions to ensure a sufficient decrease
f(k) − f(k+1) in the function value. Notice that ᾱ(k) is the least positive number
for which f(x(k) + ᾱ(k)s(k)) = f(x(k)) if such a number exists; otherwise ᾱ(k) = ∞.
In exact line search at x(k), the steplength α(k) is taken to be the value of α
that minimizes the function

φ(k)(α) = f(x(k) + αs(k))   (2.4.2)

in (0, ᾱ(k)), provided such a minimizer exists. Thus, according to exact line search,

α(k) = arg min {φ(k)(α) : α ∈ (0, ᾱ(k))}.   (2.4.3)

Assuming the existence of stationary points of φ(k)(·) in (0, ᾱ(k)), we then have
the exact line search condition

φ(k)′(α(k)) = g(x(k) + α(k)s(k))Ts(k) = 0.   (2.4.4)
The determination of α(k) by exact line search involves the minimization of
the nonlinear function φ(k), or solving the nonlinear equation φ(k)′(α) = 0, which
is usually expensive to carry out. Moreover, φ(k) may not have a minimizer
or a stationary point in (0, ∞). Therefore, exact line search has only theoretical
importance and, in practice, alternative inexact line search strategies are preferred.
Indeed, many efficient line search techniques have been proposed and tested.
These are, in fact, based on a "one dimensional" minimization using a combination
of interval reduction and quadratic or cubic interpolation techniques depending
on the availability of gradient information. For a discussion of such inexact line
search methods, we refer to Fletcher[F1], Dennis and Schnabel[D1]' Gill, Murray
and Wright[G1], Wolfe[W1].
In choosing a steplength α(k) at a current point x(k), we need to stay away
from the end points of the interval (0, ᾱ(k)) in order to produce a significant
decrease in the function value. The Goldstein requirement (Goldstein[G6])

f(x(k+1)) ≤ f(x(k)) + c1 α(k) g(k)Ts(k)   (2.4.5)

with 0 < c1 < 1/2 ensures that α(k) is not too close to ᾱ(k) by restricting the
average rate of decrease of f(x) in moving from x(k) to x(k+1) along s(k) to be at
least some prescribed fraction of the initial rate of decrease in that direction (see
Figure 2.4.1 below). On the other hand, the Wolfe condition (Wolfe[W2,W3])

g(k+1)Ts(k) ≥ c2 g(k)Ts(k)   (2.4.6)

with 0 < c2 < 1 ensures that α(k) is not too small by requiring the rate of decrease
of f at x(k+1) in the direction s(k) to be larger than some prescribed fraction of the
initial rate of decrease (see Figure 2.4.1 below). The restriction 0 < c1 < c2 < 1
guarantees that (2.4.5) and (2.4.6) can be satisfied by α(k) ∈ (0, ᾱ(k)) (Wolfe[W2],
Powell[P8]).
In recent studies, the strong Wolfe condition

|g(k+1)Ts(k)| ≤ −c2 g(k)Ts(k)   (2.4.7)

together with the Goldstein condition (2.4.5) subject to 0 < c1 < c2 < 1 are
often preferred as line search requirements (Fletcher[F1], Al-Baali[A1], Al-Baali
and Fletcher[A2], Liu and Storey[L2]). We call the combination of these
two conditions the Wolfe-Powell Conditions. Conditions (2.4.5) and (2.4.7) are
sometimes referred to as strong Wolfe conditions (Gilbert and Nocedal[G3]).
Figure 2.4.1. Permissible range for α(k) under conditions (2.4.5) and (2.4.6).
It is remarked that the value of c2 determines the accuracy with which α(k)
approximates a stationary point of f along s(k), and consequently provides a
means of controlling the balance of effort to be expended in computing α(k). In
general, the smaller the value of c2, the more accurate the line search is. Obviously,
if c2 = 0, the line search is exact.
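The acceptance test combining the sufficient-decrease condition (2.4.5) with the strong Wolfe curvature bound (2.4.7) can be sketched as follows. The constants c1 and c2, the naive halving strategy and all names are illustrative choices of ours; practical routines use interval reduction with interpolation, as noted above.

```python
import numpy as np

def wolfe_powell_ok(f, grad, x, s, alpha, c1=1e-4, c2=0.1):
    """Check a trial steplength alpha against sufficient decrease (2.4.5)
    and the strong Wolfe curvature bound (2.4.7).  c1 < c2 in (0, 1) are
    illustrative constants, not values prescribed by the thesis."""
    phi0, dphi0 = f(x), grad(x) @ s          # phi(0) and phi'(0) < 0
    x_new = x + alpha * s
    decrease = f(x_new) <= phi0 + c1 * alpha * dphi0       # (2.4.5)
    curvature = abs(grad(x_new) @ s) <= -c2 * dphi0        # (2.4.7)
    return decrease and curvature

def line_search(f, grad, x, s, alpha0=1.0, max_halvings=60):
    """Naive sketch: halve alpha until both conditions hold.  A real
    implementation would bracket and interpolate instead."""
    alpha = alpha0
    for _ in range(max_halvings):
        if wolfe_powell_ok(f, grad, x, s, alpha):
            return alpha
        alpha *= 0.5
    return alpha

f = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])
x = np.array([0.0, 0.0])
s = -grad(x)                                 # steepest descent direction
alpha = line_search(f, grad, x, s)
```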
CHAPTER 3 GENERALIZED POLAK-RIBIERE ALGORITHM
In this Chapter, we develop a new type of conjugate gradient algorithm for
finding a local solution to the problem,
[P] Minimize f(x), x ∈ Rn,
and discuss various theoretical properties of the algorithm. The search directions
in the algorithm, as we shall see, are generalizations of those in the Polak-Ribiere
method, and so the algorithm is called the Generalized Polak-Ribiere Algorithm
(GPR Algorithm, in short).
In [P], the objective function f : Rn → R is, in general, nonlinear and it is
assumed throughout the sequel (whether stated explicitly or not) that the
following conditions hold:

[AP-1] f is twice continuously differentiable.
[AP-2] ∃ x(1) ∈ Rn such that the level set

L(x(1)) = {x ∈ Rn : f(x) ≤ f(x(1))}   (3.0.1)

is bounded.

Additional conditions will be added whenever necessary.
It may be observed that
(i) By [AP-1], the Hessian G(x) is symmetric for all x ∈ Rn.
(ii) By [AP-1], the level set L(x(1)) in [AP-2] is closed and hence it is
compact.
(iii) The objective function f(·), the gradient g(·) and the Hessian G(·), being
continuous, are bounded on the compact set L(x(1)) with x(1) as in [AP-2].

Defining

M ≜ sup{||G(x)|| : x ∈ L(x(1))},   (3.0.2)

we have then

||G(x)|| ≤ M for all x ∈ L(x(1)).   (3.0.3)
3.1 Derivation of the Algorithm
We begin with an estimate x(1) of a local minimizer x of f and take the
initial search direction as the steepest descent direction at x(1):

s(1) = −g(1).   (3.1.1)

To determine the search direction s(k) for the k-th iteration (k > 1) from the
current point x(k), we proceed as follows:

Let F(x + αs) denote the quadratic approximation to f(x + αs), obtained by
truncating the Taylor series expansion of f(x + αs):

F(x + αs) = f(x) + αgTs + (1/2)α²sTGs,   (3.1.2)

where g ≜ g(x) and G ≜ G(x). Assuming that G(x) is positive-definite, we can
write

min over α of (F(x + αs) − f(x)) = −(1/2) (gTs)² / (sTGs) = −(1/2) V,   (3.1.3)

where V = V(x, s) is given by

V(x, s) = (gTs)² / (sTGs),   (3.1.4)
and the minimum occurs for

α = −gTs / (sTGs).   (3.1.5)
We now set
s = -g + {3p, (3.1.6) where p is an arbitrary but fixed vector in Rn such that p and 9 are linearly independent and {3 is a nonzero real variable, and minimize (3.1.3) as a function
of {3. This demands that we choose {3 such that
v _ (gT(_g+{3p))2 ({3) - (_g + {3p)TG( -g + {3p)
(gTg _ fJgTp)2 (3.1.7)
is maximillll. Here the denominator is positive for all {3 in view of positive-
definiteness of G.
The value of β for which (3.1.7) is maximum must satisfy the equation

(g^T g − β g^T p) [ (g^T g)(g^T G p) − (g^T p)(g^T G g) + β ((g^T p)(g^T G p) − (g^T g)(p^T G p)) ] = 0,    (3.1.8)

obtained by setting dV/dβ = 0. For g^T p = 0, the maximum occurs at

β = g^T G p / (p^T G p).    (3.1.9)

In general, (3.1.8) has the roots

β₁ = g^T g / g^T p    (3.1.10)

and

β₂ = [ (g^T p)(g^T G g) − (g^T g)(g^T G p) ] / [ (g^T p)(g^T G p) − (g^T g)(p^T G p) ],    (3.1.11)

provided the denominator in β₂ is nonzero. The search direction s₁ = −g + β₁ p corresponding to (3.1.10) is not a descent direction, as g^T s₁ = 0, and is of no importance to us (in fact β₁ makes V take its minimum value 0). The search direction s₂ = −g + β₂ p corresponding to (3.1.11) forms the basis of the proposed algorithm. The LS algorithm, studied in Liu and Storey[L2] and Hu and Storey[H2], is also based on s₂. Notice that (3.1.11) reduces to (3.1.9) for g^T p = 0.
We now let

p = s⁻ + γ g    (3.1.12)

in span{s⁻, g}, where s⁻ is the search direction in the previous iteration, and γ ≠ 0 is determined so that

g^T p = 0.    (3.1.13)

This requires

γ = − g^T s⁻ / g^T g.    (3.1.14)

The current search direction is then defined by (3.1.6) with p described by (3.1.12) and (3.1.14) and β given by (3.1.9).
If we denote p by s̄, we then have the following iterative process for the GPR algorithm from the initial estimate x^(1) for the minimizer x*:

s^(k) = −g^(1) for k = 1,    (3.1.15a)
s^(k) = −g^(k) + β_GPR^(k) s̄^(k-1) for k > 1,    (3.1.15b)
s̄^(k-1) = s^(k-1) − (g^(k)T s^(k-1) / g^(k)T g^(k)) g^(k) for k > 1,    (3.1.15c)
β_GPR^(k) = g^(k)T G^(k) s̄^(k-1) / (s̄^(k-1)T G^(k) s̄^(k-1)) for k > 1.    (3.1.15d)
It may be remarked that the stopping condition will be activated whenever g^(k) = 0 at any iteration, and so we can assume that g^(k) ≠ 0 as long as the iteration continues. Moreover, it follows from (3.1.15c) that

g^(k)T s̄^(k-1) = 0    (3.1.16)

and hence, from (3.1.15b), we have

g^(k)T s^(k) = −||g^(k)||² < 0    (3.1.17)

as long as g^(k) ≠ 0. This shows that:
Proposition 3.1. In the GPR Algorithm, s^(k) is a descent direction from x^(k).
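In code, the direction update (3.1.15b)-(3.1.15d) is only a few lines. The sketch below is illustrative rather than the thesis's implementation: it assumes the Hessian G^(k) is available explicitly (section 3.5 replaces the products G^(k)s̄^(k-1) by finite differences), and the function name is ours.

```python
import numpy as np

def gpr_direction(g, s_prev, G):
    """One GPR search-direction update, following (3.1.15b)-(3.1.15d).

    g      -- current gradient g(k)
    s_prev -- previous search direction s(k-1)
    G      -- current Hessian G(k), assumed available explicitly here
    """
    # s_bar(k-1): component of s(k-1) orthogonal to g(k), as in (3.1.15c)
    s_bar = s_prev - (g @ s_prev) / (g @ g) * g
    # beta = g^T G s_bar / (s_bar^T G s_bar), as in (3.1.15d)
    beta = (g @ G @ s_bar) / (s_bar @ G @ s_bar)
    # s(k) = -g(k) + beta * s_bar(k-1), as in (3.1.15b)
    return -g + beta * s_bar
```

Since g^(k)T s̄^(k-1) = 0 by construction, the returned direction satisfies g^(k)T s^(k) = −||g^(k)||², which is exactly the descent property of Proposition 3.1.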
The steplength α^(k) at each iteration is determined by a one-dimensional line search (see section 2.4) along s^(k) so that

f(x^(k) + α^(k) s^(k)) < f(x^(k)).    (3.1.18)

For an exact line search,

α^(k) = arg min_{α>0} f(x^(k) + α s^(k)),    (3.1.19)

so that

g^(k+1)T s^(k) = 0.    (3.1.20)
at x^(1) satisfying [AP-2]. Besides the general problem [P], the quadratic case, namely,

[Q] Minimize q(x), x ∈ R^n,

where

q(x) = (1/2) x^T A x + b^T x,    (3.2.1)

will also be considered. In dealing with [Q], it will be assumed throughout the sequel that

[AQ] The Hessian A is symmetric and positive-definite.
It may be noted that for the quadratic function q(·),

g(x) = Ax + b,  G(x) = A    (3.2.2)

and hence

y^(k) = g^(k+1) − g^(k) = α^(k) A s^(k),    (3.2.3)

where y^(k) is as in (2.1.7). In this case, the steplength satisfies

α^(k) = s^(k)T y^(k) / (s^(k)T A s^(k)).    (3.2.4)

In case of exact line search, (3.2.4) becomes

α^(k) = g^(k)T g^(k) / (s^(k)T A s^(k))    (3.2.5)

by the exact line search condition (3.1.20) and the descent condition (3.1.17).
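The exact steplength (3.2.5) is easy to check numerically on a small instance of [Q]; the matrix A and vector b below are illustrative choices, not data from the thesis.

```python
import numpy as np

# Illustrative quadratic q(x) = 0.5 x^T A x + b^T x with A symmetric
# positive-definite, as required by [AQ].
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.array([2.0, 1.0])

g = A @ x + b                    # gradient g(x) = Ax + b, by (3.2.2)
s = -g                           # a descent direction (steepest descent)
alpha = (g @ g) / (s @ A @ s)    # exact steplength, as in (3.2.5)

x_new = x + alpha * s
g_new = A @ x_new + b
# Exactness of the line search shows up as g(x_new)^T s = 0, i.e. (3.1.20).
```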
In what follows, unless explicitly referred to [Q], we shall consider that the GPR algorithm is applied to [P].
The GPR algorithm has the property that s^(k) is conjugate to s̄^(k-1) with respect to G^(k) for all k > 1. Indeed, we have,

Proposition 3.2. The GPR algorithm satisfies

s^(k)T G^(k) s̄^(k-1) = 0    (3.2.6)

for all k > 1.
Proof. This follows directly from (3.1.15b) and (3.1.15d). I
By applying the mean value theorem to g(·), we obtain, according to the GPR algorithm,

g^(k+1) = g^(k) + α^(k) Ḡ^(k) s^(k),    (3.2.7)

where

Ḡ^(k) ≜ ∫₀¹ G(x^(k) + t α^(k) s^(k)) dt    (3.2.8)
     = G(ξ^(k))    (3.2.9)

for some ξ^(k) ∈ (x^(k), x^(k+1)).
Now, if ||d^(k)|| = ||x^(k+1) − x^(k)|| is sufficiently small, then, since G(·) is continuous, we can approximate Ḡ^(k) by G^(k). So, in this case, if exact line searches are carried out (in which case s̄^(k-1) = s^(k-1) by (3.1.15c) and (3.1.20)), the classical conjugate gradient properties below hold for [Q].
Proposition 3.3. If the GPR algorithm is applied to [Q] and exact line searches are used throughout, then

s^(k)T A s^(j) = 0,    (3.2.11)
g^(k)T g^(j) = 0    (3.2.12)

for all k > 1 and j ∈ [1, k − 1].
Proposition 3.4. If the GPR algorithm is applied to [Q] and an exact line search is carried out at each iteration, then

(3.2.13)

for k > 1 and j ∈ [1, k − 1].
Proposition 3.5. If the GPR algorithm is applied to [Q] with exact line searches, then the algorithm terminates at a stationary point x^(m+1) after m ≤ n iterations, where m is the number of distinct eigenvalues of A.

Notice, however, that convergence is not, in general, obtained in a finite number of steps if the objective function is not quadratic, and the number of iterations required to attain a given accuracy depends upon the initial estimate x^(1) of the minimizer x*.

We now consider some relations between the magnitudes of different quantities occurring in the GPR algorithm applied to the general problem [P], which we shall use in the subsequent analysis.
Proposition 3.6. In the GPR algorithm,

(a) ||s^(k)||² = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k-1)||², k > 1,    (3.2.14a)
(b) ||g^(k)|| ≤ ||s^(k)||, k ≥ 1,    (3.2.14b)
(c) ||s̄^(k)|| ≤ ||s^(k)||, k ≥ 1.    (3.2.14c)

Proof. From (3.1.15b), we get, ∀ k > 1,

||s^(k)||² = (−g^(k) + β_GPR^(k) s̄^(k-1))^T (−g^(k) + β_GPR^(k) s̄^(k-1))
          = ||g^(k)||² + (β_GPR^(k))² ||s̄^(k-1)||²,

since g^(k)T s̄^(k-1) = 0 by (3.1.16); this proves (a), and (b) follows at once. For (c), (3.1.15c) expresses s̄^(k) as the component of s^(k) orthogonal to g^(k+1), so ||s̄^(k)|| ≤ ||s^(k)||. ∎
Proposition 3.8. For all k > 1,

|β_GPR^(k)| ≤ (M/m) ||g^(k)|| / ||s̄^(k-1)||.    (3.2.19)

Proof. From (3.1.15d), (3.0.2) and (3.2.18), we have, ∀ k > 1,

|β_GPR^(k)| ≤ ||g^(k)|| ||G^(k)|| ||s̄^(k-1)|| / (m ||s̄^(k-1)||²)
           ≤ (M/m) ||g^(k)|| / ||s̄^(k-1)||. ∎
Proposition 3.9. For all k > 1,

||s^(k)|| ≤ (1 + M/m) ||g^(k)||.    (3.2.20)

Proof. From (3.1.15b), we get, ∀ k > 1,

||s^(k)|| ≤ ||g^(k)|| + |β_GPR^(k)| ||s̄^(k-1)||
         ≤ ||g^(k)|| + (M/m) ||g^(k)||,

using (3.2.19). ∎

Proposition 3.10. There exists τ > 0 such that

cos θ^(k) ≥ τ    (3.2.21)

for all k, where θ^(k) ≜ ∠(−g^(k), s^(k)).

Proof. This follows from (3.2.15) and (3.2.20) with τ = (1 + M/m)^(-1) > 0. ∎
When the GPR algorithm is implemented with exact line search at each step, we have, from (3.2.7), using the exact line search condition (3.1.20) and the descent condition (3.1.17),

α^(k) = ||g^(k)||² / (s^(k)T Ḡ^(k) s^(k)).    (3.2.22)
Proposition 3.11. For all k,

1/(M(1 + M/m)²) ≤ α^(k) ≤ 1/m.    (3.2.23a)

Proof. From (3.2.9) and (3.2.18), it follows that

m ||s^(k)||² ≤ s^(k)T Ḡ^(k) s^(k) ≤ M ||s^(k)||²    (3.2.23b)

for all k and hence, by (3.2.22),

||g^(k)||² / (M ||s^(k)||²) ≤ α^(k) ≤ ||g^(k)||² / (m ||s^(k)||²).    (3.2.23c)

Since, by (3.2.14b) and (3.2.20),

(1 + M/m)^(-1) ≤ ||g^(k)|| / ||s^(k)|| ≤ 1,

we have (3.2.23a). ∎

Proposition 3.12. For all k,

||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||.    (3.2.24)

Proof. From (3.2.7) and (3.2.9), we have,

g^(k+1) = g^(k) + α^(k) G(ξ) s^(k)

for some ξ ∈ (x^(k), x^(k+1)) ⊂ L(x^(1)). Hence, using (3.2.14b), (3.2.23a) and (3.0.2), we conclude that

||g^(k+1)|| ≤ (1 + M/m) ||s^(k)||. ∎

Proposition 3.13. For all k > 1,

|β_GPR^(k)| ≤ (1 + M/m)(M/m).    (3.2.25)

Proof. In view of exact line search, we have,

s̄^(k-1) = s^(k-1)

for all k > 1 and hence (3.2.25) is obtained from (3.2.19) using (3.2.24). ∎

Proposition 3.14. For all k > 1,

||s^(k)|| ≤ (1 + M/m)² ||s^(k-1)||.    (3.2.26)

Proof. This follows from (3.2.20) and (3.2.24). ∎
3.3 Global Convergence Properties of the Algorithm
In this section, we discuss global convergence properties of the GPR algorithm applied to [P] under standard line search strategies (as discussed in section 2.4). Throughout the section, it is assumed that the conditions [AP-1] and [AP-2] hold and the GPR algorithm is initiated at x^(1) satisfying [AP-2].
We first observe that in view of Proposition 3.1, the GPR algorithm with exact line search satisfying conditions (3.1.19) and (3.1.20), or with inexact line search satisfying the Wolfe-Powell conditions

[W-1] f^(k+1) ≤ f^(k) + c₁ α^(k) g^(k)T s^(k),    (3.3.1)
[W-2] |g^(k+1)T s^(k)| ≤ −c₂ g^(k)T s^(k),    (3.3.2)

where 0 < c₁ < c₂ < 1, leads to the inequality

f^(k+1) ≤ f^(k) − ρ (g^(k)T s^(k))² / ||s^(k)||²    (3.3.3)

with some ρ > 0. This is established in Proposition 3.15 and Proposition 3.16. The proofs depend on the descent property (3.1.17) and are valid for any descent algorithm.
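A line search routine accepts a trial steplength precisely when [W-1] and [W-2] hold, which is mechanical to verify. The following sketch is illustrative (the function names, the test function and the default values of c₁, c₂ are ours; the constants only need 0 < c₁ < c₂ < 1):

```python
import numpy as np

def wolfe_conditions(f, grad, x, s, alpha, c1=1e-4, c2=0.9):
    """Check the Wolfe-Powell conditions for a trial steplength alpha.

    [W-1]: f(x + a s) <= f(x) + c1 * a * g^T s   (sufficient decrease)
    [W-2]: |g(x + a s)^T s| <= -c2 * g^T s       (curvature condition)
    Here s must be a descent direction, so g^T s < 0.
    """
    gts = grad(x) @ s
    x_new = x + alpha * s
    w1 = f(x_new) <= f(x) + c1 * alpha * gts
    w2 = abs(grad(x_new) @ s) <= -c2 * gts
    return w1 and w2

# Illustrative test function f(x) = ||x||^2 / 2, with gradient x.
f = lambda x: 0.5 * (x @ x)
grad = lambda x: x
x0 = np.array([1.0, 1.0])
s0 = -grad(x0)       # descent direction
```

For this f, the exact minimizing step alpha = 1 satisfies both conditions, while a grossly overlong step such as alpha = 10 violates [W-1].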
Proposition 3.15. If an exact line search is performed at each iteration with the GPR algorithm, then the inequality (3.3.3) with some ρ > 0 holds for all k.

Proof. By the exact line search condition (3.1.19), we have,

f^(k+1) ≤ f(x^(k) + α s^(k)) for all α ∈ (0, α̂^(k)),    (3.3.4a)

where α̂^(k) is as defined in (2.4.1).
But, for 0 < α < α̂^(k), we have, by the Taylor formula,

f(x^(k) + α s^(k)) = f^(k) + α g^(k)T s^(k) + (1/2) α² s^(k)T G(x^(k) + ν α s^(k)) s^(k)

for some ν ∈ (0,1). Since the segment [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), so, using (3.0.3), we have,

f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²    (3.3.4b)

for 0 < α < α̂^(k). The quadratic polynomial

p^(k)(α) ≜ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||²

attains its minimum value

p_min^(k) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²)

at

ᾱ^(k) = − g^(k)T s^(k) / (M ||s^(k)||²),    (3.3.4c)

which is positive by (3.1.17). Since p^(k)(α) is decreasing on (0, ᾱ^(k)) and increasing on (ᾱ^(k), ∞), it follows that ᾱ^(k) ≤ α̂^(k), and hence, by (3.3.4a) and (3.3.4b),

f^(k+1) ≤ p^(k)(ᾱ^(k)) = f^(k) − (g^(k)T s^(k))² / (2M ||s^(k)||²),

so that (3.3.3) holds with ρ = 1/(2M). ∎
We next show that acceptable steplengths exist in line searches using the Wolfe-Powell conditions [W-1] and [W-2] for c₁, c₂ satisfying 0 < c₁ < 1/2 and c₁ < c₂ < 1. The proofs, though standard for any descent algorithm, are included for the sake of completeness.

Lemma 3.17a. For any c₁ ∈ (0, 1/2), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-1].

Proof. For α > 0 such that [x^(k), x^(k) + α s^(k)] ⊂ L(x^(1)), we have, as in the case of (3.3.4b),

f(x^(k) + α s^(k)) ≤ f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||².    (3.3.6a)

Notice that for any c₁ ∈ (0, 1/2), the inequality

f^(k) + α g^(k)T s^(k) + (1/2) α² M ||s^(k)||² ≤ f^(k) + c₁ α g^(k)T s^(k)

holds, by (3.1.17), for all α with

0 < α ≤ ᾰ^(k) ≜ 2(1 − c₁) ||g^(k)||² / (M ||s^(k)||²).    (3.3.6b)

Since f(x^(k) + α s^(k)) initially decreases along s^(k), either there exists a least positive α₀^(k) such that

f(x^(k) + α₀^(k) s^(k)) = f^(k),    (3.3.6c)

or else

f(x^(k) + α s^(k)) < f^(k)

for all α > 0.
In the first case, we notice, from (3.3.6a), that

α₀^(k) ≥ 2 ||g^(k)||² / (M ||s^(k)||²),

which is greater than ᾰ^(k) for 0 < c₁ < 1/2. So, in either case, any positive α^(k) ≤ ᾰ^(k) will satisfy (3.3.1). ∎
Lemma 3.17b. For any c₂ ∈ (0, 1), steplengths α^(k) > 0 can be determined in a line search at x^(k) with the GPR algorithm satisfying [W-2].
Proof. The proof is by contradiction.
Suppose that for some c₂ ∈ (0, 1) and all α > 0,

|g(x^(k) + α s^(k))^T s^(k)| ≥ −c₂ g^(k)T s^(k),    (3.3.7a)

that is,

g(x^(k) + α s^(k))^T s^(k) ≥ c₂ ||g^(k)||²    (3.3.7b)

or

g(x^(k) + α s^(k))^T s^(k) ≤ −c₂ ||g^(k)||².    (3.3.7c)

We notice that the function

φ(t) ≜ f(x^(k) + t s^(k))

has derivative

φ′(t) = g(x^(k) + t s^(k))^T s^(k).    (3.3.7d)

If (3.3.7b) holds, then (3.3.7d), (3.3.7b) and (3.1.17) imply that φ′(t) > 0 for all t > 0 and hence φ(0) < φ(α), that is, f(x^(k)) < f(x^(k) + α s^(k)) for all α > 0. This contradicts the fact that s^(k) is a descent direction. So (3.3.7b) cannot hold.
If (3.3.7c) holds, then it follows from (3.3.7d), (3.3.7c) and (3.1.17) that

φ(α) − φ(0) = ∫₀^α φ′(t) dt ≤ −c₂ α ||g^(k)||²,

that is,

f(x^(k) + α s^(k)) ≤ f^(k) − c₂ α ||g^(k)||²    (3.3.7e)

for all α > 0. It then follows that x^(k) + α s^(k) ∈ L(x^(1)) for all α > 0. But the continuous function f is bounded on the compact set L(x^(1)). Hence, ∃ N > 0 such that ∀ α > 0, |f(x^(k) + α s^(k))| ≤ N. However, for

α > (N + f^(k)) / (c₂ ||g^(k)||²),

(3.3.7e) gives f(x^(k) + α s^(k)) < −N.
This contradiction therefore shows that (3.3.7c) cannot hold, so the Lemma is proved. ∎
Proposition 3.18. For any c₁, c₂ satisfying 0 < c₁ < 1/2 and c₁ < c₂ < 1, there exists an interval of acceptable steplengths α^(k) > 0 in a line search at x^(k) with the GPR algorithm satisfying [W-1] and [W-2].
Proof. From Lemma 3.17b and (3.3.5c), it follows that there exist steplengths α^(k) for which [W-2] holds satisfying

α^(k) ≥ α̲^(k) ≜ (1 − c₂) ||g^(k)||² / (M ||s^(k)||²).    (3.3.8a)

On the other hand, we have seen in Lemma 3.17a that [W-1] holds for any positive α^(k) ≤ ᾰ^(k), where ᾰ^(k) is as defined in (3.3.6b). But clearly,

α̲^(k) < ᾰ^(k),    (3.3.8b)

since 1 − c₂ < 2(1 − c₁). Hence, for any α^(k) in the interval [α̲^(k), ᾰ^(k)], both [W-1] and [W-2] hold simultaneously. ∎
We now look into the convergence properties of the GPR algorithm executed without regular restarts. In this connection, some additional conditions are needed for establishing the convergence criterion

lim_{k→∞} ||g^(k)|| = 0    (3.3.9a)

or even the weaker criterion

lim inf_{k→∞} ||g^(k)|| = 0.    (3.3.9b)

The next two theorems establish some general conditions for the convergence of the GPR algorithm.
Theorem 3.19. Suppose that in the GPR algorithm, g^(k) ≠ 0 for all k and at each iteration α^(k) is chosen so as to satisfy (3.3.3) for some ρ > 0. Assume that, in addition to the conditions [AP-1] and [AP-2], the following condition holds:

[AP-4] The series Σ_{k=1}^∞ cos² θ^(k) is divergent, where θ^(k) ≜ ∠(−g^(k), s^(k)).

Then the limit (3.3.9b) is achieved.

Proof. Suppose that (3.3.9b) does not hold. Then ∃ ε > 0 such that

||g^(k)|| ≥ ε    (3.3.10a)
for all k. It then follows from (3.2.16), (3.3.10a) and (3.3.3) that

cos² θ^(k) = (1/||g^(k)||²) (g^(k)T s^(k))² / (s^(k)T s^(k))    (3.3.10b)
           ≤ (1/(ρ ε²)) (f^(k) − f^(k+1)),

and hence, ∀ l ≥ 1,

Σ_{k=1}^l cos² θ^(k) ≤ (1/(ρ ε²)) (f^(1) − f^(l+1)).    (3.3.10c)

But the continuous function f(·) is bounded on the compact set L(x^(1)). Letting

r = inf{f(x) : x ∈ L(x^(1))},    (3.3.10d)

we thus have

Σ_{k=1}^l cos² θ^(k) ≤ (1/(ρ ε²)) (f^(1) − r)    (3.3.10e)

for all l, whence, by the monotone convergence property of positive term series, it follows that Σ_{k=1}^∞ cos² θ^(k) is convergent.
This contradiction establishes the theorem. ∎
Theorem 3.20. If in Theorem 3.19, the condition [AP-4] is replaced by the condition

[AP-5] the sequence {cos θ^(k)} is bounded away from 0,

then the limit (3.3.9a) is achieved by the GPR algorithm.

Proof. For each k, we have,

f^(k+1) = f^(1) + Σ_{j=1}^k (f^(j+1) − f^(j)).    (3.3.11a)

Hence, using (3.3.3), (3.1.17) and (3.2.15), we get,

f^(k+1) ≤ f^(1) − ρ Σ_{j=1}^k ||g^(j)||² cos² θ^(j),    (3.3.11b)
where, by [AP-5], there exists e > 0 such that

cos θ^(j) ≥ e    (3.3.11c)

for all j, and hence,

Σ_{j=1}^k ||g^(j)||² ≤ (1/(ρ e²)) (f^(1) − r),

so that ||g^(k)|| → 0 as k → ∞. ∎

Theorem 3.21. Suppose that in the GPR algorithm, the steplengths satisfy (3.3.3) with some ρ > 0 whenever g^(k) ≠ 0. Assume that conditions [AP-1], [AP-2] and [AP-4] hold. Then either a finite sequence {x^(k)} is obtained whose last term x^(m) satisfies g(x^(m)) = 0, or else the sequence {x^(k)} has a limit point x̄ such that g(x̄) = 0. If, instead of [AP-4], the condition [AP-5] is assumed, then g(x̄) = 0 for all limit points x̄ of {x^(k)}.
Proof. The iteration stops whenever g(x^(m)) = 0 for some m.
Suppose now that g^(k) ≠ 0 for any k.
Assume that [AP-4] holds. Then, by Theorem 3.19,

lim inf_{k→∞} ||g^(k)|| = 0,
and since {x^(k)} is a sequence in the compact set L(x^(1)), so, by standard results of analysis, there exists a subsequence {x^(k_j)} of {x^(k)} such that

lim_{j→∞} g(x^(k_j)) = 0    (3.3.12a)

and

lim_{j→∞} x^(k_j) = x̄    (3.3.12b)

for some x̄ ∈ L(x^(1)). But then, by the continuity of g,

g(x̄) = lim_{j→∞} g(x^(k_j))

and hence

g(x̄) = 0

for the particular limit point x̄ of {x^(k)}.
On the other hand, if [AP-4] is replaced by [AP-5], then it follows from Theorem 3.20 that

lim_{k→∞} g(x^(k)) = 0.

Hence, for any convergent subsequence {x^(k_j)} of {x^(k)} with lim_{j→∞} x^(k_j) = x̄, we have,

g(x̄) = 0. ∎

It may be remarked that if the sequence {x^(k)} has just one limit point x̄ (that is, if {x^(k)} is convergent), which is usually the case in practice, then it makes no difference whether we take [AP-4] or [AP-5]. In this case, f(x^(k)) ↓ f(x̄) and g(x̄) = 0, and therefore x̄ is a local minimizer of f or possibly a saddle point.
We conclude this section with some comments about the conditions [AP-4]
and [AP-5] used in the convergence proofs.
The condition [AP-4] is much weaker than the condition [AP-5] and is
the weakest condition that has been used to prove global convergence for CG
algorithms (Fletcher[Fl]). Regarding [AP-5], we note that negligible reductions in
function values can occur if the search directions S(k) are close to being orthogonal
to the negative gradients, and the condition [AP-5] ensures that this does not
happen. We have already seen that a sufficient condition for the realization of
[AP-5] with the GPR algorithm is [AP-3]. Another set of conditions, adapted
from Liu and Storey[L2], is considered below.
Proposition 3.22. In the GPR algorithm, under [AP-1] and [AP-2], suppose that ∀ k > 1,

[AP-6a] s̄^(k-1)T G^(k) s̄^(k-1) > 0,
[AP-6b] g^(k)T G^(k) g^(k) > 0,
[AP-6d] (g^(k)T G^(k) s̄^(k-1))² ≤ (1 − 1/γ_k)(g^(k)T G^(k) g^(k))(s̄^(k-1)T G^(k) s̄^(k-1)) for some γ_k ≥ 1,

whenever g^(k) ≠ 0. Then if

[AP-6e] Σ_{k=1}^∞ 1/(1 + γ_k r_k) = ∞,

then [AP-4] holds. On the other hand, if

[AP-6f] lim inf_{k→∞} 1/(1 + γ_k r_k) > 0,

then [AP-5] holds.

Proof. Set, ∀ k > 1,

u_k ≜ g^(k)T G^(k) s̄^(k-1),  t_k ≜ g^(k)T G^(k) g^(k),  v_k ≜ s̄^(k-1)T G^(k) s̄^(k-1),    (3.3.13a)
q_k ≜ g^(k)T s^(k-1) / g^(k)T g^(k).    (3.3.13b)

Then it follows from [AP-6a], [AP-6b] and [AP-6d] that

1 − u_k² / (t_k v_k) ≥ 1/γ_k > 0.    (3.3.14a)

Moreover, from (3.1.15c) and (3.1.15d), we obtain, ∀ k > 1,

β_GPR^(k) = u_k / v_k,

which gives, on simplification,

(3.3.14b)

Thus,

(3.3.14c)

using [AP-6c] and (3.3.14a).
Now, from (3.1.15b), (3.1.15c) and (3.3.14a), we obtain, ∀ k > 1,

s^(k) = −((t_k − q_k u_k)/t_k) g^(k) + ···
In this connection, it may be noted that if G^(k) is positive-definite, then [AP-6a], [AP-6b] and [AP-6c] hold with r_k ≜ χ^(k), where χ^(k) is the spectral condition number of G^(k), that is, the ratio λ_max^(k)/λ_min^(k) of the largest to the smallest eigenvalue of G^(k). It is possible to see that [AP-6c] and [AP-6d] are verified for r_k = γ_k = r ≥ 1. Then the restrictions [AP-6e] and [AP-6f] are automatically satisfied.
We further observe that the Zoutendijk condition (Zoutendijk[Z2])

Σ_{k=1}^∞ cos² θ^(k) ||g^(k)||² < ∞    (3.3.18a)

is satisfied by the GPR algorithm under conditions [AP-1] and [AP-2] and the line search condition (3.3.3). This is so, because, from (3.3.3), (3.2.16) and (3.1.17), we have,

f^(k+1) ≤ f^(k) − ρ cos² θ^(k) ||g^(k)||²    (3.3.18b)

for all k so that, ∀ N > 1,

Σ_{k=1}^N cos² θ^(k) ||g^(k)||² ≤ (1/ρ)(f^(1) − r),    (3.3.18c)

where r is as defined by (3.3.10d).
We now analyse some weakened conditions for the convergence of the GPR algorithm. From (3.3.18a) and (3.2.15), we have

Σ_{k=1}^∞ ||g^(k)||⁴ / ||s^(k)||² < ∞.    (3.3.18d)

Hence, if the limit (3.3.9b) is not achieved, then, since

||g^(k)|| ≥ ε > 0 for all k,    (3.3.18e)

so, by the comparison test,

Σ_{k=1}^∞ 1 / ||s^(k)||² < ∞,    (3.3.19)

which requires that ||s^(k)|| → ∞ sufficiently rapidly. Indeed, if ||s^(k)||² = O(k) as k → ∞, then ||s^(k)||² ≤ ck for some c > 0 and hence

Σ_{k=1}^∞ 1 / ||s^(k)||² ≥ Σ_{k=1}^∞ 1/(ck) = ∞,
and the failure of (3.3.19) implies that the limit (3.3.9b) is achieved. We discuss below some conditions which ensure this with the GPR algorithm.
We continue to assume that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is initiated at x^(1) as in [AP-2]. By continuity of g(·), we have,

||g^(k)|| ≤ ϑ ≜ sup{||g(x)|| : x ∈ L(x^(1))} < ∞    (3.3.20)

for all k ≥ 1.
Proposition 3.23. For all k ≥ l > 1, writing β^(j) for β_GPR^(j),

||s^(k)||² ≤ ϑ² (||s^(l-1)||² / ||g^(l-1)||²) (1 + β^(k)² + β^(k)² β^(k-1)² + ··· + β^(k)² β^(k-1)² ··· β^(l)²).    (3.3.21)

Proof. We have, from (3.2.14a), (3.2.14c) and (3.3.20),

||s^(k)||² ≤ ϑ² + β^(k)² ||s^(k-1)||²    (3.3.22a)

for all k > 1.
Consider any l > 1.
From (3.3.22a), we obtain, using (3.2.14b) and (3.3.20),

||s^(l)||² ≤ ϑ² (||s^(l-1)||² / ||g^(l-1)||²) + β^(l)² (||s^(l-1)||² / ||g^(l-1)||²) ϑ²
          = ϑ² (||s^(l-1)||² / ||g^(l-1)||²)(1 + β^(l)²),    (3.3.22b)

and further, assuming (3.3.21) for some k ≥ l,

||s^(k+1)||² ≤ ϑ² (||s^(l-1)||² / ||g^(l-1)||²) + β^(k+1)² ϑ² (||s^(l-1)||² / ||g^(l-1)||²)(1 + β^(k)² + β^(k)² β^(k-1)² + ··· + β^(k)² β^(k-1)² ··· β^(l)²)
            = ϑ² (||s^(l-1)||² / ||g^(l-1)||²)(1 + β^(k+1)² + β^(k+1)² β^(k)² + ··· + β^(k+1)² β^(k)² ··· β^(l)²).    (3.3.22c)

From (3.3.22b) and (3.3.22c), by induction, the proposition is verified. ∎
We now consider the assumption:

[AP-7] There exists δ > 0 such that ∀ k ≥ 1,

(3.3.23)

We remark that the search direction s^(k) in the GPR algorithm is independent of the length of the auxiliary vector s̄^(k-1).
|β_GPR^(k)| ≤ M ||g^(k)|| ||s^(k-1)|| / (ρ δ²),    (3.3.27a)

where b > 1 for δ sufficiently small.    (3.3.27b)

On the other hand, (3.2.14c) gives

||d^(k-1)|| ≤ Λ ⇒ α^(k-1) ||s^(k-1)|| ≤ Λ
             ⇒ ||s^(k-1)|| ≤ Λ/a    (3.3.28a)

for k > 1, where 0 < a ≤ α^(k) exists by Proposition 3.18. So, in this case, we obtain, from (3.3.27a), using (3.3.20), (3.3.23) and (3.3.28a),

|β_GPR^(k)| ≤ M ϑ Λ / (ρ δ² a) = 1/b    (3.3.28b)

if, by (3.3.27b), Λ is chosen as

Λ = ρ δ² a / (M ϑ b).    (3.3.28c)
The above Proposition shows that the GPR algorithm shares the "Property
(*)" of Gilbert and Nocedal[G3] under certain conditions which are not too restrictive. The next proposition, adapted from Gilbert and Nocedal[G3], shows
that if, in addition, some restriction on the step sizes in the GPR algorithm is
imposed, then ||s^(k)||² can grow at most linearly.
Proposition 3.25. Suppose that in the GPR algorithm, g^(k) ≠ 0 for all k and that the conditions [AP-9a] and [AP-9b] are satisfied. Then if

[AP-10] for any Λ > 0, there exist integers l > 1 and τ ≥ 1 such that for any index k ≥ l, the number of indices i ∈ [k, k + τ − 1] for which ||d^(i-1)|| > Λ does not exceed τ/2,

then ||s^(k)||² ≤ c(k − l + 2) for k ≥ l, where c > 0 depends on l but not on k.

Proof. For Λ > 0 satisfying [AP-9b], consider integers l > 1 and τ ≥ 1 given by
[AP-10]. By Proposition 3.23, we have, for k > l,

||s^(k)||² ≤ c (1 + Σ_{i=l}^k P^(i)),    (3.3.29)

where c > 0 depends on l but not on k.
Consider the product

P^(i) = β^(k)² β^(k-1)² ··· β^(i)²    (3.3.30a)

of (k − i + 1) factors of the form β^(t)², where i ≤ t ≤ k and l ≤ i ≤ k.
If k − i + 1 ≤ τ, then we have, by [AP-9a],

P^(i) ≤ b^(2τ).    (3.3.30b)

If k − i + 1 > τ, let k − i + 1 = mτ + h, where m ≥ 1 and 0 ≤ h < τ, and rewrite P^(i) by grouping consecutive τ factors from the beginning:

P^(i) = P_0^(i) P_1^(i) ··· P_{m-1}^(i) Q^(i),    (3.3.30c)

where

P_t^(i) ≜ β^(k_t)² β^(k_t − 1)² ··· β^(k_{t+1} + 1)²,    (3.3.31a)
k_t ≜ k − tτ, 0 ≤ t ≤ m − 1,    (3.3.31b)

and

Q^(i) ≜ β^(k_m)² β^(k_m − 1)² ··· β^(i)²,    (3.3.31c)
k_m ≜ k − mτ,    (3.3.31d)

there being τ factors in each P_t^(i) and h factors in Q^(i) (Q^(i) = 1 if h = 0).
Let p_t^(i) be the number of indices j ∈ [k_{t+1} + 1, k_t] such that ||d^(j-1)|| > Λ. By [AP-10],

p_t^(i) ≤ τ/2    (3.3.32a)

and hence, by [AP-9a] and [AP-9b],

P_t^(i) ≤ (b²)^(p_t^(i)) (1/b²)^(τ − p_t^(i))
       = (1/b²)^(τ − 2 p_t^(i))
       ≤ 1    (3.3.32b)

in view of (3.3.32a) and b > 1. Moreover, by [AP-9a],

Q^(i) ≤ b^(2τ).    (3.3.32c)

Thus, from (3.3.30b), (3.3.30c), (3.3.32b) and (3.3.32c), it follows that

P^(i) ≤ b^(2τ)

for each l ≤ i ≤ k and hence from (3.3.29), we obtain, for k ≥ l,

||s^(k)||² ≤ c (1 + b^(2τ) (k − l + 1))
          ≤ c b^(2τ) (k − l + 2)    (3.3.33)

as b > 1. ∎
From the above discussion, we then have the following convergence result:
Theorem 3.26. Suppose that conditions [AP-1] and [AP-2] are satisfied and the GPR algorithm is executed with line searches satisfying (3.3.3) for some ρ > 0. Then if g^(k) ≠ 0 for all k and conditions [AP-9a], [AP-9b] and [AP-10] hold, then the limit (3.3.9b) is achieved.

Proof. This follows directly from Proposition 3.25 and the preceding discussion. ∎

Of course, in view of Proposition 3.24, we can replace the conditions [AP-9a] and [AP-9b] by conditions [AP-7] and [AP-8] in the above theorem.
3.4 Rate of Convergence
In this section, we analyse the rate of convergence of the GPR algorithm for
solving the problem [P] under assumptions [AP-1]-[AP-3] (as stated on page 15 and page 23) and the following additional assumption:
[AP-11] For x^(1) as in [AP-2], the level set L(x^(1)) is convex and ∃ B > 0 such that ∀ x′, x″ ∈ L(x^(1)),

||G(x′) − G(x″)|| ≤ B ||x′ − x″||.    (3.4.1)

We also assume that the GPR algorithm, initiated at x^(1) satisfying [AP-2], is executed with exact line search at each iteration. Our approach follows that in Cohen[C2].
It may be remarked that in the case of a quadratic objective function, the GPR algorithm with exact line search terminates at the optimal point in at most n iterations. If the objective function is non-quadratic, then finite termination does not occur in general. However, as we shall see, with exact line search, the algorithm possesses n-step quadratic convergence when reinitialized with a steepest descent direction.
We only consider the case when the GPR algorithm is reinitialized.
Let φ denote the GPR algorithm applied to the general function f with exact line search at each step and described by (3.1.15) with s̄^(k) = s^(k).
For each reinitialized point x^(k) constructed by φ, let F^(k) : R^n → R be the quadratic function defined by

F^(k)(x) ≜ f^(k) + g^(k)T (x − x^(k)) + (1/2)(x − x^(k))^T G^(k) (x − x^(k)).    (3.4.2)

Suppose that φ_{F^(k)} denotes the GPR algorithm applied to F^(k), starting at x^(k) and constructing the iterates x_{i+1}^(k) along directions s_i^(k) at x_i^(k), where

s_1^(k) = s^(k),    (3.4.3)
s_i^(k) = −g_i^(k) + β_i^(k) s_{i-1}^(k) for i > 1,
with g_i^(k) ≜ ∇F^(k)(x_i^(k)) and the α_i^(k)'s determined by exact line search. (Here, we use subscripts i to denote the iterates of φ_{F^(k)}.)
Lemma 3.29. For l ≥ 0,

||G^(k+l) − G^(k)|| = O(||s^(k)||).    (3.4.6)

Proof. The lemma is trivially true for l = 0.
For k ≥ 1 and l ≥ 1, we have,

||G^(k+l) − G^(k)|| ≤ Σ_{j=0}^{l-1} ||G^(k+j+1) − G^(k+j)||.

But from [AP-11], Proposition 3.11 and Lemma 3.27,

||G^(k+j+1) − G^(k+j)|| = ||G(x^(k+j+1)) − G(x^(k+j))||
                        ≤ B ||α^(k+j) s^(k+j)||
                        ≤ (B/m) ||s^(k+j)||
                        = O(||s^(k)||).

Hence,

||G^(k+l) − G^(k)|| = O(||s^(k)||)

for all l ≥ 0. ∎
Lemma 3.30. For l ≥ 0,

||Ḡ^(k+l) − G^(k)|| = O(||s^(k)||),    (3.4.7)

where Ḡ^(k) is as defined by (3.2.8).

Proof. For k ≥ 1 and l ≥ 0, we have,

||Ḡ^(k+l) − G^(k)|| ≤ ||Ḡ^(k+l) − G^(k+l)|| + ||G^(k+l) − G^(k)||.

But, by (3.2.8), [AP-11] and (3.2.23a),

||Ḡ^(k+l) − G^(k+l)|| = || ∫₀¹ {G(x^(k+l) + t α^(k+l) s^(k+l)) − G^(k+l)} dt ||
                      ≤ ∫₀¹ ||G(x^(k+l) + t α^(k+l) s^(k+l)) − G^(k+l)|| dt
                      ≤ (B/(2m)) ||s^(k+l)||.

Hence, by Lemmas 3.27 and 3.29, we have,

||Ḡ^(k+l) − G^(k)|| = O(||s^(k)||)

for all l ≥ 0. ∎
Lemma 3.31. For 0 ≤ l ≤ n − 1,

||g^(k+l+1) − g_{l+2}^(k)|| = O(||g^(k+l) − g_{l+1}^(k)||) + O(||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)||) + O(||s^(k)||²).    (3.4.8)

Proof. For k ≥ 1 and 0 ≤ l ≤ n − 1, we have, by (3.2.7), (3.2.18) and (3.2.23a),

||g^(k+l+1) − g_{l+2}^(k)|| = ||g^(k+l) + α^(k+l) Ḡ^(k+l) s^(k+l) − g_{l+1}^(k) − α_{l+1}^(k) G^(k) s_{l+1}^(k)||
  ≤ ||g^(k+l) − g_{l+1}^(k)|| + ||Ḡ^(k+l) (α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k))|| + ||α_{l+1}^(k) (Ḡ^(k+l) − G^(k)) s_{l+1}^(k)||
  ≤ ||g^(k+l) − g_{l+1}^(k)|| + M ||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)|| + (1/m) ||Ḡ^(k+l) − G^(k)|| ||s_{l+1}^(k)||.

Hence, using Lemmas 3.28 and 3.30, we conclude that

||g^(k+l+1) − g_{l+2}^(k)|| = O(||g^(k+l) − g_{l+1}^(k)||) + O(||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)||) + O(||s^(k)||²)

for 0 ≤ l ≤ n − 1. ∎
Lemma 3.32. For 0 ≤ l < n − 1,

||s^(k+l+1) − s_{l+2}^(k)|| = O(||s^(k+l) − s_{l+1}^(k)||) + O(||g^(k+l+1) − g_{l+2}^(k)||) + O(||s^(k)||²).    (3.4.9)

Proof. For 0 ≤ l < n − 1, we have, from (3.1.15) and (3.4.3),

||s^(k+l+1) − s_{l+2}^(k)|| ≤ ||g^(k+l+1) − g_{l+2}^(k)|| + ||β_GPR^(k+l+1) s̄^(k+l) − β_{l+2}^(k) s_{l+1}^(k)||
≤ (1/c_{l+1}) { M² ||g^(k+l+1)|| ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||
  + M² ||g^(k+l+1)|| ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||
  + M² ||g^(k+l+1) − g_{l+2}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||²
  + M² ||g_{l+2}^(k)|| ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)|| ||s^(k+l)||²
  + M ||g_{l+2}^(k)|| ||Ḡ^(k+l+1) − G^(k)|| ||s_{l+1}^(k)|| ||s^(k+l)||³
  + M ||g_{l+2}^(k)|| ||Ḡ^(k+l+1) − G^(k)|| ||s_{l+1}^(k)|| ||s^(k+l)||³
  + M² ||g_{l+2}^(k)|| ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)|| ||s^(k+l)||² }
≤ (1/c_{l+1}) { M² (1 + M/m) ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||²
  + M² (1 + M/m) ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||²
  + M² ||g^(k+l+1) − g_{l+2}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||²
  + M² (1 + M/m) ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||²
  + M (1 + M/m) ||Ḡ^(k+l+1) − G^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||³
  + M (1 + M/m) ||Ḡ^(k+l+1) − G^(k)|| ||s_{l+1}^(k)||² ||s^(k+l)||³ + ··· }    (3.4.11)
since, from (3.4.10b), using (3.2.18),

c_{l+1} ≥ m² ||s^(k+l)||² ||s_{l+1}^(k)||².

Hence, from (3.4.10a) and (3.4.11), using Lemmas 3.27 and 3.29, it follows that

||s^(k+l+1) − s_{l+2}^(k)|| = O(||s^(k+l) − s_{l+1}^(k)||) + O(||g^(k+l+1) − g_{l+2}^(k)||) + O(||s^(k)||²)

for 0 ≤ l < n − 1. ∎
Lemma 3.33. For 0 ≤ l ≤ n − 1,

||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)|| = O(||g^(k+l) − g_{l+1}^(k)||) + O(||s^(k+l) − s_{l+1}^(k)||) + O(||s^(k)||²).    (3.4.15)

Proof. For k ≥ 1 and 0 ≤ l ≤ n − 1, we have, by (3.2.22),

α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k) = ||g^(k+l)||² s^(k+l) / (s^(k+l)T Ḡ^(k+l) s^(k+l)) − ||g_{l+1}^(k)||² s_{l+1}^(k) / (s_{l+1}^(k)T G^(k) s_{l+1}^(k)),

where

c_{l+1} ≜ (s^(k+l)T Ḡ^(k+l) s^(k+l))(s_{l+1}^(k)T G^(k) s_{l+1}^(k))

and Ḡ^(k) is as defined by (3.2.8), and hence

||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)||
≤ (1/c_{l+1}) { ||(g^(k+l)T (g^(k+l) − g_{l+1}^(k)))(s_{l+1}^(k)T G^(k) s_{l+1}^(k)) s^(k+l)|| + ···
≤ (1/(m² ||s^(k+l)||² ||s_{l+1}^(k)||²)) { M ||s_{l+1}^(k)||² ||s^(k+l)|| ||g^(k+l)|| ||g^(k+l) − g_{l+1}^(k)||
  + M ||s^(k+l) − s_{l+1}^(k)|| ||s_{l+1}^(k)|| ||s^(k+l)|| ||g^(k+l)|| ||g_{l+1}^(k)||
  + M ||s^(k+l)||² ||s_{l+1}^(k)|| ||g_{l+1}^(k)|| ||g^(k+l) − g_{l+1}^(k)||
  + M ||s^(k+l)||² ||s^(k+l) − s_{l+1}^(k)|| ||g_{l+1}^(k)||²
  + M ||s^(k+l)||² ||s^(k+l) − s_{l+1}^(k)|| ||g_{l+1}^(k)||²
  + ||Ḡ^(k+l) − G^(k)|| ||s^(k+l)||² ||s_{l+1}^(k)|| ||g_{l+1}^(k)||² }
≤ (1/m²) { 2M ||g^(k+l) − g_{l+1}^(k)|| + 3M ||s^(k+l) − s_{l+1}^(k)|| + ||Ḡ^(k+l) − G^(k)|| ||s_{l+1}^(k)|| },

using (3.2.18) and (3.2.14b). Hence, using Lemmas 3.28 and 3.30,

||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)|| = O(||g^(k+l) − g_{l+1}^(k)||) + O(||s^(k+l) − s_{l+1}^(k)||) + O(||s^(k)||²)

for 0 ≤ l ≤ n − 1. ∎
Lemma 3.34. For 0 ≤ l ≤ n − 1,

(i) ||g^(k+l) − g_{l+1}^(k)|| = O(||s^(k)||²),    (3.4.16)
(ii) ||s^(k+l) − s_{l+1}^(k)|| = O(||s^(k)||²),    (3.4.17)
(iii) ||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)|| = O(||s^(k)||²).    (3.4.18)
Proof. We prove (3.4.16), (3.4.17) and (3.4.18) simultaneously by finite induction on l ∈ [0, n − 1].
For l = 0, (3.4.16) and (3.4.17) are trivially true, since, by definition of g_1^(k) and s_1^(k),

g_1^(k) = g^(k),    (3.4.19a)
s_1^(k) = s^(k).    (3.4.19b)

That (3.4.18) is also true for l = 0 follows from Lemma 3.33 using (3.4.19a) and (3.4.19b).
Assume now that (3.4.16), (3.4.17) and (3.4.18) are true for some 0 ≤ l < n − 1.
It follows that

||g^(k+l+1) − g_{l+2}^(k)|| = O(||s^(k)||²)

by Lemmas 3.31 and 3.33 and the induction hypothesis.
Also,

||s^(k+l+1) − s_{l+2}^(k)|| = O(||s^(k)||²)

by Lemmas 3.32 and 3.31 and the induction hypothesis.
Moreover, since l + 1 ≤ n − 1, we have, from Lemma 3.33,

||α^(k+l+1) s^(k+l+1) − α_{l+2}^(k) s_{l+2}^(k)|| = O(||s^(k)||²),

which completes the induction. ∎
Proposition 3.35. If the iterates x^(k), generated by the GPR algorithm, converge to a local solution x* of [P], then at each iteration,

||g^(k)|| ≤ M ||x^(k) − x*||.    (3.4.21)

Proof. Applying the mean value theorem to g(·), we obtain,

g^(k) = g(x*) + G(ξ^(k))(x^(k) − x*)    (3.4.22)

for some ξ^(k) ∈ (x^(k), x*). As the sequence {x^(k)} in the compact set L(x^(1)) converges to x*, we have, x* ∈ L(x^(1)) and hence, ξ^(k) ∈ L(x^(1)), because L(x^(1)) is convex by [AP-11]. Moreover, since x* is a local minimizer of f, g(x*) = 0. Thus, from (3.4.22), we obtain, using (3.0.2),

||g^(k)|| ≤ M ||x^(k) − x*||. ∎
We now consider the n-step quadratic convergence result for the GPR algorithm applied to the problem [P] under conditions stated in the first paragraph of this section.
Notice that the stationary point x*_{F^(k)} of the quadratic F^(k) is given by

∇F^(k)(x*_{F^(k)}) = 0,    (3.4.23)

that is,

x*_{F^(k)} = x^(k) − (G^(k))^(-1) g^(k) = φ_NR(x^(k)),    (3.4.24)

where φ_NR is the Newton-Raphson algorithm applied to f. Since the Newton-Raphson algorithm φ_NR is quadratically convergent to the stationary point x* of f, we get

||x*_{F^(k)} − x*|| = ||φ_NR(x^(k)) − x*||
                   = O(||x^(k) − x*||²).    (3.4.25)
Also, using the fact that the GPR algorithm φ_{F^(k)} reaches the minimum point x*_{F^(k)} of the quadratic F^(k) in at most n iterations, we have,

x*_{F^(k)} = φ_{F^(k)}(x^(k)) = x_{n+1}^(k).    (3.4.26)

Hence, using Lemma 3.34, we conclude that

||x^(k+n) − x*_{F^(k)}|| ≤ Σ_{l=0}^{n-1} ||α^(k+l) s^(k+l) − α_{l+1}^(k) s_{l+1}^(k)||
                        = O(||s^(k)||²).    (3.4.27)
We assume that the GPR algorithm is restarted every t iterations (t ≥ n) with the steepest descent direction. In this case, s^(kt) will be set equal to −g^(kt) every t iterations, and all lemmas considered previously in this section hold for k an integral multiple of t. The next theorem gives the n-step quadratic convergence result for the GPR algorithm applied to [P] with reinitialization every t iterations when an exact line search is adopted at each step.
Theorem 3.36. For the sequence {x^(k)} generated by the GPR algorithm restarted every t steps with the steepest descent direction,

lim sup_{k→∞} ||x^(kt+n) − x*|| / ||x^(kt) − x*||² ≤ C < ∞    (3.4.28)

for some constant C, where x* is a minimum point of f on R^n.

Proof. For k an integral multiple of t, we obtain, as in (3.4.25) and (3.4.27),

||x*_{F^(kt)} − x*|| = O(||x^(kt) − x*||²)    (3.4.29)
and

||x^(kt+n) − x*_{F^(kt)}|| = O(||s^(kt)||²).    (3.4.30)

Because s^(kt) = −g^(kt), (3.4.30) becomes

||x^(kt+n) − x*_{F^(kt)}|| = O(||g^(kt)||²)
                          = O(||x^(kt) − x*||²)    (3.4.31)

due to Proposition 3.35. So, from (3.4.29) and (3.4.31), we obtain,

||x^(kt+n) − x*|| ≤ ||x^(kt+n) − x*_{F^(kt)}|| + ||x*_{F^(kt)} − x*||
                 = O(||x^(kt) − x*||²),

and this completes the proof of (3.4.28). ∎

3.5 Characteristic Behaviour and Basic Algorithm
In this section, we discuss the characteristic behaviour of the GPR algorithm and formulate an implementable version of it. This is almost identical to the traditional CG algorithm, differing only in the computation of the search directions.
We first compare the search directions of the GPR algorithm with those of the generalized CG method of Liu and Storey[L2] as described in (2.3.18) and of the memoryless BFGS-QN (abbreviated as MQN) method of Shanno[S4]. To distinguish, we denote these by s_GPR^(k), s_LS^(k) and s_MQN^(k) respectively.
As in (2.3.18),

s_LS^(k) = −g^(k) + β₂^(k) s^(k-1),    (3.5.1)

where β₂^(k) is given by (3.1.11) with p = s^(k-1), and hence, using (3.1.15c) and (3.1.16), we have,
s~~) = ~ { _ (s(k-l)T G(k)s(k-lg(k) + (g(k)T G(k)s(k-lS
where

        (3.5.7b)

Now, if we choose, in particular, ξ^(k−1) = 1/(y^(k−1)T y^(k−1)), then (3.5.7) immediately reduces to the MQN search direction:

with

    d̄^(k) ≜ d^(k−1) y^(k−1)T / (y^(k−1)T y^(k−1)) …

If we approximate the product G^(k) s^(k−1) by the backward finite-difference formula

    G^(k) s^(k−1) = (g^(k) − ḡ^(k−1)) / δ        (3.5.10)

with ḡ^(k−1) ≜ g(x^(k) − δ s^(k−1)) and δ any suitable small positive number, then the
Hessian matrix G(k) itself need not be computed or stored. This way of avoiding
matrix computation and storage can result in significant savings on large-scale
problems, and may be essential when it is not possible to store G(k). The linear
algebra required to obtain the product on the left hand side of (3.5.10) is also
reduced. On the other hand, this gain is obtained at the expense of one additional
gradient evaluation per iteration. Now considering (3.1.16) and (3.5.10), (3.1.15d)
becomes for k > 1,
(3.5.11)
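In code, the finite-difference product (3.5.10) amounts to one extra gradient evaluation per call. The following sketch is illustrative only; the routine name and the quadratic test function are assumptions, not taken from the thesis.

```python
import numpy as np

def hessian_vector_product(grad, x_next, s_prev, delta=1e-6):
    """Approximate G(x) s_prev without forming the Hessian G, using the
    backward finite difference of (3.5.10):
        G s_prev ~ (g(x_next) - g(x_next - delta * s_prev)) / delta.
    Costs one additional gradient evaluation per call."""
    g = grad(x_next)
    g_back = grad(x_next - delta * s_prev)  # the gradient g-bar at the backward point
    return (g - g_back) / delta

# Example on a quadratic f(x) = 0.5 x^T A x, whose Hessian is A,
# so the product should agree with A @ s up to rounding.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x
x = np.array([1.0, 2.0])
s = np.array([0.5, -1.0])
hv = hessian_vector_product(grad, x, s)
```

For a quadratic the difference quotient is exact up to rounding, which makes this a convenient correctness check before using the approximation on a general nonlinear f.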
An implementable algorithm for the GPR algorithm with natural restart and
based on the search direction given in (3.1.15b) is stated below:
Algorithm: GPR1

Step 1. Let x^(1) be an estimate of a minimizer x* of f.
Step 2. Set k = 1 and compute s^(1) = −g^(1).
Step 3. Line search: compute x^(k+1) = x^(k) + α^(k) s^(k) and then compute g^(k+1).
Step 4. If ‖g^(k+1)‖ < ε, take x^(k+1) as x* and stop. Otherwise go to Step 5.
Step 5. If k + 1 > n ≥ 2, then go to Step 11. Otherwise go to Step 6.
Step 6. Compute
Step 7. With δ = min(1, √η/√(s^(k)T s^(k))), compute
    ḡ^(k) = g(x^(k+1) − δ s^(k)).
Step 8. Compute
Step 9. Compute new search direction
Step 10. Set k = k + 1 and go to Step 3.
Step 11. Set x^(1) = x^(k+1) and repeat Step 2 onwards.
To implement the GPR algorithm requires approximately 5n + 3 double-precision words of working storage and O(n) operations per iteration.
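The overall shape of Algorithm GPR1 can be sketched as follows. This is only an illustrative skeleton under stated assumptions: a backtracking Armijo line search stands in for the line search of Step 3, the classical Polak-Ribiere beta stands in for β_GPR of (3.1.15d), and a descent safeguard is added; none of these substitutions are from the thesis.

```python
import numpy as np

def gpr1_shape(f, grad, x0, eps=1e-6, max_iter=500):
    """Illustrative skeleton of Algorithm GPR1: CG iterations with a
    natural restart (steepest descent) every n steps.  The classical
    Polak-Ribiere beta is used in place of beta_GPR of (3.1.15d)."""
    x = x0.copy()
    n = x.size
    g = grad(x)
    s = -g                                  # Step 2: steepest descent start
    k = 0
    while np.linalg.norm(g) >= eps and k < max_iter:
        # Step 3: backtracking (Armijo) line search, an illustrative
        # stand-in for the line search used in the thesis.
        a, fa = 1.0, f(x)
        while f(x + a * s) > fa + 1e-4 * a * g.dot(s):
            a *= 0.5
        x_new = x + a * s
        g_new = grad(x_new)
        if (k + 1) % n == 0:                # Steps 5/11: restart every n steps
            s = -g_new
        else:                               # Step 9: new conjugate direction
            beta = g_new.dot(g_new - g) / g.dot(g)  # PR beta (illustrative)
            s = -g_new + beta * s
            if g_new.dot(s) >= 0:           # safeguard: keep a descent direction
                s = -g_new
        x, g, k = x_new, g_new, k + 1
    return x

# Example: minimize the convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = gpr1_shape(f, grad, np.zeros(2))
```

The loop keeps only the current point, gradient and direction, consistent with the O(n) storage noted above.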
CHAPTER 4 : SOME MODIFICATIONS OF THE GPR ALGORITHM AND THEIR IMPLEMENTATIONS
In this Chapter, we consider several modifications of the GPR algorithm to
try to improve its convergence properties and computational efficiency and discuss
their implementations for solving the general nonlinear problem
[P] Minimize f(x), x ∈ Rⁿ.
We also discuss the theoretical and algorithmic behaviour of these modified
algorithms.
As in the previous chapter, we assume that the objective function f satisfies the basic conditions [AP-1] and [AP-2] of that chapter and that the GPR algorithms are initiated at x^(1) satisfying [AP-2]. Other conditions will be added
whenever they are required.
A CG algorithm with exact line search finds the minimum of a convex
quadratic function of n variables in at most n iterations. Such an algorithm
with periodic restarts is often globally convergent as well as n-step quadratically
convergent when the line search is taken to be asymptotically exact (Luksan[L3]
and Baptist and Stoer[B6]). Nevertheless, without regular restart, the PR
algorithm can cycle infinitely without approaching an optimal point or can
sometimes slow down away from the optimal point (Powell[P4, P2, P5]), because
a very small step IIx(k+1) - x(k)1I is taken at each iteration. The GPR algorithm
is approximately the same as the PR algorithm when working with an exact line
search and we have noticed that, in general, it may require a very large number
of iterations to approach the solution point unless a restart is made occasionally
with the steepest descent direction. So, we propose some restarting strategies for
the GPR algorithm which will hopefully improve its computational efficiency and
CPU time.
4.1 GPR Algorithm with Non-negative Beta
Powell[P4] has shown that there are functions f satisfying conditions [AP-1] and [AP-2] for which the PR algorithm, even with exact line search and exact arithmetic, generates gradients which stay bounded away from zero. Powell's example requires that some consecutive search directions become almost opposite and, as this can only occur, in the case of exact line search, when β_PR^(k) < 0, Powell[P5] suggests a new implementation of the PR algorithm with a non-negative value of β_PR^(k) taken at each iteration. Motivated by Powell's suggestion and the fact that β_GPR^(k) ≈ β_PR^(k) when working with exact line search, we propose an implementation of the GPR algorithm with β_GPR^(k) ≥ 0 to prevent cycling. This
modified algorithm will be called the GPR Algorithm with Non-negative Beta
(GPR+ Algorithm, in short).
The search directions in the GPR+ algorithm, obtained from (3.1.15), are

    s_GPR+^(k) = −g^(k)                                                for k = 1,
    s_GPR+^(k) = −(1 + β_GPR+^(k) q^(k)) g^(k) + β_GPR+^(k) s^(k−1)     for k > 1,        (4.1.1)

where β_GPR+^(k) is given by

    β_GPR+^(k) = max{β_GPR^(k), 0}        (4.1.2)

on all iterations, where β_GPR^(k) is given by (3.1.15d). Thus, the GPR+ search directions are given by

    s_GPR+^(k) = s_GPR^(k)   if β_GPR^(k) > 0,
    s_GPR+^(k) = −g^(k)      otherwise,        (4.1.3)

with k ≥ 1. It follows that the GPR+ algorithm has an "automatic" restarting procedure depending on the values of β_GPR^(k).
Since (4.1.1) and (4.1.3) are equivalent, it is immaterial which particular
form is used to describe the GPR+ search directions, and we shall use (4.1.3) in
our implementation.
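The rule (4.1.2)–(4.1.3) is a one-line modification in practice. A minimal sketch follows; the standard two-term CG recurrence here stands in for the full GPR direction (3.1.15b), and β_GPR is assumed to have been computed already, so the function and its arguments are illustrative rather than the thesis's own routine.

```python
import numpy as np

def gpr_plus_direction(g, s_prev, beta_gpr):
    """Search direction of a non-negative-beta (GPR+-style) step, (4.1.3):
    use the CG direction when beta_GPR > 0, otherwise restart with
    steepest descent (equivalently, beta is replaced by
    max(beta_GPR, 0) as in (4.1.2))."""
    if beta_gpr > 0.0:
        return -g + beta_gpr * s_prev   # CG direction with non-negative beta
    return -g                           # "automatic" steepest-descent restart

g = np.array([1.0, -2.0])
s_prev = np.array([0.5, 0.5])
d1 = gpr_plus_direction(g, s_prev, 0.4)    # CG step
d2 = gpr_plus_direction(g, s_prev, -0.3)   # negative beta triggers restart
```

Because the restart branch returns −g, the descent property g^T s < 0 holds on every iteration by construction.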
It is easy to see that the GPR+ algorithm preserves the descent property (3.1.17) at each iteration. Moreover, it follows from the construction of s_GPR+^(k) that the GPR+ algorithm inherits all the properties of the GPR algorithm.
We now formalize a convergence theorem for the GPR+ algorithm. For convenience of notation, we continue to use s^(k) for s_GPR+^(k) and write u^(k) ≜ s^(k)/‖s^(k)‖ for the corresponding unit vector.

Proposition 4.1. If

    lim inf_{k→∞} ‖g^(k)‖ > 0,        (4.1.5)

then

    Σ_{k=2}^∞ ‖u^(k) − u^(k−1)‖² < ∞.        (4.1.6)

Proof. By (4.1.5), there exists a constant, as defined in (3.3.20), such that

        (4.1.7a)

holds. Moreover, as in Proposition 3.24, there exists b > 1 such that

        (4.1.7b)

for all k > 1.
From (3.1.15b) and (3.1.15c), we obtain,

    s^(k) = −(1 + β^(k) q^(k)) g^(k) + β^(k) s^(k−1)        (4.1.8)

and hence

    u^(k) = r^(k) + δ^(k) u^(k−1)        (4.1.9)

for all k > 1, where

    q^(k) ≜ g^(k)T s^(k−1) / g^(k)T g^(k),        (4.1.10a)
    r^(k) ≜ −(1 + β^(k) q^(k)) g^(k) / ‖s^(k)‖,        (4.1.10b)
    δ^(k) ≜ β^(k) ‖s^(k−1)‖ / ‖s^(k)‖ ≥ 0,        (4.1.10c)

and hence,

    (i) r^(k)T u^(k) = ‖r^(k)‖² + δ^(k) r^(k)T u^(k−1),        (4.1.11)

    (ii) 1 = u^(k)T u^(k) = ‖r^(k)‖² + 2δ^(k) r^(k)T u^(k−1) + δ^(k)²,        (4.1.12a)

    (iii) u^(k) − u^(k−1) = r^(k) + (δ^(k) − 1) u^(k−1),        (4.1.12b)

so that

    ‖u^(k) − u^(k−1)‖² = ‖r^(k)‖² + 2(δ^(k) − 1) r^(k)T u^(k−1) + (δ^(k) − 1)²
                       = 2(1 − δ^(k) − r^(k)T u^(k−1))
                       = 2(1 − δ^(k)² − (1 + δ^(k)) r^(k)T u^(k−1)) / (1 + δ^(k))
                       = 2(r^(k)T u^(k) − r^(k)T u^(k−1)) / (1 + δ^(k))
                       = 2 r^(k)T (u^(k) − u^(k−1)) / (1 + δ^(k))
                       ≤ 2 ‖r^(k)‖ ‖u^(k) − u^(k−1)‖ / (1 + δ^(k)),        (4.1.12c)
using the Cauchy-Schwarz inequality. Hence,

    ‖u^(k) − u^(k−1)‖ ≤ 2‖r^(k)‖ / (1 + δ^(k))
                      ≤ 2‖r^(k)‖,        (4.1.13)

since δ^(k) ≥ 0. But, from (4.1.10b),

    ‖r^(k)‖ = |1 + β^(k) q^(k)| ‖g^(k)‖ / ‖s^(k)‖
            ≤ c ‖g^(k)‖ / ‖s^(k)‖        (4.1.14)

for some c ≥ 1, since, by (4.1.7a), (4.1.7b), (4.1.10a), (3.1.17) and (3.1.20) or (3.3.2), the factor |1 + β^(k) q^(k)| is uniformly bounded. From (4.1.13), (4.1.14) and (3.2.15), it then follows that, for all k > 1,

    ‖u^(k) − u^(k−1)‖ ≤ c′ cos θ^(k)        (4.1.15)

for some constant c′ > 0, where θ^(k) ≜ ∠(−g^(k), s^(k)) is the angle between −g^(k) and s^(k). So, if (4.1.6) fails, then

    Σ_{k=1}^∞ cos² θ^(k) = ∞        (4.1.16)

and hence, by Theorem 3.19, the assumption (4.1.5) fails. This completes the proof. ∎
Theorem 4.2. Suppose that conditions [AP-7] and [AP-8] hold in addition to the conditions stated previously. Then the limit

    lim_{k→∞} ‖g^(k)‖ = 0        (4.1.17)

is achieved by the GPR+ algorithm.

Proof. First suppose that the condition [AP-10] is satisfied.
We see, from Proposition 3.24, that conditions [AP-9a] and [AP-9b] are satisfied. So, by Proposition 3.25, there exist an integer l > 1 and a constant c > 0 (depending on l) such that for k ≥ l,

        (4.1.18a)

It follows that

        (4.1.18b)

and hence (4.1.17) is achieved. For, if not, then there exists ε > 0 such that ‖g^(k)‖ ≥ ε for all k ≥ 1 and so, by (3.2.15), we have,

    Σ_{k=1}^∞ cos² θ^(k) ‖g^(k)‖² = ∞,        (4.1.18c)

contradicting the Zoutendijk condition (3.3.18a).
We now consider the case when [AP-10] is not satisfied. In this case:

[AP-10]* There exists λ > 0 such that, for all integers l > 1 and τ ≥ 1, there exists an integer k ≥ l such that the number of indices i ∈ [k, k + τ − 1] for which ‖d^(i−1)‖ > λ is greater than τ/2.

Assume that

    lim_{k→∞} ‖g^(k)‖ ≠ 0.        (4.1.19)

The sequence {x^(k)} being bounded, there exists B > 0 such that ‖x^(k)‖ ≤ B for k ≥ 1. With λ as in [AP-10]*, define the integer τ ≥ 1 by

    8B/λ ≤ τ < 8B/λ + 1.        (4.1.20a)

By Proposition 4.1, in view of (4.1.19), we have (4.1.6) and hence, with τ as above, there exists an integer l > 1 such that

    Σ_{k=l}^∞ ‖u^(k) − u^(k−1)‖² < 1/(4τ).        (4.1.20b)
If we select k ≥ l as in [AP-10]*, then, since

    x^(k+τ−1) − x^(k−1) = Σ_{i=k}^{k+τ−1} d^(i−1) = Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ u^(i−1),

we have,

    Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ u^(k−1) = x^(k+τ−1) − x^(k−1) − Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ (u^(i−1) − u^(k−1)).

Hence, taking norms,

    Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ ≤ 2B + Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ ‖u^(i−1) − u^(k−1)‖.        (4.1.20c)

But for i ∈ [k, k + τ − 1], using the Cauchy-Schwarz inequality and (4.1.20b),

    ‖u^(i−1) − u^(k−1)‖ ≤ Σ_{j=k}^{i−1} ‖u^(j) − u^(j−1)‖
                        ≤ [ (i − k) Σ_{j=k}^{i−1} ‖u^(j) − u^(j−1)‖² ]^{1/2}
                        ≤ [ τ · 1/(4τ) ]^{1/2} = 1/2,

and hence, from (4.1.20c), we obtain,

    Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ ≤ 4B.

On the other hand, since more than τ/2 of the indices i ∈ [k, k + τ − 1] satisfy ‖d^(i−1)‖ > λ,

    Σ_{i=k}^{k+τ−1} ‖d^(i−1)‖ > λτ/2,

in view of [AP-10]*. Thus, we have,

    τ < 8B/λ,

contradicting (4.1.20a). This contradiction leads to the denial of (4.1.19) and so (4.1.17) is achieved. ∎
For implementing the GPR+ algorithm, we have the following modified version of the Algorithm GPR1:

Algorithm: GPR2

This is the same as Algorithm GPR1 except that an additional step, namely Step 8a, is inserted between Step 8 and Step 9:

Step 8a. If β_GPR^(k+1) > 0, then go to Step 9. Otherwise go to Step 11.
4.2 GPR Algorithm with Powell Restart

Regarding the GPR+ algorithm, we notice from (3.2.10a) that, when working with exact line searches, β_GPR^(k) ≥ 0 if and only if g^(k)T (g^(k) − g^(k−1)) ≥ 0, that is, if and only if

    g^(k)T g^(k−1) ≤ ‖g^(k)‖².        (4.2.1)

So, the GPR+ algorithm with exact line search induces a restart in the steepest descent direction whenever (4.2.1) is violated, that is, when

    g^(k)T g^(k−1) > ‖g^(k)‖².        (4.2.2)
The condition (4.2.2) is a less restrictive restarting criterion than the Powell restarting criterion (Powell[P2])

    |g^(k)T g^(k−1)| ≥ 0.2 ‖g^(k)‖².        (4.2.3)

Even though Powell's criterion (4.2.3) was designed to ensure the convergence of Beale's restarting algorithm (Powell[P2]), we consider its use with the GPR algorithm in the hope of improving efficiency and convergence. The resulting algorithm, called the Powell Restarting GPR Algorithm (PGPR Algorithm, in short), is just as in (3.1.15) except that β_PGPR^(k) = β_GPR^(k) if

    |g^(k)T g^(k−1)| < 0.2 ‖g^(k)‖²        (4.2.4)

and β_PGPR^(k) = 0 if (4.2.3) occurs.
Moreover, we find, in view of (3.1.15), that the PGPR search direction s_PGPR^(k) can be written as

    s_PGPR^(k) = s_GPR^(k)   if (4.2.4) holds,
    s_PGPR^(k) = −g^(k)      if (4.2.3) holds,        (4.2.5)

for k ≥ 1, and so the descent property

    g^(k)T s_PGPR^(k) < 0        (4.2.6)

is satisfied at all iterations.
is satisfied at all iterations. It also follows that all the standard properties
of the GPR algorithm apply directly to the PGPR algorithm under the same
assumptions as those imposed on the GPR algorithm. For instance, we can see
that the global convergence results, stated in Theorem 3.19 and Theorem 3.20,
hold for the PGPR algorithm provided we suppose that g^(k) ≠ 0 for all k and that the line search is taken satisfying (3.3.3) with some ρ > 0.
It may be observed from (2.3.15), (3.2.10a) and (4.2.4) that in implementing the PGPR algorithm with exact line search, a restart is not induced if

    |β_GPR^(k) − β_FR^(k)| ≤ 0.2 β_FR^(k),        (4.2.7a)

that is, if

    |g^(k)T g^(k−1)| ≤ 0.2 ‖g^(k)‖².        (4.2.7b)
Thus, in such implementations of the PGPR algorithm, satisfaction of (4.2.7b) is a measure of the adequacy of β_GPR^(k).
Since the gradients are orthogonal when the GPR algorithm is applied to
a quadratic function q(·) and exact line searches are performed (see Proposition 3.3), and since (4.2.3) decides whether enough orthogonality between g^(k−1) and
g(k) has been lost to warrant a restart, it is necessary for the implementation of
the PGPR algorithm on a quadratic function that the line searches be almost
exact at all iterations. This means that the line search at each iteration must
perform at least one cubic interpolation, which is, in fact, very expensive in terms
of function and gradient evaluations.
To implement the PGPR algorithm, we modify the Algorithm GPR1 as described below:

Algorithm: GPR3

Just add the following new step, namely Step 5a, to Algorithm GPR1 between Step 5 and Step 6.

Step 5a. If |g^(k+1)T g^(k)| > 0.2 ‖g^(k+1)‖², then go to Step 11. Otherwise go to Step 6.
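Step 5a is a one-line test in practice. The sketch below assumes gradients held as NumPy vectors; the function name is illustrative, not from the thesis.

```python
import numpy as np

def powell_restart_needed(g_new, g_old):
    """Powell's restarting criterion (4.2.3), as used in Step 5a of
    Algorithm GPR3: restart in the steepest descent direction when
    enough orthogonality between successive gradients has been lost,
    i.e. when |g_new . g_old| > 0.2 ||g_new||^2."""
    return abs(g_new.dot(g_old)) > 0.2 * g_new.dot(g_new)

# Nearly orthogonal successive gradients: no restart is induced.
g1 = np.array([1.0, 0.0])
r1 = powell_restart_needed(g1, np.array([0.05, 1.0]))
# Strongly aligned successive gradients: restart.
r2 = powell_restart_needed(g1, np.array([0.9, 0.1]))
```

Note the test costs only two inner products per iteration, so it adds negligible work to the loop.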
4.3 Shanno's Angle-Test Restarting GPR Algorithm
Shanno's angle-test restart procedure (Shanno[S2]) for CG algorithms sets
up a switching criterion for restarting such algorithms with a steepest descent
direction when the cosine of the angle between the search direction and the
negative gradient is greater than a constant multiple of the cosine of the angle
between the FR search direction and the negative gradient. It thereby assures
the convergence of the modified algorithm, as the FR algorithm is globally convergent. We propose, therefore, an implementation of the GPR algorithm which
incorporates the angle-test restart. This new implementation will be referred
to as Shanno's Angle-Test Restarting GPR Algorithm (SGPR Algorithm, in
short).
Shanno's procedure is based on consideration of the FR process with exact
line search (so that g^(k)T s_FR^(k−1) = 0). Thus we have,
for k ≥ 1,

    cos θ_FR^(k) = ‖g^(k)‖ / ‖s_FR^(k)‖,        (4.3.1)

and, for k > 1,

    ‖s_FR^(k)‖² = ‖g^(k)‖² + β_FR^(k)² ‖s_FR^(k−1)‖²,        (4.3.2)

where θ_FR^(k) ≜ ∠(−g^(k), s_FR^(k)). From (4.3.2), it follows that for all k > 1,

    ‖s_FR^(k)‖² = ‖g^(k)‖² + β_FR^(k)² ‖g^(k−1)‖² + β_FR^(k)² β_FR^(k−1)² ‖g^(k−2)‖²
                  + … + β_FR^(k)² β_FR^(k−1)² ⋯ β_FR^(2)² ‖g^(1)‖².

Hence, using (2.3.15), we obtain,

    ‖s_FR^(k)‖² = ‖g^(k)‖⁴ Σ_{l=1}^{k} ‖g^(l)‖^{−2}

and so, from (4.3.1), we have

    cos² θ_FR^(k) = 1 / ( ‖g^(k)‖² Σ_{l=1}^{k} ‖g^(l)‖^{−2} ).        (4.3.3)

Assuming that g^(k) ≠ 0 for all k, define

    γ^(k)² ≜ τ / ( ‖g^(k)‖² Σ_{l=1}^{k} ‖g^(l)‖^{−2} )

with τ > 0. We now use the test

    cos² θ_GPR^(k) ≥ γ^(k)²,
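The angle test is cheap to carry out, since the sum Σ_{l=1}^{k} ‖g^(l)‖^{−2} can be accumulated as the iteration proceeds. A minimal sketch follows, assuming a small positive constant τ (the default value 1e-4 is an assumption, not from the thesis):

```python
import numpy as np

def shanno_angle_test(g, s, inv_grad_norm_sum, tau=1e-4):
    """Shanno-style angle test for an SGPR-like restart rule.  Compares
    cos^2 of the angle between -g and the current direction s with
    gamma^2 = tau / (||g||^2 * sum_l ||g^(l)||^-2), i.e. tau times the
    cos^2 angle the FR direction would have under exact line searches,
    by (4.3.3).  `inv_grad_norm_sum` accumulates sum of ||g^(l)||^-2
    up to the current iteration.  Returns True when the test
    cos^2 theta >= gamma^2 holds, i.e. no restart is needed."""
    gn2 = g.dot(g)
    cos2 = g.dot(s) ** 2 / (gn2 * s.dot(s))   # cos^2 of the angle(-g, s)
    gamma2 = tau / (gn2 * inv_grad_norm_sum)
    return cos2 >= gamma2

g = np.array([1.0, 1.0])
running_sum = 1.0 / g.dot(g)        # first iteration contributes ||g^(1)||^-2
ok = shanno_angle_test(g, -g, running_sum)   # steepest descent passes the test
```

In a full implementation, `running_sum` would be updated with ‖g^(k)‖^{−2} after each gradient evaluation, and a failed test would trigger a steepest-descent restart.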