
JOURNAL OF THE OPTICAL SOCIETY OF AMERICA, Volume 63, Number 5, May 1973

Grey's method for nonlinear optimization

L. W. Cornwell, Western Illinois University, Macomb, Illinois 61455

R. J. Pegis, State University of New York, Brockport, New York 14420

A. K. Rigler, University of Missouri at Rolla, Rolla, Missouri 65401

T. P. Vogl, Westinghouse Research Laboratories, Pittsburgh, Pennsylvania 15235

(Received 31 October 1972)

The optimization technique first developed by Grey for lens design is described in terms of more recent work that has appeared in the mathematical programming literature. We present two modifications of the original algorithm that improve its performance on difficult problems. To confirm the confidence that the many users have placed in the method, the outline of a convergence proof is included. Finally, the theoretical work is supported by the results of numerical experiments.

Index Headings: Lens design; Computers.

In 1963, Grey published a description of his work that had led to a very successful computer program for designing optical systems.1 Although Grey intended that his papers introduce a new optical concept of orthonormal aberrations, his work received the notice of the optics community primarily because of the practical success of his computer program. Neither Grey's method nor its popular alternative, damped least squares,2 need be urged upon optical designers; it is not yet known how to choose between the two methods for a given least-squares problem.3 Our emphasis is not intended to imply that this difficult question has been answered in favor of Grey's algorithm; instead, by re-explaining his technique in terms of more recent work that has appeared in the optimization literature, we are led to improvements in the convergence rate and in the orthogonality calculations.

We first describe Grey's original algorithm as a conjugate-direction method for the minimization of a sum of squares of functions. The basic technique of orthogonalization is conventional in linear regression4; Grey's method is an ingenious modification to treat the nonlinear functions. Our second section considers the problem of extreme ill-conditioning, which often occurs when constraints are handled by means of penalty functions. It is well known that the classic Gram-Schmidt orthogonalization process is numerically less desirable than the modified process.5 Nevertheless, Grey's method for nonlinear problems follows the classic order. We show that a linear correction can be applied that restores the effectiveness of the algorithm for ill-conditioned problems. Next, we introduce an acceleration step that reduces the number of iterations and provides the link to a general convergence theorem that sets forth conditions to guarantee convergence of the algorithm. Finally, we support these theoretical developments with numerical evidence.

TWO ORTHOGONALIZATION ALGORITHMS

Although Grey's work was completed earlier and was carried out from a different viewpoint, we have found it instructive to develop his approach to optimization as a variation of a familiar linear least-squares algorithm that was published by Bauer.6 We begin with a brief discussion of Bauer's paper.

Bauer's Linear Least-Squares Algorithm

The following equations are to be solved in the least-squares sense,

$$A\beta = k \qquad (1)$$

or

$$A^{t}A\beta = A^{t}k,$$

where $A$ is an $n \times m$ matrix, $n > m$. Because the condition of $A^{t}A$ will be worse than the column condition7 of $A$, Bauer proposed that $A$ be factored into an $n \times m$ matrix $\Phi$ whose columns are orthonormal, postmultiplied by an $m \times m$ transformation matrix $S$. Thus, Eq. (1) becomes

$$\mu = \Phi^{t}k, \qquad S\beta = \mu. \qquad (2)$$

These equations are solved first for $\mu = \Phi^{t}k$, and then for $\beta$. If $S$ were upper triangular, as it would be when $\Phi$ is formed by the Gram-Schmidt process, then $\beta = S^{-1}\mu$ is obtained by a simple back substitution.
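To make Eqs. (1) and (2) concrete, the following minimal NumPy sketch (ours, not Bauer's ALGOL procedure) factors $A$ into orthonormal columns $\Phi$ times an upper-triangular $S$, forms $\mu = \Phi^{t}k$, and recovers $\beta$ by back substitution; numpy.linalg.qr stands in for the orthonormalization, and the random 5 x 3 system is purely illustrative.

```python
import numpy as np

def bauer_solve(A, k):
    """Least-squares solution of A beta ~= k via the factorization of Eq. (2).

    Phi has orthonormal columns and S is upper triangular, so
    mu = Phi^t k and S beta = mu is solved by back substitution.
    """
    Phi, S = np.linalg.qr(A)         # A = Phi S, S upper triangular
    mu = Phi.T @ k                   # first of Eqs. (2)
    beta = np.linalg.solve(S, mu)    # back substitution through the triangle
    return beta

# Illustrative overdetermined 5 x 3 system (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
k = rng.standard_normal(5)
print(bauer_solve(A, k))
print(np.linalg.lstsq(A, k, rcond=None)[0])   # should agree
```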

Decomposition techniques of this nature are discussed in some detail by Golub,4 who shows how they are related to the square-root method applied to the normal-equations matrix $A^{t}A$. The Gram-Schmidt process for producing the orthonormal columns of $\Phi$ can be carried out in two ways, differing algebraically only in the order of calculation.

The set of vectors $a_i$, $i = 1, 2, \ldots, m$, are to be transformed into an orthonormal set $\phi_i$. Inner products are denoted by $(\cdot\,,\cdot)$ and $\|a\| = (a,a)^{1/2}$.

Classic Gram-Schmidt (CGS) Method

$$\phi_1^{*} = a_1, \qquad \phi_1 = \phi_1^{*}/\|\phi_1^{*}\|,$$

$$\phi_k^{*} = a_k - \sum_{i=1}^{k-1} (\phi_i, a_k)\,\phi_i, \qquad \phi_k = \phi_k^{*}/\|\phi_k^{*}\|, \qquad k = 2, 3, \ldots, m. \qquad (3)$$

Modified Gram-Schmidt (MGS) Method

$$\phi_1^{*} = a_1, \qquad \phi_1 = \phi_1^{*}/\|\phi_1^{*}\|,$$

$$a_j^{(1)} = a_j - (\phi_1, a_j)\,\phi_1, \qquad j = 2, 3, \ldots, m,$$

$$\phi_k^{*} = a_k^{(k-1)}, \qquad \phi_k = \phi_k^{*}/\|\phi_k^{*}\|, \qquad (4)$$

$$a_j^{(k)} = a_j^{(k-1)} - (\phi_k, a_j^{(k-1)})\,\phi_k, \qquad j = k+1, \ldots, m, \quad k = 2, 3, \ldots, m.$$

Bauer's ALGOL procedure uses the MGS orthogonalization because it does not tend to lose orthogonality as a result of propagated round-off error. For well-conditioned linear problems, Grey's method and Bauer's are identical. For difficult linear problems, Bauer's is clearly the better method.
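The two orderings in Eqs. (3) and (4) differ only in when the inner products are formed, but they behave differently in finite precision. The short sketch below (ours, not code from the paper) implements both and reports how far $\Phi^{t}\Phi$ departs from the identity for a deliberately near-dependent Vandermonde test matrix of our own choosing.

```python
import numpy as np

def cgs(A):
    """Classic Gram-Schmidt, Eq. (3): each a_k is orthogonalized against
    projections (phi_i, a_k) formed from the original column."""
    A = np.asarray(A, dtype=float)
    Phi = np.zeros_like(A)
    for k in range(A.shape[1]):
        v = A[:, k] - Phi[:, :k] @ (Phi[:, :k].T @ A[:, k])
        Phi[:, k] = v / np.linalg.norm(v)
    return Phi

def mgs(A):
    """Modified Gram-Schmidt, Eq. (4): the remaining columns are updated
    as soon as each phi_k becomes available."""
    V = np.array(A, dtype=float)
    Phi = np.zeros_like(V)
    for k in range(V.shape[1]):
        Phi[:, k] = V[:, k] / np.linalg.norm(V[:, k])
        for j in range(k + 1, V.shape[1]):
            V[:, j] -= (Phi[:, k] @ V[:, j]) * Phi[:, k]
    return Phi

# Nearly linearly dependent columns (hypothetical test data).
A = np.vander(np.linspace(1.0, 1.2, 8), 6, increasing=True)
for name, Phi in (("CGS", cgs(A)), ("MGS", mgs(A))):
    err = np.max(np.abs(Phi.T @ Phi - np.eye(Phi.shape[1])))
    print(f"{name}: max |Phi^t Phi - I| = {err:.2e}")
```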

Grey's Version for Nonlinear Problems

Grey chose the classic order, for reasons based upon his approach to the nonlinear problem. Given a model $y = f(x; b) + \epsilon$, where $(y_i, x_i)$ is an observation point and $b = (b_1, b_2, \ldots, b_m)$ are the parameters to be estimated, we attempt to find $b$ to minimize

$$J = \sum_{i=1}^{n} \bigl(y_i - f(x_i; b)\bigr)^2 = \sum_{i=1}^{n} \epsilon_i^{2}, \qquad n > m. \qquad (5)$$

Linearization replaces $f(x_i; b)$ by the first two terms of Taylor's series about some nominal values $b^0$,

$$f(x_i; b) \approx f(x_i; b^0) + \sum_{j=1}^{m} \frac{\partial f}{\partial b_j}(x_i; b^0)\,\beta_j, \qquad (6)$$

leaving a linear least-squares problem to be solved for the perturbations $\beta$. The nominal values of $b$ are duly corrected and the process is repeated as often as necessary to reduce the $\epsilon_i$'s. Unfortunately, this sequence may oscillate widely or diverge; most of the commonly used methods for solving nonlinear minimization problems were developed to correct the erratic behavior of this sequence.
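Written as code, this plain linearize-and-resolve iteration of Eqs. (5)-(8) (essentially the Gauss-Newton scheme, with none of the safeguards discussed below) might look like the following sketch; the exponential model and the starting values are hypothetical.

```python
import numpy as np

def gauss_newton(f, jac, x, y, b0, n_iter=20):
    """Repeatedly solve the linearized problem of Eq. (6).

    f(x, b)   -> model values at the observation points x
    jac(x, b) -> n x m matrix of df/db_j at the points x
    Each pass solves A beta = k in the least-squares sense and corrects
    the nominal parameters; no damping, no line search.
    """
    b = np.array(b0, dtype=float)
    for _ in range(n_iter):
        k = y - f(x, b)                          # residuals, Eq. (8)
        A = jac(x, b)                            # Jacobian, Eq. (7)
        beta, *_ = np.linalg.lstsq(A, k, rcond=None)
        b += beta                                # correct the nominal values
    return b

# Hypothetical example: fit y = b1 * exp(b2 * x) to noiseless data.
f = lambda x, b: b[0] * np.exp(b[1] * x)
jac = lambda x, b: np.column_stack([np.exp(b[1] * x),
                                    b[0] * x * np.exp(b[1] * x)])
x = np.linspace(0.0, 1.0, 10)
y = f(x, [2.0, -1.5])
print(gauss_newton(f, jac, x, y, b0=[1.0, -1.0]))   # should approach [2.0, -1.5]
```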

Equation (6) is rearranged to be in the form of Eq. (1),

$$\sum_{j=1}^{m} a_{ij}\,\beta_j = k_i, \qquad i = 1, 2, \ldots, n,$$


where

$$A = \begin{bmatrix}
\dfrac{\partial f}{\partial b_1}(x_1; b^0) & \dfrac{\partial f}{\partial b_2}(x_1; b^0) & \cdots & \dfrac{\partial f}{\partial b_m}(x_1; b^0) \\[1ex]
\dfrac{\partial f}{\partial b_1}(x_2; b^0) & \dfrac{\partial f}{\partial b_2}(x_2; b^0) & \cdots & \dfrac{\partial f}{\partial b_m}(x_2; b^0) \\[1ex]
\vdots & & & \vdots \\[1ex]
\dfrac{\partial f}{\partial b_1}(x_n; b^0) & \dfrac{\partial f}{\partial b_2}(x_n; b^0) & \cdots & \dfrac{\partial f}{\partial b_m}(x_n; b^0)
\end{bmatrix} \qquad (7)$$

and

$$k = \bigl[\,y_1 - f(x_1; b^0),\; y_2 - f(x_2; b^0),\; \ldots,\; y_n - f(x_n; b^0)\,\bigr]^{t}. \qquad (8)$$

Grey solves this linearized problem by orthogonal transformations of Eqs. (2). In the transformed coordinate system, the variables $\mu_j$ are decoupled, and the first of Eqs. (2) represents $m$ scalar equations. The solution of the first of these, $\mu_1 = (\phi_1, k)$, minimizes the quadratic form $J$ as a function of the single variable $\mu_1$. If the other values of $\mu_j$ remain fixed at zero, we solve one further scalar equation for $\beta_1$.

At this point, Grey observed that only the first column of $A$ and the vector $k$ were used; only one of the parameters $b$ was adjusted; and the adjustment in the $\mu$ coordinate system was chosen to reduce $J$. Because the second column of $A$ is not needed until we solve for $\mu_2$, it is not evaluated until the new nominal value $b^1 = [\,b_1^0 + \beta_1,\; b_2^0,\; \ldots,\; b_m^0\,]^t$ is available. At this time, the right-hand side $k$ is recomputed, also by use of the most recent $b$. The new column $a_2$ is transformed by subtracting the appropriate multiple of $\phi_1$ and normalizing. Then $\mu_2 = (\phi_2, k)$, and $\beta_1$ and $\beta_2$ are obtained by back substitution in Eqs. (2). This process is carried through the $m$ parameters, and the entire cycle is repeated as necessary.
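Our reading of this single-step cycle can be summarized in a sketch. The routine below illustrates the description above; it is not Grey's program, and the lazily evaluated column routine jac_col and the way the nominal point is rebuilt from the back-substituted beta's are our interpretation.

```python
import numpy as np

def grey_cycle(f, jac_col, x, y, b0):
    """One cycle (m single steps) of Grey's method, as we read the text.

    f(x, b)          -> model values at the observation points
    jac_col(x, b, j) -> column j of the Jacobian, df/db_j, evaluated only
                        when needed, at the latest nominal parameters b
    Returns b_m, the point reached after the m conjugate steps.
    """
    b0 = np.asarray(b0, dtype=float)
    m = b0.size
    Phi = []                       # orthonormal directions phi_1 .. phi_j
    S = np.zeros((m, m))           # upper-triangular transformation matrix
    mu = np.zeros(m)
    b = b0.copy()
    for j in range(m):
        k = y - f(x, b)                        # right-hand side at the newest b
        a = jac_col(x, b, j)                   # column a_j, evaluated lazily
        for i, phi in enumerate(Phi):          # classic (CGS) order, Eq. (3)
            S[i, j] = phi @ a
            a = a - S[i, j] * phi
        S[j, j] = np.linalg.norm(a)
        phi_j = a / S[j, j]
        Phi.append(phi_j)
        mu[j] = phi_j @ k                      # optimal step in the mu system
        beta = np.linalg.solve(S[:j + 1, :j + 1], mu[:j + 1])   # back substitution
        b = b0.copy()
        b[:j + 1] += beta                      # adjust the first j+1 parameters
    return b
```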

In the linear case (where $J$ is strictly quadratic) the columns of $A$ do not change and Grey's method is formally equivalent to Bauer's. However, the convenience in computation of Bauer's algorithm does not carry over in the nonlinear case. The interruption of DO loops caused by the calculation of the $a_j$ vectors at each iteration makes the CGS sequence more desirable from a programming viewpoint. We will show, in the following section, how the numerical advantage of the MGS process may be preserved.

To prove quadratic convergence, we note that each step in the $\mu$ variables is the optimal value for the linear problem. If $\mu_j = (\phi_j, k)$ does not diminish $J$ in the nonlinear case, a one-dimensional search may be conducted at each step. The algorithm belongs to a rather general class of minimization algorithms, called conjugate directions.8 To see how this is so, we note that $\hat{\mu}_j = [0, 0, \ldots, \mu_j, \ldots, 0]^t$ and $\hat{\mu}_l = [0, 0, \ldots, \mu_l, \ldots, 0]^t$ are orthogonal by construction (only the $j$th and $l$th components, respectively, are nonzero). Then

$$(\hat{\mu}_j, \hat{\mu}_l) = (\Phi\hat{\mu}_j, \Phi\hat{\mu}_l) = (AS^{-1}\hat{\mu}_j, AS^{-1}\hat{\mu}_l) = (A\hat{\beta}_j, A\hat{\beta}_l) = (\hat{\beta}_j, A^{t}A\,\hat{\beta}_l) = 0.$$

The steps $\hat{\beta}_j$ and $\hat{\beta}_l$ in the original parameters are conjugate with respect to the normal-equations matrix $A^{t}A$. The conjugate directions are the columns of the inverse of the transformation matrix $S$. This is essentially a proof that Grey's algorithm has quadratic convergence.9

AN IMPROVED ORTHOGONALIZATION

Shortly after Grey's original papers were published, Rice5 demonstrated that the MGS process is more reliable than CGS, in that orthogonality is not lost as a result of the propagation of round-off errors. If the original vectors $a_j$ are nearly (but not exactly) linearly dependent, the search may get trapped in a subspace of lower dimension and fail to converge to the correct answer, even for a linear problem. We have seen that the MGS process is rather incompatible with Grey's algorithm, so we are left with the prospect of failure on many practical problems. Extended precision on crucial calculations has not been an entirely satisfactory solution.

In a generally available but unfortunately unpublished paper by Mitchell and McCraith,10 we have found a way to include the effect of Bauer's stable procedure in Grey's order of calculation. By a heuristic error analysis of the CGS process, Mitchell and McCraith derive a correction to each $s_{ij}$, $i < j$, that is based upon any deviation from orthogonality $e_{ij} = (\phi_i, \phi_j) \neq 0$ that may be the result of round-off error. Equations (3) are replaced by

$$\phi_j^{*} = a_j - \sum_{i=1}^{j-1} \bigl[\,s_{ij} + d_{ij}\,\bigr]\phi_i,$$

$$d_{ij} = -\sum_{\substack{l=1 \\ l \neq i}}^{j-1} s_{lj}\, e_{li}, \qquad (9)$$

$$s_{ij} = (\phi_i, a_j) = \phi_i^{t} a_j,$$

$$e_{ij} = (\phi_i, \phi_j) = \phi_i^{t}\phi_j.$$

The purpose of the correction $d_{ij}$ is to make $e_{ij} = 0$ as nearly as possible. The $s_{ij}$ are calculated as before; however, the elements of the transformation matrix are $s_{ij} + d_{ij}$.

This correction term requires some additional arithmetic, to obtain the $e_{ij}$. However, many users of orthogonalization routines evaluate the $e_{ij}$ anyway, as a check on the conjugacy of the derived vectors. For them, our correction is essentially free. Because all $e_{ij}$ depend upon previously calculated vectors, the correction can be applied in the nonlinear case. Thus, Grey's basic algorithm for nonlinear least squares can enjoy the numerical stability of the MGS process used by Bauer and others.
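A sketch of the corrected orthogonalization as we have written Eq. (9): the $s_{ij}$ are formed in the classic order, the measured departures from orthogonality $e_{li}$ among the vectors already computed supply the corrections $d_{ij}$, and the transformation matrix stores $s_{ij} + d_{ij}$. This is our transcription in NumPy, not the Mitchell-McCraith code.

```python
import numpy as np

def corrected_cgs(A):
    """Classic Gram-Schmidt with the correction of Eq. (9).

    Returns Phi (orthonormalized columns) and the transformation matrix T,
    whose off-diagonal elements are s_ij + d_ij and whose diagonal holds
    the normalization factors.
    """
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    Phi = np.zeros((n, m))
    T = np.zeros((m, m))
    for j in range(m):
        s = Phi[:, :j].T @ A[:, j]            # s_ij = phi_i^t a_j
        e = Phi[:, :j].T @ Phi[:, :j]         # e_li = phi_l^t phi_i
        np.fill_diagonal(e, 0.0)              # the sum in d_ij excludes l = i
        d = -(e @ s)                          # d_ij = -sum_{l != i} s_lj e_li
        v = A[:, j] - Phi[:, :j] @ (s + d)    # phi_j^* of Eq. (9)
        T[:j, j] = s + d
        T[j, j] = np.linalg.norm(v)
        Phi[:, j] = v / T[j, j]
    return Phi, T
```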


AN ACCELERATION OF GREY'S ALGORITHM

Nearly all of the search techniques for locating the minimum of a function of $m$ variables share the characteristic that a search direction is chosen first, followed by a search for a minimum along that direction. The various methods are distinguished by when and how the search directions are chosen. We may classify the various methods into two groups,11,12 total-step methods and single-step methods, according to whether the $m$ parameters are updated simultaneously or one at a time. Of the two methods most familiar to optical designers, damped least squares is a member of the total-step class, whereas Grey's algorithm is a single-step method in the $\mu$ coordinate system. When the objective function is purely quadratic (linear least squares), the two are equivalent, in the sense that both solve the linear problem once.13,14

In order to accelerate Grey's algorithm, we borrow from the total-step concept. The nominal point $b^0$ about which we have linearized is taken as a base point; the solution of the linearized equations provides a direction along which to search for a minimum. The damping term, if it is used, serves to alter this direction slightly toward the direction of the gradient. Thus, the solution of the problem linearized at $b^0$ gives a direction $\beta$ in which to search; the actual step length $\lambda$ is the result of that search. The new base point is

$$b^0_{\text{new}} = b^0_{\text{old}} + \lambda\beta,$$

where $\lambda$ minimizes $J(b^0_{\text{old}} + \lambda\beta)$ as a function of the single variable $\lambda$.

Grey's method (and other single-step methods) chooses $m$ successive independent directions by some relatively simple scheme; the one-dimensional search may be performed at each of the $m$ steps, or perhaps the step length correct for the purely quadratic $J$ is accepted. For the quadratic case, the minimum is obtained by one total step or $m$ single steps, at about the same cost in arithmetic. Our acceleration process is a one-dimensional search conducted after the $m$ single steps are complete. The search direction is $g = b^m - b^0_{\text{old}}$. Thus we minimize $J(b^0_{\text{old}} + \lambda g)$, and the minimizing point becomes the nominal base point $b^0_{\text{new}}$ for the next cycle of $m$ single steps.

Our excuse for this additional step is that a suitable direction is found after one total or $m$ single steps. This direction is slightly different from the purely linearized solution, because of the damping term in the total step, and a still different direction is found because Grey partially accounts for nonlinearity at each single step. Grey's method in its original form failed to make use of this over-all direction; our acceleration is simply an attempt to make use of this piece of information. The variable-metric algorithms seem to be more effective when reset after the $(m+1)$st rather than after the $m$th step; this $(m+1)$st step has an effect similar to our additional step. In its most elementary form, this is similar to the Hooke and Jeeves direct search,15 in which the $m$ single steps form the exploration phase followed by one pattern move.

In summary, the proposed revision of Grey's method is: (i) Select the nominal base point $b^0_{\text{old}}$. (ii) Perform the original algorithm for $m$ conjugate steps, to the temporary new base $b^m$; use the corrector calculation to maintain conjugate directions. (iii) Find $\lambda$ to minimize $J(b^m + \lambda[\,b^m - b^0_{\text{old}}\,])$. (iv) Choose the new base point $b^0_{\text{new}} = b^m + \lambda(b^m - b^0_{\text{old}})$. (v) Reset the inner algorithm and return to (ii) if the sequence $\{b^0\}$ has not converged.

These modifications do not alter the basic sequence of the original calculations and require only minor patching to incorporate in existing computer programs.
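As a sketch, steps (i)-(v) amount to wrapping any inner single-step cycle in a one-dimensional search along $g = b^m - b^0_{\text{old}}$. The routine below assumes a user-supplied inner_cycle (for instance, something like the grey_cycle sketch earlier) and uses SciPy's scalar minimizer for the line search; neither appears in the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def accelerated_grey(J, inner_cycle, b0, tol=1e-8, max_cycles=50):
    """Outer loop of steps (i)-(v): m single steps, then one search
    along g = b_m - b_old.

    J(b)           -> the sum of squares to be minimized
    inner_cycle(b) -> point b_m reached after m conjugate single steps
    """
    b_old = np.asarray(b0, dtype=float)
    for _ in range(max_cycles):
        b_m = inner_cycle(b_old)                              # step (ii)
        g = b_m - b_old                                       # acceleration direction
        res = minimize_scalar(lambda lam: J(b_m + lam * g))   # step (iii)
        b_new = b_m + res.x * g                               # step (iv)
        if np.linalg.norm(b_new - b_old) < tol:               # step (v)
            return b_new
        b_old = b_new
    return b_old
```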

CONVERGENCE OF THE ACCELERATED ALGORITHM

The mathematical details of a convergence proof are not appropriate to this journal; however, the fact that such a theorem can be proved should be of interest to the users of the algorithm. Therefore, we will discuss the theorem briefly and refer to other sources for the details. A rather general treatment of nonlinear optimization is given by Zangwill,16 and the specific application of his theory to Grey's accelerated method is available in the thesis by Cornwell.17

Zangwill describes the computational routine in terms of a point-to-set mapping: a given point in a set generates a subset, any member of which is a possible successor of the given point. The given point thus generates a sequence of points which belong to a compact subset. Furthermore, a solution set is included in the original set, and a continuous objective function is given such that, for a nonsolution point, the objective function is decreased by a point of the image set; the map is closed if the point is not a solution. These conditions assure that either a solution is found or the limit of any convergent subsequence is a solution.

Zangwill defines a mixed algorithm as one in which an alternate convergent map is interspersed periodically, and he proves that this also gives a convergent process. The proof for Grey's accelerated method consists of showing that it is a mixed algorithm, both parts of which satisfy Zangwill's hypotheses.

NUMERICAL EXPERIMENTS

Our two modifications of the original Grey procedure are quite independent in their effects; for purposes of illustration, we treat them separately. Our first illustrative problem is a set of four nearly linearly dependent vectors, which has become a traditional bad example in numerical analysis.18 We will compare the results of CGS, the corrected CGS, and the true values from extended precision. We will solve the linear system $A\beta = k$,


TABLE I. A numerical comparison of two algorithms: iterations (IT) and function evaluations (FE) for test problems A-E.

              A         B         C         D         E
            IT   FE   IT   FE   IT   FE   IT   FE   IT   FE
Original    64  224   18   63   24  156   90  585   20  134
Modified    24  121   10   50   18  136   58  445   10   80

$$\begin{bmatrix} 5 & 7 & 6 & 5 \\ 7 & 10 & 8 & 7 \\ 6 & 8 & 10 & 9 \\ 5 & 7 & 9 & 10 \end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix} =
\begin{bmatrix} 23 \\ 32 \\ 33 \\ 31 \end{bmatrix}.$$

The answers, of course, are $\beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$. We will display the $\Phi$ and $S$ matrices and the solution vector for the three cases mentioned.

CGS

$$\Phi = \begin{bmatrix}
0.4303315 & 0.0844098 & -0.3695331 & 0.8236909 \\
0.6024641 & 0.5739057 & -0.2571774 & -0.4859476 \\
0.5163978 & -0.8101742 & -0.1283258 & -0.2487648 \\
0.4303315 & 0.0844098 & 0.8836505 & 0.1533144
\end{bmatrix},$$

$$S = \begin{bmatrix}
11.61895 & 16.18045 & 16.43866 & 15.31980 \\
 & 0.4388536 & -2.244349 & -2.008080 \\
 & & 2.393903 & 4.033666 \\
 & & & 0.0819287
\end{bmatrix},$$

$$\beta = [\,-10.76233 \quad 8.102251 \quad 3.959323 \quad -0.755828\,]^{t}.$$

Corrected

$$\Phi = \begin{bmatrix}
0.4303315 & 0.0844098 & -0.3695245 & 0.8192371 \\
0.6024641 & 0.5739057 & -0.2570599 & -0.4915492 \\
0.5163978 & -0.8101742 & -0.1285266 & -0.2457317 \\
0.4303315 & 0.0844098 & 0.8836589 & 0.1638497
\end{bmatrix},$$

$$S = \begin{bmatrix}
11.61895 & 16.18045 & 16.43872 & 15.31984 \\
 & 0.4388536 & -2.244904 & -2.008594 \\
 & & 2.393903 & 4.032683 \\
 & & & 0.0819224
\end{bmatrix},$$

$$\beta = [\,1.085994 \quad 0.948791 \quad 0.976511 \quad 1.014064\,]^{t}.$$

"True" Solution

$$\Phi = \begin{bmatrix}
0.4303315 & 0.0843949 & -0.3695286 & 0.8192319 \\
0.6024641 & 0.5738857 & -0.2570634 & -0.4915392 \\
0.5163977 & -0.8101915 & -0.1285317 & -0.2457695 \\
0.4303315 & 0.0843949 & 0.8838481 & 0.1638464
\end{bmatrix},$$

$$S = \begin{bmatrix}
11.6189500 & 16.18046376 & 16.43866265 & 15.31980079 \\
 & 0.4388537257 & -2.244905597 & -2.008599745 \\
 & & 2.393902511 & 4.032681411 \\
 & & & 0.08192319205
\end{bmatrix},$$

$$\beta = [\,0.9999999978 \quad 1.000000001 \quad 1.000000001 \quad 0.9999999997\,]^{t}.$$
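The coefficient matrix and right-hand side of this example are easy to enter directly; the following check (ours, not part of the paper) confirms that the exact solution is the vector of ones and reports the condition number that makes the problem a "bad example."

```python
import numpy as np

A = np.array([[5.0, 7.0, 6.0, 5.0],
              [7.0, 10.0, 8.0, 7.0],
              [6.0, 8.0, 10.0, 9.0],
              [5.0, 7.0, 9.0, 10.0]])
k = np.array([23.0, 32.0, 33.0, 31.0])

print(A @ np.ones(4))          # reproduces k, so beta = (1,1,1,1)^t is exact
print(np.linalg.cond(A))       # the ill conditioning behind the example
print(np.linalg.solve(A, k))   # double precision recovers the ones closely
```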

For a numerical comparison of our accelerated technique with the original version, we chose five test problems that have become rather traditional to use for such purposes. Each consists of a sum of squares of nonlinear terms. To the best of our knowledge, they are identified by the name of the original proposer.

(a) Rosenbrock:

$$f(x_1, x_2) = 100\,(x_2 - x_1^{2})^{2} + (1 - x_1)^{2}.$$

Solution: $f(1, 1) = 0$. Starting point: $(-1.2, 1.0)$.

(b) Beale:

$$f(x_1, x_2) = [\,1.5 - x_1(1 - x_2)\,]^{2} + [\,2.25 - x_1(1 - x_2^{2})\,]^{2} + [\,2.625 - x_1(1 - x_2^{3})\,]^{2}.$$

Solution: $f(3.0, 0.5) = 0$. Starting point: $(2.0, 0.5)$.

(c) Powell:

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^{2} + 5\,(x_3 - x_4)^{2} + (x_2 - 2x_3)^{4} + 10\,(x_1 - x_4)^{4}.$$

Solution: $f(0, 0, 0, 0) = 0$. Starting point: $(10, 10, 10, -10)$.


(d) Wood:

$$f(x_1, x_2, x_3, x_4) = 100\,(x_2 - x_1^{2})^{2} + (1 - x_1)^{2} + 90\,(x_4 - x_3^{2})^{2} + (1 - x_3)^{2} + 0.2\,(x_2 - 1)^{2} + 0.2\,(x_4 - 1)^{2} + 9.9\,(x_2 + x_4 - 2)^{2}.$$

Solution: $f(1, 1, 1, 1) = 0$. Starting point: $(-3, -1, -3, -1)$.

(e) Miele:

$$f(x_1, x_2, x_3, x_4) = (e^{x_1} - x_2)^{4} + 100\,(x_2 - x_3)^{6} + \tan^{4}(x_3 - x_4) + x_1^{8} + (x_4 - 1)^{2}.$$

Solution: $f(0, 1, 1, 1) = 0$. Starting point: $(1, 2, 2, 2)$.
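For convenience, the five objectives can be transcribed directly from the formulas above; these NumPy definitions are our transcription (not the authors' code), and each evaluates to zero at the listed solution point.

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def beale(x):
    return ((1.5 - x[0] * (1 - x[1]))**2
            + (2.25 - x[0] * (1 - x[1]**2))**2
            + (2.625 - x[0] * (1 - x[1]**3))**2)

def powell(x):
    return ((x[0] + 10 * x[1])**2 + 5 * (x[2] - x[3])**2
            + (x[1] - 2 * x[2])**4 + 10 * (x[0] - x[3])**4)

def wood(x):
    return (100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
            + 90 * (x[3] - x[2]**2)**2 + (1 - x[2])**2
            + 0.2 * (x[1] - 1)**2 + 0.2 * (x[3] - 1)**2
            + 9.9 * (x[1] + x[3] - 2)**2)

def miele(x):
    return ((np.exp(x[0]) - x[1])**4 + 100 * (x[1] - x[2])**6
            + np.tan(x[2] - x[3])**4 + x[0]**8 + (x[3] - 1)**2)

# Each objective vanishes at its listed solution point.
print(rosenbrock([1.0, 1.0]), beale([3.0, 0.5]),
      powell([0.0, 0.0, 0.0, 0.0]), wood([1.0, 1.0, 1.0, 1.0]),
      miele([0.0, 1.0, 1.0, 1.0]))
```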

We tabulate both the number of iterations (IT) and the number of function evaluations (FE) required to reach a solution within a specified tolerance. (See Table I.) If successive vectors are checked for orthogonality in Grey's original algorithm (a prudent practice), then neither of our variations will cost a significant amount of computation. If the orthogonality check had not been carried out, we would need an additional $(m-2)(m-3)/2$ inner products of $n$-dimensional vectors.

SUMMARY

We have described Grey's optimization technique as a conjugate-direction algorithm for minimizing a quadratic form. He uses the classic Gram-Schmidt process to obtain the conjugate search directions. This choice of CGS is both a weakness and a strong point of his work. The CGS process is notoriously unstable and yet necessary to the attack on nonlinear problems. We have shown that the numerical difficulties can be repaired without sacrificing the nonlinear capability.

In addition, our extra search step after the $m$ usual steps can be quite effective in accelerating the progress toward the solution. The inclusion of this extra step defines a mixed algorithm that satisfies Zangwill's hypotheses for convergence.16

REFERENCES

1. D. S. Grey, J. Opt. Soc. Am. 53, 672 (1963); J. Opt. Soc. Am. 53, 677 (1963).
2. K. Levenberg, Q. Appl. Math. 2, 212 (1944).
3. L. W. Cornwell and A. K. Rigler, Appl. Opt. 11, 1659 (1972).
4. G. H. Golub, in Statistical Computation, edited by R. C. Milton and J. Nelder (Academic, New York, 1969).
5. J. R. Rice, Math. Comput. 20, 325 (1966).
6. F. L. Bauer, Numer. Math. 7, 338 (1965).
7. Bauer defines the column condition of the rectangular matrix $A$ as $\max_{\|x\|=1}\|Ax\| \,/\, \min_{\|x\|=1}\|Ax\|$.
8. N. A. Broste, Thesis, Carnegie-Mellon University (1968) (University Microfilms, Ann Arbor, Mich., order No. 69-7898).
9. Quadratic convergence, or property Q, is that the algorithm minimizes a quadratic function of m variables in m steps. This must not be confused with second-order convergence, a property of Newton's method.
10. W. C. Mitchell and D. L. McCraith, Heuristic Analysis of Numerical Variants of the Gram-Schmidt Orthonormalization Process, AD 687 450 (National Technical Information Service, Springfield, Va., 1969).
11. "Total step" and "single step" are familiar terms to workers in the numerical solution of elliptic boundary-value problems, a field that has much in common with our subject (Ref. 12).
12. L. Collatz, The Numerical Treatment of Differential Equations (Springer, Berlin, 1960).
13. In a recent monograph on optical design, Jamieson (Ref. 14) remarks that the variable-metric method, a conjugate-direction method, has been less successful than the damped least-squares method. Although we agree with this assessment, we believe that the reason he sets forth is the result of mixing the total-step and single-step ideas.
14. T. H. Jamieson, Optimization Techniques in Lens Design (American Elsevier, New York, 1971).
15. R. Hooke and T. A. Jeeves, J. Assoc. Comput. Mach. 8, 212 (1961).
16. W. Zangwill, Nonlinear Programming: A Unified Approach (Prentice-Hall, Englewood Cliffs, N.J., 1969).
17. L. W. Cornwell, Thesis, University of Missouri-Rolla (1972).
18. L. Fox, Numerical Linear Algebra (Oxford U.P., London, 1965).
