A Perfect Example for the BFGS Method

Yu-Hong Dai

State Key Laboratory of Scientific and Engineering Computing
Inst of Computational Mathematics and Scientific/Engineering Computing

Academy of Mathematics and Systems Science
Chinese Academy of Sciences, Beijing 100190, China

Email: [email protected]
http://lsec.cc.ac.cn/~dyh

July 12, 2010

Yu-Hong Dai (CAS) A Perfect Example for BFGS December, 2009 1 / 70


Outline

1 The BFGS Quasi-Newton Method

2 Looking for Consistent Steps and Gradients
    Prefixed Forms of Steps and Gradients
    Unit Stepsizes
    Consistency Conditions
    Choosing Parameters

3 Constructing A Suitable Objective Function
    Recovering the iterations and function values
    Seeking A Suitable Form for the Objective Function
    Constructing The Functions on The Lines
    Extending the Functions to the Whole Plane
    Final form of the objective function

4 Conclusion and Discussion


The BFGS Quasi-Newton Method

Quasi-Newton Methods

Unconstrained Optimization

\[ \min_{x \in \mathbb{R}^n} f(x) \]

Quasi-Newton Methods

\[ x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = -H_k g_k \]

How does $H_{k+1}$ get close to $[\nabla^2 f(x_{k+1})]^{-1}$? Defining

\[ s_k = x_{k+1} - x_k, \qquad y_k = g_{k+1} - g_k, \]

the answer by Davidon (1959) is to ask $H_{k+1}$ to be updated from $H_k$ and match the second derivative information along the previous step:

\[ H_{k+1} y_k = s_k \]


Broyden’s Convex Family of Methods

The DFP Method

\[ H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k} \]

The BFGS Method

\[ H_{k+1} = H_k - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right) \frac{s_k s_k^T}{s_k^T y_k} \]

Broyden's Convex Family of Methods ($\phi \in [0, 1]$)

\[ H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{s_k^T y_k} + \phi\, v_k v_k^T, \]

where

\[ v_k = \sqrt{y_k^T H_k y_k} \left( \frac{s_k}{s_k^T y_k} - \frac{H_k y_k}{y_k^T H_k y_k} \right). \]
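The three update formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and randomly generated test data; the function names `bfgs_update`, `broyden_update` and `random_test_data` are illustrative and do not come from the slides. It confirms that the Broyden family with $\phi = 1$ reproduces the BFGS formula and that every member satisfies the quasi-Newton equation $H_{k+1} y_k = s_k$.

```python
import numpy as np


def bfgs_update(H, s, y):
    """BFGS update of the inverse Hessian approximation H (H symmetric)."""
    Hy = H @ y
    sy = s @ y
    return (H
            - (np.outer(s, Hy) + np.outer(Hy, s)) / sy
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy)


def broyden_update(H, s, y, phi):
    """Broyden convex family: phi = 0 is DFP, phi = 1 is BFGS."""
    Hy = H @ y
    sy = s @ y
    yHy = y @ Hy
    v = np.sqrt(yHy) * (s / sy - Hy / yHy)
    return (H - np.outer(Hy, Hy) / yHy + np.outer(s, s) / sy
            + phi * np.outer(v, v))


def random_test_data(n=4, seed=0):
    """Hypothetical test data: SPD matrix H and a pair (s, y) with s^T y > 0."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    H = A @ A.T + n * np.eye(n)
    C = rng.standard_normal((n, n))
    s = rng.standard_normal(n)
    y = (C @ C.T + np.eye(n)) @ s  # y = B s with B SPD, hence s^T y > 0
    return H, s, y
```

A short usage example: `H, s, y = random_test_data()` followed by `np.allclose(bfgs_update(H, s, y) @ y, s)` should evaluate to `True`.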


Theoretical and Practical Line Searches

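Line Search Function

\[ \psi_k(\alpha) = f(x_k + \alpha d_k), \quad \text{where } \alpha \in [0, +\infty) \]

Global Exact Line Search: $\psi_k(\alpha_k^*) = \text{global} \min \psi_k(\alpha)$

Curry Line Search: $\alpha_k^*$ is the first local minimizer of $\psi_k(\alpha)$

Wolfe Inexact Line Search: find $\alpha_k$ such that

\[ \begin{cases} \psi_k(\alpha_k) \le \psi_k(0) + \sigma_1 \psi_k'(0)\,\alpha_k \\ \psi_k'(\alpha_k) \ge \sigma_2 \psi_k'(0) \end{cases} \]

Armijo Inexact Line Search: choose $\alpha_k$ to be the maximal value of $\{\lambda^m : m \ge 0\}$, where $\lambda \in (0, 1)$, such that

\[ \psi_k(\alpha_k) \le \psi_k(0) + \sigma \psi_k'(0)\,\alpha_k \]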
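The Wolfe and Armijo rules above translate directly into code. The sketch below assumes the line search function and its derivative are available as Python callables. The default constants (`sigma1`, `sigma2`, `lam`, `sigma`) and the cap `max_m` are illustrative assumptions, since the slides do not fix their values.

```python
def satisfies_wolfe(psi, dpsi, alpha, sigma1=1e-4, sigma2=0.9):
    """Check both Wolfe conditions for a trial stepsize alpha.

    psi  : callable, psi(a) = f(x_k + a * d_k)
    dpsi : callable, derivative of psi
    sigma1, sigma2 : assumed constants, not taken from the slides
    """
    decrease = psi(alpha) <= psi(0.0) + sigma1 * dpsi(0.0) * alpha
    curvature = dpsi(alpha) >= sigma2 * dpsi(0.0)
    return decrease and curvature


def armijo_stepsize(psi, dpsi0, lam=0.5, sigma=1e-4, max_m=60):
    """Return the largest lam**m (m >= 0) that satisfies the Armijo condition.

    dpsi0 : value of psi'(0); lam, sigma, max_m are assumed values.
    """
    for m in range(max_m + 1):
        alpha = lam ** m
        if psi(alpha) <= psi(0.0) + sigma * dpsi0 * alpha:
            return alpha
    raise ValueError("no acceptable stepsize found within max_m reductions")
```

For a convex line search function whose minimizer is the unit step, such as the toy function $\psi(\alpha) = (\alpha - 1)^2$, both routines accept $\alpha = 1$, which is the situation the example in these slides aims for.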


Convergence of BFGS for Convex Functions

Powell (1976):

The BFGS method with Wolfe inexact line searches is globally convergent for uniformly convex functions.

Byrd, Nocedal and Yuan (1987):

Broyden's convex family of methods (except DFP) with Wolfe inexact line searches is globally convergent for (not necessarily uniformly) convex functions.


Two Open Convergence Questions

Nocedal (1992), Fletcher (1994), et al.:

Open Question (I)
Does the DFP method with Wolfe line searches converge for uniformly convex functions?

Open Question (II)
Does the BFGS method with Wolfe line searches converge for general nonconvex functions?


Nonconvergence of BFGS for Nonconvex Functions

Powell (1984): if the stepsize $\alpha_k$ can be chosen as any local minimizer of the line search function $\psi_k(\alpha)$

Dai (2002): if the stepsize $\alpha_k$ is obtained by the Wolfe inexact line search

Mascarenhas (2004): if the stepsize $\alpha_k$ is chosen to be the global minimizer of the line search function $\psi_k(\alpha)$


Motivation of This Work

Powell (2000) was able to show that the BFGS method converges globally for two-dimensional nonconvex functions if the line search takes the first local minimizer of $\psi_k(\alpha)$.

The Aim of This Work is to construct a perfect example for the nonconvergence of the BFGS method with the following properties:

    the stepsize is always one; namely, $\alpha_k \equiv 1$;
    each line search function $\psi_k(\alpha)$ is convex and hence $\alpha_k$ is the unique minimizer of $\psi_k(\alpha)$;
    the unit stepsize can be accepted by all the exact and inexact line searches mentioned before.


Basic Ideas of Constructing Examples

    Prefix the concrete forms of the steps $\{s_k;\ k \ge 1\}$ and gradients $\{g_k;\ k \ge 1\}$ with some parameters to be determined later. Consequently, once $x_1$ is given, the whole sequence $\{x_k\}$ is fixed.

    To enable the BFGS method to generate the prefixed steps, investigate the consistency conditions on the steps $\{s_k;\ k \ge 1\}$ and gradients $\{g_k;\ k \ge 1\}$.

    Choose the parameters in some way to satisfy the consistency conditions and the other necessary conditions on the objective function.

    Construct a function $f$ whose gradients are the preassigned values, namely, $\nabla f(x_k) = g_k$ for all $k \ge 1$, and ensure the line search has the desired properties.


Looking for Consistent Steps and Gradients Prefixed Forms of Steps and Gradients


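Denoting the rotation matrices

\[ R_1 = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \qquad R_2 = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix}, \]

we prescribe the forms of the steps $\{s_k\}$ and the gradients $\{g_k\}$:

\[ s_1 = (\sqrt{2},\ 0,\ \gamma,\ \tau)^T, \qquad s_{k+1} = M s_k \quad (k \ge 1); \]

\[ g_1 = (l,\ h,\ 0,\ -\sqrt{2})^T, \qquad g_{k+1} = P g_k \quad (k \ge 1), \]

where

\[ M = \begin{pmatrix} R_1 & 0 \\ 0 & tR_2 \end{pmatrix}, \qquad P = \begin{pmatrix} tR_1 & 0 \\ 0 & R_2 \end{pmatrix}. \]

In the above, the parameter $t \in (0,1)$ accounts for the decay of the last two components of $s_k$ and the first two components of $g_k$.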


The angles $\alpha$ and $\beta$ are chosen so that

\[ \alpha = \frac{2\pi}{m_1}, \qquad \beta = \frac{2\pi}{m_2}, \]

for some integers $m_1$ and $m_2$. Specifically, we choose

\[ \alpha = \frac{1}{4}\pi, \qquad \beta = \frac{3}{4}\pi. \]

Therefore, after every eight iterations, the first two components of the steps $\{s_k\}$ return to the same values, while the last two components of $\{s_k\}$ shrink by a factor of $t^8$. This means that $\{x_k\}$ asymptotically cycles around the vertices of a regular octagon that lies in the plane.
The same holds for $\{g_k\}$ if we exchange the first and the last two components.
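The eight-step periodicity can be illustrated numerically. The sketch below assumes NumPy; the helper names and the placeholder values for $t$, $\gamma$ and $\tau$ used in the usage note are arbitrary illustrations, not the values derived later in the slides. It builds $M$ and $P$ for $\alpha = \pi/4$, $\beta = 3\pi/4$ and generates the steps by $s_{k+1} = M s_k$.

```python
import numpy as np


def rotation(theta):
    """2 x 2 rotation matrix by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def build_M_P(t, alpha=np.pi / 4, beta=3 * np.pi / 4):
    """Block-diagonal matrices M = diag(R1, t R2) and P = diag(t R1, R2)."""
    R1, R2 = rotation(alpha), rotation(beta)
    Z = np.zeros((2, 2))
    M = np.block([[R1, Z], [Z, t * R2]])
    P = np.block([[t * R1, Z], [Z, R2]])
    return M, P


def steps(s1, M, count):
    """Return the list [s_1, ..., s_count] with s_{k+1} = M s_k."""
    out = [np.asarray(s1, dtype=float)]
    for _ in range(count - 1):
        out.append(M @ out[-1])
    return out
```

For instance, with the placeholder values `t = 0.8` and `s1 = [sqrt(2), 0, 0.3, 0.9]`, the vector `steps(s1, M, 9)[8]` has the same first two components as `s1`, and its last two components are scaled by `0.8**8`. The same construction gives $M^T P = tI$, an identity used when the consistency conditions are reduced to $k = 1$.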


Looking for Consistent Steps and Gradients Unit Stepsizes


Assumption on the Line Search
We always assume that the line search satisfies

\[ g_{k+1}^T s_k = 0. \tag{1} \]

This, together with the quasi-Newton equation

\[ H_{k+1} y_k = s_k \tag{2} \]

and the definition of the search direction

\[ H_{k+1} g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1}, \tag{3} \]

implies that

\[ s_{k+1}^T y_k = -\alpha_{k+1}\, g_{k+1}^T H_{k+1} y_k = -\alpha_{k+1}\, g_{k+1}^T s_k = 0. \tag{4} \]

Relation (4) is called the conjugacy condition in the context of nonlinear conjugate gradient methods.


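Expression for $H_k y_k$ and $H_{k+1}$.

Multiplying the BFGS updating formula by $g_{k+1}$ and using (1),

\[ H_{k+1} g_{k+1} = H_k g_{k+1} - \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, s_k, \]

which with (3) gives

\[ H_k g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1} + \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, s_k. \tag{5} \]

Multiplying (5) by $g_k^T$ and noticing $g_k^T H_k g_{k+1} = 0$, we get that

\[ 0 = -\alpha_{k+1}^{-1} g_k^T s_{k+1} + \frac{y_k^T H_k g_{k+1}}{s_k^T y_k}\, g_k^T s_k = -\alpha_{k+1}^{-1} g_k^T s_{k+1} - y_k^T H_k g_{k+1}. \]

Thus $y_k^T H_k g_{k+1} = -\alpha_{k+1}^{-1} g_k^T s_{k+1} = -\alpha_{k+1}^{-1} g_{k+1}^T s_{k+1}$, and hence

\[ H_k g_{k+1} = -\alpha_{k+1}^{-1} s_{k+1} - \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k}\, s_k. \tag{6} \]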


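Expression for $H_k y_k$ and $H_{k+1}$ (ctd.)

Thus by (6) and $H_k g_k = -\alpha_k^{-1} s_k$, we obtain

\[ H_k y_k = -\alpha_{k+1}^{-1} s_{k+1} + \left[ \alpha_k^{-1} - \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k} \right] s_k. \tag{7} \]

Substituting this into the BFGS updating formula

\[ H_{k+1} = H_k - \frac{s_k y_k^T H_k + H_k y_k s_k^T}{s_k^T y_k} + \left(1 + \frac{y_k^T H_k y_k}{s_k^T y_k}\right) \frac{s_k s_k^T}{s_k^T y_k} \]

yields

\[ H_{k+1} = H_k + \alpha_{k+1}^{-1} \frac{s_k s_{k+1}^T + s_{k+1} s_k^T}{s_k^T y_k} + \left[ 1 - \alpha_k^{-1} + \alpha_{k+1}^{-1} \frac{g_{k+1}^T s_{k+1}}{s_k^T y_k} \right] \frac{s_k s_k^T}{s_k^T y_k}. \tag{8} \]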


Lemma 1. Assume that $g_{k+1}^T s_k = 0$ for all $k \ge 1$. Then for all $k \ge 1$ and $i \ge 0$, we have that

\[ H_k g_{k+i} + \alpha_{k+i}^{-1} s_{k+i} \in \mathrm{Span}\{s_k, s_{k+1}, \ldots, s_{k+i-1}\}. \tag{9} \]

Proof. For convenience, we write (8) as

\[ H_{k+1} = H_k + V(s_k, s_{k+1}), \tag{10} \]

where $V(s_k, s_{k+1})$ denotes the rank-two matrix on the right-hand side of (8). Therefore for all $i \ge 1$, we have that

\[ H_{k+i} = H_k + \sum_{j=1}^{i} V(s_{k+j-1}, s_{k+j}). \tag{11} \]

The statement follows by multiplying the above relation by $g_{k+i}$ and using $H_{k+i} g_{k+i} = -\alpha_{k+i}^{-1} s_{k+i}$ and $g_{k+i}^T s_{k+i-1} = 0$.


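Lemma 2. To construct a desired example with $\mathrm{Det}(S_1) \ne 0$, we must have that

\[ \alpha_k = 1, \quad \text{for all } k \ge 4. \]

Proof. Define

\[ G_k = [\, y_{k-1}\ \ g_k\ \ g_{k+1}\ \ g_{k+2} \,], \qquad S_k = [\, s_{k-1}\ \ s_k\ \ s_{k+1}\ \ s_{k+2} \,]. \]

Then by $H_k y_{k-1} = s_{k-1}$ and Lemma 1, we get that

\[ \begin{aligned} \mathrm{Det}(H_k)\,\mathrm{Det}(G_k) &= \mathrm{Det}[\, H_k y_{k-1}\ \ H_k g_k\ \ H_k g_{k+1}\ \ H_k g_{k+2} \,] \\ &= \mathrm{Det}[\, s_{k-1}\ \ {-\alpha_k^{-1}} s_k\ \ {-\alpha_{k+1}^{-1}} s_{k+1}\ \ {-\alpha_{k+2}^{-1}} s_{k+2} \,] \\ &= -\alpha_k^{-1} \alpha_{k+1}^{-1} \alpha_{k+2}^{-1}\, \mathrm{Det}(S_k). \end{aligned} \tag{12} \]

Replacing $k$ with $k+1$ in the above yields

\[ \mathrm{Det}(H_{k+1})\,\mathrm{Det}(G_{k+1}) = -\alpha_{k+1}^{-1} \alpha_{k+2}^{-1} \alpha_{k+3}^{-1}\, \mathrm{Det}(S_{k+1}). \tag{13} \]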


On the other hand, due to the special forms of $\{g_k\}$ and $\{s_k\}$, we have that $G_{k+1} = P G_k$ and $S_{k+1} = M S_k$. Hence

\[ \mathrm{Det}(G_{k+1}) = \mathrm{Det}(P)\,\mathrm{Det}(G_k) = t^2\, \mathrm{Det}(G_k), \qquad \mathrm{Det}(S_{k+1}) = \mathrm{Det}(M)\,\mathrm{Det}(S_k) = t^2\, \mathrm{Det}(S_k). \tag{14} \]

Due to the basic determinant relation of BFGS and the assumption,

\[ \mathrm{Det}(H_{k+1}) = \frac{s_k^T H_k^{-1} s_k}{s_k^T y_k}\, \mathrm{Det}(H_k) = \alpha_k\, \mathrm{Det}(H_k). \tag{15} \]

If $\mathrm{Det}(S_1) \ne 0$, (12) with $k = 1$ implies that $\mathrm{Det}(G_1) \ne 0$. Then by (14),

\[ \mathrm{Det}(G_k) \ne 0, \quad \mathrm{Det}(S_k) \ne 0, \quad \text{for all } k \ge 1. \tag{16} \]

Dividing (13) by (12) and using the above relations, we then obtain

\[ \alpha_{k+3} = 1. \]

So the statement holds due to the arbitrariness of $k \ge 1$.
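The first equality in the determinant relation (15) is a general property of the BFGS update and can be checked on random data. This is a minimal sketch assuming NumPy; the function names and the random test data are illustrative and not part of the slides.

```python
import numpy as np


def bfgs_update(H, s, y):
    """BFGS update of the inverse Hessian approximation H (H symmetric)."""
    Hy = H @ y
    sy = s @ y
    return (H
            - (np.outer(s, Hy) + np.outer(Hy, s)) / sy
            + (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy)


def determinant_relation(n=4, seed=1):
    """Return (Det(H_{k+1}), (s^T H^{-1} s / s^T y) * Det(H_k)) for random data."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    H = A @ A.T + n * np.eye(n)        # random SPD matrix
    C = rng.standard_normal((n, n))
    s = rng.standard_normal(n)
    y = (C @ C.T + np.eye(n)) @ s      # guarantees s^T y > 0
    lhs = np.linalg.det(bfgs_update(H, s, y))
    rhs = (s @ np.linalg.solve(H, s)) / (s @ y) * np.linalg.det(H)
    return lhs, rhs
```

The two returned numbers should agree up to rounding. The second equality in (15) then follows from $H_k^{-1} s_k = -\alpha_k g_k$ and $s_k^T y_k = -g_k^T s_k$ under assumption (1).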


Deleting finitely many initial iterations does not affect the example as a whole. Thus we will ask our counter-example to satisfy

\[ \mathrm{Det}(S_1) \ne 0 \tag{17} \]

and

\[ \alpha_k \equiv 1. \tag{18} \]


Looking for Consistent Steps and Gradients Consistency Conditions


Question. What other conditions on $\{g_k\}$ and $\{s_k\}$ ensure that the sequence $\{s_k;\ k \ge 1\}$ can be generated by the BFGS update?


Under the assumptions (1) (i.e. $g_{k+1}^T s_k = 0$), (17) and (18), the updating formula of $H_k$ in (8) can be simplified as

\[ H_{k+1} = H_k + \frac{s_k s_{k+1}^T + s_{k+1} s_k^T}{s_k^T y_k} - \frac{g_{k+1}^T s_{k+1}}{g_k^T s_k}\, \frac{s_k s_k^T}{s_k^T y_k}. \tag{19} \]

An early idea is to consider the linear system formed by the above relations and

\[ H_{8(j+1)} = \mathrm{diag}(t^{-4} E_2,\ t^{4} E_2)\, H_{8j}, \]

where $E_2$ is the 2-dimensional identity matrix. It seems difficult to analyze the above formula directly.


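Another Observation. In the case of dimension $n = 4$ and under the linear independence assumption on the steps, the matrix $H_{k+1}$ can be uniquely defined by the equations given by $H_{k+1} y_k$, $H_{k+1} g_{k+1}$, $H_{k+1} g_{k+2}$ and $H_{k+1} g_{k+3}$. As a matter of fact, we have that

\[ H_{k+1} y_k = s_k \quad \text{(quasi-Newton equation)} \tag{20} \]

\[ H_{k+1} g_{k+1} = -s_{k+1} \quad \text{(using (3) and } \alpha_k \equiv 1\text{)} \tag{21} \]

\[ H_{k+1} g_{k+2} = -s_{k+2} + \frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}}\, s_{k+1} \quad \text{(by (6) and } \alpha_k \equiv 1\text{)} \tag{22} \]

\[ H_{k+1} g_{k+3} = -s_{k+3} + \left( \frac{g_{k+3}^T s_{k+3}}{g_{k+2}^T s_{k+2}} + \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}} \right) s_{k+2} - \left( \frac{g_{k+2}^T s_{k+2}}{g_{k+1}^T s_{k+1}} \right) \left( \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}} \right) s_{k+1} \tag{23} \]

The last equality is obtained by multiplying (19) by $g_{k+2}$, using (22) and finally replacing $k$ with $k+1$.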


Relations (20)-(23) provide a system of 16 equations, while the symmetric matrix $H_{k+1}$ only has 10 independent entries.

Question. How to ensure that this linear system has a symmetric solution $H$?


Lemma 3. Assume that $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_n\}$ are two sets of $n$-dimensional linearly independent vectors. Then there exists a symmetric matrix $H \in \mathbb{R}^{n \times n}$ satisfying

\[ H u_i = v_i, \quad i = 1, 2, \ldots, n \tag{24} \]

if and only if

\[ u_i^T v_j = u_j^T v_i, \quad \forall\ i, j = 1, 2, \ldots, n. \tag{25} \]

Proof. The "only if" part. If $H = H^T$ satisfies (24), we have for all $i, j = 1, 2, \ldots, n$,

\[ u_i^T v_j = u_i^T H u_j = u_i^T H^T u_j = (H u_i)^T u_j = v_i^T u_j. \]

The "if" part. Assume that (25) holds. Defining the matrices

\[ U = (u_1\ u_2\ \ldots\ u_n), \qquad V = (v_1\ v_2\ \ldots\ v_n), \]

direct calculations show that

\[ U^T H U = U^T V = \begin{pmatrix} u_1^T v_1 & u_1^T v_2 & \cdots & u_1^T v_n \\ u_2^T v_1 & u_2^T v_2 & \cdots & u_2^T v_n \\ \cdots & \cdots & \cdots & \cdots \\ u_n^T v_1 & u_n^T v_2 & \cdots & u_n^T v_n \end{pmatrix} := A. \tag{26} \]

By (25), $A$ is symmetric. So $H = U^{-T} A U^{-1}$ satisfies $H = H^T$ and (24).
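The construction in the proof of Lemma 3 can be sketched in a few lines. The code below assumes NumPy; the function name `symmetric_interpolant` is hypothetical. It forms $A = U^T V$, checks the symmetry condition (25), and returns $H = U^{-T} A U^{-1}$.

```python
import numpy as np


def symmetric_interpolant(U, V):
    """Return the symmetric H with H U = V, following the proof of Lemma 3.

    U, V : square arrays whose columns are u_1..u_n and v_1..v_n.
    Raises ValueError when condition (25) fails, i.e. U^T V is not symmetric.
    """
    A = U.T @ V
    if not np.allclose(A, A.T):
        raise ValueError("condition (25) fails: U^T V is not symmetric")
    U_inv = np.linalg.inv(U)
    return U_inv.T @ A @ U_inv
```

As a usage example, taking any symmetric matrix `H0`, a nonsingular `U`, and `V = H0 @ U` makes condition (25) hold automatically, and `symmetric_interpolant(U, V)` recovers `H0` up to rounding.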


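Using Lemma 3, the following six conditions are sufficient for the linear system (20)-(23) to allow a symmetric matrix $H_{k+1}$.

\[ \begin{aligned} g_{k+1}^T (H_{k+1} y_k) &= (H_{k+1} g_{k+1})^T y_k \\ g_{k+2}^T (H_{k+1} y_k) &= (H_{k+1} g_{k+2})^T y_k \\ g_{k+3}^T (H_{k+1} y_k) &= (H_{k+1} g_{k+3})^T y_k \\ g_{k+2}^T (H_{k+1} g_{k+1}) &= (H_{k+1} g_{k+2})^T g_{k+1} \\ g_{k+3}^T (H_{k+1} g_{k+1}) &= (H_{k+1} g_{k+3})^T g_{k+1} \\ g_{k+3}^T (H_{k+1} g_{k+2}) &= (H_{k+1} g_{k+3})^T g_{k+2} \end{aligned} \]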


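Considering the whole sequence $\{H_{k+1};\ k \ge 1\}$ and combining the line search condition $g_{k+1}^T s_k = 0$, we require that the following four conditions hold for all $k \ge 1$.

\[ g_{k+1}^T s_k = 0 \tag{27} \]

\[ s_{k+1}^T y_k = 0 \tag{28} \]

\[ g_{k+2}^T s_k = -s_{k+2}^T y_k \tag{29} \]

\[ g_{k+3}^T s_k = -s_{k+3}^T y_k + \left( \frac{g_{k+3}^T s_{k+3}}{g_{k+2}^T s_{k+2}} + \frac{g_{k+3}^T s_{k+1}}{g_{k+1}^T s_{k+1}} \right) s_{k+2}^T y_k \tag{30} \]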


Looking for the expression for matrix $H_{k+1}$

Lemma 4. Assume that (25) holds and the matrix $H$ satisfies (24). If further, the matrix $A$ in (26) is positive definite, then there must exist nonsingular triangular matrices $T_1$ and $T_2$ such that

\[ A = V^T U = T_1^{-T} T_2. \]

Thus if denoting $\bar{V} = V T_1$, we have that

\[ \bar{V}^T U = T_2. \]

Since $HU = V$, we get that

\[ H = V U^{-1} = \left( \bar{V} T_1^{-1} \right) \left( T_2^{-1} \bar{V}^T \right) = \bar{V} \left( T_1^{-1} T_2^{-1} \right) \bar{V}^T. \]

Assuming that the diagonal matrix $T_1^{-1} T_2^{-1}$ has diagonal entries $t_1, t_2, \ldots, t_n$, and $\bar{V} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n)$, we obtain

\[ H = \sum_{i=1}^{n} t_i\, \bar{v}_i \bar{v}_i^T. \]


To express $H_{k+1}$, we make use of the steps $\{s_{k+i};\ i = 0, 1, 2, 3\}$ to introduce the following two vectors that are orthogonal to $y_k$ and $g_{k+1}$:

\[ \begin{aligned} z_k &= -\frac{s_{k+2}^T y_k}{s_k^T y_k}\, s_k - \frac{s_{k+2}^T g_{k+1}}{s_{k+1}^T g_{k+1}}\, s_{k+1} + s_{k+2}, \\ w_k &= -\frac{s_{k+3}^T y_k}{s_k^T y_k}\, s_k - \frac{s_{k+3}^T g_{k+1}}{s_{k+1}^T g_{k+1}}\, s_{k+1} + s_{k+3}. \end{aligned} \tag{31} \]

The vectors $z_k$ and $w_k$ are well defined because $s_k^T y_k = -s_k^T g_k > 0$ due to the descent property of $d_k$.

Direct calculations show that

\[ z_k^T y_k = z_k^T g_{k+1} = 0, \qquad w_k^T y_k = w_k^T g_{k+1} = 0. \tag{32} \]


Further, we define the vector that is orthogonal to $g_{k+2}$:

\[ v_k = -\frac{w_k^T g_{k+2}}{z_k^T g_{k+2}}\, z_k + w_k. \tag{33} \]

By the choice of $v_k$, it is easy to see that

\[ v_k^T y_k = v_k^T g_{k+1} = v_k^T g_{k+2} = 0. \tag{34} \]


If $z_k^T g_{k+2} < 0$ and $v_k^T g_{k+3} < 0$, the following matrix $H_{k+1}$ is positive definite and satisfies all the relations (20), (21), (22) and (23):

\[ H_{k+1} = a_k\, s_k s_k^T + b_k\, s_{k+1} s_{k+1}^T + c_k\, z_k z_k^T + d_k\, v_k v_k^T, \tag{35} \]

where

\[ a_k = \frac{1}{s_k^T y_k}, \quad b_k = -\frac{1}{s_{k+1}^T g_{k+1}}, \quad c_k = -\frac{1}{z_k^T g_{k+2}}, \quad d_k = -\frac{1}{v_k^T g_{k+3}}. \tag{36} \]

At the same time, the positive definiteness of $H_{k+1}$ requires

\[ a_k > 0, \quad b_k > 0, \quad c_k > 0, \quad d_k > 0. \]


Looking for Consistent Steps and Gradients Choosing Parameters


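Now we turn to how to choose suitable parameters such that the consistency conditions (27), (28), (29) and (30) hold. For this, we denote

\[ v = (\Delta_1\ \ \Delta_2\ \ \Delta_3\ \ \Delta_4)^T, \]

where

\[ \Delta_1 = \sqrt{2}\, l, \quad \Delta_2 = \sqrt{2}\, h, \quad \Delta_3 = -\sqrt{2}\, \tau, \quad \Delta_4 = -\sqrt{2}\, \gamma. \tag{37} \]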


The condition $s_1^T g_2 = s_1^T P g_1 = 0$, namely, (27) with $k = 1$, asks

\[ [\, t\cos\alpha \quad -t\sin\alpha \quad \cos\beta \quad -\sin\beta \,]\ v = 0. \tag{38} \]

The condition $s_2^T y_1 = s_1^T M^T (P - I) g_1 = s_1^T (tI - M^T) g_1 = 0$, namely, (28) with $k = 1$, requires the vector $v$ to satisfy

\[ [\, t - \cos\alpha \quad -\sin\alpha \quad t(1 - \cos\beta) \quad -t\sin\beta \,]\ v = 0. \tag{39} \]

The requirement (29) with $k = 1$, that is,

\[ 0 = s_1^T g_3 + s_3^T y_1 = s_1^T P^2 g_1 + s_1^T (M^2)^T (P - I) g_1 = s_1^T [P^2 + tM^T - (M^2)^T] g_1, \]

yields the equation

\[ [\, t\cos\alpha + (t^2 - 1)\cos 2\alpha \quad t\sin\alpha - (1 + t^2)\sin 2\alpha \quad t^2(\cos\beta - \cos 2\beta) + \cos 2\beta \quad t^2(\sin\beta - \sin 2\beta) - \sin 2\beta \,]\ v = 0. \tag{40} \]


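By (38), (39), (40) and the choice of $\alpha$ and $\beta$, we see that $\{\Delta_i\}$ must satisfy

\[ \begin{pmatrix} \frac{\sqrt{2}}{2} t & -\frac{\sqrt{2}}{2} t & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ t - \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} & \left(1 + \frac{\sqrt{2}}{2}\right) t & -\frac{\sqrt{2}}{2} t \\ \frac{\sqrt{2}}{2} t & \frac{\sqrt{2}}{2} t - (1 + t^2) & -\frac{\sqrt{2}}{2} t^2 & \left(\frac{\sqrt{2}}{2} + 1\right) t^2 + 1 \end{pmatrix} \begin{pmatrix} \Delta_1 \\ \Delta_2 \\ \Delta_3 \\ \Delta_4 \end{pmatrix} = 0. \tag{41} \]

By Cramer's rule, to meet (41), we may choose $\{\Delta_i\}$ as follows:

\[ \begin{aligned} \Delta_1 &= \pm\left[ (2 + \sqrt{2}) t^3 + (1 + \sqrt{2}) t + 1 \right], \\ \Delta_2 &= \pm\left[ (2 + \sqrt{2}) t^3 + (1 + \sqrt{2}) t - 1 \right], \\ \Delta_3 &= \pm\left[ -\sqrt{2}\, t^3 + 2t^2 + (1 - \sqrt{2}) t + 1 \right], \\ \Delta_4 &= \pm\left[ \sqrt{2}\, t^3 - 2t^2 + (1 + \sqrt{2}) t - 1 \right]. \end{aligned} \tag{42} \]

Since $g_1^T s_1 = \Delta_1 + \Delta_3 = \pm 2(t + 1)(t^2 + 1)$ and $t \in (0,1)$, we choose all the signs in (42) as $-$ so that $g_1^T s_1 < 0$.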
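That the choice (42) solves the homogeneous system (41) for every $t$ can be checked numerically. The following is a small sketch assuming NumPy; the function names `system_matrix` and `deltas` are illustrative and not from the slides.

```python
import numpy as np

SQ2 = np.sqrt(2.0)


def system_matrix(t):
    """Coefficient matrix of (41) for alpha = pi/4, beta = 3*pi/4."""
    h = SQ2 / 2.0
    return np.array([
        [h * t, -h * t, -h, -h],
        [t - h, -h, (1.0 + h) * t, -h * t],
        [h * t, h * t - (1.0 + t ** 2), -h * t ** 2, (h + 1.0) * t ** 2 + 1.0],
    ])


def deltas(t, sign=-1.0):
    """Vector (Delta_1, ..., Delta_4) from (42); sign = -1 is the choice made above."""
    return sign * np.array([
        (2.0 + SQ2) * t ** 3 + (1.0 + SQ2) * t + 1.0,
        (2.0 + SQ2) * t ** 3 + (1.0 + SQ2) * t - 1.0,
        -SQ2 * t ** 3 + 2.0 * t ** 2 + (1.0 - SQ2) * t + 1.0,
        SQ2 * t ** 3 - 2.0 * t ** 2 + (1.0 + SQ2) * t - 1.0,
    ])
```

For any sample value such as `t = 0.5`, the product `system_matrix(t) @ deltas(t)` should vanish up to rounding, and `deltas(t)[0] + deltas(t)[2]` should equal `-2 * (t + 1) * (t**2 + 1)`, which is negative as required for a descent direction.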


Noting that $s_3^T y_1 = -g_3^T s_1$ and $g_4^T s_4 = t\, g_3^T s_3$, we know that (30) with $k = 1$ is equivalent to

\[ g_4^T s_1 + g_2^T s_4 - g_1^T s_4 + g_3^T s_1 \left[ t + \frac{g_4^T s_2}{g_2^T s_2} \right] = 0. \tag{43} \]

Further, using $g_2^T s_2 = t\, g_1^T s_1$, $g_4^T s_2 = t\, g_3^T s_1$ and $g_2^T s_4 = t\, g_1^T s_3$, (43) can be simplified as

\[ g_1^T s_1 \left[ g_4^T s_1 - g_1^T s_4 + t\,(g_3^T s_1 + g_1^T s_3) \right] + (g_3^T s_1)^2 = 0. \tag{44} \]

Consequently, we obtain the following equation for the parameter $t$:

\[ (t + 1)^2 (t^2 + 1)^2 \left[ p^2(t) + 2t\,p(t) - 2q(t) \right] = 0, \tag{45} \]

where

\[ \begin{aligned} p(t) &= -(2 + \sqrt{2}) t^2 + (2 + \sqrt{2}) t - 1, \\ q(t) &= (2 + 3\sqrt{2}) t^3 - (4 + 3\sqrt{2}) t^2 + (3 + 2\sqrt{2}) t - 2\sqrt{2}. \end{aligned} \tag{46} \]

(46)


Further calculations provide

\[ (6 + 4\sqrt{2})^{-1} \left[ p^2(t) + 2t\,p(t) - 2q(t) \right] = \left[ t^2 + (1 - \sqrt{2}) t + \left(1 - \tfrac{\sqrt{2}}{2}\right) \right] \left[ t^2 + (1 - 3\sqrt{2}) t + \left(-3 + \tfrac{7}{2}\sqrt{2}\right) \right]. \]

By the above relation, it is not difficult to show that the equation (45) has a unique root in the interval $(-1, 1)$, that is

\[ t = \frac{3\sqrt{2} - 1 - \sqrt{31 - 20\sqrt{2}}}{2} \quad (\approx 0.7973), \tag{47} \]

which satisfies

\[ t^2 + (1 - 3\sqrt{2})\, t + \left(-3 + \tfrac{7}{2}\sqrt{2}\right) = 0. \]

Therefore if we choose the above $t$ and the vectors $s_1$ and $g_1$ such that (42) holds with minus sign, then the prefixed steps and gradients satisfy the consistency conditions (27), (28), (29) and (30) for all $k \ge 1$.
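The factorization and the root (47) can be verified with a short numerical check. This sketch assumes NumPy; the names `p`, `q`, `lhs`, `rhs` and `T_STAR` are chosen here for illustration.

```python
import numpy as np

SQ2 = np.sqrt(2.0)


def p(t):
    """Polynomial p(t) from (46)."""
    return -(2.0 + SQ2) * t ** 2 + (2.0 + SQ2) * t - 1.0


def q(t):
    """Polynomial q(t) from (46)."""
    return ((2.0 + 3.0 * SQ2) * t ** 3 - (4.0 + 3.0 * SQ2) * t ** 2
            + (3.0 + 2.0 * SQ2) * t - 2.0 * SQ2)


def lhs(t):
    """Left-hand side of the factorization, scaled by (6 + 4*sqrt(2))^{-1}."""
    return (p(t) ** 2 + 2.0 * t * p(t) - 2.0 * q(t)) / (6.0 + 4.0 * SQ2)


def rhs(t):
    """Product of the two quadratic factors."""
    first = t ** 2 + (1.0 - SQ2) * t + (1.0 - SQ2 / 2.0)
    second = t ** 2 + (1.0 - 3.0 * SQ2) * t + (-3.0 + 3.5 * SQ2)
    return first * second


# Root (47) of the second quadratic factor.
T_STAR = (3.0 * SQ2 - 1.0 - np.sqrt(31.0 - 20.0 * SQ2)) / 2.0
```

The first quadratic factor has discriminant $(1 - \sqrt{2})^2 - 4(1 - \sqrt{2}/2) = -1$, so it has no real roots, and the other root of the second factor is larger than 1. This is why `T_STAR` is the only root of the bracketed polynomial in $(-1, 1)$.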


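Furthermore, by (42) with minus signs and the definitions of $\Delta_i$'s, we can obtain

\[ \begin{aligned}
l &= (-4 - 13\sqrt{2})\,t + (-1 + 11\sqrt{2}),\\
h &= (-4 - 13\sqrt{2})\,t + (-1 + 12\sqrt{2}),\\
\gamma &= (17 - 8\sqrt{2})\,t + (-17 + 9\sqrt{2}),\\
\tau &= (-17 + 9\sqrt{2})\,t + (17 - 9\sqrt{2}).
\end{aligned} \tag{48} \]

Therefore the vectors $s_1$ and $g_1$ are

\[ s_1 = \begin{pmatrix} \sqrt{2} \\ 0 \\ (17 - 8\sqrt{2})t + (-17 + 9\sqrt{2}) \\ (-17 + 9\sqrt{2})t + (17 - 9\sqrt{2}) \end{pmatrix}, \qquad
g_1 = \begin{pmatrix} (-4 - 13\sqrt{2})t + (-1 + 11\sqrt{2}) \\ (-4 - 13\sqrt{2})t + (-1 + 12\sqrt{2}) \\ 0 \\ -\sqrt{2} \end{pmatrix}. \]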
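As a hedged numerical check of (48), the sketch below builds $s_1$ and $g_1$, confirms that $g_1^T s_1 = -2(t+1)(t^2+1) < 0$, and verifies the exact line search property $g_2^T s_1 = 0$. It assumes that $R_1$ and $R_2$ are counterclockwise rotations by $\pi/4$ and $3\pi/4$ (the orientation is an assumption made here), and that $g_2$ follows from the recurrences $(l_2,h_2)^T = tR_1(l,h)^T$ and $(c_2,d_2)^T = R_2(c_1,d_1)^T$ of the prefixed gradients. The helper names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
T = (3 * SQRT2 - 1 - math.sqrt(31 - 20 * SQRT2)) / 2  # root (47)

# Parameters from (48)
l1 = (-4 - 13 * SQRT2) * T + (-1 + 11 * SQRT2)
h1 = (-4 - 13 * SQRT2) * T + (-1 + 12 * SQRT2)
gamma = (17 - 8 * SQRT2) * T + (-17 + 9 * SQRT2)
tau = (-17 + 9 * SQRT2) * T + (17 - 9 * SQRT2)

s1 = [SQRT2, 0.0, gamma, tau]
g1 = [l1, h1, 0.0, -SQRT2]

def rotate(theta, v):
    # Counterclockwise rotation of a 2-vector by theta (assumed convention)
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# g2: first two components t*R1*(l, h), last two components R2*(c1, d1)
g2 = [T * x for x in rotate(math.pi / 4, [l1, h1])] + rotate(3 * math.pi / 4, [0.0, -SQRT2])

print("g1^T s1        =", dot(g1, s1))
print("-2(t+1)(t^2+1) =", -2 * (T + 1) * (T**2 + 1))
print("g2^T s1        =", dot(g2, s1))
```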


Remark. Numerically, other choices for $\alpha$ and $\beta$ are possible. For example,

\[ \alpha = \tfrac{2}{7}\pi, \qquad \beta = \tfrac{4}{7}\pi, \]

which could lead to a counter-example of 7 cyclic points. In this case, however, it is difficult to get its analytical solution and we can only obtain a numerical solution of $t \approx 0.8642$ in $(0,1)$.


Constructing A Suitable Objective Function: Recovering the iterations and function values


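For $i = 1, \ldots, 8$, the limits $x_i^* := \lim_{j\to\infty} x_{8j+i}$ and $g_i^* := \lim_{j\to\infty} g_{8j+i}$ exist.

Suppose that

\[ x_i^* = \begin{pmatrix} a_i \\ b_i \\ 0 \\ 0 \end{pmatrix}, \qquad g_i^* = \begin{pmatrix} 0 \\ 0 \\ c_i \\ d_i \end{pmatrix}; \qquad i = 1, \ldots, 8, \]

with

\[ a_1 = -\tfrac{\sqrt{2}}{2}, \quad b_1 = -1 - \tfrac{\sqrt{2}}{2}, \quad c_1 = 0, \quad d_1 = -\sqrt{2}. \]

We have that

\[ \begin{pmatrix} a_{i+1} \\ b_{i+1} \end{pmatrix} = R_1 \begin{pmatrix} a_i \\ b_i \end{pmatrix}, \qquad \begin{pmatrix} c_{i+1} \\ d_{i+1} \end{pmatrix} = R_2 \begin{pmatrix} c_i \\ d_i \end{pmatrix}. \]

We can see that $\{x_i^* : i = 1, \ldots, 8\}$ are exactly the vertices of a regular octagon that lies in the plane spanned by the first and second coordinates and has the origin as its center.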
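The octagon claim is easy to confirm numerically. The sketch below generates the points $(a_i, b_i)$ from $(a_1, b_1)$ by repeated rotation, assuming $R_1$ is a counterclockwise rotation by $\pi/4$, and reports the distance of each vertex from the origin and the length of each edge. Function names are illustrative.

```python
import math

def rotate(theta, v):
    # Counterclockwise rotation of a 2-vector by theta (assumed convention)
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def octagon_vertices():
    # V_1 = (a_1, b_1); V_{i+1} = R_1 V_i
    v = (-math.sqrt(2) / 2, -1 - math.sqrt(2) / 2)
    verts = [v]
    for _ in range(7):
        v = rotate(math.pi / 4, v)
        verts.append(v)
    return verts

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

verts = octagon_vertices()
radii = [math.hypot(v[0], v[1]) for v in verts]
sides = [dist(verts[i], verts[(i + 1) % 8]) for i in range(8)]

print("radii:", radii)
print("sides:", sides)
```

All radii equal $\sqrt{2+\sqrt{2}}$ and all edges have length $\sqrt{2}$, which matches the first two components $(\sqrt{2}, 0)$ of $s_1$.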


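Due to the special choices of $\{s_k\}$ and $\{g_k\}$, we have

\[ x_{8j+i} = \begin{pmatrix} a_i \\ b_i \\ t^{8j} p_i \\ t^{8j} q_i \end{pmatrix}, \qquad g_{8j+i} = \begin{pmatrix} t^{8j} l_i \\ t^{8j} h_i \\ c_i \\ d_i \end{pmatrix}; \qquad i = 1, \ldots, 8, \]

with

\[ \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = t R_2 \begin{pmatrix} p_i \\ q_i \end{pmatrix}; \qquad \begin{pmatrix} l_1 \\ h_1 \end{pmatrix} = \begin{pmatrix} l \\ h \end{pmatrix}, \quad \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix}. \]

To get the values of $p_1$ and $q_1$, we denote by $\bar{v}$ the vector formed by the last two components of $v$. We have that

\[ \bar{x}_1^* - \bar{x}_1 = \sum_{i=1}^{\infty} \bar{s}_i = \sum_{i=0}^{\infty} (tR_2)^i\, \bar{s}_1 = (E - tR_2)^{-1} \begin{pmatrix} \gamma \\ \tau \end{pmatrix}, \]

where $E$ is the identity matrix. Consequently, since $\bar{x}_1^* = 0$, we have that

\[ \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \bar{x}_1 = \frac{1}{4633} \begin{pmatrix} (-15720 + 4019\sqrt{2})\,t + (32931 - 17534\sqrt{2}) \\ (4376 - 1977\sqrt{2})\,t + (9563 - 9433\sqrt{2}) \end{pmatrix}. \]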
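The closed form for $(p_1, q_1)$ can be cross-checked by solving the $2\times 2$ system directly. The following sketch assumes that $R_2$ is a counterclockwise rotation by $3\pi/4$ and uses the fact that the last two components of $x_1^*$ are zero, so that $(p_1, q_1)^T = -(E - tR_2)^{-1}(\gamma, \tau)^T$. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
T = (3 * SQRT2 - 1 - math.sqrt(31 - 20 * SQRT2)) / 2  # root (47)
GAMMA = (17 - 8 * SQRT2) * T + (-17 + 9 * SQRT2)      # from (48)
TAU = (-17 + 9 * SQRT2) * T + (17 - 9 * SQRT2)        # from (48)

def solve_p1_q1(t, gamma, tau):
    # Entries of E - t*R2, with R2 = [[c, -s], [s, c]] (assumed orientation)
    c, s = math.cos(3 * math.pi / 4), math.sin(3 * math.pi / 4)
    m11, m12 = 1 - t * c, t * s
    m21, m22 = -t * s, 1 - t * c
    det = m11 * m22 - m12 * m21
    # y = (E - t*R2)^{-1} (gamma, tau)^T by the 2x2 inverse formula
    y1 = (m22 * gamma - m12 * tau) / det
    y2 = (-m21 * gamma + m11 * tau) / det
    # The limit point has zero last two components, hence (p1, q1) = -y
    return -y1, -y2

def closed_form_p1_q1(t):
    p1 = ((-15720 + 4019 * SQRT2) * t + (32931 - 17534 * SQRT2)) / 4633
    q1 = ((4376 - 1977 * SQRT2) * t + (9563 - 9433 * SQRT2)) / 4633
    return p1, q1

print("solved     :", solve_p1_q1(T, GAMMA, TAU))
print("closed form:", closed_form_p1_q1(T))
```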


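Assuming that the limit of $f(x_k)$ is $f^*$, we have that

\[ \begin{aligned}
f(x_{8j+i}) - f^* &= (g_i^*)^T (x_{8j+i} - x_i^*) = t^{8j} \begin{pmatrix} c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} p_i \\ q_i \end{pmatrix} \\
&= t^{8j} \left[ R_2^{\,i-1} \begin{pmatrix} c_1 \\ d_1 \end{pmatrix} \right]^T \left[ (tR_2)^{i-1} \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} \right] = t^{8j+i-1} \begin{pmatrix} c_1 \\ d_1 \end{pmatrix}^T \begin{pmatrix} p_1 \\ q_1 \end{pmatrix}.
\end{aligned} \]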

Consequently,

\[ f(x_1) - f^* = c_1 p_1 + d_1 q_1 = \frac{(3954 - 4376\sqrt{2})\,t + (18866 - 9563\sqrt{2})}{4633}, \]

\[ f(x_{8j+i}) - f^* = t^{8j+i-1} \left( f(x_1) - f^* \right). \]


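Now we compute the value of $g_{8j+i}^T s_{8j+i}$. Denoting

\[ s_{8j+i} = \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j} \gamma_i \\ t^{8j} \tau_i \end{pmatrix}, \]

we have $\eta_1 = \sqrt{2}$, $\xi_1 = 0$, $\gamma_1 = \gamma$, $\tau_1 = \tau$ and

\[ \begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix} = R_1 \begin{pmatrix} \eta_{i-1} \\ \xi_{i-1} \end{pmatrix}, \qquad \begin{pmatrix} \gamma_i \\ \tau_i \end{pmatrix} = t R_2 \begin{pmatrix} \gamma_{i-1} \\ \tau_{i-1} \end{pmatrix}. \]

Thus

\[ \begin{aligned}
g_{8j+i}^T s_{8j+i} &= \begin{pmatrix} t^{8j} l_i \\ t^{8j} h_i \\ c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} \eta_i \\ \xi_i \\ t^{8j} \gamma_i \\ t^{8j} \tau_i \end{pmatrix} = t^{8j} \begin{pmatrix} l_i \\ h_i \\ c_i \\ d_i \end{pmatrix}^T \begin{pmatrix} \eta_i \\ \xi_i \\ \gamma_i \\ \tau_i \end{pmatrix} \\
&= t^{8j+i-1} \left( l_1 \eta_1 + h_1 \xi_1 + c_1 \gamma_1 + d_1 \tau_1 \right) = t^{8j+i-1}\, g_1^T s_1,
\end{aligned} \]

where $g_1^T s_1 = \sqrt{2}\,(l_1 - \tau_1) = (-44 + 13\sqrt{2})\,t + (40 - 18\sqrt{2})$.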


Therefore

\[ \begin{aligned}
\frac{f(x_{8j+i+1}) - f(x_{8j+i})}{\alpha_{8j+i}\, g_{8j+i}^T s_{8j+i}} &= \frac{(f(x_{8j+i+1}) - f^*) - (f(x_{8j+i}) - f^*)}{g_{8j+i}^T s_{8j+i}} = \frac{(t^{8j+i} - t^{8j+i-1})(f(x_1) - f^*)}{t^{8j+i-1}\, g_1^T s_1} \\
&= \frac{(t - 1)(f(x_1) - f^*)}{g_1^T s_1} \approx 2.6483 \times 10^{-2}.
\end{aligned} \]

The above relation, together with $g_{8j+i+1}^T s_{8j+i} = 0$, implies that the stepsize $\alpha_{8j+i} = 1$ can be accepted by either the Wolfe line search or the Armijo line search with suitable line search parameters.
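To make "suitable line search parameters" concrete, the sketch below recomputes the ratio from the closed forms and tests the unit step against the standard Armijo sufficient-decrease and Wolfe curvature conditions. The parameter names `delta` and `sigma` and the exact form of the two tests are assumptions here (the usual textbook forms), since the slides' own line search definitions are not reproduced in this part.

```python
import math

SQRT2 = math.sqrt(2.0)
T = (3 * SQRT2 - 1 - math.sqrt(31 - 20 * SQRT2)) / 2  # root (47)

# f(x_1) - f* and g_1^T s_1 in closed form
F1 = ((3954 - 4376 * SQRT2) * T + (18866 - 9563 * SQRT2)) / 4633
G1S1 = (-44 + 13 * SQRT2) * T + (40 - 18 * SQRT2)

# Ratio (f(x_{k+1}) - f(x_k)) / (g_k^T s_k); it is the same for every k
RATIO = (T - 1) * F1 / G1S1

def unit_step_accepted(delta, sigma):
    """Armijo and Wolfe curvature tests for alpha = 1 (assumed standard forms)."""
    # Armijo: f(x+s) - f(x) <= delta * g^T s; dividing by g^T s < 0 flips the sign
    armijo = RATIO >= delta
    # Curvature: g_new^T s >= sigma * g^T s, where g_new^T s = 0 and g^T s < 0
    curvature = 0.0 >= sigma * G1S1
    return armijo and curvature

print("ratio =", RATIO)
print("delta=0.01, sigma=0.9:", unit_step_accepted(0.01, 0.9))
print("delta=0.10, sigma=0.9:", unit_step_accepted(0.10, 0.9))
```

Under these assumptions the unit step passes for any sufficient-decrease parameter up to about $0.0264$, and the curvature condition holds for every $\sigma \in (0,1)$ because the new gradient is orthogonal to the step.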


Constructing A Suitable Objective Function: Seeking A Suitable Form for the Objective Function


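To ensure that $\nabla f(x_{8j+i}) = g_{8j+i}$, we assume that the objective function $f$ is of the form

\[ f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4, \]

where $\lambda$ and $\mu$ are functions of two variables to be determined. Denoting

\[ V_i = \begin{pmatrix} a_i \\ b_i \end{pmatrix} \quad \text{and} \quad A_i = \begin{pmatrix} \dfrac{\partial \lambda(V_i)}{\partial x_1} & \dfrac{\partial \mu(V_i)}{\partial x_1} \\[2ex] \dfrac{\partial \lambda(V_i)}{\partial x_2} & \dfrac{\partial \mu(V_i)}{\partial x_2} \end{pmatrix}, \]

the functions $\lambda$ and $\mu$ have to satisfy

\[ \begin{pmatrix} \lambda(V_i) \\ \mu(V_i) \end{pmatrix} = \begin{pmatrix} c_i \\ d_i \end{pmatrix} \tag{49} \]

and

\[ A_i \begin{pmatrix} p_i \\ q_i \end{pmatrix} = \begin{pmatrix} l_i \\ h_i \end{pmatrix}. \tag{50} \]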


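Recalling the relations

\[ \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = t R_2 \begin{pmatrix} p_i \\ q_i \end{pmatrix}; \qquad \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix}, \]

we shall ask $A_i$ and $A_{i+1}$ to meet the condition

\[ A_{i+1} = R_1 A_i R_2^T \tag{51} \]

so that the relation (50) holds for all $i$ provided that it holds for $i = 1$. This is because, if (50) and (51) are true, then, since $R_2$ is orthogonal, we have

\[ A_{i+1} \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix} = (R_1 A_i R_2^T)(tR_2) \begin{pmatrix} p_i \\ q_i \end{pmatrix} = t R_1 A_i \begin{pmatrix} p_i \\ q_i \end{pmatrix} = t R_1 \begin{pmatrix} l_i \\ h_i \end{pmatrix} = \begin{pmatrix} l_{i+1} \\ h_{i+1} \end{pmatrix}. \]

Therefore we can focus our attention on the choice of the matrix

\[ A_1 := \begin{pmatrix} \omega_1 & \omega_2 \\ \omega_3 & \omega_4 \end{pmatrix}. \]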
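The induction step can be illustrated numerically. The sketch below picks an arbitrary $A_1$ and an arbitrary $(p_1, q_1)$, defines $(l_1, h_1)$ so that (50) holds for $i = 1$, and then propagates (51) together with the two recurrences, reporting the largest violation of (50). It assumes $R_1$ and $R_2$ are counterclockwise rotations by $\pi/4$ and $3\pi/4$. The sample numbers are arbitrary test values, not the ones of the example.

```python
import math

def rot(theta):
    # 2x2 counterclockwise rotation matrix (assumed convention)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def max_residual(a1, pq1, t, n=8):
    """Propagate (51) and the recurrences; return the largest violation of (50)."""
    r1, r2 = rot(math.pi / 4), rot(3 * math.pi / 4)
    a, pq = a1, pq1
    lh = matvec(a1, pq1)  # choose (l_1, h_1) so that (50) holds for i = 1
    worst = 0.0
    for _ in range(n):
        a = matmul(matmul(r1, a), transpose(r2))  # A_{i+1} = R1 A_i R2^T
        pq = [t * x for x in matvec(r2, pq)]       # (p, q)_{i+1} = t R2 (p, q)_i
        lh = [t * x for x in matvec(r1, lh)]       # (l, h)_{i+1} = t R1 (l, h)_i
        res = [u - v for u, v in zip(matvec(a, pq), lh)]
        worst = max(worst, abs(res[0]), abs(res[1]))
    return worst

print(max_residual([[0.3, -1.2], [2.0, 0.7]], [0.5, -0.4], 0.8))
```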


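Now we shall consider the restrictions of the functions $\lambda$ and $\mu$ on the edges of the octagon and define

\[ \lambda_i(\alpha) = \lambda(V_i + \alpha s_i), \qquad \mu_i(\alpha) = \mu(V_i + \alpha s_i), \]

where

\[ s_i = V_{i+1} - V_i = \begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix} \]

is the vector formed by the first two components of $s_{8j+i}$. The relation (49) implies that

\[ \begin{pmatrix} \lambda_{i+1}(0) \\ \mu_{i+1}(0) \end{pmatrix} = R_2 \begin{pmatrix} \lambda_i(0) \\ \mu_i(0) \end{pmatrix} \quad \text{for all } i \ge 1. \]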


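At the same time, we have that

\[ \begin{pmatrix} \lambda_{i+1}'(0) \\ \mu_{i+1}'(0) \end{pmatrix} = A_{i+1}^T s_{i+1} = \left( R_2 A_i^T R_1^T \right)(R_1 s_i) = R_2 A_i^T s_i = R_2 \begin{pmatrix} \lambda_i'(0) \\ \mu_i'(0) \end{pmatrix} \]

holds for all $i \ge 1$.

Based on the above conditions, we are going to construct the functions $\lambda_i$'s and $\mu_i$'s such that

\[ \begin{pmatrix} \lambda_{i+1}(\alpha) \\ \mu_{i+1}(\alpha) \end{pmatrix} = R_2 \begin{pmatrix} \lambda_i(\alpha) \\ \mu_i(\alpha) \end{pmatrix} \quad \text{for all } i \ge 1 \text{ and } \alpha \in \mathbb{R}^1. \tag{52} \]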


Constructing A Suitable Objective Function: Constructing The Functions on The Lines


Consider the following extension of the line search functions

\[ \psi_i(\alpha) = f(x_i + \alpha s_i), \qquad \alpha \in \mathbb{R}^1. \]

We shall concentrate on the choices of $\lambda_1(\cdot)$, $\mu_1(\cdot)$ and $\psi_1(\cdot)$. Assuming that $f^* = 0$, it is obvious that

\[ \psi_1(0) = f(x_1) = c_1 p_1 + d_1 q_1, \tag{53} \]

\[ \psi_1(1) = \psi_2(0) = t\, f(x_1), \tag{54} \]

\[ \psi_1'(0) = g_1^T s_1 = \sqrt{2}\,(l_1 - \tau_1), \tag{55} \]

\[ \psi_1'(1) = g_2^T s_1 = 0. \tag{56} \]

Suppose that we have constructed a function $\psi_1$ satisfying all the four conditions (53)-(56) and some special requirements.


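There are many ways to choose the line search functions. The following is one of them with good theoretical properties.

\[ \psi_1(\alpha) = \zeta_1 (\alpha - 1)^{38} + \zeta_2 (\alpha - 1)^2 + \zeta_3, \]

where

\[ \begin{aligned}
\zeta_1 &= \frac{(173256 - 38127\sqrt{2})\,t + (-138064 + 48586\sqrt{2})}{166788} \approx 1.5468 \times 10^{-1},\\
\zeta_2 &= \frac{(377472 - 359709\sqrt{2})\,t + (-712544 + 577958\sqrt{2})}{166788} \approx 1.0405 \times 10^{-3},\\
\zeta_3 &= \frac{(-11344 + 6675\sqrt{2})\,t + (42494 - 26967\sqrt{2})}{4633} \approx 6.1270 \times 10^{-1}.
\end{aligned} \]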
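For this particular form of $\psi_1$, condition (56) holds automatically, and (53)-(55) determine the three coefficients. The sketch below solves those three conditions and reproduces the decimal values quoted above. The function names are illustrative, and the inputs $f(x_1)$ and $g_1^T s_1$ are taken from the closed forms given earlier, with $f^* = 0$.

```python
import math

SQRT2 = math.sqrt(2.0)
T = (3 * SQRT2 - 1 - math.sqrt(31 - 20 * SQRT2)) / 2  # root (47)
F1 = ((3954 - 4376 * SQRT2) * T + (18866 - 9563 * SQRT2)) / 4633  # f(x_1), f* = 0
G1S1 = (-44 + 13 * SQRT2) * T + (40 - 18 * SQRT2)                  # g_1^T s_1

def zeta_coefficients(t, f1, g1s1, degree=38):
    """Solve (53)-(55) for psi(a) = z1*(a-1)^degree + z2*(a-1)^2 + z3, degree even."""
    z3 = t * f1                                   # (54): psi(1) = t f(x_1)
    # (53): z1 + z2 + z3 = f1   and   (55): -degree*z1 - 2*z2 = g1s1
    z1 = -(g1s1 + 2 * (f1 - z3)) / (degree - 2)
    z2 = (f1 - z3) - z1
    return z1, z2, z3

def psi1(alpha, z, degree=38):
    return z[0] * (alpha - 1) ** degree + z[1] * (alpha - 1) ** 2 + z[2]

def dpsi1(alpha, z, degree=38):
    return degree * z[0] * (alpha - 1) ** (degree - 1) + 2 * z[1] * (alpha - 1)

Z = zeta_coefficients(T, F1, G1S1)
print("zeta =", Z)
print("psi1(0), f(x1):", psi1(0.0, Z), F1)
print("psi1'(0), g1^T s1:", dpsi1(0.0, Z), G1S1)
print("psi1'(1):", dpsi1(1.0, Z))
```

The divisor `degree - 2 = 36` explains why the denominators of $\zeta_1$ and $\zeta_2$ are $166788 = 36 \times 4633$.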


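We shall now ask what conditions have to be satisfied by $\lambda_1$ and $\mu_1$. Since $\psi_1$ is the restriction of $f$ on the straight line

\[ x_1 + \alpha s_1 = \begin{pmatrix} a_1 \\ b_1 \\ p_1 \\ q_1 \end{pmatrix} + \alpha \begin{pmatrix} \eta_1 \\ \xi_1 \\ \gamma_1 \\ \tau_1 \end{pmatrix}, \]

we have that

\[ \psi_1(\alpha) = [p_1 + \alpha\gamma_1]\,\lambda_1(\alpha) + [q_1 + \alpha\tau_1]\,\mu_1(\alpha). \tag{57} \]

Now, by (49),

\[ \begin{pmatrix} \lambda_1(0) \\ \mu_1(0) \end{pmatrix} = \begin{pmatrix} \lambda(V_1) \\ \mu(V_1) \end{pmatrix}, \qquad \begin{pmatrix} \lambda_1(1) \\ \mu_1(1) \end{pmatrix} = \begin{pmatrix} \lambda(V_2) \\ \mu(V_2) \end{pmatrix}, \tag{58} \]

which is obviously consistent with (53) and (54).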


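One has to be careful to choose the derivatives $\lambda_1'(0)$, $\mu_1'(0)$, $\lambda_1'(1)$ and $\mu_1'(1)$ to meet

\[ A_1 \begin{pmatrix} p_1 \\ q_1 \end{pmatrix} = \begin{pmatrix} l_1 \\ h_1 \end{pmatrix}. \tag{59} \]

There are some relations between these values and $A_1$:

\[ A_1^T s_1 = \begin{pmatrix} \lambda_1'(0) \\ \mu_1'(0) \end{pmatrix}, \tag{60} \]

and

\[ A_1^T s_8 = \begin{pmatrix} \lambda_8'(1) \\ \mu_8'(1) \end{pmatrix} = R_2^7 \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix} = R_2^T \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix}. \tag{61} \]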


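By the expression of $\psi_1$, we know that

\[ \psi_1'(\alpha) = \gamma_1 \lambda_1(\alpha) + \tau_1 \mu_1(\alpha) + [p_1 + \alpha\gamma_1]\,\lambda_1'(\alpha) + [q_1 + \alpha\tau_1]\,\mu_1'(\alpha). \]

Setting $\alpha = 0$ and $\alpha = 1$ respectively, we obtain the following additional conditions:

\[ \begin{pmatrix} p_1 \\ q_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1'(0) \\ \mu_1'(0) \end{pmatrix} = \psi_1'(0) - \gamma_1 \lambda_1(0) - \tau_1 \mu_1(0), \tag{62} \]

\[ \begin{pmatrix} p_2 \\ q_2 \end{pmatrix}^T \begin{pmatrix} \lambda_1'(1) \\ \mu_1'(1) \end{pmatrix} = \psi_1'(1) - \gamma_1 \lambda_1(1) - \tau_1 \mu_1(1). \tag{63} \]

The relations (59)-(63) form a system of 8 equations in the 8 variables

\[ \omega_1,\ \omega_2,\ \omega_3,\ \omega_4,\ \lambda_1'(0),\ \mu_1'(0),\ \lambda_1'(1),\ \mu_1'(1). \]

Therefore, after the function $\psi_1(\alpha)$ has been chosen, we can derive those derivatives uniquely in the above way. This, together with the relation (52), also imposes the relation (51) between $A_{i+1}$ and $A_i$.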


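Attention must also be given to the crossing points of the straight line connecting $V_1$ and $V_2$ with the other lines. It is easy to deduce that

\[ \begin{pmatrix} \lambda_1(1 + \tfrac{\sqrt{2}}{2}) \\ \mu_1(1 + \tfrac{\sqrt{2}}{2}) \end{pmatrix} = \begin{pmatrix} \lambda_3(-\tfrac{\sqrt{2}}{2}) \\ \mu_3(-\tfrac{\sqrt{2}}{2}) \end{pmatrix} = R_2^2 \begin{pmatrix} \lambda_1(-\tfrac{\sqrt{2}}{2}) \\ \mu_1(-\tfrac{\sqrt{2}}{2}) \end{pmatrix}. \tag{64} \]

Setting $\alpha$ to be $1 + \tfrac{\sqrt{2}}{2}$ and $-\tfrac{\sqrt{2}}{2}$, respectively, yields

\[ \begin{pmatrix} p_1 + (1 + \tfrac{\sqrt{2}}{2})\gamma_1 \\ q_1 + (1 + \tfrac{\sqrt{2}}{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(1 + \tfrac{\sqrt{2}}{2}) \\ \mu_1(1 + \tfrac{\sqrt{2}}{2}) \end{pmatrix} = \psi_1\!\left(1 + \tfrac{\sqrt{2}}{2}\right), \tag{65} \]

\[ \begin{pmatrix} p_1 - \tfrac{\sqrt{2}}{2}\gamma_1 \\ q_1 - \tfrac{\sqrt{2}}{2}\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(-\tfrac{\sqrt{2}}{2}) \\ \mu_1(-\tfrac{\sqrt{2}}{2}) \end{pmatrix} = \psi_1\!\left(-\tfrac{\sqrt{2}}{2}\right). \tag{66} \]

The above system of 4 linear equations decides the values of

\[ \lambda_1\!\left(1 + \tfrac{\sqrt{2}}{2}\right),\quad \mu_1\!\left(1 + \tfrac{\sqrt{2}}{2}\right),\quad \lambda_1\!\left(-\tfrac{\sqrt{2}}{2}\right),\quad \mu_1\!\left(-\tfrac{\sqrt{2}}{2}\right). \]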
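A minimal sketch of how this $4 \times 4$ system can be solved in practice: (64) eliminates two unknowns, leaving a $2 \times 2$ system for the values at $\alpha = -\sqrt{2}/2$. The code assumes that $R_2$ is a counterclockwise rotation by $3\pi/4$, so that $R_2^2$ maps $(w_1, w_2)$ to $(w_2, -w_1)$, and it uses the closed forms for $p_1$, $q_1$, $\gamma$, $\tau$ and the $\zeta_i$ stated above. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
T = (3 * SQRT2 - 1 - math.sqrt(31 - 20 * SQRT2)) / 2
GAMMA = (17 - 8 * SQRT2) * T + (-17 + 9 * SQRT2)
TAU = (-17 + 9 * SQRT2) * T + (17 - 9 * SQRT2)
P1 = ((-15720 + 4019 * SQRT2) * T + (32931 - 17534 * SQRT2)) / 4633
Q1 = ((4376 - 1977 * SQRT2) * T + (9563 - 9433 * SQRT2)) / 4633
Z1 = ((173256 - 38127 * SQRT2) * T + (-138064 + 48586 * SQRT2)) / 166788
Z2 = ((377472 - 359709 * SQRT2) * T + (-712544 + 577958 * SQRT2)) / 166788
Z3 = ((-11344 + 6675 * SQRT2) * T + (42494 - 26967 * SQRT2)) / 4633

AP = 1 + SQRT2 / 2   # alpha at the crossing with the line through V3, V4
AM = -SQRT2 / 2      # alpha at the symmetric crossing on the other side

def psi1(alpha):
    return Z1 * (alpha - 1) ** 38 + Z2 * (alpha - 1) ** 2 + Z3

def crossing_values():
    """Return ((lam, mu) at AP, (lam, mu) at AM) solving (64)-(66)."""
    a = (P1 + AP * GAMMA, Q1 + AP * TAU)   # coefficient vector of (65)
    b = (P1 + AM * GAMMA, Q1 + AM * TAU)   # coefficient vector of (66)
    # With u = R2^2 w = (w2, -w1), equation (65) reads -a[1]*w1 + a[0]*w2 = psi1(AP)
    m = ((-a[1], a[0]), (b[0], b[1]))
    rhs = (psi1(AP), psi1(AM))
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    w1 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det   # Cramer's rule
    w2 = (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det
    return (w2, -w1), (w1, w2)

u, w = crossing_values()
print("lambda_1, mu_1 at 1 + sqrt(2)/2:", u)
print("lambda_1, mu_1 at -sqrt(2)/2   :", w)
```

Because of the degree-38 term, $\psi_1(-\sqrt{2}/2)$ is very large, so the resulting values are large as well. Residuals should be judged relative to that scale.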


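In a similar way, we must have that

\[ \begin{pmatrix} \lambda_1(2 + \sqrt{2}) \\ \mu_1(2 + \sqrt{2}) \end{pmatrix} = \begin{pmatrix} \lambda_4(-1 - \sqrt{2}) \\ \mu_4(-1 - \sqrt{2}) \end{pmatrix} = R_2^3 \begin{pmatrix} \lambda_1(-1 - \sqrt{2}) \\ \mu_1(-1 - \sqrt{2}) \end{pmatrix}. \tag{67} \]

Setting $\alpha$ to be $2 + \sqrt{2}$ and $-1 - \sqrt{2}$, respectively, yields

\[ \begin{pmatrix} p_1 + (2 + \sqrt{2})\gamma_1 \\ q_1 + (2 + \sqrt{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(2 + \sqrt{2}) \\ \mu_1(2 + \sqrt{2}) \end{pmatrix} = \psi_1(2 + \sqrt{2}), \tag{68} \]

\[ \begin{pmatrix} p_1 - (1 + \sqrt{2})\gamma_1 \\ q_1 - (1 + \sqrt{2})\tau_1 \end{pmatrix}^T \begin{pmatrix} \lambda_1(-1 - \sqrt{2}) \\ \mu_1(-1 - \sqrt{2}) \end{pmatrix} = \psi_1(-1 - \sqrt{2}). \tag{69} \]

The above system of 4 linear equations decides the values of

\[ \lambda_1(2 + \sqrt{2}),\quad \mu_1(2 + \sqrt{2}),\quad \lambda_1(-1 - \sqrt{2}),\quad \mu_1(-1 - \sqrt{2}). \]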


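Now we can construct a twice continuously differentiable function $\lambda_1$, for example by Lagrange interpolation, such that the values

\[ \lambda_1(-1 - \sqrt{2}),\ \lambda_1(-\tfrac{\sqrt{2}}{2}),\ \lambda_1(0),\ \lambda_1'(0),\ \lambda_1(1),\ \lambda_1'(1),\ \lambda_1(1 + \tfrac{\sqrt{2}}{2}),\ \lambda_1(2 + \sqrt{2}) \]

agree with the required ones.

The function $\mu_1$ is then obtained from (57):

\[ \mu_1(\alpha) = \frac{\psi_1(\alpha) - (p_1 + \alpha\gamma_1)\,\lambda_1(\alpha)}{q_1 + \alpha\tau_1}. \]

Some more care is needed so that $\mu_1$ is also a twice continuously differentiable function, in particular near $\alpha = -q_1/\tau_1$, where the denominator vanishes.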


Constructing A Suitable Objective Function: Extending the Functions to the Whole Plane


Define the eight straight lines

\[ \begin{aligned}
L_1 &= \{(x_1, x_2) : x_2 + (1 + \tfrac{\sqrt{2}}{2}) = 0\},\\
L_2 &= \{(x_1, x_2) : x_1 - x_2 - (1 + \sqrt{2}) = 0\},\\
L_3 &= \{(x_1, x_2) : x_1 - (1 + \tfrac{\sqrt{2}}{2}) = 0\},\\
L_4 &= \{(x_1, x_2) : x_1 + x_2 - (1 + \sqrt{2}) = 0\},\\
L_5 &= \{(x_1, x_2) : x_2 - (1 + \tfrac{\sqrt{2}}{2}) = 0\},\\
L_6 &= \{(x_1, x_2) : x_1 - x_2 + (1 + \sqrt{2}) = 0\},\\
L_7 &= \{(x_1, x_2) : x_1 + (1 + \tfrac{\sqrt{2}}{2}) = 0\},\\
L_8 &= \{(x_1, x_2) : x_1 + x_2 + (1 + \sqrt{2}) = 0\},
\end{aligned} \]

which are the extensions of the eight edges of the octagon.

To complete the construction of the objective function $f$, we have to deduce the twice-continuously differentiable extensions of the functions $\lambda$ and $\mu$, which are already defined on the straight lines $\{L_i : i = 1, \ldots, 8\}$.
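The geometry of the eight lines can be checked with a short script. The sketch below stores each $L_i$ as coefficients $(a, b, c)$ of $a x_1 + b x_2 + c = 0$, verifies that $L_i$ passes through $V_i$ and $V_{i+1}$ (with the vertices generated by a counterclockwise rotation by $\pi/4$, an assumption on $R_1$), and collects the distinct pairwise intersection points. Names are illustrative.

```python
import math
from itertools import combinations

K1 = 1 + math.sqrt(2) / 2
K2 = 1 + math.sqrt(2)

# Line L_i stored as (a, b, c), meaning a*x1 + b*x2 + c = 0
LINES = [
    (0, 1, K1), (1, -1, -K2), (1, 0, -K1), (1, 1, -K2),
    (0, 1, -K1), (1, -1, K2), (1, 0, K1), (1, 1, K2),
]

def vertices():
    # V_1 = (a_1, b_1); V_{i+1} obtained by rotating V_i by pi/4
    c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
    v = (-math.sqrt(2) / 2, -1 - math.sqrt(2) / 2)
    verts = [v]
    for _ in range(7):
        v = (c * v[0] - s * v[1], s * v[0] + c * v[1])
        verts.append(v)
    return verts

def on_line(line, pt, tol=1e-12):
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) < tol

def intersections(lines):
    pts = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines do not intersect
        x = (b1 * c2 - b2 * c1) / det
        y = (a2 * c1 - a1 * c2) / det
        pts.add((round(x, 9), round(y, 9)))
    return pts

VERTS = vertices()
print(all(on_line(LINES[i], VERTS[i]) and on_line(LINES[i], VERTS[(i + 1) % 8]) for i in range(8)))
print(len(intersections(LINES)))
```

Of the 28 pairs of lines, 4 pairs are parallel, which leaves 24 intersection points, 8 of them being the vertices of the octagon.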


An Illustrative Example

Given a one-dimensional polynomial function

\[ f(x), \qquad x \in \mathbb{R}^1, \]

with roots at $x = 0$ and $x = 1$, how can this function be extended to $\mathbb{R}^2$ so that the extension function $F(x, y)$ satisfies

(1) $F(x, 0) = f(x)$, for $x \in \mathbb{R}^1$,
(2) $F(x, y) = 0$, if $x = 0$,
(3) $F(x, y) = 0$, if $x + y = 1$,
(4) $F(x, y) = 0$, if $y = 1$?

Answer. Since $f(x)$ is a polynomial, we can write

\[ f(x) = x(x - 1)\,h(x). \]

The following extension function is one desired function:

\[ F(x, y) = x\,(x + y - 1)(1 - y)\,h(x). \]
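The four requirements can be checked on sample points. In the sketch below the cofactor $h(x) = x^2 + 2$ is a hypothetical choice made only for illustration. Any polynomial $h$ works the same way.

```python
def h(x):
    # Hypothetical cofactor, chosen only for this illustration
    return x * x + 2.0

def f(x):
    # One-dimensional polynomial with roots at 0 and 1
    return x * (x - 1.0) * h(x)

def F(x, y):
    # Extension vanishing on x = 0, x + y = 1 and y = 1
    return x * (x + y - 1.0) * (1.0 - y) * h(x)

samples = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
for x in samples:
    print(x, F(x, 0.0) - f(x), F(0.0, x), F(x, 1.0 - x), F(x, 1.0))
```

Setting $y = 0$ turns the factor $(x + y - 1)(1 - y)$ into $(x - 1)$, which is why $F(x, 0)$ reproduces $f(x)$.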


There are 24 crossing points of the eight lines altogether, including the vertices $\{V_i : i = 1, \ldots, 8\}$ of the octagon. Denote the other crossing points by $\{C_i : i = 1, \ldots, 16\}$. Then it is possible to construct a two-dimensional eighth-order polynomial function $\phi$ such that the function values of $\phi$ at the $C_i$'s agree with those of $\lambda$, and the function and gradient values of $\phi$ at the $V_i$'s agree with the corresponding values of $\lambda$.

Denote the restriction of $\phi$ to each line $L_i$ by $\phi_i$. For each $i$, we are going to construct a function $e_i(x_1, x_2)$ such that

\[ e_i(x_1, x_2) = \begin{cases} \lambda_i - \phi_i, & (x_1, x_2) \in L_i, \\ 0, & (x_1, x_2) \in \cup_{j \ne i} L_j. \end{cases} \]

Thus the sum function $e(x_1, x_2) = \sum_{i=1}^{8} e_i(x_1, x_2)$ agrees with $\lambda_i - \phi_i$ on the line $L_i$ for each $i$. Therefore

\[ \lambda(x_1, x_2) = \phi(x_1, x_2) + e(x_1, x_2) \]

is the desired extension. The extension $\mu(x_1, x_2)$ can be obtained similarly.


Constructing A Suitable Objective Function: Final form of the objective function


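If we only ask the Wolfe conditions to be satisfied,

\[ f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4, \]

where

\[ \mu(x_1, x_2) = \lambda(-x_2, x_1) \quad \text{and} \quad \lambda(-x_1, -x_2) = -\lambda(x_1, x_2). \]

More exactly,

\[ \begin{aligned}
\lambda(x_1, x_2) ={}& \left(15 - \tfrac{21}{2}\sqrt{2}\right) x_1^5 + \left(6 - \tfrac{9}{2}\sqrt{2}\right)\left(x_1^4 x_2 + 2 x_1^2 x_2^3\right) \\
&+ (\sqrt{2} - 1)\left(6 x_1^2 x_2 + x_2^3\right) + (-15 + 10\sqrt{2})\, x_1^3 \\
&+ \left(\tfrac{15}{4} - \tfrac{15}{8}\sqrt{2}\right) x_1 - \tfrac{9}{8}\sqrt{2}\, x_2 + \omega_1 \tilde{\lambda}(x_1, x_2) + \omega_3 \tilde{\lambda}(-x_2, x_1),
\end{aligned} \]

\[ \tilde{\lambda}(x_1, x_2) = \frac{3 - 2\sqrt{2}}{4}\,(2x_1^2 - 1)\left(2x_1^2 - (3 + 2\sqrt{2})\right) x_2. \]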
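As a partial check of this formula, the sketch below evaluates $\lambda$ and $\mu$ at the eight vertices and compares them with $(c_i, d_i)$ as required by (49). It assumes counterclockwise rotations by $\pi/4$ and $3\pi/4$ for $R_1$ and $R_2$. The slides do not give numerical values for $\omega_1$ and $\omega_3$, so `w1` and `w3` are free placeholders; this is harmless for the check because the auxiliary polynomial vanishes at every vertex. The gradient condition (50) is not checked here.

```python
import math

SQRT2 = math.sqrt(2.0)

def lam_tilde(x1, x2):
    # Auxiliary polynomial; it vanishes at all vertices of the octagon
    return (3 - 2 * SQRT2) / 4 * (2 * x1**2 - 1) * (2 * x1**2 - (3 + 2 * SQRT2)) * x2

def lam(x1, x2, w1=0.0, w3=0.0):
    return (
        (15 - 10.5 * SQRT2) * x1**5
        + (6 - 4.5 * SQRT2) * (x1**4 * x2 + 2 * x1**2 * x2**3)
        + (SQRT2 - 1) * (6 * x1**2 * x2 + x2**3)
        + (-15 + 10 * SQRT2) * x1**3
        + (15 / 4 - 15 / 8 * SQRT2) * x1
        - 9 / 8 * SQRT2 * x2
        + w1 * lam_tilde(x1, x2)
        + w3 * lam_tilde(-x2, x1)
    )

def mu(x1, x2, w1=0.0, w3=0.0):
    # mu(x1, x2) = lambda(-x2, x1)
    return lam(-x2, x1, w1, w3)

def rotate(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def max_vertex_error(w1=0.0, w3=0.0):
    """Largest deviation of (lambda, mu)(V_i) from (c_i, d_i) over i = 1..8."""
    v = (-SQRT2 / 2, -1 - SQRT2 / 2)  # V_1
    cd = (0.0, -SQRT2)                # (c_1, d_1)
    worst = 0.0
    for _ in range(8):
        worst = max(worst, abs(lam(v[0], v[1], w1, w3) - cd[0]),
                    abs(mu(v[0], v[1], w1, w3) - cd[1]))
        v = rotate(math.pi / 4, v)
        cd = rotate(3 * math.pi / 4, cd)
    return worst

print(max_vertex_error())
print(max_vertex_error(0.3, -1.2))
```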


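If the stepsize is chosen to be the first local minimizer,

\[ f(x_1, x_2, x_3, x_4) = \lambda(x_1, x_2)\,x_3 + \mu(x_1, x_2)\,x_4, \]

where

\[ \mu(x_1, x_2) = \lambda(-x_2, x_1) \quad \text{and} \quad \lambda(-x_1, -x_2) = -\lambda(x_1, x_2). \]

More exactly,

\[ \lambda(x_1, x_2) := \lambda(x_1, x_2) + \left( x_1^2 + x_2^2 - (2 + \sqrt{2}) \right)^2 \cdot \sum_j \left[ w_j \lambda_j(x_1, x_2) + u_j \lambda_j(x_2, x_1) \right], \]

\[ \lambda_j(x_1, x_2) = x_1^3 - 3x_1 x_2^2,\quad x_1^5 - 2x_1^3 x_2^2 - 3x_1 x_2^4,\quad \ldots \]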


Conclusion and Discussion

A perfect example:

\[ 1, \quad \tfrac{1}{4}\pi, \quad \tfrac{3}{4}\pi, \quad 8, \quad \frac{3\sqrt{2} - 1 - \sqrt{31 - 20\sqrt{2}}}{2}, \quad 38 \]

A 3-dimensional example?
Any help in analyzing the convergence of DFP?
Any help in improving the practical BFGS algorithm?


THANK YOU VERY MUCH !
