SLAC - Fitting The Unknown

Transcript
Page 1:

Fitting The Unknown

Joshua Lande

Stanford

September 1, 2010


Page 2:

Motivation: Why Maximize

It is frequently important in physics to find the maximum (or minimum) of a function

Nature will maximize entropy

Economists maximize (minimize?) the cost function

In classical mechanics, the action is minimized

Build experiments to maximize performance

Model parameter estimation.


Page 3:

Parameter Estimation

It is common when analyzing data to fit a model to the data

χ² = Σ_i ((y_i − y(x_i)) / σ_i)²

logLikelihood = log(Prob(data|model))

Model is generally a function of free parameters

It is interesting to find the parameters that maximize the likelihood.


Page 4:

Plan

Typically, physicists pull out an off-the-shelf optimizer to fit their function and be done with it

Today, let's dig under the hood and figure out how they work


Page 5:

Ad Hoc Methods

Given an arbitrary function F(x) of n variables x, how would you go about minimizing it?

Grid Search

Divide the space into an n-dimensional grid
Evaluate the function along the grid
Avoids local minima
Useful to seed other algorithms (see the sketch after this list)

Bisection Algorithm

Random points method

These are slow/inefficient: O(2^n)
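
A minimal grid-search sketch along these lines (Python; the objective, bounds, and grid density are illustrative assumptions, not from the talk):

    import itertools
    import numpy as np

    def grid_search(F, bounds, points_per_dim=10):
        # Evaluate F on an n-dimensional grid and return the best point.
        # bounds is a list of (low, high) pairs, one per dimension.
        # The cost grows exponentially with dimension, as noted above.
        axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
        best_x, best_f = None, np.inf
        for x in itertools.product(*axes):
            f = F(np.array(x))
            if f < best_f:
                best_x, best_f = np.array(x), f
        return best_x, best_f

    # Example: seed a later fit with the grid minimum of a simple bowl.
    F = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2
    x0, f0 = grid_search(F, bounds=[(-5, 5), (-5, 5)], points_per_dim=21)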


Page 6:

Alternating Variables

Maximize one parameter at a time

Ignores correlation between variables

The algorithm is inefficient and unreliable

Can cause oscillatory behavior
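
A sketch of this alternating-variables (coordinate-wise) scheme, assuming scipy's 1-D minimize_scalar for each axis; the sweep count is an arbitrary choice:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def alternating_variables(F, x0, n_sweeps=20):
        # Minimize F by repeatedly minimizing along one coordinate at a time.
        # Correlations between variables are ignored, so the path can
        # zig-zag (oscillate) in a long, tilted valley.
        x = np.asarray(x0, dtype=float)
        for _ in range(n_sweeps):
            for i in range(len(x)):
                def along_axis(t):
                    xt = x.copy()
                    xt[i] = t
                    return F(xt)
                x[i] = minimize_scalar(along_axis).x
        return x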


Page 7:

Gradient Descent

The function decreases in the direction of the negative gradient

Following the negative of the gradient should lead to the minimum

x_{i+1} = x_i − γ ∇F(x_i)

Iterate until |∇F(x_i)| < ε

Well suited when ∇F is easily/analytically calculated

Often, perform a grid search in the direction of −∇F before the next iteration
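
A minimal gradient-descent sketch of this update (fixed step size γ; the example function and gradient are illustrative):

    import numpy as np

    def gradient_descent(grad_F, x0, gamma=0.1, eps=1e-6, max_iter=10000):
        # Step opposite the gradient until the gradient is small.
        # gamma is a fixed step size; searching along -grad_F for the
        # best step, as mentioned above, is the usual refinement.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_F(x)
            if np.linalg.norm(g) < eps:
                break
            x = x - gamma * g
        return x

    # Example on the bowl F(x) = x0**2 + 10*x1**2, gradient (2*x0, 20*x1).
    x_min = gradient_descent(lambda x: np.array([2*x[0], 20*x[1]]),
                             x0=[3.0, 2.0], gamma=0.05)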


Page 8:

Simplex Fitting Algorithm (What’s a Simplex???)

A simplex is a generalization of a triangle or tetrahedron to arbitrary dimension

An n-simplex has n + 1 vertices in n dimensions

all equidistant

For example,

a 2-simplex is a triangle
a 3-simplex is a tetrahedron
a 4-simplex is a pentachoron
a 5-simplex is a hexateron
a 6-simplex is a heptapeton


Page 9:

Simplex (continued)

Define a simplex in the n-dimensional fit space

Evaluate the function at all points

Reflect the highest point through the centroid of the other points

If the reflected point is still the highest, reflect the second highest point
When a certain vertex has remained in the current simplex for many iterations, contract all other vertices towards it by 1/2

Page 10:

Simplex Example


Page 11:

Simplex (continued)

Pros:

Ignores the gradient/curvature of the function
Works well for noisy data
Good for functions with local minima
Works well when curvature varies rapidly

Cons:

Requires an initial simplex choice
Slow convergence for smooth functions (compared to gradient descent)
Inflexible to changes in local function structure

e.g., it wouldn’t work well in a long valley


Page 12:

Nelder-Mead Algorithm

An improvement of the simplex algorithm

“Adapts itself to the local landscape,

elongating down long inclined planes,
changing direction on encountering a valley at an angle,
and contracting in the neighborhood of a minimum”

“Copies of the routine, written in Extended Mercury Autocode, are available from the authors”1

Used by Minuit’s SIMPLEX algorithm and scipy’s fmin function (usage sketched below)

1 J.A. Nelder and R. Mead, “A Simplex Method for Function Minimization”
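
A minimal usage sketch of scipy’s fmin (the Rosenbrock-style objective is an illustrative choice, not from the talk):

    import numpy as np
    from scipy.optimize import fmin

    # A long curved valley: hard for a rigid reflection-only simplex,
    # handled by Nelder-Mead's expansion/contraction moves.
    def F(x):
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    x_min = fmin(F, x0=np.array([-1.0, 1.0]))  # builds a simplex around x0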


Page 13:

Nelder-Mead Algorithm

P̄ is the centroid of the simplex points excluding P_h. P_h is the vertex with the largest function value F_h; P_l has the smallest value F_l.

Reflection: evaluate the function F* at the reflected point P* = (1 + α)P̄ − αP_h

Expansion: if F* < F_l (the reflected point is the new minimum), expand the simplex further in that direction by a ratio γ:

P** = γP* + (1 − γ)P̄

Contraction: if F* > F_i for all i ≠ h, contract, using as our new point

P** = βP_h + (1 − β)P̄

Replace P_h with P**
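
A sketch of one iteration under these rules (simplified: no shrink step; the α, β, γ defaults are the commonly quoted values and are an assumption here):

    import numpy as np

    def nelder_mead_step(F, P, alpha=1.0, beta=0.5, gamma=2.0):
        # One iteration on simplex P: an array of n+1 points, each of length n.
        vals = np.array([F(p) for p in P])
        h, l = np.argmax(vals), np.argmin(vals)           # worst, best vertex
        centroid = (P.sum(axis=0) - P[h]) / (len(P) - 1)  # excludes P_h

        P_star = (1 + alpha) * centroid - alpha * P[h]    # reflection
        F_star = F(P_star)
        if F_star < vals[l]:                              # expansion
            P_2star = gamma * P_star + (1 - gamma) * centroid
            P[h] = P_2star if F(P_2star) < vals[l] else P_star
        elif all(F_star > vals[i] for i in range(len(P)) if i != h):
            P[h] = beta * P[h] + (1 - beta) * centroid    # contraction
        else:
            P[h] = P_star                                 # accept reflection
        return P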


Page 14:

Quit When...

End when

√( Σ_i (F_i − F̄)² / n ) < ε

This end criterion is well suited for minimizing χ² or log likelihood, where the curvature at the minimum gives information about parameter uncertainty

The fit error only has to be small compared to the parameter uncertainty!


Page 15:

Newton-Raphson algorithm

Assume your function is a parabola and calculate the extremum of the estimated parabola

Use curvature information to take a more direct route

Taylor expand the derivative, set it to 0

f′(x + Δx) = f′(x) + Δx f″(x) = 0

Δx = −f′(x)/f″(x)

x_{i+1} = x_i − γ f′(x_i)/f″(x_i)
Iterate until |f′(x_i)| < ε
Excellent local convergence!

Often, instead perform a grid search in the direction of steepest descent
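
A 1-D Newton-Raphson sketch of this update (the example derivatives are illustrative):

    def newton_1d(fprime, fsecond, x0, gamma=1.0, eps=1e-8, max_iter=100):
        # Solve f'(x) = 0 via x <- x - gamma * f'(x)/f''(x).
        # Per the next slide: it can overshoot, converge to a
        # maximum or saddle point, and fails if f'' = 0.
        x = x0
        for _ in range(max_iter):
            if abs(fprime(x)) < eps:
                break
            x = x - gamma * fprime(x) / fsecond(x)
        return x

    # Example: f(x) = x**4 - 3*x**2, so f' = 4x**3 - 6x, f'' = 12x**2 - 6.
    x_min = newton_1d(lambda x: 4*x**3 - 6*x,
                      lambda x: 12*x**2 - 6, x0=2.0)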


Page 16:

Newton Algorithm (Issues)

May end up converging on a saddle point or local maximum

May overshoot by quite a bit

The formula is undefined for f″ = 0.


Page 17:

Newton-Raphson in Many Dimensions

Perform an n-dimensional Taylor expansion:

∇F(x + Δx) = ∇F(x) + HΔx = 0

where the Hessian matrix H_ij = ∂²F/(∂x_i ∂x_j)

The recursion condition is

x_{n+1} = x_n − γ H_n⁻¹ ∇F(x_n)

Iterate until |∇F(x_n)| < δ

Figure: gradient descent (green) and Newton’s method (red) for minimizing a function
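
A sketch of the n-dimensional iteration, solving H Δx = −∇F rather than forming H⁻¹ explicitly (grad_F and hess_F are user-supplied callables):

    import numpy as np

    def newton_nd(grad_F, hess_F, x0, gamma=1.0, delta=1e-8, max_iter=100):
        # Newton-Raphson in n dimensions: solve H_n dx = -grad F(x_n).
        # Nothing guarantees H_n is invertible (see the next slide).
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_F(x)
            if np.linalg.norm(g) < delta:
                break
            x = x + gamma * np.linalg.solve(hess_F(x), -g)
        return x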

Page 18:

Performance

There is no reason that H_n has to be invertible

Newton-Raphson works particularly well near the minimum

Gradient descent (ignoring curvature) works better when far from the minimum, where higher order terms are more significant

Gradient descent converges very slowly near the minimum


Page 19:

Levenberg-Marquardt

An algorithm devised to naturally interpolate between gradient descent and Newton-Raphson

Replace the equation to solve with (H(x) + μI)Δx = −∇F(x)

μ ≪ 1 reduces to the Newton-Raphson algorithm

μ ≫ 1 reduces to the gradient algorithm with γ = 1/μ

There are many different schemes for adaptively changing μ based upon the function
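
A sketch of the damped step (H + μI)Δx = −∇F with a fixed μ; real implementations adapt μ as the fit proceeds, which is omitted here:

    import numpy as np

    def levenberg_marquardt(grad_F, hess_F, x0, mu=1e-3, delta=1e-8,
                            max_iter=200):
        # mu small: Newton-like step; mu large: gradient step of size 1/mu.
        # A common (assumed) rule: shrink mu after a successful step,
        # grow it after a step that increases F.
        x = np.asarray(x0, dtype=float)
        n = len(x)
        for _ in range(max_iter):
            g = grad_F(x)
            if np.linalg.norm(g) < delta:
                break
            x = x + np.linalg.solve(hess_F(x) + mu * np.eye(n), -g)
        return x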


Page 20:

BFGS Method

Often H(x) is very costly to evaluate

It is desirable to find an intelligent approximation of the curvature

BFGS is a modification of Newton’s algorithm that approximates the Hessian

It uses the Hessian at previous points and the values of the derivative to estimate the new one.


Page 21:

BFGS Method

Same general formula as Newton’s Method:

x_{n+1} = x_n − H_n⁻¹ ∇F(x_n)

Approximate the Hessian using

s_n = x_{n+1} − x_n
y_n = ∇F(x_{n+1}) − ∇F(x_n)

H_{n+1} = H_n + (y_n y_nᵀ)/(y_nᵀ s_n) − (H_n s_n (H_n s_n)ᵀ)/(s_nᵀ H_n s_n)

Invert H_{n+1} using the Sherman-Morrison formula:

H_{n+1}⁻¹ = H_n⁻¹ + ((s_nᵀ y_n + y_nᵀ H_n⁻¹ y_n)(s_n s_nᵀ))/(s_nᵀ y_n)² − (H_n⁻¹ y_n s_nᵀ + s_n y_nᵀ H_n⁻¹)/(s_nᵀ y_n)
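
A minimal BFGS sketch that maintains the inverse approximation directly; the compact update below is algebraically equivalent to the Sherman-Morrison form above. No line search is included, so this is illustrative only:

    import numpy as np

    def bfgs_minimize(grad_F, x0, eps=1e-6, max_iter=200):
        # Maintain an approximate inverse Hessian Hinv, updated from
        # gradient differences only; H itself is never computed.
        # Assumes the curvature condition s.y > 0 holds at each step.
        x = np.asarray(x0, dtype=float)
        n = len(x)
        Hinv = np.eye(n)                 # initial guess for H^-1
        g = grad_F(x)
        for _ in range(max_iter):
            if np.linalg.norm(g) < eps:
                break
            x_new = x - Hinv @ g         # Newton-like step
            g_new = grad_F(x_new)
            s, y = x_new - x, g_new - g
            sy = s @ y
            I = np.eye(n)
            Hinv = ((I - np.outer(s, y) / sy) @ Hinv
                    @ (I - np.outer(y, s) / sy) + np.outer(s, s) / sy)
            x, g = x_new, g_new
        return x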


Page 22:

BFGS Method

Advantages of BFGS:

The invertibility of the Hessian approximation is ensured directly
Well suited for problems where H is costly to compute

Disadvantage: convergence is slower than Newton’s Method2

fmin_bfgs in scipy

ROOT::Math::MinimizerOptions::SetDefaultMinimizer("GSLMultiMin", "BFGS")
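
A usage sketch of scipy’s fmin_bfgs (the objective and gradient are illustrative):

    import numpy as np
    from scipy.optimize import fmin_bfgs

    def F(x):
        return (x[0] - 1)**2 + 10 * (x[1] + 2)**2

    def grad_F(x):
        return np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])

    # fprime is optional; if omitted, scipy estimates the gradient numerically.
    x_min = fmin_bfgs(F, x0=np.zeros(2), fprime=grad_F)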

2 http://www.math.mtu.edu/~msgocken/ma5630spring2003/lectures/global2/


Page 23:

Physical Constraints

Frequently, parameter values are constrained

e.g., an experiment constrained by an upper limit on cost, or unable to observe negative counts

A common strategy is to change to unconstrained variables

instead of fitting x, y on a circle, fit θ

When a fit parameter must be positive, it is easy to instead fit the log of the parameter

Remember that you have to correct the fit error. To first order,

σ_{log x} = (∂ log(x)/∂x) σ_x = σ_x / x, so σ_x = x σ_{log x}
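
A sketch of the log-parameter trick with the first-order error correction (the objective and σ value are illustrative):

    import numpy as np
    from scipy.optimize import fmin

    def F(a):                  # toy objective with its minimum at a = 2
        return (a - 2.0)**2

    # Fit u = log(a) instead of a, so the optimizer is unconstrained
    # while a = exp(u) stays positive.
    u_fit = fmin(lambda u: F(np.exp(u[0])), x0=np.array([0.0]))[0]
    a_fit = np.exp(u_fit)

    # Correct the fit error: sigma_a = a * sigma_log_a (to first order).
    sigma_log_a = 0.1          # e.g. from the curvature at the minimum
    sigma_a = a_fit * sigma_log_a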


Page 24:

Constraints

The Minuit fitter allows two-sided limits on each fit parameter3

It internally fits unconstrained variables but transforms them into constrained variables

P_int = arcsin(2(P_ext − a)/(b − a) − 1)

P_ext = a + ((b − a)/2)(sin P_int + 1)

The mapping is non-linear, which causes distortions in the errors
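
A sketch of this two-sided transform (the function names are mine, not Minuit’s API):

    import numpy as np

    def to_internal(p_ext, a, b):
        # Bounded external value in [a, b] -> unbounded internal value.
        return np.arcsin(2 * (p_ext - a) / (b - a) - 1)

    def to_external(p_int, a, b):
        # Any internal value -> external value guaranteed to lie in [a, b].
        return a + (b - a) / 2 * (np.sin(p_int) + 1)

    p = to_external(5.0, a=0.0, b=10.0)   # always lands inside [0, 10]
    # to_internal returns an equivalent angle on the principal branch.
    p_back = to_internal(p, a=0.0, b=10.0)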

3 http://wwwinfo.cern.ch/asdoc/minuit/minmain.html


Page 25:

Penalty Functions

Another strategy for constraints is penalty functions

Replace the function you are fitting with a function which increases rapidly in forbidden regions

We want to minimize F(x) such that

g_i(x) ≤ 0
h_i(x) = 0

g_i are inequality constraints (e.g. flux > 0) and h_i are equality constraints (e.g. cost = 1,000)

Many types of penalty functions have been suggested


Page 26:

Static Penalty Functions4

Constant Penalty Functions

Replace the function with F_p(x) = F(x) + Σ C_i δ_i

where δ_i = 1 if constraint i is violated, 0 if constraint i is satisfied

There is no obvious way to pick the C_i

“Cost to Completion” Penalty Function

Let the penalty increase farther from the allowed region:

F_p(x) = F(x) + Σ C_i d_i^κ

where d_i = δ_i g_i(x) for inequality constraints and |h_i(x)| for equality constraints

Frequently κ is 1 or 2
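
A sketch of the “cost to completion” penalty (the C and κ values are assumptions):

    def penalized(F, g_list, h_list, C=1e3, kappa=2):
        # Build F_p(x) = F(x) + sum C * d_i**kappa, where d_i is g_i(x)
        # for violated inequalities (g_i(x) <= 0 required) and |h_i(x)|
        # for equality constraints.
        def F_p(x):
            total = F(x)
            for g in g_list:
                total += C * max(g(x), 0.0)**kappa  # penalize violations only
            for h in h_list:
                total += C * abs(h(x))**kappa
            return total
        return F_p

    # Example: minimize F subject to x0 >= 1, i.e. g(x) = 1 - x0 <= 0.
    F_p = penalized(lambda x: x[0]**2 + x[1]**2,
                    [lambda x: 1 - x[0]], [])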

4 http://www.eng.auburn.edu/users/smithae/publications/bookch/chapter.pdf

Page 27:

Dynamic Penalty Functions

Static penalty functions lack a robust strategy for picking the C_i

Dynamic penalties use the elapsed search time t

F_p(x, t) = F(x) + Σ s_i(t) d_i^κ

s_i(t) is an increasing function of time

The s_i(t) often have to be tuned to the particular problem

If s_i(t) is too lenient, an infeasible solution may result from the fit
If s_i(t) is too strict, the search may converge to a non-optimal feasible solution

Lots of research into adaptive penalty functions...
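
A dynamic variant of the penalty sketch above, with an assumed linear schedule s(t) = C·t so violations are tolerated early in the search and squeezed out later:

    def dynamic_penalized(F, g_list, h_list, C=10.0, kappa=2):
        # F_p(x, t): the penalty weight s(t) grows with search time t.
        def F_p(x, t):
            s_t = C * t
            total = F(x)
            for g in g_list:
                total += s_t * max(g(x), 0.0)**kappa
            for h in h_list:
                total += s_t * abs(h(x))**kappa
            return total
        return F_p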


Page 28:

Questions?


