Page 1: Optimization

Optimization

COS 323

Page 2: Optimization

Ingredients

• Objective function
• Variables
• Constraints

Find values of the variables that minimize or maximize the objective function
while satisfying the constraints

Page 3: Optimization

Different Kinds of Optimization

[Figure from: Optimization Technology Center, http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/]

Page 4: Optimization

Different Optimization Techniques

• Algorithms have very different flavor depending on the specific problem
  – Closed form vs. numerical vs. discrete
  – Local vs. global minima
  – Running times ranging from O(1) to NP-hard
• Today:
  – Focus on continuous numerical methods

Page 5: Optimization

Optimization in 1-D

• Look for analogies to bracketing in root-finding
• What does it mean to bracket a minimum?

[Figure: a bracket of three points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) satisfying x_left < x_mid < x_right, f(x_mid) < f(x_left), and f(x_mid) < f(x_right)]

Page 6: Optimization

Optimization in 1-D

• Once we have these properties, there is at least one local minimum between x_left and x_right
• Establishing the bracket initially:
  – Given x_initial, increment
  – Evaluate f(x_initial), f(x_initial + increment)
  – If decreasing, step until an increase is found
  – Else, step in the opposite direction until an increase is found
  – Grow the increment at each step (a code sketch follows below)
• For maximization: substitute −f for f
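
A minimal sketch (not from the slides) of this bracketing procedure in Python; the function name bracket_minimum, the initial step, and the growth factor are illustrative choices:

def bracket_minimum(f, x0, step=1.0, grow=2.0, max_iter=100):
    """Return three points (x, f(x)) that bracket a local minimum of f."""
    a, b = x0, x0 + step
    fa, fb = f(a), f(b)
    if fb > fa:                          # not decreasing: step the other way
        a, b, fa, fb = b, a, fb, fa
        step = -step
    for _ in range(max_iter):
        c = b + step                     # keep stepping downhill
        fc = f(c)
        if fc > fb:                      # found an increase: bracket established
            return sorted([(a, fa), (b, fb), (c, fc)])   # ordered left / mid / right
        a, b, fa, fb = b, c, fb, fc
        step *= grow                     # grow the increment at each step
    raise RuntimeError("no bracket found")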

Page 7: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 8: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new
  – Here, the new “bracket” points are x_new, x_mid, x_right

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 9: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new
  – Here, the new “bracket” points are x_left, x_new, x_mid

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 10: Optimization

Optimization in 1-D

• Unlike with root-finding, can’t always guarantee that the interval will be reduced by a factor of 2
• Let’s find the optimal place for x_mid, relative to left and right, that will guarantee the same factor of reduction regardless of outcome

Page 11: Optimization

Optimization in 1-D

    if f(x_new) < f(x_mid):  new interval = α
    else:                    new interval = 1 − α²

[Figure: the bracket divided into segments of relative length α and α²]

Page 12: Optimization

Golden Section Search

• To assure the same interval, want α = 1 − α²
• So,
    α = (√5 − 1) / 2 = φ
• This is the “golden ratio” = 0.618…
• So, the interval decreases by 30% per iteration
  – Linear convergence
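
A minimal golden-section search sketch in Python (names and tolerance are illustrative), assuming the interval [a, b] already brackets a single minimum:

import math

def golden_section(f, a, b, tol=1e-8):
    """Shrink [a, b] around a minimum of f; width shrinks by ~0.618 per step."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0      # 0.618..., and 1 - phi = phi**2
    x1 = b - phi * (b - a)                  # two interior points, placed so that
    x2 = a + phi * (b - a)                  # one of them is reused next iteration
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                         # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = f(x1)
        else:                               # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# e.g. golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0) returns ~2.0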

Page 13: Optimization

Error Tolerance

• Around the minimum, the derivative = 0, so
    f(x + Δx) = f(x) + ½ f″(x) Δx² + …
    f(x + Δx) − f(x) = ½ f″(x) Δx² ≈ ε_machine   ⇒   Δx ~ √(ε_machine)
• Rule of thumb: pointless to ask for more accuracy than sqrt(ε)
  – Can use double precision if you want a single-precision result (and/or have single-precision data)
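
To make the rule of thumb concrete, a tiny sketch (assuming NumPy) of the tolerance worth requesting in double precision:

import numpy as np

eps = np.finfo(np.float64).eps   # machine epsilon, about 2.2e-16
x_tol = np.sqrt(eps)             # ~1.5e-8: roughly the best relative accuracy
                                 # worth asking of a 1-D minimizer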

Page 14: Optimization

Faster 1-D Optimization

• Trade off super-linear convergence for worse robustness
  – Combine with Golden Section search for safety
• Usual bag of tricks:
  – Fit a parabola through 3 points, find its minimum
  – Compute derivatives as well as positions, fit a cubic
  – Use second derivatives: Newton

Page 15: Optimization

Newton’s Method

Page 16: Optimization

Newton’s Method

Page 17: Optimization

Newton’s Method

Page 18: Optimization

Newton’s Method

Page 19: Optimization

Newton’s Method

• At each step:
    x_{k+1} = x_k − f′(x_k) / f″(x_k)
• Requires 1st and 2nd derivatives
• Quadratic convergence
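
A minimal sketch of the 1-D Newton update (illustrative names; assumes f″ stays nonzero and the starting point is near the minimum):

import math

def newton_1d(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Minimize f via x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:              # quadratic convergence near the minimum
            break
    return x

# minimize cos(x) starting near its minimum at pi:
print(newton_1d(lambda x: -math.sin(x), lambda x: -math.cos(x), x0=3.0))  # ~3.14159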

Page 20: Optimization

Multi-Dimensional Optimization

• Important in many areas
  – Fitting a model to measured data
  – Finding the best design in some parameter space
• Hard in general
  – Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
  – Can’t bracket
• In general, easier than root-finding
  – Can always walk “downhill”

Page 21: Optimization

Newton’s Method in Multiple Dimensions

• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian

    f = f(x, y)

    ∇f = [ ∂f/∂x ,  ∂f/∂y ]ᵀ

    H = [ ∂²f/∂x²    ∂²f/∂x∂y ]
        [ ∂²f/∂x∂y   ∂²f/∂y²  ]

Page 22: Optimization

Newton’s Method in Multiple Dimensions

• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian
• So,
    x_{k+1} = x_k − H(x_k)⁻¹ ∇f(x_k)
• Tends to be extremely fragile unless the function is very smooth and the starting point is close to the minimum
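
A minimal NumPy sketch of the multi-dimensional Newton step (grad and hess are assumed to be supplied by the caller; solving H·step = ∇f is preferred to forming H⁻¹ explicitly):

import numpy as np

def newton_nd(grad, hess, x0, tol=1e-10, max_iter=50):
    """x_{k+1} = x_k - H(x_k)^-1 grad_f(x_k), implemented via a linear solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve H * step = grad f
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x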

Page 23: Optimization

Important classification of methods

• Use function + gradient + Hessian (Newton)
• Use function + gradient (most descent methods)
• Use function values only (Nelder-Mead, also called the “simplex” or “amoeba” method)

Page 24: Optimization

Steepest Descent Methods

• What if you can’t / don’t want to use the 2nd derivative?
• “Quasi-Newton” methods estimate the Hessian
• Alternative: walk along the (negative of the) gradient…
  – Perform a 1-D minimization along the line passing through the current point in the direction of the gradient
  – Once done, re-compute the gradient and iterate (a sketch follows below)
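
A minimal steepest-descent sketch (illustrative names); line_min stands for any 1-D minimizer, e.g. a golden-section search over the step length:

import numpy as np

def steepest_descent(f, grad, x0, line_min, tol=1e-8, max_iter=1000):
    """Repeatedly minimize f along the negative-gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                   # walk along the negative gradient
        t = line_min(lambda t: f(x + t * d))     # 1-D minimization along the line
        x = x + t * d                            # move, then re-compute the gradient
    return x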

Page 25: Optimization

Problem With Steepest Descent

Page 26: Optimization

Problem With Steepest Descent

Page 27: Optimization

Conjugate Gradient Methods

• Idea: avoid “undoing” minimization that’s already been done
• Walk along the direction
    d_{k+1} = −g_{k+1} + β_k d_k
• Polak and Ribière formula:
    β_k = g_{k+1}ᵀ (g_{k+1} − g_k) / (g_kᵀ g_k)
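
A minimal sketch of nonlinear conjugate gradient with the Polak-Ribière formula (illustrative; line_min is an assumed 1-D minimizer as in the steepest-descent sketch, and the beta ≥ 0 safeguard is a common restart rule, not from the slides):

import numpy as np

def conjugate_gradient(f, grad, x0, line_min, tol=1e-8, max_iter=1000):
    """Walk along d_{k+1} = -g_{k+1} + beta_k d_k with Polak-Ribiere beta_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        t = line_min(lambda t: f(x + t * d))      # 1-D minimization along d
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)      # Polak-Ribiere formula
        beta = max(beta, 0.0)                     # safeguard: restart if beta < 0
        d = -g_new + beta * d
        g = g_new
    return x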

Page 28: Optimization

Conjugate Gradient Methods

• Conjugate gradient implicitly obtains information about the Hessian
• For a quadratic function in n dimensions, gets the exact solution in n steps (ignoring roundoff error)
• Works well in practice…

Page 29: Optimization

Value-Only Methods in Multi-Dimensions

• If you can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
    ∇f(x) ≈ [ ∂f/∂e_1, ∂f/∂e_2, ∂f/∂e_3, … ]ᵀ
           ≈ [ (f(x + δe_1) − f(x))/δ, (f(x + δe_2) − f(x))/δ, (f(x + δe_3) − f(x))/δ, … ]ᵀ
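
A minimal forward-difference gradient sketch in NumPy (the step size delta is an illustrative choice):

import numpy as np

def numerical_gradient(f, x, delta=1e-6):
    """Approximate grad f(x) by forward differences along each axis e_i."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += delta                # x + delta * e_i
        g[i] = (f(x_step) - f0) / delta
    return g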

Page 30: Optimization

Generic Optimization Strategies

• Uniform sampling:
  – Cost rises exponentially with # of dimensions
• Simulated annealing:
  – Search in random directions
  – Start with large steps, gradually decrease
  – “Annealing schedule”: how fast to cool? (a sketch follows below)
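
A minimal simulated-annealing sketch (the geometric cooling schedule, Gaussian steps, and acceptance rule are illustrative assumptions, not prescribed by the slides):

import math
import random

def simulated_annealing(f, x0, step=1.0, temp=1.0, cool=0.99, n_iter=10000):
    """Random-direction search; step size and temperature shrink over time."""
    x, fx = list(x0), f(x0)
    for _ in range(n_iter):
        cand = [xi + step * random.gauss(0.0, 1.0) for xi in x]   # random direction
        fc = f(cand)
        # always accept downhill moves; accept uphill moves with small probability
        if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
        step *= cool                      # start with large steps, gradually decrease
        temp *= cool                      # "annealing schedule": how fast to cool
    return x, fx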

Page 31: Optimization

Downhill Simplex Method (Nelder-Mead)

• Keep track of n+1 points in n dimensions
  – Vertices of a simplex (triangle in 2D, tetrahedron in 3D, etc.)
• At each iteration: the simplex can move, expand, or contract
  – Sometimes known as the amoeba method: the simplex “oozes” along the function

Page 32: Optimization

Downhill Simplex Method (Nelder-Mead)

• Basic operation: reflection

[Figure: the worst point (highest function value) and the location probed by the reflection step]

Page 33: Optimization

Downhill Simplex Method (Nelder-Mead)

• If the reflection resulted in the best (lowest) value so far, try an expansion
• Else, if the reflection helped at all, keep it

[Figure: location probed by the expansion step]

Page 34: Optimization

Downhill Simplex Method (Nelder-Mead)

• If the reflection didn’t help (the reflected point is still the worst), try a contraction

[Figure: location probed by the contraction step]

Page 35: Optimization

Downhill Simplex Method (Nelder-Mead)

• If all else fails, shrink the simplex around the best point

Page 36: Optimization

Downhill Simplex Method (Nelder-Mead)

• Method fairly efficient at each iteration (typically 1-2 function evaluations)
• Can take lots of iterations
• Somewhat flaky: sometimes needs a restart after the simplex collapses on itself, etc.
• Benefits: simple to implement, doesn’t need derivatives, doesn’t care about function smoothness, etc.
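
In practice a library implementation is often the easiest route; for example, SciPy exposes Nelder-Mead through scipy.optimize.minimize (a usage sketch with a made-up objective, assuming SciPy is installed):

import numpy as np
from scipy.optimize import minimize

def f(p):                        # any function of a point; no derivatives needed
    x, y = p
    return (x - 1.0)**2 + 10.0 * (y + 2.0)**2

result = minimize(f, x0=np.array([0.0, 0.0]), method='Nelder-Mead')
print(result.x, result.fun)      # approximate minimizer and its function value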

Page 37: Optimization

Rosenbrock’s Function

• Designed specifically for testing optimization techniques
• Curved, narrow valley

    f(x, y) = 100 (y − x²)² + (1 − x)²
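
The function is easy to write down directly (a small sketch; SciPy also ships an n-dimensional version as scipy.optimize.rosen):

def rosenbrock(x, y):
    """Rosenbrock's "banana" function: global minimum f(1, 1) = 0 in a curved valley."""
    return 100.0 * (y - x**2)**2 + (1.0 - x)**2

print(rosenbrock(1.0, 1.0))   # 0.0, the global minimum
print(rosenbrock(0.0, 0.0))   # 1.0, a common starting point for test runs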

Page 38: Optimization

Constrained Optimization

• Equality constraints: optimize f(x) subject to g_i(x) = 0
• Method of Lagrange multipliers: convert to a higher-dimensional problem
• Minimize f(x) + Σ_i λ_i g_i(x) with respect to (x_1 … x_n; λ_1 … λ_k)
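
A small worked example (not from the slides): minimize f(x, y) = x² + y² subject to g(x, y) = x + y − 1 = 0. Form L(x, y, λ) = x² + y² + λ(x + y − 1) and set all partial derivatives to zero:

    ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 2y + λ = 0,   ∂L/∂λ = x + y − 1 = 0

The first two equations give x = y = −λ/2; the constraint then forces x = y = 1/2 and λ = −1, so the constrained minimum is f(1/2, 1/2) = 1/2.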

Page 39: Optimization

Constrained Optimization

• Inequality constraints are harder…
• If the objective function and constraints are all linear, this is “linear programming”
• Observation: the minimum must lie at a corner of the region formed by the constraints
• Simplex method: move from vertex to vertex, minimizing the objective function
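
A small linear-programming sketch using SciPy’s linprog (the coefficients are made up for illustration); it minimizes c·x subject to A_ub·x ≤ b_ub and simple bounds:

import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                  # maximize x + 2y == minimize -x - 2y
A_ub = np.array([[1.0, 1.0],                # x + y  <= 4
                 [1.0, 3.0]])               # x + 3y <= 6
b_ub = np.array([4.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                       # optimum sits at a corner of the feasible region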

Page 40: Optimization

Constrained Optimization

• General “nonlinear programming” is hard
• Algorithms exist for special cases (e.g. quadratic)

Page 41: Optimization

Global Optimization

• In general, can’t guarantee that you’ve found the global (rather than local) minimum
• Some heuristics:
  – Multi-start: try local optimization from several starting positions (a sketch follows below)
  – Very slow simulated annealing
  – Use analytical methods (or graphing) to determine behavior, guide methods to the correct neighborhoods
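
A minimal multi-start sketch (illustrative; it reuses SciPy’s minimize as the local optimizer and draws random starting points within user-supplied bounds):

import numpy as np
from scipy.optimize import minimize

def multi_start(f, bounds, n_starts=20, seed=0):
    """Run a local optimizer from several random starts; keep the best result."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T    # bounds given as [(lo, hi), ...]
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                  # random starting position
        res = minimize(f, x0, method='Nelder-Mead')
        if best is None or res.fun < best.fun:
            best = res
    return best                                   # best local minimum found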

