Page 1: Optimization

Optimization

COS 323

Page 2: Optimization

Ingredients

• Objective function
• Variables
• Constraints

Find values of the variables that minimize or maximize the objective function
while satisfying the constraints

Page 3: Optimization

Different Kinds of Optimization

[Figure from: Optimization Technology Center, http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/]

Page 4: Optimization

Different Optimization Techniques

• Algorithms have very different flavor depending on the specific problem
  – Closed form vs. numerical vs. discrete
  – Local vs. global minima
  – Running times ranging from O(1) to NP-hard
• Today:
  – Focus on continuous numerical methods

Page 5: Optimization

Optimization in 1-D

• Look for analogies to bracketing in root-finding
• What does it mean to bracket a minimum?

[Figure: a bracket of three points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) satisfying x_left < x_mid < x_right, f(x_mid) < f(x_left), and f(x_mid) < f(x_right)]

Page 6: Optimization

Optimization in 1-D

• Once we have these properties, there is at least one local minimum between x_left and x_right
• Establishing the bracket initially:
  – Given x_initial, increment
  – Evaluate f(x_initial), f(x_initial + increment)
  – If decreasing, step until an increase is found
  – Else, step in the opposite direction until an increase is found
  – Grow the increment at each step (a code sketch follows below)
• For maximization: substitute −f for f
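
A minimal sketch (not from the slides) of this bracketing procedure in Python; the function name bracket_minimum, the initial step, and the growth factor are illustrative choices:

def bracket_minimum(f, x0, step=1.0, grow=2.0, max_iter=100):
    """Return three points (x, f(x)) that bracket a local minimum of f."""
    a, b = x0, x0 + step
    fa, fb = f(a), f(b)
    if fb > fa:                          # not decreasing: step the other way
        a, b, fa, fb = b, a, fb, fa
        step = -step
    for _ in range(max_iter):
        c = b + step                     # keep stepping downhill
        fc = f(c)
        if fc > fb:                      # found an increase: bracket established
            return sorted([(a, fa), (b, fb), (c, fc)])   # ordered left / mid / right
        a, b, fa, fb = b, c, fb, fc
        step *= grow                     # grow the increment at each step
    raise RuntimeError("no bracket found")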

Page 7: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 8: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new
  – Here, the new “bracket” points are x_new, x_mid, x_right

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 9: Optimization

Optimization in 1-D

• Strategy: evaluate the function at some x_new
  – Here, the new “bracket” points are x_left, x_new, x_mid

[Figure: the bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new sample (x_new, f(x_new))]

Page 10: Optimization

Optimization in 1-D

• Unlike with root-finding, can’t always guarantee that the interval will be reduced by a factor of 2
• Let’s find the optimal place for x_mid, relative to left and right, that will guarantee the same factor of reduction regardless of outcome

Page 11: Optimization

Optimization in 1-D

    if f(x_new) < f(x_mid):  new interval = α
    else:                    new interval = 1 − α²

[Figure: the bracket divided into segments of relative length α and α²]

Page 12: Optimization

Golden Section Search

• To assure the same interval, want α = 1 − α²
• So,
    α = (√5 − 1) / 2 = φ
• This is the “golden ratio” = 0.618…
• So, the interval decreases by 30% per iteration
  – Linear convergence
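
A minimal golden-section search sketch in Python (names and tolerance are illustrative), assuming the interval [a, b] already brackets a single minimum:

import math

def golden_section(f, a, b, tol=1e-8):
    """Shrink [a, b] around a minimum of f; width shrinks by ~0.618 per step."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0      # 0.618..., and 1 - phi = phi**2
    x1 = b - phi * (b - a)                  # two interior points, placed so that
    x2 = a + phi * (b - a)                  # one of them is reused next iteration
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                         # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = f(x1)
        else:                               # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# e.g. golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0) returns ~2.0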

Page 13: Optimization

Error Tolerance

• Around the minimum, the derivative = 0, so
    f(x + Δx) = f(x) + ½ f″(x) Δx² + …
    f(x + Δx) − f(x) = ½ f″(x) Δx² ≈ ε_machine   ⇒   Δx ~ √(ε_machine)
• Rule of thumb: pointless to ask for more accuracy than sqrt(ε)
  – Can use double precision if you want a single-precision result (and/or have single-precision data)
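
To make the rule of thumb concrete, a tiny sketch (assuming NumPy) of the tolerance worth requesting in double precision:

import numpy as np

eps = np.finfo(np.float64).eps   # machine epsilon, about 2.2e-16
x_tol = np.sqrt(eps)             # ~1.5e-8: roughly the best relative accuracy
                                 # worth asking of a 1-D minimizer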

Page 14: Optimization

Faster 1-D Optimization

• Trade off super-linear convergence for worse robustness
  – Combine with Golden Section search for safety
• Usual bag of tricks:
  – Fit a parabola through 3 points, find its minimum
  – Compute derivatives as well as positions, fit a cubic
  – Use second derivatives: Newton

Page 15: Optimization

Newton’s Method

Page 16: Optimization

Newton’s Method

Page 17: Optimization

Newton’s Method

Page 18: Optimization

Newton’s Method

Page 19: Optimization

Newton’s Method

• At each step:
    x_{k+1} = x_k − f′(x_k) / f″(x_k)
• Requires 1st and 2nd derivatives
• Quadratic convergence
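
A minimal sketch of the 1-D Newton update (illustrative names; assumes f″ stays nonzero and the starting point is near the minimum):

import math

def newton_1d(fprime, fsecond, x0, tol=1e-10, max_iter=50):
    """Minimize f via x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:              # quadratic convergence near the minimum
            break
    return x

# minimize cos(x) starting near its minimum at pi:
print(newton_1d(lambda x: -math.sin(x), lambda x: -math.cos(x), x0=3.0))  # ~3.14159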

Page 20: Optimization

Multi-Dimensional Optimization

• Important in many areas
  – Fitting a model to measured data
  – Finding the best design in some parameter space
• Hard in general
  – Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
  – Can’t bracket
• In general, easier than root-finding
  – Can always walk “downhill”

Page 21: Optimization

Newton’s Method in Multiple Dimensions

• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian

    f = f(x, y)

    ∇f = [ ∂f/∂x ,  ∂f/∂y ]ᵀ

    H = [ ∂²f/∂x²    ∂²f/∂x∂y ]
        [ ∂²f/∂x∂y   ∂²f/∂y²  ]

Page 22: Optimization

Newton’s Method in Multiple Dimensions

• Replace the 1st derivative with the gradient, the 2nd derivative with the Hessian
• So,
    x_{k+1} = x_k − H(x_k)⁻¹ ∇f(x_k)
• Tends to be extremely fragile unless the function is very smooth and the starting point is close to the minimum
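
A minimal NumPy sketch of the multi-dimensional Newton step (grad and hess are assumed to be supplied by the caller; solving H·step = ∇f is preferred to forming H⁻¹ explicitly):

import numpy as np

def newton_nd(grad, hess, x0, tol=1e-10, max_iter=50):
    """x_{k+1} = x_k - H(x_k)^-1 grad_f(x_k), implemented via a linear solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve H * step = grad f
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x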

Page 23: Optimization

Important classification of methods

• Use function + gradient + Hessian (Newton)
• Use function + gradient (most descent methods)
• Use function values only (Nelder-Mead, also called the “simplex” or “amoeba” method)

Page 24: Optimization

Steepest Descent Methods

• What if you can’t / don’t want to use the 2nd derivative?
• “Quasi-Newton” methods estimate the Hessian
• Alternative: walk along the (negative of the) gradient…
  – Perform a 1-D minimization along the line passing through the current point in the direction of the gradient
  – Once done, re-compute the gradient and iterate (a sketch follows below)
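
A minimal steepest-descent sketch (illustrative names); line_min stands for any 1-D minimizer, e.g. a golden-section search over the step length:

import numpy as np

def steepest_descent(f, grad, x0, line_min, tol=1e-8, max_iter=1000):
    """Repeatedly minimize f along the negative-gradient direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                   # walk along the negative gradient
        t = line_min(lambda t: f(x + t * d))     # 1-D minimization along the line
        x = x + t * d                            # move, then re-compute the gradient
    return x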

Page 25: Optimization

Problem With Steepest Descent

Page 26: Optimization

Problem With Steepest Descent

Page 27: Optimization

Conjugate Gradient Methods

• Idea: avoid “undoing” minimization that’s already been done
• Walk along the direction
    d_{k+1} = −g_{k+1} + β_k d_k
• Polak and Ribière formula:
    β_k = g_{k+1}ᵀ (g_{k+1} − g_k) / (g_kᵀ g_k)
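
A minimal sketch of nonlinear conjugate gradient with the Polak-Ribière formula (illustrative; line_min is an assumed 1-D minimizer as in the steepest-descent sketch, and the beta ≥ 0 safeguard is a common restart rule, not from the slides):

import numpy as np

def conjugate_gradient(f, grad, x0, line_min, tol=1e-8, max_iter=1000):
    """Walk along d_{k+1} = -g_{k+1} + beta_k d_k with Polak-Ribiere beta_k."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        t = line_min(lambda t: f(x + t * d))      # 1-D minimization along d
        x = x + t * d
        g_new = grad(x)
        beta = g_new @ (g_new - g) / (g @ g)      # Polak-Ribiere formula
        beta = max(beta, 0.0)                     # safeguard: restart if beta < 0
        d = -g_new + beta * d
        g = g_new
    return x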

Page 28: Optimization

Conjugate Gradient Methods

• Conjugate gradient implicitly obtains information about the Hessian
• For a quadratic function in n dimensions, gets the exact solution in n steps (ignoring roundoff error)
• Works well in practice…

Page 29: Optimization

Value-Only Methods in Multi-Dimensions

• If you can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
    ∇f(x) ≈ [ ∂f/∂e_1, ∂f/∂e_2, ∂f/∂e_3, … ]ᵀ
           ≈ [ (f(x + δe_1) − f(x))/δ, (f(x + δe_2) − f(x))/δ, (f(x + δe_3) − f(x))/δ, … ]ᵀ
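
A minimal forward-difference gradient sketch in NumPy (the step size delta is an illustrative choice):

import numpy as np

def numerical_gradient(f, x, delta=1e-6):
    """Approximate grad f(x) by forward differences along each axis e_i."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += delta                # x + delta * e_i
        g[i] = (f(x_step) - f0) / delta
    return g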

Page 30: Optimization

Generic Optimization Strategies

• Uniform sampling:
  – Cost rises exponentially with # of dimensions
• Simulated annealing:
  – Search in random directions
  – Start with large steps, gradually decrease
  – “Annealing schedule”: how fast to cool? (a sketch follows below)
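
A minimal simulated-annealing sketch (the geometric cooling schedule, Gaussian steps, and acceptance rule are illustrative assumptions, not prescribed by the slides):

import math
import random

def simulated_annealing(f, x0, step=1.0, temp=1.0, cool=0.99, n_iter=10000):
    """Random-direction search; step size and temperature shrink over time."""
    x, fx = list(x0), f(x0)
    for _ in range(n_iter):
        cand = [xi + step * random.gauss(0.0, 1.0) for xi in x]   # random direction
        fc = f(cand)
        # always accept downhill moves; accept uphill moves with small probability
        if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
        step *= cool                      # start with large steps, gradually decrease
        temp *= cool                      # "annealing schedule": how fast to cool
    return x, fx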

Page 31: Optimization

Downhill Simplex Method (Nelder-Mead)

• Keep track of n+1 points in n dimensions
  – Vertices of a simplex (triangle in 2D, tetrahedron in 3D, etc.)
• At each iteration: the simplex can move, expand, or contract
  – Sometimes known as the amoeba method: the simplex “oozes” along the function

Page 32: Optimization

Downhill Simplex Method (Nelder-Mead)

• Basic operation: reflection

[Figure: the worst point (highest function value) and the location probed by the reflection step]

Page 33: Optimization

Downhill Simplex Method (Nelder-Mead)

• If the reflection resulted in the best (lowest) value so far, try an expansion
• Else, if the reflection helped at all, keep it

[Figure: location probed by the expansion step]

Page 34: Optimization

Downhill Simplex Method (Nelder-Mead)

• If the reflection didn’t help (the reflected point is still the worst), try a contraction

[Figure: location probed by the contraction step]

Page 35: Optimization

Downhill Simplex Method (Nelder-Mead)

• If all else fails, shrink the simplex around the best point

Page 36: Optimization

Downhill Simplex Method (Nelder-Mead)

• Method fairly efficient at each iteration (typically 1-2 function evaluations)
• Can take lots of iterations
• Somewhat flaky: sometimes needs a restart after the simplex collapses on itself, etc.
• Benefits: simple to implement, doesn’t need derivatives, doesn’t care about function smoothness, etc.
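
In practice a library implementation is often the easiest route; for example, SciPy exposes Nelder-Mead through scipy.optimize.minimize (a usage sketch with a made-up objective, assuming SciPy is installed):

import numpy as np
from scipy.optimize import minimize

def f(p):                        # any function of a point; no derivatives needed
    x, y = p
    return (x - 1.0)**2 + 10.0 * (y + 2.0)**2

result = minimize(f, x0=np.array([0.0, 0.0]), method='Nelder-Mead')
print(result.x, result.fun)      # approximate minimizer and its function value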

Page 37: Optimization

Rosenbrock’s Function

• Designed specifically for testing optimization techniques
• Curved, narrow valley

    f(x, y) = 100 (y − x²)² + (1 − x)²
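
The function is easy to write down directly (a small sketch; SciPy also ships an n-dimensional version as scipy.optimize.rosen):

def rosenbrock(x, y):
    """Rosenbrock's "banana" function: global minimum f(1, 1) = 0 in a curved valley."""
    return 100.0 * (y - x**2)**2 + (1.0 - x)**2

print(rosenbrock(1.0, 1.0))   # 0.0, the global minimum
print(rosenbrock(0.0, 0.0))   # 1.0, a common starting point for test runs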

Page 38: Optimization

Constrained Optimization

• Equality constraints: optimize f(x) subject to g_i(x) = 0
• Method of Lagrange multipliers: convert to a higher-dimensional problem
• Minimize f(x) + Σ_i λ_i g_i(x) with respect to (x_1 … x_n; λ_1 … λ_k)
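
A small worked example (not from the slides): minimize f(x, y) = x² + y² subject to g(x, y) = x + y − 1 = 0. Form L(x, y, λ) = x² + y² + λ(x + y − 1) and set all partial derivatives to zero:

    ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 2y + λ = 0,   ∂L/∂λ = x + y − 1 = 0

The first two equations give x = y = −λ/2; the constraint then forces x = y = 1/2 and λ = −1, so the constrained minimum is f(1/2, 1/2) = 1/2.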

Page 39: Optimization

Constrained Optimization

• Inequality constraints are harder…
• If the objective function and constraints are all linear, this is “linear programming”
• Observation: the minimum must lie at a corner of the region formed by the constraints
• Simplex method: move from vertex to vertex, minimizing the objective function
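
A small linear-programming sketch using SciPy’s linprog (the coefficients are made up for illustration); it minimizes c·x subject to A_ub·x ≤ b_ub and simple bounds:

import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])                  # maximize x + 2y == minimize -x - 2y
A_ub = np.array([[1.0, 1.0],                # x + y  <= 4
                 [1.0, 3.0]])               # x + 3y <= 6
b_ub = np.array([4.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                       # optimum sits at a corner of the feasible region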

Page 40: Optimization

Constrained Optimization

• General “nonlinear programming” is hard
• Algorithms exist for special cases (e.g. quadratic)

Page 41: Optimization

Global Optimization

• In general, can’t guarantee that you’ve found the global (rather than local) minimum
• Some heuristics:
  – Multi-start: try local optimization from several starting positions (a sketch follows below)
  – Very slow simulated annealing
  – Use analytical methods (or graphing) to determine behavior, guide methods to the correct neighborhoods
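
A minimal multi-start sketch (illustrative; it reuses SciPy’s minimize as the local optimizer and draws random starting points within user-supplied bounds):

import numpy as np
from scipy.optimize import minimize

def multi_start(f, bounds, n_starts=20, seed=0):
    """Run a local optimizer from several random starts; keep the best result."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T    # bounds given as [(lo, hi), ...]
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                  # random starting position
        res = minimize(f, x0, method='Nelder-Mead')
        if best is None or res.fun < best.fun:
            best = res
    return best                                   # best local minimum found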

