Optimization Methods
• Unconstrained optimization of an objective function F
  – Deterministic, gradient-based methods
    • Running a PDE: will cover later in course
    • Gradient-based (ascent/descent) methods
  – Stochastic methods
    • Simulated annealing
      – Theoretically but not practically interesting
    • Evolutionary (genetic) algorithms
  – Multiscale methods
    • Mean field annealing, graduated nonconvexity, etc.
• Constrained optimization
  – Lagrange multipliers
Our Assumptions for Optimization Methods
• With objective function F(p)
  – Dimension(p) >> 1, and frequently quite large
  – Evaluating F at any p is very expensive
  – Evaluating D1F at any p is very, very expensive
  – Evaluating D2F at any p is extremely expensive
•True in most image analysis and graphics applications
Order of Convergence for Iterative Methods
• |εi+1| = k|εi|^α in the limit
  – α is the order of convergence
  – α is the major factor in the speed of convergence
• N steps of a method with order of convergence α together have order of convergence α^N
• Thus the issue is linear convergence (α = 1) vs. superlinear convergence (α > 1)
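The distinction can be seen numerically. A sketch (example problem and solvers chosen for illustration, not from the slides): bisection converges linearly (α = 1) to the root of f(x) = x² − 2, while Newton's method converges quadratically (α = 2), so |εi+1| / |εi|² stays bounded.

```python
import math

# Estimate the order of convergence empirically for two root-finders
# applied to f(x) = x^2 - 2, whose root is sqrt(2).  For bisection the
# error shrinks linearly; for Newton, |e_{i+1}| / |e_i|^2 stays bounded.
root = math.sqrt(2.0)

# Bisection: linear convergence (interval halves each step).
lo, hi = 0.0, 2.0
bisect_errs = []
for _ in range(20):
    mid = 0.5 * (lo + hi)
    bisect_errs.append(abs(mid - root))
    if mid * mid < 2.0:
        lo = mid
    else:
        hi = mid

# Newton: quadratic convergence, x_{i+1} = x_i - f(x_i)/f'(x_i).
x = 2.0
newton_errs = []
for _ in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)
    newton_errs.append(abs(x - root))

linear_ratios = [bisect_errs[i + 1] / bisect_errs[i] for i in range(10)]
quad_ratios = [newton_errs[i + 1] / newton_errs[i] ** 2
               for i in range(3) if newton_errs[i] ** 2 > 0]
print(linear_ratios[:3], quad_ratios[:2])
```

After only four Newton steps the error is already near machine precision, while bisection still has several correct digits to go.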
Ascent/Descent Methods
• At a maximum, D1F (i.e., ∇F) = 0.
• Pick direction of ascent/descent
• Find approximate maximum in that direction: two possibilities
  – Calculate a stepsize that will approximately reach the maximum
  – In the search direction, find the actual max within some range
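The first option (calculating a stepsize) is commonly implemented with a backtracking (Armijo) line search. A minimal sketch, written for minimization; the function names and constants are illustrative, not from the slides:

```python
import numpy as np

# One descent step with an Armijo backtracking line search: shrink the
# trial step t until it produces a sufficient decrease in F.
def backtracking_step(F, gradF, p, c=0.1, shrink=0.5, t0=1.0):
    g = gradF(p)
    d = -g                                   # descent direction
    t = t0
    # Shrink t until the Armijo sufficient-decrease condition holds.
    while F(p + t * d) > F(p) + c * t * g.dot(d):
        t *= shrink
    return p + t * d

# Usage on the elongated quadratic F(p) = x^2 + 10 y^2.
F = lambda p: p[0] ** 2 + 10.0 * p[1] ** 2
gradF = lambda p: np.array([2.0 * p[0], 20.0 * p[1]])
p = np.array([3.0, -2.0])
for _ in range(50):
    p = backtracking_step(F, gradF, p)
print(F(p))   # far below F at the starting point
```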
Gradient Ascent/Descent Methods
• Direction of ascent/descent is D1F.
• If you move to the optimum in that direction, the next direction will be orthogonal to this one
  – Guarantees zigzag
  – Bad behavior for narrow ridges (valleys) of F
  – Linear convergence
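The zigzag is easy to demonstrate. The sketch below (matrix and starting point invented for illustration) runs steepest descent with exact line minimization on an elongated quadratic F(p) = ½ pᵀAp: each exact line minimization makes the next gradient orthogonal to the current one, and progress toward the minimum is slow when A is badly conditioned.

```python
import numpy as np

# Steepest descent with exact line search on F(p) = 1/2 p^T A p.
A = np.diag([1.0, 25.0])          # condition number 25: a narrow valley
grad = lambda p: A @ p

p = np.array([25.0, 1.0])
dots = []
for _ in range(20):
    g = grad(p)
    t = g @ g / (g @ (A @ g))     # exact minimizer along -g for a quadratic
    p_next = p - t * g
    dots.append(abs(grad(p_next) @ g))   # next gradient vs current direction
    p = p_next

print(max(dots))    # ~0: successive directions orthogonal
print(p)            # still far from the origin: slow zigzag progress
```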
Newton and Secant Ascent/Descent Methods for F(p)
• We are solving D1F = 0
  – Use a Newton or secant equation-solving method
• Newton's method to solve f(p) = 0 is pi+1 = pi – [D1f(pi)]-1 f(pi)
• Newton for optimization
  – Move from p to p – (D2F)-1 D1F
• Is the direction of ascent/descent the gradient direction D1F?
  – No: methods that ascend/descend in the D1F (gradient) direction are inferior
  – The real direction of ascent/descent is the direction of (D2F)-1 D1F, which also gives the step size in that direction
• Secant
  – Same as Newton, except replace D2F and D1F by discrete approximations to them from this and the last n iterates
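A minimal sketch of the Newton update p ← p − (D2F)-1 D1F, on a convex example function chosen for illustration (its minimum is at the origin, where D1F = 0):

```python
import numpy as np

# Newton's method for optimization on the convex function
# F(x, y) = e^x + e^y + x^2 + y^2 - x - y, minimized at the origin.
# The Hessian happens to be diagonal here, but np.linalg.solve is used
# as one would in the general case.
def grad(p):
    return np.exp(p) + 2.0 * p - 1.0          # D1F, componentwise

def hess(p):
    return np.diag(np.exp(p) + 2.0)           # D2F

p = np.array([1.5, -2.0])
for _ in range(8):
    p = p - np.linalg.solve(hess(p), grad(p)) # Newton step
print(p)   # close to [0. 0.]
```

Replacing `hess` (and possibly `grad`) with discrete approximations built from recent iterates gives the secant (quasi-Newton) variant described above.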
Conjugate gradient method
• Preferable to gradient descent/ascent methods
• Two major aspects
  – Successive directions for descent/ascent are conjugate: <hi+1, D2F hi> = 0 in the limit for convex F
    • If true at all steps (quadratic F), convergence in at most n steps, with n = dim(p)
    • Improvements available using more previous directions
  – In the search direction, find the actual max/min within some range
    • Quadratic convergence depends on <D1F(xi), hi> = 0, i.e., F a local minimum in the hi direction
• References
  – Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (http://www-2.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf)
  – Numerical Recipes
  – Polak, Computational Methods in Optimization, Academic Press
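For a quadratic F(p) = ½ pᵀAp − bᵀp, minimizing F amounts to solving Ap = b, and the conjugate directions can be generated in closed form. A sketch of the standard linear conjugate gradient iteration (the Shewchuk notes cited above derive it); the test matrix is invented for illustration:

```python
import numpy as np

# Linear conjugate gradient: in exact arithmetic it converges in at most
# n steps for an n-dimensional quadratic, and successive directions
# satisfy <h_{i+1}, A h_i> = 0.
def conjugate_gradient(A, b, p0, tol=1e-8):
    p = p0.copy()
    r = b - A @ p                       # residual = -D1F(p)
    h = r.copy()                        # first direction: steepest descent
    steps = 0
    while np.linalg.norm(r) > tol:
        Ah = A @ h
        t = (r @ r) / (h @ Ah)          # exact line minimization along h
        p = p + t * h
        r_new = r - t * Ah
        beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves coefficient
        h = r_new + beta * h              # next direction, A-conjugate to h
        r = r_new
        steps += 1
    return p, steps

n = 6
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)             # symmetric positive definite
b = rng.standard_normal(n)
p, steps = conjugate_gradient(A, b, np.zeros(n))
print(steps)                            # at most n = 6
```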
Conjugate gradient method issues
• Preferable to gradient descent/ascent methods
• Must find a local minimum in the search direction
• Will have trouble with
  – Bumpy objective functions
  – Extremely elongated minimum/maximum regions
Multiscale Gradient-Based Optimization
To avoid local optima
• Smooth the objective function to put the initial estimate on the hillside of its global optimum
  – E.g., by using larger-scale measurements
• Find its optimum
• Iterate
  – Decrease the scale of the objective function
  – Use the previous optimum as the starting point for the new optimization
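The coarse-to-fine procedure can be sketched on a one-dimensional example. The function and the scale schedule below are invented for illustration; F(x) = x² + 5 cos(10x) is bumpy, and blurring the cosine with a Gaussian of scale s multiplies its amplitude by exp(−50 s²), so the smoothed family is available in closed form.

```python
import math

# Gradient of the smoothed objective F_s(x) = x^2 + 5 e^{-50 s^2} cos(10 x).
def grad_Fs(x, s):
    damp = math.exp(-50.0 * s * s)
    return 2.0 * x - 50.0 * damp * math.sin(10.0 * x)

def descend(x, s, lr=1e-3, iters=2000):
    for _ in range(iters):
        x -= lr * grad_Fs(x, s)
    return x

x = 4.0                                # bad start, far from the global optimum
for s in [1.0, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0]:   # decrease scale, reuse optimum
    x = descend(x, s)
print(x)   # near a global minimizer of F (about +/- 0.31)

# Direct descent from the same start gets trapped in a distant local minimum:
print(descend(4.0, 0.0))
```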
Multiscale Gradient-Based Optimization
Example Methods
• General methods
  – Graduated non-convexity [Blake & Zisserman, 1987]
  – Mean field annealing [Bilbro, Snyder, et al., 1992]
• In image analysis
  – Vary the degree of globality of the geometric representation
Optimization under Constraints by Lagrange Multiplier(s)
• To optimize F(p) over p subject to gi(p) = 0, i = 1, 2, …, N, with p having n parameters
  – Create the function F(p) + Σi λi gi(p)
  – Find a critical point for it over p and λ
• Solve D1p,λ[F(p) + Σi λi gi(p)] = 0
  – n+N equations in n+N unknowns
  – N of the equations are just gi(p) = 0, i = 1, 2, …, N
• The critical point will need to be an optimum w.r.t. p
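When F is quadratic and the constraints are linear, the n+N stationarity equations form a linear system. A sketch with an example problem chosen for illustration (minimize ½|p|² − 1·p subject to p1 + p2 + p3 = 1):

```python
import numpy as np

# For F(p) = 1/2 p^T Q p - c^T p with linear constraints A p - b = 0, the
# condition D1_{p,lam}[F(p) + sum_i lam_i g_i(p)] = 0 is linear in the
# n + N unknowns (p, lam):
#   Q p + A^T lam = c      (stationarity in p)
#   A p           = b      (stationarity in lam: the constraints themselves)
n, N = 3, 1
Q = np.eye(n)                     # F(p) = 1/2 |p|^2 - c.p
c = np.ones(n)
A = np.ones((N, n))               # constraint: p1 + p2 + p3 = 1
b = np.array([1.0])

K = np.block([[Q, A.T], [A, np.zeros((N, N))]])   # (n+N) x (n+N) system
rhs = np.concatenate([c, b])
sol = np.linalg.solve(K, rhs)
p, lam = sol[:n], sol[n:]
print(p, lam)   # p = [1/3, 1/3, 1/3], lam = [2/3]
```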
Stochastic Methods
• Needed when the objective function is bumpy, has many variables, or has a gradient that is hard to compute
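A minimal simulated annealing sketch (the proposal scale and cooling schedule are illustrative choices, not prescribed by the slides): only evaluations of the bumpy objective are needed, no gradients. Worse moves are accepted with probability exp(−ΔF/T), and the temperature T is lowered geometrically.

```python
import math
import random

# Simulated annealing on the bumpy objective F(x) = x^2 + 5 cos(10 x).
def F(x):
    return x * x + 5.0 * math.cos(10.0 * x)

random.seed(0)
x = 4.0                       # start in a far-off local basin
best = x
T = 10.0
while T > 1e-3:
    for _ in range(200):
        cand = x + random.gauss(0.0, 0.5)      # random proposal
        dF = F(cand) - F(x)
        if dF < 0 or random.random() < math.exp(-dF / T):
            x = cand                           # accept (always if downhill)
            if F(x) < F(best):
                best = x
    T *= 0.9                                   # geometric cooling
print(best)   # a deep minimum of F, escaping the starting basin
```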