Optimization for Deep Learning
Industrial AI Lab. Prof. Seungchul Lee
Optimization
• 3 key components
  1) Objective function
  2) Decision variable or unknown
  3) Constraints
• Procedures
  1) The process of identifying the objective, variables, and constraints for a given problem (known as "modeling")
  2) Once the model has been formulated, an optimization algorithm can be used to find its solution
Optimization: Mathematical Model
• In mathematical expression:

$$\begin{aligned} \min_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \cdots, m \end{aligned}$$

  – $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the decision variable
  – $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function
  – Feasible region: $C = \{x : g_i(x) \le 0, \; i = 1, \cdots, m\}$
  – $x^* \in \mathbb{R}^n$ is an optimal solution if $x^* \in C$ and $f(x^*) \le f(x), \; \forall x \in C$
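To make the three components concrete, here is a minimal sketch (an added illustration; the objective, constraint, and starting point are made-up examples, not from the slides) using scipy.optimize.minimize:

```python
# Sketch of the three components with SciPy (hypothetical example problem).
import numpy as np
from scipy.optimize import minimize

# 1) Objective function: f(x) = (x1 - 1)^2 + (x2 - 2)^2
f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2

# 3) Constraint g(x) = x1 + x2 - 2 <= 0, written in SciPy's fun(x) >= 0 convention
cons = ({'type': 'ineq', 'fun': lambda x: 2.0 - x[0] - x[1]},)

# 2) Decision variable: x in R^2, starting from an arbitrary point
x0 = np.zeros(2)

res = minimize(f, x0, constraints=cons)
print(res.x)    # optimal solution x*
print(res.fun)  # optimal value f(x*)
```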
Optimization: Mathematical Model
โข In mathematical expression
• Remarks: equivalent formulations, e.g., maximizing $f(x)$ is equivalent to minimizing $-f(x)$, and a constraint $g_i(x) \ge 0$ can be rewritten as $-g_i(x) \le 0$
Solving Optimization Problems
• Starting with the unconstrained, one-dimensional case
  – To find a minimum point $x^*$, we can look at the derivative of the function, $f'(x)$: any location where $f'(x) = 0$ will be a "flat" point of the function
• For convex problems, this is guaranteed to be a minimum, as in the worked example below
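A short worked instance of this condition (an added example, not from the original slides):

```latex
% Worked 1D example: f(x) = (x - 2)^2.
\[
  f(x) = (x - 2)^2, \qquad f'(x) = 2(x - 2) = 0 \;\Longrightarrow\; x^* = 2 .
\]
% Since f''(x) = 2 > 0 everywhere, f is convex, so x^* = 2 is the global minimum.
```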
Solving Optimization Problems
• Generalization for a multivariate function $f : \mathbb{R}^n \to \mathbb{R}$: the gradient of $f$ must be zero
• For $f$ defined as above, the gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension:

$$\nabla_x f(x) = \left[ \frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n} \right]^{\top}$$

• For continuously differentiable $f$ and unconstrained optimization, the optimal point must satisfy

$$\nabla_x f(x^*) = 0$$
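As a quick numerical sanity check of this first-order condition, the sketch below (an added illustration; the quadratic function and its minimizer are assumptions) evaluates an analytic gradient at the known optimum:

```python
# Checking grad f(x*) = 0 for f(x) = ||x - c||^2, whose gradient is 2 (x - c).
# (Illustrative example, not from the slides.)
import numpy as np

c = np.array([1.0, -2.0, 3.0])        # arbitrary target point
f = lambda x: np.sum((x - c)**2)
grad_f = lambda x: 2.0 * (x - c)      # analytic gradient

x_star = c                            # the minimizer of f
print(grad_f(x_star))                 # -> [0. 0. 0.]: the gradient vanishes at x*
```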
How do we Find $\nabla_x f(x) = 0$
• Direct solution
  – In some cases, it is possible to analytically compute $x^*$ such that $\nabla_x f(x^*) = 0$ (see the least-squares sketch below)
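Least squares is one standard case with such an analytic solution; the following sketch (an added illustration with made-up random data) solves the gradient condition via the normal equations:

```python
# Direct solution of grad f(x) = 0 for least squares, f(x) = ||A x - b||^2,
# whose gradient 2 A^T (A x - b) = 0 gives the normal equations A^T A x* = A^T b.
# (Illustrative example, not from the slides.)
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

# Solve the normal equations for the stationary (and, by convexity, optimal) point
x_star = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(2 * A.T @ (A @ x_star - b), 0))  # gradient is (numerically) zero
```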
How do we Find $\nabla_x f(x) = 0$
• Iterative methods
  – More commonly, the condition that the gradient equal zero has no analytical solution, requiring iterative methods
  – The gradient points in the direction of "steepest ascent" for the function $f$ (a short justification follows below)
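One added step of reasoning behind "steepest ascent" (not spelled out on the slide): among all unit directions $d$, the directional derivative is largest along the gradient.

```latex
% Why the gradient is the direction of steepest ascent:
% the directional derivative of f at x along a unit vector d is
\[
  D_d f(x) = \nabla_x f(x)^{\top} d = \|\nabla_x f(x)\| \cos\theta ,
\]
% which is maximized when d points along the gradient (theta = 0)
% and minimized when d points along the negative gradient (theta = pi).
```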
Descent Direction (1D)
• This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient (see the Taylor sketch below)
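A one-line Taylor argument (added here) for why a small step against the derivative decreases $f$ in the 1D case:

```latex
% First-order Taylor expansion of a small step against the derivative:
\[
  f\bigl(x - \alpha f'(x)\bigr) \approx f(x) - \alpha \, f'(x)^2 \le f(x)
  \quad \text{for small step size } \alpha > 0 ,
\]
% so moving against the sign of f'(x) (the 1D descent direction) reduces f.
```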
Gradient Descent
• Update rule: $x \leftarrow x - \alpha \, \nabla_x f(x)$, where $\alpha > 0$ is the step size (learning rate)
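A minimal NumPy sketch of this update rule (an added illustration; the quadratic objective, step size, and iteration count are arbitrary choices, not from the slides):

```python
# Minimal gradient descent: x <- x - alpha * grad f(x).
# Example objective f(x) = ||x - c||^2 with analytic gradient 2 (x - c).
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, n_iters=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - alpha * grad_f(x)
    return x

c = np.array([1.0, -2.0])
grad_f = lambda x: 2.0 * (x - c)

x_star = gradient_descent(grad_f, x0=np.zeros(2))
print(x_star)  # converges toward c = [1, -2]
```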
Practically Solving Optimization Problems
• The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
  – A wide range of tools can take optimization problems in "natural" forms and compute a solution
• Gradient descent
  – Easy to implement
  – Very general: can be applied to any differentiable loss function
  – Requires less memory and computation (for stochastic methods)
  – Neural networks/deep learning → TensorFlow (a minimal sketch follows below)
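As a closing illustration (added, not from the slides), here is a minimal TensorFlow version of the same gradient-descent update via tf.GradientTape and the SGD optimizer; the toy objective and hyperparameters are assumptions:

```python
# Minimal TensorFlow gradient descent on a toy objective f(x) = (x - 3)^2.
# (Illustrative sketch; the objective and hyperparameters are made up.)
import tensorflow as tf

x = tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = (x - 3.0) ** 2          # differentiable loss in terms of x
    grads = tape.gradient(loss, [x])   # compute grad f(x)
    optimizer.apply_gradients(zip(grads, [x]))  # x <- x - alpha * grad f(x)

print(x.numpy())  # close to 3.0, the minimizer
```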