
Newton Method - Carnegie Mellon School of Computer Science · Equivalent to Newton step on f(x). A...

Date post: 21-Jul-2020
Newton Method Materials courtesy: B. Poczos, R. Tibshirani, C. Carmanis & S. Sanghavi

Outline  


Newton Method for Finding a Root

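Linear approximation (1st-order Taylor approximation): $\phi(x + \Delta x) \approx \phi(x) + \phi'(x)\,\Delta x$

Goal: find $\Delta x$ such that $\phi(x + \Delta x) = 0$

Therefore, $\Delta x = -\phi(x) / \phi'(x)$, which gives the iteration $x_{k+1} = x_k - \phi(x_k) / \phi'(x_k)$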
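
A minimal sketch of the scalar root-finding iteration, assuming the function is called `phi` and its derivative `dphi` (both names, the tolerance, and the test function $x^2 - 2$ are illustrative choices, not taken from the slides):

```python
def newton_root(phi, dphi, x0, tol=1e-12, max_iter=50):
    """Find a root of phi by repeatedly solving the linearized equation."""
    x = x0
    for _ in range(max_iter):
        value = phi(x)
        if abs(value) < tol:
            break
        slope = dphi(x)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        # x_{k+1} = x_k - phi(x_k) / phi'(x_k)
        x = x - value / slope
    return x


# Hypothetical test function: phi(x) = x^2 - 2, whose positive root is sqrt(2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

The guard on a zero derivative is there because the linearization has no root when the tangent is horizontal.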


Illustration of Newton’s method

Goal: finding a root

In the next step we will linearize here in x


Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method


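Newton Method for Finding a Root

This can be generalized to multivariate functions $F: \mathbb{R}^n \to \mathbb{R}^m$, where $\nabla F(x)$ denotes the Jacobian: $0 = F(x + \Delta x) \approx F(x) + \nabla F(x)\,\Delta x$

Therefore, $\Delta x = -[\nabla F(x)]^{-1} F(x)$, so $x_{k+1} = x_k - [\nabla F(x_k)]^{-1} F(x_k)$

[Pseudo inverse if there is no inverse]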
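
A sketch of the multivariate iteration, using `numpy.linalg.pinv` for the pseudo-inverse. The function names `F` and `J`, the stopping tolerance, and the example system (unit circle intersected with a line) are assumptions made for illustration:

```python
import numpy as np


def newton_root_nd(F, J, x0, tol=1e-10, max_iter=50):
    """Newton iteration for F(x) = 0, where J(x) returns the Jacobian of F."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) < tol:
            break
        # The pseudo-inverse also covers non-square or singular Jacobians.
        x = x - np.linalg.pinv(J(x)) @ Fx
    return x


# Hypothetical system: the unit circle intersected with the line x[0] = x[1].
F = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
sol = newton_root_nd(F, J, x0=[1.0, 0.5])
```

When the Jacobian is square and invertible, `np.linalg.solve(J(x), Fx)` is the cheaper and more accurate choice; the pseudo-inverse is used here only to mirror the bracketed remark.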


Newton method for minimization


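Problem: $\min_x f(x)$, unconstrained, with $f$ twice differentiable.

Newton iteration: $x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$, i.e. the root-finding iteration applied to $\nabla f(x) = 0$.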
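
A minimal sketch of the pure Newton iteration for minimization, assuming the caller supplies the gradient and Hessian as functions. The names `grad` and `hess` and the objective $f(x) = e^{x_0} - x_0 + x_1^2$ are hypothetical choices for illustration:

```python
import numpy as np


def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: x <- x - hess(x)^{-1} grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve hess(x) @ step = g instead of forming the inverse explicitly.
        x = x - np.linalg.solve(hess(x), g)
    return x


# Hypothetical objective f(x) = exp(x[0]) - x[0] + x[1]^2, minimized at (0, 0).
grad = lambda x: np.array([np.exp(x[0]) - 1.0, 2.0 * x[1]])
hess = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])
x_min = newton_minimize(grad, hess, x0=[1.0, 3.0])
```

The quadratic coordinate $x_1$ is solved exactly in one step, while the exponential coordinate needs a handful of iterations.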


Motivation with Quadratic Approximation

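For the same unconstrained problem with twice differentiable $f$, take the second-order Taylor approximation around $x$:
$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x)\, \Delta x$

When $\nabla^2 f(x)$ is positive definite, minimizing the right-hand side over $\Delta x$ gives $\Delta x = -[\nabla^2 f(x)]^{-1} \nabla f(x)$, the Newton step.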



Comparison with Gradient Descent


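Gradient descent uses a different quadratic approximation, in which the Hessian is replaced by $\frac{1}{t} I$ for step size $t$:
$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2t} \|\Delta x\|_2^2$

The Newton method is obtained by minimizing over the quadratic approximation that keeps the true Hessian:
$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \frac{1}{2} \Delta x^T \nabla^2 f(x)\, \Delta x$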
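
To make the contrast concrete, here is a small sketch on an ill-conditioned quadratic. The matrix `Q`, the starting point, the step size `t`, and the iteration count are all assumptions chosen for illustration:

```python
import numpy as np

Q = np.diag([1.0, 100.0])        # hypothetical ill-conditioned Hessian
grad = lambda x: Q @ x           # gradient of f(x) = 0.5 * x^T Q x
x0 = np.array([1.0, 1.0])

# Gradient descent: each step minimizes the model with Hessian I / t.
t = 1.0 / 100.0                  # step size 1 / L, with L the largest eigenvalue of Q
x_gd = x0.copy()
for _ in range(100):
    x_gd = x_gd - t * grad(x_gd)

# Newton: minimizes the model with the true Hessian, which is exact for a quadratic.
x_newton = x0 - np.linalg.solve(Q, grad(x0))

gd_error = np.linalg.norm(x_gd)          # distance to the minimizer at the origin
newton_error = np.linalg.norm(x_newton)
```

After 100 gradient steps the slow coordinate has only shrunk by a factor of $0.99^{100}$, while the single Newton step lands on the minimizer.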


Pre-Conditioning for Gradient descent

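Recall that the convergence rate of gradient descent depends on the condition number of the Hessian $\nabla^2 f(x)$.

Can we convert it into a well-conditioned problem by changing coordinates? Substitute $x = Ay$ and define $g(y) = f(Ay)$, so that $\nabla^2 g(y) = A^T \nabla^2 f(Ay)\, A$.

Can get $\nabla^2 g(y) = I$ if $A = [\nabla^2 f(x)]^{-1/2}$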


Running gradient descent on $g(y) = f(Ay)$ gives the best descent direction and convergence rate.

This is equivalent to a Newton step on $f(x)$ when $A = [\nabla^2 f(x)]^{-1/2}$.
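
A numerical check of this equivalence, as a sketch under stated assumptions: `Q`, `b`, and the starting point are hypothetical, $f$ is taken to be the quadratic $\frac{1}{2} x^T Q x - b^T x$, and the gradient step on $g$ uses unit step size:

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])    # hypothetical positive definite Hessian
b = np.array([1.0, -1.0])
grad_f = lambda x: Q @ x - b              # gradient of f(x) = 0.5 x^T Q x - b^T x

# A = Q^{-1/2}, built from the eigendecomposition of the symmetric matrix Q.
w, V = np.linalg.eigh(Q)
A = V @ np.diag(w ** -0.5) @ V.T

x = np.array([2.0, 2.0])
y = np.linalg.solve(A, x)                 # change of coordinates x = A y

# Unit-step gradient descent on g(y) = f(A y); chain rule: grad g(y) = A^T grad f(A y).
y_next = y - A.T @ grad_f(A @ y)
x_from_gd = A @ y_next

# Newton step on f taken directly in the x coordinates.
x_from_newton = x - np.linalg.solve(Q, grad_f(x))
```

Mapping the gradient step back gives $x - A A^T \nabla f(x)$, and $A A^T = [\nabla^2 f(x)]^{-1}$, which is why the two results agree.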


Affine Invariance

[Proof: HW3]


Affine Invariant stopping criterion

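Stopping criterion for gradient descent: $\|\nabla f(x)\|_2 \le \epsilon$. Not affine-invariant.

Stopping criterion for Newton method: stop when $\lambda(x)$ is small, e.g. $\lambda(x)^2 / 2 \le \epsilon$, where $\lambda(x) = \left(\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)\right)^{1/2}$ is the Newton decrement.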
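
The difference between the two criteria can be checked numerically. In this sketch the gradient, Hessian, and change-of-coordinates matrix `A` are made-up values, and `newton_decrement` is a hypothetical helper name:

```python
import numpy as np


def newton_decrement(grad, hess):
    """lambda = sqrt(grad^T hess^{-1} grad)."""
    return float(np.sqrt(grad @ np.linalg.solve(hess, grad)))


# Hypothetical gradient and Hessian of f at some point x.
grad_x = np.array([1.0, -2.0])
hess_x = np.array([[4.0, 1.0], [1.0, 3.0]])

# Under x = A y, the function g(y) = f(A y) has gradient A^T grad and Hessian A^T hess A.
A = np.array([[2.0, 1.0], [0.0, 5.0]])    # any invertible matrix
grad_y = A.T @ grad_x
hess_y = A.T @ hess_x @ A

lam_x = newton_decrement(grad_x, hess_x)
lam_y = newton_decrement(grad_y, hess_y)
grad_norm_x = np.linalg.norm(grad_x)
grad_norm_y = np.linalg.norm(grad_y)
```

The Newton decrement is the same in both coordinate systems, while the gradient norm changes with `A`, so a threshold on the gradient norm means different things after rescaling.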


Damped Newton’s Method


Backtracking line search
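
A sketch of damped Newton with backtracking, following the common textbook form rather than anything confirmed by these slides. The parameters `alpha` and `beta`, the stopping rule on the Newton decrement, and the test function $f(x) = \sqrt{1 + x^2}$ are all assumptions:

```python
import numpy as np


def damped_newton(f, grad, hess, x0, alpha=0.25, beta=0.5, tol=1e-10, max_iter=100):
    """Newton direction combined with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        d = -np.linalg.solve(hess(x), g)       # Newton direction
        lam_sq = -(g @ d)                      # squared Newton decrement
        if lam_sq / 2.0 <= tol:
            break
        t = 1.0
        # Shrink t until the sufficient-decrease condition holds.
        while f(x + t * d) > f(x) + alpha * t * (g @ d):
            t *= beta
        x = x + t * d
    return x


# Hypothetical objective f(x) = sqrt(1 + x^2), minimized at x = 0.
f = lambda x: np.sqrt(1.0 + x[0] ** 2)
grad = lambda x: np.array([x[0] / np.sqrt(1.0 + x[0] ** 2)])
hess = lambda x: np.array([[(1.0 + x[0] ** 2) ** -1.5]])
x_star = damped_newton(f, grad, hess, x0=[2.0])
```

For this objective the undamped update works out to $x_{k+1} = -x_k^3$, which diverges from $x_0 = 2$; the line search shortens the first step and the iteration then converges.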


Convergence Rate


Local convergence for finding root

Quadratic convergence
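
Quadratic convergence can be observed directly by tracking the error. This sketch assumes the scalar equation $x^2 - 2 = 0$ and the starting point $x_0 = 1$, both chosen only for illustration:

```python
import math

# Newton iteration for the hypothetical scalar equation x^2 - 2 = 0.
x = 1.0
errors = []
for _ in range(5):
    errors.append(abs(x - math.sqrt(2.0)))
    x = x - (x * x - 2.0) / (2.0 * x)

# Under quadratic convergence, e_{k+1} / e_k^2 approaches a constant.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(3)]
```

For this equation the ratio equals $1 / (2 x_k)$, so it tends to $1 / (2\sqrt{2})$; the number of correct digits roughly doubles at each step.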


Convergence analysis


The analysis can be improved, e.g. for self-concordant functions.


Comparison to first-order methods


Example: Logistic regression
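
A sketch of Newton's method applied to logistic regression, under assumptions not taken from the slides: labels in {0, 1}, the average logistic loss, synthetic data, a fixed iteration count, and a tiny ridge term added to the Hessian so that the linear solve stays well-posed. The function name `logistic_newton` is hypothetical:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_newton(X, y, n_iter=20, ridge=1e-8):
    """Newton iterations for the average logistic loss with labels y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)                         # predicted probabilities
        grad = X.T @ (p - y) / n
        # Hessian X^T diag(p * (1 - p)) X / n, plus the assumed ridge term.
        H = (X.T * (p * (1.0 - p))) @ X / n + ridge * np.eye(d)
        w = w - np.linalg.solve(H, grad)
    return w


# Synthetic data generated from a hypothetical weight vector.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
w_true = np.array([0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

w_hat = logistic_newton(X, y)
grad_norm = np.linalg.norm(X.T @ (sigmoid(X @ w_hat) - y) / 200)
```

Each iteration costs a linear solve with a $d \times d$ matrix, which is the price paid for the fast local convergence; on linearly separable data the undamped iteration would not converge, and a line search would be needed.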


