
A hybrid quasi-Newton projected-gradient method with application to Lasso and basis-pursuit denoise

Ewout van den Berg

Human Language Technologies Group

IBM T.J. Watson Research Center

Work done at the Department of Statistics

Stanford University

October 10, 2014

This work was partially supported by National Science Foundation Grant DMS 0906812 (American Reinvestment and Recovery Act).


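Background

Basis pursuit denoise

$$\operatorname*{minimize}_x \ \|x\|_1 \quad \text{subject to} \quad \|Ax - b\|_2 \le \sigma$$

spgl1 reduces this to a series of Lasso problems [B, Friedlander, 2008]

$$\operatorname*{minimize}_x \ \tfrac{1}{2}\|Ax - b\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le \tau$$

Root finding with

$$\tau^+ = \tau + \frac{\|r\|_2^2 - \sigma\|r\|_2}{\|A^T r\|_\infty}$$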
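The root-finding update can be written as a one-line function. This is a minimal sketch, assuming that r is the residual b − Ax at the (approximate) solution x of the current Lasso subproblem; the function name `newton_tau_update` is hypothetical and not taken from spgl1.

```python
import numpy as np

def newton_tau_update(A, b, x, tau, sigma):
    """One root-finding update of tau, following the formula on the slide.

    Assumption: x (approximately) solves the Lasso problem for the current tau,
    and r = b - A x is its residual.
    """
    r = b - A @ x
    r_norm = np.linalg.norm(r)                # ||r||_2
    dual_norm = np.linalg.norm(A.T @ r, np.inf)  # ||A^T r||_inf
    return tau + (r_norm**2 - sigma * r_norm) / dual_norm
```

For example, with A the 2×2 identity, b = (3, 4), x = 0, τ = 0 and σ = 1, the residual norm is 5 and ‖Aᵀr‖∞ = 4, so the update gives τ⁺ = (25 − 5)/4 = 5.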


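Background

$$\operatorname*{minimize}_x \ \tfrac{1}{2}\|Ax - b\|_2^2 \quad \text{subject to} \quad \|x\|_1 \le \tau$$

General form

$$\operatorname*{minimize}_x \ f(x) \quad \text{subject to} \quad x \in \mathcal{C}$$

Solved using spectral projected-gradient (spg) method:

$$d = -\beta\,\nabla f(x), \qquad x^+ = P(x + \alpha d)$$

or

$$d = P(x - \beta\,\nabla f(x)) - x, \qquad x^+ = x + \alpha d$$

With

$\beta$: Barzilai-Borwein scaling parameter [Barzilai, Borwein, 1988]

$\alpha$: Step length from non-monotone line search [Birgin et al., 2000]

$P$: Orthogonal projection onto $\mathcal{C}$

$$P(x) := \operatorname*{argmin}_v \ \|x - v\|_2 \quad \text{subject to} \quad v \in \mathcal{C}$$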
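The second variant of the spg step (project first, then search along d) can be sketched as follows. This is an illustration under stated assumptions, not the spgl1 implementation: the non-monotone line search of Birgin et al. is replaced by a plain monotone Armijo backtracking search, the Barzilai-Borwein scaling uses the sᵀs / sᵀy formula, and the names `spg`, `project`, `beta0`, and `c1` are hypothetical.

```python
import numpy as np

def spg(f, grad, project, x0, iters=100, beta0=1.0, c1=1e-4):
    """Projected-gradient iteration with Barzilai-Borwein scaling (sketch).

    Simplification: monotone Armijo backtracking stands in for the
    non-monotone line search used by the actual spg method.
    """
    x = project(np.asarray(x0, dtype=float))
    g = grad(x)
    beta = beta0
    for _ in range(iters):
        d = project(x - beta * g) - x            # d = P(x - beta * grad f(x)) - x
        if np.linalg.norm(d) < 1e-12:            # projected-gradient stationarity
            break
        alpha, fx, slope = 1.0, f(x), g @ d      # slope < 0 for convex C
        while f(x + alpha * d) > fx + c1 * alpha * slope:
            alpha *= 0.5                         # backtrack; x + alpha*d stays in C
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        beta = (s @ s) / sy if sy > 1e-12 else beta0   # Barzilai-Borwein scaling
        x, g = x_new, g_new
    return x
```

Because C is convex and α ∈ (0, 1], every trial point x + αd is feasible, so only one projection per iteration is needed in this variant.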


Motivation

Observation

• (Sometimes) difficult to get a highly accurate solution
• Iterates remain on the same face of $\mathcal{C}$ (same sign pattern)
• Very little progress

Typical solution

• Detect stagnation on a fixed face
• Solve problem constrained to the given face
• Check optimality for global problem
• Resume if not optimal

Difficulties

• When to initiate this procedure?
• Solving subproblem on incorrect face is wasteful
• Waiting too long defeats the purpose


Outline

1. Propose a new hybrid method for polyhedral $\mathcal{C}$ (practical only for simple $\mathcal{C}$: $\ell_1$, bound constrained, simplex)

2. Convergence of the method

3. Application to Lasso and basis pursuit


Hybrid method

Basic idea

• Take regular spg steps by default
• After each iteration, check whether $\mathcal{F}(x^+) = \mathcal{F}(x)$ ($\ne \mathcal{C}$)
• Initialize or update l-bfgs model
• Use quasi-Newton search direction in next iteration

Some issues

• Quasi-Newton direction cannot simply be projected onto $\mathcal{C}$
• Naive implementation ignores problem structure

Solution

• Form an l-bfgs model restricted to the current face
• Capture only relevant information


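Reduced L-BFGS model

Local function

• We only want to model $f(x)$ over the current $d$-face $\mathcal{F}$
• Find an orthonormal basis $B \in \mathbb{R}^{n \times d}$ for $\operatorname{lin}(\mathcal{F} - \mathcal{F})$
• Define the local function $\hat f(c) : \mathbb{R}^d \to \mathbb{R}$ for some fixed $x_0 \in \mathcal{F}$

$$\hat f(c) = f(x_0 + Bc)$$

• Choosing $c = B^T(x - x_0)$ gives $\hat f(c) = f(x)$ for $x \in \mathcal{F}$

Model updates

• Standard l-bfgs uses $s = x^+ - x$ and $y = \nabla f(x^+) - \nabla f(x)$
• We use $s = c^+ - c$ and $y = \nabla \hat f(c^+) - \nabla \hat f(c)$:

$$s = B^T(x^+ - x), \qquad y = B^T(\nabla f(x^+) - \nabla f(x))$$

• Never need to choose $x_0$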
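Why $x_0$ drops out can be checked in a few lines. The derivation below is a sketch that uses only the definitions on this slide, writing $\hat f(c) = f(x_0 + Bc)$ for the local function and assuming $x, x^+ \in \mathcal{F}$, so that $x - x_0$ lies in the range of $B$ and $x = x_0 + Bc$.

```latex
\begin{aligned}
s &= c^+ - c = B^T(x^+ - x_0) - B^T(x - x_0) = B^T(x^+ - x),\\
\nabla \hat f(c) &= B^T \nabla f(x_0 + Bc) = B^T \nabla f(x)
   \quad \text{(chain rule, with } x = x_0 + Bc\text{)},\\
y &= \nabla \hat f(c^+) - \nabla \hat f(c) = B^T\bigl(\nabla f(x^+) - \nabla f(x)\bigr).
\end{aligned}
```

Both $s$ and $y$ therefore depend only on the iterates and their gradients, which is why the reference point never has to be chosen explicitly.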


Quasi-Newton search direction

Computing the search direction

• Want to compute search direction at current $x$
• Denote by $H^{-1}$ the inverse approximate Hessian ($\mathbb{R}^{d \times d}$)
• In the reduced space, with local function $\hat f(c) = f(x_0 + Bc)$, we compute the search direction

$$\hat d = -H^{-1}\nabla \hat f(c) = -H^{-1}B^T\nabla f(x)$$

• Project back to ambient space using $B\hat d$:

$$d = -BH^{-1}B^T\nabla f(x)$$

Properties

• Search direction along the face: $(x + \alpha d) \in \mathcal{F}$ for $0 \le \alpha \le \alpha_{\max}$
• Guaranteed descent direction
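The direction −B H⁻¹ Bᵀ ∇f(x) can be computed without ever forming H⁻¹, using the standard l-bfgs two-loop recursion on the reduced pairs (s, y). The following is a sketch under assumptions: `S` and `Y` are lists of reduced-space pairs stored oldest first, every pair satisfies yᵀs > 0 (which holds for strongly convex f), and the function names are hypothetical.

```python
import numpy as np

def two_loop(q, S, Y):
    """Apply the l-bfgs inverse-Hessian approximation to q (two-loop recursion).

    S, Y: lists of reduced-space vectors s and y, oldest pair first.
    Assumes y @ s > 0 for every pair.
    """
    q = np.array(q, dtype=float)
    alphas = []
    for s, y in zip(reversed(S), reversed(Y)):       # newest to oldest
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if S:                                            # initial scaling gamma * I
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
    for (s, y), a in zip(zip(S, Y), reversed(alphas)):   # oldest to newest
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q

def reduced_qn_direction(B, g, S, Y):
    """Search direction d = -B H^{-1} B^T g in the ambient space."""
    return -B @ two_loop(B.T @ g, S, Y)
```

With an empty history the result is −B Bᵀ ∇f(x), the negative gradient projected onto the subspace spanned by B. Since the two-loop recursion represents a positive definite matrix whenever yᵀs > 0, the direction satisfies ∇f(x)ᵀd < 0 unless Bᵀ∇f(x) = 0.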


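Self-projection cone

Remaining issues

• Quasi-Newton step must be restricted to the face ($\alpha \le \alpha_{\max}$)
• Fall back to spg step if line search fails (reset Hessian, history)
• Lacks a mechanism to avoid a local minimum on $\operatorname{relint}(\mathcal{F})$

Self-projection cone

• Update and use l-bfgs model only if $-\nabla f(x^+) \in \mathcal{S}(\mathcal{F}(x))$
• Where $\mathcal{S}(\mathcal{F}(x))$ is the self-projection cone of $\mathcal{F}(x)$:

$$\mathcal{S}(\mathcal{F}(x)) := \{d \in \mathbb{R}^n \mid \exists \alpha > 0 : \mathcal{F}[P(x + \alpha d)] = \mathcal{F}(x)\} = \mathcal{N}(x) + \operatorname{lin}(\mathcal{F}(x) - \mathcal{F}(x))$$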


Convergence

Theorem

Let $f(x)$ be a twice continuously differentiable convex function that is bounded below and for which there exist constants $0 < \mu_1 \le \mu_2 < \infty$ such that for all $x, v \in \mathbb{R}^n$

$$\mu_1\|v\|_2^2 \le v^T\nabla^2 f(x)\,v \le \mu_2\|v\|_2^2.$$

Then for any starting point $x_0 \in \mathcal{C}$, the sequence $\{x_k\}$ generated by the hybrid algorithm converges to the minimizer of $f(x)$ over $\mathcal{C}$.

Proof sketch:

• Finitely many quasi-Newton steps: done or spg converges
• Infinitely many quasi-Newton steps:
  • Successful quasi-Newton (l-bfgs) step (Liu and Nocedal):

$$f(x^+) - f(x^*) \le (1 - c)\,(f(x) - f(x^*))$$

  • Finite number of quasi-Newton steps on incorrect faces


Application to general problems

Challenges for general problems

• Projection in SPG is difficult for general $\mathcal{C}$
• Facial structure is often unknown
• Finding orthonormal basis for face may be expensive
• Even true for weighted $\ell_1$ ball

Well suited for simple problems

• Cross polytope ($\ell_1$-norm)
• Box constrained problems
• Simplex


Application to Lasso

Additional conditions

• Typically $A \in \mathbb{R}^{m \times n}$ with $m < n$
• Hessian not full rank for $d$-faces with $d > m$
• Use quasi-Newton steps only when $d \le m$

Orthogonal projection

• Reduces to soft-thresholding, $O(n \log n)$ complexity
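The projection onto the ℓ1 ball of radius τ can be sketched with the well-known sort-based scheme: sort the magnitudes, find the soft-threshold level λ from cumulative sums, and shrink. This is one common way to reach the O(n log n) bound and is not necessarily the routine used in spgl1; the name `project_l1_ball` is hypothetical.

```python
import numpy as np

def project_l1_ball(x, tau):
    """Euclidean projection of x onto {v : ||v||_1 <= tau} via soft-thresholding.

    The threshold lam is found from the sorted magnitudes, so the cost is
    dominated by the O(n log n) sort.
    """
    x = np.asarray(x, dtype=float)
    if tau <= 0:
        return np.zeros_like(x)
    a = np.abs(x)
    if a.sum() <= tau:                       # already feasible
        return x.copy()
    u = np.sort(a)[::-1]                     # magnitudes, descending
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - tau)[0][-1]   # last index that stays nonzero
    lam = (css[rho] - tau) / (rho + 1)       # soft-threshold level
    return np.sign(x) * np.maximum(a - lam, 0.0)
```

For x = (3, −1, 0.5) and τ = 2 the threshold is λ = 1 and the projection is (2, 0, 0); for x = (1, 1) and τ = 1 it is (0.5, 0.5).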

Orthonormal basis

• Normalize signs and permute indices: $\mathcal{F} = \operatorname{conv}\{e_1, \ldots, e_{d+1}\}$
• Compute QR factorization of $[e_2 - e_1, \ldots, e_{d+1} - e_1]$:

$$Q_{i,j} = \begin{cases} -\sqrt{1/(j^2 + j)} & i \le j \\ \sqrt{j/(j+1)} & i = j + 1 \\ 0 & \text{otherwise.} \end{cases}$$

• Implicit $B$ and $B^T$, can apply in $O(n)$ time
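The closed form for Q can be checked numerically, and its structure shows why B and Bᵀ never need to be stored: every column is constant above the subdiagonal, so products reduce to running sums. The sketch below builds Q explicitly for verification and applies it implicitly with prefix and suffix sums; the function names are hypothetical, and the sign normalization, permutation, and zero padding to length n described on the slide are assumed to be handled elsewhere.

```python
import numpy as np

def simplex_face_basis(d):
    """Explicit (d+1) x d matrix Q from the closed form (1-based i, j)."""
    Q = np.zeros((d + 1, d))
    for j in range(1, d + 1):
        Q[:j, j - 1] = -np.sqrt(1.0 / (j * j + j))   # rows i <= j
        Q[j, j - 1] = np.sqrt(j / (j + 1.0))         # row i = j + 1
    return Q

def apply_Bt(v):
    """Compute Q.T @ v in linear time using prefix sums, without forming Q."""
    v = np.asarray(v, dtype=float)
    d = len(v) - 1
    j = np.arange(1, d + 1)
    prefix = np.cumsum(v)[:d]                        # sum_{i <= j} v_i
    return -prefix / np.sqrt(j * j + j) + np.sqrt(j / (j + 1.0)) * v[1:]

def apply_B(c):
    """Compute Q @ c in linear time using suffix sums, without forming Q."""
    c = np.asarray(c, dtype=float)
    d = len(c)
    j = np.arange(1, d + 1)
    w = c / np.sqrt(j * j + j)
    suffix = np.cumsum(w[::-1])[::-1]                # sum_{j >= i} w_j
    out = np.zeros(d + 1)
    out[:d] -= suffix                                # entries with i <= j
    out[1:] += np.sqrt(j / (j + 1.0)) * c            # entries with i = j + 1
    return out
```

Each column of Q sums to zero, so Q spans the directions that keep the sum of the (sign-normalized) entries fixed, which is exactly the linear space parallel to the face.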


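Application to Lasso

Self-projection cone

• Let $d = -\nabla f(x)$ and define

$$\begin{aligned}
I_1 &= \{i \in [n] \mid (x_i > 0 \text{ and } d_i < 0) \text{ or } (x_i < 0 \text{ and } d_i > 0)\},\\
I_2 &= \{i \in [n] \mid (x_i > 0 \text{ and } d_i \ge 0) \text{ or } (x_i < 0 \text{ and } d_i \le 0)\},\\
I_3 &= (I_1 \cup I_2)^c
\end{aligned}$$

• Set $s_j := \sum_{i \in I_j} |d_i|$ and assume that $x \notin \operatorname{relint}(\mathcal{C})$, then $d \in \mathcal{S}(\mathcal{F}(x))$ iff

$$s_1 = s_2 + s_3 \ \text{and} \ s_3 = 0, \quad \text{or} \quad s_1 < s_2 + s_3 \ \text{and} \ \max_{i \in I_3} |d_i| \le \frac{s_2 - s_1}{|I_1 \cup I_2|}$$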
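The membership test translates directly into a few array operations. The sketch below follows the condition as stated on the slide and assumes that x lies on the boundary of the ℓ1 ball (so its support is nonempty); the tolerance `tol` and the function name are additions for illustration.

```python
import numpy as np

def in_self_projection_cone(x, d, tol=1e-12):
    """Check d in S(F(x)) for the l1 ball, following the slide's condition.

    Assumes ||x||_1 = tau > 0, so the support I1 ∪ I2 is nonempty.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    I1 = ((x > 0) & (d < 0)) | ((x < 0) & (d > 0))     # d shrinks |x_i|
    I2 = ((x > 0) & (d >= 0)) | ((x < 0) & (d <= 0))   # d grows or keeps |x_i|
    I3 = ~(I1 | I2)                                    # x_i = 0
    s1, s2, s3 = (np.abs(d[m]).sum() for m in (I1, I2, I3))
    if abs(s1 - (s2 + s3)) <= tol:                     # step stays on the boundary
        return bool(s3 <= tol)
    if s1 < s2 + s3:                                   # step leaves the ball
        max3 = np.abs(d[I3]).max() if I3.any() else 0.0
        return bool(max3 <= (s2 - s1) / np.count_nonzero(I1 | I2) + tol)
    return False                                       # step moves into the interior
```

As a sanity check, take x = (1, 0) on the unit ℓ1 ball. For d = (1, 0.5) the projection of x + αd is again (1, 0), so d is in the cone; for d = (1, 2) the second coordinate survives the soft-thresholding and the face changes, so d is not.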

Line search

• Can compute maximum step length $\alpha_{\max}$ to stay on face
• Objective is quadratic, can find minimum along search direction
• Can compute interval $[\alpha^{w}_{\min}, \alpha^{w}_{\max}]$ satisfying Wolfe conditions
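For the Lasso objective f(x) = ½‖Ax − b‖², the first two points can be made concrete. Along a direction d the objective is ½‖r + αAd‖² with r = Ax − b, so the unconstrained minimizer is α = −rᵀAd / ‖Ad‖². The sketch below assumes that d lies along the current face and that the face is left when a nonzero entry of x reaches zero; the function names are hypothetical, and the Wolfe interval is not computed here.

```python
import numpy as np

def max_step_on_face(x, d):
    """Largest alpha for which no nonzero entry of x + alpha*d changes sign.

    Assumption: d is along the face, so only entries moving toward zero
    can cause the iterate to leave it.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    shrinking = (x != 0) & (x * d < 0)
    if not shrinking.any():
        return np.inf
    return float(np.min(-x[shrinking] / d[shrinking]))

def exact_step(A, b, x, d, alpha_max):
    """Minimizer of 0.5*||A(x + alpha*d) - b||^2 over 0 <= alpha <= alpha_max."""
    r = A @ x - b
    Ad = A @ d
    curvature = Ad @ Ad
    if curvature == 0.0:            # objective is constant along d
        return 0.0
    alpha = -(r @ Ad) / curvature   # stationary point of the 1-D quadratic
    return float(min(max(alpha, 0.0), alpha_max))
```

With A the 2×2 identity, x = (0.5, 0.5) and d = (1, −1), the maximum step is 0.5. For b = (0.6, 0.4) the exact step is 0.1, well inside the face; for b = (1, 0) it equals the maximum step 0.5 and the iterate lands on the vertex (1, 0).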


Numerical experiments

• 10 Sparco problems, each with three $\tau$ values [B et al., 2009]
• Random problems: $A$, $A + c$, $b = Ax$, $b$
• Heaviside matrix, random $b$


Heaviside matrix, random $b$

[Figure: relative duality gap (log scale) versus runtime in seconds.]

Random $300 \times 800$ $A$, random $b$

[Figure: relative duality gap (log scale) versus runtime in seconds.]

Random $300 \times 800$ $A$, random $b$

[Figure: cumulative number of quasi-Newton steps versus iteration.]

$300 \times 800$ random + offset $A$, $b = Ax_0$, 50-sparse $x_0$

[Figure: relative duality gap (log scale) versus runtime in seconds.]

Sparco blurspike, $\tau \to \sigma \approx 0.1\|b\|_2$

[Figure: relative duality gap (log scale) versus runtime in seconds.]

Sparco p3poly, $\tau \to \sigma \approx 10^{-3}\|b\|_2$

[Figure: relative duality gap (log scale) versus runtime in seconds.]

[Figure: runtime of the original method (vertical axis) versus runtime of the hybrid method (horizontal axis), both on log scales.]

Numerical experiments

Lasso

• Sometimes the procedure is never used, small overhead
• Does well on problems that take longer to solve

Basis pursuit denoise

• SPGL1 has enthusiastic (aggressive) update strategy
• Subproblem terminated before quasi-Newton steps are taken
• Update strategy can lead to run-away behavior
• In those cases accurate solves with hybrid method can help


Conclusions

• Hybrid method shows encouraging results
• Apply to box-constrained problems

References

• J. Barzilai and J.M. Borwein, Two-point step size gradient methods, IMA Journal of Numerical Analysis, 8 (1988), pp. 141–148
• E.G. Birgin, J.M. Martínez, and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets, SIAM Journal on Optimization, 10 (2000), pp. 1196–1211
• E. v.d. Berg and M.P. Friedlander, Probing the Pareto frontier for basis pursuit solutions, SIAM Journal on Scientific Computing, 31 (2008), pp. 890–912
• E. v.d. Berg, M.P. Friedlander, G. Hennenfent, F. Herrmann, R. Saab, and O. Yılmaz, Algorithm 890: Sparco: A testing framework for sparse reconstruction, ACM Transactions on Mathematical Software, 35 (2009), pp. 1–16
