Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm

Mark Schmidt, Ewout van den Berg, Michael P. Friedlander, and Kevin Murphy

Department of Computer Science, University of British Columbia

April 18, 2009


Outline

1 Introduction: Motivating Problem; Our Contribution

2 PQN Algorithm

3 Experiments

4 Discussion

M. Schmidt, E. van den Berg, M. Friedlander, and K. Murphy Optimizing Costly Functions with Simple Constraints


Motivating Problem: Structure Learning in Discrete MRFs

We want to fit a Markov random field to discrete data y, but don't know the graph structure

[Figure: four nodes Y1, Y2, Y3, Y4 with question marks on the candidate edges, indicating unknown structure]

We can learn a sparse structure by using ℓ1-regularization of the edge parameters [Wainwright et al. 2006, Lee et al. 2006]

Since each edge has multiple parameters, we use group ℓ1-regularization [Bach et al. 2004, Turlach et al. 2005, Yuan & Lin 2006]:

minimize_w  −log p(y | w)   subject to   ∑_e ‖w_e‖₂ ≤ τ
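In code, the group ℓ1 constraint is just a sum of per-edge Euclidean norms. A minimal sketch (the two-edge layout, the grouping, and the τ values are illustrative assumptions, not the talk's MRF parameterization):

```python
import numpy as np

def group_l1_norm(w, groups):
    """Sum of Euclidean norms of the per-edge parameter blocks."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def is_feasible(w, groups, tau):
    """Check the constraint sum_e ||w_e||_2 <= tau."""
    return group_l1_norm(w, groups) <= tau

# Hypothetical layout: two edges with two parameters each
w = np.array([3.0, 4.0, 0.0, 0.0])   # ||w_e1||_2 = 5, ||w_e2||_2 = 0
groups = [[0, 1], [2, 3]]
```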



Optimization Problem Challenges

Solving this optimization problem involves three complicating factors:

1 the number of parameters is large

2 evaluating the objective is expensive

3 the parameters have constraints

So how should we solve it?

Interior point methods: the number of parameters is too large

Projected gradient: evaluating the objective is too expensive

Quasi-Newton methods (L-BFGS): we have constraints


Extending the L-BFGS Algorithm

Quasi-Newton methods that use L-BFGS updates achieve state-of-the-art performance for unconstrained differentiable optimization [Nocedal 1980, Liu & Nocedal 1989]

L-BFGS updates have also been used for more general problems:

L-BFGS-B: state-of-the-art performance for bound-constrained optimization [Byrd et al. 1995]

OWL-QN: state-of-the-art performance for ℓ1-regularized optimization [Andrew & Gao 2007]

The above don’t apply since our constraints are not separable

However, the constraints are still simple: we can compute the projection in O(n)
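As a concrete special case, projection onto a single ℓ2-ball ‖x‖₂ ≤ τ is an O(n) rescaling. This sketch shows that special case only; the group-norm ball used in the talk needs a slightly more involved (but still linear-time) routine:

```python
import numpy as np

def project_l2_ball(x, tau):
    """O(n) Euclidean projection onto {z : ||z||_2 <= tau}:
    points inside the ball are unchanged; points outside are
    rescaled onto the boundary."""
    nrm = np.linalg.norm(x)
    return x if nrm <= tau else (tau / nrm) * x

p = project_l2_ball(np.array([3.0, 4.0]), 1.0)   # rescaled to norm 1
```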


Our Contribution

This talk presents an extension of L-BFGS that is suitable when:

1 the number of parameters is large

2 evaluating the objective is expensive

3 the parameters have constraints

4 projecting onto the constraints is substantially cheaper than evaluating the objective function

The method uses a two-level strategy:

At the outer level, L-BFGS updates build a constrained local quadratic approximation to the function

At the inner level, SPG uses projections to minimize this constrained quadratic approximation
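The two-level strategy can be sketched as follows. This is a simplified stand-in, not the authors' implementation: identity curvature replaces the L-BFGS B_k, a plain projected-gradient loop replaces SPG, and there is no line search:

```python
import numpy as np

def pqn_sketch(grad_f, project, x0, outer=20, inner=50, alpha=0.1):
    """Two-level sketch: the outer loop forms a local quadratic model of f
    around x_k (identity curvature stands in for the L-BFGS B_k); the inner
    loop minimizes that model over the feasible set with projected-gradient
    steps (standing in for SPG)."""
    x = project(x0)
    for _ in range(outer):
        g = grad_f(x)
        z = x.copy()
        for _ in range(inner):
            # gradient of q(z) = g.(z - x) + 0.5||z - x||^2 is g + (z - x)
            z = project(z - alpha * (g + (z - x)))
        x = z                     # accept the model minimizer as the new iterate
    return x

c = np.array([2.0, -1.0, 0.5])
x = pqn_sketch(lambda v: v - c,                # gradient of 0.5||v - c||^2
               lambda v: np.maximum(v, 0.0),   # C = nonnegative orthant
               np.zeros(3))
# x approaches the constrained minimizer [2.0, 0.0, 0.5]
```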


Outline

1 Introduction

2 PQN Algorithm: Projected Newton Algorithm; Limited-Memory BFGS Updates; Spectral Projected Gradient; Projection onto Norm-Balls

3 Experiments

4 Discussion


Problem Statement and Assumptions

We address the problem of minimizing a differentiable function f(x) over a convex set C:

minimize_x  f(x)   subject to   x ∈ C

We assume you can compute the objective f(x), the gradient ∇f(x), and the projection P_C(x):

P_C(x) = argmin_c ‖c − x‖₂   subject to   c ∈ C.
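These three oracles are just callables. For instance, for a box constraint the projection has a closed form (the box bounds here are illustrative):

```python
import numpy as np

def project_box(x, lo, hi):
    """Closed-form P_C for the box C = {c : lo <= c <= hi}:
    coordinatewise clipping solves argmin_{c in C} ||c - x||_2."""
    return np.clip(x, lo, hi)

p = project_box(np.array([-0.5, 0.3, 2.0]), 0.0, 1.0)   # -> [0.0, 0.3, 1.0]
```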


PG: Projected Gradient Algorithm

PG: move towards the projection of the negative gradient

[Figure: from xk, the gradient step xk − gk leaves the feasible set; it is projected back to P(xk − gk), defining the search direction dk]
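A single PG iteration is one gradient step followed by one projection. A toy sketch with a fixed step size (the problem and constants are illustrative):

```python
import numpy as np

def pg_step(x, grad, project, step):
    """One projected-gradient iteration: x_{k+1} = P(x_k - step * grad(x_k))."""
    return project(x - step * grad(x))

# Toy problem: minimize 0.5||x||^2 over the box 1 <= x <= 2
x = np.array([2.0, 1.5])
for _ in range(10):
    x = pg_step(x, lambda v: v, lambda v: np.clip(v, 1.0, 2.0), 0.5)
# x reaches the constrained minimizer [1.0, 1.0]
```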


Naive Projected Newton Algorithm

The problem with projected gradient: slow convergence

Can we speed this up by projecting the Newton direction?


[Figure: around xk, the quadratic model q(x) gives the Newton step xk − Bk⁻¹gk; projecting it yields P(xk − Bk⁻¹gk), which differs from the projected-gradient point P(xk − gk)]


NO! This can point in the wrong direction


Naive Projected Newton Algorithm: Problem

[Figure: example where the projected Newton point P(xk − Bk⁻¹gk) lies far from the minimizer of q(x) over the feasible set: projecting the Newton step can move in the wrong direction]


Correct Projected Newton Algorithm

In projected Newton methods, we form a quadratic approximation to the function around xk:

qk(x) ≜ fk + (x − xk)ᵀ∇f(xk) + ½(x − xk)ᵀBk(x − xk)

At each iteration, we minimize this function over the set:

minimize_x  qk(x)   subject to   x ∈ C

NOT the same as projecting the unconstrained Newton step

This generates a feasible descent direction dk ≜ x∗ − xk, where x∗ minimizes qk over C

The method has a quadratic rate of convergence around a local minimizer [Bertsekas, 1999]
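A small numeric example makes the "NOT the same" point concrete: with a correlated Bk, projecting the Newton step and minimizing qk over C give different points (all numbers below are illustrative):

```python
import numpy as np

B = np.array([[2.0, 1.0], [1.0, 2.0]])  # model Hessian (illustrative numbers)
g = np.array([1.0, -1.0])               # model gradient at x_k = 0, f_k = 0
q = lambda x: 0.5 * x @ B @ x + g @ x   # the quadratic model q_k
P = lambda x: np.maximum(x, 0.0)        # C = nonnegative orthant

# Naive: project the unconstrained Newton step onto C
naive = P(-np.linalg.solve(B, g))       # = [0.0, 1.0]

# Correct: minimize q over C (projected gradient converges for this convex q)
x = np.zeros(2)
for _ in range(2000):
    x = P(x - 0.1 * (B @ x + g))        # -> [0.0, 0.5]: a different, better point
```

Here q at the constrained minimizer is −0.25, strictly below q at the projected Newton point (0), so the two constructions genuinely disagree.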


Projected Newton Algorithm

[Figure: the quadratic model q(x) around xk; minimizing q over the feasible set gives the direction dk, distinct from the projected-gradient point P(xk − gk)]

M. Schmidt, E. van den Berg, M. Friedlander, and K. Murphy Optimizing Costly Functions with Simple Constraints

Page 41: Optimizing Costly Functions with Simple Constraints: A ... · L-BFGS updates have also been used for more general problems: L-BFGS-B: state of the art performance for bound constrained

IntroductionPQN Algorithm

ExperimentsDiscussion

Projected Newton Algorithm

[Figure: the feasible set C with the objective f(x) and its quadratic model q(x); overlays add the projected point P(x_k − g_k), the sub-problem minimizer min_{x∈C} q(x), and the resulting direction d_k]

M. Schmidt, E. van den Berg, M. Friedlander, and K. Murphy Optimizing Costly Functions with Simple Constraints

Problems with the Projected Newton Algorithm

Unfortunately, the projected Newton method can be inefficient:

Computing d_k may be very expensive

Using a general n-by-n matrix B_k is impractical

Our algorithm is a projected quasi-Newton algorithm where:

L-BFGS updates construct a diagonal-plus-low-rank B_k

SPG efficiently computes d_k with this B_k and projections.


Outline

1 Introduction
  Motivating Problem
  Our Contribution

2 PQN Algorithm
  Projected Newton Algorithm
  Limited-Memory BFGS Updates
  Spectral Projected Gradient
  Projection onto Norm-Balls

3 Experiments
  Gaussian Graphical Model Structure Learning
  Markov Random Field Structure Learning

4 Discussion

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Updates

Quasi-Newton methods work with parameter and gradient differences between iterations:

s_k ≜ x_{k+1} − x_k and y_k ≜ g_{k+1} − g_k

They start with an initial approximation B_0 ≜ σI, and choose B_{k+1} to interpolate the gradient difference:

B_{k+1} s_k = y_k

Since B_{k+1} is not unique, the BFGS method chooses the matrix whose difference with B_k minimizes a weighted Frobenius norm:

B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)
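As a quick sketch of the update above (assuming numpy and 1-D arrays for s_k and y_k; `bfgs_update` is just an illustrative name):

```python
import numpy as np

def bfgs_update(B, s, y):
    # One BFGS update of the Hessian approximation B, built from the
    # differences s = x_{k+1} - x_k and y = g_{k+1} - g_k.
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

B0 = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([2.0, 1.5])
B1 = bfgs_update(B0, s, y)
```

By construction, the updated matrix is symmetric and satisfies the interpolation (secant) condition B_{k+1} s_k = y_k.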


L-BFGS: Limited-Memory BFGS

Instead of storing B_k, the limited-memory BFGS (L-BFGS) method just stores the previous m differences s_k and y_k [Nocedal 1980, Liu & Nocedal 1989].

These updates applied to B_0 = σ_k I can be written compactly in a diagonal-plus-low-rank form [Byrd et al. 1994]:

B_k = σ_k I − N M^{−1} N^T

This representation makes multiplication with B_k cost O(mn).
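A minimal sketch of the compact multiply, assuming the matrices N (n-by-2m) and M (2m-by-2m) have already been assembled from the stored differences (the names `N`, `M`, and `lbfgs_matvec` are illustrative, not from the talk's code):

```python
import numpy as np

def lbfgs_matvec(sigma, N, M, v):
    # Computes B v for B = sigma*I - N M^{-1} N^T without forming B.
    # The products with N cost O(mn); the solve with the small
    # 2m-by-2m matrix M costs only O(m^3).
    return sigma * v - N @ np.linalg.solve(M, N.T @ v)

# compare against the explicitly formed matrix on a tiny example
rng = np.random.default_rng(0)
n, m = 6, 2
N = rng.standard_normal((n, 2 * m))
M = np.eye(2 * m) + 0.1 * rng.standard_normal((2 * m, 2 * m))
v = rng.standard_normal(n)
B = 2.0 * np.eye(n) - N @ np.linalg.inv(M) @ N.T
```

The point of the compact form is exactly that `B` never needs to be built; only the small solve and two tall-skinny products are performed.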

SPG: Spectral Projected Gradient

Recall the projected quasi-Newton sub-problem:

minimize_x  f_k + (x − x_k)^T ∇f(x_k) + (1/2)(x − x_k)^T B_k (x − x_k)
subject to  x ∈ C

With the L-BFGS representation of B_k, we can compute the objective function and gradient in O(mn).

This still doesn't let us efficiently solve the problem.

To solve it, we use the spectral projected gradient (SPG) algorithm.


SPG: Spectral Projected Gradient

The classic projected gradient method takes steps of the form

x_{k+1} = P_C(x_k − α g_k)

SPG has two enhancements [Birgin et al. 2000]:

It uses the Barzilai and Borwein [1988] 'spectral' step length:

α_bb = ⟨y_{k−1}, y_{k−1}⟩ / ⟨s_{k−1}, y_{k−1}⟩

It uses a non-monotone line search [Grippo et al. 1986]
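The spectral step length is computed directly from the two difference vectors; a small sketch in pure Python, with the formula as written on the slide (`bb_step_length` is an illustrative name):

```python
def bb_step_length(s, y):
    # Barzilai-Borwein 'spectral' step length <y, y> / <s, y>,
    # where s = x_k - x_{k-1} and y = g_k - g_{k-1}.
    yy = sum(yi * yi for yi in y)
    sy = sum(si * yi for si, yi in zip(s, y))
    return yy / sy

alpha = bb_step_length([1.0, 1.0], [2.0, 2.0])
```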


Barzilai & Borwein Step Size

[Figure: iterates of steepest descent and the Barzilai-Borwein method on a quadratic, plotted over the range −15 to 15 in both coordinates]

SPG: Spectral Projected Gradient

There is growing interest in SPG for constrained optimization [Dai & Fletcher 2005, van den Berg & Friedlander 2008]

We apply SPG to minimize the strictly convex constrained quadratic approximations

Friedlander et al. [1999] show that SPG has a superlinear convergence rate for minimizing strictly convex quadratics

Instead of 'solving' the sub-problem, we could just perform k iterations of SPG to improve the steepest descent direction.

In this case, solving the sub-problems costs O(mnk), plus the cost of computing the projection k times.


Outline of the Method

The projected quasi-Newton (PQN) method:

1 Evaluate the current objective function and gradient
2 Add/remove difference vectors for L-BFGS
3 Run SPG to compute the projected quasi-Newton direction d_k
4 Generate the next iterate with a backtracking line search

The overall algorithm is most effective when computing projections is cheaper than evaluating the objective.
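To make the four steps concrete, here is a heavily simplified sketch of the outer loop on a box-constrained toy problem. The SPG sub-problem solve of step 3 is replaced by a plain projected-gradient direction with a fixed step, so this shows only the loop structure, not the actual PQN method:

```python
def minimize_box(grad, x0, lo, hi, iters=200, alpha=0.1):
    # Outer loop: evaluate the gradient, form a feasible descent
    # direction d_k = P(x_k - alpha*g_k) - x_k, then take a step.
    project = lambda x: [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]
    x = project(x0)
    for _ in range(iters):
        g = grad(x)
        p = project([xi - alpha * gi for xi, gi in zip(x, g)])
        x = p  # unit step along d_k = p - x stays feasible (C is convex)
    return x

# minimize (x0 - 2)^2 + (x1 + 1)^2 over the box [0, 1]^2; solution (1, 0)
grad = lambda x: [2 * (x[0] - 2), 2 * (x[1] + 1)]
sol = minimize_box(grad, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```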


Outline

1 Introduction
  Motivating Problem
  Our Contribution

2 PQN Algorithm
  Projected Newton Algorithm
  Limited-Memory BFGS Updates
  Spectral Projected Gradient
  Projection onto Norm-Balls

3 Experiments
  Gaussian Graphical Model Structure Learning
  Markov Random Field Structure Learning

4 Discussion

Projection onto Norm-Balls

We are interested in projecting onto balls induced by norms:

C ≜ {x | ‖x‖ ≤ τ}

This projection can be computed in linear time for many ℓ_p-norms, such as the ℓ_2-, ℓ_∞-, and ℓ_1-norms [Duchi et al. 2008]

We are also interested in the mixed p,q-norm balls that arise in group variable selection:

‖x‖_{p,q} = (Σ_i ‖x_{σ_i}‖_q^p)^{1/p}

The group-lasso is the special case where p = 1, q = 2:

‖x‖_{1,2} = Σ_i ‖x_{σ_i}‖_2
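For the ℓ_1 case, a simple O(n log n) sort-based projection is easy to write down (a sketch assuming numpy; the linear-time method of Duchi et al. avoids the full sort):

```python
import numpy as np

def project_l1_ball(c, tau):
    # Euclidean projection of c onto {x : ||x||_1 <= tau} by
    # soft-thresholding at a level found from the sorted magnitudes.
    if np.abs(c).sum() <= tau:
        return c.copy()
    u = np.sort(np.abs(c))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, c.size + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1)
    return np.sign(c) * np.maximum(np.abs(c) - theta, 0.0)

x = project_l1_ball(np.array([3.0, 1.0]), 2.0)
```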


Projection onto Mixed Norm-Balls

The following proposition leads to an expected linear-time randomized algorithm for group-lasso projection:

Proposition. Consider c ∈ R^n and a set of g disjoint groups {σ_i}_{i=1}^g such that ∪_i σ_i = {1, ..., n}. Then the Euclidean projection P_C(c) onto the ℓ_{1,2}-norm ball of radius τ is given by

x_{σ_i} = sgn(c_{σ_i}) · w_i,  i = 1, ..., g,

where w = P(v) is the projection of the vector v onto the ℓ_1-norm ball of radius τ, with v_i = ‖c_{σ_i}‖_2.
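A sketch of the proposition, assuming numpy and groups given as lists of indices; `project_l1` is a simple sort-based variant rather than the randomized linear-time algorithm, and the sign factor becomes a rescaling of each group to its projected norm:

```python
import numpy as np

def project_l1(v, tau):
    # sort-based projection onto the l1 ball (simple O(n log n) variant)
    if np.abs(v).sum() <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_group_l12(c, groups, tau):
    # Project the vector of group norms onto the l1 ball, then rescale
    # each group c_{sigma_i} so its norm equals the projected value w_i.
    v = np.array([np.linalg.norm(c[g]) for g in groups])
    w = project_l1(v, tau)
    x = np.zeros_like(c)
    for g, vi, wi in zip(groups, v, w):
        if vi > 0:
            x[g] = c[g] * (wi / vi)
    return x

c = np.array([3.0, 4.0, 0.6, 0.8])
groups = [[0, 1], [2, 3]]
x = project_group_l12(c, groups, 2.0)
```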

Outline

1 Introduction
2 PQN Algorithm
3 Experiments
  Gaussian Graphical Model Structure Learning
  Markov Random Field Structure Learning
4 Discussion

Experiments

We performed several experiments to test the new method:

We first compared to other extensions of L-BFGS [see paper]

We then compared to state-of-the-art methods for graph structure learning


Gaussian Graphical Model Structure Learning

We looked at training a Gaussian graphical model with an ℓ_1 penalty on the precision-matrix elements to induce a sparse structure [Banerjee et al. 2006, Friedman et al. 2007]:

minimize_{K ≻ 0}  − log det(K) + tr(Σ̂ K) + λ‖K‖_1

We used the Gasch et al. [2000] data with the pre-processing of Duchi et al. [2008], and as in previous work we solve the dual problem:

maximize_W  log det(Σ̂ + W)
subject to  Σ̂ + W ≻ 0, ‖W‖_∞ ≤ λ

We compared to a projected gradient method [Duchi et al. 2008].


Gaussian Graphical Model Structure Learning

[Figure: objective value versus function evaluations (20 to 100) for PQN, PG, BCD, and SPG]

Gaussian Graphical Model Structure Learning

[Figure: objective value versus function evaluations (100 to 900) for PQN, PG, BCD, and SPG]

Gaussian Graphical Model Structure Learning with Groups

We also compared the methods when we induce a group-sparse precision matrix using the ℓ_{1,∞}-norm [Duchi et al. 2008]:

minimize_{K ≻ 0}  − log det(K) + tr(Σ̂ K) + λ‖K‖_{1,∞}

[Figure: objective value versus function evaluations (50 to 200) for PQN, PG, and SPG]

Gaussian Graphical Model Structure Learning with Groups

[Figure: objective value versus function evaluations (200 to 1000) for PQN, PG, and SPG]

Gaussian Graphical Model Structure Learning with Groups

We also used PQN to look at the performance if we replace the ℓ_{1,∞}-norm [Duchi et al. 2008] with the ℓ_{1,2}-norm:

minimize_{K ≻ 0}  − log det(K) + tr(Σ̂ K) + λ‖K‖_{1,2}

[Figure: average log-likelihood versus regularization strength λ (10^−4 to 10^−2) for the ℓ_{1,2}, ℓ_{1,∞}, and ℓ_1 penalties and a baseline]

Markov Random Field Structure Learning

Finally, we looked at learning a sparse Markov random field:

minimize_w  − log p(y|w)  subject to  Σ_e ‖w_e‖_2 ≤ τ

We used the trinary data from [Sachs et al. 2005], and compared to Grafting [Lee et al. 2006] and to applying SPG to a second-order cone reformulation [Schmidt et al. 2008].


Markov Random Field Structure Learning

[Figure: objective value (×10^4) versus function evaluations (10 to 100) for PQN, SPG, Grafting, and PQN-SOC]

Markov Random Field Structure Learning

[Figure: objective value (×10^4) versus function evaluations (100 to 900) for PQN, SPG, Grafting, and PQN-SOC]

Outline

1 Introduction
2 PQN Algorithm
3 Experiments
4 Discussion

Extensions to Other Problems

There are many other cases where we can efficiently compute projections:

Projection onto hyper-planes or half-spaces is trivial

Projecting onto the probability simplex can be done in O(n log n)

Projecting onto the positive semi-definite cone involves truncating the spectral decomposition

Projecting onto second-order cones of the form ‖x‖_2 ≤ y can be done in O(n)

Dykstra's algorithm can be used for combinations of simple constraints [Dykstra, 1983]
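For the second-order cone case, the closed-form projection is short enough to state directly (a sketch assuming numpy; this is the standard formula, not code from the talk):

```python
import numpy as np

def project_soc(x, y):
    # Euclidean projection of (x, y) onto {(x, y) : ||x||_2 <= y}.
    nx = np.linalg.norm(x)
    if nx <= y:            # already inside the cone
        return x.copy(), y
    if nx <= -y:           # in the polar cone: project to the origin
        return np.zeros_like(x), 0.0
    t = (nx + y) / 2.0     # otherwise land on the cone's boundary
    return x * (t / nx), t

px, py = project_soc(np.array([3.0, 4.0]), 0.0)
```

The cost is one norm and one scaling, which is where the O(n) claim on the slide comes from.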

Summary

PQN is an extension of L-BFGS that is suitable when:

1 the number of parameters is large
2 evaluating the objective is expensive
3 the parameters have constraints
4 projecting onto the constraints is substantially cheaper than evaluating the objective function

We have found the algorithm useful for a variety of problems, and it is likely useful for others (code online soon)

