Rank optimality for the Burer-Monteiro factorization

Irène Waldspurger

CNRS and CEREMADE (Université Paris Dauphine), Équipe MOKAPLAN (INRIA)

Joint work with Alden Waters (Bernoulli Institute, Rijksuniversiteit Groningen)

April 3, 2019

Imaging and machine learning, The mathematics of imaging semester

Institut Henri Poincaré

Introduction 2 / 28

Semidefinite programming

$$
\begin{aligned}
\text{minimize}\quad & \operatorname{Trace}(CX)\\
\text{such that}\quad & \mathcal{A}(X) = b,\\
& X \succeq 0.
\end{aligned}
$$

Here,

• X, the unknown, is an n × n matrix (X ⪰ 0 means that X is positive semidefinite);

• C is a fixed n × n matrix (cost matrix);

• A : Sym_n → ℝ^m is linear (Sym_n denotes the space of n × n symmetric matrices);

• b is a fixed vector in ℝ^m.
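A minimal sketch of these objects in NumPy, assuming the common representation of the linear map A by m symmetric matrices A₁, …, A_m with A(X)ᵢ = Trace(AᵢX). The helper names are hypothetical; the feasibility test only checks A(X) = b and X ⪰ 0 up to a numerical tolerance.

```python
import numpy as np

def apply_linear_map(A_mats, X):
    """A(X) = (Trace(A_1 X), ..., Trace(A_m X)) for symmetric matrices A_i."""
    return np.array([np.trace(Ai @ X) for Ai in A_mats])

def is_feasible(A_mats, b, X, tol=1e-9):
    """Check X = X^T, A(X) = b and X positive semidefinite, up to tol."""
    if not np.allclose(X, X.T, atol=tol):
        return False
    if not np.allclose(apply_linear_map(A_mats, X), b, atol=tol):
        return False
    # Smallest eigenvalue of the symmetric matrix X must be (numerically) nonnegative.
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

# Example with m = 1, A(X) = Trace(X) and b = 1 (illustrative data).
A_mats = [np.eye(3)]
b = np.array([1.0])
print(is_feasible(A_mats, b, np.eye(3) / 3))             # True
print(is_feasible(A_mats, b, np.diag([2.0, -1.0, 0.0])))  # False: trace 1 but not PSD
```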


Introduction 3 / 28

Motivations

Various difficult problems can be “lifted” to SDPs, and solving these lifted SDPs may solve the original problems.

Particularly important example: the relaxation of MaxCut.

$$
\begin{aligned}
\text{minimize}\quad & \operatorname{Trace}(CX)\\
\text{such that}\quad & \operatorname{diag}(X) = \mathbf{1},\\
& X \succeq 0.
\end{aligned}
$$

This relaxes the Maximum Cut problem from graph theory [Delorme and Poljak, 1993]. It also appears in phase retrieval, Z₂ synchronization ...
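To see how this SDP relaxes MaxCut, the sketch below assumes one common convention, C = W with W a symmetric weight matrix with zero diagonal (the 4-node graph is made up for illustration). Every cut x ∈ {±1}ⁿ gives a feasible rank-one matrix X = xxᵀ with Trace(CX) = xᵀWx, and the cut weight equals (Σᵢⱼ Wᵢⱼ − xᵀWx)/4. Minimizing Trace(CX) over rank-one feasible X is therefore MaxCut; dropping the rank constraint gives the SDP.

```python
import itertools
import numpy as np

def cut_weight(W, x):
    """Total weight of edges {i, j} with x_i != x_j (each edge counted once)."""
    n = len(x)
    return sum(W[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

# Hypothetical 4-node weighted graph (symmetric, zero diagonal).
W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 3],
              [2, 1, 0, 1],
              [0, 3, 1, 0]], dtype=float)
C = W  # cost matrix of the relaxation under this convention

best = None
for signs in itertools.product([-1.0, 1.0], repeat=4):
    x = np.array(signs)
    X = np.outer(x, x)                        # rank-one lifted matrix, PSD
    assert np.allclose(np.diag(X), 1.0)       # feasible: diag(X) = 1
    value = np.trace(C @ X)                   # equals x^T W x
    assert np.isclose(cut_weight(W, x), (W.sum() - value) / 4)
    if best is None or value < best[0]:
        best = (value, cut_weight(W, x))

print(best)  # smallest Trace(CX) over rank-one feasible X, and its cut weight
```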


Introduction 4 / 28

Numerical solvers

SDPs can be solved to a given precision in polynomial time. But the degree of the polynomial may be large.

Interior-point solvers, for instance, have a per-iteration complexity of O(n⁴) in full generality (when m and n are of the same order).

First-order solvers, applied to a smoothed problem, have a per-iteration complexity of O(n³), but require more iterations.

→ Numerically, high-dimensional SDPs are difficult to solve.


Introduction 5 / 28

Exploiting the low rank

To speed up these algorithms: exploit the structure of the problem.

Here, the “structure” we consider is the fact that there exists a low-rank solution.

• There is always a solution with rank r_opt at most ⌊√(2m + 1/4) − 1/2⌋ [Pataki, 1998].

• In many situations, there is actually a solution with rank r_opt = O(1).
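The Pataki bound ⌊√(2m + 1/4) − 1/2⌋ is the largest integer r with r(r+1)/2 ≤ m (solve r² + r − 2m ≤ 0). The sketch below, an added illustration, computes it with exact integer arithmetic, using ⌊√(2m + 1/4) − 1/2⌋ = ⌊(√(8m+1) − 1)/2⌋ = (isqrt(8m+1) − 1) // 2, and cross-checks it against a direct search.

```python
from math import isqrt

def pataki_bound(m: int) -> int:
    """Largest r with r(r+1)/2 <= m, i.e. floor(sqrt(2m + 1/4) - 1/2)."""
    # sqrt(2m + 1/4) - 1/2 = (sqrt(8m + 1) - 1) / 2; isqrt keeps the floor exact.
    return (isqrt(8 * m + 1) - 1) // 2

def pataki_bound_by_search(m: int) -> int:
    """Same quantity, found by direct search (cross-check only)."""
    r = 0
    while (r + 1) * (r + 2) // 2 <= m:
        r += 1
    return r

for m in range(0, 2000):
    assert pataki_bound(m) == pataki_bound_by_search(m)

print(pataki_bound(10), pataki_bound(100))  # 4 13
```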


Introduction 6 / 28

Burer-Monteiro factorization

We focus on one heuristic that takes advantage of the low rank: the Burer-Monteiro factorization [Burer and Monteiro, 2003].

If there is a solution with rank r_opt, we can write X in the form

$$X = VV^T,$$

with V an n × p matrix and p ≥ r_opt.

→ We optimize over V instead of optimizing over X.


Introduction 7 / 28

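$$
\begin{aligned}
&\text{minimize } \operatorname{Trace}(CX) \text{ for } X \in \mathbb{R}^{n\times n} \text{ such that } \mathcal{A}(X) = b,\ X \succeq 0\\
&\qquad\qquad\Updownarrow\\
&\text{minimize } \operatorname{Trace}(CVV^T) \text{ for } V \in \mathbb{R}^{n\times p} \text{ such that } \mathcal{A}(VV^T) = b.
\end{aligned}
$$

Remark: The factorization rank p must be chosen. It can be different from r_opt, the rank of the solution.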


Introduction 8 / 28

$$\text{minimize } \operatorname{Trace}(CVV^T) \text{ for } V \in \mathbb{R}^{n\times p} \text{ such that } \mathcal{A}(VV^T) = b.$$

We assume that {V ∈ ℝ^{n×p} : A(VVᵀ) = b} is a “nice” manifold. → Riemannian optimization algorithms.

Main advantage of the factorized formulation

The number of variables is no longer O(n²) but O(np), with possibly p ≪ n. → Less computationally demanding algorithms can be used.
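A minimal sketch of the factorized approach in the MaxCut-type case diag(VVᵀ) = 1, where the “nice” manifold is the set of n × p matrices with unit-norm rows. It runs plain Riemannian gradient descent, with row normalization as the retraction and a fixed heuristic step size. The function names, step size, iteration budget and random data are assumptions for illustration; this is not the algorithm of any work cited here, and practical solvers typically use more refined methods (e.g. trust regions). The iterate V holds only np numbers, against n(n+1)/2 for a symmetric X.

```python
import numpy as np

def normalize_rows(V):
    """Retraction onto {V : diag(V V^T) = 1}: rescale every row to unit norm."""
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def riemannian_grad(C, V):
    """Riemannian gradient of f(V) = Trace(C V V^T) on the unit-row manifold."""
    G = 2 * C @ V                                   # Euclidean gradient
    radial = np.sum(G * V, axis=1, keepdims=True)   # <g_i, v_i> for each row i
    return G - radial * V                           # project each row onto its tangent space

def burer_monteiro_maxcut(C, p, iters=2000, tol=1e-8, seed=0):
    """Plain Riemannian gradient descent; the fixed step 1 / (2 ||C||_2) is a heuristic."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    V = normalize_rows(rng.standard_normal((n, p)))  # n*p variables instead of n^2
    step = 1.0 / (2 * np.linalg.norm(C, 2))
    for _ in range(iters):
        g = riemannian_grad(C, V)
        if np.linalg.norm(g) < tol:
            break
        V = normalize_rows(V - step * g)             # rows stay nonzero: g_i is orthogonal to v_i
    return V

# Usage on a random symmetric cost matrix (illustrative data).
rng = np.random.default_rng(1)
B = rng.standard_normal((30, 30))
C = (B + B.T) / 2
V = burer_monteiro_maxcut(C, p=8)
print(np.allclose(np.sum(V**2, axis=1), 1.0), np.trace(C @ V @ V.T))
```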


Introduction 9 / 28

$$\text{minimize } \operatorname{Trace}(CVV^T) \text{ for } V \in \mathbb{R}^{n\times p} \text{ such that } \mathcal{A}(VV^T) = b.$$

Main drawback of the factorized formulation

Unlike the SDP, this problem is non-convex. → Riemannian optimization algorithms may get stuck at a critical point instead of finding a global minimizer.

Whether this issue arises depends on the factorization rank p. ⇒ How should p be chosen?
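A quick numerical check of the non-convexity claim in the MaxCut-type case diag(VVᵀ) = 1 (an illustrative choice of A): if V is feasible, so is −V, but their midpoint is the zero matrix, which is not. Hence the feasible set of the factorized problem is not convex, unlike the SDP's.

```python
import numpy as np

def feasible(V):
    """diag(V V^T) = 1, i.e. every row of V has unit norm."""
    return bool(np.allclose(np.sum(V**2, axis=1), 1.0))

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 3))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # a feasible point

midpoint = (V + (-V)) / 2                        # convex combination of two feasible points
print(feasible(V), feasible(-V), feasible(midpoint))  # True True False
```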


Introduction 10 / 28

Outline

1. Literature review

• In practice, algorithms work when p = O(r_opt).

• In particular situations, this phenomenon is understood.

• In a general setting, there are no guarantees for p ≲ √(2m).

• Why this gap?

2. Optimal rank for the Burer-Monteiro formulation

• Up to a minor improvement, p ≈ √(2m) is the optimal rank for which general guarantees can be derived.

• Consequently, when p ≲ √(2m), Riemannian optimization algorithms cannot be certified correct without assumptions on C.

3. Open questions


Literature review 11 / 28

Empirical observations

1. [Burer and Monteiro, 2003] Numerical experiments on various problems, notably MaxCut and minimum bisection relaxations. The factorization rank is p ≈ √(2m), and algorithms always find a global minimizer. (The authors do not test smaller values of p.)

2. [Journée, Bach, Absil, and Sepulchre, 2010] Numerical experiments on MaxCut relaxations (with a particular initialization scheme). The algorithm proposed by the authors always finds a global minimizer when p = r_opt.


Literature review 12 / 28

Empirical observations (continued)

3. [Boumal, 2015] Numerical experiments on problems coming from orthogonal synchronization. Here, r_opt = 3 and the algorithm finds the global minimizer as soon as p ≥ 5.

4. Similar results on “SDP-like” problems. See for example [Mishra, Meyer, Bonnabel, and Sepulchre, 2014].


Literature review 13 / 28

Theoretical explanations in particular cases

[Bandeira, Boumal, and Voroninski, 2016] SDP instances coming from Z₂ synchronization and community detection problems, under specific statistical assumptions.
→ With high probability, r_opt = 1.
→ If p = 2, Riemannian algorithms find the global minimizer.

Other particular SDP-like problems have been studied [Ge, Lee, and Ma, 2016] ...
→ Under strong assumptions, p ≥ r_opt is enough for a global minimizer to be found.


Literature review 14 / 28

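General case: one main result [Boumal, Voroninski, and Bandeira, 2018]

$$\text{minimize } \operatorname{Trace}(CVV^T) \text{ for } V \in \mathbb{R}^{n\times p} \text{ such that } \mathcal{A}(VV^T) = b.$$

Main hypothesis (approximately)

$$\mathcal{M}_p \stackrel{\text{def}}{=} \{V \in \mathbb{R}^{n\times p} : \mathcal{A}(VV^T) = b\} \text{ is a manifold.}$$

[More precisely: for all V ∈ M_p,

$$\varphi_V : \dot V \in \mathbb{R}^{n\times p} \mapsto \mathcal{A}\left(\dot V V^T + V \dot V^T\right) \in \mathbb{R}^m$$

is surjective.]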
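In the MaxCut-type case A(X) = diag(X) (an illustrative choice, not the general setting of the theorem), φ_V(V̇)ᵢ = 2⟨v̇ᵢ, vᵢ⟩, where vᵢ is the i-th row of V. The sketch below builds the matrix of φ_V by applying it to the canonical basis of ℝ^{n×p} and checks that its rank is m = n, i.e. that φ_V is surjective. This holds at every feasible V, since no row of V can vanish.

```python
import numpy as np

def phi(V, Vdot):
    """phi_V(Vdot) = diag(Vdot V^T + V Vdot^T) for the constraint diag(V V^T) = 1."""
    return np.diag(Vdot @ V.T + V @ Vdot.T)

def phi_matrix(V):
    """Matrix of the linear map phi_V, one column per canonical basis element of R^{n x p}."""
    n, p = V.shape
    cols = []
    for idx in range(n * p):
        E = np.zeros(n * p)
        E[idx] = 1.0
        cols.append(phi(V, E.reshape(n, p)))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, p = 8, 3
V = rng.standard_normal((n, p))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # a point of M_p
J = phi_matrix(V)
print(J.shape, np.linalg.matrix_rank(J))          # (8, 24) 8 -> surjective onto R^8
```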


Literature review 15 / 28

General case: one main result [Boumal, Voroninski, and Bandeira, 2018]

$$\text{minimize } \operatorname{Trace}(CVV^T) \text{ for } V \in \mathcal{M}_p.$$

Riemannian optimization algorithms typically converge to second-order critical points:

A matrix V₀ ∈ M_p is a second-order critical point if

• ∇f_C(V₀) = 0_{n,p};

• Hess f_C(V₀) ⪰ 0,

where f_C := (V ∈ M_p ↦ Trace(CVVᵀ)), and ∇ and Hess denote the Riemannian gradient and Hessian on M_p.
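For the MaxCut-type constraint diag(VVᵀ) = 1, these conditions can be written explicitly (a standard computation for this constraint, not taken from the slides). With S = C − Diag(diag(CVVᵀ)), the Riemannian gradient is 2SV, and for tangent V̇ (each row of V̇ orthogonal to the matching row of V) the Hessian quadratic form is 2⟨V̇, SV̇⟩. If S ⪰ 0, V is a global minimizer: for any feasible X, Trace(CX) = Trace(SX) + Trace(CVVᵀ) ≥ Trace(CVVᵀ). The helper names below are hypothetical; the toy cost C = −11ᵀ is illustrative.

```python
import numpy as np

def certificate_matrix(C, V):
    """S = C - Diag(diag(C V V^T)) for the constraint diag(V V^T) = 1."""
    return C - np.diag(np.diag(C @ V @ V.T))

def check_second_order(C, V, tol=1e-8):
    """Return (first-order condition holds, S is PSD); S PSD certifies a global minimizer."""
    S = certificate_matrix(C, V)
    grad = 2 * S @ V                                  # Riemannian gradient of Trace(C V V^T)
    first_order = bool(np.linalg.norm(grad) <= tol)
    psd = bool(np.linalg.eigvalsh(S).min() >= -tol)
    return first_order, psd

def hessian_quadratic_form(C, V, Vdot):
    """<Vdot, Hess f_C(V)[Vdot]> = 2 <Vdot, S Vdot>, valid for tangent Vdot only."""
    S = certificate_matrix(C, V)
    return 2 * np.sum(Vdot * (S @ Vdot))

# Toy cost C = -11^T: equal rows are optimal; flipping one row gives a critical point that is a saddle.
n = 5
C = -np.ones((n, n))
V_good = np.outer(np.ones(n), [1.0, 0.0])
V_bad = np.outer([-1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 0.0])
Vdot = np.zeros((n, 2))
Vdot[0, 1] = 1.0                                      # tangent at V_bad: row 0 is orthogonal to (-1, 0)
print(check_second_order(C, V_good))                  # (True, True)
print(check_second_order(C, V_bad))                   # (True, False)
print(hessian_quadratic_form(C, V_bad, Vdot))         # -8.0
```

The flipped configuration satisfies the first-order condition, but the negative Hessian direction shows it is not a second-order critical point.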


Literature review 16 / 28

General case: one main result [Boumal, Voroninski, and Bandeira, 2018]

Theorem

Under suitable hypotheses, for almost all matrices C, if

$$p > \left\lfloor \sqrt{2m + \tfrac14} - \tfrac12 \right\rfloor,$$

all second-order critical points of the factorized problem are global minimizers. Consequently, Riemannian optimization algorithms always find a global minimizer.

Remark: The value of p does not depend on r_opt.
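For integer p, this threshold can be restated as a dimension count. The short derivation below is an added illustration, not part of the original slide.

$$
\begin{aligned}
p > \left\lfloor \sqrt{2m + \tfrac14} - \tfrac12 \right\rfloor
&\iff p > \sqrt{2m + \tfrac14} - \tfrac12 && (p \in \mathbb{Z})\\
&\iff \left(p + \tfrac12\right)^2 > 2m + \tfrac14\\
&\iff \frac{p(p+1)}{2} > m .
\end{aligned}
$$

Since p(p+1)/2 = dim(Sym_p), the condition says that the space of p × p symmetric matrices has a larger dimension than the number m of constraints.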


Literature review 17 / 28

Summary

• In empirical experiments, as well as in the few particular cases that have been studied, algorithms seem to always work when p = O(r_opt).

• The only available general result guarantees that algorithms work when p ≳ √(2m).

As r_opt is often much smaller than √(2m), this leaves a big gap.

→ Is it possible to obtain general guarantees for p ≪ √(2m)?


Optimal rank for the Burer-Monteiro factorization 18 / 28

Overview of our results

• A minor improvement over the result of [Boumal, Voroninski, and Bandeira, 2018] is possible, but it does not change the leading-order term p ≳ √(2m).

• With this improvement, the result is essentially optimal, even if r_opt ≪ √(2m).


Optimal rank for the Burer-Monteiro factorization 19 / 28

Improving [Boumal, Voroninski, and Bandeira, 2018]

Theorem

Under suitable hypotheses, for almost all matrices C, if

$$p > \left\lfloor \sqrt{2m + \tfrac94} - \tfrac32 \right\rfloor,$$

all second-order critical points of the factorized problem are global minimizers.

In [Boumal, Voroninski, and Bandeira, 2018], the threshold was ⌊√(2m + 1/4) − 1/2⌋. Our result is better by one unit for most values of m.
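Both thresholds are largest integer solutions of a quadratic inequality: ⌊√(2m + 1/4) − 1/2⌋ is the largest p with p(p+1)/2 ≤ m, and ⌊√(2m + 9/4) − 3/2⌋ is the largest p with p(p+1)/2 + p ≤ m. The sketch below, an added check rather than material from the slides, computes both exactly with integer square roots and counts how often the gain is one unit for 1 ≤ m ≤ 1000.

```python
from math import isqrt

def bvb_threshold(m: int) -> int:
    """floor(sqrt(2m + 1/4) - 1/2): largest p with p(p+1)/2 <= m."""
    return (isqrt(8 * m + 1) - 1) // 2

def improved_threshold(m: int) -> int:
    """floor(sqrt(2m + 9/4) - 3/2): largest p with p(p+3)/2 <= m."""
    return (isqrt(8 * m + 9) - 3) // 2

gains = [bvb_threshold(m) - improved_threshold(m) for m in range(1, 1001)]
print(set(gains), gains.count(1))   # {0, 1} 957
```

The gain is zero exactly when m = k(k+3)/2 for some integer k ≥ 1, which becomes rare as m grows.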


Optimal rank for the Burer-Monteiro factorization 20 / 28

Theorem (Quasi-optimality of the previous result)

Let r₀ = min{rank(X) : A(X) = b, X ⪰ 0}. Under suitable hypotheses, if

$$p \le \sqrt{2m + \left(r_0 + \tfrac12\right)^2} - \left(r_0 + \tfrac12\right),$$

then there exists a set of matrices C with non-zero Lebesgue measure such that:

1. The global minimizer has rank r₀.

2. There is a second-order critical point that is not a global minimizer.
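The bound on p in this theorem is again a dimension count; this derivation is an added illustration.

$$
\begin{aligned}
p \le \sqrt{2m + \left(r_0 + \tfrac12\right)^2} - \left(r_0 + \tfrac12\right)
&\iff \left(p + r_0 + \tfrac12\right)^2 \le 2m + \left(r_0 + \tfrac12\right)^2\\
&\iff p^2 + (2r_0 + 1)\,p \le 2m\\
&\iff \frac{p(p+1)}{2} + r_0\, p \le m .
\end{aligned}
$$

The left-hand side of the last inequality is dim(Sym_p × ℝ^{r₀×p}). For r₀ = 1 it equals p(p+3)/2, and p(p+3)/2 ≤ m is exactly p ≤ ⌊√(2m + 9/4) − 3/2⌋, the complement of the condition in the improved positive result.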


Optimal rank for the Burer-Monteiro factorization 21 / 28

Comments

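• In most applications, r₀ is small, possibly r₀ = 1.

• We have the following picture, as p increases:

  – p ≤ r₀ − 1: no feasible V exists, since every feasible X has rank at least r₀;
  – r₀ ≤ p ≤ ⌊√(2m + (r₀ + 1/2)²) − (r₀ + 1/2)⌋: Riemannian optimization cannot be certified correct;
  – ⌊√(2m + (r₀ + 1/2)²) − (r₀ + 1/2)⌋ < p ≤ ⌊√(2m + 9/4) − 3/2⌋: ?
  – p > ⌊√(2m + 9/4) − 3/2⌋: Riemannian optimization works.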
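The regimes above can be computed exactly with integer square roots. The sketch below, with hypothetical function names and labels, classifies a factorization rank p given m and r₀. It assumes the "suitable hypotheses" hold, and labels p ≤ r₀ − 1 as infeasible because VVᵀ then has rank below r₀, the minimal rank of a feasible X.

```python
from math import isqrt

def lower_threshold(m: int, r0: int) -> int:
    """floor(sqrt(2m + (r0 + 1/2)^2) - (r0 + 1/2)): largest p with p(p+1)/2 + r0*p <= m."""
    k = 2 * r0 + 1
    return (isqrt(8 * m + k * k) - k) // 2

def upper_threshold(m: int) -> int:
    """floor(sqrt(2m + 9/4) - 3/2)."""
    return (isqrt(8 * m + 9) - 3) // 2

def regime(m: int, r0: int, p: int) -> str:
    if p <= r0 - 1:
        return "no feasible V (rank too small)"
    if p <= lower_threshold(m, r0):
        return "cannot be certified correct"
    if p <= upper_threshold(m):
        return "open (?)"
    return "works (for almost all C)"

print(regime(1000, 1, 43), "|", regime(1000, 1, 44), "|", regime(1000, 2, 43))
```

With r₀ = 1 the two thresholds coincide, so the "open" regime is empty; it only appears for r₀ ≥ 2.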


Optimal rank for the Burer-Monteiro factorization 22 / 28

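Technical comment: “under suitable hypotheses”

There must exist U₀ ∈ ℝ^{n×r₀} and V ∈ ℝ^{n×p} such that

$$\mathcal{A}(U_0 U_0^T) = \mathcal{A}(VV^T) = b,$$

and

$$\psi_V : (T, R) \in \mathrm{Sym}_p \times \mathbb{R}^{r_0 \times p} \mapsto \mathcal{A}\left( \begin{pmatrix} V & U_0 \end{pmatrix} \begin{pmatrix} T \\ R \end{pmatrix} V^T + V \begin{pmatrix} T \\ R \end{pmatrix}^T \begin{pmatrix} V & U_0 \end{pmatrix}^T \right) \in \mathbb{R}^m$$

is injective.

Because dim(Sym_p × ℝ^{r₀×p}) = p(p+1)/2 + r₀p ≤ m = dim(ℝ^m) under the bound on p, this condition is a priori generically satisfied.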
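A numerical sketch of the injectivity condition in the MaxCut-type case A(X) = diag(X), so that m = n and r₀ = 1, with U₀ a ±1 vector. It builds the matrix of ψ_V on a basis of Sym_p × ℝ^{r₀×p} for one random V with unit-norm rows and checks that its rank equals p(p+1)/2 + r₀p. The sizes and random data are illustrative assumptions; a single random instance is evidence for genericity, not a proof.

```python
import numpy as np

def psi(V, U0, T, R):
    """psi_V(T, R) = diag(W V^T + V W^T) with W = [V U0] [T; R] = V T + U0 R."""
    W = V @ T + U0 @ R
    return np.diag(W @ V.T + V @ W.T)

def psi_matrix(V, U0):
    """Matrix of psi_V, one column per basis element of Sym_p x R^{r0 x p}."""
    n, p = V.shape
    r0 = U0.shape[1]
    cols = []
    for a in range(p):                      # basis of Sym_p: E_aa and E_ab + E_ba
        for b in range(a, p):
            T = np.zeros((p, p))
            T[a, b] = T[b, a] = 1.0
            cols.append(psi(V, U0, T, np.zeros((r0, p))))
    for a in range(r0):                     # basis of R^{r0 x p}
        for b in range(p):
            R = np.zeros((r0, p))
            R[a, b] = 1.0
            cols.append(psi(V, U0, np.zeros((p, p)), R))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, p, r0 = 20, 4, 1
V = rng.standard_normal((n, p))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # diag(V V^T) = 1
U0 = rng.choice([-1.0, 1.0], size=(n, r0))       # diag(U0 U0^T) = 1
M = psi_matrix(V, U0)
dim_domain = p * (p + 1) // 2 + r0 * p           # 10 + 4 = 14 <= m = 20
print(M.shape, np.linalg.matrix_rank(M) == dim_domain)
```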


Optimal rank for the Burer-Monteiro factorization 23 / 28

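Example: MaxCut relaxations

Original SDP:

$$
\begin{aligned}
\text{minimize}\quad & \operatorname{Trace}(CX),\\
\text{such that}\quad & \operatorname{diag}(X) = \mathbf{1},\\
& X \succeq 0.
\end{aligned}
$$

Burer-Monteiro factorization:

$$
\begin{aligned}
\text{minimize}\quad & \operatorname{Trace}(CVV^T),\\
\text{such that}\quad & \operatorname{diag}(VV^T) = \mathbf{1},\ V \in \mathbb{R}^{n\times p}.
\end{aligned}
$$

• In this case, r₀ = 1.

• The “suitable hypotheses” are satisfied.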


Optimal rank for the Burer-Monteiro factorization 24 / 28

Example: MaxCut relaxations

• For almost all C, if

$$p > \left\lfloor \sqrt{2m + \tfrac94} - \tfrac32 \right\rfloor,$$

no bad second-order critical point exists: Riemannian optimization algorithms work.

• If

$$p \le \left\lfloor \sqrt{2m + \tfrac94} - \tfrac32 \right\rfloor,$$

bad second-order critical points may exist, even when there is a rank-1 solution: Riemannian algorithms cannot be certified correct without additional assumptions on C.
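For MaxCut, the constraint diag(X) = 1 consists of m = n scalar equations, so the critical factorization rank depends only on the graph size n. The sketch below, an added illustration with a hypothetical function name, computes the smallest certified p, i.e. the smallest p with p(p+3)/2 > n, for a few sizes and checks that p − 1 falls in the uncertified range.

```python
from math import isqrt

def maxcut_critical_rank(n: int) -> int:
    """Smallest p with p > floor(sqrt(2n + 9/4) - 3/2), i.e. p(p+3)/2 > n (m = n for MaxCut)."""
    threshold = (isqrt(8 * n + 9) - 3) // 2
    return threshold + 1

for n in (100, 1000, 10000):
    p = maxcut_critical_rank(n)
    # p is certified (for almost all C); p - 1 is in the range where bad critical points may exist.
    print(n, p, p * (p + 3) // 2 > n, (p - 1) * (p + 2) // 2 <= n)
```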


Open questions 25 / 28

Burer-Monteiro factorization: summary

• [Literature] In particular cases, with strong statistical assumptions on C, the Burer-Monteiro factorization works as soon as p = r_opt or p = r_opt + 1.

• [Our result] There are matrices C for which it can fail unless p ≳ √(2m), even if r_opt = O(1).

• [Empirically] The Burer-Monteiro factorization usually works for p = O(r_opt).


Open questions 26 / 28

→ Apparently, the matrices we have constructed, for which the Burer-Monteiro factorization admits bad second-order critical points, are somewhat pathological and are not encountered in practice.


Open questions 27 / 28

Questions

• Compute the volume, in the space of cost matrices, of the set of matrices for which bad second-order critical points exist, as a function of n and p?

• Develop guarantees for the Burer-Monteiro factorization under assumptions on C, but only mild ones?

[Intermediate between very specific settings, for which we have strong guarantees, and the general case, where guarantees only hold for p ≳ √(2m).]


28 / 28

Thank you!

I. Waldspurger and A. Waters (2018). Rank optimality for the Burer-Monteiro factorization. arXiv preprint arXiv:1812.03046.

