Source: laurent.risser.free.fr › TMP_SHARE › optimCIMI2018...
Date posted: 24-Jun-2020

CONSTRAINED LOW-RANK MATRIX (AND TENSOR) ESTIMATION. Lenka Zdeborová (IPhT, CEA Saclay, France), with T. Lesieur, F. Krzakala; proofs with J. Xu, J. Barbier, N. Macris, M. Dia, M. Lelarge, L. Miolane.
Transcript
Page 1:

CONSTRAINED LOW-RANK MATRIX (AND TENSOR) ESTIMATION

Lenka Zdeborová (IPhT, CEA Saclay, France)

with T. Lesieur, F. Krzakala; Proofs with J. Xu, J. Barbier, N. Macris, M. Dia, M. Lelarge, L. Miolane.

Page 2:

LET’S PLAY A GAME

[Figure: N = 15 people, each holding a hidden card, +1 or −1.]

Page 3:

LET’S PLAY A GAME

[Figure: the 15 people with their ±1 cards shown.]

Page 4:

LET’S PLAY A GAME

[Figure: the same group with their ±1 cards.]

Page 5:

LET’S PLAY A GAME

[Figure: two people compare their hidden cards, +1 and −1.]

• Generate a random Gaussian variable Z (zero mean and variance Δ).

• Report:

‣ Y = Z + 1/√N if the cards were the same.

‣ Y = Z − 1/√N if the cards were different.

Page 6:

LET’S PLAY A GAME

[Figure: the N = 15 people with their ±1 cards.]

• Each pair (ij) reports:

‣ Y_ij = Z_ij + 1/√N if the cards are the same.

‣ Y_ij = Z_ij − 1/√N if the cards are different.

with Z_ij ~ N(0, Δ).

Collect Y_ij for every pair (ij): Y = {Y_ij}_{i<j}.

Goal: recover the cards (up to symmetry) purely from the knowledge of Y.

Page 7:

HOW TO SOLVE THIS?

Eigen-decomposition of Y (aka PCA): minimises Σ_{i<j} (Y_ij − Ŷ_ij)² over rank(Ŷ) = 1.

x_PCA (leading eigenvector of Y) estimates x* (up to a sign).

True values of cards: Y_ij = (1/√N) x*_i x*_j + Z_ij, with Z_ij ~ N(0, Δ), x*_i ∈ {−1, +1}, x* ∈ {−1, +1}^N.

BBP phase transition:

‣ Δ < 1: |x_PCA · x*| > 0

‣ Δ > 1: x_PCA · x* ≈ 0

Page 8:

MAIN QUESTIONS

What is the minimal achievable estimation error on x*?

(Is it possible to do better than PCA?)

What is the minimal efficiently achievable estimation error on x*?

Page 9:

BAYESIAN INFERENCE

P(x|Y) = P(x) P(Y|x) / P(Y)

Values of cards: x_i ∈ {−1, +1}, x ∈ {−1, +1}^N.

Posterior distribution:

P(x|Y) = (1/Z(Y, Δ)) ∏_{i=1}^N [δ(x_i + 1) + δ(x_i − 1)] ∏_{i<j} e^{−(Y_ij − x_i x_j/√N)²/(2Δ)}

Bayes-optimal inference = computation of marginals (the argmax maximizes the number of correctly assigned values, the mean of the marginals minimises the mean-squared error).

Page 10:

IN THIS TALK

Bayes-optimal inference for generic prior and output.

Generate ground truth x_i* from P_X. Generate Y_ij from P_out. Goal: infer x* from Y.

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

or

P(u, v|Y) = (1/Z(Y)) ∏_{i=1}^N P_U(u_i) ∏_{j=1}^M P_V(v_j) ∏_{i,j} P_out(Y_ij | u_i^T v_j / √N)

or

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i_1<...<i_p} P_out(Y_{i_1...i_p} | (√((p−1)!)/N^{(p−1)/2}) x_{i_1} ... x_{i_p})

Page 11:

Another example: stochastic block model (dense).

r-valued cards: P_X(x) = (1/r) Σ_{k=1}^r δ(x − e_k), x ∈ ℝ^r, e_k^T = (0, ..., 0, 1, 0, ..., 0).

Y_ij is the adjacency matrix of a graph:

P_out(Y_ij = 1 | x_i^T x_j/√N) = p_out + (μ/√N) x_i^T x_j

P_out(Y_ij = 0 | x_i^T x_j/√N) = 1 − p_out − (μ/√N) x_i^T x_j

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

Page 12:

More examples:

Submatrix localization.

Z_2 synchronization.

Planted spin glass (Ising/spherical/vectorial).

Spiked Wigner models.

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

Page 13:

ASYMMETRIC CASE

Gaussian mixture clustering.

Biclustering.

Dawid-Skene model for crowdsourcing.

Johnstone’s spiked covariance model.

Restricted Boltzmann machine with random weights.

P(u, v|Y) = (1/Z(Y)) ∏_{i=1}^N P_U(u_i) ∏_{j=1}^M P_V(v_j) ∏_{i,j} P_out(Y_ij | u_i^T v_j / √N)

Page 14:

TENSOR ESTIMATION

Spiked tensor model (Richard, Montanari, NIPS’14)

Hyper-graph clustering

Tensor completion.

Sub-tensor localisation.

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i_1<...<i_p} P_out(Y_{i_1...i_p} | (√((p−1)!)/N^{(p−1)/2}) x_{i_1} ... x_{i_p})

Page 15:

OUR RESULTS

In the limit N → ∞, M/N = α = O(1), we compute rigorously the minimum mean-squared error

MMSE = (1/N) Σ_{i=1}^N (x*_i − x̂_i)²,   with x̂_i = Σ_x x_i P(x|Y)

for

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

or the asymmetric or tensor versions.

Message-passing algorithm that is asymptotically optimal outside a sharply delimited “hard” region of parameters.
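For very small N the posterior mean x̂_i can be computed by brute-force enumeration of all 2^N configurations, which makes the symmetry caveat concrete: the full posterior mean vanishes identically (the measure is invariant under x → −x), while conditioning on the sign of one spin recovers a correlated estimate. A toy sketch for the Gaussian channel; the conditioning trick and all parameter values are our own illustrative choices, not from the slides:

```python
import numpy as np

N, delta = 10, 0.1
rng = np.random.default_rng(1)
xstar = rng.choice([-1.0, 1.0], size=N)
Z = np.triu(rng.normal(0.0, np.sqrt(delta), size=(N, N)), 1)
Y = np.outer(xstar, xstar) / np.sqrt(N) + Z + Z.T
np.fill_diagonal(Y, 0.0)

# All 2^N configurations x in {-1,+1}^N, one per row
k = np.arange(2 ** N)
X = 1.0 - 2.0 * ((k[:, None] >> np.arange(N)) & 1)

# log P(x|Y) up to a constant: expanding the Gaussian likelihood and using
# x_i^2 = 1, the x-dependent part is (1/(2 delta sqrt(N))) x^T Y x  (planted SK form)
logw = np.einsum('ki,ij,kj->k', X, Y, X) / (2.0 * delta * np.sqrt(N))
w = np.exp(logw - logw.max())
w /= w.sum()

mean_full = w @ X                # vanishes exactly, by the x -> -x symmetry
mask = X[:, 0] == xstar[0]       # break the symmetry: condition on x_1 = x*_1
wc = w * mask
wc /= wc.sum()
mean_cond = wc @ X               # informative marginal means
overlap = float(mean_cond @ xstar) / N
```

The exact cancellation of `mean_full` is the reason the slides speak of recovery "up to symmetry": without gauge fixing, the Bayes estimator of x itself is trivial, while functions invariant under the sign flip (such as x_i x_j) are estimated well.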

Page 16:

COMMENTS

Limit N → ∞, M/N = α = O(1): high-dimensional statistics. Rank = O(1).

Regime of MSE: when is the MSE better than a random pick from the prior, and by how much? A statistician would perhaps rather ask how fast the MSE goes to zero.

When we talk about sparsity, we mean a finite fraction of non-zeros. In most statistics works the number of non-zeros is o(N).

The noise and spikes are iid. This does not describe most real data, but it allows a precise analysis of optimality and of many algorithms, with intriguing behaviour (phase transitions).

Page 17:

How do we compute the Bayes-optimal performance?

Map to a spin glass: Y → J, x_i → S_i.

Page 18:

BACK TO THE CARD GAME

S_i ∈ {−1, +1}

P(S|J) = (1/Z(J, Δ)) ∏_{i<j} e^{−(J_ij − S_i S_j/√N)²/(2Δ)} ∝ e^{(1/(Δ√N)) Σ_{i<j} J_ij S_i S_j}

Boltzmann measure of a mean-field Ising spin glass (Sherrington-Kirkpatrick ’75 model).

J_ij conditioned on S_i*: planted disorder.

Page 19:

MEAN-FIELD SPIN GLASS

‣ Mean-field spin glass models are solvable using the non-rigorous replica method / cavity method (Mezard, Parisi, Nishimori, Watkin, Nadal, Sompolinsky, many many others, 70s-80s).

‣ For S_i ∈ {−1, +1} (Ising spins), de Almeida, Thouless ’78:

[Figure: sketch of the behaviour as a function of √Δ, with a transition at Δ = Δ*.]

Page 20:

MEAN-FIELD SPIN GLASS

‣ Mean-field spin glass models are solvable using the non-rigorous replica method / cavity method (Mezard, Parisi, Nishimori, Watkin, Nadal, Sompolinsky, many many others, 70s-80s).

‣ For S_i ∈ {−1, +1} (Ising spins):

[Figure: 1 − error versus √Δ; the error is 0.5 (random guessing) above Δ = Δ* and drops below 0.5 for Δ < Δ*.]

Page 21:

LET’S JUMP ~40 YEARS FORWARD:

MAIN RESULTS

Page 22:

DEFINITIONS:

Fisher-score matrix: S_ij ≡ ∂ log P_out(y_ij|w)/∂w |_{y_ij, w=0}

Fisher information: 1/Δ ≡ E_{P_out(y|w=0)}[ ( ∂ log P_out(y|w)/∂w |_{y, w=0} )² ]

P(x; A, B) = (1/Z(A, B)) P_X(x) exp( B^T x − x^T A x / 2 ),   A ∈ ℝ^{r×r}, B ∈ ℝ^r, x ∈ ℝ^r

f(A, B) ≡ E_{P(x;A,B)}[x]

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)
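For a discrete prior, the input function f(A, B) (the mean of the tilted measure P(x; A, B)) is just a weighted sum over the support points. A scalar (r = 1) sketch, with the ±1 card prior as a sanity check: there the A x²/2 term is constant on the support, so f(A, B) = tanh(B). The sparse-prior example and all numeric values are our own illustrations:

```python
import numpy as np

def f_mean(A, B, support, weights):
    """Mean of the tilted scalar measure P(x; A, B) ~ P_X(x) exp(B x - A x^2 / 2)
    for a discrete prior P_X given by support points and weights (r = 1)."""
    s = np.asarray(support, dtype=float)
    logw = np.log(np.asarray(weights, dtype=float)) + B * s - 0.5 * A * s ** 2
    w = np.exp(logw - logw.max())   # subtract max for numerical stability
    return float(w @ s / w.sum())

# +-1 card prior: the A-term is constant on {-1,+1}, so f(A, B) = tanh(B)
pm1 = f_mean(A=0.7, B=1.3, support=[-1.0, 1.0], weights=[0.5, 0.5])

# sparse Rademacher prior rho/2 [delta(x-1) + delta(x+1)] + (1-rho) delta(x)
rho = 0.2
sparse = f_mean(A=0.7, B=1.3, support=[-1.0, 0.0, 1.0],
                weights=[rho / 2, 1 - rho, rho / 2])
```

The mass at zero in the sparse prior shrinks the estimate toward 0, which is exactly how the prior enters the AMP iteration later in the talk.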

Page 23:

THEOREMS:

Theorem 1: (1/N) log Z(Y) concentrates around the maximum over M ∈ ℝ^{r×r} of

φ(M) = E_{x,w}[ log Z( M/Δ, (M/Δ) x + √(M/Δ) w ) ] − Tr(M M^T)/(4Δ)

with x ~ P_X(x) (x ∈ ℝ^r) and w ~ N(0, I_r).

φ(M) = replica symmetric free energy.

Why is this useful? When N ≫ 1, the rN-dimensional problem

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

reduces to an r-dimensional one.
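For the rank-1 ±1 prior with Gaussian channel, Z(A, B) = e^{−A/2} cosh(B), so φ becomes a scalar function that can simply be maximised on a grid; the maximiser reproduces the Δ < 1 / Δ > 1 picture from the BBP slide. A sketch using Gauss-Hermite quadrature for the expectation over w (quadrature order and grid ranges are our choices):

```python
import numpy as np

nodes, gh_w = np.polynomial.hermite.hermgauss(60)  # weight exp(-t^2), sum(gh_w) = sqrt(pi)

def phi(m, delta):
    """Scalar replica-symmetric free energy for the rank-1 +-1 prior:
    phi(m) = E_w[log cosh(m/D + sqrt(m/D) w)] - m/(2D) - m^2/(4D),  w ~ N(0,1).
    (log Z(A,B) = log cosh(B) - A/2; by prior symmetry the x-average is taken at x = +1.)"""
    A = m / delta
    B = A + np.sqrt(A) * np.sqrt(2.0) * nodes       # N(0,1) via t -> sqrt(2) t
    Elogcosh = gh_w @ np.logaddexp(B, -B) / np.sqrt(np.pi) - np.log(2.0)
    return Elogcosh - m / (2.0 * delta) - m ** 2 / (4.0 * delta)

grid = np.linspace(0.0, 1.0, 401)

def m_star(delta):
    return grid[np.argmax([phi(m, delta) for m in grid])]

m_easy = m_star(0.5)   # Delta < 1: nontrivial maximiser, MMSE = 1 - m* < 1
m_hard = m_star(1.5)   # Delta > 1: maximiser at m = 0, MMSE = 1
```

Theorem 2 then reads MMSE = E[x²] − m* = 1 − m* for this prior: below Δ = 1 estimation beats random guessing, above it the maximiser sits at m = 0.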

Page 24:

THEOREMS:

Theorem 1: (1/N) log Z(Y) concentrates around the maximum over M ∈ ℝ^{r×r} of

φ(M) = E_{x,w}[ log Z( M/Δ, (M/Δ) x + √(M/Δ) w ) ] − Tr(M M^T)/(4Δ),   x ~ P_X(x), w ~ N(0, I_r).

Theorem 2: MMSE = Tr[ E_x(x x^T) − argmax φ(M) ]

Proofs: Korada, Macris ’10; Krzakala, Xu, LZ, ITW’16; Barbier, Dia, Macris, Krzakala, Lesieur, LZ, NIPS’16; more elegant: Lelarge, Miolane ’16; El Alaoui, Krzakala ’17.

Page 25:

FREE ENERGY FOR THE ASYMMETRIC CASE

P(u, v|Y) = (1/Z(Y)) ∏_{i=1}^N P_U(u_i) ∏_{j=1}^M P_V(v_j) ∏_{i,j} P_out(Y_ij | u_i^T v_j / √N)

φ(M_u, M_v) = E_{u,w}[ log Z_u( αM_v/Δ, (αM_v/Δ) u + √(αM_v/Δ) w ) ] + α E_{v,w}[ log Z_v( M_u/Δ, (M_u/Δ) v + √(M_u/Δ) w ) ] − α Tr(M_v M_u^T)/(2Δ)

Conjectured: Lesieur, Krzakala, LZ ’15. Proof: Miolane ’17.

Page 26:

FREE ENERGY FOR THE TENSOR CASE

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i_1<...<i_p} P_out(Y_{i_1...i_p} | (√((p−1)!)/N^{(p−1)/2}) x_{i_1} ... x_{i_p})

For rank = 1:

φ(M) = E_{x,w}[ log Z( M^{p−1}/Δ, (M^{p−1}/Δ) x + √(M^{p−1}/Δ) w ) ] − M^p (p−1)/(2pΔ)

Proof (r=1): Lesieur, Miolane, Lelarge, Krzakala, LZ ’17. General rank: Barbier, Macris, Miolane ’17.

Page 27:

KEY PROOF INGREDIENTS

Guerra’s interpolation (from N independent scalar denoising problems) + …

Page 28:

MAIN QUESTIONS

What is the minimal achievable estimation error on x*?

(Is it possible to do better than PCA?)

What is the minimal efficiently achievable estimation error on x*?

Page 29:

APPROXIMATE MESSAGE PASSING

AMP algorithm estimates means and variances of the marginals of

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(y_ij | x_i^T x_j / √N):

B_i^t = (1/√N) Σ_{l=1}^N S_il a_l^t − (1/Δ) ( (1/N) Σ_{l=1}^N v_l^t ) a_i^{t−1}

A^t = (1/(NΔ)) Σ_{l=1}^N a_l^t (a_l^t)^T

a_i^{t+1} = f(A^t, B_i^t)

v_i^{t+1} = ∂_B f(A^t, B_i^t)

Thouless, Anderson, Palmer ’77; Rangan, Fletcher ’12; Matsushita, Tanaka ’14; Deshpande, Montanari ’14; Lesieur, Krzakala, LZ ’15 and ’16.
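For the ±1 card game with Gaussian noise the iteration above becomes fully explicit: the Fisher score is S = Y/Δ, f(A, B) = tanh(B) (the A-term drops for ±1 variables), and ∂_B f = 1 − tanh². A sketch; the size N, noise level Δ, iteration count, and initialisation scale are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
N, delta, T = 2000, 0.5, 100

# Planted instance: Y_ij = x*_i x*_j / sqrt(N) + Z_ij, Z_ij ~ N(0, delta)
xstar = rng.choice([-1.0, 1.0], size=N)
Z = np.triu(rng.normal(0.0, np.sqrt(delta), size=(N, N)), 1)
Y = np.outer(xstar, xstar) / np.sqrt(N) + Z + Z.T
np.fill_diagonal(Y, 0.0)

S = Y / delta                     # Fisher score of the Gaussian channel
a = 0.1 * rng.normal(size=N)      # small random init breaks the global sign symmetry
v = np.ones(N)
a_prev = np.zeros(N)

for _ in range(T):
    B = (S @ a) / np.sqrt(N) - (v.mean() / delta) * a_prev  # Onsager correction
    a_prev = a
    a = np.tanh(B)                # f(A, B) for the +-1 prior (A cancels since x^2 = 1)
    v = 1.0 - a ** 2              # dF/dB: posterior variance of each marginal

overlap_amp = abs(a @ xstar) / N
```

With Δ = 0.5 < 1 the overlap settles near the nontrivial fixed point of the state evolution (around 0.6-0.7 here), matching MSE_AMP = 1 − m on the next slides.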

Page 30:

DEFINITIONS:

Fisher-score matrix: S_ij ≡ ∂ log P_out(y_ij|w)/∂w |_{y_ij, w=0}

Fisher information: 1/Δ ≡ E_{P_out(y|w=0)}[ ( ∂ log P_out(y|w)/∂w |_{y, w=0} )² ]

P(x; A, B) = (1/Z(A, B)) P_X(x) exp( B^T x − x^T A x / 2 )

f(A, B) ≡ E_{P(x;A,B)}[x]

P(x|Y) = (1/Z(Y)) ∏_{i=1}^N P_X(x_i) ∏_{i<j} P_out(Y_ij | x_i^T x_j / √N)

Page 31:

STATE EVOLUTION

Characterisation of the AMP via a matrix order parameter:

M^t ≡ (1/N) Σ_{i=1}^N a_i^t (x*_i)^T ∈ ℝ^{r×r}

M^{t+1} = E_{x,w}[ f( M^t/Δ, (M^t/Δ) x + √(M^t/Δ) w ) x^T ],   x ~ P_X(x), w ~ N(0, I_r)

MSE_AMP = Tr[ E_x(x x^T) − M_AMP ]

Observation: stationary points of φ(M) are fixed points of the state evolution.

Proof: Rangan, Fletcher ’12; Javanmard, Montanari ’12; Deshpande, Montanari ’14.

Page 32:

BOTTOM LINE

φ(M) = E_{x,w}[ log Z( M/Δ, (M/Δ) x + √(M/Δ) w ) ] − Tr(M M^T)/(4Δ)

MMSE is given by the global maximum of the free energy:

MMSE = Tr[ E_x(x x^T) − argmax φ(M) ]

AMP’s MSE is given by the local maximum of the free energy reached by gradient descent starting from small M (large MSE):

MSE_AMP = Tr[ E_x(x x^T) − M_AMP ]

[Figure: free energy versus M, with the local maximum M_AMP and the global maximum argmax φ(M) marked.]

Page 33:

ZOOLOGY OF FIXED POINTS (FOR MATRIX ESTIMATION)

Zero-mean prior, E_X(x) = 0:

SE always has a “trivial” fixed point M = 0.

Stability of the trivial fixed point: linearising the state evolution around M = 0 gives, for r = 1,

δM^{t+1} = ( [E_X(x²)]² / Δ ) δM^t,

so M = 0 is stable for Δ > [E_X(x²)]². This is the same as the spectral phase transition of the Fisher-score matrix (Edwards ’68, known as the BBP ’05 transition).

Non-zero-mean priors, E_X(x) ≠ 0:

MMSE always better than random guessing (spectral methods still have a phase transition).

Multiple fixed points may still exist.

Page 34:

From fixed points to phase transitions:

P_X(x_i) = (ρ/2)[δ(x_i − 1) + δ(x_i + 1)] + (1 − ρ) δ(x_i)

[Figure: accuracy as a function of the noise Δ (two panels).]

Page 35:

ALGORITHMIC INTERPRETATION

P_X(x_i) = (ρ/2)[δ(x_i − 1) + δ(x_i + 1)] + (1 − ρ) δ(x_i)

[Figure: accuracy versus noise Δ.]

• Easy by approximate message passing.

• Impossible information-theoretically.

• Hard phase: in the presence of a first-order phase transition.

Conjecture: no polynomial algorithm works in the hard phase.

- Physically sensible.

- Mathematically wide open.

Page 36:

Phase diagram:

P_X(x_i) = (ρ/2)[δ(x_i − 1) + δ(x_i + 1)] + (1 − ρ) δ(x_i)

[Figure: phase diagram with easy, hard, and impossible regions.]

Page 37:

HARD PHASE IN NATURE

Metastable diamond = high error. Equilibrium graphite = low error. Algorithms are stuck at high error for exponential time.

Page 38:

MAIN QUESTIONS

What is the minimal achievable estimation error on x*?

(Is it possible to do better than PCA?)

What is the minimal efficiently achievable estimation error on x*?

Page 39:

From fixed points to phase transitions:

P_X(x_i) = (ρ/2)[δ(x_i − 1) + δ(x_i + 1)] + (1 − ρ) δ(x_i)

[Figure: accuracy as a function of the noise Δ (two panels).]

Page 40:

OPTIMAL SPECTRAL ALGORITHMS

For zero-mean priors, there is a spectral method with the same phase transition as AMP (AMP reaches a lower error).

For noise that is not additive Gaussian, to get the optimal phase transition the spectral algorithm needs to be run on the Fisher-score matrix

S_ij ≡ ∂ log P_out(y_ij|w)/∂w |_{y_ij, w=0}

Page 41:

OPTIMAL PRE-PROCESSING

Exponential additive noise: P_out(y|w) = e^{−|y−w|}/2, Fisher score: S_ij = sign(Y_ij).

Cauchy additive noise: P_out(y|w) = [π(1 + (y−w)²)]^{−1}, Fisher score: S_ij = Y_ij / (1 + Y_ij²).
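The effect of this pre-processing is easy to see numerically on the Cauchy channel: raw PCA on Y is destroyed by the heavy tails, while PCA on the score matrix S works. The sketch below is our own illustration: we add a scale parameter γ to the slide's unit-scale Cauchy (whose Fisher information is 1/2, i.e. effective Δ = 2, above the transition) and take γ = 0.5 so that Δ = 2γ² = 0.5 falls below it; the score is used up to a constant factor, which does not change the eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(7)
N, gamma = 1500, 0.5                  # Cauchy scale gamma -> effective Delta = 2 gamma^2

xstar = rng.choice([-1.0, 1.0], size=N)
noise = np.triu(gamma * rng.standard_cauchy(size=(N, N)), 1)
Y = np.outer(xstar, xstar) / np.sqrt(N) + noise + noise.T
np.fill_diagonal(Y, 0.0)

def top_overlap(M, x):
    """Normalised overlap of the top eigenvector of symmetric M with the +-1 vector x."""
    v = np.linalg.eigh(M)[1][:, -1]   # eigenvector of the largest eigenvalue
    return abs(v @ x) / np.sqrt(len(x))

S = Y / (gamma ** 2 + Y ** 2)         # Fisher score of the Cauchy channel (up to a constant)
overlap_raw = top_overlap(Y, xstar)   # heavy tails: eigenvector localises, no correlation
overlap_score = top_overlap(S, xstar) # bounded entries: the planted spike pops out
```

The raw top eigenvector of Y localises on the largest Cauchy entries (overlap of order 1/√N), while the score matrix has bounded entries and exhibits the usual BBP outlier.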

Page 42:

OTHER EXAMPLES OF PHASE DIAGRAMS

Page 43:

Non-zero mean prior:

P_X(x_i) = (1 − ρ) δ(x_i) + ρ δ(x_i − 1)

[Figure: accuracy versus Δ for ρ = 0.2 and ρ = 0.01, with the algorithmic threshold Δ_Alg marked.]

Page 44:

Non-zero mean prior:

P_X(x_i) = (1 − ρ) δ(x_i) + ρ δ(x_i − 1)

[Figure: accuracy versus Δ; the hard region is marked.]

Page 45:

Stochastic block model, r groups:

P_out(Y_ij = 1 | x_i^T x_j/√N) = p_out + (μ/√N) x_i^T x_j,   Δ = p_out (1 − p_out)/μ²

[Figure: MSE versus Δr² for r = 15, showing AMP run from the planted solution, the SE stable branch, and the SE unstable branch; the thresholds Δ_c = Δ_Alg, Δ_IT, and Δ_Dyn delimit the hard phase.]

For r > 4 a hard phase exists; for r < 4 it does not.

Page 46:

2 groups, different sizes, same average degree. Connection probabilities:

( p_out  p_out ; p_out  p_out ) + (μ/√N) ( (1−ρ)/ρ  −1 ; −1  ρ/(1−ρ) ),   Δ = p_out (1 − p_out)/μ²

P_X(x) = ρ δ( x − √((1−ρ)/ρ) ) + (1 − ρ) δ( x + √(ρ/(1−ρ)) )

ρ_c = 1/2 − 1/√12

For small ρ, the algorithmic and information-theoretic thresholds correspond to group sizes

k_Alg = √N √( p_out/(1 − p_out) ),   k_IT = 4 p_out log(N)/(1 − p_out).

As in the balanced planted clique problem.

[Figure: phase diagram with impossible, hard, and easy regions.]

Page 47:

TENSORS

Page 48:

ZOOLOGY OF FIXED POINTS (FOR TENSOR ESTIMATION)

Zero-mean prior, E_X(x) = 0:

SE has a “trivial” fixed point M = 0, stable for any Δ = Ω(1).

Information-theoretic phase transition at Δ_IT = Ω(1).

Huge hard phase, until Δ = Ω(N^{(2−p)/4}) (e.g. Richard, Montanari ’14).

Non-zero-mean priors, E_X(x) ≠ 0:

Hard phase shrinks back to the Δ = Ω(1) regime.

Take home: in tensor estimation, use your prior!

Page 49:

SPIKED TENSOR (ZERO MEAN SPIKE)

P_X(x) = N(x; 0, 1), p = 3

[Figure: MSE versus Δ. Above Δ_IT no information about the spike is contained in Y; below Δ_IT the problem is GOOD statistically but HARD algorithmically.]

Page 50:

SPIKED TENSOR (NON-ZERO MEAN SPIKE)

P_X(x) = N(x; 0.2, 1), p = 3

[Figure: MSE versus Δ with HARD and EASY regions; above Δ_IT almost no information about the spike is contained in Y.]

Page 51:

PHASE DIAGRAMS, SPIKED TENSORS (p = 3)

P_X(x) = N(x; μ, 1)   and   P_X(x_i) = (1 − ρ) δ(x_i) + ρ δ(x_i − 1)

[Figure: phase diagrams for the two priors.]

Page 52:

CONCLUSION

• Analysis of Bayes optimal inference in low-rank matrix and tensor estimation.

• Approximate message passing, its performance.

• Channel universality. Optimal pre-processing for spectral methods.

• Existence of the hard phase (metastability next to a first order phase transition) for a range of priors.

Page 53:

WORK IN PROGRESS

• Beyond iid priors. Priors coming from another graphical model are also tractable. E.g. optimal generalisation error in neural networks with one small hidden layer.

• Applications of optimal pre-processing for spectral methods: degree corrected stochastic block model. Inference of patterns learned by real biological neural network.

• Nature of the hard phase. Deep connection with the algorithmic barrier of sum-of-squares proofs.

Page 54:

TALK BASED ON

• Lesieur, Krzakala, LZ, Phase transitions in sparse PCA, ISIT’15

• Lesieur, Krzakala, LZ, MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel, Allerton’15.

• Lesieur, De Bacco, Banks, Krzakala, Moore, LZ, Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering, Allerton’16

• Krzakala, Xu, LZ, Mutual information in rank-one matrix estimation, ITW’16

• Barbier, Dia, Macris, Krzakala, Lesieur, LZ Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula, NIPS’16

• Lesieur, Krzakala, LZ, Constrained Low-rank Matrix Estimation: Phase Transitions, Approximate Message Passing and Applications, J. Stat. Mech.’17

• Lesieur, Miolane, Lelarge, Krzakala, LZ, Statistical and computational phase transitions in spiked tensor estimation, ISIT’17

