Page 1

Accelerating Iterative Hard Thresholding for Low-rank Matrix Completion via Adaptive Restart

Trung Vu and Raviv Raich

School of EECS, Oregon State University, Corvallis, OR 97331-5501, USA

{vutru,raich}@oregonstate.edu

ICASSP 2019, May 16, 2019


Page 2

Outline

1 Problem Formulation

2 Background

3 Main Results

4 Conclusions and Future Work


Page 3

The Netflix Prize Problem

Users rate movies, and most ratings are unobserved (rows: users, columns: movies):

$$\begin{bmatrix} 4 & ? & ? \\ ? & ? & 4 \\ ? & 2 & ? \\ 4 & ? & 4 \end{bmatrix}$$

A partially known rating matrix M ∈ R^{m×n} with rank(M) ≤ r.


Page 4

Low-Rank Matrix Completion Problem

Given the observed entries and the rank bound r = 1, the matrix completes to a rank-1 matrix with an explicit SVD:

$$
\underbrace{\begin{bmatrix} 4 & ? & ? \\ ? & ? & 4 \\ ? & 2 & ? \\ 4 & ? & 4 \end{bmatrix}}_{M}
\;\xrightarrow{\text{given } r=1}\;
\underbrace{\begin{bmatrix} 4 & 2 & 4 \\ 4 & 2 & 4 \\ 4 & 2 & 4 \\ 4 & 2 & 4 \end{bmatrix}}_{X^*}
\;\overset{\text{SVD}}{=}\;
\begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix}
\cdot 12 \cdot
\begin{bmatrix} 2/3 & 1/3 & 2/3 \end{bmatrix}
$$

find X_ij, (i, j) ∈ S^c

subject to rank(X) ≤ r and X_ij = M_ij for (i, j) ∈ S

(with r < n ≤ m)

Page 5

Notations

Sampling operator X_S:

$$
[X_\mathcal{S}]_{ij} = \begin{cases} X_{ij} & \text{if } (i,j) \in \mathcal{S} \\ 0 & \text{if } (i,j) \in \mathcal{S}^c \end{cases}
\qquad
\begin{bmatrix} 4 & 2 & 4 \\ 4 & 2 & 4 \\ 4 & 2 & 4 \\ 4 & 2 & 4 \end{bmatrix}
\xrightarrow{\;\mathcal{S}\;}
\begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 4 \\ 0 & 2 & 0 \\ 4 & 0 & 4 \end{bmatrix}
$$

Row-selection matrix S_(S) ∈ R^{s×mn} corresponding to S (here vec(X) stacks the rows of X):

$$
\underbrace{\begin{bmatrix}
1&0&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&1&0&0&0&0&0&0\\
0&0&0&0&0&0&0&1&0&0&0&0\\
0&0&0&0&0&0&0&0&0&1&0&0\\
0&0&0&0&0&0&0&0&0&0&0&1
\end{bmatrix}}_{S_{(\mathcal{S})}}
\begin{bmatrix} 4\\2\\4\\4\\2\\4\\4\\2\\4\\4\\2\\4 \end{bmatrix}
=
\begin{bmatrix} 4\\4\\2\\4\\4 \end{bmatrix}
$$
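A minimal numpy sketch of the two operators on this example (the variable names are ours, not the talk's); it uses the same row-major vectorization as the example above:

```python
import numpy as np

X = np.array([[4, 2, 4]] * 4, dtype=float)
mask = np.array([[1, 0, 0],
                 [0, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]], dtype=bool)   # the observed set S

X_S = np.where(mask, X, 0.0)               # [X_S]_ij = X_ij on S, 0 on S^c

# Row-selection matrix S_(S): one indicator row per observed entry of vec(X).
m, n = X.shape
obs = np.flatnonzero(mask.ravel())         # flat (row-major) indices of S
S_mat = np.zeros((obs.size, m * n))
S_mat[np.arange(obs.size), obs] = 1.0

print(S_mat @ X.ravel())                   # [4. 4. 2. 4. 4.], as above
```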


Page 6

The rank-r projection of an arbitrary matrix X ∈ R^{m×n} is obtained by hard-thresholding the singular values of X:

$$P_r(X) = \sum_{i=1}^{r} \sigma_i(X)\, u_i(X)\, v_i(X)^T$$

The SVD of the matrix M can be partitioned into the signal subspace and its orthogonal complement:

$$M = \begin{bmatrix} U_1 & U_2 \end{bmatrix}
\begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix},
\qquad \Sigma_1 \in \mathbb{R}^{r \times r}$$
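In code, P_r is just a truncated SVD. A short sketch (our naming) that the later snippets reuse:

```python
import numpy as np

def P_r(X: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young):
    keep the r largest singular values, zero out the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```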


Page 7

Several Formulations of Low-Rank Matrix Completion

find X_ij, (i, j) ∈ S^c  s.t.  rank(X) ≤ r and X_S = M_S

Convex relaxation (✓ rigorous guarantees, ✗ slow convergence):
- min ‖X‖∗ s.t. X_S = M_S
- min λ‖X‖∗ + (1/2)‖X_S − M_S‖²_F
- min τ‖X‖∗ + (1/2)‖X‖²_F s.t. X_S = M_S

Non-convex (✓ fast convergence, ✗ hard to analyze):
- min rank(X) s.t. X_S = M_S
- min ‖X_S − M_S‖²_F s.t. rank(X) ≤ r   (∗)
- min ‖[XYᵀ]_S − M_S‖²_F over X ∈ R^{m×r}, Y ∈ R^{n×r}

Here ‖X‖∗ = ∑_{i=1}^n σ_i(X) is the nuclear norm.
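For reference, the nuclear norm from the footnote is a one-liner in numpy (a sketch, not code from the talk):

```python
import numpy as np

def nuclear_norm(X: np.ndarray) -> float:
    # ||X||_* = sum of the singular values of X
    return float(np.linalg.svd(X, compute_uv=False).sum())
```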


Page 8

Outline

1 Problem Formulation

2 Background

3 Main Results

4 Conclusions and Future Work


Page 9

Iterative Hard Thresholding for Matrix Completion

$$\min_{X \in \mathbb{R}^{m \times n}} \; \tfrac{1}{2}\|X_\mathcal{S} - M_\mathcal{S}\|_F^2 \quad \text{s.t.} \quad \operatorname{rank}(X) \le r \qquad (*)$$

Iterative hard thresholding (IHT) is a variant of non-convex projected gradient descent:

$$X^{(k+1)} = P_r\!\left(X^{(k)} - \alpha_k\, [X^{(k)} - M]_\mathcal{S}\right)$$

Unlike matrix sensing, the matrix RIP does not hold for matrix completion; the sampling operator only satisfies the trivial bounds

$$0 \cdot \|X\|_F^2 \;\le\; \|[X]_\mathcal{S}\|_F^2 \;\le\; 1 \cdot \|X\|_F^2$$

▸ Global convergence is non-trivial! [Jain, Meka, and Dhillon 2010]
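A hedged sketch of one IHT iteration, reusing the P_r helper from earlier; `mask` is the indicator of the observed set S:

```python
import numpy as np

def iht_step(X, M, mask, r, alpha=1.0):
    """One projected-gradient step: X <- P_r(X - alpha * [X - M]_S)."""
    grad = np.where(mask, X - M, 0.0)   # gradient of 0.5 * ||X_S - M_S||_F^2
    return P_r(X - alpha * grad, r)     # P_r: truncated SVD, sketched earlier
```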


Page 10

Local Convergence of IHT

Algorithm 1 IHTSVD
1: for k = 0, 1, 2, . . . do
2:   X^(k+1) = P_r(Y^(k))
3:   Y^(k+1) = P_{M,S}(X^(k+1))

*P_{M,S}(X) = X_{S^c} + M_S

▸ IHT with unit step size α_k = 1:

$$
\begin{bmatrix} 4&0&0\\0&0&4\\0&2&0\\4&0&4 \end{bmatrix}
\xrightarrow{P_r}
\begin{bmatrix} 2&0&2\\2&0&2\\0&0&0\\4&0&4 \end{bmatrix}
\xrightarrow{P_{M,\mathcal{S}}}
\begin{bmatrix} 4&0&2\\2&0&4\\0&2&0\\4&0&4 \end{bmatrix}
\xrightarrow{P_r} \cdots
$$

Source: [Chunikhina, Raich, and Nguyen 2014]

[ibid.] If σ = σ_min(S_(S^c)(V₂ ⊗ U₂)) > 0, then IHTSVD converges to M locally at a linear rate 1 − σ².
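A compact sketch of Algorithm 1 on the toy example (our code, reusing the P_r helper); recovery here presumes the local-convergence condition σ > 0 holds for this sampling pattern:

```python
import numpy as np

def ihtsvd(M, mask, r, num_iters=200):
    Y = np.where(mask, M, 0.0)          # initialize with Y^(0) = M_S
    for _ in range(num_iters):
        X = P_r(Y, r)                   # rank-r hard thresholding (earlier sketch)
        Y = np.where(mask, M, X)        # P_{M,S}: keep M on S, keep X on S^c
    return Y

M = np.array([[4, 2, 4]] * 4, dtype=float)
mask = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=bool)
print(np.round(ihtsvd(M, mask, r=1), 4))   # expected: the all-[4 2 4] matrix
```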


Page 11

Linearization of the Rank-r Projection

$$P_r(M + \Delta) = M + \Delta - U_2 U_2^T \Delta V_2 V_2^T + O(\|\Delta\|_F^2)$$

The local convergence analysis treats Y^(k) as a perturbation of M:

$$M + E^{(k+1)} = Y^{(k+1)} = P_{M,\mathcal{S}}\!\left(P_r(Y^{(k)})\right) = P_{M,\mathcal{S}}\!\left(P_r(M + E^{(k)})\right)$$

The recursion on the error matrix, E^{(k+1)} = [P_r(M + E^{(k)}) − M]_{S^c}, can be approximated by

$$\underbrace{S_{(\mathcal{S}^c)} \operatorname{vec}(E^{(k+1)})}_{e^{(k+1)}}
= \underbrace{\left(I - S_{(\mathcal{S}^c)}(V_2 \otimes U_2)(V_2 \otimes U_2)^T S_{(\mathcal{S}^c)}^T\right)}_{A}\,
\underbrace{S_{(\mathcal{S}^c)} \operatorname{vec}(E^{(k)})}_{e^{(k)}}$$

Stable if λ_max(A) = 1 − (σ_min(S_(S^c)(V₂ ⊗ U₂)))² < 1.
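A small numerical check of this stability claim under assumed random subspaces (the sizes and names are ours). With B = S_(S^c)(V₂ ⊗ U₂) and |S^c| ≤ (m−r)(n−r), the matrix A = I − BBᵀ satisfies λ_max(A) = 1 − σ_min(B)²:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r matrix
U, _, Vt = np.linalg.svd(M)
U2, V2 = U[:, r:], Vt[r:].T          # orthogonal-subspace bases

# |S^c| = 10 unobserved vec-indices (<= (m-r)(n-r) = 12, the identifiable regime)
Sc = rng.choice(m * n, size=10, replace=False)
B = np.kron(V2, U2)[Sc]              # S_(S^c) (V2 kron U2)
A = np.eye(Sc.size) - B @ B.T

sigma = np.linalg.svd(B, compute_uv=False).min()
print(np.isclose(np.linalg.eigvalsh(A).max(), 1 - sigma**2))   # True
```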


Page 12

[Figure 1 plot omitted: error curves in log scale; the proposed method decays at rate 1 − σ versus the previous rate 1 − σ².]

Figure 1: The distance to the solution (in log scale) as a function of the iteration number for various algorithms, with m = 50, n = 40, r = 3, and s = 1000. All algorithms share the same computational complexity per iteration, O(mnr), except SVT, O(mn²) [Cai, Candes, and Shen 2010], and AM, O(sm²r² + m³r³) [Jain, Netrapalli, and Sanghavi 2013].


Page 13

Outline

1 Problem Formulation

2 Background

3 Main Results

4 Conclusions and Future Work


Page 14

Our Contribution

1 Analyze the local convergence of accelerated IHTSVD for solving the rank-constrained least squares problem (∗).

2 Propose a practical momentum step-size selection that recovers the optimal rate of convergence near the solution.


Page 15

Nesterov’s Accelerated Gradient

Nesterov's Accelerated Gradient (NAG) is a simple modification to gradient descent that provably accelerates convergence:

$$x^{(k+1)} = y^{(k)} - \alpha_k \nabla f(y^{(k)})$$
$$y^{(k+1)} = x^{(k+1)} + \beta_k\,(x^{(k+1)} - x^{(k)})$$

If f is a μ-strongly convex, L-smooth function, NAG improves the linear convergence rate from 1 − μ/L to 1 − √(μ/L) by setting

$$\alpha_k = \frac{1}{L}, \qquad \beta_k = \frac{1 - \sqrt{\mu/L}}{1 + \sqrt{\mu/L}}. \quad \text{[Nesterov 2004]}$$

Iteration complexity: O(√κ), compared to O(κ) for gradient descent, where κ = L/μ is the condition number.
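A self-contained toy comparison of NAG against plain gradient descent on a strongly convex quadratic (our example; μ = 1 and L = 100 are assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((50, 50)))
H = Q @ np.diag(np.linspace(1.0, 100.0, 50)) @ Q.T   # f(x) = 0.5 x^T H x
mu, L = 1.0, 100.0
alpha = 1.0 / L
beta = (1 - np.sqrt(mu / L)) / (1 + np.sqrt(mu / L))

x0 = rng.standard_normal(50)
x, y, x_gd = x0.copy(), x0.copy(), x0.copy()
for _ in range(300):
    x_new = y - alpha * (H @ y)          # gradient step at the lookahead point
    y = x_new + beta * (x_new - x)       # momentum extrapolation
    x = x_new
    x_gd -= alpha * (H @ x_gd)           # plain gradient descent

print(np.linalg.norm(x), np.linalg.norm(x_gd))  # NAG ends orders of magnitude closer to 0
```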


Page 16

The Proposed NAG-IHT

Algorithm 2 NAG-IHT
1: for k = 0, 1, 2, . . . do
2:   X^(k+1) = P_r(Y^(k))
3:   Y^(k+1) = P_{M,S}(X^(k+1) + β_k(X^(k+1) − X^(k)))

Method                         | #Ops./Iter. | Local conv. rate | #Iters. for ε-accuracy
IHTSVD                         | O(mnr)      | 1 − σ²           | (1/σ²) log(1/ε)
NAG-IHT, β_k = (1−σ)/(1+σ)     | O(mnr)      | 1 − σ            | (1/σ) log(1/ε)

∗ σ = σ_min(S_(S^c)(V₂ ⊗ U₂))
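A sketch of Algorithm 2 (our code, reusing P_r from earlier); the locally optimal β = (1−σ)/(1+σ) is taken as known here, which is precisely the practical issue addressed next:

```python
import numpy as np

def nag_iht(M, mask, r, beta, num_iters=200):
    X = np.where(mask, M, 0.0)
    Y = X.copy()
    for _ in range(num_iters):
        X_new = P_r(Y, r)                  # rank-r projection (earlier sketch)
        Z = X_new + beta * (X_new - X)     # momentum extrapolation
        Y = np.where(mask, M, Z)           # P_{M,S}(Z): reset observed entries
        X = X_new
    return X
```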

Page 17

A Practical Method for Step Size Selection

Practical issue: the fast rate requires prior knowledge of global parameters of the objective (β_k = (1−σ)/(1+σ)).

Solution: adaptive restart [O'Donoghue and Candes 2015]
- Use an incremental momentum β_k = (t−1)/(t+2), starting at t = 1.
- Whenever f(x^(k+1)) > f(x^(k)), reset t = 1.

[Figure omitted, reproduced from O'Donoghue and Candes 2015: objective versus iteration (log scale) for gradient descent, NAG with no restart, fixed restarts every 100/400/700/1000 iterations, NAG with the optimal q = μ/L, and the adaptive function and gradient restart schemes.]


Page 18

The Proposed Adaptive Restart Scheme for NAG-IHT

Algorithm 3 ARNAG-IHT
1: t = 1
2: f_0 = ‖X^(0)_S − M_S‖²_F
3: for k = 0, 1, 2, . . . do
4:   X^(k+1) = P_r(Y^(k))
5:   Y^(k+1) = P_{M,S}(X^(k+1) + ((t−1)/(t+2))(X^(k+1) − X^(k)))
6:   f_{k+1} = ‖X^(k+1)_S − M_S‖²_F
7:   if f_{k+1} > f_k then t = 1 else t = t + 1    ▷ function scheme
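Putting the pieces together, a sketch of Algorithm 3 (our code; P_r as before): the momentum grows as (t−1)/(t+2), and t resets whenever the observed-entry residual increases:

```python
import numpy as np

def arnag_iht(M, mask, r, num_iters=300):
    X = np.where(mask, M, 0.0)                 # X^(0) = M_S
    Y = X.copy()
    t = 1
    f_prev = np.sum((X - M)[mask] ** 2)        # f_0 = ||X_S^(0) - M_S||_F^2
    for _ in range(num_iters):
        X_new = P_r(Y, r)
        Y = np.where(mask, M, X_new + ((t - 1) / (t + 2)) * (X_new - X))
        f = np.sum((X_new - M)[mask] ** 2)     # f_{k+1}
        t = 1 if f > f_prev else t + 1         # function restart scheme
        f_prev, X = f, X_new
    return X
```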


Page 19

Numerical Evaluation

[Figure 2 plot omitted: error curves in log scale for the IHT variants (solid) and their theoretical bounds (dashed).]

Figure 2: The distance to the solution (in log scale) as a function of the iteration number for IHT algorithms (solid) and their corresponding theoretical bounds up to a constant (dashed), with m = 50, n = 40, r = 3, and s = 1000. *NAG-IHT with the optimal step size is not applicable in practice.


Page 20

Outline

1 Problem Formulation

2 Background

3 Main Results

4 Conclusions and Future Work


Page 21

Conclusions and Future Work

Conclusions

- Propose Nesterov's Accelerated Gradient for iterative hard thresholding for matrix completion.
- Analyze NAG-IHT with the optimal step size and prove that the iteration complexity improves from O(1/σ²) to O(1/σ) after acceleration.
- Propose adaptive restart for step-size selection when the optimal step size is unknown, recovering the optimal rate of convergence in practice.

Future work

- Extend the local convergence analysis to real-world cases where the underlying matrix is noisy and/or not close to low rank.
- Convergence under a simple initialization suggests a potential analysis of the global convergence of our algorithm.


Page 22

References I

Cai, J.-F., E. Candes, and Z. Shen (2010). "A Singular Value Thresholding Algorithm for Matrix Completion". In: SIAM Journal on Optimization 20.4, pp. 1956–1982.

Chunikhina, E., R. Raich, and T. Nguyen (2014). "Performance analysis for matrix completion via iterative hard-thresholded SVD". In: 2014 IEEE Workshop on Statistical Signal Processing (SSP), pp. 392–395.

Jain, P., R. Meka, and I. Dhillon (2010). "Guaranteed Rank Minimization via Singular Value Projection". In: Advances in Neural Information Processing Systems (NIPS), pp. 937–945.

Jain, P., P. Netrapalli, and S. Sanghavi (2013). "Low-rank Matrix Completion Using Alternating Minimization". In: Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, pp. 665–674.

Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers.

O'Donoghue, B. and E. Candes (2015). "Adaptive Restart for Accelerated Gradient Schemes". In: Foundations of Computational Mathematics 15.3, pp. 715–732.


