Accelerating Iterative Hard Thresholding For Low-rank Matrix Completion Via Adaptive Restart
Trung Vu and Raviv Raich
School of EECS, Oregon State University, Corvallis, OR 97331-5501, USA
{vutru,raich}@oregonstate.edu
May 16, 2019
Trung Vu and Raviv Raich (OSU) ICASSP 2019 May 16, 2019 1 / 22
Outline
1 Problem Formulation
2 Background
3 Main Results
4 Conclusions and Future Work
The Netflix Prize Problem

              Movies
            [ 4  ?  ? ]
    Users   [ ?  ?  4 ]
            [ ?  2  ? ]
            [ 4  ?  4 ]

A partially known rating matrix M ∈ R^{m×n} with rank(M) ≤ r.
Low-Rank Matrix Completion Problem

         M                            X*
    [ 4  ?  ? ]                  [ 4  2  4 ]
    [ ?  ?  4 ]   given r = 1    [ 4  2  4 ]
    [ ?  2  ? ]  ────────────→   [ 4  2  4 ]
    [ 4  ?  4 ]                  [ 4  2  4 ]

with SVD  X* = [1/2, 1/2, 1/2, 1/2]^T · 12 · [2/3, 1/3, 2/3].

    find X_ij, (i,j) ∈ S^c
    subject to rank(X) ≤ r and X_ij = M_ij for (i,j) ∈ S
    (with r < n ≤ m)
Notations

Sampling operator X_S:

    [X_S]_ij = X_ij if (i,j) ∈ S,  0 if (i,j) ∈ S^c

    [ 4  2  4 ]        [ 4  0  0 ]
    [ 4  2  4 ]   S    [ 0  0  4 ]
    [ 4  2  4 ]  ──→   [ 0  2  0 ]
    [ 4  2  4 ]        [ 4  0  4 ]

Row selection matrix S(S) ∈ R^{s×mn} corresponding to S. For the example above (s = 5, mn = 12, row-major vectorization):

    [ 1 0 0 0 0 0 0 0 0 0 0 0 ]                                    [ 4 ]
    [ 0 0 0 0 0 1 0 0 0 0 0 0 ]                                    [ 4 ]
    [ 0 0 0 0 0 0 0 1 0 0 0 0 ] · [4 2 4 4 2 4 4 2 4 4 2 4]^T  =  [ 2 ]
    [ 0 0 0 0 0 0 0 0 0 1 0 0 ]                                    [ 4 ]
    [ 0 0 0 0 0 0 0 0 0 0 0 1 ]                                    [ 4 ]
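The sampling operator and its row-selection view can be sketched in NumPy. This is an illustrative reconstruction of the slides' 4×3 example; the index set and the row-major vectorization are assumptions inferred from the worked numbers:

```python
import numpy as np

# Worked example from the slides: 4x3 matrix with 5 observed entries.
X_star = np.array([[4., 2., 4.],
                   [4., 2., 4.],
                   [4., 2., 4.],
                   [4., 2., 4.]])
S = [(0, 0), (1, 2), (2, 1), (3, 0), (3, 2)]   # observed index set

# Sampling operator X_S: keep observed entries, zero elsewhere.
mask = np.zeros(X_star.shape, dtype=bool)
for (i, j) in S:
    mask[i, j] = True
X_S = np.where(mask, X_star, 0.0)

# Equivalent row-selection view: S(S) @ vec(X) picks the observed
# entries out of the (row-major) vectorized matrix.
m, n = X_star.shape
flat_idx = [i * n + j for (i, j) in S]
S_mat = np.zeros((len(S), m * n))
S_mat[np.arange(len(S)), flat_idx] = 1.0
print(S_mat @ X_star.reshape(-1))   # -> [4. 4. 2. 4. 4.]
```

Both views produce the same five observed values, matching the slide's worked example.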
The rank-r projection of an arbitrary matrix X ∈ R^{m×n} is obtained by hard-thresholding the singular values of X:

    P_r(X) = Σ_{i=1}^r σ_i(X) u_i(X) v_i(X)^T

The SVD of the matrix M can be partitioned based on the signal subspace and its orthogonal complement:

    M = [U1 U2] [Σ1 0; 0 0] [V1^T; V2^T],   Σ1 ∈ R^{r×r}
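The rank-r projection is a truncated SVD; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def rank_r_projection(X, r):
    """P_r(X): hard-threshold the singular values, keeping the top r."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r, :]

# Example: the best rank-1 approximation of a rank-2 matrix.
X = np.arange(12.0).reshape(4, 3)
X1 = rank_r_projection(X, 1)
# X1 has rank 1 and minimizes ||X - X1||_F over all rank-1 matrices.
```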
Several Formulations of Low-Rank Matrix Completion

    find X_ij, (i,j) ∈ S^c  s.t.  rank(X) ≤ r and X_S = M_S

Convex relaxation (✓ rigorous guarantees, ✗ slow convergence):
  - min ‖X‖_* s.t. X_S = M_S
  - min λ‖X‖_* + (1/2)‖X_S − M_S‖_F^2
  - min τ‖X‖_* + (1/2)‖X‖_F^2 s.t. X_S = M_S

Non-convex (✓ fast convergence, ✗ hard to analyze):
  - min rank(X) s.t. X_S = M_S
  - min ‖X_S − M_S‖_F^2 s.t. rank(X) ≤ r   (*)
  - min ‖[XY^T]_S − M_S‖_F^2, X ∈ R^{m×r}, Y ∈ R^{n×r}

Here ‖X‖_* = Σ_{i=1}^n σ_i(X) is the nuclear norm.
Iterative Hard Thresholding for Matrix Completion

    min_{X ∈ R^{m×n}} (1/2)‖X_S − M_S‖_F^2   s.t.   rank(X) ≤ r   (*)

Iterative hard thresholding (IHT) is a variant of non-convex projected gradient descent:

    X^(k+1) = P_r(X^(k) − α_k [X^(k) − M]_S)

Unlike matrix sensing, the matrix RIP does not hold for the matrix completion problem:

    0 · ‖X‖_F^2 ≤ ‖[X]_S‖_F^2 ≤ 1 · ‖X‖_F^2

▸ Global convergence is non-trivial! [Jain, Meka, and Dhillon 2010]
Local Convergence of IHT

Algorithm 1 IHTSVD
1: for k = 0, 1, 2, ... do
2:   X^(k+1) = P_r(Y^(k))
3:   Y^(k+1) = P_{M,S}(X^(k+1))

* P_{M,S}(X) = X_{S^c} + M_S
▸ IHT with unit step size α_k = 1

    [ 4 0 0 ]        [ 2 0 2 ]           [ 4 0 2 ]
    [ 0 0 4 ]  P_r   [ 2 0 2 ]  P_{M,S}  [ 2 0 4 ]  P_r
    [ 0 2 0 ]  ──→   [ 0 0 0 ]  ──────→  [ 0 2 0 ]  ──→ ...
    [ 4 0 4 ]        [ 4 0 4 ]           [ 4 0 4 ]

Source: [Chunikhina, Raich, and Nguyen 2014]

[ibid.] If σ = σ_min(S(S^c)(V2 ⊗ U2)) > 0, then IHTSVD converges to M locally at a linear rate 1 − σ^2.
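A minimal NumPy sketch of the IHTSVD iteration above, alternating P_r and P_{M,S}; the initialization from the observed entries, the fixed iteration count, and the test instance are our assumptions:

```python
import numpy as np

def iht_svd(M_S, mask, r, num_iters=1000):
    """IHTSVD sketch: alternate P_r and P_{M,S} (unit step size)."""
    Y = M_S.copy()                 # start from the observed entries
    X = Y
    for _ in range(num_iters):
        # P_r: hard-threshold the singular values, keeping the top r
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U[:, :r] * s[:r] @ Vt[:r, :]
        # P_{M,S}: reset observed entries to their known values
        Y = np.where(mask, M_S, X)
    return X

# Example: a rank-1 ground truth with one hidden entry.
M = np.outer([1., 2., 3., 4.], [5., 6., 7.])
mask = np.ones(M.shape, dtype=bool)
mask[0, 0] = False                 # hide a single entry
X_hat = iht_svd(np.where(mask, M, 0.0), mask, r=1)
# X_hat should approach M for this well-posed instance
```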
Linearization of the Rank-r Projection

    P_r(M + Δ) = M + Δ − U2 U2^T Δ V2 V2^T + O(‖Δ‖_F^2)

Local convergence analysis assumes Y^(k) is a perturbed version of M:

    M + E^(k+1) = Y^(k+1) = P_{M,S}(P_r(Y^(k))) = P_{M,S}(P_r(M + E^(k)))

The recursion on the error matrix E^(k+1) = [P_r(M + E^(k)) − M]_{S^c} can be approximated, with e^(k) = S(S^c) vec(E^(k)), by

    e^(k+1) = A e^(k),   A = I − S(S^c)(V2 ⊗ U2)(V2 ⊗ U2)^T S(S^c)^T

Stable if λ_max(A) = 1 − (σ_min(S(S^c)(V2 ⊗ U2)))^2 < 1.
Figure 1: The distance to the solution (in log-scale) as a function of the iteration number for various algorithms (our work: rate 1 − σ; previous work: rate 1 − σ^2). m = 50, n = 40, r = 3, and s = 1000. All algorithms share the same computational complexity per iteration (O(mnr)) except SVT (O(mn^2)) [Cai, Candes, and Shen 2010] and AM (O(sm^2 r^2 + m^3 r^3)) [Jain, Netrapalli, and Sanghavi 2013].
Our Contribution
1 Analyze the local convergence of accelerated IHTSVD for solving the rank-constrained least squares problem (*).
2 Propose a practical way to select the momentum step size that enables us to recover the optimal rate of convergence near the solution.
Nesterov’s Accelerated Gradient

Nesterov’s Accelerated Gradient (NAG) is a simple modification to gradient descent that provably accelerates the convergence:

    x^(k+1) = y^(k) − α_k ∇f(y^(k))
    y^(k+1) = x^(k+1) + β_k (x^(k+1) − x^(k))

If f is a µ-strongly convex, L-smooth function, NAG can improve the linear convergence rate from 1 − µ/L to 1 − √(µ/L) by setting

    α_k = 1/L,   β_k = (1 − √(µ/L)) / (1 + √(µ/L)).   [Nesterov 2004]

Iteration complexity: O(√κ), compared to O(κ) for gradient descent, where κ = L/µ is the condition number.
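A quick NumPy illustration of the accelerated rate on a strongly convex quadratic; the quadratic, its dimensions, and the iteration counts are our illustrative choices:

```python
import numpy as np

# f(x) = 0.5 x^T H x with mu = 1, L = 100 (condition number kappa = 100)
H = np.diag(np.linspace(1.0, 100.0, 20))
mu, L = 1.0, 100.0
alpha = 1.0 / L
beta = (1.0 - np.sqrt(mu / L)) / (1.0 + np.sqrt(mu / L))

def run(num_iters, momentum):
    x = np.ones(20)
    y = x.copy()
    for _ in range(num_iters):
        x_next = y - alpha * (H @ y)            # gradient step at y
        y = x_next + momentum * (x_next - x)    # momentum extrapolation
        x = x_next
    return np.linalg.norm(x)                    # distance to the minimizer 0

# Plain gradient descent is the special case momentum = 0; with the
# Nesterov momentum beta, the error shrinks much faster per iteration.
```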
The Proposed NAG-IHT

Algorithm 2 NAG-IHT
1: for k = 0, 1, 2, ... do
2:   X^(k+1) = P_r(Y^(k))
3:   Y^(k+1) = P_{M,S}(X^(k+1) + β_k (X^(k+1) − X^(k)))

Method                            #Ops./Iter.   Local conv. rate   #Iters. for ε-accuracy
IHTSVD                            O(mnr)        1 − σ^2            (1/σ^2) log(1/ε)
NAG-IHT with β_k = (1−σ)/(1+σ)    O(mnr)        1 − σ              (1/σ) log(1/ε)

* σ = σ_min(S(S^c)(V2 ⊗ U2))
A Practical Method for Step Size Selection

Practical issue: fast convergence requires prior knowledge of global parameters related to the objective function (β_k = (1−σ)/(1+σ)).

Solution: adaptive restart [O’Donoghue and Candes 2015]
  - Use an incremental momentum β_k = (t−1)/(t+2), starting at t = 1.
  - When f(x^(k+1)) > f(x^(k)), reset t = 1.
(Figure from [O’Donoghue and Candes 2015]: convergence of gradient descent and NAG with no restart, fixed restarts every 100/400/700/1000 iterations, the optimal momentum with q = µ/L, and the function and gradient adaptive restart schemes.)
The Proposed Adaptive Restart Scheme for NAG-IHT

Algorithm 3 ARNAG-IHT
1: t = 1
2: f_0 = ‖X^(0)_S − M_S‖_F^2
3: for k = 0, 1, 2, ... do
4:   X^(k+1) = P_r(Y^(k))
5:   Y^(k+1) = P_{M,S}(X^(k+1) + ((t−1)/(t+2))(X^(k+1) − X^(k)))
6:   f_{k+1} = ‖X^(k+1)_S − M_S‖_F^2
7:   if f_{k+1} > f_k then t = 1 else t = t + 1   ▷ function scheme
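A NumPy sketch of ARNAG-IHT as listed above; the initialization X^(0) = Y^(0) = M_S, the fixed iteration count, and the test instance are our assumptions:

```python
import numpy as np

def arnag_iht(M_S, mask, r, num_iters=2000):
    """ARNAG-IHT sketch: NAG-IHT with function-scheme adaptive restart."""
    X = M_S.copy()
    Y = M_S.copy()
    t = 1
    f_prev = np.sum((np.where(mask, X, 0.0) - M_S) ** 2)   # f_0
    for _ in range(num_iters):
        # P_r: hard-threshold the singular values, keeping the top r
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X_next = U[:, :r] * s[:r] @ Vt[:r, :]
        beta = (t - 1) / (t + 2)                           # incremental momentum
        # P_{M,S} applied to the extrapolated iterate
        Y = np.where(mask, M_S, X_next + beta * (X_next - X))
        f = np.sum((np.where(mask, X_next, 0.0) - M_S) ** 2)
        t = 1 if f > f_prev else t + 1                     # restart on increase
        f_prev, X = f, X_next
    return X

# Example: a rank-1 ground truth with one hidden entry.
M = np.outer([1., 2., 3., 4.], [5., 6., 7.])
mask = np.ones(M.shape, dtype=bool)
mask[0, 0] = False
X_hat = arnag_iht(np.where(mask, M, 0.0), mask, r=1)
# X_hat should approach M for this well-posed instance
```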
Numerical Evaluation

Figure 2: The distance to the solution (in log-scale) as a function of the iteration number for IHT algorithms (solid) and their corresponding theoretical bounds up to a constant (dashed). m = 50, n = 40, r = 3, and s = 1000. *NAG-IHT using the optimal step size is not applicable in practice.
Conclusions and Future Work
Conclusions
Propose Nesterov’s Accelerated Gradient for iterative hard thresholding for matrix completion.
Analyze NAG-IHT with the optimal step size and prove that the iteration complexity improves from O(1/σ^2) to O(1/σ) after acceleration.
Propose adaptive restart for sub-optimal step size selection that recovers the optimal rate of convergence in practice.
Future work
Extend the local convergence analysis to real-world cases where the underlying matrix is noisy and/or not close to being low rank.
Convergence under a simple initialization suggests potential analysis of global convergence of our algorithm.
References I
Cai, J.-F., E. Candes, and Z. Shen (2010). “A Singular Value Thresholding Algorithm for Matrix Completion”. In: SIAM Journal on Optimization 20.4, pp. 1956–1982.
Chunikhina, E., R. Raich, and T. Nguyen (2014). “Performance analysis for matrix completion via iterative hard-thresholded SVD”. In: 2014 IEEE Workshop on Statistical Signal Processing (SSP), pp. 392–395.
Jain, P., R. Meka, and I. Dhillon (2010). “Guaranteed Rank Minimization via Singular Value Projection”. In: Advances in Neural Information Processing Systems (NIPS), pp. 937–945.
Jain, P., P. Netrapalli, and S. Sanghavi (2013). “Low-rank Matrix Completion Using Alternating Minimization”. In: Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, pp. 665–674.
Nesterov, Y. (2004). Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers.
O’Donoghue, B. and E. Candes (2015). “Adaptive Restart for Accelerated Gradient Schemes”. In: Foundations of Computational Mathematics 15.3, pp. 715–732.