Matrix Completion
SHRIPAD GADE
IE598 – BIG DATA OPTIMIZATION COURSE PROJECT
Outline
Problem Definition and Preliminaries
Algorithms
Distributed Privacy-Preserving Matrix Completion
1-Dec-16 IE598 Course Project: Matrix Completion 2
Problem Definition
Filling entries of a partially observed matrix.
Motivation [Netflix Problem]
Given a few observed user preferences, i.e. entries (i,j) of the user-movie matrix, we wish to complete the matrix, i.e. find movies that users would like (recommendation system).
Image Credit - https://en.wikipedia.org/wiki/Matrix_completion
[Figure: partially observed user-movie matrix; axes: Users, Movies]
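As a small illustration of the setup, the sketch below builds a hypothetical 4×5 ratings matrix together with a boolean mask of observed entries. The names `ratings` and `observed`, the row/column convention (rows as users, columns as movies) and all numbers are assumptions made for illustration, not data from the slides. Completing the matrix means predicting the entries where the mask is False.

```python
import numpy as np

# Hypothetical ground-truth ratings (rows: users, columns: movies).
# The values are made up purely for illustration.
ratings = np.array([
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [4.0, 2.0, 5.0, 1.0, 3.0],
    [1.0, 5.0, 2.0, 4.0, 4.0],
    [2.0, 4.0, 1.0, 5.0, 3.0],
])

# observed[i, j] is True when user i has rated movie j.
observed = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
], dtype=bool)

# What the recommender actually sees: unobserved entries are unknown (NaN).
partial = np.where(observed, ratings, np.nan)
n_missing = int(np.isnan(partial).sum())
print(partial)
print("entries to fill in:", n_missing)
```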
Applications
Recommender Systems [Collaborative Filtering]: Netflix/Hulu/Prime Video, Amazon/Walmart/Macy's
Image Credit - https://en.wikipedia.org/wiki/Matrix_completion
[Figure: user-movie matrix; axes: Users, Movies]
Global Positioning in Sensor Networks: Robotics, Spacecraft/UAVs
Image Credit - https://en.wikipedia.org/wiki/Matrix_completion
[Figure: sensor network matrix; axes: Sensor ID, Sensor ID]
Video Processing: Intrusion Detection, Background Extraction
Image Credit: http://staff.ustc.edu.cn/~cgong821/slides_low_rank_matrix_optim.pdf
System Identification: Physical Processes (e.g. motion), Economic Processes (e.g. stock market)
If the input/output pair (u(t), y(t)) is sparsely sampled, then recovering the state matrices A, B, C, D and the initial condition x(0) can be viewed as a matrix completion problem.
Some Issues and Solutions
Filling entries of a partially observed matrix.
Without any other information, this problem is underdetermined: the unknown matrix entries could be anything.
Low Rank assumption!
a) Fill matrix s.t. rank is minimum, or
b) Fill matrix s.t. rank ≤ r
Why does this work?
Image Credit: http://staff.ustc.edu.cn/~cgong821/slides_low_rank_matrix_optim.pdf
[Netflix Problem] – User preferences can be captured by a few features like genre, year of release.
[Video Processing] – Low dimensional structure in visual data.
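To make the "few features" argument concrete, here is a minimal sketch assuming a hypothetical model in which each rating is an inner product between a user's taste vector and a movie's trait vector. The sizes and the names `user_tastes` and `movie_traits` are illustrative assumptions. The resulting matrix has rank equal to the number of features, and a rank-r matrix of size N×M is described by r(N + M − r) numbers rather than N·M.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, n_features = 50, 40, 3   # hypothetical sizes

# Each user and each movie is described by only a few latent features.
user_tastes = rng.standard_normal((n_users, n_features))
movie_traits = rng.standard_normal((n_features, n_movies))

# Every rating is an inner product of two short feature vectors.
ratings = user_tastes @ movie_traits

rank = np.linalg.matrix_rank(ratings)
degrees_of_freedom = n_features * (n_users + n_movies - n_features)
print("entries:", ratings.size)
print("rank:", rank)
print("degrees of freedom:", degrees_of_freedom)
```

The gap between the number of entries and the number of degrees of freedom is what makes recovery from a subset of entries plausible.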
Algorithms
Assumptions for Problem a) (fill matrix s.t. rank is minimum):
Uniform sampling: with sufficiently many uniformly sampled entries, the problem has a unique solution with high probability.
Lower Bound (Number of Observed Entries): on the order of r n log(n)
Incoherence: Singular vectors of M are not too sparse.
E.g. $M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ with singular value decomposition $M = I_3 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} I_3$. Almost all entries need to be sampled.
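The incoherence example can be checked numerically. In the sketch below (variable names are illustrative), every entry except the top-left one is observed, and the observations are still identical to those of the all-zero matrix, so a minimum-rank completion would have no reason to prefer M.

```python
import numpy as np

# The rank-1 matrix from the example: a single nonzero entry.
M = np.zeros((3, 3))
M[0, 0] = 1.0

# Its singular vectors are standard basis vectors, i.e. maximally sparse.
U, s, Vt = np.linalg.svd(M)
print("singular values:", s)

# Observe 8 of the 9 entries, missing only the nonzero one.
mask = np.ones((3, 3), dtype=bool)
mask[0, 0] = False

# The observed entries cannot distinguish M from the zero matrix.
zero = np.zeros((3, 3))
same_observations = np.array_equal(M[mask], zero[mask])
print("indistinguishable from zero matrix:", same_observations)
```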
Convex Relaxation – Candes, Cao, Recht and Tao [2009]: Rank minimization is nonconvex → convex relaxation → the rank objective is replaced by $\mathrm{tr}(W_1) + \mathrm{tr}(W_2)$ s.t. $\begin{pmatrix} W_1 & X \\ X^T & W_2 \end{pmatrix} \succeq 0$ (a semidefinite formulation of the nuclear norm of $X$)
Gradient Descent – Keshavan, Montanari and Oh [2008]: additionally assumes bounded magnitude of entries and a constant condition number ($\sigma_1/\sigma_r$)
Alternating Minimization – Jain, Netrapalli and Sanghavi [2012]: more successful in practice; the Netflix winning solution used this algorithm
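The alternating minimization idea can be sketched as follows. This is a plain regularized alternating least squares loop with random initialization, written for illustration only; it is not the exact procedure analyzed by Jain, Netrapalli and Sanghavi, and the function name `als_complete` and the parameters `reg`, `iters` and `seed` are assumptions introduced here. With one factor fixed, each row of the other factor solves a small ridge regression over that row's observed entries.

```python
import numpy as np

def als_complete(M, mask, r, iters=30, reg=1e-3, seed=0):
    """Regularized alternating least squares using observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    ridge = reg * np.eye(r)

    def objective():
        # Squared error on observed entries plus the ridge penalty.
        resid = np.where(mask, M - U @ V.T, 0.0)
        return np.sum(resid ** 2) + reg * (np.sum(U ** 2) + np.sum(V ** 2))

    history = [objective()]
    for _ in range(iters):
        # Fix V and solve a small ridge regression for each row of U.
        for i in range(n):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ M[i, mask[i]])
        # Fix U and solve a small ridge regression for each row of V.
        for j in range(m):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ M[mask[:, j], j])
        history.append(objective())
    return U @ V.T, history

# Usage on a synthetic rank-2 matrix with roughly 70% of entries observed.
rng = np.random.default_rng(0)
M_true = rng.standard_normal((12, 2)) @ rng.standard_normal((2, 10))
observed = rng.random(M_true.shape) < 0.7
estimate, history = als_complete(M_true, observed, r=2, iters=20)
print("objective: first", history[0], "last", history[-1])
```

Because each half-step minimizes the regularized objective exactly over one factor, the recorded objective never increases from one sweep to the next; whether the iterates reach the true matrix depends on the sampling and initialization.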
Privacy-Preserving Distributed Matrix Completion
[Figure: matrix with axes Movies and Users; blocks labeled Netflix, Hulu, HBO]
𝑊 is a low rank matrix, partitioned among 𝐿 agents
$W = [W_1\ W_2\ W_3\ \dots\ W_L] \in \mathbb{R}^{N \times M}$
Agent 𝐼 has access to observed entries from 𝑊𝐼 (and nothing else)
Task – Recover 𝑊 (collaboratively)
Ensure that 𝑊𝐽 is private (∀ 𝐽 ≠ 𝐼)
Literature –
Convex optimization based solutions [4] – exact but expensive
Non-convex approach [7] – fast, but guarantees only 𝜖-optimality
Nonlinear Gauss-Seidel iteration [8] – centralized
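Privacy aware strategy – Use Matrix Factorization – Estimate $X$, $Y$ such that $W = XY$.
$$\min_{X,Y,Z}\ \tfrac{1}{2}\,\|XY - Z\|_F^2 \quad \text{s.t.} \quad Z_{n,m} = W_{n,m}\ \ \forall (n,m) \in \Omega$$
Here $\Omega$ is the set of observed entries.
$X \in \mathbb{R}^{N \times r}$ is a public matrix accessible to every agent.
$Y = [Y_1\ Y_2\ \dots\ Y_L] \in \mathbb{R}^{r \times M}$, where $Y_I \in \mathbb{R}^{r \times M_I}$ is private to agent $I$ (and $\sum_I M_I = M$).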
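The block structure of the factorization can be illustrated with a short sketch. The sizes, the choice of L = 3 agents and the names `M_sizes`, `Y_blocks` and `W_blocks` are hypothetical. Each agent's block of W depends only on the shared X and on that agent's own Y_I, and stacking the blocks side by side reproduces W = XY.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 6, 2
M_sizes = [3, 4, 2]   # hypothetical column counts M_I for L = 3 agents

X = rng.standard_normal((N, r))                             # public factor
Y_blocks = [rng.standard_normal((r, m)) for m in M_sizes]   # private factors

# Agent I can form its own block W_I = X Y_I without seeing any other Y_J.
W_blocks = [X @ Y_I for Y_I in Y_blocks]

# Stacking the blocks side by side recovers the full factorization W = X Y.
W = np.hstack(W_blocks)
Y = np.hstack(Y_blocks)
print(W.shape, Y.shape, np.allclose(W, X @ Y))
```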
[Figure: public factor X (Movies axis) shared by Netflix, Hulu and HBO, with private factors Y_1 (Netflix), Y_2 (Hulu), Y_3 (HBO)]
Distributed Matrix Completion [1]
[Flowchart: Initialize X, Y, Z, α; then repeat Update X, Update Y, Update Z, Update α until the stopping rule is met]
$$\min_{X,Y,Z}\ \tfrac{1}{2}\,\|XY - Z\|_F^2 \quad \text{s.t.} \quad Z_{n,m} = W_{n,m}\ \ \forall (n,m) \in \Omega$$
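$$X_I(t+1) = \frac{Z_I(t)\,Y_I^T(t) - \alpha_I(t)}{1 + 2\beta|\mathcal{N}_I|} + \beta\,\frac{\sum_{j \in \mathcal{N}_I} X_j(t) + |\mathcal{N}_I|\,X_I(t)}{1 + 2\beta|\mathcal{N}_I|}$$
$$Y_I(t+1) = \left(X_I^T(t+1)\,X_I(t+1)\right)^{-1} X_I^T(t+1)\,Z_I(t)$$
$$Z_I(t+1) = X_I(t+1)\,Y_I(t+1) + \mathcal{P}_\Omega\!\left(W_I - X_I(t+1)\,Y_I(t+1)\right)$$
$$\alpha_I(t+1) = \alpha_I(t) + \beta\Big(|\mathcal{N}_I|\,X_I(t) - \sum_{j \in \mathcal{N}_I} X_j(t)\Big)$$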
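The four updates can be turned into a short NumPy sketch. The code follows the update formulas as they appear on the slide and makes several assumptions that the slides do not state: X_I is read as agent I's local copy of the public factor X, N_I as the set of agent I's neighbors, α_I as a multiplier for the consensus constraint and β as a penalty parameter; the random initialization, the fixed iteration count in place of a stopping rule, and the name `decentralized_mc` are also illustrative choices. Convergence is not claimed here.

```python
import numpy as np

def decentralized_mc(W_blocks, masks, neighbors, r, beta=1.0, iters=30, seed=0):
    """Sketch of the X, Y, Z, alpha updates, run for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    L = len(W_blocks)
    N = W_blocks[0].shape[0]
    X = [rng.standard_normal((N, r)) for _ in range(L)]        # local copies of X
    Y = [rng.standard_normal((r, Wb.shape[1])) for Wb in W_blocks]
    Z = [np.where(m, Wb, 0.0) for Wb, m in zip(W_blocks, masks)]
    alpha = [np.zeros((N, r)) for _ in range(L)]

    for _ in range(iters):
        X_old = [x.copy() for x in X]   # what neighbors shared at time t
        for i in range(L):
            deg = len(neighbors[i])
            nbr_sum = sum(X_old[j] for j in neighbors[i])
            denom = 1.0 + 2.0 * beta * deg
            # X update: local data term plus agreement with neighbors.
            X[i] = (Z[i] @ Y[i].T - alpha[i]
                    + beta * (nbr_sum + deg * X_old[i])) / denom
            # Y update: least squares fit, equal to (X^T X)^{-1} X^T Z
            # whenever X has full column rank.
            Y[i] = np.linalg.lstsq(X[i], Z[i], rcond=None)[0]
            # Z update: XY + P_Omega(W - XY), i.e. keep W on observed
            # entries and XY elsewhere.
            Z[i] = np.where(masks[i], W_blocks[i], X[i] @ Y[i])
            # Multiplier update driven by disagreement with neighbors.
            alpha[i] = alpha[i] + beta * (deg * X_old[i] - nbr_sum)
    return X, Y, Z

# Usage: three agents on a fully connected network, synthetic rank-2 data.
rng = np.random.default_rng(1)
W = rng.standard_normal((15, 2)) @ rng.standard_normal((2, 18))
W_blocks = [W[:, :6], W[:, 6:12], W[:, 12:]]
masks = [rng.random(b.shape) < 0.7 for b in W_blocks]
neighbors = [[1, 2], [0, 2], [0, 1]]
X, Y, Z = decentralized_mc(W_blocks, masks, neighbors, r=2)
```

Only the `X_old` copies cross agent boundaries in this loop; `W_blocks[i]`, `Y[i]` and `Z[i]` are touched only inside agent i's own step.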
Privacy Arguments and Strengths
No data (𝑊𝐼) sharing by agents
𝑌𝐼, 𝑍𝐼 are computed solely based on local information
𝛼𝐼, 𝑋𝐼 require the estimates of 𝑋 from neighbors
In theory, one could observe the evolution of 𝑋𝐼 from all agents and guess 𝑌𝐼 and 𝑍𝐼
This is nontrivial and would require global knowledge of the network topology and of the evolution of every 𝑋𝐼
Low Communication Costs
Only 𝑋𝐼 are shared among agents
Communication cost per agent is 𝑁 × 𝑟 × 𝑇
Communication load is evenly distributed
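As a rough worked example of the N × r × T communication cost, the sketch below plugs in hypothetical values. It assumes that T denotes the number of iterations and that each value is stored as an 8-byte float; neither is stated on the slide, and the helper name `communication_cost` is introduced here for illustration.

```python
def communication_cost(N, r, T, bytes_per_value=8):
    """Values and bytes sent by one agent that shares an N x r matrix T times."""
    values = N * r * T
    return values, values * bytes_per_value

# Hypothetical sizes: 10,000 rows, rank 10, 100 iterations.
values, n_bytes = communication_cost(N=10_000, r=10, T=100)
print(values, "values,", n_bytes / 1e6, "MB")
```

Note that the cost does not depend on the number of columns M_I held by the agent, since only the N × r factor is exchanged.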
Summary
Matrix completion problem, applications and theoretical results
Distributed Privacy Preserving Algorithm
Future Work
Rigorous definition of privacy and precise claims
Privacy improvements through randomization
SGD as an alternative (NOMAD) + privacy preserving steps
References
1. Q. Ling, Y. Xu, W. Yin, and Z. Wen, "Decentralized low-rank matrix completion," in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2925-2928, IEEE, 2012.
2. Y. Cherapanamjeri, K. Gupta, and P. Jain, "Nearly-optimal robust matrix completion," arXiv preprint arXiv:1606.07315, 2016.
3. Y. Chen, H. Xu, C. Caramanis, and S. Sanghavi, "Robust matrix completion with corrupted columns," arXiv preprint arXiv:1102.2254, 2011.
4. J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization," in Advances in NIPS, pp. 2080-2088, 2009.
5. B. Li, Y. Wang, A. Singh, and Y. Vorobeychik, "Data poisoning attacks on factorization-based collaborative filtering," arXiv preprint arXiv:1608.08182, 2016.
6. H. Yun, H.-F. Yu, C.-J. Hsieh, S. Vishwanathan, and I. Dhillon, "Nomad: Non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion," Proceedings of the VLDB Endowment, vol. 7, no. 11, pp. 975-986, 2014.
7. A. Montanari and S. Oh, "On positioning via distributed matrix completion," in Sensor Array and Multichannel Signal Processing Workshop (SAM), 2010 IEEE, pp. 197-200, IEEE, 2010.
8. Z. Wen, W. Yin, and Y. Zhang, "Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm," Mathematical Programming Computation, vol. 4, no. 4, pp. 333-361, 2012.
Thank You.
Questions?