SVD Applied to Collaborative Filtering
~ URUG 7-12-07 ~
Recommendation System
Answers the question: What do I want next?!?
Very consumer driven.
Must provide good results or a user may not trust the system in the future.
Collaborative Filtering
Base a user's recommendations on:
The user's past history.
The history of like-minded users.
View the data as a product × user matrix.
Find a "neighborhood" of similar users for that user.
Return the top-N recommendations.
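The neighborhood/top-N procedure above can be sketched in a few lines of NumPy. This is a toy illustration, not code from the talk; the function name and scoring rule (average rating within the neighborhood) are my assumptions:

```python
import numpy as np

def top_n_recommendations(ratings, user, k=2, n=3):
    """Toy user-based CF. `ratings` is a users x products matrix
    (0 = unrated). Find the k most cosine-similar users, then return
    the n unrated products with the highest neighborhood-average rating."""
    norms = np.linalg.norm(ratings, axis=1)
    sims = ratings @ ratings[user] / (norms * norms[user] + 1e-9)
    sims[user] = -np.inf                    # exclude the user themselves
    neighbors = np.argsort(sims)[::-1][:k]  # the "neighborhood"
    scores = ratings[neighbors].mean(axis=0)
    scores[ratings[user] > 0] = -np.inf     # only recommend unrated products
    return np.argsort(scores)[::-1][:n]
```

For example, a user who rated the first two products like a neighbor did gets that neighbor's remaining favorites recommended first.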
Early Approaches
Goldberg et al. (1992), Using collaborative filtering to weave an information tapestry.
Konstan et al. (1997), Applying Collaborative Filtering to Usenet News.
Use Pearson correlation or cosine similarity as the measure of similarity to form neighborhoods.
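As a concrete reference for the Pearson variant (not from the slides; the convention that 0 means "unrated" and that correlation is computed over co-rated items only is my assumption):

```python
import numpy as np

def pearson_sim(u, v):
    """Pearson correlation between two users' rating vectors,
    computed over co-rated items only (0 = unrated)."""
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0                      # not enough overlap to correlate
    uc = u[mask] - u[mask].mean()       # mean-center each user's ratings
    vc = v[mask] - v[mask].mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    return float(uc @ vc / denom) if denom else 0.0
```

Unlike raw cosine similarity, the mean-centering corrects for users who rate everything high or everything low.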
Early CF Challenges
Sparsity - No correlation between users can be found, and reduced coverage occurs.
Scalability - Nearest-neighbor computation time grows with the number of products and users.
Synonymy - The same or similar products appear under different names, so their association is missed.
Dimensionality Reduction
Latent Semantic Indexing (LSI)
Algorithm from the IR community (late '80s-early '90s).
Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets.
Reduces the dimensionality of a dataset and captures the latent relationships.
Easily maps to CF!
Framing LSI for CF
Products × Users matrix instead of Terms × Documents.

Netflix Dataset
480,189 users, 17,770 movies, only ~100 million ratings.
A 17,770 × 480,189 matrix that is 99% sparse!
About 8.5 billion potential ratings.
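A quick back-of-the-envelope check of those figures (my arithmetic, not from the slides):

```python
users, movies, ratings = 480_189, 17_770, 100_000_000

cells = users * movies              # every potential (movie, user) rating
density = ratings / cells

print(cells)                        # 8532958530 — about 8.5 billion
print(f"{density:.2%} filled")      # 1.17% filled, i.e. ~99% sparse
```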
SVD - The Math Behind LSI
Singular Value Decomposition: any M × N matrix A of rank r can be decomposed as

A = U Σ V^T

U is an M × M orthogonal matrix.
V is an N × N orthogonal matrix.
Σ is an M × N diagonal matrix whose first r diagonal entries are the nonzero singular values of A:

σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_n = 0

Related to eigenvalue decomposition (PCA):
U is the orthonormal eigenspace of AA^T; it spans the "column space" (the left singular vectors).
V is the orthonormal eigenspace of A^TA; it spans the "row space" (the right singular vectors).
The singular values are the square roots of the eigenvalues.
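These properties are easy to verify numerically. A small sketch (the example matrix is mine, not from the talk):

```python
import numpy as np

# A small ratings-style matrix (rows = movies, cols = users).
A = np.array([[5., 4., 0.],
              [4., 5., 1.],
              [0., 1., 5.],
              [1., 0., 4.]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # A = U @ Sigma @ Vt
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

assert np.allclose(A, U @ Sigma @ Vt)             # exact reconstruction
assert np.allclose(U.T @ U, np.eye(4))            # U is orthogonal (M x M)
assert np.allclose(Vt @ Vt.T, np.eye(3))          # V is orthogonal (N x N)
assert np.all(s[:-1] >= s[1:])                    # sigma_1 >= sigma_2 >= ...

# Singular values are the square roots of the eigenvalues of A^T A.
assert np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A))
```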
Reducing Dimensionality
Keep only the k largest singular values:

A_k = U_k Σ_k V_k^T

A_k is the closest rank-k approximation to A: it minimizes the Frobenius norm ||A - A_k||_F over all rank-k matrices.
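A numerical sketch of this optimality (the Eckart-Young theorem): the Frobenius error of the rank-k truncation equals the energy in the discarded singular values. The example data is mine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] * s[:k] @ Vt[:k]        # rank-k truncation: U_k Sigma_k V_k^T

# The Frobenius error is exactly the norm of the dropped singular values.
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
assert np.linalg.matrix_rank(A_k) == k
```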
Making Recommendations
Cosine similarity - a common way to find the neighborhood:

cos(i, j) = (i · j) / (||i||_2 × ||j||_2)

Base recommendations off of that neighborhood and its users.
Can also make predictions for products with a simple dot product if the singular values are combined with the singular vectors:

CP_{c,p} = C_avg + [U_k S_k^{1/2}](c) · [S_k^{1/2} V_k^T](p)

where (c) selects customer c's row and (p) selects product p's column.
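A sketch of that prediction formula in NumPy. The function name and the customer-average fill-in for unrated cells are my assumptions about details the slide elides:

```python
import numpy as np

def predict(R, c, p, k=2):
    """SVD-based rating prediction sketch. Rows of R are customers,
    0 = unrated. Fill unrated cells with the customer's average,
    subtract the averages, take a rank-k SVD, and fold S^{1/2}
    into both singular-vector matrices."""
    c_avg = np.nanmean(np.where(R > 0, R, np.nan), axis=1)
    filled = np.where(R > 0, R, c_avg[:, None]) - c_avg[:, None]
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    left = U[:, :k] * np.sqrt(s[:k])           # U_k S_k^{1/2}
    right = np.sqrt(s[:k])[:, None] * Vt[:k]   # S_k^{1/2} V_k^T
    return c_avg[c] + left[c] @ right[:, p]    # CP_{c,p}
```

With k equal to the full rank, a rated cell is reproduced exactly; smaller k smooths the predictions through the latent factors.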
Challenges with SVD
Scalability - once again, computation time grows with the number of users and products: O(m^3).
Offline stage.
Online stage.
Even computing the SVD offline is infeasible for large datasets. Other methods are needed.
Incremental SVD
Fold a new user u into an existing decomposition by projecting into the reduced space:

u_k = u^T V_k Σ_k^{-1}
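A sketch of the fold-in projection (example data is mine). Folding in a row that was already part of the factorization should reproduce its existing coordinates, which makes a handy sanity check:

```python
import numpy as np

# Existing factorization of a ratings matrix (rows = users here).
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T

# Fold in a new user vector u without recomputing the SVD:
u = rng.standard_normal(5)
u_k = u @ Vk / sk                  # u_k = u^T V_k Sigma_k^{-1}

# Sanity check: folding in an existing row recovers its coordinates.
assert np.allclose(A[0] @ Vk / sk, Uk[0])
```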
Incremental SVD Results
GHA for SVD
Gorrell (2006), Generalized Hebbian Algorithm for Incremental Singular Value Decomposition in Natural Language Processing.
Based on Sanger's (1989) GHA for eigen decomposition.
Update rules for the i-th singular vector pair (c_i^a, c_i^b), given a training pair (a, b):

Δc_i^a = (c_i^b · b) (a - Σ_{j<i} (a · c_j^a) c_j^a)
Δc_i^b = (c_i^a · a) (b - Σ_{j<i} (b · c_j^b) c_j^b)
GHA extended by Funk

    /* One SGD step for a single known rating; lrate is the learning
       rate and predictRating returns the current model estimate. */
    void train(int user, int movie, real rating) {
        real err = lrate * (rating - predictRating(movie, user));
        real uv = userValue[user];             /* save pre-update value */
        userValue[user]  += err * movieValue[movie];
        movieValue[movie] += err * uv;         /* use the saved value   */
    }
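A runnable Python rendering of the same update for a single latent feature. This is my reimplementation of the idea, not Funk's code; the function name, initialization, and hyperparameters are assumptions:

```python
import numpy as np

def funk_train(ratings, n_epochs=500, lrate=0.02):
    """Funk-style SGD for one latent feature: learn one value per user
    and per movie so that user_value[u] * movie_value[m] ~ rating.
    `ratings` is a list of (user, movie, rating) triples."""
    n_users = max(u for u, _, _ in ratings) + 1
    n_movies = max(m for _, m, _ in ratings) + 1
    user_value = np.full(n_users, 0.1)     # small positive init (assumed)
    movie_value = np.full(n_movies, 0.1)
    for _ in range(n_epochs):
        for u, m, r in ratings:
            err = lrate * (r - user_value[u] * movie_value[m])
            uv = user_value[u]             # pre-update copy
            user_value[u] += err * movie_value[m]
            movie_value[m] += err * uv
    return user_value, movie_value
```

On a tiny rank-1 ratings matrix the products converge to the observed ratings, which is the behavior the full multi-feature version builds on.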
Netflix Results
Best RMSEs: 0.9283 and 0.9212.
Blended to get 0.9189, 3.42% better than Netflix's own system.
Summary
SVD provides an elegant, automatic recommendation system with the potential to scale.
Many different algorithms compute, or at least approximate, the SVD; these can run in an offline stage for websites that need CF.
Every dataset is different and requires experimentation to get the best results.