SLIM: Sparse Linear Methods for Top-N Recommender Systems
Xia Ning and George Karypis
Computer Science & Engineering, University of Minnesota, Minneapolis, MN
Email: {xning,[email protected]}
December 14, 2011
Outline

1 Introduction
  - Top-N Recommender Systems
  - Definitions and Notations
  - The State-of-the-Art Methods
2 Methods
  - Sparse LInear Methods for Top-N Recommendation
  - Learning W for SLIM
  - SLIM with Feature Selection
3 Materials
4 Experimental Results
  - SLIM on Binary Data
    - Top-N Recommendation Performance
    - SLIM for Long-Tail Distribution
    - SLIM Regularization Effects
  - SLIM on Rating Data
5 Conclusions
Top-N Recommender Systems
- Top-N recommendation
  - E-commerce: huge numbers of products
  - Recommend a short ranked list of items to each user
- Top-N recommender systems
  - Neighborhood-based Collaborative Filtering (CF)
    - Item-based [2]: fast to generate recommendations, low recommendation quality
  - Model-based methods [1, 3, 5]
    - Matrix Factorization (MF) models: slow to learn the models, high recommendation quality
  - SLIM: Sparse LInear Methods
    - Fast, with high recommendation quality
Definitions and Notations
Table 1: Definitions and Notations

Def     Description
u_i     user
t_j     item
U       all users (|U| = n)
T       all items (|T| = m)
A       user-item purchase/rating matrix, size n × m
W       item-item similarity/coefficient matrix
a_i^T   the i-th row of A: the purchase/rating history of u_i on T
a_j     the j-th column of A: the purchase/rating history of U on t_j

- Row vectors carry the transpose superscript ^T; all other vectors are column vectors by default.
- Matrix/vector notation is used in place of user/item purchase/rating profiles.
The State-of-the-Art Methods: Item-based Collaborative Filtering (1)

- Item-based k-nearest-neighbor (itemkNN) CF
  - Identify a set of similar items for each item
  - Item-item similarity:
    - calculated from A
    - cosine similarity measure
[Figure: the binary user-item matrix A (rows u_1 ... u_n, columns t_1 ... t_m) and the item-item similarity matrix W, in which each column keeps nonzero similarity values s only for that item's nearest neighbors (1st nn, 2nd nn, ...).]
The State-of-the-Art Methods: Item-based Collaborative Filtering (2)
[Figure: recommendation as a sparse product of a user's purchase vector with W, yielding predicted scores p over the items t_1 ... t_m.]
- itemkNN recommendation
  - Recommend items similar to the ones the user has purchased (a code sketch follows below):

    ã_i^T = a_i^T × W

  - Fast: sparse item neighborhood
  - Low quality: no knowledge is learned
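As a concrete illustration, here is a minimal itemkNN sketch (not the authors' implementation; the function name and the default k are illustrative): cosine item-item similarities computed from a binary A, truncated to the k nearest neighbors per item, followed by the ã_u^T = a_u^T W scoring step.

```python
import numpy as np

def itemknn_scores(A: np.ndarray, user: int, k: int = 20) -> np.ndarray:
    """Score all items for `user` given a binary user-item matrix A (n x m)."""
    norms = np.linalg.norm(A, axis=0) + 1e-10     # item (column) norms
    W = (A.T @ A) / np.outer(norms, norms)        # cosine similarity matrix
    np.fill_diagonal(W, 0.0)                      # exclude self-similarity
    for j in range(W.shape[1]):                   # keep only top-k per column
        W[np.argsort(W[:, j])[:-k], j] = 0.0
    scores = A[user] @ W                          # a_u^T W
    scores[A[user] > 0] = -np.inf                 # mask already-purchased items
    return scores                                 # top-N: np.argsort(scores)[::-1][:N]
```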
The State-of-the-Art Methods: Matrix Factorization (1)
- Latent factor models
  - Factorize A into low-rank user factors U and item factors V^T
  - U and V^T represent user and item characteristics in a common latent space
  - Formulated as an optimization problem:

    minimize_{U,V}  1/2 ‖A − UV^T‖_F² + β/2 ‖U‖_F² + λ/2 ‖V^T‖_F²
[Figure: the n × m matrix A approximated as U (n × k user factors) times V^T (k × m item factors) over latent dimensions l_1 ... l_k.]
The State-of-the-Art Methods: Matrix Factorization (2)
[Figure: prediction for a user u_*: the user's latent factor row times V^T yields predicted scores p over the items t_1 ... t_m.]
- MF recommendation
  - Prediction: dot product in the latent space (a fitting sketch follows below):

    ã_ij = U_i^T · V_j

  - Slow: dense U and V^T
  - High quality: user tastes and item properties are learned
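A hedged sketch of fitting the objective above by plain gradient descent on a dense A (hyperparameters are illustrative; production top-N MF systems such as WRMF use weighted or implicit-feedback variants instead):

```python
import numpy as np

def fit_mf(A, k=10, beta=0.1, lam=0.1, lr=0.01, iters=200, seed=0):
    """Minimize 1/2||A - UV^T||_F^2 + beta/2||U||_F^2 + lam/2||V^T||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = rng.normal(scale=0.1, size=(n, k))
    V = rng.normal(scale=0.1, size=(m, k))
    for _ in range(iters):
        R = A - U @ V.T               # residual matrix
        gU = R @ V - beta * U         # gradient ascent direction in U
        gV = R.T @ U - lam * V        # gradient ascent direction in V
        U += lr * gU
        V += lr * gV
    return U, V

# Predicted score for user i on item j: U[i] @ V[j]
```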
SLIM for top-N Recommendation
- Motivations:
  - recommendations generated fast
  - high-quality recommendations
  - "have my cake and eat it too"
- Key ideas:
  - retain the nature of itemkNN: a sparse W
  - optimize recommendation performance: learn W from A, both its
    - sparsity structure and
    - coefficient values
Learning W for SLIM
- The optimization problem:

    minimize_W   1/2 ‖A − AW‖_F² + β/2 ‖W‖_F² + λ‖W‖₁
    subject to   W ≥ 0,  diag(W) = 0                      (1)

- Computing W:
  - The columns of W are independent: easy to parallelize
  - The decoupled per-column problems (a solver sketch follows below):

    minimize_{w_j}   1/2 ‖a_j − Aw_j‖₂² + β/2 ‖w_j‖₂² + λ‖w_j‖₁
    subject to   w_j ≥ 0,  w_{j,j} = 0                    (2)
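Each column problem (2) is a nonnegative elastic-net regression, so a minimal sketch can lean on scikit-learn's ElasticNet with positive=True (this is not the authors' solver; note that sklearn scales the squared loss by 1/(2·n_samples), so alpha and l1_ratio map to β and λ only up to that factor):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_train(A: np.ndarray, alpha: float = 0.1, l1_ratio: float = 0.5) -> np.ndarray:
    """Learn the sparse coefficient matrix W one column at a time."""
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       positive=True, fit_intercept=False)
    for j in range(n_items):        # columns are independent: parallelizable
        aj = A[:, j].copy()
        A[:, j] = 0                 # masking column j enforces w_{j,j} = 0
        model.fit(A, aj)
        W[:, j] = model.coef_
        A[:, j] = aj                # restore the column
    return W
```

Prediction then takes the same form as itemkNN, Ã = AW, with the top-N unpurchased items recommended to each user.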
Reducing model learning time
    minimize_{w_j}   1/2 ‖a_j − Aw_j‖₂² + β/2 ‖w_j‖₂² + λ‖w_j‖₁

- fsSLIM: SLIM with feature selection
  - Prescribe the potential nonzero structure of w_j
  - Select a subset of columns from A
    - e.g., via the itemkNN item-item similarity matrix
[Figure: feature selection: from the full matrix A, the columns most similar to a_j are selected to form the reduced matrix A′.]
    minimize_{w_j}   1/2 ‖a_j − A′w_j‖₂² + β/2 ‖w_j‖₂² + λ‖w_j‖₁
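A hedged fsSLIM sketch for a single column (names and defaults are illustrative): keep only the k columns of A most cosine-similar to a_j, then solve the smaller elastic-net problem over that submatrix A′:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fsslim_column(A: np.ndarray, j: int, k: int = 50,
                  alpha: float = 0.1, l1_ratio: float = 0.5) -> np.ndarray:
    """Solve the reduced problem for column j over the k most similar items."""
    norms = np.linalg.norm(A, axis=0) + 1e-10
    sim = (A.T @ A[:, j]) / (norms * norms[j])   # cosine similarity to item j
    sim[j] = -np.inf                             # excluding j enforces w_{j,j} = 0
    support = np.argsort(sim)[-k:]               # indices of the k nearest items
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       positive=True, fit_intercept=False)
    model.fit(A[:, support], A[:, j])            # fit on A' only
    wj = np.zeros(A.shape[1])
    wj[support] = model.coef_                    # scatter back to full length
    return wj
```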
Datasets, Evaluation Methodology and Metrics
Table 2: The Datasets Used in Evaluation

dataset   #users   #items   #trns       rsize    csize    density   ratings
ccard     42,067   18,004   308,420     7.33     17.13    0.04%     -
ctlg2     22,505   17,096   1,814,072   80.61    106.11   0.47%     -
ctlg3     58,565   37,841   453,219     7.74     11.98    0.02%     -
ecmrc     6,594    3,972    50,372      7.64     12.68    0.19%     -
BX        3,586    7,602    84,981      23.70    11.18    0.31%     1-10
ML10M     69,878   10,677   10,000,054  143.11   936.60   1.34%     1-10
Netflix   39,884   8,478    1,256,115   31.49    148.16   0.37%     1-5
Yahoo     85,325   55,371   3,973,104   46.56    71.75    0.08%     1-5

(#trns: number of transactions; rsize/csize: average number of transactions per user/item; density = #trns / (#users × #items).)
- Datasets: 8 real datasets in 2 categories (purchase/binary data and rating data)
- Evaluation methodology: leave-one-out cross-validation
- Evaluation metrics (a computation sketch follows below):
  - Hit Rate:

    HR = #hits / #users

  - Average Reciprocal Hit-Rank (ARHR) [2]:

    ARHR = (1/#users) · Σ_{i=1}^{#hits} 1/p_i

    where p_i is the rank of the hit in user i's top-N list.
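A minimal sketch of both metrics under leave-one-out evaluation (names are illustrative; ranked_lists[u] is user u's top-N list and held_out[u] the single held-out item):

```python
def hr_arhr(ranked_lists: list[list[int]], held_out: list[int]) -> tuple[float, float]:
    """Hit Rate and Average Reciprocal Hit-Rank over all test users."""
    hits, rank_sum = 0, 0.0
    for u, topn in enumerate(ranked_lists):
        if held_out[u] in topn:
            hits += 1
            rank_sum += 1.0 / (topn.index(held_out[u]) + 1)  # 1-based rank p_i
    n_users = len(ranked_lists)
    return hits / n_users, rank_sum / n_users
```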
SLIM on Binary Data: Top-N Recommendation Performance
[Figure 1: HR comparison on ccard, ecmrc, and Netflix.]
[Figure 2: ARHR comparison on ccard, ecmrc, and Netflix.]
[Figure 3: learning time comparison (seconds, log scale) on ccard, ecmrc, and Netflix.]
[Figure 4: testing time comparison (seconds, log scale) on ccard, ecmrc, and Netflix.]
Methods compared: itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, fsSLIM.
SLIM on Binary Data: SLIM for the Long-Tail Distribution
[Figure 5: rating distribution in ML10M (log scale, % of items vs. % of purchases/ratings): a short head of popular items accounts for most of the purchases/ratings, followed by a long tail of unpopular items.]
- SLIM outperforms the other methods on the "long tail".
[Figure 6: HR on the ML10M tail.]
[Figure 7: ARHR on the ML10M tail.]
Methods compared: itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM, fsSLIM.
SLIM on Binary Data: SLIM Recommendations for Different Values of N
[Figure 8: HR on BX for N = 5 to 25.]
[Figure 9: HR on Netflix for N = 5 to 25.]
Methods compared: itemkNN, itemprob, userkNN, PureSVD, WRMF, BPRMF, BPRkNN, SLIM.
- The performance difference between SLIM and the best of the other methods is larger for smaller values of N.
- SLIM tends to rank the most relevant items higher than the other methods do.
SLIM on Binary Data: SLIM Regularization Effects
[Figure 10: SLIM regularization effects on BX: recommendation time (s) and HR as functions of β and λ, each varied over 0.0, 0.5, 1.0, 2.0, 3.0, 5.0.]
    minimize_W   1/2 ‖A − AW‖_F² + β/2 ‖W‖_F² + λ‖W‖₁
- As greater ℓ1-norm regularization is applied (i.e., larger λ), recommendation time drops, indicating that the learned W is sparser.
- The best recommendation quality is achieved when both regularization parameters β and λ are non-zero.
- The recommendation quality changes smoothly as β and λ change.
SLIM on Rating Data: Top-N Recommendation Performance
[Figure 11: SLIM on Netflix: the distribution of ratings 1-5, and per-rating hit rate (rHR) at each rating value for PureSVD, WRMF, BPRkNN, and SLIM, each trained on ratings (-r) and on binarized data (-b).]
- Evaluation metric: per-rating Hit Rate (rHR), the hit rate computed separately over the held-out items with each rating value (a sketch follows below).
- All the -r methods (trained on rating values) produce higher hit rates on items with higher ratings.
- The -r methods outperform the -b methods (trained on binarized data) on high-rated items.
- SLIM-r consistently outperforms the other methods on items with higher ratings.
Conclusions
- SLIM: Sparse LInear Method for top-N recommendation
  - The recommendation score for a new item is calculated as a sparse aggregation of the items the user has already purchased
  - A sparse aggregation coefficient matrix W is learned, which makes the aggregation very fast
  - W is learned by solving an ℓ1- and ℓ2-norm regularized optimization problem, which introduces sparsity into W
- Fast to train and apply, with high recommendation quality
References
[1] P. Cremonesi, Y. Koren, and R. Turrin. Performance of recommender algorithms on top-N recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems (RecSys '10), pages 39–46, New York, NY, USA, 2010. ACM.

[2] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22:143–177, January 2004.

[3] Y. Hu, Y. Koren, and C. Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 263–272, Washington, DC, USA, 2008. IEEE Computer Society.

[4] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), pages 452–461, Arlington, Virginia, 2009. AUAI Press.

[5] V. Sindhwani, S. S. Bucak, J. Hu, and A. Mojsilovic. One-class matrix completion with low-density factorizations. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM '10), pages 1055–1060, Washington, DC, USA, 2010. IEEE Computer Society.

[6] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society (Series B), 58:267–288, 1996.
Thank You!