HyPER: A Flexible and Extensible Probabilistic Framework for
Hybrid Recommender Systems
Pigi Kouki, Shobeir Fakhraei, James Foulds, Magdalini Eirinaki, Lise Getoor
University of California, Santa Cruz
University of Maryland, College Park
San Jose State University
Motivation
• Increasing amount of data useful for recommendations: ratings, content, social, demographic
Multiple Data Sources
• Content
  – [Gunawardana and Meek, RecSys 2009]
  – [Forbes and Zhu, RecSys 2011]
  – [de Campos et al., IJAR 51(7) 2010]
• Social relationships
  – [Ma et al., WSDM 2011]
  – [Liu et al., DSS 55(3) 2013]
Combining ratings with other data sources improves performance
Multiple Data Sources
• Review text
  – [McAuley & Leskovec, RecSys 2013]
  – [Ling et al., RecSys 2014]
• Tags and labels (#cool #neat #ok #sucks)
  – [Guy et al., SIGIR 2010]
• Feedback
  – [Sedhain et al., RecSys 2014]
Combining ratings with other data sources improves performance
Multiple Recommenders
Combining predictions of multiple recommenders also improves performance
“Predictive accuracy is substantially improved when blending multiple predictors”
– [Bell et al., The BellKor Solution to the Netflix Prize, 2007]
See also:
• [Jahrer et al., KDD 2010]
• [Burke, In The Adaptive Web, 2007]
Desiderata for Hybrid Systems
• To get the best performance, we should make use of all available data sources and algorithms
• We need a framework that is:
  – General
    • Combines arbitrary data modalities
    • Combines multiple recommenders
    • Problem- and data-agnostic
  – Extensible to new information sources/recommenders
  – Scalable to large data sets
General Hybrid Recommenders in the Literature
• Existing hybrid systems, though powerful, typically fall short on generality, extensibility, or scalability
  – Often combine collaborative and/or content-based methods with each other or just one other data modality (cf. previous slides)
  – Some systems can leverage heterogeneous data
    • [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]
• Probabilistic graphical modeling approaches are typically more general, but less scalable
  – Bayesian networks [de Campos et al., IJAR 51(7) 2010]
  – Markov logic networks [Hoxha & Rettinger, ICMLA 2013]
Our Approach
• A general, extensible, scalable recommender framework
• Leverages advances in statistical relational learning– Probabilistic soft logic [Bach et al., UAI 2013, ArXiv 2015]
• Inspired by recent work in drug-target interaction prediction [Fakhraei et al., Transactions on Computational Biology and Bioinformatics 11(5) 2014]
We propose HyPER: Hybrid Probabilistic Extensible Recommender
Hybrid Modeling with HyPER
[Figure: Data Source 1 … Data Source N feed Recommender 1 … Recommender M; HyPER combines their outputs into predicted ratings]
HyPER: High-Level Approach
• User–item ratings viewed as a weighted bipartite graph
• Build hybrid model by adding links to encode additional information
  – multiple user and item similarities, social information, …
• Predict ratings by reasoning over the graph, via a graphical model
Extended Recommendation Graph
[Figure: the user–item rating graph extended with user–user and item–item similarity links and social links]
Modeling and Reasoning over the Graph
• Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
  – Exact, efficient, and scalable inference
  – Continuous random variables
  – Models defined by PSL programs
• Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
  – Statistical relational learning system
  – Logical probabilistic programming interface
  – Templating language for HL-MRFs
Hinge-loss Markov Random Fields
• Conditional random field over continuous random variables between 0 and 1
• Feature functions are hinge-loss functions of a linear function of the variables, optionally squared (exponent 2)
• Hinge losses encode the distance to satisfaction for each instantiated rule
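The bullets above can be summarized in the HL-MRF density. The slide's original formula was lost in conversion; this reconstruction follows the notation of Bach et al. (UAI 2013):

```latex
P(\mathbf{Y} \mid \mathbf{X}) \;=\; \frac{1}{Z(\mathbf{X})}
  \exp\!\Big(-\sum_{j=1}^{m} w_j\,\phi_j(\mathbf{Y}, \mathbf{X})\Big),
\qquad
\phi_j(\mathbf{Y}, \mathbf{X}) \;=\; \max\{\ell_j(\mathbf{Y}, \mathbf{X}),\, 0\}^{p_j}
```

where each \(\ell_j\) is a linear function of the variables, \(p_j \in \{1, 2\}\) (\(p_j = 2\) gives the squared hinge), and \(\mathbf{Y} \in [0, 1]^n\).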
Efficient Inference in HL-MRFs
• The energy function is convex, so a global MAP state can be found
• The alternating direction method of multipliers (ADMM) is used for efficient and scalable inference
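Because the energy is a sum of weighted hinges of linear functions, any standard convex method finds the global minimum. As a toy illustration only, here is a projected subgradient descent sketch (a simplified stand-in for the consensus-ADMM algorithm PSL actually implements; all names and data shapes are hypothetical):

```python
def map_inference(potentials, variables, steps=500, lr=0.05):
    """Minimize a toy HL-MRF energy: a sum of weighted hinge potentials
    over variables constrained to [0, 1].

    potentials: list of (weight, coeffs, offset) triples, each encoding
        weight * max(sum_v coeffs[v] * x[v] + offset, 0)
    """
    x = {v: 0.5 for v in variables}          # start at the midpoint
    for _ in range(steps):
        grad = {v: 0.0 for v in variables}
        for w, coeffs, offset in potentials:
            # Subgradient is nonzero only where the hinge is active
            if sum(coeffs.get(v, 0.0) * x[v] for v in variables) + offset > 0:
                for v, c in coeffs.items():
                    grad[v] += w * c
        for v in variables:
            # Gradient step, then project back onto [0, 1]
            x[v] = min(1.0, max(0.0, x[v] - lr * grad[v]))
    return x
```

For example, a single potential max{0.8 − r, 0} (weight 1, coeffs {'r': −1}, offset 0.8) drives r from 0.5 up to roughly 0.8, where the hinge becomes inactive.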
Probabilistic Soft Logic
• Statistical relational learning language
• Uses first-order logical rules
• Templates HL-MRFs
Example rule, with weight w, logical operators, and predicates:
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
Probabilistic Soft Logic
• Converts rules to hinge-loss potentials
• PSL program = rules + data• Open source: http://psl.umiacs.umd.edu
hinge-loss
LikesGenre(U, G) && IsGenre(M, G) Rating(U, M)
42
Probabilistic Soft Logic
• Converts rules to hinge-loss potentials
• PSL program = rules + data• Open source: http://psl.umiacs.umd.edu
hinge-loss
LikesGenre(U, G) && IsGenre(M, G) Rating(U, M)
max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) -1, 0}
43
Probabilistic Soft Logic
• Converts rules to hinge-loss potentials
• PSL program = rules + data• Open source: http://psl.umiacs.umd.edu
hinge-loss
LikesGenre(U, G) && IsGenre(M, G) Rating(U, M)
max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) -1, 0}
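To make the conversion concrete, here is a minimal sketch of that grounded potential (variable names are illustrative; truth values live in [0, 1], per the Łukasiewicz relaxation PSL uses):

```python
def distance_to_satisfaction(likes_genre, is_genre, rating):
    """Hinge-loss potential for the grounded rule
        LikesGenre(u, g) && IsGenre(m, g) -> Rating(u, m).
    Under the Lukasiewicz relaxation the body truth value is
    max(likes_genre + is_genre - 1, 0), and the rule's distance
    to satisfaction simplifies to the single hinge below."""
    return max(likes_genre + is_genre - rating - 1.0, 0.0)
```

A fully true body with a zero-valued head gives distance 1; a satisfied rule (or a body that is not sufficiently true) gives 0.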
Recommendations with HyPER
• Similar items get similar ratings from a user
  – e.g. cosine, adjusted cosine, Pearson, content similarity
SimilarItems(i1, i2), Rating(u, i1) = 5, Rating(u, i2) = ?
SimilarItems_sim(i1, i2) && Rating(u, i1) → Rating(u, i2)
Recommendations with HyPER
• Similar users give similar ratings to an item
  – e.g. cosine, Pearson
SimilarUsers(u1, u2), Rating(u1, i) = 4, Rating(u2, i) = ?
SimilarUsers_sim(u1, u2) && Rating(u1, i) → Rating(u2, i)
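The SimilarUsers_sim predicates are precomputed from the rating data. As a sketch, cosine similarity over co-rated items might look like this (the function and data shapes are illustrative, not HyPER's actual code):

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating vectors,
    restricted to the items both users have rated.
    ratings_*: dict mapping item id -> rating."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # no co-rated items: treat as dissimilar
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)
```

Proportional rating vectors get similarity 1, which is why mean-centered variants (adjusted cosine, Pearson) are often preferred when users rate on different scales.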
46
• Mean-centering priors
• Additional data sources
• Leveraging existing recommenders• e.g. matrix factorization, item-based
Recommendations with HyPER
AverageUserRating(u) Rating(u, i)AverageItemRating(i) Rating(u, i)
47
• Mean-centering priors
• Social network links
• Leveraging existing recommenders• e.g. matrix factorization, item-based
Recommendations with HyPER
Friends(u1, u2) && Rating (u1, i) Rating(u2, i)
AverageUserRating(u) Rating(u, i)AverageItemRating(i) Rating(u, i)
48
• Mean-centering priors
• Social network links
• Leveraging existing recommenders• e.g. matrix factorization, item-based
Recommendations with HyPER
RatingRecommender(u, i) Rating(u, i)
Friends(u1, u2) && Rating (u1, i) Rating(u2, i)
AverageUserRating(u) Rating(u, i)AverageItemRating(i) Rating(u, i)
49
• Mean-centering priors
• Social network links
• Leveraging existing recommenders• e.g. matrix factorization, item-based
Recommendations with HyPER
Extensible to new data/algorithms – just add rules!
RatingRecommender(u, i) Rating(u, i)
Friends(u1, u2) && Rating (u1, i) Rating(u2, i)
AverageUserRating(u) Rating(u, i)AverageItemRating(i) Rating(u, i)
50
Balancing the Rules
• Balancing done through weights wj
• Higher wj indicates a more important rule
• Weight learning by approximating a gradient step in the conditional log-likelihood:
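The slide's equation did not survive conversion; the standard gradient for HL-MRF weight learning, which the approximation targets (per Bach et al., 2013/2015), is:

```latex
\frac{\partial \log p(\mathbf{Y} \mid \mathbf{X})}{\partial w_j}
  \;=\; \mathbb{E}_{\mathbf{w}}\!\left[\Phi_j(\mathbf{Y}, \mathbf{X})\right]
  \;-\; \Phi_j(\mathbf{Y}, \mathbf{X})
```

where \(\Phi_j\) is the sum of the ground potentials instantiated by rule j; the intractable expectation is approximated by evaluating \(\Phi_j\) at the MAP state, giving a structured-perceptron-style update.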
Experimental Validation
• Yelp academic dataset (https://www.yelp.com/academic_dataset)
  – ~34k users, ~3.6k items, ~99k ratings
  – ~81k friendships
  – 514 business categories
• Last.fm (http://grouplens.org/datasets/hetrec-2011/)
  – ~1.8k users, ~17k items, ~92k ratings
  – ~12k friendships
  – ~9.7k artist tags
• Evaluation metrics: RMSE, MAE
Baselines
• Collaborative filtering systems
  – Item-based CF [Ning et al., In Recommender Systems Handbook, 2015]
  – Matrix factorization (MF) [Koren et al., IEEE Computer 42(8) 2009]
  – Bayesian probabilistic matrix factorization (BPMF) [Salakhutdinov & Mnih, ICML 2008]
• Hybrid systems
  – Naïve hybrid (averaged predictions)
  – BPMF with social relations and content (BPMF-SRIC) [Liu et al., DSS 55(3) 2013]
HyPER vs. Baselines
• HyPER outperforms all other models on both datasets
• The improvements are statistically significant
HyPER Submodels: Mean-centering
• The HyPER combined model beats the individual rules
HyPER Submodels: User-based
• The HyPER combined model beats or matches the best individual rules
• Similar story for the item-based, content, and social submodels
56
• HyPER can combine different recommenders effectively• Results statistically significant better
Combining the Baselines
HyPER (All Rules)
• Combining all rules achieves the best performance on both datasets
Scaling to Large Datasets
• Parallel implementation for inference and learning based on ADMM [Bach et al., UAI 2013]
• Scaling to big-data applications:
  – perform inference in parallel on densely connected subgraphs of the original graph
  – fully distributed implementation of ADMM
Conclusions
• HyPER is a general-purpose, extensible framework for hybrid recommender systems
• With HyPER, practitioners can define custom hybrid models for using all available data/algorithms, via logical rules in PSL
• HyPER outperforms existing techniques on two popular datasets
Thank you for your attention!
HyPER Submodels – Item-based, Content & Social
References
X. Ning, C. Desrosiers, and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, 2nd edition, Springer, 2015.
S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with probabilistic soft logic. Transactions on Computational Biology and Bioinformatics, 11(5), 2014.
J. Liu, C. Wu, and W. Liu. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems, 55(3), 2013.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 2009.
A. Gunawardana and C. Meek. A unified approach to building hybrid recommender systems. In RecSys, 2009.
R. Burke. Hybrid web recommender systems. In The Adaptive Web. Springer, 2007.
L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 51(7), 2010.
M. Jahrer, A. Toscher, and R. Legenstein. Combining predictions for accurate recommender systems. In KDD, 2010.
J. Hoxha and A. Rettinger. First-order probabilistic model for hybrid recommendations. In ICMLA, 2013.
S. H. Bach, B. Huang, B. London, and L. Getoor. Hinge-loss Markov random fields: Convex inference for structured prediction. In UAI, 2013.
S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. ArXiv:1505.04406 [cs.LG], 2015.
A. P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In RecSys, 2011.
J. Chen, G. Chen, H. Zhang, J. Huang, and G. Zhao. Social recommendation based on multi-relational analysis. In WI-IAT, 2012.
R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In User Modeling, Adaptation, and Personalization. Springer, 2014.
J. Gemmell, T. S., B. Mobasher, and R. Burke. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. Journal of Computer and System Sciences, 78(4), 2012.
X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In WSDM, 2014.
H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM, 2011.
J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In RecSys, 2013.
G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.
I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In SIGIR, 2010.
S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen. Social collaborative filtering for cold-start recommendations. In RecSys, 2014.