Advances of Deep & Reinforcement Learning
on Recommender Systems
Weinan Zhang, Shanghai Jiao Tong University
http://wnzhang.net
Jan. 06, 2020 at Tsinghua University
Content
• A brief review of the recommender system road map
• Deep learning for recommender systems
• Deep reinforcement learning for recommender systems
• Summary
Road Map of Recommendation Technique
2000-2006 Neighborhood based collaborative filtering
2007-2009 Matrix factorization and variants
2010-2015 Factorization machine and variants
2015-2017 Deep neural networks for user behavior prediction
2017-2019 Deep reinforcement learning for decision making
Matrix Factorization Techniques
Koren, Yehuda, Robert Bell, and Chris Volinsky. "Matrix factorization techniques for recommender systems." Computer 42.8 (2009).
$\hat{r}_{u,i} = \mu + b_u + b_i + p_u^\top q_i$

• $\mu$: global bias
• $b_u$: user bias
• $b_i$: item bias
• $p_u^\top q_i$: user-item interaction
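As a rough illustration, the biased-MF prediction above can be computed directly; all values below are made-up toy numbers, not fitted parameters:

```python
import numpy as np

def mf_predict(mu, b_u, b_i, p_u, q_i):
    """Biased matrix factorization: r_hat = mu + b_u + b_i + p_u . q_i."""
    return mu + b_u + b_i + float(np.dot(p_u, q_i))

mu = 3.5                     # global average rating (toy value)
b_u, b_i = 0.2, -0.1         # user and item biases (toy values)
p_u = np.array([0.5, 1.0])   # user latent factors
q_i = np.array([0.4, 0.2])   # item latent factors
print(mf_predict(mu, b_u, b_i, p_u, q_i))  # close to 4.0 = 3.5 + 0.2 - 0.1 + 0.4
```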
Factorization Machine
• Incorporates all available information for recommender systems
• One-hot encoding for each discrete (categorical) field
• One real-valued feature for each continuous field
• All features are associated with latent factors
• A more general regression model
Steffen Rendle. Factorization Machines. ICDM 2010. (10-year Best Paper)
http://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf
Open source: http://www.libfm.org/
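A minimal sketch of a degree-2 FM prediction, using the O(kn) reformulation of the pairwise term from Rendle's paper; the helper name `fm_predict` and the toy sizes are illustrative, not from libFM:

```python
import numpy as np

def fm_predict(w0, w, V, x):
    """Degree-2 FM: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    with the pairwise term computed in O(k * n) instead of O(k * n^2)."""
    linear = w0 + float(np.dot(w, x))
    xv = V.T @ x                                        # shape (k,)
    pairwise = 0.5 * float(np.sum(xv ** 2 - (V.T ** 2) @ (x ** 2)))
    return linear + pairwise

# Hand-checkable toy case: two active features with 1-d factors 2 and 3,
# so the pairwise term is 2 * 3 * 1 * 1 = 6.
print(fm_predict(0.0, np.zeros(2), np.array([[2.0], [3.0]]), np.ones(2)))  # 6.0
```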
Factorization Machine is a Neural Network: A New Perspective
Content
• A brief review of the recommender system road map
• Deep learning for recommender systems
• Deep reinforcement learning for recommender systems
• Summary
Factorization-machine Neural Networks (FNN)
[Factorization Machine Initialized]
Weinan Zhang et al. Deep Learning over Multi-Field Categorical Data: A Case Study on User Response Prediction. ECIR 2016
But a factorization machine is still different from common additive neural networks!
Product operation
Product Operations as Feature Interactions
Yanru Qu, Weinan Zhang et al. Product-based Neural Networks for User Response Prediction. ICDM 2016
Product-based Neural Network (PNN)
• The blue Pi nodes are product operators
[PNN architecture diagram: Feature 1 … Feature N → Embed 1 … Embed N (embedding layer) → product nodes P1, P2, …, Pi (product layer) → fully connected layers → prediction]
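The product layer can be sketched as follows: a simplified inner-product PNN with one FC layer. All names, sizes, and the toy sigmoid head below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def pnn_forward(embeds, W, b):
    """Simplified inner-product PNN: concatenate the field embeddings
    with all pairwise inner products, then one ReLU FC layer and a
    toy sigmoid prediction head."""
    n = len(embeds)
    products = [float(np.dot(embeds[i], embeds[j]))
                for i in range(n) for j in range(i + 1, n)]
    z = np.concatenate([np.concatenate(embeds), np.array(products)])
    h = np.maximum(0.0, W @ z + b)            # fully connected + ReLU
    return 1.0 / (1.0 + np.exp(-h.sum()))     # toy CTR-style output

rng = np.random.default_rng(0)
embeds = [rng.normal(size=4) for _ in range(3)]   # 3 fields, dim-4 embeddings
W = rng.normal(size=(8, 3 * 4 + 3)) * 0.1         # 3 field pairs for 3 fields
b = np.zeros(8)
p = pnn_forward(embeds, W, b)                     # predicted probability
```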
DeepFM
Huifeng Guo, Ruiming Tang and Xiuqiang He et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. IJCAI 2017.
Attentional Factorization Machines
• Basic idea: reweight each field-pair interaction with an attention network
Jun Xiao et al. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. IJCAI 2017.
element-wise product of two vectors
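The attention reweighting can be sketched as below, assuming a simple linear attention scorer for brevity (the paper uses a small MLP); the function and parameter names are illustrative:

```python
import numpy as np

def afm_interaction(embeds, w_att, p):
    """AFM core (simplified): element-wise products of all embedding
    pairs, attention weights from a linear scorer + softmax, then a
    weighted sum projected to a scalar by vector p."""
    pairs = [embeds[i] * embeds[j]
             for i in range(len(embeds)) for j in range(i + 1, len(embeds))]
    scores = np.array([np.dot(w_att, e) for e in pairs])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # attention weights, sum to 1
    agg = sum(a * e for a, e in zip(alpha, pairs))
    return float(np.dot(p, agg)), alpha
```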
Field-aware Interaction
• In FFM, this interaction is implemented with field-aware embeddings
• Substitute with a unified embedding and a "field-aware parameter"
[Diagram: kernel interaction between two fields, F1 * Kernel * F2, fed together with F1 and F2 into an FC layer producing a score]
• Network-in-Network (NFM, PIN): generalize the interaction to arbitrary functions via sub-networks
• Kernel Interaction (KFM, KPNN): use different kernels to project interactions separately
Yanru Qu, Weinan Zhang, Ruiming Tang, Xiuqiang He et al. Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data. TOIS 2018.
Product-network In Network (PIN)
• We can design various sub-nets to explore the interaction pattern between two fields
[PIN architecture diagram: Feature 1 … Feature N → Embed 1 … Embed N (embedding layer) → pairwise sub-nets (Sub-net 1, Sub-net 2, …, Sub-net i; each takes F1, F2, and F1*F2 through an FC layer to a hidden state) → fully connected layers → prediction]
Public Data Experiment Performance
PIN achieved the best performance on well-recognized benchmarks and Huawei’s private dataset.
Content
• A brief review of the recommender system road map
• Deep learning for recommender systems
• Deep reinforcement learning for recommender systems
• Summary
From PCs to Mobiles
• Users only provide feedback on the recommended items, which depend on the current recommendation algorithm
• Learning from interactions with users
Reinforcement Learning
• At each step t, the agent
  • Receives observation Ot
  • Executes action At
  • Receives scalar reward Rt
• The environment
  • Receives action At
  • Emits observation Ot+1
  • Emits scalar reward Rt+1
• t increments at each environment step
[Diagram: agent-environment interaction loop]
Learning from interaction: Given the current situation, what to do next in order to maximize utility?
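The interaction loop above can be sketched with a toy environment; `ToyRecEnv`, the hidden preferred item, and the uniform-random policy are all hypothetical stand-ins for a real recommender setting:

```python
import random

class ToyRecEnv:
    """Hypothetical environment: 3 candidate items, reward 1 when the
    recommended item matches the user's hidden preference."""
    def __init__(self, seed=0):
        self.preferred = random.Random(seed).randrange(3)

    def step(self, action):              # environment receives A_t ...
        reward = 1.0 if action == self.preferred else 0.0
        observation = reward             # ... and emits O_{t+1}, R_{t+1}
        return observation, reward

env = ToyRecEnv(seed=0)
agent_rng = random.Random(42)
returns = 0.0
for t in range(20):                      # agent observes, acts, gets reward
    action = agent_rng.randrange(3)      # a (deliberately bad) random policy
    obs, r = env.step(action)
    returns += r                         # utility the agent should maximize
```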
Deep RL for Recommender Systems
• Methodologies
• Policy-based solutions
  • Policy gradient
  • Deep deterministic policy gradient / actor-critic
• Value-based solutions
  • Deep Q-learning
Policy-based works
• Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.
• Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv 2015.
• Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.
• Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.
(from ICT, DeepMind, JD, and SJTU, respectively)
Reinforcement Learning to Rank with Markov Decision Process
• State: the state st is defined as a pair [t, Xt], where Xt is the set of documents remaining to be ranked
• Action: at ∈ A(st) selects a document xm(at) ∈ Xt for the ranking position t+1
• Transition:
• Reward:
• Model how to select the next item in the ranking list as an MDP
Zeng Wei et al. Reinforcement Learning to Rank with Markov Decision Process. SIGIR 2017.
Reinforcement Learning to Rank with Markov Decision Process
REINFORCE Policy Gradient
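A minimal REINFORCE step for a linear softmax policy over candidate documents; the linear featurization and function names are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def policy_probs(theta, feats):
    """Softmax policy pi(a | s) over candidate documents with linear scores."""
    logits = feats @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def reinforce_update(theta, feats, action, G, lr=0.1):
    """One REINFORCE step: theta += lr * G * grad log pi(action | state)."""
    probs = policy_probs(theta, feats)
    grad_log_pi = feats[action] - probs @ feats   # grad of log softmax
    return theta + lr * G * grad_log_pi

feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 candidate docs
theta0 = np.zeros(2)
theta1 = reinforce_update(theta0, feats, action=0, G=1.0)  # doc 0 was rewarded
```

With a positive return G, the update raises the probability of the sampled action, which is the whole mechanism behind learning a ranking policy from relevance feedback.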
Reinforcement Learning to Rank with Markov Decision Process
Ranking accuracies on MQ2007 dataset
Deep Reinforcement Learning in Large Discrete Action Spaces
• Algorithm
Dulac-Arnold G et al. Deep reinforcement learning in large discrete action spaces[J]. arXiv preprint arXiv:1512.07679, 2015.
Argmax on KNN
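The "argmax on kNN" selection can be sketched as: the actor emits a continuous proto-action, we retrieve its k nearest item embeddings, and pick the retrieved item with the highest Q-value. `wolpertinger_select` and the brute-force distance scan are illustrative simplifications (the paper uses approximate nearest-neighbor search):

```python
import numpy as np

def wolpertinger_select(proto, item_embeds, q_values, k=3):
    """Wolpertinger-style selection: k items nearest to the actor's
    continuous proto-action, then argmax over their Q-values."""
    dists = np.linalg.norm(item_embeds - proto, axis=1)
    knn = np.argsort(dists)[:k]                 # k nearest candidate items
    return int(knn[int(np.argmax(q_values[knn]))])

items = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
q = np.array([0.2, 0.9, 0.5, 10.0])
# Item 3 has the highest Q, but lies far from the proto-action, so the
# kNN filter excludes it and item 1 wins among the 3 nearest.
chosen = wolpertinger_select(np.array([0.1, 0.1]), items, q, k=3)
```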
Deep Reinforcement Learning in Large Discrete Action Spaces
• Experimental performance
TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions
• There is a large number of candidate items to take as actions
  • causing very large computational complexity
  • no previous literature on this topic
  • no previous application with such a setting
• TPGR solution: build a hierarchical item structure for sequential decision making
Haokun Chen et al. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. AAAI 2019.
TPGR: Tree Policy Gradient RecSys for Handling Large-Scale Discrete Actions
• Item correlation is based on the current policy
• The policy (and value function) can be regarded as a table:

          Action 1   Action 2   Action 3   Action 4
State 1     0.1        0.3        0.2        0.4
State 2     0.4        0.3        0.1        0.2
State 3     0.1        0.1        0.3        0.5
State 4     0.4        0.2        0.2        0.2
State 5     0.2        0.3        0.3        0.2
State 6     0.1        0.1        0.6        0.2
• Based on such a table, we can cluster the items into a hierarchy
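The hierarchical structure can be sketched as a balanced tree over item ids, so that reaching an item takes one decision per level instead of one softmax over all items. `build_tree` and `select` are illustrative simplifications: TPGR clusters items by learned correlation, whereas here we just split a list evenly:

```python
def build_tree(items, branch=2):
    """Balanced tree over item ids: each internal node has at most
    `branch` children, so reaching an item takes O(log N) decisions."""
    if len(items) <= branch:
        return list(items)
    size = -(-len(items) // branch)           # ceil(len / branch)
    return [build_tree(items[i:i + size], branch)
            for i in range(0, len(items), size)]

def select(tree, choices):
    """Follow one policy decision per tree level down to a leaf item."""
    node = tree
    for c in choices:
        node = node[c]
        if not isinstance(node, list):
            return node
    return node

tree = build_tree(list(range(8)))             # 8 toy items, depth 3
```

Choosing among 8 items now costs three binary decisions rather than one 8-way softmax, which is the complexity saving TPGR exploits at scale.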
Model-based RL for RecSys
• Motivation: model-free deep RL methods
  • consume a huge amount of data (low sample efficiency)
  • suffer from sparse positive feedback (sparse reward)
Xiangyu Zhao et al. Deep Reinforcement Learning for Whole-Chain Recommendations. WSDM 2020.
Actor-Critic
• Estimate the value of an action in different scenarios
  • Entrance page / item detail page
Skip on entrance page
Click on entrance page
Leave on entrance page
Skip on item detail page
Click on item detail page
Leave on item detail page
Build predictive models to estimate user behaviors: skip/click/leave
Critic network
Deep RL for Recommender Systems
• Methodologies
• Policy-based solutions
  • Policy gradient
  • Deep deterministic policy gradient
• Value-based solutions
  • Deep Q-learning
DQN-based works
• Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.
• DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.
• Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017.
• Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. 2020.
(from JD, MSR, PolyU, and SJTU, respectively)
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning
feed positive input and negative input separately
Xiangyu Zhao et al. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. KDD 2018.
Problem: how to effectively represent the user state?
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning
maximizing the difference of Q-values between "enemy" (competing) items
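One way to read the pairwise idea is as a ranking term added to the usual TD loss, pushing the clicked item's Q-value above a competing item's. This is a hedged simplification for intuition (`pairwise_q_loss`, the hinge form, and the weight `lam` are assumptions), not the paper's exact objective:

```python
def pairwise_q_loss(q_clicked, q_competitor, td_error, margin=1.0, lam=0.5):
    """Simplified pairwise objective: squared TD error plus a hinge term
    that encourages Q(clicked item) > Q(competing "enemy" item) by a margin."""
    rank_term = max(0.0, margin - (q_clicked - q_competitor))
    return td_error ** 2 + lam * rank_term
```

When the clicked item already dominates by the margin, the hinge term vanishes and only the standard TD loss remains.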
DRN: A Deep Reinforcement Learning Framework for News Recommendation
• How to train an RL policy online and offline?
Guanjie Zheng et al. DRN: A Deep Reinforcement Learning Framework for News Recommendation. WWW 2018.
DRN: A Deep Reinforcement Learning Framework for News Recommendation
• Use user and item features to represent Q(s,a)
Dueling Q-network
Exploration by Dueling Bandit Gradient Descent
• Trial-and-update learning (somewhat like evolutionary search)
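One DBGD-style exploration step can be sketched as: perturb the current network parameters, compare the two policies' feedback, and move toward the perturbation only if it wins. Here `reward_fn` is a hypothetical stand-in for interleaved user feedback, and the step sizes are illustrative:

```python
import numpy as np

def dbgd_step(theta, reward_fn, alpha=0.1, eta=0.5, seed=0):
    """One dueling-bandit-gradient-descent-style step: propose an
    "explore" policy at a random unit perturbation, and take a minor
    update toward it only if it receives better feedback."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=theta.shape)
    delta /= np.linalg.norm(delta)               # random unit direction
    explore = theta + alpha * delta              # candidate explore network
    if reward_fn(explore) > reward_fn(theta):    # interleaved comparison
        theta = theta + eta * alpha * delta      # move toward the winner
    return theta
```

This trial-and-update scheme never needs gradients of the reward itself, which is why the slide likens it to evolutionary search.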
Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream
• Observations: previous interactions
• States: the hidden state of an LSTM
• Actions: push or not
Haihui Tan et al. Neural Network based Reinforcement Learning for Real-time Pushing on Text Stream. SIGIR 2017.
KGQR: Leveraging Knowledge Graphs for Better Item & State Representations
Sijin Zhou et al. Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning. In submission 2020.
Summary of Current RL Solutions for Rec.
• State: weak user profile representation
• Action: unable to handle large-scale discrete action spaces well
• Learning: off-policy model-free RL to avoid data bias and user modeling
  • data efficiency is quite low
• System: lack of online experiments, or long tuning time; online/offline learning combination
• Modeling user dynamics would be a promising direction
  • efficient state/action representation
Content
• A brief review of the recommender system road map
• Deep learning for recommender systems
• Deep reinforcement learning for recommender systems
• Summary
Summary: Road Map of Recommendation Technique
2000-2006 Neighborhood based collaborative filtering
2007-2009 Matrix factorization and variants
2010-2015 Factorization machine and variants
2015-2017 Deep neural networks for user behavior prediction
2017-2019 Deep reinforcement learning for decision making
Design neural nets to automatically capture complex interaction patterns in user-item data
Design RL settings for sequential recommendation decision making; train policies in an effective way
Thank You! Questions?
Dr. Weinan Zhang
Assistant Professor
APEX Data & Knowledge Management Lab
John Hopcroft Center for Computer Science
Shanghai Jiao Tong University
Know more about me at http://wnzhang.net