Collaborative Filtering Based on Star Users
Qiang Liu with Bingfei Cheng and Congfu Xu
College of Computer Science and Technology
Zhejiang University Hangzhou, Zhejiang 310027, China
ICTAI 2011, Boca Raton November 7, 2011
Outline
◦ Introduction
◦ Star-user-based Collaborative Filtering
◦ Experimental Results
◦ Conclusion
INTRODUCTION
Collaborative Filtering (CF)
◦ Neighborhood-based: User-based, Item-based
◦ Model-based: Bayesian Model, Factorization Model, Maximum Entropy, Classification or Clustering, ……
Motivation
To improve the most widely used technology in real-life recommender systems.
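Neighborhood Model
Similarity between users:
◦ Pearson: $\frac{\mathrm{cov}(a,b)}{\sigma_a \sigma_b}$
◦ Cosine: $\frac{a \cdot b}{\|a\| \, \|b\|}$
◦ Other similarity measures
Weighted sum of neighbors' ratings:
◦ $p_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}}{\sum_{u \in U} w_{a,u}}$
Example: common items 1, 4, 6; rating vectors of the common items: a = [1, 4, 5], b = [2, 2, 5].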
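The two similarity measures and the weighted-sum prediction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the `(rating, mean, weight)` tuple layout are assumptions made for the example, and the vectors are the ones from the slide.

```python
import math

def pearson(a, b):
    """Pearson correlation cov(a, b) / (sigma_a * sigma_b) over co-rated items."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    dev_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if dev_a == 0 or dev_b == 0:
        return 0.0  # assumption: treat a constant rating vector as uncorrelated
    return cov / (dev_a * dev_b)

def cosine(a, b):
    """Cosine similarity a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict(mean_a, neighbors):
    """Weighted sum of neighbors' mean-centered ratings.

    neighbors: list of (r_ui, mean_u, w_au) tuples, one per neighbor u.
    """
    num = sum((r_ui - mean_u) * w for r_ui, mean_u, w in neighbors)
    den = sum(w for _, _, w in neighbors)
    return mean_a if den == 0 else mean_a + num / den

a = [1, 4, 5]  # ratings of user a on common items 1, 4, 6
b = [2, 2, 5]  # ratings of user b on the same items
print(round(pearson(a, b), 4))  # 0.6934
print(round(cosine(a, b), 4))   # 0.9401
```

Note that cosine similarity is much higher than Pearson here because it does not subtract each user's mean rating.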
Challenges faced by traditional methods
Matching similar users (computing similarities):
◦ Sparsity and noise
◦ Scalability
◦ ……
STAR-USER-BASED CF
The MPN users
Let A, B, C, D be the sets of neighbors of four users, respectively. Then area E is the set of the most popular neighbors (MPN).
What is a star user
Star users are special users who have rated all items with a relatively stable standard.
We maintain a small set of star users and treat them as fixed neighbors of every general user.
Problem Formulation
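Fill in the following matrix $\mathcal{R} \in \mathbb{R}^{H \times N}$, with one row per star user ($s_1, \dots, s_H$) and one column per item ($i_1, \dots, i_N$):
$$\mathcal{R} = \begin{pmatrix} ? & \cdots & ? \\ \vdots & r_{s,i} & \vdots \\ ? & \cdots & ? \end{pmatrix}$$
Rows: star users (H). Columns: items (N).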
Prediction Model
Selecting star neighbors: the relationship matrix W holds one weight $w_{u,s}$ for each pair of a general user $u$ ($u_1, \dots, u_M$) and a star user $s$ ($s_1, \dots, s_H$).
Generate predictions based on star users' ratings:
$$p_{u,i} = \bar{r}_u + \frac{\sum_{s \in S} (r_{s,i} - \bar{r}_s) \cdot w_{u,s}}{\sum_{s \in S} w_{u,s}}$$
The parameters are $r_{s,i}$ and $w_{u,s}$.
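The prediction formula above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the list-of-lists layout for the star-user matrix and the toy numbers are all hypothetical, and the fallback to the user's mean when the weights sum to zero is a choice made here, not something the slides specify.

```python
def predict_from_star_users(user_mean, star_ratings, weights, item):
    """p_{u,i} = mean_u + sum_s (r_{s,i} - mean_s) * w_{u,s} / sum_s w_{u,s}.

    star_ratings: H x N list of lists (star users rate every item).
    weights:      length-H list, the row of W for this general user.
    """
    n_items = len(star_ratings[0])
    num = 0.0
    for row, w in zip(star_ratings, weights):
        star_mean = sum(row) / n_items  # mean rating of star user s over all items
        num += (row[item] - star_mean) * w
    den = sum(weights)
    if den == 0:
        return user_mean  # assumption: fall back to the user's own mean
    return user_mean + num / den

# Toy data: H = 2 star users, N = 3 items.
R = [[5.0, 3.0, 1.0],
     [4.0, 4.0, 1.0]]
w_u = [0.8, 0.2]  # hypothetical weights of one general user toward the two star users
print(predict_from_star_users(3.5, R, w_u, item=0))
```

Because every general user shares the same H star neighbors, prediction needs only one row of W and the small matrix of star ratings, not the full user-user similarity structure.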
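How we get star users (1)
Training Stage:
1. Initialize the star user matrix $\mathcal{R}$.
2. Predict each rating $\hat{r}_{u,i}$ in the training set:
$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{s \in S} (r_{s,i} - \bar{r}_s) \times w_{u,s}}{\sum_{s \in S} w_{u,s}}$$
3. The residual is $e_{u,i} = r_{u,i} - \hat{r}_{u,i}$, and the gradient of $e_{u,i}^2$ is:
$$\frac{\partial}{\partial r_{s,i}} e_{u,i}^2 = -2 e_{u,i} \cdot \frac{N-1}{N} \cdot \frac{w_{u,s}}{\sum_{s' \in S} w_{u,s'}}$$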
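The factor (N-1)/N in the gradient comes from the star user's mean: since the mean of star user s averages all N of that user's ratings, changing one rating by a small amount changes the mean-centered term by only (1 - 1/N) of that amount. The sketch below checks the analytic gradient against a central finite difference. It assumes the weights and the general user's mean are held fixed during differentiation; the function names and toy numbers are hypothetical.

```python
def predict(user_mean, R, w, i):
    """Prediction for item i; each star mean is recomputed from its row of R."""
    n_items = len(R[0])
    num = sum((row[i] - sum(row) / n_items) * ws for row, ws in zip(R, w))
    return user_mean + num / sum(w)

def analytic_grad(r_ui, user_mean, R, w, s, i):
    """d(e^2)/d r_{s,i} = -2 e (N-1)/N * w_s / sum(w), as on the slide."""
    n_items = len(R[0])
    e = r_ui - predict(user_mean, R, w, i)
    return -2.0 * e * (n_items - 1) / n_items * w[s] / sum(w)

def numeric_grad(r_ui, user_mean, R, w, s, i, eps=1e-6):
    """Central finite difference of the squared residual with respect to R[s][i]."""
    def squared_error(delta):
        shifted = [row[:] for row in R]
        shifted[s][i] += delta
        return (r_ui - predict(user_mean, shifted, w, i)) ** 2
    return (squared_error(eps) - squared_error(-eps)) / (2 * eps)

R = [[5.0, 3.0, 1.0], [4.0, 4.0, 1.0]]  # H = 2 star users, N = 3 items
w = [0.8, 0.2]                           # fixed weights for one general user
g_a = analytic_grad(4.0, 3.5, R, w, s=0, i=0)
g_n = numeric_grad(4.0, 3.5, R, w, s=0, i=0)
print(g_a, g_n)  # the two values should agree closely
```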
How we get star users (2)
Training Stage:
4. Update each element of matrix $\mathcal{R}$:
$$r_{s,i} \leftarrow r_{s,i} + \eta \cdot e_{u,i} \cdot \frac{w_{u,s}}{\sum_{s' \in S} w_{u,s'}}$$
5. Repeat steps 2 to 4 until convergence.
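Steps 1 to 5 can be put together as a small stochastic-gradient loop. This is a sketch under several assumptions that the slides do not fix: random initialization in the range 1 to 5, weights W held constant during training, star means recomputed for every sample, and a fixed number of passes standing in for a convergence test. All function and parameter names are hypothetical.

```python
import random

def predict(user_mean, R, w, i):
    """Star-user prediction for item i, given one user's weight row w."""
    n_items = len(R[0])
    num = sum((row[i] - sum(row) / n_items) * ws for row, ws in zip(R, w))
    return user_mean + num / sum(w)

def sse(ratings, user_means, W, R):
    """Sum of squared residuals over the training triples (u, i, r_ui)."""
    return sum((r - predict(user_means[u], R, W[u], i)) ** 2 for u, i, r in ratings)

def train_star_users(ratings, user_means, W, n_star, n_items,
                     eta=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the star user matrix (random start is an assumption).
    R = [[rng.uniform(1.0, 5.0) for _ in range(n_items)] for _ in range(n_star)]
    for _ in range(epochs):                              # Step 5: repeat
        for u, i, r_ui in ratings:
            w = W[u]
            w_sum = sum(w)
            if w_sum == 0:
                continue                                 # no usable star neighbors
            e = r_ui - predict(user_means[u], R, w, i)   # Steps 2 and 3
            for s in range(n_star):                      # Step 4: update r_{s,i}
                R[s][i] += eta * e * w[s] / w_sum
    return R

# Toy problem: one general user, two star users, three items.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 1.0)]
user_means = [3.0]
W = [[0.7, 0.3]]
R0 = train_star_users(ratings, user_means, W, 2, 3, epochs=0)  # untrained start
R1 = train_star_users(ratings, user_means, W, 2, 3)
print(sse(ratings, user_means, W, R0), sse(ratings, user_means, W, R1))
```

The update drops the constant factors 2 and (N-1)/N from the gradient, which amounts to absorbing them into the learning rate η.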
How we get star users (3)
Parameters:
◦ $\alpha$ (users): the update frequency of $\bar{r}_s$.
◦ $\beta$ (iterations): the update frequency of $w_{u,s} \in W$ for each $u$ and $s$.
$w_{u,s}$ is computed using Pearson correlation.
Maintain the relationship matrix $W \in \mathbb{R}^{M \times H}$ until the recommending stage.
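One way to build the M × H relationship matrix is to correlate each general user's ratings with each star user's ratings on the items that general user has rated. Star users rate every item, so those items are always shared. The sketch below assumes a dictionary per general user and a full list per star user; these layouts, the function names and the zero weight for constant vectors are illustrative choices, not taken from the slides.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length rating vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return 0.0 if dev == 0 else cov / dev

def build_relationship_matrix(user_ratings, star_ratings):
    """Return W as an M x H list of lists.

    user_ratings: list of dicts {item_index: rating}, one per general user.
    star_ratings: H x N list of lists, one full row per star user.
    """
    W = []
    for rated in user_ratings:
        items = sorted(rated)
        a = [rated[i] for i in items]
        W.append([pearson(a, [row[i] for i in items]) for row in star_ratings])
    return W

users = [{0: 1, 1: 4, 2: 5}]        # M = 1 general user who rated items 0, 1, 2
stars = [[2, 2, 5], [1, 4, 5]]      # H = 2 star users over N = 3 items
W = build_relationship_matrix(users, stars)
print([round(w, 4) for w in W[0]])  # [0.6934, 1.0]
```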
EXPERIMENTAL RESULTS
Results on MovieLens Dataset
Time requirement comparison
RMSE of our approach against various H and comparison with kNN
Item-based Model
We first train a small set of star items instead of star users.
Predictions are computed as:
$$p_{a,i} = \bar{r}_i + \frac{\sum_{s \in S'} (r_{a,s} - \bar{r}_s) \times w_{s,i}}{\sum_{s \in S'} w_{s,i}}$$
Results on Netflix Dataset
Our approach with different values of learning rate
Our approach with different values of H
Discussion
Comparison with kNN
◦ Accuracy
◦ Data Sparsity
◦ Scalability: $O(M^2 \times N') \rightarrow O(M \times H \times N')$, where $H \ll M$.
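The scalability claim can be made concrete with a rough operation count. The sketch below treats N' as the number of items compared per user pair, an interpretation the slides do not spell out, and the values of M, H and N' are purely hypothetical, not figures from the experiments.

```python
def knn_similarity_cost(n_users, n_compared):
    """Every user is compared with every other user: O(M^2 * N')."""
    return n_users * n_users * n_compared

def star_similarity_cost(n_users, n_star, n_compared):
    """Every user is compared only with the H star users: O(M * H * N')."""
    return n_users * n_star * n_compared

M, H, N_prime = 10_000, 50, 100  # hypothetical sizes with H much smaller than M
knn = knn_similarity_cost(M, N_prime)
star = star_similarity_cost(M, H, N_prime)
print(knn // star)  # 200, i.e. the ratio M / H
```

The saving is the ratio M / H, so it grows with the number of general users as long as the star-user set stays small.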
Comparison with SVD
◦ Scientific explanation
◦ Parameters
◦ Updating
CONCLUSION
Summary
We proposed a novel CF model based on star users.
The original intention is to improve the traditional neighborhood-based CF model.
Experimental results on two datasets verified the effectiveness of our approach.
Future work
Incorporating contextual information into our model.
Validating our approach in practical applications.
THANK YOU