Collaborative Filtering Based on Star Users

Qiang Liu with Bingfei Cheng and Congfu Xu

College of Computer Science and Technology

Zhejiang University, Hangzhou, Zhejiang 310027, China

ICTAI 2011, Boca Raton, November 7, 2011

Outline

◦ Introduction
◦ Star-user-based Collaborative Filtering
◦ Experimental Results
◦ Conclusion

INTRODUCTION

Collaborative Filtering (CF)

◦ Neighborhood-based: user-based, item-based
◦ Model-based: Bayesian model, factorization model, maximum entropy, classification or clustering, ……

Motivation

To improve the most widely used technology in real-life recommender systems.

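Neighborhood Model

Similarity between users:

◦ Pearson: $\dfrac{\operatorname{cov}(a,b)}{\sigma_a \sigma_b}$

◦ Cosine: $\dfrac{a \cdot b}{\|a\| \, \|b\|}$

◦ Other similarity measures

Weighted sum of neighbors' ratings:

◦ $p_{a,i} = \bar{r}_a + \dfrac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}}{\sum_{u \in U} w_{a,u}}$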

Common items: 1, 4, 6

Rating vectors of common items: a = [1, 4, 5], b = [2, 2, 5]
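As a quick check of the two similarity measures on the example vectors a and b, here is a minimal sketch in plain Python. The function names `pearson` and `cosine` are illustrative and not from the slides; Pearson is computed as covariance over the product of standard deviations, and cosine as the dot product over the product of norms.

```python
import math


def pearson(a, b):
    """Pearson correlation: cov(a, b) / (sigma_a * sigma_b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # The 1/n factors of covariance and variances cancel out.
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine similarity: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Ratings of the two users on their common items 1, 4, 6.
a = [1, 4, 5]
b = [2, 2, 5]
print(round(pearson(a, b), 3))  # 0.693
print(round(cosine(a, b), 3))   # 0.94
```

Cosine ignores each user's mean rating, which is why it reports a higher similarity here than Pearson does.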

Challenges faced by traditional methods

Matching similar users (computing similarities):

◦ Sparsity and noise
◦ Scalability
◦ ……

STAR-USER-BASED CF

The MPN users

Let areas A, B, C, and D be the neighbor sets of users A, B, C, and D, respectively. Then area E, shared by these sets, is the set of the most popular neighbors (MPN).

What is a star user?

Star users are special users who have rated all items with a relatively stable standard.

We maintain a small set of star users and treat them as fixed neighbors of every general user.

Problem Formulation

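Fill the following matrix $\mathcal{R} \in \mathbb{R}^{H \times N}$, whose rows are the H star users and whose columns are the N items:

$$
\mathcal{R} =
\begin{array}{c|ccccc}
 & i_1 & \cdots & i & \cdots & i_N \\ \hline
s_1 & ? & \cdot & \cdot & \cdot & ? \\
\vdots & \cdot & \cdot & \cdot & \cdot & \cdot \\
s & \cdot & \cdot & r_{s,i} & \cdot & \cdot \\
\vdots & \cdot & \cdot & \cdot & \cdot & \cdot \\
s_H & ? & \cdot & \cdot & \cdot & ?
\end{array}
$$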

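Prediction Model

Selecting star neighbors:

Generate predictions based on star users' ratings:

$$p_{u,i} = \bar{r}_u + \frac{\sum_{s \in S} (r_{s,i} - \bar{r}_s) \cdot w_{u,s}}{\sum_{s \in S} w_{u,s}}$$

The parameters are $r_{s,i}$ and $w_{u,s}$.

Relationship matrix W, with the H star users as rows and the M general users as columns:

$$
W =
\begin{array}{c|ccccc}
 & u_1 & \cdots & u & \cdots & u_M \\ \hline
s_1 & \cdot & \cdot & \cdot & \cdot & \cdot \\
\vdots & \cdot & \cdot & \cdot & \cdot & \cdot \\
s & \cdot & \cdot & w_{u,s} & \cdot & \cdot \\
\vdots & \cdot & \cdot & \cdot & \cdot & \cdot \\
s_H & \cdot & \cdot & \cdot & \cdot & \cdot
\end{array}
$$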
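The prediction rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `predict_from_star_users`, the toy numbers, and the fallback to the user's mean when the weights sum to zero are not from the slides.

```python
def predict_from_star_users(user_mean, star_ratings, star_means, weights, item):
    """p_{u,i} = mean_u + sum_s (r_{s,i} - mean_s) * w_{u,s} / sum_s w_{u,s}.

    star_ratings[s][item] is star user s's rating of the item,
    star_means[s] is star user s's mean rating,
    weights[s] is w_{u,s} for the target user u.
    """
    denominator = sum(weights)
    if denominator == 0:
        # Assumption: with no usable weights, fall back to the user's mean.
        return user_mean
    numerator = sum(
        (star_ratings[s][item] - star_means[s]) * weights[s]
        for s in range(len(weights))
    )
    return user_mean + numerator / denominator


# Hypothetical toy data: H = 2 star users, N = 3 items.
star_ratings = [[5, 3, 1], [4, 4, 1]]
star_means = [3.0, 3.0]
weights = [0.8, 0.2]  # w_{u,s} of one general user
print(round(predict_from_star_users(3.0, star_ratings, star_means, weights, 0), 2))  # 4.8
```

Because every star user has rated every item, the sum always runs over all H star users, so no neighbor search is needed at prediction time.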

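How we get star users (1)

Training stage:

1. Initialize the star user matrix $\mathcal{R}$.

2. Predict each rating $\hat{r}_{u,i}$ in the training set:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{s \in S} (r_{s,i} - \bar{r}_s) \times w_{u,s}}{\sum_{s \in S} w_{u,s}}$$

3. The residual is

$$e_{u,i} = r_{u,i} - \hat{r}_{u,i}$$

and the gradient of $e_{u,i}^2$ is:

$$\frac{\partial}{\partial r_{s,i}} e_{u,i}^2 = -2 e_{u,i} \cdot \frac{N-1}{N} \cdot w_{u,s}$$
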

Training stage (continued):

4. Update each element of matrix $\mathcal{R}$, where $\eta$ is the learning rate:

$$r_{s,i} \leftarrow r_{s,i} + \eta \cdot e_{u,i} \cdot w_{u,s}$$

5. Repeat steps 2 to 4 until convergence.


Parameters:

◦ $\alpha$ (users): the update frequency of $\bar{r}_s$.
◦ $\beta$ (iterations): the update frequency of $w_{u,s} \in W$ for each u and s.

$w_{u,s}$ is computed using the Pearson correlation.

Maintain the relationship matrix $W \in \mathbb{R}^{M \times H}$ until the recommending stage.
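The training steps can be sketched as a small stochastic gradient loop. This is a simplified illustration, not the authors' implementation, and it makes several assumptions: the names `train_star_users`, `predict`, and `rmse` are hypothetical; the weights W are held fixed instead of being recomputed by Pearson correlation every $\beta$ iterations; the star users' means are recomputed after every rating rather than every $\alpha$ users; the matrix is initialized uniformly at random; and a fixed number of epochs stands in for a convergence test.

```python
import math
import random


def predict(u, i, user_means, W, R, star_means):
    """Step 2: star-user prediction for user u and item i."""
    den = sum(W[u])
    if den == 0:
        return user_means[u]  # assumption: fall back to the user's mean
    num = sum((R[s][i] - star_means[s]) * W[u][s] for s in range(len(R)))
    return user_means[u] + num / den


def rmse(ratings, user_means, W, R):
    """Root mean squared error of the predictions on (u, i, r) triples."""
    star_means = [sum(row) / len(row) for row in R]
    se = sum((r - predict(u, i, user_means, W, R, star_means)) ** 2
             for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


def train_star_users(ratings, user_means, W, H, N, eta=0.1, epochs=500, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the H x N star user matrix (random init is an assumption).
    R = [[rng.uniform(1.0, 5.0) for _ in range(N)] for _ in range(H)]
    for _ in range(epochs):  # Step 5: fixed epoch count instead of a convergence test
        for u, i, r in ratings:
            star_means = [sum(row) / N for row in R]
            e = r - predict(u, i, user_means, W, R, star_means)  # Step 3: residual
            for s in range(H):
                R[s][i] += eta * e * W[u][s]  # Step 4: update r_{s,i}
    return R


# Hypothetical toy data: M = 2 general users, H = 2 star users, N = 3 items.
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 1), (1, 0, 4), (1, 1, 2), (1, 2, 1)]
user_means = [3.0, 7.0 / 3.0]
W = [[1.0, 0.5], [0.5, 1.0]]  # W[u][s], kept fixed in this sketch
R_init = train_star_users(ratings, user_means, W, H=2, N=3, epochs=0)
R_fit = train_star_users(ratings, user_means, W, H=2, N=3)
print(rmse(ratings, user_means, W, R_init), rmse(ratings, user_means, W, R_fit))
```

Running with `epochs=0` returns the initial matrix for the same seed, so the two printed values compare the training error before and after fitting.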

EXPERIMENTAL RESULTS

Results on MovieLens Dataset

Time requirement comparison

RMSE of our approach against various H and comparison with kNN

Item-based Model

We first train a small set of star items instead of star users.

Predictions are computed as:

$$p_{a,i} = \bar{r}_i + \frac{\sum_{s \in S'} (r_{a,s} - \bar{r}_s) \times w_{s,i}}{\sum_{s \in S'} w_{s,i}}$$

Results on Netflix Dataset

Our approach with different values of learning rate

Our approach with different values of H

Discussion

Comparison with kNN

◦ Accuracy
◦ Data sparsity
◦ Scalability: $O(M^2 \times N')$ → $O(M \times H \times N')$, where $H \ll M$.
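To make the scalability claim concrete, the sketch below counts similarity operations under both models. The sizes are hypothetical, and N' is taken here to be the number of items compared per user pair, which is an assumption since the slides do not define it.

```python
def knn_cost(num_users, items_per_pair):
    """Every user is compared with every other user: O(M^2 * N')."""
    return num_users * num_users * items_per_pair


def star_cost(num_users, num_star_users, items_per_pair):
    """Every user is compared only with the H star users: O(M * H * N')."""
    return num_users * num_star_users * items_per_pair


# Hypothetical sizes, not taken from the experiments.
M, H, N_PRIME = 10_000, 50, 20
print(knn_cost(M, N_PRIME) // star_cost(M, H, N_PRIME))  # 200
```

The ratio is simply M / H, so the saving grows with the number of general users as long as the star-user set stays small.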

Comparison with SVD

◦ Scientific explanation
◦ Parameters
◦ Updating

CONCLUSION

Summary

We proposed a novel CF model based on star users.

Our original intention was to improve the traditional neighborhood-based CF model.

Experimental results on two datasets verified the effectiveness of our approach.

Future work

Incorporating contextual information into our model.

Validating our approach in practical applications.

THANK YOU