EigenTaste: A Constant Time Collaborative Filtering Algorithm
Ken Goldberg
Students: Theresa Roeder, Dhruv Gupta, Chris Perkins
Industrial Engineering and Operations Research
Electrical Engineering and Computer Science
UC Berkeley
CF Problem Definition
• A set of objects (movies, books, jokes)
• A user rates a subset of objects
• Based on the ratings, retrieve objects from the complement of this subset. Criteria:
  – Effective: recommended objects should receive high ratings
  – Efficient: the online recommendation process should run quickly and be scalable
Some Previous Work
• D. Goldberg, et al. - Tapestry (1992)
• Riedl, Resnick, Konstan et al. - GroupLens (1994-)
• Shardanand and Maes - Ringo (1995)
• Resnick and Varian (1997)
• Breese et al. at Microsoft Research (1998)
• Pazzani (1999)
• Herlocker et al. - GroupLens (1999)
EigenTaste Algorithm
1) Principal Component Analysis
2) Universal Queries (dense ratings matrix)
3) Fine-grained ratings bar (captures nuances)
4) Offline and Online Processing
5) Online: Constant time recommendations
Universal Queries
• Most CF systems require users to select which items they want to rate, which yields a sparse ratings matrix
• Eigentaste allows users to rate all items based on short, unbiased descriptions (e.g., a film synopsis)
• Eigentaste uses a subset of highly discriminatory items for the gauge set
EigenTaste Algorithm
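• A is the n × m normalized rating matrix
  – n users
  – m objects
• C is the k × k reduced correlation matrix
  – k objects in the gauge set; from here on, A denotes the n × k submatrix of A holding the gauge-set columns
  – $C = \frac{1}{n} A^\top A$
  – assumes ratings are continuous with a linear relationship
• E is the orthogonal matrix whose rows are the eigenvectors of C; Λ is the diagonal matrix of eigenvalues $\lambda_i$
EigenTaste
• $E C E^\top = \Lambda$, equivalently $C = E^\top \Lambda E$
• Let $B = A E^\top$
• $R_B = \frac{1}{n} B^\top B = E C E^\top = \Lambda$
  – the transformed points are uncorrelated, and column i of B has variance $\lambda_i$
• Principal components (Pearson 1901)
  – keep the eigenvectors of the v largest eigenvalues, $E_v$ (the first v rows of E)
• $B_v = A E_v^\top$
• choose v at the "knee" of the eigenvalue curve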
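The derivation above can be checked numerically. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes A is already normalized and restricted to the k gauge-set columns. It finds eigenpairs with power iteration plus deflation, a swapped-in technique chosen only to stay dependency-free; a production system would call a library eigensolver. The helper names `correlation_matrix`, `matvec`, `top_eigenvectors` and `project` are hypothetical. The toy A has centered but not unit-variance columns, so C here is a covariance matrix. Z-scoring the columns first would make it a correlation matrix.

```python
import math
import random


def correlation_matrix(A):
    """C = (1/n) A^T A for an n x k normalized rating matrix A (list of rows)."""
    n, k = len(A), len(A[0])
    return [[sum(A[u][i] * A[u][j] for u in range(n)) / n for j in range(k)]
            for i in range(k)]


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]


def top_eigenvectors(C, count, iters=500, seed=0):
    """Eigenvalues and unit eigenvectors for the `count` largest eigenvalues of
    a symmetric positive semidefinite C. Uses power iteration plus deflation
    (a stand-in for a library eigensolver)."""
    rng = random.Random(seed)
    M = [row[:] for row in C]  # working copy; deflated after each eigenpair
    vals, vecs = [], []
    for _ in range(count):
        x = [rng.uniform(-1.0, 1.0) for _ in range(len(M))]
        for _ in range(iters):
            y = matvec(M, x)
            norm = math.sqrt(sum(t * t for t in y))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            x = [t / norm for t in y]
        norm = math.sqrt(sum(t * t for t in x))
        x = [t / norm for t in x]
        lam = sum(a * b for a, b in zip(x, matvec(M, x)))  # Rayleigh quotient
        vals.append(lam)
        vecs.append(x)
        # Deflation: subtract lam * x x^T so the next pass finds the next eigenpair
        for i in range(len(M)):
            for j in range(len(M)):
                M[i][j] -= lam * x[i] * x[j]
    return vals, vecs


def project(A, E):
    """B = A E^T, where each row of E is an eigenvector of length k."""
    return [[sum(a * e for a, e in zip(row, vec)) for vec in E] for row in A]


# Toy 4-user x 3-item matrix with centered (not unit-variance) columns
A = [[2, 1, 0], [-2, -1, 0], [1, -2, 1], [-1, 2, -1]]
vals, E2 = top_eigenvectors(correlation_matrix(A), count=2)
B2 = project(A, E2)
print([round(v, 6) for v in vals])  # [3.0, 2.5]
```

With this toy A, the two largest eigenvalues of C are 3 and 2.5. The third is 0 because A has rank 2. (1/n) B^T B comes out diagonal with exactly those values, matching R_B = Λ.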
Dimensionality Reduction
• First two principal components (eigenvectors) account for nearly 50% of the variation in user ratings
• Project user ratings along the first two principal components: $x = A E_2^\top$
• Facilitates visualization ...
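Projecting a single new user is what keeps the online step cheap. The following is a minimal sketch. It assumes the user's gauge-set ratings are normalized the same way as A and that E_2 is stored as two rows of length k. `project_to_eigenplane` and the eigenvectors below are hypothetical illustrations, not values from Jester.

```python
import math


def project_to_eigenplane(gauge_ratings, E2):
    """Online step x = a E_2^T for a single user: one dot product per
    eigenvector, so O(k) work for k gauge-set ratings."""
    for vec in E2:
        if len(vec) != len(gauge_ratings):
            raise ValueError("each eigenvector needs one entry per gauge-set item")
    return tuple(sum(r * e for r, e in zip(gauge_ratings, vec)) for vec in E2)


# Hypothetical unit eigenvectors for a 3-item gauge set (rows of E_2)
s6, s5 = math.sqrt(6), math.sqrt(5)
E2 = [[1 / s6, -2 / s6, 1 / s6], [2 / s5, 1 / s5, 0.0]]
x = project_to_eigenplane([2, 1, 0], E2)  # approximately (0.0, 2.236)
```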
The EigenTaste Algorithm
• Offline:
  – Compute eigenvectors and project users onto the eigen-plane
  – Cluster and compute average ratings for each cluster
• Online:
  – Collect ratings for objects in the gauge set
  – Project onto the eigen-plane
  – Find the representative cluster
  – Recommend objects based on average ratings within that cluster
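The offline/online split can be sketched as follows. The slides do not say how the eigen-plane is clustered, so this sketch assumes a uniform square grid, with `cell_size` as a hypothetical parameter. `cell_of`, `build_clusters` and `recommend` are illustrative names. Each cell's objects are ranked offline. Online work is therefore the O(k) projection (not shown here), a grid lookup, and a short scan that stops once enough unrated objects are found.

```python
import itertools
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Hypothetical clustering: a uniform grid over the 2-D eigen-plane."""
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))


def build_clusters(points, ratings, cell_size):
    """Offline: bucket users by grid cell, average each object's ratings in the
    cell, and store the objects sorted by that average (highest first).
    points[u] is user u's 2-D projection; ratings[u] maps object -> rating."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for point, user_ratings in zip(points, ratings):
        cell = cell_of(point, cell_size)
        for obj, value in user_ratings.items():
            sums[cell][obj] += value
            counts[cell][obj] += 1
    clusters = {}
    for cell, obj_sums in sums.items():
        avg = {obj: total / counts[cell][obj] for obj, total in obj_sums.items()}
        clusters[cell] = sorted(avg, key=lambda obj: (-avg[obj], obj))
    return clusters


def recommend(point, rated, clusters, cell_size, top=2):
    """Online: an O(1) grid lookup, then the best-averaged objects the user has
    not rated yet. An empty cell yields no recommendations in this sketch."""
    ranked = clusters.get(cell_of(point, cell_size), [])
    return list(itertools.islice((obj for obj in ranked if obj not in rated), top))


points = [(0.2, 0.3), (0.4, 0.1), (-1.5, 0.2)]
ratings = [{"j1": 5, "j2": -2, "j3": 8},
           {"j1": 3, "j3": 6, "j4": 1},
           {"j1": -4, "j2": 7}]
clusters = build_clusters(points, ratings, cell_size=1.0)
print(recommend((0.9, 0.5), rated={"j3"}, clusters=clusters, cell_size=1.0))
# ['j1', 'j4']
```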
First Application (1999)
Jester: Recommending Jokes
• Sense of humor is difficult to specify
• Advantages:
  – Rating process is not altogether unpleasant
  – Jokes can be evaluated quickly
    – Dense ratings matrix (large sample size)
• Disadvantages:
  – Offensive / "shaggy dog" jokes
  – Temporal effects, portfolio effects
  – Priming/masking
System Architecture
[Figure: system architecture diagram. Components: Client, Internet, Web Server, Login Interface, CGI, Recommendation Engine, User Rating Profiles, Content Database]
Measure of Effectiveness
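Metric: Normalized Mean Absolute Error (NMAE), the average absolute deviation of actual ratings from predicted ratings, normalized by the rating range.
$\mathrm{MAE} = \frac{1}{c} \sum_{i=1}^{c} |r_i - p_i|$, where c is the number of predictions, $r_i$ the actual rating and $p_i$ the predicted rating
$\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}$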
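The NMAE formula translates directly into code. This sketch uses a default rating range of -10 to +10. Jester's continuous rating bar is often described this way, but treat the default as an assumption. `nmae` is a hypothetical helper name.

```python
def nmae(actual, predicted, r_min=-10.0, r_max=10.0):
    """Normalized mean absolute error over c = len(actual) predictions.
    The default range -10..+10 is an assumption about the rating bar."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need equal-length, non-empty lists of ratings")
    mae = sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)
    return mae / (r_max - r_min)


score = nmae([5, -3, 0], [3, -1, 0])  # MAE = 4/3, so NMAE = (4/3) / 20 = 1/15, about 0.0667
```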
Effectiveness
Algorithm NMAE
POP 0.203
1 Nearest Neighbor 0.237
80 Nearest Neighbors 0.187
EigenTaste 0.187
Based on 18,000 users
Computational Complexity
• n: number of users
• k: number of objects in the gauge set
• Nearest-neighbor algorithm: online processing O(kn)
• EigenTaste algorithm: offline processing O(k²n); online processing O(k)
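For contrast, the nearest-neighbor baseline must scan every stored user at query time. The slides do not state its similarity measure, so this sketch assumes squared Euclidean distance over the k gauge-set ratings. `knn_predict` is a hypothetical name. `heapq.nsmallest` adds O(n log K) for K neighbors on top of the O(kn) distance loop.

```python
import heapq


def knn_predict(user_gauge, others_gauge, others_target, neighbors=1):
    """Predict one object's rating as the mean rating of the `neighbors` users
    closest to `user_gauge`. Closeness uses squared Euclidean distance over the
    k gauge-set ratings (an assumption). The distance loop visits n users x k
    ratings, hence O(kn) online work per query."""
    if not others_gauge or neighbors < 1:
        raise ValueError("need at least one other user and neighbors >= 1")
    dists = [(sum((a - b) ** 2 for a, b in zip(user_gauge, g)), u)
             for u, g in enumerate(others_gauge)]
    nearest = heapq.nsmallest(neighbors, dists)
    return sum(others_target[u] for _, u in nearest) / len(nearest)


others_gauge = [[5, 5, 5], [-5, -5, -5], [4, 6, 5]]
others_target = [8, -8, 6]
print(knn_predict([5, 5, 4], others_gauge, others_target))               # 8.0
print(knn_predict([5, 5, 4], others_gauge, others_target, neighbors=2))  # 7.0
```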
Effectiveness and Efficiency
Algorithm              NMAE    Offline complexity   Online complexity
POP                    0.203   O(nm)                O(1)
1 Nearest Neighbor     0.237   O(1)                 O(nk)
80 Nearest Neighbors   0.187   O(1)                 O(nk)
Eigentaste             0.187   O(k²n)               O(k)
(n: users, m: objects, k: gauge-set objects)
Patent application 21 December 1999 by UC Regents
www.cs.berkeley.edu/~goldberg
Eigentaste: A Constant Time Collaborative Filtering Algorithm
(to appear: Information Retrieval Journal, 2001)