It's all about the User...
User-driven Approaches to the Recommendation Problem
Xavier Amatriain, Telefonica Research
But first...
About me
Up until 2005
About me
2005–2007
About me
2007–...
But first...
About Telefonica and Telefonica R&D
Staff / Services / Clients / Finances / Geographies, by year:

1989: about 71,000 professionals; basic telephone and data services; about 12 million subscribers; Rev: 4,273 M€, EPS(1): 0.45 €; operations in Spain
2000: about 149,000 professionals; wireline and mobile voice, data and Internet services; about 68 million customers; Rev: 28,485 M€, EPS: 0.67 €; operations in 16 countries
2008: about 257,000 professionals; integrated ICT solutions for all customers; about 260 million customers; Rev: 57,946 M€, EPS: 1.63 €; operations in 25 countries

(1) EPS: Earnings per share
Telefonica is a fast-growing Telecom
Telco sector worldwide ranking by market cap (US$ bn)
Currently among the largest in the world
Source: Bloomberg, 06/12/09
Telefonica R&D (TID) is the Research and Development Unit of the Telefónica Group
MISSION: “To contribute to the improvement of the Telefónica Group’s competitiveness through technological innovation”
- Founded in 1988
- Largest private R&D center in Spain
- More than 1100 professionals
- Five centers in Spain and two in Latin America
Telefónica was in 2008 the first Spanish company by R&D Investment and the third in the EU
Technological Innovation: 4,384 M€
R&D (Products / Services / Processes development): 594 M€
Applied research: 61 M€
Internet Scientific Areas
Content Distribution and P2P
Next generation Managed P2P-TV
Future Internet: Content Networking
Delay Tolerant Bulk Distribution
Network Transparency
Social Networks
Information Propagation
Social Search Engines
Infrastructure for Social based cloud computing
Wireless and Mobile Systems
Wireless bundling
Device2Device Content Distribution
Large Scale mobile data analysis
Multimedia Scientific Areas
Multimedia Core
Multimedia Data Analysis, Search & Retrieval
Video, Audio, Image, Music, Text, Sensor Data
Understanding, Summarization, Visualization
Mobile and Ubicomp
Context Awareness
Urban Computing
Mobile Multimedia & Search
Wearable Physiological Monitoring
HCC
Multimodal User Interfaces
Expression, Gesture, Emotion Recognition
Personalization & Recommendation Systems
Super Telepresence
Data Mining & User Modeling Areas
DATA MINING
- Integration of statistical & knowledge-based techniques
- Stream mining
- Large-scale & distributed machine learning

USER MODELING
- Application to new services (technology for development)
- Cognitive, socio-cultural, and contextual modeling
- Behavioral user modeling (service-use patterns)

SOCIAL NETWORK ANALYSIS & BUSINESS INTELLIGENCE
- Analytical CRM
- Trend-spotting, service propagation & churn
- Social graph analysis (construction, dynamics)
Index
Now seriously, this is where the index should go!
Introduction: What areRecommender Systems?
The Age of Search has come to an end
... long live the Age of Recommendation!
Chris Anderson in “The Long Tail”“We are leaving the age of information and entering the age of recommendation”
CNN Money, “The race to create a 'smart' Google”:
“The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you.”
Information overload
“People read around 10 MB worth of material a day, hear 400 MB a day, and see one MB of information every second”
The Economist, November 2006
The value of recommendations
Netflix: 2/3 of the movies rented are recommended
Google News: recommendations generate 38% more clickthrough
Amazon: 35% sales from recommendations
Choicestream: 28% of the people would buy more music if they found what they liked.
The “Recommender problem”
Estimate a utility function that can automatically predict how much a user will like an item that is unknown to them. Based on:
Past behavior
Relations to other users
Item similarity
Context
...
The “Recommender problem”
Let C be a large set of all users and let S be a large set of all possible items that can be recommended (e.g., books, movies, or restaurants).
Let u be a utility function that measures the usefulness of item s to user c, i.e., u : C × S → R, where R is a totally ordered set. Then, for each user c ∈ C, we want to choose the item s′ ∈ S that maximizes u.
The utility of an item is usually represented by a rating, but it can also be an arbitrary function, including a profit function.
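The maximization above is straightforward to state in code. A minimal sketch, with toy data and hypothetical predicted-utility values (not any particular deployed system):

```python
# A minimal sketch of the formal recommendation problem: for each user c,
# pick the item s' that maximizes an (already estimated) utility function u.

def recommend(users, items, u):
    """Return, for each user c, the item s' in S that maximizes u(c, s)."""
    return {c: max(items, key=lambda s: u(c, s)) for c in users}

# Toy utility: predicted 1-5 star ratings (hypothetical numbers).
predicted = {
    ("alice", "book"): 4.5, ("alice", "movie"): 3.0,
    ("bob", "book"): 2.0, ("bob", "movie"): 4.8,
}
u = lambda c, s: predicted[(c, s)]

print(recommend(["alice", "bob"], ["book", "movie"], u))
# {'alice': 'book', 'bob': 'movie'}
```

In practice the hard part is of course estimating u for unseen (c, s) pairs, which is what the approaches below address.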
Approaches to Recommendation
Collaborative Filtering: recommend items based only on the users' past behavior
- User-based: find users similar to me and recommend what they liked
- Item-based: find items similar to those that I have previously liked
Content-based: recommend based on features inherent to the items
Social recommendations (trust-based)
Recommendation Techniques
The Netflix Prize
500K users x 17K movie titles = 100M ratings = $1M (if you “only” improve existing system by 10%! From 0.95 to 0.85 RMSE)
49K contestants on 40K teams from 184 countries.
41K valid submissions from 5K teams; 64 submissions per day
Winning approach uses hundreds of predictors from several teams
Is this general? Why did it take so long?
What works
It depends on the domain and particular problem
However, in the general case, the best isolated approach has (currently) been shown to be CF.
Item-based CF is in general more efficient and accurate, but mixing CF approaches can improve results.
Other approaches can be hybridized to improve results in specific cases (e.g., the cold-start problem).
What matters:
Data preprocessing: outlier removal, denoising, removal of global effects (e.g. individual user's average)
“Smart” dimensionality reduction using MF such as SVD
Combining classifiers
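The first two points can be sketched in a few lines of numpy: remove one global effect (each user's average), then keep only the strongest SVD factors. The toy matrix and the choice k=1 are purely illustrative; real CF data is sparse, so production systems use iterative factorization rather than a dense SVD.

```python
import numpy as np

# Sketch of "remove global effects, then low-rank SVD" on a dense toy matrix.
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5]], dtype=float)

user_means = R.mean(axis=1, keepdims=True)   # global effect: per-user average
U, s, Vt = np.linalg.svd(R - user_means, full_matrices=False)

k = 1                                        # keep only the strongest factor
R_hat = user_means + U[:, :k] * s[:k] @ Vt[:k, :]

print(np.round(R_hat, 1))                    # smoothed rating predictions
```

Keeping all factors would reproduce R exactly; truncating to k factors is the "smart" dimensionality reduction that filters out idiosyncratic noise.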
I like it... I like it not
Evaluating User Ratings Noise in Recommender Systems
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver (Telefonica Research)
The Recommender Problem
Two ways to address it
1. Improve the Algorithm
The Recommender Problem
Two ways to address it
2. Improve the Input Data
Time for Data Cleaning!
User Feedback is Noisy
Natural Noise Limits our User Model
DID YOU HEAR WHAT I LIKE??!!
...and Our Prediction Accuracy
The Magic Barrier
Magic Barrier = limit on prediction accuracy due to noise in the original data
Natural Noise = involuntary noise introduced by users when giving feedback, due to (a) mistakes, and (b) lack of resolution in the personal rating scale (e.g., on a 1-to-5 scale, a 2 may mean the same as a 3 for some users and some items)
Magic Barrier >= Natural Noise Threshold: we cannot predict with less error than the resolution in the original data
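The barrier is easy to see in a quick simulation: if each observed rating is the user's true opinion plus involuntary noise, even a predictor that knows the true opinion exactly cannot beat the noise floor. The noise model below (±1 star half the time) is an assumption for illustration only:

```python
import random, math

# Toy illustration of the "magic barrier": a perfect predictor of the true
# opinion still scores RMSE equal to the std of the natural noise.
random.seed(42)

true_opinions = [random.uniform(1, 5) for _ in range(10_000)]
observed = [t + random.choice([-1, 0, 0, 1]) for t in true_opinions]  # rating noise

perfect_rmse = math.sqrt(
    sum((t - o) ** 2 for t, o in zip(true_opinions, observed)) / len(observed))
noise_std = math.sqrt(0.5)  # std of the -1/0/0/+1 noise: E[n^2] = 0.5

print(f"RMSE of a perfect predictor: {perfect_rmse:.3f} (noise floor ~{noise_std:.3f})")
```

No algorithmic improvement can push the RMSE below that floor; only cleaner input data can.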
Our related research questions
Q1. Are users inconsistent when providing explicit feedback to Recommender Systems via the common Rating procedure?
Q2. How large is the prediction error due to these inconsistencies?
Q3. What factors affect user inconsistencies?
Experimental Setup (I)
Test-retest procedure: you need at least 3 trials to separate:
Reliability: how much you can trust the instrument you are using (i.e. ratings)
  r = (r12 · r23) / r13
Stability: drift in user opinion
  s12 = r13 / r23;  s23 = r13 / r12;  s13 = r13² / (r12 · r23)
Users rated movies in 3 trials: Trial 1 <-> 24 h <-> Trial 2 <-> 15 days <-> Trial 3
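The reliability and stability formulas above can be computed directly from the three pairwise correlations between trials. A small sketch, with hypothetical ratings of the same six movies in three trials:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def reliability_and_stability(t1, t2, t3):
    """Test-retest model from the slide: three trials in, r and s out."""
    r12, r23, r13 = pearson(t1, t2), pearson(t2, t3), pearson(t1, t3)
    reliability = r12 * r23 / r13
    s12, s23 = r13 / r23, r13 / r12
    s13 = r13 ** 2 / (r12 * r23)
    return reliability, (s12, s23, s13)

# Hypothetical ratings of the same 6 movies in three trials.
t1, t2, t3 = [5, 4, 2, 1, 3, 4], [5, 4, 3, 1, 3, 4], [4, 4, 2, 1, 3, 5]
rel, stab = reliability_and_stability(t1, t2, t3)
print(f"reliability={rel:.3f}, stabilities={[round(s, 3) for s in stab]}")
```

Note the identity r · s13 = r13, which follows directly from the definitions and is a useful sanity check.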
Experimental Setup (II)
100 movies selected from the Netflix dataset by stratified random sampling on popularity
Ratings on a 1 to 5 star scale, plus a special “not seen” symbol
Trials 1 and 3 = random order; trial 2 = ordered by popularity
118 participants
Results
Comparison to Netflix Data
Distribution of the number of ratings per movie is very similar to Netflix, but the average rating is lower (users are not voluntarily choosing what to rate)
Test-retest Reliability and Stability
Overall reliability = 0.924 (good reliabilities are expected to be > 0.9)
Removing mild ratings yields higher reliabilities, while removing extreme ratings yields lower ones
Stabilities: s12 = 0.973, s23 = 0.977, and s13 = 0.951
Stabilities might also be accounting for a “learning effect” (note s12 < s23)
Users are Inconsistent
● What is the probability of making an inconsistency given an original rating
Users are Inconsistent
● What is the percentage of inconsistencies given an original rating?
Mild ratings are noisier
Negative ratings are noisier
Prediction Accuracy

● Pairwise RMSE between trials, considering the intersection (∩) and the union (∪) of both rating sets:

Trials   #Ti    #Tj    #∩     #∪     RMSE(∩)  RMSE(∪)
T1, T2   2185   1961   1838   2308   0.573    0.707
T1, T3   2185   1909   1774   2320   0.637    0.765
T2, T3   1969   1909   1730   2140   0.557    0.694

Max error is in the trials that are most distant in time
Significantly less error when the 2nd trial is involved
Significant less error when 2nd trial is involved
Algorithm Robustness to NN

● RMSE for different recommendation algorithms when predicting each of the trials:

Alg./Trial      T1      T2      T3      Tworst/Tbest
User Average    1.2011  1.1469  1.1945  4.7%
Item Average    1.0555  1.0361  1.0776  4%
User-based kNN  0.9990  0.9640  1.0171  5.5%
Item-based kNN  1.0429  1.0031  1.0417  4%
SVD             1.0244  0.9861  1.0285  4.3%

Trial 2 is consistently the least noisy
Algorithm Robustness to NN (2)

● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings in another (training):

Training-Testing  T1-T2   T1-T3   T2-T3
User Average      1.1585  1.2095  1.2036
Movie Average     1.0305  1.0648  1.0637
User-based kNN    0.9693  1.0143  1.0184
Item-based kNN    1.0009  1.0406  1.0590
SVD               0.9741  1.0491  1.0118

Noise is minimized when we predict Trial 2
Let's recap
Users are inconsistent
Inconsistencies can depend on many things, including how the items are presented
Inconsistencies produce natural noise
Natural noise reduces our prediction accuracy, independently of the algorithm
Item order effect
R1 is the trial with the most inconsistencies
R3 has fewer, but not when excluding “not seen” (a learning effect improves “not seen” discrimination)
R2 minimizes inconsistencies because of its order (reducing the “contrast effect”)
User Rating Speed Effect
Evaluation time decreases as the survey progresses in R1 and R3 (users losing attention, but also learning)
In R2, evaluation time decreases until users reach the segment of “popular” movies
Rating speed is not correlated with inconsistencies
So...
What can we do?
Different proposals
To deal with noise in user feedback, we have so far proposed three different approaches:
1. Denoise user feedback using a re-rating approach (RecSys09)
2. Instead of regular users, take feedback from experts, whom we expect to be less noisy (SIGIR09)
3. Combine ensembles of datasets to identify which works best for each user (IJCAI09)
Rate it Again
Rate it Again: Increasing Recommendation Accuracy by User Re-Rating
Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)
Telefonica Research
Rate it again
By asking users to rate items again we can remove noise in the dataset. Improvements of up to 14% in accuracy!
Because we don't want all users to re-rate all items, we design ways to do partial denoising:
- Data-dependent: only denoise extreme ratings
- User-dependent: detect “noisy” users
Algorithm
Given a rating dataset where (some) items have been re-rated, we impose two fairness conditions:
1. The algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)
2. The algorithm should not make up new ratings, but decide which of the existing ones are valid
Algorithm
One source re-rating case:
Given the following milding function:
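The milding function itself appeared as a slide graphic that did not survive extraction. As a hypothetical stand-in that merely illustrates the two fairness conditions, the sketch below never invents a value and discards a rating only when the two observations clearly disagree; the `max_gap` threshold and the "keep the milder of the two" rule are assumptions, not the paper's actual function.

```python
def denoise_one_source(rating, re_rating, scale_mid=3, max_gap=1):
    """Hypothetical one-source denoising consistent with the fairness rules:
    never invent a new value; drop a rating only on clear disagreement."""
    if rating == re_rating:
        return rating                      # consistent: keep as-is
    if abs(rating - re_rating) <= max_gap:
        # small disagreement: keep the milder (closer to the scale midpoint)
        return min(rating, re_rating, key=lambda r: abs(r - scale_mid))
    return None                            # clear disagreement: discard

print(denoise_one_source(4, 4))   # 4
print(denoise_one_source(5, 4))   # 4 (milder)
print(denoise_one_source(1, 4))   # None (discarded)
```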
Results

One-source re-rating (Denoised ⊚ Denoising):

               T1⊚T2   ΔT1     T1⊚T3   ΔT1     T2⊚T3   ΔT2
User-based kNN 0.8861  11.3%   0.8960  10.3%   0.8984  6.8%
SVD            0.9121  11.0%   0.9274  9.5%    0.9159  7.1%

Two-source re-rating (denoising T1 with the other 2):

               T1 ⊚ (T2, T3)  ΔT1
User-based kNN 0.8647         13.4%
SVD            0.8800         14.1%
Denoise outliers
● Improvement in RMSE when doing one-source denoising, as a function of the percentage of denoised ratings and users: selecting only noisy users and extreme ratings
The Wisdom of the Few
A Collaborative Filtering Approach Based on Expert Opinions from the Web
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver (Telefonica Research, Barcelona)
Neal Lathia (UCL, London)
Crowds are not always wise
Collaborative filtering is the preferred approach for Recommender Systems
Recommendations are drawn from your past behavior and that of similar users in the system
Standard CF approach:
- Find your neighbors from the set of other users
- Recommend things that your neighbors liked and you have not “seen”
Problem: predictions are based on a large dataset that is sparse and noisy
Overview of the Approach
expert = an individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
Expert-based Collaborative Filtering: find neighbors from a reduced set of experts instead of regular users.
1. Identify domain experts with reliable ratings
2. For each user, compute “expert neighbors”
3. Compute recommendations similar to standard kNN CF
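Steps 2 and 3 above can be sketched as a small kNN predictor over expert ratings. The similarity measure (cosine over co-rated items), the data, and all names are illustrative, not the paper's exact setup:

```python
import math

def cosine(u, e):
    """Cosine similarity over the items rated by both user and expert."""
    common = set(u) & set(e)
    if not common:
        return 0.0
    num = sum(u[i] * e[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(e[i] ** 2 for i in common)))
    return num / den if den else 0.0

def predict(user_ratings, experts, item, k=2):
    """Similarity-weighted average of the k most similar experts' ratings."""
    scored = [(cosine(user_ratings, e), e) for e in experts if item in e]
    top = sorted(scored, reverse=True, key=lambda t: t[0])[:k]
    sims = sum(s for s, _ in top)
    return sum(s * e[item] for s, e in top) / sims if sims else None

experts = [{"A": 5, "B": 4, "C": 2}, {"A": 1, "B": 2, "C": 5}, {"A": 4, "B": 5, "C": 1}]
user = {"A": 5, "B": 4}            # user has not yet seen item "C"
print(predict(user, experts, "C"))
```

Structurally this is identical to standard user-based kNN CF; only the neighbor pool changes, which is what makes the approach cheap and robust.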
Advantages of the Approach
Noise: experts introduce less natural noise
Malicious ratings: the dataset can be monitored to avoid shilling
Data sparsity: a reduced set of domain experts can be motivated to rate items
Cold-start problem: experts rate items as soon as they are available
Scalability: the dataset is several orders of magnitude smaller
Privacy: recommendations can be computed locally
Mining the Web for Expert Ratings
Collections of expert ratings can be obtained almost directly on the web: we crawled the Rotten Tomatoes movie critics mash-up
Only those critics (169) with more than 250 ratings in the Netflix dataset were used
Dataset Analysis. Summary
Experts...
- are much less sparse
- rate movies all over the rating scale instead of being biased towards rating only “good” movies (different incentives)
- but seem to consistently agree on the good movies
- have a lower overall standard deviation per movie: they tend to agree more than regular users
- tend to deviate less from their personal average rating
Evaluation Procedure
Use the 169 experts to predict ratings from 10,000 users sampled from the Netflix dataset
Prediction MAE using an 80-20 holdout procedure (5-fold cross-validation)
Top-N precision by classifying items as “recommendable” given a threshold
Results show Expert CF behaves similarly to standard CF
But... we have a user study backing up the approach
User Study
57 participants, only 14.5 ratings/participant
50% of the users consider Expert-based CF to be good or very good
Expert-based CF: only algorithm with an average rating over 3 (on a 0-4 scale)
Current Work
Music recommendations (using metacritics.com), mobile geo-located recommendations...
Adaptive Data Sources
Collaborative Filtering With Adaptive Information Sources
(ITWP @ IJCAI) With Neal Lathia (UCL, London)
User modeling: experts? friends? like-minded? (similarity, trust, reputation)
Adaptive data sources
Adaptive Data sources
Given a simple, un-tuned kNN predictor and multiple information sources:
A problem: users are subjective; accuracy varies with the source
A promise: optimally assigning each user to their best source produces incredibly accurate predictions
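The routing idea can be sketched as: score every source on a user's held-out ratings and assign the user to the source with the lowest error. The predictors and data below are stand-ins for illustration only:

```python
def pick_best_source(user, sources, held_out):
    """Assign a user to the source whose predictions minimize MAE on
    that user's held-out ratings. sources: {name: predict_fn(user, item)}."""
    def mae(predict):
        errs = [abs(predict(user, item) - r) for item, r in held_out.items()]
        return sum(errs) / len(errs)
    return min(sources, key=lambda name: mae(sources[name]))

sources = {
    "experts":     lambda u, i: 4.0,   # hypothetical stand-in predictors
    "like-minded": lambda u, i: 2.5,
}
held_out = {"m1": 4, "m2": 5}          # this user's held-out ratings
print(pick_best_source("alice", sources, held_out))   # experts
```

The "promise" above corresponds to an oracle version of this selection; in practice the assignment must be learned from past behavior rather than read off held-out error.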
Conclusions
Conclusions
For many applications such as Recommender Systems (but also Search, Advertising, and even Networks) understanding data and users is vital
Algorithms can only be as good as the data they use as input
Importance of User/Data Mining is going to be a growing trend in many areas in the coming years
Thanks!
Questions?
Xavier [email protected]
xavier.amatriain.net
technocalifornia.blogspot.com
twitter.com/xamat