Post on 15-Apr-2017
transcript
Recommender Systems (RS) and Active Learning (AL)
Neil Rubens & Dain Kaplan
June 2016
Outline
• Value of RS
• RS Methods
• RS Objectives for Startups
• AL for RS
Value of Recommender Systems
Why Recommender Systems?
[Diagram: the system mediates between a user and items, like a store clerk advising a customer]
RS Objectives
• User Objectives
  • finding needed items
  • value
  • utility
  • enjoyment
  • novelty
  • serendipity
  • etc.
• System Objectives
  • revenue
  • profit
  • promoting partners
  • # of users
  • # of visits
  • time spent
  • etc.
Objectives may overlap!
Value of RS
• Amazon: 35% of sales come from recommendations
• Netflix: 2/3 of the movies watched are recommended
• ChoiceStream: 28% of people would buy more music if they found what they liked
• Google News: recommendations generate 38% more click-throughs
www.slideshare.net/kerveros99/machine-learning-for-recommender-systems-mlss-2015-sydney
RS Methods
What's an RS?
[Diagram: the system takes Sarah's ratings of items (Like/Hate) and predicts her ratings for the unrated items]
Common Approach
• Assumption: preferences of "similar" items/users stay similar
• Similarity: can be defined in a variety of ways
• Use ratings to estimate "similarity"
Collaborative Filtering (CF)
Users × Items → Ratings (Love / Like / Okay / Dislike / Hate)
• User-based CF: users with similar dis/likes are similar, e.g. if Sarah and you have similar tastes, then anything that Sarah likes, you will too (and vice versa)
• Item-based CF: items that users rate similarly are similar, e.g., if you liked book A, you will also like a book B that received similar ratings
https://buildingrecommenders.wordpress.com/2015/11/23/overview-of-recommender-algorithms-part-5/
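As a concrete sketch of user-based CF (the matrix, user names, and numbers below are invented illustrations, not from any real system): ratings sit in a user × item matrix, similarity is computed over co-rated items, and a missing rating is predicted as a similarity-weighted average.

```python
import numpy as np

# Toy user x item rating matrix (1-5; 0 = not rated). Illustrative only.
R = np.array([
    [5, 4, 0, 1],   # Sarah
    [4, 5, 1, 0],   # you
    [1, 0, 5, 4],   # a user with opposite tastes
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity restricted to items both users rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))

def predict(R, user, item):
    """Predict a missing rating as a similarity-weighted average of
    other users' ratings for that item (user-based CF)."""
    weights = np.array([
        cosine_sim(R[user], R[v]) if v != user and R[v, item] > 0 else 0.0
        for v in range(R.shape[0])
    ])
    return float(weights @ R[:, item] / weights.sum()) if weights.sum() else 0.0

# "You" haven't rated item 3; Sarah (very similar to you) rated it 1,
# so the prediction is pulled toward a low rating.
print(round(predict(R, user=1, item=3), 2))
```

Item-based CF follows the same pattern with the matrix transposed: compute similarity between item columns instead of user rows.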
MODEL
MODEL (USER) BASED ESTIMATION
MODEL (ITEM) BASED ESTIMATION
ACTIVE LEARNING
Gediminas Adomavicius, Alexander Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions", IEEE Transactions on Knowledge & Data Engineering, vol. 17, no. 6, pp. 734-749, June 2005, doi:10.1109/TKDE.2005.99
VARIETY OF RS APPROACHES
Tailored to:
• domains
• item types
• data types
• objectives
• etc.
RS Objectives for Startups
Established Companies: "cruise mode"
• Many existing loyal users
• RS used to increase per-user metrics, e.g. revenue, profit, etc.
Startups: "launch mode"
• Still building a user base
• RS used to attract/retain new users
Startups = Growth
"The only essential thing is growth. Everything else we associate with startups follows from growth."
(Paul Graham, Y Combinator)
Expecting many:
• new users
• new items

"Cold Start" Problem
• With CF, the RS needs user/item rating data to make recommendations
• For new users/new items, no data is available yet:
  • new item problem
  • new user problem
New Item Problem
• Problem: the item has no reviews yet (to base recommendations on)
• Solution: use content-based item similarity (to bootstrap recommendations)
E.g., similar sneakers:
• Jordan Jumpman Team II
• Air Jordan 1 Retro High Nouveau
• Hurley One And Only Printed
• Air Jordan 1 Retro High OG
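Content-based bootstrapping can be sketched as follows: describe each item by attributes and rank the catalogue by attribute similarity to the brand-new item. The attribute vectors below are invented for illustration; a real system would use richer features such as TF-IDF over item descriptions.

```python
import numpy as np

# Hypothetical attribute vectors (e.g. brand=Jordan, high-top, retro, printed);
# the values are invented for illustration.
items = {
    "Air Jordan 1 Retro High OG":      [1, 1, 1, 0],
    "Air Jordan 1 Retro High Nouveau": [1, 1, 1, 0],
    "Jordan Jumpman Team II":          [1, 1, 0, 0],
    "Hurley One And Only Printed":     [0, 0, 0, 1],
}

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(new_item_vec, items, k=2):
    """Rank existing (already-rated) items by content similarity to a
    new item that has no ratings yet."""
    scores = {name: cosine(new_item_vec, v) for name, v in items.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# A new retro high-top Jordan lands closest to the two Air Jordan 1s:
print(most_similar([1, 1, 1, 0], items))
```

Ratings of the nearest neighbours can then stand in for the new item's missing ratings until real feedback arrives.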
New User Problem
• Very important to make a good first impression: a bad first impression may lose a potential user
• Problem: can't make personalised recommendations (no data on the user yet)

Importance of Good Recommendations
[Image: an absurd recommendation — "Seriously?"]
Learning New User Preferences
• Talking: learn about the user implicitly/explicitly
• Stalking: obtain data indirectly

Indirect Data
• Contacts: friends may already be users of the app (likely to have similar interests)
• Location
• Device type
• Social profile
NOTE: should not be intrusive

System Interaction Data
• How: learn about the user through implicit/explicit interaction
  • clicks (or their absence)
  • duration
  • navigation paths
  • etc.
• What: make interaction more informative via item selection
  • position
  • attributes
  • grouping
Active Learning (AL) for Recommender Systems

Item Selection
• RS presents items for two primary purposes:
  • Recommend an item that a user will like: popular items, i.e., ones everyone likes (but these provide little info about the user's preferences)
  • Present an item to learn about the user's preferences (Active Learning, AL): contentious items, i.e., ones many people like / dislike (informative about the user's preferences)
• In practice, multiple items are shown for different objectives
AL Categories
• Item-based AL: analyse items and select those that seem most informative
• Model-based AL: analyse the model and select items that seem most informative

Item Categories
• Popular: rated by many users [Rashid 2002]
• High Variance in Ratings: items that people either like or hate [Rashid 2002]
• Best/Worst: ask the user which items s/he likes most/least [Leino & Raiha 2007]
• Influential: items on which ratings of many other items depend (representative + not represented) [Rubens & Sugiyama 2007]
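The High-Variance heuristic can be sketched in a few lines (the rating lists below are invented toy data): score each candidate by the variance of its existing ratings, discarding items with too few ratings to trust the estimate.

```python
import numpy as np

# Toy rating histories (1-5); values are invented for illustration.
ratings = {
    "blockbuster": [5, 5, 4, 5, 4, 5],   # popular and uniformly liked
    "contentious": [5, 1, 5, 1, 5, 1],   # people either love or hate it
    "obscure":     [3],                  # barely rated
}

def informativeness(rs, min_ratings=3):
    """Variance of existing ratings; items with fewer than min_ratings
    score 0, since their variance estimate is unreliable."""
    return float(np.var(rs)) if len(rs) >= min_ratings else 0.0

best = max(ratings, key=lambda name: informativeness(ratings[name]))
print(best)  # -> contentious: love-it-or-hate-it items probe preferences best
```

The uniformly liked blockbuster scores low here, matching the point above: everyone-likes-it items reveal little about an individual user.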
Item-based AL
[Figure: two scatter plots over inputs 1 and 2 showing candidate points (a)-(d), test points (unrated), training points, and a 1-5 ratings colour map; actual ratings unknown]
• 3R Properties:
  • Represented: is the point already represented by the existing training set? E.g., (b) is already represented
  • Representative: is it representative of others? E.g., (a) is not
  • Results: does it result in achieving the objective? E.g., (d) → max coverage
[Rubens & Kaplan, 2010]
Figure 1: Active Learning: illustrative example (see Section 1.2). [Two panels over inputs 1 and 2 showing test points (unrated), training points (colour may differ), and a 1-5 ratings colour map; actual ratings unknown.]
already possible from the training point in the same area (refer to the chart on the left). If training point (c) is selected, we are able to make new predictions, but only for the other three points in this area, which happen to be Zombie movies. By selecting training point (d), we are able to make predictions for a large number of test points that are in the same area, which belong to Comedy movies. Thus selecting (d) is the ideal choice, because it allows us to improve the accuracy of predictions the most (for the highest number of test points).
1.3 Types of Active Learning
AL methods presented in this chapter have been categorized based on our interpretation of their primary motivation/goal. It is important to note, however, that various ways of classification may exist for a given method, e.g. sampling close to a decision boundary may be considered Output Uncertainty-based since the outputs are unknown, Parameter-based because the point will alter the model, or even Decision boundary-based because the boundary lines will shift as a result. However, since the sampling is performed with regard to decision boundaries, we would consider this the primary motivation of this method and classify it as such.
In addition to our categorization by primary motivation (Section 1), we further subclassify a method's algorithms into two commonly classified types for easier comprehension: instance-based and model-based.
Instance-based Methods: A method of this type selects points based on their properties in an attempt to predict the user's ratings by finding the closest match to other users in the system, without explicit knowledge of the underlying model. Other common names for this type include memory-based, lazy learning, case-based, and non-parametric (Adomavicius & Tuzhilin, 2005). We assume that any existing data is accessible, as well as rating predictions from the underlying model.
Model-based Methods: A method of this type selects points in an attempt to best construct a model that explains the data supplied by the user in order to predict user ratings (Adomavicius & Tuzhilin, 2005). These points are also selected to maximize the reduction of expected error of the model. We assume that in addition to any data available to instance-based methods, the model and its parameters are also available.
Modes of Active Learning: Batch and Sequential. Because users typically want to see the system output something interesting immediately, a common approach is to recompute a user's predicted ratings after they have rated a single item, in a sequential manner. It is also possible, however, to allow a user to rate several items, or several features of an item, before readjusting the model. Selecting training points sequentially has the advantage of allowing the system to react to the data provided by users and make the necessary adjustments immediately, though this comes at the cost of an interaction with the user at each step. Thus a trade-off exists between batch and sequential AL: the usefulness of the data vs. the number of interactions with the user.
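The sequential mode described above can be sketched as a loop that, after each interaction, picks the next most informative unrated item. Here a simple variance heuristic stands in for the informativeness measure, and the data, the user's hidden tastes, and the chance that the user declines to rate are all simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data: per-item rating histories and the user's true (hidden) tastes.
item_ratings = {i: rng.integers(1, 6, size=12).tolist() for i in range(6)}
user_truth = {i: int(rng.integers(1, 6)) for i in range(6)}

def ask_user(item):
    """One interaction; the user may not know the item and decline to rate."""
    return user_truth[item] if rng.random() > 0.2 else None

elicited, candidates = {}, set(item_ratings)
for _ in range(3):                                # one interaction per step
    item = max(candidates, key=lambda i: np.var(item_ratings[i]))
    candidates.remove(item)
    rating = ask_user(item)
    if rating is not None:                        # react to new data immediately
        elicited[item] = rating                   # (a real RS would also update its model here)

print(elicited)
```

A batch variant would instead collect several ratings before the update step, trading immediacy of adaptation for fewer interruptions.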
2 Properties of Data Points
When considering any Active Learning method, the following three factors should always be considered in order to maximize the effectiveness of a given point. Supplementary explanations are given below for the first two. Examples refer to the Illustrative Example (Figure 1).
(R1) Represented: Is it already represented by the existing training set? E.g. point (b).
Footnote 4: This may be dependent on the specific prediction method used in the RS.
Illustrative Example: movies are clustered by genre
Item Selection: Learning User Preferences
[Scatter plots over features X1 and X2; ratings: positive/negative]

Simply Not Useful: limited information due to few items

User Satisfaction Drawback: the system gains limited knowledge; the user sees not much variety and may get bored

Coverage Drawback: the user is exposed to disliked items
Prediction Accuracy
[Figure: decision boundary of the actual model vs. the boundaries learned via random sampling and active learning]

[Figure: starting from an initial model, selected points can improve the margin/confidence or improve the orientation of the decision boundary]
AL Model Error
Existing Approaches: Parameter Uncertainty AL

Let $g$ be the optimal function (in the solution space), $\hat{f}$ the learned function, and $\hat{f}_i$ the functions learned from slightly different training sets. The generalization error decomposes as

$E_G = B + V + C$, where
$B = \left(\mathbb{E}\hat{f}(x) - g(x)\right)^2$,
$V = \mathbb{E}\left(\hat{f}(x) - \mathbb{E}\hat{f}(x)\right)^2$,
$C = \left(g(x) - f(x)\right)^2$.

• Model error C: constant, and is ignored.
• Bias B: hard to estimate, but is assumed to vanish (asymptotically).
• Variance V: estimate and minimize.
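The variance term V can be estimated empirically. As a sketch under invented toy assumptions (a linear model fit to noisy samples of a known optimal function g), we retrain on many perturbed training sets and measure the spread of predictions at a query point:

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: 2.0 * x + 1.0          # "optimal" function (known only in this toy setup)
x_query = 0.5

def train_once():
    """Fit a line to a fresh noisy training set; return its prediction at x_query."""
    x = rng.uniform(-1, 1, size=20)
    y = g(x) + rng.normal(scale=0.5, size=20)
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x_query + intercept

preds = np.array([train_once() for _ in range(500)])
V = float(np.var(preds))                        # variance term: what AL tries to shrink
B = float((preds.mean() - g(x_query)) ** 2)     # squared bias: small for this unbiased fit
print(V, B)
```

This mirrors the decomposition above: the spread of the $\hat{f}_i$ across resampled training sets is V, while the squared offset of their mean from $g$ is B.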
It is clearly shown in the table that different strategies can improve different aspects of the recommendation quality. In terms of rating prediction accuracy (MAE/RMSE), various strategies have shown excellent performance. While some of these strategies are easy to implement (e.g., Entropy0 and Log(popularity)*Entropy), others are more complex and use more sophisticated machine learning algorithms (e.g., Decision Tree, and Personality-based FM). Strategies that have shown excellent performance in terms of ranking quality (NDCG/MAP) are the Representative-based and Voting strategies. In terms of precision, prediction-based strategies (Highest-predicted and Binary-predicted) have shown excellent performance. In terms of the number of ratings acquired (# Ratings), as expected, strategies that consider the popularity of items (Popularity and Entropy0) acquire the largest number of ratings. But other strategies that maximize the chance that the selected items are familiar to the user (Item-item and Personality-based) can also elicit a considerable number of ratings. For these strategies the success ratio (#acquired_ratings/#requested_items) is the largest. This is an important factor, since strategies that focus only on the informativeness of the items may fail to actually acquire ratings, by selecting obscure items that users do not know and cannot rate.
Table 1: Performance comparison of active learning strategies ("XX" Very Good, "X" Good, " " Poor, "-" Not Available)
ML: MovieLens, NF: Netflix, EM: EachMovie, AWM: Active Web Museum, MP: MyPersonality, STS: South Tyrol Suggests, LF: Last.fm
Columns: Type | Strategy | MAE/RMSE | NDCG/MAP | Precision | # Ratings | Online Eval. | Offline Eval. | Compar. Strategies | Datasets

Non-Personalized, Single:
uncertainty based: 1. variance [59, 61] X - - - - y 2, 4, 6, 9, 24 AWM, EM
2. entropy [20, 67] - - - - y 3, 6, 8, 9, 11, 13, 22 EM
3. entropy0 [67] XX - - XX y y 2, 6, 8, 11, 13, 22 ML
error reduction 4. greedy extend [68] X - - - - y 2, 3, 6, 7, 10, 11 NF
5. representative [69] - XX XX - - y 6 NF, ML, LF
attention based 6. popularity [20, 67] X - - XX y y 2, 8, 9, 11, 13, 22 ML
7. co-coverage [68] - - - - y 2, 3, 4, 6, 10, 11 NF
Non-Personalized, Combined:
static combin.
8. rand-pop [20, 67] - - y y 2, 3, 6, 11, 13, 22 ML
9. log(pop)*entropy [20] XX - - X y y 3, 6, 8, 13 ML
10. sqrt(pop)*var [68] X - - - - y 2, 3, 4, 6, 7, 11 NF
11. HELF [67] XX - - y y 2, 3, 6, 8, 13, 22 ML
12. non-pers-part rand. [11] X XX X - y 1, 6, 9, 12, 14, 20, 21, 28, 29 ML, NF
Personalized, Single:
acquisition prob.: 13. item-item [20, 67] - - XX y y 2, 3, 6, 8, 9, 11, 22 ML
14. binary-pred [11, 12] X XX X - y 1, 6, 9, 12, 20, 21, 28, 29 ML, NF
15. personality-based [70, 97] XX XX - XX y y 3, 9, 14 STS, MP
16. impact analysis [71] XX - - - - y 9 ML
prediction based
17. aspect model [72, 73] X - - - - y 2 EM, ML
18. min rating [74] X - - - - y 19,25 ML
19. min norm [74] - - - - y 18,25 ML
20. highest-pred [11, 12] X XX X - y 1, 6, 9, 12, 14, 21, 28, 29 ML, NF
21. lowest-pred [11, 12] X X - y 1, 6, 9, 12, 14, 20, 28, 29 ML, NF
user partitioning 22. IGCN [67] XX - - X y y 2, 3, 6, 8, 11, 13 ML
23. decision tree [64] XX - - - - y 3, 4, 10, 11 NF
Personalized, Combined:
static combin.
24. influence based [61] XX - - - - y 1, 4, 6, 9 ML
25. non-myopic [74] X - - - - y 18, 19 ML
26. treeU [75] X - - - - y 23, 27 ML, EM, NF
27. fMF [75] XX - - - - y 23, 26 ML, EM, NF
28. pers-partially rand. [11] X XX X - y 1, 6, 9, 12, 14, 20, 21, 28, 29 ML, NF
29. voting [11, 12] XX XX - y 1, 6, 9, 12, 14, 20, 21, 28 ML, NF
adaptive combin. 30. switching [76] XX XX - XX - y 9, 20, 29 ML
Mehdi Elahi, Francesco Ricci, Neil Rubens, A survey of active learning in collaborative filtering recommender systems, Computer Science Review, Elsevier, 2016.
Active Learning Strategies
Figure 4: Classification of active learning strategies in collaborative filtering
• personalized
  • combined-heuristic
    • adaptive combination: switching [76]
    • static combination: voting [11], partially rand [11], fMF [75], treeU [75], non-myopic [74], influence-uncertainty [61]
  • single-heuristic
    • user partitioning: decision tree [64], IGCN [67]
    • prediction based: lowest pred [11, 12], highest pred [11, 12], min norm [74], min rating [74], FMM [72], aspect model [72, 73]
    • impact based: impact analysis [71], influence based [61]
    • acquisition prob.: personality-based [70], binary-pred [11, 12], item-item [20]
• non-personalized
  • combined-heuristic
    • static combination: partially-rand [11, 12], HELF [67], sqrt(pop)*variance [68], log(pop)*entropy [20], rand-popularity [20]
  • single-heuristic
    • attention-based: co-coverage [68], popularity [20, 60]
    • error reduction: representative-based [69], greedy extend [68]
    • uncertainty reduction: entropy0 [67], entropy [59, 20], variance [59]
MANY AL-RS APPROACHES
Tailored to:
• different objectives
• different data & settings
http://www.win.tue.nl/~eknutov/gaf.html
RS Complexity
• RS is composed of many modules that need tuning to achieve high performance
Take-home Messages
• RS shows users items they want
• RS accounts for a large portion of purchases
• RS methods: user/item-based
• RS is crucial for user growth, addressing new items/users ("cold start") with:
  • indirect data acquisition
  • content-based item similarity
  • informative item selection with AL
• Many RS components can be tuned to achieve high performance