Caching Strategies for
In-Memory Neighborhood-based
Recommender Systems
Simon Dooms
@sidooms
Introduction
05/09/2013 Simon Dooms - Ghent University – WEBIST 2013 2
• Neighborhood-based recommender systems
• User-based Collaborative Filtering (UBCF)
Item 1 Item 2 Item 3 Item 4 Item 5
User 1 2 1 4 5
User 2 1 5 5 4
User 3 3 5 1
User 4 5
User 5 1 2
For every user u:
    For every item i:
        calculate(u,i)
prediction(u_1, i) = weight(u_1, u_2) * rate(u_2, i) + weight(u_1, u_3) * rate(u_3, i)
?
prediction(u_1, i) = sim(u_1, u_2) * rate(u_2, i) + sim(u_1, u_3) * rate(u_3, i)
       u_1   u_2   u_3   u_4   u_5
u_1    1     0.2   0.5   0.7   0.5
u_2          1     0.4   0     0.7
u_3                1     0.7   0.6
u_4                      1     0.8
u_5                            1
• User similarities needed
• Sometimes precalculated
Users     Similarities
5         10
50        1,225
500       124,750
5,000     12,497,500
50,000    1,249,975,000

#similarities = #users * (#users − 1) / 2
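As a quick check of the formula against the table, a small sketch (the function name `pair_count` is only illustrative):

```python
def pair_count(n_users):
    """Number of distinct user pairs, i.e. similarities to store."""
    return n_users * (n_users - 1) // 2


for n in (5, 50, 500, 5_000, 50_000):
    print(f"{n:>6,} users -> {pair_count(n):>13,} similarities")
```

The quadratic growth is what makes keeping every similarity in memory impractical for large user bases.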
Hypothesis:
Some similarities are used more than others
• Full recommendation calculation, MovieLens 100K
• Similarity frequency:
• sim(u_n, u_m)
• Used in prediction(u_n, i) with i not rated by u_n
  -> needs the similarities of the users who rated i
  -> number of times sim(u_n, u_m) is needed = number of items rated by u_m but not by u_n
• Same for sim(u_m, u_n)
• Similarity usage frequency differs
• Can the usage frequency of a similarity be predicted?
Items rated by u_n
Items rated by u_m
Usage frequency of sim(u_n, u_m) = cardinality of the inverse intersection (items rated by exactly one of u_n and u_m)
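The inverse intersection corresponds to the symmetric difference of the two users' rated-item sets. The sketch below compares that prediction with a brute-force count over a full recommendation calculation. The ratings are hypothetical toy data and the function names are illustrative, not taken from the talk.

```python
from collections import Counter

# Hypothetical toy data: user -> set of rated items.
RATINGS = {
    "u1": {"i1", "i2"},
    "u2": {"i2", "i3"},
    "u3": {"i3", "i4"},
    "u4": {"i1", "i4"},
}


def predicted_usage(user_a, user_b, ratings):
    """Items rated by exactly one of the two users (symmetric difference)."""
    return len(ratings[user_a] ^ ratings[user_b])


def counted_usage(ratings):
    """Count how often each similarity is looked up in a full calculation."""
    items = set().union(*ratings.values())
    counts = Counter()
    for u in ratings:
        for i in items - ratings[u]:          # only unrated items are predicted
            for v in ratings:
                if v != u and i in ratings[v]:  # neighbours who rated i
                    counts[frozenset((u, v))] += 1
    return counts


counts = counted_usage(RATINGS)
```

Because sim(u_n, u_m) and sim(u_m, u_n) are the same value, the counter keys are unordered pairs, and both directions add up to the symmetric difference.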
• The usage frequency of each similarity is now known
• Now what?
Use this information for caching
SMART Cache:
If cache full, replace entry with lowest predicted usage frequency
LRU Cache:
If cache full, replace entry least recently used
No Cache (baseline):
No caching, all similarities recalculated
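The two replacement policies can be sketched as follows. This is a minimal illustration under assumptions, not the implementation evaluated in the talk: the class names, the `predicted_usage` callable, and the linear scan for the eviction victim are all choices made here for brevity.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
        self.entries[key] = value


class SmartCache:
    """Evicts the entry with the lowest predicted usage frequency when full."""

    def __init__(self, capacity, predicted_usage):
        self.capacity = capacity
        self.predicted_usage = predicted_usage  # assumed callable: key -> number
        self.entries = {}

    def get(self, key):
        return self.entries.get(key)

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Linear scan for the victim; a heap would scale better.
            victim = min(self.entries, key=self.predicted_usage)
            del self.entries[victim]
        self.entries[key] = value
```

The SMART policy ignores access order entirely, which is one way to see why its behaviour depends less on the order in which (user, item) pairs are calculated.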
• Full recommendation calculation
• Lowest needed cache size for LRU?
For every user u:
    For every item i:
        calculate(u,i)
0.21% of all similarities = 942 similarities
High temporal locality:
Good for LRU
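The 0.21% figure can be checked with a short calculation. The user count of 943 is the size of the MovieLens 100K dataset and is not stated on the slide. Reading 942 as "the similarities between the current outer-loop user and every other user" is an interpretation made here, not a claim from the talk.

```python
n_users = 943                                   # MovieLens 100K (assumption, not on the slide)
total_pairs = n_users * (n_users - 1) // 2      # all distinct user pairs
per_user = n_users - 1                          # pairs involving one fixed user
share_percent = 100 * per_user / total_pairs    # fraction of all similarities, in percent

print(f"{per_user} of {total_pairs:,} similarities = {share_percent:.2f}%")
```

Under this reading, the outer-user loop only ever needs one user's row of the similarity matrix at a time, which fits the high temporal locality noted above.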
• What if order changed?
Changed order (outer-item):
For every item i:
    For every user u:
        calculate(u,i)
Original order (outer-user):
For every user u:
    For every item i:
        calculate(u,i)
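To see how the loop order changes the lookup sequence an LRU cache receives, the sketch below replays both orders on hypothetical toy data and counts cache misses. The data, names and capacity are illustrative assumptions. The example is far too small to reproduce the reported results and only shows the mechanism.

```python
from collections import OrderedDict

# Hypothetical toy data: user -> set of rated items.
RATINGS = {
    "u1": {"i1", "i2"},
    "u2": {"i2", "i3"},
    "u3": {"i3", "i4"},
    "u4": {"i1", "i4"},
}
USERS = sorted(RATINGS)
ITEMS = sorted(set().union(*RATINGS.values()))


def lookups(order):
    """Yield the similarity keys requested, in calculation order."""
    if order == "outer-user":
        pairs = [(u, i) for u in USERS for i in ITEMS]
    else:  # "outer-item"
        pairs = [(u, i) for i in ITEMS for u in USERS]
    for u, i in pairs:
        if i in RATINGS[u]:
            continue  # only unrated items need a prediction
        for v in USERS:
            if v != u and i in RATINGS[v]:
                yield frozenset((u, v))


def lru_misses(order, capacity):
    """Replay the lookups through an LRU cache and count the misses."""
    cache = OrderedDict()
    misses = 0
    for key in lookups(order):
        if key in cache:
            cache.move_to_end(key)  # hit: refresh recency
            continue
        misses += 1                 # miss: similarity must be recalculated
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[key] = True
    return misses


capacity = len(USERS) - 1
print("outer-user misses:", lru_misses("outer-user", capacity))
print("outer-item misses:", lru_misses("outer-item", capacity))
```

Both orders request exactly the same similarities the same number of times. Only the sequence differs, and that is all an LRU cache reacts to.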
• ORDER matters
• SMART gives more stable results
Conclusions
• Similarity values not equally important
• SMART caching:
– Better for random-like ordering
– Most stable (predictable) results
• LRU caching:
– Best with outer-user ordering (high temporal locality)
– Smaller cache size needed (0.21% vs 60%)
• Calculation order of (user, item) pairs is important
• Caching needs to be carefully considered