Post on 15-Jan-2015
Open Recommendation Platform
ACM RecSys 2013, Hong Kong
Torben Brodt, plista GmbH
Keynote
International News Recommender Systems Workshop and Challenge
October 13th, 2013
where
● news websites
● below the article
different types
● content
● advertising
Where it’s coming from
Recommendations
Visitors Publisher
* the company I work for
Where it’s coming from
good recommendations for...
User happy!
Advertiser happy!
Publisher happy!
plista* happy!
Where it’s coming from
some years ago
Visitors Publisher Context
Recommendations
Collaborative Filtering
● well-known algorithm
● more data means more knowledge
Where it’s coming from
one recommender
Collaborative Filtering
● time
● trust
● mainstream
Parameter Tuning
2008
● finished studies
● 1st publication
● plista was born
today
● 5k recs/second
● many publishers
Where it’s coming from
one recommender = good results
"use as many recommenders as possible!"
Where it’s coming from
netflix prize
Collaborative Filtering
Most Popular
Text Similarity
etc ...
Where it’s coming from
more recommenders
● we have one score
● lucky success? bad loss?
● we needed to keep track of different recommenders
success: 0.31 %
understanding performance
lost in serendipity
number of
● clicks
● orders
● engages
● time on site
● money
(scale: bad → good)
understanding performance
how to measure success
● features?
● big data math?
● counting!
for blending, we just count floats
understanding performance
evaluation technology
Algo1 1+1 10
Algo2 100 2+5
Algo...
understanding performance
evaluation technology
impressions
collaborative filtering  500 +1
most popular             500
text similarity          500

ZINCRBY "impressions" 1 "collaborative_filtering"
ZREVRANGEBYSCORE "impressions" +inf -inf WITHSCORES
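The counting idea can be sketched with an in-memory stand-in for the Redis sorted set. The helper names mirror the Redis commands, but the counters and algorithm names are only illustrative:

```python
# A minimal sketch of the impression counting: a plain dict stands in
# for the Redis sorted set "impressions", mimicking ZINCRBY and
# ZREVRANGEBYSCORE. Key and member names are made up for illustration.

impressions = {}

def zincrby(counter, member, amount=1):
    """Like Redis ZINCRBY: increment a member's score, return it."""
    counter[member] = counter.get(member, 0) + amount
    return counter[member]

def zrevrangebyscore(counter):
    """Like ZREVRANGEBYSCORE +inf -inf WITHSCORES: highest score first."""
    return sorted(counter.items(), key=lambda kv: kv[1], reverse=True)

# every delivered recommendation bumps its algorithm's counter
for algo in ["collaborative_filtering", "most_popular",
             "text_similarity", "collaborative_filtering"]:
    zincrby(impressions, algo)

print(zrevrangebyscore(impressions))
# collaborative_filtering leads with 2 impressions
```

This is the whole trick the slide makes: no feature engineering, just one increment per event and one ranged read to rank recommenders.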
understanding performance
evaluation technology
impressions
collaborative filtering  500
most popular             500
text similarity          500

clicks
collaborative filtering  100
most popular             10
...                      1

needs division

ZREVRANGEBYSCORE "clicks" +inf -inf WITHSCORES
ZREVRANGEBYSCORE "impressions" +inf -inf WITHSCORES
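The "needs division" step is just a click-through rate per recommender. A small sketch, using the counter values shown on the slide (the function name is hypothetical):

```python
# Ranking by raw clicks would favor whichever algorithm got the most
# traffic, so we divide clicks by impressions (CTR) before sorting.

impressions = {"collaborative_filtering": 500,
               "most_popular": 500,
               "text_similarity": 500}
clicks = {"collaborative_filtering": 100,
          "most_popular": 10,
          "text_similarity": 1}

def ctr_ranking(clicks, impressions):
    """Click-through rate per recommender, best first."""
    rates = {algo: clicks.get(algo, 0) / imp
             for algo, imp in impressions.items() if imp > 0}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

print(ctr_ranking(clicks, impressions))
# collaborative filtering wins with 100/500 = 0.2
```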
● CF is "always" the best recommender
● but "always" is just the average over all contexts
let's check the context!
understanding performance
evaluation results
success
● We like anonymization! The web still gives us a rich context.
● URL + HTTP headers provide
  ○ user agent -> device -> mobile
  ○ IP address -> geolocation
  ○ referer -> origin (search, direct)
Context
consider a list of the best recommenders per context attribute, sorted by what is relevant:
● clicks (content recs)
● price (advertising recs)
publisher = welt.de
collaborative filtering 689
most popular 420
text similarity 135
category = archive
text similarity 400
collaborative filtering 200
... 100
hour = 15
recent 80
collaborative filtering 10
... 5
Context
publisher = welt.de
collaborative filtering 689
most popular 420
text similarity 135
weekday = sunday
collaborative filtering 400
most popular 200
... 100
category = archive
text similarity 200
collaborative filtering 10
... 5
ZUNIONSTORE clk 3 p:welt.de:clk w:sunday:clk c:archive:clk WEIGHTS 4 1 1
ZREVRANGEBYSCORE "clk" +inf -inf WITHSCORES

ZUNIONSTORE imp 3 p:welt.de:imp w:sunday:imp c:archive:imp WEIGHTS 4 1 1
ZREVRANGEBYSCORE "imp" +inf -inf WITHSCORES
Context
evaluation context
Context can be used for optimization and targeting.
classical targeting is a limitation
Context
Targeting
Recommenders
collaborative filtering 500 +1
most popular 500
text similarity 500
Advertising
RWE Europe 500 +1
IBM Germany 500
Intel Austria 500
Onsite
new iphone su... 500 +1
twitter buys p.. 500
google has seri. 500
Advertising
RWE Europe 500 +1
IBM Germany 500
Intel Austria 500
Recommenders
collaborative filtering 500 +1
most popular 500
text similarity 500
Onsite
new iphone su... 500 +1
twitter buys p.. 500
google has seri. 500
Context
Livecube
context
recap
● added another dimension
result
● better for news: Collaborative Filtering
● better for content: Text Similarity
Context
evaluation context
success
what did we get?
● possibly many recommenders
● know how to measure success
● technology to see success
now breathe!
● real-time evaluation technology exists
● to choose the best algorithm for the current context we need to learn: the multi-armed Bayesian bandit
the ensemble
Data Science
"shuffle": exploration vs. exploitation
temporary success?
No. 1 getting most
local minima?
Interested? Look for Ted Dunning + Bayesian Bandit
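In the spirit of Ted Dunning's bandit write-ups, the idea can be sketched with Thompson sampling: each recommender keeps a Beta posterior over its CTR, we sample one plausible CTR per arm and serve the winner, so exploration falls out of the remaining uncertainty. The stats and function are illustrative, not plista's actual code:

```python
import random

# Minimal Thompson-sampling (Bayesian bandit) sketch: each arm has a
# Beta(clicks + 1, non-clicks + 1) posterior over its click rate.

random.seed(42)  # for reproducibility of this sketch

stats = {"collaborative_filtering": {"clicks": 100, "impressions": 500},
         "most_popular": {"clicks": 10, "impressions": 500}}

def choose_arm(stats):
    """Sample a plausible CTR per arm, serve the largest sample."""
    draws = {}
    for arm, s in stats.items():
        wins = s["clicks"]
        losses = s["impressions"] - s["clicks"]
        draws[arm] = random.betavariate(wins + 1, losses + 1)
    return max(draws, key=draws.get)

picks = [choose_arm(stats) for _ in range(1000)]
# the clearly better arm is served almost all of the time; an uncertain
# arm (few impressions) would still get real exploration traffic
print(picks.count("collaborative_filtering"))
```

After each served impression and click, the counters update and the posteriors tighten, which is exactly the "no temporary success, no local minimum" behavior the slides claim.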
● new total/avg is much better
● thx bandit
● thx ensemble
more research
● timeseries
✓ better results
(chart: success over time)
✓ easy exploration
● tradeoff (money decision)
● between the price/time we "waste" in offline evaluation
● and the price we lose with bad recommendations
● minimal pre-testing
● no risk if a recommender crashes
● "bad" code might find its context
trial and error
● now plista developers can try ideas
● and allow researchers to do the same
collaboration
Ensemble is able to choose
big pool of algorithms
Collaborative Filtering
Most Popular
Text Similarity
Ensemble
BPR-Linear, WR-MF, SVD++, etc.
Research Algorithms
researcher has idea
src http://g-ecx.images-amazon.com/images/G/03/video/m/feature/wickie_figur.jpg
● first and only dataset in the news context
  ○ millions of items
  ○ only relevant for a short time
● dataset has many attributes!!
● many publishers have user intersection
  ○ regional
  ○ contextual
● real world!!!
  ○ you can guide the user
  ○ you don't need to follow his route
● real time!!
  ○ this is industry, it has to be usable
researcher has idea
src http://userserve-ak.last.fm/serve/_/7291575/Wickie%2B4775745.jpg
... probably hosted by university, plista or any cloud provider?
... needs to start the server
"message bus"
● event notifications
  ○ impression
  ○ click
● error notifications
● item updates
train a model from it
... api implementation
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418, // publisher
      "14": 31721, // widget
      ...
    },
    "lists": {
      "10": [100, 101] // channel
    }
  }
  ...
}
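A sketch of consuming such an impression notification: the numeric context keys ("27" = publisher, "14" = widget, "10" = channel) follow the slide, but the handler itself is hypothetical, not part of the official API spec:

```python
import json

# Parse one ORP-style impression message and pull out the context
# attributes we count on (publisher, widget, channels).

message = json.loads("""
{
  "type": "impression",
  "context": {
    "simple": {"27": 418, "14": 31721},
    "lists": {"10": [100, 101]}
  }
}
""")

def handle(message):
    """Illustrative handler: extract context from an impression event."""
    if message["type"] != "impression":
        return None
    simple = message["context"]["simple"]
    return {"publisher": simple.get("27"),
            "widget": simple.get("14"),
            "channels": message["context"]["lists"].get("10", [])}

print(handle(message))
```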
... package content
api specs hosted at http://orp.plista.com
{ // json
  "type": "impression",
  "recs": ...
  // what was recommended
}
... package content
{ // json
  "type": "click",
  "context": ...
  // will include the position
}
... package content
recs
{ // json
  "recs": {
    "int": {
      "3": [13010630, 84799192]
      // 3 refers to content recommendations
    }
  }
  ...
}
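The reply a researcher's server sends back can be sketched as a few lines that wrap item IDs in the `recs -> int -> 3` structure shown above; the builder function is made up for illustration:

```python
import json

# Build an ORP-style recommendation reply: item IDs under
# "recs" -> "int" -> "3" (content recommendations, per the slide).

def build_reply(item_ids):
    """Illustrative reply builder for a recommendation request."""
    return json.dumps({"recs": {"int": {"3": item_ids}}})

reply = build_reply([13010630, 84799192])
print(reply)
```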
generated by researchers, to be shown to a real user
API
Real User
Researcher
... reply to recommendation requests
recs
Real User
Researcher
● happy user
● happy researcher
● happy plista
research can profit
● real user feedback
● real benchmark
quality is win win #2
use common frameworks
src http://en.wikipedia.org/wiki/Pac-Man
how to build fast system?
● no movies!
● news articles will outdate!
● visitors need the recs NOW
● => handle the data very fast
src http://static.comicvine.com/uploads/original/10/101435/2026520-flash.jpg
quick and fast
● fast web server
● fast network protocol
● fast message queue
● fast storage
or Apache Kafka
"send quickly" technologies
"real-time features feel better in a real-time world"
we don't need batch! see http://goo.gl/AJntul
our setup
● PHP, it's easy
● Redis, it's fast
● R, it's well known
comparison to plista
Overview
Publisher
Recommendations
Feedback
Collaborative Filtering
Most Popular
Text Similarity
etc.
Preferences
Ensemble
Visitors
● 2012
  ○ Contest v1
● 2013
  ○ ACM RecSys "News Recommender Challenge"
● 2014
  ○ CLEF News Recommendation Evaluation Labs "newsreel"
Overview
Contact
http://goo.gl/pvXm5 (Blog)
torben.brodt@plista.com
http://lnkd.in/MUXXuv
xing.com/profile/Torben_Brodt
www.plista.com
News Recommender Challenge
https://sites.google.com/site/newsrec2013/
#RecSys @torbenbrodt @NRSws2013 @plista
questions?