News REcommendation Evaluation Lab (NewsREEL)
Lab Overview
Frank Hopfgartner, Benjamin Kille, Andreas Lommatzsch, Martha Larson, Torben Brodt, Jonas Seiler
Recommender systems, or recommendation systems, are a subclass of information filtering systems that seek to predict the "rating" or "preference" that a user would give to an item.
Recommender Systems
Items (Set-based recommenders)
Items (Streams)
• Recommender Systems
• Evaluation
• NewsREEL scenario
• NewsREEL 2016
Outline
How do we evaluate?
Academia:
• Static, often rather old datasets
• Offline evaluation
• Focus on algorithms and precision

Industry:
• Dynamic dataset
• Online A/B testing
• Focus on user satisfaction
Example: Recommending sites in Evora
Sé Catedral
Capela dos Ossos
Templo romano
Palacio de Don Manuel I
Source (Images): Wikipedia
1. Choose a time point t0 to split the dataset
2. Classify ratings before t0 as training set
3. Classify ratings after t0 as test set
Offline Evaluation Dataset construction
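The temporal split described above can be sketched as follows (a minimal example; the timestamps and ratings are illustrative, not actual NewsREEL data):

```python
# Split a rating log into training and test sets at a chosen time point t0.
# All data here is illustrative.

def temporal_split(ratings, t0):
    """Classify ratings before t0 as training data, the rest as test data."""
    train = [r for r in ratings if r["t"] < t0]
    test = [r for r in ratings if r["t"] >= t0]
    return train, test

ratings = [
    {"user": "Cristiano", "item": "Capela dos Ossos", "rating": 4, "t": 10},
    {"user": "Marta", "item": "Sé Catedral", "rating": 5, "t": 20},
    {"user": "Luis", "item": "Templo Romano", "rating": 1, "t": 30},
]

train, test = temporal_split(ratings, t0=25)
print(len(train), len(test))  # 2 1
```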
Items: Centro Histórico, Capela dos Ossos, Templo Romano, Sé Catedral, Almendres Cromlech

Ratings per user (sparse rating matrix; unrated items left blank):
Cristiano: 2, 4, 5, 2
Marta: 5, 3
Luis: 1, 2
1. Train rating function f(u,i) using the training set
2. Predict rating for all pairs (u,i) of the test set
3. Compute RMSE(f) over all rating predictions
Offline Evaluation Benchmarking Recommendation Task
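The RMSE computation in step 3 can be sketched as follows; the global-mean baseline predictor and the test ratings are illustrative assumptions, not part of the benchmark:

```python
import math

def rmse(f, test_pairs):
    """Root-mean-square error of rating predictor f(u, i) over held-out pairs."""
    errors = [(f(r["user"], r["item"]) - r["rating"]) ** 2 for r in test_pairs]
    return math.sqrt(sum(errors) / len(errors))

# A trivial baseline: always predict the global mean of the training ratings.
train_ratings = [3, 5, 4]  # illustrative
mean = sum(train_ratings) / len(train_ratings)

test_pairs = [
    {"user": "Luis", "item": "Sé Catedral", "rating": 2},
    {"user": "Marta", "item": "Templo Romano", "rating": 5},
]
print(round(rmse(lambda u, i: mean, test_pairs), 3))
```

Any predictor with the same f(u, i) signature, e.g. a matrix-factorization model, can be plugged in and compared on the same test pairs.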
• Ignores users’ previous interactions/preferences
• Does not consider trends/shifting preferences
• ...
• Technical challenges are ignored
Drawbacks of offline evaluation
And what about me?
Example: Online evaluation
Online Evaluation (A/B testing)
Compare performance, e.g., based on profit margins
$$$
A vs. B, compared on:
• Click-through rate
• User retention time
• Required resources
• User satisfaction
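A minimal sketch of how an A/B comparison on click-through rate can be decided: a two-proportion z-test with a normal approximation. The click and impression counts are invented for illustration:

```python
import math

def ab_z_test(clicks_a, imp_a, clicks_b, imp_b):
    """Two-proportion z-test on the click-through rates of variants A and B."""
    p_a, p_b = clicks_a / imp_a, clicks_b / imp_b
    p = (clicks_a + clicks_b) / (imp_a + imp_b)          # pooled CTR
    se = math.sqrt(p * (1 - p) * (1 / imp_a + 1 / imp_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = ab_z_test(clicks_a=120, imp_a=10_000, clicks_b=90, imp_b=10_000)
print(f"z = {z:.2f}, significant at 5%: {p < 0.05}")
```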
• Large user base and costly infrastructure required
• Different evaluation metrics required
• Comparison to offline evaluation challenging
Drawbacks of online evaluation
And what about me?
• Academia and industry apply different evaluation approaches
• Limited transfer from offline to online scenario
• Multi-dimensional benchmarking
• Combination of different evaluation approaches
Evaluation challenges
• Recommender Systems
• Evaluation
• NewsREEL scenario
• NewsREEL 2016
Outline
In CLEF NewsREEL, participants can develop stream-based news recommendation algorithms and have them benchmarked (a) online by millions of users over the period of a few months, and (b) offline by simulating a live stream.
CLEF NewsREEL
NewsREEL scenario
Image: Courtesy of T. Brodt (plista)
NewsREEL scenario
Profit = clicks on recommendations
Benchmarking metric: Click-Through Rate (CTR)
Request article
Request recommendation
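The benchmarking metric above reduces to clicks divided by displayed recommendations; a minimal sketch with illustrative numbers:

```python
def click_through_rate(clicks, recommendations_shown):
    """CTR: fraction of displayed recommendations that were clicked."""
    return clicks / recommendations_shown

# Illustrative numbers: 230 clicks on 18,400 displayed recommendations.
print(f"{click_through_rate(230, 18_400):.2%}")  # 1.25%
```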
Task 2: Offline Evaluation
Data:
• Traffic and content updates of nine German-language news content provider websites
• Traffic: reading articles, clicking on recommendations
• Updates: adding and updating news articles

Evaluation:
• Simulation of data stream using the Idomaar framework
• Participants have to predict interactions with the data stream
• Quality measured by the ratio of successful predictions to the total number of predictions
Predict interactions in a simulated data stream
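The offline evaluation loop can be sketched as a stream replay: score a predictor by the fraction of correctly predicted interactions. The event format below is illustrative, not the actual Idomaar format:

```python
# Replay an interaction stream and score a predictor by the ratio of
# successful predictions to the total number of predictions.

def evaluate_on_stream(predict, stream):
    """predict(user) returns the item the user is expected to interact with."""
    hits = total = 0
    for event in stream:
        if predict(event["user"]) == event["item"]:
            hits += 1
        total += 1
    return hits / total

stream = [
    {"user": 1, "item": "a"},
    {"user": 2, "item": "b"},
    {"user": 1, "item": "a"},
    {"user": 3, "item": "c"},
]
# Baseline: always predict one fixed article ("a" here).
print(evaluate_on_stream(lambda user: "a", stream))  # 0.5
```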
Simulation process
Idomaar
simulate stream
request article
Idomaar stream simulation
Task 1: Online Evaluation
Data:
• Provide recommendations for visitors of the news portals of plista's customers
• Ten portals (local news, sports, business, technology)
• Communication via Open Recommendation Platform (ORP)

Evaluation:
• Benchmark own performance against other participants and baseline algorithms during three pre-defined evaluation windows
• Best algorithms determined in final evaluation period
• Standard evaluation metrics
Recommend news articles in real-time
Real-Time Recommendation
T. Brodt and F. Hopfgartner, "Shedding Light on a Living Lab: The CLEF NewsREEL Open Recommendation Platform," in Proc. of IIiX 2014, Regensburg, Germany, pp. 223–226, 2014.
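A minimal sketch of the kind of low-latency recommender the online task calls for: a most-popular-in-recent-window baseline that can answer instantly. The class and its interface are illustrative assumptions, not ORP's actual JSON protocol:

```python
from collections import Counter, deque

class MostPopularRecommender:
    """Track recently clicked articles in a sliding window and
    recommend the currently most popular ones."""

    def __init__(self, window=1000):
        # Sliding window of recently clicked item ids.
        self.recent = deque(maxlen=window)

    def record_click(self, item_id):
        self.recent.append(item_id)

    def recommend(self, n=3):
        # The n most frequently clicked items in the current window.
        return [item for item, _ in Counter(self.recent).most_common(n)]

rec = MostPopularRecommender()
for item in ["a", "b", "a", "c", "a", "b"]:
    rec.record_click(item)
print(rec.recommend(2))  # ['a', 'b']
```

The bounded deque keeps memory constant and lets popularity adapt as old clicks fall out of the window, which matters for news, where interest shifts quickly.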
• Recommender Systems
• Evaluation
• NewsREEL scenario
• NewsREEL 2016
Outline
Participation
Task 1 – First evaluation window
Task 1 – Second evaluation window
Task 1 – Third evaluation window
• NewsREEL presentations
  – Online Algorithms and Data Analysis
  – Frameworks and Algorithms
• Evaluation results and task winners
• Joint Session: LL4IR & NewsREEL: New Ideas
NewsREEL session: Tomorrow, 1:30pm – 3:30pm
More Information
• http://orp.plista.com
• http://www.clef-newsreel.org
• http://www.crowdrec.eu
• http://sigir.org/files/forum/2015D/p129.pdf
Thank you
Questions?