
Real-world News Recommender Systems

Date post: 27-Jun-2015
Description:
Tutorial given at the 'Informatik 2014' on 26 September, 2014, in Stuttgart, Germany.
Real-time Processing of Data Streams (Verarbeitung von Datenströmen in Echtzeit). Tobias Heintz (plista GmbH) and Benjamin Kille (Technische Universität Berlin). September 26, 2014.
Transcript
Page 1: Real-world News Recommender Systems

Real-time Processing of Data Streams (Verarbeitung von Datenströmen in Echtzeit)

Tobias Heintz¹, Benjamin Kille²

¹ plista GmbH

² Technische Universität Berlin

September 26, 2014

Page 2: Real-world News Recommender Systems

Table of Contents

Introduction

Recommender Systems
- Unpersonalised Recommendation
- Collaborative Filtering
- Content-based Filtering
- Evaluation

News Recommendation

Big Data Issues

Page 3: Real-world News Recommender Systems

Who are we?

- Tobias Heintz, plista GmbH
- Benjamin Kille, Technische Universität Berlin

plista GmbH

Pioneers for targeted advertisement and content distribution.

- founded 31 July 2008
- incorporated into the WPP Group as of 1 January 2014
- headquarters in Berlin, Germany
- 120 employees (30% R&D)

Technische Universität Berlin

- >30 000 enrolled students
- 331 professors
- >2600 researchers

Page 4: Real-world News Recommender Systems

What problems do we address?

Recommender Systems

We will introduce recommender systems; we will discuss a variety of algorithms; we will explore how to evaluate recommender systems.

News

We will talk about specific challenges when recommending news; we will illustrate issues arising as systems fail to build comprehensive user profiles; we will depict how news evolving over time affects recommender systems.

Big Data

We will exemplify in what way news represent a source of big data; we will introduce a system which grants researchers access to big data; we will show you how you can compete with your own approaches.

Page 5: Real-world News Recommender Systems

Why are these problems important?

Users increasingly face information overload as they interact with item collections. For instance:

- >43 000 000 songs on Apple's iTunes
- 100 h of video are uploaded to YouTube every minute
- >3 000 000 movies on IMDb
- ...

Collections continue to grow, causing even more severe information overload. The same holds for news articles.

Page 7: Real-world News Recommender Systems

Problem definition

Users have insufficient time and cognitive capacity to iterate over the full collection. Recommender systems support users as they filter collections. Recommender systems differ with respect to the method they use to filter. More formally, a general-purpose recommender system is a triple (U, I, φ).

U → set of users {u1, u2, ..., uM}
I → set of items {i1, i2, ..., iN}
φ → a filter function

The performance of different recommendation algorithms typically depends on φ.

Page 8: Real-world News Recommender Systems

Filter Functions

Filter functions take a user u, the entire item collection I, and a model M. They return a subset of items to be recommended, I*.

φ(u, I, M) = I*

A recommender system's success or failure strongly depends on the model M, in particular on how accurately the model reflects actual user preferences. M may take various kinds of input, as we will discuss for a selection of recommendation algorithms.

Page 9: Real-world News Recommender Systems

Random Recommendation

M takes the item collection and selects items randomly.

Page 11: Real-world News Recommender Systems

Most-Popular Recommendation

M orders the item collection according to the number of interactions, K ≥ L ≥ M ≥ N.

[Figure: four items ranked by the most-popular recommender, with K, L, M, and N interactions respectively.]
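The two unpersonalised recommenders can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's implementation: the function names, the `(user, item)` log format, and the toy item ids `a`, `b`, `c` are assumptions made for the example.

```python
import random
from collections import Counter


def recommend_random(items, n, seed=None):
    """Random recommender: draw n distinct items uniformly at random."""
    rng = random.Random(seed)
    items = list(items)
    return rng.sample(items, min(n, len(items)))


def recommend_most_popular(interactions, n):
    """Most-popular recommender: rank items by their interaction count.

    `interactions` is an iterable of (user, item) pairs (assumed log format).
    """
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(n)]


# Hypothetical interaction log: item "a" has 3 interactions, "b" 2, "c" 1.
interactions = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u3", "c"),
]

top_two = recommend_most_popular(interactions, 2)
random_two = recommend_random(["a", "b", "c"], 2, seed=0)
```

Neither function looks at the requesting user, which is why both are cheap to update and domain independent, and also why they disregard personal taste.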

Page 12: Real-world News Recommender Systems

Summary: Unpersonalised Recommenders

Advantages

- low computational complexity
- easy to update M
- domain independent

Disadvantages

- disregard personal taste
- disregard context
- high chance to recommend known or unpopular items

Page 13: Real-world News Recommender Systems

Collaborative Filtering

Basic Assumptions

- systems have access to users' preferences
- users with similar tastes in the past will continue to like similar items
- systems have means to compare users' tastes

Distinctions

- model-based vs memory-based
- item-based vs user-based

Page 14: Real-world News Recommender Systems

Example

[Figure: four users (Anna, Bob, Clara, Dan) and five movies (Aviator, Bad Boys, Cars, District 9, Elektra).]

Page 16: Real-world News Recommender Systems

Example

[Figure: four users (Anna, Bob, Clara, Dan) and five movies (Aviator, Bad Boys, Cars, District 9, Elektra).]

User profile of Anna: [Bad Boys, District 9, Elektra]

Page 17: Real-world News Recommender Systems

Example

User profiles:

Anna: [Bad Boys, District 9, Elektra]
Bob: [Aviator, Bad Boys, District 9, Elektra]
Clara: [Cars, District 9, Elektra]
Dan: [Aviator]

Page 18: Real-world News Recommender Systems

Example

[Figure: users Anna, Bob, Clara, and Dan against movies Aviator, Bad Boys, Cars, District 9, and Elektra.]

Page 22: Real-world News Recommender Systems

Preference Elicitation

Explicit Preferences

- Likes
- Thumbs Up/Down
- Ratings
- Comments
- Purchase

Implicit Preferences

- Click
- Dwell Time
- Returns

How can we measure whether users like items, and how much they do?

Page 23: Real-world News Recommender Systems

Collaborative Filtering Algorithms with Ratings

Memory-based

The algorithm uses the complete set of data in the recommendation process. M contains the full rating matrix.

- user-based k-nearest neighbour
- item-based k-nearest neighbour

Model-based

The algorithm learns a model M and uses it to recommend items.

- matrix factorisation with ALS
- matrix factorisation with SGD

Page 24: Real-world News Recommender Systems

User-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(u, v)

[Figure: users Anna, Bob, Clara, and Dan against movies Aviator, Bad Boys, Cars, District 9, and Elektra.]

Page 26: Real-world News Recommender Systems

User-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(u, v)

Rows of the rating matrix for Anna and Bob (1 = liked, 0 = otherwise):

        Aviator  Bad Boys  Cars  District 9  Elektra
Anna       0        1       0        1          1
Bob        1        1       0        1          1

Page 27: Real-world News Recommender Systems

Similarity Measures

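Number of items in common

$$\sigma(u, v) = \sum_{i \in \mathcal{I}} I(i), \qquad I(i) = \begin{cases} 1 & \text{if both } u \text{ and } v \text{ liked } i \\ 0 & \text{otherwise} \end{cases}$$

Cosine similarity

$$\sigma(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$

Pearson's correlation coefficient

$$\sigma(u, v) = \frac{\operatorname{cov}(u, v)}{\operatorname{std}(u) \, \operatorname{std}(v)}$$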
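As an illustration, the three similarity measures might be written as follows for two users represented by equally long rating vectors. The function names, the toy vectors, and the convention of returning 0.0 when a norm or standard deviation is zero are assumptions of this sketch, not part of the slides.

```python
import math


def common_items(u, v):
    """Count the items that both users liked (binary profiles)."""
    return sum(1 for a, b in zip(u, v) if a == 1 and b == 1)


def cosine(u, v):
    """Cosine of the angle between the two profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty profiles
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Covariance divided by the product of the standard deviations."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u) / n)
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v) / n)
    if std_u == 0 or std_v == 0:
        return 0.0  # assumed convention for constant profiles
    return cov / (std_u * std_v)


# Toy binary profiles over five items (hypothetical values).
user_a = [0, 1, 0, 1, 1]
user_b = [1, 1, 0, 1, 1]

overlap = common_items(user_a, user_b)  # 3
cos_sim = cosine(user_a, user_b)        # sqrt(3) / 2, about 0.866
corr = pearson(user_a, user_b)          # about 0.612
```

The overlap count grows with profile size, whereas cosine and Pearson are normalised, so the three measures can rank the same neighbours differently.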

Page 29: Real-world News Recommender Systems

User-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(u, v)

[Figure: symmetric user-user similarity matrix over Anna, Bob, Clara, and Dan, with 1 on the diagonal; sim(Anna, Bob) = sim(Bob, Anna). Anna's row reads [1, s_Bob, s_Clara, s_Dan].]

Page 30: Real-world News Recommender Systems

User-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(u, v)

[Figure: users Anna, Bob, Clara, and Dan against movies Aviator, Bad Boys, Cars, District 9, and Elektra, with one unknown entry (?) to be predicted.]

Page 31: Real-world News Recommender Systems

User-based k-nearest Neighbour

Recommendation procedure

user profile:
u = (r(i1), r(i2), ..., r(iN))

similarity vector:
σ(u, ·) = (σ(u, v1), σ(u, v2), ..., σ(u, u), ..., σ(u, vM))

preference prediction:
r(j) = u σ(u, ·)

Result

We obtain a prediction for each item's preference and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.
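One way to read the procedure above is as a similarity-weighted sum of the neighbours' ratings. The sketch below follows that reading with cosine similarity and the k most similar users; the choice of cosine, the treatment of 0 as "no recorded preference", and the toy matrix are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 for an all-zero vector (assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_user_based(R, user, k):
    """Score each item as the similarity-weighted sum of the ratings
    given by the k users most similar to `user`."""
    sims = [(cosine(R[user], R[v]), v) for v in range(len(R)) if v != user]
    neighbours = sorted(sims, reverse=True)[:k]
    scores = [0.0] * len(R[user])
    for sim, v in neighbours:
        for j, rating in enumerate(R[v]):
            scores[j] += sim * rating
    return scores


def recommend(R, user, k, n):
    """Return the n best-scored items the user has no preference for yet."""
    scores = predict_user_based(R, user, k)
    unseen = [j for j, rating in enumerate(R[user]) if rating == 0]
    return sorted(unseen, key=lambda j: scores[j], reverse=True)[:n]


# Hypothetical M x N rating matrix: 4 users (rows) and 5 items (columns).
R = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
]

ranked = recommend(R, user=0, k=2, n=2)  # item indices, best first
```

For user 0 the two nearest neighbours are users 1 and 2, so item 0 (liked by the closer neighbour) is ranked ahead of item 2.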

Page 32: Real-world News Recommender Systems

Item-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(i , j)

[Figure: users Anna, Bob, Clara, and Dan against movies Aviator, Bad Boys, Cars, District 9, and Elektra.]

Page 34: Real-world News Recommender Systems

Item-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(i , j)

Columns of the rating matrix for Aviator and Bad Boys (1 = liked, 0 = otherwise):

        Aviator  Bad Boys
Anna       0        1
Bob        1        1
Clara      0        0
Dan        1        0

Page 35: Real-world News Recommender Systems

Similarity Measures

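Number of users in common

$$\sigma(i, j) = \sum_{u \in \mathcal{U}} I(u), \qquad I(u) = \begin{cases} 1 & \text{if both } i \text{ and } j \text{ are liked by } u \\ 0 & \text{otherwise} \end{cases}$$

Cosine similarity

$$\sigma(i, j) = \frac{i \cdot j}{\|i\| \, \|j\|}$$

Pearson's correlation coefficient

$$\sigma(i, j) = \frac{\operatorname{cov}(i, j)}{\operatorname{std}(i) \, \operatorname{std}(j)}$$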

Page 36: Real-world News Recommender Systems

Item-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(i , j)

[Figure: symmetric item-item similarity matrix over Aviator, Bad Boys, Cars, District 9, and Elektra, with 1 on the diagonal; sim(Aviator, Bad Boys) = sim(Bad Boys, Aviator).]

Page 37: Real-world News Recommender Systems

Item-based k-nearest Neighbour

Input: M × N rating matrix R, similarity measure σ(i , j)

[Figure: users Anna, Bob, Clara, and Dan against movies Aviator, Bad Boys, Cars, District 9, and Elektra, with one unknown entry (?) to be predicted.]

Page 38: Real-world News Recommender Systems

Item-based k-nearest Neighbour

Recommendation procedure

item profile:
i = (r(u1), r(u2), ..., r(uM))

similarity vector:
σ(i, ·) = (σ(i, j1), σ(i, j2), ..., σ(i, i), ..., σ(i, jN))

preference prediction:
r(u) = σ(i, ·) i

Result

We obtain a prediction for each item's preference and can rank the items accordingly. The algorithm returns as many items as requested, starting from the top rank.
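The item-based variant can be sketched analogously by comparing columns of the rating matrix instead of rows. As before, this is one possible reading: cosine similarity, 0 as "no recorded preference", and the toy matrix are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 for an all-zero vector (assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_item_based(R, user, k):
    """For every item the user has not rated, sum the user's ratings of the
    k most similar items, weighted by item-item similarity."""
    n_users, n_items = len(R), len(R[0])
    # Item profiles are the columns of the rating matrix.
    columns = [[R[u][j] for u in range(n_users)] for j in range(n_items)]
    scores = {}
    for j in range(n_items):
        if R[user][j] != 0:
            continue  # skip items the user already has a preference for
        sims = sorted(
            ((cosine(columns[j], columns[i]), i)
             for i in range(n_items) if i != j),
            reverse=True,
        )[:k]
        scores[j] = sum(sim * R[user][i] for sim, i in sims)
    return scores


# Hypothetical M x N rating matrix: 4 users (rows) and 5 items (columns).
R = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
]

scores = predict_item_based(R, user=0, k=2)
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy matrix the item-based scores put item 2 ahead of item 0 for user 0, the opposite order to a user-based reading of the same data, which shows that the two neighbourhood models need not agree.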

Page 39: Real-world News Recommender Systems

Matrix Factorisation

Input: M × N rating matrix R

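$$R = \begin{pmatrix} & 1 & & 1 & 1 \\ 1 & 1 & & 1 & 1 \\ & & 1 & 1 & 1 \\ 1 & & & & \end{pmatrix}$$

(rows: Anna, Bob, Clara, Dan; columns: Aviator, Bad Boys, Cars, District 9, Elektra; empty entries are missing preferences)

Goal

Fill the gaps of missing preferences.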

Page 40: Real-world News Recommender Systems

Matrix Factorisation

Idea

Project preferences into a low-dimensional space to detect latent structures.

$$[R]_{M \times N} \approx [P]_{M \times K} \, [Q]^{\top}_{N \times K}, \qquad K \ll M, N$$

Problem

How to determine P and Q?

Page 41: Real-world News Recommender Systems

Matrix Factorisation

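Learning P and Q

Input: Error metric

$$E(P, Q, R) = \sum_{(u,i) \in R} \left( r(u, i) - P_u Q_i^{\top} \right)^2 \qquad \text{(quadratic error)}$$

$$E(P, Q, R) = \sum_{(u,i) \in R} \left| r(u, i) - P_u Q_i^{\top} \right| \qquad \text{(absolute error)}$$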

Page 42: Real-world News Recommender Systems

Matrix Factorisation

Stochastic Gradient Descent

Optimise the error metric by selecting data points at random.

- initialise P, Q with small random values
- pick a preference (u, i) at random
- determine the gradient at that point
- adjust P, Q accordingly
- continue
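The steps above can be sketched for the quadratic error as follows. The learning rate, number of epochs, rank K = 2, initialisation range, and the toy preference list are all assumptions of this example; the slides do not fix any of them.

```python
import random


def sgd_factorise(ratings, n_users, n_items, k=2, lr=0.05, epochs=500, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) by stochastic gradient
    descent on the quadratic error over the observed (u, i, r) triples."""
    rng = random.Random(seed)
    # Initialise P, Q with small random values.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs * len(ratings)):
        # Pick a preference (u, i) at random.
        u, i, r = rng.choice(ratings)
        err = r - sum(P[u][f] * Q[i][f] for f in range(k))
        # Gradient of (r - P_u . Q_i)^2 is -2*err*Q_i w.r.t. P_u and
        # -2*err*P_u w.r.t. Q_i; step against it.
        for f in range(k):
            p, q = P[u][f], Q[i][f]
            P[u][f] += lr * 2 * err * q
            Q[i][f] += lr * 2 * err * p
    return P, Q


def squared_error(ratings, P, Q):
    """Quadratic error E(P, Q, R) over the observed preferences."""
    return sum(
        (r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
        for u, i, r in ratings
    )


# Hypothetical observed preferences as (user index, item index, rating).
ratings = [
    (0, 1, 1), (0, 3, 1), (0, 4, 1),
    (1, 0, 1), (1, 1, 1), (1, 3, 1), (1, 4, 1),
    (2, 2, 1), (2, 3, 1), (2, 4, 1),
    (3, 0, 1),
]

P0, Q0 = sgd_factorise(ratings, 4, 5, epochs=0)  # untrained starting point
P, Q = sgd_factorise(ratings, 4, 5)
error_before = squared_error(ratings, P0, Q0)
error_after = squared_error(ratings, P, Q)
```

Practical systems usually add a regularisation term and a decaying learning rate; both are left out here to stay close to the bullet points.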

Alternating Least Squares

Optimise either P or Q while keeping the other fixed.

- initialise P, Q with small random values
- minimise the error metric with respect to P (Q fixed)
- minimise the error metric with respect to Q (P fixed)
- continue
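A deliberately simplified sketch of the alternating scheme, restricted to a single latent factor (K = 1) so that each least-squares step has a closed-form scalar solution. The small ridge term `reg`, the constant initialisation (chosen for reproducibility instead of random values), and the toy data are assumptions added for this example; for K > 1 each step becomes a small linear system.

```python
def als_rank1(ratings, n_users, n_items, iterations=20, reg=0.01):
    """Alternating least squares for a rank-1 model r(u, i) ~ p[u] * q[i].

    With q fixed, the best p[u] is sum(r * q[i]) / (reg + sum(q[i]^2)) over
    the items rated by u, and symmetrically for q[i] with p fixed.
    """
    p = [0.1] * n_users
    q = [0.1] * n_items
    for _ in range(iterations):
        # Optimise P while keeping Q fixed.
        for u in range(n_users):
            rated = [(i, r) for (v, i, r) in ratings if v == u]
            num = sum(r * q[i] for i, r in rated)
            den = reg + sum(q[i] ** 2 for i, _r in rated)
            p[u] = num / den
        # Optimise Q while keeping P fixed.
        for i in range(n_items):
            raters = [(u, r) for (u, j, r) in ratings if j == i]
            num = sum(r * p[u] for u, r in raters)
            den = reg + sum(p[u] ** 2 for u, _r in raters)
            q[i] = num / den
    return p, q


def squared_error(ratings, p, q):
    """Quadratic error over the observed preferences."""
    return sum((r - p[u] * q[i]) ** 2 for u, i, r in ratings)


# Hypothetical observed preferences as (user index, item index, rating).
ratings = [
    (0, 1, 1), (0, 3, 1), (0, 4, 1),
    (1, 0, 1), (1, 1, 1), (1, 3, 1), (1, 4, 1),
    (2, 2, 1), (2, 3, 1), (2, 4, 1),
    (3, 0, 1),
]

p, q = als_rank1(ratings, n_users=4, n_items=5)
final_error = squared_error(ratings, p, q)
```

Because each half-step solves its subproblem exactly, the regularised objective cannot increase from one half-step to the next, in contrast to the noisy per-sample updates of stochastic gradient descent.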

Page 43: Real-world News Recommender Systems

Summary: Collaborative Filtering

Advantages

- takes personal taste into account
- successful in the Netflix Prize competition
- domain-independent

Disadvantages

- cold-start problem
- sparsity
- grey sheep

Page 44: Real-world News Recommender Systems

Cold-Start Problem

- user without known preferences
- item without preferences
- similarity measures fail
- inconclusive latent factors

Page 45: Real-world News Recommender Systems

Grey Sheep

- users rate all their items as average
- user profile: [3, 3, 3, 3, ..., 3]
- collaborative systems cannot distinguish good from bad items

Page 46: Real-world News Recommender Systems

Content-based Filtering

Idea

Suggest items which are similar to items users have liked.

Similarity

- based on content → features
- depending on the domain

Page 47: Real-world News Recommender Systems

Content-based Filtering

Input: user profile, item collection, item features, and similarity measure

Page 49: Real-world News Recommender Systems

Content-based Filtering

Input: user profile, item collection, item features, and similarity measure

Features

- Name/ID
- Meta data
- Content
  - audio stream → songs
  - video stream → movies
  - text → book, news article

Page 50: Real-world News Recommender Systems

Content-based Filtering

Input: user profile, item collection, item features, and similarity measure

[Figure: content-based filtering (CBF) compares items through sim(i, j).]

Page 51: Real-world News Recommender Systems

Content-based Filtering

Similarity: Example

- keyword overlap → text
- average colour match → images/video
- maximum amplitude → audio/sound
- common actors → movies
- common interests → friends/partnership
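For text, the keyword-overlap idea could look like the sketch below. Measuring overlap with the Jaccard index, scoring a candidate by its best match among the liked items, and the tiny article catalogue are assumptions chosen for illustration; the slides do not prescribe a specific overlap measure.

```python
def keyword_overlap(a, b):
    """Jaccard index of two keyword sets: shared keywords / all keywords."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_content_based(liked, catalogue, n):
    """Rank items the user has not liked yet by their highest keyword
    overlap with any liked item."""
    scores = {}
    for item, keywords in catalogue.items():
        if item in liked:
            continue
        scores[item] = max(
            (keyword_overlap(keywords, catalogue[l]) for l in liked),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical news articles described by keyword sets.
catalogue = {
    "a1": {"election", "parliament", "vote"},
    "a2": {"election", "vote", "poll"},
    "a3": {"football", "league", "goal"},
    "a4": {"parliament", "budget", "tax"},
}

suggestions = recommend_content_based({"a1"}, catalogue, n=2)
```

Only the user's own likes and the item features are needed, so a newly published article can be recommended immediately, but a user without any likes still receives no meaningful ranking.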

Page 52: Real-world News Recommender Systems

Summary: Content-based Filtering

Advantages

- considers personal taste
- high expectability

Disadvantages

- costly for high-volume content, e.g., video
- low serendipity
- user cold-start

Page 53: Real-world News Recommender Systems

Evaluation

Important aspects

- how well does the system predict preferences?
- how often do users receive useful suggestions?
- how long does it take for the system to provide suggestions?
- how many requests cannot be answered?
- how often do users return to the site?
- how often do users purchase/rent/consume items which the system had recommended?
- how well did users perceive the system?

Page 54: Real-world News Recommender Systems

Evaluation: Rating Prediction

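Goal

The evaluation ought to show how well the system estimates preferences.

Assumptions

- system can access recorded explicit numerical preferences
- tastes remain stable over time
- the more accurately the system estimates preferences, the more suited the suggestions

Metrics

- root mean squared error: $\mathrm{RMSE} = \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left( r(u, i) - \hat{r}(u, i) \right)^2}$
- mean absolute error: $\mathrm{MAE} = \frac{1}{|R|} \sum_{(u,i) \in R} \left| r(u, i) - \hat{r}(u, i) \right|$

Here $\hat{r}(u, i)$ denotes the predicted preference and $|R|$ the number of recorded pairs $(u, i)$.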
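Both error metrics are straightforward to compute from paired lists of recorded and predicted preferences. The function names and the sample ratings are made up for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired recorded/predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error over paired recorded/predicted ratings."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical recorded ratings and the system's predictions for them.
actual = [4, 3, 5, 1]
predicted = [3.5, 3, 4, 2]

rmse_value = rmse(actual, predicted)  # sqrt(2.25 / 4) = 0.75
mae_value = mae(actual, predicted)    # 2.5 / 4 = 0.625
```

RMSE is at least as large as MAE on the same data because squaring weights large errors more heavily.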

Page 55: Real-world News Recommender Systems

Evaluation: Ranking

Goal

The evaluation ought to show how well the system ranks items according to users' preferences.

Assumptions

- system can access preference relations between items
- tastes remain stable over time
- the better the system ranks items, the more suited the suggestions

Metrics

- normalised discounted cumulative gain: $\mathrm{nDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}$
- mean reciprocal rank: $\mathrm{MRR} = \frac{1}{|\mathcal{U}|} \sum_{u \in \mathcal{U}} \frac{1}{\mathrm{rank}_i}$
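A small sketch of both ranking metrics. The slides do not define DCG, so the widely used logarithmic discount, rel / log2(position + 1), is assumed here; the rank in the reciprocal rank is taken to be the position of the first relevant item in each user's list, which is also an assumption.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with the log2(position + 1) discount."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1 / rank over users, one rank (1-based) per user."""
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)


# Relevance of the recommended items in the order they were shown.
perfect = ndcg([1, 1, 0])   # ideal order, so the score is 1.0
swapped = ndcg([0, 1])      # relevant item only at position 2

# Hypothetical ranks of the first relevant item for three users.
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 0.5 + 0.25) / 3
```

Both metrics lie between 0 and 1 and reward placing relevant items near the top of the list.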

Page 56: Real-world News Recommender Systems

Evaluation: Top-N

Goal

The evaluation ought to show how well the system selects the top suggestions.

Assumptions

- system can access preference relations between items
- tastes remain stable over time
- the better the system selects the top suggestions, the more suited they are

Metrics

- precision@N: $\frac{TP}{TP + FP}$
- recall@N: $\frac{TP}{TP + FN}$
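The two top-N metrics can be computed from a ranked recommendation list and the set of items the user actually found relevant. The function name and the sample data are hypothetical, and the sketch assumes the recommendation list contains no duplicates.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Return (precision@N, recall@N) for one user.

    TP: relevant items within the top N; FP: non-relevant items within
    the top N; FN: relevant items missing from the top N.
    """
    top_n = recommended[:n]
    tp = sum(1 for item in top_n if item in relevant)
    fp = len(top_n) - tp
    fn = len(relevant) - tp
    precision = tp / (tp + fp) if top_n else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Hypothetical ranked recommendations and ground-truth relevant items.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

precision, recall = precision_recall_at_n(recommended, relevant, n=3)
```

With N = 3, two of the three suggestions are relevant and two of the three relevant items are found, so both values equal 2/3 here; in general they differ and trade off as N grows.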

Page 57: Real-world News Recommender Systems

Evaluation: Problems

- explicit preferences may not be available
- tastes change over time
- recorded data does not fully reflect the current situation

Solution

Access real systems with current user interactions to see whether a method performs better than the existing one → second part of the tutorial

Page 58: Real-world News Recommender Systems

Summary: Recommender Systems

- support users by suggesting interesting items
- counteract information overload
- unpersonalised recommender
- collaborative filtering
  - user-based k-nearest neighbour
  - item-based k-nearest neighbour
  - matrix factorisation
- content-based filtering
- evaluation still difficult

Page 60: Real-world News Recommender Systems

News Recommendation: Special Characteristics

Collection Dynamics

- thousands of new articles published daily
- older articles' relevancy decays

Contextual Differences

- users perceive recommendations differently
- devices render recommendations differently
- dependence on time of day and weekday

Popularity Bias

- few items receive a lot of attention
- most items receive hardly any attention

Page 61: Real-world News Recommender Systems

News Recommendation: Collection Dynamics

[Figure: two panels, "entry" and "exit", plotted over time (Oct to Jan); y-axis ticks at 500, 1000, 1500, and 2000.]

Page 62: Real-world News Recommender Systems

News Recommendation: Contextual Differences

[Figure: three panels (desktop, phone, tablet) over hour of day (ticks at 0, 6, 12, 18) and weekday (Mon to Sun); colour scale from 0.000 to 0.014.]

Page 63: Real-world News Recommender Systems

News Recommendation: Popularity Bias

[Figure: two panels, "News" and "Movies", each relating Interactions and Frequency on logarithmic axes. News: ranges 10^0 to 10^4 and 10^0 to 10^6. Movies: ranges 10^0 to 10^2 and 10^0 to 10^4.]

Page 65: Real-world News Recommender Systems

Big Data

Goal

Intelligent real-time processing of huge amounts of data.
Recommender Systems → personalisation

- volume → amount of data to be stored increases
- variety → heterogeneous data
- velocity → data streams in (near) real-time
- veracity → noisy data

Page 66: Real-world News Recommender Systems

Big Data

Do news recommendations fulfil the requirements of big data?

Volume

hundreds of GB every day ✓

Variety

news entail textual data and images, inducing some variety

Velocity

news arise continuously → second part of the tutorial ✓

Veracity

news have some consistent attributes (headline, text), but also comprise some features which are missing or wrong (date, location, image)

Page 67: Real-world News Recommender Systems

Questions?

Thank you for your attention! We hope you enjoyed the first part of the tutorial! There is more (practical) to come in the second part!

