TiVo Suggestions: Predicting Viewer Affinity Using Collaborative Filtering

Kamal Ali – TiVo, Yahoo

Wijnand van Stam, TiVo

Outline

What is “TiVo” ?

Why Suggestions?

Collaborative filtering background

TiVo collaborative filtering data cycle

Server-side learning

Previous Work

Contributions

Contributions

Large fielded system:

Large number (3M) of users

Long-lived interaction with user: >90 thumbs/user

10^8 ratings over 300K shows

Very large in user-hours

Distributed architecture:

Server: throttle-able

Clients do bulk of work

Privacy-preservation:

Privacy and distributed goals aligned

No persistent memory of user on server

What is “TiVo”?

TiVo = set-top TV box + program-guide service

Pause & rewind live TV

Linux OS

Viewers can rate shows

Suggestions

Q4 1999

Why Suggestions?

Connect users to shows they’ll like

Predict degree to which viewer will like TV show

Produces ranked list of upcoming shows

Records shows if disk space is available

Filtering Background

Recommendation Systems

Content-based: use “intrinsic” features such as genre, cast, director, writers, age, channel-type, …

Collaborative filtering: use other people’s ratings

Combined, Cascaded

Content isn’t sufficient

Genres are few

Text length is small

Data cycle

Thumbs profile on TiVo client box

Random ID generated for profile and stored on server

1. Collecting feedback: thumbs up/down recorded

2. TiVo calls server, uploads entire anonymized profile

3. Server-side learning: correlation pairs <s1,s2,r> on server

4. Download pairs during some client-initiated calls: correlation pairs on client

5. Use correlations and thumbs profile to rate shows: rated shows in sorted order

http://tivo.com/resources/images/press_lifestyle_1.jpg

Collaborative Filtering Model

• k Nearest Neighbor over other rated correlated shows

• Use Pair-wise Pearson correlation

• Adjusted correlation for low support

• Use weighted linear combination
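As a rough illustration of the model above, the sketch below predicts a rating for one show from the k most strongly correlated shows in a viewer's thumbs profile. The function name `predict_rating`, the dictionary layout, the default `k=3`, and the normalization by the sum of absolute weights are assumptions made for this example; the slides only state that a weighted linear combination is used.

```python
def predict_rating(target, profile, correlations, k=3):
    """Predict a thumbs value for `target` from the k most correlated rated shows.

    profile:      {show: thumbs rating in -3..+3} for one viewer
    correlations: {(show_a, show_b): weight}, stored once per unordered pair
    Returns None when no rated show is correlated with the target.
    """
    neighbors = []
    for show, rating in profile.items():
        if show == target:
            continue
        # Look the pair up in either order.
        w = correlations.get((target, show), correlations.get((show, target)))
        if w is not None:
            neighbors.append((abs(w), w, rating))

    # Keep the k neighbors with the largest absolute correlation.
    neighbors.sort(reverse=True)
    top = neighbors[:k]

    denom = sum(a for a, _, _ in top)
    if denom == 0:
        return None
    # Weighted linear combination (normalization is an assumption).
    return sum(w * r for _, w, r in top) / denom
```

With a profile of `{"A": 3, "B": -2}` and weights 0.8 for (X, A) and 0.4 for (B, X), the prediction for X is (0.8·3 + 0.4·(−2)) / 1.2.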

1. Collecting Feedback

Explicit:

Thumbs up, down: -3 ... +3

Implicit:

User-initiated recording ... +1 thumbs

2. Privacy and Data Upload

TiVo calls server daily

Entire profile uploaded and given temp id

Server deletes old profiles: sliding window
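One way to picture the upload-and-expire behaviour is sketched below. Everything concrete here is an assumption for illustration: the helper names `store_upload` and `prune`, the in-memory dictionary, the use of `uuid4` for the temporary id, and the 30-day window (the slides do not give the window length).

```python
import datetime
import uuid


def store_upload(store, thumbs, today):
    """Store an uploaded profile under a fresh random id (hypothetical helper)."""
    temp_id = uuid.uuid4().hex  # not derived from any box or user identifier
    store[temp_id] = {"uploaded": today, "thumbs": dict(thumbs)}
    return temp_id


def prune(store, today, window_days=30):
    """Delete profiles older than the sliding window (window length assumed)."""
    cutoff = today - datetime.timedelta(days=window_days)
    expired = [pid for pid, rec in store.items() if rec["uploaded"] < cutoff]
    for pid in expired:
        del store[pid]
```

Because each upload carries the entire profile, nothing is lost when older uploads are dropped.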

3.1 Server-side scaling

300,000 unique shows/week

10^11 pairs of shows

3M users

Average of 90 thumbs/user: > 10^8 thumbs (ratings)

Ratings are sparse in the pair space

Don’t need to predict for very unpopular pairs
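The orders of magnitude quoted above follow from simple arithmetic, spelled out here as a quick check. Counting ordered pairs (shows × shows) is an assumption about how the 10^11 figure was obtained.

```python
shows = 300_000          # unique shows per week
users = 3_000_000        # 3M users
thumbs_per_user = 90     # average thumbs per user

# Ordered pairs of shows: 9 x 10^10, i.e. on the order of 10^11.
show_pairs = shows * shows

# Total ratings: 2.7 x 10^8, i.e. more than 10^8.
total_thumbs = users * thumbs_per_user
```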

3.2 Server-side Learning

Building pair-wise item/item correlations on server

Use simple Pearson pair-wise correlation

7 ratings levels per show [-3 … +3]

Only need to maintain 7 * 7 array of counts per pair

Efficient: CPU, memory

Compute r-to-z transform to compute confidence interval

Support-penalized degree of correlation:

lower bound of confidence-interval

Distinguishes r = 0.8 for S=10 versus S=1000
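The sketch below shows how a 7 × 7 array of counts is enough to recover Pearson's r for a pair of shows, and how the Fisher r-to-z transform gives a lower confidence bound that penalizes low support. The function names, the 95% level (`z_crit=1.96`), and the clamping of |r| = 1 are assumptions for this example, not details taken from the slides.

```python
import math

LEVELS = range(-3, 4)  # the seven thumbs levels, -3 .. +3


def new_counts():
    """7 x 7 table: counts[i][j] = viewers who rated show 1 at level i-3, show 2 at j-3."""
    return [[0] * 7 for _ in range(7)]


def add_observation(counts, r1, r2):
    counts[r1 + 3][r2 + 3] += 1


def pearson_from_counts(counts):
    """Return (r, n) using only the count table; r is None if a variance is zero."""
    n = sx = sy = sxx = syy = sxy = 0
    for i, x in enumerate(LEVELS):
        for j, y in enumerate(LEVELS):
            c = counts[i][j]
            n += c
            sx += c * x
            sy += c * y
            sxx += c * x * x
            syy += c * y * y
            sxy += c * x * y
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return None, n
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y), n


def lower_bound(r, n, z_crit=1.96):
    """Lower end of the confidence interval for r via Fisher's r-to-z transform."""
    if n <= 3:
        return None  # standard error 1/sqrt(n - 3) is undefined
    r = max(-0.999999, min(0.999999, r))  # atanh is undefined at +/-1
    z = math.atanh(r)
    return math.tanh(z - z_crit / math.sqrt(n - 3))
```

Under these assumptions, `lower_bound(0.8, 10)` comes out far below `lower_bound(0.8, 1000)`, which is the distinction between S=10 and S=1000 noted above.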

3.3 Throttled Server-side Architecture

Log Collector 1: Boxes 1..100K

Log Collector m: Boxes 100K(m-1)..100K m

By-series Counter 1: Series 0..30K

By-series Counter n: Series 30K(n-1)..30K n

By-series-pair Counter and Correlations Calc. 1

By-series-pair Counter and Correlations Calc. P

Transmit correlation pairs to TiVo Clients
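The partitioning in this diagram can be read as simple range sharding. The sketch below maps a box to its log collector and a series to its by-series counter; the function names and the exact boundary handling (boxes numbered from 1, series from 0, half-open ranges) are assumptions, since the slide ranges are ambiguous at the edges.

```python
BOXES_PER_COLLECTOR = 100_000   # "Boxes 1..100K" per log collector
SERIES_PER_COUNTER = 30_000     # "Series 0..30K" per by-series counter


def collector_for_box(box_index):
    """1-based log collector number for a 1-based box index (assumed numbering)."""
    return (box_index - 1) // BOXES_PER_COLLECTOR + 1


def counter_for_series(series_index):
    """1-based by-series counter number for a 0-based series index (assumed numbering)."""
    return series_index // SERIES_PER_COUNTER + 1
```

Adding collectors or counters then only means extending the ranges, which fits the throttle-able design.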

3.4 Server-side throttling

min_single (150)

min_pair (100)

Throttle-able:

More HW available

Increasing TiVo population

Go deeper into distribution tail
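A minimal sketch of how the two thresholds might be applied is given below. It assumes that `min_single` is the minimum number of ratings a show needs and `min_pair` the minimum number of viewers who rated both shows of a pair; the slides give only the parameter names and values, so this reading and the function name `select_pairs` are assumptions.

```python
def select_pairs(single_counts, pair_counts, min_single=150, min_pair=100):
    """Return the show pairs with enough support to be worth correlating.

    single_counts: {show: number of ratings for that show}
    pair_counts:   {(show_a, show_b): number of viewers who rated both}
    """
    popular = {s for s, c in single_counts.items() if c >= min_single}
    return [
        pair
        for pair, c in pair_counts.items()
        if c >= min_pair and pair[0] in popular and pair[1] in popular
    ]
```

Lowering either threshold admits more pairs, which is how the server could go deeper into the distribution tail as hardware or the TiVo population grows.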

Details

Pearson r

Weighted average

r-to-z transform (Fisher)

Standard: Lower bound of confidence interval:
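The formulas on this slide did not survive extraction. For reference, the standard textbook forms of the quantities named above are given below. The symbols (x_i and y_i for paired ratings of two shows, N for the number of viewers who rated both, z_{α/2} for the critical value, w_{sj} for the correlation weight, t_u(j) for viewer u's thumbs on show j) are notation assumed here, and the weighted-average normalization is one common choice rather than a form confirmed by the slides.

```latex
% Pearson correlation between two shows over the N viewers who rated both
r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}

% Fisher r-to-z transform and its standard error
z = \tfrac{1}{2} \ln\frac{1 + r}{1 - r}, \qquad \sigma_z = \frac{1}{\sqrt{N - 3}}

% Lower bound of the confidence interval, mapped back to the r scale
z_L = z - z_{\alpha/2}\,\sigma_z, \qquad r_L = \tanh(z_L)

% Weighted average prediction over the k nearest rated shows (assumed form)
\hat{t}_u(s) = \frac{\sum_{j \in \mathrm{kNN}(s)} w_{sj}\, t_u(j)}{\sum_{j \in \mathrm{kNN}(s)} |w_{sj}|}
```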

4. Download to clients

28K pairs sent to client (320kb)

Correlations between old shows don’t change fast

New shows: want to do it faster

5. Client-side processing

Ratings must not cause video glitching!

2am: TiVo re-rates all shows

Collab: k-nearest neighbor

Content-based: Naïve Bayes

Previous Work

User-user or item-item: Sarwar et al.

Form of model: k-nearest neighbor, Bayes nets (Breese et al.), Factor Analysis (Canny)

Similarity/distance function: Pearson (subsumes cosine), TFIDF corrections (Salton et al.), user amplification

Combination functions: k-NN, Bayes nets, ...

Evaluation criteria: MAE, Spearman rank correlation

Contributions

Large fielded system:

Large number (3M) of users

Long-lived interaction with user: >90 thumbs/user

10^8 ratings over 300K shows

Very large in user-hours

Distributed architecture:

Server: throttle-able

Clients do actual suggestion calculations

Privacy-preservation:

Privacy and distributed goals aligned

No persistent memory of user on server