Incorporating Reliability in a TV Recommender Verus Pronk.

Post on 19-Dec-2015



2

Context

• Increasing availability of TV programs
• Availability of electronic program guides (EPGs)

How about a personal TV recommender?

Applications
• Highlights in EPG
• Auto-recording/deletion on HD recorders
• Creation of personalized channels

3

Summary

Introduction
Naive Bayesian classification
An example
Reliable classification
Results
Concluding remarks

4

Introduction

Thousands of programs offered each day

People tend to browse only a limited number of channels

EPGs provide easier access

Low percentage of interesting programs

More advanced solutions required

5

Introduction

Programs are described by metadata (EPG)

User rates a number of programs as positive or negative

User profile describes the relation between metadata and ratings

[Diagram: user, training set, user profile, TV program, TV program recommender]

6

Introduction

Example of metadata

An Officer and a Gentleman: (
    date : Tuesday, Nov. 23, 2004;
    time : 20:30 h.;
    station : SBS 6;
    genre : drama;
    cast : Richard Gere;
    credit : Taylor Hackford;
    ...
)

7

Naive Bayesian classification

Given: a training set $X$; for each $x \in X$
• $x_i \in V_i$: i-th feature value of $x$
• $c(x) \in C$: known class of $x$

Given: an instance $t$

Asked: $c(t)$

Approach: estimate $\Pr(c(t) = j)$, $j \in C$, based on the user profile calculated from $X$

8

Naive Bayesian classification

Problem issues

• Cold start
• Changing preferences
• Feature selection
• Accuracy
• Reliability
• ...

9

Naive Bayesian classification

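$$\Pr(c(t) = j) = \Pr(c(x) = j \mid x = t) = \frac{\Pr(c(x) = j)\,\Pr(x = t \mid c(x) = j)}{\Pr(x = t)} = \frac{\Pr(c(x) = j)\,\prod_i \Pr(x_i = t_i \mid c(x) = j)}{\Pr(x = t)}$$

• prior probabilities: $\Pr(c(x) = j)$
• conditional probabilities: $\Pr(x_i = t_i \mid c(x) = j)$
• posterior probabilities: $\Pr(c(x) = j \mid x = t)$

The last equality assumes that the features are conditionally independent given the class.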

10

Naive Bayesian classification

Conditional independence violation

• The BBC news is always broadcast on the BBC

• Clint Eastwood generally plays in action movies

NBC is nevertheless successfully applied in many application areas

11

Naive Bayesian classification

Priors set to $p_j$

Conditionals estimated using training set

Denominator irrelevant

$$\Pr(c(t) = j) = \frac{\Pr(c(x) = j)\,\prod_i \Pr(x_i = t_i \mid c(x) = j)}{\Pr(x = t)}$$

12

Naive Bayesian classification

User profile

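$$\Pr(c(t) = j) \sim p_j \prod_i \frac{N(i, t_i, j)}{N(j)}$$

$$N(j) = |\{x \in X \mid c(x) = j\}| \quad (\neq 0)$$

$$N(i, v, j) = |\{x \in X \mid c(x) = j \wedge x_i = v\}|$$

$$\hat{c}(t) = \arg\max_{j \in C}\; p_j \prod_i \frac{N(i, t_i, j)}{N(j)}$$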
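The user profile amounts to two tables of counts, and classification picks the class with the largest score. A minimal Python sketch of that idea follows; the function names, the toy training set and the equal priors are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

def build_profile(training_set):
    """Count N(j) and N(i, v, j) from (features, label) pairs."""
    class_counts = Counter()   # N(j)
    value_counts = Counter()   # N(i, v, j), keyed by (i, v, j)
    for features, label in training_set:
        class_counts[label] += 1
        for i, v in enumerate(features):
            value_counts[(i, v, label)] += 1
    return class_counts, value_counts

def score(t, j, priors, class_counts, value_counts):
    """p_j * prod_i N(i, t_i, j) / N(j) for instance t and class j."""
    s = priors[j]
    for i, v in enumerate(t):
        s *= value_counts.get((i, v, j), 0) / class_counts[j]
    return s

def classify(t, priors, class_counts, value_counts):
    """Return the class with the largest score (the argmax over j)."""
    return max(class_counts,
               key=lambda j: score(t, j, priors, class_counts, value_counts))

# Hypothetical toy data: features are (day, genre), labels are "+" / "-"
training = [
    (("Tuesday", "drama"), "+"),
    (("Tuesday", "drama"), "+"),
    (("Monday", "news"), "+"),
    (("Monday", "news"), "-"),
    (("Monday", "drama"), "-"),
]
priors = {"+": 0.5, "-": 0.5}   # assumed equal priors
N_j, N_ivj = build_profile(training)
predicted = classify(("Tuesday", "drama"), priors, N_j, N_ivj)
```

A feature value never seen with a class gives a zero count and hence a zero score for that class; smoothing is deliberately left out here because the slides do not mention it.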

13

Naive Bayesian classification

Classification error

$$E = \Pr(\hat{c}(x) \neq c(x))$$

$$E_j = \Pr(\hat{c}(x) \neq j \mid c(x) = j)$$

$E$ is a convex combination of the $E_j$s
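The convex-combination statement follows from the law of total probability. Written out, with $E$ the overall classification error and $E_j$ the error conditional on the true class being $j$ (the weights are the true class probabilities, which are nonnegative and sum to one):

```latex
E = \Pr(\hat{c}(x) \neq c(x))
  = \sum_{j \in C} \Pr(c(x) = j)\,\Pr(\hat{c}(x) \neq j \mid c(x) = j)
  = \sum_{j \in C} \Pr(c(x) = j)\, E_j ,
\qquad \sum_{j \in C} \Pr(c(x) = j) = 1 .
```

In particular, $E$ always lies between the smallest and the largest $E_j$.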

14

Naive Bayesian classification

On the prior probabilities

15

An example

feature   value               class 1   class 2
day       Monday                   31         7
          Tuesday                  12        43
          ...                    (57)      (50)
time      20:30                    21         7
          20:35                    22        10
          ...                    (57)      (83)
genre     romance                   8        12
          drama                    17         4
          ...                    (75)      (84)
cast      Richard Gere             23         1
          Sandra Bullock            3         6
          ...                    (74)      (93)
credit    Steven Spielberg         11         2
          Taylor Hackford          18         4
          ...                    (71)      (94)

16

Training set: 100 TV programs in each of the two classes

Program: Tue. 20:30 Drama R. Gere T. Hackford

First class:

$$0.2 \cdot \frac{12}{100} \cdot \frac{21}{100} \cdot \frac{17}{100} \cdot \frac{23}{100} \cdot \frac{18}{100} = 3.55 \cdot 10^{-5}$$

Second class:

$$0.8 \cdot \frac{43}{100} \cdot \frac{7}{100} \cdot \frac{4}{100} \cdot \frac{1}{100} \cdot \frac{4}{100} = 3.85 \cdot 10^{-7}$$
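The arithmetic of this example can be replayed in a few lines of Python. The sketch assumes the counts for Tuesday, 20:30, drama, Richard Gere and Taylor Hackford are read from the two count columns of the table, that each class holds 100 programs, and that the priors are 0.2 and 0.8; the variable names are illustrative.

```python
from math import prod

# Counts for the program's feature values, per 100 rated programs in each class
counts_class1 = {"day": 12, "time": 21, "genre": 17, "cast": 23, "credit": 18}
counts_class2 = {"day": 43, "time": 7, "genre": 4, "cast": 1, "credit": 4}
n_per_class = 100
prior_class1, prior_class2 = 0.2, 0.8   # assumed reading of the slide's priors

# Score = prior times the product of relative frequencies
score1 = prior_class1 * prod(c / n_per_class for c in counts_class1.values())
score2 = prior_class2 * prod(c / n_per_class for c in counts_class2.values())

print(f"{score1:.2e}  {score2:.2e}")   # 3.55e-05  3.85e-07
```

The first score is roughly ninety times the second, so the program is assigned to the first class even though that class has the smaller prior.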

17

Reliable classification

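$X$ random ⇒ $N(i, v, j)$ and $N(j)$ random and dependent

$X$ uniform ⇒ both binomially distributed

$$N_X(j) = |\{x \in X \mid c(x) = j\}| \quad (\neq 0)$$

$$N_X(i, v, j) = |\{x \in X \mid c(x) = j \wedge x_i = v\}|$$

$$\hat{c}(t) = \arg\max_{j \in C}\; p_j \prod_i \frac{N(i, t_i, j)}{N(j)}$$

Statistical analysis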

18

Reliable classification

Theorem 1

Let $Z \sim \mathrm{Bin}(N, p)$, $0 < p < 1$, $Y_n \sim \mathrm{Bin}(n, q)$

$Z_0$: $\Pr(Z_0 = n) = \Pr(Z = n \mid Z \neq 0)$

$$R = \frac{Y_{Z_0}}{Z_0}$$

Then ...

19

Reliable classification

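$$E R = q,$$

$$\sigma^2(R) = q(1-q)\,\frac{(1-p)^N \left[ H_N\!\left(\frac{1}{1-p}\right) - H_N(1) \right]}{1 - (1-p)^N},$$

where

$$H_N(x) = \sum_{n=1}^{N} \frac{x^n}{n}.$$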
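Theorem 1 can be sanity-checked by exact enumeration. The sketch below assumes $R = Y_{Z_0} / Z_0$, where $Z_0$ is distributed as $Z \sim \mathrm{Bin}(N, p)$ conditioned on $Z \neq 0$ and $Y_n \sim \mathrm{Bin}(n, q)$. The closed form for the variance in terms of $H_N(x) = \sum_{n=1}^{N} x^n / n$ is one reading of the slide's expression and should be treated as an assumption; the function names and parameter values are illustrative.

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """Pr(Bin(n, p) = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def z0_pmf(N, p):
    """Distribution of Z ~ Bin(N, p) conditioned on Z != 0."""
    norm = 1 - (1 - p)**N
    return {n: binom_pmf(n, N, p) / norm for n in range(1, N + 1)}

def moments_of_R(N, p, q):
    """Exact mean and variance of R = Y_n / n with n ~ Z0 and Y_n ~ Bin(n, q)."""
    mean = second = 0.0
    for n, w in z0_pmf(N, p).items():
        for y in range(n + 1):
            pr = w * binom_pmf(y, n, q)
            mean += pr * y / n
            second += pr * (y / n) ** 2
    return mean, second - mean**2

def H(N, x):
    """H_N(x) = sum_{n=1..N} x^n / n."""
    return sum(x**n / n for n in range(1, N + 1))

def var_closed_form(N, p, q):
    """Assumed closed form: q(1-q)(1-p)^N [H_N(1/(1-p)) - H_N(1)] / (1 - (1-p)^N)."""
    a = (1 - p)**N
    return q * (1 - q) * a * (H(N, 1 / (1 - p)) - H(N, 1)) / (1 - a)

N, p, q = 20, 0.3, 0.25            # illustrative parameters
mean_R, var_R = moments_of_R(N, p, q)
closed = var_closed_form(N, p, q)
agree = isclose(mean_R, q, rel_tol=1e-9) and isclose(var_R, closed, rel_tol=1e-9)
```

The variance equals $q(1-q)\,E[1/Z_0]$, because the conditional mean of $R$ given $Z_0$ is always $q$; the closed form is just $E[1/Z_0]$ written with $H_N$.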

20

Reliable classification

21

Reliable classification

22

Reliable classification

Theorem 2

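Let $R_i$, $i = 1, 2, \ldots, f$, independent

$r$ constant

Then

$$E\Bigl(r \prod_i R_i\Bigr) = r \prod_i E R_i$$

$$\sigma^2\Bigl(r \prod_i R_i\Bigr) = r^2 \Bigl( \prod_i \bigl(\sigma^2(R_i) + E^2 R_i\bigr) - \prod_i E^2 R_i \Bigr)$$

(The $R_i$ are not actually independent)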
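The product rule for independent variables in Theorem 2 can be checked numerically. The sketch assumes the theorem states $E(r \prod_i R_i) = r \prod_i E R_i$ and $\sigma^2(r \prod_i R_i) = r^2 (\prod_i E R_i^2 - \prod_i (E R_i)^2)$, with $E R_i^2 = \sigma^2(R_i) + (E R_i)^2$; the three small distributions and the constant are made up for illustration.

```python
from itertools import product
from math import prod, isclose

# Hypothetical independent discrete variables, each a list of (value, probability)
dists = [
    [(0.0, 0.2), (0.5, 0.5), (1.0, 0.3)],
    [(0.25, 0.6), (0.75, 0.4)],
    [(0.1, 0.5), (0.9, 0.5)],
]
r = 0.2  # the constant factor

def mean(d):
    return sum(v * p for v, p in d)

def second_moment(d):
    return sum(v * v * p for v, p in d)

# Exact moments of r * prod(R_i), enumerating every joint outcome
m1 = m2 = 0.0
for combo in product(*dists):
    pr = prod(p for _, p in combo)
    val = r * prod(v for v, _ in combo)
    m1 += pr * val
    m2 += pr * val**2
var_enum = m2 - m1**2

# The same quantities from the product formulas
mean_formula = r * prod(mean(d) for d in dists)
var_formula = r**2 * (prod(second_moment(d) for d in dists)
                      - prod(mean(d)**2 for d in dists))
```

Both routes give the same mean and variance here because the enumeration multiplies the marginal probabilities, which is exactly the independence assumption.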

23

Reliable classification

Back to the original problem

$$N = |X|, \qquad p = \frac{N(j)}{|X|}, \qquad q = \frac{N(i, v, j)}{N(j)}$$

24

Reliable classification

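Standard deviation of $P(t, j)$ can be estimated by $\sigma(t, j)$, where

$$P(t, j) = p_j \prod_i \frac{N(i, t_i, j)}{N(j)}$$

$$\sigma(t, j) = p_j \sqrt{\prod_i \left[ \frac{N(i, t_i, j)}{N(j)} \left( 1 - \frac{N(i, t_i, j)}{N(j)} \right) A_j + \left( \frac{N(i, t_i, j)}{N(j)} \right)^2 \right] - \prod_i \left( \frac{N(i, t_i, j)}{N(j)} \right)^2}$$

and $A_j$ abbreviates

$$A_j = \frac{\left( 1 - \frac{N(j)}{|X|} \right)^{|X|}}{1 - \left( 1 - \frac{N(j)}{|X|} \right)^{|X|}} \sum_{n=1}^{|X|} \frac{1}{n} \left[ \left( 1 - \frac{N(j)}{|X|} \right)^{-n} - 1 \right]$$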
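A rough Python sketch of such an estimate, assuming it is obtained by plugging $p = N(j)/|X|$ and $q_i = N(i, t_i, j)/N(j)$ into Theorem 1 and combining the per-feature variances with the product rule of Theorem 2. This combination is an assumption about how the slide's formula is built, and the helper names are hypothetical. The factor $E[1/Z_0]$ is computed directly from the conditioned binomial distribution.

```python
from math import comb, prod, sqrt

def inv_z0_mean(N, p):
    """E[1/Z0] for Z ~ Bin(N, p) conditioned on Z != 0."""
    norm = 1 - (1 - p) ** N
    total = sum(comb(N, n) * p**n * (1 - p) ** (N - n) / n
                for n in range(1, N + 1))
    return total / norm

def sigma(prior, feature_counts, class_count, total):
    """Estimated standard deviation of P(t, j) = prior * prod_i q_i.

    feature_counts: the counts N(i, t_i, j), one per feature
    class_count:    N(j)
    total:          |X|
    """
    q = [c / class_count for c in feature_counts]
    a = inv_z0_mean(total, class_count / total)
    # Per-feature second moment: variance q(1-q)E[1/Z0] plus squared mean q^2
    second = prod(qi * (1 - qi) * a + qi**2 for qi in q)
    mean_sq = prod(qi**2 for qi in q)
    return prior * sqrt(second - mean_sq)

# Illustrative numbers: five feature counts out of 100 programs in the class,
# 200 programs in total, prior 0.2
counts = [12, 21, 17, 23, 18]
p_estimate = 0.2 * prod(c / 100 for c in counts)
s = sigma(0.2, counts, 100, 200)
```

With these numbers the estimated standard deviation comes out at roughly half of the point estimate, which illustrates why a ranking on point estimates alone can be fragile for small training sets.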

25

Reliable classification

Confidence intervals for $P(t, j)$:

$$P(t, j) \pm \kappa\,\sigma(t, j)$$

Two approaches

A: Fix $\kappa$ and don't classify if intervals overlap: coverage

B: Choose $\kappa$ such that intervals just do not overlap: explicit notion of confidence
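The two approaches can be sketched for the two-class case as follows. The sketch assumes intervals of the form estimate ± κ·σ, where the symbol κ for the interval half-width multiplier is an assumption; the class names and numbers are made up for illustration.

```python
from math import isclose

def top_two(estimates):
    """Return the two (label, (estimate, sigma)) pairs with the largest estimates."""
    ranked = sorted(estimates.items(), key=lambda kv: kv[1][0], reverse=True)
    return ranked[0], ranked[1]

def classify_fixed_kappa(estimates, kappa):
    """Approach A: classify only if the two intervals do not overlap.

    Returns the winning label, or None when the instance is left unclassified.
    """
    (best, (p1, s1)), (_, (p2, s2)) = top_two(estimates)
    lower_best = p1 - kappa * s1
    upper_other = p2 + kappa * s2
    return best if lower_best > upper_other else None

def confidence(estimates):
    """Approach B: the kappa at which the two intervals just touch.

    Assumes at least one of the two standard deviations is positive.
    """
    (_, (p1, s1)), (_, (p2, s2)) = top_two(estimates)
    return (p1 - p2) / (s1 + s2)

# Hypothetical (estimate, standard deviation) pairs for one instance
estimates = {"like": (0.6, 0.1), "dislike": (0.2, 0.1)}

decided = classify_fixed_kappa(estimates, kappa=1.0)    # intervals are disjoint
undecided = classify_fixed_kappa(estimates, kappa=3.0)  # intervals overlap
kappa_star = confidence(estimates)
```

Under approach A the fraction of instances that still receive a label is the coverage; under approach B every instance is classified and `kappa_star` serves as its confidence.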

26

Results

Simulation: TV recommender

Training sets: Briarcliff data

Prior probabilities: set such that the class-conditional errors $E_j$ approximately equal $E$

Confidence levels: $\kappa = 0, 0.1, 0.2, \ldots, 1$

Training set sizes: 100, 400

Approach A: offset classification error against coverage

27

Results

28

Results

29

Concluding remarks

• Reliability adds another dimension to classification
• Our approach is explicit and robust
• Separates difficult from easy instances
• Also applicable to other domains
  – medical diagnosis
  – biometrics (e.g. face recognition)

Acknowledgements: Srinivas Gutta, Wim Verhaegh, Dee Denteneer