Rachid Guerraoui, EPFL



What is a good recommendation system?

Recommendation systems are good

A good recommendation system is one that provides good recommendations

What is a good recommendation?

You know it when you see it

“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”

Justice Potter Stewart, US Supreme Court, 1964

Ideally: Build and deploy your system

Pragmatic: Transform past into future

What is a good recommendation system?

Example

• The members of a program committee (20) want to evaluate the submitted papers (200)

• Nobody has enough time to read all papers

• Each researcher is assigned a subset of papers

• A recommendation system uses the scores of the assigned papers to estimate the opinion of every member on every paper

What is a good recommendation?

It depends on the correlation among the users' preferences

Theory to the rescue

• n users
• k * n objects
• For each user and object: a grade
  – The grades of a user form his preference vector
  – The vectors of users form the preference matrix
  – Grades may be binary, discrete, or continuous
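A minimal Python sketch of this model, assuming binary grades. The helper names `random_preference_matrix` and `partial_vector` are hypothetical, and `None` marks a grade the system has not observed (the partially known vector).

```python
import random
from typing import List, Optional, Set

def random_preference_matrix(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """n users x (k * n) objects; row p is the preference vector v(p)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k * n)] for _ in range(n)]

def partial_vector(row: List[int], probed: Set[int]) -> List[Optional[int]]:
    """The part of v(p) the system knows: grades of probed objects, None elsewhere."""
    return [g if o in probed else None for o, g in enumerate(row)]

matrix = random_preference_matrix(n=4, k=2)
print(len(matrix), len(matrix[0]))           # 4 8
print(partial_vector([1, 0, 1, 1], {0, 3}))  # [1, None, None, 1]
```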

General recommendation model

Input? Vectors of grades: v(p) (known partially to the players)

Output? Vectors of grades: w(p) (seeking to approximate v(p))

Ideal output: w(p) = v(p)

Target output: minimize the maximum over players p of |w(p) − v(p)| (Hamming distance)
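The target can be made concrete with a small sketch, assuming binary grade vectors stored per player in a dict; `hamming` and `max_error` are illustrative names, not from the slides.

```python
from typing import Dict, List

def hamming(u: List[int], v: List[int]) -> int:
    """Number of objects on which two grade vectors disagree."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

def max_error(W: Dict[str, List[int]], V: Dict[str, List[int]]) -> int:
    """Maximum over players p of |w(p) - v(p)|, the quantity to minimize."""
    return max(hamming(W[p], V[p]) for p in V)

V = {"alice": [1, 0, 1, 1], "bob": [0, 0, 1, 0]}  # true vectors v(p)
W = {"alice": [1, 0, 0, 1], "bob": [1, 1, 1, 0]}  # estimates w(p)
print(max_error(W, V))  # 2 (bob's estimate is wrong on 2 objects)
```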

Compare with a perfect on-line algorithm

How to account for the level of correlation?

The perfect on-line algorithm

(1) All players know all partial vectors (shared billboard)

(2) Chooses which elements of the partial vectors to fill (budget B)

The player is initially indulgent (learning phase)

The algorithm assigns the initial papers

(3) Knows the level of correlation

Hamming diameter of a set P

$D(P) = \max_{p,q \in P} |v(p) - v(q)|$
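A direct transcription of D(P) as a sketch, assuming binary vectors; a single-player set gets diameter 0.

```python
from itertools import combinations
from typing import List

def hamming(u: List[int], v: List[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

def diameter(P: List[List[int]]) -> int:
    """D(P): largest Hamming distance between two preference vectors in P."""
    return max((hamming(u, v) for u, v in combinations(P, 2)), default=0)

P = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 0, 1]]
print(diameter(P))  # 2
```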

The perfect on-line algorithm

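20 PC members; 200 papers

Every member can read 10 papers

All members have the same taste

Perfect solution possible? Yes: 20 × 10 = 200, so together the members can cover every paper.

Two clusters of 10 members; members within a cluster have the same taste

Perfect solution possible?

Every member needs to read 20, so that each cluster of 10 covers all 200 papers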
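The arithmetic behind these answers, as a sketch that assumes each cluster of identical taste must cover every paper on its own; `reads_per_member` is a hypothetical helper.

```python
def reads_per_member(num_papers: int, cluster_size: int) -> int:
    """Papers each member must read so that a cluster of members with identical
    taste covers every paper (ceiling division)."""
    return -(-num_papers // cluster_size)

print(reads_per_member(200, 20))  # 10: all 20 members share one taste
print(reads_per_member(200, 10))  # 20: two clusters of 10
print(reads_per_member(200, 5))   # 40: four clusters of 5
```

With four clusters of 5 and only 20 reads each, a cluster covers just 100 papers, so members have to borrow opinions from outside their own cluster. That is the situation behind the error-rate question that follows.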

Assume player p can probe B objects

How many other players does p need to collaborate with to fill its vector?

$\frac{kn}{B} - 1$
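A sketch of this count, applied to the PC example (n = 20 members, k·n = 200 papers, so k = 10). Rounding up when B does not divide k·n is an assumption added here.

```python
def collaborators_needed(n: int, k: int, B: int) -> int:
    """Other players p must rely on so that, with at most B probes each,
    the group covers all k * n objects: ceil(k * n / B) - 1."""
    return -(-(k * n) // B) - 1

print(collaborators_needed(n=20, k=10, B=10))  # 19
print(collaborators_needed(n=20, k=10, B=20))  # 9
```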

4 clusters of 5 members, each cluster with diameter 8

Every member reads 20

What is the minimal error rate?

Ideal algorithm (k = 1)

• A player p has to rely on the grades of (n/B) − 1 other players to estimate her/his preferences

In the worst case, p cannot do better

• The error rate for p depends on the Hamming distance between p and the other (n/B) − 1 players

• This is within a constant factor of the diameter of these n/B players

For every B-algorithm, there is some distribution of preferences such that (with constant probability)

$|w(p) - v(p)| \ge \min_{P \ni p,\ |P| \ge n/B} \frac{D(P)}{4}$
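The right-hand side of this bound can be computed by brute force on toy inputs. This sketch only evaluates min D(P)/4 for a given matrix; it does not prove anything about algorithms or distributions, and it is exponential in n. Because adding players never shrinks a diameter, it is enough to look at sets of exactly ⌈n/B⌉ players. `lower_bound` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import List

def hamming(u: List[int], v: List[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

def diameter(P: List[List[int]]) -> int:
    return max((hamming(u, v) for u, v in combinations(P, 2)), default=0)

def lower_bound(V: List[List[int]], p: int, B: int) -> float:
    """min of D(P)/4 over sets P that contain p and have at least n/B players."""
    n = len(V)
    size = math.ceil(n / B)
    others = [q for q in range(n) if q != p]
    # Supersets have diameter >= D(P), so sets of exactly `size` suffice.
    return min(diameter([V[p]] + [V[q] for q in group]) / 4
               for group in combinations(others, size - 1))

V = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0]]
print(lower_bound(V, p=0, B=2))  # 0.25: player 0 pairs best with player 1
print(lower_bound(V, p=2, B=2))  # 0.5: player 2 pairs best with player 3
```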

Proof (sketch)

Consider a constant D > 2B. Define the preference vectors as follows:

- Let P be a set of n/B players
- Let p ∈ P have a random preference vector
- Assign a random preference vector to every player outside P

Choose a set S of D objects. For every player q in P, v(q) = v(p) except on S, where v(q) is random.
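A sketch that builds such an instance, assuming k = 1 (n objects), B dividing n, and binary grades. `hard_instance` is a hypothetical helper; player 0 plays the role of p, and players 0 to n/B − 1 form P. Because the members of P differ only on S, their pairwise distances never exceed D.

```python
import random
from itertools import combinations
from typing import List, Set, Tuple

def hamming(u: List[int], v: List[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

def hard_instance(n: int, B: int, D: int, seed: int = 0) -> Tuple[List[List[int]], Set[int]]:
    """Preferences following the proof sketch, with n players and n objects."""
    if not (2 * B < D <= n):
        raise ValueError("need 2B < D <= n")
    rng = random.Random(seed)
    v_p = [rng.randint(0, 1) for _ in range(n)]   # p: random vector
    S = set(rng.sample(range(n), D))              # the D objects of S
    V = [v_p]
    for _ in range(1, n // B):                    # rest of P: copy p outside S
        V.append([rng.randint(0, 1) if o in S else v_p[o] for o in range(n)])
    for _ in range(n // B, n):                    # players outside P: fully random
        V.append([rng.randint(0, 1) for _ in range(n)])
    return V, S

V, S = hard_instance(n=20, B=2, D=5)
P = V[: 20 // 2]
print(max(hamming(u, v) for u, v in combinations(P, 2)) <= len(S))  # True
```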

Proof (sketch)

Probes outside P provide no information to p

Probes inside P provide no information to p with respect to S

Since p probes at most B objects and S contains D > 2B objects, there are at least D/2 objects for which p has no information

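No algorithm can do better than guess p's preferences on those objects

Each guess is wrong with probability 1/2, so the expected error is at least (D/2)/2 = D/4, while the diameter of P is at most D (players in P differ only on S)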
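A small Monte Carlo check of the counting step, under the assumption that p learns at most B of its grades in S and must guess the remaining ones, which are uniform random bits. Against random bits every guessing strategy errs with probability 1/2 per object, so coin flips serve as the guess. The sizes D = 40 and B = 10 are hypothetical.

```python
import random

def average_error_on_S(D: int, known: int, trials: int = 5000, seed: int = 1) -> float:
    """Mean Hamming error on S when p knows `known` of its D grades in S
    and guesses the rest with coin flips."""
    rng = random.Random(seed)
    unknown = D - known
    total = 0
    for _ in range(trials):
        truth = [rng.getrandbits(1) for _ in range(unknown)]
        guess = [rng.getrandbits(1) for _ in range(unknown)]
        total += sum(t != g for t, g in zip(truth, guess))
    return total / trials

D, B = 40, 10                          # hypothetical sizes with D > 2B
avg = average_error_on_S(D, known=B)   # best case: all B probes land in S
print(round(avg, 1), ">=", D / 4)      # about (D - B) / 2 = 15, above D / 4 = 10
```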

Optimality

An algorithm is (B,c)-optimal if for every input set of preferences

$|w(p) - v(p)| \le \min_{P \ni p,\ |P| \ge n/B} c\,D(P)$ for every player p
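Since the definition quantifies over every input, a single instance can refute (B,c)-optimality but never establish it. This brute-force sketch checks the inequality on one given input and output; `best_diameter` and `meets_bound` are hypothetical names.

```python
import math
from itertools import combinations
from typing import List

def hamming(u: List[int], v: List[int]) -> int:
    return sum(a != b for a, b in zip(u, v))

def diameter(P: List[List[int]]) -> int:
    return max((hamming(u, v) for u, v in combinations(P, 2)), default=0)

def best_diameter(V: List[List[int]], p: int, B: int) -> int:
    """min D(P) over sets P that contain p and have at least n/B players."""
    size = math.ceil(len(V) / B)
    others = [q for q in range(len(V)) if q != p]
    return min(diameter([V[p]] + [V[q] for q in g])
               for g in combinations(others, size - 1))

def meets_bound(W: List[List[int]], V: List[List[int]], B: int, c: float) -> bool:
    """True if |w(p) - v(p)| <= c * min D(P) for every player p on this input."""
    return all(hamming(W[p], V[p]) <= c * best_diameter(V, p, B)
               for p in range(len(V)))

V = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0]]  # true vectors
W = [[0, 0, 0, 1], [0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 0, 1]]  # an algorithm's output
print(meets_bound(W, V, B=2, c=1))    # True
print(meets_bound(W, V, B=2, c=0.5))  # False
```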

So what? The best we can do is find clusters of players that are

- small enough (small diameter) to provide “accurate” preferences, and
- big enough to cover all objects

• Practically speaking? Try different sizes of clusters
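One way to try different cluster sizes, sketched with a swapped-in technique: a nearest-neighbour majority vote. This is not the algorithm from these slides. It ranks the other players by disagreement on commonly probed objects, forms a cluster of p and its s − 1 closest players, and fills each unprobed grade by majority vote. Players with no common probe count as distance 0, a simplification of this sketch. `fill_vector` and `distance` are hypothetical names.

```python
from typing import List, Optional

Grade = Optional[int]  # None: the player has not probed this object

def distance(u: List[Grade], v: List[Grade]) -> int:
    """Disagreements on objects probed by both players."""
    return sum(a != b for a, b in zip(u, v) if a is not None and b is not None)

def fill_vector(partials: List[List[Grade]], p: int, s: int) -> List[Grade]:
    """Estimate w(p) from p and its s - 1 closest players by majority vote
    per object (ties go to 0); None if nobody in the cluster probed it."""
    others = sorted((q for q in range(len(partials)) if q != p),
                    key=lambda q: distance(partials[p], partials[q]))
    cluster = [p] + others[: s - 1]
    w: List[Grade] = []
    for o, own in enumerate(partials[p]):
        if own is not None:                 # keep p's own grade
            w.append(own)
            continue
        votes = [partials[q][o] for q in cluster if partials[q][o] is not None]
        w.append((1 if 2 * sum(votes) > len(votes) else 0) if votes else None)
    return w

partials = [
    [1, 0, None, None],
    [1, 0, 1, None],
    [0, 1, 0, 0],
    [None, None, 1, 1],
]
for s in (2, 3, 4):                         # try different cluster sizes
    print(s, fill_vector(partials, p=0, s=s))
```

Small clusters leave gaps (s = 2 cannot fill the last object), while the largest cluster pulls in the dissimilar player 2 and flips that grade to 0. This mirrors the small-enough versus big-enough trade-off above.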

Optimality

• Assume each player can evaluate B objects.

• Given B and the level of correlation among players, there is a minimum error rate that can be achieved.

• There is an algorithm that obtains a constant approximation of this error rate, with each player evaluating O(B · polylog(n)) objects.

Definition of Optimality

• An algorithm is asymptotically optimal in terms of error rate if, for every player p,

$|w(p) - v(p)| \le \min_{P \ni p,\ |P| \ge n/B} c\,D(P)$

• where c is a constant and D(P) is the diameter of the set P; P can be any set of players that contains p and has size at least n/B.
