CS 886 Advanced Topics in Artificial Intelligence: Multiagent Systems
Rank Aggregation Methods for the Web
Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar
Presented by: Wanying Luo
Outline
What is the rank aggregation problem?
Motivation
Challenges
Preliminaries
First result: spam resistance in meta-search
Second result: Markov chain methods
Applications
Experiments
Conclusion
What is the rank aggregation problem?
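Three rankings of the items A, B, C, D (highest-ranked first):
Ranking 1: A, B, D, C
Ranking 2: B, D, A, C
Ranking 3: B, A, C, D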
Different ranking techniques and criteria may produce different rankings of the same items
We need to obtain a "consensus" ranking that reflects all of the individual rankings
Consensus ranking: B, A, D, C
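To make "consensus" concrete for this example, one can count, for every pair of items, how many of the three rankings place one above the other. The minimal Python sketch below (the function names are our own, not from the paper) checks that the consensus B, A, D, C agrees with a strict majority of the rankings on every pair.

```python
from itertools import combinations

# The three example rankings, highest-ranked item first.
rankings = [
    ["A", "B", "D", "C"],
    ["B", "D", "A", "C"],
    ["B", "A", "C", "D"],
]
consensus = ["B", "A", "D", "C"]


def prefers(ranking, x, y):
    """True if `ranking` places x above y."""
    return ranking.index(x) < ranking.index(y)


def majority_agrees(candidate, rankings):
    """True if, for every pair with x above y in `candidate`,
    a strict majority of `rankings` also place x above y."""
    for x, y in combinations(candidate, 2):  # x precedes y in candidate
        votes = sum(prefers(r, x, y) for r in rankings)
        if 2 * votes <= len(rankings):
            return False
    return True


print(majority_agrees(consensus, rankings))             # True
print(majority_agrees(["A", "B", "D", "C"], rankings))  # False: only 1 of 3 puts A above B
```

Here every pairwise majority happens to fit a single ordering. In general, pairwise majorities can be cyclic (A beats B, B beats C, C beats A), which is one reason the problem is phrased as minimizing disagreement rather than simply following majorities.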
Motivation
Provide robust meta-search
Examples of meta-search engines:
Clusty, Dogpile, Metacrawler
Spam (http://searchenginewatch.com/showPage.html?page=3483601)
Commercial interests, e.g., sponsored links
Motivation
Users may provide a variety of search criteria
Challenges
Unrealistic to rank the entire collection of pages on the web
29.7 billion pages on the World Wide Web as of February 2007 (http://www.boutell.com/)
Most search engines rank only the top few hundred entries, so the lists to be aggregated are partial lists
Preliminaries
Ordered list: given a universe U, an ordered list τ with respect to U is an ordering (a.k.a. ranking) of a subset S ⊆ U, i.e.,
$\tau = [x_1 \ge x_2 \ge \dots \ge x_d]$, where d = |S|
Full list: τ contains all the elements of U
Partial list: τ contains only some of them, |τ| < |U|
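In code, it is convenient to store an ordered list τ as a map from each element to its position, so that τ(x) is the rank of x (1 = top). The sketch below is one illustrative Python representation (the helper names are hypothetical, not from the paper) that also tests the full/partial distinction.

```python
def position_map(tau, universe):
    """Return {element: position} for an ordered list tau (position 1 = top)."""
    if len(set(tau)) != len(tau):
        raise ValueError("an ordered list cannot repeat an element")
    if not set(tau) <= set(universe):
        raise ValueError("an ordered list may only rank elements of U")
    return {x: k for k, x in enumerate(tau, start=1)}


def is_full(tau, universe):
    """A full list ranks every element of U."""
    return set(tau) == set(universe)


def is_partial(tau, universe):
    """A partial list ranks a proper subset of U, i.e. |tau| < |U|."""
    return set(tau) < set(universe)


U = {"A", "B", "C", "D", "E"}
top3 = ["B", "A", "C"]  # e.g. an engine that returns only its top 3 results
print(position_map(top3, U))                  # {'B': 1, 'A': 2, 'C': 3}
print(is_full(top3, U), is_partial(top3, U))  # False True
```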
Preliminaries
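Rank aggregation approach. Goal: minimize the total disagreement between the aggregated ranking and the individual rankings
Spearman footrule distance: given two full lists σ and τ, $F(\sigma,\tau) = \sum_{i \in U} |\sigma(i) - \tau(i)|$, where σ(i) denotes the position of element i in σ
Kendall tau distance: given two full lists σ and τ, $K(\sigma,\tau) = |\{(i,j) : \sigma(i) < \sigma(j) \text{ but } \tau(i) > \tau(j)\}|$, i.e., the number of pairs of elements that the two lists order differently
These two measures can be generalized to several lists by summing the distances from the aggregated ranking π to each input list, e.g. $K(\pi; \tau_1, \dots, \tau_k) = \sum_{j=1}^{k} K(\pi, \tau_j)$; an aggregation that minimizes this total Kendall distance is called Kemeny optimal
Both can also be generalized to partial lists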
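A minimal Python sketch of the two distances for full lists, plus a brute-force search for a Kemeny optimal aggregation, applied to the three rankings from the opening example. The function names are our own; the brute force tries all n! orderings, so it is only an illustration for tiny inputs, and the partial-list generalizations are not covered.

```python
from itertools import combinations, permutations


def positions(ranking):
    """Map each element to its 1-based position, i.e. sigma(i) in the slides."""
    return {item: pos for pos, item in enumerate(ranking, start=1)}


def footrule(sigma, tau):
    """Spearman footrule distance F between two full lists over the same items."""
    ps, pt = positions(sigma), positions(tau)
    return sum(abs(ps[i] - pt[i]) for i in ps)


def kendall_tau(sigma, tau):
    """Kendall tau distance K: number of pairs the two lists order differently."""
    ps, pt = positions(sigma), positions(tau)
    return sum(
        1
        for i, j in combinations(ps, 2)
        if (ps[i] - ps[j]) * (pt[i] - pt[j]) < 0  # opposite relative order
    )


def total_distance(dist, candidate, lists):
    """Total disagreement of a candidate aggregation with several full lists."""
    return sum(dist(candidate, tau) for tau in lists)


def kemeny_brute_force(lists):
    """Return an ordering minimizing total Kendall tau distance (tries all n!)."""
    best = min(permutations(lists[0]),
               key=lambda p: total_distance(kendall_tau, list(p), lists))
    return list(best)


rankings = [["A", "B", "D", "C"], ["B", "D", "A", "C"], ["B", "A", "C", "D"]]
print(footrule(rankings[0], rankings[1]))     # 4
print(kendall_tau(rankings[0], rankings[1]))  # 2
print(kemeny_brute_force(rankings))           # ['B', 'A', 'D', 'C']
```

The Kemeny optimal result matches the consensus ranking B, A, D, C. The two measures are always within a factor of two of each other (K ≤ F ≤ 2K, the Diaconis-Graham inequality), as the values 2 and 4 illustrate. Computing a Kemeny optimal aggregation exactly is NP-hard in general, so brute force does not scale beyond toy inputs.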
First result: spam resistance in meta-search
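Extended Condorcet Criterion (ECC): if there is a partition (C, C′) of S such that for every x ∈ C and y ∈ C′ the majority prefers x to y, then every x ∈ C must be ranked above every y ∈ C′
ECC can be used to fight spam in meta-search: if the pages split into good pages and spam pages, and for every good page x and spam page y the majority prefers x to y, ECC ranks all the good pages above all the spam pages
How can ECC be achieved efficiently?
Local Kemenization method: refine an initial aggregation into a locally Kemeny optimal one, i.e., one in which no swap of two adjacent elements reduces the total Kendall distance; such a ranking satisfies ECC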
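For full lists, swapping two adjacent elements u (above) and v (below) changes the total Kendall distance by (number of lists preferring u to v) minus (number preferring v to u). So, as we read the paper's definition, a ranking is locally Kemeny optimal exactly when no adjacent pair is ordered against a strict majority. The sketch below (hypothetical helper names) tests this on the three rankings from the opening example.

```python
def majority_prefers(lists, x, y):
    """True if a strict majority of the lists that rank both x and y put x above y."""
    both = [t for t in lists if x in t and y in t]
    votes = sum(t.index(x) < t.index(y) for t in both)
    return 2 * votes > len(both)


def is_locally_kemeny_optimal(pi, lists):
    """True if no adjacent pair (upper, lower) in pi is one where a strict
    majority prefers lower to upper, i.e. (for full lists) no single
    adjacent swap lowers the total Kendall tau distance."""
    return not any(
        majority_prefers(lists, lower, upper)
        for upper, lower in zip(pi, pi[1:])
    )


rankings = [["A", "B", "D", "C"], ["B", "D", "A", "C"], ["B", "A", "C", "D"]]
print(is_locally_kemeny_optimal(["B", "A", "D", "C"], rankings))  # True
print(is_locally_kemeny_optimal(["A", "B", "D", "C"], rankings))  # False (B>A in 2 of 3)
```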
First result: spam resistance in meta-search
An example to illustrate Local Kemenization …
Input rankings (highest-ranked first):
τ1: B, A, C, E, D
τ2: A, B, D, E, C
τ3: B, C, A, D, E
Initial aggregated ranking to be refined: A, B, D, C, E
Each element of the initial ranking is inserted in turn at the bottom of the current list and moved up past the element above it as long as a majority of τ1, τ2, τ3 prefers it:
Start with A
Insert B: B>A in τ1, A>B in τ2, B>A in τ3; the majority prefers B, so B moves above A → B, A
Insert D: A>D in τ1, τ2 and τ3, so D stays below A → B, A, D
Insert C: C is preferred to D in two lists (τ1, τ3) and D is preferred to C in one list (τ2), so C moves above D → B, A, C, D
Compare C with A: A>C in τ1 and τ2, C>A in τ3; the majority prefers A, so C stays below A → B, A, C, D
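The insertion procedure traced in this example can be sketched in a few lines of Python. This is our own illustrative reading of the steps (function names hypothetical): each element of the initial ranking is appended at the bottom and moved up while a strict majority of the input lists prefers it to the element just above.

```python
def majority_prefers(lists, x, y):
    """True if a strict majority of the lists ranking both x and y put x above y."""
    both = [t for t in lists if x in t and y in t]
    votes = sum(t.index(x) < t.index(y) for t in both)
    return 2 * votes > len(both)


def local_kemenize(mu, lists):
    """Refine an initial ranking mu, insertion-sort style: append each element
    at the bottom, then move it upward while a strict majority of the input
    lists prefers it to the element directly above it."""
    result = []
    for item in mu:
        pos = len(result)
        while pos > 0 and majority_prefers(lists, item, result[pos - 1]):
            pos -= 1
        result.insert(pos, item)
    return result


lists = [list("BACED"), list("ABDEC"), list("BCADE")]  # the three input rankings
print(local_kemenize(list("ABDCE"), lists))  # ['B', 'A', 'C', 'D', 'E']
```

On the example's data the sketch reproduces the steps above and then places E last, because two of the three input lists rank D above E.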
Second result: Markov chain methods
Markov chain:
A set of states S = {1, 2, ..., n}
An n × n transition matrix M
Begins in an initial state x
At each step the system moves from state i to state j with probability $M_{ij}$
Second result: Markov chain methods
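Under suitable conditions (e.g., a finite chain that is irreducible and aperiodic), the probability distribution over states converges to a unique fixed point, the stationary distribution p with pM = p, irrespective of the initial state x; ranking the states by their stationary probabilities gives the aggregated ranking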
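A minimal pure-Python sketch of this convergence, assuming a small made-up three-state chain (the matrix M below is hypothetical, not taken from the paper): repeatedly applying the one-step update p_j ← Σ_i p_i M_ij from two different initial states reaches the same stationary distribution, and sorting the states by it yields a ranking.

```python
def step(dist, M):
    """One Markov chain step: new[j] = sum_i dist[i] * M[i][j]."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def stationary(M, start, tol=1e-12, max_iter=10_000):
    """Power iteration: apply M until successive distributions agree within tol."""
    dist = list(start)
    for _ in range(max_iter):
        nxt = step(dist, M)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist


# Hypothetical irreducible, aperiodic chain on states 0, 1, 2 (rows sum to 1).
M = [
    [0.50, 0.25, 0.25],
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.50],
]
from_0 = stationary(M, [1.0, 0.0, 0.0])  # start in state 0
from_2 = stationary(M, [0.0, 0.0, 1.0])  # start in state 2
print([round(p, 4) for p in from_0])  # [0.4444, 0.3333, 0.2222]
print([round(p, 4) for p in from_2])  # [0.4444, 0.3333, 0.2222]
print(sorted(range(3), key=lambda s: from_0[s], reverse=True))  # [0, 1, 2]
```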
Second result: Markov chain methods
Original rankings (highest-ranked first):
Ranking 1: B, A, C
Ranking 2: A, B, C
Ranking 3: A, C, B
[Figure: a Markov chain on the states A, B, C built from the original rankings, with transition probabilities labelling the edges]
Aggregated ranking: A, B, C
Second result: Markov chain methods
Assume the current state is page P
MC1: the next state is chosen uniformly from the multiset of all pages that were ranked higher than or equal to P by some search engine that ranked P
MC2, MC3, MC4: please refer to the paper for the rest …
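The MC1 rule can be sketched as follows: for each page P, count the multiset of pages that each list ranks at or above P, normalize it into transition probabilities, and rank pages by the stationary distribution (found here by plain power iteration from the uniform distribution). The function names and iteration count are our own assumptions; the sketch is applied to the three rankings from the Markov chain example.

```python
from collections import Counter


def mc1_transitions(lists):
    """MC1: from page P, move to a page drawn uniformly from the multiset of all
    pages ranked at or above P by each list that ranks P."""
    pages = sorted({p for t in lists for p in t})
    M = {}
    for p in pages:
        multiset = Counter()
        for t in lists:
            if p in t:
                multiset.update(t[: t.index(p) + 1])  # pages ranked >= P, incl. P
        size = sum(multiset.values())
        M[p] = {q: multiset[q] / size for q in pages}
    return pages, M


def stationary(pages, M, iters=1000):
    """Power iteration starting from the uniform distribution."""
    dist = {p: 1 / len(pages) for p in pages}
    for _ in range(iters):
        dist = {q: sum(dist[p] * M[p][q] for p in pages) for q in pages}
    return dist


lists = [["B", "A", "C"], ["A", "B", "C"], ["A", "C", "B"]]
pages, M = mc1_transitions(lists)
pi = stationary(pages, M)
ranking = sorted(pages, key=lambda p: pi[p], reverse=True)
print(ranking)  # ['A', 'B', 'C']
```

On these three rankings the sketch yields A, B, C, with stationary probabilities of about 0.58, 0.33 and 0.09.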
Applications
Meta-search
Spam reduction
Multi-criteria search
Search engine comparison
Experiments
Experiments were conducted using the following search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY) and Northernlight (NL)
Experiment on meta-search using several keywords: “affirmative action”, alcoholism, sushi, ...
Experiments
SFO (Spearman footrule optimal aggregation) and MC4 outperform the other four algorithms
MC4 performs better than SFO most of the time
Experiments
Experiment on spam reduction using queries: Feng shui, organic vegetable, gardening
Experiments
Local Kemenization works!
Conclusion
Proposed several rank aggregation techniques using Markov chains
Established the value of the Extended Condorcet Criterion (ECC) for spam resistance
Future work
Obtain a qualitative understanding of why Markov chain methods perform well
Questions?