Trusted Decentralized PageRank
Introduction
JXP
TrustJXP
Experimental Results
Conclusion and Future Work
Efficient and Decentralized PageRank Approximation in P2P Networks with Malicious Agents
Josiane Xavier Parreira (1), Debora Donato (2), Carlos Castillo (2), Gerhard Weikum (1)
(1) Max-Planck Institute for Informatics
(2) Yahoo! Research Barcelona
The Future of Web Search, June 18, 2007
Introduction
Distributed Web Search
Limitations of the current centralized approach to Web search:
political issues
privacy
scalability
difficulty coping with the dynamics of the Web
Solution
Distribute Web search facilities across a decentralized environment
Introduction
Peer-to-peer technology
for storing and sharing information
to guarantee scalability and robustness
P2P Web Search advantages
lighter load
smaller data volume
more computational resources
Limitations
The decentralized nature opens the door to malicious behavior from peers.
Focus – Decentralized Ranking
Ranking
Ranking is a fundamental task in Web Search.
Decentralized PageRank – JXP algorithm [VLDB’06]
Decentralized algorithm for computing authority scores of pages in a P2P network
Assumes peers are always honest.
Trusted Decentralized PageRank – TrustJXP [AIRWeb’07]
Decentralized reputation system to be integrated into JXP.
Allows computation of “trusted” authority scores.
JXP Algorithm [VLDB’06]
[Figure: a meeting between Peer X and Peer Y. Each peer holds its local pages plus a world node (W node) that records links involving external pages (e.g. G → C, J → E). After the meeting, the subgraph relevant to Peer X (e.g. F → A, F → E, K → E) appears in Peer X's world node.]
Runs locally at every peer
Combines local PageRank computations + meetings between peers
JXP scores converge to the true global PageRank scores
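As a rough illustration of the local PageRank step only, the sketch below runs a plain power iteration on a toy link graph. It does not model the world node or the meetings; the function name `pagerank`, the damping factor 0.85, and the iteration count are assumptions chosen for the example, not values taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Illustrative only: JXP's world node and peer meetings are not modeled.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score mass held by pages without out-links is spread uniformly.
        dangling = sum(scores[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Toy graph (hypothetical, loosely based on the page labels in the figure).
graph = {"A": ["F"], "E": ["G", "B"], "F": ["A", "E"], "G": ["C"]}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

The scores always sum to 1, which matches the later remark that the L1 norm of the global PageRank vector is 1.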
TrustJXP Algorithm
Goal
Detect when peers report false scores at the meeting phase.
Idea
Analyze a peer’s deviation from the common features that constitute the usual peer profile.
Forms of attack addressed
Peers report higher scores for a subset of their local pages.
Peers permute the scores of their local pages.
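The two attacks can be simulated with a few lines of Python. This is a minimal sketch for experimentation: the function names `inflate_scores` and `permute_scores`, the default factor of 2, and the toy scores are assumptions made for illustration.

```python
import random


def inflate_scores(scores, pages, factor=2.0):
    """Attack 1: report higher scores for a chosen subset of local pages."""
    return {p: s * factor if p in pages else s for p, s in scores.items()}


def permute_scores(scores, rng):
    """Attack 2: shuffle the scores among the local pages.

    The multiset of scores (and hence the score distribution) is unchanged.
    """
    values = list(scores.values())
    rng.shuffle(values)
    return dict(zip(scores.keys(), values))


local = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # hypothetical local scores
inflated = inflate_scores(local, pages={"C", "D"})
permuted = permute_scores(local, random.Random(7))
print(inflated)
print(permuted)
```

The first attack changes the score distribution, while the second keeps it intact and only changes which page holds which score.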
Malicious Increase of Scores
Why peers cheat
High authority scores for local pages can bring benefits to a peer.
Our approach
Analyze the distribution of the scores reported by a peer.
Use histograms to store and compare score distributions.
Motivation: the Web graph is self-similar → the local score distribution should resemble the global distribution after a few iterations.
Histograms
Each peer stores a histogram H.
Scores from other peers are inserted after each meeting.
A novelty factor accounts for the dynamics of the scores.
$H^{(t+1)} = (1 - \rho)\, H^{(t)} + \rho\, D$
D is the score distribution of the other peer, and ρ is the novelty factor.
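The update rule is a bucket-wise convex combination of the stored histogram and the newly received distribution. A minimal sketch, assuming both are lists of relative frequencies over the same buckets (the bucket layout and the value ρ = 0.2 are made up for the example):

```python
def update_histogram(h, d, rho):
    """Return (1 - rho) * h + rho * d, computed bucket by bucket."""
    if len(h) != len(d):
        raise ValueError("histograms must use the same buckets")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [(1.0 - rho) * hk + rho * dk for hk, dk in zip(h, d)]


h_old = [0.5, 0.5]   # hypothetical stored histogram
d_new = [1.0, 0.0]   # hypothetical distribution reported by the other peer
h_new = update_histogram(h_old, d_new, rho=0.2)
print(h_new)
```

A larger ρ lets recent meetings dominate, while a smaller ρ makes the stored histogram change more slowly.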
Histograms
Comparing Histograms
Hellinger Distance
$HD_{i,j} = \frac{1}{\sqrt{2}} \left[ \sum_{k} \left( \sqrt{H_i(k)} - \sqrt{D_j(k)} \right)^2 \right]^{1/2}$
k runs over all buckets
$H_i(k)$ and $D_j(k)$ = number of elements in bucket k of the two distributions
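A minimal sketch of the Hellinger distance between two histograms. One assumption is added here that the slide does not state: bucket counts are first normalized to relative frequencies, so that the result lies in [0, 1].

```python
import math


def hellinger(h, d):
    """Hellinger distance between two histograms over the same buckets.

    Assumption: counts are normalized to relative frequencies first,
    which keeps the distance within [0, 1].
    """
    if len(h) != len(d):
        raise ValueError("histograms must use the same buckets")
    total_h, total_d = sum(h), sum(d)
    if total_h <= 0 or total_d <= 0:
        raise ValueError("histograms must not be empty")
    squared = sum(
        (math.sqrt(hk / total_h) - math.sqrt(dk / total_d)) ** 2
        for hk, dk in zip(h, d)
    )
    return math.sqrt(squared) / math.sqrt(2)


same = hellinger([10, 30, 60], [1, 3, 6])   # same shape, different counts
disjoint = hellinger([5, 0], [0, 8])        # no overlapping bucket
print(same, disjoint)
```

Identical distributions give a distance of 0, and distributions with no bucket in common give 1.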
Malicious Permutation of Scores
Problem
Peers can cheat and yet keep the original score distribution.
Histogram comparison is not effective in this case.
Our approach
Compare the rankings from both peers for the overlapping graph.
Observation: the relative order of scores is very close to the actual ordering after a few meetings.
Comparing Rankings
Tolerant Kendall’s Tau Distance
$K'_{i,j} = \left| \{ (a, b) : a < b \,\wedge\, score_i(a) - score_i(b) \ge \Delta \,\wedge\, \tau_i(a) < \tau_i(b) \,\wedge\, \tau_j(a) > \tau_j(b) \} \right|$
$score_i(a)$, $score_i(b)$ = scores of pages a and b at peer i
$\tau_i$, $\tau_j$ = rankings of pages at peers i and j
$\Delta$ = tolerance threshold
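A sketch of the tolerant Kendall's tau computation over the pages both peers know. Two readings are assumed here and are not stated on the slide: in each pair, a is taken to be the page peer i ranks higher, and the count can optionally be divided by the number of pairs so the result lies in [0, 1]. Rank 1 denotes the highest score.

```python
from itertools import combinations


def ranks(scores, pages):
    """Rank positions (1 = highest score) of the given pages."""
    ordered = sorted(pages, key=lambda p: (-scores[p], p))
    return {p: pos for pos, p in enumerate(ordered, start=1)}


def tolerant_kendall_tau(scores_i, scores_j, delta=0.0, normalize=True):
    """Count pairs that peer i separates by at least delta but peer j inverts."""
    common = sorted(set(scores_i) & set(scores_j))
    tau_i = ranks(scores_i, common)
    tau_j = ranks(scores_j, common)
    discordant = 0
    for x, y in combinations(common, 2):
        # Assumption: name the page that peer i ranks higher "a".
        a, b = (x, y) if tau_i[x] < tau_i[y] else (y, x)
        if scores_i[a] - scores_i[b] >= delta and tau_j[a] > tau_j[b]:
            discordant += 1
    if not normalize:
        return discordant
    pairs = len(common) * (len(common) - 1) // 2
    return discordant / pairs if pairs else 0.0


peer_i = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # hypothetical scores
peer_j = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}   # fully reversed order
print(tolerant_kendall_tau(peer_i, peer_j))
print(tolerant_kendall_tau(peer_i, peer_j, delta=0.15, normalize=False))
```

The tolerance Δ ignores inversions between pages whose scores at peer i are nearly equal, since such swaps are expected even among honest peers.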
TrustJXP Algorithm
Computing Trust Scores
Idea: Combine the previous measures to assign trust scores to peers.
Each peer assigns its own trust score to another peer at each meeting step.
How to combine the measures? We take a “safer” approach.
$\theta_{i,j} = \min(1 - HD_{i,j},\; 1 - K'_{i,j})$
The trust score is integrated into the JXP computation at the list-merging phase.
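Taking the minimum means a peer is trusted only as much as its worse result on the two checks. A small sketch, assuming both distances have already been scaled to [0, 1] (the function name and sample values are illustrative):

```python
def trust_score(hellinger_distance, rank_distance):
    """theta = min(1 - HD, 1 - K'), with both distances assumed in [0, 1]."""
    for value in (hellinger_distance, rank_distance):
        if not 0.0 <= value <= 1.0:
            raise ValueError("distances must lie in [0, 1]")
    return min(1.0 - hellinger_distance, 1.0 - rank_distance)


# A peer with a plausible score distribution but a suspicious ranking.
theta = trust_score(hellinger_distance=0.1, rank_distance=0.4)
print(theta)
```

In this example the ranking check dominates, so a permuted-scores attack lowers the trust score even when the histogram looks normal.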
Integrating Trust Scores and JXP Scores
When merging lists, scores from both lists can be combined by either averaging or taking the max score.
If a page is not present on a list → score = 0.
Averaging the scores
JXP: $L'(i) = (L_A(i) + L_B(i))/2$
TrustJXP: $L'(i) = (1 - \theta/2) \cdot L_A(i) + (\theta/2) \cdot L_B(i)$
Taking max score
JXP: $L'(i) = \max(L_A(i), L_B(i))$
TrustJXP: $L'(i) = \max(L_A(i), \theta \cdot L_B(i))$
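Both merge rules can be sketched in one function. With θ = 1 the TrustJXP formulas reduce to the plain JXP ones, and with θ = 0 the other peer's list is ignored. The function name, the dict representation of score lists, and the sample values are assumptions for the example.

```python
def merge_scores(list_a, list_b, theta=1.0, mode="average"):
    """Merge the local list A with the list B received from another peer.

    theta is the trust score assigned to the other peer; missing pages score 0.
    """
    merged = {}
    for page in set(list_a) | set(list_b):
        a = list_a.get(page, 0.0)
        b = list_b.get(page, 0.0)
        if mode == "average":
            merged[page] = (1.0 - theta / 2.0) * a + (theta / 2.0) * b
        elif mode == "max":
            merged[page] = max(a, theta * b)
        else:
            raise ValueError("mode must be 'average' or 'max'")
    return merged


mine = {"p": 0.2, "q": 0.4}    # hypothetical local scores
theirs = {"p": 0.6}            # hypothetical scores from the other peer
full_trust = merge_scores(mine, theirs, theta=1.0)
no_trust = merge_scores(mine, theirs, theta=0.0)
half_trust_max = merge_scores(mine, theirs, theta=0.5, mode="max")
print(full_trust, no_trust, half_trust_max)
```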
Experimental Results
Web collection
Obtained using a focused crawler.
134,405 pages, 1,915,401 links.
10 categories.
Setup
100 honest peers, 10 peers/category.
Malicious peers
Perform JXP meetings and local PR computation like a normal peer.
Lie when asked by another peer about their local scores, according to the attacks described previously.
Experimental Results
Evaluation Measures
“Global” JXP ranking vs. Global PageRank ranking.
Spearman’s Footrule Distance at top-k.
Linear error score at top-k.
Cosine at full ranking.
L1 norm of full JXP ranking (the L1 norm of global PR is always 1).
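Three of these measures can be sketched as follows. The slides do not give exact definitions, so the top-k footrule below is one common variant (pages missing from a top-k list are placed at position k + 1, and the sum is divided by its maximum k(k + 1)); treat that choice, the function names, and the toy scores as assumptions.

```python
import math


def top_k(scores, k):
    """Pages with the k highest scores, best first (ties broken by name)."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


def footrule_top_k(scores_1, scores_2, k):
    """Normalized Spearman's footrule between two top-k lists (assumed variant)."""
    pos_1 = {p: r for r, p in enumerate(top_k(scores_1, k), start=1)}
    pos_2 = {p: r for r, p in enumerate(top_k(scores_2, k), start=1)}
    total = sum(
        abs(pos_1.get(p, k + 1) - pos_2.get(p, k + 1))
        for p in set(pos_1) | set(pos_2)
    )
    return total / (k * (k + 1))


def cosine(scores_1, scores_2):
    """Cosine similarity between two score vectors (missing pages score 0)."""
    pages = set(scores_1) | set(scores_2)
    dot = sum(scores_1.get(p, 0.0) * scores_2.get(p, 0.0) for p in pages)
    norm_1 = math.sqrt(sum(v * v for v in scores_1.values()))
    norm_2 = math.sqrt(sum(v * v for v in scores_2.values()))
    return dot / (norm_1 * norm_2)


def l1_norm(scores):
    """Sum of absolute score values."""
    return sum(abs(v) for v in scores.values())


global_pr = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}   # hypothetical scores
jxp = {"C": 0.5, "D": 0.3, "A": 0.1, "B": 0.1}         # hypothetical scores
print(footrule_top_k(global_pr, jxp, k=2), cosine(global_pr, jxp), l1_norm(jxp))
```

A footrule of 0 and a cosine of 1 indicate perfect agreement with the global ranking; an L1 norm far above 1 signals inflated scores.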
JXP Performance - No Malicious Peers
[Figure: four plots for 100 peers with no malicious peers, each against the number of meetings in the network: Spearman’s footrule distance, linear error score, cosine, and L1 norm.]
Impact of Malicious Peers
(Peers report 2x the true score value for all local pages)
[Figure: four plots against the number of meetings in the network: Spearman’s footrule distance, linear score error, cosine, and L1 norm. Series: 110 peers with 10 malicious; 150 peers with 50 malicious.]
Averaging the Scores
[Figure: four plots against the number of meetings in the network: Spearman’s footrule distance, linear score error, cosine, and L1 norm. Series: peers reporting 2x the scores; peers reporting 5x the scores.]
Trust Model
Figure: Histogram divergence and rank divergence against the number of meetings in the network. Increased-scores attack: (a) histogram divergence and (b) rank divergence. Permuted-scores attack: (c) histogram divergence and (d) rank divergence. A green circle (◦) represents a meeting between two honest peers, and a red cross (×) a meeting between an honest and a dishonest peer.
Trust Scores (Random Attacks)
[Figure: three plots against the number of meetings in the network: histogram divergence, rank divergence, and trust scores.]
Max. θ    Detection rate    False positives
0.9       37.4%             4.7%
0.8       86.9%             12.1%
0.6       98.0%             54.5%
Trust JXP
[Figure: four plots against the number of meetings in the network: Spearman’s footrule distance, linear score error, cosine, and L1 norm. Series: ideal trust model; our trust model; no trust model.]
* 150 Peers - 50 Malicious; Mixed malicious behavior
Conclusion
TrustJXP algorithm for identifying and reducing the impact of cheating peers.
Uses score distribution and ranking analysis to detect malicious behavior.
Experiments demonstrate viability of the method.
Future Work
Detect other types of malicious behaviors.
Network dynamics.