Date post: | 22-Jun-2015 |
Category: |
Technology |
Upload: | alvaro-luis |
The PageRank Citation Ranking: Bringing Order to the Web
Larry Page et al.
Stanford University
Presented by
Guoqiang Su & Wei Li
Contents
Motivation
Related work
PageRank & Random Surfer Model
Implementation
Application
Conclusion
Motivation
Web: heterogeneous and unstructured
Free of quality control on the web
Commercial interest to manipulate ranking
Related Work
Academic citation analysis
Link-based analysis
Clustering methods of link structure
Hubs & Authorities Model
Backlink
Link structure of the web
Approximation of importance / quality
PageRank
Pages with lots of backlinks are important
Backlinks coming from important pages convey more importance to a page
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$
Problem: Rank Sink
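As an illustration of the simplified formula $R(u) = c \sum_{v \in B_u} R(v)/N_v$, where $B_u$ is the set of pages linking to $u$ and $N_v$ is the number of outgoing links of $v$, the following is a minimal sketch. It assumes $c = 1$, a hypothetical three-page graph, and that every page has at least one outgoing link; the function name and the graph are made up for illustration.

```python
def simple_pagerank_step(links, rank):
    """Apply R(u) = sum over backlinks v of R(v) / N_v once (c = 1 assumed)."""
    new_rank = {page: 0.0 for page in links}
    for v, outlinks in links.items():
        share = rank[v] / len(outlinks)  # R(v) / N_v, split evenly over v's links
        for u in outlinks:
            new_rank[u] += share
    return new_rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {page: 1.0 / len(links) for page in links}  # uniform starting ranks
for _ in range(100):
    rank = simple_pagerank_step(links, rank)

print(rank)
```

On this particular graph the ranks settle near 0.4 for A, 0.2 for B, and 0.4 for C: C collects rank from both A and B and passes all of it on to A.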
Rank Sink
A cycle of pages that is pointed to by some incoming link but has no outgoing links
Problem: this loop will accumulate rank but never distribute any rank outside
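The rank sink effect can be reproduced on a toy graph. This sketch assumes the simplified update without an escape term and a hypothetical four-page graph in which the loop X <-> Y receives a link from A but links nowhere else.

```python
def simple_pagerank_step(links, rank):
    """One simplified update: each page splits its rank evenly over its outlinks."""
    new_rank = {page: 0.0 for page in links}
    for v, outlinks in links.items():
        for u in outlinks:
            new_rank[u] += rank[v] / len(outlinks)
    return new_rank


# Hypothetical graph: A and B link to each other, A also links into the loop X <-> Y.
links = {"A": ["B", "X"], "B": ["A"], "X": ["Y"], "Y": ["X"]}
rank = {page: 0.25 for page in links}
for _ in range(200):
    rank = simple_pagerank_step(links, rank)

outside = rank["A"] + rank["B"]  # rank left outside the loop
inside = rank["X"] + rank["Y"]   # rank trapped in the loop
print(outside, inside)
```

After enough iterations practically all rank sits inside the loop, and the pages outside it are left with none.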
Escape Term
Solution: Rank Source
$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$$
c is maximized and $\|R\|_1 = 1$
E(u) is some vector over the web pages
– uniform, favorite page etc.
Matrix Notation
$$R = c\,(A^T + E e^T)\,R$$
R is the dominant eigenvector and c is the dominant eigenvalue of $(A^T + E e^T)$ because c is maximized
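The step from the per-page formula with the escape term to the matrix form can be spelled out as follows. This is a sketch under two assumptions that the slide does not state: $e$ denotes the all-ones vector, and $R$ has non-negative entries, so that $e^T R = \|R\|_1 = 1$.

```latex
\begin{aligned}
R &= c\,(A^T R + E) \\
  &= c\,(A^T R + E\,e^T R) \qquad \text{since } e^T R = \|R\|_1 = 1 \\
  &= c\,(A^T + E\,e^T)\,R
\end{aligned}
```

Read literally, the last line makes $R$ an eigenvector of $(A^T + E e^T)$ whose eigenvalue is $1/c$.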
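Computing PageRank
- $R_0 \leftarrow S$ : initialize vector over web pages
loop:
- $R_{i+1} \leftarrow A^T R_i$ : new ranks are the sum of normalized backlink ranks
- $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$ : compute normalizing factor
- $R_{i+1} \leftarrow R_{i+1} + dE$ : add escape term
- $\delta \leftarrow \|R_{i+1} - R_i\|_1$ : control parameter
while $\delta > \epsilon$ : stop when converged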
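The loop above can be transcribed almost line by line. The following is a minimal pure-Python sketch, not the authors' implementation: the adjacency-list layout, the function names, the choice $S = E$, the tolerance, and the four-page graph (with page 3 dangling) are all assumptions made for illustration.

```python
def l1_norm(vec):
    return sum(abs(x) for x in vec)


def pagerank(links, source, epsilon=1e-10, max_iter=1000):
    """links[v] lists the pages that v points to; source plays the role of E."""
    n = len(links)
    rank = list(source)  # R_0 <- S (here S = E, an assumption)
    for _ in range(max_iter):
        new_rank = [0.0] * n
        for v, outlinks in enumerate(links):
            for u in outlinks:
                new_rank[u] += rank[v] / len(outlinks)  # R_{i+1} <- A^T R_i
        d = l1_norm(rank) - l1_norm(new_rank)           # rank lost in this step
        new_rank = [r + d * e for r, e in zip(new_rank, source)]  # add d * E
        delta = l1_norm([a - b for a, b in zip(new_rank, rank)])
        rank = new_rank
        if delta <= epsilon:  # stop when converged
            break
    return rank


# Hypothetical graph: 0 -> 1, 2; 1 -> 2; 2 -> 0, 3; page 3 has no outlinks.
links = [[1, 2], [2], [0, 3], []]
ranks = pagerank(links, source=[0.25, 0.25, 0.25, 0.25])
print(ranks)
```

In this literal transcription, $d$ is non-zero only when a step loses rank, here through the dangling page 3. If every page had outgoing links, $A^T R_i$ would preserve the total and the escape term would vanish, so a variant that scales $A^T R_i$ by a factor $c < 1$ would be a separate design choice.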
Random Surfer Model
PageRank corresponds to the probability distribution of a random walk on the web graph
E(u) can be rephrased as follows: the random surfer periodically gets bored and jumps to a different page, so the surfer is not kept in a loop forever
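The random surfer reading can be tried out with a small simulation. This sketch assumes a uniform jump distribution, a jump probability of 0.15, a fixed seed, and a hypothetical three-page graph; none of these values come from the slides.

```python
import random


def random_surfer(links, steps, jump_prob, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < jump_prob or not links[current]:
            current = rng.choice(pages)           # bored: jump to a random page
        else:
            current = rng.choice(links[current])  # follow a random outgoing link
    return {page: count / steps for page, count in visits.items()}


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
freq = random_surfer(links, steps=100_000, jump_prob=0.15)
print(freq)
```

The visit frequencies approximate the stationary distribution of the walk; on this graph B, which has a single backlink carrying half of A's rank, is visited least often.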
Implementation
Computing resources
– 24 million pages
– 75 million URLs
Memory and disk storage
– Weight vector (4-byte float)
– Matrix A (linear access)
Implementation (cont'd)
Unique integer ID for each URL
Sort and remove dangling links
Initial rank assignment
Iteration until convergence
Add back dangling links and recompute
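The first two preprocessing steps can be sketched as follows. It assumes links arrive as (source, target) pairs of URL strings, treats a dangling link as one whose target has no outgoing links, and repeats the removal until none remain; the helper names and the short URLs are hypothetical.

```python
def assign_ids(edges):
    """Give every URL a unique integer ID in order of first appearance."""
    ids = {}
    for src, dst in edges:
        for url in (src, dst):
            ids.setdefault(url, len(ids))
    return ids


def remove_dangling(edges):
    """Repeatedly drop links whose target page has no outgoing links."""
    edges = list(edges)
    while True:
        has_outlinks = {src for src, _ in edges}
        kept = [(src, dst) for src, dst in edges if dst in has_outlinks]
        if len(kept) == len(edges):
            return kept
        edges = kept  # removing links can create new dangling links


edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
ids = assign_ids(edges)
kept = remove_dangling(edges)
# Sort the surviving links by the integer ID of the linking page.
sorted_links = sorted((ids[src], ids[dst]) for src, dst in kept)
print(ids, sorted_links)
```

Here the link to d is removed first because d has no outgoing links, which in turn leaves c dangling, so only the links between a and b survive.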
Convergence Properties
Graph (V, E) is an expander with factor $\alpha$ if for all (not too large) subsets S: $|A_S| \ge \alpha |S|$, where $A_S$ is the neighborhood of S
Eigenvalue separation: the largest eigenvalue is sufficiently larger than the second-largest eigenvalue
A random walk converges fast to a limiting probability distribution on a set of nodes in the graph.
Convergence Properties (cont'd)
PageRank computation converges in roughly O(log |V|) iterations due to the rapidly mixing graph G of the web.
Personalized PageRank
Rank Source E can be initialized:
– uniformly over all pages: e.g. copyright warnings, disclaimers, mailing list archives result in overly high ranking
– total weight on a single page, e.g. Netscape, McCarthy: great variation of ranks under different single pages as rank source
– and everything in-between, e.g. server root pages: allows manipulation by commercial interests
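The effect of the rank source can be seen by running the same iteration with two choices of E. This sketch assumes a hypothetical four-page graph with one dangling page (page 3), a uniform starting vector, and a fixed number of iterations instead of a convergence test; the function name is made up for illustration.

```python
def rank_with_source(links, source, iterations=500):
    """Iterate R <- A^T R + d * E, where d is the rank lost in the step."""
    n = len(links)
    rank = [1.0 / n] * n  # uniform start (an assumption)
    for _ in range(iterations):
        new_rank = [0.0] * n
        for v, outlinks in enumerate(links):
            for u in outlinks:
                new_rank[u] += rank[v] / len(outlinks)
        d = sum(rank) - sum(new_rank)  # ranks are non-negative, so sums are L1 norms
        rank = [r + d * e for r, e in zip(new_rank, source)]
    return rank


# Hypothetical graph: 0 -> 1, 2; 1 -> 2; 2 -> 0, 3; page 3 has no outlinks.
links = [[1, 2], [2], [0, 3], []]
uniform = rank_with_source(links, [0.25, 0.25, 0.25, 0.25])
favorite = rank_with_source(links, [0.0, 1.0, 0.0, 0.0])  # all weight on page 1
print(uniform[1], favorite[1])
```

Putting the whole rank source on page 1 lifts its rank from about 3/17 to about 3/11 on this graph, a small-scale version of the variation described above.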
Applications I
Estimate web traffic
– Server/page aliases
– Link/traffic disparity, e.g. porn sites, free web-mail
Backlink predictor
– Citation counts have been used to predict future citations
– Very difficult to map the citation structure of the web completely
– Avoids the local maxima that citation counts get stuck in and gets better performance
Applications II - Ranking Proxy
Surfer's Navigation Aid
Annotating links by PageRank (bar graph)
Not query dependent
Issues
Users are no random walkers
– Content based methods
Starting point distribution
– Actual usage data as starting vector
Reinforcing effects/bias towards main pages
How about traffic to ranking pages?
No query specific rank
Linkage spam
– PageRank favors pages that managed to get other pages to link to them
– Linkage not necessarily a sign of relevancy, only of promotion (advertisement…)
Evaluation I
Evaluation II
Conclusion
PageRank is a global ranking based on the web's graph structure
PageRank uses backlink information to bring order to the web
PageRank can separate out representative pages as cluster centers
A great variety of applications