PAGERANK PARAMETERS - University of Chicagolekheng/meetings/matho...40 60 80 100 120 40 60 80 mm...


PAGERANK PARAMETERS

David F. Gleich, Amy N. Langville

American Institute of Mathematics, Workshop on Ranking

Palo Alto, CA, August 17th, 2010

Gleich & Langville AIM 1 / 21


The most important page on the web


PageRank details

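$P = \begin{bmatrix}
1/6 & 1/2 & 0 & 0 & 0 & 0 \\
1/6 & 0 & 0 & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\
1/6 & 0 & 1/2 & 0 & 0 & 0 \\
1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}$

$P_{ij} \ge 0$, $e^T P = e^T$

“jump” → $v = [\tfrac{1}{n} \; \dots \; \tfrac{1}{n}]^T \ge 0$

Markov chain: $(\alpha P + (1-\alpha) v e^T)\, x = x$; unique $x$ ⇒ $x_j \ge 0$, $e^T x = 1$.

Linear system: $(I - \alpha P)\, x = (1-\alpha) v$

Ignored: dangling nodes patched back to $v$

Algorithms: later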
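As a minimal sketch of the two formulations on this slide, the following NumPy code solves the linear system and also runs the Markov-chain (power) iteration on the 6-page example matrix. The value α = 0.85, the tolerance, and the function names are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Column-stochastic matrix from the slide. Column 1 is uniform, which is
# what a dangling node looks like after being patched back to v.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
alpha = 0.85                 # assumed value for illustration
v = np.full(n, 1.0 / n)      # uniform "jump" vector

def pagerank_linear(P, alpha, v):
    """Solve (I - alpha P) x = (1 - alpha) v directly."""
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)

def pagerank_power(P, alpha, v, tol=1e-12, max_iter=1000):
    """Iterate x <- alpha P x + (1 - alpha) v, which equals
    (alpha P + (1 - alpha) v e^T) x whenever e^T x = 1."""
    x = v.copy()
    for _ in range(max_iter):
        x_next = alpha * (P @ x) + (1 - alpha) * v
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next
    return x

x_lin = pagerank_linear(P, alpha, v)
x_pow = pagerank_power(P, alpha, v)
```

Both routes should agree up to the iteration tolerance, and the result is a probability vector.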


Other uses for PageRank
What else people use PageRank to do

GeneRank (Morrison et al., GeneRank, 2005)

[Figure: GeneRank plot over a long list of genes; the axis labels are gene identifiers such as NM_003748, NM_003862, and Contig32125_RC]

Use $(I - \alpha G D^{-1})\, x = w$ to find “nearby” important genes.
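A minimal sketch of the GeneRank system, assuming G is a symmetric gene-gene adjacency matrix, D is its diagonal degree matrix, and w holds normalized expression weights. The 4-gene network, the weights, and α = 0.5 are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical symmetric gene-gene network (1 = the two genes are linked).
G = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Hypothetical nonnegative expression weights, normalized to sum to 1.
w = np.array([0.1, 0.2, 0.3, 2.4])
w = w / w.sum()

alpha = 0.5                      # illustrative choice
degrees = G.sum(axis=0)          # every gene must have degree > 0
GDinv = G / degrees              # divides column j by degree j, i.e. G D^{-1}

# Solve (I - alpha G D^{-1}) x = w as written on the slide.
x = np.linalg.solve(np.eye(len(w)) - alpha * GDinv, w)
ranking = np.argsort(-x)         # gene indices, highest score first
```

Because the right-hand side here carries no (1 − α) factor, the scores sum to 1/(1 − α) rather than 1; only the ordering matters for ranking.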

ProteinRank, ObjectRank, EventRank

IsoRank, Clustering

Sports ranking, Food webs, Centrality

Reverse PageRank, FutureRank

SocialPageRank, BookRank

ArticleRank, ItemRank, SimRank

DiffusionRank, TrustRank, TweetRank

Note: New paper LabRank with a random scientist?

Ulam Networks

Chirikov map:
$y_{t+1} = \eta y_t + k \sin(x_t + \theta_t)$
$x_{t+1} = x_t + y_{t+1}$

Ulam network:
1. divide phase space into uniform cells
2. form P based on trajectories.

[Figure panels: $\log(E[x(A)])$ and $\log(\mathrm{Std}[x(A)]) / \log(E[x(A)])$]

$A \sim \mathrm{Beta}(2,16)$
Note: White is larger, black is smaller.

Google matrix, dynamical attractors, and Ulam networks, Shepelyansky and Zhirov, arXiv


Choosing alpha

Choosing alpha

Choosing personalization

Related methods

Open issues


What is alpha? There’s no single answer.
Ask yourself, why am I computing PageRank? Then use the best value for your application.

web-search → tune α for the best feature vector

node centrality → understand what random jumps mean in your graph

find important nodes in a web-graph → use the random surfer interpretation

Author                  α
Brin and Page (1998)    0.85
Najork et al. (2007)    0.85
Litvak et al. (2006)    0.5
Pan et al. (2004)       0.15
Algorithms (...)        ≥ 0.85
Experiment              ???


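The PageRank limit value

Singular? $(I - \alpha P)\, x = (1-\alpha) v$

Jordan form: $P = X J X^{-1}$

$(I - \alpha X J X^{-1})\, x = (1-\alpha) v$

$(I - \alpha J)\, X^{-1} x = (1-\alpha) X^{-1} v$

$(I - \alpha J)\, y = (1-\alpha) z$, where $y = X^{-1} x$ and $z = X^{-1} v$

With $J = \begin{bmatrix} I & 0 \\ 0 & J_2 \end{bmatrix}$ (the identity block belongs to the eigenvalue 1):

$(1-\alpha)\, y_1 = (1-\alpha)\, z_1$
$(I - \alpha J_2)\, y_2 = (1-\alpha)\, z_2$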

Boldi et al. 2003: PageRank as a function of the damping parameter
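A small numerical illustration of the point of this slide, under the assumption that we reuse the 6-page matrix from the "PageRank details" slide with a uniform v: the matrix I − P is singular, yet x(α) settles to a finite vector as α approaches 1. The specific α values are arbitrary.

```python
import numpy as np

# 6-page column-stochastic example from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)
I = np.eye(n)

# At alpha = 1 the system matrix I - P loses rank, so it cannot be solved directly.
rank_at_one = np.linalg.matrix_rank(I - P)

def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v for alpha < 1."""
    return np.linalg.solve(I - alpha * P, (1 - alpha) * v)

# PageRank vectors for alpha approaching 1 from below.
path = {alpha: pagerank(alpha) for alpha in (0.9, 0.99, 0.999, 1 - 1e-6)}
x_near_one = path[1 - 1e-6]
```

In this example pages 5 and 6 only link to each other, so the mass drifts onto those two pages as α grows.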


TotalRank

$t = \int_0^1 x(\alpha)\, d\alpha$

Proposed by Boldi et al. (2005) as a parameter-free PageRank.
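One way to approximate this integral numerically is Gauss-Legendre quadrature over α in [0, 1], solving one PageRank system per node. This is only a sketch: the quadrature rule, the node count of 64, and the reuse of the 6-page example matrix with uniform v are assumptions, not the method from the cited paper.

```python
import numpy as np

# 6-page column-stochastic example from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)

def pagerank(P, alpha, v):
    """Solve (I - alpha P) x = (1 - alpha) v."""
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)

def totalrank_quadrature(P, v, num_nodes=64):
    """Approximate t = integral of x(alpha) over [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(num_nodes)
    alphas = 0.5 * (nodes + 1.0)      # map [-1, 1] onto [0, 1]
    scaled = 0.5 * weights            # rescaled weights sum to 1
    t = np.zeros(len(v))
    for a, w in zip(alphas, scaled):  # all nodes lie strictly inside (0, 1)
        t += w * pagerank(P, a, v)
    return t

t = totalrank_quadrature(P, v)
```

Since every x(α) sums to 1 and the rescaled weights sum to 1, the result is again a probability vector.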


Generalized PageRank

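PageRank: $(I - \alpha P)\, x = (1-\alpha) v$, $x = \sum_{\ell=0}^{\infty} (1-\alpha)(\alpha P)^{\ell} v$

Generalized PageRank: $y = \sum_{\ell=0}^{\infty} f(\ell)\, P^{\ell} v$, with $\sum_{\ell=0}^{\infty} f(\ell) < \infty$

TotalRank: $f(\ell) = \frac{1}{\ell+1} - \frac{1}{\ell+2}$

LinearRank ...
HyperRank ...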

Baeza-Yates et al. 2006
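The series form lends itself to a short sketch: accumulate f(ℓ) P^ℓ v term by term and plug in different weight functions. The helper name, the truncation lengths, α = 0.85, and the reuse of the 6-page example matrix are assumptions for illustration. The TotalRank weights follow from integrating (1 − α) α^ℓ over [0, 1], which gives 1/(ℓ+1) − 1/(ℓ+2).

```python
import numpy as np

# 6-page column-stochastic example from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)

def generalized_pagerank(P, v, f, num_terms):
    """Truncated series: sum of f(l) * P^l v for l = 0 .. num_terms - 1."""
    y = np.zeros(len(v))
    term = v.copy()                   # holds P^l v
    for l in range(num_terms):
        y += f(l) * term
        term = P @ term
    return y

alpha = 0.85                          # assumed value for illustration

def f_pagerank(l):
    return (1 - alpha) * alpha ** l

def f_totalrank(l):
    return 1.0 / (l + 1) - 1.0 / (l + 2)

x_series = generalized_pagerank(P, v, f_pagerank, 400)
x_solve = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
t_series = generalized_pagerank(P, v, f_totalrank, 5000)
```

The PageRank weights decay geometrically, so a few hundred terms suffice. The TotalRank weights decay only like 1/ℓ², so after L terms the truncated sum is still missing a total mass of 1/(L + 1).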


Pick a distribution
Multiple surfers should have an impact!

Each person picks α from distribution A

↓ $x(E[A])$

↓ $E[x(A)]$

↘ ↙ $x(E[A]) \ne E[x(A)]$

TotalRank: $E[x(A)]$ with $A \sim U[0,1]$

Constantine & Gleich, Internet Mathematics, in press.
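To see the gap between x(E[A]) and E[x(A)] concretely, the sketch below assumes A ~ Beta(2, 16) and evaluates E[x(A)] through the series x(α) = Σ (1 − α) α^ℓ P^ℓ v, using the standard Beta moments E[A^k] = Π_{r<k} (a + r)/(a + b + r). The choice of distribution parameters here, the truncation at 200 terms, and the reuse of the 6-page example matrix are assumptions for illustration.

```python
import numpy as np

# 6-page column-stochastic example from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)
a, b = 2.0, 16.0                      # assumed Beta parameters

def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

def beta_moments(a, b, count):
    """Raw moments E[A^k], k = 0 .. count - 1, of a Beta(a, b) variable."""
    m = np.empty(count)
    m[0] = 1.0
    for k in range(1, count):
        m[k] = m[k - 1] * (a + k - 1) / (a + b + k - 1)
    return m

num_terms = 200
m = beta_moments(a, b, num_terms + 1)
weights = m[:-1] - m[1:]              # E[(1 - A) A^l] for l = 0 .. num_terms - 1

expected_x = np.zeros(n)              # approximates E[x(A)]
term = v.copy()                       # holds P^l v
for w_l in weights:
    expected_x += w_l * term
    term = P @ term

x_at_mean = pagerank(a / (a + b))     # x(E[A]) with E[A] = a / (a + b)
gap = expected_x - x_at_mean
```

Both vectors are probability distributions, so the entries of `gap` sum to zero while the individual entries differ.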


From users

[Figure: density of raw α values; horizontal axis from 0.0 to 1.0]

Sample mean μ̄ = 0.631.
Gleich et al., WWW2010
Note: 257,664 users from Microsoft toolbar data


Choosing personalization


Personalization choices

Application specific

- GeneRank: v = normalized microarray weights
- TopicRank: v = pages on the same topic
- TrustRank: v = only pages known to be good
- BadRank: v = only pages known to be bad (and reverse the graph)

Super-personalized

- Set v to have only a single non-zero: $v = e_i$.


Personalized PageRank

$B = (1-\alpha)(I - \alpha P)^{-1}$

$B_{ij}$ = “personalized score of page $i$ when jumping to page $j$”
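A brief sketch of this matrix on the 6-page example from the "PageRank details" slide, assuming α = 0.85. Column j of B is the PageRank vector obtained with the super-personalized jump vector v = e_j, and by linearity B v reproduces PageRank for any other v.

```python
import numpy as np

# 6-page column-stochastic example from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
alpha = 0.85                          # assumed value for illustration
I = np.eye(n)

# B = (1 - alpha) (I - alpha P)^{-1}, computed by solving against the identity.
B = (1 - alpha) * np.linalg.solve(I - alpha * P, I)

# Personalized scores when every jump goes to page 3 (index 2).
scores_jump_to_page3 = B[:, 2]

# PageRank with the uniform jump vector, recovered as B v.
v = np.full(n, 1.0 / n)
x_uniform = B @ v
```

Forming B costs one solve per page, so for large graphs it is usually only feasible to compute selected columns.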


Related methods


PageRank history

See Vigna 2010: Spectral Ranking and Franceschet 2010: PageRank: Standing on the shoulders of giants.

Let A be the adjacency matrix of a graph.

PageRank        $(I - \alpha P)\, x = (1-\alpha) v$,  $(\alpha P + (1-\alpha) v e^T)\, x = x$

Seeley 1949     $P x = x$

Wei 1952        $A^T x = x$

Katz 1953       $(I - \alpha A)\, x = e$

Hubbell 1965    $A^T x = x + v$


Graph centrality

For a graph G, a score assigned to each vertex in V is a centrality score if larger scores indicate “more central” vertices and the score is independent of the labeling of the vertices.


Open issues


Vigna, A history of spectral ranking, MMDS2010


Other issues


QUESTIONS?
