
PAGERANK PARAMETERS

David F. Gleich and Amy N. Langville

American Institute of Mathematics, Workshop on Ranking

Palo Alto, CA, August 17th, 2010

Gleich & Langville AIM 1 / 21


The most important page on the web


PageRank details

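[Figure: directed graph on six nodes, labeled 1 to 6.]

$$
P = \begin{bmatrix}
1/6 & 1/2 & 0 & 0 & 0 & 0 \\
1/6 & 0 & 0 & 1/3 & 0 & 0 \\
1/6 & 1/2 & 0 & 1/3 & 0 & 0 \\
1/6 & 0 & 1/2 & 0 & 0 & 0 \\
1/6 & 0 & 1/2 & 1/3 & 0 & 1 \\
1/6 & 0 & 0 & 0 & 1 & 0
\end{bmatrix},
\qquad P_{ij} \ge 0, \quad e^T P = e^T
$$

“jump” → $v = [\tfrac{1}{n} \; \dots \; \tfrac{1}{n}]^T \ge 0$, $e^T v = 1$

Markov chain: $(\alpha P + (1 - \alpha) v e^T) x = x$; unique $x$ ⇒ $x_j \ge 0$, $e^T x = 1$.

Linear system: $(I - \alpha P) x = (1 - \alpha) v$

Ignored: dangling nodes patched back to $v$; algorithms later.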
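As a rough illustration of the two formulations on this slide, the sketch below solves the linear system for the six-node matrix and checks that the result is also a fixed point of the Markov chain form. The value α = 0.85 and the helper name `pagerank` are assumptions made for the example, not something the slide prescribes.

```python
import numpy as np

# Column-stochastic matrix P of the six-node example. Column 1 belongs to
# the dangling node and is already patched with the uniform vector.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)   # uniform jump vector
alpha = 0.85              # assumed damping value for this sketch


def pagerank(P, v, alpha):
    """Solve the linear system (I - alpha P) x = (1 - alpha) v."""
    size = P.shape[0]
    return np.linalg.solve(np.eye(size) - alpha * P, (1 - alpha) * v)


x = pagerank(P, v, alpha)

# Markov chain form: x should satisfy (alpha P + (1 - alpha) v e^T) x = x.
M = alpha * P + (1 - alpha) * np.outer(v, np.ones(n))
residual = np.linalg.norm(M @ x - x)
```

A dense solve is only sensible at this toy size; for web-scale graphs an iterative method would be the usual choice.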


Other uses for PageRank

What else people use PageRank to do

GeneRank (Morrison et al., GeneRank, 2005)

[Figure: plot labeled with gene identifiers (NM_003748, NM_003862, Contig32125_RC, ...); axis ticks from 10 to 70.]

Use $(I - \alpha G D^{-1}) x = w$ to find “nearby” important genes.

ProteinRank, ObjectRank, EventRank

IsoRank, Clustering

Sports ranking, Food webs, Centrality

Reverse PageRank, FutureRank

SocialPageRank, BookRank

ArticleRank, ItemRank, SimRank

DiffusionRank, TrustRank, TweetRank

Note: New paper LabRank with a random scientist?


Ulam Networks

Chirikov map: $y_{t+1} = \eta y_t + k \sin(x_t + \theta_t)$, $x_{t+1} = x_t + y_{t+1}$

Ulam network
1. divide phase space into uniform cells
2. form $P$ based on trajectories.

[Figure panels: $\log(E[x(A)])$ and $\log(\mathrm{Std}[x(A)]) / \log(E[x(A)])$, with $A \sim \mathrm{Beta}(2, 16)$.]

Note: White is larger, black is smaller.

Google matrix, dynamical attractors, and Ulam networks, Shepelyansky and Zhirov, arXiv


Choosing alpha

Choosing alpha

Choosing personalization

Related methods

Open issues


What is alpha? There’s no single answer.

Ask yourself, why am I computing PageRank? Then use the best value for your application.

web-search → tune α for the best feature vector

node centrality → understand what random jumps mean in your graph

find important nodes in a web-graph → use the random surfer interpretation

Author                  α
Brin and Page (1998)    0.85
Najork et al. (2007)    0.85
Litvak et al. (2006)    0.5
Pan et al. (2004)       0.15
Algorithms (...)        ≥ 0.85
Experiment              ???


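The PageRank limit value

Singular? $(I - \alpha P) x = (1 - \alpha) v$

$P = X \begin{bmatrix} I & 0 \\ 0 & J_1 \end{bmatrix} X^{-1}$

$\left( I - \alpha X \begin{bmatrix} I & 0 \\ 0 & J_1 \end{bmatrix} X^{-1} \right) x = (1 - \alpha) v$

$X \left( I - \alpha \begin{bmatrix} I & 0 \\ 0 & J_1 \end{bmatrix} \right) X^{-1} x = (1 - \alpha) v$

$\left( I - \alpha \begin{bmatrix} I & 0 \\ 0 & J_1 \end{bmatrix} \right) y = (1 - \alpha) z$, with $y = X^{-1} x$ and $z = X^{-1} v$

$(1 - \alpha) y_1 = (1 - \alpha) z_1$, $(I - \alpha J_1) y_2 = (1 - \alpha) z_2$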

Boldi et al. 2003: PageRank as a function of the damping parameter
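To make the limit concrete, the following sketch evaluates x(α) for the six-node matrix from the "PageRank details" slide at values of α approaching 1 and records how much the vector still moves. The α grid and the helper name `pagerank` are choices made for this illustration only.

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)


def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v for one value of alpha."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)


# The system is nonsingular for every alpha < 1, so each solve is safe.
alphas = [0.9, 0.99, 0.999, 0.9999]
xs = [pagerank(a) for a in alphas]

# 1-norm change between consecutive alphas; shrinking gaps indicate that
# x(alpha) settles toward a limit as alpha -> 1.
gaps = [np.abs(xs[i + 1] - xs[i]).sum() for i in range(len(xs) - 1)]

# Mass left on nodes 1 to 4, which can all reach the closed pair {5, 6}.
transient_mass = xs[-1][:4].sum()
```

In this example nodes 5 and 6 only link to each other, so the limit vector concentrates on that pair while the remaining nodes keep a share proportional to 1 − α.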


TotalRank

$t = \int_0^1 x(\alpha) \, d\alpha$

Proposed by Boldi et al. (2005) as a parameter free PageRank.
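One way to approximate this integral numerically is Gauss-Legendre quadrature over α, sketched below for the six-node matrix from the "PageRank details" slide. The quadrature rule and the choice of 32 nodes are assumptions of this example and are not taken from Boldi et al.

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)


def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v for one value of alpha."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)


# Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, 1].
# All mapped nodes lie strictly inside (0, 1), so alpha = 1 is never used.
nodes, weights = np.polynomial.legendre.leggauss(32)
alphas = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# Quadrature approximation of t = integral of x(alpha) over [0, 1].
t = sum(wi * pagerank(a) for wi, a in zip(w, alphas))
```

Every x(α) sums to one and the mapped weights sum to one, so the approximation t is itself a probability vector.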


Generalized PageRank

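PageRank: $(I - \alpha P) x = (1 - \alpha) v$, $x = \sum_{k=0}^{\infty} (1 - \alpha) \alpha^k P^k v$

Generalized PageRank: $y = \sum_{k=0}^{\infty} f(k) P^k v$, $\sum_{k} f(k) < \infty$

TotalRank: $f(k) = \frac{1}{k+1} - \frac{1}{k+2}$

LinearRank: ...

HyperRank: ...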

Baeza-Yates et al. 2006
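The series form can be sketched as a truncated sum with a pluggable weight function. Below, `generalized_pagerank` is a hypothetical helper name, the truncation lengths are arbitrary choices, and the matrix is the six-node example from the "PageRank details" slide. The PageRank weights (1 − α)α^k are checked against the linear-system solution, and the TotalRank weights are the ones obtained by integrating (1 − α)α^k over [0, 1].

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)


def generalized_pagerank(P, v, f, terms):
    """Truncated series: sum of f(k) * P^k v for k = 0 .. terms - 1."""
    y = np.zeros_like(v)
    pk_v = v.copy()              # holds P^k v at step k
    for k in range(terms):
        y += f(k) * pk_v
        pk_v = P @ pk_v
    return y


alpha = 0.85  # assumed damping value for this sketch

# PageRank weights f(k) = (1 - alpha) alpha^k.
x_series = generalized_pagerank(P, v, lambda k: (1 - alpha) * alpha**k, 400)
x_solve = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# TotalRank weights f(k) = 1/(k+1) - 1/(k+2); the tail decays slowly,
# so many more terms are kept.
terms = 5000
t_series = generalized_pagerank(P, v, lambda k: 1 / (k + 1) - 1 / (k + 2), terms)
```

The TotalRank weights telescope, so the mass missing after K terms is exactly 1/(K + 1); that gives a simple handle on the truncation error.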


Pick a distribution

Multiple surfers should have an impact!

Each person picks α from distribution $A$.

Averaging α first gives $x(E[A])$; averaging the PageRank vectors gives $E[x(A)]$.

In general, $x(E[A]) \ne E[x(A)]$.

TotalRank: $E[x(A)]$ with $A \sim U[0, 1]$

Constantine & Gleich, Internet Mathematics, in press.
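The gap between the two averages can be seen with a small Monte Carlo sketch. Here each simulated surfer draws α from a Beta(2, 16) distribution; that distribution, the sample size, and the seed are assumptions chosen only for illustration, and the matrix is the six-node example from the "PageRank details" slide.

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)


def pagerank(alpha):
    """Solve (I - alpha P) x = (1 - alpha) v for one value of alpha."""
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)


# Each surfer picks an alpha; Beta(2, 16) is a hypothetical choice of A.
rng = np.random.default_rng(0)
alphas = rng.beta(2.0, 16.0, size=2000)

x_of_mean = pagerank(alphas.mean())                          # x(E[A])
mean_of_x = np.mean([pagerank(a) for a in alphas], axis=0)   # E[x(A)]

# Largest componentwise difference between the two averages.
gap = np.abs(x_of_mean - mean_of_x).max()
```

Both vectors sum to one, but because x(α) is a nonlinear function of α the two averages do not coincide.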


From users

[Figure: density of raw α over users; horizontal axis "Raw α" from 0.0 to 1.0, vertical axis "density" from 0.0 to 2.0.]

Sample mean μ̄ = 0.631.

Gleich et al., WWW2010

Note: 257,664 users from Microsoft toolbar data


Choosing personalization


Personalization choices

Application specific

- GeneRank: v = normalized microarray weights
- TopicRank: v = pages on the same topic
- TrustRank: v = only pages known to be good
- BadRank: v = only pages known to be bad (and reverse the graph)

Super-personalized

- Set v to have only a single non-zero: $v = e_i$.


Personalized PageRank

$B = (1 - \alpha)(I - \alpha P)^{-1}$

$B_{ij}$ = “personalized score of page $i$ when jumping to page $j$”
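A small sketch of this matrix for the six-node example from the "PageRank details" slide: column j of B should equal the PageRank vector obtained with the jump vector concentrated on page j, and multiplying B by a uniform v should recover ordinary PageRank. The value α = 0.85 and the page chosen for the check are assumptions, and forming an explicit inverse is only reasonable at this toy size.

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
alpha = 0.85  # assumed damping value for this sketch

# B = (1 - alpha) (I - alpha P)^{-1}; column j holds the personalized
# PageRank scores for jumps that always land on page j.
B = (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * P)

# Check one column against a direct solve with v = e_j.
j = 2                              # zero-based index, i.e. page 3
e_j = np.zeros(n)
e_j[j] = 1.0
x_j = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * e_j)

# Any jump vector v gives x = B v; the uniform v recovers plain PageRank.
v = np.full(n, 1.0 / n)
x_uniform = B @ v
x_direct = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
```

Because x = Bv is linear in v, any personalization vector is just a weighted combination of the columns of B.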


Related methods


PageRank history

See Vigna 2010: Spectral Ranking and Franceschet 2010: PageRank: Standing on the shoulders of giants.

Let $A$ be the adjacency matrix of a graph.

PageRank: $(I - \alpha P) x = (1 - \alpha) v$, $(\alpha P + (1 - \alpha) v e^T) x = x$

Seeley 1949: $P x = x$

Wei 1952: $A^T x = x$

Katz 1953: $(I - \alpha A) x = e$

Hubbell 1965: $A^T x = x + v$


Graph centrality

For a graph G, a score assigned to each vertex in V is a centrality score if larger scores indicate “more central” vertices and the score is independent of the labeling on the vertices.
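The labeling condition can be checked numerically for PageRank: relabeling the vertices should only permute the scores. The sketch below uses the six-node matrix from the "PageRank details" slide with an arbitrary permutation; the permutation, α = 0.85, and the helper name `pagerank` are all assumptions made for the illustration.

```python
import numpy as np

# Six-node column-stochastic matrix from the "PageRank details" slide.
P = np.array([
    [1/6, 1/2, 0,   0,   0, 0],
    [1/6, 0,   0,   1/3, 0, 0],
    [1/6, 1/2, 0,   1/3, 0, 0],
    [1/6, 0,   1/2, 0,   0, 0],
    [1/6, 0,   1/2, 1/3, 0, 1],
    [1/6, 0,   0,   0,   1, 0],
])
n = P.shape[0]
v = np.full(n, 1.0 / n)


def pagerank(P, v, alpha):
    """Solve (I - alpha P) x = (1 - alpha) v."""
    size = P.shape[0]
    return np.linalg.solve(np.eye(size) - alpha * P, (1 - alpha) * v)


# Relabel the vertices: new label k refers to old vertex perm[k].
perm = np.array([3, 0, 5, 1, 4, 2])
P_perm = P[np.ix_(perm, perm)]   # permute rows and columns together
v_perm = v[perm]

x = pagerank(P, v, 0.85)
x_perm = pagerank(P_perm, v_perm, 0.85)

# Labeling independence: the relabeled scores are the original scores
# read off in the new vertex order.
max_diff = np.abs(x_perm - x[perm]).max()
```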


Open issues


From Vigna, A history of spectral ranking, MMDS 2010


Other issues


QUESTIONS?