T-Rank: Time-Aware Authority Ranking

Klaus Berberich, Michalis Vazirgiannis, Gerhard Weikum

Max-Planck Institute for Computer Science (Saarbrücken)

WAW 2004, Rome (Italy), 10/16/2004

Outline

Motivation

Objectives

Basics

T-Rank: Time-aware Authority Ranking

Experiments

Conclusions

Ongoing and future work

10/16/2004 WAW 2004: T-Rank: Time-aware Authority Ranking 3/20

Motivation I

Structure of the Web evolves at high pace

25% new links, 8% new pages per week [Nto04]

Page contents change frequently

15% of pages weekly updated [Fet03]

Temporal aspects of the Web’s evolution

How recent is a Web page or a link?

How frequently is a Web page or a link modified?

[Nto04] A. Ntoulas, J. Cho and C. Olston. What's new on the web?: the evolution of the web from a search engine perspective, Proceedings of the 13th conference on World Wide Web, pp. 1-12, 2004. ACM Press

[Fet03] D. Fetterly, M. Manasse, M. Najork and J. Wiener. A large-scale study of the evolution of web pages, Proceedings of the 12th conference on World Wide Web, pp. 669-678, 2003. ACM Press

Motivation II

Link-analysis techniques do not address evolution and temporal aspects

VLDB04 and VLDB05 websites not among top-5 for query “VLDB Conference” in Sep. 04

User’s interest has a temporal dimension (e.g., most recent, last year…)

Objectives

Integration of temporal aspects into link-based authority ranking

Time-aware rankings that reflect the user’s demand for recent information and bring up authorities regarding a temporal interest

Basics – PageRank

PageRank as a baseline

$$ r(y) = (1 - \varepsilon) \cdot \sum_{(x,y) \in E} \frac{r(x)}{\mathrm{outdegree}(x)} + \frac{\varepsilon}{n} $$

Generalization of PageRank

$$ r(y) = (1 - \varepsilon) \cdot \sum_{(x,y) \in E} t(x,y) \cdot r(x) + \varepsilon \cdot s(y) $$

t(x,y) describes transition probabilities

s(y) describes random jump probabilities
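
As a hedged illustration of the generalized recurrence on this slide, the following Python sketch iterates r(y) = (1 - eps) * sum over edges (x,y) of t(x,y) * r(x) + eps * s(y) for a fixed number of rounds. The names `generalized_pagerank` and `plain_pagerank`, the dictionary layout, the toy graph, and the defaults `eps=0.15` and `iters=100` are assumptions made for illustration and are not taken from the talk. Dangling nodes are not handled.

```python
def generalized_pagerank(edges, t, s, eps=0.15, iters=100):
    """Iterate r(y) = (1 - eps) * sum_{(x,y) in E} t(x,y) * r(x) + eps * s(y).

    edges: list of (x, y) pairs
    t:     dict mapping (x, y) to a transition probability
    s:     dict mapping each node to its random jump probability
    Assumption: every node has at least one outgoing edge.
    """
    nodes = list(s)
    r = {v: 1.0 / len(nodes) for v in nodes}  # uniform start vector
    for _ in range(iters):
        nxt = {y: eps * s[y] for y in nodes}  # random jump part
        for x, y in edges:
            nxt[y] += (1.0 - eps) * t[(x, y)] * r[x]  # link-following part
        r = nxt
    return r


def plain_pagerank(edges, nodes, eps=0.15, iters=100):
    """PageRank as the special case t(x,y) = 1/outdegree(x), s(y) = 1/n."""
    outdeg = {v: 0 for v in nodes}
    for x, _ in edges:
        outdeg[x] += 1
    t = {(x, y): 1.0 / outdeg[x] for x, y in edges}
    s = {v: 1.0 / len(nodes) for v in nodes}
    return generalized_pagerank(edges, t, s, eps, iters)


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a
ranks = plain_pagerank(
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")], ["a", "b", "c"]
)
```

Swapping in other dictionaries for `t` and `s` is all that is needed to move from the baseline to a weighted variant, which is the point of the generalization.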

Basics – Evolving graph

Model of evolving graph G(V,E)

Nodes and edges temporally annotated

TS_Creation: creation time

TS_Deletion: deletion time

TS_Modifications: set of modification times

TS_Lastmod: last modification time

Time represented by integers (e.g., days since reference date)

Basics – Temporal interest

Temporal interest defined by

temporal window of interest [TS_Origin, TS_End]

tolerance interval [t_1, t_2]: t_1 ≤ TS_Origin ≤ TS_End ≤ t_2

Graph with respect to the temporal interest G_ti(V,E) contains nodes and edges with TS_Deletion ≥ t_1 ∧ TS_Creation ≤ t_2
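
A minimal sketch of how the temporally annotated graph might be restricted to a temporal interest. Everything named here (`Annotation`, `restrict_to_interest`, the use of infinity for "never deleted", the fallback of the last modification time to the creation time) is an assumption for illustration. The comparison operators (deletion at or after t1, creation at or before t2) are also assumed, as is the rule that an edge is kept only if both of its endpoints are kept.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Temporal annotation of a node or an edge; times are integers."""
    created: int
    deleted: float = math.inf  # assumption: never deleted = infinity
    modifications: list = field(default_factory=list)

    @property
    def last_modified(self):
        # Assumption: without modifications, fall back to the creation time.
        return max(self.modifications, default=self.created)


def restrict_to_interest(nodes, edges, t1, t2):
    """Keep elements that exist somewhere inside the tolerance interval [t1, t2]."""
    def alive(a):
        return a.deleted >= t1 and a.created <= t2

    kept_nodes = {v: a for v, a in nodes.items() if alive(a)}
    # Assumption: an edge survives only if both endpoints survive.
    kept_edges = {
        (x, y): a
        for (x, y), a in edges.items()
        if alive(a) and x in kept_nodes and y in kept_nodes
    }
    return kept_nodes, kept_edges


# Hypothetical data: node b and edge (a, b) are deleted before the interval starts.
nodes = {"a": Annotation(0), "b": Annotation(5, deleted=8), "c": Annotation(30)}
edges = {("a", "b"): Annotation(5, deleted=8), ("a", "c"): Annotation(30)}
v_ti, e_ti = restrict_to_interest(nodes, edges, t1=10, t2=40)
```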

Basics – Freshness

Freshness measures relevance of a timestamp ts to a temporal interest

Freshness of node x: f(x) = f(TS_Lastmod(x))

Freshness of edge (x,y): f(x,y) = f(TS_Lastmod(x,y))

$$
f(ts) =
\begin{cases}
1 & \text{if } TS_{Origin} \le ts \le TS_{End} \\
\frac{1 - e}{TS_{Origin} - t_1} \cdot (ts - t_1) + e & \text{if } t_1 \le ts < TS_{Origin} \\
\frac{1 - e}{t_2 - TS_{End}} \cdot (TS_{End} - ts) + 1 & \text{if } TS_{End} < ts \le t_2 \\
e & \text{otherwise}
\end{cases}
$$
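
The piecewise freshness definition can be sketched in a few lines of Python: freshness is 1 inside the temporal window, falls linearly to e at the bounds of the tolerance interval, and is e everywhere else. The function name `freshness`, the argument order, and the default `e=0.1` are assumptions for illustration; the talk does not give a value for e, which is treated here as a minimum freshness between 0 and 1.

```python
def freshness(ts, origin, end, t1, t2, e=0.1):
    """Freshness of timestamp ts for window [origin, end] and tolerance [t1, t2].

    Assumes t1 <= origin <= end <= t2 and 0 < e < 1 (e is a hypothetical default).
    """
    if origin <= ts <= end:
        return 1.0  # inside the temporal window of interest
    if t1 <= ts < origin:
        # rises linearly from e at t1 to 1 at origin
        return (1 - e) / (origin - t1) * (ts - t1) + e
    if end < ts <= t2:
        # falls linearly from 1 at end to e at t2
        return (1 - e) / (t2 - end) * (end - ts) + 1
    return e  # outside the tolerance interval


# Hypothetical interest: window [40, 60], tolerance interval [20, 80]
samples = [freshness(ts, 40, 60, 20, 80, e=0.2) for ts in (10, 30, 50, 70, 90)]
```

Because the two linear branches meet the window at value 1 and the tolerance bounds at value e, the function is continuous, so a small shift in a timestamp never causes a jump in freshness.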

Basics – Activity

Activity measures frequency of change with respect to a temporal interest

Activity of node x: a(x) = a(TS_Modifications(x))

Activity of edge (x,y): a(x,y) = a(TS_Modifications(x,y))

$$
a(TS) =
\begin{cases}
\sum_{ts \in TS,\; t_1 \le ts \le t_2} f(ts) & \text{if } TS \cap [t_1, t_2] \neq \emptyset \\
e & \text{otherwise}
\end{cases}
$$
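
A small sketch of the activity measure under the same assumptions as the freshness definition: activity sums the freshness of all modification times that fall inside the tolerance interval and falls back to e when there are none. The helper `make_freshness`, the function name `activity`, the sample timestamps, and `e=0.1` are hypothetical choices for illustration.

```python
def make_freshness(origin, end, t1, t2, e=0.1):
    """Return f(ts) for a fixed temporal interest (piecewise linear, minimum e)."""
    def f(ts):
        if origin <= ts <= end:
            return 1.0
        if t1 <= ts < origin:
            return (1 - e) / (origin - t1) * (ts - t1) + e
        if end < ts <= t2:
            return (1 - e) / (t2 - end) * (end - ts) + 1
        return e
    return f


def activity(mod_times, f, t1, t2, e=0.1):
    """Sum the freshness of modifications inside [t1, t2]; e if there are none."""
    inside = [ts for ts in mod_times if t1 <= ts <= t2]
    if not inside:
        return e
    return sum(f(ts) for ts in inside)


f = make_freshness(origin=40, end=60, t1=20, t2=80, e=0.1)
busy = activity([45, 50, 55, 70], f, t1=20, t2=80)  # four relevant modifications
idle = activity([5, 90], f, t1=20, t2=80)           # nothing inside [20, 80]
```

Note that activity is not bounded by 1: an element modified many times inside the window accumulates a larger value, which is what lets it express frequency of change.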

T-Rank – Overview

Modified PageRank on G_ti(V,E)

Transition probabilities t(x,y) depend on freshness of nodes and edges

Random jump probabilities depend on freshness and activity of nodes and edges

T-Rank – Transitions

Transitions favor fresh nodes/edges

Coefficients w_ti: probabilities that random surfer follows (x,y) with probabilities proportional to

freshness of node y

freshness of edge (x,y)

average (mean) freshness of incoming edges of node y

$$
t(x,y) = w_{t1} \cdot \frac{f(y)}{\sum_{(x,z) \in E} f(z)}
       + w_{t2} \cdot \frac{f(x,y)}{\sum_{(x,z) \in E} f(x,z)}
       + w_{t3} \cdot \frac{\mathrm{avg}\{ f(\nu,y) \mid (\nu,y) \in E \}}{\sum_{(x,w) \in E} \mathrm{avg}\{ f(\nu,w) \mid (\nu,w) \in E \}}
$$

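To make the three-part transition probability concrete, here is a hedged Python sketch that computes t(x,y) from precomputed freshness values: each part is normalized over the outgoing edges of x and the parts are mixed with the coefficients w_t1, w_t2, w_t3. The function name `transition_probs`, the equal default weights, and the toy freshness values are assumptions for illustration. The sketch also assumes all freshness values are strictly positive, so no denominator is zero.

```python
from collections import defaultdict


def transition_probs(edges, f_node, f_edge, w=(1 / 3, 1 / 3, 1 / 3)):
    """Compute t(x,y) for every edge; the weights w are assumed to sum to 1."""
    incoming = defaultdict(list)  # y -> freshness values of edges ending in y
    targets = defaultdict(list)   # x -> successors of x
    for x, y in edges:
        incoming[y].append(f_edge[(x, y)])
        targets[x].append(y)
    avg_in = {y: sum(vals) / len(vals) for y, vals in incoming.items()}

    t = {}
    for x, succ in targets.items():
        # Normalizers: sums over all outgoing edges (x, z) of x.
        z_node = sum(f_node[z] for z in succ)
        z_edge = sum(f_edge[(x, z)] for z in succ)
        z_avg = sum(avg_in[z] for z in succ)
        for y in succ:
            t[(x, y)] = (
                w[0] * f_node[y] / z_node
                + w[1] * f_edge[(x, y)] / z_edge
                + w[2] * avg_in[y] / z_avg
            )
    return t


# Hypothetical freshness values for a three-node graph.
edges = [("a", "b"), ("a", "c"), ("b", "c")]
f_node = {"a": 1.0, "b": 0.5, "c": 1.0}
f_edge = {("a", "b"): 0.2, ("a", "c"): 0.8, ("b", "c"): 1.0}
t = transition_probs(edges, f_node, f_edge)
```

Since each of the three fractions sums to 1 over the successors of x, the outgoing probabilities of every node sum to 1 whenever the coefficients do.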

T-Rank – Random jumps

Random jumps favor fresh and active nodes/edges

Coefficients w_si: probabilities that random surfer jumps to node y with probabilities proportional to

freshness and activity of node y

average (mean) freshness and activity of incoming edges of node y

$$
s(y) = w_{s1} \cdot \frac{f(y)}{\sum_{z \in V} f(z)}
     + w_{s2} \cdot \frac{a(y)}{\sum_{z \in V} a(z)}
     + w_{s3} \cdot \frac{\mathrm{avg}\{ f(\nu,y) \mid (\nu,y) \in E \}}{\sum_{z \in V} \mathrm{avg}\{ f(w,z) \mid (w,z) \in E \}}
     + w_{s4} \cdot \frac{\mathrm{avg}\{ a(\nu,y) \mid (\nu,y) \in E \}}{\sum_{z \in V} \mathrm{avg}\{ a(w,z) \mid (w,z) \in E \}}
$$
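
The random jump distribution can be sketched in the same style: four components (node freshness, node activity, mean freshness of incoming edges, mean activity of incoming edges) are each normalized over all nodes and mixed with the coefficients w_s1 to w_s4. The function name `random_jump_probs`, the equal default weights, and the sample values are hypothetical. Using e as the value for nodes without incoming edges is an assumption of this sketch, since the slide does not say how that case is treated.

```python
def random_jump_probs(nodes, edges, f_node, a_node, f_edge, a_edge,
                      w=(0.25, 0.25, 0.25, 0.25), e=0.1):
    """Compute s(y) for every node; the weights w are assumed to sum to 1."""
    def avg_incoming(values, y):
        vals = [values[(x, z)] for (x, z) in edges if z == y]
        # Assumption: nodes without incoming edges get the minimum value e.
        return sum(vals) / len(vals) if vals else e

    components = [
        {y: f_node[y] for y in nodes},
        {y: a_node[y] for y in nodes},
        {y: avg_incoming(f_edge, y) for y in nodes},
        {y: avg_incoming(a_edge, y) for y in nodes},
    ]
    s = {y: 0.0 for y in nodes}
    for weight, comp in zip(w, components):
        total = sum(comp.values())  # normalizer over all nodes z in V
        for y in nodes:
            s[y] += weight * comp[y] / total
    return s


# Hypothetical values: node c is the freshest and most active.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
f_node = {"a": 0.1, "b": 0.5, "c": 1.0}
a_node = {"a": 0.1, "b": 1.5, "c": 3.0}
f_edge = {("a", "b"): 0.2, ("a", "c"): 0.8, ("b", "c"): 1.0}
a_edge = {("a", "b"): 0.1, ("a", "c"): 2.0, ("b", "c"): 1.0}
s = random_jump_probs(nodes, edges, f_node, a_node, f_edge, a_edge)
```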

Experiments – DBLP I

Digital Bibliography & Library Project (DBLP): freely available bibliographic dataset (as XML)

Evolving graph derived from DBLP

Authors as nodes, citations as edges

~350K (~16K) nodes, ~350K edges

T-Rank and PageRank applied for temporal interests on decades (70s to 00s)

Experiments – DBLP II

Rank  PageRank 2000s         T-Rank 2000s
1     E. F. Codd             Jim Gray
2     Michael Stonebraker    Michael Stonebraker
3     Jim Gray               Jeffrey D. Ullman
4     Donald D. Chamberlin   Philip A. Bernstein
5     Jeffrey D. Ullman      Hector Garcia-Molina
6     Philip A. Bernstein    Jeffrey F. Naughton
7     Raymond A. Lorie       Donald D. Chamberlin
8     Morton M. Astrahan     David J. DeWitt
9     Kapali P. Eswaran      Jennifer Widom
10    John Miles Smith       Rakesh Agrawal

Experiments – Web I

Olympic Games 2004

~200K thematically related Web pages

9 crawls in period July 26th to September 1st

Blind test comparing PageRank and T-Rank

Users asked to grade quality of given top-10 lists

Half of the queries drawn from Google Zeitgeist

Experiments – Web II

[Figure: bar chart of the aggregated grade (axis from 0 to 1.2) for PageRank and T-Rank on the queries "summer olympics*", "olympics torch relay", "ian thorpe*", "athens olympic travel guide", "olympics schedule*", and "athens olympic venues"]

Conclusions

Integration of temporal aspects into link-based authority ranking produces time-aware rankings

Time-aware authority rankings bring up authorities with respect to a temporal interest

Experimental results show that users prefer time-aware authority rankings for most queries

Ongoing and future work

Lightweight version of T-Rank

Trend-based Authority Ranking techniques

Which pages had the largest relative gains in authority?

Automatic parameter tuning based on properties of the dataset and user input

Online computation of time-aware rankings

Thank you for your attention!

Questions and feedback are welcome!

Prototype implementation

Java implementation (J2SE 1.4.3)

Oracle 9i used for storage of data

Application-independent, since based on database relations capturing evolving graph

Bingo! focused crawler collects Web data

Power method (multi-threaded)

Compressed Row Storage
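
As a rough illustration of the last two items, the sketch below stores a transition matrix in Compressed Row Storage (three flat arrays: values, column indices, row pointers) and runs the power method on it. It is a single-threaded Python sketch, not the multi-threaded Java prototype; the function names `to_crs`, `crs_matvec`, and `power_method`, the tolerance, and the two-node example are all assumptions for illustration. Row i of the matrix holds the probabilities of moving into node i, so one matrix-vector product performs the link-following step.

```python
def to_crs(n, entries):
    """Build CRS arrays from {(row, col): value} for an n x n matrix."""
    by_row = [[] for _ in range(n)]
    for (i, j), v in entries.items():
        by_row[i].append((j, v))
    values, col_idx, row_ptr = [], [], [0]
    for row in by_row:
        for j, v in sorted(row):
            values.append(v)
            col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in the flat arrays
    return values, col_idx, row_ptr


def crs_matvec(crs, x):
    """Multiply a CRS matrix with a dense vector x."""
    values, col_idx, row_ptr = crs
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def power_method(crs, s, eps=0.15, tol=1e-10, max_iters=1000):
    """Iterate r = (1 - eps) * M r + eps * s until the L1 change drops below tol."""
    n = len(s)
    r = [1.0 / n] * n
    for _ in range(max_iters):
        mr = crs_matvec(crs, r)
        nxt = [(1 - eps) * m + eps * sj for m, sj in zip(mr, s)]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Hypothetical two-node graph 0 <-> 1 with all random jumps going to node 0.
crs = to_crs(2, {(1, 0): 1.0, (0, 1): 1.0})
r = power_method(crs, s=[1.0, 0.0], eps=0.5)
```

The row-pointer layout keeps memory proportional to the number of edges rather than to the square of the number of nodes, which matters for graphs with hundreds of thousands of nodes.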

