T-Rank: Time-Aware Authority Ranking
Klaus Berberich, Michalis Vazirgiannis, Gerhard Weikum
Max-Planck Institute for Computer Science (Saarbrücken)
WAW 2004, Rome (Italy), 10/16/2004
Outline
Motivation
Objectives
Basics
T-Rank: Time-aware Authority Ranking
Experiments
Conclusions
Ongoing and future work
10/16/2004 WAW 2004: T-Rank: Time-aware Authority Ranking 3/20
Motivation I
Structure of the Web evolves at high pace
25% new links, 8% new pages per week [Nto04]
Page contents change frequently
15% of pages updated weekly [Fet03]
Temporal aspects of the Web’s evolution
How recent is a Web page or a link?
How frequently is a Web page or a link modified?
[Nto04] A. Ntoulas, J. Cho and C. Olston. What's new on the web?: the evolution of the web from a search engine perspective, Proceedings of the 13th conference on World Wide Web, pp. 1-12, 2004. ACM Press
[Fet03] D. Fetterly, M. Manasse, M. Najork and J. Wiener. A large-scale study of the evolution of web pages, Proceedings of the 12th conference on World Wide Web, pp. 669-678, 2003. ACM Press
Motivation II
Link-analysis techniques do not address evolution and temporal aspects
VLDB04 and VLDB05 websites not among top-5 results for query “VLDB Conference” in Sep. 04
Users’ interests have a temporal dimension (e.g., most recent, last year, …)
Objectives
Integration of temporal aspects into link-based authority ranking
Time-aware rankings that
reflect the user’s demand for recent information
bring up authorities regarding a temporal interest
Basics – PageRank
PageRank as a baseline
$$ r(y) = (1-\epsilon) \cdot \sum_{(x,y) \in E} \frac{r(x)}{\mathrm{outdegree}(x)} + \frac{\epsilon}{n} $$
Generalization of PageRank
$$ r(y) = (1-\epsilon) \cdot \sum_{(x,y) \in E} t(x,y) \cdot r(x) + \epsilon \cdot s(y) $$
t(x,y) describes transition probabilities
s(y) describes random jump probabilities
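As a minimal sketch of how the generalized formula can be evaluated, the Python power iteration below computes r(y) for arbitrary transition probabilities t(x,y) and random-jump probabilities s(y). The function name, the dictionary-based graph representation and the default ε = 0.15 are illustrative assumptions that do not come from the slides. The sketch also assumes every node has at least one outgoing edge, so no rank mass is lost.

```python
def generalized_pagerank(nodes, edges, t, s, eps=0.15, iterations=200, tol=1e-12):
    """Power iteration for r(y) = (1 - eps) * sum_{(x,y) in E} t(x,y) r(x) + eps * s(y).

    nodes: list of node ids
    edges: list of (x, y) pairs
    t:     dict (x, y) -> transition probability (out-probabilities of each x sum to 1)
    s:     dict y -> random-jump probability (sums to 1 over all nodes)
    eps:   random-jump probability (0.15 is an assumed default, not from the slides)
    """
    r = {v: 1.0 / len(nodes) for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # random-jump contribution eps * s(y)
        new_r = {v: eps * s[v] for v in nodes}
        # link contribution (1 - eps) * t(x, y) * r(x) for every edge (x, y)
        for (x, y) in edges:
            new_r[y] += (1 - eps) * t[(x, y)] * r[x]
        delta = sum(abs(new_r[v] - r[v]) for v in nodes)
        r = new_r
        if delta < tol:  # stop once the L1 change is negligible
            break
    return r
```

Choosing t(x,y) = 1/outdegree(x) and s(y) = 1/n recovers the PageRank baseline. T-Rank keeps the same iteration and only replaces t and s with time-aware values.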
Basics – Evolving graph
Model of evolving graph G(V,E)
Nodes and edges temporally annotated
TS_Creation: creation time
TS_Deletion: deletion time
TS_Modifications: set of modification times
TS_Lastmod: last modification time
Time represented by integers (e.g., days since reference date)
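The temporal annotations can be held in a small record per node or edge. The Python sketch below is illustrative only: the class name, the choice of REFERENCE_DATE, the use of None for "not deleted" and the fallback of TS_Lastmod to the creation time are all assumptions, not details from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical reference date; the slides only say "days since reference date".
REFERENCE_DATE = date(2000, 1, 1)


def to_day(d: date) -> int:
    """Encode a calendar date as an integer: days since REFERENCE_DATE."""
    return (d - REFERENCE_DATE).days


@dataclass
class TemporalAnnotation:
    """Timestamps attached to a node or an edge of the evolving graph."""
    creation: int                         # TS_Creation
    deletion: Optional[int] = None        # TS_Deletion; None = not deleted (assumption)
    modifications: List[int] = field(default_factory=list)  # TS_Modifications

    @property
    def lastmod(self) -> int:
        """TS_Lastmod; falls back to TS_Creation if never modified (assumption)."""
        return max(self.modifications, default=self.creation)
```

For example, `TemporalAnnotation(creation=to_day(date(2004, 7, 26)))` describes a page first seen on July 26th, 2004 that has not been modified since.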
Basics – Temporal interest
Temporal interest defined by
temporal window of interest [TS_Origin, TS_End]
tolerance interval [t1, t2]: t1 ≤ TS_Origin ≤ TS_End ≤ t2
Graph with respect to the temporal interest G_ti(V,E) contains nodes and edges with TS_Deletion ≥ t1 and TS_Creation ≤ t2
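Building G_ti amounts to keeping every node and edge whose lifetime overlaps the tolerance interval, that is, TS_Deletion ≥ t1 and TS_Creation ≤ t2. The Python sketch below assumes lifetimes are given as (TS_Creation, TS_Deletion) pairs with None meaning "never deleted". It also drops edges whose endpoints were filtered out, which is an added assumption to keep the graph well-formed. The function names are hypothetical.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

# (TS_Creation, TS_Deletion); None as deletion time means "never deleted" (assumption)
Lifetime = Tuple[int, Optional[int]]


def _overlaps(span: Lifetime, t1: int, t2: int) -> bool:
    """True iff TS_Deletion >= t1 and TS_Creation <= t2."""
    creation, deletion = span
    return (deletion is None or deletion >= t1) and creation <= t2


def restrict_to_interest(node_spans: Dict[Hashable, Lifetime],
                         edge_spans: Dict[Tuple[Hashable, Hashable], Lifetime],
                         t1: int, t2: int):
    """Return the node and edge sets of G_ti for tolerance interval [t1, t2]."""
    nodes: Set[Hashable] = {v for v, span in node_spans.items()
                            if _overlaps(span, t1, t2)}
    # Assumption: an edge is kept only if both of its endpoints survive as well.
    edges = {(x, y) for (x, y), span in edge_spans.items()
             if _overlaps(span, t1, t2) and x in nodes and y in nodes}
    return nodes, edges
```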
Basics – Freshness
Freshness measures relevance of a timestamp ts to a temporal interest
Freshness of node x: f(x) = f(TS_Lastmod(x))
Freshness of edge (x,y): f(x,y) = f(TS_Lastmod(x,y))
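$$ f(ts) = \begin{cases} 1 & \text{if } TS_{Origin} \le ts \le TS_{End} \\ e + (1-e) \cdot \dfrac{ts - t_1}{TS_{Origin} - t_1} & \text{if } t_1 \le ts < TS_{Origin} \\ 1 + (e-1) \cdot \dfrac{ts - TS_{End}}{t_2 - TS_{End}} & \text{if } TS_{End} < ts \le t_2 \\ e & \text{otherwise} \end{cases} $$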
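A direct Python transcription of a piecewise freshness function of this kind is sketched below. It assumes the following reading: f is 1 inside [TS_Origin, TS_End]; it changes linearly between the constant e (reached at t1 and t2) and 1 (at the window borders) over the tolerance margins; and it equals e outside [t1, t2]. This reading is an assumption. The function and parameter names are hypothetical, and e is treated as a constant in (0, 1].

```python
def freshness(ts: float, origin: float, end: float,
              t1: float, t2: float, e: float) -> float:
    """Freshness of timestamp ts for the window [origin, end] = [TS_Origin, TS_End]
    with tolerance interval [t1, t2], t1 <= origin <= end <= t2.
    e is the (assumed) freshness floor outside the tolerance interval."""
    if origin <= ts <= end:
        return 1.0  # inside the temporal window of interest
    if t1 <= ts < origin:
        # left margin: rises linearly from e at t1 to 1 at TS_Origin
        return e + (1 - e) * (ts - t1) / (origin - t1)
    if end < ts <= t2:
        # right margin: falls linearly from 1 at TS_End to e at t2
        return 1 + (e - 1) * (ts - end) / (t2 - end)
    return e  # outside the tolerance interval
```

The branch guards also prevent division by zero: if t1 == TS_Origin (or TS_End == t2), the corresponding margin is empty and its branch is never taken.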
Basics – Activity
Activity measures frequency of change with respect to a temporal interest
Activity of node x: a(x) = a(TS_Modifications(x))
Activity of edge (x,y): a(x,y) = a(TS_Modifications(x,y))
$$ a(TS) = \begin{cases} \sum_{ts \in TS} f(ts) & \text{if } TS \cap [t_1, t_2] \neq \emptyset \\ e & \text{otherwise} \end{cases} $$
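The activity of a set of modification timestamps can be sketched as follows. The sketch assumes that a(TS) sums the freshness of all modification timestamps whenever at least one of them falls into [t1, t2], and returns the constant e otherwise. The freshness function is passed in as a callable. The function name and signature are hypothetical.

```python
from typing import Callable, Iterable


def activity(modifications: Iterable[float], t1: float, t2: float,
             fresh: Callable[[float], float], e: float) -> float:
    """a(TS): sum of fresh(ts) over all modification timestamps TS if any of them
    lies in the tolerance interval [t1, t2]; otherwise the constant e (assumption)."""
    timestamps = list(modifications)  # allow any iterable, traversed twice below
    if any(t1 <= ts <= t2 for ts in timestamps):
        return sum(fresh(ts) for ts in timestamps)
    return e
```

Because each modification contributes its freshness, a page changed often within the temporal window scores higher than one changed once, which matches activity as a frequency-of-change measure.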
T-Rank – Overview
Modified PageRank on G_ti(V,E)
Transition probabilities t(x,y) depend on freshness of nodes and edges
Random jump probabilities depend on freshness and activity of nodes and edges
T-Rank – Transitions
Transitions favor fresh nodes/edges
Coefficients w_ti: with probability w_ti, the random surfer follows (x,y) with probability proportional to the i-th of
freshness of node y
freshness of edge (x,y)
average (mean) freshness of incoming edges of node y
$$
\begin{aligned}
t(x,y) = {} & w_{t_1} \cdot \frac{f(y)}{\sum_{(x,z) \in E} f(z)} + w_{t_2} \cdot \frac{f(x,y)}{\sum_{(x,z) \in E} f(x,z)} \\
& + w_{t_3} \cdot \frac{\mathrm{avg}\{ f(u,y) \mid (u,y) \in E \}}{\sum_{(x,w) \in E} \mathrm{avg}\{ f(u,w) \mid (u,w) \in E \}}
\end{aligned}
$$
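The transition formula can be computed per source node x by normalizing each of the three freshness scores over the out-edges of x. The Python sketch below assumes node and edge freshness values are precomputed and strictly positive, that the edge list has no duplicates, and equal weights w_t1 = w_t2 = w_t3 = 1/3 by default. The weights are an illustrative choice, not values from the slides. Since the weights sum to 1, the out-probabilities of every node sum to 1.

```python
from collections import defaultdict
from statistics import mean


def transition_probs(edges, f_node, f_edge, w=(1/3, 1/3, 1/3)):
    """t(x,y) = w1 f(y)/sum f(z) + w2 f(x,y)/sum f(x,z) + w3 avg_in(y)/sum avg_in(z),
    with all sums over the out-edges (x, z) of x, and avg_in(y) the mean freshness
    of the edges entering y. Equal default weights are an assumption."""
    incoming = defaultdict(list)  # y -> freshness values of edges (u, y)
    out = defaultdict(list)       # x -> targets z of edges (x, z)
    for (x, y) in edges:
        incoming[y].append(f_edge[(x, y)])
        out[x].append(y)
    avg_in = {y: mean(fs) for y, fs in incoming.items()}

    w1, w2, w3 = w
    t = {}
    for x, targets in out.items():
        sum_node = sum(f_node[z] for z in targets)
        sum_edge = sum(f_edge[(x, z)] for z in targets)
        sum_avg = sum(avg_in[z] for z in targets)
        for y in targets:
            t[(x, y)] = (w1 * f_node[y] / sum_node
                         + w2 * f_edge[(x, y)] / sum_edge
                         + w3 * avg_in[y] / sum_avg)
    return t
```

Every target y of an out-edge has at least one incoming edge (the edge itself), so avg_in[y] is always defined.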
T-Rank – Random jumps
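$$
\begin{aligned}
s(y) = {} & w_{s_1} \cdot \frac{f(y)}{\sum_{z \in V} f(z)} + w_{s_2} \cdot \frac{a(y)}{\sum_{z \in V} a(z)} \\
& + w_{s_3} \cdot \frac{\mathrm{avg}\{ f(u,y) \mid (u,y) \in E \}}{\sum_{z \in V} \mathrm{avg}\{ f(w,z) \mid (w,z) \in E \}} + w_{s_4} \cdot \frac{\mathrm{avg}\{ a(u,y) \mid (u,y) \in E \}}{\sum_{z \in V} \mathrm{avg}\{ a(w,z) \mid (w,z) \in E \}}
\end{aligned}
$$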
Random jumps favor fresh and active nodes/edges
Coefficients w_si: with probability w_si, the random surfer jumps to node y with probability proportional to the i-th of
freshness of node y, activity of node y
average (mean) freshness of incoming edges of node y, average (mean) activity of incoming edges of node y
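The random-jump vector mixes four scores, each normalized over all nodes z ∈ V. The Python sketch below assumes precomputed node and edge freshness and activity values, and equal default weights w_s1 = … = w_s4 = 1/4. It treats a node without incoming edges as having average 0, and assumes each of the four totals is positive. These are illustrative assumptions, not choices stated in the slides.

```python
from collections import defaultdict
from statistics import mean


def jump_probs(nodes, edges, f_node, a_node, f_edge, a_edge,
               w=(0.25, 0.25, 0.25, 0.25)):
    """s(y) = sum_i w_si * score_i(y) / sum_{z in V} score_i(z) for the four scores:
    node freshness, node activity, and mean freshness / activity of in-edges."""
    in_f, in_a = defaultdict(list), defaultdict(list)
    for (x, y) in edges:
        in_f[y].append(f_edge[(x, y)])
        in_a[y].append(a_edge[(x, y)])
    # Assumption: a node with no incoming edges gets average 0.
    avg_f = {v: mean(in_f[v]) if in_f[v] else 0.0 for v in nodes}
    avg_a = {v: mean(in_a[v]) if in_a[v] else 0.0 for v in nodes}

    scores = [f_node, a_node, avg_f, avg_a]
    totals = [sum(sc[z] for z in nodes) for sc in scores]  # assumed > 0
    return {y: sum(wi * sc[y] / tot for wi, sc, tot in zip(w, scores, totals))
            for y in nodes}
```

Because each score is normalized to sum to 1 and the weights sum to 1, the result is a probability distribution over V.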
Experiments – DBLP I
Digital Bibliography & Library Project (DBLP): freely available bibliographic dataset (as XML)
Evolving graph derived from DBLP
Authors as nodes, citations as edges
~350K (~16K) nodes, ~350K edges
T-Rank and PageRank applied to decade-long temporal interests (70s to 00s)
Experiments – DBLP II
Rank | T-Rank 2000s | PageRank 2000s
1 | Jim Gray | E. F. Codd
2 | Michael Stonebraker | Michael Stonebraker
3 | Jeffrey D. Ullman | Jim Gray
4 | Philip A. Bernstein | Donald D. Chamberlin
5 | Hector Garcia-Molina | Jeffrey D. Ullman
6 | Jeffrey F. Naughton | Philip A. Bernstein
7 | Donald D. Chamberlin | Raymond A. Lorie
8 | David J. DeWitt | Morton M. Astrahan
9 | Jennifer Widom | Kapali P. Eswaran
10 | Rakesh Agrawal | John Miles Smith
Experiments – Web I
Olympic Games 2004
~200K thematically related Web pages
9 crawls in period July 26th to September 1st
Blind test comparing PageRank and T-Rank
Users asked to grade quality of given top-10 lists
Half of the queries drawn from Google Zeitgeist
Experiments – Web II
[Figure: aggregated grade (y-axis, 0 to 1.2) of PageRank vs. T-Rank per query: summer olympics*, olympics torch relay, ian thorpe*, athens olympic travel guide, olympics schedule*, athens olympic venues]
Conclusions
Integration of temporal aspects into link-based authority ranking produces time-aware rankings
Time-aware authority rankings bring up authorities with respect to a temporal interest
Experimental results show that users prefer time-aware authority rankings for most queries
Ongoing and future work
Lightweight version of T-Rank
Trend-based authority ranking techniques
Which pages had the largest relative gains in authority?
Automatic parameter tuning based on properties of the dataset and user input
Online computation of time-aware rankings
Thank you for your attention!
Questions and feedback are welcome!
Prototype implementation
Java implementation (J2SE 1.4.3)
Oracle 9i used for storage of data
Application-independent, since based on database relations capturing the evolving graph
Bingo! focused crawler collects Web data
Power method (multi-threaded)
Compressed Row Storage
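The prototype evaluates the ranking with a multi-threaded power method over a Compressed Row Storage (CRS) matrix. The single-threaded Python sketch below only illustrates the idea and is not the Java prototype. It stores the transposed transition matrix in CRS form (row y lists the in-links x -> y with weight t(x,y)), so each power-method step is one sparse matrix-vector product followed by the random-jump term. The function names and ε = 0.15 are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def to_crs(n: int, entries: List[Tuple[int, int, float]]):
    """Build CRS arrays (row_ptr, col_idx, values) for an n x n sparse matrix
    from (row, col, value) triples."""
    entries = sorted(entries)          # group entries by row, then column
    row_ptr = [0] * (n + 1)
    for r, _, _ in entries:
        row_ptr[r + 1] += 1            # count entries per row
    for i in range(n):
        row_ptr[i + 1] += row_ptr[i]   # prefix sums give row start offsets
    col_idx = [c for _, c, _ in entries]
    values = [v for _, _, v in entries]
    return row_ptr, col_idx, values


def crs_matvec(row_ptr, col_idx, values, vec: Sequence[float]) -> List[float]:
    """Sparse matrix-vector product y = M * vec using the CRS arrays."""
    return [sum(values[k] * vec[col_idx[k]]
                for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def power_method(n: int, transitions: List[Tuple[int, int, float]],
                 s: Sequence[float], eps: float = 0.15, iterations: int = 100):
    """transitions: (x, y, t(x,y)) triples over nodes 0..n-1; s: jump vector."""
    # Store the transpose: row y holds the in-links (x -> y) with weight t(x, y).
    row_ptr, col_idx, values = to_crs(n, [(y, x, p) for (x, y, p) in transitions])
    r = [1.0 / n] * n
    for _ in range(iterations):
        inflow = crs_matvec(row_ptr, col_idx, values, r)
        r = [(1 - eps) * inflow[y] + eps * s[y] for y in range(n)]
    return r
```

CRS keeps only the non-zero transitions, so memory and per-iteration work grow with the number of edges rather than with n².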