WebKDD 2001 Aristotle University of Thessaloniki 1
Effective Prediction of Web-user Accesses: A Data Mining Approach
Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece
Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.
Introduction (1/2)
• Web Prefetching: Deducing forthcoming user accesses based on log information
• Focus on:
  – Predictive prefetching (use of history)
  – Server initiated (server makes predictions and piggybacks them to the clients)
Introduction (2/2)
• Within a site, users navigate by following links [5]
• For server-initiated predictive prefetching, the access patterns of interest are those that reflect this behavior
Outline
• Motivation & Related work
• Proposed method
• Comparative performance evaluation
• Conclusions
Requirements
• Site structure and contents impose:
  1. The order of dependencies (first or higher) among the documents
  2. The interleaving of documents belonging to patterns with random visits (noise)
• Discovered patterns should respect these factors
Related work
• Dependency graph (DG) [9]
  – A graph maintains pairwise accesses
• Prediction by Partial Match (PPM) [10]
  – A trie maintains sequences of consecutive accesses
• LBOT [6]
  – Special form of association rules of length 2
• Others (variations of the above) [3,11]
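A minimal Python sketch of the bookkeeping behind DG and PPM, under stated assumptions: `window` and `max_order` are hypothetical parameters, and the PPM trie is flattened into a dictionary keyed by access tuples. It illustrates the two ideas only and is not the implementation from [9] or [10].

```python
from collections import Counter

def dependency_graph(sessions, window=2):
    """DG-style counts: ordered pairs (a, b) where b follows a
    within `window` accesses (assumed lookahead window)."""
    arcs = Counter()
    for s in sessions:
        for i, a in enumerate(s):
            for b in s[i + 1 : i + 1 + window]:
                if a != b:
                    arcs[(a, b)] += 1
    return arcs

def ppm_contexts(sessions, max_order=2):
    """PPM-style counts: every run of consecutive accesses of
    length 2 .. max_order + 1, stored as a tuple (a trie path)."""
    counts = Counter()
    for s in sessions:
        for n in range(2, max_order + 2):
            for i in range(len(s) - n + 1):
                counts[tuple(s[i : i + n])] += 1
    return counts

# Hypothetical sessions: X is a random visit between A and B.
sessions = [["A", "X", "B"], ["A", "B", "C"]]
dg = dependency_graph(sessions, window=2)
ppm = ppm_contexts(sessions, max_order=2)
```

In this toy input the pairwise structure counts A followed by B in both sessions despite the intervening X, but it keeps only pairs. The consecutive-sequence structure keeps the longer run A, B, C, yet sees A followed by B only once.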
Motivation

Method     Order (1st Req.)   Noise (2nd Req.)
DG         No                 Yes
PPM        Yes                No
LBOT       No                 No
Proposed   Yes                Yes
Proposed Method (1)
• Novel Web log mining algorithm (WMo)
  – Apriori-like
  – Effective
    • Immune to noise
    • Considers high order dependencies
  – Efficient
    • Significant reduction in the number of candidates
Proposed Method (2)
• Session (or transaction): A sequence of requests that occur within a specified time interval of each other [2]
• Containment relationship addresses the 2nd requirement (avoiding noise)
• Example:
  T = ⟨A, X, B, Y, C⟩ (X, Y are noise)
  S = ⟨A, B, C⟩ (the pattern)
  S is contained by T
• Comment: With contiguous subsequences and selection based only on support, S (the pattern) will be missed.
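The containment example can be checked with a short Python sketch. The function names are hypothetical, and `support` assumes the usual definition (fraction of sessions containing the pattern), which the slide does not spell out.

```python
def is_contained(pattern, session):
    """True if `pattern` is an order-preserving subsequence of
    `session`; other pages may appear in between."""
    it = iter(session)
    # `page in it` advances the iterator, so order is enforced.
    return all(page in it for page in pattern)

def is_contiguous(pattern, session):
    """True only if `pattern` occurs as consecutive accesses."""
    n = len(pattern)
    return any(
        list(session[i : i + n]) == list(pattern)
        for i in range(len(session) - n + 1)
    )

def support(pattern, sessions):
    """Assumed definition: fraction of sessions containing `pattern`."""
    return sum(is_contained(pattern, s) for s in sessions) / len(sessions)

T = ["A", "X", "B", "Y", "C"]   # X and Y are noise
S = ["A", "B", "C"]             # the pattern

contained = is_contained(S, T)      # containment tolerates the noise
contiguous = is_contiguous(S, T)    # the contiguous test does not
```

With T and S from the example, the containment test succeeds while the contiguous test fails, which is why a contiguous-only method would not credit S with this session.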
Proposed Method (3)
• Candidate generation respects the ordering of accesses in transactions.
• Example: ⟨A, B⟩ ≠ ⟨B, A⟩
• Respecting order causes a dramatic increase in the number of candidates
• WMo exploits the site structure for pruning [7,8]
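A small worked count may help show why site structure matters for ordered candidates. The four-page graph below is hypothetical, and the sketch only compares candidate counts at length 2; it is not the pruning procedure of [7,8].

```python
from itertools import permutations

def ordered_pair_candidates(pages):
    """All ordered pairs of distinct pages: (A, B) and (B, A) both count."""
    return set(permutations(pages, 2))

def structure_pruned_candidates(graph):
    """Only ordered pairs that correspond to an existing link u -> v."""
    return {(u, v) for u, succs in graph.items() for v in succs}

# Hypothetical site graph: page -> set of pages it links to.
graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": set()}

all_pairs = ordered_pair_candidates(graph.keys())
linked_pairs = structure_pruned_candidates(graph)
```

For n pages there are n(n-1) ordered pairs, here 12, against 6 unordered ones. Restricting candidates to existing links leaves only as many pairs as the graph has arcs, here 4.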
Proposed Method (4)
Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the graph
begin
  foreach L = ⟨l1, …, lk⟩, L ∈ Lk {
    N+(lk) = {v | ∃ arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified apriori pruning
      if v ∉ L and L' = ⟨l2, …, lk, v⟩ ∈ Lk {
        C = ⟨l1, …, lk, v⟩
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
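The pseudocode can be approximated in Python as follows. This is a sketch under assumptions, not the authors' implementation: the slide does not say which sub-sequences S the pruning test covers, so the sketch assumes only length-k sub-sequences that are themselves paths in G (others could never have been candidates). It also returns a plain set instead of a candidate-trie. The names `gen_candidates`, `large_k` and `graph` are hypothetical.

```python
from itertools import combinations

def gen_candidates(large_k, graph):
    """large_k: set of tuples, the large paths of length k.
    graph: dict mapping a page to the set of pages it links to."""

    def is_path(seq):
        # Assumption: a sub-sequence is checked only if every
        # consecutive pair in it is an arc of the graph.
        return all(b in graph.get(a, ()) for a, b in zip(seq, seq[1:]))

    candidates = set()
    for path in large_k:
        k = len(path)
        for v in graph.get(path[-1], ()):      # N+(lk)
            if v in path:
                continue
            suffix = path[1:] + (v,)           # L'
            if suffix not in large_k:
                continue
            cand = path + (v,)                 # C
            # Modified apriori pruning: every other k-sub-sequence
            # that is a valid path must already be large.
            others = (
                s for s in combinations(cand, k)
                if s != suffix and is_path(s)
            )
            if all(s in large_k for s in others):
                candidates.add(cand)
    return candidates

graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": set()}
large_2 = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
cands = gen_candidates(large_2, graph)
```

In this toy input, dropping ("A", "C") from the large 2-paths removes ("A", "B", "C") from the result, because A links to C directly and that sub-path would then be known to be infrequent.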
Discussion
• Sequential patterns [1]
  – The problem reduces to sequential pattern mining when “customer-sequence” = “user-session”
  – Suffers from a large number of candidates (by not considering the site structure)
• Path Fragments [4] (containment relationship is performed with regular expressions and the “*” label)
  – Focus on semantics (recommendation systems)
• Prefetching: patterns are for system and not for human consumption
• WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)
Methodology
• Synthetic data (sample site with 1000 nodes)
  – Synthetic data generator (see the paper)
    • Modeling site nodes, site linkage, size of documents
• Real data sets (see the paper)
• Examine the impact of:
  – noise
  – order
  – client cache (see the paper)
  – efficiency
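The evaluation reports accuracy, usefulness and network traffic, but the slides do not define them. The Python sketch below assumes definitions that are common in the prefetching literature: accuracy as the share of prefetched pages that are later requested, usefulness as the share of requested pages that had been prefetched, and traffic as pages transferred with prefetching divided by pages transferred without it. The function name and the sample sets are hypothetical.

```python
def prefetch_metrics(requested, prefetched):
    """requested: set of pages the client actually asked for.
    prefetched: set of pages pushed to the client in advance.
    All three formulas are assumed definitions, not taken from the slides."""
    hits = len(requested & prefetched)
    accuracy = hits / len(prefetched)
    usefulness = hits / len(requested)
    # With prefetching, the client receives every prefetched page
    # plus each requested page that was not prefetched.
    traffic = (len(requested) - hits + len(prefetched)) / len(requested)
    return accuracy, usefulness, traffic

requested = {"A", "B", "C", "D"}
prefetched = {"B", "C", "X"}          # X is a wasted prefetch
accuracy, usefulness, traffic = prefetch_metrics(requested, prefetched)
```

Under these assumed definitions, traffic equals 1 when every prefetched page is used, and each wasted prefetch pushes it above 1.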
[Figure: Accuracy w.r.t. noise. x-axis: mean noise; y-axis: accuracy; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Usefulness w.r.t. noise. x-axis: mean noise; y-axis: usefulness; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Traffic w.r.t. noise. x-axis: mean noise; y-axis: network traffic; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Accuracy w.r.t. order. x-axis: higher order percentage; y-axis: accuracy; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Usefulness w.r.t. order. x-axis: higher order percentage; y-axis: usefulness; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Traffic w.r.t. order. x-axis: higher order percentage; y-axis: network traffic; curves: DG, PPM, WM, WMo, LBOT]
[Figure: Efficiency (see also [7,8]). x-axis: support threshold (percentage); y-axis: number of candidates; curves: WM, WMo/wp, WMo]
Conclusions
• Factors that influence Web Prefetching
  – Noise
  – Order
• A new algorithm, WMo, based on data mining was presented
• Compares favorably with previously proposed algorithms
• WMo is an effective and efficient Web prefetching algorithm
References
1. R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
2. R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
3. M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
4. W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
5. B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
6. B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
7. A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
8. A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE 37(3), pp. 243-266, 2001.
9. V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communications Review, 26(3), 1996.
10. T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
11. J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS, 1999.
12. L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.