Date posted: 14-Dec-2015
Intelligent Systems Laboratory
(http://ce.aut.ac.ir/islab) Shohreh Kazemi [email protected]
Project Progress Report
Markov Model
Modeling and Predicting a User’s Browsing Behavior
Modeling and predicting a user's browsing behavior on a Web site can be used to: improve Web cache performance [1; 2; 3]; recommend related pages [4; 5]; improve search engines [6]; understand and influence buying patterns [7]; and personalize the browsing experience [8].
Markov models
Markov models [9] have been used for studying and understanding stochastic processes.
They have been shown to be well suited for modeling and predicting a user's browsing behavior on a Web site.
Markov models
In general, the input for these problems is the sequence of Web pages accessed by a user.
The goal is to build Markov models that can be used to predict the Web page that the user will most likely access next.
Markov Models for Predicting Next-Accessed Page
The act of a user browsing a Web site is commonly modeled by observing the sequence of pages that he or she visits [10].
This sequence of pages is referred to as a Web session:
W = (P1, P2, ..., Pl)
Markov Models for Predicting Next-Accessed Page
The next-page prediction problem can be solved using a probabilistic framework as follows.
Let W be a user's Web session of length l, and let P(pi | W) be the probability that the user visits page pi next. Then the page pl+1 that the user will visit next is given by
$$p_{l+1} = \arg\max_{p_i} P(p_i \mid W) = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_1)$$
Markov Models for Predicting Next-Accessed Page
The probability of visiting a page pi is assumed not to depend on all the pages in the Web session, but only on a small set of k preceding pages, where k ≪ l.
Then we have:
$$p_{l+1} = \arg\max_{p_i} P(p_i \mid P_l, P_{l-1}, \ldots, P_{l-(k-1)})$$
Markov Models for Predicting Next-Accessed Page
The number of preceding pages k that the next page depends on is called the order of the Markov model, and the resulting model M is called the kth-order Markov model.
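As a minimal sketch of what a kth-order model stores, the conditional probabilities can be estimated as relative frequencies: count how often each page follows each run of k consecutive pages. The function name `transition_probabilities` and the sessions below are hypothetical, chosen only for illustration; the slides do not prescribe an implementation.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions, k):
    """Estimate P(next page | last k pages) as relative frequencies."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Slide a window of k pages over the session; the page after it is the successor.
        for i in range(len(session) - k):
            state = tuple(session[i:i + k])
            counts[state][session[i + k]] += 1
    return {
        state: {page: n / sum(successors.values()) for page, n in successors.items()}
        for state, successors in counts.items()
    }

# Hypothetical sessions, made up for illustration only.
sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["B", "C", "A"]]
probs = transition_probabilities(sessions, k=2)
print(probs[("A", "B")])  # C gets 2/3 of the mass, D gets 1/3
```

Each key of the returned dictionary is one state of the model, so the number of keys is the number of states of the kth-order model.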
Markov Models for Predicting Next-Accessed Page
[Figure: the site map for a sample Web site, shown as a directed graph over pages P1, P2, P3, P4, and P5]
Markov Models for Predicting Next-Accessed Page
A set of Web sessions that were generated on this Web site:
Training set:
W1 : <P1, P3, P4>
W2 : <P1, P2, P3, P5>
W3 : <P1, P2>
W4 : <P1, P3, P4, P3, P1, P2>
W5 : <P1, P2, P3, P5, P3>
W6 : <P1, P3, P1, P2, P1, P3, P4>
Test set:
Wt1 : <P1, P2, P3, ?> <P5>
Wt2 : <P1, P2, P3, P5, P3, ?> <P4>
Markov Models for Predicting Next-Accessed Page
The frequencies of different states for first-order Markov models:
1st-Order State   Fr.   P1   P2   P3   P4   P5
S(1,1)=<P1>        9     0    5    4    0    0
S(1,2)=<P2>        3     1    0    2    0    0
S(1,3)=<P3>        7     2    0    0    3    2
S(1,4)=<P4>        1     0    0    1    0    0
S(1,5)=<P5>        1     0    0    1    0    0
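The first-order frequencies can be recomputed from the six training sessions listed earlier. The sketch below assumes that the Fr. column counts only those occurrences of a page that are followed by another page in the same session, which is the reading that matches the row totals; the helper name `first_order_counts` is hypothetical.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6 from the slides.
TRAINING = [
    ["P1", "P3", "P4"],
    ["P1", "P2", "P3", "P5"],
    ["P1", "P2"],
    ["P1", "P3", "P4", "P3", "P1", "P2"],
    ["P1", "P2", "P3", "P5", "P3"],
    ["P1", "P3", "P1", "P2", "P1", "P3", "P4"],
]
PAGES = ["P1", "P2", "P3", "P4", "P5"]

def first_order_counts(sessions):
    """Count how often each page is immediately followed by each other page."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts

counts = first_order_counts(TRAINING)
for page in PAGES:
    row = [counts[page][nxt] for nxt in PAGES]
    # State, its frequency (assumed: occurrences with a successor), then the per-page counts.
    print(f"<{page}>", sum(row), row)
```

The same sliding-window idea with a window of two pages gives the second-order frequencies.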
Markov Models for Predicting Next-Accessed Page
the frequencies of different states for second-order Markov models
Markov Models for Predicting Next-Accessed Page
How these models are used to predict the most probable page for Web session Wt1
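The prediction step for the first test session <P1, P2, P3, ?> can be sketched as follows: look up the state formed by the last k pages and return its most frequent successor. The helper names `build_model` and `predict` are hypothetical, and ties are broken by whichever successor was seen first, which is an assumption rather than something the slides specify.

```python
from collections import Counter, defaultdict

# Training sessions W1..W6 from the slides.
TRAINING = [
    ["P1", "P3", "P4"],
    ["P1", "P2", "P3", "P5"],
    ["P1", "P2"],
    ["P1", "P3", "P4", "P3", "P1", "P2"],
    ["P1", "P2", "P3", "P5", "P3"],
    ["P1", "P3", "P1", "P2", "P1", "P3", "P4"],
]

def build_model(sessions, k):
    """Map each state (k consecutive pages) to a Counter of the pages that followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return dict(model)

def predict(model, prefix, k):
    """Return the most frequent successor of the last k pages, or None if the state is unseen."""
    state = tuple(prefix[-k:])
    if len(prefix) < k or state not in model:
        return None
    return model[state].most_common(1)[0][0]

wt1 = ["P1", "P2", "P3"]
predictions = {k: predict(build_model(TRAINING, k), wt1, k) for k in (1, 2)}
print(predictions)  # {1: 'P4', 2: 'P5'}
```

The first-order model only sees <P3>, whose most frequent successor is P4, while the second-order model sees <P2, P3>, which was followed by P5 both times it occurred in the training set.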
Performance Measures for Markov Models
The first is the accuracy of the model: the ratio of the number of Web sessions for which the model is able to correctly predict the hidden page to the total number of Web sessions in the test set.
The second is the number of states of the model: the total number of states for which the Markov model has estimates.
The third is the coverage of the model: the ratio of the number of Web sessions whose state required for making a prediction was found in the model to the total number of Web sessions in the test set.
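The three measures can be computed directly from a trained model and a test set. The sketch below uses the sample sessions from the slides and assumes that the trailing <P5> and <P4> in the test set are the hidden pages; the names `build_model` and `evaluate` are hypothetical.

```python
from collections import Counter, defaultdict

TRAINING = [
    ["P1", "P3", "P4"],
    ["P1", "P2", "P3", "P5"],
    ["P1", "P2"],
    ["P1", "P3", "P4", "P3", "P1", "P2"],
    ["P1", "P2", "P3", "P5", "P3"],
    ["P1", "P3", "P1", "P2", "P1", "P3", "P4"],
]
# (visible prefix, hidden page); the hidden pages are an assumed reading of the test set.
TEST = [
    (["P1", "P2", "P3"], "P5"),
    (["P1", "P2", "P3", "P5", "P3"], "P4"),
]

def build_model(sessions, k):
    """Map each state (k consecutive pages) to a Counter of the pages that followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return dict(model)

def evaluate(model, test, k):
    """Return accuracy, number of states, and coverage as defined above."""
    covered = 0
    correct = 0
    for prefix, hidden in test:
        state = tuple(prefix[-k:])
        if len(prefix) >= k and state in model:
            covered += 1
            if model[state].most_common(1)[0][0] == hidden:
                correct += 1
    total = len(test)
    return {"accuracy": correct / total, "states": len(model), "coverage": covered / total}

results = {k: evaluate(build_model(TRAINING, k), TEST, k) for k in (1, 2)}
print(results)
```

On this tiny example the second-order model has more states (8 versus 5) but covers only one of the two test sessions, because the state <P5, P3> never appears with a successor in the training set.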
Lower-order Markov models
Lower-order Markov models (first or second order) are not successful in accurately predicting the next page to be accessed by the user, because these models do not look far enough into the past.
Higher-order Markov models
In order to obtain better predictions, higher-order models must be used.
However, these higher-order models have a number of limitations:
(i) high state-space complexity
(ii) reduced coverage
(iii) sometimes even worse accuracy due to the lower coverage
Comparing accuracy, coverage and model size with the order of Markov model
All-Kth-Order Markov model
One method to overcome the coverage problem is to train Markov models of varying order and then combine them for prediction [8].
For each test instance, the highest-order Markov model that covers the instance is used for prediction.
This scheme is called the All-Kth-Order Markov model.
However, it makes the model-size problem worse, because the states of every order must be stored.
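The fallback scheme can be sketched as a loop from the highest order down to the first, returning the first prediction whose state is found. The sketch reuses the sample training sessions from the slides; the helper names `build_model` and `predict_all_kth` and the choice K = 3 are assumptions made for illustration.

```python
from collections import Counter, defaultdict

TRAINING = [
    ["P1", "P3", "P4"],
    ["P1", "P2", "P3", "P5"],
    ["P1", "P2"],
    ["P1", "P3", "P4", "P3", "P1", "P2"],
    ["P1", "P2", "P3", "P5", "P3"],
    ["P1", "P3", "P1", "P2", "P1", "P3", "P4"],
]

def build_model(sessions, k):
    """Map each state (k consecutive pages) to a Counter of the pages that followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return dict(model)

def predict_all_kth(models, prefix):
    """Use the highest-order model that covers the prefix; return (page, order used)."""
    for k in sorted(models, reverse=True):
        state = tuple(prefix[-k:])
        if len(prefix) >= k and state in models[k]:
            return models[k][state].most_common(1)[0][0], k
    return None, None

# One model per order 1..K, all kept in memory: this is the model-size cost.
models = {k: build_model(TRAINING, k) for k in (1, 2, 3)}
first = predict_all_kth(models, ["P1", "P2", "P3"])
second = predict_all_kth(models, ["P1", "P2", "P3", "P5", "P3"])
print(first, second)  # ('P5', 3) ('P4', 1)
```

The second session falls all the way back to the first-order model, because neither <P3, P5, P3> nor <P5, P3> has a successor in the training set.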
Some techniques have been developed to intelligently combine different order Markov models.
The resulting model:
(i) has low state complexity
(ii) retains the coverage of the All-Kth-Order Markov model
(iii) achieves comparable accuracies
Frequency based
Frequency-based pruning rests on the observation that states that occur with low frequency in the training set also tend to have low prediction accuracies.
These low-frequency states can be eliminated without affecting the accuracy of the resulting model.
Frequency based
The amount of pruning is controlled by the parameter Φ, referred to as the frequency threshold.
Note that a state is never pruned from the first-order Markov model, so pruning does not reduce the coverage of the original model.
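A minimal sketch of frequency-based pruning, applied to the sample training sessions from the slides. It assumes that a higher-order state is dropped when its frequency is strictly below the threshold; the slides do not say whether the comparison is strict, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

TRAINING = [
    ["P1", "P3", "P4"],
    ["P1", "P2", "P3", "P5"],
    ["P1", "P2"],
    ["P1", "P3", "P4", "P3", "P1", "P2"],
    ["P1", "P2", "P3", "P5", "P3"],
    ["P1", "P3", "P1", "P2", "P1", "P3", "P4"],
]

def build_model(sessions, k):
    """Map each state (k consecutive pages) to a Counter of the pages that followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - k):
            model[tuple(session[i:i + k])][session[i + k]] += 1
    return dict(model)

def prune_by_frequency(models, threshold):
    """Drop higher-order states seen fewer than `threshold` times; keep every first-order state."""
    pruned = {}
    for k, model in models.items():
        if k == 1:
            pruned[k] = dict(model)  # never pruned, so coverage is preserved
        else:
            pruned[k] = {
                state: successors
                for state, successors in model.items()
                if sum(successors.values()) >= threshold  # assumed: strict "below threshold" is pruned
            }
    return pruned

models = {k: build_model(TRAINING, k) for k in (1, 2)}
pruned = prune_by_frequency(models, threshold=2)
print(len(models[2]), "->", len(pruned[2]))  # 8 -> 4
```

With a threshold of 2, the second-order states that occurred only once are removed, and predictions for sessions that needed them fall back to the first-order model.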
Frequency based
Frequency threshold   Accuracy   # states
0                     30.24      126464
2                     30.68      44528
4                     31.32      20914
6                     31.56      14164
8                     31.65      10899
10                    31.71      8952
12                    31.74      7661
14                    31.73      6716
16                    31.72      5969
18                    31.72      5389
20                    31.72      4965
22                    31.67      4609
24                    31.67      4296
Error based
The final predictions are computed by using only the states of the model that have the smallest estimated error rate.
The error associated with each state is estimated by a validation step.
A higher-order state is pruned by comparing its error rate with the error rate of its lower-order states.
Error based
For example, to prune the state S(3,q) (Pi, Pj, Pk), its error rate is compared with the error rates of states S(2,r) (Pj, Pk) and S(1,s) (Pk); the state S(3,q) is pruned if its error rate is higher than any of them.
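The comparison rule can be sketched as follows: a higher-order state is dropped when its estimated error exceeds that of any of its shorter suffix states. The error rates below are made-up numbers for illustration only, not values from the slides, and `prune_by_error` is a hypothetical name; in practice the rates would come from the validation step.

```python
def prune_by_error(error_rates):
    """Keep a state unless one of its lower-order suffix states has a smaller error rate.

    `error_rates` maps a state (tuple of pages) to its estimated error rate.
    First-order states have no suffix states and are therefore always kept.
    """
    kept = {}
    for state, error in error_rates.items():
        # Suffix states of (Pi, Pj, Pk) are (Pj, Pk) and (Pk,).
        suffixes = [state[i:] for i in range(1, len(state))]
        if any(s in error_rates and error > error_rates[s] for s in suffixes):
            continue
        kept[state] = error
    return kept

# Made-up validation error rates, for illustration only.
error_rates = {
    ("P5",): 0.30,
    ("P3", "P5"): 0.20,
    ("P4", "P5"): 0.40,
    ("P1", "P3", "P5"): 0.25,
    ("P2", "P3", "P5"): 0.10,
}
kept = prune_by_error(error_rates)
print(sorted(kept, key=len))
```

In this sketch each state is compared against the unpruned error rates of its suffix states, which is one possible design choice; comparing only against suffix states that themselves survive pruning would be another.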
Training and validating Web sessions
Various order Markov states with their maximum frequency page
Error rates for Markov states
[Figure: third-order states <P1,P3,P5>, <P2,P4,P5>, and <P2,P3,P5> shown with their lower-order states <P3,P5>, <P4,P5>, and <P5>]
References
[1] SCHECHTER, S., KRISHNAN, M., AND SMITH, M. D. 1998. Using path profiles to predict http requests. In 7th International World Wide Web Conference.
[2] BESTAVROS, A. 1995. Using speculation to reduce server load and service time on www. In Proceedings of the 4th ACM International Conference of Information and Knowledge Management. ACM Press.
[3] PADMANABHAN, V. AND MOGUL, J. 1996. Using predictive prefetching to improve world wide web latency. Comput. Commun. Rev.
[4] DEAN, J. AND HENZINGER, M. R. 1999. Finding related pages in world wide web. In Proceedings of the 8th International World Wide Web Conference.
[5] PIROLLI, P., PITKOW, J., AND RAO, R. 1996. Silk from a sow’s ear: Extracting usable structures from the web. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI-96).
[6] BRIN, S. AND PAGE, L. 1998. The anatomy of large-scale hypertextual web search engine. In Proceedings of the 7th International World Wide Web Conference.
[7] CHI, E., PITKOW, J., MACKINLAY, J., PIROLLI, P., GOSSWEILER, R., AND CARD, S. 1998. Visualizing the evolution of web ecologies. In Proceedings of ACM Conference on Human Factors in Computing Systems (CHI-98).
[8] PITKOW, J. AND PIROLLI, P. 1999. Mining longest repeating subsequence to predict world wide web surfing. In 2nd USENIX Symposium on Internet Technologies and Systems. Boulder, CO.
[9] PAPOULIS, A. 1991. Probability, Random Variables, and Stochastic Processes. McGraw Hill.
[10] SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., AND TAN, P.-N. 2000. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explor. 1, 2.