Dimitrios Katsaros*† Yannis Manolopoulos*
†Aristotle University, Greece *University of Thessaly, Greece
Suffix Tree Based Prediction for Pervasive Computing Environments
Panhellenic Conference on Informatics, 11-13 November 2005 3
Information dissemination in a PCS
[Figure: an information system (server) reaches the mobile hosts (MH) of each wireless cell through a base station, over the downlink communication bandwidth]
• #Mobile hosts >> #Servers
• Uplink bandwidth << Downlink bandwidth
Roaming: Where is the mobile?
• The mobile can freely roam inside the coverage area of the cellular system
• This creates the need for location management:
  – location update
  – location prediction
Querying: What data will be requested?
• The mobile can request any data available in the information system
• This creates the need for:
  – proactively pushing the data into the broadcast channel
  – proactively sending the data to the next-to-visit base station
Predict: Position & Information Needs
• Why is location prediction useful?
  – Effective solutions to the mobility tracking/prediction problem can reduce update and paging costs, freeing the network from excessive signaling traffic [bd02].
• Why is request prediction useful?
  – Accurate data request prediction results in effective prefetching [nkm03], which, combined with a caching mechanism [km04], can reduce user-perceived latencies as well as server and network loads.
[bd02] A. Bhattacharya and S. K. Das, LeZi-Update: An information-theoretic framework for personal mobility tracking in PCS networks, ACM/Kluwer Wireless Networks, 8(2-3), pp. 121–135, 2002.
[nkm03] A. Nanopoulos, D. Katsaros and Y. Manolopoulos, A data mining algorithm for generalized Web prefetching, IEEE Transactions on Knowledge and Data Engineering, 15(5), pp. 1155–1169, 2003.
[km04] D. Katsaros and Y. Manolopoulos, Web caching in broadcast mobile wireless environments, IEEE Internet Computing, 8(3), pp. 37–45, 2004.
What is prediction based on?
• Both of the aforementioned problems are related to the ability of the underlying network to
  – record,
  – learn and, subsequently,
  – predict
  the mobile's “behaviour”, i.e., its movements or its information needs
• Successful prediction presupposes, and is boosted by, the fact that mobile users exhibit some degree of regularity in their movements and/or in their access patterns
• This regularity may be apparent in the behaviour of each individual client or in client groups.
Location prediction & Request prediction
• These issues had been treated in isolation, but pioneering works ([vk96] and [bd02]) are paving the way for treating both problems in a homogeneous fashion
• These works use data compression methods (hence the characterization “information-theoretic”) to carry out prediction.
• They model the respective state spaces as finite alphabets of discrete symbols
• In the mobility tracking scenario, the alphabet consists of all possible sites (cells) that the client has ever visited or might visit (assuming that the number of cells in the coverage area is finite)
• In the request prediction scenario, the alphabet consists of all the data objects requested by the client plus the objects that might be requested in the future (assuming that the objects come from a database and thus their number is finite)
[vk96] J. S. Vitter and P. Krishnan, Optimal prefetching via data compression, Journal of the ACM, 43(5), pp. 771–793, 1996.
4 Families of predictors
• PPM: Prediction by Partial Match
• LZ78: Lempel-Ziv 1978
• PST: Probabilistic Suffix Tree
• CTW: Context-Tree Weighting
Overheads
Family | Training       | Parameterization | Storage
LZ78   | online         | moderate         | moderate
PPM    | online/offline | moderate/heavy   | large
PST    | offline        | heavy            | low
CTW    | online         | moderate         | large
The PPM predictor
• Running sequence: aabacbbabbacbbc
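A minimal sketch of the idea behind PPM-style prediction on the running sequence. It assumes a maximum context order of 2 and a plain back-off to shorter contexts in place of PPM's escape probabilities; the names `build_context_counts` and `predict_next` are illustrative and do not come from the slides.

```python
from collections import Counter, defaultdict


def build_context_counts(seq, max_order=2):
    """Count which symbol follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(seq):
        for order in range(max_order + 1):
            if i - order >= 0:
                # seq[i - order:i] is the `order` symbols preceding position i
                counts[seq[i - order:i]][symbol] += 1
    return counts


def predict_next(seq, counts, max_order=2):
    """Rank candidate next symbols using the longest context seen before.

    Simplification (assumption): back off to a shorter context when the
    longer one has no recorded followers. Full PPM blends the orders
    through escape probabilities instead.
    """
    for order in range(min(max_order, len(seq)), -1, -1):
        context = seq[len(seq) - order:]
        if counts[context]:
            return counts[context].most_common()
    return []


sequence = "aabacbbabbacbbc"
counts = build_context_counts(sequence)
ranking = predict_next(sequence, counts)
print(ranking)
```

Here the order-2 context `bc` has never been followed by anything, so the sketch falls back to the order-1 context `c`, which was followed by `b` both times it appeared earlier.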
The LZ78 predictor
• Running sequence: aabacbbabbacbbc
[Figure: LZ78 tree for the running sequence, with a panel labelled "Enhanced"]
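The LZ78 parsing step that underlies this family can be sketched as follows. This is a minimal illustration of the standard incremental parse only (no probability assignment and no enhanced tree); the function name `lz78_phrases` is an assumption made for this example.

```python
def lz78_phrases(seq):
    """Split seq into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that is not
    yet in the dictionary. Returns the phrases plus any unfinished tail
    (a string that is already a known phrase).
    """
    dictionary = set()
    phrases = []
    current = ""
    for symbol in seq:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    return phrases, current


phrases, tail = lz78_phrases("aabacbbabbacbbc")
print(phrases, repr(tail))
```

On the running sequence the parse yields the phrases a, ab, ac, b, ba, bb, acb, bc, with nothing left over. Each phrase extends an earlier phrase by one symbol, which is why the phrases can be stored as a tree.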
The PST predictor
• Running sequence: aabacbbabbacbbc
The CTW predictor (1/3)
• Running bin sequence: 010|11010100011
• Krichevsky-Trofimov estimator:
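For reference, the standard Krichevsky-Trofimov estimate predicts the next bit as 0 with probability (a + 1/2) / (a + b + 1) after seeing a zeros and b ones. The sketch below builds the probability of a whole bit string from that rule; the function names are illustrative, and the per-context bookkeeping that CTW adds on top is not shown.

```python
from fractions import Fraction


def kt_next_zero(zeros, ones):
    """KT probability that the next bit is 0 after `zeros` 0s and `ones` 1s."""
    # (zeros + 1/2) / (zeros + ones + 1), written with integer arithmetic
    return Fraction(2 * zeros + 1, 2 * (zeros + ones + 1))


def kt_block_probability(bits):
    """KT probability of a whole bit string, accumulated bit by bit."""
    zeros = ones = 0
    prob = Fraction(1)
    for bit in bits:
        p_zero = kt_next_zero(zeros, ones)
        if bit == "0":
            prob *= p_zero
            zeros += 1
        else:
            prob *= 1 - p_zero
            ones += 1
    return prob


# The part of the running sequence after the "|" separator.
print(kt_block_probability("11010100011"))
```

The result depends only on the counts of zeros and ones, not on their order, so `kt_block_probability("01")` and `kt_block_probability("10")` are equal.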
Discrete Sequence Prediction Problem
• At any given time instant t (meaning that t symbols x_t, x_{t-1}, ..., x_1 have appeared, listed in reverse order), calculate the conditional probability
where
• This model introduces a stationary Markov chain, since the probabilities are not time-dependent
• The outcome of the predictor is a ranking of the symbols according to their probability P. Predictors that use this kind of prediction model are termed Markov predictors
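One conventional way to write the conditional probability described above is given below. It is offered as an assumption, not as the slide's own notation: the alphabet Σ, the candidate symbol σ and the order k are introduced here only for illustration.

```latex
P\left(X_{t+1} = \sigma \mid X_t = x_t,\; X_{t-1} = x_{t-1},\; \ldots,\; X_1 = x_1\right),
\qquad \sigma \in \Sigma
```

A Markov predictor of order k would condition only on the k most recent symbols:

```latex
P\left(X_{t+1} = \sigma \mid X_t = x_t,\; \ldots,\; X_{t-k+1} = x_{t-k+1}\right)
```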
The STP algorithm
[em92] A. Ehrenfeucht and J. Mycielski, A pseudorandom sequence – How random is it?, American Mathematical Monthly, 99(4), pp. 373–375, 1992.
An example execution of STP
• Suppose that the sequence of symbols seen so far is the following:
  s_1^24 = abcdefgabcdklmabcdexabcd$
• The largest suffix that also appears elsewhere in the sequence is abcd:
  s_1^24 = [abcd]efg[abcd]klm[abcd]ex[abcd]$
• Let α = 0.5, so we use a portion of abcd, namely its last half: cd
• Appearances of cd in the sequence are:
  s_1^24 = ab[cd]efgab[cd]klmab[cd]exab[cd]$
• Candidate predictions: e, k, e (the symbols that follow the earlier appearances of cd)
• Since e appears most often, the final outcome of the prediction is: e
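The steps of this example can be sketched in code. This is a minimal illustration that follows only what the example shows (longest repeated suffix, its trailing α-fraction, frequency ranking of the followers); it is not the full STP algorithm. The function name `stp_predict` and the rounding of α times the suffix length are assumptions, and the `$` end marker is left out of the input.

```python
from collections import Counter
from math import ceil


def stp_predict(seq, alpha=0.5):
    """Rank candidate next symbols following the steps of the example."""
    n = len(seq)

    # Step 1: length of the longest suffix that also occurs earlier.
    longest = 0
    for length in range(1, n):
        # Searching only seq[:n-1] excludes the suffix occurrence itself.
        if seq.find(seq[n - length:], 0, n - 1) == -1:
            break
        longest = length
    if longest == 0:
        return []

    # Step 2: keep only the trailing alpha-fraction of that suffix
    # (rounded up, at least one symbol: an assumption of this sketch).
    keep = max(1, ceil(alpha * longest))
    context = seq[n - keep:]

    # Step 3: count the symbol that follows every earlier appearance.
    followers = Counter()
    start = seq.find(context)
    while start != -1 and start + keep < n:
        followers[seq[start + keep]] += 1
        start = seq.find(context, start + 1)
    return followers.most_common()


ranking = stp_predict("abcdefgabcdklmabcdexabcd", alpha=0.5)
print(ranking)
```

For the sequence above, the sketch finds abcd as the longest repeated suffix, shortens it to cd, and ranks e (two occurrences) ahead of k (one occurrence).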
Proof of concept of STP (1/2)
• Definition. The number of symbols returned by the predictor that indeed match the next event/symbol in the sequence, divided by the total number of symbols returned by the predictor, defines the prediction precision
Proof of concept of STP (2/2)
• Definition. The total number of symbols returned by the predictor, divided by the total number of events/symbols of the sequence, defines the prediction overhead
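The two definitions (prediction precision and prediction overhead) can be computed as in the sketch below. It assumes that, before each event, the predictor returns a set of distinct candidate symbols; the function name and the toy data are hypothetical and serve only to illustrate the ratios.

```python
def precision_and_overhead(predictions, actual):
    """Compute (precision, overhead) for a run of a predictor.

    predictions[i] is the set of symbols returned before actual[i] is seen.
    """
    returned = sum(len(p) for p in predictions)
    correct = sum(1 for p, a in zip(predictions, actual) if a in p)
    # precision: matching symbols / symbols returned
    precision = correct / returned if returned else 0.0
    # overhead: symbols returned / events in the sequence
    overhead = returned / len(actual) if actual else 0.0
    return precision, overhead


# Hypothetical run: four events, five symbols returned, three of them correct.
actual = "abca"
predictions = [{"a"}, {"b", "c"}, set(), {"a", "b"}]
result = precision_and_overhead(predictions, actual)
print(result)
```

In this toy run the precision is 3/5 and the overhead is 5/4: returning more candidates per event can raise the chance of a hit, but it increases the overhead.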