
Dimitrios Katsaros*† Yannis Manolopoulos*

†Aristotle University, Greece *University of Thessaly, Greece

Suffix Tree Based Prediction for Pervasive Computing Environments

Panhellenic Conference on Informatics, 11-13 November 2005

The architecture of a PCS


Information dissemination in a PCS

[Figure labels: Information System (server); Base Station; Wireless Cell; Mobile Hosts (MH); Downlink; Communication Bandwidth]

• #MHosts >> #Servers

• Uplink bandwidth << Downlink bandwidth


Roaming: Where is the mobile?

• The mobile can freely roam inside the coverage area of the cellular system

• This creates the need for location management:
– location update
– location prediction


Querying: What data will be requested?

• The mobile can request any data available in the information system

• This creates the need for:
– proactively pushing the data into the broadcast channel
– proactively sending the data to the next-to-visit base station


Predict: Position & Information Needs

• Why is location prediction useful?
– Effective solutions to the mobility tracking/prediction problem can reduce update and paging costs, freeing the network from excessive signaling traffic [bd02].

• Why is request prediction useful?
– Accurate data request prediction results in effective prefetching [nkm03], which, combined with a caching mechanism [km04], can reduce user-perceived latencies as well as server and network loads.

[bd02] A. Bhattacharya and S. K. Das, LeZi-Update: An information-theoretic framework for personal mobility tracking in PCS networks, ACM/Kluwer Wireless Networks, 8(2-3), pp. 121–135, 2002.

[nkm03] A. Nanopoulos, D. Katsaros and Y. Manolopoulos, A data mining algorithm for generalized Web prefetching, IEEE Transactions on Knowledge and Data Engineering, 15(5), pp. 1155–1169, 2003.

[km04] D. Katsaros and Y. Manolopoulos, Web caching in broadcast mobile wireless environments, IEEE Internet Computing, 8(3), pp. 37–45, 2004.


What is prediction based on?

• Both of the aforementioned problems are related to the ability of the underlying network to
– record,
– learn and, subsequently,
– predict
the mobile's “behaviour”, i.e., its movements or its information needs

• The success of the prediction presupposes, and is boosted by, the fact that mobile users exhibit some degree of regularity in their movement and/or in their access patterns

• This regularity may be apparent in the behaviour of each individual client or in client groups.


Location prediction & Request prediction

• These issues had been treated in isolation, but pioneering works ([vk96] and [bd02]) are paving the way for treating both problems in a homogeneous fashion

• These works use data compression methods (and are thus characterized as “information-theoretic”) to carry out prediction

• They model the respective state space as finite alphabets comprised of discrete symbols

• In the mobility tracking scenario, the alphabet consists of all possible sites (cells) that the client has ever visited or might visit (assuming that the number of cells in the coverage area is finite)

• In the request prediction scenario, the alphabet consists of all the data objects requested by the client plus the objects that might be requested in the future (assuming that the objects come from a database and thus their number is finite)

[vk96] J. S. Vitter and P. Krishnan, Optimal prefetching via data compression, Journal of the ACM, 43(5), pp. 771–793, 1996.


4 Families of predictors

• PPM: Prediction by Partial Match
• LZ78: Lempel-Ziv 1978
• PST: Probabilistic Suffix Tree
• CTW: Context-Tree Weighting

Overheads

Family   Training         Parameterization   Storage
LZ78     online           moderate           moderate
PPM      online/offline   moderate/heavy     large
PST      offline          heavy              low
CTW      online           moderate           large


The PPM predictor

• Running sequence: aabacbbabbacbbc


The LZ78 predictor


Enhanced


The PST predictor



The CTW predictor (1/3)

• Running binary sequence: 010|11010100011

• Krichevsky-Trofimov estimator:
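As a reminder, the standard Krichevsky-Trofimov (KT) estimator predicts the next bit from the counts seen so far: after a zeros and b ones, the next bit is 0 with probability (a + 1/2) / (a + b + 1) and 1 with probability (b + 1/2) / (a + b + 1). The sketch below multiplies these sequential probabilities to obtain the block probability of a bit string. The function names are illustrative, and the code shows only the estimator itself, not the weighting over the context tree.

```python
from fractions import Fraction


def kt_next_prob(zeros, ones, bit):
    """KT probability that the next bit equals `bit`, given the counts so far."""
    count = zeros if bit == 0 else ones
    # (count + 1/2) / (zeros + ones + 1), written with integer arithmetic.
    return Fraction(2 * count + 1, 2 * (zeros + ones + 1))


def kt_block_prob(bits):
    """KT probability of a whole bit sequence, built up one bit at a time."""
    zeros = ones = 0
    prob = Fraction(1)
    for bit in bits:
        prob *= kt_next_prob(zeros, ones, bit)
        if bit == 0:
            zeros += 1
        else:
            ones += 1
    return prob


if __name__ == "__main__":
    print(kt_block_prob([0, 1]))     # one zero, one one
    print(kt_block_prob([0, 1, 0]))  # two zeros, one one
```

Exact fractions are used so that the values can be compared with hand calculations; the block probability depends only on the counts of zeros and ones, not on their order.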






Discrete Sequence Prediction Problem

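• At any given time instance t (meaning that t symbols x_t, x_{t-1}, ..., x_1 have appeared, in reverse order) calculate the conditional probability

P[X_{t+1} = a | X_t = x_t, X_{t-1} = x_{t-1}, ..., X_1 = x_1]

where a ranges over the symbols of the alphabet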

• This model introduces a stationary Markov chain, since the probabilities are not time-dependent

• The outcome of the predictor is a ranking of the symbols according to their probability P. Predictors that use this kind of prediction model are termed Markov predictors
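A minimal sketch of such a ranking, assuming a fixed-order model whose probabilities are estimated by simple frequency counts (one possible estimator among many, not necessarily the one used by any of the four families). The function name, the `order` parameter and the sample sequence are illustrative.

```python
from collections import Counter


def rank_next_symbols(history, order=1):
    """Rank candidate next symbols by estimated conditional probability.

    The context is the last `order` symbols of `history`; the probability of
    each symbol is the fraction of earlier occurrences of that context that
    were followed by the symbol.
    """
    if order < 1 or len(history) < order:
        return []
    context = history[-order:]
    counts = Counter()
    # Count every symbol that followed an earlier occurrence of the context.
    for i in range(len(history) - order):
        if history[i:i + order] == context:
            counts[history[i + order]] += 1
    total = sum(counts.values())
    ranking = [(symbol, count / total) for symbol, count in counts.items()]
    # Highest probability first; ties are broken alphabetically.
    return sorted(ranking, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for symbol, prob in rank_next_symbols("abcdefgabcdklmabcdexabcd", order=1):
        print(symbol, round(prob, 3))
```

Because the counts do not depend on where in the sequence a context occurred, the estimated probabilities are time-independent, which is the stationarity assumption stated above.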


The STP algorithm

[em92] A. Ehrenfeucht and J. Mycielski, A pseudorandom sequence – How random is it?, American Mathematical Monthly, 99(4), pp. 373–375, 1992.


An example execution of STP

• Suppose that the sequence of symbols seen so far is the following:

s_1^24 = abcdefgabcdklmabcdexabcd$

• The longest suffix that also appears elsewhere in the sequence is abcd:

s_1^24 = [abcd]efg[abcd]klm[abcd]ex[abcd]$

• Let α = 0.5; thus we use a portion of abcd, namely half of it: cd

• Appearances of cd in the sequence are:

s_1^24 = ab[cd]efgab[cd]klmab[cd]exab[cd]$

• Candidate predictions: the symbols that follow the earlier appearances of cd, i.e., e, k and e

• Since e appears most often, the final outcome of the prediction is: e
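The steps of this example can be sketched in code as follows. This is a reading of the worked example only, not the authors' full STP algorithm: it assumes that the end marker $ is not part of the sequence, that the portion of the suffix is rounded up to a whole number of symbols, and that a plain majority vote picks the outcome. All function names are illustrative.

```python
import math
from collections import Counter


def longest_repeated_suffix(seq):
    """Longest suffix of `seq` that also occurs at an earlier position."""
    n = len(seq)
    for k in range(n - 1, 0, -1):
        suffix = seq[n - k:]
        # find() returns the first occurrence; it is "elsewhere" only if it
        # starts before the suffix itself does.
        if seq.find(suffix) < n - k:
            return suffix
    return ""


def stp_candidates(seq, alpha=0.5):
    """Symbols that follow each occurrence of the shortened suffix."""
    suffix = longest_repeated_suffix(seq)
    if not suffix:
        return []
    keep = max(1, math.ceil(alpha * len(suffix)))
    probe = suffix[-keep:]
    candidates = []
    start = seq.find(probe)
    while start != -1:
        nxt = start + len(probe)
        if nxt < len(seq):  # the final occurrence has no successor yet
            candidates.append(seq[nxt])
        start = seq.find(probe, start + 1)
    return candidates


def stp_predict(seq, alpha=0.5):
    """Most frequent candidate, or None when there is no candidate."""
    candidates = stp_candidates(seq, alpha)
    return Counter(candidates).most_common(1)[0][0] if candidates else None


if __name__ == "__main__":
    seq = "abcdefgabcdklmabcdexabcd"
    print(longest_repeated_suffix(seq), stp_candidates(seq), stp_predict(seq))
```

A smaller α shortens the matched context and therefore tends to produce more candidates; returning the whole candidate list instead of the majority symbol is an equally simple variant.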


Proof of concept of STP (1/2)

• Definition. The prediction precision is the number of symbols returned by the predictor that indeed match the next event/symbol in the sequence, divided by the total number of symbols returned by the predictor


Proof of concept of STP (2/2)

• Definition. The prediction overhead is the total number of symbols returned by the predictor, divided by the total number of events/symbols of the sequence
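The two definitions (precision and overhead) can be computed together, as in the sketch below. It assumes that the predictor returns a set of symbols (possibly empty) before each event; the function name and the sample data are hypothetical and are not measurements from any experiment.

```python
def precision_and_overhead(returned, actual):
    """Compute (precision, overhead) for a run of a predictor.

    returned: list of sets, the symbols returned before each event
    actual:   list of the symbols that actually occurred, same length
    """
    total_returned = sum(len(symbols) for symbols in returned)
    # A returned symbol counts as a match when it equals the actual next symbol.
    matches = sum(1 for symbols, nxt in zip(returned, actual) if nxt in symbols)
    precision = matches / total_returned if total_returned else 0.0
    overhead = total_returned / len(actual) if actual else 0.0
    return precision, overhead


if __name__ == "__main__":
    # Hypothetical run: four events, four returned symbols, two of them correct.
    returned = [{"e"}, {"e", "k"}, set(), {"a"}]
    actual = ["e", "k", "b", "c"]
    print(precision_and_overhead(returned, actual))
```

Returning more symbols per step can raise the chance of a match but increases the overhead, so the two measures are best read together.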


Thank you for your attention!

Any questions ?
