Mobile Query Processing Incorporating Server
and Client Based Approaches
by
James Winly Jayaputera, BAppSci(Comp.Sci), MIT
Thesis
in fulfillment of the Requirements for the Degree of
Doctor of Philosophy (0190)
Clayton School of Information Technology
Monash University
September, 2008
Abstract
This thesis studies query processing in a mobile environment. The main objective
is to investigate the performance improvement of mobile query processing, focusing
on the server and client sides.
In server-side query processing, we consider single-cell and multi-cell queries,
where a cell is the service area within which a single stationary host communicates
with the static network. A quick response to a mobile query is important, because
mobile users invariably move to another location while awaiting the query result.
To handle such a dynamic situation, we propose solutions to answer single-cell
and multi-cell queries. The proposed solutions for processing single-cell queries are
divided into static and dynamic query scopes, and angle of movement. The static
and dynamic query scopes are extended to process multi-cell queries. Furthermore,
another solution is added in order to deal with a situation where the areas of several
base stations are either disjoint or overlapping. Finally, our algorithms also handle
disconnections which occur during query result transmission from a base station to
the mobile users.
Indexing mechanisms are important to speed up query processing, especially
for handling multi-cell queries. We propose two indexing mechanisms, called the
Local Index and the Global Index. The local index stores indexes of any requested
objects in a limited number of slots, whereas the global index builds the index while a base
station is starting up. For both mechanisms, we developed algorithms to deal with
the existence and non-existence of replicated objects at the requested cell.
Frequent disconnection is a common problem in a mobile environment, so
providing a cache on the mobile device is an important consideration. A cache is
useful when the results of many repeated queries can be retrieved from it. Due to the
limited storage space of the mobile device, we have developed three cache
replacement policies, called the Path-based, Density-based and Probability Density Area
Inverse Distance (PDAID) mechanisms, which are based on distance, weight and
cost factors, respectively.
In order to analyse the behaviour of the proposed methods, we have implemented
and simulated each algorithm, and the results of each approach are compared
and analysed. The server-side query processing shows an improvement
in the total number of retrieved objects, while the query processing time and the amount of
data transfer are reduced. Furthermore, the server is able to decide whether the
next query result needs to be produced when the mobile user misses the current
query result. The proposed indexing mechanisms reduce the execution time
compared with the conventional approach to processing multi-cell queries. The
proposed approaches for the client side also improve the cache-hit rate while
reducing the amount of data transfer.
Declaration
I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institute of tertiary education. Information derived from the published and unpublished work of others has been acknowledged in the text and a list of references is given.

James Winly Jayaputera
September 20, 2008
Acknowledgments
This thesis would never have come into existence without precious encouragement,
guidance, and both personal and academic support from two of my supervisors, Dr.
David Taniar and Professor Bala Srinivasan.
I would like to dedicate this thesis to my family, who have supported me to the
end of this journey. Without them, I would not have completed this thesis.
I would also like to thank all of my friends, without mentioning them individually,
who helped to make this possible. I also thank Bruna Pomella for correcting
grammatical and spelling mistakes.
James Winly Jayaputera
Monash University
September 2008
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Limitations of Mobile Environments . . . . . . . . . . . . . . . . . . . 4
1.3 Objectives of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Scope of Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Wireless Environment Architecture . . . . . . . . . . . . . . . . . . . 12
2.2.1 Wireless Technologies . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Location Positioning Systems . . . . . . . . . . . . . . . . . . 19
2.3 Query Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1 Traditional Query . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.2 Location Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Server Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Overview of Location-Dependent Query Processing . . . . . . . 27
2.4.2 Query Processing for a Single Cell . . . . . . . . . . . . . . . . 34
2.4.3 Query Processing for Multiple Cells . . . . . . . . . . . . . . . 36
2.5 Indexing Structures for Query Processing . . . . . . . . . . . . . . . . 37
2.5.1 Conventional Index Query Processing . . . . . . . . . . . . . . 37
2.5.2 Moving Object Index Query Processing . . . . . . . . . . . . . 45
2.6 Mobile Query Processing at Client Side . . . . . . . . . . . . . . . . . 48
2.6.1 Mobile-Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6.2 Top-K Queries . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.6.3 Cache Replacement Policies . . . . . . . . . . . . . . . . . . . 52
2.7 Outstanding Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.7.1 Mobile Query Processing at Server Side . . . . . . . . . . . . . 57
2.7.2 Indexing Structures for Multi-Cell Query Processing . . . . . . 58
2.7.3 Client Cache Management . . . . . . . . . . . . . . . . . . . . 58
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 Query Processing at Server Side . . . . . . . . . . . . . . . . . . . . 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.1 All Terms Used . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.2 Shape Selection for a Query Scope . . . . . . . . . . . . . . . 64
3.2.3 Query Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Query Processing for Single-Cell . . . . . . . . . . . . . . . . . . . . . 69
3.3.1 Static Query Scope Category . . . . . . . . . . . . . . . . . . 69
3.3.2 Dynamic Query Scope Category . . . . . . . . . . . . . . . . . 79
3.3.3 Angle of Movement Category . . . . . . . . . . . . . . . . . . 81
3.4 Multi-Cell Query Processing . . . . . . . . . . . . . . . . . . . . . . . 84
3.4.1 Non-Overlapping and Overlapping Area Algorithms . . . . . . 87
3.4.2 Static and Dynamic Query Scope Algorithm . . . . . . . . . . 95
3.5 Handling Disconnections . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.5.1 Single Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.5.2 Multiple Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.6 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.6.1 Single-Cell Query Processing . . . . . . . . . . . . . . . . . . . 110
3.6.2 Multi-Cell Query Processing . . . . . . . . . . . . . . . . . . . 117
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4 Indexing for Multiple Servers Retrieval . . . . . . . . . . . . . . . 124
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.2 Preliminary Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.3 Local Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3.1 Cache Remote Indexes Only . . . . . . . . . . . . . . . . . . . 132
4.3.2 Cache Remote Indexes and Data Items . . . . . . . . . . . . . 135
4.4 Global Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.4.1 Remote Data Items Located at Different Cell . . . . . . . . . 140
4.4.2 Remote Indexes and Data Items Located at Same Cell . . . . 144
4.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5 Client Caching for a Mobile Environment . . . . . . . . . . . . . . 153
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.2 Client Caching Overview . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.2.1 Global Process . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.2.2 Storing Query Results to Cache . . . . . . . . . . . . . . . . . 158
5.2.3 Predicting Next Movement . . . . . . . . . . . . . . . . . . . . 160
5.2.4 Retrieving Cached Objects . . . . . . . . . . . . . . . . . . . . 161
5.2.5 Updating Query History List . . . . . . . . . . . . . . . . . . 162
5.2.6 Objects Grouping . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.2.7 Cached Objects Elimination . . . . . . . . . . . . . . . . . . . 164
5.3 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.3.1 Path Based Elimination Algorithm . . . . . . . . . . . . . . . 168
5.3.2 Density Based Elimination Algorithm . . . . . . . . . . . . . . 172
5.3.3 PDAID Elimination Algorithm . . . . . . . . . . . . . . . . . 174
5.4 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.1 Implementation and its Results . . . . . . . . . . . . . . . . . . . . . 188
6.1.1 Implementation Environment . . . . . . . . . . . . . . . . . . 189
6.1.2 Implementation Results . . . . . . . . . . . . . . . . . . . . . 189
6.2 Simulation and its Results . . . . . . . . . . . . . . . . . . . . . . . . 209
6.3 Simulation Results for Single-Cell and Multi-Cell Query Processing . 210
6.3.1 Indexing for Multi-Cell Query Processing . . . . . . . . . . . . 214
6.3.2 Simulation Results for Client Caching . . . . . . . . . . . . . . 219
6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 227
7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.2 Summary of Research Result . . . . . . . . . . . . . . . . . . . . . . . 227
7.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Appendix A Implementation Model . . . . . . . . . . . . . . . . . . . . 250
A.1 Location Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
A.2 Implementation for Query Processing in Single Cell . . . . . . . . . . 252
A.3 Implementation for Query Processing in Multi-Cells . . . . . . . . . . 256
Appendix B Simulation Model . . . . . . . . . . . . . . . . . . . . . . . 265
B.1 Simulation Package Overview . . . . . . . . . . . . . . . . . . . . . . 265
B.2 Query Processing Model . . . . . . . . . . . . . . . . . . . . . . . . . 266
List of Tables
2.1 Comparison of IEEE 802.11 standards . . . . . . . . . . . . . . . . . 14
2.2 Comparison of wireless local area network (WLAN) standards - 802.11a
versus 802.11b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Performance characteristics of cellular positioning methods . . . . . . 20
2.4 Mobile query category 1 . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Mobile query category 2 . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.1 Hardware settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.2 Parameters setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.3 First experiment result . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.4 Parameters setting for multiple BSs . . . . . . . . . . . . . . . . . . . 201
6.5 Second experiment result . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.6 Parameter settings - single cell . . . . . . . . . . . . . . . . . . . . . . 210
6.7 Parameters setting - multiple cells . . . . . . . . . . . . . . . . . . . . 213
6.8 Experiment settings for client cache . . . . . . . . . . . . . . . . . . . 220
A.1 Snapshot of our Generated Data . . . . . . . . . . . . . . . . . . . . . 252
A.2 Setting implementation 1 . . . . . . . . . . . . . . . . . . . . . . . . . 253
A.3 Server default setting . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
List of Figures
1.1 Thesis framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Chapter 2 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Wireless architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Query types classification . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Location-dependent query (LDQ) illustration. . . . . . . . . . . . . . 24
2.5 Requesting a static object and moving within a single cell. . . . . . . 29
2.6 Requesting a static object and moving to another cell. . . . . . . . . 29
2.7 Requesting a moving object and moving within a single cell. . . . . . 30
2.8 Requesting a moving object (user and object moves to another cell). . 31
2.9 Requesting a moving object and user stays at same position. . . . . . 32
2.10 Static user requests a moving object. . . . . . . . . . . . . . . . . . . 33
2.11 Periodic query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.12 Non periodic query . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.13 The R-tree illustration [93] . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1 The framework of chapter 3 . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 A scenario presented in two-coordinates . . . . . . . . . . . . . . . . . 65
3.3 The proposed approach . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 A location-dependent query in details . . . . . . . . . . . . . . . . . . 68
3.5 The complexity of vertical movement . . . . . . . . . . . . . . . . . . 75
3.6 The complexity of horizontal movement . . . . . . . . . . . . . . . . . 77
3.7 The complexity of diagonal movement . . . . . . . . . . . . . . . . . 78
3.8 Dynamic query scope for the diagonal movement . . . . . . . . . . . . 80
3.9 Angle of movement illustrations. . . . . . . . . . . . . . . . . . . . . . 82
3.10 The complexity of angle movement. . . . . . . . . . . . . . . . . . . . 83
3.11 Three types of users’ movement . . . . . . . . . . . . . . . . . . . . . 87
3.12 Non-overlapping base stations(BS) . . . . . . . . . . . . . . . . . . . 88
3.13 Multi-cell query illustration . . . . . . . . . . . . . . . . . . . . . . . 91
3.14 An illustration of static query scope . . . . . . . . . . . . . . . . . . . 96
3.15 Dynamic query intersects a base station (BS): (top) in the same line;
(bottom) in two different lines . . . . . . . . . . . . . . . . . . . . 98
3.16 An illustration of dynamic query situation . . . . . . . . . . . . . . . 99
3.17 Illustration of predicted disconnection situation . . . . . . . . . . . . 102
3.18 Stay at the same location (Case Study 3.6.1) . . . . . . . . . . . . . . 111
3.19 Vertical movement (Case Study 3.6.2-1) . . . . . . . . . . . . . . . . . 112
3.20 Vertical movement with overlap situation (Case Study 3.6.2-2) . . . . 113
3.21 Horizontal movement (case study 3.6.3-1) . . . . . . . . . . . . . . . . 114
3.22 Horizontal movement with overlap situation (case study 3.6.3) . . . . 115
3.23 Diagonal movement and overlap situation (Case Study 3.6.4) . . . . . 116
3.24 A query scope is crossing multiple cells . . . . . . . . . . . . . . . . . 118
3.25 Moving across to another base station (BS) boundary . . . . . . . . . 119
3.26 Three situations of overlapping base station area . . . . . . . . . . . . 120
4.1 Chapter 4 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.2 R-tree and 2D coordinates [93] . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Three index structure into 3 cells . . . . . . . . . . . . . . . . . . . . 130
4.4 Tables for cell 1, cell 2 and cell 3 (from left to right) . . . . . . . . . . 131
4.5 Index structure after the records insertion using local index-1 . . . . . 133
4.6 Index structure after the records insertion using local index-2 . . . . . 136
4.7 Global Index for all cells using GI mechanism. . . . . . . . . . . . . . 139
4.8 GI mechanism uses single node pointers. . . . . . . . . . . . . . . . . 142
4.9 Global Index without replicated remote data items. . . . . . . . . . . 144
4.10 GI mechanism where data items are replicated. . . . . . . . . . . . . 145
4.11 Indexing structure at cell 2 after the remote index insertion . . . . . . 148
4.12 Global Index mechanisms case study . . . . . . . . . . . . . . . . . . 149
5.1 Chapter 5 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2 Section 5.2 framework . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.3 An illustration of the DBScan Algorithm . . . . . . . . . . . . . . . . 164
5.4 Simple illustration of our elimination approach . . . . . . . . . . . . . 168
5.5 Complex illustration of our elimination approach . . . . . . . . . . . . 171
5.6 A query scope overlaps with multiple groups . . . . . . . . . . . . . . 171
5.7 Illustration of density elimination . . . . . . . . . . . . . . . . . . . . 173
5.8 Illustration of PDAID retrieval . . . . . . . . . . . . . . . . . . . . . 177
5.9 Initial situation after cached objects have been grouped . . . . . . . . 180
5.10 Density based approach (Case Study 5.4.1-1) . . . . . . . . . . . . . . 181
5.11 Density-based approach (Case Study 5.4.1-2) . . . . . . . . . . . . . . 181
5.12 Path-based approach (Case Study 5.4.2-1) . . . . . . . . . . . . . . . 182
5.13 Path-based approach (Case Study 5.4.2-2) . . . . . . . . . . . . . . . 183
5.14 PDAID-based approach (Case Study 5.4.3) . . . . . . . . . . . . . . . 185
6.1 Number of targets found in a square . . . . . . . . . . . . . . . . . . 190
6.2 Number of targets found in circle . . . . . . . . . . . . . . . . . . . . 191
6.3 Comparison of number of targets found in circle and square . . . . . . 192
6.4 Comparison of number of targets found in each region. . . . . . . . . 193
6.5 Comparison of number of targets found in circle at time t1 and t2. . . 193
6.6 Snapshot of CPU load . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.7 Various searching scope with 100,000 and 500,000 database records . 198
6.8 Various searching scope with 1,000,000 and 5,000,000 database records 199
6.9 A single searching scope with one and five users . . . . . . . . . . . . 203
6.10 A single searching scope with ten and twenty users . . . . . . . . . . 204
6.11 Response time of single BS . . . . . . . . . . . . . . . . . . . . . . . . 205
6.12 Response time of multi-BSs . . . . . . . . . . . . . . . . . . . . . . . 206
6.13 Processing time of individual BSs for the same query scope and two
BSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.14 Processing time of individual BSs for the same query scope and three
BSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.15 Comparison of objects retrieved using a square and a circle (single cell) 211
6.16 Percentage comparison of object retrieval using different sizes of query
scopes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
6.17 Comparison of objects retrieved using a square and a circle . . . . . 213
6.18 Average access time between proposed vs conventional approaches . . 215
6.19 Average access time for the proposed Local Index vs the conventional
approaches (50 Requests) . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.20 Average access time for the proposed Local Index vs the conventional
approaches (150 Requests) . . . . . . . . . . . . . . . . . . . . . . . . 217
6.21 Average access time for a single query . . . . . . . . . . . . . . . . . . 218
6.22 Average access time for a single query: remote indexes only. . . . . . 219
6.23 Comparison of cache hits with various minimum points on each group 220
6.24 Comparison of cache hits with a maximum value of min req is 10. . . 222
6.25 Comparison of cache hits with a maximum value of min req is 20. . . 222
6.26 Comparison of cache hits with a maximum value of min req is 40. . . 223
A.1 Implementation for object validation against query scope . . . . . . . 255
A.2 Snapshot of experiment 1 simulation . . . . . . . . . . . . . . . . . . 256
A.3 Class diagram of server implementation . . . . . . . . . . . . . . . . . 257
A.4 Implementation of a server registering itself to a main server . . . . . 259
A.5 Implementation of how a server keeps listening for incoming requests 260
B.1 Opening page of Planimate . . . . . . . . . . . . . . . . . . . . . . . . 265
B.2 Planimate Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
B.3 Planimate Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
B.4 Initial server processing mechanism model. . . . . . . . . . . . . . . . 267
B.5 Planimate’s components for the server query processing. . . . . . . . 268
B.6 Initial indexing mechanism model. . . . . . . . . . . . . . . . . . . . . 269
B.7 Planimate’s components for the indexing mechanism. . . . . . . . . . 270
B.8 A condition interface on the Planimate. . . . . . . . . . . . . . . . . . 270
B.9 A logic for a node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B.10 Indexing model with an item flow. . . . . . . . . . . . . . . . . . . . . 272
B.11 Planimate’s components are being used to model the proposed client
caching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
List of Algorithms
2.1 The R-tree insertion algorithm. . . . . . . . . . . . . . . . . . . . . . . 40
2.2 The adjusting R-tree algorithm. . . . . . . . . . . . . . . . . . . . . . . 41
2.3 The searching R-tree algorithm. . . . . . . . . . . . . . . . . . . . . . . 43
2.4 Nearest-Neighbour search algorithm. . . . . . . . . . . . . . . . . . . . 44
3.1 The main proposed algorithm . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 The vertical movement algorithm . . . . . . . . . . . . . . . . . . . . . 75
3.3 The horizontal movement algorithm . . . . . . . . . . . . . . . . . . . 76
3.4 The diagonal movement algorithm . . . . . . . . . . . . . . . . . . . . 79
3.5 The dynamic query scope algorithm. . . . . . . . . . . . . . . . . . . . 81
3.6 The angle of movement algorithm. . . . . . . . . . . . . . . . . . . . . 85
3.7 Non-overlapping algorithm. . . . . . . . . . . . . . . . . . . . . . . . . 90
3.8 Eliminating neighbour BS overlapping area algorithm. . . . . . . . . . 93
3.9 Eliminating items from neighbour query result. . . . . . . . . . . . . . 94
3.10 Get Result algorithm for static query scope . . . . . . . . . . . . . . . 97
3.11 Neighbour cell retrieval algorithm for dynamic query scope . . . . . . . 100
3.12 Predicted disconnections algorithm . . . . . . . . . . . . . . . . . . . . 104
3.13 Non-reprocessing algorithm . . . . . . . . . . . . . . . . . . . . . . . . 105
3.14 Reprocessing algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.15 Predictable disconnection algorithm for multi-cell retrieval . . . . . . . 107
3.16 Unpredictable disconnection algorithm for multiple cells retrieval . . . 108
4.1 The Local Index algorithm . . . . . . . . . . . . . . . . . . . . . . . . 132
4.2 The insertion algorithm of Local Index-1 . . . . . . . . . . . . . . . . . 134
4.3 The deletion algorithm of Local Index-1 . . . . . . . . . . . . . . . . . 135
4.4 The insertion algorithm of Local Index-2 . . . . . . . . . . . . . . . . . 137
4.5 The deletion algorithm of Local Index-2 . . . . . . . . . . . . . . . . . 138
4.6 Node maintenance of GI-1 algorithm . . . . . . . . . . . . . . . . . . . 141
4.7 Node Maintenance of GI-2 algorithm . . . . . . . . . . . . . . . . . . . 147
5.1 The proposed path-based elimination algorithm . . . . . . . . . . . . . 170
5.2 Density-based elimination algorithm . . . . . . . . . . . . . . . . . . . 172
5.3 Cache retrieval for PDAID algorithm . . . . . . . . . . . . . . . . . . . 176
5.4 Cached objects elimination of PDAID algorithm. . . . . . . . . . . . . 179
Publications
1. James Jayaputera and David Taniar, “Partial Global Indexing for Location-
Dependent Query Processing”, Encyclopedia of Mobile Computing and Com-
merce, IGI-Global, vol. 2, pp. 739-743, 2007.
2. James Jayaputera and David Taniar, “Data Retrieval for Location-Dependent
Query in a Multi-cell Wireless Environment”, Mobile Information Systems:
An International Journal, IOS Press, vol. 1, no. 2, pp. 91-108, 2005.
3. James Jayaputera and David Taniar: “Query Processing Strategies For Location-
Dependent Information Services”, International Journal of Business Data Com-
munications and Networking, IGI-Global, Vol. 1, No. 2, pp. 17-40, 2005
4. James Jayaputera and David Taniar: “Location-Dependent Query Results
Retrieval in a Multi-cell Wireless Environment”, Parallel and Distributed
Processing and Applications, Lecture Notes in Computer Science, vol 3358,
Springer-Verlag, pp. 49-53, 2004.
5. James Jayaputera and David Taniar: “Defining Scope of Query for Location-
Dependent Information Services”, Embedded and Ubiquitous Computing, Lec-
ture Notes in Computer Science, vol. 3207, Springer-Verlag, pp. 366-376,
2004.
6. James Jayaputera and David Taniar: “Invalidation for CORBA Caching in
Wireless Devices”, Embedded and Ubiquitous Computing, Lecture Notes in
Computer Science, vol. 3207, Springer-Verlag, pp. 460-471, 2004.
Chapter 1
Introduction
1.1 Preamble
Nowadays, people are always on the move, and hence mobile environments are key
to information retrieval. The need to access daily information regarding
the stock exchange, weather, restaurant locations, and so on, is unavoidable. In
this environment, such information can be accessed anytime, anywhere
[2, 84, 16], because people connect their devices to a server wirelessly,
without the limitations of distance boundaries [20].
Queries are sent to servers while users are moving. These queries are called mobile
queries. The queries are accepted by a stationary host, called the Base Station (BS).
A base station is a stationary entity that acts as a mediator, forwarding messages
between the wireless and wired networks for a certain area. The particular area
served by a single base station is called a Cell.
From the mobile users’ point of view, these areas are transparent. In other
words, the users do not know if they move within a single cell or multiple cells.
Based on this area, we categorise queries into two types: Single-cell and Multi-cell
queries. A single-cell query is a query that asks for information about objects
located in the same cell as the mobile user. On the other hand,
a multi-cell query is a query that asks for information about objects in the current
cell and its neighbouring cells.
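To make the distinction concrete, the classification can be sketched in a few lines of Python. This is our illustration rather than an algorithm from the thesis, and it assumes, purely for simplicity, that both the cell and the query scope are circles given as a centre and a radius (`classify_query` is a hypothetical name):

```python
import math

def classify_query(cell_center, cell_radius, scope_center, scope_radius):
    """Return 'single-cell' if the query scope fits entirely inside the
    user's current cell, otherwise 'multi-cell'."""
    # Distance between the cell centre and the centre of the query scope.
    dist = math.hypot(scope_center[0] - cell_center[0],
                      scope_center[1] - cell_center[1])
    # The scope lies inside the cell iff dist + scope_radius <= cell_radius;
    # otherwise the scope crosses the cell boundary into neighbouring cells.
    return "single-cell" if dist + scope_radius <= cell_radius else "multi-cell"
```

A scope well inside the cell is classified as single-cell; one whose circle pokes past the cell boundary requires neighbouring cells and is classified as multi-cell.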
Due to the dynamic nature and limitations of the mobile environment, query
execution needs to cope with these limitations [74, 72, 23, 63, 87]. They include
smaller storage, lower processing capacity and narrower network bandwidth than
in wired networks. Frequent disconnections resulting
from the narrow wireless bandwidth often occur during data transmission.
Mobile query processing can be done on the server side, the client side, or both.
At the server side, the process involves a single server or multiple
servers [128], depending on the query type. The challenges for single-server query
processing are to return correct answers and to handle disconnections. A
correct answer means that the answer is still valid when it is received by the user. A
disconnection might be temporary or permanent; during this period,
the mobile user does not receive any answer from the server.
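The notion of a correct answer can be sketched as a simple validity test. The sketch below is illustrative only: it assumes the server tags each result with the circular region (centre and radius) for which the answer was computed, and treats the answer as valid only while the user is still inside that region when it arrives:

```python
def result_still_valid(result_region, user_position):
    """Check that a query result is still valid on arrival.

    result_region: (cx, cy, r), the circle for which the answer was
    computed (an assumed tagging scheme, not the thesis's protocol).
    A user who has moved outside that circle needs a fresh answer.
    """
    cx, cy, r = result_region
    dx, dy = user_position[0] - cx, user_position[1] - cy
    # Compare squared distances to avoid an unnecessary square root.
    return dx * dx + dy * dy <= r * r
```

In this sketch, the server would rerun the query (or decide to produce the next result) whenever the check fails on delivery.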
On the other hand, query processing that involves multiple servers poses other
challenges. These involve receiving answers from other servers, because the
user request is forwarded by the current server to the appropriate servers on behalf
of the mobile client. The matched data items are sent to the requester cell, where
they are merged with the requester cell's own data items before being sent to
the client. In this case, the server that receives the user query should be aware of the
processing time of the other servers. Otherwise, results that are no longer valid would
be sent to the user, even though they were valid during query processing.
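A minimal sketch of this forwarding-and-merging step follows, under the simplifying assumptions that each server is represented by its list of data items and the query by a predicate; `process_multicell_query` is our name for the routine, not the thesis's:

```python
def process_multicell_query(matches, local_items, neighbour_item_lists):
    """Evaluate a query at the current server, merge in the partial
    answers forwarded back by neighbouring servers, and return one
    duplicate-free result set (an object replicated in two cells
    should appear only once in the answer sent to the client)."""
    merged = {obj for obj in local_items if matches(obj)}   # local answer
    for items in neighbour_item_lists:                      # forwarded answers
        merged |= {obj for obj in items if matches(obj)}    # merge partials
    return sorted(merged)
```

In a real deployment each `items` list would arrive over the network, which is why the receiving server must bound how long it waits for slow neighbours before the merged result goes stale.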
Furthermore, another challenge for multi-server query processing is index
structure traversal. Of the many existing indexing structures [93, 32], tree index
structures are commonly used [123]. The R-tree [41] indexing structure was chosen
for our research because it is one of the best-known tree indexes [93], able
to store multi-dimensional indexes for points and rectangles. Unfortunately,
it lacks efficiency when processing multi-cell queries. For example, suppose a multi-cell query
asks for cells A and B, where each cell has its own R-tree index structure. Both
cells must traverse their tree indexes from the top (root node). Therefore, there is a
demand for new mechanisms to overcome this limitation.
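The redundancy can be seen in a toy R-tree-style range search. This is an illustrative sketch, not Guttman's full algorithm: nodes are plain tuples rather than paged structures, and the point is simply that for a multi-cell query over cells A and B the search must be restarted from each cell's own root:

```python
def boxes_overlap(a, b):
    """Overlap test for axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def rtree_search(node, query_box):
    """Range search over one cell's tree.

    A node is ('leaf', box, [(object, box), ...]) or
    ('inner', box, [children]); every query re-enters at the root."""
    kind, box, payload = node
    if not boxes_overlap(box, query_box):
        return []                              # prune the whole subtree
    if kind == 'leaf':
        return [obj for obj, obj_box in payload
                if boxes_overlap(obj_box, query_box)]
    results = []
    for child in payload:
        results.extend(rtree_search(child, query_box))
    return results
```

For a multi-cell query, `rtree_search(root_A, q)` and `rtree_search(root_B, q)` must both be run from scratch; avoiding that repeated top-down descent is what the Local and Global Index mechanisms of Chapter 4 target.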
Client-side query processing can be achieved by providing a cache on the mobile
device. A cache is temporary space in local storage for the most frequently requested
data items. Client devices place query results in the cache upon receiving them.
When the client issues similar queries, they are answered from the
cache if the results are available there; hence, communication
with the server can be avoided. Existing caching mechanisms have been applied in wired
and wireless networks, such as the Web [3, 114] and distributed systems [106].
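The cache-first lookup just described can be sketched as follows; `answer_query` and `fetch_from_server` are illustrative names (not from the thesis), and the cache is simply a dictionary keyed by the query:

```python
def answer_query(query_key, cache, fetch_from_server):
    """Answer from the local cache when possible; otherwise contact
    the server and keep the result for later reuse."""
    if query_key in cache:                    # hit: no wireless communication
        return cache[query_key], "hit"
    result = fetch_from_server(query_key)     # miss: ask the server
    cache[query_key] = result                 # store for future similar queries
    return result, "miss"
```

The second time the same query is issued, the server (and the wireless link) is not contacted at all, which is exactly what makes caching attractive under frequent disconnections.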
In wireless networks, caching data items is an appropriate way to handle the
limitations of a mobile environment, such as frequent disconnections, small storage
size and narrow wireless bandwidth. The reason is that query results that have been
requested before can be obtained from the local copy without connecting to the
server. Several caching mechanisms have been developed in this environment
[46, 64, 60, 61, 57, 58, 126, 18]. However, developing the best client cache
replacement algorithms within the limitations of mobile environments is challenging
and will be addressed in this thesis.
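The cache-first flow just described can be sketched minimally (all names are illustrative; a bounded dictionary with simple LRU eviction stands in for the limited client store, not for the policies proposed later in this thesis):

```python
# Minimal sketch of the cache-first lookup: answer from the local copy on
# a hit, and contact the server only on a miss.

class ClientCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}          # query -> cached result
        self.order = []          # recency list for a simple LRU eviction

    def get(self, query):
        if query in self.store:              # cache hit: no server contact
            self.order.remove(query)
            self.order.append(query)
            return self.store[query]
        return None                          # cache miss

    def put(self, query, result):
        if query not in self.store and len(self.store) >= self.capacity:
            victim = self.order.pop(0)       # evict the least recently used entry
            del self.store[victim]
        self.store[query] = result
        if query in self.order:
            self.order.remove(query)
        self.order.append(query)

def answer(query, cache, fetch_from_server):
    """Answer from the cache if possible; otherwise fetch and cache the result."""
    result = cache.get(query)
    if result is None:
        result = fetch_from_server(query)
        cache.put(query, result)
    return result
```

Repeated queries are then served without any wireless communication, which is precisely what mitigates disconnections and narrow bandwidth.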
1.2 Limitations of Mobile Environments
Current mobile computing devices are limited in their ability to complete jobs
that desktop computers handle easily, and these limitations continue to challenge
researchers. The limitations of mobile computing devices are described as follows:
• Limited resources
The resource constraints of mobile computing devices mean that they can process
fewer jobs than a desktop computer. The latest processing speed for current
PDAs is 500 MHz, which is 6-7 times slower than a desktop computer. In
addition, mobile computing devices are powered by batteries, whose power is
consumed mostly by screen back-lighting, Central Processing Units (CPUs),
memory, hard disks, data transmission and displays. Therefore, traditional
applications that consume large amounts of resources do not run efficiently on
mobile computing devices.
• Limited geographical coverage
Limitation of geographical coverage is another constraint in the mobile envi-
ronment. This means that a base station can cover only a particular wireless
service area, called a cell.
• Mobility
Mobile devices can be carried from one location to another: mobility means
that a mobile user moves from one geographical location to another, either
within a wireless service area or to a different one. When a mobile device
moves to a different wireless service area, an event called hand-off occurs,
which transfers communication from the previous wireless service area to the
current one.
• Disconnections
Disconnections in mobile environments can be classified into two categories:
long and frequent. In the first category, mobile devices run out of power or
move out of service coverage. The latter category is the major difficulty, and
occurs because of echoed signals, interference from other signals, movement to
a different network and other issues.
• Variable bandwidth
The bandwidth available to mobile computing devices varies, for example when
mobile users are far from a base station. The maximum bandwidth available to
a mobile device is 100 Kbps for General Packet Radio Service (GPRS) [5]
and 155 Mbps for wireless LAN [24].
• Data transfer cost
Currently, transferring data items is still expensive: sending and receiving data
costs AUD 30 cents per megabyte [49]. This is caused by factors such as the
cost of spare parts and worker productivity [1].
• Small screen devices
A mobile device's screen is much smaller than a desktop monitor, making it
impossible to view a long list of records at once.
1.3 Objectives of this Thesis
In this thesis, mobile query processing techniques are investigated, focusing on
both the client and server sides. Our objective is to develop new algorithms that
increase the performance of mobile query processing.
The following issues are focused upon in this thesis in order to achieve the ob-
jective mentioned above:
• to create an innovative server-side query processing technique that is divided
into single-cell and multi-cell query processing;
• to design indexing mechanisms for multi-cell query processing;
• to model new caching replacement schemes for mobile devices; and
• to implement and evaluate the proposed approaches above.
1.4 Scope of Research
Traditional query processing techniques are not adequate for processing mobile
queries due to the limitations of mobile devices, mobile environment and mobil-
ity of users [81, 79]. Hence, this research investigates several outstanding issues
around mobile query processing at server and client sides. Then, we propose solu-
tions to these outstanding issues by developing new algorithms or modifying existing
algorithms.
Figure 1.1 depicts the scope of this research. The thesis consists of three core
areas, which include developing query processing algorithms for server and client
sides. The first two core areas are query processing at the server side, whereas
the last one focuses on the client side. Therefore, it is important to carry out
some investigations of query processing mechanisms for both sides which includes
Figure 1.1: Thesis framework

addressing the issues raised by the limitations of the mobile environment and its
devices. The investigation also needs to address the nature of the mobile environment,
in particular query processing at the client side, to handle small storage size
and narrow wireless network bandwidth. On the other hand, the query processing
techniques at the server side deal with queries that request information located in
either single or multiple stationary service areas.
1.5 Contributions
The specific contributions of this thesis are listed below:
• Query Processing at Server Side
Several algorithms to process mobile queries at the server side are proposed.
These algorithms are categorised into two major parts: Query Processing and
Handling Disconnection algorithms. The query processing part focuses on
answering Single-Cell and Multi-Cell queries. A single-cell query is a request
for information of objects that is located within one cell, whereas, a multi-cell
CHAPTER 1. INTRODUCTION 8
query is when the user requests information of objects within several cells. The
handling disconnection part deals with frequent disconnections, which occur
during query result transmissions.
• Indexing for Multi-Cell Query Processing
An extension of an existing multi-dimensional index structure is proposed
to support the processing of multi-cell queries. Two index mechanisms,
namely Local and Global indexes, are introduced with the aim of minimising
query processing when result retrieval involves multiple cells. The local index
mechanism retrieves query results and builds the indexes locally. The global
index mechanism, which is built while a BS is starting up, contains indexes
of all online BSs. For both mechanisms, remote data items can be either
replicated in the local cell or kept in the remote cell.
• Cache Replacement Policy for Mobile Client Caching
In order to address the major mobile issues of small display screens and storage,
a number of cache replacement policies are developed. The main aims of the
proposed cache replacement policies are to increase the cache hit rate and to
handle the issue of a small display screen. Three cache replacement policies
are proposed: the Path-based, Density-based and Probability Density Area
Inverse Distance (PDAID) mechanisms. The first mechanism eliminates the
group of cached objects that is farthest from the user. The density-based
mechanism evicts the group of cached objects that is least dense. The PDAID
mechanism evicts the group of objects with the smallest value computed by a
formula based on distance, weight and the area of the query scope.
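The actual PDAID formula is defined in Chapter 5; as a rough, hypothetical sketch of the idea, a stand-in score below combines the three ingredients named above (distance, weight, and query-scope area), and the group with the smallest score is evicted first. All names and the formula itself are illustrative, not the thesis's definition.

```python
# Hedged sketch of score-based group eviction: weight * area, discounted by
# the group's distance from the user (an inverse-distance factor, as the
# PDAID name suggests). The group with the smallest score is the victim.
import math

def score(group, user_pos):
    """Illustrative score for one cached group of objects."""
    dx = group["centre"][0] - user_pos[0]
    dy = group["centre"][1] - user_pos[1]
    dist = math.hypot(dx, dy) or 1e-9        # guard against division by zero
    return group["weight"] * group["area"] / dist

def evict(groups, user_pos):
    """Remove and return the cached group with the smallest score."""
    victim = min(groups, key=lambda g: score(g, user_pos))
    groups.remove(victim)
    return victim
```

With this stand-in score, distant groups receive small values and are evicted first, matching the intuition behind the path-based policy as well.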
1.6 Organisation of the Thesis
The thesis is organised into four parts which are separated into six chapters. The
first part is a review of the literature, the second one is our proposed approach,
the third is the implementation of our proposed approaches and the fourth is the
conclusion of this thesis. The details of thesis organisation are explained as follows:
Chapter 2 presents existing related works in mobile query processing. The aim
of this chapter is to investigate the work done by other researchers in the same area;
to outline the achievements of their works in the same domain; and to analyse the
benefits and shortcomings of the works. This chapter also focuses on the problems
which still need to be investigated.
The core of this thesis, which concentrates on the problems pointed out in Chap-
ter 2, is divided into three major elements: (i) Query Processing at Server Side, (ii)
Indexing Mechanisms for Multi-Cell Query Processing, and (iii) Client Cache Re-
placement Policies.
Chapter 3 presents query processing techniques at the server side, which are
categorised into single-cell and multi-cell query processing. The proposed approach is
based on the need to choose a correct and efficient shape as a query scope. It also
needs to apply an effective algorithm in mobile query processing. The proposed ap-
proach is elaborated upon in detail and the query processing algorithm is explained.
Chapter 4 elaborates upon the indexing mechanisms for multi-cell query pro-
cessing. The aim of this chapter is not to propose a new indexing structure, but to
propose mechanisms which use an existing indexing structure to process multi-cell
queries. The mechanisms are divided into local and global indexing mechanisms for
processing multi-cell queries at the server side.
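As a rough illustration of the global mechanism just described (all structures and names here are illustrative, not the thesis's actual design), a global index built while a base station starts up can map object identifiers to the base station that holds them, so a multi-cell query is routed directly rather than forwarded blindly:

```python
# Illustrative sketch of a global index: built at BS start-up from the
# object holdings of all online base stations, it lets any BS answer
# locally or name the remote cell that owns the requested object.

class BaseStation:
    def __init__(self, name, local_objects):
        self.name = name
        self.local = dict(local_objects)     # object id -> data held locally
        self.global_index = {}               # object id -> owning BS name

    def build_global_index(self, online_stations):
        """Run at start-up: record which online BS owns each object."""
        for bs in online_stations:
            for oid in bs.local:
                self.global_index[oid] = bs.name

    def locate(self, oid):
        """Answer locally if possible, otherwise name the remote cell."""
        if oid in self.local:
            return ("local", self.name)
        owner = self.global_index.get(oid)
        return ("remote", owner) if owner else ("unknown", None)
```

Whether the remote object is then replicated locally or fetched from the remote cell is the design choice the two mechanisms differ on.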
Chapter 5 describes query processing techniques at the client side. The mech-
anisms include three cache replacement policies to deal with partial answers. In-
creasing cache hit performance and minimising transfer costs are the purposes of
the mechanisms.
Chapter 6 presents the performance evaluation of the proposed mechanisms men-
tioned in Chapters 3, 4 and 5. The evaluation includes formulating various cost
models for each proposed mechanism through implementation and simulation.
The last chapter concludes the contents of this thesis. The contents include
a summary of the research contributions and results achieved and presents future
issues for further investigation.
Chapter 2
Literature Review
2.1 Introduction
This chapter presents a comprehensive review including related work in the area of
mobile query processing. The main purpose of this chapter is to supply a broad
knowledge of the existing works related to this thesis. The chapter not only
provides a general summary of mobile technology for query processing, but also
analyses what other researchers have done in the area of query processing.
The organisation of this chapter, as shown in Figure 2.1, is as follows. Two
preliminary sections that provide background knowledge are presented in Sections
2.2 and 2.3. In Section 2.2, a global overview of wireless networks architecture is
presented. Section 2.3 describes a framework of queries.
Discussions of existing works are given in Sections 2.4, 2.5 and 2.6. Section 2.4
discusses the query processing at server side. Existing works on indexing structures
are discussed in Section 2.5. A discussion on existing client caching is described in
Section 2.6. Section 2.7 presents the problems that have not yet been resolved. The
last section concludes this chapter.
CHAPTER 2. LITERATURE REVIEW 12
Figure 2.1: Chapter 2 framework
2.2 Wireless Environment Architecture
A mobile computing environment has an architecture similar to that of the wired
network in terms of query processing and data communication. It means that mobile
computing devices can process the query and communicate with other devices, in a
similar way to the wired network. However, each environment has a different way
of dealing with query processing and communications. A mobile device with its
limitations can process fewer queries compared with the device in a wired network.
In terms of communication, mobile computing devices use the air as their medium.
In general, the wireless environment has two important devices: a mobile device
and a stationary host. Mobile computing devices can communicate through the air
with either mobile devices or stationary hosts. All queries are transferred by moving
users to a stationary server via wireless communication as shown in Figure 2.2.
Stationary hosts are called Base Stations [44, 127], Mobile Support Stations [44, 127],
or home base nodes [127]. A base station is a stationary host that acts as a mediator
between a wired network and wireless hosts for a specific area, called a cell. This
type of computing is called mobile computing [51].
Figure 2.2: Wireless architecture
2.2.1 Wireless Technologies
This section discusses the current wireless technologies, and includes wireless tech-
nologies used for indoor and outdoor networks.
• In-room Network
In this category, a mobile device can communicate with other mobile devices
using short-range wireless links. Two types of in-room network are mentioned
in [44]: infrared and radio frequency. In the first type, the wireless network
coverage is about 40-50 metres with a supported bandwidth of about 1 Mbps.
The most common standard for this network technology at present is the
Infrared Data Association (IrDA) standard.
On the other hand, the Bluetooth Special Interest Group produced the in-room
radio frequency technology in 1998 [14]. Bluetooth is a low-cost, short-range
radio that connects mobile PCs with other Bluetooth devices within wireless
network coverage ranging from 1 metre up to 100 metres. The data transfer
rate is up to 3 Mbps.
• Wireless LAN (WLAN)
A wireless local area network provides location-independent communication by
connecting two or more mobile computing devices without using wires. This
technology provides wide wireless bandwidth to low mobility clients. The aim
of WLANs is to provide a wireless bridge to conventional wired networks rather
than supporting true mobility [88]. This technology expands the range of the
infrared and Bluetooth technologies by extending the network diameter to
about 200 m [44]. It provides low-mobility, high-data-rate data communications
within a confined region [127].
Table 2.1: Comparison of IEEE 802.11 standards
IEEE standard    Speed (Mbps)    Frequency band
802.11           1-2             2.4 GHz
802.11a          up to 54        5 GHz
802.11b          5.5-11          2.4 GHz
802.11g          up to 54        2.4 GHz
Amongst the several available WLAN standards, the IEEE (Institute of
Electrical and Electronics Engineers) 802.11 standard is the most successful
today, and it is superficially similar to Ethernet [38]. The IEEE 802.11
standard comprises a number of protocols [108]; however, only three have been
widely used, namely IEEE 802.11a, IEEE 802.11b and IEEE 802.11g [38].
Table 2.1 gives a summary of these three types of IEEE 802.11. The table
shows that the first-generation IEEE 802.11 is slow in terms of bandwidth. On
the other hand, Table 2.2 [79] shows a comparison of the WLAN standards
802.11a and 802.11b in more detail by considering several factors.

Table 2.2: Comparison of wireless local area network (WLAN) standards - 802.11a versus 802.11b

Factor                    IEEE 802.11b                        IEEE 802.11a
Time table                Standard in 1997,                   Standard in 2001,
                          products in 2000                    products in 2002
Frequency band            Transmits at 2.4 GHz; the IEEE      5 GHz
and bandwidth             802.11g standard increases the
                          speed of 802.11b to 22 Mbps in
                          the same 2.4 GHz band
Speed                     11 Mbps (effective speed about      54 Mbps (effective speed about
                          half of rated speed)                50% of rated speed)
Modulation                Spread Spectrum                     OFDM (Orthogonal Frequency
technique                                                     Division Multiplexing)
Distance coverage         Up to 300 feet                      60 feet; speed goes down
                                                              with increased distance
Maturity                  More matured products               Less matured but
                                                              progressing fast
Number of access          Every 200 feet in each direction    Every 50 feet
points required
Market penetration        Quite widespread                    Just starting in 2002
Interference with         Band is more polluted;              Less interference because
other devices             significant interference here       few devices in this band
Interoperability          Current problems expected           Problems now but
                          to be resolved in future            expect resolution soon
Cost                      Cheaper: $300 for access point      More expensive: $500 (in
                          and $75 for adapter                 2001/2002); will come down
Vendors                   Major vendors in both camps
• Broadband Wireless Network
Broadband Wireless (BW) [81] is a wireless technology, recently deployed in
metropolitan areas, that allows simultaneous wireless delivery of voice, data,
and video. It requires a clear line of sight between the transmitter and the
mobile computing devices. Its two types are Local Multi-point Distribution
Service (LMDS) and Multi-channel Multi-point Distribution Service (MMDS).
The first, LMDS, uses a high-bandwidth wireless frequency in the 20-31 GHz
range. The second, MMDS, uses a lower wireless frequency around 2 GHz and
has a coverage of up to 35 miles (roughly 56 km).
• Wide Area Wireless/Radio Network
Wide Area Wireless is designed to provide data transmission and its infrastruc-
ture consists of base stations, network control centres and switches to transmit
the data [127]. The characteristics of Wide Area Wireless are high mobility,
wide ranging and low data rate digital communication [88, 127]. This network
type can be categorised into public and private radio network [88]. The first
category is the wireless data communications supplied to the public by service
providers and the average data rate is 4800 bps to 19.2 Kbps [127]. The second
category is provided by a private company for its own purposes. Examples of
public packet data networks are ARDIS, CDPD, Ericsson's Enhanced Digital
Access Communication Systems (EDACS), Metricom, Mobitex and Motorola
Datatrac [33].
• Satellite-based Network
The satellite network has been used to deliver communication, which relays
voice, video or data, since the 1960s [26]. The satellite-based network is
characterised by wide coverage, high cost, two-way communication and low
voice quality. Its coverage spans the oceans as well as remote land areas [70].
It provides two-way communications; however, voice quality is low and data
capacity limited [127, 88]. It is also expensive to provide this type of
network [31].
There are three common terms used for these satellites based on their dis-
tance and spatial relationship with the earth, namely GEOstationary Satellites
(GEOS), Medium Earth Orbit Satellites (MEOS) and Low Earth Orbit Satel-
lites (LEOS) [88, 31, 110]. GEOS, MEOS and LEOS are located at altitudes
of 35,786 km, 10,000 km and 1,000 km respectively.
• Cellular Network
The cellular network has evolved from first generation up to fourth generation.
The first generation (1G) of cellular systems appeared in the early 1980s and
is based on analog technology [6]. Voice is transmitted using Frequency
Modulation (FM) [88]. The first generation is characterised by low capacity, a
lack of security and unsuitability for non-voice applications [6]. The data
transfer rate is 1.2-9.6 Kbps [88].
In the early 1990s, the second generation (2G) of cellular systems appeared
and was heralded by the arrival of digital modulation techniques that promised
increased capacity, better speech quality, enhanced security features, and more
efficient terminals [6]. It has a data transfer rate of 9 to 14 Kbps [88].
Examples of second generation cellular networks include Time Division
Multiple Access (TDMA), Code Division Multiple Access (CDMA), the Global
System for Mobile Communications (GSM) and Personal Digital Cellular (PDC).
The second-and-a-half generation (2.5G) is an enhancement of the second
generation. Examples include Enhanced Data Rates for Global Evolution
(EDGE), High-Speed Circuit-Switched Data (HSCSD) and General Packet
Radio Service (GPRS), with data transfer rates of 474 Kbps, 38.4 Kbps and
171.2 Kbps respectively [82].
The third generation was developed in 1992. Examples of the third generation
include the Universal Mobile Telecommunications System (UMTS) and Code
Division Multiple Access 2000 (CDMA2000). This generation has three
categories of data rates, as follows [6]:
– 2.4 Mbps to stationary users (fixed location)
– 384 Kbps to pedestrian users (travel speed: 3 km/hour)
– 144 Kbps to vehicular users (travel speed: 60 km/hour)
The next step beyond the 3G wireless network is 3.5G, with data rates of
3 Mbits/sec [29].
The fourth generation has not officially been released yet, but it is expected
that this generation will support applications up to 1 Gbps [53].
As mentioned earlier in this section, a cell is the service area of one BS, and
cells may differ in size. According to [71, 35], cells are classified into three
types: Macro, Micro and Pico cells. A Macro cell has a radius of 700-8000
metres, a data transfer rate of 144-384 Kbps and a bandwidth frequency of
11.34 MHz. A Micro cell has a radius of 75-700 metres, a data transfer rate
of 384 Kbps and a bandwidth frequency of 1.26 MHz. A Pico cell has a radius
of 20-75 metres, a data transfer rate of 384 Kbps - 2 Mbps and a bandwidth
frequency of 1.26 MHz.
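The classification above reduces to a small lookup by radius; a minimal sketch, using exactly the ranges quoted from [71, 35]:

```python
def classify_cell(radius_m):
    """Classify a cell by its radius in metres, using the quoted ranges
    (Pico 20-75 m, Micro 75-700 m, Macro 700-8000 m)."""
    if 20 <= radius_m < 75:
        return "Pico"
    if 75 <= radius_m < 700:
        return "Micro"
    if 700 <= radius_m <= 8000:
        return "Macro"
    return "out of range"
```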
2.2.2 Location Positioning Systems
This section discusses available location positioning devices which are used to reg-
ister mobile user details in order to use a wireless facility.
• Satellite Positioning
The most popular satellite positioning system is the Global Positioning System
(GPS). This system provides two basic types of service: the Standard
Positioning Service (SPS) and the Precise Positioning Service (PPS) [56]. The
SPS is a positioning and timing service aimed at civilian users, whereas the
PPS is a positioning, velocity and timing service for military applications. The
second service is restricted to authorised users only (such as the United States
and allied militaries and the US government). Another satellite positioning
system, called Galileo, will start operation in 2009.
• Cellular Positioning
Cellular positioning integrates GPS with the cellular network, so that the
network provides terminals with assistance data and corrections for the
satellites [56]. Examples of cellular positioning for the second generation
cellular network (GSM, the Global System for Mobile Communications) are
Cell-Id in combination with timing advance, Enhanced Observed Time
Difference (E-OTD), Uplink Time Difference of Arrival (U-TDoA) and
Assisted GPS (A-GPS). Introducing Cell-Id and A-GPS into existing GSM
networks is comparatively simple, while E-OTD and U-TDoA require
substantial modifications and extensions.
Table 2.3: Performance characteristics of cellular positioning methods
Method          Accuracy (Rural)  Accuracy (Suburban)  Accuracy (Urban)  Consistency  Yield
Cell-Id         >10 km            2-10 km              50-1,000 m        Poor         Good
E-OTD & OTDoA   50-150 m          50-250 m             50-300 m          Average      Average
U-TDoA          50-120 m          40-50 m              40-50 m           Average      Average
A-GPS           10-40 m           20-100 m             30-150 m          Good         Good
Examples of cellular positioning for the third generation cellular network are
Cell-based methods, Observed Time Difference of Arrival with Idle Period
Downlink (OTDoA-IPDL) and Assisted GPS (A-GPS). Table 2.3 shows the
performance characteristics of each cellular positioning method [56]. The table
shows that A-GPS is the most accurate and consistent of the methods, even
though its service area is the smallest.
Assisted GPS (A-GPS) is a hybrid solution that uses information from both
the satellites and the network [4]. This technology enables a mobile terminal
with a GPS receiver to be positioned faster and more accurately [112]. The
A-GPS equipment is located at BSs and feeds information to mobile computing
devices. This technology has been used in the “KDDI au network” in Japan
[112]. The advantages of using A-GPS are: (i) improved accuracy, (ii) reduced
position acquisition time, (iii) lower power consumption at the GPS receiver,
and (iv) increased receiver sensitivity [4].
• Indoor Positioning
This positioning system operates within an indoor or local environment, such
as shopping centres or buildings. There are four indoor-based positioning sys-
tems: WLAN-based, Radio Frequency Identification (RFID)-based, infrared-
based and ultrasound-based. The first method is the most popular and uses
IEEE 802.11 devices. RFID is an emerging technology that is primarily used
today for applications such as asset management, access control, textile
identification, toll collection and factory automation [56].
Some such projects include Xerox ParcTab [117], the Wireless Indoor Position-
ing System (WIPS) project [119], Active Bat [118] and the Cricket system [92].
The first two projects use infrared-based positioning [117, 119]. The last two
projects use ultrasounds and a combination of ultrasounds and radio respec-
tively [118, 92].
2.3 Query Types
This section describes the classification of query types in a mobile environment.
Query types are divided into two general classes: Traditional and Mobile Queries.
The traditional category contains common query types that exist in wired network
databases, whereas the mobile category contains queries that exist only in a
wireless environment.
Figure 2.3 shows the classification of query types in a mobile environment.
Traditional queries are typical database queries. Classified by geographical
presentation, this type of query can be divided into two classes: Location-Aware
and Non-location. In the mobile computing environment, the location of mobile
users is dynamic and query results often depend on this dynamic location. This
situation therefore creates an additional class, called Location-Dependent Queries.
Figure 2.3: Query types classification
2.3.1 Traditional Query
Traditional query is the most widely known query used in a database. The query
types of traditional query can be classified as: Spatial, Temporal, Spatio-Temporal
(Hybrid) and Others.
A Spatial query performs operations which include spatial searches and map
overlay, as well as distance-related operations [37]. A spatial query always requests
spatial data, which have a complex structure, are often dynamic, and for which no
standard algebra is defined.
A Temporal query specifies a validity or deadline for the query results to be
returned. Example: “A student retrieves a subject timetable for this year”. The
subject timetable is not valid for a past or future year.
A Spatial-Temporal (Spatio-temporal) query requests for a spatial search and
specifies the validity or deadline for the query results to be received. Example:
“Retrieve the five ambulances that were nearest to the location of the accident
between 4-5pm.” [90].
The last category is Others, which covers queries that do not belong to any of the
classifications above. Examples:
• A tourist requests restaurant information.
• Students request their academic records or contact details.
2.3.2 Location Query
The authors of [50] were the first to introduce the idea of queries with location
constraints. These queries have a location parameter, and the query result is
related to, or depends on, that parameter.
A Location-Dependent Query [130, 63, 94] is a type of query whose answer
depends on the current location of the requester. For example, "select all restaurants
within 500 metres from my location". The answer should give a list of restaurants
within 500 metres of the requester's current location, as illustrated in Figure 2.4. If
the requester moves to a new location, the list of restaurants changes. Location is
an important field in this type of query, and it can be mentioned implicitly or
explicitly in the query [94].
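The restaurant example can be sketched as a simple range filter (coordinates and data here are illustrative, and planar Euclidean distance stands in for real geodesic distance):

```python
# Sketch of "restaurants within 500 metres of my location": filter the
# candidate set by distance to the requester's current position.
import math

def nearby(restaurants, user_pos, radius_m=500):
    """Return the names of restaurants within radius_m of user_pos."""
    ux, uy = user_pos
    return [name for name, (x, y) in restaurants
            if math.hypot(x - ux, y - uy) <= radius_m]
```

Re-evaluating the same query at a new position yields a different answer set, which is exactly what makes the query location-dependent.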
These types of queries can be further categorised into two groups. The first group
is based on sources and objects, and the second on query retrieval [113]. The
sources and objects are, respectively, the users sending the query and the objects
being searched; each can be either static or moving. The second grouping is based
on the query retrieval state: one-time or continuous.

Figure 2.4: Location-dependent query (LDQ) illustration.

A one-time query expects its result once. A continuous query, as the name implies,
receives results based on the current location of the source at successive moments
in time; the query is sent only once, and updated location information is sent to
notify the server that the client has moved to a different location. Both groups are
elaborated below.
(a) Data sources and objects states
This group focuses on the location states of users and objects while a user query
is being processed. The location of each can be static or dynamic during query
processing.
Table 2.4 shows the division of group one. As we can see from the table, category
one is further divided into four subgroups.

Table 2.4: Mobile query category 1

                 User static   User moving
Object static    -             x
Object moving    x             x

The first subgroup is a static user probing for static object/s. This subgroup
does not involve a mobility factor for
either users or objects. Whenever the query is sent, the query result returned
will always be the same. Therefore, the first subgroup cannot be classified as a
Location-Dependent Query.
The remaining three subgroups are: a moving user probing static object/s, a
moving user probing moving object/s, and a static user probing moving object/s.
Details of query processing for these subgroups can be found in Section 2.4. Below
is a summary of these subgroups and their examples:
• Moving user searching for static object/s
In this query type, a user or requester is moving while issuing a query and
the requested query results are static. Examples of this type of query:
– While driving, a taxi driver requests restaurants within 500 metres of
the current location.
– A tour guide in a moving car requests information about tourist at-
tractions nearby.
In the first example, the searching distance is explicitly mentioned, whereas
in the second it is not. This situation applies not only to this type but also
to the other two. [100, 111] provide common operators for constrained
location-dependent queries, which can be applied to location-dependent
queries in both groups.
• Moving user searches for moving object/s.
Both users and objects are moving for this type of query. Below are exam-
ples of query types:
– A walking person is searching for an available taxi close to his location.
– Police in a patrol vehicle are pursuing a running thief.
• Static user searching for moving object/s.
In this query type, the user remains in the same position while asking for
moving object/s. Below are examples of this query type:
– A security officer in a control room is searching for a fleeing thief.
– An officer in a control room is asking for landing time when an aircraft
is landing.
(b) Query Retrieval States
The second category relates to how often the query result is expected to be
received, that is, whether it is periodic or one-time. Table 2.5 shows query types
in category 1 used in combination with query types in category 2.

Table 2.5: Mobile query category 2

                              Periodic   One-time
Static user - moving object      x          x
Moving user - static object      x          x
Moving user - moving object      x          x
Details of both types in category 2 are specified below:
• One-time Query
A one-time query expects its result to be received once; it does not depend
on a time interval. All query types in category 1 are one-time queries if
their results are received once.
• Periodic Query
A periodic query is similar to a one-time query, except that query results are received at every time interval, with the interval specified in the query. A periodic query is also called a range-monitoring query [17]; it is used to monitor a query continuously. The results returned in one interval may be the same as or different from those returned in the previous interval. Example: “A moving car asks for traffic conditions within 500 metres every 5 minutes”.
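The contrast between the two retrieval states can be sketched as follows. This is a minimal illustration, not taken from any cited system; `evaluate_query`, the stubbed result it returns, and the interval handling are all hypothetical:

```python
import time

def evaluate_query(location, radius):
    """Hypothetical server-side evaluation: return objects within
    `radius` metres of `location` (stubbed for illustration)."""
    return [("traffic_report", location, radius)]

def one_time_query(location, radius):
    # One-time: the result is produced and delivered exactly once.
    return evaluate_query(location, radius)

def periodic_query(get_location, radius, interval_s, rounds):
    # Periodic: the result is re-evaluated at every interval until the
    # user cancels (here: after a fixed number of rounds, to stay finite).
    results = []
    for _ in range(rounds):
        results.append(evaluate_query(get_location(), radius))
        time.sleep(interval_s)
    return results
```

Note that the periodic variant re-reads the user's location each round, so consecutive results may differ, exactly as described above.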
2.4 Server Query Processing
This section presents a discussion of existing work on location-dependent query
processing at the server side. A brief overview is presented first to provide an idea
of how a location-dependent query is processed, followed by query processing at the
server side, indexing structures used at the server side and query processing at the
client side.
2.4.1 Overview of Location-Dependent Query Processing
As mentioned in the earlier section, mobile users need to register with a base station or location positioning device (see Section 2.2.2) in order to use a wireless facility. This registration process includes registering the location details of mobile users [67, 19].
After the registration process has been completed, a location-dependent query sent by a moving user is received by a base station. In processing this query, user mobility factors are taken into account, since they are important in answering a location-dependent query [30]. The mobility factors include the current position, velocity and direction of the user, all of which are linked to the query. This information is used to predict the user's next location. Once the next position is known, the server probes its database to match object information against the user query.
While the query is being processed, the mobile user moves to another location, which could be inside the same cell or another cell. The movement of a mobile user falls into two categories: constrained and unconstrained movement [85]. The former is movement within a network; for example, users may be driving a car, riding a bicycle, or travelling by tram or train, and roads can be either one-way or two-way. The latter is movement that is not restricted, for example, walking.
Furthermore, there are three important situations that can lead to wrong answers being given to the recipients. To illustrate, consider that most wireless applications use GPS to obtain an accurate location, and suppose the server has started to process the query. In one situation, the user might disconnect while reporting its position, so the positioning system does not have the correct location information of the user and gives the server the old location instead of the current one. In another situation, the server uses the current location information collected from the GPS, but the location given is out of date because the GPS has not been updated with the latest position. This last situation might occur where the user is expected to enter cell A but instead enters cell B, leading the server in cell A to process unnecessary requests.
Figure 2.5: Requesting a static object and moving within a single cell.
Figure 2.5 shows a global overview of location-dependent query processing within a single cell. A mobile user sends a location-dependent query to a server through a BS, asking for static objects. The server generates a query result for that query. The query result is received by the user, but the result is invalid: the user has since reached a new location, and the result does not cover the objects within the query scope of the new location.
Figure 2.6: Requesting a static object and moving to another cell.
Figure 2.6 illustrates a request for a static object with multi-cell movement. A moving user requests a static object while moving into another cell.
server processes the query and sends the query results to the requester. Since the
requester has moved to another cell, the BS forwards the query result to the BS
where the requester is located. When the query result is received by the requester,
the received result is invalid since the result contains object information from the
previous location.
Figure 2.7: Requesting a moving object and moving within a single cell.
A global overview of requesting moving objects while moving within a single cell is shown in Figure 2.7. In the figure, a moving truck requests a moving object from its location. At the same time, a moving object is registering itself with the BS. The server processes the query and returns a result to the requester, that is, the moving truck. By the time the query result arrives, the truck has moved to another location that is out of range of the query scope. Therefore, the truck receives an invalid query result.
Figure 2.8: Requesting a moving object (user and object moves to another cell).
Figure 2.8 shows a user searching for a moving object where both the user and the object move to another cell. While sending the query, the user moves to another location that resides inside a different cell. The server generates the query result and sends it back to the requester. At the same time, the object moves to another cell before acknowledging its new position to the current cell. Therefore, the server sends an old position of the moving object to the requester, and the user receives an invalid result.
Figure 2.9: Requesting a moving object and user stays at same position.
Figure 2.9 shows a user in the control room requesting a moving object. If the object has updated its position before the server processes the query, the server generates a correct result, which is received by the user. On the other hand, if the server finishes processing the query before the object updates its position, the user receives incorrect information.
Figure 2.10 shows a similar situation to that shown in Figure 2.9. However, the object moves to another cell. When the object updates its location before the server has finished processing the query, the user receives the correct information. Otherwise, the user will receive a query result that contains incorrect information.
Figure 2.10: Static user requests a moving object.
Figure 2.11: Periodic query
Figure 2.11 presents a periodic query illustration. In the figure, a user sends a query once while expecting a query result to be sent at every time interval. The server processes the query and sends the result at every interval. The process ends when the user asks the server to stop sending query results.
Figure 2.12: Non-periodic query
Figure 2.12 presents an illustration of a one-time query, where a user sends a query and receives a query result once. Afterwards, the server no longer processes the query.
2.4.2 Query Processing for a Single Cell
This section presents query processing mechanisms, focusing in particular on location-dependent query processing while the mobile user is travelling within a single cell. The discussion covers a variety of query scope shapes and approaches to predicting the next movement location.
A number of shapes exist, such as the rectangle, circle, polygon and hexagon [75]. Defining a valid scope for a mobile client is important in order to generate a correct answer to a given query once the mobile user has moved to a new location. In this section, we analyse previous studies on defining a valid scope. Existing works focused on defining a valid scope using the polygon, rectangle and circle.
• Polygon
An approach called Polygonal Endpoints (PE) uses a polygon shape to process a location-dependent query [130]. The PE scheme is a direct way to express the valid scope of a data value: all endpoints of the polygon are recorded to define the valid scope.
• Circle
Another way to define a valid scope is to use the Approximate Circle (AC) scheme. The AC scheme is one of the most convenient ways to generate a valid scope if we know the distance within which the user would like to find an object. In the AC scheme, a valid scope is defined by the centre of the circle and a radius value. The maximum size of the circle can be derived from the current velocity of the user [128]; the advantage is that the size of the valid scope at the current speed over a time interval can be predicted.
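A small sketch of this idea, under the assumption (our reading of [128]) that the maximum radius of the valid scope is the distance reachable at the current speed within the interval; the function names are ours:

```python
import math

def valid_scope_radius(speed_m_s, interval_s):
    # Maximum distance reachable at the current speed within the interval;
    # this bounds the circular valid scope (an assumption based on [128]).
    return speed_m_s * interval_s

def in_valid_scope(centre, radius, point):
    # A result item at `point` is still valid if it lies inside the circle.
    return math.dist(centre, point) <= radius
```

For example, a user moving at 10 m/s with a 60-second interval yields a 600-metre valid scope.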
As mentioned earlier, the movement of a mobile user can be within either a constrained or an unconstrained network. There are two ways to predict user movement: using a time function, and using indexing.
One data modelling concept that represents the positions of moving objects in databases as functions of time is Moving Objects Spatio-Temporal (MOST), devised by [102]. The aim of this approach is to estimate the position of objects at the time a query is issued, so that excessive updates are avoided. In their work, the location of a moving object is represented as a dynamic attribute which is divided into three sub-attributes: function, update time and initial value. How the value of the dynamic attribute changes over time is denoted by the function. This function can answer both one-time and periodic query types.
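The three sub-attributes can be sketched as follows; the linear motion function and the names used are illustrative assumptions, not the MOST authors' code:

```python
class DynamicAttribute:
    """Position as a function of time, following the MOST model's three
    sub-attributes: a function, an update time and an initial value."""
    def __init__(self, initial_value, update_time, function):
        self.initial_value = initial_value   # position at update_time
        self.update_time = update_time
        self.function = function             # how the value changes over time

    def value_at(self, t):
        # Estimate the position at query time t without contacting the
        # object, which is how excessive location updates are avoided.
        return self.function(self.initial_value, t - self.update_time)

# Hypothetical linear motion: constant velocity (vx, vy) in units/second.
def linear(vx, vy):
    return lambda p0, dt: (p0[0] + vx * dt, p0[1] + vy * dt)

pos = DynamicAttribute((0.0, 0.0), update_time=100.0, function=linear(2.0, 1.0))
```

A query arriving at t = 105 would estimate the object at (10.0, 5.0) without any intervening update message.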
Another way to determine the next location of moving objects is by using indexing. Section 2.5 discusses indexing structures in detail.
In order to answer a query in an efficient way, the query or object space is partitioned into several regions. [103] provided a solution for answering Reverse Nearest Neighbour (RNN) queries in two-dimensional space. They divide the space around the client location into six equal regions by straight lines intersecting at the client location; thus, there exist at most six RNN candidate objects around the client location.
Moreover, the Region Quad-tree is an indexing structure which uses a minimum bounding rectangle to store data points in four quadrants of equal size [97, 98]. Section 2.5 presents more details on Quad-tree indexing structures.
2.4.3 Query Processing for Multiple Cells
While travelling, a user may move to another cell, since cell boundaries are transparent to the user. When a user moves to another cell, a handover event occurs. During this period, the current base station may send a query result that is invalid once the user has moved to the other cell.
Zheng et al. [128] categorised various handover mechanisms into four types: Naive, Priority, Intelligent and Hybrid. The Naive method is the simplest of the four mechanisms to implement. However, its waiting time for answers from a server is shorter compared with the Priority method, which can answer the queries of normal users unless the number of urgent users keeps increasing. The Hybrid method does not give a better result because, if the number of users is large, the waiting time is lengthened. The Intelligent method gives a better result, since it calculates the expected time to leave the current cell: if the expected time to leave the current cell is known, the BSs of the new cells know when to process the queries, assuming that no unexpected delays occur.
On the other hand, [73] proposed four other handover approaches, namely Ping-Pong Avoidance (PPA), Towards the Border (TTB), MGIS Data Resolution (MDR) and Transmission Power and Interference Optimization (TPIO). In the PPA approach, undesirable handoffs are minimised by taking advantage of area information and a mobility model to predict users' movement. The TTB approach is useful for predicting when users will reach the boundary of the BS.
The Intelligent and TTB approaches have the same purpose, which is to predict how long it takes users to reach the BS boundary. However, the Intelligent approach is very straightforward in computing the arrival time in a new BS coverage area, and it ignores the movement direction. In contrast, the TTB approach considers user direction when computing the arrival time.
2.5 Indexing Structures for Query Processing
Indexing is a common technique for accessing a collection of records and improving the efficiency of query processing [93, 129]. The technique uses an index structure, a data structure that organises data records so as to optimise certain kinds of retrieval operations. An index allows us to efficiently load all records that match search conditions on the search-key fields of the index.
Various index mechanisms for conventional and mobile query processing are discussed in this section, including their outstanding problems.
2.5.1 Conventional Index Query Processing
To answer queries efficiently, database records are indexed and placed into an index structure. Various types of index structures have been developed [93, 32, 54]. Among the existing index mechanisms, tree-based schemes are prominent and widely used because of their easy tree traversal [123, 115].
The B+-tree [32] is widely known as one of the index data structures; it consists of non-leaf and leaf nodes, where a collection of non-leaf nodes forms a subtree. A non-leaf node contains up to m keys and m+1 pointers to the nodes on the next level of the tree hierarchy. All nodes on the left-hand side of a parent node have key values less than or equal to the key of that parent node; in contrast, the key values of the nodes on its right-hand side are greater than the key values of the parent node. The bottom-most nodes are called leaf nodes.
The R-tree index structure, developed by Guttman [41], stores multi-dimensional data (such as points). This index structure is efficient and capable of handling both point and region data items. Many researchers [109, 41, 11, 99], to name a few, expanded the features of the original R-tree into many variations, with the aim of providing an efficient and dynamic index structure for spatial data.
The structure of the R-tree is similar to that of the B-tree indexing structure. Figure 2.13 illustrates the R-tree. In the B-tree, a node of the tree is a single index; however, a node in the R-tree stores a set of d-dimensional geometric objects represented as a rectangle, called a Minimum Bounding Rectangle (MBR), which groups the closest objects together into a rectangle requiring the least area enlargement.
The R-tree insertion operation can be explained as follows. Assume that several data points are to be inserted into an R-tree with a maximum of 6 points per node. In the first state, when inserting a data point with id p into rectangle R, a bounding box is computed for the object and the pair <p,R> is inserted into the tree. The bounding box is enlarged when a data point is inserted. If the tree is empty, this bounding box becomes the root node of the tree.
(a) R-tree in two-dimensional space
(b) R-tree
Figure 2.13: The R-tree illustration [93]
After a certain time, when the maximum number of points for a bounding box has been reached, a new bounding box is created to accommodate new data points, and the existing objects are redistributed between the boxes. This adjustment of the bounding boxes is called splitting. In general, in a tree-splitting process, objects in the existing bounding box are grouped together so as to minimise the need for enlargement. Once the splitting process has been completed, both nodes become leaf nodes, and a new root node is created to cover both bounding boxes.
Now, assume that an R-tree exists and a data point with id ‘d’ is to be inserted. A traversal starts at the root node and follows a single path from the root node
to a leaf. At each level, the child node is chosen whose bounding box requires the least enlargement to cover the data point d. If several children have bounding boxes that cover d, the one with the smallest bounding box is selected. At the leaf level, the data point is inserted and, if necessary, the bounding box of the leaf is enlarged to cover d. When the bounding box is enlarged at the leaf level, this enlargement must be propagated to the ancestors of the leaf (after the insertion is made), since the bounding box of every node must cover the bounding boxes of all its descendants. If the leaf node lacks space for the new object, a process similar to that mentioned above is applied: splitting the node, reallocating entries between the old leaf and the new node, adjusting the bounding boxes and propagating these changes up the tree. Algorithm 2.1 shows the R-tree insertion algorithm for inserting a data entry E(I,B).
Algorithm 2.1: The R-tree insertion algorithm.
begin
    N ← root node
    if N is a leaf then
        return N
    end
    Select a node A in N whose A.I needs the least enlargement to store E.I.
    Traverse until a leaf node is reached by setting N to be the child node pointed to by A.
    if the selected node A is the leaf node and has free space for E then
        Insert E.
    else
        Split node A using one of the splitting algorithms.
    end
    Propagate any changes upwards by invoking Adjust Tree.
    if Adjust Tree requires the root node to be split then
        Expand the height of the tree.
    end
end
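The least-enlargement test used when descending the tree in Algorithm 2.1 can be sketched as follows, with rectangles as (xmin, ymin, xmax, ymax) tuples; this is an illustrative fragment, not Guttman's implementation:

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def enlarge(r, s):
    # Smallest rectangle covering both r and s.
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))

def choose_child(children, entry_rect):
    """Pick the child MBR needing the least area enlargement to cover
    entry_rect; ties are broken by the smaller bounding box, as in the text."""
    def cost(c):
        return (area(enlarge(c, entry_rect)) - area(c), area(c))
    return min(children, key=cost)
```

A child whose MBR already covers the entry has zero enlargement cost, so it is always preferred over one that would have to grow.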
Algorithm 2.2 shows the tree-adjustment algorithm. The process starts from a leaf node and moves upwards until it reaches the root node of the tree. When a node is full because of the insertion of a new record or a previous split, a new node is created to store the remaining contents of the existing node. The adjusted node is propagated to its parent node until the root node is reached.
Algorithm 2.2: The adjusting R-tree algorithm.
begin
    N ← a leaf node
    if N was split previously then
        NN ← LL, where LL is the second split node.
    end
    while N ≠ rootNode do
        P ← parent node of N
        PN ← the bounding box of N in P
        Adjust PN.I so that it tightly covers all bounding-box entries in N.
        if NN is the partner of N resulting from an earlier split then
            Produce a new bounding box PNN which covers all rectangles in NN, and a pointer PNN.ptr pointing to NN in P.
            Add this new bounding box PNN to P.
            if P has no free slot then
                Execute Split Node to separate the content of P into P and PP.
            end
        end
        N ← P; NN ← PP.
    end
end
When a node is full, node splitting occurs. The splitting mechanism is not as simple as for the B-tree, since MBRs may overlap. The original R-tree proposed three splitting mechanisms [41], as follows:
• Linear
This splitting algorithm selects as seeds two entries whose ends are far apart. The remaining entries are then taken in random order, and each is allocated so that the smallest MBR enlargement is required by the allocation.
• Quadratic
This algorithm seeks a small-area split; however, it is not guaranteed to produce the smallest area possible. Similar to the Linear mechanism, it selects as seeds the two entries with the maximum distance between them and then allocates each remaining entry to one of the two nodes, placing it in the group that requires the lesser expansion.
• Exponential
This algorithm is the most straightforward of the three candidates: it examines all possible groupings and selects the best one, so the minimum-area split is always found.
To search for a query point Q in an R-tree, a traversal begins at the R-tree root node and proceeds towards the leaf level. The bounding box of each child of the root is checked to see whether it overlaps the query. If more than one child of the root has a bounding box that overlaps Q, all corresponding subtrees are traversed. At the leaf level, the node is checked to find whether it contains the desired point. Conversely, no leaf node is visited at all if the query point is not in the indexed dataset. Algorithm 2.3 shows the search algorithm.
R-trees can also be used to answer Nearest-Neighbour (NN) queries [96], which find the objects closest to a given query point. Two ordering metrics, Minimum Distance (MinDist) and Minimum of Maximum possible distances (MinMaxDist), are used in the R-tree search algorithm. MinDist gives the smallest possible distance from a point P to any object enclosed in a rectangle R. MinMaxDist calculates the minimum of the maximum distances between the query point and the faces of the rectangle on each of the n axes; this metric guarantees that there is an object within the MBR at a distance less than or equal to MinMaxDist.
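For a point P and an MBR in two dimensions, the two metrics can be computed as follows; a sketch with rectangles as (xmin, ymin, xmax, ymax) tuples:

```python
import math

def min_dist(p, r):
    # Distance from P to the nearest point of the rectangle (zero if P is
    # inside): no object in r can be closer to P than this.
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)

def min_max_dist(p, r):
    # For each axis, pair the nearer face on that axis with the farther
    # coordinate on the other axis; the smallest such distance is an upper
    # bound on the distance to the nearest object inside r.
    lo, hi = (r[0], r[1]), (r[2], r[3])
    def near(k):
        return lo[k] if p[k] <= (lo[k] + hi[k]) / 2 else hi[k]
    def far(k):
        return hi[k] if p[k] <= (lo[k] + hi[k]) / 2 else lo[k]
    ds = []
    for k in (0, 1):
        other = 1 - k
        ds.append(math.hypot(p[k] - near(k), p[other] - far(other)))
    return min(ds)
```

By construction MinDist(P,R) ≤ MinMaxDist(P,R), which is what makes the pruning strategies below sound.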
Algorithm 2.3: The searching R-tree algorithm.
begin
    N ← the root node
    if N is not a leaf node then
        Find each child of the current node whose bounding box overlaps the query point / region.
        if found then
            Recursively search that child of the node.
        end
    else
        Verify all entries to discover whether an entry overlaps with S.
        Return the entries that overlap with the query point / region.
    end
end
Algorithm 2.4 shows the Nearest-Neighbour search algorithm, which uses a depth-first traversal. The traversal starts at the R-tree root node and proceeds towards the leaf level. In the beginning, nearestN (the nearest-neighbour distance) is set to infinity. Each newly visited non-leaf node during the downward traversal is passed as the node parameter; the algorithm calculates the ordering metrics for all its MBRs and sorts the corresponding entries into a list called the Active Branch List (ABL).
Once the ABL has been created, pruning strategies 1 and 2 are applied to the list to eliminate unnecessary branches. The algorithm then goes through each entry in the ABL until the ABL is empty; for each entry, the algorithm is called recursively with the entry, Point and nearestN values. At a leaf node, the function objectDIST is called to calculate the distance between the point and each object's MBR. The returned value is compared with the current value of nearestN and, if it is smaller, nearestN is updated. This step is repeated for each entry in the leaf node. On returning from the recursion, this new
Algorithm 2.4: Nearest-Neighbour search algorithm.
Input: Node, Point, Nearest
begin
    currNode ← Node                    // current node
    searchPoint ← Point                // search point
    nearestN ← Nearest                 // nearest neighbour
    NODE newNode
    BRANCHARRAY branchList
    integer dist, last, i
    // At leaf level - compute distance to actual objects
    if Node is a leaf then
        for i ← 1 to Node.count do
            dist ← objectDIST(Point, Node.branch[i].rect)
            if dist < Nearest.dist then
                nearestN.dist ← dist
                nearestN.rect ← Node.branch[i].rect
            end
        end
    else
        // Non-leaf level - order, prune and visit nodes
        genBranchList(Point, Node, branchList)    // generate Active Branch List
        sortBranchList(branchList)                // sort ABL on ordering metric values
        // Perform downward pruning (may discard all branches)
        last ← pruneBranchList(Node, Point, nearestN, branchList)
        // Iterate through the Active Branch List
        for i ← 1 to last do
            newNode ← Node.branch[branchList[i]]
            // Recursively visit child nodes
            nearestNeighbourSearch(newNode, Point, Nearest)
            // Perform upward pruning
            last ← pruneBranchList(Node, Point, Nearest, branchList)
        end
    end
end
estimate of the NN is taken, and pruning strategy 3 is applied to eliminate all branches with MinDist(P,M) > nearestN for all MBRs M in the ABL.
The three strategies of the pruning theorem are described as follows [96]:
1. An MBR M with MinDist(P,M) greater than the MinMaxDist(P,M1) of another MBR M1 is discarded, because it cannot contain the NN. This is used for downward pruning.
2. An object O whose actual distance from P is greater than the MinMaxDist(P,M) of an MBR M is discarded, because M contains an object O1 that is nearer to P. This is used for upward pruning.
3. Every MBR M with MinDist(P,M) greater than the actual distance from P to a given object O is eliminated, because it cannot enclose an object closer than O. This is used for upward pruning.
In the context of retrieving objects from several servers, the above algorithms are not efficient, because a tree traversal always starts from the root node on each of those servers.
2.5.2 Moving Object Index Query Processing
Several researchers have applied the concept of existing index structures to query processing in a mobile environment. This section discusses existing works that use an indexing structure to process queries in the mobile environment.
The authors of [107, 27] used the PMR Quadtree index structure to answer continuous queries that change as a function of time. The index structure is a variant of the quad tree that is used to store segment fragments and has a hierarchical vector representation [80]. The index values contain a function of time in the two-dimensional time-attribute space. More specifically, the PMR Quadtree stores information about a line segment in every quadrant of the underlying space that it crosses.
The RQ-tree index structure combines the R-tree and the Quad-tree to index the locations of objects [39]. The authors argued that spatial entities are not distributed evenly and can form differently shaped objects, and that the R-tree's performance degrades for objects whose scopes are not close to rectangles. Therefore, the RQ-tree contains an R-tree as the outer tree and uses Quad-trees to store the remaining objects: the R-tree stores regular objects (objects whose form is a rectangle) and the Quad-tree stores irregular objects, with the Quad-tree root node being a leaf of the R-tree.
In [91], R-trees are used to index static range queries and velocity-constrained data for processing continuous spatial queries over moving objects. In this mechanism, all incoming queries are indexed in one R-tree, while the second index (the VCI) is an R-tree-based index with an additional field vmax in each node and is used to index all moving objects. The vmax entry of an internal node is the maximum of the vmax entries of its children; at the leaf level, the vmax entry is the maximum allowed speed among the objects pointed to by the node.
The Lazy Update R-tree (LUR-tree) was proposed in [59]. This approach indexes the current positions of moving objects and decreases the update cost by eliminating unnecessary modifications of the tree during position updates. The index structure is restructured only when an object leaves its corresponding MBR; the LUR-tree merely overwrites the position of an object in the leaf node if the new position of the object is still inside the MBR.
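The LUR-tree update rule, rewriting in place while the object stays inside its leaf MBR and falling back to a full delete-and-reinsert only when it leaves, can be sketched as follows; the dictionary-based index is a deliberate simplification of the real tree structure:

```python
def inside(mbr, p):
    # Point-in-rectangle test for an (xmin, ymin, xmax, ymax) MBR.
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]

def lazy_update(index, obj_id, new_pos):
    """Sketch of the LUR-tree update rule; `index` maps obj_id to a
    (leaf_mbr, position) pair, a simplification of the real structure."""
    leaf_mbr, _ = index[obj_id]
    if inside(leaf_mbr, new_pos):
        # Cheap case: overwrite the position, leave the tree untouched.
        index[obj_id] = (leaf_mbr, new_pos)
        return "in-place"
    # Expensive case: the object left its MBR; a real LUR-tree would
    # delete the entry and re-insert it, possibly restructuring the tree.
    return "delete-and-reinsert"
```

Most updates of slowly moving objects hit the cheap branch, which is where the cost saving comes from.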
The TPR-tree indexing structure is based on the R-tree and indexes continuously moving objects at any time in the future [12]. In this scheme, the size of a rectangle is extended based on velocity and time; therefore, the number of targets remaining inside the rectangle increases as the rectangle grows. This indexing structure is also used to index the uncertainty of moving objects [45].
The TPR*-tree indexing structure enhances the TPR-tree by considering predictive queries [105]; it adapts the insertion and deletion algorithms of the R*-tree indexing structure.
Another variant of the TPR-tree is the TPROM-tree [28], an index structure that indexes the current and future positions of moving objects. It also handles object updates efficiently by adopting a memory-based update approach, which reduces the update cost by avoiding the need to delete old data items from the index structure during updates.
The Q+R-tree [121] indexing structure is similar to the RQ-tree in that it uses a combination of the Quad-tree and R-tree indexing structures. However, the Q+R-tree is used to index moving objects: the R-tree component indexes quasi-static objects, whereas the Quad-tree component indexes fast-moving objects which are distributed over wider regions. In other words, the R-tree indexes objects that are currently moving slowly, such as those crowded together in buildings or houses.
Another variant of the Q+R-tree is the PQR-tree [40], which efficiently indexes the current and near-future positions of moving objects. The PQR-tree also substantially decreases the update cost. It differs from the Q+R-tree in how the structures are integrated to place the moving objects. The benefit of this index structure is that it can manage moving objects inside and outside road networks at the same time, so the current and near-future positions of moving objects can be queried effectively.
The D-tree is similar to the KD-tree indexing structure [124]. The D-tree is a height-balanced binary tree constructed by partitioning data regions: a space is recursively partitioned into two subspaces containing a similar number of regions until every subspace holds one region. The partition between two subspaces is represented by one or more polylines in the two-dimensional space, formed by the divisions between regions.
The KD-tree is a binary search tree that represents a recursive subdivision of the universe into subspaces by means of (d-1)-dimensional hyperplanes [37]. The hyperplanes are iso-oriented and their direction alternates among the d possibilities. The KD-tree is also known as the Range tree [13]. In [55], the authors proposed an approach that maps moving objects and their velocities into points and keeps the points in a KD-tree index structure.
The Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree) [86] are two indexing structures, both extensions of the R-tree, for indexing moving-object trajectories. The former considers the trajectory identity in the index, whereas the latter is a hybrid structure which keeps trajectories and allows the R-tree's typical range search over the data.
2.6 Mobile Query Processing at Client Side
This section discusses issues relating to query processing mechanisms for mobile client devices. These issues are grouped into three categories: mobile-join, Top-K queries and caching. The first and third categories are similar. In the first category, data is downloaded from several cells and has to be joined on the mobile device in order to obtain explicit results; once the results have been produced and shown to the user, they are deleted within a short time. In the third category, the data is retrieved from the current cell on the first request and loaded from the local copy on subsequent requests. In contrast to the first category, data in the local copy is kept until there is not enough room to store new incoming data. Providing a cache for frequently accessed data items on the client side is therefore an effective approach to improving system performance [10, 126]. The second category focuses on retrieving records ranked in the Top-K.
2.6.1 Mobile-Join
Explicit query results can be obtained at the client side by retrieving data from several cells and joining it locally. Downloading all relations from those cells may not be an ideal solution given the limited resources of a mobile device, including the small memory available to store a large volume of data and the small display on which to view all results [68]. Several join mechanisms have been proposed; they are explained in this section.
The authors of [66] proposed three query processing mechanisms at the mobile
client side. In the first approach, a mobile client requests data from the related
cells and joins those data on the mobile client device. In the second approach, all
data are downloaded from one cell, while only the primary keys are retrieved from
the other; the information is then matched at the mobile client side, and any
missing information is retrieved from the other cell. In the last approach, the
primary keys are retrieved from all cells and matched at the client side, and the
data corresponding to the matched keys are then downloaded from the cells.
The authors of [83] proposed two query processing mechanisms in which the
pieces of data are located either on other mobile devices or on servers. In the first
approach, a mobile user sends a query to a server, which then informs other mobile
users that hold the remaining parts of the data; those mobile users and the server
send their data to the requester. The second approach is similar, except that the
server is in charge of joining the data and sends the joined result to the requester.
Block-Based Processing is a mechanism that breaks data down into blocks and
transfers the blocks one by one [68, 66]. The aim is to overcome the memory
capacity limitation and narrow bandwidth of the mobile environment. Two
block-based mechanisms for client-side query processing have been proposed in [66],
namely static and dynamic blocks. Both mechanisms download a fixed number of
records per block from one server and compare these records with another list from
another server. They differ in the way records are eliminated from a block. In the
dynamic mechanism, the last records of the two current blocks are compared: the
block whose last record is smaller is removed entirely, while the block with the
larger last record is preserved after the qualified matches are removed from it.
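As an illustration, the dynamic elimination step can be sketched as follows. This is a simplified reading of the mechanism, not the exact algorithm of [66]; it assumes each server delivers sorted blocks of join-key values, and the function names are ours:

```python
def block_join(fetch_a, fetch_b):
    """Join two streams of sorted key blocks, holding only one block per
    server in memory at a time (illustrative sketch of the dynamic
    block-based mechanism, not the exact algorithm of the cited work)."""
    result = []
    block_a, block_b = next(fetch_a, None), next(fetch_b, None)
    while block_a and block_b:
        # Qualified matches: keys present in both current blocks.
        matches = sorted(set(block_a) & set(block_b))
        result.extend(matches)
        if block_a[-1] <= block_b[-1]:
            # The block with the smaller last record is removed entirely;
            # the other block is kept with the matches eliminated.
            block_b = [k for k in block_b if k not in matches] or next(fetch_b, None)
            block_a = next(fetch_a, None)
        else:
            block_a = [k for k in block_a if k not in matches] or next(fetch_a, None)
            block_b = next(fetch_b, None)
    return result
```

Because only one block per server resides in client memory at a time, the scheme trades extra round trips for a bounded memory footprint, which is the point of block-based processing.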
The Recursive and Adaptive Mobile Join (RAMJ) mechanism runs on a mobile
device and joins two relations located on non-collaborative remote servers [69].
The data space is partitioned into several parts, and statistical information about
each part is retrieved from the servers without downloading the original data.
Based on this information, RAMJ adaptively decides, partition by partition,
whether to download the data of both relations that fall into a partition and join
them locally, or to partition further and recurse.
2.6.2 Top-K Queries
Processing or displaying the entire set of query results is not needed if only the
k most highly ranked results are processed or displayed. Queries of this type are
called Top-K queries. Researchers have studied Top-K queries for Web databases,
Peer-to-Peer (P2P) networks, data streams, sensor networks and mobile
environments [15, 9, 48, 120, 25].
Some work has been undertaken on processing Top-K queries in web applications
[62, 122]. This work takes into account relation attributes that are not directly
available and can only be accessed through external web form interfaces, a
situation that potentially causes large data sets to be queried repeatedly. Hence,
the proposed technique executes Top-K queries in a setting where the attributes
for which users specify target values are controlled by external, autonomous
sources with a variety of access interfaces.
Content sharing, one of the main applications in the P2P environment, has been
receiving increasing attention from users. As a result, the number of people using
such applications keeps growing, which affects network performance, so it is
essential to decrease the amount of network data transfer. Most users are generally
interested only in the results that correlate best with their query. One solution is
to apply Top-K queries within this environment in order to return only the most
relevant results.
Some researchers have worked on applying Top-K query algorithms in P2P
networks [8, 76, 21, 43]. Several decentralised Top-K query schemes have been
developed, using local ranking, optimised routing and merging to reduce the
number of results returned to the users. Consequently, the data transfer load is
reduced; however, the ranking and merging of results increases the computational
workload.
Top-K processing also enables the most relevant objects to be output at the
earliest stage, which is useful in mobile environments because the amount of data
transfer and power consumption can be reduced. Existing works such as KLEE
and SR-Combine [77, 78] have demonstrated their efficiency in meeting tight
response times in a mobile environment. These schemes deliver some initial results
early, which reduces waiting time, data transfer cost and processing power. Their
most important feature is the capability to adapt to various environments,
including faster-bandwidth networks, because retrieval is self-adapting and
concentrates on real-time requirements.
Due to the increasing number of popular applications, the size of the data
streams flowing over the network is also growing, to the point where it can
overload the network traffic. Users may therefore run into problems if they cannot
fully handle this large, continuous flow of data. One proposed approach is the
space-saving algorithm [77], whose main idea is to maintain only partial
information about the items of interest. The aim of this algorithm is to answer
certain stream queries before the data is discarded forever. For example, a Top-K
query over a data stream may ask for the elements receiving 0.5 percent or more
of the total hits, which might comprise the top 500 elements. The algorithm is
space-efficient and provides a strict error guarantee that bounds the estimated
counts of the elements, with memory proportional to the Top-K requirement.
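The counter-replacement idea can be sketched as follows. This is a simplified version of the space-saving algorithm that omits the per-counter error bookkeeping of the original; it keeps at most m counters regardless of stream length:

```python
def space_saving(stream, m):
    """Approximate the most frequent stream elements with only m counters
    (illustrative, simplified sketch of the space-saving algorithm)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < m:
            counters[item] = 1
        else:
            # Evict the element with the minimum count and hand its
            # (over-)estimated count, plus one, to the newcomer.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters
```

Counts are over-estimates by at most the evicted minimum, which is what gives the strict error guarantee mentioned above.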
2.6.3 Cache Replacement Policies
In general, placing a cache at the client side reduces network activity between
client and server. Caching mechanisms have been widely used to store frequently
accessed data in database [7, 106], distributed [106, 36] and web systems [125], in
both wired networks and mobile environments [104, 47, 94]. This section focuses
on cache replacement policies for the mobile environment.
The Least Recently Used (LRU) cache replacement policy uses a timestamp, the
time at which a data item was last accessed, to eliminate objects from the cache.
When the cache is not large enough to admit new objects, this approach evicts the
data items with the oldest access time.
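A minimal sketch of the LRU policy (illustrative only, using an ordered dictionary to track recency):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the item with the oldest access time
    once the capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```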
The authors of [95] proposed a client caching scheme, called Two-level LRU,
based on a clustering structure that exploits both semantic and temporal locality.
This approach clusters semantically or adjacently related query results together in
the cache. Because of its intrinsic properties, semantic caching is regarded as an
ideal caching scheme for mobile computing. The aim of the approach is to keep
the most profitable data in the cache with the help of clustering. Thus, if a query
Q2 can be totally or partially answered by Q1, it is placed in the same cluster as
Q1. During clustering, when a query can be partially answered by a segment of a
group, the part of the segment that answers the query is removed from the
segment and combined with the remaining answer to the query into a new
segment. If a segment becomes empty as a result of the removal, it is removed
from the cluster. On the other hand, if the query is partially answered by
segments belonging to different clusters, those clusters are merged into one. The
overall aim is to effectively reduce wireless network traffic and to deal with
disconnections.
[22] proposed a semantic model for client-side caching and replacement in a
client-server database system, called Manhattan-Distance based caching. In this
approach, the client maintains a semantic description of the data in its cache,
together with a remainder query describing the parts of a request that are not
available in the cache. Usage information for replacement is maintained in an
adaptive fashion for semantic regions rather than for individual tuples.
Maintaining a semantic description of the cached data makes it possible to use
sophisticated value functions that capture semantic notions of locality. This policy
gives the highest replacement priority to the cached objects at the greatest
Manhattan distance from the client’s current location.
The Furthest Away Replacement (FAR) policy is proposed in [94]. This
approach makes eviction decisions based on the current location and movement
direction of the mobile client: the highest priority is given to the data items
located furthest away from the user’s current location and in the opposite
direction to the user’s movement. Cached objects with higher priority are evicted
first, since the user is unlikely to access those objects within a short time.
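An illustrative reading of the FAR eviction choice, with hypothetical helper names rather than the authors' exact formulation: objects behind the direction of travel are preferred as victims, and among those the furthest one is chosen.

```python
import math

def far_victim(cached, user_pos, user_dir):
    """Pick an eviction victim in the spirit of FAR (illustrative sketch).

    cached maps object id -> (x, y); user_dir is a unit vector of the
    travel direction. Objects 'behind' the user (negative dot product
    with the travel direction) are evicted first, furthest one first."""
    def priority(obj):
        ox, oy = cached[obj]
        dx, dy = ox - user_pos[0], oy - user_pos[1]
        dist = math.hypot(dx, dy)
        behind = dx * user_dir[0] + dy * user_dir[1] < 0
        # Sort key: behind-objects first (False < True), then furthest.
        return (not behind, -dist)
    return min(cached, key=priority)
```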
The RBF-FAR replacement policy slightly modifies FAR [65]. The authors claim
that FAR fails in some cases because it does not consider predicting the next
possible location. RBF-FAR improves the FAR approach by adding intelligent
knowledge to predict the next possible movement: a Radial Basis Function Neural
Network (RBFNN) is used to predict the next location instead of the velocity
used in FAR. RBFNN is a self-learning model which can learn from the historical
information in the semantic segment index.
The Probability Area (PA) and Probability Area Inverse Distance (PAID)
approaches are two cache replacement policies for location-dependent data under a
geometric location model [130]. The PA approach considers two factors in its
replacement decisions: the valid scope area and the access probability of a data
item. The PAID approach considers the inverse of the distance to the data item as
an additional factor. In both approaches, the data item with the lowest cost,
computed as the product of these factors, has the highest priority for removal
from the cache.
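The cost computation can be sketched as follows; this is an illustrative PAID-style cost, and the exact cost formula of [130] may differ in detail:

```python
def paid_victim(items):
    """Choose an eviction victim under a PAID-style cost function
    (illustrative sketch): cost = access_probability * valid_scope_area
    * (1 / distance); the item with the lowest cost is evicted first.
    items maps id -> (prob, area, distance)."""
    def cost(obj):
        p, area, dist = items[obj]
        return p * area / dist
    return min(items, key=cost)
```

Dropping the division by distance yields the corresponding PA-style cost, which uses only access probability and valid scope area.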
The Mobility Aware Replacement Scheme (MARS) uses a gain-based cache
replacement policy, which considers the client’s location, movement direction and
access probability [60]. This approach is unable to detect a user’s regular
movements. Therefore, the authors proposed an improved approach, called
MARS+ [61], which deals with the temporal and spatial properties of a client’s
access pattern in order to improve caching performance. MARS+ also makes it
possible to detect regular client movement paths, and this movement-pattern
knowledge is used when deciding which cached objects to evict.
The Prioritized Predicted Region-based Cache Replacement Policy (PRRP)
analyses the cost of a data item on the basis of its access probability, valid scope
area, size in the cache and distance with respect to a predicted region, a
combination not considered in any of the existing policies [57]. The fundamental
aim of this approach is to select cached victims using a predicted region-based
cost function: the predicted region is selected based on the client’s movement and
is then used to determine the data distance of an item.
The Weighted Predicted Region-based Cache Replacement Policy (WRRP) also
picks a predicted region based on the client’s movement, and then applies the
predicted region to calculate the weighted data distance of an item [58]. This
approach is similar to PRRP in that it considers access probability, valid scope
area and data size in the cache, but it additionally takes the weighted data
distance from the predicted region into account when picking the eviction victim
among the cached data items.
The Rule-based Least Profit Value (R-LPV) approach considers the profit
gained from caching an item [18]. This policy takes various caching parameters
into account: data access probability, update frequency, retrieval delay from the
server, cache invalidation delay, and data size. Items are eliminated using a profit
function that determines the profit of caching an item; the item with the least
profit is evicted. This cache replacement policy is similar to one used in the
client-server environment [52].
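A hypothetical profit function in this spirit might combine the parameters as follows; the names and the formula are illustrative only, not those of [18]:

```python
def profit(p_access, f_update, t_server, t_invalidate, size):
    """Hypothetical R-LPV-style profit: caching an item pays off when it
    is read often, updated rarely, expensive to re-fetch from the server,
    and small. All parameter names and the formula are illustrative."""
    reads_between_updates = p_access / f_update
    saved_delay = t_server - t_invalidate
    return reads_between_updates * saved_delay / size
```

Under such a function, the replacement policy would evict the cached item whose profit value is lowest.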
The Proactive caching model caches the result objects as well as the index that
produced those objects as results [46]. The purpose of caching the indexes is to
enable the cached objects to be reused by all common types of queries.
The Complementary Space (CS) cache replacement policy maintains a global
view of the whole dataset [64]. Different portions of this global view are cached at
varied granularity based on their access probabilities in future queries, and the
cached objects with very high access probabilities are kept in the cache.
2.7 Outstanding Problems
A discussion of the problems and shortcomings of existing works is presented in
this section. Earlier in this chapter we reviewed the literature dealing with mobile
query processing at the client and server sides. Our review reveals that there are
still problems and issues that need to be addressed and resolved.
The existing problems identified from previous research are examined in the
next three subsections. Mobile query processing problems are discussed in Section
2.7.1, followed by the indexing mechanism problems for processing multi-cell
queries. The last problem to be discussed is the client cache replacement policy,
presented in the last subsection.
2.7.1 Mobile Query Processing at Server Side
This section presents the issues that are still outstanding in mobile queries
involving both single and multiple cells. After analysing the existing work on
query processing at the server side, several major problems remain. One is the use
of a circle as the query scope: objects located just outside the circle boundary are
not retrieved by the query, while objects that have already been passed are no
longer of any interest to the user. Hence, in this thesis, we focus on using a square
as the query scope and on excluding unnecessary items that have already been
passed. We will demonstrate that it is possible to retrieve additional items
without including unnecessary items in the query result, so as to improve the
performance of query result retrieval.
The next problem is overcoming frequent disconnections during query
processing. This implies that the query processing needs to know when to process,
or when to preserve, the existing result in order to deal with frequent
disconnections while processing a single-cell or multi-cell query. To date, we have
not come across any existing work that addresses this issue. Hence, our research
in this thesis (Chapter 3 in particular) will look at improving query processing at
the server side.
The following questions state some of the major problems to be addressed:
• How do we model efficient mechanisms for query processing within a single
cell?
• How do we design an efficient mechanism to process queries which involve
several cells?
• How can the proposed model cope with overlapping or non-overlapping cell
boundaries?
• How does the proposed model deal with frequent disconnections?
2.7.2 Indexing Structures for Multi-Cell Query Processing
A query has to be processed quickly, before the mobile user passes the predicted
location at which the query result is to be received. In this section, we describe an
indexing problem arising when query results are retrieved from multiple cells,
which involves a larger number of records.
Indexing is a convenient way to store database record information in server
memory, owing to its small size. Many approaches have been devised to organise
such indexes in memory; the tree index is one of the most prominent data
structures used in practice.
Some major problems of the existing tree indexing structures are as follows:
• How can we have a single structure that contains all static items in active
cells?
• How can we store all requested items from neighbouring cells in the current
cell without degrading the performance of the current cell?
• If we are able to solve the above problem, how do we manage requested
objects when they do not exist in a cell?
2.7.3 Client Cache Management
This section describes an outstanding problem in query processing, focusing on
users who repeatedly travel around the same locations and pose the same queries.
As discussed in the previous section, the mobile environment has limitations
despite its advantage of enabling communication everywhere at any time: its
unreliable network connections, narrow network bandwidth and expensive data
transfer are some of its negative factors.
Providing a cache on the client device is one way to overcome this issue, because
incoming query results are stored in the local cache. The problem arises because,
to date, all existing approaches store every incoming query result in the local cache.
The next problem is cache maintenance when the available cache slots cannot
accommodate all incoming query results. Improving the cache hit rate is another
focus of our research in this thesis; we attempt to use an existing grouping
mechanism to group the cached objects.
The following questions concern the development of cache maintenance:
• How can we model a cache that stores at least k items per request, rather
than a full set of incoming items, in order to cope with the limitations of
the mobile environment?
• How can the quality of cache hits be improved by considering a weight
factor?
• How can we model a cache by adapting one of the grouping algorithms so
that each request still receives at least k items?
• How can we design a cache model that considers distance, grouping and at
least k items per request?
2.8 Conclusion
At the beginning of this chapter, we presented the architecture of a mobile
environment, including the current wireless technologies. The mobile computing
environment has constraints such as narrow bandwidth, short battery life, limited
storage and frequent disconnections, all of which make the task of processing
mobile queries more complex.
A user’s mobility creates a unique class of mobile queries beside the traditional
ones, called location-dependent queries. The locations of the user and of the
objects are two important parameters in a location-dependent query. They add
further complexity to query processing, since both can change during the query
processing period.
Finally, the main contributions of this chapter can be categorised as follows:
• A query taxonomy is presented. This classification is important since it
makes it possible to analyse all types of queries for data management in a
mobile computing environment.
• Mobile query processing issues at both server and client sides are shown.
These issues, arising from location-dependent query processing in a mobile
computing environment, need further investigation.
Chapter 3
Query Processing at Server Side
3.1 Introduction
A mobile query is a query issued while the user is travelling. The current
location of the mobile user is a unique factor that must be considered, because the
query result depends on the current location of the requester. Users may remain in
the same location or move to another location while waiting for the query results.
If a user stays in the same location, an invalid query result is unlikely to occur,
since the receiving location is the same as the sending location. On the other
hand, when the user moves to another location, the sending location differs from
the receiving location. Although the receiving location can be predicted based on
the travel velocity of the mobile user, an invalid query result might still occur if
the user passes beyond the predicted location.
When a query is issued, its query scope might cover objects located in the same
cell or in different cells. The query scope is the area within which the user
requests objects. For example, for the query "retrieve a list of hospitals within 500
metres", the area within 500 metres of the user’s location is the query scope. A
cell is
CHAPTER 3. QUERY PROCESSING AT SERVER SIDE 62
a service area for one base station. A base station is an intermediate host which
connects mobile devices to static hosts. If a query scope does not extend beyond
one cell, the query processing for this situation is called single-cell query
processing; if the query scope spans more than one cell, it is called multi-cell
query processing.
In this chapter, we propose schemes for single-cell and multi-cell query
processing at the server side, focusing on the retrieval of static objects. The
proposed approaches for single-cell query processing are divided into three
categories based on the movement of the query scope relative to a base station,
namely (i) Static, (ii) Dynamic and (iii) Angle. In the static category, the query
scope stays parallel to the base station; here we have developed three algorithms,
based on horizontal, vertical and diagonal movement. In the dynamic category,
the query scope is perpendicular to the user’s travel direction. Finally, the angle
category is based on the angle of the user’s direction.
The proposed approaches for multi-cell query processing fall into two
categories. The first considers overlapping and non-overlapping base stations; the
second considers how to handle disconnections at the base station boundary.
The structure of this chapter is shown in Figure 3.1. Section 3.2 presents the
preliminary knowledge forming the foundation of this chapter. Sections 3.3 and
3.4 describe the proposed single-cell and multi-cell query processing approaches at
the server side, respectively. Section 3.5 discusses the proposed approach to
handling disconnections in single-cell and multi-cell query processing. Case
studies are described in Section 3.6; they give some illustrations that support the
explanation of the proposed approaches. A further discussion of both approaches
is provided in Section 3.7, and the last section concludes the chapter.
Figure 3.1: The framework of chapter 3
3.2 Preliminaries
This section presents an overview of query processing at the server side. The
section is divided into three subsections, as outlined in Figure 3.1. The first
subsection introduces all the terms used in this chapter. The next subsection
(Section 3.2.2) discusses the criteria for selecting a shape to be used as the query
scope. Several query types are described in the last subsection (Section 3.2.3).
3.2.1 All Terms Used
In this section, we introduce terms used in our work. These are:
• Cell scope: an area serviced by one base station. Mobile users can exchange
information with the base station within this area.
• Base station (BS): a stationary host which forwards messages from and to a
static network. A BS can connect to one or multiple database servers; for
simplicity, we assume that each BS connects to a single database server,
even if it could be connected to several.
• Query scope (QS): an area within which mobile users query static objects.
We use the terms ‘query scope’ and ‘valid scope’ interchangeably. This scope
can be represented using an existing shape, such as a circle, hexagon, or
square.
• Parallel query scope: a query scope which is parallel to the BS in which the
mobile user currently resides.
• Dynamic query scope: a query scope which is not parallel to the BS in which
the mobile user currently resides.
• Location: a point in two coordinates which represents the location of a
mobile user or a static object. For simplicity, we assume that the location of
an object is represented as a point.
• Travel direction: a straight line measured from the starting point to the
ending point.
3.2.2 Shape Selection for a Query Scope
In this section, we discuss how to choose a shape for the query scope. A number
of shapes can be used to denote a query scope, such as a rectangle, triangle,
square, or circle.
Figure 3.2 shows the locations of several vending machines, a restaurant and a
user within a BS boundary. Assume the user would like to find the nearest
restaurant within a distance of n units or within an area of m square units. All
targets within the boundary of a BS are valid for that BS only. In order to give a
valid answer to a user’s query, the BS needs to keep track of the user’s current
location and query scope. Otherwise, some targets in a generated query result
become invalid once the user has moved, even though the movement is still
within the boundary of the same BS.
Figure 3.2: A scenario presented in two-coordinates
Now, we describe the shapes. First, consider a rectangle. A rectangle has
different horizontal and vertical lengths, which makes it hard to apply as a query
scope. Second, consider a triangle as the valid scope, and assume the distances
from its centre to the left, right and top are the same; the base is then twice the
height, and the area of the triangle equals that of the corresponding square.
However, it is hard to decide whether a target is inside the triangle’s boundary.
Therefore, we do not consider using a rectangle or a triangle as the query scope.
The next two candidates, a square and a circle, have similar capabilities. A
square is accurate and can easily be used to catch the target closest to the user,
and its length is the same as its width. The dimension of a square is given by the
distance from the user to its left, right, top and bottom sides. If an area is given,
the side of the square can be found by taking the square root of that area
(√area). The dimensions of the square therefore define the valid scope of the
query.
On the other hand, the circle is one of the most popular shapes of choice
because it is the natural shape for retrieving nearest-neighbour objects efficiently:
all objects within a distance of n units can be found.
Figure 3.3: The proposed approach
Now, let’s apply both shapes to the illustration in Figure 3.2, using the sample
query: "a user would like to find a restaurant within n units of the current
location". The restaurant will be found if we use a square as the valid scope
because, for the same scope parameter n, the square covers the corner regions
that a circle of radius n misses, as shown in Figure 3.3. One can argue that one
could instead:
• Increase the size of the circle
This is a common argument for retrieving objects located close to the query
scope. However, we do not know the optimum size to which the circle should
be enlarged. If we increase the size of the circle too much, too many objects
are retrieved and resources are wasted (bandwidth, power consumption and
memory).
• Resend the same query
Resending the same query needs more processing time and power, and by
the time the new result arrives the objects may have been passed or may lie
outside the scope area. Furthermore, the user may miss a query result if the
server is busy.
Therefore, a square is preferred as the query scope due to its efficiency in query
processing at the server side. This shape is more accurate and can more easily be
used to discover the objects within it, and the possibility of finding the restaurant
is higher than if a circle were used.
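The difference between the two scopes can be seen in a simple containment check (an illustrative sketch with our own function names): for the same parameter n, a point near a corner lies inside a square of side 2n centred on the user but outside a circle of radius n.

```python
def in_square(p, centre, n):
    """Inside a square of side 2n centred at centre (scope 'within n units')."""
    return abs(p[0] - centre[0]) <= n and abs(p[1] - centre[1]) <= n

def in_circle(p, centre, n):
    """Inside a circle of radius n centred at centre."""
    return (p[0] - centre[0]) ** 2 + (p[1] - centre[1]) ** 2 <= n ** 2

# A restaurant in the corner region of the scope is caught by the square
# but missed by the circle:
user, restaurant, n = (0.0, 0.0), (400.0, 400.0), 500.0
```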
3.2.3 Query Types
This section briefly describes location-dependent queries and the query types
that resemble them.
A common query type similar to the location-dependent query is the spatial,
location-independent query. An example of a spatial query would be to find a
certain region at location X1, Y1. Note that this type of query is not a
location-dependent query, because it asks for a certain object independently of
the current location of the user.
The results of a location-dependent query depend on the current location of the
mobile user who initiates the query, where current location means the location of
the mobile user at the moment he/she receives the query result. This type of
query exists only in a mobile environment. Figure 3.4 shows a situation in which
a mobile user sends a location-dependent query and receives its result: the
sending and receiving locations are not the same, and the objects located inside
the query scope are the valid objects returned to the requester.
Figure 3.4: A location-dependent query in details
Location-dependent query processing can involve object retrieval from a single
cell or from multiple cells. Single-cell query processing is query processing in
which the query scope does not cross the base station boundary (as shown in
Figure 3.4). On the other hand, if the query scope crosses more than one base
station boundary, the processing is called multi-cell query processing.
Consider the query: “retrieve a list of restaurants within 500 metres”. In this
query, the location is implicit, and the query can be either a spatial or a
location-dependent query. It is a spatial query if the location of the requester
remains the same from the time the query is issued until the query result is
received. In contrast, it is a location-dependent query if the location from which
the query was sent and the location at the time of receiving the query result are
different, since the result then depends on the current location of the requester.
Furthermore, consider the query: “retrieve a list of restaurants within 500
metres from hotel A”. This is not a location-dependent query, since the locations
of the restaurants depend on the location of hotel A and are independent of the
location of the user. However, it can be classified as a location-related query.
3.3 Query Processing for Single-Cell
In this section, we discuss in detail how our proposed algorithms handle the
situation mentioned above. The proposed algorithms are divided into three
categories, elaborated in the next three subsections:
• Static Query Scope
This category is based on the movement of mobile users. We propose three
algorithms based on user movement: horizontal, vertical and diagonal. The
query scope is parallel to the base station.
• Dynamic Query Scope
In this category, the query scope is perpendicular to the travel direction of
the mobile user.
• Angle of Movement
Here, we consider the angle of the travel direction, calculated between the
travel direction and the centre horizontal line of the query scope. We
classify this angle into three groups: 0 < α ≤ 30, 30 < α ≤ 60 and
60 < α < 90 degrees.
3.3.1 Static Query Scope Category
In this category, the query scope is kept parallel to the BS boundary; that is, the
scope is not oriented according to the travel direction of the mobile user. This
simplifies the process of creating the query scope.
The following three subsections present the steps of the proposed approaches in
this category. The first subsection describes the main part of the proposed
approaches. The remaining two subsections cover the parts responsible for
retrieving objects based on the user's movement.
The Main Part
In general, this part is the entry point and encompasses the entire query result
retrieval process in this category. The process includes receiving the user's input,
predicting the expected recipient location, creating a query scope, searching for
objects in the current BS, and sending the query results back to the mobile user.
The process terminates when the mobile user receives the requested information.
While the server is processing the user's request, the user moves from one location
to another. Query result retrieval is therefore based on the recipient location rather
than on the sender's location. Once the recipient location is known, a query scope
of a given size is generated. Information about all objects whose locations fall
inside the query scope is then retrieved. Finally, this information is shipped to the
requester and an acknowledgement is awaited. If the server does not receive an
acknowledgement, it produces a new query result and sends it to the user. The
new query result may differ from the current one because of the user's mobility.
Algorithm 3.1 shows the details of our main proposed algorithm. It can be
explained as follows:
(i) The server receives an input from the user containing the following factors:
the current location of the mobile user, travel direction, velocity, and searching
distance. The first factor is straightforward: it is the location from which
Algorithm 3.1: The main proposed algorithm
Input: Location, Query, Speed
Output: Results
begin
    tstart ← time when the query is received (assume zero)
    (TDx, TDy) ← travel distance from tstart to tstart+1 at the current velocity
    (X1, Y1) ← sender location at time tstart
    // Prediction of the next location at time tstart+1
    (X2, Y2) ← (X1 + TDx, Y1 + TDy)
    SD ← searching distance (from the user query)
    isReceived ← false
    allObjsFound ← empty
    while (isReceived is false) do
        Create a query scope of dimensions 2*SD by 2*SD at location (X2, Y2)
        Divide the scope into 4 equal regions with the user at the centre point
        Dir ← user travel direction
        objsInOverlappingArea ← call check_overlapping_area(allObjsFound)
        allObjsFound ← call the algorithm to get valid objects based on the user movement
        isReceived ← send allObjsFound to the user
        if (isReceived is false) then
            tstart ← tstart+1
            (X1, Y1) ← update the location at time tstart
            (X2, Y2) ← update the location at time tstart+1
        end
    end
end
the user sent the query to the server. The travel direction can be determined
using either the travel history or two points in two-dimensional coordinates. We
simplify the process by using two coordinate points connected by a straight line,
which shows the direction of travel between the start and end points. The velocity
is taken from its current value. The last factor determines how far the searching
area extends.
(ii) Predict the next location, where the mobile user is expected to receive the
query result. It is calculated from the current travel direction, speed, and query
processing time.
(iii) Create the query scope. Since a square has equal height and width, the
dimension of the square is given by a single length parameter. The length
parameter is the searching distance from the client request multiplied by two,
because the length covers the distance from the user to both the left and the
right sides.
(iv) Once the query scope has been created, it is divided equally into 4 regions.
The aim of this division is to speed up the search on the server side: the regions
that have already been passed, i.e. those in the opposite direction of travel, are
not processed further.
(v) Verify whether there is an overlapping area, that is, an overlap between the
previous and current query scopes. This area exists if the mobile user failed to
receive the query result at time tstart−1, the time at which the user previously
expected to receive the result. The purpose of checking the overlapping area of
the query scope is to avoid reprocessing existing targets in the next time interval.
The details of this process are discussed in Section 3.5.1, where the disconnection
issue is presented. The result of this step is either the set of objects located in
the overlapping area or an empty set.
(vi) Load the information of objects whose locations fall within the query scope.
The result set of the overlapping query scope is passed in so that the overlapping
area is excluded from the current object-probing step. Unless the mobile user is
predicted to stop while the query results are being retrieved, the server decides
which regions of the query scope to process. When the travel direction of the
user is horizontal or vertical with respect to the query scope, two regions are
processed; if the travel direction is diagonal, one region is processed. The regions
processed are those lying in the direction of travel. If the user previously missed
a query result, the overlapped area is subtracted from these regions. The
information of all objects in the resulting area is retrieved. The details of these
processes are presented in the next two subsections.
(vii) Send the generated query result to the user. Once the query result is ready,
the collected information is sent to the user and the server waits for an
acknowledgement. The mobile user sends an acknowledgement once the query
result has been received. If the mobile user receives only a partial query result or
none at all, due to a weak signal or a disconnection, no acknowledgement is sent.
On the server side, a flag keeps track of whether an acknowledgement has been
received: it is set to true when an acknowledgement arrives; otherwise it remains
false and the server prepares the next query result for time tstart+1.
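The main loop in steps (i) to (vii) can be sketched in Python. This is a minimal illustration only: the point-object store, the `send` callback standing in for result delivery plus acknowledgement, and all function names are assumptions for this sketch, not part of the thesis algorithms.

```python
def predict_location(x, y, vx, vy, dt):
    # Step (ii): project the sender's position along its current velocity.
    return x + vx * dt, y + vy * dt

def query_scope(cx, cy, sd):
    # Step (iii): a 2*SD by 2*SD square centred on the predicted location.
    return (cx - sd, cy - sd, cx + sd, cy + sd)

def objects_in_scope(objects, scope):
    # Step (vi), simplified: keep point objects inside the rectangle.
    x1, y1, x2, y2 = scope
    return [o for o in objects if x1 <= o[0] <= x2 and y1 <= o[1] <= y2]

def answer_query(objects, x, y, vx, vy, sd, send, dt=1.0, max_tries=5):
    # Steps (i)-(vii): retry with a re-predicted location until the
    # send callback reports that the acknowledgement arrived.
    for _ in range(max_tries):
        px, py = predict_location(x, y, vx, vy, dt)
        result = objects_in_scope(objects, query_scope(px, py, sd))
        if send(result):          # step (vii): acknowledgement received?
            return result
        x, y = px, py             # user kept moving; re-predict for tstart+1
    return None
```

The retry branch mirrors the algorithm's handling of a missed acknowledgement: the predicted location simply advances by one more time step before the scope is rebuilt.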
The Vertical/Horizontal Movement Algorithm
This section focuses on horizontal and vertical movement in two-dimensional
coordinates. Vertical movement is when a mobile user travels along the Y-axis,
whereas horizontal movement is when the user travels along the X-axis. The
proposed vertical movement algorithm is presented first, followed by the
horizontal one.
The proposed vertical movement approach retrieves information about the
requested objects based on the travel direction of the mobile user. It first receives
the current position, the query scope and the travel direction of the mobile user
from the main part (see the previous section). The current position and travel
direction determine which regions of the query scope are processed. When the
mobile user is moving upwards, the information of all objects located in the two
upper regions (regions 1 and 2) is retrieved. However, objects already present in
the overlapping collection are not loaded again. When the mobile user is moving
downwards, a similar approach is applied; the difference is that only objects
located in the two bottom regions (regions 3 and 4) are retrieved. This scheme is
shown in Algorithm 3.2.
Figure 3.5 shows examples of how this algorithm works when a user moves
vertically. All information about targets located in the shaded regions is sent to
the user. While a user is travelling down (south), all information about objects in
the bottom regions (regions 3 and 4) is sent to the mobile user, as shown in
Figure 3.5a. Conversely, all information about targets in the top regions (regions
1 and 2) is sent to the mobile user while the user is moving up (north), as shown
in Figure 3.5b.
Algorithm 3.2: The vertical movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (either up or down)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is up) and (object.Ycoordinate ≥ Y) then
                collection ← collection + objects found in regions 1 and 2
            else
                collection ← collection + objects found in regions 3 and 4
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
(a) Down (b) Up
Figure 3.5: The complexity of vertical movement
The proposed horizontal movement approach is similar to the previous one: the
number of regions processed is the same. The differences lie in the travel direction
and in the selection of the regions to process. Instead of travelling along the
Y-axis, the mobile user moves horizontally along the X-axis, and instead of the
upper or bottom regions, the right and left regions are searched when the mobile
user is moving right and left respectively. Algorithm 3.3 shows the process of this
approach.
Algorithm 3.3: The horizontal movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (either left or right)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is right) and (object.Xcoordinate ≥ X) then
                collection ← collection + objects found in regions 1 and 4
            else
                collection ← collection + objects found in regions 2 and 3
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
Figure 3.6 shows two situations in which a user is moving horizontally: from
right to left, and from left to right. In the first situation, the targets in the left
regions, regions 2 and 3, are retrieved and sent to the user (as shown in Figure
3.6a). In the second situation, the targets in the right regions, regions 1 and 4,
are fetched and sent to the user (Figure 3.6b).
(a) Left (b) Right
Figure 3.6: The complexity of horizontal movement
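The region selection used by Algorithms 3.2 and 3.3 can be sketched as follows. The region numbering (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right) is inferred from the text (up selects regions 1 and 2, right selects regions 1 and 4); the numbering and all identifiers are assumptions for this sketch.

```python
def region_of(ox, oy, ux, uy):
    # Regions of the query scope with the user at the centre point.
    # Assumed numbering: 1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right.
    if ox >= ux:
        return 1 if oy >= uy else 4
    return 2 if oy >= uy else 3

# Regions lying in the travel direction (two per axis-aligned direction).
REGIONS_FOR = {
    "up": {1, 2}, "down": {3, 4}, "right": {1, 4}, "left": {2, 3},
}

def select_objects(objects, ux, uy, direction, overlapping=frozenset()):
    # Skip objects already in the overlapping collection, keep those in the
    # regions ahead of the user, then re-attach the overlap result set.
    wanted = REGIONS_FOR[direction]
    picked = [o for o in objects
              if o not in overlapping and region_of(o[0], o[1], ux, uy) in wanted]
    return picked + list(overlapping)
```

For example, a user at the origin moving up would receive only the objects whose region is 1 or 2, exactly mirroring the branch structure of Algorithm 3.2.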
The Diagonal Movement Algorithm
The last algorithm of this category retrieves objects while the user is travelling
diagonally. We assume that a user moving in a diagonal direction is interested
only in the objects in the region opposite to the region he/she is coming from.
This algorithm is similar to those in the previous section, except for the region
being searched and the travel direction. Only one region is searched instead of
two, and the travel direction is differentiated into four cases: Bottom-Right,
Bottom-Left, Top-Left and Top-Right. In this approach, the region searched is the
one opposite to the travel direction. For example, when the travel direction is
detected as coming from the bottom-left region, objects located in the top-right
region are searched. Algorithm 3.4 presents the process of this approach.
All possibilities of diagonal movement are shown in Figure 3.7. If the user is
coming from the Bottom-Left direction, all objects found in region 1 will be
(a) Top Right (b) Top Left
(c) Bottom Left (d) Bottom Right
Figure 3.7: The complexity of diagonal movement
Algorithm 3.4: The diagonal movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (one of the four diagonal directions)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is Top-Right) then
                collection ← collection + objects found in region 1
            else if (direction is Top-Left) then
                collection ← collection + objects found in region 2
            else if (direction is Bottom-Left) then
                collection ← collection + objects found in region 3
            else if (direction is Bottom-Right) then
                collection ← collection + objects found in region 4
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
returned. When the user is coming from the Top-Right direction, all objects found
in region 3 will be returned to the user (Figure 3.7c). As another example, if a
user heads towards the Top-Left, the opposite region (region 2) will be probed
(shown in Figure 3.7a).
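The diagonal case of Algorithm 3.4 can be sketched in the same style: only the single region lying in the user's heading (i.e. opposite the incoming corner) is searched. The region numbering (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right) is an assumption inferred from the text, and all names are illustrative.

```python
# Assumed mapping from the user's heading to the single region searched.
HEADING_TO_REGION = {"top-right": 1, "top-left": 2,
                     "bottom-left": 3, "bottom-right": 4}

def region_of(ox, oy, ux, uy):
    # Assumed numbering: 1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right.
    if ox >= ux:
        return 1 if oy >= uy else 4
    return 2 if oy >= uy else 3

def select_diagonal(objects, ux, uy, heading, overlapping=frozenset()):
    # Per Algorithm 3.4: one region only, minus the overlapping collection,
    # with the overlap result set re-attached at the end.
    wanted = HEADING_TO_REGION[heading]
    picked = [o for o in objects
              if o not in overlapping and region_of(o[0], o[1], ux, uy) == wanted]
    return picked + list(overlapping)
```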
3.3.2 Dynamic Query Scope Category
This category focuses on a query scope that changes dynamically with the user's
direction. This does not mean that the shape of the query scope changes; rather,
the query scope is no longer parallel to the cell boundary (as shown in Figure 3.8).
Instead, the query scope is perpendicular to the direction of the user, and the
angle of movement does not need to be considered. Making the query scope
perpendicular to the direction of the user significantly reduces the complexity.
After the query scope is created, the information of objects within the shaded area
is retrieved and returned to the user.
Figure 3.8: Dynamic query scope for the diagonal movement
Algorithm 3.5 shows the process of this approach. The details of the proposed
approach are as follows:
(i) Generate a line equation for the travel direction.
(ii) Form a query scope of the given size, perpendicular to the above line.
(iii) Find the overlapping area between the current and previous query scopes. If
one exists, select the objects in the overlapping area.
(iv) Retrieve the information of all objects located in the shaded regions of the
current query scope and outside the overlapping area. The shaded regions are
the areas that have not yet been passed.
Algorithm 3.5: The dynamic query scope algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    (TDX, TDY) ← user searching distance
    Travel line ← line equation for the travel direction
    Query Scope ← query scope perpendicular to Travel line
    Searching area ← the two regions of Query Scope located in front of the current location
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in Searching area) then
            collection ← collection + objects found in Searching area
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
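The "in front of the user" test at the heart of the dynamic scope can be sketched with vector projections. This assumes the scope is a 2·SD square aligned with the travel vector; the decomposition into forward and lateral components is our illustration of the perpendicular-scope idea, not the thesis's own formulation.

```python
import math

def in_front_of_user(ox, oy, ux, uy, dx, dy, sd):
    # The dynamic scope is aligned with the travel vector (dx, dy):
    # 'forward' is the projection of the object onto the unit travel
    # direction and 'lateral' the projection onto its perpendicular.
    # The two front regions correspond to forward in [0, sd] with
    # |lateral| <= sd, regardless of the angle of movement.
    norm = math.hypot(dx, dy)
    fx, fy = dx / norm, dy / norm        # unit travel direction
    rx, ry = -fy, fx                     # unit perpendicular
    forward = (ox - ux) * fx + (oy - uy) * fy
    lateral = (ox - ux) * rx + (oy - uy) * ry
    return 0 <= forward <= sd and abs(lateral) <= sd
```

Because the test is expressed in the rotated frame of the travel vector, no case analysis on the movement angle is needed, which is exactly the complexity reduction claimed for this category.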
3.3.3 Angle of Movement Category
Previously, Section 3.3.1 described our proposed approaches in which the user's
query scope is parallel to the BS to which the user is currently connected.
However, that discussion is limited to only three travel directions, which is not
efficient. This section extends the proposed approach with more flexible travel
directions, categorised into 3 ranges: 0 < α ≤ 30, 30 < α ≤ 60 and 60 < α < 90.
Figure 3.9 shows another possibility for retrieving query results while users travel
diagonally, by considering the angle of movement. The measured angle is the
angle between the travel direction and the centre horizontal line of the query
scope. The two front regions of the query scope are searched when the mobile user
travels at an angle that is (i) equal to or less than 30 degrees, or (ii) between 60
and 90 degrees.
In contrast, if the movement angle is between 30 and 60 degrees from the
horizontal centre line of the query scope, only one front region is probed.
(a) 0-30 degrees (b) 30-60 degrees
(c) 60-90 degrees
Figure 3.9: Angle of movement illustrations.
The retrieval algorithm of this category is similar to the static category, except for
the region verification: it considers travel directions at various angles rather than
only horizontal/vertical or diagonal ones. Before discussing the proposed
approach in detail, we first illustrate it and its possibilities.
Figure 3.10 shows the cases in which Algorithm 3.6 searches two regions. Figures
3.10a and 3.10b show mobile users travelling in a 60-90 degree direction, while
Figures 3.10c and 3.10d show mobile users travelling in a 0-30 degree direction.
The areas shaded with diagonal lines are the selected regions of the query scope,
which contain the query result. The areas filled with crossed lines indicate the
range of user movement for which objects within the selected regions are
retrieved. In addition, there are another four possibilities when mobile users travel
at an angle of 30-60 degrees; these are the same as those shown in Figure 3.4.
(a) 60-90 degrees top (b) 60-90 degrees bottom
(c) 0-30 degrees left (d) 0-30 degrees right
Figure 3.10: The complexity of angle movement.
Algorithm 3.6 shows the details of this category, which can be explained as
follows:
(i) Receive the input and create a query scope, in the same way as Algorithm
3.1.
(ii) Calculate how far the user has travelled from the start position to the end
position, and find the angle between the travel direction and the X-axis
(the horizontal line).
(iii) Find the objects in the overlapping area, using the algorithm in Section 3.5.1.
(iv) Select the regions of the query scope based on the travel direction.
(v) Recursively find all objects that are located inside the selected regions of the
query scope but not in the overlapping area.
(vi) Collect the information of these objects and send it to the user.
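Steps (ii) and (iv) can be sketched as follows. The signed angle is taken here as the arctangent of the slope of the travel segment, which is one interpretation of the thesis's angle measure; the region count follows the 0-30 / 30-60 / 60-90 degree ranges described above, and the function names are illustrative.

```python
import math

def movement_angle(x0, y0, x1, y1):
    # Step (ii): signed angle (degrees) between the travel direction and
    # the X-axis; arctan of the slope keeps it in (-90, 90), with the
    # left/right distinction handled separately via the sign of X1 - X0.
    return math.degrees(math.atan((y1 - y0) / (x1 - x0)))

def regions_to_search(alpha):
    # Step (iv): two front regions when |alpha| <= 30 or 60 < |alpha| < 90,
    # a single region when 30 < |alpha| <= 60 (Section 3.3.3).
    a = abs(alpha)
    return 1 if 30 < a <= 60 else 2
```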
3.4 Multi-Cell Query Processing
Our proposed approaches for a user travelling within a single cell were presented
in Section 3.3. However, the user may freely travel from one cell to another.
Because of this movement, the mobile user may query an area that requires
several BSs to answer the query; this is known as a multi-cell query. Every BS
needs knowledge of its neighbouring BSs to answer such a query, so it is essential
that BSs register their details with their neighbour BSs [101].
Algorithm 3.6: The angle of movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X0, Y0) ← location at tstart
    (X1, Y1) ← location at tstart+1
    α ← arctan((Y1 - Y0) / (X1 - X0))
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            // travelling in the negative X direction
            if (X1 < X0) then
                if (α ≤ 30 and α ≥ -30) then
                    collection ← collection + objects found in regions 2 + 3
                else if (α < 60 and α > 30) then
                    collection ← collection + objects found in region 3
                else if (α > -60 and α < -30) then
                    collection ← collection + objects found in region 2
                else if (α ≤ -60 and α > -90) then
                    collection ← collection + objects found in regions 1 + 2
                else if (α ≥ 60 and α < 90) then
                    collection ← collection + objects found in regions 3 + 4
                end
            // travelling in the positive X direction
            else if (X1 > X0) then
                if (α ≤ 30 and α ≥ -30) then
                    collection ← collection + objects found in regions 1 + 4
                else if (α ≤ 60 and α ≥ 30) then
                    collection ← collection + objects found in region 1
                else if (α ≥ -60 and α ≤ -30) then
                    collection ← collection + objects found in region 4
                else if (α ≤ -60 and α > -90) then
                    collection ← collection + objects found in regions 3 + 4
                else if (α ≥ 60 and α < 90) then
                    collection ← collection + objects found in regions 1 + 2
                end
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
Figure 3.11 shows three different types of multi-cell queries in which there is no
overlapping area amongst the BSs. Figure 3.11a shows a user moving within the
same BS while the query scope crosses the corresponding BS boundary. In Figure
3.11b a user moves towards the BS border. Figure 3.11c shows a user moving into
a neighbouring BS.
As mentioned before, Figure 3.11a shows a user travelling within BS1 whose
query scope crosses the BS1 boundary. The target of the query scope (shaded
area) is determined by the user's direction. When BS1 detects that the query
scope crosses its boundary, it processes the query within its own area (the shaded
area from the user location to ∆X) and obtains partial information about the
query result from BS2 by forwarding the remaining query scope information from
BS1 to BS2. Once BS2 finishes generating its query results, it forwards them to
its requesting neighbour, BS1. BS1 then combines the partial query results
retrieved from the other BS (BS2) and forwards the joined query results to the user.
Figure 3.11b shows a user located on the border line between BS1 and BS2. In
this situation, BS1 does not process the user query because, if the user missed the
query results, BS1 would need to forward the user query twice to its neighbour.
After BS1 receives the user query, it forwards the query and the query scope to
its neighbour, BS2, which processes the query and forwards the query results
directly to the user. In both figures, no handover occurs, since the user remains
within one cell.
Figure 3.11c shows a user moving from the current BS1 to BS2. BS1 receives the
query and calculates the predicted location of the user. BS1 forwards the user
query and the predicted user location to BS2, which handles the query. In this
situation, generating the query results for a number of users depends on knowing
when the users enter new cells. Predicting when users enter new cells was
discussed in Section 2.4.3 [128]. So, in this case, the neighbour BS2 knows when
users enter its area. The remaining query result retrieval process is the same as in
the previous example.
(a) Movement within one cell (b) Movement to BS border line
(c) Movement within another cell
Figure 3.11: Three types of users’ movement
The figure above shows non-overlapping areas of multiple BSs; in practice these
areas can overlap each other, which raises an issue in answering multi-cell queries.
The issue and its proposed solutions are addressed in Section 3.4.1. Section 3.4.2
then describes how these proposed approaches are applied to a multi-cell query
whose scope is either static or dynamic.
3.4.1 Non-Overlapping and Overlapping Area Algorithms
The issue in multi-cell query retrieval is to avoid retrieving duplicate data items
from other BSs and to reduce the waiting time for query results from other BSs.
This section describes two proposed solutions for multi-cell query retrieval,
involving non-overlapping and overlapping areas of multiple BSs respectively.
Non-Overlapping Area algorithm
This section focuses on the proposed solution for answering a multi-cell query over
non-overlapping areas of multiple BSs. Before the proposed solution is presented,
the two major types of non-overlapping BS areas are described. Figure 3.12 shows
these two major types of non-overlapping BS scopes. The first shows a whole area
covered by many BSs, whereas the second shows an area (the shaded area) that is
not covered by any of those BSs. Our proposed approach aims to handle mobile
query retrieval in both situations.
Figure 3.12: Non-overlapping base stations(BS)
As mentioned previously, our proposed approach keeps track of all online BSs by
requiring every BS to register with all of its surrounding neighbour BSs. When a
multi-cell query arrives, the current BS retrieves the information of both local and
remote objects. The current BS is the BS whose service area contains the mobile
user. Hence, when the user is expected to arrive in a new cell, the BS that sends
the query result is the new one: since the new location of the user lies in the new
BS's area, the BS that received the user query forwards it to this new BS, and we
assume that a handover has been carried out.
To retrieve the information of remote objects, the current BS determines which
parts of the query scope overlap with the areas of the online BSs in its list. For
example, when the query scope overlaps with the area of BS A, the overlapped
area is sent to BS A in order to obtain that part of the query result. Using the
approach described in Section 3.3, BS A generates the query result and returns it
to the current BS, which merges the information of the remote and local objects
and sends it to the user.
Algorithm 3.7 shows the details of the proposed algorithm. They are described as
follows:
(i) Retrieve the information of all objects in the current BS that are covered by
the query scope. The details of this retrieval process are discussed in
Section 3.4.2.
(ii) Load the information of an online BS from the list, in sequential order.
(iii) For every online BS, find the overlap between the query scope and the area
of that BS. If an overlap exists, the current BS sends the overlap area to
that BS; in other words, the overlap area becomes the query for that
particular BS.
(iv) The online neighbour BS that receives the query executes it in the same way
as the current BS.
(v) The online neighbour BS returns a query result, containing either a list of
object information or an empty list, to the current BS.
(vi) The current BS merges the returned query result into its own.
(vii) Repeat the process until the information of all objects inside the query
scope has been retrieved.
(viii) Return the query result to the requester.
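The steps above can be sketched in Python over axis-aligned rectangles. Modelling each BS as a dict with a `scope` rectangle and a `search` callable is purely illustrative, and the recursion to indirect neighbours (step (vii)) is omitted for brevity.

```python
def intersect(a, b):
    # Rectangles as (x1, y1, x2, y2); returns the overlap or None.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def answer_multicell(query_scope, current_bs, online_bs):
    # Steps (i)-(vi): search locally first, then forward each overlapping
    # part of the scope to the online neighbour BS that covers it; the
    # overlap itself is the query sent to that neighbour.
    local = intersect(query_scope, current_bs["scope"])
    result = current_bs["search"](local) if local else []
    for bs in online_bs:
        overlap = intersect(query_scope, bs["scope"])
        if overlap is not None:
            result += bs["search"](overlap)
    return result
```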
Algorithm 3.7: Non-overlapping algorithm
Input: Query, NoOfBSOnline
Output: Results
begin
    Queryscope ← scope of the query
    Current BSscope ← current base station boundary
    Current BSID ← current base station ID
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← Get Result(Queryscope, Current BSscope)
    while (index < NoOfBSOnline) do
        Current Neighbour BS ← CollectionOfOnlineBS at position index
        if intersection(Queryscope, Current Neighbour BSscope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current Neighbour BS(Queryscope)
        end
        index ← index + 1
    end
    return Result
end
Figure 3.13 illustrates multi-cell query retrieval. MU2 and MU1 send queries to
BS1 and BS5 respectively. On behalf of mobile user MU2, BS1 forwards the query
by sending the area overlapping BS2 to BS2 and the area overlapping BS4 to
BS4. BS2 and BS4 return the information of the objects inside the query scope.
The retrieval process for MU1 is similar to that for MU2, except that the
uncovered area is not sent to any online BS. Furthermore, the query scope covers
the areas of BS1 and BS2, which are not direct neighbours of BS5. Result
information can be fetched from both BSs by recursively passing the overlapping
part of the query scope to all neighbour BSs of the current BS; these neighbour
BSs pass the overlapping parts of the received query scope on to their own
neighbours. This process continues until the whole
Figure 3.13: Multi-cell query illustration
area of the query scope has been processed and no further overlapping area
remains between any BS and the query scope.
Overlapping Area Algorithm
Query result retrieval over overlapping areas of multiple BSs follows a process
similar to the non-overlapping case. The existence of an overlapping area makes
the two processes different, because a mechanism must be applied to avoid object
duplication in the query result.
This section elaborates two proposed approaches to handle this situation: area
elimination and query result elimination. The two approaches are explained as
follows:
1. Eliminating neighbour BS overlapping area
This approach avoids reprocessing an overlap area of multiple BSs that has
already been processed. When the query scope covers an overlap area of
multiple BSs, the overlap area is searched only once; the first BS in the list of
online BSs is in charge of searching the objects within that area.
Algorithm 3.8 shows the complete process, which can be explained as follows:
(i) Receive a query from a user, extract the query information and generate
a query scope based on it.
(ii) Search for the requested information of all objects located inside the
query scope within the current BS area.
(iii) Load the information of an online BS according to its position in the list
of online BSs. Check that BS against the list of processed BSs, which
contains all BSs that have already processed the query. If the online
neighbour BS being considered is in the list, it is not given the task of
processing the current query.
(iv) Before forwarding the query to that neighbour BS, the current BS
eliminates the area in which the neighbour BS being processed overlaps
the current BS.
(v) Once the overlapping area has been eliminated, the query and the list of
processed BSs are forwarded to that BS, which then generates a query
result using the same mechanism as the current BS.
(vi) Repeat the process until all online BSs have been processed.
2. Eliminating items from neighbour BS query results
This approach is similar to the previous one. The only difference is that the
overlapping neighbour BS areas are not eliminated; instead, the duplicated
items in the returned query results are removed from the query results.
Algorithm 3.9 shows our second proposed algorithm for retrieving items from
multiple overlapping cells by eliminating duplicate items in the query result.
To keep the description brief, we do not discuss the whole algorithm, since it is
Algorithm 3.8: Eliminating neighbour BS overlapping area algorithm
Input: Query, list of BS done
Output: Results
begin
    Queryscope ← scope of the query
    BSscope ← current base station boundary
    Current BSID ← current base station ID
    Result ← Get Result(Current BSID)
    Area Taken ← BSscope
    list of BS done ← list of BS done + Current BSID
    while (index < NoOfBSOnline) do
        Current Neighbour BS ← CollectionOfOnlineBS at position index
        if (Current Neighbour BS.ID is in list of BS done) then
            Continue to the next neighbour BS
        end
        list of BS done ← list of BS done + Current Neighbour BS.ID
        Current Neighbour BSscope ← Current Neighbour BSscope - Area Taken
        if intersection(Queryscope, Current Neighbour BSscope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current Neighbour BS(Queryscope, list of BS done)
        end
        Area Taken ← Area Taken + Current Neighbour BSscope
        index ← index + 1
    end
    return Result
end
similar to the one mentioned in the previous subsection. We highlight only
the parts which differ.
Algorithm 3.9: Eliminating items from neighbour query result.
Input: Query, list_of_BS_done
Output: Results
begin
    Query_scope ← scope of query
    BS_scope ← current base station boundary
    Current_BS_ID ← current base station ID
    Result ← Get_Result(Current_BS_ID)
    list_of_BS_done ← list_of_BS_done + Current_BS_ID
    while (index < NoOfBSOnline) do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if (Current_Neighbour_BS.ID is in list_of_BS_done) then
            Continue to next neighbour BS
        end
        list_of_BS_done ← list_of_BS_done + Current_Neighbour_BS.ID
        if (intersection(Query_scope, Current_Neighbour_BS_scope) exists) then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            tempNeighBSResult ← Current_Neighbour_BS(Query_scope, list_of_BS_done)
            tempNeighBSResult ← eliminate duplicate items from tempNeighBSResult against Result
            Result ← Result + tempNeighBSResult
        end
        index ← index + 1
    end
    Return Result
end
This algorithm does not track which BS areas have been processed; it treats an
overlapping BS area as if it were non-overlapping. This means that a neighbour
BS collects all items in the overlapping area covered by the query scope, even
though these items may also be collected by other neighbour BSs. As a result,
the returned query result contains duplicated items when it is merged with the
caller BS's result. Therefore, an additional step filters out the duplicate
items before the query results of the neighbour BSs are joined with that of the
current BS.
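The merge-time filtering step can be sketched as follows; this is an illustrative helper, not code from the thesis, and the function name is our own.

```python
# Sketch of the duplicate-elimination step of Algorithm 3.9: a neighbour's
# result is appended to the caller's result with duplicates filtered out.

def merge_without_duplicates(current_result, neighbour_result):
    seen = set(current_result)
    merged = list(current_result)
    for item in neighbour_result:
        if item not in seen:       # keep only items not already reported
            merged.append(item)
            seen.add(item)
    return merged
```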
3.4.2 Static and Dynamic Query Scope Algorithm
A static query scope is a query scope parallel to the base station boundary. A
dynamic query scope is a query scope perpendicular to the straight line of the
travel direction of mobile users. Section 3.3 presented these query scopes and
the proposed query processing algorithms for a single cell in more detail. This
section discusses how the algorithms proposed in Section 3.4.1 are applied with
static and dynamic query scopes to retrieve a query result. The information
retrieval algorithm for the static query scope is presented first, followed by
the dynamic one.
Static Query Scope Algorithm
Before the proposed approaches are discussed, an illustration is given to
provide a better picture. Figure 3.14 illustrates a static query scope that
covers partial areas of multiple cells. When MU2 sends a query scope, ACFH,
which covers a partial area of the BS, the BS decides which part of the query
scope is processed, depending on the travel direction; the scope KCFL is then
processed. The BS then reduces the query scope to the area of the BS. The aim
of the reduction is to eliminate the part of the query scope which does not
belong to the BS. The reduction process is straightforward, since it cuts any
of the four sides of the query scope that extend beyond the BS scope. After the
reduction has been performed in BS1, BS2 and BS4, we have three smaller,
separate query scopes inside the three BSs: KBEJ in BS1, BCDE in BS2 and DFLJ
in BS4.
After every BS has completed the query scope reduction, it searches for all
items located inside its smaller query scope. If such items exist, they are
returned to the caller BS; otherwise, the BS returns nothing to the caller BS.
Therefore, BS2 and BS4 return the items inside areas BCDE and DFLJ respectively
to BS1. BS1 then sends the returned results, together with its own result, to
the user.
Figure 3.14: An illustration of a static query scope
Algorithm 3.10 shows the details of the proposed information retrieval
approach, which can be explained as follows:
(i) If the scope of the BS is smaller than the query scope, return all items
inside the area of the BS.
(ii) Otherwise, create a new query scope that lies within the scope of the BS.
(iii) Find all items located in the new scope and store them in a collection
called result.
(iv) Return the collection to the requester.
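The clipping step behind the approach can be sketched as follows; rectangles are represented as `(xmin, ymin, xmax, ymax)` tuples, a representation and set of helper names that are ours, not the thesis's.

```python
# Sketch of the reduction step in Algorithm 3.10: cut each side of the
# rectangular query scope that extends past the base-station boundary.

def clip_scope(query, bs):
    qx0, qy0, qx1, qy1 = query
    bx0, by0, bx1, by1 = bs
    # Each max/min pair cuts one of the four sides if it passes the BS scope.
    return (max(qx0, bx0), max(qy0, by0), min(qx1, bx1), min(qy1, by1))

def items_in(rect, items):
    x0, y0, x1, y1 = rect
    return [(x, y) for (x, y) in items if x0 <= x <= x1 and y0 <= y <= y1]
```

In Figure 3.14's terms, clipping the scope ACFH against each BS boundary yields the smaller scopes KBEJ, BCDE and DFLJ.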
Dynamic Query Scope
A dynamic query scope differs from a static query scope. There are many
possibilities when a dynamic query scope is used to retrieve query results from
several
Algorithm 3.10: Get_Result algorithm for static query scope
Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    // Check whether the query scope is greater than the BS scope
    if (BSScope < queryScope) then
        return all items in list_of_items
    end
    // Check whether part of the query scope extends past the BS scope
    New_Query_Scope ← queryScope
    if (queryScope.Xmax > BSScope.Xmax) then
        New_Query_Scope.Xmax ← BSScope.Xmax
    end
    if (queryScope.Xmin < BSScope.Xmin) then
        New_Query_Scope.Xmin ← BSScope.Xmin
    end
    if (queryScope.Ymin < BSScope.Ymin) then
        New_Query_Scope.Ymin ← BSScope.Ymin
    end
    if (queryScope.Ymax > BSScope.Ymax) then
        New_Query_Scope.Ymax ← BSScope.Ymax
    end
    while (item in list_of_items) do
        if (item is inside New_Query_Scope) then
            result ← result + item
        end
    end
    return result
end
BSs, since the scope can approach from different angles. These possibilities
can be classified into four categories based on how the query scope covers a BS
area, as shown in Figure 3.15. The new shape of the query scope can be either a
polygon or a triangle: the first three figures show a polygon, whereas the last
figure, on the bottom right, shows a triangle. The boundary of the new query
scope intersects with one or more BS boundaries.
(a) two parallel lines (b) two perpendicular lines
(c) two parallel lines (d) two perpendicular lines
Figure 3.15: A dynamic query scope intersects a base station (BS): (top) on the same line; (bottom) on two different lines
Algorithm 3.11 shows the retrieval algorithm used when a query scope passes
into a neighbour BS area. The algorithm starts by initialising some parameters.
It then checks whether the BS area is smaller than the query scope; if it is,
the neighbour BS immediately returns all items in its area to the mobile user.
When a query scope partially overlaps a neighbour BS, there is an intersection
area, as shown in Figure 3.16. This area is a polygon with n vertices, where
the value of n is between 3 and 6. The intersection area is formed by any
corner points of the BS boundary and a number of intersection points between
Figure 3.16: An illustration of a dynamic query situation
the query and BS scopes. Hence, the BS needs to know where these intersection
points are located and which corner points of the BS boundary lie inside the
intersection area. An intersection point lies on two line equations: a line
equation of the BS boundary and the stored line equation of the query scope and
searching distance. These points are stored in clockwise order.
The BS checks whether the location of each item in the list of items lies
inside the query scope, using the right-hand rule, which is specified as
follows:
• Take two consecutive points, p0 and p1, from the collection.
• Use the formula (p.y − p0.y)(p1.x − p0.x) − (p.x − p0.x)(p1.y − p0.y) to find
whether point p is located inside the query scope. This formula returns a value
less than 0, equal to 0 or greater than 0.
• Point p lies inside the query scope if the value is less than or equal to 0
for every pair of consecutive points.
All points located inside the query scope are collected and returned to the
requesting BS. The requesting BS then combines this result with its own result
and sends them to the requester.
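The right-hand-rule test above can be sketched directly; the function names are ours. Note that the polygon vertices must be in clockwise order, as the text specifies, for the sign convention (inside when the value is ≤ 0) to hold.

```python
# Sketch of the point-in-polygon test used in Algorithm 3.11. For a polygon
# whose vertices are stored clockwise, a point is inside (or on the boundary)
# when the edge formula is <= 0 for every consecutive vertex pair.

def side(p, p0, p1):
    # (p.y - p0.y)(p1.x - p0.x) - (p.x - p0.x)(p1.y - p0.y)
    return (p[1] - p0[1]) * (p1[0] - p0[0]) - (p[0] - p0[0]) * (p1[1] - p0[1])

def inside_polygon(p, vertices):
    n = len(vertices)
    return all(side(p, vertices[i], vertices[(i + 1) % n]) <= 0
               for i in range(n))
```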
Algorithm 3.11: Neighbour cell retrieval algorithm for dynamic query scope
Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    list_of_BS_Vertices ← BS vertices
    // Check whether the query scope is greater than the BS scope
    Intersection_points ← all intersection points lying on the query scope and neighbour BS boundaries
    if (all points in list_of_BS_Vertices are covered by queryScope) then
        return all items in BSScope
    end
    if (any point in list_of_BS_Vertices is covered by queryScope) then
        Add that point to Intersection_points
    end
    Sort elements of Intersection_points in clockwise order
    while (item in list_of_items) do
        p ← item[ndxItem]
        isInside ← true
        index ← 0
        while (isInside is true) and (index < number of elements in Intersection_points) do
            p0 ← Intersection_points[index]
            p1 ← Intersection_points[index + 1]
            isRS ← (p.y − p0.y)(p1.x − p0.x) − (p.x − p0.x)(p1.y − p0.y)
            if (isRS > 0) then
                isInside ← false
            end
            index ← index + 1
        end
        if (isInside) then
            result ← result + p
        end
        ndxItem ← ndxItem + 1
    end
    return result
end
3.5 Handling Disconnections
In a mobile environment, mobile users may not receive any query results due to
a disconnection between the mobile user and the base station. The disconnection
can be either unpredicted, caused by interference, or predicted, caused by the
recipient's location being outside the coverage of any base station. This
section discusses our proposed algorithms for handling disconnections.
A server can handle a predictable disconnection more easily than an
unpredictable one, because the server knows the wake-up time of the mobile
user. For an unpredictable disconnection, by contrast, the server has no
knowledge of when the mobile device will reconnect or of the recipient's next
location.
As a result of the disconnection, either the query result has not been received
by the mobile user or an acknowledgement has been lost in transit. To deal with
the missing-result problem, the server can reprocess the query result when the
disconnection is predictable. On the other hand, reprocessing the query result
may not be a good idea, since the server needs a certain amount of time to
produce it and disconnections happen frequently. Preserving an existing query
result and resending it periodically within a certain amount of time is one
solution in this situation. For both solutions, the mobile user needs to send
an acknowledgement once the query result has been received. The server keeps a
query result only for a certain amount of time to avoid accumulating query
results from other mobile users. If the server fails to receive an
acknowledgement, the situation is treated as if the mobile user had not
received the query result that was sent.
The rest of this section discusses the proposed approaches for handling the two
types of disconnection for single and multiple cells. The approach to handling
disconnection in a single cell is presented first, followed by that for
multiple cells. In each subsection, the proposed techniques for handling
predictable and unpredictable disconnections are presented.
3.5.1 Single Cell
Predictable Disconnection
This section elaborates on a proposed mechanism to handle a situation where
mobile users miss query results within a predictable time. In other words,
mobile users are alerted to be ready to receive query results in the next
interval.
Figure 3.17: Illustration of predicted disconnection situation
Consider a user at location Z0 who sends a query requesting objects within a
distance D at time tstart. The user travels at a constant speed S and does not
receive a query result at location Z1 at time tstart+1. The user is expected to
arrive at location Z2 and to receive a query result at time tstart+2, as shown
in Figure 3.17. If the gap between Z1 and Z2 is less than the distance of the
user query, the query results generated at times tstart+1 and tstart+2 overlap
(the area indicated by TRSQU). Regenerating the same results would overload the
server.
In our proposed approach, retrieving the same result set can be avoided when
the above situation happens. The approach is divided into two major steps. The
first step determines whether the two query scopes overlap, which occurs if the
gap between Z1 and Z2 is shorter than the distance value of the query. The
second step excludes items from the result set, and can be done in two ways: by
items or by area. Both ways are similar to those used to eliminate duplicated
items in multi-cell query processing for overlapping BSs (described in Section
3.4.1), and both collect the items located inside the overlapped region from
the existing query result. In item elimination, the overlapped area is not
eliminated while searching for items; instead, all items are compared with
those in the overlapped region. In area elimination, the overlapped area is
eliminated while searching, so only the remaining items are checked against the
query scope.
Algorithm 3.12 presents our proposed algorithm, whose details are explained
below:
(i) Verify the previous position, (X1, Y1). If it is outside the current query
scope, terminate the algorithm and return an empty result set.
(ii) Form the overlapping area of the two query scopes.
(iii) Retrieve all objects located in the query scope but outside the
overlapping area. This step is the second part of the proposed approach, so the
elimination is performed here using one of the two ways.
(iv) Keep all items in the overlapped region.
(v) Add the retrieved objects to those in the overlapping collection.
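The steps above can be sketched as follows, under the simplifying assumptions that query scopes are axis-aligned squares and items are points; the overlap region is computed as a rectangle rather than the polygon TRSQU of the figure, and all names are illustrative.

```python
# Sketch of the reuse strategy in Algorithm 3.12: items in the overlap of the
# old and new query scopes are copied from the existing result; only the
# remainder of the new scope is searched afresh.

def square(center, dist):
    x, y = center
    return (x - dist, y - dist, x + dist, y + dist)

def overlap(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def in_rect(p, r):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def regenerate(old_result, z1, z2, dist, search):
    new_scope = square(z2, dist)
    ov = overlap(square(z1, dist), new_scope)
    if ov is None:
        return search(new_scope)                    # scopes disjoint: full search
    kept = [p for p in old_result if in_rect(p, ov)]            # step (iv)
    fresh = [p for p in search(new_scope) if not in_rect(p, ov)]  # step (iii)
    return kept + fresh                             # step (v): merge
```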
Algorithm 3.12: Predicted disconnections algorithm
Input: queryScope, BSScope
Output: Results
begin
    Objects ← objects collection in the current base station boundary
    (X1, Y1) ← location at time tstart+1
    (X2, Y2) ← location at time tstart+2
    Dist ← distance of user query
    overlapping_collection ← empty
    if ((X1, Y1) is outside queryScope) then
        return overlapping_collection
    end
    T ← (X2 ± Dist, Y2 ± Dist)
    R ← (X1 ± Dist, Y2 ± Dist)
    S ← (X1 ± Dist, Y1 ± Dist)
    Q ← (X2 ± Dist, Y1 ± Dist)
    U ← (X2 ± Dist, Y2 ± Dist)
    overlapping_area ← area formed by the coordinates T, R, S, Q, U
    overlapping_collection ← all objects in overlapping_area
    collection ← overlapping_collection + all objects in queryScope not located in overlapping_area
    return collection
end
Unpredictable Disconnection
This subsection presents a proposed technique for managing situations where an
unpredictable disconnection occurs. There are two possible solutions for
handling such a disconnection: not reprocessing, or reprocessing, the query
result.
Algorithm 3.13 shows the proposed non-reprocessing algorithm for when an
unpredictable disconnection occurs. It is executed when the BS has received
information that the mobile user is ready. At the start of the algorithm, the
query scope is the one in effect when the mobile user missed the query results;
it is already available at the server. (X1, Y1) is the last position at which
the mobile user missed the query results, and (X2, Y2) is the current position
of the mobile user. If (X2, Y2) is still inside the query scope, the server
sends the existing query result; otherwise, the server reprocesses the existing
query with the next location of the mobile user.
Algorithm 3.13: Non-reprocessing algorithm
begin
    Query_scope ← query scope from user
    (X1, Y1) ← sender location at time tmissed
    (X2, Y2) ← location at time tcurrent
    if ((X2, Y2) is inside Query_scope) then
        Send existing query results
    else
        Regenerate query result
    end
end
The advantage of this algorithm is that it reduces the server load by keeping
the existing query result, subject to the server configuration. Its two
drawbacks are that it increases server memory consumption, because it retains
existing query results, and that some objects may become invalid, since the
requester has moved to a new location.
Alternatively, the server regenerates a new query result without regard to the
existing one. Algorithm 3.14 shows the reprocessing algorithm used when the
mobile user has missed a query result. It is executed when the mobile user
reconnects to the current BS. As in the previous algorithm, at the start the
server collects the current location information of the mobile user and
predicts the next location. Then the server generates a query scope at (X2, Y2)
with the same searching distance that was passed to the server
Algorithm 3.14: Reprocessing algorithm
begin
    (X1, Y1) ← sender location at time tcurrent
    (TDx, TDy) ← travel distance of mobile user
    // Prediction of the next location at time tcurrent+1
    (X2, Y2) ← (X1 + TDx, Y1 + TDy)
    Query_scope ← query scope generated at (X2, Y2) with the same searching distance
    Result ← reproduce query result at time tcurrent+1
    Send Result to user
end
beforehand. Next, the server reproduces the query result and finally sends it
to the requester.
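The decision between the two algorithms can be sketched together; this is a hedged illustration with our own names, where `regenerate` stands in for the server's reprocessing step.

```python
# Sketch contrasting Algorithms 3.13 and 3.14: reuse the stored result while
# the reconnected user is still inside the old query scope, otherwise predict
# the next position from the travel distance and regenerate.

def on_reconnect(stored_result, scope, pos, travel, regenerate):
    x, y = pos
    x0, y0, x1, y1 = scope
    if x0 <= x <= x1 and y0 <= y <= y1:
        return stored_result                  # Algorithm 3.13: non-reprocessing
    nxt = (x + travel[0], y + travel[1])      # Algorithm 3.14: predict (X2, Y2)
    return regenerate(nxt)                    # and reproduce the result there
```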
3.5.2 Multiple Cells
Disconnections also occur while retrieving multi-cell queries. The problem
occurs if the mobile user travels to an area outside the service area of a BS.
The BS needs to avoid processing a query when the user cannot be reached. This
section discusses the above problem, focusing on predictable and unpredictable
disconnections.
Predictable Disconnection
This subsection describes the proposed algorithm for handling a predictable
disconnection, in which the period of disconnection is known in advance; for
example, when a mobile user is outside the service area of the base station for
a certain period of time.
Algorithm 3.15 shows our proposed algorithm for handling a predictable dis-
connection in receiving the query result. At the start of the algorithm, the server
retains the existing query results that were not sent while the next recipient location
Algorithm 3.15: Predictable disconnection algorithm for multi-cell retrieval
begin
    queryResult ← existing query result
    isSent ← false
    isOutCurrBS ← false
    while (not isSent) do
        (Xt, Yt) ← new location at time t
        if ((Xt, Yt) is outside the current BS area) then
            isOutCurrBS ← true
            exit loop
        end
        while ((Xt, Yt) is inside queryScope) do
            Send the existing query result when the connection is established
            isSent ← acknowledgement from the user upon receiving the result
        end
    end
    while (isOutCurrBS) and (BS in list of online BSs) do
        if ((Xt, Yt) is inside the BS) then
            Forward the query and location to that BS
            exit loop
        end
        BS ← next BS
    end
    remove query result
end
was still inside the query scope. Otherwise, a new query result is generated,
as the location of the mobile user is outside the query scope.
In addition, the mobile user may leave the current BS area, in which case the
current BS needs to forward the query to a neighbour BS. However, the next
location of the mobile user may not belong to any online BS. Therefore, the
current BS needs to predict the location at which the mobile user enters one of
the online BSs. The new BS then processes the query and sends the query result
once the mobile user is connected to it.
Unpredictable Disconnections
The problem of unpredictable disconnections for multi-cell queries differs from
that for a single cell; the difference arises from the movement of the user to
another cell. The current cell should know whether to remove or to keep the
query result. In this section, we propose an approach to handle this situation.
Algorithm 3.16 shows an algorithm for handling unpredictable disconnections.
The server waits for an acknowledgement once it has finished sending the query
result. The acknowledgement parameter is either true or false; it is true if
the mobile user has received the query result completely.
Algorithm 3.16: Unpredictable disconnection algorithm for multi-cell retrieval
begin
    queryResult ← existing query result
    isSent ← false
    while (numOfSendingTrial < maxSendingAllowed) and (not isSent) do
        if (connected) then
            if (recipient location is outside the query scope) then
                Regenerate the query result based on the new location
                exit loop
            end
            Send query result
            Wait for an acknowledgement for a period t
            acknowledgement ← acknowledgement from the user upon receiving the query result
            if (acknowledgement is received) and (acknowledgement is true) then
                isSent ← true
            end
        else
            exit loop
        end
        numOfSendingTrial ← numOfSendingTrial + 1
    end
    remove query result
end
The waiting period is calculated as the maximum number of query-result send
attempts multiplied by the waiting period for receiving an acknowledgement. The
values of both parameters are configurable, depending on the server capacity.
The query result is kept at the server side until the maximum number of send
attempts has been reached or the user is disconnected from the server.
The formula to calculate the waiting period before a query result is deleted is
given below:
WP = MS * WPA
where:
WP is the waiting period before a query result is deleted,
MS is the maximum number of send attempts, and
WPA is the waiting period to receive an acknowledgement (timeout).
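As a small worked example of the formula, with illustrative numbers of our own choosing: five permitted send attempts and a 4-second acknowledgement timeout give a 20-second retention window for the stored query result.

```python
# WP = MS * WPA: how long the server retains a query result before deletion.

def waiting_period(max_sends, ack_timeout):
    return max_sends * ack_timeout

wp = waiting_period(5, 4.0)  # 5 attempts * 4 s timeout = 20 s retention
```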
If one of the above conditions is reached, the server removes the query result.
The removal avoids running out of server space, even when the server has a
large space. The mobile user needs to send the query again after this period,
or the new location will be outside the query scope.
This algorithm assumes that the recipient's location resides inside the query
scope. Otherwise, keeping the query result at the server side is ineffective,
since the query result is invalid for the new location.
3.6 Case Studies
In this section, we describe case studies for single-cell and multi-cell
queries to illustrate how the proposed approaches work and how query results
are computed.
3.6.1 Single-Cell Query Processing
We illustrate situations where the user has stopped or is moving while
receiving the query result. The user may move slowly or quickly, and the
movement direction may be vertical, horizontal or diagonal. We define a slow
velocity as one where the user's movement is less than the distance of the user
query; in other words, if the user query searches for a target within x, the
user moves less than x. In contrast, a fast velocity is one where the user's
movement is greater than or equal to the distance of the user query. The user
may therefore either receive or miss the query result while it is being sent
from the server.
Based on the above situations, this case study is divided into four cases,
which discuss a user with zero, vertical, horizontal and diagonal movement
respectively. Each case discusses two situations of retrieving query results,
hitting and missing them, while the user is moving.
Case Study 3.6.1. The mobile user stays in the same location
In this example, we assume that the user does not move to any other location
while the query result is being received. Consider a mobile user located at
point (5,5) who sends a query to a server (refer to Figure 3.18). The query is
“Find the closest restaurant within 2 km”. The user stays at the same location
while the answer is produced by the server; in other words, the location at
time tstart+1 is the same as the one at time tstart.
The server generates a valid scope by adding and subtracting the distance
to/from the mobile user's position. Therefore, we have a square formed by the
following coordinates: top right (7,7), bottom right (7,3), top left (3,7) and
bottom left (3,3).
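The scope construction can be sketched directly; the function and key names are ours, not the thesis's.

```python
# Sketch of the valid-scope construction in Case Study 3.6.1: add and subtract
# the search distance from the user's position to obtain the square's corners.

def valid_scope(x, y, dist):
    return {"top_left": (x - dist, y + dist),
            "top_right": (x + dist, y + dist),
            "bottom_left": (x - dist, y - dist),
            "bottom_right": (x + dist, y - dist)}
```

For the user at (5, 5) with a 2 km distance, this reproduces the square (3,3)-(7,7) of the case study.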
After the valid scope has been produced, the server searches for a restaurant
within the ranges 3 < x < 7 and 3 < y < 7; in other words, all regions are
searched. Once the
Figure 3.18: Stay at the same location (Case Study 3.6.1)
server finds a restaurant within that range, it generates a query result for
the query and forwards it to the user. An acknowledgement flag is set to true
if the user has received the result successfully; otherwise, the server keeps
processing the query result for the next time interval.
Case Study 3.6.2. Vertical Movement
We illustrate a user moving vertically at a constant speed. First, we show the
user receiving the query result immediately from the server at time tstart+1.
Then, a situation where the user misses the query result from the server at
time tstart+1 is shown.
After the scope has been created and divided into four equal regions, the
server identifies the user's movement direction. The examples in Figure 3.19
show the user travelling vertically. The server then executes Algorithm 3.2 in
order to find all queried objects. In Figure 3.19a, the user is moving upward;
therefore, the server searches for targets within regions 1 and 2 of the scope
instead of the whole scope. This is due to our assumption that the user is
interested only in targets that have not yet been passed. Hence, the valid
targets are the vending machines (V6, V8, V9, V11, V13), and these targets are
forwarded to the
(a) Move up (b) Move down
Figure 3.19: Vertical movement (Case Study 3.6.2-1)
user. The server sets the parameter forwarded to true once the user has
received the answer successfully.
On the other hand, if the user is moving down, regions 3 and 4 are probed
(Figure 3.19b). Hence, the vending machines (V2, V7, V10, V14) are valid
results and are forwarded to the user. The parameter forwarded is then set to
true.
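The direction-based region filter can be sketched as follows. This is an illustrative reading, not the thesis's code: we assume the four regions are numbered like quadrants around the user (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right), which is consistent with the pairs named in the case studies (up: 1 and 2; down: 3 and 4; right: 1 and 4; left: 2 and 3).

```python
# Sketch of the region filter in the case studies: only the quadrants ahead of
# the user's movement direction are searched for targets.

def region_of(point, center):
    x, y = point
    cx, cy = center
    if y >= cy:
        return 1 if x >= cx else 2   # upper half: regions 1 (NE) and 2 (NW)
    return 4 if x >= cx else 3       # lower half: regions 4 (SE) and 3 (SW)

def regions_ahead(direction):
    return {"up": (1, 2), "down": (3, 4), "right": (1, 4), "left": (2, 3)}[direction]

def targets_ahead(targets, center, direction):
    wanted = regions_ahead(direction)
    return [t for t in targets if region_of(t, center) in wanted]
```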
We have shown above a situation where the user receives the query result at
time tstart+1. Now, assume that the user misses the query result at time
tstart+1 while moving up at a constant speed; the user will receive it in the
next interval, at time tstart+2, as shown in Figure 3.20.
The beginning of this algorithm is the same as above: the server initialises
the required parameters by assigning the values from the user query received.
After the parameter initialisation, the server generates a scope for the
location Z1 and divides the scope into four equal regions. Once the scope has
been created and divided, the server searches for targets within regions 1 and
2 instead of all regions, based on our assumption above. Therefore, the only
valid vending machines at time tstart+1 are (V9, V13).
Figure 3.20: Vertical movement with overlap situation (Case Study 3.6.2-2)
Upon handling the disconnection, the server regenerates new query results for
the next interval, in which the user is predicted to reach location Z2 at time
tstart+2. In recreating the query results for time tstart+2, the server
verifies the targets in the old query results and invalidates those which are
not bounded by the overlapping area PQRS (see Figure 3.20). In this scenario,
no target is invalidated. The server then probes new targets in regions 1 and 2
inside the square generated at time tstart+2, and the new targets found are
joined to the existing valid targets. Hence, the query result returned at time
tstart+2 contains the vending machines (V6, V8, V9, V13).
Case Study 3.6.3. Horizontal Movement
Here, we present three examples of the horizontal movement of a user. Two hit
illustrations, in which a user receives the query result at time tstart+1 while
moving horizontally at a constant speed, are given first. Then, a situation
where the user misses the query result is presented; we assume that the user
would otherwise receive the query result at tstart+1.
Consider an illustration where a user queries “Find all vending machines within
2 km” while travelling at speed S, as presented in Figure 3.21. The horizontal
(a) Move right (b) Move left
Figure 3.21: Horizontal movement (Case Study 3.6.3-1)
movement to the right at speed S is shown in Figure 3.21a. At the beginning of
the process, the server receives the user query, including the current travel
information. The server creates a query scope based on the information in the
user query and then selects the regions to be searched. Owing to our assumption
above, the server places all targets located within regions 1 and 4 of the
scope into the parameter collection. We assume that the user arrives at point
(5,5) at time tstart+1. Therefore, the valid vending machines (V9, V10, V11,
V13 and V14) are forwarded to the user, and the parameter forwarded is set to
true.
Conversely, when the user is moving to the left with velocity S, regions 2 and
3 are searched to find valid objects (shown in Figure 3.21b). Therefore, the
valid targets (V2, V6, V7, V8) are forwarded to the user, and the parameter
forwarded is set to true.
Now, consider a situation where the user misses the query result at time
tstart+1 and then receives a new result at time tstart+2, as shown in Figure
3.22. Assume that the server had processed the user query and found the vending
machines (V9, V11, V13 and V14) as valid targets at time tstart+1.
Figure 3.22: Horizontal movement with overlap situation (Case Study 3.6.3-2)
However, the user could not receive the query result at location Z1 at time
tstart+1 because of a disconnection while receiving the results. We assume the
user keeps travelling at a constant speed and is predicted to arrive at
location Z2 at time tstart+2. The server then regenerates the query result for
the next interval. Since the user is moving slowly at a constant speed, there
is an overlapping area, formed by the points P, Q, R and S, between the squares
at times tstart+1 and tstart+2. Therefore, the server invalidates some targets
in the existing query result; in this case, the vending machines (V9, V14) have
expired and are eliminated. In other words, the vending machines whose
locations are bounded by the overlapping area PQRS remain valid targets for
time tstart+2. After the server has eliminated the invalid targets, it probes
targets located within regions 1 and 4 of the scope (excluding the overlapping
area), since the user is moving horizontally and is interested only in targets
that have not been passed. The newly found targets are combined with the
targets kept from area PQRS. Afterwards, these targets (V10, V11 and V13) are
forwarded to and received by the user at time tstart+2.
Case Study 3.6.4. Diagonal Movement
In this example, two illustrations (as shown in Figure 3.23) of diagonal movement
are presented. The first illustration demonstrates a situation where a user receives
a query result at time tstart+1 when he/she moves diagonally with a constant speed.
In contrast, the second illustration shows a situation where a user misses a query result at
time tstart+1 and is expected to receive a new query result at time tstart+2.
(a) Diagonal movement (b) Overlap
Figure 3.23: Diagonal movement and overlap situation (Case Study 3.6.4)
Let us consider that a user sends the same query as in the previous example
and moves in a top right direction as shown in Figure 3.19 and Figure 3.21. At
the start of the process, a server receives the user query. The server then produces a scope around the predicted location at time tstart+1, which is point (5,5), and divides
the scope into four equal regions. The next process analyses the user direction by
calling the diagonal movement algorithm (Algorithm 3.4) to check targets in the
opposite region. In the algorithm, the server verifies the user's movement direction. In our example, the user moves in the top-right direction; therefore, under our assumption, the server searches for targets within region 1 only, instead of all regions. Then, the valid vending machine, V11, is sent to the user.
In the next illustration (Figure 3.23b), the scenario is similar to that above;
however, the user experiences a disconnection upon receiving the query result at time
tstart+1. Therefore, the user misses the query result at that time and is expected to
receive the next query result at time tstart+2.
When it is acknowledged that the user has missed the current result, the server regenerates a new query result for location Z2, since the user is predicted to
arrive at location Z2 at time tstart+2. The server generates a scope for location Z2 and searches the overlapping area to invalidate existing targets that are not bounded within this area. Then, the server searches for targets located within the scope (excluding the overlapping area), and the newly found targets are joined to the existing targets. Hence, the query result, whose content is (V11, V13), is returned to the user. A return acknowledgement is set once the user receives the query result.
3.6.2 Multi-Cell Query Processing
In this subsection, we present examples to show how our proposed query processing
algorithm for multiple cells works. We divide the discussion into two parts: a non-overlapping BS area and an overlapping BS area. First, we discuss how to retrieve query results where there is no overlapping area; then, we discuss the process for the situation where there is an overlapping area. As a running example, we use the same query as in Section 3.6.1, which is sent to a server through BS1.
Non-Overlapping BS Area
Two examples are given to illustrate two situations: a user moving within the current cell, and a user moving to another cell while requesting information about objects from multiple cells.
Figure 3.24 shows a situation where a query scope is crossing eight BS bound-
aries. In this situation, BS3 receives the query scope and forwards it to its neighbour
BSs (BS2, BS4, BS7, BS8, BS9). Those BSs search for objects within the requested area and verify their lists of online BSs. BS4 and BS9 forward the query again to BS5 and BS10 respectively to request objects in their areas. BS5 and BS10 return information about the requested objects to the requesters, BS4 and BS9. All BSs that received the forwarded query from BS3 return their information to BS3. BS3 merges all of the results and then sends them to the user.
Figure 3.24: A query scope is crossing multiple cells
Figure 3.25 shows a situation where a user moves to another BS. The user sends
the same query to BS3. Once BS3 receives the user query, the predicted location of the user is calculated as a function of time, obtained by multiplying the travel speed by the elapsed time. Since the new location of the mobile user is outside its area, BS3 forwards the query to BS8. BS8 creates a query scope and processes the query inside the shaded area.
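The speed-times-time prediction just described can be sketched as follows. The function name, the angle parameter, and the sample numbers are illustrative assumptions, not values from the thesis.

```python
import math

def predict_location(x, y, speed, angle_rad, dt):
    """Return the position reached after dt time units at a constant speed,
    displacing the current position by speed * dt along the travel direction."""
    return (x + speed * dt * math.cos(angle_rad),
            y + speed * dt * math.sin(angle_rad))

# a user moving horizontally to the right (angle 0) at speed 2 for 3 time units
print(predict_location(1.0, 1.0, 2.0, 0.0, 3.0))  # → (7.0, 1.0)
```

If the predicted point falls outside the current BS area, the query is forwarded to the BS covering that point, as in the scenario above.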
To fetch the query result, BS8 searches for objects inside the requested area and checks its list of online BSs. The query is then forwarded to the online neighbour BSs whose areas are covered by the query scope. In this situation, BS7 and BS9 receive the forwarded query and perform the same process as BS8. Then, BS9 forwards the
Figure 3.25: Moving across to another base station (BS) boundary
query to BS10, since the query covers a partial area of BS10. BS10 performs the same process and sends the query result back to BS9. BS9 merges its query result with the one from BS10. BS9 and BS7 then send their query results back to BS8, which combines its query result with those from BS9 and BS7. Once the query results are merged, the result is sent to the user.
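The recursive forward-and-merge flow above can be sketched as follows. This is a simplified illustration: the topology, the BS names, and the modelling of the query scope as a plain set of object identifiers are all assumptions made for the example.

```python
def process_query(bs, scope, topology, objects, visited=None):
    """Search the local area, forward the query to unvisited online
    neighbours reached by the scope, and merge all returned results."""
    visited = visited or {bs}
    result = [o for o in objects.get(bs, []) if o in scope]
    for neighbour in topology.get(bs, []):
        if neighbour in visited:
            continue
        visited.add(neighbour)
        # a neighbour is queried only for objects the scope still covers;
        # the scope here is modelled simply as a set of object ids
        result += process_query(neighbour, scope, topology, objects, visited)
    return result

# assumed topology mirroring the example: BS8 → {BS7, BS9}, BS9 → BS10
topology = {"BS8": ["BS7", "BS9"], "BS9": ["BS10"], "BS7": [], "BS10": []}
objects = {"BS8": ["a"], "BS7": ["b"], "BS9": ["c"], "BS10": ["d"]}
scope = {"a", "b", "c", "d"}
print(sorted(process_query("BS8", scope, topology, objects)))  # → ['a', 'b', 'c', 'd']
```

BS9 merges BS10's answer before replying to BS8, which in turn merges everything before answering the user, matching the chain described in the text.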
Overlapping BS Area
Having presented two examples of non-overlapping BSs, we now show three examples in which the new location of a user and the query scope interact with BSs whose areas overlap. The examples cover new user locations both within and before the intersection area.
Figure 3.26a illustrates the case when a user moves to another BS and there is an overlapping area amongst the BSs. After BS1 receives the query from a user, it searches for targets within its area. It then verifies whether its area intersects with any neighbouring BS. If there is an intersection between BS1 and a neighbouring BS, BS1 forwards the query to that BS. In this figure, BS1 forwards the query to BS2. BS2 adjusts its minimum boundary by assigning it the maximum boundary of
(a) Within an overlapping BS area (b) Outside an overlapping BS area
(c) Within many overlapping BS areas
Figure 3.26: Three situations of overlapping base station area
BS1. Then, BS2 generates a new query scope by clipping the current query scope at BS2's adjusted minimum boundary. Finally, all targets within the query scope and the BS2 area are collected and sent back to BS1, which combines them into its own collection. The final results are sent to the user.
Figure 3.26b shows a situation in which the new location of the user is before the overlapping area (the shaded area between BS1 and BS2) and the query scope extends beyond the BS1 area. In this situation, BS1 searches for targets within its area and the overlapping area (shaded area). Once the search has been completed, BS1 checks whether the query scope extends beyond its boundary. If it does not, BS1 sends the query result to the requester.
On the other hand, if the query scope extends beyond the BS1 area, BS1 forwards the query to all BSs whose areas the scope passes through. In this example, BS1 forwards the query to BS2. BS2 searches its area, excluding the overlapping area, since BS1 has
already searched that area. BS2 returns the query results to BS1, which returns them to the requester.
Figure 3.26c shows an example of a more complex situation. Once BS3 has received the user query, it matches objects within its area against the query; the areas overlapping with BS4 and BS5 are included as well. Then, BS3 passes the user query to either BS4 or BS5, depending on which registered with BS3 first. If BS4 registered first, the area overlapping BS5 is included in the BS4 search; otherwise, that area is excluded. The same applies to BS5: if it registered first, the area overlapping BS4 is included in the BS5 search; otherwise, it is not. After all areas that the user query passes through return their answers to BS3, it joins those answers and sends them to the user.
3.7 Discussion
Our proposed approaches are designed to retrieve all requested objects that have not yet been passed while the mobile user travels before receiving the query result. They assume straight-line movement at a constant speed.
The proposed approaches are concerned with minimising query processing and the amount of data transferred in query results while mobile users are travelling within a single cell or across multiple cells. The approaches fall into two categories: query processing and disconnection handling. Query processing is further divided into single-cell and multi-cell queries.
The advantage of the proposed query processing approaches is that they avoid processing the unnecessary part of the query, namely the area that has already been passed by the user. Another benefit is a reduction in the amount of data transferred when sending the query result.
We also proposed a solution to handle disconnections that occur while the query result is being transferred. The proposed solutions are divided into handling predictable and unpredictable disconnections. The benefit of both approaches is that the query result can be regenerated, based on the predicted future location, without requiring the user to resend the query when a disconnection occurs. However, there is a limit on how many times the query result is regenerated when the user repeatedly fails to receive it; this limit is configurable depending on the server load.
3.8 Conclusion
This chapter discussed mobile query processing for single and multiple cells. At the beginning of the chapter, the effectiveness of using a square shape as the query scope in location-dependent query processing was introduced, focusing on retrieving static object information within a single cell. The advantages of a square shape over other shapes as the query scope were also presented. Algorithms for retrieving object location information were developed to eliminate objects that have already been passed by users, even though those objects are still inside the query scope. Finally, when users miss query results and their movement is slow, the past and current query scopes overlap each other; an algorithm was therefore developed to handle this situation and prevent redundant information from being sent.
In the second part of this chapter, we discussed three methods to retrieve items from multiple cells. The first method considers overlapping and non-overlapping BS areas, with the query scope parallel to the base station boundaries. The second deals with a dynamic query scope. Finally, we proposed an algorithm to deal with disconnections that occur while query results are being received. We discussed the efficiency of the proposed algorithms in retrieving query results from multiple cells, and case studies were provided to demonstrate that efficiency.
Chapter 4
Indexing for Multiple Servers
Retrieval
Chapter 3 focused on how mobile query processing is performed. However, we did not discuss the indexing mechanism used when multi-cell queries are requested. Thus, this chapter presents our new contributions to processing multi-cell queries using indexing, namely the Local Index and Global Index mechanisms.
4.1 Introduction
It is a characteristic of mobile queries that the locations of mobile users are dynamic
and they often request data items which are located inside a single cell or multiple
cells. This dynamic behaviour has created the need for better query processing speed and for fewer invalid query results.
Query processing that retrieves objects located in multiple cells raises an issue that impacts query processing performance: each cell finishes query processing in a different amount of time. The difference in query processing time across cells is caused by differing transfer speeds, queue sizes and query processing speeds at each server. These three factors increase the overall query processing time.
One way to improve query processing is to provide an index structure for each cell. An indexing technique is a common mechanism for accessing a collection of records and improving the efficiency of query processing [93, 129]. An index organises data records to optimise certain kinds of retrieval operations. Several indexing schemes have been proposed in the past, the most prominent among them being the tree-based schemes [123]. Tree indexing schemes search from the root node down to the leaf nodes. Tree index structures help to process single-cell queries; however, their disadvantage in processing multi-cell queries is that each cell must traverse its tree from the root node in order to produce the query result.
This chapter proposes two index mechanisms, namely Local and Global Indexing. The aim is to overcome the limitations of multi-cell query processing when examining index structures in those cells. Neither proposed approach intends to create a new type of index structure; rather, both extend existing index structures to improve the efficiency of multi-cell query processing. Nor do we concentrate on the concurrency issues that arise in tree searching, or their solutions.
Moreover, both proposed approaches use the original multidimensional index structure called the R-tree [41], and both can be applied to any member of the R-tree family. The proposed mechanisms have their own characteristics, which can be summarised as follows:
• Local Index mechanism
As the name implies, this mechanism tries to process a multi-cell query within
a single cell. When there is a multi-cell query, remote indices of the objects
in a query result are stored locally in the current cell. When storing the remote indices, the remote objects' information is either replicated along with the indexes or kept in the original cell, in which case pointers from the leaf nodes that hold the remote indexes point to it. Then, if a future multi-cell query requests the same area, the current cell can answer it locally. However, we do not store all requested remote indices locally, because this would slow down query processing and consume more space.
• Global Index mechanism
This mechanism differs from the Local Index mechanism in terms of indices
organisation. Instead of storing remote indices of the objects in the query
result, a global index structure is created when a base station starts up. Each cell propagates its index structure to every other cell. Hence, while processing a multi-cell query, the tree index traversal can be performed on this global index structure.
Figure 4.1: Chapter 4 framework
Figure 4.1 shows this chapter’s framework. Section 4.2 gives an overview of
this chapter as a foundation to the proposed approach. Two proposed indexing ap-
proaches using the original R-tree are discussed in Sections 4.3 and 4.4 respectively.
Examples of the usage of both proposed approaches are discussed in Section 4.5.
Section 4.6 presents a discussion of the two proposed approaches. Finally, the conclusion
summarises the contents of this chapter.
4.2 Preliminary Study
This section presents an overview of the original R-tree indexing structure, together with a brief explanation of the multi-cell query processing scenario.
When a base station receives a multi-cell query, it verifies whether the query scope extends beyond the base station's area. The part of the query scope that is beyond its area is forwarded to the base stations whose areas are covered by the query scope. Assume that the index structure of each base station is an original R-tree that has been built in advance. Each base station searches its index structure for the area that overlaps the query scope. At each base station, the probing process starts from the root node and proceeds down to the leaf nodes. Objects of the matched leaf nodes are collected and returned to the user.
The R-tree structure is an adaptation of the B+-tree for spatial data; it is a height-balanced data structure with internal and leaf nodes [93]. Internal nodes consist of index entries of the form <n-dimensional box, pointer to child node>. Leaf nodes hold data entries. A data entry is a pair <n-dimensional box, rid>, where rid identifies an object and the box is the smallest box containing the object, which can be represented as a point or a region. The n-dimensional box of an internal or leaf node is called a Minimum Bounding Rectangle (MBR) or Minimum Bounding Box (MBB).
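The node layout just described can be sketched minimally as follows. This is an illustrative model, not the thesis code: the class names are assumptions, and the tiny tree is shaped loosely after the R8/R9-under-R3 portion of Figure 4.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "MBR") -> bool:
        return not (other.xmin > self.xmax or other.xmax < self.xmin or
                    other.ymin > self.ymax or other.ymax < self.ymin)

@dataclass
class Node:
    mbr: MBR
    children: List["Node"] = field(default_factory=list)  # internal node entries
    rid: Optional[str] = None                             # leaf entry: record id

def search(node: Node, query: MBR) -> list:
    """Probe from the root down to the leaf entries, as in the scenario above."""
    if not node.mbr.intersects(query):
        return []
    if node.rid is not None:          # leaf entry: <MBR, rid>
        return [node.rid]
    hits = []
    for child in node.children:
        hits += search(child, query)
    return hits

# a tiny two-level tree: leaf entries R8 and R9 under internal node R3
r8 = Node(MBR(0, 0, 1, 1), rid="R8")
r9 = Node(MBR(2, 2, 3, 3), rid="R9")
r3 = Node(MBR(0, 0, 3, 3), children=[r8, r9])
root = Node(MBR(0, 0, 10, 10), children=[r3])
print(search(root, MBR(0, 0, 1.5, 1.5)))  # → ['R8']
```

Internal entries carry only an MBR and child pointers; leaf entries pair an MBR with a record id, matching the <box, rid> form above.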
Figure 4.2 shows two-dimensional regions and the corresponding R-tree index structure. Figure 4.2a shows the geometric locations of objects presented as two-dimensional coordinates. Figure 4.2b shows the R-tree index structure for those coordinates. In the figure, there are 12 data object regions (shaded boxes), denoted (R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19). These regions appear as leaf entries of the R-tree index structure, as shown in Figure 4.2b. Regions at the upper levels of the R-tree represent bounding boxes of internal nodes. The middle level of the R-tree index structure consists of internal nodes, shown as white boxes; there are five of them, namely (R3, R4, R5, R6, R7). The top level of the R-tree index is the root node, which has two entries: (R1, R2).
(a) 2D coordinates
(b) R-tree
Figure 4.2: R-tree and 2D coordinates [93]
Bounding rectangles of two or more nodes may overlap each other; for example, the bounding rectangles R1 and R2 overlap. This implies that more than one leaf node could hold a given data object while satisfying all bounding rectangle boundaries [93]. However, every data object is stored in exactly one leaf node, even if its bounding rectangle overlaps the regions of two or more higher-level nodes. Consider data objects R8 and R9: they lie within both regions R3 and R4, yet each is stored in only one of R3 or R4.
4.3 Local Index
This section discusses the Local Index (LI) mechanism for mobile query processing, used to retrieve data items from multiple cells. Indexing is an efficient way to retrieve data items across multiple cells: it improves retrieval time by supplying the information a client needs to retrieve remote data items from the current cell. The R-tree index structure is used to store the multi-dimensional data item indexes.
The LI mechanism is a tree indexing mechanism in which the index structure of one cell contains indexes from other cells. In this mechanism, the index structures of other cells are not replicated wholesale; instead, only the requested data item indices are stored in the local cell. In other words, the tree index structure is expanded by adding the new remote data item indices to the local index structure. However, when the maximum number of nodes has been reached, a number of nodes holding remote indexes must be deleted from the tree to make room for the new remote indices: the number of eliminated entries plus the available slots must equal the number of new remote indexes. The insertion or deletion operation is similar to the insertion or deletion of an index in a single cell. For simplicity, the geometric locations of data items are described in two-dimensional coordinates. In the LI mechanism, each cell has its own R-tree index structure for its local data items, and the R-tree structure of each cell differs from the others.
Figure 4.3: Three index structures in three cells
In a simple scenario, the current server sends a query scope to two neighbouring cells to find Automatic Teller Machines (ATMs). Figure 4.3 depicts the initial index tree of each cell, where the ATM location is used as the index partitioning attribute, matching the table partitioning. The range partitioning rules are assumed to be as follows: cell 1 holds index locations 1 to 30, cell 2 holds index locations 31 to 60, and the rest go to cell 3. Each key in a local index corresponds to a local record. Note that although the internal nodes differ among these cells, they follow the same naming convention.
The indexing structure above is developed from the tables presented in Figure 4.4, which consist of the index number, location and name of each object.
Figure 4.4: Tables for cell 1, cell 2 and cell 3 (from left to right)
Upon receiving query results, there are two ways for the current server to store the remote data items in the LI mechanism: the server can store the remote indexes only, or the indexes together with the original data items. The details of both processes are discussed in Sections 4.3.1 and 4.3.2. For simplicity, the LI mechanism with remote indexes only is called LI-1, whereas the LI mechanism with remote indexes and data items is called LI-2.
Algorithm 4.1 shows the details of the Local Index algorithm, which can be explained as follows:
(i) A mobile client sends a query scope which involves query result retrieval from
multiple cells. The server receives the query and probes keys in the local index
structure to determine whether there are any keys located within the query
range.
(ii) If the server can satisfy only part of the query scope, the remaining query scope is sent to the neighbouring cells. Upon receiving the result, the server receives either the indexes only, or both the indexes and data values of the requested data items. The local server stores all incoming indices in available nodes of the local index structure. If the server receives only the indices of data items, pointers are created from those nodes to the data values in the neighbouring cells, linking the cached index to the original data values. In the second situation, pointers are created from the nodes to local storage.
(iii) If the server finds all indices within its index structure, the server retrieves
data values by following the available pointers in those nodes.
(iv) The query result is sent to the mobile user once it is ready.
Algorithm 4.1: The Local Index algorithm
Input: QScope
begin
    Rtree ← indexes of objects stored in an R-tree index
    Query_scope ← QScope
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← search_rtree(Query_scope)
    index ← 0
    while index < NoOfBSOnline do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if intersection(Query_scope, Current_Neighbour_BS scope) exists then
            Result_neigh ← Current_Neighbour_BS(Query_scope)
            Rtree ← update_Rtree(Result_neigh)
            // Append results retrieved from the current BS
            // to the collection of all retrieved results
            Result ← Result + Result_neigh
        end
        index ← index + 1
    end
end
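The Local Index flow can be rendered as a small Python sketch. The tree and neighbour interfaces here are stand-ins invented for the example (the thesis does not specify them); the point is the control flow: answer locally, forward to intersecting neighbours, and cache the returned remote indices so a repeat query stays local.

```python
class FakeRtree:
    """Stand-in for the cell's R-tree; the query scope is modelled as a key set."""
    def __init__(self, keys):
        self.keys = set(keys)
    def search(self, scope):
        return sorted(self.keys & scope)
    def update(self, remote_keys):
        self.keys |= set(remote_keys)

class FakeNeighbourBS:
    """Stand-in for an online neighbour base station."""
    def __init__(self, keys):
        self.keys = set(keys)
    def area_intersects(self, scope):
        return bool(self.keys & scope)
    def process(self, scope):
        return sorted(self.keys & scope)

def local_index_query(local_tree, query_scope, online_neighbours):
    """Answer locally, forward the scope to intersecting neighbours, and
    cache the remote indices so a future query for the same area stays local."""
    result = local_tree.search(query_scope)
    for bs in online_neighbours:
        if bs.area_intersects(query_scope):
            remote = bs.process(query_scope)
            local_tree.update(remote)   # cache the remote indices locally
            result += remote
    return result

tree = FakeRtree({"k1"})
neighbour = FakeNeighbourBS({"k2", "k3"})
print(local_index_query(tree, {"k1", "k2"}, [neighbour]))  # → ['k1', 'k2']
print(tree.search({"k2"}))  # the cached remote index now answers locally: ['k2']
```

After the first multi-cell query, `k2` lives in the local tree, so the second lookup needs no forwarding at all.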
4.3.1 Cache Remote Indexes Only
This section discusses the proposed LI mechanism when the data and index segments are located in different cells. The aim is to speed up query processing by searching for commonly requested objects locally and by maintaining the cache.
Consider the indexing structures and data tables shown in Figures 4.3 and 4.4 respectively. Assume that the server in cell 2 sends a query scope to cell 1 and receives an item with index 29 from cell 1. The server adds index 29 and creates a pointer to the original data item, which resides in cell 1. Figure 4.5 shows the indexing structure after index 29 is inserted into cell 2. Note that there is a pointer from index 29 to data item 29 in cell 1.
Figure 4.5: Index structure after the records insertion using local index-1
This mechanism has tree management similar to that of a single cell under R-tree management [41]. The procedures for insertion and deletion can be summarised as follows. After a new index has been received by the appropriate cell, the new index is appended to the existing index structure. Algorithm 4.2 shows the insertion algorithm of LI-1, whose steps are as follows.
(i) If the maximum number of entries has been reached, remove an existing index that did not originate from this cell from the tree. The removal procedure is discussed later.
(ii) Find a suitable leaf node in which to insert the new key.
(iii) Insert the new key if there is enough space in the leaf node.
(iv) Otherwise, this leaf node must be split into two leaf nodes and propagate the
splitting up to the root node if needed. This splitting process can be done by
using one of the existing splitting algorithms for the original R-tree.
(v) The last step is to create a data pointer from the entry on the leaf node where
the new index key is inserted to the data item at the remote cell.
Algorithm 4.2: The insertion algorithm of Local Index-1
Input: indexes
begin
    MAX_CAPACITY ← maximum number of nodes this Rtree can store
    Rtree ← indexes of objects in the R-tree
    rtree_capacity ← current capacity of Rtree
    num_nodes_freed ← 0
    size_of_indexes ← get_size(indexes)
    if MAX_CAPACITY - rtree_capacity < size_of_indexes then
        num_nodes_freed ← size_of_indexes - (MAX_CAPACITY - rtree_capacity)
        Rtree ← remove_nodes(num_nodes_freed)
    end
    for each index in indexes do
        Inserted_node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted_node to the original data of index
    end
end
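The capacity arithmetic in the insertion steps above can be sketched as follows. This is an assumed illustration: the class, a FIFO eviction order standing in for "one of the existing cache replacement policies", and the key names are all inventions for the example, and only indexes that did not originate locally are eligible victims.

```python
from collections import OrderedDict

class LocalIndex1:
    """Toy model of LI-1 capacity management (not an actual R-tree)."""
    def __init__(self, max_capacity, local_keys):
        self.max_capacity = max_capacity
        self.local = set(local_keys)                     # keys originating here
        self.entries = OrderedDict((k, None) for k in local_keys)

    def insert_remote(self, keys, origin_cell):
        # free exactly as many cached (non-local) slots as needed:
        # evicted entries + available slots == number of new remote indexes
        needed = len(keys) - (self.max_capacity - len(self.entries))
        victims = [k for k in self.entries if k not in self.local][:max(0, needed)]
        for v in victims:
            del self.entries[v]                          # drop pointer and key
        for k in keys:
            self.entries[k] = origin_cell                # pointer to the origin cell

idx = LocalIndex1(max_capacity=4, local_keys=["L1", "L2"])
idx.insert_remote(["R29"], origin_cell="cell 1")
idx.insert_remote(["R30", "R31"], origin_cell="cell 1")  # evicts R29 to make room
print(list(idx.entries))  # → ['L1', 'L2', 'R30', 'R31']
```

The second insertion needs two slots but only one is free, so exactly one cached remote entry (R29) is evicted, leaving the tree at its capacity of four.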
When the maximum number of entries has been reached, some entries need to be removed from the cache. The process for evicting entries is shown in Algorithm 4.3 and is described as follows:
(i) Select a victim key using one of the existing cache replacement policies.
(ii) Remove the data pointer from the selected key, then discard the key from the cache.
(iii) When the key is removed from a leaf node, the node may underflow. If this occurs, try to find a sibling node that needs the least enlargement and redistribute the entries between the node and its sibling so that both are at least half full; otherwise, merge the node into its sibling and decrease the number of nodes.
Algorithm 4.3: The deletion algorithm of Local Index-1
Input: num_nodes_freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num_nodes_freed do
        index ← select_victim()
        Deleted_node ← find the node that matches index
        remove_pointer(Deleted_node)
        Rtree ← remove_node(Deleted_node)
    end
end
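The underflow handling in step (iii) above can be sketched as follows. The minimum-fill number and the list-based leaves are assumptions made for the example; real R-tree implementations track occupancy per node and pick the sibling needing the least enlargement.

```python
MIN_FILL = 2  # assumed minimum entries per leaf (half of a capacity of 4)

def rebalance(leaf, sibling):
    """After a removal, either borrow entries from the sibling so both
    leaves are at least half full, or merge the leaf into the sibling."""
    if len(leaf) >= MIN_FILL:
        return leaf, sibling                  # no underflow: nothing to do
    if len(leaf) + len(sibling) >= 2 * MIN_FILL:
        while len(leaf) < MIN_FILL:           # borrow until half full
            leaf.append(sibling.pop())
        return leaf, sibling
    return leaf + sibling, None               # merge: node count decreases

print(rebalance(["k1"], ["k2", "k3", "k4"]))  # borrow: both leaves half full
print(rebalance(["k1"], ["k2"]))              # merge: sibling disappears
```

Borrowing keeps both leaves valid; merging removes a node, which may in turn propagate an adjustment up the tree.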
4.3.2 Cache Remote Indexes and Data Items
Caching only the remote indexes increases data transfer to the remote cell, which can lead to a bandwidth bottleneck, even though bandwidth is much greater nowadays. To avoid this bottleneck, one solution is to cache both the commonly requested remote indexes and their data items in the desired cell. This is the focus of our discussion in this section.
The LI process is similar to that described in the previous section, except that this time the actual data item is cached as well. To simplify the discussion, we use the same illustration as in the previous section. There, the data item was not copied into cell 2; instead, a data pointer was created pointing to the data item located in cell 1. Figure 4.6 illustrates caching of the index and its record: the data item from cell 1 is copied to cell 2, and the data pointer points to the local data table instead of the remote one.
Figure 4.6: Index structure after the records insertion using local index-2
The insertion and deletion procedures are summarised as follows. Algorithm 4.4 presents the details of the LI-2 insertion process. It is similar to the LI-1 insertion, differing only in how the pointer to the data item is created, which affects the last two steps:
(i) Store the remote data items in the requester cell.
(ii) Create data pointers from the entries on the leaf nodes, where the new index keys are inserted, to the data items at the requester cell.
Algorithm 4.4: The insertion algorithm of Local Index-2
Input: indexes, data_items
begin
    MAX_CAPACITY ← maximum number of nodes this Rtree can store
    Rtree ← indexes of objects in the R-tree
    rtree_capacity ← current capacity of Rtree
    num_nodes_freed ← 0
    size_of_indexes ← get_size(indexes)
    if MAX_CAPACITY - rtree_capacity < size_of_indexes then
        num_nodes_freed ← size_of_indexes - (MAX_CAPACITY - rtree_capacity)
        Rtree ← remove_nodes(num_nodes_freed)
    end
    for each index in indexes do
        data_storage[index] ← store_data(data_item[index])
        Inserted_node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted_node to data_storage[index]
    end
end
Algorithm 4.5 shows the deletion algorithm of LI-2. The deletion process of LI-2 is similar to that of LI-1, except that LI-2 has an additional step to remove the replicated data items. It proceeds as follows:
(i) Find a node to be deleted.
(ii) Remove the data item pointed to by the data pointer.
(iii) Remove the data pointer.
(iv) Remove the node from R-tree and adjust the R-tree if necessary.
Algorithm 4.5: The deletion algorithm of Local Index-2
Input: num_nodes_freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num_nodes_freed do
        index ← select_victim()
        Deleted_node ← find the node that matches index
        remove_dataItem(Deleted_node)
        remove_pointer(Deleted_node)
        Rtree ← remove_node(Deleted_node)
    end
end
4.4 Global Index
While a server requests data items from neighbouring cells on behalf of mobile clients, it performs several activities before sending the query result back to the client. These activities involve waiting for the query result to be received and caching the new data items. The caching process includes inserting the indexes into the local tree index structure and adjusting its nodes after insertion. These processes slow down query processing; however, this limitation can be handled by the Global Indexing (GI) mechanism.
Unlike the LI mechanism, in the GI mechanism the index structure is built while the server in a cell starts up. The GI mechanism also has some degree of replication, and all indices are maintained globally. In other words, each cell holds a different part of the global index, while the overall global index structure is preserved.
In this mechanism, the ownership of each index node must be maintained in order to preserve the global index structure. The ownership rule is that the cell owning a leaf node also owns all nodes on the path from the root to that leaf. Consequently, the root node is replicated to all cells, and internal nodes (other than the root) may be replicated to some cells. Furthermore, if a leaf node has
some keys belonging to different cells, the leaf node is replicated to each cell that owns one of its keys.
As a running example, let us consider three different cells and 100 data items.
Each cell holds 30 indices of point data items. Cell 1 holds data item indices from
1-10. Cell 2 holds data item indices 11-20. The rest of the data item indices go to cell 3.
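The range-partitioned ownership in the running example can be sketched as a small helper. The ranges here are taken from the example above; the function name and its default-argument encoding are illustrative assumptions.

```python
def owner_cell(index, ranges=((1, 10, "cell 1"), (11, 20, "cell 2"))):
    """Return the cell owning a data item index under range partitioning:
    1-10 -> cell 1, 11-20 -> cell 2, everything else -> cell 3."""
    for lo, hi, cell in ranges:
        if lo <= index <= hi:
            return cell
    return "cell 3"  # the rest of the indices

print(owner_cell(5), owner_cell(12), owner_cell(77))  # → cell 1 cell 2 cell 3
```

A leaf whose keys map to different owner cells under this rule is the replicated case discussed next.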
Figure 4.7: Global Index for all cells using GI mechanism.
Figure 4.7 shows a GI (global index) structure partitioned by cell boundaries. The root node is replicated to all three cells, and some nodes are replicated to neighbouring cells. The keys P10-P12 (the fifth leaf node) are copied to cells 1 and 2, because this node holds entries that belong to two cells: key P10 belongs to cell 1, while keys P11 and P12 belong to cell 2. Because some leaf nodes are replicated, some of the internal nodes are replicated while others are not. For example, the non-leaf node R2 is replicated to cells 1 and 2, whereas the non-leaf node R5 is not replicated. Each leaf node has a data pointer that points to a data item located in either the same cell or a different cell.
Similar to the LI mechanism, the degree of replication can cover either the indexes
only, or both the indexes and the data items. The GI mechanism with indexes only is
called GI-1, whereas the GI mechanism with indexes and data items is called GI-2.
4.4.1 Remote Data Items Located at Different Cell
Our discussion here elaborates on the GI mechanism when the data items located
at the remote cell are not replicated to other neighbouring cells. The maintenance of
the global index structure and the query processing are the two main topics.
Algorithm 4.6 maintains the global index structure when the data items themselves
are not replicated. The algorithm covers the insertion and deletion of entries in
neighbouring cells. However, the details of the R-tree splitting procedures
are not discussed here; these can be found in [41].
The algorithm matches a node with a given key and performs the insertion or
deletion. Its details are as follows. The key here is a Minimum Bounding Box (MBB)
value. The algorithm recursively probes tree nodes, starting from the root node down
to a leaf node. The key insertion or deletion is performed once the matching node has
been found. Then, depending on the operation, a data pointer between the entry and
the actual data item is established or removed. If the key is not found in the current
cell, a child tree (cellTree) is passed to a neighbouring cell so that it can probe its
own tree. When a node overflows or underflows after a key has been inserted or
deleted, the existing single-cell R-tree splitting or merging algorithm is applied,
and the starting points of the data pointers are adjusted if an entry is moved to a new node.
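The control flow of this maintenance procedure can be sketched as follows. The sketch is a simplification under stated assumptions: keys are one-dimensional ranges standing in for MBBs, the `Node` class and neighbour hand-off are hypothetical, and R-tree splitting/merging is omitted, as in the text.

```python
# Sketch of the recursive node-maintenance flow. A one-dimensional key range
# stands in for an MBB; split/merge handling is omitted as in the text.

class Node:
    def __init__(self, lo, hi, children=None):
        self.lo, self.hi = lo, hi       # key range covered (stand-in for an MBB)
        self.children = children or []  # empty list => leaf node
        self.keys = []                  # keys stored at a leaf

def node_maintenance(node, key, operation, neighbour_trees=()):
    if not (node.lo <= key <= node.hi):
        # Key outside this cell's tree: hand the request to a neighbouring cell.
        for cell_tree in neighbour_trees:
            if cell_tree.lo <= key <= cell_tree.hi:
                return node_maintenance(cell_tree, key, operation)
        return False
    if not node.children:               # leaf: perform the insert/delete here
        if operation == "insert" and key not in node.keys:
            node.keys.append(key)       # a data pointer would be created here
        elif operation == "delete" and key in node.keys:
            node.keys.remove(key)       # and the data pointer removed here
        return True
    for child in node.children:         # internal node: recurse into the child
        if child.lo <= key <= child.hi:
            return node_maintenance(child, key, operation)
    return False
```

A small usage example: inserting key 15 into a cell covering keys 1-10 forwards the request to a neighbour cell covering 11-20.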
In this mechanism, the data structure for the GI mechanism can be explained
as follows.
If a child node exists locally, the node pointer points to this local copy only, even
though the child node may also be replicated to other cells. For example, from MBB R1 at cell
Algorithm 4.6: Node maintenance of GI-1 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← assign child tree in neighbour cells
        Node Maintenance(cellTree, Key, Operation)
    else
        if Node is a leaf node then
            Execute the insert/delete operation on the local node
            Create/remove a data pointer from the entry to the actual data item
            if Node overflows or underflows then
                Execute split/merge on the leaf node
                Adjust all starting points of data pointers in the leaf node
            end
        else
            childTree ← assign child tree
            Node Maintenance(childTree, Key, Operation)
        end
    end
end
1, there is only a node pointer to the local MBB R2. The MBB R2 at cell 2 will not
accept an incoming node pointer from the MBB R1 at cell 1; it accepts one node
pointer from the local MBB R1 only.
If a child node does not exist locally, the node pointer selects the closest copy of
the child node (in case multiple copies exist elsewhere). For example, from the MBB
R1 at cell 1, there is only one outgoing right node pointer to the child node (R3, R4)
at cell 2. In this case, an assumption is made that cell 2, rather than cell 3, is the
closest neighbour of cell 1. Hence, the MBBs R3 and R4, which also exist at
cell 3, will not accept a node pointer from the root node R1 at cell 1.
Using the single node pointer model discussed above, it is always possible to
trace a node from any copy of its parent node in a different cell. Figure 4.8 shows the
single node pointer model for the GI mechanism, presenting only the top three levels
of the index tree exhibited previously in Figure 4.7. From the figure, it is possible
to trace nodes (R10, R11) from the root node R1 at cell 1, although there is no
direct connection from root node R1 to its direct child node (R3, R4) at cell 3. This
tracing to node (R10, R11) can also be done through node (R3, R4) at cell 2.
Figure 4.8: GI mechanism uses single node pointers.
A single node pointer model can be more formally described as follows.
1. When a parent node is duplicated because its child nodes are spread over mul-
tiple places, there is always a direct connection from every copy of this
parent node to each of its child nodes.

2. Applying the same reasoning one level up, given a replicated grand-
parent node, there is always a direct connection from every copy of this
grandparent node to any copy of the parent node.

Considering both the above statements, we can conclude that there is always a
path from any copy of the grandparent node to any of its child
nodes.
Apart from the issues of node pointers at internal nodes, those at leaf nodes are
also worth discussing. As some leaf nodes are replicated, it is important to manage the
data pointers at leaf nodes. Figure 4.9 shows a data structure where the data items
are not replicated anywhere. In this figure, not all data pointers are shown, in
order to improve readability. As shown in the figure, the leaf node that
contains indexes P10-P12 is replicated at cells 1 and 2. By applying the single node
pointer mechanism, each data item accepts two data pointers. For example, the
record for entry P10 accepts two incoming data pointers, one from cell 1 and one
from cell 2. Similarly, the records for entries P11 and P12 each receive two incoming
data pointers from cells 1 and 2.
This mechanism has a similar concept to LI-1; that is, the leaf node is replicated
and the record it points to is not. The main difference between GI-1 and LI-1
is that the GI schemes maintain one global index, whereas the LI schemes use a
local index per cell.
Figure 4.9: Global Index without replicated remote data items.
4.4.2 Remote Indexes and Data Items Located at Same Cell
In this mechanism, the data items are replicated to any cell to which the entries at
the leaf node level are replicated. In other words, GI-2 follows the same idea as GI-1
for non-leaf node replication. The two approaches differ in the way they establish
data pointers at the leaf node level: GI-2 has an extra step to replicate the
remote data items.
In this mechanism, the data structure for the internal nodes is similar to GI-1,
except that a data pointer at a leaf node points to the record located in the same cell.
This data pointer can be explained as follows:
If a leaf node exists locally only, its data item is not replicated, and it is linked with
the entry in the leaf node by a data pointer. Figure 4.10 illustrates the GI mechanism
with replicated data items. In this figure, not all data items and data pointers are
shown, for clarity. For example, the entries P4 and P5 at cell 1 each have a single
data pointer to their data items, and these data items are not replicated to cells 2
and 3.
Figure 4.10: GI mechanism where data items are replicated.
If a leaf node is replicated, the data items belonging to the entries in that leaf
node are replicated to the cells where the leaf node is duplicated. Once the data items
have been replicated, data pointers are created to link the entries in the leaf node to
the appropriate data items at each cell. For example, leaf node (P10, P11, P12)
is replicated at cells 1 and 2, and the data items of those replicated entries are
duplicated accordingly. For instance, the record for entry P10 is replicated from cell 1
to cell 2. The data pointer for the entry P10 at cell 1 points to record P10 at cell 1 (dotted
line), whereas the data pointer for the same entry at cell 2 points to record P10 at
cell 2 (solid line). Similarly, the record for entry P11 is replicated from cell 2 to cell 1.
One data pointer is established between the entry and the record for P11 at cell 1,
and another is created to link the entry and the record for P11 at cell 2.
Algorithm 4.7 maintains index nodes in the global index when the remote
data items are replicated. The GI model with replicated data items can be described
more formally as follows:
1. If a leaf node is replicated to another cell, there is always a copy of the data item
for each entry in the leaf node. In addition, a direct connection from an entry
to a data item in the same cell always exists.

2. When a leaf node is not replicated to another cell, the original data items for
its entries are always present, and there is a single direct connection
from each entry of the leaf node to its data item.

3. The number of direct connections between a leaf node and data items is always
equal to the number of entries in that node.
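The third property can be checked as a simple invariant; the leaf and pointer representations below are hypothetical stand-ins.

```python
# Sketch: in GI-2, every copy of a leaf links each of its entries to a data
# item in the same cell, so data pointers per leaf copy == entries per leaf.

def pointers_consistent(leaf_entries, data_pointers):
    """Each entry must have exactly one local data pointer."""
    return (len(data_pointers) == len(leaf_entries)
            and all(e in data_pointers for e in leaf_entries))

# Leaf (P10, P11, P12) replicated at cells 1 and 2: each copy carries its
# own local pointers (values name the hypothetical local record).
leaf = ["P10", "P11", "P12"]
cell1_pointers = {"P10": "rec@cell1", "P11": "rec@cell1", "P12": "rec@cell1"}
cell2_pointers = {"P10": "rec@cell2", "P11": "rec@cell2", "P12": "rec@cell2"}

print(pointers_consistent(leaf, cell1_pointers))  # True
print(pointers_consistent(leaf, cell2_pointers))  # True
```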
4.5 Case Study
This section describes case studies using the proposed approaches, one for each
proposed indexing mechanism. To simplify the explanation, we reuse the indexing
structure in Figure 4.3 on page 130. A Local Index case study is presented first,
followed by a Global Index one.
Let us suppose that the proposed Local Index mechanism is applied in the first
case study. Assume that a mobile user requests data items from cell 2, and the query
Algorithm 4.7: Node Maintenance of GI-2 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← assign child tree in neighbour cells
        Node Maintenance(cellTree, Key, Operation)
    else
        if Node is a leaf node then
            Execute the insert/delete operation on the local node
            Replicate the remote data item
            Create/remove a data pointer from the entry to the actual data item
            if Node overflows or underflows then
                Execute split/merge on the leaf node
                Adjust all starting points of data pointers in the leaf node
            end
        else
            childTree ← assign child tree
            Node Maintenance(childTree, Key, Operation)
        end
    end
end
asks for data items in cells 2 and 3. Cell 2 processes the request by probing its index
structure and sends the remaining query scope to cell 3. Assume that cell 2 receives
index 61 from cell 3. Cell 2 then adds the index 61 to its index structure and creates
a pointer from node 61 to the original data items in cell 3.
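The steps above can be sketched as follows; the index values mirror the case study, while the dictionary representation and the `insert_remote_index` helper are hypothetical.

```python
# Sketch of the LI case study: cell 2 receives remote index 61 from cell 3,
# adds it to its local index, and records a pointer back to the original data.

local_index = {42: "local-data", 55: "local-data"}  # hypothetical existing entries

def insert_remote_index(index, key, source_cell, replicate_data=False, data=None):
    if replicate_data:
        # LI-2 style: the data item is copied locally; the pointer stays local.
        index[key] = ("local-copy", data)
    else:
        # LI-1 style: the pointer crosses the cell boundary to the original item.
        index[key] = ("remote", source_cell)

insert_remote_index(local_index, 61, source_cell=3)
print(local_index[61])  # ('remote', 3)
```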
Figure 4.11: Indexing structure at cell 2 after the remote index insertion
Figure 4.11 shows the situation after index 61 has been inserted at cell 2. The
pointer from index node 61 points directly to the actual data items in cell 3 if
the data item is not replicated. On the other hand, if there is a degree of replication,
the data item of index 61 is copied to cell 2 and the pointer links index
node 61 to the replicated data item.
In contrast, when the conventional approach is used, the tree traversal
starts from the root node of both tree index structures. This slows down
query processing even when the requested remote indexes are available in local storage.
In the second case study, the proposed Global Index mechanism is used to process
a multi-cell mobile query. To simplify the scenario, the indexing
structure shown in Figure 4.7 on page 139 is used, and it is assumed to have been initialised.
Figure 4.12: Global Index mechanisms case study
The scenario for this case study is similar to the previous one; however, the
Global Index is used and the database contains different records. Assume that
index items 21 and 22 are the requested data items. When these data items arrive
at cell 2, they can be stored in cell 2's local storage. If they are, a pointer is created
from each of the index nodes 21 and 22 to the actual data in the local
cell (which is cell 2).
4.6 Discussion
This chapter presents two proposed mechanisms that use the existing indexing
structure to process multi-cell queries. The mechanisms are designed to minimise
retrieval time, reduce data transfer and optimise query access time when retrieving
the desired data items from multiple cells.
Two indexing mechanisms have been introduced, namely: Local Indexing (LI)
and Global Indexing (GI). These two indexing mechanisms are based on the R-tree
structure and are designed to improve the performance of multi-cell query processing.
The complexity of the LI mechanism stems from the fact that either the data pointer
crosses the cell boundary or the data items are replicated. In the first case, the data
pointer to the appropriate data item can cross several cell boundaries. In the second
case, the data items are duplicated wherever the node is duplicated. The two LI
variants perform slightly differently: LI-2 has lower access time than LI-1, because
LI-2 retrieves the data item locally.
The GI and LI mechanisms have different index structures. In the LI, each
cell has its own index structure, whereas the GI has a global index structure of which
each cell holds a different part. In other words, the GI keeps the index structure
global across all cells, while the LI keeps it localised at each cell. In principle, all
indexes from every cell could be stored in the index structure at each cell; however,
this is prevented by limiting the number of nodes in the tree index structure at each cell.
The main difference between the GI and LI mechanisms relates to index
restructuring involving single or multiple cells. This difference can be further
specified as follows:
• Unlike the LI mechanism, where only single cells are involved in maintaining
the index structure, the GI mechanism demands the involvement of multiple
cells.
• The indexing structure is constructed while a BS is starting up in the GI
mechanism, and is expanded by inserting indexes from other cells. The LI
mechanism expands its index structure when new query results arrive.
• The GI mechanism has no limit on the growth of its global index
structure. The mechanism shrinks its index structure when a data item is
deleted from the cell or another BS goes offline. In the LI mechanism, there is
a limit on the growth of the index structure; thus, the index structure often
shrinks because of this limitation.
Based on the characteristics of both indexing mechanisms, the LI mechanism
is beneficial when the remote indexes in the local index structure are frequently
requested, as this minimises the maintenance of the index structure; the GI
mechanism is more efficient when queries often request different areas, owing to
its large collection of indexes.
In terms of the number of cells to be cached, the LI mechanism caches the indexes
of remote objects from the surrounding neighbouring cells, whereas the GI mechanism
can hold the indexes of remote objects from all online BSs. However, if any BS goes
offline, the performance of GI degrades because of the required index maintenance.
4.7 Conclusion
This chapter has presented two proposed indexing mechanisms for the server side.
These proposed approaches are called the Local Indexing (LI) and Global Indexing (GI)
mechanisms. The aim of the proposed approaches is to speed up the query processing
and to reduce the amount of data transfer.
The LI mechanism retrieves the indexes from other cells and stores them in the
cell where the mobile user issues the query. The remote indexes retrieved are the
indexes of the data items in the query result. In other words, the index structure in
the current cell is expanded and maintained locally by this mechanism.
The GI mechanism is similar to LI; however, the two differ in how the indexing
structure is maintained. With the GI mechanism, the overall index structure is
kept, and parts of the global index are distributed across all cells.
In addition, each mechanism has two different ways of accessing the data item
behind each remote index in the local index: remotely or locally. In the former, the
data item is not replicated anywhere; hence, a data pointer is created from an
index entry in a leaf node to its remote data item. In the latter, the data items are
replicated from the original cell to wherever the indexes are replicated.
Chapter 5
Client Caching for a Mobile
Environment
This chapter presents our proposed approaches for eliminating cached objects at the
client. In our approaches, cached objects are ordered and put together into groups,
and a group of cached objects is then eliminated based on either the distance, the
density or the value of a cost formula evaluated over all groups. The process keeps
evicting groups one by one until all new incoming objects can be cached. The aim
of our proposed approaches is to boost the cache hit rate.
5.1 Introduction
A mobile computing environment has limited resources, such as narrow band-
width, small storage space, limited battery power and frequent disconnections. These
limitations are inconvenient for the processing of location-dependent queries. In addition,
user mobility adds another complication to location-dependent query pro-
cessing.
CHAPTER 5. CLIENT CACHING FOR A MOBILE ENVIRONMENT 154
Client caching is a traditional way to store data from servers at the client
side. This mechanism has been applied in many ways in wired networks, for example
in distributed systems, the World Wide Web and database systems. Recently, client
caching has been adapted to handle the limitations imposed by a mobile environment.
Several approaches to maintaining objects in a client cache have been proposed,
although some problems remain outstanding.
In this chapter, we propose three approaches for client cache management in
a mobile environment. Our proposed approaches sort cached objects into groups,
in a manner similar to [95]. Density and distance are the factors considered when
grouping cached objects. In addition, our replacement policy removes a group of
objects instead of a single object.
Our proposed approaches are called the (i) Path-based, (ii) Density-based and (iii)
Probability Density Area Inverse Distance (PDAID) replacement policies. The Path-
based replacement policy is similar to Furthest Away Replacement (FAR) [94], which
eliminates the objects furthest from the mobile client's location. The Path-based re-
placement policy eliminates a group of objects by considering the distances of all
groups, measured to the location predicted to follow the one at which the user
receives the query result. The Density-based replacement policy, on the other hand,
evicts cached objects based on the density of a group: a group with lower density has
priority to be dropped from the cache. The last replacement policy is PDAID, which
eliminates a group of objects based on a cost formed from three factors, namely
access probability, density and data distance. The group most suitable for eviction
has a low access probability, low density and a large data distance; hence, the group
with the smallest cost value is the first to be evicted.
Our aims in the proposed approach are to reduce the transfer cost and to maintain
user satisfaction. Given the limited bandwidth and frequent disconnections, the
transfer cost can be reduced by retrieving the query result from the cache and/or
by retrieving only the necessary data items from the server. Retrieving only the
necessary data items from the server can be achieved by asking how many records
are enough to satisfy the user. In other words, we concentrate on the process of
storing the incoming objects into the cache, and on rearranging the existing cached
objects when there is no space available in the cache for the new incoming objects.
Figure 5.1: Chapter 5 framework
Figure 5.1 shows a framework of this chapter. The rest of this chapter is organised
as follows. An overview of the general client caching process is given in Section
5.2. Section 5.3 elaborates on the details of our proposed approaches: the Density-,
Distance- and Formula-based cache replacement policies.
Case studies and discussion sections are given in Sections 5.4 and 5.5 respectively.
The last section concludes this chapter.
5.2 Client Caching Overview
This section gives an overview of the general client caching mechanism in a mobile
environment. The discussion starts with the general process, then covers the retrieval,
grouping and elimination procedures. The aim of this section is to provide a foundation for our
proposed approach.
When a client sends a query to a server, the answer to this query is verified
against its local cache. If the answer is found in the cache, then it is returned to
the user directly. Otherwise, the query is sent to the server. A set of results is
generated by the server. Upon receiving incoming query results, the available space
in the cache is verified to indicate whether an elimination process needs to be done.
If the available space is enough to store all incoming results, then the incoming query
result is stored directly without eliminating cached objects. Otherwise, the classic
way is to discard some cached objects to free some slots in the cache. Alternatively,
we store only as many of these incoming objects in the cache as there are available
cache slots. However, we do not consider the last option that stores a partial of
query result into the cache, as a discussion point in this thesis.
For a location-dependent query, the attributes of a query are the current details
of the user, the range of the query and the minimum number of wanted objects. The
first attribute includes the speed, direction and location at the current time. The
query range defines how far the user wants to search. The minimum number of
wanted objects means that if at least this number of required objects has been found
in the cache, these objects are returned to the user. In other words, sending the
query to ask for more objects can be avoided if the user is satisfied with the current
answer. If the user needs a full answer, the client must ask the server for it. The
minimum number of wanted objects attribute is usually ignored in the general case.
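The role of the minimum number of wanted objects can be sketched as follows; the set-based model of the query scope and the `answer_query` helper are illustrative assumptions.

```python
# Sketch: the cache answers a query itself when at least K matching objects
# are found; otherwise the query must go to the server. The query scope is
# modelled as a simple set of object identifiers.

def answer_query(cache_objects, query_scope, k):
    """Return at least k cached matches, or None if the server is needed."""
    matches = sorted(obj for obj in cache_objects if obj in query_scope)
    if len(matches) >= k:
        return matches        # the user accepts a partial answer of size >= K
    return None               # too few objects: send the query to the server

cache = {"a", "b", "c"}
scope = {"a", "b", "d"}
print(answer_query(cache, scope, k=2))  # ['a', 'b']
print(answer_query(cache, scope, k=3))  # None
```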
Figure 5.2: Section 5.2 framework
Figure 5.2 shows the framework for this section. Section 5.2.1 presents a global
framework and briefly explains the client caching process. Section 5.2.2 gives an
overview of how query results are stored in a client cache. Predicting the next
movement location is discussed in Section 5.2.3. Overviews of cached object retrieval
and of updating the query history list are presented in Sections 5.2.4 and 5.2.5
respectively. Object grouping and cached object replacement are discussed in Sections
5.2.6 and 5.2.7 respectively.
5.2.1 Global Process
This section discusses about a whole process for client caching. The intention is to
present the big picture of the client caching process. The discussion begins with
receiving a query through to replacing cached objects.
When a client issues a query, the query is first sent to the cache on the client's mobile
device. The query contains a query scope, K, and the current position, direction and speed.
K is the number of objects the user expects to receive. In other words, a
user is satisfied if at least K objects of the query result are received, instead
of the full set of query results. If the user is unsatisfied with the objects in the query
result, the user needs to send the same query again. Normally, the user expects to
receive all results, and the value of K is then ignored in the query.
Once the cache receives these parameters, the query processing begins. The first
step is to predict the next location, at which the user will receive the query result;
this prediction process is discussed in Section 5.2.3. Once the cache knows
the location for which to retrieve the query result, it probes its collection to
match objects against the query scope. If the number of matching objects is greater
than or equal to K, the cache returns those objects immediately. On the other hand,
if the number of matching objects is less than K, or no match is found, the query is
sent to the cell in which the mobile client currently resides. Upon receiving
the query result, the available cache space is checked to ensure that the cache has
enough space to store the new incoming objects in the query result. If the cache
space is not sufficient, cached objects are removed (see Section 5.2.7). After that,
the new incoming objects are put into the cache and all cached objects are regrouped,
which results in new groups appearing or the membership of existing groups
changing; grouping is discussed in Section 5.2.6. Finally, the results are sent to the
client and the process finishes.
5.2.2 Storing Query Results to Cache
An algorithm for storing the incoming query results from a server into a client
cache is discussed in this section. Before the incoming query results are stored,
this algorithm does two pre-processes, namely filtering object duplication and total
objects verification.
Object duplication can occur if the server has no knowledge of the client cache
status. In other words, this situation arises when a client sends a complete query
scope without attaching any information about its available cached objects. There
are two ways to prevent object duplication, as follows:
• Client side filtering
When a complete query scope is sent to the server, the server sends a complete
result set to the client. The complete result set, which can contain objects
that already exist in the client cache, is sent because the server has no
information at all about the client cache. Hence, the client needs to filter out
the objects that already exist in the cache before the complete result set is
put into the cache.
• Server side filtering
This approach is more efficient than the first in terms of the query scope that is
sent. The cached objects are loaded from the cache and the area of the query scope
is examined. As a result of this examination, only the parts of the query scope that
do not cover any cached objects are sent to the server. Hence, the server
produces a query result that contains none of the objects already in the cache.

Alternatively, the filtering at the server side can be accomplished by
attaching information about all cached objects to the query. When processing the
query, the server matches its objects against the query scope and the attached
cached-object information. Objects that already appear in the attached infor-
mation are excluded from the query result.
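The second variant of server-side filtering can be sketched as follows; the object identifiers and the `server_answer` helper are illustrative assumptions.

```python
# Sketch: server-side filtering when the client attaches its cached-object
# information to the query. Objects already held by the client are excluded.

def server_answer(server_objects, query_scope, client_cached_ids=frozenset()):
    """Return matching objects, excluding those the client already caches."""
    return [obj for obj in sorted(server_objects)
            if obj in query_scope and obj not in client_cached_ids]

objects = {"o1", "o2", "o3", "o4"}
scope = {"o1", "o2", "o3"}
print(server_answer(objects, scope, client_cached_ids={"o2"}))  # ['o1', 'o3']
```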
After the filtering phase has been completed, the number of remaining objects in the
incoming query results is validated. The aim of this validation is to accept only those
query results whose number of objects is less than or equal to the client cache
space. If the cache has too few slots to accommodate the complete query result,
the query result is not stored; otherwise, it is saved in the cache.
Once the validation has been done, two situations can arise regarding the available
cache space while the query result is being stored. In the first situation, the number
of free slots in the cache is enough to store the whole query result, and the objects
can be stored in the cache directly. The location at which the mobile user receives
the query result is also stored along with the query result; storing this receiving
location avoids eliminating the wrong cached items or cache groups. In other words,
a query history list is created so that the next location can be predicted from past
locations.

In contrast, when the number of free slots is not enough to store all incoming objects,
some cached objects, together with the associated locations of the mobile user, are
removed from the cache. The incoming objects are then stored and the cached objects
are regrouped, which results in a new group being formed or new cached objects
being added to the existing groups.
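The eviction-until-fit behaviour described above can be sketched as follows; `evict_group` is a hypothetical stand-in for the group replacement policies proposed later in this chapter, and the incoming objects are assumed to be new (already filtered for duplicates).

```python
# Sketch of the storing step: incoming query results are stored only after
# enough whole groups have been evicted to make room for all of them.

def store_results(cache, capacity, incoming, evict_group):
    """Store `incoming` into `cache`, evicting groups until it fits."""
    if len(incoming) > capacity:
        return False                      # result cannot fit even in an empty cache
    while len(cache) + len(incoming) > capacity:
        for victim in evict_group(cache): # drop one whole group of cached objects
            cache.discard(victim)
    cache.update(incoming)
    return True

cache = {"p1", "p2", "p3", "p4"}
store_results(cache, capacity=5, incoming={"q1", "q2"},
              evict_group=lambda c: sorted(c)[:2])  # toy policy: first two objects
print(sorted(cache))  # ['p3', 'p4', 'q1', 'q2']
```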
5.2.3 Predicting Next Movement
There are three ways to predict the next location of the mobile user: the movement
patterns, prediction and both.
In the first way, all user movements are stored and the next movement is pre-
dicted from those movements. In a mobile environment, the user broadcasts its
location every time interval t. Remembering these locations may improve the
accuracy of cache item elimination if the user follows the same pattern; however,
keeping these locations may consume a lot of space.
In the second way, the next location is calculated from the current position,
direction and speed. These three factors alone are not enough to predict the next
location without any knowledge of when the query result is going to be received.
Once the processing time is known, the prediction of the next location
can be done. This way is simpler and easier than the previous one, but the same
accuracy is not guaranteed, because of unpredictable traffic conditions or changes
of direction.

The last way to predict the next location is a hybrid of the first and second
approaches. In this approach, the location at which the user receives the query result
is stored, instead of a series of user movements, and is used in predicting the next
location. If the location has not been seen before, the prediction formula is used to
calculate the next location.
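The calculation-based prediction can be sketched as simple vector arithmetic, assuming a flat plane and a direction given in degrees; the function name and units are hypothetical.

```python
# Sketch: predicting the next location from the current position, direction
# and speed, once the expected processing time t is known.
import math

def predict_location(x, y, direction_deg, speed, t):
    """Move (x, y) a distance speed * t along direction_deg (degrees)."""
    rad = math.radians(direction_deg)
    return (x + speed * t * math.cos(rad), y + speed * t * math.sin(rad))

# A user at the origin heading "north" (90 degrees) at speed 10, with a
# 2-unit processing delay, is predicted to be roughly at (0, 20).
print(predict_location(0.0, 0.0, 90.0, 10.0, 2.0))
```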
5.2.4 Retrieving Cached Objects
A cached object retrieval process is presented in this section. The aim is to give a
global overview of how cached objects are loaded from the cache.

The starting point of the process is accepting three parameters as input from
the user, namely the query scope, the number of expected objects and the next
predicted position. These three parameters are used to determine which groups of
objects intersect with the query scope.
The next step, called the cached object verification process, verifies whether the
cached objects are located within part of the query scope. In this process, each group
of cached objects is checked to determine whether it overlaps with the query scope.
If the group overlaps, each object within the group is matched against the query
scope. When cached objects of the group reside in the query scope, the information
about those cached objects is loaded and put into a result collection; in addition, a
counter that keeps track of how many cached objects have been found is incremented.
After all objects of the group have been processed, the next overlapping group is
inspected using the same process. The verification process ends when all groups have
been verified and the information about the valid
objects has been placed in the result collection. The result collection is then returned to
the user.
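The verification process above can be sketched as follows; modelling groups as lists and the query scope as a set of object identifiers is an illustrative simplification.

```python
# Sketch of the cached-object verification process: only groups overlapping
# the query scope are inspected, and every found object is counted.

def retrieve(groups, query_scope):
    """groups: list of lists of objects; returns (result, count)."""
    result, count = [], 0
    for group in groups:
        if not any(obj in query_scope for obj in group):
            continue                      # group does not overlap the scope
        for obj in group:                 # inspect each object in the group
            if obj in query_scope:
                result.append(obj)
                count += 1
    return result, count

groups = [["a", "b"], ["c"], ["d", "e"]]
print(retrieve(groups, {"a", "d", "x"}))  # (['a', 'd'], 2)
```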
5.2.5 Updating Query History List
This section presents an algorithm for updating a query history list. This algorithm
is used to keep track of the locations that have been visited in the past in order to
predict the future location.
In general, when a user receives a location-dependent query result, the location
of the user is added to the query history list. If the user has visited the same
location in the past, the timestamp of the existing record is instead updated with
the current time value.
The process is performed by inspecting the query history list. If there is an
existing entry, the timestamp of that entry is updated with the current timestamp.
Otherwise, a new entry is created and inserted into the list; the entry contains
the current location, the query scope and the current timestamp.
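The update step can be sketched as follows; a minimal illustration assuming the history is a dictionary keyed by location (the field names are hypothetical):

```python
import time

def update_history(history, location, query_scope, now=None):
    """Refresh the timestamp for a revisited location, or insert a
    new entry holding the location's query scope and timestamp."""
    now = time.time() if now is None else now
    if location in history:
        # Revisited location: only the timestamp is refreshed.
        history[location]["timestamp"] = now
    else:
        history[location] = {"scope": query_scope, "timestamp": now}
    return history
```

Revisiting a location refreshes its timestamp without duplicating the entry, which keeps the history list compact.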
5.2.6 Objects Grouping
Grouping or clustering is a mechanism to divide data into groups whose members are
semantically related or adjacent, so that they can be kept together [95]. An overview
of the grouping process is the main topic of this section. As we have mentioned in
Section 2, there are several existing algorithms for grouping data items [42, 34, 116].
From the many existing clustering algorithms, we adapt one to help our proposed
algorithm group the cached objects: the DBScan (Density-Based Spatial Clustering
of Applications with Noise) algorithm [34]. It has been chosen because it groups
objects based on locality and density. Density-based grouping places together a
minimum number of objects located within a certain distance of one another. The
benefit of our proposed approach is that it groups cached objects based on distance
and a minimum number of objects; the grouping is also used for our cache replacement.
This section gives an overview of the DBScan mechanism, which is used in our
approach. A brief explanation of this mechanism is given, followed by an example
of how this mechanism works.
In the DBScan scheme, a group or cluster has a centre point, a distance threshold
and a minimum number of points required within that distance. The distance
threshold is denoted Eps. The Eps-neighbourhood of a point p is the set of objects
located within distance Eps of p. The minimum count of points required within an
Eps-neighbourhood of p is known as MinPts.
The objects in a group can be differentiated into two types: core objects and
noise (or border) objects. A core object has at least the minimum number of objects
within a radius Eps, whereas noise or border objects lie far away, on the boundary
of a group. In other words, a core object is defined as a point q whose
Eps-neighbourhood contains not less than MinPts points. If p has an
Eps-neighbourhood of less than MinPts points, it is considered a border object.
The process begins by finding a core object, which is used as the centre point of a
group. The process then looks for objects which have not yet been assigned to any
group or which are connected to another object in the group. If this criterion is
satisfied, the object is assigned to the group, the group is expanded and the counter
is incremented by one. In addition, two clusters are merged if the minimum distance
between any point in one cluster and any point in the other is less than Eps.
In the context of our case, the cached objects are regrouped whenever new objects
arrive and/or cached objects are eliminated. Thus the groups in the cache stay
up-to-date whenever the cache membership changes. The regrouping process is slower
when the cache size is bigger.
Let us consider an example using DBScan. If the minimum number of points is 2, what
clusters would DBScan discover from the following 8 points: A1=(2,10), A2=(2,5),
A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)? Figure 5.3 shows those
points in two-dimensional coordinates.
Figure 5.3: An illustration of the DBScan Algorithm
The size of epsilon affects the clusters that are formed. When we set the value of
epsilon to 2, two clusters are formed. The first cluster, C1, contains 2 points: A4
and A8; the second cluster, C2, consists of 3 points: A3, A5 and A6. However, when
the value of epsilon is set to 3.5, three clusters are formed instead of two. The
new cluster, C3, contains A2 and A7; C1 gains an extra member, A1; and the members
of C2 remain the same.
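The example above can be reproduced with a minimal DBScan implementation; this is an illustrative sketch rather than the original code, and, following the definition given earlier, a point's Eps-neighbourhood includes the point itself:

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBScan: grow clusters from core points, i.e. points
    whose Eps-neighbourhood (including themselves) holds >= min_pts
    points. Points never assigned to a cluster remain noise."""
    labels = {}          # point name -> cluster id
    cluster_id = 0
    names = list(points)

    def neighbours(n):
        return [m for m in names if dist(points[n], points[m]) <= eps]

    for n in names:
        if n in labels or len(neighbours(n)) < min_pts:
            continue     # already clustered, or not a core point
        cluster_id += 1
        stack = [n]
        while stack:     # expand the cluster from core points
            q = stack.pop()
            if q in labels:
                continue
            labels[q] = cluster_id
            nb = neighbours(q)
            if len(nb) >= min_pts:   # only core points spread the cluster
                stack.extend(m for m in nb if m not in labels)
    return labels

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
```

With Eps = 2 and MinPts = 2 this yields the clusters {A4, A8} and {A3, A5, A6}, leaving A1, A2 and A7 unclustered; with Eps = 3.5 it yields the three clusters described above.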
5.2.7 Cached Objects Elimination
In general, cached objects are evicted from the cache when the cache space is not
enough to contain new objects. Evicting cached objects is easy to do; however,
choosing the right victims requires more attention in order to increase the benefit
of using a cache at the client side. In a mobile environment, cache management is
important to overcome the limitations of this environment. Existing cache
elimination policies use criteria based on distance, density, timestamps, individual
object size or other attributes.
An overview of cached objects elimination is given in this section. Cache object
elimination is a process to evict cached objects based on certain criteria when there
is no available space in the cache. The aim of this section is to present the foundation
of our proposed approach. The details of our proposed approach are discussed in
Section 5.3.
Three cached-object elimination approaches are discussed as follows:
• Path-based
This approach eliminates a group of objects by considering the distances from
the centre points of all groups to two of the user's locations: the receiving
location and the next predicted location after the receiving location. In other
words, two distances are measured from the centre point of each group; one to
the receiving location and the other to the next predicted location. The group
that is furthest from the next predicted location but nearest to the receiving
location is the victim to be evicted.
• Density-based
Density is another factor considered for cache elimination. Density refers
to the number of objects in a group relative to its area. A group that has fewer
objects and a larger area is a target for elimination. In other words, the less
dense a group is, the higher its priority for elimination.
• Cost-based (PDAID)
Cost is the key factor when removing a group of cached objects under this cache
replacement policy. The cost value is determined using a formula based on several
factors: access time, density and distance.
Before providing an overview of our proposed approach, some definitions of the
terms used are presented, followed by an existing approach called PAID. This
approach is explained here because our approach is similar to it; the two differ
in the formula used to calculate the cost.
In the PAID approach, a formula has been developed which depends on
the three factors mentioned above. The formula is shown below:
C = (P * A) / D
Where C is the cost of a data value,
P is the access probability,
A is the valid scope area, and
D is the data distance.
Some terms used are data distance, valid scope area, and access probability.
Data distance refers to the distance between the receiving location of a mobile
client and the valid scope of a data value. Valid scope area refers to the
geometric area of the valid scope of a data value. Access probability is measured
using the well-known exponential aging method.
The formula for calculating access probability value is :
P = α / (tc − ti) + (1 − α) * P
Where:
tc is the current system time,
α is a constant factor weighting the importance of the most
recent access in the probability estimate,
ti is the last access time, whose initialised value is zero (0), and
the second P is the previously calculated P.
The elimination process is started by calculating a cost for each valid scope.
Once each valid scope has a cost, the valid scope which has the smallest value
is removed from the cache first.
Our proposed approach is similar to the PAID approach in terms of elimination
factors: both use time and distance. However, our approach considers the density
of a group rather than its area. In other words, the PAID approach removes the
group with the longest access time, the furthest distance and the smallest area,
whereas our PDAID approach removes the group that is least dense, has the longest
access time and is furthest from the next predicted location after the receiving
location.
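The PAID cost computation and its smallest-cost-first eviction can be sketched as follows (the numeric values in the usage example are hypothetical):

```python
def paid_cost(p, a, d):
    """PAID cost of a data value: C = (P * A) / D, i.e. access
    probability times valid scope area, divided by data distance."""
    return p * a / d

def paid_victim(scopes):
    """Evict the valid scope with the smallest cost first.
    scopes: {name: (P, A, D)}"""
    return min(scopes, key=lambda s: paid_cost(*scopes[s]))
```

For example, with scopes {"s1": (0.5, 4.0, 2.0), "s2": (0.2, 3.0, 6.0)} the costs are 1.0 and 0.1, so "s2" is evicted first.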
5.3 Proposed Approach
This section discusses our proposed approaches to the cache replacement policy. We
propose three cache replacement policies: (i) Path, (ii) Density and (iii)
Probability Density Area Inverse Distance (PDAID). The first two cache replacement
policies are straightforward; they eliminate based on distance/path and density
respectively. The last policy is based on the cost of multiple attributes.
As mentioned in Section 5.2.6, the DBScan algorithm is adapted in our proposed
approach. The aim of adapting the DBScan algorithm is to eliminate a group of
objects using one of the proposed cache replacement policies mentioned earlier.
Section 5.3.1 explains the proposed path-based cache replacement policy. The
proposed density-based approach is discussed in Section 5.3.2. Section 5.3.3
presents the proposed cost-based approach.
5.3.1 Path Based Elimination Algorithm
This section explains our proposed algorithm to eliminate groups of objects when
some slots are needed for new incoming objects. The proposed algorithm eliminates
a group of objects to free a number of occupied slots until the number of available
slots is enough to store incoming objects.
Figure 5.4: Simple illustration of our elimination approach
Before we start explaining our proposed approach, consider Figure 5.4. The figure
shows the user moving from position G5 to a receiving location (a location where
the user receives a query result). At the receiving position, query results arrive
from the server and the cache slots are not enough to store them. Some of the
cached items will therefore be evicted from the cache. The cached-item elimination
algorithm needs to be smart enough to maintain the cache hit rate.
Eliminating cached objects by considering the next location after the receiving
location is one of many smart ways to keep a high hit rate. As shown in the figure,
the next predicted location after the receiving location is G1. For each of G0 and
G2 there are two paths: one from the group to the next location, and the other from
the group to the current location. The aim of these two lines is to select which
cached-object group to evict by measuring the two distances mentioned above. A
group that is furthest from the centre point of the next group (the next position)
has a higher priority to be eliminated first.
Algorithm 5.1 shows our path-based elimination algorithm. One of its input
parameters keeps track of the number of slots that have been made available. The
elimination process evicts groups of cached objects one by one until the number of
required slots has been satisfied.
In the eviction process, a group is selected by comparing two paths: the path from
the centroid of the group to the centroid of the group at the next predicted
location, and the path from the centroid of the group to the receiving location.
The group becomes a victim if it is located near the receiving location and
furthest from the next predicted location.
Figure 5.5 illustrates a more complex case. In this illustration, the dotted line
(black) denotes the distance from the centroid of each cached group to the current
position of the user. The dot-dash line (grey) denotes the distance from the
centroid of each group to the next position of the user. The current and subsequent
positions refer to the location at which the user receives a query result and the
next predicted location respectively.
Algorithm 5.1: The proposed path-based elimination algorithm
Input: list of groups, sender location, recipient location, number of required slots to be freed
begin
    sending_loc ← sender location
    receiving_loc ← recipient location
    next_loc ← next predicted location from the receiving location
    groups ← list of groups
    numOfRequiredSlots ← number of required slots to be freed
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in groups do
            if (sending_loc or receiving_loc or next_loc) ∈ group then
                continue
            end
            Dist_to_next ← distance from the centroid of the group to the centroid of next_loc
            Dist_to_recipient ← distance from the centroid of the group to the recipient position
            Min_dist_group ← Min(Dist_to_next, Dist_to_recipient)
            List_Min_dist ← add Min_dist_group to List_Min_dist
        end
        Max_dist_group ← Max(List_Min_dist)
        groups ← remove the group with Max_dist_group
        numOfSlotsFreed ← numOfSlotsFreed + number of objects in the removed group
    end
end
In the figure, the distances from the centroids of groups G0, G2, G6 and G7 to the
current and next positions are calculated. Then, the minimum value for each group
is collected, for example MinG0(8,11) = 8 and MinG2(10,15) = 10. After that, the
maximum of those minimum values is selected. The group that has this maximum value
will be eliminated from the cache.
Figure 5.5: Complex illustration of our elimination approach

On the other hand, there is a possibility that a future query scope may overlap
with more than one group. To handle this situation, a mechanism similar to the one
in the previous example is applied. For each group, the distances to all overlapped
groups and to the current position are measured, and the minimum of these distances
is selected. This calculation is applied to all groups in the cache, giving a
minimum distance for each group. Then, the maximum of those minimum distances is
taken, and the cached objects within the group that has the maximum value are
eliminated from the cache.
Figure 5.6: A query scope overlaps with multiple groups
Figure 5.6 illustrates a query scope overlapping with two groups (G1a and G1b).
The figure shows that G0 and G2 are the two candidate groups for elimination. The
distances from the centre point of G0 to the centres of the overlapped groups (G1a
and G1b) are measured, and then the distance from G0 to the receiving location is
computed. The minimum of those distances is taken, which is 8.5. A similar
procedure finds the minimum distance value for G2, which is 9.5. In the final
stage, the maximum of the two minimum distance values is chosen, which is 9.5.
Hence, G2 is eliminated from the cache.
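The min-max selection used in these examples can be sketched as follows; each group is given the list of its measured distances (to the receiving location and to the next predicted location or overlapped groups), the per-group minimum is taken, and the group with the largest minimum is evicted:

```python
def path_based_victim(group_dists):
    """For each group take the minimum of its measured distances,
    then evict the group whose minimum distance is the largest.
    group_dists: {group_name: [distances]}"""
    mins = {g: min(ds) for g, ds in group_dists.items()}
    return max(mins, key=mins.get)
```

Using the values from Figure 5.5, MinG0(8, 11) = 8 and MinG2(10, 15) = 10, so between these two groups G2 would be the victim.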
5.3.2 Density Based Elimination Algorithm
A discussion of the density-based cache replacement policy is presented in this
section. Density is the ratio between the number of items in a group and the area
of the group. If a group has few objects, its density is small, implying that the
group does not hold many items of interest. Therefore, a group with small density
has priority for elimination.
Algorithm 5.2: Density-based elimination algorithm
begin
    next_location ← predicted next location after the user received the query result
    groups ← current available groups
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in groups do
            group ← find the group that has the smallest collection and the least recent access time
            isReqNext ← check whether the group may be requested next
            if isReqNext ≠ true then
                numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
                group ← remove all cached items in the selected group
            end
        end
    end
end
Algorithm 5.2 shows the elimination of cached objects based on density. First, the
next location of the user is predicted and the elimination process starts. While
evicting, the group that has the smaller collection has a higher priority to be
removed; in other words, the group with the least density value is eliminated
first. After a group has been eliminated, the number of available slots is
recalculated. If the number of slots is still insufficient, the elimination
process evicts more groups until the required number of slots is available. This
algorithm does not prioritise user movement patterns.
Figure 5.7: Illustration of density elimination
Figure 5.7 presents an illustration of density elimination, where N denotes the
number of objects in a group. Every group has the same area. Consider a user
moving from the current location (G5) to the receiving location (shaded area),
where the user receives the query result. While the query result is being
retrieved, the cache is full and cached objects need to be evicted. The next
location after the receiving location is predicted, to determine whether a group
will be requested in the next time interval. In our scenario, the least dense
group is G1; however, this group is not eliminated because it is predicted to be
requested. Hence, the group of cached objects in G0 is evicted from the cache.
The elimination continues until there is enough space to store the incoming
objects.
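The density-based eviction described above can be sketched as follows; a simplified illustration assuming each group is described by its object count and area, with groups predicted to be requested next excluded, as in the scenario of Figure 5.7 (the example values are hypothetical):

```python
def density_victims(groups, predicted_next, slots_needed):
    """Evict the least-dense groups (fewest objects per unit area)
    until enough slots are freed, skipping any group predicted to
    be requested next.  groups: {name: (num_objects, area)}"""
    order = sorted((n / a, g) for g, (n, a) in groups.items()
                   if g not in predicted_next)
    victims, freed = [], 0
    for _, g in order:
        if freed >= slots_needed:
            break
        victims.append(g)
        freed += groups[g][0]  # evicting a group frees one slot per object
    return victims
```

With G1 the least dense but predicted to be requested next, G0 is evicted instead, mirroring the scenario in the figure.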
5.3.3 PDAID Elimination Algorithm
This section presents our cost-based replacement policy, called Probability
Density Area Inverse Distance (PDAID). The proposed approach eliminates a group of
cached objects based on a cost value, which is calculated during cache retrieval
from several factors. This section is therefore divided into two subsections: a
modification of the cached-object retrieval algorithm, and the cache replacement
algorithm. The first subsection discusses how the cost value is calculated and
updated; the second shows the proposed cache replacement algorithm, which uses the
calculated cost to remove cached objects from the cache.
Modification of Cached Objects Retrieval Algorithm
This section discusses a proposed cached-object retrieval approach, which modifies
the general cache retrieval algorithm and is used with the proposed PDAID
approach. The main discussion focuses on our cost formula and the modified
cached-object retrieval algorithm.
As mentioned in Section 5.2.7, the density of the valid scope area is taken into
account as an additional factor in our proposed approach. Hence, the PAID formula
is modified to include a density factor by replacing the value A with Da, where
Da is the density value of an area. The value Da is calculated as follows:
Da = N / A
Where Da is the density value of an area,
N is the number of objects in the area, and
A is the valid scope area.
Therefore, the access probability of a group is defined as follows:
Pg = P * Da
Where Pg is the access probability of a group,
P is the access probability of an item, and
Da is the density of an area.
To simplify the access probability formula, we assume α is constant. Thus, the
access probability formula becomes:
P = 1 / (tc − ti)
where tc is the current access time, and
ti is the last access time.
Hence, the cost of a data value becomes:
C = Da / ((tc − ti) * D) = Pg / D
Where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
Using the modified PAID formula, the cached-object retrieval and elimination
processes are slightly modified; the aim is to include the weight factors in the
formula. The value of Pg is computed during the retrieval process, because
completing this computation during the elimination process would slightly increase
the computation in that process. The value of C is calculated during the cache
elimination process, because the user is moving dynamically and the distance
depends on the current location of the user.
A group of cached objects is accessed while cached objects are retrieved or new
objects are stored in the cache. When new objects are stored, they are grouped and
all of the factors mentioned above are kept. In grouping, the cached objects
either form new groups or merge into existing groups. When new groups are formed,
their Pg values are initialised to zero. If new objects are merged into existing
groups, the existing groups may be split into new groups; the Pg values of the
resulting groups are not reset to zero, because they contain existing cached
objects. The Pg value of a group is recalculated when the information of the
cached objects in the group is retrieved; the access time and the Pg value are
then updated.
Algorithm 5.3: Cache retrieval for PDAID algorithm
begin
    Groups ← find any group that intersects with the query scope at the next position
    tc ← current time
    ti ← 0
    for each group in Groups do
        groupResult ← find the cached objects in the group that match the query scope
        if groupResult ≠ empty then
            ti ← retrieve accessTime(group)
            Pg ← Da / (tc − ti)
            update existing values(group, Pg, tc, D, A)
            results ← results + groupResult
        end
    end
    return results
end
Algorithm 5.3 shows the cache retrieval algorithm, which considers multiple
factors. When the query scope intersects a group of cached objects, the value of
Pg is updated; Pg keeps track of the cost of a group that holds the requested
query result. After the value of Pg has been calculated, the query result is added
to the parameter results. The algorithm then continues finding the next
intersecting group, calculating its Pg and adding the result found to results.
Once the cached result has been assembled, it is sent to the user.
Figure 5.8: Illustration of PDAID retrieval
Figure 5.8 shows an illustration of PDAID retrieval. Assume that the area of every
group is the same. G0 and G1 are two groups which are stored at times t0 and t1
respectively. At time t2, a number of new objects are stored and all cached
objects are regrouped. This causes G0 to be split into two groups: G0 and G2.
The access probability of a group can be explained as follows. The Pg values of
G0 and G1 at times t0 and t1 are initialised to zero. When G2 is formed, its Pg
value is not zero if it holds any existing cached objects. The calculation of Pg
for G0 is as follows: the density value (Da) is 4 and the value of P is 0.5, so
the value of Pg is 4 * 0.5 = 2. The Pg calculation for G2 is done in the same way
as for G0; its value is 2.5.
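The Pg arithmetic in this example can be reproduced as follows, using the simplified P = 1/(tc − ti); the object count, area and access times below are assumed values chosen so that Da = 4 and P = 0.5, matching the illustration:

```python
def access_probability(tc, ti):
    # Simplified access probability: P = 1 / (tc - ti)
    return 1.0 / (tc - ti)

def group_probability(n, area, tc, ti):
    # Pg = P * Da, where the density Da = N / A
    return access_probability(tc, ti) * (n / area)
```

For instance, 8 objects in an area of 2 units give Da = 4; with tc − ti = 2, P = 0.5 and hence Pg = 2.0, matching G0 in the example.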
PDAID Replacement Algorithm
This section presents our cost-based replacement policy, which is called Probability
Density Area Inverse Distance (PDAID). Earlier in this section, we have shown how
to calculate the access probability cost for each group while a group of cached objects
is accessed. To simplify our proposed approach, we assume that the groups of all
cached objects have been formed and that each group has had its access probability
value calculated.
The formula to calculate the value of C is as follows:
C = Pg / D
Where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
When the cache does not have enough space, a group is eliminated. The group
elimination is done by calculating the value of C, the elimination cost, for all
groups and removing the group that has the smallest value of C. The value of C is
calculated by dividing the value of Pg by a distance, which is measured between
the central point of a group and the next predicted location of the user. The
next predicted location is the location after the receiving location, determined
from the travel history of the mobile user. A group with the smallest value of C
has the smallest chance of being accessed again. Therefore, the group that has the
smallest value of C is eliminated.
Algorithm 5.4 shows cached-object eviction based on multiple criteria. At the
start of the algorithm, all groups in the cache are assigned to the parameter
Groups and the number of required slots is assigned to the parameter
numOfRequiredSlots. Once the parameter assignments have been completed, the
algorithm starts eliminating groups: for each group it retrieves the Pg value,
measures the distance and calculates the value of C. The group eviction process is
similar to that of the algorithms in the previous sections; this algorithm selects
the group with the smallest value of C as the eviction victim. Once the evicted
group has been removed from the cache, the parameter numOfSlotsFreed is increased
by the number of items in the evicted group and the
Algorithm 5.4: Cached objects elimination of PDAID algorithm.
begin
    groups ← all groups in cache
    userLocation ← current location of the user
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        min_value ← maximum value
        for each group in groups do
            D ← calculate distance(group, userLocation)
            Pg ← retrieve value(Pg)
            C ← Pg / D
            if C < min_value then
                min_value ← C
                min_group ← group
            end
        end
        if min_group ≠ empty then
            groups ← remove(min_group)
            numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
        end
    end
end
parameter min value is reset to the maximum value. Then, the elimination process
continues until the number of required slots has been made available.
The illustration in Figure 5.8 is reused to describe the PDAID replacement policy.
Recall that the values of Pg for G0, G1 and G2 are 2, 0 and 2.5 respectively. The
user at the current position (shaded circle) stores new incoming objects; however,
the cache space is not enough to accommodate those incoming objects, so the
existing cached objects are evicted. The eviction process is completed by choosing
the least value of C amongst all cached groups, where the value of C is calculated
by dividing the value of Pg by the distance. The values of C for the cached groups
are 0.2, 0 and 0.208 respectively. Hence, the eviction order is G1, G0 and G2.
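The eviction order in this example can be reproduced as follows; G1's Pg is 0, and the distances of 10 and 12 for G0 and G2 are assumed so that C = Pg/D matches the quoted values of 0.2 and roughly 0.208:

```python
def pdaid_order(groups):
    """Order groups by elimination cost C = Pg / D, smallest first.
    groups: {name: (Pg, distance_to_next_predicted_location)}"""
    return sorted(groups, key=lambda g: groups[g][0] / groups[g][1])
```

Calling pdaid_order({"G0": (2.0, 10.0), "G1": (0.0, 12.0), "G2": (2.5, 12.0)}) returns ["G1", "G0", "G2"], the eviction order above.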
5.4 Case Studies
This section presents several case studies to illustrate our proposed approaches.
The initial situation is given first, followed by an illustration and explanation
for each proposed approach.
Figure 5.9: Initial situation after cached objects have been grouped
Figure 5.9 shows the initial client cache status after the user has sent some
queries. The figure shows that some groups of objects have been formed and also
shows the current position of the mobile user. In the current situation, the user
would like to store the incoming query results to the cache; however, the cache
cannot store all incoming objects. Therefore, some cached objects are going to be
evicted to make available more empty cache slots.
To explain our cache replacement policies with the example above, we present our
proposed approaches in three parts, describing the density-based, path-based and
PDAID (cost-based) replacement policies respectively. The discussion of the three
proposed approaches is as follows:
Case Study 5.4.1. The density-based policy
Assume that areas of all groups are the same size, which is a 2-unit area. Two
case studies are given below:
Case 1: Number of cached objects for each group are as follows: G0: 10, G1: 5, G2:
6, G3: 8, G4: 10, G5: 8, G6: 10, G7: 6. This illustration is shown in Figure 5.10.
Figure 5.10: Density based approach (Case Study 5.4.1-1)
In this case, G1 is the group with the fewest cached objects. However, this group
is not going to be removed, since it is predicted to be requested by the next
query. Hence, either G2 or G7 is going to be removed; of the two, the group with
the least recent access has the higher possibility of being removed.
Case 2: The number of cached objects for each group is the same as in Case 1,
except for the following: G1: 10, G3: 4, G4: 5, G7: 8. Figure 5.11 shows an
illustration of this case study.
Figure 5.11: Density-based approach (Case Study 5.4.1-2)
The group that has the least number of cached objects is G3. However, this group
is not going to be removed, since it has been accessed recently. The next
candidate victim is G4; similar to G3, this group has been recently accessed and
will not be removed. Group G2 is then the group with the smallest collection and
the least recent access. Therefore, this group is going to be removed.
For both cases, the new objects are inserted into the cache after the cache
elimination, followed by the creation of new groups or the adjustment of existing
groups. Adjustment is performed only on the groups that contain the newly inserted
objects. The outcome of adjusting the existing groups is that the existing groups
have new object members and/or new groups are created.
Case Study 5.4.2. The path-based policy
In this part, we discuss the use of the Path-based approach. Two cases are
presented. In the first case, a query scope overlaps only with one group, whereas
the second one overlaps with multiple groups. The illustrations of both cases are
slightly different.
Figure 5.12: Path-based approach (Case Study 5.4.2-1)
Figure 5.12 shows the illustration of the first case for path-based elimination,
where the query scope covers only a single group. Victim group selection is done
by calculating two different distances. The first distance is measured between the
centroids of two groups and the other is calculated between the centroid of the
group and the user's receiving location. For example, the distance between G7 and
G0 is 8, while the distance between G7 and the user is 15. Once the two distances
have been calculated, the smaller value is selected, which is 8. A similar
procedure is applied to G1, G2 and G6 and the smallest distance value for each
group is selected; those values are 5, 18 and 10 respectively. Distances for
groups G4, G3 and G5 are not computed, since these three groups are in use. Once
the smallest distance values have been selected, the group with the maximum value
is targeted as the victim. Hence, G1 is the victim and is eliminated from the
cache.
Figure 5.13: Path-based approach (Case Study 5.4.2-2)
Figure 5.13 shows a situation similar to the first case; here, the next predicted
query scope covers two cached groups. The elimination process is the same as in
the first case. To simplify, the distances for G7 are 8, 15 and 17 (denoted by H,
G and I respectively), the distances for G2 are 18, 10 and 26 (denoted by C, D and
K respectively), and the distances for G6 are 12, 25 and 27 (denoted by F, E and J
respectively). The minimum distance values for the three groups are 8, 10 and 12
for G7, G2 and G6 respectively. Then, the maximum of those minimum distance values
is selected, which is 12. Hence, G6 is eliminated from the cache.
After one or more groups have been eliminated, the remaining processes are the
same as in Case Study 5.4.1: new objects are inserted and the cached objects are
regrouped. The regrouping process adjusts the existing groups and/or creates new
groups.
Case Study 5.4.3. The PDAID (cost-based) policy
This case study shows how a group of objects is eliminated based on the cost of
the group. The cost value is calculated using the PDAID formula described in
Section 5.3.3.
Figure 5.14 shows an illustration of this case study. For simplicity, the cache
has been filled by objects and the groups have been formed. These cached objects
have not been accessed again. Assume that the client receives new objects at the
current position and the cache is full. The cached objects located in the next
predicted location are not evicted, because these objects will be requested next.
In this situation, the access probability of every group is zero, since none has
been accessed. When the user accesses G3 and G4, the access probability of those
groups changes. Until then, eviction falls back to insertion order: G0 is
eliminated first if it was the first group inserted into the cache.
After one or more groups have been evicted, the rest of the caching process is
the same as for case study 5.4.1, which inserts new objects and then adjusts
existing groups or creates new groups.
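Section 5.3.3 defines the exact PDAID cost formula. As a rough, hypothetical sketch of how the four factors listed later in this chapter (access probability, valid scope area, density and data distance) could combine, with a low cost marking a group for eviction:

```java
public class PdaidCost {
    // Hypothetical combination of the four PDAID factors; the exact formula
    // is defined in Section 5.3.3, so treat this only as a sketch. A group
    // with a low access probability, small valid scope area, low density
    // and large distance gets a low cost and is therefore evicted first.
    static double cost(double accessProbability, double validScopeArea,
                       double density, double distance) {
        return (accessProbability * validScopeArea * density) / distance;
    }

    public static void main(String[] args) {
        // An unaccessed group (access probability 0) always has cost 0,
        // which is why eviction falls back to insertion order above.
        System.out.println(cost(0.0, 100.0, 0.5, 20.0)); // prints 0.0
        System.out.println(cost(0.2, 100.0, 0.5, 20.0)); // prints 0.5
    }
}
```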
Figure 5.14: PDAID-based approach (Case Study 5.4.3)
5.5 Discussion
This section discusses our proposed approaches. First, we discuss our elimination
approach based on distance followed by that based on density. The last discussion
is the elimination based on multiple factors.
Our proposed approach to elimination is similar to that of the FAR algorithm.
In our proposed approach, we eliminate a group of objects rather than individual
objects. A group is eliminated if it has the maximum distance to the next
predicted location and the minimum distance to the current location. The distance
between two groups is the distance between their centroids. Each group may have
a different shape, which would otherwise require a different formula per shape.
Therefore, we use the K-means algorithm [34] to find the centroid of each group.
In addition, when a query scope overlaps more than one group, the minimum value
across the overlapped groups is found.
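The centroid of a group, as used in these distance calculations, is the mean of the member coordinates (the K-means update step); a minimal sketch:

```java
public class Centroid {
    /** Returns the centroid (mean x, mean y) of a group of 2-D points,
     *  as computed in the K-means update step. */
    static double[] centroid(double[][] points) {
        double sumX = 0, sumY = 0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] { sumX / points.length, sumY / points.length };
    }

    public static void main(String[] args) {
        double[][] group = { { 0, 0 }, { 4, 0 }, { 2, 6 } };
        double[] c = centroid(group);
        System.out.println(c[0] + ", " + c[1]); // prints 2.0, 2.0
    }
}
```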
The second elimination approach is based on density. In this approach, the group
which has fewer objects has higher priority to be evicted. If the group is far away
and has more objects within a small area, this group is not eliminated. Therefore,
distance is not counted in this approach. When more than one group has the
same density value, the group which was formed first is eliminated first.
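The density rule can be sketched as follows; the class names and the area/creation-order bookkeeping are our own illustration, not code from the thesis:

```java
import java.util.List;

// Sketch of the density-based eviction described above: evict the group
// with the fewest objects per unit area; on a tie, the group formed first.
public class DensityEviction {
    static class Group {
        final String name;
        final int objectCount;
        final double area;        // area occupied by the group
        final long creationOrder; // lower value = formed earlier
        Group(String name, int objectCount, double area, long creationOrder) {
            this.name = name;
            this.objectCount = objectCount;
            this.area = area;
            this.creationOrder = creationOrder;
        }
        double density() { return objectCount / area; }
    }

    /** Returns the group with the lowest density; ties go to the oldest. */
    static String selectVictim(List<Group> groups) {
        Group victim = null;
        for (Group g : groups) {
            if (victim == null
                    || g.density() < victim.density()
                    || (g.density() == victim.density()
                        && g.creationOrder < victim.creationOrder)) {
                victim = g;
            }
        }
        return victim == null ? null : victim.name;
    }

    public static void main(String[] args) {
        // G1 packs 10 objects into a small area; G2 has 3 objects spread
        // over a larger area, so G2 is the victim despite G1 being older.
        List<Group> groups = List.of(
                new Group("G1", 10, 4.0, 1),
                new Group("G2", 3, 6.0, 2));
        System.out.println(selectVictim(groups)); // prints G2
    }
}
```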
The last approach is based on multiple factors. The approach, called PDAID,
calculates the cost of a group based on several factors. The PDAID approach is
similar to the PAID approach, except that we consider both the density and the
area of a group rather than its area alone. The reason is that a larger area may
contain only a few objects compared with a smaller one. A group has a higher
priority to be chosen for elimination if it has the furthest distance, the longest
access time, fewer objects and a small area.
5.6 Conclusion
This chapter discussed our three proposed approaches to client caching,
focusing on object elimination. The aim of our proposed approach is to answer
client queries such that at least K objects are answered from the cache. Our
proposed elimination approaches eliminate a group of objects based on three
different criteria. Under the first criterion, a group of objects is eliminated
based on distance. The second is density-based, whilst the last is elimination
based on multiple criteria.
Under the first criterion, the distance-based elimination uses the MinMax
algorithm, which eliminates the group that is farthest from the next predicted
location once the user has received a query result at the current location. Under
the second criterion, the density-based elimination drops the group which has the
fewest objects. The last criterion is based on the cost of a group, considering
four factors in order to eliminate a group.
These four factors are: access probability, valid scope area, density and data distance
factors.
Chapter 6
Performance Evaluation
This chapter presents the performance evaluation of our approaches that have been
elaborated in Chapters 3, 4 and 5. The purpose of this chapter is to evaluate those
approaches under various conditions.
The evaluation is performed by implementing and simulating the proposed
approaches using Java™ and Planimate™. The implementation and its results are
presented in Section 6.1; while the simulation and its results are presented in Section
6.2. The implementation section briefly describes our implementation and its results
for query processing at the server side. The simulation section contains a short sum-
mary of the simulation model, and more comprehensive results. Our simulation also
validates the outcomes of the implementation.
6.1 Implementation and its Results
An evaluation of the implementation of mobile query processing at the server side
is described in this section. This section is divided into two parts: a short summary
of the implementation details, and an elaboration of implementation results.
CHAPTER 6. PERFORMANCE EVALUATION 189
6.1.1 Implementation Environment
A summary of the implementation details is given in this section. The summary
includes implementation settings and the architecture.
Table 6.1: Hardware settings

Parameter    Server 1     Server 2       Client
Processor    AMD          SunFire V440   Pentium
CPU speed    1.96 GHz     1.28 GHz       700 MHz
RAM size     1 GB         16 GB          512 MB
Connection   Wired        Wired          Wireless
LAN speed    512 Kbps     512 Kbps       1 Mbps
Area size    900 x 2000   300 x 2000     -
Table 6.1 shows our experiment configurations. The server 1 and client machines
use the Linux Fedora™ operating system, whereas server 2 runs under the Sun
operating system. The implementation is written in the Java™ programming
language. The simulation database contains varying numbers of records of randomly
generated (x, y) coordinates. In our experiment, every BS is connected to a single
database (DB) server, which contains between 100,000 and 5,000,000 records, and
the scope of a query is set beyond the current BS boundary. The data set is
synthetic, in that it is produced by a location generator.
6.1.2 Implementation Results
This section gives the numerical results of our experiments. The explanatory
details of our experiment results for single-cell and multi-cell queries are
discussed next.
Results for Query Processing in a Single Cell
We examine the performance of our proposed algorithm for processing single-cell
queries. The simulation database contains varying numbers of records of randomly
generated (x, y) coordinates. The number of records in the database ranges from
250,000 to more than 1 million. Furthermore, user queries with various distances,
from 500 up to 2500 metres, are sent by users through their mobile devices to
the BS.
Figure 6.1: Number of targets found in a square
The experiments presented are designed to achieve two objectives. Firstly, we
examine the performance differences between a square and a circle. Secondly, we
evaluate our algorithm for specifying the location related to the query. We assume
that the user receives the query result in a new location at time tstart+1.
Figure 6.1 shows the number of targets found within scopes of 1000 x 1000, 2000
x 2000, 3000 x 3000, 4000 x 4000 and 5000 x 5000 using a square at time t1.
These experiments were performed when the user was not moving while receiving
the query results. From the figure, we can see the number of targets found within
the area for databases of different sizes. The numbers increase as the area
increases, and grow rapidly as the database and the scope get bigger.
Figure 6.2: Number of targets found in circle
Figure 6.2 shows the number of targets found within circles of radius 500, 1000,
1500, 2000 and 2500 metres at time t1, where the user is not moving while
receiving the query result. The number of targets found within a circle, for
databases of various sizes, follows much the same pattern as for the square,
growing rapidly as well. However, comparing the two figures carefully, the number
of targets found in the circle is somewhat smaller than in the square. For
example, there are 312,698 targets found in a square, but only 245,254 found in
the corresponding circle.
Therefore, we can calculate the percentage differences of the total targets found
between square and circle, as shown in Figure 6.3. When the square is the valid
scope, all targets found are counted as 100 percent. If we use a circle as the
valid scope, the query results produced are less than 100 percent, around 78-79
percent. This is because a circle inscribed in a square covers only pi/4, about
78.5 percent, of the square's area, so roughly 21.5 percent of the square lies
outside the circle. This percentage does not depend on the number of records in
the database. Therefore, when targets are scarce, a square has roughly a 21.5
percent greater chance of finding a target than a circle.
Figure 6.3: Comparison of number of targets found in circle and square
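A circle inscribed in a square covers pi/4, about 78.5 percent, of the square's area, which matches the 78-79 percent observed here; a small numeric check:

```java
public class ScopeRatio {
    public static void main(String[] args) {
        // A circle of radius r inscribed in a square of side 2r covers
        // (pi * r^2) / (2r)^2 = pi / 4 of the square's area.
        double ratio = Math.PI / 4.0;
        System.out.printf("circle/square = %.4f%n", ratio); // ~0.7854

        // Applied to the 312,698 targets found in the square, this predicts
        // roughly 245,600 targets in the circle, close to the observed 245,254.
        System.out.println(Math.round(312698 * ratio));
    }
}
```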
Figure 6.4 shows the comparison of a number of targets found in every region.
If the server explores all of the regions while the user is moving diagonally,
server resources are wasted, since the user is interested only in the objects
that have not yet been passed. We suggest that the server search only the
specific region(s), based on the direction of the client. Searching in one or two regions is very
efficient, because the processing time in specific regions is about 25-50 percent faster
than exploring the whole of the regions.
If we use a circle as the valid scope and the user misses the query result at
time tstart+1, some targets cannot be caught at time tstart+1. This is not
efficient, since the server needs to seek new targets in a new location at the
next time interval, as shown in Figure 6.5. If targets are scarce, it is inconvenient
Figure 6.4: Comparison of number of targets found in each region.
Figure 6.5: Comparison of number of targets found in circle at time t1 and t2.
for the user to resubmit the query in order to get a new query result at the next
interval time.
Figure 6.6: Snapshot of CPU load
We also analyse our proposed algorithm when a user misses the query results. If
the user moves at a speed higher than or equal to the distance in the user query,
no overlapping area is produced; otherwise, there is an overlapping area.
Figure 6.6 shows the CPU load when processing both overlapping and
non-overlapping areas. The load percentage can be defined as throughput per
second. The graph shows that avoiding probing the overlapping area again reduces
the CPU load.
In conclusion, our experiments show that using a square as a valid scope gives a
user a greater chance of finding scarce targets close to the query scope boundary
than other shapes do. This is because the area of the square is greater than that
of the other shapes for a given query distance.
Results for Query Processing in Multi-cells
We have conducted three different types of analyses to examine our proposed
approach. Firstly, we examine a situation where multiple BSs have the same area
size and many queries with various sizes of query scopes are sent. The purpose is
to determine how long it takes users to receive query results from a server.
Secondly, we examine a situation where multiple BSs have different area sizes and
a variety of queries with different sizes of scopes are sent. The purpose is to
compare the length of time until query results are received by users with either
one or many BSs. Thirdly, we examine the processing time of every BS for both
single and multiple users. From this experiment, we evaluate whether the BS
processing time is reasonable for producing the query results.
(i) Uniform Area Size Multi-BSs and Various Sizes of Query Scopes
Here, we analyse the receiving time of query results from servers while
users send queries with various sizes of query scopes from one location. All
BSs have the same area size. The complete setting of our simulation for the
first experiment is shown in Table 6.2. In this experiment, we used up to
five BSs with the same area size. We tested 1 to 20 users sending queries
concurrently. The query scopes are squares with areas varying from 250,000 up to
3,062,500 m2.
Table 6.3 shows the results of the first experiment. We can see that users
have a better chance of getting more targets within bigger query scopes. When
the query scope is 100 metres, users get only 5 targets if the database contains
100,000 records. In contrast, users can get 698 targets when the query scope is
1750 metres with the same database. Furthermore, the
Table 6.2: Parameter settings

Parameter                          Value
Number of BSs                      5
BS area (m2)                       250,000
Query scope (m2)                   250,000 - 3,062,500
Number of users                    1-20
User coordinate (x,y)              (400,400)
Number of items (every BS region)  100,000 - 5,000,000
other columns show that a smaller scope of query has a smaller number of
targets compared with a bigger scope.
On the other hand, if we examine the table horizontally, databases with more
records yield more targets than those with fewer. From this point of view, we see
that the number of targets found depends not only on the size of the query scope,
but also on the total number of records in the database.
Table 6.3: First experiment result

                          Database Records
Query Scopes   100,000   500,000   1,000,000   5,000,000
(Metres)       records   records   records     records
100                  5        28          54         265
250                 29       169         299        1564
500                140       619        1260        6186
750                296      1380        2812       13929
1000               459      2190        4516       22336
1250               579      2864        5909       29300
1500               641      3176        6523       32469
1750               698      3485        7129       35619
Figures 6.7 and 6.8 show the response time results where the database records
are varied from 100,000 to 5,000,000. Each graph shows the response time, from
sending the query until receiving the query results, for 1, 5, 10 and 20 users,
with query scopes ranging from 100 up to 1750 metres. These graphs illustrate the
general trend that the larger the query scope, the larger the database, and the
more users there are, the greater the delay in answering user queries.
Figure 6.7a shows the response time when the BS accesses 100,000 database
records. It shows that answering a query with a smaller scope gives a faster
response than a bigger one. In our simulation results, answering the largest
query scope, 1750 metres, is about 10 times slower than the smallest one, 100
metres. When we simulated 20 users with query scopes of 750 and 1000 metres, our
machine happened to run its daily updates; this explains the isolated increase
in the slope at those points.
Figure 6.7b shows the response time when the BS has 500,000 database
records. It indicates a significant change when twenty users access the BS with
a query scope greater than 1000 metres. The difference between the response
times of the largest and the smallest query scope is similar to the previous
graph. The current BS answers a user query in less than 100 milliseconds (ms)
when the query scope is 100 metres. On the other hand, when the query scope is
1750 metres, the approximate time for the user to get the answer is from 100
to 700 ms, depending on how many users request results. Furthermore, we can
see that when the number of users doubles, the delay also roughly doubles.
As the number of database records increases, the response time becomes slower.
However, the query scope size also affects the time taken to respond to the user
query. Figure 6.8a shows that, especially when more than 5 users access the BS
concurrently, the increase in the graph is bigger than when there are fewer than
5 users. On the other
(a) 100,000 DB Records
(b) 500,000 DB Records
Figure 6.7: Various searching scope with 100,000 and 500,000 database records
(a) 1,000,000 DB Records
(b) 5,000,000 DB Records
Figure 6.8: Various searching scope with 1,000,000 and 5,000,000 database records
hand, for 20 users the response time increases significantly once the query
scope exceeds 1500 metres.
Figure 6.8b shows the same trends as the previous graphs. The response time is
ten times slower than that shown in Figure 6.7a. It also shows clearly that if
the number of users doubles, the response time roughly doubles as well. However,
the number of users is not guaranteed to be the primary factor slowing down the
BS response time. Comparing all four graphs, the number of database records is
another factor slowing down the BS response time, owing to the number of
comparisons required to fulfil the criteria of the user query.
In summary, the longer the BS takes to answer queries, the more targets are
returned to the users. The size of the query scope and the number of records in
the database also need to be considered. Finally, the total number of queries
processed concurrently also affects performance.
(ii) Various Area Size Multi-BS and Uniform Size Query Scopes
In the second experiment, many users send a number of queries, with the
same query scopes, from the same position, to the corresponding BS. We use
three BSs: one is the current BS and the others are the neighbouring BSs.
Table 6.4 shows the parameter setting. In this analysis, we use three BSs
where the area size of each BS is varied. However, all query scopes have the
same size. A range of total users send a query and the total records in the
database are the same as in the first experiment.
The purpose of this experiment is to measure the response time to answer
user queries if multiple BSs and multiple queries are involved at one time.
Table 6.4: Parameter settings for multiple BSs

Parameter                          Value
Number of BSs                      3
Query scope (m2)                   2,250,000
BS 1 area (m2)                     810,000
BS 2 area (m2)                     90,000
BS 3 area (m2)                     250,000
Number of users                    1 - 20
User coordinate (x,y)              (400, 400)
Number of items (every BS region)  100,000 - 5,000,000
The accuracy of the result returned by the BSs is another point of interest.
Table 6.5 shows the results returned to users when a single BS or multiple BSs
return the query results. Figures 6.9 and 6.10 show the response time when one
to three BSs are accessed by 1, 5, 10 and 20 clients respectively. These figures
show the same trend: the more BSs involved, the longer the time taken to give
answers to users. In addition, when more queries are processed at the same time,
more time is taken to answer them.
Table 6.5: Second experiment result

Number of          1 Base     2 Base     3 Base
Database Records   Station    Stations   Stations
100,000                141        560       1,074
250,000                386      1,444       2,741
500,000                729      2,983       5,581
750,000              1,124      4,548       8,439
1,000,000            1,521      5,963      11,154
2,000,000            3,072     12,046      22,725
3,000,000            4,339     17,801      33,619
4,000,000            5,992     24,109      45,211
5,000,000            7,389     29,985      56,115
When a user sends a query to one BS and the BS searches only within its own
area, the user will miss some targets or have to resend the query to collect
targets within the neighbouring cells. Resending a query to a new BS consumes
more power and bandwidth. Table 6.5 shows the number of records found in the
query result. The first and second columns show the cases where only one or two
BSs answer the query. Unless the third BS is down or fails to return a query
result, the query result returned in these cases is insufficient, since the user
needs to resend the query on reaching the neighbouring BS.
Figure 6.9a shows the response time for answering a single user when one to
three BSs are used simultaneously. The database contains from 100,000 to
5,000,000 records. Averaged over databases from 100,000 to 5,000,000 records,
the response time for three BSs is around 32 percent slower than for one BS
only. On the other hand, the response time of accessing either 100,000 or
5,000,000 records is 61 percent slower than for one BS.
Figure 6.9b shows the response time for five users involving one, two and three
BSs respectively. The delay in responding to the users' requests is up to 14
seconds when accessing three BSs with 5,000,000 records, while it takes 3
seconds to access one BS only. Compared with the previous graph, the delay is
longer overall; however, the average delay per user remains the same.
Figure 6.10a shows the response time for ten users accessing one, two or three
BSs. In this situation, the delay in answering a user request is less than twice
that in Figure 6.9b. The BSs respond to ten user queries in less than 25 seconds
for three BSs, 17 seconds for two BSs and 8 seconds for one BS. The other bars
show the same trend as for two BSs.
(a) One user
(b) Five users
Figure 6.9: A single searching scope with one and five users
(a) Ten users
(b) Twenty users
Figure 6.10: A single searching scope with ten and twenty users
Figure 6.10b shows the response time for twenty users accessing one, two or
three BSs. The response time for 5,000,000 database records is less than 40
seconds for three BSs, 23 seconds for two BSs, and 12 seconds for one BS. The
trends of the other bars are similar to those discussed before.
These four graphs show similar trends. Therefore, we can conclude from this
experiment that the delay in responding to user queries grows roughly n-fold,
where n is the number of BSs accessed simultaneously.
The next group of figures shows the response time classified by the number of
BSs. The purpose of this group is to show when and why the response time of
each BS slows down.
Figure 6.11: Response time of single BS
The next three figures, 6.11, 6.12a and 6.12b, show the response time for
one, two and three BSs accessed by the same numbers of users, with the same
numbers of database records as in the previous group. The lines increase
gradually, denoting that the response time starts to slow when accessing 1
million records; however, they increase significantly when there are more than
3 million database records. The increase for 10 and 20 users is significant
compared with the other two lines. Nevertheless, the average response time per
user is less than the response time for 1 user, at around 600 milliseconds.
(a) Two BSs
(b) Three BSs
Figure 6.12: Response time of multi-BSs
Figure 6.11 shows the response time of a single BS. The line increases slowly
until it reaches 1 million records; after that, the response time degrades
quickly. The response time for 10 users on a database containing 5 million
records shows a significant slowdown.
Figure 6.12a shows the response time involving two BSs. The increase becomes
steeper after the database reaches 1 million records. The line representing
the response time of 20 users rises after 1 million records; however, the
average response time per user is still faster than the response time for 1
user only. The lines for 5 and 10 users run parallel. Note, though, that the
gap between the 10- and 20-user lines is not twice as large as the gap between
the 5- and 10-user lines.
Figure 6.12b shows the response time of three BSs, which is about twice as slow
as in the last two graphs. This is due to the data transmission and searching
time for the additional BSs. From this figure, we can see that the difference in
delay between two and three BSs is small when there are fewer than 500,000
database records; however, the response time starts to degrade when more than
500,000 records are accessed.
(iii) Individual Processing Time of Multi-BS
In the last experiment, we measure the individual processing time of each BS.
We used the same settings as in the second experiment. The aim is to discover
whether the processing time of each BS is reasonable. We performed this
experiment twice: first with two BSs, then with three BSs.
Figure 6.13: Processing time of individual BSs for the same query scope and two BSs
The graphs shown in Figures 6.13 and 6.14 are the results of our experi-
ments showing the processing time of each BS. A number of different users,
from 1 to 20, are examined in this experiment.
When many BSs are involved, we cannot say that the processing time of one BS
(a neighbour) is faster than another's. This is due to the size of the database
and the scarcity of the targets matching the user's query, as shown in the
second group in Figure 6.13, where the second BS needs more time to finish its
process.
Figure 6.14 clarifies this issue. In the first and second groups, BS3 takes
longer to finish its process than the other two, while BS2 is the fastest to
finish. In contrast, in the last group BS3 is the fastest and BS1 is the slowest
to finish its process.
Figure 6.14: Processing time of individual BSs for the same query scope and three BSs
6.2 Simulation and its Results
This section discusses the experiment results of the proposed approaches using
a simulation package called Planimate© [89]. Planimate© is not only a simulation
package; it is a software platform for prototyping, developing and operating
highly visual, dynamic, discrete-event simulation models and interactive
business applications. The Planimate© visual platform provides the following
built-in capabilities:
• animation,
• handling of concurrency,
• visual work-flow modelling, and
• dynamic time-based modelling (simulations).
The above capabilities provide the efficiency needed to develop models of our
proposed approaches; the same models would take more time to develop directly
in a programming language.
6.3 Simulation Results for Single-Cell and Multi-Cell Query Processing
Experiments for query processing at the server side are presented in this
section. A retrieval situation has been simulated to compare the performance of
two query scopes: one square and one circular. Besides showing the retrieval
performance of the two shapes, this experiment aims to show that the simulation
tool gives the same performance as our implementation. The experiments are
divided into single-cell and multiple-cell experiments.
Single Cell Simulation Results
This experiment uses 50 synthetic data points representing the locations of
static objects. The series of experiments carried out in this examination is
divided into 2 cases. Table 6.6 shows the setting details.
Table 6.6: Parameter settings - single cell

Parameter              Value
Number of BSs          1
BS area (units2)       2500
Query scope (units2)   36-196
User coordinate (x,y)  (25,25)
Number of items        50
Case 1. User is not moving while retrieving data. The user asks for objects
within a radius of 10 units, where the user location is the centre point of the query
scope.
Figure 6.15: Comparison of objects retrieved using a square and a circle (single cell)
Figure 6.15 shows a graph comparing the number of retrieved objects using two
query scopes, one square and one circular. More objects are retrieved using the
square than using the circle; in the worst case, both shapes retrieve the same
number of objects.
Case 2. Various experiments with different sizes of query scopes have been
performed for this case. The settings are the same as for the previous
experiment, except for the query scope dimension: the query sizes range from 6
up to 14 units of distance. The dataset used in this experiment has 30 objects.
Figure 6.16 shows a percentage comparison graph for this case, in which the
square's result is taken as 100 percent. The graph compares the number of
retrieved objects as the sizes of the query scopes are varied. As shown in the
graph, the percentage bars for the square shape are higher than the
Figure 6.16: Percentage comparison of object retrieval using different sizes of query scopes.
ones for the circle. When the scope distances are 6 and 8, the percentage bars
for both shapes have the same height; this is because, at those distances, the
retrieved objects all lie within both scope shapes.
Multiple Cells Simulation Results
Several simulation experiments imitating query result retrieval from multiple
cells have been performed. The setting for this type of experiment uses 100
synthetic data points representing the locations of static objects separated
across two cells. Table 6.7 shows the setting details.
The user is not moving while retrieving data, and asks for objects within a
radius of 10-18 units. The number of database records is 50 for each cell.
Figure 6.17 depicts the experiment result for object retrieval from multiple cells.
In the figure, the experiments which used a square as a query scope retrieved more
objects than did a circle for each cell. Square1 means that the object retrieval
Table 6.7: Parameter settings - multiple cells

Parameter              Value
Number of BSs          2
BS area (units2)       5000
Query scope (units2)   36-196
User coordinate (x,y)  (45,25)
Number of items        100
Figure 6.17: Comparison of objects retrieved using a square and a circle
is using a square shape to retrieve records within cell one; Circle1 is
analogous, but uses a circle rather than a square. When using a square for each
cell (Square1 and Square2), the performance is better than when using a circle
(Circle1 and Circle2).
Our next discussion compares the implementation and the simulation results. The
percentage retrieved by a circle relative to a square is about 80 percent in the
implementation (refer to Figure 6.3), and the simulation also produces about 80
percent (refer to Figure 6.15). The performance when using a square is thus
similar for the implementation and the simulation. In conclusion, the simulation
package produces results similar to those of the implementation, and can
therefore be used to simulate the rest of our proposed approaches.
6.3.1 Indexing for Multi-Cell Query Processing
This section discusses the simulation experiments for the proposed indexing mech-
anism. The performances of two proposed approaches are studied and their results
are compared with the conventional approach.
For each proposed approach, the performance of three cases is simulated and
studied. The three cases differ in the number of requests and in the load: an
off-load or a high-load situation. An off-load situation is simulated by setting
the query interval time to be greater than the average access time; to simulate
a high-load situation, the interval time between incoming queries is set to less
than the average access time.
Local Index Simulation Results
The experiment settings for all cases of both proposed approaches are similar. In
our study, four cells are used to process the query, three of which behave as
neighbour cells. All cells have the same processing speed. The interval time
between sent queries varies from 0.1 to 1 second. Queries are sent sequentially,
so only one query is processed at a time. To simulate the local index behaviour,
we assume that the number of additional slots available to cache remote data
items is 20 percent of the total slots.
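For illustration, the off-load and high-load settings above can be reproduced with a small discrete-event sketch that replays queries one at a time. The service times and the fraction of remote requests below are illustrative assumptions, not the thesis's measured values:

```python
import random

# Hypothetical timing constants (seconds); the thesis does not publish
# exact service times, so these are illustrative only.
LOCAL_SERVICE = 0.8    # time to answer from the local cell
REMOTE_SERVICE = 1.2   # time to answer via a neighbour cell

def simulate(num_queries, interval, deviation, p_remote=0.25, seed=0):
    """Sequentially replay queries and return the average access time.

    Queries arrive `interval` +/- `deviation` seconds apart and are
    served one at a time, so a query queues behind the previous one.
    """
    rng = random.Random(seed)
    arrival = 0.0
    server_free_at = 0.0
    total = 0.0
    for _ in range(num_queries):
        arrival += max(0.0, rng.gauss(interval, deviation))
        service = REMOTE_SERVICE if rng.random() < p_remote else LOCAL_SERVICE
        start = max(arrival, server_free_at)   # wait until the server is free
        server_free_at = start + service
        total += server_free_at - arrival      # waiting time + service time
    return total / num_queries
```

With an interval of 1 second the server keeps up (off-load); with 0.1 seconds queries queue and the average access time grows, matching the high-load definition above.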
The details for all cases are explained below:
Case 1 compares the performance of the local index with the conventional ap-
proach using the same query interval time. The aim of this case is to show the
average access time for different numbers of requests. The query interval time
used in the simulation is 1 second between queries, with a deviation of 0.1
seconds.
Figure 6.18: Average access time between proposed vs conventional approaches
Figure 6.18 shows the simulated average access time for the conventional and
the proposed local index approaches. In the graph, the conventional approach
took longer to process queries than the proposed local index. Where the graph
for the proposed approach rises above the conventional one, more of the queries
required data items to be retrieved from the neighbour cells, which takes
additional time.
Case 2 compares the performance of the conventional and the proposed local
index approaches in a high-load situation. The aim is to show that the proposed
local index still outperforms the conventional approach under high load. The
query interval time ranges from 0.1 to 1 second with a deviation of 0.1 seconds.
The simulations use 50 and 150 queries.
Figures 6.19 and 6.20 show the average access time in a high-load situation for
50 and 150 requests respectively. In both figures, the query interval time
ranges from 0.1 to 1 second. In most cases, the proposed approach outperforms
the conventional approach, as shown in both figures.
Figure 6.19: Average access time for the proposed Local Index vs the conventional approaches (50 Requests)
Figure 6.19 shows the average access time for 50 requests. The average access
time of one query in this simulation ranges from 0.83 to 1.2 seconds for the
conventional approach, and from 0.81 to 1 second for the proposed approach.
Figure 6.20: Average access time for the proposed Local Index vs the conventional approaches (150 Requests)
Figure 6.20 presents the average access time for 150 requests. The average access
time for both the conventional and the proposed approaches is between 0.89 and
1.1 seconds. The interesting point in this graph is a turning point: the
conventional approach performs slightly better than the proposed approach once
the query interval time exceeds 0.7 seconds. This is because the number of
queries retrieving data items from different cells is then higher than the
number of queries that retrieve data items directly from the local cell.
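The behaviour behind this turning point — answer from the local index when possible, otherwise fetch the entry from a neighbour cell and cache it in the bounded extra slots — can be sketched as follows. The class names and the LRU eviction choice are illustrative assumptions, not the thesis's exact data structure:

```python
from collections import OrderedDict

class LocalIndex:
    """Sketch of the local-index idea: a base station keeps its own
    index plus a small, bounded cache of remote index entries (the
    thesis caps the extra slots at 20 percent of the local slots)."""

    def __init__(self, local_items, extra_ratio=0.2):
        self.local = set(local_items)
        self.capacity = max(1, int(len(local_items) * extra_ratio))
        self.remote_cache = OrderedDict()          # item -> owning cell

    def lookup(self, item, neighbours):
        """Return (answering_cell, was_remote_fetch)."""
        if item in self.local:
            return "local", False
        if item in self.remote_cache:
            self.remote_cache.move_to_end(item)    # refresh recency
            return self.remote_cache[item], False
        # Miss: ask the neighbour cells, then cache the answer.
        for cell, items in neighbours.items():
            if item in items:
                if len(self.remote_cache) >= self.capacity:
                    self.remote_cache.popitem(last=False)   # evict LRU
                self.remote_cache[item] = cell
                return cell, True
        return None, True
```

Under high load with many distinct remote items, the small cache thrashes and most lookups still pay the neighbour-fetch cost, which is consistent with the turning point observed above.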
Global Index Simulation Results
In this experiment, we compare the performance of the global index and conven-
tional approaches. We used two cells, each holding 30 records indexed with an
R-tree structure. Both cells have the same dimensions, 50 x 50 units. The
global index, in contrast, contains the 60 records of both cells, again indexed
with an R-tree structure.
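The global index used here can be pictured as one merged structure that tags every record with its owning cell, grows when a base station comes online and shrinks when one goes offline. The thesis uses an R-tree; the flat list below is a self-contained stand-in, and all names are illustrative:

```python
class GlobalIndex:
    """Minimal stand-in for the global index: one structure holding
    every cell's records tagged with the owning cell. A real deployment
    would use an R-tree; a flat list keeps the sketch self-contained."""

    def __init__(self):
        self.entries = []          # (x, y, record_id, cell_id)

    def merge_cell(self, cell_id, records):
        """Called when a base station comes online and propagates its index."""
        self.entries.extend((x, y, rid, cell_id) for x, y, rid in records)

    def drop_cell(self, cell_id):
        """Called when a base station goes offline (the 'shrinking' step)."""
        self.entries = [e for e in self.entries if e[3] != cell_id]

    def range_query(self, xmin, ymin, xmax, ymax):
        """Answer a multi-cell range query without contacting other cells."""
        return [(rid, cell) for x, y, rid, cell in self.entries
                if xmin <= x <= xmax and ymin <= y <= ymax]
```

A range query over the merged entries answers multi-cell queries locally, which is the source of the speed-up reported below.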
The following cases have been set up for our experiments:
Case 1 compares the average access time between the proposed global index and
the conventional approach for answering one query when the number of requests
varies. The same query is run several times and the average access time is
calculated in order to obtain the access time for a single query.
Figure 6.21: Average access time for a single query
Figure 6.21 presents the average access time taken to answer a single query
when the number of requests varies. In most cases, the average access time for
the conventional approach is about twice that of the proposed global index:
around 6.5 seconds for the conventional approach versus around 2.8 seconds for
the proposed approach.
Case 2 compares the average access time between the proposed global index and
the conventional approach for answering one query where the data items are not
replicated. The setting is similar to the previous scenario, except that the
data items are not replicated to wherever the indexes are replicated.
Figure 6.22 shows the experiment results for a single-query access type. As
shown in the figure, the query access time of the conventional approach is
longer than that of our proposed approach.
Figure 6.22: Average access time for a single query: remote indexes only.
When the number of requests is varied, the average access time to answer a
single query using the conventional approach is mostly about one and a half
times that of the proposed global index: around 6.5 seconds for the conventional
approach versus around 4 seconds for the proposed approach.
6.3.2 Simulation Results for Client Caching
This section describes the simulation results for the proposed client caching
approaches. Table 6.8 lists the settings for the proposed cache replacement
policies. The total number of database records at the server side is 2000. The
server answers 5000 queries, each with a query size of 3 x 3. On the client
side, the cache has 100 slots, where each slot is assumed to hold a single
object. The expected number of objects received for each query is 1-40. The
cached objects form a number of groups, where each group contains the cached
objects within an epsilon range. The minimum points range from 1 to 10, and the
epsilon ranges from 1 to 10 units of distance. The group area is the area
occupied by each group; it is used only by the proposed Cost-based cache
replacement.
Table 6.8: Experiment settings for client cache
Parameter                  Value
DB Records                 2000 records
Query Scope                3 x 3
Cache Size                 100
Group area                 50
Epsilon                    1-10
Minimum points             1-10
Average Requested Objects  1-40
Total Queries              5000
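The epsilon and minimum-points parameters in Table 6.8 correspond to a density-based grouping of the cached objects. A minimal sketch of such a grouping is given below; the treatment of sparse points as singleton groups is our assumption, since the thesis does not spell out how ungrouped objects are handled:

```python
def group_cached_objects(points, eps, min_pts):
    """Density-based grouping of cached objects (DBSCAN-style), matching
    the epsilon / minimum-points parameters of Table 6.8. Returns a list
    of groups, each a sorted list of point indices; sparse points form
    singleton groups so every cached object remains evictable."""
    def near(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        i = unvisited.pop()
        neigh = near(i)
        if len(neigh) < min_pts:            # not dense enough: singleton
            groups.append([i])
            continue
        group = {i}
        frontier = [j for j in neigh if j in unvisited]
        while frontier:
            j = frontier.pop()
            if j not in unvisited:
                continue
            unvisited.discard(j)
            group.add(j)
            nj = near(j)
            if len(nj) >= min_pts:          # j is a core point: expand
                frontier.extend(k for k in nj if k in unvisited)
        groups.append(sorted(group))
    return groups
```

A smaller epsilon yields more, smaller groups, which is consistent with the cache-hit behaviour discussed in the cases below.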
Figure 6.23: Comparison of cache hits with various minimum points on each group
Case 1. The experiment for this case varies the minimum number of points while
the other parameters are held constant. The epsilon value used for this
experiment is 5, the median of the epsilon range.
Figure 6.23 shows a comparison of cache hits where the minimum points range
from 1 to 10. The figure shows that our proposed approaches perform best when
the minimum points value is 9 with an epsilon value of 5. When the minimum
number of points is less than 9, the proposed Path approach performs better
than the other candidates. When the minimum number of points is greater than 9,
the performances of all proposed approaches are equal, and the cache hit rate
is higher than without caching.
Case 2. The aim of the second experiment is to discover whether the minimum
points value has any impact when the total number of requested objects
increases. In this experiment, we use a minimum points value of 5 and the
numbers of requested objects are 10, 20 and 40.
The experiment results for case 2 are shown in Figures 6.24, 6.25 and 6.26.
These three graphs show that cache hit performance drops by about half when the
number of requested objects is doubled. When the number of available slots in
the cache is insufficient, some occupied slots are freed. A higher number of
requested objects therefore requires more available slots; in other words, as
more occupied slots must be emptied, the cache hit efficiency degrades.
Figure 6.24 shows our proposed approaches when the maximum number of requested
objects is 10 and the epsilon value ranges from 1 to 10. The density-based
algorithm outperforms the competing approaches when the minimum number of
points per group is 3, 4 or 5. The path-based algorithm, on the other hand,
outperforms the rest when the epsilon value is higher.
Figure 6.25 shows the experiment results when the maximum number of requested
objects is 20. The performance of the Cost approach is better when the epsilon
is 5. This can be explained by the fact that most of the eliminated cached
objects are rarely requested. In general, a smaller epsilon value increases the
cache hit rate for all proposed approaches.
Figure 6.24: Comparison of cache hits with a maximum min req value of 10.
Figure 6.25: Comparison of cache hits with a maximum min req value of 20.
Figure 6.26: Comparison of cache hits with a maximum min req value of 40.
Figure 6.26 shows the experiment results when the maximum number of requests is
about 60 percent of the total cache slots. As the graph shows, none of our
proposed approaches performs well in general, although they perform better when
the epsilon value is small. This can be explained by the fact that a small
epsilon value causes more groups to be formed; with more groups, fewer objects
are evicted at a time.
6.4 Discussion
Efficient retrieval is essential when answering location-dependent queries,
since such queries are widely used to obtain query results at any time and
anywhere. Without efficient retrieval, it would be difficult to answer these
types of queries.
For server query processing, retrieving rare objects while reducing the number
of queries is beneficial, because it saves the power needed to generate and
transmit queries. Using a square query scope achieves this: it retrieves more
objects than a circle. Furthermore, avoiding reprocessing of the overlap area
when the user misses the query result speeds up query result delivery.
The local indexing (LI) mechanism performs better when most users request
information in the same area, because the server searches only its local index,
which is faster than requesting data from neighbour cells. LI-1, which
replicates the remote data items, performs better than LI-2; however,
replicating remote data items into local storage occupies more space.
The global index (GI) mechanism performs better when most queries request
remote data items. Its index maintenance cost is lower than that of the LI if
most BSs stay online in a stable condition. Furthermore, duplicating remote
data items improves the query response time, although it consumes considerable
space in the local cell.
For grouping in the cache replacement policy, the performance of all policies
depends on the group parameters. The cache hit rate increases when the epsilon
value is small. It decreases when the number of requested objects is large,
because a large number of requested objects consumes a large amount of cache
space, which causes frequent eviction of cached objects.
Among the cache replacement policies, all proposed policies perform the same
when the epsilon value is small, while the path-based policy outperforms the
others for large epsilon values. On average, the density-based policy performs
better than the path-based one. The PDAID policy does not deliver much
performance gain, because it depends on multiple factors. On the other hand,
when the minimum points of each group increase with a constant epsilon value,
the density-based policy is better than the other two candidates, whereas PDAID
performs worst amongst all candidates.
6.5 Conclusion
In this chapter, we have described the performance evaluation of our proposed
approaches, focusing on the physical design, steps and implementations of the
proposed approaches, with object retrieval from a single cell to multiple
cells. The details of the proposed approaches were elaborated in Chapters 3 to
5. The evaluation results have shown the advantages of applying the proposed
algorithms to mobile query processing.
Section 6.1 shows the performance of our proposed approaches as measured
through an implementation. Section 6.2 then presents our proposed approaches
evaluated using a simulation package. At the start of the simulation section,
we demonstrated that the simulation package produces the same results as the
implementation; hence, the simulation was used to evaluate our last two
approaches.
We have tested our algorithms for object retrieval in both single and multiple
cells. The square query scope showed better performance than other shapes: it
occupies less storage space to store the query scope and has a better chance of
retrieving rare objects. In addition, our approach proved efficient when
forming a valid scope boundary on the neighbour cells while retrieving objects
from multiple cells.
In our experiments, we have shown that our proposed indexing mechanisms improve
the speed of the conventional mechanism. The performance tests on indexing
structures in location-dependent query processing indicate that the processing
times of the individual and partial global indexing approaches differ, and the
gap increases as the amount of popular data increases. Moreover, the data
transfer times of individual and partial global indexing show contrasting
differences. Finally, the execution time improves by around 30 percent with the
partial global indexing approach compared with the individual indexing
approach.
For the client cache replacement policies, the performance of our proposed
approaches depends on the grouping policy. Three proposed cache replacement
policies were evaluated in this chapter. In general, their performance is
affected by the number of requested objects and the epsilon value of a group.
Cache hit performance improves when both the number of requested objects and
the epsilon value of a group are small. If the grouping policy puts fewer
cached objects in one group, the proposed approaches increase the cache hit
counter.
Chapter 7
Conclusion and Future Work
7.1 Overview
This thesis investigated mobile query processing at both the server and client
sides. The main purpose of this research was to study the performance of mobile
query processing on both sides and to build on top of traditional query
processing mechanisms so that they become adjustable to the mobile environment.
Attention is focused on three major areas of the query processing scheme: query
processing at the server side, indexing for multi-cell queries, and cache
replacement at the client side. The investigations include developing models
for the number of data items requested, index structures, and cache hit
performance. In addition, performance results were evaluated using both
implementation and simulation.
7.2 Summary of Research Results
The main research result of this thesis shows how mobile query processing can
be done at the server and client sides in a way that maximises system
efficiency and overcomes the limitations of mobile devices. This is achieved by
combining assorted types of query processing mechanisms for the various mobile
queries that may occur.
The first part of our research minimises the number of requests. To achieve
this, various algorithms were designed to deal with situations where mobile
users miss the query result, taking into account the user's location and the
query size.
Indexing is an important problem to solve, especially when servers process multi-
cell queries. The aim is to minimise the number of visited nodes in order to improve
query access time.
Minimising communication cost is always a primary consideration in the mobile
environment. This can be achieved by retrieving requested items from local
storage. Hence, a client cache replacement policy has also been investigated in
this thesis. Another purpose of investigating the client cache replacement
policy is to overcome limitations of mobile devices, in particular small screen
displays and limited storage. Reducing the number of requested objects based on
user satisfaction is the way to handle these limitations.
The achievements of this research are summarised as follows:
• Query Processing at Server Side
The main motivation in proposing server query processing is to process mo-
bile queries by considering the movement factors of mobile users. The aim of
this contribution is to reduce data transfer by retrieving objects that are
located in the direction in which mobile users are travelling. A further
purpose is to retrieve rare objects while reducing the number of requests to
the server.
The query processing mechanisms in this contribution are divided into three
parts: single cell, multiple cells and handling disconnections. We further
divide the single-cell case into three categories: static, dynamic and angle of
movement.
In the static category, the query scope is parallel to the base station
location, and three algorithms were proposed for dealing with this situation,
retrieving objects based on horizontal, vertical and diagonal movement. In the
dynamic approach, the query scope is perpendicular to the mobile user's
direction. The angle-of-movement approach is similar to the diagonal case of
the static approach, except that it focuses on the actual angle of movement of
mobile users.
Multi-cell query processing, on the other hand, focuses on retrieval from
several cells that are covered by a query scope. In this case, we also consider
overlapping and non-overlapping cell areas in order to avoid duplicated
objects. We then modify the single-cell retrieval algorithm to adapt it to
multi-cell query processing.
In handling the disconnection issue, we identified several disconnection
situations and proposed algorithms for each. The aim of this part is to decide
whether a server needs to keep the existing query result or generate a new one.
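The angle-of-movement idea above — shaping the query scope so that retrieved objects lie in the user's direction of travel — can be sketched with a square scope shifted along the heading. Centring the square one half-side ahead of the user is an illustrative choice for this sketch, not the thesis's exact construction:

```python
import math

def movement_scope(x, y, angle_deg, side):
    """Place a square query scope of side `side` ahead of a user at
    (x, y) moving along heading `angle_deg` (degrees, 0 = east).

    Returns the scope as an axis-aligned box (xmin, ymin, xmax, ymax),
    so that retrieved objects lie in the direction of travel.
    """
    ang = math.radians(angle_deg)
    cx = x + (side / 2) * math.cos(ang)   # centre shifted along heading
    cy = y + (side / 2) * math.sin(ang)
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

For a user at (45, 25) moving east with a 6-unit scope, the box extends from the user's position forward to x = 51, covering only objects ahead of the movement.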
• Indexing Mechanisms for Multi-Cell Queries
Some researchers have developed indexing mechanisms for the non-mobile
environment. This thesis did not aim at developing a new indexing structure;
rather, it studied the behaviour of existing index structures and developed new
algorithms that use them to answer multi-cell queries. A multi-cell query is a
query that asks for a certain area covering multiple base stations. The purpose
is to improve query processing time by avoiding the sending of requests to
other cells.
Our indexing mechanisms are twofold: Local and Global indexes. As the names
suggest, the first is an investigation into storing requested remote indexes at
the current server. In this case, we extend the existing index tree of one cell
by adding requested indexes from surrounding cells. The index tree cannot be
expanded to cope with all remote indexes; however, it is allowed to grow to a
certain size. The second approach creates a global index covering the indexes
of all available cells: when a base station comes online, it propagates its
tree to the surrounding cells. In this case, shrinking occurs only when the
base station goes offline. This is a necessary condition to ensure the
consistency of the indexes held in the global index structure.
• Cache Replacement Policies for Client Cache
The last contribution of this thesis is the development of three client cache
replacement policies. We borrow an existing grouping algorithm to group cached
objects into several groups. When the cache needs to free some cached objects,
one of the proposed cache replacement policies can be applied to eliminate a
group of cached objects. The aim is to increase the usage performance of the
client cache and thereby reduce communication costs to the server. Handling the
limitations of small-screen mobile devices is another purpose of this
contribution.
The three cache replacement policies are Path-based, Density-based and
Probability Density Area Inverse Distance (PDAID). The first policy considers
the distances to all groups in order to choose a group of cached objects to
eliminate. The second evicts the group that has the least total number of
objects. The last removes a group based on a cost calculated from several
factors.
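The three policies can be contrasted in a small sketch. The group representation and the PDAID cost formula below are illustrative assumptions — the thesis computes its cost from several weighted factors detailed in the earlier chapters:

```python
import math

def pick_victim_group(groups, user_pos, policy):
    """Choose which group of cached objects to evict.

    Each group is a dict with 'centre' (x, y), 'objects' (a count) and
    'area'. The group layout and the PDAID cost are illustrative only.
    """
    ux, uy = user_pos

    def dist(g):
        gx, gy = g["centre"]
        return math.hypot(gx - ux, gy - uy)

    if policy == "path":
        # Path-based: evict the group farthest from the user's position.
        return max(groups, key=dist)
    if policy == "density":
        # Density-based: evict the group with the fewest cached objects.
        return min(groups, key=lambda g: g["objects"])
    if policy == "pdaid":
        # PDAID: evict the group with the lowest density-per-distance cost.
        return min(groups,
                   key=lambda g: (g["objects"] / g["area"]) / (1 + dist(g)))
    raise ValueError("unknown policy: %s" % policy)
```

Evicting a whole group at once, rather than individual objects, is what ties the policies' performance to the epsilon and minimum-points grouping parameters studied in Chapter 6.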
7.3 Future Research
This section discusses several possible investigations that can be done in mobile
query processing.
Multiple-Source Query Processing is a future topic in which a mobile client
requests data from several sources rather than a single source. The challenge
is how the requested data from several sources are joined for the mobile user:
the join could be performed either on the mobile device or at the server side.
Due to the nature of the mobile environment, several factors must be
considered, such as frequent disconnections, small storage space and low
network bandwidth.
Answering Moving-Object Queries is another future challenge for our research
topic. The problem is that the searched objects dynamically move to other
locations; thus, if we store these objects in a cache, some cached objects may
need to be preserved during cache elimination. Can our proposed approaches be
extended to cope with this situation? A second challenge is indexing moving
objects, which extends our proposed indexing: the challenge is to create an
indexing structure flexible enough to index moving objects.
Continuous Query Processing is another future investigation: extending our
proposed cache replacement policies to answer continuous queries locally. The
problem is how to model a replacement policy that preserves the required cached
objects.
Caching Management for Query Processing. Throughout this thesis, we have not
considered other aspects of caching, such as middleware object caching, which
stores objects that are not database objects. It would, however, be beneficial
to look at storing such objects as part of the implementation.
References
[1] Aberdeen [2005]. The Mobile Field Service Solution Selection Report,
http://www.mobiletechlink.com/. Last accessed: 02/04/08.
[2] Acharya, D., Kumar, V. and Yang, G.-C. [2007]. DAYS mobile: A Location
Based Data Broadcast Service for Mobile Users, SAC ’07: Proceedings of the
2007 ACM Symposium on Applied Computing, ACM, New York, NY, USA,
pp. 901–905.
[3] Aggarwal, C., Wolf, J. and Yu, P. [1999]. Caching on the World Wide Web,
IEEE Transactions on Knowledge and Data Engineering 11(1): 428–441.
[4] Agrawal, D. P. and Zeng, Q.-A. [2006]. Introduction to Wireless and Mobile
Systems, 2nd edn, Thomson Engineering.
[5] Agrawal, P. and Famolari, D. [1999a]. Mobile Computing in Next Generation
Wireless Networks, Proceedings of the 3rd International Workshop on Discrete
Algorithms and Methods for Mobile Computing and Communications pp. 32–39.
[6] Agrawal, P. and Famolari, D. [1999b]. Mobile Computing in Next Generation
Wireless Networks, DIALM ’99: Proceedings of the 3rd International Workshop
on Discrete Algorithms and Methods for Mobile Computing and Communica-
tions, ACM Press, pp. 32–39.
232
REFERENCES 233
[7] Ahamad, M. [1999]. Scalable Consistency Protocols For Distributed Services,
IEEE Transactions on Parallel and Distributed Systems 10(9): 888–903.
[8] Akbarinia, R., Martins, V., Pacitti, E. and Valduriez, P. [2007]. Top-K Query
Processing in the APPA P2P System, 7th International Conference on High
Performance Computing for Computational Science, Vol. 4395 of Lecture Notes
in Computer Science, SPRINGER, pp. 158–171.
[9] Akbarinia, R., Pacitti, E. and Valduriez, P. [2006]. Reducing Network Traffic in
Unstructured P2P Systems Using Top-K Queries, Distrib. Parallel Databases
19(2-3): 67–86.
[10] Barbara, D. and Imielinski, T. [1994]. Sleepers and Workaholics: Caching
Strategies for Mobile Environments, SIGMOD ’94: Proceedings of the 1994
ACM SIGMOD International Conference on Management of Data, ACM,
pp. 1–12.
[11] Beckmann, N., Kriegel, H.-P., Schneider, R. and Seeger, B. [1990]. The R*-tree:
An Efficient and Robust Access Method for Points and Rectangles, SIGMOD
’90: Proceedings of the 1990 ACM SIGMOD International Conference on Man-
agement of Data, ACM Press, New York, NY, USA, pp. 322–331.
[12] Benetis, R., Jensen, C., Karciauskas, G. and Altenis, S. [2002]. Nearest Neigh-
bour and Reverse Nearest Neighbour Queries for Moving Objects, International
Database Engineering and Applications Symposium pp. 44–53.
[13] Bentley, J. L. and Friedman, J. H. [1979]. Data Structures for Range Searching,
ACM Computing Surveys 11(4): 397–409.
[14] Bluetooth [2008]. http://www.bluetooth.com. Last accessed: 02/04/08.
REFERENCES 234
[15] Bruno, N., Gravano, L. and Marian, A. [2002]. Evaluating Top-K Queries
Over Web-Accessible Databases, Proceedings of 18th International Conference
on Data Engineering pp. 369–380.
[16] Burak, A. and Sharon, T. [2004]. Usage Patterns of FriendZone: Mobile
Location-Based Community Services, MUM ’04: Proceedings of the 3rd Inter-
national Conference on Mobile and Ubiquitous Multimedia, ACM, New York,
NY, USA, pp. 93–100.
[17] Cai, Y. and Hua, K. A. [2002]. An Adaptive Query Management Technique for
Real-Time Monitoring of Spatial Regions in Mobile Database Systems, PCC
’02: Proceedings of the Performance, Computing, and Communications Confer-
ence, 2002. on 21st IEEE International, IEEE Computer Society, Washington,
DC, USA, pp. 259–266.
[18] Chand, N., Joshi, R. and Misra, M. [2006]. Data Profit Based Cache Replace-
ment in Mobile Environment, IFIP International Conference on Wireless and
Optical Communications Networks.
[19] Cho, S. G., Jeong, H. K. and Ma, J. S. [2003]. Performance Optimization
Technique of Location Registration in Public Transportation, Mobile Commu-
nications: 7th CDMA International Conference, pp. 49–69.
[20] Chrysanthis, P. K. and Pitoura, E. [2000]. Mobile and Wireless Database Access
for Pervasive Computing, Proceedings of the 16th International Conference on
Data Engineering, pp. 694–695.
[21] Clarke, I., Sandberg, O., Wiley, B. and Hong, T. W. [2001]. Freenet: A Dis-
tributed Anonymous Information Storage and Retrieval System, International
REFERENCES 235
Workshop on Designing Privacy Enhancing Technologies, Springer-Verlag New
York, Inc., New York, NY, USA, pp. 46–66.
[22] Dar, S., Franklin, M. J., Jonsson, B. T., Srivastava, D. and Tan, M. [1996].
Semantic Data Caching and Replacement, Proceedings of the 22th International
Conference on Very Large Data Bases (VLDB ’96), Mumbai (Bombay), India,
pp. 330–341.
[23] DasBit, S. and Mitra, S. [2003]. Challenges of Computing in Mobile Cellular
Environment: A Survey, Computer Communications 26(1): 2090–2105.
[24] Davis, W. [2001]. Motorola-Wireless Technology Trends,
http://www.ecedha.org/2000-01/agenda.html. Last accessed: 02/04/08.
[25] Deng, B., Jia, Y. and Yang, S. [2006]. Supporting Efficient Distributed Top-K
Monitoring, WAIM, pp. 496–507.
[26] DeRose, J. F. [2002]. The Wireless Data Handbook, 4th edn, Wiley-Interscience,
chapter 6.
[27] Ding, R. and Meng, X. [2001]. A Quadtree Based Dynamic Attribute Index
Structure and Query Process, Proceedings. 2001 International Conference on
Computer Networks and Mobile Computing pp. 446–451.
[28] Ding, X., Lu, Y., Ding, X., Zhao, N. and Wei, Q. [2007]. An Efficient Index for
Moving Objects with Frequent Updates, WiCom 2007. International Confer-
ence on Wireless Communications, Networking and Mobile Computing, 2007.
pp. 5946–5949.
[29] Dulaney, J. [2008]. The Evolvement of 3G Mobile: Introduction of Third Gener-
ation Cell Phones, http://www.planetomni.com/ARTICLES-The-Evolvement-
of-3G-Mobile.shtml. Last accessed: 02/04/08.
REFERENCES 236
[30] Dunham, M. H. and Kumar, V. [1999]. Impact of Mobility on Transaction
Management, MobiDe ’99: Proceedings of the 1st ACM International Workshop
on Data Engineering for Wireless and Mobile Access, ACM Press, pp. 14–21.
[31] El-Ghazaly, S. and Golio, M. [1996]. Challenges in Modern Wireless Personal
Communications, Radio Science Conference, 1996. 29: 39–51.
[32] Elmasri, R. and Navathe, S. [2004]. Fundamentals of Database Systems, 4th
edn, Addison-Wesley.
[33] Engerman, G. and Kearney, L. [1998]. Effective Use of Wireless Data Commu-
nications, International Journal Of Network Management 8: 2–11.
[34] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. [1996]. A Density-Based Al-
gorithm for Discovering Clusters in Large Spatial Databases with Noise, Pro-
ceedings of Second International Conference on Knowledge Discovery and Data
Mining, pp. 226–231.
[35] Feuerstein, M. and Rappaport, T. [1993]. Wireless Personal Communications,
Kluwer Academic Publishers.
[36] Franklin, M. and Carey, M. [1992]. Client-Server Caching Revisited, Proceed-
ings of the International Workshop on Distributed Object Management, ACM,
pp. 57–78.
[37] Gaede, V. and Gunther, O. [1998]. Multidimensional Access Methods, ACM
Computing Surveys 30(2): 170–231.
[38] Gast, M. [2005]. 802.11 Wireless Networks: The Definitive Guide, 2nd edn,
OReilly & Associates, Inc.
REFERENCES 237
[39] Gu, H., Shi, Y., Xu, G. and Chen, Y. [2005]. A Core Model Support-
ing Location-Aware Computing in Smart Classroom, Advances in Web-Based
Learning - ICWL 2005, 4th International Conference, Vol. 3583 of Lecture
Notes in Computer Science, SPRINGER, pp. 1–13.
[40] Guo, J., Guo, W. and Zhou, D. [2006]. Indexing of Constrained Moving Ob-
jects for Current and Near Future Positions in GIS, First International Multi-
Symposiums on Computer and Computational Sciences (IMSCCS ’06) 2: 504–
509.
[41] Guttman, A. [1984]. A Dynamic Index Structure for Spatial Searching, Pro-
ceedings of the 1984 ACM SIGMOD International Conference on Management
of data, ACM, pp. 47–57.
[42] Hadjieleftheriou, M., Kollios, G., Gunopulos, D. and Tsotras, V. J. [2003].
On-Line Discovery of Dense Areas in Spatio-Temporal Databases, Advances in
Spatial and Temporal Databases, LNCS 2750, pp. 306–324.
[43] He, Y., Shu, Y., Wang, S. and Du, X. [2004]. Efficient Top-K Query Processing
in P2P Network, Database and Expert Systems Applications, 15th International
Conference, DEXA 2004, Vol. 3180 of Lecture Notes in Computer Science,
SPRINGER, pp. 381–390.
[44] Helal, A., Haskell, B., Carter, J. L., Brice, R., Woelk, D. and Rusinkiewicz,
M. [2002]. Any Time, Anywhere Computing: Mobile Computing Concepts and
Technology, Vol. 522, Springer Netherlands, chapter 1-2.
[45] Hosbond, J., Saltenis, S. and Ortoft, R. [2003]. Indexing Uncertainty of Contin-
uously Moving Objects, Database and Expert Systems Applications pp. 911–915.
REFERENCES 238
[46] Hu, H., Xu, J., Wong, W. S., Zheng, B., Lee, D. L. and Lee, W.-C. [2005].
Proactive Caching for Spatial Queries in Mobile Environments, Proceedings of
the 21st International Conference on Data Engineering (ICDE ’05), pp. 403–
414.
[47] Hu, J., Xu, J., Lee, D. and Lee, W. [2004]. Performance Evaluation of an
Optimal Cache Replacement Policy for Wireless Data Dissemination, IEEE
Transactions on Knowledge and Data Engineering 16(1): 125–139.
[48] Hung, H.-P., Chuang, K.-T. and Chen, M.-S. [2007]. Efficient Process of Top-K
Range-Sum Queries over Multiple Streams with Minimized Global Error, IEEE
Transactions on Knowledge and Data Engineering 19(10): 1404–1419.
[49] HUTCHISON [2006]. Third Generation Mobile Phones,
http://www.three.com/. Last accessed: 02/04/08.
[50] Imielinski, T. and Badrinath, B. [1992]. Querying in Highly Mobile Distributed
Environments, Proceedings of the 18th Very Large Data Bases Conference,
pp. 41–52.
[51] Jing, J., Helal, A. and Elmagarmid, A. [1999]. Client-Server Computing in
Mobile Environments, ACM Computing Surveys 31(2): 117–157.
[52] Keller, A. M. and Basu, J. [1996]. A Predicate Based Caching Scheme for
Client-Server Database Architectures, The VLDB Journal 5(2): 35–47.
[53] Kim, Y. K. and Prasad, R. [2006]. 4G Roadmap and Emerging Communication
Technologies, Artech House Publishers.
[54] Knuth, D. [1997]. Sorting and Searching, Vol. 3, 3rd edn, Addison-Wesley.
[55] Kollios, G., Gunopulos, D. and Tsotras, V. J. [1999]. On Indexing Mobile
Objects, PODS ’99: Proceedings of The Eighteenth ACM SIGMOD-SIGACT-
SIGART Symposium on Principles of Database Systems, ACM, New York,
USA, pp. 261–272.
[56] Küpper, A. [2005]. Location-Based Services: Fundamentals and Operation, John
Wiley & Sons Ltd.
[57] Kumar, A., Misra, M. and Sarje, A. K. [2006]. A Predicted Region Based
Cache Replacement Policy for Location Dependent Data in Mobile Environ-
ment, WiCOM 2006:International Conference on Wireless Communications,
Networking and Mobile Computing, pp. 1–4.
[58] Kumar, A., Misra, M. and Sarje, A. K. [2007]. A Weighted Cache Replace-
ment Policy for Location Dependent Data in Mobile Environments, SAC ’07:
Proceedings of the 2007 ACM symposium on Applied computing, ACM Press,
pp. 920–924.
[59] Kwon, D., Lee, S. and Lee, S. [2002]. Indexing the Current Positions of Moving
Objects Using the Lazy Update R-tree, MDM ’02: Proceedings of the Third In-
ternational Conference on Mobile Data Management, IEEE Computer Society,
Washington, DC, USA, pp. 113–120.
[60] Lai, K. Y., Tari, Z. and Bertok, P. [2004a]. Mobility-Aware Cache Replacement
for Users of Location-Dependent Services, Technical report, RMIT School of CS
& IT.
[61] Lai, K. Y., Tari, Z. and Bertok, P. [2004b]. Mobility-Aware Cache Replacement
for Users of Location-Dependent Services, LCN ’04: Proceedings of the 29th
Annual IEEE International Conference on Local Computer Networks, pp. 50–
58.
[62] Lee, C.-I. and Tsai, C.-J. [2001]. An Efficient Approach to Extracting and
Ranking The Top-K Interesting Target Ranks From Web Search Engines, In-
formatica (Slovenia) 25(3).
[63] Lee, D. L., Xu, J., Zheng, B. and Lee, W.-C. [2002]. Data Manage-
ment in Location-Dependent Information Services, IEEE Pervasive Computing
1(3): 65–72.
[64] Lee, K. C., Lee, W.-C., Zheng, B. and Xu, J. [2006]. Caching Complementary
Space for Location-Based Services, Advances in Database Technology - EDBT
2006, LNCS 3896/2006, pp. 1020–1038.
[65] Li, Z., He, P. and Lei, M. [2005]. Research of Semantic Caching for Location
Dependent Query in Mobile Network, ICEBE ’05: Proceedings of the IEEE
International Conference on e-Business Engineering, IEEE Computer Society,
Washington, DC, USA, pp. 511–517.
[66] Lim, S. Y., Taniar, D. and Srinivasan, B. [2005]. On-Mobile Query Process-
ing Incorporating Multiple Non-Collaborative Servers, Ingenierie des Systemes
d’Information 10(5): 9–38.
[67] Liu, Z. [2005]. Dynamical Mobile Terminal Location Registration in Wireless
PCS Networks, IEEE Transactions on Mobile Computing 4(6): 630–640.
[68] Lo, E., Mamoulis, N., Cheung, D., Ho, W. and Kalnis, P. [2003]. Processing
Ad-Hoc Joins on Mobile Devices, Technical report, The University of Hong
Kong.
[69] Lo, E., Mamoulis, N., Cheung, D. W.-L., Ho, W.-S. and Kalnis, P. [2004].
Processing Ad-Hoc Joins on Mobile Devices, Database and Expert Systems Ap-
plications, 15th International Conference, DEXA 2004, Vol. 3180 of Lecture
Notes in Computer Science, pp. 611–621.
[70] Lodge, J. H. [1991]. Mobile Satellite Communications Systems - Toward Global
Personal Communications, IEEE Communications Magazine 29: 24–30.
[71] Lunde, T. and Mjøvik, E. [2000]. Mobile Communication Technologies: Tech-
nical Capabilities and Time-to-Market, Technical Report IMEDIA/01/00, Nor-
wegian Computing Center.
[72] Manesis, T. and Avouris, N. [2005]. Survey of Position Location Techniques
in Mobile Systems, Proceedings of the 7th International Conference on Human
Computer Interaction With Mobile Devices and Services pp. 291–294.
[73] Markopoulos, A., Pissaris, P., Kyriazakos, S. and Sykas, E. [2004]. Efficient
Location-Based Hard Handoff Algorithms for Cellular Systems, NETWORK-
ING 2004, Networking Technologies, Services, and Protocols; Performance of
Computer and Communication Networks; Mobile and Wireless Communica-
tions pp. 476–489.
[74] Marsit, N., Hameurlain, A., Mammeri, Z. and Morvan, F. [2005]. Query Pro-
cessing in Mobile Environments: a Survey and Open Problems, 1st Inter-
national Conference on Distributed Frameworks for Multimedia Applications
pp. 150–157.
[75] mathsisfun.com [2006]. Area of Plane Shapes,
http://www.mathsisfun.com/area.html. Last accessed: 02/04/08.
[76] Matsunam, H., Terada, T. and Nishio, S. [2005]. A Query Processing Mecha-
nism for Top-K Query in P2P Networks, 21st International Conference on Data
Engineering Workshops pp. 1240–1244.
[77] Metwally, A., Agrawal, D. and Abbadi, A. E. [2005]. Efficient Computation
of Frequent and Top-K Elements in Data Streams, Proceedings of the 10th
International Conference on Database Theory (ICDT ’05), Vol. 3363 of Lecture
Notes in Computer Science, Springer, pp. 398–412.
[78] Michel, S., Triantafillou, P. and Weikum, G. [2005]. KLEE: A Framework
for Distributed Top-K Query Algorithms, Proceedings of the 31st International
Conference on Very Large Data Bases, pp. 637–648.
[79] Mobile Computing & Wireless LANs [2001].
http://www.mobileinfo.com/Wireless LANs/index.htm. Last accessed:
02/04/08.
[80] Nelson, R. C. and Samet, H. [1986]. A Consistent Hierarchical Representa-
tion for Vector Data, Proceedings of the 13th Annual Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH ’86), ACM Press, New York,
NY, USA, pp. 197–206.
[81] Overview of Wireless Technologies [2004]. http://wireless.utk.edu/overview.html.
Last accessed: 02/04/08.
[82] Parry, R. [2002]. Overlooking 3G, IEEE Potentials 21(4): 6–9.
[83] Peng, W.-C. and Chen, M.-S. [2005]. Query Processing in A Mobile Computing
Environment: Exploiting The Features of Asymmetry, IEEE Transactions on
Knowledge and Data Engineering 17(7): 982–996.
[84] Perry, M., O’hara, K., Sellen, A., Brown, B. and Harper, R. [2001]. Dealing
with Mobility: Understanding Access Anytime, Anywhere, ACM Transactions
on Computer-Human Interaction (TOCHI) 8(4): 323–347.
[85] Pfoser, D. and Jensen, C. [2001]. Querying The Trajectories of On-Line Mo-
bile Objects, Proceedings of the 2nd ACM International Workshop on Data
Engineering for Wireless and Mobile Access, ACM, pp. 66–73.
[86] Pfoser, D., Jensen, C. S. and Theodoridis, Y. [2000]. Novel Approaches in Query
Processing for Moving Object Trajectories, Proceedings of 26th International
Conference on Very Large Data Bases (VLDB ’00), pp. 395–406.
[87] Pissinou, N., Makki, K. and Campbell, W. J. [1999]. On The Design of a Loca-
tion and Query Management Strategy for Mobile and Wireless Environments,
Computer Communications 22(7): 651–666.
[88] Pitoura, E. and Samaras, G. [1998]. Data Management for Mobile Computing,
Kluwer Academic Publishers, London.
[89] Planimate© Website [2007]. http://www.planimate.com/. Last accessed:
02/04/08.
[90] Porkaew, K., Lazaridis, I. and Mehrotra, S. [2001]. Querying Mobile Objects
in Spatio-Temporal Databases, Proceedings of the 7th International Symposium
on Advances in Spatial and Temporal Databases (SSTD ’01), Springer-Verlag,
London, UK, pp. 59–78.
[91] Prabhakar, S., Xia, Y., Kalashnikov, D., Aref, W. and Hambrusch, S. [1999].
Query Indexing and Velocity Constrained Indexing: Scalable Techniques for
Continuous Queries on Moving Objects, IEEE Transactions on Computers
51(10): 1124–1140.
[92] Priyantha, N. B., Chakraborty, A. and Balakrishnan, H. [2000]. The Cricket
Location-Support System, Proceedings of the 6th Annual International Confer-
ence on Mobile Computing and Networking (MobiCom ’00), ACM, New York,
NY, USA, pp. 32–43.
[93] Ramakrishnan, R. and Gehrke, J. [2002]. Database Management Systems, 3rd
edn, McGraw-Hill Science/Engineering/Math.
[94] Ren, Q. and Dunham, M. [2000]. Using Semantic Caching to Manage Loca-
tion Dependent Data in Mobile Computing, Proceedings of the Sixth annual
International Conference on Mobile Computing and Networking pp. 210–221.
[95] Ren, Q. and Dunham, M. H. [1999]. Using Clustering for Effective Manage-
ment of A Semantic Cache in Mobile Computing, Proceedings of the 1st ACM
International Workshop on Data Engineering for Wireless and Mobile Access
(MobiDe ’99), ACM Press, New York, NY, USA, pp. 94–101.
[96] Roussopoulos, N., Kelley, S. and Vincent, F. [1995]. Nearest Neighbour Queries,
Proceedings of the 1995 ACM SIGMOD International Conference on Manage-
ment of Data (SIGMOD ’95), ACM, New York, NY, USA, pp. 71–79.
[97] Samet, H. [1984]. The Quadtree and Related Hierarchical Data Structures,
ACM Computing Surveys 16(2): 187–260.
[98] Samet, H. [1988]. Hierarchical Representations of Collections of Small Rectan-
gles, ACM Computing Surveys 20(4): 271–309.
[99] Sellis, T., Roussopoulos, N. and Faloutsos, C. [1987]. The R+-Tree: A Dynamic
Index for Multi-Dimensional Objects, Proceedings of the 13th Very Large Data
Bases Conference, pp. 507–518.
[100] Seydim, A., Dunham, M. and Kumar, V. [2001]. Location Dependent Query
Processing, Proceedings of the 2nd ACM International Workshop on Data En-
gineering for Wireless and Mobile Access, ACM, pp. 47–53.
[101] Shrestha, A. and Xing, L. D. [2007]. A Performance Comparison of Different
Topologies for Wireless Sensor Networks, 2007 IEEE Conference on Technolo-
gies for Homeland Security, pp. 280–285.
[102] Sistla, A. P., Wolfson, O., Chamberlain, S. and Dao, S. [1998]. Querying
The Uncertain Position of Moving Objects, Temporal Databases: Research and
Practice, LNCS 1399, pp. 310–337.
[103] Stanoi, I., Agrawal, D. and Abbadi, A. E. [2000]. Reverse Nearest Neighbor
Queries for Dynamic Databases, ACM SIGMOD Workshop on Research Issues
in Data Mining and Knowledge Discovery, pp. 44–53.
[104] Su, C. and Tassiulas, L. [2000]. Joint Broadcast Scheduling and User’s Cache
Management for Efficient Information Delivery, Wireless Networks 6(4): 279–
288.
[105] Tao, Y., Papadias, D. and Sun, J. [2003]. The TPR*-Tree: An Optimized
Spatio-Temporal Access Method for Predictive Queries, VLDB, pp. 790–801.
[106] Tari, Z., Hamidjaja, H. and Lin, Q. T. [2000]. Cache Management in CORBA
Distributed Object Systems, IEEE Transactions on Parallel and Distributed
Technology 8(3): 48–55.
[107] Tayeb, J., Ulusoy, O. and Wolfson, O. [1998]. A Quadtree-Based Dynamic
Attribute Indexing Method, The Computer Journal 41(3): 185–200.
[108] The IEEE 802.11 Standards [2008]. http://standards.ieee.org/getieee802/802.11.html.
Last accessed: 02/04/08.
[109] Theodoridis, Y. and Sellis, T. K. [1994]. Optimization Issues in R-tree
Construction (Extended Abstract), Proceedings of the International Workshop
on Advanced Information Systems (IGIS ’94), Springer-Verlag, London, UK,
pp. 270–273.
[110] Toh, C.-K. and Li, V. [1998]. Satellite ATM Network Architectures: An
Overview, IEEE Network 12(5): 61–71.
[111] Trajcevski, G., Wolfson, O., Hinrichs, K. and Chamberlain, S. [2004]. Man-
aging Uncertainty in Moving Objects Databases, ACM Trans. Database Syst.
29(3): 463–507.
[112] Tsalgatidou, A., Veijalainen, J., Markkula, J., Katasonov, A. and Had-
jiefthymiades, S. [2003]. Mobile E-Commerce and Location-Based Services:
Technology and Requirements, In Proceedings of the 9th Scandinavian Research
Conference on Geographical Information Services pp. 1–4.
[113] Waluyo, A., Srinivasan, B. and Taniar, D. [2005]. Research on Location-
Dependent Queries in Mobile Databases, International Journal on Computer
Systems: Science and Engineering 20(3): 77–93.
[114] Wang, J. [1999]. A Survey of Web Caching Schemes for The Internet, SIG-
COMM Comput. Commun. Rev. 29(5): 36–46.
[115] Wang, W., Yang, J. and Muntz, R. [2000]. PK-tree: A Spatial Index Struc-
ture for High Dimensional Point Data, Information Oganization and Databases:
Foundations of Data Organization, Kluwer Academic Publishers, Norwell, MA,
USA, pp. 281–293.
[116] Wang, W., Yang, J. and Muntz, R. R. [1997]. STING: A Statistical Informa-
tion Grid Approach to Spatial Data Mining, Proceedings of 23rd International
Conference on Very Large Data Bases (VLDB ’97), pp. 186–195.
[117] Want, R., Schilit, B., Adams, N., Gold, R., Petersen, K., Goldberg, D., Ellis,
J. and Weiser, M. [1996]. The ParcTab Ubiquitous Computing Experiment,
Kluwer Academic Publishers, Boston.
[118] Ward, A., Jones, A. and Hopper, A. [1997]. A New Location Technique for
the Active Office, IEEE Journal Personal Communications 4(5): 42–47.
[119] Wireless Indoor Positioning System (WIPS) - Technical Documentation [2007].
http://www.tslab.ssvl.kth.se/csd/projects/0012/technical.pdf. Last accessed:
10/10/2007.
[120] Wu, M., Xu, J., Tang, X. and Lee, W.-C. [2007]. Top-K Monitoring in
Wireless Sensor Networks, IEEE Transactions on Knowledge and Data
Engineering 17(7): 962–976.
[121] Xia, Y. and Prabhakar, S. [2003]. Q+Rtree: Efficient Indexing for Moving Ob-
ject Databases, Proceedings of the Eighth International Conference on Database
Systems for Advanced Applications (DASFAA ’03), IEEE Computer Society,
Washington, DC, USA, pp. 175–182.
[122] Xie, T., Sha, C., Wang, X. and Zhou, A. [2006]. Approximate Top-K Struc-
tural Similarity Search over XML Documents, Frontiers of WWW Research
and Development - APWeb 2006, 8th Asia-Pacific Web Conference, Vol. 3841
of Lecture Notes in Computer Science, Springer, pp. 319–330.
[123] Xu, J., Lee, W.-C. and Tang, X. [2004]. Exponential Index: A Parameterized
Distributed Indexing Scheme for Data on Air, Proceedings of the 2nd Inter-
national Conference on Mobile Systems, Applications, and Services (MobiSys
’04), ACM, pp. 153–164.
[124] Xu, J., Zheng, B., Lee, W.-C. and Lee, D. L. [2004]. The D-tree: An Index
Structure for Planar Point Queries in Location-Based Wireless Services, IEEE
Transactions on Knowledge and Data Engineering, Vol. 16, pp. 1526–1542.
[125] Xu, Z., Hu, Y. and Bhuyan, L. [2004]. Exploiting Client Cache: A Scalable
and Efficient Approach to Build Large Web Cache, Proceedings of the 18th
International Conference on Parallel and Distributed Processing Symposium
pp. 55–65.
[126] Yin, L., Cao, G. and Cai, Y. [2005]. A Generalized Target-Driven Cache Re-
placement Policy for Mobile Environments, Journal of Parallel and Distributed
Computing 65(5): 583–594.
[127] Zaslavsky, A. and Tari, Z. [1998]. Mobile Computing: Overview and Current
Status, Australian Computer Journal 30(2): 42–52.
[128] Zheng, B. and Lee, D. [2001a]. Processing Location-Dependent Queries in a
Multi-Cell Wireless Environment, Proceedings of the ACM International Work-
shop on Data Engineering for Wireless and Mobile Access pp. 54–65.
[129] Zheng, B. and Lee, D. L. [2001b]. Semantic Caching in Location-Dependent
Query Processing, SSTD ’01: Proceedings of the 7th International Symposium
on Advances in Spatial and Temporal Databases, Springer-Verlag, London, UK,
pp. 97–116.
[130] Zheng, B., Xu, J. and Lee, D. [2002]. Cache Invalidation and Replacement
Strategies for Location-Dependent Data in Mobile Environments, IEEE Trans-
actions on Computers 51(10): 1141–1153.
Appendix A
Implementation Model
This chapter describes our implementation in more detail. Our implementation has
two major parts: the Location Generator and the Proposed Algorithms. The location
generator generates objects' locations for our experiments; the locations of all objects
produced by this generator are used in the second part of our implementation.
The second part consists of our proposed algorithms, which are categorised into
three major parts corresponding to our proposed approaches, as previously described
in the last three sections.
A.1 Location Generator
We developed a generator to create a list of objects’ locations. It produces the
number of objects’ locations in two dimensional coordinates and stores these in a
text file. Every location of an object is stored in one line, which is presented in
format x,y, and ended by a newline character for every line. The generated data is
used for all our experiments.
This generator is very simple: it contains two parts, initialisation and object
location generation. In the first part, the generator is initialised. Several variables
adjust its settings: the number of base stations, the number of objects in every
base station, and the dimensions of every base station. These parameters can be
adjusted during the experiments. The next step is the construction of base
station boundaries. The boundary of a base station is a square extending
from the bottom-left point (xmin, ymin) to the top-right point (xmax, ymax). The
x and y coordinates are positive values starting from zero. The value of
the top-right point is calculated by adding the components of the assigned square
dimension. This process keeps constructing base station boundaries until the
total number of base stations assigned by the user has been reached.
Once the initialisation process has been completed, the generator starts gen-
erating objects' locations in two-dimensional coordinates. First, we generate two
numbers, x and y, which represent an object's location in two-dimensional coordi-
nates. The data type used for these numbers is double. The reason for representing
the values of x and y as doubles is to obtain more precise object locations, since
a double is a 64-bit floating-point primitive. In our generator, an object's
location is unique, which implies that there is only one object at each location.
Then, we develop a function to generate the two numbers randomly. By using built-
in functions from the Java library, this step is easily implemented. First, we initialise
a seed with the current time in milliseconds, then a number is generated randomly
from the seed. The second number is generated using the same process. Once both
numbers have been generated, we store them into an array of objects, where each
object is an object's location, that is, the two generated numbers. Before the
generated numbers are stored into the array, they are verified against all elements
in the array. If they are exactly the same as one of the existing elements, both
numbers are rejected and new ones are generated. Otherwise, we store the generated
data into a text file whose format is similar to CSV (Comma Separated Values),
except that a space is used to separate the x-coordinate and y-coordinate values
on each line.
Table A.1: Snapshot of our Generated Data

    x-coordinate          y-coordinate
    5867.824581439626     3746.3106325679696
    1083.199667249632     1953.7615137872099
    5798.697361245137     3744.3854918256857
    8423.345028973014     9424.820401856894
    3657.506339333300     1227.9091739540104
    7617.495384068737     7649.012881643699
    4951.797251460664     6362.277490934845
    9966.001968430934     6287.970096553741
    8790.675769028454     28.10245098351927
    1470.307894133284     1908.2850624905311
    8094.432129404925     9347.009729539019
    962.5368928828126     213.47646274463617
    9657.234208782298     1632.0546402032621
A sample of the data file generated by our location generator is presented in
Table A.1. The last part of the generator continues producing objects' locations
until the total number of objects required by the user has been reached.
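The generation-and-rejection loop described above can be sketched as follows. This is an illustrative sketch only: the class name PointGenerator and its methods are our own inventions, not the thesis implementation, and a List stands in for the array of objects.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the generator: draw unique (x, y) pairs of type
// double inside one base-station square and emit them one pair per line.
public class PointGenerator {
    private final Random random;
    private final double width, height;

    public PointGenerator(double width, double height) {
        // Seeded from the current time in milliseconds, as in the thesis.
        this.random = new Random(System.currentTimeMillis());
        this.width = width;
        this.height = height;
    }

    public List<Point2D.Double> generate(int count) {
        List<Point2D.Double> points = new ArrayList<>();
        while (points.size() < count) {
            Point2D.Double p = new Point2D.Double(
                    random.nextDouble() * width,
                    random.nextDouble() * height);
            // Reject duplicates so every location holds exactly one object.
            if (!points.contains(p)) {
                points.add(p);
            }
        }
        return points;
    }

    // One object per line, x and y separated by a space.
    public static String toLine(Point2D.Double p) {
        return p.x + " " + p.y;
    }
}
```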
A.2 Implementation for Query Processing in a Single Cell
Our first implementation simulates query processing in a single cell. The aim
of this experiment is to compare the performance of circles and squares in
retrieving the largest number of objects. In this implementation, we make the
following simplifying assumptions:

• Multiple user queries are not processed at once

• Disconnections between a mobile user and a base station are ignored, since
they can be tolerated

• The communication protocol between client and server is a standard protocol

• Objects' locations are static

• The user always requests a certain area within the cell in which the user is
currently located
Our implementation gives users the flexibility to choose the total number
of database records they want, the location of the mobile client and its velocity.
Table A.2 shows the parameter values used in our implementation.
Table A.2: Settings for implementation 1

    Parameter           Values
    Database records    250,000 - 1,250,000
    BS dimension        10,000 x 10,000
    Searching distance  500 - 2500
    Shape used          Circle, square
    Speed               0, 50
    Direction           horizontal, vertical and diagonal
Once users have passed these values to our simulation, the simulation assigns
them to the appropriate variables. After all parameter values have been assigned,
we create a base station boundary and a query scope. Then, the simulation
retrieves all records from the chosen database and stores them in an array. The
retrieved records are valid for this base station since they were generated by our
generator.
The next step is to find the records that belong to the query scope. Recall that
our proposed approaches retrieve only records that have not yet been passed.
Thus, we divide the query scope into four equal regions. To simplify our discussion,
the regions are numbered anti-clockwise starting from the top right. Regions are
selected by examining the velocity entered by the user. The velocity consists
of two elements, X and Y. Both elements are positive if the client travels towards
the north east; in contrast, both are negative if the client travels towards the
south west. Once the travel direction has been identified, the regions can be
selected. If the travel direction is east, the selected regions are one and four. The
full complexity of region selection can be seen in the case study in Chapter 2.
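The region selection described above can be sketched as follows, where region 1 is the top right and numbering proceeds anti-clockwise. The class RegionSelector, its enum, and the sign-based pruning rule (each non-zero velocity component discards the half of the scope behind the movement) are illustrative assumptions, one plausible generalisation of the east → regions 1 and 4 example, rather than the thesis code.

```java
import java.util.EnumSet;

// Illustrative sketch of region selection from the velocity vector.
// Regions: 1 = TOP_RIGHT, 2 = TOP_LEFT, 3 = BOTTOM_LEFT, 4 = BOTTOM_RIGHT
// (numbered anti-clockwise starting from the top right).
public class RegionSelector {
    public enum Region { TOP_RIGHT, TOP_LEFT, BOTTOM_LEFT, BOTTOM_RIGHT }

    public static EnumSet<Region> select(double vx, double vy) {
        // Start with all four regions; prune the half left behind by movement.
        EnumSet<Region> selected = EnumSet.allOf(Region.class);
        if (vx > 0)  // moving east: drop the left half
            selected.removeAll(EnumSet.of(Region.TOP_LEFT, Region.BOTTOM_LEFT));
        if (vx < 0)  // moving west: drop the right half
            selected.removeAll(EnumSet.of(Region.TOP_RIGHT, Region.BOTTOM_RIGHT));
        if (vy > 0)  // moving north: drop the bottom half
            selected.removeAll(EnumSet.of(Region.BOTTOM_LEFT, Region.BOTTOM_RIGHT));
        if (vy < 0)  // moving south: drop the top half
            selected.removeAll(EnumSet.of(Region.TOP_LEFT, Region.TOP_RIGHT));
        return selected;
    }
}
```

For a stationary user (vx = vy = 0) all four regions are selected, matching the experiment in which the whole scope is searched.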
Object validation is done in the next step (shown in Figure A.1). In this step,
an object is retrieved from the coordinate collection (line 3). Then, we compare the
location of the object against the chosen regions of a square and a circle used as
the query scope. Lines 5 to 12 validate whether the object is located inside the
square that represents the query scope. If it is inside, the counter for the square is
incremented and the object is then checked to determine whether it is also located
inside the circle. The distance between the object and the user is measured using
the Euclidean distance (lines 15-17), and the counter for the circle is incremented
if the object lies within the circle. Thus, an object located inside both the square
and the circle increments both counters. This verification continues until the value
of internalCounter equals the number of objects in the coordinate collection (line 1).
Figure A.2 shows how our program is run and its output when a user does
not move. We request some information (location, searching distance, speed,
travel direction) from the user, since we do not have any device for collecting live
information from a user. The time is measured in abstract time units.
 1  while ( internalCounter < coordinate.size() )
 2  {
 3      ptDblBuff = (Point2D.Double) coordinate.elementAt( internalCounter );
 4      /* Is a coordinate inside the region? */
 5      if ( ( ptDblBuff.x < qsTopRight.x ) &&
 6           ( ptDblBuff.y < qsTopRight.y ) &&
 7           ( ptDblBuff.x > qsTopLeft.x ) &&
 8           ( ptDblBuff.y < qsTopLeft.y ) &&
 9           ( ptDblBuff.x < qsBottomRight.x ) &&
10           ( ptDblBuff.y > qsBottomRight.y ) &&
11           ( ptDblBuff.x > qsBottomLeft.x ) &&
12           ( ptDblBuff.y > qsBottomLeft.y ) )
13      {
14          /* Find distance of target inside circle to source */
15          pwrDistance = Math.pow( ( ptDblBuff.x - source.x ), 2.0 ) +
16                        Math.pow( ( ptDblBuff.y - source.y ), 2.0 );
17          ptDblDistance = Math.sqrt( pwrDistance );
18          if ( ptDblDistance <= distanceFromSource )
19          {
20              noOfPlacesFoundInCircle++;
21          }
22          noOfPlacesFoundInSquare++;
23      }
24      internalCounter++;
25  }

Figure A.1: Implementation for object validation against query scope

The next experiment implements the case where a user misses the query results,
so the server needs to reproduce the next query results. The implementation process
is similar to the initial one. However, if there are any existing query results
or the receiving flag is false, the server checks whether there is any overlapping area
between the current and the previous scopes. When an overlapping scope does not
exist, the query is processed as before. In contrast, when the current and
previous query scopes overlap, the server invalidates any objects in the query results
which are not located inside the overlapping area. Then, the server searches for
objects within the non-overlapping area of the current query scope. The objects
found within the overlapping and non-overlapping areas are merged, and the server
sends the query results to the users.

The processing time is measured from when the server starts processing the query.
The measurement covers both invalidating objects from the previous query results
and generating query results from the beginning of the process.
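The overlap-based reuse just described can be sketched as follows. The class ScopeReuse and its method names are illustrative assumptions; rectangles stand in for query scopes and points for objects, and the database scan is a plain linear search rather than the thesis implementation.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of reusing a previous result when query scopes overlap:
// keep previous objects that fall inside the overlap, search the database only
// for the part of the new scope outside the overlap, then merge both sets.
public class ScopeReuse {
    public static List<Point2D.Double> answer(Rectangle2D.Double newScope,
                                              Rectangle2D.Double oldScope,
                                              List<Point2D.Double> oldResult,
                                              List<Point2D.Double> database) {
        Rectangle2D overlap = newScope.createIntersection(oldScope);
        List<Point2D.Double> result = new ArrayList<>();
        if (!overlap.isEmpty()) {
            // Invalidate previous objects outside the overlapping area.
            for (Point2D.Double p : oldResult)
                if (overlap.contains(p)) result.add(p);
            // Search only the non-overlapping part of the new scope.
            for (Point2D.Double p : database)
                if (newScope.contains(p) && !overlap.contains(p)) result.add(p);
        } else {
            // No overlap: process the query from scratch, as before.
            for (Point2D.Double p : database)
                if (newScope.contains(p)) result.add(p);
        }
        return result;
    }
}
```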
[jjayaput@sng-1 experiment1]$ java generateCoordinate datadata50k
Enter your current position including floating point (0-10000) : 5000
Enter distance that you would like to search : 500
Enter your current speed (0 - stop) : 0
Total : 50000 Coordinates
Time : t0
Source (5000.0,5000.0)
direction: h
Searching in Region 0
Number of Places Found in Square: 127
Number of Places Found in Circle: 100
Searching in Region 1
Number of Places Found in Square: 114
Number of Places Found in Circle: 83
Searching in Region 2
Number of Places Found in Square: 112
Number of Places Found in Circle: 84
Searching in Region 3
Number of Places Found in Square: 123
Number of Places Found in Circle: 93
Figure A.2: Snapshot of experiment 1 simulation
A.3 Implementation for Query Processing in Multi-Cells
The implementation of query processing in multiple cells is quite complex, since
it involves a number of servers. In our implementation, we use TCP/IP for all
communication. The time it takes to send a query result from a server to a mobile
user is ignored, since we assume that this time is constant.

To simulate multiple cells, we use three machines, where each machine runs one
server that serves one cell and has its own database.
Figure A.3: Class diagram of server implementation
Our class diagram for the server implementation is shown in Figure A.3. It has
five classes: Server, ThreadedSocket, BSEntity, Message and Result. Each class is
explained as follows:
• Server

This class is the front end of the server; it initialises the server boundary
and listens for any incoming request. The server can be instantiated in two
modes: default or custom. In the default mode, the BS boundary is decided
automatically by the simulation; in other words, the default mode is used
to initiate the main server. If the user does not give any parameters to this
class, the default values are used. Table A.3 lists the default values for the
main server:

Table A.3: Server default settings

    Parameter    Value
    BS Width     900
    BS Height    2,000
    Server Port  8189
Figure A.4 shows an implementation snapshot of a server registering itself to
the main server.

If the server configuration is set by the user, there must be at least one main
server up and running, because any customised server needs to register with
the main server. The port of the main server is 8189. The registration process
for the other servers is very simple: they connect to the incoming port of the
main server and send their identity to it. Then, they wait for an acknowledgment
from the main server that their registration has been successful. If registration
with the main server succeeds, they can stand by and listen for incoming requests;
otherwise, the instantiation of the server has failed.
if ( port != 8189 )
{
    try
    {
        // Establish connection to the main server
        Socket socketToNeigh = new Socket( "203.24.130.25", 8189 );

        PrintWriter output = new PrintWriter( socketToNeigh.getOutputStream() );
        input = new BufferedReader( new InputStreamReader(
                    socketToNeigh.getInputStream() ) );
        StringTokenizer st = null;

        output.println( request );   // Asking for registration.
        output.flush();

        String buffInputFromSocket = null;

        try
        {
            while ( ( buffInputFromSocket = input.readLine() ) != null )
            {
                st = new StringTokenizer( buffInputFromSocket );
                System.out.println( "updating Neighbour List" );

                bsentity.updateNeighbour( inetAddr, port, position(x, y), Dimension );

                System.out.println( "Neighbour List updated" );
            }
        }
        catch ( Exception e )
        { System.out.println( "Registering main server to neighbour list failed" ); }

        // We don't need to establish a connection to the main server
        // once the registration is completed.
        // The communication between servers will be handled by class BSEntity.
        socketToNeigh.close();
    }
    catch ( Exception e ) { System.out.println( "Server Registration failed" ); }
}
Figure A.4: Implementation of a server registering itself to a main server
In the next step, this class initiates the listening port and keeps listening for
incoming requests from other servers and clients. The server port can be chosen
directly by the simulation or the user. Figure A.5 shows an implementation snapshot
for listening to incoming requests using the ServerSocket class provided by the Java
library.

When there is an incoming request, this class remembers the requester's port
and calls the ThreadedSocket class, passing it the requester's port and its boundary.
The ThreadedSocket class creates a separate (child) process for the incoming request,
so that the Server class can keep listening for the next request.

    ServerSocket server = new ServerSocket( port );
    System.out.println( "Server is ready" );
    while ( true )
    {
        Socket socket = server.accept();
        if ( socket != null )
        {
            new ThreadedSocket( socket, counter, bsentity ).start();
            System.out.println( "Thread started" );
        }
    }

Figure A.5: Implementation of how a server keeps listening for incoming requests
• ThreadedSocket

This class contains the actual thread implementation based on the Thread
interface from the Java standard library, since the library only provides the
interface. As mentioned earlier, this class is a child process of the Server class.

Class ThreadedSocket contains two methods: a constructor and a run method.
The constructor initialises the class variables with the values sent by the Server
class. The run method, which is called automatically once the class has been
instantiated, executes the work of this class.
At the start of the run method, the incoming request is received from
the socket and converted into a string. The request is then verified to identify
whether it is a query from a mobile user or a server registration request from
another server.
If the incoming request is a server registration from another server, it
calls updateNeighbour of class BSEntity. As confirmation, the main server
identification is sent back to that server.
If the request comes from a mobile user, the poolingInput method of class
BSEntity is called. The poolingInput method pools requests from mobile users
before the query is processed and, in return, gives the query result as the
answer. This query result answers the LDQ of the mobile user. At the end of
the run method, the query result is sent to the mobile user through the
requester port.
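The dispatch step described above can be sketched as follows. This is an illustrative reading only: the prefix string and the classify method are assumptions, not identifiers from the thesis code.

```java
// Sketch of the request verification inside ThreadedSocket.run(): a message
// that carries a registration prefix is treated as a server registration,
// anything else as a mobile-user query. The prefix is a hypothetical choice.
public class RequestDispatcher {
    public static final String REGISTRATION_PREFIX = "REGISTER";

    // Returns "registration" or "query" for the incoming message string.
    public static String classify(String request) {
        if (request != null && request.startsWith(REGISTRATION_PREFIX)) {
            return "registration";
        }
        return "query";
    }
}
```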
• Message
Class Message is used to pool a query from a mobile user. It splits the
query received from a mobile user and stores the parts into class variables:
userID, currentPosition, movement, searchingDistance, newPosition and scope.
The usage of the first four parameters is straightforward. The last parameter,
scope, is used to generate the valid scope of the user query.
The velocity of the mobile user is analysed in advance to form a valid
scope. Once this is done, the valid scope is created by adding the components
of newPosition to the components of searchingDistance. If the movement is
vertical, the x-coordinate of the valid scope ranges from "minus one times" to
"twice" the x-coordinate of searchingDistance, while the y-coordinate ranges
from the y-coordinate of newPosition to the y-coordinate of searchingDistance.
The valid scope for horizontal movement is created similarly; the only
difference is that the x-coordinate for vertical movement becomes the
y-coordinate for horizontal movement, and the y-coordinate for vertical
movement becomes the x-coordinate for horizontal movement.
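The valid-scope construction can be sketched as below. The thesis text is ambiguous about the exact arithmetic ("minus by one" to "twice of x-coordinate"), so this reading, with x in [-dx, 2*dx] and y extending from the new position by dy for vertical movement, and the axes swapped for horizontal movement, is an interpretation rather than the original code.

```java
// Hedged sketch of the Message valid-scope computation. dx and dy are the
// components of searchingDistance; the returned rectangle layout is ours.
public class ValidScope {
    // Returns {xMin, xMax, yMin, yMax} for the valid query scope.
    public static double[] compute(double newX, double newY,
                                   double dx, double dy, boolean vertical) {
        if (vertical) {
            // x from -dx to 2*dx; y from the new position outward by dy.
            return new double[] { -dx, 2 * dx, newY, newY + dy };
        }
        // Horizontal movement: the x and y roles are swapped.
        return new double[] { newX, newX + dx, -dy, 2 * dy };
    }
}
```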
• Result
This class is responsible for generating and storing a query result. It has
two constructors and two methods. The constructors are a default and a copy
constructor, which initialise the class variables. The two methods are
generateResult and getResult. The getResult method returns the generated
result to the caller. The generateResult method compares the objects'
locations from the database with the valid scope. For each object located
inside the valid scope, the counter of objects found is incremented and the
object's location is stored.
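The generateResult behaviour described above can be sketched as follows; the point representation and the scope layout are assumptions, not the thesis data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Result class: compare each object's location from the
// database against the valid scope, count and store those inside it.
public class Result {
    private final List<double[]> found = new ArrayList<>();

    // scope = {xMin, xMax, yMin, yMax}; objects are {x, y} pairs.
    public int generateResult(List<double[]> objects, double[] scope) {
        for (double[] p : objects) {
            boolean inside = p[0] >= scope[0] && p[0] <= scope[1]
                          && p[1] >= scope[2] && p[1] <= scope[3];
            if (inside) {
                found.add(p); // store the location; size() acts as the counter
            }
        }
        return found.size(); // counter of objects found
    }

    public List<double[]> getResult() {
        return found; // generated result for the caller
    }
}
```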
• BSEntity
This class is the main class for the server implementation. Query
processing and server registration are the two tasks of this class. Let
us discuss the server registration process first, followed by the query
processing task.
The server registration process is handled by a method called updateNeigh-
bour. This method accepts five parameters: address, port, position, BSWidth
and BSHeight. The first two parameters are the other server's address and
listening port. The last three parameters are the bottom-left position of the
other server and the other BS's width and height, respectively. The method
then finds an empty slot in its list of neighbour BSs. Once an empty slot has
been found, it creates an object of type NeighbourDetails by passing all
accepted parameters. At the end of this method, a confirmation message is sent
to the caller.
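The slot-search step of updateNeighbour can be sketched as below. The NeighbourDetails fields and the fixed slot count are guesses derived from the five parameters described above, not the thesis code.

```java
// Sketch of BSEntity.updateNeighbour: find the first empty slot in the
// neighbour list and store the registering server's details there.
public class BSEntity {
    static class NeighbourDetails {
        String address; int port;
        double x, y, width, height; // bottom-left corner plus BS dimensions
        NeighbourDetails(String address, int port,
                         double x, double y, double width, double height) {
            this.address = address; this.port = port;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    private final NeighbourDetails[] neighbours = new NeighbourDetails[8];

    // Returns the slot index used, or -1 when the neighbour list is full.
    public int updateNeighbour(String address, int port,
                               double x, double y, double w, double h) {
        for (int i = 0; i < neighbours.length; i++) {
            if (neighbours[i] == null) {
                neighbours[i] = new NeighbourDetails(address, port, x, y, w, h);
                return i; // a confirmation message would be sent here
            }
        }
        return -1;
    }
}
```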
The second task is query processing, which involves both the server and
the client. A method called poolingInput filters the request type. If the
query is asked by a BS, this method calls the method generateQueryFromBS.
Otherwise, it calls the method generateQueryFromClient to process a query
from a client.
After the query type has been filtered, the query processing task is
carried out by the following methods, the last two of which retrieve objects
from neighbour cells:
– generateQueryFromClient
In the beginning, incoming requests are pooled inside an array, using
FIFO as the queuing priority. Once the server has finished processing one
request, it processes the next element of the array.
Query processing is done by retrieving all objects in the current BS;
the procedure is the same as static object retrieval for the single cell.
Once this is finished, the method calls generateQueryResultsFromNeighbour
to retrieve static objects from neighbour cells. At the end of this method,
information about static objects from the current and neighbour cells is
combined and returned to the caller. The processing time for the current
and neighbour cells is also measured here.
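The FIFO pooling mentioned above can be sketched with a standard queue; the class and method names here are illustrative, not the thesis identifiers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the FIFO pooling of client requests: requests are queued in
// arrival order and the server processes them one at a time.
public class RequestPool {
    private final Queue<String> pending = new ArrayDeque<>();

    public void pool(String request) {
        pending.add(request);  // enqueue at the tail
    }

    public String next() {
        return pending.poll(); // dequeue from the head (FIFO order)
    }
}
```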
– generateQueryResultsFromNeighbour
This method is used to find which neighbour BSs overlap with the query
scope. It goes through its database to get a list of overlapping neighbour
BSs. It then passes the overlapping parts of the query scope by opening a
connection to each such BS. While a neighbour BS is processing the forwarded
query, the current BS waits until it gets the results from that neighbour BS.
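The overlap check that selects candidate neighbour BSs can be sketched as a rectangle intersection test. The {xMin, yMin, width, height} layout for both the query scope and the BS area is an assumption.

```java
// Sketch of the overlap test in generateQueryResultsFromNeighbour: a
// neighbour BS is a candidate when its area intersects the query scope.
public class OverlapTest {
    // Both rectangles are {xMin, yMin, width, height}.
    public static boolean overlaps(double[] scope, double[] bs) {
        return scope[0] < bs[0] + bs[2] && bs[0] < scope[0] + scope[2]
            && scope[1] < bs[1] + bs[3] && bs[1] < scope[1] + scope[3];
    }
}
```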
– generateQueryFromBS
The time measurement begins at the start of this method, before it
searches for static objects in its database. The second step determines
which area of the query scope needs to be searched. Once the area has been
determined, the searching process starts by examining each object that
belongs to that area. Matching objects are collected into a result
collection. At the end of the process, the result collection is sent to the
caller and the time measurement is stopped.
Appendix B
Simulation Model
B.1 Simulation Package Overview
Planimate is a discrete event animation software platform for prototyping, devel-
oping and operating highly visual dynamic discrete event simulation models and
interactive applications [89]. Figure B.1 shows the opening page when this package
is loaded.
Figure B.1: Opening page of Planimate
Planimate contains two different types of palettes, namely Objects and
Items, as shown in Figures B.2 and B.3. The first type, the Objects palette,
contains 18 different objects which symbolise different activities for
simulating the features of a real environment.
Figure B.2: Planimate Objects
Figure B.3 shows the Items palette. An item is a temporary object that
interacts with the permanent objects and moves through the system. An item
cooperates with the permanent objects through paths, which need to be defined.
B.2 Query Processing Model
This section gives a brief explanation of our proposed simulation models,
including a description of the features that are available in Planimate©.
The first model is the proposed server query processing model. This model has
five components: request, counter, server, exit and result. The request
represents a request for a database record. The counter is used to count which
database record
Figure B.3: Planimate Items
Figure B.4: Initial server processing mechanism model.
number is currently being examined. The server decides whether a point is a
valid result. If it is, the point is collected into the result component.
Otherwise, the point is ignored by sending the item to the exit component.
The database records are stored as a table representation, which is shown in
Figure B.5a. To simplify our models, we put only two-dimensional coordinates
into the table. When there is a request, the counter increments the position
of the data points in the table in order to select a data point to be
examined.
(a) Data points records. (b) The logic
Figure B.5: Planimate’s components for the server query processing.
Figure B.5b shows the logic for the server side, which implements our proposed
algorithm. In general, a point is validated in this component and sent to the
result collection if it is located inside the query scope. The result
collection is then sent to the requester.
A proposed indexing mechanism model is shown in Figure B.6. In this model,
there are several nodes representing a root node, 12 bounding boxes and 12
leaf nodes. An entry is a query scope that traverses from the root node to a
leaf node and collects the matching objects.
Figure B.7 presents the Planimate components used by the proposed indexing
model. Figure B.7a shows a table containing a list of user locations at the
times queries are sent. This table represents an array as normally used in a
programming language. Figure B.7b, on the other hand, shows a list of
parameters that control the model, such as the query size. These parameters
act as variables for the proposed algorithm in Planimate, and their values can
be changed while the simulation runs.
Figure B.6: Initial indexing mechanism model.
Figure B.8 presents a condition interface used to specify whether the previous
node (root, MBR or leaf) overlaps with the query scope. This condition
interface can hold a maximum of four conditions in one object.
Figure B.9 shows the logic behind a node. The logic verifies whether the node
is inside the query scope. If it is, the traversal continues with the next
node underneath the current node. Each node has logic similar to the one
shown.
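The per-node logic above amounts to descending into a child only when its bounding box overlaps the query scope. A minimal sketch follows; the tree shape matches Figure B.6's root/bounding-box/leaf structure, but the Node layout and field names are ours.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the node logic in Figure B.9: an index node passes the query
// down to its children only when its box overlaps the query scope; leaves
// that are reached contribute their contents to the result.
public class IndexNode {
    double xMin, yMin, xMax, yMax;
    List<IndexNode> children = new ArrayList<>();
    String label; // set on leaf nodes only

    IndexNode(double xMin, double yMin, double xMax, double yMax) {
        this.xMin = xMin; this.yMin = yMin; this.xMax = xMax; this.yMax = yMax;
    }

    boolean overlaps(double qxMin, double qyMin, double qxMax, double qyMax) {
        return xMin < qxMax && qxMin < xMax && yMin < qyMax && qyMin < yMax;
    }

    // Collects the labels of leaves reachable through overlapping boxes.
    void search(double qxMin, double qyMin, double qxMax, double qyMax,
                List<String> hits) {
        if (!overlaps(qxMin, qyMin, qxMax, qyMax)) return; // prune this branch
        if (children.isEmpty()) {
            hits.add(label); // a leaf inside the query scope
            return;
        }
        for (IndexNode child : children) {
            child.search(qxMin, qyMin, qxMax, qyMax, hits);
        }
    }
}
```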
Figure B.10 shows the proposed indexing model with an item traversing from the
root node to a leaf node. In this figure, a searching process starts from
Entry 1 and finishes at one of the exit nodes. The other nodes represent
Minimum Bounding Boxes (MBBs) and switches. The switches are used to manage
the flow of data items.
Figure B.11a shows one of the proposed client caching models. The process
starts from the component "Entry 1", which represents a query scope. The
component
(a) list of user locations. (b) list of parameters
Figure B.7: Planimate’s components for the indexing mechanism.
Figure B.8: A condition interface on the Planimate.
"Objects in" retrieves objects by validating the objects in the client cache
against the incoming query scope; it asks the server if the number of objects
found is less than the number of requested objects. If the objects are
retrieved from the cache, the cache hit counter is incremented by the
component "Inc CH". The objects are queued until the cache mgr finishes
sending the objects.
If the number of objects coming from the server is greater than the cache
space, the objects are not stored in the cache. Otherwise, the next process
verifies the
Figure B.9: A logic for a node.
available cache space. When the cache space is large enough to store all
incoming objects, the objects are cached directly and all cached objects are
regrouped.
Figure B.11b shows one of the routines. The elimination process takes over if
there is not enough space. In the elimination process, the right victim groups
are found and evicted one after another. When the amount of available space is
enough to accommodate the number of incoming objects, the objects are stored.
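The elimination routine can be sketched as below. The group-based bookkeeping follows the description above, but the victim-selection policy (oldest group first) is our assumption, not something the thesis specifies here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the elimination routine in Figure B.11b: evict whole cached
// groups (oldest first in this sketch) until the cache has room for the
// incoming group of objects.
public class ClientCache {
    private final Deque<Integer> groupSizes = new ArrayDeque<>();
    private final int capacity;
    private int used;

    public ClientCache(int capacity) { this.capacity = capacity; }

    // Returns true when the incoming group was stored in the cache.
    public boolean store(int incoming) {
        if (incoming > capacity) return false; // larger than the whole cache
        while (capacity - used < incoming) {   // not enough space: eliminate
            used -= groupSizes.removeFirst();  // evict the next victim group
        }
        groupSizes.addLast(incoming);
        used += incoming;
        return true;
    }

    public int used() { return used; }
}
```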
Figure B.10: Indexing model with an item flow.
(a) The proposed client cache model.
(b) One of the routines for the client cache model.
Figure B.11: Planimate's components used to model the proposed client caching.