Mobile Query Processing Incorporating Server
and Client Based Approaches
by
James Winly Jayaputera, BAppSci(Comp.Sci), MIT
Thesis
in fulfillment of the Requirements for the Degree of
Doctor of Philosophy (0190)
Clayton School of Information Technology
Monash University
September, 2008
Abstract
This thesis studies query processing in a mobile environment. The main objective
is to investigate the performance improvement of mobile query processing, focusing
on the server and client sides.
In server-side query processing, we consider single-cell and multi-cell queries,
where a cell is the service area within which a single stationary host communicates
with the static network. A quick response to a mobile query is important, because
mobile users invariably move to another location while awaiting the query result.
To handle such a dynamic situation, we propose solutions to answer single-cell
and multi-cell queries. The proposed solutions for processing single-cell queries are
divided into static and dynamic query scopes, and angle of movement. The static
and dynamic query scopes are extended to process multi-cell queries. Furthermore,
another solution is added in order to deal with a situation where the areas of several
base stations are either disjoint or overlapping. Finally, our algorithms also handle
disconnections which occur during query result transmission from a base station to
the mobile users.
Indexing mechanisms are important to speed up query processing, especially
for handling multi-cell queries. We propose two indexing mechanisms, called the
Local Index and the Global Index. The local index stores indexes of any requested
objects in a limited number of slots, whereas the global index builds the index while a base
station is starting up. For both mechanisms, we developed algorithms to deal with
the existence and non-existence of replicated objects at the requested cell.
Frequent disconnection is a common problem in a mobile environment, so
providing a cache on the mobile device is an important consideration. A cache is
useful when the results of many repeated queries can be retrieved from it. Due to the
limited storage space of the mobile device, we have developed three cache
replacement policies, called the Path-based, Density-based and Probability Density Area
Inverse Distance (PDAID) mechanisms, which are based on distance, weight and
cost factors, respectively.
In order to analyse the behaviour of the proposed methods, we have implemented
and simulated each algorithm, and the results of each approach are compared
and analysed. The server-side query processing shows an improvement
in the total number of retrieved objects, while the query processing time and the amount of
data transfer are reduced. Furthermore, the server is able to decide whether the
next query result needs to be produced when the mobile user misses the current
query result. The proposed indexing mechanisms reduce the execution time
compared with the conventional approach to processing multi-cell queries. The
proposed approaches for the client side also improve the cache-hit rate while
reducing the amount of data transfer.
Declaration
I declare that this thesis is my own work and has not been submitted in any form for another degree or diploma at any university or other institute of tertiary education. Information derived from the published and unpublished work of others has been acknowledged in the text and a list of references is given.

James Winly Jayaputera
September 20, 2008
Acknowledgments
This thesis would never have come into existence without precious encouragement,
guidance, and both personal and academic support from two of my supervisors, Dr.
David Taniar and Professor Bala Srinivasan.
I would like to dedicate this thesis to my family, who have supported me to the
end of this journey. Without them, I would not have completed this thesis.
I would also like to thank all of my friends, without mentioning them individually,
who helped to make this possible. I also thank Bruna Pomella for correcting
grammatical and spelling mistakes.
James Winly Jayaputera
Monash University
September 2008
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Limitations of Mobile Environments . . . . . . . . . . . . . . . . . . . 4
1.3 Objectives of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Scope of Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Organisation of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Wireless Environment Architecture . . . . . . . . . . . . . . . . . . . 12
2.2.1 Wireless Technologies . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Location Positioning Systems . . . . . . . . . . . . . . . . . . 19
2.3 Query Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1 Traditional Query . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.2 Location Query . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Server Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.1 Overview of Location-Dependent Query Processing . . . . . . . 27
2.4.2 Query Processing for a Single Cell . . . . . . . . . . . . . . . . 34
2.4.3 Query Processing for Multiple Cells . . . . . . . . . . . . . . . 36
2.5 Indexing Structures for Query Processing . . . . . . . . . . . . . . . . 37
2.5.1 Conventional Index Query Processing . . . . . . . . . . . . . . 37
2.5.2 Moving Object Index Query Processing . . . . . . . . . . . . . 45
2.6 Mobile Query Processing at Client Side . . . . . . . . . . . . . . . . . 48
2.6.1 Mobile-Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.6.2 Top-K Queries . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.6.3 Cache Replacement Policies . . . . . . . . . . . . . . . . . . . 52
2.7 Outstanding Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.7.1 Mobile Query Processing at Server Side . . . . . . . . . . . . . 57
2.7.2 Indexing Structures for Multi-Cell Query Processing . . . . . . 58
2.7.3 Client Cache Management . . . . . . . . . . . . . . . . . . . . 58
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 Query Processing at Server Side . . . . . . . . . . . . . . . . . . . . 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.1 All Terms Used . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.2 Shape Selection for a Query Scope . . . . . . . . . . . . . . . 64
3.2.3 Query Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Query Processing for Single-Cell . . . . . . . . . . . . . . . . . . . . . 69
3.3.1 Static Query Scope Category . . . . . . . . . . . . . . . . . . 69
3.3.2 Dynamic Query Scope Category . . . . . . . . . . . . . . . . . 79
3.3.3 Angle of Movement Category . . . . . . . . . . . . . . . . . . 81
3.4 Multi-Cell Query Processing . . . . . . . . . . . . . . . . . . . . . . . 84
3.4.1 Non-Overlapping and Overlapping Area Algorithms . . . . . . 87
3.4.2 Static and Dynamic Query Scope Algorithm . . . . . . . . . . 95
3.5 Handling Disconnections . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.5.1 Single Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.5.2 Multiple Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.6 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.6.1 Single-Cell Query Processing . . . . . . . . . . . . . . . . . . . 110
3.6.2 Multi-Cell Query Processing . . . . . . . . . . . . . . . . . . . 117
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4 Indexing for Multiple Servers Retrieval . . . . . . . . . . . . . . . 124
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.2 Preliminary Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.3 Local Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3.1 Cache Remote Indexes Only . . . . . . . . . . . . . . . . . . . 132
4.3.2 Cache Remote Indexes and Data Items . . . . . . . . . . . . . 135
4.4 Global Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.4.1 Remote Data Items Located at Different Cell . . . . . . . . . 140
4.4.2 Remote Indexes and Data Items Located at Same Cell . . . . 144
4.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5 Client Caching for a Mobile Environment . . . . . . . . . . . . . . 153
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.2 Client Caching Overview . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.2.1 Global Process . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.2.2 Storing Query Results to Cache . . . . . . . . . . . . . . . . . 158
5.2.3 Predicting Next Movement . . . . . . . . . . . . . . . . . . . . 160
5.2.4 Retrieving Cached Objects . . . . . . . . . . . . . . . . . . . . 161
5.2.5 Updating Query History List . . . . . . . . . . . . . . . . . . 162
5.2.6 Objects Grouping . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.2.7 Cached Objects Elimination . . . . . . . . . . . . . . . . . . . 164
5.3 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.3.1 Path Based Elimination Algorithm . . . . . . . . . . . . . . . 168
5.3.2 Density Based Elimination Algorithm . . . . . . . . . . . . . . 172
5.3.3 PDAID Elimination Algorithm . . . . . . . . . . . . . . . . . 174
5.4 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.1 Implementation and its Results . . . . . . . . . . . . . . . . . . . . . 188
6.1.1 Implementation Environment . . . . . . . . . . . . . . . . . . 189
6.1.2 Implementation Results . . . . . . . . . . . . . . . . . . . . . 189
6.2 Simulation and its Results . . . . . . . . . . . . . . . . . . . . . . . . 209
6.3 Simulation Results for Single-Cell and Multi-Cell Query Processing . 210
6.3.1 Indexing for Multi-Cell Query Processing . . . . . . . . . . . . 214
6.3.2 Simulation Results for Client Caching . . . . . . . . . . . . . . 219
6.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 227
7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.2 Summary of Research Result . . . . . . . . . . . . . . . . . . . . . . . 227
7.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Appendix A Implementation Model . . . . . . . . . . . . . . . . . . . . 250
A.1 Location Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
A.2 Implementation for Query Processing in Single Cell . . . . . . . . . . 252
A.3 Implementation for Query Processing in Multi-Cells . . . . . . . . . . 256
Appendix B Simulation Model . . . . . . . . . . . . . . . . . . . . . . . 265
B.1 Simulation Package Overview . . . . . . . . . . . . . . . . . . . . . . 265
B.2 Query Processing Model . . . . . . . . . . . . . . . . . . . . . . . . . 266
List of Tables
2.1 Comparison of IEEE 802.11 standards . . . . . . . . . . . . . . . . . 14
2.2 Comparison of wireless local area network (WLAN) standards - 802.11a
versus 802.11b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Performance characteristics of cellular positioning methods . . . . . . 20
2.4 Mobile query category 1 . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Mobile query category 2 . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.1 Hardware settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.2 Parameters setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.3 First experiment result . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.4 Parameters setting for multiple BSs . . . . . . . . . . . . . . . . . . . 201
6.5 Second experiment result . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.6 Parameter settings - single cell . . . . . . . . . . . . . . . . . . . . . . 210
6.7 Parameters setting - multiple cells . . . . . . . . . . . . . . . . . . . . 213
6.8 Experiment settings for client cache . . . . . . . . . . . . . . . . . . . 220
A.1 Snapshot of our Generated Data . . . . . . . . . . . . . . . . . . . . . 252
A.2 Setting implementation 1 . . . . . . . . . . . . . . . . . . . . . . . . . 253
A.3 Server default setting . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
List of Figures
1.1 Thesis framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Chapter 2 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Wireless architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Query types classification . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Location-dependent query (LDQ) illustration. . . . . . . . . . . . . . 24
2.5 Requesting a static object and moving within a single cell. . . . . . . 29
2.6 Requesting a static object and moving to another cell. . . . . . . . . 29
2.7 Requesting a moving object and moving within a single cell. . . . . . 30
2.8 Requesting a moving object (user and object moves to another cell). . 31
2.9 Requesting a moving object and user stays at same position. . . . . . 32
2.10 Static user requests a moving object. . . . . . . . . . . . . . . . . . . 33
2.11 Periodic query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.12 Non periodic query . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.13 The R-tree illustration [93] . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1 The framework of chapter 3 . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 A scenario presented in two-coordinates . . . . . . . . . . . . . . . . . 65
3.3 The proposed approach . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.4 A location-dependent query in details . . . . . . . . . . . . . . . . . . 68
3.5 The complexity of vertical movement . . . . . . . . . . . . . . . . . . 75
3.6 The complexity of horizontal movement . . . . . . . . . . . . . . . . . 77
3.7 The complexity of diagonal movement . . . . . . . . . . . . . . . . . 78
3.8 Dynamic query scope for the diagonal movement . . . . . . . . . . . . 80
3.9 Angle of movement illustrations. . . . . . . . . . . . . . . . . . . . . . 82
3.10 The complexity of angle movement. . . . . . . . . . . . . . . . . . . . 83
3.11 Three types of users’ movement . . . . . . . . . . . . . . . . . . . . . 87
3.12 Non-overlapping base stations(BS) . . . . . . . . . . . . . . . . . . . 88
3.13 Multi-cell query illustration . . . . . . . . . . . . . . . . . . . . . . . 91
3.14 An illustration of static query scope . . . . . . . . . . . . . . . . . . . 96
3.15 Dynamic query intersects a base station (BS): (top) in the same line;
(bottom) in two different lines . . . . . . . . . . . . . . . . . . . . 98
3.16 An illustration of dynamic query situation . . . . . . . . . . . . . . . 99
3.17 Illustration of predicted disconnection situation . . . . . . . . . . . . 102
3.18 Stay at the same location (Case Study 3.6.1) . . . . . . . . . . . . . . 111
3.19 Vertical movement (Case Study 3.6.2-1) . . . . . . . . . . . . . . . . . 112
3.20 Vertical movement with overlap situation (Case Study 3.6.2-2) . . . . 113
3.21 Horizontal movement (case study 3.6.3-1) . . . . . . . . . . . . . . . . 114
3.22 Horizontal movement with overlap situation (case study 3.6.3) . . . . 115
3.23 Diagonal movement and overlap situation (Case Study 3.6.4) . . . . . 116
3.24 A query scope is crossing multiple cells . . . . . . . . . . . . . . . . . 118
3.25 Moving across to another base station (BS) boundary . . . . . . . . . 119
3.26 Three situations of overlapping base station area . . . . . . . . . . . . 120
4.1 Chapter 4 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.2 R-tree and 2D coordinates [93] . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Three index structure into 3 cells . . . . . . . . . . . . . . . . . . . . 130
4.4 Tables for cell 1, cell 2 and cell 3 (from left to right) . . . . . . . . . . 131
4.5 Index structure after the records insertion using local index-1 . . . . . 133
4.6 Index structure after the records insertion using local index-2 . . . . . 136
4.7 Global Index for all cells using GI mechanism. . . . . . . . . . . . . . 139
4.8 GI mechanism uses single node pointers. . . . . . . . . . . . . . . . . 142
4.9 Global Index without replicated remote data items. . . . . . . . . . . 144
4.10 GI mechanism where data items are replicated. . . . . . . . . . . . . 145
4.11 Indexing structure at cell 2 after the remote index insertion . . . . . . 148
4.12 Global Index mechanisms case study . . . . . . . . . . . . . . . . . . 149
5.1 Chapter 5 framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2 Section 5.2 framework . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.3 An illustration of the DBScan Algorithm . . . . . . . . . . . . . . . . 164
5.4 Simple illustration of our elimination approach . . . . . . . . . . . . . 168
5.5 Complex illustration of our elimination approach . . . . . . . . . . . . 171
5.6 A query scope overlaps with multiple groups . . . . . . . . . . . . . . 171
5.7 Illustration of density elimination . . . . . . . . . . . . . . . . . . . . 173
5.8 Illustration of PDAID retrieval . . . . . . . . . . . . . . . . . . . . . 177
5.9 Initial situation after cached objects have been grouped . . . . . . . . 180
5.10 Density based approach (Case Study 5.4.1-1) . . . . . . . . . . . . . . 181
5.11 Density-based approach (Case Study 5.4.1-2) . . . . . . . . . . . . . . 181
5.12 Path-based approach (Case Study 5.4.2-1) . . . . . . . . . . . . . . . 182
5.13 Path-based approach (Case Study 5.4.2-2) . . . . . . . . . . . . . . . 183
5.14 PDAID-based approach (Case Study 5.4.3) . . . . . . . . . . . . . . . 185
6.1 Number of targets found in a square . . . . . . . . . . . . . . . . . . 190
6.2 Number of targets found in circle . . . . . . . . . . . . . . . . . . . . 191
6.3 Comparison of number of targets found in circle and square . . . . . . 192
6.4 Comparison of number of targets found in each region. . . . . . . . . 193
6.5 Comparison of number of targets found in circle at time t1 and t2. . . 193
6.6 Snapshot of CPU load . . . . . . . . . . . . . . . . . . . . . . . . . . 194
6.7 Various searching scope with 100,000 and 500,000 database records . 198
6.8 Various searching scope with 1,000,000 and 5,000,000 database records 199
6.9 A single searching scope with one and five users . . . . . . . . . . . . 203
6.10 A single searching scope with ten and twenty users . . . . . . . . . . 204
6.11 Response time of single BS . . . . . . . . . . . . . . . . . . . . . . . . 205
6.12 Response time of multi-BSs . . . . . . . . . . . . . . . . . . . . . . . 206
6.13 Processing time of individual BSs for the same query scope and two
BSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.14 Processing time of individual BSs for the same query scope and three
BSs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.15 Comparison of objects retrieved using a square and a circle (single cell) 211
6.16 Percentage comparison of object retrieval using different sizes of query
scopes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
6.17 Comparison of objects retrieved using a square and a circle . . . . . 213
6.18 Average access time between proposed vs conventional approaches . . 215
6.19 Average access time for the proposed Local Index vs the conventional
approaches (50 Requests) . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.20 Average access time for the proposed Local Index vs the conventional
approaches (150 Requests) . . . . . . . . . . . . . . . . . . . . . . . . 217
6.21 Average access time for a single query . . . . . . . . . . . . . . . . . . 218
6.22 Average access time for a single query: remote indexes only. . . . . . 219
6.23 Comparison of cache hits with various minimum points on each group 220
6.24 Comparison of cache hits with a maximum value of min req is 10. . . 222
6.25 Comparison of cache hits with a maximum value of min req is 20. . . 222
6.26 Comparison of cache hits with a maximum value of min req is 40. . . 223
A.1 Implementation for object validation against query scope . . . . . . . 255
A.2 Snapshot of experiment 1 simulation . . . . . . . . . . . . . . . . . . 256
A.3 Class diagram of server implementation . . . . . . . . . . . . . . . . . 257
A.4 Implementation of a server registering itself to a main server . . . . . 259
A.5 Implementation of how a server keeps listening for incoming requests 260
B.1 Opening page of Planimate . . . . . . . . . . . . . . . . . . . . . . . . 265
B.2 Planimate Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
B.3 Planimate Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
B.4 Initial server processing mechanism model. . . . . . . . . . . . . . . . 267
B.5 Planimate’s components for the server query processing. . . . . . . . 268
B.6 Initial indexing mechanism model. . . . . . . . . . . . . . . . . . . . . 269
B.7 Planimate’s components for the indexing mechanism. . . . . . . . . . 270
B.8 A condition interface on the Planimate. . . . . . . . . . . . . . . . . . 270
B.9 A logic for a node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B.10 Indexing model with an item flow. . . . . . . . . . . . . . . . . . . . . 272
B.11 Planimate’s components are being used to model the proposed client
caching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
List of Algorithms
2.1 The R-tree insertion algorithm. . . . . . . . . . . . . . . . . . . . . . . 40
2.2 The adjusting R-tree algorithm. . . . . . . . . . . . . . . . . . . . . . . 41
2.3 The searching R-tree algorithm. . . . . . . . . . . . . . . . . . . . . . . 43
2.4 Nearest-Neighbour search algorithm. . . . . . . . . . . . . . . . . . . . 44
3.1 The main proposed algorithm . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 The vertical movement algorithm . . . . . . . . . . . . . . . . . . . . . 75
3.3 The horizontal movement algorithm . . . . . . . . . . . . . . . . . . . 76
3.4 The diagonal movement algorithm . . . . . . . . . . . . . . . . . . . . 79
3.5 The dynamic query scope algorithm. . . . . . . . . . . . . . . . . . . . 81
3.6 The angle of movement algorithm. . . . . . . . . . . . . . . . . . . . . 85
3.7 Non-overlapping algorithm. . . . . . . . . . . . . . . . . . . . . . . . . 90
3.8 Eliminating neighbour BS overlapping area algorithm. . . . . . . . . . 93
3.9 Eliminating items from neighbour query result. . . . . . . . . . . . . . 94
3.10 Get Result algorithm for static query scope . . . . . . . . . . . . . . . 97
3.11 Neighbour cell retrieval algorithm for dynamic query scope . . . . . . . 100
3.12 Predicted disconnections algorithm . . . . . . . . . . . . . . . . . . . . 104
3.13 Non-reprocessing algorithm . . . . . . . . . . . . . . . . . . . . . . . . 105
3.14 Reprocessing algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.15 Predictable disconnection algorithm for multi-cell retrieval . . . . . . . 107
3.16 Unpredictable disconnection algorithm for multiple cells retrieval . . . 108
4.1 The Local Index algorithm . . . . . . . . . . . . . . . . . . . . . . . . 132
4.2 The insertion algorithm of Local Index-1 . . . . . . . . . . . . . . . . . 134
4.3 The deletion algorithm of Local Index-1 . . . . . . . . . . . . . . . . . 135
4.4 The insertion algorithm of Local Index-2 . . . . . . . . . . . . . . . . . 137
4.5 The deletion algorithm of Local Index-2 . . . . . . . . . . . . . . . . . 138
4.6 Node maintenance of GI-1 algorithm . . . . . . . . . . . . . . . . . . . 141
4.7 Node Maintenance of GI-2 algorithm . . . . . . . . . . . . . . . . . . . 147
5.1 The proposed path-based elimination algorithm . . . . . . . . . . . . . 170
5.2 Density-based elimination algorithm . . . . . . . . . . . . . . . . . . . 172
5.3 Cache retrieval for PDAID algorithm . . . . . . . . . . . . . . . . . . . 176
5.4 Cached objects elimination of PDAID algorithm. . . . . . . . . . . . . 179
Publications
1. James Jayaputera and David Taniar, “Partial Global Indexing for Location-
Dependent Query Processing”, Encyclopedia of Mobile Computing and Com-
merce, IGI-Global, vol. 2, pp. 739-743, 2007.
2. James Jayaputera and David Taniar, “Data Retrieval for Location-Dependent
Query in a Multi-cell Wireless Environment”, Mobile Information Systems:
An International Journal, IOS Press, vol. 1, no. 2, pp. 91-108, 2005.
3. James Jayaputera and David Taniar: “Query Processing Strategies For Location-
Dependent Information Services”, International Journal of Business Data Com-
munications and Networking, IGI-Global, Vol. 1, No. 2, pp. 17-40, 2005
4. James Jayaputera and David Taniar: “Location-Dependent Query Results
Retrieval in a Multi-cell Wireless Environment”, Parallel and Distributed
Processing and Applications, Lecture Notes in Computer Science, vol 3358,
Springer-Verlag, pp. 49-53, 2004.
5. James Jayaputera and David Taniar: “Defining Scope of Query for Location-
Dependent Information Services”, Embedded and Ubiquitous Computing, Lec-
ture Notes in Computer Science, vol. 3207, Springer-Verlag, pp. 366-376,
2004.
6. James Jayaputera and David Taniar: “Invalidation for CORBA Caching in
Wireless Devices”, Embedded and Ubiquitous Computing, Lecture Notes in
Computer Science, vol. 3207, Springer-Verlag, pp. 460-471, 2004.
Chapter 1
Introduction
1.1 Preamble
Nowadays, people are always on the move, and hence mobile environments are key
to information retrieval. The need to access daily information regarding
the stock exchange, weather, restaurant locations, and so on, is unavoidable. In
this environment, such information can be accessed anytime, anywhere
[2, 84, 16], because people connect their devices to a server wirelessly,
without the limitations of distance boundaries [20].
Queries are sent to servers while users are moving. These queries are called mobile
queries. The queries are accepted by a stationary host, called the Base Station (BS).
A base station is a stationary entity that acts as a mediator, forwarding messages
between the wireless and wired networks for a certain area. The particular area
served by a single base station is called a Cell.
From the mobile users’ point of view, these areas are transparent. In other
words, the users do not know if they move within a single cell or multiple cells.
Based on this area, we categorise queries into two types: Single-cell and Multi-cell
queries. A single-cell query is a query that asks for information about objects
located in the same cell as the mobile user. On the other hand,
a multi-cell query is a query that asks for information about objects in the current
cell and its neighbouring cells.
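To make the distinction concrete, the classification can be sketched in a few lines of Python. This is our illustration rather than an algorithm from the thesis, and it assumes, purely for simplicity, that both the cell and the query scope are circles given as a centre and a radius (`classify_query` is a hypothetical name):

```python
import math

def classify_query(cell_center, cell_radius, scope_center, scope_radius):
    """Return 'single-cell' if the query scope fits entirely inside the
    user's current cell, otherwise 'multi-cell'."""
    # Distance between the cell centre and the centre of the query scope.
    dist = math.hypot(scope_center[0] - cell_center[0],
                      scope_center[1] - cell_center[1])
    # The scope lies inside the cell iff dist + scope_radius <= cell_radius;
    # otherwise the scope crosses the cell boundary into neighbouring cells.
    return "single-cell" if dist + scope_radius <= cell_radius else "multi-cell"
```

A scope well inside the cell is classified as single-cell; one whose circle pokes past the cell boundary requires neighbouring cells and is classified as multi-cell.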
Due to the dynamic nature and limitations of the mobile environment, query
execution needs to cope with these limitations [74, 72, 23, 63, 87]. They include
smaller storage, lower processing capacity and narrower network bandwidth than
in wired networks. Frequent disconnections resulting
from the narrow wireless bandwidth often occur during data transmission.
Mobile query processing can be done on the server side, the client side, or both.
At the server side, the process involves a single server or multiple
servers [128], depending on the query type. The challenges for single-server query
processing are to return correct answers and to handle disconnections. A
correct answer means that the answer is still valid when it is received by the user. A
disconnection might be temporary or permanent; during this period,
the mobile user does not receive any answer from the server.
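The notion of a correct answer can be sketched as a simple validity test. The sketch below is illustrative only: it assumes the server tags each result with the circular region (centre and radius) for which the answer was computed, and treats the answer as valid only while the user is still inside that region when it arrives:

```python
def result_still_valid(result_region, user_position):
    """Check that a query result is still valid on arrival.

    result_region: (cx, cy, r), the circle for which the answer was
    computed (an assumed tagging scheme, not the thesis's protocol).
    A user who has moved outside that circle needs a fresh answer.
    """
    cx, cy, r = result_region
    dx, dy = user_position[0] - cx, user_position[1] - cy
    # Compare squared distances to avoid an unnecessary square root.
    return dx * dx + dy * dy <= r * r
```

In this sketch, the server would rerun the query (or decide to produce the next result) whenever the check fails on delivery.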
On the other hand, query processing that involves multiple servers poses other
challenges. These involve receiving answers from other servers, because the
user request is forwarded by the current server to the appropriate servers on behalf
of the mobile client. The matched data items are sent to the requester cell, where
they are merged with the requester cell's own data items before being sent to
the client. In this case, the server that receives the user query should be aware of the
processing time of the other servers. Otherwise, results that are no longer valid would
be sent to the user, even though they were valid during query processing.
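A minimal sketch of this forwarding-and-merging step follows, under the simplifying assumptions that each server is represented by its list of data items and the query by a predicate; `process_multicell_query` is our name for the routine, not the thesis's:

```python
def process_multicell_query(matches, local_items, neighbour_item_lists):
    """Evaluate a query at the current server, merge in the partial
    answers forwarded back by neighbouring servers, and return one
    duplicate-free result set (an object replicated in two cells
    should appear only once in the answer sent to the client)."""
    merged = {obj for obj in local_items if matches(obj)}   # local answer
    for items in neighbour_item_lists:                      # forwarded answers
        merged |= {obj for obj in items if matches(obj)}    # merge partials
    return sorted(merged)
```

In a real deployment each `items` list would arrive over the network, which is why the receiving server must bound how long it waits for slow neighbours before the merged result goes stale.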
Furthermore, another challenge for multi-server query processing is index
structure traversal. Of the many existing indexing structures [93, 32], tree index
structures are commonly used [123]. The R-tree [41] indexing structure was chosen
for our research because it is one of the best-known tree indexes [93], able
to store multi-dimensional indexes for points and rectangles. Unfortunately,
it lacks efficiency when processing multi-cell queries. For example, suppose a multi-cell query
asks for cells A and B, where each cell has its own R-tree index structure. Both
cells must traverse their tree indexes from the top (root node). Therefore, there is a
demand for new mechanisms to overcome this limitation.
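The redundancy can be seen in a toy R-tree-style range search. This is an illustrative sketch, not Guttman's full algorithm: nodes are plain tuples rather than paged structures, and the point is simply that for a multi-cell query over cells A and B the search must be restarted from each cell's own root:

```python
def boxes_overlap(a, b):
    """Overlap test for axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def rtree_search(node, query_box):
    """Range search over one cell's tree.

    A node is ('leaf', box, [(object, box), ...]) or
    ('inner', box, [children]); every query re-enters at the root."""
    kind, box, payload = node
    if not boxes_overlap(box, query_box):
        return []                              # prune the whole subtree
    if kind == 'leaf':
        return [obj for obj, obj_box in payload
                if boxes_overlap(obj_box, query_box)]
    results = []
    for child in payload:
        results.extend(rtree_search(child, query_box))
    return results
```

For a multi-cell query, `rtree_search(root_A, q)` and `rtree_search(root_B, q)` must both be run from scratch; avoiding that repeated top-down descent is what the Local and Global Index mechanisms of Chapter 4 target.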
Client-side query processing can be achieved by providing a cache on the mobile
device. A cache is temporary space in local storage for the most frequently requested
data items. Client devices place query results in the cache upon receiving them.
When the client issues similar queries, they are answered from the
cache if the results are available there; hence, communication
with the server can be avoided. Existing caching mechanisms have been applied in wired
and wireless networks, such as the Web [3, 114] and distributed systems [106].
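The cache-first lookup just described can be sketched as follows; `answer_query` and `fetch_from_server` are illustrative names (not from the thesis), and the cache is simply a dictionary keyed by the query:

```python
def answer_query(query_key, cache, fetch_from_server):
    """Answer from the local cache when possible; otherwise contact
    the server and keep the result for later reuse."""
    if query_key in cache:                    # hit: no wireless communication
        return cache[query_key], "hit"
    result = fetch_from_server(query_key)     # miss: ask the server
    cache[query_key] = result                 # store for future similar queries
    return result, "miss"
```

The second time the same query is issued, the server (and the wireless link) is not contacted at all, which is exactly what makes caching attractive under frequent disconnections.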
In wireless networks, caching data items is an appropriate way to handle the
limitations of a mobile environment, such as frequent disconnections, small storage
size and narrow wireless bandwidth. The reason is that query results that have been
requested before can be obtained from the local copy without connecting to the
server. Several caching mechanisms have been developed in this environment
[46, 64, 60, 61, 57, 58, 126, 18]. However, developing the best client cache
replacement algorithms within the limitations of mobile environments is challenging
and will be addressed in this thesis.
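The cache-first flow just described can be sketched minimally (all names are illustrative; a bounded dictionary with simple LRU eviction stands in for the limited client store, not for the policies proposed later in this thesis):

```python
# Minimal sketch of the cache-first lookup: answer from the local copy on
# a hit, and contact the server only on a miss.

class ClientCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}          # query -> cached result
        self.order = []          # recency list for a simple LRU eviction

    def get(self, query):
        if query in self.store:              # cache hit: no server contact
            self.order.remove(query)
            self.order.append(query)
            return self.store[query]
        return None                          # cache miss

    def put(self, query, result):
        if query not in self.store and len(self.store) >= self.capacity:
            victim = self.order.pop(0)       # evict the least recently used entry
            del self.store[victim]
        self.store[query] = result
        if query in self.order:
            self.order.remove(query)
        self.order.append(query)

def answer(query, cache, fetch_from_server):
    """Answer from the cache if possible; otherwise fetch and cache the result."""
    result = cache.get(query)
    if result is None:
        result = fetch_from_server(query)
        cache.put(query, result)
    return result
```

Repeated queries are then served without any wireless communication, which is precisely what mitigates disconnections and narrow bandwidth.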
1.2 Limitations of Mobile Environments
Current mobile computing devices are limited in their ability to complete jobs
that desktop computers handle easily, and these limitations continue to challenge
researchers. The limitations of mobile computing devices are described as follows:
• Limited resources
The resource constraints of mobile computing devices mean that they can process
fewer jobs than a desktop computer. The latest processing speed for current
PDAs is 500 MHz, which is 6-7 times slower than a desktop computer. In
addition, mobile computing devices are powered by batteries, whose power is
consumed mostly by screen back-lighting, Central Processing Units (CPUs),
memory, hard disks, data transmission and displays. Therefore, traditional
applications that consume large amounts of resources do not run efficiently on
mobile computing devices.
• Limited geographical coverage
Limitation of geographical coverage is another constraint in the mobile envi-
ronment. This means that a base station can cover only a particular wireless
service area, called a cell.
• Mobility
Mobile devices can be carried from one location to another: mobility means
that a mobile user moves from one geographical location to another, either
within a wireless service area or to a different one. When a mobile device
moves to a different wireless service area, an event called hand-off occurs,
which transfers communication from the previous wireless service area to the
current one.
• Disconnections
Disconnections in mobile environments can be classified into two categories:
long and frequent. In the first category, mobile devices run out of power or
move out of service coverage. The latter category is the major difficulty, and
occurs because of echoed signals, interference from other signals, movement to
a different network and other issues.
• Variable bandwidth
The bandwidth available to mobile computing devices varies, for example when
mobile users are far from a base station. The maximum bandwidth available to
a mobile device is 100 Kbps for General Packet Radio Service (GPRS) [5]
and 155 Mbps for wireless LAN [24].
• Data transfer cost
Currently, transferring data items is still expensive: sending and receiving data
costs AUD 30 cents per megabyte [49]. This is caused by factors such as the
cost of spare parts and worker productivity [1].
• Small screen devices
A mobile device's screen is much smaller than a desktop monitor, making it
impossible to view a long list of records at once.
1.3 Objectives of this Thesis
In this thesis, mobile query processing techniques are investigated, focusing on
both the client and server sides. Our objective is to develop new algorithms that
increase the performance of mobile query processing.
The following issues are focused upon in this thesis in order to achieve the ob-
jective mentioned above:
• to create an innovative server-side query processing technique that is divided
into single-cell and multi-cell query processing;
• to design indexing mechanisms for multi-cell query processing;
• to model new caching replacement schemes for mobile devices; and
• to implement and evaluate the proposed approaches above.
1.4 Scope of Research
Traditional query processing techniques are not adequate for processing mobile
queries due to the limitations of mobile devices, mobile environment and mobil-
ity of users [81, 79]. Hence, this research investigates several outstanding issues
around mobile query processing at server and client sides. Then, we propose solu-
tions to these outstanding issues by developing new algorithms or modifying existing
algorithms.
Figure 1.1 depicts the scope of this research. The thesis consists of three core
areas, which include developing query processing algorithms for server and client
sides. The first two core areas are query processing at the server side, whereas
the last one focuses on the client side. Therefore, it is important to carry out
some investigations of query processing mechanisms for both sides which includes
Figure 1.1: Thesis framework

addressing the issues raised by the limitations of the mobile environment and its
devices. The investigation also needs to address the nature of the mobile environment,
in particular query processing at the client side, to handle small storage size
and narrow wireless network bandwidth. On the other hand, the query processing
techniques at the server side deal with queries that request information located in
either single or multiple stationary service areas.
1.5 Contributions
The specific contributions of this thesis are listed below:
• Query Processing at Server Side
Several algorithms to process mobile queries at the server side are proposed.
These algorithms are categorised into two major parts: Query Processing and
Handling Disconnection algorithms. The query processing part focuses on
answering Single-Cell and Multi-Cell queries. A single-cell query is a request
for information of objects that is located within one cell, whereas, a multi-cell
CHAPTER 1. INTRODUCTION 8
query is when the user requests information of objects within several cells. The
handling disconnection part deals with frequent disconnections, which occur
during query result transmissions.
• Indexing for Multi-Cell Query Processing
An extension of an existing multi-dimensional index structure is proposed
to support the processing of multi-cell queries. Two index mechanisms,
namely Local and Global indexes, are introduced with the aim of minimising
query processing when result retrieval involves multiple cells. The local index
mechanism retrieves query results and builds the indexes locally. The global
index mechanism, which is built while a BS is starting up, contains indexes
of all online BSs. For both mechanisms, remote data items can be either
replicated in the local cell or kept in the remote cell.
• Cache Replacement Policy for Mobile Client Caching
In order to address the major mobile issues of small display screens and storage,
a number of cache replacement policies are developed. The main aims of the
proposed cache replacement policies are to increase the cache hit rate and to
handle the issue of a small display screen. Three cache replacement policies
are proposed: the Path-based, Density-based and Probability Density Area
Inverse Distance (PDAID) mechanisms. The first mechanism eliminates the
group of cached objects that is farthest from the user. The density-based
mechanism evicts the group of cached objects that is least dense. The PDAID
mechanism evicts the group of objects with the smallest value computed by a
formula based on distance, weight and the area of the query scope.
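The actual PDAID formula is defined in Chapter 5; as a rough, hypothetical sketch of the idea, a stand-in score below combines the three ingredients named above (distance, weight, and query-scope area), and the group with the smallest score is evicted first. All names and the formula itself are illustrative, not the thesis's definition.

```python
# Hedged sketch of score-based group eviction: weight * area, discounted by
# the group's distance from the user (an inverse-distance factor, as the
# PDAID name suggests). The group with the smallest score is the victim.
import math

def score(group, user_pos):
    """Illustrative score for one cached group of objects."""
    dx = group["centre"][0] - user_pos[0]
    dy = group["centre"][1] - user_pos[1]
    dist = math.hypot(dx, dy) or 1e-9        # guard against division by zero
    return group["weight"] * group["area"] / dist

def evict(groups, user_pos):
    """Remove and return the cached group with the smallest score."""
    victim = min(groups, key=lambda g: score(g, user_pos))
    groups.remove(victim)
    return victim
```

With this stand-in score, distant groups receive small values and are evicted first, matching the intuition behind the path-based policy as well.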
1.6 Organisation of the Thesis
The thesis is organised into four parts which are separated into six chapters. The
first part is a review of the literature, the second one is our proposed approach,
the third is the implementation of our proposed approaches and the fourth is the
conclusion of this thesis. The details of thesis organisation are explained as follows:
Chapter 2 presents existing related works in mobile query processing. The aim
of this chapter is to investigate the work done by other researchers in the same area;
to outline the achievements of their works in the same domain; and to analyse the
benefits and shortcomings of the works. This chapter also focuses on the problems
which still need to be investigated.
The core of this thesis, which concentrates on the problems pointed out in Chap-
ter 2, is divided into three major elements: (i) Query Processing at Server Side, (ii)
Indexing Mechanisms for Multi-Cell Query Processing, and (iii) Client Cache Re-
placement Policies.
Chapter 3 presents query processing techniques at the server side, which are
categorised into single-cell and multi-cell query processing. The proposed approach is
based on the need to choose a correct and efficient shape as a query scope. It also
needs to apply an effective algorithm in mobile query processing. The proposed ap-
proach is elaborated upon in detail and the query processing algorithm is explained.
Chapter 4 elaborates upon the indexing mechanisms for multi-cell query pro-
cessing. The aim of this chapter is not to propose a new indexing structure, but to
propose mechanisms which use an existing indexing structure to process multi-cell
queries. The mechanisms are divided into local and global indexing mechanisms for
processing multi-cell queries at the server side.
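As a rough illustration of the global mechanism just described (all structures and names here are illustrative, not the thesis's actual design), a global index built while a base station starts up can map object identifiers to the base station that holds them, so a multi-cell query is routed directly rather than forwarded blindly:

```python
# Illustrative sketch of a global index: built at BS start-up from the
# object holdings of all online base stations, it lets any BS answer
# locally or name the remote cell that owns the requested object.

class BaseStation:
    def __init__(self, name, local_objects):
        self.name = name
        self.local = dict(local_objects)     # object id -> data held locally
        self.global_index = {}               # object id -> owning BS name

    def build_global_index(self, online_stations):
        """Run at start-up: record which online BS owns each object."""
        for bs in online_stations:
            for oid in bs.local:
                self.global_index[oid] = bs.name

    def locate(self, oid):
        """Answer locally if possible, otherwise name the remote cell."""
        if oid in self.local:
            return ("local", self.name)
        owner = self.global_index.get(oid)
        return ("remote", owner) if owner else ("unknown", None)
```

Whether the remote object is then replicated locally or fetched from the remote cell is the design choice the two mechanisms differ on.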
Chapter 5 describes query processing techniques at the client side. The mech-
anisms include three cache replacement policies to deal with partial answers. In-
creasing cache hit performance and minimising transfer costs are the purposes of
the mechanisms.
Chapter 6 presents the performance evaluation of the proposed mechanisms men-
tioned in Chapters 3, 4 and 5. The evaluation includes formulating various cost
models for each proposed mechanism through implementation and simulation.
The last chapter concludes the contents of this thesis. The contents include
a summary of the research contributions and results achieved and presents future
issues for further investigation.
Chapter 2
Literature Review
2.1 Introduction
This chapter presents a comprehensive review including related work in the area of
mobile query processing. The main purpose of this chapter is to supply a broad
knowledge of the existing works related to this thesis. The chapter not only
provides a general summary of mobile technology for query processing, but also
analyses what other researchers have done in the area of query processing.
The organisation of this chapter, as shown in Figure 2.1, is as follows. Two
preliminary sections that provide background knowledge are presented in Sections
2.2 and 2.3. In Section 2.2, a global overview of wireless networks architecture is
presented. Section 2.3 describes a framework of queries.
Discussions of existing works are given in Sections 2.4, 2.5 and 2.6. Section 2.4
discusses the query processing at server side. Existing works on indexing structures
are discussed in Section 2.5. A discussion on existing client caching is described in
Section 2.6. Section 2.7 presents the problems that have not yet been resolved. The
last section concludes this chapter.
CHAPTER 2. LITERATURE REVIEW 12
Figure 2.1: Chapter 2 framework
2.2 Wireless Environment Architecture
A mobile computing environment has an architecture similar to that of the wired
network in terms of query processing and data communication. It means that mobile
computing devices can process the query and communicate with other devices, in a
similar way to the wired network. However, each environment has a different way
of dealing with query processing and communications. A mobile device with its
limitations can process fewer queries compared with the device in a wired network.
In terms of communication, mobile computing devices use the air as their medium.
In general, the wireless environment has two important devices: a mobile device
and a stationary host. Mobile computing devices can communicate through the air
with either mobile devices or stationary hosts. All queries are transferred by moving
users to a stationary server via wireless communication as shown in Figure 2.2.
Stationary hosts are called Base Stations [44, 127], Mobile Support Stations [44, 127],
or home base nodes [127]. A base station is a stationary host that acts as a mediator
between a wired network and wireless hosts for a specific area, called a cell. This
type of computing is called mobile computing [51].
Figure 2.2: Wireless architecture
2.2.1 Wireless Technologies
This section discusses the current wireless technologies, and includes wireless tech-
nologies used for indoor and outdoor networks.
• In-room Network
In this category, a mobile device can communicate with other mobile devices
using short-range wireless links. Two types of in-room network are mentioned
in [44]: infrared and radio frequency. In the first type, the wireless network
coverage is about 40-50 metres with a supported bandwidth of about 1 Mbps.
The most common standard for this network technology at present is the
Infrared Data Association (IrDA) standard.
On the other hand, the Bluetooth Special Interest Group produced the in-room
radio frequency technology in 1998 [14]. Bluetooth is a low-cost, short-range
radio that connects mobile PCs with other Bluetooth devices within wireless
network coverage ranging from 1 metre up to 100 metres. The data transfer
rate is up to 3 Mbps.
• Wireless LAN (WLAN)
A wireless local area network provides location-independent communication by
connecting two or more mobile computing devices without using wires. This
technology provides wide wireless bandwidth to low mobility clients. The aim
of WLANs is to provide a wireless bridge to conventional wired networks rather
than supporting true mobility [88]. This technology expands the range of the
infrared and Bluetooth technologies by extending the network diameter to
about 200 m [44]. It provides low-mobility, high-data-rate data communications
within a confined region [127].
Table 2.1: Comparison of IEEE 802.11 standards
IEEE standard    Speed (Mbps)    Frequency band
802.11           1-2             2.4 GHz
802.11a          up to 54        5 GHz
802.11b          5.5-11          2.4 GHz
802.11g          up to 54        2.4 GHz
Amongst the several available WLAN standards, the IEEE (Institute of
Electrical and Electronics Engineers) 802.11 standard is the most successful
today, and it is superficially similar to Ethernet [38]. The IEEE 802.11
standard comprises a number of protocols [108]; however, only three have been
widely used, namely IEEE 802.11a, IEEE 802.11b and IEEE 802.11g [38].
Table 2.1 gives a summary of these three types of IEEE 802.11. The table
shows that the first-generation IEEE 802.11 is slow in terms of bandwidth. On
the other hand, Table 2.2 [79] shows a comparison of the WLAN standards
802.11a and 802.11b in more detail by considering several factors.

Table 2.2: Comparison of wireless local area network (WLAN) standards - 802.11a versus 802.11b

Factor                    IEEE 802.11b                        IEEE 802.11a
Time table                Standard in 1997,                   Standard in 2001,
                          products in 2000                    products in 2002
Frequency band            Transmits at 2.4 GHz; the IEEE      5 GHz
and bandwidth             802.11g standard increases the
                          speed of 802.11b to 22 Mbps in
                          the same 2.4 GHz band
Speed                     11 Mbps (effective speed about      54 Mbps (effective speed about
                          half of rated speed)                50% of rated speed)
Modulation                Spread Spectrum                     OFDM (Orthogonal Frequency
technique                                                     Division Multiplexing)
Distance coverage         Up to 300 feet                      60 feet; speed goes down
                                                              with increased distance
Maturity                  More matured products               Less matured but
                                                              progressing fast
Number of access          Every 200 feet in each direction    Every 50 feet
points required
Market penetration        Quite widespread                    Just starting in 2002
Interference with         Band is more polluted;              Less interference because
other devices             significant interference here       few devices in this band
Interoperability          Current problems expected           Problems now but
                          to be resolved in future            expect resolution soon
Cost                      Cheaper: $300 for access point      More expensive: $500 (in
                          and $75 for adapter                 2001/2002); will come down
Vendors                   Major vendors in both camps
• Broadband Wireless Network
Broadband Wireless (BW) [81] is a wireless technology, recently deployed in
metropolitan areas, that allows simultaneous wireless delivery of voice, data,
and video. It requires a clear line of sight between the transmitter and the
mobile computing devices. Its two types are Local Multi-point Distribution
Service (LMDS) and Multi-channel Multi-point Distribution Service (MMDS).
The first, LMDS, uses a high-bandwidth wireless frequency in the 20-31 GHz
range. The second, MMDS, uses a lower wireless frequency around 2 GHz and
has a coverage of up to 35 miles (roughly 56 km).
• Wide Area Wireless/Radio Network
Wide Area Wireless is designed to provide data transmission and its infrastruc-
ture consists of base stations, network control centres and switches to transmit
the data [127]. The characteristics of Wide Area Wireless are high mobility,
wide ranging and low data rate digital communication [88, 127]. This network
type can be categorised into public and private radio network [88]. The first
category is the wireless data communications supplied to the public by service
providers and the average data rate is 4800 bps to 19.2 Kbps [127]. The second
category is provided by a private company for its own purposes. Examples of
public packet data networks are ARDIS, CDPD, Ericsson's Enhanced Digital
Access Communication Systems (EDACS), Metricom, Mobitex and Motorola
Datatrac [33].
• Satellite-based Network
The satellite network has been used to deliver communication, which relays
voice, video or data, since the 1960s [26]. The satellite-based network is
characterised by wide coverage, high cost, two-way communication and low
voice quality. Its coverage spans the oceans as well as remote land areas [70].
It provides two-way communications; however, voice quality is low and data
capacity limited [127, 88]. It is also expensive to provide this type of
network [31].
There are three common terms used for these satellites based on their dis-
tance and spatial relationship with the earth, namely GEOstationary Satellites
(GEOS), Medium Earth Orbit Satellites (MEOS) and Low Earth Orbit Satel-
lites (LEOS) [88, 31, 110]. GEOS, MEOS and LEOS are located at altitudes
of 35,786 km, 10,000 km and 1,000 km respectively.
• Cellular Network
The cellular network has evolved from first generation up to fourth generation.
The first generation (1G) of cellular systems appeared in the early 1980s and
is based on analog technology [6]. Voice is transmitted using Frequency
Modulation (FM) [88]. The first generation is characterised by low capacity, a
lack of security and unsuitability for non-voice applications [6]. The data
transfer rate is 1.2-9.6 Kbps [88].
In the early 1990s, the second generation (2G) of cellular systems appeared
and was heralded by the arrival of digital modulation techniques that promised
increased capacity, better speech quality, enhanced security features, and more
efficient terminals [6]. It has a data transfer rate of 9 to 14 Kbps [88].
Examples of second generation cellular networks include Time Division
Multiple Access (TDMA), Code Division Multiple Access (CDMA), the Global
System for Mobile Communications (GSM) and Personal Digital Cellular (PDC).
The second-and-a-half generation (2.5G) is an enhancement of the second
generation. Examples include Enhanced Data Rates for Global Evolution
(EDGE), High-Speed Circuit-Switched Data (HSCSD) and General Packet
Radio Service (GPRS), with data transfer rates of 474 Kbps, 38.4 Kbps and
171.2 Kbps respectively [82].
The third generation was developed in 1992. Examples of the third generation
include the Universal Mobile Telecommunications System (UMTS) and Code
Division Multiple Access 2000 (CDMA2000). This generation has three
categories of data rates, as follows [6]:
– 2.4 Mbps to stationary users (fixed location)
– 384 Kbps to pedestrian users (travel speed: 3 km/hour)
– 144 Kbps to vehicular users (travel speed: 60 km/hour)
The next step beyond the 3G wireless network is 3.5G, with data rates of
3 Mbits/sec [29].
The fourth generation has not officially been released yet, but it is expected
that this generation will support applications up to 1 Gbps [53].
As mentioned earlier in this section, a cell is the service area of one BS, and
cells may differ in size. According to [71, 35], cells are classified into three
types: Macro, Micro and Pico cells. A Macro cell has a radius of 700-8000
metres, a data transfer rate of 144-384 Kbps and a bandwidth frequency of
11.34 MHz. A Micro cell has a radius of 75-700 metres, a data transfer rate
of 384 Kbps and a bandwidth frequency of 1.26 MHz. A Pico cell has a radius
of 20-75 metres, a data transfer rate of 384 Kbps - 2 Mbps and a bandwidth
frequency of 1.26 MHz.
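The classification above reduces to a small lookup by radius; a minimal sketch, using exactly the ranges quoted from [71, 35]:

```python
def classify_cell(radius_m):
    """Classify a cell by its radius in metres, using the quoted ranges
    (Pico 20-75 m, Micro 75-700 m, Macro 700-8000 m)."""
    if 20 <= radius_m < 75:
        return "Pico"
    if 75 <= radius_m < 700:
        return "Micro"
    if 700 <= radius_m <= 8000:
        return "Macro"
    return "out of range"
```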
2.2.2 Location Positioning Systems
This section discusses available location positioning devices which are used to reg-
ister mobile user details in order to use a wireless facility.
• Satellite Positioning
The most popular satellite positioning system is the Global Positioning System
(GPS). This system provides two basic types of service: the Standard
Positioning Service (SPS) and the Precise Positioning Service (PPS) [56]. The
SPS is a positioning and timing service aimed at civilian users, whereas the
PPS is a positioning, velocity and timing service for military applications. The
second service is restricted to authorised users only (such as the United States
and allied militaries and the US government). Another satellite positioning
system, called Galileo, will start operation in 2009.
• Cellular Positioning
Cellular positioning integrates GPS with the cellular network, so that the
network provides terminals with assistance data and corrections for the
satellites [56]. Examples of cellular positioning for the second generation
cellular network (GSM, the Global System for Mobile Communications) are
Cell-Id in combination with timing advance, Enhanced Observed Time
Difference (E-OTD), Uplink Time Difference of Arrival (U-TDoA) and
Assisted GPS (A-GPS). Introducing Cell-Id and A-GPS into existing GSM
networks is comparatively simple, while E-OTD and U-TDoA require
substantial modifications and extensions.
Table 2.3: Performance characteristics of cellular positioning methods
Method          Accuracy (Rural)  Accuracy (Suburban)  Accuracy (Urban)  Consistency  Yield
Cell-Id         >10 km            2-10 km              50-1,000 m        Poor         Good
E-OTD & OTDoA   50-150 m          50-250 m             50-300 m          Average      Average
U-TDoA          50-120 m          40-50 m              40-50 m           Average      Average
A-GPS           10-40 m           20-100 m             30-150 m          Good         Good
Examples of cellular positioning for the third generation cellular network are
Cell-based methods, Observed Time Difference of Arrival with Idle Period
Downlink (OTDoA-IPDL) and Assisted GPS (A-GPS). Table 2.3 shows the
performance characteristics of each cellular positioning method [56]. The table
shows that A-GPS is the most accurate and consistent of the methods, even
though its service area is the smallest.
Assisted GPS (A-GPS) is a hybrid solution that uses information from both
the satellites and the network [4]. This technology enables a mobile terminal
with a GPS receiver to be positioned faster and more accurately [112]. The
A-GPS equipment is located at BSs and feeds information to mobile computing
devices. This technology has been used in the “KDDI au network” in Japan
[112]. The advantages of using A-GPS are: (i) improved accuracy, (ii) reduced
position acquisition time, (iii) lower power consumption at the GPS receiver,
and (iv) increased receiver sensitivity [4].
• Indoor Positioning
This positioning system operates within an indoor or local environment, such
as shopping centres or buildings. There are four indoor-based positioning sys-
tems: WLAN-based, Radio Frequency Identification (RFID)-based, infrared-
based and ultrasound-based. The first method is the most popular and uses
IEEE 802.11 devices. RFID is an emerging technology that is primarily used
today for applications such as asset management, access control, textile
identification, toll collection and factory automation [56].
Some such projects include Xerox ParcTab [117], the Wireless Indoor Position-
ing System (WIPS) project [119], Active Bat [118] and the Cricket system [92].
The first two projects use infrared-based positioning [117, 119]. The last two
projects use ultrasounds and a combination of ultrasounds and radio respec-
tively [118, 92].
2.3 Query Types
This section describes the classification of query types in a mobile environment.
Query types are divided into two general classes: Traditional and Mobile Queries.
The traditional category contains common query types that exist in wired network
databases, whereas the mobile category contains queries that exist only in a
wireless environment.
Figure 2.3 shows the classification of query types in a mobile environment.
Traditional queries are typical database queries. Classified by geographical
presentation, this type of query can be divided into two classes: Location-Aware
and Non-location. In the mobile computing environment, the location of mobile
users is dynamic and query results often depend on this dynamic location. This
situation therefore creates an additional class, called Location-Dependent Queries.
Figure 2.3: Query types classification
2.3.1 Traditional Query
Traditional query is the most widely known query used in a database. The query
types of traditional query can be classified as: Spatial, Temporal, Spatio-Temporal
(Hybrid) and Others.
A Spatial query performs operations which include spatial searches and map
overlay, as well as distance-related operations [37]. A spatial query always requests
spatial data, which have a complex structure, are often dynamic, and for which no
standard algebra is defined.
A Temporal query specifies a validity or deadline for the query results to be
returned. Example: “A student retrieves a subject timetable for this year”. The
subject timetable is not valid for a past or future year.
A Spatial-Temporal (Spatio-temporal) query requests for a spatial search and
specifies the validity or deadline for the query results to be received. Example:
“Retrieve the five ambulances that were nearest to the location of the accident
between 4-5pm.” [90].
The last category is Others, which covers queries that do not belong to any of the
classifications above. Examples:
• A tourist requests restaurant information.
• Students request their academic records or contact details.
2.3.2 Location Query
The authors of [50] were the first to introduce the idea of queries with location
constraints. These queries have a location parameter, and the query result is
related to, or depends on, that parameter.
A Location-Dependent Query [130, 63, 94] is a type of query whose answer
depends on the current location of the requester. For example, "select all restaurants
within 500 metres from my location". The answer should give a list of restaurants
within 500 metres of the requester's current location, as illustrated in Figure 2.4. If
the requester moves to a new location, the list of restaurants changes. Location is
an important field in this type of query, and it can be mentioned implicitly or
explicitly in the query [94].
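The restaurant example can be sketched as a simple range filter (coordinates and data here are illustrative, and planar Euclidean distance stands in for real geodesic distance):

```python
# Sketch of "restaurants within 500 metres of my location": filter the
# candidate set by distance to the requester's current position.
import math

def nearby(restaurants, user_pos, radius_m=500):
    """Return the names of restaurants within radius_m of user_pos."""
    ux, uy = user_pos
    return [name for name, (x, y) in restaurants
            if math.hypot(x - ux, y - uy) <= radius_m]
```

Re-evaluating the same query at a new position yields a different answer set, which is exactly what makes the query location-dependent.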
These types of queries can be further categorised into two groups. The first group
is based on sources and objects, and the second on query retrieval [113]. The
sources and objects are, respectively, the users sending the query and the objects
being searched; each can be either static or moving. The second grouping is based
on the query retrieval state: one-time or continuous.

Figure 2.4: Location-dependent query (LDQ) illustration.

A one-time query expects its result once. A continuous query, as the name implies,
receives results based on the current location of the source at successive moments
in time; the query is sent only once, and updated location information is sent to
notify the server that the client has moved to a different location. Both groups are
elaborated below.
(a) Data sources and objects states
This group focuses on the location states of users and objects while a user query
is being processed. The location of each can be static or dynamic during query
processing.
Table 2.4 shows the division of group one. As we can see from the table, category
one is further divided into four subgroups.

Table 2.4: Mobile query category 1

                 User static   User moving
Object static    -             x
Object moving    x             x

The first subgroup is a static user probing for static object/s. This subgroup
does not involve a mobility factor for
either users or objects. Whenever the query is sent, the query result returned
will always be the same. Therefore, the first subgroup cannot be classified as a
Location-Dependent Query.
The remaining three subgroups are: a moving user probing static object/s, a
moving user probing moving object/s, and a static user probing moving object/s.
Details of query processing for these subgroups can be found in Section 2.4. Below
is a summary of these subgroups and their examples:
• Moving user searching for static object/s
In this query type, a user or requester is moving while issuing a query and
the requested query results are static. Examples of this type of query:
– While driving, a taxi driver requests restaurants within 500 metres of
the current location.
– A tour guide in a moving car requests information about tourist at-
tractions nearby.
In the first example, the searching distance is explicitly mentioned, whereas
in the second it is not. This situation applies not only to this type but also
to the other two. [100, 111] provide common operators for constrained
location-dependent queries, which can be applied to location-dependent
queries in both groups.
• Moving user searches for moving object/s.
Both users and objects are moving for this type of query. Below are exam-
ples of query types:
– A walking person is searching for an available taxi close to his location.
– Police in a patrol vehicle are pursuing a running thief.
• Static user searching for moving object/s.
In this query type, the user remains in the same position while asking for
moving object/s. Below are examples of this query type:
– A security officer in a control room is searching for a fleeing thief.
– An officer in a control room is asking for landing time when an aircraft
is landing.
(b) Query Retrieval States
The second category relates to how often the query result is expected to be
received, that is, whether it is periodic or one-time. Table 2.5 shows query types
in category 1 used in combination with query types in category 2.

Table 2.5: Mobile query category 2

                              Periodic   One-time
Static user - moving object      x          x
Moving user - static object      x          x
Moving user - moving object      x          x
Details of both types in category 2 are specified below:
• One-time Query
A one-time query expects its result to be received once; it does not depend
on a time interval. All query types in category 1 are one-time queries if
their results are received once.
• Periodic Query
A periodic query is similar to a one-time query, except that query results are received at every time interval, with the interval specified in the query. A periodic query is also called a range-monitoring query [17]; it is used to monitor a query continuously. The results returned in one interval may be the same as or different from those returned in the previous interval. Example: “A moving car asks for traffic conditions within 500 metres every 5 minutes”.
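The contrast between the two retrieval states can be sketched as follows. This is a minimal illustration, not taken from any cited system; `evaluate_query`, the stubbed result it returns, and the interval handling are all hypothetical:

```python
import time

def evaluate_query(location, radius):
    """Hypothetical server-side evaluation: return objects within
    `radius` metres of `location` (stubbed for illustration)."""
    return [("traffic_report", location, radius)]

def one_time_query(location, radius):
    # One-time: the result is produced and delivered exactly once.
    return evaluate_query(location, radius)

def periodic_query(get_location, radius, interval_s, rounds):
    # Periodic: the result is re-evaluated at every interval until the
    # user cancels (here: after a fixed number of rounds, to stay finite).
    results = []
    for _ in range(rounds):
        results.append(evaluate_query(get_location(), radius))
        time.sleep(interval_s)
    return results
```

Note that the periodic variant re-reads the user's location each round, so consecutive results may differ, exactly as described above.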
2.4 Server Query Processing
This section presents a discussion of existing work on location-dependent query
processing at the server side. A brief overview is presented first to provide an idea
of how a location-dependent query is processed, followed by query processing at the
server side, indexing structures used at the server side and query processing at the
client side.
2.4.1 Overview of Location-Dependent Query Processing
As mentioned in the earlier section, mobile users need to register with a base station or location positioning device (see Section 2.2.2) in order to use a wireless facility. This registration process includes registering the location details of mobile users [67, 19].
After the registration process has been completed, a location-dependent query sent by a moving user is received by a base station. In processing this query, user mobility factors are taken into account, since they are important in answering a location-dependent query [30]. The mobility factors include the current position, velocity and direction of the user, all of which are linked to the query. This information is used to predict the user's next location. Once the next position is known, the server probes its database to match object information against the user query.
While the query is being processed, the mobile user moves to another location, which could be inside the same cell or another cell. The movement of a mobile user falls into two categories: constrained and unconstrained movement [85]. The former is movement within a network; for example, users may be driving a car, riding a bicycle, or travelling by tram or train, and roads can be either one-way or two-way. The latter is movement that is not restricted, for example, walking.
Furthermore, there are three important situations that can lead to wrong answers being given to the recipients. To illustrate, consider that most wireless applications use GPS to obtain an accurate location, and suppose the server has started to process the query. In one situation, the user might disconnect while reporting its position, so the positioning system does not have the correct location information of the user and gives the server the old location instead of the current one. In another situation, the server uses the current location information collected from the GPS, but the location given is out of date because the GPS has not been updated with the latest position. This last situation might occur where the user is expected to enter cell A but instead enters cell B, leading the server in cell A to process unnecessary requests.
Figure 2.5: Requesting a static object and moving within a single cell.
Figure 2.5 shows a global overview of location-dependent query processing within a single cell. A mobile user sends a location-dependent query to a server through a BS, asking for static objects. The server generates a query result for that query. The query result is received by the user, but the result is invalid: the user has since reached a new location, and the result does not cover the objects within the query scope of the new location.
Figure 2.6: Requesting a static object and moving to another cell.
Figure 2.6 illustrates a request for a static object with multi-cell movement. A moving user requests a static object while moving into another cell.
server processes the query and sends the query results to the requester. Since the
requester has moved to another cell, the BS forwards the query result to the BS
where the requester is located. When the query result is received by the requester,
the received result is invalid since the result contains object information from the
previous location.
Figure 2.7: Requesting a moving object and moving within a single cell.
A global overview of requesting moving objects while moving within a single cell is shown in Figure 2.7. In the figure, a moving truck requests a moving object from its location. At the same time, a moving object is registering itself with the BS. The server processes the query and returns a result to the requester, that is, the moving truck. By the time the query result arrives, the truck has moved to another location that is out of range of the query scope. Therefore, the truck receives an invalid query result.
Figure 2.8: Requesting a moving object (user and object moves to another cell).
Figure 2.8 shows a user searching for a moving object where both the user and the object move to another cell. While sending the query, the user moves to another location that resides inside a different cell. The server generates the query result and sends it back to the requester. At the same time, the object moves to another cell before acknowledging its new position to the current cell. Therefore, the server sends an old position of the moving object to the requester, and the user receives an invalid result.
Figure 2.9: Requesting a moving object and user stays at same position.
Figure 2.9 shows a user in the control room requesting a moving object. If the object has updated its position before the server processes the query, the server generates a correct result, which is received by the user. On the other hand, if the server finishes processing the query before the object updates its position, the user receives incorrect information.
Figure 2.10 shows a similar situation to that shown in Figure 2.9. However, the object moves to another cell. When the object updates its location before the server has finished processing the query, the user receives the correct information. Otherwise, the user will receive a query result that contains incorrect information.
Figure 2.10: Static user requests a moving object.
Figure 2.11: Periodic query
Figure 2.11 presents a periodic query illustration. In the figure, a user sends a query once while expecting a query result to be sent at every time interval. The server processes the query and sends the result at every interval. The process ends when the user asks the server to stop sending query results.
Figure 2.12: Non-periodic query
Figure 2.12 presents an illustration of a one-time query, where a user sends a query and receives a query result once. Afterwards, the server no longer processes the query.
2.4.2 Query Processing for a Single Cell
This section presents query processing mechanisms, focusing in particular on location-dependent query processing while the mobile user is travelling within a single cell. The discussion covers a variety of query scope shapes and approaches to predicting the next movement location.
A number of shapes exist, such as the rectangle, circle, polygon and hexagon [75]. Defining a valid scope for a mobile client is important in order to generate a correct answer to a given query once the mobile user has moved to a new location. In this section, we analyse previous studies on defining a valid scope. Existing works focused on defining a valid scope using the polygon, rectangle and circle.
• Polygon
An approach called Polygonal Endpoints (PE) uses a polygon shape to process a location-dependent query [130]. The PE scheme is a direct way to express the valid scope of a data value: all endpoints of the polygon are recorded to define the valid scope.
• Circle
Another way to define a valid scope is to use the Approximate Circle (AC) scheme. The AC scheme is one of the most convenient ways to generate a valid scope if we know the distance within which the user would like to find an object. In the AC scheme, a valid scope is defined by the centre of the circle and a radius value. The maximum size of the circle can be derived from the current velocity of the user [128]; the advantage is that the size of the valid scope at the current speed over a time interval can be predicted.
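A small sketch of this idea, under the assumption (our reading of [128]) that the maximum radius of the valid scope is the distance reachable at the current speed within the interval; the function names are ours:

```python
import math

def valid_scope_radius(speed_m_s, interval_s):
    # Maximum distance reachable at the current speed within the interval;
    # this bounds the circular valid scope (an assumption based on [128]).
    return speed_m_s * interval_s

def in_valid_scope(centre, radius, point):
    # A result item at `point` is still valid if it lies inside the circle.
    return math.dist(centre, point) <= radius
```

For example, a user moving at 10 m/s with a 60-second interval yields a 600-metre valid scope.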
As mentioned earlier, the movement of a mobile user can be within either a constrained or an unconstrained network. There are two ways to predict user movement: using a time function, and using indexing.
One data modelling concept that represents the positions of moving objects in databases as functions of time is Moving Objects Spatio-Temporal (MOST), devised by [102]. The aim of this approach is to estimate the position of objects at the time a query is issued, so that excessive updates are avoided. In their work, the location of a moving object is represented as a dynamic attribute which is divided into three sub-attributes: function, update time and initial value. How the value of the dynamic attribute changes over time is denoted by the function. This function can answer both one-time and periodic query types.
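The three sub-attributes can be sketched as follows; the linear motion function and the names used are illustrative assumptions, not the MOST authors' code:

```python
class DynamicAttribute:
    """Position as a function of time, following the MOST model's three
    sub-attributes: a function, an update time and an initial value."""
    def __init__(self, initial_value, update_time, function):
        self.initial_value = initial_value   # position at update_time
        self.update_time = update_time
        self.function = function             # how the value changes over time

    def value_at(self, t):
        # Estimate the position at query time t without contacting the
        # object, which is how excessive location updates are avoided.
        return self.function(self.initial_value, t - self.update_time)

# Hypothetical linear motion: constant velocity (vx, vy) in units/second.
def linear(vx, vy):
    return lambda p0, dt: (p0[0] + vx * dt, p0[1] + vy * dt)

pos = DynamicAttribute((0.0, 0.0), update_time=100.0, function=linear(2.0, 1.0))
```

A query arriving at t = 105 would estimate the object at (10.0, 5.0) without any intervening update message.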
Another way to determine the next location of moving objects is by using indexing. Section 2.5 discusses indexing structures in detail.
In order to answer a query in an efficient way, the query or object space is partitioned into several regions. [103] provided a solution for answering Reverse Nearest Neighbour (RNN) queries in two-dimensional space. They divide the space around the client location into six equal regions by straight lines intersecting at the client location; thus, there exist at most six RNN candidate objects around the client location.
Moreover, the Region Quad-tree is an indexing structure which uses a minimum bounding rectangle to store data points in four quadrants of equal size [97, 98]. Section 2.5 presents more details on Quad-tree indexing structures.
2.4.3 Query Processing for Multiple Cells
While travelling, a user may move to another cell, since cell boundaries are transparent to the user. When a user moves to another cell, a handover event occurs. During this period, the current base station may send a query result that is invalid once the user has moved to the other cell.
Zheng et al. [128] categorised various handover mechanisms into four types: Naive, Priority, Intelligent and Hybrid. The Naive method is the simplest of the four mechanisms to implement. However, its waiting time for answers from a server is shorter compared with the Priority method, which can answer the queries of normal users unless the number of urgent users keeps increasing. The Hybrid method does not give a better result because, if the number of users is large, the waiting time is lengthened. The Intelligent method gives a better result, since it calculates the expected time to leave the current cell: if the expected time to leave the current cell is known, the BSs of the new cells know when to process the queries, assuming that no unexpected delays occur.
On the other hand, [73] proposed four other handover approaches, namely Ping-Pong Avoidance (PPA), Towards the Border (TTB), MGIS Data Resolution (MDR) and Transmission Power and Interference Optimization (TPIO). In the PPA approach, undesirable handoffs are minimised by taking advantage of area information and a mobility model to predict users' movement. The TTB approach is useful for predicting when users will reach the boundary of the BS.
The Intelligent and TTB approaches have the same purpose, which is to predict how long it takes users to reach the BS boundary. However, the Intelligent approach is very straightforward in computing the arrival time in a new BS coverage area, and it ignores the movement direction. In contrast, the TTB approach considers user direction when computing the arrival time.
2.5 Indexing Structures for Query Processing
Indexing is a common technique for accessing a collection of records and improving the efficiency of query processing [93, 129]. The technique uses an index structure, a data structure that organises data records so as to optimise certain kinds of retrieval operations. An index allows us to efficiently load all records that match search conditions on the search-key fields of the index.
Various index mechanisms for conventional and mobile query processing are discussed in this section, including their outstanding problems.
2.5.1 Conventional Index Query Processing
To answer queries efficiently, database records are indexed and placed into an index structure. Various types of index structures have been developed [93, 32, 54]. Among the existing index mechanisms, tree-based schemes are prominent and widely used because of their easy tree traversal [123, 115].
The B+-tree [32] is widely known as one of the index data structures; it consists of non-leaf and leaf nodes, where a collection of non-leaf nodes forms a subtree. A non-leaf node contains up to m keys and m+1 pointers to the nodes on the next level of the tree hierarchy. All nodes on the left-hand side of a parent node have key values less than or equal to the key of that parent node; in contrast, the key values of the nodes on its right-hand side are greater than the key values of the parent node. The bottom-most nodes are called leaf nodes.
The R-tree index structure, developed by Guttman [41], stores multi-dimensional data (such as points). This index structure is efficient and capable of handling both point and region data items. Many researchers [109, 41, 11, 99], to name a few, expanded the features of the original R-tree into many variations, with the aim of providing an efficient and dynamic index structure for spatial data.
The structure of the R-tree is similar to that of the B-tree indexing structure. Figure 2.13 illustrates the R-tree. In the B-tree, a node of the tree is a single index; however, a node in the R-tree stores a set of d-dimensional geometric objects represented as a rectangle, called a Minimum Bounding Rectangle (MBR), which groups the closest objects together into a rectangle requiring the least area enlargement.
The R-tree insertion operation can be explained as follows. Assume that several data points are to be inserted into an R-tree with a maximum of 6 points per node. In the first state, when inserting a data point with id p into rectangle R, a bounding box is computed for the object and the pair <p,R> is inserted into the tree. The bounding box is enlarged when a data point is inserted. If the tree is empty, this bounding box becomes the root node of the tree.
(a) R-tree in two-dimensional space
(b) R-tree
Figure 2.13: The R-tree illustration [93]
After a certain time, when the maximum number of points for a bounding box has been reached, a new bounding box is created to accommodate new data points, and the existing objects are redistributed between the boxes. This adjustment of the bounding boxes is called splitting. In general, in a tree-splitting process, objects in the existing bounding box are grouped together so as to minimise the need for enlargement. Once the splitting process has been completed, both nodes become leaf nodes, and a new root node is created to cover both bounding boxes.
Now, assume that an R-tree exists and a data point with id ‘d’ is to be inserted. A traversal starts at the root node and follows a single path from the root node
to a leaf. At each level, the child node is chosen whose bounding box requires the least enlargement to cover the data point d. If several children have bounding boxes that cover d, the one with the smallest bounding box is selected. At the leaf level, the data point is inserted and, if necessary, the bounding box of the leaf is enlarged to cover d. When the bounding box is enlarged at the leaf level, this enlargement must be propagated to the ancestors of the leaf (after the insertion is made), since the bounding box of every node must cover the bounding boxes of all its descendants. If the leaf node lacks space for the new object, a process similar to that mentioned above is applied: splitting the node, reallocating entries between the old leaf and the new node, adjusting the bounding boxes and propagating these changes up the tree. Algorithm 2.1 shows the R-tree insertion algorithm for inserting a data entry E(I,B).
Algorithm 2.1: The R-tree insertion algorithm.
begin
    N ← root node
    if N is a leaf then
        return N
    end
    Select a node A in N whose A.I needs the least enlargement to store E.I.
    Traverse until a leaf node is reached by setting N to be the child node pointed to by A.
    if the selected node A is the leaf node and has free space for E then
        Insert E.
    else
        Split node A using one of the splitting algorithms.
    end
    Propagate any changes upwards by invoking Adjust Tree.
    if Adjust Tree requires the root node to be split then
        Expand the height of the tree.
    end
end
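The least-enlargement test used when descending the tree in Algorithm 2.1 can be sketched as follows, with rectangles as (xmin, ymin, xmax, ymax) tuples; this is an illustrative fragment, not Guttman's implementation:

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def enlarge(r, s):
    # Smallest rectangle covering both r and s.
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))

def choose_child(children, entry_rect):
    """Pick the child MBR needing the least area enlargement to cover
    entry_rect; ties are broken by the smaller bounding box, as in the text."""
    def cost(c):
        return (area(enlarge(c, entry_rect)) - area(c), area(c))
    return min(children, key=cost)
```

A child whose MBR already covers the entry has zero enlargement cost, so it is always preferred over one that would have to grow.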
Algorithm 2.2 shows the tree-adjustment algorithm. The process starts from a leaf node and moves upwards until it reaches the root node of the tree. When a node is full because of the insertion of a new record or a previous split, a new node is created to store the remaining contents of the existing node. The adjusted node is propagated to its parent node until the root node is reached.
Algorithm 2.2: The adjusting R-tree algorithm.
begin
    N ← a leaf node
    if N was split previously then
        NN ← LL, where LL is the second split node.
    end
    while N ≠ rootNode do
        P ← parent node of N
        PN ← the bounding box of N in P
        Adjust PN.I so that it tightly covers all bounding-box entries in N.
        if NN is the partner of N resulting from an earlier split then
            Produce a new bounding box PNN which covers all rectangles in NN, and a pointer PNN.ptr pointing to NN in P.
            Add this new bounding box PNN to P.
            if P has no free slot then
                Execute Split Node to separate the content of P into P and PP.
            end
        end
        N ← P; NN ← PP.
    end
end
When a node is full, node splitting occurs. The splitting mechanism is not as simple as for the B-tree, since MBRs may overlap. The original R-tree proposed three splitting mechanisms [41], as follows:
• Linear
This splitting algorithm selects as seeds two entries whose ends are far apart. The remaining entries are then taken in random order, and each is allocated so that the smallest MBR enlargement is required by the allocation.
• Quadratic
This algorithm seeks a small-area split; however, it is not guaranteed to produce the smallest area possible. Similar to the Linear mechanism, it selects as seeds the two entries with the maximum distance between them and then allocates each remaining entry to one of the two nodes, placing it in the group that requires the lesser expansion.
• Exponential
This algorithm is the most straightforward of the three candidates: it examines all possible groupings and selects the best one, so the minimum-area split is always found.
To search for a query point Q in an R-tree, a traversal begins at the R-tree root node and proceeds towards the leaf level. The bounding box of each child of the root is checked to see whether it overlaps the query. If more than one child of the root has a bounding box that overlaps Q, all corresponding subtrees are traversed. At the leaf level, the node is checked to find whether it contains the desired point. Conversely, no leaf node is visited at all if the query point is not in the indexed dataset. Algorithm 2.3 shows the search algorithm.
R-trees can also be used to answer Nearest-Neighbour (NN) queries [96], which find the objects closest to a given query point. Two ordering metrics, Minimum Distance (MinDist) and Minimum of Maximum possible distances (MinMaxDist), are used in the R-tree search algorithm. MinDist gives the smallest possible distance from a point P to any object enclosed in a rectangle R. MinMaxDist calculates the minimum of the maximum distances between the query point and the faces of the rectangle on each of the n axes; this metric guarantees that there is an object within the MBR at a distance less than or equal to MinMaxDist.
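For a point P and an MBR in two dimensions, the two metrics can be computed as follows; a sketch with rectangles as (xmin, ymin, xmax, ymax) tuples:

```python
import math

def min_dist(p, r):
    # Distance from P to the nearest point of the rectangle (zero if P is
    # inside): no object in r can be closer to P than this.
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)

def min_max_dist(p, r):
    # For each axis, pair the nearer face on that axis with the farther
    # coordinate on the other axis; the smallest such distance is an upper
    # bound on the distance to the nearest object inside r.
    lo, hi = (r[0], r[1]), (r[2], r[3])
    def near(k):
        return lo[k] if p[k] <= (lo[k] + hi[k]) / 2 else hi[k]
    def far(k):
        return hi[k] if p[k] <= (lo[k] + hi[k]) / 2 else lo[k]
    ds = []
    for k in (0, 1):
        other = 1 - k
        ds.append(math.hypot(p[k] - near(k), p[other] - far(other)))
    return min(ds)
```

By construction MinDist(P,R) ≤ MinMaxDist(P,R), which is what makes the pruning strategies below sound.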
Algorithm 2.3: The searching R-tree algorithm.
begin
    N ← the root node
    if N is not a leaf node then
        Find each child of the current node whose bounding box overlaps the query point / region.
        if found then
            Recursively search that child of the node.
        end
    else
        Verify all entries to discover whether an entry overlaps with S.
        Return the entries that overlap with the query point / region.
    end
end
Algorithm 2.4 shows the Nearest-Neighbour search algorithm, which uses a depth-first traversal. The traversal starts at the R-tree root node and proceeds towards the leaf level. In the beginning, nearestN (the nearest-neighbour distance) is set to infinity. Each newly visited non-leaf node during the downward traversal is passed as the node parameter; the algorithm calculates the ordering metrics for all its MBRs and sorts the corresponding entries into a list called the Active Branch List (ABL).
Once the ABL has been created, pruning strategies 1 and 2 are applied to the list to eliminate unnecessary branches. The algorithm then goes through each entry in the ABL until the ABL is empty; for each entry, the algorithm is called recursively with the entry, Point and nearestN values. At a leaf node, the function objectDIST is called to calculate the distance between the point and each object's MBR. The returned value is compared with the current value of nearestN and, if it is smaller, nearestN is updated. This step is repeated for each entry in the leaf node. On returning from the recursion, this new
Algorithm 2.4: Nearest-Neighbour search algorithm.
Input: Node, Point, Nearest
begin
    currNode ← Node                    // current node
    searchPoint ← Point                // search point
    nearestN ← Nearest                 // nearest neighbour
    NODE newNode
    BRANCHARRAY branchList
    integer dist, last, i
    // At leaf level - compute distance to actual objects
    if Node is a leaf then
        for i ← 1 to Node.count do
            dist ← objectDIST(Point, Node.branch[i].rect)
            if dist < Nearest.dist then
                nearestN.dist ← dist
                nearestN.rect ← Node.branch[i].rect
            end
        end
    else
        // Non-leaf level - order, prune and visit nodes
        genBranchList(Point, Node, branchList)    // generate Active Branch List
        sortBranchList(branchList)                // sort ABL on ordering metric values
        // Perform downward pruning (may discard all branches)
        last ← pruneBranchList(Node, Point, nearestN, branchList)
        // Iterate through the Active Branch List
        for i ← 1 to last do
            newNode ← Node.branch[branchList[i]]
            // Recursively visit child nodes
            nearestNeighbourSearch(newNode, Point, Nearest)
            // Perform upward pruning
            last ← pruneBranchList(Node, Point, Nearest, branchList)
        end
    end
end
estimate of the NN is taken, and pruning strategy 3 is applied to eliminate all branches with MinDist(P,M) > nearestN for all MBRs M in the ABL.
The three strategies of the pruning theorem are described as follows [96]:
1. An MBR M with MinDist(P,M) greater than the MinMaxDist(P,M1) of another MBR M1 is discarded, because it cannot contain the NN. This is used for downward pruning.
2. An object O whose actual distance from P is greater than the MinMaxDist(P,M) of an MBR M is discarded, because M contains an object O1 that is nearer to P. This is used for upward pruning.
3. Every MBR M with MinDist(P,M) greater than the actual distance from P to a given object O is eliminated, because it cannot enclose an object closer than O. This is used for upward pruning.
In the context of retrieving objects from several servers, the above algorithms are not efficient, because a tree traversal always starts from the root node on each of those servers.
2.5.2 Moving Object Index Query Processing
Several researchers have applied the concept of existing index structures to query processing in a mobile environment. This section discusses existing works that use an indexing structure to process queries in the mobile environment.
The authors of [107, 27] used the PMR Quadtree index structure to answer continuous queries that change as a function of time. The index structure is a variant of the quad tree that is used to store segment fragments and has a hierarchical vector representation [80]. The index values contain a function of time in the two-dimensional time-attribute space. More specifically, the PMR Quadtree stores information about a line segment in every quadrant of the underlying space that it crosses.
The RQ-tree index structure combines the R-tree and the Quad-tree to index the locations of objects [39]. The authors argued that spatial entities are not distributed evenly and can form differently shaped objects, and that the R-tree's performance degrades for objects whose scopes are not close to rectangles. Therefore, the RQ-tree contains an R-tree as the outer tree and uses Quad-trees to store the remaining objects: the R-tree stores regular objects (objects whose form is a rectangle) and the Quad-tree stores irregular objects, with the Quad-tree root node being a leaf of the R-tree.
In [91], R-trees are used to index static range queries and velocity-constrained data for processing continuous spatial queries over moving objects. In this mechanism, all incoming queries are indexed in one R-tree, while the second index (the VCI) is an R-tree-based index with an additional field vmax in each node and is used to index all moving objects. The vmax entry of an internal node is the maximum of the vmax entries of its children; at the leaf level, the vmax entry is the maximum allowed speed among the objects pointed to by the node.
The Lazy Update R-tree (LUR-tree) was proposed in [59]. This approach indexes the current positions of moving objects and decreases the update cost by eliminating unnecessary modifications of the tree during position updates. The index structure is restructured only when an object leaves its corresponding MBR; the LUR-tree merely overwrites the position of an object in the leaf node if the new position of the object is still inside the MBR.
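The LUR-tree update rule, rewriting in place while the object stays inside its leaf MBR and falling back to a full delete-and-reinsert only when it leaves, can be sketched as follows; the dictionary-based index is a deliberate simplification of the real tree structure:

```python
def inside(mbr, p):
    # Point-in-rectangle test for an (xmin, ymin, xmax, ymax) MBR.
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]

def lazy_update(index, obj_id, new_pos):
    """Sketch of the LUR-tree update rule; `index` maps obj_id to a
    (leaf_mbr, position) pair, a simplification of the real structure."""
    leaf_mbr, _ = index[obj_id]
    if inside(leaf_mbr, new_pos):
        # Cheap case: overwrite the position, leave the tree untouched.
        index[obj_id] = (leaf_mbr, new_pos)
        return "in-place"
    # Expensive case: the object left its MBR; a real LUR-tree would
    # delete the entry and re-insert it, possibly restructuring the tree.
    return "delete-and-reinsert"
```

Most updates of slowly moving objects hit the cheap branch, which is where the cost saving comes from.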
The TPR-tree indexing structure is based on the R-tree and indexes continuously moving objects at any time in the future [12]. In this scheme, the size of a rectangle is extended based on velocity and time; therefore, the number of targets remaining inside the rectangle increases as the rectangle grows. This indexing structure is also used to index the uncertainty of moving objects [45].
The TPR*-tree indexing structure enhances the TPR-tree by considering predictive queries [105]; it adapts the insertion and deletion algorithms of the R*-tree indexing structure.
Another variant of the TPR-tree is the TPROM-tree [28], an index structure that indexes the current and future positions of moving objects. It also handles object updates efficiently by adopting a memory-based update approach, which reduces the update cost by avoiding the need to delete old data items from the index structure during updates.
The Q+R-tree [121] indexing structure is similar to the RQ-tree in that it uses a combination of the Quad-tree and R-tree indexing structures. However, the Q+R-tree is used to index moving objects: the R-tree component indexes quasi-static objects, whereas the Quad-tree component indexes fast-moving objects which are distributed over wider regions. In other words, the R-tree indexes objects that are currently moving slowly, such as those crowded together in buildings or houses.
Another variant of the Q+R-tree is the PQR-tree [40], which efficiently indexes the current and near-future positions of moving objects. The PQR-tree also substantially decreases the update cost. It differs from the Q+R-tree in how the structures are integrated to place the moving objects. The benefit of this index structure is that it can manage moving objects inside and outside road networks at the same time, so the current and near-future positions of moving objects can be queried effectively.
The D-tree is similar to the KD-tree indexing structure [124]. The D-tree is a height-balanced binary tree constructed by partitioning data regions: a space is recursively partitioned into two subspaces containing a similar number of regions until every subspace holds one region. The partition between two subspaces is represented by one or more polylines in the two-dimensional space, formed by the divisions between regions.
The KD-tree is a binary search tree that represents a recursive subdivision of the universe into subspaces by means of (d-1)-dimensional hyperplanes [37]. The hyperplanes are iso-oriented and their direction alternates among the d possibilities. The KD-tree is also known as the Range tree [13]. In [55], the authors proposed an approach that maps moving objects and their velocities into points and keeps the points in a KD-tree index structure.
The Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree) [86] are two indexing structures, both extensions of the R-tree, for indexing moving-object trajectories. The former considers the trajectory identity in the index, whereas the latter is a hybrid structure which keeps trajectories and allows the R-tree's typical range search over the data.
2.6 Mobile Query Processing at Client Side
This section discusses issues relating to query processing mechanisms for mobile client devices. These issues are grouped into three categories: mobile-join, Top-K queries and caching. The first and third categories are similar. In the first category, data is downloaded from several cells and has to be joined on the mobile device in order to obtain explicit results; once the results have been produced and shown to the user, they are deleted within a short time. In the third category, the data is retrieved from the current cell on the first request and loaded from the local copy on subsequent requests. In contrast to the first category, data in the local copy is kept until there is not enough room to store new incoming data. Providing a cache for frequently accessed data items on the client side is therefore an effective approach to improving system performance [10, 126]. The second category focuses on retrieving records ranked in the Top-K.
2.6.1 Mobile-Join
Explicit query results can be obtained at the client side by retrieving data from several cells and joining it locally. Downloading all relations from those cells may not be an ideal solution given the limited resources of a mobile device, including the small memory available to store a large volume of data and the small display on which to view all results [68]. Several join mechanisms have been proposed; they are explained in this section.
The authors of [66] proposed three query processing mechanisms at the mobile
client side. In the first approach, a mobile client requests data from the related
cells and joins those data on the mobile client device. In the second approach, all
data are downloaded from one cell, while only the primary keys are retrieved from
the other; the information is then matched at the mobile client side, and any
missing information is retrieved from the other cell. In the last approach, the
primary keys are retrieved from all cells and matched at the client side, and the
data corresponding to the matched keys are then downloaded from the cells.
The authors of [83] proposed two query processing mechanisms in which the
pieces of data are located either on other mobile devices or on servers. In the first
approach, a mobile user sends a query to a server, which then informs other mobile
users that hold the remaining parts of the data; those mobile users and the server
send their data to the requester. The second approach is similar, except that the
server is in charge of joining the data and sends the joined result to the requester.
Block-Based Processing is a mechanism that breaks data down into blocks and
transfers the blocks one by one [68, 66]. The aim is to overcome the memory
capacity limitation and narrow bandwidth of the mobile environment. Two
block-based mechanisms for client-side query processing have been proposed in [66],
namely static and dynamic blocks. Both mechanisms download a fixed number of
records per block from one server and compare these records with another list from
another server. They differ in the way records are eliminated from a block. In the
dynamic mechanism, the last records of the two current blocks are compared: the
block whose last record is smaller is removed entirely, while the block with the
larger last record is preserved after the qualified matches are removed from it.
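As an illustration, the dynamic elimination step can be sketched as follows. This is a simplified reading of the mechanism, not the exact algorithm of [66]; it assumes each server delivers sorted blocks of join-key values, and the function names are ours:

```python
def block_join(fetch_a, fetch_b):
    """Join two streams of sorted key blocks, holding only one block per
    server in memory at a time (illustrative sketch of the dynamic
    block-based mechanism, not the exact algorithm of the cited work)."""
    result = []
    block_a, block_b = next(fetch_a, None), next(fetch_b, None)
    while block_a and block_b:
        # Qualified matches: keys present in both current blocks.
        matches = sorted(set(block_a) & set(block_b))
        result.extend(matches)
        if block_a[-1] <= block_b[-1]:
            # The block with the smaller last record is removed entirely;
            # the other block is kept with the matches eliminated.
            block_b = [k for k in block_b if k not in matches] or next(fetch_b, None)
            block_a = next(fetch_a, None)
        else:
            block_a = [k for k in block_a if k not in matches] or next(fetch_a, None)
            block_b = next(fetch_b, None)
    return result
```

Because only one block per server resides in client memory at a time, the scheme trades extra round trips for a bounded memory footprint, which is the point of block-based processing.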
The Recursive and Adaptive Mobile Join (RAMJ) mechanism runs on a mobile
device and joins two relations located on non-collaborative remote servers [69].
The data space is partitioned into several parts, and statistical information about
each part is retrieved from the servers without downloading the original data.
Based on this information, RAMJ adaptively decides, partition by partition,
whether to download the data of both relations that fall into a partition and join
them locally, or to partition further and recurse.
2.6.2 Top-K Queries
Processing or displaying the entire set of query results is not needed if only the
k most highly ranked results are processed or displayed. Queries of this type are
called Top-K queries. Researchers have studied Top-K queries for Web databases,
Peer-to-Peer (P2P) networks, data streams, sensor networks and mobile
environments [15, 9, 48, 120, 25].
Some work has been undertaken on processing Top-K queries in web applications
[62, 122]. This work takes into account relation attributes that are not directly
available and can only be accessed through external web form interfaces, a
situation that potentially causes large data sets to be queried repeatedly. Hence,
the proposed technique executes Top-K queries in a setting where the attributes
for which users specify target values are controlled by external, autonomous
sources with a variety of access interfaces.
Content sharing, one of the main applications in the P2P environment, has been
receiving increasing attention from users. As a result, the number of people using
such applications keeps growing, which affects network performance, so it is
essential to decrease the amount of network data transfer. Most users are generally
interested only in the results that correlate best with their query. One solution is
to apply Top-K queries within this environment in order to return only the most
relevant results.
Some researchers have worked on applying Top-K query algorithms in P2P
networks [8, 76, 21, 43]. Several decentralised Top-K query schemes have been
developed, using local ranking, optimised routing and merging to reduce the
number of results returned to the users. Consequently, the data transfer load is
reduced; however, the ranking and merging of results increases the computational
workload.
Top-K processing also enables the most relevant objects to be output at the
earliest stage, which is useful in mobile environments because the amount of data
transfer and power consumption can be reduced. Existing works such as KLEE
and SR-Combine [77, 78] have demonstrated their efficiency in meeting tight
response times in a mobile environment. These schemes deliver some initial results
early, which reduces waiting time, data transfer cost and processing power. Their
most important feature is the capability to adapt to various environments,
including faster-bandwidth networks, because retrieval is self-adapting and
concentrates on real-time requirements.
Due to the increasing number of popular applications, the size of the data
streams flowing over the network is also growing, to the point where it can
overload the network traffic. Users may therefore run into problems if they cannot
fully handle this large, continuous flow of data. One proposed approach is the
space-saving algorithm [77], whose main idea is to maintain only partial
information about the items of interest. The aim of this algorithm is to answer
certain stream queries before the data is discarded forever. For example, a Top-K
query over a data stream may ask for the elements receiving 0.5 percent or more
of the total hits, which might comprise the top 500 elements. The algorithm is
space-efficient and provides a strict error guarantee that bounds the estimated
counts of the elements, with memory proportional to the Top-K requirement.
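The counter-replacement idea can be sketched as follows. This is a simplified version of the space-saving algorithm that omits the per-counter error bookkeeping of the original; it keeps at most m counters regardless of stream length:

```python
def space_saving(stream, m):
    """Approximate the most frequent stream elements with only m counters
    (illustrative, simplified sketch of the space-saving algorithm)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < m:
            counters[item] = 1
        else:
            # Evict the element with the minimum count and hand its
            # (over-)estimated count, plus one, to the newcomer.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters
```

Counts are over-estimates by at most the evicted minimum, which is what gives the strict error guarantee mentioned above.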
2.6.3 Cache Replacement Policies
In general, placing a cache at the client side reduces network activity between
client and server. Caching mechanisms have been widely used to store frequently
accessed data in database [7, 106], distributed [106, 36] and web systems [125], in
both wired networks and mobile environments [104, 47, 94]. This section focuses
on cache replacement policies for the mobile environment.
The Least Recently Used (LRU) cache replacement policy uses a timestamp, the
time at which a data item was last accessed, to eliminate objects from the cache.
When the cache is not large enough to admit new objects, this approach evicts the
data items with the oldest access time.
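A minimal sketch of the LRU policy (illustrative only, using an ordered dictionary to track recency):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the item with the oldest access time
    once the capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```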
The authors of [95] proposed a client caching scheme, called Two-level LRU,
based on a clustering structure that exploits both semantic and temporal locality.
This approach clusters semantically or adjacently related query results together in
the cache. Because of its intrinsic properties, semantic caching is regarded as an
ideal caching scheme for mobile computing. The aim of the approach is to keep
the most profitable data in the cache with the help of clustering. Thus, if a query
Q2 can be totally or partially answered by Q1, it is placed in the same cluster as
Q1. During clustering, when a query can be partially answered by a segment of a
group, the part of the segment that answers the query is removed from the
segment and combined with the remaining answer to the query into a new
segment. If a segment becomes empty as a result of the removal, it is removed
from the cluster. On the other hand, if the query is partially answered by
segments belonging to different clusters, those clusters are merged into one. The
overall aim is to effectively reduce wireless network traffic and to deal with
disconnections.
[22] proposed a semantic model for client-side caching and replacement in a
client-server database system, called Manhattan-Distance based caching. In this
approach, the client maintains a semantic description of the data in its cache,
together with a remainder query describing the parts of a request that are not
available in the cache. Usage information for replacement is maintained in an
adaptive fashion for semantic regions rather than for individual tuples.
Maintaining a semantic description of the cached data makes it possible to use
sophisticated value functions that capture semantic notions of locality. This policy
gives the highest replacement priority to the cached objects at the greatest
Manhattan distance from the client’s current location.
The Furthest Away Replacement (FAR) policy is proposed in [94]. This
approach makes eviction decisions based on the current location and movement
direction of the mobile client: the highest priority is given to the data items
located furthest away from the user’s current location and in the opposite
direction to the user’s movement. Cached objects with higher priority are evicted
first, since the user is unlikely to access those objects within a short time.
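An illustrative reading of the FAR eviction choice, with hypothetical helper names rather than the authors' exact formulation: objects behind the direction of travel are preferred as victims, and among those the furthest one is chosen.

```python
import math

def far_victim(cached, user_pos, user_dir):
    """Pick an eviction victim in the spirit of FAR (illustrative sketch).

    cached maps object id -> (x, y); user_dir is a unit vector of the
    travel direction. Objects 'behind' the user (negative dot product
    with the travel direction) are evicted first, furthest one first."""
    def priority(obj):
        ox, oy = cached[obj]
        dx, dy = ox - user_pos[0], oy - user_pos[1]
        dist = math.hypot(dx, dy)
        behind = dx * user_dir[0] + dy * user_dir[1] < 0
        # Sort key: behind-objects first (False < True), then furthest.
        return (not behind, -dist)
    return min(cached, key=priority)
```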
The RBF-FAR replacement policy slightly modifies FAR [65]. The authors claim
that FAR fails in some cases because it does not consider predicting the next
possible location. RBF-FAR improves the FAR approach by adding intelligent
knowledge to predict the next possible movement: a Radial Basis Function Neural
Network (RBFNN) is used to predict the next location instead of the velocity
used in FAR. RBFNN is a self-learning model which can learn from the historical
information in the semantic segment index.
The Probability Area (PA) and Probability Area Inverse Distance (PAID)
approaches are two cache replacement policies for location-dependent data under a
geometric location model [130]. The PA approach considers two factors in its
replacement decisions: the valid scope area and the access probability of a data
item. The PAID approach considers the inverse of the distance to the data item as
an additional factor. In both approaches, the data item with the lowest cost,
computed as the product of these factors, has the highest priority for removal
from the cache.
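The cost computation can be sketched as follows; this is an illustrative PAID-style cost, and the exact cost formula of [130] may differ in detail:

```python
def paid_victim(items):
    """Choose an eviction victim under a PAID-style cost function
    (illustrative sketch): cost = access_probability * valid_scope_area
    * (1 / distance); the item with the lowest cost is evicted first.
    items maps id -> (prob, area, distance)."""
    def cost(obj):
        p, area, dist = items[obj]
        return p * area / dist
    return min(items, key=cost)
```

Dropping the division by distance yields the corresponding PA-style cost, which uses only access probability and valid scope area.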
The Mobility Aware Replacement Scheme (MARS) uses a gain-based cache
replacement policy, which considers the client’s location, movement direction and
access probability [60]. This approach is unable to detect a user’s regular
movements. Therefore, the authors proposed an improved approach, called
MARS+ [61], which deals with the temporal and spatial properties of a client’s
access pattern in order to improve caching performance. MARS+ also makes it
possible to detect regular client movement paths, and this movement-pattern
knowledge is used when deciding which cached objects to evict.
The Prioritized Predicted Region-based Cache Replacement Policy (PRRP)
analyses the cost of a data item on the basis of its access probability, valid scope
area, size in the cache and distance with respect to a predicted region, a
combination not considered in any of the existing policies [57]. The fundamental
aim of this approach is to select cached victims using a predicted region-based
cost function: the predicted region is selected based on the client’s movement and
is then used to determine the data distance of an item.
The Weighted Predicted Region-based Cache Replacement Policy (WRRP) also
picks a predicted region based on the client’s movement, and then applies the
predicted region to calculate the weighted data distance of an item [58]. This
approach is similar to PRRP in that it considers access probability, valid scope
area and data size in the cache, but it additionally takes the weighted data
distance from the predicted region into account when picking the eviction victim
among the cached data items.
The Rule-based Least Profit Value (R-LPV) approach considers the profit
gained from caching an item [18]. This policy takes various caching parameters
into account: data access probability, update frequency, retrieval delay from the
server, cache invalidation delay, and data size. Items are eliminated using a profit
function that determines the profit of caching an item; the item with the least
profit is evicted. This cache replacement policy is similar to one used in the
client-server environment [52].
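A hypothetical profit function in this spirit might combine the parameters as follows; the names and the formula are illustrative only, not those of [18]:

```python
def profit(p_access, f_update, t_server, t_invalidate, size):
    """Hypothetical R-LPV-style profit: caching an item pays off when it
    is read often, updated rarely, expensive to re-fetch from the server,
    and small. All parameter names and the formula are illustrative."""
    reads_between_updates = p_access / f_update
    saved_delay = t_server - t_invalidate
    return reads_between_updates * saved_delay / size
```

Under such a function, the replacement policy would evict the cached item whose profit value is lowest.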
The Proactive caching model caches the result objects as well as the index that
produced those objects as results [46]. The purpose of caching the indexes is to
enable the cached objects to be reused by all common types of queries.
The Complementary Space (CS) cache replacement policy maintains a global
view of the whole dataset [64]. Different portions of this global view are cached at
varied granularity based on their access probabilities in future queries, and the
cached objects with very high access probabilities are kept in the cache.
2.7 Outstanding Problems
A discussion of the problems and shortcomings of existing works is presented in
this section. Earlier in this chapter we reviewed the literature dealing with mobile
query processing at the client and server sides. Our review reveals that there are
still problems and issues that need to be addressed and resolved.
The existing problems identified from previous research are examined in the
next three subsections. Mobile query processing problems are discussed in Section
2.7.1, followed by the indexing mechanism problems for processing multi-cell
queries. The last problem to be discussed is the client cache replacement policy,
presented in the last subsection.
2.7.1 Mobile Query Processing at Server Side
This section presents the issues that are still outstanding in mobile queries
involving both single and multiple cells. After analysing the existing work on
query processing at the server side, several major problems remain. One is the use
of a circle as the query scope: objects located just outside the circle boundary are
not retrieved by the query, while objects that have already been passed are no
longer of any interest to the user. Hence, in this thesis, we focus on using a square
as the query scope and on excluding unnecessary items that have already been
passed. We will demonstrate that it is possible to retrieve additional items
without including unnecessary items in the query result, so as to improve the
performance of query result retrieval.
The next problem is overcoming frequent disconnections during query
processing. This implies that the query processing needs to know when to process,
or when to preserve, the existing result in order to deal with frequent
disconnections while processing a single-cell or multi-cell query. To date, we have
not come across any existing work that addresses this issue. Hence, our research
in this thesis (Chapter 3 in particular) will look at improving query processing at
the server side.
The following questions state some of the major problems to be addressed:
• How do we model efficient mechanisms for query processing within a single
cell?
• How do we design an efficient mechanism to process queries which involve
several cells?
• How can the proposed model cope with overlapping or non-overlapping cell
boundaries?
• How does the proposed model deal with frequent disconnections?
2.7.2 Indexing Structures for Multi-Cell Query Processing
A query has to be processed quickly, before the mobile user passes the predicted
location at which the query result is to be received. In this section, we describe an
indexing problem arising when query results are retrieved from multiple cells,
which involves a larger number of records.
Indexing is a convenient way to store database record information in server
memory, owing to its small size. Many approaches have been devised to organise
such indexes in memory; the tree index is one of the most prominent data
structures used in practice.
Some major problems of the existing tree indexing structures are as follows:
• How can we have a single structure that contains all static items in active
cells?
• How can we store all requested items from neighbouring cells in the current
cell without degrading the performance of the current cell?
• If we are able to solve the above problem, how do we manage requested
objects when they do not exist in a cell?
2.7.3 Client Cache Management
This section describes an outstanding problem in query processing, focusing on
users who repeatedly travel around the same locations and pose the same queries.
As discussed in the previous section, the mobile environment has limitations
despite its advantage of enabling communication everywhere at any time: its
unreliable network connections, narrow network bandwidth and expensive data
transfer are some of its negative factors.
Providing a cache on the client device is one way to overcome this issue, because
incoming query results are stored in the local cache. The problem arises because,
to date, all existing approaches store every incoming query result in the local cache.
The next problem is cache maintenance when the available cache slots cannot
accommodate all incoming query results. Improving the cache hit rate is another
focus of our research in this thesis; we attempt to use an existing grouping
mechanism to group the cached objects.
The following questions concern the development of cache maintenance:
• How can we model a cache that stores at least k items per request, rather
than a full set of incoming items, in order to cope with the limitations of
the mobile environment?
• How can the quality of cache hits be improved by considering a weight
factor?
• How can we model a cache by adapting one of the grouping algorithms so
that each request still receives at least k items?
• How can we design a cache model that considers distance, grouping and at
least k items per request?
2.8 Conclusion
At the beginning of this chapter, we presented the architecture of a mobile
environment, including the current wireless technologies. The mobile computing
environment has constraints such as narrow bandwidth, short battery life, limited
storage and frequent disconnections, all of which make the task of processing
mobile queries more complex.
A user’s mobility creates a unique class of mobile queries beside the traditional
ones, called location-dependent queries. The locations of the user and of the
objects are two important parameters in a location-dependent query. They add
further complexity to query processing, since both can change during the query
processing period.
Finally, the main contributions of this chapter can be categorised as follows:
• A query taxonomy is presented. This classification is important since it
makes it possible to analyse all types of queries for data management in a
mobile computing environment.
• Mobile query processing issues at both server and client sides are shown.
These issues, arising from location-dependent query processing in a mobile
computing environment, need further investigation.
Chapter 3
Query Processing at Server Side
3.1 Introduction
A mobile query is a query issued while the user is travelling. The current
location of the mobile user is a unique factor that must be considered, because the
query result depends on the current location of the requester. Users may remain in
the same location or move to another location while waiting for the query results.
If a user stays in the same location, an invalid query result is unlikely to occur,
since the receiving location is the same as the sending location. On the other
hand, when the user moves to another location, the sending location differs from
the receiving location. Although the receiving location can be predicted based on
the travel velocity of the mobile user, an invalid query result might still occur if
the user passes beyond the predicted location.
When a query is issued, its query scope might cover objects located in the same
cell or in different cells. The query scope is the area within which the user
requests objects. For example, for the query "retrieve a list of hospitals within 500
metres", the area within 500 metres of the user’s location is the query scope. A
cell is
CHAPTER 3. QUERY PROCESSING AT SERVER SIDE 62
a service area for one base station. A base station is an intermediate host which
connects mobile devices to static hosts. If a query scope does not extend beyond
one cell, the query processing for this situation is called single-cell query
processing; if the query scope spans more than one cell, it is called multi-cell
query processing.
In this chapter, we propose schemes for single-cell and multi-cell query
processing at the server side, focusing on the retrieval of static objects. The
proposed approaches for single-cell query processing are divided into three
categories based on the movement of the query scope relative to a base station,
namely (i) Static, (ii) Dynamic and (iii) Angle. In the static category, the query
scope stays parallel to the base station; here we have developed three algorithms,
based on horizontal, vertical and diagonal movement. In the dynamic category,
the query scope is perpendicular to the user’s travel direction. Finally, the angle
category is based on the angle of the user’s direction.
The proposed approaches for multi-cell query processing fall into two
categories. The first considers overlapping and non-overlapping base stations; the
second considers how to handle disconnections at the base station boundary.
The structure of this chapter is shown in Figure 3.1. Section 3.2 presents the
preliminary knowledge forming the foundation of this chapter. Sections 3.3 and
3.4 describe the proposed single-cell and multi-cell query processing approaches at
the server side, respectively. Section 3.5 discusses the proposed approach to
handling disconnections in single-cell and multi-cell query processing. Case
studies are described in Section 3.6; they give some illustrations that support the
explanation of the proposed approaches. A further discussion of both approaches
is provided in Section 3.7, and the last section concludes the chapter.
Figure 3.1: The framework of chapter 3
3.2 Preliminaries
This section presents an overview of query processing at the server side. The
section is divided into three subsections, as outlined in Figure 3.1. The first
subsection introduces all the terms used in this chapter. The next subsection
(Section 3.2.2) discusses the criteria for selecting a shape to be used as the query
scope. Several query types are described in the last subsection (Section 3.2.3).
3.2.1 All Terms Used
In this section, we introduce terms used in our work. These are:
• Cell scope: an area serviced by one base station. Mobile users can exchange
information with the base station within this area.
• Base station (BS): a stationary host which forwards messages from and to a
static network. A BS can connect to one or multiple database servers; for
simplicity, we assume that each BS connects to a single database server,
even if it could be connected to several.
• Query scope (QS): an area within which mobile users query static objects.
We use the terms ‘query scope’ and ‘valid scope’ interchangeably. This scope
can be represented using an existing shape, such as a circle, hexagon, or
square.
• Parallel query scope: a query scope which is parallel to the BS in which the
mobile user currently resides.
• Dynamic query scope: a query scope which is not parallel to the BS in which
the mobile user currently resides.
• Location: a point in two coordinates which represents the location of a
mobile user or a static object. For simplicity, we assume that the location of
an object is represented as a point.
• Travel direction: a straight line measured from the starting point to the
ending point.
3.2.2 Shape Selection for a Query Scope
In this section, we discuss how to choose a shape for the query scope. A number
of shapes can be used to denote a query scope, such as a rectangle, triangle,
square, or circle.
Figure 3.2 shows the locations of several vending machines, a restaurant and a
user within a BS boundary. Assume the user would like to find the nearest
restaurant within a distance of n units or within an area of m square units. All
targets within the boundary of a BS are valid for that BS only. In order to give a
valid answer to a user’s query, the BS needs to keep track of the user’s current
location and query scope. Otherwise, some targets in a generated query result
become invalid once the user has moved, even though the movement is still
within the boundary of the same BS.
Figure 3.2: A scenario presented in two-coordinates
Now, we describe the shapes. First, consider a rectangle. A rectangle has
different horizontal and vertical lengths, which makes it hard to apply as a query
scope. Second, consider a triangle as the valid scope, and assume the distances
from its centre to the left, right and top are the same; the base is then twice the
height, and the area of the triangle equals that of the corresponding square.
However, it is hard to decide whether a target is inside the triangle’s boundary.
Therefore, we do not consider using a rectangle or a triangle as the query scope.
The next two candidates, a square and a circle, have similar capabilities. A
square is accurate and can easily be used to catch the target closest to the user,
and its length is the same as its width. The dimension of a square is given by the
distance from the user to its left, right, top and bottom sides. If an area is given,
the side of the square can be found by taking the square root of that area
(√area). The dimensions of the square therefore define the valid scope of the
query.
On the other hand, the circle is one of the most popular shapes of choice
because it is the natural shape for retrieving nearest-neighbour objects efficiently:
all objects within a distance of n units can be found.
Figure 3.3: The proposed approach
Now, let’s apply both shapes to the illustration in Figure 3.2, using the sample
query: "a user would like to find a restaurant within n units of the current
location". The restaurant will be found if we use a square as the valid scope
because, for the same scope parameter n, the square covers the corner regions
that a circle of radius n misses, as shown in Figure 3.3. One can argue that one
could instead:
• Increase the size of the circle
This is a common argument for retrieving objects located close to the query
scope. However, we do not know the optimum size to which the circle should
be enlarged. If we increase the size of the circle too much, too many objects
are retrieved and resources are wasted (bandwidth, power consumption and
memory).
• Resend the same query
Resending the same query needs more processing time and power, and by
the time the new result arrives the objects may have been passed or may lie
outside the scope area. Furthermore, the user may miss a query result if the
server is busy.
Therefore, a square is preferred as the query scope due to its efficiency in query
processing at the server side. This shape is more accurate and can more easily be
used to discover the objects within it, and the possibility of finding the restaurant
is higher than if a circle were used.
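The difference between the two scopes can be seen in a simple containment check (an illustrative sketch with our own function names): for the same parameter n, a point near a corner lies inside a square of side 2n centred on the user but outside a circle of radius n.

```python
def in_square(p, centre, n):
    """Inside a square of side 2n centred at centre (scope 'within n units')."""
    return abs(p[0] - centre[0]) <= n and abs(p[1] - centre[1]) <= n

def in_circle(p, centre, n):
    """Inside a circle of radius n centred at centre."""
    return (p[0] - centre[0]) ** 2 + (p[1] - centre[1]) ** 2 <= n ** 2

# A restaurant in the corner region of the scope is caught by the square
# but missed by the circle:
user, restaurant, n = (0.0, 0.0), (400.0, 400.0), 500.0
```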
3.2.3 Query Types
This section briefly describes location-dependent queries and the query types
that resemble them.
A common query type similar to the location-dependent query is the spatial,
location-independent query. An example of a spatial query would be to find a
certain region at location X1, Y1. Note that this type of query is not a
location-dependent query, because it asks for a certain object independently of
the current location of the user.
The results of a location-dependent query depend on the current location of the
mobile user who initiates the query, where current location means the location of
the mobile user at the moment he/she receives the query result. This type of
query exists only in a mobile environment. Figure 3.4 shows a situation in which
a mobile user sends a location-dependent query and receives its result: the
sending and receiving locations are not the same, and the objects located inside
the query scope are the valid objects returned to the requester.
Figure 3.4: A location-dependent query in details
Location-dependent query processing can involve object retrieval from a single
cell or from multiple cells. Single-cell query processing is query processing in
which the query scope does not cross the base station boundary (as shown in
Figure 3.4). On the other hand, if the query scope crosses more than one base
station boundary, the processing is called multi-cell query processing.
Consider the query: “retrieve a list of restaurants within 500 metres”. In this
query, the location is implicit, and the query can be either a spatial or a
location-dependent query. It is a spatial query if the location of the requester
remains the same from the time the query is issued until the query result is
received. In contrast, it is a location-dependent query if the location from which
the query was sent and the location at the time of receiving the query result are
different, since the result then depends on the current location of the requester.
Furthermore, consider the query: “retrieve a list of restaurants within 500
metres from hotel A”. This is not a location-dependent query, since the locations
of the restaurants depend on the location of hotel A and are independent of the
location of the user. However, it can be classified as a location-related query.
3.3 Query Processing for Single-Cell
In this section, we discuss in detail how our proposed algorithms handle the
situation mentioned above. The proposed algorithms are divided into three
categories, elaborated in the next three subsections:
• Static Query Scope
This category is based on the movement of mobile users. We propose three
algorithms based on user movement: horizontal, vertical and diagonal. The
query scope is parallel to the base station.
• Dynamic Query Scope
In this category, the query scope is perpendicular to the travel direction of
the mobile user.
• Angle of Movement
Here, we consider the angle of the travel direction, calculated between the
travel direction and the centre horizontal line of the query scope. We
classify this angle into three groups: 0 < α ≤ 30, 30 < α ≤ 60 and
60 < α < 90 degrees.
3.3.1 Static Query Scope Category
In this category, the query scope is kept parallel to the BS boundary; that is, the
scope is not oriented according to the travel direction of the mobile user. This
simplifies the process of creating the query scope.
The following three subsections present the steps of the proposed approaches in
this category. The first subsection describes the main part of the proposed
approaches. The remaining two subsections cover the parts responsible for
retrieving objects based on the user's movement.
The Main Part
In general, this part is the entry point and encompasses the entire query result
retrieval process in this category. The process includes receiving the user's input,
predicting the expected recipient location, creating a query scope, searching for
objects in the current BS, and sending the query results back to the mobile user.
The process terminates when the mobile user receives the requested information.
While the server is processing the user's request, the user moves from one location
to another. Query result retrieval is therefore based on the recipient location rather
than on the sender's location. Once the recipient location is known, a query scope
of a given size is generated. Information about all objects whose locations fall
inside the query scope is then retrieved. Finally, this information is shipped to the
requester and an acknowledgement is awaited. If the server does not receive an
acknowledgement, it produces a new query result and sends it to the user. The
new query result may differ from the current one because of the user's mobility.
Algorithm 3.1 shows the details of our main proposed algorithm. It can be
explained as follows:
(i) The server receives an input from the user containing the following factors:
the current location of the mobile user, travel direction, velocity, and searching
distance. The first factor is straightforward: it is the location from which
Algorithm 3.1: The main proposed algorithm
Input: Location, Query, Speed
Output: Results
begin
    tstart ← time when the query is received (assume zero)
    (TDx, TDy) ← travel distance from tstart to tstart+1 at the current velocity
    (X1, Y1) ← sender location at time tstart
    // Prediction of the next location at time tstart+1
    (X2, Y2) ← (X1 + TDx, Y1 + TDy)
    SD ← searching distance (from the user query)
    isReceived ← false
    allObjsFound ← empty
    while (isReceived is false) do
        Create a query scope of dimensions 2*SD by 2*SD at location (X2, Y2)
        Divide the scope into 4 equal regions with the user at the centre point
        Dir ← user travel direction
        objsInOverlappingArea ← call check_overlapping_area(allObjsFound)
        allObjsFound ← call the algorithm to get valid objects based on the user movement
        isReceived ← send allObjsFound to the user
        if (isReceived is false) then
            tstart ← tstart+1
            (X1, Y1) ← update the location at time tstart
            (X2, Y2) ← update the location at time tstart+1
        end
    end
end
the user sent the query to the server. The travel direction can be determined
using either the travel history or two points in two-dimensional coordinates. We
simplify the process by using two coordinate points connected by a straight line,
which shows the direction of travel between the start and end points. The velocity
is taken from its current value. The last factor determines how far the searching
area extends.
(ii) Predict the next location, where the mobile user is expected to receive the
query result. It is calculated from the current travel direction, speed, and query
processing time.
(iii) Create the query scope. Since a square has equal height and width, the
dimension of the square is given by a single length parameter. The length
parameter is the searching distance from the client request multiplied by two,
because the length covers the distance from the user to both the left and the
right sides.
(iv) Once the query scope has been created, it is divided equally into 4 regions.
The aim of this division is to speed up the search on the server side: the regions
that have already been passed, i.e. those in the opposite direction of travel, are
not processed further.
(v) Verify whether there is an overlapping area, that is, an overlap between the
previous and current query scopes. This area exists if the mobile user failed to
receive the query result at time tstart−1, the time at which the user previously
expected to receive the result. The purpose of checking the overlapping area of
the query scope is to avoid reprocessing existing targets in the next time interval.
The details of this process are discussed in Section 3.5.1, where the disconnection
issue is presented. The result of this step is either the set of objects located in
the overlapping area or an empty set.
(vi) Load the information of objects whose locations fall within the query scope.
The result set of the overlapping query scope is passed in so that the overlapping
area is excluded from the current object-probing step. Unless the mobile user is
predicted to stop while the query results are being retrieved, the server decides
which regions of the query scope to process. When the travel direction of the
user is horizontal or vertical with respect to the query scope, two regions are
processed; if the travel direction is diagonal, one region is processed. The regions
processed are those lying in the direction of travel. If the user previously missed
a query result, the overlapped area is subtracted from these regions. The
information of all objects in the resulting area is retrieved. The details of these
processes are presented in the next two subsections.
(vii) Send the generated query result to the user. Once the query result is ready,
the collected information is sent to the user and the server waits for an
acknowledgement. The mobile user sends an acknowledgement once the query
result has been received. If the mobile user receives only a partial query result or
none at all, due to a weak signal or a disconnection, no acknowledgement is sent.
On the server side, a flag keeps track of whether an acknowledgement has been
received: it is set to true when an acknowledgement arrives; otherwise it remains
false and the server prepares the next query result for time tstart+1.
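The main loop in steps (i) to (vii) can be sketched in Python. This is a minimal illustration only: the point-object store, the `send` callback standing in for result delivery plus acknowledgement, and all function names are assumptions for this sketch, not part of the thesis algorithms.

```python
def predict_location(x, y, vx, vy, dt):
    # Step (ii): project the sender's position along its current velocity.
    return x + vx * dt, y + vy * dt

def query_scope(cx, cy, sd):
    # Step (iii): a 2*SD by 2*SD square centred on the predicted location.
    return (cx - sd, cy - sd, cx + sd, cy + sd)

def objects_in_scope(objects, scope):
    # Step (vi), simplified: keep point objects inside the rectangle.
    x1, y1, x2, y2 = scope
    return [o for o in objects if x1 <= o[0] <= x2 and y1 <= o[1] <= y2]

def answer_query(objects, x, y, vx, vy, sd, send, dt=1.0, max_tries=5):
    # Steps (i)-(vii): retry with a re-predicted location until the
    # send callback reports that the acknowledgement arrived.
    for _ in range(max_tries):
        px, py = predict_location(x, y, vx, vy, dt)
        result = objects_in_scope(objects, query_scope(px, py, sd))
        if send(result):          # step (vii): acknowledgement received?
            return result
        x, y = px, py             # user kept moving; re-predict for tstart+1
    return None
```

The retry branch mirrors the algorithm's handling of a missed acknowledgement: the predicted location simply advances by one more time step before the scope is rebuilt.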
The Vertical/Horizontal Movement Algorithm
This section focuses on horizontal and vertical movement in two-dimensional
coordinates. Vertical movement is when a mobile user travels along the Y-axis,
whereas horizontal movement is when the user travels along the X-axis. The
proposed vertical movement algorithm is presented first, followed by the
horizontal one.
The proposed vertical movement approach retrieves information about the
requested objects based on the travel direction of the mobile user. It first receives
the current position, the query scope and the travel direction of the mobile user
from the main part (see the previous section). The current position and travel
direction determine which regions of the query scope are processed. When the
mobile user is moving upwards, the information of all objects located in the two
upper regions (regions 1 and 2) is retrieved. However, objects already present in
the overlapping collection are not loaded again. When the mobile user is moving
downwards, a similar approach is applied; the difference is that only objects
located in the two bottom regions (regions 3 and 4) are retrieved. This scheme is
shown in Algorithm 3.2.
Figure 3.5 shows examples of how this algorithm works when a user moves
vertically. All information about targets located in the shaded regions is sent to
the user. While a user is travelling down (south), all information about objects in
the bottom regions (regions 3 and 4) is sent to the mobile user, as shown in
Figure 3.5a. Conversely, all information about targets in the top regions (regions
1 and 2) is sent to the mobile user while the user is moving up (north), as shown
in Figure 3.5b.
Algorithm 3.2: The vertical movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (either up or down)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is up) and (object.Ycoordinate ≥ Y) then
                collection ← collection + objects found in regions 1 and 2
            else
                collection ← collection + objects found in regions 3 and 4
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
(a) Down (b) Up
Figure 3.5: The complexity of vertical movement
The proposed horizontal movement approach is similar to the previous one: the
number of regions processed is the same. The differences lie in the travel direction
and in the selection of the regions to process. Instead of travelling along the
Y-axis, the mobile user moves horizontally along the X-axis, and instead of the
upper or bottom regions, the right and left regions are searched when the mobile
user is moving right and left respectively. Algorithm 3.3 shows the process of this
approach.
Algorithm 3.3: The horizontal movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (either left or right)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is right) and (object.Xcoordinate ≥ X) then
                collection ← collection + objects found in regions 1 and 4
            else
                collection ← collection + objects found in regions 2 and 3
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
Figure 3.6 shows two situations in which a user is moving horizontally: from
right to left, and from left to right. In the first situation, the targets in the left
regions, regions 2 and 3, are retrieved and sent to the user (as shown in Figure
3.6a). In the second situation, the targets in the right regions, regions 1 and 4,
are fetched and sent to the user (Figure 3.6b).
(a) Left (b) Right
Figure 3.6: The complexity of horizontal movement
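The region selection used by Algorithms 3.2 and 3.3 can be sketched as follows. The region numbering (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right) is inferred from the text (up selects regions 1 and 2, right selects regions 1 and 4); the numbering and all identifiers are assumptions for this sketch.

```python
def region_of(ox, oy, ux, uy):
    # Regions of the query scope with the user at the centre point.
    # Assumed numbering: 1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right.
    if ox >= ux:
        return 1 if oy >= uy else 4
    return 2 if oy >= uy else 3

# Regions lying in the travel direction (two per axis-aligned direction).
REGIONS_FOR = {
    "up": {1, 2}, "down": {3, 4}, "right": {1, 4}, "left": {2, 3},
}

def select_objects(objects, ux, uy, direction, overlapping=frozenset()):
    # Skip objects already in the overlapping collection, keep those in the
    # regions ahead of the user, then re-attach the overlap result set.
    wanted = REGIONS_FOR[direction]
    picked = [o for o in objects
              if o not in overlapping and region_of(o[0], o[1], ux, uy) in wanted]
    return picked + list(overlapping)
```

For example, a user at the origin moving up would receive only the objects whose region is 1 or 2, exactly mirroring the branch structure of Algorithm 3.2.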
The Diagonal Movement Algorithm
The last algorithm of this category retrieves objects while the user is travelling
diagonally. We assume that a user moving in a diagonal direction is interested
only in the objects in the region opposite to the region he/she is coming from.
This algorithm is similar to those in the previous section, except for the region
being searched and the travel direction. Only one region is searched instead of
two, and the travel direction is differentiated into four cases: Bottom-Right,
Bottom-Left, Top-Left and Top-Right. In this approach, the region searched is the
one opposite to the travel direction. For example, when the travel direction is
detected as coming from the bottom-left region, objects located in the top-right
region are searched. Algorithm 3.4 presents the process of this approach.
All possibilities of diagonal movement are shown in Figure 3.7. If the user is
coming from the Bottom-Left direction, all objects found in region 1 will be
(a) Top Right (b) Top Left
(c) Bottom Left (d) Bottom Right
Figure 3.7: The complexity of diagonal movement
Algorithm 3.4: The diagonal movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    Dir ← user direction (one of the four diagonal directions)
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            if (direction is Top-Right) then
                collection ← collection + objects found in region 1
            else if (direction is Top-Left) then
                collection ← collection + objects found in region 2
            else if (direction is Bottom-Left) then
                collection ← collection + objects found in region 3
            else if (direction is Bottom-Right) then
                collection ← collection + objects found in region 4
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
returned. When the user is coming from the Top-Right direction, all objects found
in region 3 will be returned to the user (Figure 3.7c). As another example, if a
user heads towards the Top-Left, the opposite region (region 2) will be probed
(shown in Figure 3.7a).
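The diagonal case of Algorithm 3.4 can be sketched in the same style: only the single region lying in the user's heading (i.e. opposite the incoming corner) is searched. The region numbering (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right) is an assumption inferred from the text, and all names are illustrative.

```python
# Assumed mapping from the user's heading to the single region searched.
HEADING_TO_REGION = {"top-right": 1, "top-left": 2,
                     "bottom-left": 3, "bottom-right": 4}

def region_of(ox, oy, ux, uy):
    # Assumed numbering: 1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right.
    if ox >= ux:
        return 1 if oy >= uy else 4
    return 2 if oy >= uy else 3

def select_diagonal(objects, ux, uy, heading, overlapping=frozenset()):
    # Per Algorithm 3.4: one region only, minus the overlapping collection,
    # with the overlap result set re-attached at the end.
    wanted = HEADING_TO_REGION[heading]
    picked = [o for o in objects
              if o not in overlapping and region_of(o[0], o[1], ux, uy) == wanted]
    return picked + list(overlapping)
```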
3.3.2 Dynamic Query Scope Category
This category focuses on a query scope that changes dynamically with the user's
direction. This does not mean that the shape of the query scope changes; rather,
the query scope is no longer parallel to the cell boundary (as shown in Figure 3.8).
Instead, the query scope is perpendicular to the direction of the user, and the
angle of movement does not need to be considered. Making the query scope
perpendicular to the direction of the user significantly reduces the complexity.
After the query scope is created, the information of objects within the shaded area
is retrieved and returned to the user.
Figure 3.8: Dynamic query scope for the diagonal movement
Algorithm 3.5 shows the process of this approach. The details of the proposed
approach are as follows:
(i) Generate a line equation for the travel direction.
(ii) Form a query scope of the given size, perpendicular to the above line.
(iii) Find the overlapping area between the current and previous query scopes. If
one exists, select the objects in the overlapping area.
(iv) Retrieve the information of all objects located in the shaded regions of the
current query scope and outside the overlapping area. The shaded regions are
the areas that have not yet been passed.
Algorithm 3.5: The dynamic query scope algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X, Y) ← current location at tstart+1
    (TDX, TDY) ← user searching distance
    Travel line ← line equation for the travel direction
    Query Scope ← query scope perpendicular to Travel line
    Searching area ← the two regions of Query Scope located in front of the current location
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in Searching area) then
            collection ← collection + objects found in Searching area
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
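The "in front of the user" test at the heart of the dynamic scope can be sketched with vector projections. This assumes the scope is a 2·SD square aligned with the travel vector; the decomposition into forward and lateral components is our illustration of the perpendicular-scope idea, not the thesis's own formulation.

```python
import math

def in_front_of_user(ox, oy, ux, uy, dx, dy, sd):
    # The dynamic scope is aligned with the travel vector (dx, dy):
    # 'forward' is the projection of the object onto the unit travel
    # direction and 'lateral' the projection onto its perpendicular.
    # The two front regions correspond to forward in [0, sd] with
    # |lateral| <= sd, regardless of the angle of movement.
    norm = math.hypot(dx, dy)
    fx, fy = dx / norm, dy / norm        # unit travel direction
    rx, ry = -fy, fx                     # unit perpendicular
    forward = (ox - ux) * fx + (oy - uy) * fy
    lateral = (ox - ux) * rx + (oy - uy) * ry
    return 0 <= forward <= sd and abs(lateral) <= sd
```

Because the test is expressed in the rotated frame of the travel vector, no case analysis on the movement angle is needed, which is exactly the complexity reduction claimed for this category.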
3.3.3 Angle of Movement Category
Previously, Section 3.3.1 described our proposed approaches in which the user's
query scope is parallel to the BS to which the user is currently connected.
However, that discussion is limited to only three travel directions, which is not
efficient. This section extends the proposed approach with more flexible travel
directions, categorised into 3 ranges: 0 < α ≤ 30, 30 < α ≤ 60 and 60 < α < 90.
Figure 3.9 shows another possibility for retrieving query results while users travel
diagonally, by considering the angle of movement. The measured angle is the
angle between the travel direction and the centre horizontal line of the query
scope. The two front regions of the query scope are searched when the mobile user
travels at an angle that is (i) equal to or less than 30 degrees, or (ii) between 60
and 90 degrees.
In contrast, if the movement angle is between 30 and 60 degrees from the
horizontal centre line of the query scope, only one front region is probed.
(a) 0-30 degrees (b) 30-60 degrees
(c) 60-90 degrees
Figure 3.9: Angle of movement illustrations.
The retrieval algorithm of this category is similar to the static category, except for
the region verification: it considers travel directions at various angles rather than
only horizontal/vertical or diagonal ones. Before discussing the proposed
approach in detail, we first illustrate it and its possibilities.
Figure 3.10 shows the cases in which Algorithm 3.6 searches two regions. Figures
3.10a and 3.10b show mobile users travelling in a 60-90 degree direction, while
Figures 3.10c and 3.10d show mobile users travelling in a 0-30 degree direction.
The areas shaded with diagonal lines are the selected regions of the query scope,
which contain the query result. The areas filled with crossed lines indicate the
range of user movement for which objects within the selected regions are
retrieved. In addition, there are another four possibilities when mobile users travel
at an angle of 30-60 degrees; these are the same as those shown in Figure 3.4.
(a) 60-90 degrees top (b) 60-90 degrees bottom
(c) 0-30 degrees left (d) 0-30 degrees right
Figure 3.10: The complexity of angle movement.
Algorithm 3.6 shows the details of this category, which can be explained as
follows:
(i) Receive the input and create a query scope, in the same way as Algorithm
3.1.
(ii) Calculate how far the user has travelled from the start position to the end
position, and find the angle between the travel direction and the X-axis
(the horizontal line).
(iii) Find the objects in the overlapping area, using the algorithm in Section 3.5.1.
(iv) Select the regions of the query scope based on the travel direction.
(v) Recursively find all objects that are located inside the selected regions of the
query scope but not in the overlapping area.
(vi) Collect the information of these objects and send it to the user.
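Steps (ii) and (iv) can be sketched as follows. The signed angle is taken here as the arctangent of the slope of the travel segment, which is one interpretation of the thesis's angle measure; the region count follows the 0-30 / 30-60 / 60-90 degree ranges described above, and the function names are illustrative.

```python
import math

def movement_angle(x0, y0, x1, y1):
    # Step (ii): signed angle (degrees) between the travel direction and
    # the X-axis; arctan of the slope keeps it in (-90, 90), with the
    # left/right distinction handled separately via the sign of X1 - X0.
    return math.degrees(math.atan((y1 - y0) / (x1 - x0)))

def regions_to_search(alpha):
    # Step (iv): two front regions when |alpha| <= 30 or 60 < |alpha| < 90,
    # a single region when 30 < |alpha| <= 60 (Section 3.3.3).
    a = abs(alpha)
    return 1 if 30 < a <= 60 else 2
```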
3.4 Multi-Cell Query Processing
Our proposed approaches for a user travelling within a single cell were presented
in Section 3.3. However, the user may freely travel from one cell to another.
Because of this movement, the mobile user may query an area that requires
several BSs to answer the query; this is known as a multi-cell query. Every BS
needs knowledge of its neighbouring BSs to answer such a query, so it is essential
that BSs register their details with their neighbour BSs [101].
Algorithm 3.6: The angle of movement algorithm
Output: Results
begin
    Objects ← object collection within the current base station boundary
    (X0, Y0) ← location at tstart
    (X1, Y1) ← location at tstart+1
    α ← arctan((Y1 - Y0) / (X1 - X0))
    overlapping collection ← list of objects in the overlapping area
    while (still have more Objects) do
        if (object is not in overlapping collection) and (object is in scope) then
            // travelling in the negative X direction
            if (X1 < X0) then
                if (α ≤ 30 and α ≥ -30) then
                    collection ← collection + objects found in regions 2 + 3
                else if (α < 60 and α > 30) then
                    collection ← collection + objects found in region 3
                else if (α > -60 and α < -30) then
                    collection ← collection + objects found in region 2
                else if (α ≤ -60 and α > -90) then
                    collection ← collection + objects found in regions 1 + 2
                else if (α ≥ 60 and α < 90) then
                    collection ← collection + objects found in regions 3 + 4
                end
            // travelling in the positive X direction
            else if (X1 > X0) then
                if (α ≤ 30 and α ≥ -30) then
                    collection ← collection + objects found in regions 1 + 4
                else if (α ≤ 60 and α ≥ 30) then
                    collection ← collection + objects found in region 1
                else if (α ≥ -60 and α ≤ -30) then
                    collection ← collection + objects found in region 4
                else if (α ≤ -60 and α > -90) then
                    collection ← collection + objects found in regions 3 + 4
                else if (α ≥ 60 and α < 90) then
                    collection ← collection + objects found in regions 1 + 2
                end
            end
        end
        object ← next object
    end
    collection ← collection + overlapping collection
end
Figure 3.11 shows three different types of multi-cell queries in which there is no
overlapping area amongst the BSs. Figure 3.11a shows a user moving within the
same BS while the query scope crosses the corresponding BS boundary. In Figure
3.11b a user moves towards the BS border. Figure 3.11c shows a user moving into
a neighbouring BS.
As mentioned before, Figure 3.11a shows a user travelling within BS1 whose
query scope crosses the BS1 boundary. The target of the query scope (shaded
area) is determined by the user's direction. When BS1 detects that the query
scope crosses its boundary, it processes the query within its own area (the shaded
area from the user location to ∆X) and obtains partial information about the
query result from BS2 by forwarding the remaining query scope information from
BS1 to BS2. Once BS2 finishes generating its query results, it forwards them to
its requesting neighbour, BS1. BS1 then combines the partial query results
retrieved from the other BS (BS2) and forwards the joined query results to the user.
Figure 3.11b shows a user located on the border line between BS1 and BS2. In
this situation, BS1 does not process the user query because, if the user missed the
query results, BS1 would need to forward the user query twice to its neighbour.
After BS1 receives the user query, it forwards the query and the query scope to
its neighbour, BS2, which processes the query and forwards the query results
directly to the user. In both figures, no handover occurs, since the user remains
within one cell.
Figure 3.11c shows a user moving from the current BS1 to BS2. BS1 receives the
query and calculates the predicted location of the user. BS1 forwards the user
query and the predicted user location to BS2, which handles the query. In this
situation, generating the query results for a number of users depends on knowing
when the users enter new cells. Predicting when users enter new cells was
discussed in Section 2.4.3 [128]. So, in this case, the neighbour BS2 knows when
users enter its area. The remaining query result retrieval process is the same as in
the previous example.
(a) Movement within one cell (b) Movement to BS border line
(c) Movement within another cell
Figure 3.11: Three types of users’ movement
The figure above shows non-overlapping areas of multiple BSs; in practice these
areas can overlap each other, which raises an issue in answering multi-cell queries.
The issue and its proposed solutions are addressed in Section 3.4.1. Section 3.4.2
then describes how these proposed approaches are applied to a multi-cell query
whose scope is either static or dynamic.
3.4.1 Non-Overlapping and Overlapping Area Algorithms
The issue in multi-cell query retrieval is to avoid retrieving duplicate data items
from other BSs and to reduce the waiting time for query results from other BSs.
This section describes two proposed solutions for multi-cell query retrieval,
involving non-overlapping and overlapping areas of multiple BSs respectively.
Non-Overlapping Area algorithm
This section focuses on the proposed solution for answering a multi-cell query over
non-overlapping areas of multiple BSs. Before the proposed solution is presented,
the two major types of non-overlapping BS areas are described. Figure 3.12 shows
these two major types of non-overlapping BS scopes. The first shows a whole area
covered by many BSs, whereas the second shows an area (the shaded area) that is
not covered by any of those BSs. Our proposed approach aims to handle mobile
query retrieval in both situations.
Figure 3.12: Non-overlapping base stations(BS)
As mentioned previously, our proposed approach keeps track of all online BSs by
requiring every BS to register with all of its surrounding neighbour BSs. When a
multi-cell query arrives, the current BS retrieves the information of both local and
remote objects. The current BS is the BS whose service area contains the mobile
user. Hence, when the user is expected to arrive in a new cell, the BS that sends
the query result is the new one: since the new location of the user lies in the new
BS's area, the BS that received the user query forwards it to this new BS, and we
assume that a handover has been carried out.
To retrieve the information of remote objects, the current BS determines which
parts of the query scope overlap with the areas of the online BSs in its list. For
example, when the query scope overlaps with the area of BS A, the overlapped
area is sent to BS A in order to obtain that part of the query result. Using the
approach described in Section 3.3, BS A generates the query result and returns it
to the current BS, which merges the information of the remote and local objects
and sends it to the user.
Algorithm 3.7 shows the details of the proposed algorithm. They are described as
follows:
(i) Retrieve the information of all objects in the current BS that are covered by
the query scope. The details of this retrieval process are discussed in
Section 3.4.2.
(ii) Load the information of an online BS from the list, in sequential order.
(iii) For every online BS, find the overlap between the query scope and the area
of that BS. If an overlap exists, the current BS sends the overlap area to
that BS; in other words, the overlap area becomes the query for that
particular BS.
(iv) The online neighbour BS that receives the query executes it in the same way
as the current BS.
(v) The online neighbour BS returns a query result, containing either a list of
object information or an empty list, to the current BS.
(vi) The current BS merges the returned query result into its own.
(vii) Repeat the process until the information of all objects inside the query
scope has been retrieved.
(viii) Return the query result to the requester.
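The steps above can be sketched in Python over axis-aligned rectangles. Modelling each BS as a dict with a `scope` rectangle and a `search` callable is purely illustrative, and the recursion to indirect neighbours (step (vii)) is omitted for brevity.

```python
def intersect(a, b):
    # Rectangles as (x1, y1, x2, y2); returns the overlap or None.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def answer_multicell(query_scope, current_bs, online_bs):
    # Steps (i)-(vi): search locally first, then forward each overlapping
    # part of the scope to the online neighbour BS that covers it; the
    # overlap itself is the query sent to that neighbour.
    local = intersect(query_scope, current_bs["scope"])
    result = current_bs["search"](local) if local else []
    for bs in online_bs:
        overlap = intersect(query_scope, bs["scope"])
        if overlap is not None:
            result += bs["search"](overlap)
    return result
```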
Algorithm 3.7: Non-overlapping algorithm
Input: Query, NoOfBSOnline
Output: Results
begin
    Queryscope ← scope of the query
    Current BSscope ← current base station boundary
    Current BSID ← current base station ID
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← Get Result(Queryscope, Current BSscope)
    while (index < NoOfBSOnline) do
        Current Neighbour BS ← CollectionOfOnlineBS at position index
        if intersection(Queryscope, Current Neighbour BSscope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current Neighbour BS(Queryscope)
        end
        index ← index + 1
    end
    return Result
end
Figure 3.13 illustrates multi-cell query retrieval. MU2 and MU1 send queries to
BS1 and BS5 respectively. On behalf of mobile user MU2, BS1 forwards the query
by sending the area overlapping BS2 to BS2 and the area overlapping BS4 to
BS4. BS2 and BS4 return the information of the objects inside the query scope.
The retrieval process for MU1 is similar to that for MU2, except that the
uncovered area is not sent to any online BS. Furthermore, the query scope covers
the areas of BS1 and BS2, which are not direct neighbours of BS5. Result
information can be fetched from both BSs by recursively passing the overlapping
part of the query scope to all neighbour BSs of the current BS; these neighbour
BSs pass the overlapping parts of the received query scope on to their own
neighbours. This process continues until the whole
Figure 3.13: Multi-cell query illustration
area of the query scope has been processed and no further overlapping area
remains between any BS and the query scope.
Overlapping Area Algorithm
Query result retrieval over overlapping areas of multiple BSs follows a process
similar to the non-overlapping case. The existence of an overlapping area makes
the two processes different, because a mechanism must be applied to avoid object
duplication in the query result.
This section elaborates two proposed approaches to handle this situation: area
elimination and query result elimination. The two approaches are explained as
follows:
1. Eliminating neighbour BS overlapping area
This approach avoids reprocessing an overlap area of multiple BSs that has
already been processed. When the query scope covers an overlap area of
multiple BSs, the overlap area is searched only once; the first BS in the list of
online BSs is in charge of searching the objects within that area.
Algorithm 3.8 shows the complete process, which can be explained as follows:
(i) Receive a query from a user, extract the query information and generate
a query scope based on it.
(ii) Search for the requested information of all objects located inside the
query scope within the current BS area.
(iii) Load the information of an online BS according to its position in the list
of online BSs. Check that BS against the list of processed BSs, which
contains all BSs that have already processed the query. If the online
neighbour BS being considered is in the list, it is not given the task of
processing the current query.
(iv) Before forwarding the query to that neighbour BS, the current BS
eliminates the area in which the neighbour BS being processed overlaps
the current BS.
(v) Once the overlapping area has been eliminated, the query and the list of
processed BSs are forwarded to that BS, which then generates a query
result using the same mechanism as the current BS.
(vi) Repeat the process until all online BSs have been processed.
2. Eliminating items from neighbour BS query results
This approach is similar to the previous one. The only difference is that the
overlapping neighbour BS areas are not eliminated; instead, the duplicated
items in the returned query results are removed from the query results.
Algorithm 3.9 shows our second proposed algorithm for retrieving items from
multiple overlapping cells by eliminating duplicate items in the query result.
To keep the description brief, we do not discuss the whole algorithm, since it is
Algorithm 3.8: Eliminating neighbour BS overlapping area algorithm
Input: Query, list of BS done
Output: Results
begin
    Queryscope ← scope of the query
    BSscope ← current base station boundary
    Current BSID ← current base station ID
    Result ← Get Result(Current BSID)
    Area Taken ← BSscope
    list of BS done ← list of BS done + Current BSID
    while (index < NoOfBSOnline) do
        Current Neighbour BS ← CollectionOfOnlineBS at position index
        if (Current Neighbour BS.ID is in list of BS done) then
            Continue to the next neighbour BS
        end
        list of BS done ← list of BS done + Current Neighbour BS.ID
        Current Neighbour BSscope ← Current Neighbour BSscope - Area Taken
        if intersection(Queryscope, Current Neighbour BSscope) exists then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            Result ← Result + Current Neighbour BS(Queryscope, list of BS done)
        end
        Area Taken ← Area Taken + Current Neighbour BSscope
        index ← index + 1
    end
    return Result
end
similar to the one mentioned in the previous subsection. We highlight only
the parts which differ.
Algorithm 3.9: Eliminating items from neighbour query result.
Input: Query, list_of_BS_done
Output: Results
begin
    Query_scope ← scope of query
    BS_scope ← current base station boundary
    Current_BS_ID ← current base station ID
    Result ← Get_Result(Current_BS_ID)
    list_of_BS_done ← list_of_BS_done + Current_BS_ID
    while (index < NoOfBSOnline) do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if (Current_Neighbour_BS.ID is in list_of_BS_done) then
            Continue to next neighbour BS
        end
        list_of_BS_done ← list_of_BS_done + Current_Neighbour_BS.ID
        if (intersection(Query_scope, Current_Neighbour_BS_scope) exists) then
            // Append the results retrieved from the neighbour BS
            // to the collection of all retrieved results
            tempNeighBSResult ← Current_Neighbour_BS(Query_scope, list_of_BS_done)
            tempNeighBSResult ← eliminate duplicate items from tempNeighBSResult against Result
            Result ← Result + tempNeighBSResult
        end
        index ← index + 1
    end
    Return Result
end
This algorithm does not track which BS areas have been processed; it treats an
overlapping BS area as if it were non-overlapping. This means that a neighbour
BS collects all items in the overlapping area covered by the query scope, even
though these items may also be collected by other neighbour BSs. As a result,
the returned query result contains duplicated items when it is merged with the
caller BS's result. Therefore, an additional step filters out the duplicate
items before the query results of the neighbour BSs are joined with that of the
current BS.
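The merge-time filtering step can be sketched as follows; this is an illustrative helper, not code from the thesis, and the function name is our own.

```python
# Sketch of the duplicate-elimination step of Algorithm 3.9: a neighbour's
# result is appended to the caller's result with duplicates filtered out.

def merge_without_duplicates(current_result, neighbour_result):
    seen = set(current_result)
    merged = list(current_result)
    for item in neighbour_result:
        if item not in seen:       # keep only items not already reported
            merged.append(item)
            seen.add(item)
    return merged
```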
3.4.2 Static and Dynamic Query Scope Algorithm
A static query scope is a query scope parallel to the base station boundary. A
dynamic query scope is a query scope perpendicular to the straight line of the
travel direction of mobile users. Section 3.3 presented these query scopes and
the proposed query processing algorithms for a single cell in more detail. This
section discusses how the algorithms proposed in Section 3.4.1 are applied with
static and dynamic query scopes to retrieve a query result. The information
retrieval algorithm for the static query scope is presented first, followed by
the dynamic one.
Static Query Scope Algorithm
Before the proposed approaches are discussed, an illustration is given to
provide a better picture. Figure 3.14 illustrates a static query scope that
covers partial areas of multiple cells. When MU2 sends a query scope, ACFH,
which covers a partial area of the BS, the BS decides which part of the query
scope is processed, depending on the travel direction; the scope KCFL is then
processed. The BS then reduces the query scope to the area of the BS. The aim
of the reduction is to eliminate the part of the query scope which does not
belong to the BS. The reduction process is straightforward, since it cuts any
of the four sides of the query scope that extend beyond the BS scope. After the
reduction has been performed in BS1, BS2 and BS4, we have three smaller,
separate query scopes inside the three BSs: KBEJ in BS1, BCDE in BS2 and DFLJ
in BS4.
After every BS has completed the query scope reduction, it searches for all
items located inside its smaller query scope. If such items exist, they are
returned to the caller BS; otherwise, the BS returns nothing to the caller BS.
Therefore, BS2 and BS4 return the items inside areas BCDE and DFLJ respectively
to BS1. BS1 then sends the returned results, together with its own result, to
the user.
Figure 3.14: An illustration of a static query scope
Algorithm 3.10 shows the details of the proposed information retrieval
approach, which can be explained as follows:
(i) If the scope of the BS is smaller than the query scope, return all items
inside the area of the BS.
(ii) Otherwise, create a new query scope that lies within the scope of the BS.
(iii) Find all items located in the new scope and store them in a collection
called result.
(iv) Return the collection to the requester.
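The clipping step behind the approach can be sketched as follows; rectangles are represented as `(xmin, ymin, xmax, ymax)` tuples, a representation and set of helper names that are ours, not the thesis's.

```python
# Sketch of the reduction step in Algorithm 3.10: cut each side of the
# rectangular query scope that extends past the base-station boundary.

def clip_scope(query, bs):
    qx0, qy0, qx1, qy1 = query
    bx0, by0, bx1, by1 = bs
    # Each max/min pair cuts one of the four sides if it passes the BS scope.
    return (max(qx0, bx0), max(qy0, by0), min(qx1, bx1), min(qy1, by1))

def items_in(rect, items):
    x0, y0, x1, y1 = rect
    return [(x, y) for (x, y) in items if x0 <= x <= x1 and y0 <= y <= y1]
```

In Figure 3.14's terms, clipping the scope ACFH against each BS boundary yields the smaller scopes KBEJ, BCDE and DFLJ.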
Dynamic Query Scope
A dynamic query scope differs from a static query scope. There are many
possibilities when a dynamic query scope is used to retrieve query results from
several
Algorithm 3.10: Get_Result algorithm for static query scope
Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    // Check whether the query scope is greater than the BS scope
    if (BSScope < queryScope) then
        return all items in list_of_items
    end
    // Check whether part of the query scope extends past the BS scope
    New_Query_Scope ← queryScope
    if (queryScope.Xmax > BSScope.Xmax) then
        New_Query_Scope.Xmax ← BSScope.Xmax
    end
    if (queryScope.Xmin < BSScope.Xmin) then
        New_Query_Scope.Xmin ← BSScope.Xmin
    end
    if (queryScope.Ymin < BSScope.Ymin) then
        New_Query_Scope.Ymin ← BSScope.Ymin
    end
    if (queryScope.Ymax > BSScope.Ymax) then
        New_Query_Scope.Ymax ← BSScope.Ymax
    end
    while (item in list_of_items) do
        if (item is inside New_Query_Scope) then
            result ← result + item
        end
    end
    return result
end
BSs, since the scope can approach from different angles. These possibilities
can be classified into four categories based on how the query scope covers a BS
area, as shown in Figure 3.15. The new shape of the query scope can be either a
polygon or a triangle: the first three figures show a polygon, whereas the last
figure, on the bottom right, shows a triangle. The boundary of the new query
scope intersects with one or more BS boundaries.
(a) two parallel lines (b) two perpendicular lines
(c) two parallel lines (d) two perpendicular lines
Figure 3.15: A dynamic query scope intersects a base station (BS): (top) on the same line; (bottom) on two different lines
Algorithm 3.11 shows the retrieval algorithm used when a query scope passes
into a neighbour BS area. The algorithm starts by initialising some parameters.
It then checks whether the BS area is smaller than the query scope; if it is,
the neighbour BS immediately returns all items in its area to the mobile user.
When a query scope partially overlaps a neighbour BS, there is an intersection
area, as shown in Figure 3.16. This area is a polygon with n vertices, where
the value of n is between 3 and 6. The intersection area is formed by any
corner points of the BS boundary and a number of intersection points between
Figure 3.16: An illustration of a dynamic query situation
the query and BS scopes. Hence, the BS needs to know where these intersection
points are located and which corner points of the BS boundary lie inside the
intersection area. An intersection point lies on two line equations: a line
equation of the BS boundary and the stored line equation of the query scope and
searching distance. These points are stored in clockwise order.
The BS checks whether the location of each item in the list of items lies
inside the query scope, using the right-hand rule, which is specified as
follows:
• Take two consecutive points, p0 and p1, from the collection.
• Use the formula (p.y − p0.y)(p1.x − p0.x) − (p.x − p0.x)(p1.y − p0.y) to find
whether point p is located inside the query scope. This formula returns a value
less than 0, equal to 0 or greater than 0.
• Point p lies inside the query scope if the value is less than or equal to 0
for every pair of consecutive points.
All points located inside the query scope are collected and returned to the
requesting BS. The requesting BS then combines this result with its own result
and sends them to the requester.
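The right-hand-rule test above can be sketched directly; the function names are ours. Note that the polygon vertices must be in clockwise order, as the text specifies, for the sign convention (inside when the value is ≤ 0) to hold.

```python
# Sketch of the point-in-polygon test used in Algorithm 3.11. For a polygon
# whose vertices are stored clockwise, a point is inside (or on the boundary)
# when the edge formula is <= 0 for every consecutive vertex pair.

def side(p, p0, p1):
    # (p.y - p0.y)(p1.x - p0.x) - (p.x - p0.x)(p1.y - p0.y)
    return (p[1] - p0[1]) * (p1[0] - p0[0]) - (p[0] - p0[0]) * (p1[1] - p0[1])

def inside_polygon(p, vertices):
    n = len(vertices)
    return all(side(p, vertices[i], vertices[(i + 1) % n]) <= 0
               for i in range(n))
```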
Algorithm 3.11: Neighbour cell retrieval algorithm for dynamic query scope
Input: queryScope, BSScope
Output: Results
begin
    list_of_items ← items in this BS
    list_of_BS_Vertices ← BS vertices
    // Check whether the query scope is greater than the BS scope
    Intersection_points ← all intersection points lying on the query scope and neighbour BS boundaries
    if (all points in list_of_BS_Vertices are covered by queryScope) then
        return all items in BSScope
    end
    if (any point in list_of_BS_Vertices is covered by queryScope) then
        Add that point to Intersection_points
    end
    Sort elements of Intersection_points in clockwise order
    while (item in list_of_items) do
        p ← item[ndxItem]
        isInside ← true
        index ← 0
        while (isInside is true) and (index < number of elements in Intersection_points) do
            p0 ← Intersection_points[index]
            p1 ← Intersection_points[index + 1]
            isRS ← (p.y − p0.y)(p1.x − p0.x) − (p.x − p0.x)(p1.y − p0.y)
            if (isRS > 0) then
                isInside ← false
            end
            index ← index + 1
        end
        if (isInside) then
            result ← result + p
        end
        ndxItem ← ndxItem + 1
    end
    return result
end
3.5 Handling Disconnections
In a mobile environment, mobile users may not receive any query results due to
a disconnection between the mobile user and the base station. The disconnection
can be either unpredicted, caused by interference, or predicted, caused by the
recipient's location being outside the coverage of any base station. This
section discusses our proposed algorithms for handling disconnections.
A server can handle a predictable disconnection more easily than an
unpredictable one, because the server knows the wake-up time of the mobile
user. For an unpredictable disconnection, by contrast, the server has no
knowledge of when the mobile device will reconnect or of the recipient's next
location.
As a result of the disconnection, either the query result has not been received
by the mobile user or an acknowledgement has been lost in transit. To deal with
the missing-result problem, the server can reprocess the query result when the
disconnection is predictable. On the other hand, reprocessing the query result
may not be a good idea, since the server needs a certain amount of time to
produce it and disconnections happen frequently. Preserving an existing query
result and resending it periodically within a certain amount of time is one
solution in this situation. For both solutions, the mobile user needs to send
an acknowledgement once the query result has been received. The server keeps a
query result only for a certain amount of time to avoid accumulating query
results from other mobile users. If the server fails to receive an
acknowledgement, the situation is treated as if the mobile user had not
received the query result that was sent.
The rest of this section discusses the proposed approaches for handling the two
types of disconnection for single and multiple cells. The approach to handling
disconnection in a single cell is presented first, followed by that for
multiple cells. In each subsection, the proposed techniques for handling
predictable and unpredictable disconnections are presented.
3.5.1 Single Cell
Predictable Disconnection
This section elaborates on a proposed mechanism to handle a situation where
mobile users miss query results within a predictable time. In other words,
mobile users are alerted to be ready to receive query results in the next
interval.
Figure 3.17: Illustration of predicted disconnection situation
Consider a user at location Z0 who sends a query requesting objects within a
distance D at time tstart. The user travels at a constant speed S and does not
receive a query result at location Z1 at time tstart+1. The user is expected to
arrive at location Z2 and to receive a query result at time tstart+2, as shown
in Figure 3.17. If the gap between Z1 and Z2 is less than the distance of the
user query, the query results generated at times tstart+1 and tstart+2 overlap
(the area indicated by TRSQU). Regenerating the same results would overload the
server.
In our proposed approach, retrieving the same result set can be avoided when
the above situation happens. The approach is divided into two major steps. The
first step determines whether the two query scopes overlap, which occurs if the
gap between Z1 and Z2 is shorter than the distance value of the query. The
second step excludes items from the result set, and can be done in two ways: by
items or by area. Both ways are similar to those used to eliminate duplicated
items in multi-cell query processing for overlapping BSs (described in Section
3.4.1), and both collect the items located inside the overlapped region from
the existing query result. In item elimination, the overlapped area is not
eliminated while searching for items; instead, all items are compared with
those in the overlapped region. In area elimination, the overlapped area is
eliminated while searching, so only the remaining items are checked against the
query scope.
Algorithm 3.12 presents our proposed algorithm, whose details are explained
below:
(i) Verify the previous position, (X1, Y1). If it is outside the current query
scope, terminate the algorithm and return an empty result set.
(ii) Form the overlapping area of the two query scopes.
(iii) Retrieve all objects located in the query scope but outside the
overlapping area. This step is the second part of the proposed approach, so the
elimination is performed here using one of the two ways.
(iv) Keep all items in the overlapped region.
(v) Add the retrieved objects to those in the overlapping collection.
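The steps above can be sketched as follows, under the simplifying assumptions that query scopes are axis-aligned squares and items are points; the overlap region is computed as a rectangle rather than the polygon TRSQU of the figure, and all names are illustrative.

```python
# Sketch of the reuse strategy in Algorithm 3.12: items in the overlap of the
# old and new query scopes are copied from the existing result; only the
# remainder of the new scope is searched afresh.

def square(center, dist):
    x, y = center
    return (x - dist, y - dist, x + dist, y + dist)

def overlap(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def in_rect(p, r):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def regenerate(old_result, z1, z2, dist, search):
    new_scope = square(z2, dist)
    ov = overlap(square(z1, dist), new_scope)
    if ov is None:
        return search(new_scope)                    # scopes disjoint: full search
    kept = [p for p in old_result if in_rect(p, ov)]            # step (iv)
    fresh = [p for p in search(new_scope) if not in_rect(p, ov)]  # step (iii)
    return kept + fresh                             # step (v): merge
```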
Algorithm 3.12: Predicted disconnections algorithm
Input: queryScope, BSScope
Output: Results
begin
    Objects ← objects collection in the current base station boundary
    (X1, Y1) ← location at time tstart+1
    (X2, Y2) ← location at time tstart+2
    Dist ← distance of user query
    overlapping_collection ← empty
    if ((X1, Y1) is outside queryScope) then
        return overlapping_collection
    end
    T ← (X2 ± Dist, Y2 ± Dist)
    R ← (X1 ± Dist, Y2 ± Dist)
    S ← (X1 ± Dist, Y1 ± Dist)
    Q ← (X2 ± Dist, Y1 ± Dist)
    U ← (X2 ± Dist, Y2 ± Dist)
    overlapping_area ← area formed by the coordinates T, R, S, Q, U
    overlapping_collection ← all objects in overlapping_area
    collection ← overlapping_collection + all objects in queryScope not located in overlapping_area
    return collection
end
Unpredictable Disconnection
This subsection presents a proposed technique for managing situations where an
unpredictable disconnection occurs. There are two possible solutions for
handling such a disconnection: not reprocessing, or reprocessing, the query
result.
Algorithm 3.13 shows the proposed non-reprocessing algorithm for when an
unpredictable disconnection occurs. It is executed when the BS has received
information that the mobile user is ready. At the start of the algorithm, the
query scope is the one in effect when the mobile user missed the query results;
it is already available at the server. (X1, Y1) is the last position at which
the mobile user missed the query results, and (X2, Y2) is the current position
of the mobile user. If (X2, Y2) is still inside the query scope, the server
sends the existing query result; otherwise, the server reprocesses the existing
query with the next location of the mobile user.
Algorithm 3.13: Non-reprocessing algorithm
begin
    Query_scope ← query scope from user
    (X1, Y1) ← sender location at time tmissed
    (X2, Y2) ← location at time tcurrent
    if ((X2, Y2) is inside Query_scope) then
        Send existing query results
    else
        Regenerate query result
    end
end
The advantage of this algorithm is that it reduces the server load by keeping
the existing query result, subject to the server configuration. Its two
drawbacks are that it increases server memory consumption, because it retains
existing query results, and that some objects may become invalid, since the
requester has moved to a new location.
Alternatively, the server regenerates a new query result without regard to the
existing one. Algorithm 3.14 shows the reprocessing algorithm used when the
mobile user has missed a query result. It is executed when the mobile user
reconnects to the current BS. As in the previous algorithm, at the start the
server collects the current location information of the mobile user and
predicts the next location. Then the server generates a query scope at (X2, Y2)
with the same searching distance that was passed to the server
Algorithm 3.14: Reprocessing algorithm
begin
    (X1, Y1) ← sender location at time tcurrent
    (TDx, TDy) ← travel distance of mobile user
    // Prediction of the next location at time tcurrent+1
    (X2, Y2) ← (X1 + TDx, Y1 + TDy)
    Query_scope ← query scope generated at (X2, Y2) with the same searching distance
    Result ← reproduce query result at time tcurrent+1
    Send Result to user
end
beforehand. Next, the server reproduces the query result and finally sends it
to the requester.
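The decision between the two algorithms can be sketched together; this is a hedged illustration with our own names, where `regenerate` stands in for the server's reprocessing step.

```python
# Sketch contrasting Algorithms 3.13 and 3.14: reuse the stored result while
# the reconnected user is still inside the old query scope, otherwise predict
# the next position from the travel distance and regenerate.

def on_reconnect(stored_result, scope, pos, travel, regenerate):
    x, y = pos
    x0, y0, x1, y1 = scope
    if x0 <= x <= x1 and y0 <= y <= y1:
        return stored_result                  # Algorithm 3.13: non-reprocessing
    nxt = (x + travel[0], y + travel[1])      # Algorithm 3.14: predict (X2, Y2)
    return regenerate(nxt)                    # and reproduce the result there
```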
3.5.2 Multiple Cells
Disconnections also occur while retrieving multi-cell queries. The problem
occurs if the mobile user travels to an area outside the service area of a BS.
The BS needs to avoid processing a query when the user cannot be reached. This
section discusses the above problem, focusing on predictable and unpredictable
disconnections.
Predictable Disconnection
This subsection describes the proposed algorithm for handling a predictable
disconnection, in which the period of disconnection is known in advance; for
example, when a mobile user is outside the service area of the base station for
a certain period of time.
Algorithm 3.15 shows our proposed algorithm for handling a predictable dis-
connection in receiving the query result. At the start of the algorithm, the server
retains the existing query results that were not sent while the next recipient location
Algorithm 3.15: Predictable disconnection algorithm for multi-cell retrieval
begin
    queryResult ← existing query result
    isSent ← false
    isOutCurrBS ← false
    while (not isSent) do
        (Xt, Yt) ← new location at time t
        if ((Xt, Yt) is outside the current BS area) then
            isOutCurrBS ← true
            exit loop
        end
        while ((Xt, Yt) is inside queryScope) do
            Send the existing query result when the connection is established
            isSent ← acknowledgement from the user upon receiving the result
        end
    end
    while (isOutCurrBS) and (BS in list of online BSs) do
        if ((Xt, Yt) is inside the BS) then
            Forward the query and location to that BS
            exit loop
        end
        BS ← next BS
    end
    remove query result
end
was still inside the query scope. Otherwise, a new query result is generated,
as the location of the mobile user is outside the query scope.
In addition, the mobile user may leave the current BS area, in which case the
current BS needs to forward the query to a neighbour BS. However, the next
location of the mobile user may not belong to any online BS. Therefore, the
current BS needs to predict the location at which the mobile user enters one of
the online BSs. The new BS then processes the query and sends the query result
once the mobile user is connected to it.
Unpredictable Disconnections
The problem of unpredictable disconnections for multi-cell queries differs from
that for a single cell; the difference arises from the movement of the user to
another cell. The current cell should know whether to remove or to keep the
query result. In this section, we propose an approach to handle this situation.
Algorithm 3.16 shows an algorithm for handling unpredictable disconnections.
The server waits for an acknowledgement once it has finished sending the query
result. The acknowledgement parameter is either true or false; it is true if
the mobile user has received the query result completely.
Algorithm 3.16: Unpredictable disconnection algorithm for multi-cell retrieval
begin
    queryResult ← existing query result
    isSent ← false
    while (numOfSendingTrial < maxSendingAllowed) and (not isSent) do
        if (connected) then
            if (recipient location is outside the query scope) then
                Regenerate the query result based on the new location
                exit loop
            end
            Send query result
            Wait for an acknowledgement for a period t
            acknowledgement ← acknowledgement from the user upon receiving the query result
            if (acknowledgement is received) and (acknowledgement is true) then
                isSent ← true
            end
        else
            exit loop
        end
        numOfSendingTrial ← numOfSendingTrial + 1
    end
    remove query result
end
The waiting period is calculated as the maximum number of query-result send
attempts multiplied by the waiting period for receiving an acknowledgement. The
values of both parameters are configurable, depending on the server capacity.
The query result is kept at the server side until the maximum number of send
attempts has been reached or the user is disconnected from the server.
The formula to calculate the waiting period before a query result is deleted is
given below:
WP = MS * WPA
where:
WP is the waiting period before a query result is deleted,
MS is the maximum number of send attempts, and
WPA is the waiting period to receive an acknowledgement (timeout).
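As a small worked example of the formula, with illustrative numbers of our own choosing: five permitted send attempts and a 4-second acknowledgement timeout give a 20-second retention window for the stored query result.

```python
# WP = MS * WPA: how long the server retains a query result before deletion.

def waiting_period(max_sends, ack_timeout):
    return max_sends * ack_timeout

wp = waiting_period(5, 4.0)  # 5 attempts * 4 s timeout = 20 s retention
```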
If one of the above conditions is reached, the server removes the query result.
The removal avoids running out of server space, even when the server has a
large space. The mobile user needs to send the query again after this period,
or the new location will be outside the query scope.
This algorithm assumes that the recipient's location resides inside the query
scope. Otherwise, keeping the query result at the server side is ineffective,
since the query result is invalid for the new location.
3.6 Case Studies
In this section, we describe case studies for single-cell and multi-cell
queries to illustrate how the proposed approaches work and how query results
are computed.
3.6.1 Single-Cell Query Processing
We illustrate situations where the user has stopped or is moving while
receiving the query result. The user may move slowly or quickly, and the
movement direction may be vertical, horizontal or diagonal. We define a slow
velocity as one where the user's movement is less than the distance of the user
query; in other words, if the user query searches for a target within x, the
user moves less than x. In contrast, a fast velocity is one where the user's
movement is greater than or equal to the distance of the user query. The user
may therefore either receive or miss the query result while it is being sent
from the server.
Based on the above situations, this case study is divided into four cases,
which discuss a user with zero, vertical, horizontal and diagonal movement
respectively. Each case discusses two situations of retrieving query results,
hitting and missing them, while the user is moving.
Case Study 3.6.1. The mobile user stays in the same location
In this example, we assume that the user does not move to any other location
while the query result is being received. Consider a mobile user located at
point (5,5) who sends a query to a server (refer to Figure 3.18). The query is
“Find the closest restaurant within 2 km”. The user stays at the same location
while the answer is produced by the server; in other words, the location at
time tstart+1 is the same as the one at time tstart.
The server generates a valid scope by adding and subtracting the distance
to/from the mobile user's position. Therefore, we have a square formed by the
following coordinates: top right (7,7), bottom right (7,3), top left (3,7) and
bottom left (3,3).
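The scope construction can be sketched directly; the function and key names are ours, not the thesis's.

```python
# Sketch of the valid-scope construction in Case Study 3.6.1: add and subtract
# the search distance from the user's position to obtain the square's corners.

def valid_scope(x, y, dist):
    return {"top_left": (x - dist, y + dist),
            "top_right": (x + dist, y + dist),
            "bottom_left": (x - dist, y - dist),
            "bottom_right": (x + dist, y - dist)}
```

For the user at (5, 5) with a 2 km distance, this reproduces the square (3,3)-(7,7) of the case study.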
After the valid scope has been produced, the server searches for a restaurant
within the ranges 3 < x < 7 and 3 < y < 7; in other words, all regions are
searched. Once the
Figure 3.18: Stay at the same location (Case Study 3.6.1)
server finds a restaurant within that range, it generates a query result for
the query and forwards it to the user. An acknowledgement flag is set to true
if the user has received the result successfully; otherwise, the server keeps
processing the query result for the next time interval.
Case Study 3.6.2. Vertical Movement
We illustrate a user moving vertically at a constant speed. First, we show the
user receiving the query result immediately from the server at time tstart+1.
Then, a situation where the user misses the query result from the server at
time tstart+1 is shown.
After the scope has been created and divided into four equal regions, the
server identifies the user's movement direction. The examples in Figure 3.19
show the user travelling vertically. The server then executes Algorithm 3.2 in
order to find all queried objects. In Figure 3.19a, the user is moving upward;
therefore, the server searches for targets within regions 1 and 2 of the scope
instead of the whole scope. This is due to our assumption that the user is
interested only in targets that have not yet been passed. Hence, the valid
targets are the vending machines (V6, V8, V9, V11, V13), and these targets are
forwarded to the
(a) Move up (b) Move down
Figure 3.19: Vertical movement (Case Study 3.6.2-1)
user. The server sets the parameter forwarded to true once the user has
received the answer successfully.
On the other hand, if the user is moving down, regions 3 and 4 are probed
(Figure 3.19b). Hence, the vending machines (V2, V7, V10, V14) are valid
results and are forwarded to the user. The parameter forwarded is then set to
true.
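The direction-based region filter can be sketched as follows. This is an illustrative reading, not the thesis's code: we assume the four regions are numbered like quadrants around the user (1 top-right, 2 top-left, 3 bottom-left, 4 bottom-right), which is consistent with the pairs named in the case studies (up: 1 and 2; down: 3 and 4; right: 1 and 4; left: 2 and 3).

```python
# Sketch of the region filter in the case studies: only the quadrants ahead of
# the user's movement direction are searched for targets.

def region_of(point, center):
    x, y = point
    cx, cy = center
    if y >= cy:
        return 1 if x >= cx else 2   # upper half: regions 1 (NE) and 2 (NW)
    return 4 if x >= cx else 3       # lower half: regions 4 (SE) and 3 (SW)

def regions_ahead(direction):
    return {"up": (1, 2), "down": (3, 4), "right": (1, 4), "left": (2, 3)}[direction]

def targets_ahead(targets, center, direction):
    wanted = regions_ahead(direction)
    return [t for t in targets if region_of(t, center) in wanted]
```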
We have shown above a situation where the user receives the query result at
time tstart+1. Now, assume that the user misses the query result at time
tstart+1 while moving up at a constant speed; the user will receive it in the
next interval, at time tstart+2, as shown in Figure 3.20.
The beginning of this algorithm is the same as above: the server initialises
the required parameters by assigning the values from the user query received.
After the parameter initialisation, the server generates a scope for the
location Z1 and divides the scope into four equal regions. Once the scope has
been created and divided, the server searches for targets within regions 1 and
2 instead of all regions, based on our assumption above. Therefore, the only
valid vending machines at time tstart+1 are (V9, V13).
Figure 3.20: Vertical movement with overlap situation (Case Study 3.6.2-2)
Upon handling the disconnection, the server regenerates new query results for
the next interval, in which the user is predicted to reach location Z2 at time
tstart+2. In recreating the query results for time tstart+2, the server
verifies the targets in the old query results and invalidates those which are
not bounded by the overlapping area PQRS (see Figure 3.20). In this scenario,
no target is invalidated. The server then probes new targets in regions 1 and 2
inside the square generated at time tstart+2, and the new targets found are
joined to the existing valid targets. Hence, the query result returned at time
tstart+2 contains the vending machines (V6, V8, V9, V13).
Case Study 3.6.3. Horizontal Movement
Here, we present three examples of the horizontal movement of a user. Two hit
illustrations, in which a user receives the query result at time tstart+1 while
moving horizontally at a constant speed, are given first. Then, a situation
where the user misses the query result is presented; we assume that the user
would otherwise receive the query result at tstart+1.
Consider an illustration where a user queries “Find all vending machines within
2 km” while travelling at speed S, as presented in Figure 3.21. The horizontal
(a) Move right (b) Move left
Figure 3.21: Horizontal movement (Case Study 3.6.3-1)
movement to the right at speed S is shown in Figure 3.21a. At the beginning of
the process, the server receives the user query, including the current travel
information. The server creates a query scope based on the information in the
user query and then selects the regions to be searched. Owing to our assumption
above, the server places all targets located within regions 1 and 4 of the
scope into the parameter collection. We assume that the user arrives at point
(5,5) at time tstart+1. Therefore, the valid vending machines (V9, V10, V11,
V13 and V14) are forwarded to the user, and the parameter forwarded is set to
true.
Conversely, when the user is moving to the left with velocity S, regions 2 and
3 are searched to find valid objects (shown in Figure 3.21b). Therefore, the
valid targets (V2, V6, V7, V8) are forwarded to the user, and the parameter
forwarded is set to true.
Now, consider a situation where the user misses the query result at time
tstart+1 and then receives a new result at time tstart+2, as shown in Figure
3.22. Assume that the server had processed the user query and found the vending
machines (V9, V11, V13 and V14) as valid targets at time tstart+1.
Figure 3.22: Horizontal movement with overlap situation (Case Study 3.6.3-2)
However, the user could not receive the query result at location Z1 at time
tstart+1 because of a disconnection while receiving the results. We assume the
user keeps travelling at a constant speed and is predicted to arrive at
location Z2 at time tstart+2. The server then regenerates the query result for
the next interval. Since the user is moving slowly at a constant speed, there
is an overlapping area, formed by the points P, Q, R and S, between the squares
at times tstart+1 and tstart+2. Therefore, the server invalidates some targets
in the existing query result; in this case, the vending machines (V9, V14) have
expired and are eliminated. In other words, the vending machines whose
locations are bounded by the overlapping area PQRS remain valid targets for
time tstart+2. After the server has eliminated the invalid targets, it probes
targets located within regions 1 and 4 of the scope (excluding the overlapping
area), since the user is moving horizontally and is interested only in targets
that have not been passed. The newly found targets are combined with the
targets kept from area PQRS. Afterwards, these targets (V10, V11 and V13) are
forwarded to and received by the user at time tstart+2.
Case Study 3.6.4. Diagonal Movement
In this example, two illustrations (as shown in Figure 3.23) of diagonal movement
are presented. The first illustration demonstrates a situation where a user receives
a query result at time tstart+1 when he/she moves diagonally with a constant speed.
In contrast, the second illustration shows a situation where a user misses a query result at
time tstart+1 and is expected to receive a new query result at time tstart+2.
(a) Diagonal movement (b) Overlap
Figure 3.23: Diagonal movement and overlap situation (Case Study 3.6.4)
Let us consider that a user sends the same query as in the previous example
and moves in a top right direction as shown in Figure 3.19 and Figure 3.21. At
the start of the process, a server receives the user query. The server then produces a scope around the predicted location at time tstart+1, which is point (5,5), and divides
the scope into four equal regions. The next process analyses the user direction by
calling the diagonal movement algorithm (Algorithm 3.4) to check targets in the
opposite region. In the algorithm, the server verifies the user's movement direction. In our example, the user moves in the top-right direction; therefore, under our assumption, the server searches for targets within region 1 only, instead of all regions. Then, the valid vending machine, V11, is sent to the user.
In the next illustration (Figure 3.23b), the scenario is similar to that above;
however, the user experiences a disconnection upon receiving the query result at time
tstart+1. Therefore, the user misses the query result at that time and is expected to
receive the next query result at time tstart+2.
When it is acknowledged that the user has missed the current result, the server regenerates a new query result for location Z2, since the user is predicted to
arrive at location Z2 at time tstart+2. The server generates a scope for location Z2 and searches the overlapping area to invalidate existing targets that are not bounded within this area. Then, the server searches for targets located within the scope (excluding the overlapping area), and the newly found targets are joined to the existing targets. Hence, the query result, whose content is (V11, V13), is returned to the user. A return acknowledgement is set once the user receives the query result.
3.6.2 Multi-Cell Query Processing
In this subsection, we present examples to show how our proposed query processing
algorithm for multiple cells works. We divide the discussion into two parts: a non-overlapping BS area and an overlapping BS area. First, we discuss how to retrieve query results where there is no overlapping area; then, we discuss the process for the situation where there is an overlapping area. As a running example, we use the same query as in Section 3.6.1, which is sent to a server through BS1.
Non-Overlapping BS Area
Two examples are given to illustrate two situations: a user moving within the current cell, and a user moving to another cell while requesting information about objects from multiple cells.
Figure 3.24 shows a situation where a query scope is crossing eight BS bound-
aries. In this situation, BS3 receives the query scope and forwards it to its neighbour
BSs (BS2, BS4, BS7, BS8, BS9). Those BSs search for objects within the requested area and verify their lists of online BSs. BS4 and BS9 forward the query again to BS5 and BS10 respectively to request objects in their areas. BS5 and BS10 return information about the requested objects to the requesters, BS4 and BS9. All BSs that received the forwarded query from BS3 return their information to BS3. BS3 merges all of the results and then sends them to the user.
Figure 3.24: A query scope is crossing multiple cells
Figure 3.25 shows a situation where a user moves to another BS. The user sends
the same query to BS3. Once BS3 receives the user query, the predicted location of the user is calculated as a function of time, obtained by multiplying the travel speed by the elapsed time. Since the new location of the mobile user is outside its area, BS3 forwards the query to BS8. BS8 creates a query scope and processes the query inside the shaded area.
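The speed-times-time prediction just described can be sketched as follows. The function name, the angle parameter, and the sample numbers are illustrative assumptions, not values from the thesis.

```python
import math

def predict_location(x, y, speed, angle_rad, dt):
    """Return the position reached after dt time units at a constant speed,
    displacing the current position by speed * dt along the travel direction."""
    return (x + speed * dt * math.cos(angle_rad),
            y + speed * dt * math.sin(angle_rad))

# a user moving horizontally to the right (angle 0) at speed 2 for 3 time units
print(predict_location(1.0, 1.0, 2.0, 0.0, 3.0))  # → (7.0, 1.0)
```

If the predicted point falls outside the current BS area, the query is forwarded to the BS covering that point, as in the scenario above.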
To fetch the query result, BS8 searches for objects inside the requested area and checks its list of online BSs. The query is then forwarded to the online neighbour BSs whose areas are covered by the query scope. In this situation, BS7 and BS9 receive the forwarded query and perform the same process as BS8. Then, BS9 forwards the
Figure 3.25: Moving across to another base station (BS) boundary
query to BS10, since the query covers a partial area of BS10. BS10 performs the same process and sends the query result back to BS9. BS9 merges its query result with the one from BS10. BS9 and BS7 then send their query results back to BS8, which combines its query result with those from BS9 and BS7. Once the query results are merged, the result is sent to the user.
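The recursive forward-and-merge flow above can be sketched as follows. This is a simplified illustration: the topology, the BS names, and the modelling of the query scope as a plain set of object identifiers are all assumptions made for the example.

```python
def process_query(bs, scope, topology, objects, visited=None):
    """Search the local area, forward the query to unvisited online
    neighbours reached by the scope, and merge all returned results."""
    visited = visited or {bs}
    result = [o for o in objects.get(bs, []) if o in scope]
    for neighbour in topology.get(bs, []):
        if neighbour in visited:
            continue
        visited.add(neighbour)
        # a neighbour is queried only for objects the scope still covers;
        # the scope here is modelled simply as a set of object ids
        result += process_query(neighbour, scope, topology, objects, visited)
    return result

# assumed topology mirroring the example: BS8 → {BS7, BS9}, BS9 → BS10
topology = {"BS8": ["BS7", "BS9"], "BS9": ["BS10"], "BS7": [], "BS10": []}
objects = {"BS8": ["a"], "BS7": ["b"], "BS9": ["c"], "BS10": ["d"]}
scope = {"a", "b", "c", "d"}
print(sorted(process_query("BS8", scope, topology, objects)))  # → ['a', 'b', 'c', 'd']
```

BS9 merges BS10's answer before replying to BS8, which in turn merges everything before answering the user, matching the chain described in the text.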
Overlapping BS Area
Having presented two examples of non-overlapping BSs, we now show three examples in which the new location of a user and the query scope interact with BSs whose areas overlap. The examples cover new user locations both within and before the intersection area.
Figure 3.26a illustrates the case when a user moves to another BS and there is an overlapping area amongst the BSs. After BS1 receives the query from a user, it searches for targets within its area. It then verifies whether its area intersects with any neighbouring BS. If there is an intersection between BS1 and a neighbouring BS, BS1 forwards the query to that BS. In this figure, BS1 forwards the query to BS2. BS2 adjusts its minimum boundary by assigning it the maximum boundary of
(a) Within an overlapping BS area (b) Outside an overlapping BS area
(c) Within many overlapping BS areas
Figure 3.26: Three situations of overlapping base station area
BS1. Then, BS2 generates a new query scope by clipping the current query scope at BS2's adjusted minimum boundary. Finally, all targets within the query scope and the BS2 area are collected and sent back to BS1, which combines them into its own collection. The final results are sent to the user.
Figure 3.26b shows a situation in which the new location of the user is before the overlapping area (the shaded area between BS1 and BS2) and the query scope extends beyond the BS1 area. In this situation, BS1 searches for targets within its area and the overlapping area (shaded area). Once the search has been completed, BS1 checks whether the query scope extends beyond its boundary. If it does not, BS1 sends the query result to the requester.
On the other hand, if the query scope extends beyond the BS1 area, BS1 forwards the query to all BSs whose areas the scope passes through. In this example, BS1 forwards the query to BS2. BS2 searches its area, excluding the overlapping area, since BS1 has
already searched that area. BS2 returns the query results to BS1, which returns them to the requester.
Figure 3.26c shows an example of a more complex situation. Once BS3 has received the user query, it matches objects within its area against the query; the areas overlapping with BS4 and BS5 are included as well. Then, BS3 passes the user query to either BS4 or BS5, depending on which registered with BS3 first. If BS4 registered first, the area overlapping BS5 is included in the BS4 search; otherwise, that area is excluded. The same applies to BS5: if it registered first, the area overlapping BS4 is included in the BS5 search; otherwise, it is not. After all areas that the user query passes through return their answers to BS3, it joins those answers and sends them to the user.
3.7 Discussion
Our proposed approaches are designed to retrieve all requested objects that have not yet been passed while the mobile user travels before receiving the query result. They assume straight-line movement at a constant speed.
The proposed approaches are concerned with minimising query processing and the amount of data transferred in query results while mobile users are travelling within a single cell or across multiple cells. The approaches fall into two categories: query processing and disconnection handling. Query processing is further divided into single-cell and multi-cell queries.
The advantage of the proposed query processing approaches is that they avoid processing the unnecessary part of the query, namely the area that has already been passed by the user. Another benefit is a reduction in the amount of data transferred when sending the query result.
We also proposed a solution to handle disconnections that occur while the query result is being transferred. The proposed solutions are divided into handling predictable and unpredictable disconnections. The benefit of both approaches is that the query result can be regenerated, based on the predicted future location, without requiring the user to resend the query when a disconnection occurs. However, there is a limit on how many times the query result is regenerated when the user repeatedly fails to receive it; this limit is configurable depending on the server load.
3.8 Conclusion
This chapter discussed mobile query processing for single and multiple cells. At the beginning of the chapter, the effectiveness of using a square shape as the query scope in location-dependent query processing was introduced, focusing on retrieving static object information within a single cell. The advantages of a square shape over other shapes as the query scope were also presented. Algorithms for retrieving object location information were developed to eliminate objects that have already been passed by users, even though those objects are still inside the query scope. Finally, when users miss query results and their movement is slow, the past and current query scopes overlap each other; an algorithm was therefore developed to handle this situation and prevent redundant information from being sent.
In the second part of this chapter, we discussed three methods to retrieve items from multiple cells. The first method considers overlapping and non-overlapping BS areas, with the query scope parallel to the base station boundaries. The second deals with a dynamic query scope. Finally, we proposed an algorithm to deal with disconnections that occur while query results are being received. We discussed the efficiency of the proposed algorithms in retrieving query results from multiple cells, and case studies were provided to demonstrate that efficiency.
Chapter 4
Indexing for Multiple Servers
Retrieval
Chapter 3 focused on how mobile query processing is performed. However, we did not discuss the indexing mechanism used when multi-cell queries are requested. Thus, this chapter presents our new contributions to processing multi-cell queries using indexing, namely the Local Index and Global Index mechanisms.
4.1 Introduction
It is a characteristic of mobile queries that the locations of mobile users are dynamic
and they often request data items which are located inside a single cell or multiple
cells. This dynamic behaviour has created the need for better query processing speed and for fewer invalid query results.
Query processing that retrieves objects located in multiple cells raises an issue that impacts query processing performance: each cell finishes query processing in a different amount of time. The difference in query processing time across cells is caused by differing transfer speeds, queue sizes and query processing speeds at each server. These three factors increase the overall query processing time.
One way to improve query processing is to provide an index structure for each cell. An indexing technique is a common mechanism for accessing a collection of records and improving the efficiency of query processing [93, 129]. An index organises data records to optimise certain kinds of retrieval operations. Several indexing schemes have been proposed in the past, the most prominent among them being the tree-based schemes [123]. Tree indexing schemes search from the root node down to the leaf nodes. Tree index structures help to process single-cell queries; however, their disadvantage in processing multi-cell queries is that each cell must traverse its tree from the root node in order to produce the query result.
This chapter proposes two index mechanisms, namely Local and Global Indexing. The aim is to overcome the limitations of multi-cell query processing when examining index structures in those cells. Neither proposed approach intends to create a new type of index structure; rather, both extend existing index structures to improve the efficiency of multi-cell query processing. Nor do we concentrate on the concurrency issues that arise in tree searching, or their solutions.
Moreover, both proposed approaches use the original multidimensional index structure called the R-tree [41], and both can be applied to any member of the R-tree family. The proposed mechanisms have their own characteristics, which can be summarised as follows:
• Local Index mechanism
As the name implies, this mechanism tries to process a multi-cell query within
a single cell. When there is a multi-cell query, remote indices of the objects
in a query result are stored locally in the current cell. When storing the remote indices, the remote objects' information is either replicated along with the indexes or kept in the original cell, in which case pointers from the leaf nodes that hold the remote indexes point to it. Then, if a future multi-cell query requests the same area, the current cell can answer it locally. However, we do not store all requested remote indices locally, because this would slow down query processing and consume more space.
• Global Index mechanism
This mechanism differs from the Local Index mechanism in terms of indices
organisation. Instead of storing remote indices of the objects in the query
result, a global index structure is created when a base station starts up. Each cell propagates its index structure to every other cell. Hence, while processing a multi-cell query, the tree index traversal can be performed on this global index structure.
Figure 4.1: Chapter 4 framework
Figure 4.1 shows this chapter’s framework. Section 4.2 gives an overview of
this chapter as a foundation to the proposed approach. Two proposed indexing ap-
proaches using the original R-tree are discussed in Sections 4.3 and 4.4 respectively.
Examples of the usage of both proposed approaches are discussed in Section 4.5.
Section 4.6 presents a discussion of the two proposed approaches. Finally, the conclusion
summarises the contents of this chapter.
4.2 Preliminary Study
This section presents an overview of the original R-tree indexing structure, together with a brief explanation of the multi-cell query processing scenario.
When a base station receives a multi-cell query, it verifies whether the query scope extends beyond the base station's area. The part of the query scope that is beyond its area is forwarded to the base stations whose areas are covered by the query scope. Assume that the index structure of each base station is an original R-tree that has been built in advance. Each base station searches its index structure for the area that overlaps the query scope. At each base station, the probing process starts from the root node and proceeds down to the leaf nodes. Objects of the matched leaf nodes are collected and returned to the user.
The R-tree structure is an adaptation of the B+-tree for spatial data; it is a height-balanced data structure with internal and leaf nodes [93]. Internal nodes consist of index entries of the form <n-dimensional box, pointer to child node>. Leaf nodes hold data entries. A data entry is a pair <n-dimensional box, rid>, where rid identifies an object and the box is the smallest box containing the object, which can be represented as a point or a region. The n-dimensional box of an internal or leaf node is called a Minimum Bounding Rectangle (MBR) or Minimum Bounding Box (MBB).
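The node layout just described can be sketched minimally as follows. This is an illustrative model, not the thesis code: the class names are assumptions, and the tiny tree is shaped loosely after the R8/R9-under-R3 portion of Figure 4.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "MBR") -> bool:
        return not (other.xmin > self.xmax or other.xmax < self.xmin or
                    other.ymin > self.ymax or other.ymax < self.ymin)

@dataclass
class Node:
    mbr: MBR
    children: List["Node"] = field(default_factory=list)  # internal node entries
    rid: Optional[str] = None                             # leaf entry: record id

def search(node: Node, query: MBR) -> list:
    """Probe from the root down to the leaf entries, as in the scenario above."""
    if not node.mbr.intersects(query):
        return []
    if node.rid is not None:          # leaf entry: <MBR, rid>
        return [node.rid]
    hits = []
    for child in node.children:
        hits += search(child, query)
    return hits

# a tiny two-level tree: leaf entries R8 and R9 under internal node R3
r8 = Node(MBR(0, 0, 1, 1), rid="R8")
r9 = Node(MBR(2, 2, 3, 3), rid="R9")
r3 = Node(MBR(0, 0, 3, 3), children=[r8, r9])
root = Node(MBR(0, 0, 10, 10), children=[r3])
print(search(root, MBR(0, 0, 1.5, 1.5)))  # → ['R8']
```

Internal entries carry only an MBR and child pointers; leaf entries pair an MBR with a record id, matching the <box, rid> form above.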
Figure 4.2 shows two-dimensional regions and the corresponding R-tree index structure. Figure 4.2a shows the geometric locations of objects presented as two-dimensional coordinates. Figure 4.2b shows the R-tree index structure for those coordinates. In the figure, there are 12 data object regions (shaded boxes), denoted (R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19). These regions appear as leaf entries of the R-tree index structure, as shown in Figure 4.2b. Regions at the upper levels of the R-tree represent bounding boxes of internal nodes. The middle level of the R-tree index structure consists of internal nodes, shown as white boxes; there are five of them, namely (R3, R4, R5, R6, R7). The top level of the R-tree index is the root node, which has two entries: (R1, R2).
(a) 2D coordinates
(b) R-tree
Figure 4.2: R-tree and 2D coordinates [93]
Bounding rectangles of two or more nodes may overlap each other; for example, the bounding rectangles R1 and R2 overlap. This implies that more than one leaf node could hold a given data object while satisfying all bounding rectangle boundaries [93]. However, every data object is stored in exactly one leaf node, even if its bounding rectangle overlaps the regions of two or more higher-level nodes. Consider data objects R8 and R9: they lie within both regions R3 and R4, yet each is stored in only one of R3 or R4.
4.3 Local Index
This section discusses the Local Index (LI) mechanism for mobile query processing, used to retrieve data items from multiple cells. Indexing is an efficient way to retrieve data items across multiple cells: it improves retrieval time by supplying the information a client needs to retrieve remote data items from the current cell. The R-tree index structure is used to store the multi-dimensional data item indexes.
The LI mechanism is a tree indexing mechanism in which the index structure of one cell contains indexes from other cells. In this mechanism, the index structures of other cells are not replicated wholesale; instead, only the requested data item indices are stored in the local cell. In other words, the tree index structure is expanded by adding the new remote data item indices to the local index structure. However, when the maximum number of nodes has been reached, a number of nodes holding remote indexes must be deleted from the tree to make room for the new remote indices: the number of eliminated entries plus the available slots must equal the number of new remote indexes. The insertion or deletion operation is similar to the insertion or deletion of an index in a single cell. For simplicity, the geometric locations of data items are described in two-dimensional coordinates. In the LI mechanism, each cell has its own R-tree index structure for its local data items, and the R-tree structure of each cell differs from the others.
Figure 4.3: Three index structures in three cells
In a simple scenario, the current server sends a query scope to two neighbouring cells to find Automatic Teller Machines (ATMs). Figure 4.3 depicts the initial index tree of each cell, where the ATM location is used as the index partitioning attribute, matching the table partitioning. The range partitioning rules are assumed to be as follows: cell 1 holds index locations 1 to 30, cell 2 holds index locations 31 to 60, and the rest go to cell 3. Each key in a local index corresponds to a local record. Note that although the internal nodes differ among these cells, they follow the same naming convention.
The indexing structure above is developed from the tables presented in Figure 4.4, which consist of the index number, location and name of each object.
Figure 4.4: Tables for cell 1, cell 2 and cell 3 (from left to right)
Upon receiving query results, there are two ways for the current server to store the remote data items in the LI mechanism: the server can store the remote indexes only, or the indexes together with the original data items. The details of both processes are discussed in Sections 4.3.1 and 4.3.2. For simplicity, the LI mechanism with remote indexes only is called LI-1, whereas the LI mechanism with remote indexes and data items is called LI-2.
Algorithm 4.1 shows the details of the Local Index algorithm, which can be explained as follows:
(i) A mobile client sends a query scope which involves query result retrieval from
multiple cells. The server receives the query and probes keys in the local index
structure to determine whether there are any keys located within the query
range.
(ii) If the server can satisfy only part of the query scope, the remaining query scope is sent to the neighbouring cells. Upon receiving the result, the server receives either the indexes only, or both the indexes and data values of the requested data items. The local server stores all incoming indices in available nodes of the local index structure. If the server receives only the indices of data items, pointers are created from those nodes to the data values in the neighbouring cells, linking the cached index to the original data values. In the second situation, pointers are created from the nodes to local storage.
(iii) If the server finds all indices within its index structure, the server retrieves
data values by following the available pointers in those nodes.
(iv) The query result is sent to the mobile user once it is ready.
Algorithm 4.1: The Local Index algorithm
Input: QScope
begin
    Rtree ← indexes of objects stored in an R-tree index
    Query_scope ← QScope
    NoOfBSOnline ← number of online neighbour BSs
    CollectionOfOnlineBS ← list of online BSs
    Result ← search_rtree(Query_scope)
    index ← 0
    while index < NoOfBSOnline do
        Current_Neighbour_BS ← CollectionOfOnlineBS at position index
        if intersection(Query_scope, Current_Neighbour_BS scope) exists then
            Result_neigh ← Current_Neighbour_BS(Query_scope)
            Rtree ← update_Rtree(Result_neigh)
            // Append results retrieved from the current BS
            // to the collection of all retrieved results
            Result ← Result + Result_neigh
        end
        index ← index + 1
    end
end
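The Local Index flow can be rendered as a small Python sketch. The tree and neighbour interfaces here are stand-ins invented for the example (the thesis does not specify them); the point is the control flow: answer locally, forward to intersecting neighbours, and cache the returned remote indices so a repeat query stays local.

```python
class FakeRtree:
    """Stand-in for the cell's R-tree; the query scope is modelled as a key set."""
    def __init__(self, keys):
        self.keys = set(keys)
    def search(self, scope):
        return sorted(self.keys & scope)
    def update(self, remote_keys):
        self.keys |= set(remote_keys)

class FakeNeighbourBS:
    """Stand-in for an online neighbour base station."""
    def __init__(self, keys):
        self.keys = set(keys)
    def area_intersects(self, scope):
        return bool(self.keys & scope)
    def process(self, scope):
        return sorted(self.keys & scope)

def local_index_query(local_tree, query_scope, online_neighbours):
    """Answer locally, forward the scope to intersecting neighbours, and
    cache the remote indices so a future query for the same area stays local."""
    result = local_tree.search(query_scope)
    for bs in online_neighbours:
        if bs.area_intersects(query_scope):
            remote = bs.process(query_scope)
            local_tree.update(remote)   # cache the remote indices locally
            result += remote
    return result

tree = FakeRtree({"k1"})
neighbour = FakeNeighbourBS({"k2", "k3"})
print(local_index_query(tree, {"k1", "k2"}, [neighbour]))  # → ['k1', 'k2']
print(tree.search({"k2"}))  # the cached remote index now answers locally: ['k2']
```

After the first multi-cell query, `k2` lives in the local tree, so the second lookup needs no forwarding at all.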
4.3.1 Cache Remote Indexes Only
This section discusses the proposed LI mechanism when the data and index segments are located in different cells. The aim is to speed up query processing by searching for commonly requested objects locally and by maintaining the cache.
Consider the indexing structures and data tables shown in Figures 4.3 and 4.4 respectively. Assume that the server in cell 2 sends a query scope to cell 1 and receives an item with index 29 from cell 1. The server adds index 29 and creates a pointer to the original data item, which resides in cell 1. Figure 4.5 shows the indexing structure after index 29 is inserted into cell 2. Note that there is a pointer from index 29 to data item 29 in cell 1.
Figure 4.5: Index structure after the records insertion using local index-1
This mechanism has tree management similar to that of a single cell under R-tree management [41]. The procedures for insertion and deletion can be summarised as follows. After a new index has been received by the appropriate cell, the new index is appended to the existing index structure. Algorithm 4.2 shows the insertion algorithm of LI-1, whose steps are as follows.
(i) If the maximum number of entries has been reached, remove an existing index that did not originate from this cell from the tree. The removal procedure is discussed later.
(ii) Find a suitable leaf node in which to insert the new key.
(iii) Insert the new key if there is enough space in the leaf node.
(iv) Otherwise, this leaf node must be split into two leaf nodes and propagate the
splitting up to the root node if needed. This splitting process can be done by
using one of the existing splitting algorithms for the original R-tree.
(v) The last step is to create a data pointer from the entry on the leaf node where
the new index key is inserted to the data item at the remote cell.
Algorithm 4.2: The insertion algorithm of Local Index-1
Input: indexes
begin
    MAX_CAPACITY ← maximum number of nodes this Rtree can store
    Rtree ← indexes of objects in the R-tree
    rtree_capacity ← current capacity of Rtree
    num_nodes_freed ← 0
    size_of_indexes ← get_size(indexes)
    if MAX_CAPACITY - rtree_capacity < size_of_indexes then
        num_nodes_freed ← size_of_indexes - (MAX_CAPACITY - rtree_capacity)
        Rtree ← remove_nodes(num_nodes_freed)
    end
    for each index in indexes do
        Inserted_node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted_node to the original data of index
    end
end
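The capacity arithmetic in the insertion steps above can be sketched as follows. This is an assumed illustration: the class, a FIFO eviction order standing in for "one of the existing cache replacement policies", and the key names are all inventions for the example, and only indexes that did not originate locally are eligible victims.

```python
from collections import OrderedDict

class LocalIndex1:
    """Toy model of LI-1 capacity management (not an actual R-tree)."""
    def __init__(self, max_capacity, local_keys):
        self.max_capacity = max_capacity
        self.local = set(local_keys)                     # keys originating here
        self.entries = OrderedDict((k, None) for k in local_keys)

    def insert_remote(self, keys, origin_cell):
        # free exactly as many cached (non-local) slots as needed:
        # evicted entries + available slots == number of new remote indexes
        needed = len(keys) - (self.max_capacity - len(self.entries))
        victims = [k for k in self.entries if k not in self.local][:max(0, needed)]
        for v in victims:
            del self.entries[v]                          # drop pointer and key
        for k in keys:
            self.entries[k] = origin_cell                # pointer to the origin cell

idx = LocalIndex1(max_capacity=4, local_keys=["L1", "L2"])
idx.insert_remote(["R29"], origin_cell="cell 1")
idx.insert_remote(["R30", "R31"], origin_cell="cell 1")  # evicts R29 to make room
print(list(idx.entries))  # → ['L1', 'L2', 'R30', 'R31']
```

The second insertion needs two slots but only one is free, so exactly one cached remote entry (R29) is evicted, leaving the tree at its capacity of four.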
When the maximum number of entries has been reached, some entries need to be removed from the cache. The process for evicting entries is shown in Algorithm 4.3 and is described as follows:
(i) Select a victim key using one of the existing cache replacement policies.
(ii) Remove the data pointer from the selected key, then discard the key from the cache.
(iii) When the key is removed from a leaf node, the node may underflow. If this occurs, try to find a sibling node that needs the least enlargement and redistribute the entries between the node and its sibling so that both are at least half full; otherwise, merge the node into its sibling and decrease the number of nodes.
Algorithm 4.3: The deletion algorithm of Local Index-1
Input: num_nodes_freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num_nodes_freed do
        index ← select_victim()
        Deleted_node ← find the node that matches index
        remove_pointer(Deleted_node)
        Rtree ← remove_node(Deleted_node)
    end
end
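The underflow handling in step (iii) above can be sketched as follows. The minimum-fill number and the list-based leaves are assumptions made for the example; real R-tree implementations track occupancy per node and pick the sibling needing the least enlargement.

```python
MIN_FILL = 2  # assumed minimum entries per leaf (half of a capacity of 4)

def rebalance(leaf, sibling):
    """After a removal, either borrow entries from the sibling so both
    leaves are at least half full, or merge the leaf into the sibling."""
    if len(leaf) >= MIN_FILL:
        return leaf, sibling                  # no underflow: nothing to do
    if len(leaf) + len(sibling) >= 2 * MIN_FILL:
        while len(leaf) < MIN_FILL:           # borrow until half full
            leaf.append(sibling.pop())
        return leaf, sibling
    return leaf + sibling, None               # merge: node count decreases

print(rebalance(["k1"], ["k2", "k3", "k4"]))  # borrow: both leaves half full
print(rebalance(["k1"], ["k2"]))              # merge: sibling disappears
```

Borrowing keeps both leaves valid; merging removes a node, which may in turn propagate an adjustment up the tree.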
4.3.2 Cache Remote Indexes and Data Items
Caching only the remote indexes increases data transfer to the remote cell, which can lead to a bandwidth bottleneck, even though bandwidth is much greater nowadays. To avoid this bottleneck, one solution is to cache both the commonly requested remote indexes and their data items in the desired cell. This is the focus of our discussion in this section.
The LI process is similar to that described in the previous section, except that this time the actual data item is cached as well. To simplify the discussion, we use the same illustration as in the previous section. There, the data item was not copied into cell 2; instead, a data pointer was created pointing to the data item located in cell 1. Figure 4.6 illustrates caching of the index and its record: the data item from cell 1 is copied to cell 2, and the data pointer points to the local data table instead of the remote one.
Figure 4.6: Index structure after the records insertion using local index-2
The insertion and deletion procedures are summarised as follows. Algorithm 4.4 presents the details of the LI-2 insertion process. It is similar to the LI-1 insertion, differing only in how the pointer to the data item is created, which affects the last two steps:
(i) Store the remote data items in the requester cell.
(ii) Create data pointers from the entries on the leaf nodes, where the new index keys are inserted, to the data items at the requester cell.
Algorithm 4.4: The insertion algorithm of Local Index-2
Input: indexes, data_items
begin
    MAX_CAPACITY ← maximum number of nodes this Rtree can store
    Rtree ← indexes of objects in the R-tree
    rtree_capacity ← current capacity of Rtree
    num_nodes_freed ← 0
    size_of_indexes ← get_size(indexes)
    if MAX_CAPACITY - rtree_capacity < size_of_indexes then
        num_nodes_freed ← size_of_indexes - (MAX_CAPACITY - rtree_capacity)
        Rtree ← remove_nodes(num_nodes_freed)
    end
    for each index in indexes do
        data_storage[index] ← store_data(data_item[index])
        Inserted_node ← insert index into the R-tree and return a reference to the inserted node
        create a pointer from Inserted_node to data_storage[index]
    end
end
Algorithm 4.5 shows the deletion algorithm of LI-2. The deletion process of LI-2 is similar to that of LI-1, except that LI-2 has an additional step to remove the replicated data items. It proceeds as follows:
(i) Find a node to be deleted.
(ii) Remove the data item pointed to by the data pointer.
(iii) Remove the data pointer.
(iv) Remove the node from R-tree and adjust the R-tree if necessary.
Algorithm 4.5: The deletion algorithm of Local Index-2
Input: num_nodes_freed
begin
    Rtree ← indexes of objects in the R-tree
    for i ← 0 to num_nodes_freed do
        index ← select_victim()
        Deleted_node ← find the node that matches index
        remove_dataItem(Deleted_node)
        remove_pointer(Deleted_node)
        Rtree ← remove_node(Deleted_node)
    end
end
4.4 Global Index
While a server requests data items from neighbouring cells on behalf of mobile clients, it performs several activities before sending the query result back to the client. These activities involve waiting for the query result to be received and caching the new data items. The caching process includes inserting the indexes into the local tree index structure and adjusting its nodes after insertion. These processes slow down query processing; however, this limitation can be handled by the Global Indexing (GI) mechanism.
Unlike the LI mechanism, in the GI mechanism the index structure is built while the server in a cell starts up. The GI mechanism also has some degree of replication, and all indices are maintained globally. In other words, each cell holds a different part of the global index, while the overall global index structure is preserved.
In this mechanism, the ownership of each index node must be maintained in order to preserve the global index structure. The ownership rule is that the cell owning a leaf node also owns all nodes on the path from the root to that leaf. Consequently, the root node is replicated to all cells, and internal nodes (other than the root) may be replicated to some cells. Furthermore, if a leaf node has
some keys belonging to different cells, the leaf node is replicated to each cell that owns one of its keys.
As a running example, let us consider three different cells and 100 data items.
Each cell holds 30 indices of point data items. Cell 1 holds data item indices from
1-10. Cell 2 holds data item indices 11-20. The rest of the data item indices go to cell 3.
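The range-partitioned ownership in the running example can be sketched as a small helper. The ranges here are taken from the example above; the function name and its default-argument encoding are illustrative assumptions.

```python
def owner_cell(index, ranges=((1, 10, "cell 1"), (11, 20, "cell 2"))):
    """Return the cell owning a data item index under range partitioning:
    1-10 -> cell 1, 11-20 -> cell 2, everything else -> cell 3."""
    for lo, hi, cell in ranges:
        if lo <= index <= hi:
            return cell
    return "cell 3"  # the rest of the indices

print(owner_cell(5), owner_cell(12), owner_cell(77))  # → cell 1 cell 2 cell 3
```

A leaf whose keys map to different owner cells under this rule is the replicated case discussed next.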
Figure 4.7: Global Index for all cells using GI mechanism.
Figure 4.7 shows a GI (global index) structure partitioned by cell boundaries. The root node is replicated to all three cells, and some nodes are replicated to neighbouring cells. The keys P10-P12 (the fifth leaf node) are copied to cells 1 and 2, because this node holds entries that belong to two cells: key P10 belongs to cell 1, while keys P11 and P12 belong to cell 2. Because some leaf nodes are replicated, some of the internal nodes are replicated while others are not. For example, the non-leaf node R2 is replicated to cells 1 and 2, whereas the non-leaf node R5 is not replicated. Each leaf node has a data pointer that points to a data item located in either the same cell or a different cell.
Similar to the LI mechanism, the degree of replication can cover either the indexes
only, or both the indexes and the data items. The GI mechanism with indexes only is
called GI-1, whereas the GI mechanism with indexes and data items is called GI-2.
4.4.1 Remote Data Items Located at Different Cell
Our discussion here elaborates on the GI mechanism when the data items located
at the remote cell are not replicated to other neighbouring cells. The maintenance of
the global index structure and the query processing are the two main topics.
Algorithm 4.6 maintains the global index structure when the data items themselves
are not replicated. The algorithm covers the insertion and deletion of entries in
neighbouring cells. However, the details of the R-tree splitting procedures
are not discussed here; these can be found in [41].
The algorithm matches a node with a given key and performs the insertion or
deletion. Its details are as follows. The key here is a Minimum Bounding Box (MBB)
value. The algorithm recursively probes tree nodes, starting from the root node down
to a leaf node. The key insertion or deletion is performed once the matching node has
been found. Then, depending on the operation, a data pointer between the entry and
the actual data item is established or removed. If the key is not found in the current
cell, a child tree (cellTree) is passed to a neighbouring cell so that it can probe its
own tree. When a node overflows or underflows after a key has been inserted or
deleted, the existing single-cell R-tree splitting or merging algorithm is applied,
and the starting points of the data pointers are adjusted if an entry is moved to a new node.
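The control flow of this maintenance procedure can be sketched as follows. The sketch is a simplification under stated assumptions: keys are one-dimensional ranges standing in for MBBs, the `Node` class and neighbour hand-off are hypothetical, and R-tree splitting/merging is omitted, as in the text.

```python
# Sketch of the recursive node-maintenance flow. A one-dimensional key range
# stands in for an MBB; split/merge handling is omitted as in the text.

class Node:
    def __init__(self, lo, hi, children=None):
        self.lo, self.hi = lo, hi       # key range covered (stand-in for an MBB)
        self.children = children or []  # empty list => leaf node
        self.keys = []                  # keys stored at a leaf

def node_maintenance(node, key, operation, neighbour_trees=()):
    if not (node.lo <= key <= node.hi):
        # Key outside this cell's tree: hand the request to a neighbouring cell.
        for cell_tree in neighbour_trees:
            if cell_tree.lo <= key <= cell_tree.hi:
                return node_maintenance(cell_tree, key, operation)
        return False
    if not node.children:               # leaf: perform the insert/delete here
        if operation == "insert" and key not in node.keys:
            node.keys.append(key)       # a data pointer would be created here
        elif operation == "delete" and key in node.keys:
            node.keys.remove(key)       # and the data pointer removed here
        return True
    for child in node.children:         # internal node: recurse into the child
        if child.lo <= key <= child.hi:
            return node_maintenance(child, key, operation)
    return False
```

A small usage example: inserting key 15 into a cell covering keys 1-10 forwards the request to a neighbour cell covering 11-20.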
In this mechanism, the data structure for the GI mechanism can be explained
as follows.
If a child node exists locally, the node pointer points to this local copy only, even
though the child node may also be replicated to other cells. For example, from MBB R1 at cell
Algorithm 4.6: Node maintenance of GI-1 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← assign child tree in neighbour cells
        Node Maintenance(cellTree, Key, Operation)
    else
        if Node is a leaf node then
            Execute the insert/delete operation on the local node
            Create/remove a data pointer from the entry to the actual data item
            if Node overflows or underflows then
                Execute split/merge on the leaf node
                Adjust all starting points of data pointers in the leaf node
            end
        else
            childTree ← assign child tree
            Node Maintenance(childTree, Key, Operation)
        end
    end
end
1, there is only a node pointer to the local MBB R2. The MBB R2 at cell 2 will not
accept an incoming node pointer from the MBB R1 at cell 1; it accepts one node
pointer from the local MBB R1 only.
If a child node does not exist locally, the node pointer selects the closest copy of
the child node (in case multiple copies exist elsewhere). For example, from the MBB
R1 at cell 1, there is only one outgoing right node pointer to the child node (R3, R4)
at cell 2. In this case, an assumption is made that cell 2, rather than cell 3, is the
closest neighbour of cell 1. Hence, the MBBs R3 and R4, which also exist at
cell 3, will not accept a node pointer from the root node R1 at cell 1.
Using the single node pointer model discussed above, it is always possible to
trace a node from any copy of its parent node in a different cell. Figure 4.8 shows the
single node pointer model for the GI mechanism, presenting only the top three levels
of the index tree exhibited previously in Figure 4.7. From the figure, it is possible
to trace nodes (R10, R11) from the root node R1 at cell 1, although there is no
direct connection from root node R1 to its direct child node (R3, R4) at cell 3. This
tracing to node (R10, R11) can also be done through node (R3, R4) at cell 2.
Figure 4.8: GI mechanism uses single node pointers.
A single node pointer model can be more formally described as follows.
1. When a parent node is duplicated because its child nodes are spread over mul-
tiple places, there is always a direct connection from every copy of this
parent node to each of its child nodes.

2. Applying the same reasoning one level up, given a replicated grand-
parent node, there is always a direct connection from every copy of this
grandparent node to any copy of the parent node.

Considering both the above statements, we can conclude that there is always a
path from any copy of the grandparent node to any of its child
nodes.
Apart from the issues of node pointers at internal nodes, those at leaf nodes are
also worth discussing. As some leaf nodes are replicated, it is important to manage the
data pointers at leaf nodes. Figure 4.9 shows a data structure where the data items
are not replicated anywhere. In this figure, not all data pointers are shown, in
order to improve readability. As shown in the figure, the leaf node that
contains indexes P10-P12 is replicated at cells 1 and 2. By applying the single node
pointer mechanism, each data item accepts two data pointers. For example, the
record for entry P10 accepts two incoming data pointers, one from cell 1 and one
from cell 2. Similarly, the records for entries P11 and P12 each receive two incoming
data pointers from cells 1 and 2.
This mechanism has a similar concept to LI-1; that is, the leaf node is replicated
and the record it points to is not. The main difference between GI-1 and LI-1
is that the GI schemes maintain one global index, whereas the LI schemes use a
local index per cell.
Figure 4.9: Global Index without replicated remote data items.
4.4.2 Remote Indexes and Data Items Located at Same Cell
In this mechanism, the data items are replicated to any cell to which the entries at
the leaf node level are replicated. In other words, GI-2 follows the same idea as GI-1
for non-leaf node replication. The two approaches differ in the way they establish
data pointers at the leaf node level: GI-2 has an extra step to replicate the
remote data items.
In this mechanism, the data structure for the internal nodes is similar to GI-1,
except that a data pointer at a leaf node points to the record located in the same cell.
This data pointer can be explained as follows:
If a leaf node exists locally only, its data item is not replicated, and it is linked with
the entry in the leaf node by a data pointer. Figure 4.10 illustrates the GI mechanism
with replicated data items. In this figure, not all data items and data pointers are
shown, for clarity. For example, the entries P4 and P5 at cell 1 each have a single
data pointer to their data items, and these data items are not replicated to cells 2
and 3.
Figure 4.10: GI mechanism where data items are replicated.
If a leaf node is replicated, the data items belonging to the entries in that leaf
node are replicated to the cells where the leaf node is duplicated. Once the data items
have been replicated, data pointers are created to link the entries in the leaf node to
the appropriate data items at each cell. For example, leaf node (P10, P11, P12)
is replicated at cells 1 and 2, and the data items of those replicated entries are
duplicated accordingly. For instance, the record for entry P10 is replicated from cell 1
to cell 2. The data pointer for the entry P10 at cell 1 points to record P10 at cell 1 (dotted
line), whereas the data pointer for the same entry at cell 2 points to record P10 at
cell 2 (solid line). Similarly, the record for entry P11 is replicated from cell 2 to cell 1.
One data pointer is established between the entry and the record for P11 at cell 1,
and another is created to link the entry and the record for P11 at cell 2.
Algorithm 4.7 maintains index nodes in the global index when the remote
data items are replicated. The GI model with replicated data items can be described
more formally as follows:
1. If a leaf node is replicated to another cell, there is always a copy of the data item
for each entry in the leaf node. In addition, a direct connection from an entry
to a data item in the same cell always exists.

2. When a leaf node is not replicated to another cell, the original data items for
its entries are always present, and there is a single direct connection
from each entry of the leaf node to its data item.

3. The number of direct connections between a leaf node and data items is always
equal to the number of entries in that node.
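The third property can be checked as a simple invariant; the leaf and pointer representations below are hypothetical stand-ins.

```python
# Sketch: in GI-2, every copy of a leaf links each of its entries to a data
# item in the same cell, so data pointers per leaf copy == entries per leaf.

def pointers_consistent(leaf_entries, data_pointers):
    """Each entry must have exactly one local data pointer."""
    return (len(data_pointers) == len(leaf_entries)
            and all(e in data_pointers for e in leaf_entries))

# Leaf (P10, P11, P12) replicated at cells 1 and 2: each copy carries its
# own local pointers (values name the hypothetical local record).
leaf = ["P10", "P11", "P12"]
cell1_pointers = {"P10": "rec@cell1", "P11": "rec@cell1", "P12": "rec@cell1"}
cell2_pointers = {"P10": "rec@cell2", "P11": "rec@cell2", "P12": "rec@cell2"}

print(pointers_consistent(leaf, cell1_pointers))  # True
print(pointers_consistent(leaf, cell2_pointers))  # True
```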
4.5 Case Study
This section describes case studies using the proposed approaches, one for each
proposed indexing mechanism. To simplify the explanation, we reuse the indexing
structure in Figure 4.3 on page 130. A Local Index case study is presented first,
followed by a Global Index one.
Let us suppose that the proposed Local Index mechanism is applied in the first
case study. Assume that a mobile user requests data items from cell 2, and the query
Algorithm 4.7: Node Maintenance of GI-2 algorithm
Input: Tree, Key, Operation
begin
    Node ← the root node of Tree
    if Key ∉ Node then
        cellTree ← assign child tree in neighbour cells
        Node Maintenance(cellTree, Key, Operation)
    else
        if Node is a leaf node then
            Execute the insert/delete operation on the local node
            Replicate the remote data item
            Create/remove a data pointer from the entry to the actual data item
            if Node overflows or underflows then
                Execute split/merge on the leaf node
                Adjust all starting points of data pointers in the leaf node
            end
        else
            childTree ← assign child tree
            Node Maintenance(childTree, Key, Operation)
        end
    end
end
asks for data items in cells 2 and 3. Cell 2 processes the request by probing its index
structure and sends the remaining query scope to cell 3. Assume that cell 2 receives
index 61 from cell 3. Cell 2 then adds the index 61 to its index structure and creates
a pointer from node 61 to the original data items in cell 3.
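The steps above can be sketched as follows; the index values mirror the case study, while the dictionary representation and the `insert_remote_index` helper are hypothetical.

```python
# Sketch of the LI case study: cell 2 receives remote index 61 from cell 3,
# adds it to its local index, and records a pointer back to the original data.

local_index = {42: "local-data", 55: "local-data"}  # hypothetical existing entries

def insert_remote_index(index, key, source_cell, replicate_data=False, data=None):
    if replicate_data:
        # LI-2 style: the data item is copied locally; the pointer stays local.
        index[key] = ("local-copy", data)
    else:
        # LI-1 style: the pointer crosses the cell boundary to the original item.
        index[key] = ("remote", source_cell)

insert_remote_index(local_index, 61, source_cell=3)
print(local_index[61])  # ('remote', 3)
```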
Figure 4.11: Indexing structure at cell 2 after the remote index insertion
Figure 4.11 shows the situation after index 61 has been inserted at cell 2. The
pointer from index node 61 points directly to the actual data items in cell 3 if
the data item is not replicated. On the other hand, if there is a degree of replication,
the data item of index 61 is copied to cell 2 and the pointer links index
node 61 to the replicated data item.
In contrast, when the conventional approach is used, the tree traversal
starts from the root node of both tree index structures. This slows down
query processing even when the requested remote indexes are available in local storage.
In the second case study, the proposed Global Index mechanism is used to process
a multi-cell mobile query. To simplify the scenario, the indexing
structure shown in Figure 4.7 on page 139 is used, and it is assumed to have been initialised.
Figure 4.12: Global Index mechanisms case study
The scenario for this case study is similar to the previous one; however, the
Global Index is used and the database contains different records. Assume that
index items 21 and 22 are the requested data items. When these data items arrive
at cell 2, they can be stored in cell 2's local storage. If they are, a pointer is created
from each of the index nodes 21 and 22 to the actual data in the local
cell (which is cell 2).
4.6 Discussion
This chapter presents two proposed mechanisms that use the existing indexing
structure to process multi-cell queries. The mechanisms are designed to minimise
retrieval time, reduce data transfer and optimise query access time when retrieving
the desired data items from multiple cells.
Two indexing mechanisms have been introduced, namely: Local Indexing (LI)
and Global Indexing (GI). These two indexing mechanisms are based on the R-tree
structure and are designed to improve the performance of multi-cell query processing.
The complexity of the LI mechanism stems from the fact that either the data pointer
crosses the cell boundary or the data items are replicated. In the first case, the data
pointer to the appropriate data item can cross several cell boundaries. In the second
case, the data items are duplicated wherever the node is duplicated. The two LI
variants perform slightly differently: LI-2 has lower access time than LI-1, because
LI-2 retrieves the data item locally.
The GI and LI mechanisms have different index structures. In the LI, each
cell has its own index structure, whereas the GI has a global index structure of which
each cell holds a different part. In other words, the GI keeps the index structure
global across all cells, while the LI keeps it localised at each cell. In principle, all
indexes from every cell could be stored in the index structure at each cell; however,
this is prevented by limiting the number of nodes in the tree index structure at each cell.
The main difference between the GI and LI mechanisms relates to index
restructuring involving single or multiple cells. This difference can be further
specified as follows:
• Unlike the LI mechanism, where only single cells are involved in maintaining
the index structure, the GI mechanism demands the involvement of multiple
cells.
• The indexing structure is constructed while a BS is starting up in the GI
mechanism, and is expanded by inserting indexes from other cells. The LI
mechanism expands its index structure when new query results arrive.
• The GI mechanism has no limit on the growth of its global index
structure. The mechanism shrinks its index structure when a data item is
deleted from the cell or another BS goes offline. In the LI mechanism, there is
a limit on the growth of the index structure; thus, the index structure often
shrinks because of this limitation.
Based on the characteristics of both indexing mechanisms, the LI mechanism
is beneficial when the remote indexes in the local index structure are frequently
requested, as this minimises the maintenance of the index structure; the GI
mechanism is more efficient when queries often request different areas, owing to
its large collection of indexes.
In terms of the number of cells to be cached, the LI mechanism caches the indexes
of remote objects from the surrounding neighbouring cells, whereas the GI mechanism
can hold the indexes of remote objects from all online BSs. However, if any BS goes
offline, the performance of GI degrades because of the required index maintenance.
4.7 Conclusion
This chapter has presented two proposed indexing mechanisms for the server side.
These proposed approaches are called the Local Indexing (LI) and Global Indexing (GI)
mechanisms. The aim of the proposed approaches is to speed up the query processing
and to reduce the amount of data transfer.
The LI mechanism retrieves the indexes from other cells and stores them in the
cell where the mobile user issues the query. The remote indexes retrieved are the
indexes of the data items in the query result. In other words, the index structure in
the current cell is expanded and maintained locally by this mechanism.
The GI mechanism is similar to LI; however, the two differ in how the indexing
structure is maintained. With the GI mechanism, the overall index structure is
kept, and parts of the global index are distributed across all cells.
In addition, each mechanism has two different ways of accessing the data item
behind each remote index in the local index: remotely or locally. In the former, the
data item is not replicated anywhere; hence, a data pointer is created from an
index entry in a leaf node to its remote data item. In the latter, the data items are
replicated from the original cell to wherever the indexes are replicated.
Chapter 5
Client Caching for a Mobile
Environment
This chapter presents our proposed approaches for eliminating cached objects at the
client. In our approaches, cached objects are ordered and put together into groups,
and a group of cached objects is then eliminated based on either the distance, the
density or the value of a cost formula evaluated over all groups. The process keeps
evicting groups one by one until all new incoming objects can be cached. The aim
of our proposed approaches is to boost the cache hit rate.
5.1 Introduction
A mobile computing environment has limited resources, such as narrow band-
width, small storage space, limited battery power and frequent disconnections. These
limitations are inconvenient for the processing of location-dependent queries. In addition,
user mobility adds another complication to location-dependent query pro-
cessing.
CHAPTER 5. CLIENT CACHING FOR A MOBILE ENVIRONMENT 154
Client caching is a traditional way to store data from servers at the client
side. This mechanism has been applied in many ways in wired networks, for example
in distributed systems, the World Wide Web and database systems. Recently, client
caching has been adapted to handle the limitations imposed by a mobile environment.
Several approaches to maintaining objects in a client cache have been proposed,
although some problems remain outstanding.
In this chapter, we propose three approaches for client cache management in
a mobile environment. Our proposed approaches sort cached objects into groups,
in a manner similar to [95]. Density and distance are the factors considered when
grouping cached objects. In addition, our replacement policy removes a group of
objects instead of a single object.
Our proposed approaches are called the (i) Path-based, (ii) Density-based and (iii)
Probability Density Area Inverse Distance (PDAID) replacement policies. The Path-
based replacement policy is similar to Furthest Away Replacement (FAR) [94], which
eliminates the objects furthest from the mobile client's location. The Path-based re-
placement policy eliminates a group of objects by considering the distances of all
groups, measured to the location predicted to follow the one at which the user
receives the query result. The Density-based replacement policy, on the other hand,
evicts cached objects based on the density of a group: a group with lower density has
priority to be dropped from the cache. The last replacement policy is PDAID, which
eliminates a group of objects based on a cost formed from three factors, namely
access probability, density and data distance. The group most suitable for eviction
has a low access probability, low density and a large data distance; hence, the group
with the smallest cost value is the first to be evicted.
Our aims in the proposed approach are to reduce the transfer cost and to maintain
user satisfaction. Given the limited bandwidth and frequent disconnections, the
transfer cost can be reduced by retrieving the query result from the cache and/or
by retrieving only the necessary data items from the server. Retrieving only the
necessary data items from the server can be achieved by asking how many records
are enough to satisfy the user. In other words, we concentrate on the process of
storing the incoming objects into the cache, and on rearranging the existing cached
objects when there is no space available in the cache for the new incoming objects.
Figure 5.1: Chapter 5 framework
Figure 5.1 shows a framework of this chapter. The rest of this chapter is organised
as follows. An overview of the general client caching process is given in Section
5.2. Section 5.3 elaborates on the details of our proposed approaches: the Density-,
Distance- and Formula-based cache replacement policies.
Case studies and discussion sections are given in Sections 5.4 and 5.5 respectively.
The last section concludes this chapter.
5.2 Client Caching Overview
This section gives an overview of the general client caching mechanism in a mobile
environment. The discussion starts with the general process, then covers the retrieval,
grouping and elimination procedures. The aim of this section is to provide a foundation for our
proposed approach.
When a client sends a query to a server, the answer to this query is verified
against its local cache. If the answer is found in the cache, then it is returned to
the user directly. Otherwise, the query is sent to the server. A set of results is
generated by the server. Upon receiving incoming query results, the available space
in the cache is verified to indicate whether an elimination process needs to be done.
If the available space is enough to store all incoming results, then the incoming query
result is stored directly without eliminating cached objects. Otherwise, the classic
way is to discard some cached objects to free some slots in the cache. Alternatively,
we store only as many of these incoming objects in the cache as there are available
cache slots. However, we do not consider the last option that stores a partial of
query result into the cache, as a discussion point in this thesis.
For a location-dependent query, the attributes of a query are the current details
of the user, the range of the query and the minimum number of wanted objects. The
first attribute includes the speed, direction and location at the current time. The
query range defines how far the user wants to search. The minimum number of
wanted objects means that if at least this number of required objects has been found
in the cache, these objects are returned to the user. In other words, sending the
query to ask for more objects can be avoided if the user is satisfied with the current
answer. If the user needs a full answer, the client must ask the server for it. The
minimum number of wanted objects attribute is usually ignored in the general case.
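The role of the minimum number of wanted objects can be sketched as follows; the set-based model of the query scope and the `answer_query` helper are illustrative assumptions.

```python
# Sketch: the cache answers a query itself when at least K matching objects
# are found; otherwise the query must go to the server. The query scope is
# modelled as a simple set of object identifiers.

def answer_query(cache_objects, query_scope, k):
    """Return at least k cached matches, or None if the server is needed."""
    matches = sorted(obj for obj in cache_objects if obj in query_scope)
    if len(matches) >= k:
        return matches        # the user accepts a partial answer of size >= K
    return None               # too few objects: send the query to the server

cache = {"a", "b", "c"}
scope = {"a", "b", "d"}
print(answer_query(cache, scope, k=2))  # ['a', 'b']
print(answer_query(cache, scope, k=3))  # None
```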
Figure 5.2: Section 5.2 framework
Figure 5.2 shows the framework for this section. Section 5.2.1 presents a global
framework and briefly explains the client caching process. Section 5.2.2 gives an
overview of how query results are stored in a client cache. Predicting the next
movement location is discussed in Section 5.2.3. Overviews of cached object retrieval
and of updating the query history list are presented in Sections 5.2.4 and 5.2.5
respectively. Object grouping and cached object replacement are discussed in Sections
5.2.6 and 5.2.7 respectively.
5.2.1 Global Process
This section discusses about a whole process for client caching. The intention is to
present the big picture of the client caching process. The discussion begins with
receiving a query through to replacing cached objects.
When a client issues a query, the query is first sent to the cache on the client's mobile
device. The query contains a query scope, K, and the current position, direction and speed.
K is the number of objects the user expects to receive. In other words, a
user is satisfied if at least K objects of the query result are received, instead
of the full set of query results. If the user is unsatisfied with the objects in the query
result, the user needs to send the same query again. Normally, the user expects to
receive all results, and the value of K is then ignored in the query.
Once the cache receives these parameters, the query processing begins. The first
step is to predict the next location, at which the user will receive the query result;
this prediction process is discussed in Section 5.2.3. Once the cache knows
the location for which to retrieve the query result, it probes its collection to
match objects against the query scope. If the number of matching objects is greater
than or equal to K, the cache returns those objects immediately. On the other hand,
if the number of matching objects is less than K, or no match is found, the query is
sent to the cell in which the mobile client currently resides. Upon receiving
the query result, the available cache space is checked to ensure that the cache has
enough space to store the new incoming objects in the query result. If the cache
space is not sufficient, cached objects are removed (see Section 5.2.7). After that,
the new incoming objects are put into the cache and all cached objects are regrouped,
which results in new groups appearing or the membership of existing groups
changing; grouping is discussed in Section 5.2.6. Finally, the results are sent to the
client and the process finishes.
5.2.2 Storing Query Results to Cache
An algorithm for storing the incoming query results from a server into a client
cache is discussed in this section. Before the incoming query results are stored,
this algorithm does two pre-processes, namely filtering object duplication and total
objects verification.
Object duplication can occur if the server has no knowledge of the client cache
status. In other words, this situation arises when a client sends a complete query
scope without attaching any information about its available cached objects. There
are two ways to prevent object duplication, as follows:
• Client side filtering
When a complete query scope is sent to the server, the server sends a complete
result set to the client. The complete result set, which can contain objects
that already exist in the client cache, is sent because the server has no
information at all about the client cache. Hence, the client needs to filter out
the objects that already exist in the cache before the complete result set is
put into the cache.
• Server side filtering
This approach is more efficient than the first in terms of the query scope that is
sent. The cached objects are loaded from the cache and the area of the query scope
is examined. As a result of this examination, only the parts of the query scope that
do not cover any cached objects are sent to the server. Hence, the server
produces a query result that contains none of the objects already in the cache.

Alternatively, the filtering at the server side can be accomplished by
attaching information about all cached objects to the query. When processing the
query, the server matches its objects against the query scope and the attached
cached-object information. Objects that already appear in the attached infor-
mation are excluded from the query result.
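The second variant of server-side filtering can be sketched as follows; the object identifiers and the `server_answer` helper are illustrative assumptions.

```python
# Sketch: server-side filtering when the client attaches its cached-object
# information to the query. Objects already held by the client are excluded.

def server_answer(server_objects, query_scope, client_cached_ids=frozenset()):
    """Return matching objects, excluding those the client already caches."""
    return [obj for obj in sorted(server_objects)
            if obj in query_scope and obj not in client_cached_ids]

objects = {"o1", "o2", "o3", "o4"}
scope = {"o1", "o2", "o3"}
print(server_answer(objects, scope, client_cached_ids={"o2"}))  # ['o1', 'o3']
```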
After the filtering phase has been completed, the number of remaining objects in the
incoming query results is validated. The aim of this validation is to accept only those
query results whose number of objects is less than or equal to the client cache
space. If the cache has too few slots to accommodate the complete query result,
the query result is not stored; otherwise, it is saved in the cache.
Once the validation has been done, two situations can arise regarding the available
cache space while the query result is being stored. In the first situation, the number
of free slots in the cache is enough to store the whole query result, and the objects
can be stored in the cache directly. The location at which the mobile user receives
the query result is also stored along with the query result; storing this receiving
location avoids eliminating the wrong cached items or cache groups. In other words,
a query history list is created so that the next location can be predicted from past
locations.

In contrast, when the number of free slots is not enough to store all incoming objects,
some cached objects, together with the associated locations of the mobile user, are
removed from the cache. The incoming objects are then stored and the cached objects
are regrouped, which results in a new group being formed or new cached objects
being added to the existing groups.
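The eviction-until-fit behaviour described above can be sketched as follows; `evict_group` is a hypothetical stand-in for the group replacement policies proposed later in this chapter, and the incoming objects are assumed to be new (already filtered for duplicates).

```python
# Sketch of the storing step: incoming query results are stored only after
# enough whole groups have been evicted to make room for all of them.

def store_results(cache, capacity, incoming, evict_group):
    """Store `incoming` into `cache`, evicting groups until it fits."""
    if len(incoming) > capacity:
        return False                      # result cannot fit even in an empty cache
    while len(cache) + len(incoming) > capacity:
        for victim in evict_group(cache): # drop one whole group of cached objects
            cache.discard(victim)
    cache.update(incoming)
    return True

cache = {"p1", "p2", "p3", "p4"}
store_results(cache, capacity=5, incoming={"q1", "q2"},
              evict_group=lambda c: sorted(c)[:2])  # toy policy: first two objects
print(sorted(cache))  # ['p3', 'p4', 'q1', 'q2']
```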
5.2.3 Predicting Next Movement
There are three ways to predict the next location of the mobile user: the movement
patterns, prediction and both.
In the first way, all user movements are stored and the next movement is pre-
dicted from those movements. In a mobile environment, the user broadcasts its
location every time interval t. Remembering these locations may improve the
accuracy of cache item elimination if the user follows the same pattern; however,
keeping these locations may consume a lot of space.
In the second way, the next location is calculated from the current position,
direction and speed. These three factors alone are not enough to predict the next
location without any knowledge of when the query result is going to be received.
Once the processing time is known, the prediction of the next location
can be done. This way is simpler and easier than the previous one, but the same
accuracy is not guaranteed, because of unpredictable traffic conditions or changes
of direction.

The last way to predict the next location is a hybrid of the first and second
approaches. In this approach, the location at which the user receives the query result
is stored, instead of a series of user movements, and is used in predicting the next
location. If the location has not been seen before, the prediction formula is used to
calculate the next location.
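The calculation-based prediction can be sketched as simple vector arithmetic, assuming a flat plane and a direction given in degrees; the function name and units are hypothetical.

```python
# Sketch: predicting the next location from the current position, direction
# and speed, once the expected processing time t is known.
import math

def predict_location(x, y, direction_deg, speed, t):
    """Move (x, y) a distance speed * t along direction_deg (degrees)."""
    rad = math.radians(direction_deg)
    return (x + speed * t * math.cos(rad), y + speed * t * math.sin(rad))

# A user at the origin heading "north" (90 degrees) at speed 10, with a
# 2-unit processing delay, is predicted to be roughly at (0, 20).
print(predict_location(0.0, 0.0, 90.0, 10.0, 2.0))
```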
5.2.4 Retrieving Cached Objects
A cached object retrieval process is presented in this section. The aim is to give a
global overview of how cached objects are loaded from the cache.

The starting point of the process is accepting three parameters as input from
the user, namely the query scope, the number of expected objects and the next
predicted position. These three parameters are used to determine which groups of
objects intersect with the query scope.
The next step, called the cached object verification process, verifies whether the
cached objects are located within part of the query scope. In this process, each group
of cached objects is checked to determine whether it overlaps with the query scope.
If the group overlaps, each object within the group is matched against the query
scope. When cached objects of the group reside in the query scope, the information
about those cached objects is loaded and put into a result collection; in addition, a
counter that keeps track of how many cached objects have been found is incremented.
After all objects of the group have been processed, the next overlapping group is
inspected using the same process. The verification process ends when all groups have
been verified and the information about the valid
objects has been placed in the result collection. The result collection is then returned to
the user.
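The verification process above can be sketched as follows; modelling groups as lists and the query scope as a set of object identifiers is an illustrative simplification.

```python
# Sketch of the cached-object verification process: only groups overlapping
# the query scope are inspected, and every found object is counted.

def retrieve(groups, query_scope):
    """groups: list of lists of objects; returns (result, count)."""
    result, count = [], 0
    for group in groups:
        if not any(obj in query_scope for obj in group):
            continue                      # group does not overlap the scope
        for obj in group:                 # inspect each object in the group
            if obj in query_scope:
                result.append(obj)
                count += 1
    return result, count

groups = [["a", "b"], ["c"], ["d", "e"]]
print(retrieve(groups, {"a", "d", "x"}))  # (['a', 'd'], 2)
```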
5.2.5 Updating Query History List
This section presents an algorithm for updating a query history list. This algorithm
is used to keep track of the locations that have been visited in the past in order to
predict the future location.
In general, when a user receives a location-dependent query result, the location
of the user is added to the query history list. If the user has visited the same
location in the past, the timestamp of the existing record is instead updated with
the current time value.
The process is performed by inspecting the query history list. If there is an
existing entry, the timestamp of that entry is updated with the current timestamp.
Otherwise, a new entry is created and inserted into the list; the entry contains
the current location, the query scope and the current timestamp.
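The update step can be sketched as follows; a minimal illustration assuming the history is a dictionary keyed by location (the field names are hypothetical):

```python
import time

def update_history(history, location, query_scope, now=None):
    """Refresh the timestamp for a revisited location, or insert a
    new entry holding the location's query scope and timestamp."""
    now = time.time() if now is None else now
    if location in history:
        # Revisited location: only the timestamp is refreshed.
        history[location]["timestamp"] = now
    else:
        history[location] = {"scope": query_scope, "timestamp": now}
    return history
```

Revisiting a location refreshes its timestamp without duplicating the entry, which keeps the history list compact.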
5.2.6 Objects Grouping
Grouping or clustering is a mechanism to divide data into groups whose members are
semantically related or adjacent, so that they can be kept together [95]. An overview
of the grouping process is the main topic of this section. As we have mentioned in
Section 2, there are several existing algorithms for grouping data items [42, 34, 116].
From the many existing clustering algorithms, we adapt one to help our proposed
algorithm group the cached objects: the DBScan (Density-Based Spatial Clustering
of Applications with Noise) algorithm [34]. It has been chosen because it groups
objects based on locality and density. Density-based grouping places together a
minimum number of objects located within a certain distance of one another. The
benefit of our proposed approach is that it groups cached objects based on distance
and a minimum number of objects; the grouping is also used for our cache replacement.
This section gives an overview of the DBScan mechanism, which is used in our
approach. A brief explanation of this mechanism is given, followed by an example
of how this mechanism works.
In the DBScan scheme, a group or cluster has a centre point, a distance threshold
and a minimum number of points required within that distance. The distance
threshold is denoted Eps. The Eps-neighbourhood of a point p is the set of objects
located within distance Eps of p. The minimum count of points required within an
Eps-neighbourhood of p is known as MinPts.
The objects in a group can be differentiated into two types: core objects and
noise (or border) objects. A core object has at least the minimum number of objects
within a radius Eps, whereas noise or border objects lie far away, on the boundary
of a group. In other words, a core object is defined as a point q whose
Eps-neighbourhood contains not less than MinPts points. If p has an
Eps-neighbourhood of less than MinPts points, it is considered a border object.
The process begins by finding a core object, which is used as the centre point of a
group. The process then looks for objects which have not yet been assigned to any
group or which are connected to another object in the group. If this criterion is
satisfied, the object is assigned to the group, the group is expanded and the counter
is incremented by one. In addition, two clusters are merged if the minimum distance
between any point in one cluster and any point in the other is less than Eps.
In the context of our case, the cached objects are regrouped whenever new objects
arrive and/or cached objects are eliminated. Thus the groups in the cache stay
up-to-date whenever the cache membership changes. The regrouping process is slower
when the cache size is bigger.
Let us consider an example using DBScan. If the minimum number of points is 2, what
clusters would DBScan discover from the following 8 points: A1=(2,10), A2=(2,5),
A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9)? Figure 5.3 shows those
points in two-dimensional coordinates.
Figure 5.3: An illustration of the DBScan Algorithm
The size of epsilon affects the clusters that are formed. When we set the value of
epsilon to 2, two clusters are formed. The first cluster, C1, contains 2 points: A4
and A8; the second cluster, C2, consists of 3 points: A3, A5 and A6. However, when
the value of epsilon is set to 3.5, three clusters are formed instead of two. The
new cluster, C3, contains A2 and A7; C1 gains an extra member, A1; and the members
of C2 remain the same.
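The example above can be reproduced with a minimal DBScan implementation; this is an illustrative sketch rather than the original code, and, following the definition given earlier, a point's Eps-neighbourhood includes the point itself:

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBScan: grow clusters from core points, i.e. points
    whose Eps-neighbourhood (including themselves) holds >= min_pts
    points. Points never assigned to a cluster remain noise."""
    labels = {}          # point name -> cluster id
    cluster_id = 0
    names = list(points)

    def neighbours(n):
        return [m for m in names if dist(points[n], points[m]) <= eps]

    for n in names:
        if n in labels or len(neighbours(n)) < min_pts:
            continue     # already clustered, or not a core point
        cluster_id += 1
        stack = [n]
        while stack:     # expand the cluster from core points
            q = stack.pop()
            if q in labels:
                continue
            labels[q] = cluster_id
            nb = neighbours(q)
            if len(nb) >= min_pts:   # only core points spread the cluster
                stack.extend(m for m in nb if m not in labels)
    return labels

pts = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
       "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
```

With Eps = 2 and MinPts = 2 this yields the clusters {A4, A8} and {A3, A5, A6}, leaving A1, A2 and A7 unclustered; with Eps = 3.5 it yields the three clusters described above.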
5.2.7 Cached Objects Elimination
In general, cached objects are evicted from the cache when the cache space is not
enough to contain new objects. Evicting cached objects is easy to do; however,
choosing the right victims requires more attention in order to increase the benefit
of using a cache at the client side. In a mobile environment, cache management is
important to overcome the limitations of this environment. Existing cache
elimination policies use criteria based on distance, density, timestamps, individual
object size or other attributes.
An overview of cached objects elimination is given in this section. Cache object
elimination is a process to evict cached objects based on certain criteria when there
is no available space in the cache. The aim of this section is to present the foundation
of our proposed approach. The details of our proposed approach are discussed in
Section 5.3.
Three cached-object elimination approaches are discussed as follows:
• Path-based
This approach eliminates a group of objects by considering the distances from
the centre points of all groups to two of the user's locations: the receiving
location and the next predicted location after the receiving location. In other
words, two distances are measured from the centre point of each group; one to
the receiving location and the other to the next predicted location. The group
that is furthest from the next predicted location but nearest to the receiving
location is the victim to be evicted.
• Density-based
Density is another factor considered for cache elimination. Density refers
to the number of objects in a group relative to its area. A group that has fewer
objects and a larger area is a target for elimination. In other words, the less
dense a group is, the higher its priority for elimination.
• Cost-based (PDAID)
Cost is the key factor when removing a group of cached objects under this cache
replacement policy. The cost value is determined using a formula based on several
factors: access time, density and distance.
Before providing an overview of our proposed approach, some definitions of the
terms used are presented, followed by an existing approach called PAID. This
approach is explained here because our approach is similar to it; the two differ
in the formula used to calculate the cost.
In the PAID approach, a formula has been developed which depends on
the three factors mentioned above. The formula is shown below:
C = (P * A) / D
Where C is the cost of a data value,
P is the access probability,
A is the valid scope area, and
D is the data distance.
Some terms used are data distance, valid scope area, and access probability.
Data distance refers to the distance between the receiving location of a mobile
client and the valid scope of a data value. Valid scope area refers to the
geometric area of the valid scope of a data value. Access probability is measured
using the well-known exponential aging method.
The formula for calculating access probability value is :
P = α / (tc − ti) + (1 − α) * P
Where:
tc is the current system time,
α is a constant factor weighting the importance of the most
recent access in the probability estimate,
ti is the last access time, whose initialised value is zero (0), and
the second P is the previously calculated P.
The elimination process is started by calculating a cost for each valid scope.
Once each valid scope has a cost, the valid scope which has the smallest value
is removed from the cache first.
Our proposed approach is similar to the PAID approach in terms of elimination
factors: both use time and distance. However, our approach considers the density
of a group rather than its area. In other words, the PAID approach removes the
group with the longest access time, the furthest distance and the smallest area,
whereas our PDAID approach removes the group that is least dense, has the longest
access time and is furthest from the next predicted location after the receiving
location.
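The PAID cost computation and its smallest-cost-first eviction can be sketched as follows (the numeric values in the usage example are hypothetical):

```python
def paid_cost(p, a, d):
    """PAID cost of a data value: C = (P * A) / D, i.e. access
    probability times valid scope area, divided by data distance."""
    return p * a / d

def paid_victim(scopes):
    """Evict the valid scope with the smallest cost first.
    scopes: {name: (P, A, D)}"""
    return min(scopes, key=lambda s: paid_cost(*scopes[s]))
```

For example, with scopes {"s1": (0.5, 4.0, 2.0), "s2": (0.2, 3.0, 6.0)} the costs are 1.0 and 0.1, so "s2" is evicted first.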
5.3 Proposed Approach
This section discusses our proposed approaches to the cache replacement policy. We
propose three cache replacement policies: (i) Path, (ii) Density and (iii)
Probability Density Area Inverse Distance (PDAID). The first two cache replacement
policies are straightforward; they eliminate based on distance/path and density
respectively. The last policy is based on the cost of multiple attributes.
As mentioned in Section 5.2.6, the DBScan algorithm is adapted in our proposed
approach. The aim of adapting the DBScan algorithm is to eliminate a group of
objects using one of the proposed cache replacement policies mentioned earlier.
Section 5.3.1 explains the proposed path-based cache replacement policy. The
proposed density-based approach is discussed in Section 5.3.2. Section 5.3.3
presents the proposed cost-based approach.
5.3.1 Path Based Elimination Algorithm
This section explains our proposed algorithm to eliminate groups of objects when
some slots are needed for new incoming objects. The proposed algorithm eliminates
a group of objects to free a number of occupied slots until the number of available
slots is enough to store incoming objects.
Figure 5.4: Simple illustration of our elimination approach
Before we start explaining our proposed approach, consider Figure 5.4. The figure
shows the user moving from position G5 to a receiving location (a location where
the user receives a query result). At the receiving position, query results arrive
from the server and the cache slots are not enough to store them. Some of the
cached items will therefore be evicted from the cache. The cached-item elimination
algorithm needs to be smart enough to maintain the cache hit rate.
Eliminating cached objects by considering the next location after the receiving
location is one of many smart ways to keep a high hit rate. As shown in the figure,
the next predicted location after the receiving location is G1. For each of G0 and
G2 there are two paths: one from the group to the next location, and the other from
the group to the current location. The aim of these two lines is to select which
cached-object group to evict by measuring the two distances mentioned above. A
group that is furthest from the centre point of the next group (the next position)
has a higher priority to be eliminated first.
Algorithm 5.1 shows our path-based elimination algorithm. One of its input
parameters keeps track of the number of slots that have been made available. The
elimination process evicts groups of cached objects one by one until the number of
required slots has been satisfied.
In the eviction process, a group is selected by comparing two paths: the path from
the centroid of the group to the centroid of the group at the next predicted
location, and the path from the centroid of the group to the receiving location.
The group becomes a victim if it is located near the receiving location and
furthest from the next predicted location.
Figure 5.5 illustrates a more complex case. In this illustration, the dotted line
(black) denotes the distance from the centroid of each cached group to the current
position of the user. The dot-dash line (grey) denotes the distance from the
centroid of each group to the next position of the user. The current and subsequent
positions refer to the location at which the user receives a query result and the
next predicted location respectively.
Algorithm 5.1: The proposed path-based elimination algorithm
Input: list of groups, sender location, recipient location, number of required slots to be freed
begin
    sending_loc ← sender location
    receiving_loc ← recipient location
    next_loc ← next predicted location from the receiving location
    groups ← list of groups
    numOfRequiredSlots ← number of required slots to be freed
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in groups do
            if (sending_loc or receiving_loc or next_loc) ∈ group then
                continue
            end
            Dist_to_next ← distance from the centroid of the group to the centroid of next_loc
            Dist_to_recipient ← distance from the centroid of the group to the recipient position
            Min_dist_group ← Min(Dist_to_next, Dist_to_recipient)
            List_Min_dist ← add Min_dist_group to List_Min_dist
        end
        Max_dist_group ← Max(List_Min_dist)
        groups ← remove the group with Max_dist_group
        numOfSlotsFreed ← numOfSlotsFreed + number of objects in the removed group
    end
end
In the figure, the distances from the centroids of groups G0, G2, G6 and G7 to the
current and next positions are calculated. Then, the minimum value for each group
is collected, for example MinG0(8,11) = 8 and MinG2(10,15) = 10. After that, the
maximum of those minimum values is selected. The group that has this maximum value
will be eliminated from the cache.
Figure 5.5: Complex illustration of our elimination approach

On the other hand, there is a possibility that a future query scope may overlap
with more than one group. To handle this situation, a mechanism similar to the one
in the previous example is applied. For each group, the distances to all overlapped
groups and to the current position are measured, and the minimum of these distances
is selected. This calculation is applied to all groups in the cache, giving a
minimum distance for each group. Then, the maximum of those minimum distances is
taken, and the cached objects within the group that has the maximum value are
eliminated from the cache.
Figure 5.6: A query scope overlaps with multiple groups
Figure 5.6 illustrates a query scope overlapping with two groups (G1a and G1b).
The figure shows that G0 and G2 are the two candidate groups for elimination. The
distances from the centre point of G0 to the centres of the overlapped groups (G1a
and G1b) are measured, and then the distance from G0 to the receiving location is
computed. The minimum of those distances is taken, which is 8.5. A similar
procedure finds the minimum distance value for G2, which is 9.5. In the final
stage, the maximum of the two minimum distance values is chosen, which is 9.5.
Hence, G2 is eliminated from the cache.
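The min-max selection used in these examples can be sketched as follows; each group is given the list of its measured distances (to the receiving location and to the next predicted location or overlapped groups), the per-group minimum is taken, and the group with the largest minimum is evicted:

```python
def path_based_victim(group_dists):
    """For each group take the minimum of its measured distances,
    then evict the group whose minimum distance is the largest.
    group_dists: {group_name: [distances]}"""
    mins = {g: min(ds) for g, ds in group_dists.items()}
    return max(mins, key=mins.get)
```

Using the values from Figure 5.5, MinG0(8, 11) = 8 and MinG2(10, 15) = 10, so between these two groups G2 would be the victim.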
5.3.2 Density Based Elimination Algorithm
A discussion of the density-based cache replacement policy is presented in this
section. Density is the ratio between the number of items in a group and the area
of the group. If a group has few objects, its density is small, implying that the
group does not hold many items of interest. Therefore, a group with small density
has priority for elimination.
Algorithm 5.2: Density-based elimination algorithm
begin
    next_location ← predicted next location after the user received the query result
    groups ← current available groups
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        for each group in groups do
            group ← find the group that has the smallest collection and the least recent access time
            isReqNext ← check whether the group may be requested next
            if isReqNext ≠ true then
                numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
                group ← remove all cached items in the selected group
            end
        end
    end
end
Algorithm 5.2 shows the elimination of cached objects based on density. First, the
next location of the user is predicted and the elimination process starts. While
evicting, the group that has the smaller collection has a higher priority to be
removed; in other words, the group with the least density value is eliminated
first. After a group has been eliminated, the number of available slots is
recalculated. If the number of slots is still insufficient, the elimination
process evicts more groups until the required number of slots is available. This
algorithm does not prioritise user movement patterns.
Figure 5.7: Illustration of density elimination
Figure 5.7 presents an illustration of density elimination, where N denotes the
number of objects in a group. Every group has the same area. Consider a user
moving from the current location (G5) to the receiving location (shaded area),
where the user receives the query result. While the query result is being
retrieved, the cache is full and cached objects need to be evicted. The next
location after the receiving location is predicted, to determine whether a group
will be requested in the next time interval. In our scenario, the least dense
group is G1; however, this group is not eliminated because it is predicted to be
requested. Hence, the group of cached objects in G0 is evicted from the cache.
The elimination continues until there is enough space to store the incoming
objects.
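The density-based eviction described above can be sketched as follows; a simplified illustration assuming each group is described by its object count and area, with groups predicted to be requested next excluded, as in the scenario of Figure 5.7 (the example values are hypothetical):

```python
def density_victims(groups, predicted_next, slots_needed):
    """Evict the least-dense groups (fewest objects per unit area)
    until enough slots are freed, skipping any group predicted to
    be requested next.  groups: {name: (num_objects, area)}"""
    order = sorted((n / a, g) for g, (n, a) in groups.items()
                   if g not in predicted_next)
    victims, freed = [], 0
    for _, g in order:
        if freed >= slots_needed:
            break
        victims.append(g)
        freed += groups[g][0]  # evicting a group frees one slot per object
    return victims
```

With G1 the least dense but predicted to be requested next, G0 is evicted instead, mirroring the scenario in the figure.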
5.3.3 PDAID Elimination Algorithm
This section presents our cost-based replacement policy, called Probability
Density Area Inverse Distance (PDAID). The proposed approach eliminates a group of
cached objects based on a cost value, which is calculated during cache retrieval
from several factors. This section is therefore divided into two subsections: a
modification of the cached-object retrieval algorithm, and the cache replacement
algorithm. The first subsection discusses how the cost value is calculated and
updated; the second shows the proposed cache replacement algorithm, which uses the
calculated cost to remove cached objects from the cache.
Modification of Cached Objects Retrieval Algorithm
This section discusses a proposed cached-object retrieval approach, which modifies
the general cache retrieval algorithm and is used with the proposed PDAID
approach. The main discussion focuses on our cost formula and the modified
cached-object retrieval algorithm.
As mentioned in Section 5.2.7, the density of the valid scope area is taken into
account as an additional factor in our proposed approach. Hence, the PAID formula
is modified to include a density factor by replacing the value A with Da, where
Da is the density value of an area. The value Da is calculated as follows:
Da = N / A
Where Da is the density value of an area,
N is the number of objects in the area, and
A is the valid scope area.
Therefore, the access probability of a group is defined as follows:
Pg = P * Da
Where Pg is the access probability of a group,
P is the access probability of an item, and
Da is the density of an area.
To simplify the access probability formula, we assume α is constant. Thus, the
access probability formula becomes:
P = 1 / (tc − ti)
where tc is the current access time, and
ti is the last access time.
Hence, the cost of a data value becomes:
C = Da / ((tc − ti) * D) = Pg / D
Where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
Using the modified PAID formula, the cached-object retrieval and elimination
processes are slightly modified; the aim is to include the weight factors in the
formula. The value of Pg is computed during the retrieval process, because
completing this computation during the elimination process would slightly increase
the computation in that process. The value of C is calculated during the cache
elimination process, because the user is moving dynamically and the distance
depends on the current location of the user.
A group of cached objects is accessed while cached objects are retrieved or new
objects are stored in the cache. When new objects are stored, they are grouped and
all of the factors mentioned above are kept. In grouping, the cached objects
either form new groups or merge into existing groups. When new groups are formed,
their Pg values are initialised to zero. If new objects are merged into existing
groups, the existing groups may be split into new groups; the Pg values of the
resulting groups are not reset to zero, because they contain existing cached
objects. The Pg value of a group is recalculated when the information of the
cached objects in the group is retrieved; the access time and the Pg value are
then updated.
Algorithm 5.3: Cache retrieval for PDAID algorithm
begin
    Groups ← find any group that intersects with the query scope at the next position
    tc ← current time
    ti ← 0
    for each group in Groups do
        groupResult ← find the cached objects in the group that match the query scope
        if groupResult ≠ empty then
            ti ← retrieve accessTime(group)
            Pg ← Da / (tc − ti)
            update existing values(group, Pg, tc, D, A)
            results ← results + groupResult
        end
    end
    return results
end
Algorithm 5.3 shows the cache retrieval algorithm, which considers multiple
factors. When the query scope intersects a group of cached objects, the value of
Pg is updated; Pg keeps track of the cost of a group that holds the requested
query result. After the value of Pg has been calculated, the query result is added
to the parameter results. The algorithm then continues finding the next
intersecting group, calculating its Pg and adding the result found to results.
Once the cached result has been assembled, it is sent to the user.
Figure 5.8: Illustration of PDAID retrieval
Figure 5.8 shows an illustration of PDAID retrieval. Assume that the area of every
group is the same. G0 and G1 are two groups which are stored at times t0 and t1
respectively. At time t2, a number of new objects are stored and all cached
objects are regrouped. This causes G0 to be split into two groups: G0 and G2.
The access probability of a group can be explained as follows. The Pg values of
G0 and G1 at times t0 and t1 are initialised to zero. When G2 is formed, its Pg
value is not zero if it holds any existing cached objects. The calculation of Pg
for G0 is as follows: the density value (Da) is 4 and the value of P is 0.5, so
the value of Pg is 4 * 0.5 = 2. The Pg calculation for G2 is done in the same way
as for G0; its value is 2.5.
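The Pg arithmetic in this example can be reproduced as follows, using the simplified P = 1/(tc − ti); the object count, area and access times below are assumed values chosen so that Da = 4 and P = 0.5, matching the illustration:

```python
def access_probability(tc, ti):
    # Simplified access probability: P = 1 / (tc - ti)
    return 1.0 / (tc - ti)

def group_probability(n, area, tc, ti):
    # Pg = P * Da, where the density Da = N / A
    return access_probability(tc, ti) * (n / area)
```

For instance, 8 objects in an area of 2 units give Da = 4; with tc − ti = 2, P = 0.5 and hence Pg = 2.0, matching G0 in the example.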
PDAID Replacement Algorithm
This section presents our cost-based replacement policy, which is called Probability
Density Area Inverse Distance (PDAID). Earlier in this section, we have shown how
to calculate the access probability cost for each group while a group of cached objects
is accessed. To simplify our proposed approach, we assume that the groups of all
cached objects have been formed and that each group has had its access probability
value calculated.
The formula to calculate the value of C is as follows:
C = Pg / D
Where Pg is the access probability of a group,
D is the data distance, and
C is the elimination cost.
When the cache does not have enough space, a group is eliminated. The group
elimination is done by calculating the value of C, the elimination cost, for all
groups and removing the group that has the smallest value of C. The value of C is
calculated by dividing the value of Pg by a distance, which is measured between
the central point of a group and the next predicted location of the user. The
next predicted location is the location after the receiving location, determined
from the travel history of the mobile user. A group with the smallest value of C
has the smallest chance of being accessed again. Therefore, the group that has the
smallest value of C is eliminated.
Algorithm 5.4 shows cached-object eviction based on multiple criteria. At the
start of the algorithm, all groups in the cache are assigned to the parameter
Groups and the number of required slots is assigned to the parameter
numOfRequiredSlots. Once the parameter assignments have been completed, the
algorithm starts eliminating groups: for each group it retrieves the Pg value,
measures the distance and calculates the value of C. The group eviction process is
similar to that of the algorithms in the previous sections; this algorithm selects
the group with the smallest value of C as the eviction victim. Once the evicted
group has been removed from the cache, the parameter numOfSlotsFreed is increased
by the number of items in the evicted group and the
Algorithm 5.4: Cached objects elimination of PDAID algorithm.
begin
    groups ← all groups in cache
    userLocation ← current location of the user
    numOfRequiredSlots ← number of required slots
    while numOfRequiredSlots ≥ numOfSlotsFreed do
        min_value ← maximum value
        for each group in groups do
            D ← calculate distance(group, userLocation)
            Pg ← retrieve value(Pg)
            C ← Pg / D
            if C < min_value then
                min_value ← C
                min_group ← group
            end
        end
        if min_group ≠ empty then
            groups ← remove(min_group)
            numOfSlotsFreed ← numOfSlotsFreed + numItemsInTheGroup
        end
    end
end
parameter min value is reset to the maximum value. Then, the elimination process
continues until the number of required slots has been made available.
The illustration in Figure 5.8 is reused to describe the PDAID replacement policy.
Recall that the values of Pg for G0, G1 and G2 are 2, 0 and 2.5 respectively. The
user at the current position (shaded circle) stores new incoming objects; however,
the cache space is not enough to accommodate those incoming objects, so the
existing cached objects are evicted. The eviction process is completed by choosing
the least value of C amongst all cached groups, where the value of C is calculated
by dividing the value of Pg by the distance. The values of C for the cached groups
are 0.2, 0 and 0.208 respectively. Hence, the eviction order is G1, G0 and G2.
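The eviction order in this example can be reproduced as follows; G1's Pg is 0, and the distances of 10 and 12 for G0 and G2 are assumed so that C = Pg/D matches the quoted values of 0.2 and roughly 0.208:

```python
def pdaid_order(groups):
    """Order groups by elimination cost C = Pg / D, smallest first.
    groups: {name: (Pg, distance_to_next_predicted_location)}"""
    return sorted(groups, key=lambda g: groups[g][0] / groups[g][1])
```

Calling pdaid_order({"G0": (2.0, 10.0), "G1": (0.0, 12.0), "G2": (2.5, 12.0)}) returns ["G1", "G0", "G2"], the eviction order above.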
5.4 Case Studies
This section presents several case studies to illustrate our proposed approaches.
The initial situation is given first, followed by an illustration and explanation
for each proposed approach.
Figure 5.9: Initial situation after cached objects have been grouped
Figure 5.9 shows the initial client cache status after the user has sent some
queries. The figure shows that some groups of objects have been formed and also
shows the current position of the mobile user. In the current situation, the user
would like to store the incoming query results to the cache; however, the cache
cannot store all incoming objects. Therefore, some cached objects are going to be
evicted to make available more empty cache slots.
To explain our cache replacement policies with the example above, we present our
proposed approaches in three parts, describing the density-based, path-based and
PDAID (cost-based) replacement policies respectively. The discussion of the three
proposed approaches is as follows:
Case Study 5.4.1. The density-based policy
Assume that areas of all groups are the same size, which is a 2-unit area. Two
case studies are given below:
Case 1: Number of cached objects for each group are as follows: G0: 10, G1: 5, G2:
6, G3: 8, G4: 10, G5: 8, G6: 10, G7: 6. This illustration is shown in Figure 5.10.
Figure 5.10: Density based approach (Case Study 5.4.1-1)
In this case, G1 is the group with the fewest cached objects. However, this group
is not going to be removed, since it is predicted to be requested by the next
query. Hence, either G2 or G7 is going to be removed; of the two, the group with
the least recent access has the higher possibility of being removed.
Case 2: The number of cached objects for each group is the same as in Case 1,
except for the following: G1: 10, G3: 4, G4: 5, G7: 8. Figure 5.11 shows an
illustration of this case study.
Figure 5.11: Density-based approach (Case Study 5.4.1-2)
The group that has the least number of cached objects is G3. However, this group
is not going to be removed, since it has been accessed recently. The next
candidate victim is G4; similar to G3, this group has been recently accessed and
will not be removed. Group G2 is then the group with the smallest collection and
the least recent access. Therefore, this group is going to be removed.
For both cases, the new objects are inserted into the cache after the cache
elimination, followed by the creation of new groups or the adjustment of existing
groups. Adjustment is performed only on the groups that contain the newly inserted
objects. The outcome of adjusting the existing groups is that the existing groups
have new object members and/or new groups are created.
Case Study 5.4.2. The path-based policy
In this part, we discuss the use of the Path-based approach. Two cases are
presented. In the first case, a query scope overlaps only with one group, whereas
the second one overlaps with multiple groups. The illustrations of both cases are
slightly different.
Figure 5.12: Path-based approach (Case Study 5.4.2-1)
Figure 5.12 shows the illustration of the first case for path-based elimination,
where the query scope covers only a single group. Victim group selection is done
by calculating two different distances. The first distance is measured between the
centroids of two groups and the other is calculated between the centroid of the
group and the user's receiving location. For example, the distance between G7 and
G0 is 8, while the distance between G7 and the user is 15. Once the two distances
have been calculated, the smaller value is selected, which is 8. A similar
procedure is applied to G1, G2 and G6 and the smallest distance value for each
group is selected; those values are 5, 18 and 10 respectively. Distances for
groups G4, G3 and G5 are not computed, since these three groups are in use. Once
the smallest distance values have been selected, the group with the maximum value
is targeted as the victim. Hence, G1 is the victim and is eliminated from the
cache.
Figure 5.13: Path-based approach (Case Study 5.4.2-2)
Figure 5.13 shows a situation similar to the first case; here, the next predicted
query scope covers two cached groups. The elimination process is the same as in
the first case. To simplify, the distances for G7 are 8, 15 and 17 (denoted by H,
G and I respectively), the distances for G2 are 18, 10 and 26 (denoted by C, D and
K respectively), and the distances for G6 are 12, 25 and 27 (denoted by F, E and J
respectively). The minimum distance values for the three groups are 8, 10 and 12
for G7, G2 and G6 respectively. Then, the maximum of those minimum distance values
is selected, which is 12. Hence, G6 is eliminated from the cache.
After one or more groups have been eliminated, the remaining processes are the
same as in Case Study 5.4.1: new objects are inserted and the cached objects are
regrouped. The regrouping process adjusts the existing groups and/or creates new
groups.
Case Study 5.4.3. The PDAID (cost-based) policy
This case study shows how a group of objects is eliminated based on the cost of
the group. The cost value is calculated using the PDAID formula described in
Section 5.3.3.
Figure 5.14 shows an illustration of this case study. For simplicity, the cache
has been filled by objects and the groups have been formed. These cached objects
have not been accessed again. Assume that the client receives new objects at the
current position and the cache is full. The cached objects located in the next
predicted location are not evicted, because these objects will be requested next.
In this situation, the access probability of every group is zero, since none has
been accessed. When the user accesses G3 and G4, the access probability of those
groups changes. Until then, eviction falls back to insertion order: G0 is
eliminated first if it was the first group inserted into the cache.
After one or more groups have been evicted, the rest of the caching process is
the same as for case study 5.4.1, which inserts new objects and then adjusts
existing groups or creates new groups.
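Section 5.3.3 defines the exact PDAID cost formula. As a rough, hypothetical sketch of how the four factors listed later in this chapter (access probability, valid scope area, density and data distance) could combine, with a low cost marking a group for eviction:

```java
public class PdaidCost {
    // Hypothetical combination of the four PDAID factors; the exact formula
    // is defined in Section 5.3.3, so treat this only as a sketch. A group
    // with a low access probability, small valid scope area, low density
    // and large distance gets a low cost and is therefore evicted first.
    static double cost(double accessProbability, double validScopeArea,
                       double density, double distance) {
        return (accessProbability * validScopeArea * density) / distance;
    }

    public static void main(String[] args) {
        // An unaccessed group (access probability 0) always has cost 0,
        // which is why eviction falls back to insertion order above.
        System.out.println(cost(0.0, 100.0, 0.5, 20.0)); // prints 0.0
        System.out.println(cost(0.2, 100.0, 0.5, 20.0)); // prints 0.5
    }
}
```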
Figure 5.14: PDAID-based approach (Case Study 5.4.3)
5.5 Discussion
This section discusses our proposed approaches. First, we discuss our elimination
approach based on distance followed by that based on density. The last discussion
is the elimination based on multiple factors.
Our proposed approach to elimination is similar to that of the FAR algorithm.
In our proposed approach, we eliminate a group of objects rather than individual
objects. A group is eliminated if it has the maximum distance to the next
predicted location and the minimum distance to the current location. The distance
between two groups is the distance between their centroids. Each group may have
a different shape, which would otherwise require a different formula per shape.
Therefore, we use the K-means algorithm [34] to find the centroid of each group.
In addition, when a query scope overlaps more than one group, the minimum value
across the overlapped groups is found.
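The centroid of a group, as used in these distance calculations, is the mean of the member coordinates (the K-means update step); a minimal sketch:

```java
public class Centroid {
    /** Returns the centroid (mean x, mean y) of a group of 2-D points,
     *  as computed in the K-means update step. */
    static double[] centroid(double[][] points) {
        double sumX = 0, sumY = 0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] { sumX / points.length, sumY / points.length };
    }

    public static void main(String[] args) {
        double[][] group = { { 0, 0 }, { 4, 0 }, { 2, 6 } };
        double[] c = centroid(group);
        System.out.println(c[0] + ", " + c[1]); // prints 2.0, 2.0
    }
}
```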
The second elimination approach is based on density. In this approach, the group
which has fewer objects has higher priority to be evicted. If the group is far away
and has more objects within a small area, this group is not eliminated. Therefore,
distance is not counted in this approach. When more than one group has the
same density value, the group which was formed first is eliminated first.
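The density rule can be sketched as follows; the class names and the area/creation-order bookkeeping are our own illustration, not code from the thesis:

```java
import java.util.List;

// Sketch of the density-based eviction described above: evict the group
// with the fewest objects per unit area; on a tie, the group formed first.
public class DensityEviction {
    static class Group {
        final String name;
        final int objectCount;
        final double area;        // area occupied by the group
        final long creationOrder; // lower value = formed earlier
        Group(String name, int objectCount, double area, long creationOrder) {
            this.name = name;
            this.objectCount = objectCount;
            this.area = area;
            this.creationOrder = creationOrder;
        }
        double density() { return objectCount / area; }
    }

    /** Returns the group with the lowest density; ties go to the oldest. */
    static String selectVictim(List<Group> groups) {
        Group victim = null;
        for (Group g : groups) {
            if (victim == null
                    || g.density() < victim.density()
                    || (g.density() == victim.density()
                        && g.creationOrder < victim.creationOrder)) {
                victim = g;
            }
        }
        return victim == null ? null : victim.name;
    }

    public static void main(String[] args) {
        // G1 packs 10 objects into a small area; G2 has 3 objects spread
        // over a larger area, so G2 is the victim despite G1 being older.
        List<Group> groups = List.of(
                new Group("G1", 10, 4.0, 1),
                new Group("G2", 3, 6.0, 2));
        System.out.println(selectVictim(groups)); // prints G2
    }
}
```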
The last approach is based on multiple factors. The approach, called PDAID,
calculates the cost of a group based on several factors. The PDAID approach is
similar to the PAID approach, except that we consider both the density and the
area of a group rather than its area alone. The reason is that a larger area may
contain only a few objects compared with a smaller one. A group has a higher
priority to be chosen for elimination if it has the furthest distance, the longest
access time, fewer objects and a small area.
5.6 Conclusion
This chapter discussed our three proposed approaches to client caching,
focusing on object elimination. The aim of our proposed approach is to answer
client queries such that at least K objects are answered from the cache. Our
proposed elimination approaches eliminate a group of objects based on three
different criteria. Under the first criterion, a group of objects is eliminated
based on distance. The second is density-based, whilst the last is elimination
based on multiple criteria.
Under the first criterion, the distance-based elimination uses the MinMax
algorithm, which eliminates the group that is farthest from the next predicted
location once the user has received a query result at the current location. Under
the second criterion, the density-based elimination drops the group which has the
fewest objects. The last criterion is based on the cost of a group, considering
four factors in order to eliminate a group.
These four factors are: access probability, valid scope area, density and data distance
factors.
Chapter 6
Performance Evaluation
This chapter presents the performance evaluation of our approaches that have been
elaborated in Chapters 3, 4 and 5. The purpose of this chapter is to evaluate those
approaches under various conditions.
The evaluation is performed by implementing and simulating the proposed
approaches using Java™ and Planimate™. The implementation and its results are
presented in Section 6.1; while the simulation and its results are presented in Section
6.2. The implementation section briefly describes our implementation and its results
for query processing at the server side. The simulation section contains a short sum-
mary of the simulation model, and more comprehensive results. Our simulation also
validates the outcomes of the implementation.
6.1 Implementation and its Results
An evaluation of the implementation of mobile query processing at the server side
is described in this section. This section is divided into two parts: a short summary
of the implementation details, and an elaboration of implementation results.
CHAPTER 6. PERFORMANCE EVALUATION 189
6.1.1 Implementation Environment
A summary of the implementation details is given in this section. The summary
includes implementation settings and the architecture.
Table 6.1: Hardware settings

Parameter    Server 1     Server 2       Client
Processor    AMD          SunFire V440   Pentium
CPU speed    1.96 GHz     1.28 GHz       700 MHz
RAM size     1 GB         16 GB          512 MB
Connection   Wired        Wired          Wireless
LAN speed    512 Kbps     512 Kbps       1 Mbps
Area size    900 x 2000   300 x 2000     -
Table 6.1 shows our experiment configurations. The server 1 and client machines
use the Linux Fedora™ operating system, whereas server 2 runs under the Sun
operating system. The implementation is written in the Java™ programming
language. The simulation database contains varying numbers of records of randomly
generated (x, y) coordinates. In our experiment, every BS is connected to a single
database (DB) server, which contains between 100,000 and 5,000,000 records, and
the scope of a query is set beyond the current BS boundary. The data set is
synthetic, in that it is produced by a location generator.
6.1.2 Implementation Results
This section gives the numerical results of our experiments. The explanatory
details of our experiment results for single-cell and multi-cell queries are
discussed next.
Results for Query Processing in a Single Cell
We examine the performance of our proposed algorithm for processing single-cell
queries. The simulation database contains varying numbers of records of randomly
generated (x, y) coordinates. The number of records in the database ranges from
250,000 to more than 1 million. Furthermore, user queries with various distances,
from 500 up to 2500 metres, are sent by users through their mobile devices to
the BS.
Figure 6.1: Number of targets found in a square
The experiments presented are designed to achieve two objectives. Firstly, we
examine the performance differences between a square and a circle. Secondly, we
evaluate our algorithm for specifying the location related to the query. We assume
that the user receives the query result in a new location at time tstart+1.
Figure 6.1 shows the number of targets found within scopes of 1000 x 1000, 2000
x 2000, 3000 x 3000, 4000 x 4000 and 5000 x 5000 using a square at time t1.
These experiments were performed when the user was not moving while receiving
the query results. From the figure, we can see the number of targets found within
the area for databases of different sizes. The numbers increase as the area
increases, and grow rapidly as the database and the scope get bigger.
Figure 6.2: Number of targets found in circle
Figure 6.2 shows the number of targets found within circles of radius 500, 1000,
1500, 2000 and 2500 metres at time t1, where the user is not moving while
receiving the query result. The number of targets found within a circle, for
databases of various sizes, follows much the same pattern as for the square,
growing rapidly as well. However, comparing the two figures carefully, the number
of targets found in the circle is somewhat smaller than in the square. For
example, there are 312,698 targets found in a square, but only 245,254 found in
the corresponding circle.
Therefore, we can calculate the percentage differences of the total targets found
between square and circle, as shown in Figure 6.3. When the square is the valid
scope, all targets found are counted as 100 percent. If we use a circle as the
valid scope, the query results produced are less than 100 percent, around 78-79
percent. This is because a circle inscribed in a square covers only pi/4, about
78.5 percent, of the square's area, so roughly 21.5 percent of the square lies
outside the circle. This percentage does not depend on the number of records in
the database. Therefore, when targets are scarce, a square has roughly a 21.5
percent greater chance of finding a target than a circle.
Figure 6.3: Comparison of number of targets found in circle and square
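A circle inscribed in a square covers pi/4, about 78.5 percent, of the square's area, which matches the 78-79 percent observed here; a small numeric check:

```java
public class ScopeRatio {
    public static void main(String[] args) {
        // A circle of radius r inscribed in a square of side 2r covers
        // (pi * r^2) / (2r)^2 = pi / 4 of the square's area.
        double ratio = Math.PI / 4.0;
        System.out.printf("circle/square = %.4f%n", ratio); // ~0.7854

        // Applied to the 312,698 targets found in the square, this predicts
        // roughly 245,600 targets in the circle, close to the observed 245,254.
        System.out.println(Math.round(312698 * ratio));
    }
}
```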
Figure 6.4 shows the comparison of a number of targets found in every region.
If the server explores all of the regions while the user is moving diagonally,
server resources are wasted, since the user is interested only in the objects
that have not yet been passed. We suggest that the server search only the
specific region(s), based on the direction of the client. Searching in one or two regions is very
efficient, because the processing time in specific regions is about 25-50 percent faster
than exploring the whole of the regions.
If we use a circle as the valid scope and the user misses the query result at
time tstart+1, some targets cannot be caught at time tstart+1. This is not
efficient, since the server needs to seek new targets in a new location at the
next time interval, as shown in Figure 6.5. If targets are scarce, it is inconvenient
Figure 6.4: Comparison of number of targets found in each region.
Figure 6.5: Comparison of number of targets found in circle at time t1 and t2.
for the user to resubmit the query in order to get a new query result at the next
interval time.
Figure 6.6: Snapshot of CPU load
We also analyse our proposed algorithm when a user misses the query results. If
the user moves at a speed higher than or equal to the distance in the user query,
no overlapping area is produced; otherwise, there is an overlapping area.
Figure 6.6 shows the CPU load when processing both overlapping and
non-overlapping areas. The load percentage can be defined as throughput per
second. The graph shows that avoiding probing the overlapping area again reduces
the CPU load.
In conclusion, our experiments show that using a square as a valid scope gives a
user a greater chance of finding scarce targets close to the query scope boundary
than other shapes do. This is because the area of the square is greater than that
of the other shapes for a given query distance.
Results for Query Processing in Multi-cells
We have conducted three different types of analyses to examine our proposed
approach. Firstly, we examine a situation where multiple BSs have the same area
size and many queries with various sizes of query scopes are sent. The purpose is
to determine how long it takes users to receive query results from a server.
Secondly, we examine a situation where multiple BSs have different area sizes and
a variety of queries with different sizes of scopes are sent. The purpose is to
compare the length of time until query results are received by users with either
one or many BSs. Thirdly, we examine the processing time of every BS for both
single and multiple users. From this experiment, we evaluate whether the BS
processing time is reasonable for producing the query results.
(i) Uniform Area Size Multi-BSs and Various Sizes of Query Scopes
Here, we analyse the receiving time of query results from servers while
users send queries with various sizes of query scopes from one location. All
BSs have the same area size. The complete setting of our simulation for the
first experiment is shown in Table 6.2. In this experiment, we used up to
five BSs with the same area size. We tested 1 to 20 users sending queries
concurrently. The query scopes are squares with areas varying from 250,000 up to
3,062,500 m2.
Table 6.3 shows the results of the first experiment. We can see that users
have a better chance of getting more targets within bigger query scopes. When
the query scope is 100 metres, users get only 5 targets if the database contains
100,000 records. In contrast, users can get 698 targets when the query scope is
1750 metres with the same database. Furthermore, the
Table 6.2: Parameter settings

Parameter                          Value
Number of BSs                      5
BS area (m2)                       250,000
Query scope (m2)                   250,000 - 3,062,500
Number of users                    1-20
User coordinate (x,y)              (400,400)
Number of items (every BS region)  100,000 - 5,000,000
other columns show that a smaller scope of query has a smaller number of
targets compared with a bigger scope.
On the other hand, if we examine the table horizontally, databases with more
records yield more targets than those with fewer. From this point of view, we see
that the number of targets found depends not only on the size of the query scope,
but also on the total number of records in the database.
Table 6.3: First experiment result

                          Database Records
Query Scopes   100,000   500,000   1,000,000   5,000,000
(Metres)       records   records   records     records
100                  5        28          54         265
250                 29       169         299        1564
500                140       619        1260        6186
750                296      1380        2812       13929
1000               459      2190        4516       22336
1250               579      2864        5909       29300
1500               641      3176        6523       32469
1750               698      3485        7129       35619
Figures 6.7 and 6.8 show the response time results where the database records
are varied from 100,000 to 5,000,000. Each graph shows the response time, from
sending the query until receiving the query results, for 1, 5, 10 and 20 users,
with query scopes ranging from 100 up to 1750 metres. These graphs illustrate the
general trend that the larger the query scope, the larger the database, and the
more users there are, the greater the delay in answering user queries.
Figure 6.7a shows the response time when the BS accesses 100,000 database
records. It shows that answering a query with a smaller scope gives a faster
response than a bigger one. In our simulation results, answering the largest
query scope, 1750 metres, is about 10 times slower than the smallest one, 100
metres. When we simulated 20 users with query scopes of 750 and 1000 metres, our
machine happened to run its daily updates; this explains the isolated increase
in the slope at those points.
Figure 6.7b shows the response time when the BS has 500,000 database
records. It indicates a significant change when twenty users access the BS with
a query scope greater than 1000 metres. The difference between the response
times of the largest and the smallest query scope is similar to the previous
graph. The current BS answers a user query in less than 100 milliseconds (ms)
when the query scope is 100 metres. On the other hand, when the query scope is
1750 metres, the approximate time for the user to get the answer is from 100
to 700 ms, depending on how many users request results. Furthermore, we can
see that when the number of users doubles, the delay also roughly doubles.
As the number of database records increases, the response time becomes slower.
However, the query scope size also affects the time taken to respond to the user
query. Figure 6.8a shows that, especially when more than 5 users access the BS
concurrently, the increase in the graph is bigger than when there are fewer than
5 users. On the other
(a) 100,000 DB Records
(b) 500,000 DB Records
Figure 6.7: Various searching scope with 100,000 and 500,000 database records
(a) 1,000,000 DB Records
(b) 5,000,000 DB Records
Figure 6.8: Various searching scope with 1,000,000 and 5,000,000 database records
hand, for 20 users the response time increases significantly once the query
scope exceeds 1500 metres.
Figure 6.8b shows the same trends as the previous graphs. The response time is
ten times slower than that shown in Figure 6.7a. It also shows clearly that if
the number of users doubles, the response time roughly doubles as well. However,
the number of users is not guaranteed to be the primary factor slowing down the
BS response time. Comparing all four graphs, the number of database records is
another factor slowing down the BS response time, owing to the number of
comparisons required to fulfil the criteria of the user query.
In summary, the longer the BS takes to answer queries, the more targets are
returned to the users. The size of the query scope and the number of records in
the database also need to be considered. Finally, the total number of queries
processed concurrently also affects performance.
(ii) Various Area Size Multi-BS and Uniform Size Query Scopes
In the second experiment, many users send a number of queries, with the
same query scopes, from the same position, to the corresponding BS. We use
three BSs: one is the current BS and the others are the neighbouring BSs.
Table 6.4 shows the parameter setting. In this analysis, we use three BSs
where the area size of each BS is varied. However, all query scopes have the
same size. A range of total users send a query and the total records in the
database are the same as in the first experiment.
The purpose of this experiment is to measure the response time to answer
user queries if multiple BSs and multiple queries are involved at one time.
Table 6.4: Parameter settings for multiple BSs

Parameter                          Value
Number of BSs                      3
Query scope (m2)                   2,250,000
BS 1 area (m2)                     810,000
BS 2 area (m2)                     90,000
BS 3 area (m2)                     250,000
Number of users                    1 - 20
User coordinate (x,y)              (400, 400)
Number of items (every BS region)  100,000 - 5,000,000
The accuracy of the result returned by the BSs is another point of interest.
Table 6.5 shows the results returned to users when a single BS or multiple BSs
return the query results. Figures 6.9 and 6.10 show the response time when one
to three BSs are accessed by 1, 5, 10 and 20 clients respectively. These figures
show the same trend: the more BSs involved, the longer the time taken to give
answers to users. In addition, when more queries are processed at the same time,
more time is taken to answer them.
Table 6.5: Second experiment result

Number of          1 Base     2 Base     3 Base
Database Records   Station    Stations   Stations
100,000                141        560       1,074
250,000                386      1,444       2,741
500,000                729      2,983       5,581
750,000              1,124      4,548       8,439
1,000,000            1,521      5,963      11,154
2,000,000            3,072     12,046      22,725
3,000,000            4,339     17,801      33,619
4,000,000            5,992     24,109      45,211
5,000,000            7,389     29,985      56,115
When a user sends a query to one BS and the BS searches only within its own
area, the user will miss some targets or have to resend the query to collect
targets within the neighbouring cells. Resending a query to a new BS consumes
more power and bandwidth. Table 6.5 shows the number of records found in the
query result. The first and second columns show the cases where only one or two
BSs answer the query. Unless the third BS is down or fails to return a query
result, the query result returned in these cases is insufficient, since the user
needs to resend the query on reaching the neighbouring BS.
Figure 6.9a shows the response time for answering a single user when one to
three BSs are used simultaneously. The database contains from 100,000 to
5,000,000 records. Averaged over databases from 100,000 to 5,000,000 records,
the response time for three BSs is around 32 percent slower than for one BS
only. On the other hand, the response time of accessing either 100,000 or
5,000,000 records is 61 percent slower than for one BS.
Figure 6.9b shows the response time for five users involving one, two and three
BSs respectively. The delay in responding to the users' requests is up to 14
seconds when accessing three BSs with 5,000,000 records, while it takes 3
seconds to access one BS only. Compared with the previous graph, the delay is
longer overall; however, the average delay per user remains the same.
Figure 6.10a shows the response time for ten users accessing one, two or three
BSs. In this situation, the delay in answering a user request is less than twice
that in Figure 6.9b. The BSs respond to ten user queries in less than 25 seconds
for three BSs, 17 seconds for two BSs and 8 seconds for one BS. The other bars
show the same trend as for two BSs.
(a) One user
(b) Five users
Figure 6.9: A single searching scope with one and five users
(a) Ten users
(b) Twenty users
Figure 6.10: A single searching scope with ten and twenty users
Figure 6.10b shows the response time for twenty users accessing one, two or
three BSs. The response time for 5,000,000 database records is less than 40
seconds for three BSs, 23 seconds for two BSs, and 12 seconds for one BS. The
trends of the other bars are similar to those discussed before.
These four graphs show similar trends. Therefore, we can conclude from this
experiment that the delay in responding to user queries grows roughly n-fold,
where n is the number of BSs accessed simultaneously.
The next group of figures shows the response time classified by the number of
BSs. The purpose of this group is to show when and why the response time of
each BS slows down.
Figure 6.11: Response time of single BS
The next three figures, 6.11, 6.12a and 6.12b, show the response time for
one, two and three BSs accessed by the same numbers of users, with the same
numbers of database records as in the previous group. The lines increase
gradually, denoting that the response time starts to slow when accessing 1
million records; however, they increase significantly when there are more than
3 million database records. The increase for 10 and 20 users is significant
compared with the other two lines. Nevertheless, the average response time per
user is less than the response time for 1 user, at around 600 milliseconds.
(a) Two BSs
(b) Three BSs
Figure 6.12: Response time of multi-BSs
Figure 6.11 shows the response time of a single BS. The line increases slowly
until it reaches 1 million records; after that, the response time degrades
quickly. The response time for 10 users on a database containing 5 million
records shows a significant slowdown.
Figure 6.12a shows the response time involving two BSs. The increase becomes
steeper after the database reaches 1 million records. The line representing
the response time of 20 users rises after 1 million records; however, the
average response time per user is still faster than the response time for 1
user only. The lines for 5 and 10 users run parallel. Note, though, that the
gap between the 10- and 20-user lines is not twice as large as the gap between
the 5- and 10-user lines.
Figure 6.12b shows the response time of three BSs, which is about twice as slow
as in the last two graphs. This is due to the data transmission and searching
time for the additional BSs. From this figure, we can see that the difference in
delay between two and three BSs is small when there are fewer than 500,000
database records; however, the response time starts to degrade when more than
500,000 records are accessed.
(iii) Individual Processing Time of Multi-BS
In the last experiment, we measure the individual processing time of each BS.
We used the same settings as in the second experiment. The aim is to discover
whether the processing time of each BS is reasonable. We performed this
experiment twice: first with two BSs, then with three BSs.
Figure 6.13: Processing time of individual BSs for the same query scope and two BSs
The graphs shown in Figures 6.13 and 6.14 are the results of our experi-
ments showing the processing time of each BS. A number of different users,
from 1 to 20, are examined in this experiment.
When many BSs are involved, we cannot say that the processing time of one BS
(a neighbour) is faster than another's. This is due to the size of the database
and the scarcity of the targets matching the user's query, as shown in the
second group in Figure 6.13, where the second BS needs more time to finish its
process.
Figure 6.14 clarifies this issue. In the first and second groups, BS3 takes
longer to finish its process than the other two, while BS2 is the fastest to
finish. In contrast, in the last group BS3 is the fastest and BS1 is the slowest
to finish its process.
Figure 6.14: Processing time of individual BSs for the same query scope and three BSs
6.2 Simulation and its Results
This section discusses the experiment results of the proposed approaches using
a simulation package called Planimate© [89]. Planimate© is not only a simulation
package; it is a software platform for prototyping, developing and operating
highly visual, dynamic, discrete-event simulation models and interactive
business applications. The Planimate© visual platform provides the following
built-in capabilities:
• animation,
• handling of concurrency,
• visual work-flow modelling, and
• dynamic time-based modelling (simulations).
The above capabilities provide the efficiency needed to develop models of our
proposed approaches; the same models would take more time to develop directly
in a programming language.
6.3 Simulation Results for Single-Cell and Multi-Cell Query Processing
Experiments for query processing at the server side are presented in this
section. A retrieval situation has been simulated to compare the performance of
two query scopes: one square and one circular. Besides showing the retrieval
performance of the two shapes, this experiment aims to show that the simulation
tool gives the same performance as our implementation. The experiments are
divided into single-cell and multiple-cell experiments.
Single Cell Simulation Results
This experiment uses 50 synthetic data points representing the locations of
static objects. The series of experiments carried out in this examination is
divided into 2 cases. Table 6.6 shows the setting details.
Table 6.6: Parameter settings - single cell

Parameter              Value
Number of BSs          1
BS area (units2)       2500
Query scope (units2)   36-196
User coordinate (x,y)  (25,25)
Number of items        50
Case 1. User is not moving while retrieving data. The user asks for objects
within a radius of 10 units, where the user location is the centre point of the query
scope.
Figure 6.15: Comparison of objects retrieved using a square and a circle (single cell)
Figure 6.15 shows a graph comparing the number of retrieved objects using two
query scopes, one square and one circular. More objects are retrieved using the
square than using the circle; in the worst case, both shapes retrieve the same
number of objects.
Case 2. Various experiments with different sizes of query scopes have been
performed for this case. The settings are the same as for the previous
experiment, except for the query scope dimension: the query sizes range from 6
up to 14 units of distance. The dataset used in this experiment has 30 objects.
Figure 6.16 shows a percentage comparison graph for this case, in which the
square's result is taken as 100 percent. The graph compares the number of
retrieved objects as the sizes of the query scopes are varied. As shown in the
graph, the percentage bars for the square shape are higher than the
Figure 6.16: Percentage comparison of object retrieval using different sizes of query scopes.
ones for the circle. When the scope distances are 6 and 8, the percentage bars
for both shapes have the same height; this is because, at those distances, the
retrieved objects all lie within both scope shapes.
Multiple Cells Simulation Results
Several simulation experiments imitating query result retrieval from multiple
cells have been performed. The setting for this type of experiment uses 100
synthetic data points representing the locations of static objects separated
across two cells. Table 6.7 shows the setting details.
The user is not moving while retrieving data, and asks for objects within a
radius of 10-18 units. The number of database records is 50 for each cell.
Figure 6.17 depicts the experiment result for object retrieval from multiple cells.
In the figure, the experiments which used a square as a query scope retrieved more
objects than did a circle for each cell. Square1 means that the object retrieval
Table 6.7: Parameter settings - multiple cells

Parameter              Value
Number of BSs          2
BS area (units2)       5000
Query scope (units2)   36-196
User coordinate (x,y)  (45,25)
Number of items        100
Figure 6.17: Comparison of objects retrieved using a square and a circle
is using a square shape to retrieve records within cell one; Circle1 is
analogous, but uses a circle rather than a square. When using a square for each
cell (Square1 and Square2), the performance is better than when using a circle
(Circle1 and Circle2).
Our next discussion compares the implementation and the simulation results. The
percentage retrieved by a circle relative to a square is about 80 percent in the
implementation (refer to Figure 6.3), and the simulation also produces about 80
percent (refer to Figure 6.15). The performance when using a square is thus
similar for the implementation and the simulation. In conclusion, the simulation
package produces results similar to those of the implementation, and can
therefore be used to simulate the rest of our proposed approaches.
6.3.1 Indexing for Multi-Cell Query Processing
This section discusses the simulation experiments for the proposed indexing mech-
anism. The performances of two proposed approaches are studied and their results
are compared with the conventional approach.
For each proposed approach, the performance of three cases is simulated and
studied. The three cases differ in the number of requests and in the load: an
off-load or a high-load situation. An off-load situation is simulated by setting
the query interval time to be greater than the average access time; to simulate
a high-load situation, the interval time between incoming queries is set to less
than the average access time.
Local Index Simulation Results
The experiment settings for all cases of both proposed approaches are similar. In
our study, four cells are used to process the query, three of which behave as
neighbour cells. All cells have the same processing speed. The interval time
between sent queries varies from 0.1 to 1 second. Queries are sent sequentially,
so only one query is processed at a time. To simulate the local index behaviour,
we assume that the number of additional slots available to cache remote data
items is 20 percent of the total slots.
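For illustration, the off-load and high-load settings above can be reproduced with a small discrete-event sketch that replays queries one at a time. The service times and the fraction of remote requests below are illustrative assumptions, not the thesis's measured values:

```python
import random

# Hypothetical timing constants (seconds); the thesis does not publish
# exact service times, so these are illustrative only.
LOCAL_SERVICE = 0.8    # time to answer from the local cell
REMOTE_SERVICE = 1.2   # time to answer via a neighbour cell

def simulate(num_queries, interval, deviation, p_remote=0.25, seed=0):
    """Sequentially replay queries and return the average access time.

    Queries arrive `interval` +/- `deviation` seconds apart and are
    served one at a time, so a query queues behind the previous one.
    """
    rng = random.Random(seed)
    arrival = 0.0
    server_free_at = 0.0
    total = 0.0
    for _ in range(num_queries):
        arrival += max(0.0, rng.gauss(interval, deviation))
        service = REMOTE_SERVICE if rng.random() < p_remote else LOCAL_SERVICE
        start = max(arrival, server_free_at)   # wait until the server is free
        server_free_at = start + service
        total += server_free_at - arrival      # waiting time + service time
    return total / num_queries
```

With an interval of 1 second the server keeps up (off-load); with 0.1 seconds queries queue and the average access time grows, matching the high-load definition above.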
The details for all cases are explained below:
Case 1 compares the performance of the local index with the conventional ap-
proach using the same query interval time. The aim of this case is to show the
average access time for different numbers of requests. The query interval time
used in the simulation is 1 second between queries, with a deviation of 0.1
seconds.
Figure 6.18: Average access time between proposed vs conventional approaches
Figure 6.18 shows the simulated average access time for the conventional and
the proposed local index approaches. In the graph, the conventional approach
took longer to process queries than the proposed local index. Where the graph
for the proposed approach rises above the conventional one, more of the queries
required data items to be retrieved from the neighbour cells, which takes
additional time.
Case 2 compares the performance of the conventional and the proposed local
index approaches in a high-load situation. The aim is to show that the proposed
local index still outperforms the conventional approach under high load. The
query interval time ranges from 0.1 to 1 second with a deviation of 0.1 seconds.
The simulations use 50 and 150 queries.
Figures 6.19 and 6.20 show the average access time in a high-load situation for
50 and 150 requests respectively. In both figures, the query interval time
ranges from 0.1 to 1 second. In most cases, the proposed approach outperforms
the conventional approach, as shown in both figures.
Figure 6.19: Average access time for the proposed Local Index vs the conventional approaches (50 Requests)
Figure 6.19 shows the average access time for 50 requests. The average access
time of one query in this simulation ranges from 0.83 to 1.2 seconds for the
conventional approach, and from 0.81 to 1 second for the proposed approach.
Figure 6.20: Average access time for the proposed Local Index vs the conventional approaches (150 Requests)
Figure 6.20 presents the average access time for 150 requests. The average access
time for both the conventional and the proposed approaches is between 0.89 and
1.1 seconds. The interesting point in this graph is a turning point: the
conventional approach performs slightly better than the proposed approach once
the query interval time exceeds 0.7 seconds. This is because the number of
queries retrieving data items from different cells is then higher than the
number of queries that retrieve data items directly from the local cell.
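The behaviour behind this turning point — answer from the local index when possible, otherwise fetch the entry from a neighbour cell and cache it in the bounded extra slots — can be sketched as follows. The class names and the LRU eviction choice are illustrative assumptions, not the thesis's exact data structure:

```python
from collections import OrderedDict

class LocalIndex:
    """Sketch of the local-index idea: a base station keeps its own
    index plus a small, bounded cache of remote index entries (the
    thesis caps the extra slots at 20 percent of the local slots)."""

    def __init__(self, local_items, extra_ratio=0.2):
        self.local = set(local_items)
        self.capacity = max(1, int(len(local_items) * extra_ratio))
        self.remote_cache = OrderedDict()          # item -> owning cell

    def lookup(self, item, neighbours):
        """Return (answering_cell, was_remote_fetch)."""
        if item in self.local:
            return "local", False
        if item in self.remote_cache:
            self.remote_cache.move_to_end(item)    # refresh recency
            return self.remote_cache[item], False
        # Miss: ask the neighbour cells, then cache the answer.
        for cell, items in neighbours.items():
            if item in items:
                if len(self.remote_cache) >= self.capacity:
                    self.remote_cache.popitem(last=False)   # evict LRU
                self.remote_cache[item] = cell
                return cell, True
        return None, True
```

Under high load with many distinct remote items, the small cache thrashes and most lookups still pay the neighbour-fetch cost, which is consistent with the turning point observed above.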
Global Index Simulation Results
In this experiment, we compare the performance of the global index and conven-
tional approaches. We used two cells, each holding 30 records indexed with an
R-tree structure. Both cells have the same dimensions, 50 x 50 units. The
global index, in contrast, contains the 60 records of both cells, again indexed
with an R-tree structure.
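The global index used here can be pictured as one merged structure that tags every record with its owning cell, grows when a base station comes online and shrinks when one goes offline. The thesis uses an R-tree; the flat list below is a self-contained stand-in, and all names are illustrative:

```python
class GlobalIndex:
    """Minimal stand-in for the global index: one structure holding
    every cell's records tagged with the owning cell. A real deployment
    would use an R-tree; a flat list keeps the sketch self-contained."""

    def __init__(self):
        self.entries = []          # (x, y, record_id, cell_id)

    def merge_cell(self, cell_id, records):
        """Called when a base station comes online and propagates its index."""
        self.entries.extend((x, y, rid, cell_id) for x, y, rid in records)

    def drop_cell(self, cell_id):
        """Called when a base station goes offline (the 'shrinking' step)."""
        self.entries = [e for e in self.entries if e[3] != cell_id]

    def range_query(self, xmin, ymin, xmax, ymax):
        """Answer a multi-cell range query without contacting other cells."""
        return [(rid, cell) for x, y, rid, cell in self.entries
                if xmin <= x <= xmax and ymin <= y <= ymax]
```

A range query over the merged entries answers multi-cell queries locally, which is the source of the speed-up reported below.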
The following cases have been set up for our experiments:
Case 1 compares the average access time between the proposed global index and
the conventional approach for answering one query when the number of requests
varies. The same query is run several times and the average access time is
calculated in order to obtain the access time for a single query.
Figure 6.21: Average access time for a single query
Figure 6.21 presents the average access time taken to answer a single query
when the number of requests varies. In most cases, the average access time for
the conventional approach is about twice that of the proposed global index:
around 6.5 seconds for the conventional approach versus around 2.8 seconds for
the proposed approach.
Case 2 compares the average access time between the proposed global index and
the conventional approach for answering one query where the data items are not
replicated. The setting is similar to the previous scenario, except that the
data items are not replicated to wherever the indexes are replicated.
Figure 6.22 shows the experiment results for a single-query access type. As
shown in the figure, the query access time of the conventional approach is
longer than that of our proposed approach.
Figure 6.22: Average access time for a single query: remote indexes only.
When the number of requests is varied, the average access time to answer a
single query using the conventional approach is mostly about one and a half
times that of the proposed global index: around 6.5 seconds for the conventional
approach versus around 4 seconds for the proposed approach.
6.3.2 Simulation Results for Client Caching
This section describes the simulation results for the proposed client caching
approaches. Table 6.8 lists the settings for the proposed cache replacement
policies. The total number of database records at the server side is 2000. The
server answers 5000 queries, each with a query size of 3 x 3. On the client
side, the cache has 100 slots, where each slot is assumed to hold a single
object. The expected number of objects received for each query is 1-40. The
cached objects form a number of groups, where each group contains the cached
objects within an epsilon range. The minimum points range from 1 to 10, and the
epsilon ranges from 1 to 10 units of distance. The group area is the area
occupied by each group; it is used only by the proposed Cost-based cache
replacement.
Table 6.8: Experiment settings for client cache
Parameter                  Value
DB Records                 2000 records
Query Scope                3 x 3
Cache Size                 100
Group area                 50
Epsilon                    1-10
Minimum points             1-10
Average Requested Objects  1-40
Total Queries              5000
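The epsilon and minimum-points parameters in Table 6.8 correspond to a density-based grouping of the cached objects. A minimal sketch of such a grouping is given below; the treatment of sparse points as singleton groups is our assumption, since the thesis does not spell out how ungrouped objects are handled:

```python
def group_cached_objects(points, eps, min_pts):
    """Density-based grouping of cached objects (DBSCAN-style), matching
    the epsilon / minimum-points parameters of Table 6.8. Returns a list
    of groups, each a sorted list of point indices; sparse points form
    singleton groups so every cached object remains evictable."""
    def near(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        i = unvisited.pop()
        neigh = near(i)
        if len(neigh) < min_pts:            # not dense enough: singleton
            groups.append([i])
            continue
        group = {i}
        frontier = [j for j in neigh if j in unvisited]
        while frontier:
            j = frontier.pop()
            if j not in unvisited:
                continue
            unvisited.discard(j)
            group.add(j)
            nj = near(j)
            if len(nj) >= min_pts:          # j is a core point: expand
                frontier.extend(k for k in nj if k in unvisited)
        groups.append(sorted(group))
    return groups
```

A smaller epsilon yields more, smaller groups, which is consistent with the cache-hit behaviour discussed in the cases below.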
Figure 6.23: Comparison of cache hits with various minimum points on each group
Case 1. The experiment for this case varies the minimum number of points while
the other parameters are held constant. The epsilon value used for this
experiment is 5, the median of the epsilon range.
Figure 6.23 shows a comparison of cache hits where the minimum points range
from 1 to 10. The figure shows that our proposed approaches perform best when
the minimum points value is 9 with an epsilon value of 5. When the minimum
number of points is less than 9, the proposed Path approach performs better
than the other candidates. When the minimum number of points is greater than 9,
the performances of all proposed approaches are equal, and the cache hit rate
is higher than without caching.
Case 2. The aim of the second experiment is to discover whether the minimum
points value has any impact when the total number of requested objects
increases. In this experiment, we use a minimum points value of 5 and the
numbers of requested objects are 10, 20 and 40.
The experiment results for case 2 are shown in Figures 6.24, 6.25 and 6.26.
These three graphs show that cache hit performance drops by about half when the
number of requested objects is doubled. When the number of available slots in
the cache is insufficient, some occupied slots are freed. A higher number of
requested objects therefore requires more available slots; in other words, as
more occupied slots must be emptied, the cache hit efficiency degrades.
Figure 6.24 shows our proposed approaches when the maximum number of requested
objects is 10 and the epsilon value ranges from 1 to 10. The density-based
algorithm outperforms the competing approaches when the minimum number of
points per group is 3, 4 or 5. The path-based algorithm, on the other hand,
outperforms the rest when the epsilon value is higher.
Figure 6.25 shows the experiment results when the maximum number of requested
objects is 20. The performance of the Cost approach is better when the epsilon
is 5. This can be explained by the fact that most of the eliminated cached
objects are rarely requested. In general, a smaller epsilon value increases the
cache hit rate for all proposed approaches.
Figure 6.24: Comparison of cache hits with a maximum min req value of 10.
Figure 6.25: Comparison of cache hits with a maximum min req value of 20.
Figure 6.26: Comparison of cache hits with a maximum min req value of 40.
Figure 6.26 shows the experiment results when the maximum number of requests is
about 60 percent of the total cache slots. As the graph shows, none of our
proposed approaches performs well in general, although they perform better when
the epsilon value is small. This can be explained by the fact that a small
epsilon value causes more groups to be formed; with more groups, fewer objects
are evicted at a time.
6.4 Discussion
Efficient retrieval is essential when answering location-dependent queries,
since such queries are widely used to obtain query results at any time and
anywhere. Without efficient retrieval, it would be difficult to answer these
types of queries.
For server query processing, retrieving rare objects while reducing the number
of queries is beneficial, because it saves the power needed to generate and
transmit queries. Using a square query scope achieves this: it retrieves more
objects than a circle. Furthermore, avoiding reprocessing of the overlap area
when the user misses the query result speeds up query result delivery.
The local indexing (LI) mechanism performs better when most users request
information in the same area, because the server searches only its local index,
which is faster than requesting data from neighbour cells. LI-1, which
replicates the remote data items, performs better than LI-2; however,
replicating remote data items into local storage occupies more space.
The global index (GI) mechanism performs better when most queries request
remote data items. Its index maintenance cost is lower than that of the LI if
most BSs stay online in a stable condition. Furthermore, duplicating remote
data items improves the query response time, although it consumes considerable
space in the local cell.
For grouping in the cache replacement policy, the performance of all policies
depends on the group parameters. The cache hit rate increases when the epsilon
value is small. It decreases when the number of requested objects is large,
because a large number of requested objects consumes a large amount of cache
space, which causes frequent eviction of cached objects.
Among the cache replacement policies, all proposed policies perform the same
when the epsilon value is small, while the path-based policy outperforms the
others for large epsilon values. On average, the density-based policy performs
better than the path-based one. The PDAID policy does not deliver much
performance gain, because it depends on multiple factors. On the other hand,
when the minimum points of each group increase with a constant epsilon value,
the density-based policy is better than the other two candidates, whereas PDAID
performs worst amongst all candidates.
6.5 Conclusion
In this chapter, we have described the performance evaluation of our proposed
approaches, focusing on the physical design, steps and implementations of the
proposed approaches, with object retrieval from a single cell to multiple
cells. The details of the proposed approaches were elaborated in Chapters 3 to
5. The evaluation results have shown the advantages of applying the proposed
algorithms to mobile query processing.
Section 6.1 shows the performance of our proposed approaches as measured
through an implementation. Section 6.2 then presents our proposed approaches
evaluated using a simulation package. At the start of the simulation section,
we demonstrated that the simulation package produces the same results as the
implementation; hence, the simulation was used to evaluate our last two
approaches.
We have tested our algorithms for object retrieval in both single and multiple
cells. The square query scope showed better performance than other shapes: it
occupies less storage space to store the query scope and has a better chance of
retrieving rare objects. In addition, our approach proved efficient when
forming a valid scope boundary on the neighbour cells while retrieving objects
from multiple cells.
In our experiments, we have shown that our proposed indexing mechanisms improve
the speed of the conventional mechanism. The performance tests on indexing
structures in location-dependent query processing indicate that the processing
times of the individual and partial global indexing approaches differ, and the
gap increases as the amount of popular data increases. Moreover, the data
transfer times of individual and partial global indexing show contrasting
differences. Finally, the execution time improves by around 30 percent with the
partial global indexing approach compared with the individual indexing
approach.
For the client cache replacement policies, the performance of our proposed
approaches depends on the grouping policy. Three proposed cache replacement
policies were evaluated in this chapter. In general, their performance is
affected by the number of requested objects and the epsilon value of a group.
Cache hit performance improves when both the number of requested objects and
the epsilon value of a group are small. If the grouping policy puts fewer
cached objects in one group, the proposed approaches increase the cache hit
counter.
Chapter 7
Conclusion and Future Work
7.1 Overview
This thesis investigated mobile query processing at both the server and client
sides. The main purpose of this research was to study the performance of mobile
query processing on both sides and to build on top of traditional query
processing mechanisms so that they become adjustable to the mobile environment.
Attention is focused on three major areas of the query processing scheme: query
processing at the server side, indexing for multi-cell queries, and cache
replacement at the client side. The investigations include developing models
for the number of data items requested, index structures, and cache hit
performance. In addition, performance results were evaluated using both
implementation and simulation.
7.2 Summary of Research Results
The main research result of this thesis shows how mobile query processing can
be done at the server and client sides in a way that maximises system
efficiency and overcomes the limitations of mobile devices. This is achieved by
combining assorted types of query processing mechanisms for the various mobile
queries that may occur.
The first part of our research minimises the number of requests. To achieve
this, various algorithms were designed to deal with situations where mobile
users miss the query result, taking into account the user's location and the
query size.
Indexing is an important problem to solve, especially when servers process multi-
cell queries. The aim is to minimise the number of visited nodes in order to improve
query access time.
Minimising communication cost is always a primary consideration in the mobile
environment. This can be achieved by retrieving requested items from local
storage. Hence, a client cache replacement policy has also been investigated in
this thesis. Another purpose of investigating the client cache replacement
policy is to overcome limitations of mobile devices, in particular small screen
displays and limited storage. Reducing the number of requested objects based on
user satisfaction is the way to handle these limitations.
The achievements of this research are summarised as follows:
• Query Processing at Server Side
The main motivation in proposing server query processing is to process mo-
bile queries by considering the movement factors of mobile users. The aim of
this contribution is to reduce data transfer by retrieving objects that are
located in the direction in which mobile users are travelling. A further
purpose is to retrieve rare objects while reducing the number of requests to
the server.
The query processing mechanisms in this contribution are divided into three
parts: single cell, multiple cells and handling disconnections. We further
divide the single-cell case into three categories: static, dynamic and angle of
movement.
In the static category, the query scope is parallel to the base station
location, and three algorithms were proposed for dealing with this situation,
retrieving objects based on horizontal, vertical and diagonal movement. In the
dynamic approach, the query scope is perpendicular to the mobile user's
direction. The angle-of-movement approach is similar to the diagonal case of
the static approach, except that it focuses on the actual angle of movement of
mobile users.
Multi-cell query processing, on the other hand, focuses on retrieval from
several cells that are covered by a query scope. In this case, we also consider
overlapping and non-overlapping cell areas in order to avoid duplicated
objects. We then modify the single-cell retrieval algorithm to adapt it to
multi-cell query processing.
In handling the disconnection issue, we identified several disconnection
situations and proposed algorithms for each. The aim of this part is to decide
whether a server needs to keep the existing query result or generate a new one.
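The angle-of-movement idea above — shaping the query scope so that retrieved objects lie in the user's direction of travel — can be sketched with a square scope shifted along the heading. Centring the square one half-side ahead of the user is an illustrative choice for this sketch, not the thesis's exact construction:

```python
import math

def movement_scope(x, y, angle_deg, side):
    """Place a square query scope of side `side` ahead of a user at
    (x, y) moving along heading `angle_deg` (degrees, 0 = east).

    Returns the scope as an axis-aligned box (xmin, ymin, xmax, ymax),
    so that retrieved objects lie in the direction of travel.
    """
    ang = math.radians(angle_deg)
    cx = x + (side / 2) * math.cos(ang)   # centre shifted along heading
    cy = y + (side / 2) * math.sin(ang)
    half = side / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

For a user at (45, 25) moving east with a 6-unit scope, the box extends from the user's position forward to x = 51, covering only objects ahead of the movement.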
• Indexing Mechanisms for Multi-Cell Queries
Some researchers have developed indexing mechanisms for the non-mobile
environment. This thesis did not aim at developing a new indexing structure;
rather, it studied the behaviour of existing index structures and developed new
algorithms that use them to answer multi-cell queries. A multi-cell query is a
query that asks for a certain area covering multiple base stations. The purpose
is to improve query processing time by avoiding the sending of requests to
other cells.
Our indexing mechanisms are twofold: Local and Global indexes. As the names
suggest, the first is an investigation into storing requested remote indexes at
the current server. In this case, we extend the existing index tree of one cell
by adding requested indexes from surrounding cells. The index tree cannot be
expanded to cope with all remote indexes; however, it is allowed to grow to a
certain size. The second approach creates a global index covering the indexes
of all available cells: when a base station comes online, it propagates its
tree to the surrounding cells. In this case, shrinking occurs only when the
base station goes offline. This is a necessary condition to ensure the
consistency of the indexes held in the global index structure.
• Cache Replacement Policies for Client Cache
The last contribution of this thesis is the development of three client cache
replacement policies. We borrow an existing grouping algorithm to group cached
objects into several groups. When the cache needs to free some cached objects,
one of the proposed cache replacement policies can be applied to eliminate a
group of cached objects. The aim is to increase the usage performance of the
client cache and thereby reduce communication costs to the server. Handling the
limitations of small-screen mobile devices is another purpose of this
contribution.
The three cache replacement policies are Path-based, Density-based and
Probability Density Area Inverse Distance (PDAID). The first policy considers
the distances to all groups in order to choose a group of cached objects to
eliminate. The second evicts the group that has the least total number of
objects. The last removes a group based on a cost calculated from several
factors.
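The three policies can be contrasted in a small sketch. The group representation and the PDAID cost formula below are illustrative assumptions — the thesis computes its cost from several weighted factors detailed in the earlier chapters:

```python
import math

def pick_victim_group(groups, user_pos, policy):
    """Choose which group of cached objects to evict.

    Each group is a dict with 'centre' (x, y), 'objects' (a count) and
    'area'. The group layout and the PDAID cost are illustrative only.
    """
    ux, uy = user_pos

    def dist(g):
        gx, gy = g["centre"]
        return math.hypot(gx - ux, gy - uy)

    if policy == "path":
        # Path-based: evict the group farthest from the user's position.
        return max(groups, key=dist)
    if policy == "density":
        # Density-based: evict the group with the fewest cached objects.
        return min(groups, key=lambda g: g["objects"])
    if policy == "pdaid":
        # PDAID: evict the group with the lowest density-per-distance cost.
        return min(groups,
                   key=lambda g: (g["objects"] / g["area"]) / (1 + dist(g)))
    raise ValueError("unknown policy: %s" % policy)
```

Evicting a whole group at once, rather than individual objects, is what ties the policies' performance to the epsilon and minimum-points grouping parameters studied in Chapter 6.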
7.3 Future Research
This section discusses several possible investigations that can be done in mobile
query processing.
Multiple-Source Query Processing is a future topic in which a mobile client
requests data from several sources rather than a single source. The challenge
is how the requested data from several sources are joined for the mobile user:
the join could be performed either on the mobile device or at the server side.
Due to the nature of the mobile environment, several factors must be
considered, such as frequent disconnections, small storage space and low
network bandwidth.
Answering Moving-Object Queries is another future challenge for our research
topic. The problem is that the searched objects dynamically move to other
locations; thus, if we store these objects in a cache, some cached objects may
need to be preserved during cache elimination. Can our proposed approaches be
extended to cope with this situation? A second challenge is indexing moving
objects, which extends our proposed indexing: the challenge is to create an
indexing structure flexible enough to index moving objects.
Continuous Query Processing is another future investigation: extending our
proposed cache replacement policies to answer continuous queries locally. The
problem is how to model a replacement policy that preserves the required cached
objects.
Caching Management for Query Processing. Throughout this thesis, we have not
considered other aspects of caching, such as middleware object caching, which
stores objects that are not database objects. It would, however, be beneficial
to look at storing such objects as part of the implementation.
References
[1] Aberdeen [2005]. The Mobile Field Service Solution Selection Report,
http://www.mobiletechlink.com/. Last accessed: 02/04/08.
[2] Acharya, D., Kumar, V. and Yang, G.-C. [2007]. DAYS mobile: A Location
Based Data Broadcast Service for Mobile Users, SAC ’07: Proceedings of the
2007 ACM Symposium on Applied Computing, ACM, New York, NY, USA,
pp. 901–905.
[3] Aggarwal, C., Wolf, J. and Yu, P. [1999]. Caching on the World Wide Web,
IEEE Transactions on Knowledge and Data Engineering 11(1): 428–441.
[4] Agrawal, D. P. and Zeng, Q.-A. [2006]. Introduction to Wireless and Mobile
Systems, 2nd edn, Thomson Engineering.
[5] Agrawal, P. and Famolari, D. [1999a]. Mobile Computing in Next Generation
Wireless Networks, Proceedings of the 3rd International Workshop on Discrete
Algorithms and Methods for Mobile Computing and Communications pp. 32–39.
[6] Agrawal, P. and Famolari, D. [1999b]. Mobile Computing in Next Generation
Wireless Networks, DIALM ’99: Proceedings of the 3rd International Workshop
on Discrete Algorithms and Methods for Mobile Computing and Communica-
tions, ACM Press, pp. 32–39.
232
REFERENCES 233
[7] Ahamad, M. [1999]. Scalable Consistency Protocols For Distributed Services,
IEEE Transactions on Parallel and Distributed Systems 10(9): 888–903.
[8] Akbarinia, R., Martins, V., Pacitti, E. and Valduriez, P. [2007]. Top-K Query
Processing in the APPA P2P System, 7th International Conference on High
Performance Computing for Computational Science, Vol. 4395 of Lecture Notes
in Computer Science, SPRINGER, pp. 158–171.
[9] Akbarinia, R., Pacitti, E. and Valduriez, P. [2006]. Reducing Network Traffic in
Unstructured P2P Systems Using Top-K Queries, Distrib. Parallel Databases
19(2-3): 67–86.
[10] Barbara, D. and Imielinski, T. [1994]. Sleepers and Workaholics: Caching
Strategies for Mobile Environments, SIGMOD ’94: Proceedings of the 1994
ACM SIGMOD International Conference on Management of Data, ACM,
pp. 1–12.
[11] Beckmann, N., Kriegel, H.-P., Schneider, R. and Seeger, B. [1990]. The R*-tree:
An Efficient and Robust Access Method for Points and Rectangles, SIGMOD
’90: Proceedings of the 1990 ACM SIGMOD International Conference on Man-
agement of Data, ACM Press, New York, NY, USA, pp. 322–331.
[12] Benetis, R., Jensen, C., Karciauskas, G. and Altenis, S. [2002]. Nearest Neigh-
bour and Reverse Nearest Neighbour Queries for Moving Objects, International
Database Engineering and Applications Symposium pp. 44–53.
[13] Bentley, J. L. and Friedman, J. H. [1979]. Data Structures for Range Searching,
ACM Computing Surveys 11(4): 397–409.
[14] Bluetooth [2008]. http://www.bluetooth.com. Last accessed: 02/04/08.
REFERENCES 234
[15] Bruno, N., Gravano, L. and Marian, A. [2002]. Evaluating Top-K Queries
Over Web-Accessible Databases, Proceedings of 18th International Conference
on Data Engineering pp. 369–380.
[16] Burak, A. and Sharon, T. [2004]. Usage Patterns of FriendZone: Mobile
Location-Based Community Services, MUM ’04: Proceedings of the 3rd Inter-
national Conference on Mobile and Ubiquitous Multimedia, ACM, New York,
NY, USA, pp. 93–100.
[17] Cai, Y. and Hua, K. A. [2002]. An Adaptive Query Management Technique for
Real-Time Monitoring of Spatial Regions in Mobile Database Systems, PCC
’02: Proceedings of the Performance, Computing, and Communications Confer-
ence, 2002. on 21st IEEE International, IEEE Computer Society, Washington,
DC, USA, pp. 259–266.
[18] Chand, N., Joshi, R. and Misra, M. [2006]. Data Profit Based Cache Replace-
ment in Mobile Environment, IFIP International Conference on Wireless and
Optical Communications Networks.
[19] Cho, S. G., Jeong, H. K. and Ma, J. S. [2003]. Performance Optimization
Technique of Location Registration in Public Transportation, Mobile Commu-
nications: 7th CDMA International Conference, pp. 49–69.
[20] Chrysanthis, P. K. and Pitoura, E. [2000]. Mobile and Wireless Database Access
for Pervasive Computing, Proceedings of the 16th International Conference on
Data Engineering, pp. 694–695.
[21] Clarke, I., Sandberg, O., Wiley, B. and Hong, T. W. [2001]. Freenet: A Dis-
tributed Anonymous Information Storage and Retrieval System, International
REFERENCES 235
Workshop on Designing Privacy Enhancing Technologies, Springer-Verlag New
York, Inc., New York, NY, USA, pp. 46–66.
[22] Dar, S., Franklin, M. J., Jonsson, B. T., Srivastava, D. and Tan, M. [1996].
Semantic Data Caching and Replacement, Proceedings of the 22th International
Conference on Very Large Data Bases (VLDB ’96), Mumbai (Bombay), India,
pp. 330–341.
[23] DasBit, S. and Mitra, S. [2003]. Challenges of Computing in Mobile Cellular
Environment: A Survey, Computer Communications 26(1): 2090–2105.
[24] Davis, W. [2001]. Motorola-Wireless Technology Trends,
http://www.ecedha.org/2000-01/agenda.html. Last accessed: 02/04/08.
[25] Deng, B., Jia, Y. and Yang, S. [2006]. Supporting Efficient Distributed Top-K
Monitoring, WAIM, pp. 496–507.
[26] DeRose, J. F. [2002]. The Wireless Data Handbook, 4th edn, Wiley-Interscience,
chapter 6.
[27] Ding, R. and Meng, X. [2001]. A Quadtree Based Dynamic Attribute Index
Structure and Query Process, Proceedings. 2001 International Conference on
Computer Networks and Mobile Computing pp. 446–451.
[28] Ding, X., Lu, Y., Ding, X., Zhao, N. and Wei, Q. [2007]. An Efficient Index for
Moving Objects with Frequent Updates, WiCom 2007. International Confer-
ence on Wireless Communications, Networking and Mobile Computing, 2007.
pp. 5946–5949.
[29] Dulaney, J. [2008]. The Evolvement of 3G Mobile: Introduction of Third Gener-
ation Cell Phones, http://www.planetomni.com/ARTICLES-The-Evolvement-
of-3G-Mobile.shtml. Last accessed: 02/04/08.
REFERENCES 236
[30] Dunham, M. H. and Kumar, V. [1999]. Impact of Mobility on Transaction
Management, MobiDe ’99: Proceedings of the 1st ACM International Workshop
on Data Engineering for Wireless and Mobile Access, ACM Press, pp. 14–21.
[31] El-Ghazaly, S. and Golio, M. [1996]. Challenges in Modern Wireless Personal
Communications, Radio Science Conference, 1996. 29: 39–51.
[32] Elmasri, R. and Navathe, S. [2004]. Fundamentals of Database Systems, 4th
edn, Addison-Wesley.
[33] Engerman, G. and Kearney, L. [1998]. Effective Use of Wireless Data Commu-
nications, International Journal Of Network Management 8: 2–11.
[34] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. [1996]. A Density-Based Al-
gorithm for Discovering Clusters in Large Spatial Databases with Noise, Pro-
ceedings of Second International Conference on Knowledge Discovery and Data
Mining, pp. 226–231.
[35] Feuerstein, M. and Rappaport, T. [1993]. Wireless Personal Communications,
Kluwer Academic Publishers.
[36] Franklin, M. and Carey, M. [1992]. Client-Server Caching Revisited, Proceed-
ings of the International Workshop on Distributed Object Management, ACM,
pp. 57–78.
[37] Gaede, V. and Gunther, O. [1998]. Multidimensional Access Methods, ACM
Computing Surveys 30(2): 170–231.
[38] Gast, M. [2005]. 802.11 Wireless Networks: The Definitive Guide, 2nd edn,
OReilly & Associates, Inc.
REFERENCES 237
[39] Gu, H., Shi, Y., Xu, G. and Chen, Y. [2005]. A Core Model Support-
ing Location-Aware Computing in Smart Classroom, Advances in Web-Based
Learning - ICWL 2005, 4th International Conference, Vol. 3583 of Lecture
Notes in Computer Science, SPRINGER, pp. 1–13.
[40] Guo, J., Guo, W. and Zhou, D. [2006]. Indexing of Constrained Moving Ob-
jects for Current and Near Future Positions in GIS, First International Multi-
Symposiums on Computer and Computational Sciences (IMSCCS ’06) 2: 504–
509.
[41] Guttman, A. [1984]. A Dynamic Index Structure for Spatial Searching, Pro-
ceedings of the 1984 ACM SIGMOD International Conference on Management
of data, ACM, pp. 47–57.
[42] Hadjieleftheriou, M., Kollios, G., Gunopulos, D. and Tsotras, V. J. [2003].
On-Line Discovery of Dense Areas in Spatio-Temporal Databases, Advances in
Spatial and Temporal Databases, LNCS 2750, pp. 306–324.
[43] He, Y., Shu, Y., Wang, S. and Du, X. [2004]. Efficient Top-K Query Processing
in P2P Network, Database and Expert Systems Applications, 15th International
Conference, DEXA 2004, Vol. 3180 of Lecture Notes in Computer Science,
SPRINGER, pp. 381–390.
[44] Helal, A., Haskell, B., Carter, J. L., Brice, R., Woelk, D. and Rusinkiewicz,
M. [2002]. Any Time, Anywhere Computing: Mobile Computing Concepts and
Technology, Vol. 522, Springer Netherlands, chapter 1-2.
[45] Hosbond, J., Saltenis, S. and Ortoft, R. [2003]. Indexing Uncertainty of Contin-
uously Moving Objects, Database and Expert Systems Applications pp. 911–915.
REFERENCES 238
[46] Hu, H., Xu, J., Wong, W. S., Zheng, B., Lee, D. L. and Lee, W.-C. [2005].
Proactive Caching for Spatial Queries in Mobile Environments, Proceedings of
the 21st International Conference on Data Engineering (ICDE ’05), pp. 403–
414.
[47] Hu, J., Xu, J., Lee, D. and Lee, W. [2004]. Performance Evaluation of an
Optimal Cache Replacement Policy for Wireless Data Dissemination, IEEE
Transactions on Knowledge and Data Engineering 16(1): 125–139.
[48] Hung, H.-P., Chuang, K.-T. and Chen, M.-S. [2007]. Efficient Process of Top-K
Range-Sum Queries over Multiple Streams with Minimized Global Error, IEEE
Transactions on Knowledge and Data Engineering 19(10): 1404–1419.
[49] HUTCHISON [2006]. Third Generation Mobile Phones,
http://www.three.com/. Last accessed: 02/04/08.
[50] Imielinski, T. and Badrinath, B. [1992]. Querying in Highly Mobile Distributed
Environments, Proceedings of the 18th Very Large Data Bases Conference,
pp. 41–52.
[51] Jing, J., Helal, A. and Elmagarmid, A. [1999]. Client-Server Computing in
Mobile Environments, ACM Computing Surveys 31(2): 117–157.
[52] Keller, A. M. and Basu, J. [1996]. A Predicate Based Caching Scheme for
Client-Server Database Architectures, The VLDB Journal 5(2): 35–47.
[53] Kim, Y. K. and Prasad, R. [2006]. 4G Roadmap and Emerging Communication
Technologies, Artech House Publishers.
[54] Knuth, D. [1997]. Sorting and Searching, Vol. 3, 3rd edn, Addison-Wesley.
[55] Kollios, G., Gunopulos, D. and Tsotras, V. J. [1999]. On Indexing Mobile
Objects, PODS ’99: Proceedings of The Eighteenth ACM SIGMOD-SIGACT-
SIGART Symposium on Principles of Database Systems, ACM, New York,
USA, pp. 261–272.
[56] Küpper, A. [2005]. Location-Based Services: Fundamentals and Operation, John
Wiley & Sons Ltd.
[57] Kumar, A., Misra, M. and Sarje, A. K. [2006]. A Predicted Region Based
Cache Replacement Policy for Location Dependent Data in Mobile Environ-
ment, WiCOM 2006:International Conference on Wireless Communications,
Networking and Mobile Computing, pp. 1–4.
[58] Kumar, A., Misra, M. and Sarje, A. K. [2007]. A Weighted Cache Replace-
ment Policy for Location Dependent Data in Mobile Environments, SAC ’07:
Proceedings of the 2007 ACM symposium on Applied computing, ACM Press,
pp. 920–924.
[59] Kwon, D., Lee, S. and Lee, S. [2002]. Indexing the Current Positions of Moving
Objects Using the Lazy Update R-tree, MDM ’02: Proceedings of the Third In-
ternational Conference on Mobile Data Management, IEEE Computer Society,
Washington, DC, USA, pp. 113–120.
[60] Lai, K. Y., Tari, Z. and Bertok, P. [2004a]. Mobility-Aware Cache Replacement
for Users of Location-Dependent Services, Technical report, RMIT School of CS
& IT.
[61] Lai, K. Y., Tari, Z. and Bertok, P. [2004b]. Mobility-Aware Cache Replacement
for Users of Location-Dependent Services, LCN ’04: Proceedings of the 29th
Annual IEEE International Conference on Local Computer Networks, pp. 50–
58.
[62] Lee, C.-I. and Tsai, C.-J. [2001]. An Efficient Approach to Extracting and
Ranking The Top-K Interesting Target Ranks From Web Search Engines, In-
formatica (Slovenia) 25(3).
[63] Lee, D. L., Xu, J., Zheng, B. and Lee, W.-C. [2002]. Data Manage-
ment in Location-Dependent Information Services, IEEE Pervasive Computing
1(3): 65–72.
[64] Lee, K. C., Lee, W.-C., Zheng, B. and Xu, J. [2006]. Caching Complementary
Space for Location-Based Services, Advances in Database Technology - EDBT
2006, LNCS 3896/2006, pp. 1020–1038.
[65] Li, Z., He, P. and Lei, M. [2005]. Research of Semantic Caching for Location
Dependent Query in Mobile Network, ICEBE ’05: Proceedings of the IEEE
International Conference on e-Business Engineering, IEEE Computer Society,
Washington, DC, USA, pp. 511–517.
[66] Lim, S. Y., Taniar, D. and Srinivasan, B. [2005]. On-Mobile Query Process-
ing Incorporating Multiple Non-Collaborative Servers, Ingenierie des Systemes
d’Information 10(5): 9–38.
[67] Liu, Z. [2005]. Dynamical Mobile Terminal Location Registration in Wireless
PCS Networks, IEEE Transactions on Mobile Computing 4(6): 630–640.
[68] Lo, E., Mamoulis, N., Cheung, D., Ho, W. and Kalnis, P. [2003]. Processing
Ad-Hoc Joins on Mobile Devices, Technical report, The University of Hong
Kong.
[69] Lo, E., Mamoulis, N., Cheung, D. W.-L., Ho, W.-S. and Kalnis, P. [2004].
Processing Ad-Hoc Joins on Mobile Devices, Database and Expert Systems Ap-
plications, 15th International Conference, DEXA 2004, Vol. 3180 of Lecture
Notes in Computer Science, pp. 611–621.
[70] Lodge, J. H. [1991]. Mobile Satellite Communications Systems - Toward Global
Personal Communications, IEEE Communications Magazine 29: 24–30.
[71] Lunde, T. and Mjøvik, E. [2000]. Mobile Communication Technologies: Tech-
nical Capabilities and Time-to-Market, Technical Report IMEDIA/01/00, Nor-
wegian Computing Center.
[72] Manesis, T. and Avouris, N. [2005]. Survey of Position Location Techniques
in Mobile Systems, Proceedings of the 7th International Conference on Human
Computer Interaction With Mobile Devices and Services pp. 291–294.
[73] Markopoulos, A., Pissaris, P., Kyriazakos, S. and Sykas, E. [2004]. Efficient
Location-Based Hard Handoff Algorithms for Cellular Systems, NETWORK-
ING 2004, Networking Technologies, Services, and Protocols; Performance of
Computer and Communication Networks; Mobile and Wireless Communica-
tions pp. 476–489.
[74] Marsit, N., Hameurlain, A., Mammeri, Z. and Morvan, F. [2005]. Query Pro-
cessing in Mobile Environments: a Survey and Open Problems, 1st Inter-
national Conference on Distributed Frameworks for Multimedia Applications
pp. 150–157.
[75] mathsisfun.com [2006]. Area of Plane Shapes,
http://www.mathsisfun.com/area.html. Last accessed: 02/04/08.
[76] Matsunam, H., Terada, T. and Nishio, S. [2005]. A Query Processing Mecha-
nism for Top-K Query in P2P Networks, 21st International Conference on Data
Engineering Workshops pp. 1240–1244.
[77] Metwally, A., Agrawal, D. and Abbadi, A. E. [2005]. Efficient Computation
of Frequent and Top-K Elements in Data Streams, Proceedings of the 10th
International Conference on Database Theory (ICDT ’05), Vol. 3363 of Lecture
Notes in Computer Science, Springer, pp. 398–412.
[78] Michel, S., Triantafillou, P. and Weikum, G. [2005]. KLEE: A Framework
for Distributed Top-K Query Algorithms, Proceedings of the 31st International
Conference on Very Large Data Bases, pp. 637–648.
[79] Mobile Computing & Wireless LANs [2001].
http://www.mobileinfo.com/Wireless LANs/index.htm. Last accessed:
02/04/08.
[80] Nelson, R. C. and Samet, H. [1986]. A Consistent Hierarchical Representa-
tion for Vector Data, Proceedings of the 13th Annual Conference on Computer
Graphics and Interactive Techniques (SIGGRAPH ’86), ACM Press, New York,
NY, USA, pp. 197–206.
[81] Overview of Wireless Technologies [2004]. http://wireless.utk.edu/overview.html.
Last accessed: 02/04/08.
[82] Parry, R. [2002]. Overlooking 3G, IEEE Potentials 21(4): 6–9.
[83] Peng, W.-C. and Chen, M.-S. [2005]. Query Processing in A Mobile Computing
Environment: Exploiting The Features of Asymmetry, IEEE Transactions on
Knowledge and Data Engineering 17(7): 982–996.
[84] Perry, M., O’hara, K., Sellen, A., Brown, B. and Harper, R. [2001]. Dealing
with Mobility: Understanding Access Anytime, Anywhere, ACM Transactions
on Computer-Human Interaction (TOCHI) 8(4): 323–347.
[85] Pfoser, D. and Jensen, C. [2001]. Querying The Trajectories of On-Line Mo-
bile Objects, Proceedings of the 2nd ACM International Workshop on Data
Engineering for Wireless and Mobile Access, ACM, pp. 66–73.
[86] Pfoser, D., Jensen, C. S. and Theodoridis, Y. [2000]. Novel Approaches in Query
Processing for Moving Object Trajectories, Proceedings of 26th International
Conference on Very Large Data Bases (VLDB ’00), pp. 395–406.
[87] Pissinou, N., Makki, K. and Campbell, W. J. [1999]. On The Design of a Loca-
tion and Query Management Strategy for Mobile and Wireless Environments,
Computer Communications 22(7): 651–666.
[88] Pitoura, E. and Samaras, G. [1998]. Data Management for Mobile Computing,
Kluwer Academic Publishers, London.
[89] Planimate© Website [2007]. http://www.planimate.com/. Last accessed:
02/04/08.
[90] Porkaew, K., Lazaridis, I. and Mehrotra, S. [2001]. Querying Mobile Objects
in Spatio-Temporal Databases, Proceedings of the 7th International Symposium
on Advances in Spatial and Temporal Databases (SSTD ’01), Springer-Verlag,
London, UK, pp. 59–78.
[91] Prabhakar, S., Xia, Y., Kalashnikov, D., Aref, W. and Hambrusch, S. [1999].
Query Indexing and Velocity Constrained Indexing: Scalable Techniques for
Continuous Queries on Moving Objects, IEEE Transactions on Computers
51(10): 1124–1140.
[92] Priyantha, N. B., Chakraborty, A. and Balakrishnan, H. [2000]. The Cricket
Location-Support System, Proceedings of the 6th Annual International Confer-
ence on Mobile Computing and Networking (MobiCom ’00), ACM, New York,
NY, USA, pp. 32–43.
[93] Ramakrishnan, R. and Gehrke, J. [2002]. Database Management Systems, 3rd
edn, McGraw-Hill Science/Engineering/Math.
[94] Ren, Q. and Dunham, M. [2000]. Using Semantic Caching to Manage Loca-
tion Dependent Data in Mobile Computing, Proceedings of the Sixth annual
International Conference on Mobile Computing and Networking pp. 210–221.
[95] Ren, Q. and Dunham, M. H. [1999]. Using Clustering for Effective Manage-
ment of A Semantic Cache in Mobile Computing, Proceedings of the 1st ACM
International Workshop on Data Engineering for Wireless and Mobile Access
(MobiDe ’99), ACM Press, New York, NY, USA, pp. 94–101.
[96] Roussopoulos, N., Kelley, S. and Vincent, F. [1995]. Nearest Neighbour Queries,
Proceedings of the 1995 ACM SIGMOD International Conference on Manage-
ment of Data (SIGMOD ’95), ACM, New York, NY, USA, pp. 71–79.
[97] Samet, H. [1984]. The Quadtree and Related Hierarchical Data Structures,
ACM Computing Surveys 16(2): 187–260.
[98] Samet, H. [1988]. Hierarchical Representations of Collections of Small Rectan-
gles, ACM Computing Surveys 20(4): 271–309.
[99] Sellis, T., Roussopoulos, N. and Faloutsos, C. [1987]. The R+-Tree: A Dynamic
Index for Multi-Dimensional Objects, Proceedings of the 13th Very Large Data
Bases Conference, pp. 507–518.
[100] Seydim, A., Dunham, M. and Kumar, V. [2001]. Location Dependent Query
Processing, Proceedings of the 2nd ACM International Workshop on Data En-
gineering for Wireless and Mobile Access, ACM, pp. 47–53.
[101] Shrestha, A. and Xing, L. D. [2007]. A Performance Comparison of Different
Topologies for Wireless Sensor Networks, 2007 IEEE Conference on Technolo-
gies for Homeland Security, pp. 280–285.
[102] Sistla, A. P., Wolfson, O., Chamberlain, S. and Dao, S. [1998]. Querying
The Uncertain Position of Moving Objects, Temporal Databases: Research and
Practice, LNCS 1399, pp. 310–337.
[103] Stanoi, I., Agrawal, D. and Abbadi, A. E. [2000]. Reverse Nearest Neighbor
Queries for Dynamic Databases, ACM SIGMOD Workshop on Research Issues
in Data Mining and Knowledge Discovery, pp. 44–53.
[104] Su, C. and Tassiulas, L. [2000]. Joint Broadcast Scheduling and User’s Cache
Management for Efficient Information Delivery, Wireless Networks 6(4): 279–
288.
[105] Tao, Y., Papadias, D. and Sun, J. [2003]. The TPR*-Tree: An Optimized
Spatio-Temporal Access Method for Predictive Queries, VLDB, pp. 790–801.
[106] Tari, Z., Hamidjaja, H. and Lin, Q. T. [2000]. Cache Management in CORBA
Distributed Object Systems, IEEE Transactions on Parallel and Distributed
Technology 8(3): 48–55.
[107] Tayeb, J., Ulusoy, O. and Wolfson, O. [1998]. A Quadtree-Based Dynamic
Attribute Indexing Method, The Computer Journal 41(3): 185–200.
[108] The IEEE 802.11 Standards [2008]. http://standards.ieee.org/getieee802/802.11.html.
Last accessed: 02/04/08.
[109] Theodoridis, Y. and Sellis, T. K. [1994]. Optimization Issues in R-tree
Construction (Extended Abstract), Proceedings of the International Workshop
on Advanced Information Systems (IGIS ’94), Springer-Verlag, London, UK,
pp. 270–273.
[110] Toh, C.-K. and Li, V. [1998]. Satellite ATM Network Architectures: An
Overview, IEEE Network 12(5): 61–71.
[111] Trajcevski, G., Wolfson, O., Hinrichs, K. and Chamberlain, S. [2004]. Man-
aging Uncertainty in Moving Objects Databases, ACM Trans. Database Syst.
29(3): 463–507.
[112] Tsalgatidou, A., Veijalainen, J., Markkula, J., Katasonov, A. and Had-
jiefthymiades, S. [2003]. Mobile E-Commerce and Location-Based Services:
Technology and Requirements, In Proceedings of the 9th Scandinavian Research
Conference on Geographical Information Services pp. 1–4.
[113] Waluyo, A., Srinivasan, B. and Taniar, D. [2005]. Research on Location-
Dependent Queries in Mobile Databases, International Journal on Computer
Systems: Science and Engineering 20(3): 77–93.
[114] Wang, J. [1999]. A Survey of Web Caching Schemes for The Internet, SIG-
COMM Comput. Commun. Rev. 29(5): 36–46.
[115] Wang, W., Yang, J. and Muntz, R. [2000]. PK-tree: A Spatial Index Struc-
ture for High Dimensional Point Data, Information Oganization and Databases:
Foundations of Data Organization, Kluwer Academic Publishers, Norwell, MA,
USA, pp. 281–293.
[116] Wang, W., Yang, J. and Muntz, R. R. [1997]. STING: A Statistical Informa-
tion Grid Approach to Spatial Data Mining, Proceedings of 23rd International
Conference on Very Large Data Bases (VLDB ’97), pp. 186–195.
[117] Want, R., Schilit, B., Adams, N., Gold, R., Petersen, K., Goldberg, D., Ellis,
J. and Weiser, M. [1996]. The ParcTab Ubiquitous Computing Experiment,
Kluwer Academic Publishers, Boston.
[118] Ward, A., Jones, A. and Hopper, A. [1997]. A New Location Technique for
the Active Office, IEEE Journal Personal Communications 4(5): 42–47.
[119] Wireless Indoor Positioning System (WIPS) - Technical Documentation [2007].
http://www.tslab.ssvl.kth.se/csd/projects/0012/technical.pdf. Last accessed:
10/10/2007.
[120] Wu, M., Xu, J., Tang, X. and Lee, W.-C. [2007]. Top-K Monitoring in
Wireless Sensor Networks, IEEE Transactions on Knowledge and Data
Engineering 17(7): 962–976.
[121] Xia, Y. and Prabhakar, S. [2003]. Q+Rtree: Efficient Indexing for Moving Ob-
ject Databases, Proceedings of the Eighth International Conference on Database
Systems for Advanced Applications (DASFAA ’03), IEEE Computer Society,
Washington, DC, USA, pp. 175–182.
[122] Xie, T., Sha, C., Wang, X. and Zhou, A. [2006]. Approximate Top-K Struc-
tural Similarity Search over XML Documents, Frontiers of WWW Research
and Development - APWeb 2006, 8th Asia-Pacific Web Conference, Vol. 3841
of Lecture Notes in Computer Science, Springer, pp. 319–330.
[123] Xu, J., Lee, W.-C. and Tang, X. [2004]. Exponential Index: A Parameterized
Distributed Indexing Scheme for Data on Air, Proceedings of the 2nd Inter-
national Conference on Mobile Systems, Applications, and Services (MobiSys
’04), ACM, pp. 153–164.
[124] Xu, J., Zheng, B., Lee, W.-C. and Lee, D. L. [2004]. The D-tree: An Index
Structure for Planar Point Queries in Location-Based Wireless Services, IEEE
Transactions on Knowledge and Data Engineering, Vol. 16, pp. 1526–1542.
[125] Xu, Z., Hu, Y. and Bhuyan, L. [2004]. Exploiting Client Cache: A Scalable
and Efficient Approach to Build Large Web Cache, Proceedings of the 18th
International Conference on Parallel and Distributed Processing Symposium
pp. 55–65.
[126] Yin, L., Cao, G. and Cai, Y. [2005]. A Generalized Target-Driven Cache Re-
placement Policy for Mobile Environments, Journal of Parallel and Distributed
Computing 65(5): 583–594.
[127] Zaslavsky, A. and Tari, Z. [1998]. Mobile Computing: Overview and Current
Status, Australian Computer Journal 30(2): 42–52.
[128] Zheng, B. and Lee, D. [2001a]. Processing Location-Dependent Queries in a
Multi-Cell Wireless Environment, Proceedings of the ACM International Work-
shop on Data Engineering for Wireless and Mobile Access pp. 54–65.
[129] Zheng, B. and Lee, D. L. [2001b]. Semantic Caching in Location-Dependent
Query Processing, SSTD ’01: Proceedings of the 7th International Symposium
on Advances in Spatial and Temporal Databases, Springer-Verlag, London, UK,
pp. 97–116.
[130] Zheng, B., Xu, J. and Lee, D. [2002]. Cache Invalidation and Replacement
Strategies for Location-Dependent Data in Mobile Environments, IEEE Trans-
actions on Computers 51(10): 1141–1153.
Appendix A
Implementation Model
This chapter describes our implementation in more detail. Our implementation has
two major parts: the Location Generator and the Proposed Algorithms. The location
generator generates objects' locations for our experiments; the locations of all objects
produced by this generator are used in the second part of our implementation.
The second part consists of our proposed algorithms, which are categorised into
three major parts corresponding to our proposed approaches, as previously described
in the last three sections.
A.1 Location Generator
We developed a generator to create a list of objects’ locations. It produces the
number of objects’ locations in two dimensional coordinates and stores these in a
text file. Every location of an object is stored in one line, which is presented in
format x,y, and ended by a newline character for every line. The generated data is
used for all our experiments.
This generator is very simple: it contains two parts, initialisation and object
location generation. In the first part, the generator is initialised. Several variables
adjust its settings: the number of base stations, the number of objects in every
base station, and the dimensions of every base station. These parameters can be
adjusted during the experiments. The next step is the construction of base
station boundaries. The boundary of a base station is a square extending
from the bottom-left point (xmin, ymin) to the top-right point (xmax, ymax). The
x and y coordinates are positive values starting from zero. The value of
the top-right point is calculated by adding the components of the assigned square
dimension. This process keeps constructing base station boundaries until the
total number of base stations assigned by the user has been reached.
Once the initialisation process has been completed, the generator starts gen-
erating objects' locations in two-dimensional coordinates. First, we generate two
numbers, x and y, which represent an object's location in two-dimensional coordi-
nates. The data type used for these numbers is double. The reason for representing
the values of x and y as doubles is to obtain more precise object locations, since
a double is a 64-bit floating-point primitive. In our generator, an object's
location is unique, which implies that there is only one object at each location.
Then, we develop a function to generate the two numbers randomly. By using built-
in functions from the Java library, this step is easily implemented. First, we initialise
a seed with the current time in milliseconds, then a number is generated randomly
from the seed. The second number is generated using the same process. Once both
numbers have been generated, we store them into an array of objects, where each
object is an object's location, that is, the two generated numbers. Before the
generated numbers are stored into the array, they are verified against all elements
in the array. If they are exactly the same as one of the existing elements, both
numbers are rejected and new ones are generated. Otherwise, we store the generated
data into a text file whose format is similar to CSV (Comma Separated Values),
except that a space is used to separate the x-coordinate and y-coordinate values
on each line.
Table A.1: Snapshot of our Generated Data

    x-coordinate          y-coordinate
    5867.824581439626     3746.3106325679696
    1083.199667249632     1953.7615137872099
    5798.697361245137     3744.3854918256857
    8423.345028973014     9424.820401856894
    3657.506339333300     1227.9091739540104
    7617.495384068737     7649.012881643699
    4951.797251460664     6362.277490934845
    9966.001968430934     6287.970096553741
    8790.675769028454     28.10245098351927
    1470.307894133284     1908.2850624905311
    8094.432129404925     9347.009729539019
    962.5368928828126     213.47646274463617
    9657.234208782298     1632.0546402032621
A sample of the data file generated by our location generator is presented in
Table A.1. The last part of the generator continues producing objects' locations
until the total number of objects required by the user has been reached.
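The generation-and-rejection loop described above can be sketched as follows. This is an illustrative sketch only: the class name PointGenerator and its methods are our own inventions, not the thesis implementation, and a List stands in for the array of objects.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the generator: draw unique (x, y) pairs of type
// double inside one base-station square and emit them one pair per line.
public class PointGenerator {
    private final Random random;
    private final double width, height;

    public PointGenerator(double width, double height) {
        // Seeded from the current time in milliseconds, as in the thesis.
        this.random = new Random(System.currentTimeMillis());
        this.width = width;
        this.height = height;
    }

    public List<Point2D.Double> generate(int count) {
        List<Point2D.Double> points = new ArrayList<>();
        while (points.size() < count) {
            Point2D.Double p = new Point2D.Double(
                    random.nextDouble() * width,
                    random.nextDouble() * height);
            // Reject duplicates so every location holds exactly one object.
            if (!points.contains(p)) {
                points.add(p);
            }
        }
        return points;
    }

    // One object per line, x and y separated by a space.
    public static String toLine(Point2D.Double p) {
        return p.x + " " + p.y;
    }
}
```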
A.2 Implementation for Query Processing in a Single Cell
Our first implementation simulates query processing in a single cell. The aim
of this experiment is to compare the performance of circles and squares in
retrieving the largest number of objects. In this implementation, we make the
following simplifying assumptions:

• Multiple user queries are not processed at once

• Disconnections between a mobile user and a base station are ignored, since
they can be tolerated

• The communication protocol between client and server is a standard protocol

• Objects' locations are static

• The user always requests a certain area within the cell in which the user is
currently located
Our implementation gives users the flexibility to choose the total number
of database records they want, the location of the mobile client and its velocity.
Table A.2 shows the parameter values used in our implementation.
Table A.2: Settings for implementation 1

    Parameter           Values
    Database records    250,000 - 1,250,000
    BS dimension        10,000 x 10,000
    Searching distance  500 - 2500
    Shape used          Circle, square
    Speed               0, 50
    Direction           horizontal, vertical and diagonal
Once users have passed these values to our simulation, the simulation assigns
them to the appropriate variables. After all parameter values have been assigned,
we create a base station boundary and a query scope. Then, the simulation
retrieves all records from the chosen database and stores them in an array. The
retrieved records are valid for this base station since they were generated by our
generator.
The next step is to find the records that belong to the query scope. Recall that
our proposed approaches retrieve only records that have not yet been passed.
Thus, we divide the query scope into four equal regions. To simplify our discussion,
the regions are numbered anti-clockwise starting from the top right. Regions are
selected by examining the velocity entered by the user. The velocity consists
of two elements, X and Y. Both elements are positive if the client travels towards
the north east; in contrast, both are negative if the client travels towards the
south west. Once the travel direction has been identified, the regions can be
selected. If the travel direction is east, the selected regions are one and four. The
full complexity of region selection can be seen in the case study in Chapter 2.
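The region selection described above can be sketched as follows, where region 1 is the top right and numbering proceeds anti-clockwise. The class RegionSelector, its enum, and the sign-based pruning rule (each non-zero velocity component discards the half of the scope behind the movement) are illustrative assumptions, one plausible generalisation of the east → regions 1 and 4 example, rather than the thesis code.

```java
import java.util.EnumSet;

// Illustrative sketch of region selection from the velocity vector.
// Regions: 1 = TOP_RIGHT, 2 = TOP_LEFT, 3 = BOTTOM_LEFT, 4 = BOTTOM_RIGHT
// (numbered anti-clockwise starting from the top right).
public class RegionSelector {
    public enum Region { TOP_RIGHT, TOP_LEFT, BOTTOM_LEFT, BOTTOM_RIGHT }

    public static EnumSet<Region> select(double vx, double vy) {
        // Start with all four regions; prune the half left behind by movement.
        EnumSet<Region> selected = EnumSet.allOf(Region.class);
        if (vx > 0)  // moving east: drop the left half
            selected.removeAll(EnumSet.of(Region.TOP_LEFT, Region.BOTTOM_LEFT));
        if (vx < 0)  // moving west: drop the right half
            selected.removeAll(EnumSet.of(Region.TOP_RIGHT, Region.BOTTOM_RIGHT));
        if (vy > 0)  // moving north: drop the bottom half
            selected.removeAll(EnumSet.of(Region.BOTTOM_LEFT, Region.BOTTOM_RIGHT));
        if (vy < 0)  // moving south: drop the top half
            selected.removeAll(EnumSet.of(Region.TOP_LEFT, Region.TOP_RIGHT));
        return selected;
    }
}
```

For a stationary user (vx = vy = 0) all four regions are selected, matching the experiment in which the whole scope is searched.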
Object validation is done in the next step (shown in Figure A.1). In this step,
an object is retrieved from the coordinate collection (line 3). Then, we compare the
location of the object against the chosen regions of a square and a circle used as
the query scope. Lines 5 to 12 validate whether the object is located inside the
square that represents the query scope. If it is inside, the counter for the square is
incremented and the object is then checked to determine whether it is also located
inside the circle. The distance between the object and the user is measured using
the Euclidean distance (lines 15-17), and the counter for the circle is incremented
if the object lies within the circle. Thus, an object located inside both the square
and the circle increments both counters. This verification continues until the value
of internalCounter equals the number of objects in the coordinate collection (line 1).
Figure A.2 shows how our program is run and its output when a user does
not move. We request some information (location, searching distance, speed,
travel direction) from the user, since we do not have any device for collecting live
information from a user. The time is measured in abstract time units.
 1  while ( internalCounter < coordinate.size() )
 2  {
 3      ptDblBuff = (Point2D.Double) coordinate.elementAt( internalCounter );
 4      /* Is a coordinate inside the region? */
 5      if ( ( ptDblBuff.x < qsTopRight.x ) &&
 6           ( ptDblBuff.y < qsTopRight.y ) &&
 7           ( ptDblBuff.x > qsTopLeft.x ) &&
 8           ( ptDblBuff.y < qsTopLeft.y ) &&
 9           ( ptDblBuff.x < qsBottomRight.x ) &&
10           ( ptDblBuff.y > qsBottomRight.y ) &&
11           ( ptDblBuff.x > qsBottomLeft.x ) &&
12           ( ptDblBuff.y > qsBottomLeft.y ) )
13      {
14          /* Find distance of target inside circle to source */
15          pwrDistance = Math.pow( ( ptDblBuff.x - source.x ), 2.0 ) +
16                        Math.pow( ( ptDblBuff.y - source.y ), 2.0 );
17          ptDblDistance = Math.sqrt( pwrDistance );
18          if ( ptDblDistance <= distanceFromSource )
19          {
20              noOfPlacesFoundInCircle++;
21          }
22          noOfPlacesFoundInSquare++;
23      }
24      internalCounter++;
25  }

Figure A.1: Implementation for object validation against query scope

The next experiment implements the case where a user misses the query results,
so the server needs to reproduce the next query results. The implementation process
is similar to the initial one. However, if there are any existing query results
or the receiving flag is false, the server checks whether there is any overlapping area
between the current and the previous scopes. When an overlapping scope does not
exist, the query is processed as before. In contrast, when the current and
previous query scopes overlap, the server invalidates any objects in the query results
which are not located inside the overlapping area. Then, the server searches for
objects within the non-overlapping area of the current query scope. The objects
found within the overlapping and non-overlapping areas are merged, and the server
sends the query results to the users.

The processing time is measured from when the server starts processing the query.
The measurement covers both invalidating objects from the previous query results
and generating query results from the beginning of the process.
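The overlap-based reuse just described can be sketched as follows. The class ScopeReuse and its method names are illustrative assumptions; rectangles stand in for query scopes and points for objects, and the database scan is a plain linear search rather than the thesis implementation.

```java
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of reusing a previous result when query scopes overlap:
// keep previous objects that fall inside the overlap, search the database only
// for the part of the new scope outside the overlap, then merge both sets.
public class ScopeReuse {
    public static List<Point2D.Double> answer(Rectangle2D.Double newScope,
                                              Rectangle2D.Double oldScope,
                                              List<Point2D.Double> oldResult,
                                              List<Point2D.Double> database) {
        Rectangle2D overlap = newScope.createIntersection(oldScope);
        List<Point2D.Double> result = new ArrayList<>();
        if (!overlap.isEmpty()) {
            // Invalidate previous objects outside the overlapping area.
            for (Point2D.Double p : oldResult)
                if (overlap.contains(p)) result.add(p);
            // Search only the non-overlapping part of the new scope.
            for (Point2D.Double p : database)
                if (newScope.contains(p) && !overlap.contains(p)) result.add(p);
        } else {
            // No overlap: process the query from scratch, as before.
            for (Point2D.Double p : database)
                if (newScope.contains(p)) result.add(p);
        }
        return result;
    }
}
```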
[jjayaput@sng-1 experiment1]$ java generateCoordinate datadata50k
Enter your current position including floating point (0-10000) : 5000
Enter distance that you would like to search : 500
Enter your current speed (0 - stop) : 0
Total : 50000 Coordinates
Time : t0
Source (5000.0,5000.0)
direction: h
Searching in Region 0
Number of Places Found in Square: 127
Number of Places Found in Circle: 100
Searching in Region 1
Number of Places Found in Square: 114
Number of Places Found in Circle: 83
Searching in Region 2
Number of Places Found in Square: 112
Number of Places Found in Circle: 84
Searching in Region 3
Number of Places Found in Square: 123
Number of Places Found in Circle: 93
Figure A.2: Snapshot of experiment 1 simulation
A.3 Implementation for Query Processing in Multi-Cells
The implementation of query processing in multiple cells is quite complex, since
it involves a number of servers. In our implementation, we use TCP/IP for all
communication. The time it takes to send a query result from a server to a mobile
user is ignored, since we assume that this time is constant.

To simulate multiple cells, we use three machines, where each machine runs one
server that serves one cell and has its own database.
Figure A.3: Class diagram of server implementation
Our class diagram for the server implementation is shown in Figure A.3. It has
five classes: Server, ThreadedSocket, BSEntity, Message and Result. Each class is
explained as follows:
• Server

This class is the front end of the server; it initialises the server boundary
and listens for any incoming request. The server can be instantiated in two
modes: default or custom. In the default mode, the BS boundary is decided
automatically by the simulation; in other words, the default mode is used
to initiate the main server. If the user does not give any parameters to this
class, the default values are used. Table A.3 lists the default values for the
main server:

Table A.3: Server default settings

    Parameter    Value
    BS Width     900
    BS Height    2,000
    Server Port  8189
Figure A.4 shows an implementation snapshot of a server registering itself to
the main server.

If the server configuration is set by the user, there must be at least one main
server up and running, because any customised server needs to register with
the main server. The port of the main server is 8189. The registration process
for the other servers is very simple: they connect to the incoming port of the
main server and send their identity to it. Then, they wait for an acknowledgment
from the main server that their registration has been successful. If registration
with the main server succeeds, they can stand by and listen for incoming requests;
otherwise, the instantiation of the server has failed.
if ( port != 8189 )
{
    try
    {
        // Establish connection to the main server
        Socket socketToNeigh = new Socket( "203.24.130.25", 8189 );

        PrintWriter output = new PrintWriter( socketToNeigh.getOutputStream() );
        input = new BufferedReader( new InputStreamReader(
                    socketToNeigh.getInputStream() ) );
        StringTokenizer st = null;

        output.println( request );   // Asking for registration.
        output.flush();

        String buffInputFromSocket = null;

        try
        {
            while ( ( buffInputFromSocket = input.readLine() ) != null )
            {
                st = new StringTokenizer( buffInputFromSocket );
                System.out.println( "updating Neighbour List" );

                bsentity.updateNeighbour( inetAddr, port, position(x, y), Dimension );

                System.out.println( "Neighbour List updated" );
            }
        }
        catch ( Exception e )
        { System.out.println( "Registering main server to neighbour list failed" ); }

        // We don't need to establish a connection to the main server
        // once the registration is completed.
        // The communication between servers will be handled by class BSEntity.
        socketToNeigh.close();
    }
    catch ( Exception e ) { System.out.println( "Server Registration failed" ); }
}
Figure A.4: Implementation of a server registering itself to a main server
In the next step, this class initiates the listening port and keeps listening for
incoming requests from other servers and clients. The server port can be chosen
directly by the simulation or the user. Figure A.5 shows an implementation snapshot
for listening to incoming requests using the ServerSocket class provided by the Java
library.

When there is an incoming request, this class remembers the requester's port
and calls the ThreadedSocket class, passing it the requester's port and its boundary.
The ThreadedSocket class creates a separate (child) process for the incoming request,
so that the Server class can keep listening for the next request.

    ServerSocket server = new ServerSocket( port );
    System.out.println( "Server is ready" );
    while ( true )
    {
        Socket socket = server.accept();
        if ( socket != null )
        {
            new ThreadedSocket( socket, counter, bsentity ).start();
            System.out.println( "Thread started" );
        }
    }

Figure A.5: Implementation of how a server keeps listening for incoming requests
• ThreadedSocket

This class contains the actual thread implementation based on the Thread
interface from the Java standard library, since the library only provides the
interface. As mentioned earlier, this class is a child process of the Server class.

Class ThreadedSocket contains two methods: a constructor and a run method.
The constructor initialises the class variables with the values sent by the Server
class. The run method, which is called automatically once the class has been
instantiated, executes the work of this class.
At the start of the run method, the incoming request is received from
the socket and converted into a string. The request is then verified to identify
whether it is a query from a mobile user or a server registration request from
another server.
If the incoming request is a server registration from another server, it
calls updateNeighbour of class BSEntity. As confirmation, the main server
identification is sent back to that server.
If the request comes from a mobile user, the poolingInput method of class
BSEntity is called. The poolingInput method pools requests from mobile users
before the query is processed and, in return, gives the query result as the
answer. This query result answers the LDQ of the mobile user. At the end of
the run method, the query result is sent to the mobile user through the
requester port.
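The dispatch step described above can be sketched as follows. This is an illustrative reading only: the prefix string and the classify method are assumptions, not identifiers from the thesis code.

```java
// Sketch of the request verification inside ThreadedSocket.run(): a message
// that carries a registration prefix is treated as a server registration,
// anything else as a mobile-user query. The prefix is a hypothetical choice.
public class RequestDispatcher {
    public static final String REGISTRATION_PREFIX = "REGISTER";

    // Returns "registration" or "query" for the incoming message string.
    public static String classify(String request) {
        if (request != null && request.startsWith(REGISTRATION_PREFIX)) {
            return "registration";
        }
        return "query";
    }
}
```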
• Message
Class Message is used to pool a query from a mobile user. It splits the
query received from a mobile user and stores the parts into class variables:
userID, currentPosition, movement, searchingDistance, newPosition and scope.
The usage of the first four parameters is straightforward. The last parameter,
scope, is used to generate the valid scope of the user query.
The velocity of the mobile user is analysed in advance to form a valid
scope. Once this is done, the valid scope is created by adding the components
of newPosition to the components of searchingDistance. If the movement is
vertical, the x-coordinate of the valid scope ranges from "minus one times" to
"twice" the x-coordinate of searchingDistance, while the y-coordinate ranges
from the y-coordinate of newPosition to the y-coordinate of searchingDistance.
The valid scope for horizontal movement is created similarly; the only
difference is that the x-coordinate for vertical movement becomes the
y-coordinate for horizontal movement, and the y-coordinate for vertical
movement becomes the x-coordinate for horizontal movement.
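The valid-scope construction can be sketched as below. The thesis text is ambiguous about the exact arithmetic ("minus by one" to "twice of x-coordinate"), so this reading, with x in [-dx, 2*dx] and y extending from the new position by dy for vertical movement, and the axes swapped for horizontal movement, is an interpretation rather than the original code.

```java
// Hedged sketch of the Message valid-scope computation. dx and dy are the
// components of searchingDistance; the returned rectangle layout is ours.
public class ValidScope {
    // Returns {xMin, xMax, yMin, yMax} for the valid query scope.
    public static double[] compute(double newX, double newY,
                                   double dx, double dy, boolean vertical) {
        if (vertical) {
            // x from -dx to 2*dx; y from the new position outward by dy.
            return new double[] { -dx, 2 * dx, newY, newY + dy };
        }
        // Horizontal movement: the x and y roles are swapped.
        return new double[] { newX, newX + dx, -dy, 2 * dy };
    }
}
```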
• Result
This class is responsible for generating and storing a query result. It has
two constructors and two methods. The constructors are a default and a copy
constructor, which initialise the class variables. The two methods are
generateResult and getResult. The getResult method returns the generated
result to the caller. The generateResult method compares the objects'
locations from the database with the valid scope. For each object located
inside the valid scope, the counter of objects found is incremented and the
object's location is stored.
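The generateResult behaviour described above can be sketched as follows; the point representation and the scope layout are assumptions, not the thesis data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Result class: compare each object's location from the
// database against the valid scope, count and store those inside it.
public class Result {
    private final List<double[]> found = new ArrayList<>();

    // scope = {xMin, xMax, yMin, yMax}; objects are {x, y} pairs.
    public int generateResult(List<double[]> objects, double[] scope) {
        for (double[] p : objects) {
            boolean inside = p[0] >= scope[0] && p[0] <= scope[1]
                          && p[1] >= scope[2] && p[1] <= scope[3];
            if (inside) {
                found.add(p); // store the location; size() acts as the counter
            }
        }
        return found.size(); // counter of objects found
    }

    public List<double[]> getResult() {
        return found; // generated result for the caller
    }
}
```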
• BSEntity
This class is the main class for the server implementation. Query
processing and server registration are the two tasks of this class. Let
us discuss the server registration process first, followed by the query
processing task.
The server registration process is handled by a method called updateNeigh-
bour. This method accepts five parameters: address, port, position, BSWidth
and BSHeight. The first two parameters are the other server's address and
listening port. The last three parameters are the bottom-left position of the
other server and the other BS's width and height, respectively. The method
then finds an empty slot in its list of neighbour BSs. Once an empty slot has
been found, it creates an object of type NeighbourDetails by passing all
accepted parameters. At the end of this method, a confirmation message is sent
to the caller.
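The slot-search step of updateNeighbour can be sketched as below. The NeighbourDetails fields and the fixed slot count are guesses derived from the five parameters described above, not the thesis code.

```java
// Sketch of BSEntity.updateNeighbour: find the first empty slot in the
// neighbour list and store the registering server's details there.
public class BSEntity {
    static class NeighbourDetails {
        String address; int port;
        double x, y, width, height; // bottom-left corner plus BS dimensions
        NeighbourDetails(String address, int port,
                         double x, double y, double width, double height) {
            this.address = address; this.port = port;
            this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    private final NeighbourDetails[] neighbours = new NeighbourDetails[8];

    // Returns the slot index used, or -1 when the neighbour list is full.
    public int updateNeighbour(String address, int port,
                               double x, double y, double w, double h) {
        for (int i = 0; i < neighbours.length; i++) {
            if (neighbours[i] == null) {
                neighbours[i] = new NeighbourDetails(address, port, x, y, w, h);
                return i; // a confirmation message would be sent here
            }
        }
        return -1;
    }
}
```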
The second task is query processing, which involves both the server and
the client. A method called poolingInput filters the request type. If the
query is asked by a BS, this method calls the method generateQueryFromBS.
Otherwise, it calls the method generateQueryFromClient to process a query
from a client.
After the query type has been filtered, the query processing task is
carried out by the following methods, the last two of which retrieve objects
from neighbour cells:
– generateQueryFromClient
In the beginning, incoming requests are pooled inside an array, using
FIFO as the queuing priority. Once the server has finished processing one
request, it processes the next element of the array.
Query processing is done by retrieving all objects in the current BS;
the procedure is the same as static object retrieval for the single cell.
Once this is finished, the method calls generateQueryResultsFromNeighbour
to retrieve static objects from neighbour cells. At the end of this method,
information about static objects from the current and neighbour cells is
combined and returned to the caller. The processing time for the current
and neighbour cells is also measured here.
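The FIFO pooling mentioned above can be sketched with a standard queue; the class and method names here are illustrative, not the thesis identifiers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the FIFO pooling of client requests: requests are queued in
// arrival order and the server processes them one at a time.
public class RequestPool {
    private final Queue<String> pending = new ArrayDeque<>();

    public void pool(String request) {
        pending.add(request);  // enqueue at the tail
    }

    public String next() {
        return pending.poll(); // dequeue from the head (FIFO order)
    }
}
```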
– generateQueryResultsFromNeighbour
This method is used to find which neighbour BSs overlap with the query
scope. It goes through its database to get a list of overlapping neighbour
BSs. It then passes the overlapping parts of the query scope by opening a
connection to each such BS. While a neighbour BS is processing the forwarded
query, the current BS waits until it gets the results from that neighbour BS.
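The overlap check that selects candidate neighbour BSs can be sketched as a rectangle intersection test. The {xMin, yMin, width, height} layout for both the query scope and the BS area is an assumption.

```java
// Sketch of the overlap test in generateQueryResultsFromNeighbour: a
// neighbour BS is a candidate when its area intersects the query scope.
public class OverlapTest {
    // Both rectangles are {xMin, yMin, width, height}.
    public static boolean overlaps(double[] scope, double[] bs) {
        return scope[0] < bs[0] + bs[2] && bs[0] < scope[0] + scope[2]
            && scope[1] < bs[1] + bs[3] && bs[1] < scope[1] + scope[3];
    }
}
```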
– generateQueryFromBS
The time measurement begins at the start of this method, before it
searches for static objects in its database. The second step determines
which area of the query scope needs to be searched. Once the area has been
determined, the searching process starts by examining each object that
belongs to that area. Matching objects are collected into a result
collection. At the end of the process, the result collection is sent to the
caller and the time measurement is stopped.
Appendix B
Simulation Model
B.1 Simulation Package Overview
Planimate is a discrete event animation software platform for prototyping, devel-
oping and operating highly visual dynamic discrete event simulation models and
interactive applications [89]. Figure B.1 shows the opening page when this package
is loaded.
Figure B.1: Opening page of Planimate
Planimate contains two different types of palettes, namely Objects and
Items, as shown in Figures B.2 and B.3. The first type, the Objects palette,
contains 18 different objects which symbolise different activities for
simulating the features of a real environment.
Figure B.2: Planimate Objects
Figure B.3 shows the Items palette. An item is a temporary object that
interacts with the permanent objects and moves through the system. An item
cooperates with the permanent objects through paths, which need to be defined.
B.2 Query Processing Model
This section gives a brief explanation of our proposed simulation models,
including a description of the features that are available in Planimate©.
The first model is the proposed server query processing model. This model has
five components: request, counter, server, exit and result. The request
represents a request for a database record. The counter is used to count which
database record
Figure B.3: Planimate Items
Figure B.4: Initial server processing mechanism model.
number is currently being examined. The server decides whether a point is a
valid result. If it is, the point is collected into the result component.
Otherwise, the point is ignored by sending the item to the exit component.
The database records are stored as a table representation, which is shown in
Figure B.5a. To simplify our models, we put only two-dimensional coordinates
into the table. When there is a request, the counter increments the position
of the data points in the table in order to select a data point to be
examined.
(a) Data points records. (b) The logic
Figure B.5: Planimate’s components for the server query processing.
Figure B.5b shows the logic for the server side, which implements our proposed
algorithm. In general, a point is validated in this component and sent to the
result collection if it is located inside the query scope. The result
collection is then sent to the requester.
A proposed indexing mechanism model is shown in Figure B.6. In this model,
there are several nodes representing a root node, 12 bounding boxes and 12
leaf nodes. An entry is a query scope that traverses from the root node to a
leaf node and collects the matching objects.
Figure B.7 presents the Planimate components used by the proposed indexing
model. Figure B.7a shows a table containing a list of user locations at the
times queries are sent. This table represents an array as normally used in a
programming language. Figure B.7b, on the other hand, shows a list of
parameters that control the model, such as the query size. These parameters
act as variables for the proposed algorithm in Planimate, and their values can
be changed while the simulation runs.
Figure B.6: Initial indexing mechanism model.
Figure B.8 presents a condition interface used to specify whether the previous
node (root, MBR or leaf) overlaps with the query scope. This condition
interface can hold a maximum of four conditions in one object.
Figure B.9 shows the logic behind a node. The logic verifies whether the node
is inside the query scope. If it is, the traversal continues with the next
node underneath the current node. Each node has logic similar to the one
shown.
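The per-node logic above amounts to descending into a child only when its bounding box overlaps the query scope. A minimal sketch follows; the tree shape matches Figure B.6's root/bounding-box/leaf structure, but the Node layout and field names are ours.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the node logic in Figure B.9: an index node passes the query
// down to its children only when its box overlaps the query scope; leaves
// that are reached contribute their contents to the result.
public class IndexNode {
    double xMin, yMin, xMax, yMax;
    List<IndexNode> children = new ArrayList<>();
    String label; // set on leaf nodes only

    IndexNode(double xMin, double yMin, double xMax, double yMax) {
        this.xMin = xMin; this.yMin = yMin; this.xMax = xMax; this.yMax = yMax;
    }

    boolean overlaps(double qxMin, double qyMin, double qxMax, double qyMax) {
        return xMin < qxMax && qxMin < xMax && yMin < qyMax && qyMin < yMax;
    }

    // Collects the labels of leaves reachable through overlapping boxes.
    void search(double qxMin, double qyMin, double qxMax, double qyMax,
                List<String> hits) {
        if (!overlaps(qxMin, qyMin, qxMax, qyMax)) return; // prune this branch
        if (children.isEmpty()) {
            hits.add(label); // a leaf inside the query scope
            return;
        }
        for (IndexNode child : children) {
            child.search(qxMin, qyMin, qxMax, qyMax, hits);
        }
    }
}
```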
Figure B.10 shows the proposed indexing model with an item traversing from the
root node to a leaf node. In this figure, a searching process starts from
Entry 1 and finishes at one of the exit nodes. The other nodes represent
Minimum Bounding Boxes (MBBs) and switches. The switches are used to manage
the flow of data items.
Figure B.11a shows one of the proposed client caching models. The process
starts from the component "Entry 1", which represents a query scope. The
component
(a) list of user locations. (b) list of parameters
Figure B.7: Planimate’s components for the indexing mechanism.
Figure B.8: A condition interface on the Planimate.
"Objects in" retrieves objects by validating the objects in the client cache
against the incoming query scope; it asks the server if the number of objects
found is less than the number of requested objects. If the objects are
retrieved from the cache, the cache hit counter is incremented by the
component "Inc CH". The objects are queued until the cache mgr finishes
sending the objects.
If the number of objects coming from the server is greater than the cache
space, the objects are not stored in the cache. Otherwise, the next process
verifies the
Figure B.9: A logic for a node.
available cache space. When the cache space is large enough to store all
incoming objects, the objects are cached directly and all cached objects are
regrouped.
Figure B.11b shows one of the routines. The elimination process takes over if
there is not enough space. In the elimination process, the right victim groups
are found and evicted one after another. When the amount of available space is
enough to accommodate the number of incoming objects, the objects are stored.
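The elimination routine can be sketched as below. The group-based bookkeeping follows the description above, but the victim-selection policy (oldest group first) is our assumption, not something the thesis specifies here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the elimination routine in Figure B.11b: evict whole cached
// groups (oldest first in this sketch) until the cache has room for the
// incoming group of objects.
public class ClientCache {
    private final Deque<Integer> groupSizes = new ArrayDeque<>();
    private final int capacity;
    private int used;

    public ClientCache(int capacity) { this.capacity = capacity; }

    // Returns true when the incoming group was stored in the cache.
    public boolean store(int incoming) {
        if (incoming > capacity) return false; // larger than the whole cache
        while (capacity - used < incoming) {   // not enough space: eliminate
            used -= groupSizes.removeFirst();  // evict the next victim group
        }
        groupSizes.addLast(incoming);
        used += incoming;
        return true;
    }

    public int used() { return used; }
}
```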
Figure B.10: Indexing model with an item flow.
(a) The proposed client cache model.
(b) One of the routines for the client cache model.
Figure B.11: Planimate's components used to model the proposed client caching.