SCALABLE CONTINUOUS QUERY PROCESSING IN LOCATION-AWARE

DATABASE SERVERS

A Thesis

Submitted to the Faculty

of

Purdue University

by

Mohamed F. Mokbel

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

August 2005


To my parents, Fathalla and Zeinab, my wife Thanaa, and my son Abdelrahman


ACKNOWLEDGMENTS

It is my pleasure to express my gratitude to a large number of people who have contributed, in many different ways, to make my success a part of their own.

First, I wish to express my deepest gratitude to my supervisor Dr. Walid Aref. I am totally indebted to his continuous encouragement, tireless efforts, and invaluable guidance. He spent endless hours teaching me how to be a researcher, how to identify research challenges, how to transform my fledgling ideas into crisp research endeavors, how to present and sell my ideas, and finally how to tackle the job market. Besides being a mentor, he has also been a personal friend to whom I have often turned for advice. After all, I was truly fortunate to have Walid as my advisor, and I hope that I can be as generous, patient, friendly, and tireless with my students.

I will always be grateful to Dr. Ahmed Elmagarmid for the thoughtful discussions with him on both the professional and personal levels. Whenever I was in need of advice or stuck on a decision, Ahmed was always there with his experience and invaluable comments. I am really grateful to him, and I hope I will be as helpful to my students as well.

My gratitude and appreciation go to my advisory and examining committee, Prof. Susanne Hambrusch, Prof. Sunil Prabhakar, and Prof. Elisa Bertino, for their time and effort. Special thanks to Prof. Ananth Grama and Prof. Ibrahim Kamel for collaborating on various research projects.

During my summer internship with the Database group at Microsoft Research, I worked with a wonderful group of people. My sincere thanks to Dr. David Lomet for being a wonderful mentor and for sharing his advice and experience with me. My discussions with David significantly contributed to my passion for large-scale, systems-oriented database research. I could never think of a better mentor than David Lomet. Special thanks to Dr. Roger Barga for his help and support in jump-starting my work at Microsoft Research. Besides his work-related support, Roger was a great friend whom I was lucky to have. I would like to extend my appreciation to the rest of the Database group at Microsoft Research. The everyday informal lunch meetings and discussions were more than wonderful and added a lot to my research and life experience.

Special thanks are due to my friends, fellow students, and colleagues who made my graduate life easier. In particular, thanks to Ossama Younis for being my companion through the whole journey of the Master's and PhD, to Mohamed Elfeky for being my all-time classmate, to Ahmed Soliman for his friendship, to Xiaopeng Xiong for collaborating on the PLACE project, and to M. Ali, Hicham Elmongui, Moustafa Hammad, and Ming Lu for research collaboration. Many thanks to the rest of the ICDS (Indiana Center for Database Systems) group at Purdue University.

My sincere gratitude goes to my wife Thanaa for her unceasing patience when I spent way too much time on computer stuff. While she was in real need of more time for her own PhD studies, she voluntarily sacrificed her time for me. I am totally indebted to her love, caring, and support. Thanks to my son Abdelrahman (3 years old) for organizing my life, waking me up early every day, keeping me awake late at night, and letting me know, appreciate, and make the best use of every available time slot.

My forever gratitude goes to my parents. Without their unconditional love, support, and encouragement, I would have never made it this far. Everything I have achieved or will achieve in my life is through their guidance and the sacrifices they have made for me.

Ahead of all, I thank ALLAH, for only through ALLAH's grace and blessings has this pursuit been possible. I pray for ALLAH's support and guidance in the rest of my career and my life.


TABLE OF CONTENTS

Page

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Location-aware Database Servers . . . . . . . . . . . . . . . . . . . . 1

1.2 New Challenges to Database Systems . . . . . . . . . . . . . . . . . . 4

1.3 The PLACE Prototype Server . . . . . . . . . . . . . . . . . . . . . . 5

1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.5 Summary and Outline . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Challenges and their Related Work . . . . . . . . . . . . . . . . . . . . . . 13

2.1 Spatio-temporal Query Classification . . . . . . . . . . . . . . . . . . 13

2.2 Challenge I: Massive Size of Incoming Spatio-temporal Data . . . . . 16

2.3 Challenge II: Repetitive Evaluation of Continuous Queries . . . . . . 18

2.4 Challenge III: Large Numbers of Concurrent Continuous Queries . . . 19

2.5 Challenge IV: Wide Variety of Continuous Queries . . . . . . . . . . . 20

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Disk-based Spatio-temporal Continuous Query Processing . . . . . . . . . . 22

3.1 Shared Execution of Continuous Spatio-temporal Queries . . . . . . . 23

3.2 The SINA Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.1 Phase I: Hashing . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2.2 Phase II: Invalidation . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.3 Phase III: Joining . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3 Extensibility of SINA . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3.1 Querying the Future . . . . . . . . . . . . . . . . . . . . . . . 37

3.3.2 k-nearest-neighbor Queries . . . . . . . . . . . . . . . . . . . . 38


3.3.3 Aggregate Queries . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3.4 Out-of-Sync Clients . . . . . . . . . . . . . . . . . . . . . . . . 40

3.4 Correctness of SINA . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.1 Properties of SINA . . . . . . . . . . . . . . . . . . . . . . . . 47

3.5.2 Number of Objects/Queries . . . . . . . . . . . . . . . . . . . 48

3.5.3 Percentage of Moving Objects/Queries . . . . . . . . . . . . . 50

3.5.4 Locality of movement . . . . . . . . . . . . . . . . . . . . . . . 52

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Stream-based Spatio-temporal Query Processing: Query Operators . . . . . 55

4.1 The GPAC: Continuous Spatio-temporal Query Operators . . . . . . 56

4.2 Uncertainty in Continuous Spatio-temporal Queries . . . . . . . . . . 59

4.2.1 Types of Uncertainty . . . . . . . . . . . . . . . . . . . . . . . 59

4.2.2 Uncertainty Avoidance in GPAC . . . . . . . . . . . . . . . . 62

4.3 Instances of GPAC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.3.1 Spatio-temporal Range Queries . . . . . . . . . . . . . . . . . 67

4.3.2 Spatio-temporal k-nearest-neighbor . . . . . . . . . . . . . . . 67

4.4 Pipelined Spatio-temporal Query Operators . . . . . . . . . . . . . . 68

4.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.5.1 GPAC Operators in a Pipelined Query Plan . . . . . . . . . . 71

4.5.2 Properties of GPAC . . . . . . . . . . . . . . . . . . . . . . . 74

4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5 Stream-based Spatio-temporal Query Processing: Scalability . . . . . . . . 78

5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.1.1 Spatio-temporal Databases . . . . . . . . . . . . . . . . . . . . 79

5.1.2 Data Stream Management Systems . . . . . . . . . . . . . . . 80

5.2 The SOLE Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5.3 Shared Memory in SOLE . . . . . . . . . . . . . . . . . . . . . . . . . 83


5.3.1 Shared Object Buffer . . . . . . . . . . . . . . . . . . . . . . . 83

5.3.2 Shared Query Buffer . . . . . . . . . . . . . . . . . . . . . . . 84

5.3.3 Optimizing the Shared Buffer Pool . . . . . . . . . . . . . . . 85

5.4 Shared Execution in SOLE . . . . . . . . . . . . . . . . . . . . . . . . 86

5.5 Load Shedding in SOLE . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.5.1 Query Load Shedding . . . . . . . . . . . . . . . . . . . . . . . 93

5.5.2 Object Load Shedding . . . . . . . . . . . . . . . . . . . . . . 94

5.5.3 Load Shedding with Locking . . . . . . . . . . . . . . . . . . . 95

5.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 96

5.6.1 Properties of SOLE . . . . . . . . . . . . . . . . . . . . . . . . 96

5.6.2 Scalability of SOLE . . . . . . . . . . . . . . . . . . . . . . . . 98

5.6.3 Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.6.4 Accuracy of Load Shedding . . . . . . . . . . . . . . . . . . . 102

5.6.5 Scalability of Load Shedding . . . . . . . . . . . . . . . . . . . 103

5.6.6 Object Load Shedding . . . . . . . . . . . . . . . . . . . . . . 105

5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 107

6.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . 107

6.2 Future Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6.2.1 Continuous Query Optimization . . . . . . . . . . . . . . . . . 109

6.2.2 Cost Model for Spatio-temporal Operators . . . . . . . . . . . 110

6.2.3 Context-aware Query Processing . . . . . . . . . . . . . . . . 110

LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

VITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


LIST OF FIGURES

Figure Page

1.1 Location-aware devices. . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Example of continuous spatio-temporal queries submitted to location-aware database servers. . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Incremental evaluation of range queries. . . . . . . . . . . . . . . . . . 6

1.4 Server GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.5 Client GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3.1 Shared execution of continuous queries. . . . . . . . . . . . . . . . . . 24

3.2 State diagram of SINA. . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.3 Example of range spatio-temporal queries. . . . . . . . . . . . . . . . 26

3.4 Phase I: Hashing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.5 Pseudo code of the Hashing phase . . . . . . . . . . . . . . . . . . . . 29

3.6 Phase II: Invalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.7 Pseudo code of the Invalidation phase. . . . . . . . . . . . . . . . . . 31

3.8 Pseudo code invalidating moving objects. . . . . . . . . . . . . . . . . 32

3.9 Pseudo code invalidating moving queries. . . . . . . . . . . . . . . . . 33

3.10 Pseudo code for the joining phase. . . . . . . . . . . . . . . . . . . . . 36

3.11 Querying the future. . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.12 k-NN spatio-temporal queries. . . . . . . . . . . . . . . . . . . . . . . 39

3.13 Example of Out-of-Sync queries. . . . . . . . . . . . . . . . . . . . . . 41

3.14 Road network map of Oldenburg City. . . . . . . . . . . . . . . . . . 46

3.15 The answer size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.16 The impact of grid size N . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.17 Scalability with number of objects. . . . . . . . . . . . . . . . . . . . 49

3.18 Scalability with number of queries. . . . . . . . . . . . . . . . . . . . 50


3.19 Percentage of moving objects. . . . . . . . . . . . . . . . . . . . . . . 51

3.20 Scalability of SINA with update rates. . . . . . . . . . . . . . . . . . 52

3.21 Effect of movement locality. . . . . . . . . . . . . . . . . . . . . . . . 53

4.1 Pseudo code of skeleton of GPAC. . . . . . . . . . . . . . . . . . . . . 58

4.2 Updating query information in GPAC . . . . . . . . . . . . . . . . . . 60

4.3 Uncertainty in moving range queries. . . . . . . . . . . . . . . . . . . 61

4.4 Uncertainty in moving NN queries. . . . . . . . . . . . . . . . . . . . 61

4.5 Uncertainty in static NN queries. . . . . . . . . . . . . . . . . . . . . 62

4.6 The cache area. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.7 Pseudo code of GPAC with caching. . . . . . . . . . . . . . . . . . . . 65

4.8 Updating query information in GPAC with caching . . . . . . . . . . 66

4.9 Greater Lafayette, Indiana, USA. . . . . . . . . . . . . . . . . . . . . 70

4.10 Pipelined GPAC operators. . . . . . . . . . . . . . . . . . . . . . . . . 72

4.11 Pipelined operators with SELECT. . . . . . . . . . . . . . . . . . . . 73

4.12 Pipelined operators with Join. . . . . . . . . . . . . . . . . . . . . . . 74

4.13 High arrival rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.14 Query selectivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.1 Overview of shared execution in SOLE. . . . . . . . . . . . . . . . . . 82

5.2 Shared join operator in SOLE. . . . . . . . . . . . . . . . . . . . . . . 86

5.3 Pseudo code for receiving a new value of P . . . . . . . . . . . . . . . 87

5.4 Pseudo code for updating P ’s location. . . . . . . . . . . . . . . . . . 88

5.5 All cases of updating P ’s location. . . . . . . . . . . . . . . . . . . . . 89

5.6 Pseudo code for receiving a new query Q. . . . . . . . . . . . . . . . . 90

5.7 Pseudo code for updating a query. . . . . . . . . . . . . . . . . . . . . 91

5.8 All cases of updating Q’s region. . . . . . . . . . . . . . . . . . . . . . 91

5.9 Architecture of self tuning in SOLE. . . . . . . . . . . . . . . . . . . . 92

5.10 Cache area in SOLE. . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.11 Grid Size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97


5.12 Maximum Number of Supported Queries. . . . . . . . . . . . . . . . . 98

5.13 Data size in the query and cache areas. . . . . . . . . . . . . . . . . . 99

5.14 Response time in SOLE. . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.15 Load Vs. Accuracy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.16 Reduced load for a certain accuracy. . . . . . . . . . . . . . . . . . . 103

5.17 Scalability with Load Shedding. . . . . . . . . . . . . . . . . . . . . . 104

5.18 Performance of Object Load Shedding. . . . . . . . . . . . . . . . . . 105


ABSTRACT

Mohamed F. Mokbel. Ph.D., Purdue University, August, 2005. Scalable Continuous Query Processing in Location-aware Database Servers. Major Professors: Walid G. Aref.

The widespread use of cellular phones, handheld devices, and GPS-like technology enables location-aware environments where virtually all objects are aware of their locations. Location-aware environments and location-aware services are characterized by a large number of moving objects and a large number of continuously moving queries (also known as spatio-temporal queries). Such environments call for new query processing techniques that deal with the continuous movement and frequent updates of both spatio-temporal objects and spatio-temporal queries. This dissertation presents novel paradigms and algorithms for efficient processing and scalable execution of continuous spatio-temporal queries in location-aware database servers. We introduce a disk-based framework that exploits shared execution and incremental evaluation paradigms. With shared execution, the problem of evaluating a set of concurrent continuous queries is abstracted to a spatial join between the set of moving objects and the set of moving queries. With incremental evaluation, rather than performing a repetitive evaluation of continuous queries, we produce only the updates of the recently reported answer.

For streaming environments, we introduce a generic class of spatio-temporal operators that can be tuned with a set of parameters and methods to act as various continuous spatio-temporal queries (e.g., range queries and k-nearest-neighbor queries). The spatio-temporal operators can be combined with other traditional operators (e.g., join, distinct, and aggregate) to support a wide variety of continuous spatio-temporal queries. To support scalability in streaming environments, we introduce a scalable operator that shares memory resources among all outstanding continuous queries. To cope with intervals of high arrival rates of data objects and/or continuous queries, the proposed scalable operator utilizes a self-tuning approach based on load shedding, where some of the stored objects are dropped from memory.

The experimental evaluation of our disk-based approach compares it with recent scalable approaches and shows the superior performance of our techniques. Also, we experimentally evaluate our spatio-temporal operators based on a real implementation inside an open-source data stream management system. The experimental results show that by delving inside the database engine and providing pipelined operators for continuous spatio-temporal queries, we can achieve performance orders of magnitude better than other application-level algorithms.


1 INTRODUCTION

This dissertation studies the scalable execution of continuous queries in spatio-temporal applications. Examples of these applications include location-aware services, traffic monitoring, and enhanced 911 services. Such applications are characterized by large numbers of updates and large numbers of continuous queries.

In this chapter, we start by motivating the need for location-aware database servers in Section 1.1. Next, in Section 1.2, we discuss the challenges that location-aware environments pose to existing database management systems. Section 1.3 briefly describes the PLACE prototype server, a research prototype for location-aware database servers. The main contributions of this dissertation are summarized in Section 1.4. Finally, Section 1.5 outlines the rest of the dissertation.

1.1 Location-aware Database Servers

The widespread use of location-detection devices (e.g., GPS-like devices, RFIDs, handheld devices, and cellular phones) results in environments where virtually all objects of interest are aware of their locations. Figure 1.1 shows various forms of GPS devices that are added to cellular phones, cars, or PDA devices. Such devices enable new spatio-temporal applications where moving objects equipped with any of these devices have the ability to change their location continuously over time. Examples of spatio-temporal applications include location-aware services [1], traffic monitoring [2], and enhanced 911 service (http://www.fcc.gov/911/enhanced/). In this dissertation, we mainly focus on location-aware services [1], where a massive amount of spatio-temporal data is continuously sent from a large number of moving objects (e.g., moving vehicles in road networks) to location-aware database servers.

Figure 1.1. Location-aware devices: (a) cellular GPS, (b) built-in GPS, (c) portable GPS, (d) PDA + GPS.

Location-aware database servers provide new spatio-temporal services (i.e., spatio-temporal queries) to their subscribers. Spatio-temporal queries can be either snapshot queries or continuous queries. This dissertation focuses on processing continuous spatio-temporal queries in location-aware database servers. Figure 1.2 gives various examples of continuous spatio-temporal queries that are submitted to a location-aware database server. Examples of these queries include continuous moving range queries over moving objects (e.g., "Continuously report the number of moving cars in the moving highlighted area"), continuous stationary spatio-temporal range queries over moving objects (e.g., "Alert me if there is any traffic jam in a certain downtown area"), continuous moving spatio-temporal k-nearest-neighbor queries over moving objects (e.g., "Continuously alert me if one of the nearest three moving aircraft to my moving aircraft is not a friendly one"), and continuous moving spatio-temporal k-nearest-neighbor queries over stationary objects (e.g., "Continuously report the three nearest hospitals to my moving ambulance").

Figure 1.2. Example of continuous spatio-temporal queries submitted to location-aware database servers.

Unlike traditional snapshot queries, continuous spatio-temporal queries have the following distinguishing characteristics:

• As continuous queries tend to stay active in the database server for several hours or days, new continuous queries will be submitted to the same database server while the old ones are still active. Thus, any continuous query processor should take into account the large number of concurrent outstanding continuous queries in location-aware database servers.

• Unlike snapshot queries, where the answer is retrieved from the already stored data objects, the answer of continuous queries is based on the received data objects. In other words, once a continuous query is submitted to the database server, it initially has no answer. The answer is progressively constructed by the arrival of new data objects that satisfy the outstanding continuous query.

• Continuous queries require continuous evaluation, as the query result becomes invalid with the continuous change of location information of the query and/or the data objects. Furthermore, an object may be added to or removed from the answer set of a continuous spatio-temporal query. For example, consider moving vehicles that move in and out of a certain query region.

• Queries as well as data have the ability to continuously change their locations. Due to this mobility, any delay in processing spatio-temporal queries may result in an obsolete answer. For example, consider a query that asks about moving objects that lie in a certain region. If the query answer is delayed, the answer may be outdated, since objects are continuously changing their locations.

This dissertation focuses on leveraging traditional database management systems and data stream management systems (e.g., Nile [3]) to support scalable execution of continuous spatio-temporal queries in location-aware database servers. We provide disk-based and stream-based general frameworks for supporting a wide variety of continuous queries. In addition, stream-based pipelined query operators are proposed as an attempt to extend data stream management systems to support spatio-temporal applications.

1.2 New Challenges to Database Systems

Traditional database systems and index structures are optimized for querying existing data and inserting new data items, respectively. The implicit assumption is that updates to the database engine are infrequent and have lower priority in optimization techniques. Thus, having large numbers of updates would dramatically degrade the performance of both traditional database systems and traditional index structures.

In a typical location-aware environment, there is a huge number of update transactions. In fact, the rate of updates may highly exceed the rate of submitting queries to the database server. Also, the rate of updates is much larger than the rate of inserting new data items. For example, consider the case of n moving objects, all of them subscribing to the same location-aware database server. There will be n insert transactions to insert the n new moving objects in the location-aware database server. With each single move (e.g., every 10 seconds) of any of these objects, a new update transaction is sent to the location-aware database server. After only five minutes, each moving object would have sent 30 update transactions. Thus, the total number of update transactions is 30 times the number of insert transactions. Such a highly dynamic environment is supported neither by traditional database systems nor by traditional index structures.
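To make the arithmetic above concrete, the following back-of-the-envelope sketch computes the ratio of update to insert transactions; the object count and reporting period are assumed values for illustration only, not figures taken from the thesis.

```python
# Back-of-the-envelope illustration of update vs. insert volume; the
# numbers below are assumptions for the sake of the example.
n_objects = 10_000            # hypothetical number of subscribed moving objects
update_period_s = 10          # each object reports a new location every 10 seconds
window_s = 5 * 60             # observe the first five minutes after subscription

inserts = n_objects                                   # one insert per newly subscribed object
updates = n_objects * (window_s // update_period_s)   # 30 updates per object in 5 minutes

print(updates // inserts)     # -> 30: updates quickly dwarf inserts
```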

In general, location-aware environments pose three main challenges to existing database management systems:

1. Large number of update transactions. As a result of highly dynamic environments, new indexing structures need to be deployed that are optimized for updates.

2. Continuously moving queries. Unlike traditional snapshot queries, which are evaluated once, continuous moving queries need to be continuously updated. Furthermore, data objects may be added to or removed from the query answer. Existing database management systems do not support such functionality.

3. Large number of continuous queries. There is plenty of work on multi-query optimization in traditional database systems (e.g., see [4, 5]). The main idea is to exploit common parts of different query plans. However, the main focus has been only on snapshot queries. New techniques need to be explored to support scalable execution of continuous queries.

1.3 The PLACE Prototype Server

This dissertation presents the PLACE prototype server (Pervasive Location-Aware Computing Environments), a scalable location-aware database server. The PLACE server extends the Predator database management system [6], the Shore storage manager [7], and the Nile data stream management system [3] to support scalable execution of continuous spatio-temporal queries over spatio-temporal data and spatio-temporal data streams. The PLACE server aims to bridge the areas of spatio-temporal databases and data stream management systems. In general, the PLACE server has the following distinguishing characteristics:

Figure 1.3. Incremental evaluation of range queries: (a) snapshot at time T0; (b) snapshot at time T1.

1. Scalability in terms of supporting large numbers of moving objects. Such scalability is achieved by optimizing the scarce memory resource to store only the recently moved objects or those objects that are of interest to at least one outstanding continuous query. Further scalability is achieved by employing load shedding mechanisms that sacrifice some of the in-memory objects to support a larger number of queries, yet with an approximate answer.

2. Scalability in terms of supporting large numbers of continuous spatio-temporal queries. Such scalability is achieved by employing a shared execution paradigm among the concurrently outstanding continuous queries. The main idea is to abstract the problem of executing multiple continuous spatio-temporal queries into a spatio-temporal join operation. The inputs to the join operation are two streams: a stream of continuously moving objects and a stream of continuously moving queries. Furthermore, concurrently outstanding queries share the same in-memory buffers and disk-based data structures.

3. Incremental evaluation. Rather than performing a repetitive evaluation of continuous queries, the PLACE continuous query processor employs an incremental evaluation paradigm that continuously updates the query answer. The PLACE continuous query processor distinguishes between two types of updates, namely positive and negative updates. A positive update indicates that a certain object needs to be added to the query answer. Similarly, a negative update indicates that a certain object needs to be removed from the query answer. Figure 1.3 gives an example of applying the concepts of positive and negative updates to a set of continuous range queries (a code sketch of this idea follows this list). The snapshot of the database at time T0 is given in Figure 1.3a with nine moving objects, P1 to P9, and five continuous range queries, Q1 to Q5, in the two-dimensional space. The answer of the queries at time T0 is represented as (Q1, P5), (Q2, P1), (Q3, P6, P7), (Q4, P3, P4), and (Q5, P9). At time T1 (Figure 1.3b), only the objects P1, P2, P3, and P4 and the queries Q1, Q3, and Q5 change their locations. As a result, the PLACE server reports only the following updates: (Q1, -P5), (Q3, -P6), (Q3, +P8), and (Q4, -P4).

4. Supporting a wide variety of continuous spatio-temporal queries. The PLACE continuous query processor provides a general framework that supports a wide variety of continuous spatio-temporal queries, including continuous spatio-temporal range queries, continuous spatio-temporal k-nearest-neighbor queries, continuous aggregate queries, and continuous future queries. Furthermore, the PLACE continuous query processor supports both stationary and moving queries with the same performance.

5. Spatio-temporal operators. The PLACE continuous query processor goes beyond the idea of implementing high-level algorithms for continuous spatio-temporal queries. Instead, the PLACE server encapsulates the spatio-temporal query algorithms into a set of primitive spatio-temporal pipelined operators that can be part of a larger query plan. Having a set of primitive spatio-temporal operators greatly enhances query performance by pushing the spatio-temporal operators to the bottom of the query pipeline, and enables flexible query optimizers where multiple candidate query plans can be produced.
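As a rough illustration of the incremental evaluation described in item 3 above (cf. Figure 1.3), the sketch below computes only the positive and negative updates for a set of continuous range queries when an object reports a new location. It is a simplified, assumption-laden sketch with hypothetical names, not the PLACE server's actual query processor.

```python
# Hypothetical sketch of positive/negative updates for continuous range
# queries (cf. Figure 1.3); not the PLACE implementation.

def inside(rect, point):
    """rect = (x1, y1, x2, y2), point = (x, y)."""
    x1, y1, x2, y2 = rect
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2

def on_object_update(queries, answers, oid, new_pos):
    """Report only the changes to each query answer when object `oid` moves.

    queries: {qid: rect}; answers: {qid: set of oids currently in the answer}.
    Returns a list of (qid, '+', oid) / (qid, '-', oid) updates.
    """
    updates = []
    for qid, rect in queries.items():
        was_in = oid in answers[qid]
        now_in = inside(rect, new_pos)
        if now_in and not was_in:        # positive update: object enters the region
            answers[qid].add(oid)
            updates.append((qid, '+', oid))
        elif was_in and not now_in:      # negative update: object leaves the region
            answers[qid].remove(oid)
            updates.append((qid, '-', oid))
    return updates

# Example: one query Q1 over [0, 10] x [0, 10]; P5 moves out of it.
queries = {'Q1': (0, 0, 10, 10)}
answers = {'Q1': {'P5'}}
print(on_object_update(queries, answers, 'P5', (15, 3)))   # [('Q1', '-', 'P5')]
```

A real server would, of course, probe only the queries whose regions could contain the new position (e.g., through the grid structure discussed in Chapter 2) rather than scanning all outstanding queries.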

By subscribing to the PLACE server, moving objects are required to send their location updates periodically to the PLACE server. A location update from the client (moving object) to the server has the format (OID, x, y), where OID is the object identifier and (x, y) is the location of the moving object in the two-dimensional space. An update is timestamped upon its arrival at the server side. Once an object P stops moving (e.g., P reaches its destination or P is shut down), either P sends an explicit disappear message to the server or the server will time out after not receiving any updates from P for a certain time Ttimeout. In both cases, the server recognizes that object P is no longer moving.
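A minimal sketch of how such (OID, x, y) updates, explicit disappear messages, and the Ttimeout rule might be handled at the server follows; the data structures and the timeout value are assumptions for illustration only.

```python
# Illustrative update/timeout bookkeeping; not the PLACE server code.
import time

T_TIMEOUT = 60.0            # assumed timeout (seconds) before an object is considered gone
last_seen = {}              # OID -> (x, y, arrival timestamp)

def receive_update(oid, x, y):
    """Timestamp a (OID, x, y) location update upon arrival at the server."""
    last_seen[oid] = (x, y, time.time())

def receive_disappear(oid):
    """Explicit disappear message: the object reports that it stopped moving."""
    last_seen.pop(oid, None)

def expire_silent_objects():
    """Treat objects silent for more than T_TIMEOUT seconds as no longer moving."""
    now = time.time()
    for oid, (_x, _y, ts) in list(last_seen.items()):
        if now - ts > T_TIMEOUT:
            del last_seen[oid]
```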

Due to the highly dynamic nature of location-aware environments and the infinite size of incoming spatio-temporal streams, we cannot store all incoming data. Thus, the PLACE server employs a three-level storage hierarchy. First, a subset of the incoming data streams is stored in in-memory buffers. In-memory buffers are associated with the outstanding continuous queries at the server. Each query determines which tuples need to be in its buffer and when these tuples are expired, i.e., deleted from the buffer. Second, we keep on-disk storage that keeps track of only one reading of each moving object and query. Since we cannot update the disk storage every time we receive an update from a moving object, we sample the input data by choosing every kth reading to flush to disk. Moreover, we cache the readings of moving objects/queries and flush them to secondary storage once every T time units. Data on secondary storage is indexed using a simple grid structure. Third, every Tarchive time units, we take a snapshot of the on-disk database and flush it to a repository server. The repository server acts as a multi-version structure over the moving objects that supports historical queries. Stationary objects (e.g., gas stations, hospitals, restaurants) are preloaded into the system as relational tables that are infrequently updated.
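The three-level hierarchy above might be summarized by a sketch along the following lines; the class, its parameters, and its policies are simplifying assumptions, not the actual PLACE storage manager.

```python
# Simplified sketch of the memory / sampled-disk / archive hierarchy.

class StorageHierarchy:
    def __init__(self, k, t_archive):
        self.k = k                    # keep every k-th reading on disk
        self.t_archive = t_archive    # archive period, counted here in readings for simplicity
        self.memory = {}              # level 1: OID -> latest reading needed by some query
        self.disk = {}                # level 2: OID -> last sampled reading (grid-indexed in PLACE)
        self.archive = []             # level 3: snapshots shipped to the repository server
        self.seen = {}                # OID -> number of readings received
        self.ticks = 0

    def ingest(self, oid, reading, needed_by_some_query):
        self.seen[oid] = self.seen.get(oid, 0) + 1
        if needed_by_some_query:                 # keep in the query's in-memory buffer
            self.memory[oid] = reading
        if self.seen[oid] % self.k == 0:         # sample every k-th reading to disk
            self.disk[oid] = reading
        self.ticks += 1
        if self.ticks % self.t_archive == 0:     # periodic snapshot to the repository server
            self.archive.append(dict(self.disk))
```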


Figure 1.4. Server GUI.

Figures 1.4 and 1.5 give snapshots of the server and client graphical user interfaces (GUIs) of PLACE, respectively. The server GUI displays all moving objects on the map (the map in Figures 1.4 and 1.5 is of the Greater Lafayette area, Indiana, USA). The client GUI simulates a client end-device used by the users. Users can choose the type of query from a list of available query types that includes stationary range queries, moving range queries, stationary k-nearest-neighbor queries, and moving k-nearest-neighbor queries. The spatial region of the query can be determined using the map of the area of interest. By pressing the submit button, the client translates the query into SQL and transmits it to the PLACE server. The result appears both in the list of Figure 1.5 and as moving objects on the map. A client sees on its map only the objects that belong to its issued query.


Figure 1.5. Client GUI.

1.4 Contributions

The main contributions of this dissertation are as follows:

• We introduce a disk-based framework for scalable execution of multiple concurrent continuous spatio-temporal queries. The proposed framework employs two main paradigms: a shared execution paradigm as a means of achieving scalability and an incremental evaluation paradigm to avoid repetitive evaluation of continuous spatio-temporal queries.

• We introduce the first attempt to furnish data stream management systems with a set of primitive spatio-temporal pipelined query operators. The proposed spatio-temporal operators can be combined with traditional query operators to provide a wide set of continuous spatio-temporal queries over spatio-temporal data streams.

• We introduce a stream-based scalable pipelined query operator for evaluating large numbers of concurrent continuous spatio-temporal queries over spatio-temporal data streams. The proposed scalable operator takes two streams as its input: a stream of moving objects and a stream of moving queries.

• To cope with time intervals of high workloads in the number of moving objects and/or the number of moving queries, we introduce load shedding mechanisms that aim to reduce the memory load while guaranteeing that the query accuracy remains above a certain threshold.

The proposed query operators are evaluated based on a real implementation inside the query engine of a research prototype data stream management system. The performance results of all the proposed algorithms and operators validate our approaches. Experimental results are based on synthetic data of moving objects on a real road network.

1.5 Summary and Outline

In this chapter, we motivated the need for location-aware database servers along with the challenges they pose to existing database management systems. Then, we briefly highlighted the PLACE server, our location-aware database research prototype server. We also summarized our contributions to supporting continuous query processing in location-aware database servers through disk-based and stream-based frameworks and pipelined query operators that can be plugged into existing database and data stream management systems.

The rest of this dissertation is organized as follows. Chapter 2 points out the challenges we face in building the PLACE server along with the work related to each challenge. Chapter 2 also classifies continuous spatio-temporal queries based on their time domain and the mutability of both continuous queries and data objects. In Chapter 3, we present SINA, our proposed disk-based framework for achieving scalable and incremental evaluation of continuous spatio-temporal queries. Chapter 3 also provides the correctness proof of SINA in terms of completeness, uniqueness, and progressiveness. Chapter 4 presents a family of in-memory stream-based pipelined query operators that can be combined with traditional query operators to support a wide variety of continuous spatio-temporal queries. In Chapter 5, we propose SOLE, a scalable pipelined query operator for evaluating a set of spatio-temporal continuous queries over spatio-temporal streams. In addition, Chapter 5 introduces load shedding mechanisms to cope with time intervals of high workloads in the number of objects and/or the number of continuous spatio-temporal queries. An experimental evaluation of the SOLE framework based on a real implementation inside a data stream management research prototype is also provided in Chapter 5. Finally, Chapter 6 concludes this dissertation and points out future research directions.

Parts of this dissertation have been published in workshops, conferences, and journals. The disk-based scalable incremental framework for continuous spatio-temporal queries has been published in ACM-SIGMOD-2004 [8]. Stream-based query operators have been published in MDM-2005 [9]. Vision and overview papers of the PLACE prototype server have been published in the ACM-GIS Symposium [1], the ICDE PhD Workshop [10], the STDBM workshop [11], and the GeoInformatica Journal [12]. The PLACE prototype system has been demonstrated in VLDB-2004 [13].


2 CHALLENGES AND THEIR RELATED WORK

In this chapter, we start by presenting a thorough classification of continuous spatio-temporal queries. Then, we present a set of challenges that we face in developing the PLACE location-aware database server. With each challenge, we briefly highlight its related work and the PLACE approach to dealing with it.

This chapter is organized as follows. Section 2.1 classifies continuous spatio-temporal queries based on both the temporal dimension and the mutability of objects and queries. In Section 2.2, we discuss the PLACE approach to dealing with the massive size of incoming spatio-temporal data. Section 2.3 presents the related work in continuous spatio-temporal query processing along with the PLACE approach to tackling such continuous evaluation. The scalability of the PLACE location-aware server is discussed in Section 2.4 along with the related work. Section 2.5 presents the need for a general framework that supports a wide variety of continuous spatio-temporal queries. Finally, Section 2.6 summarizes this chapter.

2.1 Spatio-temporal Query Classification

There is a wide variety of continuous spatio-temporal queries. In this section, we provide two classifications of continuous spatio-temporal queries based on the temporal dimension and the mutability of both objects and queries. With each classification, we give an example along with related work that deals with such queries. As the first classification is based on the query time, spatio-temporal queries can be classified as:

• Historical Spatio-temporal Queries. Historical queries ask about past data. An example of a historical query is "Find the locations of a certain object between 7 AM and 8 AM today". A continuous version of this query is "Continuously, find the locations of a certain object in the last hour". In this case, the continuous query time interval (the last hour) is a sliding time window. To support historical queries, a location-aware server needs to store and index all the incoming locations of moving objects. Examples of spatio-temporal indexing techniques that support historical queries include the HR-tree [14], the HR+-tree [15], the TB-tree [16], the MV3R-tree [15], and SETI [17].

• NOW Spatio-temporal Queries. NOW queries are interested only in the current locations of moving objects. An example of a NOW query is "Based on my current location, what is the nearest gas station?". Due to the highly dynamic environments supported by location-aware servers, dealing with NOW queries is challenging. To answer NOW queries, a location-aware server needs to keep track of the latest locations of all moving objects. Examples of spatial access methods that support NOW queries include hashing [18], the VCI-Index [19], the Q-Index [19], the LUR-tree [20], and the frequently updated R-tree [21].

• Future Spatio-temporal Queries. Future queries are interested in predicting the locations of moving objects. Additional information (e.g., velocity or destination) needs to be sent from the moving objects to the location-aware server. An example of a future query is "Alert me if a non-friendly airplane is going to cross a certain region in the next 30 minutes". Notice that in this query the alert is sent before the actual event happens; hence, it is termed a future or predictive query. Examples of spatio-temporal access methods that support future queries include the TPR-tree [22], the REXP-tree [23], the TPR*-tree [24], and STRIPES [25].

The second classification of spatio-temporal queries is based on the mutability of both objects and queries. Thus, continuous spatio-temporal queries can be classified as:


• Stationary Queries on Moving Objects. In this category, the query regions are stationary, while objects are moving. Examples of these queries include "How many trucks are within the city boundary?" and "Find the nearest 100 taxis to a certain hotel". In these queries, the query regions (the city boundary and the hotel neighborhood) are fixed, while the objects of interest (trucks and taxis) are moving. Two approaches have been proposed to support continuous stationary queries. The first approach is to index the moving objects with a spatio-temporal access method [22–24]. The second approach is to index the fixed queries with a spatial access method [19, 26].

• Moving Queries on Stationary Objects. In this category, query regions are moving, while objects are stationary. An example of this category is "As I am moving along a certain trajectory, show me all gas stations within 3 miles of my location". This category of queries employs traditional methods to organize the fixed objects (e.g., fractals [27–29] or R-trees [30]). Efficient algorithms that utilize the R-tree have been proposed for continuous single nearest-neighbor queries [31] and continuous k-nearest-neighbor queries [32].

• Moving Queries on Moving Objects. In this category, both query regions and objects are moving. An example of such queries is "As I (the sheriff) am moving in space, make sure that the number of police cars within 3 miles of my location is more than a certain threshold". In this case, the query region is moving, and the objects of interest (police cars) are also moving. To support moving queries in a location-aware server, moving objects need to be indexed using a TPR-tree-like structure (e.g., [22–24]). Then, special algorithms are developed to process moving queries over TPR-tree-like structures.

The previous classifications can be applied to any kind of continuous or snapshot spatio-temporal query (e.g., range queries, k-nearest-neighbor queries, reverse-nearest-neighbor queries [33, 34], and aggregate queries [35, 36]). In this dissertation, we go beyond the idea of having tailored algorithms and data structures for each specialized query type. Instead, we provide a general framework that supports all the mutability combinations of continuous spatio-temporal queries. For simplicity, we present our proposed algorithms and data structures in the context of NOW queries. The extension to the case of future queries is straightforward. Continuous historical queries (i.e., sliding window queries) are discussed extensively in data stream applications (e.g., [37–41]) and are beyond the scope of this dissertation.

2.2 Challenge I: Massive Size of Incoming Spatio-temporal Data

Existing continuous query processors for spatio-temporal databases assume explicitly that all incoming data can be indexed and/or stored in secondary storage. A wide variety of spatio-temporal access methods (e.g., see [42] for a survey) has been introduced to deal with massive sizes of spatio-temporal data. With the highly dynamic nature of location-aware environments, several attempts (e.g., the LUR-tree [20], the FUR-tree [21], and the CTR-tree [43]) have been proposed to tune traditional index structures to support frequent updates. Traditional index structures are optimized for answering queries and inserting new data, not for supporting frequent updates. The Lazy Update R-tree [20] (LUR-tree) aims to handle the frequent updates of moving objects without degrading the performance of the R-tree index structure. The main idea is that as long as the new position of a moving object lies inside its minimum bounding rectangle (MBR), no action is taken other than updating the position. Once an object moves out of its MBR, two approaches are proposed: (1) the object is deleted and reinserted, causing the necessary merge and split operations; (2) if the object does not move very far from the MBR, the MBR can be extended to enclose the new location. The Frequently Updated R-tree [21] (FUR-tree) extends the idea of the LUR-tree by investigating several bottom-up approaches to accommodate the frequent updates of moving objects. Examples of these approaches include extending the MBR to enclose the new value and moving the current object to one of the siblings. While both the LUR-tree and the FUR-tree have constraints on the pattern of object movement, the Change-Tolerant R-tree [43] does not have any restrictions on the object movement.
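To illustrate the lazy-update idea behind the LUR-tree (and, with bottom-up extensions, the FUR-tree), the following decision function sketches the three possible outcomes when an object reports a new position. It is an interpretation of the idea only, with an assumed extension threshold, not the published algorithms.

```python
# Sketch of the LUR-tree-style update decision; illustrative only.

def contains(mbr, pos):
    (x1, y1, x2, y2), (x, y) = mbr, pos
    return x1 <= x <= x2 and y1 <= y <= y2

def near(mbr, pos, eps):
    x1, y1, x2, y2 = mbr
    x, y = pos
    return x1 - eps <= x <= x2 + eps and y1 - eps <= y <= y2 + eps

def enlarge(mbr, pos):
    x1, y1, x2, y2 = mbr
    x, y = pos
    return (min(x1, x), min(y1, y), max(x2, x), max(y2, y))

def lazy_update_action(mbr, new_pos, eps):
    """Decide how a lazy-update R-tree would handle the moved object."""
    if contains(mbr, new_pos):
        return ("update-in-place", mbr)            # cheapest case: no index change
    if near(mbr, new_pos, eps):
        return ("extend-mbr", enlarge(mbr, new_pos))
    return ("delete-and-reinsert", mbr)            # falls back to the costly R-tree path

print(lazy_update_action((0, 0, 10, 10), (10.5, 4.0), eps=1.0))
# ('extend-mbr', (0, 0, 10.5, 10))
```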

The common objective of the LUR-tree, the FUR-tree, and the CTR-tree is to push the limits of traditional R-trees to provide efficient support for updating existing data. However, these index structures are valid only for moderate update frequencies. For highly dynamic environments, the performance of all R-tree modifications degrades dramatically. In fact, for very high update arrival rates (e.g., data streaming environments), only in-memory algorithms are feasible. Although there is extensive work on querying streaming data (e.g., see [37, 38, 44–49]), there is limited work that exploits the spatial and/or temporal properties of data streams. The spatial properties of data streams have been addressed recently in [50, 51] to solve geometric problems, e.g., computing the convex hull [51]. In [52], spatio-temporal histograms are used as synopses for approximate query processing over spatio-temporal data streams. To the best of our knowledge, there is no existing work that addresses continuous query processing for spatio-temporal streams.

The PLACE approach. To deal with the massive size of data arriving at the PLACE server, we employ two techniques. First, a disk-based technique aims to index both frequently updated data objects and moving queries using a simple grid structure. Grid structures are simple to update compared to the split and merge procedures necessary for updating R-tree-like structures. Second, for the case of streaming data, we employ in-memory techniques that limit the focus of the PLACE query processor to only those objects that are of interest to at least one outstanding continuous query. This is in contrast to existing streaming engines, which use the sliding-window query model to limit the focus of the query processing engine to only the recent history (e.g., see [38, 41, 46, 47, 49, 53, 54]). Our proposed model differs from the sliding-window query model in two aspects: (1) we are interested in querying the current locations of moving objects, which is not supported by sliding-window queries, since they support only historical queries; (2) in our model, data objects expire in a random order, rather than the first-in-first-expire order used in sliding-window queries.
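A minimal uniform-grid sketch, under assumed data structures (not the PLACE index), illustrates why grid updates are cheap: a moving object touches a single cell, and only the queries registered in that cell need to be re-checked.

```python
# Illustrative uniform grid over objects and query regions.

class UniformGrid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.objects = {}    # cell -> {oid: (x, y)}
        self.queries = {}    # cell -> set of qids whose region overlaps the cell

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def register_query(self, qid, rect):
        x1, y1, x2, y2 = rect
        cx1, cy1 = self._cell(x1, y1)
        cx2, cy2 = self._cell(x2, y2)
        for cx in range(cx1, cx2 + 1):           # mark every cell the region overlaps
            for cy in range(cy1, cy2 + 1):
                self.queries.setdefault((cx, cy), set()).add(qid)

    def update_object(self, oid, x, y):
        # A real index would also clear the object's previous cell on a move.
        cell = self._cell(x, y)
        self.objects.setdefault(cell, {})[oid] = (x, y)
        return self.queries.get(cell, set())     # only these queries need re-checking

grid = UniformGrid(cell_size=10)
grid.register_query('Q1', (0, 0, 25, 15))
print(grid.update_object('P1', (12, 7)))         # {'Q1'}
```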

2.3 Challenge II: Repetitive Evaluation of Continuous Queries

Most of the existing techniques in spatio-temporal databases abstract any continuous query into a series of snapshot queries executed at different time intervals. Different approaches aim to support various optimizations of the time interval between any two consecutive evaluations of the continuous query. Mainly, three different approaches have been investigated:

1. The validity of the results [55, 56]. With each query answer, the server returns a valid time [56] or a valid region [55] for the answer. The valid time and the valid region indicate the temporal and the spatial validity of the returned answer, respectively. Once the valid time expires or the client goes out of the valid region, the client resubmits the continuous query for complete reevaluation.

2. Caching the results [32, 57]. The main idea is to resubmit the continuous query every fixed time interval T. The most recent query result is cached either on the client side [32] or on the server side [57]. Upon resubmission, the previously cached results are used to prune the search for the new results of k-nearest-neighbor queries [32] and range queries [57].

3. Precomputing the result [31, 57]. If the trajectory of the query movement is known a priori, then by using computational geometry for stationary objects [31] or velocity information for moving objects [57], we can identify which objects will be nearest neighbors to [31] or within a range [57] of the query trajectory. However, if the trajectory information changes, then the query needs to be reevaluated.

The PLACE approach. It is clear that query reevaluation consumes system resources by doing redundant query processing. In PLACE, we avoid query reevaluation. Instead, we employ an incremental evaluation paradigm. With incremental evaluation, only the changes to the query answer are evaluated and sent to the user. A distinguishing characteristic of continuous spatio-temporal queries is that we need the ability to remove some parts of the query answer (e.g., when an object moves out of a range query). This feature is not available in traditional continuous queries, where the query answer is append-only. Thus, incremental evaluation in PLACE means continuously updating the query answer by a set of positive and negative updates. A positive update indicates the addition of a certain object to the query answer. Similarly, a negative update indicates the removal of a certain object from the query answer.

2.4 Challenge III: Large Numbers of Concurrent Continuous Queries

Most of the existing spatio-temporal algorithms focus on evaluating only one

outstanding continuous spatio-temporal query (e.g., see [31–33, 55–59]). Since the

continuous query stays active at the server side for a long time, it is highly

likely that new continuous queries will be submitted to the server while there are

existing active queries. Dealing with each outstanding query as a single thread

will easily consume the system resources. Optimization techniques for evaluating

a set of concurrent continuous spatio-temporal queries have recently been addressed for

centralized [19] and distributed environments [60, 61]. Techniques in distributed

environments assume that clients have computational and storage capabilities to

share the query processing with the server. The main idea of [60,61] is to ship some

part of the query processing down to the moving objects, while the server mainly

acts as a mediator among moving objects. This assumption is not always realistic.

In many cases, clients use cheap, low-battery, passive devices that do not have

any computational or storage capabilities. While [60] is limited to stationary range

queries, [61] can be applied for both moving and stationary queries. The Q-Index [19]

considers the case of centralized environments where there is no client overhead. The


main idea of the Q-index is to build an R-tree-like index structure on the queries

instead of the objects. Then, at each time interval T , moving objects probe the

Q-index to find the queries they belong to. The Q-index is limited in two aspects:

(1) It performs reevaluation of all the queries (through the R-tree index) every T

time units. (2) It is applicable only for stationary queries. Moving queries would

spoil the Q-index and hence dramatically degrade its performance.

The PLACE approach. In the PLACE server, we exploit the shared execution

paradigm as a means of achieving scalability for concurrently executing continuous

spatio-temporal queries. The main idea is to group similar queries in a query table.

Then, the evaluation of a set of continuous spatio-temporal queries is abstracted as

a spatio-temporal join between the moving objects and the moving queries. Sim-

ilar ideas of shared execution have been exploited in different contexts (e.g., the

NiagaraCQ [62] for web queries, PSoup [63], and [38] for streaming queries).

2.5 Challenge IV: Wide Variety of Continuous Queries

A major challenge for spatio-temporal continuous query processors is the wide

variety of spatio-temporal query types (e.g., see Section 2.1). Most of the existing

approaches lack generality in that they focus only on solving special cases of

continuous spatio-temporal queries. For example, [31,32,55,56] focus only on moving

queries on stationary objects. These techniques are not applicable in the case of

moving objects. Similarly, [19, 35, 60, 61] focus only on stationary range queries on

moving objects. Also, these techniques cannot support the concept of moving queries.

Other work focuses on aggregate queries (e.g., see [35, 36, 52]), k-NN queries (e.g.,

see [32,58]), and reverse nearest-neighbor queries [33]. From a system point of view,

it is cumbersome to implement various techniques with different data structures to

support each query type with a certain specialized algorithm.

The PLACE approach. In the PLACE server, we develop disk-based and memory-

based general frameworks that can support a wide variety of continuous range queries


that include range queries, k-nearest-neighbor queries, aggregate queries, and future

queries. Also, any mutability combination of objects and queries is handled

within the same general framework.

2.6 Summary

In this chapter, we presented two classifications of continuous spatio-temporal

queries. The first classification is based on the temporal domain. In this classifica-

tion, continuous spatio-temporal queries are either historical queries, now queries,

or future queries where the continuous queries are concerned with past, current, or

future data, respectively. The second classification is based on the mutability of

both objects and queries. In this classification, continuous spatio-temporal queries

are either stationary queries on moving objects, moving queries on stationary objects,

or moving queries on moving objects. Also in this chapter, we have discussed four

main challenges in realizing location-aware database servers. These challenges are:

(1) Dealing with massive data sizes, (2) The repetitive evaluation of continuous queries,
(3) Supporting a large number of concurrent continuous spatio-temporal queries, and

(4) Supporting a wide variety of continuous spatio-temporal queries. With each

challenge, we have discussed the related work and briefly outlined the PLACE
approach to dealing with it.


3 DISK-BASED SPATIO-TEMPORAL CONTINUOUS QUERY PROCESSING

In this chapter, we introduce the Scalable INcremental hash-based Algorithm (SINA,

for short) for continuously evaluating a dynamic set of continuous spatio-temporal

queries. SINA exploits two main paradigms: Shared execution and incremental eval-

uation. By utilizing the shared execution paradigm, continuous spatio-temporal

queries are grouped together and joined with the set of moving objects. By uti-

lizing the incremental evaluation paradigm, SINA avoids continuous reevaluation of

spatio-temporal queries. Instead, SINA updates the query results every T time units

by computing and sending only updates of the previously reported answer. We dis-

tinguish between two types of query updates: Positive updates and negative updates.

Positive updates indicate that a certain object needs to be added to the result set

of a certain query. In contrast, negative updates indicate that a certain object is

no longer in the answer set of a certain query. As a result of having the concept of

positive and negative updates, SINA achieves two goals: (1) Fast query evaluation,

since SINA computes only the update (change) of the answer not the whole answer.

(2) In a typical spatio-temporal application (e.g., location-aware services and traffic

monitoring), query results are sent to customers via satellite servers [64]. Thus, lim-

iting the transmitted data to only the positive and negative updates, rather than
the whole query answer, saves network bandwidth.

SINA is a general framework that deals with all mutability combinations of ob-

jects and queries. Thus, it is applicable to stationary queries on moving objects, mov-

ing queries on stationary objects, and moving queries on moving objects. For sim-

plicity, we present SINA in the context of continuous spatio-temporal range queries.

However, as will be discussed in Section 3.3, SINA is applicable to a broad class of

continuous spatio-temporal queries (e.g., nearest-neighbor and aggregate queries).

SINA is proved to be correct with respect to the following: (a) Completeness, i.e.,


all query results will be produced by SINA. (b) Uniqueness, i.e., SINA produces

duplicate-free results. (c) Progressiveness, i.e., SINA reports only the updates of

the previously reported answer. In contrast to previously proposed approaches (e.g.,

see [19, 21]) that rely mainly on R-tree-like data structures, SINA employs a simple

disk-based grid data structure that is shared among all moving objects and moving

queries. The main idea of using a grid structure is to cope with the highly
dynamic nature of location-aware environments, where updates arrive in large numbers.
The update cost of the grid structure is much lower than that of
R-tree-like data structures. Experimental results show that SINA outperforms other
R-tree-based algorithms (e.g., the Q-index [19] and the Frequently Updated R-tree [21]).

The rest of the chapter is organized as follows: Section 3.1 introduces the concept

of shared execution for a group of spatio-temporal queries. Section 3.2 introduces

the Scalable INcremental hash-based Algorithm (SINA). The extensibility of SINA

to a variety of continuous spatio-temporal queries and to handle clients that are

disconnected from the server for short periods of time is discussed in Section 3.3.

The correctness proof of SINA is given in Section 3.4. Section 3.5 provides an

extensive list of experiments to study the performance of SINA. Finally, Section 3.6

summarizes this chapter.

3.1 Shared Execution of Continuous Spatio-temporal Queries

SINA exploits the shared execution paradigm as a means of achieving scalability

for concurrently executing continuous spatio-temporal queries. The main idea is to

group similar queries in a query table. Then, the evaluation of a set of continuous

spatio-temporal queries is abstracted as a spatial join between the moving objects

and the moving queries.

Figure 3.1a gives the execution plans of two simple continuous spatio-temporal

queries, Q1: "Find the objects inside region R1", and Q2: "Find the objects inside
region R2". Each query performs a file scan on the moving object table followed by


(a) Local query plan for two range queries (b) A global shared plan for two range queries

Figure 3.1. Shared execution of continuous queries.

a selection filter. With shared execution, we have the execution plan of Figure 3.1b.

The table for moving queries contains the regions of the range queries. Then, a

spatial join is performed between the table of objects (points) and the table of

queries (regions). The output of the spatial join is split and is sent to the queries.

For stationary objects (e.g., gas stations), the spatial join can be performed using

an R-tree index [30] on the object table. Similarly, if the queries are stationary, the

Q-index [19] can be used for query indexing. However, if both objects and queries

are highly dynamic, the R-tree and Q-index structures result in poor performance.

To avoid this drawback, we can follow one of two approaches: (1) Utilize the tech-

niques of frequently updated R-trees (e.g., see [20, 21]) to cope with the frequent

updates of moving objects and moving queries. (2) Use a spatial join algorithm that

does not assume the existence of any indexing structure. Our proposed Scalable IN-

cremental hash-based Algorithm (SINA) utilizes the second approach. Experimental

results, given in Section 3.5, compare SINA with the first approach and highlight

the drawbacks and advantages of each approach.
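For concreteness, the following minimal C++-style sketch (our illustration only; the struct and function names are hypothetical, and a naive nested loop stands in for the spatial join operator) spells out the shared plan of Figure 3.1b: the object table is joined once with the query table, and the join output is split into one answer list per query.

#include <map>
#include <vector>

struct Object { int id; double x, y; };               // moving objects (points)
struct Query  { int id; double xlo, ylo, xhi, yhi; }; // range queries (regions)

// One shared pass over both tables; the join output is split per query,
// instead of scanning the object table once for every individual query.
std::map<int, std::vector<int>> SharedRangePlan(const std::vector<Object>& objects,
                                                const std::vector<Query>& queries) {
    std::map<int, std::vector<int>> answers;          // query id -> ids of contained objects
    for (const Query& q : queries)
        for (const Object& o : objects)
            if (o.x >= q.xlo && o.x <= q.xhi && o.y >= q.ylo && o.y <= q.yhi)
                answers[q.id].push_back(o.id);
    return answers;
}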

3.2 The SINA Framework

The main idea of the Scalable INcremental hash-based Algorithm (SINA) is to

maintain an in-memory table, termed Updated Answer, that stores the positive and negative updates to be sent to the clients during the course of execution.


Figure 3.2. State diagram of SINA.

Positive

updates indicate that a certain object needs to be added to the query results. Sim-

ilarly, negative updates indicate that a certain object needs to be removed from

the previously reported answer. Entries in the Updated Answer table have the form

(QID, Update List(±,OID)) where QID is the query identifier, the Update List is a

list of OIDs (object identifiers) and the type of update (+ or −). To reduce the size

of the Updated Answer table, negative updates may cancel previous positive updates

and vice versa. SINA sends the set of updates to the appropriate queries every T

time units.
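The following minimal sketch (our own illustration; field names and types are assumptions) shows one possible in-memory layout of the Updated Answer table together with the cancellation rule, where a new update that meets a pending update of the opposite sign for the same (query, object) pair removes it instead of being appended.

#include <map>
#include <utility>
#include <vector>

struct UpdatedAnswer {
    // QID -> list of (sign, OID); sign is +1 for a positive update, -1 for a negative one.
    std::map<int, std::vector<std::pair<int, int>>> updates;

    void Add(int qid, int sign, int oid) {
        auto& list = updates[qid];
        for (auto it = list.begin(); it != list.end(); ++it) {
            if (it->second == oid && it->first == -sign) {
                list.erase(it);              // opposite update pending: the two cancel out
                return;
            }
        }
        list.push_back({sign, oid});         // otherwise record the new update
    }
};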

SINA has three phases: The hashing, invalidation, and joining phases. Figure 3.2

provides a state diagram of SINA. The hashing phase is continuously running where

it receives incoming information from moving objects and moving queries. While

tuples arrive, an in-memory hash-based join algorithm is applied between moving

objects and moving queries. The result of the hashing phase is a set of positive updates added to the Updated Answer table.


(a) Snapshot at time T0 (b) Snapshot at time T1

Figure 3.3. Example of range spatio-temporal queries.

The invalidation phase is triggered

every T time units, or when memory is full, to flush in-memory data to disk. The
invalidation phase acts as a filter for the joining phase: it
reports negative updates for some objects to save their processing in the joining phase.
The joining phase is triggered at the end of the invalidation phase to perform a join
between the in-memory moving objects and queries and the in-disk stationary objects and

queries. The joining phase results in reporting both positive and negative updates.

Once the joining phase is completed, the positive and negative updates are sent to

the users that issued the continuous queries.

Throughout this section, we use the example given in Figure 3.3 to illustrate the

ideas and execution of SINA. Figure 3.3a gives a snapshot of the database at time T0

with nine moving objects, p1 to p9, and five continuous range queries, Q1 to Q5. At

time T1 (Figure 3.3b), only the objects p1, p2, p3, and p4 and the queries Q1, Q3, and

Q5 change their locations. The old query locations are plotted with dotted borders.

Black objects are stationary, while white objects are moving.

We use the term "moving" objects/queries at time Ti to indicate the set of ob-

jects/queries that report a change of information from the last evaluation time

Ti−1. Moving objects and queries are stored in memory for the evaluation time


Ti. Similarly, we use the term "stationary" objects/queries to indicate the set of ob-

jects/queries that did not report any change of information from the last evaluation

time Ti−1. Stationary objects and queries are stored in disk at the evaluation time

Ti. Notice that stationary objects/queries at time Ti may become moving objects

and queries at time Ti+1 and vice versa.

3.2.1 Phase I: Hashing

Data Structure. The hashing phase maintains two in-memory hash tables,

each with N buckets for sources P and R that correspond to moving objects (i.e.,

points) and moving queries (i.e., rectangles), respectively. In addition, for the moving

queries, we keep an in-memory query table that keeps track of the corresponding

buckets of the upper-left and lower-right corners of the query region. In the following,

we use the symbols Pk and Rk to denote the kth bucket of P and R, respectively.

Algorithm. Figures 3.4 and 3.5 provide an illustration and pseudo code of the

hashing phase, respectively. Once a new moving object tuple t with hash value

k = hP (t) is received (Step 2 in Figure 3.5), we probe the hash table Rk for moving

queries that can join with t (i.e., contain t) (Step 2b in Figure 3.5). For the queries

that satisfy the join condition (i.e., the containment of the point objects in the query

region), we add positive updates to the Updated Answer table (Step 2c in Figure 3.5).

Then, we store t in the hash bucket Pk (Step 2d in Figure 3.5). Similarly, if a

moving query tuple t is received, we probe all the hash buckets of P that intersect

with t. For the objects that satisfy the join condition, we add positive updates to

the Updated Answer table (Step 4b in Figure 3.5). Then, the tuple t is clipped and

is stored in all the R buckets that t overlaps. Finally, to keep track of the list of

buckets that t intersects with, we store t in the in-memory query table with two

bucket numbers; the upper-left and the lower-right (Step 5 in Figure 3.5).
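As a concrete illustration, the following sketch shows one way the hash functions hP and hR could be realized over a uniform grid on the unit square; the constant N (taken here from the experimental setting of Section 3.5) and all names are our assumptions, not part of SINA's specification. A point maps to exactly one bucket, while a query rectangle maps to every bucket it overlaps.

#include <algorithm>
#include <vector>

const int N = 40;                                     // buckets per dimension

// hP: a point object falls into exactly one bucket of the N x N grid.
int HashPoint(double x, double y) {
    int col = std::min(static_cast<int>(x * N), N - 1);
    int row = std::min(static_cast<int>(y * N), N - 1);
    return row * N + col;
}

// hR: a query rectangle is clipped to all buckets it overlaps
// (the query table additionally records the two corner buckets).
std::vector<int> HashRect(double xlo, double ylo, double xhi, double yhi) {
    int c1 = std::min(static_cast<int>(xlo * N), N - 1), c2 = std::min(static_cast<int>(xhi * N), N - 1);
    int r1 = std::min(static_cast<int>(ylo * N), N - 1), r2 = std::min(static_cast<int>(yhi * N), N - 1);
    std::vector<int> buckets;
    for (int r = r1; r <= r2; r++)
        for (int c = c1; c <= c2; c++)
            buckets.push_back(r * N + c);
    return buckets;
}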

Example. In the example of Figure 3.3, the hashing phase is concerned with

objects and queries that report a change of location in the time interval [T0, T1].


Figure 3.4. Phase I: Hashing.

Thus, the objects p1, p2, p3, p4 are joined with the queries Q1, Q3, Q5. Only the

positive update (Q3, +p2) is reported.

Discussion. The hashing phase is designed to deal only with in-memory data;
thus, there is no I/O overhead. Joining the in-memory data with the in-disk objects and
queries is deferred to the joining phase. The fact that the hashing phase
performs an in-memory join within the hashing process enables sending early and fast
results to the users. In many applications, it is desirable that users have early and
fast partial results, sometimes at the price of slightly increasing the total execution

time. Similar ideas for in-memory hash-based join have been studied in the context

of non-blocking join algorithms, e.g., the symmetric hash join [65], XJoin [66], the

hash-merge join [67], and the RPJ [68].

3.2.2 Phase II: Invalidation

Data Structure. Figure 3.6 sketches the data structures used in the invalida-

tion phase. The invalidation phase relies on partitioning the two-dimensional space


Procedure HashingPhase(tuple t, source (P/ R))

Begin

1. If there is not enough memory to accommodate t, start the InvalidationPhase(),

return

2. If (source==P) //Moving object

(a) k = the hash value hP (t) of tuple t.

(b) Sq = Set of queries from joining t with queries in Rk

(c) For each Q ∈ Sq, add (Q,+t) to Updated Answer

(d) Store t in Bucket Pk

(e) return

3. Sk = Set of buckets resulting from the hash function hR(t)

4. For each bucket k ∈ Sk

(a) So = Set of objects from joining t with objects in Pk

(b) For each O ∈ So, add (t,+O) to Updated Answer

(c) Store a clipped part of t in Bucket Rk

5. Store t in the query table

End.

Figure 3.5. Pseudo code of the Hashing phase

into N × N grid cells1. Objects and queries are stored in grid cells based on their

locations. To handle skewed data distributions of objects and queries, we employ
techniques similar to those in [70], where we map grid cells into smaller-size tiles in a round

1For simplicity, we present SINA in the context of a disk-based grid. However, the uniform grid can be substituted by more sophisticated structures, e.g., the FUR-tree [21] or quad-tree-like structures [69].


Figure 3.6. Phase II: Invalidation.

robin fashion. Tiles are directly mapped to disk-based pages. An object entry O

has the form (OID, loc, t, QList), where OID is the object identifier, loc is the re-

cent location of the object, t is the timestamp of the recently reported location loc,

and QList is the list of the queries that O is satisfying. A query Q is clipped to

all grid cells that Q overlaps with. For any grid cell C, a query entry Q has the

form (QID, region, t, OList), where QID is the query identifier, region is the recent

rectangular region of Q that intersects with C, t is the timestamp of the recently

reported region, and OList is the list of the objects in C that satisfy Q.region. In

addition to the grid structure, we keep track of two auxiliary data structures; the

object index and the query index. The object and query indexes are indexed on the

OID and QID, respectively, and are used to locate the
old entries of moving objects and queries given their identifiers.
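To make the entry formats concrete, the following sketch (our own; the concrete types are assumptions) mirrors the disk-resident object and query entries and the two auxiliary indexes described above.

#include <unordered_map>
#include <utility>
#include <vector>

struct ObjectEntry {                 // one entry per object, stored in its grid cell
    int OID;                         // object identifier
    double x, y;                     // most recent location (loc)
    long t;                          // timestamp of the recently reported location
    std::vector<int> QList;          // queries that the object currently satisfies
};

struct QueryEntry {                  // one entry per (query, overlapping grid cell)
    int QID;                         // query identifier
    double xlo, ylo, xhi, yhi;       // clipped query region inside this cell
    long t;                          // timestamp of the recently reported region
    std::vector<int> OList;          // objects in this cell that satisfy the query
};

// Auxiliary indexes: find the old entries of a moving object/query given its identifier.
std::unordered_map<int, int> object_index;                 // OID -> grid cell of the old entry
std::unordered_map<int, std::pair<int, int>> query_index;  // QID -> (upper-left, lower-right) cells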

Algorithm. The pseudo code of the invalidation phase is given in Fig-

ures 3.7, 3.8, and 3.9. The invalidation phase starts by flushing the non-empty

buckets that contain moved objects (Step 1 in Figure 3.7) and moved queries (Step 2

in Figure 3.7) into the corresponding grid cells in disk. Figure 3.8 gives the pseudo

code of invalidating a moving object Mo that is mapped into grid cell Gk. If there is

an old entry of Mo in Gk, this means that Mo did not cross a cell boundary. Thus,


Procedure InvalidationPhase()

Begin

• For (k=0;k<MAX GRID CELL;k++)

1. For each moving object Mo ∈ Pk, call Invalidate Object(Mo,Gk)

2. For each moving query Mq ∈ Rk

(a) if Mq ∈ Gk, update the information of Mq in Gk

(b) else, insert a new entry in Gk for Mq, with an OList initialized from the

Updated Answer

• call Invalidate Queries()

End.

Figure 3.7. Pseudo code of the Invalidation phase.

we update only the information of Mo in Gk (Step 1 in Figure 3.8). If Mo is a new

entry in Gk, we insert a new entry for Mo in Gk with the current timestamp and

a QList that contains the moving queries from the Updated Answer table that are

satisfied by Mo (Step 2 in Figure 3.8). Then, we utilize the auxiliary structure object

index using Mo.OID to get the old entry Oold of Mo (Step 3 in Figure 3.8). For all

queries in Oold.QList, we report negative updates to the Updated Answer table and

update the corresponding OLists (Step 6 in Figure 3.8). Finally, we delete the old

entry of Mo (Step 7 in Figure 3.8).

The invalidation process of moving queries starts by flushing query parts in the

corresponding disk-based cells (Step 2 in Figure 3.7). Similar to moving objects, we

either update an old entry or insert a new one. Then, we compare the in-memory

query table with the in-disk query index. For each moving query, we keep track of

a set Sk that contains the cells that were part of the old region of the query, but are

not in the new query region (Step 1 in Figure 3.9). Then, we send negative updates


Procedure Invalidate Object(Object Mo, GridCell Gk)

Begin

1. If Mo ∈ Gk

(a) Update the location and timestamp of Mo in Gk

(b) Sq = Queries in Updated Answer that contains Mo

(c) For each query Q ∈ Sq ∩ Mo.QList, add (Q,−Mo) to Updated Answer

(d) Mo.QList = Mo.QList ∪ Sq

(e) return

2. Insert Mo as a new entry in Gk with the current timestamp and a QList initialized from

the Updated Answer

3. Gold = Old cell of Mo from the Object index table

4. If Gold = NULL, return

5. Retrieve Oold; the old entry of Mo from Gold

6. For each query Q ∈ Oold.QList

(a) Add (Q,-Mo) to Updated Answer table

(b) Remove the entry Mo from Q.Olist

7. Delete the entry Oold from Gold

End.

Figure 3.8. Pseudo code invalidating moving objects.

for each object that was part of the query answer in each grid cell of Sk (Step 2 in

Figure 3.9). Finally, we delete the old entry of the moving query.


Procedure Invalidate Queries()

Begin

• For each query Mq in the in-memory query table

1. Sk = Set of grid cells that were covered by the old region of Mq and are not covered
by the new region of Mq
2. For each grid cell k ∈ Sk

(a) Retrieve Qold; the old entry of Mq in cell k

(b) For each O ∈ Qold.OList, add (Mq,−O) to Updated Answer and remove

Mq from O.QList

3. Delete the entry Qold from k

End.

Figure 3.9. Pseudo code invalidating moving queries.

Example. For the example given in Figure 3.3, the invalidation phase is con-

cerned only with moving objects and queries that change their locations in the time

interval [T0, T1]. Moving objects p1 and p2 do not produce any updates, since p1 does not
cross its cell boundaries and p2 was not involved in any query answer at time T0.
Although p3 is still inside Q4, the negative update (Q4,−p3) is reported since
object p3 crosses its cell boundaries. To guarantee that only incremental results will
be maintained, this negative tuple will be deleted in the joining phase. For object
p4, we report the negative update (Q4,−p4). For the moving queries Q1 and Q5, we do not
report any result, since they do not leave any of their old cells. Query Q3 reports a
negative update (Q3,−p6) since Q3 completely leaves its old cell that contains p6.
Notice that we do not report any negative update for p7 since Q3 has not left
the cell that contains p7.


Discussion. The invalidation phase uses the object index and the query index

to retrieve the old information for moving objects and moving queries that cross

their cell boundaries, respectively. Another approach is to let the client send the old

location information along with the new location information. In this case, there

will be no need for maintaining the two auxiliary data structures. Although this

approach would simplify SINA and would save I/O overhead, it lacks practicality.

The main reason is that this approach assumes that the client has the ability to store

its old location information, which is not guaranteed for all clients. The objective of

SINA is to assume minimal computation and storage requirements on the clients.

Auxiliary data structures that keep track of old locations are also utilized in
the LUR-tree, as a linked list [20], and in the frequently updated R-tree, as a hash

table [21]. However, the invalidation phase in SINA limits the access of the auxiliary

data structures to only the objects that move out of their cells, rather than to all

moved objects, which is the case in [20, 21].

The invalidation phase reports negative updates that correspond to moving ob-

jects that cross their cell boundaries and moving queries that leave some of their

old cells. For moving objects and queries that move within their cell boundaries,

we defer their invalidation process to the joining phase. Another approach for the

invalidation phase is to report negative updates from all moving objects and queries

regardless of their old locations. This approach would incur redundant I/O overhead.

In the joining phase, the cells that contain in-cell moving objects or queries have to

be fetched into memory to perform a join between objects and queries. Computing

negative updates for the in-cell movement in the invalidation phase results in redun-

dant operations between the two phases. Thus, the invalidation phase acts as a filter

to avoid unnecessary joins in the joining phase.


3.2.3 Phase III: Joining

Data Structure. The joining phase does not require any additional data struc-

ture; it uses only the grid data structure that is utilized in the invalidation

phase.

Algorithm. Figure 3.10 gives the pseudo code of the joining phase. For each

grid cell, the joining phase performs two spatial join operations: (1) Joining in-

memory objects with in-disk queries (Steps 1 and 2 in Figure 3.10), (2) Joining

in-memory moving queries with in-disk objects (Steps 3 and 4 in Figure 3.10). For

each moving object/query, we get the set of queries/objects from applying a spatial

join algorithm, respectively (Steps 2a and 4a in Figure 3.10). Then, based on the

answer set, we report positive and negative updates while updating the corresponding

data structures. After performing the spatial join for all grid cells, we send the

Updated Answer to the clients, and clear all memory data structures.

Example. For the example given in Figure 3.3, during this phase, moving object

p1 reports the negative update (Q2,−p1). Object p2 does not report any updates
since there are no in-disk stationary queries to join with p2’s new cell (Q3 is a
moving query). The moving object p3 is joined with the stationary query Q4, which
produces (Q4, +p3) as a positive update. Notice that this positive update cancels the
corresponding negative update previously reported in the invalidation phase. Thus,
the size of the Updated Answer table is minimized and only the incremental results
are maintained. Object p4 does not produce any results since there are no in-disk
queries to join with. For moving queries, Q1 reports the negative update (Q1,−p5)
since p5 and Q1 are no longer joined together in the upper-left corner cell. Query Q3
reports the positive update (Q3, +p8) as a result of the spatial join in one of the
new cells covered by Q3. Also, Q3 reports (Q3,−p7). Query Q5 does not report any
updates since object p9 is still inside the new region of Q5.

Discussion. The joining phase only joins the cells that have new moving objects

and/or queries. Cells that contain only stationary (i.e., not recently moving) objects


Procedure JoiningPhase()

Begin

• For (k=0; k < MAX GRID CELL; k++)

1. Join moving objects in the in-memory bucket Pk with stationary queries in the

in-disk grid cell Gk

2. For each moving object Mo ∈ Pk

(a) Sq = Set of queries that results from the join

(b) For each query Q ∈ (Sq − Mo.QList), add (Q,+Mo) to Updated Answer,

update Q.OList

(c) For each stationary Q ∈ (Mo.QList − Sq), add (Q,−Mo) to Up-

dated Answer, update Q.OList

(d) Mo.QList = Sq ∪ Mo.QList

3. Join moving queries in the in-memory bucket Rk with stationary objects in the

in-disk grid cell Gk

4. For each moving query Mq ∈ Rk

(a) So = Set of objects that results from the join

(b) For each object O ∈ (So − Mq.OList), add (Mq,+O) to Updated Answer,

update O.QList

(c) For each stationary O ∈ (Mq.OList − So), add (Mq,−O) to Up-

dated Answer, update O.QList

(d) Mq.OList = So ∪ Mq.OList

• Send the Updated Answer table to the users and empty all memory data structure

End.

Figure 3.10. Pseudo code for the joining phase.


and queries are not processed in the joining phase. In addition, cells that contain

stationary or old information of moving objects and/or queries are filtered out and

are processed at the invalidation phase (e.g., the cells that contain p2, p3, and p4 at

time T0 in Figure 3.3a). Each iteration of the joining phase deals with only one grid

cell. Thus, the I/O cost of each iteration is bounded by the number of disk pages of

a grid cell. For the CPU time, we utilize a plane-sweep-based spatial join algorithm

similar to the ones used in hash-based spatial join algorithms (e.g., [70]).
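As an illustration of the in-memory join within a single grid cell, the following sketch shows a simple sweep along the x-axis with lazy deletion; it is a simplification written by us for exposition and not the exact plane-sweep algorithm of [70].

#include <algorithm>
#include <utility>
#include <vector>

struct Point { int id; double x, y; };
struct Rect  { int id; double xlo, ylo, xhi, yhi; };

// Report all (query id, object id) pairs where the point lies inside the rectangle.
std::vector<std::pair<int, int>> SweepJoin(std::vector<Point> pts, std::vector<Rect> rects) {
    std::sort(pts.begin(), pts.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });
    std::sort(rects.begin(), rects.end(),
              [](const Rect& a, const Rect& b) { return a.xlo < b.xlo; });
    std::vector<std::pair<int, int>> result;
    std::vector<Rect> active;            // rectangles whose x-interval covers the sweep position
    size_t r = 0;
    for (const Point& p : pts) {
        // Activate rectangles whose left edge lies at or before the point.
        while (r < rects.size() && rects[r].xlo <= p.x) active.push_back(rects[r++]);
        // Drop rectangles the sweep line has already passed.
        active.erase(std::remove_if(active.begin(), active.end(),
                     [&](const Rect& q) { return q.xhi < p.x; }), active.end());
        for (const Rect& q : active)     // x already matches; only the y-test remains
            if (p.y >= q.ylo && p.y <= q.yhi) result.push_back({q.id, p.id});
    }
    return result;
}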

3.3 Extensibility of SINA

In this section, we explore the extensibility of SINA to support a broad class of

continuous spatio-temporal queries (e.g., future, k-nearest-neighbor, and aggregate

spatio-temporal queries) and to support clients that may be disconnected from the

server for short periods of time (i.e., out-of-sync clients).

3.3.1 Querying the Future

Future queries [52], also termed predictive queries [24, 71], are interested in
predicting the locations of moving objects. An example of a future query is "Alert me
if a non-friendly airplane is going to cross a certain region in the next 30 minutes".
To support future queries, d-dimensional moving objects report their current loca-
tions x0 = (x1, x2, . . . , xd) at time t0 and a velocity vector v = (v1, v2, . . . , vd). The
predicted location xt of a moving object at any time instant t > t0 is computed
as xt = x0 + v(t − t0).
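A minimal sketch of this prediction step (the function and variable names are ours) is:

#include <vector>

// Predicted location at time t, given the last reported location x0 at time t0
// and the velocity vector v: x_t = x0 + v * (t - t0), evaluated per dimension.
std::vector<double> PredictLocation(const std::vector<double>& x0,
                                    const std::vector<double>& v,
                                    double t0, double t) {
    std::vector<double> xt(x0.size());
    for (size_t d = 0; d < x0.size(); d++)
        xt[d] = x0[d] + v[d] * (t - t0);
    return xt;
}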

The extension of SINA to support future queries is straightforward. Moving

objects are represented as lines instead of points. Thus, in the hashing phase, moving

objects are clipped into several hash buckets (same as rectangular queries). In the

invalidation and joining phases, moving objects will be treated as moving queries

in the sense that they may span more than one grid cell. The shared execution

paradigm applies directly to future queries. Also, moving queries do not need any


(a) Snapshot at time T0 (b) Snapshot at time T1

Figure 3.11. Querying the future.

special handling other than the ones used in the original description of SINA in

Section 3.2.

Figure 3.11a gives an example of querying the future. Five moving objects, p1 to
p5, report their current locations at time T0 and a velocity vector
that is used to predict their future locations at times T1 and T2. The range query Q
is interested in objects that will intersect with its region at time T2 > T0. At time
T0, the rectangular query region is joined with the line representations of the moving
objects. The returned answer set of Q is (p1, p3). At T1 (Figure 3.11b), only the
objects p2, p3, and p4 change their locations. Based on the new information, SINA
reports only the positive update (Q, +p2) and the negative update (Q,−p3), indicating
that p2 is now considered part of the answer set of Q while p3 is no longer in the
answer set of Q.

3.3.2 k-nearest-neighbor Queries

SINA can be utilized to continuously report the changes of a set of concurrent

kNN queries. Figure 3.12a gives an example of two kNN queries where k = 3

issued at points Q1 and Q2. Assuming that both queries are issued at time T0,


(a) Snapshot at time T0 (b) Snapshot at time T1

Figure 3.12. k-NN spatio-temporal queries.

we compute the first-time answer using any of the traditional algorithms of kNN

queries (e.g., [72]). For Q1, the answer would be Q1 = {p1, p2, p3} while for Q2, the
answer would be Q2 = {p5, p6, p7}. In this case, we represent Q1 and Q2 as circular
range queries whose radii equal the distance to the kth neighbor. Later, at time
T1 (Figure 3.12b), objects p4 and p7 move. Thus, SINA can be utilized to allow
for a shared execution among the two queries and to compute the updates to the
previously reported answer. Notice that the only change to the original SINA is
that we utilize circular range queries rather than rectangular range queries. For Q1,
object p4 intersects with the query region. This results in invalidating the furthest
neighbor of Q1, which is p1. Thus, two update tuples are reported: (Q1,−p1) and
(Q1, +p4). For Q2, object p7 was part of the answer at time T0. However, after p7
moves, the joining phase checks whether p7 is still inside the query region. If p7
is outside the circular query region, we compute another nearest neighbor, which is
p8. Thus, two update tuples are reported: (Q2,−p7) and (Q2, +p8). Notice that the

query regions of Q1 and Q2 are changed from T0 to T1 to reflect the new k-nearest

neighbors.
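The following sketch (our own simplification; names and structure are assumptions) illustrates how such a kNN query can be carried as a circular range query whose radius tracks the distance to the current kth neighbor.

#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { int id; double x, y; };

struct KnnAsCircle {
    double cx, cy;                  // query focal point
    int k;                          // number of neighbors requested
    std::vector<Pt> answer;         // current candidate answer set
    double radius = 0;              // distance to the current kth neighbor

    static double Dist(double x1, double y1, double x2, double y2) {
        return std::sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
    }
    // The containment test used when the query is treated as a circular range query.
    bool InsideRegion(const Pt& p) const { return Dist(cx, cy, p.x, p.y) <= radius; }

    // After candidates are added or removed, keep the k closest ones and
    // shrink or grow the circle to the distance of the new kth neighbor.
    void Recompute() {
        std::sort(answer.begin(), answer.end(), [&](const Pt& a, const Pt& b) {
            return Dist(cx, cy, a.x, a.y) < Dist(cx, cy, b.x, b.y);
        });
        if (static_cast<int>(answer.size()) > k) answer.resize(k);
        radius = answer.empty() ? 0 : Dist(cx, cy, answer.back().x, answer.back().y);
    }
};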


3.3.3 Aggregate Queries

Continuous spatio-temporal aggregate queries have recently been addressed in [35], where
dense areas are discovered online (i.e., areas with a number of moving objects above
a certain threshold). The areas to be discovered are limited to pre-defined grid cells.
Thus, if a dense area is not aligned to a grid cell, it will not be discovered. The
work in [35] can be modeled as a special instance of SINA in the following way:
For an N × N grid, we consider having N^2 disjoint spatio-temporal aggregate range
queries, where each query represents a grid cell. Moreover, SINA can extend [35]
to discover pre-defined dense areas with arbitrary regions. Thus,
important areas (e.g., areas around an airport or downtown) can be discovered even

if they are not aligned to grid cells. All pre-defined dense areas are treated as range

queries. Then, the shared execution paradigm with the incremental evaluation of

SINA continuously reports the density of such areas. Positive and negative updates

report only the increase and decrease of the density from the previously reported

result.
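A minimal sketch of maintaining one pre-defined dense area in this way (the names and the counter-based formulation are ours) is:

struct DenseArea {
    int count = 0;        // number of objects currently inside the pre-defined region
    int threshold;        // density threshold for reporting the area as dense

    // Apply one incremental update from SINA: +1 for a positive update,
    // -1 for a negative update; returns whether the area is currently dense.
    bool ApplyUpdate(int sign) {
        count += sign;
        return count >= threshold;
    }
};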

3.3.4 Out-of-Sync Clients

Mobile objects tend to be disconnected from and reconnected to the
server several times for reasons beyond their control, e.g., running out of battery, losing the com-
munication signal, or being in a congested network.

may lead to erroneous query results in any incremental approach. Figure 3.13 gives

an example of erroneous query result. The answer of query Q that is stored at both

the client and server at time T1 is (p1, p2) . At time T2, the client is disconnected

from the server. However, the server does not recognize that Q is disconnected.

Thus, the server keeps computing the answer of Q, and sends the negative update

(Q,−p2). Since the client is disconnected, the client could not receive this negative

update. Notice the inconsistency between the stored result at the server side (p1) and the

client side (p1, p2). Similarly, at time T3, the client is still disconnected.


Figure 3.13. Example of Out-of-Sync queries.

The client is connected again at time T4. The server computes the incremental result from T3 and

sends only the positive update (Q, +p4). At this time, the client is able to update its

result to be (p1, p2, p4). However, this is a wrong answer; the correct answer,
kept at the server, is (p1, p3, p4). SINA can easily be extended to resolve the out-of-sync

problem by adding the following catch-up phase.

Catch-up Phase. A naive solution for the catch-up phase is that, once the client
wakes up, it empties its previous result and sends a wakeup message to the server.
The server replies with the query answer stored at the server side. For example, in
Figure 3.13, at time T4, SINA will send the whole answer (p1, p3, p4). This approach
is simple to implement and process at the server side. However, it may result in
a significant delay due to the network cost of sending the whole answer. Consider
a moving query with hundreds of objects in its result that gets disconnected for a
short period of time. Although the query has missed only a couple of objects during its
disconnected time, the server would send the complete answer to the query.

To save the network bandwidth, SINA maintains a repository of committed query

answers. An answer is considered committed if it is guaranteed that the client has

received it. Once the client wakes up from the disconnected mode, it sends a wakeup

message to the server. SINA compares the latest answer for the query with the

committed answer, and sends the difference of the answer in the form of positive and

negative updates. For example, in Figure 3.13, SINA stores the committed answer

of Q at time T1 as (p1, p2). Then, at time T4, SINA compares the current answer


with the committed one, and sends the updates (Q,−p2, +p3, +p4). Once SINA
receives any information from a moving query, SINA considers its latest answer as
committed. However, stationary queries are required to send an explicit commit
message to SINA to enable committing the latest result. Commit messages can be
sent at times convenient to the clients.
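A minimal sketch of this catch-up computation (our own illustration; the container types are assumptions) compares the committed and latest answers and emits only the difference as positive and negative updates:

#include <set>
#include <utility>
#include <vector>

// Returns (sign, OID) pairs: +1 for objects to add, -1 for objects to remove.
std::vector<std::pair<int, int>> CatchUp(const std::set<int>& committed,
                                         const std::set<int>& latest) {
    std::vector<std::pair<int, int>> updates;
    for (int oid : latest)
        if (!committed.count(oid)) updates.push_back({+1, oid});   // missed positive updates
    for (int oid : committed)
        if (!latest.count(oid)) updates.push_back({-1, oid});      // missed negative updates
    return updates;
}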

3.4 Correctness of SINA

In this section, we provide a proof of correctness of the Scalable INcremental hash-

based Algorithm (SINA). The correctness proof is divided into three parts: First, we

prove that SINA is complete, i.e., all result tuples are produced. Second, we prove

that SINA is a duplicate-free algorithm, i.e., output tuples are produced exactly

once. Third, we prove that SINA is progressive, i.e., only new results will be sent to

the user.

Theorem 3.4.1 For any two sets of moving objects P and moving queries R, SINA

produces all output results (p, r) of P ⋊⋉ R, where the join condition p inside r is

satisfied at any time instance t.

Proof Assume that ∃(p, r) : p ∈ P, r ∈ R, and at some time instance t, p was

located inside r. However, the tuple (p, r) is not reported by SINA. Since (p, r)

satisfies the join condition, then there exists a hash bucket h such that h = hP (p)

and h ∈ hR(r). Assume that the latest information sent from p and r were in time

intervals [Ti, Ti+1] and [Tj , Tj+1], respectively. Then, there are exactly two possible

cases:

Case 1: i = j. In this case, both p and r report their recent information in the

same time interval [Ti, Ti+1]. Thus, we guarantee that p and r were resident in the

memory at the same time. If p arrives before r, then p will be stored in bucket Ph

without joining with r. Later when r arrives, it will probe the bucket Ph, and join

with p. The same proof is applicable when r arrives before p. Thus, the tuple (p, r)

cannot be missed in case of i = j.


Case 2: i 6= j. Assume i < j. This indicates that p arrives before r. Then,

in the invalidation phase, p is flushed into disk before r arrives. Once r arrives,

it is stored in an in-memory hash bucket h that corresponds to the disk-based cell

of object p. Since the joining phase joins all in-memory hash buckets with their

corresponding in-disk grid cells, we guarantee that r will be joined with p. The same

proof is applicable when i > j where the in-memory object p will be joined with the

in-disk query r. Thus, the tuple (p, r) cannot be missed in case of i 6= j.

From Cases 1 and 2, we conclude that the assumption that (p, r) is not reported

by SINA is not possible. Thus, SINA produces all output results.

Theorem 3.4.2 At any evaluation time Ti, SINA produces the output result that

corresponds to all information change in [Ti−1, Ti] exactly once.

Proof Assume that ∃(p, r) : p ∈ P, r ∈ R, and (p, r) satisfies the join condition.

Assume that SINA reports the tuple (p, r) twice. We denote such two instances as

(p, r)1 and (p, r)2. Since we are interested only in tuples that satisfy the join condi-
tion (i.e., positive updates), we skip the invalidation phase, which produces
only negative updates. Thus, we identify the following three cases:

Case 1: (p, r)1 and (p, r)2 are both produced in the hashing phase.

Assume that p arrives after r. Once p arrives, it probes the hash bucket of r and

outputs the result (p, r)1. Then, during the hashing phase, only newly incoming

tuples are used to probe the hash buckets of p and r. Thus, (p, r) cannot be produced

again in the hashing phase.

Case 2: (p, r)1 and (p, r)2 are both produced in the joining phase. The

joining phase produces positive updates at two outlets: in-memory moving objects

with in-disk (not recently moving) queries and moving queries with in-disk objects.

If (p, r)1 is produced in the former outlet, then p is a moving object that reports its

recent information in [Ti−1, Ti]. Thus, (p, r)2 cannot be produced in the second outlet

since that outlet is concerned only with in-disk objects. The same proof is applicable when

(p, r)1 is produced in the second outlet. Thus, the tuple (p, r) cannot be produced

again in the joining phase.


Case 3: One of the tuples, say (p, r)1, is produced in the hashing phase,

while the other one is produced in the joining phase. Since (p, r)1 is reported

in the hashing phase, then we guarantee that both p and r were in memory (moving)

in the same time interval [Ti−1, Ti]. Thus, p is a moving object and r is a moving

query. In the joining phase, (p, r) cannot be produced again in the first outlet since
r is not stationary. Similarly, (p, r) cannot be produced again in the second outlet
since p is not stationary. Thus, the tuple (p, r) cannot be produced again in the

joining phase.

From the above three cases, we conclude that the assumption that the tuple (p, r)

is reported twice at the evaluation time Ti is not valid.

Theorem 3.4.3 For any two sets of moving objects P and moving queries R, at

any evaluation time Ti, SINA produces ONLY the changes of the previously reported

result at time Ti−1.

Proof Assume that ∃p1, p2, p3 ∈ P, r ∈ R, and only (p1, r), (p2, r) satisfy the join

condition at time Ti−1. Then, at time Ti, p1 is still inside r, p2 is moved out of r

while p3 is moved inside r. In the following we prove that only the tuples (r,−p2)

and (r, +p3) are produced at time Ti. Mainly, we identify the following three cases:

Case 1: r is a moving query, p1, p2, and p3 are moving objects. This case

is processed only in the hashing and invalidation phases. Based on Theorem 3.4.2, the

hashing phase produces only the updates (r, +p1) and (r, +p3). In the invalidation

phase, (r, +p1) is deleted either in Step 1c in Figure 3.8 (if p1 moves within its cell

boundary) or in Step 6a in Figure 3.8 (if p1 moves out of its cell) by adding the

counterpart tuple (r,−p1). The tuple (r,−p2) is produced only in the invalidation

phase either in Step 1c or Step 6a of Figure 3.8.

Case 2: r is a moving query, p1, p2, and p3 are stationary objects. This

case is processed only in the invalidation and joining phases. Since p1 is still in

the answer set of r, then p1 is inside some grid cell c that intersects with both the

old and new regions of r. Thus, p1 will be processed only in the joining phase,


particularly at Step 4 in Figure 3.10. However, since p1 is an old answer, no action

will be taken. For object p2, assume that Ci−1, Ci are the sets of grid cells that

are covered by r in Ti−1, Ti, respectively. c2 is the grid cell of p2. Since p2 is not

in the answer set of r at time Ti, then c2 /∈ Ci − Ci−1. If c2 ∈ Ci−1 − Ci, then the

tuple (r,−p2) will be produced in the invalidation phase (Step 2b in Figure 3.9).

However, if c2 ∈ Ci−1 ∩ Ci, then the tuple (r,−p2) will be produced in the joining

phase (Step 4c in Figure 3.10). For object p3, since p3 is not an old answer, then

it will not be processed in the invalidation phase. Thus, the tuple (r, +p3) will be

reported in the joining phase (Step 4b in Figure 3.10).

Case 3: r is a stationary query, p1, p2, and p3 are moving objects. The

proof is very similar to Case 2 by reversing the roles of queries and objects.

We do not include the case of stationary queries on stationary objects since it is
not processed. Also, we assume that the pi's are either all moving or all stationary. However,
the proof is still valid for any combination of moving and stationary pi's. Thus, from
the above three cases, we conclude that at time Ti, SINA produces only the change
of the result from the previously reported answer at time Ti−1.

3.5 Performance Evaluation

In this section, we compare the performance of SINA with the following: (1) Hav-

ing an R-tree-based index on the object table. To cope with the moving objects, we

implement the frequently updated R-tree [21] (FUR-tree, for short). The FUR-tree

modifies the original R-tree to efficiently handle moving objects. (2) Having a Q-

index [19] on the query table. Since Q-index is designed for static queries, we modify

the original Q-index to employ the techniques of the FUR-tree to handle moving

queries. Thus, the Q-index can handle moving queries as efficiently as the FUR-tree

handles moving objects. (3) Having both the FUR-tree on moving objects and the

modified Q-index on the query table. Then, we employ an R-tree based spatial join

algorithm [73] (RSJ, for short) to join objects and queries.


Figure 3.14. Road network map of Oldenburg City.

We use the Network-based Generator of Moving Objects [74] to generate a set of

moving objects and moving queries. The input to the generator is the road map of

Oldenburg (a city in Germany) given in Figure 3.14. The output of the generator

is a set of moving points that move on the road network of the given city. Moving
objects can be cars, cyclists, pedestrians, etc. We choose some points randomly
and consider them as the centers of square queries. Unless mentioned otherwise,
we generate 100K moving objects and 100K moving queries. Each moving object
or query reports its new information (if changed) every 100 seconds. The space
is represented as the unit square, and queries are assumed to be square regions of
side length 0.01. SINA is set to refresh query results every T = 10 seconds.

Within every T seconds, 10% of the moving objects and 10% of the moving queries report a change of information.

All the experiments in this section are conducted on Intel Pentium IV CPU

2.4GHz with 256MB RAM running Linux 2.4.4. SINA is implemented using GNU

C++. The page size is 2KB. We implement FUR-tree, Q-index, and RSJ using

the original implementation of R*-tree [75]. Our performance measures are the I/O

overhead and CPU time incurred. For the I/O, we consider that the first two levels

of any R-tree-based structures are in memory. The CPU time is computed as the

time used to perform the spatial join in the memory (i.e., once the page is retrieved


(a) Moving objects (%) (b) Query size

Figure 3.15. The answer size.

from disk). For SINA, the CPU time also includes the time that the hashing phase

consumes for the in-memory join.

3.5.1 Properties of SINA

Figure 3.15 compares the size of the answer returned by SINA with the
size of the complete answer returned by any non-incremental algorithm. In Fig-
ure 3.15a, the percentage of moving objects varies from 0% to 10%. The size of
the complete answer is constant and is orders of magnitude larger than the size of the in-
cremental answer returned by SINA. A complete answer is not affected by recently
moved objects. However, for SINA, the size of the answer increases slightly,
since it is affected by the number of objects being evaluated every T seconds. In
Figure 3.15b, the query side length varies from 0.01 to 0.02. The size of the com-
plete answer increases dramatically, up to seven times that of the incremental
result returned by SINA. The saving in the size of the answer directly affects the
communication cost from the server to the clients.


(a) I/O (b) CPU Time

Figure 3.16. The impact of grid size N.

Figures 3.16a and 3.16b give the effect of increasing the grid size N on the I/O

and CPU time incurred by SINA, respectively. With a small number of grid cells (i.e.,
fewer than 10 per dimension), each cell contains a large number of disk pages. Thus, a spatial join
within each cell results in excessive I/O and CPU time. On the other hand, with
a large number of grid cells (i.e., more than 60), each cell contains a small number
of moving objects and queries. Although this results in lower CPU time, since the
spatial join is performed among few tuples, disk pages are underutilized.

Thus, additional I/O overhead will be incurred. Based on this experiment, we set

the number of grid cells N along one dimension to be 40.

3.5.2 Number of Objects/Queries

In this section, we compare the scalability of SINA with the FUR-tree, Q-index,

and RSJ algorithms. Figures 3.17a and 3.17b give the effect of increasing the num-

ber of moving objects from 10K to 100K on I/O and CPU time, respectively. In

Figure 3.17a, SINA outperforms all other algorithms. RSJ has double the I/O’s of

SINA due to the R-tree update cost. Notice that the performance of the R-trees


0

10

20

30

40

50

60

70

80

90

10 20 30 40 50 60 70 80 90 100

I/O (*1000)

Number of objects (*1000)

SINAFUR-treeQ-Index

RSJ

(a) I/O

0

1

2

3

4

5

6

7

10 20 30 40 50 60 70 80 90 100

Time (sec)

Number of objects (*1000)

SINAFUR-treeQ-Index

RSJ

(b) CPU Time

Figure 3.17. Scalability with number of objects.

is degraded with the increase in the number of moving objects and moving queries.

The performance of the Q-index degrades dramatically with the increase in the number
of moving objects since moving objects are not indexed. The FUR-tree has the
worst performance in all cases because the 100K queries are not indexed. However,
its performance is only slightly affected by the increase in moving objects; the slight
increase is due to maintaining the growing set of moving objects.
When the number of moving objects is increased up to 100K, both the FUR-tree and
the Q-index have similar performance, which is eight times worse than that of
SINA. The main reason is that both the FUR-tree and the Q-index utilize

only one index structure. Thus, the non-indexed objects and queries worsen the

performance of FUR-tree and Q-index, respectively.

In Figure 3.17b, SINA has the lowest CPU time. The relative performance of

SINA over other R-tree-based algorithms increases with the increase of the number of

moving objects. The main reason is that the update cost of SINA is much lower than

updating R-tree structures. As the number of moving objects increases, the quality

of the bounding rectangles in the R-tree structure is degraded. Thus, searching

and querying an R-tree incurs higher CPU time. The RSJ algorithm gives lower

performance in CPU time than the FUR-tree and the Q-index since RSJ needs to
update two R-trees; the CPU time of RSJ is 1.5 to 3 times that of SINA.

[Figure 3.18. Scalability with number of queries: (a) I/O (*1000) and (b) CPU time (sec) vs. number of queries (*1000); curves: SINA, FUR-tree, Q-Index, RSJ.]

Figure 3.18 gives an experiment similar to that of Figure 3.17, with the roles of
objects and queries exchanged. Since both SINA and RSJ treat objects and queries similarly,

their performance is similar to that of Figure 3.17. However, the FUR-tree and Q-

index exchange their performance as they deal with objects and queries differently.

3.5.3 Percentage of Moving Objects/Queries

Figure 3.19 investigates the effect of increasing the percentage of the number

of moving objects and queries on the performance of SINA and R-tree-based algo-

rithms. The percentage of moving objects varies from 1% to 10%. The percentage

of moving queries is set to 5%. For the I/O overhead (Figure 3.19a), RSJ has sim-

ilar performance as SINA for up to 5% of moving objects. Then, RSJ incurs up to

double the number of I/O’s over that of SINA for 10% of moving objects. Both the

FUR-tree and the Q-index have similar performance, which is almost eight times
worse than that of SINA.

[Figure 3.19. Percentage of moving objects: (a) I/O (*1000) and (b) CPU time (sec) vs. percentage of moving objects (%); curves: SINA, FUR-tree, Q-Index, RSJ.]

When the percentage of moving objects is lower

than 5% (i.e., lower than the percentage of moving queries), the FUR-tree performs
better than the Q-index. When the percentages of moving objects and moving
queries are equal (i.e., 5%), both the FUR-tree and the Q-index have similar performance.
Basically, the performance of the FUR-tree and the Q-index degrades with the increase
in the percentage of moving objects and moving queries, respectively.

For the CPU time (Figure 3.19b), SINA outperforms all R-tree based algorithms.

This is mainly due to the high update cost of the R-tree. The RSJ algorithm has

the highest CPU time since it updates two R-trees. In addition, SINA computes

incremental results while R-tree-based algorithms are non-incremental.

Similar performance is achieved when fixing the percentage of moving objects to 5%
while varying the percentage of moving queries from 0% to 10%. The only difference is
that the roles of objects and queries are exchanged. Thus, the performance of the FUR-
tree and Q-index is exchanged while SINA and RSJ maintain their performance.

In Figure 3.19, we limit the percentage of moving queries to 5% and the percentage
of moving objects to 10%. A more dynamic environment degrades the per-
formance of all R-tree-based algorithms. In the following experiment, we explore

the scalability of SINA in terms of handling highly dynamic environments.

[Figure 3.20. Scalability of SINA with update rates: (a) I/O (*1000) and (b) CPU time (sec) vs. percentage of moving objects (%), for moving queries = 10%, 20%, and 30%.]

In Figure 3.20, the percentage of moving objects varies from 10% to 30%. We plot three

lines for SINA that correspond to the percentage of moving queries as 10%, 20%,

and 30%. We do not include performance results for the R-tree-based
algorithms since their performance degrades dramatically in highly dynamic en-
vironments. Figures 3.20a and 3.20b give the I/O and CPU time incurred by SINA,
respectively. The trend of SINA is similar for all percentages of moving queries.
Also, the cost of SINA increases only linearly with the increase in moving objects.
Thus, SINA is well suited for highly dynamic environments.

3.5.4 Locality of movement

This section investigates the effect of locality of movement on SINA and R-tree-

based algorithms. By locality of movement, we mean that objects and queries are

moving within a certain distance. As an extreme example, if all objects are moving

within a small distance, then at each evaluation time T of SINA, all objects and queries

are moving within their cells. Thus, SINA achieves its best performance. On the

other hand, SINA has its worst performance if 100% of the objects change their cells.

[Figure 3.21. Effect of movement locality: (a) I/O (*1000) and (b) CPU time (sec) vs. percentage of objects that change their cells (%); curves: SINA, FUR-tree, Q-Index, RSJ.]

By tuning the moving distance of moving objects, we can keep track of the number

of moving objects that cross their cell boundaries. Figures 3.21a and 3.21b give the

effect of the movement locality on the I/O and CPU time, respectively. For I/O,

even the worst case of SINA is still no worse than the R-tree-based algorithms (comparable
to RSJ and four times better than the FUR-tree and the Q-index). The performance
of R-tree-based algorithms is almost unaffected even when all objects change their cells.
The main reason is that changing the cell in the grid structure does not necessarily
mean changing the R-tree node. For the CPU time, SINA outperforms all other
algorithms by two orders of magnitude. In addition, the cost of SINA increases
only slightly with the number of objects that change their cells.

3.6 Summary

This chapter introduced the Scalable INcremental hash-based Algorithm (SINA,

for short); a new algorithm for evaluating a set of concurrent continuous spatio-

temporal range queries. SINA employs the shared execution and incremental evalu-


ation paradigms to achieve scalability and efficient processing of continuous spatio-

temporal queries. SINA has three phases: the hashing phase, the invalidation phase, and
the joining phase. The hashing phase employs an in-memory hash-based join algorithm

that results in a set of positive updates. The invalidation phase is triggered every T

time units or when the memory is full to produce a set of negative updates. Then,

the joining phase is triggered to produce a set of both positive and negative updates

that result from joining in-memory data with on-disk data. We discussed the exten-

sibility of SINA to support a wide variety of spatio-temporal queries and out-of-sync

clients. The correctness of SINA is proved in terms of completeness, uniqueness, and

progressiveness. Comprehensive experiments show that the performance of SINA is
orders of magnitude better than that of other R-tree-based algorithms. The experiments
demonstrate that SINA is: (1) scalable to large numbers of moving objects
and/or moving queries, and (2) stable in highly dynamic environments. Finally, SINA
saves network bandwidth by minimizing the data sent to clients.


4 STREAM-BASED SPATIO-TEMPORAL QUERY PROCESSING: QUERY

OPERATORS

While the previous chapter and most of the existing approaches for continuous spatio-

temporal query processing (e.g., see [8, 20–24, 31, 57, 58, 76, 77]) focus on indexing

and/or storing the incoming object updates on the disk storage, this chapter focuses

on data stream environments where only in-memory algorithms and data structures

are allowed. Data streaming environments (e.g., see [3, 45,78–81]) are characterized

by: (1) Large numbers of data objects that are beyond the system capabilities to

store, and (2) Very high data arrival rates that hinder consulting the secondary

storage for indexing and/or storing the incoming data. Most of the existing work

in data streaming environments (e.g., see [45, 62, 79, 80]) aims to efficiently support

continuous queries over data streams. However, the spatial and temporal properties

of data streams and/or continuous queries are overlooked.

In this chapter, we introduce the Generic Progressive Algorithm (GPAC, for

short) for continuously evaluating continuous spatio-temporal queries over spatio-

temporal data streams. GPAC provides a generic skeleton that can be tuned

through a set of methods to behave as different continuous spatio-temporal queries

(e.g., continuous range queries and k-nearest-neighbor queries). The GPAC family

of algorithms is mainly designed to achieve the following goals:

1. Online evaluation. Incoming data is processed and stored in-memory without

the need for secondary storage.

2. Progressive evaluation. Only the updates of the previously reported result are

computed progressively as new tuples arrive. This is in contrast to previous

approaches that buffer some of the updates and send them all at once to the user.


Unlike most of the existing algorithms for continuous spatio-temporal queries

(e.g., see [8,31–33,55]) that are implemented as high-level functions at the application

level, GPAC algorithms are encapsulated into physical pipelined query operators that

can be part of a query execution plan. By having GPAC as pipelined query operators,

we achieve the following three goals:

1. GPAC operators can be combined with other traditional operators (e.g., dis-

tinct, aggregate, and join) to support online and progressive evaluation for a

wide variety of continuous spatio-temporal queries.

2. Pushing GPAC operators deep in the query execution plan reduces the number

of tuples in the query pipeline where GPAC operators act as filters to other

operators.

3. Flexibility in the query optimizer where multiple candidate execution plans can

be produced by shuffling the GPAC operators with other traditional operators.

The rest of this chapter is organized as follows: Section 4.1 introduces the basic

idea of GPAC. In Section 4.2, we introduce the problem of uncertainty in continuous

spatio-temporal queries and how the GPAC framework avoids such uncertainty.

Section 4.3 provides two instances of GPAC that behave as continuous range queries

and k-nearest-neighbor queries. Encapsulation of GPAC into physical query opera-

tors is presented in Section 4.4. Section 4.5 provides an experimental study of GPAC.

Finally, Section 4.6 summarizes this chapter.

4.1 The GPAC: Continuous Spatio-temporal Query Operators

In this section, we introduce the Generic Progressive Algorithm (GPAC) for con-

tinuous spatio-temporal queries over spatio-temporal streams. GPAC is similar in

spirit to generalized search tree indexes (e.g., GiST [82] and SP-GiST [83]), but

GPAC is in the context of spatio-temporal query processing algorithms. GPAC is

introduced as a general skeleton that can be adjusted through a set of methods to


behave as various continuous spatio-temporal queries (e.g., continuous range queries

and nearest-neighbor queries). In GPAC, each moving query is bound to one focal

object. For example, if a moving object M submits a query Q that asks about its

nearest police car, then M is considered the focal object of Q. Mobile objects and

queries are required to send updates of their locations every T seconds. Failure to

do so results in considering the mobile object or query as disconnected. As GPAC

can be implemented either at the application level or as a physical query operator,

the output of GPAC is sent either directly to the user or to the next query operator

in the pipeline. Thus, throughout the rest of this chapter, we use the terms “user” and

“next query operator” as synonyms.

In GPAC, we store the tuples that satisfy each query Q in a data structure
termed Q.Answer. Then, for each newly incoming tuple P , GPAC performs two
tests: Test I: Is P ∈ Q.Answer? Test II: Does P satisfy the query predicate? Based

on the results of the two tests, GPAC distinguishes among four cases:

• Case I: P ∈ Q.Answer and P still satisfies the query predicate. As GPAC

processes only the updates of the previously reported result, P will neither be

processed nor will P be sent to the user.

• Case II: P ∈ Q.Answer, however, P does not qualify to be part of the answer

anymore (i.e., P does not satisfy the query predicate anymore). In this case,

GPAC reports a negative update P− to the user. The negative update indicates

that P needs to be removed from the query answer and hence is discarded from

the system.

• Case III: P /∈ Q.Answer, however, P qualifies to be part of the current answer

(i.e., P satisfies the query predicate currently). In this case, GPAC reports a

positive update to the user. The positive update indicates that P needs to be

added to the query answer.


Procedure Q.ReceiveTupleI(Tuple P )

Begin

1. If query Q is moving and P is the focal point

(a) Q.UpdateCriteriaI(P ) (Figure 4.2)

(b) return

2. if Q.satisfy(P ) AND P /∈ Q.Answer

(a) Add P to Q.Answer

(b) Send the Positive update P to the user

(c) If Q.IsDynamic()

• Q.UpdateCriteriaI(P ) (Figure 4.2)

(d) return

3. If (!Q.satisfy(P )) AND (P ∈ Q.Answer)

(a) Delete P from Q.Answer

(b) Send the Negative tuple P− to the user.

End

Figure 4.1. Pseudo code of skeleton of GPAC.

• Case IV: P /∈ Q.Answer and P still does not qualify to be part of the current

answer. In this case, P has no effect on Q. Thus, P will neither be processed

nor will P be sent to the user.

Figures 4.1 and 4.2 give the pseudo code of the main idea of GPAC upon receiving

a tuple P . Functions and variables written in bold font need to be implemented

separately for each query type as will be addressed in Section 4.3. Initially, GPAC

checks if P is the focal object of the moving query Q (Step 1 in Figure 4.1). If this


is the case, we update the spatial region covered by Q. Based on the update, some

tuples from Q.Answer may be out of the new query spatial region. These tuples

are deleted (expired) from Q.Answer and corresponding negative updates are sent

to the user or the next query operator (Step 2 in Figure 4.2).

If the newly incoming tuple P is not the query focal object (Step 2 in Figure 4.1),

we check if P qualifies to be in the query answer (Test II). If this is the case, we

check if P is part of the recently reported answer (Test I). In this case, we do not

process or send P since P is still in the reported answer (Case I). However, if P is

not part of the recently reported answer (Case III), we add P to Q.Answer (Step 2a

in Figure 4.1) and send P as a positive update to the user (Step 2b in Figure 4.1).

Then, we update the query information (if needed) based on P ’s effect on the query

spatial region (Step 2c in Figure 4.1). The predicate Q.IsDynamic() returns “true”

if the query spatial area is changed as a result of P .

If the incoming tuple P does not qualify to be part of the answer, then we check

if P is part of the recently reported answer (Step 3 in Figure 4.1). In this case (Case

II), we delete P from the current answer (Step 3a in Figure 4.1) and report P as a

negative update to the user (Step 3b in Figure 4.1). Notice that if P was not in the

previously reported answer, we do not have to process or send P to the user (Case

IV).
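
The four cases map directly onto a small dispatch routine. The following Python sketch condenses the control flow of Figures 4.1 and 4.2 (it is illustrative only; the attribute and method names are assumptions and not the actual PLACE code):

    def receive_tuple(Q, P, emit):
        # emit(sign, P) delivers a positive ('+') or negative ('-') update downstream.
        if Q.is_moving and P.id == Q.focal_id:
            Q.update_criteria(P)            # focal point moved: expire answers, emit negatives
            return
        satisfies = Q.satisfy(P)
        in_answer = P.id in Q.answer
        if satisfies and not in_answer:     # Case III: P joins the answer
            Q.answer[P.id] = P
            emit('+', P)
            if Q.is_dynamic():
                Q.update_criteria(P)        # the query region changed because of P
        elif not satisfies and in_answer:   # Case II: P leaves the answer
            del Q.answer[P.id]
            emit('-', P)
        # Cases I and IV: P is neither processed nor sent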

4.2 Uncertainty in Continuous Spatio-temporal Queries

4.2.1 Types of Uncertainty

One of the goals of GPAC is to provide a fast and up-to-the-moment answer

to continuous queries over spatio-temporal streams. However, this goal is hindered

by the fact that spatio-temporal streams are not materialized on secondary storage.

The basic GPAC algorithm stores only the tuples that satisfy the query predicate

of Q. Such an implementation may result in having uncertainty areas in Q. We define the

uncertainty area of a query Q as follows:


Procedure Q.UpdateCriteriaI(Tuple P )

Begin

1. Q.Update(P )

2. For all moving objects M ∈ Q.Answer

• If NOT Q.satisfy(M)

(a) Send the Negative output M− to user.

(b) Delete M from Q.Answer.

End

Figure 4.2. Updating query information in GPAC

Definition 4.2.1 The uncertainty area of query Q is the spatial area of Q that

may contain potential moving objects that satisfy Q, with Q not being aware of the

contents of this area.

Uncertainty areas in GPAC may result in erroneous query results. We distinguish

among three cases for producing uncertainty areas within the basic GPAC framework:

1. New query. Initially, there are no outstanding queries in the system. Thus,

continuously arriving spatio-temporal streams are neither processed nor stored.

Once a query Q is submitted to the system, we cannot provide a fast answer

to Q, simply because there is nothing currently being stored in the database.

In this case, all the area covered by Q is considered an uncertainty area. Later

on, moving objects update their locations and the answer of Q is incrementally

built.

2. Moving queries. Figures 4.3 and 4.4 give examples of uncertainty areas

that result from moving range queries and moving nearest-neighbor queries,

respectively. Figure 4.3a represents a snapshot at time T0 where point P is

outside the area of query Q. Thus, P is not physically stored in the database.

[Figure 4.3. Uncertainty in moving range queries: (a) Q and P at T0; (b) T1: Q moves; (c) T2: P moves.]

[Figure 4.4. Uncertainty in moving NN queries: (a) Q at time T0; (b) T1: Q moves; (c) T2: P3 moves.]

At time T1 (Figure 4.3b), Q is moved to cover a new spatial area. The shaded

area in Q represents its uncertainty area. Although P is inside the new query

region, P is not reported in the query answer. At T2 (Figure 4.3c), object

P moves out of the query region. Thus, P is never reported at the query

result, although it was physically inside the query region in the interval [T1, T2].

Similar erroneous output is given in Figure 4.4 for k-nearest-neighbor queries

(k = 2). Object P3 is never reported in the query answer, although it should

have been within the answer in the interval [T1, T2].

3. Stationary queries. Figure 4.5 gives an example of uncertainty area in sta-

tionary k-nearest-neighbor queries (k = 2). At time T0 (Figure 4.5a), the query

Q has P1 and P3 as its answer. P2 is outside the query spatial region, thus P2 is

not stored in the database. At T1 (Figure 4.5b), P1 is moved far from Q. Since

Q is aware of P1 and P3 only, we extend the spatial region of Q to include the

new location of P1. Thus, an uncertainty area is produced. Notice that Q is

unaware of P2 since P2 is not stored in the database.

[Figure 4.5. Uncertainty in static NN queries: (a) Q at time T0; (b) T1: P1 moves; (c) T2: P2 moves.]

At T2 (Figure 4.5c), P2

moves out of the new query region. Thus, P2 never appears as an answer of Q,

although it should have been part of the answer in the time interval [T1, T2].

4.2.2 Uncertainty Avoidance in GPAC

GPAC does not handle the uncertainty area that results from newly submitted

queries. Continuous queries are issued to run for hours and days. Thus, having a

warm-up period of a few seconds affects neither the accuracy nor the effi-

ciency of the query result. However, uncertainty areas that result from stationary

or moving queries are crucial and are treated by GPAC. In this section, we modify

the basic GPAC algorithm given in Section 4.1 to avoid having uncertainty areas in

both stationary and moving queries. The main idea is to anticipate the change in

the query spatial region and cache all moving objects that lie inside the anticipated

area in an in-memory structure called Q.Cache. A conservative approach for deter-

mining the anticipated area is to expand the query region in all directions with the

maximum possible distance that a moving object can cover between any two con-

secutive updates. Such a conservative approach completely avoids uncertainty areas.

Once a query changes its spatial region, we probe Q.Cache for all objects that lie

inside the new spatial region. Thus a fast answer of Q is retrieved. Notice that with

the conservative approach, the change of the query spatial region is guaranteed to

be completely inside the anticipated area.

[Figure 4.6. The cache area: (a) snapshot at time T0; (b) the query is moved; (c) the cache area is adjusted.]

To realize GPAC with caching, we equip

each query Q with the following: (1) A variable Q.CacheArea that contains the

boundary of the anticipated area. (2) The data structure Q.Cache that keeps track

of all moving objects within Q.CacheArea. (3) The function Q.InCacheArea() that

takes an input tuple P and outputs true if P lies inside Q.CacheArea.

The conservative caching approach requires only the knowledge of the maximum

object speed, which is typically available in moving object applications (e.g., mov-

ing cars in a road network have limited speeds). This is in contrast to all validity

region approaches (e.g., the safe region [19], the valid region [55], and the No-Action

region [84]) that require the knowledge of the locations of other objects. This infor-

mation is not available in our case since GPAC is aware only of objects that satisfy

the query predicate. Thus, validity region approaches are not applicable in the case

of spatio-temporal streams.

Figure 4.6a gives an example of a continuous range query (the shaded area) along

with its extended cache area (the dotted area). All objects that lie either in the

query area or in the cache area are considered significant, thus are stored in memory.

Figure 4.6b illustrates the query movement. Since object P4 is a significant object, we
are able to produce P4 as an answer of the continuous query. Without the concept of
a cache area, object P4 would not have been significant, and we would not be able to
produce P4 in the query answer. Upon the query movement,
the cache area has to be adjusted based on the new query region (Figure 4.6c).


Figures 4.7 and 4.8 give the pseudo code of GPAC when caching is employed as

a means to avoid uncertainty areas. The changes in the basic GPAC algorithm are

limited to the following: (1) When the focal point of a moving query moves, we update

the new Q.CacheArea. Then, we go over all the objects in Q.Cache to determine

whether any of them become part of the query answer (Step 2 in Figure 4.8). Also,

for moving objects that are out of the new query region, we check whether they need

to be moved into Q.Cache or not (Step 3b in Figure 4.8). (2) When the input P is

inside the query area but was not in the previously reported answer, we check if P is

stored in Q.Cache. In this case, we delete P from Q.Cache (Step 2c in Figure 4.7).

(3) When the input P is not inside the query region but was in the old answer,

we check if the new value of P lies in the query region. In this case, we add P to

Q.Cache (Step 3c in Figure 4.7). (4) If P is neither inside the query region nor in

the previous query answer, we maintain the status of P with respect to the query

region (Steps 4 and 5 of Figure 4.7).

The cache area enlarges the query size and hence more input tuples need to be

stored. However, this increase in size is limited and can be neglected in many cases.

For example, consider a square range query with side length x. A conservative cache

area would increase the side length to be x + d where d is the maximum distance an

object can travel between any two consecutive updates. The ratio of area increase

would be (x+d)2−x2

x2 . A typical query region would be orders of magnitude of d, i.e.,

x = md. Thus, the ratio of increase is 2md2+d2

m2d2 = 2m+1m2 , which can be approximated

to 2m

. In a typical scenario, m can be in the order of tens, which results in a slight

overhead in the query size. For example, consider a square range query with side

length 2 miles that monitors the traffic in a downtown area. If objects are moving

with speed 25 miles/hour while updating their locations every 30 seconds, then the

maximum travelled distance for each object is d = 1/8. This will result in increasing

the query area by only 12.5%. Similarly, for the same setting, a query about objects

within 3 miles suffers only an 8.5% increase in size. Notice that the overhead in

having a cache area is reduced by the increase in the area of the original query.
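
As a rough check of this analysis, the small Python helper below (illustrative only; the function name is an assumption, not part of the thesis implementation) computes both the exact area overhead and the 2/m approximation for a square query of side x and a maximum per-update travel distance d:

    def cache_overhead(x, d):
        # Fractional growth of a square query of side x expanded by d on each side.
        exact = ((x + d) ** 2 - x ** 2) / (x ** 2)
        approx = 2.0 * d / x                 # the 2/m approximation, with m = x/d
        return exact, approx

For the downtown example above (x = 2 miles, d = 1/8 mile), both values stay close to the quoted 12.5% figure, and the overhead shrinks further as the query area grows.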


Procedure Q.ReceiveTupleII(Tuple P )

Begin

1. If query Q is moving and P is the focal point, Q.UpdateCriteriaII(P ) (Figure 4.8),

return

2. if Q.satisfy(P ) AND P /∈ Q.Answer

(a) Add P to Q.Answer

(b) Send the Positive update P to the user

(c) If P ∈ Q.Cache, delete P from Q.Cache

(d) If Q.IsDynamic(), Q.UpdateCriteriaII(P ) (Figure 4.8)

(e) return

3. If (!Q.satisfy(P )) AND (P ∈ Q.Answer)

(a) Delete P from Q.Answer

(b) Send the Negative tuple P− to the user

(c) If Q.InCacheArea(P ), Insert P in Q.Cache

(d) If Q.IsDynamic(), Q.UpdateCriteriaII(P ) (Figure 4.8)

(e) return

4. If Q.InCacheArea(P )

(a) If P /∈ Q.Cache, Insert P in Q.Cache

(b) return

5. If P ∈ Q.Cache, delete P from Q.Cache.

End

Figure 4.7. Pseudo code of GPAC with caching.


Procedure Q.UpdateCriteriaII(Tuple P )

Begin

1. Update Q.Criteria and Q.CacheArea based on P

2. For all moving objects M in Q.Cache

• If Q.satisfy(M)

(a) Move M from Q.Cache to Q.Answer

(b) Send the Positive update M to the user

• If NOT Q.InCacheArea(M)

– Delete M from Q.Cache

3. For all moving objects M ∈ Q.Answer

• If NOT Q.satisfy(M)

(a) Send the Negative tuple M− to the user

(b) if Q.InCacheArea(M), move M from Q.Answer to Q.Cache, else,

delete M from Q.Answer.

End

Figure 4.8. Updating query information in GPAC with caching

4.3 Instances of GPAC

In this section, we develop two instances of GPAC, namely, for continuous spatio-

temporal range queries and continuous k-nearest-neighbor queries. Other instances

of GPAC (e.g., reverse nearest-neighbor [33], group nearest-neighbor queries [85],

and time-parameterized queries [86]) can be developed in a similar way.


4.3.1 Spatio-temporal Range Queries

Q.Answer is represented by a hash table. Q.Cache is represented as a linked

list that is sorted on the distance from the moving object to the boundary of the

query region. The functions Q.satisfy() and Q.InCacheArea() test whether object
P lies inside the rectangular region of Q and the cache area, respectively.

The function Q.IsDynamic() always returns false for stationary queries and true

for moving queries. This is because static range queries never change their spatial

regions.
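
A minimal Python rendering of this range-query instance (illustrative only; class and member names are assumptions, not the PLACE code) could look as follows:

    class RangeQueryInstance:
        def __init__(self, x1, y1, x2, y2, d):
            self.region = (x1, y1, x2, y2)                       # the query rectangle
            self.cache_area = (x1 - d, y1 - d, x2 + d, y2 + d)   # conservative expansion by d
            self.answer = {}   # hash table of qualifying objects
            self.cache = []    # cache-area objects; the thesis keeps this sorted by
                               # distance to the query boundary

        def _inside(self, box, p):
            bx1, by1, bx2, by2 = box
            return bx1 <= p.x <= bx2 and by1 <= p.y <= by2

        def satisfy(self, p):
            return self._inside(self.region, p)

        def in_cache_area(self, p):
            return self._inside(self.cache_area, p)

        def is_dynamic(self):
            return False       # a static range query never changes its spatial region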

4.3.2 Spatio-temporal k-nearest-neighbor

A k-nearest-neighbor query (kNN) is represented in GPAC in the same way as a

range query. The only difference is that the kNN query has a circular region rather

than a rectangular region. Initially, a kNN query is submitted to GPAC with the

format (QID, center, k) or (QID, FocalID, k) for stationary and moving queries,

respectively. Thus, the center of the query circular region is either stated explicitly

as in stationary queries or implicitly as the current location of the object FocalID in

case of moving queries. Once the kNN query is registered in SOLE, the first incoming

k objects are considered the initial query answer. The radius of the circular region

is determined by the distance from the query center to the current kth farthest

neighbor. Then, the query execution continues as a regular range query, yet with

a variable size. Whenever a newly coming object P lies inside the circular query

region, P removes the kth farthest neighbor from the answer set (with a negative

update) and adds itself to the answer set (with a positive update). The query circular

region is shrunk to reflect the new kth neighbor. Similarly, if an object P , which

is one of the k neighbors, updates its location to be outside the circular region, we

expand the query circular region to reflect the fact that P is considered the farthest

kth neighbor. Notice that in case of expanding the query region, we do not output

any updates.


Thus, for k-nearest-neighbor queries, Q.Answer and Q.Cache are represented by

a linked list that is sorted on the distance from the moving object to the query focal

point. The functions Q.satisfy() and Q.InCacheArea() test whether object P lies
inside the circular region of Q and the cache area, respectively. The circular region
has the focal point as its center and the distance to the kth farthest neighbor as its radius.

The function Q.IsDynamic() always returns true for both stationary and moving

queries.
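
The shrink/expand behavior of the circular region can be sketched in a few lines of Python (illustrative only; it omits cache handling and location updates of objects already in the answer, and the names are assumptions):

    class KnnQueryInstance:
        def __init__(self, k, center):
            self.k = k
            self.center = center
            self.answer = []                       # (distance, object id) pairs, sorted by distance

        def _dist(self, p):
            cx, cy = self.center
            return ((p.x - cx) ** 2 + (p.y - cy) ** 2) ** 0.5

        def radius(self):
            # Radius of the circular query region: distance to the current k-th neighbor.
            return self.answer[-1][0] if len(self.answer) >= self.k else float('inf')

        def receive(self, p, emit):
            d = self._dist(p)
            if len(self.answer) < self.k:          # still building the initial answer
                self.answer.append((d, p.id))
                self.answer.sort()
                emit('+', p)
            elif d < self.radius():                # p is closer than the current k-th neighbor
                _, worst_id = self.answer.pop()    # evict the old k-th neighbor ...
                emit('-', worst_id)                # ... with a negative update
                self.answer.append((d, p.id))
                self.answer.sort()
                emit('+', p)                       # the region shrinks to the new k-th distance
            # Expansion (an answer object moving outward) updates the stored distance
            # without emitting any updates, as described above.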

4.4 Pipelined Spatio-temporal Query Operators

We encapsulate GPAC algorithms for continuous range queries and continuous k-

nearest-neighbor queries into the pipelined query operators GPAC-IN and GPAC-kNN,

respectively. The pipelined operators are implemented inside the PLACE (Pervasive

Location-Aware Computing Environments) server [11, 13]. A typical SQL query

submitted to the PLACE server may have the following form:

SELECT select clause

FROM from clause

WHERE where clause

GPAC-IN in clause

GPAC-kNN knn clause

The in clause may have one of the following two forms:

• Static range query (x1, y1, x2, y2), where (x1, y1) and (x2, y2) represent the top

left and bottom right corners of the rectangular range query.

• Moving rectangular range query (′M ′, ID, xdist, ydist), where ′M ′ is a flag

indicating that the query is moving, ID is the identifier of the query focal point,

xdist is the length of the query rectangle, and ydist is the width of the query

rectangle.


Similarly, the knn clause may have one of the following two forms:

• Static kNN query (k, x, y), where k is the number of the neighbors to be main-

tained, and (x, y) is the center of the query point.

• Moving kNN query (′M ′, k, ID), where ′M ′ is a flag indicating that the query is

moving, k is the number of neighbors to be maintained, and ID is the identifier

of the query focal point.
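
For illustration, a moving k-nearest-neighbor query that continuously tracks the five police cars nearest to the moving object with identifier 1234 could be written as follows (the identifier, the type attribute, and the value "police" are hypothetical):

SELECT M.ObjectID
FROM MovingObjects M
WHERE M.type = "police"
GPAC-kNN (′M ′, 5, 1234)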

As will be discussed in Section 4.5, pushing the operators GPAC-IN and GPAC-kNN

to the bottom of the query execution plan usually achieves the best performance.

However, having the spatio-temporal operators at the bottom or at the middle of

the query evaluation pipeline requires that all the above operators be equipped with

special handling of negative tuples. The NILE query processor [3] handles negative

tuples in pipelined operators as follows: Selection and Join operators handle nega-

tive tuples in the same way as positive tuples. The only difference is that the output

will be in the form of a negative tuple. Aggregates update their aggregate functions

by considering the received negative tuple. The Distinct operator reports a negative

tuple at the output only if the corresponding positive tuple is in the recently re-

ported result. For more details about handling the negative tuples in various query

operators, the reader is referred to [39].
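
As one concrete illustration of this behavior, a pipelined Distinct operator can keep a count of the live copies of each value. The Python sketch below is a simplification of the rule described above and is not the Nile implementation:

    from collections import defaultdict

    class DistinctOperator:
        def __init__(self):
            self.counts = defaultdict(int)        # live copies of each distinct value

        def process(self, sign, value, emit):
            if sign == '+':
                self.counts[value] += 1
                if self.counts[value] == 1:       # first copy: value enters the distinct result
                    emit('+', value)
            else:                                 # negative tuple
                self.counts[value] -= 1
                if self.counts[value] == 0:       # last copy retracted: remove it from the result
                    del self.counts[value]
                    emit('-', value)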

4.5 Performance Evaluation

In this section, we give experimental evidence that encapsulating GPAC algo-

rithms with appropriate cache size into physical pipelined query operators outper-

forms high level implementations. Mainly, the experiments in this section are divided

into two categories:

• Pipelined operators. This set of experiments compare the high level imple-

mentation of GPAC with the encapsulation of GPAC algorithms in pipelined

query operators.


Figure 4.9. Greater Lafayette, Indiana, USA.

• Properties of GPAC. In this set of experiments, we study some properties

of GPAC, namely dealing with high data rates and various spatio-temporal

selectivities.

All the results in this section are based on a real implementation of GPAC al-

gorithms and operators inside our prototype database engine for spatio-temporal

streams, PLACE [11,13]. PLACE extends the Nile [3] streaming database manage-

ment system to handle spatio-temporal streams. We run PLACE on Intel Pentium IV

CPU 2.4GHz with 512MB RAM running Windows XP. Without loss of generality,

all the presented experiments are conducted on stationary and moving continuous

spatio-temporal range queries. Similar results are achieved when employing continuous

k-nearest-neighbor queries.

We use the Network-based Generator of Moving Objects [74] to generate a set

of moving objects and moving queries in the form of spatio-temporal streams. The

input to the generator is the road map of Greater Lafayette (a city in the state of

Indiana, USA) given in Figure 4.9. The output of the generator is a set of moving

points that move on the road network of the given city. Moving objects can be


cars, cyclists, pedestrians, etc. Any moving object can be the focal object of a moving query.
Unless mentioned otherwise, we generate 110K moving objects as follows: Initially,
we generate 10K moving objects from the generator, then we run the generator for
1000 time units. At each time unit, we generate 100 new moving objects. Moving

objects are required to report their locations every time unit T . Failure to do so

results in disconnecting the moving object from the server.

Although it is appealing to have a conservative cache, a large cache size may
incur high overhead in maintaining objects inside the cache area. Thus, for
the rest of the experiments, we set the cache size to 75% of the conservative cache area.
In most cases, a cache size of 75% has performance similar to that of the
conservative cache. Notice that a conservative cache is designed for the fastest
moving object; most likely, the query focal object is not the object of maximum
speed.

4.5.1 GPAC Operators in a Pipelined Query Plan

In this section, we compare the implementation of GPAC at the application level

with the encapsulation of GPAC inside query operators.

Pipeline with a Selection Operator

Consider the query Q:“Continuously report all trucks that are within MyArea”.

MyArea can be either a stationary or moving range query. A high level implementa-

tion of this query is to have only a selection operator that selects only the “trucks”.

Then, a high level algorithm implementation would take the selection output and

incrementally produce the query result. However, an encapsulation of GPAC into

the GPAC-IN operator allows for more flexible plans. Figure 4.10a gives a query eval-

uation plan when pushing the GPAC-IN operator before the selection operator. The

following is the SQL presentation of the query.

[Figure 4.10. Pipelined GPAC operators: (a) GPAC-IN below a SELECTION operator; (b) GPAC-IN below a JOIN operator.]

SELECT M.ObjectID

FROM MovingObjects M

WHERE M.type = ”truck”

GPAC-IN MyArea

Figure 4.11 compares the high level implementation of the above query with

pipelined GPAC-IN operators for both stationary and moving queries. The selectivity

of the queries varies from 2% to 64%. The selectivity of the selection operator

is 5%. Our measure of comparison is the number of tuples that go through the

query evaluation pipeline. When GPAC is implemented at the application level,

its performance is not affected by the query selectivity. However, when GPAC-IN

is pushed before the selection, it acts as a filter for the query evaluation pipeline,

thus, limiting the tuples through the pipeline to only the progressive updates. With

GPAC-IN selectivity less than 32%, pushing GPAC-IN before the selection greatly
improves the performance. However, with selectivity more than 32%, it would be

better to have the GPAC-IN operator above the selection operator.

[Figure 4.11. Pipelined operators with SELECT: tuples in the pipeline vs. query selectivity; curves: stationary pipelined query, moving pipelined query, application level.]

Pipeline with a Join Operator

In this section, we consider a more complex query plan that contains a join

operator. Consider the query Q: “Continuously report moving objects that belong to

my favorite set of objects and that lie within MyArea”. A high level implementation

of GPAC would probe a streaming database engine to join all moving objects with

my favorite set of objects. Then, the output of the join is sent to the GPAC algorithm

for further processing. However, with the GPAC-IN operator, we can have a query

evaluation plan as that of Figure 4.10b where the GPAC-IN operator is pushed below

the Join operator. The SQL representation of the above query is as follows:

SELECT M.ObjectID

FROM MovingObjects M, MyFavoriteCars F

WHERE M.ObjectID = F.ObjectID

GPAC-IN MyArea

Figure 4.12 compares the high level implementation of the above query with the

pipelined GPAC-IN operator for both stationary and moving queries. The selectivity

of the queries varies from 2% to 64%.

[Figure 4.12. Pipelined operators with Join: tuples in the pipeline vs. query size.]

As in Figure 4.11, the selectivity of GPAC does

not affect the performance if it is implemented in the application level. Unlike the

case of selection operators, GPAC provides a dramatic increase in the performance

(around an order of magnitude) when implemented as a pipelined operator. The

main reason for this dramatic gain in performance is the high overhead incurred
when evaluating the join operation. Thus, the GPAC-IN operator filters out the
input tuples and limits the input to the join operator to only the incremental positive

and negative updates.

4.5.2 Properties of GPAC

In this section, we study some properties of GPAC algorithms, namely, dealing

with high rates of data arrival and the spatio-temporal query selectivity.

[Figure 4.13. High arrival rates: delay in output (seconds) vs. arrival rate (tuples/sec); curves: stationary query, moving query.]

High Arrival Rates

Figure 4.13 gives the result of an experiment that deals with high arrival rates in

GPAC for stationary and moving queries. Spatio-temporal data arrives exponentially

with an arrival rate that varies from 100 tuples per second to 4000 tuples per second.

Our measure is the average output delay. The output delay of a tuple P is the
difference between the time that P enters the system and the time that P has an effect
on the output result. As shown in Figure 4.13, GPAC algorithms can afford up to
2000 tuples per second with only one second of output delay.

Query Selectivity due to Incremental Evaluation

Unlike the selectivity of traditional queries, the selectivity of spatio-temporal

queries is more sophisticated. Figure 4.14 gives the result of an experiment that

shows the selectivity of spatio-temporal queries. We run a continuous spatio-

temporal query Q that should have a selectivity that varies from 10% to 100%.

We call this the correct selectivity, as it is induced from the spa-

tial area covered by Q. However, the actual selectivity of the spatio-temporal query

is higher than its correct selectivity.

[Figure 4.14. Query selectivity: actual selectivity vs. query selectivity; curves: actual selectivity, correct selectivity.]

The main reason is that in spatio-temporal

queries, moving objects can go back and forth and report themselves in the query

answer as multiple positive and negative tuples. Thus, it may happen that a query

with a smaller area produces more output results than a query with a larger area.

For example, consider a query that covers all the spatial area (i.e., selectivity 100%).

Such a query would never output negative tuples. In addition, once all objects are

inside the query area, no output will be produced due to the progressive property.

Consider another query that has a slightly smaller area. Due to the area not covered by

this query, it may happen that some tuples go out of the query region and produce

negative tuples. Then, these tuples can move again inside the query area to produce

a set of positive tuples. As a result, a query with smaller area may produce more

output tuples.

4.6 Summary

In this chapter, we introduced a new family of Generic and Progressive Algo-

rithms (GPAC, for short) for continuous query evaluation over spatio-temporal data


streams. GPAC is a general skeleton that can be tuned through a set of methods to

behave as various continuous spatio-temporal queries. GPAC provides online, pro-

gressive, and fast response to continuous spatio-temporal queries. We described two

versions of GPAC. The first version (with no caching) is simple to maintain; however,
it may produce inaccurate answers. In the second version (with caching), we introduce the

concept of anticipation, where the query answer can be anticipated beforehand and

is cached in a cache structure. We show how to realize two types of continuous

spatio-temporal queries from GPAC, namely, continuous range queries and contin-

uous k-nearest-neighbor queries. Moreover, we encapsulate GPAC algorithms into

physical pipelined query operators. Pipelined operators are combined with tradi-

tional operators (e.g., selection and join) to provide online, progressive, and fast

response of a wide variety of continuous spatio-temporal queries. Experimental re-

sults determine the appropriate size of caching in GPAC. In addition, we show that

encapsulating GPAC into pipelined query operators is an order of magnitude better

than implementing GPAC at the application level. Also, GPAC is stable with high

data arrival rates. For an arrival rate of 2000 tuples per second, GPAC results in only

one second delay in the query answer.


5 STREAM-BASED SPATIO-TEMPORAL QUERY PROCESSING:

SCALABILITY

In this chapter, we focus on the scalable execution of multiple concurrent spatio-

temporal queries in the streaming environments. We propose the Scalable On-Line

Execution algorithm (SOLE, for short) for continuous and on-line evaluation of

concurrent continuous spatio-temporal queries over spatio-temporal data streams.

SOLE combines the recent advances of both spatio-temporal continuous query pro-

cessors and data stream management systems. On-line execution is achieved in SOLE

by allowing only in-memory processing of incoming spatio-temporal data streams.

Scalability in SOLE is achieved by using a shared in-memory buffer pool that is ac-

cessible by all outstanding queries. The scarce memory resource is efficiently utilized

by keeping track of only those objects that are considered significant to the out-

standing continuous queries. Furthermore, SOLE is presented as a spatio-temporal

join between two input streams; a stream of spatio-temporal objects and a stream

of spatio-temporal queries.

To cope with intervals of very high arrival rates of objects and/or queries, SOLE

adopts a self-tuning approach based on load-shedding. Two load shedding techniques

are proposed, namely, query load shedding and object load shedding. Load shedding

techniques continuously negotiate with SOLE to reduce the memory workload in

order to support a larger number of queries with a certain guaranteed query accuracy.

Two alternative approaches exist for implementing spatio-temporal algorithms in

database systems: using table functions or encapsulating the algorithm into a physical

pipelined operator. In the first approach, which is employed by existing spatio-

temporal algorithms, algorithms are implemented using SQL table functions [87].

Since there is no straightforward method of pushing query predicates into table

functions [88], the performance of this table function is severely limited and the


approach does not give enough flexibility in optimizing the issued queries. The

second approach, which we adopt in SOLE, is to define a query operator that can be

part of a query execution plan. The SOLE operator can be combined with traditional

operators (e.g., join, aggregates, and distinct) to support a wide variety of spatio-

temporal queries. In addition, with the SOLE operator, the query optimizer can

support multiple candidate execution plans.

The rest of this chapter is organized as follows: Section 5.1 highlights related work
to SOLE in terms of spatio-temporal query processing and data stream management

systems. The SOLE framework is presented in Section 5.2. Section 5.3 illustrates

sharing memory resources among concurrent continuous queries in the SOLE frame-

work. The shared execution among continuous queries in SOLE is presented in

Section 5.4. Section 5.5 discusses load shedding in SOLE. Experimental results that

are based on a real implementation of SOLE as an operator inside a data stream

management system are presented in Section 5.6. Finally, Section 5.7 summarizes

this chapter.

5.1 Related Work

To the best of our knowledge, SOLE provides the first attempt to furnish query processors

in data stream management systems with the required scalable operators and al-

gorithms to support a scalable execution of concurrent continuous spatio-temporal

queries over spatio-temporal data streams. Since SOLE bridges the areas of spatio-

temporal databases and data stream management systems, we discuss the related

work in each area separately.

5.1.1 Spatio-temporal Databases

Existing algorithms for continuous spatio-temporal query processing focus mainly

on materializing spatio-temporal data in disk-based indexing structures (e.g., hash

tables [17, 18], grid files [8, 25, 61, 89], the B-tree [90], the R-tree [20, 21, 31, 57, 58],


and the TPR-tree [22,24]). Scalable execution of concurrent spatio-temporal queries

has recently been addressed for centralized [8,19,89] and distributed environments [60,61].
However, the underlying data structure is either a disk-based grid structure [8,61,89]

or a disk-based R-tree [19,60]. None of these techniques deal with the issue of spatio-

temporal data streams. Issues of high arrival rates, infinite nature of data, and

spatio-temporal streams are overlooked by these approaches. With the notion of

data streams, only in-memory algorithms and data structures can be realized.

The most related work to SOLE in the context of spatio-temporal databases

is the SINA framework [8]. SOLE has common functionalities with SINA where

both of them utilize a shared grid structure to produce incremental query results.

However, SOLE distinguishes itself from SINA and other scalable spatio-temporal

query processors (e.g., [61, 89]) in the following aspects: (1) SOLE is an in-memory

algorithm where all data structures are memory-based. (2) SOLE is equipped with

load shedding techniques to cope with intervals of high arrival rates of moving objects

and/or queries. (3) As a result of the streaming environment, SOLE deals with

new challenging issues, e.g., uncertainty in query areas, scarce memory resources,

and approximate query processing. (4) SOLE is encapsulated into a physical non-

blocking pipelined query operator where the result of SOLE is produced one tuple

at a time. Previous scalable spatio-temporal query processors (e.g., SINA [8], SEA-

CNN [89], Q-Index [19], and MobiEyes [61]) can be implemented only as a table

function where the result is produced periodically in batches.

5.1.2 Data Stream Management Systems

Existing prototypes for data stream management systems [45, 62, 79, 80] aim to

efficiently support continuous queries over data streams. However, the spatial and

temporal properties of data streams and/or continuous queries are overlooked by

these prototypes. With limited memory resources, existing stream query processors

adopt the concept of sliding windows to limit the number of tuples stored in-memory


to only the recent tuples [40,41,91]. Such a model is not appropriate for many spatio-
temporal applications where the focus is on the current status of the database rather
than on the recent past. The only work for continuous queries over spatio-temporal
streams is the GPAC algorithm [9]. However, GPAC is concerned only with the
execution of a single outstanding continuous query. In a typical data stream envi-
ronment, there is a huge number of outstanding continuous queries, which GPAC
cannot afford.

Scalable execution of continuous queries in traditional data streams aims either to
detect common subexpressions [48,62,92] or to share resources at the operator level [38,

41, 93]. SOLE exploits both paradigms where evaluating multiple spatio-temporal

queries is performed as a spatio-temporal join between an object stream and a query

stream while a shared memory resource (buffer pool) is maintained to support all

continuous queries. Load shedding in data stream management systems has been

addressed recently in [94, 95]. The main idea is to add a special operator to the query

plan that regulates the load by discarding unimportant incoming tuples. Load shedding

techniques in SOLE are distinguished from other approaches in that, in addition to

discarding some of the incoming tuples, SOLE voluntarily drops some of the tuples

already stored in memory.

The most related work to SOLE in the context of data stream management

systems is the NiagaraCQ framework [62]. SOLE has common functionalities with

NiagaraCQ where both of them utilize a shared operator to join a set of objects with

a set of queries. However, SOLE distinguishes itself from NiagaraCQ and other data

stream management systems in the following: (1) As a result of the spatio-temporal

environment, SOLE has to deal with new challenging issues, e.g., moving queries,

uncertainty in query areas, positive and negative updates to the query result. (2) In

a highly overloaded system, SOLE provides approximate results by employing load

shedding techniques. (3) In addition to sharing the query operator as in NiagaraCQ,

SOLE shares memory resources at the operator level.


[Figure 5.1. Overview of shared execution in SOLE: (a) a separate query plan and buffer for each query; (b) a shared spatio-temporal join operator with a shared buffer for all queries, producing a stream of (Qi, ±Pj) updates that is routed by a split operator.]

5.2 The SOLE Framework

Figure 5.1a gives the pipelined execution of N queries (Q1 to QN ) of various types

where each query is considered a separate entity. With each single query Qi, an in-

memory buffer Bi is maintained to keep track of moving objects that are needed

by Qi. Such an approach is employed by continuous query algorithms for single query

execution (e.g., GPAC [9]). In a typical spatio-temporal application (e.g., location-

aware servers), there are large numbers of concurrent spatio-temporal continuous

queries. Dealing with each query as a separate entity would easily consume the

system resources and degrade the system performance.

Our proposed SOLE approach is designed to support scalable execution of con-

tinuous spatio-temporal queries of various types. Figure 5.1b gives the pipelined

execution of the same N queries as in Figure 5.1a, yet with the shared SOLE op-

erator. The problem of evaluating concurrent continuous queries is reduced to a

spatio-temporal join between two streams; a stream of moving objects and a stream

of continuous queries. The shared SOLE operator has a shared buffer pool that is

accessible by all continuous queries. The output of the SOLE operator has the form


(Qi,±Pj) which indicates an addition/removal of object Pj to/from query Qi. The

SOLE operator is followed by a split operator that distributes the output of SOLE

either to the users or to the various query operators. The split operator is similar to

the one used in NiagaraCQ [62] and is beyond the focus of this chapter. Our focus

is on realizing the shared memory buffer and the shared SOLE spatio-temporal join

operator.

Without loss of generality, we present SOLE in the context of stationary and

moving rectangular range queries. The extension to other continuous query types is

straightforward. While a stationary query is represented only by its region, a moving

query must be bound to a focal moving object. For example, if a moving object

M issues a query Q that asks about objects within a certain range of M , then M is

considered the focal object of Q. The region of the moving query is determined by

the continuous movement of its focal object.
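As a small illustration of how a moving query's region follows its focal object, the Python sketch below recomputes a square range-query region around the focal object's latest location. The function name moving_query_region and the half-side parameter r are illustrative assumptions, not part of SOLE's specification.

from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]   # (x1, y1, x2, y2)

def moving_query_region(focal_location: Point, r: float) -> Rect:
    # Region of a moving range query that asks about objects within
    # distance r (per dimension) of its focal object's current location.
    x, y = focal_location
    return (x - r, y - r, x + r, y + r)

# Every location update of the focal object M yields a new region for Q.
print(moving_query_region((0.5, 0.25), 0.125))   # -> (0.375, 0.125, 0.625, 0.375)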

5.3 Shared Memory in SOLE

SOLE maintains a simple grid structure as an in-memory shared buffer pool

among all continuous queries and objects. The shared buffer pool is logically divided

into two parts; a query buffer that stores all outstanding continuous queries and an

object buffer that is concerned with moving objects. In addition to the grid structure,

SOLE employs a hash table h to index moving objects based on their identifiers.

5.3.1 Shared Object Buffer

To optimize the scarce memory resource, SOLE employs two main techniques:

(1) Rather than redundantly storing a moving object P multiple times with each

query Qi that needs P , SOLE stores P at most once along with a reference counter

that indicates the number of continuous queries that need P . (2) Rather than storing

all moving objects, SOLE keeps track of only the significant objects. Insignificant


objects are ignored (i.e., dropped) from memory. Significant objects are defined as

follows:

Definition 5.3.1 A moving object P is considered significant if P satisfies any of

the following two conditions: (1) There is at least one outstanding continuous query

Q that shows interest in object P (i.e., P has a non-zero reference counter), (2) P

is the focal object of at least one outstanding continuous query.

The definition of significant objects relies on the concept that a certain query

shows interest in a certain object, which will be clarified in the next section.

Given this definition of significant objects, SOLE continuously maintains

the following assertion:

Assertion 1 Only significant objects are stored in the shared memory buffer.

To always satisfy this assertion, SOLE continuously keeps track of the following:

(1) A newly incoming data object P is stored in memory only if P is significant,

(2) At any time, if an object P that is already stored in the shared buffer becomes

insignificant, we drop P immediately from the shared buffer.

Significant moving objects are hashed to grid cells based on their spatial loca-

tions. An entry of a significant moving object P in a grid cell C has the form

(PID, Location, RefCount, FocalList). PID and Location are the object identi-

fier and location, respectively. RefCount indicates the number of queries that are

interested in P . FocalList is the list of active moving queries that have P as their

focal object.
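To make the shape of an object entry concrete, here is a minimal Python sketch of the entry and of the significance test of Definition 5.3.1; the names ObjectEntry, is_significant, and drop_if_insignificant are hypothetical and only illustrate the bookkeeping, not the thesis' actual implementation.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectEntry:
    # Entry of a significant moving object P stored in its grid cell.
    pid: int                                  # object identifier (PID)
    location: Tuple[float, float]             # last reported (x, y) location
    ref_count: int = 0                        # number of queries interested in P
    focal_list: List[int] = field(default_factory=list)  # moving queries whose focal object is P

    def is_significant(self) -> bool:
        # Definition 5.3.1: some query is interested in P, or P is a focal object.
        return self.ref_count > 0 or len(self.focal_list) > 0

def drop_if_insignificant(cell: List[ObjectEntry], entry: ObjectEntry) -> None:
    # Assertion 1: only significant objects stay in the shared buffer.
    if not entry.is_significant() and entry in cell:
        cell.remove(entry)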

5.3.2 Shared Query Buffer

The concepts of uncertainty and caching have been introduced in [9] in the context

of a single continuous query. SOLE generalizes these concepts to be applicable to

scalable execution of multiple concurrent continuous queries. Based on the caching

area, we define when a query Q is interested in an object P as follows:


Definition 5.3.2 A continuous query Q is interested in object P if P either lies in

Q’s spatial area or in Q’s cache area.

Unlike data objects that are stored in only one grid cell, continuous queries are

stored in all grid cells that overlap either the query spatial area or the query cache

area. A query entry in a grid cell contains only the query identifier (QID). The

spatial region for each query is stored separately in a global lookup table. The

redundancy of storing the query identifier multiple times can be reduced by using

the optimizations discussed in Section 5.3.3. The trade-offs of the cache area size

along with the size of the grid structure are evaluated experimentally in Section 5.6.
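A minimal sketch of this registration step, assuming a uniform grid of N x N cells over the unit space; the helper overlapping_cells, the rectangle format (x1, y1, x2, y2), and the dictionary-based buffers are assumptions made for illustration only.

from math import floor
from typing import Dict, List, Set, Tuple

N = 30                                        # grid cells per dimension (the value used in Section 5.6)
Rect = Tuple[float, float, float, float]      # (x1, y1, x2, y2) within the unit space

def overlapping_cells(region: Rect) -> Set[Tuple[int, int]]:
    # Grid cells that overlap a rectangular region (query area plus cache area).
    x1, y1, x2, y2 = region
    lo_i, lo_j = max(floor(x1 * N), 0), max(floor(y1 * N), 0)
    hi_i, hi_j = min(floor(x2 * N), N - 1), min(floor(y2 * N), N - 1)
    return {(i, j) for i in range(lo_i, hi_i + 1) for j in range(lo_j, hi_j + 1)}

def register_query(query_buffer: Dict[Tuple[int, int], List[int]],
                   regions: Dict[int, Rect], qid: int, extended_region: Rect) -> None:
    # Store only the query identifier in every overlapping cell; the region
    # itself lives in a global lookup table, as described in Section 5.3.2.
    regions[qid] = extended_region
    for cell in overlapping_cells(extended_region):
        query_buffer.setdefault(cell, []).append(qid)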

5.3.3 Optimizing the Shared Buffer Pool

The shared memory buffer may suffer from redundancy because every query iden-

tifier QID is stored in all grid cells that overlap the query region. In this section,

we discuss two optimizations, namely the layered grid and the edgy grid, that aim to

reduce the redundancy in the shared memory buffer.

Layered Grid. The layered grid optimization uses multiple layers of grids with

different resolutions. A query identifier is stored in the lowest grid layer L that

results in less redundancy. All objects are stored in the lowest grid layer (i.e., the

one with the highest resolution). Although the layered grid reduces the redundancy

in the shared buffer, it may increase the processing time because a newly incoming

object needs to be joined with one grid cell from each layer.

Edgy Grid. The edgy grid optimization uses only one grid where the query identifier

QID is stored only at the grid cells that intersect the query boundary. Thus, we

do not need to store the query identifier in grid cells that are completely inside the

query region. The edgy grid optimization has two main advantages: (1) Redundancy

is greatly reduced, especially for large query sizes or small-sized grid cells. (2) The

execution time of updating the location of an object P is also reduced. The main

reason is that P is tested against fewer queries (i.e., only those that have boundaries


[Figure 5.2. Shared join operator in SOLE: an incoming moving object is joined with the query buffer, an incoming query is joined with the object buffer, and the output is a stream of positive/negative updates (Qi, ±Pj); insignificant (expired) objects are deleted from the object buffer.]

in CP, the grid cell of P). These fewer queries are the only ones that can

produce positive or negative updates. The drawback of the edgy grid optimization is

in the case of receiving a new object P that is not stored in memory. In this case,

the old location of P is considered to be out of the grid space. Then, all the grid

cells from the old location to the new location of P have to be tested.

5.4 Shared Execution in SOLE

Figure 5.2 gives the architecture of the shared spatio-temporal join operator. For

any incoming data object, say P , the shared spatio-temporal join operator consults

its query buffer to check if any query is affected by P (either in a positive or a

negative way). Based on the result, we decide either to store P in the object buffer

or to ignore P and delete P ’s old location (if any) from the object buffer. On the

other hand, for any incoming continuous query, say Q, we store Q or update Q’s


Procedure IncomingNewObject(Object P, GridCell CP) Begin
  1. For each query Qi ∈ CP such that P ∈ Q̂i
     (a) P.RefCount++
     (b) if (P ∈ Qi) then output (Qi, +P).
  2. if (P.RefCount > 0) then store P in CP and in the hash table h.
End.

Figure 5.3. Pseudo code for receiving a new value of P.

old location (if any) in the query buffer. Then, we consult the object buffer to

check if any of the objects need to be added to or removed from Q’s answer. Based

on this operation, some in-memory stored objects may become insignificant, hence,

are deleted from the object buffer. Stationary queries are submitted directly to the

join operator, while moving queries are generated from the movement of their focal

objects.

Based on the data stored in the shared buffer, SOLE distinguishes among four

types of data inputs: (1) A new data object P that is not stored in memory, (2) Up-

date of the location of object P , (3) A new stationary query Q, (4) An update of the

region of a moving query Q. Figures 5.3, 5.4, 5.6, and 5.7 give the pseudo code of

SOLE upon receiving each input type. The details of the algorithms are described

below. SOLE makes use of the following notation: Q̂ indicates the extended query

region that covers the cache area, so that Q ⊂ Q̂. CQ and CQ̂ are the sets of grid cells

that are covered by Q and Q̂, respectively. CP represents the single grid cell that covers

the object P.

Input Type I: A new object P . Figure 5.3 gives the pseudo code of SOLE upon

receiving a new object P in the grid cell CP (i.e., P is not stored in memory). P is

tested against all the queries that are stored in CP (Step 1 in Figure 5.3). For each

query Qi ∈ CP, only three cases can take place: (1) P lies in Q̂i but not in Qi. In


Procedure UpdateObj(Object Pold, P, GridCell CPold, CP) Begin
  1. For each query Qi ∈ P.FocalList, call UpdateQuery(Qi).
  2. Let L be the line (Pold, P).
  3. For each query Qi ∈ (CPold ∪ CP)
     (a) if Qi intersects L, then
         • if P ∈ Qi then output (Qi, +P); if Pold ∉ Q̂i then P.RefCount++
         • else output (Qi, −P); if P ∉ Q̂i then P.RefCount−−
     (b) else if Q̂i intersects L
         • if P ∈ Q̂i then P.RefCount++, else P.RefCount−−
  4. if (P.RefCount = 0) then delete Pold, ignore P, and return.
  5. if (CPold ≠ CP) then move Pold from CPold to CP.
  6. Update the location of Pold to that of P in CP.
End.

Figure 5.4. Pseudo code for updating P's location.

this case, we need only to increase the reference counter of P to indicate that there

is one more query interested in P (Step 1a in Figure 5.3). Notice that no output is

produced in this case since P does not satisfy Qi. (2) P satisfies Qi. In this case,

in addition to increasing the reference counter, we output a positive update that

indicates the addition of P to the answer set of Qi (Step 1b in Figure 5.3). In the

above two cases, P is stored in the shared buffer as it is considered significant. (3) P

neither satisfies Qi nor lies in Q̂i. Thus, P is simply ignored as it is insignificant.
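The following Python sketch mirrors the case analysis of Figure 5.3 for a newly arriving object; the helper inside, the dictionaries region and extended (per-query actual and extended areas), and the output format are illustrative assumptions, and the focal-object condition of Definition 5.3.1 is omitted for brevity.

from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]      # (x1, y1, x2, y2)

def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def incoming_new_object(pid: int, p: Point, cell_queries: List[int],
                        region: Dict[int, Rect], extended: Dict[int, Rect],
                        object_buffer: Dict[int, dict]) -> List[Tuple[int, str, int]]:
    # Process a newly arriving object P that is not yet stored in memory.
    output, ref_count = [], 0
    for qid in cell_queries:                  # queries registered in P's grid cell
        if inside(p, extended[qid]):          # cases 1 and 2: P lies in the extended region
            ref_count += 1
            if inside(p, region[qid]):        # case 2: P also satisfies the query itself
                output.append((qid, '+', pid))   # positive update (Q, +P)
    if ref_count > 0:                         # P is significant: keep it in the shared buffer
        object_buffer[pid] = {'location': p, 'ref_count': ref_count, 'focal_list': []}
    return output                             # case 3 objects are simply ignored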

Input Type II: An update of P . Figure 5.4 gives the pseudo code of SOLE upon

receiving an update of object P ’s location. The old location of P is retrieved from

the hash table h. First, we evaluate all moving queries (if any) that have P as their


[Figure 5.5. All cases of updating P's location: (a) the nine cases L1–L9 for the line from the old to the new location of P relative to a query Q and its cache area; (b) the action (positive/negative update and/or reference-counter change) taken for each case.]

focal object (Step 1 in Figure 5.4). Then, we check all the queries that belong to

either CP or CPold (Step 3 in Figure 5.4) against the line L that connects P and Pold.

Figure 5.5a gives nine different cases for the intersection of L with Q where Pold and

P are plotted as white and black circles, respectively. Both Pold and P can be in one

of the three states, in, cache, or out, indicating that the object satisfies Q, lies in the cache

area of Q, or does not satisfy Q, respectively. The action taken for each case is given

in Figure 5.5b. Basically, if there is no change of state from Pold to P (e.g., L1, L5,

and L9), no action will be taken. If Pold was in Q, however, P is not, (e.g., L2 and

L3) we output the negative update (Q,−P ). The reference counter is decreased only

when Pold is of interest to Q while P is not (e.g., L3 and L6). Notice that in the case

of L2, we do not need to decrease the reference counter where although P does not

satisfy Q, P is still of interest to Q as P lies in Qi. Also, in the case of L6, we do

not need to output a negative update, however we decrease the reference counter. In

this case, since P and Pold are not in the answer set of Q, there is no need to update

the answer. Similarly, with a symmetric behavior, we output a positive update in

the cases of L4 and L7 and we increment the reference counter in the cases of L7 and

L8. After testing all cases, we check whether object P becomes insignificant. If this

is the case, we immediately drop P from memory (Step 4 in Figure 5.4). If P is still


Procedure StationaryQuery(Query Q) Begin
  • For each grid cell cj ∈ CQ̂
    1. Register Q in cj.
    2. For each object Pi ∈ cj such that Pi ∈ Q̂
       – Pi.RefCount++; if Pi ∈ Q then output (Q, +Pi).
End.

Figure 5.6. Pseudo code for receiving a new query Q.

significant, we update P ’s location and cell (if needed) in the grid structure (Steps 5

and 6 in Figure 5.4).
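Viewed per query, the nine cases of Figure 5.5 reduce to a small state-transition rule over the states in, cache, and out. The Python sketch below is an illustrative rendering of that rule (not the thesis' code); state() classifies a location against a query's region and its extended (cached) region, and the names are assumptions of this sketch.

from typing import Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]      # (x1, y1, x2, y2)

def inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def state(p: Point, region: Rect, extended: Rect) -> str:
    # 'in' if p satisfies the query, 'cache' if it only lies in the cache area,
    # 'out' otherwise.
    if inside(p, region):
        return 'in'
    return 'cache' if inside(p, extended) else 'out'

def transition(p_old: Point, p_new: Point, region: Rect,
               extended: Rect) -> Tuple[Optional[str], int]:
    # Return (update, delta) for one query: update is '+P', '-P', or None,
    # and delta is the change applied to the object's reference counter.
    s_old, s_new = state(p_old, region, extended), state(p_new, region, extended)
    update = None
    if s_old == 'in' and s_new != 'in':
        update = '-P'                         # cases L2 and L3 in Figure 5.5
    elif s_old != 'in' and s_new == 'in':
        update = '+P'                         # cases L4 and L7
    delta = 0
    if s_old != 'out' and s_new == 'out':
        delta = -1                            # cases L3 and L6: P leaves the extended region
    elif s_old == 'out' and s_new != 'out':
        delta = +1                            # cases L7 and L8: P enters the extended region
    return update, delta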

Input Type III: A new query Q. Figure 5.6 gives the pseudo code of SOLE upon

receiving a continuous stationary query Q. Basically, we register Q in all the grid

cells that are covered by Q̂. In addition, we test Q against all data objects that are

stored in these cells. We increase the reference counter of only those objects that lie

in Q̂, and objects that satisfy Q additionally produce positive updates.

Input Type IV: An update of Q’s region. Figure 5.7 gives the pseudo code

of SOLE upon receiving an update of a moving query region. All stored objects

in all cells that are covered by the old and new regions of Q are tested against Q.

Figure 5.8a divides the space covered by the old and new regions of Q into seven

regions (R1-R7). The action taken for any point that lies in any of these regions is

given in Figure 5.8b. Similar to Figure 5.5b, a region Ri could have any of the three

states in, cache, or out based on whether Ri is inside Q, is in the cache area of Q, or

is outside Q. Basically, no action is taken for objects in any region Ri that maintains

its state for both Q and Qold (e.g., R4). If a region Ri is inside Qold, but is not in Q,

(e.g., R2 and R3), we output a negative update for each object in Ri. We decrement

the reference counter of these objects only if they lie in the region that is out of

the new cache area (e.g., R2) (Step 1 in Figure 5.7). Also, the reference counter is


Procedure UpdateQuery(Query Qold, Q) Begin
  • For each object Pi ∈ (CQ̂old ∪ CQ̂)
    1. if Pi ∈ Qold then
       – if Pi ∉ Q then (output (Q, −Pi); if Pi ∉ Q̂ then (Pi.RefCount−−; if (Pi.RefCount = 0) then delete(Pi)))
    2. else if Pi ∈ Q then (output (Q, +Pi); if Pi ∉ Q̂old then Pi.RefCount++)
    3. else if Pi ∈ Q̂old AND Pi ∉ Q̂ then (Pi.RefCount−−; if (Pi.RefCount = 0) then delete(Pi))
    4. else if Pi ∈ Q̂ AND Pi ∉ Q̂old then Pi.RefCount++.
  • Register Q in CQ̂ − CQ̂old; unregister Q from CQ̂old − CQ̂.
End.

Figure 5.7. Pseudo code for updating a query.

[Figure 5.8. All cases of updating Q's region: (a) the space covered by the old and new query regions is divided into regions R1–R7; (b) the action (positive/negative update and/or reference-counter change) taken for objects in each region.]

decremented for all objects in the region that are in the old cache area but are out of

the new cache area (e.g., R1) (Step 3 in Figure 5.7). Similarly, the reference counter

is increased for regions R6 and R7 while a positive output is sent for the points in

regions R5 and R6. Notice that whenever we decrement the reference counter for


[Figure 5.9. Architecture of self-tuning in SOLE: when memory is almost full, the shared join operator triggers the load shedding procedure; the two exchange memory-load updates and revised significance criteria (optionally consulting collected statistics) until memory returns to a stable state.]

any moving object P , we check whether P becomes insignificant. If this is the case,

we immediately drop P from memory (e.g., Steps 1 and 3 in Figure 5.7). Finally, Q

is registered in all the new cells that are covered by the new region and not the old

region. Similarly, Q is unregistered from all cells that are covered by the old region

and not the new region.
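This final registration step amounts to two set differences over grid-cell coordinates, as in the sketch below; the cell sets are assumed to be computed elsewhere (e.g., by an overlapping-cells helper such as the one sketched in Section 5.3.2), and the function name is hypothetical.

from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

def reregister_query(query_buffer: Dict[Cell, List[int]], qid: int,
                     old_cells: Set[Cell], new_cells: Set[Cell]) -> None:
    # Register the moving query only in cells newly covered by its region and
    # unregister it from cells that are no longer covered.
    for cell in new_cells - old_cells:
        query_buffer.setdefault(cell, []).append(qid)
    for cell in old_cells - new_cells:
        if qid in query_buffer.get(cell, []):
            query_buffer[cell].remove(qid)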

5.5 Load Shedding in SOLE

Even with the scalability features of SOLE, the memory resource may be ex-

hausted at intervals of unexpected massive numbers of queries and moving objects

(e.g., during rush hours). To cope with such intervals, SOLE is equipped with a self-

tuning approach that tunes the memory load to support a large number of concurrent

queries, yet with an approximate answer. The main idea is to tune the definition

of significant objects based on the current workload. By adapting the definition of

significant objects, the memory load will be shed in two ways: (1) In-memory stored

objects will be revisited for the new meaning of significant objects. If an insignificant


object is found, it will be shed from memory. (2) Some of the newly input data will

be shed at the input level.

Figure 5.9 gives the architecture of self-tuning in SOLE. Once the shared join

operator incurs high resource consumption, e.g., the memory becomes almost full,

the join operator triggers the execution of the load shedding procedure. The load

shedding procedure may consult some statistics that are collected during the course

of execution to decide on a new meaning of significant objects. While the shared join

operator is running with the new definition of significant objects, it may send updates

of the current memory load to the load shedding procedure. The load shedding

procedure replies back by continuously adapting the notion of significant objects

based on the continuously changing memory load. Finally, once the memory load

returns to a stable state, the shared join operator restores the original meaning of

significant objects and stops the execution of the load shedding procedure. Solid

lines in Figure 5.9 indicate the mandatory steps that should be taken by any load

shedding technique. Dashed lines indicate a set of operations that may or may not

be employed based on the underlying load shedding technique. In the rest of this

section, we propose two load shedding techniques, namely query load shedding and

object load shedding.

5.5.1 Query Load Shedding

The main idea of query load shedding is to negotiate the query region with the

user. Whenever a query, say Q, is submitted to SOLE, Q specifies the minimum

accuracy that is acceptable by Q. Initially, the submitted query Q is evaluated

with complete accuracy. However, when the system is overloaded, Q’s accuracy is

degraded to its minimum permissible accuracy. Reducing the accuracy is achieved

by shrinking Q’s cache area from all directions to have a smaller cache area. After

we are done with all the cache area, if the system is still overloaded, and we have not

reached the minimum permissible accuracy yet, we start to reduce Q's area itself.


Thus, the notion of significant objects is adapted to be those tuples that lie in the

reduced query area of at least one continuous outstanding query. By reducing the

query sizes of all outstanding queries, objects that are outside of the reduced area

and are not of interest to any other query are immediately dropped from memory

and the corresponding negative updates are sent. During the course of execution, we

gradually increase the query size to cope with the memory load. Finally, when the

system reaches a stable state, we restore the original query sizes.
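As an illustration of this negotiation, the sketch below shrinks a rectangle symmetrically to a target fraction of its area; equating the kept area fraction with the retained accuracy relies on a uniform-data assumption (as noted later in this section), and the function name shrink is an assumption of this sketch rather than part of SOLE.

from typing import Tuple

Rect = Tuple[float, float, float, float]      # (x1, y1, x2, y2)

def shrink(region: Rect, area_fraction: float) -> Rect:
    # Shrink a rectangle symmetrically from all directions so that it keeps
    # area_fraction of its original area (0 < area_fraction <= 1).
    x1, y1, x2, y2 = region
    scale = area_fraction ** 0.5              # each side shrinks by sqrt(area_fraction)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * scale
    half_h = (y2 - y1) / 2.0 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Example: under overload, a query that accepts 80% accuracy first loses its cache
# area; if the system is still overloaded, the query area itself is shrunk to 80%.
print(shrink((0.2, 0.2, 0.4, 0.4), 0.8))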

Query load shedding has two main advantages: (1) It is intuitive and simple to

implement where there is no need to maintain any kind of statistical information,

and (2) Insignificant objects are immediately dropped from memory. On the other

side, there are two main disadvantages: (1) The query load shedding process is

expensive, where it scans all stored objects and queries. This exhaustive behavior

results in pause time intervals during which the system can neither produce output nor process

data inputs. (2) Although the query accuracy is guaranteed (assuming uniform data

distribution), there is no guarantee of the amount of reduced memory. Assume the

case that the reduced area from a query Qi lies completely inside another query Qj.

Thus, even though Qi is reduced, we cannot drop tuples from the reduced area because

they are still needed by Qj. As a result, the accuracy of Qi is reduced, yet the amount of

memory is not.

5.5.2 Object Load Shedding

The main idea of object load shedding is to drop objects that have less effect on

the average query accuracy. Thus, the definition of significant objects is adapted to

be those objects that are of interest to at least k queries (i.e., objects with reference

counter greater than or equal to k). Notice that the original definition of significant

objects implicitly assumes that k = 1. A key point in object load shedding is that we

do not perform an exhaustive scan to drop insignificant objects. Instead, insignificant

objects are lazily dropped whenever they get accessed later during the course of


execution. Such lazy behavior completely avoids the pause time intervals in query

load shedding. In contrast to query load shedding, in object load shedding, we

guarantee the reduced memory load.

During the course of execution, we monitor the memory load and de-

crease/increase k accordingly. Once the system stabilizes and returns to its orig-

inal state, we set k = 1 to restore the original execution of SOLE. Determining the

threshold value k is achieved by maintaining a statistical table S that keeps track of

the number of objects that satisfy a certain number of queries. Assuming that we

will never drop an object that has a reference counter greater than N , then S can

be represented as an array of N numbers where the jth entry in S corresponds to

the number of moving objects that are of interest to j queries. Whenever the system

is overloaded, we go through S to get the minimum k that achieves the required

reduced load.
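A sketch of how the threshold k could be derived from the statistical table S; the function name choose_threshold and the interpretation of target_load as a fraction of the currently stored objects are assumptions of this illustration.

from typing import List

def choose_threshold(s: List[int], target_load: float) -> int:
    # s[j-1] is the number of stored objects that are of interest to exactly j
    # queries (j = 1..N). Return the minimum k such that keeping only objects
    # whose reference counter is at least k meets the required reduced load.
    total = sum(s)
    budget = target_load * total
    kept = total                              # objects with reference counter >= 1
    for k in range(1, len(s) + 1):
        if kept <= budget:
            return k
        kept -= s[k - 1]                      # drop objects of interest to exactly k queries
    return len(s) + 1                         # target not reachable within the tracked range

# Example: 100 objects interest 1 query, 30 interest 2, 10 interest 3; reducing the
# load to 30% of the stored objects yields k = 2 (keep 40 of 140 objects).
print(choose_threshold([100, 30, 10], 0.3))   # -> 2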

5.5.3 Load Shedding with Locking

Degenerate cases may severely affect the behavior of load shedding. Consider the

case of a query Q that has only one object P as its answer while P is not of interest

to any other query. By applying object load shedding, P will be dropped because it

is of interest to only one query, Q. Thus, the accuracy of Q drops to zero. To

alleviate such a problem, we use a locking technique. Basically, each query Q has a

threshold n such that if Q has fewer than n objects in its answer set, all of these objects are

locked. Locked objects do not participate in the statistical table S. Once an object

is locked, the corresponding entry in S is updated. Whenever we lazily drop objects

from memory, we make sure that we do not drop any locked object. The concept of

locking can also be generalized to accommodate locking of important objects and/or

queries.
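A minimal sketch of the locking rule; the function names and the answer-set dictionary are illustrative assumptions.

from typing import Dict, List, Set

def lock_small_answers(answers: Dict[int, List[int]], n: int) -> Set[int]:
    # Lock every object belonging to a query whose answer set has fewer than n
    # objects; locked objects do not participate in the statistical table S.
    locked: Set[int] = set()
    for qid, objects in answers.items():
        if len(objects) < n:
            locked.update(objects)
    return locked

def shed(candidates: List[int], locked: Set[int]) -> List[int]:
    # Object load shedding never drops a locked object.
    return [pid for pid in candidates if pid not in locked]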


5.6 Performance Evaluation

In this section, we study the performance of various aspects of SOLE. All the

experiments in this section use the Network-based Generator of Moving Objects [74]

to generate a set of moving objects and moving queries. The input to the generator is

the road map of Oldenburg (a city in Germany). The output of the generator is a set

of moving objects that move on the road network of the given city. Unless mentioned

otherwise, we generate 100K moving objects and 50K queries. The maximum speed

of any object covers 10% of the space along any dimension. All experiments are based

on a real implementation of the SOLE operator inside the engine of a prototype data

stream management system [3] where the SOLE operator is always at the bottom of

the query pipeline. The underlying machine is Intel Pentium IV CPU 2.4GHz with

256MB RAM running Windows XP.

5.6.1 Properties of SOLE

Figure 5.10a gives the performance of the first 25 seconds of executing a moving

query of size 0.5% of the space with a cache area that is 25% of the conservative

cache area. Our performance measure is the query accuracy that is represented as the

percentage of the number of produced tuples to the actual number that should have

been produced if all moving objects are materialized into secondary storage. With

only 25% cache, the query accuracy is almost stable with minor fluctuations that

degrade the accuracy to only 95%. No caching would result in a highly fluctuating

performance while a conservative caching would result in having a single line that

always has 100% accuracy.

Figure 5.10b gives the memory overhead when using 25%, 50%, or 100% (con-

servative) cache sizes. The overhead is computed as a percentage of the original

query memory requirements. Thus a 0% cache does not incur any overhead. On

average a 25% cache results in only 10% overhead over the original query, while the

50% and 100% caches result in 25% and 50% overhead, respectively.


[Figure 5.10. Cache area in SOLE: (a) query accuracy over time with a 25% cache; (b) cache overhead over time for 25%, 50%, and 100% cache sizes.]

[Figure 5.11. Grid size: (a) redundancy and (b) response time as functions of the grid size (number of cells per dimension).]

As a compromise between the cache overhead and the query accuracy, we use a 25% cache in

SOLE in all the following experiments.

Figure 5.11 studies the trade-offs for the number of grid cells in the shared mem-

ory buffer of SOLE for 50K moving queries of various sizes. Increasing the number

of cells in each dimension increases the redundancy that results from replicating the

query entry in all overlapping grid cells. On the other hand, increasing the grid size


[Figure 5.12. Maximum number of supported queries: (a) ratio of the number of queries supported with sharing over the non-sharing case for various query sizes; (b) table of the underlying values (non-sharing, sharing, and their ratio).]

results in a better response time. The response time is defined as the time interval

from the arrival of an object, say P , to either the time that P appears at the output

of SOLE or the time that SOLE decides to discard P . When the grid size increases

over 100, the response time performance degrades. Having a grid of 100 cells in each

dimension results in a total of 10K small-sized grid cells, thus, with each movement

of a moving query Q, we need to register/unregister Q in a large number of grid

cells. As a compromise between redundancy and response time, SOLE uses a grid

of size 30 in each dimension.

5.6.2 Scalability of SOLE

Figure 5.12 compares the performance of the SOLE shared operator as opposed

to dealing with each query as a separate entity (i.e., with no sharing). Figure 5.12a

gives the ratio of the number of supported queries via sharing over the non-sharing

case for various query sizes. Some of the actual values are depicted in the table

in Figure 5.12b. For small query sizes (e.g., 0.01%) with sharing, SOLE supports

more than 60K queries, which is almost 8 times better than the case of non-sharing.


[Figure 5.13. Data size in the query and cache areas: number of stored points (in millions) versus query size, with and without sharing, for (a) the query area and (b) the cache area.]

The performance of sharing increases with the query size where it becomes 20 times

better than non-sharing when the query size is 1% of the space. The main reason for

the performance gain as the query size increases is that sharing benefits from the

overlapped areas of continuous queries. Objects that lie in any overlapped area are

stored only once in the sharing case rather than multiple times in the non-sharing

case. With small query sizes, overlapping of query areas is much less than the case

of large query sizes.

Figures 5.13a and 5.13b give the memory requirements for storing objects in the

query region and the query cache area, respectively, for 1K queries over 100K moving

objects. In Figure 5.13a, for large query sizes (e.g., 1% of the space), a non-shared

execution would need a memory of size 1M objects, while in SOLE, we need, at most,

a memory of size 100K objects. The main reason is that with non-sharing, objects

that are needed by multiple queries are redundantly stored in each query buffer,

while with sharing, each object is stored at most once in the shared memory buffer.

Thus, in terms of the query area, SOLE has a ten times performance advantage

over the non-shared case. Figure 5.13b gives the memory requirement for storing

objects in the cache area. The behavior of the non-sharing case is expected where


[Figure 5.14. Response time in SOLE: (a) response time versus the number of stationary or moving queries; (b) response time versus the query size and the percentage of moving queries.]

the memory requirements increase with the increase in the query size. Surprisingly,

the caching overhead in the case of sharing decreases with the increase in the query

size. The main reason is that with the size increase, the caching area of a certain

query is likely to be part of the actual area of another query. Thus, objects that are

inside this caching area are not considered an overhead, where they are part of the

actual answer of some other query.

5.6.3 Response Time

Figure 5.14a gives the effect of the number of concurrent continuous queries

on the performance of SOLE. The number of queries varies from 5K to 50K. Our

performance measure is the average response time. The response time is defined as

the time interval from the arrival of object P to either the time that P appears at the

output of SOLE or the time that SOLE decides to discard P . We run the experiment

twice; once with only stationary queries, and the second time with only moving

queries. The increase in response time with the number of queries is acceptable

since as we increase the number of queries 10 times (from 5K to 50K), we get only


[Figure 5.15. Load vs. accuracy for query and object load shedding with (a) 1K queries and (b) 25K queries.]

twice the increase in response time in the case of stationary queries (from 11 to 22

msec). The performance of moving queries has only a slight increase over stationary

queries (2 msec in case of 50K queries).

Figure 5.14b gives the effect of varying both the query size and the percentage

of moving queries on the response time of the SOLE operator. The number of

outstanding queries is fixed to 30K. The response time increases with the increase

in both the query size and the percentage of moving queries. However, the SOLE

operator is less sensitive to the percentage of moving queries than to the query size.

Increasing the percentage of moving queries results in a slight increase in response

time. This performance indicates that SOLE can efficiently deal with moving queries

in the same performance as with stationary queries. On the other hand, increasing

the query size from 0.01% to 1% only doubles the response time (from around 12

msec to around 24 msec) for various moving percentages.


5.6.4 Accuracy of Load Shedding

Figures 5.15a and 5.15b compare the performance of query and object load shed-

ding techniques for processing 1K and 25K queries with various sizes, respectively.

Our performance measure is the reduced load needed to achieve a certain query accuracy.

When the system is overloaded, we vary the required accuracy from 0% to 100%.

In degenerate cases, setting the accuracy to 100% requires keeping the whole mem-

ory load (100% load) while setting the accuracy to 0% requires deleting all memory

load. The bold diagonal line in Figure 5.15 represents the required accuracy. It is

“expected” that if we ask for m% accuracy, we will need to keep only m% of the

memory load. Thus, reducing the memory load to be lower than the diagonal line

is considered a gain over the “expected” behavior. The object load shedding always

maintains better performance than that of the query load shedding. For example, in

the case of 1K queries, to achieve an average accuracy of 90%, we need to keep track

of only 85% of the memory load in the case of object load shedding while 97% of

the memory is needed in the case of query load shedding. The performance of both

load shedding techniques is worse with the increase in the number of queries to 25K.

However, the object load shedding still keeps a good performance where it is almost

equal to the “expected” performance. The performance of query load shedding is

dramatically degraded where we need more than 90% of the memory load to achieve

only 20% accuracy.

Figures 5.16a and 5.16b compare the performance of query and object load shed-

ding to achieve an accuracy of 70% and 90%, while varying the number of queries

from 2K to 32K. The object load shedding greatly outperforms the query load shed-

ding and results in a better performance than the “expected” reduced load for all

query sizes. The main reason behind the bad performance of query load shedding is

that in the case of a large number of queries, there are high overlapping areas. Thus,

the reduced area of a certain query is highly likely to overlap other queries. So, even


[Figure 5.16. Reduced load for a certain accuracy: memory load needed by query and object load shedding versus the number of queries for (a) 70% and (b) 90% accuracy.]

though we reduce the query area, we cannot drop any of the tuples that lie in the

reduced area. Such tuples are still of interest to other outstanding queries.

5.6.5 Scalability of Load Shedding

Figure 5.17a gives the ratio of the number of supported queries with query and

object load shedding techniques over the sharing case with no load shedding. All

queries are supported with a minimum accuracy of 90%. Depending on the query

size, query load shedding can support up to 3 times more queries than the case with

no load shedding. This indicates a ratio of up to 60 times better than the non-sharing

cases (refer to the table in Figure 5.12b). On the other hand, object load shedding

has much better scalable performance than that of query load shedding. With object

load shedding, SOLE can support up to 13 times more queries than the case of no load

shedding, which is up to 260 times more than the case of no sharing.

Figure 5.17b gives the performance of the query and object load shedding tech-

niques in terms of maintaining the average query accuracy with the arrival of con-

tinuous queries. The horizontal axis advances with time to represent the arrival


[Figure 5.17. Scalability with load shedding: (a) ratio of supported queries over the no-load-shedding case versus query size; (b) average query accuracy as queries arrive one at a time.]

of each continuous query. With tight memory resources, the memory is consumed

completely with the arrival of about 1200 queries. At this point, the process of load

shedding is triggered. The required memory consumption level is set to 90%. Since

query load shedding immediately drops tuples from memory, the query accuracy is

dropped sharply to 90%. In contrast, in object load shedding, the accuracy degrades

slowly. With the arrival of more queries, query load shedding tries to slowly enhance

its performance. However, the memory consumption is faster than the recovery of

query load shedding. Thus, soon, we will need to drop some more tuples from mem-

ory that will result in less accuracy. The behavior continues with two contradicting

actions: (1) Query load shedding tends to enhance the accuracy by retaining the

original query size, and (2) The arrival of more queries consumes memory resources.

Since the second action is faster than the first one, the performance has a zigzag

behavior that leads to reducing the query accuracy. On the other hand, object load

shedding does not suffer from this drawback. Instead, due to its smart choice of

victim objects, object load shedding always maintains sufficient accuracy with

minimum memory load.


Figure 5.18. Performance of Object Load Shedding.

5.6.6 Object Load Shedding

Figure 5.18 focuses on the performance of object load shedding. The required

reduced load varies from 10% to 90% while the number of queries varies from 1K

to 32K. This experiment shows that object load shedding is scalable and is stable

when increasing the number of queries. For example, when reducing the memory

load to 90%, we consistently get an accuracy around 94% regardless of the number

of queries. Such consistent behavior appears in various reduced loads.

5.7 Summary

In this chapter, we introduced the Scalable On-Line Execution algorithm (SOLE,

for short) for continuous and on-line evaluation of concurrent continuous spatio-

temporal queries over spatio-temporal data streams. SOLE is an in-memory al-

gorithm that utilizes the scarce memory resources efficiently by keeping track of

only those objects that are considered significant. SOLE is a unified framework for

stationary and moving queries that is encapsulated into a physical pipelined query

operator. To cope with intervals of high arrival rates of objects and/or queries, SOLE

utilizes load shedding techniques that aim to support more continuous queries, yet


with an approximate answer. Two load shedding techniques were proposed, namely,

query load shedding and object load shedding. Experimental results based on a real

implementation of SOLE inside a prototype data stream management system show

that SOLE can support up to 20 times more continuous queries than the case of

dealing with each query separately. With object load shedding, SOLE can support

up to 260 times more queries than the case of no sharing.


6 CONCLUSIONS AND FUTURE WORK

The main goal of this dissertation is to extend database management systems and

data stream management systems to efficiently support continuous query processing

in location-aware environments. Location-aware environments are characterized by

the large number of moving objects and the large number of outstanding continuous

spatio-temporal queries. Moving objects continuously send updates of their location

information to a location-aware database server. The answers of continuous queries

are continuously changing with the change of location of moving objects and/or

query regions.

This dissertation fills the gap between spatio-temporal query algorithms and the

practical environment where issues of scalability, practicality, and realization inside

database engines are of great concern.

6.1 Summary of Contributions

This dissertation introduces three main contributions in supporting continuous

query processing in location-aware environments. First, we introduced SINA; a disk-

based framework for scalable execution of multiple concurrent continuous spatio-

temporal queries. SINA is designed with two goals in mind: (1) Scalability in terms

of the number of concurrent continuous spatio-temporal queries, and (2) Incremen-

tal evaluation of continuous spatio-temporal queries. SINA achieves scalability by

employing a shared execution paradigm where the execution of continuous spatio-

temporal queries is abstracted as a spatial join between a set of moving objects

and a set of moving queries. Incremental evaluation is achieved by computing only

the updates of the previously reported answer. We introduce two types of updates,

namely positive and negative updates. Positive or negative updates indicate that a


certain object should be added to or removed from the previously reported answer,

respectively.

The second contribution of this dissertation is that we furnished data stream

management systems with a set of primitive spatio-temporal pipelined query operators

(e.g., range query and k-nearest-neighbor operators). Unlike previous approaches

that focus only on high level implementation of spatio-temporal algorithms, our

approach in providing spatio-temporal query operators provides the following ad-

vantages: (1) Spatio-temporal operators can be combined with other traditional

operators (e.g., distinct, aggregate, and join) to support a wide variety of continuous

spatio-temporal queries. (2) Pushing spatio-temporal operators deep in the query

execution plan reduces the number of tuples in the query pipeline, hence provides

efficient query processing. (3) Flexibility in the query optimizer where multiple can-

didate execution plans can be produced by shuffling the spatio-temporal operators

with other traditional operators.

Our third contribution is that we introduced SOLE; a stream-based scalable

pipelined query operator for evaluating large numbers of concurrent continuous

queries over spatio-temporal data streams. SOLE performs an incremental spatio-temporal join be-

tween two input streams, a stream of spatio-temporal objects and a stream of spatio-

temporal queries. In addition, all the continuous outstanding queries in SOLE share

the same buffer pool. To cope with intervals of high arrival rates of objects and/or

queries, SOLE utilizes a self-tuning approach based on load-shedding where some of

the stored objects are dropped from memory.

6.2 Future Extensions

This dissertation raises a number of research problems related to spatio-temporal

query processing and continuous query processing in general. In this section, we give

an overview of several directions for future research.


6.2.1 Continuous Query Optimization

Once a continuous query is submitted to the server, the server consults its query

optimizer to decide about an optimal plan for the newly submitted query. The

decision for the optimal plan is based on certain cost models and environmental

variables. For example, the selectivity of each operator at the time of receiving the

continuous query. However, since the continuous query stays active at the server

for a long time, some of the environmental variables change over time. Thus, the

initial decision for the optimal query plan becomes invalid and the initially optimal

execution plan becomes suboptimal. As a result, the performance of the continuous

query degrades over time.

To overcome this drawback, we need to furnish the spatio-temporal continuous

query processor with the necessary techniques that continuously monitor the perfor-

mance of the continuous query plan and adopt it to the optimal one. Such adaptive

continuous query optimization needs to maintain some statistics about the behavior

of the continuously received data. In addition, some data mining techniques need to

be explored to detect if there are special patterns in the received data or not.

One approach to achieve continuous query optimization is to employ spatio-

temporal histograms. Unlike traditional histograms that capture a snapshot of the

underlying environment, spatio-temporal histograms take the time dimension into

account. Once a new continuous query is submitted to the server, a spatio-temporal

histogram is constructed using the first few incoming data. The notion of "few"

depends on the query lifetime. For example, if a query is submitted to run for

a whole year, then we can use the data for the first month to build our spatio-

temporal histogram. Then, we employ periodicity mining techniques to discover any

periodicity in the first few incoming data. Based on the spatio-temporal histogram

and the periodicity mining techniques, we can decide to use the query plan P1 if the

query runs in early morning, the query plan P2 if the query runs in rush hours, or

the query plan P3 if the query runs on weekends.


6.2.2 Cost Model for Spatio-temporal Operators

This dissertation introduces new spatio-temporal operators that can be combined

with traditional query operators in a large query plan. To fully integrate the pro-

posed query operators in the query optimizer, we need to develop cost models for

the continuous spatio-temporal operators. Unlike traditional operators that only

produce positive tuples in the query pipeline, spatio-temporal operators can produce

both positive and negative tuples in the query pipeline.

The main challenge in developing cost models for spatio-temporal operators is

that we have to take into account the number of negative tuples that result from these

operators. The number of negative tuples that come out of the spatio-temporal op-

erators depends on many factors that include the query size, the speed of moving

objects, the speed of the moving query, and the pattern of movement of moving ob-

jects. Taking all these factors into account when building a cost model for spatio-temporal operators

is challenging.

6.2.3 Context-aware Query Processing

What we introduced in this dissertation is basically a location-aware continuous

query processor. The main idea is that we had to modify and/or add new function-

alities in several layers of the database engine to support the notion of "location".

By being location-aware, two similar queries submitted to the same database server

would have different answers based on the location of the submitted query. Since

location is considered a context, a larger umbrella of our proposed query processor

is to support any general context, i.e., building a context-aware query engine. Con-

texts other than the location include time, identity, temperature, activity, schedule

agenda, or profile. By being context-aware, two similar queries submitted to the

same database server would have different answers based on the context associated

with each query.


The main challenge in providing a context-aware query processor is that we want

to avoid modifying the database engine with the addition of each new context. In-

stead, we want to build an extensible query engine that is general enough to support

any kind of context. Other challenges include capturing, representing, and pro-

cessing contextual data. To capture context information, generally some additional

sensors and/or programs are required. To transfer the context information to appli-

cations and for different applications to be able to use the same context information,

a common representation format should exist. To be able to obtain the context

information, applications must include some intelligence to process the information

and to deduce the meaning.



VITA

Mohamed Mokbel was born in Alexandria, Egypt, in 1974. His passion for computers started in his early years, when he remembers seeing a lot of punched cards around his home. During high school, he obtained his first personal computer: a 12 MHz CPU with two floppy drives, no hard drive, and a monochrome orange screen (worth $1,300). He used this computer to help his mother (Ph.D. in Oceanography) in drafting her research papers. In 1991, Mohamed joined the Faculty of Engineering at Alexandria University. After a very competitive freshman year, he was ranked among the top students and joined the Computer Science Department. As a reward, his father bought him a 60 MB hard disk (worth $400). Continuing his passion for computer science, Mohamed was one of only two students in his class who obtained a Distinction grade in all five undergraduate years of study. In 1996 and 1999, Mohamed was awarded his B.Sc. and M.Sc. degrees in computer science with the highest degree of honor from the Faculty of Engineering, Alexandria University.

In 2000, Mohamed joined Purdue University as a research assistant with Prof. Walid Aref. Working with such a wonderful advisor, Mohamed published several research papers in different areas of core database systems. In summer 2002, Mohamed interned at Lawrence Livermore National Laboratory (LLNL), one of the top national labs in the USA. In summer 2004, Mohamed interned with the database group at Microsoft Research, one of the top research labs worldwide, where he interacted with many world-class researchers and built his large-scale systems experience. Mohamed's main research interests focus on advancing the state of the art in the design and implementation of database engines to cope with the requirements of emerging applications. Mohamed Mokbel graduated with a Ph.D. degree in computer science from Purdue University in August 2005 and joined the Department of Computer Science at the University of Minnesota–Twin Cities as an assistant professor.

