Clustering-Based Index and Data Broadcasting for Mobile Nearest Neighbor Query Processing

1964 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 9, NO. 4, NOVEMBER 2013


Agustinus Borgy Waluyo, David Taniar, Wenny Rahayu, and Bala Srinivasan

Abstract—This paper introduces a novel clustering-based broadcast scheduling technique for mobile nearest-neighbor (NN) query processing in cyber physical systems. An efficient index structure is presented to guide mobile clients to the NN-objects. The proposed broadcast scheduling and indexing aim to minimize the query access time and energy consumption of clients when retrieving NN-objects through wireless channels. We have experimentally studied the proposed scheme and compared it with state-of-the-art methods. The results suggest the efficacy of the proposed approach in offering minimum latency and energy consumption, which is critically important for resource-constrained wireless environments.

Index Terms—Cyber physical systems (CPSs), mobile broadcast query processing, wireless data dissemination.

I. INTRODUCTION

CYBER physical systems (CPS) represent the frontier of technology for supporting seamless integration between the physical and cyber worlds through wired or wireless communication networks [1], [2]. In the retail industry, location is often a significant factor in the continuity of a business. For example, retail outlets that are not located in a strategic area will generally find it difficult to survive. This is not necessarily due to the goods or services being offered; often passersby (i.e., potential customers) are simply not aware of the existence of the outlets, which therefore lose their competitive advantage. CPS can address this by establishing a link between retail outlets (represented by their geometrical locations) and the cyber world as a means of enhancing customers' awareness and supporting the survival of the industry.

The worldwide number of mobile users has been growing

significantly in the last decade. A recent estimate from Frost and Sullivan¹ indicates that, from a world population of almost seven billion, more than half (about four billion and still counting) have mobile handsets, including 9.7 million mobile handset subscribers in Australia alone in 2011.² It is becoming clear that the possession and use of mobile devices are as important as water and food in modern society. Given the large-scale and growing use of mobile devices, there is an increasing requirement for network connectivity and wireless and data communication for CPS.

Manuscript received February 14, 2012; revised September 08, 2012; accepted October 30, 2012. Date of publication November 29, 2012; date of current version October 14, 2013. This work was supported in part by the Australian Research Council (ARC) Discovery Project DP0987687. Paper no. TII-12-0070.

A. B. Waluyo, D. Taniar, and B. Srinivasan are with the Clayton School of Information Technology, Monash University, Clayton, Victoria 3800, Australia.

W. Rahayu is with the Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TII.2012.2230636

¹http://www.tridium.com/galleries/white_papers/Intelligent%20Enterprise%20Platform.pdf

The proposed approach aims to support the CPS systems

through an advanced data broadcast mechanism, with particular focus on Nearest-Neighbor (NN) query processing. With an NN-query, mobile users will be able to obtain the exact location of the retail outlet in the surrounding area that is closest to their current location. However, mobile devices are subject to inherent resource constraints associated with battery life, bandwidth, and computational resources such as processor speed, memory size, and disk capacity. Thus, to achieve the full potential of CPS in industrial applications, it is imperative to investigate specific techniques to support efficient [3], [4], [21], [22], scalable [5], [7], [8], real-time [6], and trustworthy [25], [26] information delivery.

The location of the mobile user is derived from the global

positioning system (GPS) function in the mobile device. Typically, an NN-query can be performed in two different ways. The first is to send the query to the server and wait for the server to process it and transmit the results based on the user's location. The second is to listen to the broadcast channel while the server periodically disseminates the locations of the objects of interest on the air. The latter method allows mobile clients to simply select or filter the required data over the wireless channel, which incurs significantly less power consumption than transmitting the query to the server [9], [10]. This method, commonly known as data broadcast services, has attracted much interest due to its scalability [10]. For example, the introduction of MSN Direct Service,³ which lets mobile users receive timely information such as airline schedules, local traffic and weather information, or sporting events [5], [16], demonstrates the feasibility of, and industrial interest in, wireless broadcast services. With this broadcast mechanism, the requests from the clients are not known a priori. When clients are disconnected from the network during query processing, they can repeat the process when the connection resumes without having to resend the query to the server, as is the case with traditional client-server applications [7], [10]. Therefore, data broadcast is a very promising method for information delivery in a wireless environment [9], [10].

²http://www.abs.gov.au/ausstats/abs@.nsf/mf/8153.0

³[Online]. Available: http://www.msndirect.com

1551-3203 © 2012 IEEE


The key challenges for NN-queries in mobile data broadcast are: 1) to minimize the latency of the clients in obtaining the desired data and 2) to minimize the energy consumption of the clients while performing query operations. The latter challenge is associated with the design of an effective and efficient indexing method capable of translating from multidimensional to one-dimensional (1-D) space. It should be noted that a broadcast index is always accessed sequentially [9].

In this paper, we propose novel solutions to address these

challenges. Our solutions are threefold: 1) to formulate a cluster-based mechanism for the mobile users in the network as a means to determine an appropriate broadcast schedule; 2) to design a solution-based index that takes into account the cluster parameters and the object locations; and 3) to apply the proposed schemes in multiple-channel environments.

Our metrics to estimate the cost of data access in a mobile

broadcast environment are: 1) access time: the time that elapses from when a request is initiated until all data items of interest are received and 2) tuning time: the amount of time the client spends listening for the desired broadcast data item(s). Tuning time comprises two modes: active and doze. Active mode is when the client listens to the channel for the desired data item, which is costly in terms of power consumption, while doze mode is when the client simply switches to a power-saving mode. Generally, the amount of power consumption is directly related to the tuning time [9].

Definition 1: Query access time is measured from the time

a client starts to probe into the broadcast channel to the time it receives all of the desired data items. Suppose that the database contains a set of data items. For simplicity, we assume that the data packets are of equal size. Each data packet is broadcast continuously, and each packet is associated with a fixed waiting time. The query access time is the interval from when the client first listens to the channel to when it receives the desired data item. If the desired data item is missed, the client has to wait for it to arrive in the next cycle. The calculation of query access time can be represented using

(1)
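As a minimal sketch of Definition 1, assuming equal-size packets that each occupy one fixed slot and a flat cyclic schedule, the access time can be counted in slots. The function name `access_time` and its parameters are illustrative and are not part of the original formulation.

```python
def access_time(probe_slot: int, item_slot: int, cycle_len: int,
                slot_time: float = 1.0) -> float:
    """Time from the start of the slot in which the client begins listening
    until the desired item has been fully received.

    Slots are numbered 1..cycle_len within a broadcast cycle (an assumption
    made for this sketch).
    """
    if not (1 <= probe_slot <= cycle_len and 1 <= item_slot <= cycle_len):
        raise ValueError("slots must lie within the broadcast cycle")
    # Count the slots the client waits through, wrapping into the next
    # cycle when the item has already gone by in the current one.
    slots = (item_slot - probe_slot) % cycle_len + 1
    return slots * slot_time


if __name__ == "__main__":
    print(access_time(probe_slot=1, item_slot=8, cycle_len=16))
    print(access_time(probe_slot=9, item_slot=8, cycle_len=16))
```

The second call illustrates the missed-item case: the client tunes in just after the item has passed and has to wait for the next cycle.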

1) Example 1: The client is interested in data item #8 and begins to probe into the channel from the beginning of the broadcast cycle. Assuming that the data items are broadcast in sequential order, the query access time will be eight packet waiting times.

Definition 2: With the presence of the index broadcast, the client does not have to listen to the broadcast channel continuously, as the index tells the client when the data item of interest will arrive in the channel. Thus, the client is able to switch to sleep mode while waiting for the data item and switch back on when the data is about to arrive. Let us assume that the data packets for data and index items are of similar structure and size. A set of index nodes is broadcast, and each index node points to the relevant data item(s). The index segment is always broadcast ahead of the data items.

Fig. 1. Proposed methodology.

The tuning time can be measured with

(2)

which depends on the number of index levels to traverse.

2) Example 2: Let us assume one index root at the beginning of the broadcast cycle and an index segment of 16 data packets. The client is to obtain data item #8 and starts to probe into the channel at the beginning of the broadcast cycle. The client then needs to stay active only while traversing the index and while receiving data item #8, and can doze in between.

The proposed approach consists of three stages. The first of

the three stages determines the order of the broadcast schedule. The second stage indexes each of the moving objects in the cluster based on the solution-based indexing approach. The third and last stage is concerned with the utilization of multiple wireless channels to broadcast the objects and the associated index. It is expected that the three stages of processing will generate an optimized broadcast model that leads to minimum access time and tuning time for clients. Fig. 1 depicts the proposed methodology.

In this paper, it is assumed the packet size for each data item

and the bandwidth of each channel are uniform. Data packets are dispatched and received sequentially, following the characteristics of the wireless channel [10]. Therefore, all bursts of a transmission are the same size, which allows time-slicing of data bursts and eventually enables the mobile client to estimate when the requested data item will arrive in the channel. The start times of the transmission of each data packet in all available broadcast channels are synchronized, i.e., the transmission of the data packets begins simultaneously and the time interval of each data packet is fixed.

The remainder of this paper is organized as follows.

Section II describes work related to the proposed technique. Section III introduces the proposed cluster-based broadcast scheduling, and the associated indexing scheme is described in Section IV. Section V studies the performance of the proposed approach as compared to the existing methods. The results of the experiments and analysis are also provided in this section. Finally, Section VI concludes the paper.

II. RELATED WORK

Despite the growing trend in location-based mobile services [24], mobile environments suffer from inherent resource constraints such as short battery life, limited storage, frequent disconnection, narrow bandwidth, and asymmetric communication costs and bandwidth [9], [10]. Thus, efficient and timely data retrieval is essential in mobile environments. Deploying data broadcast allows mobile clients to obtain information without the need to transmit a request to the server. Mobile clients simply filter their desired data on the fly [10]. Furthermore, data broadcast is able to serve any number of queries, and the query performance is not affected by the number of mobile clients in a cell [10]. Strategies to broadcast data through wireless channels have been effective in addressing the efficiency and scalability issues [9].

A. Broadcast Scheduling

The difference in bandwidth capacity between downstream and upstream communication has given rise to what is called an Asymmetric Communication Environment. There are two situations that may lead to communication asymmetry [11]. One is due to the capability of physical devices. For example, servers have powerful broadcast transmitters, while mobile clients have little transmission capability. The other is due to the patterns of information flow in the application. For instance, where the number of servers is far smaller than the number of clients, communication is asymmetric because there is not enough capacity to handle simultaneous requests from multiple clients. Therefore, to exploit the asymmetric communication bandwidth through periodic broadcast of data items over the downstream link, an effective broadcast scheduling scheme is important.

The broadcast schedule determines the order in which data items are broadcast. Broadcast Disk is an information system architecture that utilizes multiple disks of different sizes and speeds on the broadcast medium [11]. The broadcast consists of chunks of data from different disks on the same broadcast channel. The chunks of each disk are evenly scattered. However, the chunks of the fast disks are broadcast more frequently than the chunks of the slow disks. With differing broadcast frequencies for different items, hot items can be broadcast more often than others. The server is assumed to have prior knowledge of the clients' access patterns so that it can determine a broadcast strategy that gives priority to the hot items. The speeds of the disks can be adjusted to the configuration of the broadcast, so that the fast disks are, in effect, placed closer to the client and the slow disks farther away.

Other algorithms that can be used to identify the most effective organization of broadcast data items include heuristic techniques [17] and genetic algorithms [12], [13]. The final broadcast program can be distributed over both single and multiple channels. However, none of the above methods was designed for location-based queries in mobile environments.

A closer effort to our work was proposed by Park and Choo [15], who introduced a data sorting concept called the Broadcast-based Location-Dependent Data Delivery Scheme (BBS) to reduce the access time. In this scheme, the server broadcasts the IDs of the data items (e.g., building names) and their coordinates to the mobile clients without incorporating any index segments. The broadcast data items are sorted sequentially according to their locations prior to transmission. This sequential data broadcast can be generated by linearizing the two-dimensional coordinates using a horizontal broadcasting (HB) or vertical broadcasting (VB) approach.

The horizontal broadcast approach follows the horizontal order, that is, from the leftmost coordinate to the rightmost coordinate, whilst the vertical approach goes from the bottom coordinate to the top. To reduce the tuning time, data prefetching and an Object Boundary Circle (OBC) scheme are applied. The OBC scheme defines the weight value of each item based on the spatial distance between the data items. Subsequently, these items are classified into cold (less popular) and hot (more popular) data items, and the broadcast frequencies are adjusted accordingly. In this approach, clients do not have to wait for the next index segment to arrive but can start query processing right away by retrieving the actual data objects. However, this method does not consider the location of the mobile users in the cell, and therefore some portion of the broadcast objects may not be relevant to any client, which will consequently lead to a longer access time.
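The HB and VB linearizations described above can be illustrated with a short sketch. It assumes each data item carries an identifier and a two-dimensional coordinate; the `DataItem` type, the function names, and the tie-breaking on the second coordinate are assumptions made for illustration and are not taken from [15].

```python
from typing import List, NamedTuple


class DataItem(NamedTuple):
    item_id: str
    x: float
    y: float


def horizontal_broadcast(items: List[DataItem]) -> List[DataItem]:
    """Order items from the leftmost to the rightmost x-coordinate."""
    return sorted(items, key=lambda it: (it.x, it.y))


def vertical_broadcast(items: List[DataItem]) -> List[DataItem]:
    """Order items from the bottom to the top y-coordinate."""
    return sorted(items, key=lambda it: (it.y, it.x))


if __name__ == "__main__":
    items = [DataItem("cafe", 5.0, 1.0),
             DataItem("bank", 2.0, 7.0),
             DataItem("shop", 9.0, 4.0)]
    print([it.item_id for it in horizontal_broadcast(items)])
    print([it.item_id for it in vertical_broadcast(items)])
```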

B. Spatial Indexing

Spatial indexing is designed to offer efficient location-based query processing. Existing spatial indexing techniques can be classified into object-based and solution-based indexes [14]. Object-based indexing schemes show their limitations in a broadcast environment, mainly because data items in the broadcast channel can only be accessed linearly. Traditional databases and indexing [19], [20], [23] do not have such a constraint, and, hence, object-based indexing methods are not directly applicable. For example, an R-tree index is given in Fig. 2(a), and its structure is given in Fig. 2(b). Suppose that the query processing algorithm requires clients to visit the index nodes in a particular order, but the server broadcasts the index nodes in a different order. In this case, the client needs to wait for the next cycle to obtain a node that has already been broadcast, and thus the access time will increase significantly. This case is shown in Fig. 2(c). Alternatively, clients can simply access the index nodes sequentially, as depicted in Fig. 2(d). However, in this case, clients might waste energy by accessing unnecessary nodes.

A solution-based index called the D-tree indexing scheme for location-dependent data broadcast environments has been proposed in [14]. The basic concept of the D-tree technique is to index data regions, or valid scopes, of data items based on the divisions that form the boundaries of the valid scopes. The index is then broadcast interleaved with the relevant data items over a broadcast channel. Clients


can tune in to the channel, check the valid scopes against their current location, find out when the correct data item will be broadcast, and retrieve the data item on air. The scheme uses a polygon as the representation of a valid scope. However, this method does not take into account any broadcast scheduling efforts.

TABLE I
COMPARISON OF DATA DISSEMINATION/INDEXING SCHEMES

Fig. 2. R-tree architecture. (a) Minimum bounding. (b) R-tree index structure. (c) Branch-and-bound search. (d) Sequential search.

Our proposed technique relates to clustering-based broadcast scheduling for spatial query processing, supported by a specific index structure, to minimize both the tuning time and the access time of clients through multiple wireless channels. To the best of our knowledge, the combination of these elements for mobile NN-query processing has not been proposed in the past, and this distinguishes it from the state-of-the-art approaches. Table I shows a comparison of the existing data dissemination and indexing methods.

C. Problem Statement

In spatial mobile query processing, the access patterns depend on the location of the mobile user at any given time. It is therefore difficult to obtain prior knowledge about the access patterns of mobile users, as opposed to traditional mobile queries, in which the queries are static and stationary users and wireless users can be assumed to possess similar access patterns. Additionally, attempting to obtain a priori knowledge about access patterns, for example by "piggybacking" information from each mobile client at regular intervals, is not desirable due to the inherent resource constraints in mobile environments. Therefore, it is imperative to determine an effective broadcast scheduling method for spatial query processing (e.g., NN-query) that does not need prior knowledge of access patterns.

Furthermore, the presence of the index in the broadcast cycle helps to minimize the tuning time, as the client may switch to doze mode while waiting for the data to arrive. However, this has a negative impact on the query access time. It has been claimed that the best access time is achieved when no index is broadcast along with the data items [9]; in that case, however, clients have to waste battery power listening to unnecessary data until the desired data arrive. Thus, the challenge is to determine an effective index structure and client mechanism that minimize the client's tuning time without compromising query access time.

III. CLUSTER-BASED BROADCAST MECHANISM

A. Spatial Clustering on Mobile Users

The problem of broadcast scheduling has been shown to be NP-hard [12], [13], [15], and therefore our proposed method aims to optimize the query access time of the majority of mobile clients through cluster-based data broadcast scheduling. Clustering algorithms are principally a means to partition data into a certain number of clusters (i.e., groups, subsets, or categories) based on one or more characteristics. Data that belong to the same cluster are more "alike" than data in different clusters. While there are numerous clustering methods, most fall into one of two categories: partitional clustering or hierarchical clustering. Partitional clustering creates k clusters from n input objects, where k is no larger than n, by following three rules: 1) each cluster should have at least one member; 2) every input object must fall into a cluster; and 3) each input object can belong to one and only one cluster. One of the most common partitional clustering algorithms is k-means. It takes the objects one by one, compares their distances to the centers of the current clusters, and adds each object to the cluster with the lowest distance.

Hierarchical clustering constructs a tree-like architecture, which can be either agglomerative (bottom-up) or divisive (top-down). Agglomerative clustering treats each


Fig. 3. Spatial-based clustering on mobile users.

incoming data object as a single cluster and then successively merges clusters until all objects have been merged into a single remaining cluster. Divisive clustering, on the other hand, puts all of the objects in a single cluster and performs divisions until the desired number of clusters is achieved.

In our approach, mobile users are clustered based on their spatial proximity using the k-means algorithm. We calculate the distance between a new data item and the cluster centers using the Euclidean distance. Assuming that there is already a set of cluster centroids (the centers of the clusters), once a new input object is received, its distance to all centroids is calculated. The data item is added to the cluster with the lowest distance to the centroid. The number of clusters is bounded by a predefined maximum. The centroids of the clusters are continually adjusted to obtain the most appropriate centroid locations and, hence, classification. Consider the example in Fig. 3, which shows a map of the Melbourne CBD area. In the example, our algorithm clusters mobile users into eight different clusters, labelled A to H. These clusters represent the locations of groups of mobile users at a given time, and the result determines our broadcast schedule, which is further described in Section IV. Algorithm 1 presents our clustering scheme.

Algorithm 1: Clustering Scheme

Input: A data set, its data items, a distance threshold, the Euclidean distance between a data item and a centroid, and the maximum number of clusters.

Output: Clustered data set.

Procedure:

1. Place points of the data set as the initial group centroids.
2. Assign each data item to the group that has the closest centroid, based on the Euclidean distance of the respective data item to the centroids.
3. If the distance of a data item to its closest centroid exceeds the distance threshold and the maximum number of clusters has not been reached, create a new centroid.
4. When all items have been assigned to a cluster, recalculate the positions of all the centroids.
5. Repeat Steps 2 to 4 until the centroids no longer change. This produces a separation of the set of data items into groups from which the metric to be minimized can be defined.

End Procedure
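The clustering procedure can be sketched as follows. This is one possible reading of Algorithm 1, not the authors' implementation: it assumes a single initial centroid (the first user location), that a new centroid is created when a user lies farther than the threshold from every existing centroid, and that iteration stops when the centroids no longer move. The names `cluster_users`, `threshold`, `max_clusters`, and `max_iter` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cluster_users(points: List[Point], threshold: float, max_clusters: int,
                  max_iter: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Threshold-bounded k-means-style clustering of user locations."""
    if not points:
        raise ValueError("at least one point is required")
    centroids: List[Point] = [points[0]]  # assumption: seed with first point
    groups: List[List[Point]] = []
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            # Closest centroid by Euclidean distance.
            dist, idx = min((math.dist(p, c), i) for i, c in enumerate(centroids))
            if dist > threshold and len(centroids) < max_clusters:
                # Too far from every cluster: open a new one at this point.
                centroids.append(p)
                groups.append([p])
            else:
                groups[idx].append(p)
        # Recalculate every centroid as the mean of its group; an empty
        # group keeps its previous centroid.
        updated = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


if __name__ == "__main__":
    users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, members = cluster_users(users, threshold=3.0, max_clusters=8)
    print(len(centers), [len(m) for m in members])
```

The number of users in each resulting group is the cluster density that the scheduling step below relies on.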

Fig. 4. (a) Flat versus (b) nonflat broadcast schedule.

B. Data Broadcast Scheduling

The broadcast schedule determines the order in which data items are broadcast. Typically, there are two types of scheduling, namely:
• flat scheduling: the data items are broadcast cyclically, with the same frequency for every item;

• nonflat scheduling: the data items are broadcast with differing frequencies based on defined criteria.

The flat schedule, shown in Fig. 4(a), is known to be most effective when the request probability is the same for every data item. However, when the request probabilities are skewed, the nonflat schedule performs better. This is reasonable, as data items that are broadcast frequently have a shorter access time than those that are broadcast less frequently. To generate the nonflat broadcast schedule, we extend the broadcast disk technique in [11]. In the original method, the differing broadcast frequencies of the disks are determined based on the request patterns of the users. The popular, or hot, items are placed on a faster disk so that they are broadcast more often than the others.
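The trade-off between the two schedule types can be checked numerically with a small sketch. It assumes one item per slot, a client that tunes in at the start of a uniformly random slot, and access time counted in slots up to and including the slot carrying the item; the function name, the toy schedules, and the probabilities are illustrative only.

```python
from typing import Dict, List


def mean_access_time(schedule: List[str], probs: Dict[str, float]) -> float:
    """Expected access time (in slots) over random probe slots and requests."""
    n = len(schedule)
    total = 0.0
    for item, p in probs.items():
        if item not in schedule:
            raise ValueError(f"{item!r} is never broadcast")
        waits = []
        for start in range(n):
            k = 0
            # Walk forward (wrapping around the cycle) to the next copy.
            while schedule[(start + k) % n] != item:
                k += 1
            waits.append(k + 1)
        total += p * sum(waits) / n
    return total


if __name__ == "__main__":
    flat = ["a", "b", "c"]
    nonflat = ["a", "b", "a", "c"]  # item "a" is broadcast twice per cycle
    skewed = {"a": 0.8, "b": 0.1, "c": 0.1}
    uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
    for name, probs in (("skewed", skewed), ("uniform", uniform)):
        print(name, mean_access_time(flat, probs), mean_access_time(nonflat, probs))
```

In this toy setting the nonflat schedule gives the lower mean under the skewed request distribution, while the flat schedule gives the lower mean under the uniform one.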

Algorithm 2: Data Broadcast Scheduling

Input: A data set.

Output: Ordered data set.

Procedure:

1. The data set is organized in order from hottest (i.e., the cluster with the highest density) to coldest (i.e., the cluster with the lowest density).

2. The list of data items is partitioned based on their association with the clusters.

3. The relative frequency of broadcast, rel_freq, for each cluster of data items is determined based on the cluster's density relative to the others. For instance, suppose that there are two data items with different broadcast frequencies, and item 1 is to be broadcast four times for every two times that data item 2 is broadcast. This corresponds to the case in which the cluster density of data item 1 is two times that of data item 2.

4. Each cluster of data items is divided into a number of smaller units called chunks (e.g., C_ij is defined as the jth chunk in the cluster of data item i). The first step is to calculate max_chunks as the Least Common Multiple (LCM) of the relative frequencies. Subsequently, the cluster of data item i is broken into num_chunks(i) = max_chunks / rel_freq(i) chunks.

5. The broadcast program is made by interleaving the chunks of each cluster of data items as described in Fig. 5.
For i = 0 to max_chunks - 1
  For j = 1 to the number of clusters
    Broadcast chunk C_j,k, where k = i mod num_chunks(j)
  End for
End for

End Procedure

Fig. 5. Clustering of mobile users—a walkthrough example.

The server is assumed to have obtained the knowledge of

clients' access patterns so that it can determine a broadcast strategy that gives priority to the hot items. The speeds of the disks can be adjusted to the configuration of the broadcast, so that the fast disks are, in effect, placed closer to the client and the slow disks farther away. In our case, instead of following the request patterns of the clients, we use our clustering approach to determine the broadcast frequency of the data items and the data items that belong to a particular disk. The density of our clusters replaces the access frequency of the original method. Algorithm 2 describes our cluster-based broadcast scheduling method.

Fig. 5 shows an example of the result of applying our clustering method to a particular region. It can be seen from Fig. 5 that there are three clusters altogether. Let us assume that the density of the first cluster is twice that of the second cluster and four times that of the third cluster; likewise, the density of the second cluster is twice that of the third. From this example, we can generate the broadcast program following our disk scheduling method. Fig. 6 depicts the lists of objects associated with the three clusters in Fig. 5. The objects of the first cluster are to be broadcast twice as frequently as those of the second cluster and four times as frequently as those of the third. Thus, the relative frequencies of the three clusters are 4, 2, and 1, respectively. Subsequently, these objects are divided into chunks. The LCM of the relative frequencies is 4, so the three clusters are divided into 1, 2, and 4 chunks, respectively. The final broadcast has four minor cycles; each cycle consists of one chunk of each cluster's objects, and the broadcast has a period of 16 data sets.
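The walkthrough can be reproduced with a short sketch of the chunk-interleaving step, following the broadcast disk construction of [11]. The cluster contents below (one, two, and eight objects) are hypothetical and were chosen only so that the resulting period is 16; the function name `broadcast_program` and the even splitting of a cluster into chunks are likewise assumptions.

```python
from functools import reduce
from math import gcd
from typing import List, Sequence


def broadcast_program(clusters: Sequence[Sequence[str]],
                      rel_freq: Sequence[int]) -> List[str]:
    """Interleave cluster chunks into one major broadcast cycle."""
    # max_chunks is the least common multiple of the relative frequencies.
    max_chunks = reduce(lambda a, b: a * b // gcd(a, b), rel_freq)
    chunked = []
    for items, freq in zip(clusters, rel_freq):
        num_chunks = max_chunks // freq
        size = -(-len(items) // num_chunks)  # ceiling division
        chunked.append([list(items[k * size:(k + 1) * size])
                        for k in range(num_chunks)])
    program: List[str] = []
    for i in range(max_chunks):           # one minor cycle per iteration
        for chunks in chunked:            # one chunk from every cluster
            program.extend(chunks[i % len(chunks)])
    return program


if __name__ == "__main__":
    hot = ["a1"]
    warm = ["b1", "b2"]
    cold = [f"c{k}" for k in range(1, 9)]
    program = broadcast_program([hot, warm, cold], rel_freq=[4, 2, 1])
    print(len(program), program)
```

With these inputs each of the four minor cycles carries one chunk per cluster, and the hottest object appears once in every minor cycle.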

Fig. 6. Cluster-based broadcast program.

IV. SPATIAL BROADCAST INDEXING METHOD

Data broadcast indexing is a means to minimize the tuning time of the client when accessing broadcast data. In the previous section, we discussed our clustering technique for minimizing the query access time of mobile clients. However, without a broadcast directory that allows clients to determine when each data item will arrive in the channel, clients have to listen to the channel from the time they probe into it until the desired data items are received. Although the proposed broadcast scheduling scheme attempts to minimize the query access time, clients still consume a substantial amount of power listening to unnecessary data items. We address this issue here.

Broadcast indexing is a desirable technique in a mobile broadcast environment because it provides accurate information so that a client can tune in at the appropriate time for the required data [15]. In this scheme, some form of directory is broadcast along with the data; clients obtain the index directory from the broadcast and use it in subsequent reads. Based on the information given in the index, the client is able to estimate the exact time of the data's arrival in the channel. As a result, mobile clients are able to conserve energy by switching to power-saving (doze) mode and back to active mode when the data are about to be broadcast. With this selective tuning, the client's energy consumption can be minimized, which in turn prolongs battery life. Therefore, an efficient data broadcast structure is critical in saving mobile clients' battery power.

The key points for designing an efficient index structure include the following: it should incur a small space cost; it should consider the linear broadcast order (i.e., the fewer index nodes to traverse, the better); and it should perform the search with minimum tuning and access time. Spatial indexing schemes are designed to enable users to: 1) locate their current position with respect to the spatial model that is being deployed and 2) obtain the relevant objects associated with the user location following the model and extract the object nearest to the user's current location.

In this paper, we leverage the concept of solution-based index

with valid scope. A valid scope defines the boundary of an area or region within which the query result is considered valid. We use a square as the shape of the valid scope. For simplicity, a geometric location is represented as a two-dimensional coordinate. A square is considered easier to construct since it has the same length and width. Consequently, the information sent to represent the valid scope boundary will be small, and thus mobile clients will have more efficient query processing. The valid scope is broadcast together with the data index. When a client initiates a query, the validity of the broadcast data is checked by comparing the valid scope of the data instances with the client's current location. Our proposed index is designed in two stages as follows.

Fig. 7. Cluster-based indexing—stage 1. (a) Spatial clustering of mobile users. (b) Index structure. (c) Index node.
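The validity check against a square valid scope can be sketched as below. The encoding of a square by its lower-left corner and side length is an assumption made for illustration (it needs only three numbers, in line with the small-boundary argument above); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SquareScope:
    """Axis-aligned square valid scope (hypothetical encoding)."""
    x_min: float
    y_min: float
    side: float

    def contains(self, x: float, y: float) -> bool:
        """True if the client location (x, y) lies within the scope."""
        return (self.x_min <= x <= self.x_min + self.side
                and self.y_min <= y <= self.y_min + self.side)


if __name__ == "__main__":
    scope = SquareScope(x_min=2.0, y_min=3.0, side=4.0)
    print(scope.contains(4.0, 5.0))   # client inside the scope
    print(scope.contains(6.1, 5.0))   # client outside the scope
```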

A. Cluster-Based Indexing (Stage 1)

In this stage, the index is constructed based on the resulting clusters described in Section III. We partition the space containing a set of clusters of mobile clients until each subspace covers one cluster only. The partition of two subspaces is represented by two straight lines forming a square (x-coordinate dimension and y-coordinate dimension). For example, in Fig. 7, we can see a square valid scope, which represents the region of a cluster. There are four clusters involved and hence four regions: region 1 , region 2 , region 3 , and region 4 .

The index node contains a header which indicates the partition style, whether it is an x-dimensional partition or a y-dimensional partition. The partition always starts from the top clusters to the bottom clusters and begins with -dimensional partition. Following the header is a set of partition coordinates and two pointers (left and right pointer). The left and right pointers of the -dimensional partition indicate the upper and lower space, respectively. Meanwhile, the left and right pointers of the -dimensional partition indicate the left and right spaces of the partition. The pointers of the nonleaf nodes direct mobile clients to the relevant child nodes, and the leaf-node pointer directs them to our object-based indexing (stage 2, described below). Table II describes each of the attributes of the index node. It is worth noting that this stage is normally carried out in the server and is not part of the broadcast segments.

TABLE II INDEX NODE: DESCRIPTION
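The stage-1 index node (header, partition coordinate, left and right pointers) and its traversal may be sketched as below. This is a minimal illustration under stated assumptions: the axis naming (a "y" partition separating upper from lower space, an "x" partition separating left from right), the 10 x 10 space, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class IndexNode:
    axis: str                          # header: partition style, "x" or "y"
    split: float                       # partition coordinate
    left: Union["IndexNode", str]      # left pointer: child node or region label
    right: Union["IndexNode", str]     # right pointer: child node or region label


def locate_region(node, x, y):
    """Follow left/right pointers from the root until a region label is reached."""
    while isinstance(node, IndexNode):
        if node.axis == "y":
            # Assumed convention: left pointer = upper space, right pointer = lower space.
            node = node.left if y >= node.split else node.right
        else:
            # Assumed convention: left pointer = left space, right pointer = right space.
            node = node.left if x < node.split else node.right
    return node


# Hypothetical 10 x 10 space holding four cluster regions.
root = IndexNode("y", 5.0,
                 IndexNode("x", 5.0, "region 1", "region 2"),
                 IndexNode("x", 5.0, "region 3", "region 4"))

print(locate_region(root, 2.0, 8.0))  # region 1
print(locate_region(root, 7.0, 2.0))  # region 4
```

In the scheme itself, the leaf pointers would lead to the stage-2 object index of the located cluster rather than to a plain label.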

B. Object-Based Indexing (Stage 2)

The second-stage indexing follows the same concept applied in the first stage for the static objects of interest within the cluster regions. In this stage, the objects within the cluster region are partitioned until each subspace covers one object only. This object represents the NN-object of the querying user within the space.

Similar to the first indexing stage, the partition of two subspaces is represented by two straight lines forming a square (x-coordinate dimension and y-coordinate dimension). Fig. 8(a) illustrates the partition. Fig. 8(b) depicts our index construction based on the space partition in Fig. 8(a). The index node in Fig. 8(c) is the same as that in the first stage except that the two pointers of the leaf nodes now refer to the data pointer. Furthermore, there is an attribute in the index node of the leaf nodes that specifies the wireless channel on which the data segment will be broadcast. Fig. 8(d) shows the attributes of the data packet.

Table III specifies the description of each attribute in the index node. The pointers of the nonleaf nodes point to the relevant child nodes, while the leaf-node pointer goes to the data segment that contains the valid data instance in the particular region. These data pointers contain the times when the data instance(s) will arrive in the channel. In this index scheme, each node has exactly two children. Depending on the regions involved, the final partition of the space may lead to a different index structure.
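A possible way to build such a binary object index is sketched below. The split rule (alternating axes and cutting midway between the two middle objects) is an assumption borrowed from k-d tree construction, since the exact rule is not fixed by the description above. The names `Leaf`, `Node`, `build`, and `lookup`, as well as the sample schedule, are hypothetical. The sketch also assumes that objects have distinct coordinates along each split axis.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    object_id: str
    channel: int          # wireless channel carrying the data segment
    arrival_time: float   # data pointer: time the data instance arrives on that channel


@dataclass
class Node:
    axis: int                      # 0 = x-dimensional partition, 1 = y-dimensional partition
    split: float                   # partition coordinate
    left: Union["Node", Leaf]      # subspace with coordinates below the split
    right: Union["Node", Leaf]     # subspace with coordinates at or above the split


Obj = Tuple[str, float, float]     # (object_id, x, y)


def build(objects: List[Obj], schedule: Dict[str, Tuple[int, float]], depth: int = 0):
    """Partition the objects until each subspace covers exactly one object."""
    if len(objects) == 1:
        oid = objects[0][0]
        channel, arrival = schedule[oid]
        return Leaf(oid, channel, arrival)
    axis = depth % 2                                  # assumed: alternate x and y partitions
    ordered = sorted(objects, key=lambda o: o[1 + axis])
    mid = len(ordered) // 2
    split = (ordered[mid - 1][1 + axis] + ordered[mid][1 + axis]) / 2
    return Node(axis, split,
                build(ordered[:mid], schedule, depth + 1),
                build(ordered[mid:], schedule, depth + 1))


def lookup(node, x: float, y: float) -> Leaf:
    """Return the leaf whose subspace contains the client's location."""
    while isinstance(node, Node):
        node = node.left if (x, y)[node.axis] < node.split else node.right
    return node


objects = [("o1", 1.0, 1.0), ("o2", 8.0, 2.0), ("o3", 3.0, 9.0)]
schedule = {"o1": (2, 10.0), "o2": (2, 20.0), "o3": (2, 30.0)}   # object -> (channel, arrival time)
tree = build(objects, schedule)
leaf = lookup(tree, 7.0, 1.0)
print(leaf.object_id, leaf.channel, leaf.arrival_time)
```

Every internal node produced this way has exactly two children, matching the property stated above. Whether the object in the located subspace is the true nearest neighbor depends on how the partition lines are chosen, which this sketch does not attempt to reproduce.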

V. MULTI-CHANNELING

Our proposed broadcast scheduling and indexing technique will initially be broadcast over two logical asynchronous wireless channels, with possible extension to more than two. The first channel contains the index structure, and the other broadcasts the data segments. The first channel mainly serves mobile users in determining the relevant NN-objects based on their current location. The second channel will disseminate the set of data items or objects following our scheduling approach. In this scheme, the client first tunes into the index channel to obtain the right index that helps to identify the NN-object. Following this, the client tunes into the data channel to obtain the object and its associated information. The process for querying location-dependent broadcast data is described in Algorithm 3.

Fig. 8. Object-based indexing (stage 2). (a) Object-based partitioning. (b) Index structure. (c) Index node. (d) Data packet.

TABLE III INDEX NODE: DESCRIPTION

Algorithm 3: Client processing

Input: Index node, data node.
Output: NN-object
Procedure:
1. Client tunes into the index channel.
2. Client checks the index version.
3. If
4.    Switches to data channel.
5.    Retrieves the NN-object.
6. Else
7.    Waits until root index arrives in the channel.
8.    Traverses the index tree following the region pointer given in the node based on the current location as a reference.
9.    Switches to doze mode between nodes.
10.   Obtains the NN-Object ID.
11.   Tunes into the data channel.
12.   Waits until the relevant NN-object arrives in the channel.
13.   Obtains the NN-Object.
End Procedure

TABLE IV PARAMETERS OF CONCERN
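To make the relation between access time and tuning time concrete, the index-traversal branch of Algorithm 3 (steps 7 to 13) can be simulated roughly as follows. This is a simplified sketch under stated assumptions, not the simulation model used in our experiments: both channels are assumed to repeat cyclically, every index node and data object is assumed to occupy one bucket of equal duration, and the function names and the sample offsets are hypothetical.

```python
import math


def next_arrival(now: float, offset: float, cycle: float) -> float:
    """Earliest time >= now at which an item broadcast at `offset` in a repeating cycle arrives."""
    k = max(0, math.ceil((now - offset) / cycle))
    return offset + k * cycle


def client_query(tune_in, path_offsets, index_cycle, data_offset, data_cycle, bucket=1.0):
    """Return (access_time, tuning_time) for one query.

    path_offsets lists the broadcast offsets of the index nodes on the client's
    path, root first. The client dozes while waiting and is active only while
    reading a bucket.
    """
    now = tune_in
    tuning = 0.0
    for offset in path_offsets:
        now = next_arrival(now, offset, index_cycle)   # doze until the node arrives
        now += bucket                                  # active: read the index node
        tuning += bucket
    now = next_arrival(now, data_offset, data_cycle)   # doze until the object arrives
    now += bucket                                      # active: download the object
    tuning += bucket
    return now - tune_in, tuning


access, tuning = client_query(tune_in=3.0, path_offsets=[0.0, 2.0, 5.0],
                              index_cycle=8.0, data_offset=4.0, data_cycle=20.0)
print(access, tuning)
```

The access time covers the whole interval from tuning in until the object is obtained, whereas the tuning time counts only the buckets actually read. The difference between the two is the time the client can spend in doze mode.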

VI. SIMULATION TEST-BED

Here, we study the performance of the proposed approach in relation to the query access time, the client’s tuning time, and the associated power consumption. This includes a comparison with the most relevant existing methods [14], [15] based on simulated experiments. The simulation is carried out using Planimate, which is a discrete event simulation tool written in C++ [17]. The simulation environment is set to apply random delays for the packet transmission rate. Likewise, the data sets associated with the locations of the mobile users and objects are generated randomly.

The parameters of concern are given in Table IV. The channel bandwidth is set to 64 kbps, which is within the range of the EDGE standard [18]. As an initial proof of concept, this experiment aims to gain insight into the potential of the proposed approach and how it performs compared with the existing methods. We use our example in Fig. 5 as a case study for the experiments. For comparison purposes, we apply the concepts in [14] and [15], which are referred to as Ex1 and Ex2, respectively. As described earlier, the concept in [14] is to broadcast the solution-based index without any clustering involved, whilst in [15] the data are broadcast on the basis of a horizontal or vertical ordering approach without using any index.

A. Query Access Time

This case relates to the query access time comparison between the proposed approach and the existing schemes (Ex1 and Ex2). As can be seen from Fig. 9, the access time of our approach suggests a better performance than that of Ex1. In relation to our approach, the cluster with the highest density experiences the lowest access time, among others. The increase in access time between and is due to the significant difference in density between the two clusters.

When the density gets smaller (e.g., and ), the access time stabilizes. This is aligned with the clustering principle that has been applied in order to satisfy the majority of clients in the clusters. Fig. 9 also depicts the performance of our approach as compared with that of Ex2. Ex2 is rather distinct from the rest as it does not involve any indexing. The result suggests that our approach yields a slightly better performance than Ex2. We can also see that Ex2 outperforms Ex1 in Fig. 9, where all three schemes are compared. In Fig. 10, we can see the average access time among the clients in the three clusters, and our proposed scheme is able to offer a lower access time.

Fig. 9. Access time comparison: proposed versus Ex1 versus Ex2.

Fig. 10. Average access time.

B. Clients Tuning Time/Power Consumption

In this case, we measure the tuning time performance of the proposed scheme and compare the results with the existing schemes (Ex1 and Ex2). From Fig. 11, we can see that the tuning time of our approach is slightly larger for and . This is due to the fact that the proposed approach aims to minimize the tuning time for the largest cluster, which can be seen in , where our approach offers the least tuning time. When it comes to the overall tuning time of all mobile clients in the three clusters, as shown in Fig. 12, our approach provides a lower tuning time than Ex1 but a little higher than Ex1. As evidenced by the results, the dynamic feature of the proposed approach is driven by the density of the clusters. The higher the density of the cluster, the better the performance will be, and this feature is not found in the other schemes.

Tuning time is closely associated with the power consumption of the clients. Thus, we may translate this to power consumption following Imielinski et al.’s [9] equation, in which a device with a Hobbit chip (AT&T) requires about 250 mW during active mode and 50 µW during power-saving mode. For simplicity, in this evaluation, other activities and components that require power are disregarded, and it is assumed that 250 mW relates to the total transmission power consumption. The result is shown in Fig. 13.

Fig. 11. Tuning time comparison: proposed versus Ex1 versus Ex2.

Fig. 12. Average tuning time.

Fig. 13. Power consumption comparison.
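The translation from tuning time to energy can be illustrated with a short calculation. The sketch below assumes the simple two-state model implied above (active while tuning, doze for the remainder of the access time) and takes the doze-mode figure as 50 µW, the value reported for the Hobbit chip in [9]. The function name and the sample times are hypothetical and do not correspond to measured results.

```python
ACTIVE_MW = 250.0    # active-mode power in milliwatts
DOZE_MW = 50.0e-3    # doze-mode power: 50 microwatts expressed in milliwatts (assumed from [9])


def energy_mj(access_time_s: float, tuning_time_s: float) -> float:
    """Energy in millijoules for one query (mW x s = mJ).

    The client is active for tuning_time_s and dozes for the rest of the access time.
    """
    if tuning_time_s > access_time_s:
        raise ValueError("tuning time cannot exceed access time")
    doze_time_s = access_time_s - tuning_time_s
    return ACTIVE_MW * tuning_time_s + DOZE_MW * doze_time_s


# Hypothetical query: 22 s from tuning in to obtaining the object, 4 s of it spent active.
print(energy_mj(22.0, 4.0))
```

Because the active-mode power is several thousand times the doze-mode power, the tuning-time term dominates the total, which is why a reduction in tuning time translates almost directly into energy savings.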

VII. CONCLUDING REMARK

CPSs have recently attracted a great deal of interest from society, due primarily to their ability to support interconnection between the physical and cyber worlds through wired or wireless communication networks [1]. The role of CPS in this paper is to provide a bridge between industry retailers (as the objects) and mobile users through an advanced data broadcast mechanism, with particular attention to NN-query processing.

The efficacy of our proposed approach involving cluster-based index and data broadcasting has been tested by way of experimentation. The experimental study relates to two performance metrics, namely, query access time and client’s tuning time. The results of the experiments show that our proposed approach is capable of offering the lowest access time. With respect to the client’s tuning time, the proposed approach is comparable to the existing techniques. However, with the proposed approach, when the cluster density is high, the tuning time of the majority of mobile clients will be small. Thus, it will minimize the overall power consumption.

Moving forward, we will attempt to address different types of mobile queries including range, k-NN, , and continuous queries and investigate how our clustering approach may be extended to meet the requirements of these queries.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers and the editor for their invaluable comments and feedback.

REFERENCES

[1] O. Hussain, E. Chang, F. K. Hussain, and T. S. Dillon, “A methodology to quantify failure for risk-based decision support system in digital business ecosystems,” Data Knowledge Eng., vol. 63, no. 3, pp. 597–621, 2007.

[2] D. Guinard, V. Trifa, S. Karnouskos, P. Spiess, and D. Savio, “Interacting with the SOA-based internet of things: Discovery, query, selection, and on-demand provisioning of web services,” IEEE Trans. Services Comput., vol. 3, no. 3, pp. 223–235, Jul.–Sep. 2010.

[3] J.-W. Lee, B.-S. Choi, and J.-J. Lee, “Energy-efficient coverage of wireless sensor networks using ant colony optimization with three types of pheromones,” IEEE Trans. Ind. Inf., vol. 7, no. 3, pp. 419–427, Aug. 2011.

[4] J. Ploennigs, V. Vasyutynskyy, and K. Kabitzsch, “Comparative study of energy-efficient sampling approaches for wireless control networks,” IEEE Trans. Ind. Inf., vol. 6, no. 3, pp. 416–424, Aug. 2010.

[5] S. Ilarri, E. Mena, A. Illarramendi, R. Yus, M. Laka, and G. Marcos, “A friendly location-aware system to facilitate the work of technical directors when broadcasting sport events,” Mobile Inf. Syst., vol. 8, no. 1, pp. 17–43, 2012.

[6] P. T. A. Quang and D.-S. Kim, “Enhancing real-time delivery of gradient routing for industrial wireless sensor networks,” IEEE Trans. Ind. Inf., vol. 8, no. 1, pp. 61–68, Feb. 2012.

[7] A. B. Waluyo, W. Rahayu, D. Taniar, and B. Srinivasan, “A novel structure and access mechanism for mobile broadcast data in digital ecosystems,” IEEE Trans. Ind. Electron., vol. 58, no. 6, pp. 2173–2182, Jun. 2011.

[8] A. B. Waluyo, D. Taniar, W. Rahayu, and B. Srinivasan, “Mobile broadcast services with MIMO antennae in 4G wireless networks,” World Wide Web: Internet and Web Inf. Syst., vol. 14, no. 4, pp. 351–375, 2011.

[9] T. Imielinski, S. Viswanathan, and B. R. Badrinath, “Data on air: Organization and access,” IEEE Trans. Knowl. Data Eng., vol. 9, no. 3, pp. 353–371, May–Jun. 1997.

[10] A. B. Waluyo, B. Srinivasan, and D. Taniar, “Research in mobile database query optimization and processing,” Mobile Inf. Syst., vol. 1, no. 4, pp. 225–252, 2005.

[11] S. Acharya, R. Alonso, M. Franklin, and S. Zdonik, “Broadcast disks: Data management for asymmetric communication environments,” in Proc. ACM Sigmod Int. Conf. Manag. Data, 1995, pp. 199–210.

[12] J. L. Huang and M.-S. Chen, “Dependent data broadcasting for unordered queries in a multiple channel mobile environment,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 9, pp. 1143–1156, Sep. 2004.

[13] J. L. Huang, M.-S. Chen, and W.-C. Peng, “Broadcasting dependent data for ordered queries without replication in a multi-channel mobile environment,” in Proc. 19th IEEE Int. Conf. Data Eng. Conf., 2003, pp. 692–694.

[14] J. Xu, B. Zheng, W.-C. Lee, and D. L. Lee, “Energy efficient index for querying location-dependent data in mobile broadcast environments,” in Proc. 19th IEEE Int. Conf. Data Eng. Conf., 2003, pp. 239–250.

[15] K. Park and H. Choo, “Energy-efficient data dissemination schemes for nearest neighbor query processing,” IEEE Trans. Comput., vol. 56, no. 6, pp. 754–768, Jun. 2007.

[16] T. Delot, S. Ilarri, N. Cenerario, and T. Hien, “Event sharing in vehicular networks using geographic vectors and maps,” Mobile Inf. Syst., vol. 7, no. 1, pp. 21–44, 2011.

[17] D. Seeley et al., Planimate-Animated Planning Platforms. Brisbane, Australia: InterDynamics Pty Ltd.

[18] M. Gospic and M. L. Dukic, “Mobile datacom networks,” in Proc. 5th Int. Conf. Telecommun. Modern Satellite, Cable and Broadcasting Service, 2001, vol. 1, pp. 123–130.

[19] D. Taniar and W. Rahayu, “A taxonomy of indexing schemes for parallel database systems,” Distributed and Parallel Databases, vol. 12, no. 1, pp. 73–106, 2002.

[20] D. Taniar and W. Rahayu, “Global parallel index for multi-processors database systems,” Inf. Sci., vol. 165, no. 1–2, pp. 103–127, 2004.

[21] G. Zhao, K. Xuan, W. Rahayu, D. Taniar, M. Safar, M. Gavrilova, and B. Srinivasan, “Voronoi-Based continuous k nearest neighbor search in mobile navigation,” IEEE Trans. Ind. Electron., vol. 58, no. 6, pp. 2247–2257, Jun. 2011.

[22] G. Zhao, K. Xuan, and D. Taniar, “Path kNN query processing in mobile systems,” IEEE Trans. Ind. Electron., vol. 60, no. 3, pp. 1099–1107, Mar. 2013.

[23] T. Haapasalo, I. Jaluta, S. Sippu, and E. Soisalon-Soininen, “On the recovery of R-trees,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 1, pp. 145–157, Jan. 2013.

[24] G. Schoier and G. Borruso, “Spatial data mining for highlighting hotspots in personal navigation routes,” Int. J. Data Warehousing Mining, vol. 8, no. 3, pp. 45–61, 2012.

[25] A. B. Waluyo, D. Taniar, W. Rahayu, A. Aikebaier, M. Takizawa, and B. Srinivasan, “Trustworthy and efficient data broadcast model for P2P interaction in resource-constrained wireless environments,” J. Comput. Syst. Sci., vol. 78, no. 6, pp. 1716–1736, 2012.

[26] A. B. Waluyo, D. Taniar, W. Rahayu, A. Aikebaier, M. Takizawa, and B. Srinivasan, “Mobile Peer-to-Peer data dissemination in wireless Ad-Hoc networks,” Inf. Sci. DOI: 10.1016/j.ins.2012.07.035.

Agustinus Borgy Waluyo received the Ph.D. degree in computer science from Monash University, Clayton, Australia.

Since the completion of his Ph.D. studies in 2006, he has been involved extensively in large research and development projects. He was the lead member of the team and key developer of the projects, which resulted in several working prototypes. His research expertise revolves around the area of resource management and efficiency in wireless sensor networks, mobile query processing and optimization, trust model and management, and data processing and analysis.


David Taniar received the Ph.D. degree in computer science from Victoria University of Technology, Australia.

His current research interests include mobile/spatial databases, parallel/grid databases, and XML databases. He is a founding Editor-in-Chief of Mobile Information Systems. He is currently an Associate Professor with the Faculty of Information Technology, Monash University, Clayton, Australia.

Wenny Rahayu received the Ph.D. degree in computer science from La Trobe University, Melbourne, Australia.

She is currently the Head of the Department of Computer Science and Computer Engineering at La Trobe University, Melbourne, Australia. The main focus of her research is the integration and consolidation of heterogeneous data and systems to support a collaborative environment within a highly data-rich environment. To date, she has been the principal investigator or one of the chief investigators of two ARC (Australian Research Council) Industry Linkages, Large Industry collaboration grants, International grants (Japan JSPS and Australia Indonesia AIGRP), International Standard Bodies such as Open Geospatial Consortium (OGC), VPAC (Victoria Partnership for Advanced Computing), and the AAS (Australia Academy of Science). In the last ten years, she has authored or coauthored two books and more than 100 research papers in international journals and proceedings and edited three books. She has supervised to completion ten Ph.D. students, around 30 Honours, and ten M.S. students in database management areas.

Bala Srinivasan received the Ph.D. degree in computer science from the Indian Institute of Technology, Kanpur, India.

He is a Professor of information technology with Monash University, Clayton, Australia. He has more than 30 years of experience in academia, industry, and research organizations. He has authored and jointly edited six technical books and more than 300 refereed publications in international journals and conferences in the areas of multimedia databases, data communications, data mining, and distributed systems.
