A Scalable and Mobility-Resilient Data Search System for Large-Scale Mobile Wireless Networks

Haiying Shen, Senior Member, IEEE, Ze Li, Student Member, IEEE, and Kang Chen

Abstract—This paper addresses the data search problem in large-scale highly mobile and dense wireless networks. Current wireless network data search systems are not suitable for large-scale highly mobile and dense wireless networks. This paper presents a scalable and mobility-resilient LOcality-based distRibuted Data search system (LORD) for large-scale wireless networks with high mobility and density. Taking advantage of the high density, rather than mapping data to a location point, LORD maps file metadata to a geographical region and stores it in multiple nodes in the region, thus enhancing mobility-resilience. LORD has a novel region-based geographic data routing protocol that does not rely on flooding or GPSs for data publishing and querying, and a coloring-based partial replication algorithm to reduce data replicas in a region while maintaining the querying efficiency. LORD also works for unbalanced wireless networks with sparse regions. Simulation results show the superior performance of LORD compared to representative data search systems in terms of scalability, overhead, and mobility resilience in a highly dense and mobile network. The results also show the high scalability and mobility-resilience of LORD in an unbalanced wireless network with sparse regions, and the effectiveness of its coloring-based partial replication algorithm.

Index Terms—Data search, wireless networks, geographic routing, topological routing, distributed hash tables


1 INTRODUCTION

Recent technical advancements have enabled the development of a large-scale wireless network (e.g., wireless sensor network (WSN) and mobile ad hoc network (MANET)) consisting of a vast number of mobile nodes dispersed over a wide area. An important problem in such wireless networks is data search. This paper particularly addresses the data search problem in large-scale wireless networks with high mobility and density.

WSNs are used in various applications such as military sensing and tracking, habitat monitoring, health monitoring, environmental contaminant detection, and wildfire tracking. In a WSN, sensors coordinate to perform distributed sensing of environmental phenomena, and collect and share widely scattered distributed data in a cooperative manner, which makes data search critical to WSNs. Also, considering the dramatic growth of mobile devices (e.g., laptops, smartphones, and communication-enabled vehicles) and the restrictions of wired communication, mobile data search applications that enable ubiquitous data access anywhere will proliferate in the near future. It is envisioned that there will be omnipresent wireless devices, and some urban areas will be densely covered by ubiquitous mobile nodes (e.g., WiFi enabled cabs in the Manhattan Area) [1]. Therefore, an efficient data search system for a highly mobile and dense wireless network is needed. However, current wireless network data search systems are not suitable for such an environment.

The flooding and local-broadcasting methods in WSNs [2], [3], [4], [5], [6], [7] and in MANETs [8] are not energy-efficient due to a tremendously high volume of transmitted messages. Local-broadcasting also cannot guarantee data discovery. In the topological routing based method in MANETs [9], [10], [11], nodes advertise their available data, build content tables for received advertisements, and forward data requests to the nodes with high probability of possessing the data. However, this method generates high overhead for advertising and table maintenance. Also, it cannot guarantee data discovery because of possible expired routes in the content tables owing to node mobility.

Geographic routing based data search systems [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22] have been proposed for WSNs for high scalability. Specifically, a file is mapped to a geographic location based on the distributed hash table (DHT) data mapping policy (a file is mapped to a location whose ID is closest to the file’s ID), and stored in a node closest to the geographic location using geographic routing [23], [24], [25], [26]. To search a file, a requester calculates the mapped geographic location and uses geographic routing to send the query to the location. However, in a highly mobile network, a file needs to be frequently transferred to its new mapped file holder, which produces high overhead. Also, a delayed data mapping update may lead to a querying failure. In addition, geographic routing needs exact geographic node localization (e.g., (x, y)) using GPS [20], [12] or virtual coordinates [17], [27], [28], which exacerbates overhead burden and increases energy consumption. GPS consumes nodes’ precious energy resources and may not provide location information in some situations (e.g., indoors) [27]. The virtual coordinate methods need periodic coordinator updates, which produces high overhead in a highly mobile network.

The authors are with the Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 USA. E-mail: {shenh, zel, kangc}@clemson.edu.

Manuscript received 13 Jan. 2013; revised 24 June 2013; accepted 25 June 2013. Date of publication 15 July 2013; date of current version 21 Mar. 2014. Recommended for acceptance by D. Manivannan. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPDS.2013.174

1045-9219 © 2013 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 5, MAY 2014

To build a scalable and mobility-resilient distributed data search system for large-scale highly mobile and dense wireless networks, we propose a LOcality-based distRibuted Data search system (LORD). LORD divides the entire wireless network area into a number of geographic regions. The metadata¹ of a file is mapped to a region using the DHT data mapping policy, and it is stored in all or a subset of the nodes in the region, thus enhancing mobility resilience. A node needs to update data mapping only when it moves across regions, thus reducing maintenance overhead.

LORD has a novel Region-based Geographic Routing (RGR) protocol for data publishing and querying. RGR only requires nodes to know their located regions and region angle information (i.e., Angle Of Arrival (AOA) [29]) using low-power antenna array or ultrasound receivers rather than exact geographic location using GPSs, thus greatly reducing overhead and energy of geographic routing algorithms [29], [30]. After a node retrieves its queried metadata, it requests different file segments from physically closest file holders. It determines the sizes of segments requested from different file holders based on their distances to minimize the file fetching latency. After receiving the file, the requester publishes the metadata of the file to its mapped regions. We summarize the contributions of this paper as follows:

1. An efficient and congestion-resilient region-based data publishing and querying protocol, which generates low overhead for a highly mobile network.

2. An energy-efficient and mobility-resilient Region-based Geographic Routing protocol (RGR), which only requires region angle information based on low-power devices [29].

3. A parallel file fetching algorithm, which determines several physically close file servers to send different file segments to minimize file retrieval latency.

4. A coloring-based partial replication algorithm that replicates metadata to a subset of nodes rather than all nodes in a region while maintaining search efficiency.

5. Extensive experimental results demonstrate the superior performance of LORD in comparison to previous data search systems and the efficiency of LORD components.

The preliminary version of this paper [31] introduces the region-based data publishing, querying protocol and RGR. This version additionally proposes a method to maintain LORD’s efficiency in an unbalanced mobile network with sparse regions and the coloring-based partial replication algorithm. The remainder of this paper is organized as follows. Section 2 presents the design of LORD. Section 3 shows the performance of LORD with comparison to previous representative data search systems in wireless networks. Section 4 concludes the paper with remarks on our plans for future work. The supplemental file presents an extension of LORD for unbalanced networks, additional experimental results and a concise review of representative data search systems in wireless networks.

2 THE LORD DATA SEARCH SYSTEM

In LORD, the entire geographical area is divided into a number of physical regions (Section 2.1). Each file’s metadata is published to its mapped regions, and requests for the metadata are forwarded to its mapped regions (Section 2.2). Metadata publishing and querying rely on the region-based geographic routing (RGR) that sends a message destined to $(x, y)$ to the region with location $(x, y)$ (Section 2.3). After retrieving the metadata, a file requester uses the parallel file fetching algorithm (Section 2.4) to fetch the file. A back-tracking algorithm ensures that a requester still receives its requested metadata or data even if it moves to another region (Section 2.5).

Fig. 1 shows the file querying process in LORD. When file host A in region 8 publishes its file’s metadata, it calculates the metadata’s ID, say $(x, y)$, and then uses RGR to send it to destination $(x, y)$. The metadata will arrive at region 3, which is located at $(x, y)$, and it is then broadcast to all (or a subset of) nodes in the region (Step 1). Later on, when node B in region 10 wishes to query the file, using the same process as for data publishing, it first uses RGR to send its metadata query to region 3 (Step 2). If a node’s queuing buffer for pending messages is under $k\%$ full ($k$ is a threshold), the node is lightly loaded; otherwise it is overloaded. A lightly loaded node in region 3 sends the queried metadata to node B by using RGR with the location of node B’s region as the destination (Step 3). Node B then chooses multiple physically close file holders and sends queries for file segments (with determined segment sizes) to them (Step 4). Each file holder sends back the requested file segment to node B by using RGR with the location of node B’s region as the destination (Step 5).

We also propose a coloring-based partial replication algorithm that reduces metadata replicas while maintaining the data querying efficiency (Section 2.6). The density of nodes in the area may change and the network may become an unbalanced wireless network with some dense regions and sparse regions. We will present an extension on LORD to maintain the efficiency of LORD in an unbalanced wireless network (Section 5 in Appendix). Table 1 lists the main notations used in this paper for easy reference.

1. The metadata of a file records the file’s keywords, its mapped region, and the file holder and its region.

Fig. 1. Steps of file querying in LORD.

SHEN ET AL.: DATA SEARCH SYSTEM FOR LARGE-SCALE MOBILE WIRELESS NETWORKS 1125


2.1 Area Partition

We consider a highly mobile and dense wireless network whose nodes are spread over an area and are independently and identically distributed (i.i.d.). LORD is proposed for a wireless network with a number of landmarks. Considering the promising ubiquitous computing environment of the future, such static landmarks (e.g., base stations, WiFi access points) will not be difficult to find. Once the landmarks are determined, LORD divides the entire area into a number of regions. A region is the neighboring zone within the transmission range of a landmark, centered on the landmark. Each region is identified by an assigned integer ID. To make LORD adaptive to the general case, the regions can be any shape. We design LORD based on irregular shapes (convex polygons), though a regular shape (a special case of an irregular shape) would make the design much easier. Fig. 1 shows the divided regions and their centered landmarks in a wireless network. We use $(x_i, y_i)$ $(1 \le i \le v)$ to denote the vertices of a convex polygon with $v$ vertices, where $x_i$ and $x_{i+1}$ are adjacent and $x_1$ is adjacent to $x_v$. Two vertices are adjacent if the line connecting them is a border of the region. Assume $x_1$ and $x_s$ $(1 < s < v)$ are the minimal and maximal x-axis values, respectively, which means $x_1 < x_2 < \cdots < x_s$ and $x_s > x_{s+1} > \cdots > x_v > x_1$. Therefore, each region can be represented as a series of border lines:

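$$
R = \begin{cases}
y = k_1 \cdot x + b_1 & (x_1 \le x \le x_2) \\
y = k_2 \cdot x + b_2 & (x_2 \le x \le x_3) \\
\cdots \\
y = k_{s-1} \cdot x + b_{s-1} & (x_{s-1} \le x \le x_s) \\
y = k_s \cdot x + b_s & (x_{s+1} \le x \le x_s) \\
\cdots \\
y = k_{v-1} \cdot x + b_{v-1} & (x_v \le x \le x_{v-1}) \\
y = k_v \cdot x + b_v & (x_1 \le x \le x_v)
\end{cases}
\tag{1}
$$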

Similar to the maps in the GPS system, the information of geographic boundaries of regions (called region map) is configured into a node when it joins the system initially. We assume that each node has the ability to sense the direction and signal strength from a landmark [29]. The landmarks periodically emit identification beacon signals [32], [29], so that each node can identify the region in which it is located by a signal received from a landmark in its region. Like the GPS-based geographic routing algorithm that requires each node to have a GPS, a device with at least the AOA capacity is a requirement for each node in LORD. AOA devices consume much lower energy than GPS [29], [30], thus greatly reducing the deployment cost of a data search system.
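A node that holds the region map needs a way to decide which region contains a given coordinate. The following is a minimal sketch, assuming the region map is stored as a dictionary from region ID to an ordered list of polygon vertices; the names `contains` and `locate_region` are hypothetical, and the cross-product test is a standard convex-polygon membership check that stands in for evaluating the border lines of (1) directly.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def contains(polygon: List[Point], p: Point) -> bool:
    """Return True if p is inside or on the border of a convex polygon.

    Assumption: vertices are listed in order around the border
    (clockwise or counter-clockwise).
    """
    sign = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # The sign of the cross product tells on which side of edge i p lies.
        cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
        if cross == 0:
            continue  # p is on the line through this edge; other edges decide
        side = 1 if cross > 0 else -1
        if sign == 0:
            sign = side
        elif side != sign:
            return False  # p is on different sides of two edges: outside
    return True


def locate_region(region_map: Dict[int, List[Point]], p: Point) -> Optional[int]:
    """Return the ID of the first region (lowest ID) containing p, or None."""
    for region_id, polygon in sorted(region_map.items()):
        if contains(polygon, p):
            return region_id
    return None


# Two hypothetical square regions sharing a border.
region_map = {
    3: [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    8: [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
```

A point on a shared border belongs to both polygons; this sketch resolves the tie by returning the lowest region ID, which is an arbitrary choice.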

The number of landmarks (i.e., regions) $z$ can be determined based on the transmission range $r$ of the nodes and the size $s$ of the entire area. For example, if we want the basic region to be covered by the transmission range of each node, the diameter $d$ of each basic region should satisfy $d < r$ and $\frac{s}{z} < \pi d^2$; that is, $z > \frac{s}{\pi d^2}$. In this paper, we only focus on a certain area, such as a campus, a habitat monitoring area or a wildfire tracking area. We will study the case in which nodes expand to cover new territories in our future work.
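As a worked illustration of this sizing rule, the sketch below computes the smallest integer number of regions that satisfies the bound, reading it as $z > s / (\pi d^2)$ with $d < r$. The function name `min_regions` and the example figures (a 1 km² area, 250 m range, 200 m region diameter) are assumptions chosen for illustration and do not come from the paper.

```python
import math


def min_regions(area: float, r: float, d: float) -> int:
    """Smallest integer z with z > area / (pi * d**2), requiring d < r.

    area: size s of the whole deployment area (square meters)
    r:    node transmission range (meters)
    d:    diameter of a basic region (meters)
    """
    if not 0 < d < r:
        raise ValueError("region diameter d must satisfy 0 < d < r")
    # floor(...) + 1 gives a strict inequality even when the ratio is an integer.
    return math.floor(area / (math.pi * d ** 2)) + 1


# Hypothetical example: 1 km^2 area, 250 m range, 200 m region diameter.
z = min_regions(1_000_000.0, 250.0, 200.0)
```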

2.2 Region-Based Data Publishing and Querying

2.2.1 Metadata Publishing and Querying

Locality sensitive hash function (LSH) [33] hashes two similar keyword groups to close values with high probability. LORD uses LSH to hash a file to store the metadata of similar files into the same region for similarity search. A file’s keywords can be its file name or the keywords retrieved using information retrieval algorithms [34]. The number of LSH hash values of a file can be one or more than one based on the settings of LSH. We use $m'$ to denote the number of hash values of a file produced by LSH.

When a file host publishes the metadata of its file, it first hashes the keywords of the file, denoted as $k$, using two different LSH functions, $H_1$ and $H_2$. The resultant hash values $(H_1^i(k), H_2^i(k))$ $(i = 1, 2, \ldots, m')$ are normalized to virtual coordinates $(x_k^i, y_k^i)$ $(i = 1, 2, \ldots, m')$, which are used as the IDs of the file. The metadata is mapped to a region that contains the virtual coordinates $(x_k^i, y_k^i)$ in the region map. Then, the data host publishes the metadata to the mapped regions using the RGR routing protocol. The node in a destination region that first receives the metadata broadcasts it to all other nodes in the region.

When a mobile node wishes to query a file, it calculates the coordinates $(x_k^i, y_k^i)$ $(i = 1, 2, \ldots, m')$ of the file’s metadata and uses RGR to send requests with $(x_k^i, y_k^i)$ as the destinations. The requests are forwarded to the destination regions, which are exactly the regions that hold the metadata of the queried file. If the first query receiver in the destination region is lightly loaded, it responds to the requester. Otherwise, the query is forwarded to a randomly selected neighbor continuously until reaching a lightly loaded node in the region, which will respond to the requester.
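The in-region forwarding rule can be sketched as a random walk that stops at the first lightly loaded node. The sketch assumes a fixed buffer capacity per node and uses the $k\%$ threshold from the overview of Fig. 1; the function names, the dictionary-based neighbor table, and the `max_hops` safety bound are hypothetical additions (the text describes forwarding as continuing until a lightly loaded node is reached).

```python
import random
from typing import Dict, List, Optional


def is_lightly_loaded(pending: int, capacity: int, k: float) -> bool:
    """A node is lightly loaded if its queue is under k% of its capacity."""
    return pending * 100.0 / capacity < k


def find_responder(first: str,
                   neighbors: Dict[str, List[str]],
                   pending: Dict[str, int],
                   capacity: int,
                   k: float,
                   rng: random.Random,
                   max_hops: int = 64) -> Optional[str]:
    """Forward a query to random in-region neighbors until a lightly
    loaded node is found; return None if the hop budget runs out."""
    node = first
    for _ in range(max_hops + 1):
        if is_lightly_loaded(pending[node], capacity, k):
            return node
        if not neighbors[node]:
            return None  # isolated overloaded node: nowhere to forward
        node = rng.choice(neighbors[node])
    return None


# Hypothetical two-node region: "a" is overloaded, "b" is not.
neighbors = {"a": ["b"], "b": ["a"]}
pending = {"a": 9, "b": 1}
responder = find_responder("a", neighbors, pending, capacity=10, k=50.0,
                           rng=random.Random(0))
```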

The requester can specify a similarity threshold. The similarity between the keywords of a file $k$ and the queried keywords $k_q$ is calculated by $\frac{|k_q \cap k|}{|k_q \cup k|}$, where $|k|$ denotes the number of keywords in $k$. The query receiver responds to the requester with the metadata that has a similarity to the queried keywords greater than the required threshold.
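The similarity measure above is the Jaccard coefficient of the two keyword sets. A minimal sketch of the filtering step follows, assuming each stored metadata record is a dictionary with a `"keywords"` entry; that record layout and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def similarity(kq: Iterable[str], k: Iterable[str]) -> float:
    """Jaccard similarity |kq ∩ k| / |kq ∪ k| of two keyword groups."""
    a, b = set(kq), set(k)
    if not a and not b:
        return 0.0  # convention chosen here for two empty groups
    return len(a & b) / len(a | b)


def matching_metadata(store: List[Dict], query: Iterable[str],
                      threshold: float) -> List[Dict]:
    """Return the records whose similarity to the query exceeds the threshold."""
    q = set(query)
    return [m for m in store if similarity(q, m["keywords"]) > threshold]


# Hypothetical metadata held by a node in the mapped region.
store = [
    {"file": "f1", "keywords": ["wildfire", "map", "2013"]},
    {"file": "f2", "keywords": ["habitat", "sensor"]},
]
hits = matching_metadata(store, ["wildfire", "map"], threshold=0.5)
```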

TABLE 1. Notations and Definition

2.2.2 Data Mapping Update and Location Update

In a mobile network, it is important to maintain the mapping between data and regions. LORD uses a reactive data mapping update scheme, in which a node conducts updates only when it moves from one region to another region. During the movement, when a node notices that it moves to a different region based on the signal from the landmarks, it drops the old region’s metadata and acquires all metadata in the new region from its new neighbor. It also conducts location updates by sending messages to the mapped regions of its file’s metadata to update its current location in the metadata. In this way, when the metadata of a file is retrieved, the current locations of the file’s hosts can always be acquired.
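The reactive update can be sketched as a handler that runs whenever a landmark beacon is received. This is an illustration under assumptions: the `MobileNode` class, its field names, and the tuple format of a location-update message (destination region, file name, new region of the holder) are hypothetical, and actual message delivery through RGR is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (destination region, file name, holder's new region)
LocationUpdate = Tuple[int, str, int]


@dataclass
class MobileNode:
    node_id: str
    region: int
    # Metadata replicas held on behalf of the node's current region.
    region_metadata: Dict[str, dict] = field(default_factory=dict)
    # Files this node hosts, with the regions their metadata is mapped to.
    own_files: Dict[str, List[int]] = field(default_factory=dict)

    def on_beacon(self, beacon_region: int,
                  new_neighbor: "MobileNode") -> List[LocationUpdate]:
        """React to a landmark beacon; do nothing unless the region changed."""
        if beacon_region == self.region:
            return []
        # Drop the old region's metadata and copy the new region's metadata.
        self.region_metadata = dict(new_neighbor.region_metadata)
        self.region = beacon_region
        # One location update per mapped region of each hosted file.
        return [(mapped, name, self.region)
                for name, mapped_regions in sorted(self.own_files.items())
                for mapped in mapped_regions]


neighbor = MobileNode("n2", 9, {"g": {"holder_region": 4}})
node = MobileNode("n1", 8, {"old": {}}, {"f": [3, 7]})
updates = node.on_beacon(9, neighbor)
```

No messages are produced while the node stays inside one region, which is the source of the overhead saving the text describes.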

The geographic routing based (i.e., location-based) data search systems use a proactive mapping update scheme, in which a file holder periodically sends out messages inquiring the current closest node to the file’s mapped location, and transfers the file to such a node if it is found. GLS [20] updates node locations to certain servers when a node’s movement distance reaches a threshold for location tracking. Compared to these systems, LORD has three distinctive features: (1) rather than relying on proactive mapping updates, LORD uses region-based reactive mapping updates to ensure querying success, significantly reducing update overhead; (2) unlike location-based data search systems that store a file in a single node closest to the file’s mapped location, LORD maps a file to many nodes in a region, thus providing high resilience to congestion and mobility; and (3) unlike other works that only offer exact data searching, LORD’s LSH-based data publishing and querying enables similarity data searching in mobile networks.

To guarantee successful data querying, nodes conduct data mapping updates when moving. The average data mapping update frequency equals the average frequency a node moves across different regions in LORD (denoted by $f_{LORD}$), and it equals the average frequency a node moves away from a location and causes another node to become the closest node to the location in a location-based data search system (denoted by $f_{GHT}$). Obviously, $f_{GHT} > f_{LORD}$. Therefore, GHT generates higher mapping update overhead than LORD.

LORD is also characterized by metadata storage instead of data storage. Most current wireless data search systems use data storage. Admittedly, data storage methods avoid an additional file querying step after locating the file hosts in the metadata storage methods. However, in a highly mobile and dense wireless network, metadata storage has the advantage in terms of overhead for data mapping updates determined by message size. High node mobility leads to frequent mapped data (metadata or files) transfer. Metadata storage only produces additional operations of small-size metadata querying and replying.

2.3 Region-Based Geographic Routing

When node $n_i$ searches for a file, it first sends out a metadata query to receive the metadata of the file. Second, $n_i$ sends out a file query, and then the queried file will be sent back to $n_i$. Finally, $n_i$ publishes the metadata for its received file. These different types of messages (including metadata query, metadata for publishing, a metadata reply, a file query and a file) need a routing algorithm to forward them to their destinations. To achieve scalable, mobility-resilient, and energy-efficient locality-aware routing, we propose RGR. RGR only needs AOA [29] information and conducts routing from a source region to a destination region based on the inter-region direction.

A node senses the directions of its neighbor nodes and always chooses the next hop within a certain direction range towards the destination region, to route the message along a comparatively shorter path. Given a pair of source region $R_i$ and destination region $R_j$, $R_i$’s left-side angle range to $R_j$ (denoted by LR) is defined as the angle between the lines from the leftmost vertex of $R_i$ to the leftmost and rightmost vertices of $R_j$, and $R_i$’s right-side angle range to $R_j$ (denoted by RR) is defined as the angle between the lines from the rightmost vertex of $R_i$ to the leftmost and rightmost vertices of $R_j$. For example, in Fig. 2, the source region 10’s LR and RR to the destination region 3 are $\angle DAC$ and $\angle DBC$, respectively. These two angle ranges serve as the tight bounds of message transmission direction towards the destination region. A landmark is always located in the center of a region. For example, assume a node in region 10 intends to forward a message to region 3. If the transmitting node is located on the right side of the landmark, its message can reach the destination region faster if it is forwarded within $\angle DBC$. If the node is located on the left side of the landmark, its message can reach the destination region faster if it is forwarded within $\angle DAC$. When a node exchanges “hello” messages with its neighbors, it can sense the transmission signal strength from neighbor nodes and identify the farthest node.

Therefore, in RGR, when a node initiates a message or receives a message, it calculates its region’s LR and RR to the destination region, and then decides the next hop for routing. If the node stays on the left side of its region landmark, it chooses the farthest node within LR as the next hop node. If the node stays on the right side of its region landmark, it chooses the farthest node within RR as the next hop node. The forwarding process is repeated until the message is forwarded to a node in the destination region. Thus, RGR can always forward messages quickly towards the destination region.
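The next-hop rule can be sketched as follows. The sketch assumes that AOA sensing yields a bearing in degrees for each neighbor and that signal strength has already been converted to an estimated distance; those inputs, the tuple layout, and all function names are hypothetical. Angle ranges are computed from polygon vertices with `atan2`, which is one possible reading of the LR and RR definitions.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Neighbor = Tuple[str, float, float]  # (node ID, bearing in degrees, distance)


def bearing(src: Point, dst: Point) -> float:
    """Direction from src to dst in degrees, normalized to [0, 360)."""
    return math.degrees(math.atan2(dst[1] - src[1], dst[0] - src[0])) % 360.0


def angle_range(vertex: Point, dest_polygon: List[Point]) -> Tuple[float, float]:
    """Bearings from a source-region vertex to the leftmost and rightmost
    vertices (smallest and largest x) of the destination region."""
    return bearing(vertex, min(dest_polygon)), bearing(vertex, max(dest_polygon))


def within(angle: float, bound_a: float, bound_b: float) -> bool:
    """True if angle lies on the shorter arc between the two bounds."""
    span = (bound_b - bound_a) % 360.0
    if span > 180.0:
        bound_a, span = bound_b, 360.0 - span
    return (angle - bound_a) % 360.0 <= span


def next_hop(neighbors: List[Neighbor], lo: float, hi: float) -> Optional[str]:
    """Pick the farthest neighbor whose bearing falls inside the range."""
    candidates = [n for n in neighbors if within(n[1], lo, hi)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n[2])[0]


# Hypothetical geometry: a source-region vertex at the origin and a
# destination region to the north-east.
lo, hi = angle_range((0.0, 0.0), [(0.0, 10.0), (10.0, 10.0), (10.0, 20.0), (0.0, 20.0)])
hop = next_hop([("n1", 80.0, 30.0), ("n2", 70.0, 50.0), ("n3", 200.0, 90.0)], lo, hi)
```

A node on the left side of its landmark would call `angle_range` with the leftmost vertex of its region (giving LR), and a node on the right side with the rightmost vertex (giving RR). Here `n3` is the farthest neighbor overall but is rejected because it lies outside the range.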

In the destination region, for different types of messages, different routing methods are used. If the message is a metadata query, considering the load balance, the message is continuously forwarded until reaching a lightly loaded node. Thus, the replying workload is evenly distributed among the nodes in one region. If the message is metadata for publishing, its destinations are all nodes in the destination region. Thus, it is broadcast to all nodes in the region. The metadata reply message only targets the file requester and the file query message only targets the file host. Since these two types of messages have a small size, they can be broadcast to all nodes in the destination region. Since a file has a large size, the file receiver in the destination region first broadcasts a query to establish a path to the file requester in the region and then sends the file through the path.

Fig. 2. Region-based geographic routing in LORD.

SHEN ET AL.: DATA SEARCH SYSTEM FOR LARGE-SCALE MOBILE WIRELESS NETWORKS 1127


2.4 Parallel File Fetching Algorithm

After receiving the metadata of its queried file, a requester can retrieve the region IDs of the file’s holders. It then locates the file holders in the region map with which it was initially configured. To reduce file fetching latency, LORD uses a parallel transmission algorithm, in which different file segments are simultaneously transmitted from different file holders to the file requester. Since each segment has a shorter data stream than the whole file, the total time period for transmitting all segments to the file requester is shorter than transmitting the whole file from one file holder. Specifically, the file requester chooses geographically close file holders among the located ones, and asks each file holder to transmit a segment of the file. Different segments destined to the same destination may arrive at the same node in routing. Then, this node can merge these segments before forwarding them out to save energy for forwarding. Below, we introduce how to determine the length of a file segment to minimize the transmission latency of an entire file.

Channel propagation rate is the rate at which messages pass through a channel, measured in meters/s, and channel transmission rate is the rate at which messages are transmitted from a node to a channel, measured in bits/s. We use $V$ to denote the expected channel propagation rate, $W$ to denote the expected channel transmission rate, and $d$ to denote the expected distance between two routing hops. These expected values can be calculated based on empirical data. These values may change, and they can be periodically recalculated to ensure that they represent the current status of the network. We assume that the total length of a file is $L$, and the number of selected file hosts is $m$. We use $L_i$, $T_i$ and $d_i$ to denote the length of the file segment, the transmission latency and the distance from a selected file holder $i$ ($1 \le i \le m$) to the requester. Then, the average number of hops between two nodes with distance $d_i$ is $d_i/d$. The expected latency for one data segment transmission is $T_i = \frac{L_i}{W} \times \frac{d_i}{d} + \frac{d_i}{V}$. Since $V \gg W$, $\frac{d_i}{V} \approx 0$, then

$$T_i = \frac{L_i}{W} \times \frac{d_i}{d}. \qquad (2)$$

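$T_1 = T_2 = T_3 = \cdots = T_m$ is a condition to minimize the file transmission latency. Therefore, $L_i \times d_i = L_j \times d_j$ ($1 \le i \le m$, $1 \le j \le m$). Since $L_1 + L_2 + \cdots + L_m = L$, we retrieve

$$L_i = \frac{L}{\left(\frac{1}{d_1} + \frac{1}{d_2} + \frac{1}{d_3} + \cdots + \frac{1}{d_m}\right) \times d_i}. \qquad (3)$$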

According to (3), a requester can determine the length of the file segment transmitted by a data host based on the host’s distance from the requester. Based on (2) and (3), we retrieve:

Proposition 2.1. In the LORD parallel file fetching algorithm, the shortest average latency for file fetching is

$$\frac{L}{\left(\frac{1}{d_1} + \frac{1}{d_2} + \frac{1}{d_3} + \cdots + \frac{1}{d_m}\right) \times d \times W}. \qquad (4)$$
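As a worked check of (2), (3) and (4), the following minimal sketch computes segment lengths that are inversely proportional to holder distance and verifies that every segment then has the same latency. The function names and the numeric values in the usage note are illustrative assumptions, not values from the paper.

```python
def segment_lengths(file_length, distances):
    """Eq. (3): L_i = L / ((1/d_1 + ... + 1/d_m) * d_i)."""
    inv_sum = sum(1.0 / d for d in distances)
    return [file_length / (inv_sum * d) for d in distances]


def segment_latency(seg_len, dist, hop_dist, tx_rate):
    """Eq. (2): T_i = (L_i / W) * (d_i / d), ignoring the propagation term."""
    return (seg_len / tx_rate) * (dist / hop_dist)


def shortest_latency(file_length, distances, hop_dist, tx_rate):
    """Eq. (4): L / ((1/d_1 + ... + 1/d_m) * d * W)."""
    inv_sum = sum(1.0 / d for d in distances)
    return file_length / (inv_sum * hop_dist * tx_rate)
```

For example, with a 7000-bit file, holders at 100 m, 200 m and 400 m, an expected hop distance of 50 m and a transmission rate of 250 kbit/s, the segments are 4000, 2000 and 1000 bits, and each takes 0.032 s, which matches the value given by (4).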

2.5 Back-Tracking Algorithm

A data requester incorporates the ID of its region (i.e., source region) into its request when querying for metadata or data. The required metadata or data will be sent back to the requester based on the RGR protocol. In a highly mobile wireless network, the requester may move out of its region or even travel through a number of regions before the response arrives at the source region. LORD has a back-tracking algorithm to keep track of the requester’s movement. In the algorithm, if a requester moves out of its current region before receiving the response, it sends a back-tracking message (including its current region) to the source region. The message is piggybacked on the “hello” messages between neighbor nodes. Thus, each node in the source region keeps a back-tracking message of the requester. Using this message, the response can be forwarded to the requester that moves out of the source region.

For example, when ni moves from the source region to region R1 before it receives the response, it sends a back-tracking message to the source region. Then, if ni moves to region R2, it sends a back-tracking message to the source region again. When node nj in the source region receives the response to ni, nj forwards it to region R2. If node ni does not receive a response after a certain period of time, it initiates a new request. Algorithm 1 shows the pseudo-code for metadata request replying.

Algorithm 1 Pseudo-code for metadata replying.

    // Sending a back-tracking message
    while have not received the metadata reply do
        if it moves to a new region then
            Send a back-tracking message to its old region
        end if
    end while
    // Receiving a back-tracking message
    if receive a back-tracking message then
        Add the message to its back-tracking message list
        Broadcast the message to its neighbors in the region
    end if
    // Receiving a metadata reply
    if receive a metadata reply with its region as destination then
        if its back-tracking message list contains the message from the requester then
            Forward the reply to the requester’s current region
            Flood a message in its region to delete the back-tracking message for this reply
        else
            if the requester is its neighbor then
                Send the metadata to the neighbor
            else
                Broadcast the metadata to its neighbors
            end if
        end if
    end if
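The reply-handling branch of Algorithm 1 can be illustrated with a small sketch. It is a simplified model under stated assumptions: a node keeps only the latest back-tracking message per requester, and the region-wide deletion flood is modelled as a local removal. The class `RegionNode` and the action labels are hypothetical names.

```python
class RegionNode:
    """A node in the source region that tracks requesters who have moved away."""

    def __init__(self, region_id):
        self.region_id = region_id
        self.backtrack = {}  # requester id -> most recently reported region

    def on_backtrack(self, requester_id, current_region):
        # A later message overwrites an earlier one, so only the newest region is kept.
        self.backtrack[requester_id] = current_region

    def on_reply(self, requester_id, neighbor_ids):
        """Return (action, target) for a reply addressed to this region."""
        if requester_id in self.backtrack:
            # Assumption: removing the entry here stands in for the deletion flood.
            target = self.backtrack.pop(requester_id)
            return ("forward_to_region", target)
        if requester_id in neighbor_ids:
            return ("send_to_neighbor", requester_id)
        return ("broadcast_in_region", self.region_id)
```

In the example from the text, a node that received back-tracking messages for R1 and then R2 forwards the reply to R2.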

2.6 Coloring-Based Partial Replication

Storing a metafile in every node in a region enables mobility-resilient and fast file retrieval but generates a

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 5, MAY 20141128


high overhead for node storage, data mapping updates, and location updates. To handle this problem, we propose a coloring-based partial replication algorithm. The coloring policy in graph theory aims to prohibit two neighboring nodes in a graph from having the same color. Inspired by this idea, the coloring-based partial replication algorithm aims to guarantee that a node has at least one neighbor holding a metafile while avoiding having the metafile in neighboring nodes. Fig. 3 shows the metafile distribution in a region with full replication and with coloring-based partial replication, respectively. In the figure, blue nodes represent replica nodes and white nodes represent non-replica nodes. Compared to full replication, this algorithm reduces the overhead for metafile storage and updates and also provides high file availability since a non-replica node can easily retrieve a metafile from its neighbors.

In this coloring-based partial replication algorithm, when a node in a region receives the first metadata of the region, it stores the metadata and broadcasts it along with a flip-flop key with an initial value of zero (i.e., K = 0) and a TTL (Time to Live). If a node receives metadata with K = 0, it changes K to 1, decreases TTL by 1, and further broadcasts the metadata without replicating it. If a node receives metadata with K = 1, it replicates the metadata, changes K to 0 and decreases TTL by 1 before broadcasting. A receiver of TTL = 0 will not further forward the metadata. We will explain later how a node knows that its received metadata is the first in its region.
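The flip-flop rule above can be simulated on a small neighbor graph. This is a minimal sketch under two assumptions that the text does not state: each node acts only on the first copy of the metadata it receives, and copies are delivered in breadth-first order. The function name `color_region` and the adjacency-dictionary input are hypothetical.

```python
from collections import deque


def color_region(adjacency, first_node, ttl):
    """Return the set of replica nodes produced by the K flip-flop broadcast."""
    replicas = {first_node}  # the first receiver stores the metadata
    seen = {first_node}
    # The first node broadcasts with K = 0 and the initial TTL.
    queue = deque((nbr, 0, ttl) for nbr in adjacency[first_node])
    while queue:
        node, key, ttl_left = queue.popleft()
        if node in seen:
            continue  # assumption: duplicate copies are ignored
        seen.add(node)
        if key == 1:
            replicas.add(node)  # K = 1: replicate, then flip K back to 0
        if ttl_left == 0:
            continue  # a receiver of TTL = 0 does not forward
        for nbr in adjacency[node]:
            if nbr not in seen:
                queue.append((nbr, 1 - key, ttl_left - 1))
    return replicas
```

On a chain of five nodes a-b-c-d-e with the first metadata arriving at a, the replicas end up on a, c and e, so every non-replica node has a replica within one hop.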

Because of high node mobility, the neighboring relationships between the nodes in a region always change. To maintain the coloring status, each node in a region needs to periodically check its neighbors to ensure that it can retrieve the metafile within one hop. Specifically, each node appends a flag bit in the periodic “hello” message to indicate whether it is a replica node. Through the “hello” messages, each node periodically checks whether one of its neighbors is a replica node and records the neighbor’s ID. When a non-replica node notices that none of its neighbors has a replica, it sends a metafile request with a TTL to a randomly chosen neighbor. The request is forwarded until meeting a replica node, which sends a metafile to the requester along the original path. If the requester does not receive a response during TTL, it assumes that there is no metafile in its region and its subsequent received metadata is the first metadata in its region. Algorithm 2 shows the pseudocode for the coloring-based partial replication algorithm conducted by each node.

Algorithm 2 Pseudocode for the coloring-based partial replication algorithm conducted by each node.

    // Ensuring that at least one neighbor has metafile
    if it does not have metafile then
        for each “hello” message from its neighbors ni do
            if ni is a replica node then
                Record ni in its replica node list
            end if
        end for
        if none of its neighbors is a replica node then
            Randomly select a neighbor nj
            Send a metafile request to nj with TTL
        end if
    end if
    // Handling received metafile request
    if receive a metafile request from nj then
        if it is not a replica node then
            if it has no neighbor which is a replica node then
                Forward the request to a randomly selected neighbor
            else
                Forward the request to its neighbor that is a replica node
            end if
        else
            Send a metafile to nj
        end if
    end if
    // Handling received metafile
    if receive a metafile then
        if it is not the metafile requester then
            Send metadata to the previous request sender
        end if
    end if
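The request-forwarding part of Algorithm 2 can be sketched as a bounded walk over the neighbor graph. This is an illustration only: it assumes TTL counts forwarding hops, and the function name `find_replica`, the adjacency-dictionary input and the optional `rng` argument are hypothetical.

```python
import random


def find_replica(adjacency, replicas, start, ttl, rng=random):
    """Forward a metafile request until a replica node is reached.

    Returns the request path (the metafile travels back along it in
    reverse), or None if no replica is reached within ttl hops.
    """
    path = [start]
    current = start
    for _ in range(ttl):
        neighbors = adjacency[current]
        if not neighbors:
            return None  # isolated node: the request cannot be forwarded
        replica_neighbors = [n for n in neighbors if n in replicas]
        # Prefer a neighbor known to be a replica; otherwise pick one at random.
        current = replica_neighbors[0] if replica_neighbors else rng.choice(neighbors)
        path.append(current)
        if current in replicas:
            return path
    return None
```

A `None` result corresponds to the case in the text where the requester receives no response within TTL and treats the next metadata it receives as the first in its region.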

When a node receives a message for storing, deleting, or updating a file’s metadata, if it is a replica node, it conducts the operation accordingly and forwards the message to its neighbors. If it is a non-replica node, it directly forwards the message to its neighbors. In this way, all replicas in the region are updated. When a node receives a metadata query, if it is a replica node and is lightly loaded, it responds with the queried metadata. Otherwise, it forwards the query to the neighbor that has the replica.

Replica Management in Node Mobility. To ensure that a metafile is always stored in a sufficient number of nodes in a region with the partial replication algorithm, nodes need to transfer replicas when they move in and out of a region. Algorithm 3 shows the pseudocode for replica management in node mobility with the partial replication algorithm.

Fig. 3. Metadata replication in a region. (a) Full replication; (b) coloring-based partial replication.


Algorithm 3 Pseudocode for metafile replication in node mobility.

    // When node ni moves into region Ri
    if ni has neighbors in region Ri then
        if no neighbor has a metafile then
            Request a metafile from a randomly selected neighbor
        end if
    else
        Send a metafile request to region Ri using RGR
    end if
    // When node ni moves out of its region Ri
    if ni is a replica node then
        if ni has neighbors in region Ri then
            Send its metafile to a randomly selected neighbor
            Delete its metafile and leave
        else
            Send a metafile transfer request to region Ri
        end if
        if ni receives a metafile transfer response from nj then
            if nj is not a replica node in Ri then
                Send metafile to nj
            end if
            ni deletes its metafile and leaves
        end if
    end if

Node Join. When a node, say ni, moves into a region, it checks whether it has neighbors and whether any of its current neighbors has a metafile. If node ni has neighbors but none of them has a replica, ni sends a metafile request to a randomly chosen neighbor nk, which asks for a metafile from its neighbors and forwards the metafile to ni. If at least one of ni’s neighbors has a replica, ni can just stay in the region without the need to be a replica node. If ni does not have any neighbors, it sends out a metafile request targeting region Ri using RGR. If Ri is not vacant, the request message will be forwarded to a node nj in region Ri. If nj is a metafile node, it sends a metafile back to ni. Otherwise, it fetches the metafile from its neighbor and sends it to ni. If Ri is vacant, the request message will enter a region Rj, which is the proxy region of Ri. The first request receiver in Rj notifies all nodes in Rj to update their metadata meta_{i,j} to meta_j. If it is a metafile node, it sends meta_i to ni. Otherwise, it fetches meta_i from its neighbor and sends it to ni.

Node Departure. When a node moves out of its current region, if it is not a replica node, it does not need to notify any nodes before leaving. Otherwise, it transfers its metafile to a node in the region. If a node has a certain number of neighbors, the node sends its metafile to a randomly chosen neighbor before leaving. If a leaving node ni does not have any neighbors, ni sends a metafile transfer request to its region Ri using RGR. RGR forwards the request around the region until it meets a node nj, which responds to ni. If nj is a metafile node, ni drops the metafile and leaves Ri. If nj is not a metafile node, ni sends the metafile to nj, and then drops the metafile and leaves Ri.
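The join and departure rules can be summarized as three small decision functions. This is a sketch only: the action labels are hypothetical, and “a certain number of neighbors” is read as “at least one neighbor”, following the condition used in Algorithm 3.

```python
def on_join(neighbors, replicas):
    """Action for a node that has just moved into a region."""
    if not neighbors:
        return "request_metafile_via_rgr"
    if any(n in replicas for n in neighbors):
        return "stay_as_non_replica"
    return "request_metafile_from_random_neighbor"


def on_leave(is_replica, neighbors):
    """Action for a node that is about to move out of its region."""
    if not is_replica:
        return "leave"  # nothing to hand over
    if neighbors:
        return "send_metafile_to_random_neighbor_then_leave"
    return "send_transfer_request_via_rgr"


def on_transfer_response(responder_is_replica):
    """Action after a node in the region answers the transfer request."""
    if responder_is_replica:
        return "drop_metafile_and_leave"
    return "send_metafile_then_drop_and_leave"
```

Under this reading, only a replica node that leaves a region pays any transfer cost, and a joining node pays one only when none of its new neighbors holds a metafile.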

The coloring-based partial replication algorithm also reduces the overhead due to inter-region mobility. Without the algorithm, a node that moves into a new region needs to acquire the new region’s metadata and drop its old metadata. With the algorithm, the node needs to move its metadata to an old neighbor only when it is a replica node, and it does not need to acquire metadata if it has a neighbor with metadata in the new region.

3 PERFORMANCE EVALUATION

To simulate a highly dense network, we conducted simulations on the ONE event-driven simulator [35]. We compared the performance of LORD with GHT [12] and GLS [20], the most representative geographic routing based (locality-based) data search systems. GHT maps a file to a geographic location and stores it in the node closest to the location. GHT employs geographic routing for file publishing and querying. Each file holder periodically sends out a mapping update message to check for a new closest node to the file’s location and transfers the file when needed. GLS is a node location service for geographic routing, which we extended into a data search system. In GLS, the entire geographic area is recursively divided into a hierarchy of increasingly smaller squares. The metadata of a node’s file is mapped to several squares and stored in the nodes in those squares that have the closest IDs to the node’s virtual ID. A message is routed based on node virtual IDs, and geographic routing is employed in each routing step. If a node’s moving distance reaches a pre-defined updating distance threshold, it sends its new location to the new mapped nodes of its metadata. As in [20], we set the updating distance of GLS to 50 m in the experiments. As in [12], we set the metadata-location mapping update interval to 2 s. To make all methods comparable, we mapped file metadata rather than files to nodes in all methods. We set the location updating interval in GHT to 2 s in our experiment.

To test the performance of a topological routing based data search system, we include LORD with AODV [36] (denoted by AODV) instead of RGR for data transmission in the comparison set. AODV is an on-demand multi-hop routing protocol that builds a route path from the source to the destination using flooding, and then transmits a message along the path. To measure the effectiveness of the coloring-based partial replication algorithm, we also evaluated LORD with this algorithm (denoted by LORD-P) and LORD with full replication (denoted by LORD-F).

In the simulation, all nodes move within a 2200 m × 2200 m area. We divided the area into 100 regions with a 220 m × 220 m region size. We set the transmission range of each mobile node to 150 m. Thus, a node can reach almost half of the nodes in its region. Unless otherwise specified, the number of nodes was set to 1000. To reflect a realistic mobile network, we randomly classified the nodes into three groups with moving speeds randomly chosen within [0.5-2.5] m/s, [1-5] m/s and [20-30] m/s to represent the movement of walkers, bikers and cars. The ratio of the number of nodes in the three groups was set to 4:3:3. The packet transmission speed of nodes was set to 250 kbit/s and the size of each file’s metadata was set to 2 kbits (kb). The buffer size of each node was set to 5 Mb. A message arriving at a node with a fully occupied buffer is dropped. All nodes move according to the movement pattern in [37] with 0 pause time. That is, each node randomly selects a region as its destination and moves to the region, and then it randomly selects another region as the destination and moves towards that region. We initially assigned 400 files to randomly chosen nodes. The simulation first warmed up for 100 s and then ran for 400 s, in which we randomly selected 4 nodes every second to query for randomly selected files. In the experiments, every message was transmitted once without retransmission. The overload threshold in each node was set to 0.8. Each experiment was run 10 times. We report the results that are within the 95% confidence interval.
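A few quantities follow directly from the stated setup and are worked out below. The constants are taken from the text; the derived values assume a uniform node distribution and 1 kbit = 1000 bits, and the variable names are illustrative.

```python
# Parameters stated in the simulation setup.
AREA_SIDE_M = 2200
REGION_SIDE_M = 220
NUM_NODES = 1000
GROUP_RATIO = (4, 3, 3)        # walkers : bikers : cars
TX_RATE_BITS_PER_S = 250_000   # 250 kbit/s
METADATA_BITS = 2_000          # 2 kbits, assuming 1 kbit = 1000 bits
QUERIES_PER_S = 4
MEASURED_S = 400

# Derived quantities.
regions_per_side = AREA_SIDE_M // REGION_SIDE_M             # 10
num_regions = regions_per_side ** 2                         # 100, as stated
nodes_per_region = NUM_NODES / num_regions                  # 10.0 on average
group_sizes = [NUM_NODES * r // sum(GROUP_RATIO) for r in GROUP_RATIO]
metadata_tx_time_s = METADATA_BITS / TX_RATE_BITS_PER_S     # 0.008 s per hop
total_queries = QUERIES_PER_S * MEASURED_S                  # 1600 queries

print(num_regions, nodes_per_region, group_sizes, metadata_tx_time_s, total_queries)
```

With these values each region holds about 10 nodes on average, the three mobility groups contain 400, 300 and 300 nodes, and a run issues 1600 queries after warm-up.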

We used the following metrics in the evaluation:

1. Average success rate. A node’s success rate is the ratio of the number of received files to the number of initiated file queries. This metric represents the performance of successful data querying.

2. Average path length. A query’s path length is the number of hops for routing the query to the metadata holder. This metric represents routing protocol efficiency.

3. Overhead. This is the total number of all traversed hops in metadata responding, mapping updating, and location updating. This metric shows the overhead and reflects the energy efficiency of a data search system.
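The three metrics can be computed from per-node and per-message counters as sketched below. The input shapes (pairs of counts, a list of hop counts, a dictionary of hop totals per message type) are assumptions made for illustration; the paper does not describe how its simulator records them.

```python
def average_success_rate(per_node_counts):
    """Mean over nodes of received files / initiated queries.

    per_node_counts is a list of (received, initiated) pairs; nodes that
    initiated no query are skipped.
    """
    rates = [received / initiated
             for received, initiated in per_node_counts if initiated > 0]
    return sum(rates) / len(rates) if rates else 0.0


def average_path_length(query_hops):
    """Mean number of hops taken to route a query to the metadata holder."""
    return sum(query_hops) / len(query_hops) if query_hops else 0.0


def overhead(hops_by_type):
    """Total hops for metadata responding, mapping updating, location updating."""
    kinds = ("metadata_response", "mapping_update", "location_update")
    return sum(hops_by_type.get(kind, 0) for kind in kinds)
```

For instance, two active nodes with success rates 3/4 and 1/2 give an average success rate of 0.625.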

3.1 Scalability and Efficiency

Fig. 4a shows the average success rate in different data search systems with a varied number of nodes in the network. The figure shows that the success rate follows LORD > GHT > GLS, with AODV the lowest, which confirms the high success rate of LORD. GHT stores data in the node closest to the data’s location, leading to frequent changes of a data’s home node in a highly mobile wireless network. If mapping updates are not timely, a file query can arrive at the file’s new home node before the file is transferred to it. Also, GHT’s periodic proactive mapping updates generate many messages and higher channel contention, leading to message drops. In LORD, a node conducts mapping updates only when it changes its region. Its back-tracking algorithm enhances the probability that a response successfully arrives at the requester. Fewer mapping updates, timely mapping updates, fewer location update messages and back-tracking cause LORD to have a higher success rate than GHT. GLS copes with node mobility by requiring a node to update its metadata with its new location in all its home nodes when its movement distance exceeds a threshold. One metadata item is mapped to a number of nodes. Also, these update messages travel along the long virtual ID based paths, thus resulting in higher transmission overhead and exacerbating traffic congestion. Therefore, GLS has a lower success rate. We note that the average success rate of AODV drops sharply with the increase of the number of nodes in the system. This is caused by the flooding-based on-demand routing in AODV. In a larger-scale network, it is more likely that a node in the observed route moves away before a message is forwarded to it, resulting in routing failures. Moreover, flooding makes the channel contention more severe, leading to more message drops.

We see that the average success rate of each system decreases as the network size increases. More nodes moving in the network cause more mapping updates. This increases the probability of untimely mapping updates, and hence increases the number of occasions on which a node receives a data query it cannot answer. Also, more messages for mapping updates lead to higher channel contention and hence more message drops. We also see that LORD-P generates a success rate comparable to that of LORD-F. This is because the coloring-based replication algorithm ensures that each node can find a metadata item from its neighbors. This result confirms the effectiveness of this algorithm in reducing replicas without compromising data search efficiency.

Fig. 4b shows the average path length for metadata querying versus the network size in different data search systems. We see that the average path length follows GLS > AODV > GHT ≈ LORD and that of GLS increases rapidly as the network size grows, which confirms the high data search efficiency of LORD. GLS’s routing is based on node virtual IDs. The next hop in the virtual path may not be the geographically closest node, which leads to a longer travel path, especially in a large-scale network. Thus, GLS has low scalability. The geographic routing employed in GHT forwards a message to a node geographically closer to the destination in each step, generating the shortest path. RGR in LORD achieves similar path lengths even without the exact geographic location information. In AODV, a source node broadcasts a packet and the destination detects the shortest-latency path, which is not necessarily the shortest-length path, so AODV produces longer routing path lengths. We also see that LORD-P and LORD-F have approximately the same path lengths, which implies that an occasional extra hop in querying in LORD-P does not greatly affect its overall querying path length. This result again confirms the effectiveness of the coloring-based replication algorithm.

Fig. 4. Scalability performance. (a) Success rate vs. network size; (b) path length vs. network size; (c) overhead vs. network size.


Fig. 4c shows the total overhead versus the number of nodes in the system. In the figure, GHT-2 and GHT-10 denote GHT with the updating interval equal to 2 s and 10 s, respectively. The overhead ranks, from highest to lowest, as GHT-2, GLS, GHT-10, LORD-F, and LORD-P. Also, as the number of nodes in the system increases, the overheads of GHT-2 and GLS increase rapidly, those of GHT-10 and LORD-F increase slightly, and that of LORD-P stays nearly constant. The results confirm the low overhead and high scalability of LORD and the effectiveness of its coloring-based replication algorithm. The reasons for the different performance between GLS, GHT and LORD are the same as in Fig. 4b. GHT-2 generates significantly higher overhead than GHT-10 because GHT-2 has a higher location update frequency. LORD-P has much lower overhead than LORD-F because it generates fewer data mapping updates. In LORD-F, when a node moves from one region to another region, one metafile in the new region is always transferred to the node. In LORD-P, if one of the node’s neighbors has the metafile, no metafile transfer is needed. If the node is a replica node, it also needs to transfer its metafile to its neighbor in the old region. The result shows the effectiveness of the coloring-based replication algorithm in reducing overhead.

In conclusion, the higher success rate, shorter path length and lower overhead of LORD indicate its high scalability. Also, LORD-P has lower overhead than LORD-F and has a success rate and average path length comparable to those of LORD-F, which confirms the effectiveness of the coloring-based replication algorithm in reducing overhead without compromising LORD’s scalability.

3.2 Congestion Resilience

To compare the congestion resilience performance of different systems, we randomly selected a number of nodes to send queries on the same file simultaneously. We randomly selected 100 files to be queried. Fig. 5 shows the average query success rate versus the number of queries sent at one time for a file. We see that as the number of simultaneous queries on a file increases, the average success rate in all systems decreases due to the congestion on the metadata holders. The figure also shows that the decrease rate follows LORD < GHT < GLS, with AODV decreasing the fastest, which confirms the high congestion resilience of LORD. In LORD, all nodes in a region can provide the queried metadata and only a lightly loaded node responds. Therefore, the nodes are less likely to be congested. In contrast, in GLS, GHT and AODV, as the query can only be resolved by one node in the system, the node is very likely to be congested by many queries. In addition, LORD’s back-tracking algorithm increases the probability that a metadata or file response successfully arrives at the requester.

3.3 Mobility Resilience

Fig. 6 shows the average success rate versus node moving speed. We see that LORD-F and LORD-P generate the highest success rate and GLS generates a higher success rate than the remaining systems. LORD-F, LORD-P and GLS exhibit slight decreases as node moving speed increases, which demonstrates their high mobility resilience. The success rates of GHT-2 and GHT-10 are much lower and drop sharply when node moving speed increases, and AODV produces the lowest success rate. Mapping updates play an important role in ensuring that metadata is stored in its mapped node under node mobility to guarantee querying success. In both LORD-F and LORD-P, the mapping updates and location updates occur only when a node moves out of a region. Also, LORD’s back-tracking algorithm helps forward data to the requester. Further, the redundant replicas in LORD help increase the success rate. As a result, LORD-F and LORD-P produce a high success rate. The similar success rates of LORD-F and LORD-P demonstrate the effectiveness of the coloring-based partial replication algorithm in maintaining the success rate while reducing overhead.

GLS’s long physical routing path due to virtual ID based routing in mapping updates cannot guarantee timely updates, leading to a slightly lower success rate than LORD. In GHT, with increasing node moving speed, the periodic mapping updates cannot guarantee that a file is always stored in the node closest to the file’s mapped location, leading to a sharp decrease in the success rate. GHT-2 achieves a higher success rate than GHT-10 due to its higher update rate. In AODV, nodes in the shortest observed path are more likely to be unavailable with faster node mobility. Also, the flooding for AODV path detection and the more frequent mapping updates in faster node mobility exacerbate channel congestion and hence message drops. Consequently, the success rate of AODV is very low and decreases as the node moving speed increases.

Fig. 5. Congestion resilience.

Fig. 6. Success rate vs. mobility rate.

Figs. 7a and 7b show the total overhead versus the moving speed of nodes without GLS and with GLS, respectively. The figures show that the overhead of GHT remains constant regardless of the node moving speed. In GHT, each data holder periodically probes whether there is a node located closer to the mapped location of the data. Thus, its location update frequency is fixed and it is not affected by node moving speed. GHT-2 generates higher mapping update overhead than GHT-10 because of its higher update frequency. Recall that GHT also has a low success rate in high node mobility in Fig. 6. Hence, GHT is not appropriate in a highly dynamic environment. Figs. 7a and 7b show that the overheads of GLS, LORD-F, and LORD-P increase as the node moving speed increases and that of GLS increases rapidly. As the node moving speed increases, GLS and LORD produce more location updates and metafile updates. In GLS, a node stores its file’s metadata in a number of home nodes and updates the information when it moves a certain distance. Also, the update metadata travels along a virtual path rather than the geographically shortest path. In LORD-F, a node conducts metadata and location updates only when it moves from one region to another, which makes it generate fewer update messages than GLS. LORD-P generates less overhead than LORD-F for the same reason as in Fig. 4c. These results confirm the high mobility resilience of LORD.

4 CONCLUSION

The advancements in WSNs and the rapid increase of wireless devices necessitate an efficient data search system for a large-scale, highly mobile and dense wireless network. Current decentralized data search systems rely on either topological routing or geographic routing. The former fails to achieve high scalability due to flooding-based routing or routing table maintenance, while the latter is not resilient to high node mobility and generates high update overhead and energy consumption. In this paper, we propose a LOcality-based distRibuted Data search system (LORD) for large-scale, highly mobile and dense wireless networks.

LORD consists of a region-based data publishing and querying protocol and a region-based geographic routing protocol (RGR). It divides the network area into regions, maps the metadata of similar files to the same region for similarity data retrieval, and stores the metadata in multiple nodes in the region for mobility resilience. Unlike traditional geographic routing, LORD’s RGR generates low overhead by forwarding data in the direction of its destination without relying on GPS. LORD has a coloring-based partial replication algorithm that reduces the number of replicas in a region while maintaining the querying success rate. It further has a parallel file fetching algorithm to minimize the file fetching latency. LORD also works for an unbalanced wireless network with sparse regions. Extensive experimental results show the superiority of LORD over other data search systems in terms of scalability, overhead and mobility resilience. In the future, we will study how to leverage social network properties in LORD and test the system with real human movement trace data. We will also implement a LORD prototype for real-world testing.

ACKNOWLEDGMENT

This research was supported in part by U.S. NSF grants CNS-1254006, CNS-1249603, CNS-1049947, CNS-0917056 and CNS-1025652, and Microsoft Research Faculty Fellowship 8300751. An early version of this work was presented in Proc. of MASS’09 [31].

REFERENCES

[1] T. Kindberg, M. Chalmers, and E. Paulos, ‘‘Guest Editors’ Introduc-tion: Urban Computing,’’ IEEE Pervasive Comput., vol. 6, no. 3,pp. 18-20, July-Sept. 2007.

[2] S. Guo, Y. Gu, B. Jiang, and T. He, ‘‘Opportunistic Flooding inLow-Duty-Cycle Wireless Sensor Networks With UnreliableLinks,’’ in Proc. MobiCom, 2009, pp. 133-144.

[3] H. Yang, F. Ye, and B. Sikdar, ‘‘A Swarm-Intelligence-BasedProtocol for Data Acquisition in Networks With Mobile Sinks,’’IEEE Trans. Mobile Comput., vol. 7, no. 8, pp. 931-945, Aug. 2008.

[4] K.-W. Fan, S. Liu, and P. Sinha, ‘‘Dynamic Forwarding OverTree-on-Dag for Scalable Data Aggregation in Sensor Net-works,’’ IEEE Trans. Mobile Comput., vol. 7, no. 10, pp. 1271-1284,Oct. 2008.

[5] M. Lotfinezhad, B. Liang, and E.S. Sousa, ‘‘Adaptive Cluster-BasedData Collection in Sensor Networks With Direct Sink Access,’’ IEEETrans. Mobile Comput., vol. 7, no. 7, pp. 884-897, July 2008.

[6] G. Xing, M. Li, H. Luo, and X. Jia, ‘‘Dynamic MultiresolutionData Dissemination in Wireless Sensor Networks,’’ IEEE Trans.Mobile Comput., vol. 8, no. 9, pp. 1205-1220, Sept. 2009.

[7] A. Talari and N. Rahnavard, ‘‘CStorage: Distributed Data Storagein Wireless Sensor Networks Employing Compressive Sensing,’’in Proc. IEEE GLOBECOM, 2011, pp. 1-5.

[8] S. Schnitzer, H. Miranda, and B. Koldehofe, ‘‘Content RoutingAlgorithms to Support Publish/Subscribe in Mobile Ad HocNetworks,’’ in Proc. IEEE LCN Workshops, 2012, pp. 1053-1060.

[9] C. Hoh and R. Hwang, ‘‘P2P File Sharing System Over MANETBased on Swarm Intelligence: A Cross-Layer Design,’’ in Proc.IEEE WCNC, 2007, pp. 2674-2679.

[10] N. Shah and D. Qian, ‘‘An Efficient Unstructured P2P OverlayOver MANET Using Underlying Proactive Routing,’’ in Proc.MSN, 2011, pp. 248-255.

[11] P. Sharma, D. Souza, E. Fiore, J. Gottschalk, and D. Marquis, ‘‘ACase for MANET-Aware Content Centric Networking of Smart-phones,’’ in Proc. IEEE WOWMOM, 2012, pp. 1-6.

[12] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and F. Yu, “Data-Centric Storage in Sensornet With GHT: A Geographic Hash Table,” Mobile Netw. Appl., vol. 8, no. 4, pp. 427-442, Aug. 2003.

Fig. 7. Overhead vs. mobility rate. (a) Without GLS; (b) with GLS.


[13] C. Tien Ee and S. Ratnasamy, “Practical Data-Centric Storage,” in Proc. NSDI, 2006, pp. 325-338.

[14] M. Aly, K. Pruhs, and P.K. Chrysanthis, “KDDCS: A Load-Balanced in-Network Data-Centric Storage Scheme in Sensor Network,” in Proc. CIKM, 2006, pp. 371-326.

[15] J. Xu, X. Tang, and W.-C. Lee, “A New Storage Scheme for Approximate Location Queries in Object Tracking Sensor Networks,” IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 2, pp. 262-275, Feb. 2008.

[16] M. Demirbas, X. Lu, and P. Singla, “An in-Network Querying Framework for Wireless Sensor Networks,” IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 8, pp. 1202-1215, Aug. 2009.

[17] Y. Zhao, Y. Chen, and S. Ratnasamy, “Load Balanced and Efficient Hierarchical Data-Centric Storage in Sensor Networks,” in Proc. IEEE SECON, 2008, pp. 560-568.

[18] P. Desnoyers, D. Ganesan, and P. Shenoy, “TSAR: A Two Tier Sensor Storage Architecture Using Interval Skip Graphs,” in Proc. SenSys, 2005, pp. 39-50.

[19] S.M. Das, H. Pucha, and Y.C. Hu, “Performance Comparison of Scalable Location Services for Geographic Ad Hoc Routing,” in Proc. IEEE INFOCOM, 2005, pp. 1228-1239.

[20] J. Li, J. Decouto, and R. Morris, “A Scalable Location Service for Geographic Ad-Hoc Routing,” in Proc. MobiCom, 2000, pp. 120-130.

[21] H. Shen, L. Zhao, and Z. Li, “A Distributed Spatial-Temporal Similarity Data Storage Scheme in Wireless Sensor Networks,” IEEE Trans. Mobile Comput., vol. 10, no. 7, pp. 982-996, July 2011.

[22] S. Kim, “Adaptive Ad-Hoc Network Routing Scheme by Using Incentive-Based Model,” Ad Hoc Sensor Wireless Netw., vol. 15, no. 2-4, pp. 107-125, July 2012.

[23] H. Frey and I. Stojmenovic, “On Delivery Guarantees and Worst Case Forwarding Bounds of Elementary Face Routing Components Used in Ad Hoc and Sensor Networks,” IEEE Trans. Comput., vol. 59, no. 9, pp. 1224-1238, Sept. 2010.

[24] S. Datta, I. Stojmenovic, and J. Wu, “Internal Node and Shortcut Based Routing With Guaranteed Delivery in Wireless Networks,” in Proc. Cluster Comput., 2002, pp. 461-466.

[25] B. Karp and H. Kung, “Greedy Perimeter Stateless Routing,” in Proc. MobiCom, 2000, pp. 243-254.

[26] Y. Li, Y. Yang, and X. Lu, “Rules of Designing Routing Metrics for Greedy, Face, Combined Greedy-Face Routing,” IEEE Trans. Mobile Comput., vol. 9, no. 4, pp. 582-595, Apr. 2010.

[27] A. Rao, S. Ratnasamy, C. Papadimitriou, S. Shenker, and I. Stoica, “Geographic Routing Without Location Information,” in Proc. MobiCom, 2003, pp. 96-108.

[28] A. Caruso, S. Chessa, S. De, and A. Urpi, “GPS Free Coordinate Assignment and Routing in Wireless Sensor Networks,” in Proc. IEEE INFOCOM, 2005, pp. 150-160.

[29] D. Niculescu and B. Nath, “Ad Hoc Positioning System (APS) Using AOA,” in Proc. IEEE INFOCOM, 2003, pp. 1734-1743.

[30] G.D. Stefano and A. Petricola, “Distributed AOA Based Localization Algorithm for Wireless Sensor Networks,” J. Comput., vol. 3, no. 4, pp. 1-8, Apr. 2008.

[31] Z. Li and H. Shen, “A Mobility and Congestion Resilient Data Management System for Mobile Distributed Networks,” in Proc. IEEE MASS, 2009, pp. 60-69.

[32] M. Maroti, P. Volgyesi, S. Dora, B. Kusy, A. Nadas, A. Ledeczi, G. Balogh, and K. Molnar, “Radio Interferometric Geolocation,” in Proc. SenSys, 2005, pp. 1-12.

[33] A. Andoni and P. Indyk, “Near-Optimal Hashing Algorithms for Near Neighbor Problem in High Dimensions,” in Proc. IEEE FOCS, 2006, pp. 459-468.

[34] M.W. Berry, Z. Drmac, and E.R. Jessup, “Matrices, Vector Spaces, and Information Retrieval,” SIAM Rev., vol. 41, no. 2, pp. 335-362, June 1999.

[35] The “ONE” Simulator. [Online]. Available: http://www.netlab.tkk.fi/tutkimus/dtn/theone/.

[36] C. Perkins, E. Belding-Royer, and S. Das, RFC 3561: Ad Hoc On-Demand Distance Vector (AODV) Routing, 2003.

[37] B. Xie, A. Kumar, and D. Cavalcanti, “Mobility and Routing Management for Heterogeneous Multi-Hop Wireless Networks,” in Proc. IEEE MASS, 2005, pp. 1-7.

[38] K.P. Naveen and A. Kumar, “Relay Selection for Geographical Forwarding in Sleep-Wake Cycling Wireless Sensor Networks,” IEEE Trans. Mobile Comput., vol. 12, no. 3, pp. 475-488, Mar. 2013.

[39] Z. Li and H. Shen, “A Direction Based Geographic Routing Scheme for Intermittently Connected Mobile Networks,” Int. J. Parallel, Emergent Distrib. Syst., vol. 28, no. 5, pp. 449-474, Oct. 2013.

[40] M. Li, T.X. Yan, D. Ganesan, E. Lyons, P. Shenoy, A. Venkataramani, and M. Zink, “Multi-User Data Sharing in Radar Sensor Network,” in Proc. ACM SenSys, 2007, pp. 247-260.

[41] Z. Zhang, A.D. Kshemkalyani, and S.M. Shatz, “Dynamic Multiroot, Multiquery Processing Based on Data Sharing in Sensor Networks,” Trans. Sensor Netw., vol. 6, no. 3, p. 25, June 2010.

[42] R. Zeng, Y. Jiang, C. Lin, Y. Fan, and X. Shen, “A Distributed Fault/Intrusion-Tolerant Sensor Data Storage Scheme Based on Network Coding and Homomorphic Fingerprinting,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 10, pp. 1819-1830, Oct. 2012.

[43] J. Li, C. Blake, D.J. Couto, H.I. Lee, and R. Morris, “Capacity of Wireless Ad Hoc Networks,” in Proc. MobiCom, 2001, pp. 61-69.

[44] P. Costa, C. Mascolo, M. Musolesi, and G.P. Picco, “Socially Aware Routing for Publish-Subscribe in Delay-Tolerant Mobile Ad Hoc Networks,” IEEE J. Sel. Areas Commun., vol. 26, no. 5, pp. 748-760, June 2008.

[45] F. Li and J. Wu, “MOPS: Providing Content-Based Service in Disruption-Tolerant Networks,” in Proc. IEEE ICDCS, 2009, pp. 526-533.

[46] Z. Jiang, J. Ma, W. Lou, and J. Wu, “An Information Model for Geographic Greedy Forwarding in Wireless Ad-Hoc Sensor Networks,” in Proc. IEEE INFOCOM, 2008, pp. 1499-1507.

[47] K. Chen, H. Shen, and H. Zhang, “Leveraging Social Networks for P2P Content-Based File Sharing in Disconnected MANETs,” IEEE Trans. Mobile Comput., vol. 13, no. 2, pp. 235-249, Feb. 2014.

[48] G.K.W. Wong and X. Jia, “A Novel Socially-Aware Opportunistic Routing Algorithm in Mobile Social Networks,” in Proc. WCNC, 2013, pp. 514-518.

Haiying Shen received the BS degree in computer science and engineering from Tongji University, China, in 2000, and the MS and PhD degrees in computer engineering from Wayne State University, Detroit, MI, USA, in 2004 and 2006, respectively. She is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Clemson University, Clemson, SC, USA. Her research interests include distributed computer systems and computer networks, with an emphasis on P2P and content delivery networks, mobile computing, wireless sensor networks, and cloud computing. She is a Microsoft Faculty Fellow of 2010 and a member of the ACM. She is a Senior Member of the IEEE.

Ze Li received the BS degree in electronics and information engineering from Huazhong University of Science and Technology, China, in 2007, and the PhD degree in computer engineering from Clemson University, Clemson, SC, USA. His research interests include distributed networks, with an emphasis on peer-to-peer and content delivery networks. He is currently a data scientist at MicroStrategy Incorporated. He is a Student Member of the IEEE.

Kang Chen received the BS degree in electronics and information engineering from Huazhong University of Science and Technology, China, in 2005, and the MS degree in communication and information systems from the Graduate University of Chinese Academy of Sciences, China, in 2008. He is currently a PhD student in the Department of Electrical and Computer Engineering at Clemson University, Clemson, SC, USA. His research interests include mobile ad hoc networks and delay-tolerant networks.


IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 25, NO. 5, MAY 2014

