Roger Zimmermann COMPSAC 2004, September 30
Spatial Data Query Support in Peer-to-Peer Systems
Roger Zimmermann, Wei-Shinn Ku, and Haojun Wang
Computer Science Department
University of Southern California
Los Angeles, CA 90089
COMPSAC 2004
Outline
Motivation
Introduction to DHTs (CAN)
Technical Approach
Results
Conclusions and Future Research
Motivation
Spatial data sets are used for many applications, e.g., GIS, CAD, …
P2P systems provide a distributed platform that is very scalable.
Pros:
– Scalability, no central point of failure
Cons:
– Very dynamic (unreliable), topology maintenance required
Motivation (cont.)
Question: how can P2P systems be used for spatial data sharing?
Query Challenges:
– Unstructured P2P systems: querying by flooding is not efficient
– Structured P2P systems based on DHTs (Chord, CAN): only exact match queries are supported efficiently
E.g., search files based on their names/titles
put(key, value); get(key) returns value
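As a rough illustration of the put/get interface above, the following Python sketch hashes each key to one of a few in-memory buckets. The class name `ToyDHT`, the bucket count, and the use of SHA-1 are assumptions made for this example only; a real DHT such as Chord or CAN routes between peers rather than indexing a local list.

```python
import hashlib


class ToyDHT:
    """Toy exact-match store: each key is hashed to one of n buckets ("nodes")."""

    def __init__(self, n_nodes=8):
        self.nodes = [dict() for _ in range(n_nodes)]

    def _owner(self, key):
        # A uniform hash decides which node is responsible for the key.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._owner(key)][key] = value

    def get(self, key):
        # Only the exact key finds the entry; a prefix or a nearby key
        # is hashed independently and generally lands somewhere unrelated.
        return self.nodes[self._owner(key)].get(key)


dht = ToyDHT()
dht.put("song.mp3", "peer-17")
print(dht.get("song.mp3"))  # the stored value
print(dht.get("song"))      # None: no partial or range match
```

The second lookup shows the limitation named on the slide: the interface answers exact-match requests only.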
Motivation (cont.)
Spatial queries are usually range queries
– Intersect, overlap
– Nearest neighbor(s) (kNN)
DHTs are not suitable without modification
Distributed Hash Tables (DHT)
DHT systems: Content Addressable Network (CAN), Chord, Pastry, etc.
Using DHT to allocate large data sets to many nodes with no central control
Data objects are near uniformly distributed through a hash function, resulting in superb scalability and load balance
Each node only maintains a small routing table to know its neighbors
Locating a particular data object requires O(log N) search steps on average
Content Addressable Network (CAN)
A scalable indexing mechanism in a P2P network
Creates a logical d-dimensional Cartesian coordinate space
Divides the space into zones, where each zone is controlled by a node in the system
Zones are dynamically partitioned or merged as nodes join and leave
Each zone is addressed with a Virtual Identifier (VID), which is deterministically calculated from the location of the zone
Content Addressable Network (CAN)
Example: A 2-D space partitioned into 7 CAN zones
Content Addressable Network (cont.)
Node Operations (e.g., Insertion)
1. Find a bootstrap node first.
2. Randomly choose a point in the CAN plane and route the new node from the bootstrap node to the chosen location.
3. The new node arrives at the destination zone covering that point. The destination zone is split into two zones, each controlled by one node (old and new).
4. Update the neighborhood zone routing information.
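The join procedure above can be sketched in a few lines of Python. This is a simplified illustration and not the authors' implementation: zones are plain rectangles in the unit square, the owner of the chosen point is found by a linear scan instead of hop-by-hop routing from the bootstrap node, the split always halves the longer side, and the neighbor-table update of step 4 is omitted. The names `split`, `join`, and the node labels are made up for this example.

```python
import random


def split(box):
    """Halve a zone (x0, y0, x1, y1) along its longer side."""
    x0, y0, x1, y1 = box
    if (x1 - x0) >= (y1 - y0):
        xm = (x0 + x1) / 2
        return (x0, y0, xm, y1), (xm, y0, x1, y1)
    ym = (y0 + y1) / 2
    return (x0, y0, x1, ym), (x0, ym, x1, y1)


def join(zones, new_node, rng):
    """Add new_node to zones (dict: node name -> box) and return the old owner."""
    # Step 2: pick a random point in the CAN plane.
    px, py = rng.random(), rng.random()
    # Step 3: find the zone covering that point (linear scan stands in for routing).
    owner = next(
        name
        for name, (x0, y0, x1, y1) in zones.items()
        if x0 <= px < x1 and y0 <= py < y1
    )
    # Split the zone; the old node keeps one half, the new node takes the other.
    zones[owner], zones[new_node] = split(zones[owner])
    return owner


zones = {"n0": (0.0, 0.0, 1.0, 1.0)}  # the first node owns the whole space
rng = random.Random(0)
for i in range(1, 7):
    join(zones, f"n{i}", rng)

for name, box in sorted(zones.items()):
    print(name, box)
```

After six joins the unit square is partitioned into seven zones, and the zone areas still sum to the whole space.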
Content Addressable Network (CAN)
Data Object Operation (e.g., Insertion)
1. Generate a key based on the object identification and insert the data object as a <key, value> pair.
2. Map the key into a point P in the CAN plane by using a uniform hash function.
3. Store the <key, value> pair at the node which owns the zone within which the point P is located.
4. To retrieve the value, the same hash function is applied to the key in order to regenerate the point P and find the zone that owns that point; the node owning that zone returns the value to the client.
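A minimal Python sketch of these four steps follows, under stated assumptions: the CAN plane is the unit square, SHA-1 stands in for the uniform hash function, the space is pre-split into four fixed zones, and the owning zone is located by a linear scan rather than by CAN routing. `key_to_point`, `Zone`, `insert`, and `lookup` are illustrative names, not part of any CAN implementation.

```python
import hashlib


def key_to_point(key):
    """Map a key to a point P in the unit square [0, 1) x [0, 1)."""
    d = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(d[0:4], "big") / 2**32
    y = int.from_bytes(d[4:8], "big") / 2**32
    return (x, y)


class Zone:
    """A rectangular zone and the <key, value> pairs stored by its node."""

    def __init__(self, x0, y0, x1, y1):
        self.box = (x0, y0, x1, y1)
        self.store = {}

    def contains(self, p):
        x0, y0, x1, y1 = self.box
        return x0 <= p[0] < x1 and y0 <= p[1] < y1


def owner(zones, p):
    # Linear scan; a real CAN would route greedily through neighbor zones.
    return next(z for z in zones if z.contains(p))


def insert(zones, key, value):
    owner(zones, key_to_point(key)).store[key] = value


def lookup(zones, key):
    # The same hash regenerates P, which leads back to the same zone.
    return owner(zones, key_to_point(key)).store.get(key)


zones = [
    Zone(0.0, 0.0, 0.5, 0.5), Zone(0.5, 0.0, 1.0, 0.5),
    Zone(0.0, 0.5, 0.5, 1.0), Zone(0.5, 0.5, 1.0, 1.0),
]
insert(zones, "parcel-42", "polygon data")
print(lookup(zones, "parcel-42"))
```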
Storing Spatial Data w/ DHTs
Hash function distributes data objects evenly within the space to achieve a balanced load
Spatial locality information needs to be preserved for range queries. Applying a hash function to spatial data will destroy locality
Related work explored storing R-tree or Quad-tree based index on DHT
– Harwood et al. Hashing Spatial Content over Peer-to-Peer Networks
– Mondal et al. P2PR-tree: An R-tree-based Spatial Index for Peer-to-Peer Environments
Spatial Range Query Design for P2P Systems
Mapping a physical space to a CAN space
– Propose a new hash function to map spatial data objects onto nodes over a modified CAN system
– Purpose: allow efficient spatial data query execution while at the same time considering load balance
– Calculating the location of zones in the logical space: a Virtual Identifier (VID) tree is used for mapping purposes
Spatial Range Query Design for P2P Systems
Approach:
– Object key is generated with three different components
(a) Scatter region address: based on the spatial locality of the object; preserves spatial locality.
(b) Zone address: randomized; achieves load balance
(c) Object identifier (hashed)
– The scatter region size is fixed and predetermined
Spatial Range Query Design for P2P Systems (cont.)
– The value of the zone bit string is decided randomly, and the object identifier is the hash of the data content
– The VID tree is created with its height determined by the scatter region size
– The maximum number of zones is 2^(a+b), where a and b are the bit lengths of the scatter region address and the zone address
– The trade-off between data locality and load balance can be tuned along a spectrum
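The three-part key can be sketched as follows. The slides do not specify how the scatter region address is derived from an object's location, so this example assumes a simple scheme that alternately halves the x and y ranges of the unit square and emits one bit per halving; the function names, the SHA-1 content hash, and the default values a = 5 and b = 4 are likewise assumptions for illustration.

```python
import hashlib
import random


def region_address(x, y, a):
    """a-bit scatter region address from alternately halving x and y (assumed encoding)."""
    bits = []
    x0, x1, y0, y1 = 0.0, 1.0, 0.0, 1.0
    for i in range(a):
        if i % 2 == 0:  # even bit positions refine x
            xm = (x0 + x1) / 2
            if x >= xm:
                bits.append("1")
                x0 = xm
            else:
                bits.append("0")
                x1 = xm
        else:  # odd bit positions refine y
            ym = (y0 + y1) / 2
            if y >= ym:
                bits.append("1")
                y0 = ym
            else:
                bits.append("0")
                y1 = ym
    return "".join(bits)


def make_key(x, y, content, a=5, b=4, rng=random):
    """Return (scatter region address, zone address, object identifier)."""
    region = region_address(x, y, a)                # (a) preserves spatial locality
    zone = format(rng.getrandbits(b), f"0{b}b")     # (b) random, spreads the load
    obj_id = hashlib.sha1(content).hexdigest()      # (c) hash of the data content
    return region, zone, obj_id


rng = random.Random(1)
k1 = make_key(0.10, 0.10, b"building A", rng=rng)
k2 = make_key(0.11, 0.12, b"building B", rng=rng)
k3 = make_key(0.90, 0.90, b"building C", rng=rng)
print(k1[0], k2[0], k3[0])  # nearby objects share a scatter region address
print(2 ** (5 + 4))         # maximum number of zones for a = 5, b = 4
```

Under this assumed encoding, a larger a yields smaller scatter regions (more locality, less randomization), while a larger b spreads each region over more zones.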
Spatial Range Query Design for P2P Systems (cont.)
[Figure: VID tree addressing of a scatter region, e.g., a = 5 bits; highlighted scatter region: 11000]
Spatial Range Query Design for P2P Systems (cont.)
[Figure: VID tree addressing of zones, e.g., b = 4 bits]
Spatial Range Query Design for P2P Systems (cont.)
System Operation and Spatial Range Query
– Node Operation
  Bootstrap mechanism
  Node join mechanism
  Zone split and the search threshold
    – Balance the number of data objects in each zone
    – The zone being selected must be larger than the minimum zone size (1/2^(a+b))
    – The threshold is the upper bound on the number of search hops to find a zone to split
– Data Object Insertion
– Data Object Deletion
– Spatial Range Query
Spatial Range Query Design for P2P Systems (cont.)
Spatial Range Query
Step 1: The querying node launches a spatial range query.
Step 2: The node determines the overlapping scatter regions.
Step 3: The node multicasts the query to the overlapping scatter regions.
Step 4:
– The range query is multicast within all overlapping scatter regions (M-CAN).
– Recall: data is randomized within each scatter region, so an exhaustive search is necessary
– Choice of scatter region size
  Large: good load balance; uniform within a scatter region
  Small: exhaustive search covers less area
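Steps 2 to 4 can be sketched as below. This is only an illustration under assumptions: scatter regions are addressed by alternately halving the x and y ranges of the unit square (the slides do not give the actual encoding), overlapping regions are found by enumerating all 2^a addresses, and the M-CAN multicast is replaced by simply listing every zone address inside each overlapping region. All function names are hypothetical.

```python
def region_box(addr):
    """Decode a scatter region address into its rectangle (x0, y0, x1, y1)."""
    x0, x1, y0, y1 = 0.0, 1.0, 0.0, 1.0
    for i, bit in enumerate(addr):
        if i % 2 == 0:  # even bit positions halve the x range
            xm = (x0 + x1) / 2
            if bit == "1":
                x0 = xm
            else:
                x1 = xm
        else:  # odd bit positions halve the y range
            ym = (y0 + y1) / 2
            if bit == "1":
                y0 = ym
            else:
                y1 = ym
    return (x0, y0, x1, y1)


def overlapping_regions(query, a):
    """Step 2: addresses of all scatter regions intersecting the query rectangle."""
    qx0, qy0, qx1, qy1 = query
    hits = []
    for n in range(2**a):
        addr = format(n, f"0{a}b")
        x0, y0, x1, y1 = region_box(addr)
        if x0 < qx1 and qx0 < x1 and y0 < qy1 and qy0 < y1:
            hits.append(addr)
    return hits


def zones_to_visit(query, a, b):
    """Steps 3 and 4: every zone of every overlapping region must be searched,
    because objects are placed randomly within a scatter region."""
    return [
        (region, format(z, f"0{b}b"))
        for region in overlapping_regions(query, a)
        for z in range(2**b)
    ]


query = (0.4, 0.1, 0.6, 0.2)  # (x0, y0, x1, y1) in the unit square
print(overlapping_regions(query, a=2))
print(len(zones_to_visit(query, a=2, b=4)))
```

The sketch makes the size trade-off concrete: with more address bits for the scatter region, each region covers less area, so a query of the same extent forces an exhaustive search over a smaller part of the space.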
Conclusions and Future Research Directions
– We proposed a hash function to preserve both spatial locality information and constrained load balance
– The proposed mechanism works well with the CAN P2P architecture
– We are currently running simulations to test our approach
Thank you!
Questions?