Snehal Thakkar 1
Snehal Thakkar
Spatial Data Structures
Hanan Samet
Computer Science Department
University of Maryland
Spatial Data Structures
• Introduction
• Spatial Indexing
• Region Data
• Point Data
• Rectangle Data
• Line Data
• Conclusion
Introduction
• Spatial Objects: points, lines, regions, rectangles, ...
• Spatial Indexing: unlike conventional data, the sort has to be based on the space occupied by the data
• Hierarchical Data Structures: based on recursive decomposition, similar to the divide-and-conquer method
Spatial Indexing
• Mapping Spatial Data into Points
- The point may be of the same, higher, or lower dimension
- Good for storage purposes and for queries like intersect
- Problems with queries like nearest
• Bucketing Methods
- Grid file, BANG file, LSD trees, Buddy trees, ...
- Buckets are based not on the representative point but on the actual space occupied.
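The mapping of a rectangle into a higher-dimensional point can be sketched as follows. This is a minimal illustration, assuming a center/half-extent representation; the names `RectPoint`, `to_point`, and `intersects` are hypothetical and not from the slides. It shows why intersection tests stay easy while proximity in the point space can be misleading.

```python
import math
from typing import NamedTuple

class RectPoint(NamedTuple):
    """A 2-D rectangle represented as a point in 4-D space (assumed layout)."""
    cx: float  # center x
    cy: float  # center y
    dx: float  # half-width
    dy: float  # half-height

def to_point(xmin, ymin, xmax, ymax) -> RectPoint:
    """Map a rectangle given by its corners to its 4-D representative point."""
    return RectPoint((xmin + xmax) / 2, (ymin + ymax) / 2,
                     (xmax - xmin) / 2, (ymax - ymin) / 2)

def intersects(a: RectPoint, b: RectPoint) -> bool:
    """Two rectangles intersect when their centers are close enough on both axes."""
    return abs(a.cx - b.cx) <= a.dx + b.dx and abs(a.cy - b.cy) <= a.dy + b.dy

big = to_point(0, 0, 100, 100)
small = to_point(100, 0, 102, 2)   # touches the right edge of `big`
print(intersects(big, small))      # the rectangles touch in the plane
print(math.dist(big, small))       # yet their representative points are far apart
```

The two rectangles touch, but their 4-D points are far apart, which is the kind of difficulty with "nearest" queries noted above.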
R-Trees (Continued)
• Organize spatial objects into d-dimensional rectangles.
• Each node in the tree corresponds to the smallest d-dimensional rectangle that encloses its child nodes.
• If an object is spatially contained in several nodes, it is only stored in one node.
• Tree parameters are adjusted so that a small number of pages is visited during a spatial query
• All leaf nodes appear at the same level
• Each leaf node entry is (R,O), where R is the smallest rectangle containing object O, e.g. R3, R4, ...
R-trees (Continued)
• Each non-leaf node entry is (R,P), where R is the smallest rectangle containing all the rectangles in the child node pointed to by P, e.g. R1, R2
• An R-tree of order (m,M) means that each node in the tree, with the exception of the root, has between m ≤ ⌈M/2⌉ and M entries. The root node has at least two entries unless it is a leaf node.
• The R-tree is not unique; the rectangles depend on how objects are inserted into and deleted from the tree.
• The problem is that, because rectangles may overlap, finding an object may require searching several rectangles or even the whole database.
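A minimal sketch of R-tree search may help show why several rectangles can be visited. It assumes a hand-built two-level tree; the `Node` class and the `overlaps`, `enclose`, and `search` helpers are hypothetical names for illustration, and insertion and node splitting are not shown.

```python
from dataclasses import dataclass, field
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def overlaps(a: Rect, b: Rect) -> bool:
    """True when two closed rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def enclose(rects) -> Rect:
    """Smallest rectangle containing all given rectangles."""
    x0, y0, x1, y1 = zip(*rects)
    return (min(x0), min(y0), max(x1), max(y1))

@dataclass
class Node:
    children: list   # Node objects (non-leaf) or (Rect, object_id) pairs (leaf)
    leaf: bool
    mbr: Rect = field(init=False)

    def __post_init__(self):
        rects = ([c[0] for c in self.children] if self.leaf
                 else [c.mbr for c in self.children])
        self.mbr = enclose(rects)

def search(node: Node, query: Rect, visited: list) -> list:
    """Return ids of objects whose rectangles overlap `query`; record visited nodes."""
    visited.append(node)
    if node.leaf:
        return [oid for rect, oid in node.children if overlaps(rect, query)]
    hits = []
    for child in node.children:
        if overlaps(child.mbr, query):   # overlapping siblings force several descents
            hits += search(child, query, visited)
    return hits

r3 = Node([((0, 0, 4, 4), "a"), ((2, 2, 6, 6), "b")], leaf=True)
r4 = Node([((5, 5, 9, 9), "c"), ((8, 0, 10, 2), "d")], leaf=True)
root = Node([r3, r4], leaf=False)
visited = []
print(search(root, (5, 5, 6, 6), visited), len(visited))
```

Because the bounding rectangles of the two leaves overlap, the small query has to descend into both of them.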
R+ Trees
• Decomposition of Space into Disjoint Cells
[Figure: example R+-tree with rectangles R1 to R6 over objects a to i; some objects (c, h, i) appear in more than one leaf.]
R+ Trees (Continued)
• R+-trees and cell trees use the approach of decomposing space into cells
• R+-trees deal with collections of objects bounded by rectangles
• Cell trees deal with collections of objects bounded by convex polyhedra
• The R+-tree is an extension of the k-d-B-tree.
• Try not to overlap the rectangles.
• If an object is in multiple rectangles, it will appear multiple times.
R+ Trees (Continued)
• Multiple paths to an object from the root
• Height of the tree is increased
• Retrieval times are smaller
• When summing the objects, duplicates need to be eliminated
• It is not possible to guarantee that all properties of B-trees are fulfilled without going through difficult insertion and deletion routines.
• It is data-dependent, so the R+-tree will differ depending on how you insert or delete records.
More Spatial Indexing
• Uniform Grid
- Ideal for uniformly distributed data
- More data-independence than R+-trees
- Space decomposed into blocks of uniform size
- Higher overhead
• Quadtree
- Space is decomposed based on data points
- Sensitive to positioning of the object
- Width of the blocks is restricted to a power of two
- Good for set-theoretic operations, like composition of data.
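As a rough illustration of the uniform grid, the sketch below stores each rectangle in every fixed-size cell it overlaps. The `UniformGrid` class and its methods are hypothetical names, and the dictionary-of-cells layout is an assumption made for brevity. Since an object can be stored in several cells, the query collects ids in a set to eliminate duplicates.

```python
from collections import defaultdict

def overlaps(a, b):
    """True when two closed rectangles (xmin, ymin, xmax, ymax) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class UniformGrid:
    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (col, row) -> list of (rect, object_id)

    def _cells_for(self, rect):
        """Yield the keys of all grid cells that the rectangle touches."""
        xmin, ymin, xmax, ymax = rect
        s = self.cell_size
        for col in range(int(xmin // s), int(xmax // s) + 1):
            for row in range(int(ymin // s), int(ymax // s) + 1):
                yield (col, row)

    def insert(self, rect, oid):
        for key in self._cells_for(rect):
            self.cells[key].append((rect, oid))

    def query(self, rect):
        found = set()                    # a set removes duplicate reports
        for key in self._cells_for(rect):
            for other, oid in self.cells.get(key, ()):
                if overlaps(other, rect):
                    found.add(oid)
        return found

grid = UniformGrid(cell_size=10)
grid.insert((2, 2, 4, 4), "a")
grid.insert((8, 8, 12, 12), "b")      # straddles four cells
grid.insert((25, 25, 27, 27), "c")
print(sorted(grid.query((0, 0, 15, 15))))
```

The cell boundaries do not depend on the data, which is the data-independence mentioned above; the cost is that sparse or clustered data leaves many cells empty or overfull.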
Region Data
• Focus on interior representation
• Represented as an image array of pixels
• Runlength Code
- Break array into 1 × m blocks (row representation)
• Medial Axis Transformation (MAT)
- Union of maximal square blocks
- Blocks may overlap
- Blocks are specified by center and radius
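The runlength code can be sketched in a few lines. This is an illustrative version only, assuming the image is a list of rows of 0/1 pixels; the function names are hypothetical.

```python
from itertools import groupby

def run_length_rows(image):
    """Encode each row as a list of (pixel_value, run_length) pairs, i.e. 1 x m blocks."""
    return [[(value, len(list(run))) for value, run in groupby(row)]
            for row in image]

def decode_rows(encoded):
    """Inverse of run_length_rows."""
    return [[value for value, length in row for _ in range(length)]
            for row in encoded]

image = [[0, 0, 1, 1],
         [1, 1, 1, 1]]
print(run_length_rows(image))
```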
More Region Data
• Region Quadtree
- Is a variant of the Medial Axis Transformation
- Whose blocks are required to be disjoint
- To have standard sizes (squares whose sides are powers of two)
- To be at standard locations
- Based on successive subdivision of the image array into four equal-size quadrants.
Region Quadtree
[Figure: example region and its quadtree. Leaf blocks are numbered 1 to 19, non-leaf nodes are labeled A to F, and the children of each node are ordered NW, NE, SW, SE.]
Region Quadtree (Continued)
• Each leaf node is either black or white
• All non-leaf nodes are gray (the circles in the previous example)
• You can also use it for non-binary images
• Resolution of the decomposition may be governed by the data or predetermined
• Can be used for several object representations.
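The successive subdivision into quadrants can be sketched as below. This is a minimal version under stated assumptions: the image is a 2^q × 2^q list of rows of 0/1 pixels with row 0 at the top, a leaf is the string 'B' or 'W', and a gray node is a list of four subtrees in NW, NE, SW, SE order. The function names are hypothetical.

```python
def build(image, x=0, y=0, size=None):
    """Return 'B', 'W', or a gray node [NW, NE, SW, SE] for the given block."""
    if size is None:
        size = len(image)                  # assumes a square 2^q x 2^q image
    first = image[y][x]
    uniform = all(image[r][c] == first
                  for r in range(y, y + size)
                  for c in range(x, x + size))
    if uniform:
        return 'B' if first else 'W'       # block is all black or all white
    h = size // 2
    return [build(image, x,     y,     h),   # NW
            build(image, x + h, y,     h),   # NE
            build(image, x,     y + h, h),   # SW
            build(image, x + h, y + h, h)]   # SE

def count_nodes(tree):
    """Total number of nodes (gray, black, and white)."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree)

image = [[0, 0, 0, 0],
         [0, 0, 0, 1],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
tree = build(image)
print(tree, count_nodes(tree))
```

Re-scanning each block for uniformity keeps the sketch short; a bottom-up construction that merges four equal leaves would avoid the repeated scans.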
Variations of Quadtree
• Point Quadtree
- Quadtree with rectangular quadrants
- Adaptation of the binary search tree to two or more dimensions
- Useful for location-based queries, like where is the nearest theatre from a given location
- Location-based queries are answered by descending the tree until the node for the location is found
- For nearest neighbor, the search is continued in the neighborhood of the node containing the object
- Feature-based queries are tough because the index is based on spatial occupancy, not on features
Variations of Quadtree
• Pyramid
- Exponentially tapering stack of arrays, each one quarter the size of the previous one
- Useful for feature-based queries, like where does wheat grow in California
- Nodes that are not at the maximum level of resolution contain summary information
• Octree
- Three-dimensional analog of the quadtree
- Recursively subdivide into eight octants
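A pyramid can be sketched as a list of arrays, each summarizing 2 × 2 blocks of the one below. In this illustration the summary is assumed to be a count of feature pixels (for example, "wheat" cells); that choice, and the names `build_pyramid` and `find_feature`, are assumptions, not part of the slides.

```python
def build_pyramid(image):
    """Return levels[0] = image, levels[1] = quarter-size summary, ..., up to 1 x 1."""
    levels = [image]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([[prev[2*r][2*c] + prev[2*r][2*c + 1] +
                        prev[2*r + 1][2*c] + prev[2*r + 1][2*c + 1]
                        for c in range(n)]
                       for r in range(n)])
    return levels

def find_feature(levels, level=None, r=0, c=0):
    """Feature-based query: descend only into blocks whose summary is nonzero."""
    if level is None:
        level = len(levels) - 1            # start at the 1 x 1 top of the pyramid
    if levels[level][r][c] == 0:
        return []                          # the summary says nothing is here
    if level == 0:
        return [(r, c)]
    found = []
    for dr in (0, 1):
        for dc in (0, 1):
            found += find_feature(levels, level - 1, 2*r + dr, 2*c + dc)
    return found

image = [[0, 0, 0, 0],
         [0, 0, 0, 1],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
levels = build_pyramid(image)
print(levels[1], levels[2], find_feature(levels))
```

The summaries let the query skip whole empty blocks without looking at their pixels.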
More Variations of Quadtree
• Locational Code Based Quadtree
- Treats the image as a collection of leaf nodes, each encoded by a pair of numbers
- The first is a base 4 number, the sequence of directional codes that locates the leaf from the root
- The second is the depth at which the node is found, or its size
• DF-expression
- Represents the image in the form of a traversal of the nodes of its quadtree
- Very compact storage; each node type can be encoded with two bits.
- Not easy to use when random access to nodes is required.
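Both encodings can be sketched on a small quadtree given as a nested list. The assumptions here are mine: children are ordered NW, NE, SW, SE and given the directional codes 0 to 3, leaves are 'B' or 'W', and the DF-expression uses '(' for a gray node. The function names are hypothetical.

```python
def locational_codes(tree, path=()):
    """Return (base-4 code, depth, color) for every leaf, in traversal order."""
    if isinstance(tree, str):
        code = ''.join(str(digit) for digit in path)
        return [(code, len(path), tree)]
    leaves = []
    for digit, child in enumerate(tree):     # 0=NW, 1=NE, 2=SW, 3=SE (assumed)
        leaves += locational_codes(child, path + (digit,))
    return leaves

def df_expression(tree):
    """Preorder traversal: '(' for gray, 'B' or 'W' for leaves."""
    if isinstance(tree, str):
        return tree
    return '(' + ''.join(df_expression(child) for child in tree)

tree = ['W', ['W', 'W', 'W', 'B'], 'B', 'W']
black = [(code, depth) for code, depth, color in locational_codes(tree) if color == 'B']
print(black)
print(df_expression(tree))
```

With three symbols, two bits per node are enough for the DF-expression, but reaching a particular node means scanning the string from the start.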
Searching with Quadtree
• Useful for performing set operations
• When performing intersection, a black node is returned only where both quadtrees have black nodes.
• The operation involves three quadtrees: the two inputs and the result.
• In the worst case, the work is proportional to the sum of the number of nodes in the two quadtrees
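Intersection of two region quadtrees can be sketched as a parallel traversal. The representation is an assumption for illustration: a leaf is 'B' or 'W' and a gray node is a list of four subtrees in the same quadrant order in both inputs; `intersect` is a hypothetical name.

```python
def intersect(a, b):
    """Return the quadtree of the intersection of two aligned region quadtrees."""
    if a == 'W' or b == 'W':
        return 'W'                      # white in either input gives white
    if a == 'B':
        return b                        # black acts as identity; subtree is shared, not copied
    if b == 'B':
        return a
    children = [intersect(ca, cb) for ca, cb in zip(a, b)]
    if isinstance(children[0], str) and all(c == children[0] for c in children):
        return children[0]              # merge four identical leaves into one
    return children

a = ['W', ['W', 'W', 'W', 'B'], 'B', 'W']
b = ['B', ['W', 'B', 'W', 'W'], ['B', 'W', 'W', 'B'], 'W']
print(intersect(a, b))
```

Each node of either input is visited at most once, which matches the worst-case bound stated above.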
Algorithms with Quadtree
• Most algorithms are preorder traversals
• Execution time is a linear function of the number of nodes
• Quadtree Complexity Theorem
- The number of nodes in the quadtree representation is O(p+q) for a 2^q × 2^q image with perimeter p measured in pixel widths.
- It also holds for higher dimensions.
Point Data
• PR Quadtree
- Regular decomposition of space into quadrants
- Organized the same way as the region quadtree
- Leaf nodes are either empty or contain a data point and its coordinates
- A quadrant contains at most one data point
- Shape of the tree is independent of the order in which points are inserted
- If points are close together, the decomposition can be deep
- Can use quadrants with capacity c
- Good for search within a specified distance of a given record
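PR quadtree insertion can be sketched as follows. This is a minimal version under my own assumptions: blocks are squares given by a lower-left corner and side length, points on a dividing line go to the east or north child, and duplicate points are ignored. The class name and the sample points are for illustration only.

```python
class PRQuadtree:
    def __init__(self, x, y, size, capacity=1):
        self.x, self.y, self.size = x, y, size   # lower-left corner and side length
        self.capacity = capacity                 # maximum points per leaf block
        self.points = []                         # data points, if this is a leaf
        self.children = None                     # [NW, NE, SW, SE], if internal

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        if p in self.points:
            return                               # ignore duplicates (would split forever)
        self.points.append(p)
        if len(self.points) > self.capacity:
            self._split()

    def _split(self):
        h = self.size / 2
        x, y, c = self.x, self.y, self.capacity
        self.children = [PRQuadtree(x, y + h, h, c), PRQuadtree(x + h, y + h, h, c),
                         PRQuadtree(x, y, h, c), PRQuadtree(x + h, y, h, c)]
        pending, self.points = self.points, []
        for q in pending:                        # may split again if points are close
            self._child_for(q).insert(q)

    def _child_for(self, p):
        h = self.size / 2
        east = p[0] >= self.x + h
        north = p[1] >= self.y + h
        return self.children[(0 if north else 2) + (1 if east else 0)]

    def shape(self):
        """Nested-list view of the tree: leaves are lists of points."""
        if self.children is None:
            return sorted(self.points)
        return [child.shape() for child in self.children]

points = [(20, 88), (88, 65), (52, 15), (92, 1)]
forward, backward = PRQuadtree(0, 0, 100), PRQuadtree(0, 0, 100)
for p in points:
    forward.insert(p)
for p in reversed(points):
    backward.insert(p)
print(forward.shape() == backward.shape())
```

Because the split positions depend only on the block boundaries, both insertion orders give the same decomposition.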
PR Quadtree (Continued)
[Figure: PR quadtree example on the square from (0,0) to (100,100), with labeled coordinates (20,88), (88,65), (52,15), (92,1), (50,50), (75,75), (25,25), and (75,25).]
Rectangle Data
• Used to approximate other objects in an image and in VLSI design rule checking
• If the environment is static, the solution is based on the plane-sweep paradigm
• Any addition to the database forces re-execution of the algorithm on the whole database
Rectangle Data (Continued)
• Grid File Based Approach
- Each rectangle is reduced to a point in a higher dimension
- A rectangle is the Cartesian product of two one-dimensional intervals
- Each interval is represented by its center and extent
- The set of intervals is represented by a Grid File
- The Grid File uses a two-dimensional array of grid blocks called the Grid Directory
Rectangle Data (Continued)
• Grid File Based Approach (Continued)
- Each Grid Directory entry holds the address of a bucket
- A set of linear scales is kept in core (main memory) to locate the grid block in the Grid Directory
- Guarantees access to a record in two operations
- The first operation accesses the grid block
- The second operation accesses the grid bucket
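The two-operation lookup can be sketched with in-memory stand-ins for the directory and the buckets. Everything named here (`GridFile`, `lookup`, the sample scales and records) is hypothetical; records are assumed to be intervals stored as (center, extent, name) points, and splitting of buckets or scales is not shown.

```python
from bisect import bisect_right

class GridFile:
    def __init__(self, center_scale, extent_scale, directory, buckets):
        self.center_scale = center_scale   # sorted partition values, kept in memory
        self.extent_scale = extent_scale
        self.directory = directory         # directory[i][j] -> bucket address
        self.buckets = buckets             # bucket address -> list of records

    def lookup(self, center, extent):
        # The linear scales turn coordinates into grid block indices without I/O.
        i = bisect_right(self.center_scale, center)
        j = bisect_right(self.extent_scale, extent)
        address = self.directory[i][j]     # operation 1: read the grid block
        bucket = self.buckets[address]     # operation 2: read the bucket
        return [rec for rec in bucket if rec[:2] == (center, extent)]

grid_file = GridFile(
    center_scale=[50],
    extent_scale=[10],
    directory=[[0, 1],
               [2, 2]],                    # two grid blocks may share one bucket
    buckets={0: [(20, 5, "i1")],
             1: [(30, 15, "i2")],
             2: [(70, 5, "i3"), (80, 20, "i4")]},
)
print(grid_file.lookup(80, 20))
```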
Rectangle Data (Continued)
• MX-CIF Quadtree
- Based on the quadtree
- Decomposition of space into rectangles
- Each rectangle is associated with the quadtree node corresponding to the smallest block that contains it in its entirety
- Subdivision stops when a node's block contains no rectangles or at a predetermined size
- Rectangles can be associated with both terminal and non-terminal nodes
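The association of a rectangle with its smallest enclosing block can be sketched as below. This is a simplified illustration: `MXCIFNode` is a hypothetical name, blocks are squares given by lower-left corner and side length, child nodes are created on demand, and each node keeps its rectangles in a plain list.

```python
class MXCIFNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner and side length
        self.rects = []                          # rectangles associated with this block
        self.children = {}                       # quadrant name -> MXCIFNode

    def insert(self, rect, min_size=1):
        """Store rect at the smallest block that contains it entirely; return that node."""
        xmin, ymin, xmax, ymax = rect
        h = self.size / 2
        if h >= min_size:                        # stop at the predetermined block size
            quadrants = (("SW", self.x, self.y), ("SE", self.x + h, self.y),
                         ("NW", self.x, self.y + h), ("NE", self.x + h, self.y + h))
            for name, cx, cy in quadrants:
                if cx <= xmin and xmax <= cx + h and cy <= ymin and ymax <= cy + h:
                    if name not in self.children:
                        self.children[name] = MXCIFNode(cx, cy, h)
                    return self.children[name].insert(rect, min_size)
        self.rects.append(rect)                  # no child block contains it entirely
        return self

root = MXCIFNode(0, 0, 16)
small = root.insert((1, 1, 3, 3))       # fits inside a 4 x 4 block
center = root.insert((6, 6, 10, 10))    # straddles the root's center
print(small.size, center is root)
```

A rectangle that crosses a block's dividing lines stays at that block, which is why non-terminal nodes can also hold rectangles.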
Line Data
• PM1 Quadtree
- Based on regular decomposition of space
- Partitioning occurs as long as a block contains more than one line segment, unless the line segments are all incident at the same vertex in the block
- Vertex-based implementation
- Useful because the space requirements for polyhedral objects are smaller than those of a conventional octree
Line Data (Continued)
• PMR Quadtree
- Edge-based variant of the PM quadtree
- Uses a probabilistic splitting rule
- A block contains a variable number of line segments
- Each line segment is inserted into all blocks that it intersects or occupies
- If a block has more line segments than permitted, it is divided into four blocks once and only once
- During deletion, the line segment is removed from all blocks, and the blocks are checked for merging
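The "split once and only once" rule can be sketched for insertion as follows; deletion and merging are not shown. The names `PMRNode` and `segment_hits_box` are hypothetical, the segment/block test is a standard slab clipping test chosen for this sketch, and blocks are treated as closed squares.

```python
def segment_hits_box(seg, x, y, size):
    """True when the segment touches the closed square [x, x+size] x [y, y+size]."""
    (x1, y1), (x2, y2) = seg
    t0, t1 = 0.0, 1.0
    for p, d, lo, hi in ((x1, x2 - x1, x, x + size), (y1, y2 - y1, y, y + size)):
        if d == 0:
            if p < lo or p > hi:
                return False              # parallel to this slab and outside it
        else:
            ta, tb = (lo - p) / d, (hi - p) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

class PMRNode:
    def __init__(self, x, y, size, threshold):
        self.x, self.y, self.size, self.threshold = x, y, size, threshold
        self.segments = []                # segments stored here, if this is a leaf
        self.children = None              # [NW, NE, SW, SE], if internal

    def insert(self, seg):
        if not segment_hits_box(seg, self.x, self.y, self.size):
            return
        if self.children is not None:
            for child in self.children:   # a segment goes into every block it meets
                child.insert(seg)
            return
        self.segments.append(seg)
        if len(self.segments) > self.threshold:
            self._split_once()

    def _split_once(self):
        h, x, y, t = self.size / 2, self.x, self.y, self.threshold
        self.children = [PMRNode(x, y + h, h, t), PMRNode(x + h, y + h, h, t),
                         PMRNode(x, y, h, t), PMRNode(x + h, y, h, t)]
        for seg in self.segments:
            for child in self.children:
                if segment_hits_box(seg, child.x, child.y, child.size):
                    child.segments.append(seg)   # no further split here, even if over threshold
        self.segments = []

root = PMRNode(0, 0, 8, threshold=2)
for seg in [((1, 1), (3, 1)), ((1, 2), (3, 2)), ((1, 3), (3, 3))]:
    root.insert(seg)
sw = root.children[2]
print(len(sw.segments), sw.children is None)
```

After the third segment the root splits once; its SW block then holds all three segments, above the threshold, but it is not split again until a later insertion reaches it.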