Date post: | 24-Dec-2015 |
Category: |
Documents |
Upload: | hester-lamb |
View: | 215 times |
Download: | 0 times |
R-Trees
• Extension of B+-trees. Collection of d-dimensional rectangles. A point in d-dimensions is a trivial rectangle.
Non-rectangular Data
• Non-rectangular data may be represented by minimum bounding rectangles (MBRs).
Operations
• Insert
• Delete
• Find all rectangles that intersect a query rectangle.
• Good for large rectangle collections stored on disk.
R-Trees—Structure
• Data nodes (leaves) contain rectangles.
• Index nodes (non-leaves) contain MBRs for data in subtrees.
• MBR for rectangles or MBRs in a non-root node is stored in parent node.
R-Trees—Structure
• R-tree of order M. Each node other than the root has between m
<= ceil(M/2) and M rectangles/MBRs.• Assume m = ceil(M/2) henceforth.
Typically, m = ceil(M/2). Root has between 2 and M rectangles/MBRs. Each index node has as many MBRs as
children. All data nodes are at the same level.
Example• Possible R-tree of order 4 with 12 leaves.
a b c d e f g h i j k l
m n o p
Leaves are data nodes that contain 4 input rectangles each.
a-p are MBRs
Query
• Start at root and find all MBRs that overlap query.• Search corresponding subtrees recursively.
Insert• Similar to insertion into B+-tree but may
insert into any leaf; leaf splits in case capacity exceeded. Which leaf to insert into? How to split a node?
Insert—Leaf Selection• Follow a path from root to leaf.• At each node move into subtree whose MBR area
increases least with addition of new rectangle.
mn
o p
Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A
and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A
and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node• Split set of M+1 rectangles/MBRs into 2 sets A
and B. A and B each have at least m rectangles/MBRs. Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node• Exhaustive search for best A and B.
Compute area(MBR(A)) + area(MBR(B)) for each possible A.
Note—for each A, the B is unique. Select partition that minimizes this sum.
• When |A| = m = ceil(M/2), number of choices for A is
(M+1)!
m!(M+1-m)!
Impractical for large M.
Insert—Split A Node• Grow A and B using a clustering strategy.
Start with a seed rectangle a for A and b for B. Grow A and B one rectangle at a time. Stop when the M+1 rectangles have been
partitioned into A and B.
Insert—Split A Node• Quadratic Method—seed selection.
Let S be the set of M+1 rectangles to be partitioned.
Find a and b inS that maximizearea(MBR(a,b)) – area(a) – area(b)
M = 8, m = 4
Insert—Split A Node• Quadratic Method—seed selection.
Let S be the set of M+1 rectangles to be partitioned.
Find a and b inS that maximizearea(MBR(a,b)) – area(a) – area(b)
M = 8, m = 4
Insert—Split A Node• Quadratic Method—assign remaining
rectangles/MBRs. Find an unassigned rectangle c that maximizes
|area(MBR(A,c)) – area(MBR(A))
- (area(MBR(B,c)) – area(MBR(B)))|
M = 8, m = 4
Insert—Split A Node• Quadratic Method—assign remaining
rectangles/MBRs. Find an unassigned rectangle c that maximizes
|area(MBR(A,c)) – area(MBR(A))
- (area(MBR(B,c)) – area(MBR(B)))|
M = 8, m = 4
Insert—Split A Node• Quadratic Method—assign remaining
rectangles/MBRs. Assign c to partition whose area increases least.
M = 8, m = 4
Insert—Split A Node• Quadratic Method—assign remaining rectangles/MBRs.
Continue assigning in this way until all remaining rectangles must necessarily be assigned to one of the two partitions for that partition to have m rectangles.
M = 8, m = 4
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Separation in x-dimension
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Rectangles with max x-separation
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Divide by x-width to normalize
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Separation in y-dimension
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Rectangles with max y-separation
Insert—Split A Node• Linear Method—seed selection.
Choose a and b to have maximum normalized separation.
M = 8, m = 4
Divide by y-width to normalize
Insert—Split A Node• Linear Method—assign remainder.
Assign remaining rectangles in random order. Rectangle is assigned to partition whose MBR area
increases least. Stop when all remaining rectangles must be assigned to
one of the partitions so that the partition has its minimum required m rectangles.
M = 8, m = 4
Delete
• If leaf doesn’t become deficient, simply readjust MBRs in path from root.
• If leaf becomes deficient, get from nearest sibling (if possible) and readjust MBRs.
• Combine with sibling as in B+ tree.
• Could instead do a more global reorganization to get better R-tree.
Variants
• R*-tree Leaf selection and node overflows in insertion
handled differently.
• Hilbert R-tree
Related Structures
• R+-tree Index nodes have non-overlapping rectangles. A data object may be represented in several
data nodes. No upper bound on size of a data node. No bounds (lower/upper) on degree of an index
node.