
2IL05 Data Structures Spring 2010

Date post: 11-Jul-2015
Upload: databaseguys
2IL05 Data Structures Spring 2010 Lecture 10: Range Searching
Page 1: 2IL05 Data Structures Spring 2010

2IL05 Data Structures

Spring 2010

Lecture 10: Range Searching

Page 2: 2IL05 Data Structures Spring 2010

Augmenting data structures

Methodology for augmenting a data structure

Choose an underlying data structure.

Determine additional information to maintain.

Verify that we can maintain additional information for existing data structure operations.

Develop new operations.

You don’t need to do these steps in strict order!

Red-black trees are very well suited to augmentation …

Page 3: 2IL05 Data Structures Spring 2010

Augmenting red-black trees

Theorem: Augment a red-black tree with a field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then we can maintain the values of f in all nodes during insert and delete without affecting the O(log n) performance.

When we alter information in x, changes propagate only upward on the search path for x …

Examples

OS-tree: new operations OS-Select and OS-Rank

Interval tree: new operation Interval-Search
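
As a minimal sketch of the methodology above, the following Python code augments a search tree with a subtree-size field, which depends only on the node and its two children, and uses it for OS-Select and OS-Rank. For brevity it uses a plain, unbalanced binary search tree rather than a red-black tree; the names Node, insert, os_select, and os_rank are illustrative assumptions, and keys are assumed to be distinct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # augmented field: number of nodes in the subtree rooted here


def size(x: Optional[Node]) -> int:
    return x.size if x is not None else 0


def insert(root: Optional[Node], key: int) -> Node:
    """Plain BST insert (no rebalancing) that repairs size along the search path."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    # size[x] depends only on the children of x, so one update per path node suffices
    root.size = size(root.left) + size(root.right) + 1
    return root


def os_select(x: Node, i: int) -> int:
    """Return the i-th smallest key (1-based); assumes 1 <= i <= x.size."""
    r = size(x.left) + 1
    if i == r:
        return x.key
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


def os_rank(root: Node, key: int) -> int:
    """Return the rank (1-based) of key; assumes distinct keys."""
    rank = 0
    x = root
    while x is not None:
        if key < x.key:
            x = x.left
        else:
            rank += size(x.left) + 1
            if key == x.key:
                return rank
            x = x.right
    raise KeyError(key)
```

In a red-black tree the same size field would also have to be repaired after each rotation, which only touches the two rotated nodes.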

Page 4: 2IL05 Data Structures Spring 2010

Range Searching

Page 5: 2IL05 Data Structures Spring 2010

Example: Database for personnel administration (name, address, date of birth, salary, …)

Query: Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month.

Application: Database queries

More parameters?

Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month and have between two and four children.

➨ more dimensions

Page 6: 2IL05 Data Structures Spring 2010

“Report all employees born between 1950 and 1955 who earn between $3000 and $4000 per month and have between two and four children.”

Application: Database queries

Rectangular range query or orthogonal range query

Page 7: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

P={p1, p2, …, pn} set of points on the real line

Query: given a query interval [x : x’] report all pi ∈ P with pi ∈ [x : x’].

Solution: Use a balanced binary search tree T.

leaves of T store the points pi

internal nodes store splitting values (node v stores value xv)

Page 8: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

[Figure: balanced binary search tree storing the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98 at its leaves, with the query range [18 : 77] and the leaves μ and μ’ where the two searches end]

Query [x : x’] ➨ search with x and x’ ➨ end in two leaves μ and μ’

Report
1. all leaves between μ and μ’
2. possibly points stored at μ and μ’

Page 9: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

[Figure: the same tree and query range [18 : 77], showing the leaves between μ and μ’]

How do we find all leaves between μ and μ’?

Page 10: 2IL05 Data Structures Spring 2010

How do we find all leaves between μ and μ’ ?

Solution: They are the leaves of the subtrees rooted at nodes v in between the two search paths whose parents are on the search paths.

➨ we need to find the node vsplit where the search paths split

1-Dimensional range searching

Page 11: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

FindSplitNode(T, x, x’)
► Input. A tree T and two values x and x’ with x ≤ x’
► Output. The node v where the paths to x and x’ split, or the leaf where both paths end.
v = root(T)
while v is not a leaf and (x’ ≤ xv or x > xv)
    do if x’ ≤ xv
        then v = left(v)
        else v = right(v)
return v
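
A minimal Python sketch of FindSplitNode, under the assumption that leaves store the points and each internal node stores the largest value of its left subtree as its splitting value. The Node class and the build helper are illustrative and not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    value: float  # the point at a leaf, the splitting value x_v at an internal node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build(points: Sequence[float]) -> Node:
    """Build a balanced tree from a non-empty, sorted list of distinct points."""
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2
    # the splitting value is the largest point in the left subtree
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def find_split_node(root: Node, x: float, x2: float) -> Node:
    """Return the node where the search paths to x and x2 split (x <= x2)."""
    v = root
    while not v.is_leaf() and (x2 <= v.value or x > v.value):
        v = v.left if x2 <= v.value else v.right
    return v
```

For example, with the fourteen points from the figure, `find_split_node(build(points), 18, 77)` stops at the root, because 18 goes left and 77 goes right there.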

Page 12: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

Starting from vsplit follow the search path to x.

when the path goes left, report all leaves in the right subtree

check if µ ∈ [x : x’]

Starting from vsplit follow the search path to x’.

when the path goes right, report all leaves in the left subtree

check if µ’ ∈ [x : x’]

Page 13: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

1DRangeQuery(T, [x : x’])
► Input. A binary search tree T and a range [x : x’].
► Output. All points stored in T that lie in the range.
vsplit = FindSplitNode(T, x, x’)
if vsplit is a leaf
    then Check if the point stored at vsplit must be reported.
    else (Follow the path to x and report the points in subtrees right of the path)
        v = left(vsplit)
        while v is not a leaf
            do if x ≤ xv
                then ReportSubtree(right(v))
                    v = left(v)
                else v = right(v)
        Check if the point stored at the leaf v must be reported.
        Similarly, follow the path to x’, report the points in subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported.
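
The query can be sketched in Python as follows. This is an illustrative version, not the lecture's code: the tree layout (points at the leaves, largest left-subtree value as splitting value) and the names Node, build, and range_query_1d are assumptions. Points are reported in the order the subtrees are visited, not in sorted order.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    value: float  # the point at a leaf, the splitting value x_v otherwise
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build(points: Sequence[float]) -> Node:
    """Build a balanced tree from a non-empty, sorted list of distinct points."""
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def report_subtree(v: Node, out: List[float]) -> None:
    """Append all points stored in the leaves below v."""
    if v.is_leaf():
        out.append(v.value)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query_1d(root: Node, x: float, x2: float) -> List[float]:
    out: List[float] = []
    # FindSplitNode: walk down while both searches go the same way
    split = root
    while not split.is_leaf() and (x2 <= split.value or x > split.value):
        split = split.left if x2 <= split.value else split.right
    if split.is_leaf():
        if x <= split.value <= x2:
            out.append(split.value)
        return out
    # Path to x: whenever it goes left, the whole right subtree is in range
    v = split.left
    while not v.is_leaf():
        if x <= v.value:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if x <= v.value <= x2:
        out.append(v.value)
    # Path to x2: whenever it goes right, the whole left subtree is in range
    v = split.right
    while not v.is_leaf():
        if x2 > v.value:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if x <= v.value <= x2:
        out.append(v.value)
    return out
```

On the fourteen points of the example tree, the query [18 : 77] reports the eight points 19, 23, 30, 37, 49, 59, 62, and 70.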

Page 14: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

Correctness? Need to show two things:

1. every reported point lies in the query range

2. every point in the query range is reported.

Page 15: 2IL05 Data Structures Spring 2010

1-Dimensional range searching

Query time? ReportSubtree takes O(1 + reported points) time ➨ total query time = O(log n + reported points)

Storage? O(n)

Page 16: 2IL05 Data Structures Spring 2010

2-Dimensional range searching

P={p1, p2, …, pn} set of points in the plane

Query: given a query rectangle [x : x’] × [y : y’] report all pi ∈ P with pi ∈ [x : x’] × [y : y’], that is, px ∈ [x : x’] and py ∈ [y : y’]

➨ a 2-dimensional range query is composed of two 1-dimensional sub-queries

How can we generalize our 1-dimensional solution to 2 dimensions?

for now: no two points have the same x-coordinate, no two points have the same y-coordinate

Page 17: 2IL05 Data Structures Spring 2010

Back to one dimension …

[Figure: the balanced binary search tree from the 1-dimensional example, together with its points in sorted order: 3 10 19 23 30 37 49 59 62 70 80 89 93 98]

Page 19: 2IL05 Data Structures Spring 2010

And now in two dimensions …

[Figure: points p1, …, p10 in the plane, subdivided by splitting lines l1, …, l9, next to the corresponding tree with the lines l1, …, l9 at the internal nodes and the points at the leaves]

Split alternating on x- and y-coordinate

2-dimensional kd-tree

Page 20: 2IL05 Data Structures Spring 2010

Kd-trees

BuildKdTree(P, depth)
► Input. A set of points P and the current depth.
► Output. The root of a kd-tree storing P.
if P contains only one point
    then return a leaf storing this point
    else if depth is even
        then split P into two subsets with a vertical line l through the median x-coordinate of the points in P. Let P1 be the set of points to the left or on l, and let P2 be the set of points to the right of l.
        else split P into two subsets with a horizontal line l through the median y-coordinate of the points in P. Let P1 be the set of points below or on l, and let P2 be the set of points above l.
    vleft = BuildKdTree(P1, depth+1)
    vright = BuildKdTree(P2, depth+1)
    Create a node v storing l, make vleft the left child of v, and make vright the right child of v
    return v
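
A minimal Python sketch of the construction, assuming distinct x- and y-coordinates. For simplicity it re-sorts the points at every level, which costs O(n log² n) overall instead of the O(n log n) obtained with linear-time median finding or presorting; the KdNode class and the leaves helper are illustrative names, not part of the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Optional[Point] = None   # set only at leaves
    split: Optional[float] = None   # coordinate of the splitting line (internal nodes)
    axis: int = 0                   # 0: vertical line (split on x), 1: horizontal (split on y)
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def build_kd_tree(points: List[Point], depth: int = 0) -> KdNode:
    """Build a kd-tree for a non-empty list of points with distinct coordinates."""
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                # even depth: split on x, odd depth: split on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2       # P1 gets the points left of (below) or on the line
    return KdNode(
        split=pts[mid - 1][axis],
        axis=axis,
        left=build_kd_tree(pts[:mid], depth + 1),
        right=build_kd_tree(pts[mid:], depth + 1),
    )


def leaves(v: KdNode) -> List[Point]:
    """Collect the points stored below v (handy for inspecting the result)."""
    if v.point is not None:
        return [v.point]
    return leaves(v.left) + leaves(v.right)
```

Each call halves the point set, so the tree has depth O(log n) and one leaf per point.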

Page 21: 2IL05 Data Structures Spring 2010

Kd-trees

Running time?

T(n) = O(1)              if n = 1
       O(n) + 2T(n/2)    if n > 1

➨ T(n) = O(n log n)

presort to avoid linear time median finding …

Page 22: 2IL05 Data Structures Spring 2010

Kd-trees

Storage? O(n)

Page 23: 2IL05 Data Structures Spring 2010

Querying a kd-tree

Each node v corresponds to a region region(v). All points which are stored in the subtree rooted at v lie in region(v).

➨
1. if a region is contained in query rectangle, report all points in region.
2. if region is disjoint from query rectangle, report nothing.
3. if region intersects query rectangle, refine search (test children or point stored in region if no children)

Page 24: 2IL05 Data Structures Spring 2010

Querying a kd-tree

[Figure: a kd-tree subdivision on points p1, …, p13 with a query rectangle, next to the corresponding tree]

Disclaimer: This tree cannot have been constructed by BuildKdTree …

Page 25: 2IL05 Data Structures Spring 2010

Querying a kd-tree

SearchKdTree(v, R)
► Input. The root of (a subtree of) a kd-tree, and a range R.
► Output. All points at leaves below v that lie in the range.
if v is a leaf
    then report the point stored at v if it lies in R
    else if region(left(v)) is fully contained in R
            then ReportSubtree(left(v))
            else if region(left(v)) intersects R
                then SearchKdTree(left(v), R)
        if region(right(v)) is fully contained in R
            then ReportSubtree(right(v))
            else if region(right(v)) intersects R
                then SearchKdTree(right(v), R)
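
A minimal Python sketch of the search, under some simplifying assumptions: regions and the query range are passed as tuples (xlo, xhi, ylo, yhi), regions are computed on the way down rather than stored, and every region is treated as a closed box. Treating the right or upper child's region as closed only makes the tests slightly conservative, it does not change the reported points. The names KdNode, build_kd_tree, and search_kd_tree are illustrative.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xlo, xhi, ylo, yhi)


@dataclass
class KdNode:
    point: Optional[Point] = None
    split: Optional[float] = None
    axis: int = 0
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def build_kd_tree(points: List[Point], depth: int = 0) -> KdNode:
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return KdNode(split=pts[mid - 1][axis], axis=axis,
                  left=build_kd_tree(pts[:mid], depth + 1),
                  right=build_kd_tree(pts[mid:], depth + 1))


def contained(reg: Box, R: Box) -> bool:
    return R[0] <= reg[0] and reg[1] <= R[1] and R[2] <= reg[2] and reg[3] <= R[3]


def intersects(reg: Box, R: Box) -> bool:
    return reg[0] <= R[1] and R[0] <= reg[1] and reg[2] <= R[3] and R[2] <= reg[3]


def child_regions(v: KdNode, reg: Box) -> Tuple[Box, Box]:
    """Cut the region of v along its splitting line."""
    xlo, xhi, ylo, yhi = reg
    if v.axis == 0:
        return (xlo, v.split, ylo, yhi), (v.split, xhi, ylo, yhi)
    return (xlo, xhi, ylo, v.split), (xlo, xhi, v.split, yhi)


def report_subtree(v: KdNode, out: List[Point]) -> None:
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search_kd_tree(v: KdNode, R: Box, reg: Box = (-inf, inf, -inf, inf),
                   out: Optional[List[Point]] = None) -> List[Point]:
    if out is None:
        out = []
    if v.point is not None:  # leaf: test the stored point directly
        px, py = v.point
        if R[0] <= px <= R[1] and R[2] <= py <= R[3]:
            out.append(v.point)
        return out
    for child, creg in zip((v.left, v.right), child_regions(v, reg)):
        if contained(creg, R):
            report_subtree(child, out)
        elif intersects(creg, R):
            search_kd_tree(child, R, creg, out)
    return out
```

Computing regions during the descent avoids storing a box at every node, at the cost of passing one extra argument through the recursion.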

Query time?

Page 26: 2IL05 Data Structures Spring 2010

Querying a kd-tree: Analysis

Time to traverse subtree and report points stored in leaves is linear in number of leaves

➨ all calls to ReportSubtree together take O(k) time, where k is the total number of reported points

need to bound number of nodes visited that are not in the traversed subtrees (grey nodes)

the query range properly intersects the region of each such node

We are only interested in an upper bound …

How many regions can a vertical line intersect?

Page 27: 2IL05 Data Structures Spring 2010

Querying a kd-tree: Analysis

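Question: How many regions can a vertical line intersect?

Q(n): number of intersected regions

Answer 1: Q(n) = 1 + Q(n/2) (not correct: at a node split by a horizontal line, the vertical line intersects both child regions)

Answer 2:

Q(n) = O(1)            if n = 1
       2 + 2Q(n/4)     if n > 1

Master theorem ➨ Q(n) = O(√n)

[Figure: the kd-tree subdivision on points p1, …, p13 crossed by a vertical line]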
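
As a worked check of the bound, the recurrence Q(n) = 2 + 2Q(n/4) can also be unrolled directly. The derivation below is a sketch that assumes n is a power of 4 and writes c for the constant hidden in Q(1) = O(1).

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \sum_{j=1}^{i} 2^{j} + 2^{i}\,Q\!\left(n/4^{i}\right)
      = 2^{i+1} - 2 + 2^{i}\,Q\!\left(n/4^{i}\right).
\end{align*}
With $i = \log_4 n$ we get $2^{i} = 2^{\log_4 n} = n^{1/2}$, hence
\[
Q(n) = 2\sqrt{n} - 2 + c\,\sqrt{n} = O(\sqrt{n}).
\]
```

This agrees with the master theorem with a = 2 and b = 4: the critical exponent is log_4 2 = 1/2 and the additive term is constant, so the first case applies.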

Page 28: 2IL05 Data Structures Spring 2010

KD-trees

Theorem: A kd-tree for a set of n points in the plane uses O(n) storage and can be built in O(n log n) time. A rectangular range query on the kd-tree takes O(√n + k) time, where k is the number of reported points.

If the number k of reported points is small, then the query time O(√n + k) is relatively high.

Can we do better?

Trade storage for query time …

Page 29: 2IL05 Data Structures Spring 2010

Back to 1 dimension … (again)

A 1DRangeQuery with [x : x’] gives us all points whose x-coordinates lie in the x-interval [x : x’] of the query rectangle [x : x’] × [y : y’].

These points are stored in O(log n) subtrees.

Canonical subset of node v: the points stored in the leaves of the subtree rooted at v

Idea: store canonical subsets in binary search tree on y-coordinate

Page 30: 2IL05 Data Structures Spring 2010

Range trees

Range tree

The main tree is a balanced binary search tree T built on the x-coordinate of the points in P.

For any internal or leaf node ν in T, the canonical subset P(ν) is stored in a balanced binary search tree Tassoc(ν) on the y-coordinate of the points. The node ν stores a pointer to the root of Tassoc(ν), which is called the associated structure of ν.

Page 31: 2IL05 Data Structures Spring 2010

Range trees

Build2DRangeTree(P)
► Input. A set P of points in the plane.
► Output. The root of a 2-dimensional range tree.
Construct the associated structure: Build a binary search tree Tassoc on the set Py of y-coordinates of the points in P. Store at the leaves of Tassoc not just the y-coordinates of the points in Py, but the points themselves.
if P contains only one point
    then create a leaf v storing this point, and make Tassoc the associated structure of v
    else split P into two subsets; one subset Pleft contains the points with x-coordinate less than or equal to xmid, the median x-coordinate, and the other subset Pright contains the points with x-coordinate larger than xmid
        vleft = Build2DRangeTree(Pleft)
        vright = Build2DRangeTree(Pright)
        create a node v storing xmid, make vleft the left child of v, make vright the right child of v, and make Tassoc the associated structure of v
return v

Page 32: 2IL05 Data Structures Spring 2010

Range trees

Running time? O(n log n)

presort to build binary search trees in linear time …

Page 33: 2IL05 Data Structures Spring 2010

Range trees: Storage

Lemma: A range tree on a set of n points in the plane requires O(n log n) storage.

Proof:

each point is stored only once per level

storage for associated structures is linear in number of points

there are O(log n) levels

Page 34: 2IL05 Data Structures Spring 2010

Querying a range tree

2DRangeQuery(T, [x : x’] × [y : y’])
► Input. A 2-dimensional range tree T and a range [x : x’] × [y : y’].
► Output. All points in T that lie in the range.
vsplit = FindSplitNode(T, x, x’)
if vsplit is a leaf
    then check if the point stored at vsplit must be reported
    else (Follow the path to x and call 1DRangeQuery on the subtrees right of the path)
        v = left(vsplit)
        while v is not a leaf
            do if x ≤ xv
                then 1DRangeQuery(Tassoc(right(v)), [y : y’])
                    v = left(v)
                else v = right(v)
        Check if the point stored at v must be reported
        Similarly, follow the path from right(vsplit) to x’, call 1DRangeQuery with the range [y : y’] on the associated structures of subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported.
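
A compact Python sketch of a range tree with its query, under loudly stated simplifications: each associated structure is a list of the canonical subset sorted by y and searched with binary search, standing in for the balanced binary search tree Tassoc of the slides, and the build re-sorts at every node instead of presorting. The names RNode, build_range_tree, and range_query_2d are illustrative, and points are assumed to have distinct x-coordinates.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RNode:
    xmid: float                      # splitting value (x of the point at a leaf)
    assoc: List[Point]               # canonical subset, sorted by y-coordinate
    ys: List[float]                  # y-coordinates of assoc, for binary search
    left: Optional["RNode"] = None
    right: Optional["RNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build_range_tree(pts_by_x: List[Point]) -> RNode:
    """pts_by_x: non-empty list of points sorted by x, distinct x-coordinates."""
    assoc = sorted(pts_by_x, key=lambda p: p[1])
    ys = [p[1] for p in assoc]
    if len(pts_by_x) == 1:
        return RNode(pts_by_x[0][0], assoc, ys)
    mid = (len(pts_by_x) + 1) // 2
    return RNode(pts_by_x[mid - 1][0], assoc, ys,
                 build_range_tree(pts_by_x[:mid]), build_range_tree(pts_by_x[mid:]))


def range_query_2d(root: RNode, x: float, x2: float, y: float, y2: float) -> List[Point]:
    out: List[Point] = []

    def query_assoc(v: RNode) -> None:
        # 1-dimensional query on the y-sorted canonical subset of v
        out.extend(v.assoc[bisect_left(v.ys, y):bisect_right(v.ys, y2)])

    def check_leaf(v: RNode) -> None:
        px, py = v.assoc[0]
        if x <= px <= x2 and y <= py <= y2:
            out.append(v.assoc[0])

    split = root
    while not split.is_leaf() and (x2 <= split.xmid or x > split.xmid):
        split = split.left if x2 <= split.xmid else split.right
    if split.is_leaf():
        check_leaf(split)
        return out
    v = split.left                   # path to x: query subtrees right of the path
    while not v.is_leaf():
        if x <= v.xmid:
            query_assoc(v.right)
            v = v.left
        else:
            v = v.right
    check_leaf(v)
    v = split.right                  # path to x2: query subtrees left of the path
    while not v.is_leaf():
        if x2 > v.xmid:
            query_assoc(v.left)
            v = v.right
        else:
            v = v.left
    check_leaf(v)
    return out
```

Every point appears in one y-sorted list per level of the main tree, which is where the O(n log n) storage comes from; each of the O(log n) selected subtrees costs one logarithmic search plus the points it reports.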

Page 35: 2IL05 Data Structures Spring 2010

Querying a range tree

Running time? O(log² n + k)

Page 36: 2IL05 Data Structures Spring 2010

Range trees

Theorem: A range tree for a set of n points in the plane uses O(n log n) storage and can be built in O(n log n) time. A rectangular range query on the range tree takes O(log² n + k) time, where k is the number of reported points.

Conclusion

              kd-tree        Range tree
storage       O(n)           O(n log n)
query time    O(√n + k)      O(log² n + k)

Page 37: 2IL05 Data Structures Spring 2010

Tutorials this week

Small tutorials on Tuesday 1+2.

Thursday 3+4 big tutorial.

