Making the Pyramid Technique Robust to Query Types and Workloads

Rui Zhang, Beng Chin Ooi, Kian-Lee Tan

Department of Computer Science

National University of Singapore

Singapore

Outline

• Background

• Existing work and limitations

• Our proposal: The P+-tree

• Experimental results

• Conclusion

Problem & Motivation

Problem:

Indexing multidimensional point data

Applications:

• Low dimension: GIS, CAD, medical imaging (X-rays, MRI brain scans)

• High dimension: image databases, video databases, data warehouses

Typical Query Types

• Point Query

• Window Query

[q_0^min, q_0^max], [q_1^min, q_1^max], …, [q_(d-1)^min, q_(d-1)^max]

• Range Query

X(x_0, x_1, …, x_(d-1)), r

• K-Nearest Neighbor Query (kNN query)

X(x_0, x_1, …, x_(d-1)), k

Existing work: Four Strategies

• Data partitioning: R-tree family

• Space partitioning: k-d-tree family

• Dimensionality Reduction: mapping

• Data Compression: VA-file, IQ-tree

Existing work: Comparison

• Low-dimensional space
  – The R-tree family structures

• High-dimensional space
  – Window query: the Pyramid technique, iMinMax
  – kNN query: the IQ-tree, iDistance

Existing work: Limitations

• Limited to certain query types
  – The Pyramid technique, iMinMax: window queries
  – iDistance, the IQ-tree: kNN queries

• Limited to certain workloads
  – The Pyramid technique: hyper-cube-shaped window queries located around the center of the data space

Our proposal: the P+-tree

• Based on the Pyramid tech.

• Support both window and kNN queries

• Robust under different workloads

Review of the Pyramid Tech.

• i: pyramid number

• h_v: height of point v, measured in the i-th dimension (if i < d) or the (i-d)-th dimension (if i >= d)

• Pyramid value: pv_v = i + h_v
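
The mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard Pyramid technique convention that a point belongs to the pyramid of the dimension in which it lies farthest from the center 0.5, with pyramids 0 to d-1 on the low side and d to 2d-1 on the high side. The function name pyramid_value is hypothetical.

```python
from typing import Sequence


def pyramid_value(v: Sequence[float]) -> float:
    """Map a point in the unit hyper-cube [0, 1]^d to its pyramid value i + h_v."""
    d = len(v)
    # Dimension in which the point is farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Assumed convention: pyramids 0..d-1 lie below the center, d..2d-1 above it.
    i = j_max if v[j_max] < 0.5 else j_max + d
    # Height: distance from the center along the pyramid's own dimension.
    h = abs(0.5 - v[i % d])
    return i + h
```

For example, pyramid_value([0.1, 0.6]) falls in pyramid 0 with a height of about 0.4, while pyramid_value([0.5, 0.9]) falls in pyramid 3 with a height of about 0.4. Because the height is always at most 0.5, the integer part of the value identifies the pyramid.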

Sensitivity to location of query window / data distribution

Sensitivity to shape of query

The P+-tree

• Divide the data space into subspaces
  – Based on clustering
  – Divide in the dimension where the two clusters differ most

• Transform the points in each subspace
  – Transform the subspace to the unit hyper-cube, [s_i^min, s_i^max]^d -> [0, 1]^d, so that the Pyramid technique can be applied
  – Move the cluster center to the center of the transformed space, (0.5, 0.5, …, 0.5), the case in which the Pyramid technique is efficient

Space division and data transformation

Transformation function

• A set of d functions, t_0, t_1, …, t_(d-1)

• Requirements:
  – t_i is a bijection from [s_i^min, s_i^max] to [0, 1]
  – t_i is monotonic
  – t_i(c_i) = 0.5

• In equations:
  – t_i(s_i^min) = 0
  – t_i(s_i^max) = 1
  – t_i(c_i) = 0.5

Transformation function

• t_i(x) = (a_i x - b_i)^(e_i), i = 0, 1, …, d-1

• For subspace [s_0^min, s_0^max], [s_1^min, s_1^max], …, [s_(d-1)^min, s_(d-1)^max]:

a_i = 1 / (s_i^max - s_i^min)

b_i = s_i^min / (s_i^max - s_i^min)

e_i = -1 / log2(a_i c_i - b_i)
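
A small Python sketch of the per-dimension transformation follows. It is written with a_i = 1 / (s_i^max - s_i^min) and b_i = s_i^min / (s_i^max - s_i^min), the sign that satisfies the stated requirements t_i(s_i^min) = 0 and t_i(s_i^max) = 1. The function names and the clamping step are assumptions made for illustration.

```python
import math


def transform_params(s_min: float, s_max: float, c: float):
    """Return (a, b, e) for one dimension so that t(s_min)=0, t(s_max)=1, t(c)=0.5."""
    if not s_min < c < s_max:
        raise ValueError("cluster center must lie strictly inside the subspace")
    a = 1.0 / (s_max - s_min)
    b = s_min / (s_max - s_min)
    # a*c - b lies in (0, 1), so its log2 is negative and e is positive.
    e = -1.0 / math.log2(a * c - b)
    return a, b, e


def transform(x: float, a: float, b: float, e: float) -> float:
    """Apply t(x) = (a*x - b)^e."""
    # Clamp guards against bases slightly outside [0, 1] due to rounding.
    base = min(1.0, max(0.0, a * x - b))
    return base ** e
```

With the subspace [2, 6] and cluster center 3, the linear part maps 3 to 0.25, so the exponent comes out as 0.5 and the center lands on 0.25^0.5 = 0.5. Since the exponent is positive, the function is increasing on the subspace.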

The space-tree

SNo, ai, bi, ei are stored in leaf nodes

Space division algorithm

• Clustering data

• Divide the space into two subspaces in the dimension where the two cluster centers differ most (recursively)

• Build the space-tree
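
One division step can be sketched as below. The two cluster centers are taken as given (any clustering algorithm could supply them). The slides do not say where the cut is placed, so putting it midway between the two centers is purely an assumption, as are the function names.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # one (min, max) pair per dimension


def split_dimension(c1: Sequence[float], c2: Sequence[float]) -> int:
    """Dimension in which the two cluster centers differ most."""
    return max(range(len(c1)), key=lambda j: abs(c1[j] - c2[j]))


def split_subspace(bounds: Box, c1: Sequence[float], c2: Sequence[float]):
    """Split a subspace into two along the chosen dimension.

    Assumption: the cut is placed midway between the two centers.
    """
    j = split_dimension(c1, c2)
    cut = (c1[j] + c2[j]) / 2.0
    low, high = list(bounds), list(bounds)
    low[j] = (bounds[j][0], cut)    # part with x[j] below the cut
    high[j] = (cut, bounds[j][1])   # part with x[j] at or above the cut
    return j, cut, low, high
```

Applying split_subspace recursively to each resulting part, with the centers of the clusters that fall inside it, yields the binary structure of the space-tree: each internal node records the dimension and cut, and each leaf is one subspace.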

Build the P+-tree

• The P+-tree is in effect a B+-tree that stores the data points in the leaf nodes, with the P+-value as the key

• P+-value: SNo · 2d + pv(v’)

• For a newly inserted point v, traverse the space-tree to determine the subspace it belongs to

• Transform the point v to v’ and calculate its P+-value

• Insert the point v, with its P+-value as the key
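
The insertion steps can be sketched as follows. This is an illustration under stated assumptions: the space-tree is modeled with two hypothetical classes (Inner and Leaf), a sorted Python list stands in for the B+-tree, and the pyramid assignment follows the usual Pyramid technique convention (farthest-from-center dimension, low side 0..d-1, high side d..2d-1).

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    sno: int            # subspace number (SNo)
    a: List[float]      # per-dimension transformation parameters
    b: List[float]
    e: List[float]


@dataclass
class Inner:
    dim: int            # split dimension
    cut: float          # split position
    left: "Node"        # subspace with x[dim] < cut
    right: "Node"       # subspace with x[dim] >= cut


Node = Union[Leaf, Inner]


def locate(node: Node, v: Sequence[float]) -> Leaf:
    """Traverse the space-tree to the leaf whose subspace contains v."""
    while isinstance(node, Inner):
        node = node.left if v[node.dim] < node.cut else node.right
    return node


def pyramid_value(v: Sequence[float]) -> float:
    d = len(v)
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    i = j_max if v[j_max] < 0.5 else j_max + d
    return i + abs(0.5 - v[i % d])


def p_plus_value(leaf: Leaf, v: Sequence[float]) -> float:
    """P+-value = SNo * 2d + pv(v'), where v' is v transformed by the leaf."""
    d = len(v)
    v_t = [min(1.0, max(0.0, leaf.a[i] * v[i] - leaf.b[i])) ** leaf.e[i]
           for i in range(d)]
    return leaf.sno * 2 * d + pyramid_value(v_t)


def insert(index: list, tree: Node, v: Sequence[float]) -> None:
    """Insert v into the sorted list (B+-tree stand-in), keyed by its P+-value."""
    key = p_plus_value(locate(tree, v), v)
    bisect.insort(index, (key, tuple(v)))
```

Because pyramid values lie in [0, 2d), multiplying SNo by 2d gives each subspace its own disjoint key range, so points from different subspaces never interleave in the index.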

Window search algorithm

• Traverse the space-tree to see which subspaces are intersected by the query

• For each intersected subspace, transform the query according to the transformation function for the subspace

• Search the subspace according to the transformed query
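
The first two steps above can be sketched as follows; the third step would hand each transformed query to the Pyramid technique's own window search, restricted to keys of that subspace. The classes Inner and Leaf are a hypothetical model of the space-tree. Clamping the linear part to [0, 1] is an assumption that has the effect of clipping the query to the subspace before transforming it.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

Box = List[Tuple[float, float]]  # one (min, max) pair per dimension


@dataclass
class Leaf:
    sno: int            # subspace number (SNo)
    a: List[float]      # per-dimension transformation parameters
    b: List[float]
    e: List[float]


@dataclass
class Inner:
    dim: int            # split dimension
    cut: float          # split position
    left: "Node"        # subspace with x[dim] < cut
    right: "Node"       # subspace with x[dim] >= cut


Node = Union[Leaf, Inner]


def transform_query(leaf: Leaf, q: Box) -> Box:
    """Map a query window into the leaf's unit hyper-cube.

    Each t_i is increasing, so interval endpoints map to interval endpoints.
    """
    def t(i: int, x: float) -> float:
        return min(1.0, max(0.0, leaf.a[i] * x - leaf.b[i])) ** leaf.e[i]

    return [(t(i, lo), t(i, hi)) for i, (lo, hi) in enumerate(q)]


def transformed_queries(node: Node, q: Box) -> Iterator[Tuple[int, Box]]:
    """Yield (SNo, transformed query) for every subspace the query intersects."""
    if isinstance(node, Leaf):
        yield node.sno, transform_query(node, q)
        return
    lo, hi = q[node.dim]
    if lo < node.cut:       # query reaches into the lower subspace
        yield from transformed_queries(node.left, q)
    if hi >= node.cut:      # query reaches into the upper subspace
        yield from transformed_queries(node.right, q)
```

A query that straddles a cut is reported once per intersected subspace, each time expressed in that subspace's own transformed coordinates.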

kNN search algorithm

• Start from a small window query

• Gradually increase the side length of the query window until the k nearest neighbors are found
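
The growing-window idea can be sketched as below. A linear scan stands in for the P+-tree window search, and the starting half-side r0 and the growth factor are arbitrary illustrative values. The stopping rule (the k-th candidate must lie within Euclidean distance r of the query) is an assumption added so that no closer point can remain outside the window; the slides only say the window grows until the k nearest neighbors are found.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def window_query(points: Sequence[Point], center: Point, r: float) -> List[Point]:
    """Stand-in for the index's window search: points in the cube of half-side r."""
    return [p for p in points
            if all(abs(p[i] - center[i]) <= r for i in range(len(center)))]


def knn(points: Sequence[Point], q: Point, k: int,
        r0: float = 0.05, growth: float = 2.0) -> List[Point]:
    """Find the k nearest neighbors of q by repeatedly enlarging a window query."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    r = r0
    while True:
        candidates = sorted(window_query(points, q, r),
                            key=lambda p: math.dist(p, q))
        # Every point within distance r of q lies inside the cube, so once the
        # k-th candidate is within r, no unseen point can be closer.
        if len(candidates) >= k and math.dist(candidates[k - 1], q) <= r:
            return candidates[:k]
        r *= growth
```

The choice of r0 and growth trades the number of window queries against the size of each one: a small start with slow growth issues many cheap queries, a large start issues few expensive ones.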

Experiments: Window Queries

Experiments: Partial Window Queries

Experiments: kNN Queries

