Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions

Page 1

Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kai Ngai, Ben Kao, Sunil Prabhakar

City University of Hong Kong

Hong Kong Polytechnic University

University of Hong Kong

Purdue University

Page 2

Multi-dimensional Uncertain Data

Moving objects: an object sends its location to the server whenever its distance from the previously reported location exceeds a certain threshold.

Sensor readings: each sensor periodically reports the temperature, humidity, UV index, … in its neighborhood.

Querying the (uncertain) data stored in the server directly, as if it were exact, is meaningless: between two reports, the actual value may differ from the stored one.
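The distance-threshold update policy for moving objects can be sketched as follows. This is a minimal illustration, not code from the paper; `ReportingClient` and `server_uncertainty_region` are hypothetical names. It also shows where the uncertainty comes from: between reports, the server only knows that the object is within the threshold of its recorded location.

```python
import math


class ReportingClient:
    """Hypothetical client following the distance-threshold update policy."""

    def __init__(self, start, threshold):
        self.threshold = threshold
        self.reported = start  # last location sent to the server

    def move_to(self, location):
        """Return the location to send to the server, or None if no report is due."""
        if math.dist(location, self.reported) > self.threshold:
            self.reported = location
            return location
        return None


def server_uncertainty_region(recorded, threshold):
    """Between reports, the true location lies within `threshold` of the
    recorded one, i.e. inside a circle given as (center, radius)."""
    return recorded, threshold
```

For example, a client created at (0, 0) with threshold 5 sends nothing when it moves to (3, 4), since that is exactly 5 away, but reports (6, 0).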

Page 3

Uncertainty Modeling

[Figure: Client 1's recorded location in the database, surrounded by its uncertainty region, a circle whose radius is the distance threshold.]

An object’s location is described by a probability density function (pdf) over its uncertainty region.

Page 4

Probabilistic Range Search

[Figure: the uncertainty regions of Clients 1–6 and the query region rq (the area of CityU).]

Find the clients that are currently in CityU (the search region rq) with at least 50% probability (the probability threshold). This is a probabilistic range query.

Page 5

Appearance Probability

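Appearance probability of an object o with uncertainty region o.ur, for a query region rq:

$$P_{app}(o, r_q) = \int_{o.ur \cap r_q} o.pdf(x)\,dx$$

[Figure: Client 1's uncertainty region ur overlapping the query region rq; the overlap is rq ∩ ur.]

E.g., for a uniform pdf:

$$P_{app}(o, r_q) = \frac{\text{Area}(o.ur \cap r_q)}{\text{Area}(o.ur)}$$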
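A minimal Monte Carlo sketch of the uniform-pdf case, assuming a circular uncertainty region and an axis-parallel rectangular rq given as one (lo, hi) interval per dimension; the function name and its parameters are hypothetical. It estimates Area(o.ur ∩ rq) / Area(o.ur) as the fraction of points, drawn uniformly inside the circle, that fall in rq.

```python
import math
import random


def appearance_probability_uniform_disk(center, radius, rq, samples=100_000, seed=0):
    """Monte Carlo estimate of P_app(o, rq) for a uniform pdf over a disk.

    rq is [(x_lo, x_hi), (y_lo, y_hi)]. A fixed seed keeps results repeatable.
    """
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = rq
    hits = 0
    for _ in range(samples):
        # Uniform point in a disk: the radius uses the square root of a uniform variate.
        r = radius * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        x = center[0] + r * math.cos(theta)
        y = center[1] + r * math.sin(theta)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            hits += 1
    return hits / samples
```

A query region covering the whole circle yields 1.0, and one covering the right half yields roughly 0.5.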

Page 6

Appearance Probability

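[Figure: an object o whose uncertainty region o.ur partially overlaps rq; the overlap is o.ur ∩ rq.]

For an arbitrary pdf, the integral over o.ur ∩ rq has no general closed form and must be calculated numerically.

Calculation time of one appearance probability in 2D space: 1.3 ms

Time for one random access: 10 ms

Computing about eight appearance probabilities thus costs roughly as much as one random access.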
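One simple numerical scheme is a midpoint-rule grid over the bounding box of o.ur. This is an illustrative choice; the slides do not say which integration method was used. The sketch assumes a 2D Gaussian truncated to a circular uncertainty region, and every name in it is hypothetical. Each call evaluates the pdf n² times, which is why appearance probabilities are expensive to compute.

```python
import math


def truncated_gaussian(center, radius, sigma):
    """Unnormalised Gaussian pdf that is zero outside the circle o.ur."""
    cx, cy = center

    def pdf(x, y):
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        return math.exp(-d2 / (2 * sigma ** 2)) if d2 <= radius ** 2 else 0.0

    return pdf


def appearance_probability_grid(pdf, mbr, rq, n=200):
    """Midpoint-rule estimate of the integral of pdf over o.ur ∩ rq.

    mbr bounds o.ur; both boxes are [(x_lo, x_hi), (y_lo, y_hi)]. Dividing by
    the integral over the whole box normalises a truncated pdf.
    """
    (ax, bx), (ay, by) = mbr
    (qx_lo, qx_hi), (qy_lo, qy_hi) = rq
    hx, hy = (bx - ax) / n, (by - ay) / n
    inside = total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            w = pdf(x, y)
            total += w
            if qx_lo <= x <= qx_hi and qy_lo <= y <= qy_hi:
                inside += w
    return inside / total
```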

Page 7

A good solution should…

Support any pdf.

Minimize the number of page accesses.

Minimize the number of appearance probability calculations.

Minimize the total cost (I/O + CPU).

Page 8

Main Idea

Pre-compute some “auxiliary information” that can be used to efficiently decide whether an object appears in a region with at least a certain probability, without calculating its actual appearance probability.

Page 9

Quick Examples

[Figure: two example placements of o.ur and rq, with probability threshold pq = 20%.]

Page 10

Probabilistically Constrained Regions (PCR)

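[Figure: o.ur cut by four lines l1-, l1+, l2-, l2+; the appearance probability of o beyond each line is 0.2, and the rectangle they bound is o.pcr(0.2).]

l1- and l1+ are vertical lines such that o appears on the left of l1- with probability 0.2 and on the right of l1+ with probability 0.2; likewise, o appears below the horizontal line l2- and above l2+ with probability 0.2 each. The rectangle bounded by the four lines is the probabilistically constrained region o.pcr(0.2).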
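Under the assumption of a uniform pdf over a 2D disk, the lines of o.pcr(p) are the p- and (1 – p)-quantiles of o's marginal distribution along each axis. The hypothetical sketch below finds them by bisection on the closed-form marginal CDF of a uniform disk, F(t) = 1/2 + (t·sqrt(R² – t²) + R²·asin(t/R)) / (πR²), where t is the offset from the center. For other pdfs, the marginal CDF would itself have to be computed numerically.

```python
import math


def disk_marginal_cdf(t, radius):
    """P(X - cx <= t) for a point drawn uniformly from a 2D disk of the given radius."""
    t = max(-radius, min(radius, t))
    r2 = radius * radius
    return 0.5 + (t * math.sqrt(r2 - t * t) + r2 * math.asin(t / radius)) / (math.pi * r2)


def pcr_uniform_disk(center, radius, p, iters=60):
    """o.pcr(p) for a uniform pdf over a 2D disk, as [(l1-, l1+), (l2-, l2+)].

    o appears left of l_i- with probability p and right of l_i+ with
    probability p. Requires 0 <= p <= 0.5.
    """
    lo, hi = -radius, radius
    for _ in range(iters):  # bisection for F(t) = p; F is increasing
        mid = (lo + hi) / 2
        if disk_marginal_cdf(mid, radius) < p:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    # The disk is symmetric, so l_i+ mirrors l_i- and both axes behave alike.
    return [(c + t, c - t) for c in center]
```

As p grows from 0 to 0.5, the region shrinks from the whole extent of the disk to its center, so o.pcr(0.3) lies inside o.pcr(0.2).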

Page 11

Probabilistically Constrained Regions (PCR)

[Figure: o.pcr(0.2), the lines l1-, l1+, l2- (o appears beyond each with probability 0.2), and several example query regions rq.]

For a query q with search region rq and probability threshold pq = 0.2:

Observation 1.1 (pruning)

An object o cannot satisfy q if rq does not intersect o.pcr(0.2).
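Observation 1.1 reduces to an axis-parallel box-intersection test. A sketch, assuming every rectangle is stored as a list with one (lo, hi) interval per dimension; this representation and the helper names are illustrative, not taken from the paper.

```python
def intersects(a, b):
    """True if axis-parallel boxes a and b overlap (closed intervals)."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def pruned_by_obs_1_1(rq, pcr_pq):
    """Observation 1.1 (pq <= 0.5): o cannot satisfy q if rq misses o.pcr(pq)."""
    return not intersects(rq, pcr_pq)
```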

Page 12

Probabilistically Constrained Regions (PCR)

[Figure: o.pcr(0.2) and example query regions rq; o appears on the right of l1+ with probability 0.2, hence on the left of l1+ with probability 0.8 (= 1 – 0.2).]

For a query q with search region rq and probability threshold pq = 0.8:

Observation 1.2 (pruning)

An object o cannot satisfy q if rq does not fully contain o.pcr(0.2).
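Observation 1.2 is a containment test instead of an intersection test. A sketch using the same assumed one-(lo, hi)-per-dimension box representation; the names are hypothetical.

```python
def contains(outer, inner):
    """True if axis-parallel box `outer` fully contains box `inner`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def pruned_by_obs_1_2(rq, pcr_1_minus_pq):
    """Observation 1.2 (pq > 0.5): o cannot satisfy q if rq does not
    fully contain o.pcr(1 - pq)."""
    return not contains(rq, pcr_1_minus_pq)
```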

Page 13

Probabilistically Constrained Regions (PCR)

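[Figure: o.MBR (the minimum bounding rectangle of o.ur), o.pcr(0.2), and a query region rq; o appears on the left of l1- with probability 0.2.]

For a query q with search region rq and probability threshold pq = 0.2:

Observation 1.3 (validating)

An object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1- (or on the right of l1+, below l2-, or above l2+). That part contains all of o.ur on the left of l1-, so o appears in rq with probability at least 0.2.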
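Observation 1.3 can be checked by cutting o.MBR at each PCR line and testing whether rq covers one of the resulting pieces. A sketch, again assuming one (lo, hi) interval per dimension; `mbr_part` and the other names are hypothetical helpers.

```python
def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def mbr_part(mbr, i, interval):
    """The part of o.MBR whose extent along dimension i is `interval`."""
    box = list(mbr)
    box[i] = interval
    return box


def validated_by_obs_1_3(rq, mbr, pcr_pq):
    """Observation 1.3 (pq <= 0.5): the part of o.MBR beyond l_i- (or l_i+)
    of o.pcr(pq) holds probability pq, so o qualifies if rq covers one."""
    for i, ((m_lo, m_hi), (l_minus, l_plus)) in enumerate(zip(mbr, pcr_pq)):
        if contains(rq, mbr_part(mbr, i, (m_lo, l_minus))):  # left of / below l_i-
            return True
        if contains(rq, mbr_part(mbr, i, (l_plus, m_hi))):   # right of / above l_i+
            return True
    return False
```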

Page 14

Probabilistically Constrained Regions (PCR)

[Figure: o.MBR and a query region rq; o appears on the right of l1+ with probability 0.2, hence on the left of l1+ with probability 0.8.]

For a query q with search region rq and probability threshold pq = 0.8:

Observation 1.4 (validating)

An object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1+ (or on the right of l1-, below l2+, or above l2-).
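Observation 1.4 uses the same pieces as Observation 1.3 but cut at the opposite lines of o.pcr(1 – pq). A sketch with the same assumed box representation and hypothetical names:

```python
def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def mbr_part(mbr, i, interval):
    box = list(mbr)
    box[i] = interval
    return box


def validated_by_obs_1_4(rq, mbr, pcr_1_minus_pq):
    """Observation 1.4 (pq > 0.5): with o.pcr(1 - pq), the part of o.MBR on the
    left of l_i+ (or right of l_i-) holds probability 1 - (1 - pq) = pq."""
    for i, ((m_lo, m_hi), (l_minus, l_plus)) in enumerate(zip(mbr, pcr_1_minus_pq)):
        if contains(rq, mbr_part(mbr, i, (m_lo, l_plus))):   # left of / below l_i+
            return True
        if contains(rq, mbr_part(mbr, i, (l_minus, m_hi))):  # right of / above l_i-
            return True
    return False
```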

Page 15

Probabilistically Constrained Regions (PCR)

[Figure: o.MBR cut by l1- and l1+ of o.pcr(0.2); o appears on the left of l1- with probability 0.2, on the right of l1+ with probability 0.2, and between the two lines with probability 0.6 (= 1 – 2 * 0.2); a query region rq is also shown.]

For a query q with search region rq and probability threshold pq = 0.6:

Observation 1.5 (validating)

An object o definitely satisfies q if rq fully contains the part of o.MBR between l1- and l1+ (or between l2- and l2+).
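Observation 1.5 tests a slab of o.MBR rather than a one-sided piece. A sketch under the same assumptions (one (lo, hi) interval per dimension, hypothetical names):

```python
def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def mbr_part(mbr, i, interval):
    box = list(mbr)
    box[i] = interval
    return box


def validated_by_obs_1_5(rq, mbr, pcr_p):
    """Observation 1.5: the slab of o.MBR between l_i- and l_i+ of o.pcr(p)
    holds probability 1 - 2p; choosing p = (1 - pq) / 2 makes that pq."""
    return any(contains(rq, mbr_part(mbr, i, interval))
               for i, interval in enumerate(pcr_p))
```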

Page 16

Probabilistically Constrained Regions (PCR)

o.pcr(0.2) thus provides five heuristics (Observations 1.1–1.5) to reduce CPU cost.

In general, for a prob-range query with probability threshold pq:

if pq <= 0.5: o may be pruned using o.pcr(pq) (Observation 1.1); o may be validated using o.pcr(pq) (Observation 1.3); o may be validated using o.pcr((1 - pq)/2) (Observation 1.5)

if pq > 0.5: o may be pruned using o.pcr(1 - pq) (Observation 1.2); o may be validated using o.pcr(1 - pq) (Observation 1.4); o may be validated using o.pcr((1 - pq)/2) (Observation 1.5)

In both cases, the part of o.MBR between the lines of o.pcr((1 - pq)/2) has appearance probability 1 - 2 · (1 - pq)/2 = pq, as in the pq = 0.6 example with o.pcr(0.2).

pq can be any value in [0, 1] → infinitely many possible pq → infinitely many PCRs. Impractical!

It is possible, however, to use a finite number of PCRs to achieve pruning and validating.
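The case analysis above can be put together as a single filtering step. A minimal sketch, assuming a callback `pcr(p)` that returns o.pcr(p) as one (l_i-, l_i+) pair per dimension; `classify` and `square_pcr` are hypothetical. `square_pcr` is a toy object with a uniform pdf over the square [-250, 250]², chosen only because its PCR has a simple closed form. Objects classified as "candidate" would still need their exact appearance probability computed.

```python
def intersects(a, b):
    return all(a_lo <= b_hi and b_lo <= a_hi for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def mbr_part(mbr, i, interval):
    box = list(mbr)
    box[i] = interval
    return box


def classify(rq, mbr, pcr, pq):
    """Return 'pruned', 'validated', or 'candidate' (exact probability needed)."""
    def covers(i, interval):
        return contains(rq, mbr_part(mbr, i, interval))

    dims = range(len(mbr))
    if pq <= 0.5:
        r = pcr(pq)
        if not intersects(rq, r):                                        # Obs. 1.1
            return "pruned"
        if any(covers(i, (mbr[i][0], r[i][0])) or covers(i, (r[i][1], mbr[i][1]))
               for i in dims):                                           # Obs. 1.3
            return "validated"
    else:
        r = pcr(1 - pq)
        if not contains(rq, r):                                          # Obs. 1.2
            return "pruned"
        if any(covers(i, (mbr[i][0], r[i][1])) or covers(i, (r[i][0], mbr[i][1]))
               for i in dims):                                           # Obs. 1.4
            return "validated"
    s = pcr((1 - pq) / 2)
    if any(covers(i, s[i]) for i in dims):                               # Obs. 1.5
        return "validated"
    return "candidate"


def square_pcr(p):
    """Toy PCR: uniform pdf over [-250, 250]^2, so each line moves linearly in p."""
    return [(-250.0 + 500.0 * p, 250.0 - 500.0 * p)] * 2
```

For example, with o.MBR = [-250, 250]² and pq = 0.6, a query covering x in [-160, 160] and the full y extent is validated by Observation 1.5; its true appearance probability is 320/500 = 0.64.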

Page 17

Using PCRs in a Conservative Way

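[Figure: the nested regions o.pcr(0.2) ⊇ o.pcr(0.25) ⊇ o.pcr(0.3) and a query region rq.]

For a query q with search region rq and probability threshold pq = 0.25, where PCRs are precomputed only for the values in a U-catalog, e.g., { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }:

Observation 1.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.25).

Observation 2.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.2). Since o.pcr(0.2) contains o.pcr(0.25), missing o.pcr(0.2) implies missing o.pcr(0.25), so the stored o.pcr(0.2) can stand in for it.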

Page 18

Using PCRs in a Conservative Way

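[Figure: the nested regions o.pcr(0.2) ⊇ o.pcr(0.25) ⊇ o.pcr(0.3) and a query region rq.]

For a query q with search region rq and probability threshold pq = 0.75, with the same U-catalog { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }:

Observation 1.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.25).

Observation 2.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.3). Since o.pcr(0.3) lies inside o.pcr(0.25), any rq that contains o.pcr(0.25) also contains o.pcr(0.3).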
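Observations 2.1 and 2.2 amount to rounding to a stored catalog value in the conservative direction. A sketch of that choice; `pruning_pcr_value` is a hypothetical name, and the small rounding of 1 – pq is an implementation detail assumed here to absorb floating-point noise (for example, 1 – 0.7 evaluates to 0.30000000000000004).

```python
import bisect


def pruning_pcr_value(catalog, pq):
    """Catalog value whose stored PCR can replace the one the exact rule needs.

    pq <= 0.5 (Obs. 2.1): largest value <= pq; its PCR contains o.pcr(pq).
    pq > 0.5 (Obs. 2.2): smallest value >= 1 - pq; its PCR lies inside o.pcr(1 - pq).
    """
    cat = sorted(catalog)
    if pq <= 0.5:
        return cat[bisect.bisect_right(cat, pq) - 1]  # catalog is assumed to contain 0
    target = round(1 - pq, 12)  # absorb floating-point noise in 1 - pq
    i = bisect.bisect_left(cat, target)
    return cat[i] if i < len(cat) else None  # None: no stored PCR is usable
```

With the U-catalog { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }, pq = 0.25 selects 0.2 and pq = 0.75 selects 0.3, matching the two slides.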

Page 19

U-catalog Size m

{0, 0.5}, m = 2

{0, 0.25, 0.5}, m = 3

{0, 0.1, 0.2, 0.3, 0.4, 0.5}, m = 6

larger m → more PCRs → greater pruning/validating power → lower CPU cost

larger m → higher space consumption → higher I/O cost

m = 9

Page 20

Conservative Functional Boxes (CFB)

[Figure: the boundaries of o.pcr(p) along dimension x, plotted against p for the U-catalog values { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }, bounded from outside by the lines o.cfbxout and from inside by the lines o.cfbxin.]

o.pcr: 2m values for each dimension

o.cfbout: 4 values for each dimension

o.cfbin: 4 values for each dimension

total: 8 values

For m = 9: 8 values (CFBs) vs. 18 values (PCRs) per dimension
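A CFB replaces the 2m stored PCR boundaries of each dimension by two lines in p. The sketch below assumes each boundary is stored as a line a + b·p (two values per boundary, hence four per dimension for each of o.cfbout and o.cfbin); this parameterisation and all names are hypothetical. It checks the defining property that o.cfbin(p) ⊆ o.pcr(p) ⊆ o.cfbout(p) at every catalog value.

```python
def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def cfb_at(cfb, p):
    """Evaluate a functional box at probability p.

    Assumed layout: one entry per dimension, ((a_lo, b_lo), (a_hi, b_hi)),
    where each boundary is the line a + b * p.
    """
    return [(a_lo + b_lo * p, a_hi + b_hi * p) for (a_lo, b_lo), (a_hi, b_hi) in cfb]


def is_conservative(cfb_out, cfb_in, pcr, catalog):
    """Check cfb_in(p) ⊆ o.pcr(p) ⊆ cfb_out(p) at every U-catalog value p."""
    return all(contains(cfb_at(cfb_out, p), pcr(p)) and contains(pcr(p), cfb_at(cfb_in, p))
               for p in catalog)
```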

Page 21

Conservative Functional Boxes (CFB)

[Figure: the boundaries of o.pcr(p) for p from 0 to 0.5, enclosed by o.cfbout; o.pcr(0.2), o.cfbout(0.2), and a query region rq.]

For a query q with search region rq and probability threshold pq = 0.25, with the U-catalog { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }:

Observation 1.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.25).

Observation 2.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.2).

Observation 3.1: an object o cannot satisfy q if rq does not intersect o.cfbout(0.2). Since o.cfbout(0.2) contains o.pcr(0.2), missing o.cfbout(0.2) implies missing o.pcr(0.2).

Page 22

Conservative Functional Boxes (CFB)

For a query q with search region rq and probability threshold pq = 0.75, with the U-catalog { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }:

Observation 1.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.25).

Observation 2.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.3).

Observation 3.2: an object o cannot satisfy q if rq does not fully contain o.cfbin(0.3). Since o.cfbin(0.3) lies inside o.pcr(0.3), any rq that contains o.pcr(0.3) also contains o.cfbin(0.3).

[Figure: the boundaries of o.pcr(p) for p from 0 to 0.5, enclosing o.cfbin; o.pcr(0.3), o.cfbin(0.3), and a query region rq.]
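Observations 3.1 and 3.2 combine the catalog rounding of Observations 2.1/2.2 with the CFBs. A sketch, assuming the same hypothetical line-per-boundary layout ((a_lo, b_lo), (a_hi, b_hi)) per dimension as an illustrative encoding; `pruned_with_cfb` is not a name from the paper.

```python
def intersects(a, b):
    return all(a_lo <= b_hi and b_lo <= a_hi for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))


def contains(outer, inner):
    return all(o_lo <= i_lo and i_hi <= o_hi for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))


def cfb_at(cfb, p):
    return [(a_lo + b_lo * p, a_hi + b_hi * p) for (a_lo, b_lo), (a_hi, b_hi) in cfb]


def pruned_with_cfb(rq, cfb_out, cfb_in, catalog, pq):
    """Prune o using only its CFBs, evaluated at a U-catalog value."""
    if pq <= 0.5:
        p = max(c for c in catalog if c <= pq)              # value used in Obs. 2.1
        return not intersects(rq, cfb_at(cfb_out, p))      # Obs. 3.1
    p = min(c for c in catalog if c >= round(1 - pq, 12))  # value used in Obs. 2.2
    return not contains(rq, cfb_at(cfb_in, p))             # Obs. 3.2
```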

Page 23

Comparing CFBs with PCRs

CFBs have weaker pruning/validating power than PCRs

But CFBs require less space than PCRs

Using PCRs (PCR1, PCR2, …, PCRm): 2·m·d values

Using CFBs (CFBout, CFBin): 8·d values

(d is the dimensionality.)

[Figure: the boundaries of o.pcr(p) for p from 0 to 0.5, with o.cfbout outside and o.cfbin inside.]

Page 24

Finding Conservative Functional Boxes

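goal: minimize

for the i-th dimension, minimize

with the following constraints:

Linear Programming: Simplex Method

[Figure: the boundaries of o.pcr(p) along dimension i plotted against p (0 to 0.5), with the lines o.cfbi-out and o.cfbi+out; their intercepts are αi-out and αi+out, and their angles are arctan(-βi-out) and arctan(βi+out).]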
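The objective and constraints of this linear program did not survive the transcript. As a stand-in, the sketch below assumes the objective is the total gap between a boundary line a + b·p and the PCR boundary at the catalog values, subject to the line staying on the conservative side. In that two-variable LP an optimum lies at a vertex, where the line passes through two of the points, so the sketch enumerates point pairs instead of running the simplex method. All names are hypothetical.

```python
from itertools import combinations


def fit_lower_line(ps, xs, tol=1e-9):
    """Line a + b*p on or below every point (ps[j], xs[j]) with minimum total
    gap sum(xs[j] - (a + b*ps[j])). Needs at least two distinct p values."""
    best = None
    for (p1, x1), (p2, x2) in combinations(zip(ps, xs), 2):
        if p1 == p2:
            continue
        b = (x2 - x1) / (p2 - p1)  # candidate vertex: line through two points
        a = x1 - b * p1
        if all(a + b * p <= x + tol for p, x in zip(ps, xs)):
            gap = sum(x - (a + b * p) for p, x in zip(ps, xs))
            if best is None or gap < best[0]:
                best = (gap, a, b)
    return best[1], best[2]


def fit_upper_line(ps, xs):
    """Line on or above every point, obtained by mirroring the data."""
    a, b = fit_lower_line(ps, [-x for x in xs])
    return -a, -b
```

Under these assumptions, the left boundary of o.cfbout along dimension i would come from `fit_lower_line` applied to the catalog values and the l_i- positions of the stored PCRs, and its right boundary from `fit_upper_line` applied to the l_i+ positions; for o.cfbin the two roles swap.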

Page 25

More in Our Paper

The U-tree: a dynamic index designed to accelerate prob-range queries.

Page 26

Experimental Results

data space: [0, 10000]^d

uncertainty region shape: circle (sphere)

uncertainty region radius: 250

data set: Long Beach County (LB): 53k 2D objects, uniform pdf

California (CA): 62k 2D objects, Gaussian pdf

Aircraft: 100k 3D objects, uniform pdf

query set: 100 queries for each data set with various sizes of rq and different pq

Page 27

Experimental Results

Page 28

Experimental Results

Query performance vs. search region size (LB, pq = 0.6)

Page 29

Experimental Results

Query performance vs. search region size (CA, pq = 0.6)

Page 30

Experimental Results

Query performance vs. search region size (Aircraft, pq = 0.6)

Page 31

Experimental Results

Query performance vs. probability threshold (LB, qs = 1500)

Page 32

Experimental Results

Query performance vs. probability threshold (CA, qs = 1500)

Page 33

Experimental Results

Query performance vs. probability threshold (Aircraft, qs = 1500)

Page 34

Summary

A fast method for answering probabilistic range search queries.

