Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions
Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kai Ngai, Ben Kao, Sunil Prabhakar
City University of Hong Kong
Hong Kong Polytechnic University
University of Hong Kong
Purdue University
Multi-dimensional Uncertain Data
Moving objects: An object sends its location to a server whenever its distance from the previously reported location exceeds a certain threshold.
Sensor readings: Each sensor periodically reports the temperature, humidity, UV index, …, in its neighborhood.
Querying the (uncertain) data stored in the server directly, as if it were exact, is meaningless: the actual location or reading may have changed since the last report.
Uncertainty Modeling
[Figure: Client 1's recorded location in the database, the distance threshold, and the resulting uncertainty region.]
An object’s location is described by a probability density function.
Probabilistic Range Search
[Figure: Clients 1 to 6 with their uncertainty regions, and the query region rq (the area of CityU).]
Probabilistic range query: find the clients that are currently in CityU with probability at least 50% (the probability threshold).
Appearance Probability
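Appearance probability of an object o with uncertainty region o.ur in a query region rq:
$\int_{o.ur \cap r_q} o.pdf(x)\, dx$
[Figure: Client 1's uncertainty region ur, the query region rq, and their intersection rq ∩ ur.]
E.g., uniform pdf: the appearance probability is area(o.ur ∩ rq) / area(o.ur).
In general, the integral must be calculated numerically.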
Calculation time of an appearance probability in 2D space: 1.3ms
Time for a random access: 10ms
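The numerical calculation mentioned above can be illustrated with a minimal sketch. It assumes a 2D pdf given as a Python function that returns 0 outside the uncertainty region, and it uses a plain midpoint rule. The names `appearance_probability` and `uniform_disk_pdf` are hypothetical and are not taken from the slides, which do not say which integration method the authors used.

```python
import math

def appearance_probability(pdf, ur_box, rq, n=200):
    """Midpoint-rule estimate of the integral of pdf over (ur ∩ rq).

    pdf(x, y) must return 0 outside the uncertainty region.
    ur_box is the bounding box of the uncertainty region and rq the query
    rectangle, both given as (x_lo, y_lo, x_hi, y_hi).
    """
    x_lo, y_lo = max(ur_box[0], rq[0]), max(ur_box[1], rq[1])
    x_hi, y_hi = min(ur_box[2], rq[2]), min(ur_box[3], rq[3])
    if x_lo >= x_hi or y_lo >= y_hi:
        return 0.0  # rq does not overlap the uncertainty region at all
    dx = (x_hi - x_lo) / n
    dy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        for j in range(n):
            y = y_lo + (j + 0.5) * dy
            total += pdf(x, y)
    return total * dx * dy

def uniform_disk_pdf(cx, cy, r):
    """Uniform pdf over a circular uncertainty region (assumed shape)."""
    density = 1.0 / (math.pi * r * r)
    def pdf(x, y):
        inside = (x - cx) ** 2 + (y - cy) ** 2 <= r * r
        return density if inside else 0.0
    return pdf

# A query rectangle covering the right half of a unit disk.
pdf = uniform_disk_pdf(0.0, 0.0, 1.0)
p_half = appearance_probability(pdf, (-1, -1, 1, 1), (0, -2, 2, 2))
```

Each call evaluates the pdf n × n times, which is why a good solution tries to avoid these calculations whenever an object can be pruned or validated by cheaper means.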
A good solution should…
Support any pdf.
Minimize the number of page accesses.
Minimize the number of appearance probability calculations.
Minimize the total cost (I/O + CPU).
Main Idea
Pre-compute some “auxiliary information” that can be used to efficiently decide whether an object appears in a
region with at least a certain probability without calculating its actual appearance
probability.
Probabilistically Constrained Regions (PCR)
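[Figure: the uncertainty region o.ur cut by four lines l1-, l1+, l2- and l2+. The appearance probability of o on the left of l1-, on the right of l1+, below l2- and above l2+ is 0.2 in each case. The rectangle bounded by the four lines is o.pcr(0.2).]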
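As an illustration of how such a region could be obtained, the sketch below approximates o.pcr(p) from points sampled from the object's pdf by taking per-dimension quantiles. This is an assumption made for illustration only: the slides do not state how PCRs are computed, and `pcr_from_samples` is a hypothetical name.

```python
def pcr_from_samples(samples, p):
    """Approximate o.pcr(p) from points drawn from o's pdf.

    Returns one (lo, hi) pair per dimension such that about a fraction p
    of the samples lies below lo and about a fraction p lies above hi.
    """
    if not 0.0 <= p <= 0.5:
        raise ValueError("p must be in [0, 0.5]")
    n = len(samples)
    k = int(round(p * n))  # number of samples cut off on each side
    box = []
    for i in range(len(samples[0])):
        coords = sorted(s[i] for s in samples)
        lo = coords[min(k, n - 1)]
        hi = coords[max(n - 1 - k, 0)]
        if lo > hi:  # p close to 0.5: the region degenerates to a point
            lo = hi = (lo + hi) / 2.0
        box.append((lo, hi))
    return box

# 100 evenly spread sample points; 20 of them fall on each side of pcr(0.2).
samples = [(i + 0.5, 2 * (i + 0.5)) for i in range(100)]
pcr_02 = pcr_from_samples(samples, 0.2)
```

With p = 0 the result is simply the bounding box of the samples, and the box shrinks as p grows.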
Probabilistically Constrained Regions (PCR)
[Figure: o.pcr(0.2) and example query regions rq that do not intersect it, lying beyond l1-, l1+ or l2-, where the appearance probability is 0.2.]
For a query q with search region rq and probability threshold pq = 0.2:
Observation 1.1 (pruning): an object o cannot satisfy q if rq does not intersect o.pcr(0.2).
Probabilistically Constrained Regions (PCR)
[Figure: o.pcr(0.2) and a query region rq that does not fully contain it; the appearance probability on the left of l1+ is 0.8.]
For a query q with search region rq and probability threshold pq = 0.8:
Observation 1.2 (pruning): an object o cannot satisfy q if rq does not fully contain o.pcr(0.2), where 0.2 = 1 - 0.8.
Probabilistically Constrained Regions (PCR)
[Figure: o.MBR, o.pcr(0.2) and a query region rq covering the part of o.MBR on the left of l1-, where the appearance probability is 0.2.]
For a query q with search region rq and probability threshold pq = 0.2:
Observation 1.3 (validating): an object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1- (or on the right of l1+, or below l2-, or above l2+).
Probabilistically Constrained Regions (PCR)
[Figure: o.MBR and a query region rq covering the part of o.MBR on the left of l1+, where the appearance probability is 0.8 (0.2 on the right of l1+).]
For a query q with search region rq and probability threshold pq = 0.8:
Observation 1.4 (validating): an object o definitely satisfies q if rq fully contains the part of o.MBR on the left of l1+ (or on the right of l1-, or below l2+, or above l2-).
Probabilistically Constrained Regions (PCR)
[Figure: o.MBR split by l1- and l1+ into parts with appearance probabilities 0.2, 0.6 and 0.2; the query region rq covers the middle part.]
For a query q with search region rq and probability threshold pq = 0.6 (= 1 - 2 * 0.2):
Observation 1.5 (validating): an object o definitely satisfies q if rq fully contains the part of o.MBR between l1- and l1+ (or between l2- and l2+).
Probabilistically Constrained Regions (PCR)
o.pcr(0.2) provides 5 heuristics to reduce CPU cost.
In general, for a probabilistic range query with probability threshold pq:
if pq <= 0.5
  o may be pruned using o.pcr( pq ) (Observation 1.1)
  o may be validated using o.pcr( pq ) (Observation 1.3)
  o may be validated using o.pcr( (1 - pq)/2 ) (Observation 1.5)
if pq > 0.5
  o may be pruned using o.pcr( 1 - pq ) (Observation 1.2)
  o may be validated using o.pcr( 1 - pq ) (Observation 1.4)
  o may be validated using o.pcr( (1 - pq)/2 ) (Observation 1.5)
pq in [0, 1] → infinite number of pq → infinite number of PCRs. Impractical!
It is possible to use a finite number of PCRs to achieve pruning and validating.
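The five observations can be combined into one filter step, sketched below. It assumes axis-parallel boxes stored as one (lo, hi) pair per dimension and a function `pcr(p)` that returns o.pcr(p) for any p; the function names and the three result labels are hypothetical and not from the slides.

```python
def contains(outer, inner):
    """True if box `outer` fully contains box `inner`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))

def intersects(a, b):
    """True if boxes `a` and `b` overlap in every dimension."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

def slab(mbr, i, lo, hi):
    """The part of `mbr` whose i-th extent is restricted to [lo, hi]."""
    part = list(mbr)
    part[i] = (lo, hi)
    return part

def classify(rq, mbr, pcr, pq):
    """Return 'pruned', 'validated' or 'unknown' for one object."""
    small = pq <= 0.5
    box = pcr(pq if small else 1.0 - pq)
    if small and not intersects(rq, box):      # Observation 1.1
        return "pruned"
    if not small and not contains(rq, box):    # Observation 1.2
        return "pruned"
    mid = pcr((1.0 - pq) / 2.0)                # used by Observation 1.5
    for i, (m_lo, m_hi) in enumerate(mbr):
        l_minus, l_plus = box[i]
        if small:   # Observation 1.3: one outer part of the MBR
            parts = [slab(mbr, i, m_lo, l_minus), slab(mbr, i, l_plus, m_hi)]
        else:       # Observation 1.4: everything on one side of a line
            parts = [slab(mbr, i, m_lo, l_plus), slab(mbr, i, l_minus, m_hi)]
        parts.append(slab(mbr, i, mid[i][0], mid[i][1]))  # Observation 1.5
        if any(contains(rq, part) for part in parts):
            return "validated"
    return "unknown"  # the appearance probability must be computed

def uniform_square_pcr(p):
    """Exact PCR of a uniform pdf over the square [0, 100] x [0, 100]."""
    return [(100.0 * p, 100.0 * (1.0 - p))] * 2

mbr = [(0.0, 100.0), (0.0, 100.0)]
result = classify([(15.0, 85.0), (0.0, 100.0)], mbr, uniform_square_pcr, 0.6)
```

Only objects classified as 'unknown' need a numerical appearance probability calculation, which is where the CPU saving comes from.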
Using PCRs in a Conservative Way
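[Figure: nested regions o.pcr(0.2), o.pcr(0.25) and o.pcr(0.3), with a query region rq.]
E.g., U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability threshold pq = 0.25:
Observation 1.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.25).
Observation 2.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.2). (0.25 is not in the U-catalog; o.pcr(0.2) contains o.pcr(0.25), so this test is conservative.)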
Using PCRs in a Conservative Way
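[Figure: nested regions o.pcr(0.2), o.pcr(0.25) and o.pcr(0.3), with a query region rq.]
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability threshold pq = 0.75:
Observation 1.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.25).
Observation 2.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.3). (0.25 is not in the U-catalog; o.pcr(0.3) is contained in o.pcr(0.25), so this test is conservative.)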
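The conservative choice of a catalog value in Observations 2.1 and 2.2 amounts to rounding down or up within the U-catalog. A minimal sketch, with hypothetical function names and the catalog from the example above:

```python
import bisect

U_CATALOG = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

def catalog_floor(catalog, p):
    """Largest catalog value that is <= p."""
    i = bisect.bisect_right(catalog, p)
    if i == 0:
        raise ValueError("p is below the smallest catalog value")
    return catalog[i - 1]

def catalog_ceil(catalog, p):
    """Smallest catalog value that is >= p."""
    i = bisect.bisect_left(catalog, p)
    if i == len(catalog):
        raise ValueError("p is above the largest catalog value")
    return catalog[i]

def pruning_pcr(catalog, pq):
    """Which test to run, and on which stored PCR, for threshold pq.

    pq <= 0.5: rq must intersect pcr(floor(pq))        (Observation 2.1)
    pq >  0.5: rq must fully contain pcr(ceil(1 - pq)) (Observation 2.2)
    Floating-point rounding in 1 - pq can select the next larger catalog
    value, which is still conservative but prunes less.
    """
    if pq <= 0.5:
        return ("intersect", catalog_floor(catalog, pq))
    return ("contain", catalog_ceil(catalog, 1.0 - pq))

choice_low = pruning_pcr(U_CATALOG, 0.25)   # uses o.pcr(0.2)
choice_high = pruning_pcr(U_CATALOG, 0.75)  # uses o.pcr(0.3)
```

Rounding down gives a larger PCR for the intersection test and rounding up gives a smaller PCR for the containment test, so neither can prune an object that actually qualifies.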
U-catalog Size m
{0, 0.5}, m = 2
{0, 0.25, 0.5}, m = 3
{0, 0.1, 0.2, 0.3, 0.4, 0.5}, m = 6
…
larger m → more PCRs → greater pruning/validating power → less CPU cost
larger m → higher space consumption → larger I/O cost
[Figure: plot with axes p and x, with p ticks at 0, 0.1, 0.2, 0.3, 0.4 and 0.5; labelled m = 9.]
Conservative Functional Boxes (CFB)
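U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
o.pcr(…): 2m values for each dimension
o.cfbout: 4 values for each dimension
o.cfbin: 4 values for each dimension
total for the CFBs: 8 values for each dimension
E.g., m = 9: 8 values (CFBs) versus 18 values (PCRs) for each dimension
[Figure: o.cfbout and o.cfbin along the x dimension.]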
Conservative Functional Boxes (CFB)
[Figure: o.pcr(0.2), o.cfbout and o.cfbout(0.2) plotted against p (ticks 0 to 0.5), with a query region rq.]
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability threshold pq = 0.25:
Observation 1.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.25).
Observation 2.1: an object o cannot satisfy q if rq does not intersect o.pcr(0.2).
Observation 3.1: an object o cannot satisfy q if rq does not intersect o.cfbout(0.2).
Conservative Functional Boxes (CFB)
[Figure: o.pcr(0.3), o.cfbin and o.cfbin(0.3) plotted against p (ticks 0 to 0.5), with a query region rq.]
U-catalog: { 0, 0.1, 0.2, 0.3, 0.4, 0.5 }
For a query q with search region rq and probability threshold pq = 0.75:
Observation 1.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.25).
Observation 2.2: an object o cannot satisfy q if rq does not fully contain o.pcr(0.3).
Observation 3.2: an object o cannot satisfy q if rq does not fully contain o.cfbin(0.3).
Comparing CFBs with PCRs
CFBs have weaker pruning/validating power than PCRs
But CFBs require less space than PCRs
Using PCRs (PCR1, PCR2, ……, PCRm): 2·m·d values
Using CFBs (CFBout, CFBin): 8·d values
[Figure: o.pcr, o.cfbout and o.cfbin plotted with axes p and x (p ticks 0 to 0.5).]
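The two storage formulas can be compared directly. The helper names below are hypothetical; the counts follow the per-object figures stated above.

```python
def pcr_values(m, d):
    """Values stored per object with m PCRs in d dimensions: 2·m·d."""
    return 2 * m * d

def cfb_values(d):
    """Values stored per object with CFBout and CFBin: 8·d."""
    return 8 * d

# For m = 9 in one dimension: 18 values with PCRs against 8 with CFBs.
per_dim_pcr = pcr_values(9, 1)
per_dim_cfb = cfb_values(1)
```

The CFB representation does not grow with m, so it uses fewer values than the PCRs whenever m is larger than 4.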
Finding Conservative Functional Boxes
goal: minimize
for the i-th dimension, minimize
with the following constraints:
Linear Programming: Simplex Method
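[Figure: plot with axes p and x (p ticks 0 to 0.5) showing the two faces o.cfbi-out and o.cfbi+out as straight lines, with intercepts αi-out and αi+out and slope angles arctan(-βi-out) and arctan(βi+out).]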
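To make the fitting problem concrete, the sketch below fits one face of an outer CFB under explicitly assumed conditions, since the objective and constraints are not preserved in this text. The assumptions are: the face is a line x = α - β·p (inferred from the α and arctan(β) labels in the figure), the line must not exceed the lower PCR face at any catalog value so that the CFB contains every PCR, and the total gap to the PCR faces is minimized. The slide names the Simplex method; this sketch instead enumerates candidate vertices by brute force, which is valid for a linear program with only two variables. `fit_lower_face` is a hypothetical name.

```python
from itertools import combinations

def fit_lower_face(catalog, pcr_lo, eps=1e-9):
    """Fit f(p) = alpha - beta * p below the lower PCR face.

    Constraint (assumed): f(p_j) <= pcr_lo[j] for every catalog value p_j.
    Objective (assumed): maximize sum_j f(p_j), i.e. minimize the total gap.
    An optimal solution of this two-variable LP lies at a vertex, which is
    a line through two of the points (p_j, pcr_lo[j]).
    """
    points = list(zip(catalog, pcr_lo))
    best = None
    for (p1, x1), (p2, x2) in combinations(points, 2):
        beta = (x1 - x2) / (p2 - p1)   # from alpha - beta * p = x at both points
        alpha = x1 + beta * p1
        feasible = all(alpha - beta * p <= x + eps for p, x in points)
        if feasible:
            score = sum(alpha - beta * p for p, _ in points)
            if best is None or score > best[0]:
                best = (score, alpha, beta)
    return best[1], best[2]

catalog = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
# Made-up lower-face coordinates of o.pcr(p) in one dimension.
pcr_lo = [10.0, 14.0, 17.0, 19.0, 20.0, 20.5]
alpha, beta = fit_lower_face(catalog, pcr_lo)
```

The result is two numbers per face, which matches the 4 values per dimension quoted for o.cfbout. The upper face and the inner CFB would be fitted the same way with the inequalities reversed where needed.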
Experimental Results
data space: [0, 10000]^d
uncertainty region shape: circle (sphere)
uncertainty region radius: 250
data sets:
  Long Beach County (LB): 53k 2D objects, uniform pdf
  California (CA): 62k 2D objects, Gaussian pdf
  Aircraft: 100k 3D objects, uniform pdf
query set: 100 queries for each data set, with various sizes of rq and different pq