
Database Group@CSE k-Nearest Neighbors in Uncertain Graphs Lin Yincheng 2011-02-28 VLDB10.


Database Group@CSE

k-Nearest Neighbors in Uncertain Graphs

Lin Yincheng, 2011-02-28

VLDB10


Outline

• Background
• Motivation
• Problem Definition
• Query Answering Approach
• Experimental Results


Background

k-Nearest Neighbors Uncertain Graphs

[Figure: example graph with edge weights 15, 15, 5, 5, 5]

Find the 2-nearest neighbors of vertex B


Motivation

Distance   Path            Probability
5          B-D             0.3
20         B-A-D, B-C-D    0.25648
∞          No path         0.44352

• Define meaningful distance functions that are more useful for identifying true neighbors.

• Introduce a novel pruning algorithm to process kNN queries in uncertain graphs.

[Figure: example uncertain graph; edges labeled weight(probability): 15(0.2), 15(0.6), 5(0.7), 5(0.3), 5(0.4)]

most-probable-path-distance


Problem Definition

• Assumption: independence among edges

• Probabilistic graph model G(V, E, P, W)

• V and E denote the sets of nodes and edges, respectively;

• P denotes the probability associated with each edge;

• W assigns a weight to each edge

• k-NN Query


Distances

• Median-Distance(s, t)

• Majority-Distance(s, t)

• Expected-Reliable-Distance(s, t)
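
The formulas for these three distances do not survive in this transcript. As a rough sketch, the code below evaluates them on the (B, D) distance distribution from the Motivation slide, assuming the usual readings: the median distance is the smallest distance whose cumulative probability reaches 1/2, the majority distance is the most probable distance, and the expected-reliable-distance is the expected distance conditioned on the two nodes being connected. The function names and the tie-breaking rule are assumptions made for illustration.

```python
import math

# Distance distribution for the pair (B, D), copied from the Motivation slide.
dist = {5: 0.3, 20: 0.25648, math.inf: 0.44352}


def median_distance(p):
    """Smallest distance whose cumulative probability reaches 1/2 (assumed convention)."""
    total = 0.0
    for d in sorted(p):
        total += p[d]
        if total >= 0.5:
            return d
    return math.inf


def majority_distance(p):
    """Most probable distance; ties go to the smaller distance (assumed tie-break)."""
    return min(p, key=lambda d: (-p[d], d))


def expected_reliable_distance(p):
    """Expected distance over the worlds in which the two nodes are connected."""
    connected = sum(prob for d, prob in p.items() if d != math.inf)
    if connected == 0:
        return math.inf
    return sum(d * prob for d, prob in p.items() if d != math.inf) / connected


median = median_distance(dist)
majority = majority_distance(dist)
reliable = expected_reliable_distance(dist)
```

Under these assumptions the three functions disagree on this example: the median is 20, the most probable single outcome is "no path", and the expected-reliable-distance falls between 5 and 20.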


Challenges

• Computing the median-distance and majority-distance requires the distance distribution over all possible worlds.

• Computing the expected-reliable-distance has been proved to be #P-hard.
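
To make the cost concrete, here is a minimal sketch that computes the exact distance distribution by enumerating every possible world. It assumes independent edges, as stated in the problem definition, and runs on a tiny hypothetical graph (`toy_edges`, not the graph from the slides). The loop visits 2^|E| worlds, which is why exact computation is infeasible beyond toy inputs.

```python
import heapq
import itertools
import math
from collections import defaultdict


def shortest_distance(edges, s, t):
    """Dijkstra on an undirected graph given as (u, v, weight) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # t is unreachable in this world


def exact_distribution(edges, s, t):
    """Enumerate all 2^|E| possible worlds; edges are (u, v, weight, prob)."""
    p = defaultdict(float)
    for keep in itertools.product([False, True], repeat=len(edges)):
        prob = 1.0
        world = []
        for (u, v, w, pe), present in zip(edges, keep):
            prob *= pe if present else 1.0 - pe
            if present:
                world.append((u, v, w))
        p[shortest_distance(world, s, t)] += prob
    return dict(p)


# Hypothetical three-edge graph: a direct edge s-t and a two-hop path via a.
toy_edges = [("s", "t", 5, 0.3), ("s", "a", 2, 0.5), ("a", "t", 2, 0.5)]
p = exact_distribution(toy_edges, "s", "t")
```

With three edges there are only 8 worlds; each additional edge doubles that count.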


Sampling
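
The body of this slide is not preserved in the transcript. As a hedged illustration of the general idea, the sketch below estimates a distance distribution by Monte Carlo sampling: draw r possible worlds by keeping each edge independently with its probability, then compute the s-t shortest-path distance in each world. The graph `toy_edges`, the function names, and the sample size are hypothetical choices, not taken from the slides.

```python
import heapq
import math
import random
from collections import Counter, defaultdict


def shortest_distance(edges, s, t):
    """Dijkstra on an undirected graph given as (u, v, weight) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    best = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def sample_world(edges, rng):
    """Keep each (u, v, weight, prob) edge independently with probability prob."""
    return [(u, v, w) for u, v, w, pe in edges if rng.random() < pe]


def estimate_distribution(edges, s, t, r, seed=0):
    """Empirical distance distribution over r sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter(
        shortest_distance(sample_world(edges, rng), s, t) for _ in range(r)
    )
    return {d: c / r for d, c in counts.items()}


# Hypothetical three-edge graph: a direct edge s-t and a two-hop path via a.
toy_edges = [("s", "t", 5, 0.3), ("s", "a", 2, 0.5), ("a", "t", 2, 0.5)]
est = estimate_distribution(toy_edges, "s", "t", r=5000)
```

For this toy graph the exact probabilities are 0.25 (distance 4), 0.225 (distance 5), and 0.525 (no path), so the estimate can be checked by hand.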


Sample Size for Median-D


Sample Size for E-R-D


Qualitative Analysis

• Classification experiment

• Testing data: two classes; one is a set of triplets of the form <A, B0, B1> and the other is a set of triplets of the form <A, B1, B0>

• A classifier: it tries to identify the true neighbors.

• Measure: <false positive rate, true positive rate>

• Data sets: protein-protein interaction network; DBLP co-authorship network


Results


Observation: Median-D

• Consider a new probability distribution

• The following lemma can be derived

where D is a distance value


Core Pruning Scheme

• Query Transformation

$d_{D,M}(s, t_1) < d_{D,M}(s, t_2) \Rightarrow d_M(s, t_1) < d_M(s, t_2)$

$d_M(s, t_1) \ge d_M(s, t_2) \Rightarrow d_{D,M}(s, t_1) \ge d_{D,M}(s, t_2)$
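
The definition of the D-truncated median is not preserved in this transcript. The sketch below assumes one plausible reading: all probability mass at distances of D or more is moved onto D, and the median of that truncated distribution is taken. Under this assumption the truncated median equals min(median, D), which is enough to illustrate the first implication on two example distributions. `p1` is hypothetical, `p2` is the (B, D) distribution from the Motivation slide, and the function names are illustrative.

```python
import math


def median_distance(p):
    """Smallest distance whose cumulative probability reaches 1/2 (assumed convention)."""
    total = 0.0
    for d in sorted(p):
        total += p[d]
        if total >= 0.5:
            return d
    return math.inf


def truncate(p, D):
    """Move all probability mass at distances >= D onto D (assumed definition)."""
    q = {}
    for d, prob in p.items():
        key = d if d < D else D
        q[key] = q.get(key, 0.0) + prob
    return q


def truncated_median(p, D):
    """Median of the D-truncated distribution."""
    return median_distance(truncate(p, D))


p1 = {5: 0.6, 20: 0.4}                            # hypothetical target t1
p2 = {5: 0.3, 20: 0.25648, math.inf: 0.44352}     # distribution from the Motivation slide
D = 10

# Truncated medians can be compared after exploring only distances up to D.
t1_trunc = truncated_median(p1, D)
t2_trunc = truncated_median(p2, D)
```

Here the truncated median of `p1` is 5 and that of `p2` is 10, so `t1` already ranks ahead of `t2` without looking beyond distance D; the full medians (5 and 20) agree with that ordering.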


Median-D kNN Query Answering


Majority-D kNN Query Answering

• A distance d is the exact majority distance once Pr(d) >= 1 – P, where P denotes the sum of the visited nodes’ probabilities.

• A node that enters the kNN set may be replaced at a later step by another node with a smaller majority distance.


Experimental Results

• Dataset overview

• Convergence of the distance functions (D-F)

The distance computed from a sample of 500 possible worlds (pws) is used as the ground truth.


Efficiency of k-NN Pruning

The fraction of visited nodes (pruning efficiency) as a function of k

Pruning efficiency as a function of sample size


Quality of Results

Pruning efficiency as a function of edge probability

Median-D

Stability as a function of the number of possible worlds

