
Reachability Querying: An Independent Permutation Labeling Approach (published in VLDB 2014)...

Date post: 14-Dec-2015

Reachability Querying: An Independent Permutation Labeling Approach (published in VLDB 2014)

Presenter: WEI, Hao

Graph Reachability Query

Given a directed graph G = (V, E) and two vertices u and v, u is said to reach v if there exists a path from u to v over G.

Any directed graph can be easily transformed into a DAG, because the query is trivial if u and v are in the same strongly connected component.

[Figure: example DAG with 12 vertices, v0 to v11]

Query(v1, v8): Reachable

Query(v2, v11): Unreachable
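As a baseline for the two queries above, here is a minimal sketch of answering a reachability query by plain DFS. The edge list is an assumption: the figure's edges did not survive in this transcript, so `EDGES` is a hypothetical DAG chosen to agree with the two query answers on this slide.

```python
# Hypothetical edge list (assumption): the slide's figure was not preserved.
EDGES = [(0, 2), (0, 3), (0, 5), (1, 4), (1, 5), (2, 4), (3, 4), (3, 7),
         (4, 6), (4, 9), (5, 7), (5, 8), (5, 10), (5, 11), (6, 8)]


def successors(edges):
    """Build an adjacency map: vertex -> list of direct successors."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    return succ


def reaches(succ, u, v):
    """Iterative DFS: True iff there is a path from u to v."""
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in succ[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


SUCC = successors(EDGES)
print(reaches(SUCC, 1, 8))   # Query(v1, v8)
print(reaches(SUCC, 2, 11))  # Query(v2, v11)
```

Each query costs up to O(|V| + |E|) here, which is the cost that an index is meant to avoid on large graphs.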

The Issue and the Challenge

The ‘Big Data’ era brings us large graphs with millions of nodes and edges.

web-uk dataset: 133 million nodes, 5 billion edges

DAG of web-uk: 22 million nodes, 38 million edges

Traditional approaches are not applicable.

Related Work

Recent works build an index, label(u), offline for every node u.

Label-Only Approach: answer Query(u, v) using only label(u) and label(v)

  Hop Labeling: TF-Label, Hierarchy Label, Distribution Label, …

  Transitive Closure Compression: Chain-Cover, Tree-Cover, …

  Non-linear index construction time and index size; may generate an unacceptably large index

Label+G Approach: answer Query(u, v) by label(u) and label(v), with the possibility of accessing G if needed

  Interval labeling: GRIPP, GRAIL, Ferrari, …

  Linear index size, but may perform DFS

Main Idea of IP Labeling

Out(u) denotes the set of vertices that u can reach, including u itself.

In(u) denotes the set of vertices that can reach u, including u itself.

u can reach v iff $\mathrm{Out}(v) \subseteq \mathrm{Out}(u)$ and $\mathrm{In}(u) \subseteq \mathrm{In}(v)$.

If $\mathrm{Out}(v) \not\subseteq \mathrm{Out}(u)$ or $\mathrm{In}(u) \not\subseteq \mathrm{In}(v)$, u cannot reach v.

Both checks are time- and space-consuming on large sets if an exact answer is needed.
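The containment test can be sketched directly with exact sets. This is only an illustration under assumptions: the edge list is a hypothetical DAG (the slide's figure was not preserved), and materializing every Out and In set is exactly the expensive step that the labeling is meant to avoid.

```python
# Hypothetical edge list (assumption): the slide's figure was not preserved.
EDGES = [(0, 2), (0, 3), (0, 5), (1, 4), (1, 5), (2, 4), (3, 4), (3, 7),
         (4, 6), (4, 9), (5, 7), (5, 8), (5, 10), (5, 11), (6, 8)]
N = 12


def closure(adj, n):
    """For each vertex s, the set of vertices reachable from s via adj, including s."""
    result = []
    for s in range(n):
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        result.append(seen)
    return result


succ = [[] for _ in range(N)]
pred = [[] for _ in range(N)]
for u, v in EDGES:
    succ[u].append(v)
    pred[v].append(u)

OUT = closure(succ, N)  # OUT[u] = Out(u)
IN = closure(pred, N)   # IN[u]  = In(u)


def reaches_by_sets(u, v):
    """u reaches v iff Out(v) is a subset of Out(u) and In(u) is a subset of In(v)."""
    return OUT[v] <= OUT[u] and IN[u] <= IN[v]


print(reaches_by_sets(1, 8), reaches_by_sets(2, 11))
```

Storing all Out and In sets takes quadratic space in the worst case, which motivates replacing them with small fixed-size summaries.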

Main Idea of IP Labeling

The IP label aims to answer an unreachable query pair (u, v) by detecting $\mathrm{Out}(v) \not\subseteq \mathrm{Out}(u)$ or $\mathrm{In}(u) \not\subseteq \mathrm{In}(v)$:

  based on min-wise independent permutations

  high-probability guarantee of answering a query

  linear index construction time and index size

Min-wise Independent Permutation

Given two sets A and B (Out(u), Out(v) or In(v), In(u)) and a random permutation $\pi$, according to the definition of min-wise independent permutation,

$$\Pr\big(\min\{\pi(A)\} = \min\{\pi(B)\}\big) = \frac{|A \cap B|}{|A \cup B|}$$
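The min-wise property can be checked exactly on a tiny example by enumerating every permutation. In this minimal sketch the sets `A` and `B` are made-up illustrations, and the permutation is restricted to the elements of A ∪ B, which does not change the probability.

```python
from fractions import Fraction
from itertools import permutations


def min_match_probability(a, b):
    """Exact Pr(min pi(A) == min pi(B)) over all permutations of A | B."""
    universe = sorted(a | b)
    hits = total = 0
    for perm in permutations(range(len(universe))):
        pi = dict(zip(universe, perm))  # element -> its position under pi
        total += 1
        if min(pi[x] for x in a) == min(pi[x] for x in b):
            hits += 1
    return Fraction(hits, total)


A = {"a", "b", "c"}   # illustrative sets, not taken from the slides
B = {"b", "c", "d"}
print(min_match_probability(A, B), Fraction(len(A & B), len(A | B)))
```

The two minima coincide exactly when the smallest element of A ∪ B under the permutation lies in A ∩ B, which is why the probability equals the ratio of the two set sizes.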

K-min-wise Independent Permutation

We propose to use the top-k smallest numbers instead of the top-1 smallest number to improve the performance.

Let $\min_k\{\pi(A)\}$ be the subset of $\pi(A)$ containing up to the k smallest numbers of $\pi(A)$.

Define an order ($\preceq$) between $\min_k\{\pi(A)\}$ and $\min_k\{\pi(B)\}$, such that $\min_k\{\pi(B)\} \preceq \min_k\{\pi(A)\}$ if every $\pi(b_i) \in \min_k\{\pi(B)\} \setminus \min_k\{\pi(A)\}$ is larger than the largest number in $\min_k\{\pi(A)\}$. We use $\min_k\{\pi(B)\} \not\preceq \min_k\{\pi(A)\}$ otherwise.
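The k-smallest summary and the order test can be sketched in a few lines. The function names `min_k` and `precedes` are hypothetical, and the reading of the order (every number in B's summary that is missing from A's summary must exceed the largest number in A's summary) follows the worked example given later in the deck.

```python
def min_k(values, k):
    """Up to the k smallest numbers of `values`, in increasing order."""
    return sorted(values)[:k]


def precedes(label_b, label_a):
    """True iff every number in label_b that is not in label_a
    is larger than the largest number in label_a."""
    top = max(label_a)
    members = set(label_a)
    return all(x > top for x in label_b if x not in members)


# Labels quoted from the deck's k = 5 example table.
lout_v2 = [2, 3, 4, 8, 10]
lout_v7 = [1]
print(precedes(lout_v7, lout_v2))   # the test fails: 1 is missing and 1 < 10
print(min_k({7, 8, 6, 0, 3}, 3))
```

When `precedes` returns False, the set summarized by `label_b` cannot be a subset of the set summarized by `label_a`; when it returns True, the test is inconclusive.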

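K-min-wise Independent Permutation

We prove that if $\min_k\{\pi(B)\} \not\preceq \min_k\{\pi(A)\}$ is true, then $B \not\subseteq A$.

Let $|A| = p$, $|A \cup B| = q$ and $|\min_k\{\pi(A)\}| = k_A$ for $k_A \le k$. Then

$$\Pr\big(\min\nolimits_k\{\pi(B)\} \not\preceq \min\nolimits_k\{\pi(A)\}\big) = 1 - \frac{\binom{p}{k_A}}{\binom{q}{k_A}} \quad (\text{for } q \ge p \ge k_A)$$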
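A short numeric check helps in reading the probability. This sketch assumes the formula is $1 - \binom{p}{k_A} / \binom{q}{k_A}$ with $q = |A \cup B|$, an interpretation that reproduces the fractions shown in the deck's worked example; the set sizes in `CASES` are inferred from the example's label table and are an assumption, not figures stated on the slides.

```python
from fractions import Fraction
from math import comb


def detect_probability(p, q, k_a):
    """Probability that the k-min test detects that B is not a subset of A,
    with |A| = p, |A u B| = q and a summary of A of size k_a."""
    return 1 - Fraction(comb(p, k_a), comb(q, k_a))


# (p, q, k_a) for the Out test and the In test of each pair; inferred sizes.
CASES = {
    "(v1, v3)": [(9, 10, 5), (2, 3, 2)],
    "(v4, v3)": [(4, 6, 4), (2, 5, 2)],
    "(v5, v3)": [(5, 9, 5), (2, 4, 2)],
}
for pair, (out_case, in_case) in CASES.items():
    print(pair, detect_probability(*out_case), detect_probability(*in_case))
```

When B is a subset of A, q equals p and the probability is 0, which matches the fact that the test never reports a false violation.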

Independent Permutation Generation

[Figure: the example DAG with vertex ids v0 to v11, and the same DAG with each vertex relabeled by a random permutation generated with the Knuth shuffle]

Vertex   v0   v1   v2   v3   v4   v5   v6   v7   v8   v9   v10   v11
π(v)      7   11    8    6    3    0    2    1   10    4     9     5
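The Knuth (Fisher-Yates) shuffle generates a uniformly random permutation in linear time. A minimal sketch follows; the function name and the seed value are illustrative assumptions, and a seeded run will not reproduce the particular permutation shown on the slide.

```python
import random


def knuth_shuffle(n, rng):
    """Return a uniformly random permutation of 0..n-1 in O(n) time."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)              # both bounds are inclusive
        perm[i], perm[j] = perm[j], perm[i]
    return perm


pi = knuth_shuffle(12, random.Random(2014))  # pi[v] is the number assigned to vertex v
print(pi)
```

Drawing `j` from the range 0..i (rather than 0..n-1) at every step is what makes each of the n! permutations equally likely.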

IP Label

The IP label of u consists of two parts:

$L_{out}(u)$: the $\min_k\{\cdot\}$ set of Out(u), $\min_k\{\pi(\mathrm{Out}(u))\}$

$L_{in}(u)$: the $\min_k\{\cdot\}$ set of In(u), $\min_k\{\pi(\mathrm{In}(u))\}$
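One way to build both label parts without materializing Out(u) or In(u) is to merge the neighbours' labels, since the k smallest numbers of a union are determined by the k smallest numbers of each part. The sketch below rests on assumptions: `EDGES` is a hypothetical edge list (the figure's edges were not preserved) chosen to be consistent with the deck's label table, `PI` is the permutation as read off the relabeled figure, and the recursive traversal is only suitable for small graphs.

```python
K = 5
# Hypothetical edge list (assumption): the slide's figure was not preserved.
EDGES = [(0, 2), (0, 3), (0, 5), (1, 4), (1, 5), (2, 4), (3, 4), (3, 7),
         (4, 6), (4, 9), (5, 7), (5, 8), (5, 10), (5, 11), (6, 8)]
# PI[v] = number assigned to vertex v by the permutation (read off the slide).
PI = [7, 11, 8, 6, 3, 0, 2, 1, 10, 4, 9, 5]


def build_labels(adj, pi, k):
    """labels[u] = k smallest pi-values among u and everything reachable via adj."""
    labels = {}

    def visit(u):
        if u not in labels:
            merged = {pi[u]}
            for w in adj[u]:
                merged.update(visit(w))   # a neighbour's label is already a k-min set
            labels[u] = sorted(merged)[:k]
        return labels[u]

    for u in range(len(pi)):
        visit(u)
    return labels


succ = [[] for _ in PI]
pred = [[] for _ in PI]
for u, v in EDGES:
    succ[u].append(v)
    pred[v].append(u)

L_OUT = build_labels(succ, PI, K)   # follows successors
L_IN = build_labels(pred, PI, K)    # follows predecessors
print(L_OUT[2], L_IN[4])
```

Each edge contributes one merge of at most k numbers, so the work grows linearly with the graph size for a fixed k.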

IP Label

[Figure: the example DAG relabeled by the permutation, annotated with intermediate label sets]

IP labels for k = 5:

Vertex   Lout               Lin
v0       {0, 1, 2, 3, 4}    {7}
v1       {0, 1, 2, 3, 4}    {11}
v2       {2, 3, 4, 8, 10}   {7, 8}
v3       {1, 2, 3, 4, 6}    {6, 7}
v4       {2, 3, 4, 10}      {3, 6, 7, 8, 11}
v5       {0, 1, 5, 9, 10}   {0, 7, 11}
v6       {2, 10}            {2, 3, 6, 7, 8}
v7       {1}                {0, 1, 6, 7, 11}
v8       {10}               {0, 2, 3, 6, 7}
v9       {4}                {3, 4, 6, 7, 8}
v10      {9}                {0, 7, 9, 11}
v11      {5}                {0, 5, 7, 11}

IP Label

Q1: Query(v2, v7)

From the k = 5 label table, $L_{out}(v_2) = \{2, 3, 4, 8, 10\}$ and $L_{out}(v_7) = \{1\}$.

$1 \notin L_{out}(v_2)$, $1 \in L_{out}(v_7)$, and 1 is smaller than the largest number in $L_{out}(v_2)$.

So $L_{out}(v_7) \not\preceq L_{out}(v_2)$, which implies $\mathrm{Out}(v_7) \not\subseteq \mathrm{Out}(v_2)$: v2 cannot reach v7.

IP Label

Q2: Query(v1, v3)

From the k = 5 label table, $L_{out}(v_3) \preceq L_{out}(v_1)$ and $L_{in}(v_1) \preceq L_{in}(v_3)$, so the labels of v1 and v3 alone cannot rule out reachability.

$\Pr(L_{out}(v_3) \not\preceq L_{out}(v_1)) = \frac{1}{2}$, $\Pr(L_{in}(v_1) \not\preceq L_{in}(v_3)) = \frac{2}{3}$

Let $|A| = p$, $|A \cup B| = q$ and $|\min_k\{\pi(A)\}| = k_A$ for $k_A \le k$. Then

$$\Pr\big(\min\nolimits_k\{\pi(B)\} \not\preceq \min\nolimits_k\{\pi(A)\}\big) = 1 - \frac{\binom{p}{k_A}}{\binom{q}{k_A}}$$

IP Label

Q2: Query(v1, v3), continued with v4, a descendant of v1

$L_{out}(v_3) \not\preceq L_{out}(v_4)$ and $L_{in}(v_4) \not\preceq L_{in}(v_3)$, so v4 cannot reach v3.

Probability that the Lout test and the Lin test detect unreachability:

(v1, v3): 1/2, 2/3

(v4, v3): 14/15, 9/10

IP Label

Q2: Query(v1, v3), continued with v5, a descendant of v1

$L_{out}(v_3) \not\preceq L_{out}(v_5)$ and $L_{in}(v_5) \not\preceq L_{in}(v_3)$, so v5 cannot reach v3.

Probability that the Lout test and the Lin test detect unreachability:

(v1, v3): 1/2, 2/3

(v4, v3): 14/15, 9/10

(v5, v3): 125/126, 5/6

IP Label

Q2: Query(v1, v3), summary

The probability increases significantly as the search moves from v1 to its descendants v4 and v5:

(v1, v3): 1/2, 2/3

(v4, v3): 14/15, 9/10

(v5, v3): 125/126, 5/6

IP Label

Assume DFS is needed even though u cannot reach v. Consider a vertex w, a descendant of u, that is visited by the DFS towards v. Then the following are true:

$\Pr(L_{out}(v) \not\preceq L_{out}(u)) < \Pr(L_{out}(v) \not\preceq L_{out}(w))$

$\Pr(L_{in}(u) \not\preceq L_{in}(v)) < \Pr(L_{in}(w) \not\preceq L_{in}(v))$

As the DFS goes deeper, it becomes much more likely to answer the unreachability query, and therefore the search can stop at an early stage.
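The early-stopping behaviour can be illustrated with a label-pruned DFS. This is a sketch under assumptions, not the authors' implementation: `SUCC` is a hypothetical adjacency list consistent with the deck's label table, the labels are copied from that table (k = 5), and `precedes` encodes the order test as read from the worked example.

```python
# Hypothetical adjacency list (assumption): the slide's figure was not preserved.
SUCC = {0: [2, 3, 5], 1: [4, 5], 2: [4], 3: [4, 7], 4: [6, 9],
        5: [7, 8, 10, 11], 6: [8], 7: [], 8: [], 9: [], 10: [], 11: []}
# Labels copied from the k = 5 table in the slides.
L_OUT = {0: [0, 1, 2, 3, 4], 1: [0, 1, 2, 3, 4], 2: [2, 3, 4, 8, 10],
         3: [1, 2, 3, 4, 6], 4: [2, 3, 4, 10], 5: [0, 1, 5, 9, 10],
         6: [2, 10], 7: [1], 8: [10], 9: [4], 10: [9], 11: [5]}
L_IN = {0: [7], 1: [11], 2: [7, 8], 3: [6, 7], 4: [3, 6, 7, 8, 11],
        5: [0, 7, 11], 6: [2, 3, 6, 7, 8], 7: [0, 1, 6, 7, 11],
        8: [0, 2, 3, 6, 7], 9: [3, 4, 6, 7, 8], 10: [0, 7, 9, 11],
        11: [0, 5, 7, 11]}


def precedes(label_b, label_a):
    """True iff every number in label_b missing from label_a exceeds max(label_a)."""
    top, members = max(label_a), set(label_a)
    return all(x > top for x in label_b if x not in members)


def labels_rule_out(u, v):
    """True if the labels prove that u cannot reach v (never a false alarm)."""
    return (not precedes(L_OUT[v], L_OUT[u])) or (not precedes(L_IN[u], L_IN[v]))


def query(u, v):
    """Label-pruned DFS. Returns (answer, list of vertices visited)."""
    visited, seen, stack = [], {u}, [u]
    while stack:
        x = stack.pop()
        visited.append(x)
        if x == v:
            return True, visited
        if labels_rule_out(x, v):
            continue                      # x cannot reach v: skip its descendants
        for y in SUCC[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False, visited


print(query(2, 7))   # ruled out by the labels of v2 and v7 alone
print(query(1, 3))   # the search stops at v4 and v5
print(query(1, 8))
```

For Query(v1, v3) the labels of v1 are inconclusive, but both children are pruned immediately, so only three vertices are touched.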

Two Optimizations

Huge-Vertex Label: build an additional index to handle the huge vertices of the graph

Level Label: use the topological structure to prune the search space
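The slide does not define the level label, so the following is only a sketch of one common topological pruning rule, under stated assumptions: a vertex's level is taken to be the length of the longest path from any source, and the edge list is a hypothetical DAG. With that definition, if u reaches v and u differs from v, then level(u) < level(v), so any pair that violates this can be answered as unreachable without a search. The definition used in the paper may differ.

```python
# Hypothetical edge list (assumption): the slide's figure was not preserved.
EDGES = [(0, 2), (0, 3), (0, 5), (1, 4), (1, 5), (2, 4), (3, 4), (3, 7),
         (4, 6), (4, 9), (5, 7), (5, 8), (5, 10), (5, 11), (6, 8)]
N = 12


def levels(n, edges):
    """level[v] = length of the longest path from any source to v (sources get 0)."""
    pred = [[] for _ in range(n)]
    for u, v in edges:
        pred[v].append(u)
    memo = {}

    def level(v):
        if v not in memo:
            memo[v] = 1 + max((level(u) for u in pred[v]), default=-1)
        return memo[v]

    return [level(v) for v in range(n)]


def level_rules_out(lv, u, v):
    """True if the levels alone prove that u cannot reach v."""
    return u != v and lv[u] >= lv[v]


LEVEL = levels(N, EDGES)
print(LEVEL)
print(level_rules_out(LEVEL, 6, 7))
```

The check costs one integer comparison per pair, so it can be applied before any label test or traversal.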

Performance Studies

Real Datasets:

Dataset      |V(G)|   |E(G)|   davg    R-ratio
uniprotenc   25M      25M      0.999   1.30E-7
twitter      18M      18M      1.013   7.39E-2
web-uk       22M      38M      1.678   1.50E-1
citeseerx    6.5M     15M      2.295   4.07E-4
go-uniprot   6.9M     34M      4.990   3.64E-6
govwild      8.0M     23M      2.948   7.20E-5

Performance Studies

Index Construction Time (in seconds)

Dataset      TF-Label   DL       GRAIL    Ferrari   IP+
uniprotenc   58.529     22.280   58.242   24.292    18.96
twitter      15.291     13.719   32.323   19.972    12.44
web-uk       ---        24.240   44.031   26.927    17.46
citeseerx    91.877     12.045   23.170   19.792    7.54
go-uniprot   38.668     18.277   44.557   40.365    9.68
govwild      30.520     18.584   29.237   19.924    8.45

Performance Studies

Query Time (in milliseconds)

Dataset      TF-Label   DL        GRAIL     Ferrari   IP+
uniprotenc   119.164    119.618   820.249   116.351   54.205
twitter      102.923    104.698   ---       82.212    79.285
web-uk       ---        146.429   ---       214.857   253.082
citeseerx    230.318    111.329   28774     131.534   101.444
go-uniprot   55.279     153.214   499.505   313.300   34.577
govwild      254.785    128.199   719.494   295.432   112.990


Performance Studies

[Figure: distribution of the number of vertices visited]

Conclusion

We propose a new IP labeling approach, the first to exploit randomness to answer reachability queries.

Our new labeling approach has linear index construction time and index size. With independent permutations, the query performance is guaranteed with high probability.

We analyze the performance of our approach through extensive experimental studies, which show both good efficiency and scalability.

