Yinghui Wu, LFCS DB talk

Database Group Meeting Talk

Yinghui Wu

10/11/2010

Simulation Revised for Graph Pattern Matching

Outline

Graph Simulation

• label equality, edge-to-edge matching relation

Bounded Simulation

• node predicates, edge bound, edge-to-path matching relation

Reachability Queries and Graph Pattern Queries

• query containment and minimization – cubic time

• query evaluation – cubic time

Conclusion

A first step towards revising simulation for graph pattern matching

Graph Pattern Matching: the problem

Given a pattern graph P and a data graph G, decide whether G matches P, and if so, find all the matches of P in G.

Applications

• social queries, social matching

• biology and chemistry network querying

• keyword search, proximity search, …

Widely employed in a variety of emerging real life applications

How to define?

Graph Simulation

Node label equivalence

Edge-to-edge relation

Identical label matching, edge-to-edge relations
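A minimal sketch of graph simulation as described above, assuming node labels are plain strings and edges are (source, target) pairs. The function name `max_simulation` and the example graphs are illustrative, not taken from the talk. It starts from label equality and repeatedly drops candidates that violate the edge-to-edge condition.

```python
from collections import defaultdict


def max_simulation(p_labels, p_edges, g_labels, g_edges):
    """Return {pattern node: set of matching data nodes} for the maximum simulation.

    p_labels / g_labels map node ids to labels; p_edges / g_edges are (source, target) pairs.
    """
    succ = defaultdict(set)
    for v, w in g_edges:
        succ[v].add(w)

    # Label equality: every data node carrying the same label starts as a candidate.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_labels[u]} for u in p_labels}

    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # Edge-to-edge: v stays a match of u only if one of its children matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Hypothetical example: pattern A -> B -> D against a small data graph.
p_labels = {"a": "A", "b": "B", "d": "D"}
p_edges = [("a", "b"), ("b", "d")]
g_labels = {1: "A", 2: "B", 3: "D", 4: "B"}
g_edges = [(1, 2), (2, 3), (1, 4)]

sim = max_simulation(p_labels, p_edges, g_labels, g_edges)
print(sim)  # node 4 is dropped: it is labelled B but has no child labelled D
```

In this sketch G matches P exactly when every pattern node ends up with a nonempty match set.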

Capable enough?

[Figure: a pattern graph P and a data graph G with nodes labelled A, B, D and E; G contains nodes v1 and v2]

An example from real life social matching

[Figure: pattern P with nodes Alice, biologist and doctors and edge bounds 1 and 3, matched against data graph G via edge-to-path mappings]

Graph simulation is too restrictive!

Bounded Simulation

data graph G = (V, E, fA)

pattern graph P = (Vp, Ep, fv, fe)

G matches P via bounded simulation if there is a binary relation from Vp to V such that, for every edge of P, there exists a path in G satisfying the constraints of the edge.

bounded simulation vs. graph simulation

• node matches vs. label equality

• edge-to-path matching vs. edge-to-edge matching
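The contrast with graph simulation can be sketched in code. This is an illustration under stated assumptions, not the algorithm from the talk: node predicates are Python callables over attribute dictionaries, each pattern edge carries an integer bound (or `None` for unbounded), and a pattern edge is satisfied by any nonempty path within that bound. All names and the example data are hypothetical.

```python
from collections import defaultdict, deque


def nonempty_distances(succ, source):
    """Length of the shortest nonempty path from source to each reachable node."""
    dist = {}
    queue = deque((w, 1) for w in succ.get(source, ()))
    while queue:
        node, d = queue.popleft()
        if node in dist:
            continue
        dist[node] = d
        queue.extend((w, d + 1) for w in succ.get(node, ()))
    return dist


def bounded_simulation(p_nodes, p_edges, g_nodes, g_edges):
    """p_nodes: {u: predicate}; p_edges: {(u, u2): bound or None}; g_nodes: {v: attrs}."""
    succ = defaultdict(set)
    for v, w in g_edges:
        succ[v].add(w)
    dist = {v: nonempty_distances(succ, v) for v in g_nodes}

    # Node matches: candidates are the data nodes satisfying the predicate.
    sim = {u: {v for v, attrs in g_nodes.items() if pred(attrs)}
           for u, pred in p_nodes.items()}

    changed = True
    while changed:
        changed = False
        for (u, u2), bound in p_edges.items():
            # Edge-to-path: v needs a path of length <= bound to some match of u2.
            keep = {v for v in sim[u]
                    if any(w in sim[u2] and (bound is None or d <= bound)
                           for w, d in dist[v].items())}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    # Assumption: report no match at all if some pattern node has no match.
    return sim if all(sim.values()) else {}


p_nodes = {
    "alice": lambda n: n.get("id") == "Alice",
    "doc": lambda n: n.get("job") == "doctors",
    "bio": lambda n: n.get("job") == "biologist",
}
p_edges = {("alice", "doc"): 2, ("doc", "bio"): 1}
g_nodes = {
    "a": {"id": "Alice"},
    "c": {"job": "CTO"},
    "d1": {"job": "doctors"},
    "d2": {"job": "doctors"},
    "b1": {"job": "biologist"},
}
g_edges = [("a", "c"), ("c", "d1"), ("d1", "b1"), ("a", "d2")]

result = bounded_simulation(p_nodes, p_edges, g_nodes, g_edges)
print(result)
```

Here Alice reaches the doctor d1 only through the CTO node, a two-hop path that edge-to-edge matching would reject; d2 is dropped because it has no path to a biologist.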

Enriched model for capturing meaningful matches

Graph simulation is a special case of bounded simulation.

[Figure: pattern P (nodes Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’; edge bounds 1 and 3) and data graph G (nodes with Id = ‘Alice’, Job = ‘biologist’, Job = ‘doctors’ and Job = ‘CTO’)]

Basic results for the bounded simulation

For any graph G and pattern P, if G matches P, then there is a unique maximum match in G for P.

The graph pattern matching problem via bounded simulation can be solved in cubic time.

The incremental bounded simulation problem

Efficient approaches for graph pattern matching

extension for multiple edge colors?

Considering edge types…

Real life graphs have multiple edge types

[Figure: the Essembly network, with four edge types: friends-allies (fa), friends-nemeses (fn), strangers-nemeses (sn) and strangers-allies (sa)]

Querying Essembly network: an example

[Figure: a pattern query P over the Essembly network (edge types fa, fn, sn, sa) with nodes Alice, biologists supporting cloning and doctors against cloning, and edge constraints fa≤2 sa≤2, fn, fa≤2 sn and fa+]

Pattern queries with multiple edge types

Graph reachability and pattern queries

Real-life graphs usually bear different edge types…

data graph G = (V, E, fA, fC)

• Reachability query (RQ): (u1, u2, fu1, fu2, fe), where fe is drawn from a subclass of regular expressions:

F ::= c | c≤k | c+ | FF

Qr(G): the set of node pairs (v1, v2) such that v1 and v2 satisfy fu1 and fu2, there is a nonempty path from v1 to v2, and the edge colors on the path match the pattern specified by fe.
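To make the RQ semantics concrete, here is a small sketch of evaluating such a query. It assumes that c≤k stands for between 1 and k consecutive c-colored edges and that an expression is given as a list of (color, bound) atoms, with bound 1 for a plain c and `None` for c+. The function names and the tiny graph are hypothetical.

```python
from collections import defaultdict


def step_atom(succ, starts, color, bound):
    """Nodes reachable from `starts` via 1..bound edges of `color` (bound=None means c+)."""
    reached, frontier, hops = set(), set(starts), 0
    while frontier and (bound is None or hops < bound):
        frontier = {w for v in frontier for w in succ[v, color]} - reached
        reached |= frontier
        hops += 1
    return reached


def eval_rq(g_nodes, g_edges, pred1, pred2, expr):
    """Return the pairs (v1, v2) connected by a path whose colors match expr."""
    succ = defaultdict(set)
    for v, w, color in g_edges:
        succ[v, color].add(w)

    result = set()
    for v1, attrs in g_nodes.items():
        if not pred1(attrs):
            continue
        current = {v1}
        # Concatenation FF: apply the atoms one after another.
        for color, bound in expr:
            current = step_atom(succ, current, color, bound)
        result |= {(v1, v2) for v2 in current if pred2(g_nodes[v2])}
    return result


g_nodes = {
    "b1": {"job": "biologist", "sp": "cloning"},
    "x": {}, "y": {},
    "d1": {"job": "doctors"}, "d2": {"job": "doctors"}, "d3": {"job": "doctors"},
}
g_edges = [("b1", "x", "fa"), ("x", "y", "fa"), ("y", "d1", "fn"),
           ("b1", "d2", "fn"), ("x", "d3", "fn")]


def is_bio(attrs):
    return attrs.get("job") == "biologist" and attrs.get("sp") == "cloning"


def is_doc(attrs):
    return attrs.get("job") == "doctors"


# fa<=2 fn: one or two fa edges followed by exactly one fn edge.
pairs = eval_rq(g_nodes, g_edges, is_bio, is_doc, [("fa", 2), ("fn", 1)])
print(pairs)
```

The doctor d2 is not returned: it is reachable from b1 by a single fn edge, but the expression requires at least one fa edge first.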

[Figure: an example RQ between a node with Job=‘biologist’, sp=‘cloning’ and a node with Job=‘doctors’, with edge constraint fa≤2 fn]

Graph pattern queries

Graph pattern query (PQ): Qp = (Vp, Ep, fv, fe), where for each edge e = (u, u’), Qe = (u, u’, fv(u), fv(u’), fe(e)) is an RQ.

Qp(G) is the maximum set {(e, Se)}, where each Se is a set of node pairs answering the RQ Qe, such that:

for any two edges e1 = (u1, u2) and e2 = (u2, u3), if (v1, v2) is in Se1, then there is a v3 such that (v2, v3) is in Se2;

for any two edges e1 = (u1, u2) and e2 = (u1, u3), if (v1, v2) is in Se1, then there is a v3 such that (v1, v3) is in Se2.

PQ vs. simulation and bounded simulation

search conditions on query nodes

mapping edges to paths

constraining the edges on the path with a regular expression

RQ and bounded simulation are special cases of PQ
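The two conditions on the sets Se can be read as a fixpoint computation. The sketch below assumes the per-edge RQ answers have already been computed and are passed in as candidate pair sets; it then removes pairs until both conditions hold. The name `refine_pq`, the input layout and the example are hypothetical.

```python
def refine_pq(pattern_edges, candidates):
    """Shrink the per-edge candidate sets to the largest sets meeting both conditions.

    pattern_edges: list of (u, u2) pattern edges.
    candidates: {(u, u2): set of (v, v2) data node pairs answering that edge's RQ}.
    """
    S = {e: set(pairs) for e, pairs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for e1 in pattern_edges:
            u1, u2 = e1
            for e2 in pattern_edges:
                sources2 = {v for v, _ in S[e2]}
                keep = S[e1]
                if e2[0] == u2:
                    # e2 leaves the target of e1: v2 must start some pair of Se2.
                    keep = {(v1, v2) for v1, v2 in keep if v2 in sources2}
                if e2[0] == u1:
                    # e2 shares its source with e1: v1 must start some pair of Se2.
                    keep = {(v1, v2) for v1, v2 in keep if v1 in sources2}
                if keep != S[e1]:
                    S[e1] = keep
                    changed = True
    # Assumption: the answer is empty as soon as one pattern edge has no match.
    return S if all(S.values()) else {}


pattern_edges = [("A", "B"), ("B", "C")]
candidates = {
    ("A", "B"): {("a1", "b1"), ("a1", "b2")},
    ("B", "C"): {("b1", "c1")},
}
refined = refine_pq(pattern_edges, candidates)
print(refined)  # (a1, b2) is removed because b2 has no outgoing match for (B, C)
```

Because pairs are only ever removed when they violate a condition, the loop ends at the largest sets that satisfy both conditions.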

Reachability and graph pattern query: examples

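[Figure: the Essembly network (edge types fa, fn, sn, sa) with an example RQ (nodes Job=‘biologist’, sp=‘cloning’ and Job=‘doctors’; edge constraint fa≤2 fn) and an example PQ (nodes Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’; edge constraints fa≤2 sa≤2, fn, fa≤2 sn and fa+)]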

Fundamental problems: query containment

PQ Q1 = (V1, E1, fv1, fe1) is contained in Q2 = (V2, E2, fv2, fe2) if there exists a mapping λ from E1 to E2 such that for any data graph G and any e in E1, Se is a subset of Sλ(e); i.e., λ is a renaming function under which Q1(G) is mapped to Q2(G).

Query containment and equivalence problems can both be determined in cubic time

• Query similarity based on a revision of graph simulation

• Determine the query similarity in cubic time

Query containment and equivalence for PQs can be solved efficiently

query containment: example

[Figure: three pattern queries: Q1 (B1 with C1, C2, C3; edge constraints h≤1, h≤2, h≤3), Q2 (B2 with C4; h≤1) and Q3 (B3 with C5, C6; h≤1, h≤3)]

Fundamental problems: query minimization

Query minimization problem

• input: a PQ Qp

• output: a minimized PQ Qm equivalent to Qp

Query minimization problem can be solved in cubic time.

• compute the maximum node equivalence classes based on a revision of graph simulation;

• determine the number of redundant nodes and edges based on the equivalence classes;

• remove redundant and isolated nodes and edges

Query minimization for PQs can be solved efficiently

query minimization: example

[Figure: three pattern queries Q1, Q2 and Q3 over nodes labelled R, B and C, with edge constraints f, g, h≤2 and g≤3]

Evaluating graph pattern queries

PQ can be answered in cubic time.

• Join-based Algorithm JoinMatch

Matrix index vs distance cache

join operation for each edge in PQ until a fixpoint is reached (w.r.t. a reversed topological order)

• Split-based Algorithm SplitMatch

blocks: treating pattern node and data node uniformly

partition-relation pair
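The slide names a matrix index without giving its layout. One plausible reading, shown purely as an assumption, is a precomputed table of shortest distances per edge color, so that a constraint such as c≤k becomes a lookup. The names `build_color_distance_index` and `within` are hypothetical.

```python
from collections import defaultdict, deque


def build_color_distance_index(nodes, edges):
    """index[color][v][w] = length of the shortest nonempty path from v to w
    that uses only edges of that color."""
    succ = defaultdict(lambda: defaultdict(set))
    for v, w, color in edges:
        succ[color][v].add(w)

    index = {}
    for color, adj in succ.items():
        index[color] = {}
        for source in nodes:
            dist = {}
            # Breadth-first search restricted to edges of one color.
            queue = deque((w, 1) for w in adj.get(source, ()))
            while queue:
                node, d = queue.popleft()
                if node not in dist:
                    dist[node] = d
                    queue.extend((w, d + 1) for w in adj.get(node, ()))
            index[color][source] = dist
    return index


def within(index, color, v, w, bound=None):
    """True if w is reachable from v by 1..bound edges of `color` (bound=None means c+)."""
    d = index.get(color, {}).get(v, {}).get(w)
    return d is not None and (bound is None or d <= bound)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b", "fa"), ("b", "c", "fa"), ("c", "d", "fa"), ("a", "d", "fn")]
index = build_color_distance_index(nodes, edges)

print(within(index, "fa", "a", "c", 2))  # a reaches c in two fa hops
print(within(index, "fa", "a", "d", 2))  # d needs three fa hops
```

A full table trades memory for constant-time checks; a distance cache, the alternative the slide mentions, would presumably compute such entries on demand instead.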

Graph pattern matching can be solved in polynomial time

Example of JoinMatch

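[Figure: JoinMatch run on the Essembly network (edge types fa, fn, sn, sa) for the example PQ with nodes Id=‘Alice’; Job=‘biologist’, sp=‘cloning’; Job=‘doctors’, dsp=‘cloning’ and edge constraints fa≤2 sa≤2, fn, fa≤2 sn and fa+]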

Experimental results – effectiveness of PQs

Effectiveness of PQs: edge to path relations

Experimental results – querying real life graphs

Evaluation algorithms are sensitive to pattern edges

[Figure panels: varying |Vp|; varying |Ep|]

Experimental results – querying real life graphs

The algorithms are sensitive to the number of predicates

[Figure panels: varying |pred|; varying b]

Experimental results – querying synthetic graphs

The algorithms scale well over large synthetic graphs

[Figure panels: varying |V| (×10^5); varying b]

Experimental results – querying synthetic graphs

The algorithms scale well over large synthetic graphs

[Figure panels: varying α; varying cr]

Conclusion

Simulation revised for graph pattern matching

• Bounded Simulation: node predicates, edge bound, edge-to-path matching relation

• Reachability Queries and Graph Pattern Queries

query containment and minimization – cubic time

query evaluation – cubic time

Future work

• extending RQs and PQs by supporting general regular expressions

• incremental evaluation of RQs and PQs

Simulation revised for graph pattern matching

“Those who were trained to fly didn’t know the others. One group of people did not know the other group.” (Bin Laden)

Terrorist Collaboration Network (1970 - 2010)

Thank you!
