Post on 19-Jan-2016
Distributed Query-Sub-Query
Presented by Noam Pettel, 29/5/05
2
Motivation
Optimization of query evaluation in a peer-to-peer environment
Development of a distributed algorithm based on Query-Sub-Query technique for optimization of Datalog queries in a peer-to-peer environment
Implementation of the algorithm using the Active XML system
3
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
4
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
5
Example
Input:
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
[Figure: family tree of the parent(x,y) relation: Alice is the parent of Nancy and Joyce; Joyce of Lois and Ruth; Lois of Mark and Andy]
We are interested in the ancestor(x,y) relation
Typical query: “Give me all the ancestors of Andy”
6
Relational Database
A database composed of relations (tables)
Stores only explicit information
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
[Figure: family tree of the parent(x,y) relation]
anc(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Alice  Lois
Alice  Mark
Alice  Andy
Alice  Ruth
Joyce  Mark
Joyce  Andy
7
Deductive Database
Explicit information
Rules that enable inferences based on the stored data
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Datalog program (the second rule is recursive):
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
Logical reading (head ← body):
∀x,y (anc(x,y) ← parent(x,y))
∀x,y,z (anc(x,y) ← anc(x,z), parent(z,y))
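The two anc rules can be evaluated bottom-up by applying them repeatedly until no new tuple appears. The following is a minimal Python sketch of that naive evaluation over the parent table from the slides; the function name `ancestors` and the set-of-pairs encoding are assumptions made for illustration, not part of the presentation.

```python
# parent(x, y) facts from the slides: x is a parent of y.
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def ancestors(parent):
    """Naive bottom-up evaluation of the two anc rules, up to a fixpoint."""
    anc = set(parent)  # anc(x,y) :- parent(x,y)
    while True:
        # anc(x,y) :- anc(x,z), parent(z,y)
        derived = {(x, y) for (x, z) in anc for (z2, y) in parent if z == z2}
        if derived <= anc:  # nothing new: fixpoint reached
            return anc
        anc |= derived


ANC = ancestors(PARENT)
# "Give me all the ancestors of Andy"
print(sorted(x for (x, y) in ANC if y == "Andy"))  # ['Alice', 'Joyce', 'Lois']
```

Note that this computes the whole anc relation (12 tuples) even when the query asks about a single person, which is the cost that QSQ aims to avoid.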
8
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
9
Query Evaluation
Query: q(y) :- anc(“Joyce”,y)
Goal: Compute query with minimal data materialization
[Figure: family tree of the parent(x,y) relation]
10
QSQ
Known technique for optimization of Datalog queries: Query-Sub-Query (QSQ)
QSQ rewrites the Datalog program according to the given query
QSQ is based on two main notions:
• Binding patterns
• Supplementary relations
11
Binding Patterns
For each relation, adorned versions of the relation based on the bindings of the variables are considered
For example, adorned versions of anc are: anc_bb, anc_bf, anc_fb, anc_ff
anc(x,y) :- parent(x,y)
anc(x,y) :- anc(x,z), parent(z,y)
q(y) :- anc(“Joyce”,y)
12
Binding Patterns
anc_bf(x,y) :- parent(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
q(y) :- anc_bf(“Joyce”,y)
In anc_bf(“Joyce”,y), the first argument is bound to a constant (b) and the second is free (f)
The same relation may appear with different adornments in the Datalog program
Different adornments of the same relation are treated as different relations during the QSQ computation
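A binding pattern can be computed mechanically from an atom's arguments and the set of variables already bound when the atom is reached. The sketch below is one way to do that in Python; the helper names `is_constant` and `adorn`, and the convention that constants are written in double quotes, are assumptions for illustration.

```python
def is_constant(term):
    # Assumed convention for this sketch: constants are written in double quotes.
    return term.startswith('"')


def adorn(args, bound_vars=frozenset()):
    """Return the binding pattern of an atom: 'b' per bound argument, 'f' per free one."""
    return "".join(
        "b" if is_constant(a) or a in bound_vars else "f" for a in args
    )


print(adorn(['"Joyce"', "y"]))   # query atom anc("Joyce", y) -> bf
print(adorn(["x", "z"], {"x"}))  # body atom anc(x, z) once x is bound -> bf
```

With this convention, the query atom and the recursive body atom both receive the adornment bf, which is why only anc_bf shows up in the adorned program.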
13
Supplementary Relations
For each adorned relation and each position in the body of a rule, we define a supplementary relation to accumulate the bindings relevant to that position
anc_bf(x,y) :- parent(x,y)
    positions: sup_10(x), sup_11(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
    positions: sup_20(x), sup_21(x,z), sup_22(x,y)
q(x) :- anc_bf(“Joyce”,x)
QSQ rewriting of the program:
sup_10(x) :- in_anc_bf(x)
sup_11(x,y) :- sup_10(x), parent(x,y)
anc_bf(x,y) :- sup_11(x,y)
sup_20(x) :- in_anc_bf(x)
sup_21(x,z) :- sup_20(x), anc_bf(x,z)
sup_22(x,y) :- sup_21(x,z), parent(z,y)
anc_bf(x,y) :- sup_22(x,y)
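To see what the rewritten program materializes, the supplementary-relation rules can be run to a fixpoint. The Python sketch below does this for the query about “Joyce”. It assumes that the input relation in_anc_bf is seeded with the query constant (the slides do not show that seeding rule), and the function name `qsq_anc_bf` is hypothetical.

```python
PARENT = {
    ("Alice", "Nancy"), ("Alice", "Joyce"), ("Joyce", "Lois"),
    ("Lois", "Mark"), ("Lois", "Andy"), ("Joyce", "Ruth"),
}


def qsq_anc_bf(parent, start):
    """Evaluate the rewritten rules until anc_bf stops growing."""
    in_anc_bf = {start}  # assumption: seeded from the constant in the query
    anc_bf = set()
    while True:
        # Rule 1: anc_bf(x,y) :- parent(x,y)
        sup_10 = set(in_anc_bf)
        sup_11 = {(x, y) for x in sup_10 for (p, y) in parent if p == x}
        # Rule 2: anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
        sup_20 = set(in_anc_bf)
        sup_21 = {(x, z) for x in sup_20 for (a, z) in anc_bf if a == x}
        sup_22 = {(x, y) for (x, z) in sup_21 for (p, y) in parent if p == z}
        updated = anc_bf | sup_11 | sup_22
        if updated == anc_bf:
            return anc_bf
        anc_bf = updated


RESULT = qsq_anc_bf(PARENT, "Joyce")
print(sorted(y for (_, y) in RESULT))  # ['Andy', 'Lois', 'Mark', 'Ruth']
```

Only tuples whose first component is Joyce are ever produced; nothing about Alice or Nancy is materialized, in contrast to computing the full anc relation first.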
14
QSQ Example
anc_bf(x,y) :- parent(x,y)
    positions: sup_10(x), sup_11(x,y)
anc_bf(x,y) :- anc_bf(x,z), parent(z,y)
    positions: sup_20(x), sup_21(x,z), sup_22(x,y)
q(y) :- anc_bf(“Joyce”,y)
parent(x,y)
Alice  Nancy
Alice  Joyce
Joyce  Lois
Lois   Mark
Lois   Andy
Joyce  Ruth
Contents of the supplementary relations:
sup_10(x): Joyce
sup_11(x,y): (Joyce, Lois), (Joyce, Ruth)
sup_20(x): Joyce
sup_21(x,z): (Joyce, Lois), (Joyce, Ruth), (Joyce, Mark), (Joyce, Andy)
sup_22(x,y): (Joyce, Mark), (Joyce, Andy)
Query result: Lois, Ruth, Mark, Andy
[Figure: family tree of the parent(x,y) relation]
15
Properties of QSQ
Compute the correct answer to the query
Materialize only a minimal set of tuples
Guaranteed to terminate
QSQ evaluations have nice properties!
16
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
17
Distributed Environment
Centralized Datalog program:
r1 r(x,y) :- a(x,y)
r2 r(x,y) :- s(x,z), t(z,y)
r3 s(x,y) :- r(x,y), b(y,z)
r4 t(x,y) :- c(x,y)
Distribution of the program between 3 peers:
R hosting r, a
S hosting s, b
T hosting t, c
r1 r@R(x,y) :- a@R(x,y)
r2 r@R(x,y) :- s@S(x,z), t@T(z,y)
r3 s@S(x,y) :- r@R(x,y), b@S(y,z)
r4 t@T(x,y) :- c@T(x,y)
The rules at peer P are the rules where P is the peer of the head
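The placement rule (a rule lives at the peer of its head) can be applied mechanically. The Python sketch below groups the four rules by head peer; the string encoding of the rules and the helper names `head_peer` and `rules_by_peer` are assumptions for illustration.

```python
import re
from collections import defaultdict

RULES = {
    "r1": "r@R(x,y) :- a@R(x,y)",
    "r2": "r@R(x,y) :- s@S(x,z), t@T(z,y)",
    "r3": "s@S(x,y) :- r@R(x,y), b@S(y,z)",
    "r4": "t@T(x,y) :- c@T(x,y)",
}


def head_peer(rule):
    """Extract the peer name that follows '@' in the head of a rule."""
    head = rule.split(":-")[0]
    return re.search(r"@(\w+)\(", head).group(1)


def rules_by_peer(rules):
    """Assign each rule to the peer of its head."""
    placement = defaultdict(list)
    for name, rule in rules.items():
        placement[head_peer(rule)].append(name)
    return dict(placement)


print(rules_by_peer(RULES))  # {'R': ['r1', 'r2'], 'S': ['r3'], 'T': ['r4']}
```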
18
Naïve Distributed Evaluation
Activation of remote relations
r2 r@R(x,y) :- s@S(x,z), t@T(z,y)
[Figure: peer R sends a request to each of S and T, and each returns a response]
AXML and Web Services make it very easy!
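The request/response pattern can be imitated in a single process. The Python sketch below is a simplified, synchronous stand-in for naive distributed evaluation: each peer evaluates its own rules and fetches remote relations through a `request` call. The `Peer` class, the round-based loop, and the sample facts for a, b and c are all made up for illustration; a real deployment would use asynchronous service calls.

```python
class Peer:
    """A peer holding local facts; `request` stands in for a remote service call."""

    def __init__(self, name, facts):
        self.name = name
        self.facts = {rel: set(tuples) for rel, tuples in facts.items()}

    def request(self, relation):
        # Return a copy, as a remote call would ship data rather than share it.
        return set(self.facts[relation])


def size(R, S, T):
    return len(R.facts["r"]) + len(S.facts["s"]) + len(T.facts["t"])


def run_round(R, S, T):
    """One round in which every peer applies its rules; True if a fact was added."""
    before = size(R, S, T)
    # Peer R: r1 r@R(x,y) :- a@R(x,y) and r2 r@R(x,y) :- s@S(x,z), t@T(z,y)
    R.facts["r"] |= R.facts["a"]
    s, t = S.request("s"), T.request("t")
    R.facts["r"] |= {(x, y) for (x, z) in s for (z2, y) in t if z == z2}
    # Peer S: r3 s@S(x,y) :- r@R(x,y), b@S(y,z)
    r = R.request("r")
    S.facts["s"] |= {(x, y) for (x, y) in r for (y2, _) in S.facts["b"] if y == y2}
    # Peer T: r4 t@T(x,y) :- c@T(x,y)
    T.facts["t"] |= T.facts["c"]
    return size(R, S, T) > before


# Hypothetical sample facts.
R = Peer("R", {"a": {(1, 2)}, "r": set()})
S = Peer("S", {"b": {(2, 9)}, "s": set()})
T = Peer("T", {"c": {(2, 3)}, "t": set()})
while run_round(R, S, T):
    pass
print(sorted(R.facts["r"]))  # [(1, 2), (1, 3)]
```

The loop stops when a full round adds no fact at any peer, which is the fixpoint condition that a real distributed run has to detect without a global view.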
19
Termination Detection
We need to detect when the system reaches a fixpoint
Fixpoint is reached when no new facts can be derived at any peer
Termination detection is a standard problem in distributed computing
20
Termination Detection
The model:
• Communication is asynchronous
• Each message eventually arrives and is acknowledged
At some point, the site that started the query decides to check for termination
It calls all the sites that it directly invoked and asks them if they have completed
These sites contact the sites they invoked, and so on…
21
Termination Detection
A site answers positively if:
• It is idle (cannot produce more data)
• All the data it has sent has been acknowledged
• All its successors believe the computation terminated
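The three conditions translate directly into a recursive check. The Python sketch below evaluates them over a snapshot of site states; the `Site` fields (`idle`, `unacked`, `successors`) are hypothetical names, and the sketch assumes the invocation structure is a tree, so the recursion never revisits a site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    idle: bool = True        # cannot produce more data
    unacked: int = 0         # messages sent but not yet acknowledged
    successors: list = field(default_factory=list)  # sites this site invoked


def terminated(site):
    """A site answers positively iff all three conditions from the slide hold."""
    return (
        site.idle
        and site.unacked == 0
        and all(terminated(s) for s in site.successors)
    )


T = Site("T")
S = Site("S", unacked=1)          # S still waits for one acknowledgement
R = Site("R", successors=[S, T])  # R started the query and invoked S and T
print(terminated(R))  # False
S.unacked = 0
print(terminated(R))  # True
```

This only checks a frozen snapshot; an actual protocol also has to cope with sites that become active again while the check is in progress.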
22
Termination Detection
r1 r@R(x,y) :- a@R(x,y)
r2 r@R(x,y) :- s@S(x,z), t@T(z,y)
r3 s@S(x,y) :- r@R(x,y), b@S(y,z)
r4 t@T(x,y) :- c@T(x,y)
Build a graph to represent the distributed Datalog program
[Figure: graph over the relations r, a, s, b, t, c]
Recursions result in cycles in the graph
Use a spanning tree of the graph in order to decide termination
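One way to read this slide: draw an edge from each head relation to the relations in its body, observe that the recursion between r and s creates a cycle, and pick a spanning tree along which termination questions are propagated. The Python sketch below builds such a graph and a breadth-first spanning tree rooted at r. The edge direction, the choice of root, and the function names are assumptions for illustration.

```python
from collections import deque

# Head relation -> relations appearing in the bodies of its rules (r1 to r4).
DEPENDS_ON = {
    "r": ["a", "s", "t"],
    "s": ["r", "b"],
    "t": ["c"],
    "a": [], "b": [], "c": [],
}


def on_cycle(graph, node):
    """True if `node` can reach itself, i.e. the relation is recursive."""
    seen, stack = set(), list(graph[node])
    while stack:
        cur = stack.pop()
        if cur == node:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(graph[cur])
    return False


def spanning_tree(graph, root):
    """Breadth-first spanning tree, returned as a child -> parent mapping."""
    tree = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in tree:  # skip edges that would close a cycle
                tree[nxt] = node
                queue.append(nxt)
    return tree


print(on_cycle(DEPENDS_ON, "r"), on_cycle(DEPENDS_ON, "t"))  # True False
print(spanning_tree(DEPENDS_ON, "r"))
```

The tree drops the back edge from s to r, so a termination question sent down the tree reaches every relation exactly once.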
23
Distributed QSQ Rewriting
For each rule: The peer in the head of the rule starts the rewriting
When a remote relation is encountered, the peer delegates the remainder of the rule to the remote peer in charge of that relation
24
Distributed QSQ Rewriting
Centralized:
r_bf(x,y) :- s_bf(x,z), t_bf(z,y)
    positions: sup_0(x), sup_1(x,z), sup_2(x,y)
sup_0(x) :- in_r_bf(x)
sup_1(x,z) :- sup_0(x), s_bf(x,z)
sup_2(x,y) :- sup_1(x,z), t_bf(z,y)
r_bf(x,y) :- sup_2(x,y)
Distributed:
r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y)
    positions: sup_0@R(x), sup_1@S(x,z), sup_2@T(x,y)
R computes: sup_0@R(x) :- in_r_bf@R(x)
R sends to S: sup_2@S(x,y) :- sup_0@R(x), s_bf@S(x,z), t_bf@T(z,y)
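The placement of supplementary relations in the distributed case follows a simple pattern on this slide: sup_0 stays at the peer of the head, and each later one sits at the peer of the body atom it follows. The Python sketch below reproduces only that placement; it does not perform the rewriting or the delegation itself, and the function name `place_supplementary` is hypothetical.

```python
def place_supplementary(head_peer, body_peers):
    """Pair each supplementary relation of one rule with the peer that holds it."""
    placement = [("sup_0", head_peer)]  # before the first body atom
    for i, peer in enumerate(body_peers, start=1):
        placement.append((f"sup_{i}", peer))  # after the i-th body atom
    return placement


# r_bf@R(x,y) :- s_bf@S(x,z), t_bf@T(z,y)
print(place_supplementary("R", ["S", "T"]))
# [('sup_0', 'R'), ('sup_1', 'S'), ('sup_2', 'T')]
```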
25
Distributed QSQ Rewriting
The rewriting is performed locally at each peer, without any global knowledge
Once the QSQ rewriting is complete, we start the QSQ computation process, which proceeds as in the centralized case except for the calls to remote services
26
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
27
Why Active XML?
AXML is a natural choice
An AXML document contains both explicit and implicit data, just like in Datalog
r@R(x,y) :- s@S(x,z), t@T(z,y)
<r>
  <t> <x>1</x> <y>2</y> </t>
  <t> <x>1</x> <y>3</y> </t>
  <sc>…
[Figure: the <sc> service calls point to peers S and T, labeled as continuous services]
28
Implementation Steps
Given a distributed Datalog program and a query:
1. Transform the Datalog program to distributed QSQ
2. Transform the distributed QSQ to Active XML
3. Run!
4. Detect termination
29
Outline
Datalog
Query-Sub-Query (QSQ)
Distributed Query-Sub-Query (dQSQ)
Implementation using AXML
Using dQSQ for Petri Nets
30
Article
“Diagnosis of Asynchronous Discrete Event Systems: Datalog to the Rescue!”
S. Abiteboul, Z. Abrams, S. Haar, T. Milo
PODS, June 2005
31
Datalog & P2P
Deductive databases were a hot topic in the late 80s
Research in this area led to beautiful results, with little industrial impact
Years later, with networks everywhere, recursive data management is becoming more essential
Datalog and QSQ become hot again!
32
Abstract
Diagnosis of distributed telecommunication systems
The problem can be modeled by Datalog
Can benefit from dQSQ
33
Petri Nets
An enabled transition can fire and yield a new Petri net
If a transition fires, its alarm symbol is reported to the supervisor
For example, if transition (i) fires, the marking moves from places 1,7 to places 2,3
[Figure: Petri net with legend: place, marked place, transition, alarm symbol]
The marked places model the current state of the peer
A transition node is enabled iff all its parent nodes are marked
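The enabling and firing rules can be written down in a few lines. The Python sketch below models a marking as a set of place numbers and replays the example of transition (i), whose parent places are 1 and 7 and whose child places are 2 and 3. The `Transition` class, the initial marking, and the alarm symbol "alpha" are assumptions for illustration; the slides do not give the alarm symbol of (i).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    alarm: str            # symbol reported to the supervisor when firing
    parents: frozenset    # input places
    children: frozenset   # output places


def enabled(t, marking):
    """A transition is enabled iff all its parent places are marked."""
    return t.parents <= marking


def fire(t, marking):
    """Return the new marking and the alarm symbol emitted by firing `t`."""
    if not enabled(t, marking):
        raise ValueError(f"transition {t.name} is not enabled")
    return (marking - t.parents) | t.children, t.alarm


# Hypothetical encoding of transition (i) from the example.
t_i = Transition("i", "alpha", frozenset({1, 7}), frozenset({2, 3}))
marking, alarm = fire(t_i, frozenset({1, 7}))
print(sorted(marking), alarm)  # [2, 3] alpha
```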
34
The Problem
The supervisor receives an alarm sequence (a1,p1),(a2,p2),…,(an,pn)
ai: an alarm symbol
pi: the peer that emitted the alarm
Due to asynchronous communication:
• We do not guarantee that alarms sent by different peers appear in the order they were emitted
• We can only assume that the order of alarms is kept for each individual peer
Goal: Find an explanation for a given alarm sequence
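Since only the per-peer order is reliable, two alarm sequences are indistinguishable to the supervisor when they agree on the order of alarms for every peer. The Python sketch below checks that condition; the function names are hypothetical, and the sample sequence is the one used in the example slide.

```python
from collections import defaultdict


def per_peer(sequence):
    """Project an alarm sequence onto each peer, keeping the order of arrival."""
    order = defaultdict(list)
    for alarm, peer in sequence:
        order[peer].append(alarm)
    return dict(order)


def same_per_peer_order(observed, candidate):
    """True if both sequences list the same alarms in the same order for every peer."""
    return per_peer(observed) == per_peer(candidate)


observed = [("b", "p1"), ("a", "p2"), ("c", "p1")]
print(per_peer(observed))  # {'p1': ['b', 'c'], 'p2': ['a']}
# p2's alarm may have been emitted first: still consistent.
print(same_per_peer_order(observed, [("a", "p2"), ("b", "p1"), ("c", "p1")]))  # True
# Swapping the two alarms of p1 is not consistent.
print(same_per_peer_order(observed, [("c", "p1"), ("b", "p1"), ("a", "p2")]))  # False
```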
35
Example
The set of shaded nodes in figure 2 is a diagnosis for the alarm sequence (b; p1), (a; p2), (c; p1).
36
From Petri Nets to dQSQ
Petri Nets can be modeled by Datalog and dQSQ
A set of relations and rules is defined at each peer
Each peer builds its own Datalog program using local information only, even if it has transitions to other peers
37
From Petri Nets to dQSQ
Here is a small part of the Datalog rules…
38
From Petri Nets to AXML
Translation steps from Petri Nets to Active XML:
Petri Net → Datalog (PNet2Datalog)
Datalog → QSQ (Datalog2QSQ)
QSQ → AXML (QSQ2AXML)
39
The End