1
Debugging Schema Mappings with Routes
Laura Chiticariu, UC Santa Cruz
(joint work with Wang-Chiew Tan)
2
SPIDER: A Schema Mapping Debugger
Today 14:00-15:30, Thursday 11:00-12:30
Demo group B
3
Schema Mappings
A schema mapping is a logical assertion that describes the correspondence between two schemas.
Key element in data exchange and data integration systems.
Data exchange [FKMP05]: translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied.
[Figure: schema mapping M between source schema S and target schema T; source instance I is translated into target instance J]
4
Debugging a Data Exchange Today
[Figure: source instance I (schema S) is translated into target instance J (schema T) by XQuery/XSLT/Java code that implements M]
Debugging at the (low) level of the implementation:
1. Specific to the data exchange engine
2. Specific to the implementation language: XQuery, SQL, etc.
Debugging at the level of schema mappings: NO SUPPORT!!!
5
Debugging Schema Mappings
Debugging schema mappings: the process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings
6
Outline
Overview
Motivation
Debugging schema mappings with routes
  Motivating example
  What are routes?
  Computing routes
  Related work
Performance evaluation
Conclusions
7
Motivation
Schema mappings are good:
  Higher-level, declarative programming constructs: they hide implementation details and allow for optimization
  Typically easier to understand than SQL/XSLT/XQuery/Java
  Serve a similar goal as model management [Bernstein03, MBHR05]
Uniformity in specifying and debugging: reduce programming effort by allowing a user to specify and debug at the level of schema mappings
Schema mappings are often generated by schema matching tools:
  Close to the user's intention, but may need further refinement
  Hard to understand without the help of tools
8
Language for Schema Mappings
Tuple generating dependencies (tgds): ∀x (φ(x) → ∃y ψ(x,y))
Equality generating dependencies (egds): ∀x (φ(x) → x1 = x2)
Remarks:
  Widely used for relational schema mappings in data exchange and data integration [Kolaitis05, Lenzerini02]
  TGDs generalize LAV and GAV, and are equivalent to GLAV assertions in the terminology of data integration
  Extended to handle XML data exchange [PVMHF02]
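Both dependency forms can be checked mechanically on a finite instance. The Python sketch below is illustrative only and rests on an encoding of our own, not the slides': an atom is a pair (relation, terms), terms starting with "?" are variables, and all other terms are constants. A tgd holds when every match of φ extends to a match of ψ (existential variables bind freely); an egd holds when every match of φ equates the two named variables.

```python
# Sketch: tgds and egds as data, plus a brute-force satisfaction check.
# Assumed encoding (not from the slides): an atom is (relation, terms);
# terms starting with "?" are variables, everything else is a constant.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def homomorphisms(atoms, facts, h=None):
    """Yield every extension of the assignment h mapping all atoms into facts."""
    h = dict(h or {})
    if not atoms:
        yield h
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for frel, values in facts:
        if frel != rel or len(values) != len(terms):
            continue
        ext, ok = dict(h), True
        for t, v in zip(terms, values):
            if is_var(t):
                if ext.setdefault(t, v) != v:  # variable already bound elsewhere
                    ok = False
                    break
            elif t != v:  # a constant in the atom must match exactly
                ok = False
                break
        if ok:
            yield from homomorphisms(rest, facts, ext)

def satisfies_tgd(tgd, facts):
    """forall x (phi(x) -> exists y psi(x, y)): every match of phi extends to psi."""
    phi, psi = tgd
    return all(any(True for _ in homomorphisms(psi, facts, h))
               for h in homomorphisms(phi, facts))

def satisfies_egd(egd, facts):
    """forall x (phi(x) -> x1 = x2): every match of phi equates x1 and x2."""
    phi, (x1, x2) = egd
    return all(h[x1] == h[x2] for h in homomorphisms(phi, facts))

# S(x) -> exists y T(x, y)    and    T(x, y) and T(x, z) -> y = z
tgd = ([("S", ("?x",))], [("T", ("?x", "?y"))])
egd = ([("T", ("?x", "?y")), ("T", ("?x", "?z"))], ("?y", "?z"))

K = {("S", ("1",)), ("T", ("1", "N1"))}
print(satisfies_tgd(tgd, K), satisfies_egd(egd, K))    # True True
K2 = K | {("T", ("1", "N2"))}
print(satisfies_tgd(tgd, K2), satisfies_egd(egd, K2))  # True False
```

For a fixed dependency the search is polynomial in the instance size, with the number of atoms in the exponent, which matches the usual assumption that the mapping is fixed.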
9
Relational Schema Mappings [FKMP03]
Schema mapping M = (S, T, Σst ∪ Σt)
S, T: relational schemas with no relation symbols in common
Source-to-target dependencies Σst:
  source-to-target tgds (s-t tgds): S(x) → ∃y T(x,y)
Target dependencies Σt:
  target tgds: T(x) → ∃y T(x,y)
  target egds: T(x) → x1 = x2
[Figure: Σst relates source schema S (source instance I) to target schema T (target instance J); Σt constrains T]
10
Example Schema Mapping
Source schema S (MANHATTAN CREDIT): CardHolders(cardNo, limit, ssn, name), Dependents(accNo, ssn, name)
Target schema T (FARGO FINANCE): Accounts(accNo, creditLine, accHolder), Clients(ssn, name)
Source-to-target dependencies, Σst:
  m1: CardHolders(cn,l,s,n) → ∃L (Accounts(cn,L,s) ∧ Clients(s,n))
  m2: Dependents(an,s,n) → Clients(s,n)
Target dependencies, Σt:
  m3: Clients(s,n) → ∃A ∃L (Accounts(A,L,s))
  (fk1: foreign key between Clients.ssn and Accounts.accHolder, expressed by m3)
Source instance I:
  CardHolders: (123, $15K, ID1, Alice)
  Dependents: (123, ID2, Bob)
Target instance J, a solution for I under the schema mapping:
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)
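As a sanity check on the example, the sketch below encodes m1, m2, m3, I and J and verifies by brute force that J satisfies every dependency. It uses a hypothetical encoding of our own ("?"-prefixed strings are variables) and treats labeled nulls such as L1 and A2 as ordinary values. Removing Accounts(A2, L2, ID2) shows m3 at work.

```python
# Sketch: check that J is a solution for I under {m1, m2, m3}.
# Encoding (assumed for illustration): "?"-prefixed strings are variables.

def homs(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for frel, vals in facts:
        if frel != rel or len(vals) != len(terms):
            continue
        ext = dict(h)
        if all(ext.setdefault(t, v) == v if t.startswith("?") else t == v
               for t, v in zip(terms, vals)):
            yield from homs(rest, facts, ext)

def satisfied(tgd, facts):
    """Every match of the LHS extends to a match of the RHS."""
    phi, psi = tgd
    return all(any(True for _ in homs(psi, facts, h)) for h in homs(phi, facts, {}))

m1 = ([("CardHolders", ("?cn", "?l", "?s", "?n"))],
      [("Accounts", ("?cn", "?L", "?s")), ("Clients", ("?s", "?n"))])
m2 = ([("Dependents", ("?an", "?s", "?n"))], [("Clients", ("?s", "?n"))])
m3 = ([("Clients", ("?s", "?n"))], [("Accounts", ("?A", "?L", "?s"))])

I = {("CardHolders", ("123", "$15K", "ID1", "Alice")),
     ("Dependents", ("123", "ID2", "Bob"))}
J = {("Accounts", ("123", "L1", "ID1")), ("Accounts", ("A2", "L2", "ID2")),
     ("Clients", ("ID1", "Alice")), ("Clients", ("ID2", "Bob"))}

print(all(satisfied(m, I | J) for m in (m1, m2, m3)))  # True

# Dropping Bob's account violates m3: client ID2 would have no Accounts tuple.
J_bad = J - {("Accounts", ("A2", "L2", "ID2"))}
print(satisfied(m3, I | J_bad))  # False
```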
11
Example Debugging Scenario 1
Unknown credit limit? $15K is not copied over to the target.
A route for the Accounts tuple:
  CardHolders(123, $15K, ID1, Alice) →(m1) Accounts(123, L1, ID1), Clients(ID1, Alice)
m1: CardHolders(cn,l,s,n) → ∃L (Accounts(cn,L,s) ∧ Clients(s,n))
12
Refined m1 (copies the credit limit l instead of inventing L): CardHolders(cn,l,s,n) → (Accounts(cn,l,s) ∧ Clients(s,n))
13
Example Debugging Scenario 2
Unknown account number? 123 is not copied over to the target as Bob's account number.
Route for the Accounts tuple with accNo A2:
  Dependents(123, ID2, Bob) →(m2) Clients(ID2, Bob) →(m3) Accounts(A2, L2, ID2)
m2: Dependents(an,s,n) → Clients(s,n)
14
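Refined mapping (joins Dependents with CardHolders on the account number, so Bob's Accounts tuple gets accNo 123 and the card's credit limit):
m'2: CardHolders(an,l,s',n') ∧ Dependents(an,s,n) → Accounts(an,l,s) ∧ Clients(s,n)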
15
Debugging Schema Mappings with Routes
Main intuition: routes describe the relationships between source and target data with the schema mapping.
Definition: Let M be a schema mapping, I a source instance, J a solution for I under M, and Js ⊆ J.
A route for Js with M and (I,J) is a finite non-empty sequence of satisfaction steps
  (I,∅) →(m1,h1) (I,J1) →(m2,h2) … →(mn,hn) (I,Jn)
such that Ji ⊆ J and mi ∈ Σst ∪ Σt for 1 ≤ i ≤ n, and Js ⊆ Jn.
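The route definition translates directly into a checker. This is a minimal sketch under assumptions of our own (hypothetical helper names, "?"-prefixed variables, each step given as a tgd name plus an assignment h that binds every variable of the tgd): each step must map φ into I together with the target tuples produced so far, must map ψ into J, and the final intermediate instance must contain Js. It is exercised on the route from Scenario 2.

```python
# Sketch: validate a candidate route (a sequence of satisfaction steps).
# Encoding assumed for illustration: "?"-prefixed strings are variables;
# each step is (tgd name, h) where h binds every variable of the tgd.

def ground(atoms, h):
    """h applied to a list of atoms, as a set of facts (KeyError if h is partial)."""
    return {(rel, tuple(h[t] if t.startswith("?") else t for t in terms))
            for rel, terms in atoms}

def is_route(mapping, I, J, Js, steps):
    current = set()                               # J_0 is the empty instance
    for name, h in steps:
        phi, psi = mapping[name]
        if not ground(phi, h) <= I | current:     # h maps phi into (I, J_{i-1})
            return False
        produced = ground(psi, h)
        if not produced <= J:                     # h maps psi into the solution J
            return False
        current |= produced                       # J_i = J_{i-1} ∪ h(psi)
    return len(steps) > 0 and Js <= current       # non-empty, and Js ⊆ J_n

mapping = {
    "m2": ([("Dependents", ("?an", "?s", "?n"))], [("Clients", ("?s", "?n"))]),
    "m3": ([("Clients", ("?s", "?n"))], [("Accounts", ("?A", "?L", "?s"))]),
}
I = {("CardHolders", ("123", "$15K", "ID1", "Alice")),
     ("Dependents", ("123", "ID2", "Bob"))}
J = {("Accounts", ("123", "L1", "ID1")), ("Accounts", ("A2", "L2", "ID2")),
     ("Clients", ("ID1", "Alice")), ("Clients", ("ID2", "Bob"))}
Js = {("Accounts", ("A2", "L2", "ID2"))}

steps = [("m2", {"?an": "123", "?s": "ID2", "?n": "Bob"}),
         ("m3", {"?s": "ID2", "?n": "Bob", "?A": "A2", "?L": "L2"})]
print(is_route(mapping, I, J, Js, steps))        # True
print(is_route(mapping, I, J, Js, steps[::-1]))  # False: m3 needs Clients(ID2, Bob) first
```

Only m2 and m3 are included because they are the only tgds this route uses.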
16
Example of Satisfaction Step
CardHolders(123, $15K, ID1, Alice) →(m1,h1) Accounts(123, L1, ID1), Clients(ID1, Alice)
m1: CardHolders(cn,l,s,n) → ∃L (Accounts(cn,L,s) ∧ Clients(s,n))
h1 = {cn ↦ 123, l ↦ $15K, s ↦ ID1, n ↦ Alice, L ↦ L1}
17
Compute all routes
The schema mapping M is fixed.
Input: a source instance I, a solution J for I under M, and a set of target tuples Js ⊆ J
Output: a forest representing all routes for Js
Algorithm idea: for each tuple t in Js, consider every possible σ ∈ Σst ∪ Σt and homomorphism h witnessing t. Do the same for all target tuples encountered during the process, until tuples from the source instance are reached.
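A brute-force, in-memory reading of this algorithm idea, for illustration only (the implementation described later pushes these searches to the database): for each tuple t, try every tgd σ, every RHS atom that can produce t, and every homomorphism h of φ ∧ ψ into (I, J) that extends that match. The ground atoms h(φ) are the branch's premises, target premises are explored in turn, and each tuple is explored only once. The names route_forest, homs and ground are hypothetical; the sketch is run on the σ1 to σ7 example that follows.

```python
# Sketch of the "compute all routes" idea on ground data (brute force, in memory).
# Assumptions: "?"-prefixed strings are variables; source and target relations
# have distinct names, so I and J can be searched together.

def homs(atoms, facts, h):
    """Yield every extension of h that maps all atoms into facts."""
    if not atoms:
        yield h
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for frel, vals in facts:
        if frel != rel or len(vals) != len(terms):
            continue
        ext = dict(h)
        if all(ext.setdefault(t, v) == v if t.startswith("?") else t == v
               for t, v in zip(terms, vals)):
            yield from homs(rest, facts, ext)

def ground(atoms, h):
    return {(rel, tuple(h[t] if t.startswith("?") else t for t in terms))
            for rel, terms in atoms}

def route_forest(Js, mapping, I, J):
    """Map each explored target tuple to its branches (tgd, h, premises)."""
    facts, forest, todo = I | J, {}, list(Js)
    while todo:
        t = todo.pop()
        if t in forest:                      # never explore the same tuple twice
            continue
        seen, branches = set(), []
        for name, (phi, psi) in mapping.items():
            for head in psi:                 # try to produce t with each RHS atom
                for h0 in homs([head], {t}, {}):
                    for h in homs(phi + psi, facts, h0):   # witness in (I, J)
                        key = (name, tuple(sorted(h.items())))
                        if key in seen:
                            continue
                        seen.add(key)
                        premises = ground(phi, h)
                        branches.append((name, h, premises))
                        todo.extend(p for p in premises if p in J)
        forest[t] = branches
    return forest

def unary(rel):
    return (rel, ("?x",))

mapping = {
    "σ1": ([unary("S1")], [unary("T1")]),
    "σ2": ([unary("S2")], [unary("T2"), unary("T6")]),
    "σ3": ([unary("T2")], [unary("T3")]),
    "σ4": ([unary("T3")], [unary("T4")]),
    "σ5": ([unary("T4"), unary("T1")], [unary("T5")]),
    "σ6": ([unary("T4"), unary("T6")], [unary("T7")]),
    "σ7": ([unary("T5")], [unary("T3")]),
}
I = {("S1", ("a",)), ("S2", ("a",))}
J = {(f"T{i}", ("a",)) for i in range(1, 8)}

forest = route_forest({("T7", ("a",))}, mapping, I, J)
for t, branches in sorted(forest.items()):
    print(t[0], sorted(name for name, _, _ in branches))  # e.g. T3 ['σ3', 'σ7']
```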
18
Compute all routes: A simple example
Σst: σ1: S1(x) → T1(x); σ2: S2(x) → T2(x) ∧ T6(x)
Σt: σ3: T2(x) → T3(x); σ4: T3(x) → T4(x); σ5: T4(x) ∧ T1(x) → T5(x); σ6: T4(x) ∧ T6(x) → T7(x); σ7: T5(x) → T3(x)
Source instance I: S1(a), S2(a)
A solution J: T1(a), …, T7(a)
First step: T7(a) ⇐ σ6, {x ↦ a}: T4(a), T6(a)
23
Compute all routes: A simple example (complete route forest for T7(a); every step uses x ↦ a)
T7(a) ⇐ σ6: T4(a), T6(a)
  T4(a) ⇐ σ4: T3(a)
    T3(a) ⇐ σ3: T2(a)
      T2(a) ⇐ σ2: S2(a)
    T3(a) ⇐ σ7: T5(a)
      T5(a) ⇐ σ5: T4(a) (not explored again), T1(a)
        T1(a) ⇐ σ1: S1(a)
  T6(a) ⇐ σ2: S2(a)
Route for T7(a): σ2, σ3, σ4, σ2, σ6
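Reading a single route off the forest is a depth-first walk: pick the first branch of a tuple whose premises all have routes, emit the premises' steps first, and reject tuples already on the current path so the T4 → T3 → T5 → T4 cycle is never followed. The sketch below hard-codes the forest from the slide, with facts written as plain strings (a simplification of our own), and reproduces the listed route. Its second σ2 is a legal but redundant step, since T6(a) was already produced by the first σ2.

```python
# Route forest for T7(a), transcribed from the slide: tuple -> branches,
# each branch = (tgd, premises). Source tuples have no entry.
FOREST = {
    "T7(a)": [("σ6", ["T4(a)", "T6(a)"])],
    "T4(a)": [("σ4", ["T3(a)"])],
    "T3(a)": [("σ3", ["T2(a)"]), ("σ7", ["T5(a)"])],
    "T5(a)": [("σ5", ["T4(a)", "T1(a)"])],
    "T2(a)": [("σ2", ["S2(a)"])],
    "T6(a)": [("σ2", ["S2(a)"])],
    "T1(a)": [("σ1", ["S1(a)"])],
}
SOURCE = {"S1(a)", "S2(a)"}

def one_route(t, on_path=frozenset()):
    """Return a list of tgd names forming a route for t, or None.

    Depth-first: take the first branch whose premises all have routes,
    emitting the premises' steps before the step that produces t.
    Tuples already on the current path are rejected to break cycles.
    """
    if t in SOURCE:
        return []
    if t in on_path:
        return None
    for tgd, premises in FOREST.get(t, []):
        steps = []
        for p in premises:
            sub = one_route(p, on_path | {t})
            if sub is None:
                break                 # this branch fails; try the next one
            steps += sub
        else:
            return steps + [tgd]      # all premises have routes
    return None

print(one_route("T7(a)"))  # ['σ2', 'σ3', 'σ4', 'σ2', 'σ6']
```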
24
Properties of compute all routes
Completeness: let F denote the route forest returned by our algorithm on Js. If R is a minimal route for Js, then R is represented in F.
Running time: polynomial in the sizes of I, J and Js
  Every "branch of a tuple", once explored, is never explored again
  Polynomial number of branches for each tuple, since M is fixed
Challenge: exponentially many routes, but a polynomial-size representation constructed in polynomial time
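To see why a compact forest matters, consider a hypothetical forest (not from the slides) in which each Ti has two alternative branches over T(i-1). Independent choices multiply, so there are 2^n routes for Tn, while the forest itself has only 2n + 1 branches. The sketch counts both with memoized recursion; it assumes an acyclic forest.

```python
from functools import lru_cache

def chain_forest(n):
    """Hypothetical forest: T0 comes straight from the source; every Ti (i >= 1)
    has two alternative branches, via tgds a_i and b_i, each with premise T(i-1)."""
    forest = {"T0": [("src", [])]}
    for i in range(1, n + 1):
        forest[f"T{i}"] = [(f"a{i}", [f"T{i-1}"]), (f"b{i}", [f"T{i-1}"])]
    return forest

def count_routes(forest, t):
    """Number of ways to pick one branch per needed tuple (acyclic forests only)."""
    @lru_cache(maxsize=None)
    def routes(u):
        total = 0
        for _tgd, premises in forest[u]:
            ways = 1
            for p in premises:
                ways *= routes(p)     # independent choices multiply
            total += ways             # alternative branches add
        return total
    return routes(t)

n = 20
forest = chain_forest(n)
print(count_routes(forest, f"T{n}"))          # 1048576 routes
print(sum(len(b) for b in forest.values()))   # 41 branches in the forest
```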
25
Compute one route
Our experimental results indicate that compute all routes can be expensive. Can we generate one route fast, and alternative routes as needed?
Our solution: adapt compute all routes to compute only one route
  Non-exhaustive: stops when one witness is found; a witness that uses source tuples is preferred
  Inference procedure: deduces all consequences of a proven tuple and avoids recomputation of "branches" (key step for the polynomial-time analysis)
Completeness: if there is a route for Js, then our algorithm will produce a route for Js
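The inference-procedure idea (deduce all consequences of a proven tuple, never derive a tuple twice) can be illustrated bottom-up. This sketch is a forward-chaining variant of our own, not the slides' algorithm, which adapts the top-down compute-all-routes search: it fires each ground step at most once, records the first step that proves each fact, and then walks those first proofs back from Js. The steps are the ground instances of σ1 to σ7 with x ↦ a, written as plain strings.

```python
from collections import deque

def one_route_bottom_up(steps, source, Js):
    """steps: step id -> (premises, conclusions), all ground facts."""
    uses = {}                                    # fact -> steps that need it
    for sid, (premises, _) in steps.items():
        for p in premises:
            uses.setdefault(p, []).append(sid)
    proof = {t: None for t in source}            # fact -> step that first proved it
    fired, queue = [], deque(sorted(source))
    while queue:                                 # deduce consequences of each proven fact
        t = queue.popleft()
        for sid in uses.get(t, []):
            premises, conclusions = steps[sid]
            if sid in fired or not all(p in proof for p in premises):
                continue
            fired.append(sid)                    # each step fires at most once
            for c in sorted(conclusions):
                if c not in proof:               # a proven fact is never re-derived
                    proof[c] = sid
                    queue.append(c)
    if not all(t in proof for t in Js):
        return None                              # some selected fact has no route
    needed, stack = set(), list(Js)
    while stack:                                 # follow first proofs back to the source
        sid = proof[stack.pop()]
        if sid is not None and sid not in needed:
            needed.add(sid)
            stack.extend(steps[sid][0])
    return [sid for sid in fired if sid in needed]

STEPS = {
    "σ1": ({"S1(a)"}, {"T1(a)"}),
    "σ2": ({"S2(a)"}, {"T2(a)", "T6(a)"}),
    "σ3": ({"T2(a)"}, {"T3(a)"}),
    "σ4": ({"T3(a)"}, {"T4(a)"}),
    "σ5": ({"T4(a)", "T1(a)"}, {"T5(a)"}),
    "σ6": ({"T4(a)", "T6(a)"}, {"T7(a)"}),
    "σ7": ({"T5(a)"}, {"T3(a)"}),
}
print(one_route_bottom_up(STEPS, {"S1(a)", "S2(a)"}, {"T7(a)"}))  # ['σ2', 'σ3', 'σ4', 'σ6']
print(one_route_bottom_up(STEPS, {"S1(a)"}, {"T7(a)"}))           # None
```

Because every ground step fires at most once, the work is polynomial in the number of ground steps; the returned route contains no redundant σ2.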
26
Related work
Commercial data exchange systems, e.g., Altova MapForce, Stylus Studio
  Use "lower-level" languages (e.g., XSLT, XQuery) to specify the exchange
  Debugging is done at this low level
  Source-tuple centric
Data viewer [YMHF01]
  Constructs an "example" source instance that illustrates the behavior of the schema mapping
  Complementary to our approach
  Works only for relational schema mappings
27
Related work
Computing routes for target data is related to computing the provenance (aka lineage) of data:
         SQL                                  Schema mappings
Eager    DBNotes [BCTV04], Mondrian [GKM06]   MXQL system [VMM05]
Lazy     [CWW00], [CW00a, CW00b]              Our routes approach
28
Empirical Evaluation
Implementation: on top of the Clio data exchange system from IBM Almaden Research Center
  Scalable: push computation to the database
  Handles relational and XML schema mappings [PVMHF02]
Testbed:
  Created relational and XML schema mappings based on the TPC-H schema
  Created schema mappings based on the Mondial, DBLP and Amalgam schemas
Methodology: measured the influence of
  the sizes of I, J and Js
  the complexity of Σst ∪ Σt, i.e., the number of tgds and the number of atoms in each tgd
Setup: P4 2.8GHz, 2GB RAM, 256MB DB2 buffer pool
Our regret: no benchmark on which to base our comparisons
29
ComputeOneRoute with a relational schema mapping: influence of the sizes of I and J
TGDs with 1 join in the LHS and RHS; routes with 3 satisfaction steps for each selected tuple
[Plot: compute one route time (sec, 0-8) vs. # selected target tuples (1-20); series: I: 10MB, J: 60MB; I: 50MB, J: 300MB; I: 100MB, J: 600MB]
30
ComputeOneRoute with a relational schema mapping: influence of the complexity of Σst ∪ Σt
TGDs with 0 to 3 joins in the LHS and RHS; routes with 3 satisfaction steps for each selected tuple
Size of I = 100MB, size of J = 600MB
[Plot: compute one route time (sec, 0-20) vs. # selected target tuples (1-20); series: no joins, 1 join, 2 joins, 3 joins]
31
ComputeOneRoute vs. ComputeAllRoutes
TGDs with 1 join in the LHS and RHS; routes with 3 satisfaction steps
Size of I = 100MB, size of J = 600MB
[Plot: running time (sec, log scale 0.001-1000) vs. # selected target tuples (1-20); series: computeOneRoute, computeAllRoutes]
32
Experimental results with Mondial, DBLP and Amalgam
Role  Schema           Total elems.  Atomic elems.  Nest. depth  Inst. size  |Σst|/|Σt|
S     DBLP1 (XML)      65            57             1            640KB       10/14
      DBLP2 (XML)      20            12             4            850KB
T     Amalgam (rel)    117           100            1            1.1MB
S     Mondial1 (rel)   157           129            1            1MB         13/25
T     Mondial2 (XML)   144           112            4            1.2MB
(S = source schema, T = target schema)
33
Two DBLP schemas and datasets, both XML: DBLP1, DBLP2
First relational schema from Amalgam test suite
34
Two Mondial schemas and datasets: one relational (Mondial1), the other XML (Mondial2)
Designed Σst and used the foreign key constraints as Σt
35
Compute one route: under 3 seconds for 1-10 randomly selected tuples
Compute all routes: can take much longer, e.g., 18 seconds to construct the route forest for 10 selected tuples in the target instance of Mondial, while compute one route took under 1 second
36
Conclusions
Debugging schema mappings with routes
  Complete, polynomial-time algorithms for computing routes
  Extension for routes for selected source data
  Routes have declarative semantics, based on the logical satisfaction of tgds
  What we don't do: illustrate data merging
Future work:
  Illustrate grouping semantics for nested schema mappings
  Adapt the target instance to changes in the schema mapping and data sources
37
SPIDER: A Schema Mappings Debugger
Compute one/all routes
Alternative routes
Guided computation of routes
Standard debugging features: breakpoints, "watch" windows
Schema-level routes
Today 14:00-15:30, Thursday 11:00-12:30
Demo group B
38
Thank you!
39
How do we do it?
[Figure: the schema mappings debugger takes source schema S, target schema T, schema mapping M, source instance I and target instance J, and computes routes]
Routes witness selected target data with source data and M.
40
How do we do it?
[Figure: the schema mappings debugger computes routes over S, T, M, I and J]
Routes illustrate the consequences of selected source data with M.
41
Key concept: ROUTES describe the relationships between source and target data with the schema mapping
[Figure: the schema mappings debugger computes routes over S, T, M, I and J]
42
Clio
A semi-automatic schema mapping system
  Supports user-guided mapping from source to target with constraints
  Schema mapping language: a nested extension of tgds and egds
  Automatically generates XQuery/SQL/XSLT scripts for the actual data transfer based on the schema mapping
  Generates universal solutions under relational-to-relational schema mappings
Implemented our techniques on top of Clio, but routes have declarative semantics, independent of Clio's transformation engine
[Figure: Clio turns a mapping between two schemas into XQuery/SQL/XSLT that transforms the source data into target data]
43
Related work
[Figure: eager approach: query Q is re-engineered into Q', which outputs provenance information along with the result]
44
Related work
[Figure: lazy approach: query Q runs as is, with no re-engineering of the query]
45
Related work
Approaches to computing provenance:
Eager: changes the transformation to carry provenance information
  Requires re-engineering of Q to Q'
  No subsequent access to the source or to the definition of Q or Q' is needed
Lazy: does not change the transformation
  No re-engineering of Q
  Subsequent source access and access to the definition of Q may be needed
[Figure: eager approach: Q is re-engineered into Q', which outputs provenance information]
47
Programming Languages vs. Schema Mappings
Debugging programming languages vs. debugging schema mappings
Procedural PL:
  We may have a specification (e.g., compute x² on input x) that completely determines the output: a well-defined notion of correct answer
  The program is an implementation of the specification; if the correct answer is not obtained, there is a bug and we need to debug the implementation
  However, the specification may also not be that concrete, e.g., build a visual interface for …
Functional PL:
  Debugging is performed by analyzing a trace of the execution; declarative approach for debugging [Nilsson94]
Schema mappings:
  The schema mapping IS the specification
  Infinite number of solutions consistent with the schema mapping
  Best we can do: look at the target instance; if something looks wrong (e.g., the clients' names are not copied to the target), go back to the schema mapping and try to refine (or debug) it
48
Related Work: Computing Provenance of Data over SQL Queries
Compute the provenance of relational data in a view in data warehouses [CWW2000]
The provenance of a tuple t in a view is described as the tuples in the base tables that witness the existence of t
         SQL                             Schema mappings
Eager    DBNotes [BCTV2004][CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                       Our approach
Example database DB: R = {(1,2), (4,5)}, S = {(2,3), (6,7)}
View definition: T(a,c) :- R(a,b) ∧ S(b,c), giving T = {(1,3)}
Provenance of T(1,3), answered using two reverse queries:
  R(a,b) :- R(a,b) ∧ S(b,c) ∧ a=1 ∧ c=3
  S(b,c) :- R(a,b) ∧ S(b,c) ∧ a=1 ∧ c=3
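The two reverse queries can be run directly on the slide's data. A minimal Python sketch, using plain sets in place of database tables; the function names view and lineage are hypothetical.

```python
# Base tables from the slide.
R = {(1, 2), (4, 5)}
S = {(2, 3), (6, 7)}

def view(R, S):
    """T(a, c) :- R(a, b), S(b, c)."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def lineage(t, R, S):
    """Reverse queries for a view tuple t = (a0, c0):
    R(a, b) :- R(a, b), S(b, c), a = a0, c = c0
    S(b, c) :- R(a, b), S(b, c), a = a0, c = c0"""
    a0, c0 = t
    joins = [(a, b, c) for (a, b) in R for (b2, c) in S
             if b == b2 and a == a0 and c == c0]
    return {(a, b) for a, b, _ in joins}, {(b, c) for _, b, c in joins}

print(view(R, S))             # {(1, 3)}
print(lineage((1, 3), R, S))  # ({(1, 2)}, {(2, 3)})
```

This is the lazy style: the view definition is left untouched, and provenance is recovered afterwards by querying the base tables again.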
49
Related Work: Computing Provenance of Data over SQL Queries
DBNotes: an annotation management system for relational databases
  Each data value has zero or more annotations
  pSQL: a query language for propagating annotations, with 3 propagation schemes: DEFAULT, DEFAULT-ALL, CUSTOM; by default, annotations propagate according to provenance
Eager approach: annotations propagate along with data as data is transformed through queries
  Provenance information is readily available in the output
  Automatically traces the provenance and flow of data over multiple transformation steps
  Systematically maintains provenance annotations that describe the exact location of data values
         SQL                             Schema mappings
Eager    DBNotes [BCTV2004][CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                       Our approach
Example: transformation T(a,c) :- R(a,b) ∧ S(b,c) from DB1 (R = {(1,2), (4,5)}, S = {(2,3), (6,7)}) to DB2 (T = {(1,3)})
50
Related Work: Computing the Provenance of Data over Schema Mappings
MXQL system over relational/XML schema mappings: eager approach
  Additional information about the source schema elements and mappings that contribute to the creation of target data is propagated and stored
  Ours: lazy, no re-engineering
MXQL: non-automatic approach for answering provenance; the additional information needs to be queried using MXQL
  Ours: we automatically compute routes for selected data
MXQL: data involved in the transformation is not considered
  Ours: our routes contain information about schema elements, dependencies and data involved
         SQL                             Schema mappings
Eager    DBNotes [BCTV2004][CTV2005]     MXQL system [VMM2005]
Lazy     [CWW2000]                       Our approach
51
Related Work: the Data Viewer
Schema mapping M = (S, T, Σst ∪ Σt)
S: Dept(dID, dName) and Emp(dID, name); T: DeptEmp(dID, dName, employee)
Σst: Dept(id,n) → ∃E DeptEmp(id,n,E)
     Dept(id,n) ∧ Emp(id,e) → DeptEmp(id,n,e)
Σt = ∅
Example source instance created to illustrate M:
Dept: (D1, Computer Science): department that has at least one employee (will join with Emp)
      (D2, Anthropology): department with no employee (will not join with Emp)
Emp:  (D1, Alice): employee of a department
      (D3, Bob): employee with no department (will not appear in the target)
52
Universal Solutions [FKMP02]
Definition: Given two instances K1 and K2, a homomorphism h: K1 → K2 is a function h: Const ∪ Var → Const ∪ Var such that:
  h(c) = c for all constants c
  for every fact R(a1, …, an) ∈ K1, the fact R(h(a1), …, h(an)) ∈ K2
Example: J1 = {V(1,N1), V(N2,2)}, J2 = {V(1,2)}; h: J1 → J2 is h = {1 ↦ 1, N1 ↦ 2, N2 ↦ 1, 2 ↦ 2}
Definition: Let M = (S, T, Σst ∪ Σt) be a schema mapping. If I is a source instance, then a universal solution for I is a solution J for I such that for every solution J' for I, there exists a homomorphism h: J → J'.
Example: Σst: R(x) → ∃N V(x,N); U(x) → ∃N V(N,x). Source instance I = {R(1), U(2)}
  J2 = {V(1,2)} is not a universal solution for I
  J1 = {V(1,N1), V(N2,2)} is a universal solution for I
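The homomorphism example can be checked by brute force: try to map each fact of one instance onto a fact of the other, fixing constants and binding labeled nulls consistently, and backtrack on failure. The sketch assumes, for illustration only, that nulls are strings starting with "N" and constants are all other strings. It confirms that J1 maps into J2 but not the other way round, which is why J2 cannot be universal: J1 is itself a solution, and J2 has no homomorphism into it.

```python
def is_null(v):
    # Assumed convention for this sketch: labeled nulls are strings like "N1".
    return v.startswith("N")

def find_hom(K1, K2):
    """Return a null assignment h with h(K1) ⊆ K2, or None if none exists."""
    return _extend(list(K1), K2, {})

def _extend(facts, K2, h):
    if not facts:
        return h
    (rel, vals), rest = facts[0], facts[1:]
    for rel2, vals2 in K2:
        if rel2 != rel or len(vals2) != len(vals):
            continue
        ext = dict(h)
        # Constants must map to themselves; nulls must map consistently.
        if all(ext.setdefault(v, w) == w if is_null(v) else v == w
               for v, w in zip(vals, vals2)):
            found = _extend(rest, K2, ext)
            if found is not None:
                return found
    return None  # backtrack: no image for this fact extends h

J1 = {("V", ("1", "N1")), ("V", ("N2", "2"))}
J2 = {("V", ("1", "2"))}
print(find_hom(J1, J2))  # {'N1': '2', 'N2': '1'} (key order may vary)
print(find_hom(J2, J1))  # None
```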
53
Homomorphism
Definition: Let φ(x) be a conjunction of atoms and K an instance. A homomorphism h: φ(x) → K maps the variables of φ(x) so that h(φ(x)) = { R(h(z)) | R(z) is a relational atom in φ(x) } ⊆ K.
Example: two homomorphisms from Accounts(u,v,w) ∧ Clients(w,x) to the target instance J:
  h1 = {u ↦ 123, v ↦ L1, w ↦ ID1, x ↦ Alice}
  h2 = {u ↦ A2, v ↦ L2, w ↦ ID2, x ↦ Bob}
Target instance J:
  Accounts: (123, L1, ID1), (A2, L2, ID2)
  Clients: (ID1, Alice), (ID2, Bob)
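Homomorphisms from a conjunction of atoms into an instance are exactly the answers of the corresponding conjunctive query, which is also what lets route computation be pushed to a database. A sketch with Python's built-in sqlite3 module, using the example's column names; the shared variable w becomes the join condition, and DISTINCT gives set semantics.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Accounts (accNo TEXT, creditLine TEXT, accHolder TEXT)")
con.execute("CREATE TABLE Clients (ssn TEXT, name TEXT)")
con.executemany("INSERT INTO Accounts VALUES (?, ?, ?)",
                [("123", "L1", "ID1"), ("A2", "L2", "ID2")])
con.executemany("INSERT INTO Clients VALUES (?, ?)",
                [("ID1", "Alice"), ("ID2", "Bob")])

# Accounts(u, v, w) ∧ Clients(w, x): the repeated variable w becomes a join.
rows = con.execute(
    "SELECT DISTINCT a.accNo, a.creditLine, a.accHolder, c.name "
    "FROM Accounts AS a JOIN Clients AS c ON a.accHolder = c.ssn "
    "ORDER BY a.accNo").fetchall()
homomorphisms = [dict(zip(("u", "v", "w", "x"), row)) for row in rows]
for h in homomorphisms:
    print(h)
# {'u': '123', 'v': 'L1', 'w': 'ID1', 'x': 'Alice'}
# {'u': 'A2', 'v': 'L2', 'w': 'ID2', 'x': 'Bob'}
```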
54
A Satisfaction Step
Definition: Let σ be a tgd ∀x (φ(x) → ∃y ψ(x,y)). Let K and K1 be instances such that K1 ⊆ K and K ⊨ σ.
Let h: φ(x) ∧ ψ(x,y) → K be a homomorphism such that h is also a homomorphism from φ(x) to K1.
Let K2 = K1 ∪ h(ψ(x,y)).
Then the result of satisfying σ on K1 with homomorphism h and solution K is K2, written K1 →(σ,h) K2.
55
Satisfaction Step: Remark 1
Satisfaction step ≠ chase step [FKMP02]: the definition is based on the logical satisfaction of tgds and is not tied to the implementation of the exchange.
Example:
  Σst: EmpPhone(x,y) → ∃z Emp(x,y,z) (1)
       EmpFax(x,z) → ∃y Emp(x,y,z) (2)
  Σt: Emp(x,y,z) ∧ Emp(x,u,v) → y=u ∧ z=v
  I = { s1: EmpPhone(Mary, p123), s2: EmpFax(Mary, f567) }
  J = { t: Emp(Mary, p123, f567) }
Two routes for t: s1 → t and s2 → t
Both routes make an assumption about the values taken by the existentials (z and y are assumed to be f567 and p123, respectively).
The egd is not used in the routes.
We don't have satisfaction steps with egds: if K satisfies an egd ε, then K1 also satisfies ε, since K1 ⊆ K.
56
Satisfaction Step: Remark 2
Satisfaction step ≠ solution-aware chase step [FKT05]
Example: Σst: S(x) → ∃N T(x,N); I = {S(1)}; J = {t1: T(1,N1), t2: T(1,N2)} is a solution for I
A route for J: ⟨I, ∅⟩ →(h1) ⟨I, {t1}⟩ →(h2) ⟨I, {t1, t2}⟩, with h1 = {x ↦ 1, N ↦ N1} and h2 = {x ↦ 1, N ↦ N2}
No solution-aware chase sequence produces both t1 and t2.
57
Computing all routes for target tuples
The schema mapping M is fixed.
Input: a source instance I, a target instance J, and a set of target tuples Js ⊆ J
Output: a route forest for Js that concisely represents all routes for Js
Algorithm idea: reverse chase
  For each tuple R(a) in Js, consider every possible σ and h witnessing R(a)
  Do the same for all target tuples encountered during the construction
  Do not consider the same tuple twice
58
Computing all routes: Properties
Running time: polynomial in the sizes of M, I and J
  At most |I| + |J| tuples in the forest
  Polynomial number of branches for each tuple
  A branch is not explored twice
  Reverse chase is efficient: push the computation to the database
Completeness: the route forest embeds every minimal route for Js. A minimal route for Js is a route for Js with no redundant satisfaction steps.
59
Computing one route for Js
Running time: polynomial in the sizes of M, I and J
Completeness: if there is a route for Js, then the algorithm will find a route for Js
Much faster than computing all routes: no need to explore the entire route forest; additional routes can be constructed as needed
60
Some implementation details
Scalable approach: steps in routes are discovered by pushing computation to the database engine
Example: source-to-target tgd S(x,y) → ∃U ∃V (T1(x,U) ∧ T2(U,V,y)), with T1(a,b) matched against the RHS
  LHS query: S(a,y) is executed against the source instance using the database
  RHS query: for each binding c of y, T1(a,b) ∧ T2(b,V,c) is executed against the target instance using the database
  Each binding for y generates one RHS query (a design choice to decouple LHS and RHS queries)
Extended for XML schema mappings
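A small sqlite3 sketch of the LHS/RHS decoupling, under assumptions of our own: source and target live in two separate in-memory databases, the target columns are named a1, a2, a3 for illustration, and the data is made up. The LHS query yields bindings for y; each binding c triggers one RHS query T1(a,b) ∧ T2(b,V,c), and every returned row completes a witness h = {x ↦ a, y ↦ c, U ↦ b, V ↦ v}.

```python
import sqlite3

def setup():
    """Hypothetical source and target instances in two separate databases."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE S (x TEXT, y TEXT)")
    src.executemany("INSERT INTO S VALUES (?, ?)",
                    [("a", "c1"), ("a", "c2"), ("d", "c3")])
    tgt = sqlite3.connect(":memory:")
    tgt.execute("CREATE TABLE T1 (a1 TEXT, a2 TEXT)")
    tgt.execute("CREATE TABLE T2 (a1 TEXT, a2 TEXT, a3 TEXT)")
    tgt.execute("INSERT INTO T1 VALUES ('a', 'b')")
    tgt.execute("INSERT INTO T2 VALUES ('b', 'v1', 'c1')")
    return src, tgt

def witnesses_for_T1(a, b, src, tgt):
    """Witnesses of T1(a, b) for S(x,y) -> exists U,V (T1(x,U) and T2(U,V,y))."""
    found = []
    # LHS query against the source instance: S(a, y).
    lhs = src.execute("SELECT DISTINCT y FROM S WHERE x = ? ORDER BY y", (a,))
    for (c,) in lhs.fetchall():
        # One RHS query per binding c of y: T1(a, b) and T2(b, V, c).
        rhs = tgt.execute(
            "SELECT T2.a2 FROM T1 JOIN T2 ON T1.a2 = T2.a1 "
            "WHERE T1.a1 = ? AND T1.a2 = ? AND T2.a3 = ?", (a, b, c))
        found += [{"x": a, "y": c, "U": b, "V": v} for (v,) in rhs.fetchall()]
    return found

src, tgt = setup()
print(witnesses_for_T1("a", "b", src, tgt))
# [{'x': 'a', 'y': 'c1', 'U': 'b', 'V': 'v1'}]
```

Because the two queries never join across databases, the source and target instances can be stored in different systems; the cost is one RHS query per LHS binding.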
61
Comparison with Approaches for Evaluating Datalog
Top-down techniques: OLDT, QSQ, rule/goal graphs
  Similarities: use memoization to avoid redundant computation and infinite loops
  Major difference: the target instance J is available, and we leverage it to obtain completely instantiated facts during the reverse chase, and hence avoid redundant computation earlier
Magic set rewriting technique: possible to obtain all tuples that contribute to the creation of Js; however, the routes must then be recovered from the evaluation of the magic rules
62
Top-down Example
Σst: σ1: S1(x,y) → T1(x,y); σ2: S2(x,y,u) → T2(x,y,u)
Σt: σ3: T1(x,y) ∧ T2(y,z,u) → T3(x,z)
I: S1(1,2)
J: T1(1,2), T3(1,3)
Top-down evaluation of the goal T3(1,3):
T3(1,3)
  ⇐ T1(1,y) ∧ T2(y,3,u)
    T1(1,y) ⇐ S1(1,y), binding y ↦ 2
    T2(2,3,u) (after y ↦ 2) ⇐ S2(2,3,u)