Data Exchange over RDF
Andres Letelier
Advisor: Marcelo Arenas
Pontificia Universidad Catolica de Chile
September 1, 2011
What is data exchange?
Problem
Data under one schema S needs to be restructured and translated into a target schema T
S −→ T
I_S −→ I_T
Schema mappings
Question
Which source instances correspond to which target instances?
Answer
Schema mappings:
M ⊆ Instances(S) × Instances(T)
Usually, schema mappings are defined as M = (S, T, Σ_ST)
Definition (Solution)
I2 is a solution of I1 under M iff (I1, I2) ∈ M.
The set of all solutions for I1 under M is denoted by Sol_M(I1).
Resource Description Framework (RDF)
- Data model for representing information about World Wide Web resources
- W3C Recommendation (1999)
- Part of the semantic web stack
- Directed, labeled graphs
- Blank nodes (labeled nulls)
- Basically, sets of triples (s, p, o)
SPARQL (pronounced “sparkle”)
- Query language for RDF
- W3C Recommendation (2008)
- Standard for querying RDF datasets
- Returns sets of partial mappings
- Operators:
  - Projection
  - AND (inner join)
  - OPT (left join)
  - FILTER
  - UNION
  - and more
Example
P1 = (?X, name, ?Y)
⟦P1⟧_D =
?X    ?Y
B1    paul
B2    john
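The evaluation in the example above can be sketched in Python. This is only an illustration: the dataset D below is hypothetical (the slides do not list its triples) and was chosen so that the result matches the table; `is_var` and `eval_triple` are names introduced here.

```python
# Sketch: evaluate one triple pattern over an RDF graph stored as a set of triples.
# Convention assumed here: variables are strings starting with "?".

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def eval_triple(pattern, graph):
    """Return the list of mappings (dicts) that send `pattern` to a triple of `graph`."""
    results = []
    for triple in graph:
        mapping = {}
        ok = True
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # A repeated variable must bind to the same value each time.
                if mapping.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok and mapping not in results:
            results.append(mapping)
    return results

# Hypothetical dataset D (not taken from the slides).
D = {("B1", "name", "paul"), ("B2", "name", "john"), ("B1", "phone", "555")}
P1 = ("?X", "name", "?Y")
answers = sorted(eval_triple(P1, D), key=lambda m: m["?X"])
```

Each dict in `answers` plays the role of one row of the table, that is, one partial mapping from variables to RDF terms.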
Well-designed SPARQL patterns
Definition (Well-designed patterns)
A pattern P is well-designed if for every subpattern P′ of the form P1 OPT P2, every variable that appears in P2 and outside P′ also appears in P1.
Example
- (?X, name, ?Y) OPT ((?X, email, ?Z) OPT (?X, city, ?A)) is well-designed
- (?X, name, ?Y) OPT ((?W, email, ?Z) OPT (?X, city, ?A)) is not
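The definition can be checked mechanically. The following is a minimal sketch, assuming patterns are encoded as nested tuples: a triple pattern is a 3-tuple of strings, and `("AND", P1, P2)` or `("OPT", P1, P2)` combine subpatterns. The encoding and the function names are choices made for this illustration, and FILTER and UNION are not covered.

```python
# Sketch: test whether an AND/OPT pattern is well-designed.

def is_triple(p):
    # Operator nodes carry tuples in positions 1 and 2; triple patterns carry strings.
    return not isinstance(p[1], tuple)

def variables(p):
    if is_triple(p):
        return {t for t in p if t.startswith("?")}
    return variables(p[1]) | variables(p[2])

def well_designed(p, outside=frozenset()):
    """`outside` holds the variables that occur in the whole pattern outside p."""
    if is_triple(p):
        return True
    op, left, right = p
    if op == "OPT":
        # Variables of P2 that also occur outside (P1 OPT P2) must occur in P1.
        if not (variables(right) & outside) <= variables(left):
            return False
    return (well_designed(left, outside | variables(right))
            and well_designed(right, outside | variables(left)))

good = ("OPT", ("?X", "name", "?Y"),
        ("OPT", ("?X", "email", "?Z"), ("?X", "city", "?A")))
bad = ("OPT", ("?X", "name", "?Y"),
       ("OPT", ("?W", "email", "?Z"), ("?X", "city", "?A")))
```

In `bad`, the variable ?X occurs in (?X, city, ?A) and outside the inner OPT, but not in (?W, email, ?Z), which is why the check fails.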
Data Exchange over RDF
- S and T are fixed to be RDF triples
- Tuple generating dependencies have to be redefined
- But first, we need some definitions...
RDF Tuple Generating Dependencies
Let P be a SPARQL pattern, µ1 and µ2 be partial mappings, and Ω1 and Ω2 be sets of mappings. Then:
- var(P) is the set of variables mentioned in P
- dom(µ1) is the domain of µ1
- A SPARQL SELECT query (denoted by (W, P), where W ⊆ var(P)) is the projection of the evaluation of P onto the variables in W
- µ1 is subsumed by µ2 (µ1 ⪯ µ2) if:
  - dom(µ1) ⊆ dom(µ2),
  - for every ?X in dom(µ1) that is not bound to a blank node, µ1(?X) = µ2(?X), and
  - for every pair of variables ?X and ?Y in dom(µ1) such that µ1(?X) = µ1(?Y), it is the case that µ2(?X) = µ2(?Y).
- Ω1 is subsumed by Ω2 (Ω1 ⊑ Ω2) if for every mapping µ1 in Ω1 there exists a mapping µ2 in Ω2 such that µ1 ⪯ µ2.
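The two subsumption relations translate almost directly into code. Below is a small sketch, assuming mappings are Python dicts from variable names to terms and that blank nodes are written as strings with the prefix `_:`; both conventions, and the function names, are assumptions of this example.

```python
# Sketch: subsumption between mappings and between sets of mappings.

def is_blank(value):
    return isinstance(value, str) and value.startswith("_:")

def mapping_subsumed(mu1, mu2):
    """True if mu1 is subsumed by mu2."""
    # Condition 1: dom(mu1) is contained in dom(mu2).
    if not set(mu1) <= set(mu2):
        return False
    # Condition 2: values that are not blank nodes must be preserved.
    for var, value in mu1.items():
        if not is_blank(value) and mu2[var] != value:
            return False
    # Condition 3: variables that are equal under mu1 stay equal under mu2.
    for x in mu1:
        for y in mu1:
            if mu1[x] == mu1[y] and mu2[x] != mu2[y]:
                return False
    return True

def set_subsumed(omega1, omega2):
    """True if every mapping of omega1 is subsumed by some mapping of omega2."""
    return all(any(mapping_subsumed(m1, m2) for m2 in omega2) for m1 in omega1)
```

For instance, a mapping that sends ?X to a blank node is subsumed by one that sends ?X to a constant, but not the other way around.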
RDF Tuple Generating Dependencies
(Re)Definition (Tuple Generating Dependencies)
Let P1 and P2 be SPARQL patterns, and W ⊂ var(P1) ∩ var(P2). An RDF tgd is a sentence of the form
(W, P1) → (W, P2)
Given two RDF graphs G1 and G2, and a set of tgds Σ, (G1, G2) ⊨ Σ if for every tgd (W, P1) → (W, P2) in Σ it is the case that ⟦(W, P1)⟧_G1 ⊑ ⟦(W, P2)⟧_G2
RDF Schema Mappings
Since S and T are fixed,
M = Σ
G2 ∈ Sol_M(G1) ⟷ (G1, G2) ⊨ Σ
Universal solutions
Example
Let W = {?X},
Σ = (W, (?X, name, ?Y) AND (?X, email, ?Z)) → (W, (?Y, hasmail, ?Z))
and consider the dataset D:
Solution 1
G2 = (paul hasmail [email protected])
Solution 2
G′2 = (paul hasmail [email protected]) (john hasmail n)
Universal solutions
Definition
A solution G2 is universal if for every other solution G′2, G2 ⊑ G′2
- Solution 1 is universal
- Solution 2 is not
Universal solutions
Not all settings have universal solutions:
Consider G1 = (1, 2, 3), W = {?X, ?Y} and
Σ = (W, (?X, ?Y, ?Z)) → (W, ((?X, a, b) OPT (?W, b, ?Y)) AND ((?X, c, d) OPT (?Z, d, ?Y)))
Solution 1
G2 = (1 a b) (n1 b 2) (1 c d)
Solution 2
G′2 = (1 a b) (n2 d 2) (1 c d)
This setting has no universal solution!
Good and bad news
Bad news
There is no guarantee that an exchange setting that has a solution will have a universal solution
Good news
If the heads of all tgds in Σ are well-designed and there is a solution, there is always a universal solution
Better news
We have an algorithm
“Chasing” SPARQL queries
Input: a mapping µ and a (well-designed) SPARQL pattern P
Output: an RDF graph G such that µ ∈ ⟦P⟧_G
Chase(µ, ν, P, G)
- t:
  add unbound variables in t as fresh blank nodes to ν
  add ν(t) to G
- P1 AND P2:
  Chase(µ, ν, P1, G)
  Chase(µ, ν, P2, G)
- P1 OPT P2:
  Chase(µ, ν, P1, G)
  if dom(µ) \ dom(ν) ∩ var(P2) ≠ ∅: Chase(µ, ν, P2, G)
After chasing:
- µ ⪯ ν
- ν ∈ ⟦P⟧_G
- µ ⊑ ⟦P⟧_G
- If we chase the evaluations of ⟦(W, P1)⟧_G1 with every P2 in Heads(Σ), we get a universal solution.
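The Chase procedure above can be sketched in Python. This is one possible reading of the slide, not the authors' implementation, and it rests on several assumptions: ν starts empty; a variable bound by µ keeps µ's value when it is first met, and any other variable receives a fresh blank node; the OPT condition is grouped as (dom(µ) \ dom(ν)) ∩ var(P2) ≠ ∅. Patterns use a nested-tuple encoding (`("AND", P1, P2)`, `("OPT", P1, P2)`, or a 3-tuple of strings), and all names are introduced for this illustration.

```python
import itertools

# Counter used to name fresh blank nodes (hypothetical naming scheme "_:n1", "_:n2", ...).
_fresh = itertools.count(1)

def is_var(term):
    return term.startswith("?")

def is_triple(p):
    return not isinstance(p[1], tuple)

def variables(p):
    if is_triple(p):
        return {t for t in p if is_var(t)}
    return variables(p[1]) | variables(p[2])

def chase(mu, nu, pattern, graph):
    """Extend nu and graph in place so that the instantiated pattern holds in graph."""
    if is_triple(pattern):
        for term in pattern:
            if is_var(term) and term not in nu:
                # Assumption: reuse mu's value if bound, else create a fresh blank node.
                nu[term] = mu[term] if term in mu else f"_:n{next(_fresh)}"
        graph.add(tuple(nu[t] if is_var(t) else t for t in pattern))
        return
    op, left, right = pattern
    chase(mu, nu, left, graph)
    if op == "AND":
        chase(mu, nu, right, graph)
    elif op == "OPT":
        # Chase the optional part only if it mentions a variable of mu not yet covered by nu.
        if (set(mu) - set(nu)) & variables(right):
            chase(mu, nu, right, graph)

# Usage: chase a mapping against a well-designed head pattern.
head = ("OPT", ("?X", "a", "b"), ("?X", "b", "?Y"))
G, nu = set(), {}
chase({"?X": "1", "?Y": "2"}, nu, head, G)
```

When µ binds only ?X, the optional part is skipped, so no triple is created for it. This matches the intuition that a universal solution should contain nothing beyond what the mapping requires.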
Certain answers
Definition (Certain answers on a regular data exchange setting)
The set of certain answers is the intersection of the evaluation of the query over all the valid solutions
Example
Consider G1 = (1, 2, 3) and
(?X, (?X, ?Y, ?Z)) → (?X, (?X, 1, 2) OPT (?X, ?Y, 3))
Solution 1
G2 = (1 1 2)
Solution 2
G′2 = (1 1 2) (1 2 3)
⟦(W, P2)⟧_G2 = {?X ↦ 1}
⟦(W, P2)⟧_G′2 = {?X ↦ 1, ?Y ↦ 2}
The intersection of ⟦(W, P2)⟧_G2 and ⟦(W, P2)⟧_G′2 is empty!
Certain answers
Given a pattern P and a set of RDF graphs G, let Lower(P, G) be the set of all lower bounds of G w.r.t. subsumption.
(Re)Definition (Certain Answers)
The set of certain answers of a set of RDF graphs and a SPARQL pattern P is defined as any set of mappings Ω⋆ in Lower(P, G) such that for any other Ω in Lower(P, G) it is the case that Ω ⊑ Ω⋆.
Claim
All the possible sets of certain answers to an RDF data exchange setting are homomorphically equivalent.
Back in our previous example...
Solution 1
G2 = (1 1 2)
Solution 2
G′2 = (1 1 2) (1 2 3)
⟦(W, P2)⟧_G2 = {?X ↦ 1}
⟦(W, P2)⟧_G′2 = {?X ↦ 1, ?Y ↦ 2}
The set of certain answers is now {?X ↦ 1}
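The contrast between the two definitions can be illustrated on this example. The sketch below assumes mappings are Python dicts and blank nodes are strings prefixed with `_:`. It only checks that a given candidate is a lower bound with respect to subsumption; it does not compute a greatest one. All function names are introduced for the illustration.

```python
# Sketch: plain intersection versus subsumption-based lower bounds.

def is_blank(value):
    return isinstance(value, str) and value.startswith("_:")

def mapping_subsumed(mu1, mu2):
    if not set(mu1) <= set(mu2):
        return False
    if any(not is_blank(v) and mu2[x] != v for x, v in mu1.items()):
        return False
    return all(mu2[x] == mu2[y] for x in mu1 for y in mu1 if mu1[x] == mu1[y])

def set_subsumed(omega1, omega2):
    return all(any(mapping_subsumed(m1, m2) for m2 in omega2) for m1 in omega1)

def is_lower_bound(candidate, answer_sets):
    """True if `candidate` is subsumed by every answer set."""
    return all(set_subsumed(candidate, answers) for answers in answer_sets)

# Evaluations of the head query over the two solutions of the example.
omega_g2 = [{"?X": "1"}]
omega_g2_prime = [{"?X": "1", "?Y": "2"}]

# Plain intersection compares mappings for exact equality, so it finds nothing.
as_set = lambda omega: {frozenset(m.items()) for m in omega}
plain_intersection = as_set(omega_g2) & as_set(omega_g2_prime)

# Under subsumption, the set containing ?X -> 1 is a lower bound of both evaluations.
candidate_ok = is_lower_bound([{"?X": "1"}], [omega_g2, omega_g2_prime])
candidate_too_big = is_lower_bound([{"?X": "1", "?Y": "2"}], [omega_g2, omega_g2_prime])
```

The larger candidate fails because its domain is not contained in the domain of any mapping in the first evaluation.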
In conclusion...
Our contributions so far:
- RDF and SPARQL TGDs
- RDF Schema mappings
- Universal solutions
- Materialization of universal solutions
- Certain answers
In conclusion...
To do:
- Prove remaining claims
- Query answering (using universal solutions)
- Incomplete information in the source instance
- Knowledge exchange over RDF
Thank you for listening
Any questions?