Post on 18-Jan-2018
Exchange Intensional XML Data
Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA)
Outline
Introduction
The Model and The Problem
Exchanging Intensional Data
Safe Rewriting
Possible Rewriting
Implementation
Conclusion and Related Work
Introduction
What are intensional documents? XML documents where:
some parts of the document are defined explicitly;
some parts are defined by programs (i.e. Web services) that generate data.
Materialisation of the programs: the process of evaluating some of the programs included in an XML document and replacing them by their results.
Introduction (cont’d)
The goals of the paper:
Study the new issues raised by the exchange of intensional XML documents between applications.
Decide which data should be materialised before it is sent and which should not.
Introduction (cont’d)
[Figure: Data exchange scenario for intensional documents. A sender and a receiver, each with its own capabilities, ACL, cost, ..., exchange documents that must conform to a data exchange schema.]
The Model and The Problem
Simple intensional XML model: intensional document, simple schema, instance of a schema, about rewritings
A richer data model: function patterns, restricted service invocations
The Model and The Problem Simple intensional XML
Model intensional XML documents as labelled trees consisting of two types of nodes:
Data nodes: nodes with a label in L ∪ D.
Function nodes: nodes with a label in F; they correspond to “service calls”.
The children subtrees of a function node are the function parameters.
When the function is called:
these subtrees are passed to it;
the return value replaces the function node in the document.
Assume the existence of some disjoint domains:
N: domain of nodes
L: domain of labels
F: domain of function names
D: domain of data values
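The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `Service` type and the `get_temp` stand-in are hypothetical names, not part of the paper or of any ActiveXML API, and the fake service always returns the same value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    label: str                       # element label, function name, or data value
    kind: str = "data"               # "data" or "function"
    children: List["Node"] = field(default_factory=list)


# Hypothetical stand-in for a Web service: parameter subtrees in, result forest out.
Service = Callable[[List[Node]], List[Node]]


def materialize(node: Node, services: Dict[str, Service]) -> List[Node]:
    """Return the forest that replaces `node` once its function calls are evaluated."""
    # Parameters are materialised first, then passed to the service.
    children = [r for c in node.children for r in materialize(c, services)]
    if node.kind == "function" and node.label in services:
        return services[node.label](children)   # result replaces the function node
    return [Node(node.label, node.kind, children)]


def get_temp(params: List[Node]) -> List[Node]:
    # Fake service used only for illustration; it ignores the city and answers 16 ºC.
    return [Node("temp", children=[Node("16 ºC")])]


doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun")]),
    Node("Get_Temp", "function", [Node("city", children=[Node("Paris")])]),
])
result = materialize(doc, {"Get_Temp": get_temp})[0]
```

Function nodes whose name is absent from `services` are left in place, which mirrors the idea that only some of the programs need to be materialised.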
newspaper
  title
    “The Sun”
  date
    “04/10/2002”
  Get_Temp
    city
      “Paris”
  TimeOut
    “Exhibits”
temp “16 ºC” (the data returned by the Get_Temp call)
The Model and The Problem Simple intensional XML (cont’d)
An example of an intensional XML document
Simple schema
A document schema s is an expression (L, F, τ) where:
L: finite set of labels
F: finite set of function names
τ: function that maps
each label name l ∈ L to a regular expression over L ∪ F, or to the keyword “data”;
each function name f ∈ F to a pair of expressions called τin(f), the input type of f, and τout(f), the output type of f.
The Model and The Problem Simple intensional XML (cont’d)
An example of a schema
Data:
τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit)
τ(title) = data
τ(date) = data
τ(temp) = data
τ(city) = data
τ(exhibit) = data
Functions:
τin(Get_Temp) = city
τout(Get_Temp) = temp
τin(TimeOut) = data
τout(TimeOut) = (exhibit|performance)
τin(Get_Date) = title
τout(Get_Date) = date
The Model and The Problem Simple intensional XML (cont’d)
Instances of a schema
An intensional document t is an instance of a schema s = (L, F, τ) if, for each data node n ∈ t with label l ∈ L, the labels of n’s children form a word in lang(τ(l)).
lang(τ(l)) denotes the regular language defined by τ(l).
The same holds for function nodes, with the input type τin(f) in place of τ(l).
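The instance condition can be checked mechanically. The sketch below is an assumption-laden illustration, not the paper's implementation: trees are plain `(label, children)` tuples, the `TAU` dictionary mixes label types and function input types in one table, and schema expressions are translated to Python regular expressions by the hypothetical helper `to_regex`.

```python
import re
from typing import Dict, Tuple

# A tree is (label, children); a data value is a label with no children.
Tree = Tuple[str, list]

# Content models from the example schema; "data" marks labels whose content is a data value.
TAU: Dict[str, str] = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
    # For function names, the input type constrains the parameters.
    "Get_Temp": "city", "TimeOut": "data",
}


def to_regex(expr: str) -> str:
    """Translate a schema expression into a Python regex over 'label,' tokens."""
    tokenised = re.sub(r"[A-Za-z_]\w*", lambda m: f"(?:{m.group(0)},)", expr)
    return tokenised.replace(".", "")   # '.' is concatenation in the schema syntax


def is_instance(tree: Tree, tau: Dict[str, str]) -> bool:
    label, children = tree
    if label not in tau:
        return False
    if tau[label] == "data":
        # Exactly one child, which must be a leaf data value.
        return len(children) == 1 and not children[0][1]
    word = "".join(child[0] + "," for child in children)
    if re.fullmatch(to_regex(tau[label]), word) is None:
        return False
    return all(is_instance(child, tau) for child in children)


doc: Tree = ("newspaper", [
    ("title", [("The Sun", [])]),
    ("date", [("04/10/2002", [])]),
    ("Get_Temp", [("city", [("Paris", [])])]),
    ("TimeOut", [("Exhibits", [])]),
])
print(is_instance(doc, TAU))
```

Encoding each label as a `label,` token keeps labels such as `temp` and `Get_Temp` from matching as substrings of one another.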
About rewritings
Let t, t’ be trees. If t’ is obtained from t by
selecting a function node v in t with some label f and
replacing it by an arbitrary output instance of f,
then we say that $t \xrightarrow{v} t'$.
The Model and The Problem Simple intensional XML (cont’d)
About rewritings (cont’d)
If $t \xrightarrow{v_1} t_1 \xrightarrow{v_2} t_2 \cdots \xrightarrow{v_n} t_n$, then we say that t rewrites into $t_n$, denoted $t \xrightarrow{*} t_n$.
The nodes $v_1, \ldots, v_n$ are called a rewriting sequence.
The set of all trees t’ such that $t \xrightarrow{*} t'$ is denoted ext(t).
The Model and The Problem Simple intensional XML (cont’d)
About rewritings (cont’d)
Let t be a tree and s be a schema.
1. If ext(t) contains some instance of s, then we say that t possibly rewrites into s.
2. If either t is already an instance of s, or there exists some node v in t such that all trees t’ where $t \xrightarrow{v} t'$ safely rewrite into s, then we say that t safely rewrites into s.
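The difference between the two notions can be made concrete on words of labels rather than on full trees. The following brute-force sketch is an illustration only: it assumes each function has a small finite set of possible outputs (`OUTPUTS` is hypothetical sample data), it writes the target as a Python regular expression over space-separated labels, and it only calls functions present in the original word, scanning from left to right.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical finite samples of each function's possible outputs (words of labels).
OUTPUTS: Dict[str, List[List[str]]] = {
    "Get_Temp": [["temp"]],
    "TimeOut": [["exhibit"], ["performance"]],
}


def matches(word: Sequence[str], target: str) -> bool:
    return re.fullmatch(target, " ".join(word)) is not None


def possibly(word: List[str], target: str, i: int = 0) -> bool:
    """Some choice of calls and SOME returned outputs yield a word in the target."""
    if i == len(word):
        return matches(word, target)
    if possibly(word, target, i + 1):            # do not invoke word[i]
        return True
    return any(possibly(word[:i] + out + word[i + 1:], target, i + len(out))
               for out in OUTPUTS.get(word[i], []))


def safely(word: List[str], target: str, i: int = 0) -> bool:
    """The sender can pick calls so that ALL possible outputs yield a word in the target."""
    if i == len(word):
        return matches(word, target)
    if safely(word, target, i + 1):              # do not invoke word[i]
        return True
    outs = OUTPUTS.get(word[i])
    return bool(outs) and all(
        safely(word[:i] + out + word[i + 1:], target, i + len(out)) for out in outs)


W = ["title", "date", "Get_Temp", "TimeOut"]
print(safely(W, "title date temp (TimeOut|exhibit)"))   # TimeOut can be kept as is
print(safely(W, "title date temp exhibit"))             # TimeOut may return performance
print(possibly(W, "title date temp exhibit"))           # but it may also return exhibit
```

The only difference between the two functions is `any` versus `all` over the outputs of an invoked function: a possible rewriting hopes for a favourable answer, a safe rewriting must succeed whatever the service returns.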
The Model and The Problem Simple intensional XML (cont’d): safe rewriting of schemas
Let s be a schema and r a distinguished label called the root label.
If all the instances t of s with root label r rewrite safely into instances of s’, then we say that s safely rewrites into s’.
Problems:
The Model and The Problem Simple intensional XML (cont’d)
[Figure: the data exchange scenario for intensional documents, repeated. Sender and receiver, each with capabilities, ACL, cost, ..., and the data exchange schema between them.]
The Model and The Problem A Richer Data Model
Function patterns
A function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one.
Example:
τname(Forecast) = UDDIF ∧ InACL
τin(Forecast) = city
τout(Forecast) = temp
The Model and The Problem A Richer Data Model (cont’d)
Restricted service invocations
We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema.
This is not always the case, for reasons such as security, cost, access rights, etc.
Thus, function names/patterns in the schema can be partitioned into two disjoint groups of invocable and noninvocable ones.
A legal rewriting is then one that invokes only invocable functions.
Exchanging Intensional Data
Rewriting process: safe rewriting, possible rewriting, mixed approach
Restriction
Exchanging Intensional Data rewriting process
Safe rewriting: check whether t safely rewrites to s;
if so, find a rewriting sequence.
Rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure.
Preferred rewriting sequence: the shortest/cheapest one.
Exchanging Intensional Data rewriting process (cont’d)
Possible rewriting: if a safe rewriting does not exist,
check whether t may at least rewrite to s.
If it is acceptable to do so (the sender accepts that the rewriting may fail), try to find a successful rewriting sequence, if one exists.
Preferred rewriting sequence: the one with the least cost.
Exchanging Intensional Data rewriting process (cont’d)
Mixed approach: one could first invoke some function calls, then attempt from there to find safe rewritings.
Exchanging Intensional Data rewriting process (cont’d): k-depth rewriting sequence
For a rewriting sequence $t \xrightarrow{v_1} t_1 \cdots \xrightarrow{v_n} t_n$:
If the node $v_j$ was returned by the invocation of the function $v_i$ (with $v_j$ in $t_j$ and $v_i$ in $t_{j-1}$), then we say that the function node $v_j$ depends on the function node $v_i$.
If the dependency graph among the nodes contains no paths of length greater than k, then we say that the rewriting sequence is of depth k.
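As a small illustration of the depth definition, the sketch below computes the longest dependency chain of a rewriting sequence. It rests on two assumptions that the slides do not settle: the dependency information is given as a hypothetical `returned_by` mapping, and path length is counted in nodes, so that a sequence invoking only functions of the original document has depth 1.

```python
from typing import Dict, Optional


def rewriting_depth(returned_by: Dict[str, Optional[str]]) -> int:
    """Length (in nodes) of the longest dependency path among invoked function nodes.

    `returned_by[v]` names the invoked node whose result contained v,
    or None when v was already present in the original document.
    Every node named as a parent must itself be a key of the mapping.
    """
    def chain(v: str) -> int:
        # Number of invoked nodes on the dependency path ending at v.
        parent = returned_by[v]
        return 1 if parent is None else 1 + chain(parent)

    return max((chain(v) for v in returned_by), default=0)


# v2 was returned by the call v1; v1 and v3 come from the original document.
sequence = {"v1": None, "v2": "v1", "v3": None}
print(rewriting_depth(sequence))
```

A sequence is then of depth k, in the sense used above, when `rewriting_depth` returns at most k.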
Exchanging Intensional Data Restriction
Restriction:
“Consider only k-depth left-to-right rewritings.”
Safe Rewriting (Dec 16, 2004)
Algorithm for k-depth left-to-right safe rewriting
Safe rewriting algorithm
Given:
a word w;
the output types $R_{f_1}, \ldots, R_{f_n}$ of the available functions;
a target regular language R.
Purpose of the algorithm:
to test whether w can be safely rewritten into a word in R;
if so, to find a safe rewriting sequence.
Safe Rewriting (cont’d)
Note: for illustration purposes we use the newspaper document.
w = title.date.Get_Temp.TimeOut is the word formed by the labels of the children of the newspaper node.
R = title.date.temp.(TimeOut|exhibit*) is the target language; we look for a safe rewriting of w into a word in R.
The algorithm, main idea: in regular-language terms, the intersection of the language generated by the k-depth invocations with the complement of the target language R should be empty.
Safe Rewriting (cont’d)
1. Build finite state automata for the following regular languages:
(1) $A_w$, accepting the word w = title.date.Get_Temp.TimeOut:
q0 --title--> q1 --date--> q2 --Get_Temp--> q3 --TimeOut--> q4
(2) Automata $A_{f_i}$, each accepting the regular language $R_{f_i}$ (the output types of the available functions).
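Step (1) is simple enough to sketch directly. The representation below (a transition dictionary keyed by state and label, with integer states) is an assumption made for illustration, not a data structure taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Transitions = Dict[Tuple[int, str], int]
Automaton = Tuple[Transitions, int, Set[int]]


def word_automaton(word: List[str]) -> Automaton:
    """Linear automaton with states 0..len(word): (transitions, initial, accepting)."""
    delta = {(i, label): i + 1 for i, label in enumerate(word)}
    return delta, 0, {len(word)}


def accepts(automaton: Automaton, word: List[str]) -> bool:
    delta, state, accepting = automaton
    for label in word:
        if (state, label) not in delta:
            return False          # no move defined: the word is rejected
        state = delta[(state, label)]
    return state in accepting


W = ["title", "date", "Get_Temp", "TimeOut"]
A_W = word_automaton(W)
print(accepts(A_W, W))
```

State `i` plays the role of `qi` on the slide, so the automaton for w has the five states q0 to q4.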
Safe Rewriting (cont’d)
(3) Build an automaton A accepting the complement of the regular language R. The automaton should be deterministic and complete.
[Figure: the complement automaton A for schema τ’(newspaper) = title.date.temp.(TimeOut|exhibit*), with states p0 to p6 and transitions labelled title, date, temp, TimeOut, exhibit and * (other labels).]
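Complementing a deterministic automaton amounts to completing it with a sink state and swapping accepting and non-accepting states, which is why step (3) asks for a deterministic and complete automaton. The sketch below illustrates this standard construction; the state names, the alphabet and the choice of `p6` as the sink are assumptions made to resemble the figure, not a transcription of it.

```python
from typing import Dict, Iterable, List, Set, Tuple

State = str
Delta = Dict[Tuple[State, str], State]


def complement(states: Iterable[State], alphabet: List[str], delta: Delta,
               accepting: Set[State], sink: State = "p6") -> Tuple[Delta, Set[State]]:
    """Complete a DFA with a sink state, then swap accepting and non-accepting states."""
    all_states = set(states) | {sink}
    full = dict(delta)
    for q in all_states:
        for a in alphabet:
            full.setdefault((q, a), sink)   # every missing move goes to the sink
    return full, all_states - set(accepting)


def run(delta: Delta, accepting: Set[State], word: List[str], initial: State = "p0") -> bool:
    state = initial
    for label in word:
        state = delta[(state, label)]       # total, because the automaton is complete
    return state in accepting


# DFA for R = title.date.temp.(TimeOut|exhibit*); state names are illustrative.
ALPHABET = ["title", "date", "temp", "Get_Temp", "TimeOut", "exhibit", "performance"]
R_DELTA: Delta = {
    ("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
    ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5",
}
C_DELTA, C_ACCEPTING = complement(["p0", "p1", "p2", "p3", "p4", "p5"], ALPHABET,
                                  R_DELTA, {"p3", "p4", "p5"})
```

Skipping the completion step would make the swap incorrect: words that fall off the original automaton would be rejected by both the automaton and its supposed complement.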
Safe Rewriting (cont’d)
2. Construct the automaton $A_w^k$, which represents all the words that can be generated by such a k-depth rewriting process (built by iteration).
[Figure: the 1-depth automaton $A_w^1$ for the word w = title.date.Get_Temp.TimeOut, with states q0 to q7. The path q0 to q4 reads title, date, Get_Temp, TimeOut; ε-transitions lead to branches reading temp, exhibit and performance. At each fork node, one option represents the choice of invoking the function and the other the choice of not invoking it.]
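The 1-depth construction can be sketched as an automaton with ε-moves. This is a simplified illustration under assumptions: each function's output type is given as a finite list of words (`OUTPUTS` is hypothetical sample data) rather than as an automaton, and states are plain integers instead of the q-names of the figure.

```python
from collections import defaultdict
from typing import Dict, List, Set

EPS = ""  # label used for ε-moves

OUTPUTS: Dict[str, List[List[str]]] = {
    "Get_Temp": [["temp"]],
    "TimeOut": [["exhibit"], ["performance"]],
}


def one_depth_automaton(word: List[str], outputs: Dict[str, List[List[str]]]):
    """NFA for the words obtainable from `word` by optionally calling each function once."""
    delta = defaultdict(set)
    fresh = len(word) + 1                      # states 0..len(word) form the main path
    for i, label in enumerate(word):
        delta[(i, label)].add(i + 1)           # fork option: do not invoke
        for out in outputs.get(label, []):     # fork option: invoke, then read one output
            delta[(i, EPS)].add(fresh)
            state, fresh = fresh, fresh + 1
            for out_label in out:
                delta[(state, out_label)].add(fresh)
                state, fresh = fresh, fresh + 1
            delta[(state, EPS)].add(i + 1)     # rejoin the main path after the output
    return delta, 0, {len(word)}


def closure(delta, states: Set[int]) -> Set[int]:
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(automaton, word: List[str]) -> bool:
    delta, initial, accepting = automaton
    current = closure(delta, {initial})
    for label in word:
        step = set()
        for q in current:
            step |= delta.get((q, label), set())
        current = closure(delta, step)
    return bool(current & accepting)


A1 = one_depth_automaton(["title", "date", "Get_Temp", "TimeOut"], OUTPUTS)
```

Because the output branches rejoin the main path without offering further forks, returned function calls are never invoked, which is what limits the construction to depth 1.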
Safe Rewriting (cont’d)
3. Construct the cartesian product automaton $A_\times = A_w^k \times A$.
[Figure 6: the product automaton, with states such as [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q3,p6], [q4,p4] and [q7,p6], and transitions labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]
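The product construction pairs up the states of the two automata and keeps only the pairs reachable from the initial pair. The sketch below shows it for two deterministic transition tables; it is a simplification that leaves out the ε-moves and fork nodes of $A_w^k$, and the table `A_C` is only a hand-written fragment of a complement automaton used as sample data.

```python
from collections import deque
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]
Pair = Tuple[str, str]


def product(delta_a: Delta, init_a: str, delta_b: Delta, init_b: str):
    """Reachable part of the product: a pair moves on a label only if both sides can."""
    start: Pair = (init_a, init_b)
    delta: Dict[Tuple[Pair, str], Pair] = {}
    seen: Set[Pair] = {start}
    queue = deque([start])
    while queue:
        qa, qb = queue.popleft()
        for (src, label), dst_a in delta_a.items():
            if src != qa or (qb, label) not in delta_b:
                continue
            target = (dst_a, delta_b[(qb, label)])
            delta[((qa, qb), label)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, delta, start


# The word automaton for w, without forks (no function is invoked).
A_W: Delta = {("q0", "title"): "q1", ("q1", "date"): "q2",
              ("q2", "Get_Temp"): "q3", ("q3", "TimeOut"): "q4"}
# Fragment of a complement automaton; p6 stands for its sink state.
A_C: Delta = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
              ("p2", "Get_Temp"): "p6", ("p6", "TimeOut"): "p6"}

STATES, DELTA_X, START = product(A_W, "q0", A_C, "p0")
print(sorted(STATES))
```

In this fragment the un-rewritten word ends in the pair (q4, p6), that is, an accepting state of the word automaton paired with the sink of the complement: sending w unchanged would land outside the target language.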
Safe Rewriting (cont’d)
4. Mark nodes in $A_\times$.
[Figure 6: the product automaton, repeated.]
Safe Rewriting (cont’d)
Try to obtain a safe rewriting.
“A safe rewriting exists iff the initial state is not marked.”
Follow a non-marked path (corresponding to w) starting from the initial state of $A_\times$ to a state [q, p] where q is an accepting state of $A_w^k$.
Non-marked fork options on the path determine the rewriting choices (i.e. which functions to call).
When a function is invoked, we continue the path with the new rewritten word rather than the word w.
Safe Rewriting (cont’d)
To minimize the rewriting cost, choose a path with minimal number/cost of function invocations.
EXIT % End of the algorithm
Safe Rewriting (cont’d)
The complement automaton A for schema τ’(newspaper) = title.date.temp.exhibit*
[Figure 7: the complement automaton for this schema, with transitions labelled title, date, temp, exhibit and *.]
Safe Rewriting (cont’d)
The cartesian product automaton $A_\times^1 = A_w^1 \times A$
[Figure 8: the product automaton for this schema, with states such as [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q3,p6] and [q7,p6], and transitions labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]
Possible Rewriting
The algorithm
1. Build finite state automata for the following languages:
1.1. An automaton $A_w^k$.
1.2. An automaton A accepting the regular language R.
Possible Rewriting (cont’d)
An automaton A for schema τ’’(newspaper) = title.date.temp.exhibit*
[Figure 10: p0 --title--> p1 --date--> p2 --temp--> p3 --exhibit--> p4, with a further exhibit loop.]
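Possible Rewriting (cont’d)
2. Construct the cartesian product automaton $A_\times = A_w^k \times A$.
[Figure 11: the product automaton, with states [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q5,p2], [q6,p3], [q7,p3], [q7,p4], [q4,p3] and [q4,p4], and transitions labelled title, date, temp, exhibit and ε.]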
Possible Rewriting (cont’d)
The cartesian product automaton for possible rewriting.
[Figure 11: the product automaton, repeated.]
Implementation
In the implementation:
an intensional XML document is a well-formed XML document;
to distinguish the intensional parts from the rest of the document, the namespace http://www.activexml.com/ns/int is used. This namespace is defined for function (service) calls.
Implementation (cont’d)
[Figure: the newspaper document tree: title “The Sun”, date “04/10/2002”, Get_Temp with parameter city “Paris”, TimeOut with parameter “Exhibits”.]
Implementation (cont’d)
[Figure: the XML form of the newspaper document, with the following annotations.]
Namespace defined for function (service) calls.
Data nodes title and date.
Three attributes of the function nodes provide the necessary information to call the SOAP service:
1. URL of the server
2. Method name
3. Associated namespace
Implementation (cont’d)
Function TimeOut, with the same three attributes:
1. URL of the server
2. Method name
3. Associated namespace
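To make the role of the namespace and of the three attributes concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. Only the namespace URI comes from the slides. The element name `int:fun`, the attribute names `endpointURL`, `methodName` and `namespaceURI`, and all attribute values are assumptions chosen for illustration, since the slide's XML listing did not survive.

```python
import xml.etree.ElementTree as ET

INT_NS = "http://www.activexml.com/ns/int"

# Element and attribute names below are hypothetical; only the namespace URI is from the slides.
DOC = """<newspaper xmlns:int="http://www.activexml.com/ns/int">
  <title>The Sun</title>
  <date>04/10/2002</date>
  <int:fun endpointURL="http://example.org/soap" methodName="Get_Temp"
           namespaceURI="urn:example-weather">
    <city>Paris</city>
  </int:fun>
</newspaper>"""

root = ET.fromstring(DOC)

# Function nodes are the elements whose tag lives in the intensional namespace.
calls = [el for el in root.iter() if el.tag.startswith("{" + INT_NS + "}")]
data_nodes = [el.tag for el in root if not el.tag.startswith("{" + INT_NS + "}")]

for call in calls:
    # The three attributes carry what is needed to place the SOAP call.
    print(call.get("endpointURL"), call.get("methodName"), call.get("namespaceURI"))
```

Keeping service calls in their own namespace means an ordinary XML parser still sees a well-formed document, and a consumer unaware of intensional data can simply skip those elements.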
Implementation (cont’d)
Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)
Implementation (cont’d)
The role of the Schema Enforcement Module:
1. to verify whether the call parameters conform to the WSDLint description of the service;
2. if not, to try to rewrite them into the required structure;
3. if step 2 fails, to report an error.
Note: similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.
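The three steps of the module can be sketched as a small control flow. Everything named here is hypothetical: `enforce`, `SchemaEnforcementError`, and the `conforms` and `rewrite` callables stand in for the schema check and the rewriting algorithm, whose real interfaces the slides do not show.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class SchemaEnforcementError(Exception):
    """Raised when data neither conforms nor can be rewritten (step 3)."""


def enforce(data: T, conforms: Callable[[T], bool],
            rewrite: Callable[[T], Optional[T]]) -> T:
    if conforms(data):                       # step 1: verify
        return data
    rewritten = rewrite(data)                # step 2: try to rewrite
    if rewritten is not None and conforms(rewritten):
        return rewritten
    raise SchemaEnforcementError("data cannot be rewritten into the required structure")


# Toy stand-ins: the required structure is the single label temp,
# and the only known rewriting replaces a Get_Temp call by its result.
def conforms(word: List[str]) -> bool:
    return word == ["temp"]


def rewrite(word: List[str]) -> Optional[List[str]]:
    return ["temp"] if word == ["Get_Temp"] else None


print(enforce(["Get_Temp"], conforms, rewrite))
```

The same function can be applied to call parameters before a service is invoked and to the returned data before it is sent back, which matches the symmetric use described in the note.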
Conclusion and Related Work
XML documents with embedded calls to Web services are already present in several existing products (ActiveXML system).
What’s new? The proposed extension of XML Schema with function types is a first step towards a more precise description of XML documents embedding computation.
Main open problem: whether safe rewriting remains decidable when the k-depth restriction is removed.