Slide 1
Querying XML with Locator Semantics
Peter Fankhauser
joint work with:
Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie
GMD German National Research Center for Information Technology
Institute for Integrated Publication and Information Systems
GMD-IPSI
http://xml.darmstadt.gmd.de/
Slide 2
Overview
Requirements for Querying XML
XQL Overview
Locators
Locator Algebra
IPSI XML-Brokering Framework
Slide 3
General Requirements for Querying XML
(Excerpt from Dave Maier, W3C QL 98)
Require no schema
• flexibly match irregular structure
• preserve (irregular) structure
Query & Preserve Order and Association
• sibling order
• hierarchy
Precise Semantics
• rewrite rules
• compositional semantics
Closedness/Completeness
• XML to XML
• when is a QL for XML complete?
Slide 4
Running Example
Bookstore:
• Non-uniform hierarchy
• sci-fi: 2 levels
• mystery: 3 levels
Customers: Flat Table

<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks>
        <isbn>0261102362</isbn>
        <isbn>0593488321</isbn>
      </boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks>
        <isbn>0006482805</isbn>
        <isbn>0261102362</isbn>
      </boughtbooks>
    </customer>
  </customers>
</books_and_customers>
Slide 5
Functional Requirements for Querying XML (Dave Maier, W3C QL 98)
Selection and Extraction:
• all sci-fi books by P.K. Dick
Reduction:
• drop all authors but 1st author
Combination:
• combine all books with their customers via isbn
Restructuring:
• return flat lists of title/author pairs
• and vice versa
Multidocument Handling:
• get reviews and books from different sites
• follow (dereference) links in books to authors
Slide 6
XQL Overview (State W3C QL 98)
Basic Concept: Selection of Subtrees
• Originated as QL for DOM
• adopted for selectors in XSL-templates (now merged with XPointer to XPel to XPath to ????)
• Defined along search contexts = an (ordered) set of document nodes
Path Expressions and Filters:
• A query is essentially a navigation in element trees
• Navigation and filters modify the search context
• Query result is the last search context
Selection of nodes by:
• Element and attribute name
• Type (element, attribute, comment, etc.)
• Content or value of nodes
• Relationship between nodes: hierarchy, sequence, index
Combination by: union, intersection
Slide 7
XQL 98 Examples
Selection and Extraction:
• all books by P.K. Dick
  //book[author="P.K. Dick"]
Reduction:
• drop all but 1st author
  //*?/book?/(isbn | author[0] | title)
• * matches all elements along paths to book
• shallow return operator (?) retains nesting hierarchy
• union preserves document order (title before author)
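The selection query on this slide has a close analogue in the limited XPath subset of Python's standard `xml.etree.ElementTree` module. The following is only a sketch under that assumption: it is not an XQL engine, the document is a trimmed copy of the running example, and the author is spelled "Philip K. Dick" as in that data rather than "P.K. Dick" as in the query.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore from the running example.
DOC = """
<bookstore>
  <fiction>
    <sci-fi>
      <book>
        <isbn>0006482805</isbn>
        <title>Do androids dream of electric sheep</title>
        <author>Philip K. Dick</author>
      </book>
    </sci-fi>
    <fantasy>
      <mystery>
        <book>
          <isbn>0261102362</isbn>
          <title>The two towers</title>
          <author>JRR Tolkien</author>
        </book>
      </mystery>
    </fantasy>
  </fiction>
</bookstore>
"""

root = ET.fromstring(DOC)

# Rough analogue of //book[author="..."]: descend to every book, at any
# depth, that has an author child with exactly this text.
dick_books = root.findall(".//book[author='Philip K. Dick']")
titles = [book.findtext("title") for book in dick_books]
print(titles)
```

Note that this returns the matching book subtrees without their enclosing sci-fi/fiction context; retaining that context is what the shallow return operator (?) adds in XQL.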
Slide 8
XQL 98 lacked:
Selection Functionality
• comparison operators for fulltext (in progress)
• regular path expressions for hierarchy (only // for recursive descent and * for matching all nodes in a search context)
Restructuring
• Suggestions: return operators (SAG), XSLT (W3C), Application Level (e.g. WebMethods)
Combination
• joins; Suggestions: see below
Graphs
• no navigation along ID/IDREF
• no multi-documents (dereferencing URIs)
• Suggestions: docref, ref, keyref, idref
Delegation
• external functions
• wrappers
Slide 9
Extended XQL Examples
Combination:
• combine all books with customers via isbn
  $root//*?/book?[$i:=isbn]/(* | $root//customer?[boughtbooks/isbn=$i])
• New concepts
  • combination with nodes outside of search context ($root//review)
  • correlation variables for expressing join predicate [$i:=isbn]
  • $root used for clarity...
• Irregular structure of bookstore is preserved
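To make the combination query concrete, here is a sketch in plain Python (standard `xml.etree.ElementTree`) of the result it describes: every book keeps its place in the irregular bookstore hierarchy and gains the customers whose boughtbooks list its isbn. The function name `nest_customers_in_books` and the `<customers>` holder element are illustrative assumptions, and the document is abbreviated from the running example.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """
<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title></book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book><isbn>0261102362</isbn><title>The two towers</title></book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks><isbn>0261102362</isbn><isbn>0593488321</isbn></boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks><isbn>0006482805</isbn><isbn>0261102362</isbn></boughtbooks>
    </customer>
  </customers>
</books_and_customers>
"""


def nest_customers_in_books(root):
    """Append to each book a <customers> element with copies of its buyers."""
    # Snapshot both lists first, so the copies added below are not revisited.
    books = list(root.iter("book"))
    customers = list(root.iter("customer"))
    for book in books:
        isbn = book.findtext("isbn")  # plays the role of [$i:=isbn]
        buyers = [
            c for c in customers
            if isbn in [i.text for i in c.findall("boughtbooks/isbn")]
        ]
        holder = ET.SubElement(book, "customers")
        for c in buyers:
            holder.append(copy.deepcopy(c))
    return root


root = nest_customers_in_books(ET.fromstring(DOC))
buyers_by_isbn = {
    book.findtext("isbn"): [c.findtext("name") for c in book.iter("customer")]
    for book in root.iter("book")
}
print(buyers_by_isbn)
```

The nested loop is the naive form of the join; an index from isbn to customers would be the obvious refinement for larger documents.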
Multidocuments/Delegation:
• get multiple bookstores from a bookmark list (HTTP-GET)
  docref('http://www.bookstores')/docref(.//@href)//bookstore
• the same with a form (HTTP-POST - simplified!)
  docref('http://www.bookstores/search.cfm','country','us')//bookstore
• the same with a wrapper (application program delivering XML)
  wrapper("bookstore")//bookstore
Slide 10
Towards a Datamodel for querying XML
[Figure: the same sample document under alternative data models]
XML Serialization: Structured Text
  <document>
    <person id="jonathanr">
      <firstname>Jonathan</firstname>
      <lastname>Robie</lastname>
    </person>
    <person id="joel">
      <firstname>Joe</firstname>
      <lastname>Lapp</lastname>
    </person>
    <!-- ... -->
  </document>
W3C-DOM: Element Tree
OEM: Graph (two person nodes, Jonathan Robie and Joe Lapp, linked as authors to an article node with title "XQL for Dummies" and year 1999)
Relational Tables (generic, massive join option), with tables such as FlatElemTable, NonFlatElemTable, DocumentTable, attrRecTable
Locators: Lists of Paths
  ...
  document.person.firstname
  document.person.firstname."Joe"
  document.person.lastname."Lapp"
  document.person
  document.person.@id
  ...
Slide 11
Locators for Bookstore
bookstore#1
bookstore#1.fiction#2
bookstore#1.fiction#2.sci-fi#3
bookstore#1.fiction#2.sci-fi#3.book#4
bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
bookstore#1.fiction#2.sci-fi#3.book#4.title#6
bookstore#1.fiction#2.sci-fi#3.book#4.author#7
…
bookstore#1.fiction#2.fantasy#8
bookstore#1.fiction#2.fantasy#8.mystery#9
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
...
Slide 12
Locators <-> XML Serialization
Locators are lists of paths
XML-document -> Locators
• each element node gets an id in document order (depth-first, left-to-right traversal)
• each element node is located by the entire path from the root
• attributes are attached to element nodes
• content is attached to leaf nodes
Locators -> XML-document:
• clean up: discard locators $prefix which are followed by at least one locator $prefix.$postfix
• generate tree
  (1) for all locators generate nested serialization
  (2) fill up with content and attributes
Mappings should be total, 1:1
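The two mappings on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `to_locators`, `clean_up` and `to_tree` are made up for the example, tag names are assumed to contain neither "." nor "#", and step (2), filling in content and attributes, is left out.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore>
  <fiction>
    <sci-fi>
      <book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title><author>Philip K. Dick</author></book>
    </sci-fi>
    <fantasy>
      <mystery>
        <book><isbn>0261102362</isbn><title>The two towers</title><author>JRR Tolkien</author></book>
      </mystery>
    </fantasy>
  </fiction>
</bookstore>
"""


def to_locators(root):
    """Number element nodes depth-first, left to right, and list each
    node's full path from the root as 'tag#id.tag#id...'."""
    locators = []
    next_id = 1

    def visit(elem, prefix):
        nonlocal next_id
        path = prefix + [f"{elem.tag}#{next_id}"]
        next_id += 1
        locators.append(".".join(path))
        for child in elem:
            visit(child, path)

    visit(root, [])
    return locators


def clean_up(locators):
    """Discard every locator that is a proper prefix of another locator."""
    return [
        loc for loc in locators
        if not any(other.startswith(loc + ".") for other in locators)
    ]


def to_tree(locators):
    """Rebuild the element structure (tags and order only) from locators."""
    nodes = {}  # component such as 'book#4' -> element already created
    root = None
    for loc in locators:
        parent = None
        for comp in loc.split("."):
            if comp not in nodes:
                tag = comp.rsplit("#", 1)[0]
                if parent is None:
                    nodes[comp] = root = ET.Element(tag)
                else:
                    nodes[comp] = ET.SubElement(parent, tag)
            parent = nodes[comp]
    return root


original = ET.fromstring(DOC)
locators = to_locators(original)
leaves = clean_up(locators)
rebuilt = to_tree(leaves)
print("\n".join(locators))
```

Because ids are unique per node, the leaf locators alone are enough to regenerate the nesting, which is why the clean-up step can drop all prefixes.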
Slide 13
Locator Sets vs. Relations
Commonalities
• flat sets
• identity defined by identity of components
• concatenation to derive new locators/tuples
Differences
• arity
  • locators: variable length
  • tuples: fixed
• access to components:
  • locators: by navigation
  • tuples: by position/attribute
• data:
  • locator components: document nodes
  • tuple components: values
Slide 14
Locator Algebra (0)
Operator | Relational Algebra | Locator Algebra
∪, ∩, − | On tuple sets | On locator sets
Select | Selects tuples with a predicate | Selects locators with a predicate
Project | By absolute component selection | Not available; implicit projection by dependent join
Cross Product | Concatenates each tuple in one set with each tuple in another set | Dependent join concatenating locators from a context set with locators from a dependent set
Theta-Join | Combination of cross product with select | Combination of dependent join, select, and variable binding
Tree-Operators | Not applicable | DOM methods
Slide 15
Locator Algebra (1)
Preliminaries
• L: domain of locator sets
  • x, y ∈ L
• PL: domain of locators
  • u, v ∈ PL
• tail(u) … last component of u
• prefix(u) … u − tail(u)
Tree-Operators
• navigation in document tree using DOM methods
• root, parent, children: PL → L
• applied to locator sets from L using d-join (see below)
Set-Operators ∪, ∩, −: L × L → L
• defined as usual
• order preservation due to total ordering on document nodes
Slide 16
Locator Algebra (2)
Select
• select[p]: L → L, where p: PL → Boolean
  select[p](x) = {u | u ∈ x, p(tail(u))}
• Example: select[nodename(.) = "book"](x) = select["book"](x)
Return
• Corresponds to project
• duplicates the tail of a locator to preserve it in a subsequent d-join (see below)
• return: PL → PL
  return(u) = concat(u, tail(u))
Slide 17
Locator Algebra (3)
Dependent-Join:
• d-join[f]: L → L, where f: PL → L
  d-join[f](x) = ⋃_{u ∈ x} concat(prefix(u), f(tail(u)))
• Example: return all titles of books in their book context
  select["title"](d-join[children(.)](select["book"](d-join[return(children(.))](x)))) = /book?/title
Kleene Star:
• fixpoint operator for recursive descent queries
• *[f]: L → L, where f: L → L
  *[f](x) = f(x) ∪ *[f](f(x))
• Example: select all titles in their original context
  select["title"](d-join[children(.)](*[d-join[return(children(.))](.)](x))) = //*?/title
• maybe too general for physical algebra
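The operators select, return, d-join and the Kleene star can be tried out with a small Python sketch that follows the definitions literally. It rests on several assumptions that are not from the slides: a locator is a tuple of `xml.etree.ElementTree` elements, a locator set is a list, `returning` is a hypothetical helper that applies return to every locator a tree operator yields, and `star` collects results level by level rather than in document order.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore>
  <fiction>
    <sci-fi>
      <book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title></book>
    </sci-fi>
    <fantasy>
      <mystery>
        <book><isbn>0261102362</isbn><title>The two towers</title></book>
      </mystery>
    </fantasy>
  </fiction>
</bookstore>
"""


def tail(u):
    return u[-1]


def prefix(u):
    return u[:-1]


def children(node):
    """Tree operator: one single-component locator per child element."""
    return [(child,) for child in node]


def returning(f):
    """Apply return(v) = concat(v, tail(v)) to every locator f yields."""
    return lambda node: [v + (tail(v),) for v in f(node)]


def select(name, x):
    """select["name"](x): keep locators whose tail has this element name."""
    return [u for u in x if tail(u).tag == name]


def d_join(f, x):
    """d-join[f](x): replace each locator's tail by the locators f(tail)."""
    return [prefix(u) + v for u in x for v in f(tail(u))]


def star(f, x):
    """*[f](x) = f(x) ∪ *[f](f(x)), unrolled until f yields nothing more."""
    result, frontier = [], f(x)
    while frontier:
        result.extend(frontier)
        frontier = f(frontier)
    return result


def show(x):
    return [".".join(node.tag for node in u) for u in x]


root = ET.fromstring(DOC)

# /book?/title, evaluated with the sci-fi element as context.
sci_fi = next(root.iter("sci-fi"))
book_titles = select("title", d_join(children,
                     select("book", d_join(returning(children), [(sci_fi,)]))))

# //*?/title, evaluated with the document root as context.
descend = lambda y: d_join(returning(children), y)
all_titles = select("title", d_join(children, star(descend, [(root,)])))

print(show(book_titles))
print(show(all_titles))
```

Read this way, d-join replaces the tail of each context locator, so the starting node itself does not appear in the results; the duplicated tail produced by return is what keeps each intermediate element in the path.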
Slide 18
Locator Algebra (4)
Varbind, Varget
• to realize joins across contexts
• varbind[i,f]: L → L, where i ∈ Name, f: PL → L
  varbind[i,f](x): for all u ∈ x: vars(u) := vars(u) ∪ ⋃_{v ∈ f(tail(u))} {<i,v>}
• varget[i]: PL → L
  varget[i](u) = {v | <i,v> ∈ vars(u)}
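A possible reading of varbind and varget in Python is sketched below. The `Locator` class, the choice to bind isbn text values instead of locators, and the helper `isbn_values` are simplifying assumptions made for the example; the final loop only illustrates how a bound variable can drive a join predicate such as boughtbooks/isbn=$i.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

DOC = """
<books_and_customers>
  <bookstore>
    <book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title></book>
    <book><isbn>0261102362</isbn><title>The two towers</title></book>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks><isbn>0261102362</isbn><isbn>0593488321</isbn></boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks><isbn>0006482805</isbn><isbn>0261102362</isbn></boughtbooks>
    </customer>
  </customers>
</books_and_customers>
"""


@dataclass
class Locator:
    path: tuple                               # document nodes, root side first
    vars: set = field(default_factory=set)    # bound pairs (name, value)


def tail(u):
    return u.path[-1]


def varbind(i, f, x):
    """vars(u) := vars(u) ∪ {<i,v> | v ∈ f(tail(u))} for every u in x."""
    for u in x:
        u.vars |= {(i, v) for v in f(tail(u))}
    return x


def varget(i, u):
    """All values bound to variable i on locator u."""
    return {v for (name, v) in u.vars if name == i}


def isbn_values(node):
    # Simplification: bind the text of isbn children, not isbn locators.
    return [child.text for child in node if child.tag == "isbn"]


root = ET.fromstring(DOC)
books = varbind("$i", isbn_values, [Locator((b,)) for b in root.iter("book")])
customers = list(root.iter("customer"))

# Join predicate boughtbooks/isbn = $i, checked per book locator.
buyers = {}
for u in books:
    wanted = varget("$i", u)
    buyers[tail(u).findtext("title")] = [
        c.findtext("name") for c in customers
        if wanted & {i.text for i in c.iter("isbn")}
    ]
print(buyers)
```

Keeping the bindings on the locator rather than in a global environment is what lets each book carry its own value of $i into the dependent part of the query.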
Slide 19
Join Example (1)
x:
  bc#0
$A = *[d-join[return(children(.))](.)](x) = //*?
  bc#0.bookstore#1
  bc#0.bookstore#1.fiction#2
  bc#0.bookstore#1.fiction#2.sci-fi#3
  ...
$B = select["book"](d-join[return(children(.))]($A)) = //*?/book
  bc#0.bs#1.f#2.sf#3.b#4
  bc#0.bs#1.f#2.fa#8.mi#9.b#10
  ...
$C = d-join[return(children(.))]($B) = //*?/book?/*
  bc#0.bs#1.f#2.sf#3.b#4.isbn#5
  bc#0.bs#1.f#2.sf#3.b#4.title#6
  ...
$D = varbind[$i,select["isbn"](children(.))]($B) = //*?/book[$i:=isbn]?
  bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
  bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
  ...
$E = select["customer"](d-join[children(.)](*[d-join[return(children(.))](.)](d-join[root(.)]($D)))) = //*?/customer
  customers#14.customer#15
  customers#14.customer#20
$F = d-join[select[select["isbn"](d-join[children(.)](select["boughtbooks"](d-join[children(.)](.)))) = varget[$i](.)]($E)]($D) = //*?/book[$i:=isbn]?/(//*?/customer[boughtbooks/isbn=$i])
  bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
  bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
  bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20
Slide 20
Join Example (2)

<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
          <customers>
            <customer>
              <name>P.W. Ellis</name>
              <boughtbooks>
                <isbn>0006482805</isbn>
                <isbn>0261102362</isbn>
              </boughtbooks>
            </customer>
          </customers>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
            <customers>
              <customer>
                <name>Jason Woolsey</name>
                <boughtbooks>
                  <isbn>0261102362</isbn>
                  <isbn>0593488321</isbn>
                </boughtbooks>
              </customer>
              <customer>
                <name>P.W. Ellis</name>
                <boughtbooks>
                  <isbn>0006482805</isbn>
                  <isbn>0261102362</isbn>
                </boughtbooks>
              </customer>
            </customers>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
</books_and_customers>
Slide 21
Some Equivalence Transformations for L'Algebra
Commutativity:
• union(A,B) = union(B,A) (within single document)
• but d-join is not commutative
Associativity:
• union, intersect, d-join
Idempotence:
• union(A,A) = A
Distributivity:
• //book/(title | author) = //book/title | //book/author
Neutral Elements:
• union: {}
• d-join: $root(?)
Slide 22
Open Issues
Combination with relational algebra
Graphs/Multidocuments
• DAGs: Multiple paths from root-context to node (serialization?)
• Role of URIs in locators?
Typing
• Role of XSD (XML Schema Description)
• Inference
Constructors
• attribute to element and vice versa...
• Grouping, Skolems
Details
• Investigate conformance of locator concept to W3C Infoset
• Constraints on locators/mappings to guarantee well-formedness
Political
• XQL implementations shipping: underlying semantics node-based, not locator-based
Slide 23
The IPSI XML Brokering Framework
[Architecture diagram; labels:]
• Datamodel: Document Object Model (W3C-DOM)
• Queryprocessor: XML Query Language (XQL)
• Generic Wrappers, Specific Wrappers, JEDI Framework, HTTP/HTML Robot
• DOM, Persistent DOM, Warehouse
• Server (HTTP, URL), XSL Processor
• Program, Queries, Visualization
• Arrows labelled: URL+Queries, XQL, XML, HTML, CSS
Slide 24
Wrappers
Jedi Framework for Wrappers
• Pivot Object Model
• Scripting language for control flow
• Access to dynamic sources (ODBC, CORBA) with iterators
Generic Wrappers
• Generic mapping of structured formats to XML
• Examples: SGML, XML, HTML, MS-RTF
Jedi Parser
• for irregularly formatted sources
• context-free, attributed grammars
• fault-tolerant, efficient parser: unlimited lookahead, interpretation of ambiguous, incomplete grammars by specificity ordering
HTTP-Access
• Access plans for delegation integrated with XQL Engine
Slide 25
Mediator: XQL Engine + Persistent DOM
XQL 98 Implementation
• efficient recursive descent queries by signature-index
+ Joins
+ Multi Document Handling
• extends XQL with external references (via http-get, http-post)
• Multidocument DOM; for every node namespace and URI
+ User defined functions
• input: context (reference-node-set, reference-node-pointer), parameters: constants, XQL-expressions (lazy evaluation)
• output: node-functions, collection-functions (set of nodes), comparison-operators
• can attach base-URIs
• variables
Slide 26
Application 1: An XML Broker for Golfers
[Diagram: three XML sources feed the XML Broker via Query; the broker result is rendered via XSL]

<www.reiseplanung.de>
  <route>
    <von>53757</von>
    <nach>93333</nach>
    <entfernung>481.9</entfernung>
    <fahrzeit>274</fahrzeit>
    <karte>5375793333.gif</karte>
  </route>
  <!-- ... -->
</www.reiseplanung.de>

<www.wetter.de>
  <wetter>
    <plz>87724</plz>
    <datum>981001</datum>
    <temperatur>16</temperatur>
    <regen>90</regen>
    <wind>9</wind>
    <prognose>13</prognose>
  </wetter>
  <!-- ... -->
</www.wetter.de>

<golfplatz id="platz0001">
  <adresse> [...] </adresse>
  <policy> ... </policy>
  <handicap>
    <wochentag>34</wochentag>
    <wochenende>34</wochenende>
  </handicap>
</golfplatz>

XML Broker result:

<golfdemo>
  <golfplatz>
    <adresse> ... </adresse>
    <greenfee> ... </greenfee>
    ...
  </golfplatz>
  <wetter> ... </wetter>
  <route> ... </route>
</golfdemo>
Slide 27
Application 2: RELIMO Integrating Bioinformatics Data
[Architecture diagram; labels:]
• PDB as local PDOM
• RELIBASE with XML-RPC
• XML Broker
• XML Application (e.g. Office 2000)
• XML Browser (e.g. Mozilla 5)
• XSL Formatter (e.g. Lotus-XSL)
Slide 28
Application Data
XML Broker for Golfers
• Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB), www.routen-information.de (200 KB)
• Joins (via zip-code) ~ 2 to 3 secs
RELIMO (Germany)
• Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB PDOM)
• response time (100 MB) 50 to 30000 ms
MIROWEB (ESPRIT)
• JEDI for importing several sources to Oracle 8
Shakespeare
• all plays
• 10 MB (Tests with duplicated data up to 0.5 GB)
Slide 29
Some Links & Acks
XQL FAQ
• http://metalab.unc.edu/xql/
IPSI XML Research & Development
• http://xml.darmstadt.gmd.de
• XQL-Engine 1.0.1 download (non-commercial use)
• JEDI download (non-commercial use)
XML Brokering Framework Licensing Info (Infonyte)
• [email protected]
• www.infonyte.com
Many thanks to
• Karl Aberer, Harald Schöning, Guido Mörkotte