
Slide 1

Querying XML with Locator Semantics

Peter Fankhauser
joint work with: Matthias Friedrich, Gerald Huck, Ingo Macherius, Jonathan Robie

GMD German National Research Center for Information Technology
Institute for Integrated Publication and Information Systems (GMD-IPSI)
http://xml.darmstadt.gmd.de/

Slide 2

Overview

Requirements for Querying XML

XQL Overview

Locators

Locator Algebra

IPSI XML-Brokering Framework

Slide 3

General Requirements for Querying XML (Excerpt from Dave Maier, W3C QL 98)

Require no schema
• flexibly match irregular structure
• preserve (irregular) structure

Query & Preserve Order and Association
• sibling order
• hierarchy

Precise Semantics
• rewrite rules
• compositional semantics

Closedness/Completeness
• XML to XML
• when is a QL for XML complete?

Slide 4

Running Example

Bookstore:
• Non-uniform hierarchy
• sci-fi: 2 levels
• mystery: 3 levels

Customers: Flat Table

<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
  <customers>
    <customer>
      <name>Jason Woolsey</name>
      <boughtbooks>
        <isbn>0261102362</isbn>
        <isbn>0593488321</isbn>
      </boughtbooks>
    </customer>
    <customer>
      <name>P.W. Ellis</name>
      <boughtbooks>
        <isbn>0006482805</isbn>
        <isbn>0261102362</isbn>
      </boughtbooks>
    </customer>
  </customers>
</books_and_customers>

Slide 5

Functional Requirements for Querying XML (Dave Maier, W3C QL 98)

Selection and Extraction:
• all sci-fi books by P.K. Dick

Reduction:
• drop all authors but 1st author

Combination:
• combine all books with their customers via isbn

Restructuring:
• return flat lists of title/author pairs
• and vice versa

Multidocument Handling:
• get reviews and books from different sites
• follow (dereference) links in books to authors

Slide 6

XQL Overview (State W3C QL 98)

Basic Concept: Selection of Subtrees
• Originated as QL for DOM
• adopted for selectors in XSL-templates (now merged with XPointer to XPel to XPath to ????)
• Defined along search contexts = an (ordered) set of document nodes

Path Expressions and Filters:
• A query is essentially a navigation in element trees
• Navigation and filters modify the search context
• Query result is the last search context

Selection of nodes by:
• Element- and attribute name
• Type (element, attribute, comment, etc.)
• Content or value of nodes
• Relationship between nodes: hierarchy, sequence, index

Combination by: union, intersection

Slide 7

XQL 98 Examples

Selection and Extraction:
• all books by P.K. Dick

    //book[author="P.K. Dick"]

Reduction:
• drop all but 1st author

    //*?/book?/(isbn | author[0] | title)

• * matches all elements along paths to book
• shallow return operator (?) retains nesting hierarchy
• union preserves document order (title before author)
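The two queries above can be approximated in Python. This is only a sketch: it uses the limited XPath subset of the standard `xml.etree.ElementTree` module as a stand-in for XQL, the helper names are made up for illustration, and the second author in the sample data is hypothetical (added so that the reduction has something to drop).

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the running example; the second <author> is hypothetical.
DOC = """<bookstore><fiction><sci-fi><book>
  <isbn>0006482805</isbn>
  <title>Do androids dream of electric sheep</title>
  <author>Philip K. Dick</author>
  <author>Hypothetical Co-Author</author>
</book></sci-fi></fiction></bookstore>"""


def select_books_by_author(root, author):
    """Selection: all book elements having an author child with this exact text."""
    return root.findall(f".//book[author='{author}']")


def keep_first_author(book):
    """Reduction: remove every author child except the first one."""
    for extra in book.findall("author")[1:]:
        book.remove(extra)
    return book


root = ET.fromstring(DOC)
for book in select_books_by_author(root, "Philip K. Dick"):
    keep_first_author(book)
    print([child.tag for child in book])
```

Unlike the XQL return operator, this sketch modifies the tree in place rather than constructing a new result that retains the path to each book.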

Slide 8

XQL 98 lacked:

Selection Functionality
• comparison operators for fulltext (in progress)
• regular path expressions for hierarchy (only // for recursive descent and * for matching all nodes in a search context)

Restructuring
• Suggestions: return operators (SAG), XSLT (W3C), Application Level (e.g. WebMethods)

Combination
• joins; Suggestions: see below

Graphs
• no navigation along ID/IDREF
• no multi-documents (dereferencing URIs)
• Suggestions: docref, ref, keyref, idref

Delegation
• external functions
• wrappers

Slide 9

Extended XQL Examples

Combination:
• combine all books with customers via isbn

    $root//*?/book?[$i:=isbn]/(* | $root//customer?[boughtbooks/isbn=$i])

• New concepts
  • combination with nodes outside of search context ($root//review)
  • correlation variables for expressing join predicate [$i:=isbn]
  • $root used for clarity...
• Irregular structure of bookstore is preserved

Multidocuments/Delegation:
• get multiple bookstores from a bookmark list (HTTP-GET)

    docref('http://www.bookstores')/docref(.//@href)//bookstore

• the same with a form (HTTP-POST - simplified!)

    docref('http://www.bookstores/search.cfm','country','us')//bookstore

• the same with a wrapper (application program delivering XML)

    wrapper("bookstore")//bookstore
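To make the Combination example concrete, here is a rough Python sketch of the same join over the running example. It is not XQL: the function name is made up, the correlation variable $i becomes a plain Python variable, and wrapping the matches in a customers element inside each book is an assumption about the shape of the result.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """<books_and_customers>
<bookstore><fiction>
  <sci-fi><book><isbn>0006482805</isbn>
    <title>Do androids dream of electric sheep</title></book></sci-fi>
  <fantasy><mystery><book><isbn>0261102362</isbn>
    <title>The two towers</title></book></mystery></fantasy>
</fiction></bookstore>
<customers>
  <customer><name>Jason Woolsey</name>
    <boughtbooks><isbn>0261102362</isbn><isbn>0593488321</isbn></boughtbooks></customer>
  <customer><name>P.W. Ellis</name>
    <boughtbooks><isbn>0006482805</isbn><isbn>0261102362</isbn></boughtbooks></customer>
</customers>
</books_and_customers>"""


def join_books_with_customers(root):
    """Attach to every book a copy of each customer who bought it (matched via isbn)."""
    customers = root.findall(".//customer")
    for book in list(root.iter("book")):
        isbn = book.findtext("isbn")  # plays the role of [$i:=isbn]
        matches = [c for c in customers
                   if isbn in [i.text for i in c.findall("boughtbooks/isbn")]]
        if matches:
            wrapper = ET.SubElement(book, "customers")
            for customer in matches:
                wrapper.append(copy.deepcopy(customer))
    return root


joined = join_books_with_customers(ET.fromstring(DOC))
for book in joined.iter("book"):
    print(book.findtext("title"), [c.findtext("name") for c in book.iter("customer")])
```

Because the books are extended in place, the irregular nesting of the bookstore (sci-fi at two levels, mystery at three) is left untouched.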

Slide 10

Towards a Datamodel for querying XML

[Figure: candidate data models for the same XML document, with question marks on the mappings between them]
• XML Serialization: Structured Text
• W3C-DOM: Element Tree
• OEM: Graph
• Relational Tables (generic massive join option)
• Locators: Lists of Paths

XML Serialization example from the figure:

<document>
  <person id="jonathanr">
    <firstname>Jonathan</firstname>
    <lastname>Robie</lastname>
  </person>
  <person id="joel">
    <firstname>Joe</firstname>
    <lastname>Lapp</lastname>
    <!-- ... -->
</document>

Locator example from the figure:

document.person
document.person.@id
document.person.@id."joel"
document.person.firstname
document.person.firstname."Joe"
...

Slide 11

Locators for Bookstore

bookstore#1
bookstore#1.fiction#2
bookstore#1.fiction#2.sci-fi#3
bookstore#1.fiction#2.sci-fi#3.book#4
bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5
bookstore#1.fiction#2.sci-fi#3.book#4.title#6
bookstore#1.fiction#2.sci-fi#3.book#4.author#7
bookstore#1.fiction#2.fantasy#8
bookstore#1.fiction#2.fantasy#8.mystery#9
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.isbn#11
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.title#12
bookstore#1.fiction#2.fantasy#8.mystery#9.book#10.author#13
...
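A list like the one above can be generated mechanically. The following is a minimal sketch, assuming ids are handed out in a depth-first, left-to-right traversal starting at 1; the function name `locators` and the `name#id` string format are illustrative choices, and attributes and text content are ignored.

```python
import xml.etree.ElementTree as ET


def locators(root, first_id=1):
    """Return one locator string per element, in document order."""
    result = []
    next_id = first_id

    def visit(elem, path):
        nonlocal next_id
        path = path + [f"{elem.tag}#{next_id}"]  # extend the path from the root
        next_id += 1
        result.append(".".join(path))
        for child in elem:  # children in sibling order
            visit(child, path)

    visit(root, [])
    return result


DOC = ("<bookstore><fiction>"
       "<sci-fi><book><isbn/><title/><author/></book></sci-fi>"
       "<fantasy><mystery><book><isbn/><title/><author/></book></mystery></fantasy>"
       "</fiction></bookstore>")

for line in locators(ET.fromstring(DOC)):
    print(line)
```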

Slide 12

Locators <-> XML Serialization

Locators are lists of paths

XML-document -> Locators
• each element-node gets id in document-order (depth first, left to right traversal)
• each element-node is located by the entire path from root
• attributes are attached to element-nodes
• content is attached to leaf-nodes

Locators -> XML-document:
• clean up: discard locators $prefix which are followed by at least one locator $prefix.$postfix
• generate tree
  (1) for all locators generate nested serialization
  (2) fill up with content and attributes

Mappings should be total, 1:1
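The direction Locators -> XML-document can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: locators are `name#id` strings joined by dots (so element names containing a dot are not handled), there is a single root, and content is passed in a hypothetical dictionary keyed by the `name#id` step.

```python
import xml.etree.ElementTree as ET


def clean_up(locators):
    """Discard each locator that is a proper prefix of some later locator."""
    return [loc for i, loc in enumerate(locators)
            if not any(later.startswith(loc + ".") for later in locators[i + 1:])]


def build_tree(locators, content=None):
    """Step (1): nest one element per path step. Step (2): fill in text content."""
    nodes = {}  # "name#id" step -> Element already created for it
    root = None
    for loc in clean_up(locators):
        parent = None
        for step in loc.split("."):
            if step not in nodes:
                name = step.rsplit("#", 1)[0]
                if parent is None:
                    nodes[step] = ET.Element(name)
                    root = root if root is not None else nodes[step]
                else:
                    nodes[step] = ET.SubElement(parent, name)
            parent = nodes[step]
    for step, text in (content or {}).items():
        nodes[step].text = text
    return root


LOCS = [
    "bookstore#1",
    "bookstore#1.fiction#2",
    "bookstore#1.fiction#2.sci-fi#3",
    "bookstore#1.fiction#2.sci-fi#3.book#4",
    "bookstore#1.fiction#2.sci-fi#3.book#4.isbn#5",
    "bookstore#1.fiction#2.sci-fi#3.book#4.title#6",
]
tree = build_tree(LOCS, {"isbn#5": "0006482805"})
print(ET.tostring(tree, encoding="unicode"))
```

After the clean-up only the two leaf locators remain, and they are enough to regenerate the whole nesting.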

Slide 13

Locator Sets vs. Relations

Commonalities
• flat sets
• identity defined by identity of components
• concatenation to derive new locators/tuples

Differences
• arity
  • locators: variable length
  • tuples: fixed
• access to components:
  • locators: by navigation
  • tuples: by position/attribute
• data:
  • locator components: document nodes
  • tuple components: values

Slide 14

Locator Algebra (0)

Operator       | Relational Algebra                                              | Locator Algebra
∪, ∩, −        | On tuple sets                                                   | On locator sets
Select         | Selects tuples with a predicate                                 | Selects locators with a predicate
Project        | By absolute component selection                                 | Not available, implicit projection by dependent join
Cross Product  | Concatenate each tuple in one set with each tuple in another set | Dependent join concatenating locators from a context set with locators from dependent set
Theta-Join     | Combination of cross product with select                        | Combination of dependent join, select, and variable binding
Tree-Operators | Not applicable                                                  | DOM-methods

Slide 15

Locator Algebra (1)

Preliminaries
• L: domain of locator sets
  • x, y ∈ L
• PL: domain of locators
  • u, v ∈ PL
• tail(u) … last component of u
• prefix(u) … u − tail(u)

Tree-Operators
• navigation in document tree using DOM methods
• root, parent, children: PL → L
• applied to locator sets from L using d-join (see below)

Set-Operators
• ∪, ∩, −: L × L → L, defined as usual
• order preservation due to total ordering on document nodes

Slide 16

Locator Algebra (2)

Select
• select[p]: L → L, where p: PL → Boolean
  select[p](x) = {u | u ∈ x, p(tail(u))}
• Example: select[nodename(.) = "book"](x) = select["book"](x)

Return
• Corresponds to project; duplicates tail of locator for preserving it in subsequent d-join (see below)
• return: PL → PL
  return(u) = concat(u, tail(u))
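A small Python sketch of select and return, assuming locators are modelled as tuples of node ids and node names live in a hypothetical lookup table `NAMES`. The function is called `ret` because `return` is a Python keyword.

```python
# Hypothetical node table: node id -> element name (ids follow document order).
NAMES = {1: "bookstore", 2: "fiction", 3: "sci-fi", 4: "book", 6: "title"}


def tail(u):
    """Last component of locator u."""
    return u[-1]


def prefix(u):
    """Locator u without its last component."""
    return u[:-1]


def select(p, x):
    """select[p](x): keep the locators whose tail satisfies predicate p."""
    return {u for u in x if p(tail(u))}


def ret(u):
    """return(u): duplicate the tail so that a later d-join can replace one copy."""
    return u + (tail(u),)


x = {(1, 2, 3), (1, 2, 3, 4), (1, 2, 3, 4, 6)}
books = select(lambda n: NAMES[n] == "book", x)
print(books, [ret(u) for u in books])
```

Note that `prefix(ret(u)) == u` for any locator, which is exactly why the duplicated tail survives a subsequent d-join.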

Slide 17

Locator Algebra (3)

Dependent-Join:
• d-join[f]: L → L, where f: PL → L
  d-join[f](x) = ⋃ {concat(prefix(u), f(tail(u))) | u ∈ x}
• Example: return all titles of books in their book context
  select["title"](d-join[children(.)](select["book"](d-join[return(children(.))](x)))) = /book?/title

Kleene Star:
• fixpoint-operator for recursive descent queries
• *[f]: L → L, where f: L → L
  *[f](x) = f(x) ∪ *[f](f(x))
• Example: select all titles in their original context
  select["title"](d-join[children(.)](*[d-join[return(children(.))](.)](x))) = //*?/title
• maybe too general for physical algebra
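The following Python sketch reads the definitions of d-join and the Kleene star literally. Several things are assumptions: locators are tuples of node ids, the tree is a hypothetical dictionary `TREE`, return(children(.)) is taken to yield each child with its tail duplicated, and the initial context is the root locator after return has been applied to it (so that the root stays in the result).

```python
# Hypothetical tree: node id -> (name, child ids); ids follow document order.
TREE = {
    1: ("bookstore", [2]), 2: ("fiction", [3, 8]),
    3: ("sci-fi", [4]), 4: ("book", [5, 6, 7]),
    5: ("isbn", []), 6: ("title", []), 7: ("author", []),
    8: ("fantasy", [9]), 9: ("mystery", [10]), 10: ("book", [11, 12, 13]),
    11: ("isbn", []), 12: ("title", []), 13: ("author", []),
}


def children(n):
    """children(.): one single-component locator per child of node n."""
    return {(c,) for c in TREE[n][1]}


def return_children(n):
    """return(children(.)): each child with its tail duplicated by return."""
    return {(c, c) for c in TREE[n][1]}


def d_join(f, x):
    """d-join[f](x): union over u in x of concat(prefix(u), v) for v in f(tail(u))."""
    return {u[:-1] + v for u in x for v in f(u[-1])}


def star(f, x):
    """*[f](x) = f(x) united with *[f](f(x)), computed iteratively until nothing new appears."""
    result, frontier = set(), f(x)
    while frontier:
        result |= frontier
        frontier = f(frontier) - result
    return result


def select(name, x):
    return {u for u in x if TREE[u[-1]][0] == name}


def show(u):
    return ".".join(f"{TREE[n][0]}#{n}" for n in u)


x = {(1, 1)}  # root context with duplicated tail, i.e. return((1,))
descend = lambda s: d_join(return_children, s)
titles = select("title", d_join(children, star(descend, x)))  # //*?/title
for u in sorted(titles):
    print(show(u))
```

The loop in `star` terminates on trees because leaf nodes contribute no further locators; on graphs with cycles a different stopping criterion would be needed.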

Slide 18

Locator Algebra (4)

Varbind, Varget
• to realize joins across contexts
• varbind[i,f]: L → L, where i ∈ Name, f: PL → L
  varbind[i,f](x): for all u ∈ x: vars(u) := vars(u) ∪ {<i,v> | v ∈ f(tail(u))}
• varget[i]: PL → L
  varget[i](u): {v | <i,v> ∈ vars(u)}
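Varbind and varget might be sketched like this in Python. The slides do not say where vars(u) is stored, so keeping it in a separate dictionary `env` keyed by locator is purely an assumption, as are the tuple representation of locators and the tiny `TREE` table.

```python
# Hypothetical tree fragment: node id -> (name, child ids).
TREE = {4: ("book", [5, 6]), 5: ("isbn", []), 6: ("title", [])}


def isbn_children(n):
    """select["isbn"](children(.)) for node n, as single-component locators."""
    return {(c,) for c in TREE[n][1] if TREE[c][0] == "isbn"}


def varbind(name, f, x, env):
    """For every u in x, add a binding <name, v> for each v in f(tail(u))."""
    for u in x:
        env.setdefault(u, set()).update((name, v) for v in f(u[-1]))
    return x  # the locator set itself is unchanged


def varget(name, u, env):
    """All locators bound to variable `name` for locator u."""
    return {v for (i, v) in env.get(u, set()) if i == name}


env = {}
books = varbind("$i", isbn_children, {(1, 2, 3, 4)}, env)
print(varget("$i", (1, 2, 3, 4), env))
```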

Slide 19

Join Example (1)

Input context x:
    bc#0

$A = *[d-join[return(children(.))](.)](x) = //*?
    bc#0.bookstore#1
    bc#0.bookstore#1.fiction#2
    bc#0.bookstore#1.fiction#2.sci-fi#3
    ...

$B = select["book"](d-join[return(children(.))]($A)) = //*?/book
    bc#0.bs#1.f#2.sf#3.b#4
    bc#0.bs#1.f#2.fa#8.mi#9.b#10
    ...

$C = d-join[return(children(.))]($B) = //*?/book?/*
    bc#0.bs#1.f#2.sf#3.b#4.isbn#5
    bc#0.bs#1.f#2.sf#3.b#4.title#6
    ...

$D = varbind[$i, select["isbn"](children(.))]($B) = //*?/book[$i:=isbn]?
    bc#0.bs#1.f#2.sf#3.b#4<$i,isbn#5>
    bc#0.bs#1.f#2.fa#8.mi#9.b#10<$i,isbn#11>
    ...

$E = select["customer"](d-join[children(.)](*[d-join[return(children(.))](.)](d-join[root(.)]($D)))) = //*?/customer
    customers#14.customer#15
    customers#14.customer#20

$F = d-join[select[select["isbn"](d-join[children(.)](select["boughtbooks"](d-join[children(.)](.)))) = varget[$i](.)]($E)]($D) = //*?/book[$i:=isbn]?/(//*?/customer[boughtbooks/isbn=$i])
    bc#0.bs#1.f#2.sf#3.b#4.cs#14.customer#20
    bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#15
    bc#0.bs#1.f#2.fa#8.mi#9.b#10.cs#14.customer#20
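The final locator set $F can be reproduced with a short Python sketch. It does not implement the algebra operators one by one; it only mimics their combined effect on the running example. The function names, the unabbreviated element names in the output, the root element name `bc`, and the id numbering starting at 0 for the root are assumptions chosen to line up with the locators shown above.

```python
import xml.etree.ElementTree as ET

DOC = """<bc><bookstore><fiction>
<sci-fi><book><isbn>0006482805</isbn><title>Do androids dream of electric sheep</title>
  <author>Philip K. Dick</author></book></sci-fi>
<fantasy><mystery><book><isbn>0261102362</isbn><title>The two towers</title>
  <author>JRR Tolkien</author></book></mystery></fantasy>
</fiction></bookstore>
<customers>
<customer><name>Jason Woolsey</name>
  <boughtbooks><isbn>0261102362</isbn><isbn>0593488321</isbn></boughtbooks></customer>
<customer><name>P.W. Ellis</name>
  <boughtbooks><isbn>0006482805</isbn><isbn>0261102362</isbn></boughtbooks></customer>
</customers></bc>"""


def root_paths(root):
    """Give every element an id in document order and its path (tuple) from the root."""
    ids, paths = {}, {}

    def visit(elem, path):
        ids[elem] = len(ids)  # the root gets id 0
        paths[elem] = path + (elem,)
        for child in elem:
            visit(child, paths[elem])

    visit(root, ())
    return ids, paths


def join(root):
    ids, paths = root_paths(root)
    show = lambda loc: ".".join(f"{e.tag}#{ids[e]}" for e in loc)
    result = []
    for book in root.iter("book"):  # books in their original context
        bound = {e.text for e in book.findall("isbn")}  # $i := isbn
        for cust in root.iter("customer"):
            bought = {e.text for e in cust.findall("boughtbooks/isbn")}
            if bound & bought:  # boughtbooks/isbn = $i
                # concatenate the book locator with the customer path below the root
                result.append(show(paths[book] + paths[cust][1:]))
    return result


for line in join(ET.fromstring(DOC)):
    print(line)
```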

Slide 20

Join Example (2)

<books_and_customers>
  <bookstore>
    <fiction>
      <sci-fi>
        <book>
          <isbn>0006482805</isbn>
          <title>Do androids dream of electric sheep</title>
          <author>Philip K. Dick</author>
          <customers>
            <customer>
              <name>P.W. Ellis</name>
              <boughtbooks>
                <isbn>0006482805</isbn>
                <isbn>0261102362</isbn>
              </boughtbooks>
            </customer>
          </customers>
        </book>
      </sci-fi>
      <fantasy>
        <mystery>
          <book>
            <isbn>0261102362</isbn>
            <title>The two towers</title>
            <author>JRR Tolkien</author>
            <customers>
              <customer>
                <name>Jason Woolsey</name>
                <boughtbooks>
                  <isbn>0261102362</isbn>
                  <isbn>0593488321</isbn>
                </boughtbooks>
              </customer>
              <customer>
                <name>P.W. Ellis</name>
                <boughtbooks>
                  <isbn>0006482805</isbn>
                  <isbn>0261102362</isbn>
                </boughtbooks>
              </customer>
            </customers>
          </book>
        </mystery>
      </fantasy>
    </fiction>
  </bookstore>
</books_and_customers>

Slide 21

Some Equivalence Transformations for L’Algebra

Commutativity:
• union(A,B) = union(B,A) (within single document)
• but d-join is not commutative

Associativity:
• union, intersect, d-join

Idempotence:
• union(A,A) = A

Distributivity:
• //book/(title | author) = //book/title | //book/author

Neutral Elements:
• union: {}
• d-join: $root(?)
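The distributivity rule can be checked on a toy instance. The sketch below assumes locators are tuples of node ids over a hypothetical `TREE` table and models the union inside the path as a pointwise union of two navigation functions; it checks one example and proves nothing in general.

```python
# Hypothetical tree fragment: node id -> (name, child ids).
TREE = {4: ("book", [5, 6, 7]), 5: ("isbn", []), 6: ("title", []), 7: ("author", []),
        10: ("book", [11, 12]), 11: ("isbn", []), 12: ("title", [])}


def children_named(name):
    """Navigation function: children of a node that carry the given name."""
    return lambda n: {(c,) for c in TREE[n][1] if TREE[c][0] == name}


def union_f(f, g):
    """Pointwise union of two navigation functions, standing in for (title | author)."""
    return lambda n: f(n) | g(n)


def d_join(f, x):
    return {u[:-1] + v for u in x for v in f(u[-1])}


books = {(4,), (10,)}
title, author = children_named("title"), children_named("author")

lhs = d_join(union_f(title, author), books)       # //book/(title | author)
rhs = d_join(title, books) | d_join(author, books)  # //book/title | //book/author
print(lhs == rhs, sorted(lhs))
```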

Slide 22

Open Issues

Combination with relational algebra

Graphs/Multidocuments
• DAGs: Multiple paths from root-context to node (serialization?)
• Role of URIs in locators?

Typing
• Role of XSD (XML Schema Description)
• Inference

Constructors
• attribute to element and vice versa…
• Grouping, Skolems

Details
• Investigate conformance of locator concept to W3C Infoset
• Constraints on locators/mappings to guarantee wellformedness

Political
• XQL-Implementations shipping: underlying semantics node-based, not locator-based

Slide 23

The IPSI XML Brokering Framework

[Figure: architecture diagram. Labeled components and data flows:]
• Datamodel: Document Object Model (W3C-DOM)
• Query processor: XML Query Language (XQL)
• Generic Wrappers, Specific Wrappers, JEDI Framework
• Server (HTTP, URL), HTTP/HTML Robot
• XSL Processor, Visualization (HTML, CSS)
• DOM, Persistent DOM, Warehouse
• Program, Queries, URL+Queries, XML, XQL

Slide 24

Wrappers

Jedi Framework for Wrappers
• Pivot Object Model
• Scripting language for control-flow
• Access to dynamic sources (ODBC, CORBA) with iterators

Generic Wrappers
• Generic Mapping of structured formats to XML
• Examples: SGML, XML, HTML, MS-RTF

Jedi Parser
• for irregularly formatted sources
• context free, attributed grammars
• fault-tolerant, efficient parser: unlimited lookahead, interpretation of ambiguous, incomplete grammars by specificity ordering

HTTP-Access
• Access plans for delegation integrated with XQL Engine

Slide 25

Mediator: XQL Engine + Persistent DOM

XQL 98 Implementation
• efficient recursive descent queries by signature-index

+ Joins

+ Multi Document Handling
• extends XQL with external references (via http-get, http-post)
• Multidocument DOM; for every node namespace and URI

+ User defined functions
• input: context (reference-node-set, reference-node-pointer), parameters: constants, XQL-expressions (lazy evaluation)
• output: node-functions, collection-functions (set of nodes), comparison-operators; can attach base-URIs
• variables

Slide 26

Application 1: An XML Broker for Golfers

Sources:

<www.reiseplanung.de>
  <route>
    <von>53757</von>
    <nach>93333</nach>
    <entfernung>481.9</entfernung>
    <fahrzeit>274</fahrzeit>
    <karte>5375793333.gif</karte>
  </route>
  <!-- ... -->
</www.reiseplanung.de>

<www.wetter.de>
  <wetter>
    <plz>87724</plz>
    <datum>981001</datum>
    <temperatur>16</temperatur>
    <regen>90</regen>
    <wind>9</wind>
    <prognose>13</prognose>
  </wetter>
  <!-- ... -->
</www.wetter.de>

<golfplatz id="platz0001">
  <adresse> [...] </adresse>
  <policy> ... </policy>
  <handicap>
    <wochentag>34</wochentag>
    <wochenende>34</wochenende>
  </handicap>
</golfplatz>

XML Broker (Query, XSL) result:

<golfdemo>
  <golfplatz>
    <adresse> ... </adresse>
    <greenfee> ... </greenfee>
    ...
  </golfplatz>
  <wetter> ... </wetter>
  <route> ... </route>
</golfdemo>

Slide 27

Application 2: RELIMO Integrating Bioinformatics Data

[Figure: architecture diagram. Labeled components:]
• PDB as local PDOM
• RELIBASE with XML-RPC
• XML Broker
• XML Application (e.g. Office 2000)
• XML Browser (e.g. Mozilla 5)
• XSL Formatter (e.g. Lotus-XSL)

Slide 28

Application Data

XML Broker for Golfers
• Sources: www.golffuehrer.de (500 KB), www.wetter.de (200 KB), www.routen-information.de (200 KB)
• Joins (via zip-code) ~ 2 to 3 secs

RELIMO (Germany)
• Sources: Relibase (XML-RPC), PDB (5 GB -> 25 MB XML, 30 MB PDOM)
• response time (100 MB) 50 to 30000 ms

MIROWEB (ESPRIT)
• JEDI for importing several sources to Oracle 8

Shakespeare
• all plays
• 10 MB (Tests with duplicated data up to 0.5 GB)

Slide 29

Some Links & Acks

XQL FAQ
• http://metalab.unc.edu/xql/

IPSI XML Research & Development
• http://xml.darmstadt.gmd.de
• XQL-Engine 1.0.1 download (non-commercial use)
• JEDI download (non-commercial use)

XML Brokering Framework Licensing Info (Infonyte)
• [email protected]
• www.infonyte.com

Many thanks to
• Karl Aberer, Harald Schöning, Guido Mörkotte
