
XML queries and updates

Date post: 06-Jan-2016
Transcript
Page 1: XML queries and updates

XML queries and updates
Daniela Florescu

Page 2: XML queries and updates

Outline

• Introduction
• XML Data Model and Type System
• XML Queries
• Technical discussions on XQuery design
• XML Updates
• Conclusions

Page 3: XML queries and updates

What is XML?

• The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.

• Base specifications:
  • XML 1.0, W3C Recommendation Feb '98
  • Namespaces, W3C Recommendation Jan '99

Page 4: XML queries and updates

Simple XML Data Example

<book year="1967" xmlns:amz="www.amazon.com">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <amz:ref amz:isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>

Page 5: XML queries and updates

The secrets of the XML success

• XML is a data representation format
• XML is universal
• XML is human readable
• XML is machine readable
• XML is international
• XML is platform independent
• XML is vendor independent
• XML is endorsed by the W3C
• XML is not a new technology
• XML is not only a data representation format

Page 6: XML queries and updates

XML as a family of technologies

• XML Information Set
• XML Schema
• XML Query
• The Extensible Stylesheet Language Transformations (XSLT)
• XML Forms
• XML Protocol
• XML Encryption
• XML Signature
• Others
• … almost all the pieces needed for a reasonably good Web Services puzzle…

Page 7: XML queries and updates

Major application domains for XML

• Data exchange on the Web
  • e.g. Health Level Seven (HL7) http://www.hl7.org/
• Application integration on the Web
  • e.g. ebXML http://www.ebxml.org/
• Document exchange on the Web
  • e.g. Encoded Archival Description Application http://lcweb.loc.gov/ead/

Page 8: XML queries and updates

The role of an XML query language

• Why a query language for XML?
  • Preserve logical/physical data independence
    • The semantics is described in terms of an abstract data model, independent of the physical data storage
  • Declarative programming
    • Such programs should describe the “what”, not the “how”
• Why a native query language?
  • We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)

Page 9: XML queries and updates

XML query languages: state of the art

• Query languages for graph data
  • e.g. GOOD, GraphLog, Clean
• Query languages/scripting languages for the Web
  • e.g. WebSQL, WebOQL, WebL
• Query languages for semi-structured data
  • e.g. MSL, UnQL, StruQL, YATL

Page 10: XML queries and updates

XML query languages: state of the art

• Research query languages for XML
  • e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce
• Industry query languages for XML
  • e.g. XQL, OQL extensions to query SGML documents
• W3C standard processing languages for XML
  • e.g. XPath, XSLT

Standard W3C XML Query Language: XQuery

Page 11: XML queries and updates

W3C Query Working Group - History

• Sept 1999: WG creation and first F2F
• Currently 30+ W3C member companies
• Twelve F2F meetings and 80+ telecons so far
• Public WDs every three months

http://www.w3.org/XML/Query

Page 12: XML queries and updates

W3C Query Working Group - Goal

"The goal of the XML Query WG is to produce:
- an abstract data model for XML documents,
- a set of query operators on that data model,
- a query language based on these query operators"

Page 13: XML queries and updates

XML: Many Environments

[Figure: the environments DOM, SAX, DBMS, XML, Java and COBOL, with XQuery and the W3C XML Query Data Model]

Page 14: XML queries and updates

W3C XML Query Working Group - Status

• June 2001: new or revised working drafts
• XML Query Requirements
• XML Query Use Cases
• XML Query 1.0 and XPath 2.0 Data Model
• XML Query 1.0 Formal Semantics
• XQuery 1.0: An XML Query Language
• XML Syntax for XQuery 1.0 (XQueryX)

Page 15: XML queries and updates

General XML query requirements

• Non-procedural, declarative query language

• XML syntax for query language but also a human readable syntax

• Protocol independent
• Standard error conditions
• Should not preclude updates

Page 16: XML queries and updates

XML Query Use Cases

• Use Case Organization
  • Description, DTD/Schema, Input Data, Queries and Results
• Current Use Cases
  • "XMP": Experiences and exemplars
  • "TREE": Queries that preserve hierarchy
  • "SEQ": Queries based on sequences
  • "R": Access to relational data
  • "TEXT": Full-text search
  • "NS": Queries using namespaces
  • "PARTS": Recursive computation
  • "REF": Queries based on references

Page 17: XML queries and updates

XML Abstract Data Model

• Common to XPath 2.0 and XQuery 1.0
• A logical model composed of
  • a set of logical entities
  • constructors and accessors for each entity
• Based on the notion of an ordered tree
• XML data cannot be modeled as a “simple” tree

Page 18: XML queries and updates

XML Abstract Data Model Entities

• Nodes

  Node = Document | Element | Attribute | Text | Namespaces | PI | Comment

• Simple values (all XML Schema simple types)
  string, boolean, ID, IDREF, decimal, QName, URI, ...
• Sequences
• Errors
• Schema components

Page 19: XML queries and updates

Document Nodes: Constructors & Accessors

• Constructor:
  document-node : URI X Sequence[1,*](Element | Text | PI | Comment) -> DocumentNode
• Accessors:
  base-uri : DocumentNode -> URI
  children : DocumentNode -> Sequence[1,*](ElementNode | TextNode | PI | Comment)
  string-value : DocumentNode -> string

Page 20: XML queries and updates

Attribute Nodes: Constructors & Accessors

• Constructor:
  attribute-node : QName X string X SchemaComponent -> AttributeNode
• Accessors:
  name : AttributeNode -> QName
  type : AttributeNode -> SchemaComponent
  typed-value : AttributeNode -> Sequence(SimpleValue)
  string-value : AttributeNode -> string
  parent : AttributeNode -> Sequence[0,1](Node)

Page 21: XML queries and updates

Sequences: Constructors & Accessors

• Constructors:
  empty-sequence : () -> Sequence
  append : Sequence X Sequence -> Sequence
• Accessors:
  empty : Sequence -> boolean
  head : Sequence -> UnitValue
  tail : Sequence -> Sequence
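
The sequence constructors and accessors above can be modelled in a few lines of Python. This is only a sketch: sequences are represented as tuples, the function names mirror the slide, and raising an error for head of an empty sequence is an assumption, since the slide does not say what happens in that case.

```python
# Sketch of the data model's flat sequences, using Python tuples.
# Names mirror the slide; error behaviour on empty input is an assumption.

def empty_sequence():
    """Constructor: () -> Sequence"""
    return ()

def append(s1, s2):
    """Constructor: Sequence X Sequence -> Sequence (the result stays flat)."""
    return tuple(s1) + tuple(s2)

def empty(s):
    """Accessor: Sequence -> boolean"""
    return len(s) == 0

def head(s):
    """Accessor: Sequence -> UnitValue (the first item)."""
    if empty(s):
        raise ValueError("head of an empty sequence")
    return s[0]

def tail(s):
    """Accessor: Sequence -> Sequence (everything after the first item)."""
    return tuple(s[1:])

s = append(append(empty_sequence(), (1, 2)), (3,))
```

Because append concatenates rather than nests, a sequence never contains another sequence, which matches the automatic flattening described on later slides.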

Page 22: XML queries and updates

XML data model - conclusion

• Complete with respect to XML
• Relatively simple design
  • ordered trees, node-labeled, with node identity
• Semantics of the query language relies on the data model constructors and accessors
• Relationship with the other W3C XML related standards:
  • Clear mapping to/from the XML Infoset
  • Less clear relationship with Document Object Model (DOM)
  • Less clear relationship with the XML Schema and the type system

Page 23: XML queries and updates

The XQuery type system

• XQuery’s original design had a powerful type system (based on XDuce)
• The type system can:
  (1) statically detect errors in queries
  (2) infer the type of the result of valid queries
  (3) statically ensure that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
• Queries on types
• Big debate: XML type system vs. XML Schema

Page 24: XML queries and updates

XQuery in a nutshell

• Functional language
  • A query is an expression
  • The result of the query is the result of the evaluation of the expression
  • Expressions are evaluated in a certain environment
• Strongly typed
  • Every expression has a type
• Statically typed
  • The type of the result of an expression can be determined statically
• Formal semantics based on the XML Abstract Data Model
• Dual syntax: XML and non-XML
• Influenced by: SQL, OQL, XQL, XPath, Quilt

Page 25: XML queries and updates

XQuery expressions

• Constants and variables
• expression1 operator expression2
• function(expression1, ..., expressionN)
• XPath expressions (for navigation)
• FLWR expressions (for iteration)
• SORTBY expressions
• Quantified expressions
• Conditional expressions
• Type-related expressions
• XML node constructors (elements, attributes, etc.)
• XQuery expressions can be nested with full generality!

Page 26: XML queries and updates

First XML queries

• 1+1
• $x
• $x/title
• $x/price+1
• document("www.amazon.com/books.xml")

Page 27: XML queries and updates

Xquery functions and operators

• Arithmetic and comparison operators
  • +, -, *, div, =, !=, <, etc.
• Logical operators
  • and, not, or
• Collection-oriented operators
  • union, intersection, difference, empty, distinct, count, sum, avg, min, max, etc.
• Global topological order related operators
  • before, after, unordered
• XML-specific functions
  • document, name, string-value, typed-value, etc.
• Many open issues remain in the semantics of these operators

Page 28: XML queries and updates

XPath expressions

• General syntax:
  expression ‘/’ step
• Two syntaxes: abbreviated or not
• Step in the non-abbreviated syntax:
  axis ‘::’ nodeTest
• Axes control the navigation direction in the tree
  • ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Node test by:
  • Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
  • Type (e.g. node(), comment(), text())

Page 29: XML queries and updates

Examples of path expressions

• document("bibliography.xml")/child::bib
• $x/child::bib/child::book/attribute::year
• $x/parent::*
• $x/ancestor::*/descendant::comment()

Page 30: XML queries and updates

Semantics of XPath expressions

• Semantics of path expressions in XPath 1.0
  (1) Ordered forests of nodes as input, ordered forests of nodes as output
  (2) For each root node in the input forest, select the nodes in the same document that lie on the given axis
  (3) Among those, select and return the ones that satisfy the node test
  (4) No duplicates are allowed in the output
  (5) Output nodes are ordered by document order
  (6) Nodes preserve their identity
• No type error for $book/nose
• A list of lists is automatically flattened
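
Steps (2) to (6) can be sketched for a single child-axis step with a name test. This is a Python model built on the standard xml.etree.ElementTree module, not an XPath engine; the helper names child_step and document_order and the sample bibliography are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def document_order(root):
    """Map each element (by identity) to its position in document order."""
    return {id(node): pos for pos, node in enumerate(root.iter())}

def child_step(root, input_nodes, name):
    """Evaluate input_nodes/child::name following steps (2)-(6)."""
    order = document_order(root)
    seen = set()
    selected = []
    for node in input_nodes:              # (2) follow the axis from each input node
        for child in node:                #     here: the child axis
            if child.tag != name:         # (3) apply the node test
                continue
            if id(child) in seen:         # (4) no duplicates
                continue
            seen.add(id(child))
            selected.append(child)        # (6) the same node objects are returned
    selected.sort(key=lambda n: order[id(n)])   # (5) document order
    return selected

bib = ET.fromstring(
    "<bib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "</bib>"
)
books = child_step(bib, [bib], "book")
# Feed the books in reverse order, and one of them twice.
titles = child_step(bib, [books[1], books[0], books[1]], "title")
# A name that matches nothing gives an empty result rather than an error.
noses = child_step(bib, books, "nose")
```

The last call mirrors the slide's $book/nose remark: a name that occurs nowhere yields an empty forest instead of a type error.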

Page 31: XML queries and updates

Xpath abbreviated syntax (1)

• The axis can be omitted
  • The child axis is the default
    $x/child::person -> $x/person
• Shorthands for common axes
  • Descendant-or-self
    $x/descendant-or-self::comment() -> $x//comment()
  • Parent
    $x/parent::* -> $x/..
  • Attribute
    $x/attribute::year -> $x/@year
  • Self
    $x/self::* -> $x/.

Page 32: XML queries and updates

Xpath abbreviated syntax (2)

• Implicit root node
  $root/bib -> /bib
  $root -> /
• Implicit current node (inside second-order functions)
  $self/title -> ./title
  $self/title -> title

Page 33: XML queries and updates

Simple iteration expression

• Syntax:
  for variable in expression1 return expression2
• Example:
  for $x in document("bibliography.xml")/bib/book return $x/title
• Semantics:
  • bind the variable to each root node of the forest returned by expression1
  • for each such binding evaluate expression2
  • concatenate the resulting forests
  • lists of lists are automatically flattened
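
The semantics above amounts to a flat-map. The following Python sketch models it with lists and xml.etree.ElementTree; the helper name for_expr and the sample data are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def for_expr(bindings, body):
    """for $x in bindings return body($x): evaluate body once per binding
    and concatenate the resulting sequences into one flat list."""
    result = []
    for x in bindings:
        result.extend(body(x))   # extend, not append: no nested lists
    return result

bib = ET.fromstring(
    "<bib>"
    "<book><title>T1</title></book>"
    "<book><title>T2</title><title>T2 (alt)</title></book>"
    "</bib>"
)
# for $x in /bib/book return $x/title
titles = for_expr(bib.findall("book"), lambda x: x.findall("title"))
title_texts = [t.text for t in titles]
```

The second book contributes two titles, yet the result is a single flat list of three title nodes.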

Page 34: XML queries and updates

Local variable declaration

• Syntax:
  let variable := expression1 return expression2
• Example:
  let $x := document("bibliography.xml")/bib/book return count($x)
• Semantics:
  • bind the variable to the result of expression1
  • add this binding to the current environment
  • evaluate expression2
  • remove the local variable from the environment

Page 35: XML queries and updates

Conditional expressions

• Syntax:
  if ( expression1 ) then expression2 else expression3
• Example:
  if ( $book/@year < 1980 ) then "old book" else "new book"
• Semantics:
  • If expression1 evaluates to true, return the result of evaluating expression2; otherwise return the result of evaluating expression3.

Page 36: XML queries and updates

FLWR expressions

• Syntactic sugar that combines FOR, LET, IF
• Example:
  for $x in //bib/book                            /* like the FROM in SQL */
  let $y := $x/author                             /* no analog in SQL */
  where $x/title="The politics of experience"     /* like the WHERE in SQL */
  return count($y)                                /* like the SELECT in SQL */

  FOR var IN expr
  LET var := expr
  WHERE expr
  RETURN expr

Page 37: XML queries and updates

FLWR expression semantics

• FLWR expression:
  for $x in //bib/book
  let $y := $x/author
  where $x/title="The politics of experience"
  return count($y)
• Semantically equivalent to:
  for $x in //bib/book
  return (let $y := $x/author
          return if ($x/title="The politics of experience")
                 then count($y)
                 else ())
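
The equivalence can be checked on a small example. The Python sketch below writes the query once as a single comprehension (for, let, where, return) and once as the nested for/let/if form; the sample data and the helper names flwr, desugared and titles_match are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BIB = ET.fromstring(
    "<bib>"
    "<book><title>The politics of experience</title><author>R.D. Laing</author></book>"
    "<book><title>Other</title><author>A</author><author>B</author></book>"
    "</bib>"
)
WANTED = "The politics of experience"

def titles_match(book, wanted):
    # Existential comparison: true if any title of the book equals wanted.
    return any(t.text == wanted for t in book.findall("title"))

def flwr(bib):
    """for / let / where / return written as one comprehension."""
    return [
        len(y)                                  # return count($y)
        for x in bib.iter("book")               # for $x in //bib/book
        for y in [x.findall("author")]          # let $y := $x/author
        if titles_match(x, WANTED)              # where ...
    ]

def desugared(bib):
    """The same query as nested for, let and if ... else ()."""
    result = []
    for x in bib.iter("book"):
        y = x.findall("author")
        part = [len(y)] if titles_match(x, WANTED) else []   # else ()
        result.extend(part)                     # () contributes nothing
    return result
```

Both functions return the same list for the sample data, because the empty sequence produced by the else branch disappears when the per-binding results are concatenated.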

Page 38: XML queries and updates

More FLWR expression examples

• Selections
  for $b in document("bib.xml")//book
  where $b/publisher = "Springer Verlag" and $b/@year = "1998"
  return $b/title
• Joins
  for $b in document("bib.xml")//book,
      $p in //publisher
  where $b/publisher = $p/name
  return $b/title | $p/address/title, $p/name

Page 39: XML queries and updates

XPath filter predicates

• Syntax:
  expression1 [ expression2 ]
• [] is an overloaded operator
• Filtering by predicate:
  • //book[./author/firstname = "ronald"]
  • //book[@price < 25]
  • //book[count(author[@gender="female"]) > 0]
• Filtering by position:
  • /book[3]
  • /book[3]/author[1]
  • /book[3]/author[1 to 2]

Page 40: XML queries and updates

Quantified expressions

• Syntax:
  some variable in expression1 satisfies expression2
  every variable in expression1 satisfies expression2
• Examples:
  • some $x in //book satisfies $x/price > 200
  • //book[some $x in author satisfies $x/@gender="female"]
  • for $x in //book where every $y in $x/author satisfies $y/@gender="female" return $x/title
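
The two quantifiers correspond closely to Python's any and all. A minimal sketch, with the helper names some and every and the sample values chosen here for illustration:

```python
def some(items, predicate):
    """some $v in items satisfies predicate($v)"""
    return any(predicate(v) for v in items)

def every(items, predicate):
    """every $v in items satisfies predicate($v)"""
    return all(predicate(v) for v in items)

prices = [120, 250, 80]
has_expensive = some(prices, lambda p: p > 200)
all_expensive = every(prices, lambda p: p > 200)
# Over an empty sequence, every(...) holds and some(...) does not.
every_on_empty = every([], lambda g: g == "female")
some_on_empty = some([], lambda g: g == "female")
```

One consequence worth noting: in the third slide example, a book with no author at all satisfies the every condition, so its title is returned as well.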

Page 41: XML queries and updates

SORTBY expressions

• Syntax:
  expression0 SORTBY ( expression1 [ ASCENDING | DESCENDING ], …, expressionK [ ASCENDING | DESCENDING ] )
• Examples:
  • //book sortby ( @price )
  • //book[@year="2001"]/author sortby (lastname, firstname)
  • for $x in //book where empty($x/author) return $x sortby (title)

Page 42: XML queries and updates

Global document order queries

• Syntax:
  expression1 ( before | after ) expression2
• Examples:
  • //section before //section[title="Persons and experiences"]
  • //paragraph after //section[name="Introduction"] before //paragraph[contains("Xquery")]

Page 43: XML queries and updates

Xquery element constructors

• Standard XML elements:
  <section title="Persons and experiences">
    This is a section of the book entitled <title>The politics of Experience</title>
    written by <author>Ronald Laing</author>.
  </section>
• Dynamically constructed elements:
  <section title={$s/title}>
    This is a section of the book entitled {$s/ancestor::book/title}
    written by { for $a in $s/ancestor::book/author
                 return <author>{concat($a/firstname, $a/lastname)}</author> }.
  </section>

Page 44: XML queries and updates

Complex Xquery example

<bibliography>
  { for $x in //book[@year="2001"]
    return
      <book title={$x/title}>
        { if (empty($x/author))
          then $x/editor/affiliation
          else $x/author }
      </book> }
</bibliography>

Page 45: XML queries and updates

Xquery operators on datatypes

• INSTANCEOF
  • returns True if its first operand is an instance of the type named in its second operand
• CAST
  • is used to convert a value from one datatype to another
• TREAT
  • causes the query processor to treat an expression as though its datatype were a subtype of its static type
• TYPESWITCH
  • branching based on the dynamic type of the input data

Page 46: XML queries and updates

Dealing with node identity

• All nodes in the data model have node identity
• Node identity is preserved through queries
• Two equality functions for nodes:
  • Value based
  • Identity based

Page 47: XML queries and updates

Local function declarations

• Example:
  function number_paragraphs($x ns:section) return xsd:integer
  {
    count($x/paragraph) + sum(for $y in $x/section
                              return number_paragraphs($y))
  }

  number_paragraphs(/bib/book[title="The politics of experience"]/section[1])
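
The recursion in number_paragraphs can be traced with a direct Python transcription. This is a sketch over xml.etree.ElementTree; the nested sample section is made up for illustration.

```python
import xml.etree.ElementTree as ET

def number_paragraphs(section):
    """Count the paragraph children, then recurse into nested sections."""
    own = len(section.findall("paragraph"))
    nested = sum(number_paragraphs(s) for s in section.findall("section"))
    return own + nested

section = ET.fromstring(
    "<section>"
    "<paragraph/><paragraph/>"
    "<section><paragraph/><section><paragraph/></section></section>"
    "</section>"
)
total = number_paragraphs(section)
```

The recursion ends when a section has no nested sections, since the sum over an empty sequence is zero.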

Page 48: XML queries and updates

Joins in XQuery

<books-with-prices>
  { for $a in document("amazon.xml")/book,
        $b in document("bn.xml")/book
    where $b/isbn=$a/isbn
    return
      <book>
        {$a/title}
        <price-amazon>{$a/price}</price-amazon>,
        <price-bn>{$b/price}</price-bn>
      </book> }
</books-with-prices>

Page 49: XML queries and updates

Left-outer joins in XQuery

<books-with-prices>
  { for $a in document("amazon.xml")/book
    return
      <book>
        {$a/title}
        <price-amazon>{$a/price}</price-amazon>,
        { for $b in document("bn.xml")/book
          where $b/isbn=$a/isbn
          return <price-bn>{$b/price}</price-bn> }
      </book> }
</books-with-prices>

Page 50: XML queries and updates

Full-outer joins in Xquery

let $allISBNs := distinct(document("amazon.xml")/book/isbn union document("bn.xml")/book/isbn)
return
  <books-with-prices>
    { for $isbn in $allISBNs
      return
        <book>
          { for $a in document("amazon.xml")/book[isbn=$isbn]
            return <price-amazon>{$a/price}</price-amazon> }
          { for $b in document("bn.xml")/book[isbn=$isbn]
            return <price-bn>{$b/price}</price-bn> }
        </book> }
  </books-with-prices>
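
The full-outer-join pattern (iterate over the union of keys, and let each side contribute a price only when it has one) can be sketched in Python. The dictionaries standing in for the two documents, the ISBN values and the sorted key order are assumptions made so that the example is small and deterministic.

```python
# Hypothetical stand-ins for the two documents: isbn -> price.
amazon = {"111": 25.0, "222": 40.0}
bn = {"222": 38.5, "333": 12.0}

def full_outer_join(left, right):
    """One row per distinct ISBN; a missing side simply contributes nothing."""
    rows = []
    for isbn in sorted(set(left) | set(right)):   # distinct(... union ...)
        row = {"isbn": isbn}
        if isbn in left:                          # inner 'for $a ...' may be empty
            row["price-amazon"] = left[isbn]
        if isbn in right:                         # inner 'for $b ...' may be empty
            row["price-bn"] = right[isbn]
        rows.append(row)
    return rows

rows = full_outer_join(amazon, bn)
```

As in the query, no null marker is needed: a book sold by only one store just lacks the other store's price element.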

Page 51: XML queries and updates

Group-by and Having

• Example:
  for $a in distinct(//book/author/lastname)
  let $books := //book[some $y in author/lastname satisfies $y=$a]
  where count($books)>10
  return <result> {$a/name} {$books[1 to 10]} </result>
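
The grouping pattern (distinct keys, one let per key, a where on the group size) can be modelled in Python. The data layout, the function name authors_with_many_books and the small thresholds used in the call are assumptions for illustration; the slide's query uses 10 for both.

```python
def authors_with_many_books(books, threshold=10, keep=10):
    """books: list of (title, [author last names]) pairs (hypothetical layout)."""
    lastnames = []
    for _, authors in books:                  # distinct(//book/author/lastname)
        for name in authors:
            if name not in lastnames:
                lastnames.append(name)
    result = []
    for a in lastnames:                       # for $a in ...
        matching = [t for t, authors in books if a in authors]   # let $books := ...
        if len(matching) > threshold:         # where count($books) > 10 (the HAVING)
            result.append((a, matching[:keep]))                   # $books[1 to 10]
    return result

books = [("T1", ["Laing"]), ("T2", ["Laing", "Esterson"]), ("T3", ["Laing"])]
groups = authors_with_many_books(books, threshold=2, keep=2)
```

The for clause plays the role of GROUP BY, the let clause collects each group, and the where clause filters whole groups, which is what HAVING does in SQL.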

Page 52: XML queries and updates

Views in Xquery

• Views are supported in XQuery via functions
  • non-parameterized views via functions with no arguments
  • parameterized views via functions with at least one argument
• XQuery supports recursive views
  • unrestricted form of recursion
• Termination is not guaranteed automatically

Page 53: XML queries and updates

Many open issues

• Relationship with XPath
  • E.g. should we preserve the implicit casting operations of XPath 1.0?
• Relationship with XML Schema
  • Bi-directional mapping between the XML Schema concepts and the XQuery type system concepts
  • Schema validation vs. type checking
  • Name-based subtyping vs. structural subtyping
• Human readable (non-XML) syntax for types?
• XQuery functions and operators built-in library
• More sophisticated support for full text search
• … and many more…

Page 54: XML queries and updates

Xquery implementations

• Microsoft
• Software AG
• Kweelt
• Lucent
• Univ. Darmstadt
• HiFive.com
• FatDog.com

Page 55: XML queries and updates

XML query language summary

• Expressive power
  • Major functionality of XML-QL, XQL, SQL, OQL - query the many kinds of data XML contains!
  • Use-case driven approach
• Can be implemented in many environments
  • Traditional databases, XML repositories, XML programming libraries, etc.
  • Queries may combine data from many sources
• Minimalist design
  • Small, easy to understand, clean semantics
  • “A quilt, not a camel”

Page 56: XML queries and updates

Conclusion

• One language replaces DOM+XPath+XSLT
• Expressive, concise, easy to learn
• Implementable, optimizable
• Data integration for multiple sources
• Several current implementations
• Preliminary update proposal
• Future work:
  • Scripting language for XML
  • Workflow language for XML
• For more information about the W3C XML Query Language WG activity please visit W3C XML Query

Page 57: XML queries and updates

Some of XQuery’s debates …

Page 58: XML queries and updates

Procedural difficulties

• Language designed by a committee
  • Hard to avoid the Camel
• Strong interaction with other W3C WGs
• Not too much coordination among the W3C WGs
• No preexisting global vision or architecture (bottom-up design, like the Web itself!)

[Figure: XQuery, XSLT, Schema, DOM]

Page 59: XML queries and updates

Technical argument (1)

• Problem 1: equality is neither transitive nor reflexive
  • $x=3 and $x=4 can evaluate to true
  • $x<2 and $x>4 can also evaluate to true
  • $x=3 and $x!=3 can evaluate to true
  • $x=$y and $y=$z does not imply $x=$z
  • $x=$x can evaluate to false
• Source:
  • Equality (and all the other relational operators) has an implicit existential quantifier in XPath 1.0
• Nasty consequences:
  • high probability of user errors and intense frustration
  • good old query evaluation algorithms don’t work anymore
  • schema evolution is badly handled
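
Each of the surprising cases above follows once a comparison is read as "some pair of items compares true". The Python sketch below models that reading on plain lists of numbers; the function names and sample values are assumptions for illustration, and the sketch ignores XPath's type conversions.

```python
def general_eq(xs, ys):
    """XPath 1.0 style '=': true if some pair of items is equal."""
    return any(x == y for x in xs for y in ys)

def general_ne(xs, ys):
    return any(x != y for x in xs for y in ys)

def general_lt(xs, ys):
    return any(x < y for x in xs for y in ys)

def general_gt(xs, ys):
    return any(x > y for x in xs for y in ys)

x = [3, 4]      # $x bound to a sequence of two values
y = [4, 5]
z = [5, 6]
w = [1, 5]
nothing = []    # a variable bound to the empty sequence

both_3_and_4 = general_eq(x, [3]) and general_eq(x, [4])
below_2_and_above_4 = general_lt(w, [2]) and general_gt(w, [4])
equal_and_not_equal = general_eq(x, [3]) and general_ne(x, [3])
not_transitive = general_eq(x, y) and general_eq(y, z) and not general_eq(x, z)
not_reflexive = not general_eq(nothing, nothing)
```

All five flags come out true, one for each bullet under Problem 1. In particular, an empty sequence is not equal to itself because there is no pair of items to compare.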

Page 60: XML queries and updates

Technical argument (2)

• Problem 2: implicit data conversions
  • from an element to the element content
  • from an attribute to the attribute content
  • from a sequence to a value (by taking the first member)
  • from a typed value to string
  • from a string to a typed value
  • from any typed value to a Boolean (e.g. from a node set to Boolean)
• Examples of bad cases:
  • //book[price] is not the same as //book[price+0]
  • <book>{@price}</book> is not the same as <book>{@price+0}</book>

Page 61: XML queries and updates

Technical argument (3)

• Problem 2: implicit data conversions
• Source:
  • Backward compatibility with XPath 1.0
  • Dealing with the “semi-structured” aspect of the data
  • Trying to avoid static or dynamic errors as much as possible
• Bad consequences:
  • the result of the evaluation of an expression can depend on the context where the expression appears
  • high probability of user errors and intense frustration

Page 62: XML queries and updates

Technical argument (4)

• Problem 3: “/” is not a simple projection
  • (//book sortby @price)/title will be sorted by document order, not by price
• Source:
  • Backwards compatibility with XPath 1.0
• Bad consequences:
  • high probability of user errors and more frustration
  • “/” often requires materialization (for sorting and duplicate elimination)
  • difficult to parallelize and stream
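
The difference between a plain projection and an XPath-style step can be seen on two books. This Python sketch uses xml.etree.ElementTree; the helper step and the sample prices are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book price='30'><title>Expensive</title></book>"
    "<book price='10'><title>Cheap</title></book>"
    "</bib>"
)
order = {id(n): i for i, n in enumerate(bib.iter())}   # document order

def step(nodes, name):
    """nodes/name: collect children, drop duplicates, restore document order."""
    found = {id(c): c for n in nodes for c in n.findall(name)}
    return sorted(found.values(), key=lambda c: order[id(c)])

# //book sortby @price
by_price = sorted(bib.findall("book"), key=lambda b: float(b.get("price")))
# What a user probably expects: titles in price order.
projected = [b.find("title").text for b in by_price]
# What the path step gives: titles back in document order.
stepped = [t.text for t in step(by_price, "title")]
```

The step has to collect every result before it can sort and deduplicate, which is the materialization cost mentioned above.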

Page 63: XML queries and updates

You can help

• Designing such a language is VERY hard!
• Your opinion matters!
• A year from now it will be too late
• Please help review the specifications and send comments to

[email protected]

Page 64: XML queries and updates

XML update language

• Declarative update language
• XML data model tree modification:
  • E.g. node deletion, insertion, replacement
  • Metadata replacement
• Built on top of the XML query language
• Initial proposal from some of the XML Query WG members
• Not an official working draft of the W3C!
• Already supported by some XQuery implementations

Page 65: XML queries and updates

XML update statements

• Simple update statements
  • InsertStatement
  • DeleteStatement
  • RenameStatement
  • ReplaceStatement
  • MoveStatement
• Complex update statements

Page 66: XML queries and updates

INSERT statement

• Syntax:
  insert expression1 ( into | after | before ) expression2
• Examples:
  • insert <publisher>Morgan Kaufmann</publisher>
    after //book[title="The politics of experience"]/title
  • insert <comment>This is a great paragraph!</comment>
    before //book[author/lastname="Laing"]/section[1]/paragraph[2]
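
The effect of the first insert example on the tree can be sketched with xml.etree.ElementTree. This models only the "after" form on element nodes; the helper name insert_after is an assumption, and the update proposal itself may define details (such as handling of several targets) differently.

```python
import xml.etree.ElementTree as ET

def insert_after(root, target, new_node):
    """insert new_node after target: place it as target's next sibling."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[target]
    position = list(parent).index(target)
    parent.insert(position + 1, new_node)

book = ET.fromstring(
    "<book><title>The politics of experience</title>"
    "<author>R.D. Laing</author></book>"
)
publisher = ET.Element("publisher")
publisher.text = "Morgan Kaufmann"
insert_after(book, book.find("title"), publisher)
tags = [child.tag for child in book]
```

After the call the publisher element sits between title and author, which is the position the "after" keyword asks for.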

Page 67: XML queries and updates

DELETE statement

• Syntax:
  delete expression
• Examples:
  • delete //book/@price>100
  • delete //book[1]/section[1 to 3]/comment()
  • delete //comment()
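
A delete can be modelled as detaching each selected node from its parent. The sketch below works on element nodes only, since the default ElementTree parser discards comments; the helper name delete_nodes and the selection of books priced above 100 are assumptions for illustration, not a transcription of the slide's examples.

```python
import xml.etree.ElementTree as ET

def delete_nodes(root, targets):
    """delete expression: detach every selected element from its parent."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in targets:
        parent_of[node].remove(node)

bib = ET.fromstring(
    "<bib>"
    "<book price='150'><title>A</title></book>"
    "<book price='20'><title>B</title></book>"
    "</bib>"
)
# Hypothetical selection: books whose price attribute exceeds 100.
expensive = [b for b in bib.findall("book") if float(b.get("price")) > 100]
delete_nodes(bib, expensive)
remaining = [b.find("title").text for b in bib.findall("book")]
```

The selection is evaluated completely before any node is removed, so the deletion cannot disturb the query that chose the targets.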

Page 68: XML queries and updates

RENAME statement

• Syntax:
  rename expression as expression
• Examples:
  • rename //book as "publication"
  • rename //book/@price as "amazon_price"

Page 69: XML queries and updates

REPLACE statement

• Syntax:
  replace expression1 with expression2
• Examples:
  • replace //book[1]/title with <title>Some new title</title>
  • replace //book[1]/@price/data() with 25.50
  • replace //book[1]/@price/data() with //book[1]/@price/data()+5

Page 70: XML queries and updates

MOVE statement

• Syntax:
  move expression1 ( before | after | into ) expression2
• Examples:
  • move //book[1]/section[1]/paragraph[2] before //book[1]/section[2]/paragraph[1]
  • move //book[1]/@price into //book[1]/publisher
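
The first move example can be read as a detach followed by an insert. A Python sketch over xml.etree.ElementTree, modelling only the "before" form; the helper name move_before and the sample book are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def move_before(root, node, target):
    """move node before target: detach node, then insert it ahead of target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent_of[node].remove(node)
    new_parent = parent_of[target]
    # Look up the index after the removal, in case both share a parent.
    new_parent.insert(list(new_parent).index(target), node)

book = ET.fromstring(
    "<book>"
    "<section><paragraph>s1p1</paragraph><paragraph>s1p2</paragraph></section>"
    "<section><paragraph>s2p1</paragraph></section>"
    "</book>"
)
first, second = book.findall("section")
# move //book[1]/section[1]/paragraph[2] before //book[1]/section[2]/paragraph[1]
move_before(book, first.findall("paragraph")[1], second.find("paragraph"))
layout = [[p.text for p in s.findall("paragraph")] for s in book.findall("section")]
```

The moved paragraph is the same node object before and after the call, so a move keeps node identity, whereas a delete followed by an insert of a copy would not.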

