XML queries and updates
Daniela Florescu
Outline
• Introduction
• XML Data Model and Type System
• XML Queries
• Technical discussions on XQuery design
• XML Updates
• Conclusions
What is XML?
• The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
• Base specifications:
  • XML 1.0, W3C Recommendation Feb '98
  • Namespaces, W3C Recommendation Jan '99
Simple XML Data Example
<book year="1967" xmlns:amz="www.amazon.com">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <amz:ref amz:isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>
The secrets of the XML success
• XML is a data representation format
• XML is universal
• XML is human readable
• XML is machine readable
• XML is international
• XML is platform independent
• XML is vendor independent
• XML is endorsed by the W3C
• XML is not a new technology
• XML is not only a data representation format
XML as a family of technologies
• XML Information Set
• XML Schema
• XML Query
• The Extensible Stylesheet Transformation Language (XSLT)
• XML Forms
• XML Protocol
• XML Encryption
• XML Signature
• Others
• … almost all the pieces needed for a reasonably good Web Services puzzle…
Major application domains for XML
• Data exchange on the Web
  • e.g. Health Level Seven (HL7) http://www.hl7.org/
• Application integration on the Web
  • e.g. ebXML http://www.ebxml.org/
• Document exchange on the Web
  • e.g. Encoded Archival Description Application http://lcweb.loc.gov/ead/
The role of an XML query language
• Why a query language for XML?
  • Preserve the logical/physical data independence
    • The semantics is described in terms of an abstract data model, independent of the physical data storage
  • Declarative programming
    • Such programs should describe the “what”, not the “how”
• Why a native query language?
  • We need to deal with the specificities of XML (hierarchical, ordered, textual, potentially schema-less structure)
XML query languages: state of the art
• Query languages for graph data
  • e.g. GOOD, GraphLog, Clean
• Query languages/scripting languages for the Web
  • e.g. WebSQL, WebOQL, WebL
• Query languages for semi-structured data
  • e.g. MSL, UnQL, StruQL, YATL
XML query languages: state of the art
• Research query languages for XML
  • e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce
• Industry query languages for XML
  • e.g. XQL, OQL extensions to query SGML documents
• W3C standard processing languages for XML
  • e.g. XPath, XSLT
Standard W3C XML Query Language: XQuery
W3C Query Working Group - History
• Sept 1999: WG creation and first F2F
• Currently 30+ W3C member companies
• Twelve F2F meetings and 80+ telecons so far
• Public WDs every three months
http://www.w3.org/XML/Query
W3C Query Working Group - Goal
"The goal of the XML Query WG is to produce:
- an abstract data model for XML documents,
- a set of query operators on that data model,
- a query language based on these query operators"
XML: Many Environments
[Figure: two panels listing the environments DOM, SAX, DBMS, XML, Java, COBOL; the second panel adds XQuery]
W3C XML Query Data Model
W3C XML Query Working Group - Status
• June 2001: new or revised working drafts
  • XML Query Requirements
  • XML Query Use Cases
  • XML Query 1.0 and XPath 2.0 Data Model
  • XML Query 1.0 Formal Semantics
  • XQuery 1.0: An XML Query Language
  • XML Syntax for XQuery 1.0 (XQueryX)
General XML query requirements
• Non-procedural, declarative query language
• XML syntax for query language but also a human readable syntax
• Protocol independent
• Standard error conditions
• Should not preclude updates
XML Query Use Cases
• Use Case Organization
  • Description, DTD/Schema, Input Data, Queries and Results
• Current Use Cases
  • "XMP": Experiences and exemplars
  • "TREE": Queries that preserve hierarchy
  • "SEQ": Queries based on sequences
  • "R": Access to relational data
  • "TEXT": Full-text search
  • "NS": Queries using namespaces
  • "PARTS": Recursive computation
  • "REF": Queries based on references
XML Abstract Data Model
• Common for XPath 2.0 and XQuery 1.0
• A logical model composed of
  • a set of logical entities
  • constructors and accessors for each entity
• Based on the notion of an ordered tree
• XML data cannot be modeled as a “simple” tree
XML Abstract Data Model Entities
• Nodes
    Node = Document | Element | Attribute | Text | Namespaces | PI | Comment
• Simple values (all XML Schema simple types)
    string, boolean, ID, IDREF, decimal, QName, URI, ...
• Sequences
• Errors
• Schema components
Document Nodes: Constructors & Accessors
• Constructor:
    document-node : URI X Sequence[1,*](Element | Text | PI | Comment) -> DocumentNode
• Accessors:
    base-uri : DocumentNode -> URI
    children : DocumentNode -> Sequence[1,*](ElementNode | TextNode | PI | Comment)
    string-value : DocumentNode -> string
Attribute Nodes: Constructors & Accessors
• Constructor:
    attribute-node : QName X string X SchemaComponent -> AttributeNode
• Accessors:
    name : AttributeNode -> QName
    type : AttributeNode -> SchemaComponent
    typed-value : AttributeNode -> Sequence(SimpleValue)
    string-value : AttributeNode -> string
    parent : AttributeNode -> Sequence[0,1](Node)
Sequences: Constructors & Accessors
• Constructors:
    empty-sequence : () -> Sequence
    append : Sequence X Sequence -> Sequence
• Accessors:
    empty : Sequence -> boolean
    head : Sequence -> UnitValue
    tail : Sequence -> Sequence
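The sequence constructors and accessors above can be mimicked with flat Python tuples. This is only an illustrative sketch: the Python names mirror the slide, while the error raised by `head` on an empty sequence is an assumption, since the slide does not say what happens in that case.

```python
from typing import Any, Tuple

# A sequence is modeled as a flat, ordered tuple of unit values.
Sequence = Tuple[Any, ...]


def empty_sequence() -> Sequence:
    return ()


def append(left: Sequence, right: Sequence) -> Sequence:
    # Concatenation never nests: the result is again a flat sequence.
    return left + right


def empty(seq: Sequence) -> bool:
    return len(seq) == 0


def head(seq: Sequence) -> Any:
    if empty(seq):
        # Assumption: the slide does not define head() of an empty sequence.
        raise ValueError("head of an empty sequence")
    return seq[0]


def tail(seq: Sequence) -> Sequence:
    return seq[1:]


s = append(append(empty_sequence(), (1, 2)), (3,))
```

Because `append` concatenates rather than nests, there is no way to build a sequence of sequences, which matches the flattening behaviour mentioned later in the deck.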
XML data model - conclusion
• Complete with respect to XML
• Relatively simple design
  • ordered trees, node-labeled, with node identity
• Semantics of the query language relies on the data model constructors and accessors
• Relationship with the other W3C XML related standards:
  • Clear mapping to/from the XML Infoset
  • Less clear relationship with Document Object Model (DOM)
  • Less clear relationship with the XML Schema and the type system
The XQuery type system
• XQuery’s original design had a powerful type system (based on XDuce)
• The type system can:
  (1) statically detect errors in the queries
  (2) infer the type of the result of valid queries
  (3) statically ensure that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
• Queries on types
• Big debate: XML type system vs. XML Schema
XQuery in a nutshell
• Functional language
  • A query is an expression
  • The result of the query is the result of the evaluation of the expression
  • Expressions are evaluated in a certain environment
• Strongly typed
  • Every expression has a type
• Statically typed
  • The type of the result of an expression can be detected statically
• Formal semantics based on XML Abstract Data Model
• Dual syntax: XML and non XML
• Influenced by: SQL, OQL, XQL, XPath, Quilt
XQuery expressions
• Constants and variables
• expression1 operator expression2
• function(expression1, ..., expressionN)
• XPath expressions (for navigation)
• FLWR expressions (for iteration)
• SORTBY expressions
• Quantified expressions
• Conditional expressions
• Type-related expressions
• XML node constructors (elements, attributes, etc)
• XQuery expressions can be nested with full generality!
First XML queries
• 1+1
• $x
• $x/title
• $x/price+1
• document("www.amazon.com/books.xml")
XQuery functions and operators
• Arithmetic and comparison operators
  • +, -, *, div, =, !=, <, etc
• Logical operators
  • and, not, or
• Collection oriented operators
  • union, intersection, difference, empty, distinct, count, sum, avg, min, max, etc
• Global topological order related operators
  • before, after, unordered
• XML specific functions
  • document, name, string-value, typed-value, etc
• Many open issues regarding the semantics of these operators
XPath expressions
• General syntax:
    expression ‘/’ step
• Two syntaxes: abbreviated or not
• Step in the non-abbreviated syntax:
    axis ‘::’ nodeTest
• Axes control the navigation direction in the tree
  • ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Node test by:
  • Name (e.g. publisher, myNS:publisher, *:publisher, myNS:*, *:*)
  • Type (e.g. node(), comment(), text())
Examples of path expressions
• document("bibliography.xml")/child::bib
• $x/child::bib/child::book/attribute::year
• $x/parent::*
• $x/ancestor::*/descendant::comment()
Semantics of XPath expressions
• Semantics of path expressions in XPath 1.0:
  (1) Ordered forests of nodes as input, ordered forests of nodes as output
  (2) For each root node in the input forest, select the nodes in the same document that are reachable along the given axis
  (3) Among those, select and return the ones that satisfy the node test
  (4) No duplicates are allowed in the output
  (5) Output nodes are ordered by document order
  (6) Nodes preserve their identity
• No type error for $book/nose
• A list of lists is automatically flattened
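Steps (2) to (6) above can be sketched in Python on top of `xml.etree.ElementTree`. This is an illustration under stated assumptions, not how any XPath engine is implemented: the `step` function, the `AXES` table (only `child` and `descendant`) and the sample document are hypothetical, and document order is approximated by the position of each element in a pre-order traversal.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib><book><title>A</title><section><title>B</title></section></book>"
    "<book><title>C</title></book></bib>"
)

# Document order: position of every element in a pre-order traversal.
order = {id(node): i for i, node in enumerate(doc.iter())}

# Only two axes are modeled in this sketch.
AXES = {
    "child": lambda n: list(n),
    "descendant": lambda n: [d for d in n.iter() if d is not n],
}


def step(input_nodes, axis, name):
    selected = {}
    for node in input_nodes:                     # (2) follow the axis from each input node
        for candidate in AXES[axis](node):
            if name == "*" or candidate.tag == name:   # (3) apply the node test
                selected[id(candidate)] = candidate    # (4) drop duplicates by identity
    # (5) return the survivors in document order; (6) they are the original nodes
    return sorted(selected.values(), key=lambda n: order[id(n)])


books = step([doc], "child", "book")
# The input deliberately reaches every title twice (via the books and via the root).
titles = step(books + [doc], "descendant", "title")
missing = step(books, "child", "nose")           # no such child: empty result, no error
```

Keying the intermediate dictionary on `id()` is what makes duplicate elimination identity-based rather than value-based.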
XPath abbreviated syntax (1)
• Axis can be missing
  • By default the child axis: $x/child::person -> $x/person
• Shorthands for common axes
  • Descendant-or-self: $x/descendant-or-self::node()/child::comment() -> $x//comment()
  • Parent: $x/parent::* -> $x/..
  • Attribute: $x/attribute::year -> $x/@year
  • Self: $x/self::* -> $x/.
XPath abbreviated syntax (2)
• Implicit root node
    $root/bib -> /bib
    $root -> /
• Implicit current node (inside second-order functions)
    $self/title -> ./title
    $self/title -> title
Simple iteration expression
• Syntax: for variable in expression1 return expression2
• Example: for $x in document("bibliography.xml")/bib/book return $x/title
• Semantics:
  • bind the variable to each root node of the forest returned by expression1
  • for each such binding evaluate expression2
  • concatenate the resulting forests
  • lists of lists are automatically flattened
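The iteration semantics above amount to a "bind, evaluate, concatenate" loop. A minimal Python sketch follows; the helper name `for_each` and the dictionary-based sample data are hypothetical stand-ins for XML nodes.

```python
def for_each(bindings, body):
    """Evaluate `body` once per binding and concatenate the results."""
    result = []
    for item in bindings:            # bind the variable to each item in turn
        result.extend(body(item))    # extend (not append) keeps the result flat
    return result


# Stand-in data: each "book" maps a child name to the list of matching children.
books = [
    {"title": ["T1"]},
    {"title": ["T2", "T2 (second title)"]},
]

# for $x in .../book return $x/title
titles = for_each(books, lambda x: x["title"])
```

Using `extend` rather than `append` is the whole point: each evaluation of the body returns a list, and the final result is one flat list rather than a list of lists.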
Local variable declaration
• Syntax: let variable := expression1 return expression2
• Example: let $x := document("bibliography.xml")/bib/book return count($x)
• Semantics:
  • bind the variable to the result of expression1
  • add this binding to the current environment
  • evaluate expression2
  • remove the local variable from the environment
Conditional expressions
• Syntax: if ( expression1 ) then expression2 else expression3
• Example: if ( $book/@year < 1980 ) then "old book" else "new book"
• Semantics:
  • If expression1 evaluates to true, return the result of evaluating expression2; otherwise return the result of evaluating expression3.
FLWR expressions
• Syntactic sugar that combines FOR, LET, IF
• Syntax:
    FOR var IN expr
    LET var := expr
    WHERE expr
    RETURN expr
• Example:
    for $x in //bib/book                              /* like the FROM in SQL */
    let $y := $x/author                               /* no analog in SQL */
    where $x/title = "The politics of experience"     /* like the WHERE in SQL */
    return count($y)                                  /* like the SELECT in SQL */
FLWR expression semantics
• FLWR expression:
    for $x in //bib/book
    let $y := $x/author
    where $x/title = "The politics of experience"
    return count($y)
• Semantically equivalent to:
    for $x in //bib/book
    return (let $y := $x/author
            return if ($x/title = "The politics of experience")
                   then count($y)
                   else ())
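The equivalence above can be checked on a small example. The sketch below writes both forms in Python over a hypothetical two-book document; `flwr` follows the for/let/where/return shape and `desugared` follows the nested for/let/if shape, where the empty list plays the role of `()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the root element plays the role of <bib>.
bib = ET.fromstring(
    "<bib>"
    "<book><title>The politics of experience</title><author>R.D. Laing</author></book>"
    "<book><title>Another book</title><author>A</author><author>B</author></book>"
    "</bib>"
)

WANTED = "The politics of experience"


def title_matches(book):
    # $x/title = "..." is true if any title child has that value.
    return any(t.text == WANTED for t in book.findall("title"))


def flwr(root):
    out = []
    for x in root.findall("book"):          # for $x in //bib/book
        y = x.findall("author")             # let $y := $x/author
        if title_matches(x):                # where ...
            out.append(len(y))              # return count($y)
    return out


def desugared(root):
    out = []
    for x in root.findall("book"):          # for $x in //bib/book return (
        y = x.findall("author")             #   let $y := $x/author return
        out.extend([len(y)] if title_matches(x) else [])   # if ... then count($y) else ()
    return out
```

Both functions return the same list because concatenating an empty result contributes nothing, which is exactly why `where` can be rewritten as `if ... else ()`.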
More FLWR expression examples
• Selections
    for $b in document("bib.xml")//book
    where $b/publisher = "Springer Verlag" and $b/@year = "1998"
    return $b/title
• Joins
    for $b in document("bib.xml")//book,
        $p in //publisher
    where $b/publisher = $p/name
    return $b/title | $p/address/title, $p/name
XPath filter predicates
• Syntax:
    expression1 [ expression2 ]
• [] is an overloaded operator
• Filtering by predicate:
  • //book[./author/firstname = "ronald"]
  • //book[@price < 25]
  • //book[count(author[@gender = "female"]) > 0]
• Filtering by position:
  • /book[3]
  • /book[3]/author[1]
  • /book[3]/author[1 to 2]
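For comparison, here is how the same two kinds of filtering look with Python's `xml.etree.ElementTree`, which supports only a small XPath subset. Attribute-equality and positional predicates are passed to `findall` directly; the comparisons that subset does not cover (a path inside a predicate, a numeric `<`) are written as list comprehensions. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    '<book price="20"><author gender="male"><firstname>ronald</firstname></author></book>'
    '<book price="30"><author gender="female"><firstname>ann</firstname></author></book>'
    '<book price="10"><author gender="female"><firstname>eve</firstname></author>'
    '<author gender="male"><firstname>bob</firstname></author></book>'
    "</bib>"
)
books = bib.findall("book")

# Filtering by predicate
# //book[./author/firstname = "ronald"]
by_firstname = [b for b in books
                if any(f.text == "ronald" for f in b.findall("author/firstname"))]
# //book[@price < 25]
cheap = [b for b in books if float(b.get("price")) < 25]
# //book[count(author[@gender = "female"]) > 0]
with_female = [b for b in books if len(b.findall("author[@gender='female']")) > 0]

# Filtering by position (1-based, as in XPath)
third = bib.findall("book[3]")
third_first_author = bib.findall("book[3]/author[1]")
# author[1 to 2] becomes a 0-based Python slice
third_first_two = books[2].findall("author")[0:2]
```

The overloading of `[]` is visible here: a predicate on content turns into a boolean filter, while a numeric predicate turns into an index or a slice.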
Quantified expressions
• Syntax:
    some variable in expression1 satisfies expression2
    every variable in expression1 satisfies expression2
• Examples:
  • some $x in //book satisfies $x/price > 200
  • //book[some $x in author satisfies $x/@gender = "female"]
  • for $x in //book where every $y in $x/author satisfies $y/@gender = "female" return $x/title
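The two quantifiers correspond closely to Python's `any` and `all`. The sketch below restates the three examples over hypothetical dictionary data standing in for the book nodes.

```python
# Hypothetical stand-in data for //book.
books = [
    {"title": "T1", "price": 250, "authors": [{"gender": "female"}]},
    {"title": "T2", "price": 20, "authors": [{"gender": "female"}, {"gender": "male"}]},
    {"title": "T3", "price": 30, "authors": []},
]

# some $x in //book satisfies $x/price > 200
any_expensive = any(b["price"] > 200 for b in books)

# //book[some $x in author satisfies $x/@gender = "female"]
with_female_author = [b["title"] for b in books
                      if any(a["gender"] == "female" for a in b["authors"])]

# for $x in //book
# where every $y in $x/author satisfies $y/@gender = "female"
# return $x/title
all_female_authors = [b["title"] for b in books
                      if all(a["gender"] == "female" for a in b["authors"])]
```

Note that `every` over an empty sequence holds vacuously, so the book without authors ("T3") passes the third query, just as `all([])` is `True` in Python.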
SORTBY expressions
• Syntax:
    expression0 SORTBY ( expression1 [ ASCENDING | DESCENDING ], …, expressionK [ ASCENDING | DESCENDING ] )
• Examples:
  • //book sortby ( @price )
  • //book[@year = "2001"]/author sortby (lastname, firstname)
  • for $x in //book where empty($x/author) return $x sortby (title)
Global document order queries
• Syntax: expression1 ( before | after ) expression2
• Examples:
  • //section before //section[title = "Persons and experiences"]
  • //paragraph after //section[name = "Introduction"] before //paragraph[contains("Xquery")]
XQuery element constructors
• Standard XML elements:
    <section title="Persons and experiences">
      This is a section of the book entitled <title>The politics of Experience</title>
      written by <author>Ronald Laing</author>.
    </section>
• Dynamically constructed elements:
    <section title={$s/title}>
      This is a section of the book entitled {$s/ancestor::book/title}
      written by { for $a in $s/ancestor::book/author
                   return <author>{concat($a/firstname, $a/lastname)}</author> }.
    </section>
Complex XQuery example
<bibliography>
  { for $x in //book[@year = "2001"]
    return
      <book title={$x/title}>
        { if (empty($x/author))
          then $x/editor/affiliation
          else $x/author }
      </book> }
</bibliography>
XQuery operators on datatypes
• INSTANCEOF
  • returns true if its first operand is an instance of the type named in its second operand
• CAST
  • is used to convert a value from one datatype to another
• TREAT
  • causes the query processor to treat an expression as though its datatype were a subtype of its static type
• TYPESWITCH
  • branching based on the dynamic type of the input data
Dealing with node identity
• All nodes in the data model have node identity
• Node identity is preserved through queries
• Two equality functions for nodes:
  • Value based
  • Identity based
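The difference between the two equality functions can be shown with two separately parsed but identical elements. In this sketch, value equality is approximated by comparing serialized forms, which is an assumption made for illustration; the data model defines value comparison more carefully than a byte-for-byte match.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<title>The politics of experience</title>")
b = ET.fromstring("<title>The politics of experience</title>")


def value_equal(x, y):
    # Assumption: "same value" is approximated by "same serialization".
    return ET.tostring(x) == ET.tostring(y)


def identity_equal(x, y):
    # Same node, not merely an equal-looking one.
    return x is y
```

Here `a` and `b` are equal by value but are distinct nodes, whereas `a` compared with itself is equal under both functions.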
Local function declarations
• Example:
    function number_paragraphs($x ns:section) return xsd:integer
    {
      count($x/paragraph) + sum(for $y in $x/section
                                return number_paragraphs($y))
    }

    number_paragraphs(/bib/book[title = "The politics of experience"]/section[1])
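The recursive function above translates almost line for line into Python. The sketch below assumes, as the query does, that sections nest directly inside sections; the sample section is hypothetical.

```python
import xml.etree.ElementTree as ET


def number_paragraphs(section):
    # count($x/paragraph) + sum(for $y in $x/section return number_paragraphs($y))
    own = len(section.findall("paragraph"))
    nested = sum(number_paragraphs(sub) for sub in section.findall("section"))
    return own + nested


section = ET.fromstring(
    "<section>"
    "<paragraph/>"
    "<section><paragraph/><paragraph/>"
    "<section><paragraph/></section>"
    "</section>"
    "</section>"
)
total = number_paragraphs(section)
```

The recursion terminates because every call descends one level in a finite tree; the deck makes the point later that termination is not guaranteed for recursive functions in general.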
Joins in XQuery
<books-with-prices>
  { for $a in document("amazon.xml")/book,
        $b in document("bn.xml")/book
    where $b/isbn = $a/isbn
    return
      <book>
        {$a/title}
        <price-amazon>{$a/price}</price-amazon>,
        <price-bn>{$b/price}</price-bn>
      </book> }
</books-with-prices>
Left-outer joins in XQuery
<books-with-prices>
  { for $a in document("amazon.xml")/book
    return
      <book>
        {$a/title}
        <price-amazon>{$a/price}</price-amazon>,
        { for $b in document("bn.xml")/book
          where $b/isbn = $a/isbn
          return <price-bn>{$b/price}</price-bn> }
      </book> }
</books-with-prices>
Full-outer joins in XQuery
let $allISBNs := distinct(document("amazon.xml")/book/isbn union document("bn.xml")/book/isbn)
return
  <books-with-prices>
    { for $isbn in $allISBNs
      return
        <book>
          { for $a in document("amazon.xml")/book[isbn = $isbn]
            return <price-amazon>{$a/price}</price-amazon> }
          { for $b in document("bn.xml")/book[isbn = $isbn]
            return <price-bn>{$b/price}</price-bn> }
        </book> }
  </books-with-prices>
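The three join shapes (inner, left-outer, full-outer) can be compared side by side on a small example. The Python sketch below is illustrative only: the two catalogues are hypothetical, wrapped in a `<books>` root, and it assumes at most one book per ISBN in each document so that a dictionary lookup can stand in for the inner `for` loop.

```python
import xml.etree.ElementTree as ET

amazon = ET.fromstring(
    "<books>"
    "<book><isbn>1</isbn><title>A</title><price>10</price></book>"
    "<book><isbn>2</isbn><title>B</title><price>20</price></book>"
    "</books>"
)
bn = ET.fromstring(
    "<books>"
    "<book><isbn>2</isbn><title>B</title><price>21</price></book>"
    "<book><isbn>3</isbn><title>C</title><price>30</price></book>"
    "</books>"
)


def price_by_isbn(doc):
    # Assumption: ISBNs are unique within one document.
    return {b.findtext("isbn"): b.findtext("price") for b in doc.findall("book")}


a_prices = price_by_isbn(amazon)
b_prices = price_by_isbn(bn)

# Inner join: only ISBNs present on both sides.
inner = [(i, a_prices[i], b_prices[i]) for i in a_prices if i in b_prices]

# Left-outer join: every Amazon book, with a B&N price when there is one.
left_outer = [(i, a_prices[i], b_prices.get(i)) for i in a_prices]

# Full-outer join: iterate over the union of ISBNs, as $allISBNs does.
all_isbns = sorted(set(a_prices) | set(b_prices))
full_outer = [(i, a_prices.get(i), b_prices.get(i)) for i in all_isbns]
```

A missing side shows up as `None` here; in the XQuery versions it shows up as an empty sequence, so the corresponding price element is simply absent from the result.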
Group-by and Having
• Example:
    for $a in distinct(//book/author/lastname)
    let $books := //book[some $y in author/lastname satisfies $y = $a]
    where count($books) > 10
    return <result> {$a} {$books[1 to 10]} </result>
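The query above follows the usual pattern: iterate over the distinct grouping keys, collect each group with `let`, and filter groups with `where` (the HAVING part). A rough Python equivalent is sketched below; the data, the function name `group_having` and the small thresholds are hypothetical, chosen so the example stays readable.

```python
from collections import defaultdict

# Hypothetical stand-in data for //book.
books = [
    {"title": "T1", "authors": ["Laing"]},
    {"title": "T2", "authors": ["Laing", "Esterson"]},
    {"title": "T3", "authors": ["Esterson"]},
    {"title": "T4", "authors": ["Laing"]},
]


def group_having(all_books, min_count, keep):
    groups = defaultdict(list)
    for book in all_books:
        # dict.fromkeys removes a repeated name within one book, keeping order.
        for lastname in dict.fromkeys(book["authors"]):
            groups[lastname].append(book["title"])      # GROUP BY lastname
    return {
        lastname: titles[:keep]                         # $books[1 to keep]
        for lastname, titles in groups.items()
        if len(titles) > min_count                      # HAVING count($books) > min_count
    }


result = group_having(books, min_count=2, keep=2)
```

With the slide's numbers the call would be `group_having(books, min_count=10, keep=10)`.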
Views in XQuery
• Views are supported in XQuery via functions
  • non-parameterized views via functions with no arguments
  • parameterized views via functions with at least one argument
• XQuery supports recursive views
  • unrestricted form of recursion
• Termination is not guaranteed automatically
Many open issues
• Relationship with XPath
  • E.g. should we preserve the implicit casting operations of XPath 1.0?
• Relationship with XML Schema
  • Bi-directional mapping between the XML Schema concepts and the XQuery type system concepts
  • Schema validation vs. type checking
  • Name-based subtyping vs. structural subtyping
• Human readable (non XML) syntax for types?
• XQuery functions and operators built-in library
• More sophisticated support for full text search
• … and many more…
XQuery implementations
• Microsoft
• Software AG
• Kweelt
• Lucent
• Univ. Darmstadt
• HiFive.com
• FatDog.com
XML query language summary
• Expressive power
  • Major functionality of XML-QL, XQL, SQL, OQL - query the many kinds of data XML contains!
  • Use-case driven approach
• Can be implemented in many environments
  • Traditional databases, XML repositories, XML programming libraries, etc.
  • Queries may combine data from many sources
• Minimalist design
  • Small, easy to understand, clean semantics
  • “A quilt, not a camel”
Conclusion
• One language replaces DOM+XPath+XSLT
• Expressive, concise, easy to learn
• Implementable, optimizable
• Data integration for multiple sources
• Several current implementations
• Preliminary update proposal
• Future work:
  • Scripting language for XML
  • Workflow language for XML
• For more information about the W3C XML Query Language WG activity please visit W3C XML Query
Some of XQuery’s debates …
Procedural difficulties
• Language designed by a committee
  • Hard to avoid the Camel
• Strong interaction with other W3C WGs
• Not too much coordination among the W3C WGs
• No preexisting global vision or architecture (bottom-up design, like the Web itself!)
[Figure: XQuery, XSLT, Schema, DOM]
Technical argument (1)
• Problem 1: equality is neither transitive nor reflexive
  • $x=3 and $x=4 can evaluate to true
  • $x<2 and $x>4 can also evaluate to true
  • $x=3 and $x!=3 can evaluate to true
  • $x=$y and $y=$z does not imply $x=$z
  • $x=$x can evaluate to false
• Source:
  • Equality (and all the other relational operators) has an implicit existential quantifier in XPath 1.0
• Nasty consequences:
  • high probability of user errors and intense frustration
  • good old query evaluation algorithms don’t work anymore
  • schema evolution is badly handled
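Each of the five surprises above follows directly from the implicit existential quantifier. The Python sketch below models a value as a list of items and a comparison as "there exists a pair that satisfies it"; the function names are hypothetical, but the rule they encode is the XPath 1.0 node-set comparison rule named as the source of the problem.

```python
import operator


def general(op):
    """Lift an item comparison to sequences: true if ANY pair of items satisfies it."""
    return lambda left, right: any(op(l, r) for l in left for r in right)


eq = general(operator.eq)
ne = general(operator.ne)
lt = general(operator.lt)
gt = general(operator.gt)

# $x=3 and $x=4 can both be true
x = [3, 4]
both_equal = eq(x, [3]) and eq(x, [4])

# $x<2 and $x>4 can both be true
x_range = [1, 5]
below_and_above = lt(x_range, [2]) and gt(x_range, [4])

# $x=3 and $x!=3 can both be true
equal_and_unequal = eq(x, [3]) and ne(x, [3])

# $x=$y and $y=$z does not imply $x=$z
xs, ys, zs = [1], [1, 2], [2]
transitivity_fails = eq(xs, ys) and eq(ys, zs) and not eq(xs, zs)

# $x=$x is false for the empty sequence
reflexivity_fails = not eq([], [])
```

Every flag above evaluates to true, which is why an optimizer cannot, for example, rewrite `$x = $y and $y = $z` into a chain that also assumes `$x = $z`.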
Technical argument (2)
• Problem 2: implicit data conversions
  • from an element to the element content
  • from an attribute to the attribute content
  • from a sequence to a value (by taking the first member)
  • from a typed value to string
  • from a string to a typed value
  • from any typed value to a Boolean (e.g. from a node set to Boolean)
• Examples of bad cases:
  • //book[price] is not the same as //book[price+0]
  • <book>{@price}</book> is not the same as <book>{@price+0}</book>
Technical argument (3)
• Problem 2: implicit data conversions (continued)
• Source:
  • Backward compatibility with XPath 1.0
  • Dealing with the “semi-structured” aspect of the data
  • Trying to avoid static or dynamic errors as much as possible
• Bad consequences:
  • the result of the evaluation of an expression can depend on the context where the expression appears
  • high probability of user errors and intense frustration
Technical argument (4)
• Problem 3: “/” is not a simple projection
  • (//book sortby @price)/title will be sorted by document order, not by price
• Source:
  • Backwards compatibility with XPath 1.0
• Bad consequences:
  • high probability of user errors and more frustration
  • “/” often requires materialization (for sorting and duplicate elimination)
  • difficult to parallelize and stream
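The sorting surprise can be reproduced in a few lines. In this Python sketch the hypothetical helper `slash` applies the rule that a path step returns its result without duplicates and in document order, while `projection` is what a user coming from SQL would expect "/" to do. Document order is approximated by pre-order position in the parsed tree.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<bib>"
    '<book price="30"><title>A</title></book>'
    '<book price="10"><title>B</title></book>'
    '<book price="20"><title>C</title></book>'
    "</bib>"
)
order = {id(node): i for i, node in enumerate(doc.iter())}

# //book sortby @price
by_price = sorted(doc.findall("book"), key=lambda b: float(b.get("price")))


def slash(nodes, name):
    # A path step: collect matches, drop duplicates by identity,
    # then materialize and re-sort them in document order.
    found = {id(c): c for n in nodes for c in n.findall(name)}
    return sorted(found.values(), key=lambda c: order[id(c)])


# (//book sortby @price)/title under path-step semantics
path_result = [t.text for t in slash(by_price, "title")]

# What a simple per-item projection would have produced
projection = [b.findtext("title") for b in by_price]
```

The final `sorted` call in `slash` is also where the materialization cost comes from: nothing can be emitted until all matches have been collected.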
You can help
• Designing such a language is VERY hard!
• Your opinion matters!
• A year from now it will be too late
• Please help review the specifications and send comments to
XML update language
• Declarative update language
• XML data model tree modification:
  • E.g. node deletion, insertion, replacement
  • Metadata replacement
• Built on top of the XML query language
• Initial proposal from some of the XML Query WG members
• Not an official working draft of the W3C!
• Already supported by some XQuery implementations
XML update statements
• Simple update statements
  • InsertStatement
  • DeleteStatement
  • RenameStatement
  • ReplaceStatement
  • MoveStatement
• Complex update statements
INSERT statement
• Syntax: insert expression1 ( into | after | before ) expression2
• Examples:
  • insert <publisher>Morgan Kaufmann</publisher> after //book[title = "The politics of experience"]/title
  • insert <comment>This is a great paragraph!</comment> before //book[author/lastname = "Laing"]/section[1]/paragraph[2]
DELETE statement
• Syntax: delete expression
• Examples:
  • delete //book/@price>100
  • delete //book[1]/section[1 to 3]/comment()
  • delete //comment()
RENAME statement
• Syntax: rename expression as expression
• Examples:
  • rename //book as "publication"
  • rename //book/@price as "amazon_price"
REPLACE statement
• Syntax: replace expression1 with expression2
• Examples:
  • replace //book[1]/title with <title>Some new title</title>
  • replace //book[1]/@price/data() with 25.50
  • replace //book[1]/@price/data() with //book[1]/@price/data()+5
MOVE statement
• Syntax: move expression1 ( before | after | into ) expression2
• Examples:
  • move //book[1]/section[1]/paragraph[2] before //book[1]/section[2]/paragraph[1]
  • move //book[1]/@price into //book[1]/publisher
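To make the five simple update statements concrete, the sketch below performs one update of each kind on a small in-memory tree using Python's `xml.etree.ElementTree`. It is an imperative approximation of the declarative statements, not an implementation of the proposal: the sample document is hypothetical, and because ElementTree nodes have no parent pointer, each update is carried out through the parent element.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="20">'
    "<title>Old title</title>"
    "<section><paragraph>p1</paragraph><paragraph>p2</paragraph></section>"
    "<section><paragraph>p3</paragraph></section>"
    "</book>"
)

# insert <publisher>Morgan Kaufmann</publisher> after the title
title = book.find("title")
publisher = ET.Element("publisher")
publisher.text = "Morgan Kaufmann"
book.insert(list(book).index(title) + 1, publisher)

# replace the title with <title>Some new title</title>
new_title = ET.Element("title")
new_title.text = "Some new title"
book[list(book).index(title)] = new_title

# rename the element as "publication" and @price as "amazon_price"
book.tag = "publication"
book.set("amazon_price", book.attrib.pop("price"))

# replace the attribute's data with its old value + 5
book.set("amazon_price", str(float(book.get("amazon_price")) + 5))

# move section[1]/paragraph[2] before section[2]/paragraph[1]
first, second = book.findall("section")
moved = first.findall("paragraph")[1]
first.remove(moved)
second.insert(0, moved)

# delete section[1]/paragraph[1]
first.remove(first.find("paragraph"))
```

A move is written as a remove followed by an insert of the same object, so the moved paragraph keeps its identity, in line with the data model's notion of node identity.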