ICDE’2001, Heidelberg, Germany. D. Florescu, J. Siméon
XML Data: From Research to Standards
Daniela Florescu, Propel
Jérôme Siméon, Bell Laboratories

Data and the Web: A bit of history
• Research:
> 1950’s: Lisp [McCarthy]
> 1960’s: Tree languages [Büchi]
> 1970’s: Relational DBs [Codd]
> 1990: Graphlog [Univ. Toronto]
> 1994: O2 extensions [INRIA]
> 1995: Tsimmis & OEM [Stanford]
> 1995: UnQL [UPenn]
Need to handle irregular Web data. Use graph data models.
• Internet industry:
> 1957: Sputnik launches ARPA
> 1972: First demonstration of ARPANET
> 1989: Number of hosts breaks 100,000
> 1991: CERN releases the World Wide Web
HTML as the support for information.
> 1997: 20 million hosts, 1 million Web sites
> 1998: W3C releases XML to represent information on the Web
XML provides a syntax for irregular textual Web information.

The secret of HTML success
• Everybody can write it:
> HTML is simple
> HTML is textual: it is human readable, you can use any editor, ...
• Everybody can read it:
> HTML is portable on any platform
> The browser is the universal application
• It connects pieces of information together:
> Through hypertext links

But new applications = new needs
• Infomediaries:
– Search engines
– Web portals
– Digital libraries
– Virtual enterprises
• Electronic services:
– On-line catalogs and procurement
– Comparison shoppers
– Market places
• Scientific applications
• Manufacturing engineering
• etc.
More than HTML: data on the Web
More than the browser: applications on the Web

The Secret of XML Popularity
It looks like HTML...
> Simple, familiar, easy to learn, human-readable
> Universal and portable
> Supported by the W3C: trusted and quickly adopted by the industry
…but it’s more than HTML!
> Flexible: you can represent any information
> Extensible: you can represent it the way you want!
<book>
  <title> Foundations… </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <publisher> Addison Wesley </publisher>
  <year> 1995 </year>
</book>
…

XML Is Only the Beginning...
• How do you build applications?
> There is an urgent need for XML tools
• Designing XML tools is a data management problem:
> XML 1.0 to describe structured documents
~ Syntax for trees
> XML data models to describe the information content
~ Data model for trees
> XML schemas to describe the structure of information
~ Data definition language for trees
> XML languages to describe information processing
~ Data manipulation language for trees

About the Tutorial
• XML through database glasses
• Contains:
> Up-to-date information about standards
> Relationship with research
> Convergences and divergences
• Divided into 4 parts:
1. Introduction to XML 1.0
2. Data models
3. Schema languages
4. Query languages
Please, please, please, ask questions!

Part I: XML 1.0

About the W3C
• Membership organization
• Different types of groups inside the W3C:
– Working groups
– Interest groups
– Coordination groups
• Status of W3C documents:
– Note
– Working draft
– Last Call
– Candidate/proposed recommendation
– Recommendation ~ Standard

XML activities inside W3C
• Core XML
> eXtensible Markup Language (XML 1.0), namespaces, Infoset
• XML Linking
> XML Pointer Language (XPointer), XML Linking Language
• XML Schema
• XML Query
> XML Data Model, Algebra and Query Language
• Document Object Model
• XSL
> XPath
> XSLT/XSL: Transformation and stylesheet language

XML 1.0: Well-formed documents
<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <ref isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>
• An XML document is composed of:
> markup: elements, attributes
> text: #PCDATA, CDATA
• Well-formed document:
> verifies XML lexical conventions
> contains properly nested elements with a single root element
> can contain empty elements, mixed text and elements
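The well-formedness rules above can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are made up for illustration and are not part of the tutorial.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Empty element (<ref/>) and mixed text/elements (<section>) are both allowed.
GOOD = (
    '<book year="1967"><title>The politics of experience</title>'
    '<ref isbn="1341-1444-555"/>'
    "<section>Some text <title>Persons and experience</title> more text</section>"
    "</book>"
)
BAD_NESTING = "<book><title>The politics of experience</book></title>"
TWO_ROOTS = "<book/><book/>"

for sample in (GOOD, BAD_NESTING, TWO_ROOTS):
    print(is_well_formed(sample))
```

Note that this only checks well-formedness; it says nothing about validity against a DTD.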

XML 1.0: Valid documents
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, author*, publisher?, section+)>
  <!ATTLIST book year CDATA #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT section (#PCDATA | title | section)*>
]>
...
• A valid XML document verifies a Document Type Definition (DTD):
> grammar for the document
> constraints on the structure of elements, attributes, entities, notations...
> a DTD is optional
(We will see more about DTDs in the schema part of the tutorial)

Some additional features
• General entities &myentity;
> Declared as part of XML 1.0 or in a DTD
> Used to escape characters, as macros for pieces of documents
&amp; = &
> An XML document contains Unicode characters
&lt; = &#60; = <
• Parameter entities %myentity;
> Declared in a DTD, used as macros for pieces of DTDs
<!ENTITY % macro "publisher (#PCDATA)">
…
<!ELEMENT %macro;>
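As a small illustration of general entities and character references, the sketch below parses them with Python's standard `xml.etree.ElementTree`. The entity name `pub` and its replacement text are invented for the example; parameter entities are not shown because this parser does not validate against DTDs.

```python
import xml.etree.ElementTree as ET

# '<' written three ways (predefined entity, decimal and hexadecimal
# character reference), followed by an escaped ampersand.
escaped = ET.fromstring("<t>&lt; &#60; &#x3C; &amp;</t>").text

# A general entity declared in the internal DTD subset, used as a macro.
# The name "pub" and its value are assumptions made for this example.
with_macro = ET.fromstring(
    '<!DOCTYPE book [<!ENTITY pub "Addison Wesley">]>'
    "<book><publisher>&pub;</publisher></book>"
)
publisher = with_macro.find("publisher").text

print(escaped)
print(publisher)
```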

Even more additional features
• Namespaces mynames:name
> a set of names identified by a URI
> tags and attribute names become qualified names (QName)
• Processing instructions
> to embed processing in a document (e.g. Java applet in HTML)
• Comments
<myns:section xmlns:myns="http://caravel.inria.fr/mySchema">
  <myns:title>Persons and experience</myns:title>
</myns:section>
<!-- This is a comment -->
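To see that a qualified name is really the pair (namespace URI, local name) and that the prefix is only a shorthand, here is a short sketch with Python's standard `xml.etree.ElementTree`. The second document and the prefixes `x` and `m` are made up for the comparison.

```python
import xml.etree.ElementTree as ET

NS = "http://caravel.inria.fr/mySchema"

root = ET.fromstring(
    f'<myns:section xmlns:myns="{NS}">'
    "<myns:title>Persons and experience</myns:title>"
    "</myns:section>"
)
# Same URI bound to a different prefix: the qualified name is unchanged.
other = ET.fromstring(f'<x:section xmlns:x="{NS}"/>')

# ElementTree exposes QNames in "{uri}local" form.
print(root.tag)
print(root.tag == other.tag)

# Lookups take their own prefix-to-URI mapping, independent of the document's.
title = root.find("m:title", {"m": NS})
print(title.text)
```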

Part II: Data Model

Why a data model for XML?
• As a support for physical/logical independence
> XML can be stored in files, a native XML repository, a relational database
> XML can be virtual, as a view of a repository, integrated sources
> XML can be in memory, using data structures in C, C++, Java, etc.
> XML can be streamed between processes
• To describe information content of XML documents
> to agree and reason about information content, preservation
• To define semantics of operations:
> equality, etc.
For old & well-known (but good!) reasons

But XML has specifics
• Serialization syntax
• Some information exists only after schema validation
> price is not a string but a decimal value
> refs is not a string but a list of references
• One more motivation for a data model:
To isolate the user from syntactic details of XML
<xsd:attribute name="price" type="xsd:decimal"/>
<xsd:attribute name="bookid" type="xsd:ID"/>
<xsd:attribute name="refs" type="xsd:IDREFS"/>
<book bookid="b1" price="10.50">
  <title>War &amp; Peace</title>
  <author>Tolstoi</author>
  <biblio refs="b1 b2 b3"/>
</book>
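The point that typed values only exist after schema validation can be sketched in a few lines of Python. The dictionary `ATTRIBUTE_TYPES` below is a hypothetical stand-in for the three attribute declarations above, not a real schema processor.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the schema: attribute name -> conversion function.
ATTRIBUTE_TYPES = {
    "price": Decimal,    # xsd:decimal
    "bookid": str,       # xsd:ID, kept as a string here
    "refs": str.split,   # xsd:IDREFS, a whitespace-separated list
}


def typed_attributes(elem):
    """Interpret the raw attribute strings of `elem` using ATTRIBUTE_TYPES."""
    return {
        name: ATTRIBUTE_TYPES.get(name, str)(value)
        for name, value in elem.attrib.items()
    }


book = ET.fromstring(
    '<book bookid="b1" price="10.50">'
    "<title>War &amp; Peace</title><author>Tolstoi</author>"
    '<biblio refs="b1 b2 b3"/></book>'
)
book_attrs = typed_attributes(book)
biblio_attrs = typed_attributes(book.find("biblio"))
print(book_attrs, biblio_attrs)
```

Without the mapping, the parser alone would report `price` and `refs` as plain strings.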

Existing data models
• Graph and tree models used in research
• Document Object Model (DOM)
> status: recommendation
> programmatic interface for XML (with an object-oriented flavor)
• XML Information Set (Infoset)
> describes the information content exported by XML processors
> can be generated after parsing or after validation
• XML languages’ data models:
> required for language semantics
> XPath: the recommendation has its own data model
> XML Query Data Model: working draft

Semistructured model
• Graph based, unordered, edge-labeled (here OEM)
> But XML is ordered, tree based
> Node-labeled seems more natural (e.g., like in DOM)
[Figure: OEM graph rooted at Bib with object identifiers &b0, &b1, &b2, &b3; edge labels book, title, author, price, publisher, biblio, references, refs; values “War & Peace”, “Tolstoi”, 10.50]

Ordered model
• Node-labeled, ordered trees, with references (YAT)
> But what about attributes (unordered!), namespaces, processing instructions, etc.?
[Figure: ordered tree with node identifiers b0: (bib) and b1:, b2:, b3: (book); node labels title, author, price, biblio, refs; values “War & Peace”, “Tolstoi”, 10.50; references &b1 &b2 &b3]

XML Infoset
• Specifies a description of information in a well-formed XML document
• Abstract way to think about XML data
• Other processors (e.g. XML Schema) can contribute information
Here is an example in a made-up syntax:
b1 = Element [ local name = "book";
  children = [ Element [ local name = "title" ... ];
               Element [ local name = "author" ... ]; ... ]
  attributes = [ Attribute [ local name = "price";
    children = [ Character [ code = '1' ];
                 Character [ code = '0' ];
                 Character [ code = '.' ];
                 Character [ code = '5' ];
                 Character [ code = '0' ] ];
    attribute type = "xsd:decimal" ] ... ] ]

XML Query Data Model
• A node-labeled tree model with references
> Very close to XPath data model
• Generated after validation
> also provides pointers to schema information
• Uses a functional notation
> no explicit data structure
• Defines a mapping from the post-schema-validation Infoset to the XML Query Data Model
> preserves original Infoset (e.g., characters)

XML Query Data Model
• Nodes
Node = DocNode | ElemNode | AttrNode | ValueNode
     | NSNode | PINode | CommentNode | InfoItemNode
• XML Schema primitive types
string, boolean, ID, IDREF, decimal, QName, ...
• Collections
sequence [T], bag {T}, union T1 | T2
• References
ref(T)

Constructors & accessors
• Attribute constructor
attrNode : (QNameValue, ValueNode) -> AttrNode
ValueNode = StringValue | DecimalValue | ...
qnameValue : (uriReference | null, string) -> QNameValue
• Attribute accessors
name : AttrNode -> QNameValue
value : AttrNode -> ValueNode
type : AttrNode -> ElemNode
• Example:
<book price="10.50"/>
A1 = attrNode(qnameValue(null, "price"), decimalValue(10.50))
name(A1) = qnameValue(null, "price")
value(A1) = decimalValue(10.50)
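The functional notation above can be mimicked with immutable records. This is only a sketch: the Python class and field names are assumptions, the `type` accessor is omitted, and `decimalValue` takes the lexical form as a string so that no binary floating point is involved.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Union


@dataclass(frozen=True)
class QNameValue:
    uri: Optional[str]   # None plays the role of "null"
    local: str


@dataclass(frozen=True)
class StringValue:
    data: str


@dataclass(frozen=True)
class DecimalValue:
    data: Decimal


ValueNode = Union[StringValue, DecimalValue]


@dataclass(frozen=True)
class AttrNode:
    qname: QNameValue
    content: ValueNode


# Constructors, named after the slide.
def qnameValue(uri: Optional[str], local: str) -> QNameValue:
    return QNameValue(uri, local)


def decimalValue(lexical: str) -> DecimalValue:
    return DecimalValue(Decimal(lexical))


def attrNode(qname: QNameValue, content: ValueNode) -> AttrNode:
    return AttrNode(qname, content)


# Accessors, named after the slide.
def name(attr: AttrNode) -> QNameValue:
    return attr.qname


def value(attr: AttrNode) -> ValueNode:
    return attr.content


A1 = attrNode(qnameValue(None, "price"), decimalValue("10.50"))
print(name(A1), value(A1))
```

Because the records are compared by value, `decimalValue("10.50")` and `decimalValue("10.5")` denote the same node content, which matches the idea that equality is defined on values rather than on characters.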

XML Data Model: Conclusion
• Research focuses on simple formal models
• Many standards related to the need for a data model
• XML Query Data Model reconciles both worlds
> Complete with respect to XML
> Simple design with a clear connection to a formal model: ordered trees, node-labeled, with references
> Clear relationship to other W3C standards: mapping to XML Infoset, based on XPath + typed values and unordered collections
> Less clear relationship with DOM

Part III: Data Definition Language

Why a DDL for XML?
For old & well-known (but good!) reasons
• As an ontology & modeling tool:
> to describe the structure of information: entities, relationships...
> to share common descriptions between actors/applications
> to guide query formulation and application development
• For error detection & safety:
> to verify that documents comply with what the application expects
> to make sure that the application accesses valid data
> to enforce safe operations (e.g., don’t do float arithmetic on trees!)
> to check that compositions of operations make sense
• For performance:
> to design storage (saving space, improving clustering, etc.)
> to process queries (algebraic laws, rewriting path expressions, etc.)

But XML deals with new needs
• XML data created from legacy repositories
> Need to capture schemas from heterogeneous sources
– Relational schemas: simple but with integrity constraints
– Object-oriented schemas: typed references, inheritance...
– Document grammars: regular expressions, mixed text and structure
• XML used on the Web, for data exchange
> Need to remain flexible
– Web sources: from strict schemas to well-formed documents (smooooothly........)
– Many applications use the same information: we should be able to type the same document in multiple ways

Existing schema languages
• DTDs (W3C recommendation as part of XML 1.0)
> powerful for documents: regular expressions, mixes of text and structure
> limited for other applications: cannot capture relational or object schemas
• XML Schema (Candidate recommendation)
> Many new features: data types, forms of subtyping, etc.
> More powerful but quite complex
• Schemas for unordered semistructured models:
> Data guides, graph schemas, using Datalog
> Used for optimization, schema inference from data
• Schemas for ordered tree models
> Regular tree grammars, YAT, lotos, XDuce, Relax, TRex, etc.
> Used for optimization, type checking and inference from queries

DDL Roadmap
3.1. Describing atomic values
> integer, string, float, date, images, etc.
3.2. Describing structures
> elements: tag-coupled approach vs. tag-decoupled approach
> attributes
3.3. More semantics
> identity, references, relationships intra or inter documents
> isa: notion of inheritance...
3.4. Simplifying schema reuse
> import/export abilities
> refinement of existing descriptions

Values in XML: easy?
• DTD says it’s easy:
Recipe: #PCDATA = string, CDATA = other strings, ...
I.e.: Everything is a string
Unfortunately: Strings are not a panacea...
• Database research says it’s easy:
Recipe: Take a data model with atomic types. Each value is in a different type...
I.e.: Don’t deal with syntax but data model
Unfortunately: XML = file = syntax

Values in XML: many issues...
• Addressing numerous needs:
> float, string, int, date, URI, telephone number, gif, applet, etc.
• Living with XML 1.0 syntax
> The same lexical representation can correspond to several values
<book><title>Haystacks at Chailly</title><author>Monet</author><date>1865</date><price>1865</price></book>
> The same value can have several lexical representations
<book><ref>Monet1865</ref><in_stock>true</in_stock></book>
<book><ref>Monet1865</ref><in_stock>1</in_stock></book>
> binary formats (images, etc.) must be serialized in a portable way
• Compatible with other standards
• Compatible with internationalization
> World Wide Web!

XML Schema Part 2: Datatypes
• Defines 14 built-in types (basic types)
> general purpose types
> types for compatibility with DTDs
• Relies on other existing standards whenever possible
> IEEE 754-1985 for floats
> UCS [ISO 10646] & Unicode for internationalization
> ISO 8601 for dates
• Gives the ability to define new types (derived types)
• Single lexical representation for many values?
> document is interpreted with respect to a given schema
> if no schema, the value is given the type string

Datatypes: base types
• Base types cover essential needs
> “classic” values: string, boolean, float, double, decimal
> temporal values: timeDuration, recurringDuration
> binary values: binary
> Web-related types: uriReference, QName
> DTD types: ID, IDREF, ENTITY, NOTATION
• One value for several syntaxes
> Each base type has a set of values (value space)
> Values may have several lexical representations (lexical space)
> Equality and order are defined in terms of the value space
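The distinction between lexical space and value space can be illustrated with two tiny parsers. This is a sketch only: the function names are made up, and the lexical forms accepted here are just the ones shown in this tutorial, not the full definitions from the Datatypes specification.

```python
from decimal import Decimal

# Lexical representations of boolean mentioned in the tutorial.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean(lexical: str) -> bool:
    """Map a lexical representation to a value in the boolean value space."""
    return BOOLEAN_LEXICAL[lexical.strip()]


def parse_decimal(lexical: str) -> Decimal:
    """Map a lexical representation to a value in the decimal value space."""
    return Decimal(lexical.strip())


# Different strings, same value.
print(parse_boolean("true") == parse_boolean("1"))
print(parse_decimal("12") == parse_decimal("12.00"))

# Order is defined on values, not on strings.
print(parse_decimal("9") < parse_decimal("10"), "9" < "10")
```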

Base types: examples
Datatype | Examples | Notes
string | Victor Hugo |
boolean | true, false, 1, 0 |
float | 12, 12.00, 1.2E-2, INF | m × 2^e where m < 2^24, -149 <= e <= 104
double | 12, 12.00, 1.2E-2, INF | m × 2^e where m < 2^53, -1075 <= e <= 970
decimal | 0, -0, 1.23, 123.4 | Arbitrary precision
timeDuration | P29Y2MT1H30M1.3S | 29 years, 2 months, 1 hour, 30 minutes, 1.3 seconds
recurringDuration | --08-29T19:05:00 | August 29th at 7:05 pm every year
uriReference | http://www.w3.org/ |

Datatypes: facets
• Each base type has facets (read: properties)
• Some facets are fundamental
> equality, order
> bounded, cardinality, numeric
• Some facets are constraining
> length, minLength, maxLength: for string, binary or lists
> maxInclusive, maxExclusive, minInclusive, minExclusive
> precision, scale: for decimal numbers
> encoding: hex or base64 for binary
> enumeration, pattern
> duration, period

Datatypes: derived types
• One can derive types by restriction of facets
• One can derive types by list
• XML Schema offers predefined derived types
> integer, nonpositiveInteger, int, date, year, century, timeInstant, language, etc.
> IDREFS, NMTOKENS, etc.
<simpleType name='integer' base='xsd:decimal'>
  <scale value='0'/>
</simpleType>
<simpleType name='int' base='xsd:integer'>
  <maxInclusive value='2147483647'/>
  <minInclusive value='-2147483648'/>
</simpleType>
<simpleType name='IDREFS' base='xsd:IDREF' derivedBy='xsd:list'/>

Now you can practice...
> Using a range facet
<simpleType name='auctionprice' base='xsd:decimal'>
  <minInclusive value='10'/>
</simpleType>
> Using an enumeration facet
<simpleType name='booktype' base='xsd:string'>
  <xsd:enumeration value="Book"/>
  <xsd:enumeration value="Collection"/>
  ...
> Using a pattern facet
<xsd:simpleType name="isbn" base='xsd:string'>
  <xsd:pattern value="ISBN \d{10}"/>
</xsd:simpleType>
> Using a list type
<xsd:simpleType name="auctions" base="xsd:auctionprice" derivedBy="xsd:list"/>
> etc.
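To make the effect of these facets concrete, the sketch below re-expresses the four derived types as plain Python checks. It is not an XML Schema processor: the function names mirror the type names on the slide, the enumeration only contains the two values that are visible there, and the helper `is_valid` is an addition for the example.

```python
import re
from decimal import Decimal


def auctionprice(lexical: str) -> Decimal:
    """decimal restricted by minInclusive = 10."""
    value = Decimal(lexical)
    if value < 10:
        raise ValueError(f"{lexical!r} is below minInclusive 10")
    return value


# Only the two enumeration values visible on the slide; the original elides more.
BOOKTYPES = {"Book", "Collection"}


def booktype(lexical: str) -> str:
    """string restricted by an enumeration."""
    if lexical not in BOOKTYPES:
        raise ValueError(f"{lexical!r} is not an enumerated value")
    return lexical


ISBN_PATTERN = re.compile(r"ISBN \d{10}")


def isbn(lexical: str) -> str:
    """string restricted by a pattern that must match the whole value."""
    if ISBN_PATTERN.fullmatch(lexical) is None:
        raise ValueError(f"{lexical!r} does not match the pattern")
    return lexical


def auctions(lexical: str) -> list:
    """List type: whitespace-separated auctionprice items."""
    return [auctionprice(item) for item in lexical.split()]


def is_valid(parser, lexical: str) -> bool:
    """True if `parser` accepts `lexical`."""
    try:
        parser(lexical)
        return True
    except (ValueError, ArithmeticError):  # Decimal errors are ArithmeticError
        return False


print(is_valid(auctionprice, "9.99"), is_valid(isbn, "ISBN 0201537710"))
print(auctions("10 12.5 100"))
```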

Describing Values: Conclusion
• Not addressed in research
• XML Schema Part 2: Datatypes does a good job
> Quite complete
> Deals with complex requirements (e.g., internationalization)
• Defines values but not operations!
> Needed by XPath, XQuery…

Describing XML structures
• element names
> with the names themselves: book, title, etc.
> possibly with wildcards: ~ = any tag, !a = not a, etc.
• element children
> using regular expressions
• element attributes
> unordered attribute-value pairs
• Main question: types vs. element names
> does the element name determine the type?
> tag-coupled types vs. tag-decoupled types

Coupled types
• Approach taken by DTDs
> two elements with the same name always have the same type
> children = regular expression over elements
• Properties
> easy to parse: no look-ahead in depth
> no closure under union, no local names allowed
> cannot express relational, object-oriented schemas
<!ELEMENT book (title, author+, price, publisher, section, conclusion?)>
<!ELEMENT title (#PCDATA)>
....
<!ELEMENT author (name, affiliation)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
....
<!ELEMENT publisher (name, address)>
...
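One way to see "children = regular expression over elements" is to turn each content model into an ordinary regular expression over the sequence of child tag names. The sketch below does this by hand for the declarations above; the encoding (each tag name followed by one space) and the function names are assumptions made for the example, and #PCDATA content is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the DTD above, hand-translated to regular expressions.
# Encoding used in this sketch: every child tag name is followed by one space.
CONTENT_MODELS = {
    "book": re.compile(r"title (author )+price publisher section (conclusion )?"),
    "author": re.compile(r"name affiliation "),
    "name": re.compile(r"first last "),
    "publisher": re.compile(r"name address "),
}


def children_match(elem) -> bool:
    """Check the children of `elem` against the model for its tag name."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True  # no declaration recorded in this sketch
    word = "".join(child.tag + " " for child in elem)
    return model.fullmatch(word) is not None


def is_valid(elem) -> bool:
    """Tag-coupled validation: the tag name alone selects the model."""
    return children_match(elem) and all(is_valid(child) for child in elem)


GOOD = ET.fromstring(
    "<book><title>T</title>"
    "<author><name><first>A</first><last>B</last></name>"
    "<affiliation>X</affiliation></author>"
    "<price>1</price>"
    "<publisher><name><first>P</first><last>Q</last></name>"
    "<address>Z</address></publisher>"
    "<section>S</section></book>"
)
# Same document, but the publisher's name is a plain string.
FLAT_PUBLISHER_NAME = ET.fromstring(
    "<book><title>T</title>"
    "<author><name><first>A</first><last>B</last></name>"
    "<affiliation>X</affiliation></author>"
    "<price>1</price>"
    "<publisher><name>Addison Wesley</name><address>Z</address></publisher>"
    "<section>S</section></book>"
)
print(is_valid(GOOD), is_valid(FLAT_PUBLISHER_NAME))
```

The second document is rejected because, with coupled types, every `name` element must have the same content model wherever it appears.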

Decoupled types
• Approach taken by YAT, XDuce, lotos, etc.
> types are decoupled from element names
> children are defined by regular expressions over types
> different types can have the same tag
• Properties
> equivalent to regular tree grammars
> closure under intersection, complement, union...
> more precise type for documents and queries
> harder to parse (might require look-ahead and backtracking)
type Book = book [ Title, Author+, Price, Publisher, Section, Conclusion? ]
type Title = title [ String ]
type Author = author [ Name, Affiliation ]
type Name = name [ first [ String ], last [ String ] ]
...
type Publisher = publisher [ PName, Address ]
type PName = name [ String ]

Decoupled types cont’d
• They are simple to define
> basic entities: datatypes, tags, type names
> one construct: types
schema ::= type type_name = type .........
type ::= String | Boolean | ...   (* datatypes *)
       | type_name                (* type name *)
       | tag [ type ]             (* element *)
       | ~ [ type ]               (* element with wildcard *)
       | type, type               (* sequence *)
       | type | type              (* union *)
       | type*                    (* kleene star *)

Decoupled types cont’d
• They can easily describe mixed content
type Section = section [ title [ String ], Body ]
type Body = content [ (b [ Body ] | footnote [ String ] | Section | String)* ]
• They can easily describe all well-formed documents
type UrScalar = (String | Boolean | Float | Double ...)
type UrTree = UrScalar | ~[ UrTree* ]
• They support a notion of subtyping via inclusion
type Body2 = content [ String, (b [ String ] | footnote [ String ] | String)*, Section* ]
Body2 <: Body <: UrTree
> all documents of type Body2 are also of type Body and UrTree
• But they can be ambiguous
type Section2 = section [ title [ String ], Body2*, Body* ]
> deciding between Body and Body2 can be expensive

Decoupled types & full XML
• How do you describe attributes?
type Book = book [ @isbn [ String ], Title, Author+, Price, Publisher, Section, Conclusion? ]
> but attributes are unordered, without duplicates
> they do not interact with the children of the element
> they cannot contain complex values
• How do you describe references?
> Like in object schemas [Cluet et al 1998]:
type Book = book [ title [ String ], &Author+, &Publisher ]
type Author = author [ name [ first [ String ], last [ String ] ] ]
type Publisher = publisher [ name [ String ] ]
> but it’s even harder to parse because of cycles [Beeri, Milo 1999]
• How do you deal with XML specifics?
> entities, processing instructions, namespaces, serialization, etc.

What about XML Schema?
• Tries to get the expressive power of decoupled types + the ease of parsing of coupled types
• Advanced features: “subtyping”, constraints...
• Deals with all the specifics of XML
• XML Schema syntax is in XML
Results in a pretty complex specification
<xsd:element name="book">
  <xsd:complexType>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:element name="first" type="xsd:string"/>
        <xsd:element name="last" type="xsd:string"/>
      </xsd:complexType>
    </xsd:element>
    ………
  </xsd:complexType>
</xsd:element>

Element & attribute declarations
• Element decl. ~ associate element names with types
> have a name and their content is described by a type
<xsd:element name="title" type="xsd:string"/>   ~   title [ String ]
<xsd:element name="affiliation" type="publisher"/>   ~   affiliation [ Publisher ]
• Attribute decl. ~ associate attribute names with types
> have a name and contain an atomic value
> can be required or optional
> can only appear inside elements (through complex types)
<xsd:attribute name="price"/>   ~   @price [ String ]?
<xsd:attribute name="auctionhistory" type="auctions" use="required"/>   ~   @auctionhistory [ Auctions ]
type Auctions = Decimal*

Model groups
• Defines content models (i.e., type for the children of an element)
~ equivalent to regular expressions over elements
<xsd:sequence>
  <xsd:element name="title" type="Title"/>
  <xsd:element name="price" type="Price"/>
</xsd:sequence>
~ title[Title], price[Price]
<xsd:choice>
  <xsd:element name="publisher" type="Publisher"/>
  <xsd:element name="editor" type="Author"/>
</xsd:choice>
~ ( publisher[Publisher] | editor[Author] )
<xsd:sequence minOccurs="0" maxOccurs="unbounded">
  <xsd:element name="book" type="Book"/>
</xsd:sequence>
~ book[ Book ]*
<xsd:all>
  <xsd:element name="title" type="Title"/>
  <xsd:element name="price" type="Price"/>
</xsd:all>
~ (title[Title], price[Price]) | (price[Price], title[Title])

Complex type definitions
> they contain a content model and attribute declarations
<xsd:complexType name="Book">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" maxOccurs="unbounded" type="AuthorName"/>
  </xsd:sequence>
  <xsd:attribute name="isbn" type="xsd:string"/>
</xsd:complexType>
~ type Book = @isbn [ String ], title [ String ], author [ Name ]+
> they can be empty
<xsd:complexType name="RefBib" content="empty">
  <xsd:attribute name="refto" type="xsd:IDREF"/>
</xsd:complexType>
~ type RefBib = @refto [ &UrTree ]
> they can be recursive
> they can be mixed (i.e., strings + sub-elements)
<xsd:complexType name="Body" content="mixed">
  <xsd:element name="b" type="Body" minOccurs="0" maxOccurs="unbounded"/>
</xsd:complexType>
~ type Body = (b[Body] | String)*

Some feature interactions
• Local element restrictions
> local elements with the same name can have different types
<xsd:element name="author">
  <xsd:complexType>
    <xsd:element name="name" type="AuthorName"/>
  </xsd:complexType>
</xsd:element>
~ type Author = author [ name [ AuthorName ] ]
<xsd:element name="publisher">
  <xsd:complexType>
    <xsd:element name="name" type="xsd:string"/>
    ...
  </xsd:complexType>
</xsd:element>
~ type Publisher = publisher [ name [ String ] ]
> but they must have the same type among siblings
<xsd:complexType name="Names">
  <xsd:element name="name" type="AuthorName"/>
  <xsd:element name="name" type="xsd:string" minOccurs="0"/>
</xsd:complexType>
~ type Names = name [ AuthorName ], name [ String ]?
• To be simple or not to be simple...
<internationalPrice currency='EU'>423.46</internationalPrice>
> requires a complexType defined by extension over decimals

Describing Structures: Conclusion
• Research: formal models with good properties
• XML Schema Part 1: Structures is complex
> Deals with XML syntactic aspects
> Focuses on validation
> Many features with complex interactions
• Need for some middle ground
> We need to reason about schemas (e.g., for typing)
> XML Schema: Formalism has just been released

Integrity constraints
• Come from the relational world
> practical viewpoint: key & foreign key constraints
> theoretical viewpoint: functional & inclusion dependencies
> studied in depth in the literature
• Many useful applications of ICs
> used to preserve information when mapping ER model to relational
> used for safety and verification (e.g., controlling updates)
> used for optimization (e.g., dropping useless joins)
• Reasoning about ICs is hard:
> implication of functional + inclusion dependencies is undecidable
> etc.
Book (isbn, title, price, publisher)             isbn is a key for the relation Book
Author (authorid, first, last, affiliation)      authorid and first,last are both keys for the relation Author
Wrote (isbn, authorid)                           isbn and authorid are foreign keys to Book and Author

ID/IDREF mechanism in DTDs
• Very simple ICs to model identity and references
• ID attributes must have distinct values
> they identify elements uniquely in a document
> but they are not exactly like keys: publisher’s stickers and book’s isbns must be different
• IDREF attributes must have values from ID attributes
> they can capture references to other elements
> but: they allow refs to point to publishers!
<!ELEMENT book (title, author+, price, publisher, section, bibliography?)>
<!ATTLIST book isbn ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT publisher (name, address)>
<!ATTLIST publisher sticker ID #REQUIRED>
<!ELEMENT bibliography EMPTY>
<!ATTLIST bibliography refs IDREFS #IMPLIED>
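The two ID/IDREF rules, and the two weaknesses noted above, can be reproduced with a short check. This is a sketch under assumptions: the dictionaries hard-code the three ATTLIST declarations instead of reading a DTD, and the function name `check_ids` and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

# Hand-copied from the ATTLIST declarations above (not read from a DTD).
ID_ATTRIBUTES = {"book": "isbn", "publisher": "sticker"}
IDREFS_ATTRIBUTES = {"bibliography": "refs"}


def check_ids(root):
    """Return (errors, owners); owners maps each ID value to its element tag."""
    errors = []
    owners = {}
    # Rule 1: all ID values are distinct across the whole document.
    for elem in root.iter():
        attr = ID_ATTRIBUTES.get(elem.tag)
        if attr is not None and attr in elem.attrib:
            value = elem.attrib[attr]
            if value in owners:
                errors.append(f"duplicate ID {value!r}")
            else:
                owners[value] = elem.tag
    # Rule 2: every IDREF value is the ID of some element, of any kind.
    for elem in root.iter():
        attr = IDREFS_ATTRIBUTES.get(elem.tag)
        if attr is not None:
            for ref in elem.get(attr, "").split():
                if ref not in owners:
                    errors.append(f"dangling IDREF {ref!r}")
    return errors, owners


ok = ET.fromstring(
    '<books><book isbn="b1"><publisher sticker="p1"/>'
    '<bibliography refs="b2 p1"/></book>'
    '<book isbn="b2"><publisher sticker="p2"/></book></books>'
)
clash = ET.fromstring('<books><book isbn="x"><publisher sticker="x"/></book></books>')

print(check_ids(ok))
print(check_ids(clash)[0])
```

In the first document the reference to `p1` is accepted even though it points to a publisher, and in the second a book and a publisher cannot share the value `x`, which is why IDs are not quite keys.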

Adding constraints to DTDs
• We can replace IDs by real keys:
book.isbn -> book                          isbn is a key for the relation book
publisher.sticker -> publisher             sticker is a key for the relation publisher
author.authorid -> author                  authorid is a key for the relation author
wrote.isbn, wrote.authorid -> wrote        isbn and authorid are a key for the relation wrote
• We can replace IDREFs by real foreign keys:
biblio.refs <= book.isbn                   refs is a multi-valued foreign key from biblio to book
wrote.isbn <= book.isbn                    isbn is a foreign key from wrote to book
wrote.authorid <= author.authorid          authorid is a foreign key from wrote to author
> Reasoning about simple ICs for XML is possible [FanSimeon 2000]
> Reasoning about ICs with DTDs is very hard [FanLibkin 2001]

Constraints in XML Schema
• XML Schema can define powerful constraints
> Using XPath expressions
• One can define keys:
<key name="Isbn">
  <selector>books/book</selector>
  <field>@isbn</field>
</key>
<key name="Publisher">
  <selector>books/book/publisher</selector>
  <field>@sticker</field>
</key>
> the selector gives the collection on which the constraint applies
• One can define foreign keys:
<keyref refer="Isbn">
  <selector>books/book/biblio</selector>
  <field>@refs</field>
</keyref>
• Many open issues
> is XPath too powerful for reasoning (predicates, function calls?)
> which notion of equality is used?
> interaction between ICs and structural constraints?
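The selector/field idea can be sketched with the limited path support of Python's `xml.etree.ElementTree`. Everything here is an assumption for illustration: the function names, the wrapper element `bib` used as the context, single-valued `refs` attributes, and plain string equality (one of the open issues listed above).

```python
import xml.etree.ElementTree as ET


def field_values(context, selector, attribute):
    """Attribute value of every element that `selector` reaches from `context`."""
    return [elem.get(attribute) for elem in context.findall(selector)]


def key_holds(context, selector, attribute):
    """Key: the field exists on every selected element and values are distinct."""
    values = field_values(context, selector, attribute)
    return None not in values and len(values) == len(set(values))


def keyref_holds(context, selector, attribute, key_selector, key_attribute):
    """Foreign key: every referring value occurs among the key values."""
    keys = set(field_values(context, key_selector, key_attribute))
    referring = field_values(context, selector, attribute)
    return all(value in keys for value in referring if value is not None)


doc = ET.fromstring(
    "<bib><books>"
    '<book isbn="b1"><publisher sticker="p1"/><biblio refs="b2"/></book>'
    '<book isbn="b2"><publisher sticker="p2"/><biblio refs="b1"/></book>'
    "</books></bib>"
)

print(key_holds(doc, "books/book", "isbn"))
print(key_holds(doc, "books/book/publisher", "sticker"))
print(keyref_holds(doc, "books/book/biblio", "refs", "books/book", "isbn"))
```

Unlike the ID/IDREF mechanism, each constraint here is scoped by its selector, so a reference can no longer point to a publisher by accident.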

Unified Constraint Model
• Based on XML Query Algebra type system
• Key/foreign key domains are defined by types
• Very simple path expressions for key components
> Powerful: relational keys/fkeys, object references, ID/IDREFs
> Close to relational approach
> Simple enough to reason about satisfiability [Fan Kuper Simeon 2001]
type Book = book [ title [ String ], Author*, publisher [ Publisher ] … ]
type Author = author [ name [ String ], wrote [ String* ] ]
key book = Book [| ./title/data() |]
fkey authorbooks = Author [| ./wrote/data() |] references book

Reusing schemas
• Many benefits
> sharing existing definitions
> faster development
• Traditional techniques for schema reuse:
> some notion of import and the ability to resolve name conflicts
Import Person, Company from StdClass
class Person tuple(name: tuple(first: string, last: string))
class Company tuple(name: string)
> inheritance, based on subtyping
class Author inherit Person tuple(affiliation: Publisher)
  tuple(first: string, last: string, affiliation: Publisher) <: tuple(first: string, last: string)
class Publisher inherit Company tuple(address: string)
  tuple(name: string, address: string) <: tuple(name: string)
• We need means to access schemas over the Web

Reusing XML Schemas
• Means to import types from other schemas
> access and import through URIs
> name conflict resolution based on namespaces
• Mechanisms for limited “inheritance” or subtyping
> notions of extension and restriction
> abstract types and “equivalence classes”
<schema xmlns="http://www.w3.org/1999/XMLSchema"
        xmlns:html="http://www.w3.org/1999/xhtml"
        targetNamespace="uri:mybiblio"
        xmlns:my="uri:mybiblio">

Extension
• Extension allows adding new fields to a complex type
<complexType name="ContactAuthor" base="Author" derivedBy="extension">
  <element name="telephone" type="xsd:string"/>
</complexType>
• Now you can use both types
> but you might need to mark the data with xsi:type attributes
> you cannot export the document without its type anymore...
<author xsi:type="Author">
  <name><first>Serge</first><last>Abiteboul</last></name>
  <affiliation>INRIA</affiliation>
</author>
<author xsi:type="ContactAuthor">
  <name><first>Jerome</first><last>Simeon</last></name>
  <affiliation>Bell Laboratories</affiliation>
  <telephone>+1 908 582 5473</telephone>
</author>
ICDE’2001, Heidelberg, Germany 60D. Florescu, J. Siméon
Restriction
• Restricts the scope of a type definition
• 5x5 table across schema features to define restriction
• Spirit is to allow:
> smaller datatypes
> narrowed range for sequences: t{n,m} < t{n’,m’} iff n>n’ && m<m’
> reduced alternative: t1 < (t1|t2)
> propagation of restriction: t1 < t1’ implies t1 < (t1’|t2)
<xsd:element name="book2" base="book" derivedBy="restriction">
  <xsd:complexType>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="2" maxOccurs="10">
    .......
  </xsd:complexType>
</xsd:element>
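The narrowed-range rule for sequences can be illustrated with a small check. This is only a sketch, not the XML Schema restriction algorithm: the function name `is_range_restriction` and the tuple encoding of an occurrence range are assumptions made for illustration, and the comparison is read non-strictly (n ≥ n’, m ≤ m’) so that a range counts as a restriction of itself.

```python
def is_range_restriction(derived, base):
    """Return True if the occurrence range `derived` narrows `base`.

    Hypothetical encoding: a range is a (minOccurs, maxOccurs) pair,
    with maxOccurs=None standing for 'unbounded'.
    """
    n, m = derived
    n_base, m_base = base
    lower_ok = n >= n_base
    upper_ok = m_base is None or (m is not None and m <= m_base)
    return lower_ok and upper_ok


# author{2,10} narrows author{1,unbounded}, but not the other way round.
print(is_range_restriction((2, 10), (1, None)))
print(is_range_restriction((1, None), (2, 10)))
```

The book2 example above corresponds to the first call: its author element with minOccurs=2 and maxOccurs=10 fits inside a wider base range.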
“Equivalence classes”
• Allows defining elements that can be used in place of other elements
> allow an element named contact to be used whenever an author element is expected
> the corresponding type can be a derived type
> of course, “equivalence classes” are not based on equivalence
<element name="contact" type="ContactAuthor" equivClass="author"/>
<author>
  <name><first>Serge</first><last>Abiteboul</last></name>
  <affiliation>INRIA</affiliation>
</author>
<contact>
  <name><first>Jerome</first><last>Simeon</last></name>
  <affiliation>Bell Laboratories</affiliation>
  <telephone>+1 908 582 5473</telephone>
</contact>
Some shortcomings
• Restriction is very syntactic
> the following two types are not restrictions of one another!
a[A],(b[B],c[C]):
<xsd:sequence>
  <xsd:element name="a" type="A"/>
  <xsd:sequence>
    <xsd:element name="b" type="B"/>
    <xsd:element name="c" type="C"/>
  </xsd:sequence>
</xsd:sequence>
(a[A],b[B]),c[C]:
<xsd:sequence>
  <xsd:sequence>
    <xsd:element name="a" type="A"/>
    <xsd:element name="b" type="B"/>
  </xsd:sequence>
  <xsd:element name="c" type="C"/>
</xsd:sequence>
• Restriction and extension are not possible together:
Person1 = person [ name [ UrTree ], age [ Integer ] ]
Person2 = person [ name [ String ], age [ Integer ], address [ Address ] ]
Subtyping: Conclusion
• Subtyping and inheritance in programming languages
• By-name subtyping in XML Schema: relies on user declaration
• Structural subtyping in XDuce: relies on set inclusion
• Subsumption for semistructured data [Buneman et al 1997] and for XML [Kuper Simeon 2001] proposes a trade-off between by-name and structural subtyping
Still an open problem
XML DDL: Conclusion
• Much research work with interesting and complementary properties
• Complete but complex XML Schema specification...
• Yet no approach that reconciles all of the above
• And still some difficult problems to solve:
> a concrete integrity constraint language that is tractable
> syntactic vs. semantic notion of subtyping?
> use of types for language typing
> use of types for query processing
> use of types for storage
Part IV
XML Query Languages
Plan of the rest of the talk
• Querying XML: problem definition
• Previous query languages for XML and graph-based data
• XQuery as a “standard” query language for XML
– Syntax and semantics
– Functionalities and expressive power
– Open issues
• Other desirable features for XQuery
• Research problems related to XML data management
• Conclusion
In search of a query language...
• What do we call a query language?
The language used to describe, in a declarative fashion, the mapping from an input instance of the data model to an output instance of the data model.
What data model for XML?
XML data models
• XML is just a syntax and did not have any standard data model for many years (it still doesn’t!)
• Graph data models have been used to model irregular data even before XML
• All query languages for graph-based data models are relevant to XML
• XQuery data model (www.w3c.org/TR/query-datamodel)
– First formal and complete data model for XML
– Used in the formal semantic specification of XQuery
XML example
<book year="1967">
  <title>The politics of experience</title>
  <author>R.D. Laing</author>
  <ref isbn="1341-1444-555"/>
  <section>
    The great and true Amphibian, whose nature is disposed to…..
    <title>Persons and experience</title>
    Even facts become...
  </section>
  …
</book>
XML data model in a slide
• An instance of the data model = a forest of nodes
• Eight types of nodes:
– Document, element, attribute, value, namespace, processing-instruction, comment, reference nodes
• Each type of node has accessors (e.g. name(element)) and constructors (e.g. comment("this is a comment"))
• Nodes have an optional (unique) parent
• Nodes have an identity that can be queried and preserved
• Support for ordered and unordered collections
• No support for nested collections
• Document order can be queried and preserved
• Data model instances are described and constrained by a type system
XML query language requirements (1)
1. Select portions of an XML document
2. Copy portions of a document while preserving the hierarchy and the order of the nodes
3. Combine (join) two documents
4. Construct new documents
5. Navigate irregular or unknown documents
XML query language requirements (2)
6. Formulate predicates on the tag names and attribute names
7. Query and preserve the nodes’ global topological order
8. Apply aggregation and sorting functions
9. Apply existential and universal quantifiers
10. Apply full-text predicates and text operations
Relevant query languages
• Query languages for graph data
– e.g. GOOD, GraphLog, Clean
• Query languages/scripting languages for the Web
– e.g. WebSQL, WebOQL, WebL
• Query languages for semi-structured data
– e.g. MSL, UnQL, StruQL, YATL
• Research query languages for XML
– e.g. XML-QL, Lorel, XML-GL, Quilt, XDuce
• Industry query languages for XML
– e.g. XQL, OQL extensions to query SGML documents
• Standard processing languages for XML (W3C standards)
– e.g. XPath, XSLT
• Standard W3C XML Query Language: XQuery
“XML Query Languages: Experiences and Exemplars”, M. Fernandez, J. Simeon, P. Wadler
“Comparative Analysis of Five XML Query Languages”, Angela Bonifati, Stefano Ceri
XQuery
• Current working drafts inside the W3C
www.w3c.org/XML/Query
• Basis of the future “standard” XML query language
• XQuery will have (a) a human-readable (non-XML) syntax and (b) an XML syntax (ABQL)
• XML Algebra:
– Formal data model, type system
– Formal semantics for the query language
Caveat: many features and design decisions are stable; some will change
XQuery as a functional language
• XQuery:
– consumes an instance of the XML data model as input
– produces an instance of the XML data model as output
• XQuery is a functional language (like OQL)
• XQuery is a strongly typed language
• A query is an expression
• Static semantics:
– Given an expression, computes the type of the result
• Dynamic semantics:
– Given an expression and an environment, determines the resulting value
• Environment binds functions and variables
XQuery expressions
• Constants (all XML Schema atomic types)
– "string literal", 1345.46E23, etc.
• Variables
– $x, $y
• XPath expressions (for navigation)
– $x/girls, $y/*, $x/@name
• Expression OP Expression
– 1 + 3, true and false, $x/girls union $x/boys
• f(exp1, ..., expN)
– descendents($x)
• FLWR expressions (for iteration)
• SORTBY expressions
• Quantified expressions
• Conditional expressions
• XML node constructors (elements, attributes, etc.)
Xquery functions and operators
• Arithmetic operators
– +, -, *, div
• Logical operators
– And, Not, Or
• Collection oriented operators
– Union, intersection, difference, empty(), distinct()
• XML specific functions
– Document(), name(), value(), string(), etc.
• Work in progress
• Many open semantic issues: what is the semantics of a + operator when the input is not a value of a numerical type but a list of strings? See the type coercion problem later on.
Navigation using XPath
• General syntax:
expression ‘/’ step
• Step:
axis ‘::’ nodeTest
• The axis controls the direction
– ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Node test by
– Name (e.g. employee, myNS:employee, *:employee, myNS:*, *:*)
– Type (e.g. node(), comment(), text())
• Examples of path expressions
document("employees.xml")/child::employee
$x/parent::*
$x/ancestor::*/descendant::comment()
Semantics of path expressions
• Semantics of path expressions in XPath 1.0
(1) Ordered forests of nodes as input, ordered forests of nodes as output
(2) For each root node in the input forest, select the nodes in the same document that lie on the given axis; among those, select and return the ones that satisfy the node test
(3) No duplicates are allowed in the output
(4) Output nodes are ordered by the document order
(5) Nodes preserve their identity
• No type error for $book/firstname
• A list of lists is automatically flattened
Shortcuts in XPath (1)
• Axis is not mandatory
– By default it is child
$x/child::person -> $x/person
• Short-hands for common axes
– Descendants
$x/descendant::comment() -> $x//comment()
– Parent
$x/parent::* -> $x/..
– Attribute
$x/attribute::name -> $x/@name
– Self
$x/self::* -> $x/.
Shortcuts in XPath (2)
• Implicit root node
$root/department -> /department
$root -> /
where $root is implicitly bound to the current document node
• Implicit current node
$self/title -> ./title
$self/title -> title
where $self is implicitly bound to the ‘current’ node (eliminates the need for an explicit variable declaration in second-order operators like sortby and filter predicates)
Iteration
• Syntax:
for variable in expression0 return expression1
• Example:
» for $y in document("books.xml")/book return $y/authors
» for $x in //text() return value($x)
» for $z in ( for $y in //book return $y/authors ) return $z
» for $z in //book return ( for $y in $z/authors return $y )
• Semantics:
– bind the variable to each root node of the forest returned by expression0; for each such binding evaluate expression1; concatenate the resulting forests.
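The iteration semantics (bind, evaluate, concatenate) can be mimicked in Python. This is a minimal sketch using the standard `xml.etree.ElementTree` module as a stand-in for the XML data model; the helper name `for_each` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def for_each(forest, body):
    """Bind the variable to each root node of `forest`, evaluate `body`
    for that binding, and concatenate the resulting forests (lists)."""
    result = []
    for node in forest:
        result.extend(body(node))
    return result


# Hypothetical sample data, not taken from the slides.
doc = ET.fromstring(
    "<bib>"
    "<book><authors>Laing</authors></book>"
    "<book><authors>Abiteboul</authors><authors>Suciu</authors></book>"
    "</bib>"
)

# Counterpart of: for $y in /bib/book return $y/authors
authors = for_each(doc.findall("book"), lambda y: y.findall("authors"))
names = [a.text for a in authors]
print(names)
```

Because each body result is spliced into one flat list, the nested and un-nested forms of the examples above yield the same forest.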
Local variable declaration
• Syntax: let variable := expression1 return expression2
• Example:
» let $y := document("books.xml")/book return count($y)
» let $a := f(2) return $a+$a
• Semantics:
– Evaluate expression1 and add a binding of the variable with this value to the current environment; evaluate expression2 in this environment; remove the local variable from the environment.
• Usage:
– Avoid repeating common sub-expressions
– Split large expressions into smaller, more manageable sub-expressions.
Conditional expressions
• Syntax: if expression1 then expression2 else expression3
• Example:
» if $book/year < 1980 then "old book" else "new book"
» if count($company//employee) > 200 then BigCompanyTaxCalculation($company) else SmallCompanyTaxCalculation($company)
• Semantics:
– If expression1 evaluates to true then return the result of the evaluation of expression2, else return the result of the evaluation of expression3.
FLWR expressions
• Syntactic sugar that combines FOR, LET, IF
• Syntax:
( ( for (for_variable_binding)+ ) | ( let (let_variable_binding)+ ) | ( where expression ) )+ return expression
for_variable_binding := variable IN expression
let_variable_binding := variable := expression
• Example:
for $x in //employee, $y in //department
let $z := $x/name
where $x/@department=$y/name
return $z
FLWR example
• FLWR expression:
for $x in //employee, $y in //department
let $z := $x/name
where $x/@department=$y/name
return $z
• Syntactic sugar for:
for $x in //employee return (
  for $y in //department return (
    let $z := $x/name return
      if ( $x/@department=$y/name ) then $z else [] /* empty list */ ) )
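The equivalence between the FLWR form and its desugared form can be checked on toy data. The sketch below uses plain Python dictionaries instead of XML nodes; the sample records and the function name `desugared` are assumptions made for illustration.

```python
# Hypothetical data standing in for //employee and //department.
employees = [
    {"name": "Ann", "department": "R&D"},
    {"name": "Bob", "department": "Sales"},
    {"name": "Cid", "department": "Legal"},
]
departments = [{"name": "R&D"}, {"name": "Sales"}]

# FLWR form: for ... let ... where ... return
flwr = [
    z
    for x in employees
    for y in departments
    for z in [x["name"]]                # let $z := $x/name
    if x["department"] == y["name"]     # where
]


def desugared():
    """Nested iterations; the failed test contributes an empty list."""
    out = []
    for x in employees:
        for y in departments:
            z = x["name"]
            out.extend([z] if x["department"] == y["name"] else [])
    return out


print(flwr)
print(desugared())
```

Both forms drop the employee whose department has no match, since concatenating an empty list adds nothing to the result.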
Filter predicates
• Syntactic sugar that simplifies some FLWR expressions
• Syntax:
expression1 [ expression2 ]
where expression2 is allowed to use the $self implicit variable (or the equivalent . )
• Semantics:
– if expression2 is of type boolean, shorthand for
for $self in expression1 where expression2 return $self
– if expression2 is of type integer, return the Nth root element of the forest returned by expression1
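The two readings of a filter predicate can be sketched as a single function that dispatches on the predicate type. The name `filter_predicate`, the list-of-dictionaries data, and the encoding of a boolean predicate as a Python function are assumptions of this sketch.

```python
def filter_predicate(forest, predicate):
    """Apply expression1[expression2] to a list of nodes.

    `predicate` is either an int (1-based position) or a function
    node -> bool; both encodings are assumptions of this sketch.
    """
    if isinstance(predicate, int):
        # Positional reading: the Nth root of the forest, if it exists.
        return forest[predicate - 1:predicate] if predicate >= 1 else []
    # Boolean reading: for $self in forest where predicate($self) return $self
    return [node for node in forest if predicate(node)]


books = [
    {"title": "A", "price": 30},
    {"title": "B", "price": 20},
    {"title": "C", "price": 10},
]
cheap = filter_predicate(books, lambda b: b["price"] < 25)   # //book[price < 25]
third = filter_predicate(books, 3)                           # /book[3]
print([b["title"] for b in cheap], [b["title"] for b in third])
```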
Filter predicates (2)
• Filtering by predicate:
» //employee[./name/firstname = "jerome"]
» //book[price < 25]
» //book[count(author[@sex="female"]) > 0]
• Filtering by position:
» /book[3]
» /book[3]/author[1]
» /book[3]/author[1 to 4]
• Same syntax, different semantics based on the type of the expression!
Quantifiers
• Syntax:
some variable in expression1 satisfies expression2
every variable in expression1 satisfies expression2
• Examples:
» some $x in //book satisfies $x/price < 200
» //book[some $x in author satisfies $x/@sex="female"]
» for $x in //department where every $y in $x/employee satisfies $y/salary > 1000 return $x/manager/name
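The `some` and `every` quantifiers behave like Python's built-in `any` and `all`. The sketch below replays the first two examples on hypothetical data; the records are invented for illustration.

```python
# Hypothetical data standing in for //book.
books = [
    {"price": 150, "authors": [{"sex": "female"}]},
    {"price": 250, "authors": [{"sex": "male"}]},
]

# some $x in //book satisfies $x/price < 200
some_cheap = any(b["price"] < 200 for b in books)

# every $x in //book satisfies $x/price < 200
every_cheap = all(b["price"] < 200 for b in books)

# //book[some $x in author satisfies $x/@sex = "female"]
with_female_author = [
    b for b in books if any(a["sex"] == "female" for a in b["authors"])
]

print(some_cheap, every_cheap, len(with_female_author))
```

Note that in Python `all` over an empty collection is true, so in the third example a department without employees would pass an `every` test under this reading.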
Sorting
• Syntax:
expression0 SORTBY ‘(‘ expression1 [ ASCENDING | DESCENDING ] , …., expressionK [ ASCENDING | DESCENDING ] ‘)’
• Semantics:
– Second order operator
– Stable sort using the comparison function defined on the domains 1..K
– The implicit self variable is allowed in expression1, …, expressionK
• Examples:
» //employee sortby (./name/firstname)
» //person sortby ( ./income descending, ./name ascending)
» for $x in //departments where count($x/employee) > 2000 return $x sortby (revenue)
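A stable multi-key sort with per-key direction, as in the second example, can be sketched with Python's stable `list.sort`. The helper name `sort_by`, its `(key_function, descending)` pair encoding, and the sample data are assumptions of this sketch.

```python
# Hypothetical data standing in for //person.
people = [
    {"name": "Cruz", "income": 50},
    {"name": "Abel", "income": 70},
    {"name": "Baker", "income": 50},
]


def sort_by(forest, *keys):
    """Sort by several keys; `keys` are (key_function, descending) pairs,
    most significant first."""
    result = list(forest)
    # The sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for key_function, descending in reversed(keys):
        result.sort(key=key_function, reverse=descending)
    return result


# //person sortby (./income descending, ./name ascending)
ranked = sort_by(
    people,
    (lambda p: p["income"], True),
    (lambda p: p["name"], False),
)
print([p["name"] for p in ranked])
```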
Global (document) order queries
• Syntax: expression1 ( before | after ) expression2
• Semantics:
– return all the roots of the first forest that are located before (resp. after) at least one root node in the second forest according to the global topological order of the document
• Examples:
– //incision before //anesthesia[1]
– //paragraph after //section[name="introduction"] before //paragraph[contains("Xquery")]
Element constructors (1)
• Normal XML elements:
<section title="Introduction">
  This is the introduction of the book entitled <title>Data on the Web</title> written by <author>Dan Suciu</author> <author>Peter Buneman</author> <author>Serge Abiteboul</author>.
</section>
• XML elements with dynamically computed data:
<section title=$s/title>
  "This is the introduction of the book entitled ", $s/ancestor::book/title, " written by ",
  for $a in $s/ancestor::book/author return <author> concat($a/firstname, $a/lastname) </author>
</section>
Element constructors (2)
• Example: “For each book with an author, return the book and its authors; for each book with an editor, return the book’s title and the editor’s affiliation”.
<bibliography>
  for $x in //book return
    if (empty($x/author))
    then <book> $x/title, $x/editor/affiliation </book>
    else <book> $x/title, $x/author </book>
</bibliography>
Pay attention to the deep-copy semantics!
Constructing other types of nodes
• Eight types of nodes:
– Documents, elements, attributes, values, references, namespaces, comments, processing-instructions
• Elements are constructed using an XML notation
• All the others use specific functions
– comment("Please look at this issue!")
– makeAttribute("age", 25)
FILTER
• Example: “Retrieve the table of contents of a specific book”
filter(document("input.xml")//book[@ISBN=10], //book | //section | //title | //section/title/text() )
• Copy from the input document only the book elements, the section elements, the section titles and their text content (but not their children)
• For the copied nodes, preserve their relative order and their hierarchical structure.
FILTER example
<?xml version="1.0"?>
<bib>
  …………………………….
  <book ISBN="10" year="1967">
    <title>The politics of experience</title>
    <author><firstname>R.D.</firstname><lastname>Laing</lastname></author>
    <section>
      <title>Persons and experience</title>
      The great and true Amphibian
      <section> Exploitation must not .... </section>
    </section>
  </book>
  ………………………..
</bib>
<?xml version="1.0"?>
<book>
  <title>The politics of experience</title>
  <section>
    <title>Persons and experience</title>
    <section> ..................... </section>
  </section>
</book>
Dealing with node identity
• All nodes in the data model have node identity
• Node identity is preserved through queries:
– All the constructs in XQuery preserve node identity, except
– the element constructor, which makes copies of the input nodes and generates new nodes with new identity
• Two nodes can be compared using the identity equality operator (‘==‘)
XQueries
• … we talked until now about expressions
• What is a query?
• An XQuery is defined as:
– A list of context definitions
– A list of function definitions
– A main expression
• The result of the query is the result of the evaluation of the main expression
• Context definition:
– Namespace definitions
Local function declarations
• Syntax:
function functionName ‘(‘ Parameter list ‘)’ return dataType ‘{‘ expression ‘}’
• Example:
function total_cost($x myNS:component) return xsd:float {
  if (simpleComponent($x)) then $x/price/data()
  else sum(for $y in $x/* return total_cost($y))
}
total_cost(/component[1])
• Functions can be recursive; no restrictions on the type of the recursion
• Functions obey the “implicit mapping rule”
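The recursive `total_cost` function has a direct counterpart in Python. This sketch uses `xml.etree.ElementTree`; the sample document and the assumption that a "simple" component is one without nested `component` children are mine, and the recursion here only descends into `component` children rather than all children.

```python
import xml.etree.ElementTree as ET


def total_cost(component):
    """A component without sub-components is simple and carries a price;
    otherwise its cost is the sum of the costs of its sub-components."""
    parts = component.findall("component")
    if not parts:
        return float(component.findtext("price"))
    return sum(total_cost(part) for part in parts)


# Hypothetical sample data, not taken from the slides.
doc = ET.fromstring("""
<component>
  <component><price>10.5</price></component>
  <component>
    <component><price>2</price></component>
    <component><price>3.5</price></component>
  </component>
</component>
""")
print(total_cost(doc))
```

As on the slide, nothing bounds the recursion except the shape of the data, so termination is the caller's responsibility.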
Static semantics for path expressions
”Retrieve the titles of all the books.”
• Input:
type Bib = bib [ Book* ]
type Book = book [ title [ String ], year [ Integer ], author [ String ]* ]
• Query: document("bib0.xml")/book/title
• Result:
<title>Data on the Web</title>
<title>Foundations of Databases</title>
: title [ String ]*
Static semantics for the iteration
Example: “Retrieve all the books written before 1967.”
• Query: for $v in document("bib0.xml")/book return if $v/year < 1967 then $v else []
• Result:
<book>…..</book>
<book>…..</book>
: book [ title [ String ], year [ Integer ], author [ String ]* ]
Joins
• Example: “For each book found at both amazon.com and bn.com, list the title of the book and the price from each vendor”.
<book-with-prices>
  for $a in document("amazon.xml")/book, $b in document("bn.xml")/book
  where $b/isbn=$a/isbn
  return
    <book>
      $a/title,
      <price-amazon>$a/price</price-amazon>,
      <price-bn>$b/price</price-bn>
    </book>
</book-with-prices>
Left-outer joins
• Example: “For each book found at amazon.com, list the title of the book and its price. If the book also appears at bn.com, list also the bn price”.
<book-with-prices>
  for $a in document("amazon.xml")/book
  return
    <book>
      $a/title,
      <price-amazon>$a/price</price-amazon>,
      for $b in document("bn.xml")/book
      where $b/isbn=$a/isbn
      return <price-bn>$b/price</price-bn>
    </book>
</book-with-prices>
Full-outer joins
• Example: “For each book found at either amazon.com or bn.com, list its price(s).”
let $allISBNs := distinct(document("amazon.xml")/book/isbn union document("bn.xml")/book/isbn)
return
<book-with-prices>
  for $isbn in $allISBNs
  return
    <book>
      ( for $a in document("amazon.xml")/book
        where $a/isbn=$isbn
        return <price-amazon>$a/price</price-amazon> ),
      ( for $b in document("bn.xml")/book
        where $b/isbn=$isbn
        return <price-bn>$b/price</price-bn> )
    </book>
</book-with-prices>
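The full-outer-join strategy (collect the distinct join keys from both sides, then look each key up on each side) can be sketched on in-memory data. The ISBN-to-price dictionaries and the function name `full_outer_join` are assumptions made for illustration.

```python
# Hypothetical isbn -> price catalogs for the two vendors.
amazon = {"111": 30.0, "222": 12.5}
bn = {"222": 11.0, "333": 8.0}


def full_outer_join(left, right):
    """One row per ISBN found on either side, with whichever prices exist."""
    rows = []
    # Distinct union of the join keys, in first-seen order.
    all_isbns = list(dict.fromkeys(list(left) + list(right)))
    for isbn in all_isbns:
        row = {"isbn": isbn}
        if isbn in left:
            row["price-amazon"] = left[isbn]
        if isbn in right:
            row["price-bn"] = right[isbn]
        rows.append(row)
    return rows


rows = full_outer_join(amazon, bn)
for row in rows:
    print(row)
```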
Group-by and Having
• Example: “For each author with more than 10 books, list the name of the author and the list of the first 10 books that he/she wrote”.
for $a in distinct(//author)
let $books := ( for $b in //book[author=$a] return $b )
where count($books) > 10
return <result> $a/name, $books[1 to 10] </result>
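The group-by/having pattern (group, filter on the group size, truncate the group) can be sketched as follows. The data, the function name `prolific_authors`, and its `threshold` and `limit` parameters are assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical (author, title) pairs: one author with 12 books, one with 1.
books = [("Laing", f"Book {i}") for i in range(12)] + [("Codd", "Relational model")]


def prolific_authors(pairs, threshold=10, limit=10):
    """Group titles by author, keep authors with more than `threshold`
    books, and return at most `limit` titles for each of them."""
    by_author = defaultdict(list)
    for author, title in pairs:
        by_author[author].append(title)
    return {
        author: titles[:limit]
        for author, titles in by_author.items()
        if len(titles) > threshold
    }


result = prolific_authors(books)
print(result)
```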
Views and parameterized views
• Support for views is a must
• Views are supported via functions
• Non-parameterized views are functions with no arguments; parameterized views are functions with at least one argument
• Xquery can support recursive views (unrestricted form of recursion)
• Termination is ensured by the programmer
Open issues
• Three-valued logic:
– XML Schema supports elements with nil content
– XQuery has to deal with the absence of information
• Extensibility:
– Some functions will be written in programming languages other than XQuery
– How are those functions declared and invoked in XQuery?
• Exceptions and exception handling mechanisms:
– What is the semantics of a query in case of exceptions?
– What is the semantics of Boolean operators in case of exceptions?
– How should we raise and catch exceptions?
• Type coercion rules:
– XML has no mandatory Schema; does this imply that data should be converted on the fly to the types expected by the operators?
– E.g. lists to singletons, strings to float, float to string
Implicit type casting in XPath 1.0
• Data model has 4 types:
– untyped set, string, integer, Boolean
• The evaluation uses implicit type casting rules:
/person[child/age = 19] : implicit existential quantifier
/person[child/age + 1 = 20] : the age of the first child equals 19
/book[@year] : implicit existential quantifier
/book[@year+1-1] : two type conversions: string->int, int->Boolean; will return a book written in 1999 if it happens that this is the 1999th book in the document
/book[title=""] : empty set to string conversion; returns also the books without a <title> subelement
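The implicit existential quantifier in a comparison such as `child/age = 19` can be spelled out explicitly. The sketch below models a node set as a list of string values; the function name `nodeset_equals` and that encoding are assumptions, and only the comparison of a node set with a number is covered.

```python
def nodeset_equals(values, target):
    """Existential comparison of a node set with a number: true if at
    least one string value converts to a number equal to `target`."""
    for text in values:
        try:
            if float(text) == target:
                return True
        except ValueError:
            continue   # a non-numeric string value never compares equal
    return False


ages = ["12", "19", "n/a"]          # hypothetical string values of child/age
print(nodeset_equals(ages, 19))     # one child aged 19 is enough
print(nodeset_equals(["12"], 19))
print(nodeset_equals([], 19))       # no child at all: nothing can match
```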
XML patterns and pattern matching
• UnQL, XML-QL, YATL
• Example:
– “Retrieve the titles of the books written by Laing before 1967”
WHERE <bib>
        <book year=$y ISBN=$isbn>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml", $y<1967
CONSTRUCT <resultBook ISBN=$isbn> <resultTitle> $t </resultTitle> </resultBook>
• No distinction between For and Where
• Pattern matching semantics
Skolem functions
• UnQL, XML-QL, Lorel
• Example:
– “Retrieve the titles of all the books, grouped by year of publication”
WHERE <bib>
        <book year=$y>
          <title> $t </title>
        </book>
      </bib> in "bib.xml"
CONSTRUCT <groupPerYear id=F($y)> <resultTitle> $t </resultTitle> </groupPerYear>
Vertical regular expression
• UnQL, XML-QL, Lorel, YATL
• Example:
– “Retrieve the titles of all the sections or chapters”
WHERE <bib>
        <book>
          < (section | chapter) * >
            <title> $t </title>
          </>
        </book>
      </bib> in "bib.xml"
CONSTRUCT <resultTitle> $t </resultTitle>
Horizontal regular expressions
• YATL
• A Tree Pattern = type expression without union, and with annotated variables ($v)
book [ title [ String ], author [ String ]+, UrTree* ]
book($b) [ title [ $t ], +author [ $a ]+, _ ]
• Example: “Retrieve the first author after the book title”
MAKE $a
MATCH book WITH book [ _ , title , _, author[$a] , *author, _ ]
• Process DTDs like: <!ELEMENT bib (title, author+)*>
• Example: “Create a bibliography for each author”
MAKE *($a) bib [ author [ $a ], *title [ $t ] ]
MATCH bib WITH bib[*(title [ $t ], +author [ $a ] )]
XML-related research problems(1)
• Update languages for XML
• XML views of object-relational databases
• Storing XML data in object-relational DBMSs
– new challenges for the traditional DBMSs and for SQL
• Alternative storage methods for XML data
• Indexing XML
• Query processing algorithms for XQuery
• Efficient (streamed) processing of XML transformations
• Mixing structured search with full-text search
• Distributed execution of XML queries
• XML benchmarks
XML-related research problems (2)
• XML-based information mediation
• XML data cleaning
• XML data compression
• XML-based information brokering
• XML-based workflow systems
• XML scripting languages
and many more...
Conclusion
• XML is the lingua franca of the Web
• XML is the next big challenge for the database community
• Large quantities of a new type of data
– textual, irregular, self-organizing, distributed, replicated, etc.
• Many orders of magnitude larger:
– the volume of XML data
– the number of XML data repositories
• We now have good quality standards:
– XML data model, XML schemas, XML query and transformation languages
• Very clear need from the industry
• Extraordinary opportunity for database research!
XSLT (1)
• Paper:
– “XSL Transformations (XSLT)”, W3C recommendation
• XML to XML rule based transformation language
• An XSLT program is an XML document itself
[Figure: a bib tree of book nodes with title, publisher and author children (“The divided self”, “Pantheon Books”, “R.D. Laing”), drawn three times and labelled data, transformation, result; further labels: DOM, XML, HTML]
XSLT(2)
• An XSLT program is a valid XML document containing:
– elements in the <xsl:> namespace (i.e. the XSLT statements)
– elements in other namespaces (i.e. the user-defined data)
• The result of the evaluation of an XSLT program on an input XML document := the XSLT document where each <xsl:> element has been replaced with the result of its “evaluation”
• Uses Xpath as a sublanguage
• Used mostly as a stylesheet language
XSLT programs
• An XSLT program
– is an element of type <xsl:stylesheet>
1. XSL elements describing rewriting rules
– <xsl:template>
2. XSL elements describing rule execution control
– <xsl:apply-templates>
– <xsl:call-template>
3. XSL elements describing instructions
– <xsl:element>, <xsl:attribute>, <xsl:for-each>, <xsl:if>, <xsl:copy>, <xsl:copy-of>, <xsl:sort>, <xsl:value-of>, etc.
XSLT processing model
• Process an XML document (procedure PD):
1. Apply the procedure PL (below) to a list with a single node: the root of the document
• Process a list L of nodes (procedure PL):
1. Process each node N (procedure P below) in the list (with current node=N and current list=L)
2. Return the concatenation (in the right order) of the partial results
PL([x1, x2, …, xn]) = [ P(x1), P(x2), …, P(xn) ]
• Process a node N (procedure P):
1. Find all templates applicable to the node N
2. Find the “best” template among them
3. Instantiate the content of the template
4. Return this result
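The three procedures PD, PL and P can be sketched as mutually recursive Python functions. This is an illustration of the control flow only, not an XSLT processor: encoding a template as a (priority, match predicate, instantiate function) triple, picking the "best" template by highest priority, and recursing on the children when no template applies are all assumptions of the sketch.

```python
import xml.etree.ElementTree as ET


def process_list(nodes, templates):
    """Procedure PL: concatenate, in order, the results of processing each node."""
    out = []
    for node in nodes:
        out.extend(process_node(node, templates))
    return out


def process_node(node, templates):
    """Procedure P: find the applicable templates, pick the best, instantiate it."""
    applicable = [t for t in templates if t[1](node)]
    if not applicable:
        # Assumed fallback: recurse on the children of the node.
        return process_list(list(node), templates)
    _, _, instantiate = max(applicable, key=lambda t: t[0])
    return instantiate(node, templates)


def process_document(root, templates):
    """Procedure PD: apply PL to a list containing only the root."""
    return process_list([root], templates)


# Hypothetical templates: (priority, match predicate, instantiate function).
templates = [
    (0, lambda n: n.tag == "book",
     lambda n, ts: ["Book: " + n.findtext("title")]
     + process_list(n.findall("section"), ts)),
    (0, lambda n: n.tag == "section",
     lambda n, ts: ["Section: " + n.findtext("title")]),
]

doc = ET.fromstring(
    "<bib><book><title>The politics of experience</title>"
    "<section><title>Persons and experience</title></section></book></bib>"
)
result = process_document(doc, templates)
print(result)
```

The book template calls `process_list` on its sections, which plays the role of a recursive call back into the rule engine.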
<xsl:template>
• Basic XSLT concept: describes a rewriting rule
• It has:
– attributes to describe the acceptable input
– content to describe the output
• Attributes:
– match: XPath expression describing the elements to which this template applies
– name: the name of the template rule
– priority: guides the choice of the best template to apply
• The content is a legal XML fragment with:
– Elements from the xsl namespace
– Other elements (user data)
<xsl:template> example
<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear">
      <xsl:value-of select="./@year"/>
    </xsl:attribute>
    The title of this book is
    <resultTitle>
      <xsl:value-of select="./title"/>
    </resultTitle>
    and it was....
  </resultBook>
</xsl:template>
Instantiating an <xsl:template>
• ... on a node N:
» returns the content of the template where the <xsl:> elements from the content of the template have been replaced with the result of their “evaluation” (with the current node=N)
» Two types of <xsl:> elements in the content:
1. Instruction elements
» <xsl:copy>, <xsl:copy-of>, <xsl:value-of>, <xsl:for-each>
» return a certain list of nodes according to their particular semantics
2. Rule control elements
» <xsl:apply-templates>, <xsl:call-template>
» recursive calls to the rule engine (see below)
• Maps an XML node into a list of XML nodes
Example of instantiation
Input XML
<book ISBN="10" year="1967">
  <title>The politics of experience</title>
  <author>R.D.Laing</author>
  <section>
    The great and tr
    <title>Persons and experience</title>
    <section> Exploitation must not been…. </section>
  </section>
</book>
Output XML
<resultBook resultYear="1967">
  The title of this book is
  <resultTitle>The politics of experience</resultTitle>
  and it was ….
</resultBook>
Recursive <xsl:template>
<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear">
      <xsl:value-of select="./@year"/>
    </xsl:attribute>
    <resultTitle>
      <xsl:value-of select="./title"/>
    </resultTitle>
    <xsl:apply-templates select="./section"/>
  </resultBook>
</xsl:template>
Invokes the procedure PL with current list = "./section".

Recursive calls
• <xsl:apply-templates>
  – invokes recursively the procedure PL
  – the argument is a new list of nodes
    » explicitly specified in the select attribute
    » by default, the list of children of the current node
  <xsl:apply-templates select="./section"/>
• <xsl:call-template>
  – triggers the instantiation of a specific template identified by name
  – does not change the context node and the context list
  <xsl:call-template name="myTemplate"/>

XSLT execution control
<xsl:stylesheet>
  <xsl:template name="myTemplate">
    <xsl:apply-templates select="./ancestor::book"/>
  </xsl:template>

  <xsl:template match="section">
    This is a section of the book
    <xsl:call-template name="myTemplate"/>
    and its name is <xsl:value-of select="./title"/>.
  </xsl:template>

  <xsl:template match="book">
    <xsl:value-of select="./title"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="//section[title]"/>
  </xsl:template>
</xsl:stylesheet>

Built-in templates
If element or root: apply recursively on the children
<xsl:template match="*|/">
  <xsl:apply-templates select="./node()"/>
</xsl:template>

If text node or attribute: print the content
<xsl:template match="@*|text()">
  <xsl:value-of select="."/>
</xsl:template>

If processing instruction or comment: ignore (do nothing)
<xsl:template match="processing-instruction()|comment()"/>
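The effect of running only the built-in rules can be sketched in Python: elements recurse on their children and text nodes are printed, so the output is the concatenated text of the document. The function name builtin and the sample document are assumptions made for illustration; attributes do not appear because the default child list does not include them.

```python
import xml.etree.ElementTree as ET

def builtin(node):
    """Mimic the built-in rules on an element: recurse on children, print text."""
    out = [node.text or ""]            # text node before the first child element
    for child in node:
        out.append(builtin(child))     # match="*|/": apply templates to the children
        out.append(child.tail or "")   # text node that follows this child
    return "".join(out)

doc = ET.fromstring(
    "<book year='1967'><title>The divided self</title> by "
    "<author>R.D. Laing</author></book>")
text = builtin(doc)
print(text)
```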

TOC of a certain book
<xsl:template match="/">
  <xsl:apply-templates select="//book[@ISBN=10]"/>
</xsl:template>

<xsl:template match="book">
  <xsl:apply-templates select="./section"/>
</xsl:template>

<xsl:template match="section">
  Section <xsl:value-of select="title"/>
  <xsl:apply-templates select="./section"/>
</xsl:template>

XSLT
• Like XQuery, it describes general XML to XML transformations
• Built-in processing model
• Full recursion
• Possible to write non-terminating programs even on trees
• XSLT vs. XQuery
  – same expressive power
  – differences: programming style, XML vs. non-XML syntax
• Could be considered as a query language
• Is it “declarative”?

Part IV
Data Manipulation Language

Query languages for XML
• problem definition
• overview of different approaches
• overview of representative research languages
  – query languages for semistructured data
  – research and industry query languages for XML
• status of the XML Query Working Group
  – XML Query Algebra (working draft)
  – XQuery: a query language for XML (working draft)

In search of a query language...
• What do we call a query language?
  The language used to describe, in a declarative fashion, the mapping from an input instance of the data model to an output instance of the data model.

XML vs. graph-based models
• XML document content could be modeled as a graph
  – components (elements, attributes) in a hierarchical structure
• ...but XML is more complicated than that
  – several distinct types of nodes
    » text, elements, attributes, comments, processing instructions, etc.
  – some parts are ordered (e.g. children of an element) and some other parts are not ordered (e.g. attributes)
  – in the absence of a DTD or schema, the document is a tree; otherwise it could be a graph
• We will consider not only XML query languages, but also query languages for graph-based data

Some relevant query languages
• Query languages for graph data
  e.g. GOOD, GraphLog, Clean
• Query languages for the Web
  e.g. WebSQL, WebOQL
• Query languages for semi-structured data
  e.g. MSL, UnQL, StruQL
• Research query languages for XML
  e.g. XML-QL, Lorel, YATL, XML-GL, Quilt, XDuce
• Industry query languages for XML
  e.g. XQL, OQL extensions to query SGML documents
• Standard processing languages for XML (W3C standards)
  e.g. XPath, XSLT
“XML Query Languages: Experiences and Exemplars”, M. Fernandez, J. Simeon, P. Wadler
“Comparative Analysis of Five XML Query Languages”, Angela Bonifati, Stefano Ceri

XML languages: the big picture
[Figure: languages placed by expressive power (navigation & selection; SPJ+RegExp; SPJ+RegExpr+grouping; OQL+RegExpr; OQL+conditional+full recursion) against data model (simple graphs; idealized XML data model; real XML). Languages shown: UnQL (1), XML-QL (2), Lorel (3), YATL (4), XPath (5), XQuery (6), XSLT (7).]

DDL Roadmap
3.1. XPath
  > Building block for several other languages
3.2. XQuery and the XML Query Algebra
  > Both working drafts
  > Design based on requirements and use cases
3.3. Other languages and features
  > XML-QL, Lorel, YATL, XDuce, etc.
  > Focusing on specific features
3.4. XSLT
  > Already a W3C recommendation
  > Already widely used

XPath: Overview
• Syntax for XML document navigation and node selection
• Papers:
  – “XML Path Language (XPath)”, W3C recommendation
• Building block for other W3C activities:
  – XSL Transformations (XSLT)
  – XML Link (XLink)
  – XML Pointer (XPointer)
  – XML Query (XQuery)

XPath Expressions
• A query is an expression (Location Path)
  – describes a single navigation path in an XML document
• A query simply selects a list of nodes from the input document
• A Location Path consists of:
  – a context node
  – a series of Location Steps separated by /
• A verbose Location Step consists of:
  – an axis, a node test, a list of predicates
document("bib.xml") / child::book [./attribute::ISBN=10] / descendant::section [position()=1]

XPath
• Location step:
  – an axis, a node test, a list of predicates
• 13 Axes:
  – ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
• Node Test:
  – name test (e.g. section, *, myNs:myTag)
  – type test (e.g. text(), comment(), node())
document("bib.xml") / child::bib / child::* [./attribute::ISBN=10] / descendant::section [position()=1] / child::comment()
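A location path can be read as a pipeline of steps, each taking a list of context nodes and producing a new list. The sketch below hand-codes two axes with a name test over xml.etree.ElementTree; the helper names child and descendant and the sample document are assumptions, and the ISBN test compares strings rather than numbers.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book ISBN="10">
    <section><title>A</title><section><title>A.1</title></section></section>
    <section><title>B</title></section>
  </book>
  <book ISBN="11"><section><title>C</title></section></book>
</bib>""")

def child(nodes, name):
    """Axis child:: combined with a name test."""
    return [c for n in nodes for c in n if c.tag == name]

def descendant(nodes, name):
    """Axis descendant:: combined with a name test, in document order."""
    return [d for n in nodes for d in n.iter(name) if d is not n]

# child::book [./attribute::ISBN=10] / descendant::section [position()=1]
books = [b for b in child([bib], "book") if b.get("ISBN") == "10"]
# The positional predicate is evaluated once per context node (per book).
sections = [s for b in books for s in descendant([b], "section")[:1]]
selected = [s.findtext("title") for s in sections]
print(selected)
```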

XPath abbreviated syntax
Abbreviated            Unabbreviated
book                   CN/child::book
book/@ISBN             CN/child::book/attribute::ISBN
section[1]             CN/child::section[position()=1]
.                      CN
..                     CN/parent::*
../text()              CN/parent::*/child::text()
//section              ROOT/descendant-or-self::section
/section               ROOT/child::section
//                     ROOT/descendant-or-self::*
//section[last()]      ROOT/descendant-or-self::section[position()=last()]
Predicate order matters; these two expressions are not equivalent:
//section [5] [title="introduction"]
//section [title="introduction"] [5]
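Why the order of predicates matters can be seen by treating each predicate as a filter over the current node list. The sketch below is deliberately simplified: it works on one hypothetical list of sibling sections (represented as dictionaries) and uses position 2 instead of 5 to keep the data short; the helper names nth and titled are assumptions.

```python
def nth(nodes, n):
    """Positional predicate [n]: keep the n-th node (1-based) of the current list."""
    return nodes[n - 1:n]

def titled(nodes, title):
    """Predicate [title="..."]: keep the nodes whose title equals `title`."""
    return [node for node in nodes if node["title"] == title]

# Hypothetical sibling sections, in document order.
names = ["preface", "introduction", "model", "introduction", "results"]
sections = [{"id": i, "title": t} for i, t in enumerate(names, start=1)]

# section[2][title="introduction"]: take the 2nd section, then test its title.
first = titled(nth(sections, 2), "introduction")
# section[title="introduction"][2]: keep the introductions, then take the 2nd one.
second = nth(titled(sections, "introduction"), 2)
print([s["id"] for s in first], [s["id"] for s in second])
```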

Semantic aspects of XPath
• Data model has 4 types:
  – untyped set, string, integer, Boolean
• The evaluation uses implicit type casting rules:
  /person[child/age = 19]
    implicit existential quantifier
  /person[child/age + 1 = 20]
    the age of the first child equal 19
  /book[@year]
    implicit existential quantifier
  /book[@year+1-1]
    two type conversions: string->int, int->Boolean
    will return a book written in 1999 if it happens that this is the 1999th book in the document
  /book[title=""]
    empty set to string conversion
    returns also the books without a <title> sub-element
  preceding::foo[1] and (preceding::foo)[1] are not the same
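The first two rows of the slide can be contrasted with a small Python sketch. It assumes the selected child/age nodes are given as a list of their string values; the function names are hypothetical.

```python
def equals_any(values, target):
    """child/age = 19: true if SOME selected node has the value 19."""
    return any(float(v) == target for v in values)

def first_plus_one_equals(values, target):
    """child/age + 1 = 20: the arithmetic uses only the FIRST selected node."""
    return bool(values) and float(values[0]) + 1 == target

ages = ["25", "19"]   # string values of the hypothetical child/age nodes
print(equals_any(ages, 19), first_plus_one_equals(ages, 20))
```

With these values the first test succeeds because one child is 19, while the second fails because the first child is 25.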

XML Query Working Group
• XML Query Requirements (WD)
  – What should be achieved with the language
• XML Query Use Cases (WD)
  – Many examples of queries for a lot of applications
• XML Query Algebra (WD)
  – Formal basis for the language(s)
• XQuery: a traditional syntax (WD)
• An XML syntax (Not here yet)

XML Query Requirements
• Declarative
• Expressive (joins, manipulation of documents, etc.)
  – Supporting both database applications and document applications
• Formally specified
  – Precise semantics
• Two syntaxes: ‘user-readable’ and XML
• Should allow updates in the future

XML Query Use Cases
• Illustrate the Query language with examples
  – Access to relational databases
  – Access to documents
  – Full-text queries
  – Recursive queries
  – Queries that use references
  – Metadata querying
  – Etc.
• Decide what XQuery should and should not do
  – Make 80/20 cut
• ‘Benchmark’ for the language design
  – Important queries should be easy to write

XML Query Algebra
• Based on XML Query Data Model
• ‘Minimal’ set of operations
• Static semantics (type checking)
  – Can infer the type of your query
• Dynamic semantics (result of the query)
• Expressive enough to support XQuery
  – Iteration (and join)
  – Navigation
  – Functions with full recursion
• Contains a tutorial on types and expressions

Static semantics for path expressions
“Retrieve the titles of all the books.”
• Input:
    type Bib = bib [ Book* ]
    type Book = book [ title [ String ], year [ Integer ], author [ String ]* ]
• Query:
    document("bib0.xml")/book/title
• Result:
    <title>Data on the Web</title>
    <title>Foundations of Databases</title>
    : title[String]*

Static semantics for the iteration
Example: “Retrieve all the books written before 1967.”
• Query:
    for $v in document("bib0.xml")/book return
      if $v/year < 1967 then $v else []
• Result:
    <book>…..</book>
    <book>…..</book>
    : book[ title [ String ], year [ Integer ], author [ String ]* ]

XQuery
• First Working Draft in February
• Coming from work on Quilt
  – Already a number of test implementations
• Supports XML Query use cases
• Draft of semantics on top of the XML Query Algebra
• Test parsers are available

XQuery
• Data model:
  – the XML Query working group data model
• Language description:
  – borrows features from OQL, XML-QL, Lorel, XQL, ML
  – like ML, OQL, Lorel: it is a functional language
  – includes a subset of XPath as a sub-language
  – like ML, it uses IF-THEN-ELSE and LET constructs
  – like YATL, it uses local function definitions
  – like XQL, it uses BEFORE and AFTER operators (global topological order of the XML document)
  – new FILTER operator to do projection while preserving the hierarchy and the order

XQuery
• A query := a list of local function definitions + the main expression to evaluate
• An XQuery expression:
  – constant (all XML Schema atomic types)
  – variable
  – f(exp1, ..., expn)
    » +, -, and, or, union, intersection, etc.
  – LET var := expr1 in expr2
  – XPath expression (for navigation)
  – FLWR expression
  – SORT expr1 by expr2
  – XML node constructors (elements, attributes, etc.)

XPath in XQuery
• Query1: “Retrieve the titles of all the books written before 1967.”
    document("bib.xml")//book[@year<1967]/title
• An XPath expression is an XQuery expression
• Returns the selected forest of the input document
• XPath queries can be used as building blocks for more complex expressions

FLWR expressions
• Query1: “Retrieve the titles of the books written by Laing before 1967, together with their reviews.”

FOR $b in document("bib.xml")//book[@year<1967],
    $r in document("reviews.xml")//review
WHERE $b/authors/lastname="Laing" and $b/@ISBN=$r/@ISBN
RETURN
  <resultBook ISBN=$b/@ISBN>
    <title> $b/title/text() </title>,
    $r
  </resultBook>
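Operationally, this FLWR expression reads as nested iteration with a filter and a constructor. The sketch below spells that out with xml.etree.ElementTree over two small inline documents; the sample data is invented for illustration, and the author test looks only at the first lastname rather than being fully existential.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book ISBN="10" year="1960"><title>T1</title>
    <authors><lastname>Laing</lastname></authors></book>
  <book ISBN="11" year="1970"><title>T2</title>
    <authors><lastname>Laing</lastname></authors></book>
</bib>""")
reviews = ET.fromstring("""<reviews>
  <review ISBN="10">R1</review>
  <review ISBN="11">R2</review>
</reviews>""")

results = []
for b in bib.iter("book"):                      # FOR $b in ...//book
    if int(b.get("year")) >= 1967:              #   predicate [@year<1967]
        continue
    for r in reviews.iter("review"):            # FOR $r in ...//review
        # WHERE $b/authors/lastname="Laing" and $b/@ISBN=$r/@ISBN
        if (b.findtext("authors/lastname") == "Laing"
                and b.get("ISBN") == r.get("ISBN")):
            out = ET.Element("resultBook", ISBN=b.get("ISBN"))   # RETURN
            ET.SubElement(out, "title").text = b.findtext("title")
            out.append(r)
            results.append(out)

print([ET.tostring(e, encoding="unicode") for e in results])
```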

Local variables
• Query1: “Retrieve the titles of the books written by Laing before 1967 together with their reviews.”

FOR $b in document("input.xml")//book[@year<1967]
LET $R := document("input.xml")//review[@ISBN=$b/@ISBN]
WHERE $b/authors/lastname="Laing"
RETURN
  <resultBook ISBN=$b/@ISBN>
    <resultTitle> $b/title/text() </resultTitle>
    <bookReviews> $R </bookReviews>
  </resultBook>

Global order operators
• Query4: “Retrieve the titles of the first 4 sections (and of their subsections) of a specific book.”

LET $b := /bib/book[@ISBN=10] IN
    $b//section/title BEFORE $b/section[5]

  /bib/book[@ISBN=10]: the book with ISBN = 10
  $b//section/title: the list of all the titles of the sections of the book $b
  $b/section[5]: the fifth section of the book $b
  Result: the list of all the titles that appear before the fifth section (in the global topological order of the document)
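The BEFORE operator can be approximated in Python by numbering the nodes in document order and comparing positions. This is a sketch over a generated sample book with six top-level sections, each with one subsection; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: sections S1..S6, each with one subsection Si.1.
book = ET.fromstring(
    "<book ISBN='10'>"
    + "".join(f"<section><title>S{i}</title>"
              f"<section><title>S{i}.1</title></section></section>"
              for i in range(1, 7))
    + "</book>")

# Global document order: position of every element in a preorder traversal.
order = {node: pos for pos, node in enumerate(book.iter())}

fifth = book.findall("section")[4]                       # $b/section[5]
titles = [t for s in book.iter("section")                # $b//section/title
          for t in s.findall("title")]
before = [t.text for t in titles if order[t] < order[fifth]]   # ... BEFORE ...
print(before)
```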

FILTER
• Query1: “Retrieve the table of contents of a specific book”

document("input.xml")//book[@ISBN=10]
  FILTER //book | //section | //title | //section/title/text()

• Erase all the nodes from the input document except the book element, the section elements, the section titles and their text content
• For the remaining nodes, preserve their relative order and their hierarchical structure.

FILTER example
Input document:
<?xml version="1.0"?>
<bib>
  …………………………….
  <book ISBN="10" year="1967">
    <title>The politics of experience</title>
    <author>
      <firstname>R.D.</firstname>
      <lastname>Laing</lastname>
    </author>
    <section>
      <title>Persons and experience</title>
      The great and true Amphibian
      <section>
        Exploitation must not ....
      </section>
    </section>
  </book>
  ………………………..
</bib>
Result:
<?xml version="1.0"?>
<book>
  <title>The politics of experience</title>
  <section>
    <title>Persons and experience</title>
    <section> ..................... </section>
  </section>
</book>
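The projection performed by FILTER can be sketched as a recursive copy that drops unwanted nodes while keeping the survivors in their original order and nesting. The code below is an approximation, not the operator's definition: the name project and the KEEP set are assumptions, attributes are dropped, and the text of every title is kept.

```python
import xml.etree.ElementTree as ET

KEEP = {"book", "section", "title"}      # //book | //section | //title

def project(node):
    """Return copies of the kept nodes found at or below `node`, in document order."""
    kept_children = [k for child in node for k in project(child)]
    if node.tag not in KEEP:
        return kept_children             # node erased, its survivors move up
    copy = ET.Element(node.tag)
    if node.tag == "title":              # keep the text content of titles
        copy.text = node.text
    copy.extend(kept_children)
    return [copy]

book = ET.fromstring(
    "<book ISBN='10' year='1967'><title>The politics of experience</title>"
    "<author><lastname>Laing</lastname></author>"
    "<section><title>Persons and experience</title>The great and true Amphibian"
    "<section>Exploitation must not ....</section></section></book>")
toc = project(book)[0]
print(ET.tostring(toc, encoding="unicode"))
```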

XQuery: conclusion
• XQuery design goals:
  – learn from previous experience
  – keep it simple
  – make sure it is useful
  – make sure it is semantically clean :)
• Still many issues:
  – Which additional features to add (full regular expressions, text operators, etc.)
  – Relationship with XPath
  – Relationship with XML Query Algebra
  – Relationship with XML Schema

UnQL(1)
• Authors:
  – P. Buneman, D. Suciu, M. Fernandez
• Papers:
  – “UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion”, P. Buneman, M. Fernandez and D. Suciu, VLDB Journal 9(1), 2000.
  – More information at: http://www.research.att.com/~suciu/unql-home.html

UnQL(2)
• Initial data model:
  – trees with labeled edges and labeled leaves
• A query = a function
  – takes a tree as input and returns a tree as output
• Language description:
  – based on structural recursion
“The form of the program follows the form of the data.”

UnQL tree data model
• 4 constructs to build a tree
  (1) the empty set is a tree (with no nodes and no edges)
  (2) if V is a value then {V} is a tree (leaf node)
  (3) if T is a tree and L is a label then {L:T} is a tree (edge construction)
  (4) if T1 and T2 are trees then T1 U T2 is a tree (union)
{book: {title: "The divided self"} {author: "R.D.Laing"} {publisher: "Pantheon Books"}}
[Figure: the bib tree, with book edges leading to title, author and publisher edges and the leaves “The divided self”, “R.D. Laing” and “Pantheon Books”.]

UnQL query language
• A query = a function
• A function = an ordered set of rules
• A rule:
  – left-hand side:
    » a pattern: when the rule has to be applied
  – right-hand side:
    » an expression that describes how to create the resulting tree
• 4 types of patterns
    F({"a"}) = {"A"}
    F({"b": T}) = {"B": F(T)}
• Syntactic restrictions on the expression in the right-hand side in order to guarantee nice behavior

UnQL in action (3)
• Query1: “Retrieve the titles of all the books”.
    F({L:T}) = if L="title" then {"result":T} else F(T)     (specific rules)
    --------------------
    F(T1 U T2) = F(T1) U F(T2)                               (fixed in the language)
    F({}) = {}                                               (fixed in the language)
[Figure: the bib tree from the data model slide, with book, title, author and publisher edges.]
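Structural recursion translates almost line by line into Python once a tree is encoded as a list whose items are either leaf values or (label, subtree) pairs, with union as list concatenation. That encoding and the function name titles are assumptions of this sketch; leaves contribute nothing here, which the slide does not state explicitly.

```python
def titles(tree):
    """F from Query1 over a tree encoded as a list of edges and leaf values."""
    result = []                                      # F({}) = {}
    for item in tree:                                # F(T1 U T2) = F(T1) U F(T2)
        if isinstance(item, tuple):                  # an edge {L: T}
            label, subtree = item
            if label == "title":
                result.append(("result", subtree))   # {"result": T}
            else:
                result.extend(titles(subtree))       # F(T)
        # a leaf {V} matches no rule in this sketch and contributes nothing
    return result

bib = [("bib", [
    ("book", [("title", ["The divided self"]),
              ("author", ["R.D. Laing"]),
              ("publisher", ["Pantheon Books"])]),
    ("book", [("title", ["The politics of experience"])]),
])]
print(titles(bib))
```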

UnQL in action (4)
• Query2: “Copy the document while translating the edge labels into French and omitting the sections and their descendants.”
    F(T1 U T2) = F(T1) U F(T2)
    F({}) = {}
    --------------------
    F({"book":T}) = {"livre": F(T)}
    F({"title":T}) = {"titre": F(T)}
    F({"year":T}) = {"annee": F(T)}
    F({L:T}) = {}
    F({V}) = V
[Figure: F maps the tree T, rooted at a book edge, to the tree F(T), rooted at a livre edge.]

Alternative SELECT-WHERE syntax
• Query2: “Copy the books written before 1967 while translating the edge labels into French and omitting the sections and their descendants.”
    SELECT {livre: {titre: T} {annee: Y}}                 /* output tree pattern */
    WHERE  {bib: {book: {title: T} {year: Y}}} in db,     /* input tree pattern */
           Y < 1967
• Can be translated into the previous formalism
[Figure: table of variable bindings with columns T and Y.]

Vertical regular expressions
• Introduced by POQL (INRIA)
• Query4: “Retrieve the books that have a section or a chapter entitled ‘Persons and experience’”
    SELECT {title: T}
    WHERE {bib: {book: {title: T}
                       {(section|chapter)*.title: "Persons and experience"}}} in db
Any regular expression can be expressed using structural recursion

Cyclic data in UnQL
[Figure: a graph of book edges with title, publisher and author edges (leaves “The divided self”, “R.D. Laing”, “Western studies”), where citation edges between books create cycles.]
• Normal evaluation would create infinite loops
• Two (equivalent) solutions:
  – memoization (do not visit the same node twice)
  – bulk semantics (apply the function on each edge in parallel and group the resulting graph at the end)
    F({"title":T}) = {"result":T}
    F({L:T}) = F(T)
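The first solution can be sketched in Python with a visited set that stops the traversal from entering a node twice. The graph encoding (a dictionary from node id to labeled edges), the node ids and the function name collect_titles are assumptions; a full memoization would also cache each node's result, which is not needed here because results are combined by union.

```python
def collect_titles(graph, node, visited=None):
    """Apply F({"title":T}) = {"result":T}, F({L:T}) = F(T) without looping."""
    if visited is None:
        visited = set()
    if node in visited:              # do not visit the same node twice
        return []
    visited.add(node)
    result = []
    for label, target in graph.get(node, []):
        if label == "title":
            result.append(("result", target))
        else:
            result.extend(collect_titles(graph, target, visited))
    return result

# Hypothetical graph: two books that cite each other, which forms a cycle.
graph = {
    "bib": [("book", "b1"), ("book", "b2")],
    "b1": [("title", "The divided self"), ("citation", "b2")],
    "b2": [("title", "Western studies"), ("citation", "b1")],
}
found = collect_titles(graph, "bib")
print(found)
```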

UnQL: final conclusion
• Structural recursion as a programming style
• Defined on trees but also on cyclic data
• Well defined semantics
• Well studied properties
  – expressive power (FO+TC)
  – computable in PTIME
  – compositional: q1 o q2 = q3
  – allows for traditional optimization
  – structural recursion guarantees termination even for cyclic data
• Very interesting study but not usable as such for XML. XML is not a simple graph.

XML-QL(1)
• Authors:
  – A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu
• Papers:
  – “XML-QL: a Query Language for XML”, A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu, Proc. Int. Conf. of WWW, 1999.
• Implementation:
  – available at http://www.research.att.com/~mff/xmlql/doc
  – home-grown main memory XML data repository
  – query optimizer and execution engine

XML-QL(2)
• Data model:
  – node and edge labeled graph (elements & attributes)
  – a (totally) ordered or a (totally) unordered graph
• Language description:
  – WHERE clause to bind variables and to test predicates
  – CONSTRUCT clause to create new XML structures
• Features:
  – as UnQL: XML patterns for both the WHERE clause and the CONSTRUCT clause
  – as UnQL: regular expressions for navigation
  – in addition: joins on multiple input sources
  – in addition: Skolem functions to create nested structures

XML patterns
• Query1: “Retrieve the titles of the books written by Laing before 1967”

WHERE <bib>
        <book year=$y ISBN=$isbn>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml",
      $y < 1967
CONSTRUCT <resultBook ISBN=$isbn>
            <resultTitle> $t </resultTitle>
          </resultBook>
[Figure: table of variable bindings with columns $y, $isbn and $t.]

Joins in XML-QL
• Query2: “Retrieve all the reviews about books written by Laing”.

WHERE <bib>
        <book ISBN=$i>
          <author> <lastName>Laing</lastName> </author>
        </book>
      </bib> in "bib.xml",
      <reviews>
        <review ISBN=$i> </review> ELEMENT_AS $e
      </reviews> in "reviews.xml"
CONSTRUCT $e

Outer-joins in XML-QL
• Using nested queries
• Query3: “Retrieve the titles of the books written by Laing before 1967, together with their reviews (if any).”

WHERE <bib>
        <book year=$y ISBN=$i>
          <title> $t </title>
          <author> <lastname>Laing</lastname> </author>
        </book>
      </bib> in "bib.xml",
      $y < 1967
CONSTRUCT <resultBook ISBN=$i>
            <title> $t </title>,
            ( WHERE <reviews>
                      <review ISBN=$i> </review> ELEMENT_AS $r
                    </reviews> in "reviews.xml"
              CONSTRUCT $r )
          </resultBook>
Outer-join semantics.

Meta-data queries
• Query4: “Which kind of elements can be found in the content of the element corresponding to the book with isbn=10?”

WHERE <bib>
        <book ISBN="10"> <$tagName> </> </book>
      </bib> in "bib.xml"
CONSTRUCT <result> $tagName </result>

Fusion using Skolem functions
• Fusion introduced by MSL (TSIMMIS)
• Query5: “Retrieve the titles of all the books, grouped first by year and then by publisher”.

WHERE <bib>
        <book year=$y> <title> $t </title> <publisher> $p </publisher> </book>
      </bib>
CONSTRUCT <bookPerYear id=F1($y)>
            <bookPerYear&Publisher id=F2($y,$p)>
              <bookTitle> $t </bookTitle>
            </bookPerYear&Publisher>
          </bookPerYear>
Automatic fusion of all the bookPerYear elements with the same id attribute
[Figure: table of variable bindings with columns $y, $p and $t.]
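Fusion by Skolem function can be pictured as grouping on a computed key: every binding that produces the same id contributes to the same output element. The sketch below uses nested dictionaries keyed by tuples standing in for F1($y) and F2($y,$p); the bindings are placeholder values and the function name fuse is hypothetical.

```python
def fuse(bindings):
    """Group titles by F1($y), then by F2($y, $p); equal ids denote one element."""
    per_year = {}
    for y, p, t in bindings:
        year_elem = per_year.setdefault(("F1", y), {})        # id=F1($y)
        pub_elem = year_elem.setdefault(("F2", y, p), [])     # id=F2($y,$p)
        pub_elem.append(t)                                    # <bookTitle> $t
    return per_year

# Placeholder ($y, $p, $t) bindings, as the WHERE clause might produce them.
bindings = [(1967, "P1", "T1"), (1967, "P1", "T2"),
            (1967, "P2", "T3"), (1960, "P1", "T4")]
fused = fuse(bindings)
print(fused)
```

Four bindings yield two bookPerYear groups, and the two titles sharing year 1967 and publisher P1 end up in the same inner group.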

Skolem functions issues
• Query5: “Retrieve the titles of the books published by ‘Pantheon Books’, grouped by year and by publisher”.

WHERE <bib>
        <book year=$y> <title> $t </title> <publisher> $p </publisher> </book>
      </bib>
CONSTRUCT <bookPerYear id=F1($y)>
            <bookPerYear&Publisher id=F2($p)>
              <bookTitle> $t </bookTitle>
            </bookPerYear&Publisher>
          </bookPerYear>
Creates graphs with cycles and sharing. Several possible XML serializations.

Skolem functions issues
• Query5: “Retrieve the titles of the books published by ‘Pantheon Books’, grouped by year and by publisher”.

WHERE <bib>
        <book year=$y> <title> $t </title> <publisher> $p </publisher> </book>
      </bib>
CONSTRUCT <bookPerYear id=F1($y)>
            <newElement> We have an order problem </newElement>
            <bookPerYear&Publisher id=F2($y, $p)>
              <bookTitle> $t </bookTitle>
            </bookPerYear&Publisher>
          </bookPerYear>
Creates graphs with cycles and sharing. Several possible XML serializations.

XML-QL: final conclusion
• Advantages:
  – XML templates look very familiar
  – can express selection, projection, join, grouping
  – can construct deeply nested XML elements
• Limitations:
  – problems with the semantics of Skolem functions:
    » order
    » nested Skolem functions
  – preserving structure and hierarchy is difficult
  – no disjunction, aggregates, quantifiers, etc.
  – data model ignores some important XML details

Lorel
• Authors:
  – S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener
• Paper:
  – “The Lorel Query Language for Semistructured Data”, S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener, Journal of Digital Libraries, 1(1), 1997
  – Semistructured data (OEM), reconverted to XML
• Lorel is an extension of OQL for OEM:
  – functional language
  – applies type coercion (relaxes the strong typing constraint of OQL)
  – performs path navigation with full regular expressions
  – adds an XML element creation operator
  – adds Skolem functions for grouping

OQL-like queries for XML
• Query1: “Retrieve the books written by Laing before 1967.”

SELECT xml(result: $b)
FROM $b in bib.book
WHERE $b.author.lastname?="Laing" and $b.@year<1967

• UnQL & XML-QL vs. Lorel:
  – No more patterns and pattern matching but path expressions.
  – Different syntax. Equivalent expressive power.

Type coercion
• Query1: “Retrieve the books written by Laing before 1967.”

SELECT xml(result: $b)
FROM $b in bib.book
WHERE $b.author.lastname="Laing" and $b.@year<1967

With the coercions made explicit:

SELECT xml(result: $b)
FROM $b in bib.book
WHERE exists $l in $b.author.lastname?: $l="Laing" and
      real($b.@year) < real(1967)

Type coercion in Lorel
• Basic comparison operators for atomic types
  – conversion to the most general type (real)
• Coercion for equality
  – “set = value” => existential quantifier
  – “set = atomic object” => existential quantifier
  – “set, value = complex object” => false
  – complex object equality defined recursively
  price="12.5" satisfies price<13 but not price<"013"
• Traditional operators lose their convenient properties (transitivity, distributivity, etc.)
• Problem for query processing!
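The price example can be replayed in Python. The rule coded below is inferred from the slide's example only and is an assumption about Lorel: mixed operands are compared as reals, while two strings are compared as strings. The function name lorel_less_than is hypothetical.

```python
def lorel_less_than(left, right):
    """Sketch of the coercion rule: two strings compare as strings,
    any other combination is compared after conversion to real."""
    if isinstance(left, str) and isinstance(right, str):
        return left < right
    return float(left) < float(right)

price = "12.5"
print(lorel_less_than(price, 13), lorel_less_than(price, "013"))
```

Here 13 and "013" denote the same real number, yet the two comparisons disagree, which is the kind of lost property the slide warns about.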

Lorel: final conclusion
• Extends OQL in the following way:
  – relaxes the strong typing constraint (type coercion)
  – adds regular path expressions for the navigation
  – adds Skolem functions
• Advantages:
  – builds on a powerful and well defined language (OQL)
  – type coercion deals with irregular data
• Limitations:
  – type coercion is not always good
  – data model ignores some important XML details

YATL
• Authors: Jerome Simeon, Sophie Cluet
• Papers: “Your Mediators Need Data Conversion!”, Sigmod’1998
  “The New YATL: Design and Specifications”, INRIA 1999
• Initial goal: data conversion and integration
• Data model: ordered trees, references, node-labeled
• Language description:
  – like OQL & Lorel: functional language
  – like others: database iterator (make...match...where)
  – like others: Skolem functions to manipulate references
  – pattern matching with horizontal regular expressions
  – local functions with full recursive functions for conversions
• Implementation: v1 INRIA in 1998 & v2 Bell Labs in 2000

YATL
• Papers: “Your Mediators Need Data Conversion!”, Sigmod’1998
  “The New YATL: Design and Specifications”, INRIA 1999
• Initial goal: data conversion and integration
• Data model: ordered trees, references, node-labeled
• Language description:
  – like OQL & Lorel: functional language
  – like others: database iterator (make...match...where)
  – like others: Skolem functions to manipulate references
  – pattern matching with horizontal regular expressions
  – full recursive functions and case expression for conversions
• Implementation: v1 INRIA in 1998 & v2 Bell Labs in 2000

Tree patterns in YATL
• Query1: “Retrieve the titles of the books published in 1967 by ‘Pantheon Books’.”

MAKE result [ $t ]
MATCH "bib.xml" WITH book[ @year[$y],
                           title[$t],
                           publisher[$p] ]
WHERE $p = "Pantheon Books" and $y = 1967

Different semantics for matching:
• no additional children allowed in a book
• the cardinality of each @year, title and publisher has to be respected
• the order of @year, title and publisher has to be respected

Tree patterns in YATL
• Query1: “Retrieve the titles of the books published in 1967 by ‘Pantheon Books’.”

MAKE result [ $t ]
MATCH input.xml WITH book[ _, @year[$y], _,
                           title[$t], _,
                           publisher[$p], _ ]
WHERE $p = "Pantheon Books" and $y = 1967

Different semantics for the patterns:
• DO allow additional children in a book
• the cardinality of each @year, title and publisher has to be respected
• the order of @year, title and publisher has to be respected
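The difference between the strict pattern and the one with `_` wildcards can be sketched by matching the sequence of child labels against a regular expression. This is an illustration of the idea and not YATL's matching algorithm: the encoding of children as a list of labels, the `;` separator and the function name matches are all assumptions.

```python
import re

def matches(children, pattern):
    """Match a sequence of child labels against a horizontal pattern.

    `_` in the pattern stands for any (possibly empty) run of children;
    every other item must appear exactly once, in the given order.
    """
    regex = "".join("(?:[^;]+;)*" if item == "_" else re.escape(item) + ";"
                    for item in pattern)
    text = "".join(label + ";" for label in children)
    return re.fullmatch(regex, text) is not None

book = ["@ISBN", "@year", "title", "author", "publisher"]
strict = ["@year", "title", "publisher"]
relaxed = ["_", "@year", "_", "title", "_", "publisher", "_"]
print(matches(book, strict), matches(book, relaxed))
```

The strict pattern rejects this book because of its extra children, while the relaxed one accepts it but would still reject children given in a different order.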

Horizontal regular expressions
• A Tree Pattern = type expression without union, and with annotated variables ($v)
    Type expression: book[ title [ String ], author[String]+, UrTree* ]
    Tree pattern:    book($b) [ title [ $t ], +author [ $a ]+, _ ]
• Query: “Retrieve the first author after the book title”.
    MAKE $a
    MATCH book WITH book [ _ , title , _, author[$a] , *author, _ ]
• Process DTDs like: <!ELEMENT bib’ (title, author+)*>
  Ex: “Create a bibliography for each author”
    MAKE *($a) bib [ author [ $a ], *title [ $t ] ]
    MATCH bib’ WITH bib[*(title [ $t ], +author [ $a ] )]

Recursive functions
• Query1: “Retrieve the table of contents of a book.”
• Problem: how to enforce termination?!

define function toc($b) =
  case $b of
  | title[$t] -> title[$t]
  | section[*$child] -> section[ *toc($child) ]
  | _[*$child] -> [ *toc($child) ];

toc(bib/book);

YATL: final conclusion
• YATL design goals:
  – Orthogonal constructs + functional glue
  – Regular expressions = XML types = YATL primitive operation
  – Recursion and case statement: very expressive to support queries, conversion and integration
  – Efficient on the classical database queries
• Open issues:
  – no termination!
  – optimization of recursion and case?

XSLT(1)
• Paper:
  – “XSL Transformations (XSLT)”, W3C recommendation
• XML to XML rule based transformation language
• An XSLT program is an XML document itself
[Figure: a transformation takes the data (the bib tree with book, title, author and publisher edges, held as DOM or XML) and produces a result tree, output as XML or HTML.]

XSLT(2)
• An XSLT program is a valid XML document containing:
  – elements in the <xsl:> namespace (i.e. the XSLT statements)
  – elements in other namespaces (i.e. the user-defined data)
• The result of the evaluation of an XSLT program on an input XML document := the XSLT document where each <xsl:> element has been replaced with the result of its “evaluation”
• Uses XPath as a sublanguage
• Used mostly as a stylesheet language

XSLT programs
• An XSLT program
  – is an element of type <xsl:stylesheet>
  1. XSL elements describing rewriting rules
     – <xsl:template>
  2. XSL elements describing rule execution control
     – <xsl:apply-templates>
     – <xsl:call-template>
  3. XSL elements describing instructions
     – <xsl:element>, <xsl:attribute>, <xsl:for-each>, <xsl:if>, <xsl:copy>, <xsl:copy-of>, <xsl:sort>, <xsl:value-of>, etc.
ICDE’2001, Heidelberg, Germany 197D. Florescu, J. Siméon
XSLT processing model
• Process an XML document (procedure PD):
  1. Apply the procedure PL (below) to a list with a single node: the root of the document
• Process a list L of nodes (procedure PL):
  1. Process each node N in the list (procedure P below), with current node = N and current list = L
  2. Return the concatenation (in the right order) of the partial results
  PL([x1, x2, …, xn]) = [P(x1), P(x2), …, P(xn)]
• Process a node N (procedure P):
  1. Find all templates applicable to the node N
  2. Find the “best” template among them
  3. Instantiate the content of the template
  4. Return this result
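The three procedures above can be sketched in Python. This is a minimal illustration under stated assumptions, not an XSLT engine: the `Template` and `Engine` classes are hypothetical, a Python predicate stands in for the match pattern, the document element is treated as the root, and "best" is assumed to mean highest priority.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Template:
    """Hypothetical rewriting rule (not an XSLT API)."""
    matches: Callable[[ET.Element], bool]              # stands in for the match pattern
    body: Callable[[ET.Element, "Engine"], List[str]]  # stands in for the template content
    priority: float = 0.0


class Engine:
    def __init__(self, templates: List[Template]) -> None:
        self.templates = templates

    def process_document(self, root: ET.Element) -> List[str]:
        # PD: apply PL to a list with a single node, the root.
        return self.process_list([root])

    def process_list(self, nodes: List[ET.Element]) -> List[str]:
        # PL: concatenate the partial results P(x1), ..., P(xn) in order.
        result: List[str] = []
        for node in nodes:
            result.extend(self.process_node(node))
        return result

    def process_node(self, node: ET.Element) -> List[str]:
        # P, step 1: find all applicable templates.
        applicable = [t for t in self.templates if t.matches(node)]
        if not applicable:
            return []  # assumption: this sketch has no built-in templates
        # P, step 2: pick the "best" one (assumed here: highest priority).
        best = max(applicable, key=lambda t: t.priority)
        # P, steps 3 and 4: instantiate the template content and return it.
        return best.body(node, self)


def demo() -> List[str]:
    doc = ET.fromstring("<bib><book><title>The divided self</title></book></bib>")
    engine = Engine([
        Template(lambda n: n.tag == "bib", lambda n, e: e.process_list(list(n))),
        Template(lambda n: n.tag == "book",
                 lambda n, e: ["Book: " + n.findtext("title", default="")]),
        Template(lambda n: True, lambda n, e: ["fallback"], priority=-1.0),
    ])
    return engine.process_document(doc)
```

In `demo`, the low-priority fallback rule applies to every node but is never chosen, because a more specific rule with a higher priority is always available.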
<xsl:template>
• Basic XSLT concept: describes a rewriting rule
• It has:
  – attributes to describe the acceptable input
  – content to describe the output
• Attributes:
  – match: XPath expression describing the elements to which this template applies
  – name: the name of the template rule
  – priority: guides the choice of the best template to apply
• The content is a legal XML fragment with:
  – Elements from the xsl namespace
  – Other elements (user data)
<xsl:template> example
<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear"><xsl:value-of select="./@year"/></xsl:attribute>
    The title of this book is
    <resultTitle><xsl:value-of select="./title"/></resultTitle>
    and it was....
  </resultBook>
</xsl:template>
Instantiating an <xsl:template>
• ... on a node N:
  » returns the content of the template where the <xsl:> elements in that content have been replaced with the result of their “evaluation” (with the current node = N)
  » Two types of <xsl:> elements in the content:
    1. Instruction elements
       » <xsl:copy>, <xsl:copy-of>, <xsl:value-of>, <xsl:for-each>
       » return a certain list of nodes according to their particular semantics
    2. Rule control elements
       » <xsl:apply-templates>, <xsl:call-template>
       » recursive calls to the rule engine (see below)
• Maps an XML node into a list of XML nodes
Example of instantiation
Input XML:
<book ISBN="10" year="1967">
  <title>The politics of experience</title>
  <author>R.D.Laing</author>
  <section> The great and tr
    <title>Persons and experience</title>
    <section> Exploitation must not been…. </section>
  </section>
</book>
Output XML:
<resultBook resultYear="1967">
  The title of this book is
  <resultTitle>The politics of experience</resultTitle>
  and it was ….
</resultBook>
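The instantiation shown on this slide can be reproduced by hand in Python, which makes each substitution explicit. This is a sketch of what the template computes for one node, not of how an XSLT processor works. The function names `matches` and `instantiate_my_template` are hypothetical, and the truncated section text of the input is left out.

```python
import xml.etree.ElementTree as ET

INPUT = """<book ISBN="10" year="1967">
  <title>The politics of experience</title>
  <author>R.D.Laing</author>
  <section>
    <title>Persons and experience</title>
    <section/>
  </section>
</book>"""


def matches(node: ET.Element) -> bool:
    # Stands in for match="book[title]": a book with a title child.
    return node.tag == "book" and node.find("title") is not None


def instantiate_my_template(book: ET.Element) -> ET.Element:
    result = ET.Element("resultBook")
    # <xsl:attribute name="resultYear"><xsl:value-of select="./@year"/>
    result.set("resultYear", book.get("year", ""))
    # Literal text of the template is copied to the output as is.
    result.text = "The title of this book is "
    title = ET.SubElement(result, "resultTitle")
    # <xsl:value-of select="./title"/>
    title.text = book.findtext("title", default="")
    title.tail = " and it was ..."
    return result


if __name__ == "__main__":
    node = ET.fromstring(INPUT)
    if matches(node):
        print(ET.tostring(instantiate_my_template(node), encoding="unicode"))
```

The author and section children of the input do not appear in the output because the template never selects them.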
Recursive <xsl:template>
<xsl:template name="myTemplate" match="book[title]">
  <resultBook>
    <xsl:attribute name="resultYear"><xsl:value-of select="./@year"/></xsl:attribute>
    <resultTitle><xsl:value-of select="./title"/></resultTitle>
    <xsl:apply-templates select="./section"/>
  </resultBook>
</xsl:template>
Invokes the procedure PL with current list = the nodes selected by “./section”.
Recursive calls
• <xsl:apply-templates>
  – recursively invokes the procedure PL
  – the argument is a new list of nodes
    » explicitly specified in the select attribute
    » by default, the list of children of the current node
  <xsl:apply-templates select="./section"/>
• <xsl:call-template>
  – triggers the instantiation of a specific template identified by name
  – does not change the context node and the context list
  <xsl:call-template name="myTemplate"/>
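The difference in context handling between the two calls can be sketched in Python. This is a simplified illustration: `Context`, `apply_templates`, `call_template`, and `describe` are hypothetical names, nodes are plain strings, and a single rule is passed in explicitly instead of being chosen by pattern matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    node: str              # current (context) node
    node_list: List[str]   # current (context) list


Rule = Callable[[Context], List[str]]


def apply_templates(selected: List[str], rule: Rule) -> List[str]:
    # Like PL: each selected node becomes the current node in turn,
    # and the selected nodes become the current list.
    out: List[str] = []
    for node in selected:
        out.extend(rule(Context(node, selected)))
    return out


def call_template(ctx: Context, rule: Rule) -> List[str]:
    # The named rule is instantiated with the caller's context unchanged.
    return rule(ctx)


def describe(ctx: Context) -> List[str]:
    # Toy rule that reports the context it was instantiated with.
    return [f"{ctx.node} in list of {len(ctx.node_list)}"]
```

Calling `call_template(Context("book", ["book"]), describe)` reports the caller's node, while `apply_templates(["s1", "s2"], describe)` reports each selected node with the new two-node list.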
XSLT execution control
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="myTemplate">
    <xsl:apply-templates select="./ancestor::book"/>
  </xsl:template>

  <xsl:template match="section">
    This is a section of the book
    <xsl:call-template name="myTemplate"/>
    and its name is <xsl:value-of select="./title"/>.
  </xsl:template>

  <xsl:template match="book">
    <xsl:value-of select="./title"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="//section[title]"/>
  </xsl:template>
</xsl:stylesheet>
Built-in templates
<!-- if element: apply recursively on the children -->
<xsl:template match="*|/">
  <xsl:apply-templates select="./node()"/>
</xsl:template>

<!-- if text node or attribute: print the content -->
<xsl:template match="@*|text()">
  <xsl:value-of select="."/>
</xsl:template>

<!-- if processing instruction or comment: ignore (do nothing) -->
<xsl:template match="processing-instruction()|comment()"/>
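Taken together, the built-in rules copy the text content of the document and drop everything else. A minimal Python sketch of that effect follows, assuming the standard `xml.etree.ElementTree` parser, which discards comments and processing instructions by default; the function name `builtin_text` is hypothetical. Attributes do not show up in the result because the element rule only recurses on children, and attributes are not children.

```python
import xml.etree.ElementTree as ET


def builtin_text(elem: ET.Element) -> str:
    """Mimic the built-in rules: recurse on children, copy text nodes."""
    # ElementTree stores text nodes as .text (before the first child)
    # and .tail (after each child) rather than as separate nodes.
    parts = [elem.text or ""]
    for child in elem:
        parts.append(builtin_text(child))   # element rule: recurse on the children
        parts.append(child.tail or "")      # text rule: copy the content
    return "".join(parts)


if __name__ == "__main__":
    doc = ET.fromstring(
        '<book year="1967"><title>The politics of experience</title>'
        "<!-- note --><author>R.D.Laing</author></book>"
    )
    print(builtin_text(doc))
```

For any element, the result is the same string that `"".join(elem.itertext())` returns.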
TOC of a certain book
<xsl:template match="/">
  <xsl:apply-templates select="//book[@ISBN=10]"/>
</xsl:template>

<xsl:template match="book">
  <xsl:apply-templates select="./section"/>
</xsl:template>

<xsl:template match="section">
  Section <xsl:value-of select="title"/>
  <xsl:apply-templates select="./section"/>
</xsl:template>
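The same traversal can be written as a direct recursion in Python, which shows that the three rules amount to a depth-first walk over nested sections. This is a sketch under assumptions: the function name `toc` and the sample data in the usage note are made up, and the ISBN is compared as a string, whereas the XPath test `@ISBN=10` compares numbers.

```python
import xml.etree.ElementTree as ET
from typing import List


def toc(root: ET.Element, isbn: str = "10") -> List[str]:
    lines: List[str] = []

    def visit_section(section: ET.Element) -> None:
        # Rule for "section": print the title, then recurse on subsections.
        lines.append("Section " + section.findtext("title", default=""))
        for sub in section.findall("section"):
            visit_section(sub)

    # Rule for "/": select every book with the requested ISBN.
    for book in root.iter("book"):
        if book.get("ISBN") == isbn:
            # Rule for "book": recurse on the top-level sections.
            for section in book.findall("section"):
                visit_section(section)
    return lines
```

For a book with ISBN 10 that holds a section titled "A" containing a subsection "A.1", followed by a section "B", `toc` returns the three lines in document order: "Section A", "Section A.1", "Section B".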
XSLT: final conclusion
• Describes general XML to XML transformations
• Built-in processing model
• Full recursion (not only structural recursion like UnQL!)
• Possible to write non-terminating programs even on trees
• XSLT vs. Quilt
  – equivalent expressive power
  – differences: programming style, XML vs. non-XML syntax
• Could be considered as a query language
• Is it “declarative”? Should it be a QL candidate?
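The point about non-termination can be illustrated with two rules that call each other: one descends from a book to its sections, the other climbs back from a section to its parent. Structural recursion only ever descends, so it must stop on a finite tree; full recursion has no such guarantee. The Python sketch below is an assumption-laden illustration, not XSLT: the function name `exceeds_budget` and the step budget are invented so that the example itself halts.

```python
import xml.etree.ElementTree as ET


def exceeds_budget(root: ET.Element, max_steps: int = 100) -> bool:
    """Return True if the two mutually recursive rules are still running after max_steps."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent = {child: p for p in root.iter() for child in p}
    steps = 0

    def on_book(book: ET.Element) -> None:
        # like <xsl:apply-templates select="./section"/>
        for section in book.findall("section"):
            on_section(section)

    def on_section(section: ET.Element) -> None:
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise RecursionError("step budget exhausted")
        # like <xsl:apply-templates select=".."/>: climb back to the parent
        on_book(parent[section])

    try:
        on_book(root)
    except RecursionError:
        return True
    return False
```

On `<book><section/></book>` the rules bounce between the two nodes until the budget runs out; on a book without sections the first rule has nothing to do and the run ends at once.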
XML-related research problems (1)
• Update languages for XML
• XML views of object-relational databases
• Storing XML data in object-relational DBMSs
  – new challenges for the traditional DBMSs
• Alternative storage methods for XML data
• Indexing XML
• Query processing algorithms for XML data
• Mixing structured search with full-text search
• XML benchmarks
XML-related research problems (2)
• Distributed execution of XML queries
• XML-based information mediation
• XML data cleaning
• XML data compression
• Efficient (streamed) processing of XML transformations
• XML-based information brokering
• XML-based workflow systems
and many more...
Conclusion
• XML is the lingua franca of the Web
• XML is the next big challenge for the database community
• Large quantities of a new type of data
  – textual, irregular, self-organizing, distributed, replicated, etc.
• Many orders of magnitude larger:
  – the volume of XML data
  – the number of XML data repositories
• The need for such a technology is here
• The solutions are not here!
• Myriad of standards and products issued from industry
What is the role of the research?
Typeswitch
• Goal:
  – control the evaluation using the type of a certain expression
• Syntax:
  typeswitch expression0 ‘[’ as variable ‘]’
    case type1 return expression1
    ………..
    case typeK return expressionK
    else return expressionK+1
• Semantics:
  – compute the dynamic type of expression0
  – if the dynamic type of expression0 and some typeI (1 ≤ I ≤ K) have a non-empty intersection, the entire expression evaluates to the result of expressionI
  – if no case clause satisfies this requirement, return the result of expressionK+1
Typeswitch (2)
• Example:
  for $x in /department[name="operations"]/personnel/*
  return typeswitch $x
    case manager return $x/salary + 1000
    case regular_employee return $x/salary
    else error
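The example above can be mimicked in Python by dispatching on the runtime class of each value. This is only an analogy under stated assumptions: the classes `Manager` and `RegularEmployee` and the function `adjusted_salary` are hypothetical, `isinstance` stands in for the "non-empty intersection" test on types, and the cases are tried in order so that the first match wins.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Manager:
    salary: int


@dataclass
class RegularEmployee:
    salary: int


Personnel = Union[Manager, RegularEmployee]


def adjusted_salary(x: object) -> int:
    # typeswitch $x
    if isinstance(x, Manager):            # case manager
        return x.salary + 1000
    if isinstance(x, RegularEmployee):    # case regular_employee
        return x.salary
    # else error
    raise TypeError(f"unexpected personnel type: {type(x).__name__}")


def adjusted_salaries(personnel: List[Personnel]) -> List[int]:
    # for $x in .../personnel/* return typeswitch $x ...
    return [adjusted_salary(x) for x in personnel]
```

With a list holding `Manager(5000)` and `RegularEmployee(3000)`, `adjusted_salaries` returns `[6000, 3000]`; any other kind of value raises `TypeError`, which plays the role of the `else error` branch.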