
Post on 22-Dec-2015


XML, XML Schema, XPath and XQuery Query Languages

CS561

Slides collated from several sources, including D. Suciu at Univ. of Washington

XML Data

CS561 - Spring 2007. 3

XML

W3C standard to complement HTML

• origins: structured text SGML

• motivation:
  – HTML describes presentation
  – XML describes content

• HTML is an application of SGML; XML is a simplified subset of SGML

From HTML to XML

HTML describes the presentation


HTML

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteboul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

XML

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML describes the content

XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a single root element

A well-formed XML document has a single root element and properly nested, matching start and end tags.
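Well-formedness can be checked mechanically by any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML: one root, matching nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

GOOD = "<book><title>Foundations</title><author>Abiteboul</author></book>"
BAD_TAGS = "<book><title>Foundations</book></title>"  # tags overlap instead of nesting
TWO_ROOTS = "<book/><book/>"                           # more than one root element

print(is_well_formed(GOOD), is_well_formed(BAD_TAGS), is_well_formed(TWO_ROOTS))
```

Note that well-formedness says nothing about which tags are allowed; that is the job of a schema.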

XML: Attributes

<book price="55" currency="USD">

<title> Foundations of Databases </title>

<author> Abiteboul </author>

<year> 1995 </year>

</book>

attributes are alternative ways to represent data
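As a small illustration, the same price can be stored either as an attribute or as a child element, and application code has to know which form was chosen. This is a sketch with made-up sample documents; `price_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTES = (
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title></book>"
)
AS_ELEMENTS = (
    "<book><price>55</price><currency>USD</currency>"
    "<title>Foundations of Databases</title></book>"
)

def price_of(book_xml: str) -> int:
    """Read the price whether it is an attribute or a child element."""
    book = ET.fromstring(book_xml)
    value = book.get("price")            # attribute form
    if value is None:
        value = book.findtext("price")   # element form
    return int(value)
```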

More XML: Oids and References

<person id="o555"> <name> Jane </name> </person>

<person id="o456"> <name> Mary </name>
  <children idref="o123 o555"/>
</person>

<person id="o123" mother="o456"> <name> John </name> </person>

oids and references in XML are just syntax
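Because the references are only syntax, following one means looking the id up yourself. The sketch below does this with a dictionary; the `<people>` wrapper element and the helper names are assumptions added to make the fragment a single document.

```python
import xml.etree.ElementTree as ET

DOC = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>
"""
root = ET.fromstring(DOC)

# Index every person by its id so that references can be resolved.
by_id = {p.get("id"): p for p in root.iter("person")}

def children_names(person_id):
    """Names of the persons listed in the idref attribute of <children>."""
    refs = by_id[person_id].find("children")
    if refs is None:
        return []
    return [by_id[r].findtext("name") for r in refs.get("idref").split()]

def mother_name(person_id):
    """Name of the person referenced by the mother attribute, if any."""
    ref = by_id[person_id].get("mother")
    return by_id[ref].findtext("name") if ref else None
```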

So Far

• Differences between “xml data” and “relational data”?
  – Data model?
  – Typed?
  – Homogeneity?
  – Correctness?
  – Usage/purpose?

“XML Data Model”

Numerous competing models:

• Document Object Model (DOM):
  – class hierarchy (node, element, attribute, …)
  – defines API to inspect/modify the document

• XML query data model (formal)


XML Namespaces

• http://www.w3.org/TR/REC-xml-names

• name ::= [prefix:]localpart

<book xmlns:isbn="www.isbn-org.org/def">
  <title> … </title>
  <number> 15 </number>
  <isbn:number> …. </isbn:number>
</book>

XML Namespaces

• syntactic: <number>, <isbn:number>

• semantic: provide URL for “shared” schema

The prefix mystyle is defined here, by the xmlns:mystyle attribute of <tag>:

<tag xmlns:mystyle="http://…">
  <mystyle:title> … </mystyle:title>
  <mystyle:number> … </mystyle:number>
</tag>
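A parser makes the syntactic role of namespaces visible: a prefixed name is expanded to the namespace name plus the local part. The sketch below uses the isbn prefix from the earlier book example; the element contents are placeholder values.

```python
import xml.etree.ElementTree as ET

DOC = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title>some title</title>
  <number>15</number>
  <isbn:number>some-isbn</isbn:number>
</book>
"""
book = ET.fromstring(DOC)

# Prefix-to-namespace mapping used only for lookups on the Python side.
NS = {"isbn": "www.isbn-org.org/def"}

plain = book.find("number")                 # the unqualified <number>
qualified = book.find("isbn:number", NS)    # the <isbn:number> element

print(plain.tag)
print(qualified.tag)
```

ElementTree stores the qualified element under the expanded name `{www.isbn-org.org/def}number`, so the two `number` elements never collide.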


So Far

• What are “namespaces” good for ?

• Are they typically available for relational databases?

Schemas for XML


DTD - Element Type Definitions

<!ELEMENT paper (title,author*, year, (journal|conference) )>

XML Schemas

• generalize DTDs (which are an SGML derivative)

• use XML syntax instead of DTD syntax

• specified in two main documents: structures and data types

• XML Schema is more powerful but more complex

XML Schema

<xsd:element name="paper" type="papertype"/>

<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT paper (title, author*, year, (journal|conference))>
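The content model of paper is a regular expression over child element names, so it can be imitated with an ordinary regex. This is only a sketch of the idea, not how a real validator is implemented; `matches_paper_model` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# (title, author*, year, (journal|conference)) over space-terminated tag names.
PAPER_MODEL = re.compile(r"title (author )*year (journal|conference) ")

def matches_paper_model(paper_xml: str) -> bool:
    """Check the sequence of child tags of <paper> against the content model."""
    paper = ET.fromstring(paper_xml)
    child_tags = "".join(child.tag + " " for child in paper)
    return paper.tag == "paper" and PAPER_MODEL.fullmatch(child_tags) is not None
```

For example, a paper with a title, two authors, a year and a journal matches, while one that has both a journal and a conference does not.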

So Far

• Differences between “xml schema” and “relational schema”?
  – Purpose? Do we need it?
  – Definition time?
  – Strictness of typing?
  – Underlying model?

Elements versus Types in XML Schema

<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT person (name, address)>

Elements versus Types in XML Schema

• Types:
  – Simple types (integers, strings, ...)
  – Complex types (regular expressions, like in DTDs)

• Element-type-element alternation:
  – Root element has a complex type
  – Complex type is a regular expression of elements
  – Those elements have their complex types ...
  – ...
  – Leaves have simple types

Local and Global Types in XML Schema

• Local type:
  <xsd:element name="person">
    [define locally the person’s type]
  </xsd:element>

• Global type:
  <xsd:element name="person" type="ttt"/>
  <xsd:complexType name="ttt">
    [define here the type ttt]
  </xsd:complexType>

Global types can be reused in other elements

Local vs. Global Elements in XML Schema

• Local element:
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element name="address" type="..."/>
      ...
    </xsd:sequence>
  </xsd:complexType>

• Global element:
  <xsd:element name="address" type="..."/>
  <xsd:complexType name="ttt">
    <xsd:sequence>
      <xsd:element ref="address"/>
      ...
    </xsd:sequence>
  </xsd:complexType>

Global elements: like in DTDs

Regular Expressions in XML Schema

Recall the element-type-element alternation:

<xsd:complexType name="....">
  [regular expression on elements]
</xsd:complexType>

Regular expressions:
• <xsd:sequence> A B C </...>  =  A B C
• <xsd:choice> A B C </...>  =  A | B | C
• <xsd:group> A B C </...>  =  (A B C)
• <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
• <xsd:... minOccurs="0" maxOccurs="1"> .. </...>  =  (...)?

Derived Types by Extensions

<complexType name="Address">

<sequence> <element name="street" type="string"/>

<element name="city" type="string"/>

</sequence>

</complexType>

<complexType name="USAddress">

<complexContent>

<extension base= "ipo:Address">

<sequence> <element name="state" type="ipo:USState"/>

<element name="zip" type="positiveInteger"/>

</sequence>

</extension>

</complexContent>

</complexType>


Corresponds to inheritance
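The analogy with inheritance can be made concrete with two Python dataclasses. This is only an illustration of the correspondence: the class names mirror the schema types, while the Python field types and the sample values are assumptions.

```python
from dataclasses import dataclass, fields

@dataclass
class Address:
    street: str
    city: str

@dataclass
class USAddress(Address):
    # The extension appends its own elements after the inherited ones.
    state: str
    zip: int

home = USAddress(street="some street", city="some city", state="MA", zip=1609)
print([f.name for f in fields(USAddress)])
```

As in the schema, the derived type keeps the base sequence (street, city) and adds state and zip at the end.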

Key Constraints in XML


Keys in XML Schema

<purchaseReport>

<regions>

<zip code="95819">

<part number="872-AA" quantity="1"/>

<part number="926-AA" quantity="1"/>

<part number="833-AA" quantity="1"/>

<part number="455-BX" quantity="1"/>

</zip>

<zip code="63143">

<part number="455-BX" quantity="4"/>

</zip>

</regions>

<parts>

<part number="872-AA">Lawnmower</part>

<part number="926-AA">Baby Monitor</part>

<part number="833-AA">Lapis Necklace</part>

<part number="455-BX">Sturdy Shelves</part>

</parts>

</purchaseReport>

XML Schema for the key on the XML above:

<key name="NumKey">
  <selector xpath="parts/part"/>
  <field xpath="@number"/>
</key>

Keys in XML Schema

• In general, the syntax is:

<key name="someDummyNameHere">
  <selector xpath="p"/>
  <field xpath="p1"/>
  <field xpath="p2"/>
  . . .
  <field xpath="pk"/>
</key>

Notes:
• All XPath expressions “start” at the element currently being defined
• Each field must identify a single “node”

Keys in XML Schema

• Unique = guarantees uniqueness
• Key = guarantees uniqueness and existence
• All XPath expressions are “restricted”:
  – /a/b | /a/c OK for selector
  – //a/b/*/c OK for field

• Note: better than DTD’s ID mechanism
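The difference between unique and key can be sketched as a small checker: both reject duplicate values, but only key also rejects a selected node that lacks the field. This is a simplified illustration for a single attribute field; the function names and the deliberately broken sample data are made up.

```python
import xml.etree.ElementTree as ET

def check(scope, selector, field, require_present):
    """Return the violations of one constraint with a single attribute field."""
    seen, problems = set(), []
    for node in scope.findall(selector):
        value = node.get(field)
        if value is None:
            if require_present:        # key: the field must exist
                problems.append("missing " + field)
            continue                   # unique: absent values are ignored
        if value in seen:
            problems.append("duplicate " + value)
        seen.add(value)
    return problems

def check_key(scope, selector, field):
    return check(scope, selector, field, require_present=True)

def check_unique(scope, selector, field):
    return check(scope, selector, field, require_present=False)

# Broken on purpose: one repeated number and one part without a number.
REPORT = ET.fromstring(
    "<purchaseReport><parts>"
    '<part number="872-AA">Lawnmower</part>'
    '<part number="872-AA">Baby Monitor</part>'
    "<part>Sturdy Shelves</part>"
    "</parts></purchaseReport>"
)
```

Here `check_key(REPORT, "parts/part", "number")` reports both problems, while `check_unique` reports only the duplicate.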

Examples of Keys in XML Schema

• Examples

<key name="fullName">
  <selector xpath=".//person"/>
  <field xpath="firstname"/>
  <field xpath="surname"/>
</key>

<unique name="nearlyID">
  <selector xpath=".//*"/>
  <field xpath="@id"/>
</unique>

Note: each person must have a single firstname and a single surname

Foreign Keys in XML Schema

• Example

<keyref name="personRef" refer="fullName">

<selector xpath=".//personPointer"/>

<field xpath="@first"/>

<field xpath="@last"/>

</keyref>

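A keyref is the XML Schema counterpart of a foreign key: every (first, last) pair of a personPointer must match the (firstname, surname) pair of some person. The sketch below checks this by hand; the `<staff>` wrapper and the sample names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<staff>
  <person><firstname>Jane</firstname><surname>Doe</surname></person>
  <person><firstname>John</firstname><surname>Roe</surname></person>
  <personPointer first="Jane" last="Doe"/>
  <personPointer first="Mary" last="Poe"/>
</staff>
""")

def dangling_references(scope):
    """Return the personPointer pairs that match no fullName key value."""
    keys = {
        (p.findtext("firstname"), p.findtext("surname"))
        for p in scope.findall(".//person")
    }
    refs = [(r.get("first"), r.get("last")) for r in scope.findall(".//personPointer")]
    return [ref for ref in refs if ref not in keys]
```

In this sample the pointer to Mary Poe has no matching person, so a validator enforcing personRef would reject the document.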

So Far

• Differences between “keys/foreign-keys” in XML versus the relational model?
  – Purpose?
  – Underlying model?

XPath

“The Basic Building Block”

XPath

• Goal: permit access to some nodes of the document

• XPath main construct: axis navigation

• Navigation step: axis + node-test + predicates

• Examples
  – descendant::node()
  – child::author
  – attribute::booktitle = "XML"

XPath

• An XPath path consists of one or more navigation steps, separated by “/”

• Navigation step: axis + node-test + predicates

• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent::node()/attribute::booktitle = "XML"][2]

• XPath offers shortcuts:
  – no axis means child
  – // means /descendant-or-self::node()/

XPath - Child Axis Navigation

• author is shorthand for child::author

• Examples:
  – aaa -- all the children nodes labeled aaa
  – aaa/bbb -- all the bbb grandchildren that are children of aaa children
  – */bbb -- all the bbb grandchildren of any child

• Notes:
  – . -- the context node
  – / -- the root node

[Figure: example tree with element nodes labeled aaa, bbb and ccc, nodes numbered 1 to 7, and one node marked as the context node]

XPath - Child Axis Navigation

  – /doc -- all doc children of the root
  – ./aaa -- all aaa children of the context node (equivalent to aaa)
  – text() -- all text children of the context node
  – node() -- all children of the context node (includes text nodes; attribute nodes are not children and are reached with @)
  – .. -- parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //text() -- all the text nodes in the document
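Several of these child-axis paths can be tried directly with Python's `xml.etree.ElementTree`, which implements a limited subset of XPath. The document below is hypothetical, and its root element plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; `doc` is used as the context node.
doc = ET.fromstring(
    "<doc>"
    "<aaa><bbb>1</bbb><ccc>2</ccc></aaa>"
    "<aaa><bbb>3</bbb></aaa>"
    "<ccc><bbb>4</bbb></ccc>"
    "</doc>"
)

def texts(path):
    """Text of every element selected by `path`, relative to the context node."""
    return [node.text for node in doc.findall(path)]

aaa_children = doc.findall("aaa")   # aaa     : children labeled aaa
bbb_under_aaa = texts("aaa/bbb")    # aaa/bbb : bbb children of aaa children
bbb_under_any = texts("*/bbb")      # */bbb   : bbb grandchildren via any child
all_bbb = texts(".//bbb")           # .//bbb  : bbb descendants of the context node
```

ElementTree does not support full XPath (for example, no text() or node() tests), so a complete XPath engine is needed for the remaining forms.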

Predicates

  – [2] -- the second child node of the context node

  – chapter[5] -- the fifth chapter child of the context node

  – [last()] -- the last child node of the context node

  – chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of all text in descendant text nodes)

  – person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
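Positional and value predicates of this kind are partly supported by Python's `xml.etree.ElementTree`. The sketch below uses made-up documents; the last query filters in Python because ElementTree's XPath subset does not accept a descendant path inside a predicate.

```python
import xml.etree.ElementTree as ET

BOOK = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>data model</title></chapter>"
    "<chapter><title>queries</title></chapter>"
    "</book>"
)

second = BOOK.find("chapter[2]")                      # second chapter child
last = BOOK.find("chapter[last()]")                   # last chapter child
intro = BOOK.findall("chapter[title='introduction']") # chapters with that title

PEOPLE = ET.fromstring(
    "<people>"
    "<person><name><firstname>joe</firstname></name></person>"
    "<person><name><firstname>ann</firstname></name></person>"
    "</people>"
)

# Equivalent of person[.//firstname = "joe"], written as a Python filter.
joes = [
    p for p in PEOPLE.findall("person")
    if any(f.text == "joe" for f in p.iter("firstname"))
]
```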

Axis navigation

• So far, our expressions have moved us down by moving to children nodes.

• Exceptions are:
  – . stay where you are
  – / go to the root
  – // all descendants of the root
  – .// all descendants of the context node

Axis navigation

• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self

• Some of these describe single nodes:
  – self, parent

• Some describe sequences of nodes:
  – all others

XPath Navigation Axes

[Figure: document tree showing, relative to a context node, the regions covered by the axes ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace and self]

XPath Abbreviated Syntax

(nothing)    child::
@            attribute::
//           /descendant-or-self::node()/
.            self::node()
.//          descendant-or-self::node()/
..           parent::node()
/            (document root)

So Far

Differences between SQL and XPath?

• What are similar query capabilities?
• What features does SQL have, but not XPath?
• What features does XPath support, but not SQL?
• Is XPath a full-fledged query language?

Query Languages - XQuery

Summary of XQuery

• FLWR expressions
• FOR and LET expressions
• Collections and sorting

Resources
• XQuery: A Query Language for XML, Chamberlin, Florescu, et al.
• W3C recommendation: www.w3.org/TR/xquery/

XQuery

• Designed based on Quilt (which is based on XML-QL)

• http://www.w3.org/TR/xquery/ (2/2001)

• XML Query data model (ordered)

FLWR (“Flower”) Expressions

FOR ... LET... FOR... LET...

WHERE...

RETURN...

CS561 - Spring 2007. 54

XQuery

Find the titles of all books published after 1995:

FOR $x IN document("bib.xml")/bib/book

WHERE $x/year > 1995

RETURN $x/title

What does the result look like?

Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
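The FOR/WHERE/RETURN structure maps naturally onto a list comprehension. The sketch below mirrors the query over a small made-up bibliography; unlike XQuery, it returns the title strings rather than the title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1999</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>2001</year></book>
</bib>
""")

titles = [
    x.findtext("title")                 # RETURN $x/title
    for x in BIB.findall("book")        # FOR $x IN .../bib/book
    if int(x.findtext("year")) > 1995   # WHERE $x/year > 1995
]
print(titles)
```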

CS561 - Spring 2007. 56

XQuery Example

For each author of a book by Morgan Kaufmann, list all books she published:

FOR $a IN (document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>

What is the query result?

Result:
<result> <author> Jones </author> <title> abc </title> <title> def </title> </result>
<result> <author> Jones </author> <title> abc </title> <title> def </title> </result>
<result> <author> Smith </author> <title> ghi </title> </result>

Jones appears twice because the FOR clause iterates over one author element per Morgan Kaufmann book.

XQuery Example: Duplicates

For each author of a book by Morgan Kaufmann, list all books she published:

FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
         $a,
         FOR $t IN /bib/book[author=$a]/title
         RETURN $t
       </result>

distinct = a function that eliminates duplicates

Example XQuery Result

Result: <result> <author>Jones</author> <title> abc </title> <title> def </title> </result>

<result> <author> Smith </author> <title> ghi </title> </result>
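The effect of distinct can be reproduced with a nested loop over a made-up bibliography. This is a sketch, and the function name `results` is hypothetical; it also keeps authors in first-seen order, whereas distinct in XQuery yields an unordered collection.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>def</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
</bib>
""")

def results(bib, publisher, distinct):
    """One (author, titles) pair per binding of the outer FOR variable."""
    authors = [
        a.text
        for book in bib.findall("book")
        if book.findtext("publisher") == publisher
        for a in book.findall("author")
    ]
    if distinct:
        authors = list(dict.fromkeys(authors))  # drop duplicates, keep order
    return [
        (a, [b.findtext("title") for b in bib.findall("book")
             if a in [x.text for x in b.findall("author")]])
        for a in authors
    ]
```

Without distinct, Jones is bound twice and her result is repeated; with distinct, each author appears once.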

XQuery

• FOR $x IN expr
  – binds $x to each element in the list expr
  – useful for iteration over some input list

• LET $x := expr
  – binds $x to the entire list expr
  – useful for common subexpressions and for grouping and aggregations

XQuery with LET Clause

<big_publishers>
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")/bib/book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p
</big_publishers>

count = an aggregate function that returns the number of elements
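The grouping role of LET can be sketched in Python: for each distinct publisher, the whole list of its books is bound to one variable and then counted. The data and the `threshold` parameter are assumptions (the query uses 100); the publishers are sorted only to make the output deterministic.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Addison Wesley</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
  <book><publisher>Morgan Kaufmann</publisher></book>
</bib>
""")

def big_publishers(bib, threshold):
    publishers = sorted({p.text for p in bib.iter("publisher")})  # FOR $p IN distinct(...)
    result = []
    for p in publishers:
        # LET $b := all books of publisher p (one binding holding a whole list)
        b = [bk for bk in bib.findall("book") if bk.findtext("publisher") == p]
        if len(b) > threshold:                                    # WHERE count($b) > threshold
            result.append(p)                                      # RETURN $p
    return result
```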

XQuery

Find books whose price is larger than average:

LET $a := avg(document("bib.xml")/bib/book/@price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/@price > $a
RETURN $b
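The same computation in Python makes the two phases explicit: the average is computed once over all prices, then each book is compared against it. The prices and titles below are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    '<book price="30"><title>a</title></book>'
    '<book price="50"><title>b</title></book>'
    '<book price="100"><title>c</title></book>'
    "</bib>"
)

books = BIB.findall("book")

# LET $a := avg(.../book/@price): a single value computed over the whole list.
a = sum(float(b.get("price")) for b in books) / len(books)

# FOR $b ... WHERE $b/@price > $a RETURN $b (titles are returned for readability).
expensive = [b.findtext("title") for b in books if float(b.get("price")) > a]
```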

FOR versus LET

FOR
• Binds node variables → iteration

LET
• Binds collection variables → one value

FOR vs. LET

FOR $x IN document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result> <book>...</book> </result>
<result> <book>...</book> </result>
<result> <book>...</book> </result>
...

LET $x := document("bib.xml")/bib/book
RETURN <result> $x </result>

Returns:
<result> <book>...</book> <book>...</book> <book>...</book> ... </result>
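The contrast can be mimicked with plain strings: iterating produces one result per book, while binding the whole list produces a single result containing every book. The book strings are placeholders.

```python
books = ["<book>A</book>", "<book>B</book>", "<book>C</book>"]

# FOR: the RETURN clause is evaluated once per book.
for_results = ["<result>" + x + "</result>" for x in books]

# LET: the RETURN clause is evaluated once, with $x bound to the whole list.
let_results = ["<result>" + "".join(books) + "</result>"]
```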

Collections in XQuery

• Ordered and unordered collections
  – /bib/book/author = an ordered collection
  – distinct(/bib/book/author) = an unordered collection

• LET $a := /bib/book binds $a to a collection
• $b/author is a collection (several authors...)

RETURN <result> $b/author </result>

Returns:
<result> <author>...</author> <author>...</author> <author>...</author> ... </result>

XQuery Summary

FOR-LET-WHERE-RETURN = FLWR

FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instances of XQuery data model

XQuery

Some more query features

Sorting in XQuery

<publisher_list>
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN <publisher>
           <name> $p/text() </name> ,
           FOR $b IN document("bib.xml")//book[publisher = $p]
           RETURN <book>
                    $b/title ,
                    $b/@price
                  </book> SORTBY (price DESCENDING)
         </publisher> SORTBY (name)
</publisher_list>
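The two levels of sorting can be sketched in Python: publishers ordered by name, and within each publisher the books ordered by descending price. The data and the function name `publisher_list` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    '<book price="30"><title>t1</title><publisher>Morgan Kaufmann</publisher></book>'
    '<book price="55"><title>t2</title><publisher>Addison Wesley</publisher></book>'
    '<book price="80"><title>t3</title><publisher>Morgan Kaufmann</publisher></book>'
    "</bib>"
)

def publisher_list(bib):
    names = sorted({p.text for p in bib.iter("publisher")})            # SORTBY (name)
    out = []
    for name in names:
        books = [b for b in bib.iter("book") if b.findtext("publisher") == name]
        books.sort(key=lambda b: float(b.get("price")), reverse=True)  # SORTBY (price DESCENDING)
        out.append((name, [(b.findtext("title"), b.get("price")) for b in books]))
    return out
```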


Sorting in XQuery

• Sorting arguments: refer to name space of RETURN clause, not of FOR clause

• TIP: To sort on an element you don’t want to display, first return it, then remove it with an additional query.


If-Then-Else

FOR $h IN //holding

RETURN <holding>

$h/title,

IF $h/@type = "Journal"

THEN $h/editor

ELSE $h/author

</holding> SORTBY (title)


Existential Quantifiers

FOR $b IN //book

WHERE SOME $p IN $b//para SATISFIES

contains($p, "sailing")

AND contains($p, "windsurfing")

RETURN $b/title


Universal Quantifiers

FOR $b IN //book

WHERE EVERY $p IN $b//para SATISFIES

contains($p, "sailing")

RETURN $b/title

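SOME and EVERY correspond to Python's `any` and `all` over the paragraphs of each book. The sketch below uses a made-up document; note that, as with EVERY over an empty sequence in XQuery, `all` is true for a book with no para elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with three books.
BIB = ET.fromstring("""
<bib>
  <book><title>Water Sports</title>
    <para>sailing and windsurfing</para><para>sailing basics</para></book>
  <book><title>Boats</title>
    <para>sailing</para><para>rowing</para></book>
  <book><title>Empty</title></book>
</bib>
""")

def para_texts(book):
    """String value of every para descendant of the book ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]

def titles_some(bib):
    # SOME $p IN $b//para SATISFIES contains sailing AND contains windsurfing
    return [b.findtext("title") for b in bib.iter("book")
            if any("sailing" in t and "windsurfing" in t for t in para_texts(b))]

def titles_every(bib):
    # EVERY $p IN $b//para SATISFIES contains sailing
    return [b.findtext("title") for b in bib.iter("book")
            if all("sailing" in t for t in para_texts(b))]
```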


So Far

• Similarities between SQL and XQuery?

• Differences between SQL and XQuery?

XML, XML Data Model

XML Schema, XPath XQuery