
CIS550 Handout 7 Fall 2001 1 CIS 550 Handout 7 -- XPATH and XQuery.


CIS550 Handout 7 Fall 2001 1

CIS 550

Handout 7 -- XPATH and XQuery

URLs -- XPath

• http://www.w3.org/TR/xpath
  This is the “recommendation”. Dense. Few examples. Difficult to extract the “big picture” from the morass of detail.

• http://www.zvon.org/xxl/XPathTutorial/General/examples.html
  A tutorial with some simple examples. Maybe too simple. There are lots of tutorials on the web.

URLs -- XQuery

• http://www.w3.org/TR/xquery/
  The basic recommendation. Plenty of examples, so work through these first.

• http://www.w3.org/TR/query-semantics/
  A formal semantics for XQuery. Despite its forbidding title, it is remarkably readable. It also discusses a type system for XQuery.

• http://www.w3.org/TR/xmlquery-use-cases
  A bunch of example queries and their solutions in XQuery (not surprising, since XQuery is Turing-complete!)

How to Identify Nodes in a Tree -- Regular Path Expressions

[Figure: a tree with node labels db, emps, depts, dept, mgr, emp and name, and leaf values “Mary”, “John” and “Bill”.]

In the normal syntax of regular expressions:

    db.emps.emp
    db.(depts.dept.mgr | emps.emp)
    db._*.name

N.B. Regular path expressions have nothing to do with regular expressions in DTDs.
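One way to make the three expressions above concrete is to treat every root-to-node path as a string of labels and match it against an ordinary regular expression. The sketch below does that in Python. The sample document, the helper names compile_rpe and select, and the assignment of names to nodes are all assumptions made for illustration; only the three path expressions come from the slide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document with the labels from the figure (which name sits where is assumed).
DB = ET.fromstring(
    "<db>"
    "<emps><emp><name>Mary</name></emp><emp><name>John</name></emp></emps>"
    "<depts><dept><mgr><name>Bill</name></mgr></dept></depts>"
    "</db>"
)


def compile_rpe(expr):
    """Translate a regular path expression into a regex over 'label/' sequences."""
    parts = []
    # '.' (concatenation) and spaces match no alternative, so findall skips them.
    for tok in re.findall(r"[A-Za-z]\w*|_|[()|*]", expr):
        if tok == "_":                    # wildcard: any single label
            parts.append(r"(?:[^/]+/)")
        elif tok == "(":
            parts.append("(?:")
        elif tok in (")", "|", "*"):
            parts.append(tok)
        else:                             # a concrete label
            parts.append("(?:" + re.escape(tok) + "/)")
    return re.compile("".join(parts))


def select(root, expr):
    """Return, in document order, the nodes whose root-to-node path matches expr."""
    pattern = compile_rpe(expr)
    hits = []

    def walk(node, prefix):
        path = prefix + node.tag + "/"
        if pattern.fullmatch(path):
            hits.append(node)
        for child in node:
            walk(child, path)

    walk(root, "")
    return hits


for expr in ("db.emps.emp", "db.(depts.dept.mgr | emps.emp)", "db._*.name"):
    print(expr, "->", [node.tag for node in select(DB, expr)])
```

The sketch walks the whole tree for every query, which is fine for a slide-sized example but is not how a real engine would evaluate path expressions.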

More examples

With the DTD:

    <!ELEMENT PERSON (NAME, FATHER, MOTHER)>
    <!ELEMENT MOTHER (PERSON?)>
    …

the regular path expression (PERSON.MOTHER)* identifies matrilineal ancestry.

XPath is a “superset of a subset” of regular path expressions. (It cannot express this set of nodes.) However, it is not limited to moving “down” the tree.
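Since the repetition in (PERSON.MOTHER)* is not available as a single XPath path, one option is to do the iteration in a host language. The following Python sketch follows the PERSON, MOTHER, PERSON chain with a loop. The sample family, the empty FATHER elements and the function name matrilineal_line are assumptions for illustration (the DTD above elides the FATHER declaration, so nothing is implied about its real content).

```python
import xml.etree.ElementTree as ET

# Hypothetical family tree; FATHER is left empty purely for illustration.
FAMILY = ET.fromstring(
    "<PERSON><NAME>Ann</NAME><FATHER/>"
    "<MOTHER><PERSON><NAME>Beth</NAME><FATHER/>"
    "<MOTHER><PERSON><NAME>Cora</NAME><FATHER/><MOTHER/></PERSON></MOTHER>"
    "</PERSON></MOTHER></PERSON>"
)


def matrilineal_line(person):
    """Follow PERSON -> MOTHER -> PERSON for as long as the chain continues."""
    names = []
    while person is not None:
        names.append(person.findtext("NAME"))
        mother = person.find("MOTHER")
        # MOTHER (PERSON?) may be empty, which ends the chain.
        person = mother.find("PERSON") if mother is not None else None
    return names


print(matrilineal_line(FAMILY))
```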

XPath

• Primary goal: to permit access to some nodes of a given document
• XPath's main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• A navigation step is a triplet: axis + node-test + list of predicates
• Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent::node()/attribute::booktitle = “XML”][2]
• XPath also offers some shortcuts
  – no axis means child
  – // means /descendant-or-self::node()/

XPath -- child axis navigation

• author is shorthand for child::author. Examples:
  – aaa -- all the child nodes labeled aaa (1,3)
  – aaa/bbb -- all the bbb grandchildren that are children of aaa children (4)
  – */bbb -- all the bbb grandchildren under any child (4,6)
  – . -- the context node
  – / -- the root node

[Figure: a tree below the context node. Its children are numbered 1 to 3 and its grandchildren 4 to 7; the nodes are labeled aaa, bbb or ccc.]
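The three child-axis examples can be tried with the limited XPath subset in Python's xml.etree.ElementTree. The exact shape of the tree in the figure is not fully recoverable from the transcript, so the document below is an assumed one, chosen only so that the node numbers match the results quoted on the slide; the id attributes and the helper ids are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed tree: the id attributes play the role of the node numbers on the slide.
CONTEXT = ET.fromstring(
    "<ctx>"
    "<aaa id='1'><bbb id='4'/><ccc id='5'/></aaa>"
    "<ccc id='2'><bbb id='6'/></ccc>"
    "<aaa id='3'><aaa id='7'/></aaa>"
    "</ctx>"
)


def ids(path):
    """Evaluate path relative to the context node and list the matching node numbers."""
    return [node.get("id") for node in CONTEXT.findall(path)]


print(ids("aaa"))      # aaa children of the context node
print(ids("aaa/bbb"))  # bbb children of those aaa children
print(ids("*/bbb"))    # bbb children of any child
```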

XPath -- child axis navigation (cont)

  – /doc -- all the doc children of the root
  – ./aaa -- all the aaa children of the context node (equivalent to aaa)
  – text() -- all the text children of the context node
  – node() -- all the children of the context node (includes text nodes; attribute nodes are not children, so they are not included)
  – .. -- the parent of the context node
  – .// -- the context node and all its descendants
  – // -- the root node and all its descendants
  – //para -- all the para nodes in the document
  – //text() -- all the text nodes in the document
  – @font -- the font attribute node of the context node

Predicates

  – [2] -- the second child node of the context node
  – chapter[5] -- the fifth chapter child of the context node
  – [last()] -- the last child node of the context node
  – chapter[title=“introduction”] -- the chapter children of the context node that have one or more title children whose string-value is “introduction” (the string-value is the concatenation of all the text on descendant text nodes)
  – person[.//firstname = “joe”] -- the person children of the context node that have among their descendants a firstname element with string-value “joe”
  – From the XPath specification: NOTE: If $x is bound to a node-set, then $x = “foo” does not mean the same as not($x != “foo”). The former is true if and only if some node in $x has the string-value foo; the latter is true if and only if all nodes in $x have the string-value foo.
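A short Python sketch may help with both halves of this slide. The first part runs positional and value predicates through xml.etree.ElementTree, which supports only a subset of XPath (so the .//firstname form is not shown). The second part models a node-set as a plain list of string-values to show why = and != are not negations of each other. The sample document and the helper names titles, eq and ne are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: the third chapter has two title children.
BOOK = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>axes</title></chapter>"
    "<chapter><title>predicates</title><title>introduction</title></chapter>"
    "</book>"
)


def titles(path):
    """First title of every chapter selected by path (relative to the book)."""
    return [chapter.findtext("title") for chapter in BOOK.findall(path)]


print(titles("chapter[2]"))                     # second chapter child
print(titles("chapter[last()]"))                # last chapter child
print(titles("chapter[title='introduction']"))  # one or more matching title children


# Comparisons against a node-set are existential: true if SOME node satisfies them.
def eq(node_set, value):
    return any(v == value for v in node_set)


def ne(node_set, value):
    return any(v != value for v in node_set)


x = ["foo", "bar"]
print(eq(x, "foo"), not ne(x, "foo"))
```

With x holding both "foo" and "bar", eq is true because one value equals "foo", while not ne is false because one value differs from it.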

Unions of Path Expressions

• employee | consultant -- the union of the employee and consultant nodes that are children of the context node
• For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed
• However person/node()[boolean(employee|consultant)] is allowed!!
• From the XPath specification:
  – The boolean function converts its argument to a boolean as follows:
    • a number is true if and only if it is neither positive or negative zero nor NaN
    • a node-set is true if and only if it is non-empty
    • a string is true if and only if its length is non-zero
    • an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
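The conversion rules quoted above are easy to mirror in a few lines of Python. This is only a sketch of the rules for the basic types: node-sets are modelled as Python lists, and the function name xpath_boolean is made up for the example.

```python
import math


def xpath_boolean(value):
    """Mimic XPath's boolean() for booleans, numbers, strings and node-sets (lists)."""
    if isinstance(value, bool):          # checked first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):  # false for +0, -0 and NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):           # true iff the length is non-zero
        return len(value) > 0
    return len(value) > 0                # node-set: true iff non-empty


for sample in (0.0, -0.0, float("nan"), 2.5, "", "false", [], ["node"]):
    print(repr(sample), "->", xpath_boolean(sample))
```

Note that the string "false" converts to true, because only the length of the string matters.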

Axis navigation

• So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
  – . -- stay where you are
  – / -- go to the root
  – // -- all descendants of the root
  – .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…, e.g. child::para. child:: is an example of an axis
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
  – Some of these (self, parent) describe single nodes, others describe sequences of nodes.

XPath Navigation Axes (merci, Arnaud Sahuguet)

[Figure: the axes shown relative to a context node: self, ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute and namespace.]
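As a rough illustration of what several of these axes select, the Python sketch below computes them over an xml.etree.ElementTree document. ElementTree elements have no parent pointers, so a parent map is built first. The tiny document and all function names are assumptions for the example, and every function returns nodes in document order (XPath itself treats ancestor and preceding as reverse axes).

```python
import xml.etree.ElementTree as ET

# Hypothetical document:  a has children b and c; b has d and e; c has f.
DOC = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
PARENT = {child: parent for parent in DOC.iter() for child in parent}
ORDER = list(DOC.iter())  # all elements in document order


def ancestor(node):
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out  # nearest ancestor first


def descendant(node):
    return list(node.iter())[1:]  # iter() yields the node itself first


def following_sibling(node):
    siblings = list(PARENT[node]) if node in PARENT else [node]
    return siblings[siblings.index(node) + 1:]


def preceding_sibling(node):
    siblings = list(PARENT[node]) if node in PARENT else [node]
    return siblings[:siblings.index(node)]


def following(node):
    own = set(node.iter())  # the node and its descendants are excluded
    return [m for m in ORDER[ORDER.index(node) + 1:] if m not in own]


def preceding(node):
    excluded = set(ancestor(node))  # ancestors are excluded
    return [m for m in ORDER[:ORDER.index(node)] if m not in excluded]


def tags(nodes):
    return [n.tag for n in nodes]


e = DOC.find("b/e")
print(tags(ancestor(e)), tags(preceding_sibling(e)), tags(following(e)), tags(preceding(e)))
```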

XPath abbreviated syntax

    (nothing)    child::
    @            attribute::
    //           /descendant-or-self::node()/
    .            self::node()
    .//          self::node()/descendant-or-self::node()/
    ..           parent::node()
    /            (document root)
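The abbreviations can be expanded mechanically. The Python sketch below rewrites an abbreviated path into its unabbreviated form using the rules in the table. It is deliberately minimal: the function name expand is made up, and the sketch assumes paths without predicates (a / inside a predicate would confuse the simple split).

```python
def expand(path):
    """Expand abbreviated XPath steps; a sketch that ignores predicates."""
    # '//' stands for '/descendant-or-self::node()/'
    path = path.replace("//", "/descendant-or-self::node()/")
    steps = []
    for step in path.split("/"):
        if step == "":            # leading or trailing '/', keep as is
            steps.append("")
        elif step == ".":
            steps.append("self::node()")
        elif step == "..":
            steps.append("parent::node()")
        elif step.startswith("@"):
            steps.append("attribute::" + step[1:])
        elif "::" in step:        # already has an explicit axis
            steps.append(step)
        else:                     # no axis means child
            steps.append("child::" + step)
    return "/".join(steps)


for p in ("aaa/bbb", "//para", ".//para", "../@font", "text()"):
    print(p, "=>", expand(p))
```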

XPath

• Reasonably widely adopted -- in XML-Schema and query languages.
• Neither more expressive nor less expressive than regular path expressions (can’t do (ab)* )
• Particularly messy in some areas:
  – defining order of results
  – overloading of operations,
    • e.g. [chapter/title = “Introduction”]
    • why not [ “Introduction” IN chapter/title] ?

XQuery

proposed by Chamberlin, Robie and Florescu

(from the authors’ slides)

• Leverage the most effective features of several existing and proposed query languages
• Design a small, clean, implementable language
• Cover the functionality required by all the XML Query use cases in a single language
• Write queries that fit on a slide

XQuery = XPath + “comprehension” syntax

• XML-QL

    where <pattern> in <XML-expression>          (bind variables)
          <pattern> in <XML-expression>
          …
          <condition>
    construct <expression>                       (use variables)

• Quilt

    for x in <XPath-expression>                  (bind variables)
        y in <XPath-expression>
        …
    where <condition>
    return <expression>                          (use variables)

Examples from XQuery

List the titles of books published by Morgan Kaufmann in 1998.

    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title

(On the original slide, the XPath expressions are shown in orange.)
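The “comprehension” flavour of this query is easy to see when it is written as a Python list comprehension: the FOR clause becomes the generator, WHERE becomes the filter, and RETURN becomes the result expression. The sketch below assumes a small stand-in for bib.xml (the book titles are invented) and returns title strings rather than title elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml; element names follow the query above.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>Book A</title><publisher>Morgan Kaufmann</publisher><year>1998</year></book>"
    "<book><title>Book B</title><publisher>Morgan Kaufmann</publisher><year>1999</year></book>"
    "<book><title>Book C</title><publisher>Addison-Wesley</publisher><year>1998</year></book>"
    "</bib>"
)

titles = [
    b.findtext("title")                              # RETURN $b/title
    for b in BIB.iter("book")                        # FOR $b IN ...//book
    if b.findtext("publisher") == "Morgan Kaufmann"  # WHERE ...
    and b.findtext("year") == "1998"                 #   AND ...
]
print(titles)
```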

Examples from XQuery (cont)

List each publisher and the average price of its books.

    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
    RETURN
        <publisher>
            <name> {$p/text()} </name>
            <avgprice> {$a} </avgprice>
        </publisher>

LET binds a variable to a value. It does not cause an iteration.

Does this create a (well-formed) XML document?
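A Python rendering of the same query makes the difference between FOR and LET visible: the outer loop iterates over distinct publishers, while the price list is computed once per iteration and bound to a name. The sample data and the function name publisher_averages are assumptions, and the sketch assumes every book has exactly one publisher and one numeric price.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>Morgan Kaufmann</publisher><price>40</price></book>"
    "<book><publisher>Addison-Wesley</publisher><price>30</price></book>"
    "<book><publisher>Morgan Kaufmann</publisher><price>60</price></book>"
    "</bib>"
)


def publisher_averages(bib):
    names = []
    for p in bib.iter("publisher"):          # distinct(...//publisher)
        if p.text not in names:
            names.append(p.text)
    results = []
    for name in names:                       # FOR $p IN ...
        prices = [float(b.findtext("price"))  # LET $a := avg(...): no new iteration
                  for b in bib.iter("book")
                  if b.findtext("publisher") == name]
        elem = ET.Element("publisher")
        ET.SubElement(elem, "name").text = name
        ET.SubElement(elem, "avgprice").text = str(sum(prices) / len(prices))
        results.append(elem)
    return results                           # a list of elements, with no single root


for elem in publisher_averages(BIB):
    print(ET.tostring(elem, encoding="unicode"))
```

The function returns a list of publisher elements rather than one tree, which is one way to think about the question on the slide.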

Examples from XQuery (cont)

List the publishers who have published more than 100 books.

    <big_publishers>
    {
        FOR $p IN distinct(document("bib.xml")//publisher)
        LET $b := document("bib.xml")//book[publisher = $p]
        WHERE count($b) > 100
        RETURN $p
    }
    </big_publishers>

What about efficiency?
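Read literally, the query rescans all books once per distinct publisher. One possible answer to the efficiency question is to group in a single pass, sketched below in Python with collections.Counter. This is an alternative evaluation strategy, not something the slides prescribe; the sample data, the function name big_publishers and the threshold parameter are assumptions, and the sketch assumes each book has exactly one publisher.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for bib.xml.
BIB = ET.fromstring(
    "<bib>"
    "<book><publisher>Morgan Kaufmann</publisher></book>"
    "<book><publisher>Addison-Wesley</publisher></book>"
    "<book><publisher>Morgan Kaufmann</publisher></book>"
    "</bib>"
)


def big_publishers(bib, threshold):
    # One pass over the books instead of one scan per publisher.
    counts = Counter(b.findtext("publisher") for b in bib.iter("book"))
    root = ET.Element("big_publishers")
    for name, n in counts.items():  # insertion order = first appearance
        if n > threshold:
            ET.SubElement(root, "publisher").text = name
    return root


# The slide uses a threshold of 100; 1 is used here to suit the tiny sample.
print(ET.tostring(big_publishers(BIB, 1), encoding="unicode"))
```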

Examples from XQuery (cont)

Invert the structure of the input document so that each distinct author element contains a sequence of book-titles.

    <author_list>
    {
        FOR $a IN distinct(document("bib.xml")//author)
        RETURN
            <author>
                <name> {$a/text()} </name>
                {
                    FOR $b IN document("bib.xml")//book[author = $a]
                    RETURN $b/title
                }
            </author>
    }
    </author_list>
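The inversion can be sketched in Python as two nested loops, matching the nested FOR clauses. The sample data and the function name invert are assumptions for illustration; the any(...) test mirrors the existential meaning of book[author = $a] when a book has several authors.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bib.xml: Book A has two authors.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>Book A</title><author>Smith</author><author>Jones</author></book>"
    "<book><title>Book B</title><author>Smith</author></book>"
    "</bib>"
)


def invert(bib):
    root = ET.Element("author_list")
    seen = []
    for a in bib.iter("author"):        # FOR $a IN distinct(...//author)
        if a.text in seen:
            continue
        seen.append(a.text)
        author = ET.SubElement(root, "author")
        ET.SubElement(author, "name").text = a.text
        for b in bib.iter("book"):      # FOR $b IN ...//book[author = $a]
            if any(x.text == a.text for x in b.findall("author")):
                ET.SubElement(author, "title").text = b.findtext("title")
    return root


print(ET.tostring(invert(BIB), encoding="unicode"))
```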

More Examples (Quilt)
(from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt )

Relational data -- two DTDs:

    <?xml version="1.0" ?>
    <!DOCTYPE items [
      <!ELEMENT items (item_tuple*)>
      <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price? )>
      <!ELEMENT itemno (#PCDATA)>
      <!ELEMENT description (#PCDATA)>
      <!ELEMENT offered_by (#PCDATA)>
      <!ELEMENT start_date (#PCDATA)>
      <!ELEMENT end_date (#PCDATA)>
      <!ELEMENT reserve_price (#PCDATA)>
    ]>

    <?xml version="1.0" ?>
    <!DOCTYPE bids [
      <!ELEMENT bids (bid_tuple*)>
      <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
      <!ELEMENT userid (#PCDATA)>
      <!ELEMENT itemno (#PCDATA)>
      <!ELEMENT bid (#PCDATA)>
      <!ELEMENT bid_date (#PCDATA)>
    ]>

The data

<items>

<item_tuple><itemno>1001</itemno><description>Red Bicycle</description><offered_by>U01</offered_by><start_date>1999-01-05</start_date><end_date>1999-01-20</end_date><reserve_price>40</reserve_price></item_tuple>

<item_tuple><itemno>1002</itemno><description>Motorcycle</description><offered_by>U02</offered_by><start_date>1999-02-11</start_date><end_date>1999-03-15</end_date><reserve_price>500</reserve_price></item_tuple>

</items>

<bids>

<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid><bid_date>99-01-07</bid_date></bid_tuple>

<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid><bid_date>99-01-08</bid_date></bid_tuple>

</bids>

Query 1

    FUNCTION date()
    {
      "1999-02-01"
    }

    <result>
      (
        FOR $i IN document("items.xml")//item_tuple
        WHERE $i/start_date LEQ date()
          AND $i/end_date GEQ date()
          AND contains($i/description, "Bicycle")
        RETURN
          <item_tuple>
            $i/itemno ,
            $i/description
          </item_tuple>
        SORTBY (itemno)
      )
    </result>

• simple function definitions
• dates are formatted so that lexicographic ordering gives the right result
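The remark about lexicographic ordering can be checked with plain string comparison: dates written as YYYY-MM-DD sort as strings in the same order as they sort as dates. The Python sketch below applies the query's conditions to the two item tuples shown on the data slide. The function name open_bicycles and its today parameter (standing in for the date() function) are assumptions; items without a start or end date are skipped, which is one reading of how the comparison behaves on a missing element.

```python
import xml.etree.ElementTree as ET

# The two item tuples shown on the data slide.
ITEMS = ET.fromstring(
    "<items>"
    "<item_tuple><itemno>1001</itemno><description>Red Bicycle</description>"
    "<offered_by>U01</offered_by><start_date>1999-01-05</start_date>"
    "<end_date>1999-01-20</end_date><reserve_price>40</reserve_price></item_tuple>"
    "<item_tuple><itemno>1002</itemno><description>Motorcycle</description>"
    "<offered_by>U02</offered_by><start_date>1999-02-11</start_date>"
    "<end_date>1999-03-15</end_date><reserve_price>500</reserve_price></item_tuple>"
    "</items>"
)


def open_bicycles(items, today):
    hits = []
    for i in items.iter("item_tuple"):
        start, end = i.findtext("start_date"), i.findtext("end_date")
        if start is None or end is None:  # optional in the DTD
            continue
        # Plain string comparison is enough for YYYY-MM-DD dates.
        if start <= today <= end and "Bicycle" in i.findtext("description"):
            hits.append((i.findtext("itemno"), i.findtext("description")))
    return sorted(hits, key=lambda hit: hit[0])  # SORTBY (itemno)


print(open_bicycles(ITEMS, "1999-01-10"))
print(open_bicycles(ITEMS, "1999-02-01"))
```

With only these two tuples, the date used on the slide yields no result, because the Red Bicycle auction has already ended by 1999-02-01.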

Output from Q1

    <?xml version="1.0" ?>
    <result>
      <item_tuple>
        <itemno> 1003 </itemno>
        <description> Old Bicycle </description>
      </item_tuple>
      <item_tuple>
        <itemno> 1007 </itemno>
        <description> Racing Bicycle </description>
      </item_tuple>
    </result>

Query Q2

For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.

    <result>
      (
        FOR $i IN document("items.xml")//item_tuple
        LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
        WHERE contains($i/description, "Bicycle")
        RETURN
          <item_tuple>
            $i/itemno ,
            $i/description ,
            IF ($b)
            THEN <high_bid> NumFormat("#####.##", max(-1, $b/bid)) </high_bid>
            ELSE ""
          </item_tuple>
        SORTBY (itemno)
      )
    </result>

• lots of coercion
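The join in Q2 and the coercions it relies on can be spelled out in Python, where the string-to-number conversion and the empty-set test have to be written explicitly. The sketch uses trimmed item tuples (only the fields the query reads) and the two bids shown on the data slide; item 1008 appears in the slide's output but its other fields are not shown, so its tuple here is an assumption, as is the function name bicycles_with_high_bid.

```python
import xml.etree.ElementTree as ET

# Trimmed, partly assumed item data; only itemno and description are used by Q2.
ITEMS = ET.fromstring(
    "<items>"
    "<item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>"
    "<item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>"
    "<item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>"
    "</items>"
)
# The two bids shown on the data slide.
BIDS = ET.fromstring(
    "<bids>"
    "<bid_tuple><userid>U02</userid><itemno>1001</itemno><bid>35</bid>"
    "<bid_date>99-01-07</bid_date></bid_tuple>"
    "<bid_tuple><userid>U04</userid><itemno>1001</itemno><bid>40</bid>"
    "<bid_date>99-01-08</bid_date></bid_tuple>"
    "</bids>"
)


def bicycles_with_high_bid(items, bids):
    rows = []
    for i in items.iter("item_tuple"):
        if "Bicycle" not in i.findtext("description"):
            continue
        itemno = i.findtext("itemno")
        # LET $b := ...//bid_tuple[itemno = $i/itemno], with text coerced to numbers
        amounts = [float(b.findtext("bid"))
                   for b in bids.iter("bid_tuple")
                   if b.findtext("itemno") == itemno]
        high = max(amounts) if amounts else None  # IF ($b) THEN ... ELSE ""
        rows.append((itemno, i.findtext("description"), high))
    return sorted(rows, key=lambda row: row[0])   # SORTBY (itemno)


print(bicycles_with_high_bid(ITEMS, BIDS))
```

Because only two bids are used, the highest bid computed for item 1001 is 40, not the 55 in the slide's output, which was produced from a fuller data set.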

Output from Q2

    <result>
      <item_tuple>
        <itemno> 1001 </itemno>
        <description> Red Bicycle </description>
        <high_bid> 55 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1003 </itemno>
        <description> Old Bicycle </description>
        <high_bid> 20 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1007 </itemno>
        <description> Racing Bicycle </description>
        <high_bid> 225 </high_bid>
      </item_tuple>
      <item_tuple>
        <itemno> 1008 </itemno>
        <description> Broken Bicycle </description>
      </item_tuple>
    </result>

Query Q3

Find cases where a user with a rating worse (alphabetically, greater) than "C" offers an item with a reserve price of more than 1000.

    <result>
      (
        FOR $u IN document("users.xml")//user_tuple,
            $i IN document("items.xml")//item_tuple
        WHERE $u/rating GT 'C'
          AND $i/reserve_price GT 1000
          AND $i/offered_by = $u/userid
        RETURN
          <warning>
            <user_name>$u/name/text()</user_name>,
            <user_rating>$u/rating/text()</user_rating>,
            <item_description>$i/description/text()</item_description>,
            $i/reserve_price
          </warning>
      )
    </result>

• Comparing sets with singletons. Same rules as in XPath? In this case the DTD gives uniqueness.
