
Post on 29-Mar-2015


XML Data Management

8. XQuery

Werner Nutt

Requirements for an XML Query Language

David Maier, W3C XML Query Requirements:

• Closedness: output must be XML

• Composability: wherever a set of XML elements is required, a subquery is allowed as well

• Support for key operations:

– selection

– extraction, projection

– restructuring

– combination, join

– fusion of elements

Requirements for an XML Query Language (cntd.)

• Can benefit from a schema, but should also be applicable without one

• Retains the order of nodes

• Formal semantics:

– structure of results should be derivable from query

– defines equivalence of queries

• Queries should be representable in XML, so that documents can have embedded queries

How Does One Design a Query Language?

• In most query languages, there are two aspects to a query:

– Retrieving data (e.g., from … where … in SQL)

– Creating output (e.g., select … in SQL)

• Retrieval consists of

– Pattern matching (e.g., from … )

– Filtering (e.g., where … )

… although these cannot always be clearly distinguished

XQuery Principles

• Data Model identical with the XPath data model

– documents are ordered, labeled trees

– nodes have identity

– nodes can have simple or complex types

(defined in XML Schema)

• A query result is an ordered list/sequence of items

(nodes, values, attributes, etc., but not lists)

– special case: the empty list ()
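The rule that a result is a flat sequence of items ("not lists") can be modeled in Python by representing sequences as tuples that are spliced rather than nested. This is only an illustrative sketch: the helper name `seq` is hypothetical and is not part of any XQuery API.

```python
# Hypothetical model: XQuery sequences as Python tuples that never nest.
def seq(*items):
    """Build a flat sequence, mimicking the XQuery comma operator.

    Assumes nested sequences were themselves built with seq, so one
    level of splicing is enough to keep the result flat.
    """
    result = []
    for item in items:
        if isinstance(item, tuple):   # a sequence is spliced in, not nested
            result.extend(item)
        else:                         # a single item is appended
            result.append(item)
    return tuple(result)

EMPTY = ()  # the empty sequence ()

print(seq(1, seq(2, 3), EMPTY, 4))  # (1, 2, 3, 4)
```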

XQuery Principles (cntd)

• XQuery can be used without schemas,

but can be checked against DTDs and XML schemas

• XQuery is a functional language

– no statements

– evaluation of expressions

– function definitions

– modules

The Recipes DTD (Reminder)

<!ELEMENT recipes (recipe*)>

<!ELEMENT recipe (title, ingredient+, preparation, nutrition)>

<!ELEMENT title (#PCDATA)>

<!ELEMENT ingredient (ingredient*, preparation?)>

<!ATTLIST ingredient

name CDATA #REQUIRED

amount CDATA #IMPLIED

unit CDATA #IMPLIED>

<!ELEMENT preparation (step+)>

<!ELEMENT step (#PCDATA)>

<!ELEMENT nutrition EMPTY>

<!ATTLIST nutrition

calories CDATA #REQUIRED

fat CDATA #REQUIRED>

A Query over the Recipes Document

<titles>
{for $r in doc("recipes.xml")//recipe
return
$r/title}
</titles>

returns

<titles>
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
<title>Ricotta Pie</title>
...
</titles>

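Query Features

<titles>
{for $r in doc("recipes.xml")//recipe
return
$r/title}
</titles>

• doc(String) returns the input document

• doc("recipes.xml")//recipe and $r/title are XPath expressions

• <titles> ... </titles> is returned as it is given; the part inside {...} is evaluated

• for iterates; $var denotes a variable

• return yields a sequence of results, one for each variable binding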

An Equivalent Stylesheet Template

<xsl:template match="/">

<titles>

<xsl:for-each select="//recipe">

<xsl:copy-of select="title"/>

</xsl:for-each>

</titles>

</xsl:template>

Features: Summary

• The result is a new XML document

• A query consists of parts that are returned as is

• ... and others that are evaluated (everything in {...} )

• Calling the function doc(String) returns

an input document

• XPath is used to retrieve node sets and values

• Iteration over node sets:

for binds a variable to each node in a node set, one after the other

• Variables can be used in XPath expressions

• return returns a sequence of results,

one for each binding of a variable
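As a rough analogy, and not XQuery itself, the for/return pattern above can be modeled in Python with the standard `xml.etree.ElementTree` module. The two-recipe document and the function name `titles` are hypothetical stand-ins for `recipes.xml` and the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-recipe stand-in for recipes.xml.
RECIPES = ET.fromstring("""<recipes>
  <recipe><title>Beef Parmesan with Garlic Angel Hair Pasta</title></recipe>
  <recipe><title>Ricotta Pie</title></recipe>
</recipes>""")

def titles(doc):
    """Model of <titles>{for $r in doc//recipe return $r/title}</titles>."""
    out = ET.Element("titles")          # literal part, returned as is
    for r in doc.iter("recipe"):        # for $r in //recipe, in document order
        for t in r.findall("title"):    # return $r/title
            out.append(t)               # ElementTree shares t; XQuery would copy it
    return out

print(ET.tostring(titles(RECIPES), encoding="unicode"))
```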

XPath is a Fragment of XQuery

• doc("recipes.xml")//recipe[1]/title

returns an element:

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

• doc("recipes.xml")//recipe[position()<=3]/title

returns a list of elements:

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>,
<title>Ricotta Pie</title>,
<title>Linguine alla Pescadora</title>

Beware: Attributes in XPath

• doc("recipes.xml")//recipe[1]/ingredient[1]/@name

→ attribute name {"beef cube steak"}

an attribute, represented as a constructor for an attribute node (not in Saxon)

• string(doc("recipes.xml")//recipe[1]/ingredient[1]/@name)

→ "beef cube steak"

a value of type string

Beware: Attributes in XPath (cntd.)

• <first-ingredient>

{string(doc("recipes.xml")//recipe[1]

/ingredient[1]/@name)}

</first-ingredient>

→ <first-ingredient>beef cube steak</first-ingredient>

an element with string content

Beware: Attributes in XPath (cntd.)

• <first-ingredient>

{doc("recipes.xml")//recipe[1]

/ingredient[1]/@name}

</first-ingredient>

→ <first-ingredient name="beef cube steak"/>

an element with an attribute

• Note: The XML that we write down is only the surface structure of the data model that underlies XQuery

Beware: Attributes in XPath (cntd.)

• <first-ingredient

oldName="{doc("recipes.xml")//recipe[1]

/ingredient[1]/@name}">

Beef

</first-ingredient>

→ <first-ingredient oldName="beef cube steak">

Beef

</first-ingredient>

The attribute node is atomized: its string value becomes the value of oldName

Constructor Syntax

For all constituents of documents, there are constructors

element first-ingredient

{

attribute oldName

{string(doc("recipes.xml")//recipe[1]

/ingredient[1]/@name)},

"Beef"

}

This is equivalent to the notation on the previous slide: element first-ingredient {...} is an element constructor, and attribute oldName {...} is an attribute constructor.

Iteration with the For-Clause

Syntax: for $var in xpath-expr

Example: for $r in doc("recipes.xml")//recipe

return string($r)

• The expression creates a list of bindings for a variable $var

If $var occurs in an expression exp,

then exp is evaluated for each binding

• For-clauses can be nested:

for $r in doc("recipes.xml")//recipe
for $v in doc("vegetables.xml")//vegetable
return ...

What Does This Return?

for $i in (1,2,3)

for $j in (1,2,3)

return

element {concat("x",$i * $j)}

{$i * $j}
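One way to check an answer is a small Python model of the two nested for-clauses. The inner loop runs once for every binding of the outer variable, so the query yields nine elements in this order. The function name `products` is hypothetical, and the result is represented as strings rather than real element nodes.

```python
def products():
    """Model of: for $i in (1,2,3) for $j in (1,2,3)
    return element {concat("x", $i * $j)} {$i * $j}"""
    result = []
    for i in (1, 2, 3):        # outer for-clause
        for j in (1, 2, 3):    # inner for-clause, re-run for every $i
            n = i * j
            result.append(f"<x{n}>{n}</x{n}>")  # computed element name and content
    return result

print(", ".join(products()))
```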

Nested For-clauses: Example

<my-recipes>

{for $r in doc("recipes.xml")//recipe

return

<my-recipe title="{$r/title}">

{for $i in $r//ingredient

return

<my-ingredient>

{string($i/@name)}

</my-ingredient>

}

</my-recipe>

}

</my-recipes>

Returns my-recipes with titles as attributes and my-ingredient elements with names as text content

The Equivalent Stylesheet Template

<xsl:template match="/">

<my-recipes>

<xsl:for-each select=".//recipe">

<my-recipe title="{title}">

<xsl:for-each select=".//ingredient">

<my-ingredient>

<xsl:value-of select="@name"/>

</my-ingredient>

</xsl:for-each>

</my-recipe>

</xsl:for-each>

</my-recipes>

</xsl:template>

The Let Clause

Syntax: let $var := xpath-expr

• binds variable $var to a list of nodes,

with the nodes in document order

• does not iterate over the list

• allows one to keep intermediate results for reuse

(not possible in SQL)

Example:

let $oorecps := doc("recipes.xml")//recipe

[.//ingredient/@name="olive oil"]
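The difference between let (one binding to the whole sequence) and for (one binding per item) can be sketched in Python as follows. The helper names `let_bindings` and `for_bindings` and the title list are hypothetical. Note the edge case: let over an empty sequence still produces one binding, while for over an empty sequence produces none.

```python
# Hypothetical toy sequence standing in for doc("recipes.xml")//recipe.
recipes = ["Beef Parmesan", "Ricotta Pie", "Linguine Pescadoro"]

def let_bindings(sequence):
    """let $x := sequence  ->  exactly one binding: the whole sequence."""
    return [tuple(sequence)]

def for_bindings(sequence):
    """for $x in sequence  ->  one binding per item, in order."""
    return [item for item in sequence]

print(let_bindings(recipes))  # one tuple holding all three titles
print(for_bindings(recipes))  # three separate bindings
```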

Let Clause: Example

<calory-content>

{let $oorecps := doc("recipes.xml")//recipe

[.//ingredient/@name="olive oil"]

for $r in $oorecps return

<calories>

{$r/title/text()}

{": "}

{string($r/nutrition/@calories)}

</calories>}

</calory-content>

Calories of recipes with olive oil. Note the implicit string concatenation.

Let Clause: Example (cntd.)

The query returns:

<calory-content>

<calories>Beef Parmesan: 1167</calories>

<calories>Linguine alla Pescadora: 532</calories>

</calory-content>

The Where Clause

Syntax: where <condition>

• occurs before return clause

• similar to predicates in XPath

• comparisons on nodes:

“is” for node identity (“=” compares atomized values, not nodes)

“<<” and “>>” for document order

• Example:

for $r in doc("recipes.xml")//recipe
where $r//ingredient/@name="olive oil"
return ...

Quantifiers

• Syntax:

some/every $var in <node-set>

satisfies <expr>

• $var is bound to all nodes in <node-set>

• Test succeeds if <expr> is true for some/every

binding

• Note: if <node-set> is empty, then

“some” is false and “every” is true
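Python's `any` and `all` follow the same convention, so they make a convenient model of some/every. This is a sketch only: the ingredient dictionaries and the helper names are hypothetical.

```python
def some(sequence, pred):
    """some $v in sequence satisfies pred($v)"""
    return any(pred(v) for v in sequence)

def every(sequence, pred):
    """every $v in sequence satisfies pred($v)"""
    return all(pred(v) for v in sequence)

def is_compound(ingredient):
    """Hypothetical test: an ingredient is compound if it has sub-ingredients."""
    return len(ingredient["parts"]) > 0

ingredients = [{"name": "filling", "parts": ["ricotta", "eggs"]},
               {"name": "sugar", "parts": []}]

print(some(ingredients, is_compound))               # True
print(every(ingredients, is_compound))              # False
print(some([], is_compound), every([], is_compound))  # False True
```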

Quantifiers (Example)

• Recipes that have some compound ingredient:

for $r in doc("recipes.xml")//recipe
where some $i in $r/ingredient satisfies $i/ingredient
return $r/title

• Recipes where every top level ingredient is non-compound:

for $r in doc("recipes.xml")//recipe
where every $i in $r/ingredient satisfies not($i/ingredient)
return $r/title

Element Fusion

“To every recipe, add the attribute calories!”

<result>

{let $rs := doc("recipes.xml")//recipe

for $r in $rs return

<recipe>

{$r/nutrition/@calories}

{$r/title}

</recipe>}

</result>

Here {$r/nutrition/@calories} yields an attribute and {$r/title} yields an element.

Element Fusion (cntd.)

The query result:

<result>

<recipe calories="1167">

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

</recipe>

<recipe calories="349"><title>Ricotta Pie</title></recipe>

<recipe calories="532"><title>Linguine Pescadoro</title></recipe>

<recipe calories="612"><title>Zuppa Inglese</title></recipe>

<recipe calories="8892">

<title>Cailles en Sarcophages</title>

</recipe>

</result>

Fusion with Mixed Syntax

We mix constructor syntax and XML syntax:

element result

{let $rs := doc("recipes.xml")//recipe

for $r in $rs return

<recipe>

{attribute calories {$r/nutrition/@calories}}

{$r/title}

</recipe>}

The Same with Constructor Syntax Only

element result

{let $rs := doc("recipes.xml")//recipe

for $r in $rs return

element recipe

{

attribute calories {$r/nutrition/@calories},

$r/title

}

}

Join

“Pair every ingredient with the recipes where it is used!”

let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
for $r in $rs
where $r//ingredient/@name=$i/@name
return
<usedin>
{$i/@name}
{$r/title}
</usedin>

The where clause is the join condition.

Join (cntd.)

The query result:

<usedin name="beef cube steak">

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

</usedin>,

<usedin name="onion, sliced into thin rings">

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

</usedin>,

<usedin name="green bell pepper, sliced in rings">

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

</usedin>
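The query is a nested-loop join, which can be sketched in Python. The recipe/ingredient mapping is hypothetical toy data. Because olive oil occurs in both recipes, each of its occurrences is paired with both recipes, so those pairs appear twice. This is the duplication that the distinct-values slides later in this deck address.

```python
# Hypothetical toy data: recipe title -> names of the ingredients it uses.
RECIPES = {
    "Beef Parmesan with Garlic Angel Hair Pasta": ["beef cube steak", "olive oil"],
    "Linguine Pescadoro": ["olive oil", "salt"],
}

def used_in(recipes):
    """Nested-loop join: pair every ingredient occurrence with every
    recipe that uses an ingredient of the same name."""
    pairs = []
    for names in recipes.values():               # for $i in $rs//ingredient
        for name in names:
            for title, other in recipes.items():  # for $r in $rs
                if name in other:                 # where ... = $i/@name
                    pairs.append((name, title))   # return <usedin>...</usedin>
    return pairs

for name, title in used_in(RECIPES):
    print(name, "->", title)
```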

Join Exercise

Return all pairs of ingredients such that

• the ingredients have the same name,

• but occur with different amounts

and return

• the recipes where each of them is used

• together with the amount being used in those recipes,

while returning every pair only once.

Could a query for these ingredients be expressed in XPath?

Document Inversion

“For every ingredient, return all the recipes where it is used!”

<result>
{let $rs := doc("recipes.xml")//recipe
for $i in $rs//ingredient
return
<ingredient>
{$i/@*}
{$rs[.//ingredient/@name=$i/@name]/title}
</ingredient>}
</result>

The predicate [.//ingredient/@name=$i/@name] is the join condition.

Document Inversion (cntd.)

The query result:

<result>

<ingredient amount="1" name="Alchermes liquor" unit="cup">

<title>Zuppa Inglese</title>

</ingredient>

<ingredient amount="2" name="olive oil" unit="tablespoon">

<title>Beef Parmesan with Garlic Angel Hair Pasta</title>

<title>Linguine Pescadoro</title>

</ingredient>

Eliminating Duplicates

The function distinct-values(sequence)

– extracts the values of a sequence of nodes

– creates a duplicate-free list of values

Note the coercion: nodes are atomized, i.e., cast to their values!

Example:

let $rs := doc("recipes.xml")//recipe

return distinct-values($rs//ingredient/@name)

yields

xdt:untypedAtomic("beef cube steak"),

xdt:untypedAtomic("onion, sliced into thin rings"),

...

as returned by the Galax engine
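The effect of distinct-values on already atomized values can be sketched in Python with `dict.fromkeys`, which drops duplicates and keeps first occurrences. The helper name and the names list are hypothetical. One caveat: this sketch fixes the order, whereas XQuery leaves the order of the distinct-values result implementation-dependent.

```python
def distinct_values(values):
    """Duplicate-free list of values, keeping first occurrences in order
    (XQuery itself does not guarantee any particular order)."""
    return list(dict.fromkeys(values))

names = ["beef cube steak", "olive oil", "olive oil", "salt", "salt"]
print(distinct_values(names))  # ['beef cube steak', 'olive oil', 'salt']
```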

Avoiding Multiple Results in a Join

We want every ingredient to be listed only once:

Eliminate duplicates using distinct-values!

<result>

{let $rs := doc("recipes.xml")//recipe

for $in in distinct-values(

$rs//ingredient/@name)

return

<recipes with="{$in}">

{$rs[.//ingredient/@name=$in]/title}

</recipes> }

</result>

Avoiding Multiple Results (cntd.)

The query result:

<result>
<recipes with="beef cube steak">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</recipes>
<recipes with="onion, sliced into thin rings">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</recipes>
...
<recipes with="salt">
<title>Linguine Pescadoro</title>
<title>Cailles en Sarcophages</title>
</recipes>
...

The Order By Clause

Syntax: order by expr [ ascending | descending ]

for $iname in doc("recipes.xml")//@name
order by $iname descending
return string($iname)

yields

"whole peppercorns",
"whole baby clams",
"white sugar",
...

The Order By Clause (cntd.)

let $rs := doc("recipes.xml")//recipe

for $r in $rs

order by $r/nutrition/@calories

return $r/title

In which order will the titles come?

The Order By Clause (cntd.)

The interpreter must be told whether the values should be regarded as numbers or as strings

(without a schema, attribute values are untyped and sorted alphanumerically by default)

for $r in $rs

order by number($r/nutrition/@calories)

return $r/title

Note:

– The query returns titles ...

– but the ordering is according to calories,

which do not appear in the output

Also possible in SQL! What if combined with distinct-values?
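The calorie values shown in the Element Fusion result illustrate why the cast matters. The Python sketch below sorts the same titles once by the raw string value and once by its numeric value. Only the variable names are hypothetical. As strings, "1167" sorts before "349" because the comparison is character by character.

```python
# Calorie values as untyped strings, taken from the Element Fusion result.
calories = {
    "Beef Parmesan with Garlic Angel Hair Pasta": "1167",
    "Ricotta Pie": "349",
    "Linguine Pescadoro": "532",
    "Zuppa Inglese": "612",
    "Cailles en Sarcophages": "8892",
}

# order by $r/nutrition/@calories          -> string comparison
by_string = sorted(calories, key=lambda t: calories[t])
# order by number($r/nutrition/@calories)  -> numeric comparison
by_number = sorted(calories, key=lambda t: float(calories[t]))

print(by_string)
print(by_number)
```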

FLWOR Expressions (pronounced “flower”)

We have now seen the main ingredients of XQuery:

• For and Let clauses, which can be mixed

• a Where clause imposing conditions

• an Order by clause, which determines the order of results

• a Return clause, which constructs the output.

Combining these yields FLWOR expressions.

Conditionals

if (expr) then expr else expr

Example

let $is := doc("recipes.xml")//ingredient

for $i in $is[not(ingredient)]

let $u := if (not($i/@unit))

then attribute unit {"pieces"}

else ()

creates an attribute unit="pieces" if none exists

and an empty item list otherwise

We use the conditional to construct variants of ingredients:

let $is := doc("recipes.xml")//ingredient

for $i in $is[not(ingredient)]

let $u := if (not($i/@unit))

then attribute {"unit"} {"pieces"}

else ()

return

<ingredient>

{$i/@* | $u}

</ingredient>

The expression $i/@* | $u collects all attributes in a list and adds a unit attribute if needed.

Conditionals (cntd.)

The query result:

<ingredient name="beef cube steak" amount="1.5"

unit="pound"/>,

...

<ingredient name="eggs" amount="12"

unit="pieces"/>,
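The effect of the conditional on a single simple ingredient can be modeled in Python, with attributes as a dictionary. The function name is hypothetical. The empty dictionary plays the role of the empty sequence () in the else branch.

```python
def with_default_unit(attrs: dict) -> dict:
    """Model of {$i/@* | $u}: keep all attributes and add unit="pieces"
    only if the ingredient has no unit attribute."""
    # let $u := if (not($i/@unit)) then attribute unit {"pieces"} else ()
    u = {"unit": "pieces"} if "unit" not in attrs else {}
    return {**attrs, **u}

print(with_default_unit({"name": "eggs", "amount": "12"}))
print(with_default_unit({"name": "beef cube steak", "amount": "1.5", "unit": "pound"}))
```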

Exercises

Write queries that produce

• A list, containing for every recipe the recipe's title element

and an element with the number of calories

• The same, ordered according to calories

• The same, alphabetically ordered according to title

• The same, ordered according to the fat content

• The same, with title as attribute and calories as content.

• A list, containing for every recipe the top level ingredients,

dropping the lower level ingredients

Sample Solution 1

A list, containing for every recipe the recipe's title element

and an element with the number of calories

<result>

{for $r in doc("recipes.xml")//recipe

return

($r/title,

<calories>

{number($r//@calories)}

</calories>)

}

</result>

Each iteration returns a 2-item sequence; since sequences do not nest, these are concatenated into one flat result.

The sequence constructor is “( . , . )”.

Sample Solution 6

<results>
{for $r in doc("recipes.xml")//recipe
return
<recipe>
{attribute title {$r/title},
for $i in $r/ingredient
return
if (not($i/ingredient))
then $i
else <ingredient>{$i/@*}</ingredient>
}
</recipe>
}
</results>

Aggregation

Aggregation functions count, sum, avg, min, max

Example: The number of recipes with olive oil

let $doc := doc("recipes.xml")

return

<number>

{count($doc//recipe

[.//ingredient/@name = "olive oil"])}

</number>
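The count-over-a-filter pattern can be sketched in Python with ElementTree. The toy document and the function name `count_with` are hypothetical. Because the predicate uses .//ingredient, a recipe counts once even when the ingredient appears only in a nested sub-ingredient.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: "salt" occurs once at top level, once nested.
DOC = ET.fromstring("""<recipes>
  <recipe><title>A</title><ingredient name="olive oil"/></recipe>
  <recipe><title>B</title><ingredient name="salt"/><ingredient name="olive oil"/></recipe>
  <recipe><title>C</title>
    <ingredient name="sauce"><ingredient name="salt"/></ingredient>
  </recipe>
</recipes>""")

def count_with(doc, name):
    """count($doc//recipe[.//ingredient/@name = name]): each recipe is
    counted once, however many matching ingredients it contains."""
    return sum(1 for r in doc.iter("recipe")
               if any(i.get("name") == name for i in r.iter("ingredient")))

print(count_with(DOC, "olive oil"))  # 2
```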

Grouping and Aggregation

For each recipe, the number of simple ingredients

for $r in doc("recipes.xml")//recipe

return

<number>

{attribute title {$r/title/text()}}

{count($r//ingredient[not(ingredient)])}

</number>

Grouping and Aggregation (cntd.)

The query result:

<number title="Beef Parmesan with Garlic Angel Hair Pasta">11</number>,

<number title="Ricotta Pie">12</number>,

<number title="Linguine Pescadoro">15</number>,

<number title="Zuppa Inglese">8</number>,

<number title="Cailles en Sarcophages">30</number>

Grouping and Aggregation (cntd.)

A list, containing for every ingredient,

the number of occurrences of that ingredient

let $d := doc("recipes.xml")

let $is := distinct-values($d//ingredient/@name)

return

<result>

{for $i in $is

order by $i

return

<ingredient name="{$i}">

{count($d//ingredient[@name=$i])}

</ingredient>}

</result>
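A Python sketch of the same grouping uses `collections.Counter`. The toy document and the function name are hypothetical. Counting all ingredient elements by name gives the same numbers as iterating over the distinct names and counting each one. Sorting the pairs mirrors order by $i.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical toy document (nested ingredients are counted too).
DOC = ET.fromstring("""<recipes>
  <recipe><title>A</title><ingredient name="olive oil"/></recipe>
  <recipe><title>B</title><ingredient name="salt"/><ingredient name="olive oil"/></recipe>
  <recipe><title>C</title>
    <ingredient name="sauce"><ingredient name="salt"/></ingredient>
  </recipe>
</recipes>""")

def occurrences(doc):
    """For each distinct ingredient name, count($d//ingredient[@name=$i]),
    ordered by name."""
    counts = Counter(i.get("name") for i in doc.iter("ingredient"))
    return sorted(counts.items())

print(occurrences(DOC))  # [('olive oil', 2), ('salt', 2), ('sauce', 1)]
```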

Nested Aggregation

“The recipe with the maximal number of calories!”

let $rs := doc("recipes.xml")//recipe

let $maxCal := max($rs//@calories)

for $r in $rs

where $r//@calories = $maxCal

return string($r/title)

returns

"Cailles en Sarcophages"
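The two-step pattern (compute the maximum with let, then filter with where) can be sketched in Python. The three-recipe document is a hypothetical excerpt using calorie values from the earlier Element Fusion result, and the function name is hypothetical. As in the query, ties would return several titles.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; calorie values taken from the Element Fusion result.
DOC = ET.fromstring("""<recipes>
  <recipe><title>Ricotta Pie</title><nutrition calories="349"/></recipe>
  <recipe><title>Cailles en Sarcophages</title><nutrition calories="8892"/></recipe>
  <recipe><title>Zuppa Inglese</title><nutrition calories="612"/></recipe>
</recipes>""")

def max_calorie_titles(doc):
    recipes = list(doc.iter("recipe"))
    # let $maxCal := max($rs//@calories): untyped values compared as numbers
    max_cal = max(float(n.get("calories")) for n in doc.iter("nutrition"))
    # where $r//@calories = $maxCal return string($r/title)
    return [r.findtext("title") for r in recipes
            if any(float(n.get("calories")) == max_cal
                   for n in r.iter("nutrition"))]

print(max_calorie_titles(DOC))  # ['Cailles en Sarcophages']
```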

User-defined Functions

declare function local:fac($n as xs:integer)

as xs:integer

{

if ($n = 0)

then 1

else $n * local:fac($n - 1)

};

local:fac(10)

The declare function ... ; part is the function declaration; local:fac(10) is the function call.
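For comparison, the same recursion looks like this in Python. It is a direct transliteration with a hypothetical function name, and it assumes n ≥ 0 (like local:fac, it never terminates for negative input).

```python
def fac(n: int) -> int:
    """Recursive factorial, mirroring local:fac (assumes n >= 0)."""
    return 1 if n == 0 else n * fac(n - 1)

print(fac(10))  # 3628800
```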

Example: Nested Ingredients

declare function

local:nest($n as xs:integer, $content as xs:string)

as element()

{

if ($n = 0)

then element ingredient{$content}

else element ingredient{local:nest($n - 1,$content)}

};

local:nest(3,"Stuff")
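A Python sketch of local:nest, building the serialized result as a string, shows that the call wraps the content in n + 1 ingredient elements, so local:nest(3, "Stuff") has four levels. The function name is hypothetical. Unlike the XQuery version, this sketch does not escape characters such as < or & in the content.

```python
def nest(n: int, content: str) -> str:
    """Mirror of local:nest: wraps content in n + 1 ingredient elements.
    No XML escaping is done here (XQuery would escape automatically)."""
    if n == 0:
        return f"<ingredient>{content}</ingredient>"
    return f"<ingredient>{nest(n - 1, content)}</ingredient>"

print(nest(3, "Stuff"))
```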

What Does this Function Return?

declare function local:depth($n as node())

as xs:integer

{

if (fn:empty($n/*))

then 1

else let $cdepths

:= for $c in $n/* return local:depth($c)

return fn:max($cdepths) + 1

};
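One way to explore the question is a Python transliteration using ElementTree, where iterating an element yields only its element children, much like $n/*. The sketch computes the number of element levels on the longest root-to-leaf path: a leaf element has depth 1. One caveat: ElementTree has no document node, so the case of calling local:depth on a document node, which would add one level, is not modeled. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def depth(node: ET.Element) -> int:
    """Mirror of local:depth: 1 for an element without element children,
    otherwise 1 + the maximum depth of its element children."""
    children = list(node)          # element children only, like $n/*
    if not children:
        return 1
    return max(depth(c) for c in children) + 1

print(depth(ET.fromstring("<a><b><c/></b><d/></a>")))  # 3
```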

Exercise

Write a function

local:element-copy

that

• takes as input a node (= XML tree)

• produces as output a copy of the tree,

but without the attributes
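To make the intended behavior concrete, here is a Python sketch of an attribute-free copy built with ElementTree. It is not an XQuery solution: the exercise still asks for local:element-copy, and the name `element_copy` here is hypothetical. The sketch rebuilds each element with the same tag, text and children in order, but with no attributes.

```python
import xml.etree.ElementTree as ET

def element_copy(node: ET.Element) -> ET.Element:
    """Copy an element tree, dropping all attributes but keeping tag
    names, text content and the order of children."""
    copy = ET.Element(node.tag)        # new element, no attributes
    copy.text = node.text              # text before the first child
    for child in node:
        child_copy = element_copy(child)
        child_copy.tail = child.tail   # text that follows the child
        copy.append(child_copy)
    return copy

src = ET.fromstring('<ingredient name="filling">'
                    '<ingredient name="eggs" amount="12">eggs</ingredient>'
                    '</ingredient>')
print(ET.tostring(element_copy(src), encoding="unicode"))
```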