1
XPath 2.0 http://www.w3.org/TR/xpath20/
http://www.w3.org/TR/xquery-operators/
Roger L. Costello, 6 March 2010
2
Set this to XPath 2.0
3
Using Namespaces in Oxygen
• Suppose in the Oxygen XPath expression evaluator tool you would like to write expressions such as this: current-dateTime() - xs:dateTime('2008-01-14T00:00:00')
• How do you tell Oxygen what namespace the "xs" prefix maps to? Here's how:
– Go to Options ► Preferences ► XML ► XSLT-FO-XQuery ► XPath, and in the Default prefix-namespace mappings table add a new entry mapping xs to the XML Schema namespace http://www.w3.org/2001/XMLSchema
4
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<planets>
  <planet>
    <name>Mercury</name>
    <mass units="(Earth = 1)">.0553</mass>
    <day units="days">58.65</day>
    <radius units="miles">1516</radius>
    <density units="(Earth = 1)">.983</density>
    <distance units="millions miles">43.4</distance>
  </planet>
  <planet>
    <name>Venus</name>
    <mass units="(Earth = 1)">.815</mass>
    <day units="days">116.75</day>
    <radius units="miles">3716</radius>
    <density units="(Earth = 1)">.943</density>
    <distance units="millions miles">66.8</distance>
  </planet>
  <planet>
    <name>Earth</name>
    <mass units="(Earth = 1)">1</mass>
    <day units="days">1</day>
    <radius units="miles">2107</radius>
    <density units="(Earth = 1)">1</density>
    <distance units="millions miles">128.4</distance>
  </planet>
</planets>
planets.xml
We will use this XML document throughout this tutorial, so spend a minute or two familiarizing yourself with it.
It is planets.xml in the example01 folder. Please load it into Oxygen XML.
5
Sequences
• Sequences are central to XPath 2.0
• XPath 2.0 operates on sequences, and generates sequences.
• A sequence is an ordered collection of nodes and/or atomic values.
6
Example Sequences
• This sequence is composed of three atomic values:
(1, 2, 3)
• This sequence is also composed of three atomic values:
('red', 'white', 'blue')
• This XPath expression will generate a sequence composed of three <name> nodes:
(//planet/name)
See example01
http://www.w3.org/TR/xpath20/#id-sequence-expressions
7
More Sequence Examples
• With the following XPath, a sequence of six nodes is generated; the first three are <mass> nodes, the next three are <name> nodes:
(//planet/mass, //planet/name)
• This sequence contains nodes followed by atomic values:
(//planet/name, 1, 2, 3)
See example02
8
Definition of Sequence
• A sequence is an ordered collection of zero or more items.
• An item is either an atomic value or a node.
• An atomic value is a single, non-variable piece of data, e.g. 10, true, 2007, "hello world". (An atomic value is an XML Schema simpleType value)
• There are seven kinds of nodes:
– element, text, attribute, document, PI (processing instruction), comment, namespace
• A sequence containing exactly one item is called a singleton sequence.
• A sequence containing zero items is called an empty sequence.
http://www.w3.org/TR/xpath20/#dt-item
9
Sequence Constructor
• A sequence is constructed by listing its items, usually enclosed in parentheses.
• Each item is separated by a comma.
– The comma is called the sequence constructor operator.
10
No Nested Sequences
• If you have a sequence (1, 2) and nest it in another sequence
((1, 2), 3)
the resulting sequence is flattened to simply
(1, 2, 3)
• A nested empty sequence is removed:
(1, (2, 3), (), 4, 5, 6)
the resulting sequence is flattened to simply:
(1, 2, 3, 4, 5, 6)
See example03
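The flattening rule can be mimicked outside XPath. The following is a minimal Python sketch, not XPath itself: it assumes nested tuples stand in for nested sequences, and the helper name flatten is made up for this illustration.

```python
from typing import Any, Iterable, List


def flatten(items: Iterable[Any]) -> List[Any]:
    """Flatten nested tuples, imitating how XPath 2.0 flattens nested sequences."""
    flat: List[Any] = []
    for item in items:
        if isinstance(item, tuple):
            # A nested "sequence": splice its items in (an empty one adds nothing).
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


print(flatten((1, (2, 3), (), 4, 5, 6)))  # [1, 2, 3, 4, 5, 6]
```

In XPath no such helper is needed; the flattening happens automatically whenever a sequence is constructed.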
11
Extract Items from a Sequence
• You can extract items from a sequence using the […] operator (predicate):
(4, 5, 6)[2]
returns the singleton sequence:
(5)
• This XPath expression:
//planet[2]
returns the second planet
See example04
12
The index must be an integer
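• To select an item by position, the predicate must evaluate to a number, and an item is selected only when its position equals that number. In practice the index should therefore be an integer (more specifically, an XML Schema integer datatype).
(sequence)[index]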
13
Initializing
• Example: suppose an element may or may not have an attribute, discount. If the element has the discount attribute then return its value; otherwise, return 0.
(@discount, 0)[1]
14
Context Item
• Dot "." stands for the current context item.
• The context item can be a node, e.g.
//planet[.]
or it can be an atomic value, e.g.
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)[. mod 2 = 0]
See example05
15
count(sequence)
• This function returns an integer, representing the number of items in the sequence.
See example03.b
http://www.w3.org/TR/xquery-operators/#func-count
16
Why Nested Parentheses?
Compare these two:
count((1, 2, 3))
count(1, 2, 3)
Notice the nested parentheses in the first expression.
Why is the first one correct and the second one incorrect?
17
Answer
• The count function has only one argument.
• This form:
count(1, 2, 3)
provides three arguments to count, which is incorrect.
• This form:
count((1, 2, 3))
provides one argument to count (the argument is a sequence with three items).
18
Sequence of Sequences?
• There is no such thing as a sequence of sequences!
• There's only one sequence; all subsequences get flattened into a single sequence.
count((//planet, (1, 2, 3), ('red', 'white', 'blue')))
sequence of sequences?
19
The value of a non-existent node is the empty sequence, ()
/planets/planet[999]
There is no 999th planet, so the result of evaluating this XPath expression is the empty sequence, denoted by ()
20
() is not equal to ''
• An empty sequence is not equal to a string of length zero.
('a', 'b', (), 'c') is not equal to ('a', 'b', '', 'c')
count = 3 for the first sequence; count = 4 for the second
See example03.a
21
This predicate [.] eliminates empty strings
The value of ('a', '')[.] is just ('a')
The value of ('a', 'b', '', 'c')[.] is just ('a', 'b', 'c')
22
Two built-in functions
true()
false()
http://www.w3.org/TR/xquery-operators/#func-true
http://www.w3.org/TR/xquery-operators/#func-false
23
index-of(sequence, value)
• The index-of() function allows you to obtain the position of value in sequence.
index-of((1,3,5,7,9,11), 7)
(the first argument is the sequence, the second is the value)
Output: (4)
7 is at the 4th index position.
http://www.w3.org/TR/xquery-operators/#func-index-of
24
Suppose the value occurs at multiple locations in the sequence
• index-of returns a sequence of index locations. In the last example the result was a sequence of length 1.
index-of((1,3,5,7,9,11,7,7), 7)
(multiple 7's in the sequence)
Output: (4, 7, 8)
See example05.1
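To make the 1-based positions concrete, here is a small Python sketch of the same idea. It is only an analogue of index-of(), written for illustration; the function name index_of is an assumption, and it compares with Python's == rather than XPath's eq.

```python
def index_of(sequence, value):
    """Return every 1-based position at which value occurs, like XPath's index-of()."""
    return [pos for pos, item in enumerate(sequence, start=1) if item == value]


print(index_of([1, 3, 5, 7, 9, 11], 7))        # [4]
print(index_of([1, 3, 5, 7, 9, 11, 7, 7], 7))  # [4, 7, 8]
```

When the value does not occur at all, the result is an empty list, mirroring the empty sequence.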
25
remove(sequence, position)
• The remove function enables you to remove a value at a specified position from a sequence.
remove((1,3,5,7,9,11), 4)
(the first argument is the sequence, the second is the position; here the item to remove is the 7 at position 4)
Output: (1, 3, 5, 9, 11)
See example05.2
http://www.w3.org/TR/xquery-operators/#func-remove
26
The "to" Range Operator
• The range operator, to, can be used to generate a sequence of consecutive integers:
(1 to 10)
returns the sequence:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
• This expression:
(1 to 100)[(. mod 10) = 0]
returns the sequence:
(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
• This expression:
(1, 2, 10 to 14, 34, 99)
returns this disjointed sequence:
(1, 2, 10, 11, 12, 13, 14, 34, 99)
See example06
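For readers who know Python, the same results can be reproduced with range(). This is a rough analogue only; note that Python's range excludes its upper bound, whereas the XPath "to" operator includes it. The variable names are made up for this sketch.

```python
# (1 to 100)[(. mod 10) = 0] : keep the items for which the predicate is true
multiples_of_ten = [i for i in range(1, 101) if i % 10 == 0]

# (1, 2, 10 to 14, 34, 99) : a range spliced into a longer sequence
mixed = [1, 2, *range(10, 15), 34, 99]

print(multiples_of_ten)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(mixed)             # [1, 2, 10, 11, 12, 13, 14, 34, 99]
```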
27
The operands of "to" must be integers
This is not valid:
('a' to 'z')
Error message you will get:
"Error: Required type of first operand of 'to' is integer; supplied value has type string"
28
insert-before(sequence, position, value)
insert-before((1,3,4,5,6,7,8,9), 2, 2)
(the first argument is the sequence, in which 2 is missing; the second is the position; the third is the value)
This inserts the value 2 before position 2.
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9)
http://www.w3.org/TR/xquery-operators/#func-insert-before
29
Appending a value to the end
insert-before(1 to 10, count(1 to 10) + 1, 2)
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2)
Specify a position greater than the length of the sequence
30
The inserted value can be a sequence
insert-before((1,3,4,5,6,7,8,9),2,(2,3))
Output: (1, 2, 3, 3, 4, 5, 6, 7, 8, 9)
(the third argument is a sequence of values)
See example05.3
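The position arithmetic of remove() and insert-before() is easy to get wrong by one, so here is a hedged Python sketch of the behavior described on these slides. The function names are illustrative, Python lists stand in for sequences, and the inserted values are always passed as a list (a single value becomes a one-item list).

```python
def remove(sequence, position):
    """Return the sequence without the item at the 1-based position."""
    if position < 1 or position > len(sequence):
        return list(sequence)  # out-of-range position: nothing is removed
    return list(sequence[:position - 1]) + list(sequence[position:])


def insert_before(sequence, position, inserts):
    """Return the sequence with inserts placed before the 1-based position."""
    position = max(position, 1)  # positions below 1 behave like position 1
    head = list(sequence[:position - 1])
    tail = list(sequence[position - 1:])  # empty when position is past the end
    return head + list(inserts) + tail


print(remove([1, 3, 5, 7, 9, 11], 4))                        # [1, 3, 5, 9, 11]
print(insert_before([1, 3, 4, 5, 6, 7, 8, 9], 2, [2]))       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(insert_before([1, 3, 4, 5, 6, 7, 8, 9], 2, [2, 3]))    # [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
```

Like their XPath counterparts, both helpers return a new list and leave the input untouched.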
31
Sequence Functions
• index-of() returns the index (position) of a value
• [idx] returns the value at idx
• remove() returns the sequence minus the item whose index (position) is specified
• insert-before() returns the sequence plus a new value
Do Lab8
32
Sequences are Ordered
• Order matters.
• This generates a sequence composed of the <mass> elements followed by the <name> elements:
(//planet/mass, //planet/name)
See example07
33
reverse(sequence)
• This function reverses the items in sequence.
See example07.1
Notice in the first example the items are wrapped in parentheses (thus creating a sequence).
http://www.w3.org/TR/xquery-operators/#func-reverse
34
The for Expression
• Use the for expression to loop (iterate) over all items in a sequence. This is its general form:
for variable in sequence return expression
• Here's an example which iterates over the integers 1-10, multiplying each integer by two:
for $i in (1 to 10) return $i * 2
returns
(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
See example08
http://www.w3.org/TR/xpath20/#id-for-expressions
35
for Expression Examples
• This iterates over each <planet> element, and returns its <radius> element:
for $p in /planets/planet return $p/radius
• This iterates over each <radius> element, and returns itself (the sequence generated is identical to above):
for $r in /planets/planet/radius return $r
• This iterates over each letter of the alphabet:
for $i in ('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z') return $i
See example09
36
More for Examples
• This returns the radius converted to kilometers (it returns numbers, not nodes):
for $r in /planets/planet/radius return $r * 1.61
• This applies the avg() function to the sequence of nodes returned by the for expression:
avg(for $r in /planets/planet/radius return $r)
See example10
37
Terminology
for variable in sequence return expression
range variable
input sequence
return expression
The return expression is evaluated once for each item in the input sequence.
38
Multiple Variables
Multiple variables can be used:
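for variable1 in sequence1, variable2 in sequence2, … return expression
(Each "variable in sequence" clause is separated from the next by a comma.)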
39
Example of Multiple Variables
for $x in (1, 2), $y in (3, 4) return ($x * $y)
returns (3, 4, 6, 8)
See example11
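The evaluation order with two range variables is that of a nested loop: the first variable is the outer loop, the second is the inner loop. A Python comprehension gives a rough analogue (Python, not XPath; the name products is made up for this sketch):

```python
# for $x in (1, 2), $y in (3, 4) return ($x * $y)
products = [x * y for x in (1, 2) for y in (3, 4)]

print(products)  # [3, 4, 6, 8]
```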
Do Lab9
40
The if Expression
• The form of the if expression is:
if (boolean expression) then expression1 else expression2
• If the boolean expression evaluates to true then the result is expression1, else the result is expression2
• This if expression finds the minimum of two numbers:
if (10 < 20) then 10 else 20
• This for loop returns all the positive numbers in the sequence:
for $i in (0, -3, 5, 7, -1, 2) return if ($i > 0) then $i else ()
See example12
http://www.w3.org/TR/xpath20/#id-conditionals
41
Nested if-then-else
if (boolean expr) then expr1 else expr2
These can be an if-then-else
42
Notes about the if Expression
1. You must wrap the boolean expression in parentheses.
2. You must have an "else" part. There is no if-then expression, only an if-then-else
Do Lab10
43
The some Expression
• The form of the some expression is:
some variable in sequence satisfies boolean expression
• The result of the expression is either true or false.
• Using the some expression means that at least one item in the sequence satisfies the boolean expression.
http://www.w3.org/TR/xpath20/#id-quantified-expressions
44
Examples of the some Expression
• This example determines if there are some (one or more) negative values in the sequence:
some $i in (2, 6, -1, 3, 9) satisfies $i < 0
• Note that this produces the same boolean result:
(2, 6, -1, 3, 9) < 0
because "<" is a general comparison operator, i.e. it compares each item in the sequence until a match is found.
See example13
45
More Examples of "some"
• Is there some planet that has a radius greater than 2000?
some $i in /planets/planet satisfies $i/radius > 2000
• Note that this produces the same boolean result:
/planets/planet/radius > 2000
See example14
46
The every Expression
• The form of the every expression is:
every variable in sequence satisfies boolean expression
• The result of the expression is either true or false.
• Using the every expression means that every item in the sequence satisfies the boolean expression.
http://www.w3.org/TR/xpath20/#id-quantified-expressions
47
Examples of the every Expression
• This example determines if every item in the sequence is positive:
every $i in (2, 6, -1, 3, 9) satisfies $i > 0
• Note that this produces the same boolean result:
not((2, 6, -1, 3, 9) <= 0)
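The some and every quantifiers correspond closely to Python's any() and all(). The sketch below is a Python analogue, not XPath, and the variable names are invented for illustration; it also checks the rewrite of every as a negated general comparison.

```python
values = (2, 6, -1, 3, 9)

# some $i in values satisfies $i < 0
some_negative = any(i < 0 for i in values)

# every $i in values satisfies $i > 0
every_positive = all(i > 0 for i in values)

# not(values <= 0): no item is less than or equal to zero
no_item_le_zero = not any(i <= 0 for i in values)

print(some_negative, every_positive, no_item_le_zero)  # True False False
```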
48
Multiple Universal Quantifiers
• An XPath expression can have multiple universal quantifiers.
every variable1 in sequence1, variable2 in sequence2, … satisfies condition
See example15
49
Union Operator
• The union operator is used to combine two node sequences (cannot union atomic sequences).
• Example: /planets/planet/mass union /planets/planet/radius
produces the sequence:
<mass units="(Earth = 1)">.0553</mass>
<radius units="miles">1516</radius>
<mass units="(Earth = 1)">.815</mass>
<radius units="miles">3716</radius>
<mass units="(Earth = 1)">1</mass>
<radius units="miles">2107</radius>
http://www.w3.org/TR/xpath20/#combining_seq
50
Equivalent
/planets/planet/mass union /planets/planet/radius
/planets/planet/mass | /planets/planet/radius
The union and | operators are equivalent.
51
Duplicates are Eliminated
• When you union two node sets, any duplicates are eliminated.
• This yields 3 nodes, not 6:
/planets/planet/mass union /planets/planet/mass
See example16
52
Intersect Operator
• The intersect operator returns the intersection of two node sequences.
• Example: find all planets with mass over .8 and radius over 2000:
/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
<planet> <name>Earth</name> <mass units="(Earth = 1)">1</mass> <day units="days">1</day> <radius units="miles">2107</radius> <density units="(Earth = 1)">1</density> <distance units="millions miles">128.4</distance> </planet>
http://www.w3.org/TR/xpath20/#combining_seq
53
Equivalent
/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]
/planets/planet[(mass > .8) and (radius > 2000)]
54
Duplicates are Eliminated
• When you intersect two node sets, any duplicates are eliminated.
• This yields 2 nodes, not 4:
/planets/planet[mass > .8] intersect /planets/planet[mass > .8]
See example17
55
Except Operator
• The except operator returns the difference between two node sequences.
• Example: get all planets except Earth:
/planets/planet except /planets/planet[name='Earth']
<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
http://www.w3.org/TR/xpath20/#combining_seq
56
Equivalent
/planets/planet except /planets/planet[name='Earth']
/planets/planet[name!='Earth']
See example18
57
I posed a challenge to the xml-dev list, challenging them to simplify an XPath expression. Their answer is awesome.
Problem: create an XPath expression for this:
There must be one child Title element and there must be zero or more child Author elements and there must be one child Date element and nothing else.
Here's the XPath 2.0 expression I created: count(Title) eq 1 and count(Author) ge 0 and count(Date) eq 1 and count(*[not(name() = ('Title','Author','Date'))]) eq 0
See next slide for the solution created by the XPath masters on xml-dev
58
Title and Date and empty(* except (Title[1], Date[1], Author))
Incredible, don't you think?
59
No Duplicates, Document Order
• The union, intersect, and except operators return their results as sequences in document order, without any duplicate items in the result sequence.
60
"Duplicate" is Based on Identity, Not Value
• Two nodes are duplicates iff they are the exact same node.
• These two <p> elements have the same value, but different identities
<div> <p>Box 1</p> <p>Box 1</p></div>
Do Lab11
61
Multiple Node Tests
• Recall that in XPath 1.0 an XPath expression is composed of steps separated by slashes: node-test slash node-test slash …
• At each step you can only specify one node test.
• In XPath 2.0 you can specify multiple node tests on each step.
62
Example of Multiple Node Tests
• Example: select the mass and radius for each planet:
/planets/planet/(mass|radius)
<mass units="(Earth = 1)">.0553</mass>
<radius units="miles">1516</radius>
<mass units="(Earth = 1)">.815</mass>
<radius units="miles">3716</radius>
<mass units="(Earth = 1)">1</mass>
<radius units="miles">2107</radius>
63
Equivalent
/planets/planet/(mass|radius)
/planets/planet/(mass union radius)
/planets/planet/mass | /planets/planet/radius
/planets/planet/*[(self::mass) or (self::radius)]
See example19
64
Examples of Multiple Node Tests using Union and Intersect Operators
XML:
<test> <a>A</a> <b>B</b> <c>C</c> <d>D</d> <e>E</e> </test>
XPath:
/test/(a, b) union /test/(c, d, e)
Output:
<a>A</a><b>B</b><c>C</c><d>D</d><e>E</e>
XPath:
/test/(a, b, c) intersect /test/(b, c, d)
Output:
<b>B</b><c>C</c>
See example20
65
Feed Nodes into a Function
• In XPath 1.0 an expression following a slash identifies node(s).
• In XPath 2.0 an expression following a slash can be a function. Each value preceding the slash is fed into the function.
/planets/planet/name/substring(.,1,1)
The name of each planet is fed into the substring function.
Output: ("M", "V", "E")
See example21
66
Feed Nodes into a for loop
/planets/planet/day/(for $i in . return $i * 2)
Output: (117.3, 233.5, 2)
Note: be sure you wrap the for-loop in parentheses.
See example22
67
Can't Feed Atomic Values
• The previous slides showed feeding nodes into a function and for-loop.
• You cannot feed atomic values, e.g., this is illegal:
(1 to 10)/(for $i in . return $i)
Here's the error message you get:
Error: Required item type of first operand of / is node(); supplied value has item type xs:integer
See example22.a
Do Lab12
68
Comments
• XPath 2.0 expressions may be commented using this syntax:
(: comment :)
(: multiply each day by two :) /planets/planet/day/(for $i in . return $i * 2)
69
General Comparison Operators
• Here are the general comparison operators:
=, !=, <, <=, >, >=
• These operators are used to compare sequences.
• Each item in one sequence is compared against each item in the other sequence; the comparison evaluates to true if one or more item-item comparisons evaluate to true.
http://www.w3.org/TR/xpath20/#id-general-comparisons
70
How General Comparison Works
(item1, item2) op (item3, item4)
is evaluated as:
(item1 op item3) or (item1 op item4) or (item2 op item3) or (item2 op item4)
(1, 2) = (2, 3)
is evaluated as:
(1 = 2) or (1 = 3) or (2 = 2) or (2 = 3)
thus it returns true
(1, 2) = (3, 4)
returns false because there are no equal values between the sequences
See example23
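The "any pair matches" rule can be written down in a few lines. The following Python sketch is an analogue of general comparison, not the XPath implementation: tuples stand in for sequences of atomic values, and general_compare is a name invented for this illustration.

```python
import operator
from itertools import product


def general_compare(left, right, op):
    """True if op holds for at least one (left item, right item) pair."""
    return any(op(a, b) for a, b in product(left, right))


print(general_compare((1, 2), (2, 3), operator.eq))  # True, because 2 = 2
print(general_compare((1, 2), (3, 4), operator.eq))  # False, no pair is equal
```

One consequence worth noticing: when either side is empty there are no pairs to compare, so the result is false whatever the operator.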
71
Example
• The left side returns a sequence of two planets (Venus, Earth), and the right side returns a sequence of three planets (Mercury, Venus, Earth).
• The result is true.
/planets/planet[mass > .8] = /planets/planet[density > .9]
See example24
72
Definition of Equal
• Two nodes are equal if:
– their node values are the same
– the order of the values is the same
– the number of values is the same
• The tag names can be different. Comparison is based on data, not markup.
73
Example
• The document below has two <planet> elements. They use different tag names.
/planets/planet[1] = /planets/planet[2] returns true.
<planets>
<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <n>Mercury</n> <m units="(Earth = 1)">.0553</m> <d units="days">58.65</d> <r units="miles">1516</r> <d units="(Earth = 1)">.983</d> <d units="millions miles">43.4</d> </planet>
</planets>
See example25
74
Equivalent?
• Problem: find all planets whose name is not in this sequence ('Earth', 'Mars')
• Are these equivalent?
/planets/planet[not(name = ('Earth', 'Mars'))]
/planets/planet[name != ('Earth', 'Mars')]
75
Not Equivalent!
/planets/planet[not(name = ('Earth', 'Mars'))]
returns two planets:
<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
/planets/planet[name != ('Earth', 'Mars')]
returns all three planets:
<planet> <name>Mercury</name> <mass units="(Earth = 1)">.0553</mass> <day units="days">58.65</day> <radius units="miles">1516</radius> <density units="(Earth = 1)">.983</density> <distance units="millions miles">43.4</distance> </planet>
<planet> <name>Venus</name> <mass units="(Earth = 1)">.815</mass> <day units="days">116.75</day> <radius units="miles">3716</radius> <density units="(Earth = 1)">.943</density> <distance units="millions miles">66.8</distance> </planet>
<planet> <name>Earth</name> <mass units="(Earth = 1)">1</mass> <day units="days">1</day> <radius units="miles">2107</radius> <density units="(Earth = 1)">1</density> <distance units="millions miles">128.4</distance> </planet>
76
Explanation
/planets/planet[not(name = ('Earth', 'Mars'))]
For each planet: is its name 'Earth' or 'Mars'? If so, don't return it; otherwise return it.
/planets/planet[name != ('Earth', 'Mars')]
For each planet: is its name not 'Earth' or not 'Mars'? If so, return it; otherwise don't return it.
Consider the planet whose name is Earth, using the first expression:
not((Earth equal Earth) or (Earth equal Mars))
not(true or false)
not(true)
false
Consider the same planet, using the second expression:
(Earth not equal Earth) or (Earth not equal Mars)
false or true
true
Every planet's name is not equal to Earth or not equal to Mars, so every planet is returned.
See example26
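The difference between the two predicates comes down to where the negation sits relative to the "any pair" test. A hedged Python analogue follows (plain strings stand in for the <name> elements; the variable names are made up for this sketch):

```python
names = ["Mercury", "Venus", "Earth"]
excluded = ("Earth", "Mars")

# planet[not(name = ('Earth', 'Mars'))]: keep a name only if NO pair is equal
kept_by_not_eq = [n for n in names if not any(n == e for e in excluded)]

# planet[name != ('Earth', 'Mars')]: keep a name if SOME pair is unequal
kept_by_ne = [n for n in names if any(n != e for e in excluded)]

print(kept_by_not_eq)  # ['Mercury', 'Venus']
print(kept_by_ne)      # ['Mercury', 'Venus', 'Earth']
```

Because the excluded list has two different values, every name differs from at least one of them, which is why the second form filters nothing out.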
77
Value Comparison Operators
• Here are the value comparison operators: eq, ne, lt, le, gt, ge
• These operators are used to compare atomic values.
• Example:
10 lt 30 returns true
• Example:
/planets/planet[1]/name eq 'Mercury' returns true
See example27
http://www.w3.org/TR/xpath20/#id-value-comparisons
78
No Sequences Allowed!
• Suppose the third planet contains two <name> elements:
<planet>
<name>Earth</name>
<name>Mother Earth</name>
</planet>
then
/planets/planet[3]/name eq 'Earth'
raises an error:
"Error! A sequence of more than one item is not allowed as the first operand of 'eq'."
See example28
79
However, this works
Note that:/planets/planet[3]/name = 'Earth'
returns true because the "=" operator is used with sequences.
See example29
80
is Operator
• You can compare two nodes to see if they are the same nodes by using the "is" operator:
expr1 is expr2
returns true only if expr1 and expr2 identify the same node. expr1 and expr2 must be singleton sequences.
This expression
//planet[mass = .815] is //planet[day = 116.75]
returns true because both expressions identify the same <planet> element
See example30
http://www.w3.org/TR/xpath20/#id-node-comparisons
81
<< Operator
• This expression
expr1 << expr2
returns true if the node identified by expr1 comes before the node identified by expr2 in the document.
This expression
//planet[mass = .0553] << //planet[mass = .815]
returns true because the left expression identifies Mercury, the right expression identifies Venus, and Mercury comes before Venus in the document
See example31
http://www.w3.org/TR/xpath20/#id-comparisons
82
>> Operator
• This expression
expr1 >> expr2
returns true if the node identified by expr1 comes after the node identified by expr2 in the document.
This expression
//planet[mass = .815] >> //planet[mass = .0553]
returns true because the left expression identifies Venus, the right expression identifies Mercury, and Venus comes after Mercury in the document
See example32
http://www.w3.org/TR/xpath20/#id-comparisons
Do Lab13
83
Arithmetic Operators
• Here are the arithmetic operators:+, -, *, div, mod, idiv
• The idiv operator performs integer division: it operates on integers and returns an integer, truncating the result toward zero, e.g.
3 idiv 2 returns 1
-5 idiv 2 returns -2
See example33
http://www.w3.org/TR/xpath20/#id-arithmetic
84
Equivalent
n idiv m
floor(n div m) if n and m have the same sign
ceiling(n div m) if exactly one of n and m is negative
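Truncation toward zero differs from the floor division that some languages use for integers. As an illustration only, here is a Python sketch of idiv; the function name is an assumption, and Python's own // operator is shown for contrast because it rounds toward negative infinity.

```python
def idiv(n: int, m: int) -> int:
    """Integer division truncating toward zero, like XPath's idiv."""
    quotient = abs(n) // abs(m)
    # Same signs give a non-negative result; mixed signs give a non-positive one.
    return quotient if (n < 0) == (m < 0) else -quotient


print(idiv(3, 2))    # 1
print(idiv(-5, 2))   # -2
print(idiv(-5, -2))  # 2
print(-5 // 2)       # -3  (Python floor division, NOT what idiv returns)
```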
85
current-dateTime Function
• current-dateTime() is an XPath 2.0 function that returns the current date and time, e.g.
2008-01-19T14:19:26.406-05:00
• The value returned by this function is of type xs:dateTime (the XML Schema dateTime datatype).
See example34
http://www.w3.org/TR/xquery-operators/#func-current-dateTime
86
The matches() Function
• The form of the matches function is:
matches(input string, regex)
• It is a boolean function. It returns true if the input string matches the regular expression, false otherwise.
if (matches(/planets/planet[2]/name, 'Venus')) then 'Success' else 'Failure'
The matches() function evaluates to true; the result is 'Success'
http://www.w3.org/TR/xpath-functions/#func-matches
87
The matches() Function
if (matches(/planets/planet[2]/name, 'V[a-z]+s')) then 'Success' else 'Failure'
This regex says: any string that contains a 'V', followed by one or more lowercase letters of the alphabet, followed by an 's'.
See example44
88
Regular Expressions
• The following 4 slides show examples of regular expressions:
Regular Expression    Examples
Chapter \d            Chapter 1
a*b                   b, ab, aab, aaab, …
[xyz]b                xb, yb, zb
a?b                   b, ab
a+b                   ab, aab, aaab, …
[a-c]x                ax, bx, cx
89
Regular Expressions (cont.)
Regular Expression    Examples
[a-c]x                ax, bx, cx
[-ac]x                -x, ax, cx
[ac-]x                ax, cx, -x
[^0-9]x               any non-digit char followed by x
\Dx                   any non-digit char followed by x
Chapter\s\d           Chapter followed by a whitespace character followed by a digit
(ho){2} there         hoho there
(ho\s){2} there       ho ho there
.abc                  any (one) char followed by abc
(a|b)+x               ax, bx, aax, bbx, abx, bax, ...
90
Regular Expressions (cont.)
Regular Expression    Examples
a{1,3}x               ax, aax, aaax
a{2,}x                aax, aaax, aaaax, …
\w\s\w                a word character (any character other than punctuation, separators, and control characters) followed by a whitespace character followed by a word character
[a-zA-Z-[Ol]]*        a string composed of any lower and upper case letters, except "O" and "l"
\.                    the period "." (without the backward slash the period means "any character")
91
Regular Expressions (cont.)
Regular Expression    Matches
^Hello                Hello (and it must be at the beginning)
Hello$                Hello (and it must be at the end)
^Hello$               Hello (and it must be the only value)
92
Regular Expressions (cont.)
Escape    Meaning
\n        linefeed
\r        carriage return
\t        tab
\\        The backward slash \
\|        The vertical bar |
\-        The hyphen -
\^        The caret ^
\?        The question mark ?
\*        The asterisk *
\+        The plus sign +
\{        The open curly brace {
\}        The close curly brace }
\(        The open paren (
\)        The close paren )
\[        The open square bracket [
\]        The close square bracket ]
93
Regular Expressions (concluded)
Regular Expression    Matches
\p{L}                 A letter, from any language
\p{Lu}                An uppercase letter, from any language
\p{Ll}                A lowercase letter, from any language
\p{N}                 A number - Roman, fractions, etc
\p{Nd}                A digit from any language
\p{P}                 A punctuation symbol
\p{Sc}                A currency sign, from any language
\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?
"currency sign from any language, followed by one or more digits from any language, optionally followed by a period and two digits from any language"
94
Different from the Regex in the XML Schema Pattern Facet
Consider this XML Schema element declaration:
<element name="Free-text">
  <simpleType>
    <restriction base="string">
      <pattern value="Hello" />
    </restriction>
  </simpleType>
</element>
And suppose this is the input:
<Free-text>Hello</Free-text>
The input validates against the schema. That is, the string "Hello" matches the regex in the pattern facet.
Likewise, using the same input and regex, the matches function succeeds:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
95
Different from the Regex in the XML Schema Pattern Facet
Next, consider this input:
<Free-text>He said Hello World</Free-text>
The input does not validate against the schema. That is, the string "He said Hello World" does not match the regex in the pattern facet.
Conversely, the matches function does succeed:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
http://www.w3.org/TR/xpath-functions/#regex-syntax
96
XSD Regexes are Implicitly Anchored
• When you give a regex in a pattern facet, there are "implicit anchors" in the regex.
• The regex "Hello" behaves as if it were written like this:
^Hello$
The ^ matches the start of the input
The $ matches the end of the input
Thus "Hello" matches only input that starts with H, ends with o, and in between is ell.
97
No Implicit Anchors in XPath Regex's
• The regex "Hello" in XPath has no implicit anchors. Any anchors must be explicitly specified.
• Thus, the regex "Hello" matches any input that contains the string Hello
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
is equivalent to:
if (contains(//Free-text, 'Hello')) then 'Success' else 'Failure'
See example45
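The anchored versus unanchored distinction exists in most regex libraries. As a rough analogue only, Python's re module (whose regex dialect is not identical to XPath's) offers re.search, which behaves like the unanchored matches() function, and re.fullmatch, which behaves like the implicitly anchored pattern facet. The variable names below are invented for this sketch.

```python
import re

text = "He said Hello World"

# Unanchored, like XPath matches(): succeeds if the regex matches anywhere.
found_anywhere = re.search("Hello", text) is not None

# Implicitly anchored, like the XSD pattern facet: the whole input must match.
whole_input = re.fullmatch("Hello", text) is not None

print(found_anywhere)  # True
print(whole_input)     # False
```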
98
Case-Insensitivity Mode
• The matches function has an optional third argument:
matches(input, regex, flags)
• The "i" flag is used to perform a case-insensitive comparison of the input and the regex.
Example: suppose this is the input:
<Free-text>He said HELLO WORLD</Free-text>
Consider this XPath:
if (matches(//Free-text, 'Hello', 'i')) then 'Success' else 'Failure'
The result is 'Success' because the input is checked to see if it contains 'Hello', 'hello', 'HELLO', 'HeLLO', etc.
99
The Default is Case-Sensitive
• If the "i" flag is not used in the matches function, it defaults to a case-sensitive comparison.
Consider this XPath:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is checked to see if it contains exactly 'Hello', and the input contains only 'HELLO'.
See example46
100
Multiline Mode
• The "m" flag is used to indicate that the input should be treated as composed of one or more lines, each line has a start and end, and the regex should be compared against each line.
Example: suppose this is the input (note that it spans two lines):
<Free-text>He said
Hello World</Free-text>
Consider this XPath:
if (matches(//Free-text, '^Hello', 'm')) then 'Success' else 'Failure'
The result is 'Success'. The regex says: does the input start with the string 'Hello'? The 'm' flag says: check each line. Thus, the result is 'Success' since the second line starts with 'Hello'.
101
The Default is One Long String
• If the "m" flag is not used in the matches function, it defaults to treating the input as one long string, with one start and one end.
Consider this XPath:
if (matches(//Free-text, '^Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is treated as one long string and 'Hello' does not start the string.
See example47
102
Dot-all Mode
• The "s" flag is used to indicate that the dot (.) character matches every character, including the newline (x0A) character.
• If the "s" flag is not used, the default behavior is for the dot character to match every character except the newline character.
In these examples the input string contains a newline between Hello and World:
if (matches('Hello
World', 'H.*World')) then 'Success' else 'Failure'
The result is 'Failure'
if (matches('Hello
World', 'H.*World', 's')) then 'Success' else 'Failure'
The result is 'Success'
See example48
103
Ignore Whitespace Mode
• The "x" flag is used to indicate that whitespace in a regex should be ignored.
• If the "x" flag is not used then any whitespace in the regex is treated as part of the regex.
if (matches('abcabc', '(a b c)+')) then 'Success' else 'Failure'
The result is 'Failure.' The regex only matches this input: a b c, a b c a b c, etc.
if (matches('abcabc', '(a b c)+', 'x')) then 'Success' else 'Failure'
The result is 'Success.' The regex only matches this input: abc, abcabc, etc.
See example49
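Python's re.VERBOSE flag behaves much like the "x" flag for this example (an analogue only; the two regex dialects differ in other details):

```python
import re

# Without the flag, the spaces in the pattern must appear in the input.
plain = bool(re.search(r"(a b c)+", "abcabc"))

# With the verbose flag, whitespace in the pattern is ignored.
ignore_ws = bool(re.search(r"(a b c)+", "abcabc", re.VERBOSE))

print(plain, ignore_ws)  # False True
```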
104
Multiple Flags
• Zero or more flags can be specified.
• The default value is used for modes not specified.
if (matches('Hello
World', '^WORLD$', 'im')) then 'Success' else 'Failure'
The result is 'Success'. The regex says: the input must begin and end with the literal string 'WORLD'. The flags say: ignore case, treat the input as 2 lines (it contains a newline between 'Hello' and 'World'), and compare each line.
See example50
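Combining flags can be sketched in Python by OR-ing the flag constants. This assumes the input has a newline between 'Hello' and 'World'; re.IGNORECASE and re.MULTILINE stand in for "i" and "m".

```python
import re

# Assumed input: two lines, "Hello" and "World".
text = "Hello\nWorld"

# Ignore case and let ^ and $ match at the start and end of each line.
both_flags = bool(re.search(r"^WORLD$", text, re.IGNORECASE | re.MULTILINE))

# With only one of the two flags the match fails.
only_i = bool(re.search(r"^WORLD$", text, re.IGNORECASE))
only_m = bool(re.search(r"^WORLD$", text, re.MULTILINE))

print(both_flags, only_i, only_m)  # True False False
```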
Do Lab14
105
The tokenize() Function
• Use to split up a string into pieces (tokens).
• A regex specifies the characters that separate the tokens.
for $i in tokenize('12, 16, 3, 99', ',\s*') return $i
The result is: 12 16 3 99
http://www.w3.org/TR/xpath-functions/#func-tokenize
106
Use Flags with tokenize()
• The flags (i, m, s, x) we saw with the matches() function are also available with tokenize()
for $i in tokenize('12xx16XX3xX99', 'xx', 'i') return $i
The result is: 12 16 3 99
See example51
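The two tokenize() calls above behave like re.split() in Python (an analogue, not the XPath function itself):

```python
import re

# Separator: a comma followed by optional whitespace.
numbers = re.split(r",\s*", "12, 16, 3, 99")

# Separator 'xx', matched case-insensitively (like the "i" flag).
mixed_case = re.split(r"xx", "12xx16XX3xX99", flags=re.IGNORECASE)

print(numbers)     # ['12', '16', '3', '99']
print(mixed_case)  # ['12', '16', '3', '99']
```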
107
Separators are Discarded
• The separators are specified using a regex.
• The input string is processed from left to right, looking for substrings that match the regex.
• The separators are discarded; the remaining strings are collected to form the output sequence.
108
Example: Footnote References as Separators
• Tokenize the input using [n] as the separators.
• For example, tokenize this: XPath[1] XSLT[2]
into these tokens: XPath XSLT
Will this work?
tokenize('XPath[1] XSLT[2]', '\[.+\]')
109
+ is a Greedy Quantifier
• The regex on the previous slide does not produce the desired result.
• Here's why: the + operator searches for the longest string that matches. It is called a greedy operator.
\[.+\] Read as: find the longest string that starts with '[' and ends with ']'
See example52
110
Why Does This Work?
tokenize('XPath[1] XSLT[2]', '\[\d+\]')
111
Regex is for [digit(s)]
tokenize('XPath[1] XSLT[2]', '\[\d+\]')
Only permit digits in the brackets
See example53
112
+? is a non-Greedy Operator
• If you want to match the shortest possible substring, add a '?' after the quantifier to make it non-greedy.
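\[.+?\] Read as: find the shortest string that starts with '[' and ends with ']'
tokenize('XPath[1] XSLT[2]', '\[.+?\]') Yields the desired tokens: 'XPath' and ' XSLT' (the second token keeps its leading space, and a zero-length string follows because the input ends with a separator)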
See example54
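The greedy and non-greedy separators from the last few slides can be compared side by side. This is a Python sketch using re.split() as a stand-in for tokenize(); the quantifiers behave the same way for these inputs.

```python
import re

text = "XPath[1] XSLT[2]"

# Greedy: .+ runs from the first '[' to the last ']'.
greedy = re.split(r"\[.+\]", text)

# Only digits allowed inside the brackets.
digits_only = re.split(r"\[\d+\]", text)

# Non-greedy: .+? stops at the first ']'.
non_greedy = re.split(r"\[.+?\]", text)

print(greedy)       # ['XPath', '']
print(digits_only)  # ['XPath', ' XSLT', '']
print(non_greedy)   # ['XPath', ' XSLT', '']
```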
113
* and + are Greedy
• Above we saw that + is greedy
• * is also greedy
• To make them non-greedy, append a '?': *? and +?
114
Regex with 2 Alternatives, and Both Match
• Consider this XPath: tokenize('bab', 'a|ab')
• What tokens will be generated? {b, b} or {b}
115
First Alternative Wins!
• If multiple alternatives match, the first one is used.
• Thus, the result is: {b, b}
• Suppose that's not what we want. We want the longest alternative ('ab') used whenever possible.
See example55
116
Solution
• Both of these regexes give the desired result: ab|a or ab?
See example56
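A Python sketch of the same experiment (re.split() as an analogue of tokenize(); Python also tries alternatives from left to right). Note that when 'ab' is consumed as the separator, a zero-length string is left after it.

```python
import re

# First alternative 'a' wins, so the trailing 'b' survives as a token.
first_wins = re.split(r"a|ab", "bab")

# Putting the longer alternative first lets 'ab' be the separator.
longest_first = re.split(r"ab|a", "bab")

# Same effect with an optional 'b' (the ? is greedy).
optional_b = re.split(r"ab?", "bab")

print(first_wins)     # ['b', 'b']
print(longest_first)  # ['b', '']
print(optional_b)     # ['b', '']
```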
117
Separator Matches Beginning and Ending
• Consider this XPath: tokenize('aba', 'a')
• The input string starts with the separator and ends with the separator
• What will be the result?
118
Zero-length Strings
• The output is a zero-length string, 'b', zero-length string:
{'', 'b', ''}
See example57
119
Regex Doesn't Match Input
• If the regex doesn't match the input string then the result is the input string:
tokenize('bbb', 'a') produces {'bbb'}
See example58
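Both edge cases (separator at the ends, and no separator at all) look the same in a Python analogue using re.split():

```python
import re

# Separator at the start and at the end yields zero-length strings.
edges = re.split(r"a", "aba")

# No match at all: the whole input comes back as a single token.
no_match = re.split(r"a", "bbb")

print(edges)     # ['', 'b', '']
print(no_match)  # ['bbb']
```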
Do Lab15
120
What Separator?
• Suppose you want to split (tokenize) this string W151TBH into
{'W', '151', 'TBH'}
• That is, separate the numeric from the alphabetic.
• What regex would you use?
121
Need More Knowledge
• The problem can't be solved given what we currently know.
• However, it can be solved by using the tokenize() function with the replace() function, so let's learn about replace().
122
The replace() Function
• The replace() function replaces any string that matches the regex with a replacement string:
replace(input, regex, replacement)
• Example: this removes all vowels: replace('Hello World', '[aeiou]', '')
returns: {'Hll Wrld'}
See example59
http://www.w3.org/TR/xpath-functions/#func-replace
123
Example
• What is the result of this replace: replace('banana', '(an)*a', '#')
See example60
124
* is a Greedy Operator
• The result of: replace('banana', '(an)*a', '#') is b#
• (an)* looks for the longest string of 'anan…'
• The * is a greedy operator
• To make it non-greedy, append ? to the *: replace('banana', '(an)*?a', '#')
• The result is: b#n#n#
See example61
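The same greedy versus non-greedy behavior can be observed with Python's re.sub(), used here as an analogue of replace() (note the different argument order):

```python
import re

# Greedy: (an)* grabs 'anan' and the final 'a' completes the match.
greedy = re.sub(r"(an)*a", "#", "banana")

# Non-greedy: (an)*? matches as few 'an' as possible, so each 'a' is replaced.
non_greedy = re.sub(r"(an)*?a", "#", "banana")

print(greedy)      # b#
print(non_greedy)  # b#n#n#
```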
125
Two Matching Alternatives
• Suppose the regex contains two alternatives, and both match:
replace('banana', 'a|an', '#')
• What will be the result?
126
Leftmost Alternative Wins
• The rule is that the first (leftmost) alternative wins:
replace('banana', 'a|an', '#') results in:
b#n#n#
• Switching the alternatives:
replace('banana', 'an|a', '#') results in:
b###
See example62
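A Python sketch of the leftmost-alternative rule, with re.sub() standing in for replace():

```python
import re

# 'a' is tried first and always succeeds, so 'an' is never used.
a_first = re.sub(r"a|an", "#", "banana")

# 'an' is tried first; the lone final 'a' falls back to the second alternative.
an_first = re.sub(r"an|a", "#", "banana")

print(a_first)   # b#n#n#
print(an_first)  # b###
```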
127
Using Variables in the Replacement String
• Consider a regex composed of a sequence of parenthesized expressions:
( … )( … )( … )
$1 $2 $3
$1 stands for the characters matched by the first parenthesized expression
$2 stands for the characters matched by the second parenthesized expression
…
$9 stands for the characters matched by the ninth parenthesized expression
128
Example: Insert Hyphens into a Date
replace('12March2008', '([0-9]+)([a-zA-Z]+)([0-9]+)', '$1-$2-$3')
The result is: 12-March-2008
See example63
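For comparison, a Python analogue of the date example. One assumption to note: Python's re.sub() writes group references as \1, \2, \3 rather than $1, $2, $3.

```python
import re

# Three groups: day digits, month letters, year digits.
pattern = r"([0-9]+)([a-zA-Z]+)([0-9]+)"

# \1, \2, \3 refer to the text matched by each parenthesized group.
hyphenated = re.sub(pattern, r"\1-\2-\3", "12March2008")

print(hyphenated)  # 12-March-2008
```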
129
Regex Doesn't Match Input
• If the regex doesn't match the input then the result will be unchanged:
replace('aaaa', 'b', '#')
The result is: aaaa
See example64
130
Use Flags with replace()
• replace() uses the same flags as matches() and tokenize(): i, m, s, x
• Example: replace('Haha', 'h', 'b', 'i') returns:
baba
See example65
Do Lab16
131
Tokenize this String
• How would you separate the numeric parts from the character parts:
W151TBH
{'W', '151', 'TBH'}
132
Step 1
• Use replace() to append a hash mark (#) onto the end of each part:
W151TBH
W#151#TBH#
This is accomplished using replace: replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#')
See example66
133
Step 2
• Tokenize using # as the separator:
W#151#TBH#
{'W', '151', 'TBH', ''}
This is accomplished by this: tokenize('W#151#TBH#', '#')
See example67
134
Step 3
• Remove the zero-length string
('W', '151', 'TBH', '')[.]
The predicate [.] keeps each item whose effective boolean value is true. A zero-length string counts as false, so it is dropped. Recall that the value of ('a', '')[.] is just ('a')
See example68
135
Putting it all Together
tokenize(replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#'), '#')[.]
This produces: ('W', '151', 'TBH')
See example69
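The three steps can be traced in Python as a rough analogue (re.sub for replace(), re.split for tokenize(), and a list filter in place of the [.] predicate):

```python
import re

# Step 1: append '#' to each run of digits or letters.
marked = re.sub(r"([0-9]+|[a-zA-Z]+)", r"\1#", "W151TBH")

# Step 2: split on '#'; the trailing separator leaves a zero-length string.
tokens = re.split(r"#", marked)

# Step 3: drop zero-length strings (an empty string is falsy, like in [.]).
parts = [t for t in tokens if t]

print(marked)  # W#151#TBH#
print(tokens)  # ['W', '151', 'TBH', '']
print(parts)   # ['W', '151', 'TBH']
```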
136
What does the predicate apply to?
• What is the result of these statements?
//name[1]
(//name)[1]
137
Answer
• //name[1] returns the first <name> element in each <planet> element.
– Number of elements returned: 3
• (//name)[1] returns the first <name> element among all the <name> elements in all the <planet> elements.
– Number of elements returned: 1
See example70
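The difference can also be seen with Python's xml.etree.ElementTree, which supports a small XPath 1.0 subset. This is a sketch on a cut-down, hypothetical version of planets.xml containing only the name elements.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for planets.xml (names only).
planets = ET.fromstring(
    "<planets>"
    "<planet><name>Mercury</name></planet>"
    "<planet><name>Venus</name></planet>"
    "<planet><name>Earth</name></planet>"
    "</planets>"
)

# Like //name[1]: the predicate applies per parent, so one name per planet.
first_in_each_planet = [n.text for n in planets.findall(".//name[1]")]

# Like (//name)[1]: gather every name first, then take the first of them all.
first_overall = [n.text for n in planets.findall(".//name")[:1]]

print(first_in_each_planet)  # ['Mercury', 'Venus', 'Earth']
print(first_overall)         # ['Mercury']
```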
138
Select the first Book by each Author
<BookStore>
  <Book>
    <Title>Illusions The Adventures of a Reluctant Messiah</Title>
    <Author>Richard Bach</Author>
    <Date>1977</Date>
    <ISBN>0-440-34319-4</ISBN>
    <Publisher>Dell Publishing Co.</Publisher>
  </Book>
  <Book>
    <Title>The First and Last Freedom</Title>
    <Author>J. Krishnamurti</Author>
    <Date>1954</Date>
    <ISBN>0-06-064831-7</ISBN>
    <Publisher>Harper &amp; Row</Publisher>
  </Book>
  <Book>
    <Title>Jonathan Livingston Seagull</Title>
    <Author>Richard Bach</Author>
    <Date>1970</Date>
    <ISBN>0-684-84684-5</ISBN>
    <Publisher>Simon &amp; Schuster</Publisher>
  </Book>
</BookStore>
Select these two
139
Select the first Book by each Author
//Book[not(Author = preceding::Book/Author)]
The predicate evaluates to true if the Author of the Book is not the same as the Author of a preceding Book
See example71
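The logic of the predicate can be spelled out procedurally. This Python sketch uses a hypothetical list of (title, author) pairs in document order instead of the XML itself.

```python
# Hypothetical stand-in for the BookStore document, in document order.
books = [
    ("Illusions The Adventures of a Reluctant Messiah", "Richard Bach"),
    ("The First and Last Freedom", "J. Krishnamurti"),
    ("Jonathan Livingston Seagull", "Richard Bach"),
]

seen_authors = set()
first_books = []
for title, author in books:
    # Mirrors not(Author = preceding::Book/Author): keep the book only
    # if no earlier book has the same author.
    if author not in seen_authors:
        first_books.append(title)
        seen_authors.add(author)

print(first_books)
```

The first two books are kept; the third is skipped because Richard Bach already appeared.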
Do Lab17
140
XPath Functions
• http://www.w3schools.com/Xpath/xpath_functions.asp
• http://www.w3.org/TR/xquery-operators/#contents
141
XPath 2.0 Functions
142
distinct-values(values)
• This XPath function will return a sequence composed of unique values.
distinct-values((2, 2, 3, 4, 1, 4, 2, 6, 3, 9))
Output: 2 3 4 1 6 9
Note that the sequence of integers is wrapped within a pair of parentheses. Why? Because the whole sequence must be passed to the function as a single argument.
See example72
http://www.w3.org/TR/xquery-operators/#func-distinct-values
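A Python sketch of the same idea, keeping the first occurrence of each value. One caveat: the specification leaves the order of the result of distinct-values() implementation dependent, so the first-occurrence order shown here is an assumption that happens to match the output on the slide.

```python
values = [2, 2, 3, 4, 1, 4, 2, 6, 3, 9]

# dict.fromkeys keeps one entry per value, in first-occurrence order.
distinct = list(dict.fromkeys(values))

print(distinct)  # [2, 3, 4, 1, 6, 9]
```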
143
Another Example
<?xml version="1.0"?>
<FitnessCenter>
  <Member id="1" level="platinum">
    <Name>Jeff</Name>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
  <Member id="2" level="gold">
    <Name>David</Name>
    <FavoriteColor>lightblue</FavoriteColor>
  </Member>
  <Member id="3" level="platinum">
    <Name>Roger</Name>
    <FavoriteColor>lightyellow</FavoriteColor>
  </Member>
  <Member id="4" level="platinum">
    <Name>Sally</Name>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
  <Member id="5" level="platinum">
    <Name>Linda</Name>
    <FavoriteColor>purple</FavoriteColor>
  </Member>
</FitnessCenter>
distinct-values(/FitnessCenter/Member/FavoriteColor)
Output: lightgrey lightblue lightyellow purple
See example73
Do Lab18
144
doc(url)
• The doc(url) function is used to retrieve data from another XML document.
doc('FitnessCenter2.xml')
See example74
You must put quotes around the file name. Actually, the argument to doc() is a URL.
http://www.w3.org/TR/xquery-operators/#func-doc
145
data(item)
• This function returns the (atomic) value of a node, i.e., it "atomizes" the node.
• This function is similar to the string(item) function, except that string() always returns the value of the item as a string, whereas data(item) returns the value of the item with its type intact.
http://www.w3.org/TR/xquery-operators/#func-data
146
data(item)
string(/FitnessCenter/Member[1]/MembershipFee) + 1    Result: error
data(/FitnessCenter/Member[1]/MembershipFee) + 1    Result: 341
data(340) + 1    Result: 341
See example75
147
error(QName?, description)
• You can raise an error in your XPath using the error() function.
for $i in /FitnessCenter/Member return if (number($i/MembershipFee) lt 0) then error((), 'Invalid value for MembershipFee') else true()
http://www.w3.org/TR/xquery-operators/#func-error See example76
148
trace(value, message)
• This is used for debugging, to monitor the execution.
• The trace() function does two things:
– it returns (outputs) value
– it displays message and information about value
for $i in /FitnessCenter/Member return trace($i/MembershipFee, 'The membership fee is:')
Output:
<MembershipFee>340</MembershipFee>
<MembershipFee>-500</MembershipFee>
<MembershipFee>340</MembershipFee>
Screen:
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[1]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[2]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[3]/MembershipFee[1]
http://www.w3.org/TR/xquery-operators/#func-trace See example77
149
compare(string1, string2)
• This function performs a string comparison of string1 against string2.
• If string1 is less than string2 then it returns -1
• If string1 is equal to string2 then it returns 0
• If string1 is greater than string2 then it returns 1
compare('ab','abc')
compare('ab','ab')
compare('abc','ab')
Output: -1 0 1
http://www.w3.org/TR/xquery-operators/#func-compare See example78
150
string-join(sequence, separator)
• The first argument identifies any number of values.
• The function will concatenate all the values, placing separator between each value.
string-join(('a','b','c'),' ')
Output: a b c
string-join(/FitnessCenter/Member/Name,'/')
Output: Jeff/David/Roger
http://www.w3.org/TR/xquery-operators/#func-string-join See example79
151
An elegant way of creating the XPath to any node
string-join(for $i in ancestor-or-self::* return name($i),'/')
The for expression returns the name of the current node (self) plus the names of all its ancestors.
Example: Suppose that the current node is FavoriteColor. Then the for expression will return: FitnessCenter Member FavoriteColor
The string-join function will then concatenate these values together, separating each value with /
Thus, the output is: FitnessCenter/Member/FavoriteColor
See example80
Do Lab19
152
starts-with(string-to-test, string)
• This function returns true if string-to-test starts with string, false otherwise.
starts-with('abc', 'a')
starts-with(/FitnessCenter/Member[1]/FavoriteColor, 'light')
Output: true true
Note: this XPath function is also present in version 1.0
See example81
http://www.w3.org/TR/xquery-operators/#func-starts-with
153
ends-with(string-to-test, string)
• This function returns true if string-to-test ends with string, false otherwise.
ends-with('xyz', 'yz')
ends-with(/FitnessCenter/Member[1]/FavoriteColor, 'grey')
Output: true true
Note: this XPath function is not present in version 1.0
See example82
http://www.w3.org/TR/xquery-operators/#func-ends-with
154
String Functions You Already Know
• contains(string-to-test, string)
• substring(string, starting-loc, length?)
• substring-before(string, match-string)
• substring-after(string, match-string)
• translate(string, from-pattern, to-pattern)
See example83
http://www.w3.org/TR/xquery-operators/#contents
155
normalize-space(string)
• This function strips leading and trailing whitespace (space, carriage return, tab), and replaces multiple whitespaces within the data by a single space.
normalize-space(' A cat ate the mouse ')
Output: A cat ate the mouse
normalize-space('There are
two lines')
Output: There are two lines
See example84
http://www.w3.org/TR/xquery-operators/#func-normalize-space
156
upper-case(string), lower-case(string)
upper-case('hello world')
Output: HELLO WORLD
lower-case('BLUE SKY')
Output: blue sky
See example85
http://www.w3.org/TR/xquery-operators/#func-upper-case
http://www.w3.org/TR/xquery-operators/#func-lower-case
157
escape-html-uri(uri)
• This function makes a URI usable by browsers, by escaping non-ASCII characters.
escape-html-uri('http://www.example.com?value=Π')
Output: http://www.example.com?value=%CE%A0
See example86
http://www.w3.org/TR/xquery-operators/#func-escape-html-uri
158
year-from-date(xs:date)
• The argument of this function is a date as defined in XML Schemas.
• Recall that the format of a date is: CCYY-MM-DD
year-from-date(xs:date('2009-09-19'))
Output: 2009
See example87
http://www.w3.org/TR/xquery-operators/#func-year-from-date
159
Many Date, Time Functions!
year-from-dateTime(xs:dateTime)
month-from-dateTime(xs:dateTime)
day-from-dateTime(xs:dateTime)
hours-from-dateTime(xs:dateTime)
minutes-from-dateTime(xs:dateTime)
seconds-from-dateTime(xs:dateTime)
timezone-from-dateTime(xs:dateTime)
year-from-date(xs:date)
month-from-date(xs:date)
day-from-date(xs:date)
timezone-from-date(xs:date)
hours-from-time(xs:time)
minutes-from-time(xs:time)
seconds-from-time(xs:time)
timezone-from-time(xs:time)
http://www.w3.org/TR/xquery-operators/#component-extraction-functions See example88
160
root(node?)
Document /
  PI <?xml version="1.0"?>
  Element FitnessCenter
    Element Member
      Element Name
        Text Jeff
      Element FavoriteColor
        Text lightgrey
    Element Member
      Element Name
        Text David
      Element FavoriteColor
        Text lightblue
    Element Member
      Element Name
        Text Roger
      Element FavoriteColor
        Text lightyellow
The root() function returns the document node
161
Useful if working with multiple documents
• The root() function can be very useful if you are working with multiple documents.
• The following XPath expression outputs the name of every element in the document, regardless of which document is currently being processed.
for $i in root()//* return name($i)
See example89
http://www.w3.org/TR/xquery-operators/#func-root
162
subsequence(sequence, start-loc, length?)
• This function returns a portion of sequence. Namely, it returns the items in sequence starting at index position start-loc. If length is not specified then it returns all the following items in the sequence. Otherwise, it returns length items.
subsequence((1 to 10), 2, 5)
Output: 2,3,4,5,6
subsequence(//Name, 2)
Output: <Name>David</Name> <Name>Roger</Name>
See example90
http://www.w3.org/TR/xquery-operators/#func-subsequence
Do Lab20
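The indexing used by subsequence() is 1-based, which is easy to get wrong. Here is a minimal Python sketch of the behavior described above; the helper name subsequence is hypothetical, and it only handles a positive integer start position.

```python
def subsequence(seq, start, length=None):
    """Return items of seq from 1-based position start (a positive integer)."""
    items = list(seq)
    begin = start - 1  # convert the 1-based position to a 0-based index
    end = None if length is None else begin + length
    return items[begin:end]

# Like subsequence((1 to 10), 2, 5)
five_items = subsequence(range(1, 11), 2, 5)

# Like subsequence(//Name, 2), with a plain list standing in for the nodes
rest_of_names = subsequence(["Jeff", "David", "Roger"], 2)

print(five_items)     # [2, 3, 4, 5, 6]
print(rest_of_names)  # ['David', 'Roger']
```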
163
zero-or-one(sequence), one-or-more(sequence), exactly-one(sequence)
• These functions are used to assert that a sequence contains the number of occurrences that you expect.
• Each function will generate an error if the sequence does not contain the expected number of occurrences. If the sequence does contain the expected number of occurrences then it simply returns the sequence.
zero-or-one(/FitnessCenter/Member[1]/Name)
one-or-more(/FitnessCenter/Member[1]/Phone)
exactly-one(/FitnessCenter/Member[1]/FavoriteColor)
See example91
http://www.w3.org/TR/xquery-operators/#func-zero-or-one
164
avg(sequence)
avg((1 to 100))
Output: 50.5
avg(//MembershipFee)
Output: 393.3333333333
Note that the avg() function has only one argument. Consequently, in the first XPath expression it was necessary to wrap the items with parentheses.
See example92
http://www.w3.org/TR/xquery-operators/#func-avg
165
max(sequence)
• The max() function enables you to obtain the maximum value among a sequence of values.
max((5, 3, 19, 2, -7))
Output: 19
max(//MembershipFee)
Output: 500
See example93
http://www.w3.org/TR/xpath-functions/#func-max
166
min(sequence)
• The min() function enables you to obtain the minimum value among a sequence of values.
min((5, 3, 19, 2, -7))
Output: -7
min(//MembershipFee)
Output: 340
See example94
http://www.w3.org/TR/xpath-functions/#func-min
167
Why 2 sets of parentheses?
• Did you notice that I used two sets of parentheses in the min and max functions?
– min((2,1,3)) and max((2,1,3))
• In fact, if you omit the inner parentheses you get an error message.
– min(2,1,3) and max(2,1,3): Error!
168
Reason for 2 parentheses
• Both the min and max functions have an optional second argument, collation:
min(sequence, collation?) max(sequence, collation?)
• The collation argument enables you to specify the collating sequence that should be used to determine the min/max value. We will typically just use the default collating sequence. Consequently, we will not use the second argument.
• Do you now understand the need for the 2 parentheses?
min(2,1)
Is the second value a member of the sequence, or is it a collation? Instead, you must do this: min((2,1))
169
number(value), string(value)
number(value) … "Hey, treat value as a number".
string(value) … "Hey, treat value as a string".
09 represents the number 9, which has a string value of '9'
See example95
http://www.w3.org/TR/xquery-operators/#func-number
http://www.w3.org/TR/xquery-operators/#func-string
170
Lesson Learned
• When you are doing a comparison of two values it is very good practice to wrap your values within either number() or string(). That way you are explicitly telling the XSLT Processor how you want the values compared - as numeric values or as string values.
171
exists() function
• This function returns either true or false.
• This function is used to determine if an element exists.
if (exists(/FitnessCenter/Member[3])) then 'There is a 3rd Member' else 'Error! No 3rd Member'
Output: There is a 3rd Member
if (exists(/FitnessCenter/Member[99])) then 'There is a 99th Member' else 'Error! No 99th Member'
Output: Error! No 99th Member
http://www.w3.org/TR/xquery-operators/#func-exists
172
exists(()) = false
exists(())
Output: false
"The empty sequence does not exist"
See example96
173
empty() function
• This function returns either true or false.
• This function is used to determine if an element does not exist.
if (empty(/FitnessCenter/Member[3])) then 'No 3rd Member' else 'Error! There is a 3rd Member'
Output: Error! There is a 3rd Member
if (empty(/FitnessCenter/Member[99])) then 'No 99th Member' else 'Error! There is a 99th Member'
Output: No 99th Member
http://www.w3.org/TR/xquery-operators/#func-empty See example97
174
empty(()) = true
empty(())
Output: true
"The empty sequence is empty"
See example97
175
empty() = not(exists())
empty(/FitnessCenter/Member[3]) eq not(exists(/FitnessCenter/Member[3]))
Output: true
empty(/FitnessCenter/Member[99]) eq not(exists(/FitnessCenter/Member[99]))
Output: true
See example98
176
deep-equal(sequence1, sequence2)
• This function returns true if the two sequences are identical in value and position.
See example99
http://www.w3.org/TR/xquery-operators/#func-deep-equal
177
operand instance of datatype
• You can use the XPath instance of boolean operator to determine if an operand is of a particular datatype.
• To test the atomic datatype of a node's value, you must first atomize the node, using data(.)
• instance of checks the datatype label on the operand. The label must match datatype. Thus 340 is an instance of xs:integer, but not xs:positiveInteger
http://www.w3.org/TR/xpath20/#id-instance-of
178
operand instance of datatype
http://www.w3.org/TR/xpath20/#id-instance-of See example100
179
operand cast as datatype
• You can use the XPath cast as operator to convert operand to a particular datatype.
• A cast as expression is generally equivalent to calling the constructor function for that datatype.
See example101
http://www.w3.org/TR/xpath20/#id-cast
180
operand castable as datatype
• You can use the XPath castable as boolean operator to determine if an operand can be cast to a particular datatype:
if (//Member[1]/MembershipFee castable as xs:integer) then (//Member[1]/MembershipFee cast as xs:integer) * 2 else false()
See example102
http://www.w3.org/TR/xpath20/#id-castable
181
name, local-name, namespace-uri
• name() returns whatever is inside <…>
• local-name() returns the name that's after the colon <…:…>
• namespace-uri() returns the namespace
See example103
182
string(node)
• This extracts the data of a node and returns it as a string.
http://www.w3.org/TR/xquery-operators/#func-string See example104
183
base-uri(node?),document-uri(node)
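• base-uri() returns the base URI of a node; document-uri() returns the URI of a document node. Both are typically the file path or URL from which the XML was loaded.
http://www.w3.org/TR/xquery-operators/#func-base-uri
See example105
http://www.w3.org/TR/xquery-operators/#func-document-uri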
184
Kind Tests
• Here are different ways to select a kind of item:
node(): selects any kind of node (element, attribute, text, comment, PI, namespace)
text(): selects a text node
element(): selects an element node
element(Member): selects Member element nodes
attribute(): selects attribute nodes
attribute(id): selects id attribute nodes
document-node(): selects the document node
comment(): selects a comment node
processing-instruction(): selects a PI node
185
Occurrence Indicators
• Use + to indicate one or more
• Use * to indicate zero or more
• Use ? to indicate zero or one
186
See example107
Please look at these examples; they illustrate the kind test and occurrence indicators
187
XPath 2.0 is a Strongly Typed Language
• Each XPath 2.0 function returns a value of a specific datatype. The argument(s) that are passed to the function must be of the required datatype.
• Also, the XPath 2.0 operators require the operands be of a required datatype. For example, you cannot perform arithmetic operations on strings without explicitly telling the processor to treat your strings like numbers.
188
XPath 2.0 is a Strongly Typed Language
• Consider this expression:'3' + 2
Here's the error message that you will get:Arithmetic operator is not defined for arguments of types (xs:string, xs:integer)
• Conversely, in XPath 1.0 the processor automatically coerces the string into a number.
See example35
189
Advantages of a Strongly Typed System
• Early and reliable identification of errors.
– Example: '3' + 2 will generate an error because the type of the first operand is not appropriate for the operator.
• Implementations (XPath processors) can optimize performance if they know about the types of the data.
– Example: Consider this comparison: //planet/* = 'mars'
If the processor knows the datatypes of each child of <planet> then it can just compare the string children against 'mars'
190
Disadvantages of a Strongly Typed System
• XPath authoring is complicated because more attention must be paid to types.
– Example: if you want to compare a number against a number that is represented as a string then you have to explicitly cast the number to a string and then do the comparison.
• Supporting an extensive type system puts a burden on implementers of XPath. This is why schema awareness is optional for implementers.
191
XML Schema Datatypes
• XPath 2.0 uses the datatypes defined in the XML Schema Datatypes Specification
192
193
XPath Functions are Strongly Typed
• Each XPath function requires arguments to be of a certain datatype.
• Each XPath function returns a result as a certain datatype.
• Example: here is the signature of the current-dateTime function:
current-dateTime() as xs:dateTime Read as: "The current-dateTime function is invoked without any arguments; it returns a value that has the datatype: XML Schema dateTime."
194
XPath Operators are Strongly Typed
• Each XPath operator requires the operands to be of a certain datatype.
• Each XPath operator returns a result as a certain datatype.
• Example: you can subtract two dateTime values and the result is of type xs:duration
current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') returns P14275DT15H49M28.796S
Read as: "The duration between now (Jan. 31, 2009, 10:49am) and Jan. 01, 1970 is 14,275 days, 15 hours, 49 minutes, 28.796 seconds."
See example36
195
Constructor Functions
• Constructor functions are used to construct atomic values with the specified types.
• Example: the constructor xs:dateTime('1970-01-01T00:00:00Z') constructs an atomic value whose type is xs:dateTime.
• The signature of the xs:dateTime constructor is: xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
• There is a constructor function for each of the W3C built-in atomic types.
• If the argument is a node, the atomic value is extracted and that value is cast to the type.
• If the argument is an empty sequence, the result is an empty sequence.
• The complete list of constructor functions follows.
196
xs:string($arg as xs:anyAtomicType?) as xs:string?
xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
xs:float($arg as xs:anyAtomicType?) as xs:float?
Implementations may return negative zero for xs:float("-0.0E0").
xs:duration($arg as xs:anyAtomicType?) as xs:duration?
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
xs:time($arg as xs:anyAtomicType?) as xs:time?
xs:date($arg as xs:anyAtomicType?) as xs:date?
xs:gYearMonth($arg as xs:anyAtomicType?) as xs:gYearMonth?
xs:gYear($arg as xs:anyAtomicType?) as xs:gYear?
xs:gMonthDay($arg as xs:anyAtomicType?) as xs:gMonthDay?
xs:gDay($arg as xs:anyAtomicType?) as xs:gDay?
xs:gMonth($arg as xs:anyAtomicType?) as xs:gMonth?
xs:hexBinary($arg as xs:anyAtomicType?) as xs:hexBinary?
xs:base64Binary($arg as xs:anyAtomicType?) as xs:base64Binary?
xs:anyURI($arg as xs:anyAtomicType?) as xs:anyURI?
xs:QName($arg as xs:anyAtomicType) as xs:QName?
xs:normalizedString($arg as xs:anyAtomicType?) as xs:normalizedString?
xs:token($arg as xs:anyAtomicType?) as xs:token?
xs:language($arg as xs:anyAtomicType?) as xs:language?
xs:NMTOKEN($arg as xs:anyAtomicType?) as xs:NMTOKEN?
xs:Name($arg as xs:anyAtomicType?) as xs:Name?
xs:NCName($arg as xs:anyAtomicType?) as xs:NCName?
xs:ID($arg as xs:anyAtomicType?) as xs:ID?
xs:IDREF($arg as xs:anyAtomicType?) as xs:IDREF?
xs:ENTITY($arg as xs:anyAtomicType?) as xs:ENTITY?
xs:integer($arg as xs:anyAtomicType?) as xs:integer?
xs:nonPositiveInteger($arg as xs:anyAtomicType?) as xs:nonPositiveInteger?
xs:negativeInteger($arg as xs:anyAtomicType?) as xs:negativeInteger?
xs:long($arg as xs:anyAtomicType?) as xs:long?
xs:int($arg as xs:anyAtomicType?) as xs:int?
xs:short($arg as xs:anyAtomicType?) as xs:short?
xs:byte($arg as xs:anyAtomicType?) as xs:byte?
xs:nonNegativeInteger($arg as xs:anyAtomicType?) as xs:nonNegativeInteger?
xs:unsignedLong($arg as xs:anyAtomicType?) as xs:unsignedLong?
xs:unsignedInt($arg as xs:anyAtomicType?) as xs:unsignedInt?
xs:unsignedShort($arg as xs:anyAtomicType?) as xs:unsignedShort?
xs:unsignedByte($arg as xs:anyAtomicType?) as xs:unsignedByte?
xs:positiveInteger($arg as xs:anyAtomicType?) as xs:positiveInteger?
xs:yearMonthDuration($arg as xs:anyAtomicType?) as xs:yearMonthDuration?
xs:dayTimeDuration($arg as xs:anyAtomicType?) as xs:dayTimeDuration?
xs:untypedAtomic($arg as xs:anyAtomicType?) as xs:untypedAtomic?
197
New Datatypes
• The XPath 2.0 working group decided that the XML Schema datatypes are not complete, so they created a few new ones and added them to the XML Schema datatypes.
198
xs:anyAtomicType
• xs:anyAtomicType is an abstract type that is the base type of all atomic values.
• All atomic datatypes, including the original XML Schema datatypes, are subtypes of xs:anyAtomicType
• "Abstract" means that it cannot be used directly; instead, a subtype must be used.
199
xs:untypedAtomic
• Any value that has not been associated with a schema type has the type xs:untypedAtomic.
200
xs:dayTimeDuration
• This is a subtype of xs:duration. It has only day, hour, minute, and second components.
• Subtracting two xs:date values yields a result of type xs:dayTimeDuration
current-date() - xs:date('1970-01-01')
Example xs:duration value: P1Y2M3DT10H30M12.3S
Example xs:dayTimeDuration value (a subtype of xs:duration): P428DT10H30M12.3S
See example37
201
Subtracting Two Dates
• Here's an example of subtracting two xs:date values:
current-date() - xs:date('1970-01-01')
• The resulting value is an xs:dayTimeDuration value.
• Here's how it is specified in the XQuery 1.0 and XPath 2.0 Functions and Operators specification:
op:subtract-dates($arg1 as xs:date, $arg2 as xs:date) as xs:dayTimeDuration?
http://www.w3.org/TR/xquery-operators/#func-subtract-dates
"When subtracting two values, each of type xs:date, the resulting value is of type xs:dayTimeDuration."
202
xs:yearMonthDuration
• This is also a subtype of xs:duration. It has only the year and month components.
Example xs:duration value: P1Y2M3DT10H30M12.3S
Example xs:yearMonthDuration value (a subtype of xs:duration): P1Y2M
203
Datatype of Literals and Expressions
• datatype of current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') is xs:dayTimeDuration
• datatype of current-date() - xs:date('1970-01-01') is xs:dayTimeDuration
• datatype of 3 is xs:integer
• datatype of 3.14 is xs:decimal
• datatype of "3" is xs:string
• datatype of true is Unknown xs:untypedAtomic
• datatype of true() is xs:boolean
• datatype of 1E3 is xs:double
See example38
204
Datatype of Input Data Unassociated with a Schema
• datatype of //planet[1]/mass is Unknown xs:untypedAtomic
• datatype of //planet[1]/mass/text() is Unknown xs:untypedAtomic
See example39
205
Datatype of Arithmetic Operations
• datatype of 2 + 2 is xs:integer
• datatype of 2.0 + 2.0 is xs:decimal
• datatype of 2.0 + 2 is xs:decimal
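• datatype of 6 div 2 is xs:decimal (div applied to two xs:integer operands returns an xs:decimal; use idiv to get an xs:integer result)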
• datatype of 6.0 div 2.0 is xs:decimal
• datatype of 6.0 div 2 is xs:decimal
See example40
206
Numeric Types
• The 4 main numeric types supported in XPath 2.0 are:
– xs:decimal
– xs:integer
– xs:float
– xs:double
• All arithmetic operators and functions that can be performed on these types can also be performed on their subtypes.
207
xs:decimal
• Numeric literals that contain only digits and a decimal point (no letter E or e) are considered to be decimal numbers with the type xs:decimal.
• Example: 25.5 and 25.0 are xs:decimal values.
208
xs:integer
• Numeric literals that contain only digits (no decimal point or the letter E or e) are considered to be integer numbers with the type xs:integer.
• Example: 25 is an integer value.
209
xs:float and xs:double
• Numeric literals that contain the letter E or e are considered to be double numbers with the type xs:double.
• Example: 1E3 and 1e3 are xs:double values.
See example41
210
How a Value becomes Numeric
• The value is a numeric literal
• The value is selected from an input document that is associated with a schema that declares it to have a numeric type
• The value is the result of a function that returns a number, e.g. count(…) returns xs:integer
• The value is the result of a numeric constructor function, e.g. xs:float("25.83") returns a xs:float value
• The value is the result of an explicit cast, e.g., //planet[1]/mass cast as xs:decimal
• The value is cast automatically when it is passed to a function
211
The number() Function
• The number() function is almost equivalent to the xs:double() constructor function.
• Both return a value of type xs:double.
• Differences:
– number("hi") = NaN
– xs:double("hi") = error
– number(()) = NaN
– xs:double(()) = () (the empty sequence, as for every constructor function)
See example42
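The contrast between a lenient conversion and a strict one can be sketched in Python. This is only an analogy: Python's float() stands in for the strict xs:double() constructor, and the helper lenient_number is a hypothetical stand-in for number(). Python's float() also accepts some strings that XPath would reject, so the match is not exact.

```python
import math

def lenient_number(value):
    """Like number(): return NaN instead of raising on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan

# Lenient conversion: bad input quietly becomes NaN.
lenient_result = lenient_number("hi")

# Strict conversion: bad input raises an error.
try:
    float("hi")
    strict_failed = False
except ValueError:
    strict_failed = True

print(math.isnan(lenient_result), strict_failed)  # True True
```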
212
Numeric Type Promotion
• If an operation, such as comparison or an arithmetic operation, is performed on values of two different primitive numeric types, one value's type is promoted to the type of the other.
213
Numeric Type Promotion
Operand #1    Operand #2    Promoted to
xs:decimal    xs:float      xs:float
xs:decimal    xs:double     xs:double
xs:float      xs:double     xs:double
214
Numeric Type Promotion
Example: 1.0 + 1.2E0 = 2.2E0
Here 1.0 is an xs:decimal and 1.2E0 is an xs:double, so 1.0 is promoted to xs:double and the result is an xs:double.
Numeric type promotion happens automatically in arithmetic expressions and comparison expressions. It also occurs in calls to functions that expect numeric values.
See example43
215
Subtype Substitution
• Wherever a type is expected, you can substitute it with any of its derived types.
• Example: a function that expects an xs:decimal value can be invoked with an xs:integer value since integer derives from decimal.
216