Using XPath
Querying an XML Document with XPath Expressions
Sample Content © Garth Gilmour 2008
The XPath Standard

XPath began as part of a standard called XSL
  A single standard which could query, transform and format XML documents for publication (as DSSSL did for SGML)
Each component proved useful by itself
  An expression-based language could simplify code
    In the same way that regular expressions simplify parsing
  A transformation language could produce web pages
    Who needs a formatting standard when you have HTML/CSS?
The XSL standard was split three ways
  The querying component became XPath
  The transformation component became XSLT
  The formatting objects were placed in XSL-FO

The XPath Standard

XPath lets you address nodes within an XML document
  This is a prerequisite for data extraction and transformation
XPath is only half of a query language
  You can retrieve data but not manipulate it
  That functionality is provided by XSLT and XQuery
XPath expressions are an alternative to SAX or DOM
  They eliminate a lot of tedious and error-prone coding
  The approach is similar to regular expressions in Perl

XPath Data Model

The XML parser views a document as a tree of nodes
  Joins between multiple physical files are seamless
The most common tree structures are DOM and XPath
  The structure of the tree is termed the data model
The XPath data model is organised as follows:
  Seven node types make up the tree
  Nodes are arranged around thirteen axes
    Axes are either forward or reverse
  XPath expressions return one of four data types
    String, number, boolean and node set
  Nodes are found in document or proximity order

XPath Node Types

Node Type               Description
Root                    The very top of the node tree (one root node per document)
Element                 Created from a start tag, end tag and the enclosed content
Text                    A continuous block of characters found in an element
Comment                 The characters inside a pair of ‘<!-- -->’ delimiters
Processing Instruction  An instruction to the XML parser
Attribute               A name/value pair attached to an element
Namespace               Represents an XML namespace

Directions in the XPath Node Tree

Axis Name           Description
self                Our current position within the node tree
parent              The parent of the current node (the root node has no parent)
ancestor            Obtained by collecting parent nodes from the current node to the root
ancestor-or-self    All ancestor nodes and the current node
child               All the nodes that are declared immediately inside the current node
descendant          The child nodes plus their children and their children's children etc.
descendant-or-self  All descendant nodes and the current node
preceding-sibling   All children of the parent node that appear before the current node
following-sibling   All children of the parent node that appear after the current node
preceding           All nodes which occur before the current node in document order, excluding its ancestors
following           All nodes which occur after the current node in document order, excluding its descendants
attribute           All attribute nodes attached to the current node
namespace           All namespace nodes attached to the current node

The Self Axis

The self axis always contains the current node
  ‘self::*’ selects the current node if it is an element
  ‘self::age’ selects the current node if it is an ‘age’ element
  ‘self::node()’ always selects the current node
    The abbreviation is ‘.’
  ‘count(self::node())’ returns 1
How the current node is determined is not specified
  It is set by the client program

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Self Axis]

The Parent Axis

The parent axis contains the parent of the current node
  ‘parent::node()’ always selects the parent node
    The abbreviation is ‘..’
Root nodes don't have parents
  ‘/..’ selects the parent of the root and hence is always an empty node set
Attribute and namespace nodes do have a parent: the element they are attached to
  However they are not children of that element, so the child axis never returns them

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Parent Axis]

The Ancestor Axis

The ancestor axis selects parent nodes recursively
  Starting from the current node and stopping with the root node
The ancestors of ‘h’ are ‘f’, ‘a’ and the root node
  ‘count(ancestor::*)’ would return 2 for ‘h’ and 3 for the text node it contains
The ancestor-or-self axis includes the current node

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Ancestor Axis]

[Figure: The Ancestor-Or-Self Axis]

The Child Axis

The child axis contains children of the current node
  But not any nested children
  Children may be elements, text nodes, comments or PIs
‘child::*’ selects element children of the current node
‘child::node()’ selects all children of the current node
The children of element node ‘f’ are ‘g’, ‘h’, ‘i’, ‘j’ and ‘k’

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Child Axis]

The Descendant Axis

The descendant axis contains all the nodes declared inside the current node
The descendants of ‘f’ are 5 element and 5 text nodes
  Excluding the issue of whitespace-only text nodes
  ‘count(descendant::node())’ returns 10
  ‘count(descendant::*)’ would return 5
The descendant-or-self axis includes the current node

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Descendant Axis]

[Figure: The Descendant-Or-Self Axis]
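The counts quoted on the Ancestor and Descendant slides can be checked with a short JAXP program. This is a minimal sketch, not part of the original deck: the class name `AncestorDescendantDemo` and the helper `count` are made up for illustration, and the sample document is written on one line so that no whitespace-only text nodes are created.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AncestorDescendantDemo {
    // Sample document from the axis slides, without indentation whitespace
    static final String XML =
        "<a><b><c><d>Text One</d><e>Text Two</e></c></b>"
        + "<f><g>Text Three</g><h>Text Four</h><i>Text Five</i>"
        + "<j>Text Six</j><k>Text Seven</k></f></a>";

    // Selects a context node with contextPath, then evaluates a count(...)
    // expression relative to that node (hypothetical helper)
    static int count(String contextPath, String countExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            Double result = (Double) xpath.evaluate(countExpr, context, XPathConstants.NUMBER);
            return result.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//h", "count(ancestor::*)"));          // element ancestors of 'h'
        System.out.println(count("//h/text()", "count(ancestor::*)"));   // element ancestors of its text node
        System.out.println(count("//f", "count(descendant::node())"));   // elements plus text nodes
        System.out.println(count("//f", "count(descendant::*)"));        // elements only
    }
}
```

Note that ‘ancestor::*’ counts element nodes only, which is why the root node does not appear in the totals.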

The Preceding Sibling Axis

The preceding sibling axis contains siblings which occur before the current node
  The ordering is referred to as ‘document order’
The preceding siblings of ‘i’ are ‘g’ and ‘h’
  These could be selected via ‘preceding-sibling::node()’ or ‘preceding-sibling::*’
    Because in this case all the preceding siblings are elements

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Preceding Sibling Axis]

The Following Sibling Axis

The following sibling axis contains siblings which occur after the current node
  The ordering is referred to as ‘document order’
The following siblings of the ‘i’ element are ‘j’ and ‘k’
  These could be selected via ‘following-sibling::node()’ or ‘following-sibling::*’
    Because in this case all the following siblings are elements

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Following Sibling Axis]

The Preceding and Following Axis

These axes are hard to explain but very rarely used
The preceding axis contains all the nodes that come before the current node
  Ancestors of the current node are not included
  For ‘g’ the XPath ‘preceding::*’ selects ‘e’, ‘d’, ‘c’ and ‘b’
The following axis contains all the nodes that come after the current node
  Descendants of the current node are not included
  For ‘e’ the expression ‘following::*’ selects ‘f’, ‘g’, ‘h’, ‘i’, ‘j’ and ‘k’

<a>
  <b>
    <c>
      <d>Text One</d>
      <e>Text Two</e>
    </c>
  </b>
  <f>
    <g>Text Three</g>
    <h>Text Four</h>
    <i>Text Five</i>
    <j>Text Six</j>
    <k>Text Seven</k>
  </f>
</a>

[Figure: The Preceding Axis]

[Figure: The Following Axis]
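Because the preceding and following axes are easy to get wrong, it can help to run them against the sample document. The sketch below is illustrative only: `PrecedingFollowingDemo` and `names` are hypothetical names, `String.join` assumes Java 8 or later, and the element names are sorted alphabetically so the output does not depend on the order in which an implementation returns the node list.

```java
import java.io.StringReader;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrecedingFollowingDemo {
    // Sample document from the axis slides, without indentation whitespace
    static final String XML =
        "<a><b><c><d>Text One</d><e>Text Two</e></c></b>"
        + "<f><g>Text Three</g><h>Text Four</h><i>Text Five</i>"
        + "<j>Text Six</j><k>Text Seven</k></f></a>";

    // Returns the sorted, comma-separated names of the nodes that axisExpr
    // selects relative to the node found by contextPath (hypothetical helper)
    static String names(String contextPath, String axisExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, doc, XPathConstants.NODE);
            NodeList hits = (NodeList) xpath.evaluate(axisExpr, context, XPathConstants.NODESET);
            TreeSet<String> sorted = new TreeSet<String>();
            for (int i = 0; i < hits.getLength(); i++) {
                sorted.add(hits.item(i).getNodeName());
            }
            return String.join(",", sorted);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Ancestors ('a' and 'f') are excluded from the preceding axis of 'g'
        System.out.println(names("//g", "preceding::*"));
        // Ancestors of 'e' ('a', 'b', 'c') are not on its following axis either
        System.out.println(names("//e", "following::*"));
    }
}
```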

XPath Expressions

An XPath expression is made up of steps
Every step contains
  The axis to search along (child is the default)
  The type of nodes to select
  Zero or more predicates to filter the node set
For example consider ‘descendant::order/child::item[@urgent]’
  The first step selects all elements named ‘order’ on the descendant axis and stores them in a node set
  The second step selects all those children of the selected nodes which are ‘item’ elements; these are stored in a new node set
  Finally the predicate at the end of the second step filters out all the nodes which do not have an ‘urgent’ attribute

XPath Expressions

descendant::order/child::item[@urgent]
  Step 1: descendant::order
    Axis: descendant
    Node test: order
  Step 2: child::item[@urgent]
    Axis: child
    Node test: item
    Predicate: [@urgent]
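The step-by-step evaluation described above can be followed by counting the node set after each step. This is a sketch under stated assumptions: the order document, the class `StepsDemo` and the helper `count` are invented for illustration and do not come from the deck.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StepsDemo {
    // Hypothetical document: two orders, plus one urgent item that is
    // NOT the child of an order (it sits inside a 'note' element)
    static final String XML =
        "<orders>"
        + "<order><item urgent=\"yes\">A</item><item>B</item></order>"
        + "<order><item>C</item></order>"
        + "<note><item urgent=\"yes\">D</item></note>"
        + "</orders>";

    // Counts the nodes selected by path, evaluated from the document root
    static int count(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("descendant::order"));                      // after step 1
        System.out.println(count("descendant::order/child::item"));          // after step 2, no predicate
        System.out.println(count("descendant::order/child::item[@urgent]")); // after the predicate
    }
}
```

The urgent item inside ‘note’ is never selected because the second step only looks at children of the ‘order’ elements found by the first step.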

Using Steps To Find Many Nodes

[Figure: node tree with nodes labelled ‘current’ and ‘att’]
  ancestor-or-self::*/following-sibling::*

Using Steps To Find A Single Node

[Figure: node tree with nodes labelled ‘current’ and ‘att’]
  ancestor::*[2]/preceding-sibling::*[2]
  parent::*/following-sibling::*[1]
  /child::*/child::*[4]/attribute::att

Node Tests

The node test can be a name or the wildcard (‘*’)
  These select attribute nodes on the attribute axis, namespace nodes on the namespace axis and elements on all other axes
  Both the name and the wildcard can be qualified by a namespace prefix
    For example ‘svg:*’ or ‘math:Triangle’
Built-in functions return other nodes on the current axis
  The processing-instruction() function selects all PIs
    The function can take the name of a PI as a parameter
  The comment() function selects comment nodes
  The text() function selects text nodes
  The node() function selects nodes of any type

Simple XPath Expressions

child::*                               //All element nodes on the child axis
child::node()                          //All nodes on the child axis
child::text()                          //All text nodes on the child axis
child::comment()                       //All comment nodes on the child axis
child::processing-instruction()        //All PI nodes on the child axis
child::processing-instruction(‘abc’)   //The PI child node called ‘abc’

parent::order                          //The parent node if it is an ‘order’ element
attribute::cost                        //The attribute of the current node called ‘cost’
preceding-sibling::*[text()]           //All preceding siblings containing text nodes
ancestor::*[attribute::cost]           //All ancestors with a ‘cost’ attribute
ancestor::*/attribute::cost            //The ‘cost’ attribute nodes of ancestors

/invoice                               //The document element if it is named ‘invoice’
//invoice                              //Every invoice element in the document
//attribute::cost                      //Every cost attribute in the whole document

Syntax and Abbreviations

An XPath expression can be absolute or relative
  Absolute expressions begin with ‘/’ and search from the root
  Relative expressions search from the current node
Two separate expressions can be combined with ‘|’
  This represents a union and NOT a logical OR
Commonly used features are abbreviated
  The default axis is child and is usually left out
  The ‘@’ character is short for ‘attribute::’
  The ‘.’ character selects the current node i.e. ‘self::node()’
  The ‘..’ characters select the parent node i.e. ‘parent::node()’

Abbreviated XPath Expressions

*                                //All element nodes on the child axis
node()                           //All nodes on the child axis
text()                           //All text nodes on the child axis
comment()                        //All comment nodes on the child axis
processing-instruction()         //All PI nodes on the child axis
processing-instruction(‘abc’)    //The PI child node called ‘abc’

parent::order                    //The parent node if it is an ‘order’ element
@cost                            //The attribute of the current node called ‘cost’
preceding-sibling::*[text()]     //All preceding siblings containing text nodes
ancestor::*[@cost]               //All ancestors with a ‘cost’ attribute
ancestor::*/@cost                //The ‘cost’ attribute nodes of ancestors

../item                          //All item children of the parent node
./*/*                            //All element grandchildren of the current node
/*/*                             //All element children of the document element
../*/@cost                       //All ‘cost’ attributes of children of the parent
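One way to convince yourself that an abbreviation and its long form are equivalent is to evaluate both from the same context node. The sketch below compares node counts only, which is a weaker check than comparing the nodes themselves. The two-item order document, the class `AbbreviationDemo` and its helpers are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AbbreviationDemo {
    // Hypothetical document; the context node is the first 'item'
    static final String XML =
        "<order cost=\"10\"><item cost=\"4\"/><item cost=\"6\"/></order>";

    // Counts the nodes selected by expr relative to /order/item[1]
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate("/order/item[1]", doc, XPathConstants.NODE);
            Double n = (Double) xpath.evaluate("count(" + expr + ")", context, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the abbreviated and unabbreviated forms select the same number of nodes
    static boolean sameCount(String abbreviated, String full) {
        return count(abbreviated) == count(full);
    }

    public static void main(String[] args) {
        System.out.println(sameCount("@cost", "attribute::cost"));
        System.out.println(sameCount(".", "self::node()"));
        System.out.println(sameCount("../item", "parent::node()/child::item"));
        System.out.println(sameCount("//item", "/descendant-or-self::node()/child::item"));
        System.out.println(sameCount(".//item",
                "self::node()/descendant-or-self::node()/child::item"));
    }
}
```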

What Do These Expressions Do?

<purchaseOrder id="ABC123">
  <customer id="DEF456">
    <name>MegaCorp</name>
    <address postcode="BT37 ABC">
      <city>Belfast</city>
      <street no="10">Arcatia Road</street>
    </address>
    <paymentOptions>
      <category>Retail</category>
      <daysToPay>30</daysToPay>
      <creditLimit>20000</creditLimit>
    </paymentOptions>
  </customer>
  <itemsList>
    <item index="1" id="12" quantity="12">
      <description>Hard Disk</description>
    </item>
    <!-- Other items omitted -->
  </itemsList>
</purchaseOrder>

Absolute expressions:
  /purchaseOrder/customer/address/@postcode
  /purchaseOrder/customer/paymentOptions/category
  /purchaseOrder/itemsList/item[1]/description/text()
  /purchaseOrder/itemsList/item[last()]/description/text()
  /purchaseOrder/itemsList/item/description
  count(/purchaseOrder/itemsList/item)

<item index="1" id="12" quantity="12">
  <description>Hard Disk</description>
</item>
<item index="2" id="08" quantity="6">
  <description>Keyboard</description>
</item>
<item index="3" id="72" quantity="7">
  <description>Monitor</description>
</item>
<item index="4" id="34" quantity="8">
  <description>Mouse</description>
</item>
<item index="5" id="58" quantity="3">
  <description>Graphics Card</description>
</item>
<item index="6" id="99" quantity="3">
  <description>CD Drive</description>
</item>
<item index="7" id="23" quantity="8">
  <description>DVD Drive</description>
</item>
<item index="8" id="19" quantity="5">
  <description>TV Card</description>
</item>

Relative expressions, evaluated from the ‘item’ element marked as the current node:
  count(preceding-sibling::item)
  count(following-sibling::item)
  preceding-sibling::item[1]/description
  following-sibling::item[1]/description
  description/text()
  ../item/description

Searching Large Areas of the Tree

The symbol ‘//’ is short for ‘/descendant-or-self::node()/’
  This is very powerful and can be used at any step
  Avoid using it casually as it makes the engine do a lot of work

//invoice
  Short for: /descendant-or-self::node()/child::invoice
  Finds all the invoice elements in the entire document

.//invoice
  Short for: self::node()/descendant-or-self::node()/child::invoice
  Finds all invoice elements within the current node

A Common Error

Most developers expect ‘//invoice[1]’ to find the first ‘invoice’ element in the document
  Instead it finds all invoice elements which are the first invoice element child of their parent
  Parentheses can be used to separate the predicate from the search, producing the correct result

//invoice[1]
  Short for: /descendant-or-self::node()/child::invoice[1]

<accounts>
  <customer details="…">
    <invoice/>
    <invoice/>
  </customer>
  <customer details="…">
    <invoice/>
  </customer>
  <customer details="…">
    <invoice/>
    <invoice/>
  </customer>
</accounts>
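The difference between the two forms is easy to see by counting what each one selects. This is a minimal sketch: the class `CommonErrorDemo` and its `count` helper are hypothetical, and the elided ‘details’ attributes are simply left out of the test document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CommonErrorDemo {
    // Three customers holding two, one and two invoices respectively
    static final String XML =
        "<accounts>"
        + "<customer><invoice/><invoice/></customer>"
        + "<customer><invoice/></customer>"
        + "<customer><invoice/><invoice/></customer>"
        + "</accounts>";

    // Counts the nodes selected by path, evaluated from the document root
    static int count(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//invoice"));      // every invoice
        System.out.println(count("//invoice[1]"));   // the first invoice of EACH customer
        System.out.println(count("(//invoice)[1]")); // the first invoice in the document
    }
}
```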

Predicates and Positioning

Predicates often test for position
  ‘child::item[3]’ is short for ‘child::item[position() = 3]’
Following axes are indexed in document order
  So ‘following-sibling::*[1]’ returns the closest sibling to the current element in the rest of the document
  And ‘following-sibling::*[last()]’ returns the furthest
    The closest to the end of the document
Preceding axes are indexed in reverse document order
  So ‘preceding-sibling::*[1]’ returns the closest sibling to the current element in the document so far
  And ‘preceding-sibling::*[last()]’ returns the furthest
    The closest to the start of the document

Predicates and Positioning

<A>
  <B/>    preceding-sibling::*[last()]
  <C/>
  <D/>    preceding-sibling::*[1]
  <E/>    Current Node
  <F/>    following-sibling::*[1]
  <G/>
  <H/>    following-sibling::*[last()]
</A>
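The positional behaviour of forward and reverse axes can be confirmed by asking for the name of the node each expression selects. The following is a sketch rather than material from the deck: `PositionDemo` and `nameOf` are hypothetical names, and ‘E’ is taken as the current node, as in the diagram above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PositionDemo {
    static final String XML = "<A><B/><C/><D/><E/><F/><G/><H/></A>";

    // Returns the element name selected by expr, evaluated with 'E' as the current node
    static String nameOf(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node current = (Node) xpath.evaluate("/A/E", doc, XPathConstants.NODE);
            return (String) xpath.evaluate("name(" + expr + ")", current, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf("preceding-sibling::*[1]"));      // closest sibling before E
        System.out.println(nameOf("preceding-sibling::*[last()]")); // furthest sibling before E
        System.out.println(nameOf("following-sibling::*[1]"));      // closest sibling after E
        System.out.println(nameOf("following-sibling::*[last()]")); // furthest sibling after E
    }
}
```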

Predicates and Node Tests

Predicates can be used to test the name of an element
  Normally you would simply hard code the element name
  However in XSLT and XQuery you may need to iterate over a set of elements and perform behaviour according to their type
The two common ways to perform the test are:
  Use the ‘local-name’ function e.g. ‘//*[local-name() = 'item']’
  Use the ‘self’ axis e.g. ‘//*[self::item]’
    This is short for ‘//*[count(self::item) > 0]’

<xsl:for-each select="//Invoice | //Item">
  <xsl:if test="self::Invoice"> <!-- Do A --> </xsl:if>
  <xsl:if test="self::Item"> <!-- Do B --> </xsl:if>
</xsl:for-each>

Predicates in XPath Expressions

/*[@total]                    //The doc element if it has a ‘total’ attribute
//item[1]                     //The first ‘item’ child of any element
(//item)[1]                   //The first ‘item’ element in the document
//order/item[@cost > 5000]    //All ‘items’ in orders with a ‘cost’ over 5000
//comment()[parent::item]     //All comments in ‘item’ elements

//All preceding sibling ‘order’ elements with more than 5 ‘item’ children
preceding-sibling::order[count(item) > 5]

//The closest preceding sibling ‘order’ element with a ‘totalCost’ attribute
preceding-sibling::order[@totalCost][1]

//The closest preceding sibling ‘order’ element IF it has a ‘totalCost’ attribute
preceding-sibling::order[1][@totalCost]

//The ‘cost’ attribute of the sixth ‘item’ child of the fifth ‘order’ child of the doc element
/porder/order[5]/item[6]/@cost

How Ordering Affects Predicates

The order of the predicates is vital
  The expression ‘descendant::item[@urgent][1]’ is not the same as ‘descendant::item[1][@urgent]’
  The former selects the first element from the set of descendant elements named ‘item’ which have an ‘urgent’ attribute
  The latter selects the first descendant ‘item’ element IF it has an ‘urgent’ attribute
Parentheses can be used to separate a step from a predicate
  The expression //item[1] selects all the ‘item’ elements which are the first ‘item’ child of their parent
    Remember it expands to ‘/descendant-or-self::node()/child::item[1]’
  Whereas (//item)[1] selects the first ‘item’ element in the whole document, which is normally what you are looking for
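Swapping two predicates can turn a match into an empty result, which a small test document makes visible. In this sketch the order document, the ‘id’ attributes, the class `PredicateOrderDemo` and the helper `idOf` are all assumptions introduced for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PredicateOrderDemo {
    // Hypothetical document: the first item is NOT urgent, the next two are
    static final String XML =
        "<order>"
        + "<item id=\"1\"/>"
        + "<item id=\"2\" urgent=\"yes\"/>"
        + "<item id=\"3\" urgent=\"yes\"/>"
        + "</order>";

    // Returns the 'id' of the node selected by path, or "" when nothing is selected
    static String idOf(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate("string(" + path + "/@id)", doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Filter by attribute first, then take the first survivor
        System.out.println("[" + idOf("descendant::item[@urgent][1]") + "]");
        // Take the first item, then keep it only if it is urgent
        System.out.println("[" + idOf("descendant::item[1][@urgent]") + "]");
    }
}
```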

XPath Functions

XPath includes a library of functions
  To manipulate strings and numbers
  To investigate nodes and convert data types
The most useful functions are:
  The count function, which counts a node set
    For example ‘//order[count(item) = 20]’
  The sum function, which converts each item in a node set to a number and totals them
    For example ‘sum(//orders/@totalCost)’
  The translate function, which operates like Perl's ‘tr’
    For example ‘translate($var1, "aeiou", "AEIOU")’ returns a new string with the same contents as $var1, but with all the vowels capitalised
  The format-number function (supplied by XSLT rather than core XPath), which pretty-prints numbers
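The count, sum and translate functions can be tried directly through JAXP. The sketch below uses a made-up orders document and hypothetical helpers (`FunctionsDemo`, `number`, `string`). It passes a string literal to translate rather than a variable, because variables would need an XPathVariableResolver.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FunctionsDemo {
    // Hypothetical document: one order with two items, one order with a single item
    static final String XML =
        "<orders>"
        + "<order totalCost=\"10.5\"><item/><item/></order>"
        + "<order totalCost=\"4.5\"><item/></order>"
        + "</orders>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates expr against the sample document and returns a number
    static double number(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(expr, parse(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr against the sample document and returns a string
    static String string(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(expr, parse(), XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("count(//order[count(item) = 2])")); // orders with exactly two items
        System.out.println(number("sum(//order/@totalCost)"));         // total of all totalCost attributes
        // translate(string, from, to): the string to change comes first
        System.out.println(string("translate('hello world', 'aeiou', 'AEIOU')"));
    }
}
```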

Running Expressions in Java 1.5

XPath has always been supported in Java
  By using extension libraries like JDOM and Jaxen
Java 1.5 adds support for XPath to JAXP
  Via the types in the package ‘javax.xml.xpath’
An XPath expression engine is found indirectly
  Via the ‘XPathFactory’ factory class
  The ‘XPath’ interface represents the engine
Expressions can be run in two ways
  By passing the expression as a String into ‘XPath.evaluate’
  By creating and using an ‘XPathExpression’ object

Running Expressions in Java 1.5

There are two choices when evaluating an expression
  How will the input XML be represented?
  In what format should the result be returned?
There are three ways of supplying the XML
  As an ‘InputSource’ object
  As an ‘org.w3c.dom.Document’ object
  As an ‘org.w3c.dom.Node’ object
    This is essential if your XPath is a relative expression
There are five ways of representing the query's result
  These are a ‘Node’, ‘NodeList’, ‘Double’, ‘String’ or ‘Boolean’
  As represented by the values declared in ‘XPathConstants’
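The input and result choices listed above can be compared side by side. This is a minimal sketch assuming a cut-down version of the purchase order document; the class `InputAndResultDemo` and its method names are hypothetical. It uses the two-argument `evaluate` overloads, which return a String, and the three-argument overloads, which take an `XPathConstants` return type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class InputAndResultDemo {
    // Cut-down purchase order (an assumption made for this example)
    static final String XML =
        "<purchaseOrder id=\"ABC123\"><customer id=\"DEF456\">"
        + "<address postcode=\"BT37 ABC\"/></customer><itemsList>"
        + "<item index=\"1\"><description>Hard Disk</description></item>"
        + "<item index=\"2\"><description>Keyboard</description></item>"
        + "</itemsList></purchaseOrder>";

    static XPath newXPath() {
        return XPathFactory.newInstance().newXPath();
    }

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    // InputSource as input, String as result
    static String postcode() {
        try {
            return newXPath().evaluate("/purchaseOrder/customer/address/@postcode",
                    new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Document as input, Double and Boolean as results
    static double itemCount() {
        try {
            return (Double) newXPath().evaluate("count(/purchaseOrder/itemsList/item)",
                    parse(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean hasItems() {
        try {
            return (Boolean) newXPath().evaluate("/purchaseOrder/itemsList/item",
                    parse(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node as context for a relative expression, String as result
    static String previousDescription() {
        try {
            XPath xpath = newXPath();
            Node second = (Node) xpath.evaluate("/purchaseOrder/itemsList/item[@index = 2]",
                    parse(), XPathConstants.NODE);
            return xpath.evaluate("preceding-sibling::item[1]/description", second);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(postcode());
        System.out.println(itemCount());
        System.out.println(hasItems());
        System.out.println(previousDescription());
    }
}
```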

import java.io.FileReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

// Parse the input file into a DOM document
InputSource input = new InputSource(new FileReader("input/purchase_order.xml"));
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document document = builder.parse(input);

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();

// Absolute expressions start at the document root
XPathExpression[] absoluteExpressions = new XPathExpression[6];
absoluteExpressions[0] = xpath.compile("/purchaseOrder/customer/address/@postcode");
absoluteExpressions[1] = xpath.compile("/purchaseOrder/customer/paymentOptions/category");
absoluteExpressions[2] = xpath.compile("/purchaseOrder/itemsList/item[1]/description/text()");
absoluteExpressions[3] = xpath.compile("/purchaseOrder/itemsList/item[last()]/description/text()");
absoluteExpressions[4] = xpath.compile("/purchaseOrder/itemsList/item/description");
absoluteExpressions[5] = xpath.compile("count(/purchaseOrder/itemsList/item)");

// The XPathConstants value requested determines which cast is valid
Attr result1 = (Attr) absoluteExpressions[0].evaluate(document, XPathConstants.NODE);
Element result2 = (Element) absoluteExpressions[1].evaluate(document, XPathConstants.NODE);
Text result3 = (Text) absoluteExpressions[2].evaluate(document, XPathConstants.NODE);
Text result4 = (Text) absoluteExpressions[3].evaluate(document, XPathConstants.NODE);
NodeList result5 = (NodeList) absoluteExpressions[4].evaluate(document, XPathConstants.NODESET);
Double result6 = (Double) absoluteExpressions[5].evaluate(document, XPathConstants.NUMBER);


Page 47:

import java.io.FileReader;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

InputSource input = new InputSource(new FileReader("input/purchase_order.xml"));
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document document = builder.parse(input);

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();

// Select the node that the relative expressions will be evaluated against
String contextString = "/purchaseOrder/itemsList/item[@index = 5]";
Node contextNode = (Node) xpath.evaluate(contextString, document, XPathConstants.NODE);

// Relative expressions are resolved from the context node
XPathExpression[] relativeExpressions = new XPathExpression[6];
relativeExpressions[0] = xpath.compile("count(preceding-sibling::item)");
relativeExpressions[1] = xpath.compile("count(following-sibling::item)");
relativeExpressions[2] = xpath.compile("preceding-sibling::item[1]/description");
relativeExpressions[3] = xpath.compile("following-sibling::item[1]/description");
relativeExpressions[4] = xpath.compile("description/text()");
relativeExpressions[5] = xpath.compile("../item/description");

Double result1 = (Double) relativeExpressions[0].evaluate(contextNode, XPathConstants.NUMBER);
Double result2 = (Double) relativeExpressions[1].evaluate(contextNode, XPathConstants.NUMBER);
Element result3 = (Element) relativeExpressions[2].evaluate(contextNode, XPathConstants.NODE);
Element result4 = (Element) relativeExpressions[3].evaluate(contextNode, XPathConstants.NODE);
Text result5 = (Text) relativeExpressions[4].evaluate(contextNode, XPathConstants.NODE);
NodeList result6 = (NodeList) relativeExpressions[5].evaluate(contextNode, XPathConstants.NODESET);

Page 48: XSLT Stylesheets

Transforming XML in your Web Application

© Garth Gilmour 2008

Page 49: The XSLT Standard

XSLT enables you to transform XML
  For example from XML to HTML

Your Web Apps can be free of presentation
  Everything is encapsulated into stylesheets

XSLT is advocated as an alternative to ASP/JSP
  In truth they are complementary technologies

XSLT needs to address parts of the input XML
  This ability is provided by the XPath standard

XQuery has recently joined the XML family
  It performs a similar job to XSLT but in a database-oriented way


Page 50: The XSLT Processor

[Diagram: the Input Tree and the Stylesheet are fed to the XSLT Engine, which produces the Output Tree]


Page 51: The XSLT Processor

One logical document is the input
  Organised according to the XPath data model
  The physical file structure is irrelevant
  Most of the DTD is not important
    Fixed and default attribute values can affect the transformation
  Other documents can be opened from within the stylesheet

One logical stylesheet provides the transformation rules
  Again this is organised according to the XPath data model
  Multiple stylesheets can be combined
    Via the include and import instructions
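The point about default attribute values is easy to check from Java. The sketch below is an illustration under stated assumptions: the `Customer` document, its internal DTD and the class name are invented, and it relies on the JDK's default parser applying attribute defaults declared in the internal DTD subset. The `title` attribute is never written in the document, yet it should still be visible to an XPath expression (and therefore to a stylesheet).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultAttributeDemo {

    // Hypothetical document: the DTD supplies title="Mr" when the attribute is omitted.
    static final String XML =
          "<!DOCTYPE Customer ["
        + "<!ELEMENT Customer (name)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ATTLIST name title CDATA 'Mr'>"
        + "]>"
        + "<Customer><name>Joe Bloggs</name></Customer>";

    // Returns the value of the title attribute as the XPath data model sees it.
    public static String titleSeenByXPath() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate("/Customer/name/@title", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("title = " + titleSeenByXPath());
    }
}
```

A template matching `name[@title]` would therefore fire for this document even though the attribute is absent from the markup.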


Page 52: The XSLT Processor

The output from XSLT may be:
  An XML document
    Processor tries to ensure output is well formed
  An HTML document
    Processor follows SGML lexical conventions
  A text document
    No precautions are taken and no characters are escaped

Output is serialised or passed to another process
  If it is serialised you have little control over which alternatives are chosen for escaping, indentation etc.
  XSLT was designed to produce an in-memory tree for use by another process
    Such as an XSL-FO Engine or Web Browser
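The difference between the three output methods can be seen by serialising one small tree three ways. This is a sketch, not part of the original slides: the input fragment and class name are invented, and it uses the identity `Transformer` (one created without a stylesheet) so that only the serialisation step varies. With the JDK's built-in engine one would expect the `xml` method to keep `&lt;` and `<br/>`, the `html` method to write `<br>` without a closing slash, and the `text` method to drop the markup and emit a raw `<`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodDemo {

    // Hypothetical input: a paragraph with an escaped '<' and an empty element.
    static final String XML = "<p>1 &lt; 2<br/></p>";

    // Copies the input unchanged and serialises it with the given output method.
    public static String serialise(String method) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.METHOD, method);
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String method : new String[] {"xml", "html", "text"}) {
            System.out.println(method + ": " + serialise(method));
        }
    }
}
```

In a real stylesheet the same choice is normally made with the `method` attribute of `xsl:output` rather than in code.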


Page 53: A Simple Transformation

XML

<Customer>
  <name title="Mr">
    Joe Bloggs
  </name>
</Customer>

XSLT

<xsl:template match="Customer">
  <HTML>
    <HEAD></HEAD>
    <BODY>
      <xsl:apply-templates/>
    </BODY>
  </HTML>
</xsl:template>

<xsl:template match="name">
  Good morning <xsl:value-of select="@title"/><xsl:value-of select="."/><br/>
</xsl:template>

HTML

<HTML>
  <HEAD></HEAD>
  <BODY>
    Good morning Mr Joe Bloggs<br>
  </BODY>
</HTML>

Page 54: Applying Transformations in Code

Both Java and .NET support XSLT
  You can apply stylesheets from within code

Java intentionally hides the type of the engine
  The static ‘TransformerFactory.newInstance’ method finds the default engine on the classpath and returns a reference to it
  This is used to build ‘Transformer’ objects that run stylesheets

.NET supports XSLT via the types in ‘System.Xml.Xsl’
  ‘XslTransform’ objects apply stylesheets
    The input is an ‘XPathDocument’
    The output is written down a stream
  The transformation can be customized
    Via stylesheet arguments and extension objects
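On the Java side, the closest counterpart to a stylesheet argument is `Transformer.setParameter`. The sketch below shows the idea under stated assumptions: the stylesheet, its `greeting` parameter, the sample document and the class name are all hypothetical and do not come from the course material.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetParameterDemo {

    // Hypothetical stylesheet with a top-level parameter named "greeting".
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='greeting' select=\"'Hello'\"/>"
        + "<xsl:template match='/Customer'>"
        + "<xsl:value-of select=\"concat($greeting, ' ', name/@title, ' ', normalize-space(name))\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input document.
    static final String XML = "<Customer><name title='Mr'>Joe Bloggs</name></Customer>";

    // Runs the stylesheet; a null greeting leaves the stylesheet's default in place.
    public static String greet(String greeting) throws Exception {
        // The factory hides which XSLT engine is actually in use.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(XSLT)));
        if (greeting != null) {
            transformer.setParameter("greeting", greeting);
        }
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(greet(null));
        System.out.println(greet("Good morning"));
    }
}
```

Passing values in this way keeps the stylesheet reusable: the same compiled rules can be run with different parameters for each request.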


Page 55: Applying a Stylesheet in Java Code

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// The class name is illustrative; the slide shows only the method body
public class ApplyStylesheet {
    public static void main(String[] args) throws Exception {
        String sep = File.separator;
        File inputFile = new File("input" + sep + "much_ado_about_nothing.xml");
        File xsltFile = new File("input" + sep + "shakespeare.xslt");
        File outputFile = new File("output" + sep + "play.html");

        // Locate the default XSLT engine
        TransformerFactory tf = TransformerFactory.newInstance();

        StreamSource inputSource = new StreamSource(inputFile);
        StreamSource stylesheetSource = new StreamSource(xsltFile);

        StreamResult consoleResult = new StreamResult(System.out);
        StreamResult fileResult = new StreamResult(outputFile);

        // Build a Transformer that runs the stylesheet
        Transformer t = tf.newTransformer(stylesheetSource);

        // Run it twice: once to the console and once to a file
        t.transform(inputSource, consoleResult);
        t.transform(inputSource, fileResult);
    }
}

Page 56: Applying a Stylesheet in C# Code

using System;
using System.IO;
using System.Xml.XPath;
using System.Xml.Xsl;

// The class name is illustrative; the slide shows only the method
class ApplyStylesheet {
    static void Main(string[] args) {
        //A document class optimised for XPath
        XPathDocument inputDoc = new XPathDocument("..\\..\\muchAdoAboutNothing.xml");

        //The transformer object
        XslTransform stylesheet = new XslTransform();
        stylesheet.Load("..\\..\\shakespeare.xslt");

        //Perform a simple transformation
        // Parameter two could be an XsltArgumentList object to pass parameters
        // Parameter four would be an XmlResolver if our XSLT used the document() function
        // (a resolver for 'import' or 'include' is passed to Load instead)
        using (Stream output = File.Open("output.html", FileMode.Create)) {
            stylesheet.Transform(inputDoc, null, output, null);
        }

        Console.WriteLine("Transformation Complete");
    }
}

