IS432 Semi-Structured Data
Lecture 4:
XPath
Dr. Gamal Al-Shorbagy
What is XPath?
• XPath: "A language for addressing parts of an XML document"
• Similar to a DOS or UNIX "file system path" but with powerful expressions
• XPath is to XML what the "select" statement is to SQL
– But XPath is not a full programming language or a query language.
What is XPath?
• XPath is used to navigate through elements and attributes in an XML document.
• XPath is a major element in W3C's XSLT standard
– XQuery and XPointer are both built on XPath expressions.
XPath Related Standards
• XSLT – XPath is used to tell XSLT how to match tags
• XLink – similar to HTML links <a> but more powerful
• XPointer – a standard manner for identifying document fragments
• XQuery – a newer, more comprehensive standard that includes XPath 2.0 and allows more complex searches and data types, including relational database searches.
Versions
• Version 1.0
– W3C Recommendation, 16 November 1999
– http://www.w3.org/TR/xpath
• Version 2.0
– W3C Working Draft, 29 October 2004
– http://www.w3.org/TR/xpath20/
Other Familiar Path Names
• DOS:
– C:\Program Files\Altova\XMLSPY2004\Examples\Tutorial
• Web
– http://www.google.com/search?hl=en&lr=lang_en&&q=XPath
• Unix
– /usr/local/lib/mylib/myprogram.jar
• Similarities
– Absolute paths start with "/"
– Relative paths do not start with "/"
What is XPath?
• A syntax for defining parts of an XML document
• Uses path expressions to navigate in XML documents
• Contains a library of standard functions
• A major element in XSLT (W3C recommendation)
XPath Terminology
• Seven kinds of nodes in XPath
– Element
– Attribute
– Text
– Namespace
– Processing-instruction
– Comment
– Document node
• Atomic Values
XML documents are treated as trees of nodes.
The topmost element of the tree is called the root element.
XPath Terminology
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
Figure callouts: Root Element (bookstore), Attribute (lang="en"), Atomic Value, Element
XPath Syntax
• XPath uses path expressions to select nodes or node-sets in an XML document.
• The node is selected by following a path or steps.
XPath Axis
Relationships of XPath Nodes
• ancestor
• parent
• child
• descendant
• preceding-sibling
• following-sibling
• self
• attribute
XPath Syntax
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
XPath Syntax: Selecting Nodes
Path Expression   Description
nodename          Selects all child elements of the current node with the name "nodename"
/                 Selects from the root node
//                Selects nodes in the document from the current node that match the selection, no matter where they are
.                 Selects the current node
..                Selects the parent of the current node
@                 Selects attributes
XPath Syntax: Selecting Nodes
Path Expression   Result
bookstore         Selects all child elements of the current node with the name "bookstore"
/bookstore        Selects the root element bookstore
                  Note: If the path starts with a slash ( / ) it always represents an absolute path to an element!
bookstore/book    Selects all book elements that are children of bookstore
//book            Selects all book elements no matter where they are in the document
bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
//@lang           Selects all attributes that are named lang
XPath Syntax: Selecting Nodes
Path Expression   Result
bookstore
/bookstore
bookstore/book
//book
bookstore//book
//@lang
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">XML 4 Dummies</title>
    <price>39.95</price>
  </book>
  <book>
    <title lang="kor">The Han River</title>
    <price>149.95</price>
  </book>
</bookstore>
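The path expressions on this slide can be tried out with a small script. The sketch below assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath and evaluates paths relative to an element, so /bookstore/book is written as "book" relative to the bookstore element and //book as ".//book". The variable names are illustrative, and the XML declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">XML 4 Dummies</title><price>39.95</price></book>
  <book><title lang="kor">The Han River</title><price>149.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the <bookstore> element

# /bookstore/book : book elements that are children of bookstore
books = root.findall("book")

# //book : book elements at any depth below the context element
all_books = root.findall(".//book")

# //title : read the text of every title element
titles = [t.text for t in root.findall(".//title")]

# //@lang : ElementTree cannot return attribute nodes,
# so collect the attribute values from the elements instead
langs = [e.get("lang") for e in root.iter() if e.get("lang") is not None]

print(len(books), len(all_books))
print(titles)
print(langs)
```

A full XPath 1.0 engine would accept the slide's expressions unchanged; the rewriting here is only a workaround for the ElementTree subset.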
XPath Syntax: Predicates
• Predicates are used to find a specific node or a node that contains a specific value.
• Predicates are always embedded in square brackets.
XPath Syntax: Predicates
Path Expression                      Result
/bookstore/book[1]                   Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]              Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]            Selects the second-to-last book element that is the child of the bookstore element
/bookstore/book[position()<3]        Selects the first two book elements that are children of the bookstore element
//title[@lang]                       Selects all the title elements that have an attribute named lang
//title[@lang='eng']                 Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]         Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title   Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
XPath Syntax: Predicates
Path Expression   Result
/bookstore/book[1]
/bookstore/book[last()]
/bookstore/book[last()-1]
/bookstore/book[position()<3]
//title[@lang]
//title[@lang='kor']
/bookstore/book[price>35.00]
/bookstore/book[price>35.00]/title
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">XML 4 Dummies</title>
    <price>39.95</price>
  </book>
  <book>
    <title lang="kor">The Han River</title>
    <price>149.95</price>
  </book>
</bookstore>
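The predicate exercise can be checked with a short script. This is a sketch that assumes Python's standard xml.etree.ElementTree, which supports positional and attribute predicates but not position()<3 or numeric comparisons such as price>35.00; those two cases are emulated with ordinary Python (a slice and a list comprehension). Paths are written relative to the bookstore element.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">XML 4 Dummies</title><price>39.95</price></book>
  <book><title lang="kor">The Han River</title><price>149.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)

def book_titles(books):
    """Return the title text of each book element."""
    return [b.findtext("title") for b in books]

first = root.findall("book[1]")               # /bookstore/book[1]
last = root.findall("book[last()]")           # /bookstore/book[last()]
second_last = root.findall("book[last()-1]")  # /bookstore/book[last()-1]

# /bookstore/book[position()<3] : not supported here, so slice instead
first_two = root.findall("book")[:2]

with_lang = root.findall(".//title[@lang]")     # //title[@lang]
korean = root.findall(".//title[@lang='kor']")  # //title[@lang='kor']

# /bookstore/book[price>35.00] : numeric tests are done in Python
expensive = [b for b in root.findall("book")
             if float(b.findtext("price")) > 35.00]

print(book_titles(first), book_titles(last), book_titles(second_last))
print(book_titles(first_two))
print([t.text for t in korean])
print(book_titles(expensive))  # /bookstore/book[price>35.00]/title
```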
XPath Example
<?xml version="1.0" encoding="iso-8859-1"?>
<pets>
  <pet type="dog" color="brown">Max</pet>
  <pet type="cat" color="white">Toula</pet>
</pets>
• Select all pet elements
– //pet or alternatively /pets/pet or /pets/child::*
• Select the first pet
– /pets/pet[1]
• Select all pets of type dog
– //pet[@type="dog"]
• Select all pets of white color
– //pet[@color="white"]
• Select the color of all dogs
– //pet[@type="dog"]/@color
• Get the types of pets with the name Max
– /pets/pet[text()="Max"]/@type
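The pets queries can be reproduced as a minimal sketch, again assuming Python's standard xml.etree.ElementTree. Attribute values are read with get() because this module returns elements rather than attribute nodes, and the text()="Max" test is done with a plain Python comparison.

```python
import xml.etree.ElementTree as ET

pets = ET.fromstring(
    "<pets>"
    '<pet type="dog" color="brown">Max</pet>'
    '<pet type="cat" color="white">Toula</pet>'
    "</pets>"
)

all_pets = pets.findall("pet")                 # /pets/pet
first_pet = pets.find("pet[1]")                # /pets/pet[1]
dogs = pets.findall(".//pet[@type='dog']")     # //pet[@type="dog"]
white = pets.findall(".//pet[@color='white']") # //pet[@color="white"]

# //pet[@type="dog"]/@color : read the attribute from each selected element
dog_colors = [p.get("color") for p in dogs]

# /pets/pet[text()="Max"]/@type : compare the element text in Python
max_types = [p.get("type") for p in all_pets if p.text == "Max"]

print([p.text for p in all_pets], first_pet.text)
print(dog_colors, max_types)
```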
XPath Syntax: Wildcards
Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind

Path Expression   Result
/bookstore/*      Selects all the child element nodes of the bookstore element
//*               Selects all elements in the document
//title[@*]       Selects all title elements which have any attribute
XPath Syntax: Selecting Multiple Paths
Path Expression                    Result
//book/title | //book/price        All the title and price elements of all book elements
//title | //price                  All title and price elements
/bookstore/book/title | //price    All title elements of the book elements of bookstore, and all price elements
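Wildcards and the | operator can be illustrated with the bookstore sample. The sketch below assumes Python's standard xml.etree.ElementTree, which supports * but neither @* nor |, so those two are emulated in Python. Iterating over the tree with iter() visits elements in document order, which is also the order XPath tools normally present a union in.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">XML 4 Dummies</title><price>39.95</price></book>
  <book><title lang="kor">The Han River</title><price>149.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)

# /bookstore/* : all child elements of bookstore
children = root.findall("*")

# //* : every element in the document, including bookstore itself
all_elements = list(root.iter())

# //title[@*] : title elements that carry at least one attribute
titles_with_attrs = [t for t in root.iter("title") if t.attrib]

# //title | //price : emulate the union with one pass in document order
titles_and_prices = [e for e in root.iter() if e.tag in ("title", "price")]

print([c.tag for c in children])
print(len(all_elements), len(titles_with_attrs))
print([e.tag for e in titles_and_prices])
```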
XPath Axis
• ancestor::author
• parent::author
• child::firstname, child::*, child::node()
• descendant::author
• preceding-sibling::author
• following-sibling::author
• attribute::title
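To make the axes concrete, the following sketch emulates a few of them on the bookstore sample with Python's standard xml.etree.ElementTree. That module has no axis syntax and elements do not know their parent, so the helper functions and the parent_of map are illustrative constructions, not part of any XPath API.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore>
  <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
  <book><title lang="eng">XML 4 Dummies</title><price>39.95</price></book>
  <book><title lang="kor">The Han River</title><price>149.95</price></book>
</bookstore>"""

root = ET.fromstring(DOC)

# Map every element to its parent so upward axes can be followed.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def siblings(node, following=True):
    """following-sibling:: (or preceding-sibling::) axis."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    kids = list(parent)
    i = kids.index(node)
    return kids[i + 1:] if following else kids[:i]

second_book = root.findall("book")[1]
title = second_book.find("title")

print([c.tag for c in second_book])              # child::*
print([d.tag for d in list(root.iter())[1:]])    # descendant::* of bookstore
print(parent_of[title].tag)                      # parent::*
print([a.tag for a in ancestors(title)])         # ancestor::*
print([b.findtext("title") for b in siblings(second_book)])         # following-sibling::book
print([b.findtext("title") for b in siblings(second_book, False)])  # preceding-sibling::book
print(title.attrib)                              # attribute::*
```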
XPath Functions
Node-Set   Takes a node-set argument, returns a node-set, or returns/provides information about a particular node within a node-set.
String     Performs evaluations, formatting, and manipulation on string arguments.
Boolean    Evaluates the argument expressions to obtain a Boolean result.
Number     Evaluates the argument expressions to obtain a numeric result.
XPath Functions: Node-Set
number count(node-set)
count(//emp) = 3
count(//emp[1]) = 1
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <emp id="S001">
    <name>ABC</name>
    <salary>5000</salary>
  </emp>
  <emp id="S002">
    <name>PQR</name>
    <salary>7000</salary>
  </emp>
  <emp id="S003">
    <name>XYZ</name>
    <salary>9000</salary>
  </emp>
</root>
XPath Functions: Node-Set
number last()
//emp[last()]
<emp id="S003">
  <name>XYZ</name>
  <salary>9000</salary>
</emp>
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <emp id="S001">
    <name>ABC</name>
    <salary>5000</salary>
  </emp>
  <emp id="S002">
    <name>PQR</name>
    <salary>7000</salary>
  </emp>
  <emp id="S003">
    <name>XYZ</name>
    <salary>9000</salary>
  </emp>
</root>
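The count() and last() examples can be checked against the employee document. This sketch assumes Python's standard xml.etree.ElementTree, which has no count() function, so len() on the selected list stands in for it; the [last()] predicate is supported directly. The id of the first employee is taken to be S001.

```python
import xml.etree.ElementTree as ET

EMPLOYEES = """<root>
  <emp id="S001"><name>ABC</name><salary>5000</salary></emp>
  <emp id="S002"><name>PQR</name><salary>7000</salary></emp>
  <emp id="S003"><name>XYZ</name><salary>9000</salary></emp>
</root>"""

root = ET.fromstring(EMPLOYEES)

# count(//emp) : number of emp elements in the document
emp_count = len(root.findall(".//emp"))

# count(//emp[1]) : emp elements that are the first emp child of their parent
first_count = len(root.findall(".//emp[1]"))

# //emp[last()] : the last emp child of its parent
last_emp = root.find(".//emp[last()]")

print(emp_count, first_count)
print(last_emp.get("id"), last_emp.findtext("name"), last_emp.findtext("salary"))
```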
XPath Functions: String
• string concat("abc", "d", "ef", "g") = "abcdefg"
• boolean contains("Mobily", "bil") = true
• string normalize-space(" abc def ") = "abc def"
• boolean starts-with(string, string)
• number string-length("abcd") = 4
• string substring("12345", 2, 3) = "234"
• …
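For readers who want to experiment without an XPath engine, the string functions above can be approximated in plain Python. The helper names below are hypothetical, and the substring helper assumes integer arguments with a start position of at least 1 (XPath positions are 1-based); the real XPath functions also define behaviour for fractional and out-of-range arguments, which this sketch ignores.

```python
import re

def xpath_concat(*parts):
    """concat(s1, s2, ...): join all arguments."""
    return "".join(parts)

def xpath_contains(text, fragment):
    """contains(s1, s2): true if s1 contains s2."""
    return fragment in text

def xpath_starts_with(text, prefix):
    """starts-with(s1, s2): true if s1 begins with s2."""
    return text.startswith(prefix)

def xpath_normalize_space(text):
    """normalize-space(s): collapse whitespace runs and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

def xpath_string_length(text):
    """string-length(s): number of characters."""
    return len(text)

def xpath_substring(text, start, length):
    """substring(s, start, length) with a 1-based start (integers assumed)."""
    return text[start - 1:start - 1 + length]

print(xpath_concat("abc", "d", "ef", "g"))
print(xpath_contains("Mobily", "bil"))
print(xpath_normalize_space("  abc   def "))
print(xpath_string_length("abcd"))
print(xpath_substring("12345", 2, 3))
```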
XPath Functions: Number
• ceiling(2.5) = 3
• floor(3.5) = 3
• number(arg)
– number('2048') = 2048
– number('-2048') = -2048
– number('text') = NaN
– number('109.54') = 109.54
• round(2.6) = 3, round(2.4) = 2, round(2.5) = 3
• number sum(node-set)
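The numeric functions map closely onto Python's math module, with one trap: XPath round() rounds halves toward positive infinity, whereas Python's built-in round() rounds halves to the nearest even number. The sketch below uses hypothetical helper names, and xpath_number is only an approximation because Python's float() accepts some spellings (such as exponents) that XPath 1.0 would turn into NaN. The sum() line uses the three salary values from the employee sample.

```python
import math

def xpath_number(text):
    """Approximate number(string): NaN when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return math.nan

def xpath_round(value):
    """round(x) as XPath defines it: halves go toward positive infinity."""
    return math.floor(value + 0.5)

print(math.ceil(2.5), math.floor(3.5))             # ceiling(2.5), floor(3.5)
print(xpath_number("2048"), xpath_number("-2048"), xpath_number("109.54"))
print(math.isnan(xpath_number("text")))            # number('text') is NaN
print(xpath_round(2.6), xpath_round(2.4), xpath_round(2.5))
print(round(2.5))                                  # Python's own round differs here

# sum(node-set): add up the numeric value of each node's text
salaries = ["5000", "7000", "9000"]
total = sum(xpath_number(s) for s in salaries)
print(total)
```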
Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author>
      <first-name> Rick </first-name>
      <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
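Data Model for XPath
[Tree diagram: the root sits above the root element bib; bib has two book children; the first book has children publisher (Addison-Wesley), author (Serge Abiteboul), and so on. Processing instruction and comment nodes are also shown.]
Much like the XQuery data model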
The Root and the Root
• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>
• bib is the "document element"
• The "root" is above bib
• /bib returns the document element
• / returns the root
• Why? Because we may have comments before and after <bib>; they become siblings of <bib>
• This is advanced xmlogy
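The difference between the root and the document element can be seen directly in a DOM tree. This sketch assumes Python's standard xml.dom.minidom; the two comments are made-up additions to the slide's sample, placed before and after bib so that they show up as siblings of the document element.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<!-- before -->"
    "<bib><paper>1</paper><paper>2</paper></bib>"
    "<!-- after -->"
)

# The document node plays the role of the XPath root ("/").
# Its children are the two comments and the document element.
top_level = [node.nodeName for node in doc.childNodes]

# The document element corresponds to /bib.
document_element = doc.documentElement

print(top_level)
print(document_element.tagName)
```

The list of top-level node names contains three entries, while documentElement always refers to the single bib element.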
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>
/bib/paper/year
Result: empty (there were no papers)
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>
/bib//first-name
Result: <first-name> Rick </first-name>
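The simple and descendant expressions can be run against the bib sample. The sketch assumes Python's standard xml.etree.ElementTree and writes each path relative to the bib element (so /bib/book/year becomes "book/year"). The padding spaces around the text values in the slide's XML are dropped here to keep the output tidy.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)

years = [y.text for y in bib.findall("book/year")]            # /bib/book/year
paper_years = bib.findall("paper/year")                       # /bib/paper/year
authors = bib.findall(".//author")                            # //author
first_names = [f.text for f in bib.findall(".//first-name")]  # /bib//first-name

print(years)
print(paper_years)
print(len(authors))
print(first_names)
```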
XPath: Functions
/bib/book/author/text()
Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman
Rick Hull doesn't appear because his author element contains first-name and last-name elements rather than text
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
XPath: Wildcard
//author/*
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
* Matches any element
XPath: Attribute Nodes
/bib/book/@price
Result: "55"
@price means that price has to be an attribute
XPath: Qualifiers
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name>
        </author>
XPath: More Qualifiers
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]
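The attribute, text() and qualifier examples can be tried on the bib sample as well. This is a sketch assuming Python's standard xml.etree.ElementTree, which handles a child-element qualifier such as author[first-name] directly; attribute values, comparisons such as @price < "60", and text() tests are emulated in Python. Padding spaces in the slide's XML are omitted.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>"""

bib = ET.fromstring(BIB)
books = bib.findall("book")

# /bib/book/@price : attribute values of the books that have a price
prices = [b.get("price") for b in books if b.get("price") is not None]

# /bib/book/author[first-name] : authors that have a first-name child
structured = bib.findall("book/author[first-name]")

# /bib/book/author/text() : authors whose name is stored as direct text
plain = [a.text.strip() for a in bib.findall("book/author")
         if a.text and a.text.strip()]

# /bib/book[@price < "60"] : compare the attribute as a number in Python
cheap = [b for b in books
         if b.get("price") is not None and float(b.get("price")) < 60]

print(prices)
print([a.findtext("last-name") for a in structured])
print(plain)
print([b.findtext("publisher") for b in cheap])
```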
XPath: Summary
bib                                     matches a bib element
*                                       matches any element
/                                       matches the root
/bib                                    matches a bib element under root
bib/paper                               matches a paper in bib
bib//paper                              matches a paper in bib, at any depth
//paper                                 matches a paper at any depth
paper|book                              matches a paper or a book
@price                                  matches a price attribute
bib/book/@price                         matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname   matches…