XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems June 16, 2022
Page 1: XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 7, 2015.

XML Schemas and Queries

Zachary G. Ives
University of Pennsylvania

CIS 455 / 555 – Internet and Web Systems

April 19, 2023


Readings & Reminders

Reminder: Homework 1 Milestone 2 due tonight @ 11:59PM

Homework 2 pre-release is now posted

XML, DTD, Schema, XPath, XSLT

For next week: Altinel & Franklin paper on XFilter


Sample XML

<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
</dblp>


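XML Data Model Visualized

[Figure: the sample document drawn as a tree. The Root node has a processing instruction (?xml) and the dblp element as children; dblp contains the mastersthesis and article elements, each carrying mdate and key attributes plus child elements (author, title, year, school; editor, title, journal, volume, year, and two ee elements) whose text values form the leaves. Legend: root, p-i, element, attribute, text.]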


XML Isn’t Enough on Its Own

It’s too unconstrained for many cases! How will we know when we’re getting garbage? How will we query? How will we understand what we got?


Document Type Definitions (DTDs)

A DTD is an EBNF grammar defining XML structure. An XML document specifies an associated DTD, plus the root element; the DTD specifies the children of the root (and so on).

A DTD gives special significance to certain attributes: IDs are special attributes that are analogous to keys for elements; IDREFs are references to IDs; IDREFS are space-delimited lists of IDREFs.
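To make the ID / IDREF / IDREFS distinction concrete, here is a minimal Python sketch using the standard library's ElementTree. ElementTree does not read DTDs, so the sketch assumes a hypothetical document whose `id` attribute plays the role of an ID and whose `cites` attribute plays the role of an IDREFS; `build_id_index` and `resolve_idrefs` are hypothetical helpers performing the uniqueness and dangling-reference checks a validating parser would apply to declared ID/IDREFS attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "id" stands in for an attribute a DTD would declare as ID,
# and "cites" for one it would declare as IDREFS (a space-delimited list of IDREFs).
DOC = """<dblp>
  <article id="a1"><title>Paper One</title></article>
  <article id="a2" cites="a1"><title>Paper Two</title></article>
  <article id="a3" cites="a1 a2"><title>Paper Three</title></article>
</dblp>"""


def build_id_index(root, id_attr="id"):
    """Map each ID value to its element; IDs must be unique, like keys."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate ID {value!r}")
        index[value] = elem
    return index


def resolve_idrefs(elem, attr, index):
    """Split an IDREFS value on whitespace and look up each IDREF."""
    refs = elem.get(attr, "").split()
    dangling = [r for r in refs if r not in index]
    if dangling:
        raise ValueError(f"IDREF(s) with no matching ID: {dangling}")
    return [index[r] for r in refs]


root = ET.fromstring(DOC)
index = build_id_index(root)
cited = resolve_idrefs(index["a3"], "cites", index)
print([e.findtext("title") for e in cited])  # ['Paper One', 'Paper Two']
```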


An Example DTD

Example DTD:

<!ELEMENT dblp ((mastersthesis | article)*)>
<!ELEMENT mastersthesis (author, title, year, school, committeemember*)>
<!ATTLIST mastersthesis mdate   CDATA #REQUIRED
                        key     ID    #REQUIRED
                        advisor CDATA #IMPLIED>
<!ELEMENT author (#PCDATA)>
…

Example use of DTD in XML file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE dblp SYSTEM "my.dtd">
<dblp>…


DTDs Are Very Limited

DTDs capture grammatical structure, but have some drawbacks: only string scalar types; a global ID/reference space, which is inconvenient; and no way of defining OO-like inheritance.


XML Schema: DTDs Rethought

Features: XML syntax; a better way of defining keys using XPaths; type subclassing; … and, of course, built-in datatypes.


Basic Constructs of Schema

Separation of elements (and attributes) from types: complexType is a structured type, which can contain sequences or choices.

element and attribute declarations have a name and a type; elements may also have minOccurs and maxOccurs.

Subtyping, most commonly using: <complexContent><extension base="prevType"> … </…>


Simple Schema Example

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="mastersthesis" type="ThesisType"/>
  <xsd:complexType name="ThesisType">
    <!-- the content model (sequence) must come before the attribute declarations -->
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="year" type="xsd:integer"/>
      <xsd:element name="school" type="xsd:string"/>
      <!-- CommitteeType is not shown; maxOccurs="unbounded" matches committeemember* in the DTD -->
      <xsd:element name="committeemember" type="CommitteeType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="mdate" type="xsd:date"/>
    <xsd:attribute name="key" type="xsd:string"/>
    <xsd:attribute name="advisor" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
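Python's standard library has no XML Schema validator (third-party packages such as lxml provide one), so the sketch below only illustrates, by hand, the kind of checks the ThesisType definition implies: children in sequence order, occurrence counts within minOccurs/maxOccurs, and typed values. `THESIS_SEQUENCE` and `check_thesis` are hypothetical names, and `int()` and `date.fromisoformat()` only approximate `xsd:integer` and `xsd:date`.

```python
import xml.etree.ElementTree as ET
from datetime import date

# ThesisType's <xsd:sequence> written out by hand as (name, minOccurs, maxOccurs, check).
# maxOccurs=None means "unbounded"; check=None means the value is not checked.
THESIS_SEQUENCE = [
    ("author", 1, 1, str),
    ("title", 1, 1, str),
    ("year", 1, 1, int),                 # xsd:integer, approximated with int()
    ("school", 1, 1, str),
    ("committeemember", 0, None, None),  # CommitteeType content is not checked here
]


def check_thesis(elem, sequence=THESIS_SEQUENCE):
    """Return a list of problems; an empty list means the element fits."""
    problems = []
    mdate = elem.get("mdate")
    if mdate is not None:                # attributes are optional unless use="required"
        try:
            date.fromisoformat(mdate)    # rough stand-in for xsd:date
        except ValueError:
            problems.append(f"mdate {mdate!r} is not a date")
    children = list(elem)
    pos = 0
    for name, min_occurs, max_occurs, check in sequence:
        count = 0
        # Greedily consume consecutive children with this name, up to maxOccurs.
        while (pos < len(children) and children[pos].tag == name
               and (max_occurs is None or count < max_occurs)):
            if check is not None:
                try:
                    check(children[pos].text or "")
                except ValueError:
                    problems.append(f"<{name}> has invalid value {children[pos].text!r}")
            count += 1
            pos += 1
        if count < min_occurs:
            problems.append(f"expected at least {min_occurs} <{name}>, found {count}")
    if pos < len(children):
        problems.append(f"unexpected <{children[pos].tag}> at position {pos}")
    return problems


good = ET.fromstring(
    '<mastersthesis mdate="2002-01-03" key="ms/Brown92">'
    "<author>Kurt P. Brown</author>"
    "<title>PRPL: A Database Workload Specification Language</title>"
    "<year>1992</year><school>Univ. of Wisconsin-Madison</school>"
    "</mastersthesis>")
print(check_thesis(good))  # []
```

A greedy left-to-right match is enough here because each element name in the sequence is distinct; general content models need a proper automaton.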


Embedding XML Schema

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="s1.xsd">
  <grade>a</grade>
</root>

<s1:root xmlns:s1="http://www.schemaValid.com/s1ns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.schemaValid.com/s1ns s1ns.xsd">
  <s1:grade>a</s1:grade>
</s1:root>

The first form names a schema for elements in no namespace; the second pairs a target namespace URI with the location of its schema. However, the XML parser is free to ignore these hints; the schema is typically specified “from outside” the document.


Manipulating XML

Sometimes we need to restructure an XML document, or simply retrieve certain parts that satisfy a constraint, e.g., all books, or all books by author XYZ.


Document Object Model (DOM) vs. Queries

Build a DOM tree (as we saw earlier) and access it via Java (etc.) DOMNode objects. DOM objects have methods like getFirstChild() and getNextSibling(); this is the common way of traversing the tree. We can also modify the DOM tree, and thus alter the XML, via insertBefore(), appendChild(), etc.
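A small sketch of DOM-style traversal and modification, using Python's `xml.dom.minidom` in place of Java (Python exposes `firstChild` and `nextSibling` as attributes rather than `getFirstChild()` / `getNextSibling()` methods). The document is an abbreviated copy of the sample, and `child_element_names` is a hypothetical helper.

```python
from xml.dom import minidom

DOC = ('<dblp><mastersthesis key="ms/Brown92">'
       "<author>Kurt P. Brown</author>"
       "<title>PRPL: A Database Workload Specification Language</title>"
       "<year>1992</year></mastersthesis></dblp>")


def child_element_names(node):
    """Visit node's children via firstChild / nextSibling, keeping element names."""
    names = []
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE:   # skip text, comments, etc.
            names.append(child.tagName)
        child = child.nextSibling
    return names


doc = minidom.parseString(DOC)
thesis = doc.documentElement.firstChild     # <mastersthesis> (DOC has no whitespace text)
print(child_element_names(thesis))          # ['author', 'title', 'year']

# Modify the tree. The standard DOM offers insertBefore / appendChild rather than an
# insertAfter; inserting before year.nextSibling places the new node right after <year>
# (here nextSibling is None, and insertBefore(new, None) appends at the end).
year = thesis.getElementsByTagName("year")[0]
school = doc.createElement("school")
school.appendChild(doc.createTextNode("Univ. of Wisconsin-Madison"))
thesis.insertBefore(school, year.nextSibling)
print(child_element_names(thesis))          # ['author', 'title', 'year', 'school']
```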

Alternate approach: a query language. Define some sort of template describing traversals from the root of the directed graph; in XML, the basis of this template is called an XPath.

We can also declare some constraints on the values we want. The XPath returns a node set of matches.


XPaths

In its simplest form, an XPath is like a path in a file system: /mypath/subpath/*/morepath

The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path

XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()

XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order


Some Example XPath Queries

/dblp/mastersthesis/title
/dblp/*/editor
//title
//title/text()
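Python's ElementTree supports a limited subset of XPath, which is enough to approximate these four queries. In this sketch the paths are written relative to the dblp root element (ElementTree does not accept absolute paths on an element), and because ElementTree has no text() step, the last query reads each element's `.text` instead. The document is the sample, abbreviated.

```python
import xml.etree.ElementTree as ET

# The sample document, abbreviated.
DOC = """<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <year>1997</year>
  </article>
</dblp>"""

root = ET.fromstring(DOC)   # the <dblp> element; the paths below are relative to it

thesis_titles = root.findall("mastersthesis/title")   # /dblp/mastersthesis/title
editors = root.findall("*/editor")                     # /dblp/*/editor
all_titles = root.findall(".//title")                  # //title (all descendants)
title_text = [t.text for t in all_titles]              # //title/text()
print(title_text)
# ['PRPL: A Database Workload Specification Language', 'The 1995 SQL Reunion']
```

As with XPath, the matches come back in document order.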


Context Nodes and Relative Paths

XPath has a notion of a context node, analogous to a current directory: “.” represents the context node, and “..” represents its parent. We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node.

By default, the document root is the context node.
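In ElementTree, the element you call `findall()` on acts as the context node, so "." and ".." can be tried directly. One limitation worth knowing (a property of ElementTree, not of XPath): elements carry no parent pointer, so ".." cannot climb above the element the search started from.

```python
import xml.etree.ElementTree as ET

DOC = ("<dblp>"
       '<mastersthesis key="ms/Brown92"><author>Kurt P. Brown</author>'
       "<title>PRPL: A Database Workload Specification Language</title></mastersthesis>"
       '<article key="tr/dec/SRC1997-018"><title>The 1995 SQL Reunion</title></article>'
       "</dblp>")

root = ET.fromstring(DOC)        # searches on root use <dblp> as the context node
thesis = root.find("mastersthesis")

same = thesis.findall(".")                                 # ".": the context node itself
up_and_back = root.findall("mastersthesis/author/../..")   # down two steps, up two: <dblp>
parents = root.findall("*/title/..")                       # parent of every <title> child
print([p.tag for p in parents])                            # ['mastersthesis', 'article']

# ElementTree limitation: ".." cannot leave the subtree the search started from.
print(thesis.findall(".."))                                # []
```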


Predicates – Filtering Operations

A predicate allows us to filter the node set based on selection-like conditions over sub-XPaths:

/dblp/article[title = "Paper1"]

which, for a title containing only text, is equivalent to:

/dblp/article[./title/text() = "Paper1"]

because of type coercion: the title element is converted to its string value before the comparison. What does this do?

/dblp/article[@key = "123" and ./title/text() = "Paper1" and ./author/*/element()]
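ElementTree's `[tag='text']` predicate compares a child's complete text content, much like the string-value comparison in `[title = "..."]`. It has no "and", but consecutive predicates must all hold, which covers the simple case. Since "123" and "Paper1" do not occur in the sample data, this sketch filters on values that do.

```python
import xml.etree.ElementTree as ET

DOC = ("<dblp>"
       '<mastersthesis key="ms/Brown92"><author>Kurt P. Brown</author>'
       "<title>PRPL: A Database Workload Specification Language</title></mastersthesis>"
       '<article key="tr/dec/SRC1997-018"><editor>Paul R. McJones</editor>'
       "<title>The 1995 SQL Reunion</title></article>"
       "</dblp>")
root = ET.fromstring(DOC)

# /dblp/article[title = "The 1995 SQL Reunion"]
by_title = root.findall("article[title='The 1995 SQL Reunion']")
# /dblp/*[@key = "ms/Brown92"]
by_key = root.findall("*[@key='ms/Brown92']")
# /dblp/article[@key = "tr/dec/SRC1997-018" and title = "The 1995 SQL Reunion"]
both = root.findall("article[@key='tr/dec/SRC1997-018'][title='The 1995 SQL Reunion']")
# Same shape, but the key does not match, so nothing is returned
neither = root.findall("article[@key='123'][title='The 1995 SQL Reunion']")
print(len(by_title), len(by_key), len(both), len(neither))   # 1 1 1 0
```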


Axes: More Complex Traversals

Thus far, we’ve seen XPath expressions that go down the tree (and up one step). But we might want to go up, left, right, etc. These traversals are expressed with so-called axes:

self::path-step, child::path-step, parent::path-step, descendant::path-step, ancestor::path-step, descendant-or-self::path-step, ancestor-or-self::path-step, preceding-sibling::path-step, following-sibling::path-step, preceding::path-step, following::path-step

The previous XPaths we saw were in “abbreviated form”: for example, /dblp/article abbreviates /child::dblp/child::article, @key abbreviates attribute::key, .. abbreviates parent::node(), and // abbreviates /descendant-or-self::node()/.
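ElementTree's path language has no axis syntax, so this sketch computes a few axes by hand from a child-to-parent map. The helper names are hypothetical, and each returns elements only (roughly the `::*` form of the axis); the document keeps the sample's element structure with the text omitted.

```python
import xml.etree.ElementTree as ET

# Element structure of the sample document, text omitted.
DOC = ("<dblp>"
       "<mastersthesis><author/><title/><year/><school/></mastersthesis>"
       "<article><editor/><title/><journal/></article>"
       "</dblp>")

root = ET.fromstring(DOC)
# ElementTree elements have no parent pointer, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def parent_axis(node):
    """parent::* : at most one node."""
    return [parent_of[node]] if node in parent_of else []


def ancestor_axis(node):
    """ancestor::* : parent, grandparent, ..., listed nearest first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendant_axis(node):
    """descendant::* : every element below node, in document order."""
    return list(node.iter())[1:]          # iter() yields node itself first


def sibling_axes(node):
    """(preceding-sibling::*, following-sibling::*), both in document order."""
    if node not in parent_of:
        return [], []
    siblings = list(parent_of[node])
    i = siblings.index(node)
    return siblings[:i], siblings[i + 1:]


title = root.find("mastersthesis/title")
before, after = sibling_axes(title)
print([e.tag for e in ancestor_axis(title)])            # ['mastersthesis', 'dblp']
print([e.tag for e in before], [e.tag for e in after])  # ['author'] ['year', 'school']
```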


Users of XPath

XML Schema uses simple XPaths in defining keys and uniqueness constraints

XLink and XPointer: hyperlinks for XML

XSLT: useful for converting from XML to other representations (e.g., HTML, PDF, SVG)

XQuery: useful for restructuring an XML document or combining multiple documents; it might well turn into the “glue” between Web Services, etc.


A Functional Language for XML

XSLT is based on a series of templates that match different parts of an XML document. There’s a policy for which rule or template is applied if more than one matches (it’s not what you’d think!).

XSLT templates can invoke other templates, and XSLT templates can be nonterminating (beware!).

XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths). Within each template, we directly describe what should be output.


An XSLT Template

An XSLT stylesheet is an XML document itself: its XML tags either create output OR are XSL operations.

All XSL tags are prefixed with the “xsl” namespace; all non-XSL tags are part of the XML output.

Common XSL operations: a template with a match XPath, and a recursive call to apply-templates, which may also select where it should be applied.

Attach a stylesheet to an XML document with a processing instruction:

<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="http://www.com/my.xsl" ?>


An Example XSLT Stylesheet

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/dblp">
    <html>
      <head><title>This is DBLP</title></head>
      <body>
        <xsl:apply-templates />
      </body>
    </html>
  </xsl:template>
  <xsl:template match="inproceedings">
    <h2><xsl:apply-templates select="title" /></h2>
    <p><xsl:apply-templates select="author" /></p>
  </xsl:template>
  …
</xsl:stylesheet>


XSLT Processing Model

Input: a list of source nodes; output: result tree fragment(s). Start with the root:

Find all template rules with patterns matching the root, and pick the “best” match according to some heuristics. Set the current node list to be the set of nodes it matches.

Iterate over each node in the current node list: apply the operations of the template, and “append” the results of the matching template rule to the result tree structure. Repeat recursively wherever apply-templates specifies.
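The processing model can be sketched as a tiny, hypothetical Python program over ElementTree. It is not a real XSLT processor: templates are looked up by element name only (real XSLT matches XPath patterns and applies the priority rules), a fallback mimics the built-in rules by recursing into children and copying text, and processing starts at the document element rather than XSLT's separate root node. The two templates adapt the example stylesheet, matching mastersthesis instead of inproceedings because that is what the sample data contains.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical template table: element name -> function(node) returning output strings.
TEMPLATES = {}


def template(name):
    """Decorator registering a template rule for elements called `name`."""
    def register(fn):
        TEMPLATES[name] = fn
        return fn
    return register


def apply_templates(nodes):
    """For each node in the current node list (in order), run its rule and append the output."""
    out = []
    for node in nodes:
        rule = TEMPLATES.get(node.tag, builtin_rule)
        out.extend(rule(node))
    return out


def builtin_rule(elem):
    """Like XSLT's built-in rules: recurse into child elements and copy text through."""
    out = [escape(elem.text or "")]
    for child in elem:
        out.extend(apply_templates([child]))
        out.append(escape(child.tail or ""))
    return out


@template("dblp")
def dblp_rule(node):
    return ["<html><body>"] + apply_templates(list(node)) + ["</body></html>"]


@template("mastersthesis")
def thesis_rule(node):
    return (["<h2>"] + apply_templates(node.findall("title")) + ["</h2>", "<p>"]
            + apply_templates(node.findall("author")) + ["</p>"])


DOC = ('<dblp><mastersthesis key="ms/Brown92">'
       "<author>Kurt P. Brown</author>"
       "<title>PRPL: A Database Workload Specification Language</title>"
       "<year>1992</year></mastersthesis></dblp>")

html = "".join(apply_templates([ET.fromstring(DOC)]))
print(html)
```

The year never reaches the output: the mastersthesis rule only selects title and author, just as the stylesheet's select attributes restrict which children get processed.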


What If There’s More than One Match?

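Eliminate rules of lower import precedence (rules that came in via importing).

Break a rule with a union pattern into its | branches and consider each branch separately.

Choose the rule with the highest computed or specified priority. If several are still tied, XSLT 1.0 treats this as an error, which a processor may recover from by choosing the one that occurs last in the stylesheet.

Simple rules for computing the default priority based on “precision”: a QName preceded by a child or attribute axis specifier gets priority 0; NCName:* (a namespace wildcard) preceded by a child or attribute axis specifier gets priority -0.25; any other NodeTest (such as * or text()) preceded by a child or attribute axis specifier gets priority -0.5; anything else (e.g., a multi-step path or a pattern with a predicate) gets priority 0.5.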
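The conflict-resolution steps can be written as a short Python sketch. `default_priority` handles a single |-branch using simplified name syntax, and `pick_rule` leaves the actual pattern matching to a caller-supplied function; both names are hypothetical, and a larger `import_precedence` number stands for higher precedence.

```python
import re

# Simplified name syntax (assumption): real XML names allow more characters.
NCNAME = r"[A-Za-z_][\w.\-]*"
QNAME = NCNAME + r"(?::" + NCNAME + r")?"
NODE_TYPE_TEST = r"(?:node|text|comment|processing-instruction)\(\)"
PI_WITH_LITERAL = r"processing-instruction\((['\"]).*\1\)"


def default_priority(branch: str) -> float:
    """Default priority of ONE |-branch of a match pattern."""
    p = branch.strip()
    for axis in ("child::", "attribute::", "@"):   # child-or-attribute axis specifier
        if p.startswith(axis):
            p = p[len(axis):]
            break
    if re.fullmatch(QNAME, p) or re.fullmatch(PI_WITH_LITERAL, p):
        return 0.0        # e.g. title, @key
    if re.fullmatch(NCNAME + r":\*", p):
        return -0.25      # e.g. xsl:*
    if p == "*" or re.fullmatch(NODE_TYPE_TEST, p):
        return -0.5       # e.g. *, @*, text()
    return 0.5            # e.g. article/title, article[@key], /


def pick_rule(rules, matches):
    """rules: (pattern, import_precedence, explicit_priority or None), in stylesheet order.
    matches(branch) says whether a pattern branch matches the current node
    (supplied by the caller; this sketch does not evaluate patterns itself)."""
    candidates = []
    for order, (pattern, precedence, priority) in enumerate(rules):
        for branch in pattern.split("|"):   # naive: assumes no '|' inside predicates
            if matches(branch.strip()):
                prio = default_priority(branch) if priority is None else priority
                candidates.append((precedence, prio, order, pattern))
    if not candidates:
        return None       # no rule matches: the built-in rules would apply
    # Higher import precedence wins, then higher priority; remaining ties go to
    # the rule that occurs last in the stylesheet (the XSLT 1.0 recovery choice).
    return max(candidates)[3]
```

For example, if the current node is a title inside an article and the patterns `*`, `title`, and `article/title` all match it, `pick_rule` returns `article/title` (priority 0.5 beats 0 and -0.5).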


Other Common Operations

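Iteration: <xsl:for-each select="path"></xsl:for-each>

Conditionals: <xsl:if test="./text() &lt; 'abc'"></xsl:if>
(In XPath 1.0, the < operator converts both operands to numbers, so this particular string test is always false; XPath 2.0 compares the strings.)

Copying the current node to the result tree (xsl:copy is a shallow copy, so the children come from the nested apply-templates):

<xsl:copy>
  <xsl:apply-templates />
</xsl:copy>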


Creating Output Nodes

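Return text/attribute data (this is a built-in default rule):

<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

Create an element whose name comes from the text (attributes are similar, with xsl:attribute); the curly braces make the name an attribute value template, so the expression is evaluated:

<xsl:element name="{text()}">
  <xsl:apply-templates/>
</xsl:element>

Copy nodes matching a path (a deep copy): <xsl:copy-of select="*"/>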


Embedding Stylesheets

You can “import” or “include” one stylesheet from another:

<xsl:import href="http://www.com/my.xsl"/>
<xsl:include href="http://www.com/my.xsl"/>

“Include”: the included rules get the same precedence as those in the including stylesheet.

“Import”: the imported rules are given lower precedence.


XSLT Summary

A very powerful, template-based language for transforming an XML document into another structured document. Commonly used to convert XML to PDF, SVG, GraphViz DOT format, HTML, WML, …

Primarily useful for presentation of XML or for very simple conversions.

But sometimes we need more complex operations when converting data from one source to another: joins (combining and correlating information from multiple sources) and aggregation (computing averages, counts, etc.).
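For contrast, here is what a join and an aggregation over two XML sources look like when written by hand in Python with ElementTree: a hash join on author name, then a count per school. Both documents and every name and value in them are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two hypothetical sources: a publication list and a directory of people.
PAPERS = """<dblp>
  <article><author>A. Smith</author><title>Paper 1</title></article>
  <article><author>B. Jones</author><title>Paper 2</title></article>
  <article><author>A. Smith</author><title>Paper 3</title></article>
</dblp>"""
PEOPLE = """<people>
  <person><name>A. Smith</name><school>School X</school></person>
  <person><name>B. Jones</name><school>School Y</school></person>
</people>"""

papers = ET.fromstring(PAPERS)
people = ET.fromstring(PEOPLE)

# Join: index one source by the join key, then probe it from the other (a hash join).
school_of = {p.findtext("name"): p.findtext("school") for p in people.findall("person")}
joined = [(a.findtext("title"), school_of.get(a.findtext("author")))
          for a in papers.findall("article")]

# Aggregation: count papers per school.
papers_per_school = Counter(school for _, school in joined)
print(joined)
print(papers_per_school)   # Counter({'School X': 2, 'School Y': 1})
```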


XSLT and Alternatives

XSLT is focused on reformatting documents. Stylesheets are focused around one XML file, and the XML file must reference the stylesheet.

What if we want to: manage and combine collections of XML documents? Make Web service requests for XML? “Glue together” different Web service requests? Query for keywords within documents, with ranked answers?

This is where XQuery plays a role; see CIS 330 / 550 for details.

