CSI 3125, XML, page 1
An introduction to XML and friends
based on these sources and inspirations:
Erik T. Ray, "Learning XML", 1st ed., O’Reilly
Robert Eckstein with Michel Casabianca, "XML Pocket Reference", 2nd ed., O’Reilly
the World-Wide Web—thousands of places
Go O'Reilly!
CSI 3125, XML, page 2
GML and SGML
• The story begins with GML—Generalized Markup Language, invented in 1969 by Goldfarb, Mosher and Lorie from IBM as a means of allowing the text editing, formatting, and information retrieval subsystems to share documents.
• In 1978-1986 GML was substantially enlarged and then standardized by ANSI and ISO as SGML — Standard Generalized Markup Language.
• http://www.sgmlsource.com/history/sgmlhist.htm
CSI 3125, XML, page 3
Markup in SGML
Procedural markup: the parts of a sample page (the heading "Section One", the body text, the page number 1) are labelled with formatting instructions: 16 pt. Helvetica Bold, 12 pt. Helvetica, 12 pt. Times Italic, 10 pt. Palatino, 12 pt. Courier.
Descriptive markup: the parts of the same page are labelled with their roles: chapter head, section head, lead paragraph, paragraph, page number.
CSI 3125, XML, page 4
HTML
• SGML is not a single markup language. It is a standard for creating markup languages.
• HTML (HyperText Markup Language) is one such language. It began modestly. In 1990, Tim Berners-Lee based his first browsing and authoring system for the Web on a handful of markup tags. The present, hugely enlarged, version is HTML 4.01, still a single language.
• HTML standards are maintained by W3C (the World-Wide Web Consortium):
– http://www.w3.org/MarkUp/
– http://www.w3.org/TR/html401/ (already 4 years old...)
CSI 3125, XML, page 5
XML
• XML (eXtensible Markup Language) was developed in 1996, and standardized by W3C in 1998. It is a subset of SGML.
• XML is one part of a large (and growing) family of interconnected, cooperating languages: DTD, XSL, XSLT, CSS, XPath, XPointer, XLink, XML Schema... and that's only the beginning.
• [But what would a programmer's life be without acronyms?]
• http://www.w3.org/XML/
CSI 3125, XML, page 6
XML in 10 points
1. XML is for structuring data
2. XML looks a bit like HTML
3. XML is text, but isn't meant to be read
4. XML is verbose by design
5. XML is a family of technologies
6. XML is new, but not that new
7. XML leads HTML to XHTML
8. XML is modular
9. XML is the basis for RDF and the Semantic Web
10. XML is license-free, platform-independent and well-supported
http://www.w3.org/XML/1999/XML-in-10-points
CSI 3125, XML, page 7
XHTML
• The emergence of XML has prompted a rethinking of HTML. The new "best thing since sliced bread" is XHTML 1.0: The eXtensible HyperText Markup Language, A Reformulation of HTML 4 in XML 1.0.
• The most recent recommendation:
• August 1, 2002 : XHTML 1.0, The Extensible HyperText Markup Language (Second Edition).
– http://www.w3.org/TR/xhtml1
CSI 3125, XML, page 8
But what is HTML?
CSI 3125, XML, page 9
The structure of HTML documents
<html><head>
head elements
</head><body>
body elements
</body></html>
A quick checklist
• text, images, multimedia
• resource identifiers, URLs
• element placement
• fonts, colours
• paragraphs, divisors
• tables
• forms
• scripts, applets
• frames
Optional: document type, title, content descriptors, ...
CSI 3125, XML, page 10
Links and hot links in HTML
The power of HTML lies, naturally, in HyperText links. A click on a link is a request for some content: a string, an image, a complete document, a location in a document.
<a href="http://www.google.com/"><img src="gifs/Logo_25.gif" border="0"></a>
There are two links here: an anchor (pointing to a Web location) and an image address (pointing to a local file).
XML generalizes links—see later in this presentation.
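The two links in the example above can be picked out mechanically. The following is a minimal sketch in Python, assuming only the standard library's html.parser module; the class name LinkCollector is made up for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of <a href=...> and <img src=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (tag, attribute, value) triples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased
        for name, value in attrs:
            if (tag, name) in (("a", "href"), ("img", "src")):
                self.links.append((tag, name, value))


snippet = '<a href="http://www.google.com/"><img src="gifs/Logo_25.gif" border="0"></a>'
collector = LinkCollector()
collector.feed(snippet)
collector.close()
for tag, name, value in collector.links:
    print(f"{tag} {name} -> {value}")
```

The anchor's href is an absolute Web location, while the image's src is a relative address that a browser resolves against the page's own location.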
CSI 3125, XML, page 11
HTML up-close (1)
<base href="http://www.site.uottawa.ca/~szpak/teaching/3125/">
<html>
<head>
<title>CSI 3125, Fall 2002: Concepts of Programming Languages</title>
</head>
<body background="gifs/3125.gif" bgcolor=#eeeeee text=#000066 link=#0066ff vlink=#9900cc alink=#ff0000>
<TABLE BORDER=0 CELLSPACING=10 CELLPADDING=0 WIDTH=384>
<TR><TD VALIGN=TOP ALIGN=LEFT WIDTH="80%">
<p align=right><script src="Date.js"></script> <p>
Callout on the script element: Javascript
CSI 3125, XML, page 12
HTML up-close (2)
<center>
<font size=+2>Welcome to the <font color="#AA3322">CSI 3125</font> Web site!</font><br>
<hr width=324 size="3">
<TABLE BORDER=0 CELLSPACING=10 CELLPADDING=0 WIDTH=352>
<TR><a href="news.html"><img src="gifs/news.gif" width=136 height=34 border=0 alt="[What's new?]"></a></TR>
Callout on the table: seven buttons start here
CSI 3125, XML, page 13
HTML up-close (3)
<TR>
<TD VALIGN=CENTER ALIGN=CENTER><a href="syl3125_ToC.html"><img src="gifs/syllabus.gif" width=136 height=34 border=0 alt="[The syllabus]"></a></TD>
<TD VALIGN=CENTER ALIGN=CENTER><a href="handouts/"><img src="gifs/handouts.gif" width=136 height=34 border=0 alt="[The handouts]"></a></TD>
</TR>
</TABLE>
and two other rows of buttons
CSI 3125, XML, page 14
HTML up-close (4)
<hr width=324>
<p>The instructor's email address:
<p><a href="mailto:[email protected]">[email protected]</a>
<img src="gifs/rtarrow.gif" align=bottom border=0 alt="To ">
<a href="http://www.site.uottawa.ca/~szpak/"><img src="gifs/home.gif" border=0 align=bottom alt="my home page"></a>
<p><hr width=324 size="3">
<font size=-1>Updated on August 6, 2002</font>
<hr width=324>
CSI 3125, XML, page 15
HTML up-close (5)
<form method=get action="http://www.google.com/search">
<table bgcolor="#dddddd">
<tr><td>
<a href="http://www.google.com/"><img src="gifs/Logo_25.gif" border="0" alt="google"></a>
<input type=text name=q size=25 maxlength=256 value="">
<input type=submit name=sa value="Go">
</td></tr>
</table>
</form>
CSI 3125, XML, page 16
HTML up-close (6)
<br><img src="gifs/macspin.gif" width=176 height=40 alt="[A Spinning Apple]">
</center>
</TD></TR></TABLE>
</body>
</html>
Callout on the image: not in the picture on page 8
CSI 3125, XML, page 17
Back to XML...
• HTML mixes in one language two aspects of SGML: descriptive markup of a document (its structure) and procedural markup (its presentation). For example, <head> and <p> are elements of structure, while <font> and <i> describe formatting.
• In the XML world, the two aspects have been separated again. A DTD (Document Type Definition) defines the markup language, and a valid XML document must conform to its DTD. DTDs have been around since the beginning of SGML. XML Schema is a newer alternative, standardized in May 2001 (version 1.0).
CSI 3125, XML, page 18
A tiny XML document...
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cd.xsl"?>
<!DOCTYPE cd SYSTEM "cd.dtd">
<cd type="single">
  <title>Revolver, top two</title>
  <band>The Beatles</band>
  <track> <song>Eleanor Rigby</song> <time>2:45</time> </track>
  <track> <song author="Paul and John"> For No One </song> </track>
</cd>
cd.xml
Callouts: formatting (discussed later); structure
CSI 3125, XML, page 19
... its DTD...
<!-- Compact Disk: DTD -->
<!ELEMENT cd (title, band, track+)>
<!ATTLIST cd type (single | regular) #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT band (#PCDATA)>
<!ELEMENT track (song, time?)>
<!ELEMENT song (#PCDATA)>
<!ELEMENT time (#PCDATA)>
<!ATTLIST song author CDATA "Paul">
cd.dtd
CSI 3125, XML, page 20
... and its validation
Three easy steps to validate an XML document:
– Connect (ssh) to the Linux machine site2.
– Have the document and its DTD in the same directory.
– Invoke the XML validator xmllint.
szpak|site2-1: ls cd.*
cd.dtd cd.xml
szpak|site2-2: xmllint --valid --noout cd.xml
szpak|site2-3:
(In this course, you will get to validate a few XML documents.)
CSI 3125, XML, page 21
Elements and attributes
Empty elements
<name attr1 = "val1" attr2 = "val2" ... />
<price amount="11.98" />
Container elements
<name attr1 = "val1" attr2 = "val2" ... >
content
</name>
<song author="Paul and John">
For No One
</song>
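To see how a parser reports the two kinds of element, here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module. The two elements are the examples from this page; the song content is written on one line.

```python
import xml.etree.ElementTree as ET

# An empty element: attributes only, no content
price = ET.fromstring('<price amount="11.98" />')

# A container element: attributes plus content between the tags
song = ET.fromstring('<song author="Paul and John"> For No One </song>')

print(price.tag, price.attrib, price.text)
print(song.tag, song.attrib, repr(song.text))
```

The empty element has no text at all (None), while the container element keeps its content, including the surrounding spaces.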
CSI 3125, XML, page 22
Snippets of DTD syntax
<!ELEMENT cd (title, band, track+)>
Elements of cd in this order, with one or more track.
<!ELEMENT title (#PCDATA)>
Parsed-character data: entity references resolved.
<!ELEMENT track (song, time?)>
time is optional.
<!ATTLIST cd type (single | regular) #REQUIRED>
One of these two values must be present.
<!ATTLIST song author CDATA "Paul">
Character data, default is Paul.
<!ATTLIST song lyricist NMTOKEN #IMPLIED>
An identifier (more or less), optional, no default.
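A content model such as (title, band, track+) behaves much like a regular expression over the sequence of child element names. The sketch below illustrates that idea in Python; it is not a DTD validator. The MODELS table is translated by hand from the declarations above (an assumption of this sketch, nothing is read from cd.dtd), and children_match is a made-up helper name.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (an assumption of this sketch).
# Each child name is followed by a comma so that names cannot run together.
MODELS = {
    "cd": r"title,band,(track,)+",   # (title, band, track+)
    "track": r"song,(time,)?",       # (song, time?)
}


def children_match(element):
    """Return True if the element's children fit its content model."""
    model = MODELS.get(element.tag)
    if model is None:  # no model recorded for this element: nothing to check
        return True
    sequence = "".join(child.tag + "," for child in element)
    return re.fullmatch(model, sequence) is not None


good = ET.fromstring(
    "<cd><title>t</title><band>b</band>"
    "<track><song>s</song><time>1:00</time></track>"
    "<track><song>s</song></track></cd>"
)
bad = ET.fromstring("<cd><band>b</band><title>t</title></cd>")  # wrong order, no track

print(all(children_match(e) for e in good.iter()))
print(all(children_match(e) for e in bad.iter()))
```

A real validator such as xmllint also checks attributes, #PCDATA content and entity declarations, which this sketch ignores.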
CSI 3125, XML, page 23
XML is stricter than HTML
• XML is case-sensitive.
• Attribute values must be in quotation marks.
• A container (non-empty) element must have an opening and closing tag.
• An empty element must have a final slash.
• Tags must be nested correctly (see next page).
• Whitespace in element content is preserved.
You can think of an XML document as an HTML document with customized tags (and much more — you will soon see a little of that), but keep in mind that XML is a lot more picky than HTML.
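The rules above can be tried out with any XML parser. Here is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "quoted attribute": '<cd type="single"></cd>',
    "unquoted attribute": "<cd type=single></cd>",
    "case mismatch in tags": "<cd></CD>",
    "missing closing tag": "<cd><title>Revolver</cd>",
    "empty element with slash": "<cd><hr /></cd>",
    "empty element without slash": "<cd><hr></cd>",
    "crossed nesting": "<cd><b><i>text</b></i></cd>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and the fifth sample are accepted. A browser would display HTML with any of the other habits without complaint.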
CSI 3125, XML, page 24
A document is a tree
<cd type="single">
  <title>Revolver, top two</title>
  <band>The Beatles</band>
  <track>
    <song>Eleanor Rigby</song>
    <time>2:45</time>
  </track>
  <track>
    <song author="Paul and John"> For No One </song>
  </track>
</cd>
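The tree can be made visible by walking the parsed document. This is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree module; the function name outline is made up for this illustration.

```python
import xml.etree.ElementTree as ET

CD_XML = """<cd type="single">
  <title>Revolver, top two</title>
  <band>The Beatles</band>
  <track> <song>Eleanor Rigby</song> <time>2:45</time> </track>
  <track> <song author="Paul and John"> For No One </song> </track>
</cd>"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (f": {text}" if text else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(CD_XML)
print("\n".join(outline(root)))
```

The root cd has four children (title, band and two track elements), and each track is itself the root of a small subtree.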
CSI 3125, XML, page 25
Entities in HTML
HTML allows us to refer to hard-to-type characters using the & convention. Examples:
&nbsp; non-breaking space
&euml; ë
a a
&agrave; à
&eacute; é
This is significantly extended in XML. An entity is a "placeholder for content"; it can denote anything, even a fragment of markup. Entities are resolved, or replaced, quite like macros.
CSI 3125, XML, page 26
Entities in XML (1)
We have general entities (either defined locally, or external and publicly available) and parameter entities used in DTDs. We also have predefined character entities, among them five necessary to avoid confusion with markup syntax:
&amp; &
&apos; '
&gt; >
&lt; <
&quot; "
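The five predefined entities matter whenever text is placed into a document by a program. Here is a minimal sketch in Python, assuming the standard library's xml.sax.saxutils and xml.etree.ElementTree modules; the sample sentence and the element name note are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "1 < 2"'

# escape() always handles &, < and >; extra characters are passed as a mapping
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the references back into the characters they stand for
element = ET.fromstring(f"<note>{escaped}</note>")
print(element.text)
```

Without the escaping step, the & and < in the sentence would be read as the start of markup and the document would not parse.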
CSI 3125, XML, page 27
Entities in XML (2)
<!-- Compact Disk: DTD -->
<!ENTITY % basiccontent "(#PCDATA)">
<!ELEMENT cd (title, band, track+)>
<!ATTLIST cd type (single | regular) #REQUIRED>
<!ELEMENT title %basiccontent;>
<!ELEMENT band %basiccontent;>
<!ELEMENT track (song, time?)>
<!ELEMENT song %basiccontent;>
<!ELEMENT time %basiccontent;>
<!ATTLIST song author CDATA "Paul">
cd3.dtd
Callouts: parameter entity (the ENTITY declaration); references (each %basiccontent;)
CSI 3125, XML, page 28
Entities in XML (3)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cd3.xsl"?>
<!DOCTYPE cd SYSTEM "cd3.dtd" [
  <!ENTITY favourite "For No One">
  <!ENTITY bestever "The Beatles">
]>
<cd type="single">
  <title>Revolver, top two</title>
  <band>&bestever;</band>
  <track> <song>Eleanor Rigby</song> <time>2:45</time> </track>
  <track> <song author="Paul and John"> &favourite; </song> </track>
</cd>
cd3.xml
Callouts: local entities (the two ENTITY declarations); references (&bestever; and &favourite;)
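A parser replaces locally defined entities while it reads the document. The sketch below shows this in Python, assuming the standard library's xml.etree.ElementTree module. To keep it self-contained, the reference to the external file cd3.dtd is left out, so only the local entity declarations are present.

```python
import xml.etree.ElementTree as ET

# Same local entities as in cd3.xml, but without SYSTEM "cd3.dtd"
# (a simplification made for this sketch).
DOC = """<?xml version="1.0"?>
<!DOCTYPE cd [
  <!ENTITY favourite "For No One">
  <!ENTITY bestever "The Beatles">
]>
<cd type="single">
  <band>&bestever;</band>
  <track> <song author="Paul and John"> &favourite; </song> </track>
</cd>"""

root = ET.fromstring(DOC)
print(root.findtext("band"))
print(root.findtext("track/song").strip())
```

By the time the program sees the tree, the references are gone: only the replacement text remains.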
CSI 3125, XML, page 29
The stylesheet philosophy
• The presentation aspect of XML documents is handled in an elegant, general manner: by structure transformation. A stylesheet defines templates that transform elements of a valid XML document into another structure, for example HTML.
• Access to elements is made easy by XPath, a rich language that allows us to move around a document and apply a variety of conditions. (We will see no XPath here: there is no time.)
CSI 3125, XML, page 30
A simplistic stylesheet (1)
<?xml version="1.0"?>
<xsl:stylesheet id="cds" version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="cd">
  <html>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
cd.xsl
Callout on the template: <cd type="single">......</cd> becomes <html> <body> ------ </body></html>
Callout on the first line: This is an XML document!
CSI 3125, XML, page 31
A simplistic stylesheet (2)
<xsl:template match="title">
  <h3><xsl:apply-templates/></h3>
  <br /><br />
</xsl:template>
<xsl:template match="band">
  <h4><xsl:apply-templates/></h4>
  <hr />
</xsl:template>
<xsl:template match="track">
  <p><xsl:apply-templates/></p>
</xsl:template>
cd.xsl
Callout on the band template: <band>....</band> becomes <h4>----</h4><hr />
CSI 3125, XML, page 32
A simplistic stylesheet (3)
<xsl:template match="track">
  <p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="song">
  <b><xsl:apply-templates/></b>
  <br />
</xsl:template>
<xsl:template match="time">
  <i><xsl:apply-templates/></i>
  <br />
</xsl:template>
</xsl:stylesheet>
cd.xsl
Callout on the song template: <song>...</song> becomes <b>---</b><br />
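The recursion behind xsl:apply-templates can be imitated in a few lines of ordinary code. The sketch below is plain Python, not XSLT: the TEMPLATES table is a hand-made counterpart of the six templates in the stylesheet, the function apply_templates is a made-up name, and whitespace is stripped for readability, which a real XSLT processor would not do.

```python
import xml.etree.ElementTree as ET

# One "template" per element name: a format string with a slot for the
# result of processing the element's content (the apply-templates step).
TEMPLATES = {
    "cd": "<html><body>{}</body></html>",
    "title": "<h3>{}</h3><br><br>",
    "band": "<h4>{}</h4><hr>",
    "track": "<p>{}</p>",
    "song": "<b>{}</b><br>",
    "time": "<i>{}</i><br>",
}


def apply_templates(element):
    """Transform an element by recursively transforming its content."""
    parts = [(element.text or "").strip()]
    for child in element:
        parts.append(apply_templates(child))
        parts.append((child.tail or "").strip())
    content = "".join(parts)
    # Elements without a template just pass their content through
    return TEMPLATES.get(element.tag, "{}").format(content)


cd = ET.fromstring(
    "<cd><title>Revolver, top two</title><band>The Beatles</band>"
    "<track><song>Eleanor Rigby</song><time>2:45</time></track></cd>"
)
print(apply_templates(cd))
```

Each template only says what to wrap around an element's content; the traversal itself is shared, which is why the stylesheet stays short.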
CSI 3125, XML, page 33
Links in XML, on one example
<elementname xlink:type = "simple"
             xlink:href = "target"
             xlink:show = "showhow"
             xlink:actuate = "showwhen">
some content
</elementname>
(The start tag ends with /> if the element is empty.)
target is any local or public resource.
showhow is new, embed or replace. (In HTML: open a new window, embed a graphic, follow a link in the same window.)
showwhen is onLoad or onRequest.
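The xlink: prefix must be bound to the XLink namespace, http://www.w3.org/1999/xlink, before a parser will accept these attributes. The sketch below reads them back in Python, assuming the standard library's xml.etree.ElementTree module; the element name cover and the target covers/revolver.gif are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # the XLink namespace URI

# "cover" and "covers/revolver.gif" are hypothetical names for illustration.
doc = ET.fromstring(
    f'<cover xmlns:xlink="{XLINK}" xlink:type="simple" '
    'xlink:href="covers/revolver.gif" xlink:show="embed" '
    'xlink:actuate="onLoad" />'
)

# ElementTree stores namespaced attributes under "{namespace-uri}local-name"
for local_name in ("type", "href", "show", "actuate"):
    print(local_name, "=", doc.get(f"{{{XLINK}}}{local_name}"))
```

This combination of embed and onLoad corresponds roughly to an HTML img element: the resource is pulled into the page as soon as the page is loaded.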
CSI 3125, XML, page 34
A larger example: checkbook
This example, due to Erik T. Ray ("Learning XML"), can be found on the course Web site:
checkbook.dtd
checkbook.xml
checkbook.xsl
Visit
http://www.site.uottawa.ca/~szpak/teaching/3125/handouts/other/perl_xml.html
and enjoy as best you can.
And now...
CSI 3125, XML, page 35
In another course
• XML Schema (much more control of the form of data than DTDs)
• CSS (Cascading Style Sheets—old but useful)
• XSL—lots of details
• Namespaces
• XSLT (Extensible Stylesheet Language Transformations)
• XPath (locating objects in documents)
• XLink, XPointer (links between documents)
• XML tools (parsers, syntax checkers, validators, tree processors, etc.)
• Standards, public documents, Web resources
CSI 3125, XML, page 36
XML validation
Three easy steps to validate an XML document:
• ssh to the Linux machine site2.
• Have the document and its DTD in the same directory.
• Invoke the XML validator xmlvalid.
% ls cd.*
cd.dtd cd.xml
% xmlvalid cd.xml
cd.xml is valid
(there will be error messages if not)
XML/XSL tools in Linux
CSI 3125, XML, page 37
XML parsing
Another XML tool in Linux:
• ssh to the Linux machine site2.
• Have the document and its DTD in the same directory.
• Invoke the XML processor xmllint.
% ls cd.*
cd.dtd cd.xml
% xmllint cd.xml
(there will be error messages if the document is not well-formed; add --valid, as on page 20, to check it against the DTD)
xmllint has many options. To find out about them, type:
% xmllint
For even more, type:
% man xmllint
CSI 3125, XML, page 38
XML statistics
You can also see some simple facts about a valid XML document. The Perl program dbstat, posted on the course Web site, does it for us. Remember to make it executable.
% dbstat cd.xml
Node frequency:
  2 PI nodes
  8 element nodes
  0 comment nodes
  2 attribute nodes
  19 text nodes
  0 CDMS nodes
  32 total nodes
Element frequency:
  1 <band>
  1 <cd>
  2 <song>
  1 <time>
  1 <title>
  2 <track>
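The source of dbstat is not shown here. As a rough stand-in, the element and attribute counts can be reproduced with a short Python sketch, assuming the standard library's xml.etree.ElementTree and collections modules; it does not count PI, comment or text nodes.

```python
from collections import Counter
import xml.etree.ElementTree as ET

CD_XML = """<cd type="single">
  <title>Revolver, top two</title>
  <band>The Beatles</band>
  <track> <song>Eleanor Rigby</song> <time>2:45</time> </track>
  <track> <song author="Paul and John"> For No One </song> </track>
</cd>"""

root = ET.fromstring(CD_XML)

# root.iter() visits the root and every element below it
element_frequency = Counter(element.tag for element in root.iter())
attribute_count = sum(len(element.attrib) for element in root.iter())

for tag in sorted(element_frequency):
    print(f"{element_frequency[tag]} <{tag}>")
print(f"{sum(element_frequency.values())} element nodes")
print(f"{attribute_count} attribute nodes")
```

The counts agree with the element and attribute lines of the dbstat report for cd.xml.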
CSI 3125, XML, page 39
XSL processing
Then there is an XSL processor xsltproc, also with many options. To see them, type:
% xsltproc
For even more, type:
% man xsltproc
You can parse (and validate) an XSL file:
% xsltproc cd.xsl
(there will be error messages if not valid)
CSI 3125, XML, page 40
XSL processing
You can also run the XSL processor on an XML file, according to a stylesheet, and get the result, for example an HTML file.
% xsltproc cd.xml
<html><body>
<h3>Revolver, top two</h3><br><br>
<h4>The Beatles</h4><hr>
<p> <b>Eleanor Rigby</b><br> <i>2:45</i><br> </p>
<p> <b> For No One </b><br> </p>
</body></html>