Stein XML 2.1
XML
a first course
Part 2
Yaakov J. Stein
Chief Scientist, RAD Data Communications
Stein XML 2.2
Course Objectives
XML what and why?
Well-formed XML
– Displaying XML in IE
Valid XML and DTDs
Parsing XML using JavaScript
Processing XML using XSL
Stein XML 2.3
XML
Parsing XML
using
JavaScript
Stein XML 2.4
XML Parsers
All XML parsers MUST check for well-formed input
Some XML parsers are validating, others nonvalidating
There are two XML parser “philosophies”
Event driven parsers (SAX)
– Fast and small memory footprint
– Output parsing results on-the-fly
– Application must store information it needs
– Can use stack to track hierarchy
Tree parsers (DOM)
– Slow and large memory footprint
– Build full tree first, then user can traverse tree
– Exploit “Object Oriented” languages
Stein XML 2.5
SAX
Simple API for XML (present version SAX 2.0)
Not developed by W3C BUT de-facto standard
Versions for Java (Apache Xerces parser), C++, VB, Python, Perl
Some ContentHandler methods (callbacks)
  void setDocumentLocator(Locator locator)           supplies application with event location
  void startDocument() throws SAXException           receive notification of XML beginning
  void endDocument() throws SAXException             receive notification of XML end
  void startElement(…) throws SAXException           receive notification of element start tag
  void endElement(…) throws SAXException             receive notification of element end tag
  void characters(…) throws SAXException             receive notification of text
  void ignorableWhitespace(…) throws SAXException    receive notification of space
Example
  <quote> to be <bold> or </bold> not to be</quote>
generates the events
  startElement quote
  characters “to be”
  startElement bold
  characters “or”
  endElement bold
  characters “not to be”
  endElement quote
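The event sequence above can be reproduced with a toy event-driven parser. This is only a minimal sketch: parseEvents and its handler object are hypothetical names, it handles plain tags and text only (no attributes, comments, CDATA or entities), and unlike a real SAX parser it trims whitespace before reporting text.

```javascript
// Toy event-driven parser: scans the input once and fires callbacks,
// never building a tree (the SAX "philosophy").
function parseEvents(xml, handler) {
  // end tag | start tag | run of text
  const token = /<\/([^>]+)>|<([^>\/]+)>|([^<]+)/g;
  let m;
  while ((m = token.exec(xml)) !== null) {
    if (m[1] !== undefined) {
      handler.endElement(m[1].trim());
    } else if (m[2] !== undefined) {
      handler.startElement(m[2].trim());
    } else if (m[3].trim() !== "") {
      handler.characters(m[3].trim());
    }
  }
}

// The application decides what to keep; here we just log every event.
const events = [];
parseEvents("<quote> to be <bold> or </bold> not to be</quote>", {
  startElement: (name) => events.push("startElement " + name),
  endElement: (name) => events.push("endElement " + name),
  characters: (text) => events.push('characters "' + text + '"'),
});
console.log(events.join("\n"));
```

Note that nothing is stored by the parser itself: an application that needs the hierarchy would push names on a stack in startElement and pop them in endElement.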
Stein XML 2.6
Document Object Model
DOM - API that provides access to XML/HTML document structure
- Enables reading, deleting, changing, adding elements/attributes
There is a good match between
XML and tree hierarchy and object oriented programming
<vehicles>
  <airplanes/>
  <motor_vehicles>
    <trucks/>
    <cars/>
  </motor_vehicles>
  <bicycles/>
</vehicles>
As objects:
  vehicles
  vehicles.airplanes
  vehicles.motor_vehicles
  vehicles.motor_vehicles.trucks
  vehicles.motor_vehicles.cars
  vehicles.bicycles
As a tree:
  vehicles
    airplanes
    motor_vehicles
      trucks
      cars
    bicycles
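The match between the tree and the dotted object names can be illustrated with a short sketch. It assumes the vehicles document has already been turned into plain nested objects (a stand-in, not a real DOM); the paths function is a hypothetical helper.

```javascript
// Plain-object stand-in for the parsed <vehicles> document.
const vehicles = {
  name: "vehicles",
  children: [
    { name: "airplanes", children: [] },
    {
      name: "motor_vehicles",
      children: [
        { name: "trucks", children: [] },
        { name: "cars", children: [] },
      ],
    },
    { name: "bicycles", children: [] },
  ],
};

// Walk the tree depth-first and build the dotted name of every node.
function paths(node, prefix) {
  const here = prefix ? prefix + "." + node.name : node.name;
  let result = [here];
  for (const child of node.children) {
    result = result.concat(paths(child, here));
  }
  return result;
}

const allPaths = paths(vehicles, "");
console.log(allPaths.join("\n"));
```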
Stein XML 2.7
Nodes
The basic unit in the DOM tree is the Node object
Nodes that are not null also implement more specialized interfaces
Node properties
  nodeName (readonly String)
  nodeType (readonly unsigned short)
  nodeValue (String)
  attributes (readonly NamedNodeMap)
  parentNode (readonly Node)
  childNodes (readonly NodeList)
  firstChild (readonly Node)
  lastChild (readonly Node)
  previousSibling (readonly Node)
  nextSibling (readonly Node)
  ownerDocument (readonly Document)

  prefix (String)
  localName (readonly String)
  namespaceURI (readonly String)
Node methods
  boolean hasChildNodes()
  Node cloneNode(…)
  Node appendChild(…)
  Node removeChild(…)
  Node replaceChild(…)
  Node insertBefore(…)
  void normalize()

  boolean hasAttributes()
  boolean isSupported(…)
Stein XML 2.8
Node Types
The W3C DOM defines the following types (as constants in the Node object, but IE doesn’t implement them)
      constant’s name                nodeName             nodeValue    data type
   1. ELEMENT_NODE                   tag’s name           null         Element
   2. ATTRIBUTE_NODE                 attribute’s name     value        Attr
   3. TEXT_NODE                      #text                text         Text
   4. CDATA_SECTION_NODE             #cdata-section       text         CDATASection
   5. ENTITY_REFERENCE_NODE          referenced name      null         EntityReference
   6. ENTITY_NODE                    entity’s name        null         Entity
   7. PROCESSING_INSTRUCTION_NODE    PI’s target          rest of PI   ProcessingInstruction
   8. COMMENT_NODE                   #comment             text         Comment
   9. DOCUMENT_NODE                  #document            null         Document
  10. DOCUMENT_TYPE_NODE             dtd name             null         DocumentType
  11. DOCUMENT_FRAGMENT_NODE         #document-fragment   null         DocumentFragment
  12. NOTATION_NODE                  notation’s name      null         Notation
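Since the named constants may be missing, a script can carry its own lookup table. A minimal sketch; NODE_TYPE_NAMES and describeNodeType are hypothetical names, and the numbers are the W3C values listed above.

```javascript
// Our own copy of the W3C nodeType numbering, for environments
// where Node.ELEMENT_NODE etc. are not available.
const NODE_TYPE_NAMES = {
  1: "ELEMENT_NODE",
  2: "ATTRIBUTE_NODE",
  3: "TEXT_NODE",
  4: "CDATA_SECTION_NODE",
  5: "ENTITY_REFERENCE_NODE",
  6: "ENTITY_NODE",
  7: "PROCESSING_INSTRUCTION_NODE",
  8: "COMMENT_NODE",
  9: "DOCUMENT_NODE",
  10: "DOCUMENT_TYPE_NODE",
  11: "DOCUMENT_FRAGMENT_NODE",
  12: "NOTATION_NODE",
};

// Translate a numeric nodeType into a readable name.
function describeNodeType(nodeType) {
  return NODE_TYPE_NAMES[nodeType] || "unknown (" + nodeType + ")";
}

console.log(describeNodeType(7));
console.log(describeNodeType(1));
```

A helper like this makes printouts of node.nodeType easier to read when walking a document.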
Stein XML 2.9
Elements
Element nodes have the following properties and methods (for full list see W3C site)
Property
  tagName (readonly String)
Methods
  boolean hasAttribute(String name)
  String getAttribute(String name)
  void setAttribute(String name, String value)
  Attr getAttributeNode(String name)
  Attr setAttributeNode(Attr newAttr)
  void removeAttribute(String name)
  Attr removeAttributeNode(Attr oldAttr)
  NodeList getElementsByTagName(String name)
Stein XML 2.10
Attributes
Attr nodes have the following properties (no methods)
Properties
  name (readonly String)
  ownerElement (readonly Element)
  specified (readonly boolean)
  value (String)
Stein XML 2.11
NodeList and NamedNodeMap
NodeList is an array of nodes (e.g. Node.childNodes)
Property
  length (readonly unsigned long)
Method
  Node item(unsigned long index)        nl.item(k) is the same as nl[k]
NamedNodeMap is a collection of Nodes indexed by names (e.g. Node.attributes)
Property
  length (readonly unsigned long)
Methods
  Node item(unsigned long index)
  Node getNamedItem(name)
  Node setNamedItem(…)
  Node removeNamedItem(name)
Stein XML 2.12
Character Data
CharacterData is the parent interface of Text and Comment nodes
Text is the parent interface of CDATASection nodes
Properties
  data (String)
  length (readonly unsigned long)
Methods
  appendData()
  deleteData()
  insertData()
  replaceData()
  substringData()
Inheritance
  Node
    CharacterData
      Text
        CDATASection
      Comment
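What the five CharacterData methods do can be sketched on an ordinary string. CharData is a hypothetical stand-in class, not the DOM interface itself; the argument order follows the W3C definitions (offset, count, string), and a real DOM would raise an exception for an out-of-range offset, which this sketch does not.

```javascript
// Stand-in for a CharacterData node: all edits operate on this.data.
class CharData {
  constructor(data) {
    this.data = data;
  }
  get length() {
    return this.data.length;
  }
  substringData(offset, count) {
    return this.data.slice(offset, offset + count);
  }
  appendData(arg) {
    this.data += arg;
  }
  insertData(offset, arg) {
    this.data = this.data.slice(0, offset) + arg + this.data.slice(offset);
  }
  deleteData(offset, count) {
    this.data = this.data.slice(0, offset) + this.data.slice(offset + count);
  }
  replaceData(offset, count, arg) {
    this.deleteData(offset, count);
    this.insertData(offset, arg);
  }
}

const text = new CharData("Hello world!");
text.replaceData(6, 5, "XML"); // swap "world" for "XML"
text.appendData(" Bye.");
console.log(text.data, text.length);
```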
Stein XML 2.13
Document
Document nodes are needed to start everything
Properties
  documentElement (readonly Element)     root element of xml
  doctype (readonly DocumentType)        dtd
Methods
  Element createElement(name)
  Attr createAttribute(name)
  Text createTextNode(…)
  Comment createComment(…)
  createEntityReference(…)
  createCDATASection(…)
  createProcessingInstruction(…)
  createDocumentFragment(…)

  Element getElementById(id)
  NodeList getElementsByTagName(name)

  createNodeIterator(…)
  createTreeWalker(…)
Stein XML 2.14
Parsing with JavaScript
There are DOM interfaces for many (object oriented) languages
– Java
– JavaScript, ECMAScript, JScript
– C++
– VBScript
It is easier to use a scripting language
– Many required features are pre-programmed
– Interpreted, not compiled
– Platform independent
JavaScript runs only inside a browser
JavaScript is easier than Java which is easier than C++ (kids use it!)
JavaScript is FUN (kids use it!)
Stein XML 2.15
How to use JavaScript
Use JavaScript by placing script tags in HTML document
<SCRIPT LANGUAGE="javascript">
  internal javascript code
</SCRIPT>
or URL
<SCRIPT LANGUAGE="javascript" SRC="filename.js"></SCRIPT>
You can place SCRIPT tag anywhere, in HEAD or in BODY
It is recommended to hide scripts from older non-scripting browsers
<SCRIPT LANGUAGE="javascript">
<!-- HTML COMMENT
  internal javascript code
// JAVASCRIPT COMMENT -->
</SCRIPT>
<NOSCRIPT>
  <H1> This page requires a modern browser! </H1>
</NOSCRIPT>
Stein XML 2.16
Quick overview of JavaScript
ECMAScript, see ECMA-262
Object oriented (object has properties, methods and events)
Loosely typed (string(default), numbers, boolean)
functions with arguments (not checked even for number) optional return value
var declares local scope new allocates object don’t need ;
Operators
  ++  --  + (numbers, strings)  -  *  /  % (mod)  << etc.  += etc.
  <  <=  >  >=  ==  !=  ~ (bit negation)  !  &&  ||  ?: (conditional)  ,
  NaN  Infinity
Flow
  if  if/else  while  for (c-like)  for/in  continue  break  return  with
Math
  PI E SQRT2 abs ceil floor round max min sqrt pow eval sin cos tan acos asin atan exp log random
Date
  WeekDay DayFromTime DaysInYear etc. etc. etc.
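A few of these points are easiest to see by running them. A minimal sketch; the add function and the variable names are purely illustrative.

```javascript
// Loose typing: + adds numbers but concatenates as soon as a string is involved.
var total = 1 + 2;
var label = "1" + 2;

// Arguments are not checked, not even their number.
function add(a, b) {
  return a + b;
}
var extra = add(1, 2, 3); // third argument is silently ignored
var missing = add(1);     // b is undefined, so the sum is NaN

// for/in loops over the property names of an object.
var keys = [];
for (var k in { x: 1, y: 2 }) {
  keys.push(k);
}

console.log(total, label, extra, missing, keys.join(","));
```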
Stein XML 2.17
Javascript Events
EVENTS
  onclick        Mouse click
  ondblclick     Mouse double click
  onmouseover    Mouse enters an element
  onmouseout     Mouse leaves an element
  onmousemove    Mouse moves
  onmousedown    Mouse button is pressed
  onmouseup      Mouse button is released
  onkeypress     Visible character is pressed
  onkeydown      Key is pressed
  onkeyup        Key is released
  onload         Document has finished loading
  onblur         Element loses the focus
  onfocus        Element gains the focus
Stein XML 2.18
Javascript Example
<HTML>
<HEAD>
<SCRIPT language="javascript">
  // move the greeting to the point that was clicked (IE-style event and posLeft/posTop)
  function hi() {
    with (hello.style) {
      posLeft = event.clientX;
      posTop = event.clientY;
    }
  }
  // slide the span diagonally, wrap around at 300, and reschedule every 10 ms
  function flying() {
    with (fly.style) {
      if (posLeft < 300) { posLeft += 5; posTop += 5; }
      else { posLeft = 10; posTop = 10; }
    }
    setTimeout('flying()', 10);
  }
</SCRIPT>
</HEAD>
<BODY onload="flying()" onclick="hi()">
  <P ID="hello" style="position:absolute;top:100;left:100"> Hello World! </P>
  <SPAN ID="fly" style="position:absolute;top:10;left:10"> I'm Flying!!! </SPAN>
</BODY>
</HTML>
DHTML
Stein XML 2.19
XML Islands
What happens when we define an XML island inside an HTML file ?
<html>
<head> <title>XML Island Demo</title></head>
<body>
<!-- xml island -->
<xml id="hellodata" src="hello.xml"></xml>
</body>
</html>
Nothing happens - the XML is in the DOM, but the browser doesn’t know what to do!
(When we directly display an XML file, IE applies a default XSL stylesheet)
We have to manually extract from the XML DOM and insert it into the browser window as HTML!
Stein XML 2.20
An IE-specific feature
XML islands are Microsoft-specific,
and Microsoft supplies some non-standard ways of retrieving info
<html>
<head> <title>XML island Demo</title></head>
<!-- xml island -->
<xml id="hellodata" src="hello.xml"></xml>
<body>
  <B>printout</B>
  <span dataSrc="#hellodata" dataFld="message"></span>
</body>
</html>
hello.xml
<?xml version="1.0"?>
<printout>
<message>
Hello world!
</message>
</printout>
Displayed result: printout Hello world!
Stein XML 2.21
Javascript to the rescue
Using javascript we can access the XML DOM in a standard way!
<?xml version="1.0"?>
<printout>
<message>
Hello world!
</message>
</printout>
<html>
<head> <title>XML DOM Demo</title></head>
<!-- xml island -->
<xml id="hellodata" src="hello.xml"></xml>
<body>
<script language="javascript">
  alert(hellodata.xml)
  document.write(hellodata.xml)
</script>
</body>
</html>
alert displays the XML source held in the DOM object (tags included)
write displays only the text (the browser treats the unknown tags as HTML and suppresses them)
Stein XML 2.22
Let’s try a more interesting file!
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="modems.xsl"?>
<!-- modems.xml -->
<!DOCTYPE modems SYSTEM "modems.dtd">
<modems>
  <copper>
    <name>ASM-20</name>
    <webpage>products/family/asm-20/asm-20.htm</webpage>
    <medium>4-wire</medium>
    <linecode>D1</linecode>
    <sync>synchronous</sync>
    <management>unmanaged</management>
    <minrate>19.2</minrate>
    <maxrate>256</maxrate>
    <maxrange>7.5</maxrange>
    <interfaces>
      <interface>V.24</interface>
      . . .
    </interfaces>
  </copper>
  . . .
</modems>
Try alert and document.write !!!
Stein XML 2.23
Javascript Access to DOM
What happens when we walk through the DOM tree?
<script language="JavaScript">
  // main section of DOM (DTD after xsl please!)
  document.writeln("The document has " + modemdata.childNodes.length + " sections.<br>")
  for (n=0; n<modemdata.childNodes.length; n++) {
    document.writeln(
      "<font color='red'>" + n + "</font>" +
      " nodeType=" + modemdata.childNodes(n).nodeType +
      " nodeName=" + modemdata.childNodes(n).nodeName +
      " nodeValue=" + modemdata.childNodes(n).nodeValue + "<br>" )
  }
</script>
The document has 5 sections.
0 nodeType=7 nodeName=xml nodeValue=version="1.0"
1 nodeType=7 nodeName=xml-stylesheet nodeValue=type="text/xsl" href="modems.xsl"
2 nodeType=8 nodeName=#comment nodeValue= modems.xml
3 nodeType=10 nodeName=modems nodeValue=null
4 nodeType=1 nodeName=modems nodeValue=null        (the XML tree)
Stein XML 2.24
Let’s walk through the real tree!
// first get the XML root node
var rootnode = modemdata.documentElement
// var rootnode = modemdata.childNodes(modemdata.childNodes.length-1)
var nmodems = rootnode.childNodes.length
document.writeln("<h1> The root is <font color='blue'>" + rootnode.nodeName + "</font>" +
                 " and it has " + nmodems + " child nodes.</h1>")
// now traverse XML tree
for (n=0; n<nmodems; n++) {
  // find the modem
  var thismodem = rootnode.childNodes(n)
  document.writeln("<h2>" + n + ". " + thismodem.nodeName + "</h2>")
  numfields = thismodem.childNodes.length
  // print all the child nodes for this modem
  for (i=0; i<numfields; i++) {
    document.writeln(
      "<font color='red'>" + i + "</font> " +
      "<font color='green'>" + thismodem.childNodes(i).nodeType + "</font> " +
      "<font color='blue'>" + thismodem.childNodes(i).nodeName + "</font> " +
      thismodem.childNodes(i).text + "<br>")
  }
}
Stein XML 2.25
And the answer is …
The root is modems and it has 7 child nodes.
0. copper
0 1 name ASM-20
1 1 webpage products/family/asm-20/asm-20.htm
2 1 medium 4-wire
3 1 linecode D1
4 1 sync synchronous
5 1 management unmanaged
6 1 minrate 19.2
7 1 maxrate 256
8 1 maxrange 7.5
9 1 interfaces V.24 RS-232 V.35 V.36 X.21
…
Stein XML 2.26
More generally
There are more levels
and we have to recursively walk through the tree
function parseChildren(node) {
  var x = node.childNodes
  var n = x.length
  if (n>0) {
    for (var i=0; i<n; i++) {
      . . .
      parseChildren( x(i) )
    }
  }
}
There will usually be attributes (etc) as well
We often want to jump to specific nodes, etc
We may want to append, delete, change nodes
and persist the changes
EXERCISE TIME!!
See NodeIterator and TreeWalker
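The recursive walk can be tried outside the browser on a stand-in tree. A minimal sketch, assuming plain objects that carry only nodeName and childNodes (not real DOM nodes); makeNode and the sample tree are hypothetical, loosely modeled on modems.xml.

```javascript
// Build a plain-object node with the two properties the walk needs.
function makeNode(nodeName, childNodes) {
  return { nodeName: nodeName, childNodes: childNodes || [] };
}

const tree = makeNode("modems", [
  makeNode("copper", [
    makeNode("name", [makeNode("#text")]),
    makeNode("interfaces", [
      makeNode("interface", [makeNode("#text")]),
    ]),
  ]),
]);

// Depth-first walk: record the node, then recurse into each child.
function parseChildren(node, depth, out) {
  out.push("  ".repeat(depth) + node.nodeName);
  for (let i = 0; i < node.childNodes.length; i++) {
    parseChildren(node.childNodes[i], depth + 1, out);
  }
  return out;
}

const outline = parseChildren(tree, 0, []);
console.log(outline.join("\n"));
```

The depth argument plays the role of the stack that tracks the hierarchy; the same walk works for any number of levels.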
Stein XML 2.27
XML
Processing XML
using
XSL
Stein XML 2.28
Stylesheets
Stylesheets are commonplace in presentation tools
They enable customization, standardization of documents
A stylesheet is usually a set of rules
describing how different elements are to be displayed
For example
  look of headers
  font face and size
  effects (underline, bullets)
  use of color
Cascading Style Sheets are used to change HTML defaults
SGML had Document Style Semantics and Specification Language (DSSSL)
  Based on Scheme (LISP variant)
  Influenced XSLT’s philosophy, but not its syntax
Stein XML 2.29
CSS
We can add style to XML using CSS - just like HTML
book {display:block}
article {display:block}
talk {display:block}
title {display:block; background:red; color:yellow; font-size:20pt;}
author {color:blue; font-size:20pt;}

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="biblio3.css"?>
<!DOCTYPE bibliography SYSTEM "biblio3.dtd">
<bibliography> <book> <title>. . . </title> </bibliography>
But such style is very limited
  Treatment of tags is not environment dependent
  Can hide tags (display:none) but can’t sort or filter them
  CSS is not a full programming language
  CSS is not XML-based and not extensible
Stein XML 2.30
XSL
One can process with procedural languages (e.g. javascript)
But instead one can use an XML-based pattern matching language
– First step of compilation is XML
– Declarative languages are more suitable for transformation applications
XSL  eXtensible Stylesheet Language
  XSL has 2 components: XSLT and XSL-FO
  Both are XML applications (can be verified using DTD)
XSLT has 2 versions
NEW VERSION (MSXML3, IE6?)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
OLD VERSION (IE5+)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl">
XSLT is supported by
  IE5+  XMLSPY  Apache’s Xalan  Saxon  XP  Sablotron  Unicorn  Xesalt
Stein XML 2.31
XSL Transformations
If we are already processing the XML file (XML in XML out)
we can do a lot more!
Examples:
  Change tag names (e.g. <para> … </para> to <P> … </P>)
  Change attributes to child elements or vice-versa
  Manipulate fields (including numeric computation)
  Reorder elements
  Change entire hierarchical structure
  Filter elements or SELECT records
Hence there are two equivalent opening tags
  <xsl:stylesheet version="1.0" …>    for “embedded” XSL
  <xsl:transform version="1.0" …>     for “standalone” XSLT (not in IE)
XML format conversion
Stein XML 2.32
XSLT Processing
XSLT
  Inputs: 2 XML files (the XML and the XSL)
  Outputs: 1 XML file (can be HTML for display)
XSLT supports recursion and iteration (it relies on an XML DOM parser)
XSLT supports XPath (although IE support is minimal)
XSLT supports internationalization (languages)
Unfortunately, present-day XSLT processors are limited
  require tree in memory and are hence limited in database size
    (write SAX programs for large applications)
  are relatively slow
Processing features:
  template matching commands
  value commands extract fields
  standard programming constructs (e.g. basic math, loops, conditionals)
  special features (e.g. filtering, sorting)
  noncommands are passed to output
Stein XML 2.33
Simple XSL Example
Displayed result:
  Bibliography
  Digital Signal Processing . . .
  Y. Stein
  Critical Temperature . . .
  Y. Stein
  Storage Capacity for Neural Network Models
  Y. Stein
The XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="biblio3.css"?>
<!DOCTYPE bibliography SYSTEM "biblio3.dtd">
<bibliography> <book> <title>. . . </title> </bibliography>
The XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html> <body>
      <H1> Bibliography </H1>
      <xsl:apply-templates/>
    </body> </html>
  </xsl:template>
  <xsl:template match="bibliography">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="book|article|thesis|talk">
    <p><b><xsl:value-of select="title"/></b></p>
    <p><xsl:value-of select="author"/></p>
  </xsl:template>
</xsl:stylesheet>
Stein XML 2.34
template match
The heart of XSLT is template matching (triggering)
The xsl:template element with the match attribute is used
<xsl:template match="nodename">
  . . . Put here whatever you want to do!
</xsl:template>
Actually the match attribute’s value is not merely a nodename
it is a complex expression matching any of the children of the current node
We must always start processing by matching to the document node
which is nicknamed / (WARNING - this is NOT the XML root!)
<xsl:template match="/">
. . .
</xsl:template>
Stein XML 2.35
Recursion and Iteration
At every moment there is a current node
We will need to match the current node’s children
We can do this by recursion
<xsl:apply-templates/>
<xsl:apply-templates select="subtree"/>
Or by iteration (looping)
<xsl:for-each select="subtree" . . . > . . .
</xsl:for-each>
When recursing XSL should perform default actions
on all the child nodes, but IE doesn’t
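Template matching with recursion can be imitated in a few lines of procedural code, which may help to see what the XSLT processor does for us. This is only a sketch under assumptions: the tree is made of plain objects rather than a DOM, the templates table, applyTemplates and valueOf are hypothetical helpers, the entry is assumed to be an article, and the built-in rule for text nodes is left out.

```javascript
// Plain-object stand-in for a one-entry bibliography document.
const doc = {
  name: "bibliography",
  children: [
    {
      name: "article",
      children: [
        { name: "title", text: "Storage Capacity for Neural Network Models" },
        { name: "author", text: "Y. Stein" },
      ],
    },
  ],
};

// Rough analogue of <xsl:value-of select="childName"/>.
function valueOf(node, childName) {
  const child = (node.children || []).find((c) => c.name === childName);
  return child ? child.text : "";
}

// One entry per template; the key plays the role of the match attribute.
const templates = {
  bibliography: (node) => applyTemplates(node),
  article: (node) =>
    "<p><b>" + valueOf(node, "title") + "</b></p>" +
    "<p>" + valueOf(node, "author") + "</p>",
};

// Rough analogue of <xsl:apply-templates/>: visit every child and fire
// its template; with no matching template, keep recursing (default action).
function applyTemplates(node) {
  return (node.children || [])
    .map((child) => {
      const rule = templates[child.name];
      return rule ? rule(child) : applyTemplates(child);
    })
    .join("");
}

// The template for "/" wraps the result of processing the document's children.
const html =
  "<html><body><H1> Bibliography </H1>" +
  applyTemplates({ children: [doc] }) +
  "</body></html>";
console.log(html);
```

Adding a new element type means adding one entry to the table, which is the appeal of the declarative style: the traversal itself never changes.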
Stein XML 2.36
value-of select
The explicit value of a node is obtained using
<xsl:value-of select="nodename"/>
where as usual nodename is actually an expression
For the current node’s value use “.”
<xsl:value-of select="."/>
Example
<xsl:template match="article">
  <b><xsl:value-of select="title"/>:</b>
  <xsl:value-of select="author"/>
</xsl:template>
Stein XML 2.37
XPath expressions
The expressions in match and select attributes are in XPath
XPath expressions are NOT XML syntax
Here are some XPath goodies
  /         like in directories, is both the “top” and the hierarchy divider
  *         wildcard
  @         attribute
  //        any number of intervening levels
  type()    e.g. text(), comment(): nodes of a particular type
  [xxx]     test brackets: only nodes with a child or attribute which matches
Examples
  <xsl:template match="@color">
  <xsl:template match="zoo/animals/*/food">
  <xsl:template match="zoo//food">
  <xsl:template match="book[text()]">
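The meaning of /, * and // in match patterns can be sketched by translating a pattern into a regular expression over slash-separated element paths. This is an illustration only, not how an XSLT processor works internally: patternToRegExp and matches are hypothetical helpers, the sample paths are made up, and attributes (@), node tests such as text() and test brackets are not handled.

```javascript
// Escape characters that are special in a regular expression.
function escapeStep(step) {
  return step.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "/" separates levels, "*" stands for exactly one level,
// "//" stands for any number of intervening levels (including none).
function patternToRegExp(pattern) {
  const body = pattern
    .split("//")
    .map((part) =>
      part
        .split("/")
        .map((step) => (step === "*" ? "[^/]+" : escapeStep(step)))
        .join("/")
    )
    .join("/(?:[^/]+/)*");
  // A match pattern only has to fit the tail of the node's path.
  return new RegExp("(?:^|/)" + body + "$");
}

function matches(pattern, path) {
  return patternToRegExp(pattern).test(path);
}

console.log(matches("zoo/animals/*/food", "/zoo/animals/lion/food"));
console.log(matches("zoo//food", "/zoo/animals/lion/food"));
console.log(matches("zoo/animals/*/food", "/zoo/animals/food"));
```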
Stein XML 2.38
Sorting
The for-each element has several ordering options
Sorting is specified using the order-by attribute
By default ordering is lexicographical (unless explicitly number)
and ascending (use - for descending)
Multiple keys can be specified (separate by ;)
<xsl:for-each select="copper|fiber"
order-by="number(minrate); -interfaces"> . . .
</xsl:for-each>
There is also a <xsl:sort/> command not implemented by IE
Also, you can count with <xsl:number/> (position in current node)
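For comparison, the same two-key ordering (numeric minrate ascending, then interfaces descending) written procedurally. A minimal sketch; the four records are hypothetical sample data, not taken from modems.xml.

```javascript
// Hypothetical records; field values are strings, as they would be in XML.
const modems = [
  { name: "A", minrate: "64", interfaces: "V.24" },
  { name: "B", minrate: "19.2", interfaces: "V.35" },
  { name: "C", minrate: "19.2", interfaces: "X.21" },
  { name: "D", minrate: "128", interfaces: "V.24" },
];

// Key 1: number(minrate), ascending. Key 2: interfaces, descending.
function compare(a, b) {
  const byRate = Number(a.minrate) - Number(b.minrate);
  if (byRate !== 0) return byRate;
  if (a.interfaces < b.interfaces) return 1;
  if (a.interfaces > b.interfaces) return -1;
  return 0;
}

const sorted = modems.slice().sort(compare).map((m) => m.name);

// Without number(): plain string comparison puts "128" before "19.2".
const lexical = modems
  .slice()
  .sort((a, b) => (a.minrate < b.minrate ? -1 : a.minrate > b.minrate ? 1 : 0))
  .map((m) => m.minrate);

console.log(sorted.join(" "), "|", lexical.join(" "));
```

The second sort shows why the explicit number() matters: lexicographic order compares character by character, so 128 sorts ahead of 19.2.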
Stein XML 2.39
Default (IE) XSL
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" . . .>
  <xsl:template match="/">
    <html>
      <head> <style> . . . </style> </head>
      <body> <xsl:apply-templates/> </body>
    </html>
  </xsl:template>
  <xsl:template match="node()[nodeType()=10]">
    <SPAN><!DOCTYPE <x:node-name/><I> (View source for full doctype . . . )></SPAN>
  </xsl:template>
  <xsl:template match="pi()">
    . . .
  </xsl:template>
  <xsl:template match="comment()"> . . . </xsl:template>
. . .
<xsl:template match="*[ textnode() $and$ $not$ (comment() $or$ pi() $or$ cdata()) ]"> . . . </xsl:template>
. . .
</xsl:stylesheet>
Stein XML 2.40
XSLing on-the-fly
By defining two XML islands, one for the XML and one for the XSL,
we can process the XML before displaying it
<html>
<head>
<script language="javascript">
  function load() {
    var result = xmli.transformNode(xsli.documentElement);
    fakeDiv.innerHTML = result;
  }
</script>
</head>
<!-- xml islands -->
<xml id="xmli" src="modems.xml"></xml>
<xml id="xsli" src="modems.xsl"></xml>
<body onload="load()">
  <div id="fakeDiv"> </div>
</body>
</html>
Stein XML 2.41
XML and XSL and Javascript!
XSL is great - but it has NO GUI !!!!!!
Javascript is great - but it is tedious to use
Idea:
  process XML with XSL
  use HTML buttons, forms, etc.
  events trigger Javascript functions
  Javascript changes XSL in DOM
  XSL retransforms XML to HTML
Stein XML 2.42
XML+XSL+JS Example
<html>
<head>
<script language="javascript">
  function load() {
    var result = xmli.transformNode(xsli.documentElement);
    fakeDiv.innerHTML = result;
  }
  function change(value) {
    // parse XSL and make changes
    load()
  }
</script>
</head>
<!-- xml islands -->
<xml id="xmli" src="modems.xml"></xml>
<xml id="xsli" src="modems.xsl"></xml>
<body onload="load()">
  <select name="number" onClick="change(this.value)">
    <option selected="selected" value="0">0</option>
    <option value="1">1</option>
  </select>
  <div id="fakeDiv"> </div>
</body>
</html>
Stein XML 2.43
An Example
modems.xml
<?xml version="1.0"?>
<!-- <?xml-stylesheet type="text/xsl" href="modems.xsl"?> -->
<!-- modems.xml -->
<!DOCTYPE modems SYSTEM "modems.dtd">
<modems> . . .
modems.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html>
      <head> <Style> . . . </Style> </head>
      <body><xsl:apply-templates select="modems"/></body>
    </html>
  </xsl:template>
  . . .
find.html
<html>
<head>
<Style> . . . </Style>
<script language="javascript" for="window">
  function load() . . .
  function selectmedium(key) . . .
  function selectman(key) . . .
  function inputrate(key) . . .
  function inputrange(key) . . .
</script>
. . .