1
Introduction to XML
Babak Esfandiari
2
What is XML?
Introduced by the W3C in 1998
Stands for eXtensible Markup Language
More general than HTML, but simpler than SGML
Used to describe metadata
You can define your own set of tags!
– an XML document does not “do” anything on its own
3
XML example
<?xml version="1.0" ?>
<!-- a simple tagset for museums -->
<Museum name="Louvre">
<city> Paris
</city>
</Museum>
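As a minimal sketch of how a program might read this document, the Python snippet below uses the standard library's xml.etree.ElementTree. The choice of Python and the variable names are assumptions made for illustration; the comment in the sample is terminated with "-->", which XML requires.

```python
import xml.etree.ElementTree as ET

# The museum document from the slide; an XML comment must end with "-->".
MUSEUM_XML = """<?xml version="1.0" ?>
<!-- a simple tagset for museums -->
<Museum name="Louvre">
<city> Paris
</city>
</Museum>
"""

root = ET.fromstring(MUSEUM_XML)

museum_name = root.get("name")          # value of the "name" attribute
city = root.find("city").text.strip()   # text of the nested <city> element

print(root.tag, museum_name, city)
```

The document itself only describes the data; it is the reading program that decides what to do with the element and attribute values.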
4
XML - what for?
Content is independent from rendering
Metadata makes search easier
Standard tags enable data interchange across tools
Format for data and object persistence, human-readable and editable
No need for a custom parser anymore
5
XML concepts and syntax
Elements
– can be nested
– must have a closing tag (or use the empty-element form, e.g. <city/>)
Attributes
XML declaration
Comments
6
XML concepts (2)
An XML document that follows the syntax rules is considered well-formed
But there is no restriction on the nature, order and number of tags in a well-formed XML document!
– in order to impose some restrictions, you need to define validity criteria in a separate document…
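Well-formedness can be checked by any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree, and the helper name is_well_formed is made up for this illustration. It shows that a missing closing tag is rejected, while an arbitrary mix of tags is accepted as long as the syntax rules hold.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested and closed: well-formed.
print(is_well_formed("<Museum><city>Paris</city></Museum>"))
# <city> is never closed: not well-formed.
print(is_well_formed("<Museum><city>Paris</Museum>"))
# Any number and order of tags is fine for well-formedness alone.
print(is_well_formed("<Museum><genre/><city/><city/><genre/></Museum>"))
```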
7
DTD
Document Type Definition
Describes the XML tagset
<!DOCTYPE Museum [
<!ELEMENT Museum (city?)>
<!ATTLIST Museum name CDATA #REQUIRED>
<!ELEMENT city (#PCDATA)>
]>
An XML document that conforms to its DTD is valid
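To make the meaning of these declarations concrete, the sketch below hand-checks the main rules of the Museum DTD in Python. This is an assumption-laden illustration, not a DTD validator: Python's xml.etree.ElementTree does not validate against DTDs, so the function check_museum (a hypothetical name) simply re-implements three of the constraints by hand. A validating parser would be used in practice.

```python
import xml.etree.ElementTree as ET


def check_museum(text):
    """Return a list of violations of the Museum DTD rules (empty if none)."""
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    if root.tag != "Museum":
        return ["root element must be Museum"]
    problems = []
    # <!ATTLIST Museum name CDATA #REQUIRED>
    if "name" not in root.attrib:
        problems.append("Museum requires a name attribute")
    # <!ELEMENT Museum (city?)>: zero or one city child, nothing else
    children = list(root)
    if any(child.tag != "city" for child in children):
        problems.append("Museum may only contain city elements")
    if len(root.findall("city")) > 1:
        problems.append("Museum may contain at most one city")
    # <!ELEMENT city (#PCDATA)>: text only, no child elements
    for city in root.findall("city"):
        if len(city) > 0:
            problems.append("city may only contain text")
    return problems


print(check_museum('<Museum name="Louvre"><city>Paris</city></Museum>'))
print(check_museum('<Museum><city>Paris</city><city>Lens</city></Museum>'))
```

The second document is well-formed but not valid: it lacks the required attribute and repeats the optional city element.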
8
DTD Syntax
Defining elements:
<!ELEMENT Museum (city?, genre+)>
Character data types:
– #PCDATA (parsed character data, used for element content)
– CDATA (character data that is not parsed for markup, used as an attribute type)
9
DTD
DTDs are hard to read
DTD has its own syntax
DTD has very limited support for data types
10
XML Schema
A 2001 W3C recommendation
Allows the definition of elements and attributes using the XML syntax
Supports many primitive types
Allows the creation of complex types
Uses namespaces to:
– allow reuse of types and schemas
– avoid naming clashes
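The naming-clash point can be illustrated with a small sketch. It assumes Python's xml.etree.ElementTree, and both namespace URIs and the sample document are hypothetical, invented for this example. Two elements share the local name "name" but stay distinct because each belongs to a different namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, made up for this illustration.
DOC = """<visit xmlns:m="http://example.org/museum"
       xmlns:p="http://example.org/person">
  <m:name>Louvre</m:name>
  <p:name>Alice</p:name>
</visit>"""

# Prefix-to-URI map used only for lookups on the Python side.
NS = {"m": "http://example.org/museum", "p": "http://example.org/person"}

root = ET.fromstring(DOC)
museum_name = root.find("m:name", NS).text
person_name = root.find("p:name", NS).text
print(museum_name, person_name)

# Internally each tag is stored as {namespace-uri}local-name.
qualified_tag = root.find("m:name", NS).tag
print(qualified_tag)
```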
11
XML Schema Example
http://chat.carleton.ca/~narthorn/project/community.xsd
12
Some XML-based standards
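MathML (mathematical notation)
CML (Chemical Markup Language)
MusicXML (musical scores)
XMI (XML Metadata Interchange, used to exchange UML models)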
13
XMI Example
<Class>
<name>Museum</name>
<feature>
<Attribute>
<name>name</name>
</Attribute>
</feature>
</Class>
14
XML Parsing
Many XML parsers are available: JAXP, Xerces…
Two “standardized” parsing methods:
– SAX: event-driven, serial-access, element-by-element processing
– DOM: creates a tree structure of objects and stores it in memory; easier to navigate, but more memory needed
15
SAX
good to use if you are “consuming” XML data from a stream
see Echo.java example (from JAXP)
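For readers without the JAXP example at hand, here is a rough equivalent sketched with Python's standard xml.sax module rather than Java. The handler class EchoHandler and the sample input are assumptions for illustration. The parser calls back into the handler as it meets each start tag, text run, and end tag, and never builds a tree.

```python
import xml.sax


class EchoHandler(xml.sax.ContentHandler):
    """Record each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Skip whitespace-only runs between elements.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EchoHandler()
xml.sax.parseString(b'<Museum name="Louvre"><city>Paris</city></Museum>', handler)

for event in handler.events:
    print(event)
```

Because events arrive one at a time, the handler only keeps what it chooses to keep, which is why this style suits data consumed from a stream.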
16
DOM
use it if you need “random access” to various elements of the document
see EchoDom.java example
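A comparable sketch of the DOM approach, using Python's standard xml.dom.minidom instead of the Java example, might look as follows. The sample input and variable names are assumptions. The whole document is first loaded into an in-memory tree, which can then be navigated in any direction.

```python
from xml.dom import minidom

# Parse the whole document into an in-memory tree of node objects.
doc = minidom.parseString('<Museum name="Louvre"><city>Paris</city></Museum>')

root = doc.documentElement
name = root.getAttribute("name")

# Jump straight to an element anywhere in the tree ("random access").
city = doc.getElementsByTagName("city")[0]
city_text = city.firstChild.data

# Navigate back up from a child to its parent.
parent_tag = city.parentNode.tagName

print(name, city_text, parent_tag)
```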
17
XSLT
eXtensible Stylesheet Language Transformations
Allows the transformation of one XML document into another by specifying transformation rules
18
XSLT example
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Class">
blah <xsl:value-of select="name"/> blah
<xsl:apply-templates select="feature/Attribute"/>
</xsl:template>
<xsl:template match="feature/Attribute">
<xsl:value-of select="name"/>
</xsl:template>
</xsl:stylesheet>
19
Semantic Web
Tim Berners-Lee’s idea of the future of the Web
The goal is to make information accessible to non-humans (i.e. agents)
Therefore information should be structured and use metadata
– RDF is proposed as such a structure
20
RDF Example
See Software Agents course example
21
Refs
W3C specs: http://www.w3.org/TR/REC-xml