Date post: | 18-Dec-2015 |
Extensible Markup Language: XML
• HTML: portable, widely supported protocol for describing how to format data
• XML: portable, widely supported protocol for describing data
• XML is quickly becoming standard for data exchange between applications
XML Documents
• XML marks up data using tags, which are names enclosed in angle brackets < >
• All tags appear in pairs: <myTag> .. </myTag>
• Elements: units of data (i.e., anything between a start tag and its corresponding end tag)
• Root element contains all other document elements
• Tag pairs cannot appear interleaved: <a><b></a></b> must be: <a><b></b></a>
• Nested elements form trees
What defines an XML document is not its tag names but the fact that its tags are formatted in this way.
The optional XML declaration includes a version parameter; if present, it must be the very first line of the file.
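The formatting rules above can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module; the helper name is_well_formed is made up for this illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False


# Properly nested tag pairs are accepted.
print(is_well_formed("<a><b></b></a>"))
# Interleaved tag pairs are rejected.
print(is_well_formed("<a><b></a></b>"))
# A second top-level element is rejected: there must be a single root element.
print(is_well_formed("<a></a><b></b>"))
# The XML declaration is rejected when it is not at the very start of the file.
print(is_well_formed('\n<?xml version = "1.0"?><a></a>'))
```

Only the first call should print True; the other three inputs each break one of the rules listed above.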
Because of the nice <tag> .. </tag> structure, the data can be viewed as organized in a tree:
article
   title
   date
   author
      firstName
      lastName
   summary
   content
<?xml version = "1.0"?>
<!-- I-sequence structured as XML. -->
<SEQUENCEDATA>
<TYPE>dna</TYPE>
<SEQ>
<NAME>Aspergillus awamori</NAME>
<ID>U03518</ID>
<DATA>aacctgcggaaggatcattaccgagtgcgggtcctttgggccca
acctcccatccgtgtctattgtaccctgttgcttcgg
cgggcccgccgcttgtcggccgccgggggggcgcctctg
ccccccgggcccgtgcccgccggagaccccaacacgaac
actgtctgaaagcgtgcagtctgagttgattgaatgcaat
cagttaaaactttcaacaatggatctcttggttccggc
</DATA>
</SEQ>
</SEQUENCEDATA>
The listing above shows how an I-sequence might be structured as XML; its second line is a comment. The elements form this tree:
SEQUENCEDATA
   TYPE
   SEQ
      NAME
      ID
      DATA
Parsing and displaying XML
• Is XML just another data format?
• Do we need to write yet another parser?
• No more filters, please!
• No! XML is becoming a standard
• Many different systems can read XML; not many systems can read our I-sequence format
• Thus, parsers exist already
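As an illustration of using an existing parser instead of writing a new one, the sketch below reads the I-sequence document with Python's standard xml.dom.minidom module. The DATA content is shortened to the first two lines of the listing, and the helper element_text is a name invented for this example.

```python
from xml.dom.minidom import parseString

# The I-sequence document from the earlier listing, with DATA shortened.
SEQUENCE_XML = """<?xml version = "1.0"?>
<!-- I-sequence structured as XML. -->
<SEQUENCEDATA>
<TYPE>dna</TYPE>
<SEQ>
<NAME>Aspergillus awamori</NAME>
<ID>U03518</ID>
<DATA>aacctgcggaaggatcattaccgagtgcgggtcctttgggccca
acctcccatccgtgtctattgtaccctgttgcttcgg
</DATA>
</SEQ>
</SEQUENCEDATA>"""


def element_text(parent, tag):
    """Return the text inside the first <tag> element below parent."""
    node = parent.getElementsByTagName(tag)[0]
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


document = parseString(SEQUENCE_XML)
root = document.documentElement
seq = root.getElementsByTagName("SEQ")[0]

seq_type = element_text(root, "TYPE")
name = element_text(seq, "NAME")
seq_id = element_text(seq, "ID")
# The sequence is wrapped over several lines; strip the line breaks.
data = "".join(element_text(seq, "DATA").split())

print(seq_type, name, seq_id, len(data))
```

No hand-written filter is needed: the parser handles the declaration, the comment and the nesting, and the program only asks for elements by tag name.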
XML document opened in Internet Explorer
Each parent element/node can be expanded and collapsed using the plus and minus signs
Standard browsers can format XML documents nicely!
XML document opened in Mozilla
Again: Each parent element/node can be expanded and collapsed (here by pressing the minus, not the element)
Attributes
Data can also be placed in attributes: name/value pairs
Attribute (name-value pair, value in quotes): element contact has the attribute type which has the value “to”
Empty elements are elements with no character data between the tags.
The start and end tags of an empty element may be written as a single tag like this: <myTag />
letter.xml
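The sketch below shows how attributes and empty elements look to a parser, assuming Python's standard xml.dom.minidom module. It does not reproduce letter.xml: only the element contact and its attribute type with the value "to" come from the slide, and the rest of the fragment (letter, name, flag) is hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical fragment; only <contact type="to"> is taken from the slide.
LETTER_XML = """<letter>
<contact type="to">
<name>John Doe</name>
<flag />
</contact>
</letter>"""

document = parseString(LETTER_XML)
contact = document.getElementsByTagName("contact")[0]

# Attribute values are looked up by attribute name.
print(contact.getAttribute("type"))
print(contact.hasAttribute("id"))

# An empty element has no child nodes, whichever way it is written.
flag = document.getElementsByTagName("flag")[0]
short_form = parseString("<myTag />").documentElement
long_form = parseString("<myTag></myTag>").documentElement
print(flag.hasChildNodes(), short_form.hasChildNodes(), long_form.hasChildNodes())
```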
Parsers and trees
• We’ve already seen that XML markup can be displayed as a tree
• Some XML parsers exploit this. They
   – parse the file
   – extract the data
   – return it organized in a tree data structure called a Document Object Model
article
   title
   date
   author
      firstName
      lastName
   summary
   content
Document Object Model (DOM)
• A DOM parser retrieves data from an XML document
• It returns a tree structure called a DOM tree
• Each component of an XML document is represented as a tree node
• A single root node (the document node) contains all other nodes
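A minimal sketch of these points, assuming Python's standard xml.dom.minidom module and a one-line document written without whitespace between the tags so that no extra text nodes appear:

```python
from xml.dom import Node
from xml.dom.minidom import parseString

document = parseString("<article><title>Simple XML</title></article>")

# The parser returns the document node, the single root of the DOM tree.
print(document.nodeType == Node.DOCUMENT_NODE, document.nodeName)

# The root element of the XML document is a child of the document node.
root = document.documentElement
print(root.nodeName, root.parentNode is document)

# Elements and text are both represented as nodes in the tree.
title = root.firstChild
text = title.firstChild
print(title.nodeType == Node.ELEMENT_NODE, title.nodeName)
print(text.nodeType == Node.TEXT_NODE, text.nodeName, text.nodeValue)
```

Note the distinction between the document node, which represents the whole file, and the root element article, which is the outermost tag pair inside it.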
Python provides a DOM parser!
• All nodes have a name (of tag) and a value (data)
• Text (including whitespace) represented in nodes with tag name #text
article
   #text
   title
      #text: Simple XML
   #text
   date
      #text: Dec..2001
   #text
   author
      #text
      firstName
         #text: John
      #text
      lastName
         #text: Doe
      #text
   #text
   summary
      #text: XML..easy.
   #text
   content
      #text: In this..XML.
   #text
fig16_04revised.py
Parse the XML document and load the data into the variable document
The documentElement attribute refers to the root node
nodeName refers to an element's tag name
Various node attributes: firstChild, nextSibling, nodeValue, parentNode
NB: Changes since book!
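The program listing itself is not reproduced here, so the following is only a sketch of what such a program might look like, assuming Python's standard xml.dom.minidom module rather than whatever parser the original file uses. The article document is reconstructed from the tree figure; the date, summary and content texts are abbreviated on the slide and are kept in that abbreviated form.

```python
from xml.dom.minidom import parseString

# Reconstructed from the tree figure; abbreviated texts are kept as shown.
ARTICLE_XML = """<article>
   <title>Simple XML</title>
   <date>Dec..2001</date>
   <author>
      <firstName>John</firstName>
      <lastName>Doe</lastName>
   </author>
   <summary>XML..easy.</summary>
   <content>In this..XML.</content>
</article>"""

# Parse the XML document and load the data into the variable document.
document = parseString(ARTICLE_XML)

# documentElement refers to the root node; nodeName is its tag name.
root = document.documentElement
print("Here is the root element of the document:", root.nodeName)

print("The following are its child elements:")
for node in root.childNodes:
    print(node.nodeName)

# The whitespace before <title> is the first child, a #text node.
child = root.firstChild
print("The first child of root element is:", child.nodeName)

sibling = child.nextSibling
print("whose next sibling is:", sibling.nodeName)

# The text inside <title> is stored in a #text child of the title node.
value = sibling.firstChild
print('Text inside "%s" tag is' % sibling.nodeName, value.nodeValue)
print("Parent node of %s is:" % sibling.nodeName, sibling.parentNode.nodeName)
```

Because the line breaks and indentation between elements are kept as #text nodes, the root element has eleven children rather than five.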
Program output
Here is the root element of the document: article
The following are its child elements:
#text
title
#text
date
#text
author
#text
summary
#text
content
#text
The first child of root element is: #text
whose next sibling is: title
Text inside "title" tag is Simple XML
Parent node of title is: article
Summary
• XML is widely used
• Many applications can read XML
• Python already has an XML parser which returns a tree