Date post: 18-Dec-2015
Extensible Markup Language: XML

• HTML: portable, widely supported protocol for describing how to format data

• XML: portable, widely supported protocol for describing data

• XML is quickly becoming standard for data exchange between applications


XML Documents

• XML marks up data using tags, which are names enclosed in angle brackets < >

• All tags appear in pairs: <myTag> .. </myTag>

• Elements: units of data (i.e., anything between a start tag and its corresponding end tag)

• Root element contains all other document elements

• Tag pairs cannot appear interleaved: <a><b></a></b> must be: <a><b></b></a>

• Nested elements form trees
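The nesting rules above can be checked mechanically. The sketch below assumes Python's standard xml.dom.minidom module and a helper name, is_well_formed, that is made up for this illustration. It simply asks the parser whether a string follows the rules.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it.

    is_well_formed is a hypothetical helper name, not part of any library.
    """
    try:
        parseString(text).unlink()  # parse, then free the tree again
        return True
    except ExpatError:
        return False


print(is_well_formed("<a><b></b></a>"))  # properly nested tag pairs
print(is_well_formed("<a><b></a></b>"))  # interleaved tag pairs
print(is_well_formed("<a></a><b></b>"))  # two root elements
```

Only the first string should be accepted: the second interleaves its tag pairs, and the third has no single root element containing everything else.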

What defines an XML document is not its tag names but that its tags are formatted in this way.

The optional XML declaration includes a version parameter (if present, it MUST be the very first line of the file).

Because of the nice <tag> .. </tag> structure, the data can be viewed as organized in a tree:

article
    title
    date
    author
        firstName
        lastName
    summary
    content

An I-sequence might be structured as XML like this (the second line is a comment):

<?xml version = "1.0"?>
<!-- I-sequence structured as XML. -->
<SEQUENCEDATA>
  <TYPE>dna</TYPE>
  <SEQ>
    <NAME>Aspergillus awamori</NAME>
    <ID>U03518</ID>
    <DATA>aacctgcggaaggatcattaccgagtgcgggtcctttgggccca
    acctcccatccgtgtctattgtaccctgttgcttcgg
    cgggcccgccgcttgtcggccgccgggggggcgcctctg
    ccccccgggcccgtgcccgccggagaccccaacacgaac
    actgtctgaaagcgtgcagtctgagttgattgaatgcaat
    cagttaaaactttcaacaatggatctcttggttccggc
    </DATA>
  </SEQ>
</SEQUENCEDATA>

The same data viewed as a tree:

SEQUENCEDATA
    TYPE
    SEQ
        NAME
        ID
        DATA


Parsing and displaying XML

• Objection: XML is just another data format, so do we need to write yet another parser? No more filters, please!

• No! XML is becoming a standard

• Many different systems can read XML; not many systems can read our I-sequence format

• Thus, parsers already exist
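To make the point concrete, here is a minimal sketch of reading an I-sequence document with a parser that already exists, Python's standard xml.dom.minidom. The embedded document is a shortened stand-in with a truncated DATA value, and the helper names element_text and read_sequences are assumptions made for this example.

```python
from xml.dom.minidom import parseString

# Shortened stand-in for an I-sequence document; the DATA value is truncated.
ISEQ_XML = """<?xml version = "1.0"?>
<!-- I-sequence structured as XML. -->
<SEQUENCEDATA>
  <TYPE>dna</TYPE>
  <SEQ>
    <NAME>Aspergillus awamori</NAME>
    <ID>U03518</ID>
    <DATA>aacctgcggaaggatcatta
ccgagtgcgggtcctttggg
    </DATA>
  </SEQ>
</SEQUENCEDATA>"""


def element_text(parent, tag):
    """Join the text nodes of the first <tag> element found under parent."""
    element = parent.getElementsByTagName(tag)[0]
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def read_sequences(xml_text):
    """Return one dict per <SEQ> element (hypothetical helper)."""
    document = parseString(xml_text)
    seq_type = element_text(document, "TYPE")
    records = []
    for seq in document.getElementsByTagName("SEQ"):
        records.append({
            "type": seq_type,
            "name": element_text(seq, "NAME"),
            "id": element_text(seq, "ID"),
            # Line breaks and indentation inside <DATA> are layout, not sequence.
            "data": "".join(element_text(seq, "DATA").split()),
        })
    return records


for record in read_sequences(ISEQ_XML):
    print(record["id"], record["name"], len(record["data"]))
```

No format-specific parsing code is needed: the only I-sequence knowledge in the sketch is the tag names.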


XML document opened in Internet Explorer

Standard browsers can format XML documents nicely! Each parent element/node can be expanded and collapsed: a minus sign marks an expanded node, a plus sign a collapsed one.

XML document opened in Mozilla

Again: each parent element/node can be expanded and collapsed (here by pressing the minus sign, not the element).

Attributes

Data can also be placed in attributes: name/value pairs

Attribute (name/value pair, value in quotes): element contact has the attribute type, which has the value “to”

Empty elements are elements with no character data between the tags.

The tags of an empty element may be written as one tag, like this: <myTag />

letter.xml
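A short sketch of how attributes and empty elements look to a parser, assuming Python's standard xml.dom.minidom. The fragment is hypothetical: only the contact element and its type attribute with the value "to" come from the slide, while the name and flag elements are invented for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical fragment: only <contact type = "to"> is taken from the slide;
# the <name> and <flag /> children are made up for this example.
LETTER_XML = """<letter>
  <contact type = "to">
    <name>Jane Doe</name>
    <flag />
  </contact>
</letter>"""

document = parseString(LETTER_XML)
contact = document.getElementsByTagName("contact")[0]

# Attributes are looked up by name; a missing attribute yields an empty string.
has_type = contact.hasAttribute("type")
type_value = contact.getAttribute("type")
missing_value = contact.getAttribute("missing")
print(has_type, repr(type_value), repr(missing_value))

# An empty element has no child nodes at all, not even a #text node.
flag = contact.getElementsByTagName("flag")[0]
flag_is_empty = not flag.hasChildNodes()
print(flag_is_empty)

# <myTag /> and <myTag></myTag> produce the same element.
short_form = parseString("<myTag />").documentElement.toxml()
long_form = parseString("<myTag></myTag>").documentElement.toxml()
print(short_form == long_form)
```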


Parsers and trees

• We’ve already seen that XML markup can be displayed as a tree

• Some XML parsers exploit this. They
  – parse the file
  – extract the data
  – return it organized in a tree data structure called a Document Object Model

article
    title
    date
    author
        firstName
        lastName
    summary
    content


Document Object Model (DOM)

• A DOM parser retrieves data from an XML document

• It returns a tree structure called a DOM tree

• Each component of an XML document is represented as a tree node

• A single root node (the document node) contains all other nodes
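As a rough illustration of the DOM tree, the sketch below parses a small article document with Python's standard xml.dom.minidom and walks the tree from the document node downwards. The document is a stand-in that keeps the abbreviated text from the slides as-is, and element_outline is a helper name assumed for this example.

```python
from xml.dom.minidom import parseString

# Stand-in article document; the abbreviated text values are kept as on the slides.
ARTICLE_XML = """<?xml version = "1.0"?>
<article>
   <title>Simple XML</title>
   <date>Dec..2001</date>
   <author>
      <firstName>John</firstName>
      <lastName>Doe</lastName>
   </author>
   <summary>XML..easy.</summary>
   <content>In this..XML.</content>
</article>"""


def element_outline(node, depth=0):
    """Return the element names below node, indented by nesting depth."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("    " * depth + node.nodeName)
        depth += 1
    # Text and comment nodes have no children, so the recursion ends there.
    for child in node.childNodes:
        lines.extend(element_outline(child, depth))
    return lines


document = parseString(ARTICLE_XML)  # the document node is the single root
outline = element_outline(document)
print("\n".join(outline))
```

The walk starts at the document node rather than at article, which shows that the document node really does contain every other node.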


Python provides a DOM parser!

• All nodes have a name (the tag name) and a value (the data)

• Text (including whitespace) is represented in nodes with the name #text

article
    #text
    title
        #text  Simple XML
    #text
    date
        #text  Dec..2001
    #text
    author
        #text
        firstName
            #text  John
        #text
        lastName
            #text  Doe
        #text
    #text
    summary
        #text  XML..easy.
    #text
    content
        #text  In this..XML.
    #text
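The extra #text nodes in the tree come from the line breaks and indentation between tags. A minimal sketch, assuming Python's standard xml.dom.minidom and two hypothetical helpers (child_names and child_elements), shows the difference between an indented and a compact author element:

```python
from xml.dom.minidom import parseString

# The same element, once with layout whitespace between the tags and once without.
PRETTY = "<author>\n  <firstName>John</firstName>\n  <lastName>Doe</lastName>\n</author>"
COMPACT = "<author><firstName>John</firstName><lastName>Doe</lastName></author>"


def child_names(xml_text):
    """Node names of the root element's direct children, #text nodes included."""
    root = parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes]


def child_elements(xml_text):
    """Names of the direct children that are elements, skipping #text nodes."""
    root = parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes
            if child.nodeType == child.ELEMENT_NODE]


print(child_names(PRETTY))     # whitespace shows up as #text nodes
print(child_names(COMPACT))    # no whitespace, no extra nodes
print(child_elements(PRETTY))  # filtering by node type hides the difference
```

Filtering on nodeType, as in child_elements, is one common way to ignore layout whitespace when only the elements matter.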

fig16_04revised.py

Parse XML document and load data into variable document

documentElement attribute refers to root node

nodeName refers to element’s tag name

Various node attributes:

firstChild

nextSibling

nodeValue

parentNode

NB: Changes since book!
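The program listing itself is not reproduced in this transcript. The following is a sketch of a program in the same spirit, not the original fig16_04revised.py: it assumes Python's standard xml.dom.minidom and parses a stand-in article document from a string instead of a file.

```python
from xml.dom.minidom import parseString

# Stand-in for the article document; the abbreviated text values are kept as-is.
ARTICLE_XML = """<?xml version = "1.0"?>
<article>
   <title>Simple XML</title>
   <date>Dec..2001</date>
   <author>
      <firstName>John</firstName>
      <lastName>Doe</lastName>
   </author>
   <summary>XML..easy.</summary>
   <content>In this..XML.</content>
</article>"""

# Parse the XML and load the data into the variable document.
document = parseString(ARTICLE_XML)

# documentElement refers to the root node; nodeName is the element's tag name.
root = document.documentElement
print("Here is the root element of the document:", root.nodeName)

print("The following are its child elements:")
for node in root.childNodes:
    print(node.nodeName)

# Navigate with firstChild, nextSibling, nodeValue and parentNode.
first = root.firstChild
title = first.nextSibling
print("The first child of root element is:", first.nodeName)
print("whose next sibling is:", title.nodeName)
print('Text inside "title" tag is', title.firstChild.nodeValue)
print("Parent node of title is:", title.parentNode.nodeName)
```

Note that the first child of the root is a #text node holding the whitespace before the title tag, so the title element is reached through nextSibling.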


Program output

The first child of root element is: #text
whose next sibling is: title
Text inside "title" tag is Simple XML
Parent node of title is: article

Here is the root element of the document: article
The following are its child elements:
#text
title
#text
date
#text
author
#text
summary
#text
content
#text


Summary

• XML is widely used

• Many applications can read XML

• Python already has an XML parser which returns a tree


.. on to the exercises

