
XML Programming: Introduction to XML (ITC570)

Date post: 20-Dec-2015


LEARNING OBJECTIVES

Be able to:

Understand XML technologies and their roles.

Understand different components of an XML document.

Create a well-formed XML document.

XML stands for eXtensible Markup Language

HTML is used to mark up text so it can be displayed to users

XML is used to mark up data so it can be processed by computers

HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)

XML describes only content, or “meaning”

HTML uses a fixed, unchangeable set of tags

In XML, you make up your own tags

HTML and XML look similar, because they are both SGML languages (SGML = Standard Generalized Markup Language)

Both HTML and XML use elements enclosed in tags (e.g. <body>This is an element</body>)

Both use tag attributes (e.g. <font face="Verdana" size="+1" color="red">)

Both use entities (&lt;, &gt;, &amp;, &quot;, &apos;)

More precisely,

HTML is defined in SGML

XML is a (very small) subset of SGML


HTML is for humans

HTML describes web pages

You don’t want to see error messages about the web pages you visit

Browsers ignore and/or correct as many HTML errors as they can, so HTML is often sloppy

XML is for computers

XML describes data

The rules are strict and errors are not allowed

In this way, XML is like a programming language

Current versions of most browsers can display XML

However, browser support of XML is spotty at best

<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <city>North Place</city>, <state>NX</state> <country>USA</country>
  High Temp: <high scale="F">103</high>
  Low Temp: <low scale="F">70</low>
  Morning: <morning>Partly cloudy, Hazy</morning>
  Afternoon: <afternoon>Sunny &amp; hot</afternoon>
  Evening: <evening>Clear and Cooler</evening>
</weatherReport>

From: XML: A Primer, by Simon St. Laurent

SOME TECHNOLOGIES WE MAY COVER

HTML, HTML Forms, JavaScript

XHTML & CSS

Java, Java servlets, JSP, maybe RMI

Perl, PHP

SQL, Java JDBC

XML, DTD, XML Schemas, RELAX NG

XSL, XSLT, XPath, CSS

Java: SAX, DOM, JAXP

Apache Tomcat

Ajax

But underneath: HTTP, TCP/IP, sockets

WHY XML?

Distributed applications need to share data.

XML is plain text, so it can be exchanged between different platforms and applications.

The structure and the meaning of the data are tightly defined.

Data can be delivered to multiple devices.

Data is separated from presentation.

XML DOCUMENT – AN EXAMPLE

<bookshop>
  <book>
    <title>Harry Potter and the Sorcerer’s Stone</title>
    <author>
      <initials>J.K</initials>
      <surname>Rowling</surname>
    </author>
    <price value="$16.95"></price>
  </book>
  …
</bookshop>

[Figure: tree view of the bookshop document]

bookshop
  book
    title
    author
      initials
      surname
    price
      value (attribute)
  book

DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and their attributes for particular purposes

CSS (Cascading Style Sheets) describe how to display HTML or XML in a browser

XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another

DOM (Document Object Model), SAX (Simple API for XML), and JAXP (Java API for XML Processing) are all APIs for XML parsing

XML PARSER

Required to read and manipulate XML documents.

Reads the XML document as plain text and transforms it into a data structure, typically a tree, in memory.

Applications, such as a web browser, access the data structure and process the data according to their objectives.

Example: MSXML (Microsoft's XML parser)
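As a rough illustration of what a parser does, the sketch below reads a shortened version of the bookshop document as plain text and turns it into an in-memory tree. It assumes Python and its standard xml.etree.ElementTree module, which the slides do not mention; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Shortened bookshop document from the earlier example, as plain text.
BOOKSHOP_XML = """\
<bookshop>
  <book>
    <title>Harry Potter and the Sorcerer's Stone</title>
    <author>
      <initials>J.K</initials>
      <surname>Rowling</surname>
    </author>
    <price value="$16.95"></price>
  </book>
</bookshop>
"""

# The parser converts the text into a tree of Element objects.
root = ET.fromstring(BOOKSHOP_XML)

# The application then walks the tree to get the data it needs.
book = root.find("book")
title = book.findtext("title")
surname = book.findtext("author/surname")
price = book.find("price").get("value")

print(root.tag, "|", title, "|", surname, "|", price)
```

The application never touches the angle brackets itself; it only asks the tree for elements and attributes.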


XML DOCUMENT – BASIC COMPONENTS

Elements.

Attributes.

Character and Entity References.

Character Data (CDATA).

Processing Instruction.

Comments.

ELEMENTS

[Figure: the bookshop tree with its parts labelled]

Root element (compulsory), e.g. bookshop

Branch elements, e.g. book, author

Leaf elements, e.g. title, initials, surname

Attribute, e.g. value (on price)

ELEMENT

The basic building block of XML markup.

It may contain: text, other elements (child elements), attributes, character data, and other markup, e.g. comments.

Delimited by a start-tag and an end-tag.

An element can be empty.

Unlike in HTML, the end-tag cannot be omitted.

Each tag must contain a valid element type name.

ELEMENT’S NAME

An element’s name (tag name) is CASE SENSITIVE.

<BOOK>, <Book>, and <book> are three different element names.

Trailing space inside a tag is legal but will be ignored.

<BOOK > is equivalent to <BOOK>
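Both naming rules can be checked with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name parses is made up for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Start-tag and end-tag must match exactly, including case.
matching_case = parses("<BOOK></BOOK>")
mismatched_case = parses("<BOOK></book>")

# Whitespace before the closing '>' of a tag is dropped by the parser.
spaced = ET.fromstring("<BOOK ></BOOK >")

print(matching_case, mismatched_case, spaced.tag)
```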

EMPTY ELEMENT

Has no content.

May carry attributes.

Example:

<img src='logo.png'></img>

can be abbreviated to

<img src='logo.png'/>
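A small check that the long and the abbreviated form mean the same thing to a parser. This is a sketch assuming Python's standard xml.etree.ElementTree module; summary is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<img src='logo.png'></img>")
short_form = ET.fromstring("<img src='logo.png'/>")

def summary(elem):
    # Tag name, attributes, text (None is treated as empty), child count.
    return (elem.tag, dict(elem.attrib), elem.text or "", len(elem))

# Both spellings produce the same element in the tree.
print(summary(long_form) == summary(short_form))
```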


ATTRIBUTES

Information regarding the element.

If elements are the "nouns" of XML, then attributes are its "adjectives".

<tagname attribute_name="attribute_value">


ATTRIBUTES VS ELEMENT

The choice is determined by the semantics of the content.

Attributes are characteristics of an element.

<book>
  <title>Harry Potter</title>
</book>

<book title="Harry Potter">
</book>
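From a program's point of view the two designs are read differently. The sketch below assumes Python's standard xml.etree.ElementTree module and only illustrates the access pattern, not which design is better.

```python
import xml.etree.ElementTree as ET

as_element = ET.fromstring("<book><title>Harry Potter</title></book>")
as_attribute = ET.fromstring('<book title="Harry Potter"></book>')

# Child element: the title is a node of its own and could later hold
# further structure (sub-elements or attributes).
title_from_element = as_element.findtext("title")

# Attribute: the title is a plain string attached to <book>.
title_from_attribute = as_attribute.get("title")

print(title_from_element, "|", title_from_attribute)
```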


CHARACTER REFERENCES

Used to enter characters that are not supported by the input device (keyboard).

Example: entering £ using a US-ASCII keyboard.

Format: &#NNNNN; or &#xXXXX;

N is a decimal digit

X is a hexadecimal digit

Example: $ => &#36; or &#x24;
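The parser resolves character references before the application sees the text. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the price element is invented for the example.

```python
import xml.etree.ElementTree as ET

# &#36; and &#x24; both refer to '$'; &#163; and &#xA3; both refer to
# the pound sign.
elem = ET.fromstring("<price>&#36; &#x24; &#163; &#xA3;</price>")
resolved = elem.text
print(resolved)
```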

ENTITY REFERENCES

Entities may be defined and used for:

Representing characters used in markup

&lt; == "<"    &amp; == "&"

Strings

&IR; == Information Retrieval

Predefined entities: &lt;, &gt;, &amp;, &quot;, &apos;
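The sketch below shows both uses together: a string entity declared in the document itself and the predefined entities for markup characters. It assumes Python's standard xml.etree.ElementTree module; the course element and the internal DOCTYPE declaration are invented for the example.

```python
import xml.etree.ElementTree as ET

# IR is declared in the internal DTD subset; lt, gt and amp are predefined.
DOC = """\
<!DOCTYPE course [
  <!ENTITY IR "Information Retrieval">
]>
<course>&IR; uses &lt;tags&gt; &amp; more</course>
"""

course = ET.fromstring(DOC)
expanded = course.text
print(expanded)
```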


CHARACTER DATA

To escape blocks of text containing characters which would otherwise be recognized as markup.

<![CDATA[…]]>

<![CDATA[<greeting>Hello, world!</greeting>]]>

CHARACTER DATA (2)

A CDATA section:

<example>
<![CDATA[&Warn;-&Disclaimer;&lt;&copy 2001; &PM;&gt;]]>
</example>

is equivalent to escaping every & by hand:

<example>
&amp;Warn;-&amp;Disclaimer;&amp;lt;&amp;copy 2001; &amp;PM;&amp;gt;
</example>
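The equivalence of a CDATA section and hand-escaped text can be confirmed with a parser. A sketch assuming Python's standard xml.etree.ElementTree module, with the surrounding whitespace of the slide example left out:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<example><![CDATA[&Warn;-&Disclaimer;&lt;&copy 2001; &PM;&gt;]]></example>"
)
with_escapes = ET.fromstring(
    "<example>&amp;Warn;-&amp;Disclaimer;&amp;lt;&amp;copy 2001; "
    "&amp;PM;&amp;gt;</example>"
)

# Inside CDATA nothing is interpreted, so the text keeps its '&' characters.
print(with_cdata.text)
print(with_cdata.text == with_escapes.text)
```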


PROCESSING INSTRUCTION(PI)

Processing instructions (PIs) allow documents to contain instructions for applications.

<?target … instruction … ?>

Target is used to identify the application or other object to which the PI is directed.

<?xml-stylesheet href="mystyle.css" type="text/css"?>
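An application can read the target and the instruction text of a PI through the DOM. The sketch below assumes Python's standard xml.dom.minidom module; the report root element is invented for the example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml-stylesheet href="mystyle.css" type="text/css"?><report/>'
)

# The PI appears in the document before the root element.
pi = doc.firstChild
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

# target names the application; data is the instruction, passed on as-is.
print(is_pi, pi.target, pi.data)
```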


COMMENTS

Syntax:

<!-- comment text -->

Comments cannot be used within element tags.

<tag>… some content … <tag <!-- it is illegal -->>

Comments may never be nested.

<!-- Comments cannot <!-- be nested --> like this -->
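The two restrictions on comments can be tried out with a parser. A sketch assuming Python's standard xml.etree.ElementTree module; parses is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

between_tags = parses("<tag><!-- a legal comment -->content</tag>")
inside_tag = parses("<tag <!-- it is illegal -->>content</tag>")
nested = parses(
    "<tag><!-- Comments cannot <!-- be nested --> like this --></tag>"
)

print(between_tags, inside_tag, nested)
```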


Names (as used for tags and attributes) must begin with a letter or underscore, and can consist of:

Letters, both Roman (English) and foreign

Digits, both Roman and foreign

. (dot)

- (hyphen)

_ (underscore)

: (colon) should be used only for namespaces

Combining characters and extenders (not used in English)

Start with <?xml version="1.0"?> (the declaration is optional in XML 1.0, but recommended)

XML is case sensitive

You must have exactly one root element that encloses all the rest of the XML

Every element must have a closing tag

Elements must be properly nested

Attribute values must be enclosed in double or single quotation marks

There are only five predeclared entities: &lt;, &gt;, &amp;, &quot;, and &apos;

<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>

An XML document represents a hierarchy; a hierarchy is a tree

novel
  foreword
    paragraph: This is the great American novel.
  chapter number="1"
    paragraph: It was a dark and stormy night.
    paragraph: Suddenly, a shot rang out!
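The tree view of the novel can be produced mechanically by walking the parsed document. The sketch below assumes Python's standard xml.etree.ElementTree module; outline is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

NOVEL_XML = """\
<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
"""

def outline(elem, depth=0):
    """Return one indented line per element, children below parents."""
    attrs = "".join(f' {k}="{v}"' for k, v in elem.attrib.items())
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag + attrs + (f": {text}" if text else "")
    lines = [line]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(NOVEL_XML))
print("\n".join(tree_lines))
```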

You can make up your own XML tags and attributes, but...

...any program that uses the XML must know what to expect!

A DTD (Document Type Definition) defines what tags are legal and where they can occur in the XML

An XML document does not require a DTD

XML is well-formed if it follows the rules given earlier

In addition, XML is valid if it declares a DTD and conforms to that DTD

A DTD can be included in the XML, but is typically a separate document

Errors in XML documents will stop XML programs

Some alternatives to DTDs are XML Schemas and RELAX NG


An element may contain other elements, plain text, or both

An element containing only text: <name>David Matuszek</name>

An element (<name>) containing only elements: <name><first>David</first><last>Matuszek</last></name>

An element containing both:<class>CIT597 <time>10:30-12:00 MW</time></class>

An element that contains both text and other elements is said to have mixed content

Mixed content is legal, but bad

Mixed content makes it much harder to define valid XML

Mixed content is more complicated to use in a program

Mixed content adds no power to XML; it is never needed for anything
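To see why mixed content is more awkward in a program, the sketch below parses the three examples above and checks which of them is mixed. It assumes Python's standard xml.etree.ElementTree module, where an element's own text is split between its text attribute and the tail attributes of its children; has_mixed_content is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

text_only = ET.fromstring("<name>David Matuszek</name>")
elements_only = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
mixed = ET.fromstring("<class>CIT597 <time>10:30-12:00 MW</time></class>")

def has_mixed_content(elem):
    """True if elem has child elements and non-whitespace text of its own."""
    own_text = (elem.text or "") + "".join(child.tail or "" for child in elem)
    return len(elem) > 0 and own_text.strip() != ""

print(has_mixed_content(text_only),
      has_mixed_content(elements_only),
      has_mixed_content(mixed))

# In the mixed case the course code sits in mixed.text, separate from
# the child element, so code must handle both kinds of content.
print(repr(mixed.text), repr(mixed[0].text))
```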

<?xml version="1.0"?>
<weatherReport>
  <date>7/14/97</date>
  <place>
    <city>North Place</city>
    <state>NX</state>
    <country>USA</country>
  </place>
  <temperatures>
    <high scale="F">103</high>
    <low scale="F">70</low>
  </temperatures>
  <forecast>
    <time>Morning</time>
    <predict>Partly cloudy, Hazy</predict>
  </forecast>
  <forecast>
    <time>Afternoon</time>
    <predict>Sunny &amp; hot</predict>
  </forecast>
  <forecast>
    <time>Evening</time>
    <predict>Clear and Cooler</predict>
  </forecast>
</weatherReport>

XML is designed to be processed by computer programs, not to be displayed to humans

Nevertheless, almost all current browsers can display XML documents

They don’t all display it the same way

They may not display it at all if it has errors

For best results, update your browsers to the newest available versions

Remember: HTML is designed to be viewed, XML is designed to be used

STRUCTURE OF XML DOCUMENT

An XML document has to be well-formed:

Conform to syntax requirements

Conform to a simple container structure

Common structure of XML document:

Prolog

Body

Epilog


PROLOG

Includes:

XML Declaration

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

Version is mandatory; encoding and standalone are optional

Document Type Declaration

<!DOCTYPE

Not to be confused with the DTD (Document Type Definition), which the declaration references or contains

A simple well-formed XML does not need it.

Schema declaration


BODY & EPILOG

Body

Contains one or more elements, all enclosed in a single root element

The “contents”

Epilog

Hardly used

Can be used to identify end of document

WELL-FORMED XML DOCUMENT

Every element must have both a start tag and an end tag, e.g. <name> ... </name>

But empty elements can be abbreviated: <break />.

XML tags are case sensitive

XML tags may not begin with the letters xml, in any combination of cases

Elements must be properly nested, e.g. not <b><i>bold and italic</b></i>

Every XML document must have one and only one root element

The values of attributes must be enclosed in single or double quotes, e.g. <time unit="days">

Character data cannot contain an unescaped < or &; write &lt; and &amp; instead
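Several of the rules above can be checked by handing small fragments to a parser and seeing whether it accepts them. The sketch below assumes Python's standard xml.etree.ElementTree module; is_well_formed and the test fragments are invented for the example, and a non-validating parser like this one checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

checks = {
    "properly nested": is_well_formed("<b><i>bold and italic</i></b>"),
    "badly nested": is_well_formed("<b><i>bold and italic</b></i>"),
    "missing end tag": is_well_formed("<name>David"),
    "abbreviated empty element": is_well_formed("<break />"),
    "two root elements": is_well_formed("<a></a><b></b>"),
    "unquoted attribute": is_well_formed("<time unit=days></time>"),
    "quoted attribute": is_well_formed('<time unit="days"></time>'),
    "bare ampersand": is_well_formed("<p>Sunny & hot</p>"),
    "escaped ampersand": is_well_formed("<p>Sunny &amp; hot</p>"),
}

for rule, ok in checks.items():
    print(f"{rule}: {ok}")
```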

XML Process

An XML document is created in an editor. The XML parser reads the document and converts it into a tree of elements.

The parser passes the tree to the browser that displays it.

Summary

XML is a meta-markup language that enables the creation of markup languages for particular documents and domains.

XML documents are created in an editor, read by a parser, and displayed by a browser.

Be careful. XML isn’t completely finished. It will change and expand, and you will encounter bugs in current XML software.

