Post on 04-Jan-2016


Introduction to XML

This presentation covers introductory features of XML.

What XML is and what it is not?

What does it do?

Put different related technologies in the right perspective.

Some XML and document object model (DOM) details.

What XML is and what it is not?

XML syntax is very similar to that of HTML, however:

HTML deals with the format of a document; XML deals with the content of the document.

XML highlights what is to be displayed whereas HTML highlights how to display it. XML complements HTML.

An HTML document contains both the data and the display instructions. XML contains only data.

Example:

What XML is and what it is not?

Both XML and HTML use tags. However, the tags in HTML have fixed semantics and cannot mean anything different, whereas the tags in XML can be assigned different meanings and additional tags can be created.

XML makes the contents of a document readable not only by humans but also by machines.

XML documents are text files and are therefore platform independent.

HTML normally displays the entire contents of a document. With XML it is possible to select the information that is required to be displayed.
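As a rough illustration of selecting only the required information, the sketch below uses Python's standard xml.etree.ElementTree module on the student record from Example #2. The function name select_fields is hypothetical and Python is an assumption of this sketch, not the tool used in the presentation.

```python
import xml.etree.ElementTree as ET

# Student record copied from Example #2 of this presentation.
STUDENT_XML = """
<Student>
  <FirstName>Asghar</FirstName>
  <LastName>Bokhari</LastName>
  <ID>9026568</ID>
  <Assignment>28</Assignment>
  <MidTerm>29</MidTerm>
  <Final>38</Final>
  <LetterGrade>A+</LetterGrade>
</Student>
"""


def select_fields(xml_text, wanted):
    """Return {tag: text} for only the requested child elements (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {tag: root.findtext(tag) for tag in wanted}


# Pick two fields out of the seven that the document carries.
print(select_fields(STUDENT_XML, ["LastName", "LetterGrade"]))
```

Because the tags name the content, a program can pick out the grade without knowing anything about how the record would be displayed.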

XML does not do anything!

It is just a way of structuring, storing and sending information.

It is a format that can be understood by other applications.

It facilitates exchange of data between incompatible systems (does not exchange data in itself).

Developed by the W3C between 1996 and 1998 to provide a universal format for describing structured documents and data.

XML describes a class of data objects called XML documents.

XML does not do anything!

It allows the creation of a markup language from scratch.

Different industries and professions can develop custom languages that accurately handle their industry-specific data.

Examples: Wireless Markup Language, Chemical Markup Language, Speech Synthesis Markup Language, Gene Expression Markup Language, etc.

XML will provide greater flexibility in transferring data between different applications on different platforms and machines, and greatly increase the accuracy of web searches.

Its reliance on Unicode makes it international.

Related Technologies

XML Tags are created by programmers.

How to tell a browser to display information inside a set of tags created by us?

How do browsers display information contained in HTML tags?

Standard predefined tags are implemented by the browser software.

We must therefore have some standard way to describe the tags created by us. We can then refer to this description and write programs to interpret / display the contents of the tags.

Related Technologies

How to describe the tags and their properties?

One of the techniques is called Document Type Definition (DTD). Does not use XML. Details later.

Another, more recent technique is XML Schema. It uses XML.

How to display / interpret an XML document?

CSS (Cascading Style Sheets)

XSL (Extensible Stylesheet Language)

XSLT (XSL Transformations)

DOM (Document Object Model). Details later.

SAX (Simple API for XML).

Related Technologies

[Diagram: an XML document, with DTD and XML Schema labelled as its template, and CSS, XSL, XSLT and DOM labelled as the means to display / interpret it.]

Document Type Definition (DTD)

A DTD is a set of rules that will be used by a parser that parses an XML document.

It defines the parts of a document and outlines how they can be used, including their order and contents. It generally has:

Processing instructions

Entities

Elements, including their start and end tags

Attributes

Comments

Character data

Parts of DTD

Processing Instructions.

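The most commonly used one is the XML declaration (strictly a declaration rather than a processing instruction, although it uses the same <? ... ?> syntax):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>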

Entities

Variables used to define common text.

Entity references are references to entities.

Entities are expanded when a document is parsed by an XML parser.

<!ENTITY COPYRIGHT "Copyrighted 2001">

Reference to the above: &COPYRIGHT; (a reference starts with & and ends with a semicolon)

Predefined entities: lt, gt, amp, quot, apos

Referenced as: &lt; (<), &gt; (>), &amp; (&), &quot; ("), &apos; (').
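To see entity expansion at parse time, the sketch below feeds a small document with an internal DTD to Python's standard xml.etree.ElementTree parser. The note element and the helper name expanded_text are made up for this illustration; only the COPYRIGHT entity comes from the slide.

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring the COPYRIGHT entity from the slide; the <note>
# element is a hypothetical container used only for this sketch.
ENTITY_DOC = (
    '<!DOCTYPE note [<!ENTITY COPYRIGHT "Copyrighted 2001">]>'
    "<note>&COPYRIGHT; &lt;b&gt; &amp; &quot;x&quot; &apos;y&apos;</note>"
)


def expanded_text(xml_text):
    """Parse the document and return the root's text after entity expansion."""
    return ET.fromstring(xml_text).text


print(expanded_text(ENTITY_DOC))
```

The parser replaces both the user-defined entity and the five predefined ones, so the application only ever sees the expanded text.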

Elements

The main building blocks of an XML document are tagged elements like: <SUBJECT> ……….</SUBJECT>

An element name surrounded by angle brackets is called a ‘tag’. The contents of an element go between a start-tag and an end-tag.

Syntax: <!ELEMENT name content>

The contents of an element can be other elements (giving parent-child and sibling relationships), #PCDATA, EMPTY or ANY. (CDATA is an attribute type, not an element content type.)

Example:

<!ELEMENT DOC (SUBJECT, DATE, ADDRESS, MEMO)>

<!ELEMENT SUBJECT (#PCDATA)>

<!ELEMENT DATE (#PCDATA)>

<!ELEMENT ADDRESS (#PCDATA)>

<!ELEMENT MEMO (#PCDATA)>

Elements

Use of the elements declared in the previous slide:

<DOC>

<SUBJECT>Today’s Memo</SUBJECT>

<DATE>Nov. 6, 2001</DATE>

<ADDRESS>McMaster University</ADDRESS>

<MEMO>I hope you like XML</MEMO>

</DOC>

“DOC” is the parent of other elements that are siblings to each other.
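The parent and sibling relationships can be inspected programmatically. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name child_tags is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# The DOC document from the slide above.
DOC_XML = """
<DOC>
  <SUBJECT>Today's Memo</SUBJECT>
  <DATE>Nov. 6, 2001</DATE>
  <ADDRESS>McMaster University</ADDRESS>
  <MEMO>I hope you like XML</MEMO>
</DOC>
"""


def child_tags(xml_text):
    """Return the root (parent) tag and the tags of its children (the siblings)."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]


parent, siblings = child_tags(DOC_XML)
print(parent, siblings)
```

The children come back in document order, which is the order the DTD declaration (SUBJECT, DATE, ADDRESS, MEMO) requires.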

Elements

More Examples:

<!ELEMENT BR EMPTY> Usage: <BR />

<!ELEMENT Note ANY> Usage:

<Note> any type of contents </Note>

<!ELEMENT Doc (Page+)> One or more elements

<!ELEMENT Doc (Page*)> Zero or more elements

<!ELEMENT Doc (Page?)> Zero or one elements

Mixed Contents:

<!ELEMENT Note (#PCDATA|To|From|Message)*>

Attributes & Comments

Attributes provide extra information about elements, e.g. in HTML:

<img src="mypicture.gif" />

XML attributes are declared as follows:

<!ATTLIST elementname attributename type default_usage>

<!ELEMENT ARTICLE (HEADLINE, BYLINE, STORY)>

<!ATTLIST ARTICLE AUTHORS CDATA #REQUIRED

EDITORS CDATA #IMPLIED>

Comments: Same syntax as that in HTML: <!-- comments -->
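A short sketch of reading attributes from an ARTICLE element follows, again with Python's standard xml.etree.ElementTree. The attribute value and element contents are invented sample data, and read_attributes is a hypothetical helper. Note that ElementTree is a non-validating parser, so it does not enforce #REQUIRED.

```python
import xml.etree.ElementTree as ET

# Sample instance of the ARTICLE element declared above.
# All values here are made up for illustration.
ARTICLE_XML = """
<ARTICLE AUTHORS="A. Writer">
  <HEADLINE>XML in brief</HEADLINE>
  <BYLINE>Staff</BYLINE>
  <STORY>Story text goes here.</STORY>
</ARTICLE>
"""


def read_attributes(xml_text):
    """Return the AUTHORS and EDITORS attribute values (None when absent)."""
    root = ET.fromstring(xml_text)
    return root.get("AUTHORS"), root.get("EDITORS")


print(read_attributes(ARTICLE_XML))
```

EDITORS is declared #IMPLIED, so leaving it out is allowed and the lookup simply yields no value.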

Syntax of XML

XML syntax rules are simple and self-describing but strict.

All XML documents must have a root element, called the ‘document element’, with all children properly nested inside it.

<root>
  <Child_Element>
    <Sub_child> …. </Sub_child>
  </Child_Element>
</root>

All XML tags are case sensitive.

All elements must be properly nested.

All elements must have a closing tag.

Syntax of XML

Attribute values must be quoted.

Element Names should follow the following rules:

They can contain letters, numbers and other characters.

They must not start with a number or a punctuation character (an underscore is allowed).

They must not start with the letters xml (in any case: XML, Xml, xml).

They must not contain spaces.

Avoid using a hyphen or period in a name.

“Well Formed” and “Valid” XML documents

A Well Formed document is one that conforms to XML syntax

A Valid document is a Well Formed document that also conforms to a DTD.

Is this a Valid document?

Is it Well Formed?
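Well-formedness can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which is non-validating: it reports syntax violations but does not check a document against a DTD, so it answers only the "Well Formed" question. The function name is_well_formed and the sample strings are assumptions of this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per syntax rule from the previous slides.
SAMPLES = {
    "<root><child>ok</child></root>": True,
    "<root><Child>case mismatch</child></root>": False,   # tags are case sensitive
    "<root><a><b>bad nesting</a></b></root>": False,      # improper nesting
    "<root><child>no closing tag</root>": False,          # missing end-tag
    "<root id=unquoted></root>": False,                   # attribute value not quoted
    "<a></a><b></b>": False,                              # more than one root element
}

for text, expected in SAMPLES.items():
    print(is_well_formed(text), expected, text)
```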

How to refer to a DTD?

Use the <!DOCTYPE> declaration (a document type declaration, not a processing instruction).

External DTD and Internal DTD

A DTD may be a part of an XML document – Internal

Normally it is stored separately and can be referred to as:

<!DOCTYPE memo SYSTEM "memo.dtd">

<!DOCTYPE memo SYSTEM "http://site/file path">

<!DOCTYPE purchase PUBLIC "-//Companyxyz//DTD purchase//EN" "http://site/path">

We use:

<!DOCTYPE COURSE SYSTEM "http://www.cas.mcmaster.ca/~asghar/k600/course.dtd">
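A parser makes the DTD reference available to the program. The sketch below reads it back with Python's standard xml.dom.minidom, assuming the parser's default behaviour of recording the external DTD's identifier without fetching the file. The helper name doctype_info is hypothetical.

```python
from xml.dom import minidom

# A memo that refers to an external DTD by system identifier.
MEMO_WITH_DOCTYPE = (
    '<!DOCTYPE memo SYSTEM "memo.dtd">'
    "<memo><to>K 600 Class</to></memo>"
)


def doctype_info(xml_text):
    """Return the root element name and system identifier from the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId


print(doctype_info(MEMO_WITH_DOCTYPE))
```

The name after <!DOCTYPE must match the document's root element, which is why the first value is memo.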

Viewing XML Documents

Internet Explorer 5+ can be used to view XML documents.

XML source document is shown; Why?

How can we view a formatted document?

Using Cascading Style Sheets (CSS) is one way. They specify how each tag in the XML document being viewed must be formatted.

Extensible Stylesheet Language (XSL). Not discussed here.

CSS shows the entire document, just like HTML. What if we want to display only parts of a document?

Document Object Model (XMLDOM)

XML document can be understood by other applications.

It must be described in a way that other programming languages can manipulate its contents (add, delete, change).

DOM is a programming interface for XML documents. It exposes them as tree structures in memory and provides an easy-to-use environment for the programmer.

Look at earlier example:

It can be shown as:

XML Tree Structure

[Tree diagram: memo is the root, with children to, from, Subject and CONTENT; CONTENT has children alert, paragraph and closing.]
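The tree for the memo document (Example #3) can be printed by walking it recursively. This is a minimal sketch with Python's standard xml.etree.ElementTree; the function name outline is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# The memo document from Example #3 of this presentation.
MEMO_XML = """
<memo>
  <to>K 600 Class</to>
  <from>Asghar Bokhari</from>
  <Subject>XML Lecture</Subject>
  <CONTENT>
    <alert>Please listen carefully</alert>
    <paragraph>Please read this memo</paragraph>
    <closing>Thank you very much </closing>
  </CONTENT>
</memo>
"""


def outline(element, depth=0):
    """Return the element tree as a list of tag names indented by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(MEMO_XML))))
```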

XMLDOM

DOM is a W3C recommendation. It specifies a language-independent API that can be used with languages like Java, C++, Perl, Visual Basic, JavaScript and others.

There are different implementations of DOM.

We use Microsoft’s implementation. MS provides a parser in the form of a COM component in its IE5+.

We use JavaScript to access it and to make different API calls.

DOM uses three objects to access the XML file: Document, Node and NodeList. Each has properties and methods.

Common Node types are: Document Type, Processing Instruction, Element, Attribute, Text etc.
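The presentation uses Microsoft's parser from JavaScript. As a stand-in, the sketch below shows the same Document, Node and NodeList objects through Python's standard xml.dom.minidom, another DOM implementation. The function name edit_memo and the added date element are hypothetical.

```python
from xml.dom import minidom

# A shortened memo, written on one line so no whitespace text nodes appear.
MEMO_XML = (
    "<memo><to>K 600 Class</to><from>Asghar Bokhari</from>"
    "<Subject>XML Lecture</Subject></memo>"
)


def edit_memo(xml_text):
    """Change, add and delete nodes through the DOM API, then serialize."""
    doc = minidom.parseString(xml_text)      # Document object
    root = doc.documentElement               # Node of type ELEMENT_NODE

    # Change: getElementsByTagName returns a NodeList.
    subject = doc.getElementsByTagName("Subject")[0]
    subject.firstChild.data = "DOM Lecture"  # firstChild is the Text node

    # Add: new nodes are created by the Document, then attached to a parent.
    date = doc.createElement("date")         # hypothetical extra element
    date.appendChild(doc.createTextNode("Nov. 6, 2001"))
    root.appendChild(date)

    # Delete: a node is removed through its parent.
    root.removeChild(doc.getElementsByTagName("from")[0])

    return root.toxml()


print(edit_memo(MEMO_XML))
```

Because the API is language independent, the calls used here (getElementsByTagName, createElement, appendChild, removeChild) have the same names in other DOM implementations.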

Example #1

<html>
  <head>
    <title> Example1</title>
  </head>
  <body>
    <ul>
      <li> Asghar Bokhari</li>
      <li> 9026568 </li>
      <li> A+ </li>
    </ul>
  </body>
</html>

Example #2

<Student>

<FirstName>Asghar</FirstName>

<LastName>Bokhari</LastName>

<ID>9026568</ID>

<Assignment>28</Assignment>

<MidTerm>29</MidTerm>

<Final>38</Final>

<LetterGrade>A+</LetterGrade>

</Student>

Example #3

<?xml version="1.0" ?>

<memo>

<to>K 600 Class</to>

<from>Asghar Bokhari</from>

<Subject>XML Lecture</Subject>

<CONTENT>

<alert>Please listen carefully</alert>

<paragraph>Please read this memo</paragraph>

<closing>Thank you very much </closing>

</CONTENT>

</memo>

Example #4

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, from, Subject, CONTENT)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT Subject (#PCDATA)>
  <!ELEMENT CONTENT (alert, paragraph, closing)>
  <!ELEMENT alert (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ELEMENT closing (#PCDATA)>
]>
<memo>
  <to>K 600 Class</to>
  <from>Asghar Bokhari</from>
  <Subject>XML Lecture</Subject>
  <CONTENT>
    <alert>Please listen carefully</alert>
    <paragraph>Please read this memo</paragraph>
    <closing>Thank you very much </closing>
  </CONTENT>
</memo>