
Introduction to xml

Date post: 15-Jan-2015
Upload: soumya
XML Basics

Table of Contents

CHAPTER 1: Introduction
1.1 What is XML?
1.2 Advantages of XML
1.3 Differences between XML and HTML
1.4 XML Related Technologies

CHAPTER 2: How XML can be used?
2.2 XML Benefits
2.3 Uses of XML
2.4 XML Tags

CHAPTER 3: XML Editors
3.1 EmEditor
3.2 XML Spy
3.3 XML Syntax Rules
3.4 XML Viewing

CHAPTER 4: XML Documents
4.1 Well Formed XML
4.2 Valid XML
4.3 XML Parser
4.4 Prolog

CHAPTER 5: Document Type Definition
5.1 DTD Elements
5.2 Types of Elements
5.3 Attributes
5.4 Entities

CHAPTER 6: Why we Need DTD?
6.1 Classification of DTD
6.2 Internal DTD
6.3 External DTD
6.4 Problems with DTD
6.5 Design Principles
6.6 XML Schema

1. Introduction

XML stands for Extensible Markup Language. XML was developed around 1996 as a simplified subset of SGML (Standard Generalized Markup Language); it was made less complicated than SGML so that it could be used on the web. XML is a set of rules for encoding documents electronically. It is a markup language rather than a scripting or programming language: it describes data and contains no instructions to execute.

XML is used for exchange of data. The language makes it possible to define data in a structured way. XML tags are not predefined like HTML. XML lets you create your own unique tags that are meaningful for your data, hence the use of the term “extensible”.

An XML document does not do anything by itself. It is just information wrapped in tags. You have to write a piece of software to send, receive or display it. XML is recommended by the World Wide Web Consortium (W3C). XML is a meta-language, that is, a language used to define other languages. XML has become popular for use with web services.
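As a minimal sketch of the "piece of software" mentioned above, the following Python snippet reads a small XML document and displays its data. The tag names (note, to, from, body) and the summarize function are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# A small document using made-up tags; XML itself gives them no meaning.
DOCUMENT = """<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend</body>
</note>"""


def summarize(xml_text):
    """Parse the document and turn its data into one display line."""
    root = ET.fromstring(xml_text)
    sender = root.findtext("from")
    receiver = root.findtext("to")
    body = root.findtext("body")
    return f"{sender} -> {receiver}: {body}"


print(summarize(DOCUMENT))
```

The document only carries the data; how it is displayed is decided entirely by the program that reads it.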

1.1 What is XML?

XML stands for Extensible Markup Language. XML is a markup language much like HTML. XML is designed to store and carry data, not to display it. XML tags are not predefined; we define our own tags. XML is designed to be self-descriptive. XML is a W3C Recommendation.

1.2 Advantages of XML

It is simultaneously a human-readable and machine-readable format.
It supports Unicode, allowing almost any information in any written human language to be communicated.
It can represent the most general computer science data structures: records, lists and trees.
The strict syntax and parsing requirements make the necessary parsing algorithms simple, efficient, and consistent.
XML is heavily used as a format for document storage and processing, both online and offline.
It is based on international standards.
The hierarchical structure is suitable for most types of documents.
It manifests as plain text files, which are less restrictive than proprietary document formats.
It is platform-independent, thus relatively immune to changes in technology.
An XML document is plain text, human readable, and easy to edit and view.
An XML document has a tree structure, which is powerful enough to express complex data and simple enough to understand.
XML documents are language neutral. For example, a Java program can generate an XML document that can be parsed by a program written in C++ or Perl.
XML files are operating system independent.
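To illustrate the language-neutral point, here is a small sketch in which a Python program plays the role of the producer. The function name build_address and the element names are assumptions chosen for illustration; any XML parser in any language could read the resulting text.

```python
import xml.etree.ElementTree as ET


def build_address(name, company):
    """Build a tiny XML tree in memory and serialize it to plain text."""
    address = ET.Element("Address")
    ET.SubElement(address, "Name").text = name
    ET.SubElement(address, "Company").text = company
    return ET.tostring(address, encoding="unicode")


text = build_address("Harsh", "Motorola")
print(text)

# Any consumer can parse the same text back into a tree.
parsed = ET.fromstring(text)
print(parsed.findtext("Company"))
```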

1.3 Differences between XML and HTML

XML and HTML have different goals and are designed for different purposes. Some people think that XML is an advanced version of HTML that has come to replace it. That is not the case: both remain in use because they serve different purposes.

Some of the differences between XML and HTML:

XML (Extensible Markup Language)        HTML (Hyper Text Markup Language)
XML is designed to store data           HTML is designed to display data
XML focuses on what the data is         HTML focuses on how the data looks
XML allows us to define our own tags    HTML has a predefined set of tags
XML is used to transport data           HTML is used to format and display data

1.4 XML Related Technologies

DTD (Document Type Definition) and XML Schemas are used to define the legal XML tags and their attributes.
CSS (Cascading Style Sheets) describes how HTML or XML is presented in a browser.
XSLT (Extensible Stylesheet Language Transformations) and XPath are used to transform one form of XML into another.
DOM (Document Object Model), SAX (Simple API for XML), and JAXP (Java API for XML Processing) are all APIs for XML parsing.

2. How Can XML be Used?

XML can be used in many aspects of web development, often to simplify data storage and sharing.

XML Simplifies Data Sharing: XML data is stored in plain text format. This provides a software and hardware independent way of storing data, and makes it much easier to create data that different applications can share.

XML Simplifies Data Transport: One of the most time-consuming challenges for developers is exchanging data between incompatible systems over the Internet. Exchanging data as XML greatly reduces this complexity, since the data can be read by different, otherwise incompatible applications.

XML Simplifies Platform Changes: XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new applications, or new browsers without losing data.

XML Makes our Data More Available: Since XML is independent of hardware and software applications, it can make your data more available and useful.

XML is used to Create New Internet Languages: A lot of new Internet languages are created with XML.

2.2 XML Benefits

XML improves the functionality of web technologies through the use of a more flexible and adaptable means to identify information.
XML is a meta-language. That is, it is a language that describes other languages.
XML provides the facility to define tags and the structural relationships between them.
The extensibility and structured nature of XML allow it to be used for communication between different systems.

2.3 Uses of XML

Meta Content: To describe the contents of a document.
Messaging: Where applications or organizations exchange data between them.
Database: Data extracted from a database can be preserved with its original information and used by more than one application in different ways.

2.4 XML Tags

The tags used in XML look like HTML tags. They are formed by a name enclosed in < > for an opening tag and </ > for a closing tag. The difference is that XML tags are not predefined as they are in HTML.

<Composer> is an example of an opening tag. In XML every opening tag must have a closing tag; in this case the closing tag is </Composer>.

Start Tag
The beginning of every non-empty XML element is marked by a start-tag. An example of a start-tag: <Composer>

End Tag
The end of every non-empty XML element is marked by an end-tag. An example of an end-tag: </Composer>

Element Content
The text between the start-tag and the end-tag is called the element's content. For example, in <Composer>This is my home page</Composer> the element content is: This is my home page

Empty Element Tag
If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. An empty-element tag takes a special form:

<BR/>

OR

<BR></BR>

Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.

One common convention is to write HTML tags in upper case and XML tags in lower case. Whatever convention you choose, remember that XML is case sensitive: <Composer>, <composer> and <COMPOSER> are three different tags in XML.

Tag names should begin with a letter, an underscore (_) or a colon (:), followed by any combination of letters, digits, periods (.), colons, underscores, or hyphens (-), with no white space. Names beginning with any form of "xml" (in any mix of upper and lower case) are reserved and should not be used.
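The naming rule above can be sketched as a small check. This is a simplification that only accepts ASCII letters and digits, whereas the XML specification also allows many non-ASCII letters; the function name is_plausible_tag_name is a hypothetical helper, not a library call.

```python
import re

# First character: letter, underscore or colon.
# Remaining characters: letters, digits, periods, colons, underscores, hyphens.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_plausible_tag_name(name):
    """Apply the simplified, ASCII-only version of the naming rule."""
    if name.lower().startswith("xml"):
        return False  # names starting with any form of "xml" are reserved
    return NAME.fullmatch(name) is not None


for candidate in ["Composer", "_id", "first-name", "2nd", "first name", "XmlData"]:
    print(candidate, is_plausible_tag_name(candidate))
```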

3. XML Editor

An xml editor is a markup language editor with added functionality to facilitate the editing of xml. This can be done using a plain text editor, with all the code visible, but xml editors have added facilities like tag completion and menus and buttons for tasks that are common in xml editing, based on data supplied with document type definition (DTD) or the xml tree.

An XML editor should be able to:

Add closing tags to your opening tags automatically.
Force you to write valid XML.
Verify your XML against a DTD.
Verify your XML against a Schema.
Color-code your XML syntax.

Here are some XML editors:

EmEditor
XML Notepad
XML Cooktop
XML Pro
XML Spy
eNotepad

If you use Notepad for XML editing, you will soon run into problems. Notepad does not know that you are writing XML, so it cannot assist you. You are likely to make many errors, and as your XML documents grow larger they become hard to manage. Today XML is an important technology, playing an increasingly critical role in new web development. When you start working with XML, you will soon find that it is better to edit XML documents with a professional XML editor. Good XML editors help you write error-free XML documents, validate your text against a DTD or a schema, force you to stick to a valid XML structure, and add closing tags to your opening tags automatically.

3.1 EmEditor

Why is EmEditor Professional the Best Text Editor?

1. EmEditor can Launch very Quickly, Almost Instantaneously

You are going to view or edit a large quantity of files every day, but you don't want to wait for many seconds just to view a file! Unfortunately, many programs, including word processors and text editors, require you to wait several seconds before you can start using them!

This doesn't make sense! You want to increase productivity by using a text editor, but waiting so long every time doesn't justify your using a text editor. You should not wait more than one second. That's why EmEditor has been so popular for such a long time.

2. Extendable with Plug-ins!

EmEditor exposes many APIs, so programmers can easily write plug-ins that fit their needs. Features such as Spelling, Word Count, Explorer, Web Preview, and Compare Files, etc. are designed as plug-ins.

3. Powerful Macros with your Favorite Script Language!

You can write a macro to do almost whatever you want within EmEditor! The macros are based on the Windows Scripting Host (WSH) engine, so you can use all of the powerful, robust objects available under the Windows Scripting Host. You can program macros with popular script languages including JavaScript and VB Script. You can even program with Perl Script, Python, PHP Script, Ruby, and other Active Script languages as long as the script engines you want to use are installed on your system.

4. Unicode Support!

EmEditor supports Unicode natively, and in fact, the whole program is built as a Unicode application. EmEditor allows you to open a file with any encoding supported in the Windows system, and you can easily convert from one encoding to another within EmEditor. EmEditor allows you to open Unicode file names, and allows you to search for Unicode characters. With EmEditor plug-ins, EmEditor allows you to convert a selected text to HTML/XML Character Reference or Universal Character Names, and vice versa.

5. Easy and Intuitive Design with Tabbed Windows!

EmEditor is designed for Windows XP, thus frequently used shortcut keys are similar to other Windows applications, such as Copy, Cut, Paste, Undo, and Redo. In addition, EmEditor uses tabbed windows similar to Slim Browser, Internet Explorer, Firefox and other tabbed browser applications. This allows you to open multiple documents in one window and jump between them quickly and easily.

6. Other Features!

There are many other useful Features that are Worth Mentioning:

Keyword highlighting.
Regular expression search and highlighting.
External tools.
Plug-ins using custom bars.
Keyboard, toolbar, menu, font and color customization.
Drag and drop.
Auto save/backup.
Clickable URLs and e-mail addresses.
The window can be split into a maximum of 4 panes.
Can define multiple configurations and associate file extensions.
Can save backups to the recycle bin.
Can open recently used files from the tray icon on the taskbar.
Shortcut keys to insert accent marks and special characters.
Application error handler support.
64-bit edition available.
Windows Vista ready.
Fast e-mail support.

3.2 XML SPY

XML Spy is an integrated development environment for XML that includes all major aspects of XML in one powerful and easy-to-use product. Its features include:

Easy to use.
Syntax coloring.
Automatic tag completion.
Automatic well-formedness check.
Easy switching between text view and grid view.
Built-in DTD and/or Schema validation.
Built-in graphical XML Schema designer.
Powerful conversion utilities.
Database import and export.
Built-in templates for most XML document types.
Built-in XPath analyzer.
Full SOAP and WSDL capabilities.
Powerful project management.

3.3 XML Syntax Rules

XML, as we have seen, is a formal specification for markup languages. Every formal language specification has an associated syntax. XML documents comprise two basic components:

Data: The actual content.
Markup: Meta-information about the data that describes it.

The syntax rules of xml are very simple and logical. The rules are easy to learn, and easy to use.

1. Every Element must have a Closing Tag

In HTML, some elements may be written without a closing tag:

<p>This is a paragraph

<p>This is another paragraph

In XML, it is illegal to omit the closing tag. All elements must have a closing tag:

<p>This is a paragraph</p>

<p>This is another paragraph</p>

An XML document usually begins with an XML declaration, which declares the document to be an XML document and may specify some other optional attributes.

<?xml version="1.0"?>

The declaration above identifies the document as XML version 1.0. It is not an element, so it has no closing tag.

2. XML Tags are Case Sensitive

XML elements are defined using xml tags.

XML tags are case sensitive. With XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must be written with the same case.

3. XML Elements must be Properly Nested

In HTML, you might see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In xml, all elements must be properly nested within each other.

<b><i>This text is bold and italic</i></b>

4. XML Documents must have a Root Element

XML documents must contain one element that is the parent of all other elements. This element is called the root element.

<?xml version="1.0"?>

<Root><Child><Subchild>.....</Subchild></Child></Root>
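The rules so far (closing tags, case sensitivity, proper nesting, a single root element) are exactly what a parser checks. As a hedged sketch, the hypothetical helper is_well_formed below hands a string to Python's standard-library parser and reports whether parsing succeeded; it checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = [
    "<b><i>This text is bold and italic</i></b>",  # properly nested
    "<b><i>This text is bold and italic</b></i>",  # improperly nested
    "<p>This is a paragraph",                      # missing closing tag
    "<Letter>Hello</letter>",                      # case mismatch
    "<Root/><Other/>",                             # more than one root element
]
for sample in samples:
    print(is_well_formed(sample), sample)
```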

5. XML Attribute Values must be Quoted

<?xml version="1.0" ?>
<Address>
<Bangalore><Name Nickname="12">Sumana</Name><Company>Testing</Company></Bangalore>
<Mysore><Name>Sumith</Name><Company EmpID="1675">Mac Studio</Company></Mysore>
</Address>

XML elements can have attributes in name/value pairs just like in HTML.
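As an illustration of attributes as name/value pairs, this sketch parses the Address document above with Python's standard library and reads the attribute values. The path expressions are the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

ADDRESS = """<Address>
<Bangalore><Name Nickname="12">Sumana</Name><Company>Testing</Company></Bangalore>
<Mysore><Name>Sumith</Name><Company EmpID="1675">Mac Studio</Company></Mysore>
</Address>"""

root = ET.fromstring(ADDRESS)

# Attribute values always arrive as strings, whatever they look like.
nickname = root.find("Bangalore/Name").get("Nickname")
employee = root.find("Mysore/Company").attrib

# An attribute that was not written on the element is simply absent.
missing = root.find("Mysore/Name").get("Nickname")

print(nickname, employee, missing)
```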

6. Entity References

Some characters have a special meaning in xml. If you place a character like "<" inside an xml element, it will generate an error because the parser interprets it as the start of a new element.

To avoid this error, replace the "<" character with an entity reference.

There are five predefined entity references in XML:

Character               Entity Reference
< (less than)           &lt;
> (greater than)        &gt;
& (ampersand)           &amp;
' (apostrophe)          &apos;
" (quotation mark)      &quot;

Note: Only the characters "<" and "&" are strictly illegal in xml. The greater than character is legal, but it is a good habit to replace it.
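A short sketch of entity references in practice, using the escape and unescape helpers from Python's xml.sax.saxutils module. By default escape handles only &, < and >; the QUOTES mapping passed here is an addition so that the two quote characters are replaced as well.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = 'if a < b & b > c then say "hi"'

# Extra replacements for the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, QUOTES)
print(escaped)

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring(f"<rule>{escaped}</rule>").text

# unescape needs the reverse mapping for the quote entities.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
```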

7. Comments in XML

Comments should not appear on the first line or otherwise above the xml declaration for xml processor compatibility. The string "--" (double-hyphen) is not allowed (as it is used to delimit comments), and entities must not be recognized within comments.

The Syntax for writing Comments in xml is Similar to that of HTML.

<!-- This is a comment -->

8. White-Space is preserved in XML

HTML collapses multiple white-space characters into one single white-space. With XML, the white-space in a document is not collapsed.
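A quick way to see this, sketched with Python's standard-library parser: the text of an element comes back with every space and line break intact. The collapsed variable only imitates what an HTML renderer would show and is included for contrast.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<greeting>Hello     world\n  again</greeting>")

# The parser hands back the content exactly as written.
preserved = element.text

# Imitation of HTML-style collapsing, for comparison only.
collapsed = " ".join(preserved.split())

print(repr(preserved))
print(repr(collapsed))
```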

3.4 XML Viewing

XML files can be viewed in all major browsers.

[Note: Don't expect XML files to be displayed as HTML pages.]

<?xml version="1.0"?>
<Address>
<Name>Harsh</Name>
<Company>Motorola</Company>
</Address>

4. XML Documents

XML documents are similar to HTML documents. They contain information and markup tags that define the information, and they are saved as plain text. The name of an XML document conventionally has the .xml extension, for example "abc.xml". A data object is an XML document if it is well formed. A well-formed XML document may in addition be valid if it meets certain further constraints. Well-formed XML documents contain text and XML tags that conform to the XML syntax. Valid XML documents must be well formed and are additionally checked against a document type definition (DTD). A DTD is a set of rules that defines which tags may appear in an XML document. DTDs also describe the structure of a document.

4.1 Well Formed XML

Well-formed XML documents simply mark up pages with descriptive tags. You don't need to describe or explain what these tags mean. In other words, a well-formed XML document does not need a DTD, but it must conform to the XML syntax rules. If all tags in a document are correctly formed and follow the XML syntax rules, then the document is considered well formed. Some of the rules are given below.

1. XML documents must contain at least one element.
Well formed: <title>Software</title>
Not well formed: Software

2. XML documents must contain a unique opening and closing tag that contains the whole document, forming what is called a root element.
Well formed: <title>DEL</title>
Not well formed: <title>DEL

3. Tags in XML are case sensitive: <Author> and <AUTHOR> are not the same. The XML declaration (<?xml ... ?>) must be all lowercase, while keywords in DTDs must be all uppercase, such as ELEMENT, ATTLIST, #REQUIRED, #IMPLIED, NMTOKEN, ID, etc. Your own elements and attributes may use any case you choose, as long as you are consistent.
Well formed: <Author>Information</Author>
Not well formed: <Author>Information</AUTHOR>

4. Attribute values must always be quoted (as opposed to HTML).
Well formed: <Name id="100">Asini</Name>
Not well formed: <Name id=100>Asini</Name>

4.2 Valid XML

Valid XML is a more rigid, formal form of XML. All XML documents are well-formed documents; some XML documents are additionally valid. Valid documents must conform not only to the syntax, but also to the DTD.

In the case of markup languages defined by XML, the DTD provides the grammatical structure that brings order to the elements of the language. The main difference between valid and well-formed XML is that valid XML requires a DTD, whereas well-formed XML does not.

4.3 XML Parser

An XML parser is a processor that reads an XML document and determines the structure and properties of the data. If the parser goes beyond the XML rules for well-formedness and validates the document against a DTD, the parser is said to be a "validating" parser. A validating XML parser also checks the XML syntax and reports errors, so it can tell you whether a document is both well formed and valid. In a browser, the XML parser reads XML and converts it into an XML DOM object that can be accessed with JavaScript. Most browsers have a built-in XML parser.
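The text describes browser parsers and JavaScript; as a rough equivalent outside the browser, the sketch below uses Python's xml.dom.minidom, which exposes a similar DOM-style object. The sample document is an assumption reused from the Address example.

```python
from xml.dom.minidom import parseString

dom = parseString("<Address><Name>Harsh</Name><Company>Motorola</Company></Address>")

# The document element is the root of the DOM tree.
root = dom.documentElement

# Walk the child nodes and keep only element nodes.
child_names = [node.tagName for node in root.childNodes
               if node.nodeType == node.ELEMENT_NODE]

# The text inside an element is itself a child node of that element.
first_name = root.getElementsByTagName("Name")[0].firstChild.data

print(root.tagName, child_names, first_name)
```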

4.4 Prolog

The prolog refers to the information that appears before the start tag of the document or root element. It includes information that applies to the document as a whole, such as character encoding, document structure, and style sheets.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="show_book.xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">

XML Declaration

The XML declaration typically appears as the first line in an XML document. The XML declaration is not required; however, if used, it must be the first line in the document and no other content or white space can precede it. The XML declaration consists of the following:

The Version Number, <?xml version="1.0"?>. This is mandatory. Although the number might change for future versions of XML, 1.0 is the version in general use.

The Encoding Declaration, <?xml version="1.0" encoding="UTF-8"?>. This is optional. If used, the encoding declaration must appear immediately after the version information in the XML declaration, and must contain a value representing an existing character encoding.

An XML declaration can also contain a Standalone Declaration, for example, <?xml version="1.0" encoding="UTF-8" standalone="yes"?>. Like the encoding declaration, the standalone declaration is optional. If used, the standalone declaration must appear last in the XML declaration.

Encoding Declaration

The encoding declaration identifies which encoding is used to represent the characters in the document. Although XML parsers can determine automatically whether a document uses the UTF-8 or UTF-16 Unicode encoding, this declaration should be used in documents that use other encodings. For example, the following is the encoding declaration for a document that uses ISO-8859-1 (Latin 1).

Example: <?xml version="1.0" encoding="ISO-8859-1"?>

Standalone Declaration

The standalone declaration indicates whether a document relies on information from an external source, such as an external document type definition (DTD), for its content. If the standalone declaration has a value of "yes",

Example: <?xml version="1.0" standalone="yes"?>

the document declares that it does not depend on external declarations, and a parser may report an error if the document references an external DTD or external entities. Leaving out the standalone declaration produces the same result as including a standalone declaration of "no". The XML parser will accept external resources, if there are any, without reporting an error.

Comments

Comments begin with <!-- and end with -->. Comments can appear in the document prolog, including the document type definition (DTD); after the document; or in the textual content. Comments cannot appear within attribute values, and they cannot appear inside tags.
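To make the encoding declaration described above concrete, the following sketch stores the same text as ISO-8859-1 bytes twice, once with and once without a declaration. The sample word and the helper name parses are assumptions; the behaviour shown is that of Python's standard-library parser, which assumes UTF-8 when no encoding is declared.

```python
import xml.etree.ElementTree as ET

CITY = "Malm\u00f6"  # contains one non-ASCII character


def parses(data):
    """Return True when the bytes parse as XML without an error."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


latin1_body = f"<city>{CITY}</city>".encode("iso-8859-1")

# With the declaration, the parser knows how to decode the bytes.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body
decoded = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8 and rejects the Latin 1 byte.
undeclared_ok = parses(latin1_body)

print(decoded == CITY, undeclared_ok)
```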

5. Document Type Definition (DTD)

An XML DTD, or document type definition, defines the formal grammar of an XML-based markup language. A DTD contains the list of elements that can occur in the markup, the list of attributes of each element, the possible attribute values or value types, and the content model that specifies the allowed nesting of elements.

This Information can be used in Several Ways:

One can use a DTD to validate a document, i.e., to check whether the document follows the formal rules defined in the DTD. In this way one can detect errors (such as misspelled element names, wrong attribute names or values, or wrongly nested elements) that would otherwise be difficult to notice.

One can use a DTD just to provide an accurate description of a markup language. Here much depends on the markup language itself, as not all XML applications can be accurately described using an XML DTD.

One can use a DTD to define character entities, specify default attributes and bind elements to XML namespaces.

The main purpose of a DTD is to define the legal building blocks of an xml document. You can store a DTD at the beginning of a document or externally in a separate file.

All the xml documents (and HTML documents) are made up by the following building blocks:

Elements
Attributes
Entities
PCDATA
CDATA

5.1 DTD Elements

Elements are the main building blocks in the document structure. The elements represent the logical components of a document and how they are arranged into a hierarchical (tree) structure.

Syntax: <!ELEMENT Name Content>

In a DTD, elements are declared with an ELEMENT declaration.

Declaring Elements

In a DTD, XML elements are declared with an element declaration with the following syntax.

<!ELEMENT Element-Name Category>

OR

<!ELEMENT Element-Name (Element-Content)>

Empty Elements

Empty elements are declared with the category keyword EMPTY.

<!ELEMENT Element-Name EMPTY>

Elements with Parsed Character Data

Elements with only parsed character data are declared with #PCDATA inside parentheses.

<!ELEMENT Element-Name (#PCDATA)>

Example: <!ELEMENT from (#PCDATA)>

Elements with any Contents

Elements declared with the category keyword ANY can contain any combination of parsable data:

<!ELEMENT Element-Name ANY>

Example: <!ELEMENT note ANY>

Elements with Children (Sequences)

Elements with one or more children are declared with the names of the child elements inside parentheses.

<!ELEMENT Element-Name (Child1)>

OR

<!ELEMENT Element-Name (Child1, Child2,...)>

Example: <!ELEMENT note (to, from, heading, body)>

When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the "note" element is:

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

Declaring Only one Occurrence of an Element

<!ELEMENT Element-Name (Child-Name)>

Example: <!ELEMENT Note (Message)>

The example above declares that the child element "Message" must occur once, and only once, inside the "Note" element.

Declaring Minimum one Occurrence of an Element

<!ELEMENT Element-Name (Child-Name+)>

Example: <!ELEMENT Note (Message+)>

The + sign in the example above declares that the child element "Message" must occur one or more times inside the "Note" element.

Declaring Zero or More Occurrences of an Element

<!ELEMENT Element-Name (Child-Name*)>

Example: <!ELEMENT Note (Message*)>

The * sign in the example above declares that the child element "Message" can occur zero or more times inside the "Note" element.

Declaring Zero or One Occurrences of an Element

<!ELEMENT Element-Name (Child-Name?)>

Example: <!ELEMENT Note (Message?)>

The ? sign in the example above declares that the child element "Message" can occur zero or one time inside the "Note" element.

Declaring Either/or Content

Example: <!ELEMENT Note (To, From, Header, (Message | Body))>

The example above declares that the "Note" element must contain a "To" element, a "From" element, a "Header" element, and either a "Message" or a "Body" element.

Declaring Mixed Content

Example: <!ELEMENT Note (#PCDATA|To|From|Header|Message)*>

The example above declares that the "Note" element can contain zero or more occurrences of parsed character data, "To", "From", "Header", or "Message" elements.
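The children-only content models above (sequences, the +, * and ? indicators, and either/or groups) behave much like regular expressions over the list of child element names. The sketch below makes that analogy concrete. It is an illustration only, not a DTD validator: the helper names model_to_regex and children_match are hypothetical, and mixed content with #PCDATA is not handled.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-leading pattern for an element name in a content model.
NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Translate a children-only DTD content model into a compiled regex."""
    # Each child name becomes the token "name;". Parentheses, |, ?, * and +
    # already mean the same thing in a regular expression; commas (sequence)
    # and white space are simply dropped.
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    return re.compile(re.sub(r"[\s,]", "", pattern))


def children_match(element, model):
    """Check the element's direct children, in order, against the model."""
    child_names = "".join(child.tag + ";" for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


note = ET.fromstring("<Note><To/><From/><Header/><Body/></Note>")
print(children_match(note, "(To, From, Header, (Message | Body))"))
print(children_match(ET.fromstring("<Note/>"), "(Message+)"))
```

A real validating parser does considerably more (attribute checks, entity handling, nested declarations), so this is best read as a way to build intuition for the occurrence indicators.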

5.2 Types of Elements

There are three primary types of elements:

Simple elements: Elements that contain only text, or "parsed character data" (represented as #PCDATA in your DTD).
Compound elements: Elements that contain other elements, and sometimes a mix of PCDATA and other elements.
Standalone elements: Elements that do not contain any PCDATA or other elements.

5.3 Attributes

Attributes allow an author to attach extra information to the elements in a document. One important difference from the elements is that the attributes cannot contain elements and there is no "Sub-attribute".

In a DTD, attributes are declared with an ATTLIST declaration.

Declaring Attributes

An attribute declaration has the following syntax

<!ATTLIST element-name attribute-name attribute-type default-value>

DTD Example: <!ATTLIST Payment type CDATA "Check">

XML Example: <Payment type="Check" />

The attribute-type can be one of the following:

Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name token
NMTOKENS        The value is a list of valid XML name tokens
ENTITY          The value is an entity
ENTITIES        The value is a list of entities
NOTATION        The value is a name of a notation
xml:            The value is a predefined xml value

The default-value can be one of the following:

Value           Explanation
value           The default value of the attribute
#REQUIRED       The attribute is required
#IMPLIED        The attribute is not required
#FIXED value    The attribute value is fixed

Default Attribute Value

DTD

<!ELEMENT Square EMPTY>

<!ATTLIST Square width CDATA "0">

Valid XML

<Square width="100" />

In the example above, the "Square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.
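As a sketch of how a default value shows up in practice, the snippet below places the Square declarations in an internal DTD and parses two documents with Python's standard-library parser. That parser does not validate, but it does read the internal DTD and supply declared default attribute values; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring the element and its attribute default.
DTD = """<!DOCTYPE Square [
  <!ELEMENT Square EMPTY>
  <!ATTLIST Square width CDATA "0">
]>
"""

explicit = ET.fromstring(DTD + '<Square width="100" />')
defaulted = ET.fromstring(DTD + "<Square />")

# The first element carries the written value, the second the DTD default.
print(explicit.get("width"))
print(defaulted.get("width"))
```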

#REQUIRED

Syntax: <!ATTLIST Element-Name Attribute-Name Attribute-Type #REQUIRED>

Example

DTD

<!ATTLIST Person Number CDATA #REQUIRED>

Valid XML

<Person Number="5677" />

Invalid XML

<Person />

Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.

#IMPLIED

Syntax :<!ATTLIST Element-Name Attribute-Name Attribute-Type #IMPLIED>

Example

DTD

<!ATTLIST Contact fax CDATA #IMPLIED>

Valid xml

<Contact fax="555-667788" />

Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you don't have an option for a default value.

#FIXED

Syntax: <!ATTLIST Element-Name Attribute-Name Attribute-Type #FIXED "value">

Example

DTD

<!ATTLIST Sender Company CDATA #FIXED "Microsoft">

Valid xml

<Sender Company="Microsoft" />

Invalid xml

<Sender Company="Software" />

Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, the xml parser will return an error.

Enumerated Attribute Values

Syntax: <!ATTLIST Element-Name Attribute-Name (En1|En2|..) Default-value>

Example

DTD: <!ATTLIST Payment type (Check | Cash) "Cash">

XML Example: <Payment type="Check" /> OR <Payment type="Cash" />

Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.

5.4 Entities

An entity is a name that represents a special character, additional text or a file. There are two kinds of entities in XML documents:

1. General Entities: Used in the content of documents. References to general entities start with & and end with a semicolon (;).

2. Parameter Entities: Used in a document's DTD. References to parameter entities start with % and end with a semicolon (;).
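A brief sketch of a general entity at work: the internal DTD below declares two entities, and Python's standard-library parser replaces each reference with the declared text while parsing. The entity names COM and SF and the Film document are assumptions for illustration. Because entity expansion can be abused, documents from untrusted sources deserve extra care.

```python
import xml.etree.ElementTree as ET

FILM = """<!DOCTYPE Film [
  <!ENTITY COM "Comedy">
  <!ENTITY SF "Science Fiction">
]>
<Film><Title>Tootsie</Title><Genre>&COM;</Genre></Film>"""

# The reference &COM; is expanded to its replacement text by the parser.
genre = ET.fromstring(FILM).findtext("Genre")
print(genre)


def references_known_entities(xml_text):
    """Return False when the text uses an entity that was never declared."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(references_known_entities("<Genre>&SF;</Genre>"))
```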

6. Why we Need a DTD?

XML is a language specification. Based on this specification, individuals and organizations develop their own markup languages which they then use to communicate information.

An application that receives such a document needs to know how the document is structured and needs to check that the content is indeed compliant with that structure.

The Document Type Definition also known as DTD holds information about the structure of an xml document.

6.1 Why Use a DTD?

XML provides an application-independent way of sharing data.
With a DTD, different groups of people can agree on a common DTD for interchanging data.
Your application can use a standard DTD to verify that data that you receive from the outside world is valid.
The DTD can also be used to verify your own data.

6.2 Internal DTDs

An internal DTD is placed inside the document type declaration (DOCTYPE). A DTD declared this way applies only to the document that contains it.

Syntax: <!DOCTYPE Root-Element [DTD Specification]>

Examples

1.
<?xml version="1.0"?>
<!DOCTYPE Note [
  <!ELEMENT Note (To, From, Heading, Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Heading (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<Note>
  <To>Tove</To>
  <From>Jani</From>
  <Heading>Reminder</Heading>
  <Body>Don't Forget Me This Weekend</Body>
</Note>

2.
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (to, from, subject, text)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<message>
  <to>Dave</to>
  <from>Susan</from>
  <subject>Reminder</subject>
  <text>Don't forget to buy milk on the way home.</text>
</message>

3.
<?xml version="1.0"?>
<!DOCTYPE Tutorials [
  <!ELEMENT Tutorials (Tutorial)+>
  <!ELEMENT Tutorial (Name, URL)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT URL (#PCDATA)>
]>
<Tutorials>
  <Tutorial>
    <Name>xml Tutorial</Name>
    <URL>www.Test.COM</URL>
  </Tutorial>
  <Tutorial>
    <Name>HTML Tutorial</Name>
    <URL>www.workhard.com</URL>
  </Tutorial>
</Tutorials>

4.
<?xml version="1.0"?>
<!DOCTYPE Address [
  <!ELEMENT Address (Street, City, State, Zip)>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT Zip (#PCDATA)>
]>
<Address>
  <Street>12 City Road</Street>
  <City>Melbourne</City>
  <State>Victoria</State>
  <Zip>8001</Zip>
</Address>


5.
<?xml version="1.0"?>
<!DOCTYPE Note [
  <!ELEMENT Note (To, From, Heading, Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Heading (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<Note>
  <To>Yashaswi</To>
  <From>Jan</From>
  <Heading>Head Lines</Heading>
  <Body>Software</Body>
</Note>

6.
<?xml version="1.0"?>
<!DOCTYPE Film [
  <!ENTITY COM "Comedy">
  <!ENTITY SF "Science Fiction">
  <!ELEMENT Film (Title, Genre, Year)+>
  <!ELEMENT Title (#PCDATA)>
  <!ATTLIST Title id CDATA #IMPLIED>
  <!ELEMENT Genre (#PCDATA)>
  <!ELEMENT Year (#PCDATA)>
]>
<Film>
  <Title id="1">Tootsie</Title>
  <Genre>&COM;</Genre>
  <Year>1982</Year>
  <Title id="2">Jurassic Park</Title>
  <Genre>&SF;</Genre>
  <Year>1993</Year>
</Film>

7.
<?xml version="1.0"?>
<!DOCTYPE People_List [
  <!ELEMENT People_List (Person*)>
  <!ELEMENT Person (Name, Birthdate?, Gender?, Social_Security_Number?)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Birthdate (#PCDATA)>
  <!ELEMENT Gender (#PCDATA)>
  <!ELEMENT Social_Security_Number (#PCDATA)>
]>
<People_List>
  <Person>
    <Name>Aditya</Name>
    <Birthdate>27/11/2008</Birthdate>
    <Gender>Male</Gender>
  </Person>
</People_List>

8.
<?xml version="1.0"?>
<!DOCTYPE Newspaper [
  <!ELEMENT Newspaper (Article+)>
  <!ELEMENT Article (Headline, Byline, Lead, Body, Notes)>
  <!ELEMENT Headline (#PCDATA)>
  <!ELEMENT Byline (#PCDATA)>
  <!ELEMENT Lead (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
  <!ELEMENT Notes (#PCDATA)>
  <!ATTLIST Article Author CDATA #REQUIRED>
  <!ATTLIST Article Editor CDATA #IMPLIED>
  <!ATTLIST Article Date CDATA #IMPLIED>
  <!ATTLIST Article Edition CDATA #IMPLIED>
  <!ENTITY Newspaper "Times of India">
  <!ENTITY Publisher "Hasini">
  <!ENTITY Copyright "Copyright 2010 SOFTWARE ">
]>
<Newspaper>
  <Article Author="Yashaswi" Editor="Anurag" Date="20/02/2010" Edition="First">
    <Headline>Temptation 2010</Headline>
    <Byline>New Year</Byline>
    <Lead>No &Publisher; Matter</Lead>
    <Body>&Newspaper;</Body>
    <Notes>All The Best The New Year&Copyright;</Notes>
  </Article>
</Newspaper>

9.
<?xml version="1.0"?>
<!DOCTYPE Parts [
  <!ELEMENT Parts (Title?, Part*)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Part (Item, Manufacturer, Model, Cost)+>
  <!ATTLIST Part type (Computer|Auto|Airplane) #IMPLIED>
  <!ELEMENT Item (#PCDATA)>
  <!ELEMENT Manufacturer (#PCDATA)>
  <!ELEMENT Model (#PCDATA)>
  <!ELEMENT Cost (#PCDATA)>
]>
<Parts>
  <Title>Main Heading</Title>
  <Part type="Computer">
    <Item></Item>
    <Manufacturer></Manufacturer>
    <Model></Model>
    <Cost></Cost>
  </Part>
</Parts>

10.
<?xml version="1.0"?>
<!DOCTYPE Videos [
  <!ELEMENT Videos (Music+)>
  <!ELEMENT Music (Title, Artist+)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Artist (#PCDATA)>
]>
<Videos>
  <Music>
    <Title>Video Title1</Title>
    <Artist>Artist1</Artist>
  </Music>
  <Music>
    <Title>Video Title2</Title>
    <Artist>Artist2</Artist>
    <Artist>Artist3</Artist>
  </Music>
</Videos>


11.
<?xml version="1.0"?>
<!DOCTYPE Document [
  <!ELEMENT Document (Customer)*>
  <!ELEMENT Customer (Name, Date, Orders)>
  <!ELEMENT Name (Last_Name, First_Name)>
  <!ELEMENT Last_Name (#PCDATA)>
  <!ELEMENT First_Name (#PCDATA)>
  <!ELEMENT Date (#PCDATA)>
  <!ELEMENT Orders (Item)*>
  <!ELEMENT Item (Product, Number, Price)>
  <!ELEMENT Product (#PCDATA)>
  <!ELEMENT Number (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
]>
<Document>
  <Customer>
    <Name>
      <Last_Name>Kaif</Last_Name>
      <First_Name>Kat</First_Name>
    </Name>
    <Date>20/02/2010</Date>
    <Orders>
      <Item>
        <Product></Product>
        <Number></Number>
        <Price></Price>
      </Item>
    </Orders>
  </Customer>
</Document>

12.
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, chapter+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT chapter (heading, paragraph*)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
  <!ATTLIST chapter language CDATA #REQUIRED>
]>
<book>
  <title/>
  <chapter language="markup">
    <heading>Introduction to xml</heading>
    <paragraph>Extensible markup language, used to describe the data</paragraph>
  </chapter>
</book>
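When writing internal DTDs by hand, a common slip is a tag in the document body that does not match any declared element. The following is a minimal sketch of a partial check, assuming Python's standard xml.parsers.expat module (not something the slides use). It only compares declared element names with the names actually used; it is not full DTD validation, and the helper name undeclared_elements and the misspelled <Manufacture> tag are hypothetical.

```python
from xml.parsers import expat


def undeclared_elements(document: str) -> set:
    """Return element names used in the body but not declared
    by an <!ELEMENT ...> declaration in the internal DTD."""
    declared, used = set(), set()
    parser = expat.ParserCreate()
    # Called once for each <!ELEMENT name model> declaration.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once for each start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse(document, True)
    return used - declared


DOC = """<?xml version="1.0"?>
<!DOCTYPE Parts [
  <!ELEMENT Parts (Part*)>
  <!ELEMENT Part (Item, Manufacturer)>
  <!ELEMENT Item (#PCDATA)>
  <!ELEMENT Manufacturer (#PCDATA)>
]>
<Parts><Part><Item></Item><Manufacture></Manufacture></Part></Parts>"""

# The body uses <Manufacture>, but only Manufacturer is declared.
missing = undeclared_elements(DOC)
```

A validating parser would report this, along with content-model and attribute errors that this sketch does not look at.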


6.3 External DTD

An external DTD is one that resides in a separate document. The DTD is saved as a separate file with the extension .dtd and is then referenced from within the XML document.

Syntax: <!DOCTYPE Root-Element SYSTEM "File-Name">

Examples

1.
gold.dtd:
<!ELEMENT JewelleryShop (Gold+)>
<!ELEMENT Gold (Chain+, Bangles+, Earings+, Necklace?)>
<!ELEMENT Chain (Longchain?, Shortchain+)>
<!ELEMENT Longchain (#PCDATA)>
<!ELEMENT Shortchain (#PCDATA)>
<!ELEMENT Bangles (#PCDATA)>
<!ELEMENT Earings (#PCDATA)>
<!ELEMENT Necklace (#PCDATA)>

XML document:
<?xml version="1.0"?>
<!DOCTYPE JewelleryShop SYSTEM "gold.dtd">
<JewelleryShop>
  <Gold>
    <Chain>
      <Longchain>500grams</Longchain>
      <Shortchain>200grams</Shortchain>
    </Chain>
    <Bangles>200grams of 4 bangles</Bangles>
    <Earings>250 grams of 2 earings</Earings>
    <Necklace/>
  </Gold>
</JewelleryShop>

2.
example.dtd:
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT birthdate (#PCDATA)>
<!ELEMENT gender (#PCDATA)>

XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Borne</name>
    <birthdate>04/02/1977</birthdate>
    <gender>Male</gender>
  </person>
</people_list>

3.
AddressBook.dtd:
<!ELEMENT addressbook (contact+)>
<!ELEMENT contact (name, address+, city, state, zip, phone, email, web, company)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (voice, fax?)>
<!ELEMENT voice (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT web (#PCDATA)>
<!ELEMENT company (#PCDATA)>

XML document:
<?xml version="1.0"?>
<!DOCTYPE addressbook SYSTEM "AddressBook.dtd" [
  <!ENTITY amp "&#38;#38;">
  <!ENTITY apos "&#39;">
]>
<addressbook>
  <contact>
    <name>Frank Rizzo</name>
    <address>1212 W 304th Street</address>
    <city>New York</city>
    <state>New York</state>
    <zip>10011</zip>
    <phone>
      <voice>212-555-1212</voice>
      <fax>212-555-1213</fax>
    </phone>
    <email>[email protected]</email>
    <web>http://www.fruity.com/rizzo</web>
    <company>Frank&apos;s Ratchet Service</company>
  </contact>
  <contact>
    <name>Sol Rosenberg</name>
    <address>1162 E 412th Street</address>
    <city>New York</city>
    <state>New York</state>
    <zip>10011</zip>
    <phone>
      <voice>212-555-1818</voice>
      <fax>212-555-1819</fax>
    </phone>
    <email>[email protected]</email>
    <web>http://www.fruity.com/rosenberg</web>
    <company>Rosenberg&apos;s Shoes &amp; Glasses</company>
  </contact>
</addressbook>

4.
Movies.dtd:
<!ELEMENT movies (movie)+>
<!ELEMENT movie (title, writer+, producer+, director+, actor*, comments?)>
<!ATTLIST movie
  type (drama | comedy | adventure | sci-fi | mystery | horror | romance | documentary) "drama"
  rating (G | PG | PG-13 | R | X) "PG"
  review (1 | 2 | 3 | 4 | 5) "3"
  year CDATA #IMPLIED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT writer (#PCDATA)>
<!ELEMENT producer (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT actor (#PCDATA)>
<!ELEMENT comments (#PCDATA)>


XML document:
<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/CSS" href="Movies.CSS"?>
<!DOCTYPE movies SYSTEM "Movies.dtd">
<movies>
  <movie type="comedy" rating="PG-13" review="5" year="1987">
    <title>Raising Arizona</title>
    <writer>Ethan Coen</writer>
    <writer>Joel Coen</writer>
    <producer>Ethan Coen</producer>
    <director>Joel Coen</director>
    <actor>Nicolas Cage</actor>
    <actor>Holly Hunter</actor>
    <actor>John Goodman</actor>
    <comments>A classic one-of-a-kind screwball love story.</comments>
  </movie>
  <movie type="comedy" rating="R" review="5" year="1988">
    <title>Midnight Run</title>
    <writer>George Gallo</writer>
    <producer>Martin Brest</producer>
    <director>Martin Brest</director>
    <actor>Robert De Niro</actor>
    <actor>Charles Grodin</actor>
    <comments>The quintessential road comedy.</comments>
  </movie>
  <movie type="mystery" rating="R" review="5" year="1995">
    <title>The Usual Suspects</title>
    <writer>Christopher McQuarrie</writer>
    <producer>Bryan Singer</producer>
    <producer>Michael McDonnell</producer>
    <director>Bryan Singer</director>
    <actor>Stephen Baldwin</actor>
    <actor>Gabriel Byrne</actor>
    <actor>Benicio Del Toro</actor>
    <actor>Chazz Palminteri</actor>
    <actor>Kevin Pollak</actor>
    <actor>Kevin Spacey</actor>
    <comments>A crime mystery with incredibly intricate plot twists.</comments>
  </movie>
  <movie type="sci-fi" rating="PG-13" review="4" year="1989">
    <title>The Abyss</title>
    <writer>James Cameron</writer>
    <producer>Gale Anne Hurd</producer>
    <director>James Cameron</director>
    <actor>Ed Harris</actor>
    <actor>Mary Elizabeth Mastrantonio</actor>
    <comments>A very engaging underwater odyssey.</comments>
  </movie>
</movies>

5.
course.dtd:
<!ELEMENT courses (BscIT+)>
<!ELEMENT BscIT (details+)>
<!ELEMENT details (firstsem?, secondsem?, thirdsem?, forthsem?, fifthsem?, sixthsem+)>
<!ELEMENT firstsem (#PCDATA)>
<!ELEMENT secondsem (#PCDATA)>
<!ELEMENT thirdsem (#PCDATA)>
<!ELEMENT forthsem (#PCDATA)>
<!ELEMENT fifthsem (#PCDATA)>
<!ELEMENT sixthsem (visual+, UNIX+, testing+, vbLab+, UNIXLab+, Project+)>
<!ELEMENT visual (#PCDATA)>
<!ELEMENT UNIX (#PCDATA)>
<!ELEMENT testing (#PCDATA)>
<!ELEMENT vbLab (#PCDATA)>
<!ELEMENT UNIXLab (#PCDATA)>
<!ELEMENT Project (#PCDATA)>

XML document:
<?xml version="1.0"?>
<!DOCTYPE courses SYSTEM "course.dtd">
<courses>
  <BscIT>
    <details>
      <firstsem/>
      <secondsem/>
      <thirdsem/>
      <forthsem/>
      <fifthsem/>
      <sixthsem>
        <visual/>
        <UNIX/>
        <testing/>
        <vbLab/>
        <UNIXLab/>
        <Project/>
      </sixthsem>
    </details>
  </BscIT>
</courses>

6.
tutorials.dtd:
<!ELEMENT tutorials (tutorial)+>
<!ELEMENT tutorial (name, url)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ATTLIST tutorials type CDATA #IMPLIED>

XML document:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE tutorials SYSTEM "tutorials.dtd">
<tutorials>
  <tutorial>
    <name>xml Tutorial</name>
    <url>http://www.test.com</url>
  </tutorial>
  <tutorial>
    <name>HTML Tutorial</name>
    <url>www.test.com</url>
  </tutorial>
</tutorials>
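To see how an XML document points at its external DTD, the DOCTYPE declaration can be read programmatically. The sketch below assumes Python's standard xml.parsers.expat module; the helper name doctype_of is hypothetical. It only reports the root element name and the SYSTEM file name. By default this parser does not open the .dtd file, so nothing is validated here.

```python
from xml.parsers import expat


def doctype_of(document: str):
    """Return (root element name, SYSTEM identifier) from the DOCTYPE,
    or (None, None) if the document has no DOCTYPE declaration."""
    found = {}
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # name is the declared root element; system_id is the DTD file name.
        found["root"] = name
        found["dtd"] = system_id

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found.get("root"), found.get("dtd")


DOC = """<?xml version="1.0"?>
<!DOCTYPE JewelleryShop SYSTEM "gold.dtd">
<JewelleryShop/>"""

root_name, dtd_file = doctype_of(DOC)
```

Checking the document against the rules in the referenced file requires a validating parser.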


6.4 Problems with DTD

Not itself using XML syntax.
No constraints on character data.
Too simple attribute value models.
No support for Namespaces.
Very limited support for modularity and reuse (the entity mechanism is too low-level).
No support for schema evolution, extension, or inheritance of declarations (difficult to write, maintain, and read large DTDs, and to define families of related schemas).
Limited white-space control.
No embedded, structured self-documentation (<!-- comments --> are not enough).
Content and attribute declarations cannot depend on attributes or element context (many XML languages use that, but their DTDs have to "allow too much").
Too simple ID attributes mechanism.
Only defaults for attributes, not for elements.
Cannot specify "any element" or "any attribute".
Defaults cannot be specified separate from the declarations.

6.5 Design Principles

The XML Schema Language shall be:

More expressive than XML DTDs.
Expressed in XML.
Self-describing.
Usable by a wide variety of applications that employ XML.
Straightforwardly usable on the Internet.
Optimized for interoperability.
Simple enough to be implemented with modest design and runtime resources.
Coordinated with relevant W3C specs.

The XML Schema Language Specification shall:

Be prepared quickly.
Be precise, concise, human-readable, and illustrated with examples.


Thank You

