
Post on 28-Dec-2015

XML eXtensible Markup Language

by Darrell Payne

Experience
  Logicon / Sterling Federal
  C, C++, JavaScript/JScript, Shell Script, Perl

XML Training
  XML Training Course
  2001 DevXCon Training Conference
  Currently developing XML course for Logicon

Darrell.Payne@Sterling-fsg.com

XML eXtensible Markup Language

Standard Generalized Markup Language (SGML)
  Meta-tag language
  Used for creating other markup languages
  Standard adopted for SGML in 1986

HyperText Markup Language (HTML)
  Application of SGML
  Formatting language

eXtensible Markup Language (XML)
  Meta-tag language
  XML = DATA
  World Wide Web Consortium (W3C)
    ((!Standard) && (Specification)) // "C" code humor
  XML Version 1.0, February 1998
  XML is not designed to replace HTML

SGML - HTML - XML diagram
  XML is derived from SGML
  HTML is an application of SGML
  WML is an application of XML

XML Family of Tools and Their Relationship (diagram)
  Tools shown: Namespace, SAX, DOM, XML Infoset, XML Schema, XPath, XSL, XSLT, XPointer, XLink
  Groupings shown: Style and Transformation; Linking and Pointing; Underlying and Object Model; Programmatic Interface; Complex Data Modeling
  Source: XML Developer's Guide - McGraw Hill - Page 12

HTML vs. XML

HTML
  Predefined tags
  Syntax is loose
  File extensions usually ".html" or ".htm"
  Not required to be Well-Formed
    Some closing tags optional
    Attribute value quotes may be omitted

XML
  User-defined tags
  Syntax is exact
  File extensions usually ".xml"
  Closing tags mandatory
  Required to be Well-Formed

Well-Formed

All XML documents must be syntactically correct!
  Single root element
  All element start tags have end tags
  XML is case sensitive
  Properly nested tags
    <first><second></first></second> //error
    <first><second></second></first> //correct
  Attribute values in quotes: "value" or 'value'
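The well-formedness rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

bad_nesting = "<first><second></first></second>"   # tags overlap
good_nesting = "<first><second></second></first>"  # tags nest properly
wrong_case = "<first></First>"                     # XML is case sensitive
two_roots = "<first/><second/>"                    # only one root element allowed
unquoted = "<first value=1/>"                      # attribute values need quotes

for sample in (bad_nesting, good_nesting, wrong_case, two_roots, unquoted):
    print(is_well_formed(sample), sample)
```

Only the properly nested sample is accepted; every other sample breaks one of the rules listed on the slide.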

Basic XML Parts

Markup
  Tags
  Attributes, names and values

Character Data
  Text
    PCDATA
    CDATA
  Binary

XML document has two main sections
  Prolog
  Root
  Misc
    Optional and considered superfluous

Simple XML File

  <?xml version="1.0"?>
  <!-- My first XML file -->
  <document>
    <message>Hello World!</message>
  </document>
  <!-- More Comments -->

Declaration

  <?xml version="1.0"?>
    Declaration is optional
    If used, the version attribute is required
    Specifies the version to which the document conforms
    XML documents without an XML declaration might be assumed to conform to the latest version

Other declaration examples

  <?xml version="1.0" encoding="UTF-8"?>
    encoding is optional
    "UTF-8" is the default; good for ASCII text, 8-bit code units
    "UTF-16" is good for non-Latin text, 16-bit code units
    Both are used for Unicode characters; to stay uniform, use one of the two

  <?xml version="1.0" standalone="yes"?>
    standalone is optional
    "yes" means no external markup declarations affect the document
    If omitted while external declarations exist, "no" is assumed

  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
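To see what the encoding attribute does in practice, here is a small sketch in Python (standard library only). It stores the same message as UTF-8 bytes and as UTF-16 bytes and lets the parser pick the decoder from the declaration. The sample text "café" is an assumption chosen only because it contains a non-ASCII character.

```python
import xml.etree.ElementTree as ET

body = "<message>café</message>"

# Same document, two physical encodings; the declaration names the one in use.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
utf16_doc = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

text_from_utf8 = ET.fromstring(utf8_doc).text
text_from_utf16 = ET.fromstring(utf16_doc).text

print(len(utf8_doc), len(utf16_doc))      # byte sizes differ
print(text_from_utf8 == text_from_utf16)  # decoded text is the same
```

The byte streams differ, but the parsed character data is identical, which is the point of declaring the encoding.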

Comments

<!-- My first XML file --> <!-- More Comments -->

XML uses same comment syntax as HTML

Root Element

  <document>
  </document>

  Lines preceding the root element are contained in the Prolog
  All XML documents must contain exactly one root element
  All other elements are "child elements"

Child Element

  <document>
    <message>Hello World!</message>
  </document>

Sibling Element

  <document>
    <message>Hello World!</message>
    <message>Goodbye World!</message>
    <message2>Nothing more to add!</message2>
  </document>
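The root, child and sibling relationships can be inspected from code. A minimal sketch with Python's standard xml.etree.ElementTree, using the sibling example from the slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!-- My first XML file -->
<document>
  <message>Hello World!</message>
  <message>Goodbye World!</message>
  <message2>Nothing more to add!</message2>
</document>"""

root = ET.fromstring(doc)  # the single root element

# Iterating over an element yields its direct children, which are siblings of each other.
children = [(child.tag, child.text) for child in root]

print(root.tag)
for tag, text in children:
    print(" ", tag, "->", text)
```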

Updating Microsoft's Internet Explorer
  instmsia.exe
    Updates Microsoft's Installer
  msxml3sp1.exe
    Updates Microsoft's Internet Explorer
  IE now has built-in XML parser "msxml"

Create XML Document
  Include declaration
    <?xml version="1.0"?>
  Create root element <cis_class>
  Create child element <cis_345>
  Enter child element text "student name"
  Save file with ".xml" extension
  Open using Internet Explorer
  After success, add sibling elements and retest using Internet Explorer
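The exercise is meant to be typed by hand, but the same document can also be generated from code. The sketch below is one possible way, using Python's standard library and writing to an in-memory buffer; passing a file name such as "cis_class.xml" (a made-up name) to write() would save it to disk instead.

```python
import io
import xml.etree.ElementTree as ET

# Root element and one child, as in the exercise steps.
root = ET.Element("cis_class")
child = ET.SubElement(root, "cis_345")
child.text = "student name"

# Serialize with an XML declaration in front of the root element.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

print(xml_bytes.decode("utf-8"))
```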

Document viewed in Microsoft's Internet Explorer

More about Elements Element types

Container Element Contains other elements

Data Element Contains DATA

Mixed Content Contains other elements and DATA

Empty Element Contains no elements or DATA

Container Element
  Contains other elements

  <outer_element>
    <inner_element>
      <yet_another_element>
        <can_we_go_any_deeper>
          Some text way down here in the center of it all
        </can_we_go_any_deeper>
      </yet_another_element>
    </inner_element>
  </outer_element>

Data Element Contains DATA

Parsable Character Data PCDATA

Character Data CDATA

PCDATA
  Contains text
  Can be parsed by the parser
  Can contain any text, but < and & must be written as entity references
  > " ' are usually escaped as well

Entity References
  XML provides built-in entity references
    &lt;    <
    &gt;    >
    &quot;  "
    &apos;  '
    &amp;   &
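Most XML libraries can apply the built-in entity references automatically. As an illustration, Python's xml.sax.saxutils module offers escape() and unescape(); by default they handle &, < and >, and the two dictionaries below (names chosen for this sketch) extend them to both quote characters.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings so the quote characters are covered as well.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = """if (a < b && c > 'd') say("hi")"""

escaped = escape(raw, QUOTES)           # safe to place inside an element
restored = unescape(escaped, UNQUOTES)  # back to the original text

print(escaped)
print(restored == raw)
```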

CDATA
  Contains text
  Is a declaration
  Can contain reserved characters
    <, >, ", ', &
  Starts with / ends with
    <![CDATA[
      Data would be here
    ]]>
  CDATA cannot contain
    ]]>

Declarations
  <!-- -->
  <!DOCTYPE >
  <![CDATA[ ]]>
  <!ELEMENT >
  <![IGNORE[ ]]>
  <![INCLUDE[ ]]>
  <!NOTATION >
  <!ENTITY >
  <!ATTLIST >

Why CDATA section
  "C++" code example

CDATA example
  if (this->getX() < 5 && array1[0] != 3)
    cerr << this->displayError();

PCDATA example
  if (this-&gt;getX() &lt; 5 &amp;&amp; array1[0] != 3)
    cerr &lt;&lt; this-&gt;displayError();
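Both forms carry the same character data once parsed; the CDATA section is only easier to write and read. A short sketch in Python (standard library) that wraps the slide's C++ line in a hypothetical <code> element and compares the two results:

```python
import xml.etree.ElementTree as ET

cdata_doc = (
    "<code><![CDATA[if (this->getX() < 5 && array1[0] != 3) "
    "cerr << this->displayError();]]></code>"
)
pcdata_doc = (
    "<code>if (this-&gt;getX() &lt; 5 &amp;&amp; array1[0] != 3) "
    "cerr &lt;&lt; this-&gt;displayError();</code>"
)

cdata_text = ET.fromstring(cdata_doc).text
pcdata_text = ET.fromstring(pcdata_doc).text

print(cdata_text)
print(cdata_text == pcdata_text)
```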

Mixed Content
  Elements and PCDATA combined

  <outer_element>
    outer element stuff
    <inner_element>
      inner element stuff
    </inner_element>
    more outer element stuff
  </outer_element>

Empty Element
  Contains no text or data
  May have an attribute
    <empty_element></empty_element>
    <empty_element/>
  Shortcut notation for empty element
  Does this look unfamiliar?
  HTML example of this type of tag
    <img src="image.gif">    // not well-formed
    <img src="image.gif" />  // well-formed
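A parser treats the long form and the shortcut form of an empty element the same way. The following sketch (Python standard library; the src attribute is borrowed from the HTML example for illustration) parses both and compares what comes out.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<empty_element src="image.gif"></empty_element>')
short_form = ET.fromstring('<empty_element src="image.gif"/>')

# Neither form has text or child elements; both keep the attribute.
same_attributes = long_form.attrib == short_form.attrib
both_empty = (
    long_form.text is None and short_form.text is None
    and len(long_form) == 0 and len(short_form) == 0
)

print(same_attributes, both_empty)
```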

Elements (diagram)
  An element consists of a start-tag, content, and an end-tag

Create XML Document 2
  Include declaration
  Create root element <cis_class2>
  Create child element <cis_345>
    Enter child element text "student name"
  Create child element <cis_346>
    Child to root, sibling to <cis_345>
    Make this an empty element
  Create child element <cis_347>
    Child to root, sibling to <cis_345>
    Enter C++ code example in PCDATA section
  Create child element <cis_348>
    Enter same C++ code in a CDATA section
  Save file
  Open using Internet Explorer

XML Parser - DOM & SAX
  Required to process an XML document
  Available for C, Java, Python, Perl
  Parsers are of type
    Document Object Model (DOM)
      Tree structure
      Like a drive directory structure
      Slower and requires large amounts of memory, because the whole tree is built before use
    Simple API for XML (SAX)
      Event-driven
      Events = tags, text, etc.
      Smaller and faster, but requires the programmer to deal with the data
  Validating and non-validating
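To make the DOM and SAX contrast concrete, here is a small sketch that reads the same document both ways using Python's standard xml.dom.minidom and xml.sax modules. The MessageHandler class and its attribute names are invented for this example.

```python
import xml.sax
from xml.dom import minidom

DOC = (b"<document><message>Hello World!</message>"
       b"<message>Goodbye World!</message></document>")

# DOM: the whole tree is built in memory first, then queried.
dom = minidom.parseString(DOC)
dom_messages = [node.firstChild.data for node in dom.getElementsByTagName("message")]

# SAX: the parser fires events; the programmer keeps track of state.
class MessageHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.messages = []
        self._inside = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "message":
            self._inside = True
            self._chunks = []

    def characters(self, content):
        if self._inside:
            self._chunks.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        if name == "message":
            self.messages.append("".join(self._chunks))
            self._inside = False

handler = MessageHandler()
xml.sax.parseString(DOC, handler)
sax_messages = handler.messages

print(dom_messages)
print(sax_messages)
```

The DOM version is shorter to write, while the SAX version never holds more than one message in memory at a time.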

XML Structure
  Logical structure
    Document divided into units
    Allows sub-units
    XML is a logical tree structure document
  Physical structure
    Data stored inside document
    Data stored outside document
      Entities are one example

Valid
  Conforms to some schema (lowercase "s")
    Document Type Definition (DTD)
    Schema
  By definition, all valid XML documents are Well-Formed documents

DTD
  Document Type Definition (DTD)
  File extension of ".dtd"
  A DTD is not an XML document
  A DTD is a schema (lowercase "s")
  Introduced into an XML document via the Document Type Declaration
    <!DOCTYPE >

Three types of DOCTYPE declarations
  Internal Subset
    Contained in the Prolog
  External Subset
    Exists in a different file
    Prolog contains reference to file containing DTD
    Referenced using keyword SYSTEM or PUBLIC
  Internal Subset and External Subset combination

Internal Subset

  <?xml version="1.0"?>
  <!-- My second XML file -->
  <!DOCTYPE document [
    <!ELEMENT document (message)>
    <!ELEMENT message (#PCDATA)>
  ]>
  <document>
    <message>Hello World!</message>
  </document>
  <!-- More Comments -->
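A DTD is only enforced by a validating parser. The sketch below feeds a document that is well-formed but breaks its own internal subset to Python's xml.etree.ElementTree, which is non-validating; the <other> element is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE document [
<!ELEMENT document (message)>
<!ELEMENT message (#PCDATA)>
]>
<document><other>not declared in the DTD</other></document>"""

# No error is raised: the parser checks syntax, not the DTD's content model.
root = ET.fromstring(doc)
first_child_tag = root[0].tag

print(first_child_tag)
```

Catching this kind of mistake requires a validating parser, such as the validator page used later in the deck.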

External Subset

  <?xml version="1.0"?>
  <!-- My second XML file -->
  <!DOCTYPE document SYSTEM "HelloWorld.dtd">
  <document>
    <message>Hello World!</message>
  </document>
  <!-- More Comments -->

DTD for HelloWorld.xml

  <!ELEMENT document (message)>
  <!ELEMENT message (#PCDATA)>

Internal Subset and External Subset combination I

  <?xml version="1.0"?>
  <!-- My second XML file -->
  <!DOCTYPE document SYSTEM "HelloWorld3.dtd"[
    <!ELEMENT document (message)>
  ]>
  <document>
    <message>
      <message2> </message2>
    </message>
  </document>
  <!-- More Comments -->

Internal Subset and External Subset combination II

  <!-- External declarations -->
  <!ELEMENT message (message2)>
  <!ELEMENT message2 (#PCDATA)>

Putting it all together

HelloWorld3.dtd

  <!-- External declarations -->
  <!ELEMENT message (message2)>
  <!ELEMENT message2 (#PCDATA)>

HelloWorld3.xml

  <?xml version="1.0"?>
  <!-- My second XML file -->
  <!DOCTYPE document SYSTEM "HelloWorld3.dtd"[
    <!ELEMENT document (message)>
  ]>
  <document>
    <message>
      <message2> </message2>
    </message>
  </document>
  <!-- More Comments -->

XML Validator
  Type in HelloWorld3.xml

Create XML Document 3
  Create root element <session1>
  Create child element of session1 <session2>
    Enter child element text "xml class"
  Create child element of session2 <session3>
    Enter child element text "class information"
  Create child element of session3 <session4>
    Enter child element text "more"
  Create DTD for this file
    dtd_info.dtd
  Reference file in XML document
  Save files
  Open validate_vbs.html
  Enter .xml file name
  Validate

Schema
  Schemas are XML documents
  Schemas can be manipulated via a parser
  More complicated than DTDs
  Schemas have "ElementType"s

Schema vs. DTD

  <!-- External declarations -->
  <!ELEMENT document (message, message2, message3)>
  <!ELEMENT message (message4, message5)>
  <!ELEMENT message2 (#PCDATA)>
  <!ELEMENT message3 (#PCDATA)>
  <!ELEMENT message4 (#PCDATA)>
  <!ELEMENT message5 (#PCDATA)>

Schema vs. DTD II

  <?xml version="1.0" encoding="UTF-8"?>
  <!--W3C Schema generated by XML Spy v3.5 (http://www.xmlspy.com)-->
  <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" elementFormDefault="qualified">
    <xsd:element name="document">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="message"/>
          <xsd:element ref="message2"/>
          <xsd:element ref="message3"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Schema vs. DTD III

    <xsd:element name="message">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="message4"/>
          <xsd:element ref="message5"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="message2" type="xsd:string"/>
    <xsd:element name="message3" type="xsd:string"/>
    <xsd:element name="message4" type="xsd:string"/>
    <xsd:element name="message5" type="xsd:string"/>
  </xsd:schema>
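Because a schema is itself an XML document, an ordinary parser can read it, which is not possible with a DTD. A minimal sketch in Python (standard library) that loads a shortened copy of the schema above and lists the elements it declares; the shortened schema text and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the generated schema in the slides.
XSD_NS = "http://www.w3.org/2000/10/XMLSchema"

schema_text = """<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="message2" type="xsd:string"/>
  <xsd:element name="message3" type="xsd:string"/>
</xsd:schema>"""

schema = ET.fromstring(schema_text)

# ElementTree spells qualified names as {namespace-uri}local-name.
declared = [
    (element.get("name"), element.get("type"))
    for element in schema.findall(f"{{{XSD_NS}}}element")
]

print(declared)
```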

Topics not covered
  Namespace
  Whitespace
  XPath
  XPointer
  XLink
  XSL
  XSLT
  SOAP
  DDI
  Web Services
  SMIL
  XHTML