XML


eXtensible Markup Language

XML
A descendant of SGML (Standard Generalized Markup Language).
A W3C Recommendation since 1998.
A universal language for data on the Web.

HTML is for the presentation of data; XML is for the structuring of data.

A meta markup language: it enables the creation of new markup languages to mark up almost anything imaginable (math formulas, the molecular structures of chemicals, etc.).

It gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation.

An ideal format for server-to-server transfer of structured data

XML and Its Derivatives
FpML (http://www.fpml.org/)
XDBML (master's thesis at ITK, 2005)
MathML
ChemML
VoiceML
SMIL (Synchronized Multimedia Integration Language)
XMI (XML Metadata Interchange): XML + UML => a universal format for exchanging OO system analysis and design documents
More …

How XML Is Similar to HTML
XML uses tags just like HTML, but those tags don’t define text formatting. Instead, the tags are used to create data structures.

Let’s see some examples…

Examples of HTML and XML

HTML Code:

<b> This is bold text…</b>

XML Code:

<President> Clinton</President>

Using our own custom tag named “President,” we have stored a small piece of information.

Note: XML is case-sensitive!
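To see case-sensitivity in practice, here is a small sketch using Python's standard-library parser xml.etree.ElementTree (our choice of tool, not part of the original slides). A start tag is only closed by an end tag spelled with exactly the same case.

```python
import xml.etree.ElementTree as ET

# Matching start and end tags parse fine.
ok = ET.fromstring("<President>Clinton</President>")
print(ok.tag, ok.text)  # President Clinton

# Tag names are case-sensitive: <President> is NOT closed by </president>.
try:
    ET.fromstring("<President>Clinton</president>")
    case_error = None
except ET.ParseError as err:
    case_error = str(err)  # the parser reports a mismatched tag
print("rejected:", case_error)
```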

Detailed Example
XML documents are organized hierarchically: each tag, or node, can have “sub” nodes under it.

Well-Formedness: any number of nodes can be created under any given node, but each node must be “closed” with a closing tag, like </President>.

<President><Name>Clinton, Bill</Name><Age>52</Age><Terms>2</Terms></President>

Exception: an “empty” element has no separate closing tag; instead, its tag must end with a forward slash.
E.g., <flag id = "Y" />
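The well-formedness rules can be checked mechanically, since a conforming XML parser must reject a document that breaks them. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed is our own, hypothetical name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

closed = "<President><Name>Clinton, Bill</Name><Age>52</Age><Terms>2</Terms></President>"
unclosed = "<President><Name>Clinton, Bill</Name><Age>52</President>"
empty = '<contact><flag id="Y"/></contact>'

print(is_well_formed(closed))    # True
print(is_well_formed(unclosed))  # False: <Age> is never closed
print(is_well_formed(empty))     # True: <flag .../> closes itself

flag = ET.fromstring(empty).find("flag")
print(flag.attrib, len(flag))    # {'id': 'Y'} 0  (attributes, but no children)
```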

XML Elements
A “Node” in an XML document is known as an Element. An XML document can have any number of elements; for example, we could store information about 10 Presidents in one document.

However, there is only one root element, e.g., <Presidents> <President> </President> … </Presidents>
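The single-root rule is also enforced by the parser. In this sketch (Python's xml.etree.ElementTree; the second president's name is illustrative data), wrapping the President elements in one Presidents root parses, while two top-level elements are rejected.

```python
import xml.etree.ElementTree as ET

one_root = "<Presidents><President>Clinton</President><President>Bush</President></Presidents>"
two_roots = "<President>Clinton</President><President>Bush</President>"

root = ET.fromstring(one_root)
print(root.tag, [p.text for p in root])  # Presidents ['Clinton', 'Bush']

try:
    ET.fromstring(two_roots)
    second_root_error = None
except ET.ParseError as err:
    # expat typically reports "junk after document element" here
    second_root_error = str(err)
print(second_root_error)
```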

Multiple Elements
<Cars>
  <Car>
    <Manufacturer>Mitsubishi</Manufacturer><Model>Eclipse</Model><Year>1998</Year>
  </Car>
  <Car>
    <Manufacturer>Pontiac</Manufacturer><Model>Sun Fire</Model><Year>1997</Year>
  </Car>
  <Car>
    <Manufacturer>Nissan</Manufacturer><Model>X-Terra</Model><Year>2000</Year><SUV>Yes</SUV>
  </Car>
</Cars>
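Because every Car element has the same shape, a program can walk the list generically. This sketch (Python's xml.etree.ElementTree) reads each car into a tuple; note that SUV appears only in the last Car, so the code supplies an assumed default of "No" when it is missing.

```python
import xml.etree.ElementTree as ET

CARS = """<Cars>
  <Car><Manufacturer>Mitsubishi</Manufacturer><Model>Eclipse</Model><Year>1998</Year></Car>
  <Car><Manufacturer>Pontiac</Manufacturer><Model>Sun Fire</Model><Year>1997</Year></Car>
  <Car><Manufacturer>Nissan</Manufacturer><Model>X-Terra</Model><Year>2000</Year><SUV>Yes</SUV></Car>
</Cars>"""

root = ET.fromstring(CARS)
rows = []
for car in root.findall("Car"):  # each <Car> child of the root, in document order
    rows.append((
        car.findtext("Manufacturer"),
        car.findtext("Model"),
        int(car.findtext("Year")),        # element text is always a string
        car.findtext("SUV", default="No"),  # optional element: fall back to "No"
    ))

for row in rows:
    print(row)
```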

Attributes
Besides having “sub-elements,” every element can also have what are known as Attributes. Attributes are declared “inside” the tag. You may already know how to use attributes if you have used the <IMG> or <A> tags in HTML.

For example:

<A HREF="somepage.html">click here</A>

XML Attributes
Here’s an example of an XML element with an Attribute:

<Vehicle VIN="3232382432832">
  <Year>1997</Year>
  <Manufacturer>Toyota</Manufacturer>
</Vehicle>

We could make any element an attribute. For example, Manufacturer and Year could also have been made attributes. However, usually only metadata or simple scalar values should be made attributes.
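The difference between the two designs shows up in how a program reads the data. This sketch (Python's xml.etree.ElementTree) parses the Vehicle example as given and a hypothetical variant where Year and Manufacturer are moved into attributes; both yield the same values.

```python
import xml.etree.ElementTree as ET

as_children = """<Vehicle VIN="3232382432832">
  <Year>1997</Year>
  <Manufacturer>Toyota</Manufacturer>
</Vehicle>"""

# Hypothetical variant: the same data with Year and Manufacturer as attributes.
as_attributes = '<Vehicle VIN="3232382432832" Year="1997" Manufacturer="Toyota"/>'

v1 = ET.fromstring(as_children)
v2 = ET.fromstring(as_attributes)

# Attributes are read with .get(); sub-elements with find()/findtext().
print(v1.get("VIN"), v1.findtext("Year"), v1.findtext("Manufacturer"))
print(v2.get("VIN"), v2.get("Year"), v2.get("Manufacturer"))
```

Attribute values are flat strings with no further structure, which is why they suit metadata and scalars; anything that might later need children is easier to extend as an element.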

A Complete Example (1)
<?xml version="1.0"?>
<!-- Deitel 2000, Fig. 28.1: article.xml -->
<article>
  <title>Simple XML</title>
  <date>September 6, 1999</date>
  <author>
    <fname>Tem</fname>
    <lname>Nieto</lname>
  </author>
  <summary>XML is pretty easy.</summary>
  <content>Once you have mastered HTML, XML is easily learned. You must remember that XML is not for displaying information but for managing information.</content>
</article>
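A parser treats the XML declaration and the comment as part of the prolog, leaving article as the root element. This sketch (Python's xml.etree.ElementTree, which drops comments by default) pulls the title and author out of a shortened copy of article.xml.

```python
import xml.etree.ElementTree as ET

ARTICLE = """<?xml version="1.0"?>
<!-- Deitel 2000, Fig. 28.1: article.xml -->
<article>
  <title>Simple XML</title>
  <date>September 6, 1999</date>
  <author><fname>Tem</fname><lname>Nieto</lname></author>
  <summary>XML is pretty easy.</summary>
</article>"""

root = ET.fromstring(ARTICLE)  # the root element is <article>
author = root.find("author")
full_name = f"{author.findtext('fname')} {author.findtext('lname')}"
print(root.findtext("title"), "by", full_name)  # Simple XML by Tem Nieto
```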

A Complete Example (2a)
<?xml version = "1.0"?>

<!DOCTYPE letter SYSTEM "letter.dtd">

<letter>
  <contact type = "from">
    <name>John Doe</name>
    <address1>123 Main St.</address1>
    <address2></address2>
    <city>Anytown</city>
    <state>Anystate</state>
    <zip>12345</zip>
    <phone>555-1234</phone>
    <flag id = "P"/>
  </contact>

A Complete Example (2b)
  <contact type = "to">
    <name>Joe Schmoe</name>
    <address1>Box 12345</address1>
    <address2>15 Any Ave.</address2>
    <city>Othertown</city>
    <state>Otherstate</state>
    <zip>67890</zip>
    <phone>555-4321</phone>
    <flag id = "B"/>
  </contact>
  <paragraph>Dear Sir,</paragraph>
  <paragraph>It is our privilege to inform you about our new database managed with XML. This new system will allow you to reduce the load of your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
  <paragraph>Sincerely, Mr. Doe</paragraph>
</letter>

DTD
DTD = Document Type Definition
Defines the grammatical rules for the document.
Not required for XML, but recommended for document conformity.
Can be used to check the Validity of an XML document (whether it contains the proper elements, attributes, etc.).
Uses an EBNF grammar.

Represented by the DOCTYPE tag, which contains three parts if it refers to an external subset:
Root element the DTD applies to
Flag: SYSTEM (personal, non-standardized) or PUBLIC (standardized, publicly available)
DTD name and location

For example, in <!DOCTYPE letter SYSTEM "letter.dtd">, the root element is letter, the flag is SYSTEM, and the DTD is letter.dtd.

DTD: Example
<!ELEMENT letter (contact+, paragraph+)>
<!ELEMENT contact (name, address1, address2, city, state, zip, phone, flag)>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address1 (#PCDATA)>
<!ELEMENT address2 (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT flag EMPTY>
<!ATTLIST flag id CDATA #IMPLIED>
<!ELEMENT paragraph (#PCDATA)>

DTD: Example (cont’d)
!ELEMENT: element type declaration
Specifies that an element is being created. Here, a letter is declared to contain one or more contact elements followed by one or more paragraph elements, in that order.

Operator + means one or more occurrences.
Operator * means zero or more occurrences.
Operator ? means zero or one occurrence.
If no operator is included, exactly one occurrence is assumed.
Others: “|” denotes alternatives (a choice between content models).

DTD: Example (cont’d)
!ATTLIST: attribute-list declaration
Defines the attributes of an element. Here, the type attribute of contact is defined to be:
A string (as given by CDATA) that is optional and has no default value (as given by #IMPLIED).
The string is not parsed for markup by the XML processor; it is simply passed to the application as text.

Others:
#PCDATA means the element can store parsed character data (i.e., text).
EMPTY means the element has no content (no child elements and no text); empty elements are commonly used to carry attributes, like flag above.

More Others:
IDs and IDREFs (your next assignment!)
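DTD content models are regular expressions over child element names, so the operators map onto regex quantifiers (+, *, ? and | keep their meaning). Python's standard library does not validate against DTDs, so the sketch below is a hypothetical hand-built checker, not a DTD validator: it hard-codes two content models from letter.dtd as regexes and tests a letter against them. It also shows that an #IMPLIED attribute simply reads as None when absent.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (hypothetical table, not read from letter.dtd).
# Each child name is followed by a space so names cannot run together.
CONTENT_MODELS = {
    # <!ELEMENT letter (contact+, paragraph+)>
    "letter": r"(contact )+(paragraph )+",
    # <!ELEMENT contact (name, address1, address2, city, state, zip, phone, flag)>
    "contact": r"name address1 address2 city state zip phone flag ",
}

def conforms(elem):
    """Recursively check element children against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        sequence = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, sequence) is None:
            return False
    return all(conforms(child) for child in elem)

CONTACT = ("<contact type='from'><name>John Doe</name><address1>123 Main St.</address1>"
           "<address2/><city>Anytown</city><state>Anystate</state><zip>12345</zip>"
           "<phone>555-1234</phone><flag id='P'/></contact>")
good = ET.fromstring(f"<letter>{CONTACT}<paragraph>Dear Sir,</paragraph></letter>")
bad = ET.fromstring(f"<letter><paragraph>Dear Sir,</paragraph>{CONTACT}</letter>")

print(conforms(good))  # True
print(conforms(bad))   # False: paragraph before contact violates (contact+, paragraph+)

# type is #IMPLIED (optional, no default): .get() returns None when it is absent.
print(good.find("contact").get("type"))         # from
print(ET.fromstring("<contact/>").get("type"))  # None
```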

XML Schema [Silberschatz et al. ’02]

XML Schema is a more sophisticated schema language that addresses the drawbacks of DTDs. It supports:
Typing of values, e.g., integer, string, etc., along with constraints on min/max values
User-defined types
It is itself specified in XML syntax, unlike DTDs
A more standard representation, but verbose
Integration with namespaces
Many more features: list types, uniqueness and foreign-key constraints, inheritance, …

BUT: it is significantly more complicated than DTDs, and not yet widely used!

XML Schema: Example
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="bank" type="BankType"/>
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  ….. definitions of customer and depositor ….
  <xsd:complexType name="BankType">
    <xsd:sequence>
      <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>

Querying and Transforming XML Data [Silberschatz et al. ’02]

Translation of information from one XML schema to another
Querying on XML data
The above two are closely related and are handled by the same tools.

Standard XML querying/translation languages:
XPath: a simple language consisting of path expressions
XSLT: a simple language designed for translation from XML to XML and from XML to HTML
XQuery: an XML query language with a rich set of features

A wide variety of other languages have been proposed, and some served as the basis for the XQuery standard: XML-QL, Quilt, XQL, …

Tree Model of XML Data
Query and transformation languages are based on a tree model of XML data.
An XML document is modeled as a tree, with nodes corresponding to elements and attributes.
Element nodes have child nodes, which can be attributes or subelements.
Text in an element is modeled as a text node child of the element.
Children of a node are ordered according to their order in the XML document.
Element and attribute nodes (except for the root node) have a single parent, which is an element node.
The root node has a single child, which is the root element of the document.
We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
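The tree model can be made concrete by walking a parsed document. This sketch uses Python's xml.etree.ElementTree and a small hypothetical bank-2 fragment; the walk helper is our own, and it lists attributes and text under their element as the slide's model describes (ElementTree itself stores them as properties of the element, and this sketch ignores text that follows a child, i.e. mixed content).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration.
DOC = '<bank-2><account account-number="A-401"><balance>500</balance></account></bank-2>'

def walk(elem, depth=0, out=None):
    """Pre-order walk listing element, attribute and text nodes with their depth."""
    if out is None:
        out = []
    out.append((depth, "element", elem.tag))
    for name, value in elem.attrib.items():
        out.append((depth + 1, "attribute", f"{name}={value}"))
    if elem.text and elem.text.strip():
        out.append((depth + 1, "text", elem.text.strip()))
    for child in elem:  # children are visited in document order
        walk(child, depth + 1, out)
    return out

result = walk(ET.fromstring(DOC))
for depth, kind, label in result:
    print("  " * depth + f"{kind}: {label}")
```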

XPath
XPath is used to address (select) parts of documents using path expressions.
A path expression is a sequence of steps separated by “/”; think of file names in a directory hierarchy.
The result of a path expression is the set of values that, along with their containing elements/attributes, match the specified path.

E.g., /bank-2/customer/name evaluated on the bank-2 data we saw earlier returns <name>Joe</name><name>Mary</name>
E.g., /bank-2/customer/name/text( ) returns the same names, but without the enclosing tags.
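Python's xml.etree.ElementTree implements a limited subset of XPath that is enough for simple paths. The bank-2 data from earlier is not reproduced here, so this sketch uses a small hypothetical bank-2 document whose customer names match the example output. ElementTree rejects absolute paths on an element, so the leading /bank-2 step becomes the starting element itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 document (not the original course data).
BANK2 = """<bank-2>
  <account account-number="A-101"><balance>500</balance></account>
  <customer customer-id="C100"><name>Joe</name></customer>
  <customer customer-id="C102"><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(BANK2)  # root is the <bank-2> element

# /bank-2/customer/name: the <name> elements themselves
names = root.findall("customer/name")
tags = [ET.tostring(n, encoding="unicode") for n in names]
print(tags)  # ['<name>Joe</name>', '<name>Mary</name>']

# /bank-2/customer/name/text(): only the text content
texts = [n.text for n in names]
print(texts)  # ['Joe', 'Mary']
```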

XPath
The initial “/” denotes the root of the document (above the top-level tag).
Path expressions are evaluated left to right; each step operates on the set of instances produced by the previous step.

Selection predicates may follow any step in a path, in [ ]. E.g.:
/bank-2/account[balance > 400] returns account elements with a balance value greater than 400.
/bank-2/account[balance] returns account elements containing a balance subelement.

Attributes are accessed using “@”. E.g., /bank-2/account[balance > 400]/@account-number returns the account numbers of those accounts with balance > 400.
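ElementTree's XPath subset supports the existence predicate [balance] directly, but not comparisons such as [balance > 400] or selecting an attribute with a trailing @name step. This sketch (hypothetical bank-2 data) evaluates the existence predicate with ElementTree and writes the comparison and the attribute selection in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 data for illustration.
BANK2 = """<bank-2>
  <account account-number="A-101"><balance>500</balance></account>
  <account account-number="A-102"><balance>400</balance></account>
  <account account-number="A-103"/>
</bank-2>"""
root = ET.fromstring(BANK2)

# /bank-2/account[balance]: supported natively by ElementTree
with_balance = root.findall("account[balance]")

# /bank-2/account[balance > 400]: comparison done in Python
rich = [a for a in with_balance if float(a.findtext("balance")) > 400]

# .../@account-number: read the attribute of each selected element
numbers = [a.get("account-number") for a in rich]
print(len(with_balance), numbers)  # 2 ['A-101']
```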

XPath
Operator “|” is used to implement union. E.g., /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower) gives customers with either accounts or loans. However, “|” cannot be nested inside other operators.

“//” can be used to skip multiple levels of nodes. E.g., /bank-2//name finds any name element anywhere under the /bank-2 element, regardless of the element in which it is contained.

A step in the path can go to parents, siblings, ancestors, and descendants of the nodes generated by the previous step, not just to the children.
“//”, described above, is a short form for specifying “all descendants”.
“..” specifies the parent.
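ElementTree accepts “//” (written relative, as .//) and “..”, but has no “|” operator. In this sketch (hypothetical bank-2 data), a union is emulated by concatenating two result lists and dropping duplicates while keeping first-seen order.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 data with name elements at different depths.
BANK2 = """<bank-2>
  <customer><name>Joe</name></customer>
  <branch><manager><name>Ann</name></manager></branch>
</bank-2>"""
root = ET.fromstring(BANK2)

# /bank-2//name: every name element at any depth, in document order
anywhere = [n.text for n in root.findall(".//name")]
print(anywhere)  # ['Joe', 'Ann']

# ..: the parent of each matched name element
parents = [p.tag for p in root.findall(".//name/..")]
print(parents)  # ['customer', 'manager']

# Union emulation: concatenate, then drop duplicates (dict keeps insertion order)
customers = root.findall("customer/name")
managers = root.findall("branch/manager/name")
union = list(dict.fromkeys(customers + managers))
print([n.text for n in union])  # ['Joe', 'Ann']
```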

Functions in XPath
XPath provides several functions.
The function count() counts the number of elements in the set generated by the path given as its argument. E.g., /bank-2/account[count(customer) > 2] returns accounts with more than 2 customers.
There are also functions for testing the position (1, 2, ...) of a node with respect to its siblings.
The Boolean connectives and and or, and the function not(), can be used in predicates.

IDREFs can be referenced using the function id().
id() can also be applied to sets of references such as IDREFS, and even to strings containing multiple references separated by blanks.
E.g., /bank-2/account/id(@owner) returns all customers referred to from the owner attribute of account elements.
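ElementTree has no count() function, but it does support positional predicates such as [1] and [last()]. This sketch (hypothetical bank-2 data, with customer elements nested directly inside account for simplicity) uses len() in place of count().

```python
import xml.etree.ElementTree as ET

# Hypothetical data: customers nested inside accounts for simplicity.
BANK2 = """<bank-2>
  <account account-number="A-101"><customer>Joe</customer><customer>Mary</customer><customer>Ann</customer></account>
  <account account-number="A-102"><customer>Lee</customer></account>
</bank-2>"""
root = ET.fromstring(BANK2)

# /bank-2/account[count(customer) > 2]: len() stands in for count()
busy = [a.get("account-number") for a in root.findall("account")
        if len(a.findall("customer")) > 2]
print(busy)  # ['A-101']

# Positional predicates are supported; positions start at 1.
first = root.find("account[1]").get("account-number")
last = root.find("account[last()]").get("account-number")
print(first, last)  # A-101 A-102
```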

XSL
Extensible Stylesheet Language (XSL) defines the layout of an XML document (much like CSS defines the layout of an HTML document).
An XSL style sheet provides the rules for displaying an XML document.
XSL also defines rules for how an XML document is transformed into another XML document (i.e., XSLT, for XSL Transformations).

XSL: Example
See program listing.

XSL: Example (cont’d)
<xsl:for-each order-by = "+Lastname;+Firstname" select = "contact" xmlns:xsl = "http://www.w3.org/TR/WD-xsl">

for-each: iterates over each contact element
order-by: + means ascending; - means descending (order-by belongs to the early working-draft XSL dialect used here; standard XSLT 1.0 uses <xsl:sort> instead)
select: defines which elements are selected
xmlns: XML namespace declaration; binds the xsl prefix to the URI identifying the specification this element comes from

XSL: Example (cont’d)
<Lastname><xsl:value-of select = "Lastname"/>

xsl:value-of: retrieves the data specified in the select attribute; it is an empty element, hence the closing ‘/’

<xsl:for-each select = "contact[Lastname='Neito']">
contact[Lastname='Neito']: the [ ] specifies an XSL conditional statement (only contacts whose Lastname is Neito are selected)

XSL: Example (cont’d)
var xmldoc = xmlData.cloneNode( true );
Copies the xmlData object so that we don’t lose the original; true means copy recursively (a deep copy).

function sort( xsldoc ) {
    xmldoc.documentElement.transformNodeToObject( xsldoc.documentElement, xmlData.XMLDocument );
}

transformNodeToObject (an MSXML method): applies a specified XSL style sheet to the data contained in the parent object.
documentElement gets the root element.
XMLDocument accesses the XML document to which xmlData refers.
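Python's standard library has no XSLT processor, so the sketch below only emulates what this example does, using xml.etree.ElementTree on hypothetical contact data: copy.deepcopy plays the role of cloneNode(true), a sort keyed on (Lastname, Firstname) mirrors order-by = "+Lastname;+Firstname", and the [Lastname='Smith'] predicate mirrors the conditional select. It is an analogy, not an XSL implementation.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical contact list for illustration.
CONTACTS = """<contacts>
  <contact><Lastname>Smith</Lastname><Firstname>Ann</Firstname></contact>
  <contact><Lastname>Doe</Lastname><Firstname>John</Firstname></contact>
  <contact><Lastname>Doe</Lastname><Firstname>Jane</Firstname></contact>
</contacts>"""

xml_data = ET.fromstring(CONTACTS)

# Like cloneNode(true): a deep copy, so sorting leaves the original untouched.
sorted_doc = copy.deepcopy(xml_data)

# Like order-by = "+Lastname;+Firstname": ascending by Lastname, then Firstname.
sorted_doc[:] = sorted(sorted_doc, key=lambda c: (c.findtext("Lastname"), c.findtext("Firstname")))

sorted_names = [c.findtext("Firstname") for c in sorted_doc]
original_names = [c.findtext("Firstname") for c in xml_data]
print(sorted_names)    # ['Jane', 'John', 'Ann']
print(original_names)  # ['Ann', 'John', 'Jane']

# Like select = "contact[Lastname='Smith']": ElementTree supports [tag='text'].
smiths = [c.findtext("Firstname") for c in xml_data.findall("contact[Lastname='Smith']")]
print(smiths)  # ['Ann']
```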

XML Editors

XML Resources
W3C XML Standards Body: http://www.w3c.org/xml
Microsoft Developer Network (MSDN): http://msdn.microsoft.com/xml
The BizTalk Framework: http://www.biztalk.org
IBM’s XML Zone: http://www.ibm.com/developer/xml/
