Date post: | 31-Dec-2015 |
Upload: | eagan-mcknight |
XML
A descendant of SGML (Standard Generalized Markup Language)
A W3C Recommendation since 1998
A universal language for data on the Web
  HTML is for the presentation of data
  XML is for the structuring of data
A meta markup language
  Enables the creation of new markup languages to mark up anything imaginable (math formulas, molecular structures of chemicals, etc.)
Gives developers the power to deliver structured data from a wide variety of applications to the desktop for local computation and presentation
An ideal format for server-to-server transfer of structured data
XML and its Derivatives
FpML (http://www.fpml.org/)
XDBML (master thesis at ITK, 2005)
MathML
ChemML
VoiceML
SMIL (Synchronized Multimedia Integration Language)
XMI (XML Metadata Interchange)
  XML + UML => universal format for exchanging OO system analysis and design documents
More …
How XML is similar to HTML
XML uses tags just like HTML, but those tags don’t define text formatting.
Instead, the tags are used to create data structures.
Let’s see some examples…
Examples of HTML and XML
HTML Code:
<b> This is bold text…</b>
XML Code:
<President> Clinton</President>
Using our own custom tag named “President,” we have stored a small piece of information.
Note: XML is case-sensitive!
Detailed Example
XML documents are organized in a hierarchical fashion. Each tag or node can have “sub” nodes under it.
<President>
  <Name>Clinton, Bill</Name>
  <Age>52</Age>
  <Terms>2</Terms>
</President>
Well-Formedness: Any number of nodes can be created under any given node, but each node must be “closed” using a closing tag, like </President>.
Exception: an “empty” element does not have a closing tag. Its tag must end with a forward slash.
E.g., <flag id = "Y" />
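The well-formedness rules and the case-sensitivity note can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule discussed on the slide.
samples = {
    "every node closed": "<President><Name>Clinton, Bill</Name></President>",
    "empty element": '<President><flag id="Y"/></President>',
    "missing closing tag": "<President><Name>Clinton, Bill</President>",
    "case mismatch": "<President>Clinton</president>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The last sample fails because </president> does not match <President>: tag names are compared case-sensitively.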
XML Elements
A “Node” in an XML document is known as an Element.
An XML document can have any number of elements.
  For example, we could store information about 10 Presidents in a document.
However, there is only one root element, e.g.:
<Presidents>
  <President> </President>
  …
</Presidents>
Multiple Elements
<Cars>
  <Car>
    <Manufacturer>Mitsubishi</Manufacturer>
    <Model>Eclipse</Model>
    <Year>1998</Year>
  </Car>
  <Car>
    <Manufacturer>Pontiac</Manufacturer>
    <Model>Sun Fire</Model>
    <Year>1997</Year>
  </Car>
  <Car>
    <Manufacturer>Nissan</Manufacturer>
    <Model>X-Terra</Model>
    <Year>2000</Year>
    <SUV>Yes</SUV>
  </Car>
</Cars>
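To show how an application might consume the repeated <Car> elements, here is a small sketch using Python's standard xml.etree.ElementTree. Treating a missing <SUV> element as "No" is an assumption made for this illustration; the slides do not say how an absent element should be interpreted.

```python
import xml.etree.ElementTree as ET

CARS = """<Cars>
  <Car><Manufacturer>Mitsubishi</Manufacturer><Model>Eclipse</Model><Year>1998</Year></Car>
  <Car><Manufacturer>Pontiac</Manufacturer><Model>Sun Fire</Model><Year>1997</Year></Car>
  <Car><Manufacturer>Nissan</Manufacturer><Model>X-Terra</Model><Year>2000</Year><SUV>Yes</SUV></Car>
</Cars>"""

root = ET.fromstring(CARS)  # the single root element, <Cars>

cars = [
    {
        "manufacturer": car.findtext("Manufacturer"),
        "model": car.findtext("Model"),
        "year": int(car.findtext("Year")),
        # Assumption: a <Car> without an <SUV> child is treated as "No".
        "suv": car.findtext("SUV", default="No") == "Yes",
    }
    for car in root.findall("Car")  # direct <Car> children, in document order
]

for car in cars:
    print(car)
```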
Attributes
Besides having “sub-elements,” every element can also have what are known as Attributes.
Attributes are declared “inside” the tag.
You may already know how to use attributes if you have used the <IMG> or <A> tags in HTML.
For example:
<A HREF="somepage.html">click here</A>
XML Attributes
Here’s an example of an XML element with an attribute:
<Vehicle VIN="3232382432832">
  <Year>1997</Year>
  <Manufacturer>Toyota</Manufacturer>
</Vehicle>
We could make any element an attribute. For example, Manufacturer and Year could also have been made attributes.
However, you usually want only metadata or scalar values to be attributes.
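The difference between attributes and sub-elements also shows up in how a program reads them. The sketch below uses Python's standard xml.etree.ElementTree; the all-attribute variant of <Vehicle> is a hypothetical rewrite added only for comparison.

```python
import xml.etree.ElementTree as ET

# The element from the slide: VIN is an attribute, Year and Manufacturer are sub-elements.
vehicle = ET.fromstring(
    '<Vehicle VIN="3232382432832">'
    "<Year>1997</Year><Manufacturer>Toyota</Manufacturer>"
    "</Vehicle>"
)
vin = vehicle.get("VIN")          # attributes are read by name from the element itself
year = vehicle.findtext("Year")   # sub-elements are located as children

# Hypothetical alternative: the same data with everything stored as attributes.
flat = ET.fromstring('<Vehicle VIN="3232382432832" Year="1997" Manufacturer="Toyota"/>')

print(vin, year)
print(flat.attrib)
```

Attribute values are always plain strings, which is one reason to keep them for metadata and scalar values and to use sub-elements for anything structured or repeatable.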
A Complete Example (1)
<?xml version="1.0"?>
<!-- Deitel 2000, Fig. 28.1: article.xml -->
<!-- Article formatted with XML -->
<article>
  <title>Simple XML</title>
  <date>September 6, 1999</date>
  <author>
    <fname>Tem</fname>
    <lname>Nieto</lname>
  </author>
  <summary>XML is pretty easy.</summary>
  <content>Once you have mastered HTML, XML is easily learned. You must remember that XML is not for displaying information but for managing information.</content>
</article>
A Complete Example (2a)
<?xml version = "1.0"?>
<!-- Deitel 2000, Fig. 28.2: letter.xml -->
<!-- Business letter formatted with XML -->
<!DOCTYPE letter SYSTEM "letter.dtd">
<letter>
  <contact type = "from">
    <name>John Doe</name>
    <address1>123 Main St.</address1>
    <address2></address2>
    <city>Anytown</city>
    <state>Anystate</state>
    <zip>12345</zip>
    <phone>555-1234</phone>
    <flag id = "P"/>
  </contact>
A Complete Example (2b)
  <contact type = "to">
    <name>Joe Schmoe</name>
    <address1>Box 12345</address1>
    <address2>15 Any Ave.</address2>
    <city>Othertown</city>
    <state>Otherstate</state>
    <zip>67890</zip>
    <phone>555-4321</phone>
    <flag id = "B"/>
  </contact>
  <paragraph>Dear Sir,</paragraph>
  <paragraph>It is our privilege to inform you about our new database managed with XML. This new system will allow you to reduce the load of your inventory list server by having the client machine perform the work of sorting and filtering the data.</paragraph>
  <paragraph>Sincerely, Mr. Doe</paragraph>
</letter>
DTD
DTD = Document Type Definition
Defines the grammatical rules for the document
Not required for XML but recommended for document conformity
Can check the validity of an XML document (contains proper elements, attributes, etc.)
Uses EBNF grammar
Represented by the DOCTYPE tag, which contains three parts if it refers to an external subset:
  The root element to which it applies
  A flag, e.g., SYSTEM (personal, non-standardized) or PUBLIC (standardized, publicly available)
  The DTD name and location
DTD: Example
<!ELEMENT letter (contact+, paragraph+)>
<!ELEMENT contact (name, address1, address2, city, state, zip, phone, flag)>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address1 (#PCDATA)>
<!ELEMENT address2 (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT flag EMPTY>
<!ATTLIST flag id CDATA #IMPLIED>
<!ELEMENT paragraph (#PCDATA)>
DTD: Example (cont’d)
!ELEMENT element type declaration
  Specifies that an element is being created
  Here, a letter is being created with one or more contact elements followed by one or more paragraph elements, in that order.
Operator + means one or more occurrences
Operator * means zero or more occurrences
Operator ? means zero or exactly one occurrence
If no operator is included, exactly one occurrence is assumed.
Others: “|” - alternatives
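The occurrence operators behave much like their regular-expression counterparts, which suggests a quick way to experiment with them. The sketch below is not a DTD validator (Python's standard library does not include one); it is a hypothetical helper that translates a simple content model into a regular expression over child element names. The function names model_to_regex and children_match are assumptions made for this illustration, and #PCDATA, EMPTY and attribute declarations are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model: str) -> re.Pattern:
    """Turn a simple DTD content model into a regex over child element names."""
    compact = "".join(model.split())  # drop all whitespace
    # Each element name becomes a group matching "name " (the name plus a separator).
    grouped = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + " )", compact)
    # "," means "followed by", which is plain concatenation in a regex;
    # +, *, ?, | and parentheses mean the same thing in both notations.
    return re.compile(grouped.replace(",", ""))


def children_match(element: ET.Element, model: str) -> bool:
    """Check the order and count of an element's direct children against the model."""
    child_names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(child_names) is not None


LETTER_MODEL = "(contact+, paragraph+)"  # content model of <letter> from the DTD

good = ET.fromstring("<letter><contact/><contact/><paragraph/></letter>")
wrong_order = ET.fromstring("<letter><paragraph/><contact/></letter>")
no_paragraph = ET.fromstring("<letter><contact/></letter>")

for doc in (good, wrong_order, no_paragraph):
    print([child.tag for child in doc], children_match(doc, LETTER_MODEL))
```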
DTD: Example (cont’d)
!ATTLIST attribute list declaration
  Defines the attributes of an element
  Here, the type attribute of contact is defined to be:
    A string (as given by CDATA), which is unspecified and optional (as given by #IMPLIED).
    The string is not parsed for markup by the XML processor and is simply passed on to the application.
Others:
  #PCDATA means this element can store parsed character data (i.e., text)
  EMPTY means the element has no content
    Commonly used for an element that carries only attributes (such as flag)
More Others:
  IDs and IDREFs (your next assignment!)
XML Schema [Silberschatz et al. ’02]
XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  Typing of values
    E.g. integer, string, etc.
    Also, constraints on min/max values
  User-defined types
  Is itself specified in XML syntax, unlike DTDs
    More standard representation, but verbose
  Is integrated with namespaces
  Many more features
    List types, uniqueness and foreign key constraints, inheritance ..
BUT: significantly more complicated than DTDs, and not widely used (yet!).
XML Schema: Example
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="bank" type="BankType"/>
<xsd:element name="account">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="account-number" type="xsd:string"/>
      <xsd:element name="branch-name" type="xsd:string"/>
      <xsd:element name="balance" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
….. definitions of customer and depositor ….
<xsd:complexType name="BankType">
  <xsd:sequence>
    <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>
Querying and Transforming XML Data [Silberschatz et al. ’02]
Translation of information from one XML schema to another
Querying on XML data
The above two are closely related, and handled by the same tools
Standard XML querying/translation languages
  XPath
    Simple language consisting of path expressions
  XSLT
    Simple language designed for translation from XML to XML and XML to HTML
  XQuery
    An XML query language with a rich set of features
A wide variety of other languages have been proposed, and some served as the basis for the XQuery standard
  XML-QL, Quilt, XQL, …
Tree Model of XML Data
Query and transformation languages are based on a tree model of XML data
An XML document is modeled as a tree, with nodes corresponding to elements and attributes
  Element nodes have children nodes, which can be attributes or subelements
  Text in an element is modeled as a text node child of the element
  Children of a node are ordered according to their order in the XML document
  Element and attribute nodes (except for the root node) have a single parent, which is an element node
  The root node has a single child, which is the root element of the document
We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.
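The tree model can be inspected directly with a DOM parser. The sketch below uses Python's standard xml.dom.minidom on a one-element sample document whose values are invented for illustration. One caveat: the DOM API keeps attributes in a separate map on the element rather than among its child nodes, so it differs slightly from the model described above.

```python
from xml.dom import Node, minidom

# Hypothetical sample; written without whitespace between tags so that
# no whitespace-only text nodes appear among the children.
doc = minidom.parseString(
    '<account account-number="A-401">'
    "<balance>500</balance><branch-name>Downtown</branch-name>"
    "</account>"
)

root = doc.documentElement                    # the root element, <account>
children = [n.nodeName for n in root.childNodes]  # ordered as in the document

balance = root.firstChild                     # first child element, <balance>
text_node = balance.firstChild                # its text is a separate text node

attr = root.getAttributeNode("account-number")  # attribute node owned by <account>

print(children)
print(text_node.nodeType == Node.TEXT_NODE, text_node.data)
print(attr.name, attr.value)
```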
XPath
XPath is used to address (select) parts of documents using path expressions
A path expression is a sequence of steps separated by “/”
  Think of file names in a directory hierarchy
Result of path expression: set of values that along with their containing elements/attributes match the specified path
E.g. /bank-2/customer/name evaluated on the bank-2 data we saw earlier returns
  <name>Joe</name>
  <name>Mary</name>
E.g. /bank-2/customer/name/text( ) returns the same names, but without the enclosing tags
XPath
The initial “/” denotes root of the document (above the top-level tag)
Path expressions are evaluated left to right
  Each step operates on the set of instances produced by the previous step
Selection predicates may follow any step in a path, in [ ]
  E.g. /bank-2/account[balance > 400]
    returns account elements with a balance value greater than 400
  /bank-2/account[balance] returns account elements containing a balance subelement
Attributes are accessed using “@”
  E.g. /bank-2/account[balance > 400]/@account-number
    returns the account numbers of those accounts with balance > 400
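The path expressions above can be approximated with Python's standard xml.etree.ElementTree, which supports only a subset of XPath: paths are written relative to the root element ("./…"), and numeric predicates such as balance > 400 as well as text( ) and @attribute steps are not available, so those parts are done in plain Python here. The sample document is a made-up stand-in for the bank-2 data, which is not reproduced in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bank-2 data.
BANK = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"/>
  <customer><name>Joe</name></customer>
  <customer><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(BANK)  # root is the <bank-2> element itself

# /bank-2/customer/name/text( )
names = [n.text for n in root.findall("./customer/name")]

# /bank-2/account[balance]  (accounts that contain a balance subelement)
with_balance = root.findall("./account[balance]")

# /bank-2/account[balance > 400]/@account-number
# The numeric comparison is applied in Python, since ElementTree lacks it.
rich = [
    a.get("account-number")
    for a in with_balance
    if float(a.findtext("balance")) > 400
]

print(names)
print(len(with_balance))
print(rich)
```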
XPath
Operator “|” used to implement union
  E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower) gives customers with either accounts or loans
  However, “|” cannot be nested inside other operators.
“//” can be used to skip multiple levels of nodes
  E.g. /bank-2//name finds any name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  “//”, described above, is a short form for specifying “all descendants”
  “..” specifies the parent.
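The “//” and “..” steps are part of the XPath subset that Python's standard xml.etree.ElementTree understands; the “|” operator is not, so the union is approximated below by concatenating two result lists. The sample document, including the branch element, is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; <branch> is added so that <name> occurs under
# more than one kind of parent.
BANK = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <branch><name>Downtown</name></branch>
  <customer><name>Joe</name></customer>
  <customer><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(BANK)

# /bank-2//name  (any name element at any depth below bank-2)
all_names = [n.text for n in root.findall(".//name")]

# /bank-2//name/..  (the parents of those name elements)
parents = [p.tag for p in root.findall(".//name/..")]

# /bank-2/account | /bank-2/customer  (union, emulated by list concatenation)
union = root.findall("./account") + root.findall("./customer")

print(all_names)
print(parents)
print([e.tag for e in union])
```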
Functions in XPath
XPath provides several functions
  The function count() at the end of a path counts the number of elements in the set generated by the path
    E.g. /bank-2/account[customer/count() > 2]
      Returns accounts with > 2 customers
      (in XPath 1.0 syntax this is written /bank-2/account[count(customer) > 2])
  Also a function for testing the position (1, 2, ..) of a node w.r.t. its siblings
Boolean connectives and and or and function not() can be used in predicates
IDREFs can be referenced using function id()
  id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
  E.g. /bank-2/account/id(@owner)
    returns all customers referred to from the owners attribute of account elements.
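Python's standard xml.etree.ElementTree has neither count() nor id(), and it does not read DTDs, so it cannot know which attributes are IDs. The sketch below emulates both functions by hand under explicit assumptions: customer-id is treated as the ID attribute and owners as an IDREFS attribute, and all sample values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: "customer-id" plays the role of an ID attribute and
# "owners" the role of an IDREFS attribute (blank-separated references).
BANK = """<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <account account-number="A-402" owners="C102"/>
  <customer customer-id="C100"><name>Joe</name></customer>
  <customer customer-id="C102"><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(BANK)

# Index every customer by its ID so that references can be resolved.
by_id = {c.get("customer-id"): c for c in root.findall("./customer")}


def resolve(idrefs: str) -> list:
    """Emulate id(): map a blank-separated string of references to elements."""
    return [by_id[ref] for ref in idrefs.split() if ref in by_id]


# Emulates /bank-2/account/id(@owners), grouped per account.
owners = {
    a.get("account-number"): [c.findtext("name") for c in resolve(a.get("owners", ""))]
    for a in root.findall("./account")
}

# Emulates a count() predicate: accounts with more than one owner.
shared = [number for number, names in owners.items() if len(names) > 1]

print(owners)
print(shared)
```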
XSL
Extensible Stylesheet Language (XSL)
Defines the layout of an XML document (much like CSS defines the layout of an HTML document)
An XSL style sheet provides the rules for displaying an XML document
XSL also defines rules on how an XML document is transformed into another XML document (i.e., XSLT for XSL Transformation)
XSL: Example (cont’d)
<xsl:for-each order-by = "+Lastname;+Firstname" select = "contact" xmlns:xsl = "http://www.w3.org/TR/WD-xsl">
for-each
  Iterates over each contact element
order-by
  + means ascending; - means descending
select
  Defines which elements are selected
xmlns
  XML namespace
  Indicates where the specification for this element is located
XSL: Example (cont’d)
<Lastname><xsl:value-of select = "Lastname"/></Lastname>
xsl:value-of
  Retrieves the data specified in attribute select
  Empty element and thus the ‘/’
<xsl:for-each select = "contact[Lastname='Neito']">
contact[Lastname='Neito']
  [] specifies an XSL conditional statement
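To make the effect of these attributes concrete, the sketch below reproduces the same selection, ordering and filtering in Python with the standard xml.etree.ElementTree module. It does not run XSL; it only mirrors what for-each, order-by, value-of and the [ ] condition do. The contact data and the contacts root element are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact list; only Lastname and Firstname are modeled.
CONTACTS = """<contacts>
  <contact><Lastname>Neito</Lastname><Firstname>Tem</Firstname></contact>
  <contact><Lastname>Doe</Lastname><Firstname>John</Firstname></contact>
  <contact><Lastname>Doe</Lastname><Firstname>Jane</Firstname></contact>
</contacts>"""

root = ET.fromstring(CONTACTS)

# for-each select="contact" order-by="+Lastname;+Firstname":
# visit every contact, sorted ascending by last name, then by first name.
ordered = sorted(
    root.findall("contact"),
    key=lambda c: (c.findtext("Lastname"), c.findtext("Firstname")),
)

# value-of select="Lastname": pull out the text of a child element.
names = [(c.findtext("Lastname"), c.findtext("Firstname")) for c in ordered]

# select="contact[Lastname='Neito']": keep only contacts matching the condition.
matches = root.findall("contact[Lastname='Neito']")

print(names)
print([m.findtext("Firstname") for m in matches])
```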
XSL: Example (cont’d)
var xmldoc = xmlData.cloneNode( true );
  Copies the xmlData object so that we don’t lose the original
  true means recursively copy
function sort( xsldoc ) {
  xmldoc.documentElement.transformNodeToObject( xsldoc.documentElement, xmlData.XMLDocument );
}
transformNodeToObject
  Applies a specified XSL style sheet to the data contained in the parent object
documentElement gets the root element
XMLDocument accesses the XML document to which xmlData refers