eXtensible Markup Language
Datenbanken und Internet WS 2006
Karsten Tolle, Database and Information Systems (DBIS)
WS2006/2007 Vorlesung: Datenbanken und Internet Copyright 2006 – DBIS/Dr. Karsten Tolle
TV Schedule DTD by David Moisan, copied from his website: http://www.davidmoisan.org/
<!DOCTYPE TVSCHEDULE [
  <!ELEMENT TVSCHEDULE (CHANNEL+)>
  <!ELEMENT CHANNEL (BANNER,DAY+)>
  <!ELEMENT BANNER (#PCDATA)>
  <!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
  <!ELEMENT HOLIDAY (#PCDATA)>
  <!ELEMENT DATE (#PCDATA)>
  <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
  <!ELEMENT TIME (#PCDATA)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT DESCRIPTION (#PCDATA)>

  <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
  <!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
  <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
  <!ATTLIST TITLE RATING CDATA #IMPLIED>
  <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
DTD – Summary
[Figure: a DTD and an XML file are passed to a validating XML parser, which sits in front of the application.]
• An XML file is valid if it conforms to its DTD.
• This can be tested with a so-called validating XML parser.
• For most applications it is useful to test whether an XML input file is valid according to the expected format/interpretation.
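The validity check described above can be tried with the JAXP DOM parser that ships with Java. The following is only a minimal sketch: the class name DtdValidationSketch and the miniature DTD (a cut-down variant of the TV schedule DTD) are assumptions made for illustration, not part of the lecture material.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdValidationSketch {

    // Hypothetical miniature DTD, loosely modelled on the TV schedule example.
    static final String DTD =
          "<!DOCTYPE TVSCHEDULE [\n"
        + "  <!ELEMENT TVSCHEDULE (CHANNEL+)>\n"
        + "  <!ELEMENT CHANNEL (#PCDATA)>\n"
        + "  <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>\n"
        + "]>\n";

    /** Returns true if the document is well-formed and valid against its own DOCTYPE. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // switch on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            final boolean[] valid = { true };
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity violations are reported as recoverable errors
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return valid[0];
        } catch (SAXException e) {
            return false; // fatal error: the input is not even well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = DTD + "<TVSCHEDULE NAME=\"demo\"><CHANNEL>1</CHANNEL></TVSCHEDULE>";
        String bad = DTD + "<TVSCHEDULE NAME=\"demo\"/>"; // CHANNEL+ needs at least one child
        System.out.println("good document valid: " + isValid(good));
        System.out.println("bad document valid:  " + isValid(bad));
    }
}
```

An application would typically run such a check before interpreting the input, so that later processing can rely on the expected structure.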
Drawbacks DTD
• DTD uses cryptic SGML syntax
– difficult to write
– difficult to read
– differs from the XML syntax
• DTD by default provides just a small set of data types
• Each XML file can only be based on one DTD!
Namespaces in XML
Namespaces
• To help identify origin or meaning of an element or attribute
• To allow two sets of elements to be combined even if there are duplicate element names
Example:
<data xmlns:fruit="http://www.thirdm.com/fruit"
      xmlns:corp="http://www.thirdm.com/corporations">
  <fruit:apple qty="5" type="Granny Smith"/>
  <corp:apple stockticker="AAPL" exchange="NASDAQ"/>
</data>
Namespace URI
• URI = Uniform Resource Identifier
• Used to uniquely identify the namespace
• For XML itself, nothing needs to exist at the URI; for other applications like RDF this might differ!
Example:
<food xmlns:fruit="http://www.thirdm.com/fruit"
      xmlns:veg="http://www.thirdm.com/vegetables">
  <fruit:apple qty="5"/>
  <fruit:pear qty="6"/>
  <veg:potato qty="7"/>
</food>
Namespace Prefix
• Used to refer to the namespace
• Typically short, often three letters
Example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Book"></rdfs:Class>
</rdf:RDF>
Note: The ‘#’ anchor sign is used in RDF to point directly to resources inside a namespace.
Another example …
<?xml version="1.0" encoding="UTF-8" ?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
       xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
       xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
       xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
       xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
       xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
  <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
  <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
  <cbc:ID>AEG012345</cbc:ID>
  <cbc:SalesOrderID>CON0095678</cbc:SalesOrderID>
  <cbc:CopyIndicator>false</cbc:CopyIndicator>
  …
Default Namespace
• Used to identify the namespace for elements without a prefix
Example:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://www.w3.org/2000/01/rdf-schema#">
  <Class rdf:ID="Book">
  </Class>
</rdf:RDF>
Attributes and Namespaces
• Never associated with default namespace!
• Can have explicit namespace prefix
Example:
<!-- http://www.w3.org is bound to n1 and is the default -->
<x xmlns:n1="http://www.w3.org" xmlns="http://www.w3.org">
  <good a="1" b="2" />
  <good a="1" n1:a="2" />
</x>
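That unprefixed attributes stay outside the default namespace can be observed with a namespace-aware DOM parser. This is a small sketch using the JAXP API; the class name AttributeNamespaceSketch and its helper methods are hypothetical, and the input is the second good element from the example above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeNamespaceSketch {

    static final String NS = "http://www.w3.org";

    // Same structure as the slide example: NS is both the default namespace and bound to n1.
    static final String XML =
          "<x xmlns:n1=\"" + NS + "\" xmlns=\"" + NS + "\">"
        + "<good a=\"1\" n1:a=\"2\"/>"
        + "</x>";

    /** Parses the sample document namespace-aware and returns its good element. */
    static Element goodElement() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getElementsByTagNameNS(NS, "good").item(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace of the unprefixed element: the default namespace applies. */
    static String elementNamespace() {
        return goodElement().getNamespaceURI();
    }

    /** Value of attribute a in the given namespace (null means "no namespace"). */
    static String valueOfA(String namespaceUri) {
        Attr attr = goodElement().getAttributeNodeNS(namespaceUri, "a");
        return attr.getValue();
    }

    public static void main(String[] args) {
        System.out.println("element namespace:      " + elementNamespace());
        System.out.println("a in no namespace:      " + valueOfA(null));
        System.out.println("a in " + NS + ": " + valueOfA(NS));
    }
}
```

The two attributes named a do not clash because one has no namespace and the other belongs to the namespace bound to n1.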
Namespace Scope
• Scope is limited to the element the namespace is declared on, including its descendants
• May be overridden by a child element
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class xmlns:rdfs="http://www.myrdfs.com" rdf:ID="Book">
  </rdfs:Class>
</rdf:RDF>
Notes about Namespaces
• Namespaces are not part of XML 1.0 itself; they are defined in a separate specification (Namespaces in XML)
• An XML parser (processor) may or may not support XML Namespaces
• Some parsers let you check at runtime whether they support namespaces
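As an illustration of such a runtime check, the JAXP factories in Java expose the namespace setting directly. The sketch below assumes the standard javax.xml.parsers API; the class name NamespaceAwarenessSketch is made up for this example.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class NamespaceAwarenessSketch {

    /** Namespace awareness of a freshly created factory, without any configuration. */
    static boolean defaultAwareness() {
        return SAXParserFactory.newInstance().isNamespaceAware();
    }

    /** Requests namespace support and asks the resulting parser whether it provides it. */
    static boolean requestedAwareness() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            return parser.isNamespaceAware();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default factory namespace-aware: " + defaultAwareness());
        System.out.println("after setNamespaceAware(true):   " + requestedAwareness());
    }
}
```

In JAXP, namespace processing is switched off unless it is requested explicitly, so code that relies on namespace URIs should set the flag rather than assume it.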
Notes about Namespaces
• DTDs and namespaces are compatible but do not work well together
– For example, the namespace prefix must be static if elements are declared in the DTD
Quote from Namespaces in XML 1.0: Note that DTD-based validation is not namespace-aware in the following sense: a DTD constrains the elements and attributes that may appear in a document by their uninterpreted names, not by (namespace name, local name) pairs. To validate a document that uses namespaces against a DTD, the same prefixes must be used in the DTD as in the instance. A DTD may however indirectly constrain the namespaces used in a valid document by providing #FIXED values for attributes that declare namespaces.
XML Schema – Why?
• DTD uses cryptic SGML syntax
– difficult to write
– difficult to read
– differs from the XML syntax
• DTD provides just a small set of data types
• Each XML file can only be based on one DTD!
• DTD and XML Namespaces do not work well together
Schema Root Element and targetNamespace
<xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.dbis.de">
  <!-- element and attribute declarations go here -->
</xsd:schema>
Element Declaration
<xsd:element name="notice" type="xsd:string"/>
• Compare to: <!ELEMENT notice (#PCDATA)>
• Note that using a prefix inside an attribute value (as in type="xsd:string") might not work with every XML application. In RDF, for example, it would not be allowed; entity references should be used instead.
Attribute Declaration
<xsd:element name="article">
  <xsd:complexType>
    <xsd:attribute name="title" type="xsd:string" use="required"/>
    <xsd:attribute name="author" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:element>
Data Types
• XML Schema Part 2: Datatypes Second Edition
– W3C Recommendation 28 October 2004
– http://www.w3.org/TR/xmlschema-2/
• Built-in datatypes are those which are defined in this specification, and can be either primitive or derived;
• User-derived datatypes are those derived datatypes that are defined by individual schema designers.
Element Declaration with Children
<xsd:element name="publications">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="article"/>
        <xsd:element ref="book"/>
      </xsd:choice>
      <xsd:element ref="notice" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Compare to DTD: <!ELEMENT publications ((article | book)*, notice?)>
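The content model ((article | book)*, notice?) can be exercised with the javax.xml.validation API. The sketch below is an assumption-laden illustration: the class name SchemaValidationSketch is invented, and the referenced elements article, book and notice are declared as plain strings here only so that the schema is complete.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaValidationSketch {

    // Hypothetical stand-alone schema built around the publications declaration above.
    static final String XSD =
          "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"article\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"book\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"notice\" type=\"xsd:string\"/>"
        + "<xsd:element name=\"publications\"><xsd:complexType><xsd:sequence>"
        + "<xsd:choice minOccurs=\"0\" maxOccurs=\"unbounded\">"
        + "<xsd:element ref=\"article\"/><xsd:element ref=\"book\"/>"
        + "</xsd:choice>"
        + "<xsd:element ref=\"notice\" minOccurs=\"0\"/>"
        + "</xsd:sequence></xsd:complexType></xsd:element>"
        + "</xsd:schema>";

    /** Returns true if the instance document validates against XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // without a custom error handler, the first violation is thrown
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<publications><article>a</article><book>b</book><notice>n</notice></publications>";
        String wrongOrder = "<publications><notice>n</notice><book>b</book></publications>";
        System.out.println("expected order valid: " + isValid(ok));
        System.out.println("notice first valid:   " + isValid(wrongOrder));
    }
}
```

The second document fails because the sequence requires any article or book elements to come before the optional notice.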
Separation into logical parts
An XML Schema can grow large. It is therefore useful to separate the definitions of logical parts, such as the definition of an address, from the rest. This makes the schema easier to maintain and reuse.
<schema targetNamespace="http://www.example.com/IBEST"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:ibest="http://www.example.com/IBEST">
  <annotation>
    <documentation xml:lang="DE">
      Adressen für das internationale Buchbestellungsschema für Example.com.
      Copyright 2001 Example.com. Alle Rechte vorbehalten.
    </documentation>
  </annotation>
Another example …
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Document Type: Order
  Generated On: Tue Oct 03 2:26:38 P3 2006
-->
<!-- ===== xsd:schema Element With Namespaces Declarations ===== -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
    xmlns:ccts="urn:un:unece:uncefact:documentation:2"
    xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
    xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="2.0">
  <!-- ===== Imports ===== -->
  <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
              schemaLocation="../common/UBL-CommonAggregateComponents-2.0.xsd"/>
  <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
              schemaLocation="../common/UBL-CommonBasicComponents-2.0.xsd"/>
  <xsd:import namespace="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
              schemaLocation="../common/UnqualifiedDataTypeSchemaModule-2.0.xsd"/>
  <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
              schemaLocation="../common/UBL-CommonExtensionComponents-2.0.xsd"/>
  <xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
              schemaLocation="../common/UBL-QualifiedDatatypes-2.0.xsd"/>
  <!-- ===== Root Element ===== -->
  <xsd:element name="Order" type="OrderType">
    <xsd:annotation>
      <xsd:documentation>This element MUST be conveyed as the root element in any instance document based on this Schema expression</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
  <xsd:complexType name="OrderType">
Include
To include separate parts of a schema, the main schema uses the include element.
<include schemaLocation="http://www.example.com/schemas/adresse.xsd"/>
[Figure: a main schema with its included schemas.]
Validation
<?xml version="1.0" encoding="UTF-8"?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
       xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
       xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
       xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
       xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
       xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
  <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
  <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
  <cbc:ID>AEG012345</cbc:ID>
  …
</Order>
Processing XML
SAX vs DOM vs StAX
DOM (Document Object Model)
Generates the tree structure out of the elements contained in the XML document.
• Very useful for small documents
• Random access to structure using objects
• Can read, manipulate, and write XML programmatically
• Write recursive code to explore child nodes of unknown or evolving schema
• Write hard-coded procedures to handle static well-known schema
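The recursive exploration of child nodes mentioned above could look roughly as follows with the W3C DOM API in Java. The class name DomWalkSketch and the sample document are hypothetical; the code makes no assumption about which elements occur.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalkSketch {

    // Hypothetical input; the walker does not depend on these element names.
    static final String XML =
        "<publications><article title=\"A\"/><book><chapter/></book></publications>";

    /** Appends the name of every element below node, indented by nesting depth. */
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out); // recurse into each child
        }
    }

    /** Builds the DOM tree for xml and returns its element outline. */
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(XML));
    }
}
```

Because the whole tree is in memory, the walker can visit nodes in any order and as often as needed, which is the flexibility that DOM pays for with memory.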
SAX (Simple API for XML)
Based on events like (default handler):
– startDocument()
– endDocument()
– startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes)
– endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
– error(SAXParseException e)
– fatalError(SAXParseException e)
• Uses much less memory than DOM, especially for large documents (but for some applications more than one pass is needed)
DOM vs SAX
              DOM      SAX
memory        -        +
flexibility   +        -
performance   - (*)    + (*)
standard      W3C      xml-dev
* Depending on the application: if more than one pass is needed, DOM might be better!
Problems …
• What if DOM and SAX are both not acceptable? E.g. mobile devices with J2ME
• DOM needs too much memory
• Common streaming APIs like SAX are all push APIs
– It is the SAX parser that pushes the tokens into the application, which is not easy to handle
public class Flour extends DefaultHandler {
    …
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        …
    }
    …
    public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f);
        try {
            p.parse(args[0]);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(f.amount);
    }
    …
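For comparison, here is a complete push-style handler that can be run as is. It is a sketch under assumptions, not the elided Flour class: the class name SaxFlourSketch, the recipe document and its ingredient, name and amount names are all invented for illustration, and the parser is obtained through the JAXP SAXParserFactory rather than instantiated directly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxFlourSketch extends DefaultHandler {

    // Hypothetical recipe document; element and attribute names are assumptions.
    static final String XML =
          "<recipe>"
        + "<ingredient name=\"flour\" amount=\"200\"/>"
        + "<ingredient name=\"sugar\" amount=\"50\"/>"
        + "<ingredient name=\"flour\" amount=\"100\"/>"
        + "</recipe>";

    int amount = 0; // state has to live in the handler between callbacks

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        // Called by the parser for every start tag; the application cannot ask for it.
        if ("ingredient".equals(qName) && "flour".equals(atts.getValue("name"))) {
            amount += Integer.parseInt(atts.getValue("amount"));
        }
    }

    /** Parses xml and returns the summed amount of all flour ingredients. */
    static int flourAmount(String xml) {
        try {
            SaxFlourSketch handler = new SaxFlourSketch();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.amount;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(flourAmount(XML));
    }
}
```

The handler only reacts to events; control over the parsing loop stays with the parser, which is what makes push APIs awkward when the application wants to decide when to read more input.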
Alternative: StAX
• StAX (Streaming API for XML)
– a pull parsing API
– With e.g. next() the application can request the next token.
– JSR 173 (Java Specification Request): http://jcp.org/en/jsr/detail?id=173
Example for calling next …
// parser is an XMLStreamReader
while (true) {
    int event = parser.next();
    if (event == XMLStreamConstants.END_DOCUMENT) {
        parser.close();
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(parser.getLocalName());
    }
}
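A self-contained variant of this pull loop, including the creation of the reader, might look like the sketch below. It assumes the javax.xml.stream API from JSR 173; the class name StaxPullSketch and the sample input are made up, and the loop uses hasNext() instead of testing for END_DOCUMENT, which is a small deviation from the snippet above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {

    /** Pulls events one by one and collects the local name of every start tag. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (parser.hasNext()) {
                int event = parser.next(); // the application decides when to advance
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(parser.getLocalName());
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical input document.
        System.out.println(elementNames("<food><apple/><pear/><potato/></food>"));
    }
}
```

Since the application owns the loop, it can stop early, skip ahead, or interleave parsing with other work without keeping extra state in callbacks.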
Transforming XML
[Figure labels: GIF, JPG, NSK-TIFF etc.; AVI, AU, WAV, WMA, MP3 etc.; MPG, WMV, RM etc.; DOC, HTML, PDF etc.; JPEG, GIF etc.]
XSL
The Extensible Stylesheet Language (XSL) has three subcomponents:
• XSL-FO: XSL Formatting Objects, an XML vocabulary for specifying formatting semantics.
• XSLT: the transformation language, which lets you transform XML into some other format.
• XPath: an addressing mechanism that lets you specify a path to an element.
• Extensible Stylesheet Language (XSL) Version 1.0
– W3C Recommendation 15 October 2001
XSLT Processing
XSLT (XSL Transformations)
• XSLT is a programming language
• Write scripts containing if statements and for-each loops
• Uses XPath for querying document, math calculations, and string functions
• Can transform XML into HTML or text
• Useful for transforming XML to XML
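The points above (for-each loops, if tests, XPath expressions, text output) can be seen together in a very small stylesheet run through the javax.xml.transform API. This is only a sketch: the class name XsltSketch, the stylesheet and the publications input with its title and year attributes are assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {

    // Hypothetical stylesheet: lists the titles of all books newer than 2000 as text.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/publications\">"
        + "<xsl:for-each select=\"book\">"
        + "<xsl:if test=\"@year &gt; 2000\">"
        + "<xsl:value-of select=\"@title\"/><xsl:text>;</xsl:text>"
        + "</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input document.
    static final String XML =
          "<publications>"
        + "<book title=\"Old\" year=\"1999\"/>"
        + "<book title=\"New\" year=\"2003\"/>"
        + "</publications>";

    /** Applies XSLT to xml and returns the textual result. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML));
    }
}
```

Switching the output method to html or xml and emitting literal result elements instead of plain text would turn the same pattern into an XML-to-HTML or XML-to-XML transformation.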
So, what is XML
• A meta markup language
• Structured information that complies with a standard structure and syntax
• “The ASCII of the 21st Century”
• Platform independent information for:
– Presentation instructions
– User settings
– Data repository
– Data transfer
– RPC calls
– ...
What XML is not
• XML is not tied to any human language or character encoding
• XML is not tied to any computing platform or programming language
• XML has no semantics
Literature I
• XML Professionell; Richard Anderson u.a.; MITP-Verlag; 2000; ISBN 3-8266-0633-7
• XML Data Management; Akmal B. Chaudhri, Awais Rashid and Roberto Zicari; Addison Wesley; 2003; ISBN 0-201-84452-4
Literature II
Resources of DBIS related to XML (German):
• Einführung in XML & Document Type Definition; Alexander Semino; Seminar SS 2001
• XML-Schemata; Markus Krauße; Seminar SS 2001
• XSL – Dokumente mit Stil; Fabian Wleklinski; Seminar SS 2001
• HTML und XML; Christina Anthes; Proseminar SS 2002