
eXtensible Markup Language

Datenbanken und Internet (Databases and the Internet), WS 2006

Karsten Tolle, Database and Information Systems (DBIS)

WS2006/2007 Vorlesung: Datenbanken und Internet Copyright 2006 – DBIS/Dr. Karsten Tolle

TV Schedule DTD by David Moisan, copied from his website: http://www.davidmoisan.org/

<!DOCTYPE TVSCHEDULE [
  <!ELEMENT TVSCHEDULE (CHANNEL+)>
  <!ELEMENT CHANNEL (BANNER,DAY+)>
  <!ELEMENT BANNER (#PCDATA)>
  <!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
  <!ELEMENT HOLIDAY (#PCDATA)>
  <!ELEMENT DATE (#PCDATA)>
  <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
  <!ELEMENT TIME (#PCDATA)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT DESCRIPTION (#PCDATA)>

  <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
  <!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
  <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
  <!ATTLIST TITLE RATING CDATA #IMPLIED>
  <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>


DTD – Summary

• An XML file is valid if it conforms to its DTD.
• This can be tested with a so-called validating XML parser.
• For most applications it is useful to test whether an XML input file is valid according to the expected format/interpretation.

[Diagram: DTD and XML → Validating XML Parser → Application]
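
The validity check described above can be tried out with the JDK's built-in DOM parser. The following is a minimal sketch, not a definitive implementation: it assumes an excerpt of the TV schedule DTD with PROGRAMSLOT used as the root element, and the class name, sample documents and values are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {
    // Excerpt of the TV schedule DTD; using PROGRAMSLOT as root is an assumption.
    static final String DTD =
        "<!DOCTYPE PROGRAMSLOT [\n"
      + "  <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>\n"
      + "  <!ELEMENT TIME (#PCDATA)>\n"
      + "  <!ELEMENT TITLE (#PCDATA)>\n"
      + "  <!ELEMENT DESCRIPTION (#PCDATA)>\n"
      + "]>\n";

    // Made-up sample documents: the second one lacks the mandatory TIME child.
    static final String VALID_DOC =
        DTD + "<PROGRAMSLOT><TIME>20:00</TIME><TITLE>News</TITLE></PROGRAMSLOT>";
    static final String INVALID_DOC =
        DTD + "<PROGRAMSLOT><TITLE>News</TITLE></PROGRAMSLOT>";

    /** Returns true if the document parses and conforms to its DTD. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported to the handler, so rethrow them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Simplification: any failure (including I/O problems) counts as "not valid".
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID_DOC));
        System.out.println(isValid(INVALID_DOC));
    }
}
```

Without the error handler, a validating parser would typically just log validity errors and carry on, so rethrowing them is what lets the application reject the input.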


Drawbacks DTD

• DTD uses cryptic SGML syntax
  – difficult to write
  – difficult to read
  – differs from the XML syntax

• DTD by default provides just a small set of data types

• Each XML file can only be based on one DTD!


Namespaces in XML


Namespaces

• To help identify origin or meaning of an element or attribute

• To allow two sets of elements to be combined even if there are duplicate element names

Example:

<data xmlns:fruit="http://www.thirdm.com/fruit"
      xmlns:corp="http://www.thirdm.com/corporations">
  <fruit:apple qty="5" type="Granny Smith"/>
  <corp:apple stockticker="AAPL" exchange="NASDAQ"/>
</data>
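
To see how an application can tell the two apple elements apart, the example can be parsed with a namespace-aware DOM parser and queried by namespace URI instead of by prefix. This is a small sketch using the JDK's standard DOM API; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceLookupSketch {
    // The example document from the slide (single quotes are equivalent in XML).
    static final String XML =
        "<data xmlns:fruit='http://www.thirdm.com/fruit'"
      + " xmlns:corp='http://www.thirdm.com/corporations'>"
      + "<fruit:apple qty='5' type='Granny Smith'/>"
      + "<corp:apple stockticker='AAPL' exchange='NASDAQ'/>"
      + "</data>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads an attribute of the first "apple" element in the given namespace. */
    static String appleAttribute(String namespaceUri, String attribute) {
        NodeList apples = parse(XML).getElementsByTagNameNS(namespaceUri, "apple");
        return ((Element) apples.item(0)).getAttribute(attribute);
    }

    public static void main(String[] args) {
        // Same local name "apple", selected by two different namespace URIs.
        System.out.println(appleAttribute("http://www.thirdm.com/fruit", "qty"));
        System.out.println(appleAttribute("http://www.thirdm.com/corporations", "stockticker"));
    }
}
```

The lookup uses the URI, not the prefix: renaming fruit: to f: in the document would not change the result.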


Namespace URI

• URI = Uniform Resource Identifier
• Used to uniquely identify the namespace
• As far as XML is concerned, nothing needs to exist at the URI; for other applications such as RDF this might differ!

Example:

<food xmlns:fruit="http://www.thirdm.com/fruit"
      xmlns:veg="http://www.thirdm.com/vegetables">
  <fruit:apple qty="5"/>
  <fruit:pear qty="6"/>
  <veg:potato qty="7"/>
</food>


Namespace Prefix

• Used to refer to the namespace
• Typically short, often three letters

Example:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Book"></rdfs:Class>
</rdf:RDF>

Note: The '#' anchor sign is used in RDF to point directly to resources inside a namespace.


Another example …

<?xml version="1.0" encoding="UTF-8" ?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
       xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
       xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
       xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
       xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
       xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
  <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
  <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
  <cbc:ID>AEG012345</cbc:ID>
  <cbc:SalesOrderID>CON0095678</cbc:SalesOrderID>
  <cbc:CopyIndicator>false</cbc:CopyIndicator>
  …


Default Namespace

• Used to identify the namespace for elements without a prefix

Example:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.w3.org/2000/01/rdf-schema#">
  <Class rdf:ID="Book">
  </Class>
</rdf:RDF>


Attributes and Namespaces

• Never associated with default namespace!

• Can have explicit namespace prefix

Example:

<!-- http://www.w3.org is bound to n1 and is the default -->
<x xmlns:n1="http://www.w3.org"
   xmlns="http://www.w3.org" >
  <good a="1"     b="2" />
  <good a="1"     n1:a="2" />
</x>
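
The rule that unprefixed attributes never pick up the default namespace can be checked with a namespace-aware DOM parser. The sketch below looks at the second good element of the example: a is in no namespace, while n1:a is in http://www.w3.org, so the two do not clash. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class AttributeNamespaceSketch {
    static final String W3 = "http://www.w3.org";

    // The example document from the slide.
    static final String XML =
        "<x xmlns:n1='http://www.w3.org' xmlns='http://www.w3.org'>"
      + "<good a='1' b='2'/>"
      + "<good a='1' n1:a='2'/>"
      + "</x>";

    /** Reads an attribute of the second "good" element by (namespace URI, local name). */
    static String secondGoodAttribute(String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            // The elements themselves ARE in the default namespace.
            Element second = (Element) doc.getElementsByTagNameNS(W3, "good").item(1);
            return second.getAttributeNS(namespaceUri, localName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // null means "no namespace": this is the unprefixed attribute a.
        System.out.println(secondGoodAttribute(null, "a"));
        // The prefixed attribute n1:a lives in the http://www.w3.org namespace.
        System.out.println(secondGoodAttribute(W3, "a"));
    }
}
```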


Namespace Scope

• A namespace declaration is in scope for the element it is declared on and for that element's content

• May be overridden by a child element

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class xmlns:rdfs="http://www.myrdfs.com" rdf:ID="Book">
  </rdfs:Class>
</rdf:RDF>
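
The override can be made visible by listing the namespace URI of each Class element. In this sketch a second, hypothetical sibling (rdf:ID="Journal") is added to the example to show that the outer binding of rdfs still applies outside the overriding element; the class name is likewise made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceScopeSketch {
    // Slide example plus one extra sibling (hypothetical) without an override.
    static final String XML =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
      + " xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>"
      + "<rdfs:Class xmlns:rdfs='http://www.myrdfs.com' rdf:ID='Book'></rdfs:Class>"
      + "<rdfs:Class rdf:ID='Journal'></rdfs:Class>"
      + "</rdf:RDF>";

    /** Returns the namespace URI of every element with local name "Class", in document order. */
    static List<String> classNamespaces() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(XML)));
            NodeList classes = doc.getElementsByTagNameNS("*", "Class"); // "*" = any namespace
            List<String> uris = new ArrayList<>();
            for (int i = 0; i < classes.getLength(); i++) {
                uris.add(classes.item(i).getNamespaceURI());
            }
            return uris;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First entry: the overriding URI; second entry: the URI declared on rdf:RDF.
        System.out.println(classNamespaces());
    }
}
```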


Notes about Namespaces

• Namespaces are not part of the XML 1.0 specification itself; they are defined in a separate W3C Recommendation, "Namespaces in XML"

• An XML parser (processor) may or may not support XML Namespaces

• Some parsers allow you to check at runtime whether they support namespaces


Notes about Namespaces

• DTDs and namespaces are compatible but do not work well together
  – For example, the namespace prefix must be static if elements are declared in the DTD

Quote from Namespaces in XML 1.0: Note that DTD-based validation is not namespace-aware in the following sense: a DTD constrains the elements and attributes that may appear in a document by their uninterpreted names, not by (namespace name, local name) pairs. To validate a document that uses namespaces against a DTD, the same prefixes must be used in the DTD as in the instance. A DTD may however indirectly constrain the namespaces used in a valid document by providing #FIXED values for attributes that declare namespaces.


XML Schema


XML Schema – Why?

• DTD uses cryptic SGML syntax
  – difficult to write
  – difficult to read
  – differs from the XML syntax
• DTD provides just a small set of data types
• Each XML file can only be based on one DTD!
• DTD and XML Namespaces do not work well together


Schema Root Element and targetNamespace

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.dbis.de">
  <!-- element and attribute declarations go here -->
</xsd:schema>


Element Declaration

<xsd:element name="notice" type="xsd:string"/>

• Compare to: <!ELEMENT notice (#PCDATA)>

• Note that using a prefix inside an attribute value (as in type="xsd:string") might not work with every XML application. E.g. in RDF it would not be allowed; entity references should be used instead.


Attribute Declaration

<xsd:element name="article">
  <xsd:complexType>
    <xsd:attribute name="title" type="xsd:string" use="required"/>
    <xsd:attribute name="author" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:element>


Data Types

• XML Schema Part 2: Datatypes Second Edition
  – W3C Recommendation 28 October 2004
  – http://www.w3.org/TR/xmlschema-2/

• Built-in datatypes are those which are defined in this specification, and can be either primitive or derived;

• User-derived datatypes are those derived datatypes that are defined by individual schema designers.
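
As an illustration of a user-derived datatype, the sketch below restricts the built-in type xsd:integer to the range 1 to 5 and validates a few values against it with the JDK's javax.xml.validation API. The type name ratingType, the element rating and the class name are made up for this example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

final class DerivedTypeSketch {
    // Hypothetical schema: ratingType is derived from xsd:integer by restriction.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:simpleType name='ratingType'>"
      + "    <xsd:restriction base='xsd:integer'>"
      + "      <xsd:minInclusive value='1'/>"
      + "      <xsd:maxInclusive value='5'/>"
      + "    </xsd:restriction>"
      + "  </xsd:simpleType>"
      + "  <xsd:element name='rating' type='ratingType'/>"
      + "</xsd:schema>";

    /** Returns true if <rating>value</rating> is valid against the schema above. */
    static boolean isValidRating(String value) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            String xml = "<rating>" + value + "</rating>";
            // Without a custom error handler, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // simplification: every failure counts as "not valid"
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRating("4"));    // inside the range
        System.out.println(isValidRating("9"));    // integer, but out of range
        System.out.println(isValidRating("four")); // not an integer at all
    }
}
```

A DTD could only declare this content as #PCDATA, which is one of the gaps in DTD data typing that XML Schema closes.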


Element Declaration with Children

<xsd:element name="publications">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element ref="article"/>
        <xsd:element ref="book"/>
      </xsd:choice>
      <xsd:element ref="notice" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

Compare to DTD: <!ELEMENT publications ((article | book)*, notice?)>
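
The content model above can be exercised with the JDK's schema validator. This is only a sketch: the slides do not declare book, so article and book are both assumed to be plain xsd:string elements here, no targetNamespace is used, and the class name and sample documents are made up.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

final class PublicationsSchemaSketch {
    // publications as on the slide; article, book and notice are simplified assumptions.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "  <xsd:element name='article' type='xsd:string'/>"
      + "  <xsd:element name='book' type='xsd:string'/>"
      + "  <xsd:element name='notice' type='xsd:string'/>"
      + "  <xsd:element name='publications'>"
      + "    <xsd:complexType>"
      + "      <xsd:sequence>"
      + "        <xsd:choice minOccurs='0' maxOccurs='unbounded'>"
      + "          <xsd:element ref='article'/>"
      + "          <xsd:element ref='book'/>"
      + "        </xsd:choice>"
      + "        <xsd:element ref='notice' minOccurs='0'/>"
      + "      </xsd:sequence>"
      + "    </xsd:complexType>"
      + "  </xsd:element>"
      + "</xsd:schema>";

    // Made-up instances: in the second one, notice appears before a book.
    static final String VALID_DOC =
        "<publications><article>A</article><book>B</book><notice>N</notice></publications>";
    static final String INVALID_DOC =
        "<publications><notice>N</notice><book>B</book></publications>";

    /** Returns true if the instance document is valid against the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // simplification: every failure counts as "not valid"
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(VALID_DOC));
        System.out.println(isValid(INVALID_DOC));
    }
}
```

The second document fails for the same reason it would fail against the DTD content model ((article | book)*, notice?): notice may only come last.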


Separation into logical parts

An XML Schema might get huge. It is therefore useful to separate the definitions of logical parts, like the definition for an address from other parts. This makes it easier to maintain and reuse.

<schema targetNamespace="http://www.example.com/IBEST"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:ibest="http://www.example.com/IBEST">
  <annotation>
    <documentation xml:lang="DE">
      Adressen für das internationale Buchbestellungsschema für Example.com.
      Copyright 2001 Example.com. Alle Rechte vorbehalten.
    </documentation>
  </annotation>


Another example …

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Document Type: Order
  Generated On: Tue Oct 03 2:26:38 P3 2006
-->
<!-- ===== xsd:schema Element With Namespaces Declarations ===== -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
    xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
    xmlns:ccts="urn:un:unece:uncefact:documentation:2"
    xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
    xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="2.0">
<!-- ===== Imports ===== -->
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    schemaLocation="../common/UBL-CommonAggregateComponents-2.0.xsd"/>
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    schemaLocation="../common/UBL-CommonBasicComponents-2.0.xsd"/>
<xsd:import namespace="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
    schemaLocation="../common/UnqualifiedDataTypeSchemaModule-2.0.xsd"/>
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
    schemaLocation="../common/UBL-CommonExtensionComponents-2.0.xsd"/>
<xsd:import namespace="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
    schemaLocation="../common/UBL-QualifiedDatatypes-2.0.xsd"/>
<!-- ===== Root Element ===== -->
<xsd:element name="Order" type="OrderType">
  <xsd:annotation>
    <xsd:documentation>This element MUST be conveyed as the root element in any instance document based on this Schema expression</xsd:documentation>
  </xsd:annotation>
</xsd:element>
<xsd:complexType name="OrderType">


Include

To include separated parts of a schema the main schema uses the include element.

<include schemaLocation="http://www.example.com/schemas/adresse.xsd"/>

[Diagram: Main Schema and the Include Schemas it pulls in]


Validation

<?xml version="1.0" encoding="UTF-8"?>
<Order xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
       xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2"
       xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
       xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
       xmlns:udt="urn:un:unece:uncefact:data:draft:UnqualifiedDataTypesSchemaModule:2"
       xmlns="urn:oasis:names:specification:ubl:schema:xsd:Order-2">
  <cbc:UBLVersionID>2.0</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:oasis:names:specification:ubl:xpath:Order-2.0:sbs-1.0-draft</cbc:CustomizationID>
  <cbc:ProfileID>bpid:urn:oasis:names:draft:bpss:ubl-2-sbs-order-with-simple-response-draft</cbc:ProfileID>
  <cbc:ID>AEG012345</cbc:ID>
</Order>


Processing XML

SAX vs DOM vs StAX


DOM (Document Object Model)

Generates the tree structure out of the elements contained in the XML document.


DOM (Document Object Model)

• Very useful for small documents
• Random access to structure using objects
• Can read, manipulate, and write XML programmatically
• Write recursive code to explore child nodes of unknown or evolving schema
• Write hard-coded procedures to handle static well-known schema
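
The "recursive code to explore child nodes" mentioned above might look like the following sketch, which walks an arbitrary DOM tree and collects the path of every element. It uses only the JDK's standard DOM API; the class name, method names and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class DomWalkSketch {
    /** Parses the document and returns the path of every element, in document order. */
    static List<String> elementPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> paths = new ArrayList<>();
            walk(doc, "", paths);
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Recursion over child nodes: no knowledge of the schema is needed.
    private static void walk(Node node, String prefix, List<String> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String path = prefix + "/" + child.getNodeName();
                out.add(path);
                walk(child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<food><fruit><apple/></fruit><veg/></food>";
        // prints [/food, /food/fruit, /food/fruit/apple, /food/veg]
        System.out.println(elementPaths(xml));
    }
}
```

Because the whole tree is in memory, the walk could be repeated or started from any node, which is the random access DOM offers in exchange for its memory footprint.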


SAX (Simple API for XML)

Based on events like (default handler):
– startDocument ()
– endDocument ()
– startElement (java.lang.String uri, java.lang.String localName, java.lang.String qName, Attributes attributes)
– endElement (java.lang.String uri, java.lang.String localName, java.lang.String qName)
– error (SAXParseException e)
– fatalError (SAXParseException e)


SAX (Simple API for XML)

• Uses much less memory than DOM, especially for large documents (but for some applications more than one pass is needed)
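
A minimal sketch of the event-based style, using the JDK's SAXParserFactory: the handler only keeps a counter per element name, so memory use does not grow with the size of the document. The class name and the sample document are made up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCountSketch {
    /** Counts how often each element name occurs, in order of first appearance. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // The parser calls startElement for every start tag it encounters.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<food><apple/><pear/><apple/></food>";
        // prints {food=1, apple=2, pear=1}
        System.out.println(countElements(xml));
    }
}
```

Any state the application needs (here the counters) has to be kept by the handler itself, because the parser does not retain the elements it has already reported.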


DOM vs SAX

              DOM      SAX
memory        -        +
flexibility   +        -
performance   - (*)    + (*)
standard      W3C      xml-dev

(*) Depends on the application: if more than one pass is needed, DOM might be better!


Problems …

• What if DOM and SAX are both not acceptable? E.g. mobile devices with J2ME

• DOM needs too much memory

• Common streaming APIs like SAX are all push APIs
  – It is the SAX parser that pushes the tokens into the application, which is not easy to handle

public class Flour extends DefaultHandler {
  …
  public void startElement(String namespaceURI, String localName,
                           String qName, Attributes atts) { … }
  …
  public static void main(String[] args) {
    Flour f = new Flour();
    SAXParser p = new SAXParser();
    p.setContentHandler(f);
    try {
      p.parse(args[0]);
    } catch (Exception e) {
      e.printStackTrace();
    }
    System.out.println(f.amount);
  }
  …


Alternative: StAX

• StAX (Streaming API for XML)
  – a pull parsing API
  – The application itself requests the next token, e.g. by calling next().
  – JSR 173 (Java Specification Request): http://jcp.org/en/jsr/detail?id=173


Example for calling next …

while (true) {
  int event = parser.next();
  if (event == XMLStreamConstants.END_DOCUMENT) {
    parser.close();
    break;
  }
  if (event == XMLStreamConstants.START_ELEMENT) {
    System.out.println(parser.getLocalName());
  }
}
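
The loop above leaves out how the parser object is obtained. A self-contained variant might look like the following sketch, which uses the javax.xml.stream API as shipped with the JDK (Java 6 and later) and collects the element names instead of printing them. The class name and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class StaxPullSketch {
    /** Returns the local names of all start tags, in document order. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            // Pull style: the application decides when to ask for the next event.
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(parser.getLocalName());
                }
            }
            parser.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<food><apple/><pear/></food>";
        // prints [food, apple, pear]
        System.out.println(elementNames(xml));
    }
}
```

Since the application drives the loop, it can stop early or hand the reader to another method, which is awkward to express with SAX callbacks.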


Transforming XML


[Figure: file formats shown: GIF, JPG, NSK-TIFF, etc.; AVI, AU, WAV, WMA, MP3, etc.; MPG, WMV, RM, etc.; DOC, HTML, PDF, etc.; JPEG, GIF, etc.]


XSL

The Extensible Stylesheet Language (XSL) has three subcomponents:

• XSL-FO: XSL Formatting Objects, an XML vocabulary for specifying formatting semantics.

• XSLT: the transformation language, which lets you transform XML into some other format.

• XPath: an addressing mechanism that lets you specify a path to an element.


• Extensible Stylesheet Language (XSL) Version 1.0
  – W3C Recommendation 15 October 2001


XSLT Processing


XSLT (XSL Transformations)

• XSLT is a programming language

• Write scripts containing if statements and for-each loops

• Uses XPath for querying the document, for math calculations, and for string functions

• Can transform XML into HTML or text

• Useful for transforming XML to XML
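
The points above can be tried out with the XSLT 1.0 processor that ships with the JDK (javax.xml.transform). The sketch below combines a for-each loop, an if test and XPath functions (name, concat) to turn a small XML document into plain text. The stylesheet, the sample input and the class name are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // Hypothetical stylesheet: lists every child of <food> whose qty exceeds 5.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/food'>"
      + "<xsl:for-each select='*'>"
      + "<xsl:if test='@qty &gt; 5'>"
      + "<xsl:value-of select=\"concat(name(), '=', @qty, ';')\"/>"
      + "</xsl:if>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet above to the given XML and returns the text output. */
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up input document.
        String xml = "<food><apple qty='5'/><pear qty='6'/><potato qty='7'/></food>";
        // prints pear=6;potato=7;
        System.out.println(transform(xml));
    }
}
```

Changing the output method to html or xml and emitting literal result elements instead of text gives the XML-to-HTML and XML-to-XML cases.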


So, what is XML

• A meta markup language
• Structured information that complies with a standard structure and syntax
• “The ASCII of the 21st Century”
• Platform-independent information for:
  – Presentation instructions
  – User settings
  – Data repository
  – Data transfer
  – RPC calls
  – ...


What XML is not

• XML is not tied to any human language or character encoding

• XML is not tied to any computing platform or programming language

• XML has no semantics


Literature I

• XML Professionell; Richard Anderson u.a.; MITP-Verlag; 2000; ISBN 3-8266-0633-7

• XML Data Management; Akmal B. Chaudhri, Awais Rashid and Roberto Zicari; Addison Wesley; 2003; ISBN 0-201-84452-4


Literature II

Resources of DBIS related to XML (German):
• Einführung in XML & Document Type Definition; Alexander Semino; Seminar SS 2001
• XML-Schemata; Markus Krauße; Seminar SS 2001
• XSL – Dokumente mit Stil; Fabian Wleklinski; Seminar SS 2001
• HTML und XML; Christina Anthes; Proseminar SS2002


Literature III

• Read recommendations at W3C: www.w3.org

• … search the Web!

