XSDL & Relax : 2 new schema languages for XML Rajasekar Krishnamurthy.

Post on 04-Jan-2016



Outline

• DTDs and their drawbacks

• XML Schema Requirements

• XSDL

• RELAX

• Other Schema specifications

Sample XML document

<?xml version="1.0"?>
<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <name>Albert Einstein</name>
    <email>aeinstein@cs.wisc.edu</email>
    <phone>608-236-4112</phone>
  </author>
</book>

Equivalent DTD

<!ELEMENT book (title, price, author*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT author (name, email, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
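A DTD content model is a regular expression over child element names, so the declarations above can be checked with an ordinary regex engine. The following is a minimal sketch in Python, assuming only the standard library; the `CONTENT_MODELS` table is a hand-made translation of the DTD and `conforms` is an illustrative helper, not part of any DTD tool.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (an assumption of this sketch).
# Each child name is followed by one space, so "author*" becomes "(author )*".
CONTENT_MODELS = {
    "book": r"title price (author )*",
    "author": r"name email phone ",
}


def conforms(element):
    """Return True if element and all its descendants match CONTENT_MODELS."""
    # Elements not listed in the table are treated as #PCDATA: no children.
    model = CONTENT_MODELS.get(element.tag, r"")
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


SAMPLE = """
<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <name>Albert Einstein</name>
    <email>aeinstein@cs.wisc.edu</email>
    <phone>608-236-4112</phone>
  </author>
</book>
"""

print(conforms(ET.fromstring(SAMPLE)))
```

The sketch checks structure only. Like the DTD, it cannot say that price must be a number, which is one of the gaps the schema languages below address.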

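Drawbacks of DTD

<book>
  <title>Intro to XML</title>
  <price>72.50</price>
  <author>
    <title>Dr.</title>
    <firstname>Albert</firstname>
    <lastname>Einstein</lastname>
    <email>aeinstein@cs.wisc.edu</email>
    <phone>608-236-4112</phone>
  </author>
</book>

• Here title is used both for the book title and for the author's honorific; a DTD allows only one declaration per element name, so both uses must share one content model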

Outline

• DTDs and their drawbacks

• XML Schema Requirements

• XSDL

• RELAX

• Other Schema specifications

What is a schema ?

• Model for describing a class of documents
• Common vocabulary for applications exchanging documents
• Formally express syntactic, structural and value constraints applicable to instance documents

XML Schema requirements

• Mechanisms for constraining document structure
• Inheritance
• Embedded documentation
• Application-specific constraints
• Primitive data typing
• Creation of user-defined datatypes
• Addressing the evolution of schemas

Application Scenarios

• Electronic commerce transaction processing
• Traditional document authoring/editing
• Query formulation and optimization
• Open and uniform transfer of data between applications, including databases
• Metadata interchange

Outline

• DTDs and their drawbacks

• XML Schema Requirements

• XSDL

• RELAX

• Other Schema specifications

XML Schema Definition Language

• Enhanced datatypes

• written in XML

• separates element tags from types
  – local namespaces

• Inheritance : derive new type definitions

• Identity constraints

• support for namespaces

Sample XML schema

<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>

Sample schema (contd.)

<schema>
  <complexType name="authortype">
    <sequence>
      <element name="name" type="name"/>
      <element name="email" type="email"/>
      <element name="phone" type="phonenumber"/>
      <element name="address" type="address" minOccurs="0"/>
    </sequence>
  </complexType>
</schema>

Schema in graphical form

book
  title
  price
  author*
    name
    email
    phone
    address?

Schema Components

• Building blocks that comprise the abstract data model of the schema

• Primary components
  – simple type definitions
  – complex type definitions
  – attribute declarations
  – element declarations

Schema Components

• Secondary components
  – attribute group definitions
  – identity constraint definitions
  – model group definitions
  – notation declarations
• Helper components
  – annotations
  – model groups
  – particles
  – wildcards

Type Definitions

• Separates tag name from type of elements

• types can be
  – simpleTypes
    • represent leaf nodes in the graph
    • replace PCDATA in DTDs
  – complexTypes
    • can have elements and attributes in their content

Sample complexType declaration

<complexType name="address">
  <sequence>
    <element name="name" type="string" minOccurs="0"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
  <attribute name="country" type="string" default="US"/>
</complexType>

Simpletype : Pattern

<simpleType name="phonenumber">
  <restriction base="string">
    <pattern value="\d{3}-\d{3}-\d{4}"/>
  </restriction>
</simpleType>

• Other facets: enumeration, range
• Other simple types: list, union
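The pattern facet behaves like a regular expression that must match the entire value. A minimal sketch of the same check in Python, assuming the standard `re` module (`is_valid_phone` is an illustrative name, not a schema API):

```python
import re

# Same expression as the pattern facet of the phonenumber type.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phone(value):
    """Return True if the whole value matches the phone number pattern."""
    # Schema patterns are implicitly anchored at both ends, hence fullmatch.
    return PHONE_PATTERN.fullmatch(value) is not None


print(is_valid_phone("608-236-4112"))  # True
print(is_valid_phone("608-23-4112"))   # False: middle group has two digits
```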

Elements

• Global elements
  – can occur as the root of the document
  – can be included/imported/referenced
• Local elements
  – can occur only in the specific context
  – sibling elements with the same name need to have the same content model
    • <!ELEMENT book (author*, title, author*)>

Sample schema

<schema>
  <element name="book" type="booktype"/>
  <complexType name="booktype">
    <sequence>
      <element name="title" type="string"/>
      <element name="price" type="float"/>
      <element name="author" type="authortype" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>

Element Content

• Complex types from simple types
    <price currency="USDollar">23</price>
• Mixed content
    <price>amount in US-dollars is <amount>23</amount> only</price>
• Empty content
    <price currency="USDollars" amount="23"/>

Building content models

<!ELEMENT author ((name | (title, firstname, lastname)), email, phone)>

<author>
  <lastname>Einstein</lastname>
  <title>Dr.</title>
  <firstname>Albert</firstname>
  <email>aeinstein@cs.wisc.edu</email>
  <phone>608-236-4112</phone>
</author>

<author>
  <name>Albert Einstein</name>
  <email>aeinstein@cs.wisc.edu</email>
  <phone>608-236-4112</phone>
</author>

Building content models

<complexType name="authortype">
  <sequence>
    <choice>
      <element name="name" type="name"/>
      <all>
        <element name="title" type="titletype"/>
        <element name="firstname" type="string"/>
        <element name="lastname" type="string"/>
      </all>
    </choice>
    <element name="email" type="email"/> ...
  </sequence>
</complexType>

Content models

• Can represent any content model expressible with an XML 1.0 DTD, and more
• Does not allow non-determinism
  – ((email, name) | (email, expandedname)) is illegal
  – should be (email, (name | expandedname))
• Does not allow ambiguity
  – (author*, contactauthor*, author*) not allowed
    • author* can be derived in multiple ways
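The ambiguity in (author*, contactauthor*, author*) can be made concrete by counting how many ways a given child sequence splits into the three blocks. The sketch below is a brute-force illustration in Python, written for this one content model only; `count_derivations` is a hypothetical helper and is not how a schema processor detects ambiguity.

```python
def count_derivations(tags):
    """Count the ways tags can be split into author*, contactauthor*, author*."""
    n = len(tags)
    count = 0
    for i in range(n + 1):         # i: end of the first author* block
        for j in range(i, n + 1):  # j: end of the contactauthor* block
            first, middle, last = tags[:i], tags[i:j], tags[j:]
            if (all(t == "author" for t in first)
                    and all(t == "contactauthor" for t in middle)
                    and all(t == "author" for t in last)):
                count += 1
    return count


# A lone author can belong to the first or to the last author* block.
print(count_derivations(["author"]))                             # 2
# With a contactauthor in between, the split is forced.
print(count_derivations(["author", "contactauthor", "author"]))  # 1
```

Any count above 1 means the same instance has more than one derivation, which is exactly what the rule forbids.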

Deriving new types

• Two ways of deriving new types from existing types

• By extension
  – similar to inheritance in programming languages
• By restriction
  – declarations more limited than base type

Deriving by Extension

<complexType name="USAddress">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
    <element name="state" type="USState"/>
    <element name="zip" type="positiveInteger"/>
  </sequence>
</complexType>

Declare Base Type

<complexType name="address">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>

Derive By Extension

<complexType name="USAddress">
  <complexContent>
    <extension base="address">
      <sequence>
        <element name="state" type="USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

Using Derived Types

<address xsi:type="USAddress">
  <street>1210, W.Dayton Street</street>
  <city>Madison</city>
  <state>WI</state>
  <zip>53706</zip>
</address>

<address>
  <street>1210, W.Dayton Street</street>
  <city>Madison</city>
</address>

Deriving By Restriction

<complexType name="modifiedAddress">
  <complexContent>
    <restriction base="address">
      <sequence>
        <element name="name" type="string" minOccurs="0" maxOccurs="0"/>
        <element name="street" type="string"/>
        <element name="city" type="string"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>

Identity Constraints

• Can specify integrity constraints
  – uniqueness, key, keyref
• constraints can be locally scoped
• can be applied on attributes, elements or their contents
  – XML ID is an attribute
• can create keys/keyrefs from a combination of element and attribute content

Sample constraint

<element name="book" type="booktype">
  <unique name="uniqueauthor">
    <selector xpath="author"/>
    <field xpath="title"/>
    <field xpath="firstname"/>
    <field xpath="lastname"/>
  </unique>
</element>
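The constraint above says that no two author children of a book may agree on title, firstname and lastname at once. A minimal sketch of that check in Python, assuming the standard `xml.etree.ElementTree` module; `violates_unique` is an illustrative helper, and comparing whitespace-stripped strings is a simplification made for this sketch.

```python
import xml.etree.ElementTree as ET


def violates_unique(book, selector="author",
                    fields=("title", "firstname", "lastname")):
    """Return True if two selected nodes share the same tuple of field values."""
    seen = set()
    for node in book.findall(selector):
        values = [node.findtext(field) for field in fields]
        if any(value is None for value in values):
            # A node lacking one of the fields is not subject to the
            # uniqueness check in this sketch.
            continue
        key = tuple(value.strip() for value in values)
        if key in seen:
            return True
        seen.add(key)
    return False


DUPLICATED = """
<book>
  <author><title>Dr.</title><firstname>Albert</firstname><lastname>Einstein</lastname></author>
  <author><title>Dr.</title><firstname>Albert</firstname><lastname>Einstein</lastname></author>
</book>
"""

print(violates_unique(ET.fromstring(DUPLICATED)))  # True
```

Because the selector is evaluated relative to each book element, the constraint is locally scoped: two different books may list the same author.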

Other features

• Importing schema components
  – type libraries
• Redefining types & groups
• Namespaces
  – target namespaces
    • allow undeclared value: support for namespace-unaware documents

Other features

• Any element
  – allows well-formed XML to appear
  – can be restricted to a set of namespaces
• Any attribute
• anyType
  – base type for all complexTypes
  – does not constrain content in any way
  – default type when none is specified

Main drawback of XSDL

An element declaration (call it D) together with a blocking constraint (a subset of {substitution, extension, restriction}, the value of a {disallowed substitutions}) is validly substitutable for another element declaration (call it C) if

1.1 the blocking constraint does not contain substitution;

1.2 There is a chain of {substitution group affiliation}s from D to C, that is, either D's {substitution group affiliation} is C, or D's {substitution group affiliation}'s {substitution group affiliation} is C, or . . .;

1.3 The set of all {derivation method}s involved in the derivation of D's {type definition} from C's {type definition} does not intersect with the union of the blocking constraint, C's {prohibited substitutions} and the {prohibited substitutions} of any intermediate {type definition}s in the derivation of D's {type definition} from C's {type definition}.

Main drawback of XSDL

• for a sequence, maximum is unbounded if the {max occurs} of any wildcard or element declaration particle in the group's {particles} or the maximum part of the effective total range of any of the group particles in the group's {particles} is unbounded, or if any of those is non-zero and the {max occurs} of the particle itself is unbounded, otherwise the product of the particle's {max occurs} and the sum of the {max occurs} of every wildcard or element declaration particle in the group's {particles} and the maximum part of the effective total range of each of the group particles in the group's {particles} (or 0 if there are no {particles})

Outline

• DTDs and their drawbacks

• XML Schema Requirements

• XSDL

• RELAX

• Other Schema specifications

RELAX

• Developed by Makoto Murata & others in Japan

• based on the hedge automaton theory

• borrows rich datatypes from XML Schema Part 2

• Submitted to ISO fast-track

• ease of translation from/to DTDs

Main features of RELAX

• Separates element tag name and type
  – context-sensitive content models

• allows content models similar to XML schema

• allows definition of element and attribute groups

• annotations

• include mechanism for large schemas

Features absent in RELAX

• Support for namespaces – coming shortly??

• Identity constraints

• Inheritance

• New datatypes

XSDL vs. RELAX

• RELAX allows sibling elements to have different types
  – allows the content model (author, title, author) where the two author elements can have different content models
  – introduces ambiguity
    • For content model (title, author*, author*)
    • <title>"XYZ"</title><author/> is ambiguous

XSDL vs. RELAX

• In RELAX, a single type can have multiple definitions
  – the actual definition that matches an instance element is found by exhaustive search
  – at least one match needs to be found
• nametype can be defined as name or expandedname
  – it is a choice of the two definitions

Extending existing types

• XSDL uses inheritance
  – can change (title, author*) to (title, author*, contactauthor)
• In RELAX, add the new type definition completely
  – can change (title, author*) to (title, contactauthor, author*) also

Using attribute values

• <price type="int">10</price>
• <price type="string">ten</price>
• content model of the price element is switched based on the value of its type attribute
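The switch on the type attribute can be pictured as a lookup from attribute value to a content check. The following is a minimal sketch in Python under that reading; `CHECKS` and `price_is_valid` are hypothetical names, and rejecting an unknown or missing type is a choice made for this sketch rather than something the slides specify.

```python
import xml.etree.ElementTree as ET


def _is_int(text):
    """Return True if text parses as an integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# One content check per value of the type attribute.
CHECKS = {
    "int": _is_int,
    "string": lambda text: True,
}


def price_is_valid(price):
    """Validate a price element using the check selected by its type attribute."""
    check = CHECKS.get(price.get("type"))
    if check is None:
        return False  # unknown or missing type: rejected in this sketch
    return check(price.text or "")


print(price_is_valid(ET.fromstring('<price type="int">10</price>')))      # True
print(price_is_valid(ET.fromstring('<price type="string">ten</price>')))  # True
print(price_is_valid(ET.fromstring('<price type="int">ten</price>')))     # False
```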

XSDL vs. RELAX

• RELAX
  – membership checking in linear time in SAX model
• XSDL
  – type assignment in linear time in SAX/DOM models
    • ignoring integrity constraints
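Linear-time checking in the SAX model amounts to running a finite automaton over the stream of start-element events, one transition per event. The sketch below, in Python with the standard `xml.sax` module, does this for the single content model (title, price, author*) of book. The automaton is written by hand, the names `TRANSITIONS`, `BookChecker` and `is_book` are illustrative, and the content of author is deliberately not checked.

```python
import xml.sax

# Hand-built DFA for the children of book: (title, price, author*).
TRANSITIONS = {
    ("start", "title"): "after_title",
    ("after_title", "price"): "authors",
    ("authors", "author"): "authors",
}
ACCEPTING = {"authors"}


class BookChecker(xml.sax.ContentHandler):
    """Feeds the direct children of the root element through the DFA."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.state = "start"
        self.ok = True

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 1 and name != "book":
            self.ok = False
        elif self.depth == 2:
            # One dictionary lookup per child keeps the check linear.
            self.state = TRANSITIONS.get((self.state, name))
            if self.state is None:
                self.ok = False

    def endElement(self, name):
        self.depth -= 1


def is_book(xml_bytes):
    """Return True if the document's root is a book with valid child order."""
    handler = BookChecker()
    xml.sax.parseString(xml_bytes, handler)
    return handler.ok and handler.state in ACCEPTING


print(is_book(b"<book><title>t</title><price>1</price><author/></book>"))  # True
print(is_book(b"<book><price>1</price><title>t</title></book>"))           # False
```

The handler keeps only a depth counter and one state, so memory use does not grow with document size, which is the appeal of the streaming model.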

Other Schema proposals

• XDR (XML-Data Reduced)
  – Microsoft's BizTalk framework
• SOX (Schema for Object-oriented XML)
  – Commerce One
• DSD
  – AT&T and BRICS

• Schematron

References

• www.oasis-open.org/cover/schemas.html
• www.w3.org/xml/schema.html
• www.xml.gr.jp/relax/
• Comparative Analysis of Six XML Schema Languages, SIGMOD Record, Sept. 2000
• Reasoning about XML Schema Languages using Formal Language Theory, WWW submission