Fuyuki Ishikawa (石川 冬樹)
[email protected]
Lecture Course
Service-Oriented Computing
3. Basics: XML
2012/04/25
1 Introduction
2 Basics: Distributed Objects
3 Basics: XML
4 Web Services: Foundations
5 Web Services: Composition
6 Web Services: Implementation
7 Related Topics (1): Reliability
SOC'12 @ Sokendai 2Fuyuki Ishikawa
Course Plan
8 Related Topics (2): Security
9 Related Topics (3): Engineering
10 Related Topics (4): Semantic Web
11 Cloud Computing (1): Overview
12 Cloud Computing (2): Experience
13 Discussion and Summary
14 Students’ Presentation
TOC
- Web Services and XML
- XML
  - Core Concepts
  - Schema Languages
- Other Essential Specifications
- Programming Models
(Review) Web Services?
A Web service is a software application identified by a URI, whose interfaces and binding are capable of being defined, described and discovered by XML artifacts and supports direct interactions with other software applications using XML based messages via internet-based protocols
[W3C, 2002]
Previous Issues in Distributed Objects
Infrastructures for distributed objects
- Aim at interoperability among objects based on different environments (platforms and languages)
- Implemented by common network formats and translation mechanisms in the ORB for each environment
Some criticisms when compared with the Web (which was becoming widespread at that time)
- Different mechanisms, such as CORBA, DCOM and Java RMI, existed and were not connected
- Firewall configuration was difficult due to dynamic port mapping mechanisms
Movement
Popularity of XML as a platform-independent data format on the Web
- XML 1.0 (W3C Recommendation in 1998)
- XHTML 1.0 (W3C Recommendation in 2000)
- MathML 1.01 (W3C Recommendation in 1999)
- …
Use of XML for messaging formats between distributed objects
- XML-RPC (UserLand and Microsoft, 1998)
- SOAP (a widespread standard, ver. 1.1 in 2000)
XML-RPC
<methodCall>
  <methodName>examples.getStateName</methodName>
  <params>
    <param><value><i4>41</i4></value></param>
  </params>
</methodCall>

<methodResponse>
  <params>
    <param><value><string>South Dakota</string></value></param>
  </params>
</methodResponse>
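The XML-RPC request and response shown on this slide can be reproduced with Python's standard `xmlrpc.client` module. This is only a sketch of the message format: no server is contacted, and the variable names are chosen for illustration. Python writes the integer as `<int>` rather than `<i4>`; both are integer tags in XML-RPC.

```python
import xmlrpc.client

# Marshal the call from the slide: examples.getStateName(41).
request_xml = xmlrpc.client.dumps((41,), methodname="examples.getStateName")

# Marshal the reply as a <methodResponse> document.
response_xml = xmlrpc.client.dumps(("South Dakota",), methodresponse=True)

# Unmarshal both documents back into Python values.
call_params, call_method = xmlrpc.client.loads(request_xml)
reply_params, reply_method = xmlrpc.client.loads(response_xml)

print(request_xml)
print(response_xml)
```

The round trip illustrates the point of the slide: the method name and the typed parameters travel as plain XML text, independent of the platform on either side.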
XML: Overview
XML: eXtensible Markup Language
- Markup language: a language for a set of annotations to describe structures of text data
- Extensible: those annotations can be defined by users
For data exchange, especially via the Internet
- Has its grammar and parser requirements defined by the W3C (open standard)
- Does not depend on any programming language or execution environment (platform independent)
- Attaches structural information to a text document about which part describes what (self-describing)
XML History
SGML: Standard Generalized Markup Language (ISO standard in 1986)
- Uses plain text formats with "tags" to embed meanings inside the text data
- Introduces ideas of
  - Validation based on syntax definition (DTD)
  - Separation of data and their metadata (attributes)
  - Separation of data and processing

<quote>
  This is a document about <keyword type=italic>Web Services</keyword>
</quote>
XML History
Problems in SGML: complexity
- Complexity of the specification, or difficulty in implementation of processing software
Syntax simplification into XML standardization
- Ver. 1.0 (W3C Recommendation in 1998)
- Ver. 1.1 (W3C Recommendation in 2004)
  - Allows for use of characters not included in Unicode 2.0
  - Is often not necessary or supported by tools
- Ver. 2.0 ???
XML Documents: Overview
<books>
  <book category="computer">
    <author> M. P. Singh and M. N. Huhns </author>
    <title> Service-Oriented Computing: Semantics, Processes, Agents </title>
    <edition></edition>
  </book>
  <book category='fiction'>
    <author> J.K.Rowling </author>
    <title> Harry Potter and the Deathly Hallows </title>
    <edition/>
  </book>
</books>
XML Documents: Element (1)
Element: a container of data delimited by a pair of start-tag and end-tag (e.g., <author> … </author> in the books document)
XML Documents: Element (2)
An empty-element tag (e.g., <edition/>) can be used instead of writing a start-tag immediately followed by an end-tag (e.g., <edition></edition>)
XML Documents: Elements (3)
Elements form a tree
- There is only one root element in an XML document
- An element can have at most one parent, while it can have multiple children
[Figure: tree with the root "books" and two "book" children, each having "author", "title" and "edition" children]
Elements can only be nested (cannot overlap)
  <strong>this is a <italic>pen</strong></italic>
  not allowed
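The tree structure of the books document can be made visible by parsing it and printing one line per element. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `outline` is an assumption introduced here for illustration.

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book category="computer">
    <author> M. P. Singh and M. N. Huhns </author>
    <title> Service-Oriented Computing: Semantics, Processes, Agents </title>
    <edition></edition>
  </book>
  <book category='fiction'>
    <author> J.K.Rowling </author>
    <title> Harry Potter and the Deathly Hallows </title>
    <edition/>
  </book>
</books>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOKS)
tree_lines = outline(root)
print("\n".join(tree_lines))

# <edition></edition> and <edition/> parse to the same thing:
# an element with no text and no children.
editions = root.findall("./book/edition")
both_empty = all(e.text is None and len(e) == 0 for e in editions)
print(both_empty)
```

The printed outline has a single root (`books`), and every other element appears exactly once under its single parent.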
XML Documents: Attribute
Attribute: metadata describing the element (e.g., category="computer" on a book element in the books document)
XML Documents: Other Pieces
Specification on "white spaces" (horizontal tab, LF, CR, and space in ASCII)
- When they are inside element tags (between the name and the attributes)
  - Excluded (ignored) by a parser
- When they are inside attribute values
  - Normalized by a parser (e.g., line breaks become spaces)
- When they are included in element contents
  - Kept by a parser for the application software
Standard attributes
- e.g., "xml:lang": language ("ja", "en-US", …)
References
- e.g., "&lt;" for "<" (escape), "&#xA9;" for "©", …
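A small experiment shows how a parser treats white space and references. This sketch uses Python's standard `xml.etree.ElementTree`; the `note` element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Extra white space inside the tags, significant white space in the content,
# and two references (&lt; and &#xA9;).
doc = '<note   xml:lang = "en-US"  >  a &lt; b, &#xA9; 2012  </note  >'
note = ET.fromstring(doc)

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
language = note.get(XML_NS + "lang")

# White space in the element content is kept; references are expanded.
content = note.text

print(repr(language))
print(repr(content))
```

The white space inside the tags leaves no trace in the parsed result, while the leading and trailing spaces of the content are handed to the application unchanged.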
XML Documents: Meta Description (1)
XML declaration
- May (should) appear first in an XML document
  <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Processing Instructions (PI)
- Give information for the processor (application software)
  <?xml-stylesheet type="text/css" href="mystyle.css"?>
Type Declaration (detailed later)
  <!DOCTYPE …>
XML Documents: Meta Description (2)
Comments
- Allow for insertion of comments
  <!-- comment text -->
CDATA section
- Allows for insertion of (long) texts that are not interpreted as XML markup (such as XML sample texts)
  <![CDATA[ <sample> Sample data </sample> ]]>
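How these pieces appear to an application can be checked with a DOM parser. The sketch below uses Python's standard `xml.dom.minidom`; the enclosing `doc` element is an assumption added so that the comment and the CDATA section have a parent.

```python
from xml.dom import minidom, Node

DOC = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<?xml-stylesheet type="text/css" href="mystyle.css"?>'
    '<doc><!-- comment text -->'
    '<![CDATA[ <sample> Sample data </sample> ]]></doc>'
)

dom = minidom.parseString(DOC)

# The processing instruction is a node of the document itself,
# placed before the root element.
pi = dom.firstChild

# The root element has two children: the comment and the CDATA section.
comment, cdata = dom.documentElement.childNodes

print(pi.target, "|", pi.data)
print(repr(comment.data))
print(repr(cdata.data))
```

The text inside the CDATA section reaches the application verbatim, including the `<sample>` tags, which are not interpreted as markup.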
"Incorrect" XML Documents
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books>
  <book category=computer>
    <author> M. P. Singh and M. N. Huhns </author>
    <title> Service-Oriented Computing: Semantics, Processes, Agents </title>
    <edition>
  </book>
</books>
<books>
  <book category='fiction'>
    <author> J.K.Rowling
    <title> Harry Potter and the Deathly Hallows
    <edition> 2nd </title> </edition>
  </book>
</books>
Problems
- No quotation marks around the attribute value (category=computer)
- Lack of the end-tag (first <edition>)
- Lack of the end-tag (second <author>)
- Overlapping tags (<title> and <edition>)
- Multiple root elements (two <books>)
Well-formed
An XML document is said to be well-formed if, roughly speaking,
- Non-empty elements are delimited by a start-tag and an end-tag, and empty elements may be marked with an empty-element tag
- Attribute values are quoted with either a pair of single quotes or a pair of double quotes
- Tags may be nested but not overlap
- There is exactly one root element
Correctness based on the criteria given in the XML standard specification (syntax-based)
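Well-formedness is exactly what a conforming parser checks before handing anything to the application. As a rough sketch, the following uses Python's standard `xml.etree.ElementTree` and treats "the parser raises `ParseError`" as "not well-formed"; the short sample documents are reduced versions of the errors listed for the "incorrect" document and are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "unquoted attribute": "<book category=computer/>",
    "missing end-tag": "<book><edition></book>",
    "overlapping tags": "<book><title><edition></title></edition></book>",
    "multiple roots": "<books/><books/>",
    "well-formed": "<books><book category='fiction'><edition/></book></books>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}

for name, ok in results.items():
    print(f"{name}: {ok}")
```

Only the last sample is accepted; each of the others violates one of the rules in the list above.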
"Correct" XML Documents?
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books>
  <book category="computer">
    <author> M. P. Singh and M. N. Huhns </author>
    <edition/>
    <price> -350 </price>
  </book>
  <book category='fiction'>
    <author><price> 200 </price></author>
    <title> Harry Potter and the Deathly Hallows </title>
    <edition/>
  </book>
</books>
Questions
- Can a price value be negative?
- Can an author value be a price?
- Can a book lack a title?
Schema Definitions for XML
XML allows its users to define markup to annotate text data
Needs a constraint definition on the user-defined markup: an XML schema
- Vocabularies (e.g., element names)
- Structural constraints (e.g., allowed hierarchies of elements)
- Content of documents (e.g., type of attribute values)
Note: there is a specific language called "XML Schema" (discussed later), while the term "schema" is general
Validity
An XML document is said to be valid if it satisfies the constraints given in the schema it is associated with
- Correctness based on the criteria given in the associated schema (semantics or meaning-based)
DTD: Overview and Example
DTD: Document Type Definition
- Uses EBNF-based notations

<!ELEMENT FruitBasket (Cherry?, (Orange | Apple)+)>
<!ELEMENT Cherry EMPTY>
<!ELEMENT Orange EMPTY>
<!ELEMENT Apple EMPTY>

A "FruitBasket" element can contain only
- Zero or one "Cherry" element
- And subsequently, one or more "Orange" or "Apple" elements

<FruitBasket>
  <Apple/><Orange/><Apple/>
</FruitBasket>

<FruitBasket>
  <Cherry/><Apple/>
</FruitBasket>
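The content model `(Cherry?, (Orange | Apple)+)` is a regular expression over the sequence of child element names, so its effect can be imitated with an ordinary regular expression. The sketch below is a hand-written check for this one DTD only, not a DTD validator (Python's standard library does not include one); the function name and the encoding of child names as a space-terminated string are assumptions made for illustration, and stray text directly inside `FruitBasket` is not checked.

```python
import re
import xml.etree.ElementTree as ET

# (Cherry?, (Orange | Apple)+) rewritten over space-terminated child names.
CONTENT_MODEL = re.compile(r"(Cherry )?((Orange|Apple) )+")


def matches_fruit_basket(xml_text):
    """Check a document against the FruitBasket content model by hand."""
    basket = ET.fromstring(xml_text)
    if basket.tag != "FruitBasket":
        return False
    # Cherry, Orange and Apple are declared EMPTY: no text, no children.
    if any(len(child) > 0 or child.text is not None for child in basket):
        return False
    names = "".join(child.tag + " " for child in basket)
    return CONTENT_MODEL.fullmatch(names) is not None


print(matches_fruit_basket("<FruitBasket><Apple/><Orange/><Apple/></FruitBasket>"))
print(matches_fruit_basket("<FruitBasket><Cherry/><Apple/></FruitBasket>"))
print(matches_fruit_basket("<FruitBasket><Cherry/></FruitBasket>"))
print(matches_fruit_basket("<FruitBasket><Apple/><Cherry/></FruitBasket>"))
```

The first two documents are the valid examples from the slide; the last two fail because the basket has no Orange or Apple, or because Cherry does not come first.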
DTD: Use with XML
Use of DTD with XML documents
- Defined in the XML specification itself (though planned to be removed from XML 2.0)
- Uses the <!DOCTYPE> declaration to include DTD descriptions or to refer to external DTD documents

<!DOCTYPE books [ … ] >
Embedded definition

<!DOCTYPE books PUBLIC "-//Book" "http://nii.ac.jp/dtd/Book.dtd" >
Reference by a URI (described later) and possibly an internal name (with SYSTEM or PUBLIC)
DTD: Limitations
Problems found in DTD
- Lacks support for reusability of definitions
  - Does not have a mechanism to resolve name collisions or to extend existing definitions
- Lacks a mechanism for data type definition
- Why not use the XML syntax!
More powerful languages are preferred
- XML Schema (W3C), RELAX NG (ISO), ...
- DTD is planned to be removed in XML 2.0
XML Schema: Overview
XML Schema
- Defined as an XML language
- Published as a W3C Recommendation in 2001
- Instantiated as an XSD (XML Schema Definition)
- Uses Namespaces (described later) to allow for reuse of existing definitions even if they share common names for different usages
- Supports a large number of built-in data types and definition of derived data types
XML Schema: Example
<element name="FruitBasket">
  <complexType>
    <sequence>
      <element name="Cherry" minOccurs="0" maxOccurs="1">
        <complexType/>
      </element>
      <group ref="OrangeOrApple" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>
<group name="OrangeOrApple">
  <choice>
    <element name="Orange"><complexType/></element>
    <element name="Apple"><complexType/></element>
  </choice>
</group>
RELAX NG: Overview
RELAX NG (REgular LAnguage for XML Next Generation)
- Defined as an XML language
  - Also includes a compact (non-XML) syntax and conversion into/from the XML-based syntax
- Proposed as an alternative to XML Schema, which became too large and complex under pressure from large companies
- Based on RELAX (M. Murata) and TREX (J. Clark)
- Standardized by OASIS in 2001
- Included in the ISO DSDL (Document Schema Definition Languages) standard, gathering different types of schemas for different tasks
RELAX NG: Example
<element name="FruitBasket">
  <optional>
    <element name="Cherry"><empty/></element>
  </optional>
  <zeroOrMore>
    <choice>
      <element name="Orange"><empty/></element>
      <element name="Apple"><empty/></element>
    </choice>
  </zeroOrMore>
</element>

Use of compact syntax
element FruitBasket {
  element Cherry { empty } ?
  ...
}
Notes: Comparing Schema Languages
DTD: early standard
- Supported by many existing XML tools
XML Schema: popular in the industry
- Used in many industrial specifications (e.g., WSDL)
- Supported by most tools
RELAX NG: popular as an alternative to XML Schema
- Supported by a fair number of tools
Notes: Comparing Schema Languages
Detailed differences
- Number of element occurrences
  - XML Schema: arbitrary with "min/maxOccurs"
  - RELAX NG: "zeroOrMore", "oneOrMore", and "optional" (*, +, ? in EBNF and DTD)
- Built-in data formats
  - XML Schema: a variety of data types and their hierarchies (e.g., even including unsignedByte, nonPositiveInteger, duration, gYearMonth)
  - RELAX NG: references to such external "data type libraries"
Are they essential?
URI
URI: Uniform Resource Identifier
- For identification of a resource (e.g., location or name)
- Supporting combination of interaction definition and identification as well as hierarchical structures
Examples
- http://www.ietf.org/rfc/rfc2396.txt
- mailto:[email protected]
- news:comp.infosystems.www.servers.unix
- urn:oasis:names:specification:docbook:dtd:xml:4.1.2
(IRI: Internationalized …, XRI: eXtensible …)
Namespaces
Namespaces in XML
- For identification of a vocabulary set when reusing (possibly conflicting) names from different schemas

Names used in a markup language for user description: name, address, age, mailAddress, IDnumber
Names used in a markup language for item description: name, modelNumber, price, size, IDnumber

Want to reuse these vocabulary definitions to define a markup language for invoice description (while resolving the name conflicts)
Needs an identifier for each set of names (namespace)
Namespaces
Namespaces in XML
- e.g., use in XML documents

<invoice xmlns="http://hoge.co.jp/schema/invoice"
         xmlns:item="http://example.com/def">
  <IDnumber xmlns="http://sample.jp/schema/User">2010u53243</IDnumber>
  <item:IDnumber>K1J2321P</item:IDnumber>
</invoice>

- Default namespace (xmlns=) for this tag and the descendants
- Labeled namespace (xmlns:item=) for …
- A qualified name with the label prefix (QName), e.g., item:IDnumber (instead of the full URI)
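A namespace-aware parser resolves the default and labeled namespaces of the invoice document into full names. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports each name in the expanded form `{namespace URI}local-name`; the prefixes `u` and `i` used in the queries are arbitrary labels chosen here.

```python
import xml.etree.ElementTree as ET

INVOICE = """<invoice xmlns="http://hoge.co.jp/schema/invoice"
         xmlns:item="http://example.com/def">
  <IDnumber xmlns="http://sample.jp/schema/User">2010u53243</IDnumber>
  <item:IDnumber>K1J2321P</item:IDnumber>
</invoice>"""

root = ET.fromstring(INVOICE)

# Every element name is expanded to {namespace URI}local-name.
expanded_names = [root.tag] + [child.tag for child in root]

# Prefixes are only local labels: any prefix works in a query
# as long as it is mapped to the same URI.
ns = {"u": "http://sample.jp/schema/User", "i": "http://example.com/def"}
user_id = root.find("u:IDnumber", ns).text
item_id = root.find("i:IDnumber", ns).text

print(expanded_names)
print(user_id, item_id)
```

The two `IDnumber` elements share a local name but are distinct for the parser, because they belong to different namespaces.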
Namespaces
Namespaces in XML
- e.g., use in XML schema definitions

Schema definition for user description
xmlns="http://sample.jp/schema/User"
- A user element must have just one IDnumber element as a child
- …

Schema definition for item description
xmlns="http://example.com/def"
- A shop element must have just one IDnumber element as a child
- …

Schema definition for invoice description
xmlns="http://hoge.co.jp/schema/invoice"
xmlns:user="http://sample.jp/schema/User"
xmlns:item="http://example.com/def"
- An invoice element must have just one user:IDnumber element as a child
- An invoice element must have just one item:IDnumber element as a child
- …
XSLT
XSLT: eXtensible Stylesheet Language Transformations

<xsl:template match="doc">
  <html>
    <head><title><xsl:value-of select="title"/></title></head>
    <body><xsl:apply-templates/></body>
  </html>
</xsl:template>
<xsl:template match="doc/title">
  <h1><xsl:apply-templates/></h1>
</xsl:template>

Input:
<doc><title> About NII </title>…</doc>

Output (not exact: line feeds are adjusted for the layout):
<html><head><title>About NII</title></head><body><h1> About NII </h1>…</body></html>
Other Extensions
XPath
- For reference to nodes that satisfy a given condition
  A//B/*[1]
  Choose the first element with any name (*[1]) among children of an element B (B/), which is a descendant of an element A (A//) (a relative path based on the current context)
XInclude
- For import of other documents
  <xi:include href="sample.txt" parse="text"/>
XQuery, …
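The reading of `A//B/*[1]` given above can be spelled out as an explicit tree walk. The sketch below does the walk by hand with Python's standard `xml.etree.ElementTree` instead of evaluating the expression, because that library implements only a subset of XPath; the sample document and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
  <A>
    <x><B><first1/><second1/></B></x>
    <B><first2/><B><first3/></B></B>
  </A>
  <B><outside/></B>
</root>"""


def first_children_of_b_under_a(context):
    """Hand-written walk for A//B/*[1], relative to the context element."""
    selected = []
    for a in context.findall("A"):      # A: a child of the context node
        for b in a.iter("B"):           # //B: any descendant B of that A
            if len(b) > 0:
                selected.append(b[0])   # *[1]: the first child element of B
    return selected


context = ET.fromstring(DOC)
selected_tags = [e.tag for e in first_children_of_b_under_a(context)]
print(selected_tags)
```

The `B` element outside of `A` contributes nothing, while every `B` below `A`, at any depth, contributes its first child.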
DOM
DOM: Document Object Model
- Platform/language-independent standard object model/API for XML (or HTML)
  - Described as IDL definitions
- Original motivation: Web browsers needed object models to facilitate manipulation of structures of HTML documents (e.g., in JavaScript code)

<html> <head>...</head>
<body>...
<form name="myFORM">...
<input type="text" name="myTEXT"> …

txt = document.myFORM.myTEXT.value
DOM
DOM: Document Object Model (Cont'd)
- APIs for XML manipulation

interface Node {
  ...
  DOMString nodeName;
  DOMString nodeValue;
  Node parentNode;
  NodeList childNodes;
  Node previousSibling;
  Node nextSibling;
  ...
  Node insertBefore(Node newChild, Node refChild);
  Node replaceChild(Node newChild, Node oldChild);
  ...
}
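The attributes and methods of the Node interface map directly onto concrete DOM implementations. The following sketch uses Python's standard `xml.dom.minidom`; the small books document and the inserted "SOAP" and `journal` nodes are made up for illustration.

```python
from xml.dom import minidom

dom = minidom.parseString("<books><book>XML</book></books>")
books = dom.documentElement
book = books.firstChild

# Attributes from the Node interface.
names = (books.nodeName, book.parentNode.nodeName, book.firstChild.nodeValue)

# insertBefore(newChild, refChild): put a new <book> in front of the existing one.
new_book = dom.createElement("book")
new_book.appendChild(dom.createTextNode("SOAP"))
books.insertBefore(new_book, book)

# replaceChild(newChild, oldChild): swap the original <book> for a <journal>.
journal = dom.createElement("journal")
books.replaceChild(journal, book)

result = books.toxml()
print(names)
print(result)
```

The code never touches angle brackets or quoting: it manipulates the structure, and the serializer produces the syntax at the end.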
DOM
DOM: Document Object Model (Cont'd)
- Essence: allows for focusing on the structure, not the syntax to denote it
- History
  - Level 1 (W3C Recommendation in 1998)
    - Navigation and manipulation of HTML and XML
  - Level 2 (W3C Recommendation in 2000)
    - XML Namespace support
  - Level 3 (W3C Recommendation in 2004)
    - Support of many features (Load and Save, Validation, XML Base, etc.)
- Supported by many browsers and libraries
SAX
SAX: Simple API for XML
- Event-driven API for a serial-access XML parser

<?xml version="1.0"?>
<books>
  <book>XML</book>
</books>

Sequential parsing
  call startDocument()
  call startElement("books")
  call startElement("book")
  call characters("XML")
  call endElement("book")
  call endElement("books")
Users implement these event handlers (callback methods)
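The callback sequence can be observed by implementing the handlers and letting a SAX parser drive them. This sketch uses Python's standard `xml.sax`; the class name `EventLogger` is an assumption, and the handler also logs `endDocument()`, which the slide leaves out. White space between tags is also reported through `characters`, so the handler skips white-space-only chunks to match the slide.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument()")

    def startElement(self, name, attrs):
        self.events.append(f'startElement("{name}")')

    def characters(self, content):
        # Skip the white space between tags that the layout introduces.
        if content.strip():
            self.events.append(f'characters("{content}")')

    def endElement(self, name):
        self.events.append(f'endElement("{name}")')

    def endDocument(self):
        self.events.append("endDocument()")


DOC = b'<?xml version="1.0"?>\n<books>\n  <book>XML</book>\n</books>'
handler = EventLogger()
xml.sax.parseString(DOC, handler)

for event in handler.events:
    print("call", event)
```

The handler only ever sees the current event; anything it needs later (such as the name of the enclosing element) has to be remembered by the application itself.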
DOM vs. SAX
DOM parsers keep all the structure (or objects and their has-a relationships), while SAX parsers just see the "current" point
- DOM takes much more memory (and thus is often slower) than SAX
- DOM cannot handle XML documents larger than the available memory, as it does not allow for streamed reading, while SAX can
- SAX is less suitable for tasks that need the whole tree structure, including validation, style-sheet transformation, XPath evaluation, and so on, while DOM is naturally suitable by its design
XML Data Binding
Difficulty in DOM or SAX: need to "retrieve" the structure of XML documents
- Use of the native representation of the object that denotes an XML document

<?xml version="1.0"?>
<books>
  <book>XML</book>
</books>

booksdoc.getBooks().getBookList().getFirst().getValue();

class Books {
  List<Book> bookList;
  List<Book> getBookList() {...}
  ...
}
class Book {...}
XML Data Binding
            XML             OO Programming Lang.
Type        XML schema      Class
Instance    XML document    Object
- unmarshal: XML document to Object
- marshal: Object to XML document
Typical Tool Support (e.g., JAXB)
            XML             OO Programming Lang.
Type        XML schema      Class
Instance    XML document    Object
- Class Generator: from XML schema to Class
- Marshaller/Unmarshaller Generator: produces the Marshaller/Unmarshaller used between XML document and Object
- Provided by a library/tool, used once during the coding phase
Typical Tool Support (e.g., JAXB)
Instance level: the Application Program uses the Marshaller/Unmarshaller between the XML document and the Object (OO Programming Lang.)
1. Obtain an XML document
2. Call the unmarshaller and obtain the corresponding object
3. Manipulate the object
4. Call the marshaller and obtain the modified XML document
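The four steps can be sketched end to end with hand-written binding code. This is not JAXB and not generated code: the `Books` and `Book` classes mirror the ones on the data binding slide, while `unmarshal`, `marshal` and the added "SOAP" entry are assumptions written by hand here, standing in for what a binding tool would generate from a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    value: str


@dataclass
class Books:
    book_list: List[Book] = field(default_factory=list)


def unmarshal(xml_text: str) -> Books:
    """XML document -> object."""
    root = ET.fromstring(xml_text)
    return Books([Book(b.text or "") for b in root.findall("book")])


def marshal(books: Books) -> str:
    """Object -> XML document."""
    root = ET.Element("books")
    for book in books.book_list:
        ET.SubElement(root, "book").text = book.value
    return ET.tostring(root, encoding="unicode")


# 1. Obtain an XML document
document = '<?xml version="1.0"?><books><book>XML</book></books>'
# 2. Call the unmarshaller and obtain the corresponding object
books = unmarshal(document)
first_value = books.book_list[0].value
# 3. Manipulate the object
books.book_list.append(Book("SOAP"))
# 4. Call the marshaller and obtain the modified XML document
modified = marshal(books)

print(first_value)
print(modified)
```

Between steps 2 and 4 the application works only with ordinary objects and lists; the XML structure never has to be navigated explicitly, which is the benefit over DOM and SAX.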