XJ: Facilitating XML Processing in Java
Matthew Harren, Mukund Raghavachari,
Oded Shmueli, Michael Burke,
Rajesh Bordawekar, Igor Pechtchanski,
Vivek Sarkar
Presented by: Tamar Aizikowitz
Winter 2006/2007
14th World Wide Web Conference (WWW2005), Chiba, Japan
XML
• Markup language
• Tags define elements
• Elements contain other elements
• Elements contain data
Syntax:
<person>
  <first>John</first>
  <last>Lennon</last>
</person>
Semantics: a tree in which the person node has two children, first (containing "John") and last (containing "Lennon").
Applications: The future web? XHTML? RSS?
Problem: Supposedly human readable and writable, but not really…
XML Schema
• XML-based alternative to DTDs.
• Describes the structure of an XML document.
• The programmer defines the valid structure of the data by defining element types.
• Support for standard and user-defined types.
<xs:element name="person" type="personInfo"/>
<xs:complexType name="personInfo">
  <xs:sequence>
    <xs:element name="first" type="xs:string"/>
    <xs:element name="last" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
XPath
• Query language for selecting a sequence of nodes from an XML document.
• Filtering of result nodes using predicates.
Example:
//person[last="Lennon"]/first
Figure: an XML tree and an XPath query are fed to the XPath query processor, which produces an XML node sequence.
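The query above can be tried in plain Java with the standard javax.xml.xpath API. This is only a sketch: the class name PersonQuery, the <people> wrapper element and the second person are assumptions made for illustration, not part of the slides.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PersonQuery {
    // Sample document (an assumption): the <person> element from the XML slide,
    // wrapped in a hypothetical <people> root next to a second person.
    static final String PEOPLE =
        "<people>"
        + "<person><first>John</first><last>Lennon</last></person>"
        + "<person><first>Paul</first><last>McCartney</last></person>"
        + "</people>";

    /** Evaluates the slide's query and returns the text of the first matching node. */
    static String firstNameOfLennon() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // The predicate [last="Lennon"] filters the person nodes.
            return xpath.evaluate("//person[last=\"Lennon\"]/first",
                                  new InputSource(new StringReader(PEOPLE)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNameOfLennon());
    }
}
```

Note that the query is an unchecked string here: a typo in an element name is only discovered at run time, which is the gap XJ aims to close.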
XJ Introduction
• Developed at the IBM Watson Research Center.
• More information: http://www.research.ibm.com/xj/.
• Components: the xjc compiler and the xj runtime environment.
Figure: Java 1.0, Java 1.1, Java 1.4, Java 1.5, XJ.
XJ Holy Grail: Smooth Java/XML Integration
• XML trees: just like 3, "Hello" and other values.
• XML Schema: just like Java classes.
• XPath queries: just like [], ?: and other Java operators.
• Smart compiler: optimization, improved efficiency.
Example: Music Library
Figure: tree structure of the music library.
musicLibrary
  album (one or more)
    title: string
    stars: [1-5]
    artist: string (one or more)
Music Library Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="musicLibrary">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="album" maxOccurs="unbounded">
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Music Library Schema - Album
<xs:complexType>
  <xs:sequence>
    <xs:element name="title" type="xs:string"/>
    <xs:element name="stars">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:pattern value="[1-5]"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
    <xs:element name="artist" type="xs:string" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
Music Library Data
<?xml version="1.0" encoding="UTF-8"?>
<musicLibrary>
<album>
<title>Abbey Road</title>
<stars>4</stars>
<artist>The Beatles</artist>
</album>
<album>
<title>Sounds of Silence</title>
<stars>4</stars>
<artist>Paul Simon</artist>
</album>
</musicLibrary>
The XJ Type Hierarchy
Figure: class hierarchy rooted at java.lang.Object.
• com.ibm.xj.Sequence
• com.ibm.xj.XMLObject, with subclasses com.ibm.xj.XMLElement (all element classes) and com.ibm.xj.XMLAtomic (all atomic classes)
• com.ibm.xj.XMLCursor
• com.ibm.xj.io.XMLOutputStream and com.ibm.xj.io.XMLDocumentOutputStream
The XMLObject Class and Subclasses
• XMLObject corresponds to an XML node.
• Schema import creates subclasses of XMLElement and XMLAtomic for every element declaration.
• XPath expressions are evaluated on instances of these classes.
Figure: com.ibm.xj.XMLObject with subclasses com.ibm.xj.XMLElement (all element classes) and com.ibm.xj.XMLAtomic (all atomic classes).
XMLSequence and XMLCursor
• An instance of Sequence is an ordered list of XMLObject instances.
• The result of an XPath expression is an instance of Sequence.
• XMLCursor implements java.util.Iterator and is used to iterate over instances of Sequence.
• Both support limited genericity (as defined in Java 5.0) for type checking.
Figure: com.ibm.xj.Sequence and com.ibm.xj.XMLCursor both extend java.lang.Object.
Importing Schema Definitions
The integration of XML Schema in XJ is built on the following correspondence:
• XML Schema ~ Java package
• XML element ~ Logical class
• Nested (local) element ~ Nested class
• Atomic types ~ Class + auto-unboxing
Schema ~ Package
• Element declarations are integrated into the Java type system as "logical classes".
• XML documents are well-typed XML values that are instances of these classes.
Syntax:
import musicLibrary.*;
XML Element ~ Class
• Elements are represented as subclasses of XMLObject.
• May be used wherever a class type is expected.
• Constructed with the new operator.
• Nested elements are represented as nested classes.
Syntax:
musicLibrary ml = new musicLibrary(...);
musicLibrary.album a = new musicLibrary.album(...);
Atomic Types
• Support for XML Schema built-in atomic types such as xsd:integer and xsd:string.
• Represented as subclasses of XMLAtomic.
• Syntax: xsd.integer
• Subtyping:
xsd.short s = ...;
xsd.integer i = s;
• Automatic unboxing:
xsd.string xstr = ...;
String s = xstr;
Creating XML Objects
• Mechanisms for constructing XML:
  • External source
  • Literal XML embedded in an XJ program
• XMLElement constructors:
  • XMLElement(java.io.InputStream)
  • XMLElement(java.io.File)
  • XMLElement(java.net.URL)
  • XMLElement(literal XML)
Inline Construction of XML
• XML data construction using literal XML.
• Any well-formed XML block can be used.
Example:
title a = new title(<title>Greatest Hits</title>);
• { and } are used to insert runtime values:
title buildTitle(String t) {
    title newT = new title(<title>{t}</title>);
    return newT;
}
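For comparison, a rough plain-Java counterpart of buildTitle using the standard DOM API might look as follows. This is a sketch only: the class name TitleBuilder is hypothetical, and the result is an untyped org.w3c.dom.Element rather than an instance of a schema-derived title class as in XJ.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TitleBuilder {
    /** Plain-Java analogue of: new title(<title>{t}</title>). */
    static Element buildTitle(String t) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element title = doc.createElement("title");
            // The runtime value is stored as text content, not parsed as markup.
            title.setTextContent(t);
            return title;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element e = buildTitle("Greatest Hits");
        System.out.println(e.getTagName() + ": " + e.getTextContent());
    }
}
```

The DOM version needs one call per node and gives the compiler nothing to check, whereas the XJ literal keeps the shape of the XML visible in the source.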
XML Type Validation
Example:
album a = new album(<album>
<title>Let It Be</title>
<stars>4</stars>
<band>The Beatles</band>
</album>);
To construct untyped XML, use the literal XML constructor for XMLElement.
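Figure: the literal XML is passed to the XML parser. If it is not well-formed XML, a compilation error is reported; otherwise it is passed to the schema validator. If it is not valid XML, a compilation error is reported; otherwise the result is a typed XMLObject.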
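The album literal in the example uses a <band> element, which the album type of the Music Library Schema does not declare, so it would be rejected. In plain Java the same check can only happen at run time. The sketch below uses the standard javax.xml.validation API; the class name AlbumValidation is hypothetical, and declaring album as a top-level element in a standalone schema is a simplifying assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AlbumValidation {
    // Assumption: the album type from the schema slides, declared top-level.
    static final String ALBUM_SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='album'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='stars'><xs:simpleType>"
        + "<xs:restriction base='xs:integer'><xs:pattern value='[1-5]'/></xs:restriction>"
        + "</xs:simpleType></xs:element>"
        + "<xs:element name='artist' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(ALBUM_SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    /** Returns true only if xml is well formed and valid against the album schema. */
    static boolean isValidAlbum(String xml) {
        try {
            loadSchema().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error or schema violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String withBand = "<album><title>Let It Be</title><stars>4</stars>"
                + "<band>The Beatles</band></album>";
        System.out.println(isValidAlbum(withBand));
    }
}
```

Both failure branches of the figure (not well formed, not valid) surface here as a SAXException at run time; XJ moves them to compile time for literal XML.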
Executing XPath Queries
Syntax: context[|query|]
• query = valid XPath 1.0 expression.
• context = XML element. Specifies the context for query evaluation.
• XPath expressions evaluate to Sequence<T>.
Example:
String band = "The Beatles";
musicLibrary m = new musicLibrary(...);
Sequence<album> b = m[|/album[artist[1]=$band]|];
• $ refers to variables.
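In standard Java, the $band reference can be approximated with an XPathVariableResolver. The following is a sketch under assumptions: the class name AlbumsByBand and the method titlesBy are hypothetical, the data is the music library from the earlier slide, and the path starts at /musicLibrary because the context here is the whole document rather than the musicLibrary element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AlbumsByBand {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars>"
        + "<artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars>"
        + "<artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    /** Returns the titles of all albums whose first artist equals band. */
    static List<String> titlesBy(String band) {
        try {
            Document library = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(LIBRARY)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the XPath variable $band to the Java value.
            xpath.setXPathVariableResolver(
                    name -> "band".equals(name.getLocalPart()) ? band : null);
            NodeList albums = (NodeList) xpath.evaluate(
                    "/musicLibrary/album[artist[1]=$band]", library, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < albums.getLength(); i++) {
                // Each album node serves as the context for a relative query.
                titles.add(xpath.evaluate("title", albums.item(i)));
            }
            return titles;
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesBy("The Beatles"));
    }
}
```

The cast to NodeList is where the typing is lost in plain Java; in XJ the compiler infers Sequence<album> from the schema instead.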
XPath Static Semantics
• XPath expressions evaluate to Sequence<T>.
• T is the most specific subtype of XMLObject that the compiler can determine.
• Worst case: Sequence<XMLObject> is returned.
• If the query result is always empty, a static error is generated. This is identified using the Schema definition.
Example:
title t = ...;
Sequence<album> a = t[|/album|];   // static error: title has no album children
XPath Runtime Semantics
• Evaluated with respect to the value of the context specifier.
• If the context specifier is a Sequence, each member is used as a context node in turn. The value is the union of the results.
musicLibrary m = new musicLibrary(...);
Sequence<album> albums = m[|/album|];
Sequence<artist> artists = albums[|/artist|];
• If the result is not a node set, a sequence of the appropriate type is returned. For example: Sequence<xsd.boolean>.
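The two rules above can be imitated with the standard javax.xml.xpath API: loop over the members of the first result, evaluate the second query with each member as context, and collect the results. This is a sketch; SequenceContext and its method names are hypothetical, and simple concatenation stands in for the union, which is equivalent here because each artist node has exactly one album parent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SequenceContext {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars>"
        + "<artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars>"
        + "<artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    static Document parseLibrary() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(LIBRARY)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Uses every album in turn as the context node for the query "artist". */
    static List<String> artistsOfAllAlbums() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList albums = (NodeList) xpath.evaluate(
                    "/musicLibrary/album", parseLibrary(), XPathConstants.NODESET);
            List<String> artists = new ArrayList<>();
            for (int i = 0; i < albums.getLength(); i++) {
                NodeList found = (NodeList) xpath.evaluate(
                        "artist", albums.item(i), XPathConstants.NODESET);
                for (int j = 0; j < found.getLength(); j++) {
                    artists.add(found.item(j).getTextContent());
                }
            }
            return artists;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** A query whose result is a boolean rather than a node set. */
    static boolean hasFiveStarAlbum() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate("boolean(/musicLibrary/album[stars=5])",
                    parseLibrary(), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(artistsOfAllAlbums());
        System.out.println(hasFiveStarAlbum());
    }
}
```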
Updating XML Data
• Reference semantics
  • Although more difficult to implement…
  • Result: in-place updates, as opposed to copy-based ones.
• Two types of updates are supported:
  • Value assignments, including complex types
  • Tree structure updates
Value Assignments
• XPath expressions are used as lvalues for assignment:
album a = new album(...);
a[|/title|] = "New Title";
• Bulk assignments:
musicLibrary m = new musicLibrary(...);
m[|/album[artist[1]="The Beatles"]/stars|] = 5;
• Bulk assignment advantages:
  • Possible optimizations, leading to efficient updates
  • Clear, concise code
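Without lvalue XPath expressions, a bulk assignment becomes an explicit select-then-loop in plain Java. The sketch below uses the standard DOM and XPath APIs; BulkStarsUpdate, setStars and starsOf are hypothetical names, and the update is in place in the sense that the parsed document itself is modified.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BulkStarsUpdate {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars>"
        + "<artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars>"
        + "<artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an XPath evaluator in which $name is bound to value. */
    static XPath xpathWithVariable(String name, String value) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(q -> name.equals(q.getLocalPart()) ? value : null);
        return xpath;
    }

    /** Sets stars on every album whose first artist is band; returns the update count. */
    static int setStars(Document library, String band, int stars) {
        try {
            NodeList targets = (NodeList) xpathWithVariable("band", band).evaluate(
                    "/musicLibrary/album[artist[1]=$band]/stars",
                    library, XPathConstants.NODESET);
            for (int i = 0; i < targets.getLength(); i++) {
                targets.item(i).setTextContent(Integer.toString(stars)); // in-place update
            }
            return targets.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads back the stars value of the album with the given title. */
    static String starsOf(Document library, String title) {
        try {
            return xpathWithVariable("title", title)
                    .evaluate("/musicLibrary/album[title=$title]/stars", library);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document library = parse(LIBRARY);
        setStars(library, "The Beatles", 5);
        System.out.println(starsOf(library, "Abbey Road"));
    }
}
```

Nothing in this version stops a caller from passing stars = 10; the schema restriction to [1-5] is not visible to the Java compiler.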
Tree Structure Update
• Methods for structural changes: insertAfter(), insertBefore(), insertAsFirst(), insertAsLast()
Example:
artist currArtist = m[|/album[title="Sounds of Silence"]/artist[1]|];
artist newArtist = new artist(<artist>Art Garfunkel</artist>);
currArtist.insertAfter(newArtist);
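The standard DOM API offers insertBefore() but no insertAfter(), so the same structural change can be sketched by inserting before the reference node's next sibling. The class InsertAfterDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InsertAfterDemo {
    static final String ALBUM = "<album><title>Sounds of Silence</title>"
        + "<stars>4</stars><artist>Paul Simon</artist></album>";

    /** Inserts newNode as the sibling immediately following reference. */
    static void insertAfter(Node reference, Node newNode) {
        // A null next sibling makes insertBefore append at the end.
        reference.getParentNode().insertBefore(newNode, reference.getNextSibling());
    }

    /** Adds Art Garfunkel after the first artist and lists all artists in order. */
    static List<String> artistsAfterInsert() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ALBUM)));
            Node currArtist = doc.getElementsByTagName("artist").item(0);
            Element newArtist = doc.createElement("artist");
            newArtist.setTextContent("Art Garfunkel");
            insertAfter(currArtist, newArtist);

            List<String> artists = new ArrayList<>();
            NodeList all = doc.getElementsByTagName("artist");
            for (int i = 0; i < all.getLength(); i++) {
                artists.add(all.item(i).getTextContent());
            }
            return artists;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(artistsAfterInsert());
    }
}
```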
Update Issues – Tree Structure
• Duplicate parents and acyclicity
  • After performing tree structure updates, the resulting graph must remain a tree.
  • Example: attaching an element that already has a parent.
• A problematic XJ update will result in a runtime exception.
• Can be avoided by always detaching before attaching nodes.
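The standard Java DOM faces the same tree invariant and resolves it somewhat differently from what is described above for XJ: attaching a node that already has a parent silently moves it, while an attachment that would create a cycle raises a DOMException. The sketch below illustrates both cases; TreeInvariantDemo and its method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TreeInvariantDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Attaching an ancestor below its own descendant would create a cycle. */
    static boolean cycleIsRejected() {
        Document doc = newDocument();
        Element album = doc.createElement("album");
        Element artist = doc.createElement("artist");
        album.appendChild(artist);
        try {
            artist.appendChild(album); // album is an ancestor of artist
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** Attaching a node that already has a parent detaches it from the old parent first. */
    static boolean reattachMovesNode() {
        Document doc = newDocument();
        Element first = doc.createElement("album");
        Element second = doc.createElement("album");
        Element artist = doc.createElement("artist");
        first.appendChild(artist);
        second.appendChild(artist); // implicit detach, then attach
        return artist.getParentNode() == second && !first.hasChildNodes();
    }

    public static void main(String[] args) {
        System.out.println(cycleIsRejected());
        System.out.println(reattachMovesNode());
    }
}
```

The implicit move is convenient but can hide bugs; raising an exception for the duplicate-parent case, as the slide describes for XJ, forces the programmer to detach explicitly.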
Update Issues – Complex Types
• Need to validate that the new value is still well typed after the update.
• Problem: cannot always be done statically.
• Example:
  • Schema states that element a can contain between 2 and 5 instances of element b.
  • What happens after attach() or detach()?
• Solution: runtime check inserted at compile time.
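To make the idea of an inserted runtime check concrete, here is a hand-written plain-Java sketch for the example of element a holding between 2 and 5 b children. It is not the code the XJ compiler generates; OccurrenceCheck, attachB, detachB and the constants are hypothetical names, and the bounds are hard-coded instead of being read from a schema.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class OccurrenceCheck {
    // Assumed occurrence bounds, taken from the example on the slide.
    static final int MIN_B = 2;
    static final int MAX_B = 5;

    /** Builds an <a> element with the given number of <b> children. */
    static Element newA(int bCount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element a = doc.createElement("a");
            for (int i = 0; i < bCount; i++) {
                a.appendChild(doc.createElement("b"));
            }
            return a;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the direct <b> children of a. */
    static int countB(Element a) {
        int n = 0;
        for (Node c = a.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && "b".equals(c.getNodeName())) {
                n++;
            }
        }
        return n;
    }

    /** Attaches one more <b>, refusing to exceed the upper bound. */
    static void attachB(Element a) {
        if (countB(a) + 1 > MAX_B) {
            throw new IllegalStateException("a may contain at most " + MAX_B + " b elements");
        }
        a.appendChild(a.getOwnerDocument().createElement("b"));
    }

    /** Detaches the last <b>, refusing to drop below the lower bound. */
    static void detachB(Element a) {
        if (countB(a) - 1 < MIN_B) {
            throw new IllegalStateException("a must contain at least " + MIN_B + " b elements");
        }
        a.removeChild(a.getLastChild());
    }

    public static void main(String[] args) {
        Element a = newA(3);
        detachB(a);
        System.out.println(countB(a));
    }
}
```

The check has to run on every structural update because the number of b children at a given program point is generally not known at compile time.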
Update Issues – Covariant Subtyping
• XML Schema allows declaration of subtypes by restriction.
• Causes problems when updating subtype values through the base class interface.
Example:
xsd.integer i;
stars s = m[|//stars[1]|];
i = s;
i = 10;   // illegal value for stars element
• Covariant subtyping already exists in Java arrays.
• The problem would arise in any language attempting to support updates on XML Schema types.
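The Java array case mentioned above can be shown in a few lines. Arrays are covariant, so an Integer[] may be viewed as a Number[], and a store that the compiler accepts can still fail at run time. The class and method names are hypothetical; the Integer[] plays the role of the restricted stars type and Number[] the role of its base type.

```java
class ArrayCovariance {
    /** Returns true if the ill-typed store is caught at run time. */
    static boolean storeFailsAtRuntime() {
        Integer[] stars = new Integer[1];
        Number[] numbers = stars; // allowed: Java arrays are covariant
        try {
            // Compiles, because a Double is a Number, but the array really
            // holds Integers, so the JVM checks the store and rejects it.
            numbers[0] = Double.valueOf(10.5);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFailsAtRuntime());
    }
}
```

Java resolves the array case with a runtime store check, the same kind of dynamic check that updates through a base-type view of a restricted XML Schema type would need.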
Summary – XJ Benefits
• XML objects as typed values
• XML Schema integration
• Static type checking
• Typed XPath
• Compiler optimizations
XJ - The Future?
• Full support for Schema types
• XPath expressions as independent values
  • Not tied to context specifier
• Operators on XPath values
  • Composition, conjunction, disjunction…
• Typed methods and fields
musicLibrary m = new musicLibrary(…);
m.album[2].title = "New Title";