Date post: | 19-Jan-2016 |
Upload: | ethel-ward |
1
Processing XML with Java
DBI – Representation and Management of Data on the Internet
2
XML
• XML is the eXtensible Markup Language
• It is a metalanguage:
– A language used to describe other languages, using “markup” tags that describe properties of the data
• Designed to be structured
– Strict rules about how data can be formatted
• Designed to be extensible
– Users can define their own terms and markup
3
XML Family
• XML is an official recommendation of the W3C
• Aims to accomplish what HTML cannot and be simpler to use and implement than SGML
[Figure: how SGML, HTML, XML and XHTML relate]
4
The Essence of XML
• Syntax
– The permitted arrangement or structure of letters and words in a language, as defined by a grammar (XML)
• Semantics
– The meaning of letters or words in a language
• XML uses syntax to add semantics to documents
5
Using XML
• In XML, the content is separated from the display
• XML can be used for:
– Data representation
– Data exchange
6
Databases and XML
• Database content can be presented in XML
– An XML processor can access a DBMS or file system and convert the data to XML
– A Web server can serve the content as either XML or HTML
7
HTML vs. XML
<OL>
<LI>HTML allows <B><I>improper nesting</B></I>.
<LI>HTML allows start tags, without end tags, like the <BR> tag.
<LI>HTML allows <FONT COLOR=#9900CC>attribute values</FONT> without quotes
<li>HTML is case-insensitive
<LI>White space is not important in HTML
</OL>
<OL>
<LI>XML requires <B><I>proper nesting</I></B>.</LI>
<LI>XML requires empty tags to be identified with a trailing slash, as in <BR/>.</LI>
<LI>XML requires <FONT COLOR="#9900CC">quoted attribute values</FONT>.</LI>
<LI>XML is case-sensitive</LI>
<LI>White space is important in XML</LI>
</OL>
8
Some Basic Rules for XML
• All tags must be balanced - <TAG>...</TAG>
• Empty tags are expressed with a trailing slash - <EMPTY_TAG/>
• Tags must be properly nested - <B><I>…</I></B>
• All element attributes must be quoted - <TAG name="value">
• Text is case-sensitive - <TAG> != <Tag>
• Comments are allowed - <!-- … -->
• A document should begin with an XML declaration - <?xml version='1.0'?>
• Special characters must be escaped (e.g., &gt; for >)
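The rules above can be checked mechanically: a parser rejects any text that breaks them. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK rather than any parser named in these slides; the class name WellFormedCheck and the sample strings are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string obeys the basic XML rules.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException notWellFormed) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<B><I>nested</I></B>"));
        System.out.println(isWellFormed("<B><I>nested</B></I>"));
    }
}
```

Improper nesting, unquoted attribute values, mismatched case and unescaped '<' characters should all make the check return false.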
9
SAX Parser
• Set of interfaces implemented by an application
• The application
– reads in an XML file
– generates events when it encounters items in the XML file
10
SAX Parser Events
• A SAX parser generates events
– at the start and end of a document,
– at the start and end of an element,
– when it finds characters inside an element, and at several other points
• The user writes the Java code that handles each event and decides what to do with the information from the parser
11
12
When to (not) use SAX
• Ideal for simple operations on XML files
– E.g. reading and extracting elements
• Good for very large XML files (cf. DOM)
• Not good if we want to manipulate XML structure
• Not designed for writing out XML
13
DOM
• Document Object Model
• Set of interfaces for an application that reads an XML file into memory and stores it as a tree structure
• The abstract API allows for constructing, accessing and manipulating the structure and content of XML and HTML documents
14
What a DOM Parser Gives
• When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document
• The DOM provides a variety of functions you can use to examine the contents and structure of the document
15
16
Why to Use DOM
• Task of writing parsers is reduced to coding against an API for the document structure
• Domain-specific frameworks will be written on top of DOM
17
18
19
DOM vs. SAX
• If your document is very large and you only need a few elements - use SAX
• If you need to process many elements and perform manipulations on XML - use DOM
• If you need to access the XML many times - use DOM
20
What Would You Choose for
• Processing an XML document in a server?
• Processing an XML document in a remote client?
• Direct access to an element in the XML file (e.g., index based on paths)?
• A visual tool for traversal over the document tree?
21
XML Parsers
22
XML Parsers
• There are several different ways to categorise parsers:
– Validating versus non-validating parsers
– Parsers that support the Document Object Model (DOM)
– Parsers that support the Simple API for XML (SAX)
– Parsers written in a particular language (Java, C++, Perl, etc.)
23
Non-Validating Parsers
• Speed and efficiency
– It takes a significant amount of effort for an XML parser to process a DTD and make sure that every element in an XML document follows the rules of the DTD
• If we only want to find tags and extract information, we should use a non-validating parser
24
Using an XML Parser
• Three basic steps in using an XML parser
– Creating a parser object
– Passing the XML document to the parser
– Processing the results
• Generally, writing out XML is not in the scope of parsers (though some may implement proprietary mechanisms)
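The three steps can be sketched as follows. This is an illustration only, assuming the JDK's JAXP DocumentBuilder as the parser object; the class name ThreeSteps and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the original slides.
class ThreeSteps {
    static String rootName(String xml) {
        try {
            // Step 1: create a parser object.
            DocumentBuilder parser =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Step 2: pass the XML document to the parser.
            Document doc = parser.parse(new InputSource(new StringReader(xml)));
            // Step 3: process the results (here: read the root tag name).
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootName("<course><name>DBI</name></course>"));
    }
}
```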
25
SAX – Simple API for XML
26
The SAX Parser
• SAX parser is an event-driven API
– An XML document is sent to the SAX parser
– The XML file is read sequentially
– The parser notifies the class when events happen, including errors
– The events are handled by the API methods that the programmer implemented
27
SAX Parser
• A SAX parser generates events
– At the start and end of a document
– At the start and end of an element
– When it finds characters inside an element
– Upon encountering errors
– Upon encountering ignorable whitespace
– and at several other points
• It uses a callback mechanism to notify the application
• The application's Java code implements the handling of each event
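The callback mechanism can be observed by logging each event as it arrives. This sketch assumes the SAX2 DefaultHandler and the JDK's JAXP SAXParser, which are the later counterparts of the SAX 1 interfaces described in these slides; the class name SaxEventLog and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the events the parser reports.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes attrs) {
        events.add("start:" + qName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only runs to keep the log short.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("text:" + text);
    }

    static List<String> parse(String xml) {
        SaxEventLog handler = new SaxEventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(parse("<dbi><topic>SAX</topic></dbi>"));
    }
}
```

The handler never sees the document as a whole; it only receives the events in reading order.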
28
The org.xml.sax.* Package
• SAX Interfaces and Classes
– Parser
– DocumentHandler
– DTDHandler
– ErrorHandler
– EntityResolver
– AttributeList
– Locator
29
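The Apache XML Parser (Xerces)
import java.io.IOException;
import org.xml.sax.*;
import org.apache.xerces.parsers.*;   // Xerces parser classes

class MyClass {
    SAXParser myParser = new SAXParser();   // instantiate the Xerces SAX parser
    …
    try {
        myParser.parse("file:/myFile.xml");
    } catch (SAXException err) {…}
      catch (IOException err) {…}           // parse() can also fail on I/O
    …
}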
30
The Parser Interface
• Registers other objects for callbacks
– void setDocumentHandler(DocumentHandler handler)
– void setDTDHandler(DTDHandler handler)
– void setErrorHandler(ErrorHandler handler)
– void setEntityResolver(EntityResolver resolver)
• Starts parsing with parse() method call
• When the parser hits a significant item, it stops reading and calls a registered object
• The parser continues reading the XML file once the called method has returned
31
The DocumentHandler Interface
• This interface is used to receive basic markup events from the parser
• It is usually implemented by a class that activates the parser
class myClass implements DocumentHandler {
…
myParser.setDocumentHandler(this);
…
}
32
DocumentHandler Methods
• void startDocument()
• void endDocument()
• void startElement(String name, AttributeList attrs)
• void endElement(String name)
• void characters(char[] ch, int start, int length)
• void ignorableWhitespace(char[] ch, int start, int length)
• void processingInstruction(String target, String data)
33
Bachelor Tags
• What happens when the parser parses a bachelor tag?
<dbi id='1'/>
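One way to answer the question is to log the events for an empty element: a SAX parser reports a start-element event immediately followed by an end-element event, exactly as for an explicit start/end pair with no content. The sketch below assumes the SAX2 DefaultHandler from the JDK; the class name EmptyTagEvents is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: which events does an empty ("bachelor") tag produce?
class EmptyTagEvents {
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attrs) {
                log.add("startElement " + qName + " id=" + attrs.getValue("id"));
            }
            @Override public void endElement(String uri, String localName,
                                             String qName) {
                log.add("endElement " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<dbi id='1'/>"));
    }
}
```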
34
AttributeList Interface
• Elements may have attributes
– We have a wrapper object for all attribute details that implements the AttributeList interface
– It cannot distinguish attributes that are defined explicitly from those that are specified in the DTD
35
AttributeList Interface (cont.)
int getLength();
String getName(int i);
String getType(int i);
String getValue(int i);
String getType(String name);
String getValue(String name);
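Reading attributes by index inside a start-element callback might look like this. The sketch assumes the SAX2 Attributes interface (with getQName in place of getName), which is the successor of AttributeList available in the JDK; the class name AttributeDump and the sample element are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the type and value of every attribute.
class AttributeDump extends DefaultHandler {
    final Map<String, String> seen = new TreeMap<>();

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes attrs) {
        for (int i = 0; i < attrs.getLength(); i++) {
            // Key: element/attribute name; value: declared type and value.
            seen.put(qName + "/" + attrs.getQName(i),
                     attrs.getType(i) + ":" + attrs.getValue(i));
        }
    }

    static Map<String, String> dump(String xml) {
        AttributeDump handler = new AttributeDump();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(dump("<course id='dbi' year='2016'/>"));
    }
}
```

Without a DTD, every attribute is reported with the type "CDATA".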
36
Attribute Types
• The following are possible types for attributes:
– "CDATA",
– "ID",
– "IDREF", "IDREFS",
– "NMTOKEN", "NMTOKENS",
– "ENTITY", "ENTITIES",
– "NOTATION"
37
AttributeListImpl
• The class is in the package org.xml.sax.helpers
• Includes methods such as:
– addAttribute, removeAttribute
– clear
– getName, getType, getValue (by index; getType and getValue also by name)
– getLength
38
ErrorHandler Interface
• If we want to know about warnings and errors
– We implement the ErrorHandler interface and register it with the parser class
– The handler itself does not locate the error; location details, when available, come from the SAXParseException or a Locator
• We have three levels of exception:
void error(SAXParseException ex);
void fatalError(SAXParseException ex);
void warning(SAXParseException ex);
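A handler for the three levels can be sketched by counting each kind of report. This assumes the SAX2 DefaultHandler (which implements ErrorHandler) and the JDK's JAXP parser; the class name ErrorCounter and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts reports at each of the three levels.
class ErrorCounter extends DefaultHandler {
    int warnings, errors, fatalErrors;

    @Override public void warning(SAXParseException ex) { warnings++; }
    @Override public void error(SAXParseException ex) { errors++; }
    @Override public void fatalError(SAXParseException ex) { fatalErrors++; }

    static ErrorCounter check(String xml) {
        ErrorCounter handler = new ErrorCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException stopped) {
            // Parsing cannot continue after a fatal error; counts are kept.
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b></a>").fatalErrors);
    }
}
```

A document that is not well-formed is reported through fatalError; warning and error are used for less severe problems such as validity violations.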
39
Locator Interface
• Associates a SAX event with a document location
– The parser can inform the application of the entity, line number and character number of a warning or error
– This works if the parser supplies an object implementing the Locator interface and registers it with the DocumentHandler
40
Locator Methods
int getLineNumber();
int getColumnNumber();
String getSystemId();   // i.e., returns the URL
String getPublicId();
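The Locator methods can be used inside any event callback once the parser has handed over its Locator. The sketch below assumes the SAX2 DefaultHandler, whose setDocumentLocator callback receives the Locator; the class name LineTracker and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the line on which each element starts.
class LineTracker extends DefaultHandler {
    private Locator locator;
    final Map<String, Integer> startLines = new LinkedHashMap<>();

    // Called by the parser before any other event, if it supports locators.
    @Override public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes attrs) {
        if (locator != null) {
            startLines.put(qName, locator.getLineNumber());
        }
    }

    static Map<String, Integer> lines(String xml) {
        LineTracker handler = new LineTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.startLines;
    }

    public static void main(String[] args) {
        System.out.println(lines("<a>\n<b/>\n</a>"));
    }
}
```

The location is only meaningful while a callback is running, so it is read inside startElement rather than stored for later.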
41
DTDHandler Interface
• Provides callback methods to receive notification of DTD events
• It is a mechanism to inform an application about any binary entity that the parser encounters
42
DTD Handler
notationDecl(String name,
String publicId,
String systemId);
unparsedEntityDecl(String name,
String publicId,
String systemId,
String notationName);
43
InputSource Class
• Possible to specify a byte or character stream for the input to the parser
• The InputSource class contains methods that specify the exact nature of the data source
44
EntityResolver Interface
• The application is not aware of the physical structure of the XML data
• The parser contains an entity manager that hides the complexity from the application which sees the data as a single stream
• Can intercept references to entities by implementing the EntityResolver interface
45
EntityResolver Interface (2)
• When the parser encounters an entity, it passes the system and/or public identifier to the application
– Return value is 'null' or a new InputSource
46
EntityResolver Interface (3)
public InputSource resolveEntity(String publicID, String systemID) {
    // Either identifier may be null, so compare from the literal side
    if ("Disclaimer".equals(systemID)
            || "-//EBI//TEXT Disclaimer//EN".equals(publicID)) {
        return new InputSource("file:/xml/disc.xml");
    }
    return null;   // let the parser resolve the entity itself
}
47
HandlerBase Class
• This class implements the SAX interfaces DocumentHandler, DTDHandler, EntityResolver and ErrorHandler in a sensible, default way
• Can be used for partial implementation of the interfaces
• This class can be extended:
import org.xml.sax.HandlerBase;
public class myHandler extends HandlerBase { … }
48
ParserFactory Class
• A helper class
– Provides convenient methods for dynamically loading SAX parsers
makeParser() (uses the org.xml.sax.parser system property)
makeParser(String className)
49
Example
• An example of indenting an XML document (as part of a server)
XMLIndent.java
50
51
52
53
DOM – Document Object Model
54
DOM Standards
• DOM 1.0 standard from www.w3.org
• Assumes an object-oriented approach
• Composed of a number of interfaces
– org.w3c.dom.*
• Central class is 'Document' (DOM tree)
• Standard does not include
– Tree walking
– Writing out XML format
55
Creating a DOM Tree
• A DOM implementation has a method to pass an XML file to a factory object
• The factory object returns a Document object that represents the root of the whole document
• On the Document object, the standard DOM interfaces can be used to interact with the XML structure
56
Line of Work
[Figure: an XML file is read by the DOM parser into a DOM tree, which the application accesses through the API]
57
DOM Tree
[Figure: an example DOM tree rooted at a Document node, containing Document Type, Element, Attribute, Text, Entity Reference and Comment nodes]
58
Normalizing a Tree
• Normalizing a DOM tree has two effects:
– Combines adjacent textual nodes
– Eliminates empty textual nodes
• We can apply a normalize() method to the document element
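Both effects can be seen on a small tree built by hand. This is a minimal sketch, assuming the DOM implementation bundled with the JDK; the class name NormalizeDemo and the element name para are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: count text nodes before and after normalize().
class NormalizeDemo {
    static String run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element para = doc.createElement("para");
        doc.appendChild(para);
        // Three adjacent text nodes, one of them empty.
        para.appendChild(doc.createTextNode("Hello, "));
        para.appendChild(doc.createTextNode(""));
        para.appendChild(doc.createTextNode("world"));
        int before = para.getChildNodes().getLength();

        doc.getDocumentElement().normalize();

        int after = para.getChildNodes().getLength();
        return before + "->" + after + ":" + para.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```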
59
DOMParser
• DOMParser extends XMLParser
– Important methods:
• void parse(InputSource source): parses the specified input source
• void parse(java.lang.String systemId): parses the input source specified by the given system identifier
• Document getDocument(): returns the document
60
[Figure: DOMParser: parse(xml-file), then getDocument() returns a Document; getChildNodes() on it returns a NodeList, …]
61
Creating a DOM Tree (2)
import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import org.apache.xerces.dom.*;
import org.apache.xerces.parsers.*;

public class myClass {
    DOMParser parser = new DOMParser();
    try {
        parser.parse("file:/doc.xml");
    } catch (IOException err) {…}
      catch (SAXException err) {…}
      catch (DOMException err) {…}
    Document document = parser.getDocument();
    …
62
DOM Interfaces and Classes
[Figure: the DOM interfaces. Node is extended by DocumentFragment, Document, CharacterData (with Text, Comment and CDATASection), Attr, Element, DocumentType, Notation, Entity, EntityReference and ProcessingInstruction; NodeList and NamedNodeMap are separate collection interfaces]
Figure as from “The XML Companion” - Neil Bradley
63
DOM Interfaces
• The DOM defines several Java interfaces
– Node: the base data type of the DOM
– Element: represents an element
– Attr: represents an attribute of an element
– Text: the content of an element or attribute
– Document: represents the entire XML document. A Document object is often referred to as a DOM tree
64
Node Interface
• Basic object of DOM (single node in tree)
• Nodes describe
– Elements, attributes, text, comments, CDATA sections
– Entity declarations, entity references, notation declarations, entire documents, processing instructions
• Node collections
– NodeList, NamedNodeMap, DocumentFragment
• Several nodes extend the Node interface
65
Node Methods
• Three categories of methods
– Node characteristics
• name, type, value
– Contextual location and access to relatives
• parents, siblings, children, ancestors, descendants
– Node modification
• Edit, delete, re-arrange child nodes
66
Node Methods (2)
short getNodeType();
String getNodeName();
String getNodeValue() throws DOMException;
void setNodeValue(String value) throws DOMException;
boolean hasChildNodes();
NamedNodeMap getAttributes();
Document getOwnerDocument();
67
Node Types - getNodeType()
ELEMENT_NODE = 1
ATTRIBUTE_NODE = 2
TEXT_NODE = 3
CDATA_SECTION_NODE = 4
ENTITY_REFERENCE_NODE = 5
ENTITY_NODE = 6
PROCESSING_INSTRUCTION_NODE = 7
COMMENT_NODE = 8
DOCUMENT_NODE = 9
DOCUMENT_TYPE_NODE = 10
DOCUMENT_FRAGMENT_NODE = 11
NOTATION_NODE = 12
if (myNode.getNodeType() == Node.ELEMENT_NODE) {
    // process node
    …
}
68
Node Names and Values
• Every node has a name and possibly a value
• Name is not a unique identifier (only location)
Type | Interface Name | Name | Value
ATTRIBUTE_NODE | Attr | Attribute name | Attribute value
DOCUMENT_NODE | Document | #document | NULL
DOCUMENT_FRAGMENT_NODE | DocumentFragment | #document-fragment | NULL
DOCUMENT_TYPE_NODE | DocumentType | DOCTYPE name | NULL
CDATA_SECTION_NODE | CDATASection | #cdata-section | CDATA content
COMMENT_NODE | Comment | #comment | Content string
ELEMENT_NODE | Element | Tag name | NULL
ENTITY_NODE | Entity | Entity name | NULL
ENTITY_REFERENCE_NODE | EntityReference | Entity name | NULL
NOTATION_NODE | Notation | Notation name | NULL
PROCESSING_INSTRUCTION_NODE | ProcessingInstruction | Target string | Content string
TEXT_NODE | Text | #text | Text string
Table as from “The XML Companion” - Neil Bradley
69
70
Child Nodes
• Most Nodes cannot have children, except
– Document, DocumentFragment, Element
• Can check for presence of children
if (myNode.hasChildNodes()) {
    // process children of myNode
    …
}
71
Node Navigation
• Every node has a specific location in tree
• Node interface specifies methods to find surrounding nodes
– Node getFirstChild();
– Node getLastChild();
– Node getNextSibling();
– Node getPreviousSibling();
– Node getParentNode();
– NodeList getChildNodes();
72
Node Navigation (2)
[Figure: a node and its relatives, reached through getParentNode(), getPreviousSibling(), getNextSibling(), getFirstChild(), getLastChild() and getChildNodes()]
Node parent = myNode.getParentNode();
if (myNode.hasChildNodes()) {
    NodeList children = myNode.getChildNodes();
}
Figure as from “The XML Companion” - Neil Bradley
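Since DOM 1.0 does not include tree walking, a traversal is typically written with the navigation methods. The following sketch walks a tree depth-first using getFirstChild() and getNextSibling(); it assumes the JDK's JAXP parser, and the class name TreeWalk and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: depth-first walk collecting element names.
class TreeWalk {
    static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        // Visit each child in document order, then move to its sibling.
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            collect(doc, names);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b><c/></b><d/></a>"));
    }
}
```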
73
Node Manipulation
• Children of a node in a DOM tree can be manipulated - added, edited, deleted, moved, copied, etc.
Node removeChild(Node oldChild) throws DOMException;
Node insertBefore(Node newChild, Node refChild) throws DOMException;
Node appendChild(Node newChild) throws DOMException;
Node replaceChild(Node newChild, Node oldChild) throws DOMException;
Node cloneNode(boolean deep);
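The manipulation methods can be tried out on a small tree, recording the child list after each step. This is a sketch under the assumption of the JDK's bundled DOM implementation; the class name ManipulationDemo and the element names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical example: apply each manipulation method in turn.
class ManipulationDemo {
    // Space-separated names of the parent's children, in document order.
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (names.length() > 0) names.append(' ');
            names.append(c.getNodeName());
        }
        return names.toString();
    }

    static List<String> run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        List<String> steps = new ArrayList<>();
        Element list = doc.createElement("list");
        doc.appendChild(list);
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element c = doc.createElement("c");
        Element d = doc.createElement("d");

        list.appendChild(a);
        list.appendChild(c);
        steps.add(childNames(list));          // after two appendChild calls
        list.insertBefore(b, c);
        steps.add(childNames(list));          // b placed ahead of c
        list.replaceChild(d, a);
        steps.add(childNames(list));          // a swapped for d
        list.removeChild(b);
        steps.add(childNames(list));          // b removed
        list.appendChild(c.cloneNode(true));
        steps.add(childNames(list));          // deep copy of c appended
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```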
74
Node Manipulation (2)
[Figure: insertBefore places New ahead of Ref; replaceChild swaps Old for New; cloneNode(false) makes a shallow copy and cloneNode(true) a deep copy]
Figure as from “The XML Companion” - Neil Bradley
75
Document::Node Interface
• Represents entire XML document (tree root)
• Methods
// Information from DOCTYPE - see 'DocumentType'
DocumentType getDoctype();
// Information about capabilities of DOM implementation
DOMImplementation getImplementation();
// Returns reference to root node element
Element getDocumentElement();
// Searches for all occurrences of 'tagName' in nodes
NodeList getElementsByTagName(String tagName);
76
Document::Node Interface (2)
• Factory methods for node creation
Element createElement(String tagName) throws DOMException;
DocumentFragment createDocumentFragment();
Text createTextNode(String data);
Comment createComment(String data);
CDATASection createCDATASection(String data) throws DOMException;
ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
Attr createAttribute(String name) throws DOMException;
EntityReference createEntityReference(String name) throws DOMException;
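The factory methods make it possible to build a document from scratch. The sketch below uses createElement, createComment and createTextNode; it assumes the JDK's bundled DOM implementation, and the class name BuildDocument and the element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: create nodes with the Document factory methods.
class BuildDocument {
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Nodes are created by the Document, then attached to the tree.
        Element course = doc.createElement("course");
        course.setAttribute("id", "dbi");
        doc.appendChild(course);

        course.appendChild(doc.createComment(" built with factory methods "));

        Element title = doc.createElement("title");
        title.appendChild(doc.createTextNode("Processing XML with Java"));
        course.appendChild(title);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

A node created by a factory method belongs to its Document but is not part of the tree until it is attached with appendChild or a similar method.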
77
DocumentType::Node Interface
• Information about document encapsulated in DTD representation
• DOM 1.0 doesn’t allow editing of this node
// Returns name of document
String getName();
// Returns general entities declared in DTD
NamedNodeMap getEntities();
// Returns notations declared in DTD
NamedNodeMap getNotations();
78
Element::Node Interface
• Two categories of methods
– General element methods
String getTagName();
NodeList getElementsByTagName(String name);
void normalize();
– Attribute management methods
String getAttribute(String name);
void setAttribute(String name, String value) throws DOMException;
void removeAttribute(String name) throws DOMException;
Attr getAttributeNode(String name);
Attr setAttributeNode(Attr newAttr) throws DOMException;
Attr removeAttributeNode(Attr oldAttr) throws DOMException;
79
Element::Node Interface (2)
• Only Element objects have attributes, but the attribute methods of Element are simple
– Need name of attribute
– Cannot distinguish between default value specified in DTD and given in XML file
– Cannot determine attribute type [String]
• Instead use getAttributes() method of Node
– Returns Attr objects in a NamedNodeMap
80
Attr::Node Interface
• Interface to objects holding attribute data
• Entity references are children of attributes
// Get name of attribute
String getName();
// Get value of attribute
String getValue();
// Change value of attribute
void setValue(String value);
// If 'true', attribute defined in element, else in DTD
boolean getSpecified();
81
Attr::Node Interface (2)
• Attributes are not considered part of the DOM tree: parentNode, previousSibling and nextSibling have null value for an Attr object
• Create attribute objects using factory method of Document
// Create the empty Attribute node
Attr newAttr = myDoc.createAttribute("status");
// Set the value of the attribute
newAttr.setValue("secret");
// Attach the attribute to an element
myElement.setAttributeNode(newAttr);
82
CharacterData::Node Interface
• Useful general methods for dealing with text
• Not used directly
– sub-classed to Text and Comment Node types
String getData() throws DOMException;
void setData(String data) throws DOMException;
int getLength();
void appendData(String data) throws DOMException;
String substringData(int offset, int length) throws DOMException;
void insertData(int offset, String data) throws DOMException;
void deleteData(int offset, int length) throws DOMException;
void replaceData(int offset, int length, String data) throws DOMException;
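The editing methods can be followed step by step on a single Text node, which inherits them from CharacterData. This is a sketch assuming the JDK's bundled DOM implementation; the class name CharacterDataDemo and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Text;

// Hypothetical example: edit the data of a Text node in place.
class CharacterDataDemo {
    static List<String> run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        List<String> steps = new ArrayList<>();
        Text text = doc.createTextNode("XML");

        text.appendData(" parser");        // add to the end
        steps.add(text.getData());
        text.insertData(4, "SAX ");        // insert at offset 4
        steps.add(text.getData());
        text.replaceData(4, 3, "DOM");     // replace 3 characters at offset 4
        steps.add(text.getData());
        text.deleteData(0, 4);             // delete the first 4 characters
        steps.add(text.getData());
        steps.add(text.substringData(0, 3) + "/" + text.getLength());
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Offsets are zero-based and count characters of the node's data.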
83
Text::Node Interface
• Represents textual content of Element or Attr
– Usually children of these nodes
• Always leaf nodes
• Single method added to CharacterData
– Text splitText(int offset) throws DOMException
• Factory method in Document for creation
• Calling normalize() on an Element merges its Text objects
84
CDATASection::Text Interface
• Represents CDATA that is not to be interpreted as markup (the only delimiter recognised is the "]]>" string that ends the CDATA section)
• The DOMString attribute of the Text node holds the text of the CDATA section
• No methods added to CharacterData
• Factory method in Document for creation
– CDATASection newCDATA = myDoc.createCDATASection("press <<<ENTER>>>");
85
Comment::Text Interface
• Represents comments
• All the characters between starting '<!--' and ending '-->'
• No methods added to CharacterData
• Factory method in Document for creation
– Comment newComment = myDoc.createComment(" my comment "); // Note spaces
86
ProcessingInstruction::Node Interface
• Represent processing instruction declarations
– Name of node is target application name
– Value of node is target application command
• Factory method in Document for creation
– ProcessingInstruction newPI = myDoc.createProcessingInstruction("myProgs/ghostview", "page.ps");
// Get the content of the processing instruction
String getData();
// Set the content of the processing instruction
void setData(String data);
// The target of this processing instruction
String getTarget();
87
EntityReference::Node Interface
• DOM includes interfaces for handling notations, entities and entity references
– If the entities have not been replaced by the parser
<!ENTITY xml "eXtensible Markup Language">
<para>An &xml; value</para>
[Figure: the para Element has three children: the Text "An ", an EntityReference named xml whose child Text holds "eXtensible Markup Language", and the Text " value"]
Figure as from “The XML Companion” - Neil Bradley
88
Entity::Node Interface
• Represents an entity, either parsed or unparsed, in an XML document
– Parser may replace entity references, or create EntityReference nodes
• Must retain Entity for non-parsable data
• Extends Node interface and adds methods
• For non-parsable entities - can get notation name
String getPublicId();
String getSystemId();
String getNotationName();
89
Entity::Node Interface (2)
• A parsable Entity may have children that represent the replacement value of the entity
• All entities of a Document accessed with getEntities() method in DocumentType
<!ENTITY MyBoat PUBLIC "BOAT" SYSTEM "boat.gif" NDATA GIF>
String publicId = ent.getPublicId(); //BOAT
String systemId = ent.getSystemId(); //boat.gif
String notation = ent.getNotationName(); //GIF
Figure as from “The XML Companion” - Neil Bradley
90
Notation::Node Interface
• Each notation declaration in DTD represented by a Notation node
• Methods added to Node interface
• All notations of a Document accessed with getNotations() method in DocumentType object
// Returns content of PUBLIC identifier
String getPublicId();
// Returns content of SYSTEM identifier
String getSystemId();
91
NodeList Interface
• Holds collection of ordered Node objects
• Two methods
// Find number of Nodes in NodeList
int getLength();
// Return the i-th Node
Node item(int index);
-------------------------------------------------
Node child;
NodeList children = element.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    child = children.item(i);
    if (child.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(child.getNodeName());
    }
}
92
NamedNodeMap Interface
• Holds collection of unordered Node objects
– E.g. Attribute, Entity and Notation
• Unique names are essential as nodes are accessed by name
NamedNodeMap myAttributes = myElement.getAttributes();
NamedNodeMap myEntities = myDocument.getDoctype().getEntities();
NamedNodeMap myNotations = myDocument.getDoctype().getNotations();
------------------------------------------------------
int getLength();
Node item(int index);
Node getNamedItem(String name);
Node setNamedItem(Node node) throws DOMException;   // takes a Node, not a name
Node removeNamedItem(String name) throws DOMException;
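Because a NamedNodeMap is unordered, code that lists its contents usually sorts or looks items up by name. The sketch below copies an element's attributes into a sorted map; it assumes the JDK's JAXP parser, and the class name AttributeMapDemo and the sample element are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: read the attributes of the root element.
class AttributeMapDemo {
    static Map<String, String> attributes(String xml) {
        Map<String, String> result = new TreeMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
            // item(i) gives access by position, but the order is not defined.
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                result.put(attr.getNodeName(), attr.getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(attributes("<dbi id='1' term='spring'/>"));
    }
}
```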
93
DocumentFragment::Node Interface
• Fragment of Document can be temporarily stored in DocumentFragment node
– Lightweight object, e.g. for 'cut-n-paste'
• When attached to another Node, its children move into the tree and the fragment is left empty (very useful for adding siblings to tree)
[Figure: the children of a DocumentFragment are inserted into a DOM tree, giving a new DOM tree]
Figure as from “The XML Companion” - Neil Bradley
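Adding several siblings at once through a fragment can be sketched as follows. The example assumes the JDK's bundled DOM implementation; the class name FragmentDemo and the element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

// Hypothetical example: attach a fragment holding two siblings.
class FragmentDemo {
    // Returns { children of list, children left in the fragment }.
    static int[] run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element list = doc.createElement("list");
        doc.appendChild(list);
        list.appendChild(doc.createElement("a"));

        DocumentFragment fragment = doc.createDocumentFragment();
        fragment.appendChild(doc.createElement("b"));
        fragment.appendChild(doc.createElement("c"));

        // Appending the fragment moves its children, not the fragment itself.
        list.appendChild(fragment);

        return new int[] {
            list.getChildNodes().getLength(),
            fragment.getChildNodes().getLength()
        };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println(counts[0] + " children in list, "
                + counts[1] + " left in fragment");
    }
}
```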
94
DOMImplementation Interface
• Interface to determine level of support in DOM parser
– hasFeature(String feature, String version);
if (theParser.hasFeature("XML", "1.0")) {
    // XML is supported
    …
}
95
DOM Objects
• A DOM object is, in effect, compiled XML
• Can save time and effort to send and receive DOM objects instead of XML source
– Saves having to parse XML files into DOM at sender and receiver
– But a DOM object may be larger than the XML source
96
Examples
• DOMTest
• DTD
• XML
• Counting Result
97
98
Java + XML = JDOM
99
What is JDOM?
• JDOM is a way to represent an XML document for easy and efficient reading, manipulation, and writing
– Straightforward API
– Lightweight and fast
– Java-optimized
• Despite the name similarity, it's not built on DOM or modeled after DOM
– Although it integrates well with DOM and SAX
• An open source project with an Apache-style license
100
The JDOM Philosophy
• JDOM should be straightforward for Java programmers
– Use the power of the language (Java 2)
– Take advantage of method overloading, the Collections APIs, reflection, weak references
– Provide conveniences like type conversions
• JDOM should hide the complexities of XML wherever possible
– An Element has content, not a child Text node with content
– Exceptions should contain useful error messages
– Give line numbers and specifics, use no SAX or DOM specifics
101
More JDOM Philosophy
• JDOM should integrate with DOM and SAX
– Support reading and writing DOM documents and SAX events
– Support runtime plug-in of any DOM or SAX parser
– Easy conversion from DOM/SAX to JDOM
– Easy conversion from JDOM to DOM/SAX
102
The Historical Alternatives: DOM
• DOM is a large API designed for complex environments
– A W3C standard, developed by W3C working groups
– Implemented by products like Xerces
– Represents a document tree fully held in memory
– Has to have the same API on multiple languages
– Reading and changing the document is non-intuitive
– Fairly heavyweight to load and store in memory
– http://www.w3.org/DOM
103
The Historical Alternatives: SAX
• SAX is a lightweight API designed for fast reading
– Public domain API from David Megginson and the XML-DEV mailing list
– Implemented by products like Xerces
– Callback mechanism reports when document elements are encountered
– Lightweight since the document is never entirely in memory
– Does not support modifying the document
– Does not support random access to the document
– Fairly steep learning curve to use correctly
104
Do you need JDOM?
• JDOM is a lightweight API
– Its design allows it to hold less in memory
• JDOM can represent a full document
– Not all must be in memory at once
• JDOM supports document modification
– And document creation from scratch, no "factory"
105
For More Information
• To get more information on JDOM, see: http://www.jdom.org