Post on 19-Dec-2015
transcript
DOM
Transparency No. 1
DOM(Document Object Model)
Cheng-Chia Chen
What is DOM?
DOM (Document Object Model)
A tree-based data model of XML documents
An API for XML document processing
  cross-language and language neutral
  defined in terms of CORBA IDL
  language-specific bindings supplied for ECMAScript, Java, ….
Document Object Model
Defines how XML and HTML documents are represented as objects in programs
W3C Standard
Defined in IDL; thus language independent
HTML as well as XML
Writing as well as reading
Covers everything except internal and external DTD subsets
Trees
An XML document can be represented as a tree.
It has a root.
It has nodes.
It is amenable to recursive processing.
DOM (Document Object Model)
What is the tree view of the document ?
<?xml version="1.0" encoding="UTF-8" ?>
<TABLE>
  <TBODY>
    <TR> <TD>紅樓夢 </TD> <TD>曹雪芹 </TD> </TR>
    <TR> <TD>三國演義 </TD> <TD>羅貫中 </TD> </TR>
  </TBODY>
</TABLE>
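Tree view (DOM view) of an XML Document
(document node; root)
  TABLE (element node)
    TBODY (element node)
      TR (element node)
        TD (element node)
          紅樓夢 (text node)
        TD (element node)
          曹雪芹 (text node)
      TR (element node)
        TD (element node)
          三國演義 (text node)
        TD (element node)
          羅貫中 (text node)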
DOM Evolution
DOM Level 0:
DOM Level 1, a W3C Standard
DOM Level 2, a W3C Standard
DOM Level 3: W3C Standard:
  Document Object Model (DOM) Level 3 Core Specification
  Document Object Model (DOM) Level 3 Load and Save Specification
  Document Object Model (DOM) Level 3 Validation Specification
DOM Level 3: W3C Working Group Notes
  Document Object Model (DOM) Level 3 XPath Specification Version 1.0
  Document Object Model (DOM) Level 3 Views and Formatting Specification
  Document Object Model (DOM) Level 3 Events Specification Version 1.0
W3C DOM Working Group
W3C DOM Tech Reports
DOM Implementations for Java
Apache XML Project's Xerces/Crimson parsers:
  http://xml.apache.org/xerces2-j/index.html
  http://xml.apache.org/xerces-j/index.html (hibernated)
  http://xml.apache.org/crimson/ (hibernated; default implementation in Java 1.4)
Sun's Java API for XML: http://java.sun.com/products/xml
Oracle: http://technet.oracle.com/tech/xml
GNU JAXP: http://www.gnu.org/software/classpathx/jaxp/jaxp.html
Modules
Modules:
  Core: org.w3c.dom (L1~L3)
  HTML: org.w3c.dom.html (L2)
  Views: org.w3c.dom.views (L2)
  StyleSheets: org.w3c.dom.stylesheets
  CSS: org.w3c.dom.css
  Events: org.w3c.dom.events (L2)
  Traversal: org.w3c.dom.traversal (L2)
  Range: org.w3c.dom.range (L2)
  XPath, Load and Save, Validation (L3)
Only the Core, Traversal, XPath, Load and Save, and Validation modules really apply to XML. The others are for HTML.
DOM Trees
Entire document is represented as a tree.
A tree contains nodes.
Some nodes may contain other nodes (depending on node type).
Each document node contains:
  zero or one doctype nodes
  one root element node
  zero or more comment and processing instruction nodes
org.w3c.dom
17 interfaces: Attr CDATASection CharacterData Comment Document DocumentFragment DocumentType DOMImplementation Element Entity EntityReference
NamedNodeMap Node NodeList Notation ProcessingInstruction Text
plus one exception: DOMException
Plus a bunch of HTML stuff in org.w3c.dom.html and other packages
The DOM Interface Hierarchy
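Fundamental interfaces: Node, Document, DocumentFragment, DOMImplementation, DOMException, NodeList, NamedNodeMap, CharacterData, Attr, Element, Text, Comment
Extended interfaces: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
Inheritance (subinterfaces of Node):
Node
  Document
  DocumentFragment
  CharacterData
    Text
      CDATASection
    Comment
  Attr
  Element
  DocumentType
  Notation
  Entity
  EntityReference
  ProcessingInstruction
Not derived from Node: DOMImplementation, DOMException, NodeList, NamedNodeMap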
Steps to use DOM
Create a parser using library-specific code.
Use the parser to parse the document and return a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object.
Parsing documents with a (Xerces) DOM Parser Example
import com.sun.org.apache.xerces.internal.parsers.*;
// import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class DOMParserMaker {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document d = parser.getDocument();
        // work with the document...
      }
      catch (SAXException e) { System.err.println(e); }
      catch (IOException e) { System.err.println(e); }
    }
  }
}
Parsing process using JAXP
javax.xml.parsers.DocumentBuilderFactory.newInstance() creates a DocumentBuilderFactory
Configure the factory
The factory's newDocumentBuilder() method creates a DocumentBuilder
Configure the builder
The builder parses the document and returns a DOM org.w3c.dom.Document object.
The entire document is stored in memory.
DOM methods and interfaces are used to extract data from this object
JAXP’s DOM plugability mechanism
Parsing documents with a JAXP DocumentBuilder
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class JAXPParserMaker {

  public static void main(String[] args) {
    try {
      DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
      builderFactory.setNamespaceAware(true);
      DocumentBuilder parser = builderFactory.newDocumentBuilder();
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document d = parser.parse(args[i]);
          // work with the document...
        }
        catch (SAXException e) { System.err.println(e); }
        catch (IOException e) { System.err.println(e); }
      } // end for
    }
    catch (ParserConfigurationException e) {
      System.err.println("You need to install a JAXP aware parser.");
    }
  }
}
The Node Interface
package org.w3c.dom;
public interface Node {
  // NodeType
  public static final short ELEMENT_NODE = 1;
  public static final short ATTRIBUTE_NODE = 2;
  public static final short TEXT_NODE = 3;
  public static final short CDATA_SECTION_NODE = 4;
  public static final short ENTITY_REFERENCE_NODE = 5;
  public static final short ENTITY_NODE = 6;
  public static final short PROCESSING_INSTRUCTION_NODE = 7;
  public static final short COMMENT_NODE = 8;
  public static final short DOCUMENT_NODE = 9;
  public static final short DOCUMENT_TYPE_NODE = 10;
  public static final short DOCUMENT_FRAGMENT_NODE = 11;
  public static final short NOTATION_NODE = 12;
The Node interface
Node Property
public String getNodeName();
public String getNodeValue() throws DOMException;
public void setNodeValue(String value) throws DOMException;
public short getNodeType();
public String getNamespaceURI();
public String getPrefix();
public void setPrefix(String prefix) throws DOMException;
public String getLocalName();
The Node interface
Tree navigation
public Node getParentNode();
public NodeList getChildNodes();
public Node getFirstChild();
public Node getLastChild();
public Node getPreviousSibling();
public Node getNextSibling();
public NamedNodeMap getAttributes();
public Document getOwnerDocument();
public boolean hasChildNodes();
public boolean hasAttributes();
Node navigation
previousSibling
this
firstChild
parentNode
lastChild
nextSibling
childNodes
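The navigation properties in the figure can be exercised with a short program. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name NavigationSketch and the sample document <r><a/><b/><c/></r> are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NavigationSketch {

    // Parses an XML string with the default JAXP parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Walks the children left to right using firstChild / nextSibling.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse("<r><a/><b/><c/></r>");
        Node r = doc.getDocumentElement();
        Node b = r.getFirstChild().getNextSibling();
        System.out.println(childNames(r));                        // a b c
        System.out.println(b.getPreviousSibling().getNodeName()); // a
        System.out.println(b.getNextSibling().getNodeName());     // c
        System.out.println(b.getParentNode().getNodeName());      // r
        System.out.println(r.getLastChild().getNextSibling());    // null
    }
}
```

Looping with getFirstChild() and getNextSibling() avoids indexing into the NodeList returned by getChildNodes(); both styles visit the same nodes.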
The Node interface
Tree Modification
public Node insertBefore (Node newNode, Node refNode) throws DOMException;
public Node replaceChild (Node newNode, Node refNode) throws DOMException;
public Node removeChild(Node node) throws DOMException;
public Node appendChild(Node newNode) throws DOMException;
Node manipulation
this
refNode
firstChild
lastChild
childNodes
newNode
this.insertBefore(newNode, refNode)
this.replaceChild(newNode, refNode)
this.appendChild(newNode)
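The four tree-modification methods can be tried on a small in-memory document. This is a sketch only; the class name ManipulationSketch and the element names list, a, b, c, d are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ManipulationSketch {

    // Lists the child names of a node, separated by spaces.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("list");
        doc.appendChild(root);
        Element b = doc.createElement("b");
        root.appendChild(b);                           // children: b
        root.appendChild(doc.createElement("d"));      // children: b d
        root.insertBefore(doc.createElement("a"), b);  // children: a b d
        root.replaceChild(doc.createElement("c"), b);  // children: a c d
        root.removeChild(root.getLastChild());         // children: a c
        return childNames(root);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // a c
    }
}
```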
The Node interface
Utilities
public Node cloneNode(boolean deep);
public void normalize(); // merges all adjacent text nodes into one
public boolean isSupported(String feature, String version);
// Tests whether the DOM implementation implements a specific feature and that feature is supported by this node.
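The difference between a shallow and a deep clone is easiest to see on an element that has both an attribute and a text child. A minimal sketch, with a hypothetical class name CloneSketch and made-up element content:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CloneSketch {

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        p.setAttribute("id", "x");
        p.appendChild(doc.createTextNode("hi"));

        Element shallow = (Element) p.cloneNode(false); // attributes copied, children not
        Element deep = (Element) p.cloneNode(true);     // attributes and whole subtree copied

        return "shallowHasChildren=" + shallow.hasChildNodes()
                + " shallowId=" + shallow.getAttribute("id")
                + " deepText=" + deep.getFirstChild().getNodeValue()
                + " deepHasParent=" + (deep.getParentNode() != null);
    }

    public static void main(String[] args) {
        // shallowHasChildren=false shallowId=x deepText=hi deepHasParent=false
        System.out.println(demo());
    }
}
```

A clone never has a parent; it has to be inserted explicitly with appendChild() or insertBefore().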
The NodeList Interface
package org.w3c.dom;
public interface NodeList {
public Node item(int index);
public int getLength();
}
The NamedNodeMap interface
public interface NamedNodeMap {
public Node getNamedItem(String name); // by nodeName
public Node setNamedItem(Node arg) throws DOMException;
// insert/replace node if nodeName== arg.getNodeName()
public Node removeNamedItem(String name) throws DOMException;
public Node item(int index);
public int getLength();
// Introduced in DOM Level 2:
public Node getNamedItemNS(String namespaceURI, String localName);
public Node setNamedItemNS(Node arg) throws DOMException;
public Node removeNamedItemNS(String namespaceURI, String localName)
throws DOMException;
}
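A NamedNodeMap is most often met as the result of getAttributes() on an element. The sketch below assumes a made-up <img> element and a hypothetical class name AttributeMapSketch; because the order of items in a NamedNodeMap is not specified, the attributes are sorted before printing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AttributeMapSketch {

    // Parses a one-element document and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies the attributes into a sorted map: nodeName -> nodeValue.
    static Map<String, String> attributes(Element e) {
        Map<String, String> sorted = new TreeMap<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            sorted.put(a.getNodeName(), a.getNodeValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Element img = parseRoot("<img src=\"a.png\" alt=\"logo\"/>");
        System.out.println(attributes(img));                       // {alt=logo, src=a.png}
        NamedNodeMap attrs = img.getAttributes();
        System.out.println(attrs.getNamedItem("src").getNodeValue()); // a.png
        attrs.removeNamedItem("alt");                              // the map is live
        System.out.println(img.getAttributes().getLength());       // 1
    }
}
```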
NodeReporter
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import java.io.*;

public class NodeReporter {

  public static void main(String[] args) {
    try {
      DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
      DocumentBuilder parser = builderFactory.newDocumentBuilder();
      NodeReporter iterator = new NodeReporter();
      for (int i = 0; i < args.length; i++) {
        try {
          // Read the entire document into memory
          Document doc = parser.parse(args[i]);
          iterator.followNode(doc);
        }
        catch (SAXException ex) { System.err.println(args[i] + " is not well-formed."); }
        catch (IOException ex) { System.err.println(ex); }
      }
    }
    catch (ParserConfigurationException ex) {
      System.err.println("You need to install a JAXP aware parser.");
    }
  } // end main
  // note use of recursion
  public void followNode(Node node) {
    processNode(node);
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        followNode(children.item(i));
      }
    }
  }

  public void processNode(Node node) {
    String name = node.getNodeName();
    String type = typeName[node.getNodeType()];
    System.out.println("Type " + type + ": " + name);
  }
Type2TypeName
  // Indexed by the node type constants of the Node interface (1 to 12)
  public String[] typeName = new String[] {
    "Unknown Type",
    "Element", "Attribute", "Text",
    "CDATA Section", "Entity Reference",
    "Entity", "Processing Instruction",
    "Comment", "Document",
    "Document Type Declaration",
    "Document Fragment",
    "Notation"
  };
}
Values of NodeName, NodeValue and attributes in a Node
Interface              nodeName                   nodeValue                 attributes
Attr                   name of attribute          value of attribute        null
CDATASection           #cdata-section             content                   null
Comment                #comment                   content                   null
Document               #document                  null                      null
DocumentFragment       #document-fragment         null                      null
DocumentType           document type name         null                      null
Element                tag name                   null                      NamedNodeMap
Entity                 entity name                null                      null
EntityReference        name of entity referenced  null                      null
Notation               notation name              null                      null
ProcessingInstruction  target                     content excluding target  null
Text                   #text                      content of the text node  null
The Document Node
The root node representing the entire document; not the same as the root element
Contains:
  one element node
  zero or more processing instruction nodes
  zero or more comment nodes
  zero or one document type nodes
The Document Interface
package org.w3c.dom;
public interface Document extends Node {
public DocumentType getDoctype();
public DOMImplementation getImplementation();
public Element getDocumentElement();
public NodeList getElementsByTagName(String tagname);
public NodeList getElementsByTagNameNS(String namespaceURI, String localName);
public Element getElementById(String elementId);
The Document Interface
// Factory methods
public Element createElement(String tagName) throws DOMException;
public Element createElementNS(String namespaceURI, String qName) throws DOMException;
public DocumentFragment createDocumentFragment();
public Text createTextNode(String data);
public Comment createComment(String data);
public CDATASection createCDATASection(String data) throws DOMException;
public ProcessingInstruction createProcessingInstruction(String target, String data) throws DOMException;
public Attr createAttribute(String name) throws DOMException;
public Attr createAttributeNS(String namespaceURI, String qName) throws DOMException;
public EntityReference createEntityReference(String name) throws DOMException;
public Node importNode(Node importedNode, boolean deep) throws DOMException;
}
Element Nodes
Represents a complete element including its start-tag, end-tag, and content
Content may contain:
  Element nodes
  ProcessingInstruction nodes
  Comment nodes
  Text nodes
  CDATASection nodes
  EntityReference nodes
The Element Interface
public String getTagName(); // = getNodeName();
public NodeList getElementsByTagName(String name);
public NodeList getElementsByTagNameNS(String uri, String localName);
public String getAttribute(String name);
public String getAttributeNS(String uri, String localName);
public void setAttribute(String name, String value) throws DOMException;
public void setAttributeNS(String uri, String qName, String value) throws DOMException;
public void removeAttribute(String name) throws DOMException;
public void removeAttributeNS(String uri, String localName) throws DOMException;
public Attr getAttributeNode(String name);
public Attr getAttributeNodeNS(String namespaceURI, String localName);
public Attr setAttributeNode(Attr newAttr) throws DOMException;
public Attr setAttributeNodeNS(Attr newAttr) throws DOMException;
public Attr removeAttributeNode(Attr oldAttr) throws DOMException;
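A short sketch of the attribute methods follows. The class name ElementAttributeSketch, the element name a, and the URL are made up for illustration. One detail worth seeing in practice: getAttribute() returns an empty string, not null, for a missing attribute, while getAttributeNode() returns null.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ElementAttributeSketch {

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("a");
        a.setAttribute("href", "http://example.org/");

        String href = a.getAttribute("href");
        boolean missingIsEmpty = a.getAttribute("title").isEmpty();     // "" rather than null
        boolean missingNodeIsNull = a.getAttributeNode("title") == null;

        a.removeAttribute("href");
        int remaining = a.getAttributes().getLength();

        return href + " " + missingIsEmpty + " " + missingNodeIsNull + " " + remaining;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // http://example.org/ true true 0
    }
}
```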
Example application
UserLand's RSS based list of Web logs at http://static.userland.com/weblogMonitor/logs.xml, or locally, xml/rsslogs.xml
<?xml version="1.0"?>
<!-- <!DOCTYPE foo SYSTEM "http://msdn.microsoft.com/xml/general/htmlentities.dtd"> -->
<weblogs>
  <log>
    <name>MozillaZine</name>
    <url>http://www.mozillazine.org</url>
    <changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>
    <ownerName>Jason Kersey</ownerName>
    <ownerEmail>kerz@en.com</ownerEmail>
    <description>THE source for news on the Mozilla Organization. DevChats, Reviews, Chats, Builds, Demos, Screenshots, and more.</description>
    <imageUrl></imageUrl>
    <adImageUrl>http://static.userland.com/weblogMonitor/ads/kerz@en.com.gif </adImageUrl>
  </log>
  …
</weblogs>
DOM Design
Want to find all URLs in the logs
The character data of each url element needs to be read. Everything else can be ignored.
The getElementsByTagName() method in Document gives us a quick list of all the url elements.
The program: WeblogsDOM.java
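The source of WeblogsDOM.java is not part of this transcript. The following is only a sketch of how the design could be realized, under the assumption that every url element holds plain character data; the class name WeblogUrlsSketch and the shortened sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WeblogUrlsSketch {

    // Shortened stand-in for the weblogs document shown on the previous slide.
    static final String SAMPLE =
        "<weblogs><log><name>MozillaZine</name>"
        + "<url>http://www.mozillazine.org</url>"
        + "<changesUrl>http://www.mozillazine.org/contents.rdf</changesUrl>"
        + "</log></weblogs>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Collects the character data of every element named exactly "url".
    static List<String> urls(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList urlElements = doc.getElementsByTagName("url");
        for (int i = 0; i < urlElements.getLength(); i++) {
            StringBuilder text = new StringBuilder();
            for (Node c = urlElements.item(i).getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Text) {   // Text also covers CDATASection
                    text.append(c.getNodeValue());
                }
            }
            result.add(text.toString().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String url : urls(parse(SAMPLE))) {
            System.out.println(url); // http://www.mozillazine.org
        }
    }
}
```

getElementsByTagName("url") matches the tag name exactly, so changesUrl, imageUrl, and adImageUrl are not returned.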
CharacterData interface
Represents things that are basically text holders
Super interface of Text, Comment, and CDATASection
The CharacterData Interface
package org.w3c.dom;
public interface CharacterData extends Node {
  // content retrieval
  public String getData() throws DOMException;
  public int getLength();
  public String substringData(int offset, int count) throws DOMException;
  // content modification
  public void setData(String data) throws DOMException;
  public void appendData(String arg) throws DOMException;
  public void insertData(int offset, String arg) throws DOMException;
  public void deleteData(int offset, int count) throws DOMException;
  public void replaceData(int offset, int count, String arg) throws DOMException;
}
Text Nodes
Represents the text content of an element or attribute
Contains only pure text, no markup
Parsers will return a single maximal text node for each contiguous run of pure text
Editing may change this
The Text Interface
package org.w3c.dom;
public interface Text extends CharacterData {
public Text splitText(int offset) throws DOMException;
}
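Editing can leave several adjacent text nodes where a parser would have produced one. The sketch below, with a hypothetical class name SplitTextSketch and made-up content, splits a text node with splitText() and merges the pieces again with Node.normalize().

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

public class SplitTextSketch {

    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element p = doc.createElement("p");
        Text head = doc.createTextNode("HelloWorld");
        p.appendChild(head);

        // Keeps "Hello" in head; the rest becomes a new next sibling.
        Text tail = head.splitText(5);
        int afterSplit = p.getChildNodes().getLength();

        // Merges adjacent text nodes back into one.
        p.normalize();
        int afterNormalize = p.getChildNodes().getLength();

        return head.getData() + "|" + tail.getData() + "|" + afterSplit
                + "|" + afterNormalize + "|" + p.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Hello|World|2|1|HelloWorld
    }
}
```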
CDATA section Nodes
Represents a CDATA section like this example from a hypothetical SVG tutorial:
<p>You can use a default <code>xmlns</code> attribute to avoid
having to add the svg prefix to all your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/2000/svg"
width="12cm" height="10cm">
<ellipse rx="110" ry="130" />
<rect x="4cm" y="1cm" width="3cm" height="6cm" />
</svg>
]]>
No children
The CDATASection Interface
package org.w3c.dom;
// no additional methods other than those from Text
public interface CDATASection extends Text {
}
DocumentType Nodes
Represents a document type declaration
Has no children
The DocumentType Interface
package org.w3c.dom;
public interface DocumentType extends Node {
  public String getName();
  public NamedNodeMap getEntities();
  public NamedNodeMap getNotations();
  public String getPublicId();
  public String getSystemId();
  public String getInternalSubset();
}
Example
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
name = "html"
publicId = "-//W3C//DTD XHTML 1.0 Strict//EN"
systemId = "DTD/xhtml1-strict.dtd"
Attr Nodes
Represents an attribute
Contains: Text nodes Entity reference nodes
The Attr Interface
package org.w3c.dom;
public interface Attr extends Node {
public String getName();
public boolean getSpecified(); //false => from DTD
public String getValue();
public void setValue(String value)
throws DOMException;
public Element getOwnerElement();
// namespaceURI, prefix, localName inherited from Node
}
ProcessingInstruction Nodes
Represents a processing instruction like
<?robots index="yes" follow="no"?>
No children
The ProcessingInstruction Interface
package org.w3c.dom;
public interface ProcessingInstruction extends Node {
public String getTarget();
public String getData();
public void setData(String data) throws DOMException;
}
Ex: <?robots index="yes" follow="no"?>
  target = [robots]
  data = [index="yes" follow="no"]
Comment Nodes
Represents a comment like this example from the XML 1.0 spec:
<!--* This is a comment -->
No children
The Comment Interface
package org.w3c.dom;
public interface Comment extends CharacterData { }
Notes: Text, CDATASection, Comment are all subinterfaces of CharacterData and can use all methods defined in it.
Notation
Notation, Entity and EntityReference
public interface Notation extends Node {
public String getPublicId();
public String getSystemId(); }
public interface Entity extends Node { // for GE or unparsed
public String getPublicId(); // entity only.
public String getSystemId();
public String getNotationName(); }
// Entity’s replacement Text are stored as its readonly
// childNodes if available.
public interface EntityReference extends Node { }
// The contents of the referenced entity are the children of this node.
// nodeName contains the name of the referenced entity.
DOMException
A runtime exception but you should catch it
Error code accessible from the public code field
Error code gives more detailed information (import static org.w3c.dom.DOMException.*;):
  INDEX_SIZE_ERR
    Index or size is negative, or greater than the allowed value
  DOMSTRING_SIZE_ERR
    The specified range of text does not fit into a String
  HIERARCHY_REQUEST_ERR
    Attempt to insert a node somewhere it doesn't belong
  WRONG_DOCUMENT_ERR
    If a node is used in a different document than the one that created it (that doesn't support it)
  INVALID_CHARACTER_ERR
    An invalid or illegal character is specified, such as in a name.
  NO_DATA_ALLOWED_ERR
    Attempt to add data to a node which does not support data
DOMException
  NO_MODIFICATION_ALLOWED_ERR
    Attempt to modify a read-only object
  NOT_FOUND_ERR
    Attempt to reference a node in a context where it does not exist
  NOT_SUPPORTED_ERR
    The implementation does not support the type of object requested
  INUSE_ATTRIBUTE_ERR
    Attempt to add an attribute to an element that already has that attribute
  INVALID_STATE_ERR
    An attempt is made to use an object that is not, or no longer, usable.
  SYNTAX_ERR
    An invalid or illegal string is specified.
  INVALID_MODIFICATION_ERR
    An attempt to modify the type of the underlying object.
  NAMESPACE_ERR
    An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
  INVALID_ACCESS_ERR
    A parameter or an operation is not supported by the underlying object.
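Two of these codes are easy to provoke. The sketch below assumes the DOM implementation bundled with the JDK, which performs these checks by default; the class name DomExceptionSketch and the element names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class DomExceptionSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A document may hold only one root element; returns the error code, or -1 if none.
    static int secondRootCode() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("a"));
        try {
            doc.appendChild(doc.createElement("b"));
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    // "1bad" is not a legal XML name; returns the error code, or -1 if none.
    static int badNameCode() {
        try {
            newDocument().createElement("1bad");
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRootCode() == DOMException.HIERARCHY_REQUEST_ERR);
        System.out.println(badNameCode() == DOMException.INVALID_CHARACTER_ERR);
    }
}
```

Comparing e.code against the named constants keeps the handler readable and avoids depending on the numeric values.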
The DOMImplementation interface
Creates new Document objects
Creates new DocumentType objects
Tests features supported by this implementation
DOMImplementation interface
package org.w3c.dom;
public interface DOMImplementation {
  public boolean hasFeature(String feature, String version);
  public Object getFeature(String feature, String version);
  public DocumentType createDocumentType(String qName, String publicID, String systemID) throws DOMException;
  public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException;
}
org.apache.xerces.dom.DOMImplementationImpl
The Xerces-specific class that implements DOMImplementation
package org.apache.xerces.dom;
public class DOMImplementationImpl implements DOMImplementation {
  // factory method
  public static DOMImplementation getDOMImplementation()
  public boolean hasFeature(String feature, String version)
  public Object getFeature(String feature, String version)
  public DocumentType createDocumentType(String qName, String publicID, String systemID) throws DOMException
  public Document createDocument(String uri, String qName, DocumentType doctype) throws DOMException
}
Examples of creating DOM documents in the memory
FibonacciDOM.java using Xerces-j
FibonacciJAXP.java using JAXP.
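Neither file is reproduced in this transcript. As a rough sketch of the JAXP variant, the following builds a small document in memory through DOMImplementation.createDocument(); the class name FibonacciSketch and the element and attribute names (Fibonacci_Numbers, fibonacci, index) are assumptions made for illustration, not taken from the original program.

```java
import java.math.BigInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class FibonacciSketch {

    // Builds <Fibonacci_Numbers> with one <fibonacci index="i"> child per number.
    static Document build(int count) {
        DOMImplementation impl;
        try {
            impl = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // No namespace, root element name, no DOCTYPE
        Document doc = impl.createDocument(null, "Fibonacci_Numbers", null);
        Element root = doc.getDocumentElement();

        BigInteger low = BigInteger.ONE;
        BigInteger high = BigInteger.ONE;
        for (int i = 1; i <= count; i++) {
            Element number = doc.createElement("fibonacci");
            number.setAttribute("index", Integer.toString(i));
            number.appendChild(doc.createTextNode(low.toString()));
            root.appendChild(number);

            BigInteger next = high.add(low);
            low = high;
            high = next;
        }
        return doc;
    }

    public static void main(String[] args) {
        Element root = build(5).getDocumentElement();
        System.out.println(root.getChildNodes().getLength());               // 5
        System.out.println(root.getLastChild().getFirstChild().getNodeValue()); // 5
    }
}
```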
Which modules and features are supported?
A DOM application can use the hasFeature() method of the DOMImplementation interface to determine whether a module is supported or not.
XML Module: "XML"
HTML Module: "HTML"
Views Module: "Views"
StyleSheets Module: "StyleSheets"
CSS Module: "CSS"
CSS (extended interfaces) Module: "CSS2"
Events Module: "Events"
User Interface Events (UIEvent interface) Module: "UIEvents"
Mouse Events Module: "MouseEvents"
Mutation Events Module: "MutationEvents"
HTML Events Module: "HTMLEvents"
Traversal Module: "Traversal"
Range Module: "Range"
Which modules are supported?
import org.apache.xerces.dom.DOMImplementationImpl;
import org.w3c.dom.*;
import java.io.*;

public class ModuleChecker {

  public static void main(String[] args) {
    // parser dependent
    DOMImplementation implementation = DOMImplementationImpl.getDOMImplementation();
    String[] features = { "XML", "HTML", "Views", "StyleSheets", "CSS", "CSS2", "Events",
        "UIEvents", "MouseEvents", "MutationEvents", "HTMLEvents", "Traversal", "Range" };
    for (int i = 0; i < features.length; i++) {
      if (implementation.hasFeature(features[i], "2.0")) {
        System.out.println("Implementation supports " + features[i]);
      }
      else {
        System.out.println("Implementation does not support " + features[i]);
      }
    }
  }
}
The result
> java ModuleChecker
Implementation supports XML
Implementation does not support HTML
Implementation does not support Views
Implementation does not support StyleSheets
Implementation does not support CSS
Implementation does not support CSS2
Implementation supports Events
Implementation does not support UIEvents
Implementation does not support MouseEvents
Implementation supports MutationEvents
Implementation does not support HTMLEvents
Implementation supports Traversal
Implementation supports Range
>
Serialization
The process of taking an in-memory DOM tree and converting it to a stream of characters that can be written onto an output stream
Not a standard part of DOM Level 2
The org.apache.xml.serialize package:
  public interface DOMSerializer
  public interface Serializer
  public abstract class BaseMarkupSerializer extends Object
    implements DocumentHandler, org.xml.sax.ext.LexicalHandler, DTDHandler, org.xml.sax.ext.DeclHandler, DOMSerializer, Serializer
  public class HTMLSerializer extends BaseMarkupSerializer
  public final class TextSerializer extends BaseMarkupSerializer
  public final class XHTMLSerializer extends HTMLSerializer
  public final class XMLSerializer extends BaseMarkupSerializer
Example
A DOM program that writes Fibonacci numbers onto System.out
FibonacciDOMSerializer.java
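FibonacciDOMSerializer.java itself is not included here, and it relies on the Xerces-specific org.apache.xml.serialize package. As an alternative sketch, not the approach of the original program, the DOM Level 3 Load and Save module listed earlier can serialize a tree with only standard interfaces. This assumes the DOM implementation in use supports the "LS" feature, as the one bundled with the JDK does; the class name SerializeSketch and the greeting document are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class SerializeSketch {

    // Builds <greeting>hello</greeting> in memory.
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .getDOMImplementation().createDocument(null, "greeting", null);
            doc.getDocumentElement().appendChild(doc.createTextNode("hello"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the document to a string with the Load and Save module.
    static String serialize(Document doc) {
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Level 3 Load and Save is not supported");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        // Prints an XML declaration followed by the greeting element
        System.out.println(serialize(sample()));
    }
}
```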
OutputFormat
For pretty format of output.
package org.apache.xml.serialize;
public class OutputFormat extends Object {
  public OutputFormat( [String method, String encoding, boolean indenting ])
  public OutputFormat( [Document doc,] String encoding, boolean indenting)
  // a getter/setter pair like the following is abbreviated as: public property String method;
  public String getMethod();
  public void setMethod(String method)
  // other public properties:
  //   int indent, lineWidth;
  //   boolean indenting, OmitXMLDeclaration, Standalone, PreserveSpace;
  //   String encoding, version, mediaType, LineSeparator, DoctypePublic, DoctypeSystem;
  public void setDoctype(String publicID, String systemID)
  // Elements whose text children should be output as CDATA
  public String[] getCDataElements()
  public boolean isCDataElement(String tagName)
  public void setCDataElements(String[] cdataElements)
OutputFormat
  // Non-escaping elements, i.e., text children are output without using character references
  public String[] getNonEscapingElements()
  public boolean isNonEscapingElement(String tagName)
  public void setNonEscapingElements(String[] nonEscapingElements)
  // last printable character in the encoding
  public char getLastPrintable()
  // Query methods
  public static String whichMethod(Document doc)
  public static String whichDoctypePublic(Document doc)
  public static String whichDoctypeSystem(Document doc)
  public static String whichMediaType(String method)
Better formatted output
UTF-8 encoding, Indentation, Word wrapping Document type declaration
try {
// Now that the document is created we need to *serialize* it
OutputFormat format = new OutputFormat(fibonacci, "UTF-8", true);
format.setLineSeparator("\r\n");
format.setLineWidth(72);
format.setDoctype(null, "fibonacci.dtd");
XMLSerializer serializer = new XMLSerializer(System.out, format);
serializer.serialize(root);
}
catch (IOException e) { System.err.println(e); }
> java domexample.PrettyFibonacciDOMSerializer
DOM based XMLPrettyPrinter
import org.apache.xerces.parsers.DOMParser;
import org.apache.xml.serialize.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import java.io.IOException;

public class DOMPrettyPrinter {

  public static void main(String[] args) {
    DOMParser parser = new DOMParser();
    for (int i = 0; i < args.length; i++) {
      try {
        // Read the entire document into memory
        parser.parse(args[i]);
        Document document = parser.getDocument();
        // set output format & serialize
        OutputFormat format = new OutputFormat(document, "UTF-8", true);
        format.setLineSeparator("\r\n");
        format.setIndenting(true);
        format.setIndent(2);
        format.setLineWidth(72);
        format.setPreserveSpace(false);
        XMLSerializer serializer = new XMLSerializer(System.out, format);
        serializer.serialize(document);
      }
      catch (SAXException e) { System.err.println(e); }
      catch (IOException e) { System.err.println(e); }
    }
  } // end main
}
Notes
Using the DOM to write documents automatically maintains well-formedness constraints
Validity is not automatically maintained.
References
Most of the contents of this presentation come from: http://www.cafeconleche.org/slides/sd2004west/saxdom
Processing XML with Java, Elliotte Rusty Harold, Chapters 9-13:
  Chapter 9, The Document Object Model
  Chapter 10, Creating New XML Documents with DOM
  Chapter 11, The Document Object Model Core
  Chapter 12, The DOM Traversal Module
  Chapter 13, Output from DOM
DOM Level 2 Core Specification
DOM Level 2 Traversal and Range Specification
JAXP (Java API for XML Processing) for DOM
DOMParsers and DOMImplementations
Problems:
How to get a DOM Document object from an XML document?
  Get a DOM parser, parse the XML document, and then get a DOM document.
How to construct DOM objects directly in a program?
  Get a DOMImplementation and invoke createDocument() to get the initial DOM document.
How to get a DOM object from an XML document and modify it in a program?
  Get a DOM document by parsing the XML document, use the factory methods of Document to create nodes, and use Node methods to add them to the result tree.
DOMParser
XML Document
DOM Document
Use Apache’s xerces for DOM
XML2DOM:
  // find the DOM parser implementation class: org.apache.xerces.parsers.DOMParser
  DOMParser parser = new DOMParser();
  parser.setFeature("http://xml.org/sax/features/validation", true);
  parser.setFeature("http://xml.org/sax/features/namespaces", true);
  …
  parser.parse(url_or_inputSource);
  Document doc = parser.getDocument();
  DOMImplementation impl = doc.getImplementation();
Construct DOM from scratch:
  // find the DOMImplementation class: org.apache.xerces.dom.DOMImplementationImpl
  DOMImplementation dm = new DOMImplementationImpl();
  // or dm = DOMImplementationImpl.getDOMImplementation(); // non-dom
  Document doc = dm.createDocument(…);
  Element e = doc.createElement(…);
  Attr attr = doc.createAttributeNS(…);
  Text txt = doc.createTextNode("…");
JAXP (Java API for XML Processing) 1.2
Sun's Java API for XML Processing
three modules:
  for DOM Processing
  for SAX Processing
  for Transformation
5 packages
1. javax.xml.parsers
  Provides classes allowing the processing of XML documents.
  Two types of pluggable parsers are supported:
    SAX (Simple API for XML)
    DOM (Document Object Model)
2. javax.xml.transform ( + … )
  APIs for processing transformation instructions, and performing a transformation from source to result.
JAXP’s DOM plugability mechanism
JAXP API for DOM
javax.xml.parsers.DocumentBuilder
  Using this class, an application programmer can obtain a Document from XML.
javax.xml.parsers.DocumentBuilderFactory
  A factory class for obtaining a DocumentBuilder.
  Abstract class
  A concrete subclass can be obtained by the static method: DocumentBuilderFactory.newInstance()
  The desired capability of the parser can be specified by setting the various properties of the obtained factory instance.
Example code snippet
import javax.xml.parsers.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import java.io.IOException;
DocumentBuilder builder;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
String location = "http://myserver/mycontent.xml";
try {
builder = factory.newDocumentBuilder();
Document doc1 = builder.parse(location);
Document doc2 = builder.newDocument(); //empty document
} catch (SAXException se) {// handle error
} catch (IOException ioe) { // handle error
} catch (ParserConfigurationException pce){// handle error
}
javax.xml.parsers.DocumentBuilder
abstract DOMImplementation getDOMImplementation()
  Obtain an instance of a DOMImplementation object.
abstract Document newDocument()
  Obtain a new instance of a DOM Document object to build a DOM tree with.
abstract boolean isNamespaceAware()
  Indicates whether or not this parser is configured to understand namespaces.
abstract boolean isValidating()
  Indicates whether or not this parser is configured to validate XML documents.
Document parse(File | InputSource | InputStream [, systemId] | uriString )
  Parse the content of the given source as an XML document and return a new DOM Document object.
abstract void setEntityResolver(EntityResolver er)
  Specify the EntityResolver to be used to resolve entities present in the XML document to be parsed.
abstract void setErrorHandler(ErrorHandler eh)
  Specify the ErrorHandler to be used to report errors present in the XML document to be parsed.
javax.xml.parsers.DocumentBuilderFactory
Object getAttribute(String name)
void setAttribute(String name, Object value)
  Allows users to set/get specific attributes on the underlying implementation.
boolean isIgnoringComments(), setIgnoringComments(boolean)
  Indicates whether or not the factory is configured to produce parsers which ignore comments.
Other properties:
  IgnoringElementContentWhitespace;
  ExpandEntityReferences;
  Coalescing; // merge adjacent texts and CDATA into a text node
  NamespaceAware;
  Validating;
abstract DocumentBuilder newDocumentBuilder()
  Creates a new instance of a DocumentBuilder using the currently configured parameters.
static DocumentBuilderFactory newInstance()
  Obtain a new instance of a DocumentBuilderFactory.
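The effect of two of these properties can be observed on a tiny document. This is a sketch assuming the default JAXP parser of the JDK; the class name FactoryOptionsSketch and the sample document are made up. With the default settings the comment and the CDATA section appear as separate nodes; with IgnoringComments and Coalescing switched on they should no longer appear as such.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class FactoryOptionsSketch {

    static final String XML = "<a>x<![CDATA[y]]><!--c--></a>";

    // Parses XML with or without the two options set on the factory.
    static Document parse(boolean configured) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            if (configured) {
                factory.setIgnoringComments(true); // drop comment nodes
                factory.setCoalescing(true);       // turn CDATA sections into text
            }
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Node names of the children of the root element, separated by spaces.
    static String childNames(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (Node c = doc.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(childNames(parse(false))); // #text #cdata-section #comment
        System.out.println(childNames(parse(true)));  // only text remains
    }
}
```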
HOW DocumentBuilderFactory finds its instance
Use the javax.xml.parsers.DocumentBuilderFactory system property
Use the above property in the file "%JAVA_HOME%/lib/jaxp.properties" in the JRE directory.
Look for the class name in the file META-INF/services/javax.xml.parsers.DocumentBuilderFactory in jars available to the runtime.
Platform default DocumentBuilderFactory instance, which is
  "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl" for JDK 1.4
  "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl" for JDK 1.5.
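To see which implementation this lookup selected on a given installation, the concrete class of the factory can simply be printed. A minimal sketch; the class name FactoryLookupSketch is hypothetical, and the printed name depends on the JDK and on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class FactoryLookupSketch {

    // Name of the system property consulted first by newInstance().
    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Concrete class chosen by the lookup procedure.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("System property: " + System.getProperty(PROPERTY));
        System.out.println("Factory in use:  " + factoryClassName());
    }
}
```

Running it with -Djavax.xml.parsers.DocumentBuilderFactory=<class name> on the command line is one way to check that the system property takes precedence.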