Effective XML XML Developers Network of the Capital District Elliotte Rusty Harold...

Date post: 23-Dec-2015
Page 1: Effective XML XML Developers Network of the Capital District Elliotte Rusty Harold elharo@metalab.unc.edu

Effective XML
• XML Developers Network of the Capital District
• Elliotte Rusty Harold
• elharo@metalab.unc.edu
• http://www.cafeconleche.org/

Part I: Syntax

Item 1: Include an XML declaration
<?xml version="1.0" encoding="UTF-8"?>

• Optional, but treat as required
• Specifies version, character set, and encoding
• Very important for detecting encoding
• Identifies XML when file and media type information is unavailable or unreliable

Item 3: Stay with XML 1.0

• XML 1.1:
    • New name characters
    • C0 control characters
    • C1 control characters
    • NEL
    • Undeclare namespace prefixes
• Incompatible with
    • Most XML parsers
    • W3C and RELAX NG schema languages
    • XOM, JDOM

Part II: Structure

The XML Stack

Item 14: Allow All XML syntax

• CDATA sections
• Entity references
• Processing instructions
• Comments
• Numeric character references
• Document type declarations
• Different ways of representing the same core content; not different information

Item 9: Distinguish text from markup
• A DocBook element
<programlisting><![CDATA[<value>
  <double>28657</double>
</value>]]></programlisting>
• The content is:
<value>
  <double>28657</double>
</value>
• This is the same:
<programlisting>&lt;value&gt;
  &lt;double&gt;28657&lt;/double&gt;
&lt;/value&gt;</programlisting>
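
A minimal Java sketch of the same point, assuming the JDK's built-in DOM parser: the CDATA form and the escaped form of the listing should parse to identical text. The class and method names (TextVersusMarkup, rootText) are illustrative and not from the slides, and the listing is collapsed onto one line for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextVersusMarkup {
    // The same listing written two ways: once in a CDATA section, once with escapes.
    static final String WITH_CDATA =
        "<programlisting><![CDATA[<value><double>28657</double></value>]]></programlisting>";
    static final String WITH_ESCAPES =
        "<programlisting>&lt;value&gt;&lt;double&gt;28657&lt;/double&gt;&lt;/value&gt;</programlisting>";

    /** Parses a small XML string and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(WITH_CDATA));
        System.out.println(rootText(WITH_CDATA).equals(rootText(WITH_ESCAPES)));
    }
}
```

Code that depends on which of the two forms was used is depending on syntax sugar rather than on content.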

The reverse problem

• Tools that create XML from strings:
    • Tree-based editors like <Oxygen/> or XML Spy
    • WYSIWYG applications like OpenOffice Writer
    • Programming APIs such as DOM, JDOM, and XOM
• The tool automatically escapes reserved characters like <, >, or &.
• Just because something looks like an XML tag does not mean it is an XML tag.
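
As a rough illustration with the JDK's DOM and identity Transformer (one possible toolchain; the slides do not prescribe it), a string that looks like a tag goes in as character data and comes out escaped. The names EscapingDemo and serializeParagraph, and the para element, are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EscapingDemo {
    /** Builds a para element holding the given text through DOM and serializes it. */
    static String serializeParagraph(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element para = doc.createElement("para");
            // The string is stored as character data; the API never parses it as markup.
            para.appendChild(doc.createTextNode(text));
            doc.appendChild(para);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeParagraph("Use <strong> for emphasis & stress"));
    }
}
```

To get a real strong element, the program would have to create an element node rather than pass markup inside a string.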

Item 10: White space matters

• Parsers report all white space in element content, including boundary white space

• An xml:space attribute is for the client application only, not the parser

• White space in attribute values is normalized

• Parsers do not report white space in the prolog, epilog, the document type declaration, and tags.
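
A small sketch of two of these points, assuming the JDK's default non-validating DOM parser: boundary white space shows up as text nodes, and a line break inside an attribute value is normalized to a space. WhiteSpaceDemo and its method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class WhiteSpaceDemo {
    /** Parses the string and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of child nodes of the root, counting white-space-only text nodes. */
    static int childNodeCount(String xml) {
        return parseRoot(xml).getChildNodes().getLength();
    }

    /** Value of the named attribute on the root, after the parser's normalization. */
    static String attributeValue(String xml, String name) {
        return parseRoot(xml).getAttribute(name);
    }

    public static void main(String[] args) {
        // Pretty-printed: text node, element b, text node.
        System.out.println(childNodeCount("<a>\n  <b/>\n</a>"));
        // Compact: only element b.
        System.out.println(childNodeCount("<a><b/></a>"));
        // The literal line break in the attribute value is reported as a space.
        System.out.println(attributeValue("<a x=\"1\n2\"/>", "x"));
    }
}
```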

Item 11: Make structure explicit through markup
• Bad
<Transaction>Withdrawal 2003 12 15 200.00</Transaction>
• Better
<Transaction type="withdrawal">
  <Date>2003-12-15</Date>
  <Amount>200.00</Amount>
</Transaction>

Item 12: Store metadata in attributes
• Material the reader doesn’t want to see
    • URLs
    • IDs
    • Styles
    • Revision dates
    • Author’s name
• No substructure
    • Revision tracking
    • Citations
• No multiple elements

Item 13: Remember mixed content

• Narrative documents
• Record-like documents
• The RSS problem
<item>
  <title>Xerlin 1.3 released</title>
  <description>
    Xerlin 1.3, an open source XML Editor written in Java, has been
    released. Users can extend the application via custom editor
    interfaces for specific DTDs. New features in version 1.3 include
    XML Schema support, WebDAV capabilities, and various user interface
    enhancements. Java 1.2 or later is required.
  </description>
  <link>http://www.cafeconleche.org/#news2003April7</link>
</item>

What you really want is this:
<description>
  <p><a href="http://www.xerlin.org"><strong>Xerlin 1.3</strong></a>,
  an open source XML Editor written in Java, has been released.
  Users can extend the application via custom editor interfaces for
  specific DTDs. New features in version 1.3 include:</p>
  <ul>
    <li>XML Schema support</li>
    <li>WebDAV capabilities</li>
    <li>Various user interface enhancements</li>
  </ul>
  <p>Java 1.2 or later is required.</p>
</description>

What people do is this:
<description>
  &lt;p>&lt;a href="http://www.xerlin.org">&lt;strong>Xerlin 1.3&lt;/strong>&lt;/a>,
  an open source XML Editor written in Java, has been released.
  Users can extend the application via custom editor interfaces for
  specific DTDs. New features in version 1.3 include:&lt;/p>
  &lt;ul>
    &lt;li>XML Schema support&lt;/li>
    &lt;li>WebDAV capabilities&lt;/li>
    &lt;li>Various user interface enhancements&lt;/li>
  &lt;/ul>
  &lt;p>Java 1.2 or later is required.&lt;/p>
</description>

Item 16: Prefer URLs to unparsed entities and notations
• URLs are simple and well understood
• Notations and unparsed entities are confusing and little used
• URLs don’t require the DTD to be read
• Many APIs don’t even support notations and unparsed entities

Part III: Semantics

Item 17: Use processing instructions for process-specific content

• For a very particular, even local, process

• Describes how a particular process acts on the data in the document

• Does not describe or add to the content itself

• A unit that can be treated in isolation

• Content is not XML-like.
• Applies to the entire document

Processing instructions are not appropriate when:
• Content is closely related to the content of the document itself.
• Structure extends beyond a single processing instruction
• Needs to be validated.

Item 18: Include all information in instance documents
• Not all parsers read the DTD
    • Especially browsers
• Beware
    • Default attribute values
    • Parsed entity references
    • XInclude
    • ID type dependence (XPath, DOM, etc.)

Item 19: Encode binary data using quoted printable and/or Base64

• Quoted printable works well for mostly text

• Base-64 for non-text data
• Can you link to the data with a URL instead?
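
A hedged sketch of the Base64 half of this item, using java.util.Base64 from Java 8 and later. The payload element, its encoding attribute, and the class name Base64Payload are assumptions made for the example, not names from the slides.

```java
import java.io.StringReader;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class Base64Payload {
    /** Wraps raw bytes in a hypothetical payload element as Base64 text. */
    static String wrap(byte[] data) {
        // The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) contains no XML reserved
        // characters, so the encoded text needs no further escaping.
        return "<payload encoding=\"base64\">"
            + Base64.getEncoder().encodeToString(data)
            + "</payload>";
    }

    /** Parses the element with a real parser and decodes its text back to bytes. */
    static byte[] unwrap(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            String text = doc.getDocumentElement().getTextContent().trim();
            return Base64.getDecoder().decode(text);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] binary = {0x00, (byte) 0xFF, 0x3C, 0x26}; // includes bytes for '<' and '&'
        String xml = wrap(binary);
        System.out.println(xml);
        System.out.println(unwrap(xml).length);
    }
}
```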

Items 20-22: Use namespaces for modularity and extensibility
• Not hard; simple cases can use one default namespace
• http URIs are normally preferred
• DTD validation is tricky
• Code to namespace URIs, not prefixes
• Avoid namespace prefixes in element content and attribute values
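
One way to "code to namespace URIs, not prefixes" in Java, sketched with the JDK's DOM parser in namespace-aware mode. The class NamespaceLookup and the sample documents are illustrative; the XHTML namespace URI is the real one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookup {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Counts XHTML p elements by namespace URI and local name, whatever prefix is used. */
    static int countParagraphs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(XHTML, "p").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String defaultNs = "<html xmlns='" + XHTML + "'><body><p/></body></html>";
        String prefixed = "<h:html xmlns:h='" + XHTML + "'><h:body><h:p/><h:p/></h:body></h:html>";
        String noNamespace = "<html><body><p/></body></html>";
        System.out.println(countParagraphs(defaultNs));
        System.out.println(countParagraphs(prefixed));
        System.out.println(countParagraphs(noNamespace));
    }
}
```

A lookup keyed on the literal string "h:p" would find nothing in the first document, even though both documents use the same vocabulary.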

Item 23: Reuse XHTML for generic narrative content

Item 24: Choose the right schema language for the job
• DTDs
• The W3C XML Schema Language
• RELAX NG
• Schematron

Item 25: Pretend there's no such thing as the PSVI
• Post Schema Validation Infoset
• Adds types like int and gYear to elements
• Often not specific enough
• Element/attribute names are types

Item 28: Use only what you need
• You need
    • Well-formed XML 1.0
    • A parser
• You probably need:
    • Namespaces
• You may not need:
    • DTDs
    • Schemas
    • XInclude
    • WS-Kitchen-Sink
    • etc.

Item 29: Always use a parser
• Can’t use regular expressions:
    • Detecting encoding
    • Comments and processing instructions that contain tags
    • CDATA sections
    • Unexpected placement of spaces and line breaks within tags
    • Default attribute values
    • Character and entity references
    • Malformed documents
    • Internal DTD Subset
• Why not?
    • Unfamiliarity with parsers
    • Too slow
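
A small sketch of two of the failure modes listed above: a tag inside a comment, and a line break inside a start-tag. The prices vocabulary, the regular expression, and the class ParserVersusRegex are invented for the example; the parser is the JDK's DOM implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ParserVersusRegex {
    // One commented-out price, and one real price whose start-tag spans two lines.
    static final String XML =
        "<prices>"
      + "<!-- <price>0.00</price> -->"
      + "<price\n    currency='USD'>19.99</price>"
      + "</prices>";

    /** Naive extraction: matches the commented-out price and misses the real one. */
    static List<String> pricesByRegex(String xml) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("<price>([^<]*)</price>").matcher(xml);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /** Parser-based extraction: skips the comment and handles the attribute and line break. */
    static List<String> pricesByParser(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList prices = doc.getElementsByTagName("price");
            List<String> found = new ArrayList<>();
            for (int i = 0; i < prices.getLength(); i++) {
                found.add(prices.item(i).getTextContent());
            }
            return found;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pricesByRegex(XML));
        System.out.println(pricesByParser(XML));
    }
}
```

A more elaborate expression could patch these two cases, but each remaining bullet in the list needs its own patch, which is the argument for using a parser from the start.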

Item 30: Layer Functionality

[Diagram: a layered processing pipeline. Labels recoverable from the slide: book.xml pulls in preface.xml, xmlsyntax.xml, xmlprotocols.xml, and 16 more chapters through XInclude to give finished_book.xml. A "Valid?" check follows; No leads to "Print Error Message". On Yes, separate XSLT transforms produce XHTML (book.xhtml), HTML (book.html), and XSL-FO (book.fo, then fop to book.pdf); an XSLT extract step produces chapter1.xml, chapter2.xml, ... chapters 1 to 17, each transformed to XSL-FO and run through fop to PDF; and a SAX program extracts the example source code files.]

Items 31-33: Program to standard APIs
• Easier to deploy in Java 1.4/1.5
• Different implementations have different performance characteristics
• SAX is fast
• DOM interoperates
• Semi-standard:
    • JDOM
    • XOM
• Bleeding edge
    • StAX
    • JAXB

Item 34: Read the complete DTD
• Be conservative in what you generate; liberal in what you accept
• Important content from DTD:
    • Default attribute values
    • Namespace declarations
    • Entity references
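
A minimal sketch of why the DTD matters, assuming the JDK's default DOM parser, which reads the internal DTD subset even when it is not validating. The doc element, its status attribute, and the class DefaultAttributeDemo are hypothetical. The example covers only the internal subset; content declared in an external subset depends on the parser actually fetching it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultAttributeDemo {
    // The instance never writes status="draft"; the value exists only in the DTD.
    static final String WITH_DTD =
        "<!DOCTYPE doc [\n"
      + "  <!ATTLIST doc status CDATA \"draft\">\n"
      + "]>\n"
      + "<doc/>";

    /** Returns the status attribute of the root element, or "" if there is none. */
    static String status(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("status");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(status(WITH_DTD)); // value supplied by the ATTLIST default
        System.out.println(status("<doc/>").isEmpty()); // same element without the DTD
    }
}
```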

Item 35: Navigate with XPath

• More robust against unexpected structure

• Allow optimization by engine
• Easier to code; enhanced programmer productivity
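
A short sketch using javax.xml.xpath from the JDK and the Transaction markup from Item 11. The Statement and Month wrapper elements and the class XPathLookup are assumptions added for the example. The same expression keeps working when an extra level of structure is introduced, which is the robustness the first bullet refers to.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathLookup {
    static final String FLAT =
        "<Statement>"
      + "<Transaction type='withdrawal'><Date>2003-12-15</Date><Amount>200.00</Amount></Transaction>"
      + "<Transaction type='deposit'><Date>2003-12-16</Date><Amount>75.00</Amount></Transaction>"
      + "<Transaction type='withdrawal'><Date>2003-12-17</Date><Amount>50.25</Amount></Transaction>"
      + "</Statement>";

    // Same data, but with a hypothetical Month element the original code never expected.
    static final String NESTED =
        "<Statement><Month name='December'>"
      + "<Transaction type='withdrawal'><Date>2003-12-15</Date><Amount>200.00</Amount></Transaction>"
      + "<Transaction type='deposit'><Date>2003-12-16</Date><Amount>75.00</Amount></Transaction>"
      + "<Transaction type='withdrawal'><Date>2003-12-17</Date><Amount>50.25</Amount></Transaction>"
      + "</Month></Statement>";

    /** Sums the Amount of every withdrawal, wherever the transactions sit in the tree. */
    static double totalWithdrawals(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Object result = xpath.evaluate(
                "sum(//Transaction[@type='withdrawal']/Amount)",
                new InputSource(new StringReader(xml)),
                XPathConstants.NUMBER);
            return (Double) result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalWithdrawals(FLAT));
        System.out.println(totalWithdrawals(NESTED));
    }
}
```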

Item 36: Serialize XML with XML

Item 37: Validate inside your program with schemas
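
The slide gives only the title, so the following is one possible reading rather than the author's code: a sketch that validates input inside the program with the JDK's javax.xml.validation API and a W3C XML Schema. The one-element schema and the class InProgramValidation are invented for the example; any schema language with a Java validator could stand in.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class InProgramValidation {
    // Hypothetical schema: the root must be an Amount element holding a decimal.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Amount' type='xs:decimal'/>"
      + "</xs:schema>";

    /** Compiles the schema; a broken schema is a programming error, not bad input. */
    static Schema compileSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the document is well-formed and valid against the schema. */
    static boolean isValid(String xml) {
        try {
            compileSchema().newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handling throws on the first error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Amount>200.00</Amount>"));
        System.out.println(isValid("<Amount>lots</Amount>"));
    }
}
```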

Part IV: Implementation

Item 38: Write documents in Unicode

• Prefer UTF-8
    • Smaller in English
    • ASCII compatible
• Normalization
    • É, ü, ì and so forth
    • NFC
    • ICU
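
A brief sketch of NFC normalization. The slide names ICU; this example swaps in java.text.Normalizer from the JDK so that it runs without extra libraries. The class name NfcDemo is illustrative.

```java
import java.text.Normalizer;

class NfcDemo {
    /** Converts text to Unicode Normalization Form C (composed characters). */
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "E\u0301"; // E followed by a combining acute accent
        String composed = toNfc(decomposed);
        // Two code units before normalization, one after; both display as the same letter.
        System.out.println(decomposed.length() + " -> " + composed.length());
        System.out.println(composed.equals("\u00C9"));
    }
}
```

Without normalization, two element names or text values that look identical on screen can fail a string comparison.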

Item 40: Avoid Vendor Lock-in; Beware
• Opaque, binary data used in place of marked up text.
• Over-abbreviated, inobvious names like F17354 and grgyt
• APIs that hide the XML
• Products that focus on the "Infoset"
• Alternate serializations of XML
• Patented formats

Item 41: Hang on to your relational database

Item 42: Document Namespaces with RDDL

<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
  "http://www.rddl.org/rddl-xhtml.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:rddl="http://www.rddl.org/">
<head>
  <title>MegaBank Statement Markup Language (MBSML)</title>
</head>
<body>
<p>This is the XML namespace for the
<a href="http://developer.megabank.com/xml/">MegaBank Statement Markup Language</a>.</p>
<rddl:resource xlink:type="simple"
    xlink:href="http://developer.megabank.com/xml/spec.html"
    xlink:role="http://www.w3.org/TR/html4/"
    xlink:arcrole="http://www.rddl.org/purposes#normative-reference">
  <p>
    The <a href="http://developer.megabank.com/xml/spec.html">MegaBank Statement Markup Language Specification 1.0</a>
  </p>
</rddl:resource>
</body>
</html>

Item 43: Preprocess XSLT on the server side

Item 44: Serve XML+CSS to the client
• Supported by
    • Safari
    • IE 5.0 and later
    • Mozilla
    • Netscape 6 and later
    • Konqueror
    • Opera
    • Firefox
    • Omniweb

Item 45: Pick the correct MIME type
• application/xml
    • Not text/xml!
    • Don't use charset
• application/mathml+xml
• image/svg+xml
• application/xslt+xml

Item 46: TagSoup Your HTML

Item 47: Catalog common resources

<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="file:///opt/xml/docbook/docbookx.dtd"/>
</catalog>

Item 50: Compress if space is a problem

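import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.apache.xml.serialize.OutputFormat;   // Xerces serializer classes
import org.apache.xml.serialize.XMLSerializer;

// output: doc is an existing org.w3c.dom.Document
OutputStream fout = new FileOutputStream("data.xml.gz");
OutputStream out = new GZIPOutputStream(fout);
OutputFormat format = new OutputFormat(doc);
XMLSerializer output = new XMLSerializer(out, format);
output.serialize(doc);
out.close(); // finishes the gzip stream; without it the file is truncated

// input (a separate snippet; it declares its own doc)
InputStream fin = new FileInputStream("data.xml.gz");
InputStream in = new GZIPInputStream(fin);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document doc = parser.parse(in);
// work with the document...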

To Learn More

• This Presentation: http://cafeconleche.org/slides/albany/effectivexml

• Effective XML: 50 Specific Ways to Improve Your XML Documents
    • Elliotte Rusty Harold
    • Addison-Wesley, 2003
    • ISBN 0-321-15040-6
    • $44.99
    • http://cafeconleche.org/books/effectivexml

