The Office of the Historian’s Move to an Open Source XML Platform for Digital Publishing
Joseph Wicentowski
The Office of the Historian is…
An Office in the U.S. Department of State’s Bureau of Public Affairs
Dates back to the administration of Abraham Lincoln
Staffed by 40+ professional historians, based in Washington, D.C.
Foreign Relations of the United States (FRUS)
Before: Our Online Publications, 1996-2009
Initial Core Goals
1. Break through the barriers that our existing content management system placed on getting our publications online
2. Improve the user experience, including a better search engine
Revised Goals (To really do it right, let’s…)
1. Make the right decisions now to minimize cost of change in the future
2. Avoid proprietary technologies and device-specific formats. Go for open and archivally sound.
3. Remain tolerant of idiosyncrasies, while embracing standards
4. Build on the best editorial traditions to deliver better reading and research
Enhance glossaries
Document Body: Shows relevant glossary entries
Mouse over a name to expand glossary entry
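A glossary panel like this can be driven directly from the XML. The Python sketch below assumes a simplified TEI-like encoding in which each name in the document body points to a glossary entry through a `corresp` attribute and the glossary is a `<list>` of `<item xml:id="...">` entries. The sample markup, the IDs, and the helper `relevant_glossary` are hypothetical illustrations, not the office's actual encoding or code.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the predefined xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified TEI-like sample (TEI namespace omitted for brevity).
SAMPLE = """<TEI>
  <text>
    <body>
      <div type="document">
        <p><persName corresp="#p_DOE">Assistant Secretary Doe</persName> met with
          <persName corresp="#p_ROE">Ambassador Roe</persName>;
          <persName corresp="#p_DOE">Doe</persName> later reported on the talks.</p>
      </div>
    </body>
    <back>
      <list type="persons">
        <item xml:id="p_DOE">Doe, Jane, Assistant Secretary of State (hypothetical)</item>
        <item xml:id="p_ROE">Roe, Richard, Ambassador (hypothetical)</item>
        <item xml:id="p_POE">Poe, Pat, Counselor (hypothetical)</item>
      </list>
    </back>
  </text>
</TEI>"""


def relevant_glossary(doc_xml):
    """Return (id, entry) pairs for the glossary entries that the body
    actually mentions, in order of first mention."""
    root = ET.fromstring(doc_xml)
    # Index every glossary item by its xml:id, collapsing whitespace.
    glossary = {
        item.get(XML_ID): " ".join("".join(item.itertext()).split())
        for item in root.iter("item")
        if item.get(XML_ID)
    }
    found = []
    # Walk only the body, collecting each referenced entry once.
    for name in root.find("text/body").iter("persName"):
        ref = name.get("corresp", "").lstrip("#")
        if ref in glossary and ref not in found:
            found.append(ref)
    return [(ref, glossary[ref]) for ref in found]


for ref, entry in relevant_glossary(SAMPLE):
    print(ref, "->", entry)
```

Unmentioned entries (here `p_POE`) are left out, so the panel shows only the people relevant to the document on screen; the mouse-over itself would be a presentation-layer concern on top of this lookup.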
Improve the online footnote
Footnotes in print
Footnotes online: hover over a footnote reference to view the note inline
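One simple way to approximate hover-to-view footnotes is sketched below in Python. It assumes TEI-style inline `<note n="...">` elements inside a paragraph and uses the HTML `title` attribute as a stand-in for a richer script-driven popup; the function `render_paragraph` and the sample text are hypothetical, not the office's implementation.

```python
import html
import xml.etree.ElementTree as ET


def render_paragraph(p_xml):
    """Render a TEI-like <p> containing inline <note> elements as HTML.

    Each note becomes a superscript link whose title attribute carries the
    note text, so browsers show it as a tooltip on hover; the notes are also
    listed after the paragraph so they remain available without hovering."""
    p = ET.fromstring(p_xml)
    parts = [html.escape(p.text or "")]
    notes = []
    for child in p:
        if child.tag == "note":
            n = child.get("n", str(len(notes) + 1))
            text = " ".join("".join(child.itertext()).split())
            notes.append((n, text))
            parts.append(
                f'<sup><a href="#fn{n}" id="fnref{n}" '
                f'title="{html.escape(text)}">{n}</a></sup>'
            )
        else:
            # Other inline elements: keep their text, drop their markup.
            parts.append(html.escape("".join(child.itertext())))
        # Text that follows the child element inside the paragraph.
        parts.append(html.escape(child.tail or ""))
    out = "<p>" + "".join(parts) + "</p>"
    if notes:
        items = "".join(
            f'<li id="fn{n}">{html.escape(t)} <a href="#fnref{n}">back</a></li>'
            for n, t in notes
        )
        out += '<ol class="notes">' + items + "</ol>"
    return out


SAMPLE_P = ('<p>The telegram was sent on May 1.'
            '<note n="1">Source: hypothetical file reference.</note>'
            ' It arrived the next day.</p>')
print(render_paragraph(SAMPLE_P))
```

Keeping the notes in the XML at the point of reference, rather than at the foot of a printed page, is what makes both the inline and the listed presentations possible from the same source.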
Building on traditions…
5. Enhance body content with reference content, such as glossaries
6. Improve the online footnote
7. Respect the integrity of the print publication even while delivering a great web experience
8. Commit to consistent URLs
9. Provide complete citations on every view
10. Data visualization? – dynamic timelines and maps?
Data visualization – dynamic timelines, maps
Ambitious Goals! What format would allow us to achieve them?
1. Break through the barriers that our existing content management system placed on getting our publications online
2. Improve the user experience, including a better search engine
3. Make the right decisions now to minimize cost of change in the future
4. Avoid proprietary technologies and device-specific formats. Go for open and archivally sound.
5. Remain tolerant of idiosyncrasies, while embracing standards
6. Build on the best editorial traditions to deliver better reading and research
7. Enhance body content with reference content, such as glossaries
8. Improve the online footnote
9. Respect the integrity of the print publication even while delivering a great web experience
10. Commit to consistent URLs
11. Provide complete citations on every view
12. Data visualization? – dynamic timelines and maps?
XML! (umm… now what?)
We decided we needed to go XML… What next?
1. Decide: which kind of XML?
2. How to get content into XML?
3. Software/platforms for XML solution?
… we researched flavors of XML… we reviewed our own content… we prototyped… we developed encoding guidelines… we found conversion/encoding vendors… we researched XML platforms… we programmed, tested
history.state.gov (b. 2009)
We knew we needed XML… Our own answers
1. Which kind of XML?… TEI (Text Encoding Initiative)
2. How to get content into XML?
… Outsource to an encoding vendor… Our very paper-based source material led us to minimize the impact on our existing editorial workflow for new publications
… Work with compositor to deliver XML
… Gradually move into comprehensive XML-based editorial workflow
3. Software/platforms for XML solution?
An Open Platform for XML Publishing
1. eXist… free, open source native XML database… fast full-text search engine + web server… active community… runs on Macs, PCs, Linux
2. oXygen… commercial XML editor… Swiss Army knife of XML development
3. XQuery… programming language for querying and manipulating XML… some prefer XSLT for transforming XML, but XQuery does it all
Agility with XML: Adapting to Unforeseen Requirements
1. E-Readers and new formats like ePub… evolving formats
2. Open Government Directive and data.gov
… native XML databases as a strength in an era of government transparency
3. Need to let our staff edit and annotate their TEI content in the browser
… Making use of XForms and CKEditor
E-Readers and the ePub format
From a single digital master file (an XML file) we can publish in many formats…
• Online search & browsing
• Digital books (ePub)
• Print, print on demand
• Tomorrow’s format?
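The single-master idea can be sketched in a few lines of Python: each output format is a separate function over the same XML source. The sample `<div>`, the functions `to_html` and `to_plain_text`, and the `RENDERERS` table are hypothetical illustrations; a real ePub or print pipeline would add further renderers of the same shape (not shown here).

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single master file: one TEI-like <div> (namespace omitted).
MASTER = """<div>
  <head>Memorandum of Conversation</head>
  <p>The participants discussed the draft agreement.</p>
  <p>No decision was reached.</p>
</div>"""


def _text(el):
    """Full text content of an element with whitespace runs collapsed."""
    return " ".join("".join(el.itertext()).split())


def to_html(master):
    """Render the master as a heading plus HTML paragraphs, one per line."""
    div = ET.fromstring(master)
    lines = [f"<h2>{html.escape(_text(div.find('head')))}</h2>"]
    lines += [f"<p>{html.escape(_text(p))}</p>" for p in div.findall("p")]
    return "\n".join(lines)


def to_plain_text(master):
    """Render the master as plain text with an underlined heading."""
    div = ET.fromstring(master)
    head = _text(div.find("head"))
    paras = [_text(p) for p in div.findall("p")]
    # Underline the heading, then separate blocks with blank lines.
    return "\n\n".join([head + "\n" + "=" * len(head)] + paras)


# Each output format is just another function over the same master;
# a new format means adding an entry here, not re-keying the content.
RENDERERS = {"html": to_html, "txt": to_plain_text}

for fmt, render in RENDERERS.items():
    print(f"--- {fmt} ---")
    print(render(MASTER))
```

Because the content lives only in the master, a correction made there reaches every format the next time the renderers run.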
data.gov and the Open Government Initiative
Wired Magazine, May 19, 2010, http://www.wired.com/epicenter/2010/05/sneak-peek-the-obama-administrations-redesigned-datagov/
TEI Annotator: Editing XML in the Browser
Win-Wins of an Open Platform
1. Open standards
2. Open source
3. Active, responsive, generous community
4. Contribute back to the community: patches, enhancements, and articles