Portico: A Case Study in the Migration of Proprietary Formats to the JATS Archiving Format

Sheila Morrissey, John Meyer, Sushil Bhattarai, Sachin Kurdikar, Jie Ling, Matthew Stoeffler, Umadevi Thanneeru

Portico & JSTOR: Committed to Preserving the Scholarly Record

JATS-CON 2010

ITHAKA

Ithaka helps the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways.

Digitization for Preservation & Access

Digital Preservation

“Dark Archive” “Light Archive”

Portico Archive

• Portico’s objective is to help libraries make a secure and reliable transition from print to a reliance on e-content.

• Maintains archiving agreements with publishers to collect and preserve content.

• Receives content directly from publishers.

• Preserves:
– Current journals (born digital)
– Back file journals (reborn digital)
– E-books
– Digitized historical collections

An “Insurance Policy” for e-Content

• Provide libraries with access to archived content when it becomes lost, orphaned, or abandoned (regardless of a library’s past or current subscription):

1. Publisher ceases operation

2. Publisher discontinues title

3. Publisher drops back file

• Provide libraries with post-cancellation access, if the publisher specifically names Portico

• About 90% of titles in the Archive are covered by Portico post-cancellation access rights.

• Libraries are asked to pay an annual Archive support payment to defray the cost of preservation (the “insurance premium”)

Portico Archive as of July 19, 2010

Category                               Files         %

Images                            84,215,731    47.93%
Publisher Supplied Text           47,393,731    26.98%
Portico Created Archival Text     43,689,083    24.87%
Application Specific Files           232,732     0.13%
Multi-file Packages                  140,333     0.08%
Videos                                20,604     0.01%
Audio                                    570    <0.01%
Executable                                 6    <0.01%

Total                            175,692,826      100%

• 114 publisher participants
• 11,788 committed journal titles
• 43,253 committed e-books
• 13 committed digitized collections

• >14 million articles ingested

• 688 library participants (48% outside US)

• 4 trigger events
• 15 post-cancellation access claims

Portico Preservation Infrastructure

• Publisher supplies XML source file (including the text and images) and PDF page rendition.

• Best approach for preserving the intellectual content of the article or book.

• Authenticate: verify that preserved content is what it purports to be.

• Verify format: ensure the file meets the syntactic and semantic rules of the format specification.

• Repair

• Normalize (XML)

• Create preservation metadata

• Assess archival robustness of file format.

• Migrate files to ensure future usability of content.

• Replicate objects and metadata to protect against bit rot and media deterioration

• Render articles to meet viewing requirements of delivery platform.

Key Challenges for an Archival DTD

In December 2001, Inera’s “E-Journal Archive DTD Feasibility Study” highlighted these key challenges for an archival DTD:

• Use of generated and boilerplate text, especially in
– Label text for figure captions
– Citation text
– Author name and affiliation
– Dates

• Expression of links between author and affiliation

• Reference elements

• Expression of non-article and other content

• Abbreviations and definitions

Key Challenges for an Archival DTD

• Keywords

• Sections, including handling of sections without headers

• Placement of floating objects, such as figures, tables, graphs

• Tables, including cell formatting issues (cells with figures, content alignment, etc.)

• Math

• Intra-, inter- and extra-article linking

• Publisher-specific elements

When reviewing the minutes of the Working Group and the evolution of the DTD, we can confirm that these areas have been the main focus of discussion.

Some Design Constraints

• IMPLIED, not REQUIRED attributes

• CDATA instead of controlled list

• Optional Elements, or relaxed order of elements

• Surprising location of Elements

• No Domain Specific Elements

Publisher/Domain Specific Elements

• Custom-Meta
– Business Data
– Allowed in journal-meta, article-meta, front-stub
– Name/Value pair (may contain 38 different Elements)

• Named-Content
– Semantic Significance
– Allowed in 112 Elements
– May contain 59 different Elements

Challenges posed by source DTDs

Extended Semantics for Named-Content

• Price in Citation
– Becomes <named-content content-type="price">

<citation reference="1" id="R1" type="serial">
  <author order="1">
    <name><first>S. P.</first><last>Morgan</last></name>
  </author>
  <journal>
    <sertitle>J. Appl. Phys.</sertitle>
    <URI type="ISSN">0030-3941</URI>
    <price>$01.00</price>
    <volume>29</volume>
    <pages><first>1358</first><last>1368</last></pages>
    <pubdate>1958</pubdate>
  </journal>
  <title>General solution of the Luneburg lens problem</title>
</citation>
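The mapping for the price element can be sketched in a few lines of Python. This is only an illustration of the rename described on the slide, not Portico's actual transformation code: the function name `to_named_content` is hypothetical, the sample is a trimmed copy of the citation above, and the other source elements (`sertitle`, `volume`, and so on) are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the source citation shown on the slide.
SOURCE = """<citation reference="1" id="R1" type="serial">
  <journal>
    <sertitle>J. Appl. Phys.</sertitle>
    <price>$01.00</price>
    <volume>29</volume>
  </journal>
</citation>"""


def to_named_content(root, source_tag, content_type):
    """Rename every <source_tag> element to <named-content content-type=...>."""
    # Snapshot the matches first so the tree is not renamed while being walked.
    for elem in list(root.iter(source_tag)):
        elem.tag = "named-content"
        elem.set("content-type", content_type)
    return root


citation = to_named_content(ET.fromstring(SOURCE), "price", "price")
price = citation.find("journal/named-content")
print(ET.tostring(price, encoding="unicode").strip())
```

The text of the element is preserved as-is; only the element name changes and the original name moves into the content-type attribute.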

Challenges posed by source DTDs

More Extended Semantics for Named-Content

• Affiliation in Footnotes/P
– Becomes <named-content content-type="aff" id="AFF2">

<FOOTNOTE ID="N101" TYPE="AFF">
  <P ALPHABET="LATIN" TYPE="INDENT">
    <AFF ID="AFF2"><IT>Corresponding author address:</IT> Nicholas M. J. Hall, Dept. of Atmospheric and Oceanic Sciences, McGill University, 805 Sherbrooke St. W., Montreal PQ H3A 2K6, Canada.</AFF>
  </P>
</FOOTNOTE>

Challenges posed by source DTDs

More Extended Semantics for Named-Content

• Funding in Acknowledgments/P
– Becomes <named-content content-type="funding">

<ack>
  <sectitle>ACKNOWLEDGMENTS</sectitle>
  <p>Q.W.&#x2019;s research is partially supported by AFOSR Grant No.
    <funding source="USAFOSR"><contract>F49550-05-1-0025</contract></funding> and NSF Grants No.
    <funding source="NSF"><contract>DMS-0204243</contract></funding>, No.
    <funding source="NSF"><contract>DMS-0605029</contract></funding>, and No.
    <funding source="NSF"><contract>DMS-0626180</contract></funding>.
    P.Z. is partially supported by the special funds for major State Research Projects
    <funding source="UNSPECIFIED"><contract>2005CB321704</contract></funding>
    and National Science Foundation of China for Distinguished Young Scholars
    <funding source="NSFC"><contract>10225103</contract></funding>.
    H.Z.&#x2019;s work is supported in part by the Naval Postgraduate School Research Initiation Program.</p>
</ack>

Challenges posed by source DTDs

More Extended Semantics for Named-Content

• Organization Division in Affiliation
– Becomes <named-content content-type="division">

<Affiliation ID="Aff12">
  <OrgDivision>Optisches Institut</OrgDivision>
  <OrgName>Technische Universität Berlin</OrgName>
  <OrgAddress>
    <City>Berlin</City>
    <Country>Germany</Country>
  </OrgAddress>
</Affiliation>

Challenges posed by source DTDs

More Extended Semantics for Named-Content

• Generic Element (addinfo)
– Becomes <named-content content-type="addinfo">

<ref-conf id="CIT0045">
  <ref-conf-text><author-ref-text><surname>Bishop</surname> <givenname>CJ</givenname></author-ref-text>,
    <author-ref-text><surname>Aanenses</surname> <givenname>DM</givenname></author-ref-text>,
    <author-ref-text><surname>Jordan</surname> <givenname>GE</givenname></author-ref-text>,
    <author-ref-text><surname>Kilian</surname> <givenname>M</givenname></author-ref-text>,
    <author-ref-text><surname>Hanage</surname> <givenname>WP</givenname></author-ref-text>,
    <author-ref-text><surname>Spratt</surname> <givenname>BG.</givenname></author-ref-text>
    <presentationtitle>Electronic taxonomy: assigning strains to bacterial species via the internet</presentationtitle>.
    <collectworktitle>BMC Biology</collectworktitle>
    <publicationfield-text><year>2009</year>; <year>7</year></publicationfield-text>:
    <firstpage>3</firstpage>.
    <addinfo>doi:10.1186/1741-7007-7-3</addinfo>.</ref-conf-text>
</ref-conf>
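Taken together, the "extended semantics" cases on these slides amount to a lookup from source element name to content-type value. A minimal table-driven sketch follows. The mapping values come from the slides; everything else is an assumption made for illustration: the function name `migrate`, the choice to carry over only the ID attribute, and the sample fragment, which is contrived and mixes elements from different publishers' DTDs. Child elements such as `contract` are not handled here.

```python
import xml.etree.ElementTree as ET

# Source element name -> content-type value, as listed on the slides.
CONTENT_TYPES = {
    "price": "price",
    "AFF": "aff",
    "funding": "funding",
    "OrgDivision": "division",
    "addinfo": "addinfo",
}


def migrate(root):
    """Rewrite mapped elements in place; return source attributes with no target."""
    leftovers = []
    for elem in list(root.iter()):  # snapshot, since tags change during the loop
        if elem.tag not in CONTENT_TYPES:
            continue
        old_tag, old_attrs = elem.tag, dict(elem.attrib)
        elem.attrib.clear()
        elem.tag = "named-content"
        elem.set("content-type", CONTENT_TYPES[old_tag])
        for name, value in old_attrs.items():
            if name.lower() == "id":
                elem.set("id", value)  # keep identifiers so links still resolve
            else:
                # Assumption: anything else needs a human decision, so report it.
                leftovers.append((old_tag, name, value))
    return leftovers


# Contrived fragment combining elements from several of the slide examples.
doc = ET.fromstring(
    '<P><AFF ID="AFF2">McGill University</AFF> '
    '<funding source="NSF"><contract>DMS-0204243</contract></funding> '
    "<addinfo>doi:10.1186/1741-7007-7-3</addinfo></P>"
)
unmapped = migrate(doc)
print(ET.tostring(doc, encoding="unicode"))
print(unmapped)
```

Returning the unmapped attributes rather than silently dropping them reflects the archival concern behind the slides: information in the source that has no obvious home in the target should at least be visible.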

Challenges posed by source DTDs

Target DTD Structural Constraints that force the use of Named-Content

• Table in Table
– TD contains named-content, which contains a table
  <td><named-content content-type="table"><table-wrap>

• Figure in Table
– TD contains named-content, which contains a fig
  <td><named-content content-type="figure"><fig>

• Display-Formula in Title
– Title contains named-content, which contains a display-formula
  <title><named-content content-type="display-formula"><display-formula>
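These structural cases differ from the renames: the child element keeps its name and gains a named-content wrapper. The sketch below shows that wrapping step under stated assumptions. The parent/child pairs and content-type values are taken from the slide; the function name `wrap_disallowed_children` and the sample cell are hypothetical.

```python
import xml.etree.ElementTree as ET

# Parent element -> {child element: content-type for the wrapper}, per the slide.
WRAPPED = {
    "td": {"table-wrap": "table", "fig": "figure"},
    "title": {"display-formula": "display-formula"},
}


def wrap_disallowed_children(root):
    """Wrap each listed child in <named-content> inside its parent."""
    for parent in list(root.iter()):  # snapshot: the tree is modified below
        rules = WRAPPED.get(parent.tag)
        if not rules:
            continue
        for index, child in enumerate(list(parent)):
            if child.tag in rules:
                wrapper = ET.Element("named-content", {"content-type": rules[child.tag]})
                # Text that followed the child now follows the wrapper instead.
                wrapper.tail, child.tail = child.tail, None
                parent.remove(child)
                parent.insert(index, wrapper)
                wrapper.append(child)
    return root


cell = ET.fromstring("<td><table-wrap><table/></table-wrap> note</td>")
wrap_disallowed_children(cell)
print(ET.tostring(cell, encoding="unicode"))
```

Because each child is replaced one-for-one at the same index, sibling order and surrounding text are unchanged.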

Challenges posed by source DTDs

• Question/Answer
– Generic and Structural
– Is saying <list list-content="question"> enough?

<Question-Answer>
  <Q><P><L>1</L>. The major advantage of amniotic membrane transplantation in pterygium surgery is</P></Q>
  <A><P><L>A</L>. reduction in surgical time</P></A>
  <A><P><L>B</L>. preservation of conjunctiva</P></A>
  <A><P><L>C</L>. better cosmetic outcomes compared with conjunctival autografting</P></A>
  <A><P><L>D</L>. lowest recurrence rate among the surgical techniques</P></A>
</Question-Answer>
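To make the question on this slide concrete, here is one way the list-based mapping could look. It is a sketch only: the slide does not say how Q and A should map to list items, so the flat list-item/label/p layout, the function name `question_to_list`, and the stripping of the ". " after each label are all assumptions. The sample is trimmed to the question and two answers and assumes plain text after each label.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide example (question plus the first two answers).
SOURCE = """<Question-Answer>
<Q><P><L>1</L>. The major advantage of amniotic membrane transplantation in pterygium surgery is</P></Q>
<A><P><L>A</L>. reduction in surgical time</P></A>
<A><P><L>B</L>. preservation of conjunctiva</P></A>
</Question-Answer>"""


def question_to_list(qa):
    """Flatten a <Question-Answer> into <list list-content="question">."""
    out = ET.Element("list", {"list-content": "question"})
    for entry in qa:  # <Q> and <A> children, in document order
        marker = entry.find("P").find("L")
        item = ET.SubElement(out, "list-item")
        ET.SubElement(item, "label").text = marker.text
        # The text after <L> starts with ". "; drop that punctuation.
        ET.SubElement(item, "p").text = (marker.tail or "").lstrip(". ")
    return out


jats_list = question_to_list(ET.fromstring(SOURCE))
for item in jats_list:
    print(item.find("label").text, "|", item.find("p").text)
```

In this form the question and its answers become indistinguishable list items: the Q/A distinction survives only in the labels, which is the loss the slide's question points at.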

Challenges posed by source DTDs

• Synonymy
– Domain and Semantic
– Is saying <list list-content="synonymy"> enough?
– Or <named-content content-type="synonymy"> because of the semantic meaning?

<SYNONYMY>

<HEAD>ECHINOSTELIALES</HEAD>

<ITEM><P><GENSP>Clastoderma debaryanum</GENSP> A. Blytt</P></ITEM>

<ITEM><P><GENSP>Echinostelium apitectum</GENSP> K.D. Whitney, MC</P></ITEM>

<ITEM><P><GENSP>Echinostelium coelocephalum</GENSP> T.E. Brooks &amp; H.W. Keller, MC</P></ITEM>

<ITEM><P><GENSP>Echinostelium minutum</GENSP> de Bary, MC</P></ITEM>

</SYNONYMY>

Synonyms are different scientific names that pertain to the same taxon

Challenges posed by source DTDs

• Decision Tree (Taxonomic Key)
– Domain, Semantic, Structural, and Presentation

<KEY>
  <COUPLET>
    <DESCR><NO>1.</NO>Hypostomal setae (Hy) shorter than half the width of labrum</DESCR>
    <RESP><GENSP>Sycophila mellea</GENSP> (Curtis, 1831), <GENSP>Tetramesa </GENSP>Walker, 1848</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO></NO>--Hypostomal setae longer or about as long as half the width of labrum</DESCR>
    <RESP>2</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO>2.</NO>More than two dorsal setae (D) present on abdominal segments A6-8</DESCR>
    <RESP>3</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO></NO>--At least one of abdominal segments A6-8 with only two dorsal setae</DESCR>
    <RESP>4</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO>3.</NO>Mandibles bidentate</DESCR>
    <RESP><GENSP>E. (Ahtola) atra</GENSP> (Walker, 1832)</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO></NO>--Mandibles unidentate</DESCR>
    <RESP><GENSP>E. nodularis</GENSP> Boheman</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO>4.</NO>Mandibles bidentate</DESCR>
    <RESP><GENSP>Eurytoma appendigaster</GENSP> group</RESP>
  </COUPLET>
  <COUPLET>
    <DESCR><NO></NO>--Mandibles unidentate</DESCR>
    <RESP><GENSP>Eurytoma heriadi</GENSP> Zerova</RESP>
  </COUPLET>
</KEY>

A decision tree is a tree-like model of decisions and their possible outcomes.
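The tree structure that a generic list or named-content wrapper would flatten can be made explicit by reading the key into a small data structure. The sketch below is an illustration, not part of the presentation: the function name `parse_key` is hypothetical, the sample is trimmed to three entries, and it assumes (from the data shown) that an empty NO element marks the second alternative of the preceding step.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the taxonomic key from the slide (first three entries).
SOURCE = """<KEY>
<COUPLET><DESCR><NO>1.</NO>Hypostomal setae (Hy) shorter than half the width of labrum</DESCR>
<RESP><GENSP>Sycophila mellea</GENSP> (Curtis, 1831)</RESP></COUPLET>
<COUPLET><DESCR><NO></NO>--Hypostomal setae longer or about as long as half the width of labrum</DESCR>
<RESP>2</RESP></COUPLET>
<COUPLET><DESCR><NO>2.</NO>More than two dorsal setae (D) present on abdominal segments A6-8</DESCR>
<RESP>3</RESP></COUPLET>
</KEY>"""


def parse_key(key):
    """Return {step: [(description, outcome), ...]} for a taxonomic key."""
    steps, current = {}, None
    for couplet in key.iter("COUPLET"):
        number_elem = couplet.find("DESCR/NO")
        number = (number_elem.text or "").strip().rstrip(".")
        if number:  # an empty <NO> continues the previous step
            current = number
        description = (number_elem.tail or "").lstrip("-").strip()
        # The outcome is either a taxon name or the number of the next step.
        outcome = "".join(couplet.find("RESP").itertext()).strip()
        steps.setdefault(current, []).append((description, outcome))
    return steps


key_steps = parse_key(ET.fromstring(SOURCE))
for step, leads in key_steps.items():
    for description, outcome in leads:
        target = f"go to step {outcome}" if outcome.isdigit() else outcome
        print(step, "|", description, "->", target)
```

Each step holds its alternative descriptions, and each alternative resolves either to a taxon or to another step number. That branching is the domain, semantic, and structural information the slide is concerned about preserving.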

Concluding Question

How to support Publisher/Domain Specific constructs in the Archival DTD?

• Continue use of Named-Content

• New Miscellaneous Element

• Support for adding namespaced elements

• Other

Questions/Answers?

Thank you

John Meyer

Director of Data Technologies

100 Campus Drive, Suite 100

Princeton, NJ 08540

609 986-2220
