Cornell’s Project Harvest
CNI Fall 2001 Task Force Meeting
Anne R. Kenney and Nancy Y. McGovern
Project Harvest Overview
• Subject-based approach: agriculture
– National Preservation Plan
– USAIN
– Mann Library
• Core Historical Literature
• TEEAL
• USDA
• 75% of core journals now available in electronic form
Focus of Planning Year
• Investigating the conditions under which publishers are willing to participate in the development of a Subject-Based Digital Archive (SBDA)
• Two-pronged iterative cycle:
– Explore (potential of SBDA, business model, broader preservation matrix)
– Build (using agriculture as pragmatic application)
Intersection of Digital Archives
[Diagram labels: PBDA, SBDA, Format-based]
USAIN Survey
• Access
– 45% indicated need for both print and electronic
– 55% indicated e-journal already substituted for print
– 84% would cancel print if reliable archives built
– JSTOR study: 78% of faculty think hard copy should be retained even if reliable digital archives
USAIN Survey
• Observed loss in e-journals:
– 45% don’t know
– 22% yes, noted difference
– 22% no, no difference
• What to preserve (priority order):
1. Preserve content plus journal “look and feel” plus publisher functionality
2. Preserve content plus journal “look and feel”
• How to preserve:
– Over 90% rejected single solution; prefer multiple custodians or 3rd party
Sept. 6 Publishers’ Meeting
• American Dairy Science
• Academic/Elsevier
• American Phytopathological Society
• BioOne
• CABI
• NRC-Canada
• Wiley
• NLA and USAIN representation
What’s the Publisher Incentive to Archive?
• Protect assets, continuing value of material as it ages
• Low additional overhead
• Satisfy customers
• Risk tolerance; sustainable loss
• As calling card for or by-product of services
Meeting Results
• All publishers intend to establish archives
• Shift from content currency to database development
• Publishers see revenue stream in retrospective holdings
• Publishers less concerned than librarians about “artifactual” archiving
Meeting Results
• Differing perceptions around who should do digital preservation
• Librarians want trusted third-party archiving
• Publishers insufficiently aware that others don’t trust them to safeguard materials and insufficiently aware of what it takes to archive
• Distrust of government (competition)
Meeting Results
• Publishers not enthusiastic about “lit” archives; some would consider it if revenue returned to publisher
• Convergence in formats
• Reluctance to force authors to conform
• Unwilling to share proprietary publisher DTD
• Willing to consider archival DTD as another output
Trigger Events
• None acknowledged by publishers
• Technology watersheds:
– Retrofitting legacy digital files
– When paper no longer represents access and preservation alternative for electronic
SBDA triggers
• Different subject domains have different half-lives
• When common interests outweigh individual interests
• Stakeholder pressure: when detrimental not to participate
Access and Funding
• Publishers and librarians went into the meeting presuming different things
• Publishers differed on access issues
• Librarians asserted that publishers would have to finance dark archives
SBDA Distinguishes Between Metadata and Data
• Dark metadata/dark data
• Light metadata/light data
• Light metadata/dark data
• Light metadata/no data
Multiple options for different publishers and audiences
SBDA Hybrid Model
• Ultimate goal is lightness
• Comprehensiveness and buy-in trump lightness
• Commonality over distinctiveness emphasized
• Hybrid model enables combinations of light to dark metadata and data
• Access to metadata/data will change over time and in response to particular circumstances
• Offers win/win possibilities
Possible Sustainability Models
• Preservation surcharge on subscription
• Preservation endowment
• Bartered access privileges for preservation
• Business insurance policy model
• Government support
• Preservation pledge drives
Possible Sustainability Models
• Develop new markets
• Harness the free riders
• Charge for services, not content and archiving
• Build value-adds on the SBDA
Next Steps
• Developing subject domain profile
• Surveying agricultural publishers to determine level of cooperation in SBDA
• Evaluating existing architectural models
• Writing CLIR report on the significance of the SBDA
Subject-based Profile
• Who are the stakeholders? How many publishers? Research demographics of new user groups?
• How big is the field? How structured and defined is it? What’s important? Why? Change driven by discipline and by technology
• How standardized is the literature? (xml, etc)
• How complex/fixed is it? (database, virtual)
• Who owns rights for re-use? Assessment of economic, first-use, citations, second use, technology
How Willing to Cooperate?
• Pre- and post-competitive collaboration
• Standardized, normalized, and limited number of formats
• Preservation from conception (requirements of authors; shut-off point for non-cooperation)
• Archival DTD
• Preservation metadata
How Willing to Cooperate?
• Self certification/ external certification
• Light (and common) metadata, move toward light data (monitoring with scheduling)
• Economy of scale
• Willing to financially support the effort