DSpace, digital preservation, and business models
ERPANET Seminar: Business Models related to Digital Preservation
September 20-22, 2004
Julie Walker, MIT Libraries
The digital preservation challenge
“Information is being produced in greater quantities and with greater frequency than at any time in history. Electronic media, especially the Internet, make it possible for almost anyone to become a "publisher." How will society preserve this information and make it available to future generations? How will libraries and other repositories classify this information so that their patrons can find it with the same ease that they can locate a book on a shelf?
The ease with which electronic information can be created and "published" makes much of what is available today, gone tomorrow. Thus there is an urgent need to preserve this information before it is forever lost.”
(Source: National Digital Information Infrastructure and Preservation Program. http://www.digitalpreservation.gov/)
Characteristics of the problem
Obsolescence of technology
Accelerating rates of data collection and content creation
Growing complexity of digital information resources
Complex digital objects that require specific software applications for reuse
Resource-intensive curatorial process
Need for funding and business models
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”. NSF and Library of Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Market for digital preservation solutions
Libraries, archives, museums, and other cultural institutions
  - Preserving intellectual and cultural heritage
Government agencies, private corporations, not-for-profit organizations, and private citizens
  - Preserving digital assets
  - Legal and regulatory issues for government agencies
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”. NSF and Library of Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Diverse set of projects tackling various aspects of the problem
DSpace
Storage Resource Broker (SRB)
National Library of Australia PANDORA
UK Digital Curation Centre (DCC)
UK National Archives PRONOM
DLF Global Digital Format Registry
U. of Pennsylvania Typed Object Model (TOM)
FCLA Dark Archive In The Sunshine State (DAITSS)
Royal Dutch Library & Elsevier Science e-archiving agreement
Many more…
DSpace is…
An open source digital asset management system
A technology platform for institutional repositories
A federation of digital repositories across multiple academic research institutions
A production service of the MIT Libraries to its local research community
Institutional Repository
Institution-based
Scholarly material in digital formats
Cumulative and perpetual
Open and interoperable
(Source: Crow, Raymond. “The Case for Institutional Repositories: A SPARC Position Paper.” http://www.arl.org/sparc/IR/ir.html)
Institutional repositories are unlike traditional archives
Acquisition at point of creation
  - Submissions can come directly from the creators
  - Includes non-document material
Shared curatorial control
  - Institutions and creators can establish content guidelines or policies
Shared responsibility for selection and processing
  - Scalable, less resource-intensive approach
Digital Preservation
Repositories don’t “do” preservation
Preservation operations are defined by
  - Digital collections in hand
  - Cost/benefit tradeoffs
  - Local policy
Digital Preservation
MIT philosophy
  - Lots of digital material is already lost
  - Most digital material is at risk
  - Preserving bits is better than nothing
  - Capture as much information as possible
  - Evaluate cost/benefit tradeoffs over time
Digital Preservation Categories
Supported
  - Provides for future content usability
  - Migration for texts, images, audio, etc.
  - Emulation for software, multimedia, etc.
Unsupported
  - Bit preservation at minimum
  - Migration when possible
    - e.g. commercial conversion services
Digital Preservation Policy
MIT policy
  - Supported formats
    - e.g. TIFF, SGML/XML, PDF…
  - Known/unsupported formats
    - e.g. Microsoft Word, PowerPoint (common)…
    - e.g. Lotus 1-2-3, Visicalc, WordStar (less common)…
  - Unknown/unsupported formats
    - Highly complex and rare formats
    - e.g. one-of-a-kind software programs…
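The three-tier policy can be read as a lookup from file format to support tier, which then determines the preservation actions from the categories slide. A minimal sketch in Python, assuming formats are identified by MIME type; the `FormatTier` names, the `FORMAT_REGISTRY` contents, and both functions are hypothetical illustrations, not part of DSpace or the MIT policy text:

```python
from enum import Enum


class FormatTier(Enum):
    """Hypothetical labels for the three tiers of the MIT policy."""
    SUPPORTED = "supported"
    KNOWN_UNSUPPORTED = "known/unsupported"
    UNKNOWN_UNSUPPORTED = "unknown/unsupported"


# Hypothetical registry keyed by MIME type, seeded only with examples from the slide.
# Any format absent from the registry falls into the unknown/unsupported tier.
FORMAT_REGISTRY = {
    "image/tiff": FormatTier.SUPPORTED,
    "application/sgml": FormatTier.SUPPORTED,
    "text/xml": FormatTier.SUPPORTED,
    "application/pdf": FormatTier.SUPPORTED,
    "application/msword": FormatTier.KNOWN_UNSUPPORTED,
    "application/vnd.ms-powerpoint": FormatTier.KNOWN_UNSUPPORTED,
    "application/vnd.lotus-1-2-3": FormatTier.KNOWN_UNSUPPORTED,
}


def classify(mime_type: str) -> FormatTier:
    """Look up a format's tier; MIME types are case-insensitive, so normalize first."""
    return FORMAT_REGISTRY.get(mime_type.strip().lower(), FormatTier.UNKNOWN_UNSUPPORTED)


def planned_actions(tier: FormatTier) -> list[str]:
    """Map a tier to actions from the preservation categories.

    Assumption: bit preservation is the baseline for every tier. Supported
    formats add migration/emulation; both unsupported tiers get migration
    only when possible.
    """
    if tier is FormatTier.SUPPORTED:
        return ["bit preservation", "migration or emulation"]
    return ["bit preservation", "migration when possible"]
```

For example, `classify("application/msword")` yields `FormatTier.KNOWN_UNSUPPORTED`, while an unregistered type such as a one-of-a-kind program's output falls through to `UNKNOWN_UNSUPPORTED`. Defaulting to the lowest tier keeps the policy conservative: nothing is promised for a format until someone adds it to the registry.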
DSpace preservation research and development
DSpace@Cambridge: development work on type registries, automated ingest, preservation action plans, and specific format investigation
SRB: large-scale storage infrastructure
SIMILE: infrastructure to cope with arbitrary metadata formats using RDF
Proposal for archiving scientific datasets
  - Technically and organizationally
  - Working with MIT Computational and Systems Biology Program
DSpace Federation
What?
  - Emerging community of DSpace users/installations
  - Open source software (OSS) community
    - Mostly sponsored programmers from DSpace installation sites
Who?
  - Research-generating organizations (e.g. libraries, government agencies, museums, archives)
    - Worldwide
    - Overlapping/complementary research interests
  - NGOs and industry
DSpace Federation
Why?
  - Drive DSpace development
    - Open source development model
  - Build critical mass of content
    - Support useful interoperation and a research test bed
  - Leverage distributed expertise
    - e.g. in metadata and digital preservation
What is needed…
“Long-term digital archiving requires systems, institutions, and business models that are robust enough to withstand technological failures, shifting computing platforms and media, changes in institutional missions and interruptions in management funding.”
(Source: “It’s about time: Research challenges in digital archiving and long-term preservation”. NSF and Library of Congress sponsored study. http://www.digitalpreservation.gov/repor/NSF_LC_Final_Report.pdf)
Vision for the DSpace Federation
[Figure: diagram linking DSpace installations, the DSpace OSS community (DSpace software, independent developers/hackers, user-sponsored and industry-sponsored development resources), service providers/value-added resellers, and the DSpace services user base. Named participants include MIT, U. Cambridge, U. Amsterdam, HP, OCLC, ANU, Hong Kong U. Sci. & Tech., U. Toronto, and U. Rochester; example service providers include a consulting firm, an Internet company, a libraries services organization, a hardware company, and an IT services company; users include libraries, corporations, government agencies, and NGOs.]
Related Initiatives
[Figure: related initiatives, including BioMed Central.]
A federation of DSpace installations provides…
Safety in numbers (e.g. large community of adopters with vested interest)
Critical mass of content for testing
Variety of use cases for managing digital content
Collaboration opportunities
Relationships with related initiatives
Defined market for digital preservation services
DSpace can serve as a focal point for examining economic issues
Further research and development will help drive down preservation costs by identifying ways to:
  - Reduce the up-front ingest costs
  - Automate ongoing preservation processes
  - Distribute and share costs
  - Develop economies of scale
Comparison of a variety of use cases will further understanding of the economic issues
  - Identify common issues and costs
  - Opportunity to share best practices, particularly for revenue models
Collaboration will produce a greater impact than individual initiatives
Yield results that will meet the needs of many
Raise awareness of issues
Collectively lobby proprietary software vendors
Pursue joint funding opportunities for high-visibility projects
DSpace technology platform is positioned to address preservation
Captures
  - Digital research material in any format directly from creators
Describes
  - Descriptive, technical, and rights metadata
  - Assigns persistent identifiers
Distributes
  - Delivers via the Web, with necessary access control
  - Open and visible archive
Preserves
  - Large-scale, stable, managed long-term storage (bit preservation)
  - Active research and development in preserving access to content
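Bit preservation is commonly verified by recording a checksum for each stored file at ingest and recomputing it later (a fixity check). A minimal Python sketch, assuming files on a local filesystem; the function names and the dictionary manifest are hypothetical, this is not DSpace's own code, and MD5 is used only as an example algorithm:

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so large files need little memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths: list[Path], algorithm: str = "md5") -> dict[str, str]:
    """At ingest: build a hypothetical manifest mapping each file path to its checksum."""
    return {str(p): file_checksum(p, algorithm) for p in paths}


def audit(manifest: dict[str, str], algorithm: str = "md5") -> list[str]:
    """Later: recompute checksums and list files that are missing or whose bits changed."""
    failures = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists() or file_checksum(p, algorithm) != expected:
            failures.append(name)
    return failures
```

Recording the checksum costs one read of each file at ingest; the audit can then run unattended on a schedule, and an empty result means every recorded file still has the same bits. A check like this says nothing about whether the content can still be rendered, which is the separate problem the migration and emulation research addresses.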
DSpace is already being used across the identified market
115 institutions have registered for a private name space
  - 50-50 US/non-US
  - Colleges and universities
  - Museums and archives
  - Research organizations
  - Government agencies
  - Private industry
Open Source Software enables distributed community R&D
Code available to all, free of charge
Shared responsibility for software enhancement and evolution
Shared benefit from research and development work
Ability to leverage distributed expertise
  - Metadata
  - Digital preservation
Service providers/VARs provide software and services
Implementation
Software bundling/integration
Consulting
Content management
Archival storage
Application hosting
Migration and emulation
Digital archaeology
Risks
Maintaining momentum of the DSpace open source software community
Business models at individual universities: will they be able to sustain DSpace and involvement in the OSS community?
Will users put digital items in DSpace?
Other emerging, leapfrogging technologies