iConference 2011
Archiving as a Service - A Model for the Provision of Shared Archiving Services Using Cloud Computing
Jan Askhoj – janaskhoej[at]gmail.com
Shigeo Sugimoto – sugimoto[at]slis.tsukuba.ac.jp
Mitsuharu Nagamori – nagamori[at]slis.tsukuba.ac.jp
University of Tsukuba, Japan
The Rise of Cloud Computing
Big business: The cloud computing market is reported to grow to more than $150 billion in 2013.*
Gartner listed cloud computing as one of the most hyped technologies in 2009.
Many benefits: Reduced cost, increased storage, no software deployment, flexibility, mobility and allowing IT to shift focus.
Cloud computing is being used increasingly for content creation and storage.
* Global Industry Analysts, 2010
A Cloud Definition (One of Many)
Cloud Computing is an abstracted, scalable platform for service delivery.
Cloud computing makes use of existing technologies that can be described via a layered model.
Access to both platform and services is available via the internet.
Availability, quality and number of services are offered according to agreements with a provider.
- Vaquero et al. 2009
Cloud Computing from an Archiving Perspective
In the cloud, archives may not have knowledge of records creation hardware and software. How do we document such formats?
Cloud Providers are good at managing data and hosting software. But what if something happens?
There are providers of services for backup, but not for preservation.
Can we find and read documents created and stored in the cloud 10 years from now?
I found the document... If only I knew how to access it!
Object of Research
Providing a reference model for cloud based archiving that makes possible:
Offering trusted storage and long term preservation as a cloud based service.
Automatically providing preservation metadata and information packages for transfer of digital records.
Extending preservation to as early in the records lifecycle as possible.
Current Archive Model: OAIS
Reference Model for an Open Archival Information System (OAIS).
Defines Entities, Relationships and Information Types in digital archives.
Consultative Committee for Space Data Systems, 2002.
OAIS and the Cloud
The OAIS Model does not cover the use of a shared platform for storage, outside the control of an archive. Such functionality overlaps with several OAIS functional entities.
An OAIS Archive does not cover the early stages of the document lifecycle. With a shared platform, digital objects can be immediately accessible to an archive for early preservation planning.
In OAIS, Digital Objects and metadata are included in information packages. If Producer and Archive share a common platform, this is not necessary.
A General Layered Model for Cloud Computing Services
Layers shown in the figure: Hardware/Facilities, Connectivity, Abstraction, OS, Virtualization, Data/Metadata/Content, Applications, APIs, Presentation (user facing).
SaaS (Software as a Service): Users access applications via user-facing software or APIs.
PaaS (Platform as a Service): Virtualized platform for executing applications and providing storage.
IaaS (Infrastructure as a Service): Hardware and infrastructure.
Some Characteristics of the Layered Model
In a layered model, each layer offers defined services to the layers above.
Services are abstracted and interchangeable. Benefits:
- Makes it easy to offer and take advantage of defined levels of services.
- Facilitates resource sharing
- Facilitates migration
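The characteristics above can be illustrated with a minimal sketch. It assumes a hypothetical `StorageService` interface offered by a lower layer; the class names, method names and the `migrate` helper are illustrative assumptions, not part of the model described in these slides.

```python
from typing import Dict, Iterable, Protocol


class StorageService(Protocol):
    """Defined service that a lower layer offers to the layers above (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """One hypothetical provider of the storage service."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class PrefixedStorage:
    """A second hypothetical provider with a different internal layout."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[self._prefix + key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[self._prefix + key]


class DocumentApplication:
    """Upper layer: depends only on the service interface, not on a provider."""

    def __init__(self, storage: StorageService) -> None:
        self._storage = storage

    def save(self, name: str, text: str) -> None:
        self._storage.put(name, text.encode("utf-8"))

    def load(self, name: str) -> str:
        return self._storage.get(name).decode("utf-8")


def migrate(keys: Iterable[str], source: StorageService, target: StorageService) -> None:
    """Copy objects between providers using only the defined service."""
    for key in keys:
        target.put(key, source.get(key))
```

Because `DocumentApplication` sees only the interface, the provider underneath can be swapped or migrated without changing the upper layer, which is the sense in which services are interchangeable here.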
Simple Layered Cloud Archiving System
[Figure: an Interaction Layer containing an Archive and a Business System, each with a Digital Object, above a Storage Layer that acts as a trusted repository (bit-level integrity).]
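Bit-level integrity in a trusted repository is commonly checked by recording a checksum at deposit and recomputing it later. The sketch below is one possible illustration using SHA-256 from the Python standard library; the `TrustedRepository` class and its methods are hypothetical and are not specified in the slides.

```python
import hashlib
from typing import Dict


class TrustedRepository:
    """Hypothetical storage layer that records a digest for every deposit."""

    def __init__(self) -> None:
        self._bits: Dict[str, bytes] = {}
        self._digests: Dict[str, str] = {}

    def store(self, object_id: str, data: bytes) -> str:
        """Store the bit sequence and remember its SHA-256 digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._bits[object_id] = data
        self._digests[object_id] = digest
        return digest

    def verify(self, object_id: str) -> bool:
        """Recompute the digest and compare it with the one recorded at deposit."""
        current = hashlib.sha256(self._bits[object_id]).hexdigest()
        return current == self._digests[object_id]
```

A check of this kind only shows that the bits are unchanged. It says nothing about whether the object can still be rendered or understood.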
Expanding the Simple Model
Storage does not equal preservation. Information is needed to support: “Viability, Renderability, Understandability, Authenticity, and Identity of Digital Objects” (known in OAIS as an Information Package).
Proposed Four Layer Model
Interaction Layer: User-facing Archives/Records Management Systems and Business Systems.
Preservation Layer: Adds preservation information. Turns Digital Objects into Information Packages for use by Archives/Records Management Systems.
SaaS Layer: Applications represent bit-strings as Digital Objects used by systems and users.
PaaS Layer: Application platform and trusted repository for storing bit-strings.
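The flow through the four layers can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: all class, method and field names (for example `media_type` and the keys in `preservation_info`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DigitalObject:
    object_id: str
    media_type: str  # hypothetical stand-in for the Digital Object type
    bits: bytes


@dataclass
class InformationPackage:
    digital_object: DigitalObject
    preservation_info: Dict[str, str]


class PaaSLayer:
    """Platform and repository: stores plain bit-strings."""

    def __init__(self) -> None:
        self._bit_strings: Dict[str, bytes] = {}

    def write(self, object_id: str, bits: bytes) -> None:
        self._bit_strings[object_id] = bits

    def read(self, object_id: str) -> bytes:
        return self._bit_strings[object_id]


class SaaSLayer:
    """Application: represents stored bit-strings as Digital Objects."""

    def __init__(self, platform: PaaSLayer) -> None:
        self._platform = platform
        self._media_types: Dict[str, str] = {}

    def create(self, object_id: str, media_type: str, content: bytes) -> None:
        self._platform.write(object_id, content)
        self._media_types[object_id] = media_type

    def digital_object(self, object_id: str) -> DigitalObject:
        bits = self._platform.read(object_id)
        return DigitalObject(object_id, self._media_types[object_id], bits)


class PreservationLayer:
    """Adds preservation information, producing Information Packages."""

    def __init__(self, saas: SaaSLayer) -> None:
        self._saas = saas

    def package(self, object_id: str) -> InformationPackage:
        obj = self._saas.digital_object(object_id)
        info = {
            "identifier": obj.object_id,
            "format": obj.media_type,
            "size_bytes": str(len(obj.bits)),
        }
        return InformationPackage(obj, info)


class ArchiveSystem:
    """Interaction layer: a user-facing system that consumes packages."""

    def __init__(self, preservation: PreservationLayer) -> None:
        self._preservation = preservation
        self.holdings: List[InformationPackage] = []

    def ingest(self, object_id: str) -> InformationPackage:
        package = self._preservation.package(object_id)
        self.holdings.append(package)
        return package
```

In this sketch the archive never reads the platform directly. It only receives Information Packages from the layer below it, mirroring the layer descriptions above.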
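OAIS Information Package and the Layered Model
[Figure: OAIS Information Package elements placed against the layered model. OAIS elements shown: Information Package, Preservation Description Information, Information Object, Data Object, Representation Information, Digital Object, Bit Sequence (relationships marked 1+). Layers shown: Interaction Layer, Preservation Layer, SaaS Layer, PaaS Layer.]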
Where does Preservation Metadata come from?
Business System Metadata: Generated at the time of document creation or records export.
Registry Information: Pre-provided (semi-static) information about registered Entities and Information Types.
Event Related Information: Information describing changes to Digital Objects and metadata taking place during the preservation process.
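The three sources listed above could be combined along the following lines. This is a sketch only: the `FORMAT_REGISTRY` contents, the field names and the helper functions are made-up illustrations, not values or interfaces from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical registry: semi-static information keyed by information type.
# The entry below is an invented sample value.
FORMAT_REGISTRY: Dict[str, Dict[str, str]] = {
    "text/plain": {"rendering_software": "any text editor"},
}


@dataclass
class PreservationMetadata:
    business: Dict[str, str]   # from the business system at creation or export
    registry: Dict[str, str]   # looked up from pre-provided registry information
    events: List[Dict[str, str]] = field(default_factory=list)  # added over time


def build_metadata(business: Dict[str, str],
                   registry: Dict[str, Dict[str, str]]) -> PreservationMetadata:
    """Combine business system metadata with matching registry information."""
    info_type = business["format"]
    return PreservationMetadata(
        business=dict(business),
        registry=dict(registry.get(info_type, {})),
    )


def record_event(metadata: PreservationMetadata, event_type: str, detail: str) -> None:
    """Append event related information describing a change during preservation."""
    metadata.events.append({"type": event_type, "detail": detail})
```

A possible use: call `build_metadata` once when a record is exported, then call `record_event` each time the object or its metadata changes.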
Layered Model: Applications, Information and Provided Services
[Figure: for each layer (Interaction Layer, Preservation Layer, SaaS Layer, PaaS Layer) the figure shows an application, an information type and a provided service.
Applications: Archive System, Package Creator, Business Software, Storage/Hosting Platform.
Information types: Preservation Information, Information Package, Digital Object, Bit-stream.
Provided services: Information Package, Digital Object Type & Metadata, Bitstream Storage & API.]
Case Study: Japanese Government
Problems with system incompatibility and insufficient records management have led to a new Archives Policy and a new IT Strategy.
One part is a cloud computing project: the Kasumigaseki Cloud (霞が関クラウド). This is still in the early stages of planning.
We focus on three archiving problem areas to see how these could be resolved using our model.
Current Workflow
[Figure: workflow diagram spanning Agency and National Archives. Labels shown: Business Systems on separate Platforms, Records Mgmt. System, Common Document Registration System, Registration, Agency Records Mgmt. (Retention Schedule, Transfer Plan, Preservation Plan), Record, Historic Record, Destruction, National Archive.]
Problem Areas
Lack of system integration: Individual government offices use different systems. Preparing records is a time-consuming task.
Lack of resources: The burden of transferring records to the National Archives of Japan (NAJ) lies with government agencies. The size of the NAJ makes it hard to provide assistance.
Preservation: Lack of preservation of records in government agency systems.
Applying the model
Assumption: The Kasumigaseki Cloud will offer both a storage/hosting platform (PaaS) and software services (SaaS).
Added functionality in Preservation Layer: Registration, Harvesting, Preservation, Reporting.
[Figure: the model applied to the case, spanning Agency and National Archives. Layers shown: ARM Layer (user-facing systems), Package Layer, SaaS Layer, PaaS Layer. Labels shown: Agency Records Mgmt., RMS, Retention Schedule, Transfer Plan, Preservation Plan, Business System, Archive System, Back-end, Transfer. SaaS Business Systems → Digital Objects. Platform → Bit-sequences. Package contents: Package Information, Package Desc., Preservation Description Information, Representation Information. Functionality → Registration, Harvesting, Conversion, Reporting.]
Benefits and Limitations in the Case Study
Benefits:
- Automatic package creation, simplifying records transfer.
- Early and consistent addition of preservation metadata.
- Allows keeping the current workflow, but adds automation.
Limitations/Requirements:
- Cloud platform must be truly trustworthy, with no unexpected change or loss of service.
- Need good export of content and metadata from SaaS business systems.
- Providing semantic or community-specific information.
Concluding Remarks
We believe our model has a number of advantages when developing a cloud archive framework:
1. Builds on OAIS model concepts and information types.
2. Adds trusted storage and preservation to early stages in the document lifecycle.
3. Simplifies archive system design by allowing organizations to choose different levels of service.
Current Status: Work on defining information classes and properties. Designing a test system using the model.
Thank you!
References
1. ISO 15489-1:2001 - Information and documentation - Records management - Part 1: General. 2001.
2. Requirements for Electronic Records Management Systems. 2002. http://www.nationalarchives.gov.uk/documents/metadatafinal.pdf.
3. Reference Model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems, 2002.
4. Electronic Records Archives ERA Lifecycle. 2004. http://www.archives.gov/era/pdf/era-life-cycle.pdf.
5. National Archives Law. National Archives of Japan, 2007.
6. Outline of the National Archives. 2007. http://www.archives.go.jp/english/abouts/outline.html.
7. Chan, T. Japan to build massive cloud infrastructure for e-government. Green Telecom. http://www.greentelecomlive.com/2009/05/13/japan-to-build-massive-cloud-infrastructure-for-e-government/.
8. Guenther, R. Understanding and Implementing the PREMIS Data Dictionary for Preservation Metadata. 2009. http://www.digitalpreservation.gov/news/events/ndiipp_meetings/ndiipp09/docs/June26/premis-ndiipp-20090626.ppt.
9. Koga, T. Recent development of the government information policy in Japan. International Federation of Library Associations and Institutions, Government Information and Official Publications Section (GIOPS) Newsletter, 8, (2010), 8-11.
10. Kulovits, H., Becker, C., and Kraxner, M. Plato: A Preservation Planning Tool Integrating Preservation Action Services. 5173/2008, (2008), 413-414.
11. Okamoto, S. New Developments in Managing Records in Japan - The Establishment, Direction and Structure of the Archive Law. 2010.
12. Sugimoto, S. Ensuring the Preservation and Use of Electronic Records. (2007).
13. Vaquero, L.M., Rodero-Merino, L., and Caceres, J. A Break in the Clouds: Towards a Cloud Definition. ACM SIGCOMM Computer Communication Review 39, 1 (2009), 50-55.
14. Youseff, L., Butrico, M., and DaSilva, D. Toward a Unified Ontology of Cloud Computing. Grid Computing Environments Workshop, (2008), 1-10.