CASPAR Cultural, Artistic and Scientific knowledge for Preservation Access and Retrieval.

Post on 29-Jan-2016


WHAT: objectives (from the call)

• Develop systems and tools which will support the accessibility and use over time of digital cultural and scientific resources.
– Explore how to preserve the availability and authenticity of digital resources over time
– Support emerging complexity of scientific, cultural and creative objects and associated repositories

Objectives

• Objective 1: to lay the foundation for all future preservation activities (CASPAR methodology)

• Objective 2: to create key advanced components to use in all the preservation activities (CASPAR components)

• Objective 3: to create the long-term autonomous system to support all the preservation activities (CASPAR framework)

• Objective 4: to demonstrate the validity of the CASPAR framework with heterogeneous data and a variety of innovative applications (CASPAR testbeds)

In addition to these fundamental objectives, CASPAR offers supporting activities in order to guarantee the successful execution of the project results even after the end of the project and the re-usability of outcomes in a wider domain than the testbed-related sectors:

• Objective 5: to build up the CASPAR preservation user community in order to create consensus around the initiative and gather a critical mass of potential users/customers

• Objective 6: to create a self-sustainable model for the CASPAR process and offer supporting activities in order to promote the successful exploitation of the project results after the end of the project.

WHAT: vision

• CASPAR manages knowledge to keep archives alive through time:
– Preserve information & knowledge – not just “the bits”

• Preservation is a process, not a one-shot event
– transforming content (migration, emulation, etc.) to adapt it to new constraints of rendition and playability
and
– enriching content to preserve its intelligibility and (re)usability (not just rendering)

• OAIS provides a general framework:
– current implementations deal more with format than the interpretation of data
– CASPAR proposes a richer implementation for dealing with content interpretation

WHAT: expected results

• CASPAR approach and framework to support the “end-to-end” lifecycle for scientific, cultural and creative digital resources
– Infrastructure
– Tools
– Techniques

• Testbeds: science, culture, artistic to identify and test common infrastructure
– Supported by discipline specific access
– Embedded in long-lived institutions

• must be relatively easy to use

• must have a low “buy-in” in terms of effort required to adopt the CASPAR paradigm

• must avoid requiring wholesale change of everyone else’s systems

• must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project.

FOR WHOM

• Potential USERS:
– Creators of the resources
– Funders of the resources and their preservation
– Curators of the resources
– Suppliers of preservation-related services
– Users of the information

...for WHOM

• Large user communities involved with
– Science:
  • European Space Agency
  • CCLRC
– Culture
  • UNESCO
– Artistic
  • INA, IRCAM, CIANT …

• Creators
• Funders
• Curators
• Suppliers
• End-users

...for WHOM

• Multi-Industry perspectives
– Software
– Hardware
– Middleware

HOW: Foundations of Preservation approach

• OAIS Reference Model

• OAIS related stds work:
– Producer-Archive interface
– NARA/RLG Audit & Certification draft – now released for testing and comment
– SIP, XFDU….others

• OAIS based projects
– InterPARES
– ….many others

HOW: Implementation plan structure (blocks of work)

HOW (cont’d): S&T approach

• Component-based research
– OAIS-based components
  • e.g. Storage
– OAIS-based extensions
– Next generation components
– Focused research & testbeds: vertical threads

HOW (cont’d): OAIS extensions

• Knowledge driven approach

• Knowledge management to support long-term preservation of concepts/information:
– Single, complex, on demand, interactive objects
– DRM
– Authenticity
– Access
– Storage

Framework

• Integrated Framework: supports the development of the three vertical testbeds
– Component-based research
– Open standards & Open Source development methodology
– Framework: integration of research components with existing off-the-shelf/modifiable-off-the-shelf components

• Service Oriented Architecture for service delivery

• Process control and composition

CASPAR Testbeds

• Three testbeds: Cultural, Performing Arts, Scientific
– Cultural <- UNESCO
– Performing Arts <- INA, IRCAM
– Scientific <- ESA (with CCLRC)

• Complex, multi-source, multifaceted data

• Specific requirements on preservation (technical, delivery, legal)

• Specific research issues: in effect, they represent three focused research streams

• Identifying and confirming common infrastructure elements

CASPAR testbeds: Testing and Validation

• Common design & validation methodology
– Uniform evaluation parameters

• Each testbed has its own user communities

• Continuous feeding to the Project Performance Evaluation process

CASPAR Integrated architecture

CCLRC Infrastructure Build-up

[Diagram labels: European Preservation Infrastructure Alliance; other Alliance members, e.g. ESA; future Alliance members; CCLRC Curation Facility; CASPAR; other CCLRC projects; FP7 projects; Registries; UK DCC Organisation; Industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; community support & outreach; research; development co-ordination; service definition & delivery; management & admin support; curation organisations, e.g. DPC; Collaborative Associates Network of Data Organisations]

DCC Registry

Sharing RepInfo

• RepInfo is needed

• RepInfo is extensive

• May need to “extend” RepInfo as Designated Community and/or its knowledgebase changes

• How can we avoid every Repository repeating the work?
– Need to control costs

• Need to share the effort

Requirements

• Data users - need to be able to obtain pre-identified RepInfo

• Curators: need to be able to find suitable pre-existing RepInfo to re-use

Or

• Create RepInfo

Registry for Representation Info

Example of use of Representation Information Labelling

The Digital Object could have RepInfo packed with it

Support automated access & processing

Use of RepInfo

[Diagram labels: data objects, each with a CPID; Labels of the form “Structure = CPID, Semantics = CPID, Rendering s/w = CPID”; External Registry]

Each “bag of bits” has an associated pointer (CPID) to a Label

Registry Interface Requirements

• Give it an identifier, give me back something (e.g. RepInfo)

• Allow me to search for RepInfo

• Interoperable with other (format) registries

• Not limited to single protocols
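The first two requirements (look up by identifier, search) can be illustrated with a minimal in-memory sketch. This is not the DCC Registry API: the class names, the record fields and the `cpid:...` identifier strings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RepInfo:
    """One piece of Representation Information (hypothetical record shape)."""
    cpid: str
    description: str
    classifications: set = field(default_factory=set)


class InMemoryRegistry:
    """Toy registry: look up by identifier, or search by classification."""

    def __init__(self):
        self._entries = {}

    def register(self, rep_info):
        self._entries[rep_info.cpid] = rep_info

    def get(self, cpid):
        # "Give it an identifier, give me back something"
        return self._entries.get(cpid)

    def search(self, classification):
        # "Allow me to search for RepInfo"; results sorted by identifier
        matches = (r for r in self._entries.values()
                   if classification in r.classifications)
        return sorted(matches, key=lambda r: r.cpid)


registry = InMemoryRegistry()
registry.register(RepInfo("cpid:fits-format", "FITS format",
                          {"structure", "astronomy"}))
registry.register(RepInfo("cpid:fits-dict", "FITS standard dictionaries",
                          {"semantics", "astronomy"}))

found = registry.get("cpid:fits-format")
hits = [r.cpid for r in registry.search("astronomy")]
```

Keeping `get` and `search` as the only read operations mirrors the slide's point that clients should be able to swap registry implementations behind a small interface.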

Registry API

API allows applications to talk to many different implementations

http://dev.dcc.ac.uk/cvs

ebXML Registry Version 3.0: Simplified View of Architecture

Source: ebXML Registry Services and Protocols Committee Draft, 10 February 2005

Labels and CPIDs

Example RepInfo Label

A Label is itself RepInfo. It provides a way to collect together in a sensible way lots of individual pieces of RepInfo

Re-using RepInfo

• Existing RepInfo can be used to build up further RepInfo
– E.g. refer to existing RepInfo in labels
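As a rough sketch of how labels can refer to existing RepInfo, the example below models a Label as a mapping from a role (Structure, Semantics, Rendering s/w) to the CPID of another entry, and expands it recursively. The registry contents, the CPID strings and the "Data"/"Readme" roles are hypothetical.

```python
# Hypothetical registry contents: CPID -> entry.
# A plain entry is a string; a Label is a dict whose values are CPIDs.
REGISTRY = {
    "cpid:ascii": "ASCII character encoding",
    "cpid:fits-format": "FITS format",
    "cpid:fits-dict": "FITS standard dictionaries",
    "cpid:fits-viewer": "Generic FITS display software",
    "cpid:label-fits": {
        "Structure": "cpid:fits-format",
        "Semantics": "cpid:fits-dict",
        "Rendering s/w": "cpid:fits-viewer",
    },
    # A further Label built by referring to existing RepInfo,
    # including another Label.
    "cpid:label-archive": {
        "Data": "cpid:label-fits",
        "Readme": "cpid:ascii",
    },
}


def resolve(cpid, registry, seen=frozenset()):
    """Expand a CPID into its RepInfo; Labels are expanded recursively."""
    if cpid in seen:
        raise ValueError(f"circular label reference at {cpid}")
    entry = registry[cpid]
    if isinstance(entry, dict):
        return {role: resolve(target, registry, seen | {cpid})
                for role, target in entry.items()}
    return entry


fits_label = resolve("cpid:label-fits", REGISTRY)
archive_label = resolve("cpid:label-archive", REGISTRY)
```

Because a Label is stored like any other entry, re-use costs one extra identifier rather than a copy of the underlying RepInfo.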

Versioning and LID

• Each object has a unique identifier

• Versions of an object share a “logical ID” (LID)

• Simply using the LID gives the latest version

• Can specify a particular version
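The versioning rules above can be sketched as follows. The `lid:version` identifier scheme and the method names are assumptions chosen for illustration; they are not taken from the ebXML Registry specification or the DCC Registry.

```python
class VersionedRegistry:
    """Toy store in which versions of an object share a logical ID (LID)."""

    def __init__(self):
        # lid -> list of (unique_id, content), oldest version first
        self._versions = {}

    def add_version(self, lid, content):
        versions = self._versions.setdefault(lid, [])
        unique_id = f"{lid}:{len(versions) + 1}"  # hypothetical ID scheme
        versions.append((unique_id, content))
        return unique_id

    def get(self, lid, version=None):
        versions = self._versions[lid]
        if version is None:
            # Simply using the LID gives the latest version
            return versions[-1][1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{lid} has no version {version}")
        return versions[version - 1][1]


store = VersionedRegistry()
first_id = store.add_version("lid:fits-format", "FITS description, draft")
second_id = store.add_version("lid:fits-format", "FITS description, revised")

latest = store.get("lid:fits-format")
original = store.get("lid:fits-format", version=1)
```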

Clients

• DCC Registry:
– Web browser
– Thick client (http://registry.dcc.ac.uk)

• Any Registry
– Applications using API

GUI access to Registry

Classifications

• Many Classification Schemes• Help to find RepInfo

Initial RepInfo

• Simple text
– ASCII
– Unicode
– UTF7/8

• PDF, Word(!)

• FITS format

• FITS standard dictionaries

• Things that are “MISSING”

RepInfo entry

• Simple command line tool

Creating RepInfo

• There are many tools which can be used to create RepInfo:
– Simple text editor to create text describing the data
– Complex tools to capture data description e.g.
  • EAST (see next slides)
  • DFDL etc
– Programming languages of various sorts

EAST descriptions

OASIS screenshot

OASIS tool for creating EAST descriptions

Example of EAST description

Using RepInfo

• A pointer to RepInfo can be attached to data

• The RepInfo can be used to
– Display
– Examine
– Process
– Re-use
the data

• Laser facility produces Binary data normally used by proprietary software

• Describe using EAST data description language

• Use in generic application (shown here) to display/process

Example of use of RepInfo
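The idea in the laser-facility example, a generic application reading binary data by following a structure description rather than proprietary code, can be sketched as below. This does not use EAST: Python's standard `struct` format codes stand in for the data description language, and the field names and record layout are invented for illustration.

```python
import struct

# Hypothetical structure description for one record of binary instrument
# data. Each field is (name, struct format code); this stands in for an
# EAST description and is not EAST syntax.
RECORD_DESCRIPTION = [
    ("shot_number", "I"),    # unsigned 32-bit integer
    ("energy_joules", "d"),  # 64-bit float
    ("pulse_ns", "f"),       # 32-bit float
]


def decode_records(raw, description):
    """Generic reader: decode every record in `raw` using the description."""
    # "<" selects little-endian byte order with no padding between fields
    fmt = "<" + "".join(code for _, code in description)
    names = [name for name, _ in description]
    return [dict(zip(names, values))
            for values in struct.iter_unpack(fmt, raw)]


# Two sample records, packed the way the hypothetical instrument would.
raw = struct.pack("<Idf", 1, 2.5, 10.0) + struct.pack("<Idf", 2, 3.0, 12.5)
records = decode_records(raw, RECORD_DESCRIPTION)
```

The reader contains no knowledge of the instrument; changing `RECORD_DESCRIPTION` is enough to read a different layout, which is the role structure RepInfo plays here.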

Simple Buy-In

• Need to add RepInfo to your Data Objects?

• Does the RepInfo already exist?
– Yes: get its ID and put that in a label
– No: register what you have – be assigned an ID.
  • Add more details later when needed
  • Or others can add more details
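The buy-in decision above can be sketched as a small helper: re-use an existing ID if the RepInfo is already registered, otherwise register what you have and be assigned one. The registry class, its methods and the `cpid:N` identifiers are hypothetical and only illustrate the flow.

```python
import itertools


class BuyInRegistry:
    """Toy registry that assigns IDs and lets details be added later."""

    def __init__(self):
        self._by_id = {}
        self._counter = itertools.count(1)

    def find(self, description):
        for cpid, entry in self._by_id.items():
            if entry["description"] == description:
                return cpid
        return None

    def register(self, description):
        cpid = f"cpid:{next(self._counter)}"  # hypothetical ID scheme
        self._by_id[cpid] = {"description": description, "details": {}}
        return cpid

    def add_details(self, cpid, **details):
        # Can be called later by the original registrant or by others
        self._by_id[cpid]["details"].update(details)

    def details(self, cpid):
        return dict(self._by_id[cpid]["details"])


def label_for(registry, description):
    """Return a minimal label pointing at existing or newly registered RepInfo."""
    cpid = registry.find(description)
    if cpid is None:
        cpid = registry.register(description)
    return {"RepInfo": cpid}


registry = BuyInRegistry()
first_label = label_for(registry, "Laser facility binary format")
second_label = label_for(registry, "Laser facility binary format")
registry.add_details(first_label["RepInfo"], byte_order="little-endian")
```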

Operating Registries

• See http://dev.dcc.ac.uk/twiki/bin/view/Main/RegistryProcedures