The Role of Trustworthy Digital Repositories in Sustainability David Giaretta [email protected] www.giaretta.org and www.iso16363.org Big Data to Knowledge AHM & Open Data Science Symposium 29 Nov – 1 Dec 2016
Big Data to Knowledge AHM & Open Data Science Symposium, Bethesda, MD, 29 Nov – 1 Dec 2016

The Role of Trustworthy Digital Repositories in Sustainability, David Giaretta, www.giaretta.org

The Role of Trustworthy Digital Repositories in Sustainability

David Giaretta
[email protected]

www.giaretta.org and www.iso16363.org

Big Data to Knowledge AHM & Open Data Science Symposium, 29 Nov – 1 Dec 2016

Interoperability, Re-use, Preservation and Sustainability

[Diagram labels: Interoperability; Replication of results; Exploitation / Re-use; Preservation; Sustainability; Usability; VALUE; “metadata”]

• What do the bits mean?
• Need “metadata”
  • What kinds? How much of each kind?
• EU Commissioner for the Digital Agenda said: “Data is the new Gold” but
  • Gold is precious because it is rare, and does not combine
  • Data is precious because there is so much and it becomes more valuable when it is combined

Digitally encoded information – 1’s and 0’s

What does this mean?

• BITS: 01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000
• HEX: 4e 4d 51 4d 50 4a 20 20
  • Example: “ca fe ba be” at start indicates Java class file
• Two IEEE 754 32 bit real numbers (assuming “big-endian”): 8.6116435E8 1.3564412E10
• Two 32 bit integers (assuming “big-endian”): 1313689933 1347035168
• Actually...
• ASCII Characters: NMQMPJ
• ... which was my flight reference
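
The different readings of these eight bytes can be reproduced with a short sketch. It uses only Python's standard `struct` module; the helper name `interpretations` and the dictionary keys are illustrative choices, not something taken from the slides. Big-endian byte order is assumed throughout, as the slide states.

```python
import struct

# The eight bytes from the slide, written as hex.
RAW = bytes.fromhex("4e4d514d504a2020")


def interpretations(data: bytes) -> dict:
    """Return several readings of the same 8 bytes (hypothetical helper)."""
    return {
        "bits": " ".join(f"{b:08b}" for b in data),
        "hex": " ".join(f"{b:02x}" for b in data),
        # ">" selects big-endian; "2f" means two IEEE 754 32-bit floats.
        "floats_be": struct.unpack(">2f", data),
        # "2i" means two signed 32-bit integers, again big-endian.
        "ints_be": struct.unpack(">2i", data),
        # The same bytes read as 7-bit ASCII text.
        "ascii": data.decode("ascii"),
    }


if __name__ == "__main__":
    for name, value in interpretations(RAW).items():
        print(f"{name:10} {value!r}")
```

Every reading is equally valid as far as the bits are concerned. Nothing in the bytes themselves says which one was intended, which is why the accompanying “metadata” is needed.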

…semantics …

Can anyone guess what this table means?

[Table image; column labels shown on the slide: Longitude, Latitude, Ozone, Date]

Could be Findable and Accessible - encoded as a Comma Separated Value (CSV) file in ASCII or Unicode, or encoded with XML markup

OAIS (ISO 14721) and digital preservation

• Reference Model for an Open Archival Information System (OAIS) provides a very general approach
• OAIS approach to digital preservation:
  – covers all types of digitally encoded information
  – provides a way to test whether preservation is successful
  – does not require seeing into the future
  – does require transparency – be clear what is being promised
    • but does not require “open access”
• Very widely accepted and provides the basis for pretty well all work in digital preservation
• OAIS provides a good basis for certification
• Available free from https://public.ccsds.org/Pubs/650x0m2.pdf

Preserving digitally encoded information

• Using or understanding the bits requires what OAIS calls “Representation Information”: anything needed to allow the data to be interpreted by software or people. This certainly includes semantics, and many other things
• Additional things such as software which are readily available now may not be available in future
• If the bits are unchanged we can keep hashes and be pretty sure of authenticity
• If we have to change the bits, e.g. transform to another format, then
  • Evidence of Authenticity needs care
  • Probably needs other software etc.
• It may be that the information must be handed over
  • To a different system and/or different organisation
  • Need to take care of the details which tend to be ignored
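
The point about hashes can be illustrated with a minimal sketch. The slides do not name a hash algorithm, so SHA-256 from Python's standard `hashlib` is an assumption here, and the function names, the sample bytes and the UTF-16 "transformation" are all hypothetical stand-ins for a real ingest and format migration.

```python
import hashlib


def fixity_digest(data: bytes) -> str:
    """Hex SHA-256 digest used here as the recorded fixity value."""
    return hashlib.sha256(data).hexdigest()


def bits_unchanged(data: bytes, recorded_digest: str) -> bool:
    """True when the bytes still hash to the digest recorded at ingest."""
    return fixity_digest(data) == recorded_digest


# Hypothetical ingest: the bytes as received, plus their recorded digest.
original = b"NMQMPJ  "
recorded = fixity_digest(original)

# Hypothetical format transformation: same characters, re-encoded as UTF-16.
transformed = original.decode("ascii").encode("utf-16-be")

if __name__ == "__main__":
    print("original verifies:   ", bits_unchanged(original, recorded))
    print("transformed verifies:", bits_unchanged(transformed, recorded))
    print("same text after decoding:",
          transformed.decode("utf-16-be") == original.decode("ascii"))
```

Once the bits change, the recorded digest no longer matches even though the characters are the same, so the hash alone cannot serve as evidence of authenticity after a transformation. Some other record of what was done, and with which software, has to carry that evidence.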

Partial Representation Information Network for MERIS Level 2 data

Role of people (and automated systems)

• Creation of data and capture/creation of the metadata required for use/exploitation now and into the future
  • Follow “Active” Data Management Plans (RDA and CCSDS/ISO)
• Funding, Management and Operation of the repository
  • Defines the “Designated Community”, e.g. people who understand a particular sub-discipline
  • Undertakes preservation activities for the data – ensuring that the data will be usable by members of the Designated Community despite changes in h/w, s/w, environment etc.
• Use the data (including by the Designated Community)
  • Exploit and create value from the data
  • Judge the value of the data

Many types of Audit and Certification

• ISO 16363 focuses on keeping the Information understandable / usable
  • www.iso16363.org
  • based on OAIS concepts – including usability
  • 100+ metrics covering all aspects of the repository to ensure the auditor looks at the details
  • uses the ISO certification process on which our lives depend in so many areas, e.g. medical equipment, food safety, airlines, automobiles etc. - 3rd party visits and evaluation
• ISO 27000 type audits focus on keeping the bits safe in the context of the needs of the organisation
  • the information is an asset of the business – what happens after the organisation ceases to exist is of no concern. Security certification may be needed for any information that can be used to identify an individual
• DIN 31644
  • audit and certification process not clear
• ISO 15489 – Records Management
  • No formal audit process
• World Data System and Data Seal of Approval
  • Small set (16) of metrics – not detailed
  • Recognised as much “lower” than ISO 16363 (DSA as “bronze” and ISO 16363 as “gold”)

ISO Standards for certification

• ISO 16363: Audit and Certification of Trustworthy Digital Repositories
  • Available free from https://public.ccsds.org/Pubs/652x0m1.pdf
• ISO 16919: Requirements for Bodies Providing Audit and Certification of Trustworthy Digital Repositories
  • Available free from https://public.ccsds.org/Pubs/652x1m2.pdf
  • Used for accreditation of auditors by National Accreditation Bodies
  • Auditors available early next year

Sustainability and Trustworthiness

• Requires resources ($ / £ / …)
  • Are the resources being well spent – will the data be usable?
  • Is the Value (or potential value likely to be derived) worth the Cost?
    • An important factor in appraisal – cannot preserve everything
  • There are economies of scale
  • There are limits to the availability of expertise
• Competition between repositories?
  • Trustworthiness is a way to choose between repositories
  • ISO 16363 certification requires detailed evidence and is fundamentally linked to usability - from which value, and hence sustainability, is derived

Useful Links

• OAIS
  • WEB pages: www.oais.info
  • Site to gather proposals for OAIS updates in 2017: http://review.oais.info
• ISO 16363:
  • www.iso16363.org
• Integrated GLOSSARY of digital preservation: http://www.alliancepermanentaccess.org/index.php/consultancy/dpglossary/
  • SKOS ontology to show relationship between terms from different glossaries
  • OAIS, APARSEN, DPC, ANZ, SNIA, INTERPARES, ISO16363
• Active Data Management Plans:
  • CCSDS/ISO
    • http://cwe.ccsds.org/moims/default.aspx#_MOIMS-DAI
  • Research Data Alliance:
    • https://www.rd-alliance.org/groups/active-data-management-plans.html
• Me:
  • www.giaretta.org

[email protected]

