PREMIS at the British Library

Markus Enders, The British Library

PREMIS Implementation Fair, San Francisco, CA

07 October 2009

General

Archival Information Package (AIP)
• The AIP is just a conceptual entity: a conceptual (generic) data model
• Content files are stored on write-once media
• Content files may be containerized (stored in ZIP or WARC files)
  • One or more containers per AIP; files in one container may belong to different AIPs

AIP Descriptor: a METS file describes the content of the AIP (structure, files, descriptive metadata, preservation metadata)

Different METS profiles for different content streams: eJournals, newspapers (born digital and digitized), web archiving

Common underlying document model for all AIPs
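
The container model implies an m-n mapping between containers and AIPs: an AIP may span several containers, and one container may hold files of several AIPs. The Python sketch below is a hypothetical illustration, not the British Library's implementation; the classes `Container` and `AIP`, the function `containers_for` and all file names are invented. It finds every container that must be read to retrieve one AIP.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A ZIP or WARC file on write-once storage holding content files."""
    name: str
    members: set[str] = field(default_factory=set)


@dataclass
class AIP:
    """Conceptual AIP: the content files described by one METS descriptor."""
    aip_id: str
    files: set[str] = field(default_factory=set)


def containers_for(aip: AIP, containers: list[Container]) -> list[str]:
    """Names of the containers that hold at least one file of the AIP."""
    return sorted(c.name for c in containers if c.members & aip.files)


# One WARC holds files belonging to two different AIPs.
warc1 = Container("crawl-001.warc", {"a.html", "b.html"})
warc2 = Container("crawl-002.warc", {"c.html"})
aip1 = AIP("aip-1", {"a.html", "c.html"})
aip2 = AIP("aip-2", {"b.html"})
print(containers_for(aip1, [warc1, warc2]))  # ['crawl-001.warc', 'crawl-002.warc']
print(containers_for(aip2, [warc1, warc2]))  # ['crawl-001.warc']
```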

METS Descriptor

What is stored in the METS Descriptor?
• Structure of the document (logical and physical, in different structMaps)
  • Not all content streams have two structMaps (born digital streams have only one)
• Descriptive metadata
• File Section
  • Defines container files as well as content files (nested <file> elements)
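
A minimal sketch, using Python's standard `xml.etree.ElementTree`, of a METS file section in which a container file nests its content files as child `<file>` elements. The identifiers, the helper `file_sec_with_container` and the use of `LOCTYPE="URL"` with a relative path for members inside the container are assumptions; the slides do not say how member locations were recorded.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag):
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS}}}{tag}"


def file_sec_with_container(container_id, container_href, members):
    """Build a <mets:fileSec> whose container <mets:file> nests its content files.

    members: list of (file_id, path_inside_container) pairs.
    """
    file_sec = ET.Element(m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"))
    container = ET.SubElement(grp, m("file"), ID=container_id)
    # In the METS schema, FLocat comes before any nested <file> elements.
    ET.SubElement(container, m("FLocat"),
                  {"LOCTYPE": "URL", f"{{{XLINK}}}href": container_href})
    for file_id, path in members:
        inner = ET.SubElement(container, m("file"), ID=file_id)
        # Assumption: the member's path inside the container is a relative URL.
        ET.SubElement(inner, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": path})
    return file_sec


fs = file_sec_with_container("ZIP1", "submission.zip",
                             [("F1", "article.pdf"), ("F2", "article.xml")])
print(ET.tostring(fs, encoding="unicode"))
```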

METS Descriptor

What is stored in the METS Descriptor? Preservation metadata:

• Preservation metadata for files and representations, attached to <mets:file> elements (files) and <mets:div> elements (representations)
• Focuses on:
  • Audit trail: events and agents
  • Technical metadata: basic technical metadata in METS and PREMIS
• Assumption: future migrations of files will be necessary
  • No emulation considered; no environment information stored

Preservation Metadata (PREMIS) in METS

Content streams:
• eJournals: PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE output
• Newspapers: PREMIS 2.0; MODS 3.3; METS 1.8
• Web Archiving: PREMIS 2.0; MODS 3.3; DC; METS 1.8

Preservation Metadata (PREMIS): eJournal content stream

AIP model: one AIP per article, issue, journal and digital manifestation

Any change leads to a new AIP, which references the old version of the AIP
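
One way to model "any change leads to a new AIP that references the old version" is an immutable record with a back-pointer. This is a hypothetical sketch: the slides do not say where the reference to the previous AIP is stored, and the names `AIPVersion`, `revise` and `history` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIPVersion:
    aip_id: str
    files: tuple[str, ...]
    previous: Optional[str] = None  # identifier of the AIP this one supersedes


def revise(old: AIPVersion, new_id: str, files) -> AIPVersion:
    """A change never edits the old AIP; it creates a new one pointing back."""
    return AIPVersion(new_id, tuple(files), previous=old.aip_id)


def history(latest: AIPVersion, store: dict) -> list:
    """Follow the 'previous' references back to the first version."""
    chain = [latest.aip_id]
    current = latest
    while current.previous is not None:
        current = store[current.previous]
        chain.append(current.aip_id)
    return chain


v1 = AIPVersion("aip-1", ("article.pdf",))
v2 = revise(v1, "aip-2", ["article.pdf", "article.xml"])
store = {v.aip_id: v for v in (v1, v2)}
print(history(v2, store))  # ['aip-2', 'aip-1']
```

Freezing the dataclass mirrors the write-once storage: the old version stays untouched and remains reachable.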

Preservation Metadata (PREMIS): eJournal content stream

Journal, Issue, Article: the AIP consists only of a METS descriptor, mainly holding embedded descriptive metadata (MODS) and preservation metadata:

• PREMIS: journal, issue and article are regarded as representations of intellectual entities
• Relationships between representations are recorded in the MODS record
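
The slides record relationships between journal, issue and article in the MODS record but do not show the encoding. A common MODS pattern for "this item is part of that one" is `<relatedItem type="host">`; the sketch below assumes that pattern and uses invented identifiers and a hypothetical helper, `mods_with_host`.

```python
import xml.etree.ElementTree as ET

MODS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS}}}{tag}"


def mods_with_host(title, local_id, host_id):
    """MODS record for a representation (e.g. an article) pointing to its parent."""
    mods = ET.Element(q("mods"))
    ET.SubElement(ET.SubElement(mods, q("titleInfo")), q("title")).text = title
    ET.SubElement(mods, q("identifier"), type="local").text = local_id
    # Assumed encoding: the parent (e.g. the issue) is a host relatedItem.
    host = ET.SubElement(mods, q("relatedItem"), type="host")
    ET.SubElement(host, q("identifier"), type="local").text = host_id
    return mods


rec = mods_with_host("An article", "article-0001", "issue-0042")
print(ET.tostring(rec, encoding="unicode"))
```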

Preservation Metadata (PREMIS): eJournal content stream

Digital Manifestation: the AIP consists of content files and a METS descriptor. The METS descriptor contains PREMIS records for the files and one for the Digital Manifestation itself.

• The relationship to the article is recorded in the PREMIS record (manifestationOf)
• The relationship to the submission is recorded in the PREMIS record (containedInSubmission)

Submission: the received content files in a ZIP file (one AIP)
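
A sketch of the two relationships carried by a Digital Manifestation's PREMIS record, built with `xml.etree.ElementTree`. It assumes the PREMIS 1.1 namespace `http://www.loc.gov/standards/premis/v1` and the standard PREMIS relationship structure (`relationshipType`, `relationshipSubType`, `relatedObjectIdentification`). Only the subtype names come from the slides; the `relationshipType` value "structural", the identifier type "local", the helper `relationship` and the identifiers are placeholders.

```python
import xml.etree.ElementTree as ET

PREMIS = "http://www.loc.gov/standards/premis/v1"  # PREMIS 1.1 namespace
ET.register_namespace("premis", PREMIS)


def q(tag):
    """Qualify a tag name with the PREMIS 1.1 namespace."""
    return f"{{{PREMIS}}}{tag}"


def relationship(sub_type, related_id, rel_type="structural", id_type="local"):
    """Build one PREMIS <relationship> pointing at another object."""
    rel = ET.Element(q("relationship"))
    ET.SubElement(rel, q("relationshipType")).text = rel_type
    ET.SubElement(rel, q("relationshipSubType")).text = sub_type
    related = ET.SubElement(rel, q("relatedObjectIdentification"))
    ET.SubElement(related, q("relatedObjectIdentifierType")).text = id_type
    ET.SubElement(related, q("relatedObjectIdentifierValue")).text = related_id
    return rel


# Relationships held in the Digital Manifestation's PREMIS record.
links = [
    relationship("manifestationOf", "article-0001"),
    relationship("containedInSubmission", "submission-0001"),
]
for link in links:
    print(ET.tostring(link, encoding="unicode"))
```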

Preservation Metadata (PREMIS) and METS: eJournal content stream

amdSec: one amdSec per PREMIS record; referenced from <mets:file> and <mets:div> elements
• Uses <premis:object>, <premis:agent> and <premis:event> elements

techMD:
• Data extracted by JHOVE (files)
• PREMIS record of a file

digiprovMD:
• PREMIS record of representations (journal, issue, article)
• PREMIS record of a file
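
A minimal sketch of this amdSec layout with `xml.etree.ElementTree`: one `<amdSec>` per PREMIS record, with `techMD` and `digiprovMD` sections for a file and only `digiprovMD` for a representation, referenced through the `ADMID` attribute of `<mets:file>` and `<mets:div>`. The ID scheme, the helper `make_amdsec`, the `MDTYPE` value and the empty `mdWrap` payloads are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def make_amdsec(record_id, sections, target):
    """Create one <amdSec> for one PREMIS record and point target's ADMID at it.

    sections: ["techMD", "digiprovMD"] for a file,
              ["digiprovMD"] for a representation (<mets:div>).
    """
    amd = ET.Element(m("amdSec"), ID=f"AMD-{record_id}")
    section_ids = []
    for kind in sections:
        section_id = f"{kind}-{record_id}"
        section = ET.SubElement(amd, m(kind), ID=section_id)
        # Placeholder: MDTYPE and the JHOVE/PREMIS payload are not filled in here.
        wrap = ET.SubElement(section, m("mdWrap"), MDTYPE="PREMIS")
        ET.SubElement(wrap, m("xmlData"))
        section_ids.append(section_id)
    # ADMID is an IDREFS list naming the techMD/digiprovMD sections.
    target.set("ADMID", " ".join(section_ids))
    return amd


mets = ET.Element(m("mets"))
file_sec = ET.Element(m("fileSec"))
pdf = ET.SubElement(ET.SubElement(file_sec, m("fileGrp")), m("file"), ID="F1")
struct_map = ET.Element(m("structMap"))
article = ET.SubElement(struct_map, m("div"), TYPE="article")
# amdSec elements come before fileSec and structMap in a METS document.
mets.extend([
    make_amdsec("F1", ["techMD", "digiprovMD"], pdf),
    make_amdsec("ART1", ["digiprovMD"], article),
    file_sec,
    struct_map,
])
```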

Preservation Metadata (PREMIS) and METS: eJournal content stream

PREMIS elements used:
• objectIdentifier
• objectCategory
• preservationLevel
• size
• fixity (MD5, SHA-512)
• format (PRONOM)
• relationships, events and agents where necessary

Some of these values are also stored redundantly in the METS <file> element
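
The sketch below computes the fixity and size values listed for eJournal files with Python's `hashlib` and writes them twice: into a PREMIS `<objectCharacteristics>` element and as `SIZE`/`CHECKSUM` attributes for `<mets:file>`. It assumes PREMIS 2.0 element names and namespace (the eJournal stream used PREMIS 1.1, whose namespace differs), picks SHA-512 for the METS checksum by assumption, and takes the PRONOM identifier as an input rather than identifying the format. The helpers `digests`, `premis_characteristics` and `mets_file_attributes` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"  # PREMIS 2.0 namespace (assumed for this sketch)
ET.register_namespace("premis", PREMIS)


def p(tag):
    """Qualify a tag name with the PREMIS 2.0 namespace."""
    return f"{{{PREMIS}}}{tag}"


def digests(data: bytes) -> dict:
    """MD5 and SHA-512 digests, the two fixity algorithms named on the slide."""
    return {"MD5": hashlib.md5(data).hexdigest(),
            "SHA-512": hashlib.sha512(data).hexdigest()}


def premis_characteristics(data: bytes, pronom_puid: str) -> ET.Element:
    """PREMIS <objectCharacteristics> with fixity, size and a PRONOM format key."""
    chars = ET.Element(p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = "0"
    for algorithm, value in digests(data).items():
        fixity = ET.SubElement(chars, p("fixity"))
        ET.SubElement(fixity, p("messageDigestAlgorithm")).text = algorithm
        ET.SubElement(fixity, p("messageDigest")).text = value
    ET.SubElement(chars, p("size")).text = str(len(data))
    registry = ET.SubElement(ET.SubElement(chars, p("format")), p("formatRegistry"))
    ET.SubElement(registry, p("formatRegistryName")).text = "PRONOM"
    ET.SubElement(registry, p("formatRegistryKey")).text = pronom_puid
    return chars


def mets_file_attributes(file_id: str, data: bytes, algorithm: str = "SHA-512") -> dict:
    """The same size and one checksum, repeated as attributes of <mets:file>."""
    return {"ID": file_id, "SIZE": str(len(data)),
            "CHECKSUMTYPE": algorithm, "CHECKSUM": digests(data)[algorithm]}


data = b"hello"
chars = premis_characteristics(data, "fmt/18")  # placeholder PRONOM PUID
print(digests(data)["MD5"])  # 5d41402abc4b2a76b9719d911017c592
print(mets_file_attributes("F1", data)["SIZE"])  # 5
```

Deriving both copies from one `digests` call keeps the PREMIS record and the METS attributes from drifting apart.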

Preservation Metadata (PREMIS): relationships

PREMIS relationships:
• manifestationOf (between Manifestation and Article)
• containedInSubmission (between Manifestation and Submission)

PREMIS relationships between files (m-n relationships):
• migration
• uncompression
• modification

Relationships are always stored in <digiprovMD>; PREMIS records for files will have both techMD and digiprovMD.
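
Relationships between files are m-n: uncompression turns one container into many files, while a migration may combine several files into one. A hypothetical sketch of expanding such derivation steps into per-file relationship lists (each list would then become PREMIS relationship entries in that file's digiprovMD); the function name, the tuple layout and the file IDs are invented.

```python
from collections import defaultdict


def derivation_relationships(derivations):
    """Expand m-n derivation steps into per-file relationship lists.

    derivations: iterable of (sub_type, source_ids, target_ids).
    Returns {target_id: [(sub_type, source_id), ...]}.
    """
    relationships = defaultdict(list)
    for sub_type, sources, targets in derivations:
        for target in targets:
            for source in sources:
                relationships[target].append((sub_type, source))
    return dict(relationships)


steps = [
    ("uncompression", ["ZIP1"], ["IMG1", "IMG2"]),  # one container -> many files
    ("migration", ["IMG1", "IMG2"], ["PDF1"]),      # many files -> one file
]
rels = derivation_relationships(steps)
print(rels["PDF1"])  # [('migration', 'IMG1'), ('migration', 'IMG2')]
```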

Preservation Metadata (PREMIS): events

PREMIS events (on file level):
• integrityCheck
• formatIdentification
• validation
• wellformness
• propertyExtraction

PREMIS events (on representation level):
• metadataUpdate

Preservation Metadata (PREMIS): events

PREMIS events always have an agent

Events and agents are stored in each PREMIS record:

• If an event affects more than one object, it must be repeated in each object’s PREMIS record
• The same event identifier is used in every copy, indicating that it is the same event
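
A sketch of the "repeat the event, keep its identifier" rule with `xml.etree.ElementTree` and PREMIS 2.0 element names: one event (with its agent) is created once and deep-copied into the record of every affected object, so all copies share one `eventIdentifierValue`. The UUID identifier type, the "local" agent identifiers, the agent type and the helpers `make_event`, `make_agent` and `record_event` are assumptions; object elements are omitted from the records.

```python
import copy
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

PREMIS = "info:lc/xmlns/premis-v2"  # PREMIS 2.0 namespace
ET.register_namespace("premis", PREMIS)


def p(tag):
    """Qualify a tag name with the PREMIS 2.0 namespace."""
    return f"{{{PREMIS}}}{tag}"


def add_identifier(parent, prefix, id_type, value):
    """Append <{prefix}Identifier> with its Type and Value children."""
    ident = ET.SubElement(parent, p(f"{prefix}Identifier"))
    ET.SubElement(ident, p(f"{prefix}IdentifierType")).text = id_type
    ET.SubElement(ident, p(f"{prefix}IdentifierValue")).text = value
    return ident


def make_agent(agent_id, name, agent_type="software"):
    agent = ET.Element(p("agent"))
    add_identifier(agent, "agent", "local", agent_id)
    ET.SubElement(agent, p("agentName")).text = name
    ET.SubElement(agent, p("agentType")).text = agent_type
    return agent


def make_event(event_type, agent_id):
    """Create one event with a fresh identifier, linked to its agent."""
    event = ET.Element(p("event"))
    add_identifier(event, "event", "UUID", str(uuid.uuid4()))
    ET.SubElement(event, p("eventType")).text = event_type
    ET.SubElement(event, p("eventDateTime")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    add_identifier(event, "linkingAgent", "local", agent_id)
    return event


def record_event(records, event, agent):
    """Repeat the same event (same identifier) and its agent in every record."""
    for record in records:
        record.append(copy.deepcopy(event))
        record.append(copy.deepcopy(agent))


# Two file records affected by one validation run.
records = [ET.Element(p("premis")) for _ in range(2)]
agent = make_agent("agent-jhove", "JHOVE")
record_event(records, make_event("validation", "agent-jhove"), agent)
```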

Preservation Metadata (PREMIS) in METS

Content streams:
• eJournals: PREMIS 1.1; MODS 3.2; METS 1.4; JHOVE DTD
• Newspapers: PREMIS 2.0; MODS 3.3; METS 1.8
• Web Archiving: PREMIS 2.0; MODS 3.3; DC; METS 1.8

• Move to PREMIS 2.0
• Changes to AIP model

AIPs and PREMIS 2.0

Change of AIP model: newspapers need a second structMap (and a structLink)
• A hierarchy of AIPs is no longer possible; instead, one AIP per issue
• Manifestations are modelled as a <fileGrp> (several manifestations per AIP possible)

Support for container files (ZIP, WARC)
• Modelled as nested <file> elements; no PREMIS record for container files

No format-specific technical metadata is captured
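
Because container files get no PREMIS record while content files do, the rule can be checked over a file section. This sketch assumes PREMIS records are attached through the `ADMID` attribute and treats any `<mets:file>` with nested `<file>` children as a container; the function `admid_violations`, the `USE` labels of the two manifestation `<fileGrp>` elements and all IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def admid_violations(file_sec):
    """IDs of <mets:file> elements breaking the rule:
    containers (files with nested <file> children) carry no ADMID,
    every content file does."""
    violations = []
    for f in file_sec.iter(m("file")):
        is_container = f.find(m("file")) is not None
        has_admid = bool(f.get("ADMID"))
        if is_container == has_admid:
            violations.append(f.get("ID"))
    return violations


# One <fileGrp> per manifestation; the USE values are hypothetical labels.
file_sec = ET.Element(m("fileSec"))
master = ET.SubElement(file_sec, m("fileGrp"), USE="master")
zip_file = ET.SubElement(master, m("file"), ID="ZIP1")
ET.SubElement(zip_file, m("file"), ID="IMG1", ADMID="DP-IMG1")
ET.SubElement(zip_file, m("file"), ID="IMG2")  # content file missing its ADMID
access = ET.SubElement(file_sec, m("fileGrp"), USE="access")
ET.SubElement(access, m("file"), ID="PDF1", ADMID="DP-PDF1")
print(admid_violations(file_sec))  # ['IMG2']
```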

METS and PREMIS 2.0

METS and PREMIS 2.0: use of the new METS schema versions:
• <mets:mdWrap MDTYPE="PREMIS:OBJECT">
• <premis:object xsi:type="premis:file"> instead of objectCategory
• Only <digiprovMD> is used
  • Agent, object and event in separate <digiprovMD> elements within the same <amdSec>
  • The PREMIS record should be self-contained
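
A minimal sketch of this PREMIS 2.0 packaging with `xml.etree.ElementTree`: object, event and agent each go into their own `<digiprovMD>` within one `<amdSec>`, using the `PREMIS:OBJECT`, `PREMIS:EVENT` and `PREMIS:AGENT` values of `MDTYPE`, and the object is typed with `xsi:type="premis:file"`. The helpers `add_digiprov` and `premis_file_object` and the IDs are hypothetical; event and agent bodies, and mandatory object elements such as `objectCharacteristics`, are left empty.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def add_digiprov(amd_sec, section_id, mdtype, payload):
    """Wrap one PREMIS entity in its own <digiprovMD> inside the shared <amdSec>."""
    section = ET.SubElement(amd_sec, f"{{{METS}}}digiprovMD", ID=section_id)
    wrap = ET.SubElement(section, f"{{{METS}}}mdWrap", MDTYPE=mdtype)
    ET.SubElement(wrap, f"{{{METS}}}xmlData").append(payload)
    return section


def premis_file_object(object_id):
    """PREMIS 2.0 file object: xsi:type replaces PREMIS 1.1's objectCategory."""
    obj = ET.Element(f"{{{PREMIS}}}object", {f"{{{XSI}}}type": "premis:file"})
    ident = ET.SubElement(obj, f"{{{PREMIS}}}objectIdentifier")
    ET.SubElement(ident, f"{{{PREMIS}}}objectIdentifierType").text = "local"
    ET.SubElement(ident, f"{{{PREMIS}}}objectIdentifierValue").text = object_id
    return obj


amd = ET.Element(f"{{{METS}}}amdSec", ID="AMD-F1")
add_digiprov(amd, "DP-F1-OBJECT", "PREMIS:OBJECT", premis_file_object("F1"))
add_digiprov(amd, "DP-F1-EVENT", "PREMIS:EVENT", ET.Element(f"{{{PREMIS}}}event"))
add_digiprov(amd, "DP-F1-AGENT", "PREMIS:AGENT", ET.Element(f"{{{PREMIS}}}agent"))
print(ET.tostring(amd, encoding="unicode"))
```

The `premis:` prefix inside the `xsi:type` value only resolves because the serializer declares that prefix for the PREMIS elements; a production writer should make sure the declaration is present.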

METS and PREMIS 2.0

Extended list of event types:
• deselection: files which are defined in the AIP descriptor but never ingested (no FLocat element)
• metadataExtraction vs. propertyExtraction

Extended list of relationship types (relationshipSubType):
• modification vs. manipulation
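
The deselection rule (a file defined in the AIP descriptor but without an `FLocat`) can be checked mechanically. This sketch assumes every ingested `<mets:file>`, including one nested in a container, carries its own `FLocat`; the functions `deselected_files` and `deselection_events` are hypothetical, and the events are plain dictionaries rather than PREMIS XML.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def deselected_files(mets_root):
    """IDs of <mets:file> entries with no <mets:FLocat>, i.e. never ingested."""
    return [f.get("ID") for f in mets_root.iter(m("file"))
            if f.find(m("FLocat")) is None]


def deselection_events(mets_root):
    """One dict-shaped deselection event per file without an FLocat."""
    return [{"eventType": "deselection", "objectId": file_id}
            for file_id in deselected_files(mets_root)]


doc = ET.fromstring(
    '<mets xmlns="http://www.loc.gov/METS/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<fileSec><fileGrp>'
    '<file ID="F1"><FLocat LOCTYPE="URL" xlink:href="page1.html"/></file>'
    '<file ID="F2"/>'
    '</fileGrp></fileSec></mets>'
)
print(deselection_events(doc))  # [{'eventType': 'deselection', 'objectId': 'F2'}]
```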

METS and PREMIS 2.0

Problems:
• Validation
  • Using controlled vocabularies
  • Considering dependencies between METS and PREMIS
• Standardized workflow for creating METS and PREMIS for all content streams
  • Currently: specific implementations for each content stream
• Extending the AIP model
  • Preservation metadata for metadata records

Thanks

Markus Enders

The British Library

Markus.Enders@bl.uk