Date post: 24-Dec-2015
PREMIS

Rathachai Chawuthai [email protected]

Information Management, CSIM / AIT

Issued document 1.0

2

Agenda

• Preservation Metadata
• PREMIS Overview
• Data Dictionary Conventions
• PREMIS Data Model
• The Data Dictionary
• PREMIS in Use

3

Preservation Metadata

4

• Metadata is often defined as “data about data”.
• It describes one or more characteristics of the data, such as:
  – Name, description, purpose, creation date and time, creator, and other basic information
• For example:
  – Library catalogues: a small card records a book’s title, author, subject, category, and shelf, describing a resource in the library
• Furthermore, it can be said that:
  – “Metadata is commonly understood as an amplification of traditional bibliographic cataloguing practices in an electronic environment.”

Metadata

Metadata Meaning

wikipedia.org

5

• Descriptive
  – Identifies and describes a resource, e.g. title and author
• Administrative
  – Helps manage a resource, e.g. version number, archiving data, technical information, and rights management
• Structural
  – Records relationships within and among resource objects, e.g. a web page made up of HTML, image, CSS, and JavaScript files plus links to other files

Metadata

Metadata Categories

wikipedia.org

6

Preservation Metadata

Overview

• It is “an essential component of most digital preservation strategies”. [Wikipedia]

• Its basic requirements are: [OCLC]
  – To store technical information that supports preservation decisions and actions
  – To document actions taken, such as migration
  – To record the effects of preservation strategies
  – To ensure the authenticity of digital resources over the long term
  – To note information about collection management and rights management
• Its basic functional objectives are: [OCLC]
  – Providing knowledge about the actions needed to maintain digital resources over the long term
  – Ensuring that digital resources can be rendered as they were originally

OCLC.org, wikipedia.org

7

Preservation Metadata

Basic features

According to the preservation requirements, preservation metadata should include the following information:
• Provenance
  – Describes the history of creation, ownership, access, and change
• Authenticity
  – Ensures trustworthiness (does the digital resource render as it did originally?)
• Preservation activities
  – Records the processes that support preservation, such as migration
• Technical environment
  – Gives the name and version of the hardware, platform, OS, and software required to render digital resources
• Rights management
  – States the intellectual property rights and agreements that must be observed when executing preservation processes, e.g. does the creator allow his/her work to be copied or not?

OCLC.org, usenix.org, wikipedia.org

8

Preservation Metadata

Example

• Date
• Transcriber
• Producer
• Capture Device
• Capture Details
• Change History
• Validation Key
• Encryption
• Watermark
• Resolution
• Compression
• Source
• Color
• Color Management
• Color Bar/Gray-scale Bar
• Control Targets

16 preservation metadata elements (recommended by oclc.org, May 1998)

OCLC.org

9

Preservation Metadata Framework

Overview

• A framework is an overview or description of the types and associations of digital preservation metadata
• Following OCLC/RLG, the framework should meet 3 requirements
  – Comprehensive
    • The metadata fully covers the information required by the big picture of digital preservation data structures and processes
  – Structured
    • Preservation metadata should be represented in a structured format that both humans and machines can understand clearly
  – Broadly applicable
    • Digital object types, preservation activities, and their relationships should be flexible enough to implement in the real world, e.g. across institutions

OCLC.org

10

Preservation Metadata Framework

Overview

To meet these requirements, the framework should follow these 3 steps:
1. Design a metadata model that supports the content model, long-term accessibility, and preservation activities.
2. Think about future interoperability, then modify the model to support metadata exchange and resource sharing.
3. Improve the model so that it is flexible enough to integrate with external archives.

OCLC.org

11

Preservation Metadata Framework

Example

[Figure: example preservation metadata framework, modified from the AHDS Preservation Metadata Framework. Labels shown in the diagram: Technical Description (Persistent ID; File Description; Text: Format, Version, Structure division; Image: Format, Resolution, Size); Management Description (Created date, Storage information); Software Environment (Application required, OS Name, Version); Functionalities (Ingest, Migrate, Access, Share; Agent, Date, Software Version).]

AHDS.ac.uk

12

Summary

• Metadata is “Structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” [LOC]

• Preservation Metadata is “A metadata that supports and documents the digital preservation process” [LOC]

• Preservation Metadata Framework is “An important contribution toward shaping an international consensus on the metadata requirements of archived digital objects and consolidating expertise on the use of metadata to support digital preservation” [OCLC]

LOC.gov, OCLC.org

13

PREMIS Overview

14

• PREservation Metadata: Implementation Strategies
• Sponsored by the Library of Congress (LOC)
• People usually use “PREMIS” to refer to the Data Dictionary
• Represented in XML format

PREMIS Overview

What?

LOC.gov, wikipedia.org

15

• A set of semantic units
• Metadata for a digital object, so that it
  – Can be read from media
  – Can be rendered
  – Is stored securely
  – Has its format changes tracked
• Metadata scope
  – Format-specific: e.g. audio, video, image, …
  – Implementation-specific: how to access it (by application)
  – Descriptive metadata: data properties, as in MARC and DC
  – Detailed information (for media or hardware)
  – Agent information: e.g. people, organizations, or software
  – Rights information: e.g. license, permission

PREMIS Overview

PREMIS Data Dictionary

PREMIS from LOC.gov

16

PREMIS Overview

Where is PREMIS?

PREMIS presents itself as a coordinator among several types of metadata so that preservation functions can be performed on all digital resources.

Thus, PREMIS is a small core at the heart of preservation metadata

PREMIS from LOC.gov

17

• Administrative metadata that supports the process of digital preservation
• Information provided to support preservation management
  – Technical information (characteristics)
    • E.g. creator, creation date and time, creating software, …
  – Information about actions on a digital object
    • E.g. ingest, migrate, verify, …
  – Relationships
    • Structural: show how objects are put together
    • Derivative: result from preservation actions
  – Rights
    • E.g. rights and agreement metadata associated with preservation

PREMIS Overview

PREMIS data dictionary covers:

PREMIS from LOC.gov

18

• Supports managing a repository system
  – Long-term preservation
  – Repository migration (to another repository)
• Scope
  – Repository design
  – Repository evaluation
  – Exchange of archived ‘information packages’ among repositories
• Development view
  – Use PREMIS as a guideline for what information should be recorded

PREMIS Overview

Usefulness

PREMIS from LOC.gov

19

• Supports data preservation by recording
  – Inhibitors
    • Passwords, encryption, … needed in order to access digital objects
  – Digital provenance
    • Records changes of object format, e.g. .DOC to .PDF
    • Contains the application, version, environment, … needed in order to render digital objects
  – Significant properties (if important)
    • The object’s characteristics, e.g. font, formatting, color, …
    • Look and feel
  – Rights
    • Copyright status, license terms

PREMIS Overview

Use PREMIS if you have to

PREMIS from LOC.gov

20

Data Dictionary Conventions

21

• Information a repository uses to support the digital preservation process
  – Guidelines/recommendations that support the preservation process, such as creation, use, and management
• This information is defined as:
  – Things that most working repositories have in common and need in order to support digital preservation

Data Dictionary Conventions

Data dictionary

PREMIS from LOC.gov

22

• PREMIS prefers the term “Semantic Unit” to “Metadata Element”.
• A semantic unit is an entry in the data dictionary.
• A semantic unit is defined as a property of an entity in the PREMIS data model.
• Semantic units support the recording of relationships between objects.
• Examples
  – Identifier, size, format, environment, software, …

Data Dictionary Conventions

Semantic Unit

PREMIS from LOC.gov

23

Data Dictionary Conventions

Example : Size

PREMIS from LOC.gov

24

Data Dictionary Conventions

Container

What should we do if the value of a semantic unit has to carry several meanings?

Software = “Windows|XP|OperatingSystem”

The data dictionary allows the concept of a container, which groups a set of related semantic units together.

Container: Software

Components:
- swName = “Windows”
- swVersion = “XP”
- swType = “OperatingSystem”
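The difference between a packed value and a container can be sketched in Python. This is only an illustration, not part of the PREMIS specification: the `Software` class is a hypothetical stand-in for the container, while the component names `swName`, `swVersion`, and `swType` are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Software:
    """Hypothetical container grouping three related semantic units."""
    swName: str
    swVersion: str
    swType: str


# Flat form: one string carries three meanings and must be parsed before use.
flat_value = "Windows|XP|OperatingSystem"

# Container form: each component is a separately addressable semantic unit.
software = Software(*flat_value.split("|"))

print(software.swName)     # Windows
print(software.swVersion)  # XP
print(software.swType)     # OperatingSystem
```

With the container, a consumer can ask for the version directly instead of relying on the position of a field inside a delimited string.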

25

Data Dictionary Conventions

Example : Software

PREMIS from LOC.gov

26

• New in PREMIS 2.0
• Contains externally defined semantic units
• Allows PREMIS to be extended with semantic units that are more granular, non-core, or out of scope of the PREMIS data dictionary
• Data in the container may replace, refine, or be additional to the appropriate PREMIS semantic unit
• One schema per extension; if more schemas are needed, the extension element needs to be repeated

Data Dictionary Conventions

Extension Container (General)

PREMIS from LOC.gov

27

Data Dictionary Conventions

Example : <objectCharacteristicsExtension>

Normally, <objectCharacteristicsExtension> has information following PREMIS schema like:

PREMIS louis.xml from LOC.gov

28

Data Dictionary Conventions

Example : <objectCharacteristicsExtension>

If more information is needed apart from the PREMIS schema, information from other schemas (e.g. METS) can be placed in <objectCharacteristicsExtension>

PREMIS louis.xml from LOC.gov

29

PREMIS Data Model

30

PREMIS Data Model

Data Model

Including:

• Entity
  – A thing relevant to digital preservation that is described by preservation metadata, such as Intellectual Entities, Objects, Events, Rights, and Agents
• Property of an entity (Semantic Unit)
  – Such as identifier, size, format, environment, software
• Relationship between entities
  – Links entities together, e.g. isPartOf, isSourceOf, isDerivedFrom, …
  – For example:
    • Document X2 is a newer version of document X1
    • Document AA is a chapter of document A
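The two relationship examples can be written down as simple subject, relationship, target statements. The sketch below is an assumption-laden illustration: the `Relationship` class and `related` helper are hypothetical, and mapping "newer version" to `isDerivedFrom` is a guess at the intended relationship name rather than something the slide states.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    subject: str    # identifier of the object the statement is about
    predicate: str  # relationship name, e.g. "isPartOf"
    target: str     # identifier of the related object


relationships = [
    Relationship("X2", "isDerivedFrom", "X1"),  # X2 is a newer version of X1 (assumed mapping)
    Relationship("AA", "isPartOf", "A"),        # AA is a chapter of A
]


def related(subject: str, predicate: str) -> List[str]:
    """Return the identifiers linked from `subject` by `predicate`."""
    return [r.target for r in relationships
            if r.subject == subject and r.predicate == predicate]


print(related("AA", "isPartOf"))       # ['A']
print(related("X2", "isDerivedFrom"))  # ['X1']
```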

31

PREMIS Data Model

Entities

PREMIS from LOC.gov

32

• May also be called “Bibliographic Entities”
• A set of content that is considered a single intellectual unit for purposes of management and description
  – E.g. a book, map, photograph, or database
• Not fully described in the PREMIS Data Dictionary
  – It can be described by other metadata standards, such as Dublin Core.

Intellectual Entities

Overview

Intellectual

Objects

Rights

Agents

Events

PREMIS tutorial from LOC.gov

33

• Objects are what is stored and managed in the preservation repository
• E.g.
  – Intellectual Entity: “Thailand Map”
    • Object Entity: image file
• 3 kinds of object
  – File
    • A computer file, like a PDF or JPEG
  – Representation
    • A set of files that work together
    • E.g. a web page including HTML, image, CSS, and JavaScript files
  – Bitstream
    • A part of a file
    • E.g. a single frame image in a video file

Object Entities

Overview


PREMIS tutorial from LOC.gov

34

• Chapter1.pdf is a File
• Chapter1.pdf + Chapter2.pdf + Chapter3.pdf is a Representation of a book with 3 chapters
• A TIFF file contains a header and 2 images
  – This means there are 2 Bitstreams, one per image
  – Each bitstream (image) has its own set of semantic units

Object Entities

Example
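The three kinds of object nest naturally, which a small Python model can make concrete. The class names mirror the PREMIS object categories, but the classes themselves and the file name `scan.tiff` are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    label: str  # a part of a file, e.g. one image inside a TIFF


@dataclass
class File:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Representation:
    description: str  # a set of files that work together
    files: List[File] = field(default_factory=list)


# A book with 3 chapters is a Representation made of 3 Files.
book = Representation(
    "Book with 3 chapters",
    [File("Chapter1.pdf"), File("Chapter2.pdf"), File("Chapter3.pdf")],
)

# A TIFF file holding 2 images contains 2 Bitstreams.
tiff = File("scan.tiff", [Bitstream("image 1"), Bitstream("image 2")])

print(len(book.files))       # 3
print(len(tiff.bitstreams))  # 2
```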

35

Object Entities

Example

Intellectual Entity: Thailand Map

• Object 1 (Representation): a web page that contains 3 files
  – HTML
  – CSS
  – JPEG
• Object 2 (File): 1 JPEG file
• Object 3 (File): 1 TIFF file that includes 3 bitstreams, one image per map layer
  – Province
  – Mountain
  – River

Examples of object types that could be used to preserve the Thailand Map

36

• a unique identifier for the object (type and value),
• fixity information such as a checksum (message digest) and the algorithm used to derive it,
• the size of the object,
• the format of the object, which can be specified directly or by linking to a format registry,
• the original name of the object,
• information about its creation,
• information about inhibitors,
• information about its significant properties,
• information about its environment,
  – e.g. OS: MacOS, browser: Safari
• where and on what medium it is stored,
• digital signature information,
• relationships with other objects and other types of entities.

Object Entities

Data Dictionary

PREMIS from LOC.gov
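Fixity information is the easiest item in this list to demonstrate. The sketch below uses Python's standard `hashlib` to compute a message digest and later re-check it; the helper names `fixity_record` and `fixity_check` and the sample bytes are hypothetical, and the dictionary keys borrow the PREMIS names `messageDigestAlgorithm`, `messageDigest`, and `size` purely as labels.

```python
import hashlib


def fixity_record(data: bytes, algorithm: str = "sha256") -> dict:
    """Compute the digest, the algorithm used to derive it, and the size in bytes."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": digest,
        "size": len(data),
    }


def fixity_check(data: bytes, record: dict) -> bool:
    """Recompute the record with the stored algorithm and compare it to the stored one."""
    return fixity_record(data, record["messageDigestAlgorithm"]) == record


original = b"Thailand map, layer 1"
record = fixity_record(original)

print(fixity_check(original, record))         # True
print(fixity_check(original + b"!", record))  # False
```

Storing the algorithm next to the digest matters: without it, a later fixity check cannot know how to recompute the value.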

37

Object Entities

Example

Object example of TIFF file

XML format

PREMIS louis.xml from LOC.gov

38

Object Entities

Example

Object example of TIFF file (in table format), part 1

PREMIS from LOC.gov

39

Object Entities

Example

Object example of TIFF file (in table format), part 2

PREMIS from LOC.gov

40

Object Entities

Example

Object example of TIFF file (in table format), part 3

PREMIS from LOC.gov

41

• An action that affects an object in the repository
  – The action must have at least one object and one agent recorded
  – An event must have an outcome (the result of the event), such as success or failure.

Event Entities

Overview


PREMIS tutorial from LOC.gov

42

Event Entities

Event Type (part 1)

Event Type | Description
capture | the process whereby a repository actively obtains an object
compression | the process of coding data to save storage space or transmission time
creation | the act of creating a new object
deaccession | the process of removing an object from the inventory of a repository
decompression | the process of reversing the effects of compression
decryption | the process of converting encrypted data to plaintext
deletion | the process of removing an object from repository storage

PREMIS from LOC.gov

43

Event Entities

Event Type (part 2)

Event Type | Description
digital signature validation | the process of determining that a decrypted digital signature matches an expected value
dissemination | the process of retrieving an object from repository storage and making it available to users
fixity check | the process of verifying that an object has not been changed in a given period
ingestion | the process of adding objects to a preservation repository
message digest calculation | the process by which a message digest (“hash”) is created
migration | a transformation of an object creating a version in a more contemporary format

PREMIS from LOC.gov

44

• a unique identifier for the event (type and value),
• the type of event (creation, ingestion, migration, etc.),
• the date and time the event occurred,
• a detailed description of the event,
• a coded outcome of the event (result of the event: success | fail | …),
• a more detailed description of the outcome,
• agents involved in the event and their roles,
• objects involved in the event and their roles.

Event Entities

Data dictionary

PREMIS from LOC.gov
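To show how these items fit together, here is a minimal sketch that builds an event record with Python's standard `xml.etree.ElementTree`. The element names follow the PREMIS 2.x naming pattern (`eventIdentifier`, `eventType`, `eventOutcomeInformation`, `linkingAgentIdentifier`, and so on) and the namespace is the PREMIS v2 one, but the output is not validated against the schema here. The helpers `q`, `add_identifier`, and `build_event`, the identifier type `local`, and all sample values are assumptions made for illustration; the optional detail and role fields are left out.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"  # PREMIS 2.x namespace
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a PREMIS tag name."""
    return f"{{{PREMIS_NS}}}{tag}"


def add_identifier(parent, prefix, id_type, id_value):
    """Append a <prefix>Identifier container with its Type and Value components."""
    container = ET.SubElement(parent, q(f"{prefix}Identifier"))
    ET.SubElement(container, q(f"{prefix}IdentifierType")).text = id_type
    ET.SubElement(container, q(f"{prefix}IdentifierValue")).text = id_value
    return container


def build_event(event_id, event_type, when, outcome, agent_id, object_id):
    """Build one event element linking an agent and an object (illustrative only)."""
    event = ET.Element(q("event"))
    add_identifier(event, "event", "local", event_id)
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = when
    outcome_info = ET.SubElement(event, q("eventOutcomeInformation"))
    ET.SubElement(outcome_info, q("eventOutcome")).text = outcome
    add_identifier(event, "linkingAgent", "local", agent_id)
    add_identifier(event, "linkingObject", "local", object_id)
    return event


event = build_event("E001", "ingestion", "2010-01-15T10:30:00",
                    "success", "A001", "O001")
print(ET.tostring(event, encoding="unicode"))
```

Note that the identifier is itself a container with a type and a value, which is the same container convention described earlier for semantic units.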

45

Event Entities

Example : Validation

PREMIS tutorial from LOC.gov

46

• Actor, e.g. a person, organization, or software
• Related metadata standards, e.g. FOAF, vCard, eduPerson, …
• Note: an Agent can have many roles
  – A role does not belong to the Agent
  – It is defined by the Event or Rights entity that links to the Agent

Agent Entities

Overview

PREMIS tutorial from LOC.gov

47

• a unique identifier for the agent (type and value),
• the agent's name,
• designation of the type of agent (person, organization, software).

Agent Entities

Data dictionary

PREMIS from LOC.gov

48

Agent Entities

Example

Adobe Reader

PREMIS tutorial from LOC.gov

49

• Information about rights and permissions that are directly relevant to preserving objects in the repository
  – Rights: assertions of one or more rights or permissions pertaining to a Digital Object and/or an Agent
• Example:
  – John Hebeler grants the AIT digital repository permission to make 10 copies of Semantic_Web_Programming.pdf for preservation purposes
• Pattern
  – Agent A grants permission B to the repository in regard to object C.

Rights Entities

Overview

PREMIS tutorial from LOC.gov
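The "Agent A grants permission B in regard to object C" pattern maps directly onto a small record type. The sketch below is a hypothetical Python model of the slide's example; the class `RightsStatement`, its field names, and the act name `replicate` are assumptions made for illustration and are not PREMIS element names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsStatement:
    granting_agent: str  # Agent A
    permitted_act: str   # permission B
    restriction: str     # any limit on the permitted act
    obj: str             # object C

    def describe(self) -> str:
        """Render the statement using the pattern from the slide."""
        return (f"{self.granting_agent} grants the repository permission to "
                f"{self.permitted_act} ({self.restriction}) "
                f"in regard to {self.obj}")


statement = RightsStatement(
    granting_agent="John Hebeler",
    permitted_act="replicate",
    restriction="no more than 10 copies",
    obj="Semantic_Web_Programming.pdf",
)

print(statement.describe())
```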

50

• a unique identifier for the rights statement (type and value),
• whether the basis for claiming the right is copyright, license or statute,
• more detailed information about the copyright status, license terms, or statute, as applicable,
• the action(s) that the rights statement allows,
• any restrictions on the action(s),
• the term of grant, or time period in which the statement applies,
• the object(s) to which the statement applies,
• agents involved in the rights statement and their roles.

Rights Entities

Data dictionary

PREMIS from LOC.gov

51

Rights Entities

Example : Copyright

PREMIS tutorial from LOC.gov

52

The Data Dictionary

53

The Data Dictionary

Example Data dictionary of semantic unit

Semantic Unit

Name of the semantic unit

PREMIS from LOC.gov

54

The Data Dictionary

Example Data dictionary of semantic unit

Semantic Component

If the unit contains child components, they are listed here. Otherwise, “None” is displayed.

PREMIS from LOC.gov

55

The Data Dictionary

Example Data dictionary of semantic unit

Definition

Description of the semantic unit

PREMIS from LOC.gov

56

The Data Dictionary

Example Data dictionary of semantic unit

Rationale

The reason PREMIS includes this semantic unit

PREMIS from LOC.gov

57

The Data Dictionary

Example Data dictionary of semantic unit

Data constraint

Specification of the values the semantic unit may take. For example:
• None (no constraint)
• Integer (the value must be an integer)
• Value from controlled vocabulary (the value must come from a controlled vocabulary)
• Container (the unit is a container)

PREMIS from LOC.gov
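A repository could enforce these four kinds of data constraint with a simple check. The following is a sketch under stated assumptions: the function `satisfies`, the constraint labels it accepts, the choice of a `dict` to stand for a container, and the sample vocabulary are all hypothetical and not taken from the data dictionary.

```python
def satisfies(value, constraint, vocabulary=None):
    """Return True if `value` meets the named data constraint."""
    if constraint == "None":
        return True  # no constraint: any value is accepted
    if constraint == "Integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if constraint == "Controlled vocabulary":
        return value in (vocabulary or ())
    if constraint == "Container":
        # assumption: a container is modelled as a dict of component units
        return isinstance(value, dict)
    raise ValueError(f"unknown constraint: {constraint}")


event_types = {"capture", "ingestion", "migration"}  # sample vocabulary

print(satisfies(2038937, "Integer"))                                 # True
print(satisfies("2 MB", "Integer"))                                  # False
print(satisfies("ingestion", "Controlled vocabulary", event_types))  # True
print(satisfies("painting", "Controlled vocabulary", event_types))   # False
print(satisfies({"swName": "Windows"}, "Container"))                 # True
```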

58

The Data Dictionary

Example Data dictionary of semantic unit

Object category

This section describes rules that depend on the object type:
• Representation
• File
• Bitstream

PREMIS from LOC.gov

59

The Data Dictionary

Example Data dictionary of semantic unit

Applicability

States whether this semantic unit is applicable to the object type in question. If “Not applicable”, the semantic unit can be left out of the metadata. In this case, the semantic unit “Size” applies only to the object types “File” and “Bitstream”, not to “Representation”.

PREMIS from LOC.gov

60

The Data Dictionary

Example Data dictionary of semantic unit

Example

An example of a value this semantic unit may take.

PREMIS from LOC.gov

61

The Data Dictionary

Example Data dictionary of semantic unit

Repeatability

Indicates whether the semantic unit can take multiple values under the same container.

“Not repeatable” = can be used at most once.

“Repeatable” = can be used more than once.

PREMIS from LOC.gov

62

The Data Dictionary

Example Data dictionary of semantic unit

Obligation

Indicates whether the semantic unit must be recorded in the metadata.

“Mandatory” = It is required.

“Optional” = It is not necessary to use.

PREMIS from LOC.gov

63

The Data Dictionary

Example Data dictionary of semantic unit

Creation / Maintenance Note

Further detail on how the values are created and/or updated.

In this case, the value is automatically generated by the repository.

PREMIS from LOC.gov

64

The Data Dictionary

Example Data dictionary of semantic unit

Usage notes

provides information regarding the use of the semantic unit.

PREMIS from LOC.gov

65

The Data Dictionary

Example list of PREMIS Semantic Unit

Name : Name of semantic unit (It can be a container, if it has component units)

PREMIS from LOC.gov

66

The Data Dictionary

Example list of PREMIS Semantic Unit

M : Mandatory (Must define)

O : Optional (Not necessary to define)

PREMIS from LOC.gov

67

The Data Dictionary

Example list of PREMIS Semantic Unit

R : Repeatable (can be used more than once)

NR : Not repeatable (can be used at most once)

PREMIS from LOC.gov

68

The Data Dictionary

Example list of PREMIS Semantic Unit

Ends with [a,b] : Applies to specific object types, e.g. representation and file

None : Applies to all object types

PREMIS from LOC.gov
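The M/O, R/NR, and [object type] markers combine into a rule that software can check. The sketch below is illustrative only: `UnitRule`, `RULES`, and `check` are hypothetical names, records are modelled as a dict mapping a unit name to its list of values, and the two rules shown (`objectIdentifier` as mandatory and repeatable, `size` as optional, not repeatable, and limited to file and bitstream) should be verified against the data dictionary before being relied on.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

ALL_CATEGORIES = frozenset({"representation", "file", "bitstream"})


@dataclass(frozen=True)
class UnitRule:
    mandatory: bool   # M (True) or O (False)
    repeatable: bool  # R (True) or NR (False)
    applies_to: FrozenSet[str] = ALL_CATEGORIES


# Illustrative rules; check the data dictionary for the authoritative values.
RULES = {
    "objectIdentifier": UnitRule(mandatory=True, repeatable=True),
    "size": UnitRule(mandatory=False, repeatable=False,
                     applies_to=frozenset({"file", "bitstream"})),
}


def check(category: str, record: Dict[str, List[str]]) -> List[str]:
    """Return a list of rule violations for one object's metadata record."""
    problems = []
    for name, rule in RULES.items():
        values = record.get(name, [])
        if category not in rule.applies_to:
            if values:
                problems.append(f"{name}: not applicable to {category}")
            continue
        if rule.mandatory and not values:
            problems.append(f"{name}: mandatory but missing")
        if not rule.repeatable and len(values) > 1:
            problems.append(f"{name}: not repeatable but given {len(values)} times")
    return problems


print(check("file", {"objectIdentifier": ["obj-1"], "size": ["2038937"]}))
print(check("representation", {"objectIdentifier": ["obj-2"], "size": ["10"]}))
print(check("file", {"size": ["1", "2"]}))
```

The first call returns an empty list, the second reports that `size` does not apply to a representation, and the third reports both a missing mandatory unit and a repeated non-repeatable one.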

69

• Although descriptive metadata is important for describing Intellectual Entities, it is not a focus of PREMIS because:
  – Well-defined standards already exist, such as MARC, MODS, and Dublin Core.
  – Descriptive metadata is often domain specific. Thus, each domain should use an appropriate standard.

Data Dictionary Conventions

Limitation of Data Dictionary

PREMIS from LOC.gov

70

PREMIS in Use

71

• Institution
  – University of North Carolina at Chapel Hill
• Description
  – The Carolina Digital Repository (CDR) is being designed as a repository for material in electronic formats produced by members of the University of North Carolina at Chapel Hill community. Its chief purpose is to provide for the long-term preservation of such materials. By preservation we mean the ability to ingest the material, index and search it, replicate it, and keep it safe from alteration. The project is recording and/or mapping to PREMIS elements as the repository with a preservation focus is built.
• Link
  – http://www.lib.unc.edu/cdr/
• Tool
  – Locally developed Java web apps plus Fedora Commons, iRODS data grid, Solr search engine and the Duke Data Accessioner

PREMIS in use

Carolina Digital Repository

PREMIS registry from LOC.gov

72

• Institution
  – The National Archives of Sweden
• Description
  – PREMIS is used for processing and storing digital objects in a digital repository. The National Archives is developing a transfer model for digital objects created in our scanning projects. A function is being developed for packaging and storing data about the digital objects in our archival information system ARKIS, partly stored as PREMIS metadata. The application is in use for storing data. An application for exporting PREMIS data as XML will be developed in the future.
• Tool
  – ESSearch

PREMIS in use

Creating a digital repository at the Swedish National Archives using PREMIS

PREMIS registry from LOC.gov

73

• Institution
  – Florida Center for Library Automation
• Description
  – The FCLA Digital Archive is a preservation repository for the use of the libraries of the public universities of Florida. The FCLA Digital Archive uses a locally-developed software application called DAITSS, which implements most of the PREMIS data elements.
• Link
  – http://www.fcla.edu/digitalArchive/
• Tool
  – The archive is in production as of November 2005. Dissemination (DIPs) with PREMIS-conformant metadata is expected by July 2006.
• Document
  – http://www.fcla.edu/digitalArchive/daInfo.htm

PREMIS in use

FCLA Digital Archive and DAITSS

PREMIS registry from LOC.gov

74

• Institution
  – National Archives of Scotland
• Description
  – The NAS is preparing for the ingest of digital objects from the Scottish Executive (the government of Scotland) and the Scottish Courts. An application is under development that aims to be compliant with OAIS, PD0008 and PREMIS to meet this requirement.
• Tool
  – The DDA aims to implement the DROID API of PRONOM, developed by the National Archives, among other tools.

PREMIS in use

Digital Data Archive (DDA) Project

PREMIS registry from LOC.gov

75

?

76

References

• http://www.oclc.org/research/activities/past/orprojects/pmwg/presmeta_wp.pdf
  Preservation Metadata for Digital Objects: A Review of the State of the Art. OCLC/RLG Working Group on Preservation Metadata, January 31, 2001
• http://en.wikipedia.org/wiki/Metadata
• http://en.wikipedia.org/wiki/Preservation_metadata
• http://www.usenix.org/event/tapp09/tech/full_papers/factor/factor.pdf
  Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage. Michael Factor, Ealan Henis, Dalit Naor, Simona Rabinovici-Cohen, Petra Reshef, Shahar Ronen (IBM Research Lab in Haifa, Israel) and Giovanni Michetti, Maria Guercio (University of Urbino, Italy)
• http://www.ahds.ac.uk/preservation/preservation-metadata-review.pdf
  AHDS Preservation Metadata Framework. Raivo Ruusalepp, Estonian Business Archives, Ltd, September 2002
• http://www.loc.gov/standards/premis/understanding-premis.pdf
• http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
• http://www.loc.gov/standards/premis/premis-registry.php
• http://www.loc.gov/standards/premis/tutorials.html
• http://www.loc.gov/standards/premis/louis-2-0.xml

