
PDF/A: A Preservation Format

Date post: 03-Nov-2014
Description:
A presentation by Geof Huth on the PDF/A preservation format presented at a meeting of the Mid-Atlantic Regional Archives Conference in Bethlehem, Pennsylvania, on October 21, 2011. This presentation puts PDF/A in the context of a digital preservation program and explains the uses of its format and some of the details of this international standard.
Transcript
Page 1: PDF/A: A Preservation Format

+PDF/A: A Preservation Format

Mid-Atlantic Regional Archives Conference

21 October 2011

Geof Huth [email protected]

Page 2: PDF/A: A Preservation Format

+File Format Confusion

From 5,000 to 15,000 extant file formats

Most are proprietary

The numbers add complexity to preservation

Real preservation formats are few in number

And we can really count on none of them

Page 3: PDF/A: A Preservation Format

+Two General Classes of Formats

Proprietary

Controlled by one company

Underlying code is a trade secret

If the company goes under, the file format becomes obsolete

Open

Controlled by a standards body, a consortium, wiki-like bodies

Code is free and open to all

In absence of an “owner,” can still use the code to make a reader

Neither Guarantees Preservation

But open formats give you an opening to preservation

Page 4: PDF/A: A Preservation Format

+Proprietary Formats

Tend to be rich in features

Limited readers for each format

Limited ability to exchange data

Difficult for long-term accessibility

Greater associated costs

Page 5: PDF/A: A Preservation Format

+Advantages of Open Formats

More choice in what application to use

Better exchange of data

Better support of long-term preservation

Possible lower costs

Ability to create own readers

Page 6: PDF/A: A Preservation Format

+Format/Software Confusion

Software

Creates a file in the format

Reads the file for you

Allows you to interact with the file

Format

Is the specific technical form in which a certain file exists

Can be created by one software product or many

Examples

Adobe Acrobat (and many others) vs PDF

Microsoft Word vs .doc (and .docx, etc.)
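
The distinction between software and format can be seen in the file itself: a format usually announces itself in its leading bytes, whatever program wrote the file and whatever extension it carries. The sketch below is illustrative only. The function names and labels are hypothetical, and the signature table covers just the formats named on this slide.

```python
# Leading byte signatures for the formats named on the slide.
# The labels are descriptive only: a ZIP signature alone does not prove
# a file is .docx, because many formats share the ZIP container.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
]


def sniff_format(data: bytes) -> str:
    """Return a label based on the leading bytes, ignoring any file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A production workflow would more likely rely on a maintained format registry and identification tool than on a hand-written table like this one.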

Page 7: PDF/A: A Preservation Format

+Criteria for Preservation Formats (and Files)

Ubiquitous

Long-lived

Documented

Metadata-supporting

Accurate

Open

Uncompressed

Unencrypted

Page 8: PDF/A: A Preservation Format

+When to Use a Preservation Format

Creation

Begin with a format you know will last

If so, choose a format that allows modification to a file

Recordation

When information becomes a record, save it in a chosen format

This freezes the file and demonstrates it is a record

Archiving

Convert to persistent formats those records needed long-term

The conversion preserves the records and marks them as permanent

Early Action Can Save Money and Time

Page 9: PDF/A: A Preservation Format

+Normalization (action at the point of archiving)

Conversion to a format

Not expected to change

Not expected to disappear

Not expected to become unreadable

Usually conversion to a different format from original

Generally how preservation formats are used

Still, may cause data loss or corruption

Page 10: PDF/A: A Preservation Format

+Options for Preservation of Text

American Standard Code for Information Interchange (ASCII)

Unicode

Portable Document Format / Archive (PDF/A)

Extensible Markup Language (XML)

Open Document Format (ODF) (ISO/IEC 26300:2006)

Office Open XML (OOXML) (ISO/IEC 29500:2008)

Page 11: PDF/A: A Preservation Format

+What is Portable Document Format?

Originally developed by Adobe in 1991

Specifications made available for free in 2001

Format made an open international standard in 2008

Includes text and image features

Page 12: PDF/A: A Preservation Format

+Advantages of PDF

Has accessibility across platforms

Saves look and searchability of original

Embeds fonts (if desired)

Allows copying of text from files

Remains fairly stable and universal

Is difficult to modify

Has enhanced document security

Supports authenticity

Page 13: PDF/A: A Preservation Format

+Disadvantages of PDF

Won’t always perfectly represent original

Some files are more difficult to convert

Some formatting may be lost if saved back to original file format

Limited ability to modify

A complex format saving image and text

Tends to be larger than a word processing document

Page 14: PDF/A: A Preservation Format

+PDF’s Advantage over Others

Image and text in one bundle

Intelligent text

Accepts importance of format to meaning

Ubiquity of format and readers

Page 15: PDF/A: A Preservation Format

+Conversion Practices

Have necessary fonts installed

Ensure lossless compression

Important for embedded images

When converting PDF to PDF/A

Eliminate prohibited features

Check beforehand or fix during

Page 16: PDF/A: A Preservation Format

+Flavors of the PDF Standard

PDF (vanilla)

PDF/A (for archival preservation)

PDF/X (for publishing)

PDF/E (for engineering drawings)

PDF/VT (for variable data and transactional printing)

PDF/UA (for accessibility—in development)

PDF/H (for healthcare records—a guide, not a standard)

GeoPDF (for geospatial records—only based on standards)

Page 17: PDF/A: A Preservation Format

+Portable Document Format / Archive Standards

PDF/A-1

ISO Standard 19005-1:2005

Based on PDF Reference 1.4 (Acrobat 5)

PDF/A-2

ISO Standard 19005-2:2011

Based on PDF Reference 1.7

Published 20 June 2011

New versions of PDF/A expected

Page 18: PDF/A: A Preservation Format

+Uses of PDF/A

Standard textual documents

Paper documents

Word-processing and PDF documents

Sequences of related digital images

Documents where appearance matters

Static documents

Page 19: PDF/A: A Preservation Format

+Less Appropriate for PDF/A

Webpages

Databases

Spreadsheets

Dynamic documents

Page 20: PDF/A: A Preservation Format

+Creating PDF/As

Need a product that can produce one

Like Adobe Acrobat 8 Professional

Can convert documents individually

Opening and converting one at a time

Can use batch processing

Converting multiple documents at once

Supported by Acrobat 8
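
Batch processing can also be scripted around whatever conversion product is in use. The sketch below is a minimal illustration and not a description of how Acrobat works: `batch_convert` and the `convert` callable are hypothetical names, and the actual PDF/A conversion is assumed to be supplied by the caller.

```python
from pathlib import Path
from typing import Callable, Dict, List


def batch_convert(
    src_dir,
    dst_dir,
    convert: Callable[[Path, Path], None],
    pattern: str = "*.doc",
) -> Dict[str, List[str]]:
    """Apply a caller-supplied converter to every matching file.

    `convert(source, target)` is a placeholder for a real PDF/A converter;
    it should raise an exception when a file cannot be converted.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    report: Dict[str, List[str]] = {"converted": [], "failed": []}
    for source in sorted(src.glob(pattern)):
        target = dst / (source.stem + ".pdf")
        try:
            convert(source, target)
            report["converted"].append(source.name)
        except Exception as exc:  # keep going; record the failure for review
            report["failed"].append(f"{source.name}: {exc}")
    return report
```

Collecting failures in a report instead of stopping at the first error leaves a list of files to check by hand, which suits the quality control step a preservation program needs.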

Page 21: PDF/A: A Preservation Format

+General Goals of PDF/A

Specifies limited stable set of features

To ensure long-term validity

Eliminate features that are not “archival”

An open preservation standard

Format designed to be a preservation standard

Page 22: PDF/A: A Preservation Format

+Required in PDF/A

All fonts embedded

Unlimited legal use of embedded fonts

Device-independent color

Metadata describing the file

File must self-identify the PDF/A version
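
The self-identification requirement is met through XMP metadata, where the `pdfaid:part` and `pdfaid:conformance` properties record the claimed version and level. The sketch below reads that claim with a simple byte scan. It assumes the XMP packet is stored uncompressed and in UTF-8, the function name is hypothetical, and the result is only what the file says about itself, not a validation.

```python
import re
from typing import Optional, Tuple

# The properties may be serialized as XML attributes (pdfaid:part="1")
# or as elements (<pdfaid:part>1</pdfaid:part>); both are handled.
_PART = re.compile(rb'pdfaid:part(?:\s*=\s*["\']|\s*>)\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:\s*=\s*["\']|\s*>)\s*([A-Za-z])')


def claimed_pdfa_flavor(data: bytes) -> Optional[Tuple[int, Optional[str]]]:
    """Return (part, level) as claimed in the XMP, or None if no claim is found."""
    part = _PART.search(data)
    if part is None:
        return None
    conf = _CONF.search(data)
    level = conf.group(1).decode("ascii").upper() if conf else None
    return int(part.group(1)), level
```

A file that claims PDF/A-1b should still be checked with a validation tool, since the claim can be present in a file that breaks other rules.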

Page 23: PDF/A: A Preservation Format

+Excluded from PDF/A-1

Audio and video content

JavaScript and executable files

Encryption

LZW and JPEG 2000 image compression

Reference to outside content

Transparency

Embedded files
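
Several of the excluded features leave recognizable name tokens in a PDF file, so a rough pre-check is possible before conversion. The sketch below is a crude heuristic under stated assumptions: it scans raw bytes, so it misses anything inside compressed object streams, it does not detect transparency or references to outside content, and the pattern table and function name are illustrative choices. It does not replace a validator.

```python
import re
from typing import List

# PDF name tokens that commonly accompany features the slide lists as
# excluded from PDF/A-1. The table is illustrative, not exhaustive.
SUSPECT_PATTERNS = {
    "JavaScript": rb"/JavaScript",
    "encryption": rb"/Encrypt",
    "LZW compression": rb"/LZWDecode",
    "JPEG 2000 compression": rb"/JPXDecode",
    "embedded files": rb"/EmbeddedFiles?",
    "audio content": rb"/Sound",
    "video content": rb"/Movie",
}

# Require a non-alphanumeric character (or end of data) after the name,
# so that /Encrypt does not match a longer name such as /EncryptMetadata.
_DELIMITER = rb"(?![A-Za-z0-9])"


def flag_suspect_features(data: bytes) -> List[str]:
    """Return labels for suspect tokens found in the raw PDF bytes."""
    return [
        label
        for label, pattern in SUSPECT_PATTERNS.items()
        if re.search(pattern + _DELIMITER, data)
    ]
```

An empty result means only that none of these tokens appeared in plain sight, so files still need to pass through a real PDF/A validation tool.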

Page 24: PDF/A: A Preservation Format

+Differences in PDF/A-2

Allows embedding of OpenType fonts

Allows JPEG2000 image compression

Supports transparent objects

Supports layers, which can be hidden for viewing

Defines use of digital signatures

Defines rules via PDF Advanced Electronic Signatures (PAdES)

Specifies requirements for custom XMP metadata

Allows embedded files, but in only one context

In a PDF/A-2 you can embed PDF/A files

Allows creation of sets of documents in a single file (e.g. emails)

All PDF/A-1s are compliant with PDF/A-2 standard

PDF/A-2 is an extension of PDF/A-1

Page 25: PDF/A: A Preservation Format

+PDF/A-1 Conformance Levels

PDF/A-1, Level A (full compliance)

Preserves document’s logical structure

Preserves text stream in reading order

Requires language specification

Requires UNICODE mapping

PDF/A-1, Level B (minimal compliance)

Preserves visual appearance

Doesn’t require as much descriptive info

Less “accessible” format

Page 26: PDF/A: A Preservation Format

+Flavors of PDF/A

PDF/A-1a (a = accessible)

RGB Color

CMYK Color

PDF/A-1b (b = basic)

Same color choices

PDF/A-2a (extension of A-1a)

PDF/A-2b (extension of A-1b)

PDF/A-2u (u = Unicode)

Must use Unicode

Does not require representation of logical structure
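
The flavor labels above follow a regular pattern: a part number followed by a conformance level letter. The sketch below parses such a label and rejects combinations the slide does not list, such as a "u" level for part 1. The names `Flavor`, `parse_flavor`, and `LEVELS_BY_PART` are hypothetical, and the table reflects only the flavors shown on this slide.

```python
import re
from typing import NamedTuple

# Conformance levels available for each part, as listed on the slide.
LEVELS_BY_PART = {1: {"a", "b"}, 2: {"a", "b", "u"}}


class Flavor(NamedTuple):
    part: int
    level: str


def parse_flavor(label: str) -> Flavor:
    """Parse a label such as 'PDF/A-2u' into its part and level."""
    match = re.fullmatch(r"PDF/A-(\d)([a-z])", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PDF/A flavor label: {label!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in LEVELS_BY_PART.get(part, set()):
        raise ValueError(f"level {level!r} is not listed for part {part}")
    return Flavor(part, level)


def preserves_logical_structure(flavor: Flavor) -> bool:
    """Only level 'a' requires the document's logical structure."""
    return flavor.level == "a"
```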

Page 27: PDF/A: A Preservation Format

+PDF/A Product Lines

Adobe Acrobat (www.adobe.com)

Apago (www.apagoinc.com)

Callas (www.callassoftware.com)

Compart (www.compart.net)

PDFlib (www.pdflib.com)

PDF Tools AG (www.pdf-tools.com)

Page 28: PDF/A: A Preservation Format

+PDF/A Validation Tools

Adobe Acrobat Preflight Function (www.adobe.com)

Callas Software pdfaPilot (www.callassoftware.com)

PDF Tools AG's 3-Heights PDF Validator (www.pdf-tools.com)

Page 29: PDF/A: A Preservation Format

+Formats are Not Everything

Preservation Programs Require Work

Conversion procedures

Quality control

Version control

Environmental controls

Metadata creation and maintenance

Metadata about the records and their information

Metadata about your preservation actions

Data management controls (backups, etc.)

Ensuring that chosen normalized formats are still valid

Vigilance
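
Metadata about preservation actions can be as simple as a record written each time a file is converted. The sketch below builds one such record with checksums of the source and the normalized file, so later integrity checks have something to compare against. The field names and the `conversion_event` function are assumptions made for illustration and do not follow any particular metadata standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def conversion_event(source_name, source_bytes, target_name, target_bytes, tool):
    """Describe one normalization action as a JSON-serializable dict."""
    return {
        "event": "normalization",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,  # free-text name of the converter used
        "source": {
            "name": source_name,
            "size": len(source_bytes),
            "sha256": sha256_hex(source_bytes),
        },
        "target": {
            "name": target_name,
            "size": len(target_bytes),
            "sha256": sha256_hex(target_bytes),
        },
    }


def to_json(event: dict) -> str:
    """Serialize the record for storage alongside the files."""
    return json.dumps(event, indent=2, sort_keys=True)
```

Recomputing the checksums during routine audits and comparing them with the stored record is one way to act on the vigilance this slide calls for.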

