Post on 17-Dec-2015

CS 502: Computing Methods for Digital Libraries

Lecture 9

Conversion to Digital Formats

Anne Kenney, Cornell University Library

What are Digital Images?

• Electronic snapshots taken of a scene or scanned from documents

• sampled and mapped as a grid of dots or picture elements (pixels)

• each pixel is assigned a tonal value (black, white, grays, colors), represented in binary code

• code stored or reduced (compressed)

• read and interpreted to create analog version

Four Scanning Methods

• bitonal

• grayscale

• color

• special treatment

Digital Image Quality is Governed By:

• resolution and threshold

• bit depth

• image enhancement

• color management

• compression

• system performance

• operator judgment and care

Resolution

• determined by number of pixels used to represent the image

• expressed in dots per inch (dpi), a linear measure; the pixel count per square inch is the dpi squared

• increasing resolution increases level of detail captured and geometrically increases file size
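
As a rough illustration of why file size grows so quickly with resolution, the sketch below counts pixels for one page at the three resolutions compared on the next slide. The 8 x 10 inch page size and the function names are assumptions chosen for the example.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (columns, rows) of pixels for a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count(width_in, height_in, dpi):
    """Total number of pixels; grows with the square of the dpi."""
    cols, rows = pixel_dimensions(width_in, height_in, dpi)
    return cols * rows


# Hypothetical 8 x 10 inch page at three scanning resolutions.
for dpi in (200, 300, 600):
    cols, rows = pixel_dimensions(8, 10, dpi)
    print(f"{dpi} dpi: {cols} x {rows} = {cols * rows:,} pixels")
```

Tripling the resolution from 200 to 600 dpi multiplies the pixel count by nine, not three.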

Effects of Resolution

[Figure panels: 600 dpi, 300 dpi, 200 dpi]

Threshold Setting in Bitonal Scanning

defines the point on a scale from 0 to 255 at which gray values will be interpreted either as black or white
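
A minimal sketch of thresholding follows. It assumes that 0 is black and 255 is white, and that gray values below the threshold become black while values at or above it become white; scanner software may draw the boundary differently. The sample values and the function name are made up for illustration.

```python
def binarize(gray_rows, threshold):
    """Convert rows of 8-bit gray values (0-255) to 1-bit pixels.

    Assumed convention: value < threshold -> 0 (black),
    value >= threshold -> 1 (white).
    """
    return [[0 if value < threshold else 1 for value in row]
            for row in gray_rows]


# One hypothetical scan line, from dark ink to light paper.
scan_line = [[30, 70, 90, 120, 200]]

print(binarize(scan_line, 100))  # more mid-grays fall on the black side
print(binarize(scan_line, 60))   # fewer mid-grays fall on the black side
```

Under this convention a higher threshold turns more of the gray range black, which thickens strokes; a lower threshold thins them.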

Effects of Threshold

threshold = 100

threshold = 60

Bit Depth

• number of bits used to represent each pixel, typically 8 bits or more per channel

• representing 256 (2^8) levels for grayscale and 16.7 million (2^24) levels for color

example: 8-bit grayscale pixel

00000000 = black

11111111 = white

Bit Depth (continued)

• increasing bit depth increases the level of gray or color information that can be represented and arithmetically increases file size

• affects resolution requirements
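
The arithmetic behind these points can be sketched as follows. The page size and resolution are assumptions for the example, and the size estimate covers raw pixel data only, ignoring file headers and row padding.

```python
def tonal_levels(bits_per_pixel):
    """Number of distinct values a pixel can take."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel data size in bytes, rounded up to a whole byte."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8


# Hypothetical 8 x 10 inch page at 300 dpi in three bit depths.
for bits in (1, 8, 24):
    size = uncompressed_bytes(8, 10, 300, bits)
    print(f"{bits:>2}-bit: {tonal_levels(bits):,} levels, {size:,} bytes")
```

File size scales in direct proportion to bit depth, while the number of representable levels doubles with each added bit.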

Effects of Grayscale on Image Quality

3-bit gray 8-bit gray
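
One way to see the difference is to requantize 8-bit values to fewer bits. The sketch below is an illustrative assumption, not the method used to produce the slide images: it bins each value into 2^bits levels and rescales the result to 0-255 for display.

```python
def reduce_bit_depth(value, bits):
    """Requantize an 8-bit gray value to `bits` bits, scaled back to 0-255."""
    levels = 2 ** bits
    index = value * levels // 256            # which bin the value falls in
    return round(index * 255 / (levels - 1))  # spread bins over 0-255


ramp = range(256)  # a smooth black-to-white gradient
three_bit = {reduce_bit_depth(v, 3) for v in ramp}
eight_bit = {reduce_bit_depth(v, 8) for v in ramp}
print(len(three_bit), "distinct grays at 3 bits")
print(len(eight_bit), "distinct grays at 8 bits")
```

With only eight grays available, a smooth gradient breaks into visible bands.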

Image Enhancement

• can be used to improve image capture

• use raises concerns about fidelity and authenticity

Effects of Filters

[Figure panels: no filters used, maximum enhancement]

Image Editing

Compression

• reduces file size for processing, storage, transmission, and display

• image quality may be affected by the compression techniques used and the level of compression applied

Compression Variables

• lossless versus lossy compression

• proprietary vs. open schemes

• level of industry support

• bitonal vs. gray/color
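
The lossless versus lossy distinction can be shown with a toy example. Python's `zlib` (DEFLATE) stands in here for a lossless scheme, and the "lossy" step simply discards low-order bits before coding; neither is one of the image compression schemes named in this lecture.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress; the output is bit-for-bit identical."""
    return zlib.decompress(zlib.compress(data))


def lossy_round_trip(data: bytes, keep_bits: int = 4) -> bytes:
    """Toy lossy scheme: drop low-order bits, then code losslessly."""
    shift = 8 - keep_bits
    coarse = bytes((b >> shift) << shift for b in data)
    return zlib.decompress(zlib.compress(coarse))


pixels = bytes(range(256))  # hypothetical 8-bit gray ramp
print(lossless_round_trip(pixels) == pixels)  # original fully recovered
print(lossy_round_trip(pixels) == pixels)     # detail has been discarded
```

Once information is discarded by a lossy scheme, no later processing can restore it.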

Common Compression Schemes

• bitonal

– ITU Group 4: lossless

– JBIG (ISO 11544): lossless

– CPC: lossy

– DigiPaper

• grayscale/color

– LZW: lossless

– JPEG: lossy

– Kodak Image Pac: “visually lossless”

– fractal and wavelet compression

Effects of JPEG Compression

300 dpi, 8-bit grayscale, uncompressed TIFF

JPEG 18.5:1 compression

Compression Observations

• the richer the file, the more efficient and sustainable the compression

• the more complex the image, the poorer the compression
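
The second observation is easy to check with any lossless coder. In the sketch below, `zlib` is used as a stand-in and the two byte strings are synthetic: one models a blank page area, the other a highly complex area with no repeated patterns.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


flat = bytes([255]) * 10_000                 # uniform white area
rng = random.Random(0)                       # fixed seed for repeatability
noisy = bytes(rng.randrange(256) for _ in range(10_000))  # no structure

print(f"flat:  {compression_ratio(flat):.1f}:1")
print(f"noisy: {compression_ratio(noisy):.2f}:1")
```

The uniform data shrinks dramatically, while the random data barely compresses at all.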

Equipment used and its performance over time

• scanners offer wide range of capabilities to capture detail, dynamic range, and color

• scanners with same stated functionality can produce different results

• calibration, age of equipment, and environment affect quality

Equipment used and its performance over time (continued)

• attributes and capabilities of monitor and/or printer are also factors

• assess quality visually and computationally

– use targets

– control QC environment

– increasing availability of software to assess resolution, tone, color, artifacts

Image Capture:

Create digital objects rich enough to be useful over time in the most cost-effective manner.

How to determine what’s good enough?

• Connoisseurship of document attributes

• Objective characterizations

• Translation between analog and digital

– measurement to scanning requirement to corresponding image metrics

– e.g., detail size → resolution → MTF

– tonal range → bit depth → signal-to-noise ratio

Case Study

• Brittle Books--printed text, use of metal type, commercial publishers, objective measurement, use of Quality Index from micrographics

• 600 dpi 1-bit capture adequately preserves informational content of text-based materials
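
The link between the Quality Index and 600 dpi can be sketched numerically. The formula below, QI = (dpi x 0.039 x h) / 3 for bitonal capture with h the height in millimetres of the smallest significant character, is the form commonly quoted in Cornell's digital imaging benchmarking guidance. It does not appear on the slide, so the formula, the 1 mm character height, and the function names should all be read as assumptions.

```python
MM_TO_INCH = 0.039  # approximate conversion used in the quoted formula


def bitonal_quality_index(dpi, char_height_mm):
    """Quality Index for 1-bit capture (assumed formula, divisor 3)."""
    return dpi * MM_TO_INCH * char_height_mm / 3


def dpi_for_quality_index(target_qi, char_height_mm):
    """Resolution needed to reach `target_qi`, by inverting the formula."""
    return 3 * target_qi / (MM_TO_INCH * char_height_mm)


# Hypothetical smallest character of 1 mm, scanned at 600 dpi.
print(f"QI at 600 dpi: {bitonal_quality_index(600, 1.0):.1f}")
print(f"dpi for QI 8:  {dpi_for_quality_index(8, 1.0):.0f}")
```

Under these assumptions, 600 dpi yields a QI just under 8 for 1 mm characters.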

Ensuring Full Informational Capture: “No More, No Less”

[Figure: graph with axes "cost" and "image quality and utility", marking the desired point of capture]

Create One Scan To Serve Multiple Uses

• Derive alternative formats/approaches to meet current and future information needs

• Base “derivative” requirements on document attributes, technical infrastructure, user requirements, and cost

• Understand technical links affecting presentation and utility of derivatives

User Requirements

• completeness

• legibility

• speed of delivery

• “cooked” files

Derivatives from a Digital Master

• the richer the image, the better the derivative

– a derivative from a rich file is superior in quality to one from a poorer scan

– the richer the image, the better the image processing

monitor: 800 x 600 pixels

document: 8” x 10”, 200 dpi (1,600 x 2,000 pixels)

document at 60 dpi: 480 x 600 pixels

document at 100 dpi: 800 x 1,000 pixels
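
The figures above follow from simple arithmetic, sketched below for the same 8 x 10 inch document and 800 x 600 monitor. The function names are made up for the example.

```python
def scaled_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of the document when rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_to_fit_screen(width_in, height_in, screen_w, screen_h):
    """Highest dpi at which the whole page fits on screen."""
    return min(screen_w / width_in, screen_h / height_in)


def dpi_to_fit_width(width_in, screen_w):
    """Highest dpi at which the page width fits; height may scroll."""
    return screen_w / width_in


print(scaled_dimensions(8, 10, 200))          # the 200 dpi master
print(dpi_to_fit_screen(8, 10, 800, 600))     # whole page visible
print(dpi_to_fit_width(8, 800))               # full width, vertical scroll
```

Showing the whole page at once means dropping to 60 dpi, while 100 dpi fills the screen width but requires scrolling.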

Compression/File Format Comparison for Derivative Files

[Figure panels: TIFF, uncompressed; GIF, compressed 6:1 (NARA); JPEG, compressed 20:1 (LC)]

Alternatives for Displaying Oversize Images

• File formats and compression schemes that support multi-resolution image delivery, e.g., wavelet compression, GridPix, Flashpix

• User tools for representing scale (Blake Project ImageSizer, a Java applet) and improving image quality
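
Multi-resolution delivery generally rests on storing or deriving the image at several sizes so a viewer can fetch only the level it needs. The sketch below lists the levels of a generic halving pyramid; it is an illustration of the idea and does not describe the actual file layout of any format named above. The `smallest` cutoff is an arbitrary assumption.

```python
def resolution_pyramid(width, height, smallest=256):
    """List (width, height) per level, halving each time.

    Stops once the longer side is at most `smallest` pixels.
    """
    levels = [(width, height)]
    while max(width, height) > smallest:
        width = max(1, (width + 1) // 2)    # round up when halving
        height = max(1, (height + 1) // 2)
        levels.append((width, height))
    return levels


# Hypothetical 1,600 x 2,000 pixel master image.
for level, size in enumerate(resolution_pyramid(1600, 2000)):
    print(level, size)
```

A viewer showing a thumbnail would request the smallest level, and move to larger levels only as the user zooms in.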

Recommendations Coalescing

• Intent of conversion drives decisions

– issues of access considered at conversion

– notion of long-term utility and cross-institutional resources gaining ground

• Access images will change with:

– changing user needs and capabilities

– changes in technologies: file formats, technical infrastructure, compression, web browsers, processing programs, scaling routines