
One Document at a Time: Small-scale

Digitization Projects

Peter Brueggeman, Scripps Inst. of Oceanography

Janet Webster, Hatfield Marine Science Center, OSU

Barbara Butler, Oregon Inst. of Marine Biology, UO

33rd Annual IAMSLIC Conference, Sarasota, Florida

Legacy Publication Digitization @ Scripps

Peter Brueggeman

Past endeavors

Vendor-produced PDFs from encoded text: smallest file size; costly; time spent on vendor interaction, proofing, and revisions

Using ILL and other staff: lower-resolution scanning as part of routine ILL work; quality issues

Do It Yourself: better results; least effort

Current Equipment Setup

Hewlett-Packard ScanJet 7800 document scanner: dedicated sheet-feeding scanner; double-sided scanning

Plustek OpticBook 3600 Corporate flatbed book scanner: only six millimeters between the scanning area and the scanner edge; good for books with tight bindings

Adobe Acrobat: PDF optimization; OCR

Scan Specification

Scan from disbound, trimmed originals

If no disbound original is available, scan from a photocopy so the pages can be sheet-fed

600 ppi black/white (bitonal) scanning for text pages

Black/white scans give a small file size and better text appearance (not suitable for photos)

600 ppi scan time is acceptable with sheet feeding
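To see why a 600 ppi black/white scan can still be smaller than a 300 ppi grayscale one, it helps to compare raw, uncompressed bitmap sizes. This is a sketch under stated assumptions: a US letter page (8.5 × 11 inches), no compression at all (in a real PDF, bitonal schemes such as CCITT Group 4 or JBIG2 shrink b/w pages much further), and a hypothetical helper name.

```python
def raw_bitmap_bytes(width_in: float, height_in: float, ppi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a scanned page (hypothetical helper)."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Round up to whole bytes.
    return (width_px * height_px * bits_per_pixel + 7) // 8


LETTER = (8.5, 11.0)  # assumed page size in inches (US letter)

MODES = {
    "600 ppi black/white, 1 bit/pixel": (600, 1),
    "300 ppi grayscale, 8 bits/pixel": (300, 8),
    "300 ppi color, 24 bits/pixel": (300, 24),
}

for label, (ppi, bits) in MODES.items():
    size = raw_bitmap_bytes(*LETTER, ppi, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

With these assumptions the three modes come to about 4.2 MB, 8.4 MB and 25.2 MB per page before compression: doubling the resolution quadruples the pixel count, but dropping from 8 bits to 1 bit per pixel more than makes up for it.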

300 ppi grayscale vs. 600 ppi b/w, viewed at 200%

300 ppi vs. 600 ppi b/w, viewed at 200%; 4 pages: 157K vs. 328K

Scan Specification

300 ppi grayscale scanning for halftone black-and-white photographs

300 ppi color scanning for color photographs

Pages scanned in grayscale or color quickly inflate the PDF file size

The same 4-page PDF is 1,150K at 300 ppi grayscale, versus 157K at 300 ppi b/w or 328K at 600 ppi b/w
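The reported figures translate directly into per-page costs. This small calculation uses only the three sizes given on the slide; the function and variable names are illustrative.

```python
def size_summary(sizes_kb: dict[str, float], pages: int) -> dict[str, tuple[float, float]]:
    """Per-page size and size relative to the smallest version, for each scan mode."""
    smallest = min(sizes_kb.values())
    return {mode: (kb / pages, kb / smallest) for mode, kb in sizes_kb.items()}


# Figures for the same 4-page PDF, as reported on the slide (in K).
reported = {"300 ppi b/w": 157, "600 ppi b/w": 328, "300 ppi grayscale": 1150}

for mode, (per_page, ratio) in size_summary(reported, pages=4).items():
    print(f"{mode}: about {per_page:.0f}K per page, {ratio:.1f}x the smallest")
```

By this arithmetic the 300 ppi grayscale version is roughly 7.3 times the 300 ppi b/w version and about 3.5 times the 600 ppi b/w version, which is why grayscale is reserved for photographs.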

Scan Specification

For pages composed partly of a photograph, you may wish to paste the photo, scanned in grayscale or color, onto the black/white scan of the text page to save some file size while preserving photo quality

600 ppi black/white scan

300 ppi grayscale scan
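Where the savings of the pasted-photo approach come from can be estimated with raw (uncompressed) sizes. The page size (US letter), the share of the page taken by the photo (40%) and the helper name are assumptions; real PDF sizes depend heavily on compression, so the measured one-page figures can differ considerably from these estimates.

```python
def raw_bytes(area_sq_in: float, ppi: int, bits_per_pixel: int) -> int:
    """Uncompressed bytes for a scanned region of the given area (hypothetical helper)."""
    pixels = round(area_sq_in * ppi * ppi)
    return (pixels * bits_per_pixel + 7) // 8


PAGE_AREA = 8.5 * 11.0   # assumed US letter page, in square inches
PHOTO_SHARE = 0.4        # assumed fraction of the page covered by the photo

text_layer = raw_bytes(PAGE_AREA, 600, 1)                  # whole page at 600 ppi b/w
photo_patch = raw_bytes(PAGE_AREA * PHOTO_SHARE, 300, 8)   # photo area only, 300 ppi grayscale
composite = text_layer + photo_patch
gray_600 = raw_bytes(PAGE_AREA, 600, 8)                    # whole page at 600 ppi grayscale

print(f"composite {composite / 1e6:.1f} MB vs 600 ppi grayscale {gray_600 / 1e6:.1f} MB")
```

The design point is that only the photo region pays the grayscale cost, while the text keeps the crisp 600 ppi bitonal rendering.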

Scan Specification

One page with a photo on part of the page:

600 ppi black/white PDF, photo unacceptable = 170K

300 ppi grayscale PDF, text less than acceptable = 760K

600 ppi black/white text with 300 ppi grayscale photo = 1,275K

600 ppi grayscale PDF = 1,436K

Document Production

For yellowed or browned originals, adjust the lightening setting in the scanning software to get white pages

Adobe Acrobat's RECOGNIZE TEXT USING OCR command is not highly accurate

Save the final PDF, then save it again via FILE > SAVE AS to reduce “document overhead” (Save As writes a fresh copy of the file instead of appending incremental changes)

Page through and proof the PDF
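Lightening controls differ between scanner drivers, but one simple way to get the same effect is a linear levels stretch that maps the yellowed paper tone to pure white. This sketch assumes 8-bit grayscale pixel rows; the function name `lighten` and the sample values are made up for illustration and do not reproduce any particular scanner's algorithm.

```python
def lighten(pixels: list[list[int]], white_point: int) -> list[list[int]]:
    """Stretch 8-bit grayscale values so that `white_point` and above become 255."""
    if not 0 < white_point <= 255:
        raise ValueError("white_point must be in 1..255")
    scale = 255 / white_point
    # Scale every pixel, clipping anything brighter than the white point to pure white.
    return [[min(255, round(p * scale)) for p in row] for row in pixels]


# A tiny "scan": dark text (30) on yellowed paper that reads as gray (200).
page = [
    [200, 200, 30, 200],
    [200, 30, 30, 200],
]
print(lighten(page, white_point=200))  # [[255, 255, 38, 255], [255, 38, 38, 255]]
```

Setting the white point too low also brightens faint strokes, so it is worth checking a few pages before scanning a whole run.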

Document Production

Compress with Acrobat's PDF Optimizer if desired

Try different settings and judge the results

My target maximum file size is 20 megabytes

Save the original, uncompressed version of the PDF
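A small housekeeping script can support both habits: copy the uncompressed original aside before optimizing, and flag any PDF still above the 20 MB target. The folder layout, the function names, and reading 20 MB as 20 × 1024 × 1024 bytes are assumptions; the compression itself is still done in Acrobat's PDF Optimizer.

```python
import shutil
import tempfile
from pathlib import Path

TARGET_BYTES = 20 * 1024 * 1024  # assumed reading of the 20 MB target


def archive_original(pdf: Path, archive_dir: Path) -> Path:
    """Copy the uncompressed PDF aside before running any optimizer on it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / pdf.name
    shutil.copy2(pdf, dest)
    return dest


def oversized(folder: Path, limit: int = TARGET_BYTES) -> list[Path]:
    """Return PDFs directly inside `folder` (not subfolders) larger than `limit` bytes."""
    return sorted(p for p in folder.glob("*.pdf") if p.stat().st_size > limit)


# Demonstration on throwaway files; a 1 KB limit stands in for 20 MB.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "small.pdf").write_bytes(b"%PDF-1.4\n" + b"0" * 100)
    (work / "large.pdf").write_bytes(b"%PDF-1.4\n" + b"0" * 5000)
    saved = archive_original(work / "large.pdf", work / "originals")
    archived_exists = saved.exists()
    flagged = [p.name for p in oversized(work, limit=1024)]

print(flagged)  # ['large.pdf']
```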

Digitization Initiatives at Oregon State University

Janet Webster

A cog in the OSU digitization process

The librarian is one player:
• Identify candidates
• Investigate copyright
• Send to the Digital Production Unit (DPU)

The DPU is the main dealer:
• Sliced if possible
• Scanned and OCRed
• Rebound, tied, or discarded
• Entered into the appropriate digital collection/space

All projects/items fit into a bigger collection scheme
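The hand-off between librarian and DPU is an ordered sequence of stages, which can be modeled as a simple tracking record. This is a hypothetical sketch: the stage names come from the slide, while the `Item` class and its fields are invented for illustration and are not part of OSU's actual system.

```python
from dataclasses import dataclass, field

# Stages in the order listed on the slide, with the party that handles each one.
STAGES = [
    ("Librarian", "identify candidates"),
    ("Librarian", "investigate copyright"),
    ("Librarian", "send to the Digital Production Unit"),
    ("DPU", "slice if possible"),
    ("DPU", "scan and OCR"),
    ("DPU", "rebind, tie, or discard"),
    ("DPU", "enter into the appropriate digital collection"),
]


@dataclass
class Item:
    """Hypothetical tracking record for one item moving through the stages."""
    title: str
    next_stage: int = 0
    history: list[str] = field(default_factory=list)

    def advance(self) -> bool:
        """Log the next stage; return False once all stages are complete."""
        if self.next_stage >= len(STAGES):
            return False
        who, what = STAGES[self.next_stage]
        self.history.append(f"{who}: {what}")
        self.next_stage += 1
        return True


item = Item("example item")
while item.advance():
    pass
print("\n".join(item.history))
```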

How it works

Another twist on how it works.

Oregon Birds

A journal donated by a retired faculty member.

Posted it to the Cyamus list and was prompted to think about digitizing it.

Contacted the Oregon Field Ornithologists (OFO), who were interested.

Generated a budget with help from my Technical Services Department chair.

Now negotiating with OFO.

Considerations

I have access to a good digitization unit.

I use it.

I promote it and thank those involved.

I work with others.

I couldn’t do it on my own at the branch.

Digitization Initiatives at University of Oregon

(OIMB)

Barb Butler

The OIMB Approach

Add to Scholars’ Bank OR Oregon Explorer

Shared collection development with OSU

Long-term goal:
• Full-text Coos Bay Bibliography (Oregon South Coast)
• Geo-spatially referenced (Yaquina Bay Bibliography model)

Primary targets in initial phase:
• Student reports and theses
• Documents already in digital format

The OIMB Approach (in the beginning)

Student assistant

Ariel software

Flatbed scanner

100 pages per hour

Reviewed by staff

OCR by Adobe

Uploaded
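At the initial rate of about 100 pages per hour, scanning time for a batch is easy to estimate. The page counts below are invented for illustration, and the estimate covers scanning only, not staff review, OCR, or uploading.

```python
PAGES_PER_HOUR = 100  # initial OIMB rate: student assistant, flatbed scanner, Ariel


def scanning_hours(page_counts: list[int], rate: float = PAGES_PER_HOUR) -> float:
    """Hours of scanning for a batch of documents, excluding review, OCR, and upload."""
    return sum(page_counts) / rate


batch = [48, 112, 75]  # hypothetical page counts for three student reports
print(f"{sum(batch)} pages: about {scanning_hours(batch):.1f} hours of scanning")
```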

Example 1:

Example 2:

Example 3:

1941 printing; OCLC: 15 libraries

Z39.50 Distributed Library

• AIMS

• Hopkins

• MBL/WHOI

Aquatic Commons: Submitted 10/2007

The OIMB Approach (refined)

Same as part two, with improvements:
• Document feeder with duplex capability (Epson GT-2500)
• Native scanner interface or Ariel interface
• Also inputting into Aquatic Commons

Challenges still exist:
• Lack of a dithering option
• Still scanning at 300 dpi, b/w and grayscale
• OCR and collating documents
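A dithering option matters because a photo scanned as plain black/white collapses into solid black or white areas, whereas error diffusion spreads each pixel's rounding error to its neighbours so tone survives as dot density. Floyd-Steinberg is one common error-diffusion method; which dithering a given scanner driver would offer is not stated, so treat this pure-Python sketch (function names hypothetical) as an illustration of the idea only.

```python
def floyd_steinberg(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Dither 8-bit grayscale rows to black (0) and white (255) by error diffusion."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Working copy in floats, so diffused error can accumulate on each pixel.
    buf = [[float(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = buf[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            # Spread the rounding error to unvisited neighbours with weights 7, 3, 5, 1 (/16).
            if x + 1 < width:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out


def white_share(img: list[list[int]]) -> float:
    """Fraction of pixels that are white."""
    return sum(v == 255 for row in img for v in row) / (len(img) * len(img[0]))


# A flat dark-gray patch (value 64 of 255), as in a shadow area of a photo.
patch = [[64] * 8 for _ in range(8)]
plain = [[255 if v >= 128 else 0 for v in row] for row in patch]  # simple threshold
dithered = floyd_steinberg(patch)
print(white_share(plain), white_share(dithered))
```

The simple threshold turns the whole patch black, while the dithered version keeps roughly a quarter of the pixels white, approximating the original tone.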

