One Document at a Time: Small-scale Digitization Projects
Peter Brueggeman, Scripps Inst. of Oceanography
Janet Webster, Hatfield Marine Science Center, OSU
Barbara Butler, Oregon Inst. of Marine Biology, UO
33rd Annual IAMSLIC Conference, Sarasota Florida
Past endeavors
Vendor produced PDFs from encoded text: smallest file size; costly; time spent on vendor interaction / proofing / revisions
Utilizing ILL staff, other staff: lower resolution scanning with routine ILL; quality issues
Do It Yourself: better results; least effort
Current Equipment Setup
Hewlett Packard ScanJet 7800 document scanner: dedicated sheet feeding scanner; double sided scanning
Plustek OpticBook 3600 Corporate flatbed book scanner: six millimeters between scan and edge; good for books with tight bindings
Adobe Acrobat: PDF optimization; OCR
Scan specification
Scan from disbound trimmed originals
Scan from photocopy if no disbound original in order to sheetfeed
600 ppi black/white (1-bit, bitonal) scanning for text pages
Small file size, better text appearance with b/w scans (not for photos)
600ppi scan time OK with sheet feeding
Scan Specification
300 ppi grayscale scanning for halftone black and white photographs
300ppi color scanning for color photographs
Large PDF file size accumulates for pages scanned grayscale or color
Same 4 page PDF is 1,150K @ 300ppi grayscale, whereas 157K @ 300ppi b/w or 328K @ 600ppi b/w
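Why grayscale and color pages grow so quickly can be sketched with simple arithmetic on uncompressed pixel data. The sketch below is illustrative only: the US Letter page size (8.5 x 11 in) and the function name raw_page_bytes are assumptions, and real PDF sizes depend on the compression used, so only the reported_kb figures come from the slide.

```python
def raw_page_bytes(ppi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed raster size in bytes for one scanned page.

    Page size defaults to US Letter (an assumption, not from the slides).
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Raw (pre-compression) sizes for the three scan modes in the specification
bitonal_600 = raw_page_bytes(600, 1)   # black/white text pages
gray_300 = raw_page_bytes(300, 8)      # halftone black and white photographs
color_300 = raw_page_bytes(300, 24)    # color photographs

# PDF sizes reported on the slide for the same 4-page document, in KB
reported_kb = {"300ppi grayscale": 1150, "300ppi b/w": 157, "600ppi b/w": 328}
gray_vs_bw600 = reported_kb["300ppi grayscale"] / reported_kb["600ppi b/w"]

print(bitonal_600, gray_300, color_300)
print(f"300ppi grayscale PDF is about {gray_vs_bw600:.1f}x the 600ppi b/w PDF")
```

Before compression, a 300 ppi grayscale page already holds twice the data of a 600 ppi black/white page, and black/white scans typically compress far better, which is consistent with the larger gap in the reported PDF sizes.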
Scan Specification
For pages only partly occupied by a photograph, you may wish to paste photos scanned in grayscale or color onto black/white scanned text pages, saving some file size while preserving photo quality
Scan Specification
One page with photo on partial page
600ppi black/white PDF with unacceptable photo = 170K
300ppi grayscale PDF with less than acceptable text = 760K
600ppi black/white text & 300ppi grayscale photo PDF = 1,275K
600ppi grayscale PDF = 1,436K
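The scan specification can be summarized as a small lookup table, with the one-page test results listed alongside for comparison. This is only a sketch: the dictionary keys and the function name settings_for are hypothetical labels chosen for illustration, while the ppi values, modes, and sizes are those given on the slides.

```python
# Scan settings by page content, as given in the scan specification
SCAN_SETTINGS = {
    "text": {"ppi": 600, "mode": "black/white"},
    "bw_photo": {"ppi": 300, "mode": "grayscale"},
    "color_photo": {"ppi": 300, "mode": "color"},
}


def settings_for(page_type):
    """Return the scan settings for a page type (hypothetical labels)."""
    return SCAN_SETTINGS[page_type]


# Reported PDF sizes (KB) for one page with a photo on part of the page
ONE_PAGE_KB = {
    "600ppi black/white": 170,
    "300ppi grayscale": 760,
    "600ppi b/w text + 300ppi grayscale photo": 1275,
    "600ppi grayscale": 1436,
}

# List the options from smallest to largest file
by_size = sorted(ONE_PAGE_KB.items(), key=lambda item: item[1])
for option, kb in by_size:
    print(f"{kb:>5}K  {option}")
```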
Document Production
For yellowed/browned original, adjust the lightening setting in the scanning software to get white pages
Adobe Acrobat's RECOGNIZE TEXT USING OCR is not highly accurate
Save final PDF, then save it again via FILE-SAVE AS to reduce “document overhead”
Page through and proof PDF
Document Production
Compress via PDF Optimizer if desired
Try different settings to judge results
My target upper file size is 20 megabytes
Save original uncompressed version of PDF
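A quick way to see which finished PDFs still exceed the 20 megabyte target is to check file sizes in a folder. This is a minimal sketch, not part of the workflow described on the slides: the function names are hypothetical, and treating a megabyte as 1024 x 1024 bytes is an assumption.

```python
import os

TARGET_MB = 20  # upper file size target stated on the slide


def exceeds_target(size_bytes, target_mb=TARGET_MB):
    """True if a file of size_bytes is over the target (1 MB = 1024*1024 bytes, assumed)."""
    return size_bytes > target_mb * 1024 * 1024


def pdfs_over_target(folder, target_mb=TARGET_MB):
    """Return names of PDF files in folder that are larger than the target."""
    over = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            if exceeds_target(os.path.getsize(path), target_mb):
                over.append(name)
    return over
```

Files returned by pdfs_over_target are candidates for another pass through PDF Optimizer with different settings; the original uncompressed version should still be kept.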
A cog in OSU digitization process
Librarian is one player
• Identify candidates
• Investigate copyright
• Send to the Digital Production Unit
DPU is the main dealer
• Sliced if possible
• Scanned & OCRed
• Rebound, tied or dumped
• Entered into appropriate digital collection/space
All projects/items fit into bigger collection scheme
Oregon Birds
Donated journal from a retired faculty member.
Posted to the Cyamus list and was prompted to think about digitizing.
Contacted the Oregon Field Ornithologists who were interested.
Generated a budget with help from my Technical Services Department chair.
Now negotiating with OFO.
Considerations
I have access to a good digitization unit.
I use it.
I promote it and thank those involved.
I work with others.
I couldn’t do it on my own at the branch.
The OIMB Approach
Add to Scholars’ Bank OR Oregon Explorer
Shared Collection Development with OSU
Long-term goal:
• Full-text Coos Bay Bibliography (Oregon South Coast)
• Geo-spatially referenced (Yaquina Bay Bibliography model)
Primary targets in initial phase:
• Student reports and theses
• Documents already in digital format
The OIMB Approach (in the beginning)
Student assistant
Ariel software
Flatbed scanner
100 pages per hour
Reviewed by staff
OCR by Adobe
Uploaded
Example 3:
1941 Printing: OCLC: 15 libraries
Z39.50 Distributed Library
• AIMS
• Hopkins
• MBL/WHOI
Aquatic Commons: Submitted 10/2007
The OIMB Approach (refined)
Same as part two with improvements:
• Document feeder with duplex capability (Epson GT-2500)
• Native scanner interface or Ariel interface
• Also inputting into Aquatic Commons
Challenges still exist:
• Lack of dithering option
• Still scanning at 300 dpi, b/w and grayscale
• OCR and collating documents