Rapid Capture in Special Collections and Archives Webinar
27 October 2011
Laura Clark Brown, University of North Carolina at Chapel Hill
Ben Goldman, University of Wyoming
Mary Elings, University of California, Berkeley
Erik Moore, University of Minnesota
Brian Wilson, The Henry Ford
Ricky Erway, OCLC Research
Rapid Capture: Faster Throughput in Digitization of Special Collections
OCLC Research 2011
http://www.oclc.org/research/publications/library/2011/2011-04r.htm
MEETING DEMANDS FOR MORE AND MORE CONTENT
A programmatic approach to large-scale digitization of manuscript collections
Laura Clark Brown
Coordinator of the Digital Southern Historical Collection
The Southern Historical Collection at the Louis Round Wilson Special Collections Library
The DIGITAL SOUTHERN HISTORICAL COLLECTION is a large-scale manuscripts digitization program that employs a set of nimble workflows and technologies to scan and present online multiple streams of content demanded from multiple sources.
• Archivists’ Choice
• Special Projects
• Donors
• Researchers
• Preservation
MULTIPLE STREAMS FOR MULTIPLE DEMANDS
Pre-Production
• Curatorial Decisions
• Material Preparation
• Finding Aid Preparation
Production
• Scanning
• Metadata
• Quality Control
Post Production
• File Management
• Online Presentation
• Quality Control
MULTIPLE STREAMS, SAME NIMBLE WORKFLOWS
Client loads HTML and JavaScript
JavaScript makes API call
API searches CONTENTdm collections and returns array (may be empty)
JavaScript builds links if appropriate
Client displays links to pre-coordinated search of CONTENTdm collections
MULTIPLE STREAMS, SAME TECHNOLOGICAL SOLUTIONS
• HTML finding aids and ingest packages built from XSL transforms of base XML file
• Both contain unique identifiers
• API created to query CONTENTdm collections and return results
• JavaScript added to every HTML finding aid
• AJAX queries check for content and create links if appropriate
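The lookup flow on these two slides can be sketched in a few lines. This is a minimal Python sketch, not the program's actual JavaScript or API: `FAKE_INDEX`, `search_collections`, `build_link`, the identifier format, and the `example.org` URL are all hypothetical stand-ins. It shows only the logic described above: query by the finding aid's unique identifier, accept an array that may be empty, and build a link only when something was found.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the API's search of the CONTENTdm collections.
# Keys are finding-aid identifiers; values are the digitized items found.
FAKE_INDEX = {
    "00001_folder_0012": ["scan_0001", "scan_0002"],
}


def search_collections(identifier, index=FAKE_INDEX):
    """Return the list of matching items; the list may be empty."""
    return list(index.get(identifier, []))


def build_link(identifier, base_url="https://example.org/search"):
    """Return a pre-coordinated search link, or None if nothing is digitized."""
    results = search_collections(identifier)
    if not results:
        # Empty array: the finding aid shows no link for this component.
        return None
    return f"{base_url}?{urlencode({'id': identifier})}"


link = build_link("00001_folder_0012")
no_link = build_link("00001_folder_0099")
```

Because the link is built only from the shared identifier, the finding aid itself never needs to be edited when new scans are added to a collection.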
CAN WE MEET THE DEMANDS FOR MORE AND MORE DIGITIZED CONTENT FROM MORE AND MORE PEOPLE?
of course not . . . but we can start to . . .
Re-Using Archival Description
Ben Goldman
Digital Programs Archivist
American Heritage Center
University of Wyoming
Mass Digitization at the AHC
• Metadata is the most time-consuming task in a digitization project
• We already have a team of (6) processing archivists describing collections
• RE-USE METADATA
• Focus on processed collections with finding aids
• Describe digitized material to whatever level the physical materials are described
Details and Results
• Use LUNA digital asset management system
– Metadata uploaded via Excel spreadsheets
• Dublin Core
– Lots of copy and paste, most fields map to collection-level values
• 75,000 new items from 60+ collections in the last two years, with minimal digitization resources (two part-time students on hourly wage)
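The "copy and paste" step, where most Dublin Core fields repeat collection-level values, can be illustrated with a short sketch. This is an assumption-laden example rather than the AHC's actual procedure: the function names, field names, and sample values are hypothetical, and a CSV file stands in for the Excel spreadsheet that LUNA receives.

```python
import csv
import io


def build_rows(collection_fields, items):
    """Copy collection-level values into every item row.

    Item-specific fields (for example a folder title) override the
    inherited collection-level values when both are present.
    """
    rows = []
    for item in items:
        row = dict(collection_fields)  # inherited from the finding aid
        row.update(item)               # item-level description
        rows.append(row)
    return rows


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data.
collection = {"creator": "Example Creator", "rights": "Example rights statement"}
items = [
    {"identifier": "ex_0001", "title": "Folder 1"},
    {"identifier": "ex_0002", "title": "Folder 2"},
]
rows = build_rows(collection, items)
sheet = to_csv(rows, ["identifier", "title", "creator", "rights"])
```

Only the identifier and title differ between rows here, which mirrors the point of the slide: the archival description already exists and is reused rather than rewritten.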
Descriptions That Don’t Work
“Accomplishments to Jackson Hole, 1927-1948: Box 1”
“Correspondence, Chronological, 1930-1939: Boxes 65-80”
“Miscellaneous Negatives, undated: Boxes 19-23”
Procedural Opportunities
• Describing for the web:
– Manageable chunks described
– Focus on “About-ness”
– Accuracy
– Maintain and improve a “minimal” methodology
Administrative Opportunities
• Begin to treat digitization as an integrated part of the archival administration workflow
• Collections flow freely between Digitization and Processing staff
• Archival staff with dual responsibilities?
• Embrace practical levels of reprocessing to support digitization
Mary W. Elings
Archivist for Digital Collections
The Bancroft Library
University of California
Outsourcing Rapid Capture of Special Collections
This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/us/
The Quick and the Good:
Outsourced Rapid Capture Projects
Microfilm of Manuscript/Print Collections
2003-2004: Hearst Papers pilot (4,000 pages)
2004-2005: Bancroft Dictations (16,000 pages)
2005-2010: Historic CA Newspapers-NDNP (300,000 pages)
2008-2010: John Muir Correspondence (24,800 pages)
Negatives from Pictorial Collections
2004-2005: SF Call Bulletin negatives (500 images)
2009-2011: SF Examiner negatives (31,000 images so far…)
Mary W. Elings
OCLC Webinar: Rapid Capture in Special Collections and Archives Webinar
27 October 2011
Rapid Capture Stats
• ~350,000 images from Manuscript Collections
• ~35,000 images from Pictorial Collections
[Chart: MF Scans and PIC Scans by period (2003-2004, 2005-2006, 2007-2008, 2009-2010); vertical axis from 0 to 140,000]
Rapid Capture Costs
Any time scanner throughput can be increased, costs are reduced.
Doing work in quantity, grouping materials by size, and minimizing handling and equipment adjustments reduces the overall cost of capture.
The Bancroft Library has successfully reduced costs and increased throughput using this methodology.
• Traditional Capture
  – Paintings, Drawings, Prints
    • 2,700 images in two years
    • $20 per image
• Rapid Capture
  – Microfilm
    • 80,000 images in two years
    • $0.30 - $0.60 per image
  – Historic Negatives
    • 23,000 images in two years
    • $2.50 per image
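Multiplying out the per-image figures makes the comparison concrete. The totals below are simple arithmetic on the numbers from this slide and were not reported in the presentation; `total_dollars` is a hypothetical helper, and prices are held in whole cents so the results are exact.

```python
def total_dollars(images, cents_per_image):
    """Total capture cost in dollars, with the unit price given in cents."""
    return images * cents_per_image / 100


# (images in two years, cents per image), figures taken from the slide
traditional = total_dollars(2_700, 2_000)
microfilm_low = total_dollars(80_000, 30)
microfilm_high = total_dollars(80_000, 60)
negatives = total_dollars(23_000, 250)

print(f"Traditional capture: ${traditional:,.0f}")
print(f"Microfilm: ${microfilm_low:,.0f} to ${microfilm_high:,.0f}")
print(f"Historic negatives: ${negatives:,.0f}")
```

On these figures, 80,000 microfilm images cost less in total (at most $48,000) than 2,700 traditionally captured images ($54,000).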
Outsourcing: Pros and Cons
• Pros
– Vendors usually have the expertise and staffing in place
– Vendors can purchase, use, and maintain equipment
– Vendors have more work, can make more investment in equipment, and develop more efficient workflows based on volume
– Investment is leveraged across multiple projects
– Costs are fixed and can be budgeted
• Cons
– Loss of control over process and materials
– Difficult to send out original materials
– Need to budget for shipping (time and cost) and insurance
– Specifications must be set at outset/contract
– Do not gain staff expertise and equipment
Outsourcing and Partnerships
– Contracts
– Standards
– Access
– Preservation
– Sustainability
– Quality…
QA vs. QC
• Quality Assurance ensures the process will meet quality parameters defined for a given project (proactive).
– “How will we create products that meet our specifications?”
• Quality Control makes sure the product meets the specifications defined in the process (reactive).
– “Are we creating products that meet our specifications?”
The Quick and the Good
• Capture rates can be increased and costs reduced by
– grouping by size and type of material
– minimizing handling
– scanning in volume
– minimizing individual image adjustments
• Quality can be ensured by establishing QA at the outset and QC throughout production
Rapid Capture at the University of Minnesota Archives
Erik Moore
Assistant University Archivist & Lead Archivist for Health Sciences
University of Minnesota
[email protected] @moore144
Sustainable Scanning
What we’re scanning:
• 20th century, mass produced pubs & records
• Institutional records, informational value
• No online catalog access to hardcopy
How we are doing it:
• DIY digitization, 2 sheet-fed scanners
• PDFs via institutional repository
• Viewed as programmatic, not project
Rapid Capture Update
Report
• 219,074 scans in a single year
• 500 per hour
• 0.4% of holdings
Current
• 650,000+ scans since 2009
• 600-700 per hour
• 1.5% of holdings
Destructive Scanning
• 99% of scanning is sheet-fed
• Bound items are cut & shaved
• Post scanning workflow
– Tied & reshelved
– Foldered & boxed
– Recycled
Digital not Paper
• If informational in value & accessible as digital, why preserve the “original”?
– Important ≠ Unique
• When reformatted, preservation commitment follows the information
– Preservation ≠ Permanent
• Improved upon with full-text searching & portability
Repository not Box
• Digitally reformatted materials join born-digital counterparts in IR
• Complete run accessible in single location
• Preserved as single format
• Curtail problem of “little archives everywhere”
• Discovery happens elsewhere
• Delivery now happens at point of discovery
Discovery & Delivery
Is it working?
• 1958 bound volume of press releases
• No index; card catalog access to title only
• Zero recorded prior use
• Downloaded 771 times since June 2009
Rapid Capture. Rapid Access.
Brian Wilson
Benson Ford Research Center
The Henry Ford
Basics
• In place since January 2011
• Camera / copy stand approach
• Based on Yale Beinecke Library RIP
• Using Canon EOS 5D Mark II DSLR
• $8700 total for hardware and software
Stats
• Over 6500 images produced since Feb 2011
• Imaging average: 45 images/hr (8.5 objects/hr)
• Imaging peak: 114 images/hr (57 objects/hr)
• Post-processing average: 50 images/hr
Rapid Capture
Many Positives
• Can reach published imaging rates
• Documentation publicly available
• Plays well with various material formats
• Speed has different meanings
• Process is a “black box”
But
• “Box” is part of larger workflow
• Workflow can involve many stakeholders
Learning Points
[Diagram: Standard Workflow, with stages labeled Selection, Ingest, Object Description, Imaging, Delivery, File Description, and Management; markers RC and FB]
[Diagram: Rapid Access Workflow, with stages labeled Selection, Imaging, and Delivery; marker RC]
Rapid Access
Single PDF per folder
• Entire folder content in single PDF
• 1-2 images per page
• Created directly from Adobe Bridge
• Images receive sequential file name only
• Page displays collection name, id, folder number
Accessed through description
• At folder level for EAD; collection level for non-EAD
Presented in website context
• Flexpaper embedded viewer application
• Display of collection information
• Navigation between folders
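The single-PDF-per-folder layout can be sketched as a small planning function. This is a hypothetical illustration of the layout logic only, not how Adobe Bridge builds the file: `plan_folder_pdf`, the file-name pattern, the header wording, and the sample collection values are all assumptions.

```python
def plan_folder_pdf(collection_name, collection_id, folder_number,
                    image_count, per_page=2):
    """Plan one PDF for one folder: a page header plus images grouped per page."""
    # Every page carries the same identifying line.
    header = f"{collection_name} | {collection_id} | Folder {folder_number}"
    # Images get a sequential file name only, with no descriptive metadata.
    names = [f"{n:04d}.jpg" for n in range(1, image_count + 1)]
    # Group the sequence into pages of at most `per_page` images.
    pages = [names[i:i + per_page] for i in range(0, len(names), per_page)]
    return {"header": header, "pages": pages}


# Hypothetical folder with five images, two per page.
plan = plan_folder_pdf("Example Collection", "ACC-0000", 3, 5)
```

Keeping description at the folder level is what lets the images carry nothing more than a sequence number.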
System Components
[Diagram components: SWF, XML, XTF, EAD, PDF, MS Word]
Folder Viewer
Development Status
Imaging to Access
• 6 hours for 200 photo prints across 20 folders
• Image post-processing = 25%
• PDF creation, linking, etc. = 25%
Three collections processed fully to date
Using Flash version of Flexpaper
• An HTML5 version is available
Running on internal network only
Positive staff feedback
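The "Imaging to Access" figures on this slide can be turned into hours with a little arithmetic. The sketch below is derived from the slide's numbers and was not part of the presentation; `time_breakdown` and the step names are hypothetical, and the `other` share is simply what the slide leaves unassigned (presumably imaging and related handling, which is an assumption).

```python
def time_breakdown(total_hours, shares):
    """Convert fractional shares of a job into hours, plus the unassigned rest."""
    hours = {step: total_hours * share for step, share in shares.items()}
    hours["other"] = total_hours - sum(hours.values())
    return hours


# 6 hours for 200 photo prints across 20 folders, per the slide.
breakdown = time_breakdown(6, {"post_processing": 0.25, "pdf_creation_linking": 0.25})
prints_per_hour = 200 / 6
prints_per_folder = 200 / 20

for step, hours in breakdown.items():
    print(f"{step}: {hours:.1f} h")
print(f"End-to-end rate: {prints_per_hour:.1f} prints per hour")
```

The end-to-end rate is lower than the imaging-only rates quoted earlier because it also includes post-processing and PDF creation.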
Questions?
Laura Clark Brown, University of North Carolina at Chapel Hill, [email protected]
Ben Goldman, University of Wyoming, [email protected]
Mary Elings, University of California, Berkeley, [email protected]
Erik Moore, University of Minnesota, [email protected]
Brian Wilson, The Henry Ford, [email protected]
Ricky Erway, OCLC Research, [email protected]
References
Adobe Bridge CS5. http://www.adobe.com/products/bridge.html
California Digital Library, XTF. http://xtf.cdlib.org/
Canon U.S.A., EOS 5D Mark II Camera. http://www.usa.canon.com/cusa/consumer/products/cameras/slr_cameras/eos_5d_mark_ii
Content, Context, and Capacity: A Collaborative Large-Scale Digitization Project on the Long Civil Rights Movement in North Carolina. http://www.trln.org/ccc/index.htm
Devaldi Ltd., Flexpaper. http://flexpaper.devaldi.com/
Dietz, Brian and Jason Ronallo. 2011. Automating a Digital Special Collections Workflow Through Iterative Development. Philadelphia, PA: ACRL. http://www.ala.org/ala/mgrps/divs/acrl/events/national/2011/papers/automating_digital_s.pdf
References, Continued
Dunnam, Jennifer, Vicki Field, et al. 2006. University Information Assets: Re-Defining the University Archives in a Digital Age. University of Minnesota: President's Emerging Leaders Program. http://purl.umn.edu/5513.
Erway, Ricky, and Jennifer Schaffner. 2007. Shifting Gears: Gearing Up to Get Into the Flow. Dublin, Ohio: OCLC Programs and Research. http://www.oclc.org/research/publications/library/2007/2007-02.pdf.
National Archives and Records Administration. 2007. Plan for Digitizing Archival Materials for Public Access 2007-2016. http://www.archives.gov/comment/nara-digitizing-plan.pdf.
Schaffner, Jennifer. 2009. The Metadata is the Interface: Better Description for Better Discovery of Archives and Special Collections, Synthesized from User Studies. Dublin, Ohio: OCLC Research. http://www.oclc.org/programs/publications/reports/2009-06.pdf
Yale Beinecke Library, Digital Imaging Studio. http://beinecke.library.yale.edu/brbltda/dis/dishome.asp
Thank you!