Digitizing Biological Collections
Institute for Digital Information & Scientific Communication – Florida State University 1
Workflow Elements and Concepts: Common Practices
Gil Nelson
Digitization Workshop
May 30, 2012
Biodiversity Digitization: Ultimate Goals of Effective Workflows
Output level: An abundance of scientifically useful and accessible data.
Constituency level: High quality exposure of the content and value of scientific collections.
Improvement level: Collaboration and workflow sharing across the collections community.
Global continua guiding digitization
Emphasis in: local decisions and policies
Implementation in: specific workflows
Pressures militating against the long view
• So much data, so little time.
• Our collections are not getting smaller.
• The funding agencies have high output expectations.
• We only have 3 years to get this done.
• All of our data and all of our specimens are important.
• Let’s just use the images!
• We’ll do the minimum now and enhance it later (while avoiding the Scarlett O’Hara syndrome).
Taking the long view means developing doable, effective, and sustainable strategies for balancing long-term goals with short-term constraints, including a commitment to implementing future enhancements.
Short view ↔ Long view
Scan and Deliver: Managing User-initiated Digitization in Special Collections and Archives (J. Schaffner, F. Snyder, S. Supple, 2011)
• Taking the inside track is often based on stretching the institution’s resources. Decisions are made to maximize resources available for user-initiated digitization by using solid baseline practices. The primary focus on the inside track is to get the job done quickly and to fill the user’s request.
• Taking the middle track has the widest range of options, standards, and results. This is the most flexible of the tracks, where decisions often fall in gray areas.
• Taking the outside track focuses on the collections themselves. While users may initiate digitization, it is undertaken to deliver materials to a greater public. These decisions may lead to comprehensive digitization, such as an entire book, series, or collection. The goal is to create maximum access to special collections, using preservation and archival standards. This track usually involves a level of thought and planning that is more in-depth than the fulfillment of day-to-day digitization requests.
Tracks to Digitization
Global Digitization Continua
Current tools ↔ Potential future tools
Fitness ↔ Quantity
Efficiency ↔ Speed
High cost/specimen ↔ Low cost/specimen
Digital protocols ↔ Traditional practices
Image everything ↔ Image exemplars ↔ Image nothing
Ancillary materials ↔ Specimens only
Evolving workflows ↔ Static workflows
Future Tools Favoring the Inside/Middle Tracks
• OCR, NLP, and ICR (handwriting analysis) improvements
• Automated image analysis for data extraction
• Data mining of labels
• Robotic technologies, conveyor belts, etc.
• Improvements in discovery/capture/use of duplicates
• Improvements in voice recognition and other data entry technologies
• Post-digitization tools for curation and quality control
• Field data capture
Fitness ↔ Quantity
Fitness facilitators:
• Emphasize fitness for use
• Robust datasets
• Data validation/cleaning
• Integrated quality control
• Integrated georeferencing
• Intensive curation
• Record historical annotations
• Staff specialization
• Small collection
• Emphasize images
• High quality images
Quantity facilitators:
• Emphasize output
• Spartan datasets
• Defer validation/cleaning
• Deferred quality control
• Deferred georeferencing
• Deferred or cursory curation
• Record current determination
• Staff generalization
• Large collection
• Emphasize data
• Low quality images
Efficiency vs. Speed (Quality?)
False dichotomy?
Is increased speed the inevitable outcome of improved efficiency?
Is increased speed always and necessarily the desired outcome of improved efficiency?
Efficiency vs. Speed
Improving Efficiency
• Reduce or eliminate redundancy (e.g., label data entry)
• Reduce or eliminate unnecessary steps in a workflow
• Maintain a clearly logical, easy-to-follow workflow
• Mitigate monotony for technicians
• Reduce or eliminate travel time
• Reduce technician fatigue
• Ensure sustained output
• Increase output over the long term
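One common source of redundant label data entry is duplicate specimens (for example, herbarium sheets cut from the same gathering) whose label data has already been keyed. The sketch below shows one way to reuse that data; it assumes hypothetical record dictionaries with "barcode", "collector", and "number" (collector's field number) keys, and it treats a collector-plus-number match as a candidate that a technician still verifies, not as a guaranteed duplicate.

```python
"""Sketch: reuse label data already keyed for duplicate specimens.

Assumes each record is a dict with hypothetical "barcode", "collector",
and "number" keys; any other keys hold already-entered label data.
"""


def duplicate_key(record):
    # Normalize the collector name and field number so trivial differences
    # in spacing or capitalization do not hide a candidate duplicate.
    collector = " ".join(record["collector"].split()).lower()
    return (collector, record["number"].strip())


def prefill_from_duplicates(new_specimens, existing_records):
    """Split new specimens into those that can be prefilled and those to key."""
    index = {duplicate_key(r): r for r in existing_records}
    prefilled, to_key = [], []
    for spec in new_specimens:
        match = index.get(duplicate_key(spec))
        if match is None:
            to_key.append(spec)
        else:
            # Copy the already-entered label data but keep this specimen's
            # own barcode; a technician should still confirm the match.
            prefilled.append({**match, "barcode": spec["barcode"]})
    return prefilled, to_key


# Illustrative records only; barcodes and names are hypothetical.
existing = [
    {"barcode": "EX-0001", "collector": "A. Collector", "number": "1234",
     "locality": "Example County"},
]
incoming = [
    {"barcode": "EX-0500", "collector": "a.  collector ", "number": "1234"},
    {"barcode": "EX-0501", "collector": "B. Collector", "number": "77"},
]
prefilled, to_key = prefill_from_duplicates(incoming, existing)
```

Only the specimens left in `to_key` need full manual entry; the prefilled ones need a quicker verification pass.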
Efficiency vs. Speed
A rested, happy, satisfied tortoise is usually better than a harried hare!
Productivity vs. Cost
Issues to Resolve in Assessing Productivity
Measuring productivity (comparability across collections):
• Unit: output per unit time vs. expenditure/project totals
• Data fitness: should data robustness be factored into the calculus?
Measuring cost:
• Is this a competitive event?
• Output per hour at a given fitness?
• $$ per specimen at a given fitness?
• Accounting for variances in prep type, regional pay rates, data robustness, etc.?
Comparability:
• Just what is being measured?
• What is included in the output?
• Are all steps in the process accounted for?
• Are all expenditures of time accounted for?
• How do we arrive at a true per-specimen cost?
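The comparability questions above come down to what goes into the numerator and denominator. The sketch below, using a hypothetical `StepLog` record and made-up hours, pay rates, and other costs, computes both output per hour and dollars per specimen for one batch, and shows how omitting steps (here staging and georeferencing) understates the per-specimen cost.

```python
from dataclasses import dataclass


@dataclass
class StepLog:
    """Logged effort for one workflow step on a batch (hypothetical layout)."""
    name: str
    hours: float        # technician hours spent on the step
    hourly_rate: float  # local pay rate, dollars per hour


def cost_per_specimen(steps, specimens, other_costs=0.0):
    """All logged labor plus other expenditures, divided by specimens completed."""
    if specimens <= 0:
        raise ValueError("specimens must be positive")
    labor = sum(s.hours * s.hourly_rate for s in steps)
    return (labor + other_costs) / specimens


def specimens_per_hour(steps, specimens):
    """Output per unit of technician time across every logged step."""
    total_hours = sum(s.hours for s in steps)
    return specimens / total_hours


# Hypothetical batch of 1,000 specimens; all figures are illustrative only.
batch = [
    StepLog("staging", 10, 12.0),
    StepLog("image capture", 20, 12.0),
    StepLog("data capture", 40, 15.0),
    StepLog("georeferencing", 10, 15.0),
]
full = cost_per_specimen(batch, 1000, other_costs=500.0)
# Counting only image and data capture leaves out real labor.
partial = cost_per_specimen(batch[1:3], 1000, other_costs=500.0)
```

Two collections can report very different figures for identical work simply by logging different steps, which is why the same accounting rules need to be agreed on before comparing.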
Continuous Workflow Improvement
• Develop written workflows
• Continuous evaluation of written and production workflows by technicians, workflow managers, and collections managers
• With particular attention to bottlenecks, redundancy, handling time, and varying rates of productivity
Written Protocols
Continuous Workflow Improvement
• Develop written workflows that reflect actual practice
• Continuous evaluation of written and actual workflows by technicians, workflow managers, and collections managers
• With particular attention to bottlenecks, redundancy, handling time, and varying rates of productivity
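Bottlenecks and varying rates of productivity can be tracked with simple numbers once output per stage is logged. The sketch below assumes hypothetical specimens-per-hour rates for the task clusters named in this talk and assumes the stages are staffed concurrently, with specimens flowing from one stage to the next; under that assumption the slowest stage caps sustained output. It also computes a coefficient of variation as one possible (not prescribed) measure of how uneven a stage's daily output is.

```python
from statistics import mean, pstdev


def find_bottleneck(stage_rates):
    """Return the slowest stage; with concurrent stages it caps sustained output."""
    stage = min(stage_rates, key=stage_rates.get)
    return stage, stage_rates[stage]


def rate_variability(daily_output):
    """Coefficient of variation of a stage's daily output (0 means perfectly steady)."""
    avg = mean(daily_output)
    return pstdev(daily_output) / avg


# Hypothetical specimens-per-hour rates; not measured values.
rates = {
    "staging": 120.0,
    "image capture": 90.0,
    "data capture": 25.0,
    "image processing": 200.0,
    "georeferencing": 40.0,
}
stage, rate = find_bottleneck(rates)
# Hours to clear a hypothetical backlog of 10,000 specimens at the bottleneck rate.
hours_for_backlog = 10_000 / rate
```

If one technician instead carries each specimen through every stage in turn, throughput depends on the sum of per-stage handling times rather than the minimum rate, so the same logs should be read differently.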
• 28 collections, 10 museums
• Spanning biological and paleontological collections: insects and other invertebrates, plants, birds, mammals
• Wet and dry collections
Observing Digitization Practices in Biological and Paleontological Collections
American Museum of Natural History
Botanical Research Institute of Texas
Florida Museum of Natural History
Florida State University
Harvard Herbarium
Museum of Comparative Zoology (Harvard)
New York Botanical Garden
SERNEC
Specify Software Project (University of Kansas)
Symbiota Software Project (Arizona State University)
Tall Timbers Research Station and Land Conservancy
Tulane University Museum of Natural History
University of Kansas Insect Museum
Valdosta State University
Yale Peabody Museum
Acknowledgments
Pre-digitization Curation (or “Staging”)
Image Capture
Data Capture
Image Processing
Image/Data Storage
Georeferencing
Personnel
Written Protocols
Biodiversity Informatics Manager
Task Clusters
Dominant Digitization Patterns Observed
Linear vs. Looping
[Diagram: workflow patterns A and B for the barcode, image, and data steps]
Personnel specialization & availability | Reduces bottlenecks | Technician preference
Guidance: Digitisation: A strategic approach for natural history collections (B. Kalms, 2012)
Thank You!