Date post: | 10-Jul-2015 |
Category: |
Education |
Upload: | europeana-newspapers |
Digitisation at the Wellcome Library:
Lessons learned & shared.
Historical Newspapers in the Digital Age, Bolzano
October, 2014
Dave Thompson
Digital Curator, Wellcome Library
The Wellcome Library
• Part of Wellcome Collection, an astonishing public
venue in London developed by the Wellcome
Trust, where people can learn more about
medicine through the ages & across cultures.
• More than 10,000 readers visit us each year,
including historians, academics, students, health
professionals & consumers, journalists, artists &
members of the general public.
Digitisation in the Wellcome Library
• Strategic approach, conscious planned decisions.
• Library transformation strategy, physical to digital.
• From ‘project’ to ‘production’.
• Digitisation as a sustainable end-to-end process.
Overview – four IT systems…
1. Workflow management system – ‘Goobi’ =
PRODUCTION.
2. Digital object repository – ‘Preservica’ =
STORAGE.
3. Front end - ‘the player’ = ACCESS.
4. Temporary & permanent storage for content =
70 TB
Digitisation: Metadata import
MARC records are imported from Sierra into
Goobi as MARC XML.
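As an illustration of the kind of record this step handles, here is a minimal Python sketch that reads one MARC XML record using only the standard library. The sample record and the function name read_marc_record are hypothetical; this is not Goobi's or Sierra's import code, only a sketch of the MARC XML structure (namespace http://www.loc.gov/MARC21/slim).

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC XML ("MARC21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, not taken from the Wellcome catalogue.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">b0000001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">An example author.</subfield>
  </datafield>
</record>"""


def _text(element):
    """Return the stripped text of an element, or None if it is missing."""
    if element is None or element.text is None:
        return None
    return element.text.strip()


def read_marc_record(xml_text):
    """Return the control number (001) and title (245 $a) of one record."""
    record = ET.fromstring(xml_text)
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    title = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    return {"id": _text(control), "title": _text(title)}


summary = read_marc_record(SAMPLE_MARCXML)
print(summary)
```

A real import would also need to handle collections of records and repeated fields; the sketch reads a single record only.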
Digitisation: Image upload
Digitised images (internally or externally
digitised) are imported into Goobi &
normalised to JPEG2000.
Digitisation: Upload, ftp, harvesting
Content delivered by ftp can be automatically
imported into Goobi & processed, or Internet Archive
(IA) content can be automatically harvested.
Digitisation: METS/ALTO for access
Content is OCR’d & METS/ALTO files are
created in Goobi. Manual/automatic.
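To show what an ALTO file carries, the following Python sketch pulls the OCR text out of a small ALTO document, line by line. The sample XML and the function name alto_lines are hypothetical, and the code matches elements by local name so that it does not depend on a particular ALTO schema version; it is a sketch, not how Goobi or the player reads ALTO.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment: two text lines with word coordinates.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Historical" HPOS="10" VPOS="10" WIDTH="80" HEIGHT="12"/>
      <SP/>
      <String CONTENT="newspapers" HPOS="95" VPOS="10" WIDTH="90" HEIGHT="12"/>
    </TextLine>
    <TextLine>
      <String CONTENT="digitised" HPOS="10" VPOS="30" WIDTH="70" HEIGHT="12"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the recognised text of each TextLine as a list of strings."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is a String element; its text is in the CONTENT attribute.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

The HPOS, VPOS, WIDTH and HEIGHT attributes are what allow a viewer to highlight a search hit on the page image.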
Digitisation: Repository ingest
Goobi initiates automated ingest of images &
metadata into Preservica.
Digitisation: Access
Player pulls images from
Preservica using metadata in the
METS/JSON file.
Or from a different perspective…
[Diagram: content flow into Goobi (METS/OCR) & Preservica]
Sources: In-house, Institutions, Contractors, Harvesting, Grey literature
Delivery: TIFF or JP2, by HD & ftp; auto harvesting of JP2 & DMD
Processing: Normalises TIFF to JP2; Jpylyzer validates JP2
Modes: Manual, Automatic
Roles: Project Managers / Ingest Officer; Project Managers; Ingest Officer / Digital Curator
Snagging
Lesson 1 - Digitisation as a social activity
1. Digitisation is not a technical problem; it’s a social
activity between creator & user.
2. Internally: Digitisation engages with all parts of the
organisation, & draws on many different skills.
3. Externally: Engaging with (Between…?) creators &
users, moving data into public realms, providing
access.
http://www.emmanueladegbola.com/networking-leads/
Projects & workflows
1. Standardised processes to deal with differences in
content & themes.
2. Use ‘projects’ & workflows to define activities &
automated steps to handle material from
transfer/acquisition to dissemination.
3. Projects & workflows allow us to manage our
processes & to report activity.
http://www.amross.sd/
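One way to picture 'projects' and workflows is as a fixed list of steps that every item moves through, with reporting derived from each item's position. The Python sketch below assumes step names taken from the earlier slides; the Item class, its fields and the report function are hypothetical and do not describe Goobi's data model.

```python
from collections import Counter
from dataclasses import dataclass

# Step names borrowed from the slides above; the ordering is an assumption.
WORKFLOW_STEPS = (
    "metadata import",
    "image upload",
    "JP2 normalisation",
    "METS/ALTO creation",
    "repository ingest",
    "access",
)


@dataclass
class Item:
    identifier: str
    project: str
    completed: int = 0  # number of workflow steps finished so far

    def current_step(self):
        """Name of the step the item is waiting on, or 'done'."""
        if self.completed >= len(WORKFLOW_STEPS):
            return "done"
        return WORKFLOW_STEPS[self.completed]

    def advance(self):
        """Mark the current step as finished."""
        if self.completed < len(WORKFLOW_STEPS):
            self.completed += 1


def report(items):
    """Count items per (project, current step) for activity reporting."""
    return Counter((item.project, item.current_step()) for item in items)


items = [Item("a", "Newspapers"), Item("b", "Newspapers")]
items[0].advance()
print(report(items))
```

Because every project shares the same small set of steps, the same report works across different content and themes.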
Standardised formats
1. Digitisation process built around a small number of
formats.
2. Only accept, or create, TIFF or JPEG2000 image
formats for digitisation. MPEG2 for video.
3. Share our JPEG2000 profile with creators & validate
images at point of processing.
4. Standardised metadata format(s) for discovery
(MARC) & retrieval (ALTO/JSON).
http://blog.absolutvision.com/en/jpeg2000-format/
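Accepting only a small number of formats makes a cheap first check possible at the point of processing. The Python sketch below identifies TIFF and JP2 files from their leading signature bytes. It is an assumption-laden illustration: the function name sniff_format is hypothetical, BigTIFF is not handled, and a signature check is far weaker than the structural and profile validation a tool such as Jpylyzer performs.

```python
# The 12-byte JP2 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

# Classic TIFF headers: little-endian ("II") and big-endian ("MM"), magic 42.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def sniff_format(header):
    """Classify a file from its first bytes as 'JP2', 'TIFF' or 'unknown'."""
    if header[:12] == JP2_SIGNATURE:
        return "JP2"
    if header[:4] in TIFF_SIGNATURES:
        return "TIFF"
    return "unknown"


def sniff_file(path):
    """Read just enough of a file to classify it."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(12))


print(sniff_format(JP2_SIGNATURE + b"ftypjp2 "))
print(sniff_format(b"II*\x00\x08\x00\x00\x00"))
```

A file that fails this check can be rejected before any heavier validation or normalisation is attempted.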
Lesson 2 – It’s a strategic issue
1. Given the scale & complexity, clear strategic direction
is essential.
2. Digitisation has to support an institution’s users & their
information needs.
3. Digitisation has to be a strategic decision supporting
an institution’s purpose.
4. Digitisation doesn’t change the mission of an
organisation.
Industrialisation of processes
1. Digitisation built around a small number of formats.
Workflows built around a small number of pre-defined
steps.
2. Common workflow activities mean less system
development; we can build our own processes.
3. Easier for humans to learn, less training, more
certainty/reliability.
4. Industrialisation supports processes that are
sustainable.
http://www.howtobeadad.com/2013/14723/unicorn-poop-how-i-fell-in-love-with-the-daughter-i-never-had
Lesson 3 – sustainability or bust
1. Digitisation has to be a sustainable process.
2. Processes have to be scalable to ambition.
3. Design, re-design & review processes constantly &
integrate with existing services.
4. Digitisation as evolution: learn from what has been
done, apply & move forward.
http://planetivy.com/gaming/25273/natural-selection-2-gaming-evolution-in-action/
Automation is key
1. Automation is essential to scalability & efficiency.
2. Within digitisation some activities are very susceptible
to automation. Automate them.
3. Automation standardises processes. Good for life
cycle management of data.
4. Automated processes maximise investment in
digitisation & support scalability.
http://www.technibble.com/automating-computer-business-for-profit/
Automated harvesting of IA content
Content processed automatically, including
creation of METS & ALTO.
Goobi has a ‘repository’ of IA identifiers for
searching/harvesting.
Goobi harvests data from Internet Archive
website.
Content available in the player.
Content stored in Preservica.
DDS creates JSON for the player & pre-caches some content.
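The harvesting idea can be sketched in Python as follows. The Internet Archive exposes item metadata at https://archive.org/metadata/<identifier> and files at https://archive.org/download/<identifier>/<name>. Everything else here is an assumption for illustration: the function name plan_harvest, the file-name suffixes used to pick out JP2 images and descriptive metadata, and the sample identifier. It is not Goobi's harvester and makes no network calls.

```python
from urllib.parse import quote

METADATA_URL = "https://archive.org/metadata/{identifier}"
DOWNLOAD_URL = "https://archive.org/download/{identifier}/{name}"

# Assumed suffixes for the JP2 image bundle and the MARC descriptive metadata.
WANTED_SUFFIXES = ("_jp2.zip", "_marc.xml")


def metadata_url(identifier):
    """URL of the JSON metadata document for one IA item."""
    return METADATA_URL.format(identifier=quote(identifier))


def plan_harvest(identifier, metadata):
    """List download URLs for the wanted files of one item.

    `metadata` is the already-parsed JSON from the metadata URL; only its
    "files" list (entries with a "name" key) is used.
    """
    urls = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        if name.endswith(WANTED_SUFFIXES):
            urls.append(
                DOWNLOAD_URL.format(identifier=quote(identifier), name=quote(name))
            )
    return urls


# Hypothetical identifier and file list, standing in for a real response.
sample_id = "exampleitem00test"
sample_metadata = {
    "files": [
        {"name": "exampleitem00test_jp2.zip"},
        {"name": "exampleitem00test.pdf"},
        {"name": "exampleitem00test_marc.xml"},
    ]
}
harvest_plan = plan_harvest(sample_id, sample_metadata)
print(harvest_plan)
```

Keeping the planning step separate from the download step makes it straightforward to run over a stored list of identifiers and to log what would be fetched before fetching it.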
Lesson 4: Nothing without imagination
1. The power of digitisation can only be revealed if we
can imagine the uses the data can be put to.
2. Digitisation is not an exercise in technology for its own
sake.
3. There is nothing that cannot be achieved, but it takes
more than kit, tools, computers, software.
4. Digitisation is about engaging with creators &
consumers, with the data & with the future.
Digitisation is not a separate activity
• Starts with alignment with the institutional mission.
• Builds on strategic vision.
• Digitisation as a strategic activity, planned &
supported.
• Integrate all institutional systems, bibliographic, IT
& human.
http://ocdindia.com/
Lesson 5 – The complete package
1. Digitisation is much more than sticking stuff under a
camera or on a scanner.
2. Digitisation has to be developed as a whole &
complete end-to-end process.
http://veritusgroup.com/how-to-create-a-dynamic-strategy-for-every-single-donor-a-step-by-step-process/
So, lessons learned
• Digitisation is a social activity.
• Digitisation as a planned strategic activity.
• Digitisation has to be a sustainable & scalable
activity.
• Automation is key.
• Nothing without imagination.
• Digitisation has to be a complete package.
In the end we built something beautiful
Questions now, questions
later…?
Dave Thompson
Digital Curator
Wellcome Library
@D_N_T