Date posted: 25-Jun-2015
Category: Government & Nonprofit
Uploaded by: frederick-zarndt
Newspaper digitization
Frederick Zarndt IFLA Newspapers Section
[email protected] @cowboyMontana
hashtag #IFLAnewspaper
the agenda
1. Introductions
2. Review of the OAIS reference model
3. Newspaper digitization programs
4. Selection of materials
5. Importance of standards
6. Project management
7. Digitization workflow
   7.1. Images
   7.2. Metadata
   7.3. File formats
8. Digitization workflow demonstration with docWorks
9. Quality assurance and acceptance criteria
10. Tools for digitization, workflow, digital preservation, and project management
11. Digital preservation considerations
12. Wrap-up
10.30 Morning tea break; 13.00 Lunch; 15.30 Afternoon tea break
An Open Archival Information System (or OAIS) is an archive, consisting of an
organization of people and systems, that has accepted the responsibility to preserve information and make it available for a
Designated Community.
Wikipedia contributors, “Open Archival Information System," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Open_Archival_Information_System (accessed March 2014).
• Negotiate for and accept appropriate information from information Producers.
• Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.
• Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided.
• Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
• Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original.
• Make the preserved information available to the Designated Community.
Open Archival Information System (OAIS) reference model
programs
national, collaborative, and individual programs
national: centrally funded and managed programs with several participants. strict standards.
• National Digital Newspaper Program (Library of Congress)
• Australian Newspaper Digitisation Program
national programs
cooperative: organizations collaborate to achieve a common goal but digitization programs are managed separately. flexible standards.
• Europeana newspapers • Digital Public Library of America
cooperative programs
individual: an organization digitizes on its own and may, but more usually does not, follow open standards. all commercial organizations fall into this category.
• ProQuest Historical Newspapers • Newspapers.com • Newsbank • many others…
individual programs
• digitization program requires careful thought
• must be adapted to local circumstances
• ask those who have gone before
• join the IFLA Newspapers Section! (ask me how)
programs
Image courtesy of Donald Zolan.
Discussion questions
1. Has your organization already begun to digitize newspapers? How is the digitization program organized and funded?
2. If your organization hasn’t yet begun to digitize newspapers, what type of digitization program would best suit your organization / state / country? Why?
programs?
Experience is that marvelous thing that enables you to recognize a mistake when you make it again.
F. P. Jones
selection
reasons for digitization
• newspapers are deteriorating
• microfilm is dissolving
• no storage space
• access
• Who are your users? Do you know?
• Can you ask them what they expect from a digital newspaper collection? Can you trust their answers?
• Trove, Papers Past, Cambridge Public Library, CDNC: these digital newspaper collections are used mostly by people aged 50 or older who are interested in family history.
Library of Congress selection criteria for the National Digital Newspaper Program (NDNP)
• Image quality
• Intellectual content
• Refinements
http://www.loc.gov/ndnp/guidelines/selection.html
Image quality
All NDNP newspaper images are scanned from microfilm.
1. Microfilm should be produced from properly prepared unbound originals.
2. The microfilm reduction ratio should be less than 20x. This allows 400dpi images to be scanned from the film.
3. Variations in microfilm density within and between images should be no more than 0.2.
4. Negative microfilm duplicated for scanning should have resolution test patterns readable at 5.0 or higher. For camera master microfilm without resolution test charts, resolution can be estimated by comparison to film with resolution test charts and original material.
selection for NDNP
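The link between reduction ratio and scan resolution is a simple multiplication: to reach a target resolution relative to the original page, the scanner must sample the film at that resolution times the reduction ratio. A minimal sketch, assuming resolution in dots per inch and a reduction ratio written as a plain factor (20 for 20x); the function names are hypothetical.

```python
def film_scan_dpi(target_dpi: float, reduction_ratio: float) -> float:
    """Sampling rate needed on the film to yield target_dpi on the original page."""
    if target_dpi <= 0 or reduction_ratio <= 0:
        raise ValueError("target_dpi and reduction_ratio must be positive")
    return target_dpi * reduction_ratio


def effective_dpi(scan_dpi: float, reduction_ratio: float) -> float:
    """Resolution relative to the original page for a given film sampling rate."""
    if scan_dpi <= 0 or reduction_ratio <= 0:
        raise ValueError("scan_dpi and reduction_ratio must be positive")
    return scan_dpi / reduction_ratio


if __name__ == "__main__":
    # Film-plane sampling needed for 400 dpi at several reduction ratios.
    for ratio in (14, 18, 20, 24):
        print(f"{ratio}x reduction: {film_scan_dpi(400, ratio):.0f} dpi on the film")
```

Under these assumptions, a 20x reduction already requires sampling the film at 8000 dpi to deliver 400 dpi, so every step above 20x pushes the scanner harder.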
Intellectual content
1. The newspaper title reflects the political, economic and cultural history of the State.
2. Selected newspaper titles should ensure broad geographical coverage.
3. Newspaper titles that provide coverage of a geographic area or a group over long time periods are preferred over short-lived titles or titles with significant gaps.
selection for NDNP
Selection criteria refinements
1. Orphan titles: special consideration should be given to high research value titles that have ceased publication and lack active ownership.
2. Newspaper titles that document a significant (minority) community at the state or regional level may be given special consideration.
3. Newspapers that have already been digitized by other organizations (for example, ProQuest) should not be digitized again.
selection for NDNP
National Library of Australia collection managers, in consultation with staff from Preservation Services, nominate materials for digitization. The Library works closely with state and territory libraries to systematically digitise newspapers held in these libraries. Selected newspapers include those with:
• Cultural and/or historical significance
• Uniqueness and/or rarity of the material
• Copyright status or permission to digitise obtained
• Material in high demand
• Material at risk because of its physical condition
https://www.nla.gov.au/policy-and-planning/collection-digitisation-policy
selection for ANDP
Most newspaper titles selected for digitization are out of copyright and in the public domain. Negotiating use rights is quite simply too much trouble and fraught with legal pitfalls.
Copyright laws and policies vary considerably between countries.
copyright
…however…
Digitization of and public access to in-copyright newspapers are not impossible.
Discussion questions
1. Has your organization already selected newspapers to digitize? Why did it choose the titles that were selected? Please answer (hypothetically) if your organization hasn’t begun a newspaper digitization program.
2. Why would or why wouldn’t your organization select in-copyright newspapers to digitize?
selection?
importance of standards
• Availability: Open standards are available for all to read and implement.
• Maximize end-user choice: Open standards create a fair, competitive market for implementation of the standards. They do not lock the customer into a particular vendor or group.
• No royalty: Open standards are free for all to implement, with no royalty or fee.
• No discrimination: Open standards and the organizations that administer them do not favor one implementor over another for any reason other than the technical standards compliance of a vendor's implementation.
• Extension or subset: Implementations of open standards may be extended, or offered in subset form. However, certification organizations may decline to certify subset implementations, and may place requirements upon extensions.
• Predatory practices: Open standards may employ license terms that protect against subversion of the standard by embrace-and-extend tactics. The licenses attached to the standard may require the publication of reference information for extensions, and a license for all others to create, distribute and sell software that is compatible with the extensions. An open standard may not otherwise prohibit extensions.
Adapted from FOSS Open Standards. http://en.wikibooks.org/wiki/FOSS_Open_Standards
open standards
• Not restrictive: Less chance of being locked in by a specific technology and/or vendor.
• Interoperable: Easier for systems from different parties or using different technologies to interoperate and communicate with one another.
• Protection against obsolescence: Better protection of the data files created by an application against obsolescence.
• Portable: Applications and data are easier to port from one platform to another since they follow known guidelines, rules, and interfaces.
Adapted from FOSS Open Standards. http://en.wikibooks.org/wiki/FOSS_Open_Standards
open standards
What standards are important for newspaper digitization?
• METS XML is an open standard administered by the METS editorial board. See http://www.loc.gov/standards/mets/.
• ALTO XML is an open standard administered by the ALTO editorial board. See http://www.loc.gov/standards/alto/.
• Various image file formats including TIFF, JPEG, JPEG2000.
• PDF/A is a subset of the Portable Document Format (PDF) developed by Adobe. It has been adopted by ISO as a standard (ISO 19005) for long-term archiving. See http://www.pdfa.org/.
• Various library metadata standards including, but not limited to:
  • MODS XML http://www.loc.gov/standards/mods/
  • Dublin Core http://dublincore.org/
  • PREMIS http://www.loc.gov/standards/premis/
newspapers and standards
importance of standards
with few exceptions, libraries use METS XML + ALTO XML + image files (TIFF, JPEG2000) for newspaper digitization programs
proprietary standards
Olive ActivePaper Archive stores historical newspaper data in an XML format that is as capable as METS/ALTO XML but is not an open standard.
Early versions of WordPerfect (MS Word too) stored data in a proprietary format, not in an open standard like Open Document Format (ODF). WordPerfect or special software is needed to view the files.
Adobe’s Flash is a de facto but not an open standard. Flash now appears to be on a path to obsolescence, destined to be replaced by HTML5.
Discussion questions
1. Name a few standards that you use every time you connect to the Internet.
2. What library standards does your organization currently use? What other, non-library standards, if any, does your organization use?
importance of standards
In theory, there's no difference between theory and practice, but in practice, there is.
Anonymous
project management
From the Standish Group’s 2012 Chaos Report on IT Project Failure.
Roger Sessions estimates that the worldwide cost of IT failure is USD 500 billion per month.
Roger Sessions: CTO of ObjectWatch. He has written seven books, including Simple Architectures for Complex Enterprises, and many articles. He is a founding member of the Board of Directors of the International Association of Software Architects.
high cost of IT failure
in a recent survey of 1230 IT professionals conducted by Embarcadero Technologies, 2 of the 3 biggest project challenges cited by the IT pros are “poor planning” and “poor or no requirements”
plan!
in a March 2007 web poll conducted by the Computing Technology Industry Association, "nearly 28 percent of the more than 1,000 respondents singled out poor communications as the number one cause of project failure"
communicate!
A recent survey of 752 IEEE members conducted by IEEE Spectrum and The New York Times discovered that "just 9 percent of 133 respondents whose organizations currently offshore R&D reported 'No problem'. The biggest headache was 'Language, communication, or culture' barriers, as reported by 54.1 percent of respondents." (http://www.spectrum.ieee.org/feb07/4881)
communicate!
In their 2009 book Cultural Intelligence: Living and Working Globally, Thomas and Inkson say “Although we increasingly cross boundaries and surmount barriers to trade, migration, travel, and the exchange of information, cultural boundaries are not so easily bridged. Unlike legal, political, or economic aspects of the global environment, which are observable, culture is largely invisible. Therefore, culture is the aspect of the global context that is most often overlooked.”
communicate!
In a white paper written for Project Perfect, Taimour al Neimat lists
• poor planning
• unclear goals and objectives
• objectives changing during the project
• unrealistic time or resource estimates
• lack of executive support and user involvement
• failure to communicate and act as a team
• inappropriate skills
as primary causes for the failure of complex IT projects.
Taimour al Neimat. Why IT Projects Fail. The PROJECT PERFECT White Paper Collection. Oct 2005. http://www.projectperfect.com.au/downloads/Info/info_it_projects_fail.pdf accessed Mar 2014.
plan!
typical tender evaluation criteria in priority order
1. understanding of requirements
2. reputation of service bureau
3. price
requirements?
incomplete requirements
requirements in a recent tender from an (anonymous) government agency somewhere in the world
• project to convert ~170,000 text images to XML
• value of project ~USD 180,000
• 19 pages of definitions, governing law, proposal evaluation criteria, contractual conditions, instructions about tender response format, etc.
• technical requirements description? < 1 page
• data acceptance criteria? “a high level of accuracy”
complete requirements: Library of Congress JPEG2000 profile
a recent newspaper digitization program established by a prominent national library
• digitize more than 20 million text pages
• high level image and XML requirements
• value of work awarded? > USD 5,000,000
• after award of work, technical requirements expand to 43+ pages from ~3 pages
• acceptance criteria? added as an afterthought and not well defined
poor planning
the value of simplicity
“There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.”
C.A.R. Hoare
Professor Sir Charles Anthony Richard Hoare: Emeritus Professor at Oxford University, Senior Researcher at Microsoft Research, recipient of the ACM Turing Award, author of many books on computers and software.
• unitary: the requirement addresses one and only one thing
• complete: the requirement is fully stated in one place with no missing information
• consistent: the requirement does not contradict any other requirement and is fully consistent with all authoritative external documentation
• atomic: it does not contain conjunctions, for example, "the code field must validate American and Canadian postal codes" should be written as two separate requirements
good requirements
• traceable: the requirement meets all or part of a business need as stated by stakeholders and authoritatively documented
• current: the requirement has not been made obsolete by the passage of time
• feasible: the requirement can be implemented within the constraints of the project
• unambiguous: the requirement is concisely stated without recourse to technical jargon or undefined acronyms, and it has only one possible interpretation
• verifiable: the implementation of the requirement can be determined through one of four possible methods: inspection, demonstration, test, or analysis
good requirements
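Two of these properties, atomic and verifiable, can be partly screened mechanically before a human review. A minimal heuristic sketch, assuming plain-English requirement sentences; the conjunction and vague-term word lists are illustrative assumptions, not drawn from any standard, and every flag still needs human judgment.

```python
from __future__ import annotations

import re

# Illustrative word lists (assumptions), not taken from any standard.
CONJUNCTIONS = ("and", "or", "as well as")
VAGUE_TERMS = ("high level", "appropriate", "as needed", "user-friendly", "good quality", "etc")


def _contains(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word match, so 'or' does not match inside 'word'."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


def lint_requirement(text: str) -> list[str]:
    """Return heuristic warnings for one requirement statement."""
    warnings = []
    for conj in CONJUNCTIONS:
        if _contains(text, conj):
            warnings.append(f"atomic? contains '{conj}'")
    for term in VAGUE_TERMS:
        if _contains(text, term):
            warnings.append(f"verifiable? vague term '{term}'")
    if not re.search(r"\d", text):  # no number at all: hard to test objectively
        warnings.append("verifiable? no measurable quantity")
    return warnings


if __name__ == "__main__":
    for req in (
        "The code field must validate American and Canadian postal codes",
        "OCR text must have a high level of accuracy",
        "Word accuracy must be at least 99.5% on a 100-page sample",
    ):
        print(req, "->", lint_requirement(req) or "no warnings")
```

A checker like this only catches surface symptoms; whether a requirement is complete, consistent, or feasible remains a conversation with stakeholders.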
• be impeccable with your word
• don’t take anything personally
• don’t make assumptions
• always do your best
• be mindful
simple principles for (good) communication
no communication ... little communication ... poor communication ... reduced communication ...
... all result in more assumptions about intent!
why (better) communication is necessary
The single biggest problem with communication is the illusion that it has taken place.
George Bernard Shaw, 1925 Nobel Prize in Literature.
“projects are about communication, communication, and communication”
Elenbass, B. Staging a Project: Are You Setting Your Project Up for Success? Proceedings of the Project Management Institute Annual Seminars & Symposiums. 2000.
“Plan to throw one away; you will anyhow. If there is anything new about the function of a system, the first implementation will have to be redone completely to achieve a satisfactory (i.e., acceptably small, fast, and maintainable) result. It costs a lot less if you plan to have a prototype.”
Butler Lampson
Butler Lampson was a founding member of Xerox PARC, worked for DEC, and now works at Microsoft Research. He is an adjunct professor at MIT and an ACM Fellow.
the value of prototypes / pilots
create requirements and acceptance criteria
repeat {
    digitize (small) pilot batch
    test data against acceptance criteria
    adjust requirements and acceptance criteria
} until (no more adjustments are necessary)
digitize more data
implement: pilot
pilot batches are VERY VERY important!!
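The pilot loop above can be expressed as a small driver function. A sketch under stated assumptions: the three callables (digitize a pilot batch, test it, adjust the criteria) are hypothetical stand-ins for what is in practice a mix of vendor work and human review, and the toy word-confidence criterion exists only to make the loop runnable.

```python
from typing import Any, Callable, Dict, List, Tuple

Criteria = Dict[str, Any]


def run_pilots(
    criteria: Criteria,
    digitize_pilot_batch: Callable[[Criteria], Any],
    test_batch: Callable[[Any, Criteria], Any],
    adjust: Callable[[Criteria, Any], Criteria],
    max_rounds: int = 10,
) -> Tuple[Criteria, int]:
    """repeat { digitize, test, adjust } until no adjustment is necessary."""
    for round_no in range(1, max_rounds + 1):
        batch = digitize_pilot_batch(criteria)
        report = test_batch(batch, criteria)
        new_criteria = adjust(criteria, report)
        if new_criteria == criteria:   # no more adjustments are necessary
            return criteria, round_no  # ready to "digitize more data"
        criteria = new_criteria
    raise RuntimeError("pilots did not converge; revisit the requirements")


# --- toy stand-ins (hypothetical) ---
def toy_digitize(criteria: Criteria) -> List[float]:
    # pretend vendor output: word confidences for a tiny pilot batch
    return [0.97, 0.92, 0.88, 0.99]


def toy_test(batch: List[float], criteria: Criteria) -> float:
    # fraction of words below the minimum word confidence
    threshold = criteria["min_word_confidence"]
    return sum(1 for c in batch if c < threshold) / len(batch)


def toy_adjust(criteria: Criteria, failure_rate: float) -> Criteria:
    # if more than a quarter of words fail, relax the threshold by 0.05
    if failure_rate > 0.25:
        relaxed = round(criteria["min_word_confidence"] - 0.05, 2)
        return {**criteria, "min_word_confidence": relaxed}
    return criteria


if __name__ == "__main__":
    final, rounds = run_pilots({"min_word_confidence": 0.95}, toy_digitize, toy_test, toy_adjust)
    print(final, rounds)
```

Relaxing a threshold is only one kind of adjustment; in a real pilot the change might equally be a clarified requirement or a different vendor process.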
reasons for in-house production
• collection cannot be moved
• collection is badly organized
• digitization must be done slowly over a long period
• digitization is very simple
implement: in-house
reasons for outsourced production
• originals can’t be scanned in-house because…
  • equipment is too expensive
  • output data is beyond staff experience
  • labor is too expensive
• large volume of work in a short time
• insufficient space, infrastructure, or staff
implement: outsource
The project management tool one chooses should be intuitive, easy to use, and accessible to all. If it isn’t, many will avoid / refuse / dislike / resent using it.
• Discussion of project management tools at http://en.wikipedia.org/wiki/Comparison_of_project-management_software
• List of project management tools at http://en.wikipedia.org/wiki/Comparison_of_project-management_software
project management tools
Discussion questions
1. What project management practices does your organization follow? Why?
2. What library standards does your organization currently use? What other, non-library standards, if any, does your organization use?
3. What reasons, in addition to those already cited, would your organization have to digitize newspapers in-house or to outsource digitization?
project management
“Perfection is attained, not when there is nothing left to add, but when there is nothing left to take away.”
Antoine de Saint-Exupéry
digitization workflow
An example of what ALTO makes possible
The Day book. (Chicago, Ill.), 29 Feb. 1912. Chronicling America: Historic American Newspapers. Lib. of Congress. <http://chroniclingamerica.loc.gov/lccn/sn83045487/1912-02-29/ed-1/seq-26/>
• digital library: one or more digital collections
• digital collection: organized group(s) of digital objects
• digital object: a surrogate or digital copy of the original source document, for example, a newspaper issue
• metadata: data about data; information about a digital object(s) or a digital collection(s) or the original source document(s)
digitization workflow
• to enhance accessibility
• to increase collaboration and cooperation between libraries and archives around the world
• to promote research
• to provide opportunities for entrepreneurs
• other reasons?
why digitize newspapers?
[diagram: source objects → produce images → images → produce digital objects → digital objects → ingest → preserve → access]
the digitization process
• image file formats
  • TIFF
  • JPEG2000
  • JPEG
  • GIF
• text file formats
  • PDF, PDF/A, PDF/A-1b, PDF/A-1a
  • TEI XML
  • HTML
  • plain text
  • NITF / NewsML
• metadata
  • METS
  • MODS / PREMIS / ALTO / MIX ...
standard file formats
• image production source materials
  • original documents: better quality, more expensive
  • microfiche: poorer quality, less expensive, microfiche quality varies
• bit depth
  • black-and-white (bitonal)
  • greyscale
  • color
• resolution
• compression
  • no compression
  • lossless (reversible)
  • lossy (irreversible)
• image metadata
image decisions?
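Resolution and bit depth together set the size of an image before compression: width × height in pixels × bits per pixel ÷ 8 bytes. A minimal sketch; the 17 × 23 inch broadsheet page size is a hypothetical assumption for illustration, not a figure from this presentation, and file headers and row padding are ignored.

```python
# Bits per pixel for the three bit-depth choices listed above.
BIT_DEPTHS = {"black-and-white (bitonal)": 1, "greyscale": 8, "color": 24}


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Raw pixel data size in bytes, ignoring file headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical 17 x 23 inch broadsheet page scanned at 400 dpi.
    for label, bpp in BIT_DEPTHS.items():
        size_mb = uncompressed_bytes(17, 23, 400, bpp) / 1_000_000
        print(f"{label}: {size_mb:.1f} MB")
```

Going from bitonal to color multiplies the raw size by 24, so the bit-depth decision drives storage cost before compression is even considered.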
image format comparison
format | compression | bit depth | metadata | color management | MIME type | Apple UTI | patent | 1st public release
JBIG (.jbig, .jbg) | lossless | 1-bit | no | no | | | | 2000?
JPEG (.jpg, .jpeg) | lossy, DCT, RLE, Huffman | 8-bit, 12-bit, 24-bit | yes | yes | image/jpeg | public.jpeg | no | 1992
JPEG2000 (.jp2) | many lossless and lossy compression algorithms | 8-bit, 16-bit; color to 48 bits | yes | yes | image/jp2 | public.jpeg-2000 | yes, but part 1 is patent free | 2000
TIFF (.tiff, .tif) | none, LZW, RLE, ZIP, other | 1, 2, 4, 8, 16, 24, 32 bits | yes | yes | image/tiff | public.tiff | no | 1986
Wikipedia contributors, "Comparison of Graphics File Formats," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Comparison_of_graphics_file_formats (accessed August 1, 2012)
image compression comparison
format | The Sacred Heart Review (300dpi) | Los Angeles Star (300dpi) | Die Susquehanna Zeitung (600dpi)
TIFF (uncompressed) | 17.2 MB | 87 MB | 415.5 MB
TIFF (lossless LZW compression) | 10.2 MB | 75.8 MB | 232.9 MB
JPEG (maximum quality, still lossy) | 7.0 MB | 37.2 MB | 101.1 MB
JPEG (medium quality) | 1.5 MB | 4.6 MB | 10.2 MB
JPEG2000 (lossless compression) | 7.1 MB | 52.7 MB | 166.2 MB
JPEG2000 (lossy [70] compression) | 5.1 MB | 37.1 MB | 116.7 MB
JPEG2000 (lossy [30] compression) | 2.2 MB | 16.1 MB | 50.3 MB
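The compression table is easier to compare as percentage savings against uncompressed TIFF. A minimal sketch that recomputes those percentages from the sizes in the table above; nothing beyond the table's own numbers is assumed.

```python
# Sizes in MB from the compression comparison table:
# (Sacred Heart Review 300dpi, Los Angeles Star 300dpi, Die Susquehanna Zeitung 600dpi)
SIZES_MB = {
    "TIFF (uncompressed)": (17.2, 87.0, 415.5),
    "TIFF (lossless LZW compression)": (10.2, 75.8, 232.9),
    "JPEG (maximum quality)": (7.0, 37.2, 101.1),
    "JPEG (medium quality)": (1.5, 4.6, 10.2),
    "JPEG2000 (lossless compression)": (7.1, 52.7, 166.2),
    "JPEG2000 (lossy [70] compression)": (5.1, 37.1, 116.7),
    "JPEG2000 (lossy [30] compression)": (2.2, 16.1, 50.3),
}


def savings_percent(baseline_mb: float, compressed_mb: float) -> float:
    """Percentage of disk space saved relative to the baseline file."""
    return 100.0 * (1.0 - compressed_mb / baseline_mb)


def savings_table(sizes: dict) -> dict:
    """Savings per format and per sample image, rounded to one decimal place."""
    base = sizes["TIFF (uncompressed)"]
    return {
        name: tuple(round(savings_percent(b, c), 1) for b, c in zip(base, row))
        for name, row in sizes.items()
    }


if __name__ == "__main__":
    for name, pct in savings_table(SIZES_MB).items():
        print(f"{name:36s} " + "  ".join(f"{p:5.1f}%" for p in pct))
```

Even lossless LZW saves anywhere from about 13% to 44% on these three samples, so averages quoted for a whole collection depend heavily on the source material.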
image bit depth comparison
format | USA case law image 1 (300dpi) | USA case law image 2 (300dpi)
TIFF 1-bit CCITT G4 compression | 40 KB | 87 KB
JPEG2000 W5x3 reversible compression | 2.6 MB | 3.6 MB
JPEG2000 W9x7 irreversible compression | 647 KB | 1 MB
Image courtesy of http://epsos.de (accessed at http://commons.wikimedia.org March 2014).
GARBAGE IN, GARBAGE OUT
GIGO
Deaths. lln»rieff, Esq. of <c .. Qn. Sunday, the till. greatly Drandrellt, of Orms4\irJi.- ~ ; ;✓ ' • * On ijfr r inn l j j j i l F i i j ' 1 1 f H a v o d i v y d , Carnarvonshire, S ; **" *- ' « ' March Oxford, F. Tfovmeud, Uerald. » • V . • O n T n c s d a v l a s t , M r . C har l es . IWilinson, this 8 ; had vf thesis#,, a week ago, which tcrminate<i'iu his death. . / ' ■ O'i Sunday, dJst nit. at. AsbtCnvHall, mar Lancaster, Mr.,Geo. Worn ick, many years house'steward hit late Once The Hamilton and Brandon. He locked himself h»oWn'r«wte<: soon. twelve o'clock" that dny, and fii»-d a loaded pistol " t h r o u g h I n s b e a d , 1 w h i c h instantaneously killed him. Coronet's Verdict, shot himself in a temporary fit of Friday week,
raw OCR text
Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3.
newspaper image
Discussion topics
1. Assume your organization decides to digitize 1000 newspaper issues averaging 12 pages per issue. The images are scanned 2-up and average 80MB each. How much disk storage is needed for the images?
2. Now assume instead that your organization uses TIFF images with LZW (lossless) compression, which saves on average 40%. How much disk storage is needed for the images?
digitization workflow
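The storage arithmetic for the two topics can be sketched as below, assuming "scanned 2-up" means two newspaper pages per image, the 80MB average applies to the uncompressed images, and 1 GB = 1000 MB; the function name is hypothetical.

```python
import math


def image_storage_mb(issues: int, pages_per_issue: int, pages_per_image: int,
                     mb_per_image: float, savings: float = 0.0) -> float:
    """Disk space in MB for scanned images; savings is the fraction saved by compression."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    pages = issues * pages_per_issue
    images = math.ceil(pages / pages_per_image)  # 2-up: two pages per image
    return images * mb_per_image * (1.0 - savings)


if __name__ == "__main__":
    raw = image_storage_mb(1000, 12, 2, 80)                # topic 1
    lzw = image_storage_mb(1000, 12, 2, 80, savings=0.40)  # topic 2
    print(f"uncompressed: {raw:,.0f} MB (about {raw / 1000:,.0f} GB)")
    print(f"LZW: {lzw:,.0f} MB (about {lzw / 1000:,.0f} GB)")
```

Under those assumptions the answers come to 480,000 MB (about 480 GB) uncompressed and 288,000 MB (about 288 GB) with LZW, for a single copy of the images.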
[diagram: images → produce digital objects → digital objects]
the digitization process
[diagram: image processing → layout analysis → OCR → metadata → build digital objects]
the digitization process: image processing
• crop, de-skew, split images
• apply image improvement algorithms as needed
  • sharpening filters
  • local adaptive thresholding
  • remove text bleed-through
  • etc.
• create master images
• create working images
what’s wrong with this image?
text is skewed about 1° from vertical
[figure: text is skewed / text is de-skewed]
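Skew like this can be estimated from text geometry. A minimal sketch, assuming baseline points for one text line (for example, the bottom-left corners of its words) are already available; it fits a least-squares line, which is a simplification, since production tools typically analyse the whole image with projection profiles or a Hough transform. At 1°, a text line 7000 pixels wide drifts by about 122 pixels, which can disturb layout analysis and OCR.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_skew_degrees(points: List[Point]) -> float:
    """Least-squares slope of baseline points, as an angle in degrees.

    Image coordinates: x grows to the right and y grows downward, so a
    positive angle means the line runs downhill from left to right.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points share one x coordinate; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(sxy / sxx))


def rotate_points(points: List[Point], degrees: float) -> List[Point]:
    """Rotate points about the origin; rotating by -skew de-skews them."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


if __name__ == "__main__":
    # Synthetic baseline skewed by 1 degree across a 7000-pixel-wide page.
    slope = math.tan(math.radians(1.0))
    baseline = [(x, 500 + x * slope) for x in range(0, 7000, 100)]
    skew = estimate_skew_degrees(baseline)
    print(f"estimated skew: {skew:.3f} degrees")
    print(f"drift across the line: {7000 * slope:.0f} pixels")
    print(f"after de-skew: {estimate_skew_degrees(rotate_points(baseline, -skew)):.6f} degrees")
```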
the digitization process: layout analysis
• analyze layout of text image
• estimate font types and sizes
• calculate coordinates of text blocks
• determine layout object types (text, illustration, headline, etc.)
newspaper text layout analysis
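Calculating the coordinates of a text block from the words it contains amounts to taking the union of their bounding boxes. A minimal sketch in ALTO's HPOS / VPOS / WIDTH / HEIGHT convention (top-left corner plus size, in pixels); the sample values are the two String elements from the ALTO excerpt later in this presentation, and the `Box` type is a hypothetical helper.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """ALTO-style box: top-left corner (HPOS, VPOS) plus WIDTH and HEIGHT, in pixels."""
    hpos: int
    vpos: int
    width: int
    height: int


def union_box(boxes: Iterable[Box]) -> Box:
    """Smallest box that encloses every input box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("need at least one box")
    left = min(b.hpos for b in boxes)
    top = min(b.vpos for b in boxes)
    right = max(b.hpos + b.width for b in boxes)
    bottom = max(b.vpos + b.height for b in boxes)
    return Box(left, top, right - left, bottom - top)


if __name__ == "__main__":
    words = [Box(357, 831, 65, 74), Box(422, 831, 576, 74)]  # "The", "Queenslander."
    print(union_box(words))
```

The resulting width, 641 pixels, matches the enclosing TextLine in that excerpt; its height differs by one pixel (74 vs 75).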
the digitization process: OCR
• perform optical character recognition (OCR)
• calculate word and character coordinates
• calculate word and character confidences
• apply language dictionaries
• correct OCR text (optional)
the digitization process: metadata
• populate metadata fields
• verify / correct page numbers
• verify / correct document structure
the digitization process: build digital objects
• create METS / ALTO XML files
• create image files and image metadata
• create PDF files (if required)
• verify digital object
• calculate file fixity checks (checksums)
• perform file validation and verification
• perform quality assurance
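Fixity checks record a checksum for each file at production time so that later copies can be verified against it. A minimal sketch using Python's standard hashlib; SHA-256 is an assumed choice (MD5 is also widely used), and the manifest is a plain dictionary rather than any particular standard format.

```python
import hashlib


def file_checksum(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large TIFFs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths, algorithm: str = "sha256") -> dict:
    """Map each file path to its checksum at production time."""
    return {str(p): file_checksum(p, algorithm) for p in paths}


def verify_manifest(manifest: dict, algorithm: str = "sha256") -> list:
    """Return the paths whose current checksum no longer matches the manifest."""
    return [p for p, digest in manifest.items() if file_checksum(p, algorithm) != digest]
```

METS file elements can carry such values directly in their CHECKSUM and CHECKSUMTYPE attributes, which keeps the fixity information with the digital object.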
• automatic production steps performed by software
• manual production steps performed by operators
real world digitization production workflow
• METS XML for descriptive, structural, technical, and administrative metadata
• descriptive metadata
  • Metadata Object Description Schema (MODS): selected metadata from MARC
  • Dublin Core: fundamental group of text elements for describing and cataloging
• technical metadata
  • ALTO for OCR text
  • PREMIS for digital preservation
  • MIX and ANSI/NISO Z39.87 for images
digital library standards
Metadata Encoding and Transmission Standard
• METS is an XML standard for encoding descriptive, administrative, and structural metadata about objects within a digital library
• METS files consist of 7 sections: header, descriptive metadata, administrative metadata, file section, structural map, structural link, and behavior; only the structural map is required
• METS profiles describe a class of METS documents in sufficient detail to provide both document authors and programmers the guidance to create and process METS documents conforming with a particular profile
• current version 1.9.1
• administered by METS editorial board (international group of volunteers)
• standards hosted by Library of Congress at http://www.loc.gov/standards/mets/
Graphic from Karin Bredenberg, Communicating Archival Metadata conference and workshops. Riksarkivet, 2011.
METS file structure
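The file section and structural map are what tie page images to the structure of an issue. A minimal sketch, built with Python's standard xml.etree.ElementTree, of a fileSec listing page images and a physical structMap whose page divs point to them through fptr FILEID references. The file names, the IMG00001 ID scheme, and the USE value are hypothetical; a real METS profile prescribes its own, and this skeleton omits the header, descriptive, and administrative sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualified METS tag name for ElementTree."""
    return f"{{{METS_NS}}}{tag}"


def build_issue_mets(page_images) -> ET.Element:
    """One master-image file per page, plus a physical structMap that points to them."""
    root = ET.Element(m("mets"))
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"), USE="master")
    struct_map = ET.SubElement(root, m("structMap"), TYPE="PHYSICAL")
    issue_div = ET.SubElement(struct_map, m("div"), TYPE="issue")
    for order, href in enumerate(page_images, start=1):
        file_id = f"IMG{order:05d}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, m("file"), ID=file_id, MIMETYPE="image/tiff")
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        page_div = ET.SubElement(issue_div, m("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page_div, m("fptr"), FILEID=file_id)  # links structure to file
    return root


if __name__ == "__main__":
    print(ET.tostring(build_issue_mets(["page-001.tif", "page-002.tif"]), encoding="unicode"))
```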
Metadata Object Description Schema
• MODS is an XML schema for a bibliographic element set that may be used for library applications. Derivative of MARC 21 bibliographic format. Includes a subset of MARC fields, using language-based tags rather than numeric ones
• Subset of MARC 21
• Mappings exist between MODS and MARC, Dublin Core, and RDA (conversion tools exist)
• May be used in conjunction with METS XML
• current version 3.4
• administered by Library of Congress Network Development and MARC Standards Office with help from interested users
• standards hosted by Library of Congress at http://www.loc.gov/standards/mods/
MODS metadata in METS XML
<mets:dmdSec ID="issue-nla.news-issn18368190_18740425">
  <mets:mdWrap MDTYPE="MODS">
    <mets:xmlData>
      <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
        <mods:language>
          <mods:languageTerm type="code" authority="rfc3066">en</mods:languageTerm>
        </mods:language>
        <mods:genre>newspaper issue</mods:genre>
        <mods:originInfo>
          <mods:dateIssued>18740425</mods:dateIssued>
        </mods:originInfo>
        <mods:relatedItem type="host">
          <mods:titleInfo>
            <mods:title>The Queenslander (Brisbane, Qld. : 1866-1939)</mods:title>
          </mods:titleInfo>
          <mods:genre>newspaper</mods:genre>
          <mods:identifier>ISSN18368190</mods:identifier>
          <mods:part>
            <mods:detail type="volume">
              <mods:number>IX</mods:number>
            </mods:detail>
          </mods:part>
          <mods:part>
            <mods:detail type="issue">
              <mods:number>12</mods:number>
            </mods:detail>
          </mods:part>
        </mods:relatedItem>
      </mods:mods>
    </mets:xmlData>
  </mets:mdWrap>
</mets:dmdSec>
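Because MODS is namespaced XML, a standard parser can pull the issue-level fields back out of a METS dmdSec. A minimal sketch with Python's xml.etree.ElementTree, run against an abridged copy of the record above with the mets and mods namespace prefixes declared explicitly (an assumption: in a complete METS file these declarations usually sit on the root element); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

SAMPLE = """<mets:dmdSec xmlns:mets="http://www.loc.gov/METS/" ID="issue-nla.news-issn18368190_18740425">
  <mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
      <mods:originInfo><mods:dateIssued>18740425</mods:dateIssued></mods:originInfo>
      <mods:relatedItem type="host">
        <mods:titleInfo><mods:title>The Queenslander (Brisbane, Qld. : 1866-1939)</mods:title></mods:titleInfo>
        <mods:part><mods:detail type="volume"><mods:number>IX</mods:number></mods:detail></mods:part>
        <mods:part><mods:detail type="issue"><mods:number>12</mods:number></mods:detail></mods:part>
      </mods:relatedItem>
    </mods:mods>
  </mets:xmlData></mets:mdWrap>
</mets:dmdSec>"""


def issue_summary(dmd_xml: str) -> dict:
    """Title, ISO date, volume, and issue from a dmdSec that wraps MODS."""
    root = ET.fromstring(dmd_xml)
    mods = root.find(".//mods:mods", NS)
    if mods is None:
        raise ValueError("no MODS record in this dmdSec")
    summary = {}
    issued = mods.findtext("mods:originInfo/mods:dateIssued", namespaces=NS)
    if issued:
        # dateIssued is stored as YYYYMMDD in this example
        summary["dateIssued"] = datetime.strptime(issued, "%Y%m%d").date().isoformat()
    host = mods.find("mods:relatedItem[@type='host']", NS)
    if host is not None:
        summary["title"] = host.findtext("mods:titleInfo/mods:title", namespaces=NS)
        for detail in host.findall("mods:part/mods:detail", NS):
            summary[detail.get("type")] = detail.findtext("mods:number", namespaces=NS)
    return summary


if __name__ == "__main__":
    print(issue_summary(SAMPLE))
```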
Dublin Core metadata
• Dublin Core is a set of vocabulary terms used to describe resources for the purposes of discovery.
• Dublin Core metadata element set is endorsed in IETF RFC 5013, ISO 15836:2009, and NISO Z39.85
• Metadata terms last updated 14-Jun-2012
• May be used in conjunction with METS XML
• Dublin Core Metadata Initiative (DCMI) is an open organization, incorporated as a public, not-for-profit company in Singapore
• Dublin Core Metadata Initiative is hosted at http://dublincore.org/
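Simple (unqualified) Dublin Core has fifteen elements, each optional and repeatable, which makes a record easy to generate. A minimal sketch with Python's xml.etree.ElementTree, using the DCMI element namespace; the wrapper element name `record` is a hypothetical choice (OAI-PMH, for example, uses its own `oai_dc:dc` wrapper), and the sample values come from the MODS example above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields: dict) -> ET.Element:
    """Build a simple DC record; fields maps element name -> list of values."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # every element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    record = dublin_core_record({
        "title": ["The Queenslander (Brisbane, Qld. : 1866-1939)"],
        "date": ["1874-04-25"],
        "type": ["Text"],
        "language": ["en"],
    })
    print(ET.tostring(record, encoding="unicode"))
```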
Analyzed Layout and Text Object
!• ALTO XML provides technical metadata for describing the layout
and content of physical text resources, such as pages of a book or a newspaper
• commonly used in conjunction with METS XML but may be used standalone
• current version 2.1 • administered by ALTO editorial board (international group of
volunteers) • standards hosted by Library of Congress at http://www.loc.gov/
standards/alto/
<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schema.ccs-gmbh.com/metae/alto-1-4.xsd" xmlns:xlink="http://www.w3.org/1999/xlink">
<Description>
  <MeasurementUnit>pixel</MeasurementUnit>
  <sourceImageInformation>
    <fileName>//docstorage/impdata_2$/IN/NLA/db0046/batch-1109/nlaImageSeq-2349218-b.tif</fileName>
  </sourceImageInformation>
</Description>
<Styles>
  <TextStyle ID="TXT_0" FONTSIZE="7" FONTFAMILY="Times New Roman" FONTSTYLE="bold"/>
  <TextStyle ID="TXT_1" FONTSIZE="9" FONTFAMILY="Times New Roman" FONTSTYLE="bold"/>
</Styles>
<Layout>
  <Page ID="P1" PHYSICAL_IMG_NR="1" HEIGHT="9224" WIDTH="7136" PC="0.967">
    <TopMargin ID="P1_TM00001" HPOS="0" VPOS="0" WIDTH="7135" HEIGHT="814"/>
    <LeftMargin ID="P1_LM00001" HPOS="0" VPOS="814" WIDTH="151" HEIGHT="8194"/>
    <RightMargin ID="P1_RM00001" HPOS="6959" VPOS="814" WIDTH="176" HEIGHT="8194"/>
    <BottomMargin ID="P1_BM00001" HPOS="0" VPOS="9008" WIDTH="7135" HEIGHT="216"/>
    <PrintSpace ID="P1_PS00001" HPOS="151" VPOS="814" WIDTH="6808" HEIGHT="8194">
      <ComposedBlock ID="ART1" HEIGHT="2366" WIDTH="929" HPOS="209" VPOS="831">
        <ComposedBlock ID="ZONE1-1" HEIGHT="88" WIDTH="641" HPOS="357" VPOS="831">
          <TextBlock ID="P1_TB00004" HPOS="357" VPOS="831" WIDTH="641" HEIGHT="88" STYLEREFS="TXT_4 PAR_LEFT">
            <TextLine ID="P1_TL00065" HPOS="357" VPOS="831" WIDTH="641" HEIGHT="75">
              <String ID="P1_ST00404" HPOS="357" VPOS="831" WIDTH="65" HEIGHT="74" CONTENT="The" WC="0.98" CC="000"/>
              <SP ID="P1_SP00340" HPOS="422" VPOS="906" WIDTH="0"/>
              <String ID="P1_ST00405" HPOS="422" VPOS="831" WIDTH="576" HEIGHT="74" CONTENT="Queenslander." WC="0.96" CC="0000000000000"/>
            </TextLine>
          </TextBlock>
        </ComposedBlock>
        <ComposedBlock ID="ZONE1-2" HEIGHT="83" WIDTH="894" HPOS="228" VPOS="964"/>
        <ComposedBlock ID="ZONE1-3" HEIGHT="46" WIDTH="702" HPOS="331" VPOS="1087"/>
            <TextLine ID="P1_TL01143" HPOS="5946" VPOS="8957" WIDTH="881" HEIGHT="46">
              <String ID="P1_ST06356" HPOS="5946" VPOS="8965" WIDTH="3" HEIGHT="27" CONTENT="I" WC="1.00" CC="0"/>
              <SP ID="P1_SP05236" HPOS="5950" VPOS="8992" WIDTH="658"/>
              <String ID="P1_ST06357" HPOS="6608" VPOS="8957" WIDTH="219" HEIGHT="46" CONTENT="Proprietors." WC="1.00" CC="101401212010"/>
            </TextLine>
          </TextBlock>
        </ComposedBlock>
      </ComposedBlock>
    </PrintSpace>
  </Page>
</Layout>
</alto>
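Because ALTO stores every recognized word as a `String` element with its text in `CONTENT` and its OCR word confidence in `WC`, plain text and a rough confidence figure can be pulled out with a few lines of standard-library Python. This is a sketch only: the function names are hypothetical, the sample is a trimmed copy of the fragment above, and it ignores hyphenation markup (`HYP`, `SUBS_CONTENT`) that real ALTO files may contain.

```python
import xml.etree.ElementTree as ET

# Trimmed from the ALTO 1.4 example above. ALTO 2.x and later place elements
# in a namespace, so the helpers compare local tag names only.
SAMPLE_ALTO = b"""<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace ID="P1_PS00001">
        <TextBlock ID="P1_TB00004">
          <TextLine ID="P1_TL00065">
            <String ID="P1_ST00404" CONTENT="The" WC="0.98"/>
            <SP ID="P1_SP00340"/>
            <String ID="P1_ST00405" CONTENT="Queenslander." WC="0.96"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_bytes: bytes) -> list[str]:
    """Return the text of each TextLine, with words joined by single spaces."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for el in root.iter():
        if local_name(el.tag) == "TextLine":
            words = [s.get("CONTENT", "") for s in el if local_name(s.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def mean_word_confidence(xml_bytes: bytes) -> float:
    """Average the WC (word confidence) attribute over all String elements."""
    root = ET.fromstring(xml_bytes)
    scores = [
        float(el.get("WC"))
        for el in root.iter()
        if local_name(el.tag) == "String" and el.get("WC") is not None
    ]
    return sum(scores) / len(scores) if scores else 0.0


print(alto_lines(SAMPLE_ALTO))  # ['The Queenslander.']
```

Averaged `WC` values are a quick health signal for a batch; they are the OCR engine's own estimate and not a measured accuracy.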
Analyzed Layout and Text Object (ALTO) [figures: ALTO examples for a book and for a newspaper]
Preservation Metadata Implementation Strategies
• PREMIS is a core set of implementable preservation metadata, broadly applicable across a wide range of digital preservation contexts and supported by guidelines and recommendations for creation, management, and use
• In 2003 OCLC and RLG jointly sponsored the formation of the PREMIS working group, composed of international experts in the use of metadata to support digital preservation activities
• PREMIS data dictionary current version: 2.2
• May be used in conjunction with METS XML
• PREMIS tools are freely available
• PREMIS Maintenance Activity and Editorial Committee has international members from libraries and industry
• PREMIS data dictionary is hosted at http://www.loc.gov/standards/premis/
PREMIS data in METS file
<mets:amdSec>
  <mets:techMD ID="PREMISOBJECT1">
    <mets:mdWrap MDTYPE="PREMIS">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/standards/premis/v1">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>National Library of Australia</premis:objectIdentifierType>
            <premis:objectIdentifierValue>nlaImageSeq-218-b.tif</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:objectCategory>file</premis:objectCategory>
          <premis:objectCharacteristics>
            <premis:format>
              <premis:formatDesignation>
                <premis:formatName>TIFF</premis:formatName>
                <premis:formatVersion>TIFF 6.0</premis:formatVersion>
              </premis:formatDesignation>
            </premis:format>
          </premis:objectCharacteristics>
          <premis:relationship>
            <premis:relationshipType>derivation</premis:relationshipType>
            <premis:relationshipSubType>is derivative of</premis:relationshipSubType>
            <premis:relatedObjectIdentification>
              <premis:relatedObjectIdentifierType>National Library of Australia</premis:relatedObjectIdentifierType>
              <premis:relatedObjectIdentifierValue>nlaImageSeq-218-b.tif</premis:relatedObjectIdentifierValue>
              <premis:relatedObjectSequence>0</premis:relatedObjectSequence>
            </premis:relatedObjectIdentification>
            <premis:relatedEventIdentification>
              <premis:relatedEventIdentifierType>National Library of Australia</premis:relatedEventIdentifierType>
              <premis:relatedEventIdentifierValue>deskew-nlaImageSeq-218-b.tif</premis:relatedEventIdentifierValue>
              <premis:relatedEventSequence>0</premis:relatedEventSequence>
            </premis:relatedEventIdentification>
          </premis:relationship>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:techMD>
</mets:amdSec>
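To confirm that technical metadata like this survives ingest, a script can read the PREMIS object back out of the METS `amdSec`. A minimal sketch, assuming the PREMIS 1.x namespace used in the example and the standard METS namespace `http://www.loc.gov/METS/`; the function name is hypothetical, and the sample declares the `mets` prefix itself because the excerpt above relies on a declaration elsewhere in the full METS file.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "premis": "http://www.loc.gov/standards/premis/v1",
}

# Trimmed copy of the example above, with the mets prefix declared locally.
SAMPLE_AMDSEC = """<mets:amdSec xmlns:mets="http://www.loc.gov/METS/">
  <mets:techMD ID="PREMISOBJECT1">
    <mets:mdWrap MDTYPE="PREMIS">
      <mets:xmlData>
        <premis:object xmlns:premis="http://www.loc.gov/standards/premis/v1">
          <premis:objectIdentifier>
            <premis:objectIdentifierType>National Library of Australia</premis:objectIdentifierType>
            <premis:objectIdentifierValue>nlaImageSeq-218-b.tif</premis:objectIdentifierValue>
          </premis:objectIdentifier>
          <premis:objectCategory>file</premis:objectCategory>
          <premis:objectCharacteristics>
            <premis:format>
              <premis:formatDesignation>
                <premis:formatName>TIFF</premis:formatName>
                <premis:formatVersion>TIFF 6.0</premis:formatVersion>
              </premis:formatDesignation>
            </premis:format>
          </premis:objectCharacteristics>
        </premis:object>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:techMD>
</mets:amdSec>"""


def premis_formats(xml_text: str) -> dict[str, tuple[str, str]]:
    """Map each PREMIS object identifier to its (formatName, formatVersion)."""
    root = ET.fromstring(xml_text)
    formats = {}
    for obj in root.iterfind(".//premis:object", NS):
        ident = obj.findtext(
            "premis:objectIdentifier/premis:objectIdentifierValue", default="", namespaces=NS
        )
        name = obj.findtext(".//premis:formatName", default="", namespaces=NS)
        version = obj.findtext(".//premis:formatVersion", default="", namespaces=NS)
        formats[ident] = (name, version)
    return formats


print(premis_formats(SAMPLE_AMDSEC))  # {'nlaImageSeq-218-b.tif': ('TIFF', 'TIFF 6.0')}
```

Files using PREMIS 2.x declare a different namespace URI, so the `NS` mapping would need to change accordingly.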
digitization workflow
implement: software
• commercial off-the-shelf (COTS)
• open source
• customized COTS
• customized open source
• custom in-house
Discussion topics
1. Assuming your organization will digitize historic newspapers, will it digitize the newspapers in-house or out-source digitization? Why? (If you don’t know, guesses and speculations are fine.)
2. Describe your organization’s current digitization workflow.
quality assurance and acceptance criteria
Wikipedia on data quality: “The processes and technologies involved in ensuring the conformance of data values to requirements and acceptance criteria.”
automatic quality checks
• is the digital object complete? are all its components present?
• is the digital object verifiable?
• is the digital object uncorrupted?
• do the components of the digital object conform to standards?
• do the file names conform to project requirements?
• does the directory structure conform to project requirements?
• does the digital object metadata conform to project specifications?
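Several of the automatic checks above (file names, directory structure, completeness) can be scripted once the project has written its conventions down. The sketch below assumes a hypothetical convention, one directory per issue named by date, containing one METS file plus a TIFF and an ALTO file per page; the names, patterns, and functions are illustrative, not a standard.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical project conventions, for illustration only:
#   <batch>/<issue-id>/<issue-id>_mets.xml   one METS file per issue
#   <batch>/<issue-id>/<issue-id>_NNNN.tif   one TIFF per page, numbered from 0001
#   <batch>/<issue-id>/<issue-id>_NNNN.xml   one ALTO file per page
ISSUE_ID = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. 1890-01-04


def check_issue(issue_dir: Path) -> list[str]:
    """Return a list of problems found in one issue directory (empty if none)."""
    issue = issue_dir.name
    problems = []
    if not ISSUE_ID.match(issue):
        problems.append(f"{issue}: directory name does not follow the issue-id pattern")
    mets_name = f"{issue}_mets.xml"
    if not (issue_dir / mets_name).is_file():
        problems.append(f"{issue}: METS file missing")

    page_file = re.compile(rf"^{re.escape(issue)}_(\d{{4}})\.(tif|xml)$")
    pages = {"tif": set(), "xml": set()}
    for f in sorted(issue_dir.iterdir()):
        if f.name == mets_name:
            continue
        m = page_file.match(f.name)
        if m is None:
            problems.append(f"{issue}: unexpected file {f.name}")
            continue
        pages[m.group(2)].add(int(m.group(1)))

    # Completeness: pages must run 1..N with no gaps, each with a TIFF and an ALTO file.
    last_page = max(pages["tif"] | pages["xml"], default=0)
    for kind in ("tif", "xml"):
        for n in range(1, last_page + 1):
            if n not in pages[kind]:
                problems.append(f"{issue}: page {n:04d} has no .{kind} file")
    return problems


def demo() -> list[str]:
    """Build a small issue with one missing ALTO file and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        issue_dir = Path(tmp) / "1890-01-04"
        issue_dir.mkdir()
        for name in ("1890-01-04_mets.xml", "1890-01-04_0001.tif",
                     "1890-01-04_0001.xml", "1890-01-04_0002.tif"):
            (issue_dir / name).write_bytes(b"")
        return check_issue(issue_dir)


print(demo())  # ['1890-01-04: page 0002 has no .xml file']
```

Checks like these are cheap enough to run on every delivered batch before any manual review starts.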
manual quality checks
• does the digital object metadata meet accuracy specifications?
• does the text meet accuracy specifications?
• is the image quality satisfactory?
• are article continuations correct?
• is the text in reading order?
acceptance criteria for an English language digitization project at a large, well-known, and internationally recognized national library
• character accuracy > 80%
• word accuracy > 75%
• significant word accuracy > 65%
what’s wrong with this?
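Percentages like these only mean something once the measurement is defined. One common definition compares OCR output with a ground-truth transcription and counts edit operations: accuracy = 1 − (edit distance ÷ length of the ground truth), computed over characters or over words. The sketch below uses that definition; the stop-word list that defines “significant” words is a hypothetical assumption, since each project must define that term itself.

```python
def levenshtein(a, b) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def accuracy(ocr, truth) -> float:
    """1 - errors / length of ground truth, floored at 0."""
    if len(truth) == 0:
        return 1.0 if len(ocr) == 0 else 0.0
    return max(0.0, 1.0 - levenshtein(ocr, truth) / len(truth))


# Hypothetical stop-word list; a real project would define "significant word" itself.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def significant(words: list[str]) -> list[str]:
    return [w for w in words if w.lower().strip(".,;:!?\"'") not in STOP_WORDS]


def text_accuracies(ocr_text: str, truth_text: str) -> dict[str, float]:
    ocr_words, truth_words = ocr_text.split(), truth_text.split()
    return {
        "character": accuracy(ocr_text, truth_text),
        "word": accuracy(ocr_words, truth_words),
        "significant_word": accuracy(significant(ocr_words), significant(truth_words)),
    }


print(text_accuracies("The Queenslandcr.", "The Queenslander."))
# character ≈ 0.941, word = 0.5, significant_word = 0.0
```

The example shows how one wrong character can cost very different amounts depending on the unit counted: about 6% at character level, but half of the words and every significant word in this short line.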
project quality requirement:
“a high level of accuracy”
what’s wrong with this?
project quality requirement:
“article titles must be 99.5% accurate”
what’s wrong with this?
project quality requirement:
“article title characters in each issue must be 99.5% accurate, that is, each issue may have no more than 5 errors in 1000 article title characters”
what’s wrong with this?
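The per-issue wording can be turned directly into a mechanical acceptance check: 99.5% accurate means at most 5 errors per 1000 article-title characters. A minimal sketch, assuming a reviewer has already counted title characters and errors for each issue; the data class, function names, and sample figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class IssueTitleCheck:
    issue_id: str
    title_chars: int   # article-title characters compared with the ground truth
    title_errors: int  # character errors found in those titles


def issue_passes(check: IssueTitleCheck, max_errors_per_1000: int = 5) -> bool:
    """99.5% accurate == at most 5 errors per 1000 title characters.

    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return check.title_errors * 1000 <= max_errors_per_1000 * check.title_chars


def failing_issues(checks: list[IssueTitleCheck]) -> list[str]:
    return [c.issue_id for c in checks if not issue_passes(c)]


batch = [
    IssueTitleCheck("1890-01-04", title_chars=2400, title_errors=12),  # exactly 0.5%
    IssueTitleCheck("1890-01-11", title_chars=2400, title_errors=13),
    IssueTitleCheck("1890-01-18", title_chars=800, title_errors=0),
]
print(failing_issues(batch))  # ['1890-01-11']
```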
image quality
• sharpness: the amount of detail an image can convey
• noise: random variation of image density
• dynamic range
• contrast (gamma): the slope of the tone reproduction curve in a log-log space. high contrast usually involves loss of dynamic range, that is, loss of detail, or clipping, in highlights or shadows.
• vignetting: darkening of an image near its corners
• artifacts: “leftovers” from sharpening or compression
Wikipedia contributors, “Image quality," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Image_quality (accessed March 2014).
Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing. April 2004.
image quality
“…images which are ultimately to be viewed by human beings, the only “correct” method of quantifying visual image quality is through subjective evaluation. in practice, however, subjective evaluation is usually too inconvenient, time-consuming and expensive…”
“…best way to assess the quality of an image is to look at it because human eyes are the ultimate viewers of most images…”
Zhou Wang, Alan C. Bovik, and Ligang Lu. Why is image quality assessment so difficult? Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). May 2002.
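When subjective review is too expensive to apply to every image, the structural similarity (SSIM) index from the Wang et al. paper cited above is one objective substitute. SSIM is a full-reference metric: it compares an image with a reference, for example an access derivative with its master TIFF. The standard index is computed over local windows and averaged; the sketch below, with a hypothetical function name, computes a single global window and population (1/N) statistics, a simplification suitable only for illustration.

```python
from statistics import fmean


def global_ssim(x: list[float], y: list[float], dynamic_range: float = 255.0) -> float:
    """SSIM computed over the whole image as a single window.

    x and y are the grayscale pixel values of two same-sized images,
    flattened in the same order. K1 = 0.01 and K2 = 0.03 follow Wang et al.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean((a - mean_x) ** 2 for a in x)
    var_y = fmean((b - mean_y) ** 2 for b in y)
    cov = fmean((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    luminance = (2 * mean_x * mean_y + c1) / (mean_x**2 + mean_y**2 + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


master = [12, 40, 88, 130, 172, 215, 240, 60]
derivative = [14, 38, 90, 127, 175, 212, 236, 63]  # e.g. after lossy compression
print(round(global_ssim(master, master), 6))  # 1.0
print(global_ssim(master, derivative))        # slightly below 1.0
```

A score close to 1 says the derivative is structurally faithful to the master; it says nothing about whether the master itself was captured well, which still needs a human eye.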
acceptance criteria for the National Library of Australia NDP
Discussion topics
1. How does your organization currently do quality assurance for digital data?
2. How much time / effort is given to writing quality assurance procedures and acceptance criteria for digitized data?
digitization tools
open source vs. commercial software: pros
• acquisition : cost, development and implementation contract costs are likely to be lower than for proprietary software. less likely that there will be contractually-bound upgrade costs. total cost of ownership over the lifetime of usage must be taken into account
• data transferability : with open source code and open data formats, there are greater opportunities to share data across interoperable platforms
• re-use : open source is free from per user or per instance costs and there is a guaranteed freedom to use it in any way. re-use is enabled.
Adapted from Open Gov Summit 2013. http://opengov2013.zaizi.com/pros-and-cons-of-open-source-solutions/
• cost effective : pay once for development (if at all) and reuse where appropriate.
• non-restrictive : open source licenses do not limit or restrict who can use the software, the type of user, or the areas of business in which the software can be used. provides a licensing model that enables rapid provisioning of both known and unanticipated users and in new use cases.
• scalable : open source solutions are scalable upwards and downwards with a reduction in the risk of longer term financial implications. no license fees on a “per user” or “per box” basis. no redundant licenses
• easy to prototype and adapt : open source software is particularly suitable for rapid prototyping and experimentation, where the ability to “test drive” the software with minimal costs and administrative delays can be important. (proprietary software suppliers may also provide the same through a ‘proof of concept’ phase at minimal or no cost.)
open source vs. commercial software: cons
• support and maintenance costs : may outweigh those of the proprietary package and include ‘hidden’ commitments.
• intellectual property rights : as code is modified and adapted, there may be legal risks concerning the code’s open source status and who owns the intellectual property rights of the modified code.
• expertise : requires software installation and maintenance expertise. modification of open source code requires software development expertise. organizations must ensure that they have the right level of expertise to manage it effectively.
Adapted from Open Gov Summit 2013. http://opengov2013.zaizi.com/pros-and-cons-of-open-source-solutions/
a variety of open source and commercial off-the-shelf (COTS) software is available for digitization projects
• easier for systems from different parties or using different technologies to interoperate and communicate with one another
• better protection of the data files created by an application against obsolescence of the application
• applications / data are easier to port from one platform to another since they follow known guidelines, rules, and interfaces
ocr software
• ABBYY FineReader (http://www.abbyy.com)
• Tesseract (https://code.google.com/p/tesseract-ocr), open source
• Nuance OmniPage (http://www.nuance.com)
• IRIS Readiris (http://www.irislink.com)
• LEADTOOLS OCR (http://www.leadtools.com)
• OCRopus (https://code.google.com/p/ocropus), open source
Wikipedia contributors, “Comparison of optical character recognition software,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software (accessed March 2014).
Wikipedia contributors, “Optical character recognition,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Optical_character_recognition (accessed March 2014).
imaging software
• LEADTOOLS image SDK (http://www.leadtools.com)
• ImageGear image SDK (http://www.accusoft.com)
• FreeImage image SDK (http://freeimage.sourceforge.net), open source
• BlackIce image toolkits (http://www.blackice.com)
• Adobe Photoshop (http://www.adobe.com/Photoshop)
• GIMP (http://www.gimp.org), open source
• GraphicsMagick (http://www.graphicsmagick.org), open source
• ImageMagick (http://www.imagemagick.org), open source
digital workflow software
• Content Conversion Specialists docWorks (http://content-conversion.com)
• ScanFlow (http://www.treventus.com)
• Goobi (http://www.goobi.org), open source
• Zissor (http://zissor.com)
other software
• BagIt : hierarchical file packaging format for the exchange of digital content. A "bag" has just enough structure to safely enclose descriptive "tags" and a "payload" but does not require any knowledge of the payload's internal semantics. See http://sourceforge.net/projects/loc-xferutils (open source tools) and http://tools.ietf.org/html/draft-kunze-bagit-06.
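The core of a bag is small enough to write by hand: a `data/` payload directory, a `bagit.txt` declaration, and a manifest listing a checksum for every payload file. The sketch below assumes BagIt version 0.97 and SHA-256, omits the optional `bag-info.txt` and tag manifests, and uses hypothetical function names; for production transfers the Library of Congress tools listed above are the safer choice.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source: Path, bag: Path) -> None:
    """Copy `source` into bag/data and write bagit.txt plus a SHA-256 manifest.

    Minimal sketch: no bag-info.txt, no tag manifest. bag/data must not exist yet.
    """
    payload = bag / "data"
    shutil.copytree(source, payload)  # also creates `bag` if needed
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # One line per payload file: "<checksum> <path relative to the bag>".
    lines = [
        f"{sha256_of(p)} {p.relative_to(bag).as_posix()}"
        for p in sorted(payload.rglob("*"))
        if p.is_file()
    ]
    (bag / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
```

Because the manifest travels with the payload, the receiving institution can recompute every checksum and confirm that nothing was lost or altered in transit.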
Discussion questions
1. What software tools does your organization use for digital projects or digital libraries?
2. Does your organization host a digital library? If so, does it use Google Analytics or a similar tool? Why or why not?
3. What software tools does your organization use for project management? Are the tools web-based?
Preservation of software and preservation of data are two sides of the same coin. From February 2011 Workshop for Digital Curators.
digital preservation
Open Archival Information System (OAIS) reference model
digitization ≠ digital preservation
digital preservation
long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required
digital data risks
• standards / format obsolescence
• migration to new format, media, or hardware
• media obsolescence / decay
• bit rot
format obsolescence
remember … WordPerfect? MARC records? Adobe Flash?
strategies for format obsolescence
• migrate data to new formats
• create a computer software museum with virtual machines
• format registries
• format validators
• don’t worry about it!
Jeff Rothenberg on format obsolescence
“... digital documents are evolving so rapidly that shifts in the forms of documents must inevitably arise. New forms do not necessarily subsume their predecessors or provide compatibility with previous formats.”
Jeff Rothenberg. Ensuring the Longevity of Digital Documents. Originally published in Scientific American. January 1995. Expanded version published February, 1999. (accessed 1 August 2012 at http://www.clir.org/pubs/archives/ensuring.pdf)
standard model for format obsolescence
• digital format registry collects information about target format
• this information is used to build format identification and verification tools
• holders of content use these tools to extract metadata from content in target format; metadata is stored with the content
• format registry scans computing environment to determine which formats are obsolescent; notifications sent for obsolete formats
• on receiving such a notification, someone builds a tool to convert obsolete format to non-obsolete format using the format specification in the registry
• on receiving such a notification, holder of content in obsolete format uses conversion tool and content metadata to convert the file in an obsolete format to a file in a non-obsolete format
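The identification step of this model can be illustrated by matching a file’s first bytes against known signatures (“magic numbers”). A minimal sketch: the signature table is a toy stand-in for a real registry such as PRONOM, which records far more (versions, internal signatures), and the watch list passed to `flag_at_risk` is hypothetical.

```python
from pathlib import Path

# Toy "format registry": leading-byte signatures for a few formats common in
# newspaper digitization. Real registries record much more detail.
SIGNATURES = [
    (b"II*\x00", "TIFF"),                              # little-endian TIFF
    (b"MM\x00*", "TIFF"),                              # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),  # JP2 signature box
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def identify(header: bytes) -> str:
    """Name the format whose signature starts `header`, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def flag_at_risk(paths: list[Path], at_risk: set[str]) -> dict[str, str]:
    """Identify each file; report those on the watch list or not identified at all."""
    flagged = {}
    for path in paths:
        with path.open("rb") as f:
            fmt = identify(f.read(16))
        if fmt in at_risk or fmt == "unknown":
            flagged[str(path)] = fmt
    return flagged


print(identify(b"%PDF-1.4\n"))       # PDF
print(identify(b"II*\x00\x08\x00"))  # TIFF
```

Unidentified files are flagged too, on the reasoning that content nobody can name is at least as much at risk as content in a format known to be fading.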
David Rosenthal on format obsolescence
“... format obsolescence is a rare problem that happens infrequently to a minority of unpopular formats ...”
David Rosenthal. Format obsolescence: Assessing the threat and the defenses. (accessed 1 August 2012 at http://lockss.org/locksswiki/files/LibraryHighTech2010.pdf).
alternate model for format obsolescence
• store only essential data
• perform only essential tasks
• delay performing tasks as long as possible
David Rosenthal. Format obsolescence: Assessing the threat and the defenses. Library High Tech, Special Issue, vol. 28, no. 2, 2010, pp.195-210. doi:10.1108/07378831011047613 (accessed 1 August 2012 at http://lockss.org/locksswiki/files/LibraryHighTech2010.pdf).
importance of standards vis-a-vis format obsolescence
well-defined standards …
• guide developers in creation of tools
• facilitate development of a broad range of tools for any format
• allow developers to maintain existing tools
data migration risks
• file format changes, for example, PDF 1.4 to PDF 1.7
• file name differences, for example, case-sensitive vs. case-insensitive names on a new operating system
• extended file attributes
• file permissions, for example, BSD Unix drwxr-xr-x@ to Windows file permissions
• soft links / hard links
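File name differences can be caught before a migration rather than after. A minimal sketch with hypothetical function names: it reports names that differ only by case (they would overwrite each other on a case-insensitive file system) and names containing characters Windows does not allow. `casefold()` only approximates any particular file system’s case rules, and other Windows restrictions (reserved names such as CON, trailing dots or spaces) are not checked.

```python
from collections import defaultdict

# Characters Windows does not allow in file names.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def case_collisions(names: list[str]) -> list[list[str]]:
    """Groups of names that differ only by case and would clash on a
    case-insensitive file system."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def windows_unsafe(names: list[str]) -> list[str]:
    """Names containing characters that Windows file names cannot hold."""
    return [name for name in names if WINDOWS_FORBIDDEN & set(name)]


names = ["Page0001.tif", "page0001.tif", "page0002.tif", "notes:draft.txt"]
print(case_collisions(names))  # [['Page0001.tif', 'page0001.tif']]
print(windows_unsafe(names))   # ['notes:draft.txt']
```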
media obsolescence
• 5 ¼” floppy disks
• 8 track tapes
• 3 ½” floppy disks
• ZIP drives
• CD-R, CD-RW, Blu-ray
• DAT tapes
• microfilm
• etc.
strategies for media obsolescence
• migrate data to new media, for example, floppy disks to DVD
• create and maintain a computer hardware museum
media decay
a report by NIST and the Library of Congress says ...
• virtually all CD-Rs tested indicated an estimated life expectancy beyond 15 years
• only 47 percent of recordable DVDs indicated an estimated life expectancy beyond 15 years; some had a life expectancy as short as 1.9 years
• in practice actual lifetimes may be considerably shorter
prevention / detection of media decay
• proper storage
• data file checksums (MD5, SHA-1, ...)
• monitor media integrity
• migrate data from old media to new media
bit rot
gradual decay of data due to …
• storage media failure because of media quality
• storage media failure because of improper storage
• random events (bit-flip, environmental influences)
• software / hardware errors
prevention / detection of bit rot
• data file fixity checks (checksums) such as MD5, SHA-1, ...
• monitor file integrity with frequent, corrective audits
• duplicate copies, geographically distributed
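A fixity audit records a checksum for every file once, then periodically recomputes the checksums and reports files that are missing, changed, or unexpected. A minimal sketch, assuming checksums are kept in a JSON file of our own devising stored outside the collection; the file layout and function names are hypothetical. Corrective action (restoring from a duplicate copy) is left out.

```python
import hashlib
import json
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file; algorithm may be any hashlib name, e.g. 'md5', 'sha1'."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_under(root: Path) -> set[str]:
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def record_fixity(root: Path, manifest: Path) -> None:
    """Store a checksum for every file under root. Keep the manifest outside root."""
    sums = {rel: checksum(root / rel) for rel in sorted(files_under(root))}
    manifest.write_text(json.dumps(sums, indent=2), encoding="utf-8")


def audit(root: Path, manifest: Path) -> dict[str, list[str]]:
    """Compare current files with the recorded checksums."""
    expected = json.loads(manifest.read_text(encoding="utf-8"))
    present = files_under(root)
    return {
        "missing": sorted(rel for rel in expected if rel not in present),
        "changed": sorted(rel for rel in expected
                          if rel in present and checksum(root / rel) != expected[rel]),
        "unexpected": sorted(present - set(expected)),
    }
```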
distributed decentralized digital preservation
• the more copies, the safer the data
• the more independent copies, the safer the data
• the more frequently copies are audited, the safer the data
Paraphrased from David Rosenthal. Keeping bits safe: How hard can it be?
distributed decentralized digital preservation
• n+1 copies are safer than n copies
• n independent copies on different storage devices / media are safer than n copies on similar or identical storage devices / media
• data audited every week is safer than data audited every month
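A toy probability model shows why both more copies and more frequent audits help. Assume (these are assumptions, not measurements) that each copy fails independently with the same small daily probability, that every audit repairs failed copies from a surviving good one, and that data is lost only if every copy fails between two consecutive audits. The failure rate below is hypothetical.

```python
def annual_loss_probability(copies: int, daily_failure: float, audit_interval_days: float) -> float:
    """Probability that all copies are lost at least once in a year.

    Simplified model (assumptions, not measurements):
    * each copy fails independently with the same daily probability;
    * at every audit, any surviving good copy repairs the failed ones;
    * data is lost only if every copy fails between two consecutive audits.
    """
    per_copy = 1 - (1 - daily_failure) ** audit_interval_days  # one copy fails in an interval
    all_copies = per_copy ** copies                            # every copy fails in that interval
    intervals_per_year = 365 / audit_interval_days
    return 1 - (1 - all_copies) ** intervals_per_year


daily = 1e-4  # hypothetical per-copy daily failure probability
for copies in (2, 3):
    for days, label in ((7, "weekly"), (30, "monthly")):
        p = annual_loss_probability(copies, daily, days)
        print(f"{copies} copies, {label} audits: {p:.2e}")
```

Real failures are often correlated (same hardware batch, same building, same software bug), which is exactly why the slide stresses independent copies: correlation makes the true risk higher than this model suggests.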
LOCKSS Lots Of Copies Keep Stuff Safe
• It ingests content from target websites using a web crawler similar to those used by search engines.
• It preserves content by continually comparing the content it has collected with the same content collected by other LOCKSS Boxes, and repairing any differences.
• It delivers authoritative content to readers by acting as a web proxy, cache or via Metadata resolvers when the publisher’s website is not available.
• It provides management through a web interface that allows librarians to select new content for preservation, monitor the content being preserved and control access to the preserved content.
• It dynamically migrates content to new formats as needed for display.
From LOCKSS webpages http://www.lockss.org.
LOCKSS box: Open source LOCKSS software installed on a dedicated computer or virtual machine.
how LOCKSS works [diagrams: LOCKSS boxes at my library, library X, and library Y]
1. data copied to another LOCKSS box
2. data audited
3. data audited: audit ok / audit fails
4. data copied to another LOCKSS box to repair the damaged copy
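The audit-and-repair cycle in these diagrams can be sketched as a majority vote over content hashes. This is a deliberately simplified model and not the LOCKSS polling protocol: here one function sees every copy directly, whereas LOCKSS boxes exchange hashes with peers in polls designed to resist faulty or malicious peers. Box names, data, and the function name are illustrative.

```python
import hashlib
from collections import Counter


def audit_and_repair(copies: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Compare the copies held by several boxes and repair those in the minority.

    Returns the repaired copies and the names of the boxes that were repaired.
    Raises RuntimeError when no version is held by a strict majority.
    """
    if not copies:
        raise ValueError("no copies to audit")
    digests = {box: hashlib.sha256(data).hexdigest() for box, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise RuntimeError("no majority version: needs manual investigation")
    good = next(copies[box] for box, d in digests.items() if d == winner)
    repaired = sorted(box for box, d in digests.items() if d != winner)
    fixed = {box: good if box in repaired else data for box, data in copies.items()}
    return fixed, repaired


boxes = {"my library": b"issue 1890-01-04", "library X": b"issue 1890-01-04",
         "library Y": b"issue 1890-01-O4"}  # one damaged copy
fixed, repaired = audit_and_repair(boxes)
print(repaired)  # ['library Y']
```

With only two copies a disagreement cannot be resolved by voting, one more reason the slides recommend more than a minimal number of copies.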
private LOCKSS networks
Alabama Digital Preservation Network (http://www.adpn.org/).
CLOCKSS (Controlled LOCKSS), a non-profit collaboration of North American, European, and Asian cultural heritage institutions whose purpose is to preserve digital content with LOCKSS (http://www.clockss.org).
MetaArchive Cooperative is a digital preservation cooperative created by cultural heritage institutions (http://www.metaarchive.org).
digital preservation references
• Nancy McGovern and Katherine Skinner, editors. Aligning National Approaches to Digital Preservation. Educopia Institute Publications. Atlanta, Georgia. 2012. Proceedings of a conference on digital preservation held at the National Library of Estonia in May 2011. (accessed 15 August 2012 at http://www.educopia.org/sites/default/files/ANADP_Educopia_2012.pdf).
• David Rosenthal. Format obsolescence: Assessing the threat and the defenses. Library High Tech, Special Issue, v. 28, n. 2, 2010, pp.195-210. doi:10.1108/07378831011047613 (accessed 1 August 2012 at http://lockss.org/locksswiki/files/LibraryHighTech2010.pdf).
• David Rosenthal. Keeping bits safe: How hard can it be? Communications of the ACM v. 53, n. 11, 2010, pp. 47-55. doi:10.1145/1839676.1839692 (accessed 1 August 2012 at http://lockss.org/locksswiki/files/ACM2010.pdf).
• Jeff Rothenberg. Ensuring the Longevity of Digital Documents. Originally published in Scientific American January 1995. Expanded version published February 1999. (accessed 1 August 2012 at http://www.clir.org/pubs/archives/ensuring.pdf)
• Joint Information Systems Committee (JISC) Programme on Digital Preservation at http://www.jisc.ac.uk/preservation.
• Library of Congress on Digital Preservation at http://www.digitalpreservation.gov.
• Stanford University’s website for LOCKSS at http://www.lockss.org.
newspaper digitization programs around the world
Europeana Newspapers Project, a collaboration of 17 organizations (http://www.europeana-newspapers.eu/)
Bibliothèque nationale de France (http://gallica.bnf.fr/)
National Library of Australia, Australian Digital Newspapers Program (http://trove.nla.gov.au/newspaper)
Singapore National Library Board (http://newspapers.nl.sg/)
National Library of New Zealand (http://paperspast.natlib.govt.nz/)
National Digital Newspaper Program, Library of Congress (http://chroniclingamerica.loc.gov/)
British Newspaper Archives, British Library (http://www.bl.uk/welcome/newspapers)
Koninklijke Bibliotheek, the Netherlands (http://kranten.kb.nl/)
National Library of Finland (http://digi.kansalliskirjasto.fi/)
National Library of Latvia (https://periodika.lndb.lv/)
• Library of Congress National Digital Newspaper Program http://www.loc.gov/ndnp/
• Australian Newspaper Digitisation Program http://www.nla.gov.au/content/newspaper-digitisation-program
• IFLA Newspapers Section Digitisation projects and best practices http://www.ifla.org/node/6777
• ICON: International Coalition on Newspapers http://icon.crl.edu/digitization.htm
• METS, MODS, ALTO, PRISM, and other library standards : http://www.loc.gov/standards
• OAIS : http://public.ccsds.org/publications/RefModel.aspx
• NISO standards and guidelines : http://www.niso.org/publications/rp
• Good practice guides : http://www.ukoln.ac.uk
• And many, many more
Wikipedia contributors, "List of online newspaper archives," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives (accessed March 17, 2013).
Frederick Zarndt Secretary, IFLA Newspapers Section
Photo held by John Oxley Library, State Library of Queensland. Original from Courier-Mail, Brisbane, Queensland, Australia.