Europeana Newspapers
Project: Overview
London, 9 June 2014
Rossitza Atanassova, British Library
@RossiAtanassova
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the
Competitiveness and Innovation Framework Programme by the European Community
http://ec.europa.eu/ict_psp
In a nutshell
The Europeana Newspapers Project is a best practice network that
aims at aggregating up to 18 million digitised historic newspaper
pages from 12 European libraries and significantly improving the
discovery of three centuries of Europeana news articles and events
with relevance to the whole of Europe.
In addition, 11 other libraries that have joined the network since the
beginning of the project are contributing metadata.
Volume Across European cultures
Sharing best practices Improving discovery and access
Quick overview http://www.europeana-newspapers.eu/
• 3 year project ending January 2015
• Aggregate and make searchable up to 18 million historic
newspaper pages from across Europe
• Provide access for Europeana via a dedicated content
browser developed by The European Library
• Build tools to better assess the quality of newspaper
digitisation in relation to level of detail, speed and costs
• Create best practice recommendations for newspapers
metadata
• Grow the best practice network and actively engage users of
digitised newspapers
18 Project Partners
12 content providers
2 networking partners
4 technology providers
1 aggregator
11 Associated Partners
Networking Partners
Workpackages
• WP1 Project coordination by Staatsbibliothek zu Berlin (SBB)
• WP2 Refinement led by Koninklijke Bibliotheek (KB)
• WP3 Evaluation and quality assessment led by University of
Salford (USAL)
• WP4 Aggregation and presentation led by The European
Library (TEL)
• WP5 Metadata best practice recommendations led by
University of Innsbruck (UIBK)
• WP6 Dissemination led by the Association of European
Research Libraries (LIBER)
WP2 Refinement of digitised newspapers
• Analyse and select digitised newspaper content for
refinement (public document available)
• Define digitization requirements and minimum quality of
newspapers for advanced services in Europeana (public
document available)
• Develop workflow for refinement procedure as part of the
aggregation process to co-ordinate refinement of selected
content (full text, structural enrichment, named entities
recognition) (ongoing)
• Provide recommendations on best practice for refinement of
digitized newspaper collections with full-text (due Jan 2015)
WP3 Evaluation and Quality Assurance
The Europeana Newspapers project will help by developing
an evaluation and quality-assessment infrastructure for
newspaper digitisation. It will establish accepted baselines for
accuracy in relation to the level of detail, speed of digitisation
and costs. This will in turn help experts to assess different
methods of newspaper digitisation and pick the one that gives
the best result.
WP4 Aggregation and presentation of digitised
content for Europeana
• Aggregate content and develop a search interface
• Over 2.5 million pages ingested with content from 8 project
partners and metadata from 2 associated partners
• Access via TEL prototype browser
http://www.theeuropeanlibrary.org/tel4/newspapers
• Usability testing and improving functionality
http://www.europeana-newspapers.eu/functionality-newspaper-browser/
WP5 Metadata best practice for newspapers
• Gather and analyse metadata models from libraries currently
in use for the digitisation of newspapers
• Design and release a comprehensive metadata model based
on de-facto standards such as METS, MODS, MARC, ALTO
(due Jan 2015)
• Prepare an online resource that explains how to apply the
format and how to use it within a digitisation project
• Tool to enrich the newspapers METS/ALTO profile with
structural metadata
• IFLA Newspapers Section Pre-Conference (Geneva, 13th - 14th
August 2014). Title: Structural metadata – a Key for Indexing Digitized
Newspapers
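As a rough illustration of the ALTO part of the METS/ALTO profile mentioned above: ALTO files store recognised words in the CONTENT attribute of String elements, grouped into TextLine elements. The sketch below rebuilds plain text lines from such a file. The sample XML is a hand-written, hypothetical fragment (not project data), and the helper names local_name and alto_lines are assumptions made for this example. Namespaces are ignored so that the same code can cope with different ALTO versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Europeana"/>
            <SP/>
            <String CONTENT="Newspapers"/>
          </TextLine>
          <TextLine>
            <String CONTENT="London"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per TextLine, joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.attrib.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

Structural metadata (articles, headings, sections) would sit on top of this word-level layer, typically in the accompanying METS file.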
Europeana Newspapers data set
• About 18 million pages provided by 12 partners
• 17th to 20th century material
• 20 different languages
• Over 8 million pages refined through Optical Character
Recognition done by UIBK
• Over 2 million pages refined through Optical Layout
Recognition done by Content Conversion Specialists (CCS)
• Subset refined with Named Entity Recognition (NER)
• www.europeana-newspapers.eu/wp-content/uploads/2012/04/D-2-1_Dataset_for_refinement.pdf
Selection Criteria
• Selection is done by the participating libraries and takes into
account physical condition, demand and copyright
• Free from restrictions, metadata with CC-0 license required
for Europeana
• Relevance to end-users – libraries’ own criteria
• Digitisation quality – high resolution uncompressed master
images required for the refinement process
• Document characteristics – condition, language, layout, font
• Technical considerations – file formats and metadata
standards
Europeana Newspapers data set
Volume
Languages
Font type for the top 10 languages
Timeframe
Workflow for refinement
Tool support for libraries
1. BCT (Binarisation and Colour Reduction Tool)
2. FRT (File Rename Tool)
3. FAT (File Analyzer Tool)
General status of refinement (as of June 2014)
• 10,355,614 pages in total for refinement
• 7,776,277 pages processed so far
• 2,579,337 pages remaining
• About 75% completed
• The rest to be completed by November 2014
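The figures on this slide are related by simple arithmetic, which can be checked with a short sketch. The two page counts are taken from the slide; the function name refinement_progress is an assumption made for this example.

```python
TOTAL_PAGES = 10_355_614     # total pages for refinement (from the slide)
PROCESSED_PAGES = 7_776_277  # pages processed so far (from the slide)


def refinement_progress(total, processed):
    """Return (pages remaining, percentage completed)."""
    remaining = total - processed
    percent = 100 * processed / total
    return remaining, percent


remaining, percent = refinement_progress(TOTAL_PAGES, PROCESSED_PAGES)
print(f"{remaining:,} pages remaining, {percent:.1f}% completed")
```

The remaining count matches the 2,579,337 pages reported on the slide, and the completion rate comes out at roughly three quarters.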
Status of refinement OCR and OLR
• 8,149,263 pages to be OCR’ed in total
• 6,844,975 pages OCR’ed so far
• Technology: ABBYY FineReader SDK
• 2,206,351 pages to be OLR’ed in total
• 1,947,477 pages OLR’ed so far
• Technology: docWorks
Status of refinement NER
Status of languages
• software & trained model delivered, processing data, refining model
• software & trained model delivered, processing data
• data preparation done, training started
• data preparation done, training not yet started
Status of software Collaborations:
• NER attestation tool available
http://kbresearch.dyndns.org/eunews/
• NER training data available (NL):
http://kbresearch.dyndns.org/eunews/data/
• NER tagging tool available (open source)
https://github.com/KBNLresearch/europeananp-ner
• New output format: ALTO 2.1
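To give an idea of what NER output in ALTO 2.1 can look like: that version of the schema allows named entities to be declared as NamedEntityTag elements in a Tags section and referenced from words through a TAGREFS attribute. The sketch below reads such a file and lists the entities with the words they cover. It is only an illustration under that assumption: the sample XML is hand-written and hypothetical, the helper names are invented for this example, and the exact output of the project's tagging tool may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO 2.1 style fragment (not project output).
SAMPLE_ALTO_NER = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Tags>
    <NamedEntityTag ID="NE1" TYPE="location" LABEL="London"/>
    <NamedEntityTag ID="NE2" TYPE="organisation" LABEL="British Library"/>
  </Tags>
  <Layout><Page ID="P1"><PrintSpace><TextBlock ID="TB1"><TextLine>
    <String CONTENT="London" TAGREFS="NE1"/>
    <String CONTENT="British" TAGREFS="NE2"/>
    <String CONTENT="Library" TAGREFS="NE2"/>
    <String CONTENT="opens"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def named_entities(xml_text):
    """Return (type, label, covered words) for each referenced entity tag."""
    root = ET.fromstring(xml_text)
    tags = {}   # tag ID -> (TYPE, LABEL)
    words = {}  # tag ID -> list of word contents that reference it
    for element in root.iter():
        name = local_name(element.tag)
        if name == "NamedEntityTag":
            tags[element.attrib["ID"]] = (
                element.attrib.get("TYPE", ""),
                element.attrib.get("LABEL", ""),
            )
        elif name == "String":
            # TAGREFS may hold several whitespace-separated IDs.
            for ref in element.attrib.get("TAGREFS", "").split():
                words.setdefault(ref, []).append(element.attrib.get("CONTENT", ""))
    return [
        (tags[ref][0], tags[ref][1], " ".join(found))
        for ref, found in words.items()
        if ref in tags
    ]


for entity in named_entities(SAMPLE_ALTO_NER):
    print(entity)
```

Keeping the entity declarations separate from the words means a multi-word name is stored once and each word only carries a reference to it.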
NER
http://www.slideshare.net/Europeana_Newspapers
WP6 Dissemination
RAISE AWARENESS by
sharing our goals, results
and achievements as widely
as possible.
EXPAND OUR NETWORK
of content providers,
technology producers and
other stakeholder groups.
Nationaal Archief: http://www.flickr.com/photos/29998366@N02/3280639091
Europeana Newspapers website
• Blog: project news, partner
features, thematic articles
• Interviews with researchers
• Europeana newspapers
browser updates
• Highlight new content
• Promotional materials
• Project publications and
presentations
• Events calendar
Engage the researcher
What researchers value
“I see enormous value in an archive that breaks
down national boundaries automatically, where I
can search for content from a range of
countries.” – Bob Nicholson
“The difference lies not just in access but in the
conversion of a massive amount of print into a
searchable resource … This holds the potential to
make connections across newspapers in ways
previously unimaginable.” – Matt Rubery
Audiences
Through information days, workshops, conferences and media communication:
• Policy makers
• European Library community
• International library community
• Research community
• EC projects
• Museums, archives
• Technology experts
• Teachers
• Publishers
Workshop on refinement & QA
• 13 – 14 June, University Library Belgrade
• Blog: http://www.europeana-newspapers.eu/focus-on-newspaper-refinement-quality-assessment-in-belgrade/
Final workshop “Newspapers in Europe & the
Digital Agenda for Europe”
Goal: Produce a roadmap for
improving access to digital
newspapers for policy makers
Aimed at: policy makers,
researchers, librarians, cultural
heritage professionals and
newspaper publishers.
British Library, 29-30 September 2014
BiblioArchives: http://www.flickr.com/photos/lac-bac/7639138098/
Closing week
Promoting the newspaper browser with end-users.
1-5 December 2014
• One week of promotional events
• Live browser demos, press
articles, coordinated social media
activity
• All partner libraries will participate
Animation
Thank you
For more information visit
http://www.europeana-newspapers.eu/