Cornell July 25, 2002 NUMDAM
Pierre Bérard
Institut Fourier, CNRS–Université Joseph Fourier
&
Cellule MathDoc, CNRS–Université Joseph Fourier
Grenoble (France)
Cellule MathDoc: www-mathdoc.ujf-grenoble.fr
• An institute for scientific information and communication in mathematics, supported by the Centre National de la Recherche Scientifique (CNRS) and the Ministère de la Recherche.
• General mission: to address documentation issues in mathematics at national level in France, in cooperation with mathematics libraries and institutes.
NUMDAM: Digitisation of Ancient Mathematics Documents
(NUMérisation de Documents Anciens Mathématiques)
A digitisation programme supported by CNRS and the Ministère de la Recherche, managed by the Cellule MathDoc.
NUMDAM: aims
• Reinforce French mathematical journals (visibility, accessibility, durability).
• Hand down digitised archives of the French mathematical heritage to future generations, and take part in international efforts with the same aim.
• Strive towards making this digitised mathematical heritage freely accessible.
Policy choices
• Database freely accessible on the web.
• Full text freely accessible after a moving wall (length set for each serial).
• Planned interoperability between retro-digitised and natively digital collections.
• National and international cooperation as far as possible.
Technical choices
• Scan every page, from first to last, at 600 dpi.
• OCR (uncorrected, 99.9% accuracy; mathematical formulae and images excluded).
• Multi-page files for logical units (TIFF, PDF with hidden text, DjVu).
• End-of-article bibliographies processed (OCR corrected to 99.99% accuracy, plus mark-up of the “author”, “title” and “year” fields).
• Database: cataloguing data for each article, summary (if present), end-of-article bibliography (if present), hidden OCRed text. Structured data exchange in XML.
• Links to/from the JFM, ZM and MR databases wherever possible.
• Future enhancements scheduled depending on technology available.
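To see what the two accuracy levels mean in practice, here is a small Python sketch of the arithmetic. It assumes a hypothetical average of 2,500 characters per page (not a NUMDAM figure) and treats accuracy as the probability that a single character is recognised correctly.

```python
# Rough arithmetic behind the two OCR accuracy targets.
# CHARS_PER_PAGE is a hypothetical average, not a NUMDAM figure.
CHARS_PER_PAGE = 2500


def expected_errors(accuracy: float, chars: int = CHARS_PER_PAGE) -> float:
    """Expected number of misrecognised characters among `chars` characters."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie between 0 and 1")
    return chars * (1.0 - accuracy)


if __name__ == "__main__":
    for label, accuracy in [("uncorrected OCR", 0.999),
                            ("corrected bibliographies", 0.9999)]:
        print(f"{label}: about {expected_errors(accuracy):.2f} errors per page")
```

Under that assumption, uncorrected OCR leaves about 2.5 wrong characters per page, while the corrected bibliographies leave about 0.25, i.e. roughly one error every four pages.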
Production choices
• Use of an external vendor for the technical processing.
• In-house study, segmentation, cataloguing, quality control and display.
• Quality and durability policy:
Prefer standard, easily convertible formats (TIFF, XML) that can serve as sources for future processing if necessary; avoid being tied to a proprietary system.
Archive high-quality images from which the text could be regenerated later (formula OCR, structure recognition).
NUMDAM Phase I: Journals
Journal (period covered)
• Annales de l’Institut Fourier (1949 – 2000)
• Bulletin de la Société mathématique de France (1872 – 2000)
• Mémoires de la Société mathématique de France (1964 – 2000)
• Publications mathématiques de l’IHÉS (1959 – 2000)
• Journées équations aux dérivées partielles, Saint-Jean-de-Monts (1975 – 2000)
These five journals: about 136,000 pages and 5,500 articles.
• Annales scientifiques de l’École normale supérieure (1864 – 1998)
About 67,000 pages and 1,750 articles.
NUMDAM Phase I: Chronology
• Spring 2003 (planned): end of the industrial phase of NUMDAM Phase I; public access to articles via the web.
• Autumn 2002 (planned): start of NUMDAM Phase II; work on copyright (©) issues continues.
• August 2002 (planned): first 50,000 pages delivered by the vendor.
• Feb. – May 2002: setting up the production chain (vendor) and quality control (Cellule MathDoc); work on copyright issues.
• Dec. 2001: choice of vendor validated by CNRS.
• Nov. 2000 – Oct. 2001: cataloguing and checking the database.
• Oct. 2000 – May 2001: writing the schedule of conditions for the vendor.
• July 2000: funding by CNRS.
NUMDAM Phase II
• Take an active part in the Digital Mathematics Library project, and cooperate with other digitisation projects (Gallica–BnF, possibly the digitisation part of EMANI).
Inventory the resources and cooperate with historians and mathematicians to make scientific choices and set priorities, in order to:
• Digitise all French mathematics journals (Annales de l’Institut Henri Poincaré, Annales de l’Université de Toulouse, Comptes Rendus de l’Académie, Journal de l’École polytechnique, ...), and possibly some mathematically important general science journals.
• Digitise important seminar series (séminaires Bourbaki, Cartan, séminaire de Probabilités de Strasbourg, ...).
• Digitise a substantial set of important monographs.
NUMDAM programme: overview
[Diagram] Main stages:
• Examination of collections and setting up the database.
• Copyright issues and negotiations with publishers.
• Schedule of technical conditions.
• Vendor: digitisation, segmentation, treatments (OCR & bibliographies).
• Quality control.
• Display: search and browsing.
• Links: JFM, MR, ZM.
• Database maintenance.
Software developments (SQL, XML): quality control, author identification & ©, display, links.
Quality control procedure
• Files received from the vendor: TIFF; XML, TIFF, PDF and DjVu.
• Automatic control (Perl), with errors recorded in a log.
• Sorting of samples (Perl): TIFF; XML, TIFF, PDF and DjVu files.
• Visual control of the samples, using a check-list (PHP) and a MySQL database that records a log of errors.
• Synthesis of the error logs, leading to rejection or validation.
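A minimal Python sketch of the automatic control, sampling and synthesis steps above. The original chain used Perl scripts, a PHP check-list and MySQL; this version is only an illustration. The file-naming convention `<unit-id>.<ext>`, the sampling rate, the zero-error threshold and all function names are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical convention: every logical unit is delivered as
# "<unit-id>.<ext>", one file per format listed on the slide.
REQUIRED_EXTENSIONS = {".tif", ".xml", ".pdf", ".djvu"}


def automatic_control(filenames):
    """Group delivered files by unit id and log each missing format."""
    units = defaultdict(set)
    for name in filenames:
        path = PurePosixPath(name)
        units[path.stem].add(path.suffix.lower())
    error_log = []
    for unit_id in sorted(units):
        for ext in sorted(REQUIRED_EXTENSIONS - units[unit_id]):
            error_log.append(f"{unit_id}: missing {ext}")
    return dict(units), error_log


def sort_samples(unit_ids, rate=0.1, seed=0):
    """Draw a reproducible random sample of units for visual control."""
    ids = sorted(unit_ids)
    if not ids:
        return []
    k = min(len(ids), max(1, round(len(ids) * rate)))
    return random.Random(seed).sample(ids, k)


def synthesis(automatic_log, visual_log, max_errors=0):
    """Combine both error logs into a validation or rejection decision."""
    total = len(automatic_log) + len(visual_log)
    return "validation" if total <= max_errors else "rejection"


if __name__ == "__main__":
    delivery = [
        "PMIHES_1962__12__5_0.tif", "PMIHES_1962__12__5_0.xml",
        "PMIHES_1962__12__5_0.pdf", "PMIHES_1962__12__5_0.djvu",
        "PMIHES_1962__13__5_0.tif", "PMIHES_1962__13__5_0.xml",
    ]
    units, log = automatic_control(delivery)
    print(log)
    print(sort_samples(units, rate=0.5))
    print(synthesis(log, visual_log=[]))
```

A fixed seed keeps the sample reproducible, so a rejected delivery can be re-checked on exactly the same units.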
NUMDAM Programme
XML description of physical volumes
Publications Mathématiques de l’Institut des Hautes Études Scientifiques
Physical volume: Year 1962, Volume 12
A paper in a physical volume
Article by Bernard Dwork in Publications Mathématiques IHÉS, 12 (1962), 5-68
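The XML description of a physical volume can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`volume`, `article`, `author`, `pages`) are hypothetical, since the slides do not show NUMDAM's schema; the data are those of the Dwork article above, whose identifier PMIHES_1962__12__5_0 appears in the cross-linking example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; NUMDAM's actual XML schema
# is not shown on the slides.


def volume_record(journal, year, volume, articles):
    """Describe one physical volume and the articles it contains."""
    root = ET.Element("volume", journal=journal, year=str(year), number=str(volume))
    for art in articles:
        node = ET.SubElement(root, "article", id=art["id"])
        ET.SubElement(node, "author").text = art["author"]
        # First and last page of the article inside the physical volume.
        ET.SubElement(node, "pages", first=str(art["first_page"]),
                      last=str(art["last_page"]))
    return root


if __name__ == "__main__":
    record = volume_record(
        "Publications Mathématiques de l'IHÉS", 1962, 12,
        [{"id": "PMIHES_1962__12__5_0", "author": "Bernard Dwork",
          "first_page": 5, "last_page": 68}],
    )
    print(ET.tostring(record, encoding="unicode"))
```

Keeping the volume as the root element mirrors the slide's distinction between a physical volume and the articles (logical units) it contains.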
Bibliographies
Cross-linking
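[Diagram] External databases (JFM, MR, ZM, ...) linked to the NUMDAM databases of articles and of images (EDBM, SQL; PDF and DjVu files).
Example identifiers: MR 28#3039 and ZM 0173.48601; MR 10,592e and ZM 0032.39402; NUMDAM article PMIHES_1962__12__5_0.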
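The key PMIHES_1962__12__5_0 matches the Dwork article in volume 12 (1962), starting on page 5. Reading it as seven underscore-separated fields (journal code, year, series, volume, issue, first page, order number) is our assumption, inferred from that example and from PMIHES_1962__13__5_0 on the next slides; the Python names below are hypothetical.

```python
from typing import NamedTuple, Optional


class NumdamKey(NamedTuple):
    journal: str             # journal code, e.g. "PMIHES"
    year: int
    series: Optional[str]    # assumed: empty when the journal has no series
    volume: str
    issue: Optional[str]     # assumed: empty when there is no issue number
    first_page: str          # kept as text (pages may be roman numerals)
    order: int               # assumed: disambiguates items on the same page


def parse_key(key: str) -> NumdamKey:
    """Split an identifier such as PMIHES_1962__12__5_0 into its fields."""
    parts = key.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected identifier layout: {key!r}")
    journal, year, series, volume, issue, first_page, order = parts
    return NumdamKey(journal, int(year), series or None, volume,
                     issue or None, first_page, int(order))


def format_key(k: NumdamKey) -> str:
    """Rebuild the identifier; empty fields become empty segments again."""
    return "_".join([k.journal, str(k.year), k.series or "", k.volume,
                     k.issue or "", k.first_page, str(k.order)])


if __name__ == "__main__":
    key = parse_key("PMIHES_1962__12__5_0")
    print(key.journal, key.year, key.volume, key.first_page)
```

Because the key is derived from bibliographic data alone, it can be computed from a citation before any external lookup, which is what makes it usable as a join key with MR and ZM.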
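MR ↔ NUMDAM
[Diagram] MR Lookup: the NUMDAM database sends a query line to MR, which returns it completed with the MR number and the title.
Query: |Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
Response: |Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr\'es.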
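The query and response lines above can be handled with a small parser. This Python sketch assumes the field positions read off the example (journal, author, volume, year, NUMDAM identifier, review number, title) and ignores the empty fields whose meaning the slide does not show; it does not reproduce any official MR Lookup specification, and the function names are hypothetical.

```python
# Field positions are inferred from the example lines; fields whose
# meaning the slides do not show (positions 3, 5 and 7) are ignored.


def parse_lookup_line(line: str) -> dict:
    """Parse one pipe-delimited query or response line."""
    fields = line.rstrip("\n").split("|", 10)  # a "|" inside the title stays in it
    if len(fields) != 11 or fields[0] != "":
        raise ValueError(f"unexpected line layout: {line!r}")
    return {
        "journal": fields[1],
        "author": fields[2],
        "volume": fields[4],
        "year": fields[6],
        "numdam_id": fields[8],
        "review_id": fields[9] or None,   # empty in a query
        "title": fields[10] or None,      # empty in a query
    }


def collect_links(response_lines):
    """Map each NUMDAM identifier to the review number returned for it."""
    links = {}
    for line in response_lines:
        record = parse_lookup_line(line)
        if record["review_id"]:
            links[record["numdam_id"]] = record["review_id"]
    return links
```

Carrying the NUMDAM identifier through the round trip means a response never has to be matched back on author or title, which differ in form between the two sides (e.g. "Publications IHES" against "Inst. Hautes Etudes Sci. Publ. Math.").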
JFM & ZM ↔ NUMDAM
New identification tool under development within the LIMES framework (EU project).
[Diagram] ZM lookup: the NUMDAM database sends a query line to ZM, which returns it completed with the ZM number and the title.
Query: |Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
Response: |Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|0105.16903|Homologie des espaces fibr\'es.
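Before a returned number is stored as a link, it can be checked against the identifier shapes seen on these slides: MR 28#3039 and 26#1893, the older MR 10,592e, and ZM 0173.48601 or 0105.16903. This Python sketch encodes our reading of those shapes as regular expressions; the interpretation of each part (volume, item) is an assumption, and JFM numbers are not covered because none are shown.

```python
import re
from typing import Optional, Tuple

# Shapes as we read the identifiers on the slides (an assumption):
# MR "28#3039" = volume 28, review 3039; older MR "10,592e" = volume 10,
# page 592, item e; ZM "0173.48601" = volume 0173, item 48601.
PATTERNS = [
    ("MR", re.compile(r"(?:MR\s*)?(\d+)\s*#\s*(\d+)")),
    ("MR", re.compile(r"(?:MR\s*)?(\d+)\s*,\s*(\d+[a-z])")),
    ("ZM", re.compile(r"(?:ZM\s*)?(\d{4})\.(\d{5})")),
]


def classify(reference: str) -> Optional[Tuple[str, str, str]]:
    """Return (database, volume, item) for a review identifier, or None."""
    text = reference.strip()
    for database, pattern in PATTERNS:
        match = pattern.fullmatch(text)  # the whole string must match
        if match:
            return database, match.group(1), match.group(2)
    return None
```

Anything that returns None can be routed to manual checking instead of being stored as a link.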
Identification of authors: two purposes
• Improve search facilities by setting-up a reference list of authors.
• Provide a tool to help address copyright issues.
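For the first purpose, one simple approach (an assumption, not necessarily what the internal tool does) is to group name variants under a normalised key made of the accent-free surname and first initial. The Python sketch below expects names in the form "Surname, Given names", and its function names are hypothetical. Such a key can merge distinct authors who share a surname and initial, so it only proposes candidates for manual checking.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Lower-case and strip accents (é -> e) via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def author_key(name: str) -> str:
    """Key 'surname initial' for a name written as 'Surname, Given names'."""
    surname, _, given = name.partition(",")
    initial = fold(given.strip())[:1]
    return f"{fold(surname.strip())} {initial}".strip()


def build_reference_list(names):
    """Group the spelling variants found in the catalogue under one key."""
    groups = defaultdict(set)
    for name in names:
        groups[author_key(name)].add(name.strip())
    return {key: sorted(variants) for key, variants in groups.items()}


if __name__ == "__main__":
    print(build_reference_list(["Bérard, Pierre", "Berard, P.", "Dwork, Bernard"]))
```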
Internal tool ...
NUMDAM: search interface based on EDBM (in development)
[Screenshot] Article record with links to JFM, MR and ZM; abstract if available.
NUMDAM URLs
• Main: www-mathdoc.ujf-grenoble.fr/NUMDAM/
• Visitors (sample files): www-mathdoc.ujf-grenoble.fr/NUMDAM/Visitors/
Login: VISITORS Pwd: v\to\num
• LiNuM (books at BnF, Cornell, Göttingen, Michigan): www-mathdoc.ujf-grenoble.fr/LiNuM/
• Journal de Mathématiques Pures et Appliquées 1836 – 1880 (BnF): www-mathdoc.ujf-grenoble.fr/JMPA/
• Search the NUMDAM database: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Bd/consultation.htm
• Inventory: math-sahel.ujf-grenoble.fr/NUMDAM/Public/Inventaire/inventaire.htm
Thank you for your attention ...