Special applications for Digital Libraries: computer-aided philological and linguistic analysis of digital documents

Istituto di Linguistica Computazionale – Pisa

Andrea Bozzi

NEH/CNR Meeting, Washington DC, October 5, 2007

Presentation contents

1. An EU-supported system for Greek papyrology;

2. A special application for browsing and searching demotic documents on ostraka;

3. A philological workstation for digital medieval manuscripts;

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts;

5. How to integrate all these modules in a web-based open source application.

1. An EU-supported system for Greek papyrology

The philological workstation: image and text transcription

Image segmentation and semi-automatic word linking

Annotations and critical apparatus

Wordforms list and specific indexes

The web philological workstation to manage documents of the Istituto Papirologico Vitelli in Florence (restricted use)

2. A special application for browsing and searching demotic documents on ostraka

Andrea Bozzi, andrea.bozzi@ilc.cnr.it

OMM 1381: E. Bresciani, S. Pernigotti, M.C. Betrò, Ostraka demotici da Narmuti, Pisa, 1983, pp. 16-18;

OMM 300: P. Gallo, Ostraca demotici e ieratici dall’archivio bilingue di Narmouthis, Pisa, 1997, pp. 113-114;

OMM 393: R. Pintaudi, P.J. Sijpesteijn, Ostraka greci da Narmuthis, Pisa, 1993, p. 40.

Special system for teaching and retrieving linguistic information from demotic texts on ostraka

The digital image archive and the table of demotic signs

Search results: the blue areas (see arrow) show where the selected symbol has been found

3. A philological workstation for digital medieval manuscripts

Textual criticism for medieval manuscripts

Link to the list of collated sources

Selection of the variant eixens

Evaluation of the variant reading in the collated source

Recording of the variant eixens in the critical apparatus

Variants search in different ancient printed editions of the same work

Link to the list of collated books

Image of the corresponding page

4. CHLT-LEMLAT (EC-NSF project) to perform lemmatization of Latin texts

Lemmatization results (C. Sallustius Crispus, De coniuratione Catilinae, 1-2)

Lemmatization results of selected wordforms
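As a rough illustration of what lemmatizing selected wordforms means, the sketch below maps a few wordforms from the opening sentence of De coniuratione Catilinae to their lemmas through a small lookup table. The table, the function name `lemmatize`, and the output layout are assumptions made for this example only; they are not LEMLAT's data or interface, and a real lemmatizer has to deal with inflection and ambiguity well beyond a plain lookup.

```python
# Hypothetical mini-lexicon: wordform -> list of candidate lemmas.
# The entries are hand-written for illustration, not taken from LEMLAT.
LEXICON = {
    "omnis": ["omnis"],
    "homines": ["homo"],
    "student": ["studeo"],
    "ceteris": ["ceterus"],
    "animalibus": ["animal"],
}


def lemmatize(wordforms, lexicon=LEXICON):
    """Return (wordform, candidate lemmas) pairs.

    Wordforms missing from the lexicon get an empty candidate list,
    so the caller can see which forms still need analysis.
    """
    results = []
    for form in wordforms:
        # Normalize case and strip surrounding punctuation before lookup.
        key = form.lower().strip(".,;:")
        results.append((form, lexicon.get(key, [])))
    return results


if __name__ == "__main__":
    text = "Omnis homines, qui sese student praestare ceteris animalibus"
    for form, lemmas in lemmatize(text.split()):
        print(f"{form:12} -> {', '.join(lemmas) or '?'}")
```

Forms that the toy lexicon does not cover (here, for instance, "qui" and "sese") are reported with a question mark rather than guessed.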

5. How to integrate all these modules in a web-based open source application

Pinakes 3.0 (http://pinakes.imss.fi.it)

• Aim: web-based open source application to manage cultural heritage historical data in digital format.

• Partners:
– Fondazione Rinascimento Digitale, Florence;
– Istituto e Museo della Storia della Scienza, Florence;
– Ministero per i Beni Culturali, Rome;
– CNR, Istituto di Linguistica Computazionale, Pisa.

Technology

– Programming language: Java (JDK 1.5)
– Servlet engine: Tomcat 5.5.x + Apache HTTP connectors
– Web server: Apache httpd server 2.2.x
– Web application framework: Jakarta Struts
– Web service framework: Apache Axis 1.4
– Database engine: PostgreSQL 8.1
– Programming environment: NetBeans 5.5.1
– Final development: Hibernate 3.2.5

Standards

• DCMI (Dublin Core Metadata Initiative)
• TEI (Text Encoding Initiative)
• OWL (Web Ontology Language)
• RDF/XML (Resource Description Framework)
• SPARQL (query language for RDF)
• UTF-8 (Unicode Transformation Format).
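
To give a concrete idea of how two of these standards fit together, the sketch below builds a minimal Dublin Core description of one of the publications cited earlier and serializes it as XML. It is only an illustration: the wrapper element `record`, the function name `dublin_core_record`, and the choice of fields are assumptions for this example and do not reflect how Pinakes stores its metadata.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair.

    The <record> wrapper is a hypothetical container chosen for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Bibliographic data as given in the citation for OMM 1381.
fields = [
    ("title", "Ostraka demotici da Narmuti"),
    ("creator", "E. Bresciani"),
    ("creator", "S. Pernigotti"),
    ("creator", "M.C. Betrò"),
    ("date", "1983"),
]

# encoding="unicode" returns a str; non-ASCII characters are kept as-is
# and can be written to disk as UTF-8.
print(ET.tostring(dublin_core_record(fields), encoding="unicode"))
```

Repeating the `creator` element once per author follows the usual Dublin Core practice of allowing elements to occur multiple times.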