A Spot of TEI

Post on 18-Jun-2015


Presentation on TEI, particularly as it relates to the Integrating Digital Papyrology project. Given at the University of South Carolina Center for Digital Humanities, February 4, 2013.

transcript

February 4th, 2013

A Spot of TEI

Hugh Cayless, NYU

philomousos@gmail.com

Follow me on Twitter: @hcayless

Who am I?

✤ Ph.D. in Classics, M.S. in Information Science

✤ Worked as a software engineer for the last 12 years or so

✤ The last 4 have been at NYU, working on Digital Classics and similar cultural heritage digital access projects

✤ Recently elected to the TEI Technical Council

✤ One of the founders of EpiDoc, a TEI-based standard for encoding ancient inscriptions (and now papyri too).

What am I talking about?

✤ How we use TEI/XML in projects

✤ Why TEI?

✤ Current projects

Integrating Digital Papyrology

✤ Unification of several long-running projects:

✤ Duke Databank of Documentary Papyri (DDbDP)

✤ Heidelberg Gesamtverzeichnis (HGV), a directory of Greek documentary papyri

✤ Advanced Papyrological Information System (APIS)

✤ Bibliographie Papyrologique (BP)

✤ Trismegistos (TM)

State of play at the beginning

✤ DDbDP: TEI SGML files

✤ HGV: Filemaker Pro database + web interface

✤ APIS: idiosyncratic text-based catalog + images + web interface

✤ BP: database only, published annually in print/on disk

✤ TM: database + web interface

✤ TM is a going concern, working with IDP, but with no plans to be subsumed by it

What we did

✤ DDbDP: converted TEI SGML to EpiDoc (TEI) XML

✤ HGV: converted to EpiDoc XML

✤ APIS: converted to EpiDoc XML

✤ BP: converted to TEI <bibl> fragments

✤ TM: inserted TM ids into IDP documents, generated linkages to TM site

Structure

✤ The core of the system is just TEI files in a Git repository.

✤ These are transformed, using XSLT, into RDF, HTML, plain text, and documents to add to our search index.

✤ They are pulled into an editing workflow system as needed, which allows editing the files using a web form or (for texts) a non-XML syntax based on papyrological/epigraphic editing conventions.

✤ An automated process syncs data from the editor’s repo and a GitHub repo, and publishes it to the site.
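As a rough illustration of the transformation step described above, here is a minimal Python sketch that pulls line-by-line plain text out of a TEI edition, the kind of output that could feed a search index. The real pipeline uses XSLT; Python's standard-library ElementTree is swapped in here only to keep the example short. The sample document, the function names, and the "id"/"text" field names are all assumptions made for illustration and are not taken from the project.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, heavily simplified TEI document (not a real papyri.info file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="edition" xml:lang="la"><ab><lb n="1"/>opto deos <lb n="2"/>ego enim</ab></div>
  </body></text>
</TEI>"""


def plain_text(tei_xml: str) -> str:
    """Return the text of every <ab>, one output line per <lb/>."""
    root = ET.fromstring(tei_xml)
    lines = []
    for ab in root.iter(f"{{{TEI_NS}}}ab"):
        current = [ab.text or ""]
        for child in ab:
            if child.tag == f"{{{TEI_NS}}}lb":
                # A line break closes the line collected so far.
                lines.append("".join(current))
                current = []
            else:
                # Any other element contributes its text content.
                current.append("".join(child.itertext()))
            current.append(child.tail or "")
        lines.append("".join(current))
    return "\n".join(line.strip() for line in lines if line.strip())


def search_document(doc_id: str, tei_xml: str) -> dict:
    """Build a hypothetical index record; the field names are assumptions."""
    return {"id": doc_id, "text": plain_text(tei_xml)}


if __name__ == "__main__":
    print(search_document("example-1", SAMPLE))
```

Because the TEI file stays the single source, the same file can be run through other transforms (HTML, RDF) without touching this one.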

Or, visually

[Diagram: system architecture. Components shown: Canonical Git Repo, GitHub Repo, Git Repos, Editor Database, Numbers Server, Papyri.info Git Repo, Navigator Interface, search API, SPARQL API, XSLT API, Editor, Automated Document Sync, Leiden+ Conversion API, Search Engine]

So why TEI?

✤ Lots of reasons:

✤ Granular control over records

✤ Attribution

✤ Multiple outputs

✤ Mixture of controlled and free-form data

✤ Relatively easy to obtain / create tools

✤ Engaged and responsive community

What I’m working on now

✤ Fixing the TEI Pointer spec

✤ Annotation of documents to mark things like personal and place names

✤ Linguistic annotation

✤ Linking text and image

Some examples

✤ http://papyri.info/ddbdp/cpr;8;72

✤ fine-grained attribution / version control (click on “Editorial History” and “Detailed” at the bottom of the text)

✤ http://papyri.info/ddbdp/c.ep.lat;;218

✤ What’s going on underneath?

r  ̣[  ̣  ̣  ̣]  ̣c  ̣[  ̣  ̣ Aelio Fel]ici pluṛ[imam] ṣạ[lutem] opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est] ego enim · valeọ coṛpọṛe   ̣  ̣  ̣[ -ca.?- ] te non videọ rog̣ọ ṇe · fac ̣ịaṣ [ -ca.?- ] f  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]uma  ̣[ -ca.?- ]   ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]vAelio Felici

Beginning of a letter marked up according to the Leiden Conventions

r  ̣[  ̣  ̣  ̣]  ̣C  ̣[  ̣  ̣–ca.9– ]ICIPLUṚ[. . . .] ṢẠ[. . . . .] OPTODEOS · UTMI[. . . ]ẠLEAS · QUODṂẸ[ –ca.10– ] EGOENIM · VALEỌCOṚPỌṚE   ̣  ̣  ̣[ -ca.?- ] TENONVIDEỌROG̣ỌṆE · FAC̣ỊAṢ [ -ca.?- ] F ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]UMA  ̣[ -ca.?- ]   ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]vAELIO FELICI

The same letter, diplomatic(ish) edition

<div xml:lang="la" type="edition" xml:space="preserve"><div n="r" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="3" unit="character"/><gap reason="illegible" quantity="1" unit="character"/>c<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><supplied reason="lost"> Aelio Fel</supplied>ici plu<unclear>r</unclear><supplied reason="lost">imam</supplied><lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied><lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas <g type="middot"/> quod <unclear>me</unclear><supplied reason="lost">um votum est</supplied><lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> co<unclear>r</unclear>p<unclear>or</unclear>e <gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e <g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> <gap reason="lost" extent="unknown" unit="character"/><lb n="6"/>f<gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/><gap reason="lost" quantity="9" unit="character"/>uma<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><lb n="7"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/> <gap reason="lost" extent="unknown" unit="character"/></ab></div><div n="v" type="textpart"><!--milestone unit="4"--><ab><lb n="1"/>Aelio Felici </ab></div></div>

The same letter marked up in EpiDoc (TEI) XML

The same letter, visualization of the tree structure of the XML

✤ What is the text and what is the markup?

✤ There is no text, only readings. EpiDoc allows you to produce models of readings.

✤ Slicing the text up into bits isn’t adulterating it, it just adds hooks for transforming the text in useful ways.
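To make the "hooks" idea concrete, here is a minimal Python sketch that renders one line of the letter (line 3 of the recto) in two ways from the same markup: a Leiden-style reading text and a rough diplomatic text. It is an illustration only: the project itself uses XSLT, the function names and the "leiden"/"diplomatic" mode labels are invented here, and the diplomatic rules are simplified (one dot per supplied letter, where the slide abbreviates long gaps as "ca.10").

```python
import xml.etree.ElementTree as ET

DOT_BELOW = "\u0323"  # combining dot below, marks an unclear letter
MIDDOT = "\u00b7"     # interpunct

# Line 3 of the recto, copied from the EpiDoc markup of the letter.
FRAGMENT = (
    '<ab><lb n="3"/>opto deos <g type="middot"/> ut mi'
    '<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas '
    '<g type="middot"/> quod <unclear>me</unclear>'
    '<supplied reason="lost">um votum est</supplied></ab>'
)


def underdot(text: str) -> str:
    """Put a dot under every non-space character."""
    return "".join(ch if ch.isspace() else ch + DOT_BELOW for ch in text)


def text_of(s: str, mode: str) -> str:
    """Diplomatic mode drops word spacing and uses capitals."""
    if mode == "diplomatic":
        return "".join(s.split()).upper()
    return s


def render(elem: ET.Element, mode: str) -> str:
    out = [text_of(elem.text or "", mode)]
    for child in elem:
        inner = render(child, mode)
        if child.tag == "supplied":
            if mode == "leiden":
                piece = "[" + inner + "]"  # show the editor's restoration
            else:
                # Simplification: one dot per lost letter. In this mode
                # `inner` already has its spaces removed.
                piece = "[" + "." * len(inner) + "]"
        elif child.tag == "unclear":
            piece = underdot(inner)
        elif child.tag == "g" and child.get("type") == "middot":
            piece = MIDDOT if mode == "leiden" else " " + MIDDOT + " "
        else:
            piece = inner  # e.g. <lb/>, which has no content of its own
        out.append(piece)
        out.append(text_of(child.tail or "", mode))
    return "".join(out)


if __name__ == "__main__":
    ab = ET.fromstring(FRAGMENT)
    print(render(ab, "leiden"))
    print(render(ab, "diplomatic"))
```

Neither output is "the text"; each is one reading generated from the same encoded model.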

How to get involved

✤ Mailing list: TEI-L@LISTSERV.BROWN.EDU

✤ http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1

✤ http://listserv.brown.edu/archives/cgi-bin/wa?A0=tei-l

✤ TEI Sourceforge:

✤ Report a bug: http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644062

✤ Make a feature request: http://sourceforge.net/tracker/?func=add&group_id=106328&atid=644065

✤ IRC: #tei-c on http://freenode.net/