
Digital Media Technology

Week 3: Introduction to TEI

Peter Verhaar

□ Elements
□ Attributes
□ DTD
□ Well-formed XML
□ Valid XML
□ Meta-language
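The difference between well-formed and valid XML can be made concrete with a small check. The sketch below is only an illustration: the function name `is_well_formed` and the two sample strings are made up for this example, and it uses Python's standard `xml.etree.ElementTree` parser, which tests well-formedness only. Checking validity against a DTD would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: an element with an attribute, once correctly nested
# and once with the closing </p> tag missing.
good = '<letter lang="en"><p>Dear Sirs,</p></letter>'
bad = '<letter lang="en"><p>Dear Sirs,</letter>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can be well-formed without being valid: validity additionally requires that the elements and attributes used are the ones the DTD allows.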




□ More advanced search actions

□ Explicit expression of implicit information

□ Logical structure of the text
□ Intellectual contents of the text


Text Encoding

Logical structure

Semantic contents

Though his chief focus was the eighteenth century – before the convergence of linguistic and national boundaries had consolidated – Robert Darnton’s remarks in What Is the History of Books? are pertinent for any period.

Journal Article





<?xml version="1.0" encoding="UTF-8"?>
<journalArticle>
<title>Book Trade Archives to Book Trade Networks</title>
<author>Adriaan van der Weel and Peter Verhaar</author>
<p>Though his chief focus was the eighteenth century – before the convergence of linguistic and national boundaries had consolidated – <name>Robert Darnton</name>’s remarks in <title>What Is the History of Books?</title> are pertinent for any period.</p>
</journalArticle>
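Once names and titles are marked up, they can be retrieved by element rather than by free-text search. The following is a minimal sketch using Python's standard library. The sample sentence is abridged from the slide, and the variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# Abridged version of the journal article example above.
ARTICLE = """<journalArticle>
  <title>Book Trade Archives to Book Trade Networks</title>
  <author>Adriaan van der Weel and Peter Verhaar</author>
  <p>Though his chief focus was the eighteenth century,
  <name>Robert Darnton</name>'s remarks in
  <title>What Is the History of Books?</title> are pertinent for any period.</p>
</journalArticle>"""

root = ET.fromstring(ARTICLE)

# The title of the article itself: a direct child of the root element.
article_title = root.findtext("title")

# Titles cited inside the running text: <title> elements within <p>.
cited_titles = [el.text for el in root.findall("./p/title")]

# Every personal name that was explicitly tagged, wherever it occurs.
names = [el.text for el in root.iter("name")]

print(article_title)
print(cited_titles)
print(names)
```

The same element name, `title`, is told apart here by its position in the hierarchy, which a plain string search could not do.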


Uses of text encoding

□ “Intelligent texts”: Searching beyond free text searches

□ Indexes

□ Separation of form and content

□ OHCO theory: Ordered Hierarchy of Content Objects

□ Multiple hierarchies?

Book Chapter Section Paragraph Sentence

Book Cover Section / gathering Folium

Text Encoding Initiative

□ More than 500 elements
□ Developed by consortium of scholars
□ First established in 1987
□ Text in general: “texts in any natural language, of any date, in any literary genre”

Dear Sirs,

I will accept £10 for the rights to make a translation into Dutch of my novel entitled Wanda

Printers will send you entire proofs from London instantly. Please to send money on receipt of this /Address Madame Ouida. ~c. 2 words illegible~ ~c. 1 word illegible~ Ouida L. de la Ramée


<choice><orig>Impressions</orig><reg>Impressions of Theophrastus Such</reg></choice>

Madame Ouida <gap reason="illegible" extent="2 words" />

<unclear reason="illegible">London</unclear>
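Markup such as `<choice>`, `<gap>` and `<unclear>` lets software produce different reading texts from one transcription. The sketch below is one possible way to do that in Python. The wrapping `<p>` that combines the three snippets, the function name `reading_text`, and the bracketed output conventions (`[2 words illegible]`, `[?]`) are assumptions made for this illustration, not TEI rules.

```python
import xml.etree.ElementTree as ET

# The three snippets above, joined in a hypothetical <p> purely for illustration.
SNIPPET = (
    '<p>Address Madame Ouida <gap reason="illegible" extent="2 words"/> '
    '<unclear reason="illegible">London</unclear> '
    '<choice><orig>Impressions</orig>'
    '<reg>Impressions of Theophrastus Such</reg></choice></p>'
)


def reading_text(el, regularized=True):
    """Flatten an element to plain text, resolving choice, gap and unclear."""
    if el.tag == "gap":
        # Nothing was transcribed: report what is missing and why.
        body = f"[{el.get('extent', 'text')} {el.get('reason', 'omitted')}]"
    elif el.tag == "choice":
        # Pick either the regularized or the original reading.
        body = el.findtext("reg" if regularized else "orig", default="")
    else:
        body = (el.text or "") + "".join(
            reading_text(child, regularized) for child in el
        )
        if el.tag == "unclear":
            body += "[?]"  # flag an uncertain reading
    # The tail is the text that follows the element inside its parent.
    return body + (el.tail or "")


p = ET.fromstring(SNIPPET)
print(reading_text(p, regularized=True))
print(reading_text(p, regularized=False))
```

Because both readings are stored in the encoding, the choice between a diplomatic and a regularized text becomes a display decision rather than an editorial one.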

ASCII

□ Character encoding scheme
□ Uses 7 bits (128 characters)

Unicode

□ 16 bits
□ UTF-8
□ 1,112,064 characters


α: &#x3B1;
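The relation between a character, its code point, its UTF-8 bytes and its numeric character reference can be inspected directly. The sketch below uses only built-in Python functions. The helper name `char_info` and the sample characters are chosen for this illustration.

```python
def char_info(ch: str) -> dict:
    """Describe one character (hypothetical helper for this illustration)."""
    code_point = ord(ch)
    return {
        "char": ch,
        "code_point": f"U+{code_point:04X}",
        "utf8_bytes": len(ch.encode("utf-8")),  # UTF-8 uses 1 to 4 bytes
        "reference": f"&#x{code_point:X};",     # XML numeric character reference
        "is_ascii": code_point < 128,           # fits in 7 bits
    }


for ch in ["A", "é", "α", "€"]:
    print(char_info(ch))

# The figure of 1,112,064 characters: all code points from U+0000 to U+10FFFF,
# minus the 2,048 surrogate code points (U+D800 to U+DFFF).
encodable = 0x110000 - (0xDFFF - 0xD800 + 1)
print(encodable)
```

Only characters below code point 128 occupy a single byte in UTF-8, which is why plain ASCII text is also valid UTF-8.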


<p>En r&#xE9;ponse &#xE0; votre lettre du 30 Janvier nous avons <lb/> l'honneur de vous informer que nous avons pay&#xE9; Mon-<lb/> sieur Midderigh d&#xE9;j&#xE0; depuis longtemps et presque toujours <lb/> d'avance.</p>
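An XML parser resolves numeric character references automatically, and the empty `<lb/>` elements make it possible to recover the line division of the source. A minimal sketch with Python's standard library follows. The variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

PARAGRAPH = (
    "<p>En r&#xE9;ponse &#xE0; votre lettre du 30 Janvier nous avons <lb/> "
    "l'honneur de vous informer que nous avons pay&#xE9; Mon-<lb/> "
    "sieur Midderigh d&#xE9;j&#xE0; depuis longtemps et presque toujours <lb/> "
    "d'avance.</p>"
)

p = ET.fromstring(PARAGRAPH)

# The text before the first <lb/> is p.text; the text after each <lb/>
# is stored as that element's tail.
lines = [p.text or ""]
for child in p:
    if child.tag == "lb":
        lines.append(child.tail or "")
lines = [line.strip() for line in lines]

for line in lines:
    print(line)
```

The printed lines contain é and à as ordinary characters: the references exist only in the serialized file, not in the parsed text.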

VLQ 1<De virtutibus herbarum liber (interpolatus) iuxta veterem versionem, ordine litterarum compositus>

<shelfMark> </shelfMark><title>



<p>This sentence is in the &lt;p&gt; element.</p>

&gt; Greater than
&lt; Less than
&quot; Quotation mark
&amp; Ampersand
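In practice these entities rarely need to be typed by hand, since most XML libraries can escape text automatically. The sketch below uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module. The sample strings are taken from, or modelled on, the examples above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "This sentence is in the <p> element."

# escape() replaces &, < and > by default.
escaped = escape(raw)
print(escaped)

# Quotation marks are only replaced when asked for explicitly.
quoted = escape('a "b" & c', {'"': "&quot;"})
print(quoted)

# A parser turns the entities back into the original characters.
parsed = ET.fromstring(f"<p>{escaped}</p>").text
print(parsed)

# unescape() does the same for a plain string.
print(unescape("&lt;shelfMark&gt; &amp; &lt;title&gt;"))
```

Escaping matters most for `<` and `&`, because an unescaped occurrence of either in running text makes the document ill-formed.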


Comments are used to improve the readability of the XML document:

<!-- The next section contains the transcription -->
