Date post: | 30-Jun-2015 |
Upload: | clarkeemma |
Lecture 2: From Texts to eTexts: Thematic Research Collections and Text Encoding.
Emma Clarke & Tomás Ó Murchú
Theory and Practice of Digital Humanities
MPhil Digital Humanities
Why Thematic Research Collections?
Libraries as Laboratories (Palmer)
Exaggeration?
Limitations of scattered content
Digital aggregations of primary sources and related materials that support research on a theme. (Palmer).
TRCs getting closer to the laboratory ideal – source material, tools & expertise together to advance the production of new knowledge.
PART 1: THEMATIC RESEARCH COLLECTIONS
THEMATIC RESEARCH COLLECTIONS
Many shapes and sizes…
May contain manuscripts, images, commentary, audio, letters, translations, versions etc.
Digital Libraries/Archives & TRCs
Digital Libraries and Archives differ in mission and method.
Library collections are amassed for preservation, dispensing, bibliographic, and symbolic purposes
Digital Libraries have diverse collections.
Perseus Collection – a digital archive.
Bolles Collection on the History of London – a TRC within a digital archive (Perseus Collection).
www.perseus.tufts.edu/ or perseus.mpiwg-berlin.mpg.de/
DIFFERENCES BETWEEN THEMATIC RESEARCH COLLECTIONS AND DIGITAL LIBRARIES AND ARCHIVES
John Unsworth (2000)
1. Necessarily Electronic (because of cost of 2, 3, 8)
2. Constituted of Heterogeneous datatypes (multimedia)
3. Extensive but thematically coherent
4. Structured but open-ended
5. Designed to support research
6. Authored or multi-authored
7. Interdisciplinary
8. Collections of digital primary resources (and they are themselves second-generation digital resources)
CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS
Palmer (2004)
Content
Basic elements:
* Digital
* Thematic
Variable characteristics:
* Coherent
* Heterogeneous
* Structured
* Open-ended
Function
Research support:
* Scholarly contribution
* Contextual mass
* Interdisciplinary platform
* Activity support
Two Basic Elements of a TRC
Digital: Digital format even though sources may exist as manuscripts, images etc.
Thematic: Contents are focused on particular research themes.
• Author Orientated - Walt Whitman Archive, Thomas MacGreevy Archive
• Historical Event/Period - Salem Witch Trials Archive, 1641 Depositions, September 11 Digital Archive
• Specific focused theme – Hamlet on the Ramparts
CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS
Variable Characteristics
Coherent: A coherent set of primary resources that relate directly to the theme.
Heterogeneous: Manuscripts, letters, critical essays, reviews, biographies, bibliographies
Structured: Permits searches and analysis. Interrelated groups structured together – images together, letters together etc.
Open Ended: Potential to grow and change. New sources added and improved. Annotations, links etc. Example: the September 11 Digital Archive.
CHARACTERISTICS OF THEMATIC RESEARCH COLLECTIONS
What goes into the TRC?
In both physical and digital libraries, materials are usually separated for reasons unimportant to a researcher. For example, primary texts may be part of a special collection, while secondary works may be in separate book and journal collections.
A TRC has a mix of heterogeneous but closely associated materials.
For example, the Digital Dante Archive: http://dante.ilt.columbia.edu/
CONTENT DECISIONS IN TRCS
The Interdisciplinary nature of TRCs
TRCs usually contain resources from different fields within the humanities world.
For example, the Thomas MacGreevy Archive aims to promote inquiry into the interconnections between literature, culture, history, and politics by blurring the boundaries that separate the different fields of study.
http://www.macgreevy.org
CONTENT DECISIONS IN TRCS
1. TRCs contain their own digital primary resources rather than basing their work on digital primary resources produced by libraries or publishers. The reasons are issues with permissions and copyright, and with the ability to edit, intervene in, comment on, and contextualize materials produced and controlled by others.
2. Lack of willingness of libraries to collect the scholars' "second-generation" digital publications so that they can become someone else's digital primary resources.
PROBLEMS FOR TRCS
3. “Do-it-yourselfism”. When each scholar or team builds their own digital library (and acts as their own publisher), the result is wasted and duplicated effort, loss of materials, and loss of confidence in digital scholarship because, most importantly, it produces a more or less immediate breakdown in referential integrity.
4. Marketing, design, editorial skills and services of publishers are not connecting with born-digital scholarly publications: editorial standards are not always what they should be, documentation is sometimes sloppy, problems of rights and permissions are frequently ignored, etc.
PROBLEMS FOR TRCS
5. The genre of the thematic research collection is largely developing outside of publishing institutions. As a consequence, publishers seem of questionable relevance to it.
6. Publishers have been, historically, the conduit connecting authors to libraries—but that connection is not being made for thematic research collections. As a consequence, publications of this sort are not making their way into library collections.
PROBLEMS FOR TRCS
More organised and searchable than a scan
Contains more information than a transcript
• page layout • line breaks • material qualities • physical properties • other meta-data
WHY ENCODE?
By markup language we mean a set of markup conventions used together for encoding texts.
A markup language must specify:
• what markup is allowed,
• what markup is required,
• how markup is to be distinguished from text,
• and what the markup means.
“Markup is an act of interpretation” (Cummings)
Following examples from University of Michigan Library
MARKUP LANGUAGES
Three characteristics of XML seem to the TEI to make it unlike other markup languages:
• emphasis on descriptive rather than procedural markup;
• document type concept;
• independence of any one hardware or software system.
Compared with HTML, XML has some other important characteristics:
• it is extensible (customisable): it does not contain a fixed set of tags
• its documents must be well-formed according to a defined syntax, and may be formally validated
• it focuses on the meaning of data, not its presentation
WHY XML? WHY NOT HTML?
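The points about extensibility and well-formedness can be tried out with a short sketch. It uses Python's standard xml.etree.ElementTree parser. The tag names (letter, sender, place, body) and the sample text are invented for illustration and come from no real schema or project; the helper names is_well_formed and text_of are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# XML has no fixed tag set, so we can invent tags that describe meaning.
# These tags and this text are made up for illustration only.
WELL_FORMED = """<letter>
  <sender>Mary</sender>
  <place>Dublin</place>
  <body>Weather fine today.</body>
</letter>"""

# The same idea with the closing </sender> tag missing: not well-formed.
BROKEN = "<letter><sender>Mary</letter>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def text_of(xml_text: str, tag: str) -> list:
    """Collect the text of every element with the given tag name."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


print(is_well_formed(WELL_FORMED))
print(is_well_formed(BROKEN))
print(text_of(WELL_FORMED, "place"))
```

Because the tags describe what the content is rather than how it should look, a program can ask for every place without knowing anything about presentation. Note that this only checks well-formedness; validating against a document type or schema is a separate step that ElementTree does not perform.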
XML EXAMPLE
Official title: Guidelines for Electronic Text Encoding and Interchange
A continually revised set of proposed methods for text encoding.
Guidelines describe the principles that should be used when marking up texts
They will evolve and inevitably change but they will overall stay true to the initial design goals:
TEI GUIDELINES
INITIAL DESIGN GOALS OF TEI GUIDELINES
1. suffice to represent the textual features needed for research
2. be simple, clear, and concrete
3. be easy for researchers to use without special-purpose software
4. allow the rigorous definition and efficient processing of texts
5. provide for user-defined extensions
6. conform to existing and emergent standards
Apply to texts in any natural language, of any date, in any literary genre or text type, without restriction on form or content.
Are customisable.
Examples of document content (tags)
THE GUIDELINES
Textual elements: titles / paragraphs / headings / dedications
Non-textual elements: graphics / illustrations / cover / binding material / line breaks
Meta-data: publication dates / prices / page counts / history
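As a rough illustration of how these three kinds of content might sit together in one encoded file, the sketch below parses a very small TEI-style fragment with Python's standard xml.etree.ElementTree. The element names (teiHeader, title, date, head, p, lb) and the namespace are taken from TEI, but the diary text is invented, the fragment is heavily simplified, and it has not been validated against any TEI schema.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; "tei" is just a local prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text, simplified and not schema-validated.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An invented diary</title></titleStmt>
      <publicationStmt>
        <p>Teaching sample, <date when="2015">2015</date></p>
      </publicationStmt>
      <sourceDesc><p>Made up for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <head>Saturday</head>
      <p>A first line of the entry<lb/>and a second line.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Meta-data: title and publication date come from the header.
title = root.find(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS
).text
date = root.find(
    ".//tei:publicationStmt/tei:p/tei:date", TEI_NS
).get("when")

# Textual elements: headings in the body of the text.
headings = [h.text for h in root.findall(".//tei:body/tei:head", TEI_NS)]

# Non-textual elements: line breaks recorded from the source layout.
line_breaks = len(root.findall(".//tei:lb", TEI_NS))

print(title, date, headings, line_breaks)
```

The header carries the meta-data, the body carries the textual elements, and the empty lb element records a feature of the physical page that a plain transcript would lose.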
If marking up texts is “an act of interpretation”, then it is one person's or one group's interpretation of what information is important.
By marking up documents and creating online scholarly editions, we are using historical texts and documents in a way their creators never intended.
“Because (TEI) … treats the humanities corpus … as informational structures, it ipso facto violates some of the most basic reading practices of the humanities community, scholarly as well as popular.” (McGann 2001: 139)
CRITICISM (?) OF TEI
A Family At War: The Diary of Mary Martin
1 January – 25 May 1916
Written in letter format to her son Charlie, who went missing in action during WW1, the diary chronicles the daily activities of Mary, her family, friends and relatives.
Diary of Mary Martin site
TEI PROJECTS
Autour d’une séquence et des notes du Cahier 46: enjeu du codage dans les brouillons de Proust
Around a sequence and some notes of Notebook 46: encoding issues about Proust's drafts
Proust Prototype
TEI PROJECTS