UVA MDST 3073 Texts and Models-2012-09-11

Date post: 17-Nov-2014
Upload: rafael-alvarado

Page 1

Lecture 4: Texts and Models

Prof. Alvarado, MDST 3703/7703

11 September 2012

Page 2

Review

• Posting “Hello, World!”
  – Put the file in the public_html directory of your UVA Home Directory
  – Create a post and insert a link to this file
  – Categorize as: 09.06: (S) HTML
• If you cannot get to your home directory, try uploading to http://homedir.virginia.edu

Page 3

Some Quick Corrections

• Digital text is not necessary
  – It’s an open question (i.e. do we have to have it?)
• Nelson did not conceive of “trails”; Bush did
• HTML is not the “first big idea” in the liberal arts; hypertext is (according to me)
• The idea that “text shapes knowledge” is not ancient, but relatively new
  – Media determinism is a 20th-century perspective
  – Although Plato notes the effects of literacy in the Phaedrus
• Not everything can be translated into HTML
  – i.e. HTML is not the richest framework for digital representation

Page 4

Your Questions and Observations

• Is commercialization killing creativity?
  – What is the relationship between how the web is organized economically and how it shapes expression? EFFECT OF SOCIAL ORGANIZATION
• What happens if the associations that someone makes are ‘off’ and illogical to others?
  – Does it loosen the way logical connections can be made and argued? EFFECT ON LOGIC

Page 5

Your Questions and Observations

• Computers in general still rely heavily on a hierarchical structure
  – To what extent has rationalization occurred with the invention of hypertext?
• Do things lose value and meaning in exchange for digital coding?
  – What is the effect of digitization on value?
• Hypertexts and links online can be distracting
  – Non-linear thinking or mindless surfing?

Page 6

Your Questions and Observations

• People are trying to create the same exact classroom experience online that exists in the physical classroom, which is impossible
  – We need to rethink and restructure online learning as a new and unique experience
• How can we keep hypertext from altering us too much?
• The beauty and the risk of an open-source web

Page 7

Practical Questions

• How can an HTML webpage on your own computer be found by the search bar but not be on the web?
  – Your browser lives on your machine
  – The protocol name (e.g. file:// vs. http://) tells it where to look
• I wondered whether the picture from my computer would still show up if I opened the page from another computer.
• It is interesting to see how one little thing out of place can ruin the entire code
  – Computers are stupid in that way
• Why should coders learn HTML?
  – HTML is an interface language that can be easily generated from print statements in your code
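To make the last point concrete, here is a minimal Python sketch of a script that emits an HTML page line by line with print(). The helper name page_lines and the sample items are illustrative, not part of the course materials; html.escape is from Python's standard library.

```python
from html import escape

def page_lines(title, items):
    """Yield the lines of a minimal XHTML-style page that lists `items`."""
    yield "<html>"
    yield "  <head><title>%s</title></head>" % escape(title)
    yield "  <body>"
    yield "    <h1>%s</h1>" % escape(title)
    yield "    <ul>"
    for item in items:
        # escape() turns &, < and > into entities so item text cannot break the markup
        yield "      <li>%s</li>" % escape(item)
    yield "    </ul>"
    yield "  </body>"
    yield "</html>"

# Each generated line goes straight to standard output with print().
for line in page_lines("Hello, World!", ["Texts", "Models", "Markup & structure"]):
    print(line)
```

Redirecting the output to a file (e.g. `python make_page.py > hello.html`) would give a page the browser can open directly.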

Page 8

What is HTML?

• HTML is not a programming language
  – Programming languages express IF … THEN logic
  – But it is code that obeys a syntax and gets interpreted
  – And it is produced and consumed by programs
• HTML is a very general interface language
• HTML can be written in XML, which we discuss today
  – This version is technically called “XHTML”
  – The original version was defined in SGML

Page 9

In general, don’t conflate HTML with hypertext or with digital representation in general

Page 10

HTML is a language that generates a species of hypertext,

which is, in turn, a species of digital representation

Page 11

A provisional taxonomy

Page 12

Is hypertext new?

Page 13

[Study Bible]

Page 14

1 = Mishna, the first major transcription of the oral law
2 = Gemara, analytical discussions
3 = Rashi, glossary
4 = Tosefos, additions
5 = Hananel, comments
6 = Eye of Justice, legal decisions
8 = Light of the Bible, references to Biblical quotations
9 = Bach's Annotations
10 = Gra's Annotations

[Talmud]

Page 15

[Charrette]

Page 16

[The Waste Land]

Page 17

[Critical Edition]

Page 18

[OED]

Page 19

These are all examples of traditional texts

They exhibit “latent hypertext”

Page 20

Landow

• The concept of hypertext parallels poststructuralist views of text
  – Barthes, Foucault, Derrida, Kristeva, et al.
• In this view, a text is not, and has never been, a bounded, closed thing
  – It is a network of signifiers that connect meanings across time and space …

Page 21

Digital humanists have been concerned with encoding historical texts since at least 1949

Page 22

Father Busa

• Creator of the Index Thomisticus
• Saw the computer as a solution to indexing the works of Aquinas in 1949
  – 13,000,000 words
  – “in” took 4 years
• Solution:
  – Lemmatization
  – Variations tagged as instances of a type
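Busa's solution can be sketched in a few lines of Python: each inflected form is mapped to its lemma (the type), and occurrences are counted by lemma rather than by spelling. The LEMMAS table below is a hypothetical toy covering a few Latin forms of sum and deus; a real Latin lemmatizer needs a full morphological lexicon.

```python
from collections import Counter

# Hypothetical toy lemma table: each inflected form points to its headword.
LEMMAS = {
    "est": "sum", "sunt": "sum", "esse": "sum", "fuit": "sum",
    "deus": "deus", "dei": "deus", "deo": "deus", "deum": "deus",
}

def lemmatize(tokens):
    """Replace each token with its lemma; forms not in the table pass through lowercased."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

tokens = "Deus est et Dei esse est".split()
counts = Counter(lemmatize(tokens))
print(counts)
```

Six tokens with five distinct spellings collapse into three lemma types, which is what lets an index list every instance of a word under one heading.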

Page 23

The complete works of Aquinas will be typed onto punch cards; the machines will then work through the words and produce a systematic index of every word St. Thomas used, together with the number of times it appears, where it appears, and the six words immediately preceding and following each appearance (to give the context). This will take the machines 8,125 hours; the same job would be likely to take one man a lifetime.

Time Magazine, 1956, “Religion: Sacred: Electronics”
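The index Time describes is what is now usually called a keyword-in-context (KWIC) concordance. A minimal Python sketch, assuming simple whitespace tokenization and exact, case-insensitive word matching; the kwic function is illustrative, and the sample text is taken from the quotation above.

```python
def kwic(tokens, keyword, width=6):
    """Return (position, left context, keyword, right context) for each hit.

    `width` follows the Time description: six words on either side.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((i, " ".join(left), tok, " ".join(right)))
    return hits

text = ("the machines will then work through the words and produce "
        "a systematic index of every word St. Thomas used")
for pos, left, kw, right in kwic(text.split(), "word"):
    print(f"{pos:>3}  {left:>40}  [{kw}]  {right}")
```

The length of the returned list gives "the number of times it appears", and the position gives "where it appears". Note that "words" does not match "word" here; merging such variants is exactly the job of lemmatization.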

Page 24

So, what is text?

Let’s look at some material examples

Page 25

page o’ text

Real world text comes packaged in documents

Page 26

How is text conveyed in a document?

A document is a material artifact

Page 27

Page 28

What is text?

Page 29

Visual Signifiers

• Small caps
• Indentation
• Alignment
• Italics
• Space

All used to signify elements of text

Page 30

Documents have three levels: Content, Structure, Style

• Content
  – TEXT, images, video clips, etc.
• Structure
  – The organization of content into units (elements) and logical relationships (e.g. reading order)
• Style
  – Screen and print layout
  – Fonts, colors, etc.

Page 31

Descriptive markup languages allow us to define the structure of documents for computational purposes

Theoretically, they do not specify layout or content
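A small Python sketch of the separation between structure and style: the same descriptively marked-up document is laid out two different ways by swapping the style rules, while the element tree itself never changes. The element names and the two style dictionaries are hypothetical; the parsing uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the tags say what each unit *is*, not how it should look.
doc = ET.fromstring(
    "<letter>"
    "<salutation>Dear Reader,</salutation>"
    "<p>Markup describes structure.</p>"
    "<closing>Yours,</closing>"
    "</letter>"
)

# Two hypothetical style sheets: each maps an element name to a layout rule.
PRINT_STYLE = {
    "salutation": lambda s: s,
    "p": lambda s: "    " + s,         # indent paragraphs
    "closing": lambda s: s.rjust(30),  # push the closing toward the right margin
}
SCREEN_STYLE = {
    "salutation": lambda s: s.upper(),
    "p": lambda s: s,
    "closing": lambda s: "> " + s,
}

def render(root, style):
    """Lay out each child element of `root` using the rules in `style`."""
    return "\n".join(style[child.tag](child.text or "") for child in root)

print(render(doc, PRINT_STYLE))
print()
print(render(doc, SCREEN_STYLE))
```

This mirrors, in miniature, the role a stylesheet plays for descriptive markup: layout decisions live outside the document.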

Page 32

[PDF, Procedural Markup]

In contrast to procedural markup like PDF

Page 33

So, how are docs structured?

Page 34

Hierarchically …

(theoretically)

Page 35

Document Elements and Structures

Play
  – Act +
    • Scene +
      – Line +

Book
  – Chapter +
    • Verse +

Letter
  – Heading
    • Return Address
    • Date
    • Recipient Info
      – Name
      – Title
      – Address
  – Content
    • Salutation
    • Paragraph +
    • Closing

Page 36

These are all “trees”
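Because these structures are trees, a program can walk them recursively. A minimal Python sketch using the standard library's xml.etree.ElementTree, assuming a skeletal play encoded with hypothetical play/act/scene/line elements that follow the Play > Act + > Scene + > Line + model above.

```python
import xml.etree.ElementTree as ET

# A skeletal play: Play > Act+ > Scene+ > Line+ (element names are hypothetical).
play = ET.fromstring(
    "<play>"
    "<act n='1'><scene n='1'><line>First line</line><line>Second line</line></scene>"
    "<scene n='2'><line>Third line</line></scene></act>"
    "<act n='2'><scene n='1'><line>Fourth line</line></scene></act>"
    "</play>"
)

def outline(elem, depth=0):
    """Return one indented label per element, depth-first, in document order."""
    n = elem.get("n")
    label = elem.tag if n is None else f"{elem.tag} {n}"
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(play)))
```

The indentation of the printed outline is just the depth of each node, which is why the letter, book, and play models above can all be drawn the same way.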

Page 37

XML is a markup language

Page 38

What is XML?

• Stands for eXtensible Markup Language
  – Actually invented after the web
  – A simplification of SGML, the language used to create HTML
  – It specifies a set of rules for creating specialized markup languages such as HTML and TEI
• It is a simplified version of SGML
  – Standard Generalized Markup Language
• SGML was invented in the early 1970s to wrest control of documents from computer people who were taking over industries like law and accounting

Page 39

Page 40

XML looks like this

Notice how the element names reference units, not layout or style

Page 41

Also markup for “in-line” elements

Page 42

XML Premises

1. All documents consist of elements.
2. Elements contain content.
3. Elements have no layout.
4. Elements are hierarchically ordered.
5. Elements are to be indicated by “markup”: tags that define the beginning and end of an element.

Page 43

XML Markup Rules

• Tags signify structural elements
• Three kinds of tag
  – Start and end, e.g. <p> and </p>
  – Singleton, e.g. <br />
• Start and singleton tags can have attributes
  – Simple key/value pairs
  – <div class="stanza" style="color:red;">
• Basic rules
  – All attribute values must be quoted
  – All tags must nest (no overlaps!)

Page 44

Documents in XML that meet these rules are “well formed”
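A quick way to test well-formedness is simply to try parsing: Python's standard xml.etree.ElementTree raises ParseError on markup that breaks the rules above. The sample strings are illustrative, and this checks only well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested": "<p>Some <em>emphasis</em> here.</p>",
    "overlap": "<p>Some <em>emphasis</p> here.</em>",   # tags overlap instead of nesting
    "unquoted": "<div class=stanza>text</div>",          # attribute value not quoted
    "singleton": "<p>line one<br />line two</p>",
}
for name, text in samples.items():
    print(name, is_well_formed(text))
```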

Page 45

XML also provides Document Types

• A Document Type Definition (DTD) defines a set of tags and rules for using them
  – Specifies elements, attributes, and possible combinations
  – E.g. in HTML, the ol and ul elements must contain li elements
• A DTD is just one kind of schema system used by XML
• Schemas express data models of/for texts
  – TEI is a powerful way of describing primary source materials for scholars
• Documents that conform to a schema are called “valid”
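Validity is a stricter test than well-formedness. The Python sketch below hand-codes one toy content model, roughly what a DTD declaration such as <!ELEMENT ul (li)+> expresses, and checks a tree against it. It is not a DTD processor (real validation needs a validating parser, such as lxml's), it ignores stray text inside lists, and CONTENT_MODELS and validate are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Toy content models in the spirit of DTD declarations such as
#   <!ELEMENT ul (li)+>
# i.e. "ul must contain one or more li elements and no other elements".
CONTENT_MODELS = {"ul": "li", "ol": "li"}

def validate(root):
    """Return a list of content-model violations found anywhere in the tree."""
    problems = []
    for elem in root.iter():
        required = CONTENT_MODELS.get(elem.tag)
        if required is None:
            continue  # the toy schema has no rule for this element
        children = [child.tag for child in elem]
        if not children:
            problems.append(f"<{elem.tag}> must contain at least one <{required}>")
        for tag in children:
            if tag != required:
                problems.append(f"<{elem.tag}> may not contain <{tag}>")
    return problems

good = ET.fromstring("<body><ul><li>one</li><li>two</li></ul></body>")
bad = ET.fromstring("<body><ol><p>one</p></ol><ul></ul></body>")
print(validate(good))
print(validate(bad))
```

Both sample documents parse, so both are well formed, but only the first is valid under the toy rule.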

Page 46

Originally, DTDs defined “genres” like business letter or mortgage form

They were later used to define more abstract models of textual content

Page 47

XML is used everywhere

• HTML
  – E.g. embed codes
• TEI (Text Encoding Initiative)
• RSS
• Civilization IV
• Playlists (e.g. XSPF or “spiff”)
• Google Maps (KML)

Page 48

A Look Again at HTML

• aka XHTML
  – And now becoming HTML5
• An instance of XML (formerly SGML)
• An interface language
• Language of the World Wide Web
• Defined by a DTD that prescribes a specific set of elements and relations

Page 49

HTML Document Structure

• Head
  – Title
  – [Directives]
• Body
  – H1+
  – H2+
    • P+
    • UL
      – LI

Page 50

Basic Elements with Associated Tags

Element          Tags                                         Attributes
Paragraph        <p> ... </p>
Numbered List    <ol> <li> ... </li> </ol>
Bulleted List    <ul> <li> ... </li> </ul>
Table            <table> <tr> <td> ... </td> </tr> </table>
Anchor           <a> ... </a>                                 href, target
Image            <img />                                      src, border
Object           <object> ... </object>

Page 51

The Text Encoding Initiative created TEI to mark up scholarly documents

Mainly primary sources such as books and manuscripts

Page 52

TEI

• The dominant language used to encode scholarly text
• This room was the location of UVa’s EText Center
  – World famous for text encoding
  – Now part of the library and catalog
• Scholars create their own schemas to match what they are interested in

Page 53

Examples

• The TEI Header
  – http://tbe.kantl.be/TBE/examples/TBED02v00.htm
• TEI Prose
  – http://tbe.kantl.be/TBE/examples/TBED03v00.htm
• Find others at the TEI By Example Project
  – http://tbe.kantl.be/TBE/

Page 54

XML contains an implicit theory of text

What is it?

Page 55

OHCO

• XML (and therefore HTML and TEI) implies a certain theory of text
  – A text is an OHCO
• OHCO
  – Ordered Hierarchy of Content Objects
• An OHCO is a kind of tree
  – Elements follow each other in sequences
  – Elements can contain other elements

Page 56

What are the advantages of this view?

Page 57

OHCO allows for easy processing

• Every element has a precise address in the text
  – E.g. /html/body/p[1]
• Texts can be described in the language of kinship
  – Ancestors, parents, siblings, children, etc.
• Texts can be restructured and manipulated by known patterns and algorithms
  – Traversing
  – Pruning
  – Cross-referencing
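The addressing, kinship, traversal, and pruning operations listed above map directly onto Python's standard xml.etree.ElementTree, sketched below on a small sample XHTML-like document. One assumption to flag: ElementTree paths are evaluated relative to the root element, so the slide's address is written body/p[1] when starting from <html>.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><head><title>Sample</title></head>"
    "<body><h1>Chapter 1</h1><p>First paragraph.</p><p>Second paragraph.</p></body>"
    "</html>"
)

# Addressing: ElementTree's limited XPath support includes 1-based [n] positions.
first_p = doc.find("body/p[1]")
print(first_p.text)

# Kinship: elements keep no link to their parent, so build a child -> parent map.
parent = {child: par for par in doc.iter() for child in par}
siblings = [e.tag for e in parent[first_p] if e is not first_p]
print(parent[first_p].tag, siblings)

# Traversing: iter() walks the tree depth-first, in document order.
print([e.tag for e in doc.iter()])

# Pruning: removing <head> drops its whole subtree, <title> included.
doc.remove(doc.find("head"))
print([e.tag for e in doc.iter()])
```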

Page 58

What are the disadvantages of OHCO?

Page 59

Logical vs. Physical Structure

Page 60

Two common structures that overlap

Pages and Paragraphs

Page 61

<page n="2">. . .<p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p> </page><page n="3"><p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p>. . .</page>

Solution 1: Split Elements
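Split elements can be stitched back together by following their prev_id links. A minimal Python sketch, assuming the fragments appear in document order, that every prev_id points to an earlier fragment, and that the pages sit inside a hypothetical <text> wrapper (the slide elides the surrounding markup).

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<text>'
    '<page n="2"><p id="foo">His good looks and his rank had one fair claim on his '
    'attachment, since to them he must have owed a wife</p></page>'
    '<page n="3"><p id="bar" prev_id="foo"> a very superior character to anything '
    'deserved by his own.</p></page>'
    '</text>'
)

def join_split_paragraphs(root):
    """Rebuild logical paragraphs by following prev_id links across pages."""
    paragraphs = {}  # id of a paragraph's first fragment -> accumulated text
    first_of = {}    # any fragment id -> id of its paragraph's first fragment
    for p in root.iter("p"):
        prev = p.get("prev_id")
        head = first_of[prev] if prev else p.get("id")
        first_of[p.get("id")] = head
        paragraphs[head] = paragraphs.get(head, "") + (p.text or "")
    return paragraphs

print(join_split_paragraphs(doc))
```

The price of this solution is visible in the code: the logical paragraph exists only as a chain of links that every consuming program has to reassemble.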

Page 62

<p>His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife <pb n="3" /> a very superior character to anything deserved by his own.</p>

Solution 2: Use “Milestones”

One structure gets backgrounded
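With milestones, the backgrounded structure can still be recovered: in ElementTree, the text before a <pb/> is the paragraph's .text, and the text after it is the milestone's .tail. A minimal Python sketch; split_at_milestones is a hypothetical helper, it only handles milestones that are direct children of the paragraph, and the page number of the opening fragment cannot be known from the paragraph alone, so it is reported as None.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    '<p>His good looks and his rank had one fair claim on his attachment, '
    'since to them he must have owed a wife <pb n="3" /> a very superior '
    'character to anything deserved by his own.</p>'
)

def split_at_milestones(elem, milestone="pb"):
    """Return (page number, text) pairs; text before the first milestone gets None."""
    pages = [(None, elem.text or "")]
    for child in elem:
        if child.tag == milestone:
            # Text following the milestone (its tail) starts a new page.
            pages.append((child.get("n"), child.tail or ""))
        else:
            # Simplification: other inline elements stay on the current page.
            num, text = pages[-1]
            pages[-1] = (num, text + "".join(child.itertext()) + (child.tail or ""))
    return [(n, t.strip()) for n, t in pages]

for n, text in split_at_milestones(p):
    print(n, "|", text)
```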

Page 63

Wittgenstein’s Manuscripts

What about this?

Page 64

[Charrette]

Page 65

The problem of overlap suggests the need for a richer set of tools

Page 66

What tools do McCarty and Unsworth reference?

Page 67

Tables

Page 68

A database for Ovid

Page 69

McCarty

• A different use of markup
  – From document description to interpretation
  – Creative “misuse”
• Reverse engineering a “grammar” of personification from a markup strategy
  – Thickness = description (of text)
  – Depth = explanation (of text by reference to grammar)
• Is forced to use tables in collaboration with markup

Page 70

Thick description = Markup

Deep description = Tables
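One way to picture markup and tables working together: inline tags record interpretations where they occur in the text, and a script lifts them out into rows for analysis across the corpus. A Python sketch with an invented line and a hypothetical <pers> element; it does not reproduce McCarty's actual encoding of Ovid.

```python
import csv
import io
import xml.etree.ElementTree as ET

# An invented line, marked up with a hypothetical <pers> element that records
# which personification each phrase names (thick description, in markup).
line = ET.fromstring(
    '<l n="1">When <pers name="Fama">Rumour</pers> ran ahead, '
    '<pers name="Invidia">Envy</pers> followed.</l>'
)

# Deep description: lift every tagged instance out of the text into a table row,
# where rows can be sorted, counted, and compared independently of reading order.
rows = [
    {"line": line.get("n"), "name": pers.get("name"), "text": pers.text}
    for pers in line.iter("pers")
]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["line", "name", "text"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```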

Page 71

How to reconcile these tools?

Page 72

A Proposed Model

• Texts are not documents
  – Documents are media; texts are messages
• Texts and documents are part of a system composed of “levels”
  – They are effectively archaeological sites with stratigraphic layers
  – Erasures are like cities built on top of each other
• Each level of the system is described by an appropriate set of tools
  – Document structures → XML
  – Textual structures, embedded ontologies → Tables

Page 73

Basic Levels

• Document
  – Physical objects (paper)
  – Logical objects (defined by space, style, punctuation, etc.)
  – Style and layout (also defined by space, color, etc.)
  – Can have superimposed versions
• Text
  – Sequences of characters
  – Grammatical features
  – Figures and poetic features
  – Etc.

