DMDS Winter 2015 Workshop 1 slides

Posted on 16-Jul-2015


Winter 2015 Session #1: Exploring Programming in Digital Scholarship

February 12, 2015

Paige Morgan

Sherman Centre for Digital Scholarship

Programming is complex enough that just figuring out what you want to do and what sort of language you need is work.

Thinking that you ought to be able to do everything almost immediately is a recipe for feeling terrible.

Photo by MK Fautoyére, via Flickr

There will always be new programs and platforms that you will want to experiment with.

Working with technology means periodically starting from scratch, a bit like working with a new time period or culture, or figuring out how to teach a new class.

Being able to communicate effectively about your project as it relates to programming is a skill in itself.

What can programming languages do?

Programming languages can...

• search for things

• match things

• read things

• write things

• receive information and give it back, changed or unchanged

• count things

• do math

• arrange things in quantitative or random order

• respond: if x, do y OR do x until y happens

• compare things for similarity

• go to a file at a location and retrieve readable text

• display things according to instructions that you provide

• draw points, lines, and shapes

They can also do many or all of these things in combination.

Example #1

• find all the statements in quotes ("") from a novel

• count how many words are in each statement

• put the statements in order from the smallest number of words to the largest

• write all the statements from the novel to a text file
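The four steps of Example #1 can be sketched in a few lines of Python. This is a minimal illustration, not a finished tool: the short `novel` string is a hypothetical stand-in for a real novel, the pattern assumes straight double quotes, and the output file name `statements.txt` is made up for the example.

```python
import re

# Hypothetical stand-in for a novel; a real project would read the text from a file.
novel = (
    '"I am not well," she said. He replied, "Then rest." '
    '"We should leave before the storm arrives tonight," said the captain.'
)

# Step 1: find every passage between straight double quotes.
# Assumption: the text uses straight quotes ("); curly quotes need a different pattern.
statements = re.findall(r'"([^"]*)"', novel)

# Step 2: count the words in each statement by splitting on whitespace.
counted = [(len(s.split()), s) for s in statements]

# Step 3: order the statements from fewest words to most.
counted.sort(key=lambda pair: pair[0])

# Step 4: write the ordered statements to a text file, one per line.
with open("statements.txt", "w", encoding="utf-8") as out:
    for count, statement in counted:
        out.write(f"{count}\t{statement}\n")
```

Notice that the computer collects every quotation, whoever speaks it; telling speakers apart requires more information than the quote marks provide.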

Example #2

• allow a user to type in some information, e.g., "Benedict Cumberbatch"

• compare "Benedict Cumberbatch" to a much larger file

• retrieve any data that matches the information

• print the retrieved information on screen
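A minimal Python sketch of Example #2, under stated assumptions: the "larger file" is a hypothetical list of lines held in memory, and "matches" is defined here as "the line contains the query, ignoring capitalization". That definition is a choice the programmer has to make; another project might match whole records or exact fields instead.

```python
# Hypothetical "larger file"; in practice these lines would be read from disk.
records = [
    "Benedict Cumberbatch, Sherlock, 2010",
    "Martin Freeman, Sherlock, 2010",
    "Benedict Cumberbatch, The Imitation Game, 2014",
]

def find_matches(query, lines):
    """Return every line that contains the query, ignoring capitalization."""
    q = query.casefold()
    return [line for line in lines if q in line.casefold()]

# In an interactive program this could be: query = input("Search for: ")
query = "Benedict Cumberbatch"
for match in find_matches(query, records):
    print(match)  # prints the whole matching record, not just the query
```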

Example #3

• "read" two texts, say, two plays by Seneca

• search for any words that the two plays have in common

• print the words that they have in common on screen

• calculate what percentage of the words in each play are shared

• print that percentage on screen
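Example #3 maps neatly onto set operations. The sketch below is only an illustration: the two one-line "plays" are hypothetical placeholders (not Seneca's text), and it assumes that "words" means distinct lowercase words, so a word used ten times counts once. Counting every occurrence instead would give a different percentage, which is exactly the kind of decision the project has to make explicit.

```python
import re

def word_set(text):
    """Lowercase the text and collect its distinct words (letters and apostrophes only)."""
    return set(re.findall(r"[a-z']+", text.lower()))

# Hypothetical placeholder excerpts standing in for two plays.
play_a = "The king is dead and the city mourns"
play_b = "The city burns and the queen is silent"

words_a = word_set(play_a)
words_b = word_set(play_b)
shared = words_a & words_b  # words the two plays have in common

print(sorted(shared))
for name, words in (("Play A", words_a), ("Play B", words_b)):
    percent = 100 * len(shared) / len(words)
    print(f"{name}: {percent:.1f}% of its distinct words are shared")
```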

Example #4

• if the user is located in geographic location Z, e.g., 45th and University, go to an online address and retrieve some text

• print that text on the user’s tablet screen

• receive input from the user and respond
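The "if the user is at location Z" step in Example #4 is an if-then rule built on a distance calculation. This sketch rests on several assumptions: the coordinates of location Z are placeholders (not the real 45th and University), the 50-meter radius is arbitrary, and the text is stored locally rather than fetched from an online address. It uses the standard haversine formula for distance on a sphere.

```python
import math

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula, Earth radius ~6,371 km)."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical placeholders: a real app would use true coordinates and fetch TEXT_AT_Z online.
LOCATION_Z = (10.0, 20.0)
TEXT_AT_Z = "Welcome to location Z."
RADIUS_M = 50

def text_for(user_lat, user_lon):
    """If x (user within RADIUS_M of Z), do y (return the text); otherwise return None."""
    if distance_m(user_lat, user_lon, *LOCATION_Z) <= RADIUS_M:
        return TEXT_AT_Z
    return None

print(text_for(10.0, 20.0))   # user standing at Z
print(text_for(10.01, 20.0))  # user roughly 1 km away
```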

However...

• In Example #1, the computer is focusing on things that characters say. But what if you want to isolate speeches from just one character?

• In Example #2, how does the computer know how much text to print? Will it just print "Benedict Cumberbatch" 379 times, because that's how often it appears in the larger file?

These are the areas of programming where critical thinking and specialized disciplinary knowledge become vital.

The Difference

• Humans are good at differentiating between material in complex and sophisticated ways.

• Computers are good at not differentiating between material unless they’ve been specifically instructed to do so.
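A small Python illustration of this difference. A human reader knows that the first "Rose" is a person and the second is a flower; a plain text search does not, even when capitalization is taken into account. Once a human marks the person's name (here with a TEI-style `<persName>` tag, chosen for illustration), the computer can tell the two apart. The sentences are invented for the example.

```python
import re

plain = "Rose smiled. Rose petals covered the path."
# Searching for the capitalized word finds both the character and the flower.
person_guess = re.findall(r"\bRose\b", plain)
print(len(person_guess))

# After a human marks which "Rose" is a person, the distinction is explicit.
marked = "<persName>Rose</persName> smiled. Rose petals covered the path."
person = re.findall(r"<persName>(.*?)</persName>", marked)
print(len(person))
```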

Computers work with data.

You work with data, too, but you may have to do extra work to make your data readable by a computer.

Ways to make your data machine-readable

• Annotate it with a markup language

• Organize it in patterns that the computer can understand

• Add metadata that is not explicitly readable in the current format (e.g., hardbound/softbound binding; language: English; date of record creation)
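To make the third bullet concrete, here is a minimal Python sketch of metadata added as explicit fields. The two records, their titles, and their dates are hypothetical; the field names simply mirror the slide's examples (binding, language, date of record creation). Once those facts are recorded, a question like "which books are in English?" becomes a one-line filter.

```python
import json

# Hypothetical catalog records; the field names follow the slide's examples.
records = [
    {"title": "Book A", "binding": "hardbound", "language": "English",
     "record_created": "2015-02-12"},
    {"title": "Book B", "binding": "softbound", "language": "French",
     "record_created": "2015-02-10"},
]

# With explicit metadata, the computer can answer questions it could not before.
english_titles = [r["title"] for r in records if r["language"] == "English"]
print(english_titles)

# The same records can be saved in a widely used format (JSON) for other tools to read.
print(json.dumps(records[0], indent=2))
```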

Depending on the data you have, and the way you annotate or structure it, different things become possible.

Your goal is to make the data As Simple As Possible, but not so simple that it stops being useful.

Depending on the data you work with, the work of structuring or annotating becomes more challenging, but also more useful.

The work of creating data is social.

Many programming and markup languages have governing bodies that establish standards for their use:

• the World Wide Web Consortium (W3C) (http://www.w3.org/standards/)

• the TEI Technical Council

Data Examples

• Annotated (Markup Languages: HTML, TEI)

• Structured (MySQL)

• Combination (Linked Open Data)

• Object-Oriented Programming (Java, Python, Ruby)

Markup: HTML

<i>This text is italic.</i> = This text is italic.

Markup: HTML

<a href="http://www.dmdh.org">This text</a> will take you to a webpage.

=

This text will take you to a webpage.

Markup: HTML

Anything can be data, and markup languages provide instructions for how computers should treat that data.

Markup: HTML

HTML is used to format text on webpages.

<p> separates text into paragraphs.

<em> emphasizes text (browsers usually display it in italics).

These are just a few of the HTML formatting instructions that you can use.

HTML Syntax Rules

• Opening and closing tags: <tag> and </tag>

• Attributes (second-level information) defined using name="value"
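Because HTML follows these rules, a program can read the tags and attributes back out. This sketch uses Python's built-in `html.parser` module; the `TagLister` class is a hypothetical helper written for this example, and the snippet is the link example from the earlier slide wrapped in a paragraph.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Collect each opening tag and its attributes as the parser reads the HTML."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [("href", "...")]
        self.found.append((tag, dict(attrs)))

snippet = '<p><a href="http://www.dmdh.org">This text</a> will take you to a webpage.</p>'
lister = TagLister()
lister.feed(snippet)
print(lister.found)
```

The parser reports the `<p>` tag with no attributes and the `<a>` tag with its `href` attribute, which is the "second-level information" the slide describes.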

Markup languages are popular in digital humanities because lots of humanists work with texts.

Without markup languages, the things that a computer can search for are limited.

Ctrl + F can't search for "any text in iambic pentameter."

With markup, the things you can search for are limited only by your interpretation.

Markup: TEI

TEI (Text Encoding Initiative)

Markup: TEI

Poetry w/ TEI

<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
  <body xml:id="d2">
    <div1 type="book" xml:id="d3">
      <head>Songs of Innocence</head>
      <pb n="4"/>
      <div2 type="poem" xml:id="d4">
        <head>Introduction</head>
        <lg type="stanza">
          <l>Piping down the valleys wild, </l>
          <l>Piping songs of pleasant glee, </l>
          <l>On a cloud I saw a child, </l>
          <l>And he laughing said to me: </l>
        </lg>
      </div2>
    </div1>
  </body>
</text>
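Because each verse line is wrapped in `<l>`, a program can pull out the lines of the poem directly. This sketch uses Python's built-in `xml.etree.ElementTree` on a trimmed copy of the encoding above (the `<div1>`/`<div2>` levels and `xml:id` attributes are left out for brevity). TEI elements live in the TEI namespace, so the tag is written in ElementTree's `{namespace}name` form.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"  # TEI namespace in ElementTree's {uri} form

tei = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <lg type="stanza">
      <l>Piping down the valleys wild, </l>
      <l>Piping songs of pleasant glee, </l>
      <l>On a cloud I saw a child, </l>
      <l>And he laughing said to me: </l>
    </lg>
  </body>
</text>"""

root = ET.fromstring(tei)
# Collect every <l> (verse line) in the document, trimming the trailing spaces.
lines = [el.text.strip() for el in root.iter(TEI_NS + "l")]
for number, line in enumerate(lines, start=1):
    print(number, line)
```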

Grammar w/ TEI

<entry>
  <form>
    <orth>pamplemousse</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="gen">masculine</gram>
  </gramGrp>
</entry>

TEI’s syntax rules are very similar to HTML’s (TEI is written in XML, which is stricter: every tag must be closed), though your normal browser can’t work with TEI the way it works with HTML.

TEI is meant to be a highly social language that anyone can use and adapt for new purposes.

In order for TEI to successfully encode texts, it has to be adaptable to individual projects.

Anything that you can isolate (and put in brackets) can (theoretically) be pulled out and displayed for a reader.

TEI can be used to encode more than just text:

<div type="shot">
  <view>BBC World symbol</view>
  <sp>
    <speaker>Voice Over</speaker>
    <p>Monty Python's Flying Circus tonight comes to you live
    from the Grillomat Snack Bar, Paignton.</p>
  </sp>
</div>
<div type="shot">
  <view>Interior of a nasty snack bar. Customers around, preferably
  real people. Linkman sitting at one of the plastic tables.</view>
  <sp>
    <speaker>Linkman</speaker>
    <p>Hello to you live from the Grillomat Snack Bar.</p>
  </sp>
</div>
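This kind of encoding answers the earlier question of how to isolate speeches from just one character: each speech sits in an `<sp>` with its own `<speaker>`. The sketch below uses Python's built-in `xml.etree.ElementTree`; `speeches_by` is a hypothetical helper, the two shots are wrapped in a `<body>` element so the fragment has a single root, and no namespace is used because the excerpt above declares none.

```python
import xml.etree.ElementTree as ET

script = """<body>
  <div type="shot">
    <sp><speaker>Voice Over</speaker>
      <p>Monty Python's Flying Circus tonight comes to you live from the Grillomat Snack Bar, Paignton.</p></sp>
  </div>
  <div type="shot">
    <sp><speaker>Linkman</speaker>
      <p>Hello to you live from the Grillomat Snack Bar.</p></sp>
  </div>
</body>"""

def speeches_by(root, name):
    """Return the text of every <p> inside an <sp> whose <speaker> matches name."""
    result = []
    for sp in root.iter("sp"):
        if sp.findtext("speaker") == name:
            result.extend(p.text for p in sp.findall("p"))
    return result

root = ET.fromstring(script)
print(speeches_by(root, "Linkman"))
```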

Or you could encode all of Stephenie Meyer’s Twilight according to its emotional register.

Whether you include or exclude some aspect of the text in your markup can be very important from an academic perspective.

The challenge of creating good data is one reason that collaboration is so important to digital scholarship.

Wise Data Collaboration

• Avoid reinventing the wheel (has someone else already created an effective method for working with this data?)

• Consider the labor involved vs. the outcome (and future use of the data you create).

Structured Data

Study Scenario #1

• You study urban espresso stands: their hours, brands of coffee, whether or not they sell pastries, and how far the espresso stands are from major roadways.

Study Scenario #2

• You study female characters in novels written between 1700 and 1850.

Encoding a whole novel just to study female characters isn’t practical for you.

Both scenarios involve aggregating information, rather than encoding it.

Structured Data: Example #1 (MySQL)

ID | Name | Location | Hours | Coffee Brand | Pastries (Y/N) | Distance from Street
008 | Java the Hut | 56 Farringdon Road, London, UK | 7:00 a.m.–2:00 p.m. | Square Mile Roasters | N | 25 meters
009 | Prufrock Coffee | 18 Shoreditch High Street | 7:00 a.m.–10:00 p.m. | Monmouth | Y | 10 meters
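The espresso-stand table can be turned into a working database and queried. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL (the SQL ideas carry over, though the two systems differ in details), and the column names are assumptions made for the example. One design choice is worth noting: distance is stored as a plain number of meters rather than the text "25 meters", so the database can sort and compare it.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the table layout mirrors the slide's columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE espresso_stands (
        id TEXT PRIMARY KEY,
        name TEXT,
        location TEXT,
        hours TEXT,
        coffee_brand TEXT,
        pastries TEXT CHECK (pastries IN ('Y', 'N')),
        distance_m INTEGER
    )
""")
conn.executemany(
    "INSERT INTO espresso_stands VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("008", "Java the Hut", "56 Farringdon Road, London, UK",
         "7:00 a.m.-2:00 p.m.", "Square Mile Roasters", "N", 25),
        ("009", "Prufrock Coffee", "18 Shoreditch High Street",
         "7:00 a.m.-10:00 p.m.", "Monmouth", "Y", 10),
    ],
)

# Which stands sell pastries? List them nearest to the street first.
rows = conn.execute(
    "SELECT name, distance_m FROM espresso_stands "
    "WHERE pastries = 'Y' ORDER BY distance_m"
).fetchall()
print(rows)
```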

Structured Data: Example #2 (RDF)

Object-Oriented Programming

• Java, Python, C++, Perl, PHP, Ruby, etc.

• Widely used, highly flexible, very powerful

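What’s an “object”?

• An object is a structure that contains data in one or more forms, usually together with the operations (methods) that act on that data.

• Common forms include strings (text), integers (whole numbers), and arrays (ordered groups of data).

• Example (handout)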
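For readers without the handout, here is a small hypothetical Python object (not the handout's example) built from the espresso-stand data. It holds a string, a list (Python's everyday equivalent of an array), and an integer, plus one method that uses that data. The class name, method name, and 20-meter default are all invented for illustration.

```python
class EspressoStand:
    """A hypothetical object bundling one stand's data with behavior that uses it."""

    def __init__(self, name, brands, distance_m):
        self.name = name              # a string
        self.brands = brands          # a list (array) of strings
        self.distance_m = distance_m  # an integer

    def is_near_road(self, limit_m=20):
        """Return True if the stand is within limit_m meters of the street."""
        return self.distance_m <= limit_m

stand = EspressoStand("Prufrock Coffee", ["Monmouth"], 10)
print(stand.name, stand.brands, stand.is_near_road())
```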

Object-oriented programming, cont’d

• Learning a bit about an OOP language can help you get comfortable with how programming works

• Reading OOP code can also be useful

• Many free tutorials are available

• Goal: to be able to converse more effectively with professional programmers, rather than to become an expert yourself.

How your data is structured will influence the technology that you (can) use to work with it.

Digital scholars see creating machine-readable data as valuable scholarship.

Examples

• Homer Multi-Text Project

• Modernist Versions Project

• Scalar (platform)

• Century Ireland

Exercise: You Create the Data!

Your data determines your project.

Every project has data: text objects, images, tags, geographical coordinates, categories, records, creator metadata, etc.

Even if you’re not planning to learn any programming skills, you are still working with data.

Next time: Programming on the Whiteboard

February 19th, 3:00-5:00 p.m., Sherman Centre

• Cleaning data before you work with it!

• Identifying specific programming tasks

• How access affects your project idea

• Flash project development

• Homework: bring some data to work with.