The Old Bailey Corpus: Spoken English in the 18th and 19th centuries
The use of historical court records in the investigation of language change
Digital History Seminar, 21 February 2012
Magnus Huber
Department of English
University of Giessen
Otto-Behaghel-Str. 10B
D-35394 Giessen, Germany
Structure
1. Introduction
1.1 Corpus linguistics, sociolinguistics and
sociohistorical linguistics
1.2 The Proceedings of the Old Bailey
1.3 Turning the Proceedings into a linguistic corpus
2. How linguistically accurate is OBC?
2.1 Comparison with alternative accounts
2.2 Language event and its representation
2.3 Internal consistency: negative contraction
2.4 Sociolinguistic potential: relative clauses
3. Brief summary
1. Introduction
1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
Definition of linguistic corpus
Generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses
Importance of spoken language
Spoken language precedes written language
[Figure: Percentage of (ng):[n] by social class and sex; bars for Female and Male across MMC, LMC, UWC, MWC, LWC; y-axis 0-100]
MMC middle middle class
LMC lower middle class
UWC upper working class
MWC middle working class
LWC lower working class
drinking
(ng):[n]
= [drɪnkɪn]
Peter Trudgill (1974)
The social differentiation of English in Norwich
Historical linguistics: language change
ye > you in subject position
when ye come set it in sech rewle as ye seeme best (1465)
And thus in hast fare you hartely well (1545)
Sociohistorical linguistics
Gender-related change: ye > you
1.2 The Proceedings of the Old Bailey
• Old Bailey = London's Central Criminal Court
• meets 8 times/year, from 1830s 10 times/year
• "Proceedings" published 1674-1913
• start as a commercial enterprise: publishers send scribes into courtroom
• proceedings taken down in shorthand
• sold privately by publishers
• City of London gains more and more control during 18th century
• 2100+ volumes
• ca. 200,000 trials
• ca. 134 million words
www.oldbaileyonline.org
<unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header> <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
Original computerized Proceedings (Sheffield)
Sociolinguistically useful XML-tags in Sheffield Proceedings
• name
<given>Sarah</given> <surname>Sanders</surname>
• year
<identifier>t17180110-1</identifier>
• gender
<defend gender="f">
• age
<age>43</age>
• profession
<deflabel>Servant</deflabel>
• origin
<crimeloc>Tottenham</crimeloc>
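The tags listed above can be read out with any XML parser. The following is a minimal sketch, assuming a shortened and well-formed variant of the Sarah Sanders record (the closing tags and the function name `extract_social_data` are added here for illustration and are not part of the Sheffield markup or tooling):

```python
import xml.etree.ElementTree as ET

# Shortened variant of the Sarah Sanders record shown earlier. The closing
# </trial></unit> tags are added so that the fragment parses on its own.
SAMPLE = (
    '<unit id="t17330510-1"><trial><info>'
    '<identifier>t17330510-1</identifier></info>'
    '<p>1. <person gender="f"><defend gender="f"><given>Sarah </given>'
    '<surname>Sanders </surname></defend></person>, was indicted for theft.</p>'
    '<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>.</p>'
    '</trial></unit>'
)


def extract_social_data(xml_text):
    """Collect the sociolinguistically useful fields of one trial (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # Return stripped element text, or None if the tag is absent or empty.
        value = root.findtext(path)
        return value.strip() if value and value.strip() else None

    identifier = text_of(".//identifier")
    defend = root.find(".//defend")
    return {
        "given": text_of(".//defend/given"),
        "surname": text_of(".//defend/surname"),
        # Identifiers have the form t + YYYYMMDD + running number.
        "year": int(identifier[1:5]) if identifier else None,
        "gender": defend.get("gender") if defend is not None else None,
        "age": text_of(".//age"),
        "profession": text_of(".//deflabel"),
        "origin": text_of(".//crimeloc"),
    }


record = extract_social_data(SAMPLE)
print(record)
```

Fields that a trial does not mark up (here age and origin) come back as `None`, which keeps missing data distinct from empty strings.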
1.3 Turning the Proceedings into a linguistic corpus of early spoken English
<unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header> <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>
<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>
<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]
<speech>
Tagging spoken language
• Need for automatic annotation
• Perl script identifying non-linguistic patterns indicating spoken language in the original proceedings
– layout
– metalinguistic information
• Linguistic markers indicating spoken language? > 1st + 2nd person pronouns
Automatic speech tagging
e.g. "Q. – A."-sequences
Q. Did you see him on Sunday night? - A.
Yes, at Walworth, on Sunday night, the
12th of January, at one o'clock - I am sure
of that.</p>
<speech> </speech>
<speech>
</speech>
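Layout cues such as the "Q. – A." sequence lend themselves to pattern matching. The sketch below is a Python stand-in, not the Perl script mentioned in the slides; the regular expression and the function name `tag_qa_speech` are assumptions made for illustration, and the exact placement of the `<speech>` tags in OBC may differ.

```python
import re

# A turn starts with "Q." or "A." and runs until the next " - Q." / " - A."
# marker or the end of the passage (assumed layout convention).
QA_TURN = re.compile(r"\b(Q|A)\.\s+(.*?)(?=\s+-\s+[QA]\.\s|\s*$)", re.DOTALL)


def tag_qa_speech(text):
    """Wrap the spoken part of each Q./A. turn in <speech> tags."""
    return QA_TURN.sub(lambda m: f"{m.group(1)}. <speech>{m.group(2)}</speech>", text)


passage = (
    "Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday "
    "night, the 12th of January, at one o'clock - I am sure of that."
)
tagged = tag_qa_speech(passage)
print(tagged)
```

The dash inside the answer ("one o'clock - I am sure") is not followed by a "Q." or "A." marker, so it stays inside the answer's speech span.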
Sociobiographical speech event annotation
The New Bailey Tag Assistant
<xml>
  <document name="19100426">
    . . .
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <occupation2></occupation2>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <hiscolabel2></hiscolabel2>
      <hiscocode2></hiscocode2>
      <crimescene></crimescene>
      <birthplace></birthplace>
      <workplace>Wormwood Scrubs Prison</workplace>
      <placeofresidence></placeofresidence>
      <role>witness</role>
    </speaker>
    . . .
  </document>
</xml>
Social data file
• XML format
• attributes of every speaker in OBC
• plus: scribe, printer, publisher
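A social data file of this kind can be loaded into a lookup table keyed by speaker id. This is a minimal sketch, assuming a reduced copy of the Thomas Tuckey record with the elided parts left out; `load_speakers` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the speaker record shown above (elided parts omitted).
SOCIAL_DATA = """<xml>
  <document name="19100426">
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <workplace>Wormwood Scrubs Prison</workplace>
      <role>witness</role>
    </speaker>
  </document>
</xml>"""


def load_speakers(xml_text):
    """Map each speaker id to a dict of its attributes; empty tags become None."""
    root = ET.fromstring(xml_text)
    speakers = {}
    for speaker in root.iter("speaker"):
        record = {
            child.tag: (child.text or "").strip() or None for child in speaker
        }
        speakers[speaker.get("id")] = record
    return speakers


speakers = load_speakers(SOCIAL_DATA)
print(speakers["271"]["surname"], speakers["271"]["hiscocode"])
```

Keeping empty attributes as `None` makes it easy to filter a sample later, for example to speakers whose sex and occupation are both known.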
2. How linguistically accurate is OBC?
2.1 Comparison with alternative accounts, e.g. trial of John Ayliffe, 17591024-27, vs. alternative account The tryal at large of John Ayliffe
Proceedings (718 words) vs. Tryal (1290 words)

Proceedings: Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
Tryal: Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.

Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
Tryal: Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.

Proceedings: ./.
Tryal: John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.

Proceedings: Hargrave. Because he said he was not willing Mr Fox should know of it?
Tryal: Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.

Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemens desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
Tryal: Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.
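One way to put a number on how closely the two accounts agree is a word-level similarity score for each pair of passages. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; this measure is an assumption chosen for illustration and is not described in the slides.

```python
from difflib import SequenceMatcher


def word_similarity(a, b):
    """Similarity of two passages on whitespace-separated words (0.0 to 1.0)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


# Two of the paired passages from the Ayliffe trial (Proceedings, Tryal).
PAIRS = {
    "Thomas": (
        "Thomas. I am clerk to Mr Jones, a Stationer in the Temple.",
        "Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.",
    ),
    "Hargrave": (
        "Hargrave. Because he said he was not willing Mr Fox should know of it?",
        "Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not "
        "on any account have it come to Mr Fox's ears.",
    ),
}

similarity = {name: word_similarity(p, t) for name, (p, t) in PAIRS.items()}
for name, score in similarity.items():
    print(f"{name}: {score:.2f}")
```

The first pair differs only in a name and a comma and scores high; the second pair reports the same content in different words and scores low.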
2.2 Language event ↔ written representation
Letters: formulation → writing
Trial proceedings (e.g. Old Bailey Proceedings): speech event → perception by scribe → shorthand script → expanding shorthand → proof reading → type setting
Gurney (1752)
Brachygraphy: or short-writing
'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)
Scribes
Thomas Gurney
(1749-1770)
Joseph Gurney
(1770-1782)
Recording linguistic details
• no distinction between inflected and uninflected auxiliaries
= 'may' or 'mayst'
= 'can' or 'canst'
= 'should' or 'shouldst'
• dot placed on the top left of the noun phrase
= allomorphs a and an
• auxiliary contractions
'you will' (you w-il) vs. 'you'll' (you-l)
but │ 'it will' ~ 'twill' (│ = <t> and it)
2.3 Internal consistency: negative contraction
e.g. do not > don't, need not > needn't, was not > wasn't
N = 1,344,244
[Figure: NEG contraction in % (y-axis 0-18) by period: 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913]
Negative contraction in the OBC, 1732-1912
1. Lexeme?
AUX form      % contr.   N
do not        28.9       189,776
will not      27.7       17,302
shall not     20.6       4,172
cannot        13.3       106,005
are not       3.2        11,552
dare not      3.1        260
need not      0.6        2,136
did not       0.4        429,143
does not      0.4        9,539
have not      0.4        44,038
could not     0.2        85,361
is not        0.2        47,142
must not      0.2        1,620
would not     0.2        52,123
had not       0.1        72,395
has not       0.1        9,244
should not    0.1        20,192
was not       0.1        64,574
may not       0.0        1,271
might not     0.0        2,404
ought not     0.0        1,221
Negative contraction in the OBC, 1732-1912
2. Frequency?
Negative contraction in the OBC, 1732-1912
3. Tense?
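The tense question can be checked directly against the figures in the table by pairing each present-tense form with its past-tense counterpart. This is a small sketch using only rows from the table; the pairing and the helper name `tense_gaps` are choices made here for illustration.

```python
# (% contracted, N) for selected rows of the negative contraction table.
RATES = {
    "do not": (28.9, 189_776),
    "did not": (0.4, 429_143),
    "will not": (27.7, 17_302),
    "would not": (0.2, 52_123),
    "shall not": (20.6, 4_172),
    "should not": (0.1, 20_192),
    "cannot": (13.3, 106_005),
    "could not": (0.2, 85_361),
}

# Present-tense form paired with its past-tense counterpart.
TENSE_PAIRS = [
    ("do not", "did not"),
    ("will not", "would not"),
    ("shall not", "should not"),
    ("cannot", "could not"),
]


def tense_gaps(rates, pairs):
    """Return (present, past, difference in percentage points) for each pair."""
    return [(pres, past, rates[pres][0] - rates[past][0]) for pres, past in pairs]


gaps = tense_gaps(RATES, TENSE_PAIRS)
for pres, past, gap in gaps:
    print(f"{pres} vs {past}: {gap:.1f} percentage points")

# Frequency check: the most frequent form in this subset.
most_frequent = max(RATES, key=lambda form: RATES[form][1])
print(most_frequent, RATES[most_frequent][0])
```

In every pair the present-tense form is contracted far more often than the past-tense form, while the most frequent form in the subset, did not, is almost never contracted.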
Explaining the absence of
negative contraction
• combination of phonology and genre
• n't is phonetically reduced, less salient than not
• do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]
can-can't vs. could-couldn't
will-won't vs. would-wouldn't
shall-shan't vs. should-shouldn't
• negative contraction is (near) absent where the
context (e.g. change in the stem vowel in the
negative) does not allow disambiguation
Hierarchy of perceptive difference between positive and negative contracted forms
                 V change   C change/addition   Score
do-don('t)       1          1                   2
will-won('t)     1          1                   2
shall-shan('t)   0.5        1                   1.5
can-can('t)      0.5        0                   0.5
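The score column is the sum of the vowel-change and consonant-change values, and it can be set against the contraction rates reported for the same four auxiliaries (do not 28.9, will not 27.7, shall not 20.6, cannot 13.3). A short sketch of that arithmetic, with variable names chosen here for illustration:

```python
# (V change, C change/addition) per auxiliary, from the hierarchy table.
FEATURES = {
    "do": (1, 1),
    "will": (1, 1),
    "shall": (0.5, 1),
    "can": (0.5, 0),
}

# % contracted for do not, will not, shall not, cannot (from the OBC table).
CONTRACTION_RATE = {"do": 28.9, "will": 27.7, "shall": 20.6, "can": 13.3}

# Score = V change + C change/addition.
scores = {aux: v + c for aux, (v, c) in FEATURES.items()}

# Auxiliaries ordered from most to least frequently contracted.
ranked = sorted(scores, key=lambda aux: CONTRACTION_RATE[aux], reverse=True)
for aux in ranked:
    print(f"{aux}: score {scores[aux]}, {CONTRACTION_RATE[aux]}% contracted")
```

Read from most to least contracted, the scores never increase, which is the pattern the hierarchy is meant to capture.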
2.4 Sociolinguistic potential: relative clauses
• random extracts of speech events from OBC: 20,000 words/decade (10,000 w. each for m + f)
• 2500+ relative clauses, of which 1533 restrictive
         1720-1779   %      1780-1839   %      1840-1913   %      ∑      %
that     259         53.8   240         45.4   136         26.0   635    41.4
zero     107         22.2   118         22.3   201         38.4   426    27.8
which    70          14.6   97          18.3   92          17.6   259    16.9
who      38          7.9    69          13.0   89          17.0   196    12.8
whom     6           1.2    2           0.4    5           1.0    13     0.8
whose    1           0.2    3           0.6    0           0.0    4      0.3
∑        481                529                523                1533
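The percentage columns are column percentages: each count divided by the total number of restrictive relative clauses in that period. A minimal sketch that recomputes them from the raw counts (the function name `column_percentages` is hypothetical):

```python
PERIODS = ("1720-1779", "1780-1839", "1840-1913")

# Raw counts of restrictive relativizers per period, from the table.
COUNTS = {
    "that": (259, 240, 136),
    "zero": (107, 118, 201),
    "which": (70, 97, 92),
    "who": (38, 69, 89),
    "whom": (6, 2, 5),
    "whose": (1, 3, 0),
}


def column_percentages(counts):
    """Return per-period totals and each relativizer's share in percent."""
    totals = [sum(row[i] for row in counts.values()) for i in range(len(PERIODS))]
    shares = {
        rel: tuple(round(100 * n / total, 1) for n, total in zip(row, totals))
        for rel, row in counts.items()
    }
    return totals, shares


totals, shares = column_percentages(COUNTS)
print(dict(zip(PERIODS, totals)))
for rel, row in shares.items():
    print(rel, row)
```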
Diagram 1 Distribution of that with regard to animacy of the head
1720-1779 vs 1780-1839: p = 0.000
1720-1779 vs 1840-1913: p = 0.000
1780-1839 vs 1840-1913: p = 0.070
            1720-1779   1780-1839   1840-1913
non-human   121         164         105
human       137         76          31
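Pairwise comparisons between periods can be run as tests on 2×2 tables of counts. The slides do not say which test produced the reported p-values; the sketch below assumes Pearson's chi-square test without continuity correction, so its results may differ slightly from the figures on the slide.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# (non-human, human) counts of that per period, from Diagram 1.
THAT_BY_ANIMACY = {
    "1720-1779": (121, 137),
    "1780-1839": (164, 76),
    "1840-1913": (105, 31),
}


def compare(period_1, period_2):
    """Chi-square test for the animacy split of that in two periods."""
    return chi_square_2x2(*THAT_BY_ANIMACY[period_1], *THAT_BY_ANIMACY[period_2])


for first, second in [
    ("1720-1779", "1780-1839"),
    ("1720-1779", "1840-1913"),
    ("1780-1839", "1840-1913"),
]:
    chi2, p = compare(first, second)
    print(f"{first} vs {second}: chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under this assumption the first two comparisons are highly significant and the third is not significant at the 5% level, in line with the pattern of p-values given for Diagram 1.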
Diagram 2 Distribution of that and pronominal relativizers with human heads
1720-1779 vs 1780-1839: p = 0.000
1720-1779 vs 1840-1913: p = 0.000
1780-1839 vs 1840-1913: p = 0.000
       1720-1779   1780-1839   1840-1913
PRN    49          72          93
that   137         76          31
Diagram 3 Relativizers by gender (excl. genitives)
f 1720-1779 vs 1780-1839: p = 0.135    m 1720-1779 vs 1780-1839: p = 0.033
f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.000
f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.000
       1720-1779      1780-1839      1840-1913
       f      m       f      m       f      m
PRN    43     71      56     112     66     119
zero   53     54      66     52      110    73
that   124    134     108    132     72     64
p      0.135          0.001          0.000
Diagram 4 Zero relativizer by gender (excl. genitives)
f 1720-1779 vs 1780-1839: p = 0.268    m 1720-1779 vs 1780-1839: p = 0.326
f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.022
f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.001
        1720-1779      1780-1839      1840-1913
        f      m       f      m       f      m
other   167    205     164    244     138    173
zero    53     54      66     52      110    73
Thank you
References
• Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].
• Nevalainen, Terttu & Helena Raumolin-Brunberg (eds). 1996. Sociolinguistics and language history: studies based on the Corpus of Early English Correspondence. Amsterdam: Rodopi.
• Trudgill, Peter. 1974. The social differentiation of English in Norwich. Cambridge: Cambridge University Press.
• van Leeuwen, Marco H.D., Ineke Maas & Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.