
New Automatic Compositor Attribution in the First Folio of … · 2017. 9. 16. · Automatic...

Date post: 21-Oct-2020
Transcript
  • Automatic Compositor Attribution in the First Folio of Shakespeare

    Maria Ryskina*, Hannah Alpert-Abrams†, Dan Garrette‡ and Taylor Berg-Kirkpatrick*

    *{mryskina, tberg}@cs.cmu.edu †[email protected] ‡[email protected]

    Summary

    We present an unsupervised approach to compositor attribution: clustering the pages of a historical printed document according to the individual who set the type. We use the First Folio of Shakespeare (1623) as a test case since it has been extensively studied by bibliographers. Following the manual work of these traditional Shakespeare scholars, we perform automatic analysis by modeling the orthographic preferences and spacing tendencies of compositors, reaching up to 87% agreement with the authoritative attribution.

    [Figure: examples of spelling variation, spacing variation and a medial comma]

    Model

    Compositors are modeled as latent variables, one per page, and their orthographic and spacing choices are modeled by multinomial distributions.

    Basic model variant: Simple multinomial baseline, only includes word-level spelling choices.
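As a rough illustration of the basic variant, the sketch below clusters pages with a mixture of multinomials fitted by EM: one latent compositor per page, and one multinomial over printed variants per modern word and compositor. The poster does not state the inference procedure, so the use of EM, the function name `em_cluster`, the smoothing constant `alpha` and the `(modern_word, printed_variant)` input format are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def em_cluster(pages, n_comps, n_iters=30, alpha=0.1, seed=0):
    """Cluster pages with a mixture of per-word multinomials over spellings.

    pages: list of pages, each a list of (modern_word, printed_variant) pairs.
    Returns one hard cluster label per page. Hypothetical helper, not the
    authors' implementation.
    """
    rng = random.Random(seed)
    # Observed variants of each modern word, used for additive smoothing.
    variants = defaultdict(set)
    for page in pages:
        for modern, variant in page:
            variants[modern].add(variant)

    # Random soft initialisation of P(compositor | page).
    resp = []
    for _ in pages:
        row = [rng.random() + 1e-3 for _ in range(n_comps)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iters):
        # M-step: expected counts give the prior and the spelling multinomials.
        prior = [sum(r[c] for r in resp) / len(pages) for c in range(n_comps)]
        pair_count = [defaultdict(float) for _ in range(n_comps)]
        word_count = [defaultdict(float) for _ in range(n_comps)]
        for page, r in zip(pages, resp):
            for modern, variant in page:
                for c in range(n_comps):
                    pair_count[c][(modern, variant)] += r[c]
                    word_count[c][modern] += r[c]

        def log_prob(c, modern, variant):
            num = pair_count[c][(modern, variant)] + alpha
            den = word_count[c][modern] + alpha * len(variants[modern])
            return math.log(num / den)

        # E-step: posterior over compositors for each page (log-space softmax).
        resp = []
        for page in pages:
            logs = [
                math.log(prior[c] + 1e-12)
                + sum(log_prob(c, m, v) for m, v in page)
                for c in range(n_comps)
            ]
            top = max(logs)
            weights = [math.exp(x - top) for x in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])

    # Hard assignment: most probable compositor for each page.
    return [max(range(n_comps), key=lambda c: r[c]) for r in resp]
```

Calling `em_cluster(pages, n_comps=2)` on toy pages where half consistently print "deare" and "doe" and the other half print "deere" and "do" should place the two groups in different clusters. The cluster indices themselves are arbitrary.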


    Feature model variant: Includes individual edit operations as well as word spelling choices, incorporated using a log-linear parameterization. Whitespace lengths are modeled with separate multinomials for each compositor.
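To make the log-linear parameterization concrete, here is a minimal sketch in which each candidate spelling is scored by the sum of a word-variant weight and the weights of the character edits that turn the modern spelling into it; a softmax then converts scores to probabilities. The function names, the feature keys `("word", ...)` and `("edit", ...)`, and the use of `difflib` to align the two spellings are assumptions made for this example. The poster does not say how edit operations are extracted, and the weights below are set by hand rather than learned.

```python
import math
from difflib import SequenceMatcher


def edit_ops(modern, variant):
    """Character edits turning `modern` into `variant`, as (source, target) pairs.

    Insertions are written ("INS", char) and deletions (char, "DEL").
    """
    ops = []
    matcher = SequenceMatcher(None, modern, variant, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        src, tgt = modern[i1:i2], variant[j1:j2]
        # Pair characters position by position; leftovers become INS or DEL.
        for k in range(max(len(src), len(tgt))):
            a = src[k] if k < len(src) else "INS"
            b = tgt[k] if k < len(tgt) else "DEL"
            ops.append((a, b))
    return ops


def variant_probs(modern, candidates, weights):
    """Log-linear distribution over candidate spellings of one modern word.

    weights: dict of feature weights for a single compositor (hypothetical format).
    """
    scores = {}
    for variant in candidates:
        score = weights.get(("word", modern, variant), 0.0)
        for op in edit_ops(modern, variant):
            score += weights.get(("edit", op), 0.0)
        scores[variant] = score
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    exp_scores = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exp_scores.values())
    return {v: e / total for v, e in exp_scores.items()}


# A compositor who likes adding a final "e" (hand-set weight, for illustration).
weights_c = {("edit", ("INS", "e")): 1.0}
probs = variant_probs("dear", ["dear", "deare", "deer", "deere"], weights_c)
```

Because edit features are shared across words, a single weight such as the one on `("INS", "e")` raises the probability of both "deare" and "deere" here, and would do the same for any other word with an added "e". Word-level spelling counts alone cannot generalize in this way.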

    Analysis

    [Figure: edit operations u → o, u → w and u → DEL, shown for Compositors A, B, C, D and E]

    Learned behaviors

    Inspecting the parameters learned by our model reveals habits of individual compositors that have been noticed by bibliographers (Taylor, 1981).

    Number of compositors

    By extending our model with a non-parametric prior we are able to additionally learn the number of compositors. Depending on the subset of the vocabulary considered (e.g. words considered by Hinman vs. the larger set considered by Blayney) our non-parametric model agrees with the corresponding scholars’ judgement.
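The poster does not name the non-parametric prior. One common choice for letting the number of clusters grow with the data is the Chinese restaurant process (CRP), and the sketch below uses it purely as an illustrative assumption. The function names and the concentration parameter `alpha` are hypothetical.

```python
import random


def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probability that the next page joins each existing cluster,
    followed by the probability that it opens a new cluster."""
    total = sum(cluster_sizes) + alpha
    return [size / total for size in cluster_sizes] + [alpha / total]


def sample_crp(n_pages, alpha, seed=0):
    """Draw one partition of n_pages pages from the CRP prior."""
    rng = random.Random(seed)
    sizes, labels = [], []
    for _ in range(n_pages):
        probs = crp_assignment_probs(sizes, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(sizes):
            sizes.append(1)   # the page starts a new cluster
        else:
            sizes[k] += 1     # the page joins existing cluster k
        labels.append(k)
    return labels
```

Under this prior, a page joins an existing cluster with probability proportional to the cluster's size and starts a new one with probability proportional to `alpha`, so the number of compositors is not fixed in advance. In a full model these prior terms would be combined with the spelling and spacing likelihoods during inference.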

    Experiments

    [Figure: one-to-one accuracy (%, axis from 0 to 100) of each model variant on two transcriptions]

    Model variant              Manual diplomatic transcription   Ocular OCR transcription
                               (Bodleian library)                (Berg-Kirkpatrick et al., 2013)
    Random                     16.7                              16.7
    Basic                      58.8                              53.7
    Feature: Edit              77.1                              76.1
    Feature: Edit + Spacing    87.3                              85.9

    We evaluate by mapping the recovered page clusters to the gold compositors in the authoritative attribution and measuring one-to-one accuracy. Including orthographic features substantially improves accuracy, but the best result comes from combining edit and whitespace features.
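One-to-one accuracy can be sketched as follows: each predicted cluster is matched to at most one gold compositor, and each gold compositor to at most one cluster, so that the number of pages on which they agree is as large as possible. The function name and the brute-force search over permutations are choices made for this example. Brute force is practical only for a handful of clusters; an assignment solver such as the Hungarian algorithm would be the usual choice for more.

```python
from collections import Counter
from itertools import permutations


def one_to_one_accuracy(pred, gold):
    """Best agreement between predicted clusters and gold labels under a
    one-to-one mapping, as a fraction of all pages."""
    pred_ids = sorted(set(pred))
    gold_ids = sorted(set(gold))
    overlap = Counter(zip(pred, gold))  # pages shared by each (cluster, label) pair

    # Pad the shorter side with None so that both have the same length;
    # a cluster or label matched to None contributes no pages.
    size = max(len(pred_ids), len(gold_ids))
    pred_ids += [None] * (size - len(pred_ids))
    gold_ids += [None] * (size - len(gold_ids))

    best = 0
    for perm in permutations(gold_ids):
        matched = sum(overlap[(p, g)] for p, g in zip(pred_ids, perm))
        best = max(best, matched)
    return best / len(gold)


pred = [0, 0, 1, 1, 2, 2]
gold = ["B", "B", "A", "A", "A", "C"]
accuracy = one_to_one_accuracy(pred, gold)  # 5 of 6 pages agree under the best mapping
```

In the example, the best mapping sends cluster 0 to B, cluster 1 to A and cluster 2 to C. The fifth page (cluster 2, gold A) is the only disagreement.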

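    [Figure: graphical model. Variables: compositor c_i, spacing distance s_ik, modern spelling m_ij and diplomatic spelling d_ij (example: modern "dear", diplomatic "deere"), with plates K_i, J_i, I and C. Whitespace pref params: θ_c. Orthographic pref params: w_c, consisting of word variant weights (dear: deare, deer, deere) and edit operation weights (a → e, INS → e, a → r, a → DEL).]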

    What is a compositor?

    A compositor is a person who manually arranges and sets type for printing a document. Shakespeare’s First Folio is believed to have been set by multiple compositors, each with varying degrees of proficiency at accurately transcribing the original (mostly lost) manuscripts. Bibliographers attribute pages to compositors based on their spelling choices or visual evidence such as whitespace lengths before and after punctuation.


    [Figure: number of compositors (y-axis, 2 to 10) against iterations (x-axis, 0 to 900), with curves labeled 5 (Hinman) and 8 (Blayney)]

