
Data Analysis in the Hebrew Bible

Date post: 16-Nov-2014
Upload: dirk-roorda
Description:
Joint work with Martijn Naaijer (VU University). With the Hebrew Bible encoded in the Linguistic Annotation Framework (LAF, an ISO standard) and a new LAF processing tool, we demonstrate how to do practical data analysis. The tool, LAF-Fabric, integrates with the IPython notebook approach. Our example here is lexeme cooccurrence analysis of Bible books. For now, the road from data to visualization is more important than the exact visualization.
DATA ANALYSIS IN THE HEBREW BIBLE CLIN 2014-01-17 Dirk Roorda (DANS/TLA), Martijn Naaijer and Gino Kalkman (VU ETCBC)
Transcript
Page 1: Data Analysis in the Hebrew Bible

DATA ANALYSIS IN THE HEBREW BIBLE

CLIN 2014-01-17 Dirk Roorda (DANS/TLA), Martijn Naaijer and Gino Kalkman (VU ETCBC)

Page 2: Data Analysis in the Hebrew Bible

RESEARCH @

just started

Page 3: Data Analysis in the Hebrew Bible

EXEGESIS

preaching the word of God

the devil is in the details

meanings of specific words

Page 4: Data Analysis in the Hebrew Bible

DISTANT READING

scan large quantities of text

find patterns

signals in the noise

study other aspects than meaning

text transmission

linguistic variation

literary form

Page 5: Data Analysis in the Hebrew Bible

VARIATION IN BIBLICAL HEBREW

Timespan of Hebrew Bible writing: ~1000 years

Assumption: we can divide the books into 2 groups

EBH (early biblical Hebrew)

LBH (late biblical Hebrew)

Page 6: Data Analysis in the Hebrew Bible

"PROOF"

Select some features that differ for EBH and LBH

Risk of circularity

We need data analysis that is

comprehensive (not eclectic)

critical (not everything is a signal)

Page 7: Data Analysis in the Hebrew Bible

SYNTACTIC VARIATION

syntactic features

phrase, clause, text

large units

chapters

books

drivers of change

diachrony

geography

demography

variation

Page 8: Data Analysis in the Hebrew Bible

THE HEBREW BIBLE AS DATA

Page 9: Data Analysis in the Hebrew Bible

THE HEBREW BIBLE IN LAF

LAF ISO 24612:2012

SHEBANQ (github)

2.27 GB

1.5 M nodes

1.5 M edges

40 M features

400 K words

13 M XML ids

Page 10: Data Analysis in the Hebrew Bible

PROCESSING LAF

it is XML

but not document-like (not as TEI)

and not database-like (not nice for XQuery)

it is graph-like

Page 11: Data Analysis in the Hebrew Bible

PROCESSING LAF

tried eXist (>30 min loading time, simple queries >60 min)

indexes needed, but which ones?

tried POIO (>60 min loading time, needs >20 GB RAM)

straightforward object oriented in Python

scripting language overhead

Page 12: Data Analysis in the Hebrew Bible

LAF-FABRIC

LAF-Fabric

loads in a few seconds

executes in a few seconds

on a laptop

can run

in a Terminal

as an IPython notebook

also Python

uses C-like arrays
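
The point about C-like arrays can be illustrated with Python's standard `array` module, which keeps numbers in one contiguous typed buffer instead of as separate Python objects. This is only a sketch of the general idea: the slides do not show LAF-Fabric's internal layout, and the names `edge_from`, `edge_to` and `targets_of` are hypothetical.

```python
from array import array
import sys

# Hypothetical toy graph: edge i runs from edge_from[i] to edge_to[i].
# Type code "I" is an unsigned C int (typically 4 bytes per item).
edge_from = array("I", [0, 0, 1, 2])
edge_to = array("I", [1, 2, 2, 3])

def targets_of(node):
    """Return the targets of all edges that leave `node`."""
    return [t for f, t in zip(edge_from, edge_to) if f == node]

# Compare the memory needed for the same node ids as a list and as an array.
n = 100_000
as_list = list(range(n))
as_array = array("I", range(n))

# A list stores pointers to int objects, so count both the list and the ints.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
array_bytes = sys.getsizeof(as_array)

print(targets_of(0))
print("list:", list_bytes, "bytes; array:", array_bytes, "bytes")
```

With millions of nodes and edges, as in the figures quoted for the Hebrew Bible data, this kind of compact storage is one plausible reason a pure Python tool can load quickly on a laptop.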

Page 14: Data Analysis in the Hebrew Bible

COOCCURRENCES

1 Common Nouns

2 Proper Nouns

Nodes are books

Edges are cooccurrences of lexemes (1 or 2)
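
The graph described above can be sketched in a few lines of Python. The data here is invented for illustration, and the helper names (`lexemes_per_book`, `cooccurrence_edges`) are hypothetical; in the real analysis the (book, lexeme) pairs would come from the LAF data for one lexeme class (common nouns or proper nouns).

```python
from itertools import combinations

# Hypothetical toy data: one (book, lexeme) pair per occurrence.
occurrences = [
    ("Genesis", "land"), ("Genesis", "house"), ("Genesis", "king"),
    ("Exodus", "land"), ("Exodus", "house"),
    ("Esther", "king"), ("Esther", "feast"),
]

def lexemes_per_book(pairs):
    """Collect the set of distinct lexemes found in each book."""
    books = {}
    for book, lex in pairs:
        books.setdefault(book, set()).add(lex)
    return books

def cooccurrence_edges(books):
    """Link every pair of books that share at least one lexeme."""
    edges = {}
    for b1, b2 in combinations(sorted(books), 2):
        shared = books[b1] & books[b2]
        if shared:
            edges[(b1, b2)] = shared
    return edges

books = lexemes_per_book(occurrences)
edges = cooccurrence_edges(books)
for (b1, b2), shared in edges.items():
    print(b1, "--", b2, sorted(shared))
```

Books are the nodes and each entry in `edges` is one edge, labelled with the lexemes the two books have in common.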

Page 15: Data Analysis in the Hebrew Bible

WEIGHTED EDGES

S(lex): number of books containing lex

C(b1, b2): intersection of lexemes of b1 and b2

L(b1, b2): union of lexemes of b1 and b2
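
The three quantities above translate directly into set operations. The sketch below computes them on invented data. The slide that combines them into an edge weight is not part of this transcript, so both ratios shown here are assumptions made for illustration, not the authors' formula: the unweighted ratio is plain set overlap, and the weighted ratio lets each lexeme count as 1/S(lex), so that lexemes found in few books count more.

```python
from collections import Counter

# Hypothetical toy data: the set of lexemes found in each book.
books = {
    "A": {"land", "house", "king"},
    "B": {"land", "house"},
    "C": {"land", "king", "feast"},
}

def support(books):
    """S(lex): number of books containing lex."""
    return Counter(lex for lexemes in books.values() for lex in lexemes)

def common(books, b1, b2):
    """C(b1, b2): intersection of the lexemes of b1 and b2."""
    return books[b1] & books[b2]

def union(books, b1, b2):
    """L(b1, b2): union of the lexemes of b1 and b2."""
    return books[b1] | books[b2]

def unweighted(books, b1, b2):
    """ASSUMPTION: size of C divided by size of L (every lexeme counts as 1)."""
    total = union(books, b1, b2)
    return len(common(books, b1, b2)) / len(total) if total else 0.0

def weighted(books, S, b1, b2):
    """ASSUMPTION: same ratio, but every lexeme counts as 1 / S(lex)."""
    num = sum(1 / S[lex] for lex in common(books, b1, b2))
    den = sum(1 / S[lex] for lex in union(books, b1, b2))
    return num / den if den else 0.0

S = support(books)
print(S["land"], sorted(common(books, "A", "B")), sorted(union(books, "A", "B")))
print(unweighted(books, "A", "B"), weighted(books, S, "A", "B"))
```

In this toy example the lexeme "land" occurs in all three books, so it contributes less to the weighted ratio than "house" or "king", which occur in only two.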

Page 23: Data Analysis in the Hebrew Bible

Common Nouns

no weight

Page 24: Data Analysis in the Hebrew Bible

Common Nouns

with weight

Page 25: Data Analysis in the Hebrew Bible

Proper Nouns

no weight

Page 26: Data Analysis in the Hebrew Bible

Proper Nouns

with weight

