Transforming Indexes Locorum into Citation Networks

Date post: 31-Jul-2015
Upload: matteo-romanello
Matteo Romanello (DAI/KCL), @mr56k
HNR Workshop 2015, "To Mine and to Tie: Text Mining and Network Analysis for Historians", Bochum, April 10-12, 2015
Transcript

Background

From Index Locorum to Citation Network

References in Classics

▶ canonical texts
▶ fragmentary texts
▶ inscriptions
▶ papyri
▶ manuscripts
▶ coins

Commentaries and Parallel Passages

To Mine

Workflow

The Knowledge Base

▶ underlying model: HuCit, CIDOC-CRM & FRBRoo
▶ usages:
  ▶ extract abbreviations
  ▶ resolve implicit references, e.g. “Herod. 4, 5-7”
  ▶ validate citations, e.g. “Thuc. 1.100.9.4”
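
The validation usage can be illustrated with a minimal sketch. It assumes that the slide's example “Thuc. 1.100.9.4” is meant as an invalid citation because it has more levels than the work's citation scheme allows. The dictionary, its field names and the `levels` values are hypothetical stand-ins for the knowledge base, not HuCit's actual data model; passage ranges such as “5-7” are not handled.

```python
import re

# Toy knowledge base (hypothetical): abbreviation -> CTS URN of the work and the
# number of levels in its citation scheme (book.chapter.section = 3 levels).
KNOWLEDGE_BASE = {
    "Thuc.": {"urn": "urn:cts:greekLit:tlg0003.tlg001", "levels": 3},
    "Hom. Il.": {"urn": "urn:cts:greekLit:tlg0012.tlg001", "levels": 2},
}


def split_citation(citation):
    """Split 'Thuc. 1.100.3' into the abbreviation and a list of numeric levels."""
    match = re.match(
        r"^(?P<abbr>.*?)\s+(?P<scope>\d+(?:[.,]\s*\d+)*)$", citation.strip()
    )
    if match is None:
        raise ValueError(f"cannot parse citation: {citation!r}")
    levels = [int(part) for part in re.split(r"[.,]\s*", match.group("scope"))]
    return match.group("abbr"), levels


def is_valid(citation):
    """Valid means: the work is known and the scope is not deeper than its scheme."""
    abbr, levels = split_citation(citation)
    entry = KNOWLEDGE_BASE.get(abbr)
    return entry is not None and len(levels) <= entry["levels"]
```

Under these assumptions `is_valid("Thuc. 1.100.3")` holds, while `is_valid("Thuc. 1.100.9.4")` does not, because four levels exceed the three-level scheme recorded for the work.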

The L’Année philologique (APh)

The Data

▶ APh
  ▶ analytical reviews (en, de, fr, es, it)
  ▶ 80 volumes (1924-)
  ▶ automatically processed: vol. 75 (2004)
    ▶ 6,694 abstracts (total = 6,946, errors = 252)
    ▶ 350k tokens
    ▶ 3k citations
  ▶ manually annotated: ~8% of vol. 75
    ▶ 366 abstracts
    ▶ 26k tokens
    ▶ 380 citations
▶ JSTOR
  ▶ full text of 171k journal articles in Classics (327m tokens)

APh Example (75-06697)

APh 75-06697 => S. Braund & G. Gilbert. 2004. “An ABC of epic ira: anger, beasts, and cannibalism” Yale Classical Studies 32:250-285

In Statius' « Achilleid » (2, 96-102) Achilles describes his diet of wild animals in infancy, which rendered him fearless and may indicate another aspect of his character - a tendency toward aggression and anger. The portrayal of angry warriors in Roman epic is effected for the most part not by direct descriptions but indirectly, by similes of wild beasts (e.g. Vergil, Aen. 12, 101-109 ; Lucan 1, 204-212 ; Statius, Th. 12, 736-740 ; Silius 5, 306-315). These similes may be compared to two passages from Statius (Th. 1, 395-433 and 8, 383-394) that portray the onset of anger in direct narrative. Analysis of these passages demonstrates that the concept of « ira » in epic takes its moral aspect from the context.

The NLP Pipeline

Citation Extraction as an NLP Problem

Annotation Scheme for NEs and Relations

Canonical Text Services (CTS) URNs

▶ Pliny
  ▶ urn:cts:latinLit:phi0978
▶ Pliny’s NH
  ▶ urn:cts:latinLit:phi0978.phi001
▶ Pliny, Nat. 11,4,11
  ▶ urn:cts:latinLit:phi0978.phi001:11.4.11
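
The three URNs above share one structure, which a short parsing sketch makes explicit. The class and function names are illustrative, and the sketch is simplified: it covers only the author, work and passage shapes shown on this slide and ignores version identifiers that full CTS URNs may carry.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str           # e.g. "latinLit"
    textgroup: str           # e.g. "phi0978" (the author)
    work: Optional[str]      # e.g. "phi001", or None for an author-level URN
    passage: Optional[str]   # e.g. "11.4.11", or None


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN of the shapes listed above into its components."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_part = parts[2], parts[3]
    # "phi0978.phi001" -> textgroup "phi0978", work "phi001"
    textgroup, _, work = work_part.partition(".")
    passage = parts[4] if len(parts) == 5 else None
    return CtsUrn(namespace, textgroup, work or None, passage)
```

For the passage-level example, `parse_cts_urn("urn:cts:latinLit:phi0978.phi001:11.4.11")` yields the namespace `latinLit`, the textgroup `phi0978`, the work `phi001` and the passage `11.4.11`.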

Citation Extraction Step 1: Named Entity Recognition

▶ Method: machine learning-based (Conditional Random Fields)
▶ Current performance (F1-score): 73.88%
▶ Challenges:
  ▶ confusion with similar references
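
The slides do not list the features used by the CRF model. As a rough illustration only, the sketch below shows token-level feature extraction of the kind a CRF tagger typically consumes; every feature name is an assumption, and the model training itself is not shown. The tokens are taken from the APh example abstract.

```python
def token_features(tokens, i):
    """Return a feature dict for tokens[i] (hypothetical feature set)."""
    token = tokens[i]
    features = {
        "lower": token.lower(),
        "is_capitalised": token[:1].isupper(),
        "ends_with_period": token.endswith("."),
        "has_digit": any(ch.isdigit() for ch in token),
        # True for ranges such as "101-109"
        "is_number_range": "-" in token and token.replace("-", "").isdigit(),
    }
    # Context features: the neighbouring tokens, with markers at the boundaries.
    features["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<START>"
    features["next_lower"] = (
        tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>"
    )
    return features


tokens = ["Vergil", ",", "Aen.", "12", ",", "101-109"]
feature_sequence = [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence model sees one such dict per token, so a capitalised abbreviation followed by a number (“Aen.” then “12”) gets a different representation from an ordinary abbreviation.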

Citation Extraction Step 2: Relation Detection

▶ Method: rule-based
▶ Current performance (F1-score): 92.60%
▶ Challenges:
  ▶ problems with more discursive references
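
The actual rules are not given in the slides. One plausible rule, sketched here as an assumption, attaches each passage scope to the closest preceding author or work mention. The entity type names are hypothetical; the example entities come from the APh abstract above.

```python
def link_scopes(entities):
    """Attach every scope to the closest preceding author or work mention.

    `entities` is a list of (entity_type, text) pairs in reading order.
    """
    relations = []
    current_head = None
    for entity_type, text in entities:
        if entity_type in ("author", "work"):
            current_head = text
        elif entity_type == "scope" and current_head is not None:
            relations.append((current_head, text))
    return relations


# "Statius (Th. 1, 395-433 and 8, 383-394)"
entities = [
    ("author", "Statius"),
    ("work", "Th."),
    ("scope", "1, 395-433"),
    ("scope", "8, 383-394"),
]
relations = link_scopes(entities)
# relations == [("Th.", "1, 395-433"), ("Th.", "8, 383-394")]
```

A rule this simple works for compact references but not for discursive ones, where the author, the work and the passage may be spread over a sentence in any order.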

Citation Extraction Step 3: Disambiguation

▶ Method: rule-based (context-free grammar + fuzzy matching)
▶ Current performance (F1-score): 73.05%
▶ Challenges:
  ▶ ambiguous references
  ▶ inconsistent notation (e.g. punctuation)
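
The sketch below illustrates the fuzzy-matching half of this step together with punctuation normalisation; the grammar is not shown. It uses Python's standard `difflib` similarity as a stand-in, since the slides do not say which matching method is used. The lookup table and function names are hypothetical.

```python
import difflib
import re

# Hypothetical lookup table: known abbreviation -> CTS URN of the work.
ABBREVIATIONS = {
    "Verg. Aen.": "urn:cts:latinLit:phi0690.phi003",
    "Plin. Nat.": "urn:cts:latinLit:phi0978.phi001",
}


def normalise_scope(scope):
    """Map '12, 101-109' or '12.101-109' to the single form '12.101-109'."""
    return re.sub(r"\s*[.,]\s*", ".", scope.strip())


def disambiguate(abbreviation, scope, cutoff=0.6):
    """Return a passage-level URN, or None if no known abbreviation is close."""
    candidates = difflib.get_close_matches(
        abbreviation, list(ABBREVIATIONS), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    return f"{ABBREVIATIONS[candidates[0]]}:{normalise_scope(scope)}"
```

With this table, `disambiguate("Vergil, Aen.", "12, 101-109")` resolves to `urn:cts:latinLit:phi0690.phi003:12.101-109`, even though neither the abbreviation nor the punctuation matches the stored form exactly.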

To Tie

From Texts to Networks

▶ why?
  ▶ a network of citations is already in the texts
  ▶ networks are all about relations
  ▶ intertextuality = relations between texts
▶ what for?
  ▶ search/information retrieval
  ▶ analysis

Macro-, meso-, micro-level networks

Citation Network: Macro-level

▶ 2-mode (document, ancient author)
▶ directed network
▶ edges:
  ▶ mention of author, e.g. “as Homer says”
  ▶ mention of work, e.g. “in the Iliad”
  ▶ citation of passage, e.g. “such as Hom. Il. 1.1”
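
A minimal sketch of such a network, using only the standard library. The triple format and the function name are assumptions made for illustration; the sample references are the passages cited in APh 75-06697 above.

```python
from collections import Counter, defaultdict

# Hypothetical input: (document, ancient author, kind of reference) triples.
references = [
    ("APh 75-06697", "Statius", "citation of passage"),  # Achilleid 2, 96-102
    ("APh 75-06697", "Vergil", "citation of passage"),   # Aen. 12, 101-109
    ("APh 75-06697", "Lucan", "citation of passage"),    # 1, 204-212
    ("APh 75-06697", "Statius", "citation of passage"),  # Th. 12, 736-740
    ("APh 75-06697", "Silius", "citation of passage"),   # 5, 306-315
]


def build_macro_network(references):
    """Build a 2-mode directed network: edges run from documents to authors."""
    documents, authors = set(), set()
    edges = defaultdict(Counter)  # (document, author) -> counts per kind
    for document, author, kind in references:
        documents.add(document)
        authors.add(author)
        edges[(document, author)][kind] += 1
    return documents, authors, dict(edges)


documents, authors, edges = build_macro_network(references)
```

Keeping the two node sets separate preserves the 2-mode structure, and counting per kind of reference keeps the three edge types distinguishable.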

APh: Macro-level

APh gold set: 366 documents, 26k tokens; 528 nodes and 400 edges.

Citation Network: Meso-level

▶ 2-mode (document, ancient work)
▶ directed network
▶ edges:
  ▶ mention of work
  ▶ citation of passage

APh: Meso-level

APh gold set: 366 documents, 26k tokens; x nodes and y edges.

Citation Network: Micro-level

▶ 2-mode (document, text passage)
▶ directed network
▶ edges:
  ▶ citation of passage
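
If passages are identified by CTS URNs (an assumption; the slides do not say how nodes are keyed), the node for each of the three levels can be derived from a single passage citation by truncating its URN. The function name is illustrative, and the sketch expects a URN with exactly the author.work:passage shape.

```python
def network_nodes(passage_urn):
    """Return the micro-, meso- and macro-level node for one cited passage."""
    # "urn:cts:latinLit:phi0978.phi001:11.4.11" -> work URN + passage
    work_urn, _, _passage = passage_urn.rpartition(":")
    # "urn:cts:latinLit:phi0978.phi001" -> author URN
    author_urn = work_urn.rsplit(".", 1)[0]
    return {"micro": passage_urn, "meso": work_urn, "macro": author_urn}


nodes = network_nodes("urn:cts:latinLit:phi0978.phi001:11.4.11")
# nodes["micro"] == "urn:cts:latinLit:phi0978.phi001:11.4.11"
# nodes["meso"]  == "urn:cts:latinLit:phi0978.phi001"
# nodes["macro"] == "urn:cts:latinLit:phi0978"
```

This reflects why a passage citation contributes an edge at all three levels, whereas a bare mention of an author appears only in the macro-level network.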

APh: Micro-level

APh gold set: 366 documents, 26k tokens; 858 nodes and 498 edges.

Interactive Network Visualisation

▶ macro: http://phd.mr56k.info/data/viz/macro
▶ meso: http://phd.mr56k.info/data/viz/meso
▶ micro: http://phd.mr56k.info/data/viz/micro

Conclusion

Challenges

▶ keep purposes separate (search, analysis, visualisation)
▶ semantics of edges

Future Perspective

▶ evolution over time (longitudinal network analysis)
▶ projection to undirected, 1-mode networks
▶ community detection on macro/meso/micro level
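
The projection mentioned above can be sketched as follows. This is one common way to project a 2-mode network, not necessarily the one the author intends: two cited targets become linked when at least one document refers to both, and the edge weight counts such documents. The document identifiers `d1` and `d2` are made up.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_targets(edges):
    """Project document -> target edges onto an undirected 1-mode network."""
    targets_by_document = defaultdict(set)
    for document, target in edges:
        targets_by_document[document].add(target)

    weights = defaultdict(int)
    for targets in targets_by_document.values():
        # Sorting gives each undirected pair one canonical key.
        for a, b in combinations(sorted(targets), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = [
    ("d1", "Statius"), ("d1", "Vergil"),
    ("d2", "Vergil"), ("d2", "Statius"), ("d2", "Lucan"),
]
projected = project_onto_targets(edges)
# projected == {("Statius", "Vergil"): 2,
#               ("Lucan", "Statius"): 1,
#               ("Lucan", "Vergil"): 1}
```

The resulting weighted, undirected network is the kind of input that community detection methods usually expect.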