Toine Pieters and Jaap Verheul, Utrecht University, The Netherlands

Texcavator: Text Mining Historical Newspapers

Overview

• Translantis research project

  • Concept of reference cultures

  • Digital humanities

• Texcavator tool

  • Requirements

  • Features

  • Configuration

• Texcavator use cases

• Future ambitions

  • Challenges

  • Cultural Text Mining

KB Big Data Conference 24 March 2015

TRANSLANTIS.NL

Translantis research project

Translantis

Topic: the emergence of the United States in public discourse in the Netherlands, 1890-1990

Concept: transnational reference cultures

Method: digital humanities text mining

Translantis.nl

Culture Mining

Culture

• Ideas

• Knowledge

• Practices

Public Sphere

• Public Opinion

• Citizens engaging in enlightened debate

Public Media

• Periodicals

• Radio

• TV

• Internet

Digitized Newspapers

• Sample of 10% of all printed newspapers

Mediation

Texcavator

Texcavator

• Generic tool for cultural text mining and big data research

• Enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way

• Supports exploration and contextualization

• Serves multiple user groups

  • Wide community of historians using big data

  • Translantis team (NWO-funded)

  • Asymmetrical Encounters team (HERA-funded)

Features

• Direct access to big data repository

• Integrated text-mining tools

  • Boolean search

  • Named Entity Recognition

  • Sentiment mining

  • Stemming

• Real-time visualization of search results

  • Dynamic word clouds (and export of underlying data)

  • Timelines (normalized, bursts)

• Input-output storage

• Close and distant reading
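
The slides do not say how Texcavator computes its normalized timelines or detects bursts. The Python sketch below shows one common way to do both, purely as an illustration: hits per year are divided by the total number of documents in that year, and a year is flagged as a burst when its value lies well above the mean. The function names, the threshold rule, and the counts are all assumptions made for this example, not Texcavator's implementation.

```python
from statistics import mean, pstdev


def normalize_timeline(hits_per_year, docs_per_year):
    """Return hits as a fraction of all documents published in each year."""
    return {
        year: hits / docs_per_year[year]
        for year, hits in hits_per_year.items()
        if docs_per_year.get(year, 0) > 0  # skip years without any documents
    }


def find_bursts(timeline, threshold=2.0):
    """Return years whose value exceeds mean + threshold * standard deviation."""
    values = list(timeline.values())
    if len(values) < 2:
        return []
    cutoff = mean(values) + threshold * pstdev(values)
    return sorted(year for year, value in timeline.items() if value > cutoff)


# Invented counts, for illustration only (not taken from the newspaper corpus).
hits = {1963: 10, 1964: 12, 1965: 90, 1966: 11, 1967: 9}
docs = {1963: 1000, 1964: 1200, 1965: 1000, 1966: 1100, 1967: 900}

relative = normalize_timeline(hits, docs)
bursts = find_bursts(relative, threshold=1.5)
print(bursts)
```

Normalizing matters because the number of digitized pages differs from year to year: a rise in raw hits may only reflect a larger corpus for that year.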

Current configuration

• Digitized newspapers (National Library): 9 million pages

• Texcavator interface

• Elasticsearch (500GB): real-time, scalable indexing

• xTAS: eXtensible Text Analysis Suite
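
To make the role of Elasticsearch in this configuration more concrete, here is a hedged Python sketch of a request body that combines a Boolean search, a date filter, and a per-year count for a timeline. The field names `text` and `date`, the helper `build_search_body`, and the example query are hypothetical; the actual Texcavator index mapping is not described in the slides. The sketch uses `calendar_interval`, as in recent Elasticsearch versions; older versions used `interval` instead.

```python
import json


def build_search_body(query, start_year, end_year,
                      text_field="text", date_field="date"):
    """Build a request body: Boolean query, date-range filter, yearly counts."""
    return {
        "query": {
            "bool": {
                # Boolean operators (AND, OR, NOT) are parsed by query_string.
                "must": [{
                    "query_string": {
                        "query": query,
                        "default_field": text_field,
                        "default_operator": "AND",
                    }
                }],
                # Restrict hits to the period under study.
                "filter": [{
                    "range": {
                        date_field: {
                            "gte": f"{start_year}-01-01",
                            "lte": f"{end_year}-12-31",
                        }
                    }
                }],
            }
        },
        # One bucket per year, which can feed a timeline.
        "aggs": {
            "per_year": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": "year",
                }
            }
        },
        "size": 10,
    }


body = build_search_body('"coca-cola" AND amerika', 1890, 1990)
print(json.dumps(body, indent=2))
```

The body is built as a plain dictionary, so it can be inspected or stored alongside the results before being sent to a cluster, which helps keep a search reproducible.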

BUFFALO BILL

COCA-COLA

TAYLORISM

Use cases

Records and word cloud

Timeline + cloud of one “burst” (1965)

Normalized timeline

Access to original

Configuration

Visualizing historical change

Soft drinks

References to both Coca-Cola and America in advertisements

Explain the peaks and troughs

Soft drinks

References to Coca-Cola without America in advertisements

Explain the peak

Topic modeling and GIS

Taylorism

Voyant word cloud of the “wetenschappelijke bedrijfsleiding” (scientific management) dataset

References over time, within the “wetenschappelijke bedrijfsleiding” dataset, to “Taylor”, “taylor-stelsel”, and “Taylor-systeem”
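
Counting references to several spelling variants over time can be sketched as follows. This is a minimal Python illustration, assuming articles are available as (year, text) pairs; the regular expression, the function name `references_per_year`, and the sample sentences are invented for this example and are not taken from the dataset or from Texcavator.

```python
import re
from collections import Counter

# Matches "Taylor", "taylor-stelsel", "Taylor-systeem" and close spelling
# variants (case-insensitive, hyphen or space as separator).
PATTERN = re.compile(r"\btaylor(?:[- ]*(?:stelsel|systeem))?\b", re.IGNORECASE)


def references_per_year(articles):
    """Count pattern matches per year over (year, text) pairs."""
    counts = Counter()
    for year, text in articles:
        counts[year] += len(PATTERN.findall(text))
    return dict(counts)


# Invented sample articles, for illustration only.
sample = [
    (1915, "Het Taylor-stelsel en de arbeid. Taylor zelf schreef hierover."),
    (1915, "Over het taylor-systeem."),
    (1920, "Geen verwijzing in dit artikel."),
]

counts = references_per_year(sample)
print(counts)
```

Grouping the variants in one pattern keeps a compound such as “Taylor-stelsel” from being counted twice, once as the compound and once as the bare name.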

CHALLENGES & OPPORTUNITIES

Ambitions

Challenges

• Software development

  • Stable version of Texcavator

  • Intuitive interface

  • Additional features

• Technological

  • Processor and server capacity

  • Data exchange and standardization (metatags)

  • OCR

• Scientific

  • Combining close and distant reading

  • Reproducibility

Cultural Text Mining

• Mining of cultural aspects of entities and events

  • Concepts, mentalities, ideas, utopias, etc.

• Mining for meaning

  • Towards digital conceptual history or digital history of mentalities

• Address macro-historical questions

  • Trends, patterns, structures in debates

  • Circulation of knowledge

  • Emergence of transnational reference cultures

Thank you!
