
Toine Pieters and Jaap Verheul, Utrecht University, The Netherlands

Texcavator: Text Mining Historical Newspapers


Overview

• Translantis research project
  • Concept of reference cultures
  • Digital humanities
• Texcavator tool
  • Requirements
  • Features
  • Configuration
• Texcavator use cases
• Future ambitions
  • Challenges
  • Cultural Text Mining

KB Big Data Conference 24 March 2015


TRANSLANTIS.NL


Translantis research project


Translantis

Topic: the emergence of the United States in public discourse in the Netherlands, 1890-1990
Concept: transnational reference cultures
Method: digital humanities text mining
Translantis.nl


Culture Mining

Culture

• Ideas

• Knowledge

• Practices

Public Sphere

• Public Opinion

• Citizens engaging in enlightened debate

Public Media

• Periodicals

• Radio

• TV

• Internet

Digitized Newspapers

• Sample of 10% of all printed newspapers

Mediation


TRANSLANTIS.NL


Texcavator


Texcavator

• A generic tool for cultural text mining and big data research
• Enables scholars to search very large quantities of textual data systematically, in a reliable and reproducible way
• Supports exploration and contextualization
• Serves multiple user groups:
  • the wide community of historians using big data
  • the Translantis team (NWO-funded)
  • the Asymmetrical Encounters team (HERA-funded)


Features

• Direct access to big data repository
• Integrated text-mining tools:
  • Boolean search
  • Named Entity Recognition
  • Sentiment mining
  • Stemming
• Real-time visualization of search results:
  • Dynamic word clouds (and export of underlying data)
  • Timelines (normalized, bursts)
• Input-output storage
• Close and distant reading
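The Boolean search feature can be illustrated with a minimal in-memory sketch of AND/NOT retrieval over an inverted index. This is not Texcavator's own code: according to the configuration slide, the tool relies on Elasticsearch for indexing. The functions `tokenize`, `build_index` and `boolean_search` and the toy advertisements are hypothetical. The two queries mirror the Coca-Cola use case, with and without America.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-cased word tokens; hyphenated words such as 'coca-cola' stay whole."""
    return re.findall(r"\w+(?:-\w+)*", text.lower())


def build_index(docs):
    """Inverted index: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def boolean_search(index, all_of, none_of=()):
    """Ids of documents containing every term in all_of and no term in none_of."""
    if not all_of:
        return set()
    # AND: intersect the posting sets (this builds a new set, the index is untouched).
    result = set.intersection(*(index.get(term, set()) for term in all_of))
    # NOT: remove documents that contain any excluded term.
    for term in none_of:
        result -= index.get(term, set())
    return result


# Hypothetical toy "advertisements", invented for illustration only.
ads = {
    1: "Drink Coca-Cola, de frisdrank uit Amerika",
    2: "Coca-Cola: nu ook in Nederland verkrijgbaar",
    3: "Nieuws uit Amerika",
}
index = build_index(ads)
print(sorted(boolean_search(index, ["coca-cola", "amerika"])))   # Coca-Cola AND Amerika
print(sorted(boolean_search(index, ["coca-cola"], ["amerika"])))  # Coca-Cola NOT Amerika
```

A fuller version would apply stemming inside `tokenize`, so that inflected forms map to the same index term. That step is omitted here.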

Current configuration

• Digitized newspapers (National Library): 9m pages
• Texcavator interface
• Elasticsearch (500GB): real-time, scalable indexing
• xTAS: eXtensible Text Analysis Suite
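The slide does not describe how the interface queries the Elasticsearch index. The sketch below shows one plausible request body, assuming the goal is a Boolean query over article text restricted to a date range. `query_string`, `bool`, `filter` and `range` are standard Elasticsearch query DSL, but the field names `text` and `date` and the helper `build_query` are hypothetical. They are not Texcavator's actual mapping.

```python
import json


def build_query(query_string, start_date, end_date, text_field="text", date_field="date"):
    """Build an Elasticsearch request body: a Lucene-syntax Boolean query on the
    article text, restricted to a date range. Field names are assumptions."""
    return {
        "query": {
            "bool": {
                # query_string accepts AND / OR / NOT and quoted phrases.
                "must": [
                    {"query_string": {"query": query_string, "default_field": text_field}}
                ],
                # A filter clause restricts matches without affecting relevance scores.
                "filter": [
                    {"range": {date_field: {"gte": start_date, "lte": end_date}}}
                ],
            }
        }
    }


body = build_query('"coca cola" AND amerika', "1890-01-01", "1990-12-31")
print(json.dumps(body, indent=2))
```

A body like this would be sent to an index's `_search` endpoint. The phrase is quoted because a bare hyphen has special meaning in the query_string syntax.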


BUFFALO BILL

COCA-COLA

TAYLORISM


Use cases


Records and word cloud
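A word cloud is essentially a ranked term-frequency table over the retrieved records, and that underlying data is what the export feature provides. The following is a minimal sketch, assuming simple tokenization and a tiny, hypothetical Dutch stopword list. The function name `word_cloud_counts` and the sample records are illustrative only.

```python
import re
from collections import Counter

# Tiny illustrative Dutch stopword list (an assumption; real lists are much longer).
STOPWORDS = {"de", "het", "een", "en", "van", "in", "uit", "is", "op"}


def word_cloud_counts(texts, top_n=10, stopwords=STOPWORDS):
    """Return the top_n (term, count) pairs across all texts, skipping
    stopwords and purely numeric tokens."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+(?:-\w+)*", text.lower()):
            if token not in stopwords and not token.isdigit():
                counts[token] += 1
    return counts.most_common(top_n)


# Hypothetical records, invented for illustration.
records = [
    "Coca-Cola, de frisdrank uit Amerika",
    "Amerika en de Coca-Cola fabriek",
    "Een fabriek in Amsterdam",
]
print(word_cloud_counts(records, top_n=5))
```

The term sizes in a cloud would then be scaled from these counts.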


Timeline + cloud of one “burst” (1965)

Normalized timeline
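A normalized timeline divides the yearly number of hits by the number of documents available for that year, so that growth in the digitized collection is not mistaken for growing interest. The sketch below shows that normalization and flags "burst" years with a simple mean-plus-k-standard-deviations threshold. The threshold is a swapped-in heuristic, not necessarily how Texcavator detects bursts. The function names and counts are hypothetical.

```python
from statistics import mean, pstdev


def normalized_timeline(hits_per_year, docs_per_year):
    """Fraction of each year's documents that match the query, so that years
    with more digitized newspapers do not dominate the raw counts."""
    return {
        year: hits_per_year.get(year, 0) / total
        for year, total in sorted(docs_per_year.items())
        if total > 0
    }


def burst_years(timeline, k=2.0):
    """Years whose normalized frequency exceeds mean + k * standard deviation.
    A simple stand-in threshold, not a full burst-detection model."""
    values = list(timeline.values())
    if not values:
        return []
    threshold = mean(values) + k * pstdev(values)
    return [year for year, value in timeline.items() if value > threshold]


# Hypothetical counts, invented for illustration.
docs = {1960: 800, 1961: 1000, 1962: 1100, 1963: 1000, 1964: 1200, 1965: 1000}
hits = {1960: 8, 1961: 12, 1962: 11, 1963: 10, 1964: 12, 1965: 60}
timeline = normalized_timeline(hits, docs)
print(timeline)
print(burst_years(timeline))
```

In this toy data, 1964 has more raw hits than 1960 but the same normalized share. Only 1965 stands out after normalization.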


Access to original


Configuration


Visualizing historical change


Soft drinks


References to both Coca-Cola and America in advertisements. Explain the peaks and troughs.


Soft drinks


References to Coca-Cola without America in advertisements. Explain the peak.


Topic modeling and GIS


Taylorism


Voyant word cloud of the “wetenschappelijke bedrijfsleiding” (scientific management) dataset

References over time to “Taylor”, “taylor-stelsel” and “Taylor-systeem” (both: Taylor system) within the “wetenschappelijke bedrijfsleiding” dataset
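Counting references to one concept usually means matching several spelling variants at once. The following sketch, assuming plain-text records tagged with a year, counts matches for "Taylor" and the compound forms named on the slide. The regular expression, the function `references_per_year` and the records are hypothetical. The pattern also tolerates a stray space after the hyphen, as OCR or line breaks might produce.

```python
import re
from collections import Counter

# Matches "Taylor" and compounds such as "taylor-stelsel" / "Taylor-systeem",
# case-insensitively, with an optional hyphen and/or whitespace before the
# second part (e.g. "Taylor- systeem" or "Taylorstelsel").
TAYLOR_PATTERN = re.compile(r"\btaylor(?:-?\s*(?:stelsel|systeem))?\b", re.IGNORECASE)


def references_per_year(records):
    """Count Taylor references per year in (year, text) records."""
    per_year = Counter()
    for year, text in records:
        per_year[year] += len(TAYLOR_PATTERN.findall(text))
    return dict(sorted(per_year.items()))


# Hypothetical records, invented for illustration.
records = [
    (1913, "Het Taylor-stelsel in de fabriek"),
    (1913, "Taylor en zijn methode"),
    (1920, "Over het taylor- systeem"),
    (1920, "Geen verwijzing"),
]
print(references_per_year(records))
```

Each compound form counts as one reference rather than two, because the optional group is matched greedily together with "Taylor".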


CHALLENGES & OPPORTUNITIES


Ambitions


Challenges

• Software development
  • Stable version of Texcavator
  • Intuitive interface
  • Additional features
• Technological
  • Processor and server capacity
  • Data exchange and standardization (metatags)
  • OCR
• Scientific
  • Combining close and distant reading
  • Reproducibility


Cultural Text Mining

• Mining of cultural aspects of entities and events: concepts, mentalities, ideas, utopias, etc.
• Mining for Meaning
  • Towards a digital conceptual history or a digital history of mentalities
• Addressing macro-historical questions:
  • Trends, patterns and structures in debates
  • Circulation of knowledge
  • Emergence of transnational reference cultures


Thank you!
