From Keyword Searching to Discourse Mining

Pim Huijnen, Juliette Lonij
Encounters between the Humanities and Computing, Utrecht University, 18 February 2016

The Keyword Problem

From: The Barre daily times, January 22 (1913), p. 1

Dictionary searching

using extensive and context-specific word lists (‘dictionaries’) to replace the contingency of single keywords
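As a rough illustration of what matching a word list against a text could look like, here is a minimal Python sketch. It is not the authors' tooling: the function names are hypothetical, the word list is borrowed from the efficiency query that appears later in these slides, and the sample sentence is invented. A trailing `*` is treated as "any word ending", which is an assumption about how the wildcards behave.

```python
import re

def compile_term(term):
    """Turn a dictionary term into a whole-word regex.

    A trailing * is read as a wildcard for any word ending (assumption).
    """
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)

def matched_terms(text, dictionary):
    """Return the set of dictionary terms that occur at least once in text."""
    return {term for term in dictionary if compile_term(term).search(text)}

# Word list taken from the efficiency query in these slides.
dictionary = ["product*", "machine*", "verspilling", "bedrijf", "goedkoop", "kwaliteit"]

# Invented sample sentence, for illustration only.
sample = "De nieuwe machines maken de productie goedkoop."
hits = matched_terms(sample, dictionary)
print(sorted(hits))
```

The point of returning a set rather than a yes/no answer is that a text becomes interesting when several dictionary words co-occur, not when a single keyword happens to appear.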

Why dictionary searching?

…to trace discursive shifts, represented by combinations of words instead of individual words

…to trace the persistence of discourses

Eugenics in Dutch newspapers(?)

Query:

maatregel nageslacht eigenschap* aanleg theorie bloed invloed

NOT eugenetica eugenetiek eugeniek eugenese ras*

Eugenics after eugenics

(Geref. gezinsblad 1965)

(De Tijd 1952)

Efficiency before efficiency

Query: "product* machine* verspilling bedrijf goedkoop kwaliteit", 01-01-1890 through 31-12-1940

(1901) (1906)

KB researcher-in-residence project

Developing a script to extract dictionaries from literature

Experimenting with tools to visualise results of dictionary searching in kranten.delpher.nl

Script to extract dictionaries

[Diagram with elements labelled A, B and BC; methods named: Topic modeling, TF-IDF]
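The slides name TF-IDF as one of the two methods behind the extraction script but do not show the script itself. The following is therefore only a sketch of how TF-IDF might rank candidate dictionary words, using the Python standard library. The function names, the split into "target" and "background" texts, and the tiny example texts are all assumptions made for illustration; topic modeling is not covered here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())

def top_tfidf_terms(target_docs, background_docs, k=5):
    """Rank words from the target texts by TF-IDF against all texts.

    Term frequency is counted over the target texts as a whole;
    document frequency is counted over target and background texts.
    """
    all_docs = [tokenize(d) for d in target_docs + background_docs]
    n_docs = len(all_docs)

    doc_freq = Counter()
    for tokens in all_docs:
        doc_freq.update(set(tokens))

    term_freq = Counter()
    for doc in target_docs:
        term_freq.update(tokenize(doc))
    total = sum(term_freq.values())

    scores = {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Invented miniature texts, for illustration only.
target = ["arbeid loon arbeid productie", "arbeid de productie"]
background = ["de kat zit", "de hond loopt"]
ranked = top_tfidf_terms(target, background, k=4)
print(ranked)
```

Words that are frequent in the target literature but rare elsewhere rise to the top, while common function words such as "de" sink to the bottom, which is the property that makes TF-IDF a plausible starting point for a context-specific word list.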

Visualising results of dictionary searches in Delpher

Use OR-query to search Delpher

Visualise results on the basis of Solr’s relevancy score (minimum number of words)

(arbeid* OR bedrij* OR beheer OR controle* OR factor* OR functie* OR kost* OR leiding* OR loon* OR maatregel* OR management OR methode* OR model* OR norm* OR organisatie* OR plannen OR prijs OR productie OR rationeel OR rendement OR reorganisatie OR statistiek OR taylor OR tijd OR werkbesparing OR werkverdeeling)
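A query string of this shape can be assembled from a word list rather than typed by hand. The sketch below shows one way to do that in Python; the function name is hypothetical, and nothing here calls Delpher or Solr. It only builds the text of the query.

```python
def build_or_query(terms):
    """Join dictionary terms into a single parenthesised OR-query.

    Terms are stripped, lowercased, de-duplicated and sorted so the
    same word list always yields the same query string.
    """
    cleaned = sorted({t.strip().lower() for t in terms if t.strip()})
    return "(" + " OR ".join(cleaned) + ")"

# A few terms from the query shown in the slides.
query = build_or_query(["loon*", "arbeid*", " Taylor ", "loon*"])
print(query)
```

Generating the query from a list keeps the dictionary itself as the single source of truth, so an extended or corrected word list can be re-run without editing a long query by hand.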

kbresearch.nl/dictionary


Challenges

Running an OR-query of 25+ (or, preferably, more) words on a dataset of 100,000,000+ documents

Accounting for particularities of the corpus:
* number of newspaper titles per year
* changes in newspaper titles over the years
* changes in article length over the years
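One common way to account for a corpus that grows and shrinks over time is to report hits relative to the total number of articles per year instead of raw counts. The slides do not say how the project handled this, so the following is only a sketch of that general idea; the function name and all numbers are invented.

```python
def relative_frequency(hits_per_year, articles_per_year):
    """Divide the number of matching articles by the total per year.

    Years without any articles in the corpus are left out, since no
    meaningful share can be computed for them.
    """
    result = {}
    for year, total in sorted(articles_per_year.items()):
        if total > 0:
            result[year] = hits_per_year.get(year, 0) / total
    return result

# Invented counts, for illustration only.
hits = {1900: 50, 1901: 30}
totals = {1900: 1000, 1901: 300, 1902: 0}
shares = relative_frequency(hits, totals)
print(shares)
```

In this invented example the raw count drops from 50 to 30, yet the share of matching articles doubles, which is exactly the kind of distortion that uncorrected counts can hide. Changes in article length would need a further correction, for instance per word rather than per article.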

Getting an idea of the exact combination of words in the visualised results
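A relevancy score says how well an article matches, but not which words produced the match. If the set of matched dictionary words is available for each article, the combinations can simply be tallied. The sketch below assumes such sets already exist; the function name and the example data are hypothetical.

```python
from collections import Counter

def combination_counts(matches_per_article):
    """Count how often each exact combination of matched words occurs.

    Each item in matches_per_article is the set of dictionary words
    found in one article; articles without matches are skipped.
    """
    counts = Counter(frozenset(m) for m in matches_per_article if m)
    return counts.most_common()

# Invented match sets for four articles, for illustration only.
matches = [
    {"loon*", "arbeid*"},
    {"arbeid*", "loon*"},
    {"taylor"},
    set(),
]
combos = combination_counts(matches)
for words, n in combos:
    print(sorted(words), n)
```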

