Visual Analytics for Linguistics - Day 3 ESSLLI

Date post: 23-Jan-2018

Visual Analytics for Linguistics - Day 3

Olga Scrivner

Course Info

Charts

Text Visualization

ITMS

Preprocessing Data

Data Visualization

Cluster Analysis

Topic Modeling

Google Books API

What You Will Learn

DAY 1 Introduction to Visual Analytics

DAY 2 Visualization Methods, Design, and Tools

DAY 3 Working with Unstructured Data

DAY 4 Working with Structured Data

DAY 5 Advanced Analytics

Our Materials - Web Site

http://obscrivn.wixsite.com/visualization

What We Need

- Interactive Text Mining Suite

- Voyant

- R and RStudio

- R libraries: ggplot2, plotly, reshape2

Quiz: Which Chart Are You?

https://www.sisense.com/blog/quiz-chart/

Creating a Bar Chart

The height of each bar can represent one of two things:

- The value of a column in the data set. This is done with stat="identity", which leaves the y values unchanged.

- The count of cases in each group, where each x value represents one group.
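The difference between the two stats is entirely in how the data become one height per bar. In ggplot2 this happens inside geom_bar(). The plain-Python sketch below reproduces only that data step and draws nothing. The helper names and the toy rows are hypothetical.

```python
from collections import Counter

def identity_heights(rows, x, y):
    """Like stat="identity": each bar's height is the y value already in the row.

    Assumes one row per x value; ggplot2 would stack duplicate x values instead.
    """
    return {row[x]: row[y] for row in rows}

def count_heights(rows, x):
    """Like stat="count": each bar's height is the number of rows in that x group."""
    return dict(Counter(row[x] for row in rows))

# Pre-summarized table: one row per bar, so heights are used as-is.
word_freqs = [{"word": "dialect", "n": 7}, {"word": "language", "n": 12}]
print(identity_heights(word_freqs, "word", "n"))  # {'dialect': 7, 'language': 12}

# Raw table: one row per token, so no y is mapped and heights are counted.
tokens = [{"pos": "noun"}, {"pos": "verb"}, {"pos": "noun"}]
print(count_heights(tokens, "pos"))  # {'noun': 2, 'verb': 1}
```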

Creating a Bar Chart - Sample

Creating a Bar Chart - Values

Creating a Bar Chart - Counts

To get a bar graph of counts, we do not map a variable to y, and we use stat="count".

Title

Creating Line Chart

Creating Area Chart

http://www.r-graph-gallery.com/136-stacked-area-chart/

Creating Scatter Plot

http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/

Creating Bubble Plot

https://plot.ly/r/bubble-charts/

Creating Heatmap

http://www.r-graph-gallery.com/215-interactive-heatmap-with-plotly/

Creating Word Cloud

Word Cloud - Contest - 10 min

- Create your own word cloud

- Look at the function: type ?wordcloud2 and run it

- Can you change the shape of your cloud?

- Save it (or take a screenshot) and post it on Twitter, Facebook, etc.
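A word cloud sizes each word by its frequency. wordcloud2 expects a data frame of words and their frequencies. The Python sketch below shows one common way to turn frequencies into font sizes: linear scaling between a minimum and a maximum size. The size range and the linear rule are assumptions for illustration, not wordcloud2's internal algorithm.

```python
def font_sizes(freqs, min_size=10, max_size=60):
    """Linearly map word frequencies onto font sizes in [min_size, max_size]."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo or 1  # avoid division by zero when all frequencies are equal
    return {
        word: round(min_size + (f - lo) / span * (max_size - min_size))
        for word, f in freqs.items()
    }

freqs = {"language": 40, "dialect": 22, "corpus": 4}
print(font_sizes(freqs))  # {'language': 60, 'dialect': 35, 'corpus': 10}
```

A square-root or logarithmic rule is a common alternative when a few very frequent words would otherwise dominate the cloud.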

Why Analyze Text?

The “epic transformation of archives”: shifting from print to digital archival form (Folsom, 2007)

“As our collective knowledge continues to be digitized and stored (...) it becomes more difficult to find and discover what we are looking for.” (Blei, 2012)

Text Mining Challenges

Sources: 1) Dan Jurafsky; 2) Text Mining with R for Social Science Research (Ryan Wesslen)

Basic Terminology

What is Bag of Words?

- The simplest way to quantify text

- Word order is ignored

- Term counts per document

- N-grams (unigrams, bigrams)

Source: Chris Manning
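A bag of words can be built with a counter over tokens, and n-grams extend it to short runs of adjacent words. The sketch below uses naive whitespace tokenization, and its function names are hypothetical. It shows that unigram bags ignore word order, while bigrams keep a little of it.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list, in order, as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(text, n=1):
    """Count the n-grams in one document (naive lowercase whitespace tokenization)."""
    return Counter(ngrams(text.lower().split(), n))

doc = "the dog bit the man"
print(bag_of_words(doc))       # term counts per document
print(bag_of_words(doc, n=2))  # bigram counts
# Unigrams ignore word order: swapping the nouns gives the same bag ...
print(bag_of_words("the man bit the dog") == bag_of_words(doc))  # True
# ... but bigrams tell the two sentences apart.
print(bag_of_words("the man bit the dog", n=2) == bag_of_words(doc, n=2))  # False
```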

Preprocessing

- Tokenization (splitting text into words)

- Cleaning (lowercasing, removing punctuation)

- Stemming

  - works, worked → work

- Filtering (stopwords)

  - and, the, a

Source: Wesslen
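The four steps can be chained in a few lines of Python. This is only a sketch. The tiny stopword list, the regex tokenizer and especially the suffix-stripping stemmer are simplified stand-ins; real tools typically use full stopword lists and an algorithm such as the Porter stemmer. All names are hypothetical.

```python
import re

STOPWORDS = {"and", "the", "a"}  # tiny illustrative list; real lists are much longer

def tokenize(text):
    """Tokenization + cleaning: lowercase, keep runs of ASCII letters (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    """Toy stemmer: strip one common suffix if at least 3 letters remain.

    Not the Porter algorithm; it mis-stems many words (e.g. 'news' -> 'new').
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, extra_stopwords=()):
    """Tokenize, filter stopwords (plus any corpus-specific noise words), then stem."""
    stop = STOPWORDS | set(extra_stopwords)
    return [naive_stem(tok) for tok in tokenize(text) if tok not in stop]

print(preprocess("The worker works, and the team worked!"))  # ['worker', 'work', 'team', 'work']
print(preprocess("Working paper shows results", extra_stopwords={"paper", "shows"}))  # ['work', 'result']
```

Stopwords are filtered before stemming here, so the stopword list is matched against the original word forms.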

Macro-analysis

Concept: macro-analysis (Jockers, 2013), “the construction of abstract models” (Jasinski, 2001)

Methods: tag clouds, heat maps, clusters, topics, network graphs

Tools:

- GUI: Voyant, Papermachine, ITMS

- TUI: Mallet, Meta, R and Python packages

Visual Analytics

Visual Analytics: “the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas et al., 2005)

- Graphs, maps and trees for literature analysis (Moretti, 2005)

Visualization Methods

- Word clouds to analyze a novel (Vuillemot et al., 2009)

- Social network graphs of characters in Greek tragedies (Rydberg-Cox, 2011)

- Literary fingerprint and summaries (Oelke et al., 2012)

- Tracking emotion and sentiment in fairy tales (Mohammad, 2012)

Topic Modeling

Discovering the underlying themes of a collection of Science magazine articles, 1990-2000 (Blei, 2012)

Topics - Word Term

Wikipedia Topics

http://www.princeton.edu/~achaney/tmve/wiki100k/browse/topic-presence.html

Wikipedia Topics - Assignment - 10 min

1. Language-related topic
2. Words: Dialect
3. Related Document: Macedonian Language
4. Related Document: Egyptian hieroglyphs
5. Go to Full article:
6. Find meaning:

Voyant

http://voyant-tools.org/

Voyant - 10 min

http://voyant-tools.org/

- Examine visualization charts (identify types and properties)

- Apply various filters and queries

Voyant Tools - Bubblelines - 7 min

http://docs.voyant-tools.org/tools/

- Delete top terms

- Search for man and woman

- Make sure “separate lines for terms” is checked

- Change the search terms
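Roughly speaking, Bubblelines divides each document into equal-length segments and sizes a bubble by how often a term occurs in each segment. The Python sketch below computes only those per-segment counts. The lowercase ASCII tokenization, the number of segments and the function name are assumptions, not Voyant's implementation.

```python
import re

def segment_counts(text, terms, n_segments=10):
    """Count each (lowercase) term in n_segments equal slices of a document's tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())  # lowercase ASCII words only
    counts = {term: [0] * n_segments for term in terms}
    for i, token in enumerate(tokens):
        if token in counts:
            counts[token][i * n_segments // len(tokens)] += 1  # segment index 0..n_segments-1
    return counts

text = "man woman man " + "word " * 14 + "woman woman man"
print(segment_counts(text, ["man", "woman"], n_segments=4))
# {'man': [2, 0, 0, 1], 'woman': [1, 0, 0, 2]}
```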

Voyant Tools - Pair Work - 10 min

http://docs.voyant-tools.org/tools/

- Examine visualization methods

- Select 5 methods

- Look at the documentation to learn how to use them

Interactive Text Mining Suite

- A user-friendly tool for quantitative analysis and visualization of unstructured data

- Platform-independent

- Interactive

ITMS Structure

1. File Uploads

  - Upload files (txt, pdf, rdf, and Google Books API)

2. Data Preparation

  - Data preprocessing (stopwords, stemming, metadata)

3. Data Visualization

  - Word frequencies, cluster analysis and topic modeling

Workshop Files

- Download 3 text files:

  https://iu.box.com/s/knua9af3bip7g63s3zdax9ti4z243ldz

  NY Times articles (3 documents in plain text format)

- ITMS website:

  http://www.interactivetextminingsuite.com

Upload File

Preprocessing Data

Before performing data analysis we should preprocess data.

Preprocessing Options

Select preprocessing options and click Apply.

Stopwords

Stopwords (e.g. the, and): select Default for English

Manual Removal of Stopwords

Depending on your needs, remove any additional stopwords that you consider noise, e.g., paper, shows, etc.

Click Apply.

Stemming

To improve your analysis, you can stem all your tokens. For example, instead of worked, works, and working, you will have only one relevant stem: work.

Metadata Extraction

You can extract or upload metadata. You will need datestamp (year) information for chronological topic modeling.
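If the year is encoded in each file name, a regular expression can pull it out before chronological topic modeling. These slides do not say whether ITMS extracts metadata this way. The file-naming pattern, the accepted year range and the function name below are hypothetical.

```python
import re

def extract_year(filename):
    """Return the first standalone 4-digit year (1000-2099) in a file name, or None."""
    match = re.search(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)", filename)
    return int(match.group(1)) if match else None

files = ["nytimes_2016_elections.txt", "letter-1789.txt", "notes.txt"]
print({name: extract_year(name) for name in files})
# {'nytimes_2016_elections.txt': 2016, 'letter-1789.txt': 1789, 'notes.txt': None}
```

The lookarounds reject digit runs that merely contain four digits, such as an ID like id12345.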

Visualization

Word Cloud Representation

Customization

Cluster Analysis

You need to have at least three documents. Documents will be grouped based on their term similarity measures.
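Grouping by term similarity can be sketched in three steps:

1. Represent each document as a term-count vector.
2. Measure the cosine similarity between documents.
3. Repeatedly merge the two most similar groups (agglomerative clustering).

These slides do not specify ITMS's distance measure or linkage method. Cosine similarity and average linkage are assumptions chosen for illustration, and the toy documents are made up.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def average_linkage(g1, g2, vectors):
    """Mean pairwise cosine similarity between the documents of two groups."""
    sims = [cosine(vectors[a], vectors[b]) for a in g1 for b in g2]
    return sum(sims) / len(sims)

def cluster(docs, k):
    """Agglomerative clustering: merge the most similar pair of groups until k remain."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    groups = [[name] for name in docs]  # start with one group per document
    while len(groups) > k:
        g1, g2 = max(combinations(groups, 2), key=lambda pair: average_linkage(*pair, vectors))
        groups.remove(g1)
        groups.remove(g2)
        groups.append(g1 + g2)
    return groups

docs = {
    "a.txt": "election vote senate vote",
    "b.txt": "senate vote campaign",
    "c.txt": "dialect phonology vowel",
}
print(cluster(docs, k=2))  # [['c.txt'], ['a.txt', 'b.txt']]
```

With only two documents, the sole possible grouping is trivial, which fits the slide's three-document minimum.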

Topic Modeling

- LDA (Latent Dirichlet Allocation)

- STM (Structural Topic Model)

- Chronological topic visualization (LDA): requires metadata

Topic Modeling Tuning

- Number of topics (how many different themes)

- Number of words per topic

- Number of iterations
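The three tuning choices correspond directly to the parameters of an LDA fit. Below is a minimal collapsed Gibbs sampler for LDA in plain Python, written only for illustration. It is not how ITMS or any particular R package fits LDA, and the priors alpha and beta, the seed and the toy corpus are assumptions. In the function, n_topics is the number of themes, n_words is how many top words are reported per topic, and n_iter is the number of sampling sweeps.

```python
import random

def lda_gibbs(docs, n_topics=2, n_words=3, n_iter=200, alpha=0.1, beta=0.01, seed=1):
    """Minimal collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs: list of token lists. Returns the n_words highest-count words per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]                      # tokens per (document, topic)
    nkw = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                                       # tokens per topic

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(n_iter):  # one iteration = one sweep over every token
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts before resampling its topic.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(topic t) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Report each topic as its n_words highest-count words (ties broken alphabetically).
    return [sorted(vocab, key=lambda w: (-nkw[k][w], w))[:n_words] for k in range(n_topics)]

docs = [
    "vote senate election vote campaign".split(),
    "senate vote campaign election".split(),
    "dialect vowel phonology dialect".split(),
    "vowel dialect phonology accent".split(),
]
for k, words in enumerate(lda_gibbs(docs, n_topics=2, n_words=3)):
    print(f"topic {k}: {words}")
```

Real corpora need far more iterations and tokens than this toy example, and results vary with the seed. That is one reason topic counts and iteration numbers are usually tuned by inspecting several runs.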

Topic Model Selection

LDA Topic Model

STM Topic Model

Other Formats - Google Books

Before switching to other data formats, refresh your local browser.

Start with File Uploads and select Structured Data.

Select your search terms and submit.

The current limit is 40 books.
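The 40-book cap probably mirrors the Google Books API itself. Its volumes search returns at most 40 results per request (the maxResults parameter), and further results need paging with startIndex. These slides do not say whether ITMS pages through results. The sketch below only builds a request URL with the Python standard library and sends nothing. The function name is hypothetical.

```python
from urllib.parse import urlencode

VOLUMES_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"

def books_query_url(terms, max_results=40, start_index=0):
    """Build a Google Books API volumes search URL (no request is sent here)."""
    if not 0 <= max_results <= 40:
        raise ValueError("maxResults must be between 0 and 40 per request")
    params = {"q": " ".join(terms), "maxResults": max_results, "startIndex": start_index}
    return VOLUMES_ENDPOINT + "?" + urlencode(params)

print(books_query_url(["historical", "linguistics"]))
# https://www.googleapis.com/books/v1/volumes?q=historical+linguistics&maxResults=40&startIndex=0
```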

