Date post: 22-May-2020
Scraping and visualizing Twitter data @AnnaHenschel
Page 1: Scraping and visualizing Twitter data · Remember: •The red text does not always mean •If you fall behind, copy/paste from the web materialsfor this session •Write the code

Scraping and visualizing Twitter data

@AnnaHenschel


A short introduction to Twitter (and rtweet).


https://mikewk.com/


Many scientists are on Twitter!


Why though ?


New papers / preprints


A supportive community


And … data!


A word on ethics.

• Twitter developer terms of service
• Don’t derive or store sensitive information
• The role of consent?

Taylor & Pagliari, 2018; Williams, Burnap, & Sloan, 2017


What are we going to do in this tutorial?

• Get data from Twitter using rtweet
• Wrangle Twitter data with tidytext
• Sentiment analysis
• (Additional practice)


Remember:

• The red text does not always mean an error
• If you fall behind, copy/paste from the web materials for this session
• Write the code in a .Rmd (R Markdown) file, not in the console!


Installing rtweet

# install rtweet from CRAN
install.packages("rtweet")

# load rtweet package
library(rtweet)


Other packages:

install.packages("tidytext")
library(tidytext)

install.packages("ggpubr")
library(ggpubr)

library(tidyverse)


Tip:

rtweet interacts with Twitter’s API. To use the package, you need to allow RStudio to authenticate you as a user. When you run your first rtweet function, a window will pop up in your browser asking you to confirm this.


Getting (almost all) tweets of a user

# The Twitter API returns at most a user's 3,200 most recent tweets
lego <- get_timeline("@legogradstudent", n = 3200)

# Look at the first few rows of the data frame
head(lego)

view(lego)


Tidy tweets = one word per row format

tidy_tweets <- lego %>%
  filter(is_retweet == FALSE)

tidy_tweets <- lego %>%
  filter(is_retweet == FALSE) %>%
  select(status_id, text)


Run this code and have a look at the dataframe!

tidy_tweets <- lego %>%
  filter(is_retweet == FALSE) %>%
  select(status_id, text) %>%
  unnest_tokens(word, text)


Did it work?

# Look at the data frame
view(tidy_tweets)
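To see what the one-word-per-row format looks like without calling the Twitter API, the same pipeline can be tried on a small made-up data frame. This is only a minimal sketch: `toy_tweets` and its three tweets are invented for illustration, and only the column names `status_id`, `text` and `is_retweet` are taken from the slides.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the `lego` data frame (made-up tweets, not real data)
toy_tweets <- tibble(
  status_id  = c("1", "2", "3"),
  text       = c("Writing my thesis today",
                 "RT someone else wrote this",
                 "Coffee first, thesis later"),
  is_retweet = c(FALSE, TRUE, FALSE)
)

toy_tidy <- toy_tweets %>%
  filter(is_retweet == FALSE) %>%  # drop retweets
  select(status_id, text) %>%      # keep only the tweet ID and its text
  unnest_tokens(word, text)        # split the text into one word per row

toy_tidy
```

By default, `unnest_tokens()` also lower-cases the words and strips punctuation, which is why "first," ends up as "first" in the `word` column.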


Stop words = most common words in a language (e.g. “the” or “is”)

stop_words


Custom stop words for Internet text data

my_stop_words <- tibble(
  word = c("https", "t.co", "rt", "amp", "rstats", "gt"),
  lexicon = "twitter"
)


# Check if it worked

View(my_stop_words)


Adding custom stop words and removing numbers

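# Connect all stop words
all_stop_words <- stop_words %>%
  bind_rows(my_stop_words)

# Remove numbers: as.numeric() returns NA (with a warning) for anything
# that is not a number, so only those rows are kept
no_numbers <- tidy_tweets %>%
  filter(is.na(as.numeric(word)))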


Removing stop words with anti_join()

# Get rid of all stop words
no_stop_words <- no_numbers %>%
  anti_join(all_stop_words, by = "word")


How many words are we left with?

Check in the environment (on the top right hand side).

How many rows does tidy_tweets have, how many rows for no_stop_words?
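The cleaning steps above can be checked on a toy example before running them on real tweets. The sketch below assumes a made-up word list (`toy_words`) and a shortened custom stop-word list (`toy_stop_words`); both are hypothetical. `stop_words` is the lexicon that ships with tidytext. `suppressWarnings()` is added here only to silence the coercion warning from `as.numeric()`; it is not part of the slides.

```r
library(dplyr)
library(tidytext)

# Toy one-word-per-row data (invented for illustration)
toy_words <- tibble(
  status_id = "1",
  word = c("the", "lego", "is", "in", "2", "bricks", "https", "t.co")
)

# Custom stop words in the same two-column shape as tidytext's stop_words
toy_stop_words <- tibble(word = c("https", "t.co"), lexicon = "twitter")
all_toy_stop_words <- bind_rows(stop_words, toy_stop_words)

toy_clean <- toy_words %>%
  # keep rows whose word cannot be read as a number
  filter(is.na(suppressWarnings(as.numeric(word)))) %>%
  # drop every row whose word appears in the combined stop-word list
  anti_join(all_toy_stop_words, by = "word")

# Compare the number of rows before and after cleaning
nrow(toy_words)
nrow(toy_clean)
```

`nrow()` gives the same row counts that RStudio shows in the Environment pane, so it is a quick way to compare `tidy_tweets` and `no_stop_words` from code.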


Sentiment analysis


Text Mining with R by Julia Silge & David Robinson

Sentiment analysis?

# Add sentiments by using a lexicon
nrc_words <- no_stop_words %>%
  inner_join(get_sentiments("nrc"), by = "word")

view(nrc_words)

pie_words <- nrc_words %>%
  group_by(sentiment)

pie_words <- nrc_words %>%
  group_by(sentiment) %>%
  tally()

pie_words <- nrc_words %>%
  group_by(sentiment) %>%
  tally() %>%
  arrange(desc(n))
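The join-and-count logic can be tried without downloading a sentiment lexicon. In this sketch, `toy_lexicon` is a hypothetical four-word stand-in for `get_sentiments("nrc")`; it is not the real NRC lexicon, and the words are made up.

```r
library(dplyr)

# Cleaned toy words (invented for illustration)
toy_clean_words <- tibble(
  word = c("happy", "deadline", "happy", "sad", "coffee", "love")
)

# Hypothetical mini-lexicon, NOT the real NRC lexicon
toy_lexicon <- tibble(
  word      = c("happy", "sad", "love", "deadline"),
  sentiment = c("joy", "sadness", "joy", "fear")
)

toy_sentiments <- toy_clean_words %>%
  inner_join(toy_lexicon, by = "word") %>%  # words missing from the lexicon are dropped
  group_by(sentiment) %>%
  tally() %>%                               # n = number of words per sentiment
  arrange(desc(n))                          # most frequent sentiment first

toy_sentiments
```

Because `inner_join()` keeps only words that appear in the lexicon, "coffee" disappears from the result. `count(sentiment, sort = TRUE)` would be a shorter way to write the last three steps.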

Pie chart

ggpubr::ggpie(pie_words, "n",
  label = "sentiment", fill = "sentiment",
  color = "white", palette = "Spectral")


Twitter as a learning resource

• Inspiration (#rstats and #rtweet)
• #tidytuesday
• Get help & join the community

You can do many more cool things:

• @GlasgowGIST (ggwordcloud)
• My most frequently used emoji is … (emo)
• Trump Tweet Time (Shiny)


Let me know about your next rtweet project!

@AnnaHenschel


References

• Carrillo, M., Han, Y., Migliorati, F., Liu, M., Gazzola, V., & Keysers, C. (2019). Emotional Mirror Neurons in the Rat’s Anterior Cingulate Cortex. Current Biology.

• Taylor, J., & Pagliari, C. (2018). Mining social media data: How are research sponsors and researchers addressing the ethical challenges? Research Ethics, 14(2), 1-39.

• Williams, M. L., Burnap, P., & Sloan, L. (2017). Towards an ethical framework for publishing Twitter data in social research: Taking into account users’ views, online context and algorithmic estimation. Sociology, 51(6), 1149-1168.


Links

• Datenkraken, https://en.wiktionary.org/wiki/datenkraken

• Rtweet introduction by Michael W. Kearney, https://mkearney.github.io/nicar_tworkshop/#1

• Introduction to tidytext by Julia Silge and David Robinson, https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html

• LSE Impact Blog: “Academic journals with a presence on Twitter are more widely disseminated and receive a higher number of citations.”

• Lego Grad Student


Thanks to the SGSSS for supporting this workshop.

@AnnaHenschel

Slides available via the Open Science Framework

