Date post: 28-Jun-2020
Refresher on the text mining workflow

SENTIMENT ANALYSIS IN R

Ted Kwartler, Data Dude

So far ...

polarity()

Valence shifters

tidytext, dplyr, tidyr

bing, nrc, afinn

Visualizations

The text mining workflow

6 defined steps

1. Define the problem & specific goals

2. Identify the text

3. Organize the text

4. Extract features

5. Analyze

6. Draw a conclusion/reach an insight

Step 1: Define your problem

Tips:

Be precise

Avoid "scope creep"

Iterate and try new methods and/or subjectivity lexicons to ensure some consistency

Step 2: ID your text

Tips:

Find appropriate sources (e.g. searching Wikipedia for stock prices may make less sense than examining a stock forum)

Follow a site's terms of service and be mindful of web scraping

Text sources affect the language used, so become familiar with the source's tone and nuances

Let's practice!

Step 3: Organize (& clean) the text

Get to it!

Initial goal: Use the polarity() function to define subsections of the text for examination.

# Keep the comments whose polarity score is above or below zero
pos_comments <- subset(bos_reviews$comments,
                       bos_reviews$polarity > 0)
neg_comments <- subset(bos_reviews$comments,
                       bos_reviews$polarity < 0)

# Collapse each subset into one long string
pos_terms <- paste(pos_comments, collapse = " ")
neg_terms <- paste(neg_comments, collapse = " ")
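To see what the subsetting and collapsing steps produce, here is a minimal base R sketch on a made-up data frame. The name toy_reviews, its comments, and its polarity scores are all invented for illustration; they are not taken from bos_reviews.

```r
# Hypothetical stand-in for bos_reviews: four comments with invented scores
toy_reviews <- data.frame(
  comments = c("Lovely host", "Terrible smell", "It was a room", "Great view"),
  polarity = c(0.8, -0.6, 0, 0.7),
  stringsAsFactors = FALSE
)

# Same pattern as above: split on the sign of the polarity score
toy_pos_comments <- subset(toy_reviews$comments, toy_reviews$polarity > 0)
toy_neg_comments <- subset(toy_reviews$comments, toy_reviews$polarity < 0)

# Collapse each group into a single string
toy_pos_terms <- paste(toy_pos_comments, collapse = " ")
toy_neg_terms <- paste(toy_neg_comments, collapse = " ")

print(toy_pos_terms)
print(toy_neg_terms)
```

Note that the comment scored exactly 0 lands in neither group, because both comparisons are strict.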

More organization

Goal: Tidy the rental reviews so that polarity can be scored in tidy format.

library(tidytext)
library(dplyr)

# One row per word, keeping each review's id
tidy_reviews <- bos_reviews %>%
  unnest_tokens(word, comments)

# Record each word's position within its review
tidy_reviews <- tidy_reviews %>%
  group_by(id) %>%
  mutate(original_word_order = seq_along(word))
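The tokenizing step above can be tried on a small example. This is only a sketch: toy_reviews and its two comments are invented for illustration, and it assumes the tidytext and dplyr packages are installed.

```r
library(tidytext)
library(dplyr)

# Hypothetical two-review data set with the same column names as the slide
toy_reviews <- data.frame(
  id = c(1, 2),
  comments = c("Great location, very clean", "Noisy street and dirty room"),
  stringsAsFactors = FALSE
)

# unnest_tokens() lowercases the text, drops punctuation,
# and returns one row per word
tidy_toy <- toy_reviews %>%
  unnest_tokens(word, comments) %>%
  group_by(id) %>%
  mutate(original_word_order = seq_along(word)) %>%
  ungroup()

print(tidy_toy)
```

The original_word_order column restarts at 1 for each id, so a word's position in its own review can still be recovered after later joins or filters reorder the rows.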

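Tidy text polarity scoring

Recall that the "bing" lexicon categorizes words as either positive or negative.

library(tidytext)
library(tidyr)
library(dplyr)

# get_sentiments("bing") returns the bing lexicon directly; newer tidytext
# releases of `sentiments` have no `lexicon` column to filter on
bing <- get_sentiments("bing")

# tidy_reviews is still grouped by id, so the counts are per review
pos_neg <- tidy_reviews %>%
  inner_join(bing, by = "word") %>%
  count(sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(polarity = positive - negative)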
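To make the join-count-spread arithmetic concrete, here is a small sketch that uses a hand-made lexicon instead of bing. Both toy_tokens and toy_lexicon are invented for illustration; it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical tokenized reviews: one row per word
toy_tokens <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  word = c("great", "clean", "noisy", "dirty", "noisy", "room"),
  stringsAsFactors = FALSE
)

# Hypothetical four-word lexicon shaped like bing (word, sentiment)
toy_lexicon <- data.frame(
  word = c("great", "clean", "noisy", "dirty"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

toy_polarity <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%   # drops words not in the lexicon
  count(id, sentiment) %>%                   # words per review and sentiment
  spread(sentiment, n, fill = 0) %>%         # one column per sentiment
  mutate(polarity = positive - negative)

print(toy_polarity)
```

Review 1 has two positive words and one negative word, so its polarity is 1; review 2 has two negative words and none positive, so its polarity is -2. The fill = 0 argument is what supplies the missing positive count for review 2. If no review at all contained a positive (or negative) word, that column would not exist and the mutate() step would fail.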

Let's practice!

Revising the comparison cloud

Author effort

Comparisons

Revising the comparison cloud

More analysis can always be done!

Let's practice!

Step 6: Reach a conclusion

Find the light bulb moments!

Let's practice!

Your turn!

Congratulations!!

In this course you learned:

qdap's polarity() function

tidytext data formats and tidy data functions

inner_join with subjectivity lexicons

Good luck!

