Sentweet-Twitter sentiment analysis using WEKA and Java

Post on 14-Apr-2017


Sentweet: Twitter Sentiment Analysis Tool

Business intelligence course, academic year 2015/16

EGIDI SARA

Motivation

Sentiment analysis
• Classification of the polarity of a given text in a document, sentence or phrase
• Goal: determine whether the expressed opinion is positive or negative

Twitter
• Microblogging tool; short sentences are less ambiguous
• Variable audience
• Stock market, product opinions, political elections

Roadmap

Twitter corpus (2) → Preprocessing → Tokenizer → Feature extraction → Classify

User input → Retrieve tweets → Preprocess → Classify

The corpus

Two datasets: STS (Stanford Twitter corpus)
• Hand-labelled, different subjects
• 40000 labelled, balanced tweets
• Tweets from 2010
• Auto-generated using smileys as labels
• Twitter request rate limits

Preprocessing

• Remove RTs
• Keep English tweets only
• Remove URLs, mentions, numbers
• Replace repeated characters
• Replace emoticons by their polarity (auto-generated database)

Example tweet:

Have you heard about TEDx speech ? So great! by @yulia Sooo in #Milan https://www.ted.com/talks/insightful_human_portraits_made_from_data
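The preprocessing steps listed above can be sketched with plain Java regular expressions. This is only an illustration under stated assumptions, not the tool's actual code: the class name `TweetCleaner`, the tiny emoticon table, the `POSITIVE`/`NEGATIVE` placeholder tokens, and the choice to shrink runs of three or more identical characters to two are all hypothetical. Language filtering is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TweetCleaner {
    // Hypothetical emoticon table; the slides mention an auto-generated database instead.
    private static final Map<String, String> EMOTICONS = new LinkedHashMap<>();
    static {
        EMOTICONS.put(":)", " POSITIVE ");
        EMOTICONS.put(":D", " POSITIVE ");
        EMOTICONS.put(":(", " NEGATIVE ");
    }

    // Assumption: a retweet is recognised by its leading "RT " marker.
    static boolean isRetweet(String tweet) {
        return tweet.startsWith("RT ");
    }

    static String clean(String tweet) {
        String t = tweet;
        // Replace emoticons first, before any other character is removed.
        for (Map.Entry<String, String> e : EMOTICONS.entrySet()) {
            t = t.replace(e.getKey(), e.getValue());
        }
        t = t.replaceAll("https?://\\S+", " ");  // URLs
        t = t.replaceAll("@\\w+", " ");          // mentions
        t = t.replaceAll("\\d+", " ");           // numbers
        t = t.replaceAll("(.)\\1{2,}", "$1$1");  // runs of 3+ identical characters become 2
        return t.replaceAll("\\s+", " ").trim(); // normalise whitespace
    }

    public static void main(String[] args) {
        String tweet = "Have you heard about TEDx speech ? So great! by @yulia Sooo in #Milan "
                + "https://www.ted.com/talks/insightful_human_portraits_made_from_data";
        if (!isRetweet(tweet)) {
            System.out.println(clean(tweet));
        }
    }
}
```

With these rules, the example tweet becomes `Have you heard about TEDx speech ? So great! by Soo in #Milan`: the URL and the mention are dropped and `Sooo` is shortened. Hashtags are kept here because the slide does not list them for removal.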

Filters

Feature extractor: Weka’s StringToWordVector
• Stemmer
• Stoplist
• TF-IDF
• Tokenizer

Attribute selection: InfoGain and Ranker
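To make the attribute selection step concrete, the sketch below computes the information gain of a binary "word present / word absent" attribute against a two-class label and then ranks words by that score. It is a plain-Java illustration of the idea behind InfoGain plus Ranker, not Weka's implementation; the class name `InfoGainRanker` and the layout of the count array are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InfoGainRanker {
    // Entropy in bits of a two-class distribution given the class counts.
    static double entropy(int pos, int neg) {
        int total = pos + neg;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : new int[] {pos, neg}) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // counts = {positive with word, negative with word,
    //           positive without word, negative without word}
    static double infoGain(int[] counts) {
        int withWord = counts[0] + counts[1];
        int withoutWord = counts[2] + counts[3];
        int total = withWord + withoutWord;
        if (total == 0) return 0.0;
        double before = entropy(counts[0] + counts[2], counts[1] + counts[3]);
        double after = (withWord * entropy(counts[0], counts[1])
                + withoutWord * entropy(counts[2], counts[3])) / (double) total;
        return before - after;
    }

    // Words ordered from highest to lowest information gain.
    static List<String> rank(Map<String, int[]> table) {
        List<String> words = new ArrayList<>(table.keySet());
        words.sort(Comparator.comparingDouble((String w) -> infoGain(table.get(w))).reversed());
        return words;
    }

    public static void main(String[] args) {
        Map<String, int[]> table = new HashMap<>();
        table.put("great", new int[] {4, 0, 0, 4}); // appears only in positive tweets
        table.put("the", new int[] {2, 2, 2, 2});   // appears equally in both classes
        System.out.println(rank(table));
    }
}
```

In the toy table, "great" separates the classes perfectly (gain of 1 bit) while "the" carries no information (gain of 0), so a ranker would keep the former and could drop the latter.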

Classifiers

FilteredClassifier (filters are built from the training set only, then applied to test data)
• Support Vector Machine
• Naïve Bayes
• Naïve Bayes Multinomial
• J48 decision tree

Naïve Bayes Multinomial Text (only Weka 3.8)
• No attribute selection needed
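As a rough picture of what a multinomial Naïve Bayes text classifier does, the following plain-Java sketch counts words per class and scores a new text with add-one (Laplace) smoothed log-probabilities. It is a teaching example under stated assumptions, not Weka's classifier: the class name `TinyMultinomialNB`, the whitespace tokenizer, and the decision to ignore words never seen in training are all choices made here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TinyMultinomialNB {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Add one labelled text to the per-class word counts.
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log-posterior, or null if nothing was trained.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // class prior
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (!vocabulary.contains(token)) continue; // skip words unseen in training
                int c = counts.getOrDefault(token, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size())); // add-one smoothing
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase().trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        TinyMultinomialNB nb = new TinyMultinomialNB();
        nb.train("positive", "great talk so great");
        nb.train("positive", "love it");
        nb.train("negative", "boring talk");
        nb.train("negative", "hate it so boring");
        System.out.println(nb.classify("great talk")); // prints "positive"
    }
}
```

Because the classifier works directly on word counts, it can be trained on tokenized text without a separate attribute selection pass, which is in line with the note on the slide.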

Results

Implementation
• Twitter4J
• Twitter API
• JavaFX

Thanks for your attention