MCPR Presentation Sentiment

Sentiment Groups as Features of a Classification Model using a Spanish Sentiment Lexicon: a hybrid approach
Universidad de las Américas Puebla
Ernesto Gutierrez Corona

Agenda
• Motivation
• Introduction
• Problem statement
• Related work
• Classification Model
• Results
• Implementation
• Discussion
• Future work

Motivation
• Prominent use of social networks
• Incredibly useful subjective information
• Decision-making process
• Affective computing
• Trending opinion mining
• Marketing and political campaigns

Motivation
• Opinions are key influencers of our behavior
• Our beliefs and perceptions of reality are conditioned on how others see the world
• When making decisions, we seek others' opinions

Introduction
Sentiment Analysis, also known as Opinion Mining, involves computational techniques to detect, extract, and evaluate sentiments, emotions, and subjectivity expressed in a text [Liu 2010].

Introduction
Following Liu [2010], we can define an opinion as a quintuple (e, a, so, h, t), where:
e: target entity
a: aspect/feature of the entity
so: sentiment orientation (valence)
h: opinion holder
t: time when the opinion is expressed

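Tweet example
#ArturoVidal muy hombrecito para manejar borracho y a la hora de pedir disculpa un llorón de Mier....
(Rough English gloss: "#ArturoVidal, man enough to drive drunk, but when it is time to apologize, a crybaby ...")
e: ArturoVidal
a: borracho (drunk)
so: negative
h: @mascoco
t: June 17, 12:46hs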
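As a minimal sketch, the opinion quintuple can be held in a small record type. The class name `Opinion` and its field names are illustrative choices and do not come from the presentation. The values are those of the tweet example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Opinion quintuple (e, a, so, h, t) following Liu [2010]."""
    entity: str       # e: target entity
    aspect: str       # a: aspect/feature of the entity
    orientation: str  # so: sentiment orientation (valence)
    holder: str       # h: opinion holder
    time: str         # t: time when the opinion was expressed


# The tweet example, encoded as a quintuple.
tweet_opinion = Opinion(
    entity="ArturoVidal",
    aspect="borracho",
    orientation="negative",
    holder="@mascoco",
    time="june 17, 12:46hs",
)
```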

Problem statement
Sentiment analysis for the Spanish language needs to be addressed in order to take advantage of the rapid growth of subjective information found in social networks.

Related Work (techniques)
Approaches: Lexicon-Based, Machine Learning, Hybrid Approach
• Anta et al., 2013: N-grams with Bayesian classifiers and decision trees
• Del-Hoyo et al., 2009: Feature vector = TFIDF + sentiment score
• Martinez-Camara et al., 2011: TFIDF and BTO with SVM and NB classifiers
• Moreno-Ortiz et al., 2013: Heuristic calculator
• Sidorov et al., 2013: Naïve Bayes, Decision Tree, and Support Vector Machines
• Taboada et al., 2011: Syntactic-tree based calculator
• Vilares et al., 2013: Syntactic dependencies and PoS tags to construct the feature vector

TFIDF: term frequency, inverse document frequency
BTO: binary term occurrence
PoS: part of speech
SVM: Support Vector Machine
NB: Naive Bayes
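To make the two term weightings concrete, here is a small sketch on toy data. The documents are made up. The TF-IDF variant shown (tf = count / document length, idf = log(N / df)) is only one common choice, so it is an assumption and not necessarily the formula used in the cited works.

```python
import math
from collections import Counter

# Toy tokenized documents (hypothetical, for illustration only).
docs = [
    ["muy", "buena", "pelicula"],
    ["muy", "mala", "pelicula", "muy", "larga"],
    ["buena", "comida"],
]

vocab = sorted({w for d in docs for w in d})


def bto_vector(doc):
    """Binary term occurrence: 1 if the term appears in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def tfidf_vector(doc):
    """TF-IDF with tf = count / len(doc) and idf = log(N / df) (assumed variant)."""
    counts = Counter(doc)
    n_docs = len(docs)
    vec = []
    for term in vocab:
        tf = counts[term] / len(doc)
        df = sum(1 for d in docs if term in d)  # documents containing the term
        vec.append(tf * math.log(n_docs / df))
    return vec
```

BTO only records presence, while TF-IDF gives more weight to terms that are frequent in one document and rare across the collection.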

Related Work (lexicon)
Approaches: Manual tagging, Automatic tagging, Translation
• Molina-Gonzalez et al., 2013: Domain-dependent machine translation (EN->ES)
• Perez-Rosas et al., 2012: Latent Semantic Analysis
• Redondo et al., 2007: Manual translation
• Sidorov et al., 2013: Six basic human emotions

Classification Model

Supervised learning approach:
1. Creation of Spanish Corpus
2. Creation of Spanish Lexicon
3. Feature Selection
4. SVM linear models
5. Classification

1. Spanish Corpus
The corpus consists of tweets and reviews. Each comment was:
• Manually tagged as P+, P, NEU, N, or N+
• Automatically tagged via a heuristic calculator as P+, P, NEU, N, or N+
Only sentences whose manual and automatic tags matched were selected.
P+: Very positive
P: Positive
NEU: Neutral
N: Negative
N+: Very negative
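The agreement filter described above can be sketched as follows. The record layout (`text`, `manual`, `automatic`) and the sample comments are hypothetical. Only the five labels and the "both tags must match" rule come from the slide.

```python
LABELS = {"P+", "P", "NEU", "N", "N+"}


def select_agreeing(comments):
    """Keep comments whose manual and automatic tags are valid and identical."""
    return [
        c for c in comments
        if c["manual"] in LABELS and c["manual"] == c["automatic"]
    ]


# Hypothetical tagged comments; texts and tags are made up for illustration.
comments = [
    {"text": "Me encanta este lugar", "manual": "P+", "automatic": "P+"},
    {"text": "No está mal", "manual": "P", "automatic": "N"},
    {"text": "Pésimo servicio", "manual": "N+", "automatic": "N+"},
]

corpus = select_agreeing(comments)
```

In this toy run the second comment is dropped because the two taggers disagree.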

2. Spanish Lexicon
The lexicon was obtained through:
• Extraction of the most frequent words from the corpus
• Addition of the most common polarity words from online dictionaries
• Manual validation of word polarity
Important: words were tagged according to the author's intention, not the reader's interpretation.

2. Spanish Lexicon
The lexicon currently comprises 4583 words, categorized as:

3. Feature Selection
Sentiment groups are groups of words syntactically related through sentiment orientation:
• Words in a group are at most 2 words apart
• A double or triple negation is contained in a single group
• Sentiment groups are split at punctuation and conjunctions
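A rough sketch of how such groups might be extracted is given below. It is not the author's implementation: it uses token distance only (no syntactic analysis), it reads "at most 2 words apart" as a position difference of at most 2, and the word lists are tiny hypothetical stand-ins for the real 4583-word lexicon.

```python
import re

# Hypothetical toy resources, for illustration only.
LEXICON = {"gusta", "buena", "barata", "mala"}
NEGATORS = {"no", "nada", "nunca", "tampoco"}
CONJUNCTIONS = {"y", "o", "pero", "aunque"}
MAX_DISTANCE = 2  # assumed reading of "at most 2 words apart"


def tokenize(text):
    """Lowercase and split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def split_segments(tokens):
    """Cut the token stream at punctuation marks and conjunctions."""
    segments, current = [], []
    for tok in tokens:
        is_punct = re.fullmatch(r"[^\w\s]", tok) is not None
        if is_punct or tok in CONJUNCTIONS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


def sentiment_groups(text):
    """Group lexicon words and negators lying within MAX_DISTANCE positions."""
    groups = []
    for segment in split_segments(tokenize(text)):
        group, last_pos = [], None
        for pos, tok in enumerate(segment):
            if tok not in LEXICON and tok not in NEGATORS:
                continue
            if last_pos is not None and pos - last_pos > MAX_DISTANCE:
                groups.append(group)  # gap too large: close the current group
                group = []
            group.append(tok)
            last_pos = pos
        if group:
            groups.append(group)
    return groups
```

For the sentence "No me gusta nada, pero la comida es muy buena y barata", the double negation "no ... nada" stays in one group with "gusta", while the comma and the conjunctions "pero" and "y" separate the remaining groups.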

3. Feature Vector

4. SVM Linear Models
Three models were obtained from the training phase. It is possible to classify into two classes (P, N) or into four classes (P+, P, N, N+) by simply cascading SVM models.
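One way the cascade could work is sketched below. The slides do not say how the three models are arranged, so the arrangement here (one polarity model followed by one intensity model per side) is an assumption. The weights, biases, and the three-feature layout are hypothetical stand-ins for trained linear SVMs, which at prediction time reduce to the sign of w · x + b.

```python
def linear_decision(weights, bias, x):
    """Decision value of a linear model: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical (weights, bias) pairs standing in for three trained linear SVMs.
# Assumed features: [positive group count, negative group count, intensifier count].
POLARITY_MODEL = ([1.0, -1.0, 0.0], 0.0)       # P side vs N side
POS_INTENSITY_MODEL = ([0.5, 0.0, 1.0], -1.5)  # P vs P+
NEG_INTENSITY_MODEL = ([0.0, 0.5, 1.0], -1.5)  # N vs N+


def classify_two(x):
    """Two-class output (P, N) from the polarity model alone."""
    return "P" if linear_decision(*POLARITY_MODEL, x) >= 0 else "N"


def classify_four(x):
    """Four-class output (P+, P, N, N+) by cascading a second model."""
    polarity = classify_two(x)
    model = POS_INTENSITY_MODEL if polarity == "P" else NEG_INTENSITY_MODEL
    strong = linear_decision(*model, x) >= 0
    return polarity + "+" if strong else polarity
```

With this layout the two-class result is simply the first stage of the four-class cascade, so both granularities share the same polarity decision.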

Results
Our model was validated using 5-, 6-, 8-, and 10-fold cross-validation over balanced corpora (see section 3.5) and was also tested against the TASS 2014 corpus and the SFU Reviews Corpus.
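For readers unfamiliar with k-fold cross-validation, the splitting step can be sketched as below. The function names are illustrative, and dealing each class round-robin across folds (stratification) is an assumption about how class balance is kept within each fold. The slides only state that the corpora were balanced.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, keeping classes evenly spread."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)  # deal indices round-robin across folds
    return folds


def cross_validation_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

Each sample is used for testing exactly once, and the reported score is typically the average over the k test folds.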

Implementation
Our model was also tested on Twitter for real-time sentiment analysis during the 2014 World Cup.

Implementation
A front-end layer was added to make the opinion-trend results more intuitive.

Discussion
• Words in the lexicon are tagged by intention rather than interpretation.
• "Politician" and "acne" are examples of words that carry a negative interpretation, but when they appear in a comment the author usually has no such intention: “Politician from PRI signed new reform”
• Can objective facts be positive or negative? “New hospital was built in town” “New Energy Reform was signed by all parties”
• There is no universal consensus about polarity

Future Work
• Castillo et al., 2014 are implementing a graph-based model, and it is possible to integrate our corpus into their work
• It is also necessary to add more features to the model to make it more robust
• Enhancement of the corpus and lexicon

References

Molina-González, M.D., Martínez-Cámara, E., Martín-Valdivia, M.-T., Perea-Ortega, J.M., 2013. Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications 40, 7250–7257.
Perez-Rosas, V., Banea, C., Mihalcea, R., 2012. Learning Sentiment Lexicons in Spanish. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey.
Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-Based Methods for Sentiment Analysis. Comput. Linguist. 37, 267–307.
Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon, J., 2013. Empirical Study of Machine Learning Based Approach for Opinion Mining in Tweets. In: Batyrshin, I., Mendoza, M.G. (Eds.), Advances in Artificial Intelligence, Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 1–14.
del-Hoyo, R., Hupont, I., Lacueva, F.J., Abadía, D., 2009. Hybrid Text Affect Sensing System for Emotional Language Analysis. In: Proceedings of the International Workshop on Affective-Aware Virtual Agents and Social Robots, AFFINE ’09. ACM, New York, NY, USA, pp. 3:1–3:4.
Vilares, D., Alonso, M.Á., Gómez-Rodríguez, C., 2013b. Supervised Polarity Classification of Spanish Tweets Based on Linguistic Knowledge. In: Proceedings of the 2013 ACM Symposium on Document Engineering, DocEng ’13. ACM, New York, NY, USA, pp. 169–172.
