
MCPR Presentation Sentiment

Date post: 05-Jan-2016
Upload: egozca
Description:
Sentiment Analysis in Spanish
Sentiment Groups as Features of a Classification Model using a Spanish Sentiment Lexicon: a hybrid approach
Universidad de las Américas Puebla
Ernesto Gutierrez Corona

Agenda
• Motivation
• Introduction
• Problem statement
• Related work
• Classification Model
• Results
• Implementation
• Discussion
• Future work

Motivation
• Prominent use of social networks
• Incredibly useful subjective information
• Decision making process
• Affective computing
• Trending opinion mining
• Marketing and political campaigns

Motivation
• Opinions are key influencers of our behavior
• Our beliefs and perceptions of reality are conditioned on how others see the world
• For decision making we seek others' opinions

Introduction
Sentiment Analysis, also known as Opinion Mining, involves computational techniques to detect, extract and evaluate sentiments, emotions, and subjectivity expressed in a text [Liu 2010].

Introduction
We can define an opinion according to Liu [2010] as a quintuple (e, a, so, h, t), where:
e: target entity
a: aspect/feature of entity
so: sentiment orientation (valence)
h: opinion holder
t: time when opinion is expressed

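Tweet example
"#ArturoVidal muy hombrecito para manejar borracho y a la hora de pedir disculpa un llorón de Mier...." (roughly: "#ArturoVidal, quite the tough guy when it comes to driving drunk, and a crybaby [expletive, truncated in the original] when it is time to apologize")
e: ArturoVidal
a: borracho (drunk)
so: negative
h: @mascoco
t: June 17, 12:46 hs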
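The five fields of an opinion map naturally onto a small record type. The sketch below is illustrative only: the class name `Opinion` and its field names are assumptions chosen to mirror the symbols e, a, so, h and t, and the values are copied from the tweet example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record mirroring the (e, a, so, h, t) fields of an opinion."""
    entity: str        # e: target entity
    aspect: str        # a: aspect/feature of the entity
    orientation: str   # so: sentiment orientation (valence)
    holder: str        # h: opinion holder
    time: str          # t: time when the opinion is expressed


# The tweet example expressed as one record.
vidal_tweet = Opinion(
    entity="ArturoVidal",
    aspect="borracho",
    orientation="negative",
    holder="@mascoco",
    time="june 17, 12:46hs",
)

print(vidal_tweet)
```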

Problem statement
Sentiment analysis in the Spanish language needs to be addressed in order to take advantage of the rapid growth of subjective information found in social networks.

Related Work (techniques)
Approaches: Lexicon-Based, Machine Learning, Hybrid Approach
• Anta et al., 2013: N-grams with Bayesian classifiers and decision trees
• Del-Hoyo et al., 2009: Feature vector = TFIDF + sentiment score
• Martinez-Camara et al., 2011: TFIDF and BTO with SVM and NB classifiers
• Moreno-Ortiz et al., 2013: Heuristic calculator
• Sidorov et al., 2013: Naïve Bayes, Decision Tree, and Support Vector Machines
• Taboada et al., 2011: Syntactic-tree based calculator
• Vilares et al., 2013: Syntactic dependence and PoS tags to construct feature vector

TFIDF: term frequency inverse document frequency
BTO: bit term occurrence
PoS: part of speech
SVM: Support Vector Machine
NB: Naive Bayes
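To make the two term weightings concrete, here is a minimal sketch of BTO and TFIDF vectors. It assumes pre-tokenized documents and the plain variant tf × log(N/df); the surveyed works may well use other TFIDF variants. The function names and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def bto_vector(tokens, vocabulary):
    """Bit term occurrence: 1 if the term appears in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def tfidf_vector(tokens, vocabulary, documents):
    """TFIDF using raw term frequency and idf = log(N / df)."""
    counts = Counter(tokens)
    n_docs = len(documents)
    vector = []
    for term in vocabulary:
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in documents if term in doc)
        idf = math.log(n_docs / df) if df else 0.0
        vector.append(counts[term] * idf)
    return vector


# Toy, made-up documents (already tokenized).
docs = [
    ["muy", "buen", "servicio"],
    ["muy", "mal", "servicio", "muy", "lento"],
]
vocab = sorted({token for doc in docs for token in doc})

print(vocab)
print(bto_vector(docs[1], vocab))
print(tfidf_vector(docs[1], vocab, docs))
```

Terms that occur in every document ("muy", "servicio") get a TFIDF weight of zero here, whereas BTO still marks them as present.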

Related Work (lexicon)
Construction methods: Manual tagging, Automatic tagging, Translation
• Molina-Gonzalez et al., 2013: Domain dependent; machine translation (EN->ES)
• Perez-Rosas et al., 2012: Latent Semantic Analysis
• Redondo et al., 2007: Manual translation
• Sidorov et al., 2013: Six basic human emotions

Classification Model
Supervised learning approach:
1. Creation of Spanish Corpus
2. Creation of Spanish Lexicon
3. Feature Selection
4. SVM linear models
5. Classification


1. Spanish Corpus
The corpus consists of tweets and reviews, where each comment was:
• Manually tagged as P+, P, NEU, N or N+
• Automatically tagged via a heuristic calculator as P+, P, NEU, N or N+
• Selected only if the manual and automatic tags matched

P+: Very positive
P: Positive
NEU: Neutral
N: Negative
N+: Very negative
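The selection rule for the corpus (keep a comment only when the manual and automatic tags agree) can be sketched as a simple filter. The function name `select_agreed` and the sample comments are hypothetical; the automatic tags here are typed in by hand rather than produced by a heuristic calculator.

```python
LABELS = {"P+", "P", "NEU", "N", "N+"}


def select_agreed(comments):
    """Keep comments whose manual and automatic tags are valid and identical."""
    selected = []
    for text, manual_tag, automatic_tag in comments:
        if manual_tag in LABELS and manual_tag == automatic_tag:
            selected.append((text, manual_tag))
    return selected


# Made-up comments: (text, manual tag, automatic tag).
candidates = [
    ("Me encanta este lugar", "P+", "P+"),
    ("No está mal", "P", "NEU"),
    ("Pésimo servicio", "N+", "N+"),
]
corpus = select_agreed(candidates)

print(corpus)
```

A filter like this trades corpus size for label reliability: comments on which the two taggers disagree are discarded rather than adjudicated.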

2. Spanish Lexicon
The lexicon was obtained through:
• Extraction of the most frequent words from the corpus
• Addition of the most common polarity words from online dictionaries
• Manual validation of the polarity of words

Important: words were tagged according to the author's intention, not the reader's interpretation
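The first two lexicon-building steps could look roughly like the sketch below; the third step, manual validation, is a human task and is only represented by untagged entries. Whitespace tokenization, the function names, and the polarity labels attached to dictionary words are all assumptions made for illustration.

```python
from collections import Counter


def frequent_words(corpus_texts, top_n):
    """Return the top_n most frequent lowercase tokens as lexicon candidates."""
    counts = Counter()
    for text in corpus_texts:
        counts.update(text.lower().split())
    return [word for word, _ in counts.most_common(top_n)]


def build_candidate_lexicon(corpus_texts, dictionary_polarity, top_n):
    """Merge frequent corpus words with polarity words from a dictionary.

    Frequent words start untagged (None) and await manual validation;
    dictionary entries keep their proposed polarity.
    """
    lexicon = {word: None for word in frequent_words(corpus_texts, top_n)}
    lexicon.update(dictionary_polarity)
    return lexicon


# Made-up inputs.
texts = ["buen servicio", "buen precio", "mal servicio"]
dictionary_words = {"buen": "P", "pésimo": "N+"}
candidates = build_candidate_lexicon(texts, dictionary_words, top_n=2)

print(candidates)
```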

2. Spanish Lexicon
The lexicon currently consists of 4583 words, categorized as:
[category table not preserved in this transcript]

3. Feature Selection
Sentiment groups are groups of words syntactically related through sentiment orientation:
• Words in a group are at most 2 words apart
• A double or triple negation is contained in a single group
• Sentiment group splitters are punctuation and conjunctions
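One loose reading of the grouping rules above is sketched here. It is not the presented algorithm: the mini-lexicon, the negation and conjunction lists, and the interpretation of "distance" as a difference of token positions are all assumptions.

```python
import re

# Hypothetical mini-resources; a real lexicon is far larger.
POLARITY = {"bueno": 1, "malo": -1, "feo": -1, "bonito": 1}
NEGATIONS = {"no", "nunca", "jamás"}
CONJUNCTIONS = {"y", "pero", "o", "aunque"}
MAX_DISTANCE = 2  # assumed meaning of "at most 2 words apart"


def sentiment_groups(text):
    """Split at punctuation/conjunctions, then chain nearby sentiment words."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    groups, current, last_index = [], [], None
    for index, token in enumerate(tokens):
        if token in CONJUNCTIONS or not re.fullmatch(r"\w+", token):
            # Splitter: close the open group, if any.
            if current:
                groups.append(current)
            current, last_index = [], None
        elif token in POLARITY or token in NEGATIONS:
            # Too far from the previous sentiment word: start a new group.
            if current and index - last_index > MAX_DISTANCE:
                groups.append(current)
                current = []
            current.append(token)
            last_index = index
    if current:
        groups.append(current)
    return groups


print(sentiment_groups("No es nunca malo, pero el lugar es feo y bonito"))
```

In this example the double negation "no ... nunca" stays in the same group as "malo", while the comma and the conjunctions "pero" and "y" separate the remaining groups.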


3. Feature Vector

4. SVM Linear Models
Three models were obtained from the training phase. It is possible to classify into two classes (P, N) or into four classes (P+, P, N, N+) by simply cascading SVM models.
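A cascade of linear models might be wired as in the sketch below. The role of each of the three models is an assumption (one polarity model, plus one intensity model per branch), since the slides do not spell it out. The weights are made up rather than learned; in practice they would come from trained linear SVMs.

```python
def linear_decision(weights, bias, features):
    """Signed score w.x + b of a linear model; the sign picks the side."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(features, polarity_model, pos_intensity_model, neg_intensity_model,
             four_classes=True):
    """Cascade: first P vs N, then optionally refine to P/P+ or N/N+."""
    is_positive = linear_decision(*polarity_model, features) >= 0
    if not four_classes:
        return "P" if is_positive else "N"
    if is_positive:
        strong = linear_decision(*pos_intensity_model, features) >= 0
        return "P+" if strong else "P"
    strong = linear_decision(*neg_intensity_model, features) >= 0
    return "N+" if strong else "N"


# Made-up models as (weights, bias) over two hypothetical features:
# [positive score, negative score].
polarity = ([1.0, -1.0], 0.0)
pos_intensity = ([1.0, 0.0], -2.0)
neg_intensity = ([0.0, 1.0], -2.0)

for sample in ([3.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 3.0]):
    print(sample, classify(sample, polarity, pos_intensity, neg_intensity))
```

Passing `four_classes=False` stops after the first model, which gives the two-class (P, N) output without retraining anything.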

Results
Our model was validated using 5-, 6-, 8- and 10-fold cross-validation over balanced corpora (see section 3.5) and also tested against the TASS 2014 corpus and the SFU Reviews Corpus.
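For readers unfamiliar with k-fold cross-validation, the sketch below shows the splitting and averaging logic in plain Python. It assumes the samples are already shuffled and that k does not exceed the number of samples; `train_fn` and `predict_fn` are hypothetical callables standing in for any classifier.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Return the mean accuracy over k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train],
                         [labels[i] for i in train])
        hits = sum(predict_fn(model, samples[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k


# Run the same evaluation for the fold counts mentioned in the results.
for k in (5, 6, 8, 10):
    folds = list(k_fold_indices(40, k))
    print(k, [len(test) for _, test in folds])
```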

Implementation
Our model was also tested on Twitter for real-time sentiment analysis during the 2014 World Cup.

Implementation
A front-end layer was added to make the opinion-trend results more intuitive.

Discussion
• Words in the lexicon are tagged by intention rather than interpretation.
• "Politician" and "acne" are examples of words with a negative interpretation, but when they appear in a comment the author usually has no sentiment intention behind them:
“Politician from PRI signed new reform”
• Can objective facts be positive or negative?
“New hospital was built in town”
“New Energetic Reform was signed by all parties”
• There is no universal consensus about polarity

Future Work
• Castillo et al., 2014 are implementing a graph-based model, and it is possible to integrate our corpus into their work
• It is also necessary to add more features to the model to make it more robust
• Enhancement of corpus and lexicon

References

Molina-González, M.D., Martínez-Cámara, E., Martín-Valdivia, M.-T., Perea-Ortega, J.M., 2013. Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications 40, 7250–7257.

Perez-Rosas, V., Banea, C., Mihalcea, R., 2012. Learning Sentiment Lexicons in Spanish. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul, Turkey.

Taboada, M., Brooke, J., Tofiloski, M., Voll, K., Stede, M., 2011. Lexicon-based Methods for Sentiment Analysis. Comput. Linguist. 37, 267–307.

Sidorov, G., Miranda-Jiménez, S., Viveros-Jiménez, F., Gelbukh, A., Castro-Sánchez, N., Velásquez, F., Díaz-Rangel, I., Suárez-Guerra, S., Treviño, A., Gordon, J., 2013. Empirical Study of Machine Learning Based Approach for Opinion Mining in Tweets. In: Batyrshin, I., Mendoza, M.G. (Eds.), Advances in Artificial Intelligence, Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 1–14.

del-Hoyo, R., Hupont, I., Lacueva, F.J., Abadía, D., 2009. Hybrid Text Affect Sensing System for Emotional Language Analysis. In: Proceedings of the International Workshop on Affective-Aware Virtual Agents and Social Robots, AFFINE ’09. ACM, New York, NY, USA, pp. 3:1–3:4.

Vilares, D., Alonso, M.Á., Gómez-Rodríguez, C., 2013b. Supervised Polarity Classification of Spanish Tweets Based on Linguistic Knowledge. In: Proceedings of the 2013 ACM Symposium on Document Engineering, DocEng ’13. ACM, New York, NY, USA, pp. 169–172.
