Ranking for Sentiment

Page 1: Ranking for Sentiment

UNIVERSITY COLLEGE DUBLIN DUBLIN CITY UNIVERSITY TYNDALL NATIONAL INSTITUTE

Ranking for Sentiment

DCU at TREC 2008: The Blog Track

Adam Bermingham [email protected]

Page 2: Ranking for Sentiment

DCU: Team Sentiment!

Adam Bermingham

Prof. Alan Smeaton

Dr. Deirdre Hogan

Dr. Jennifer Foster

CLARITY: Centre for Sensor Web Technologies

Centre for Digital Video Processing

National Centre For Language Technology

Page 3: Ranking for Sentiment

Sentiment Analysis I

Who is favourite to win the match, Ireland or New Zealand?

What is the sentiment towards Barack Obama / the new iPod / Lehman Brothers Holdings Inc?

How opinionated is the discussion around Mary Harney, Enda Kenny, Zig & Zag?

Page 4: Ranking for Sentiment

Sentiment Analysis II

Identification of subjectivity and polarity of opinion in textual information
Crossover of Information Retrieval, NLP, Text Mining

The Challenges
– Document classification, scoring
– Opinion extraction
– Opinion summarization, visualization
– Real world correlation

Page 5: Ranking for Sentiment

SA: Document Scoring

Permits reranking, fusion with other document information
– E.g. relevance, authority, PageRank, etc.

Machine Learning approaches
– Bag-of-words + variants

Lexicon approaches
– Dictionaries for sentiment, polarity and subjectivity

Alternative text features:
– Out-of-vocabulary words, punctuation, etc.

Page 6: Ranking for Sentiment

The Blog Track
– Run at TREC since 2006

Three tasks (2008):
1. Find relevant blog posts
2. Find opinionated blog posts
3. Find positive & negative blog posts

Results: 1000 ranked documents per topic (query), per task

50 topics per year
Primary evaluation metric: MAP – Mean Average Precision
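
For readers unfamiliar with the metric, here is a minimal sketch of how MAP could be computed from ranked runs and relevance judgements. The function names and the dictionary layout (topic id mapped to a ranked list or a set of relevant document ids) are assumptions for illustration; official TREC figures come from the track's own evaluation tooling, not from this code.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where this relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-topic average precision over all topics in qrels."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(topic, []), relevant)
                for topic, relevant in qrels.items())
    return total / len(qrels)
```

A topic with no retrieved documents scores zero here, so it still pulls the mean down rather than being skipped.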

Page 7: Ranking for Sentiment

Topic Example

<num> Number: 1049 </num>
<title> YouTube </title>
<desc> Description: Find views about the YouTube video-sharing website. </desc>
<narr> Narrative: The YouTube video-sharing website provides internet users with a relatively new way to share videos. Documents which express views about how well it succeeds in meeting the needs of users are relevant. </narr>
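
A topic in this layout can be split into its fields with a small regular expression. The sketch below assumes every field has an explicit closing tag, as in the example above; `parse_topic` and `SAMPLE_TOPIC` are illustrative names, not part of any TREC tooling.

```python
import re
from typing import Dict

# Matches <num>...</num>, <title>...</title>, <desc>...</desc>, <narr>...</narr>.
FIELD_PATTERN = re.compile(r"<(num|title|desc|narr)>(.*?)</\1>", re.DOTALL)
# Leading labels such as "Number:" that repeat the tag name inside the field.
LABEL_PATTERN = re.compile(r"^\s*(Number|Description|Narrative):\s*")

SAMPLE_TOPIC = """
<num> Number: 1049 </num>
<title> YouTube </title>
<desc> Description: Find views about the YouTube video-sharing website. </desc>
"""


def parse_topic(text: str) -> Dict[str, str]:
    """Return a dict of field name to cleaned field text for one topic."""
    fields: Dict[str, str] = {}
    for tag, body in FIELD_PATTERN.findall(text):
        body = LABEL_PATTERN.sub("", body)
        # Collapse runs of whitespace and trim the ends.
        fields[tag] = " ".join(body.split())
    return fields
```

In a retrieval run the `title` field would typically serve as the query, with `desc` and `narr` available for longer query formulations.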

Page 8: Ranking for Sentiment

Assessments
– Relevance judgements
– Pooling
– Human assessors

QRELS:
– Not relevant
– Relevant, non-opinionated
– Relevant, positively opinionated
– Relevant, negatively opinionated
– Relevant, mixed opinionated
– Not judged

32,021 QRELS from 2006, 2007 available

Page 9: Ranking for Sentiment

Corpus – Blog06
– >3 million blog posts
– Crawled over a few weeks in 2006

Permalink HTML
– Also available: homepage HTML, RSS

Real-world
– Includes: spam blogs (“splogs”), multilingual blogs, inappropriate content

Page 10: Ranking for Sentiment

Blogs
– “Weblog” coined 1997
– A website containing regular timestamped posts in chronological order

Universal McCann (March 2008)
– 184 million worldwide have started a blog (26.4 million in the US)
– 346 million worldwide read blogs (60.3 million in the US)
– 77% of active Internet users read blogs

Page 11: Ranking for Sentiment

Blog – an example

[Figure: example blog, with labels Blog, Date, Post, Links, Tags, Comments]

Page 12: Ranking for Sentiment

Approach

1. Get relevant documents

2. Assess results for sentiment using three feature sets

3. Re-rank relevant results using late fusion of feature sets

Page 13: Ranking for Sentiment

Approach – feature sets

Lexicon Features
– Aggregate the sentiment scores that a sentiment lexicon assigns to a document’s constituent words.

Surface Features
– Textual features which do not require parsing or syntactic understanding of the sentence structure.

Syntactic Features
– Textual features derived from parsing and part-of-speech tagging documents.

Page 14: Ranking for Sentiment

System Architecture

Page 15: Ranking for Sentiment

Retrieval

Terrier
– University of Glasgow
– Open source / Java

Retrieval: Okapi BM25

Query Expansion: Bo1 (Bose-Einstein), Divergence From Randomness
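
As a reminder of what Okapi BM25 computes, here is a textbook-style sketch over tokenised documents. It is not Terrier's implementation: the `k1` and `b` defaults and the non-negative IDF variant are assumptions chosen for illustration, and query expansion is left out entirely.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenised document in docs against the query terms."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter = Counter()
    for d in docs:
        doc_freq.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Assumed IDF variant: the "+ 1" keeps the weight non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            # Length normalisation: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

Sorting documents by these scores and keeping the top 1000 per topic would give a relevance baseline of the shape the track expects.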

Page 16: Ranking for Sentiment

Preprocessing

Parse HTML
– HTMLParser tool
– Divide into text sections according to breaking HTML elements

Noise Removal
Discard sections with:
– A high anchor text to non-anchor text ratio (e.g. ad, blogroll)
– A high non-alphabetic character to alphabetic character ratio (e.g. date, code, gobbledegook)
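
The two noise-removal ratios can be sketched as follows. The `Section` structure and both threshold values are hypothetical; the slides do not state which cut-offs were used, and the HTML parsing step that would fill in `anchor_text` is omitted.

```python
from dataclasses import dataclass


@dataclass
class Section:
    text: str         # all visible text in the section
    anchor_text: str  # the part of that text that sits inside <a> elements


def ratio(numerator: int, denominator: int) -> float:
    """Plain ratio, treating x/0 as infinite when x > 0 and as 0 otherwise."""
    if denominator == 0:
        return float("inf") if numerator else 0.0
    return numerator / denominator


def anchor_ratio(section: Section) -> float:
    """Anchor-text characters divided by non-anchor characters."""
    anchor = len(section.anchor_text)
    non_anchor = max(len(section.text) - anchor, 0)
    return ratio(anchor, non_anchor)


def non_alpha_ratio(section: Section) -> float:
    """Non-alphabetic characters divided by alphabetic ones, ignoring spaces."""
    chars = [c for c in section.text if not c.isspace()]
    alpha = sum(1 for c in chars if c.isalpha())
    return ratio(len(chars) - alpha, alpha)


def is_noise(section: Section,
             max_anchor_ratio: float = 1.0,       # hypothetical threshold
             max_non_alpha_ratio: float = 1.0     # hypothetical threshold
             ) -> bool:
    """True when either ratio exceeds its threshold."""
    return (anchor_ratio(section) > max_anchor_ratio
            or non_alpha_ratio(section) > max_non_alpha_ratio)
```

A blogroll made only of links and a bare timestamp would both be discarded under these thresholds, while an ordinary sentence would be kept.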

Page 17: Ranking for Sentiment

Machine Learning

WEKA – Waikato Environment for Knowledge Analysis
– Java
– Good entry point
– Performance issues (?)

Three-way Binary Logistic Regression Classification
– Scores are obtained from distributions for classified documents
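
One way to read "scores from distributions" is that each binary logistic model's class-membership probability serves as a ranking score. The sketch below shows only that scoring step, with hypothetical feature names and weights; the system itself used WEKA, and training is not shown.

```python
import math
from typing import Dict


def logistic_score(features: Dict[str, float],
                   weights: Dict[str, float],
                   bias: float = 0.0) -> float:
    """Probability of the positive class under a binary logistic model."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Because the output lies between 0 and 1, scores from separately trained classifiers are on a comparable scale before fusion.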

Page 18: Ranking for Sentiment

Syntactic Features

Parsed using Charniak and Johnson re-ranking parser*
– ICHEC – Irish Centre for High End Computing

The 50 most discriminative part-of-speech unigrams, bigrams and trigrams

Penn Treebank phrasal types:
– Normalised counts of types
– Normalised counts of types as root of tree
– Normalised counts of parse tree structures likely to reflect subjectivity

*Thanks to Joachim Wagner!
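
The part-of-speech n-gram features can be illustrated with a short sketch. It assumes the document has already been tagged, that the vocabulary of discriminative n-grams has been selected elsewhere, and that counts are normalised by the number of tokens; none of these details are given in the slides.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def pos_ngram_features(tags: Sequence[str],
                       vocabulary: List[NGram]) -> Dict[NGram, float]:
    """Normalised counts of the selected POS n-grams in one tagged document."""
    counts: Counter = Counter()
    # Only count n-gram lengths that actually occur in the vocabulary.
    for n in {len(gram) for gram in vocabulary}:
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    total = max(len(tags), 1)  # assumed normaliser: document length in tokens
    return {gram: counts[gram] / total for gram in vocabulary}
```

For example, the tag sequence `PRP VBP JJ NN .` gives a value of 0.2 for the bigram `("JJ", "NN")` and 0.0 for any n-gram that does not occur.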

Page 19: Ranking for Sentiment

Surface Features

Normalized word counts for a manually created lexicon of obscenities and emotive and polarised words

Non-word characters and character sequences such as punctuation and emoticons

Regex patterns to detect unusual word and punctuation structures, e.g. “arrrrgh”, “?!?!”, “....”, “b****”

Document measurements
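
The regex-based surface features might look something like the sketch below. The four patterns are guesses modelled on the slide's examples (“arrrrgh”, “?!?!”, “....”, “b****”), and normalising by whitespace token count is an assumption; the actual patterns used in the system are not given.

```python
import re
from typing import Dict

# Hypothetical patterns, one per example on the slide.
PATTERNS = {
    # A letter repeated three or more times inside a word, e.g. "arrrrgh".
    "elongated_word": re.compile(r"\b\w*([A-Za-z])\1{2,}\w*\b"),
    # Two or more question/exclamation marks in a row, e.g. "?!?!".
    "repeated_punctuation": re.compile(r"[?!]{2,}"),
    # Three or more full stops in a row, e.g. "....".
    "ellipsis": re.compile(r"\.{3,}"),
    # Letters followed by two or more asterisks, e.g. "b****".
    "censored_word": re.compile(r"\b[A-Za-z]+\*{2,}"),
}


def surface_features(text: str) -> Dict[str, float]:
    """Count each pattern and normalise by the number of whitespace tokens."""
    n_tokens = max(len(text.split()), 1)
    return {name: sum(1 for _ in pattern.finditer(text)) / n_tokens
            for name, pattern in PATTERNS.items()}
```

Each feature is a rate rather than a raw count, so long and short posts remain comparable.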

Page 20: Ranking for Sentiment

Lexicon Features

SentiWordNet
– Positivity, Negativity score for each Synset in WordNet

Scoring
– Weighted sum of mean positivity and negativity scores per document
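
A minimal sketch of the scoring rule, assuming word-level scores have already been derived from the synset-level ones. `TOY_LEXICON` holds invented values, not real SentiWordNet entries, and averaging over matched tokens only (rather than all tokens) is likewise an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical word-level (positivity, negativity) scores for illustration only.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "awful": (0.0, 0.875),
    "slow": (0.0, 0.25),
}


def lexicon_score(tokens: List[str],
                  lexicon: Dict[str, Tuple[float, float]],
                  w_pos: float = 1.0,
                  w_neg: float = 1.0) -> float:
    """Weighted sum of a document's mean positivity and mean negativity."""
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return 0.0
    mean_pos = sum(pos for pos, _ in matched) / len(matched)
    mean_neg = sum(neg for _, neg in matched) / len(matched)
    return w_pos * mean_pos + w_neg * mean_neg
```

With both weights positive the score behaves like a subjectivity measure; setting `w_neg` negative turns it into a polarity measure.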

Page 21: Ranking for Sentiment

Weighting

Weighted CombSum

Learning weights from 2006, 2007 MAP
– Rather than cross validation

Scores from 3 classifiers fused before merging with relevance score
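
The two-stage fusion can be sketched as a weighted CombSum applied twice: first over the three classifier scores, then over the fused sentiment score and the relevance score. Min-max normalisation, the function names and the example weights are assumptions; the learned weights themselves are not reproduced in this transcript.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_comb_sum(score_lists: List[Dict[str, float]],
                      weights: List[float]) -> Dict[str, float]:
    """Weighted sum of normalised scores; missing documents contribute zero."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per score list")
    fused: Dict[str, float] = {}
    for scores, weight in zip(score_lists, weights):
        for doc, s in min_max(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weight * s
    return fused


def rerank(relevance: Dict[str, float],
           classifier_scores: List[Dict[str, float]],
           classifier_weights: List[float],
           relevance_weight: float) -> List[str]:
    """Fuse classifier scores first, then merge the result with relevance."""
    sentiment = weighted_comb_sum(classifier_scores, classifier_weights)
    final = weighted_comb_sum([relevance, sentiment],
                              [relevance_weight, 1.0 - relevance_weight])
    return sorted(final, key=final.get, reverse=True)
```

Setting `relevance_weight` to 1.0 reproduces the relevance-only baseline ordering, which gives a convenient sanity check when tuning weights.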

Page 22: Ranking for Sentiment

Weighting

Polarised Opinion finding:

Opinion finding:

Page 23: Ranking for Sentiment

Results

Baseline (Opinion)

Opinion Finding

Polarised Opinion Finding

Page 24: Ranking for Sentiment

Results – per topic

Page 25: Ranking for Sentiment

Preliminary Conclusions

Syntactic features appear to subsume surface features
– Observed while training weights

Significant gains can be had through an efficient, uniform baseline

Subjectivity is important in polarity detection
– Bigger difference in writing style between objective and subjective texts than between negative and positive texts

Page 26: Ranking for Sentiment

Future Work

Further work on parse trees for sentiment classification
– Movie review classification – Wolfgang Seeker

Sub-document relevance and sentiment modelling
– Unstructured text
– Logical levels – Sentence? Phrase? Paragraph? Passage? N.O.T.A?

Page 27: Ranking for Sentiment

Thanks!

TREC – Blog Track Wiki
http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/

Opinion Mining and Sentiment Analysis Survey
http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html

TREC Blog Track 2007 overview
http://trec.nist.gov/pubs/trec16/papers/BLOG.OVERVIEW08.pdf

Tools:
– Weka: http://www.cs.waikato.ac.nz/ml/weka/
– Terrier: http://ir.dcs.gla.ac.uk/terrier/
– HTMLParser: http://htmlparser.sourceforge.net/
– SentiWordNet: http://sentiwordnet.isti.cnr.it/

