Date post: | 21-Jun-2015 |
Upload: | damiano-spina-valenti |
A Corpus for Entity Profiling in Microblog Posts
UNED NLP & IR Group
Madrid, Spain
ISLA, University of Amsterdam
Amsterdam, The Netherlands
LREC Workshop on Language Engineering for Online Reputation Management
May 26th, 2012 - Istanbul, Turkey
Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss,
Maarten de Rijke, Damiano Spina
Introduction
• Online Reputation Management
– Public image of an entity in Online Media
– Entity = { brand, organization, company, person, product }
• Microblogging services (e.g. Twitter)
– People sharing thoughts about an entity
– Dynamic, Real-Time
• Human Language Technologies
– Aid to reputation managers
– Retrieval and Analysis of entity mentions
Sentiment vs. Profiling
• Sentiment analysis
• Entity Profiling
– “hot” topics that people talk about in the context of an entity
Our task: Aspect identification
• @xbox_news here we go again, microsoft being jealous of sony again.
• I lov big Sony headphones .. I lov my #music 2 b more beautiful
• not surprising that @graypowell was out and about - he used to be a ’Field Verification & Operator Acceptance Engineer’ at Sony
Goal
• Build manually annotated corpora
– Evaluate the task of entity profiling in microblog streams
A Corpus for Entity Profiling in Microblog Posts
WePS-3 ORM Corpus
• Collection of tweets
• Disambiguated company names (e.g. apple the fruit vs. Apple Inc.)
• Pooling → Aspects
• Tweet annotation → Opinion targets
Approach I: Pooling aspects
• Pooling methodology
– 4 Ranking Methods:
• TF.IDF [Salton and Buckley, 1988]
• Log-Likelihood Ratio [Dunning, 1993]
• Parsimonious Language Model [Hiemstra et al., 2004]
• Opinion target extraction using topic-specific subjective lexicons [Jijkoun et al., 2010]
– Top 10 terms
• Manual annotation
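The pooling step above can be illustrated with a minimal sketch. It assumes tweets are already tokenized, uses a smoothed TF.IDF variant as the only ranker (the other three methods on the slide would plug in as further rankings), and the names `tfidf_top_terms` and `pool` are hypothetical, not taken from the original work.

```python
import math
from collections import Counter


def tfidf_top_terms(entity_tokens, background_docs, k=10):
    """Rank the terms of one entity's tweets by a smoothed TF.IDF score.

    entity_tokens: list of token lists (one per tweet) for the entity.
    background_docs: list of token lists, one pseudo-document per entity.
    The add-one smoothing in the IDF is an assumption of this sketch.
    """
    tf = Counter(tok for tweet in entity_tokens for tok in tweet)
    n_docs = len(background_docs)
    df = Counter(tok for doc in background_docs for tok in set(doc))
    scores = {t: f * math.log((n_docs + 1) / (df[t] + 1)) for t, f in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def pool(rankings, k=10):
    """Merge the top-k terms of each ranking into one candidate list.

    Duplicates are dropped and first-seen order is kept; the pooled
    terms are what annotators would then judge manually.
    """
    pooled, seen = [], set()
    for ranking in rankings:
        for term in ranking[:k]:
            if term not in seen:
                seen.add(term)
                pooled.append(term)
    return pooled


if __name__ == "__main__":
    sony = [["sony", "headphones"], ["sony", "music"]]
    background = [
        ["sony", "headphones", "sony", "music"],
        ["sony", "xbox"],
        ["apple", "music"],
    ]
    print(pool([tfidf_top_terms(sony, background, k=2)], k=2))
```

Pooling keeps the annotation effort bounded: each ranker contributes at most k candidate terms per entity, and annotators only see the union.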
Aspects dataset: annotation example
Aspects dataset: outcome
• Three annotators, substantial agreement (Cohen/Fleiss’ kappa > 0.6)
• 94 entities, 17775 tweets, ≈177 tweets/entity
• 2455 terms, 1304 aspects (54.11%)
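As a reminder of how the agreement figure is read, here is a minimal sketch of Cohen's kappa for two annotators who each label the same pooled terms as aspect (1) or not (0). The function name and the toy labels are illustrative assumptions; the slides do not say how the three annotators' judgments were combined.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy judgments, not taken from the corpus.
    annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
    print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

Kappa corrects raw agreement for agreement expected by chance, which is why a value above 0.6 is commonly described as substantial.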
Approach II: Tweet annotation
• Opinion targets dataset
• Tweet-level annotation
– Is the tweet subjective?
• Phrase-level annotation
– Subjective phrase
– Opinion target phrase p:
• p is an aspect of the entity
• p is included in a sentence that contains a direct subjective phrase
• p is the target of the expressed opinion
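The three conditions on an opinion target phrase are conjunctive, which the following sketch makes explicit. The record type `CandidatePhrase` and its field names are hypothetical; in the corpus each condition is a human judgment, not something computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePhrase:
    """One annotated phrase p from a tweet (hypothetical record layout)."""
    text: str
    is_aspect: bool               # p is an aspect of the entity
    in_subjective_sentence: bool  # p's sentence contains a direct subjective phrase
    is_target_of_opinion: bool    # p is the target of the expressed opinion

    def is_opinion_target(self) -> bool:
        # All three annotator judgments must hold at once.
        return (
            self.is_aspect
            and self.in_subjective_sentence
            and self.is_target_of_opinion
        )


if __name__ == "__main__":
    # Illustrative judgments only, not taken from the released annotations.
    phrase = CandidatePhrase("headphones", True, True, True)
    print(phrase.text, phrase.is_opinion_target())
```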
Opinion Targets dataset: annotation example
Opinion targets dataset: outcome
• 59 entities, 9396 tweets, ≈159 tweets/entity
• 15.16% of tweets with subjective phrases
• 13.82% of tweets with opinion targets
Aspects vs. Opinion targets
[Venn diagram of Aspects and Terms in Opinion Targets; region counts, left to right: 1650, 783, 270]
26.69%
12.67%
A Corpus for Entity Profiling in Microblog Posts
• Available at
http://bitly.com/profilingTwitter
WePS-3 ORM Corpus
• Pooling → Aspects dataset
– 94 entities, 17,775 tweets, ≈177 tweets/entity
– 2455 terms, 1304 aspects (54.11%)
• Tweet annotation → Opinion targets dataset
– 59 entities, 9,396 tweets, ≈159 tweets/entity
– 15.16% of tweets with subjective phrases
– 13.82% of tweets with opinion targets